Accepting request 1161499 from home:mslacken:branches:network:cluster
- removed Keep-logs-of-skipped-test-when-running-test-cases-sequentially.patch as incorporated upstream
- Changes in Slurm 23.02.5
  * Add the JobId to debug() messages indicating when cpus_per_task/mem_per_cpu or pn_min_cpus are being automatically adjusted.
  * Fix regression in 23.02.2 that caused slurmctld -R to crash on startup if a node features plugin is configured.
  * Fix and prevent reoccurring reservations from overlapping.
  * job_container/tmpfs - Avoid attempts to share BasePath between nodes.
  * Change the log message warning for rate limited users from verbose to info.
  * With CR_Cpu_Memory, fix node selection for jobs that request gres and --mem-per-cpu.
  * Fix a regression from 22.05.7 in which some jobs were allocated too few nodes, thus overcommitting cpus to some tasks.
  * Fix a job being stuck in the completing state if the job ends while the primary controller is down or unresponsive and the backup controller has not yet taken over.
  * Fix slurmctld segfault when a node registers with a configured CpuSpecList while slurmctld configuration has the node without CpuSpecList.
  * Fix cloud nodes getting stuck in POWERED_DOWN+NO_RESPOND state after not registering by ResumeTimeout.
  * slurmstepd - Avoid cleanup of config.json-less containers spooldir getting skipped.
  * slurmstepd - Cleanup per task generated environment for containers in spooldir.
  * Fix scontrol segfault when 'completing' command requested repeatedly in interactive mode.
  * Properly handle a race condition between bind() and listen() calls in the network stack when running with SrunPortRange set.
  * Federation - Fix revoked jobs being returned regardless of the -a/--all

OBS-URL: https://build.opensuse.org/request/show/1161499
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=292
parent 2bd53c8d44
commit cda5ce024e
@@ -1,27 +0,0 @@
From: Egbert Eich <eich@suse.com>
Date: Wed Jun 22 16:32:35 2022 +0200
Subject: Keep logs of skipped test when running test cases sequentially.
Patch-mainline: Not yet
Git-repo: https://github.com/SchedMD/slurm
Git-commit: 457a53ca97b50530bb2fafda72d465507c434960
References:

Signed-off-by: Egbert Eich <eich@suse.com>
Signed-off-by: Egbert Eich <eich@suse.de>
---
 testsuite/expect/regression.py | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/testsuite/expect/regression.py b/testsuite/expect/regression.py
index bcccaadbf5..b39af0c4e2 100755
--- a/testsuite/expect/regression.py
+++ b/testsuite/expect/regression.py
@@ -199,7 +199,8 @@ def main(argv=None):
             sys.stdout.write('SKIPPED\n')
             if not options.keep_logs:
                 try:
-                    os.remove(testlog_name)
+#                    os.remove(testlog_name)
+                    os.rename(testlog_name, testlog_name+'.skipped')
                 except IOError as e:
                     print('ERROR failed to close %s %s' % (testlog_name, e),
                             file=sys.stederr);
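The behaviour change carried by the removed patch can be sketched in isolation: when logs are not being kept, the log of a skipped test is renamed with a `.skipped` suffix instead of being deleted. The helper name `retire_skipped_log` and the `keep_logs` parameter are hypothetical and used only for illustration; this is a minimal sketch, not the code in `regression.py`.

```python
import os
import tempfile


def retire_skipped_log(testlog_name, keep_logs=False):
    """Return the path under which the log of a skipped test survives.

    Hypothetical helper: mirrors the patched branch, where the log is
    renamed to '<name>.skipped' rather than removed.
    """
    if keep_logs:
        # Logs are kept untouched when the caller asked to keep them.
        return testlog_name
    skipped_name = testlog_name + '.skipped'
    os.rename(testlog_name, skipped_name)
    return skipped_name


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "test1.1.log")
        with open(log, "w") as fh:
            fh.write("skipped: requirement not met\n")
        print(retire_skipped_log(log))
```

Renaming rather than deleting keeps the reason for the skip available after the run, at the cost of leaving extra files in the log directory.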
@@ -1,3 +0,0 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5ad59832f3cf70832a14d08997867af6f0a4ab10340dc89d5a65a275373836ea
size 7359396
3
slurm-23.11.5.tar.bz2
Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7a8f4b1b46d3a8ec9a95066b04635c97f9095877f6189a8ff7388e5e74daeef3
size 7365175
180
slurm.changes
@@ -1,3 +1,183 @@
-------------------------------------------------------------------
Mon Mar 25 15:16:44 UTC 2024 - Christian Goll <cgoll@suse.com>

- removed Keep-logs-of-skipped-test-when-running-test-cases-sequentially.patch
  as incorporated upstream
- Changes in Slurm 23.02.5
  * Add the JobId to debug() messages indicating when cpus_per_task/mem_per_cpu
    or pn_min_cpus are being automatically adjusted.
  * Fix regression in 23.02.2 that caused slurmctld -R to crash on startup if
    a node features plugin is configured.
  * Fix and prevent reoccurring reservations from overlapping.
  * job_container/tmpfs - Avoid attempts to share BasePath between nodes.
  * Change the log message warning for rate limited users from verbose to info.
  * With CR_Cpu_Memory, fix node selection for jobs that request gres and
    --mem-per-cpu.
  * Fix a regression from 22.05.7 in which some jobs were allocated too few
    nodes, thus overcommitting cpus to some tasks.
  * Fix a job being stuck in the completing state if the job ends while the
    primary controller is down or unresponsive and the backup controller has
    not yet taken over.
  * Fix slurmctld segfault when a node registers with a configured CpuSpecList
    while slurmctld configuration has the node without CpuSpecList.
  * Fix cloud nodes getting stuck in POWERED_DOWN+NO_RESPOND state after not
    registering by ResumeTimeout.
  * slurmstepd - Avoid cleanup of config.json-less containers spooldir getting
    skipped.
  * slurmstepd - Cleanup per task generated environment for containers in
    spooldir.
  * Fix scontrol segfault when 'completing' command requested repeatedly in
    interactive mode.
  * Properly handle a race condition between bind() and listen() calls in the
    network stack when running with SrunPortRange set.
  * Federation - Fix revoked jobs being returned regardless of the -a/--all
    option for privileged users.
  * Federation - Fix canceling pending federated jobs from non-origin clusters
    which could leave federated jobs orphaned from the origin cluster.
  * Fix sinfo segfault when printing multiple clusters with --noheader option.
  * Federation - fix clusters not syncing if clusters are added to a federation
    before they have registered with the dbd.
  * Change pmi2 plugin to honor the SrunPortRange option. This matches the new
    behavior of the pmix plugin in 23.02.0. Note that neither of these plugins
    makes use of the "MpiParams=ports=" option, and previously were only limited
    by the system's ephemeral port range.
  * node_features/helpers - Fix node selection for jobs requesting changeable
    features with the '|' operator, which could prevent jobs from running on
    some valid nodes.
  * node_features/helpers - Fix inconsistent handling of '&' and '|', where an
    AND'd feature was sometimes AND'd to all sets of features instead of just
    the current set. E.g. "foo|bar&baz" was interpreted as {foo,baz} or
    {bar,baz} instead of how it is documented: "{foo} or {bar,baz}".
  * Fix job accounting so that when a job is requeued its allocated node count
    is cleared. After the requeue, sacct will correctly show that the job has
    0 AllocNodes while it is pending or if it is canceled before restarting.
  * sacct - AllocCPUS now correctly shows 0 if a job has not yet received an
    allocation or if the job was canceled before getting one.
  * Fix intel oneapi autodetect: detect the /dev/dri/renderD[0-9]+ gpus, and do
    not detect /dev/dri/card[0-9]+.
  * Format batch, extern, interactive, and pending step ids into strings that
    are human readable.
  * Fix node selection for jobs that request --gpus and a number of tasks fewer
    than gpus, which resulted in incorrectly rejecting these jobs.
  * Remove MYSQL_OPT_RECONNECT completely.
  * Fix cloud nodes in POWERING_UP state disappearing (getting set to FUTURE)
    when an `scontrol reconfigure` happens.
  * openapi/dbv0.0.39 - Avoid assert / segfault on missing coordinators list.
  * slurmrestd - Correct memory leak while parsing OpenAPI specification
    templates with server overrides.
  * slurmrestd - Reduce memory usage when printing out job CPU frequency.
  * Fix overwriting user node reason with system message.
  * Remove --uid / --gid options from salloc and srun commands.
  * Prevent deadlock when rpc_queue is enabled.
  * slurmrestd - Correct OpenAPI specification generation bug where fields with
    overlapping parent paths would not get generated.
  * Fix memory leak as a result of a partition info query.
  * Fix memory leak as a result of a job info query.
  * slurmrestd - For 'GET /slurm/v0.0.39/node[s]', change format of node's
    energy field "current_watts" to a dictionary to account for unset value
    instead of dumping 4294967294.
  * slurmrestd - For 'GET /slurm/v0.0.39/qos', change format of QOS's
    field "priority" to a dictionary to account for unset value instead of
    dumping 4294967294.
  * slurmrestd - For 'GET /slurm/v0.0.39/job[s]', the 'return code' code field
    in v0.0.39_job_exit_code will be set to -127 instead of being left unset
    where job does not have a relevant return code.
  * data_parser/v0.0.39 - Add required/memory_per_cpu and
    required/memory_per_node to `sacct --json` and `sacct --yaml` and
    'GET /slurmdb/v0.0.39/jobs' from slurmrestd.
  * For step allocations, fix --gres=none sometimes not ignoring gres from the
    job.
  * Fix --exclusive jobs incorrectly gang-scheduling where they shouldn't.
  * Fix allocations with CR_SOCKET, gres not assigned to a specific socket, and
    block core distribution potentially allocating more sockets than required.
  * gpu/oneapi - Store cores correctly so CPU affinity is tracked.
  * Revert a change in 23.02.3 where Slurm would kill a script's process group
    as soon as the script ended instead of waiting as long as any process in
    that process group held the stdout/stderr file descriptors open. That change
    broke some scripts that relied on the previous behavior. Setting time limits
    for scripts (such as PrologEpilogTimeout) is strongly encouraged to avoid
    Slurm waiting indefinitely for scripts to finish.
  * Allow slurmdbd -R to work if the root assoc id is not 1.
  * Fix slurmdbd -R not returning an error under certain conditions.
  * slurmdbd - Avoid potential NULL pointer dereference in the mysql plugin.
  * Revert a change in 23.02 where SLURM_NTASKS was no longer set in the job's
    environment when --ntasks-per-node was requested.
  * Limit periodic node registrations to 50 instead of the full TreeWidth.
    Since unresolvable cloud/dynamic nodes must disable fanout by setting
    TreeWidth to a large number, this would cause all nodes to register at
    once.
  * Fix regression in 23.02.3 which broke x11 forwarding for hosts when
    MUNGE sends a localhost address in the encode host field. This is caused
    when the node hostname is mapped to 127.0.0.1 (or similar) in /etc/hosts.
  * openapi/[db]v0.0.39 - fix memory leak on parsing error.
  * data_parser/v0.0.39 - fix updating qos for associations.
  * openapi/dbv0.0.39 - fix updating values for associations with null users.
  * Fix minor memory leak with --tres-per-task and licenses.
  * Fix cyclic socket cpu distribution for tasks in a step where
    --cpus-per-task < usable threads per core.
- Changes in Slurm 23.02.4
  * Fix sbatch return code when --wait is requested on a job array.
  * switch/hpe_slingshot - avoid segfault when running with old libcxi.
  * Avoid slurmctld segfault when specifying AccountingStorageExternalHost.
  * Fix collected GPUUtilization values for acct_gather_profile plugins.
  * Fix slurmrestd handling of job hold/release operations.
  * Make spank S_JOB_ARGV item value hold the requested command argv instead of
    the srun --bcast value when --bcast requested (only in local context).
  * Fix step running indefinitely when slurmctld takes more than MessageTimeout
    to respond. Now, slurmctld will cancel the step when detected, preventing
    following steps from getting stuck waiting for resources to be released.
  * Fix regression to make job_desc.min_cpus accurate again in job_submit when
    requesting a job with --ntasks-per-node.
  * scontrol - Permit changes to StdErr and StdIn for pending jobs.
  * scontrol - Reset std{err,in,out} when set to empty string.
  * slurmrestd - mark environment as a required field for job submission
    descriptions.
  * slurmrestd - avoid dumping null in OpenAPI schema required fields.
  * data_parser/v0.0.39 - avoid rejecting valid memory_per_node formatted as
    dictionary provided with a job description.
  * data_parser/v0.0.39 - avoid rejecting valid memory_per_cpu formatted as
    dictionary provided with a job description.
  * slurmrestd - Return HTTP error code 404 when job query fails.
  * slurmrestd - Add return schema to error response to job and license query.
  * Fix handling of ArrayTaskThrottle in backfill.
  * Fix regression in 23.02.2 when checking gres state on slurmctld startup or
    reconfigure. Gres changes in the configuration were not updated on slurmctld
    startup. On startup or reconfigure, these messages were present in the log:
    "error: Attempt to change gres/gpu Count".
  * Fix potential double count of gres when dealing with limits.
  * switch/hpe_slingshot - support alternate traffic class names with "TC_"
    prefix.
  * scrontab - Fix cutting off the final character of quoted variables.
  * Fix slurmstepd segfault when ContainerPath is not set in oci.conf.
  * Change the log message warning for rate limited users from debug to verbose.
  * Fixed an issue where jobs requesting licenses were incorrectly rejected.
  * smail - Fix issues where e-mails at job completion were not being sent.
  * scontrol/slurmctld - fix comma parsing when updating a reservation's nodes.
  * cgroup/v2 - Avoid capturing log output for ebpf when constraining devices,
    as this can lead to inadvertent failure if the log buffer is too small.
  * Fix --gpu-bind=single binding tasks to wrong gpus, leading to some gpus
    having more tasks than they should and other gpus being unused.
  * Fix main scheduler loop not starting after failover to backup controller.
  * Added error message when attempting to use sattach on batch or extern steps.
  * Fix regression in 23.02 that causes slurmstepd to crash when srun requests
    more than TreeWidth nodes in a step and uses the pmi2 or pmix plugin.
  * Reject job ArrayTaskThrottle update requests from unprivileged users.
  * data_parser/v0.0.39 - populate description fields of property objects in
    generated OpenAPI specifications where defined.
  * slurmstepd - Avoid segfault caused by ContainerPath not being terminated by
    '/' in oci.conf.
  * data_parser/v0.0.39 - Change v0.0.39_job_info response to tag exit_code
    field as being complex instead of only an unsigned integer.
  * job_container/tmpfs - Fix %h and %n substitution in BasePath where %h was
    substituted as the NodeName instead of the hostname, and %n was substituted
    as an empty string.
  * Fix regression where --cpu-bind=verbose would override TaskPluginParam.
  * scancel - Fix --clusters/-M for federations. Only filtered jobs (e.g. -A,
    -u, -p, etc.) from the specified clusters will be canceled, rather than all
    jobs in the federation. Specific jobids will still be routed to the origin
    cluster for cancellation.


-------------------------------------------------------------------
Mon Jan 29 13:47:55 UTC 2024 - Egbert Eich <eich@suse.com>

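The node_features/helpers entry in the 23.02.5 list documents that "foo|bar&baz" means "{foo} or {bar,baz}", i.e. '&' binds tighter than '|'. A minimal sketch of that documented precedence follows. It is an illustration only, not Slurm's implementation: the function names `parse_feature_expr` and `node_matches` are hypothetical, and parentheses, counts, and other constraint syntax are deliberately ignored.

```python
def parse_feature_expr(expr):
    """Split a feature expression into alternative feature sets.

    '|' separates alternatives; '&' joins features inside one alternative,
    so an AND'd feature applies only to the set it appears in.
    """
    return [set(term.split("&")) for term in expr.split("|")]


def node_matches(expr, node_features):
    """True if the node has every feature of at least one alternative."""
    available = set(node_features)
    return any(required <= available for required in parse_feature_expr(expr))


if __name__ == "__main__":
    # A node with only 'foo' qualifies under the documented reading,
    # but would not under the buggy "{foo,baz} or {bar,baz}" reading.
    print(node_matches("foo|bar&baz", {"foo"}))
    print(node_matches("foo|bar&baz", {"baz"}))
```

Splitting on '|' before '&' is what keeps `baz` attached to `bar` alone, which is the difference between the documented and the previously observed behaviour.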
@@ -19,7 +19,7 @@
 # Check file META in sources: update so_version to (API_CURRENT - API_AGE)
 %define so_version 40
 # Make sure to update `upgrades` as well!
-%define ver 23.11.3
+%define ver 23.11.5
 %define _ver _23_11
 %define dl_ver %{ver}
 # so-version is 0 and seems to be stable
@@ -171,7 +171,7 @@ Source21: README_Testsuite.md
 Patch0: Remove-rpath-from-build.patch
 Patch2: pam_slurm-Initialize-arrays-and-pass-sizes.patch
 Patch10: Fix-test-21.41.patch
-Patch14: Keep-logs-of-skipped-test-when-running-test-cases-sequentially.patch
+#Patch14: Keep-logs-of-skipped-test-when-running-test-cases-sequentially.patch
 Patch15: Fix-test7.2-to-find-libpmix-under-lib64-as-well.patch

 %{upgrade_dep %pname}
@@ -1112,7 +1112,8 @@ rm -rf /srv/slurm-testsuite/src /srv/slurm-testsuite/testsuite \
 %{_mandir}/man1/sjobexitmod.1.*
 %{_mandir}/man1/sjstat.1.*
 %{_mandir}/man8/slurmctld.*
-%{_mandir}/man8/spank*
+%{_mandir}/man8/spank.*
+%{_mandir}/man8/sackd.*

 %files openlava
 %{_bindir}/bjobs
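The spec comment about `so_version` describes a manual check against the `META` file in the sources: `so_version` should equal `API_CURRENT - API_AGE`. A small sketch of that arithmetic is below. The layout of `META` (one `KEY: value` pair per line) and the sample values are assumptions made for illustration; only `so_version 40` comes from the spec hunk, and `so_version_from_meta` is a hypothetical helper.

```python
import re

# Assumed excerpt of a META file; the field layout and the API_AGE and
# API_REVISION values are illustrative, not copied from the tarball.
SAMPLE_META = """\
  API_CURRENT:  40
  API_AGE:      0
  API_REVISION: 0
"""


def so_version_from_meta(meta_text):
    """Compute API_CURRENT - API_AGE from META-style 'KEY: value' lines."""
    fields = dict(re.findall(r"^\s*(API_\w+):\s*(\d+)\s*$", meta_text, re.MULTILINE))
    return int(fields["API_CURRENT"]) - int(fields["API_AGE"])


if __name__ == "__main__":
    print(so_version_from_meta(SAMPLE_META))
```

Running such a check when bumping `ver` would flag a stale `so_version` before the build, which is the situation the comment warns about.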