forked from pool/slurm
Accepting request 974433 from home:mslacken:branches:network:cluster
- Update to 21.08.7 with following changes:
  * openapi/v0.0.37 - correct calculation for bf_queue_len_mean in /diag.
  * Avoid shrinking a reservation when overlapping with downed nodes.
  * Only check TRES limits against current usage for TRES requested by the job.
  * Do not allocate shared gres (MPS) in whole-node allocations
  * Constrain slurmstepd to job/step cgroup like in previous versions of Slurm.
  * Fix warnings on 32-bit compilers related to printf() formats.
  * Fix reconfigure issues after disabling/reenabling the GANG PreemptMode.
  * Fix race condition where a cgroup was being deleted while another step
    was creating it.
  * Set the slurmd port correctly if multi-slurmd
  * Fix FAIL mail not being sent if a job was cancelled due to preemption.
  * slurmrestd - move debug logs for HTTP handling to be gated by debugflag
    NETWORK to avoid unnecessary logging of communication contents.
  * Fix issue with bad memory access when shrinking running steps.
  * Fix various issues with internal job accounting with GRES when jobs are
    shrunk.
  * Fix ipmi polling on slurmd reconfig or restart.
  * Fix srun crash when reserved ports are being used and het step fails
    to launch.
  * openapi/dbv0.0.37 - fix DELETE execution path on /user/{user_name}.
  * slurmctld - Properly requeue all components of a het job if PrologSlurmctld
    fails.
  * rlimits - remove final calls to limit nofiles to 4096 but to instead use
    the max possible nofiles in slurmd and slurmdbd.
  * Allow the DBD agent to load large messages (up to MAX_BUF_SIZE) from state.
  * Fix potential deadlock during slurmctld restart when there is a completing
    job.
  * slurmstepd - reduce user requested soft rlimits when they are above max
    hard rlimits to avoid rlimit request being completely ignored and
    processes using default limits.

OBS-URL: https://build.opensuse.org/request/show/974433
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=196
This commit is contained in:
parent
d442993ff4
commit
30c749c9e0
@@ -1,3 +0,0 @@
version https://git-lfs.github.com/spec/v1
oid sha256:fce78185c5c69b3a9143286df641725503be7aa4c1d5cec9161ec72905ed4f8a
size 6741051

slurm-21.08.7.tar.bz2 (new file, 3 lines)
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:43f3e978d8c2682c3d2e80517a03d836c53ab5a7587870ce8259da5232ab4fa3
size 6744881
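The tarball blobs in this diff are git-lfs pointer files rather than the archives themselves: three `key value` text lines naming the spec version, the object's SHA-256, and the real blob's size in bytes. A minimal sketch of parsing such a pointer (the helper name is my own, not part of any git-lfs tooling):

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a git-lfs pointer file into its fields.

    A pointer is a short text file of `key value` lines:
    version, oid (as "sha256:<hex>"), and size (bytes of the real blob).
    """
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algorithm": algo,  # e.g. "sha256"
        "oid": digest,
        "size": int(fields["size"]),
    }
```

Feeding it the new pointer above yields a size of 6744881 bytes for the 21.08.7 tarball.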
slurm.changes (120 lines changed)
@@ -1,3 +1,123 @@
-------------------------------------------------------------------
Mon May 2 14:12:59 UTC 2022 - Christian Goll <cgoll@suse.com>

- Update to 21.08.7 with following changes:
  * openapi/v0.0.37 - correct calculation for bf_queue_len_mean in /diag.
  * Avoid shrinking a reservation when overlapping with downed nodes.
  * Only check TRES limits against current usage for TRES requested by the job.
  * Do not allocate shared gres (MPS) in whole-node allocations
  * Constrain slurmstepd to job/step cgroup like in previous versions of Slurm.
  * Fix warnings on 32-bit compilers related to printf() formats.
  * Fix reconfigure issues after disabling/reenabling the GANG PreemptMode.
  * Fix race condition where a cgroup was being deleted while another step
    was creating it.
  * Set the slurmd port correctly if multi-slurmd
  * Fix FAIL mail not being sent if a job was cancelled due to preemption.
  * slurmrestd - move debug logs for HTTP handling to be gated by debugflag
    NETWORK to avoid unnecessary logging of communication contents.
  * Fix issue with bad memory access when shrinking running steps.
  * Fix various issues with internal job accounting with GRES when jobs are
    shrunk.
  * Fix ipmi polling on slurmd reconfig or restart.
  * Fix srun crash when reserved ports are being used and het step fails
    to launch.
  * openapi/dbv0.0.37 - fix DELETE execution path on /user/{user_name}.
  * slurmctld - Properly requeue all components of a het job if PrologSlurmctld
    fails.
  * rlimits - remove final calls to limit nofiles to 4096 but to instead use
    the max possible nofiles in slurmd and slurmdbd.
  * Allow the DBD agent to load large messages (up to MAX_BUF_SIZE) from state.
  * Fix potential deadlock during slurmctld restart when there is a completing
    job.
  * slurmstepd - reduce user requested soft rlimits when they are above max
    hard rlimits to avoid rlimit request being completely ignored and
    processes using default limits.
  * Fix Slurm user commands displaying available features as active features
    when no features were active.
  * Don't power down nodes that are rebooting.
  * Clear pending node reboot on power down request.
  * Ignore node registrations while node is powering down.
  * Don't reboot any node that is power<ing|ed> down.
  * Don't allow a node to reboot if it's marked for power down.
  * Fix issuing reboot and downing when rebooting a powering up node.
  * Clear DRAIN on node after failing to resume before ResumeTimeout.
  * Prevent repeating power down if node fails to resume before ResumeTimeout.
  * Fix federated cloud node communication with srun and cloud_dns.
  * Fix jobs being scheduled on nodes marked to be powered_down when idle.
  * Fix problem where a privileged user could not view array tasks specified by
    <array_job_id>_<task_id> when PrivateData had the jobs value set.
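Two of the rlimit entries above share one idea: rather than letting an over-large soft-limit request fail outright (which would leave the process on its default limits), the limit is reduced to the hard maximum first. A hedged Python sketch of that clamping logic, not Slurm's actual C implementation (the helper name is invented for illustration):

```python
import resource


def set_soft_limit(res: int, requested_soft: int) -> int:
    """Apply a requested soft limit, clamping it to the hard maximum.

    setrlimit() rejects a soft limit above the hard limit outright, so an
    unclamped request would be ignored entirely; clamping first honors the
    request as far as the kernel allows. Returns the soft limit applied.
    """
    _soft, hard = resource.getrlimit(res)
    if hard != resource.RLIM_INFINITY and (
        requested_soft == resource.RLIM_INFINITY or requested_soft > hard
    ):
        requested_soft = hard  # reduce instead of failing with EINVAL
    resource.setrlimit(res, (requested_soft, hard))
    return requested_soft
```

For example, requesting more open files (RLIMIT_NOFILE) than the hard cap permits simply lands on the hard cap instead of raising an error.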
- Changes in Slurm 21.08.6
  * Fix plugin_name definitions in a number of plugins to improve logging.
  * Close sbcast file transfers when job is cancelled.
  * scrontab - fix handling of --gpus and --ntasks-per-gpu options.
  * sched/backfill - fix job_queue_rec_t memory leak.
  * Fix magnetic reservation logic in both main and backfill schedulers.
  * job_container/tmpfs - fix memory leak when using InitScript.
  * slurmrestd / openapi - fix memory leaks.
  * Fix slurmctld segfault due to job array resv_list double free.
  * Fix multi-reservation job testing logic.
  * Fix slurmctld segfault due to insufficient job reservation parse validation.
  * Fix main and backfill schedulers handling for already rejected job array.
  * sched/backfill - restore resv_ptr after yielding locks.
  * acct_gather_energy/xcc - appropriately close and destroy the IPMI context.
  * Protect slurmstepd from making multiple calls to the cleanup logic.
  * Prevent slurmstepd segfault at cleanup time in mpi_fini().
  * Fix slurmctld sometimes hanging if shutdown while PrologSlurmctld or
    EpilogSlurmctld were running and PrologEpilogTimeout is set in slurm.conf.
  * Fix affinity of the batch step if batch host is different than the first
    node in the allocation.
  * slurmdbd - fix segfault after multiple failover/failback operations.
  * Fix jobcomp filetxt job selection condition.
  * Fix -f flag of sacct not being used.
  * Select cores for job steps according to the socket distribution. Previously,
    sockets were always filled before selecting cores from the next socket.
  * Keep node in Future state if epilog completes while in Future state.
  * Fix erroneous --constraint behavior by preventing multiple sets of brackets.
  * Make ResetAccrueTime update the job's accrue_time to now.
  * Fix sattach initialization with configless mode.
  * Revert packing limit checks affecting pmi2.
  * sacct - fixed assertion failure when using -c option and a federation
    display.
  * Fix issue that allowed steps to overallocate the job's memory.
  * Fix the sanity check mode of AutoDetect so that it actually works.
  * Fix deallocated nodes that didn't actually launch a job from waiting for
    EpilogSlurmctld to complete before clearing completing node's state.
  * Job should be in a completing state if EpilogSlurmctld is still running
    when being requeued.
  * Fix job not being requeued properly if all node epilogs completed before
    EpilogSlurmctld finished.
  * Keep job completing until EpilogSlurmctld is completed even when "downing"
    a node.
  * Fix handling reboot with multiple job features.
  * Fix nodes getting powered down when creating new partitions.
  * Fix bad bit_realloc which potentially could lead to bad memory access.
  * slurmctld - remove limit on the number of open files.
  * Fix bug where job_state file of size above 2GB wasn't saved without any
    error message.
  * Fix various issues with no_consume gres.
  * Fix regression in 21.08.0rc1 where job steps failed to launch on systems
    that reserved a CPU in a cgroup outside of Slurm (for example, on systems
    with WekaIO).
  * Fix OverTimeLimit not being reset on scontrol reconfigure when it is
    removed from slurm.conf.
  * serializer/yaml - use dynamic buffer to allow creation of YAML outputs
    larger than 1MiB.
  * Fix minor memory leak affecting openapi users at process termination.
  * Fix batch jobs not resolving the username when nss_slurm is enabled.
  * slurmrestd - Avoid slurmrestd ignoring invalid HTTP method if the response
    serialized without error.
  * openapi/dbv0.0.37 - Correct conditional that caused the diag output to
    give an internal server error status on success.
  * Make --mem-bind=sort work with task_affinity
  * Fix sacctmgr to set MaxJobsAccruePer{User|Account} and MinPrioThres in
    sacctmgr add qos, modify already worked correctly.
  * job_container/tmpfs - avoid printing extraneous error messages in Prolog
    and Epilog, and when the job completes.
  * Fix step CPU memory allocation with --threads-per-core without --exact.
  * Remove implicit --exact when --threads-per-core or --hint=nomultithread
    is used.
  * Do not allow a step to request more threads per core than the
    allocation did.
  * Remove implicit --exact when --cpus-per-task is used.

-------------------------------------------------------------------
Wed Dec 22 09:24:28 UTC 2021 - Christian Goll <cgoll@suse.com>
@@ -1,7 +1,7 @@
#
# spec file
#
-# Copyright (c) 2021 SUSE LLC
+# Copyright (c) 2022 SUSE LLC
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
@@ -18,7 +18,7 @@

# Check file META in sources: update so_version to (API_CURRENT - API_AGE)
%define so_version 37
-%define ver 21.08.5
+%define ver 21.08.7
%define _ver _21_08
%define dl_ver %{ver}
# so-version is 0 and seems to be stable
@@ -1110,6 +1110,7 @@ exit 0
%{_libdir}/slurm/gres_gpu.so
%{_libdir}/slurm/gres_mps.so
%{_libdir}/slurm/gres_nic.so
+%{_libdir}/slurm/hash_k12.so
%{_libdir}/slurm/jobacct_gather_cgroup.so
%{_libdir}/slurm/jobacct_gather_linux.so
%{_libdir}/slurm/jobacct_gather_none.so