Accepting request 1030432 from network:cluster

- updated to 22.05.5
- NOTE: Slurm validates that libraries are of the same version. Unfortunately,
  due to an oversight, we failed to notice that the slurmstepd loads the
  hash_k12 library only after a job has completed. This means that if the
  hash_k12 library is upgraded before a job finishes, the slurmstepd will load
  the new library when the job finishes, and will fail due to a mismatch of
  versions.  This results in nodes with slurmstepd processes stuck
  indefinitely. These processes require manual intervention to clean up. There
  is no clean way to resolve these hung slurmstepd processes.
  The only recommended way to upgrade between minor versions of 22.05 with
  RPMs or upgrades that replace current binaries and libraries is to drain the
  nodes of running jobs first.
- Fixes a number of moderate-severity issues, notably:
  * Load hash plugin at slurmstepd launch time to prevent issues loading the
    plugin at step completion if the Slurm installation is upgraded.
  * Update nvml plugin to match the unique id format for MIG devices in new
    Nvidia drivers.
  * Fix multi-node step launch failure when nodes in the controller aren't in
    natural order. This can happen with inconsistent node naming (such as
    node15 and node052) or with dynamic nodes which can register in any order.
  * job_container/tmpfs - clean up containers even when the .ns file isn't
    mounted anymore.
  * Wait up to PrologEpilogTimeout before shutting down slurmd to allow prolog
    and epilog scripts to complete or time out. Previously, slurmd waited 120
    seconds before timing out and killing prolog and epilog scripts.

(forwarded request 1010642 from mslacken)
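
The node-ordering fix above mentions names such as node15 and node052. As a rough illustration only (this is not Slurm's own ordering code, and `natural_key` is a hypothetical helper), the following Python sketch shows how a plain string sort and a numeric-aware "natural" sort disagree for such names:

```python
import re


def natural_key(name):
    """Split a name into text and digit runs so digit runs compare as integers.

    Hypothetical helper for illustration; not taken from Slurm.
    """
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", name)]


nodes = ["node15", "node052", "node7"]

# Plain string comparison goes character by character, so "0" < "1" < "7".
lexicographic = sorted(nodes)

# Numeric-aware comparison treats 7 < 15 < 52, ignoring the zero padding.
natural = sorted(nodes, key=natural_key)

print("lexicographic:", lexicographic)
print("natural:      ", natural)
```

Any component that assumes one ordering while the node table uses the other would see the same nodes at different positions, which is the kind of mismatch the changelog entry describes.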

OBS-URL: https://build.opensuse.org/request/show/1030432
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=79
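
The note above recommends draining nodes of running jobs before upgrading between minor versions of 22.05. A minimal sketch of that preparation, assuming the standard `scontrol` and `squeue` command-line tools are available: the helper names, node names and reason text are hypothetical, and the functions only build the command lines rather than running them.

```python
import shlex


def drain_command(nodes, reason):
    """Build the scontrol call that stops new jobs from starting on the nodes."""
    if not nodes:
        raise ValueError("at least one node name is required")
    node_list = ",".join(nodes)
    return ["scontrol", "update", f"NodeName={node_list}",
            "State=DRAIN", f"Reason={reason}"]


def running_jobs_command(nodes):
    """Build the squeue call that lists IDs of jobs still running on the nodes."""
    node_list = ",".join(nodes)
    return ["squeue", "--noheader", "--states=RUNNING",
            f"--nodelist={node_list}", "--format=%i"]


def safe_to_upgrade(squeue_output):
    """Return True when squeue printed no job IDs, i.e. the nodes are idle."""
    return squeue_output.strip() == ""


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    nodes = ["node01", "node02"]
    print(shlex.join(drain_command(nodes, "upgrade to 22.05.5")))
    print(shlex.join(running_jobs_command(nodes)))
```

The packages would be upgraded only once the squeue query returns nothing for the drained nodes, so that no slurmstepd from the old version is still waiting to load the hash plugin.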
Dominique Leuenberger 2022-10-22 12:13:18 +00:00 committed by Git OBS Bridge
commit 220eec76a4
4 changed files with 33 additions and 4 deletions

@@ -1,3 +0,0 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8ff2d1f1cc9b0cbdd344cfcbbe4f14b08d4260b7012619f6cc9c38263f276c41
size 7094002

slurm-22.05.5.tar.bz2 Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:f687c98c4f7c0b7409f865771bbb05986daa3e207616667a9aa7390ba5a50fce
size 7098772

@@ -1,3 +1,32 @@
-------------------------------------------------------------------
Fri Oct 14 08:49:24 UTC 2022 - Christian Goll <cgoll@suse.com>
- updated to 22.05.5
- NOTE: Slurm validates that libraries are of the same version. Unfortunately,
due to an oversight, we failed to notice that the slurmstepd loads the
hash_k12 library only after a job has completed. This means that if the
hash_k12 library is upgraded before a job finishes, the slurmstepd will load
the new library when the job finishes, and will fail due to a mismatch of
versions. This results in nodes with slurmstepd processes stuck
indefinitely. These processes require manual intervention to clean up. There
is no clean way to resolve these hung slurmstepd processes.
The only recommended way to upgrade between minor versions of 22.05 with
RPMs or upgrades that replace current binaries and libraries is to drain the
nodes of running jobs first.
- Fixes a number of moderate-severity issues, notably:
* Load hash plugin at slurmstepd launch time to prevent issues loading the
plugin at step completion if the Slurm installation is upgraded.
* Update nvml plugin to match the unique id format for MIG devices in new
Nvidia drivers.
* Fix multi-node step launch failure when nodes in the controller aren't in
natural order. This can happen with inconsistent node naming (such as
node15 and node052) or with dynamic nodes which can register in any order.
* job_container/tmpfs - clean up containers even when the .ns file isn't
mounted anymore.
* Wait up to PrologEpilogTimeout before shutting down slurmd to allow prolog
and epilog scripts to complete or time out. Previously, slurmd waited 120
seconds before timing out and killing prolog and epilog scripts.
-------------------------------------------------------------------
Sat Sep 24 07:34:31 UTC 2022 - Egbert Eich <eich@suse.com>

@@ -18,7 +18,7 @@
# Check file META in sources: update so_version to (API_CURRENT - API_AGE)
%define so_version 38
-%define ver 22.05.2
+%define ver 22.05.5
%define _ver _22_05
%define dl_ver %{ver}
# so-version is 0 and seems to be stable