Accepting request 1280446 from network:cluster

elevated privileges (CVE-2025-43904, bsc#1243666).
OBS-URL: https://build.opensuse.org/request/show/1280446
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=115
2025-05-27 16:43:12 +00:00 · 2025-05-27 06:01:40 +00:00 · 2025-05-26 18:16:48 +00:00 · 2025-05-26 16:02:01 +00:00 · 2025-05-26 07:30:05 +00:00
5 changed files with 275 additions and 15 deletions
--- a/slurm-24.11.3.tar.bz2
+++ b/slurm-24.11.3.tar.bz2
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:e9ed8826a26fcf7106cbf772c7142170672a422a56c96567d2c2dc2c5f369bc3
-size 7248260
--- a/slurm-24.11.5.tar.bz2
+++ b/slurm-24.11.5.tar.bz2
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:bd8d28d870676dbd00634aa0a79ffe4706907bd48c86f988f9e2c7b15d1fc0a7
+size 7261594
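The two tarball entries above are Git LFS pointer files rather than the archives themselves: each records a SHA-256 `oid` and a byte `size`. The following is a minimal sketch, assuming the tarball has been fetched separately, of checking a local file against such a pointer. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical and not part of any packaging tool.

```python
import hashlib
import os


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path, pointer_text):
    """Return True if the file at `path` has the size and SHA-256 the pointer records."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap check first: a size mismatch rules the file out without hashing.
    if os.path.getsize(path) != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Hash in 1 MiB chunks so large archives are not read into memory at once.
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected


# Pointer content copied from the slurm-24.11.5.tar.bz2 entry in this diff.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:bd8d28d870676dbd00634aa0a79ffe4706907bd48c86f988f9e2c7b15d1fc0a7
size 7261594
"""
```

Calling `matches_pointer("slurm-24.11.5.tar.bz2", POINTER)` on a locally downloaded archive (the path is an assumption) returns `True` only when both the size and the digest agree with the pointer.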
--- a/slurm.changes
+++ b/slurm.changes
@@ -1,26 +1,283 @@
+-------------------------------------------------------------------
+Mon May 26 08:17:32 UTC 2025 - Egbert Eich <eich@suse.com>
+
+- Update to version 24.11.5
+  * Fix security issue where a coordinator could add a user with
+	elevated privileges (CVE-2025-43904, bsc#1243666).
+  * Return error to `scontrol` reboot on bad nodelists.
+  * `slurmrestd` - Report an error when QOS resolution fails for
+	v0.0.40 endpoints.
+  * `slurmrestd` - Report an error when QOS resolution fails for
+	v0.0.41 endpoints.
+  * `slurmrestd` - Report an error when QOS resolution fails for
+	v0.0.42 endpoints.
+  * `data_parser/v0.0.42` - Added `+inline_enums` flag which
+	modifies the output when generating OpenAPI specification.
+	It causes enum arrays to not be defined in their own schema
+	with references (`$ref`) to them. Instead they will be dumped
+	inline.
+  * Fix binding error with `tres-bind map/mask` on partial node
+	allocations.
+  * Fix `stepmgr` enabled steps being able to request features.
+  * Reject step creation if requested feature is not available
+	in job.
+  * `slurmd` - Restrict listening for new incoming RPC requests
+	further into startup.
+  * `slurmd` - Avoid `auth/slurm` related hangs of CLI commands
+	during startup and shutdown.
+  * `slurmctld` - Restrict processing new incoming RPC requests
+	further into startup. Stop processing requests sooner during
+	shutdown.
+  * `slurmctld` - Avoid `auth/slurm` related hangs of CLI commands
+	during startup and shutdown.
+  * `slurmctld` - Avoid race condition during shutdown or
+	reconfigure that could result in a crash due to delayed
+	processing of a connection while plugins are unloaded.
+  * Fix small memleak when getting the job list from the database.
+  * Fix incorrect printing of `%` escape characters when printing
+	stdio fields for jobs.
+  * Fix padding parsing when printing stdio fields for jobs.
+  * Fix printing `%A` array job id when expanding patterns.
+  * Fix reservations causing jobs to be held for `Bad Constraints`.
+  * `switch/hpe_slingshot` - Prevent potential segfault on failed
+	curl request to the fabric manager.
+  * Fix printing incorrect array job id when expanding stdio file
+	names. The `%A` will now be substituted by the correct value.
+  * Fix printing incorrect array job id when expanding stdio file
+	names. The `%A` will now be substituted by the correct value.
+  * `switch/hpe_slingshot` - Fix VNI range not updating on slurmctld
+	restart or reconfigure.
+  * Fix steps not being created when using certain combinations of
+	`-c` and `-n` inferior to the jobs requested resources, when
+	using stepmgr and nodes are configured with
+	`CPUs == Sockets*CoresPerSocket`.
+  * Permit configuring the number of retry attempts to destroy CXI
+	service via the new destroy_retries `SwitchParameter`.
+  * Do not reset `memory.high` and `memory.swap.max` in slurmd
+	startup or reconfigure as we are never really touching this
+	in `slurmd`.
+  * Fix reconfigure failure of slurmd when it has been started
+	manually and the `CoreSpecLimits` have been removed from
+	`slurm.conf`.
+  * Set or reset CoreSpec limits when slurmd is reconfigured and
+	it was started with systemd.
+  * `switch/hpe_slingshot` - Make sure the slurmctld can free
+	step VNIs after the controller restarts or reconfigures while
+	the job is running.
+  * Fix backup `slurmctld` failure on 2nd takeover.
+- Changes from version 24.11.4
+  * `slurmctld`,`slurmrestd` - Avoid possible race condition that
+    could have caused process to crash when listener socket was
+    closed while accepting a new connection.
+  * `slurmrestd` - Avoid race condition that could have resulted
+	in address logged for a UNIX socket to be incorrect.
+  * `slurmrestd` - Fix parameters in OpenAPI specification for the
+    following endpoints to have `job_id` field:
+    ```
+    GET /slurm/v0.0.40/jobs/state/
+    GET /slurm/v0.0.41/jobs/state/
+    GET /slurm/v0.0.42/jobs/state/
+    GET /slurm/v0.0.43/jobs/state/
+    ```
+  * `slurmd` - Fix tracking of thread counts that could cause
+	incoming connections to be ignored after burst of simultaneous
+	incoming connections that trigger delayed response logic.
+  * Avoid unnecessary `SRUN_TIMEOUT` forwarding to `stepmgr`.
+  * Fix jobs being scheduled on higher weighted powered down nodes.
+  * Fix how backfill scheduler filters nodes from the available
+	nodes based on exclusive user and `mcs_label` requirements.
+  * `acct_gather_energy/{gpu,ipmi}` - Fix potential energy
+	consumption adjustment calculation underflow.
+  * `acct_gather_energy/ipmi` - Fix regression introduced in 24.05.5
+	(which introduced the new way of preserving energy measurements
+	through slurmd restarts) when `EnergyIPMICalcAdjustment=yes`.
+  * Prevent `slurmctld` deadlock in the assoc mgr.
+  * Fix memory leak when `RestrictedCoresPerGPU` is enabled.
+  * Fix preemptor jobs not entering execution due to wrong
+	calculation of accounting policy limits.
+  * Fix certain job requests that were incorrectly denied with
+	node configuration unavailable error.
+  * `slurmd` - Avoid crash when slurmd has a communications
+	failure with `slurmstepd`.
+  * Fix memory leak when parsing yaml input.
+  * Prevent `slurmctld` from showing error message about `PreemptMode=GANG`
+	being a cluster-wide option for `scontrol update part` calls
+	that don't attempt to modify partition PreemptMode.
+  * Fix setting `GANG` preemption on partition when updating
+	`PreemptMode` with `scontrol`.
+  * Fix `CoreSpec` and `MemSpec` limits not being removed
+	from previously configured slurmd.
+  * Avoid race condition that could lead to a deadlock when `slurmd`,
+	`slurmstepd`, `slurmctld`, `slurmrestd` or `sackd` have a fatal
+	event.
+  * Fix jobs using `--ntasks-per-node` and `--mem` pending
+	forever when the requested mem divided by the number of CPUs
+	surpasses the configured `MaxMemPerCPU`.
+  * `slurmd` - Fix address logged upon new incoming RPC connection
+    from `INVALID` to IP address.
+  * Fix memory leak when retrieving reservations. This affects
+	`scontrol`, `sinfo`, `sview`, and the following `slurmrestd`
+	endpoints:
+    `GET /slurm/{any_data_parser}/reservation/{reservation_name}`
+    `GET /slurm/{any_data_parser}/reservations`
+  * Log warning instead of `debugflags=conmgr` gated log when
+	deferring new incoming connections when number of active
+	connections exceeds `conmgr_max_connections`.
+  * Avoid race condition that could result in worker thread pool
+	not activating all threads at once after a reconfigure resulting
+	in lower utilization of available CPU threads until enough
+	internal activity wakes up all threads in the worker pool.
+  * Avoid theoretical race condition that could result in new
+	incoming RPC socket connections being ignored after
+	reconfigure.
+  * `slurmd` - Avoid race condition that could result in a state
+	where new incoming RPC connections will always be ignored.
+  * Add ReconfigFlags=KeepNodeStateFuture to restore saved `FUTURE`
+	node state on restart and reconfig instead of reverting to
+	`FUTURE` state. This will be made the default in 25.05.
+  * Fix case where hetjob submit would cause `slurmctld` to crash.
+  * Fix jobs using `--cpus-per-gpu` and `--mem` pending forever
+	when the requested mem divided by the number of CPUs surpasses
+	the configured `MaxMemPerCPU`.
+  * Enforce that jobs using `--mem` and several `--*-per-*` options
+	do not violate the `MaxMemPerCPU` in place.
+  * `slurmctld` - Fix use-cases of jobs incorrectly pending held
+	when `--prefer` features are not initially satisfied.
+  * `slurmctld` - Fix jobs incorrectly held when `--prefer` not
+	satisfied in some use-cases.
+  * Ensure `RestrictedCoresPerGPU` and `CoreSpecCount` don't overlap.
+- Fix backward compatibility fallout from last update.
+
 -------------------------------------------------------------------
 Thu Apr 24 12:31:16 UTC 2025 - Christian Goll <cgoll@suse.com>

- removed openmpi4-hpc dependency for test suite 
+- removed openmpi4-hpc dependency for test suite.

 -------------------------------------------------------------------
 Fri Mar  7 09:44:31 UTC 2025 - Atri Bhattacharya <badshah400@gmail.com>

 - Update to version 24.11.3:
  * Fix database cluster ID generation not being random.
-  * Fix a regression in which slurmd -G gave no output.
-  * Fix a long-standing crash in slurmctld after updating a
-    reservation with an empty nodelist.
-  * Other minor to moderate bugs.
- Sync upgrades file to relfect last updated versions.
- Pass '-DH5_USE_112_API -DDH5Oget_info_vers=1' to CFLAGS to allow
+  * Fix a regression in which `slurmd -G` gave no output.
+  * Fix a long-standing crash in `slurmctld` after updating a
+    reservation with an empty nodelist. The crash could occur
+	after restarting slurmctld, or if downing/draining a node
+	in the reservation with the `REPLACE` or `REPLACE_DOWN` flag.
+  * Avoid changing process name to "`watch`" from original daemon name.
+    This could potentially break some monitoring scripts.
+  * Avoid `slurmctld` being killed by `SIGALRM` due to race condition
+    at startup.
+  * Fix race condition in slurmrestd that resulted in "`Requested
+    data_parser plugin does not support OpenAPI plugin`" error being
+	returned for valid endpoints.
+  * Fix race between `task/cgroup` CPUset and `jobacctgather/cgroup`.
+    The first was removing the pid from `task_X` cgroup directory
+	causing memory limits to not be applied.
+  * If multiple partitions are requested, set the `SLURM_JOB_PARTITION`
+    output environment variable to the partition in which the job is
+	running for `salloc` and `srun` in order to match the documentation
+	and the behavior of `sbatch`.
+  * `srun` - Fixed wrongly constructed `SLURM_CPU_BIND` env variable
+    that could get propagated to downward srun calls in certain mpi
+    environments, causing launch failures.
+  * Don't print misleading errors for stepmgr enabled steps.
+  * `slurmrestd` - Avoid connection to slurmdbd for the following
+    endpoints:
+	```
+    GET /slurm/v0.0.41/jobs
+    GET /slurm/v0.0.41/job/{job_id}
+	```
+  * `slurmrestd` - Avoid connection to slurmdbd for the following
+    endpoints:
+	```
+    GET /slurm/v0.0.40/jobs
+    GET /slurm/v0.0.40/job/{job_id}
+	```
+  * `slurmrestd` - Fix possible memory leak when parsing arrays with
+    `data_parser/v0.0.40`.
+  * `slurmrestd` - Fix possible memory leak when parsing arrays with
+    `data_parser/v0.0.41`.
+  * `slurmrestd` - Fix possible memory leak when parsing arrays with
+    `data_parser/v0.0.42`.
+
+- Changes from version 24.11.2:
+  * Fix segfault when submitting `--test-only` jobs that can
+    preempt.
+  * Fix regression introduced in 23.11 that prevented the
+    following flags from being added to a reservation on an
+    update: `DAILY`, `HOURLY`, `WEEKLY`, `WEEKDAY`, and `WEEKEND`.
+  * Fix crash and issues evaluating job's suitability for running
+    in nodes with already suspended job(s) there.
+  * `slurmctld` will ensure that healthy nodes are not reported as
+    `UnavailableNodes` in job reason codes.
+  * Fix handling of jobs submitted to a current reservation with
+    flags `OVERLAP,FLEX` or `OVERLAP,ANY_NODES` when it overlaps nodes
+    with a future maintenance reservation. When a job submission
+    had a time limit that overlapped with the future maintenance
+    reservation, it was rejected. Now the job is accepted but
+    stays pending with the reason "`ReqNodeNotAvail, Reserved for
+    maintenance`".
+  * `pam_slurm_adopt` - avoid errors when explicitly setting some
+    arguments to the default value.
+  * Fix QOS preemption with `PreemptMode=SUSPEND`.
+  * `slurmdbd` - When changing a user's name update lineage at the
+    same time.
+  * Fix regression in 24.11 in which `burst_buffer.lua` does not
+    inherit the `SLURM_CONF` environment variable from `slurmctld` and
+    fails to run if slurm.conf is in a non-standard location.
+  * Fix memory leak in slurmctld if `select/linear` and the
+    `PreemptParameters=reclaim_licenses` options are both set in
+    `slurm.conf`.  Regression in 24.11.1.
+  * Fix running jobs, that requested multiple partitions, from
+    potentially being set to the wrong partition on restart.
+  * `switch/hpe_slingshot` - Fix compatibility with newer cxi
+    drivers, specifically when specifying `disable_rdzv_get`.
+  * Add `ABORT_ON_FATAL` environment variable to capture a backtrace
+    from any `fatal()` message.
+  * Fix printing invalid address in rate limiting log statement.
+  * `sched/backfill` - Fix node state `PLANNED` not being cleared from
+    fully allocated nodes during a backfill cycle.
+  * `select/cons_tres` - Fix future planning of jobs with
+    `bf_licenses`.
+  * Prevent redundant "`on_data returned rc: Rate limit exceeded,
+    please retry momentarily`" error message from being printed in
+    slurmctld logs.
+  * Fix loading non-default QOS on pending jobs from pre-24.11
+    state.
+  * Fix pending jobs displaying `QOS=(null)` when not explicitly
+    requesting a QOS.
+  * Fix segfault issue from job record with no `job_resrcs`.
+  * Fix failing `sacctmgr delete/modify/show` account operations
+    with `where` clauses.
+  * Fix regression in 24.11 in which Slurm daemons started
+    catching several `SIGTSTP`, `SIGTTIN` and `SIGUSR1` signals and
+    ignored them, while before they were not ignoring them. This
+    also caused slurmctld to not be able to shut down after a
+    `SIGTSTP` because slurmscriptd caught the signal and stopped
+    while slurmctld ignored it. Unify and fix these situations and
+    get back to the previous behavior for these signals.
+  * Document that `SIGQUIT` is no longer ignored by `slurmctld`,
+    `slurmdbd`, and slurmd in 24.11. As of 24.11.0rc1, `SIGQUIT` is
+    identical to `SIGINT` and `SIGTERM` for these daemons, but this
+    change was not documented.
+  * Fix not considering nodes marked for reboot without ASAP in
+    the scheduler.
+  * Remove the `boot^` state on unexpected node reboot after return
+    to service.
+  * Do not allow new jobs to start on a node which is being
+    rebooted with the flag `nextstate=resume`.
+  * Prevent lower priority job running after cancelling an ASAP
+    reboot.
+  * Fix srun jobs starting on `nextstate=resume` rebooting nodes.
+- Sync upgrades file to reflect last updated versions.
+- Pass `-DH5_USE_112_API -DDH5Oget_info_vers=1` to CFLAGS to allow
  building with hdf5 1.14 as slurm does not yet support HDF5 v114
  API.

 -------------------------------------------------------------------
 Fri Feb  7 11:51:59 UTC 2025 - Egbert Eich <eich@suse.com>

-Update to version 24.11.1:
+- Update to version 24.11.1:
  * With client commands `MIN_MEMORY` will show `mem_per_tres` if
    specified.
  * Fix errno message about bad constraint.
--- a/slurm.spec
+++ b/slurm.spec
@@ -18,8 +18,8 @@

 # Check file META in sources: update so_version to (API_CURRENT - API_AGE)
 %define so_version 42
-# Make sure to update `upgrades` as well!
-%define ver 24.11.3
+# Make sure to update `upgrades` as well if version is to be released with SLES!
+%define ver 24.11.5
 %define _ver _24_11
 %define dl_ver %{ver}
 # so-version is 0 and seems to be stable
@@ -597,6 +597,9 @@ Requires:       bzip2
 Requires:       expect
 Requires:       gcc-c++
 Requires:       libnuma-devel
+%if 0%{?sle_version} && 0%{?sle_version} < 160000
+%ts_depends:     openmpi4-gnu-hpc-devel
+%endif
 Requires:       pam
 Requires:       pdsh
 Requires:       perl-%{name} = %version
@@ -801,7 +804,7 @@ Cflags: -I\${includedir}
 Libs: -L\${libdir} -lslurm
 Description: Slurm API
 Name:           %{pname}
-Version:        24.11.2
+Version:        %{ver}
 EOF

 # Enable rotation of log files
--- a/2
+++ b/2
@@ -1,4 +1,4 @@
-24.11.3
+24.11.5
 24.11.1
 24.11.0
 24.05.4
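The changelog above places the fix for CVE-2025-43904 in 24.11.5. As a rough sketch, assuming plain dotted numeric version strings and considering only the 24.11 series covered by this changelog, a version comparison could look like the following. The function names are hypothetical, and a distribution package may carry backported patches that a bare version check cannot see.

```python
def parse_version(text):
    """Turn a dotted version such as '24.11.5' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# First 24.11.x release that lists the coordinator privilege fix in this changelog.
FIXED_IN = parse_version("24.11.5")


def has_coordinator_fix(installed):
    """Return True if a 24.11.x version is at or above 24.11.5.

    Assumption: other release series are out of scope here, so they are
    rejected instead of guessed at.
    """
    version = parse_version(installed)
    if version[:2] != FIXED_IN[:2]:
        raise ValueError("only the 24.11 series is covered by this changelog")
    return version >= FIXED_IN
```

Comparing tuples of integers avoids the pitfall of string comparison, where `"24.11.10"` would sort before `"24.11.5"`.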
Author	SHA256	Message	Date
Ana Guerrero	9eccfee1ec	Accepting request 1280446 from network:cluster elevated privileges (CVE-2025-43904, bsc#1243666). OBS-URL: https://build.opensuse.org/request/show/1280446 OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=115	2025-05-27 16:43:12 +00:00
Egbert Eich	c306a7cbea	elevated privileges (CVE-2025-43904, bsc#1243666). OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=317	2025-05-27 06:01:40 +00:00
Ana Guerrero	44be7c0cb6	Accepting request 1280317 from network:cluster OBS-URL: https://build.opensuse.org/request/show/1280317 OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=114	2025-05-26 18:16:48 +00:00
Christian Goll	23285a4663	- Update to version 24.11.5 * Fix security issue where a coordinator could add a user with elevated privileges. CVE-2025-43904. * Return error to `scontrol` reboot on bad nodelists. * `slurmrestd` - Report an error when QOS resolution fails for v0.0.40 endpoints. * `slurmrestd` - Report an error when QOS resolution fails for v0.0.41 endpoints. * `slurmrestd` - Report an error when QOS resolution fails for v0.0.42 endpoints. * `data_parser/v0.0.42` - Added `+inline_enums` flag which modifies the output when generating OpenAPI specification. It causes enum arrays to not be defined in their own schema with references (`$ref`) to them. Instead they will be dumped inline. * Fix binding error with `tres-bind map/mask` on partial node allocations. * Fix `stepmgr` enabled steps being able to request features. * Reject step creation if requested feature is not available in job. * `slurmd` - Restrict listening for new incoming RPC requests further into startup. * `slurmd` - Avoid `auth/slurm` related hangs of CLI commands during startup and shutdown. * `slurmctld` - Restrict processing new incoming RPC requests further into startup. Stop processing requests sooner during shutdown. * `slurmcltd` - Avoid auth/slurm related hangs of CLI commands during startup and shutdown. * `slurmctld` - Avoid race condition during shutdown or OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=315	2025-05-26 16:02:01 +00:00
Egbert Eich	5c21eb73a7	Add changes from version 24.11.2 to changelog; fix version in pkgconfig file. OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=314	2025-05-26 07:30:05 +00:00