5 Commits

Author SHA256 Message Date
9eccfee1ec Accepting request 1280446 from network:cluster
elevated privileges (CVE-2025-43904, bsc#1243666).

OBS-URL: https://build.opensuse.org/request/show/1280446
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=115
2025-05-27 16:43:12 +00:00
c306a7cbea elevated privileges (CVE-2025-43904, bsc#1243666).
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=317
2025-05-27 06:01:40 +00:00
44be7c0cb6 Accepting request 1280317 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/1280317
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=114
2025-05-26 18:16:48 +00:00
23285a4663 - Update to version 24.11.5
  * Fix security issue where a coordinator could add a user with
	elevated privileges. CVE-2025-43904.
  * Return error to `scontrol` reboot on bad nodelists.
  * `slurmrestd` - Report an error when QOS resolution fails for
	v0.0.40 endpoints.
  * `slurmrestd` - Report an error when QOS resolution fails for
	v0.0.41 endpoints.
  * `slurmrestd` - Report an error when QOS resolution fails for
	v0.0.42 endpoints.
  * `data_parser/v0.0.42` - Added `+inline_enums` flag which
	modifies the output when generating OpenAPI specification.
	It causes enum arrays to not be defined in their own schema
	with references (`$ref`) to them. Instead they will be dumped
	inline.
  * Fix binding error with `tres-bind map/mask` on partial node
	allocations.
  * Fix `stepmgr` enabled steps being able to request features.
  * Reject step creation if requested feature is not available
	in job.
  * `slurmd` - Restrict listening for new incoming RPC requests
	further into startup.
  * `slurmd` - Avoid `auth/slurm` related hangs of CLI commands
	during startup and shutdown.
  * `slurmctld` - Restrict processing new incoming RPC requests
	further into startup. Stop processing requests sooner during
	shutdown.
  * `slurmctld` - Avoid `auth/slurm` related hangs of CLI commands
	during startup and shutdown.
  * `slurmctld` - Avoid race condition during shutdown or

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=315
2025-05-26 16:02:01 +00:00
5c21eb73a7 Add changes from version 24.11.2 to changelog; fix version in pkgconfig file.
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=314
2025-05-26 07:30:05 +00:00
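The 24.11.5 notes describe a `+inline_enums` flag for `data_parser/v0.0.42`. With this flag, enum arrays in the generated OpenAPI specification are written inline instead of as separate schemas reached through `$ref`. The Python sketch below shows that transformation on a plain dict. It is only an illustration and is not Slurm's implementation. The schema names `job` and `job_state` are hypothetical. It also assumes that an "enum array" means an `array` schema whose `items` carry an `enum`.

```python
import copy
from typing import Any

# Local schema references look like "#/components/schemas/<name>".
PREFIX = "#/components/schemas/"


def is_enum_array(schema: Any) -> bool:
    """True for an array schema whose items are restricted by an enum."""
    return (
        isinstance(schema, dict)
        and schema.get("type") == "array"
        and "enum" in schema.get("items", {})
    )


def inline_enum_arrays(spec: dict) -> dict:
    """Return a copy of `spec` with enum-array schemas inlined.

    Each local $ref that points at an enum-array schema is replaced by a
    copy of that schema, and the standalone definitions are dropped.
    """
    schemas = spec["components"]["schemas"]
    enums = {name: s for name, s in schemas.items() if is_enum_array(s)}

    def walk(node: Any) -> Any:
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith(PREFIX):
                name = ref[len(PREFIX):]
                if name in enums:
                    return copy.deepcopy(enums[name])
            return {key: walk(value) for key, value in node.items()}
        if isinstance(node, list):
            return [walk(item) for item in node]
        return node

    result = walk(spec)  # walk() builds new dicts/lists, so `spec` is untouched
    for name in enums:
        del result["components"]["schemas"][name]
    return result


# Hypothetical spec: a "job" object whose "state" field references an enum array.
spec = {
    "components": {
        "schemas": {
            "job_state": {
                "type": "array",
                "items": {"type": "string", "enum": ["PENDING", "RUNNING"]},
            },
            "job": {
                "type": "object",
                "properties": {"state": {"$ref": "#/components/schemas/job_state"}},
            },
        }
    }
}

inlined = inline_enum_arrays(spec)
print(inlined["components"]["schemas"]["job"]["properties"]["state"])
```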
5 changed files with 275 additions and 15 deletions

@@ -1,3 +0,0 @@
version https://git-lfs.github.com/spec/v1
oid sha256:e9ed8826a26fcf7106cbf772c7142170672a422a56c96567d2c2dc2c5f369bc3
size 7248260

slurm-24.11.5.tar.bz2 Normal file
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:bd8d28d870676dbd00634aa0a79ffe4706907bd48c86f988f9e2c7b15d1fc0a7
size 7261594

@@ -1,26 +1,283 @@
-------------------------------------------------------------------
Mon May 26 08:17:32 UTC 2025 - Egbert Eich <eich@suse.com>
- Update to version 24.11.5
* Fix security issue where a coordinator could add a user with
elevated privileges (CVE-2025-43904, bsc#1243666).
* Return error to `scontrol` reboot on bad nodelists.
* `slurmrestd` - Report an error when QOS resolution fails for
v0.0.40 endpoints.
* `slurmrestd` - Report an error when QOS resolution fails for
v0.0.41 endpoints.
* `slurmrestd` - Report an error when QOS resolution fails for
v0.0.42 endpoints.
* `data_parser/v0.0.42` - Added `+inline_enums` flag which
modifies the output when generating OpenAPI specification.
It causes enum arrays to not be defined in their own schema
with references (`$ref`) to them. Instead they will be dumped
inline.
* Fix binding error with `tres-bind map/mask` on partial node
allocations.
* Fix `stepmgr` enabled steps being able to request features.
* Reject step creation if requested feature is not available
in job.
* `slurmd` - Restrict listening for new incoming RPC requests
further into startup.
* `slurmd` - Avoid `auth/slurm` related hangs of CLI commands
during startup and shutdown.
* `slurmctld` - Restrict processing new incoming RPC requests
further into startup. Stop processing requests sooner during
shutdown.
* `slurmctld` - Avoid `auth/slurm` related hangs of CLI commands
during startup and shutdown.
* `slurmctld` - Avoid race condition during shutdown or
reconfigure that could result in a crash due to delayed
processing of a connection while plugins are unloaded.
* Fix small memleak when getting the job list from the database.
* Fix incorrect printing of `%` escape characters when printing
stdio fields for jobs.
* Fix padding parsing when printing stdio fields for jobs.
* Fix printing `%A` array job id when expanding patterns.
* Fix reservations causing jobs to be held for `Bad Constraints`.
* `switch/hpe_slingshot` - Prevent potential segfault on failed
curl request to the fabric manager.
* Fix printing incorrect array job id when expanding stdio file
names. The `%A` will now be substituted by the correct value.
* `switch/hpe_slingshot` - Fix VNI range not updating on `slurmctld`
restart or reconfigure.
* Fix steps not being created when using certain combinations of
`-c` and `-n` lower than the job's requested resources, when
using `stepmgr` and nodes are configured with
`CPUs == Sockets*CoresPerSocket`.
* Permit configuring the number of retry attempts to destroy the CXI
service via the new `destroy_retries` option of `SwitchParameters`.
* Do not reset `memory.high` and `memory.swap.max` on `slurmd`
startup or reconfigure, as `slurmd` never actually manages
these values.
* Fix reconfigure failure of slurmd when it has been started
manually and the `CoreSpecLimits` have been removed from
`slurm.conf`.
* Set or reset `CoreSpec` limits when `slurmd` is reconfigured and
was started with systemd.
* `switch/hpe_slingshot` - Make sure `slurmctld` can free
step VNIs after the controller restarts or reconfigures while
the job is running.
* Fix backup `slurmctld` failure on 2nd takeover.
- Changes from version 24.11.4:
* `slurmctld`, `slurmrestd` - Avoid a possible race condition that
could have caused the process to crash when the listener socket
was closed while accepting a new connection.
* `slurmrestd` - Avoid race condition that could have resulted
in the address logged for a UNIX socket being incorrect.
* `slurmrestd` - Fix parameters in OpenAPI specification for the
following endpoints to have `job_id` field:
```
GET /slurm/v0.0.40/jobs/state/
GET /slurm/v0.0.41/jobs/state/
GET /slurm/v0.0.42/jobs/state/
GET /slurm/v0.0.43/jobs/state/
```
* `slurmd` - Fix tracking of thread counts that could cause
incoming connections to be ignored after a burst of simultaneous
incoming connections that trigger delayed response logic.
* Avoid unnecessary `SRUN_TIMEOUT` forwarding to `stepmgr`.
* Fix jobs being scheduled on powered-down nodes that have a higher weight.
* Fix how backfill scheduler filters nodes from the available
nodes based on exclusive user and `mcs_label` requirements.
* `acct_gather_energy/{gpu,ipmi}` - Fix potential energy
consumption adjustment calculation underflow.
* `acct_gather_energy/ipmi` - Fix regression introduced in 24.05.5
(which introduced the new way of preserving energy measurements
through slurmd restarts) when `EnergyIPMICalcAdjustment=yes`.
* Prevent `slurmctld` deadlock in the assoc mgr.
* Fix memory leak when `RestrictedCoresPerGPU` is enabled.
* Fix preemptor jobs not entering execution due to wrong
calculation of accounting policy limits.
* Fix certain job requests that were incorrectly denied with a
node configuration unavailable error.
* `slurmd` - Avoid a crash when `slurmd` has a communications
failure with `slurmstepd`.
* Fix memory leak when parsing YAML input.
* Prevent `slurmctld` from showing an error message about `PreemptMode=GANG`
being a cluster-wide option for `scontrol update part` calls
that don't attempt to modify the partition `PreemptMode`.
* Fix setting `GANG` preemption on partition when updating
`PreemptMode` with `scontrol`.
* Fix `CoreSpec` and `MemSpec` limits not being removed
from a previously configured `slurmd`.
* Avoid race condition that could lead to a deadlock when `slurmd`,
`slurmstepd`, `slurmctld`, `slurmrestd` or `sackd` have a fatal
event.
* Fix jobs using `--ntasks-per-node` and `--mem` staying pending
forever when the requested memory divided by the number of CPUs
exceeds the configured `MaxMemPerCPU`.
* `slurmd` - Fix address logged upon new incoming RPC connection
from `INVALID` to IP address.
* Fix memory leak when retrieving reservations. This affects
`scontrol`, `sinfo`, `sview`, and the following `slurmrestd`
endpoints:
`GET /slurm/{any_data_parser}/reservation/{reservation_name}`
`GET /slurm/{any_data_parser}/reservations`
* Log a warning instead of a `debugflags=conmgr` gated log when
deferring new incoming connections because the number of active
connections exceeds `conmgr_max_connections`.
* Avoid race condition that could result in worker thread pool
not activating all threads at once after a reconfigure resulting
in lower utilization of available CPU threads until enough
internal activity wakes up all threads in the worker pool.
* Avoid theoretical race condition that could result in new
incoming RPC socket connections being ignored after reconfigure.
* `slurmd` - Avoid race condition that could result in a state
where new incoming RPC connections will always be ignored.
* Add `ReconfigFlags=KeepNodeStateFuture` to restore saved `FUTURE`
node state on restart and reconfig instead of reverting to
`FUTURE` state. This will be made the default in 25.05.
* Fix case where hetjob submit would cause `slurmctld` to crash.
* Fix jobs using `--cpus-per-gpu` and `--mem` staying pending forever
when the requested memory divided by the number of CPUs exceeds
the configured `MaxMemPerCPU`.
* Enforce that jobs using `--mem` and several `--*-per-*` options
do not violate the `MaxMemPerCPU` in place.
* `slurmctld` - Fix use-cases of jobs incorrectly pending held
when `--prefer` features are not initially satisfied.
* `slurmctld` - Fix jobs incorrectly held when `--prefer` not
satisfied in some use-cases.
* Ensure `RestrictedCoresPerGPU` and `CoreSpecCount` don't overlap.
- Fix backward compatibility fallout from last update.
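Several 24.11.5 fixes concern how `%` patterns such as `%A` are expanded in job stdio file names. The Python sketch below shows the output that correct expansion should give. It handles only a subset of the `sbatch` filename specifiers (`%%`, `%A`, `%a`, `%j`, `%x`, `%u`) plus zero-padding widths. It is not Slurm's parser. The function `expand_stdio_pattern` and its parameters are hypothetical. The cap of 10 on the pad width is an assumption based on the limit in the `sbatch` documentation.

```python
import re

# "%%" plus a subset of specifiers, each with an optional zero-pad width.
_SPEC = re.compile(r"%(\d*)([%Aajxu])")
_NUMERIC = "Aaj"


def expand_stdio_pattern(pattern: str, *, array_job_id: int, array_task_id: int,
                         job_id: int, job_name: str, user: str) -> str:
    """Expand a subset of sbatch-style stdio filename patterns (hypothetical helper).

    %A -> array master job ID, %a -> array task index, %j -> job ID,
    %x -> job name, %u -> user name, %% -> a literal '%'.
    A width such as %4a zero-pads numeric fields (capped at 10 here);
    the width is ignored for non-numeric fields.
    """
    values = {
        "A": array_job_id,
        "a": array_task_id,
        "j": job_id,
        "x": job_name,
        "u": user,
    }

    def repl(match: re.Match) -> str:
        width, spec = match.group(1), match.group(2)
        if spec == "%":
            return "%"
        text = str(values[spec])
        if spec in _NUMERIC and width:
            text = text.zfill(min(int(width), 10))
        return text

    return _SPEC.sub(repl, pattern)


# Hypothetical array task: master job 1000, task index 7, running as job 1007.
ids = dict(array_job_id=1000, array_task_id=7, job_id=1007,
           job_name="sim", user="alice")
print(expand_stdio_pattern("slurm-%A_%a.out", **ids))  # slurm-1000_7.out
print(expand_stdio_pattern("%x-%u-%4a.log", **ids))    # sim-alice-0007.log
```

`%A` always comes from the array's master job ID, never from the per-task job ID. Keeping that distinction is the behaviour the `%A` fixes restore.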
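The two `MaxMemPerCPU` fixes in 24.11.4 cover `--ntasks-per-node` or `--cpus-per-gpu` combined with `--mem`. Both depend on one calculation: memory requested per node divided by the CPUs allocated per node. The sketch below assumes that Slurm handles an excess by raising the job's CPU count rather than rejecting the job. The `slurm.conf` documentation for `MaxMemPerCPU` describes this behaviour. The function name is hypothetical, and the real scheduler applies further limits on top of this.

```python
def cpus_needed_per_node(mem_per_node_mb: int, cpus_per_node: int,
                         max_mem_per_cpu_mb: int) -> int:
    """Smallest CPU count per node keeping memory per CPU within MaxMemPerCPU.

    If mem / CPUs already fits, the requested CPU count is kept; otherwise
    the count is raised to ceil(mem / MaxMemPerCPU), using integer math.
    """
    if mem_per_node_mb <= cpus_per_node * max_mem_per_cpu_mb:
        return cpus_per_node
    return (mem_per_node_mb + max_mem_per_cpu_mb - 1) // max_mem_per_cpu_mb


# Hypothetical request: --ntasks-per-node=2 (one CPU each) and --mem=16000
# with MaxMemPerCPU=4000, i.e. 8000 MB per CPU, which exceeds the limit.
print(cpus_needed_per_node(16000, 2, 4000))  # 4
```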
-------------------------------------------------------------------
Thu Apr 24 12:31:16 UTC 2025 - Christian Goll <cgoll@suse.com>
- removed openmpi4-hpc dependency for test suite.
-------------------------------------------------------------------
Fri Mar 7 09:44:31 UTC 2025 - Atri Bhattacharya <badshah400@gmail.com>
- Update to version 24.11.3:
* Fix database cluster ID generation not being random.
* Fix a regression in which `slurmd -G` gave no output.
* Fix a long-standing crash in `slurmctld` after updating a
reservation with an empty nodelist. The crash could occur
after restarting slurmctld, or if downing/draining a node
in the reservation with the `REPLACE` or `REPLACE_DOWN` flag.
* Avoid changing the process name to "`watch`" from the original daemon name.
This could potentially break some monitoring scripts.
* Avoid `slurmctld` being killed by `SIGALRM` due to race condition
at startup.
* Fix race condition in slurmrestd that resulted in "`Requested
data_parser plugin does not support OpenAPI plugin`" error being
returned for valid endpoints.
* Fix race between `task/cgroup` CPUset and `jobacctgather/cgroup`.
The former was removing the PID from the `task_X` cgroup directory,
causing memory limits not to be applied.
* If multiple partitions are requested, set the `SLURM_JOB_PARTITION`
output environment variable to the partition in which the job is
running for `salloc` and `srun` in order to match the documentation
and the behavior of `sbatch`.
* `srun` - Fix wrongly constructed `SLURM_CPU_BIND` env variable
that could get propagated to downward `srun` calls in certain MPI
environments, causing launch failures.
* Don't print misleading errors for stepmgr enabled steps.
* `slurmrestd` - Avoid connection to slurmdbd for the following
endpoints:
```
GET /slurm/v0.0.41/jobs
GET /slurm/v0.0.41/job/{job_id}
```
* `slurmrestd` - Avoid connection to slurmdbd for the following
endpoints:
```
GET /slurm/v0.0.40/jobs
GET /slurm/v0.0.40/job/{job_id}
```
* `slurmrestd` - Fix possible memory leak when parsing arrays with
`data_parser/v0.0.40`.
* `slurmrestd` - Fix possible memory leak when parsing arrays with
`data_parser/v0.0.41`.
* `slurmrestd` - Fix possible memory leak when parsing arrays with
`data_parser/v0.0.42`.
- Changes from version 24.11.2:
* Fix segfault when submitting `--test-only` jobs that can
preempt.
* Fix regression introduced in 23.11 that prevented the
following flags from being added to a reservation on an
update: `DAILY`, `HOURLY`, `WEEKLY`, `WEEKDAY`, and `WEEKEND`.
* Fix crash and issues evaluating a job's suitability for running
on nodes that already have suspended job(s).
* `slurmctld` will ensure that healthy nodes are not reported as
`UnavailableNodes` in job reason codes.
* Fix handling of jobs submitted to a current reservation with
flags `OVERLAP,FLEX` or `OVERLAP,ANY_NODES` when it overlaps nodes
with a future maintenance reservation. When a job submission
had a time limit that overlapped with the future maintenance
reservation, it was rejected. Now the job is accepted but
stays pending with the reason "`ReqNodeNotAvail, Reserved for
maintenance`".
* `pam_slurm_adopt` - avoid errors when explicitly setting some
arguments to the default value.
* Fix QOS preemption with `PreemptMode=SUSPEND`.
* `slurmdbd` - When changing a user's name, update lineage at the
same time.
* Fix regression in 24.11 in which `burst_buffer.lua` does not
inherit the `SLURM_CONF` environment variable from `slurmctld` and
fails to run if `slurm.conf` is in a non-standard location.
* Fix memory leak in slurmctld if `select/linear` and the
`PreemptParameters=reclaim_licenses` options are both set in
`slurm.conf`. Regression in 24.11.1.
* Fix running jobs that requested multiple partitions potentially
being set to the wrong partition on restart.
* `switch/hpe_slingshot` - Fix compatibility with newer cxi
drivers, specifically when specifying `disable_rdzv_get`.
* Add `ABORT_ON_FATAL` environment variable to capture a backtrace
from any `fatal()` message.
* Fix printing invalid address in rate limiting log statement.
* `sched/backfill` - Fix node state `PLANNED` not being cleared from
fully allocated nodes during a backfill cycle.
* `select/cons_tres` - Fix future planning of jobs with
`bf_licenses`.
* Prevent redundant "`on_data returned rc: Rate limit exceeded,
please retry momentarily`" error message from being printed in
slurmctld logs.
* Fix loading non-default QOS on pending jobs from pre-24.11
state.
* Fix pending jobs displaying `QOS=(null)` when not explicitly
requesting a QOS.
* Fix segfault issue from job record with no `job_resrcs`.
* Fix failing `sacctmgr delete/modify/show` account operations
with `where` clauses.
* Fix regression in 24.11 in which Slurm daemons started
catching several `SIGTSTP`, `SIGTTIN` and `SIGUSR1` signals and
ignoring them, whereas before they did not ignore them. This
also caused `slurmctld` to be unable to shut down after a
`SIGTSTP` because `slurmscriptd` caught the signal and stopped
while `slurmctld` ignored it. Unify and fix these situations and
restore the previous behavior for these signals.
* Document that `SIGQUIT` is no longer ignored by `slurmctld`,
`slurmdbd`, and `slurmd` in 24.11. As of 24.11.0rc1, `SIGQUIT` is
identical to `SIGINT` and `SIGTERM` for these daemons, but this
change was not documented.
* Fix not considering nodes marked for reboot without ASAP in
the scheduler.
* Remove the `boot^` state on unexpected node reboot after return
to service.
* Do not allow new jobs to start on a node which is being
rebooted with the flag `nextstate=resume`.
* Prevent a lower priority job from running after cancelling an ASAP
reboot.
* Fix `srun` jobs starting on `nextstate=resume` rebooting nodes.
- Sync upgrades file to reflect last updated versions.
- Pass `-DH5_USE_112_API -DDH5Oget_info_vers=1` to CFLAGS to allow
building with HDF5 1.14, as Slurm does not yet support the HDF5 1.14
API.
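The 24.11.2 fix for jobs in `OVERLAP,FLEX` or `OVERLAP,ANY_NODES` reservations depends on one question: does the job's time window intersect a future maintenance reservation on the same nodes? The Python sketch below models only that decision. It assumes the job would start immediately and that the node sets already overlap, while the real scheduler checks far more. The function names and times are hypothetical.

```python
from datetime import datetime, timedelta


def windows_overlap(a_start: datetime, a_end: datetime,
                    b_start: datetime, b_end: datetime) -> bool:
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) intersect."""
    return a_start < b_end and b_start < a_end


def submission_outcome(start: datetime, time_limit: timedelta,
                       maint_start: datetime, maint_end: datetime) -> str:
    """Simplified post-24.11.2 outcome for a job in an overlapping reservation."""
    if windows_overlap(start, start + time_limit, maint_start, maint_end):
        # Before the fix such a submission was rejected; now it is accepted but waits.
        return "PENDING: ReqNodeNotAvail, Reserved for maintenance"
    return "ELIGIBLE"


maint_start = datetime(2025, 3, 7, 10, 0)
maint_end = datetime(2025, 3, 7, 12, 0)
now = datetime(2025, 3, 7, 8, 0)
print(submission_outcome(now, timedelta(hours=4), maint_start, maint_end))
print(submission_outcome(now, timedelta(hours=1), maint_start, maint_end))
```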
-------------------------------------------------------------------
Fri Feb 7 11:51:59 UTC 2025 - Egbert Eich <eich@suse.com>
- Update to version 24.11.1:
* With client commands `MIN_MEMORY` will show `mem_per_tres` if
specified.
* Fix errno message about bad constraint.

@@ -18,8 +18,8 @@
# Check file META in sources: update so_version to (API_CURRENT - API_AGE)
%define so_version 42
-# Make sure to update `upgrades` as well!
-%define ver 24.11.3
+# Make sure to update `upgrades` as well if version is to be released with SLES!
+%define ver 24.11.5
%define _ver _24_11
%define dl_ver %{ver}
# so-version is 0 and seems to be stable
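The spec comment gives the rule `so_version = API_CURRENT - API_AGE`, with both values taken from the `META` file in the sources. A small Python helper can do the subtraction so the value is not worked out by hand. It assumes `META` uses `KEY: value` lines. The helper name and the sample values are hypothetical, and the values were picked to agree with `%define so_version 42`.

```python
import re


def so_version_from_meta(meta_text: str) -> int:
    """Compute so_version = API_CURRENT - API_AGE from META-style text.

    Assumes lines of the form "API_CURRENT: <int>" (leading blanks allowed).
    """
    pairs = re.findall(r"^[ \t]*(API_\w+):[ \t]*(\d+)[ \t]*$", meta_text, re.MULTILINE)
    fields = dict(pairs)
    try:
        return int(fields["API_CURRENT"]) - int(fields["API_AGE"])
    except KeyError as missing:
        raise ValueError(f"META text lacks {missing}") from None


# Hypothetical META excerpt; values chosen to match "%define so_version 42".
meta = """\
  API_CURRENT:\t42
  API_AGE:\t0
  API_REVISION:\t0
"""
print(so_version_from_meta(meta))  # 42
```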
@@ -597,6 +597,9 @@ Requires: bzip2
Requires: expect
Requires: gcc-c++
Requires: libnuma-devel
+%if 0%{?sle_version} && 0%{?sle_version} < 160000
+%ts_depends: openmpi4-gnu-hpc-devel
+%endif
Requires: pam
Requires: pdsh
Requires: perl-%{name} = %version
@@ -801,7 +804,7 @@ Cflags: -I\${includedir}
Libs: -L\${libdir} -lslurm
Description: Slurm API
Name: %{pname}
-Version: 24.11.2
+Version: %{ver}
EOF
# Enable rotation of log files
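The third spec hunk replaces the hard-coded `Version: 24.11.2` in the generated pkg-config file with `%{ver}`, so the `.pc` file can no longer drift from the package version. The Python sketch below applies the same single-source idea. It assumes that only the fields visible in the hunk matter. `render_slurm_pc` and `pc_version` are hypothetical helpers, and `slurm` stands in for `%{pname}`.

```python
def render_slurm_pc(name: str, version: str) -> str:
    """Render the pkg-config fields visible in the spec hunk (hypothetical helper).

    The prefix/libdir/includedir variable definitions that precede these
    fields in the real heredoc are not reproduced here.
    """
    return "\n".join([
        "Cflags: -I${includedir}",
        "Libs: -L${libdir} -lslurm",
        "Description: Slurm API",
        f"Name: {name}",
        f"Version: {version}",  # derived from the single version source
    ]) + "\n"


def pc_version(pc_text: str) -> str:
    """Return the value of the Version: field from pkg-config text."""
    for line in pc_text.splitlines():
        if line.startswith("Version:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no Version: field")


pc = render_slurm_pc("slurm", "24.11.5")
print(pc, end="")
print(pc_version(pc))  # 24.11.5
```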

@@ -1,4 +1,4 @@
-24.11.3
+24.11.5
24.11.1
24.11.0
24.05.4
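The `upgrades` file lists versions newest first. The spec comment asks for it to be updated together with `%define ver`, and in this change both move to 24.11.5. Below is a hypothetical Python consistency check. It assumes the first `upgrades` entry should equal `%define ver` and that entries are plain dotted numeric versions in descending order.

```python
import re


def spec_version(spec_text: str) -> str:
    """Return the value of the '%define ver' line in a spec file."""
    match = re.search(r"^%define[ \t]+ver[ \t]+(\S+)[ \t]*$", spec_text, re.MULTILINE)
    if match is None:
        raise ValueError("no '%define ver' line found")
    return match.group(1)


def version_key(version: str) -> tuple[int, ...]:
    # "24.05.4" -> (24, 5, 4), so versions compare numerically.
    return tuple(int(part) for part in version.split("."))


def check_upgrades(spec_text: str, upgrades_text: str) -> list[str]:
    """List problems found; an empty list means the two files agree."""
    problems = []
    ver = spec_version(spec_text)
    entries = [line.strip() for line in upgrades_text.splitlines() if line.strip()]
    if not entries or entries[0] != ver:
        first = entries[0] if entries else None
        problems.append(f"upgrades starts with {first!r}, spec has {ver!r}")
    keys = [version_key(entry) for entry in entries]
    if keys != sorted(keys, reverse=True):
        problems.append("upgrades entries are not in descending order")
    return problems


spec = "%define ver 24.11.5\n%define _ver _24_11\n%define dl_ver %{ver}\n"
print(check_upgrades(spec, "24.11.5\n24.11.1\n24.11.0\n24.05.4\n"))  # []
print(check_upgrades(spec, "24.11.3\n24.11.1\n24.11.0\n24.05.4\n"))
```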