- Update to version 24.05.3

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=295
Egbert Eich 2024-10-15 06:51:09 +00:00 committed by Git OBS Bridge
parent fc209e050f
commit b2f6e848a1
4 changed files with 629 additions and 302 deletions


@ -1,3 +0,0 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a6d3e95f2bbda3c9567060efc3d7090ad8eac257fa3578798c89321957946e49
size 7117445

slurm-24.05.3.tar.bz2 Normal file

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b0b40513e9b6ae867ddb95d60b950bcb980c15b735b5d0dea37a9a00cc64ae24
size 7189600


@ -1,312 +1,637 @@
-------------------------------------------------------------------
Mon Oct 14 10:40:10 UTC 2024 - Egbert Eich <eich@suse.com>
- Update to version 24.05.3
* `data_parser/v0.0.40` - Added field descriptions.
* `slurmrestd` - Avoid creating new slurmdbd connection per request
to `* /slurm/slurmctld/*/*` endpoints.
* Fix compilation issue with `switch/hpe_slingshot` plugin.
* Fix gres per task allocation with threads-per-core.
* `data_parser/v0.0.41` - Added field descriptions.
* `slurmrestd` - Change back generated OpenAPI schema for
`DELETE /slurm/v0.0.40/jobs/` to `RequestBody` instead of using
parameters for request. `slurmrestd` will continue to accept endpoint
requests via `RequestBody` or HTTP query.
* `topology/tree` - Fix issues with switch distance optimization.
* Fix potential segfault of secondary `slurmctld` when falling back
to the primary when running with a `JobComp` plugin.
* Enable `--json`/`--yaml=v0.0.39` options on client commands to
dump data using `data_parser/v0.0.39` instead of outputting nothing.
* `switch/hpe_slingshot` - Fix issue that could result in a 0 length
state file.
* Fix unnecessary message protocol downgrade for unregistered nodes.
* Fix unnecessarily packing alias addrs when terminating jobs with
a mix of non-cloud/dynamic nodes and powered down cloud/dynamic
nodes.
* `accounting_storage/mysql` - Fix issue when deleting a qos that
could remove too many commas from the qos and/or delta_qos fields
of the assoc table.
* `slurmctld` - Fix memory leak when using RestrictedCoresPerGPU.
* Fix allowing access to reservations without `MaxStartDelay` set.
* Fix regression introduced in 24.05.0rc1 breaking
`srun --send-libs` parsing.
* Fix slurmd vsize memory leak when using job submission/allocation
commands that implicitly or explicitly use --get-user-env.
* `slurmd` - Fix node going into invalid state when using
`CPUSpecList` and setting CPUs to the # of cores on a
multithreaded node.
* Fix reboot asap nodes being considered in backfill after a restart.
* Fix `--clusters`/`-M` queries for clusters outside of a
federation when `fed_display` is configured.
* Fix `scontrol` allowing updating job with bad cpus-per-task value.
* `sattach` - Fix regression from 24.05.2 security fix leading to
crash.
* `mpi/pmix` - Fix assertion when built under `--enable-debug`.
- Changes from Slurm 24.05.2
* Fix energy gathering rpc counter underflow in
`_rpc_acct_gather_energy` when more than 10 threads try to get
energy at the same time. This prevented any step from getting
energy from slurmd until slurmd was restarted, losing energy
accounting metrics in the node.
* `accounting_storage/mysql` - Fix issue where new user with `wckey`
did not have a default wckey sent to the slurmctld.
* `slurmrestd` - Prevent slurmrestd segfault when handling the
following endpoints when none of the optional parameters are
specified:
`DELETE /slurm/v0.0.40/jobs`
`DELETE /slurm/v0.0.41/jobs`
`GET /slurm/v0.0.40/shares`
`GET /slurm/v0.0.41/shares`
`GET /slurmdb/v0.0.40/instance`
`GET /slurmdb/v0.0.41/instance`
`GET /slurmdb/v0.0.40/instances`
`GET /slurmdb/v0.0.41/instances`
`POST /slurm/v0.0.40/job/{job_id}`
`POST /slurm/v0.0.41/job/{job_id}`
* Fix IPMI energy gathering when no IPMIPowerSensors are specified
in `acct_gather.conf`. This situation resulted in an accounted
energy of 0 for job steps.
* Fix a minor memory leak in slurmctld when updating a job dependency.
* `scontrol`,`squeue` - Fix regression that caused incorrect values
for multisocket nodes at `.jobs[].job_resources.nodes.allocation`
for `scontrol show jobs --(json|yaml)` and `squeue --(json|yaml)`.
* `slurmrestd` - Fix regression that caused incorrect values for
multisocket nodes at `.jobs[].job_resources.nodes.allocation` to
be dumped with endpoints:
`GET /slurm/v0.0.41/job/{job_id}`
`GET /slurm/v0.0.41/jobs`
* `jobcomp/filetxt` - Fix truncation of job record lines > 1024
characters.
* `switch/hpe_slingshot` - Drain node on failure to delete CXI
services.
* Fix a performance regression from 23.11.0 in cpu frequency
handling when no `CpuFreqDef` is defined.
* Fix one-task-per-sharing not working across multiple nodes.
* Fix inconsistent number of cpus when creating a reservation
using the TRESPerNode option.
* `data_parser/v0.0.40+` - Fix job state parsing which could
break filtering.
* Prevent `cpus-per-task` from being modified in jobs where a `-c`
value has been explicitly specified and the requested memory
constraints implicitly increase the number of CPUs to allocate.
* `slurmrestd` - Fix regression where args `-s v0.0.39,dbv0.0.39`
and `-d v0.0.39` would result in `GET /openapi/v3` not
registering as a valid possible query resulting in 404 errors.
* `slurmrestd` - Fix memory leak for dbv0.0.39 jobs query which
occurred if the query parameters specified account, association,
cluster, constraints, format, groups, job_name, partition, qos,
reason, reservation, state, users, or wckey. This affects the
following endpoints:
`GET /slurmdb/v0.0.39/jobs`
* `slurmrestd` - In the case the slurmdbd does not respond to a
persistent connection init message, prevent the closed fd from
being used, and instead emit an error or warning depending on
if the connection was required.
* Fix 24.05.0 regression that caused the slurmdbd not to send back
an error message if there is an error initializing a persistent
connection.
* Reduce latency of forwarded x11 packets.
* Add `curr_dependency` (representing the current dependency of
the job) and `orig_dependency` (representing the original requested
dependency of the job) fields to the job record in
`job_submit.lua` (for job update) and `jobcomp.lua`.
* Fix potential segfault of slurmctld configured with
`SlurmctldParameters=enable_rpc_queue` from happening on
reconfigure.
* Fix potential segfault of slurmctld on its shutdown when rate
limiting is enabled.
* `slurmrestd` - Fix missing job environment for `SLURM_JOB_NAME`,
`SLURM_OPEN_MODE`, `SLURM_JOB_DEPENDENCY`, `SLURM_PROFILE`,
`SLURM_ACCTG_FREQ`, `SLURM_NETWORK` and `SLURM_CPU_FREQ_REQ` to
match sbatch.
* Fix GRES environment variable indices being incorrect when only
using a subset of all GPUs on a node and the
`--gres-flags=allow-task-sharing` option.
* Prevent `scontrol` from segfaulting when requesting scontrol
show reservation `--json` or `--yaml` if there is an error
retrieving reservations from the `slurmctld`.
* `switch/hpe_slingshot` - Fix security issue around managing VNI
access. CVE-2024-42511.
* `switch/nvidia_imex` - Fix security issue managing IMEX channel
access. CVE-2024-42511.
* `switch/nvidia_imex` - Allow for compatibility with
`job_container/tmpfs`.
- Changes in Slurm 24.05.1
* Fix `slurmctld` and `slurmdbd` potentially stopping instead of
performing a logrotate when receiving `SIGUSR2` when using
`auth/slurm`.
* `switch/hpe_slingshot` - Fix slurmctld crash when upgrading
from 23.02.
* Fix "Could not find group" errors from `validate_group()` when
using `AllowGroups` with large `/etc/group` files.
* Add `AccountingStoreFlags=no_stdio` which, when set, avoids
recording the stdio paths of the job.
* `slurmrestd` - Prevent a slurmrestd segfault when parsing the
`crontab` field, which was never usable. Now it explicitly
ignores the value and emits a warning if it is used for the
following endpoints:
`POST /slurm/v0.0.39/job/{job_id}`
`POST /slurm/v0.0.39/job/submit`
`POST /slurm/v0.0.40/job/{job_id}`
`POST /slurm/v0.0.40/job/submit`
`POST /slurm/v0.0.41/job/{job_id}`
`POST /slurm/v0.0.41/job/submit`
`POST /slurm/v0.0.41/job/allocate`
* `mpi/pmi2` - Fix communication issue leading to task launch
failure with "`invalid kvs seq from node`".
* Fix getting user environment when using sbatch with
`--get-user-env` or `--export=` when there is a user profile
script that reads `/proc`.
* Prevent slurmd from crashing if `acct_gather_energy/gpu` is
configured but `GresTypes` is not configured.
* Do not log the following errors when `AcctGatherEnergyType`
plugins are used but a node does not have or cannot find sensors:
"`error: _get_joules_task: can't get info from slurmd`"
"`error: slurm_get_node_energy: Zero Bytes were transmitted or
received`"
However, the following error will continue to be logged:
"`error: Can't get energy data. No power sensors are available.
Try later`"
* `sbatch`, `srun` - Set `SLURM_NETWORK` environment variable if
`--network` is set.
* Fix cloud nodes not being able to forward to nodes that restarted
with new IP addresses.
* Fix cwd not being set correctly when running a SPANK plugin with a
`spank_user_init()` hook and the new "`contain_spank`" option set.
* `slurmctld` - Avoid deadlock during shutdown when `auth/slurm`
is active.
* Fix segfault in `slurmctld` with `topology/block`.
* `sacct` - Fix printing of job group for job steps.
* `scrun` - Log when an invalid environment variable causes the
job submission to be rejected.
* `accounting_storage/mysql` - Fix problem where listing or
modifying an association when specifying a qos list could hang
or take a very long time.
* `gpu/nvml` - Fix `gpuutil/gpumem` only tracking last GPU in step.
Now, `gpuutil/gpumem` will record sums of all GPUs in the step.
* Fix error in `scrontab` jobs when using
`slurm.conf:PropagatePrioProcess=1`.
* Fix `slurmctld` crash on a batch job submission with
`--nodes 0,...`.
* Fix dynamic IP address fanout forwarding when using `auth/slurm`.
* Restrict listening sockets in the `mpi/pmix` plugin and `sattach`
to the `SrunPortRange`.
* `slurmrestd` - Limit mime types returned from query to
`GET /openapi/v3` to only return one mime type per serializer
plugin to fix issues with OpenAPI client generators that are
unable to handle multiple mime type aliases.
* Fix many commands possibly reporting an "`Unexpected Message
Received`" when in reality the connection timed out.
* Prevent slurmctld from starting if there is not a json
serializer present and the `extra_constraints` feature is enabled.
* Fix heterogeneous job components not being signaled with
`scancel --ctld` and `DELETE slurm/v0.0.40/jobs` if the job ids
are not explicitly given, the heterogeneous job components match
the given filters, and the heterogeneous job leader does not
match the given filters.
* Fix regression from 23.02 impeding job licenses from being cleared.
* Move to a `log_flag` the `_get_joules_task` error that was
logged to the user when too many rpcs were queued in slurmd
for gathering energy.
* For `scancel --ctld` and the associated rest api endpoints:
`DELETE /slurm/v0.0.40/jobs`
`DELETE /slurm/v0.0.41/jobs`
Fix canceling the final array task in a job array when the task
is pending and all array tasks have been split into separate job
records. Previously this task was not canceled.
* Fix `power_save` operation after recovering from a failed
reconfigure.
* `slurmctld` - Skip removing the pidfile when running under
systemd. In that situation it is never created in the first place.
* Fix issue where altering the flags on a Slurm account
(`UsersAreCoords`) caused several limits on the account's
association to be set to 0 in Slurm's internal cache.
* Fix memory leak in the controller when relaying `stepmgr` step
accounting to the dbd.
* Fix segfault when submitting stepmgr jobs within an existing
allocation.
* Added `disable_slurm_hydra_bootstrap` as a possible `MpiParams`
parameter in `slurm.conf`. Using this will disable env variable
injection to allocations for the following variables:
`I_MPI_HYDRA_BOOTSTRAP`, `I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS`,
`HYDRA_BOOTSTRAP`, `HYDRA_LAUNCHER_EXTRA_ARGS`.
* `scrun` - Delay shutdown until after start requested.
This caused `scrun` to never start or shutdown and hung forever
when using `--tty`.
* Fix backup `slurmctld` potentially not running the agent when
taking over as the primary controller.
* Fix primary controller not running the agent when a reconfigure
of the `slurmctld` fails.
* `slurmd` - fix premature timeout waiting for
`REQUEST_LAUNCH_PROLOG` with large array jobs causing node to
drain.
* `jobcomp/{elasticsearch,kafka}` - Avoid sending fields with
invalid date/time.
* `jobcomp/elasticsearch` - Fix `slurmctld` memory leak from
curl usage.
* `acct_gather_profile/influxdb` - Fix slurmstepd memory leak from
curl usage.
* Fix 24.05.0 regression not deleting job hash dirs after
`MinJobAge`.
* Fix filtering arguments being ignored when using squeue `--json`.
* `switch/nvidia_imex` - Move setup call after `spank_init()` to
allow namespace manipulation within the SPANK plugin.
* `switch/nvidia_imex` - Skip plugin operation if
`nvidia-caps-imex-channels` device is not present rather than
preventing slurmd from starting.
* `switch/nvidia_imex` - Skip plugin operation if
`job_container/tmpfs` is configured due to incompatibility.
* `switch/nvidia_imex` - Remove any pre-existing channels when
`slurmd` starts.
* `rpc_queue` - Add support for an optional `rpc_queue.yaml`
configuration file.
* `slurmrestd` - Add new `+prefer_refs` flag to `data_parser/v0.0.41`
plugin. This flag will avoid inlining single referenced schemas
in the OpenAPI schema.
-------------------------------------------------------------------
Tue Jun 4 09:36:54 UTC 2024 - Christian Goll <cgoll@suse.com>
- Updated to new release 24.05.0 with the following major changes
  * Important Notes:
    If using the slurmdbd (Slurm DataBase Daemon) you must update
    this first. NOTE: If using a backup DBD you must start the
    primary first to do any database conversion, the backup will not
    start until this has happened. The 24.05 slurmdbd will work
    with Slurm daemons of version 23.02 and above. You will not
    need to update all clusters at the same time, but it is very
    important to update slurmdbd first and have it running before
    updating any other clusters making use of it.
  * Highlights
    + Federation - allow client command operation when slurmdbd is
      unavailable.
    + `burst_buffer/lua` - Added two new hooks: `slurm_bb_test_data_in`
      and `slurm_bb_test_data_out`. The syntax and use of the new hooks
      are documented in `etc/burst_buffer.lua.example`. These are
      required to exist. slurmctld now checks on startup if the
      `burst_buffer.lua` script loads and contains all required hooks;
      `slurmctld` will exit with a fatal error if this is not
      successful. Added `PollInterval` to `burst_buffer.conf`. Removed
      the arbitrary limit of 512 copies of the script running
      simultaneously.
    + Add QOS limit `MaxTRESRunMinsPerAccount`.
    + Add QOS limit `MaxTRESRunMinsPerUser`.
    + Add `ELIGIBLE` environment variable to `jobcomp/script` plugin.
    + Always use the QOS name for `SLURM_JOB_QOS` environment variables.
      Previously the batch environment would use the description field,
      which was usually equivalent to the name.
    + `cgroup/v2` - Require dbus-1 version >= 1.11.16.
    + Allow `NodeSet` names to be used in `SuspendExcNodes`.
    + `SuspendExcNodes=<nodes>:N` now counts allocated nodes in `N`.
      The first `N` powered up nodes in <nodes> are protected from
      being suspended.
    + Store job output, input and error paths in `SlurmDBD`.
    + Add `USER_DELETE` reservation flag to allow users with access
      to a reservation to delete it.
    + Add `SlurmctldParameters=enable_stepmgr` to enable step
      management through the `slurmstepd` instead of the controller.
    + Added `PrologFlags=RunInJob` to make prolog and epilog run
      inside the job extern step to include it in the job's cgroup.
    + Add ability to reserve MPI ports at the job level for stepmgr
      jobs and subdivide them at the step level.
    + `slurmrestd` - Add `--generate-openapi-spec` argument.
  * Configuration File Changes (see appropriate man page for details)
    + `CoreSpecPlugin` has been removed.
    + Removed `TopologyPlugin` tree and dragonfly support from
      `select/linear`. If those topology plugins are desired please
      switch to `select/cons_tres`.
    + Changed the default value for `UnkillableStepTimeout` to 60
      seconds or five times the value of `MessageTimeout`, whichever
      is greater.
    + An error log has been added if `JobAcctGatherParams` '`UsePss`'
      or '`NoShare`' are configured with a plugin other than
      `jobacct_gather/linux`. In such case these parameters are ignored.
    + `helpers.conf` - Added `Flags=rebootless` parameter allowing
      feature changes without rebooting compute nodes.
    + `topology/block` - Replaced the `BlockLevels` with `BlockSizes`
      in `topology.conf`.
    + Add `contain_spank` option to `SlurmdParameters`. When set,
      `spank_user_init()`, `spank_task_post_fork()`, and
      `spank_task_exit()` will execute within the
      `job_container/tmpfs` plugin namespace.
    + Add `SlurmctldParameters=max_powered_nodes=N`, which prevents
      powering up nodes after the max is reached.
    + Add `ExclusiveTopo` to a partition definition in `slurm.conf`.
    + Add `AccountingStorageParameters=max_step_records` to limit how
      many steps are recorded in the database for each job - excluding
      batch.
  * Command Changes (see man pages for details)
    + Add support for "elevenses" as an additional time specification.
    + Add support for `sbcast --preserve` when `job_container/tmpfs`
      configured (previously documented as unsupported).
    + `scontrol` - Add new subcommand `power` for node power control.
    + `squeue` - Adjust `StdErr`, `StdOut`, and `StdIn` output formats.
      These will now consistently print "`(null)`" if a value is
      unavailable. `StdErr` will no longer display `StdOut` if it is
      not distinctly set. `StdOut` will now correctly display the
      default filename pattern for job arrays, and no longer show it
      for non-batch jobs. However, the expansion patterns will
      no longer be substituted by default.
    + Add `--segment` to job allocation to be used in topology/block.
    + Add `--exclusive=topo` for use with topology/block.
    + `squeue` - Add `--expand-patterns` option to expand `StdErr`,
      `StdOut`, `StdIn` filename patterns as best as possible.
    + `sacct` - Add `--expand-patterns` option to expand `StdErr`,
      `StdOut`, `StdIn` filename patterns as best as possible.
    + `sreport` - Requesting `format=Planned` will now return the
      expected `Planned` time as documented, instead of `PlannedDown`.
      To request `Planned Down`, one must now use `format=PLNDDown`
      or `format=PlannedDown` explicitly. The abbreviations
      "`Pl`" or "`Pla`" will now make reference to Planned instead
      of `PlannedDown`.
  * API Changes
    + Removed `ListIterator` type from `<slurm/slurm.h>`.
    + Removed `slurm_xlate_job_id()` from `<slurm/slurm.h>`.
  * SLURMRESTD Changes
    + `openapi/dbv0.0.38` and `openapi/v0.0.38` plugins have been
      removed.
    + `openapi/dbv0.0.39` and `openapi/v0.0.39` plugins have been
      tagged as deprecated to warn of their removal in the next release.
    + Changed `slurmrestd.service` to only listen on TCP socket by
      default. Environments with existing drop-in units for the
      service may need further adjustments to work after upgrading.
    + `slurmrestd` - Tagged `script` field as deprecated in
      `POST /slurm/v0.0.41/job/submit` in anticipation of removal in
      future OpenAPI plugin versions. Job submissions should set the
      `job.script` (or `jobs[0].script` for HetJobs) fields instead.
    + `slurmrestd` - Attempt to automatically convert enumerated
      string arrays with incoming non-string values into strings.
      Add warning when incoming value for enumerated string arrays
      can not be converted to string and silently ignore instead of
      rejecting entire request. This change affects any endpoint that
      uses an enumerated string as given in the OpenAPI specification.
      An example of this conversion would be to
      `POST /slurm/v0.0.41/job/submit` with `.job.exclusive = true`.
      While the JSON (boolean) true value matches a possible
      enumeration, it is not the expected "true" string. This change
      automatically converts the (boolean) `true` to (string) "`true`"
      avoiding a parsing failure.
    + `slurmrestd` - Add `POST /slurm/v0.0.41/job/allocate` endpoint.
      This endpoint will create a new job allocation without any steps.
      The allocation will need to be ended via signaling the job or
      it will run to the timelimit.
    + `slurmrestd` - Allow startup when `slurmdbd` is not configured
      and avoid loading `slurmdbd` specific plugins.
  * MPI/PMI2 Changes
    + Jobs submitted with the `SLURM_HOSTFILE` environment variable
      set imply using an arbitrary distribution. Nevertheless, the
      logic used in PMI2 when generating their associated
      `PMI_process_mapping` values has been changed and will now be
      the same used for the plane distribution, as if `-m plane` were
      used. This has been changed because the original arbitrary
      distribution implementation did not account for multiple
      instances of the same host being present in `SLURM_HOSTFILE`,
      providing an incorrect process mapping in such case. This
      change also enables distributing tasks in blocks when using
      arbitrary distribution, which was not the case before. This
      only affects the `mpi/pmi2` plugin.
  * Removed Fix-test-21.41.patch as upstream test changed.
-------------------------------------------------------------------
Mon Mar 25 15:16:44 UTC 2024 - Christian Goll <cgoll@suse.com>
- removed Keep-logs-of-skipped-test-when-running-test-cases-sequentially.patch
  as incorporated upstream
- Changes in Slurm 23.02.5
  * Add the `JobId` to `debug()` messages indicating when
    `cpus_per_task/mem_per_cpu` or `pn_min_cpus` are being
    automatically adjusted.
  * Fix regression in 23.02.2 that caused `slurmctld -R` to crash on
    startup if a node features plugin is configured.
  * Fix and prevent reoccurring reservations from overlapping.
  * `job_container/tmpfs` - Avoid attempts to share `BasePath`
    between nodes.
  * Change the log message warning for rate limited users from
    verbose to info.
  * With `CR_Cpu_Memory`, fix node selection for jobs that request
    gres and `--mem-per-cpu`.
  * Fix a regression from 22.05.7 in which some jobs were allocated
    too few nodes, thus overcommitting cpus to some tasks.
  * Fix a job being stuck in the completing state if the job ends
    while the primary controller is down or unresponsive and the
    backup controller has not yet taken over.
  * Fix `slurmctld` segfault when a node registers with a configured
    `CpuSpecList` while the slurmctld configuration has the node
    without `CpuSpecList`.
  * Fix cloud nodes getting stuck in `POWERED_DOWN+NO_RESPOND` state
    after not registering by `ResumeTimeout`.
  * `slurmstepd` - Avoid cleanup of `config.json`-less containers
    spooldir getting skipped.
  * `slurmstepd` - Cleanup per task generated environment for
    containers in spooldir.
  * Fix `scontrol` segfault when 'completing' command requested
    repeatedly in interactive mode.
  * Properly handle a race condition between `bind()` and `listen()`
    calls in the network stack when running with `SrunPortRange` set.
  * Federation - Fix revoked jobs being returned regardless of the
    `-a`/`--all` option for privileged users.
  * Federation - Fix canceling pending federated jobs from non-origin
    clusters which could leave federated jobs orphaned from the origin
    cluster.
  * Fix sinfo segfault when printing multiple clusters with
    `--noheader` option.
  * Federation - fix clusters not syncing if clusters are added to
    a federation before they have registered with the dbd.
  * Change `pmi2` plugin to honor the `SrunPortRange` option. This
    matches the new behavior of the pmix plugin in 23.02.0. Note that
    neither of these plugins makes use of the "`MpiParams=ports=`"
    option, and previously were only limited by the system's ephemeral
    port range.
  * `node_features/helpers` - Fix node selection for jobs requesting
    changeable features with the '`|`' operator, which could prevent
    jobs from running on some valid nodes.
  * `node_features/helpers` - Fix inconsistent handling of '`&`' and
    '`|`', where an AND'd feature was sometimes AND'd to all sets of
    features instead of just the current set. E.g. "`foo|bar&baz`" was
    interpreted as `{foo,baz}` or `{bar,baz}` instead of how it is
    documented: "`{foo} or {bar,baz}`".
  * Fix job accounting so that when a job is requeued its allocated
    node count is cleared. After the requeue, sacct will correctly
    show that the job has 0 `AllocNodes` while it is pending or if
    it is canceled before restarting.
  * `sacct` - `AllocCPUS` now correctly shows 0 if a job has not yet
    received an allocation or if the job was canceled before getting
    one.
  * Fix intel oneapi autodetect: detect the `/dev/dri/renderD[0-9]+`
    gpus, and do not detect `/dev/dri/card[0-9]+`.
  * Format batch, extern, interactive, and pending step ids into
    strings that are human readable.
  * Fix node selection for jobs that request `--gpus` and a number
    of tasks fewer than gpus, which resulted in incorrectly rejecting
    these jobs.
  * Remove `MYSQL_OPT_RECONNECT` completely.
  * Fix cloud nodes in `POWERING_UP` state disappearing (getting set
    to `FUTURE`) when an `scontrol reconfigure` happens.
  * `openapi/dbv0.0.39` - Avoid assert / segfault on missing
    coordinators list.
  * `slurmrestd` - Correct memory leak while parsing OpenAPI
    specification templates with server overrides.
  * `slurmrestd` - Reduce memory usage when printing out job CPU
    frequency.
  * Fix overwriting user node reason with system message.
  * Remove `--uid` / `--gid` options from salloc and srun commands.
  * Prevent deadlock when rpc_queue is enabled.
  * `slurmrestd` - Correct OpenAPI specification generation bug where
    fields with overlapping parent paths would not get generated.
  * Fix memory leak as a result of a partition info query.
  * Fix memory leak as a result of a job info query.
  * `slurmrestd` - For `GET /slurm/v0.0.39/node[s]`, change format of
    node's energy field `current_watts` to a dictionary to account
    for unset value instead of dumping `4294967294`.
  * `slurmrestd` - For `GET /slurm/v0.0.39/qos`, change format of
    QOS's field `priority` to a dictionary to account for unset
    value instead of dumping `4294967294`.
  * `slurmrestd` - For `GET /slurm/v0.0.39/job[s]`, the `return code`
    code field in `v0.0.39_job_exit_code` will be set to -127 instead
    of being left unset where the job does not have a relevant return
    code.
  * `data_parser/v0.0.39` - Add `required/memory_per_cpu` and
    `required/memory_per_node` to `sacct --json` and `sacct --yaml`
    and `GET /slurmdb/v0.0.39/jobs` from `slurmrestd`.
  * For step allocations, fix `--gres=none` sometimes not ignoring
    gres from the job.
  * Fix `--exclusive` jobs incorrectly gang-scheduling where they
    shouldn't.
  * Fix allocations with `CR_SOCKET`, gres not assigned to a specific
    socket, and block core distribution potentially allocating more
    sockets than required.
  * `gpu/oneapi` - Store cores correctly so CPU affinity is tracked.
  * Revert a change in 23.02.3 where Slurm would kill a script's
    process group as soon as the script ended instead of waiting as
    long as any process in that process group held the stdout/stderr
    file descriptors open. That change broke some scripts that relied
    on the previous behavior. Setting time limits for scripts (such
    as `PrologEpilogTimeout`) is strongly encouraged to avoid Slurm
    waiting indefinitely for scripts to finish.
  * Allow `slurmdbd -R` to work if the root assoc id is not 1.
  * Fix `slurmdbd -R` not returning an error under certain conditions.
  * `slurmdbd` - Avoid potential NULL pointer dereference in the
    mysql plugin.
  * Revert a change in 23.02 where `SLURM_NTASKS` was no longer
    set in the job's environment when `--ntasks-per-node` was
    requested.
  * Limit periodic node registrations to 50 instead of the full
    `TreeWidth`. Since unresolvable `cloud/dynamic` nodes must
    disable fanout by setting `TreeWidth` to a large number, this
    would cause all nodes to register at once.
  * Fix regression in 23.02.3 which broke x11 forwarding for hosts
    when `MUNGE` sends a localhost address in the encode host field.
    This is caused when the node hostname is mapped to 127.0.0.1
    (or similar) in `/etc/hosts`.
  * `openapi/[db]v0.0.39` - fix memory leak on parsing error.
  * `data_parser/v0.0.39` - fix updating qos for associations.
  * `openapi/dbv0.0.39` - fix updating values for associations with
    null users.
  * Fix minor memory leak with `--tres-per-task` and licenses.
  * Fix cyclic socket cpu distribution for tasks in a step where
    `--cpus-per-task` < usable threads per core.
- Changes in Slurm 23.02.4
  * Fix `sbatch` return code when `--wait` is requested on a job
    array.
  * `switch/hpe_slingshot` - avoid segfault when running with old
    libcxi.
  * Avoid slurmctld segfault when specifying
    `AccountingStorageExternalHost`.
  * Fix collected `GPUUtilization` values for `acct_gather_profile`
    plugins.
  * Fix slurmrestd handling of job hold/release operations.
  * Make spank `S_JOB_ARGV` item value hold the requested command
    argv instead of the srun `--bcast` value when `--bcast` requested
    (only in local context).
  * Fix step running indefinitely when slurmctld takes more than
    `MessageTimeout` to respond. Now, `slurmctld` will cancel the
    step when detected, preventing following steps from getting stuck
    waiting for resources to be released.
  * Fix regression to make `job_desc.min_cpus` accurate again in
    job_submit when requesting a job with `--ntasks-per-node`.
  * `scontrol` - Permit changes to `StdErr` and `StdIn` for pending
    jobs.
  * `scontrol` - Reset std{err,in,out} when set to empty string.
  * `slurmrestd` - mark environment as a required field for job
    submission descriptions.
  * `slurmrestd` - avoid dumping null in OpenAPI schema required
    fields.
  * `data_parser/v0.0.39` - avoid rejecting valid `memory_per_node`
    formatted as dictionary provided with a job description.
  * `data_parser/v0.0.39` - avoid rejecting valid `memory_per_cpu`
    formatted as dictionary provided with a job description.
  * `slurmrestd` - Return HTTP error code 404 when job query fails.
  * `slurmrestd` - Add return schema to error response to job and
    license query.
  * Fix handling of `ArrayTaskThrottle` in backfill.
  * Fix regression in 23.02.2 when checking gres state on `slurmctld`
    startup or reconfigure. Gres changes in the configuration were
    not updated on `slurmctld` startup. On startup or reconfigure,
    these messages were present in the log:
    "`error: Attempt to change gres/gpu Count`".
  * Fix potential double count of gres when dealing with limits.
  * `switch/hpe_slingshot` - support alternate traffic class names
    with "`TC_`" prefix.
  * `scrontab` - Fix cutting off the final character of quoted
    variables.
  * Fix `slurmstepd` segfault when `ContainerPath` is not set in
    `oci.conf`.
  * Change the log message warning for rate limited users from
    debug to verbose.
  * Fixed an issue where jobs requesting licenses were incorrectly
    rejected.
  * `smail` - Fix issues where emails at job completion were not
    being sent.
  * `scontrol/slurmctld` - fix comma parsing when updating a
    reservation's nodes.
  * `cgroup/v2` - Avoid capturing log output for ebpf when
    constraining devices, as this can lead to inadvertent failure
    if the log buffer is too small.
  * Fix `--gpu-bind=single` binding tasks to wrong gpus, leading to
    some gpus having more tasks than they should and other gpus
    being unused.
  * Fix main scheduler loop not starting after failover to backup
    controller.
  * Added error message when attempting to use sattach on batch or
    extern steps.
  * Fix regression in 23.02 that causes slurmstepd to crash when
    `srun` requests more than `TreeWidth` nodes in a step and uses
    the `pmi2` or `pmix` plugin.
  * Reject job `ArrayTaskThrottle` update requests from unprivileged
    users.
  * `data_parser/v0.0.39` - populate description fields of property
    objects in generated OpenAPI specifications where defined.
  * `slurmstepd` - Avoid segfault caused by `ContainerPath` not being
    terminated by '`/`' in `oci.conf`.
  * `data_parser/v0.0.39` - Change `v0.0.39_job_info` response to tag
    `exit_code` field as being complex instead of only an unsigned
    integer.
  * `job_container/tmpfs` - Fix %h and %n substitution in `BasePath`
    where `%h` was substituted as the `NodeName` instead of the
    hostname, and `%n` was substituted as an empty string.
  * Fix regression where `--cpu-bind=verbose` would override
    `TaskPluginParam`.
  * `scancel` - Fix `--clusters`/`-M` for federations. Only filtered
    jobs (e.g. -A, -u, -p, etc.) from the specified clusters will be
    canceled, rather than all jobs in the federation. Specific jobids
    will still be routed to the origin cluster for cancellation.
-------------------------------------------------------------------
Mon Jan 29 13:47:55 UTC 2024 - Egbert Eich <eich@suse.com>
@ -2337,7 +2662,6 @@ Fri Jul 2 08:01:32 UTC 2021 - Christian Goll <cgoll@suse.com>
- Updated to 20.11.8:
  * slurmctld - fix erroneous "StepId=CORRUPT" messages in error logs.
  * Correct the error given when auth plugin fails to pack a credential.
  * Fix unused-variable compiler warning on FreeBSD in fd_resolve_path().
  * acct_gather_filesystem/lustre - only emit collection error once per step.
  * Add GRES environment variables (e.g., CUDA_VISIBLE_DEVICES) into the
    interactive step, the same as is done for the batch step.


@ -19,7 +19,7 @@
# Check file META in sources: update so_version to (API_CURRENT - API_AGE)
%define so_version 41
# Make sure to update `upgrades` as well!
%define ver 24.05.3
%define _ver _24_05
%define dl_ver %{ver}
# so-version is 0 and seems to be stable
@ -59,6 +59,9 @@ ExclusiveArch: do_not_build
%if 0%{?sle_version} == 150500 || 0%{?sle_version} == 150600
%define base_ver 2302
%endif
%if 0%{?sle_version} == 150500 || 0%{?sle_version} == 150600
%define base_ver 2302
%endif
%define ver_m %{lua:x=string.gsub(rpm.expand("%ver"),"%.[^%.]*$","");print(x)}
# Keep format_spec_file from botching the define below: