- Update to version 24.05.3

* `data_parser/v0.0.40` - Added field descriptions.
* `slurmrestd` - Avoid creating new slurmdbd connection per request
  to `* /slurm/slurmctld/*/*` endpoints.
* Fix compilation issue with `switch/hpe_slingshot` plugin.
* Fix gres per task allocation with threads-per-core.
* `data_parser/v0.0.41` - Added field descriptions.
* `slurmrestd` - Change back generated OpenAPI schema for
  `DELETE /slurm/v0.0.40/jobs/` to `RequestBody` instead of using
  parameters for request. `slurmrestd` will continue to accept endpoint
  requests via `RequestBody` or HTTP query.
* `topology/tree` - Fix issues with switch distance optimization.
* Fix potential segfault of secondary `slurmctld` when falling back
  to the primary when running with a `JobComp` plugin.
* Enable `--json`/`--yaml=v0.0.39` options on client commands to
  dump data using data_parser/v0.0.39 instead of outputting nothing.
* `switch/hpe_slingshot` - Fix issue that could result in a 0 length
  state file.
* Fix unnecessary message protocol downgrade for unregistered nodes.
* Fix unnecessarily packing alias addrs when terminating jobs with
  a mix of non-cloud/dynamic nodes and powered down cloud/dynamic
  nodes.
* `accounting_storage/mysql` - Fix issue when deleting a qos that
  could remove too many commas from the qos and/or delta_qos fields
  of the assoc table.
* `slurmctld` - Fix memory leak when using RestrictedCoresPerGPU.
* Fix allowing access to reservations without `MaxStartDelay` set.
* Fix regression introduced in 24.05.0rc1 breaking
  `srun --send-libs` parsing.
* Fix slurmd vsize memory leak when using job submission/allocation

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=295
2024-10-15 06:51:09 +00:00
commit b2f6e848a1
parent fc209e050f
4 changed files with 629 additions and 302 deletions
--- a/slurm-24.05.0.tar.bz2
+++ b/slurm-24.05.0.tar.bz2
@@ -1,3 +0,0 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:a6d3e95f2bbda3c9567060efc3d7090ad8eac257fa3578798c89321957946e49
 size 7117445
--- a/slurm-24.05.3.tar.bz2
+++ b/slurm-24.05.3.tar.bz2
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:b0b40513e9b6ae867ddb95d60b950bcb980c15b735b5d0dea37a9a00cc64ae24
 size 7189600
--- a/slurm.changes
+++ b/slurm.changes
@@ -1,8 +1,275 @@
 -------------------------------------------------------------------
 Mon Oct 14 10:40:10 UTC 2024 - Egbert Eich <eich@suse.com>
 - Update to version 24.05.3
  * `data_parser/v0.0.40` - Added field descriptions.
  * `slurmrestd` - Avoid creating new slurmdbd connection per request
    to `* /slurm/slurmctld/*/*` endpoints.
  * Fix compilation issue with `switch/hpe_slingshot` plugin.
  * Fix gres per task allocation with threads-per-core.
  * `data_parser/v0.0.41` - Added field descriptions.
  * `slurmrestd` - Change back generated OpenAPI schema for
    `DELETE /slurm/v0.0.40/jobs/` to `RequestBody` instead of using
    parameters for request. `slurmrestd` will continue to accept endpoint
    requests via `RequestBody` or HTTP query.
  * `topology/tree` - Fix issues with switch distance optimization.
  * Fix potential segfault of secondary `slurmctld` when falling back
    to the primary when running with a `JobComp` plugin.
  * Enable `--json`/`--yaml=v0.0.39` options on client commands to
    dump data using `data_parser/v0.0.39` instead of outputting nothing.
  * `switch/hpe_slingshot` - Fix issue that could result in a 0 length
    state file.
  * Fix unnecessary message protocol downgrade for unregistered nodes.
  * Fix unnecessarily packing alias addrs when terminating jobs with
    a mix of non-cloud/dynamic nodes and powered down cloud/dynamic
    nodes.
  * `accounting_storage/mysql` - Fix issue when deleting a qos that
    could remove too many commas from the qos and/or delta_qos fields
    of the assoc table.
  * `slurmctld` - Fix memory leak when using RestrictedCoresPerGPU.
  * Fix allowing access to reservations without `MaxStartDelay` set.
  * Fix regression introduced in 24.05.0rc1 breaking
    `srun --send-libs` parsing.
  * Fix slurmd vsize memory leak when using job submission/allocation
    commands that implicitly or explicitly use --get-user-env.
  * `slurmd` - Fix node going into invalid state when using
    `CPUSpecList` and setting CPUs to the # of cores on a
    multithreaded node.
  * Fix reboot asap nodes being considered in backfill after a restart.
  * Fix `--clusters`/`-M` queries for clusters outside of a
    federation when `fed_display` is configured.
  * Fix `scontrol` allowing updating job with bad cpus-per-task value.
  * `sattach` - Fix regression from 24.05.2 security fix leading to
    crash.
  * `mpi/pmix` - Fix assertion when built under `--enable-debug`.
 - Changes from Slurm 24.05.2
  * Fix energy gathering rpc counter underflow in
    `_rpc_acct_gather_energy` when more than 10 threads try to get
    energy at the same time. This prevented any step from getting
    energy from slurmd until slurmd was restarted, losing energy
    accounting metrics for the node.
  * `accounting_storage/mysql` - Fix issue where new user with `wckey`
    did not have a default wckey sent to the slurmctld.
  * `slurmrestd` - Prevent slurmrestd segfault when handling the
    following endpoints when none of the optional parameters are
    specified:
      `DELETE /slurm/v0.0.40/jobs`
      `DELETE /slurm/v0.0.41/jobs`
      `GET /slurm/v0.0.40/shares`
      `GET /slurm/v0.0.41/shares`
      `GET /slurmdb/v0.0.40/instance`
      `GET /slurmdb/v0.0.41/instance`
      `GET /slurmdb/v0.0.40/instances`
      `GET /slurmdb/v0.0.41/instances`
      `POST /slurm/v0.0.40/job/{job_id}`
      `POST /slurm/v0.0.41/job/{job_id}`
  * Fix IPMI energy gathering when no IPMIPowerSensors are specified
    in `acct_gather.conf`. This situation resulted in an accounted
    energy of 0 for job steps.
  * Fix a minor memory leak in slurmctld when updating a job dependency.
  * `scontrol`,`squeue` - Fix regression that caused incorrect values
    for multisocket nodes at `.jobs[].job_resources.nodes.allocation`
    for `scontrol show jobs --(json|yaml)` and `squeue --(json|yaml)`.
  * `slurmrestd` - Fix regression that caused incorrect values for
    multisocket nodes at `.jobs[].job_resources.nodes.allocation` to
    be dumped with endpoints:
      `GET /slurm/v0.0.41/job/{job_id}`
      `GET /slurm/v0.0.41/jobs`
  * `jobcomp/filetxt` - Fix truncation of job record lines > 1024
    characters.
  * `switch/hpe_slingshot` - Drain node on failure to delete CXI
    services.
  * Fix a performance regression from 23.11.0 in cpu frequency
    handling when no `CpuFreqDef` is defined.
  * Fix one-task-per-sharing not working across multiple nodes.
  * Fix inconsistent number of cpus when creating a reservation
    using the TRESPerNode option.
  * `data_parser/v0.0.40+` - Fix job state parsing which could
    break filtering.
  * Prevent `cpus-per-task` from being modified in jobs where a `-c`
    value has been explicitly specified and the requested memory
    constraints implicitly increase the number of CPUs to allocate.
  * `slurmrestd` - Fix regression where args `-s v0.0.39,dbv0.0.39`
    and `-d v0.0.39` would result in `GET /openapi/v3` not
    registering as a valid possible query resulting in 404 errors.
  * `slurmrestd` - Fix memory leak for dbv0.0.39 jobs query which
    occurred if the query parameters specified account, association,
    cluster, constraints, format, groups, job_name, partition, qos,
    reason, reservation, state, users, or wckey. This affects the
    following endpoints:
      `GET /slurmdb/v0.0.39/jobs`
  * `slurmrestd` - In the case the slurmdbd does not respond to a
    persistent connection init message, prevent the closed fd from
    being used, and instead emit an error or warning depending on
    if the connection was required.
  * Fix 24.05.0 regression that caused the slurmdbd not to send back
    an error message if there is an error initializing a persistent
    connection.
  * Reduce latency of forwarded x11 packets.
  * Add `curr_dependency` (representing the current dependency of
    the job)
    and `orig_dependency` (representing the original requested
    dependency of the job) fields to the job record in
    `job_submit.lua` (for job update) and `jobcomp.lua`.
  * Fix potential segfault of slurmctld configured with
    `SlurmctldParameters=enable_rpc_queue` from happening on
    reconfigure.
  * Fix potential segfault of slurmctld on its shutdown when rate
    limiting is enabled.
  * `slurmrestd` - Fix missing job environment for `SLURM_JOB_NAME`,
    `SLURM_OPEN_MODE`, `SLURM_JOB_DEPENDENCY`, `SLURM_PROFILE`,
    `SLURM_ACCTG_FREQ`, `SLURM_NETWORK` and `SLURM_CPU_FREQ_REQ` to
    match sbatch.
  * Fix GRES environment variable indices being incorrect when only
    using a subset of all GPUs on a node and the
    `--gres-flags=allow-task-sharing` option.
  * Prevent `scontrol` from segfaulting when requesting scontrol
    show reservation `--json` or `--yaml` if there is an error
    retrieving reservations from the `slurmctld`.
  * `switch/hpe_slingshot` - Fix security issue around managing VNI
    access. CVE-2024-42511.
  * `switch/nvidia_imex` - Fix security issue managing IMEX channel
    access. CVE-2024-42511.
  * `switch/nvidia_imex` - Allow for compatibility with
    `job_container/tmpfs`.
 - Changes in Slurm 24.05.1
  * Fix `slurmctld` and `slurmdbd` potentially stopping instead of
    performing a logrotate when receiving `SIGUSR2` when using
    `auth/slurm`.
  * `switch/hpe_slingshot` - Fix slurmctld crash when upgrading
    from 23.02.
  * Fix "Could not find group" errors from `validate_group()` when
    using `AllowGroups` with large `/etc/group` files.
  * Add `AccountingStoreFlags=no_stdio`, which prevents the stdio
    paths of the job from being recorded when set.
  * `slurmrestd` - Prevent a slurmrestd segfault when parsing the
    `crontab` field, which was never usable. Now it explicitly
    ignores the value and emits a warning if it is used for the
    following endpoints:
      `POST /slurm/v0.0.39/job/{job_id}`
      `POST /slurm/v0.0.39/job/submit`
      `POST /slurm/v0.0.40/job/{job_id}`
      `POST /slurm/v0.0.40/job/submit`
      `POST /slurm/v0.0.41/job/{job_id}`
      `POST /slurm/v0.0.41/job/submit`
      `POST /slurm/v0.0.41/job/allocate`
  * `mpi/pmi2` - Fix communication issue leading to task launch
    failure with "`invalid kvs seq from node`".
  * Fix getting user environment when using sbatch with
    `--get-user-env` or `--export=` when there is a user profile
    script that reads `/proc`.
  * Prevent slurmd from crashing if `acct_gather_energy/gpu` is
    configured but `GresTypes` is not configured.
  * Do not log the following errors when `AcctGatherEnergyType`
    plugins are used but a node does not have or cannot find sensors:
    "`error: _get_joules_task: can't get info from slurmd`"
    "`error: slurm_get_node_energy: Zero Bytes were transmitted or
     received`"
    However, the following error will continue to be logged:
    "`error: Can't get energy data. No power sensors are available.
     Try later`"
  * `sbatch`, `srun` - Set `SLURM_NETWORK` environment variable if
    `--network` is set.
  * Fix cloud nodes not being able to forward to nodes that restarted
    with new IP addresses.
  * Fix cwd not being set correctly when running a SPANK plugin with a
    `spank_user_init()` hook and the new "`contain_spank`" option set.
  * `slurmctld` - Avoid deadlock during shutdown when `auth/slurm`
    is active.
  * Fix segfault in `slurmctld` with `topology/block`.
  * `sacct` - Fix printing of job group for job steps.
  * `scrun` - Log when an invalid environment variable causes the
    job submission to be rejected.
  * `accounting_storage/mysql` - Fix problem where listing or
    modifying an association when specifying a qos list could hang
    or take a very long time.
  * `gpu/nvml` - Fix `gpuutil/gpumem` only tracking last GPU in step.
    Now, `gpuutil/gpumem` will record sums of all GPUs in the step.
  * Fix error in `scrontab` jobs when using
    `slurm.conf:PropagatePrioProcess=1`.
  * Fix `slurmctld` crash on a batch job submission with
    `--nodes 0,...`.
  * Fix dynamic IP address fanout forwarding when using `auth/slurm`.
  * Restrict listening sockets in the `mpi/pmix` plugin and `sattach`
    to the `SrunPortRange`.
  * `slurmrestd` - Limit mime types returned from query to
    `GET /openapi/v3` to only return one mime type per serializer
    plugin to fix issues with OpenAPI client generators that are
    unable to handle multiple mime type aliases.
  * Fix many commands possibly reporting an "`Unexpected Message
    Received`" when in reality the connection timed out.
  * Prevent `slurmctld` from starting if there is not a json
    serializer present and the `extra_constraints` feature is enabled.
  * Fix heterogeneous job components not being signaled with
    `scancel --ctld` and `DELETE slurm/v0.0.40/jobs` if the job ids
    are not explicitly given, the heterogeneous job components match
    the given filters, and the heterogeneous job leader does not
    match the given filters.
  * Fix regression from 23.02 impeding job licenses from being cleared.
  * Move error to `log_flag` which made `_get_joules_task` error to
    be logged to the user when too many rpcs were queued in slurmd
    for gathering energy.
  * For `scancel --ctld` and the associated rest api endpoints:
      `DELETE /slurm/v0.0.40/jobs`
      `DELETE /slurm/v0.0.41/jobs`
    Fix canceling the final array task in a job array when the task
    is pending and all array tasks have been split into separate job
    records. Previously this task was not canceled.
  * Fix `power_save` operation after recovering from a failed
    reconfigure.
  * `slurmctld` - Skip removing the pidfile when running under
    systemd. In that situation it is never created in the first place.
  * Fix issue where altering the flags on a Slurm account
    (`UsersAreCoords`) would set several limits on the account's
    association to 0 in Slurm's internal cache.
  * Fix memory leak in the controller when relaying `stepmgr` step
    accounting to the dbd.
  * Fix segfault when submitting stepmgr jobs within an existing
    allocation.
  * Added `disable_slurm_hydra_bootstrap` as a possible `MpiParams`
    parameter in `slurm.conf`. Using this will disable env variable
    injection to allocations for the following variables:
    `I_MPI_HYDRA_BOOTSTRAP`, `I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS`,
    `HYDRA_BOOTSTRAP`, `HYDRA_LAUNCHER_EXTRA_ARGS`.
  * `scrun` - Delay shutdown until after start requested.
    Previously, `scrun` would never start or shut down and would hang
    forever when using `--tty`.
  * Fix backup `slurmctld` potentially not running the agent when
    taking over as the primary controller.
  * Fix primary controller not running the agent when a reconfigure
    of the `slurmctld` fails.
  * `slurmd` - fix premature timeout waiting for
    `REQUEST_LAUNCH_PROLOG` with large array jobs causing node to
    drain.
  * `jobcomp/{elasticsearch,kafka}` - Avoid sending fields with
    invalid date/time.
  * `jobcomp/elasticsearch` - Fix `slurmctld` memory leak from
    curl usage.
  * `acct_gather_profile/influxdb` - Fix slurmstepd memory leak from
    curl usage.
  * Fix 24.05.0 regression not deleting job hash dirs after
    `MinJobAge`.
  * Fix filtering arguments being ignored when using squeue `--json`.
  * `switch/nvidia_imex` - Move setup call after `spank_init()` to
    allow namespace manipulation within the SPANK plugin.
  * `switch/nvidia_imex` - Skip plugin operation if
    `nvidia-caps-imex-channels` device is not present rather than
    preventing slurmd from starting.
  * `switch/nvidia_imex` - Skip plugin operation if
    `job_container/tmpfs` is configured due to incompatibility.
  * `switch/nvidia_imex` - Remove any pre-existing channels when
    `slurmd` starts.
  * `rpc_queue` - Add support for an optional `rpc_queue.yaml`
    configuration file.
  * `slurmrestd` - Add new +prefer_refs flag to `data_parser/v0.0.41`
    plugin. This flag will avoid inlining single referenced schemas
    in the OpenAPI schema.
 -------------------------------------------------------------------
 Tue Jun  4 09:36:54 UTC 2024 - Christian Goll <cgoll@suse.com>
- updated to new release 24.05.0 with following major changes
+- Updated to new release 24.05.0 with following major changes
- IMPORTANT NOTES:
+  * Important Notes:
    If using the slurmdbd (Slurm DataBase Daemon) you must update
    this first.  NOTE: If using a backup DBD you must start the
    primary first to do any database conversion, the backup will not
@@ -11,302 +278,360 @@ Tue Jun  4 09:36:54 UTC 2024 - Christian Goll <cgoll@suse.com>
    need to update all clusters at the same time, but it is very
    important to update slurmdbd first and having it running before
    updating any other clusters making use of it.
- HIGHLIGHTS
+  * Highlights
-  * Federation - allow client command operation when slurmdbd is
+    + Federation - allow client command operation when slurmdbd is
      unavailable.
-  * burst_buffer/lua - Added two new hooks: slurm_bb_test_data_in
+    + `burst_buffer/lua` - Added two new hooks: `slurm_bb_test_data_in`
-    and slurm_bb_test_data_out. The syntax and use of the new hooks
+      and `slurm_bb_test_data_out`. The syntax and use of the new hooks
-    are documented in etc/burst_buffer.lua.example. These are
+      are documented in `etc/burst_buffer.lua.example`. These are
      required to exist. slurmctld now checks on startup if the
-    burst_buffer.lua script loads and contains all required hooks;
+      `burst_buffer.lua` script loads and contains all required hooks;
-    slurmctld will exit with a fatal error if this is not
+      `slurmctld` will exit with a fatal error if this is not
-    successful. Added PollInterval to burst_buffer.conf. Removed
+      successful. Added `PollInterval` to `burst_buffer.conf`. Removed
      the arbitrary limit of 512 copies of the script running
      simultaneously.
-  * Add QOS limit MaxTRESRunMinsPerAccount. 
+    + Add QOS limit `MaxTRESRunMinsPerAccount`.
-  * Add QOS limit MaxTRESRunMinsPerUser.
+    + Add QOS limit `MaxTRESRunMinsPerUser`.
-  * Add ELIGIBLE environment variable to jobcomp/script plugin.
+    + Add `ELIGIBLE` environment variable to `jobcomp/script` plugin.
-  * Always use the QOS name for SLURM_JOB_QOS environment variables.
+    + Always use the QOS name for `SLURM_JOB_QOS` environment variables.
      Previously the batch environment would use the description field,
      which was usually equivalent to the name.
-  * cgroup/v2 - Require dbus-1 version >= 1.11.16.
+    + `cgroup/v2` - Require dbus-1 version >= 1.11.16.
-  * Allow NodeSet names to be used in SuspendExcNodes.
+    + Allow `NodeSet` names to be used in `SuspendExcNodes`.
-  * SuspendExcNodes=<nodes>:N now counts allocated nodes in N. The
+    + `SuspendExcNodes=<nodes>:N` now counts allocated nodes in `N`.
-    first N powered up nodes in <nodes> are protected from being
+      The first `N` powered up nodes in <nodes> are protected from
-    suspended.
+      being suspended.
-  * Store job output, input and error paths in SlurmDBD.
+    + Store job output, input and error paths in `SlurmDBD`.
-  * Add USER_DELETE reservation flag to allow users with access to
+    + Add `USER_DELETE` reservation flag to allow users with access
-    a reservation to delete it.
+      to a reservation to delete it.
-  * Add SlurmctldParameters=enable_stepmgr to enable step
+    + Add `SlurmctldParameters=enable_stepmgr` to enable step
-    management through the slurmstepd instead of the controller.
+      management through the `slurmstepd` instead of the controller.
-  * Added PrologFlags=RunInJob to make prolog and epilog run
+    + Added `PrologFlags=RunInJob` to make prolog and epilog run
      inside the job extern step to include it in the job's cgroup.
-  * Add ability to reserve MPI ports at the job level for stepmgr
+    + Add ability to reserve MPI ports at the job level for stepmgr
      jobs and subdivide them at the step level.
-  * slurmrestd - Add --generate-openapi-spec argument.
+    + `slurmrestd` - Add `--generate-openapi-spec` argument.
- CONFIGURATION FILE CHANGES (see appropriate man page for details)
+  * Configuration File Changes (see appropriate man page for details)
-  * CoreSpecPlugin has been removed.
+    + `CoreSpecPlugin` has been removed.
-  * Removed TopologyPlugin tree and dragonfly support from
+    + Removed `TopologyPlugin` tree and dragonfly support from
-    select/linear.  If those topology plugins are desired please switch to
+      `select/linear`.  If those topology plugins are desired please
-    select/cons_tres.
+      switch to `select/cons_tres`.
-  * Changed the default value for UnkillableStepTimeout to 60
+    + Changed the default value for `UnkillableStepTimeout` to 60
-    seconds or five times the value of MessageTimeout, whichever is greater.
+      seconds or five times the value of `MessageTimeout`, whichever
-  * An error log has been added if JobAcctGatherParams 'UsePss' or
+      is greater.
-    'NoShare' are configured with a plugin other than jobacct_gather/linux.
+    + An error log has been added if `JobAcctGatherParams` '`UsePss`'
-    In such case these parameters are ignored.
+      or '`NoShare`' are configured with a plugin other than
-  * helpers.conf - Added Flags=rebootless parameter allowing feature changes
+      `jobacct_gather/linux`. In such case these parameters are ignored.
-    without rebooting compute nodes.
+    + `helpers.conf` - Added `Flags=rebootless` parameter allowing
-  * topology/block - Replaced the BlockLevels with BlockSizes in topology.conf.
+      feature changes without rebooting compute nodes.
-  * Add contain_spank option to SlurmdParameters. When set, spank_user_init(),
+    + `topology/block` - Replaced the `BlockLevels` with `BlockSizes`
-    spank_task_post_fork(), and spank_task_exit() will execute within the
+      in `topology.conf`.
-    job_container/tmpfs plugin namespace.
+    + Add `contain_spank` option to `SlurmdParameters`. When set,
-  * Add SlurmctldParameters=max_powered_nodes=N, which prevents powering up
+      `spank_user_init()`, `spank_task_post_fork()`, and
-    nodes after the max is reached.
+      `spank_task_exit()` will execute within the
-  * Add ExclusiveTopo to a partition definition in slurm.conf.
+      `job_container/tmpfs` plugin namespace.
-  * Add AccountingStorageParameters=max_step_records to limit how many steps
+    + Add `SlurmctldParameters=max_powered_nodes=N`, which prevents
-    are recorded in the database for each job *- excluding batc
+      powering up nodes after the max is reached.
- COMMAND CHANGES (see man pages for details)
+    + Add `ExclusiveTopo` to a partition definition in `slurm.conf`.
-  * Add support for "elevenses" as an additional time specification.
+    + Add `AccountingStorageParameters=max_step_records` to limit how
-  * Add support for sbcast --preserve when job_container/tmpfs configured
+      many steps are recorded in the database for each job - excluding
-    (previously documented as unsupported).
+      batch.
-  * scontrol - Add new subcommand 'power' for node power control.
+  * Command Changes (see man pages for details)
-  * squeue - Adjust StdErr, StdOut, and StdIn output formats. These will now
+    + Add support for "elevenses" as an additional time specification.
-    consistently print "(null)" if a value is unavailable. StdErr will no
+    + Add support for `sbcast --preserve` when `job_container/tmpfs`
-    longer display StdOut if it is not distinctly set. StdOut will now
+      configured (previously documented as unsupported).
-    correctly display the default filename pattern for job arrays, and no
+    + `scontrol` - Add new subcommand `power` for node power control.
-    longer show it for non*batch jobs. However, the expansion patterns will
+    + `squeue` - Adjust `StdErr`, `StdOut`, and `StdIn` output formats.
      These will now consistently print "`(null)`" if a value is
      unavailable. `StdErr` will no longer display `StdOut` if it is
      not distinctly set. `StdOut` will now correctly display the
      default filename pattern for job arrays, and no longer show it
      for non-batch jobs. However, the expansion patterns will
      no longer be substituted by default.
-  * Add --segment to job allocation to be used in topology/block.
+    + Add `--segment` to job allocation to be used in topology/block.
-  * Add --exclusive=topo for use with topology/block.
+    + Add `--exclusive=topo` for use with topology/block.
-  * squeue - Add --expand-patterns option to expand StdErr, StdOut, StdIn
+    + `squeue` - Add `--expand-patterns` option to expand `StdErr`,
-    filename patterns as best as possible.
+      `StdOut`, `StdIn` filename patterns as best as possible.
-  * sacct - Add --expand-patterns option to expand StdErr, StdOut, StdIn
+    + `sacct` - Add `--expand-patterns` option to expand `StdErr`,
-    filename patterns as best as possible.
+      `StdOut`, `StdIn` filename patterns as best as possible.
-  * sreport - Requesting format=Planned will now return the expected Planned
+    + `sreport` - Requesting `format=Planned` will now return the
-    time as documented, instead of PlannedDown. To request Planned Down,
+      expected `Planned` time as documented, instead of `PlannedDown`.
-    one must use now format=PLNDDown or format=PlannedDown explicitly. The
+      To request `Planned Down`, one must use now `format=PLNDDown`
-    abbreviations "Pl" or "Pla" will now make reference to Planned instead of
+      or `format=PlannedDown` explicitly. The abbreviations
-    PlannedDown.
+      "`Pl`" or "`Pla`" will now make reference to Planned instead
- API CHANGES
+      of `PlannedDown`.
-  * Removed ListIterator type from <slurm/slurm.h>.
+  * API Changes
-  * Removed slurm_xlate_job_id() from <slurm/slurm.h>
+    + Removed `ListIterator` type from `<slurm/slurm.h>`.
- SLURMRESTD CHANGES
+    + Removed `slurm_xlate_job_id()` from `<slurm/slurm.h>`.
-  * openapi/dbv0.0.38 and openapi/v0.0.38 plugins have been removed.
+  * SLURMRESTD Changes
-  * openapi/dbv0.0.39 and openapi/v0.0.39 plugins have been tagged as
+    + `openapi/dbv0.0.38` and `openapi/v0.0.38` plugins have been
-    deprecated to warn of their removal in the next release.
+      removed.
-  * Changed slurmrestd.service to only listen on TCP socket by default.
+    + `openapi/dbv0.0.39` and `openapi/v0.0.39` plugins have been
-    Environments with existing drop*in units for the service may need
+      tagged as deprecated to warn of their removal in the next release.
-    further adjustments to work after upgrading.
+    + Changed `slurmrestd.service` to only listen on TCP socket by
-  * slurmrestd - Tagged `script` field as deprecated in
+      default. Environments with existing drop-in units for the
-    'POST /slurm/v0.0.41/job/submit' in anticipation of removal in future
+      service may need further adjustments to work after upgrading.
-    OpenAPI plugin versions. Job submissions should set the `job.script` (or
+    + `slurmrestd` - Tagged `script` field as deprecated in
-    `jobs[0].script` for HetJobs) fields instead.
+      `POST /slurm/v0.0.41/job/submit` in anticipation of removal in
-  * slurmrestd - Attempt to automatically convert enumerated string arrays with
+      future OpenAPI plugin versions. Job submissions should set the
-    incoming non*string values into strings. Add warning when incoming value for
+      `job.script` (or `jobs[0].script` for HetJobs) fields instead.
-    enumerated string arrays can not be converted to string and silently ignore
+    + `slurmrestd` - Attempt to automatically convert enumerated
-    instead of rejecting entire request. This change affects any endpoint that
+      string arrays with incoming non-string values into strings.
-    uses an enunmerated string as given in the OpenAPI specification. An
+      Add warning when incoming value for enumerated string arrays
-    example of this conversion would be to 'POST /slurm/v0.0.41/job/submit' with
+      can not be converted to string and silently ignore instead of
-    '.job.exclusive = true'. While the JSON (boolean) true value matches a
+      rejecting entire request. This change affects any endpoint that
-    possible enumeration, it is not the expected "true" string. This change
+      uses an enumerated string as given in the OpenAPI specification.
-    automatically converts the (boolean) true to (string) "true" avoiding a
+      An example of this conversion would be to
-    parsing failure.
+      `POST /slurm/v0.0.41/job/submit` with `.job.exclusive = true`.
-  * slurmrestd - Add 'POST /slurm/v0.0.41/job/allocate' endpoint. This endpoint
+      While the JSON (boolean) true value matches a possible
-    will create a new job allocation without any steps. The allocation will need
+      enumeration, it is not the expected "true" string. This change
-    to be ended via signaling the job or it will run to the timelimit.
+      automatically converts the (boolean) `true` to (string) "`true`"
-  * slurmrestd - Allow startup when slurmdbd is not configured and avoid loading
+      avoiding a parsing failure.
-    slurmdbd specific plugins.
+    + `slurmrestd` - Add `POST /slurm/v0.0.41/job/allocate` endpoint.
- MPI/PMI2 CHANGES
+      This endpoint will create a new job allocation without any steps.
-  * Jobs submitted with the SLURM_HOSTFILE environment variable set implies
+      The allocation will need to be ended via signaling the job or
-    using an arbitrary distribution. Nevertheless, the logic used in PMI2 when
+      it will run to the timelimit.
-    generating their associated PMI_process_mapping values has been changed and
+    + `slurmrestd` - Allow startup when `slurmdbd` is not configured
-    will now be the same used for the plane distribution, as if "-m plane" were
+      and avoid loading `slurmdbd` specific plugins.
-    used. This has been changed because the original arbitrary distribution
+  * MPI/PMI2 Changes
-    implementation did not account for multiple instances of the same host being
+    + Jobs submitted with the `SLURM_HOSTFILE` environment variable
-    present in SLURM_HOSTFILE, providing an incorrect process mapping in such
+      set implies using an arbitrary distribution. Nevertheless, the
-    case. This change also enables distributing tasks in blocks when using
+      logic used in PMI2 when generating their associated
-    arbitrary distribution, which was not the case before. This only affects
+      `PMI_process_mapping` values has been changed and will now be
-    mpi/pmi2 plugin.
+      the same used for the plane distribution, as if `-m plane` were
- removed Fix-test-21.41.patch as upstream test changed                                                                           
+      used. This has been changed because the original arbitrary
      distribution implementation did not account for multiple
      instances of the same host being present in `SLURM_HOSTFILE`,
      providing an incorrect process mapping in such case. This
      change also enables distributing tasks in blocks when using
      arbitrary distribution, which was not the case before. This
      only affects the `mpi/pmi2` plugin.
  * Removed Fix-test-21.41.patch as upstream test changed.
 -------------------------------------------------------------------
 Mon Mar 25 15:16:44 UTC 2024 - Christian Goll <cgoll@suse.com>
 - removed Keep-logs-of-skipped-test-when-running-test-cases-sequentially.patch
  as incorporated upstream
-* Changes in Slurm 23.02.5
+- Changes in Slurm 23.02.5
- * Add the JobId to debug() messages indicating when cpus_per_task/mem_per_cpu
+  * Add the `JobId` to `debug()` messages indicating when
-   or pn_min_cpus are being automatically adjusted.
+    `cpus_per_task/mem_per_cpu` or `pn_min_cpus` are being
- * Fix regression in 23.02.2 that caused slurmctld -R to crash on startup if
+    automatically adjusted.
-   a node features plugin is configured.
+  * Fix regression in 23.02.2 that caused `slurmctld -R` to crash on
    startup if a node features plugin is configured.
  * Fix and prevent reoccurring reservations from overlapping.
- * job_container/tmpfs - Avoid attempts to share BasePath between nodes.
+  * `job_container/tmpfs` - Avoid attempts to share `BasePath`
- * Change the log message warning for rate limited users from verbose to info.
+    between nodes.
- * With CR_Cpu_Memory, fix node selection for jobs that request gres and
+  * Change the log message warning for rate limited users from
-   *-mem-per-cpu.
+    verbose to info.
- * Fix a regression from 22.05.7 in which some jobs were allocated too few
+  * With `CR_Cpu_Memory`, fix node selection for jobs that request
-   nodes, thus overcommitting cpus to some tasks.
+    gres and `--mem-per-cpu`.
- * Fix a job being stuck in the completing state if the job ends while the
+  * Fix a regression from 22.05.7 in which some jobs were allocated
-   primary controller is down or unresponsive and the backup controller has
+    too few nodes, thus overcommitting cpus to some tasks.
-   not yet taken over.
+  * Fix a job being stuck in the completing state if the job ends
- * Fix slurmctld segfault when a node registers with a configured CpuSpecList
+    while the primary controller is down or unresponsive and the
-   while slurmctld configuration has the node without CpuSpecList.
+    backup controller has not yet taken over.
- * Fix cloud nodes getting stuck in POWERED_DOWN+NO_RESPOND state after not
+  * Fix `slurmctld` segfault when a node registers with a configured
-   registering by ResumeTimeout.
+    `CpuSpecList` while slurmctld configuration has the node without
- * slurmstepd - Avoid cleanup of config.json-less containers spooldir getting
+    `CpuSpecList`.
-   skipped.
+  * Fix cloud nodes getting stuck in `POWERED_DOWN+NO_RESPOND` state
- * slurmstepd - Cleanup per task generated environment for containers in
+    after not registering by `ResumeTimeout`.
-   spooldir.
+  * `slurmstepd` - Avoid cleanup of `config.json`-less containers
- * Fix scontrol segfault when 'completing' command requested repeatedly in
+    spooldir getting skipped.
-   interactive mode.
+  * `slurmstepd` - Cleanup per task generated environment for
- * Properly handle a race condition between bind() and listen() calls in the
+    containers in spooldir.
-   network stack when running with SrunPortRange set.
+  * Fix `scontrol` segfault when 'completing' command requested
- * Federation - Fix revoked jobs being returned regardless of the -a/--all
+    repeatedly in interactive mode.
-   option for privileged users.
+  * Properly handle a race condition between `bind()` and `listen()`
- * Federation - Fix canceling pending federated jobs from non-origin clusters
+    calls in the network stack when running with `SrunPortRange` set.
-   which could leave federated jobs orphaned from the origin cluster.
+  * Federation - Fix revoked jobs being returned regardless of the
- * Fix sinfo segfault when printing multiple clusters with --noheader option.
+    `-a`/`--all` option for privileged users.
- * Federation - fix clusters not syncing if clusters are added to a federation
+  * Federation - Fix canceling pending federated jobs from non-origin
-   before they have registered with the dbd.
+    clusters which could leave federated jobs orphaned from the origin
- * Change pmi2 plugin to honor the SrunPortRange option. This matches the new
+    cluster.
-   behavior of the pmix plugin in 23.02.0. Note that neither of these plugins
+  * Fix sinfo segfault when printing multiple clusters with
-   makes use of the "MpiParams=ports=" option, and previously were only limited
+    `--noheader` option.
-   by the systems ephemeral port range.
+  * Federation - fix clusters not syncing if clusters are added to
- * node_features/helpers - Fix node selection for jobs requesting changeable
+    a federation before they have registered with the dbd.
-   features with the '|' operator, which could prevent jobs from running on
+  * Change `pmi2` plugin to honor the `SrunPortRange` option. This
-   some valid nodes.
+    matches the new behavior of the pmix plugin in 23.02.0. Note that
- * node_features/helpers - Fix inconsistent handling of '&' and '|', where an
+    neither of these plugins makes use of the "`MpiParams=ports=`"
-   AND'd feature was sometimes AND'd to all sets of features instead of just
+    option, and previously were only limited by the system's ephemeral
-   the current set. E.g. "foo|bar&baz" was interpreted as {foo,baz} or
+    port range.
-   {bar,baz} instead of how it is documented: "{foo} or {bar,baz}".
+  * `node_features/helpers` - Fix node selection for jobs requesting
- * Fix job accounting so that when a job is requeued its allocated node count
+    changeable features with the '`|`' operator, which could prevent
-   is cleared. After the requeue, sacct will correctly show that the job has
+    jobs from running on some valid nodes.
-   0 AllocNodes while it is pending or if it is canceled before restarting.
+  * `node_features/helpers` - Fix inconsistent handling of '`&`' and
- * sacct - AllocCPUS now correctly shows 0 if a job has not yet received an
+    '`|`', where an AND'd feature was sometimes AND'd to all sets of
-   allocation or if the job was canceled before getting one.
+    features instead of just the current set. E.g. "`foo|bar&baz`" was
- * Fix intel oneapi autodetect: detect the /dev/dri/renderD[0-9]+ gpus, and do
+    interpreted as `{foo,baz}` or `{bar,baz}` instead of how it is
-   not detect /dev/dri/card[0*9]+.
+    documented: "`{foo} or {bar,baz}`".
- * Format batch, extern, interactive, and pending step ids into strings that
+  * Fix job accounting so that when a job is requeued its allocated
-   are human readable.
+    node count is cleared. After the requeue, sacct will correctly
- * Fix node selection for jobs that request --gpus and a number of tasks fewer
+    show that the job has 0 `AllocNodes` while it is pending or if
-   than gpus, which resulted in incorrectly rejecting these jobs.
+    it is canceled before restarting.
- * Remove MYSQL_OPT_RECONNECT completely.
+  * `sacct` - `AllocCPUS` now correctly shows 0 if a job has not yet
- * Fix cloud nodes in POWERING_UP state disappearing (getting set to FUTURE)
+    received an allocation or if the job was canceled before getting
-   when an `scontrol reconfigure` happens.
+    one.
- * openapi/dbv0.0.39 - Avoid assert / segfault on missing coordinators list.
+  * Fix intel oneapi autodetect: detect the `/dev/dri/renderD[0-9]+`
- * slurmrestd - Correct memory leak while parsing OpenAPI specification
+    gpus, and do not detect `/dev/dri/card[0-9]+`.
-   templates with server overrides.
+  * Format batch, extern, interactive, and pending step ids into
- * slurmrestd - Reduce memory usage when printing out job CPU frequency.
+    strings that are human readable.
  * Fix node selection for jobs that request `--gpus` and a number
    of tasks fewer than gpus, which resulted in incorrectly rejecting
    these jobs.
  * Remove `MYSQL_OPT_RECONNECT` completely.
  * Fix cloud nodes in `POWERING_UP` state disappearing (getting set
    to `FUTURE`) when an `scontrol reconfigure` happens.
  * `openapi/dbv0.0.39` - Avoid assert / segfault on missing
    coordinators list.
  * `slurmrestd` - Correct memory leak while parsing OpenAPI
    specification templates with server overrides.
  * `slurmrestd` - Reduce memory usage when printing out job CPU
    frequency.
  * Fix overwriting user node reason with system message.
- * Remove --uid / --gid options from salloc and srun commands.
+  * Remove `--uid` / `--gid` options from salloc and srun commands.
  * Prevent deadlock when rpc_queue is enabled.
- * slurmrestd - Correct OpenAPI specification generation bug where fields with
+  * `slurmrestd` - Correct OpenAPI specification generation bug where
-   overlapping parent paths would not get generated.
+    fields with overlapping parent paths would not get generated.
  * Fix memory leak as a result of a partition info query.
  * Fix memory leak as a result of a job info query.
- * slurmrestd - For 'GET /slurm/v0.0.39/node[s]', change format of node's
+  * `slurmrestd` - For `GET /slurm/v0.0.39/node[s]`, change format of
-   energy field "current_watts" to a dictionary to account for unset value
+    node's energy field `current_watts` to a dictionary to account
-   instead of dumping 4294967294.
+    for unset value instead of dumping `4294967294`.
- * slurmrestd - For 'GET /slurm/v0.0.39/qos', change format of QOS's
+  * `slurmrestd` - For `GET /slurm/v0.0.39/qos`, change format of
-   field "priority" to a dictionary to account for unset value instead of
+    QOS's field `priority` to a dictionary to account for unset
-   dumping 4294967294.
+    value instead of dumping `4294967294`.
- * slurmrestd - For 'GET /slurm/v0.0.39/job[s]', the 'return code' code field
+  * `slurmrestd` - For `GET /slurm/v0.0.39/job[s]`, the `return code`
-   in v0.0.39_job_exit_code will be set to *127 instead of being left unset
+    code field in `v0.0.39_job_exit_code` will be set to 127 instead
-   where job does not have a relevant return code.
+    of being left unset where job does not have a relevant return code.
- * data_parser/v0.0.39 - Add required/memory_per_cpu and
+  * `data_parser/v0.0.39` - Add `required/memory_per_cpu` and
-   required/memory_per_node to `sacct *-json` and `sacct --yaml` and
+    `required/memory_per_node` to `sacct --json` and `sacct --yaml` and
-   'GET /slurmdb/v0.0.39/jobs' from slurmrestd.
+    `GET /slurmdb/v0.0.39/jobs` from `slurmrestd`.
- * For step allocations, fix --gres=none sometimes not ignoring gres from the
+  * For step allocations, fix `--gres=none` sometimes not ignoring
-   job.
+    gres from the job.
- * Fix --exclusive jobs incorrectly gang-scheduling where they shouldn't.
+  * Fix `--exclusive` jobs incorrectly gang-scheduling where they
- * Fix allocations with CR_SOCKET, gres not assigned to a specific socket, and
+    shouldn't.
-   block core distribion potentially allocating more sockets than required.
+  * Fix allocations with `CR_SOCKET`, gres not assigned to a specific
- * gpu/oneapi - Store cores correctly so CPU affinity is tracked.
+    socket, and block core distribution potentially allocating more
- * Revert a change in 23.02.3 where Slurm would kill a script's process group
+    sockets than required.
-   as soon as the script ended instead of waiting as long as any process in
+  * `gpu/oneapi` - Store cores correctly so CPU affinity is tracked.
-   that process group held the stdout/stderr file descriptors open. That change
+  * Revert a change in 23.02.3 where Slurm would kill a script's
-   broke some scripts that relied on the previous behavior. Setting time limits
+    process group as soon as the script ended instead of waiting as
-   for scripts (such as PrologEpilogTimeout) is strongly encouraged to avoid
+    long as any process in that process group held the
-   Slurm waiting indefinitely for scripts to finish.
+    stdout/stderr file descriptors open.
    That change broke some scripts that relied on the previous
    behavior. Setting time limits for scripts (such as
    `PrologEpilogTimeout`) is strongly encouraged to avoid Slurm
    waiting indefinitely for scripts to finish.
  * Allow slurmdbd -R to work if the root assoc id is not 1.
- * Fix slurmdbd -R not returning an error under certain conditions.
+  * Fix `slurmdbd -R` not returning an error under certain conditions.
- * slurmdbd - Avoid potential NULL pointer dereference in the mysql plugin.
+  * `slurmdbd` - Avoid potential NULL pointer dereference in the
- * Revert a change in 23.02 where SLURM_NTASKS was no longer set in the job's
+    mysql plugin.
-   environment when *-ntasks-per-node was requested.
+  * Revert a change in 23.02 where `SLURM_NTASKS` was no longer
- * Limit periodic node registrations to 50 instead of the full TreeWidth.
+    set in the job's environment when `--ntasks-per-node` was
-   Since unresolvable cloud/dynamic nodes must disable fanout by setting
+    requested.
-   TreeWidth to a large number, this would cause all nodes to register at
+  * Limit periodic node registrations to 50 instead of the full
-   once.
+    `TreeWidth`.
- * Fix regression in 23.02.3 which broken x11 forwarding for hosts when
+    Since unresolvable `cloud/dynamic` nodes must disable fanout by
-   MUNGE sends a localhost address in the encode host field. This is caused
+    setting `TreeWidth` to a large number, this would cause all nodes
-   when the node hostname is mapped to 127.0.0.1 (or similar) in /etc/hosts.
+    to register at once.
- * openapi/[db]v0.0.39 - fix memory leak on parsing error.
+  * Fix regression in 23.02.3 which broke x11 forwarding for hosts
- * data_parser/v0.0.39 - fix updating qos for associations.
+    when `MUNGE` sends a localhost address in the encode host field.
- * openapi/dbv0.0.39 - fix updating values for associations with null users.
+    This is caused when the node hostname is mapped to 127.0.0.1
- * Fix minor memory leak with --tres-per-task and licenses.
+    (or similar) in `/etc/hosts`.
  * `openapi/[db]v0.0.39` - fix memory leak on parsing error.
  * `data_parser/v0.0.39` - fix updating qos for associations.
  * `openapi/dbv0.0.39` - fix updating values for associations with
    null users.
  * Fix minor memory leak with `--tres-per-task` and licenses.
  * Fix cyclic socket cpu distribution for tasks in a step where
-   --cpus-per-task < usable threads per core.
+    `--cpus-per-task` < usable threads per core.
 - Changes in Slurm 23.02.4
-  * Fix sbatch return code when **wait is requested on a job array.
+  * Fix `sbatch` return code when `--wait` is requested on a job array.
-  * switch/hpe_slingshot * avoid segfault when running with old libcxi.
+  * `switch/hpe_slingshot` - avoid segfault when running with old
-  * Avoid slurmctld segfault when specifying AccountingStorageExternalHost.
+    libcxi.
-  * Fix collected GPUUtilization values for acct_gather_profile plugins.
+  * Avoid slurmctld segfault when specifying
    `AccountingStorageExternalHost`.
  * Fix collected `GPUUtilization` values for `acct_gather_profile`
    plugins.
  * Fix slurmrestd handling of job hold/release operations.
-  * Make spank S_JOB_ARGV item value hold the requested command argv instead of
+  * Make spank `S_JOB_ARGV` item value hold the requested command
-    the srun **bcast value when **bcast requested (only in local context).
+    argv instead of the srun `--bcast` value when `--bcast` requested
-  * Fix step running indefinitely when slurmctld takes more than MessageTimeout
+    (only in local context).
-    to respond. Now, slurmctld will cancel the step when detected, preventing
+  * Fix step running indefinitely when slurmctld takes more than
-    following steps from getting stuck waiting for resources to be released.
+    `MessageTimeout` to respond. Now, `slurmctld` will cancel the
-  * Fix regression to make job_desc.min_cpus accurate again in job_submit when
+    step when detected, preventing following steps from getting stuck
-    requesting a job with **ntasks*per*node.
+    waiting for resources to be released.
-  * scontrol * Permit changes to StdErr and StdIn for pending jobs.
+  * Fix regression to make job_desc.min_cpus accurate again in
-  * scontrol * Reset std{err,in,out} when set to empty string.
+    job_submit when requesting a job with `--ntasks-per-node`.
-  * slurmrestd * mark environment as a required field for job submission
+  * `scontrol` - Permit changes to `StdErr` and `StdIn` for pending
-    descriptions.
+    jobs.
-  * slurmrestd * avoid dumping null in OpenAPI schema required fields.
+  * `scontrol` - Reset std{err,in,out} when set to empty string.
-  * data_parser/v0.0.39 * avoid rejecting valid memory_per_node formatted as
+  * `slurmrestd` - mark environment as a required field for job
-    dictionary provided with a job description.
+    submission descriptions.
-  * data_parser/v0.0.39 * avoid rejecting valid memory_per_cpu formatted as
+  * `slurmrestd` - avoid dumping null in OpenAPI schema required
-    dictionary provided with a job description.
+    fields.
-  * slurmrestd * Return HTTP error code 404 when job query fails.
+  * `data_parser/v0.0.39` - avoid rejecting valid `memory_per_node`
-  * slurmrestd * Add return schema to error response to job and license query.
+    formatted as dictionary provided with a job description.
  * `data_parser/v0.0.39` - avoid rejecting valid `memory_per_cpu`
    formatted as dictionary provided with a job description.
  * `slurmrestd` - Return HTTP error code 404 when job query fails.
  * `slurmrestd` - Add return schema to error response to job and
    license query.
  * Fix handling of ArrayTaskThrottle in backfill.
-  * Fix regression in 23.02.2 when checking gres state on slurmctld startup or
+  * Fix regression in 23.02.2 when checking gres state on `slurmctld`
-    reconfigure. Gres changes in the configuration were not updated on slurmctld
+    startup or reconfigure. Gres changes in the configuration were
-    startup. On startup or reconfigure, these messages were present in the log:
+    not updated on `slurmctld` startup. On startup or reconfigure,
-    "error: Attempt to change gres/gpu Count".
+    these messages were present in the log:
    "`error: Attempt to change gres/gpu Count`".
  * Fix potential double count of gres when dealing with limits.
-  * switch/hpe_slingshot * support alternate traffic class names with "TC_"
+  * `switch/hpe_slingshot` - support alternate traffic class names
-    prefix.
+    with "`TC_`" prefix.
-  * scrontab * Fix cutting off the final character of quoted variables.
+  * `scrontab` - Fix cutting off the final character of quoted
-  * Fix slurmstepd segfault when ContainerPath is not set in oci.conf
+    variables.
-  * Change the log message warning for rate limited users from debug to verbose.
+  * Fix `slurmstepd` segfault when `ContainerPath` is not set in
-  * Fixed an issue where jobs requesting licenses were incorrectly rejected.
+    `oci.conf`.
-  * smail * Fix issues where e*mails at job completion were not being sent.
+  * Change the log message warning for rate limited users from
-  * scontrol/slurmctld * fix comma parsing when updating a reservation's nodes.
+    debug to verbose.
-  * cgroup/v2 * Avoid capturing log output for ebpf when constraining devices,
+  * Fixed an issue where jobs requesting licenses were incorrectly
-    as this can lead to inadvertent failure if the log buffer is too small.
+    rejected.
-  * Fix **gpu*bind=single binding tasks to wrong gpus, leading to some gpus
+  * `smail` - Fix issues where emails at job completion were not
-    having more tasks than they should and other gpus being unused.
+    being sent.
-  * Fix main scheduler loop not starting after failover to backup controller.
+  * `scontrol/slurmctld` - fix comma parsing when updating a
-  * Added error message when attempting to use sattach on batch or extern steps.
+    reservation's nodes.
-  * Fix regression in 23.02 that causes slurmstepd to crash when srun requests
+  * `cgroup/v2` - Avoid capturing log output for ebpf when
-    more than TreeWidth nodes in a step and uses the pmi2 or pmix plugin.
+    constraining devices, as this can lead to inadvertent failure
-  * Reject job ArrayTaskThrottle update requests from unprivileged users.
+    if the log buffer is too small.
-  * data_parser/v0.0.39 * populate description fields of property objects in
+  * Fix `--gpu-bind=single` binding tasks to wrong gpus, leading to
-    generated OpenAPI specifications where defined.
+    some gpus having more tasks than they should and other gpus being
-  * slurmstepd * Avoid segfault caused by ContainerPath not being terminated by
+    unused.
-    '/' in oci.conf.
+  * Fix main scheduler loop not starting after failover to backup
-  * data_parser/v0.0.39 * Change v0.0.39_job_info response to tag exit_code
+    controller.
-    field as being complex instead of only an unsigned integer.
+  * Added error message when attempting to use sattach on batch or
-  * job_container/tmpfs * Fix %h and %n substitution in BasePath where %h was
+    extern steps.
-    substituted as the NodeName instead of the hostname, and %n was substituted
+  * Fix regression in 23.02 that causes slurmstepd to crash when
-    as an empty string.
+    `srun` requests more than `TreeWidth` nodes in a step and uses
-  * Fix regression where **cpu*bind=verbose would override TaskPluginParam.
+    the `pmi2` or `pmix` plugin.
-  * scancel * Fix **clusters/*M for federations. Only filtered jobs (e.g. *A,
+  * Reject job `ArrayTaskThrottle` update requests from unprivileged
-    *u, *p, etc.) from the specified clusters will be canceled, rather than all
+    users.
-    jobs in the federation. Specific jobids will still be routed to the origin
+  * `data_parser/v0.0.39` - populate description fields of property
-     cluster for cancellation.
+    objects in generated OpenAPI specifications where defined.
-
+  * `slurmstepd` - Avoid segfault caused by `ContainerPath` not being
    terminated by '`/`' in `oci.conf`.
  * `data_parser/v0.0.39` - Change `v0.0.39_job_info` response to tag
    `exit_code` field as being complex instead of only an unsigned
    integer.
  * `job_container/tmpfs` - Fix `%h` and `%n` substitution in `BasePath`
    where `%h` was substituted as the `NodeName` instead of the
    hostname, and `%n` was substituted as an empty string.
  * Fix regression where `--cpu-bind=verbose` would override
    `TaskPluginParam`.
  * `scancel` - Fix `--clusters`/`-M` for federations. Only filtered
    jobs (e.g. -A, -u, -p, etc.) from the specified clusters will be
    canceled, rather than all jobs in the federation.
    Specific jobids will still be routed to the origin cluster
    for cancellation.
 -------------------------------------------------------------------
 Mon Jan 29 13:47:55 UTC 2024 - Egbert Eich <eich@suse.com>
@ -2337,7 +2662,6 @@ Fri Jul  2 08:01:32 UTC 2021 - Christian Goll <cgoll@suse.com>
 - Updated to  20.11.8:
  * slurmctld - fix erroneous "StepId=CORRUPT" messages in error logs.
  * Correct the error given when auth plugin fails to pack a credential.
  * Fix unused-variable compiler warning on FreeBSD in fd_resolve_path().
  * acct_gather_filesystem/lustre - only emit collection error once per step.
  * Add GRES environment variables (e.g., CUDA_VISIBLE_DEVICES) into the
    interactive step, the same as is done for the batch step.
--- a/slurm.spec
+++ b/slurm.spec
@ -19,7 +19,7 @@
 # Check file META in sources: update so_version to (API_CURRENT - API_AGE)
 %define so_version 41
 # Make sure to update `upgrades` as well!
-%define ver 24.05.0
+%define ver 24.05.3
 %define _ver _24_05
 %define dl_ver %{ver}
 # so-version is 0 and seems to be stable
@ -59,6 +59,9 @@ ExclusiveArch:  do_not_build
 %if 0%{?sle_version} == 150500 || 0%{?sle_version} == 150600
 %define base_ver 2302
 %endif
 %if 0%{?sle_version} == 150500 || 0%{?sle_version} == 150600
 %define base_ver 2302
 %endif
 %define ver_m %{lua:x=string.gsub(rpm.expand("%ver"),"%.[^%.]*$","");print(x)}
 # Keep format_spec_file from botching the define below: