- Update to version 24.05.3

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=295
Egbert Eich 2024-10-15 06:51:09 +00:00 committed by Git OBS Bridge
parent fc209e050f
commit b2f6e848a1
4 changed files with 629 additions and 302 deletions


@ -1,3 +0,0 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a6d3e95f2bbda3c9567060efc3d7090ad8eac257fa3578798c89321957946e49
size 7117445

slurm-24.05.3.tar.bz2 Normal file

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:b0b40513e9b6ae867ddb95d60b950bcb980c15b735b5d0dea37a9a00cc64ae24
size 7189600


@ -1,312 +1,637 @@
-------------------------------------------------------------------
Mon Oct 14 10:40:10 UTC 2024 - Egbert Eich <eich@suse.com>
- Update to version 24.05.3
* `data_parser/v0.0.40` - Added field descriptions.
* `slurmrestd` - Avoid creating new slurmdbd connection per request
to `* /slurm/slurmctld/*/*` endpoints.
* Fix compilation issue with `switch/hpe_slingshot` plugin.
* Fix gres per task allocation with threads-per-core.
* `data_parser/v0.0.41` - Added field descriptions.
* `slurmrestd` - Change back generated OpenAPI schema for
`DELETE /slurm/v0.0.40/jobs/` to `RequestBody` instead of using
parameters for request. `slurmrestd` will continue to accept endpoint
requests via `RequestBody` or HTTP query.
* `topology/tree` - Fix issues with switch distance optimization.
* Fix potential segfault of secondary `slurmctld` when falling back
to the primary when running with a `JobComp` plugin.
* Enable `--json`/`--yaml=v0.0.39` options on client commands to
dump data using `data_parser/v0.0.39` instead of outputting nothing.
* `switch/hpe_slingshot` - Fix issue that could result in a 0 length
state file.
* Fix unnecessary message protocol downgrade for unregistered nodes.
* Fix unnecessarily packing alias addrs when terminating jobs with
a mix of non-cloud/dynamic nodes and powered down cloud/dynamic
nodes.
* `accounting_storage/mysql` - Fix issue when deleting a qos that
could remove too many commas from the qos and/or delta_qos fields
of the assoc table.
* `slurmctld` - Fix memory leak when using RestrictedCoresPerGPU.
* Fix allowing access to reservations without `MaxStartDelay` set.
* Fix regression introduced in 24.05.0rc1 breaking
`srun --send-libs` parsing.
* Fix slurmd vsize memory leak when using job submission/allocation
commands that implicitly or explicitly use --get-user-env.
* `slurmd` - Fix node going into invalid state when using
`CPUSpecList` and setting CPUs to the # of cores on a
multithreaded node.
* Fix reboot asap nodes being considered in backfill after a restart.
* Fix `--clusters`/`-M` queries for clusters outside of a
federation when `fed_display` is configured.
* Fix `scontrol` allowing updating job with bad cpus-per-task value.
* `sattach` - Fix regression from 24.05.2 security fix leading to
crash.
* `mpi/pmix` - Fix assertion when built under `--enable-debug`.
- Changes from Slurm 24.05.2
* Fix energy gathering rpc counter underflow in
`_rpc_acct_gather_energy` when more than 10 threads try to get
energy at the same time. This prevented any step from getting
energy from slurmd until slurmd was restarted, losing energy
accounting metrics in the node.
* `accounting_storage/mysql` - Fix issue where new user with `wckey`
did not have a default wckey sent to the slurmctld.
* `slurmrestd` - Prevent slurmrestd segfault when handling the
following endpoints when none of the optional parameters are
specified:
`DELETE /slurm/v0.0.40/jobs`
`DELETE /slurm/v0.0.41/jobs`
`GET /slurm/v0.0.40/shares`
`GET /slurm/v0.0.41/shares`
`GET /slurmdb/v0.0.40/instance`
`GET /slurmdb/v0.0.41/instance`
`GET /slurmdb/v0.0.40/instances`
`GET /slurmdb/v0.0.41/instances`
`POST /slurm/v0.0.40/job/{job_id}`
`POST /slurm/v0.0.41/job/{job_id}`
* Fix IPMI energy gathering when no IPMIPowerSensors are specified
in `acct_gather.conf`. This situation resulted in an accounted
energy of 0 for job steps.
* Fix a minor memory leak in slurmctld when updating a job dependency.
* `scontrol`,`squeue` - Fix regression that caused incorrect values
for multisocket nodes at `.jobs[].job_resources.nodes.allocation`
for `scontrol show jobs --(json|yaml)` and `squeue --(json|yaml)`.
* `slurmrestd` - Fix regression that caused incorrect values for
multisocket nodes at `.jobs[].job_resources.nodes.allocation` to
be dumped with endpoints:
`GET /slurm/v0.0.41/job/{job_id}`
`GET /slurm/v0.0.41/jobs`
* `jobcomp/filetxt` - Fix truncation of job record lines > 1024
characters.
* `switch/hpe_slingshot` - Drain node on failure to delete CXI
services.
* Fix a performance regression from 23.11.0 in cpu frequency
handling when no `CpuFreqDef` is defined.
* Fix one-task-per-sharing not working across multiple nodes.
* Fix inconsistent number of cpus when creating a reservation
using the TRESPerNode option.
* `data_parser/v0.0.40+` - Fix job state parsing which could
break filtering.
* Prevent `cpus-per-task` from being modified in jobs where a `-c`
value has been explicitly specified and the requested memory
constraints implicitly increase the number of CPUs to allocate.
* `slurmrestd` - Fix regression where args `-s v0.0.39,dbv0.0.39`
and `-d v0.0.39` would result in `GET /openapi/v3` not
registering as a valid possible query resulting in 404 errors.
* `slurmrestd` - Fix memory leak for dbv0.0.39 jobs query which
occurred if the query parameters specified account, association,
cluster, constraints, format, groups, job_name, partition, qos,
reason, reservation, state, users, or wckey. This affects the
following endpoints:
`GET /slurmdb/v0.0.39/jobs`
* `slurmrestd` - In the case the slurmdbd does not respond to a
persistent connection init message, prevent the closed fd from
being used, and instead emit an error or warning depending on
if the connection was required.
* Fix 24.05.0 regression that caused the slurmdbd not to send back
an error message if there is an error initializing a persistent
connection.
* Reduce latency of forwarded x11 packets.
* Add `curr_dependency` (representing the current dependency of
the job) and `orig_dependency` (representing the original requested
dependency of the job) fields to the job record in
`job_submit.lua` (for job update) and `jobcomp.lua`.
* Fix potential segfault of slurmctld configured with
`SlurmctldParameters=enable_rpc_queue` from happening on
reconfigure.
* Fix potential segfault of slurmctld on its shutdown when rate
limiting is enabled.
* `slurmrestd` - Fix missing job environment for `SLURM_JOB_NAME`,
`SLURM_OPEN_MODE`, `SLURM_JOB_DEPENDENCY`, `SLURM_PROFILE`,
`SLURM_ACCTG_FREQ`, `SLURM_NETWORK` and `SLURM_CPU_FREQ_REQ` to
match sbatch.
* Fix GRES environment variable indices being incorrect when only
using a subset of all GPUs on a node and the
`--gres-flags=allow-task-sharing` option.
* Prevent `scontrol` from segfaulting when requesting scontrol
show reservation `--json` or `--yaml` if there is an error
retrieving reservations from the `slurmctld`.
* `switch/hpe_slingshot` - Fix security issue around managing VNI
access. CVE-2024-42511.
* `switch/nvidia_imex` - Fix security issue managing IMEX channel
access. CVE-2024-42511.
* `switch/nvidia_imex` - Allow for compatibility with
`job_container/tmpfs`.
- Changes in Slurm 24.05.1
* Fix `slurmctld` and `slurmdbd` potentially stopping instead of
performing a logrotate when receiving `SIGUSR2` when using
`auth/slurm`.
* `switch/hpe_slingshot` - Fix slurmctld crash when upgrading
from 23.02.
* Fix "Could not find group" errors from `validate_group()` when
using `AllowGroups` with large `/etc/group` files.
* Add `AccountingStoreFlags=no_stdio` which, when set, avoids
recording the stdio paths of the job.
* `slurmrestd` - Prevent a slurmrestd segfault when parsing the
`crontab` field, which was never usable. Now it explicitly
ignores the value and emits a warning if it is used for the
following endpoints:
`POST /slurm/v0.0.39/job/{job_id}`
`POST /slurm/v0.0.39/job/submit`
`POST /slurm/v0.0.40/job/{job_id}`
`POST /slurm/v0.0.40/job/submit`
`POST /slurm/v0.0.41/job/{job_id}`
`POST /slurm/v0.0.41/job/submit`
`POST /slurm/v0.0.41/job/allocate`
* `mpi/pmi2` - Fix communication issue leading to task launch
failure with "`invalid kvs seq from node`".
* Fix getting user environment when using sbatch with
`--get-user-env` or `--export=` when there is a user profile
script that reads `/proc`.
* Prevent slurmd from crashing if `acct_gather_energy/gpu` is
configured but `GresTypes` is not configured.
* Do not log the following errors when `AcctGatherEnergyType`
plugins are used but a node does not have or cannot find sensors:
"`error: _get_joules_task: can't get info from slurmd`"
"`error: slurm_get_node_energy: Zero Bytes were transmitted or
received`"
However, the following error will continue to be logged:
"`error: Can't get energy data. No power sensors are available.
Try later`"
* `sbatch`, `srun` - Set `SLURM_NETWORK` environment variable if
`--network` is set.
* Fix cloud nodes not being able to forward to nodes that restarted
with new IP addresses.
* Fix cwd not being set correctly when running a SPANK plugin with a
`spank_user_init()` hook and the new "`contain_spank`" option set.
* `slurmctld` - Avoid deadlock during shutdown when `auth/slurm`
is active.
* Fix segfault in `slurmctld` with `topology/block`.
* `sacct` - Fix printing of job group for job steps.
* `scrun` - Log when an invalid environment variable causes the
job submission to be rejected.
* `accounting_storage/mysql` - Fix problem where listing or
modifying an association when specifying a qos list could hang
or take a very long time.
* `gpu/nvml` - Fix `gpuutil/gpumem` only tracking last GPU in step.
Now, `gpuutil/gpumem` will record sums of all GPUs in the step.
* Fix error in `scrontab` jobs when using
`slurm.conf:PropagatePrioProcess=1`.
* Fix `slurmctld` crash on a batch job submission with
`--nodes 0,...`.
* Fix dynamic IP address fanout forwarding when using `auth/slurm`.
* Restrict listening sockets in the `mpi/pmix` plugin and `sattach`
to the `SrunPortRange`.
* `slurmrestd` - Limit mime types returned from query to
`GET /openapi/v3` to only return one mime type per serializer
plugin to fix issues with OpenAPI client generators that are
unable to handle multiple mime type aliases.
* Fix many commands possibly reporting an "`Unexpected Message
Received`" when in reality the connection timed out.
* Prevent slurmctld from starting if there is not a json
serializer present and the `extra_constraints` feature is enabled.
* Fix heterogeneous job components not being signaled with
`scancel --ctld` and `DELETE slurm/v0.0.40/jobs` if the job ids
are not explicitly given, the heterogeneous job components match
the given filters, and the heterogeneous job leader does not
match the given filters.
* Fix regression from 23.02 impeding job licenses from being cleared.
* Move to a `log_flag` the `_get_joules_task` error that was
logged to the user when too many rpcs were queued in slurmd
for gathering energy.
* For `scancel --ctld` and the associated rest api endpoints:
`DELETE /slurm/v0.0.40/jobs`
`DELETE /slurm/v0.0.41/jobs`
Fix canceling the final array task in a job array when the task
is pending and all array tasks have been split into separate job
records. Previously this task was not canceled.
* Fix `power_save` operation after recovering from a failed
reconfigure.
* `slurmctld` - Skip removing the pidfile when running under
systemd. In that situation it is never created in the first place.
* Fix issue where altering the flags on a Slurm account
(`UsersAreCoords`) caused several limits on the account's
association to be set to 0 in Slurm's internal cache.
* Fix memory leak in the controller when relaying `stepmgr` step
accounting to the dbd.
* Fix segfault when submitting stepmgr jobs within an existing
allocation.
* Added `disable_slurm_hydra_bootstrap` as a possible `MpiParams`
parameter in `slurm.conf`. Using this will disable env variable
injection to allocations for the following variables:
`I_MPI_HYDRA_BOOTSTRAP`, `I_MPI_HYDRA_BOOTSTRAP_EXEC_EXTRA_ARGS`,
`HYDRA_BOOTSTRAP`, `HYDRA_LAUNCHER_EXTRA_ARGS`.
* `scrun` - Delay shutdown until after start requested.
This caused `scrun` to never start or shutdown and hung forever
when using `--tty`.
* Fix backup `slurmctld` potentially not running the agent when
taking over as the primary controller.
* Fix primary controller not running the agent when a reconfigure
of the `slurmctld` fails.
* `slurmd` - fix premature timeout waiting for
`REQUEST_LAUNCH_PROLOG` with large array jobs causing node to
drain.
* `jobcomp/{elasticsearch,kafka}` - Avoid sending fields with
invalid date/time.
* `jobcomp/elasticsearch` - Fix `slurmctld` memory leak from
curl usage.
* `acct_gather_profile/influxdb` - Fix slurmstepd memory leak from
curl usage.
* Fix 24.05.0 regression not deleting job hash dirs after
`MinJobAge`.
* Fix filtering arguments being ignored when using squeue `--json`.
* `switch/nvidia_imex` - Move setup call after `spank_init()` to
allow namespace manipulation within the SPANK plugin.
* `switch/nvidia_imex` - Skip plugin operation if
`nvidia-caps-imex-channels` device is not present rather than
preventing slurmd from starting.
* `switch/nvidia_imex` - Skip plugin operation if
`job_container/tmpfs` is configured due to incompatibility.
* `switch/nvidia_imex` - Remove any pre-existing channels when
`slurmd` starts.
* `rpc_queue` - Add support for an optional `rpc_queue.yaml`
configuration file.
* `slurmrestd` - Add new `+prefer_refs` flag to `data_parser/v0.0.41`
plugin. This flag will avoid inlining single referenced schemas
in the OpenAPI schema.
-------------------------------------------------------------------
Tue Jun 4 09:36:54 UTC 2024 - Christian Goll <cgoll@suse.com>
- Updated to new release 24.05.0 with the following major changes
  * Important Notes:
    If using the slurmdbd (Slurm DataBase Daemon) you must update
    this first. NOTE: If using a backup DBD you must start the
    primary first to do any database conversion, the backup will not
    start until this has happened. The 24.05 slurmdbd will work
    with Slurm daemons of version 23.02 and above. You will not
    need to update all clusters at the same time, but it is very
    important to update slurmdbd first and have it running before
    updating any other clusters making use of it.
  * Highlights
    + Federation - allow client command operation when slurmdbd is
      unavailable.
    + `burst_buffer/lua` - Added two new hooks: `slurm_bb_test_data_in`
      and `slurm_bb_test_data_out`. The syntax and use of the new hooks
      are documented in `etc/burst_buffer.lua.example`. These are
      required to exist. slurmctld now checks on startup if the
      `burst_buffer.lua` script loads and contains all required hooks;
      `slurmctld` will exit with a fatal error if this is not
      successful. Added `PollInterval` to `burst_buffer.conf`. Removed
      the arbitrary limit of 512 copies of the script running
      simultaneously.
    + Add QOS limit `MaxTRESRunMinsPerAccount`.
    + Add QOS limit `MaxTRESRunMinsPerUser`.
    + Add `ELIGIBLE` environment variable to `jobcomp/script` plugin.
    + Always use the QOS name for `SLURM_JOB_QOS` environment variables.
      Previously the batch environment would use the description field,
      which was usually equivalent to the name.
    + `cgroup/v2` - Require dbus-1 version >= 1.11.16.
    + Allow `NodeSet` names to be used in `SuspendExcNodes`.
    + `SuspendExcNodes=<nodes>:N` now counts allocated nodes in `N`.
      The first `N` powered up nodes in <nodes> are protected from
      being suspended.
    + Store job output, input and error paths in `SlurmDBD`.
    + Add `USER_DELETE` reservation flag to allow users with access
      to a reservation to delete it.
    + Add `SlurmctldParameters=enable_stepmgr` to enable step
      management through the `slurmstepd` instead of the controller.
    + Added `PrologFlags=RunInJob` to make prolog and epilog run
      inside the job extern step to include it in the job's cgroup.
    + Add ability to reserve MPI ports at the job level for stepmgr
      jobs and subdivide them at the step level.
    + `slurmrestd` - Add `--generate-openapi-spec` argument.
  * Configuration File Changes (see appropriate man page for details)
    + `CoreSpecPlugin` has been removed.
    + Removed `TopologyPlugin` tree and dragonfly support from
      `select/linear`. If those topology plugins are desired please
      switch to `select/cons_tres`.
    + Changed the default value for `UnkillableStepTimeout` to 60
      seconds or five times the value of `MessageTimeout`, whichever
      is greater.
    + An error log has been added if `JobAcctGatherParams` '`UsePss`'
      or '`NoShare`' are configured with a plugin other than
      `jobacct_gather/linux`. In such case these parameters are ignored.
    + `helpers.conf` - Added `Flags=rebootless` parameter allowing
      feature changes without rebooting compute nodes.
    + `topology/block` - Replaced the `BlockLevels` with `BlockSizes`
      in `topology.conf`.
    + Add `contain_spank` option to `SlurmdParameters`. When set,
      `spank_user_init()`, `spank_task_post_fork()`, and
      `spank_task_exit()` will execute within the
      `job_container/tmpfs` plugin namespace.
    + Add `SlurmctldParameters=max_powered_nodes=N`, which prevents
      powering up nodes after the max is reached.
    + Add `ExclusiveTopo` to a partition definition in `slurm.conf`.
    + Add `AccountingStorageParameters=max_step_records` to limit how
      many steps are recorded in the database for each job - excluding
      batch.
  * Command Changes (see man pages for details)
    + Add support for "elevenses" as an additional time specification.
    + Add support for `sbcast --preserve` when `job_container/tmpfs`
      configured (previously documented as unsupported).
    + `scontrol` - Add new subcommand `power` for node power control.
    + `squeue` - Adjust `StdErr`, `StdOut`, and `StdIn` output formats.
      These will now consistently print "`(null)`" if a value is
      unavailable. `StdErr` will no longer display `StdOut` if it is
      not distinctly set. `StdOut` will now correctly display the
      default filename pattern for job arrays, and no longer show it
      for non-batch jobs. However, the expansion patterns will
      no longer be substituted by default.
    + Add `--segment` to job allocation to be used in topology/block.
    + Add `--exclusive=topo` for use with topology/block.
    + `squeue` - Add `--expand-patterns` option to expand `StdErr`,
      `StdOut`, `StdIn` filename patterns as best as possible.
    + `sacct` - Add `--expand-patterns` option to expand `StdErr`,
      `StdOut`, `StdIn` filename patterns as best as possible.
    + `sreport` - Requesting `format=Planned` will now return the
      expected `Planned` time as documented, instead of `PlannedDown`.
      To request `Planned Down`, one must now use `format=PLNDDown`
      or `format=PlannedDown` explicitly. The abbreviations
      "`Pl`" or "`Pla`" will now make reference to Planned instead
      of `PlannedDown`.
  * API Changes
    + Removed `ListIterator` type from `<slurm/slurm.h>`.
    + Removed `slurm_xlate_job_id()` from `<slurm/slurm.h>`.
  * SLURMRESTD Changes
    + `openapi/dbv0.0.38` and `openapi/v0.0.38` plugins have been
      removed.
    + `openapi/dbv0.0.39` and `openapi/v0.0.39` plugins have been
      tagged as deprecated to warn of their removal in the next release.
    + Changed `slurmrestd.service` to only listen on TCP socket by
      default. Environments with existing drop-in units for the
      service may need further adjustments to work after upgrading.
    + `slurmrestd` - Tagged `script` field as deprecated in
      `POST /slurm/v0.0.41/job/submit` in anticipation of removal in
      future OpenAPI plugin versions. Job submissions should set the
      `job.script` (or `jobs[0].script` for HetJobs) fields instead.
    + `slurmrestd` - Attempt to automatically convert enumerated
      string arrays with incoming non-string values into strings.
      Add warning when incoming value for enumerated string arrays
      can not be converted to string and silently ignore instead of
      rejecting entire request. This change affects any endpoint that
      uses an enumerated string as given in the OpenAPI specification.
      An example of this conversion would be to
      `POST /slurm/v0.0.41/job/submit` with `.job.exclusive = true`.
      While the JSON (boolean) true value matches a possible
      enumeration, it is not the expected "true" string. This change
      automatically converts the (boolean) `true` to (string) "`true`"
      avoiding a parsing failure.
    + `slurmrestd` - Add `POST /slurm/v0.0.41/job/allocate` endpoint.
      This endpoint will create a new job allocation without any steps.
      The allocation will need to be ended via signaling the job or
      it will run to the timelimit.
    + `slurmrestd` - Allow startup when `slurmdbd` is not configured
      and avoid loading `slurmdbd` specific plugins.
  * MPI/PMI2 Changes
    + Jobs submitted with the `SLURM_HOSTFILE` environment variable
      set imply using an arbitrary distribution. Nevertheless, the
      logic used in PMI2 when generating their associated
      `PMI_process_mapping` values has been changed and will now be
      the same used for the plane distribution, as if `-m plane` were
      used. This has been changed because the original arbitrary
      distribution implementation did not account for multiple
      instances of the same host being present in `SLURM_HOSTFILE`,
      providing an incorrect process mapping in such case. This
      change also enables distributing tasks in blocks when using
      arbitrary distribution, which was not the case before. This
      only affects the `mpi/pmi2` plugin.
  * Removed Fix-test-21.41.patch as upstream test changed.
-------------------------------------------------------------------
Mon Mar 25 15:16:44 UTC 2024 - Christian Goll <cgoll@suse.com>
- removed Keep-logs-of-skipped-test-when-running-test-cases-sequentially.patch
  as incorporated upstream
- Changes in Slurm 23.02.5
  * Add the `JobId` to `debug()` messages indicating when
    `cpus_per_task/mem_per_cpu` or `pn_min_cpus` are being
    automatically adjusted.
  * Fix regression in 23.02.2 that caused `slurmctld -R` to crash on
    startup if a node features plugin is configured.
  * Fix and prevent reoccurring reservations from overlapping.
  * `job_container/tmpfs` - Avoid attempts to share `BasePath`
    between nodes.
  * Change the log message warning for rate limited users from
    verbose to info.
  * With `CR_Cpu_Memory`, fix node selection for jobs that request
    gres and `--mem-per-cpu`.
  * Fix a regression from 22.05.7 in which some jobs were allocated
    too few nodes, thus overcommitting cpus to some tasks.
  * Fix a job being stuck in the completing state if the job ends
    while the primary controller is down or unresponsive and the
    backup controller has not yet taken over.
  * Fix `slurmctld` segfault when a node registers with a configured
    `CpuSpecList` while the slurmctld configuration has the node
    without `CpuSpecList`.
  * Fix cloud nodes getting stuck in `POWERED_DOWN+NO_RESPOND` state
    after not registering by `ResumeTimeout`.
  * `slurmstepd` - Avoid cleanup of `config.json`-less containers
    spooldir getting skipped.
  * `slurmstepd` - Cleanup per task generated environment for
    containers in spooldir.
  * Fix `scontrol` segfault when 'completing' command requested
    repeatedly in interactive mode.
  * Properly handle a race condition between `bind()` and `listen()`
    calls in the network stack when running with `SrunPortRange` set.
  * Federation - Fix revoked jobs being returned regardless of the
    `-a`/`--all` option for privileged users.
  * Federation - Fix canceling pending federated jobs from non-origin
    clusters which could leave federated jobs orphaned from the origin
    cluster.
  * Fix sinfo segfault when printing multiple clusters with
    `--noheader` option.
  * Federation - fix clusters not syncing if clusters are added to
    a federation before they have registered with the dbd.
  * Change `pmi2` plugin to honor the `SrunPortRange` option. This
    matches the new behavior of the pmix plugin in 23.02.0. Note that
    neither of these plugins makes use of the "`MpiParams=ports=`"
    option, and previously were only limited by the system's ephemeral
    port range.
  * `node_features/helpers` - Fix node selection for jobs requesting
    changeable features with the '`|`' operator, which could prevent
    jobs from running on some valid nodes.
  * `node_features/helpers` - Fix inconsistent handling of '`&`' and
    '`|`', where an AND'd feature was sometimes AND'd to all sets of
    features instead of just the current set. E.g. "`foo|bar&baz`" was
    interpreted as `{foo,baz}` or `{bar,baz}` instead of how it is
    documented: "`{foo} or {bar,baz}`".
  * Fix job accounting so that when a job is requeued its allocated
    node count is cleared. After the requeue, sacct will correctly
    show that the job has 0 `AllocNodes` while it is pending or if
    it is canceled before restarting.
  * `sacct` - `AllocCPUS` now correctly shows 0 if a job has not yet
    received an allocation or if the job was canceled before getting
    one.
  * Fix intel oneapi autodetect: detect the `/dev/dri/renderD[0-9]+`
    gpus, and do not detect `/dev/dri/card[0-9]+`.
  * Format batch, extern, interactive, and pending step ids into
    strings that are human readable.
  * Fix node selection for jobs that request `--gpus` and a number
    of tasks fewer than gpus, which resulted in incorrectly rejecting
    these jobs.
  * Remove `MYSQL_OPT_RECONNECT` completely.
  * Fix cloud nodes in `POWERING_UP` state disappearing (getting set
    to `FUTURE`) when an `scontrol reconfigure` happens.
  * `openapi/dbv0.0.39` - Avoid assert / segfault on missing
    coordinators list.
  * `slurmrestd` - Correct memory leak while parsing OpenAPI
    specification templates with server overrides.
  * `slurmrestd` - Reduce memory usage when printing out job CPU
    frequency.
  * Fix overwriting user node reason with system message.
  * Remove `--uid` / `--gid` options from salloc and srun commands.
  * Prevent deadlock when rpc_queue is enabled.
  * `slurmrestd` - Correct OpenAPI specification generation bug where
    fields with overlapping parent paths would not get generated.
  * Fix memory leak as a result of a partition info query.
  * Fix memory leak as a result of a job info query.
  * `slurmrestd` - For `GET /slurm/v0.0.39/node[s]`, change format of
    node's energy field `current_watts` to a dictionary to account
    for unset value instead of dumping `4294967294`.
  * `slurmrestd` - For `GET /slurm/v0.0.39/qos`, change format of
    QOS's field `priority` to a dictionary to account for unset
    value instead of dumping `4294967294`.
  * `slurmrestd` - For `GET /slurm/v0.0.39/job[s]`, the `return code`
    code field in `v0.0.39_job_exit_code` will be set to -127 instead
    of being left unset where the job does not have a relevant return
    code.
  * `data_parser/v0.0.39` - Add `required/memory_per_cpu` and
    `required/memory_per_node` to `sacct --json` and `sacct --yaml`
    and `GET /slurmdb/v0.0.39/jobs` from `slurmrestd`.
  * For step allocations, fix `--gres=none` sometimes not ignoring
    gres from the job.
  * Fix `--exclusive` jobs incorrectly gang-scheduling where they
    shouldn't.
  * Fix allocations with `CR_SOCKET`, gres not assigned to a specific
    socket, and block core distribution potentially allocating more
    sockets than required.
  * `gpu/oneapi` - Store cores correctly so CPU affinity is tracked.
  * Revert a change in 23.02.3 where Slurm would kill a script's
    process group as soon as the script ended instead of waiting as
    long as any process in that process group held the stdout/stderr
    file descriptors open. That change broke some scripts that relied
    on the previous behavior. Setting time limits for scripts (such
    as `PrologEpilogTimeout`) is strongly encouraged to avoid Slurm
    waiting indefinitely for scripts to finish.
  * Allow `slurmdbd -R` to work if the root assoc id is not 1.
  * Fix `slurmdbd -R` not returning an error under certain conditions.
  * `slurmdbd` - Avoid potential NULL pointer dereference in the
    mysql plugin.
  * Revert a change in 23.02 where `SLURM_NTASKS` was no longer
    set in the job's environment when `--ntasks-per-node` was
    requested.
  * Limit periodic node registrations to 50 instead of the full
    `TreeWidth`. Since unresolvable `cloud/dynamic` nodes must
    disable fanout by setting `TreeWidth` to a large number, this
    would cause all nodes to register at once.
  * Fix regression in 23.02.3 which broke x11 forwarding for hosts
    when `MUNGE` sends a localhost address in the encode host field.
    This is caused when the node hostname is mapped to 127.0.0.1
    (or similar) in `/etc/hosts`.
  * `openapi/[db]v0.0.39` - fix memory leak on parsing error.
  * `data_parser/v0.0.39` - fix updating qos for associations.
  * `openapi/dbv0.0.39` - fix updating values for associations with
    null users.
  * Fix minor memory leak with `--tres-per-task` and licenses.
  * Fix cyclic socket cpu distribution for tasks in a step where
    `--cpus-per-task` < usable threads per core.
- Changes in Slurm 23.02.4
  * Fix `sbatch` return code when `--wait` is requested on a job
    array.
  * `switch/hpe_slingshot` - avoid segfault when running with old
    libcxi.
  * Avoid slurmctld segfault when specifying
    `AccountingStorageExternalHost`.
  * Fix collected `GPUUtilization` values for `acct_gather_profile`
    plugins.
  * Fix slurmrestd handling of job hold/release operations.
  * Make spank `S_JOB_ARGV` item value hold the requested command
    argv instead of the srun `--bcast` value when `--bcast` requested
    (only in local context).
  * Fix step running indefinitely when slurmctld takes more than
    `MessageTimeout` to respond. Now, `slurmctld` will cancel the
    step when detected, preventing following steps from getting stuck
    waiting for resources to be released.
  * Fix regression to make `job_desc.min_cpus` accurate again in
    job_submit when requesting a job with `--ntasks-per-node`.
  * `scontrol` - Permit changes to `StdErr` and `StdIn` for pending
    jobs.
  * `scontrol` - Reset std{err,in,out} when set to empty string.
  * `slurmrestd` - mark environment as a required field for job
    submission descriptions.
  * `slurmrestd` - avoid dumping null in OpenAPI schema required
    fields.
  * `data_parser/v0.0.39` - avoid rejecting valid `memory_per_node`
    formatted as dictionary provided with a job description.
  * `data_parser/v0.0.39` - avoid rejecting valid `memory_per_cpu`
    formatted as dictionary provided with a job description.
  * `slurmrestd` - Return HTTP error code 404 when job query fails.
  * `slurmrestd` - Add return schema to error response to job and
    license query.
  * Fix handling of `ArrayTaskThrottle` in backfill.
  * Fix regression in 23.02.2 when checking gres state on `slurmctld`
    startup or reconfigure. Gres changes in the configuration were
    not updated on `slurmctld` startup. On startup or reconfigure,
    these messages were present in the log:
    "`error: Attempt to change gres/gpu Count`".
  * Fix potential double count of gres when dealing with limits.
  * `switch/hpe_slingshot` - support alternate traffic class names
    with "`TC_`" prefix.
  * `scrontab` - Fix cutting off the final character of quoted
    variables.
  * Fix `slurmstepd` segfault when `ContainerPath` is not set in
    `oci.conf`.
  * Change the log message warning for rate limited users from
    debug to verbose.
  * Fixed an issue where jobs requesting licenses were incorrectly
    rejected.
  * `smail` - Fix issues where emails at job completion were not
    being sent.
  * `scontrol/slurmctld` - fix comma parsing when updating a
    reservation's nodes.
  * `cgroup/v2` - Avoid capturing log output for ebpf when
    constraining devices, as this can lead to inadvertent failure
    if the log buffer is too small.
  * Fix `--gpu-bind=single` binding tasks to wrong gpus, leading to
    some gpus having more tasks than they should and other gpus
    being unused.
  * Fix main scheduler loop not starting after failover to backup
    controller.
  * Added error message when attempting to use sattach on batch or
    extern steps.
  * Fix regression in 23.02 that causes slurmstepd to crash when
    `srun` requests more than `TreeWidth` nodes in a step and uses
    the `pmi2` or `pmix` plugin.
  * Reject job `ArrayTaskThrottle` update requests from unprivileged
    users.
  * `data_parser/v0.0.39` - populate description fields of property
    objects in generated OpenAPI specifications where defined.
  * `slurmstepd` - Avoid segfault caused by `ContainerPath` not being
    terminated by '`/`' in `oci.conf`.
  * `data_parser/v0.0.39` - Change `v0.0.39_job_info` response to tag
    `exit_code` field as being complex instead of only an unsigned
    integer.
  * `job_container/tmpfs` - Fix %h and %n substitution in `BasePath`
    where `%h` was substituted as the `NodeName` instead of the
    hostname, and `%n` was substituted as an empty string.
  * Fix regression where `--cpu-bind=verbose` would override
    `TaskPluginParam`.
  * `scancel` - Fix `--clusters`/`-M` for federations. Only filtered
    jobs (e.g. -A, -u, -p, etc.) from the specified clusters will be
    canceled, rather than all jobs in the federation. Specific jobids
    will still be routed to the origin cluster for cancellation.
-------------------------------------------------------------------
Mon Jan 29 13:47:55 UTC 2024 - Egbert Eich <eich@suse.com>
@ -2337,7 +2662,6 @@ Fri Jul 2 08:01:32 UTC 2021 - Christian Goll <cgoll@suse.com>
- Updated to 20.11.8:
  * slurmctld - fix erroneous "StepId=CORRUPT" messages in error logs.
  * Correct the error given when auth plugin fails to pack a credential.
  * Fix unused-variable compiler warning on FreeBSD in fd_resolve_path().
  * acct_gather_filesystem/lustre - only emit collection error once per step.
  * Add GRES environment variables (e.g., CUDA_VISIBLE_DEVICES) into the
    interactive step, the same as is done for the batch step.


@ -19,7 +19,7 @@
# Check file META in sources: update so_version to (API_CURRENT - API_AGE)
%define so_version 41
# Make sure to update `upgrades` as well!
%define ver 24.05.3
%define _ver _24_05
%define dl_ver %{ver}
# so-version is 0 and seems to be stable
@ -59,6 +59,9 @@ ExclusiveArch: do_not_build
%if 0%{?sle_version} == 150500 || 0%{?sle_version} == 150600
%define base_ver 2302
%endif
%if 0%{?sle_version} == 150500 || 0%{?sle_version} == 150600
%define base_ver 2302
%endif
%define ver_m %{lua:x=string.gsub(rpm.expand("%ver"),"%.[^%.]*$","");print(x)}
# Keep format_spec_file from botching the define below: