- update to 23.02.6 to fix (CVE-2023-41914)
* Removed Fix-test-32.8.patch as fixed upstream
* Bug Fixes:
+ Fix `CpusPerTres=` not updatable with scontrol update.
+ Fix unintentional gres removal when validating the gres job state.
+ Fix `--without-hpe-slingshot` configure option.
+ Fix cgroup v2 memory calculations when transparent huge pages are used.
+ Fix parsing of `sgather --timeout` option.
+ Fix regression from 22.05.0 that caused `srun --cpu-bind "=verbose"`
and `"=v"` options to give different CPU bind masks.
+ Fix "_find_node_record: lookup failure for node" error message appearing
for all dynamic nodes during reconfigure.
+ Avoid segfault if loading serializer plugin fails.
+ `slurmrestd` - Correct OpenAPI format for `GET /slurm/v0.0.39/licenses`.
+ `slurmrestd` - Correct OpenAPI format for
`GET /slurm/v0.0.39/job/{job_id}`.
+ `slurmrestd` - Change format to multiple fields in
`GET /slurmdb/v0.0.39/associations` and `GET /slurmdb/v0.0.39/qos` to
handle infinite and unset states.
+ When a node fails in a job with `--no-kill`, preserve the extern step on the
remaining nodes to avoid breaking features that rely on the extern step
such as `pam_slurm_adopt`, `x11`, and `job_container/tmpfs`.
+ `auth/jwt` - Ignore `x5c` field in JWKS files.
+ `auth/jwt` - Treat `alg` field as optional in JWKS files.
+ Allow job_desc.selinux_context to be read from the job_submit.lua script.
+ Skip check in slurmstepd that causes a large number of errors in the
munge log: "Unauthorized credential for client UID=0 GID=0".
This error will still appear on `slurmd`/`slurmctld`/`slurmdbd` start up
and is not a cause for concern.
+ `slurmctld` - Allow startup with zero partitions.
OBS-URL: https://build.opensuse.org/request/show/1117163
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=96
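Note on the two `auth/jwt` entries above: the sketch below illustrates what "ignore `x5c`" and "treat `alg` as optional" could mean for a JWKS consumer. It is not the plugin's code. The function name `normalize_jwks`, the sample keys, and the `RS256` fallback are assumptions made for this illustration only.

```python
import json


def normalize_jwks(jwks_text, default_alg="RS256"):
    """Return the keys of a JWKS document in a normalized form.

    default_alg is a hypothetical fallback chosen for this sketch.
    """
    keys = []
    for key in json.loads(jwks_text).get("keys", []):
        key = dict(key)                      # work on a copy
        key.pop("x5c", None)                 # certificate chain is ignored
        key.setdefault("alg", default_alg)   # "alg" may be absent
        keys.append(key)
    return keys


# Sample document with made-up key material.
sample = json.dumps({"keys": [
    {"kty": "RSA", "kid": "k1", "n": "AQAB", "e": "AQAB", "x5c": ["MIIB"]},
    {"kty": "RSA", "kid": "k2", "alg": "RS256", "n": "AQAB", "e": "AQAB"},
]})

for key in normalize_jwks(sample):
    print(key["kid"], key["alg"], "x5c" in key)
```

Both sample keys come out with an `alg` value and without `x5c`, whether or not the input carried those fields.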
* Bug Fixes:
+ Fix CpusPerTres= not updatable with scontrol update.
+ Fix unintentional gres removal when validating the gres job state.
+ Fix --without-hpe-slingshot configure option.
+ Fix cgroup v2 memory calculations when transparent huge pages are used.
+ Fix parsing of sgather --timeout option.
+ Fix regression from 22.05.0 that caused srun --cpu-bind "=verbose" and "=v"
options to give different CPU bind masks.
+ Fix "_find_node_record: lookup failure for node" error message appearing
for all dynamic nodes during reconfigure.
+ Avoid segfault if loading serializer plugin fails.
+ slurmrestd - Correct OpenAPI format for 'GET /slurm/v0.0.39/licenses'.
+ slurmrestd - Correct OpenAPI format for 'GET /slurm/v0.0.39/job/{job_id}'.
+ slurmrestd - Change format to multiple fields in 'GET
/slurmdb/v0.0.39/associations' and 'GET /slurmdb/v0.0.39/qos' to handle
infinite and unset states.
+ When a node fails in a job with --no-kill, preserve the extern step on the
remaining nodes to avoid breaking features that rely on the extern step
such as pam_slurm_adopt, x11, and job_container/tmpfs.
+ auth/jwt - Ignore 'x5c' field in JWKS files.
+ auth/jwt - Treat 'alg' field as optional in JWKS files.
+ Allow job_desc.selinux_context to be read from the job_submit.lua script.
+ Skip check in slurmstepd that causes a large number of errors in the munge
log: "Unauthorized credential for client UID=0 GID=0". This error will
still appear on slurmd/slurmctld/slurmdbd start up and is not a cause for
concern.
+ slurmctld - Allow startup with zero partitions.
+ Fix some mig profile names in slurm not matching nvidia mig profiles.
+ Prevent slurmscriptd processing delays from blocking other threads in
slurmctld while trying to launch {Prolog|Epilog}Slurmctld.
OBS-URL: https://build.opensuse.org/request/show/1117145
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=268
- Updated to version 23.02.5 with the following changes:
* Bug Fixes:
+ Revert a change in 23.02 where `SLURM_NTASKS` was no longer set in the
job's environment when `--ntasks-per-node` was requested.
The method by which it is being set, however, is different and should be more
accurate in more situations.
+ Change pmi2 plugin to honor the `SrunPortRange` option. This matches the
new behavior of the pmix plugin in 23.02.0. Note that neither of these
plugins makes use of the `MpiParams=ports=` option, and previously
were only limited by the system's ephemeral port range.
+ Fix regression in 23.02.2 that caused slurmctld -R to crash on startup if
a node features plugin is configured.
+ Fix and prevent reoccurring reservations from overlapping.
+ `job_container/tmpfs` - Avoid attempts to share BasePath between nodes.
+ With `CR_Cpu_Memory`, fix node selection for jobs that request gres and
`--mem-per-cpu`.
+ Fix a regression from 22.05.7 in which some jobs were allocated too few
nodes, thus overcommitting cpus to some tasks.
+ Fix a job being stuck in the completing state if the job ends while the
primary controller is down or unresponsive and the backup controller has
not yet taken over.
+ Fix `slurmctld` segfault when a node registers with a configured
`CpuSpecList` while `slurmctld` configuration has the node without
`CpuSpecList`.
+ Fix cloud nodes getting stuck in `POWERED_DOWN+NO_RESPOND` state after
not registering by `ResumeTimeout`.
+ `slurmstepd` - Avoid cleanup of `config.json`-less containers spooldir
getting skipped.
+ Fix scontrol segfault when 'completing' command requested repeatedly in
interactive mode.
OBS-URL: https://build.opensuse.org/request/show/1111943
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=95
features with the `|` operator, which could prevent jobs from
+ `node_features/helpers` - Fix inconsistent handling of `&` and `|`,
instead of just the current set. E.g. `foo|bar&baz` was interpreted
`{foo} or {bar,baz}`.
tasks fewer than GPUs, which resulted in incorrectly rejecting these
jobs.
+ `slurmrestd` - For `GET /slurm/v0.0.39/node[s]`, change format of
node's energy field `current_watts` to a dictionary to account for
+ `slurmrestd` - For `GET /slurm/v0.0.39/qos`, change format of QOS's
+ slurmrestd - For `GET /slurm/v0.0.39/job[s]`, the 'return code'
`GET /slurmdb/v0.0.39/jobs` from slurmrestd.
were present in the log: `error: Attempt to change gres/gpu Count`.
+ Hold the job with `(Reservation ... invalid)` state reason if the
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=265
* Bug Fixes:
+ Revert a change in 23.02 where `SLURM_NTASKS` was no longer set in the
job's environment when `--ntasks-per-node` was requested.
The method by which it is being set, however, is different and should be more
accurate in more situations.
+ Change pmi2 plugin to honor the `SrunPortRange` option. This matches the
new behavior of the pmix plugin in 23.02.0. Note that neither of these
plugins makes use of the "`MpiParams=ports=`" option, and previously
were only limited by the system's ephemeral port range.
+ Fix regression in 23.02.2 that caused slurmctld -R to crash on startup if
a node features plugin is configured.
+ Fix and prevent reoccurring reservations from overlapping.
+ `job_container/tmpfs` - Avoid attempts to share BasePath between nodes.
+ With `CR_Cpu_Memory`, fix node selection for jobs that request gres and
`--mem-per-cpu`.
+ Fix a regression from 22.05.7 in which some jobs were allocated too few
nodes, thus overcommitting cpus to some tasks.
+ Fix a job being stuck in the completing state if the job ends while the
primary controller is down or unresponsive and the backup controller has
not yet taken over.
+ Fix `slurmctld` segfault when a node registers with a configured
`CpuSpecList` while `slurmctld` configuration has the node without
`CpuSpecList`.
+ Fix cloud nodes getting stuck in `POWERED_DOWN+NO_RESPOND` state after
not registering by `ResumeTimeout`.
+ `slurmstepd` - Avoid cleanup of `config.json`-less containers spooldir
getting skipped.
+ Fix scontrol segfault when 'completing' command requested repeatedly in
interactive mode.
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=264
- Updated to 23.02.4 with the following changes:
* Bug Fixes:
+ Fix main scheduler loop not starting after a failover to backup
controller. Avoid slurmctld segfault when specifying
`AccountingStorageExternalHost` (bsc#1214983).
+ Fix sbatch return code when `--wait` is requested on a job array.
+ Fix collected `GPUUtilization` values for `acct_gather_profile` plugins.
+ Fix `slurmrestd` handling of job hold/release operations.
+ Fix step running indefinitely when slurmctld takes more than
`MessageTimeout` to respond. Now, `slurmctld` will cancel the step when
detected, preventing following steps from getting stuck waiting for
resources to be released.
+ Fix regression to make `job_desc.min_cpus` accurate again in `job_submit`
when requesting a job with `--ntasks-per-node`.
+ Fix handling of `ArrayTaskThrottle` in backfill.
+ Fix regression in 23.02.2 when checking gres state on `slurmctld`
startup or reconfigure. Gres changes in the configuration were not
updated on slurmctld startup. On startup or reconfigure, these messages
were present in the log: `error: Attempt to change gres/gpu Count`.
+ Fix potential double count of gres when dealing with limits.
+ Fix `slurmstepd` segfault when `ContainerPath` is not set in `oci.conf`
+ Fixed an issue where jobs requesting licenses were incorrectly rejected.
+ `scrontab` - Fix cutting off the final character of quoted variables.
+ `smail` - Fix issues where e-mails at job completion were not being sent.
+ `scontrol/slurmctld` - fix comma parsing when updating a reservation's
nodes.
+ Fix `--gpu-bind=single` binding tasks to wrong gpus, leading to some gpus
having more tasks than they should and other gpus being unused.
+ Fix regression in 23.02 that causes slurmstepd to crash when `srun`
requests more than `TreeWidth` nodes in a step and uses the pmi2 or
OBS-URL: https://build.opensuse.org/request/show/1110259
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=93
* Bug Fixes:
+ Fix main scheduler loop not starting after a failover to backup
controller. Avoid slurmctld segfault when specifying
`AccountingStorageExternalHost` (bsc#1214983).
+ Fix sbatch return code when `--wait` is requested on a job array.
+ Fix collected `GPUUtilization` values for `acct_gather_profile` plugins.
+ Fix `slurmrestd` handling of job hold/release operations.
+ Fix step running indefinitely when slurmctld takes more than
`MessageTimeout` to respond. Now, `slurmctld` will cancel the step when
detected, preventing following steps from getting stuck waiting for
resources to be released.
+ Fix regression to make `job_desc.min_cpus` accurate again in `job_submit`
when requesting a job with `--ntasks-per-node`.
+ Fix handling of `ArrayTaskThrottle` in backfill.
+ Fix regression in 23.02.2 when checking gres state on `slurmctld`
startup or reconfigure. Gres changes in the configuration were not
updated on slurmctld startup. On startup or reconfigure, these messages
were present in the log: `error: Attempt to change gres/gpu Count`.
+ Fix potential double count of gres when dealing with limits.
+ Fix `slurmstepd` segfault when `ContainerPath` is not set in `oci.conf`
+ Fixed an issue where jobs requesting licenses were incorrectly rejected.
+ `scrontab` - Fix cutting off the final character of quoted variables.
+ `smail` - Fix issues where e-mails at job completion were not being sent.
+ `scontrol/slurmctld` - fix comma parsing when updating a reservation's
nodes.
+ Fix `--gpu-bind=single` binding tasks to wrong gpus, leading to some gpus
having more tasks than they should and other gpus being unused.
+ Fix regression in 23.02 that causes slurmstepd to crash when `srun`
requests more than `TreeWidth` nodes in a step and uses the pmi2 or
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=260
- Fixes since 23.02.03:
Highlights:
* Fix main scheduler loop not starting after a failover to backup controller.
* Avoid slurmctld segfault when specifying `AccountingStorageExternalHost`
(bsc#1214983).
Other:
* Fix sbatch return code when `--wait` is requested on a job array.
* Fix collected `GPUUtilization` values for `acct_gather_profile` plugins.
* Fix `slurmrestd` handling of job hold/release operations.
* Make spank `S_JOB_ARGV` item value hold the requested command `argv`
instead of the `srun --bcast` value when `--bcast` requested (only in local
context).
* Fix step running indefinitely when slurmctld takes more than
`MessageTimeout` to respond. Now, slurmctld will cancel the step when
detected, preventing following steps from getting stuck waiting for
resources to be released.
* Fix regression to make `job_desc.min_cpus` accurate again in job_submit when
requesting a job with `--ntasks-per-node`.
* Fix handling of `ArrayTaskThrottle` in backfill.
* Fix regression in 23.02.2 when checking gres state on `slurmctld` startup or
reconfigure. Gres changes in the configuration were not updated on slurmctld
startup. On startup or reconfigure, these messages were present in the log:
`error: Attempt to change gres/gpu Count`.
* Fix potential double count of gres when dealing with limits.
* Fix slurmstepd segfault when ContainerPath is not set in `oci.conf`
* Fixed an issue where jobs requesting licenses were incorrectly rejected.
* `scrontab` - Fix cutting off the final character of quoted variables.
* `smail` - Fix issues where e-mails at job completion were not being sent.
* `scontrol/slurmctld` - fix comma parsing when updating a reservation's
nodes.
OBS-URL: https://build.opensuse.org/request/show/1109308
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=92
Highlights:
* Fix main scheduler loop not starting after a failover to backup controller.
* Avoid slurmctld segfault when specifying `AccountingStorageExternalHost`
(bsc#1214983).
Other:
* Fix sbatch return code when `--wait` is requested on a job array.
* Fix collected `GPUUtilization` values for `acct_gather_profile` plugins.
* Fix `slurmrestd` handling of job hold/release operations.
* Make spank `S_JOB_ARGV` item value hold the requested command `argv`
instead of the `srun --bcast` value when `--bcast` requested (only in local
context).
* Fix step running indefinitely when slurmctld takes more than
`MessageTimeout` to respond. Now, slurmctld will cancel the step when
detected, preventing following steps from getting stuck waiting for
resources to be released.
* Fix regression to make `job_desc.min_cpus` accurate again in job_submit when
requesting a job with `--ntasks-per-node`.
* Fix handling of `ArrayTaskThrottle` in backfill.
* Fix regression in 23.02.2 when checking gres state on `slurmctld` startup or
reconfigure. Gres changes in the configuration were not updated on slurmctld
startup. On startup or reconfigure, these messages were present in the log:
`error: Attempt to change gres/gpu Count`.
* Fix potential double count of gres when dealing with limits.
* Fix slurmstepd segfault when ContainerPath is not set in `oci.conf`
* Fixed an issue where jobs requesting licenses were incorrectly rejected.
* `scrontab` - Fix cutting off the final character of quoted variables.
* `smail` - Fix issues where e-mails at job completion were not being sent.
* `scontrol/slurmctld` - fix comma parsing when updating a reservation's
nodes.
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=258
- updated to 23.02.04 which includes the following changes:
* Fixes the main scheduler loop not starting on the backup controller after
a failover event, a segfault when attempting to use
AccountingStorageExternalHost, and an issue where steps could continue
running indefinitely if slurmctld takes too long to respond (bsc#1214983).
* Includes a fix for potential slurmctld crashes when the backup slurmctld
takes over.
* This also fixes some issues when using older versions of the command line
tools with a 23.02 controller.
* srun/sbatch/salloc - In order to support user namespaces, process user and
group ids are no longer used unless explicitly requested as an argument and
are left as nobody(99) by default. Any cli_filters or SPANK plugins need to
ignore any uid or gid that equals SLURM_AUTH_NOBODY (99). User and group ids
are now resolved by the active auth plugin. To determine the actual job uid
or gid you should use the RESPONSE_RESOURCE_ALLOCATION RPC.
- removed Fix-test-3.13.patch as fixed upstream
- removed Fix-test-38.11.patch as test changed upstream (forwarded request 1109009 from mslacken)
OBS-URL: https://build.opensuse.org/request/show/1109029
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=91
- updated to 23.02.04 which includes the following changes:
* Fixes the main scheduler loop not starting on the backup controller after
a failover event, a segfault when attempting to use
AccountingStorageExternalHost, and an issue where steps could continue
running indefinitely if slurmctld takes too long to respond (bsc#1214983).
* Includes a fix for potential slurmctld crashes when the backup slurmctld
takes over.
* This also fixes some issues when using older versions of the command line
tools with a 23.02 controller.
* srun/sbatch/salloc - In order to support user namespaces, process user and
group ids are no longer used unless explicitly requested as an argument and
are left as nobody(99) by default. Any cli_filters or SPANK plugins need to
ignore any uid or gid that equals SLURM_AUTH_NOBODY (99). User and group ids
are now resolved by the active auth plugin. To determine the actual job uid
or gid you should use the RESPONSE_RESOURCE_ALLOCATION RPC.
- removed Fix-test-3.13.patch as fixed upstream
- removed Fix-test-38.11.patch as test changed upstream
OBS-URL: https://build.opensuse.org/request/show/1109009
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=256
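Note on the srun/sbatch/salloc entry above: the rule "ignore any uid or gid that equals SLURM_AUTH_NOBODY (99)" can be sketched as below. Real cli_filter and SPANK plugins are not written in Python. The helper names `resolved_id` and `describe_request` are hypothetical and exist only to show the check.

```python
SLURM_AUTH_NOBODY = 99  # value stated in the changelog entry


def resolved_id(value):
    """Return the id only if it was set explicitly, else None."""
    return None if value == SLURM_AUTH_NOBODY else value


def describe_request(uid, gid):
    """Hypothetical filter step: report which ids can be trusted."""
    uid, gid = resolved_id(uid), resolved_id(gid)
    return {
        "uid": uid if uid is not None else "unresolved",
        "gid": gid if gid is not None else "unresolved",
    }


print(describe_request(99, 99))      # both left at the default
print(describe_request(1000, 99))    # uid requested explicitly
```

A filter that treats 99 as a real account would misattribute every request that did not pass an explicit id, which is why the placeholder value has to be special-cased.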
- updated to 23.02.02 which includes a number of fixes to Slurm stability
* Includes a fix for a regression in 23.02 that caused openmpi mpirun to fail
to launch tasks.
* It also includes two functional changes:
+ Don't update the cron job tasks if the whole crontab file is left
untouched after opening it with scrontab -e.
+ Sort dynamic nodes and include them in topology after scontrol reconfigure
or a slurmctld restart.
OBS-URL: https://build.opensuse.org/request/show/1085668
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=254
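Note on the scrontab change above: the entry does not say how an untouched crontab is detected. Comparing a digest of the file before and after the editor session is one plausible way, shown here purely as an assumption. The names `digest` and `needs_update` are made up for this sketch.

```python
import hashlib


def digest(text):
    """SHA-256 of the crontab text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def needs_update(before, after):
    """True only when the edited crontab differs from the original."""
    return digest(before) != digest(after)


original = "#SCRON -t 10\n0 * * * * /usr/bin/true\n"
print(needs_update(original, original))
print(needs_update(original, original + "5 * * * * /usr/bin/false\n"))
```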
- Web-configurator: changed presets to SUSE defaults.
- If %_restart_on_update is no longer defined replace by own
macro.
- Marked slurm-openlava, slurm-seff and slurm-sjstat noarch.
- rpmlint:
* dropped some rpmlint filters which are no longer relevant.
* added/refreshed filters. For details, see rpmlintrc.
- Remove workaround to fix the restart issue in a Slurm package
described in bsc#1088693.
The Slurm version in this package was 16.05. Any attempt to
directly migrate to the current version is bound to fail
anyway.
- Now require slurm-munge if munge authentication is installed.
OBS-URL: https://build.opensuse.org/request/show/1083466
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=89
- Web-configurator: changed presets to SUSE defaults.
- If %_restart_on_update is no longer defined replace by own
macro.
- Marked slurm-openlava, slurm-seff and slurm-sjstat noarch.
- rpmlint:
* dropped some rpmlint filters which are no longer relevant.
* added/refreshed filters. For details, see rpmlintrc.
- Remove workaround to fix the restart issue in a Slurm package
described in bsc#1088693.
The Slurm version in this package was 16.05. Any attempt to
directly migrate to the current version is bound to fail
anyway.
OBS-URL: https://build.opensuse.org/request/show/1082770
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=251
- updated to 23.02.1 with the following changes:
* job_container/tmpfs - cleanup job container even if namespace mount is
already unmounted.
* openapi/dbv0.0.38 - Fix not displaying an error when updating QOS or
associations fails.
* Fix nodes remaining as PLANNED after slurmctld save state recovery.
* Add cgroup.conf EnableControllers option for cgroup/v2.
* Get correct cgroup root to allow slurmd to run in containers like Docker.
* slurmctld - add missing PrivateData=jobs check to step ContainerID lookup
requests originated from 'scontrol show step container-id=<id>' or certain
scrun operations when container state can't be directly queried.
* Fix nodes un-draining after being drained due to unkillable step.
* Fix remote licenses allowed percentages reset to 0 during upgrade.
* sacct - Avoid truncating time strings when using SLURM_TIME_FORMAT with
the --parsable option.
* Fix regression in 22.05.0rc1 that broke Nodes=ALL in a NodeSet.
* openapi/v0.0.39 - fix jobs submitted via slurmrestd being allocated fewer
CPUs than tasks when requesting multiple tasks.
* Fix job not being scheduled on valid nodes and potentially being rejected
when using parentheses at the beginning of square brackets in a feature
request, for example: "feat1&[(feat2|feat3)]".
* Fix regression in 23.02.0rc1 which made --gres-flags=enforce-binding no
longer enforce optimal core-gpu job placement.
* mpi/pmix - Fix v5 to load correctly when libpmix.so isn't in the normal
lib path.
* data_parser/v0.0.39 - fix regression where "memory_per_node" would be
rejected for job submission.
* data_parser/v0.0.39 - fix regression where "memory_per_cpu" would be
rejected for job submission.
* slurmctld - add an assert to check for magic number presence before deleting
OBS-URL: https://build.opensuse.org/request/show/1076522
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=88
- updated to 23.02.1 with the following changes:
* job_container/tmpfs - cleanup job container even if namespace mount is
already unmounted.
* openapi/dbv0.0.38 - Fix not displaying an error when updating QOS or
associations fails.
* Fix nodes remaining as PLANNED after slurmctld save state recovery.
* Add cgroup.conf EnableControllers option for cgroup/v2.
* Get correct cgroup root to allow slurmd to run in containers like Docker.
* slurmctld - add missing PrivateData=jobs check to step ContainerID lookup
requests originated from 'scontrol show step container-id=<id>' or certain
scrun operations when container state can't be directly queried.
* Fix nodes un-draining after being drained due to unkillable step.
* Fix remote licenses allowed percentages reset to 0 during upgrade.
* sacct - Avoid truncating time strings when using SLURM_TIME_FORMAT with
the --parsable option.
* Fix regression in 22.05.0rc1 that broke Nodes=ALL in a NodeSet.
* openapi/v0.0.39 - fix jobs submitted via slurmrestd being allocated fewer
CPUs than tasks when requesting multiple tasks.
* Fix job not being scheduled on valid nodes and potentially being rejected
when using parentheses at the beginning of square brackets in a feature
request, for example: "feat1&[(feat2|feat3)]".
* Fix regression in 23.02.0rc1 which made --gres-flags=enforce-binding no
longer enforce optimal core-gpu job placement.
* mpi/pmix - Fix v5 to load correctly when libpmix.so isn't in the normal
lib path.
* data_parser/v0.0.39 - fix regression where "memory_per_node" would be
rejected for job submission.
* data_parser/v0.0.39 - fix regression where "memory_per_cpu" would be
rejected for job submission.
* slurmctld - add an assert to check for magic number presence before deleting
OBS-URL: https://build.opensuse.org/request/show/1076461
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=248
- updated to 23.02.0
* Highlights
+ slurmctld - Add new RPC rate limiting feature. This is enabled through
SlurmctldParameters=rl_enable, otherwise disabled by default.
+ Make scontrol reconfigure and sending a SIGHUP to the slurmctld behave
the same. If you were using SIGHUP as a 'lighter' scontrol reconfigure
to rotate logs please update your scripts to use SIGUSR2 instead.
+ Change cloud nodes to show by default. PrivateData=cloud is no longer
needed.
+ sreport - Count planned (FKA reserved) time for jobs running in
IGNORE_JOBS reservations. Previously was lumped into IDLE time.
+ job_container/tmpfs - Support running with an arbitrary list of private
mount points (/tmp and /dev/shm are the default, but not required).
+ job_container/tmpfs - Set more environment variables in InitScript.
+ Make all cgroup directories created by Slurm owned by root. This was the
behavior in cgroup/v2 but not in cgroup/v1, where by default the ownership of
the step directories was set to the user and group of the job.
+ accounting_storage/mysql - change purge/archive to calculate record ages
based on end time, rather than start or submission times.
+ job_submit/lua - add support for log_user() from slurm_job_modify().
+ Run the following scripts in slurmscriptd instead of slurmctld:
ResumeProgram, ResumeFailProgram, SuspendProgram, ResvProlog, ResvEpilog,
and RebootProgram (only with SlurmctldParameters=reboot_from_controller).
+ Only permit changing log levels with 'srun --slurmd-debug' by root
or SlurmUser.
+ slurmctld will fatal() when reconfiguring the job_submit plugin fails.
+ Add PowerDownOnIdle partition option to power down nodes after nodes
become idle.
+ Add "[jobid.stepid]" prefix from slurmstepd and "slurmscriptd" prefix
from slurmscriptd to Syslog logging. Previously was only happening when
OBS-URL: https://build.opensuse.org/request/show/1068320
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=83
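Note on the SIGHUP change in 23.02.0: log rotation scripts that used SIGHUP should send SIGUSR2 instead. A minimal sketch of such a helper follows. The function name `request_log_rotation` is hypothetical, and the PID file path in the comment is an assumption, since the real location depends on the site's `SlurmctldPidFile` setting.

```python
import os
import signal


def request_log_rotation(pidfile):
    """Send SIGUSR2 to the process whose PID is stored in pidfile."""
    with open(pidfile) as fh:
        pid = int(fh.read().strip())
    os.kill(pid, signal.SIGUSR2)
    return pid


# Example call (path is an assumption, adjust to the local configuration):
# request_log_rotation("/var/run/slurm/slurmctld.pid")
```

Such a call would typically go in the post-rotate step of the log rotation tool, after the old log file has been moved aside.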
+ Fixed GpuFreqDef option. When set in slurm.conf, it will be used if
--gpu-freq was not explicitly set by the job step.
+ topology/tree - Add new TopologyParam=SwitchAsNodeRank option to reorder
nodes based on switch layout. This can be useful if the naming convention
for the nodes does not naturally map to the network topology.
+ Removed the default setting for GpuFreqDef. If unset, no attempt to change
the GPU frequency will be made if --gpu-freq is not set for the step.
OBS-URL: https://build.opensuse.org/request/show/1068316
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=232
- updated to 23.02.0-0rc1
* Highlights
+ slurmctld - Add new RPC rate limiting feature. This is enabled through
SlurmctldParameters=rl_enable, otherwise disabled by default.
+ Make scontrol reconfigure and sending a SIGHUP to the slurmctld behave
the same. If you were using SIGHUP as a 'lighter' scontrol reconfigure
to rotate logs please update your scripts to use SIGUSR2 instead.
+ Change cloud nodes to show by default. PrivateData=cloud is no longer
needed.
+ sreport - Count planned (FKA reserved) time for jobs running in
IGNORE_JOBS reservations. Previously was lumped into IDLE time.
+ job_container/tmpfs - Support running with an arbitrary list of private
mount points (/tmp and /dev/shm are the default, but not required).
+ job_container/tmpfs - Set more environment variables in InitScript.
+ Make all cgroup directories created by Slurm owned by root. This was the
behavior in cgroup/v2 but not in cgroup/v1, where by default the ownership of
the step directories was set to the user and group of the job.
+ accounting_storage/mysql - change purge/archive to calculate record ages
based on end time, rather than start or submission times.
+ job_submit/lua - add support for log_user() from slurm_job_modify().
+ Run the following scripts in slurmscriptd instead of slurmctld:
ResumeProgram, ResumeFailProgram, SuspendProgram, ResvProlog, ResvEpilog,
and RebootProgram (only with SlurmctldParameters=reboot_from_controller).
+ Only permit changing log levels with 'srun --slurmd-debug' by root
or SlurmUser.
+ slurmctld will fatal() when reconfiguring the job_submit plugin fails.
+ Add PowerDownOnIdle partition option to power down nodes after nodes
become idle.
+ Add "[jobid.stepid]" prefix from slurmstepd and "slurmscriptd" prefix
from slurmscriptd to Syslog logging. Previously was only happening when
OBS-URL: https://build.opensuse.org/request/show/1067475
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=231
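Note on the RPC rate limiting highlight: the entry above only says the feature is enabled with `SlurmctldParameters=rl_enable` and does not describe the mechanism. A token bucket is a common way to build such a limiter and is used below as an assumption, not as a description of slurmctld internals. The class name and parameters are illustrative.

```python
class TokenBucket:
    """Generic token bucket: each allowed request consumes one token."""

    def __init__(self, capacity, refill_per_second):
        self.capacity = capacity
        self.refill = refill_per_second
        self.tokens = float(capacity)   # start full
        self.last = 0.0                 # time of the previous request

    def allow(self, now):
        """Return True if a request arriving at time `now` may proceed."""
        elapsed = now - self.last
        self.last = now
        # Refill for the elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=2, refill_per_second=1)
print([bucket.allow(t) for t in (0.0, 0.0, 0.0, 1.0)])
```

Passing the clock value in explicitly keeps the sketch deterministic. A client that is refused would be expected to back off and retry later.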