* slurmrestd - Remove deprecated fields from the following

`.result` from `POST /slurm/v0.0.42/job/submit`.  
     `.job_id`, `.step_id`, `.job_submit_user_msg` from `POST /slurm/v0.0.42/job/{job_id}`.  
     `.job.exclusive`, `.jobs[].exclusive` to `POST /slurm/v0.0.42/job/submit`.  
     `.jobs[].exclusive` from `GET /slurm/v0.0.42/job/{job_id}`.  
     `.jobs[].exclusive` from `GET /slurm/v0.0.42/jobs`.  
     `.job.oversubscribe`, `.jobs[].oversubscribe` to `POST /slurm/v0.0.42/job/submit`.  
     `.jobs[].oversubscribe` from `GET /slurm/v0.0.42/job/{job_id}`.  
     `.jobs[].oversubscribe` from `GET /slurm/v0.0.42/jobs`.  
     `DELETE /slurm/v0.0.40/jobs`  
     `DELETE /slurm/v0.0.41/jobs`  
     `DELETE /slurm/v0.0.42/jobs`  
    allocation is granted.
    `job|socket|task` or `cpus|mem` per GRES.
    node update whereas previously only single nodes could be
    updated through `/node/<nodename>` endpoint:
    `POST /slurm/v0.0.42/nodes`
    partition as this is a cluster-wide option.
    `REQUEST_NODE_INFO RPC`.
    the db server is not reachable.
    (`.jobs[].priority_by_partition`) to JSON and YAML output.
    connection` error if the error was the result of an
    authentication failure.
    errors with the `SLURM_PROTOCOL_AUTHENTICATION_ERROR` error
    code.
    of `Unspecified error` if querying the following endpoints
    fails:  
    `GET /slurm/v0.0.40/diag/`  
    `GET /slurm/v0.0.41/diag/`  
    `GET /slurm/v0.0.42/diag/`

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=307
Egbert Eich 2025-01-17 21:14:19 +00:00 committed by Git OBS Bridge
parent 3a3588a812
commit 247a29f2a0


@@ -80,16 +80,16 @@ Mon Jan 6 12:40:31 UTC 2025 - Egbert Eich <eich@suse.com>
 * Increase efficiency of sending logs to syslog.
 * Switch to new official YAML mime type `application/yaml` in
   compliance with RFC9512 as primary mime type for YAML formatting.
-* `slurmrestd` - Removed deprecated fields from the following
+* `slurmrestd` - Remove deprecated fields from the following
   endpoints:
-  `.result' from `POST /slurm/v0.0.42/job/submit`.
+  `.result` from `POST /slurm/v0.0.42/job/submit`.
   `.job_id`, `.step_id`, `.job_submit_user_msg` from `POST /slurm/v0.0.42/job/{job_id}`.
   `.job.exclusive`, `.jobs[].exclusive` to `POST /slurm/v0.0.42/job/submit`.
   `.jobs[].exclusive` from `GET /slurm/v0.0.42/job/{job_id}`.
   `.jobs[].exclusive` from `GET /slurm/v0.0.42/jobs`.
   `.job.oversubscribe`, `.jobs[].oversubscribe` to `POST /slurm/v0.0.42/job/submit`.
   `.jobs[].oversubscribe` from `GET /slurm/v0.0.42/job/{job_id}`.
   `.jobs[].oversubscribe` from `GET /slurm/v0.0.42/jobs`.
 * `scontrol` - Removed deprecated fields `.jobs[].exclusive` and
   `.jobs[].oversubscribe` from `scontrol show jobs --{json|yaml}`.
 * `squeue` - Removed deprecated fields `.jobs[].exclusive` and
@@ -105,297 +105,297 @@ Mon Jan 6 12:40:31 UTC 2025 - Egbert Eich <eich@suse.com>
   to the drivers.
 * Limit `SwitchName` to `HOST_NAME_MAX` characters in length.
 * For `scancel --ctld` and the following REST API endpoints:
   `DELETE /slurm/v0.0.40/jobs`
   `DELETE /slurm/v0.0.41/jobs`
   `DELETE /slurm/v0.0.42/jobs`
   Support array expressions in the responses to the client.
 * `salloc` - Always output node names to the user when an
   allocation is granted.
 * `slurmrestd` - Removed all v0.0.39 endpoints.
 * `select/linear` - Reject jobs asking for GRES per
   `job|socket|task` or `cpus|mem` per GRES.
 * Add `/nodes` POST endpoint to REST API. It supports updating
   multiple nodes, whereas previously only a single node could be
   updated through the `/node/<nodename>` endpoint:
   `POST /slurm/v0.0.42/nodes`
 * Do not allow changing or setting `PreemptMode=GANG` on a
   partition as this is a cluster-wide option.
 * Add `%b` as a file name pattern for the array task id modulo 10.
 * Skip packing empty nodes when they are hidden during
   `REQUEST_NODE_INFO` RPC.
 * `accounting_storage/mysql` - Avoid a fatal condition when
   the db server is not reachable.
 * Always lay out steps cyclically on nodes in an allocation.
 * `squeue` - add priority by partition
   (`.jobs[].priority_by_partition`) to JSON and YAML output.
 * `slurmrestd` - Add clarification to `failed to open slurmdbd
   connection` error if the error was the result of an
   authentication failure.
 * Make it so `slurmctld` responds to RPCs that have authentication
   errors with the `SLURM_PROTOCOL_AUTHENTICATION_ERROR` error
   code.
 * `openapi/slurmctld` - Display the correct error code instead
   of `Unspecified error` if querying the following endpoints
   fails:
   `GET /slurm/v0.0.40/diag/`
   `GET /slurm/v0.0.41/diag/`
   `GET /slurm/v0.0.42/diag/`
   `GET /slurm/v0.0.40/licenses/`
   `GET /slurm/v0.0.41/licenses/`
   `GET /slurm/v0.0.42/licenses/`
   `GET /slurm/v0.0.40/reconfigure`
   `GET /slurm/v0.0.41/reconfigure`
   `GET /slurm/v0.0.42/reconfigure`
 * Fix how used CPUs are tracked in a job allocation to allow the
   max number of concurrent steps to run at a time if threads per
   core is greater than 1.
 * In existing allocations the `SLURM_GPUS_PER_NODE` environment
   variable will be ignored by `srun` if `--gpus` is specified.
 * When using `--get-user-env` explicitly or implicitly, check
   if PID or mnt namespaces are disabled and fall back to old
   logic that does not rely on them when they are not available.
 * Removed non-functional option `SLURM_PROLOG_CPU_MASK` from
   `TaskProlog` which was used to reset the affinity of a task
   based on the mask given.
 * `slurmrestd` - Support passing of `-d latest` to load latest
   version of `data_parser` plugin.
 * `sacct`,`sacctmgr`,`scontrol`,`sdiag`,`sinfo`,`squeue`,`sshare`
   - Change response to `--json=list` or `--yaml=list` to send
   list of plugins to stdout and descriptive header to stderr to
   allow for easier parsing.
 * `slurmrestd` - Change response to `-d list`, `-a list` or
   `-s list` to send list of plugins to stdout and descriptive
   header to stderr to allow for easier parsing.
 * `sacct`,`sacctmgr`,`scontrol`,`sdiag`,`sinfo`,`squeue`,
   `sshare`,`slurmrestd` - Avoid crash when loading `data_parser`
   plugins fails due to NULL dereference.
 * Add autodetected GPUs to the output of `slurmd -C`.
 * Remove `burst_buffer/lua` call `slurm.job_info_to_string()`.
 * Add `SchedulerParameters=bf_allow_magnetic_slot` option. It
   allows jobs in magnetic reservations to be planned by backfill
   scheduler.
 * `slurmrestd` - Refuse to run as root, `SlurmUser`, and
   `nobody(99)`.
 * `openapi/slurmctld` - Revert regression that caused signaling
   jobs to cancel entire job arrays instead of job array tasks:
   `DELETE /slurm/v0.0.40/{job_id}`
   `DELETE /slurm/v0.0.41/{job_id}`
   `DELETE /slurm/v0.0.42/{job_id}`
 * `openapi/slurmctld` - Support more formats for `{job_id}`
   including job steps:
   `DELETE /slurm/v0.0.40/{job_id}`
   `DELETE /slurm/v0.0.41/{job_id}`
   `DELETE /slurm/v0.0.42/{job_id}`
 * Alter scheduling of jobs at submission time to consider job
   submission time and job id. This makes it so that
   interactive jobs aren't allocated resources before batch jobs
   when they have the same priority at submit time.
 * Fix multi-cluster submissions with differing Switch plugins.
 * `slurmrestd` - Change `+prefer_refs` flag to default in
   `data_parser/v0.0.42` plugin. Add `+minimize_refs` flag to
   inline single referenced schemas in the OpenAPI schema. This
   sets the default OpenAPI schema generation behavior of
   `data_parser/v0.0.42` to match v0.0.41 `+prefer_refs` and
   v0.0.40 (without flags).
 * Fix `LaunchParameters=batch_step_set_cpu_freq`.
 * Clearer `seff` warning message for running jobs.
 * `data_parser/v0.0.42` - Rename `JOB_INFO` field
   `minimum_switches` to `required_switches` to reflect the
   actual behavior.
 * `data_parser/v0.0.42` - Rename `ACCOUNT_CONDITION` field
   `assocation` to `association` to fix typo.
 * `cgroup/v2` - fix cgroup cleanup when running inside a
   container without write permissions to `/sys/fs/cgroup`.
 * `cgroup/v2` - fix accounting of swap events detection.
 * Fix gathering MaxRSS for jobs that run shorter than two
   `jobacctgather` intervals. Get the metrics from cgroups
   `memory.peak` or `memory.max_usage_in_bytes` where available.
 * `openapi/slurmctld` - Set complex number support for the
   following fields:
   `.shares[][].fairshare.factor`
   `.shares[][].fairshare.level`
   for endpoints:
   `GET /slurm/v0.0.42/shares`
   and for commands:
   `sshare --json`
   `sshare --yaml`
 * `data_parser/v0.0.42` - Avoid dumping `Infinity` for `NO_VAL`
   tagged `number` fields.
 * Add `TopologyParam=TopoMaxSizeUnroll=#` to allow
   `--nodes=<min>-<max>` for `topology/block`.
 * `sacct` - Respect `--noheader` for `--batch-script` and
   `--env-vars`.
 * `sacct` - Remove extra newline in output from `--batch-script`
   and `--env-vars`.
 * Add `sacctmgr ping` command to query status of `slurmdbd`.
 * Generate an error message when a `NodeSet` name conflicts with
   a `NodeName`, and prevent the controller from starting if such
   a conflict exists.
 * `slurmd` - properly detect slurmd restarts in the energy
   gathering logic which caused bad numbers in accounting.
 * `sackd` - retry fetching slurm configs indefinitely in
   configless mode.
 * `job_submit/lua` - Add `assoc_qos` attribute to `job_desc`
   to display all potential QOS's for a job's association.
 * `job_submit/lua` - Add `slurm.get_qos_priority()` function
   to retrieve the given QOS's priority.
 * `sbcast` - Add `--nodelist` option to specify where files are
   transmitted to.
 * `sbcast` - Add `--no-allocation` option to transmit files to
   nodes outside of a job allocation.
 * Add `DataParserParameters` `slurm.conf` parameter to allow
   setting default value for CLI `--json` and `--yaml` arguments.
 * `seff` - improve step's max memory consumption report by using
   `TresUsageInTot` and `TresUsageInAve` instead of overestimating
   the values.
 * Enable RPC queueing for `REQUEST_KILL_JOBS`, which is used when
   `scancel` is executed with `--ctld` flag.
 * `slurmdbd` - Add `-u` option. This is used to determine if
   restarting the DBD will result in database conversion.
 * Fix `srun` inside an `salloc` in a federated cluster when using
   IPv6.
 * Calculate the forwarding timeouts according to tree depth
   rather than node count / tree width for each level. Fixes race
   conditions with same timeouts between two consecutive node
   levels.
 * Add ability to submit jobs with multiple QOS.
 * Fix difference in behavior when swapping partition order in job
   submission.
 * Improve `PLANNED` state detection for mixed nodes and updating
   state before yielding backfill locks.
 * Always consider partition priority tiers when deciding to try
   scheduling jobs on submit.
 * Prevent starting jobs without reservations on submit when there
   are pending jobs with reservations that have flags `FLEX` or
   `ANY_NODES` that can be scheduled on overlapping nodes.
 * Prevent jobs that request both high and low priority tier
   partitions from starting on submit in lower priority tier
   partitions if it could delay pending jobs in higher priority
   tier partitions.
 * `scontrol` - Wait for `slurmctld` to start reconfigure in
   foreground mode before returning.
 * Improve reconfigure handling on Linux to only close open file
   descriptors to avoid long delays on systems with large
   `RLIMIT_NOFILE` settings.
 * `salloc` - Removed `--get-user-env` option.
 * Removed the instant-on feature from `switch/hpe_slingshot`.
 * Hardware collectives in `switch/hpe_slingshot` now require
   `enable_stepmgr`.
 * Allow backfill to plan jobs on nodes currently being used by
   exclusive user or mcs jobs.
 * Avoid miscaching IPv6 address to hostname lookups that could
   have caused logs to have the incorrect hostname.
 * `scontrol` - Add `--json`/`--yaml` support to `listpids`.
 * `scontrol` - Add `liststeps`.
 * `scontrol` - Add `listjobs`.
 * `slurmrestd` - Avoid connection to slurmdbd for the following
   endpoints:
   `GET /slurm/v0.0.42/jobs`
   `GET /slurm/v0.0.42/job/{job_id}`
 * `slurmctld` - Changed incoming RPC handling to dedicated thread
   pool.
 * `job_container/tmpfs` - Add `EntireStepInNS` option that will
   place the `slurmstepd` process within the constructed namespace
   directly.
 * `scontrol show topo` - Show aggregated block sizes when using
   `topology/block`.
 * `slurmrestd` - Add more descriptive HTTP status for
   authentication failure and connectivity errors with controller.
 * `slurmrestd` - Improve reporting errors from `slurmctld` for
   job queries:
   `GET /slurm/v0.0.41/{job_id}`
   `GET /slurm/v0.0.41/jobs/`
 * Avoid rejecting a step request that needs fewer GRES than nodes
   in the job allocation.
 * `slurmrestd` - Tag the never populated `.jobs[].pid` field as
   deprecated for the following endpoints:
   `GET /slurm/v0.0.42/{job_id}`
   `GET /slurm/v0.0.42/jobs/`
 * `scontrol`,`squeue` - Tag the never populated `.jobs[].pid` field
   as deprecated for the following:
   `scontrol show jobs --json`
   `scontrol show jobs --yaml`
   `scontrol show job ${JOB_ID} --json`
   `scontrol show job ${JOB_ID} --yaml`
   `squeue --json`
   `squeue --yaml`
 * `data_parser` v0.0.42 - fix timestamp parsing regression
   introduced in v0.0.40 (eaf3b6631f), parsing of non-ISO 8601
   style timestamps.
 * `cgroup/v2` will detect some special container and namespaced
   setups and will work with them.
 * Support IPv6 in configless mode.
 * Add `SlurmctldParameters=ignore_constraint_validation` to ignore
   `constraint/feature` validation at submission.
 * `slurmrestd` - Set `.pings[].mode` field as deprecated in the
   following endpoints:
   `GET /slurm/v0.0.42/ping`
 * `scontrol` - Set `.pings[].mode` field as deprecated in the
   following commands:
   `scontrol ping --json`
   `scontrol ping --yaml`
 * `slurmrestd` - Set `.pings[].pinged` field as deprecated in
   the following endpoints:
   `GET /slurm/v0.0.42/ping`
 * `scontrol` - Set `.pings[].pinged` field as deprecated in the
   following commands:
   `scontrol ping --json`
   `scontrol ping --yaml`
 * `slurmrestd` - Add `.pings[].primary` field to the following
   endpoints:
   `GET /slurm/v0.0.42/ping`
 * `scontrol` - Add `.pings[].primary` field to the following
   commands:
   `scontrol ping --json`
   `scontrol ping --yaml`
 * `slurmrestd` - Add `.pings[].responding` field to the following
   endpoints:
   `GET /slurm/v0.0.42/ping`
 * `scontrol` - Add `.pings[].responding` field to the following
   commands:
   `scontrol ping --json`
   `scontrol ping --yaml`
 * Prevent jobs without reservations from delaying jobs in
   reservations with flags `FLEX` or `ANY_NODES` in the main
   scheduler.
 * Fix requesting multiple different types of TRES
   when one of them has a value of 0.
 * `slurmctld` - Add a grace period to ensure the agent retry
   queue is properly flushed during shutdown.
 * Don't ship `src/slurmrestd/plugins/openapi/slurmdbd/openapi.json`.
   `slurmrestd` should always be used to generate a new OpenAPI
   schema (aka openapi.json or openapi.yaml).
 * `mpi/pmix` - Fix potential deadlock and races with het jobs,
   and fix potential memory and FD leaks.
 * Fix jobs with `--gpus` being rejected in some edge cases for
   partitions where not all nodes have the same amount of GPUs
   and CPUs configured.
 * In an extra constraints expression in a job request, do not
   allow an empty string for a key or value.
 * In an extra constraints expression in a job request, fix
   validation that requests are separated by boolean operators.
 * Add `TaskPluginParam=OOMKillStep` to kill the step as a whole
   when one task OOMs.
 * Fix `scontrol show conf` not showing all `TaskPluginParam`
   elements.
 * `slurmrestd` - Add fields `.job.oom_kill_step`
   `.jobs[].oom_kill_step` to `POST /slurm/v0.0.42/job/submit`
   and `POST /slurm/v0.0.42/job/allocate`.
 * Improve performance for `_will_run_test()`.
 * Add `SchedulerParameters=bf_topopt_enable` option to enable
   experimental hook to control backfill.
 * If a step fails to launch under certain conditions, set the
   step's state to `NODE_FAIL`.
 * `sched/backfill` - Fix certain situations where a job would
   not get a planned time, which could lead to it being delayed
   by lower priority jobs.
 * `slurmrestd` - Dump JSON `null` instead of `{}` (empty object)
   for non-required fields in objects to avoid client
   compatibility issues for v0.0.42 version tagged endpoints.
 * `sacct`,`sacctmgr`,`scontrol`,`sdiag`,`sinfo`,`squeue`,
   `sshare` - Dump `null` instead of `{}` (empty object) for
   non-required fields in objects to avoid client compatibility
   issues when run with `--json` or `--yaml`.
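
The last two entries change how empty, non-required fields are dumped (`null` rather than `{}`). A client that has to read both older and newer output could normalize the two forms. The following is a minimal Python sketch under that assumption; the helper name `field_or_default`, the `optional_field` key and both sample payloads are hypothetical and are not taken from the Slurm API.

```python
import json


def field_or_default(obj, key, default=None):
    """Return obj[key], treating a missing key, JSON null and a legacy
    empty object ({}) as "not set" (hypothetical helper)."""
    value = obj.get(key)
    if value is None or value == {}:
        return default
    return value


# Hypothetical payloads: older-style output with {} and newer-style with null.
old_style = json.loads('{"job_id": 42, "optional_field": {}}')
new_style = json.loads('{"job_id": 42, "optional_field": null}')

for payload in (old_style, new_style):
    print(payload["job_id"], field_or_default(payload, "optional_field", "<unset>"))
```

Treating both forms as "not set" keeps the client independent of which version produced the output. Legitimate falsy values such as `0` or an empty list are passed through unchanged.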
 -------------------------------------------------------------------
 Fri Nov 1 12:50:27 UTC 2024 - Egbert Eich <eich@suse.com>