forked from pool/slurm
Accepting request 1235784 from network:cluster
- Update to version 24.11
  * `slurmctld` - Reject arbitrary distribution jobs that do not
    specify a task count.
  * Fix backwards compatibility of the `RESPONSE_JOB_INFO` RPC
    (used by `squeue`, `scontrol show job`, etc.) with Slurm clients
    version 24.05 and below. This was a regression in 24.11.0rc1.
  * Do not let `slurmctld`/`slurmd` start if there are more nodes
    defined in `slurm.conf` than the maximum supported amount
    (64k nodes).
  * `slurmctld` - Set job's exit code to 1 when a job fails with
    state `JOB_NODE_FAIL`. This fixes `sbatch --wait` not being able
    to exit with an error code when a job fails for this reason in
    some cases.
  * Fix certain reservation updates requested from 23.02 clients.
  * `slurmrestd` - Fix populating non-required object fields of
    objects as `{}` in JSON/YAML instead of `null`, causing compiled
    OpenAPI clients to reject the response to
    `GET /slurm/v0.0.40/jobs` due to validation failure of
    `.jobs[].job_resources`.
  * Fix issue where older versions of Slurm talking to a 24.11 dbd
    could lose step accounting.
  * Fix minor memory leaks.
  * Fix bad memory reference when `xstrchr` fails to find a char.
  * Remove duplicate checks for a data structure.
  * Fix race condition in `stepmgr` step completion handling.
  * `slurm.spec` - add ability to specify patches to apply on the
    command line.
  * `slurm.spec` - add ability to supply extra version information.
  * Fix 24.11 HA issues.
  * Fix requeued jobs keeping their priority until the decay thread
(forwarded request 1235783 from eeich)

OBS-URL: https://build.opensuse.org/request/show/1235784
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=109
commit e8b6930a42
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:240a2105c8801bc0d222fa2bbcf46f71392ef94cce9253357e5f43f029adaf9b
-size 7183430

slurm-24.11.0.tar.bz2 (new file, 3 lines)
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:39ebeeeeb5d874e090b7f2629bd319bfe7c41510931ff2244f85e961bdc69056
+size 7254375

slurm.changes (393 lines changed)
@@ -1,3 +1,390 @@
-------------------------------------------------------------------
Mon Jan  6 12:40:31 UTC 2025 - Egbert Eich <eich@suse.com>

- Update to version 24.11
  * `slurmctld` - Reject arbitrary distribution jobs that do not
    specify a task count.
  * Fix backwards compatibility of the `RESPONSE_JOB_INFO` RPC
    (used by `squeue`, `scontrol show job`, etc.) with Slurm clients
    version 24.05 and below. This was a regression in 24.11.0rc1.
  * Do not let `slurmctld`/`slurmd` start if there are more nodes
    defined in `slurm.conf` than the maximum supported amount
    (64k nodes).
  * `slurmctld` - Set job's exit code to 1 when a job fails with
    state `JOB_NODE_FAIL`. This fixes `sbatch --wait` not being able
    to exit with an error code when a job fails for this reason in
    some cases.
  * Fix certain reservation updates requested from 23.02 clients.
  * `slurmrestd` - Fix populating non-required object fields of
    objects as `{}` in JSON/YAML instead of `null`, causing compiled
    OpenAPI clients to reject the response to
    `GET /slurm/v0.0.40/jobs` due to validation failure of
    `.jobs[].job_resources`.
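Why `{}` breaks compiled clients while `null` does not can be sketched in a few lines. The `job_resources` shape below is hypothetical (Slurm's real schema is much larger); only the null-vs-empty-object distinction matters:

```python
# Sketch: a compiled OpenAPI client maps a nullable object field to
# None or to a typed struct with required keys. JSON null is fine;
# an empty object {} fails key validation.

def parse_job_resources(value):
    """Hypothetical deserializer for a nullable job_resources field."""
    if value is None:                 # JSON null -> "no data", accepted
        return None
    # JSON object -> required keys must be present
    return {"nodes": value["nodes"], "cpus": value["cpus"]}

assert parse_job_resources(None) is None          # null deserializes cleanly

try:
    parse_job_resources({})                       # {} lacks required keys
    rejected = False
except KeyError:
    rejected = True
assert rejected                                   # validation failure
```

The fix described above makes the server emit `null` again, so clients take the first branch instead of failing validation.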
  * Fix issue where older versions of Slurm talking to a 24.11 dbd
    could lose step accounting.
  * Fix minor memory leaks.
  * Fix bad memory reference when `xstrchr` fails to find a char.
  * Remove duplicate checks for a data structure.
  * Fix race condition in `stepmgr` step completion handling.
  * `slurm.spec` - add ability to specify patches to apply on the
    command line.
  * `slurm.spec` - add ability to supply extra version information.
  * Fix 24.11 HA issues.
  * Fix requeued jobs keeping their priority until the decay thread
    happens.
  * Fix potential memory corruption in `select/cons_tres` plugin.
  * Avoid cache coherency issue on non-x86 platforms that could
    result in a POSIX signal being ignored or an abort().
  * `slurmctld` - Remove assertion in development builds that would
    trigger if an outdated client attempted to connect.
  * `slurmd` - Wait for `PrologEpilogTimeout` on reconfigure for
    prologs to finish. This avoids a situation where the slurmd
    never detects that the prolog completed.
  * `job_container/tmpfs` - Set up X11 forwarding within the
    namespace.
  * `slurmctld` - Fix memory leak when sending a `DBD_JOB_START`
    message.
  * Fix issue with accounting rollup dealing with association tables.
  * Fix minor memory leaks.
  * Fix potential thread safety issues.
  * Init mutex in burst_buffer plugins.
  * `slurmdbd` - Don't log errors when no changes occur from db
    requests.
  * `slurmctld`,`slurmd` - Avoid deadlock during reconfigure if too
    many POSIX signals are received.
  * Improve error type logged from partial or incomplete reads
    from a socket or pipe, to avoid potentially logging an error
    from a previous syscall.
  * `slurmrestd` - Improve the handling of queries when unable to
    connect to slurmdbd by providing responses when possible.
  * `slurmrestd`,`sackd`,`scrun` - Avoid rare hangs related to I/O.
  * `scrun` - Add support for the `--all` argument to the kill
    subcommand.
  * Remove `srun --cpu-bind=rank`.
  * Add `resource_spec/cpus` and `resource_spec/memory` entry
    points in data_parser to print the `CpuSpecList` and
    `MemSpecLimit` in `sinfo --json`.
  * `sinfo` - Add `.sinfo[].resource_spec.cpus` and
    `.sinfo[].resource_spec.memory` fields to print the `CpuSpecList`
    and `MemSpecLimit` dumped by `sinfo --{json|yaml}`.
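A sketch of consuming the new fields; the sample payload below is invented for illustration, and only the `.sinfo[].resource_spec.cpus`/`.memory` paths come from the entry above:

```python
import json

# Invented sample of `sinfo --json` output; only the
# .sinfo[].resource_spec paths are taken from the changelog entry.
sample = json.loads("""
{
  "sinfo": [
    {"nodes": {"nodes": ["node01"]},
     "resource_spec": {"cpus": "0-1", "memory": 2048}}
  ]
}
""")

# Collect the reserved-CPU list (CpuSpecList) and reserved memory
# (MemSpecLimit) reported for each entry.
specs = [(e["resource_spec"]["cpus"], e["resource_spec"]["memory"])
         for e in sample["sinfo"]]
print(specs)  # [('0-1', 2048)]
```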
  * Increase efficiency of sending logs to syslog.
  * Switch to new official YAML MIME type `application/yaml`, in
    compliance with RFC9512, as primary MIME type for YAML
    formatting.
  * `slurmrestd` - Removed deprecated fields from the following
    endpoints:
    `.result` from `POST /slurm/v0.0.42/job/submit`.
    `.job_id`, `.step_id`, `.job_submit_user_msg` from
    `POST /slurm/v0.0.42/job/{job_id}`.
    `.job.exclusive`, `.jobs[].exclusive` from
    `POST /slurm/v0.0.42/job/submit`.
    `.jobs[].exclusive` from `GET /slurm/v0.0.42/job/{job_id}`.
    `.jobs[].exclusive` from `GET /slurm/v0.0.42/jobs`.
    `.job.oversubscribe`, `.jobs[].oversubscribe` from
    `POST /slurm/v0.0.42/job/submit`.
    `.jobs[].oversubscribe` from `GET /slurm/v0.0.42/job/{job_id}`.
    `.jobs[].oversubscribe` from `GET /slurm/v0.0.42/jobs`.
  * `scontrol` - Removed deprecated fields `.jobs[].exclusive` and
    `.jobs[].oversubscribe` from `scontrol show jobs --{json|yaml}`.
  * `squeue` - Removed deprecated fields `.jobs[].exclusive` and
    `.jobs[].oversubscribe` from `squeue --{json|yaml}`.
  * Improve the way to run external commands and fork processes to
    avoid non-async-signal-safe calls between a fork and an exec.
    We now fork ourselves and execute the commands in a safe
    environment. This includes spank prolog/epilog executions.
  * Improve `MaxMemPerCPU` enforcement when exclusive jobs request
    per-node memory and the partition has heterogeneous nodes.
  * Remove a TOCTOU race where multiple steps requesting an energy
    reading at the same time could cause too-frequent accesses
    to the drivers.
  * Limit `SwitchName` to `HOST_NAME_MAX` characters in length.
  * For `scancel --ctld` and the following REST API endpoints:
    `DELETE /slurm/v0.0.40/jobs`
    `DELETE /slurm/v0.0.41/jobs`
    `DELETE /slurm/v0.0.42/jobs`
    support array expressions in the responses to the client.
  * `salloc` - Always output node names to the user when an
    allocation is granted.
  * `slurmrestd` - Removed all v0.0.39 endpoints.
  * `select/linear` - Reject jobs asking for GRES per
    `job|socket|task` or `cpus|mem` per GRES.
  * Add `/nodes` POST endpoint to the REST API; it supports updating
    multiple nodes, whereas previously only single nodes could be
    updated through the `/node/<nodename>` endpoint:
    `POST /slurm/v0.0.42/nodes`
  * Do not allow changing or setting `PreemptMode=GANG` on a
    partition, as this is a cluster-wide option.
  * Add `%b` as a file name pattern for the array task id modulo 10.
  * Skip packing empty nodes when they are hidden during the
    `REQUEST_NODE_INFO` RPC.
  * `accounting_storage/mysql` - Avoid a fatal condition when
    the db server is not reachable.
  * Always lay out steps cyclically on nodes in an allocation.
  * `squeue` - Add priority by partition
    (`.jobs[].priority_by_partition`) to JSON and YAML output.
  * `slurmrestd` - Add clarification to the `failed to open slurmdbd
    connection` error if the error was the result of an
    authentication failure.
  * Make it so `slurmctld` responds to RPCs that have authentication
    errors with the `SLURM_PROTOCOL_AUTHENTICATION_ERROR` error
    code.
  * `openapi/slurmctld` - Display the correct error code instead
    of `Unspecified error` if querying the following endpoints
    fails:
    `GET /slurm/v0.0.40/diag/`
    `GET /slurm/v0.0.41/diag/`
    `GET /slurm/v0.0.42/diag/`
    `GET /slurm/v0.0.40/licenses/`
    `GET /slurm/v0.0.41/licenses/`
    `GET /slurm/v0.0.42/licenses/`
    `GET /slurm/v0.0.40/reconfigure`
    `GET /slurm/v0.0.41/reconfigure`
    `GET /slurm/v0.0.42/reconfigure`
  * Fix how used CPUs are tracked in a job allocation to allow the
    max number of concurrent steps to run at a time if threads per
    core is greater than 1.
  * In existing allocations, the `SLURM_GPUS_PER_NODE` environment
    variable will be ignored by `srun` if `--gpus` is specified.
  * When using `--get-user-env` explicitly or implicitly, check
    if PID or mnt namespaces are disabled and fall back to old
    logic that does not rely on them when they are not available.
  * Removed non-functional option `SLURM_PROLOG_CPU_MASK` from
    `TaskProlog`, which was used to reset the affinity of a task
    based on the mask given.
  * `slurmrestd` - Support passing `-d latest` to load the latest
    version of the `data_parser` plugin.
  * `sacct`,`sacctmgr`,`scontrol`,`sdiag`,`sinfo`,`squeue`,`sshare`
    - Change response to `--json=list` or `--yaml=list` to send the
    list of plugins to stdout and the descriptive header to stderr,
    to allow for easier parsing.
  * `slurmrestd` - Change response to `-d list`, `-a list` or
    `-s list` to send the list of plugins to stdout and the
    descriptive header to stderr, to allow for easier parsing.
  * `sacct`,`sacctmgr`,`scontrol`,`sdiag`,`sinfo`,`squeue`,
    `sshare`,`slurmrestd` - Avoid crash when loading `data_parser`
    plugins fails due to a NULL dereference.
  * Add autodetected GPUs to the output of `slurmd -C`.
  * Remove `burst_buffer/lua` call `slurm.job_info_to_string()`.
  * Add `SchedulerParameters=bf_allow_magnetic_slot` option. It
    allows jobs in magnetic reservations to be planned by the
    backfill scheduler.
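A minimal `slurm.conf` sketch of the new option (assumption: like other `SchedulerParameters` values, it would be appended comma-separated to whatever parameters a site already sets, not placed on its own):

```
# slurm.conf fragment -- let the backfill scheduler plan jobs that
# are in magnetic reservations
SchedulerParameters=bf_allow_magnetic_slot
```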
  * `slurmrestd` - Refuse to run as root, `SlurmUser`, and
    `nobody(99)`.
  * `openapi/slurmctld` - Revert regression that caused signaling
    jobs to cancel entire job arrays instead of job array tasks:
    `DELETE /slurm/v0.0.40/{job_id}`
    `DELETE /slurm/v0.0.41/{job_id}`
    `DELETE /slurm/v0.0.42/{job_id}`
  * `openapi/slurmctld` - Support more formats for `{job_id}`,
    including job steps:
    `DELETE /slurm/v0.0.40/{job_id}`
    `DELETE /slurm/v0.0.41/{job_id}`
    `DELETE /slurm/v0.0.42/{job_id}`
  * Alter scheduling of jobs at submission time to consider job
    submission time and job id. This makes it so that interactive
    jobs aren't allocated resources before batch jobs when they
    have the same priority at submit time.
  * Fix multi-cluster submissions with differing Switch plugins.
  * `slurmrestd` - Change the `+prefer_refs` flag to be the default
    in the `data_parser/v0.0.42` plugin. Add `+minimize_refs` flag
    to inline single-referenced schemas in the OpenAPI schema. This
    sets the default OpenAPI schema generation behavior of
    `data_parser/v0.0.42` to match v0.0.41 `+prefer_refs` and
    v0.0.40 (without flags).
  * Fix `LaunchParameters=batch_step_set_cpu_freq`.
  * Clearer `seff` warning message for running jobs.
  * `data_parser/v0.0.42` - Rename `JOB_INFO` field
    `minimum_switches` to `required_switches` to reflect the
    actual behavior.
  * `data_parser/v0.0.42` - Rename `ACCOUNT_CONDITION` field
    `assocation` to `association` to fix typo.
  * `cgroup/v2` - Fix cgroup cleanup when running inside a
    container without write permissions to `/sys/fs/cgroup`.
  * `cgroup/v2` - Fix accounting of swap event detection.
  * Fix gathering `MaxRSS` for jobs that run shorter than two
    `jobacctgather` intervals. Get the metrics from the cgroups
    `memory.peak` or `memory.max_usage_in_bytes` where available.
  * `openapi/slurmctld` - Set complex number support for the
    following fields:
    `.shares[][].fairshare.factor`
    `.shares[][].fairshare.level`
    for endpoints:
    `GET /slurm/v0.0.42/shares`
    and for commands:
    `sshare --json`
    `sshare --yaml`
  * `data_parser/v0.0.42` - Avoid dumping `Infinity` for
    `NO_VAL`-tagged `number` fields.
  * Add `TopologyParam=TopoMaxSizeUnroll=#` to allow
    `--nodes=<min>-<max>` for `topology/block`.
  * `sacct` - Respect `--noheader` for `--batch-script` and
    `--env-vars`.
  * `sacct` - Remove extra newline in output from `--batch-script`
    and `--env-vars`.
  * Add `sacctmgr ping` command to query the status of `slurmdbd`.
  * Generate an error message when a `NodeSet` name conflicts with
    a `NodeName`, and prevent the controller from starting if such
    a conflict exists.
  * `slurmd` - Properly detect slurmd restarts in the energy
    gathering logic, which previously caused bad numbers in
    accounting.
  * `sackd` - Retry fetching slurm configs indefinitely in
    configless mode.
  * `job_submit/lua` - Add `assoc_qos` attribute to `job_desc`
    to display all potential QOSes for a job's association.
  * `job_submit/lua` - Add `slurm.get_qos_priority()` function
    to retrieve the given QOS's priority.
  * `sbcast` - Add `--nodelist` option to specify where files are
    transmitted to.
  * `sbcast` - Add `--no-allocation` option to transmit files to
    nodes outside of a job allocation.
  * Add `DataParserParameters` `slurm.conf` parameter to allow
    setting a default value for the CLI `--json` and `--yaml`
    arguments.
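A sketch of how this might be set; the exact value syntax is an assumption (the entry only states that the parameter supplies the default for bare `--json`/`--yaml`):

```
# slurm.conf fragment -- assumed syntax: pick the default data_parser
# version used when CLI tools are invoked with bare --json/--yaml
DataParserParameters=data_parser/v0.0.42
```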
  * `seff` - Improve the step's max memory consumption report by
    using `TresUsageInTot` and `TresUsageInAve` instead of
    overestimating the values.
  * Enable RPC queueing for `REQUEST_KILL_JOBS`, which is used when
    `scancel` is executed with the `--ctld` flag.
  * `slurmdbd` - Add `-u` option. This is used to determine if
    restarting the DBD will result in database conversion.
  * Fix `srun` inside an `salloc` in a federated cluster when using
    IPv6.
  * Calculate the forwarding timeouts according to tree depth
    rather than node count / tree width for each level. Fixes race
    conditions caused by identical timeouts on two consecutive
    node levels.
  * Add ability to submit jobs with multiple QOS.
  * Fix difference in behavior when swapping partition order in job
    submission.
  * Improve `PLANNED` state detection for mixed nodes and update
    state before yielding backfill locks.
  * Always consider partition priority tiers when deciding to try
    scheduling jobs on submit.
  * Prevent starting jobs without reservations on submit when there
    are pending jobs with reservations that have flags `FLEX` or
    `ANY_NODES` that can be scheduled on overlapping nodes.
  * Prevent jobs that request both high and low priority tier
    partitions from starting on submit in lower priority tier
    partitions if it could delay pending jobs in higher priority
    tier partitions.
  * `scontrol` - Wait for `slurmctld` to start reconfigure in
    foreground mode before returning.
  * Improve reconfigure handling on Linux to only close open file
    descriptors, to avoid long delays on systems with large
    `RLIMIT_NOFILE` settings.
  * `salloc` - Removed `--get-user-env` option.
  * Removed the instant-on feature from `switch/hpe_slingshot`.
  * Hardware collectives in `switch/hpe_slingshot` now require
    `enable_stepmgr`.
  * Allow backfill to plan jobs on nodes currently being used by
    exclusive user or mcs jobs.
  * Avoid miscaching IPv6 address-to-hostname lookups that could
    have caused logs to have the incorrect hostname.
  * `scontrol` - Add `--json`/`--yaml` support to `listpids`.
  * `scontrol` - Add `liststeps`.
  * `scontrol` - Add `listjobs`.
  * `slurmrestd` - Avoid connection to slurmdbd for the following
    endpoints:
    `GET /slurm/v0.0.42/jobs`
    `GET /slurm/v0.0.42/job/{job_id}`
  * `slurmctld` - Changed incoming RPC handling to a dedicated
    thread pool.
  * `job_container/tmpfs` - Add `EntireStepInNS` option that will
    place the `slurmstepd` process directly within the constructed
    namespace.
  * `scontrol show topo` - Show aggregated block sizes when using
    `topology/block`.
  * `slurmrestd` - Add more descriptive HTTP status for
    authentication failures and connectivity errors with the
    controller.
  * `slurmrestd` - Improve reporting of errors from `slurmctld` for
    job queries:
    `GET /slurm/v0.0.41/{job_id}`
    `GET /slurm/v0.0.41/jobs/`
  * Avoid rejecting a step request that needs fewer GRES than nodes
    in the job allocation.
  * `slurmrestd` - Tag the never-populated `.jobs[].pid` field as
    deprecated for the following endpoints:
    `GET /slurm/v0.0.42/{job_id}`
    `GET /slurm/v0.0.42/jobs/`
  * `scontrol`,`squeue` - Tag the never-populated `.jobs[].pid`
    field as deprecated for the following:
    `scontrol show jobs --json`
    `scontrol show jobs --yaml`
    `scontrol show job ${JOB_ID} --json`
    `scontrol show job ${JOB_ID} --yaml`
    `squeue --json`
    `squeue --yaml`
  * `data_parser` v0.0.42 - Fix timestamp parsing regression
    introduced in v0.0.40 (eaf3b6631f) when parsing non-ISO 8601
    style timestamps.
  * `cgroup/v2` will detect some special container and namespaced
    setups and will work with them.
  * Support IPv6 in configless mode.
  * Add `SlurmctldParameters=ignore_constraint_validation` to ignore
    `constraint/feature` validation at submission.
  * `slurmrestd` - Set `.pings[].mode` field as deprecated in the
    following endpoints:
    `GET /slurm/v0.0.42/ping`
  * `scontrol` - Set `.pings[].mode` field as deprecated in the
    following commands:
    `scontrol ping --json`
    `scontrol ping --yaml`
  * `slurmrestd` - Set `.pings[].pinged` field as deprecated in
    the following endpoints:
    `GET /slurm/v0.0.42/ping`
  * `scontrol` - Set `.pings[].pinged` field as deprecated in the
    following commands:
    `scontrol ping --json`
    `scontrol ping --yaml`
  * `slurmrestd` - Add `.pings[].primary` field to the following
    endpoints:
    `GET /slurm/v0.0.42/ping`
  * `scontrol` - Add `.pings[].primary` field to the following
    commands:
    `scontrol ping --json`
    `scontrol ping --yaml`
  * `slurmrestd` - Add `.pings[].responding` field to the following
    endpoints:
    `GET /slurm/v0.0.42/ping`
  * `scontrol` - Add `.pings[].responding` field to the following
    commands:
    `scontrol ping --json`
    `scontrol ping --yaml`
  * Prevent jobs without reservations from delaying jobs in
    reservations with flags `FLEX` or `ANY_NODES` in the main
    scheduler.
  * Fix requesting multiple different types of TRES when one of
    them has a value of 0.
  * `slurmctld` - Add a grace period to ensure the agent retry
    queue is properly flushed during shutdown.
  * Don't ship `src/slurmrestd/plugins/openapi/slurmdbd/openapi.json`;
    `slurmrestd` should always be used to generate a new OpenAPI
    schema (aka `openapi.json` or `openapi.yaml`).
  * `mpi/pmix` - Fix potential deadlock and races with het jobs,
    and fix potential memory and FD leaks.
  * Fix jobs with `--gpus` being rejected in some edge cases for
    partitions where not all nodes have the same number of GPUs
    and CPUs configured.
  * In an extra constraints expression in a job request, do not
    allow an empty string for a key or value.
  * In an extra constraints expression in a job request, fix
    validation that requests are separated by boolean operators.
  * Add `TaskPluginParam=OOMKillStep` to kill the step as a whole
    when one task OOMs.
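A minimal `slurm.conf` sketch of the new behavior (assumption: the value is appended comma-separated to any `TaskPluginParam` values a site already sets):

```
# slurm.conf fragment -- kill the entire step when any one of its
# tasks is OOM-killed, instead of letting the remaining tasks run on
TaskPluginParam=OOMKillStep
```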
  * Fix `scontrol show conf` not showing all `TaskPluginParam`
    elements.
  * `slurmrestd` - Add fields `.job.oom_kill_step` and
    `.jobs[].oom_kill_step` to `POST /slurm/v0.0.42/job/submit`
    and `POST /slurm/v0.0.42/job/allocate`.
  * Improve performance of `_will_run_test()`.
  * Add `SchedulerParameters=bf_topopt_enable` option to enable an
    experimental hook to control backfill.
  * If a step fails to launch under certain conditions, set the
    step's state to `NODE_FAIL`.
  * `sched/backfill` - Fix certain situations where a job would
    not get a planned time, which could lead to it being delayed
    by lower priority jobs.
  * `slurmrestd` - Dump JSON `null` instead of `{}` (empty object)
    for non-required fields in objects to avoid client
    compatibility issues for v0.0.42-tagged endpoints.
  * `sacct`,`sacctmgr`,`scontrol`,`sdiag`,`sinfo`,`squeue`,
    `sshare` - Dump `null` instead of `{}` (empty object) for
    non-required fields in objects to avoid client compatibility
    issues when run with `--json` or `--yaml`.

-------------------------------------------------------------------
Fri Nov  1 12:50:27 UTC 2024 - Egbert Eich <eich@suse.com>
@@ -132,7 +519,7 @@ Mon Oct 14 10:40:10 UTC 2024 - Egbert Eich <eich@suse.com>
 * Fix reboot asap nodes being considered in backfill after a restart.
 * Fix `--clusters`/`-M queries` for clusters outside of a
   federation when `fed_display` is configured.
-* Fix `scontrol` allowing updating job with bad cpus-per-task value.
+* Fix `scontrol` allowing updating job with bad `cpus-per-task` value.
 * `sattach` - Fix regression from 24.05.2 security fix leading to
   crash.
 * `mpi/pmix` - Fix assertion when built under `--enable-debug`.
@@ -173,10 +560,10 @@ Mon Oct 14 10:40:10 UTC 2024 - Egbert Eich <eich@suse.com>
   characters.
 * `switch/hpe_slingshot` - Drain node on failure to delete CXI
   services.
-* Fix a performance regression from 23.11.0 in cpu frequency
+* Fix a performance regression from 23.11.0 in CPU frequency
   handling when no `CpuFreqDef` is defined.
 * Fix one-task-per-sharing not working across multiple nodes.
-* Fix inconsistent number of cpus when creating a reservation
+* Fix inconsistent number of CPUs when creating a reservation
   using the TRESPerNode option.
 * `data_parser/v0.0.40+` - Fix job state parsing which could
   break filtering.

slurm.spec (24 lines changed)
@@ -1,7 +1,7 @@
 #
 # spec file for package slurm
 #
-# Copyright (c) 2024 SUSE LLC
+# Copyright (c) 2025 SUSE LLC
 #
 # All modifications and additions to the file contributed by third parties
 # remain the property of their copyright owners, unless otherwise agreed
@@ -17,10 +17,10 @@


 # Check file META in sources: update so_version to (API_CURRENT - API_AGE)
-%define so_version 41
+%define so_version 42
 # Make sure to update `upgrades` as well!
-%define ver 24.05.4
-%define _ver _24_05
+%define ver 24.11.0
+%define _ver _24_11
 %define dl_ver %{ver}
 # so-version is 0 and seems to be stable
 %define pmi_so 0
@@ -62,6 +62,9 @@ ExclusiveArch: do_not_build
 %if 0%{?sle_version} == 150500 || 0%{?sle_version} == 150600
 %define base_ver 2302
 %endif
+%if 0%{?sle_version} == 150700
+%define base_ver 2411
+%endif

 %define ver_m %{lua:x=string.gsub(rpm.expand("%ver"),"%.[^%.]*$","");print(x)}
 # Keep format_spec_file from botching the define below:
@@ -756,6 +759,11 @@ rm %{buildroot}%{perl_archlib}/perllocal.pod \
    %{buildroot}%{perl_sitearch}/auto/Slurm/.packlist \
    %{buildroot}%{perl_sitearch}/auto/Slurmdb/.packlist

+# Fix shell completion bindings
+for i in `find %{buildroot}/usr/share/bash-completion/completions/ -type l`; do
+    ln -sf $(basename $(readlink -f $i)) $i;
+done
+
 mkdir -p %{buildroot}%{perl_vendorarch}

 mv %{buildroot}%{perl_sitearch}/* \
@@ -1081,6 +1089,7 @@ rm -rf /srv/slurm-testsuite/src /srv/slurm-testsuite/testsuite \
 %{?have_netloc:%{_bindir}/netloc_to_topology}
 %{_sbindir}/sackd
 %{_sbindir}/slurmctld
+%{_datadir}/bash-completion/completions/
 %dir %{_libdir}/slurm/src
 %{_unitdir}/slurmctld.service
 %{_sbindir}/rcslurmctld
@@ -1193,9 +1202,9 @@ rm -rf /srv/slurm-testsuite/src /srv/slurm-testsuite/testsuite \
 %{_libdir}/slurm/acct_gather_filesystem_lustre.so
 %{_libdir}/slurm/burst_buffer_lua.so
 %{_libdir}/slurm/burst_buffer_datawarp.so
+%{_libdir}/slurm/data_parser_v0_0_42.so
 %{_libdir}/slurm/data_parser_v0_0_41.so
 %{_libdir}/slurm/data_parser_v0_0_40.so
-%{_libdir}/slurm/data_parser_v0_0_39.so
 %{_libdir}/slurm/cgroup_v1.so
 %if 0%{?suse_version} >= 1500
 %{_libdir}/slurm/cgroup_v2.so
@@ -1270,6 +1279,9 @@ rm -rf /srv/slurm-testsuite/src /srv/slurm-testsuite/testsuite \
 %{_libdir}/slurm/node_features_knl_generic.so
 %{_libdir}/slurm/acct_gather_profile_influxdb.so
 %{_libdir}/slurm/jobcomp_elasticsearch.so
+%{_libdir}/slurm/certmgr_script.so
+%{_libdir}/slurm/gpu_nvidia.so
+%{_libdir}/slurm/mcs_label.so

 %files lua
 %{_libdir}/slurm/job_submit_lua.so
@@ -1304,8 +1316,6 @@ rm -rf /srv/slurm-testsuite/src /srv/slurm-testsuite/testsuite \
 %{_mandir}/man8/slurmrestd.*
 %{_libdir}/slurm/openapi_slurmctld.so
 %{_libdir}/slurm/openapi_slurmdbd.so
-%{_libdir}/slurm/openapi_dbv0_0_39.so
-%{_libdir}/slurm/openapi_v0_0_39.so
 %{_libdir}/slurm/rest_auth_local.so
 %endif