plugins makes use of the MpiParams=ports= option, and previously
features with the `|` operator, which could prevent jobs from
+ `node_features/helpers` - Fix inconsistent handling of `&` and `|`,
instead of just the current set. E.g. `foo|bar&baz` was interpreted
`{foo} or {bar,baz}`.
tasks fewer than GPUs, which resulted in incorrectly rejecting these
jobs.
+ `slurmrestd` - For `GET /slurm/v0.0.39/node[s]`, change format of
node's energy field `current_watts` to a dictionary to account for
+ `slurmrestd` - For `GET /slurm/v0.0.39/qos`, change format of QOS's
+ slurmrestd - For `GET /slurm/v0.0.39/job[s]`, the 'return code'
`GET /slurmdb/v0.0.39/jobs` from slurmrestd.
were present in the log: `error: Attempt to change gres/gpu Count`.
+ Hold the job with `(Reservation ... invalid)` state reason if the

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=265
parent 74529b6cc2
commit f0b994e220
@@ -9,7 +9,7 @@ Mon Sep 18 05:23:19 UTC 2023 - Egbert Eich <eich@suse.com>
 accurate in more situations.
 + Change pmi2 plugin to honor the `SrunPortRange` option. This matches the
 new behavior of the pmix plugin in 23.02.0. Note that neither of these
-plugins makes use of the "`MpiParams=ports=`" option, and previously
+plugins makes use of the `MpiParams=ports=` option, and previously
 were only limited by the systems ephemeral port range.
 + Fix regression in 23.02.2 that caused slurmctld -R to crash on startup if
 a node features plugin is configured.
@@ -44,13 +44,13 @@ Mon Sep 18 05:23:19 UTC 2023 - Egbert Eich <eich@suse.com>
 federation before they have registered with the dbd.
 + `node_features/helpers` - Fix node selection for jobs requesting
 changeable.
-features with the '`|`' operator, which could prevent jobs from
+features with the `|` operator, which could prevent jobs from
 running on some valid nodes.
-+ `node_features/helpers` - Fix inconsistent handling of '`&`' and '`|`',
++ `node_features/helpers` - Fix inconsistent handling of `&` and `|`,
 where an AND'd feature was sometimes AND'd to all sets of features
-instead of just the current set. E.g. "`foo|bar&baz`" was interpreted
+instead of just the current set. E.g. `foo|bar&baz` was interpreted
 as `{foo,baz}` or `{bar,baz}` instead of how it is documented:
-"`{foo} or {bar,baz}`".
+`{foo} or {bar,baz}`.
 + Fix job accounting so that when a job is requeued its allocated node
 count is cleared. After the requeue, sacct will correctly show that
 the job has 0 `AllocNodes` while it is pending or if it is canceled
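The `node_features/helpers` entry in the hunk above states the documented reading of `foo|bar&baz` as `{foo} or {bar,baz}`, meaning `&` binds more tightly than `|`. The following is a minimal Python sketch of that precedence only; the function names `parse_feature_expr` and `node_satisfies` are hypothetical and this is not Slurm's implementation, which also supports syntax not handled here.

```python
# Illustrative sketch only: hypothetical helpers, not Slurm code.
# Handles just '&' and '|' with '&' binding tighter than '|'.

def parse_feature_expr(expr):
    """Split on '|' first, then on '&', giving one feature set per alternative."""
    return [
        frozenset(name for name in group.split("&") if name)
        for group in expr.split("|")
        if group
    ]


def node_satisfies(node_features, expr):
    """True if the node has every feature of at least one alternative."""
    have = set(node_features)
    return any(group <= have for group in parse_feature_expr(expr))


if __name__ == "__main__":
    print(parse_feature_expr("foo|bar&baz"))
    print(node_satisfies({"foo"}, "foo|bar&baz"))
    print(node_satisfies({"bar"}, "foo|bar&baz"))
```

Under this reading a node that offers only `foo` qualifies. Under the interpretation the changelog describes as wrong (`{foo,baz}` or `{bar,baz}`), the same node would have been rejected because `baz` would be required in every alternative.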
@@ -60,7 +60,8 @@ Mon Sep 18 05:23:19 UTC 2023 - Egbert Eich <eich@suse.com>
 + Fix intel OneAPI autodetect: detect the `/dev/dri/renderD[0-9]+` GPUs,
 and do not detect `/dev/dri/card[0-9]+`.
 + Fix node selection for jobs that request `--gpus` and a number of
-tasks fewer than GPUs, which resulted in incorrectly rejecting these jobs.
+tasks fewer than GPUs, which resulted in incorrectly rejecting these
+jobs.
 + Remove `MYSQL_OPT_RECONNECT` completely.
 + Fix cloud nodes in `POWERING_UP` state disappearing (getting set
 to `FUTURE`)
@@ -102,13 +103,13 @@ Mon Sep 18 05:23:19 UTC 2023 - Egbert Eich <eich@suse.com>
 + Fix minor memory leak with `--tres-per-task` and licenses.
 + Fix cyclic socket cpu distribution for tasks in a step where
 `--cpus-per-task` < usable threads per core.
-+ `slurmrestd` - For '`GET /slurm/v0.0.39/node[s]`', change format of
++ `slurmrestd` - For `GET /slurm/v0.0.39/node[s]`, change format of
-node's energy field "`current_watts`" to a dictionary to account for
+node's energy field `current_watts` to a dictionary to account for
 unset value instead of dumping 4294967294.
-+ `slurmrestd` - For '`GET /slurm/v0.0.39/qos`', change format of QOS's
++ `slurmrestd` - For `GET /slurm/v0.0.39/qos`, change format of QOS's
 field "priority" to a dictionary to account for unset value instead of
 dumping 4294967294.
-+ slurmrestd - For '`GET /slurm/v0.0.39/job[s]`', the 'return code'
++ slurmrestd - For `GET /slurm/v0.0.39/job[s]`, the 'return code'
 code field in `v0.0.39_job_exit`_code will be set to -127 instead of
 being left unset where job does not have a relevant return code.
 * Other Changes:
@@ -127,7 +128,7 @@ Mon Sep 18 05:23:19 UTC 2023 - Egbert Eich <eich@suse.com>
 + `slurmrestd` - Reduce memory usage when printing out job CPU frequency.
 + `data_parser/v0.0.39` - Add `required/memory_per_cpu` and
 `required/memory_per_node` to `sacct --json` and `sacct --yaml` and
-'`GET /slurmdb/v0.0.39/jobs`' from slurmrestd.
+`GET /slurmdb/v0.0.39/jobs` from slurmrestd.
 + `gpu/oneapi` - Store cores correctly so CPU affinity is tracked.
 + Allow `slurmdbd -R` to work if the root assoc id is not 1.
 + Limit periodic node registrations to 50 instead of the full `TreeWidth`.
@@ -156,7 +157,7 @@ Mon Aug 21 09:43:08 UTC 2023 - Christian Goll <cgoll@suse.com>
 + Fix regression in 23.02.2 when checking gres state on `slurmctld`
 startup or reconfigure. Gres changes in the configuration were not
 updated on slurmctld startup. On startup or reconfigure, these messages
-were present in the log: `"error: Attempt to change gres/gpu Count`".
+were present in the log: `error: Attempt to change gres/gpu Count`.
 + Fix potential double count of gres when dealing with limits.
 + Fix `slurmstepd` segfault when `ContainerPath` is not set in `oci.conf`
 + Fixed an issue where jobs requesting licenses were incorrectly rejected.
@@ -300,7 +301,7 @@ Mon Aug 21 09:43:08 UTC 2023 - Christian Goll <cgoll@suse.com>
 lookups.
 + `sacct` - when printing `PLANNED` time, use end time instead of start
 time for jobs cancelled before they started.
-+ Hold the job with "`(Reservation ... invalid)`" state reason if the
++ Hold the job with `(Reservation ... invalid)` state reason if the
 reservation is not usable by the job.
 + `sbatch` - Added new `--export=NIL` option.
 - Removed:
@@ -1321,7 +1321,7 @@ rm -rf /srv/slurm-testsuite/src /srv/slurm-testsuite/testsuite \
 %{_mandir}/man5/cgroup.*
 %{_mandir}/man5/gres.*
 %{_mandir}/man5/helpers.*
-%{_mandir}/man5/nonstop.conf.5.*
+#%%{_mandir}/man5/nonstop.conf.5.*
 %{_mandir}/man5/oci.conf.5.gz
 %{_mandir}/man5/topology.*
 %{_mandir}/man5/knl.conf.5.*