plugins makes use of the MpiParams=ports= option, and previously

features with the `|` operator, which could prevent jobs from
    + `node_features/helpers` - Fix inconsistent handling of `&` and `|`,
      instead of just the current set. E.g. `foo|bar&baz` was interpreted
      `{foo} or {bar,baz}`.
      tasks fewer than GPUs, which resulted in incorrectly rejecting these
      jobs.
    + `slurmrestd` - For `GET /slurm/v0.0.39/node[s]`, change format of
      node's energy field `current_watts` to a dictionary to account for
    + `slurmrestd` - For `GET /slurm/v0.0.39/qos`, change format of QOS's
    + slurmrestd - For `GET /slurm/v0.0.39/job[s]`, the 'return code'
      `GET /slurmdb/v0.0.39/jobs` from slurmrestd.
      were present in the log: `error: Attempt to change gres/gpu Count`.
    + Hold the job with `(Reservation ... invalid)` state reason if the

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=265
Egbert Eich 2023-09-18 05:43:58 +00:00 committed by Git OBS Bridge
parent 74529b6cc2
commit f0b994e220
2 changed files with 15 additions and 14 deletions


@@ -9,7 +9,7 @@ Mon Sep 18 05:23:19 UTC 2023 - Egbert Eich <eich@suse.com>
accurate in more situations. accurate in more situations.
+ Change pmi2 plugin to honor the `SrunPortRange` option. This matches the + Change pmi2 plugin to honor the `SrunPortRange` option. This matches the
new behavior of the pmix plugin in 23.02.0. Note that neither of these new behavior of the pmix plugin in 23.02.0. Note that neither of these
plugins makes use of the "`MpiParams=ports=`" option, and previously plugins makes use of the `MpiParams=ports=` option, and previously
were only limited by the systems ephemeral port range. were only limited by the systems ephemeral port range.
+ Fix regression in 23.02.2 that caused slurmctld -R to crash on startup if + Fix regression in 23.02.2 that caused slurmctld -R to crash on startup if
a node features plugin is configured. a node features plugin is configured.
@@ -44,13 +44,13 @@ Mon Sep 18 05:23:19 UTC 2023 - Egbert Eich <eich@suse.com>
federation before they have registered with the dbd. federation before they have registered with the dbd.
+ `node_features/helpers` - Fix node selection for jobs requesting + `node_features/helpers` - Fix node selection for jobs requesting
changeable. changeable.
features with the '`|`' operator, which could prevent jobs from features with the `|` operator, which could prevent jobs from
running on some valid nodes. running on some valid nodes.
+ `node_features/helpers` - Fix inconsistent handling of '`&`' and '`|`', + `node_features/helpers` - Fix inconsistent handling of `&` and `|`,
where an AND'd feature was sometimes AND'd to all sets of features where an AND'd feature was sometimes AND'd to all sets of features
instead of just the current set. E.g. "`foo|bar&baz`" was interpreted instead of just the current set. E.g. `foo|bar&baz` was interpreted
as `{foo,baz}` or `{bar,baz}` instead of how it is documented: as `{foo,baz}` or `{bar,baz}` instead of how it is documented:
"`{foo} or {bar,baz}`". `{foo} or {bar,baz}`.
+ Fix job accounting so that when a job is requeued its allocated node + Fix job accounting so that when a job is requeued its allocated node
count is cleared. After the requeue, sacct will correctly show that count is cleared. After the requeue, sacct will correctly show that
the job has 0 `AllocNodes` while it is pending or if it is canceled the job has 0 `AllocNodes` while it is pending or if it is canceled
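
The documented precedence described in the `node_features/helpers` entry above (`&` binds tighter than `|`) can be illustrated with a minimal Python sketch. This is not Slurm's implementation: the function names `parse_features` and `node_matches` are hypothetical, and the sketch ignores parentheses, counts, and any other constraint syntax.

```python
from __future__ import annotations


def parse_features(expr: str) -> list[frozenset[str]]:
    """Split on '|' first and '&' second, so '&' binds tighter than '|'.

    Hypothetical helper for illustration only; returns one feature set
    per '|' alternative.
    """
    alternatives = []
    for alt in expr.split("|"):
        names = frozenset(n.strip() for n in alt.split("&") if n.strip())
        if names:
            alternatives.append(names)
    return alternatives


def node_matches(node_features: set[str], expr: str) -> bool:
    """A node qualifies if it carries every feature of at least one alternative."""
    return any(alt <= node_features for alt in parse_features(expr))
```

Under this reading, `foo|bar&baz` yields the alternatives `{foo}` and `{bar,baz}`, so a node that only has `foo` qualifies. Under the old, inconsistent interpretation (`{foo,baz}` or `{bar,baz}`) that node would have been rejected.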
@@ -60,7 +60,8 @@ Mon Sep 18 05:23:19 UTC 2023 - Egbert Eich <eich@suse.com>
+ Fix intel OneAPI autodetect: detect the `/dev/dri/renderD[0-9]+` GPUs, + Fix intel OneAPI autodetect: detect the `/dev/dri/renderD[0-9]+` GPUs,
and do not detect `/dev/dri/card[0-9]+`. and do not detect `/dev/dri/card[0-9]+`.
+ Fix node selection for jobs that request `--gpus` and a number of + Fix node selection for jobs that request `--gpus` and a number of
tasks fewer than GPUs, which resulted in incorrectly rejecting these jobs. tasks fewer than GPUs, which resulted in incorrectly rejecting these
jobs.
+ Remove `MYSQL_OPT_RECONNECT` completely. + Remove `MYSQL_OPT_RECONNECT` completely.
+ Fix cloud nodes in `POWERING_UP` state disappearing (getting set + Fix cloud nodes in `POWERING_UP` state disappearing (getting set
to `FUTURE`) to `FUTURE`)
@@ -102,13 +103,13 @@ Mon Sep 18 05:23:19 UTC 2023 - Egbert Eich <eich@suse.com>
+ Fix minor memory leak with `--tres-per-task` and licenses. + Fix minor memory leak with `--tres-per-task` and licenses.
+ Fix cyclic socket cpu distribution for tasks in a step where + Fix cyclic socket cpu distribution for tasks in a step where
`--cpus-per-task` < usable threads per core. `--cpus-per-task` < usable threads per core.
+ `slurmrestd` - For '`GET /slurm/v0.0.39/node[s]`', change format of + `slurmrestd` - For `GET /slurm/v0.0.39/node[s]`, change format of
node's energy field "`current_watts`" to a dictionary to account for node's energy field `current_watts` to a dictionary to account for
unset value instead of dumping 4294967294. unset value instead of dumping 4294967294.
+ `slurmrestd` - For '`GET /slurm/v0.0.39/qos`', change format of QOS's + `slurmrestd` - For `GET /slurm/v0.0.39/qos`, change format of QOS's
field "priority" to a dictionary to account for unset value instead of field "priority" to a dictionary to account for unset value instead of
dumping 4294967294. dumping 4294967294.
+ slurmrestd - For '`GET /slurm/v0.0.39/job[s]`', the 'return code' + slurmrestd - For `GET /slurm/v0.0.39/job[s]`, the 'return code'
code field in `v0.0.39_job_exit`_code will be set to -127 instead of code field in `v0.0.39_job_exit`_code will be set to -127 instead of
being left unset where job does not have a relevant return code. being left unset where job does not have a relevant return code.
* Other Changes: * Other Changes:
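
For clients affected by the `slurmrestd` format changes listed above, a tolerant decoder along the following lines may help. It is a sketch under stated assumptions: the dictionary keys `set`, `infinite`, and `number` are assumed here and are not confirmed by this changelog, and `decode_number` is a hypothetical name. The value 4294967294 is the "unset" sentinel mentioned in the entries.

```python
from __future__ import annotations

NO_VAL = 4294967294  # 0xFFFFFFFE, the "unset" sentinel named in the changelog


def decode_number(field: int | dict | None) -> float | int | None:
    """Return None for unset values, whether given in old or new format.

    Assumption: the new dictionary format uses the keys "set",
    "infinite" and "number"; check the OpenAPI schema of your slurmrestd.
    """
    if field is None:
        return None
    if isinstance(field, dict):
        if not field.get("set", False):
            return None
        if field.get("infinite", False):
            return float("inf")
        return field.get("number")
    # Old format: a bare integer, with the sentinel meaning "unset".
    return None if field == NO_VAL else field
```

With such a helper, code reading `current_watts` or a QOS `priority` can treat both the older bare-integer output and the dictionary output uniformly.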
@@ -127,7 +128,7 @@ Mon Sep 18 05:23:19 UTC 2023 - Egbert Eich <eich@suse.com>
+ `slurmrestd` - Reduce memory usage when printing out job CPU frequency. + `slurmrestd` - Reduce memory usage when printing out job CPU frequency.
+ `data_parser/v0.0.39` - Add `required/memory_per_cpu` and + `data_parser/v0.0.39` - Add `required/memory_per_cpu` and
`required/memory_per_node` to `sacct --json` and `sacct --yaml` and `required/memory_per_node` to `sacct --json` and `sacct --yaml` and
'`GET /slurmdb/v0.0.39/jobs`' from slurmrestd. `GET /slurmdb/v0.0.39/jobs` from slurmrestd.
+ `gpu/oneapi` - Store cores correctly so CPU affinity is tracked. + `gpu/oneapi` - Store cores correctly so CPU affinity is tracked.
+ Allow `slurmdbd -R` to work if the root assoc id is not 1. + Allow `slurmdbd -R` to work if the root assoc id is not 1.
+ Limit periodic node registrations to 50 instead of the full `TreeWidth`. + Limit periodic node registrations to 50 instead of the full `TreeWidth`.
@@ -156,7 +157,7 @@ Mon Aug 21 09:43:08 UTC 2023 - Christian Goll <cgoll@suse.com>
+ Fix regression in 23.02.2 when checking gres state on `slurmctld` + Fix regression in 23.02.2 when checking gres state on `slurmctld`
startup or reconfigure. Gres changes in the configuration were not startup or reconfigure. Gres changes in the configuration were not
updated on slurmctld startup. On startup or reconfigure, these messages updated on slurmctld startup. On startup or reconfigure, these messages
were present in the log: `"error: Attempt to change gres/gpu Count`". were present in the log: `error: Attempt to change gres/gpu Count`.
+ Fix potential double count of gres when dealing with limits. + Fix potential double count of gres when dealing with limits.
+ Fix `slurmstepd` segfault when `ContainerPath` is not set in `oci.conf` + Fix `slurmstepd` segfault when `ContainerPath` is not set in `oci.conf`
+ Fixed an issue where jobs requesting licenses were incorrectly rejected. + Fixed an issue where jobs requesting licenses were incorrectly rejected.
@@ -300,7 +301,7 @@ Mon Aug 21 09:43:08 UTC 2023 - Christian Goll <cgoll@suse.com>
lookups. lookups.
+ `sacct` - when printing `PLANNED` time, use end time instead of start + `sacct` - when printing `PLANNED` time, use end time instead of start
time for jobs cancelled before they started. time for jobs cancelled before they started.
+ Hold the job with "`(Reservation ... invalid)`" state reason if the + Hold the job with `(Reservation ... invalid)` state reason if the
reservation is not usable by the job. reservation is not usable by the job.
+ `sbatch` - Added new `--export=NIL` option. + `sbatch` - Added new `--export=NIL` option.
- Removed: - Removed:


@@ -1321,7 +1321,7 @@ rm -rf /srv/slurm-testsuite/src /srv/slurm-testsuite/testsuite \
%{_mandir}/man5/cgroup.* %{_mandir}/man5/cgroup.*
%{_mandir}/man5/gres.* %{_mandir}/man5/gres.*
%{_mandir}/man5/helpers.* %{_mandir}/man5/helpers.*
%{_mandir}/man5/nonstop.conf.5.* #%%{_mandir}/man5/nonstop.conf.5.*
%{_mandir}/man5/oci.conf.5.gz %{_mandir}/man5/oci.conf.5.gz
%{_mandir}/man5/topology.* %{_mandir}/man5/topology.*
%{_mandir}/man5/knl.conf.5.* %{_mandir}/man5/knl.conf.5.*