plugins makes use of the MpiParams=ports= option, and previously

features with the `|` operator, which could prevent jobs from
    + `node_features/helpers` - Fix inconsistent handling of `&` and `|`,
      instead of just the current set. E.g. `foo|bar&baz` was interpreted
      `{foo} or {bar,baz}`.
      tasks fewer than GPUs, which resulted in incorrectly rejecting these
      jobs.
    + `slurmrestd` - For `GET /slurm/v0.0.39/node[s]`, change format of
      node's energy field `current_watts` to a dictionary to account for
    + `slurmrestd` - For `GET /slurm/v0.0.39/qos`, change format of QOS's
    + slurmrestd - For `GET /slurm/v0.0.39/job[s]`, the 'return code'
      `GET /slurmdb/v0.0.39/jobs` from slurmrestd.
      were present in the log: `error: Attempt to change gres/gpu Count`.
    + Hold the job with `(Reservation ... invalid)` state reason if the

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=265
This commit is contained in:
Egbert Eich 2023-09-18 05:43:58 +00:00 committed by Git OBS Bridge
parent 74529b6cc2
commit f0b994e220
2 changed files with 15 additions and 14 deletions

@@ -9,7 +9,7 @@ Mon Sep 18 05:23:19 UTC 2023 - Egbert Eich <eich@suse.com>
accurate in more situations.
+ Change pmi2 plugin to honor the `SrunPortRange` option. This matches the
new behavior of the pmix plugin in 23.02.0. Note that neither of these
plugins makes use of the "`MpiParams=ports=`" option, and previously
plugins makes use of the `MpiParams=ports=` option, and previously
were only limited by the systems ephemeral port range.
+ Fix regression in 23.02.2 that caused slurmctld -R to crash on startup if
a node features plugin is configured.
@@ -44,13 +44,13 @@ Mon Sep 18 05:23:19 UTC 2023 - Egbert Eich <eich@suse.com>
federation before they have registered with the dbd.
+ `node_features/helpers` - Fix node selection for jobs requesting
changeable.
features with the '`|`' operator, which could prevent jobs from
features with the `|` operator, which could prevent jobs from
running on some valid nodes.
+ `node_features/helpers` - Fix inconsistent handling of '`&`' and '`|`',
+ `node_features/helpers` - Fix inconsistent handling of `&` and `|`,
where an AND'd feature was sometimes AND'd to all sets of features
instead of just the current set. E.g. "`foo|bar&baz`" was interpreted
instead of just the current set. E.g. `foo|bar&baz` was interpreted
as `{foo,baz}` or `{bar,baz}` instead of how it is documented:
"`{foo} or {bar,baz}`".
`{foo} or {bar,baz}`.
+ Fix job accounting so that when a job is requeued its allocated node
count is cleared. After the requeue, sacct will correctly show that
the job has 0 `AllocNodes` while it is pending or if it is canceled
@@ -60,7 +60,8 @@ Mon Sep 18 05:23:19 UTC 2023 - Egbert Eich <eich@suse.com>
+ Fix intel OneAPI autodetect: detect the `/dev/dri/renderD[0-9]+` GPUs,
and do not detect `/dev/dri/card[0-9]+`.
+ Fix node selection for jobs that request `--gpus` and a number of
tasks fewer than GPUs, which resulted in incorrectly rejecting these jobs.
tasks fewer than GPUs, which resulted in incorrectly rejecting these
jobs.
+ Remove `MYSQL_OPT_RECONNECT` completely.
+ Fix cloud nodes in `POWERING_UP` state disappearing (getting set
to `FUTURE`)
@@ -102,13 +103,13 @@ Mon Sep 18 05:23:19 UTC 2023 - Egbert Eich <eich@suse.com>
+ Fix minor memory leak with `--tres-per-task` and licenses.
+ Fix cyclic socket cpu distribution for tasks in a step where
`--cpus-per-task` < usable threads per core.
+ `slurmrestd` - For '`GET /slurm/v0.0.39/node[s]`', change format of
node's energy field "`current_watts`" to a dictionary to account for
+ `slurmrestd` - For `GET /slurm/v0.0.39/node[s]`, change format of
node's energy field `current_watts` to a dictionary to account for
unset value instead of dumping 4294967294.
+ `slurmrestd` - For '`GET /slurm/v0.0.39/qos`', change format of QOS's
+ `slurmrestd` - For `GET /slurm/v0.0.39/qos`, change format of QOS's
field "priority" to a dictionary to account for unset value instead of
dumping 4294967294.
+ slurmrestd - For '`GET /slurm/v0.0.39/job[s]`', the 'return code'
+ slurmrestd - For `GET /slurm/v0.0.39/job[s]`, the 'return code'
code field in `v0.0.39_job_exit`_code will be set to -127 instead of
being left unset where job does not have a relevant return code.
* Other Changes:
@@ -127,7 +128,7 @@ Mon Sep 18 05:23:19 UTC 2023 - Egbert Eich <eich@suse.com>
+ `slurmrestd` - Reduce memory usage when printing out job CPU frequency.
+ `data_parser/v0.0.39` - Add `required/memory_per_cpu` and
`required/memory_per_node` to `sacct --json` and `sacct --yaml` and
'`GET /slurmdb/v0.0.39/jobs`' from slurmrestd.
`GET /slurmdb/v0.0.39/jobs` from slurmrestd.
+ `gpu/oneapi` - Store cores correctly so CPU affinity is tracked.
+ Allow `slurmdbd -R` to work if the root assoc id is not 1.
+ Limit periodic node registrations to 50 instead of the full `TreeWidth`.
@@ -156,7 +157,7 @@ Mon Aug 21 09:43:08 UTC 2023 - Christian Goll <cgoll@suse.com>
+ Fix regression in 23.02.2 when checking gres state on `slurmctld`
startup or reconfigure. Gres changes in the configuration were not
updated on slurmctld startup. On startup or reconfigure, these messages
were present in the log: `"error: Attempt to change gres/gpu Count`".
were present in the log: `error: Attempt to change gres/gpu Count`.
+ Fix potential double count of gres when dealing with limits.
+ Fix `slurmstepd` segfault when `ContainerPath` is not set in `oci.conf`
+ Fixed an issue where jobs requesting licenses were incorrectly rejected.
@@ -300,7 +301,7 @@ Mon Aug 21 09:43:08 UTC 2023 - Christian Goll <cgoll@suse.com>
lookups.
+ `sacct` - when printing `PLANNED` time, use end time instead of start
time for jobs cancelled before they started.
+ Hold the job with "`(Reservation ... invalid)`" state reason if the
+ Hold the job with `(Reservation ... invalid)` state reason if the
reservation is not usable by the job.
+ `sbatch` - Added new `--export=NIL` option.
- Removed:

@@ -1321,7 +1321,7 @@ rm -rf /srv/slurm-testsuite/src /srv/slurm-testsuite/testsuite \
%{_mandir}/man5/cgroup.*
%{_mandir}/man5/gres.*
%{_mandir}/man5/helpers.*
%{_mandir}/man5/nonstop.conf.5.*
#%%{_mandir}/man5/nonstop.conf.5.*
%{_mandir}/man5/oci.conf.5.gz
%{_mandir}/man5/topology.*
%{_mandir}/man5/knl.conf.5.*