Egbert Eich c63b605916 - Fixes since 23.02.03:
  Highlights:
  * Fix main scheduler loop not starting after a failover to backup controller.
  * Avoid slurmctld segfault when specifying `AccountingStorageExternalHost`
    (bsc#1214983).
  Other:
  * Fix sbatch return code when `--wait` is requested on a job array.
  * Fix collected `GPUUtilization` values for `acct_gather_profile` plugins.
  * Fix `slurmrestd` handling of job hold/release operations.
  * Make spank `S_JOB_ARGV` item value hold the requested command `argv`
    instead of the `srun --bcast` value when `--bcast` is requested (only in
    local context).
  * Fix step running indefinitely when slurmctld takes more than
    `MessageTimeout` to respond. slurmctld now cancels the step when this is
    detected, preventing subsequent steps from getting stuck waiting for
    resources to be released.
  * Fix regression to make `job_desc.min_cpus` accurate again in job_submit when
    requesting a job with `--ntasks-per-node`.
  * Fix handling of `ArrayTaskThrottle` in backfill.
  * Fix regression in 23.02.2 when checking gres state on `slurmctld` startup or
    reconfigure. Gres changes in the configuration were not updated on slurmctld
    startup. On startup or reconfigure, these messages were present in the log:
    `"error: Attempt to change gres/gpu Count"`.
  * Fix potential double count of gres when dealing with limits.
  * Fix slurmstepd segfault when `ContainerPath` is not set in `oci.conf`.
  * Fix an issue where jobs requesting licenses were incorrectly rejected.
  * `scrontab` - Fix cutting off the final character of quoted variables.
  * `smail` - Fix issues where e-mails at job completion were not being sent.
  * `scontrol/slurmctld` - Fix comma parsing when updating a reservation's
    nodes.

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=258
2023-09-06 17:11:37 +00:00