slurm/upgrades

Accepting request 1150524 from home:eeich:branches:network:cluster

- Update to version 23.11.03
  * slurmrestd - Reject single http query with multiple path requests.
  * Fix launching Singularity v4.x containers with `srun --container` by setting .process.terminal to true in generated `config.json` when step has pseudoterminal (`--pty`) requested.
  * Fix loading in `dynamic/cloud` node jobs after `net_cred` expired.
  * Fix cgroup null path error on `slurmd/slurmstepd` tear down.
  * `data_parser/v0.0.40` - Prevent failure if accounting is disabled, instead issue a warning if needed data from the database cannot be retrieved.
  * `openapi/slurmctld` - Prevent failure if accounting is disabled.
  * Prevent `slurmscriptd` processing delays from blocking other threads in `slurmctld` while trying to launch various scripts. This is additional work for a fix in 23.02.6.
  * Fix memory leak when receiving alias addrs from controller.
  * `scontrol` - Accept `scontrol token lifespan=infinite` to create tokens that effectively do not expire.
  * Avoid errors when Slurmdb accounting disabled when `--json` or `--yaml` is invoked with CLI commands and `slurmrestd`. Add warnings when query would have populated data from Slurmdb instead of errors.
  * Fix `slurmctld` memory leak when running job with `--tres-per-task=gres:shard:#`
  * Fix backfill trying to start jobs outside of backfill window.
  * Fix oversubscription on partitions with `PreemptMode=OFF`.
  * Preserve node reason on power up if the node is downed or drained.

OBS-URL: https://build.opensuse.org/request/show/1150524
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=289
2024-02-26 22:40:59 +01:00
23.11.1
23.02.7
Accepting request 1136624 from home:eeich:branches:network:cluster

- Update to 23.02.6 to fix (CVE-2023-49933 - bsc#1218046, CVE-2023-49935 - bsc#1218049, CVE-2023-49936 - bsc#1218050, CVE-2023-49937 - bsc#1218051, CVE-2023-49938 - bsc#1218053)
  * Security Fixes:
    + Add `JobAcctGatherParams=DisableGPUAcct` to disable gpu accounting.
    + `acct_gather_energy/ipmi` - Improve logging of DCMI issues.
    + `gpu/oneapi` - Add support for new env vars `ZE_FLAT_DEVICE_HIERARCHY` and `ZE_ENABLE_PCI_ID_DEVICE_ORDER`.
    + `data_parser/v0.0.39` - skip empty string when parsing QOS ids.
    + Remove error message from `assoc_mgr_update_assocs` when purposefully resetting the default QOS.
  * Bug Fixes:
    + `libslurm_nss` - Avoid causing glibc to assert due to an unexpected return from slurm_nss due to an error during lookup.
    + Fix job requests with `--tres-per-task` sometimes resulting in bad allocations that cannot run subsequent job steps.
    + Fix issue with `slurmd` where `srun` fails to be warned when a node prolog script runs beyond `MsgTimeout` set in `slurm.conf`.
    + `gres/shard` - Fix plugin functions to have matching parameter orders.
    + `gpu/nvml` - Fix issue that resulted in the wrong MIG devices being constrained to a job
    + `gpu/nvml` - Fix linking issue with MIGs that prevented multiple MIGs being used in a single job for certain MIG configurations
    + Fix file descriptor leak in slurmd when using `acct_gather_energy/ipmi` with DCMI devices.
    + `sview` - avoid crash when job has a node list string > 49 characters.
    + Prevent `slurmctld` crash during reconfigure when packing job start messages.
    + Preserve reason uid on reconfig.
    + Update node reason with updated `INVAL` state reason if different from

OBS-URL: https://build.opensuse.org/request/show/1136624
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=282
2024-01-05 13:29:13 +01:00
23.02.6
23.02.5
23.02.3
23.02.0
22.05.11
22.05.10
22.05.5
22.05.2
22.05.0