forked from pool/slurm

Commit Graph

  • 61add11d2b Accepting request 1161658 from network:cluster factory Ana Guerrero 2024-03-26 18:27:40 +0000
  • cda5ce024e Accepting request 1161499 from home:mslacken:branches:network:cluster Christian Goll 2024-03-26 08:40:44 +0000
  • 2bd53c8d44 work correctly (boo#1204697). Egbert Eich 2024-03-23 10:05:59 +0000
  • 4ec0f5cd48 Accepting request 1151965 from network:cluster Ana Guerrero 2024-02-27 21:47:57 +0000
  • fb460ebe6a Accepting request 1150524 from home:eeich:branches:network:cluster Egbert Eich 2024-02-26 21:40:59 +0000
  • 6a021ebb80 Accepting request 1141442 from network:cluster Ana Guerrero 2024-01-25 17:41:05 +0000
  • f98ecb23d5 - Remove last change. This is not how it is intended to work Egbert Eich 2024-01-25 07:58:54 +0000
  • a95f2355d0 Accepting request 1141020 from home:dimstar:Factory Christian Goll 2024-01-24 14:43:56 +0000
  • e59754da76 CVE-2023-49933, CVE-2023-49934, CVE-2023-49935, CVE-2023-49936 and CVE-2023-49937 Egbert Eich 2024-01-22 16:26:43 +0000
      * Substantially overhauled the SlurmDBD association management code. For clusters updated to 23.11, account and user additions or removals are significantly faster than in prior releases.
      * Overhauled scontrol reconfigure to prevent configuration mistakes from disabling slurmctld and slurmd. Instead, an error will be returned, and the running configuration will persist. This does require updates to the systemd service files to use the --systemd option to slurmctld and slurmd.
      * Added a new internal auth/cred plugin - auth/slurm. This builds off the prior auth/jwt model, and permits operation of the slurmdbd and slurmctld without access to full directory information with a suitable configuration.
      * Added a new --external-launcher option to srun, which is automatically set by common MPI launcher implementations and ensures processes using those non-srun launchers have full access to all resources allocated on each node.
      * Reworked the dynamic/cloud modes of operation to allow for "fanout" - where Slurm communication can be automatically offloaded to compute nodes for increased cluster scalability.
      * Overhauled and extended the Reservation subsystem to allow for most of the same resource requirements as are placed on the job. Notably, this permits reservations to now reserve GRES directly.
      * Fix scontrol update job=... TimeLimit+=/-= when used with a raw JobId of a job array element.
      * Reject TimeLimit increment/decrement when called on a job with TimeLimit=UNLIMITED.
  • e7275730c8 Accepting request 1138332 from home:mslacken:branches:network:cluster Egbert Eich 2024-01-22 15:21:33 +0000
  • 1f813cb386 Accepting request 1137045 from network:cluster Dominique Leuenberger 2024-01-05 20:45:15 +0000
  • af603b8163 Accepting request 1136624 from home:eeich:branches:network:cluster Egbert Eich 2024-01-05 12:29:13 +0000
  • 0db8ed8d95 Accepting request 1130097 from network:cluster Ana Guerrero 2023-12-04 21:59:28 +0000
  • bbe01bb79f Accepting request 1130096 from home:eeich:branches:network:cluster Egbert Eich 2023-11-30 19:27:08 +0000
  • 5a1d72f62c Accepting request 1129638 from home:eeich:branches:network:cluster Egbert Eich 2023-11-28 18:02:52 +0000
  • 1e8971e87a Accepting request 1129192 from network:cluster Ana Guerrero 2023-11-27 21:44:42 +0000
  • db15cbcf3e - On SLE-12 exclude build for s390x. Egbert Eich 2023-11-20 15:31:39 +0000
  • ccb26326c7 Accepting request 1123596 from network:cluster Ana Guerrero 2023-11-06 20:14:38 +0000
  • 961668403a Accepting request 1123595 from home:eeich:branches:network:cluster Egbert Eich 2023-11-06 14:56:24 +0000
  • b28d182fe8 Accepting request 1121548 from network:cluster Dominique Leuenberger 2023-11-01 21:09:57 +0000
  • c9c235c313 Format fix to changes file: GET /slurmdb/v0.0.39/assocations and GET /slurmdb/v0.0.39/qos to Egbert Eich 2023-10-25 07:12:31 +0000
  • 150d433676 Accepting request 1118220 from network:cluster Ana Guerrero 2023-10-17 18:24:48 +0000
  • 37c34593a9 - update to 23.02.6 to fix (CVE-2023-41914, bsc#1216207) Egbert Eich 2023-10-17 08:09:39 +0000
  • f946358d8c Accepting request 1117163 from network:cluster Ana Guerrero 2023-10-12 21:41:42 +0000
  • 449ea49bf9 - Fix changes file formatting Egbert Eich 2023-10-12 10:02:10 +0000
  • cd2c5bfc50 Accepting request 1117145 from home:mslacken:branches:network:cluster Christian Goll 2023-10-12 09:09:32 +0000
  • 90bba6a8aa Accepting request 1117137 from home:mslacken:branches:network:cluster Egbert Eich 2023-10-12 08:49:44 +0000
  • 12bf38b1d0 Accepting request 1111943 from network:cluster Dominique Leuenberger 2023-09-20 11:26:46 +0000
  • f0b994e220 plugins makes use of the MpiParams=ports= option, and previously features with the | operator, which could prevent jobs from + node_features/helpers - Fix inconsistent handling of & and |, instead of just the current set. E.g. foo|bar&baz was interpreted {foo} or {bar,baz}. tasks fewer than GPUs, which resulted in incorrectly rejecting these jobs. + slurmrestd - For GET /slurm/v0.0.39/node[s], change format of node's energy field current_watts to a dictionary to account for + slurmrestd - For GET /slurm/v0.0.39/qos, change format of QOS's + slurmrestd - For GET /slurm/v0.0.39/job[s], the 'return code' GET /slurmdb/v0.0.39/jobs from slurmrestd. were present in the log: error: Attempt to change gres/gpu Count. + Hold the job with (Reservation ... invalid) state reason if the Egbert Eich 2023-09-18 05:43:58 +0000
  • 74529b6cc2 - Updated to version 23.02.5 with the following changes: Egbert Eich 2023-09-18 05:24:51 +0000
      * Bug Fixes:
        + Revert a change in 23.02 where SLURM_NTASKS was no longer set in the job's environment when --ntasks-per-node was requested. The method by which it is set, however, is different and should be more accurate in more situations.
        + Change pmi2 plugin to honor the SrunPortRange option. This matches the new behavior of the pmix plugin in 23.02.0. Note that neither of these plugins makes use of the "MpiParams=ports=" option, and previously both were limited only by the system's ephemeral port range.
        + Fix regression in 23.02.2 that caused slurmctld -R to crash on startup if a node features plugin is configured.
        + Fix and prevent recurring reservations from overlapping.
        + job_container/tmpfs - Avoid attempts to share BasePath between nodes.
        + With CR_Cpu_Memory, fix node selection for jobs that request gres and --mem-per-cpu.
        + Fix a regression from 22.05.7 in which some jobs were allocated too few nodes, thus overcommitting cpus to some tasks.
        + Fix a job being stuck in the completing state if the job ends while the primary controller is down or unresponsive and the backup controller has not yet taken over.
        + Fix slurmctld segfault when a node registers with a configured CpuSpecList while slurmctld configuration has the node without CpuSpecList.
        + Fix cloud nodes getting stuck in POWERED_DOWN+NO_RESPOND state after not registering by ResumeTimeout.
        + slurmstepd - Avoid cleanup of config.json-less containers spooldir getting skipped.
        + Fix scontrol segfault when 'completing' command requested repeatedly in interactive mode.
  • 3825e9fab0 Accepting request 1110422 from network:cluster Ana Guerrero 2023-09-12 19:02:53 +0000
  • a323feff42 Accepting request 1110421 from home:eeich:branches:network:cluster Egbert Eich 2023-09-12 04:52:56 +0000
  • 3bcde4bfd9 Accepting request 1110259 from network:cluster Ana Guerrero 2023-09-11 19:22:19 +0000
  • f9646ba945 - Updated to 23.02.4 with the following changes: Egbert Eich 2023-09-11 07:21:32 +0000
      * Bug Fixes:
        + Fix main scheduler loop not starting after a failover to backup controller.
        + Avoid slurmctld segfault when specifying AccountingStorageExternalHost (bsc#1214983).
        + Fix sbatch return code when --wait is requested on a job array.
        + Fix collected GPUUtilization values for acct_gather_profile plugins.
        + Fix slurmrestd handling of job hold/release operations.
        + Fix step running indefinitely when slurmctld takes more than MessageTimeout to respond. Now, slurmctld will cancel the step when detected, preventing following steps from getting stuck waiting for resources to be released.
        + Fix regression to make job_desc.min_cpus accurate again in job_submit when requesting a job with --ntasks-per-node.
        + Fix handling of ArrayTaskThrottle in backfill.
        + Fix regression in 23.02.2 when checking gres state on slurmctld startup or reconfigure. Gres changes in the configuration were not updated on slurmctld startup. On startup or reconfigure, these messages were present in the log: "error: Attempt to change gres/gpu Count".
        + Fix potential double count of gres when dealing with limits.
        + Fix slurmstepd segfault when ContainerPath is not set in oci.conf.
        + Fixed an issue where jobs requesting licenses were incorrectly rejected.
        + scrontab - Fix cutting off the final character of quoted variables.
        + smail - Fix issues where e-mails at job completion were not being sent.
        + scontrol/slurmctld - fix comma parsing when updating a reservation's nodes.
        + Fix --gpu-bind=single binding tasks to wrong gpus, leading to some gpus having more tasks than they should and other gpus being unused.
        + Fix regression in 23.02 that causes slurmstepd to crash when srun requests more than TreeWidth nodes in a step and uses the pmi2 or
  • 6b47182efe Accepting request 1109308 from network:cluster Ana Guerrero 2023-09-07 19:12:41 +0000
  • c63b605916 - Fixes since 23.02.03: Egbert Eich 2023-09-06 17:11:37 +0000
      Highlights:
      * Fix main scheduler loop not starting after a failover to backup controller.
      * Avoid slurmctld segfault when specifying AccountingStorageExternalHost (bsc#1214983).
      Other:
      * Fix sbatch return code when --wait is requested on a job array.
      * Fix collected GPUUtilization values for acct_gather_profile plugins.
      * Fix slurmrestd handling of job hold/release operations.
      * Make spank S_JOB_ARGV item value hold the requested command argv instead of the srun --bcast value when --bcast requested (only in local context).
      * Fix step running indefinitely when slurmctld takes more than MessageTimeout to respond. Now, slurmctld will cancel the step when detected, preventing following steps from getting stuck waiting for resources to be released.
      * Fix regression to make job_desc.min_cpus accurate again in job_submit when requesting a job with --ntasks-per-node.
      * Fix handling of ArrayTaskThrottle in backfill.
      * Fix regression in 23.02.2 when checking gres state on slurmctld startup or reconfigure. Gres changes in the configuration were not updated on slurmctld startup. On startup or reconfigure, these messages were present in the log: "error: Attempt to change gres/gpu Count".
      * Fix potential double count of gres when dealing with limits.
      * Fix slurmstepd segfault when ContainerPath is not set in oci.conf.
      * Fixed an issue where jobs requesting licenses were incorrectly rejected.
      * scrontab - Fix cutting off the final character of quoted variables.
      * smail - Fix issues where e-mails at job completion were not being sent.
      * scontrol/slurmctld - fix comma parsing when updating a reservation's nodes.
  • 51bec69223 Accepting request 1109029 from network:cluster Ana Guerrero 2023-09-06 16:57:11 +0000
  • 47d665607b Accepting request 1109009 from home:mslacken:branches:network:cluster Christian Goll 2023-09-05 11:47:06 +0000
  • 03d2eefa9e Accepting request 1085677 from network:cluster Dominique Leuenberger 2023-05-09 11:09:16 +0000
  • 532aa1e96d Accepting request 1085668 from home:mslacken:branches:network:cluster Egbert Eich 2023-05-09 10:35:16 +0000
  • 0d5e08df4b Accepting request 1083466 from network:cluster Dominique Leuenberger 2023-04-28 14:23:13 +0000
  • 33bf8791ac - Require slurm-munge if munge authentication is installed. - Replace 'Require: config(pam)' by 'Require: pam'. Egbert Eich 2023-04-28 07:46:44 +0000
  • 392bec3223 Accepting request 1082770 from home:eeich:branches:network:cluster Christian Goll 2023-04-27 13:24:37 +0000
  • e27e58c1b6 Accepting request 1076522 from network:cluster Dominique Leuenberger 2023-04-01 17:32:20 +0000
  • 5a68fc8e5f - updated to 23.02.1 with the following changes: - removed right-pmix-path.patch as fixed upstream Egbert Eich 2023-03-31 15:48:27 +0000
  • d2a2e0a1e8 Accepting request 1076461 from home:mslacken:branches:network:cluster Egbert Eich 2023-03-31 15:44:08 +0000
  • c7d67ed696 Accepting request 1072592 from network:cluster Dominique Leuenberger 2023-03-17 16:05:03 +0000
  • 5c3d4865a1 Accepting request 1072591 from home:mslacken:branches:network:cluster Christian Goll 2023-03-17 10:52:44 +0000
  • 9883ad6d58 Accepting request 1072585 from home:mslacken:branches:network:cluster Christian Goll 2023-03-17 10:42:09 +0000
  • 2de2dcca49 Accepting request 1072087 from network:cluster Dominique Leuenberger 2023-03-15 17:56:12 +0000
  • 521f372d87 Accepting request 1072084 from home:mslacken:branches:network:cluster Christian Goll 2023-03-15 10:57:09 +0000
  • c224ea00c3 Accepting request 1070214 from network:cluster Dominique Leuenberger 2023-03-09 16:45:23 +0000
  • e85b508441 Accepting request 1070212 from home:eeich:branches:network:cluster Egbert Eich 2023-03-08 15:43:28 +0000
  • 86940cb8c4 Accepting request 1070094 from home:eeich:branches:network:cluster Egbert Eich 2023-03-08 07:58:58 +0000
  • 0f04c66747 Accepting request 1070043 from home:eeich:branches:network:cluster Egbert Eich 2023-03-07 22:14:15 +0000
  • da464bfaae Accepting request 1070038 from home:eeich:branches:network:cluster Egbert Eich 2023-03-07 21:33:03 +0000
  • 50b2b76a05 Accepting request 1068523 from network:cluster Dominique Leuenberger 2023-03-02 22:03:34 +0000
  • 6997bacde0 Accepting request 1068522 from home:eeich:branches:network:cluster Egbert Eich 2023-03-01 17:58:54 +0000
  • 8a8f7dcb78 Accepting request 1068320 from network:cluster Dominique Leuenberger 2023-03-01 15:14:17 +0000
  • e60f39a466 - updated to 23.02.0 Egbert Eich 2023-02-28 20:50:48 +0000
  • 8899aac00b - testsuite: on later SUSE versions claim ownership of directory Egbert Eich 2023-02-28 20:34:03 +0000
  • 18aa012ab9 Accepting request 1068316 from home:eeich:branches:network:cluster Egbert Eich 2023-02-28 20:30:32 +0000
  • ef6d6521aa Accepting request 1067475 from home:eeich:branches:network:cluster Egbert Eich 2023-02-23 19:32:51 +0000
  • d1ebf00ba6 Accepting request 1063957 from network:cluster Dominique Leuenberger 2023-02-09 15:23:26 +0000
  • 4693e39860 Accepting request 1063954 from home:eeich:branches:network:cluster Egbert Eich 2023-02-09 08:22:55 +0000
  • a4484c7dc2 Accepting request 1042071 from network:cluster Dominique Leuenberger 2022-12-11 16:16:58 +0000
  • 6f080824a4 Accepting request 1039957 from home:eeich:branches:network:cluster Egbert Eich 2022-12-11 07:58:12 +0000
  • 30dd030610 Accepting request 1031255 from network:cluster Dominique Leuenberger 2022-10-26 10:32:00 +0000
  • 212048404b * Improve setup-testsuite.sh: copy ssh fingerprints from all nodes. Egbert Eich 2022-10-26 06:23:36 +0000
  • 776ce8f23b - Test Suite fixes: Egbert Eich 2022-10-25 11:33:49 +0000
      * Update README_Testsuite.md.
      * Clean up leftover files when uninstalling the test suite.
      * Adjustment to test suite package: for SLE, mark the openmpi4 devel package and slurm-hdf5 optional.
      * Add -ffat-lto-objects to the build flags when LTO is set to make sure the object files we ship with the test suite still work correctly.
  • 642a47efa7 - Adjustment to test suite package: only recommend openmpi4 Egbert Eich 2022-10-24 08:54:35 +0000
  • 52046053d5 Accepting request 1030610 from home:eeich:branches:network:cluster Egbert Eich 2022-10-24 05:31:40 +0000
  • 220eec76a4 Accepting request 1030432 from network:cluster Dominique Leuenberger 2022-10-22 12:13:18 +0000
  • c2551ab47f Accepting request 1010642 from home:mslacken:branches:network:cluster Egbert Eich 2022-10-21 15:00:25 +0000
  • edd405b2c8 Accepting request 1006180 from network:cluster Dominique Leuenberger 2022-09-26 16:48:44 +0000
  • 09aecc2015 Accepting request 1005746 from home:eeich:branches:network:cluster Egbert Eich 2022-09-26 15:01:51 +0000
  • ae04ec8787 Accepting request 1005247 from network:cluster Dominique Leuenberger 2022-09-22 12:49:55 +0000
  • 3f68233e21 Accepting request 1005246 from home:eeich:branches:network:cluster Egbert Eich 2022-09-21 15:33:09 +0000
  • d3bcbab808 Accepting request 992362 from network:cluster Dominique Leuenberger 2022-08-02 20:09:54 +0000
  • b60ac5f569 Accepting request 992353 from home:eeich:branches:network:cluster Egbert Eich 2022-08-02 15:34:01 +0000
  • fd509c0258 Accepting request 990637 from home:bmwiedemann:branches:network:cluster Egbert Eich 2022-08-02 13:14:07 +0000
  • 7a8e082057 Accepting request 990643 from network:cluster Richard Brown 2022-07-22 17:21:25 +0000
  • e067a36989 - Fix a typo which prevented the nproc limit for slurmd from being raised for the test suite. Egbert Eich 2022-07-15 07:15:34 +0000
  • 69890cab1e Accepting request 989256 from home:eeich:branches:network:cluster Egbert Eich 2022-07-15 07:13:32 +0000
  • 167150eca6 - Fix a typo Egbert Eich 2022-07-15 07:12:53 +0000
  • e57307d81e Accepting request 988733 from network:cluster Dominique Leuenberger 2022-07-13 11:45:23 +0000
  • 7d13a7ba97 Accepting request 988732 from home:eeich:branches:network:cluster Egbert Eich 2022-07-12 20:03:18 +0000
  • 52adf61c22 Accepting request 983910 from home:mslacken:branches:network:cluster Christian Goll 2022-06-20 11:58:11 +0000
  • 2951a00ce2 - Package the Slurm testsuite for QA purposes. NOTE: This package is not meant to be used for testing by the user but rather for testing by the maintainers to ensure the package is working properly. DO NOT report test suite failures unless you are able to confirm that the failure is really a bug. Egbert Eich 2022-06-08 13:21:55 +0000
  • 13c4d39104 Accepting request 980097 from network:cluster Dominique Leuenberger 2022-05-31 14:04:51 +0000
  • faa19fe22b Accepting request 980093 from home:mslacken:branches:network:cluster Christian Goll 2022-05-31 13:38:54 +0000
  • 737b47d2be Accepting request 976280 from network:cluster Dominique Leuenberger 2022-05-12 20:59:35 +0000
  • a07f819c2f - Update to 21.08.8 which fixes CVE-2022-29500 (bsc#1199278), CVE-2022-29501 (bsc#1199279), and CVE-2022-29502 (bsc#1199281). Egbert Eich 2022-05-11 10:26:59 +0000
  • 5f6ca5dea6 Accepting request 976056 from home:eeich:branches:network:cluster Egbert Eich 2022-05-11 10:25:15 +0000
  • 62db1261ed Accepting request 975440 from network:cluster Dominique Leuenberger 2022-05-06 17:00:14 +0000
  • 950ae37e78 Accepting request 975374 from home:mslacken:branches:network:cluster Christian Goll 2022-05-06 15:13:12 +0000
  • 2450bd4dcd Accepting request 974456 from network:cluster Dominique Leuenberger 2022-05-03 19:19:04 +0000
  • 30c749c9e0 Accepting request 974433 from home:mslacken:branches:network:cluster Christian Goll 2022-05-02 17:06:13 +0000
  • ec8df38732 Accepting request 942222 from network:cluster Dominique Leuenberger 2021-12-23 16:53:52 +0000
  • d442993ff4 Accepting request 942081 from home:mslacken:branches:network:cluster Christian Goll 2021-12-23 10:26:41 +0000