Commit Graph

  • 90bba6a8aa Accepting request 1117137 from home:mslacken:branches:network:cluster Egbert Eich 2023-10-12 08:49:44 +00:00
  • e985773dfe Accepting request 1117137 from home:mslacken:branches:network:cluster Egbert Eich 2023-10-12 08:49:44 +00:00
  • 12bf38b1d0 Accepting request 1111943 from network:cluster Dominique Leuenberger 2023-09-20 11:26:46 +00:00
  • bd7880f40f Accepting request 1111943 from network:cluster Dominique Leuenberger 2023-09-20 11:26:46 +00:00
  • f0b994e220 plugins makes use of the MpiParams=ports= option, and previously features with the | operator, which could prevent jobs from + node_features/helpers - Fix inconsistent handling of & and |, instead of just the current set. E.g. foo|bar&baz was interpreted {foo} or {bar,baz}. tasks fewer than GPUs, which resulted in incorrectly rejecting these jobs. + slurmrestd - For GET /slurm/v0.0.39/node[s], change format of node's energy field current_watts to a dictionary to account for + slurmrestd - For GET /slurm/v0.0.39/qos, change format of QOS's + slurmrestd - For GET /slurm/v0.0.39/job[s], the 'return code' GET /slurmdb/v0.0.39/jobs from slurmrestd. were present in the log: error: Attempt to change gres/gpu Count. + Hold the job with (Reservation ... invalid) state reason if the Egbert Eich 2023-09-18 05:43:58 +00:00
  • a4f697f06d plugins makes use of the MpiParams=ports= option, and previously features with the | operator, which could prevent jobs from + node_features/helpers - Fix inconsistent handling of & and |, instead of just the current set. E.g. foo|bar&baz was interpreted {foo} or {bar,baz}. tasks fewer than GPUs, which resulted in incorrectly rejecting these jobs. + slurmrestd - For GET /slurm/v0.0.39/node[s], change format of node's energy field current_watts to a dictionary to account for + slurmrestd - For GET /slurm/v0.0.39/qos, change format of QOS's + slurmrestd - For GET /slurm/v0.0.39/job[s], the 'return code' GET /slurmdb/v0.0.39/jobs from slurmrestd. were present in the log: error: Attempt to change gres/gpu Count. + Hold the job with (Reservation ... invalid) state reason if the Egbert Eich 2023-09-18 05:43:58 +00:00
  • 74529b6cc2 - Updated to version 23.02.5 with the following changes: * Bug Fixes: + Revert a change in 23.02 where SLURM_NTASKS was no longer set in the job's environment when --ntasks-per-node was requested. The method by which it is being set, however, is different and should be more accurate in more situations. + Change pmi2 plugin to honor the SrunPortRange option. This matches the new behavior of the pmix plugin in 23.02.0. Note that neither of these plugins makes use of the "MpiParams=ports=" option, and previously were only limited by the system's ephemeral port range. + Fix regression in 23.02.2 that caused slurmctld -R to crash on startup if a node features plugin is configured. + Fix and prevent reoccurring reservations from overlapping. + job_container/tmpfs - Avoid attempts to share BasePath between nodes. + With CR_Cpu_Memory, fix node selection for jobs that request gres and --mem-per-cpu. + Fix a regression from 22.05.7 in which some jobs were allocated too few nodes, thus overcommitting cpus to some tasks. + Fix a job being stuck in the completing state if the job ends while the primary controller is down or unresponsive and the backup controller has not yet taken over. + Fix slurmctld segfault when a node registers with a configured CpuSpecList while slurmctld configuration has the node without CpuSpecList. + Fix cloud nodes getting stuck in POWERED_DOWN+NO_RESPOND state after not registering by ResumeTimeout. + slurmstepd - Avoid cleanup of config.json-less containers spooldir getting skipped. + Fix scontrol segfault when 'completing' command requested repeatedly in interactive mode. Egbert Eich 2023-09-18 05:24:51 +00:00
  • 7c740289ad - Updated to version 23.02.5 with the following changes: * Bug Fixes: + Revert a change in 23.02 where SLURM_NTASKS was no longer set in the job's environment when --ntasks-per-node was requested. The method by which it is being set, however, is different and should be more accurate in more situations. + Change pmi2 plugin to honor the SrunPortRange option. This matches the new behavior of the pmix plugin in 23.02.0. Note that neither of these plugins makes use of the "MpiParams=ports=" option, and previously were only limited by the system's ephemeral port range. + Fix regression in 23.02.2 that caused slurmctld -R to crash on startup if a node features plugin is configured. + Fix and prevent reoccurring reservations from overlapping. + job_container/tmpfs - Avoid attempts to share BasePath between nodes. + With CR_Cpu_Memory, fix node selection for jobs that request gres and --mem-per-cpu. + Fix a regression from 22.05.7 in which some jobs were allocated too few nodes, thus overcommitting cpus to some tasks. + Fix a job being stuck in the completing state if the job ends while the primary controller is down or unresponsive and the backup controller has not yet taken over. + Fix slurmctld segfault when a node registers with a configured CpuSpecList while slurmctld configuration has the node without CpuSpecList. + Fix cloud nodes getting stuck in POWERED_DOWN+NO_RESPOND state after not registering by ResumeTimeout. + slurmstepd - Avoid cleanup of config.json-less containers spooldir getting skipped. + Fix scontrol segfault when 'completing' command requested repeatedly in interactive mode. Egbert Eich 2023-09-18 05:24:51 +00:00
  • 3825e9fab0 Accepting request 1110422 from network:cluster Ana Guerrero 2023-09-12 19:02:53 +00:00
  • 5e252b9d68 Accepting request 1110422 from network:cluster Ana Guerrero 2023-09-12 19:02:53 +00:00
  • a323feff42 Accepting request 1110421 from home:eeich:branches:network:cluster Egbert Eich 2023-09-12 04:52:56 +00:00
  • 09d371a40e Accepting request 1110421 from home:eeich:branches:network:cluster Egbert Eich 2023-09-12 04:52:56 +00:00
  • 3bcde4bfd9 Accepting request 1110259 from network:cluster Ana Guerrero 2023-09-11 19:22:19 +00:00
  • 9f1000ec21 Accepting request 1110259 from network:cluster Ana Guerrero 2023-09-11 19:22:19 +00:00
  • f9646ba945 - Updated to 23.02.4 with the following changes: * Bug Fixes: + Fix main scheduler loop not starting after a failover to backup controller. + Avoid slurmctld segfault when specifying AccountingStorageExternalHost (bsc#1214983). + Fix sbatch return code when --wait is requested on a job array. + Fix collected GPUUtilization values for acct_gather_profile plugins. + Fix slurmrestd handling of job hold/release operations. + Fix step running indefinitely when slurmctld takes more than MessageTimeout to respond. Now, slurmctld will cancel the step when detected, preventing following steps from getting stuck waiting for resources to be released. + Fix regression to make job_desc.min_cpus accurate again in job_submit when requesting a job with --ntasks-per-node. + Fix handling of ArrayTaskThrottle in backfill. + Fix regression in 23.02.2 when checking gres state on slurmctld startup or reconfigure. Gres changes in the configuration were not updated on slurmctld startup. On startup or reconfigure, these messages were present in the log: "error: Attempt to change gres/gpu Count". + Fix potential double count of gres when dealing with limits. + Fix slurmstepd segfault when ContainerPath is not set in oci.conf. + Fixed an issue where jobs requesting licenses were incorrectly rejected. + scrontab - Fix cutting off the final character of quoted variables. + smail - Fix issues where e-mails at job completion were not being sent. + scontrol/slurmctld - fix comma parsing when updating a reservation's nodes. + Fix --gpu-bind=single binding tasks to wrong gpus, leading to some gpus having more tasks than they should and other gpus being unused. + Fix regression in 23.02 that causes slurmstepd to crash when srun requests more than TreeWidth nodes in a step and uses the pmi2 or Egbert Eich 2023-09-11 07:21:32 +00:00
  • 6ad091ecc0 - Updated to 23.02.4 with the following changes: * Bug Fixes: + Fix main scheduler loop not starting after a failover to backup controller. + Avoid slurmctld segfault when specifying AccountingStorageExternalHost (bsc#1214983). + Fix sbatch return code when --wait is requested on a job array. + Fix collected GPUUtilization values for acct_gather_profile plugins. + Fix slurmrestd handling of job hold/release operations. + Fix step running indefinitely when slurmctld takes more than MessageTimeout to respond. Now, slurmctld will cancel the step when detected, preventing following steps from getting stuck waiting for resources to be released. + Fix regression to make job_desc.min_cpus accurate again in job_submit when requesting a job with --ntasks-per-node. + Fix handling of ArrayTaskThrottle in backfill. + Fix regression in 23.02.2 when checking gres state on slurmctld startup or reconfigure. Gres changes in the configuration were not updated on slurmctld startup. On startup or reconfigure, these messages were present in the log: "error: Attempt to change gres/gpu Count". + Fix potential double count of gres when dealing with limits. + Fix slurmstepd segfault when ContainerPath is not set in oci.conf. + Fixed an issue where jobs requesting licenses were incorrectly rejected. + scrontab - Fix cutting off the final character of quoted variables. + smail - Fix issues where e-mails at job completion were not being sent. + scontrol/slurmctld - fix comma parsing when updating a reservation's nodes. + Fix --gpu-bind=single binding tasks to wrong gpus, leading to some gpus having more tasks than they should and other gpus being unused. + Fix regression in 23.02 that causes slurmstepd to crash when srun requests more than TreeWidth nodes in a step and uses the pmi2 or Egbert Eich 2023-09-11 07:21:32 +00:00
  • 6b47182efe Accepting request 1109308 from network:cluster Ana Guerrero 2023-09-07 19:12:41 +00:00
  • e167499b83 Accepting request 1109308 from network:cluster Ana Guerrero 2023-09-07 19:12:41 +00:00
  • c63b605916 - Fixes since 23.02.03: Highlights: * Fix main scheduler loop not starting after a failover to backup controller. * Avoid slurmctld segfault when specifying AccountingStorageExternalHost (bsc#1214983). Other: * Fix sbatch return code when --wait is requested on a job array. * Fix collected GPUUtilization values for acct_gather_profile plugins. * Fix slurmrestd handling of job hold/release operations. * Make spank S_JOB_ARGV item value hold the requested command argv instead of the srun --bcast value when --bcast requested (only in local context). * Fix step running indefinitely when slurmctld takes more than MessageTimeout to respond. Now, slurmctld will cancel the step when detected, preventing following steps from getting stuck waiting for resources to be released. * Fix regression to make job_desc.min_cpus accurate again in job_submit when requesting a job with --ntasks-per-node. * Fix handling of ArrayTaskThrottle in backfill. * Fix regression in 23.02.2 when checking gres state on slurmctld startup or reconfigure. Gres changes in the configuration were not updated on slurmctld startup. On startup or reconfigure, these messages were present in the log: "error: Attempt to change gres/gpu Count". * Fix potential double count of gres when dealing with limits. * Fix slurmstepd segfault when ContainerPath is not set in oci.conf * Fixed an issue where jobs requesting licenses were incorrectly rejected. * scrontab - Fix cutting off the final character of quoted variables. * smail - Fix issues where e-mails at job completion were not being sent. * scontrol/slurmctld - fix comma parsing when updating a reservation's nodes. Egbert Eich 2023-09-06 17:11:37 +00:00
  • 8b706ae37a - Fixes since 23.02.03: Highlights: * Fix main scheduler loop not starting after a failover to backup controller. * Avoid slurmctld segfault when specifying AccountingStorageExternalHost (bsc#1214983). Other: * Fix sbatch return code when --wait is requested on a job array. * Fix collected GPUUtilization values for acct_gather_profile plugins. * Fix slurmrestd handling of job hold/release operations. * Make spank S_JOB_ARGV item value hold the requested command argv instead of the srun --bcast value when --bcast requested (only in local context). * Fix step running indefinitely when slurmctld takes more than MessageTimeout to respond. Now, slurmctld will cancel the step when detected, preventing following steps from getting stuck waiting for resources to be released. * Fix regression to make job_desc.min_cpus accurate again in job_submit when requesting a job with --ntasks-per-node. * Fix handling of ArrayTaskThrottle in backfill. * Fix regression in 23.02.2 when checking gres state on slurmctld startup or reconfigure. Gres changes in the configuration were not updated on slurmctld startup. On startup or reconfigure, these messages were present in the log: "error: Attempt to change gres/gpu Count". * Fix potential double count of gres when dealing with limits. * Fix slurmstepd segfault when ContainerPath is not set in oci.conf * Fixed an issue where jobs requesting licenses were incorrectly rejected. * scrontab - Fix cutting off the final character of quoted variables. * smail - Fix issues where e-mails at job completion were not being sent. * scontrol/slurmctld - fix comma parsing when updating a reservation's nodes. Egbert Eich 2023-09-06 17:11:37 +00:00
  • 51bec69223 Accepting request 1109029 from network:cluster Ana Guerrero 2023-09-06 16:57:11 +00:00
  • 5e2c599785 Accepting request 1109029 from network:cluster Ana Guerrero 2023-09-06 16:57:11 +00:00
  • 47d665607b Accepting request 1109009 from home:mslacken:branches:network:cluster Christian Goll 2023-09-05 11:47:06 +00:00
  • 8f857e2839 Accepting request 1109009 from home:mslacken:branches:network:cluster Christian Goll 2023-09-05 11:47:06 +00:00
  • 03d2eefa9e Accepting request 1085677 from network:cluster Dominique Leuenberger 2023-05-09 11:09:16 +00:00
  • 6e09ce8fba Accepting request 1085677 from network:cluster Dominique Leuenberger 2023-05-09 11:09:16 +00:00
  • 532aa1e96d Accepting request 1085668 from home:mslacken:branches:network:cluster Egbert Eich 2023-05-09 10:35:16 +00:00
  • fdd6041a09 Accepting request 1085668 from home:mslacken:branches:network:cluster Egbert Eich 2023-05-09 10:35:16 +00:00
  • 0d5e08df4b Accepting request 1083466 from network:cluster Dominique Leuenberger 2023-04-28 14:23:13 +00:00
  • 97f56f8121 Accepting request 1083466 from network:cluster Dominique Leuenberger 2023-04-28 14:23:13 +00:00
  • 33bf8791ac - Require slurm-munge if munge authentication is installed. - Replace 'Require: config(pam)' by 'Require: pam'. Egbert Eich 2023-04-28 07:46:44 +00:00
  • f074aed6ef - Require slurm-munge if munge authentication is installed. - Replace 'Require: config(pam)' by 'Require: pam'. Egbert Eich 2023-04-28 07:46:44 +00:00
  • 392bec3223 Accepting request 1082770 from home:eeich:branches:network:cluster Christian Goll 2023-04-27 13:24:37 +00:00
  • 0caf21614f Accepting request 1082770 from home:eeich:branches:network:cluster Christian Goll 2023-04-27 13:24:37 +00:00
  • e27e58c1b6 Accepting request 1076522 from network:cluster Dominique Leuenberger 2023-04-01 17:32:20 +00:00
  • 0a6a57fc41 Accepting request 1076522 from network:cluster Dominique Leuenberger 2023-04-01 17:32:20 +00:00
  • 5a68fc8e5f - updated to 23.02.1 with the following changes: - removed right-pmix-path.patch as fixed upstream Egbert Eich 2023-03-31 15:48:27 +00:00
  • 4a846e7a5f - updated to 23.02.1 with the following changes: - removed right-pmix-path.patch as fixed upstream Egbert Eich 2023-03-31 15:48:27 +00:00
  • d2a2e0a1e8 Accepting request 1076461 from home:mslacken:branches:network:cluster Egbert Eich 2023-03-31 15:44:08 +00:00
  • ca20890f09 Accepting request 1076461 from home:mslacken:branches:network:cluster Egbert Eich 2023-03-31 15:44:08 +00:00
  • c7d67ed696 Accepting request 1072592 from network:cluster Dominique Leuenberger 2023-03-17 16:05:03 +00:00
  • 99a28e2e57 Accepting request 1072592 from network:cluster Dominique Leuenberger 2023-03-17 16:05:03 +00:00
  • 5c3d4865a1 Accepting request 1072591 from home:mslacken:branches:network:cluster Christian Goll 2023-03-17 10:52:44 +00:00
  • c4d5ccc9e3 Accepting request 1072591 from home:mslacken:branches:network:cluster Christian Goll 2023-03-17 10:52:44 +00:00
  • 9883ad6d58 Accepting request 1072585 from home:mslacken:branches:network:cluster Christian Goll 2023-03-17 10:42:09 +00:00
  • 685a526cd0 Accepting request 1072585 from home:mslacken:branches:network:cluster Christian Goll 2023-03-17 10:42:09 +00:00
  • 2de2dcca49 Accepting request 1072087 from network:cluster Dominique Leuenberger 2023-03-15 17:56:12 +00:00
  • e7950661d1 Accepting request 1072087 from network:cluster Dominique Leuenberger 2023-03-15 17:56:12 +00:00
  • 521f372d87 Accepting request 1072084 from home:mslacken:branches:network:cluster Christian Goll 2023-03-15 10:57:09 +00:00
  • 848e98dcf9 Accepting request 1072084 from home:mslacken:branches:network:cluster Christian Goll 2023-03-15 10:57:09 +00:00
  • c224ea00c3 Accepting request 1070214 from network:cluster Dominique Leuenberger 2023-03-09 16:45:23 +00:00
  • d61be24079 Accepting request 1070214 from network:cluster Dominique Leuenberger 2023-03-09 16:45:23 +00:00
  • e85b508441 Accepting request 1070212 from home:eeich:branches:network:cluster Egbert Eich 2023-03-08 15:43:28 +00:00
  • 242b77b0c0 Accepting request 1070212 from home:eeich:branches:network:cluster Egbert Eich 2023-03-08 15:43:28 +00:00
  • 86940cb8c4 Accepting request 1070094 from home:eeich:branches:network:cluster Egbert Eich 2023-03-08 07:58:58 +00:00
  • 44732e714c Accepting request 1070094 from home:eeich:branches:network:cluster Egbert Eich 2023-03-08 07:58:58 +00:00
  • 0f04c66747 Accepting request 1070043 from home:eeich:branches:network:cluster Egbert Eich 2023-03-07 22:14:15 +00:00
  • e63fa12bb7 Accepting request 1070043 from home:eeich:branches:network:cluster Egbert Eich 2023-03-07 22:14:15 +00:00
  • da464bfaae Accepting request 1070038 from home:eeich:branches:network:cluster Egbert Eich 2023-03-07 21:33:03 +00:00
  • 1876e24292 Accepting request 1070038 from home:eeich:branches:network:cluster Egbert Eich 2023-03-07 21:33:03 +00:00
  • 50b2b76a05 Accepting request 1068523 from network:cluster Dominique Leuenberger 2023-03-02 22:03:34 +00:00
  • 9c68299094 Accepting request 1068523 from network:cluster Dominique Leuenberger 2023-03-02 22:03:34 +00:00
  • 6997bacde0 Accepting request 1068522 from home:eeich:branches:network:cluster Egbert Eich 2023-03-01 17:58:54 +00:00
  • 26380fd8ab Accepting request 1068522 from home:eeich:branches:network:cluster Egbert Eich 2023-03-01 17:58:54 +00:00
  • 8a8f7dcb78 Accepting request 1068320 from network:cluster Dominique Leuenberger 2023-03-01 15:14:17 +00:00
  • d9acf9aac4 Accepting request 1068320 from network:cluster Dominique Leuenberger 2023-03-01 15:14:17 +00:00
  • e60f39a466 - updated to 23.02.0 Egbert Eich 2023-02-28 20:50:48 +00:00
  • ddbd159d94 - updated to 23.02.0 Egbert Eich 2023-02-28 20:50:48 +00:00
  • 8899aac00b - testsuite: on later SUSE versions claim ownership of directory Egbert Eich 2023-02-28 20:34:03 +00:00
  • 6bf6976e07 - testsuite: on later SUSE versions claim ownership of directory Egbert Eich 2023-02-28 20:34:03 +00:00
  • 18aa012ab9 Accepting request 1068316 from home:eeich:branches:network:cluster Egbert Eich 2023-02-28 20:30:32 +00:00
  • cafcd17ecb Accepting request 1068316 from home:eeich:branches:network:cluster Egbert Eich 2023-02-28 20:30:32 +00:00
  • ef6d6521aa Accepting request 1067475 from home:eeich:branches:network:cluster Egbert Eich 2023-02-23 19:32:51 +00:00
  • ad5873ee1d Accepting request 1067475 from home:eeich:branches:network:cluster Egbert Eich 2023-02-23 19:32:51 +00:00
  • d1ebf00ba6 Accepting request 1063957 from network:cluster Dominique Leuenberger 2023-02-09 15:23:26 +00:00
  • 5ef6cf7c80 Accepting request 1063957 from network:cluster Dominique Leuenberger 2023-02-09 15:23:26 +00:00
  • 4693e39860 Accepting request 1063954 from home:eeich:branches:network:cluster Egbert Eich 2023-02-09 08:22:55 +00:00
  • 4d6c7101a2 Accepting request 1063954 from home:eeich:branches:network:cluster Egbert Eich 2023-02-09 08:22:55 +00:00
  • a4484c7dc2 Accepting request 1042071 from network:cluster Dominique Leuenberger 2022-12-11 16:16:58 +00:00
  • 6a076dc3e8 Accepting request 1042071 from network:cluster Dominique Leuenberger 2022-12-11 16:16:58 +00:00
  • 6f080824a4 Accepting request 1039957 from home:eeich:branches:network:cluster Egbert Eich 2022-12-11 07:58:12 +00:00
  • e9cc5f8910 Accepting request 1039957 from home:eeich:branches:network:cluster Egbert Eich 2022-12-11 07:58:12 +00:00
  • 30dd030610 Accepting request 1031255 from network:cluster Dominique Leuenberger 2022-10-26 10:32:00 +00:00
  • e93e16edf0 Accepting request 1031255 from network:cluster Dominique Leuenberger 2022-10-26 10:32:00 +00:00
  • 212048404b * Improve setup-testsuite.sh: copy ssh fingerprints from all nodes. Egbert Eich 2022-10-26 06:23:36 +00:00
  • eac06a3bc4 * Improve setup-testsuite.sh: copy ssh fingerprints from all nodes. Egbert Eich 2022-10-26 06:23:36 +00:00
  • 776ce8f23b - Test Suite fixes: * Update README_Testsuite.md. * Clean up left over files when de-installing test suite. * Adjustment to test suite package: for SLE mark the openmpi4 devel package and slurm-hdf5 optional. * Add -ffat-lto-objects to the build flags when LTO is set to make sure the object files we ship with the test suite still work correctly. Egbert Eich 2022-10-25 11:33:49 +00:00
  • 371becf26d - Test Suite fixes: * Update README_Testsuite.md. * Clean up left over files when de-installing test suite. * Adjustment to test suite package: for SLE mark the openmpi4 devel package and slurm-hdf5 optional. * Add -ffat-lto-objects to the build flags when LTO is set to make sure the object files we ship with the test suite still work correctly. Egbert Eich 2022-10-25 11:33:49 +00:00
  • 642a47efa7 - Adjustment to test suite package: only recommend openmpi4 Egbert Eich 2022-10-24 08:54:35 +00:00
  • 4a0e30d273 - Adjustment to test suite package: only recommend openmpi4 Egbert Eich 2022-10-24 08:54:35 +00:00
  • 52046053d5 Accepting request 1030610 from home:eeich:branches:network:cluster Egbert Eich 2022-10-24 05:31:40 +00:00
  • c7e02dc61a Accepting request 1030610 from home:eeich:branches:network:cluster Egbert Eich 2022-10-24 05:31:40 +00:00
  • 220eec76a4 Accepting request 1030432 from network:cluster Dominique Leuenberger 2022-10-22 12:13:18 +00:00
  • 6178df65d5 Accepting request 1030432 from network:cluster Dominique Leuenberger 2022-10-22 12:13:18 +00:00
  • c2551ab47f Accepting request 1010642 from home:mslacken:branches:network:cluster Egbert Eich 2022-10-21 15:00:25 +00:00
  • bd17bf1567 Accepting request 1010642 from home:mslacken:branches:network:cluster Egbert Eich 2022-10-21 15:00:25 +00:00
  • edd405b2c8 Accepting request 1006180 from network:cluster Dominique Leuenberger 2022-09-26 16:48:44 +00:00
  • 168a697f85 Accepting request 1006180 from network:cluster Dominique Leuenberger 2022-09-26 16:48:44 +00:00
  • 09aecc2015 Accepting request 1005746 from home:eeich:branches:network:cluster Egbert Eich 2022-09-26 15:01:51 +00:00
  • 542f114081 Accepting request 1005746 from home:eeich:branches:network:cluster Egbert Eich 2022-09-26 15:01:51 +00:00