From 4ab9986278c0f31f0245807bf611bf7921ca4f9e540f510e16efe72004e49ae3 Mon Sep 17 00:00:00 2001
From: Ana Guerrero
Date: Wed, 20 Jan 2021 13:58:46 +0000
Subject: [PATCH] Accepting request 864993 from home:anag:branches:network:cluster

- Update to 20.11.03
- This release includes a major functional change to how job step launch is
  handled compared to the previous 20.11 releases. This affects srun as
  well as MPI stacks - such as Open MPI - which may use srun internally as
  part of the process launch.
  One of the changes made in the Slurm 20.11 release was to the semantics
  for job steps launched through the 'srun' command. This also
  inadvertently impacts many MPI releases that use srun underneath their
  own mpiexec/mpirun command.
  For 20.11.{0,1,2} releases, the default behavior for srun was changed
  such that each step was allocated exactly what was requested by the
  options given to srun, and did not have access to all resources assigned
  to the job on the node by default. This change was equivalent to Slurm
  setting the --exclusive option by default on all job steps. Job steps
  desiring all resources on the node needed to explicitly request them
  through the new '--whole' option.
  In the 20.11.3 release, we have reverted to the 20.02 and older behavior
  of assigning all resources on a node to the job step by default.
  This reversion is a major behavioral change which we would not generally
  do on a maintenance release, but is being done in the interest of
  restoring compatibility with the large number of existing Open MPI (and
  other MPI flavors) and job scripts that exist in production, and to
  remove what has proven to be a significant hurdle in moving to the new
  release.
  Please note that one change to step launch remains - by default, in
  20.11 steps are no longer permitted to overlap on the resources they
  have been assigned. If that behavior is desired, all steps must
  explicitly opt-in through the newly added '--overlap' option.
  Further details and a full explanation of the issue can be found at:
  https://bugs.schedmd.com/show_bug.cgi?id=10383#c63

OBS-URL: https://build.opensuse.org/request/show/864993
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=171
---
 slurm-20.11.2.tar.bz2 |  3 --
 slurm-20.11.3.tar.bz2 |  3 ++
 slurm.changes         | 76 +++++++++++++++++++++++++++++++++++++++++++
 slurm.spec            |  2 +-
 4 files changed, 80 insertions(+), 4 deletions(-)
 delete mode 100644 slurm-20.11.2.tar.bz2
 create mode 100644 slurm-20.11.3.tar.bz2

diff --git a/slurm-20.11.2.tar.bz2 b/slurm-20.11.2.tar.bz2
deleted file mode 100644
index 8c7e5eb..0000000
--- a/slurm-20.11.2.tar.bz2
+++ /dev/null
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:b7fb4b9a9b73d3ee4cade654860352cacb0d1230243f1905f8ed5d858ade0296
-size 6532310
diff --git a/slurm-20.11.3.tar.bz2 b/slurm-20.11.3.tar.bz2
new file mode 100644
index 0000000..2de78a4
--- /dev/null
+++ b/slurm-20.11.3.tar.bz2
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:731558f4fde8c9b0935e0fcd9b769fe7338930a4b9dcfada305d0303bde9e0e8
+size 6530011
diff --git a/slurm.changes b/slurm.changes
index 3645593..4b804ca 100644
--- a/slurm.changes
+++ b/slurm.changes
@@ -1,3 +1,79 @@
+-------------------------------------------------------------------
+Wed Jan 20 10:13:23 UTC 2021 - Ana Guerrero Lopez
+
+- Update to 20.11.03
+- This release includes a major functional change to how job step launch is
+  handled compared to the previous 20.11 releases. This affects srun as
+  well as MPI stacks - such as Open MPI - which may use srun internally as
+  part of the process launch.
+  One of the changes made in the Slurm 20.11 release was to the semantics
+  for job steps launched through the 'srun' command. This also
+  inadvertently impacts many MPI releases that use srun underneath their
+  own mpiexec/mpirun command.
+  For 20.11.{0,1,2} releases, the default behavior for srun was changed
+  such that each step was allocated exactly what was requested by the
+  options given to srun, and did not have access to all resources assigned
+  to the job on the node by default. This change was equivalent to Slurm
+  setting the --exclusive option by default on all job steps. Job steps
+  desiring all resources on the node needed to explicitly request them
+  through the new '--whole' option.
+  In the 20.11.3 release, we have reverted to the 20.02 and older behavior
+  of assigning all resources on a node to the job step by default.
+  This reversion is a major behavioral change which we would not generally
+  do on a maintenance release, but is being done in the interest of
+  restoring compatibility with the large number of existing Open MPI (and
+  other MPI flavors) and job scripts that exist in production, and to
+  remove what has proven to be a significant hurdle in moving to the new
+  release.
+  Please note that one change to step launch remains - by default, in
+  20.11 steps are no longer permitted to overlap on the resources they
+  have been assigned. If that behavior is desired, all steps must
+  explicitly opt-in through the newly added '--overlap' option.
+  Further details and a full explanation of the issue can be found at:
+  https://bugs.schedmd.com/show_bug.cgi?id=10383#c63
+- Other changes from 20.11.03
+  * Fix segfault when parsing bad "#SBATCH hetjob" directive.
+  * Allow countless gpu:srun, sbatch->srun sequence.
+  * Reject job credential if non-superuser sets the LAUNCH_NO_ALLOC flag.
+  * Make it so srun --no-allocate works again.
+  * jobacct_gather/linux - Don't count memory on tasks that have already
+    finished.
+  * Fix 19.05/20.02 batch steps talking with a 20.11 slurmctld.
+  * jobacct_gather/common - Do not process jobacct's with same taskid when
+    calling prec_extra.
+  * Cleanup all tracked jobacct tasks when extern step child process finishes.
+  * slurmrestd/dbv0.0.36 - Correct structure of dbv0.0.36_tres_list.
+  * Fix regression causing task/affinity and task/cgroup to be out of sync when
+    configured ThreadsPerCore is different than the physical threads per core.
+  * Fix situation when --gpus is given but not max nodes (-N1-1) in a job
+    allocation.
+  * Interactive step - ignore cpu bind and mem bind options, and do not set
+    the associated environment variables which lead to unexpected behavior
+    from srun commands launched within the interactive step.
+  * Handle exit code from pipe when using UCX with PMIx.
+
 -------------------------------------------------------------------
 Fri Jan 8 13:27:02 UTC 2021 - Egbert Eich

diff --git a/slurm.spec b/slurm.spec
index c39ac9d..ce4a6dd 100644
--- a/slurm.spec
+++ b/slurm.spec
@@ -18,7 +18,7 @@
 # Check file META in sources: update so_version to (API_CURRENT - API_AGE)
 %define so_version 36
-%define ver 20.11.2
+%define ver 20.11.3
 %define _ver _20_11
 %define dl_ver %{ver}
 # so-version is 0 and seems to be stable