- Update to version 20.11.0
Slurm 20.11 includes a number of new features including:
* Overhaul of the job step management and launch code, alongside improved
GPU task placement support.
* A new "Interactive Step" mode of operation for salloc.
* A new "scrontab" command that can be used to submit and manage
periodically repeating jobs.
* IPv6 support.
* Changes to the reservation logic, with new options allowing users
to delete reservations, allowing admins to skip the next occurrence of a
repeated reservation, and allowing for a job to be submitted and eligible
to run within multiple reservations.
* Dynamic Future Nodes - automatically associate a dynamically
provisioned (or "cloud") node against a NodeName definition with matching
hardware.
* An experimental new RPC queuing mode for slurmctld to reduce thread
contention on heavily loaded clusters.
* SlurmDBD integration with the Slurm REST API.
Also check
https://github.com/SchedMD/slurm/blob/slurm-20-11-0-1/RELEASE_NOTES
OBS-URL: https://build.opensuse.org/request/show/852039
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=160
- Updated to 20.02.5, changes:
* Fix leak of TRESRunMins when job time is changed with --time-min
* pam_slurm - explicitly initialize slurm config to support configless mode.
* scontrol - Fix exit code when creating/updating reservations with wrong
Flags.
* When a GRES has a no_consume flag, report 0 for allocated.
* Fix cgroup cleanup by jobacct_gather/cgroup.
* When creating reservations/jobs don't allow counts on a feature unless
using an XOR.
* Improve number of boards discovery
* Fix updating a reservation NodeCnt on a zero-count reservation.
* slurmrestd - provide explicit error messages when PSK auth fails.
* cons_tres - fix job requesting single gres per-node getting two or more
nodes with less CPUs than requested per-task.
* cons_tres - fix calculation of cores when using gres and cpus-per-task.
* cons_tres - fix job not getting access to socket without GPU or with less
than --gpus-per-socket when not enough cpus available on required socket
and not using --gres-flags=enforce-binding.
* Fix HDF5 type version build error.
* Fix creation of CoreCnt only reservations when the first node isn't
available.
* Fix wrong DBD Agent queue size in sdiag when using accounting_storage/none.
* Improve job constraints XOR option logic.
* Fix preemption of hetjobs when needed nodes not in leader component.
* Fix wrong bit_or() messing potential preemptor jobs node bitmap, causing
bad node deallocations and even allocation of nodes from other partitions.
* Fix double-deallocation of preempted non-leader hetjob components.
* slurmdbd - prevent truncation of the step nodelists over 4095.
* Fix nodes remaining in drain state after rebooting with ASAP option.
- changes from 20.02.4:
OBS-URL: https://build.opensuse.org/request/show/845108
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=156
- Updated to 20.02.3 which fixes CVE-2020-12693 (bsc#1172004).
- Other changes are:
* Factor in ntasks-per-core=1 with cons_tres.
* Fix formatting in error message in cons_tres.
* Fix calling stat on a NULL variable.
* Fix minor memory leak when using reservations with flags=first_cores.
* Fix gpu bind issue when CPUs=Cores and ThreadsPerCore > 1 on a node.
* Fix --mem-per-gpu for heterogeneous --gres requests.
* Fix slurmctld load order in load_all_part_state().
* Fix race condition not finding jobacct gather task cgroup entry.
* Suppress error message when selecting nodes on disjoint topologies.
* Improve performance of _pack_default_job_details() with large number of job
arguments.
* Fix archive loading previous to 17.11 jobs per-node req_mem.
* Fix regression validating that --gpus-per-socket requires --sockets-per-node
for steps. Should only validate allocation requests.
* error() instead of fatal() when parsing an invalid hostlist.
* nss_slurm - fix potential deadlock in slurmstepd on overloaded systems.
* cons_tres - fix --gres-flags=enforce-binding and related --cpus-per-gres.
* cons_tres - Allocate lowest numbered cores when filtering cores with gres.
* Fix getting system counts for named GRES/TRES.
* MySQL - Fix for handling typed GRES for association rollups.
* Fix step allocations when tasks_per_core > 1.
* Fix allocating more GRES than requested when asking for multiple GRES types.
OBS-URL: https://build.opensuse.org/request/show/808569
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=45
- Updated to 20.02.3 which fixes CVE-2020-12693
- Other changes are:
* Factor in ntasks-per-core=1 with cons_tres.
* Fix formatting in error message in cons_tres.
* Fix calling stat on a NULL variable.
* Fix minor memory leak when using reservations with flags=first_cores.
* Fix gpu bind issue when CPUs=Cores and ThreadsPerCore > 1 on a node.
* Fix --mem-per-gpu for heterogeneous --gres requests.
* Fix slurmctld load order in load_all_part_state().
* Fix race condition not finding jobacct gather task cgroup entry.
* Suppress error message when selecting nodes on disjoint topologies.
* Improve performance of _pack_default_job_details() with large number of job
arguments.
* Fix archive loading previous to 17.11 jobs per-node req_mem.
* Fix regression validating that --gpus-per-socket requires --sockets-per-node
for steps. Should only validate allocation requests.
* error() instead of fatal() when parsing an invalid hostlist.
* nss_slurm - fix potential deadlock in slurmstepd on overloaded systems.
* cons_tres - fix --gres-flags=enforce-binding and related --cpus-per-gres.
* cons_tres - Allocate lowest numbered cores when filtering cores with gres.
* Fix getting system counts for named GRES/TRES.
* MySQL - Fix for handling typed GRES for association rollups.
* Fix step allocations when tasks_per_core > 1.
* Fix allocating more GRES than requested when asking for multiple GRES types.
OBS-URL: https://build.opensuse.org/request/show/808130
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=147
- Updated to 20.02.1 with the following changes:
* Improve job state reason for jobs hitting partition_job_depth.
* Speed up testing of singleton dependencies.
* Fix negative loop bound in cons_tres.
* srun - capture the MPI plugin return code from mpi_hook_client_fini() and
use as final return code for step failure.
* Fix segfault in cli_filter/lua.
* Fix --gpu-bind=map_gpu reusability if tasks > elements.
* Make sure config_flags on a gres are sent to the slurmctld on node
registration.
* Prolog/Epilog - Fix missing GPU information.
* Fix segfault when using config parser for expanded lines.
* Fix bit overlap test function.
* Don't accrue time if job begin time is in the future.
* Remove accrue time when updating a job start/eligible time to the future.
* Fix regression in 20.02.0 that broke --depend=expand.
* Reset begin time on job release if it's not in the future.
* Fix for recovering burst buffers when using high-availability.
* Fix invalid read due to freeing an incorrectly allocated env array.
* Update slurmctld -i message to warn about losing data.
* Fix scontrol cancel_reboot so it clears the DRAIN flag and node reason for a
pending ASAP reboot.
OBS-URL: https://build.opensuse.org/request/show/788905
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=145
- Update to version 20.02.0 (jsc#SLE-8491)
* Fix minor memory leak in slurmd on reconfig.
* Fix invalid ptr reference when rolling up data in the database.
* Change shtml2html.py to require python3 for RHEL8 support, and match
man2html.py.
* slurm.spec - override "hardening" linker flags to ensure RHEL8 builds
in a usable manner.
* Fix type mismatches in the perl API.
* Prevent use of uninitialized slurmctld_diag_stats.
* Fixed various Coverity issues.
* Only show warning about root-less topology in daemons.
* Fix accounting of jobs in IGNORE_JOBS reservations.
* Fix issue with batch steps state not loading correctly when upgrading from
19.05.
* Deprecate max_depend_depth in SchedulerParameters and move it to
DependencyParameters.
* Silence erroneous error on slurmctld upgrade when loading federation state.
* Break infinite loop in cons_tres dealing with incorrect tasks per tres
request resulting in slurmctld hang.
* Improve handling of --gpus-per-task to make sure appropriate number of GPUs
is assigned to job.
* Fix seg fault on cons_res when requesting --spread-job.
- Move to python3 for everything but SLE-11-SP4
* For SLE-11-SP4 add a workaround to handle a python3 script (python2.7
compliant).
OBS-URL: https://build.opensuse.org/request/show/779379
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=136
- Update to version 20.02.0-rc1
* sbatch - fix segfault when no newline at the end of a burst buffer file.
* Change scancel to only check job's base state when matching -t options.
* Save job dependency list in state files.
* cons_tres - allow jobs to be run on systems with root-less topologies.
* Restore pre-20.02pre1 PrologSlurmctld synchronization behavior to avoid
various race conditions, and ensure proper batch job launch.
* Add new slurmrestd command/daemon which implements the Slurm REST API.
- Update to version 20.02.0-0pre1, highlights are
OBS-URL: https://build.opensuse.org/request/show/774250
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=134
- Updated to version 20.02.0-0pre1.
Highlights:
* Exclusive behavior of a node includes all GRES on a node as well
as the cpus.
* Use python3 instead of python for internal build/test scripts.
The slurm.spec file has been updated to depend on python3 as well.
* Added new NodeSet configuration option to help simplify partition
configuration sections for heterogeneous / condo-style clusters.
* Added slurm.conf option MaxDBDMsgs to control how many messages will be
stored in the slurmctld before throwing them away when the slurmdbd is down.
* The checkpoint plugin interface and all associated API calls have been
removed.
* slurm_init_job_desc_msg() initializes mail_type as uint16_t. This allows
mail_type to be set to NONE with scontrol.
* Add new slurm_spank_log() function to print messages back to the user from
within a SPANK plugin without prepending "error: " from slurm_error().
* Enforce having partition name and nodelist=ALL when creating reservations
with flags=PART_NODES.
* SPANK - removed never-implemented slurm_spank_slurmd_init() interface. This
hook has always been accessible through slurm_spank_init() in the
S_CTX_SLURMD context instead.
* sbcast - add new BcastAddr option to NodeName lines to allow sbcast traffic
to flow over an alternate network path.
* Added auth/jwt plugin, and 'scontrol token' subcommand.
* PMIx - improve performance of proc map generation.
* Deprecate kill_invalid_depend in SchedulerParameters and move it to a new
option called DependencyParameters.
* Enable job dependencies for any job on any cluster in the same federation.
* Allow clusters to be added automatically to db at startup of ctld.
* Add AccountingStorageExternalHost slurm.conf parameter. The
OBS-URL: https://build.opensuse.org/request/show/773459
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=130
- BuildRequire pkgconfig(systemd) instead of systemd: allow OBS to
shortcut through the -mini flavors.
- Use systemd_ordering instead of systemd_requires: systemd is
never a strict requirement; but in case the system is scheduled
for installation together with systemd, we want systemd to be
installed prior to slurm.
- start slurmdbd after mariadb (bsc#1161716)
OBS-URL: https://build.opensuse.org/request/show/766872
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=123
- Update to version 19.05.5 (jsc#SLE-8491)
* Check %docdir/NEWS for details.
* Includes security fixes CVE-2019-19727, CVE-2019-19728,
CVE-2019-12838.
* Disable i586 builds as this is no longer supported.
* Create libnss_slurm package to support user and group resolution
thru slurmstepd.
* slurm-2.4.4-rpath.patch -> Remove-rpath-from-build.patch
Obsoleted:
- pam_slurm_adopt-avoid-running-outside-of-the-sshd-PA.patch
- pam_slurm_adopt-send_user_msg-don-t-copy-undefined-d.patch
- pam_slurm_adopt-use-uid-to-determine-whether-root-is.patch
OBS-URL: https://build.opensuse.org/request/show/762650
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=118
- Deprecate "ControlMachine" only for SLURM version upgrades and
products newer than 1501. This ensures that the original setting
is retained for the SLURM version shipped originally with SLE-15-SP1
or Leap 15.1.
- Update to v18.08.9 for fixing CVE-2019-19728 (bsc#1159692).
* Wrap END_TIMER{,2,3} macro definition in "do {} while (0)" block.
* Make sview work with glib2 v2.62.
* Make Slurm compile on linux after sys/sysctl.h was deprecated.
* Install slurmdbd.conf.example with 0600 permissions to encourage secure
use. CVE-2019-19727.
* srun - do not continue with job launch if --uid fails. CVE-2019-19728.
- Added PMIx support (jsc#SLE-10800).
- Use --with-shared-libslurm to build slurm binaries using libslurm.
- Make libslurm depend on slurm-config.
- Fix ownership of /var/spool/slurm on new installations
and upgrade (boo#1158696).
- Fix permissions of slurmdbd.conf (bsc#1155784, CVE-2019-19727).
- Fix %posttrans macro _res_update to cope with added newline
(bsc#1153259).
- Add package slurm-webdoc which sets up a web server to provide
the documentation for the version shipped.
- Move srun from 'slurm' to 'slurm-node': srun is required on the
nodes as well so sbatch will work. 'slurm-node' is a requirement (forwarded request 760450 from eeich)
OBS-URL: https://build.opensuse.org/request/show/761961
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=34
- Deprecate "ControlMachine" only for SLURM version upgrades and
products newer than 1501. This ensures that the original setting
is retained for the SLURM version shipped originally with SLE-15-SP1
or Leap 15.1.
- Update to v18.08.9 for fixing CVE-2019-19728 (bsc#1159692).
* Wrap END_TIMER{,2,3} macro definition in "do {} while (0)" block.
* Make sview work with glib2 v2.62.
* Make Slurm compile on linux after sys/sysctl.h was deprecated.
* Install slurmdbd.conf.example with 0600 permissions to encourage secure
use. CVE-2019-19727.
* srun - do not continue with job launch if --uid fails. CVE-2019-19728.
- Added PMIx support (jsc#SLE-10800).
- Use --with-shared-libslurm to build slurm binaries using libslurm.
- Make libslurm depend on slurm-config.
- Fix ownership of /var/spool/slurm on new installations
and upgrade (boo#1158696).
- Fix permissions of slurmdbd.conf (bsc#1155784, CVE-2019-19727).
- Fix %posttrans macro _res_update to cope with added newline
(bsc#1153259).
- Add package slurm-webdoc which sets up a web server to provide
the documentation for the version shipped.
- Move srun from 'slurm' to 'slurm-node': srun is required on the
nodes as well so sbatch will work. 'slurm-node' is a requirement
OBS-URL: https://build.opensuse.org/request/show/760450
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=116