-------------------------------------------------------------------
Thu Jul 21 19:20:42 UTC 2022 - Bernhard Wiedemann <bwiedemann@suse.com>

- make slurmtest.tar reproducible
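
A reproducible archive is one whose bytes depend only on its contents, not on build time, file order, or the building user. The sketch below illustrates that general idea in Python. It is not the packaging change itself (this changelog does not show how the spec file achieves it), and `build_reproducible_tar` is a hypothetical helper name.

```python
import io
import tarfile


def build_reproducible_tar(files: dict) -> bytes:
    """Pack {name: bytes} into a tar whose output is byte-for-byte stable."""
    buf = io.BytesIO()
    # USTAR avoids pax extended headers, which could carry extra metadata.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name in sorted(files):          # fixed member order
            data = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            info.mtime = 0                  # fixed timestamp
            info.uid = info.gid = 0         # fixed numeric owner
            info.uname = info.gname = "root"
            info.mode = 0o644               # fixed permissions
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()
```

Building the archive twice from the same contents, in any insertion order, should then yield identical bytes, which is what makes the result comparable across rebuilds.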

-------------------------------------------------------------------
Thu Jul 14 15:20:46 UTC 2022 - Egbert Eich <eich@suse.com>

- Improve check for mpicc in testsuite package: if binary isn't
  found, don't crash.
- Fix a typo which prevented the nproc limit for slurmd from being
  raised for the test suite.

-------------------------------------------------------------------
Mon Jun 20 09:23:17 UTC 2022 - Christian Goll <cgoll@suse.com>

- update to 22.05.2 with following fixes:
  * Fix regression which allowed the oversubscription of licenses.
  * Fix a segfault in slurmctld when requesting gres in job arrays.

-------------------------------------------------------------------
Wed Jun 8 13:15:24 UTC 2022 - Egbert Eich <eich@suse.com>

- Package the Slurm testsuite for QA purposes.
  NOTE: This package is not meant to be used for testing by the
  user but rather for testing by the maintainers to ensure the
  package is working properly.
  DO NOT report test suite failures unless you are able to confirm
  that the failure is really a bug.
  * Fixes for test suite:
    Keep-logs-of-skipped-test-when-running-test-cases-sequentially.patch
    Fix-test-21.41.patch
    Fix-test-38.11.patch
    Fix-test-32.8.patch
    Fix-test-3.13.patch
    Fix-test7.2-to-find-libpmix-under-lib64-as-well.patch
  * Add documentation:
    README_Testsuite.md
- Allow log in as user 'slurm'. This allows admins to run certain
  privileged commands more easily without becoming root.

-------------------------------------------------------------------
Tue May 31 12:56:05 UTC 2022 - Christian Goll <cgoll@suse.com>

- update to 22.05.0 with following changes:
  - Support for dynamic node addition and removal
  - Support for native Linux cgroup v2 operation
  - Newly added plugins to support HPE Slingshot 11 networks
    (switch/hpe_slingshot), and Intel Xe GPUs (gpu/oneapi)
  - Added new acct_gather_interconnect/sysfs plugin to collect statistics
    from arbitrary network interfaces.
  - Expanded and synced set of environment variables available in the
    Prolog/Epilog/PrologSlurmctld/EpilogSlurmctld scripts.
  - New "--prefer" option to job submissions to allow for a "soft
    constraint" request to influence node selection.
  - Optional support for license planning in the backfill scheduler with
    "bf_licenses" option in SchedulerParameters.
- removed file slurm-2.4.4-init.patch as sysvinit is now really deprecated
- removed file load-pmix-major-version.patch as fixed upstream

-------------------------------------------------------------------
Tue May 10 10:26:02 UTC 2022 - Egbert Eich <eich@suse.com>

- Add a comment about the CommunicationParameters=block_null_hash
  option warning users who migrate - just in case.

-------------------------------------------------------------------
Fri May 6 09:33:34 UTC 2022 - Christian Goll <cgoll@suse.com>

- Update to 21.08.8 which fixes CVE-2022-29500 (bsc#1199278),
  CVE-2022-29501 (bsc#1199279), and CVE-2022-29502 (bsc#1199281).
- Added 'CommunicationParameters=block_null_hash' to slurm.conf, please
  add this parameter to existing configurations.
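
For admins who want to verify that an existing configuration already carries the parameter, a small check along the following lines may help. This is only a sketch: `has_block_null_hash` is a hypothetical helper, and it assumes a plain slurm.conf with `#` comments, case-insensitive keywords, and comma-separated option values. It does not follow `Include` directives.

```python
def has_block_null_hash(conf_text: str) -> bool:
    """Return True if CommunicationParameters lists block_null_hash."""
    for raw in conf_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip().lower() != "communicationparameters":
            continue
        options = [opt.strip().lower() for opt in value.split(",")]
        if "block_null_hash" in options:
            return True
    return False
```

Feeding it the contents of the active slurm.conf on each node gives a quick yes/no answer before and after the update.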

-------------------------------------------------------------------
Mon May 2 14:12:59 UTC 2022 - Christian Goll <cgoll@suse.com>

- Update to 21.08.7 with following changes:
  * openapi/v0.0.37 - correct calculation for bf_queue_len_mean in /diag.
  * Avoid shrinking a reservation when overlapping with downed nodes.
  * Only check TRES limits against current usage for TRES requested by the job.
  * Do not allocate shared gres (MPS) in whole-node allocations
  * Constrain slurmstepd to job/step cgroup like in previous versions of Slurm.
  * Fix warnings on 32-bit compilers related to printf() formats.
  * Fix reconfigure issues after disabling/reenabling the GANG PreemptMode.
  * Fix race condition where a cgroup was being deleted while another step
    was creating it.
  * Set the slurmd port correctly if multi-slurmd
  * Fix FAIL mail not being sent if a job was cancelled due to preemption.
  * slurmrestd - move debug logs for HTTP handling to be gated by debugflag
    NETWORK to avoid unnecessary logging of communication contents.
  * Fix issue with bad memory access when shrinking running steps.
  * Fix various issues with internal job accounting with GRES when jobs are
    shrunk.
  * Fix ipmi polling on slurmd reconfig or restart.
  * Fix srun crash when reserved ports are being used and het step fails
    to launch.
  * openapi/dbv0.0.37 - fix DELETE execution path on /user/{user_name}.
  * slurmctld - Properly requeue all components of a het job if PrologSlurmctld
    fails.
  * rlimits - remove final calls to limit nofiles to 4096 but to instead use
    the max possible nofiles in slurmd and slurmdbd.
  * Allow the DBD agent to load large messages (up to MAX_BUF_SIZE) from state.
  * Fix potential deadlock during slurmctld restart when there is a completing
    job.
  * slurmstepd - reduce user requested soft rlimits when they are above max
    hard rlimits to avoid rlimit request being completely ignored and
    processes using default limits.
  * Fix Slurm user commands displaying available features as active features
    when no features were active.
  * Don't power down nodes that are rebooting.
  * Clear pending node reboot on power down request.
  * Ignore node registrations while node is powering down.
  * Don't reboot any node that is power<ing|ed> down.
  * Don't allow a node to reboot if it's marked for power down.
  * Fix issuing reboot and downing when rebooting a powering up node.
  * Clear DRAIN on node after failing to resume before ResumeTimeout.
  * Prevent repeating power down if node fails to resume before ResumeTimeout.
  * Fix federated cloud node communication with srun and cloud_dns.
  * Fix jobs being scheduled on nodes marked to be powered_down when idle.
  * Fix problem where a privileged user could not view array tasks specified by
    <array_job_id>_<task_id> when PrivateData had the jobs value set.
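
The slurmstepd rlimit item in the 21.08.7 list above describes clamping a requested soft limit to the hard limit instead of dropping the request. The following Python sketch illustrates that rule only. It is not Slurm's implementation (slurmstepd is written in C), and `clamp_soft_limit` is a hypothetical helper.

```python
import resource


def clamp_soft_limit(requested_soft: int, hard: int) -> int:
    """Reduce a requested soft limit so that it never exceeds the hard limit."""
    inf = resource.RLIM_INFINITY
    if hard == inf:
        # No hard ceiling: any requested soft value is acceptable.
        return requested_soft
    if requested_soft == inf or requested_soft > hard:
        # Request is above the ceiling: use the ceiling rather than ignore it.
        return hard
    return requested_soft
```

The clamped value could then be applied with `resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))`, so the process runs with the highest permitted limit rather than falling back to the default.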
- Changes in Slurm 21.08.6
  * Fix plugin_name definitions in a number of plugins to improve logging.
  * Close sbcast file transfers when job is cancelled.
  * scrontab - fix handling of --gpus and --ntasks-per-gpu options.
  * sched/backfill - fix job_queue_rec_t memory leak.
  * Fix magnetic reservation logic in both main and backfill schedulers.
  * job_container/tmpfs - fix memory leak when using InitScript.
  * slurmrestd / openapi - fix memory leaks.
  * Fix slurmctld segfault due to job array resv_list double free.
  * Fix multi-reservation job testing logic.
  * Fix slurmctld segfault due to insufficient job reservation parse validation.
  * Fix main and backfill schedulers handling for already rejected job array.
  * sched/backfill - restore resv_ptr after yielding locks.
  * acct_gather_energy/xcc - appropriately close and destroy the IPMI context.
  * Protect slurmstepd from making multiple calls to the cleanup logic.
  * Prevent slurmstepd segfault at cleanup time in mpi_fini().
  * Fix slurmctld sometimes hanging if shutdown while PrologSlurmctld or
    EpilogSlurmctld were running and PrologEpilogTimeout is set in slurm.conf.
  * Fix affinity of the batch step if batch host is different than the first
    node in the allocation.
  * slurmdbd - fix segfault after multiple failover/failback operations.
  * Fix jobcomp filetxt job selection condition.
  * Fix -f flag of sacct not being used.
  * Select cores for job steps according to the socket distribution. Previously,
    sockets were always filled before selecting cores from the next socket.
  * Keep node in Future state if epilog completes while in Future state.
  * Fix erroneous --constraint behavior by preventing multiple sets of brackets.
  * Make ResetAccrueTime update the job's accrue_time to now.
  * Fix sattach initialization with configless mode.
  * Revert packing limit checks affecting pmi2.
  * sacct - fixed assertion failure when using -c option and a federation
    display
  * Fix issue that allowed steps to overallocate the job's memory.
  * Fix the sanity check mode of AutoDetect so that it actually works.
  * Fix deallocated nodes that didn't actually launch a job from waiting for
    EpilogSlurmctld to complete before clearing completing node's state.
  * Job should be in a completing state if EpilogSlurmctld when being requeued.
  * Fix job not being requeued properly if all node epilogs completed before
    EpilogSlurmctld finished.
  * Keep job completing until EpilogSlurmctld is completed even when "downing"
    a node.
  * Fix handling reboot with multiple job features.
  * Fix nodes getting powered down when creating new partitions.
  * Fix bad bit_realloc which potentially could lead to bad memory access.
  * slurmctld - remove limit on the number of open files.
  * Fix bug where job_state file of size above 2GB wasn't saved without any
    error message.
  * Fix various issues with no_consume gres.
  * Fix regression in 21.08.0rc1 where job steps failed to launch on systems
    that reserved a CPU in a cgroup outside of Slurm (for example, on systems
    with WekaIO).
  * Fix OverTimeLimit not being reset on scontrol reconfigure when it is
    removed from slurm.conf.
  * serializer/yaml - use dynamic buffer to allow creation of YAML outputs
    larger than 1MiB.
  * Fix minor memory leak affecting openapi users at process termination.
  * Fix batch jobs not resolving the username when nss_slurm is enabled.
  * slurmrestd - Avoid slurmrestd ignoring invalid HTTP method if the response
    serialized without error.
  * openapi/dbv0.0.37 - Correct conditional that caused the diag output to
    give an internal server error status on success.
  * Make --mem-bind=sort work with task_affinity
  * Fix sacctmgr to set MaxJobsAccruePer{User|Account} and MinPrioThres in
    'sacctmgr add qos'; 'modify' already worked correctly.
  * job_container/tmpfs - avoid printing extraneous error messages in Prolog
    and Epilog, and when the job completes.
  * Fix step CPU memory allocation with --threads-per-core without --exact.
  * Remove implicit --exact when --threads-per-core or --hint=nomultithread
    is used.
  * Do not allow a step to request more threads per core than the
    allocation did.
  * Remove implicit --exact when --cpus-per-task is used.

-------------------------------------------------------------------
Wed Dec 22 09:24:28 UTC 2021 - Christian Goll <cgoll@suse.com>

- update to 21.08.5 with following changes:
  * Fix issue where typeless GRES node updates were not immediately reflected.
  * Fix setting the default scrontab job working directory so that it's the home
    of the different user (-u <user>) and not that of root or SlurmUser editor.
  * Fix stepd not respecting SlurmdSyslogDebug.
  * Fix concurrency issue with squeue.
  * Fix job start time not being reset after launch when job is packed onto
    already booting node.
  * Fix updating SLURM_NODE_ALIASES for jobs packed onto powering up nodes.
  * Cray - Fix issues with starting hetjobs.
  * auth/jwks - Print fatal() message when jwks is configured but file could
    not be opened.
  * If sacctmgr has an association with an unknown qos as the default qos
    print 'UNKN*###' instead of leaving a blank name.
  * Correctly determine task count when giving --cpus-per-gpu, --gpus and
    --ntasks-per-node without task count.
  * slurmctld - Fix places where the global last_job_update was not being set
    to the time of update when a job's reason and description were updated.
  * slurmctld - Fix case where a job submitted with more than one partition
    would not have its reason updated while waiting to start.
  * Fix memory leak in node feature rebooting.
  * Fix time limit permanently set to 1 minute by backfill for job array tasks
    higher than the first with QOS NoReserve flag and PreemptMode configured.
  * Fix sacct -N to show jobs that started in the current second
  * Fix issue on running steps where both SLURM_NTASKS_PER_TRES and
    SLURM_NTASKS_PER_GPU are set.
  * Handle oversubscription request correctly when also requesting
    --ntasks-per-tres.
  * Correctly detect when a step requests bad gres inside an allocation.
  * slurmstepd - Correct possible deadlock when UnkillableStepTimeout triggers.
  * srun - use maximum number of open files while handling job I/O.
  * Fix writing to Xauthority files on root_squash NFS exports, which was
    preventing X11 forwarding from completing setup.
  * Fix regression in 21.08.0rc1 that broke --gres=none.
  * Fix srun --cpus-per-task and --threads-per-core not implicitly setting
    --exact. It was meant to work this way in 21.08.
  * Fix regression in 21.08.0 that broke dynamic future nodes.
  * Fix dynamic future nodes remembering active state on restart.
  * Fix powered down nodes getting stuck in COMPLETING+POWERED_DOWN when job is
    cancelled before nodes are powering up.

-------------------------------------------------------------------
Wed Nov 17 08:33:13 UTC 2021 - Christian Goll <cgoll@suse.com>

- updated to 21.08.4 which fixes CVE-2021-43337, which is only present
  in the 21.08 tree.
  * CVE-2021-43337:
    For sites using the new AccountingStoreFlags=job_script and/or job_env
    options, an issue was reported with the access control rules in SlurmDBD
    that will permit users to request job scripts and environment files that
    they should not have access to. (Scripts/environments are meant to only be
    accessible by user accounts with administrator privileges, by account
    coordinators for jobs submitted under their account, and by the user
    themselves.)
- changes from 21.08.3:
  * This includes a number of fixes since the last release a month ago,
    including one critical fix to prevent a communication issue between
    slurmctld and slurmdbd for sites that have started using the new
    AccountingStoreFlags=job_script functionality.

-------------------------------------------------------------------
Fri Oct 29 15:54:53 UTC 2021 - Egbert Eich <eich@suse.com>

- Utilize sysuser infrastructure to set user/group slurm.
  For munge authentication slurm should have a fixed UID across
  all nodes including the management server. Set it to 120.
- Limit firewalld service definitions to SUSE versions >= 15.

-------------------------------------------------------------------
Mon Oct 18 13:36:14 UTC 2021 - Christian Goll <cgoll@suse.com>

- added service definitions for firewalld (JSC#SLE-22741)

-------------------------------------------------------------------
Wed Oct 6 07:12:52 UTC 2021 - Christian Goll <cgoll@suse.com>

- update to 21.08.2
- major change:
  * removed support for the TaskAffinity=yes option in cgroup.conf. Please
    consider using "TaskPlugin=task/cgroup,task/affinity" in slurm.conf as an
    option.
- minor changes and bugfixes:
  * slurmctld - fix how the max number of cores on a node in a partition are
    calculated when the partition contains multi-socket nodes. This in turn
    corrects certain jobs node count estimations displayed client-side.
  * job_submit/cray_aries - fix "craynetwork" GRES specification after changes
    introduced in 21.08.0rc1 that made TRES always have a type prefix.
  * Ignore nonsensical check in the slurmd for [Pro|Epi]logSlurmctld.
  * Fix writing to stderr/syslog when systemd runs slurmctld in the foreground.
  * Fix issue with updating job started with node range.
  * Fix issue with nodes not clearing state in the database when the slurmctld
    is started with clean-start.
  * Fix hetjob components > 1 timing out due to InactiveLimit.
  * Fix sprio printing -nan for normalized association priority if
    PriorityWeightAssoc was not defined.
  * Disallow FirstJobId=0.
  * Preserve job start info in the database for a requeued job that hadn't
    registered the first time in the database yet.
  * Only send one message on prolog failure from the slurmd.
  * Remove support for TaskAffinity=yes in cgroup.conf.
  * accounting_storage/mysql - fix issue where querying jobs via sacct
    --whole-hetjob=yes or slurmrestd (which automatically includes this flag)
    could in some cases return more records than expected.
  * Fix issue for preemption of job array task that makes afterok dependency
    fail. Additionally, send emails when requeueing happens due to preemption.
  * Fix sending requeue mail type.
  * Properly resize a job's GRES bitmaps and counts when resizing the job.
  * Fix node being able to transition to CLOUD state from non-cloud state.
  * Fix regression introduced in 21.08.0rc1 which broke a step's ability to
    inherit GRES from the job when the step didn't request GRES but the job did.
  * Fix errors in logic when picking nodes based on bracketed anded constraints.
    This also enforces the requirement to have a count when using such
    constraints.
  * Handle job resize better in the database.
  * Exclude currently running, resized jobs from the runaway jobs list.
  * Make it possible to shrink a job more than once.

-------------------------------------------------------------------
Tue Sep 28 15:53:38 UTC 2021 - Christian Goll <cgoll@suse.com>

- moved pam module from /lib64 to /usr/lib64 which fixes boo#1191095
  via the macro %_pam_moduledir

-------------------------------------------------------------------
Fri Sep 17 07:22:44 UTC 2021 - Christian Goll <cgoll@suse.com>

- updated to 21.08.1 with following bug fixes:
  * Fix potential memory leak if a problem happens while allocating GRES for
    a job.
  * If an overallocation of GRES happens terminate the creation of a job.
  * AutoDetect=nvml: Fatal if no devices found in MIG mode.
  * Print federation and cluster sacctmgr error messages to stderr.
  * Fix off by one error in --gpu-bind=mask_gpu.
  * Add --gpu-bind=none to disable gpu binding when using --gpus-per-task.
  * Handle the burst buffer state "alloc-revoke" which previously would not
    display in the job correctly.
  * Fix issue in the slurmstepd SPANK prolog/epilog handler where configuration
    values were used before being initialized.
  * Restore a step's ability to utilize all of an allocation's memory if --mem=0.
  * Fix --cpu-bind=verbose garbage taskid.
  * Fix cgroup task affinity issues from garbage taskid info.
  * Make gres_job_state_validate() client logging behavior as before 44466a4641.
  * Fix steps with --hint overriding an allocation with --threads-per-core.
  * Require requesting a GPU if --mem-per-gpu is requested.
  * Return error early if a job is requesting --ntasks-per-gpu and no gpus or
    task count.
  * Properly clear out pending step if unavailable to run with available
    resources.
  * Kill all processes spawned by burst_buffer.lua including descendants.
  * openapi/v0.0.{35,36,37} - Avoid setting default values of min_cpus,
    job name, cwd, mail_type, and contiguous on job update.
  * openapi/v0.0.{35,36,37} - Clear user hold on job update if hold=false.
  * Prevent CRON_JOB flag from being cleared when loading job state.
  * sacctmgr - Fix deleting WCKeys when not specifying a cluster.
  * Fix getting memory for a step when the first node in the step isn't the
    first node in the allocation.
  * Make SelectTypeParameters=CR_Core_Memory default for cons_tres and cons_res.
  * Correctly handle mutex unlocks in the gres code if failures happen.
  * Give better error message if -m plane is given with no size.
  * Fix --distribution=arbitrary for salloc.
  * Fix jobcomp/script regression introduced in 21.08.0rc1 0c75b9ac9d.
  * Only send the batch node in the step_hostlist in the job credential.
  * When setting affinity for the batch step don't assume the batch host is node
    0.
  * In task/affinity better checking for node existence when laying out
    affinity.
  * slurmrestd - fix job submission with auth/jwt.

- removed Fix-statement-condition-in-netloc-autoconf-macro.patch
  issue was fixed upstream

-------------------------------------------------------------------
Mon Sep 6 15:34:06 UTC 2021 - Egbert Eich <eich@suse.com>

- Fix-statement-condition-in-netloc-autoconf-macro.patch:
  Fix netloc check, reestablish netloc disable code.
- Make configure arg '--with-pmix' conditional.
- Move openapi plugins to package slurm-restd.

-------------------------------------------------------------------
Thu Sep 2 13:19:33 UTC 2021 - Christian Goll <cgoll@suse.com>

- updated to 21.08.0, major changes:
  * A new "AccountingStoreFlags=job_script" option to store the job scripts
    directly in SlurmDBD.
  * Added "sacct -o SubmitLine" format option to get the submit line
    of a job/step.
  * Changes to the node state management so that nodes are marked as PLANNED
    instead of IDLE if the scheduler is still accumulating resources while
    waiting to launch a job on them.
  * RS256 token support in auth/jwt.
  * Overhaul of the cgroup subsystems to simplify operation, mitigate a number
    of inherent race conditions, and prepare for future cgroup v2 support.
  * Further improvements to cloud node power state management.
  * A new child process of the Slurm controller called "slurmscriptd"
    responsible for executing PrologSlurmctld and EpilogSlurmctld scripts,
    which significantly reduces performance issues associated with enabling
    those options.
  * A new burst_buffer/lua plugin allowing for site-specific asynchronous job
    data management.
  * Fixes to the job_container/tmpfs plugin to allow the slurmd process to be
    restarted while the job is running without issue.
  * Added json/yaml output to sacct, squeue, and sinfo commands.
  * Added a new node_features/helpers plugin to provide a generic way to change
    settings on a compute node across a reboot.
  * Added support for automatically detecting and broadcasting shared libraries
    for an executable launched with "srun --bcast".
  * Added initial OCI container execution support with a new --container option
    to sbatch and srun.
  * Improved "configless" support by allowing multiple control servers to be
    specified through the slurmd --conf-server option, and send additional
    configuration files at startup including cli_filter.lua.
- minor changes:
  * If an overallocation of GRES happens terminate the creation of a job.
  * AutoDetect=nvml: Fatal if no devices found in MIG mode.
  * Print federation and cluster sacctmgr error messages to stderr.
  * Add --gpu-bind=none to disable gpu binding when using --gpus-per-task.
  * Handle the burst buffer state "alloc-revoke" which previously would not
    display in the job correctly.
  * Fix issue in the slurmstepd SPANK prolog/epilog handler where configuration
    values were used before being initialized.
  * Restored --gpu-bind=single:<ntasks> to check core affinity like
    --gpu-bind=closest does. The removal of this behavior was only in rc2.
  * slurmd - Fix assert failure on initialization due to bad node name.
  * Fix error codes in cgroup/v1.
  * Don't destroy the memory step outside fini, which leads to a double destroy
    causing an error message.
  * Add support for lua 5.4.
  * Force cgroup.clone_children to 0 in slurm cgroup directories. This caused
    issues in task cpuset plugin in systems with it enabled by default.
  * Clear GRES HAS_TYPE flag when removing type name.
  * Environment flags in gres.conf now override flags set by AutoDetect.
  * Environment flags in gres.conf now apply to subsequent gres.conf lines where
    Environment flags are not set.
  * Set missing job_uid and job_gid members when preparing a kill_job_msg_t in
    abort_job_on_node(), abort_job_on_nodes() and kill_job_on_node().
  * Fix swappiness not being set in cgroups.
  * Fix coordinators for new subaccounts.
  * Fix coordinators when adding existing users with PrivateData=users.
  * slurmctld - do not attempt to relinquish control to self.
  * openapi/v0.0.37 - Honor kill_on_invalid_dependency as job parameter.
  * Check max_gres when doing step allocation, fix for regression in rc2.
  * SPANK plugins are now required to match the current Slurm version, and must
    be recompiled for each new Slurm release.
  * node_features/helpers - add ExecTime configuration option.
  * srun - Fix force termination with -X.
  * On slurmctld restart set node typed GRES counts correctly.
  * Fix places where a step wasn't allocated in the slurmctld but wasn't ever
    removed from the job.
  * Fix step allocation memory when using --threads-per-core.
  * Fix step allocations to consume all threads on a core when using
    threads-per-core.
  * Add check to validate cpu request on a step if --threads-per-core is given
    and it is less than what the core on the node has in the allocation.
  * Fix issue where a step could request more gres than the job had and the step
    would hang forever. This bug was only introduced in 21.08.0rc2.
  * Only print \r\n for logging messages on stderr when --pty has been
    explicitly requested.
  * Relax check on SPANK plugins to only require Slurm major + minor versions
    to match.
  * job_container/tmpfs - delegate handling of /dev/shm to the extern step
    so new step launches will be attached correctly even after the slurmd
    process has been restarted.
  * Limit the wait time in proctrack_g_wait() to UnkillableStepTimeout instead
    of a hardcoded value of 256 seconds, and limit the delay between tests to a
    maximum of 32 seconds.
  * fatal() on start if using job_container/tmpfs without PrologFlags=Contain.
  * Load bf_when_last_cycle from job state only if protocol version >= 21.08.
  * Docs - remove man3 section entirely.
  * Set step memory when using MemPerGPU or DefMemPerGPU. Previously a step's
    memory was not set even when it requested --mem-per-gpu and at least one
    GPU.
  * Add cli_filter.lua support in configless mode.
  * Check that the step requests at least as many gres as nodes.
  * Make job's SLURM_JOB_GPUS print global GPU IDs instead of MIG unique_ids.
  * Fix miscounting of GPU envs in prolog/epilog if MultipleFiles was used.
  * Support MIGs in prolog/epilog's CUDA_VISIBLE_DEVICES & co.
  * Add SLURM_JOB_GPUS back into Prolog; add it to Epilog.
  * Fix issue where the original executable, not the bcast'd version, was
    executed with 'srun --bcast'.
  * sacct - print '-' header correctly for fields over 53-characters wide.
  * openapi/dbv0.0.37 - replace "REST" with "Slurm OpenAPI" for plugin_name.
  * openapi/v0.0.37 - replace "REST" with "Slurm OpenAPI" for plugin_name.
  * configless - fix segfault on 'scontrol reconfigure'.
  * Use FREE_NULL_LIST instead of list_destroy.
  * If we are running an interactive session we need to force track_steps.
  * Disable OPOST flag when using --pty to avoid issues with Emacs.
  * Fix issue where extra bonus core was allocated in some situations.
  * Avoid putting gres with count of 0 on a TRES req/alloc.
  * Fix memory in requested TRES when --mem-per-gpu is used.
  * Changed ReqMem field in sacct to match memory from ReqTRES.
  * Changed --gpu-bind=single:<ntasks> to no longer check core affinity like
    --gpu-bind=closest does. This consequently affects --ntasks-per-gpu.
  * slurmrestd - add v0.0.37 OpenAPI plugin.
  * slurmrestd/v0.0.37 - rename standard_in -> standard_input.
  * slurmrestd/v0.0.37 - rename standard_out -> standard_output.
  * Changed the --format handling for negative field widths (left justified)
    to apply to the column headers as well as the printed fields.
  * Add LimitFactor to the QOS. A float that is factored into an association's
    [Grp|Max]TRES limits. For example, if the LimitFactor is 2, then an
    association with a GrpTRES of 30 CPUs, would be allowed to allocate 60
    CPUs when running under this QOS.
  * slurmrestd - Pass SLURM_NO_CHANGE_IN_DATA to client as 304 (Not Modified).
  * slurmrestd/v0.0.37 - Add update_time field to Jobs query to allow clients
    to only get jobs list based on change timestamp.
  * Reset job eligible time when job is manually held.
  * Add DEBUG_FLAG_JAG to improve logging related to job account gathering.
  * Convert logging in account_gather/common to DEBUG_FLAG_JAG.
  * Add more logging for jag_common_poll_data() when prec_extra() called.
  * slurmrestd/v0.0.37 - add API to fetch reservation(s) info.
  * Catch more errors in task/cgroup initialization and cleanup to avoid allowing
    jobs to start when cgroups fail to configure correctly.
  * Fix cgroup ns detection when using containers (e.g. LXC or Docker).
  * Reset job's next_step_id counter to 0 after being requeued.
  * Make scontrol exit with non-zero status after failing to delete a partition
    or reservation.
  * Make NtasksPerTRES optional in slurm_sprint_job_info().
  * slurmrestd/v0.0.37 - Add update_time field to nodes query to allow clients
    to only get nodes list based on change timestamp.
  * common/parse_config - catch and propagate return codes when handling a match
    on a key-value pattern. This implies error codes detected in the handlers
    are now not ignored and users of _handle_keyvalue_match() can fatal().
  * common/hostlist - fix hostlist_delete_nth() xassert() upper bound check.
  * API change: Removed slurm_kill_job_msg and modified the function signature
    for slurm_kill_job2. slurm_kill_job2 should be used instead of
    slurm_kill_job_msg.
  * Fix non-zero exit code for scontrol ping when all controllers are down.
  * Enforce a valid configuration for AccountingStorageEnforce in slurm.conf.
    If the configuration is invalid, then an error message will be printed and
    the command or daemon (including slurmctld) will not run.
  * slurmrestd/v0.0.37 - Add update_time field to partitions/reservations query
    to allow clients to only get the entities list when something changed.
  * slurmdbd.service - add "After" relationship to all common names for MariaDB
    to reduce startup delays.
  * slurmrestd/v0.0.37 - Correct displaying node states that are UNKNOWN.
  * slurmrestd/v0.0.37 - Add flags to node states.
  * Fix first job on fresh cluster not being assigned JobId=1 (or FirstJobId).
  * squeue - make it so --nodelist is sensitive to --clusters.
  * squeue - do --nodelist node validation in the same order as listing.
  * Removed AccountingStoreJobComment option. Please update your config to use
    AccountingStoreFlags=job_comment instead.
  * AccountingStoreFlags=job_script allows you to store the job's batch script.
  * AccountingStoreFlags=job_env allows you to store the job's env vars.
  * Add sacct -o SubmitLine to get the submit line of a job/step.
  * Removed DefaultStorage{Host,Loc,Pass,Port,Type,User} options.
  * Fix NtasksPerTRES delimiter from : to = in scontrol show job output.
  * Removed CacheGroups, CheckpointType, JobCheckpointDir, MemLimitEnforce,
    SchedulerPort, SchedulerRootFilter options.
  * Make job accounting queries use consistent timeframes with and w/o jobs.
  * --cpus-per-task and --threads-per-core now imply --exact.
    This fixes issues where steps would be allocated the wrong number of CPUs.
|
|
* configure: the --with option handling has been made consistent across the
|
|
various optional libraries. Specifying --with-foo=/path/to/foo will only
|
|
check that directory for the applicable library (rather than, in some cases,
|
|
falling back to the default directories), and will always error the build
|
|
if the library is not found (instead of a mix of error messages and non-
|
|
fatal warning messages).
|
|
* configure: replace --with-rmsi_dir option with proper handling for
|
|
--with-rsmi=dir.
|
|
* Pass additional job environment variables to MailProg.
|
|
* Add SLURM_JOB_WORK_DIR to Prolog, Epilog.
|
|
* Removed sched/hold plugin.
|
|
* Fix srun overwriting SLURM_SUBMIT_DIR and SLURM_SUBMIT_HOST when within an
|
|
existing allocation.
|
|
* step_ctx code has been removed from the api.
|
|
* cli_filter/lua, jobcomp/lua, job_submit/lua now load their scripts from the
|
|
same directory as the slurm.conf file (and thus now will respect changes
|
|
to the SLURM_CONF environment variable).
|
|
* SPANK - call slurm_spank_init if defined without slurm_spank_slurmd_exit in
|
|
slurmd context.
|
|
* job_container/tmpfs - Remove need for .active file to allow salloc without
|
|
an interactive step to work.
|
|
* slurmd - Delay background node registration on every failure up to 128s on
|
|
startup.
|
|
* slurmctld - Always notify slurmd that node registration was accepted to
|
|
avoid slurmd needlessly attempting to re-register if there is a configuration
|
|
issue.
|
|
* Put node into "INVAL" state upon registering with an invalid node
|
|
configuration. Node must register with a valid configuration to continue.
|
|
* Make --cpu-bind=threads default for --threads-per-core -- cli and env can
|
|
override.
|
|
* jobcomp/elasticsearch - Use data_t to serialize data. The plugin now has the
|
|
JSON-C library as a prerequisite.
|
|
* scrontab - create the temporary file under the TMPDIR environment variable
|
|
(if set), otherwise continue to use TmpFS as configured in slurm.conf.
|
|
* Add LastBusyTime to "scontrol show nodes" and slurmrestd nodes output,
|
|
which represents the time the node last had jobs on it.
|
|
* slurmd - allow multiple comma-separated controllers to be specified in
|
|
configless mode with --conf-server.
|
|
* sacctmgr - changed column headings to "ParentID" and "ParentName" instead
|
|
of "Par ID" and "Par Name" respectively.
|
|
* Perl API - make sure man pages are installed under the --prefix given to
|
|
configure.
|
|
* Manual powering down of nodes with scontrol now ignores
|
|
SuspendExc<Nodes|Parts>.
|
|
* SALLOC_THREADS_PER_CORE and SBATCH_THREADS_PER_CORE have been added as
|
|
input environment variables for salloc and sbatch, respectively. They do
|
|
the same thing as --threads-per-core.
|
|
* Distinguish queued reboot requests (REBOOT) from issued reboots (REBOOT^).
|
|
* Set the maximum number of open files per process to 4096 to avoid
|
|
performance issues when closing the entire range with closeall().
|
|
* auth/jwt - add support for RS256 tokens.
|
|
* Relax reservation purge due to any invalid uid after creation time.
|
|
* Reject srun that requests both --exclusive and --overlap.
|
|
* service files - change dependency to network-online rather than just
|
|
network to ensure DNS and other services are available.
|
|
* RSMI: Fix incorrect PCI BDF bits.
|
|
* plugins/cli_filter - Convert to using data_t to serialize JSON.
|
|
* Fix testing array job after regaining locks in backfill.
|
|
* Don't display node's comment with "scontrol show nodes" unless set.
|
|
* Add "Extra" field to node to store extra information other than a comment.
|
|
* scrontab - Use /tmp instead of TmpFS if TMPDIR is not set.
|
|
* Add ResumeTimeout, SuspendTimeout and SuspendTime to Partitions.
|
|
* sreport - change to sorting TopUsage by the --tres option.
|
|
* slurmrestd - do not allow operation as SlurmUser/root by default.
|
|
* Allow map_cpu and mask_cpu for non-whole node allocation.
|
|
* TaskPluginParam=verbose is now treated as a default. Previously it would be
|
|
applied regardless of the job specifying a --cpu-bind.
|
|
* Add "node_reg_mem_percent" SlurmctldParameter to define percentage of
|
|
memory nodes are allowed to register with.
|
|
* Show correct number of SocketsPerBoard in slurmd -C with hwloc2.
|
|
* Alter sreport's cluster utilization report column name from
|
|
'Reserved' to 'Planned' to match the nomenclature of the 'Planned' node.
|
|
* Add StateComplete format option to sinfo to show base_state+flags.
|
|
* "scontrol show node" now shows State as base_state+flags instead of
|
|
shortened state with flags appended. e.g. IDLE# --> IDLE+POWERING_UP.
|
|
Also "POWER" state flag string is "POWERED_DOWN".
|
|
* slurmd/req - add missing job_env_t's het_job_id initialization off the
|
|
request in _rpc_{abort,terminate}_job(). This caused problems for Native
|
|
Cray builds when joining a CNCU job_container plugin with Epilog configured.
|
|
* Fix joining a CNCU job_container on a Native Cray build before executing the
|
|
UnkillableStepProgram for a HetJob step.
|
|
* slurmrestd/v0.0.35 - Plugin has been tagged as deprecated.
|
|
* srun - Job steps requiring more cores than available are now rejected unless
|
|
'--overlap' is specified.
|
|
* Add bf_node_space_size to SchedulerParameters.
|
|
* Add scontrol update node state=POWER_DOWN_FORCE and POWER_DOWN_ASAP as new
|
|
ways to power off and reset especially CLOUD nodes.
|
|
* Define and separate node power state transitions. Previously a powering
|
|
down node was in both states, POWERING_OFF and POWERED_OFF. These are now
|
|
separated.
|
|
* Create a new process called slurmscriptd which runs PrologSlurmctld and
|
|
EpilogSlurmctld. This avoids fork() calls from slurmctld, and can avoid
|
|
performance issues if the slurmctld has a large memory footprint.
|
|
* Added new Script option to DebugFlags for debugging slurmscriptd.
|
|
* scrontab - add ability to update crontab from a file or standard input.
|
|
* scrontab - add ability to set and expand variables.
|
|
* Pass JSON of job to node mappings to ResumeProgram.
|
|
* If running steps in an allocation with CR_PACK_NODE or -mpack the srun will
|
|
only attempt to allocate as much as needed from the allocation instead
|
|
of always trying to allocate every node in the allocation.
|
|
* Jobs that request the whole node now check to see if any gres are allocated.
|
|
* Rename SbcastParameters to BcastParameters.
|
|
* Make srun sensitive to BcastParameters.
|
|
* RSMI: Add gres_links_create_empty() and preserve RSMI enumeration order.
|
|
* GPUs: Use index instead of dev_num for CUDA_VISIBLE_DEVICES
|
|
* Don't run epilog on nodes if job never launched.
|
|
* QOS accrue limits only apply to the job QOS, not partition QOS.
|
|
* Add --gpu-bind=per_task:<gpus_per_task> option, --gpus-per-task will now
|
|
set this option by default.
|
|
* Treat any return code from SPANK plugin that is not SLURM_SUCCESS to be an
|
|
error or rejection.
|
|
* Print the statistics for extern step adopted processes in sstat.
|
|
* Fix SLURM_NODE_ALIASES to work for ipv6 node addrs.
|
|
* Add support for automatically detecting and broadcasting executable shared
|
|
object dependencies for sbcast and srun --bcast.
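The AccountingStoreJobComment removal listed above requires a slurm.conf change. The following is a minimal Python sketch of that migration, not part of Slurm or of this package. The helper name `migrate_store_flags` is hypothetical, and the sketch assumes the old option carried an explicit YES or NO value on a line of its own and that no AccountingStoreFlags line is already present.

```python
import re

# Hypothetical helper: rewrite the removed AccountingStoreJobComment option
# into the AccountingStoreFlags=job_comment form named in the entry above.
_OLD_OPTION = re.compile(
    r"^\s*AccountingStoreJobComment\s*=\s*(\S+)\s*$", re.IGNORECASE
)


def migrate_store_flags(conf_text):
    """Return conf_text with the removed option migrated line by line."""
    out = []
    for line in conf_text.splitlines():
        match = _OLD_OPTION.match(line)
        if match is None:
            out.append(line)  # unrelated line: keep untouched
            continue
        value = match.group(1).lower()
        if value == "yes":
            out.append("AccountingStoreFlags=job_comment")
        elif value == "no":
            continue  # assumption: an explicit NO needs no replacement
        else:
            out.append(line)  # unknown value: leave for manual review
    return "\n".join(out)
```

Lines that carry a trailing comment or an unexpected value are left as they are, so they still need a manual check.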
|
|
|
|
-------------------------------------------------------------------
|
|
Fri Jul 2 08:01:32 UTC 2021 - Christian Goll <cgoll@suse.com>
|
|
|
|
- Updated to 20.11.8:
|
|
* slurmctld - fix erroneous "StepId=CORRUPT" messages in error logs.
|
|
* Correct the error given when auth plugin fails to pack a credential.
|
|
* Fix unused-variable compiler warning on FreeBSD in fd_resolve_path().
|
|
* acct_gather_filesystem/lustre - only emit collection error once per step.
|
|
* Add GRES environment variables (e.g., CUDA_VISIBLE_DEVICES) into the
|
|
interactive step, the same as is done for the batch step.
|
|
* Fix various potential deadlocks when altering objects in the database
|
|
dealing with every cluster in the database.
|
|
* slurmrestd:
|
|
- handle slurmdbd connection failures without segfaulting.
|
|
- fix segfault for searches in slurmdb/v0.0.36/jobs.
|
|
- remove (non-functioning) users query parameter for
|
|
slurmdb/v0.0.36/jobs from openapi.json
|
|
- fix segfault in slurmrestd db/jobs with numeric queries
|
|
- add argv handling for job/submit endpoint.
|
|
- add description for slurmdb/job endpoint.
|
|
* slurmrestd/dbv0.0.36:
|
|
- Fix values dumped in job state/current and
|
|
job step state.
|
|
- Correct description for previous state property.
|
|
* srun:
|
|
- fix broken node step allocation in a heterogeneous allocation.
|
|
- leave SLURM_DIST_UNKNOWN as default for --interactive.
|
|
* Fail step creation if -n is not multiple of --ntasks-per-gpu.
|
|
* job_container/tmpfs - Fix slowdown on teardown.
|
|
* Fix problem with SlurmctldProlog where requeued jobs would never launch.
|
|
* job_container/tmpfs - Fix issue when restarting slurmd where the namespace
|
|
mount points could disappear.
|
|
* sacct:
|
|
- avoid truncating JobId at 34 characters.
|
|
- fix segfault when printing StepId (or when using --long).
|
|
* scancel - fix segfault when --wckey filtering option is used.
|
|
* select/cons_tres - Fix memory leak.
|
|
* Prevent file descriptor leak in job_container/tmpfs on slurmd restart.
|
|
* perlapi/libslurmdb - expose tres_req_str to job hash.
|
|
* scrontab - close and reopen temporary crontab file to deal with editors
|
|
that do not change the original file, but instead write out then rename
|
|
a new file.
|
|
* sstat - fix linking so that it will work when --without-shared-libslurm
|
|
was used to build Slurm.
|
|
* Clear allocated cpus for running steps in a job before handling requested
|
|
nodes on new step.
|
|
* Don't reject a step if not enough nodes are available. Instead, defer the
|
|
step until enough nodes are available to satisfy the request.
|
|
* Don't reject a step if it requests at least one specific node that is
|
|
already allocated to another step. Instead, defer the step until the
|
|
requested node(s) become available.
|
|
* Better handling of --mem=0.
|
|
* Ignore DefCpuPerGpu when --cpus-per-task given.
|
|
|
|
|
|
-------------------------------------------------------------------
|
|
Fri May 14 10:07:04 UTC 2021 - Christian Goll <cgoll@suse.com>
|
|
|
|
- Updated to 20.11.7 which fixes CVE-2021-31215 (bsc#1186024)
|
|
- New features in 20.11.7:
|
|
* slurmd - handle configless failures gracefully instead of hanging
|
|
indefinitely.
|
|
* select/cons_tres - fix Dragonfly topology not selecting nodes in the same
|
|
leaf switch when it should as well as requests with --switches option.
|
|
* Fix issue where certain step requests wouldn't run if the first node in the
|
|
job allocation was full and there were idle resources on other nodes in
|
|
the job allocation.
|
|
* Fix deadlock issue with <Prolog|Epilog>Slurmctld.
|
|
* torque/qstat - fix printf error message in output.
|
|
* When adding associations or wckeys avoid checking multiple times a user or
|
|
cluster name.
|
|
* Fix wrong jobacctgather information on a step on multiple nodes
|
|
due to timeouts sending the information gathered on its node.
|
|
* Fix missing xstrdup which could result in slurmctld segfault on array jobs.
|
|
* Fix security issue in PrologSlurmctld and EpilogSlurmctld by always
|
|
prepending SPANK_ to all user-set environment variables. CVE-2021-31215.
|
|
- New features in 20.11.6:
|
|
* Fix sacct assert with the --qos option.
|
|
* Use pkg-config --atleast-version instead of --modversion for systemd.
|
|
* common/fd - fix getsockopt() call in fd_get_socket_error().
|
|
* Properly handle the return from fd_get_socket_error() in _conn_readable().
|
|
* cons_res - Fix issue where running jobs were not taken into consideration
|
|
when creating a reservation.
|
|
* Avoid a deadlock between job_list for_each and assoc QOS_LOCK.
|
|
* Fix TRESRunMins usage for partition qos on restart/reconfig.
|
|
* Fix printing of number of tasks on a completed job that didn't request
|
|
tasks.
|
|
* Fix updating GrpTRESRunMins when decrementing job time is bigger than it.
|
|
* Make it so we handle multithreaded allocations correctly when doing
|
|
--exclusive or --core-spec allocations.
|
|
* Fix incorrect round-up division in _pick_step_cores
|
|
* Use appropriate math to adjust cpu counts when --ntasks-per-core=1.
|
|
* cons_tres - Fix consideration of power downed nodes.
|
|
* cons_tres - Fix DefCpuPerGPU, increase cpus-per-task to match with
|
|
gpus-per-task * cpus-per-gpu.
|
|
* Fix under-cpu memory auto-adjustment when MaxMemPerCPU is set.
|
|
* Make it possible to override CR_CORE_DEFAULT_DIST_BLOCK.
|
|
* Perl API - fix retrieving/storing of slurm_step_id_t in job_step_info_t.
|
|
* Recover state of burst buffers when slurmctld is restarted to avoid skipping
|
|
burst buffer stages.
|
|
* Fix race condition in burst buffer plugin which caused a burst buffer
|
|
in stage-in to not get state saved if slurmctld stopped.
|
|
* auth/jwt - print an error if jwt_file= has not been set in slurmdbd.
|
|
* Fix RESV_DEL_HOLD not being a valid state when using squeue --states.
|
|
* Add missing squeue selectable states in valid states error message.
|
|
* Fix scheduling last array task multiple times on error, causing segfault.
|
|
* Fix issue where a step could be allocated more memory than the job when
|
|
dealing with --mem-per-cpu and --threads-per-core.
|
|
* Fix removing qos from assoc with -= can lead to assoc with no qos
|
|
* auth/jwt - fix segfault on invalid credential in slurmdbd due to
|
|
missing validate_slurm_user() function in context.
|
|
* Fix single Port= not being applied to range of nodes in slurm.conf
|
|
* Fix jobs that do not request a TRES failing to start because of that TRES limit.
|
|
* acct_gather_energy/rapl - fix AveWatts calculation.
|
|
* job_container/tmpfs - Fix issues with cleanup and slurmd restarting on
|
|
running jobs.
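The CVE-2021-31215 fix in this entry prepends SPANK_ to every user-set environment variable before PrologSlurmctld and EpilogSlurmctld run. The Python sketch below only illustrates that renaming idea and is not Slurm's implementation. The function name `spank_prefixed` is hypothetical, and the prefix is applied unconditionally to mirror the word "always" in the entry.

```python
def spank_prefixed(user_env):
    """Return a copy of user_env with SPANK_ prepended to every name.

    Illustration only: after renaming, a user-supplied variable such as
    LD_PRELOAD is visible to the script as SPANK_LD_PRELOAD and no longer
    carries its special meaning for the process that runs the script.
    """
    return {"SPANK_" + name: value for name, value in user_env.items()}
```

For example, `spank_prefixed({"LD_PRELOAD": "/tmp/x.so"})` yields a mapping whose only key is `SPANK_LD_PRELOAD`.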
|
|
|
|
-------------------------------------------------------------------
|
|
Mon May 3 16:09:44 UTC 2021 - Egbert Eich <eich@suse.com>
|
|
|
|
- Ship REST API version and auth plugins with slurmrestd.
|
|
- Add YAML support for REST API to build (bsc#1185603).
|
|
|
|
-------------------------------------------------------------------
|
|
Wed Mar 17 08:55:58 UTC 2021 - Christian Goll <cgoll@suse.com>
|
|
|
|
- Update to 20.11.5:
|
|
- New features:
|
|
* New job_container/tmpfs plugin developed by NERSC that can be used to
|
|
create per-job filesystem namespaces. Documentation and configuration
|
|
can be found in the respective man page.
|
|
- Bug fixes:
|
|
* Fix main scheduler bug where bf_hetjob_prio truncates SchedulerParameters.
|
|
* Fix sacct not displaying UserCPU, SystemCPU and TotalCPU for large times.
|
|
* scrontab - fix to return the correct index for a bad #SCRON option.
|
|
* scrontab - fix memory leak when invalid option found in #SCRON line.
|
|
* Add errno for when a user requests multiple partitions and they are using
|
|
partition based associations.
|
|
* Fix issue where a job could run in a wrong partition when using
|
|
EnforcePartLimits=any and partition based associations.
|
|
* Remove possible deadlock when adding associations/wckeys in multiple
|
|
threads.
|
|
* When using PrologFlags=alloc make sure the correct Slurm version is set
|
|
in the credential.
|
|
* When sending a job a warning signal make sure we always send SIGCONT
|
|
beforehand.
|
|
* Fix issue where a batch job would continue running if a prolog failed on a
|
|
node that wasn't the batch host and requeuing was disabled.
|
|
* Fix issue where sometimes salloc/srun wouldn't get a message about a prolog
|
|
failure in the job's stdout.
|
|
* Requeue or kill job on a prolog failure when PrologFlags is not set.
|
|
* Fix race condition causing node reboots to get requeued before
|
|
ResumeTimeout expires.
|
|
* Preserve node boot_req_time on reconfigure.
|
|
* Preserve node power_save_req_time on reconfigure.
|
|
* Fix node reboots being queued and issued multiple times and preventing the
|
|
reboot from timing out.
|
|
* Fix run_command to exit correctly if track_script kills the calling thread.
|
|
* Only requeue a job when the PrologSlurmctld returns nonzero.
|
|
* When a job is signaled with SIGKILL make sure we flush all
|
|
prologs/setup scripts.
|
|
* Handle burst buffer scripts if the job is canceled while stage_in is
|
|
happening.
|
|
* When shutting down the slurmctld make note to ignore error message when
|
|
we have to kill a prolog/setup script we are tracking.
|
|
* scrontab - add support for the --open-mode option.
|
|
* acct_gather_profile/influxdb - avoid segfault on plugin shutdown if setup
|
|
has not completed successfully.
|
|
* Reduce delay in starting salloc allocations when running with prologs.
|
|
* Alter AllocNodes check to work if the allocating node's domain doesn't
|
|
match the slurmctld's. This restores the pre-20.11 behavior.
|
|
* Fix slurmctld segfault if jobs from a prior version had the now-removed
|
|
INVALID_DEPEND state flag set and were allowed to run in 20.11.
|
|
* Add job_container/tmpfs plugin to give a method to provide a private /tmp
|
|
per job.
|
|
* Set the correct core affinity when using AutoDetect.
|
|
* slurmrestd - mark "environment" as required for job submissions in schema.
|
|
|
|
-------------------------------------------------------------------
|
|
Tue Feb 23 16:24:16 UTC 2021 - Christian Goll <cgoll@suse.com>
|
|
|
|
- Update to 20.11.04
|
|
* Fix node selection for advanced reservations with features.
|
|
* mpi/pmix: Handle pipe failure better when using ucx.
|
|
* mpi/pmix: include PMIX_NODEID for each process entry.
|
|
* Fix job getting rejected after being requeued on same node that died.
|
|
* job_submit/lua - add "network" field.
|
|
* Fix situations when a reoccurring reservation could erroneously skip a
|
|
period.
|
|
* Ensure that a reservation's [pro|epi]log are run on reoccurring reservations.
|
|
* Fix threads-per-core memory allocation issue when using CR_CPU_MEMORY.
|
|
* Fix scheduling issue with --gpus.
|
|
* Fix gpu allocations that request --cpus-per-task.
|
|
* mpi/pmix: fixed print messages for all PMIXP_* macros
|
|
* Add mapping for XCPU to --signal option.
|
|
* Fix regression in 20.11 that prevented a full pass of the main scheduler
|
|
from ever executing.
|
|
* Work around a glibc bug in which "0" is incorrectly printed as "nan"
|
|
which will result in corrupted association state on restart.
|
|
* Fix regression in 20.11 which made slurmd incorrectly attempt to find the
|
|
parent slurmd address when not applicable and send incorrect reverse-tree
|
|
info to the slurmstepd.
|
|
* Fix cgroup ns detection when using containers (e.g. LXC or Docker).
|
|
* scrontab - change temporary file handling to work with emacs.
|
|
- Removed check-for-lipmix.so.MAJOR.patch
|
|
- Added: load-pmix-major-version.patch
|
|
|
|
|
|
-------------------------------------------------------------------
|
|
Wed Jan 20 10:13:23 UTC 2021 - Ana Guerrero Lopez <aguerrero@suse.com>
|
|
|
|
- Update to 20.11.03
|
|
- This release includes a major functional change to how job step launch is
|
|
handled compared to the previous 20.11 releases. This affects srun as
|
|
well as MPI stacks - such as Open MPI - which may use srun internally as
|
|
part of the process launch.
|
|
One of the changes made in the Slurm 20.11 release was to the semantics
|
|
for job steps launched through the 'srun' command. This also
|
|
inadvertently impacts many MPI releases that use srun underneath their
|
|
own mpiexec/mpirun command.
|
|
For 20.11.{0,1,2} releases, the default behavior for srun was changed
|
|
such that each step was allocated exactly what was requested by the
|
|
options given to srun, and did not have access to all resources assigned
|
|
to the job on the node by default. This change was equivalent to Slurm
|
|
setting the --exclusive option by default on all job steps. Job steps
|
|
desiring all resources on the node needed to explicitly request them
|
|
through the new '--whole' option.
|
|
In the 20.11.3 release, we have reverted to the 20.02 and older behavior
|
|
of assigning all resources on a node to the job step by default.
|
|
This reversion is a major behavioral change which we would not generally
|
|
do on a maintenance release, but is being done in the interest of
|
|
restoring compatibility with the large number of existing Open MPI (and
|
|
other MPI flavors) and job scripts that exist in production, and to
|
|
remove what has proven to be a significant hurdle in moving to the new
|
|
release.
|
|
Please note that one change to step launch remains - by default, in
|
|
20.11 steps are no longer permitted to overlap on the resources they
|
|
have been assigned. If that behavior is desired, all steps must
|
|
explicitly opt-in through the newly added '--overlap' option.
|
|
Further details and a full explanation of the issue can be found at:
|
|
https://bugs.schedmd.com/show_bug.cgi?id=10383#c63
|
|
- Other changes from 20.11.03
|
|
* Fix segfault when parsing bad "#SBATCH hetjob" directive.
|
|
* Allow countless gpu:<type> node GRES specifications in slurm.conf.
|
|
* PMIx - Don't set UCX_MEM_MMAP_RELOC for older version of UCX (pre 1.5).
|
|
* Don't green-light any GPU validation when core conversion fails.
|
|
* Allow updates to a reservation in the database that starts in the future.
|
|
* Better check/handling of primary key collision in reservation table.
|
|
* Improve reported error and logging in _build_node_list().
|
|
* Fix uninitialized variable in _rpc_file_bcast() which could lead to an
|
|
incorrect error return from sbcast / srun --bcast.
|
|
* mpi/cray_shasta - fix use-after-free on error in _multi_prog_parse().
|
|
* Cray - Handle setting correct prefix for cpuset cgroup with respect to
|
|
expected_usage_in_bytes. This fixes Cray's OOM killer.
|
|
* mpi/pmix: Fix PMIx_Abort support.
|
|
* Don't reject jobs allocating more cores than tasks with MaxMemPerCPU.
|
|
* Fix false error message complaining about oversubscribe in cons_tres.
|
|
* scrontab - fix parsing of empty lines.
|
|
* Fix regression causing spank_process_option errors to be ignored.
|
|
* Avoid making multiple interactive steps.
|
|
* Fix corner case issues where step creation should fail.
|
|
* Fix job rejection when --gres is less than --gpus.
|
|
* Fix regression causing spank prolog/epilog not to be called unless the
|
|
spank plugin was loaded in slurmd context.
|
|
* Fix regression preventing SLURM_HINT=nomultithread from being used
|
|
to set defaults for salloc->srun, sbatch->srun sequence.
|
|
* Reject job credential if non-superuser sets the LAUNCH_NO_ALLOC flag.
|
|
* Make it so srun --no-allocate works again.
|
|
* jobacct_gather/linux - Don't count memory on tasks that have already
|
|
finished.
|
|
* Fix 19.05/20.02 batch steps talking with a 20.11 slurmctld.
|
|
* jobacct_gather/common - Do not process jobacct's with same taskid when
|
|
calling prec_extra.
|
|
* Cleanup all tracked jobacct tasks when extern step child process finishes.
|
|
* slurmrestd/dbv0.0.36 - Correct structure of dbv0.0.36_tres_list.
|
|
* Fix regression causing task/affinity and task/cgroup to be out of sync when
|
|
configured ThreadsPerCore is different than the physical threads per core.
|
|
* Fix situation when --gpus is given but not max nodes (-N1-1) in a job
|
|
allocation.
|
|
* Interactive step - ignore cpu bind and mem bind options, and do not set
|
|
the associated environment variables which lead to unexpected behavior
|
|
from srun commands launched within the interactive step.
|
|
* Handle exit code from pipe when using UCX with PMIx.
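One step-launch change from this entry remains in force: in 20.11, steps no longer overlap on their assigned resources unless every step opts in with '--overlap'. The Python sketch below only builds an argument list to show where that option goes. `srun_argv` is a hypothetical helper, not a Slurm interface, and the command passed in is a placeholder. Only the srun options `-n` and `--overlap` are used.

```python
def srun_argv(command, ntasks=1, overlap=False):
    """Build an srun argument list for one job step.

    overlap=True adds --overlap, which the entry above says every step
    must pass explicitly if steps are to share their assigned resources.
    """
    argv = ["srun", "-n", str(ntasks)]
    if overlap:
        argv.append("--overlap")
    return argv + list(command)
```

A caller could pass the resulting list to `subprocess.run()` from inside an existing allocation. The sketch itself does not invoke srun.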
|
|
|
|
-------------------------------------------------------------------
|
|
Fri Jan 8 13:27:02 UTC 2021 - Egbert Eich <eich@suse.com>
|
|
|
|
- Fix fallout introduced by:
|
|
"Replace '%service_del_postun -n' with '%service_del_postun_without_restart'"
|
|
for older Leap/SLE versions.
|
|
|
|
-------------------------------------------------------------------
|
|
Fri Jan 8 12:20:27 UTC 2021 - Egbert Eich <eich@suse.com>
|
|
|
|
- Fix Provides:/Conflicts: for libnss_slurm (bsc#1180700).
|
|
|
|
-------------------------------------------------------------------
|
|
Tue Jan 5 08:02:02 UTC 2021 - Ana Guerrero Lopez <aguerrero@suse.com>
|
|
|
|
- Add support for configuration files from external plugins.
|
|
While built-in plugins have their configuration added in slurm.conf,
|
|
external SPANK plugins add their configuration to plugstack.conf
|
|
To allow easy packaging of SPANK plugins, their configuration files
|
|
should be added independently at /etc/slurm/plugstack.conf.d and
|
|
plugstack.conf should be left with a one-liner including all the
|
|
files under /etc/slurm/plugstack.conf.d
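The layout described in this entry can be sketched in Python as follows. This is a hedged illustration rather than the packaging code: `write_plugstack` is a hypothetical helper, the configuration directory is a parameter, not a fixed path, and the single line written uses the `include` keyword of plugstack.conf with a glob over the drop-in directory.

```python
from pathlib import Path


def write_plugstack(conf_dir):
    """Create plugstack.conf.d and a one-line plugstack.conf including it."""
    conf_dir = Path(conf_dir)
    dropin = conf_dir / "plugstack.conf.d"
    dropin.mkdir(parents=True, exist_ok=True)  # per-plugin files go here
    main = conf_dir / "plugstack.conf"
    # The one-liner: pull in every file from the drop-in directory.
    main.write_text(f"include {dropin}/*\n")
    return main
```

Each externally packaged SPANK plugin can then ship its own file in the drop-in directory without touching plugstack.conf.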
|
|
|
|
-------------------------------------------------------------------
|
|
Mon Dec 28 14:37:58 UTC 2020 - Ana Guerrero Lopez <aguerrero@suse.com>
|
|
|
|
- Update to 20.11.02
|
|
* Fix older versions of sacct not working with 20.11.
|
|
* Fix slurmctld crash when using a pre-20.11 srun in a job allocation.
|
|
* Correct logic problem in _validate_user_access.
|
|
* Fix libpmi to initialize Slurm configuration correctly.
|
|
|
|
- Update to 20.11.01
|
|
* Fix spelling of "overcomited" to "overcomitted" in sreport's cluster
|
|
utilization report.
|
|
* Silence debug message about shutting down backup controllers if none are
|
|
configured.
|
|
* Don't create interactive srun until PrologSlurmctld is done.
|
|
* Fix fd symlink path resolution.
|
|
* Fix slurmctld segfault on subnode reservation restore after node
|
|
configuration change.
|
|
* Fix resource allocation response message environment allocation size.
|
|
* Ensure that details->env_sup is NULL terminated.
|
|
* select/cray_aries - Correctly remove jobs/steps from blades using NPC.
|
|
* cons_tres - Avoid max_node_gres when entire node is allocated with
|
|
--ntasks-per-gpu.
|
|
* Allow NULL arg to data_get_type().
|
|
* In sreport have usage for a reservation contain all jobs that ran in the
|
|
reservation instead of just the ones that ran in the time specified. This
|
|
matches the reservation report, which is not truncated to a time period.
|
|
* Fix issue with sending wrong batch step id to a < 20.11 slurmd.
|
|
* Add a job's alloc_node to lua for job modification and completion.
|
|
* Fix regression getting a slurmdbd connection through the perl API.
|
|
* Stop the extern step terminate monitor right after proctrack_g_wait().
|
|
* Fix removing the normalized priority of assocs.
|
|
* slurmrestd/v0.0.36 - Use correct name for partition field:
|
|
"min nodes per job" -> "min_nodes_per_job".
|
|
* slurmrestd/v0.0.36 - Add node comment field.
|
|
* Fix regression marking cloud nodes as "unexpectedly rebooted" after
|
|
multiple boots.
|
|
* Fix slurmctld segfault in _slurm_rpc_job_step_create().
|
|
* slurmrestd/v0.0.36 - Filter node states against NODE_STATE_BASE to avoid
|
|
the extended states all being reported as "invalid".
|
|
* Fix race that can prevent the prolog for a requeued job from running.
|
|
* cli_filter - add "type" to readily distinguish between the CLI command in
|
|
use.
|
|
* smail - reduce sleep before seff to 5 seconds.
|
|
* Ensure SPANK prolog and epilog run without an explicit PlugStackConfig.
|
|
* Disable MySQL automatic reconnection.
|
|
* Fix allowing "b" after memory unit suffixes.
|
|
* Fix slurmctld segfault with reservations without licenses.
|
|
* Due to internal restructuring ahead of the 20.11 release, applications
|
|
calling libslurm MUST call slurm_init(NULL) before any API calls.
|
|
Otherwise the API call is likely to fail due to libslurm's internal
|
|
configuration not being available.
|
|
* slurm.spec - allow custom paths for PMIx and UCX install locations.
|
|
* Use rpath if enabled when testing for Mellanox's UCX libraries.
|
|
* slurmrestd/dbv0.0.36 - Change user query for associations to optional.
|
|
* slurmrestd/dbv0.0.36 - Change account query for associations to optional.
|
|
* mpi/pmix - change the error handler error message to be more useful.
|
|
* Add missing connection in acct_storage_p_{clear_stats, reconfig, shutdown}.
|
|
* Perl API - fix issue when running in configless mode.
|
|
* nss_slurm - avoid deadlock when stray sockets are found.
|
|
* Display correct value for ScronParameters in 'scontrol show config'
|
|
|
|
-------------------------------------------------------------------
|
|
Mon Nov 30 20:48:01 UTC 2020 - Egbert Eich <eich@suse.com>
|
|
|
|
- Update to version 20.11.0
|
|
Slurm 20.11 includes a number of new features including:
|
|
* Overhaul of the job step management and launch code, alongside improved
|
|
GPU task placement support.
|
|
* A new "Interactive Step" mode of operation for salloc.
|
|
* A new "scrontab" command that can be used to submit and manage
|
|
periodically repeating jobs.
|
|
* IPv6 support.
|
|
* Changes to the reservation logic, with new options allowing users
|
|
to delete reservations, allowing admins to skip the next occurrence of a
|
|
repeated reservation, and allowing for a job to be submitted and eligible
|
|
to run within multiple reservations.
|
|
* Dynamic Future Nodes - automatically associate a dynamically
|
|
provisioned (or "cloud") node against a NodeName definition with matching
|
|
hardware.
|
|
* An experimental new RPC queuing mode for slurmctld to reduce thread
|
|
contention on heavily loaded clusters.
|
|
* SlurmDBD integration with the Slurm REST API.
|
|
Also check
|
|
https://github.com/SchedMD/slurm/blob/slurm-20-11-0-1/RELEASE_NOTES
|
|
|
|
-------------------------------------------------------------------
|
|
Wed Nov 18 08:40:59 UTC 2020 - Ana Guerrero Lopez <aguerrero@suse.com>
|
|
|
|
- Updated to 20.02.6, addresses two security fixes:
|
|
* PMIx - fix potential buffer overflows from use of unpackmem().
|
|
CVE-2020-27745 (bsc#1178890)
|
|
* X11 forwarding - fix potential leak of the magic cookie when sent as an
|
|
argument to the xauth command. CVE-2020-27746 (bsc#1178891)
|
|
- And many other bugfixes, full log and details available at:
|
|
* https://lists.schedmd.com/pipermail/slurm-announce/2020/000045.html
|
|
|
|
-------------------------------------------------------------------
|
|
Tue Nov 3 14:31:02 UTC 2020 - Franck Bui <fbui@suse.com>
|
|
|
|
- Replace '%service_del_postun -n' with '%service_del_postun_without_restart'
|
|
|
|
'-n' is deprecated and will be removed in the future.
|
|
|
|
-------------------------------------------------------------------
|
|
Thu Oct 29 12:35:18 UTC 2020 - Ana Guerrero Lopez <aguerrero@suse.com>
|
|
|
|
- Updated to 20.02.5, changes:
|
|
* Fix leak of TRESRunMins when job time is changed with --time-min
|
|
* pam_slurm - explicitly initialize slurm config to support configless mode.
|
|
* scontrol - Fix exit code when creating/updating reservations with wrong
|
|
Flags.
|
|
* When a GRES has a no_consume flag, report 0 for allocated.
|
|
* Fix cgroup cleanup by jobacct_gather/cgroup.
|
|
* When creating reservations/jobs don't allow counts on a feature unless
|
|
using an XOR.
|
|
* Improve number of boards discovery
|
|
* Fix updating a reservation NodeCnt on a zero-count reservation.
|
|
* slurmrestd - provide an explicit error message when PSK auth fails.
|
|
* cons_tres - fix job requesting single gres per-node getting two or more
|
|
nodes with less CPUs than requested per-task.
|
|
* cons_tres - fix calculation of cores when using gres and cpus-per-task.
|
|
* cons_tres - fix job not getting access to socket without GPU or with less
|
|
than --gpus-per-socket when not enough cpus available on required socket
|
|
and not using --gres-flags=enforce-binding.
|
|
* Fix HDF5 type version build error.
|
|
* Fix creation of CoreCnt only reservations when the first node isn't
|
|
available.
|
|
* Fix wrong DBD Agent queue size in sdiag when using accounting_storage/none.
|
|
* Improve job constraints XOR option logic.
|
|
* Fix preemption of hetjobs when needed nodes not in leader component.
|
|
* Fix wrong bit_or() messing potential preemptor jobs node bitmap, causing
|
|
bad node deallocations and even allocation of nodes from other partitions.
|
|
* Fix double-deallocation of preempted non-leader hetjob components.
|
|
* slurmdbd - prevent truncation of the step nodelists over 4095.
|
|
* Fix nodes remaining in drain state after rebooting with ASAP option.
|
|
|
|
- changes from 20.02.4:
|
|
* srun - suppress job step creation warning message when waiting on
|
|
PrologSlurmctld.
|
|
* slurmrestd - fix incorrect return values in data_list_for_each() functions.
|
|
* mpi/pmix - fix issue where HetJobs could fail to launch.
|
|
* slurmrestd - set content-type header in responses.
|
|
* Fix cons_res GRES overallocation for --gres-flags=disable-binding.
|
|
* Fix cons_res incorrectly filtering cores with respect to GRES locality for
|
|
--gres-flags=disable-binding requests.
|
|
* Fix regression where a dependency on multiple jobs in a single array using
|
|
underscores would only add the first job.
|
|
* slurmrestd - fix corrupted output due to incorrect use of memcpy().
|
|
* slurmrestd - address a number of minor Coverity warnings.
|
|
* Handle retry failure when slurmstepd is communicating with srun correctly.
|
|
* Fix jobacct_gather possibly duplicate stats when _is_a_lwp error shows up.
|
|
* Fix tasks binding to GRES which are closest to the allocated CPUs.
|
|
* Fix AMD GPU ROCM 3.5 support.
|
|
* Fix handling of job arrays in sacct when querying specific steps.
|
|
* slurmrestd - avoid fallback to local socket authentication if JWT
|
|
authentication is ill-formed.
|
|
* slurmrestd - restrict ability of requests to use different authentication
|
|
plugins.
|
|
* slurmrestd - unlink named unix sockets before closing.
|
|
* slurmrestd - fix invalid formatting in openapi.json.
|
|
* Fix batch jobs stuck in CF state on FrontEnd mode.
|
|
* Add a separate explicit error message when rejecting changes to active node
|
|
features.
|
|
* cons_common/job_test - fix slurmctld SIGABRT due to double-free.
|
|
* Fix updating reservations to set the duration correctly if updating the
|
|
start time.
|
|
* Fix update reservation to promiscuous mode.
|
|
* Fix override of job tasks count to max when ntasks-per-node present.
|
|
* Fix min CPUs per node not being at least CPUs per task requested.
|
|
* Fix CPUs allocated to match CPUs requested when requesting GRES and
|
|
threads per core equal to one.
|
|
* Fix NodeName config parsing with Boards and without CPUs.
|
|
* Ensure SLURM_JOB_USER and SLURM_JOB_UID are set in SrunProlog/Epilog.
|
|
* Fix error messages for certain invalid salloc/sbatch/srun options.
|
|
* pmi2 - clean up sockets at step termination.
|
|
* Fix 'scontrol hold' to work with 'JobName'.
|
|
* sbatch - handle --uid/--gid in #SBATCH directives properly.
|
|
* Fix race condition in job termination on slurmd.
|
|
* Print specific error messages if trying to use certain
|
|
priority/multifactor factors that cannot work without SlurmDBD.
|
|
* Avoid partial GRES allocation when --gpus-per-job is not satisfied.
|
|
* Cray - Avoid referencing a variable outside of its correct scope when
|
|
dealing with creating steps within a het job.
|
|
* slurmrestd - correctly handle larger addresses from accept().
|
|
* Avoid freeing wrong pointer with SlurmctldParameters=max_dbd_msg_action
|
|
with another option after that.
|
|
* Restore MCS label when suspended job is resumed.
|
|
* Fix insufficient lock levels.
|
|
* slurmrestd - use errno from job submission.
|
|
* Fix "user" filter for sacctmgr show transactions.
|
|
* Fix preemption logic.
|
|
* Fix no_consume GRES for exclusive (whole node) requests.
|
|
* Fix regression in 20.02 that caused an infinite loop in slurmctld when
|
|
requesting --distribution=plane for the job.
|
|
* Fix parsing of the --distribution option.
|
|
* Add CONF READ_LOCK to _handle_fed_send_job_sync.
|
|
* prep/script - always call slurmctld PrEp callback in _run_script().
|
|
* Fix node estimation for jobs that use GPUs or --cpus-per-task.
|
|
* Fix jobcomp, job_submit and cli_filter Lua implementation plugins causing
|
|
slurmctld and/or job submission CLI tools segfaults due to bad return
|
|
handling when the respective Lua script failed to load.
|
|
* Fix propagation of gpu options through hetjob components.
|
|
* Add SLURM_CLUSTERS environment variable to scancel.
|
|
* Fix packing/unpacking of "unlinked" jobs.
|
|
* Connect slurmstepd's stderr to srun for steps launched with --pty.
|
|
* Handle MPS correctly when doing exclusive allocations.
|
|
* slurmrestd - fix compiling against libhttpparser in a non-default path.
|
|
* slurmrestd - avoid compilation issues with libhttpparser < 2.6.
|
|
* Fix compile issues when compiling slurmrestd without --enable-debug.
|
|
* Reset idle time on a reservation that is getting purged.
|
|
* Fix reoccurring reservations that have Purge_comp= to keep correct
|
|
duration if they are purged.
|
|
* scontrol - changed the "PROMISCUOUS" flag to "MAGNETIC"
|
|
* Early return from epilog_set_env in case of no_consume.
|
|
* Fix cons_common/job_test start time discovery logic to prevent skewed
|
|
results between "will run test" executions.
|
|
* Ensure TRESRunMins limits are maintained during "scontrol reconfigure".
|
|
* Improve error message when host lookup fails.
|
|
|
|
- Refresh patch: pam_slurm-Initialize-arrays-and-pass-sizes.patch
|
|
|
|
-------------------------------------------------------------------
|
|
Tue Jul 7 09:05:40 UTC 2020 - Egbert Eich <eich@suse.com>
|
|
|
|
- Add support for openPMIx also for Leap/SLE 15.0/1 (bsc#1173805).
|
|
- Do not run %check on SLE-12-SP2: Some incompatibility in tcl
|
|
makes this fail.
|
|
- Remove unneeded build dependency to postgresql-devel.
|
|
- Disable build on s390 (requires 64bit).
|
|
|
|
-------------------------------------------------------------------
|
|
Wed Jun 3 11:11:11 UTC 2020 - Egbert Eich <eich@suse.com>
|
|
|
|
- Bring QA to the package build: add %%check stage.
|
|
- Remove cruft that isn't needed any longer.
|
|
- Add 'ghosted' run-file.
|
|
- Add rpmlint filter to handle issues with library packages
|
|
for Leap and enterprise upgrade versions.
|
|
|
|
-------------------------------------------------------------------
|
|
Fri May 22 08:45:46 UTC 2020 - Christian Goll <cgoll@suse.com>
|
|
|
|
- Updated to 20.02.3 which fixes CVE-2020-12693 (bsc#1172004).
|
|
- Other changes are:
|
|
* Factor in ntasks-per-core=1 with cons_tres.
|
|
* Fix formatting in error message in cons_tres.
|
|
* Fix calling stat on a NULL variable.
|
|
* Fix minor memory leak when using reservations with flags=first_cores.
|
|
* Fix gpu bind issue when CPUs=Cores and ThreadsPerCore > 1 on a node.
|
|
* Fix --mem-per-gpu for heterogeneous --gres requests.
|
|
* Fix slurmctld load order in load_all_part_state().
|
|
* Fix race condition not finding jobacct gather task cgroup entry.
|
|
* Suppress error message when selecting nodes on disjoint topologies.
|
|
* Improve performance of _pack_default_job_details() with large number of job
arguments.
|
|
* Fix archive loading previous to 17.11 jobs per-node req_mem.
|
|
* Fix regression validating that --gpus-per-socket requires --sockets-per-node
for steps. Should only validate allocation requests.
|
|
* error() instead of fatal() when parsing an invalid hostlist.
|
|
* nss_slurm - fix potential deadlock in slurmstepd on overloaded systems.
|
|
* cons_tres - fix --gres-flags=enforce-binding and related --cpus-per-gres.
|
|
* cons_tres - Allocate lowest numbered cores when filtering cores with gres.
|
|
* Fix getting system counts for named GRES/TRES.
|
|
* MySQL - Fix for handling typed GRES for association rollups.
|
|
* Fix step allocations when tasks_per_core > 1.
|
|
* Fix allocating more GRES than requested when asking for multiple GRES types.
|
|
|
|
-------------------------------------------------------------------
|
|
Wed May 6 10:54:43 UTC 2020 - Egbert Eich <eich@suse.com>
|
|
|
|
- Treat libnss_slurm like any other package: add version string to
|
|
upgrade package.
|
|
|
|
-------------------------------------------------------------------
|
|
Fri Mar 27 08:26:34 UTC 2020 - Christian Goll <cgoll@suse.com>
|
|
|
|
- Updated to 20.02.1 with following changes:
|
|
* Improve job state reason for jobs hitting partition_job_depth.
|
|
* Speed up testing of singleton dependencies.
|
|
* Fix negative loop bound in cons_tres.
|
|
* srun - capture the MPI plugin return code from mpi_hook_client_fini() and
|
|
use as final return code for step failure.
|
|
* Fix segfault in cli_filter/lua.
|
|
* Fix --gpu-bind=map_gpu reusability if tasks > elements.
|
|
* Make sure config_flags on a gres are sent to the slurmctld on node
|
|
registration.
|
|
* Prolog/Epilog - Fix missing GPU information.
|
|
* Fix segfault when using config parser for expanded lines.
|
|
* Fix bit overlap test function.
|
|
* Don't accrue time if job begin time is in the future.
|
|
* Remove accrue time when updating a job start/eligible time to the future.
|
|
* Fix regression in 20.02.0 that broke --depend=expand.
|
|
* Reset begin time on job release if it's not in the future.
|
|
* Fix for recovering burst buffers when using high-availability.
|
|
* Fix invalid read due to freeing an incorrectly allocated env array.
|
|
* Update slurmctld -i message to warn about losing data.
|
|
* Fix scontrol cancel_reboot so it clears the DRAIN flag and node reason for a
|
|
pending ASAP reboot.
|
|
|
|
-------------------------------------------------------------------
|
|
Sun Mar 8 15:43:25 UTC 2020 - Egbert Eich <eich@suse.com>
|
|
|
|
- Remove legacy_cray: with 20.02 the special treatment for
|
|
cray-specific plugins on SLE version prior to 15SP2 is
|
|
no longer required.
|
|
|
|
-------------------------------------------------------------------
|
|
Wed Mar 4 13:05:07 UTC 2020 - Christian Goll <cgoll@suse.com>
|
|
|
|
- slurm-plugins will now also require pmix not only libpmix
|
|
(bsc#1164326)
|
|
|
|
-------------------------------------------------------------------
|
|
Fri Feb 28 17:27:43 UTC 2020 - Egbert Eich <eich@suse.com>
|
|
|
|
- Removed autopatch as it doesn't work for the SLE-11-SP4 build.
|
|
|
|
-------------------------------------------------------------------
|
|
Thu Feb 27 20:07:19 UTC 2020 - Kasimir _ <kasimir_@outlook.de>
|
|
|
|
- Disable %arm builds as this is no longer supported.
|
|
|
|
-------------------------------------------------------------------
|
|
Thu Feb 27 10:19:05 UTC 2020 - Christian Goll <cgoll@suse.com>
|
|
|
|
- pmix searches now also for libpmix.so.2 so that there is no dependency
|
|
for devel package (bsc#1164386)
|
|
* added patch file check-for-lipmix.so.MAJOR.patch
|
|
* reworded patch file Remove-rpath-from-build.patch to use %autopatch
|
|
|
|
-------------------------------------------------------------------
|
|
Wed Feb 26 06:13:13 UTC 2020 - Egbert Eich <eich@suse.com>
|
|
|
|
- Update to version 20.02.0 (jsc#SLE-8491)
|
|
* Fix minor memory leak in slurmd on reconfig.
|
|
* Fix invalid ptr reference when rolling up data in the database.
|
|
* Change shtml2html.py to require python3 for RHEL8 support, and match
|
|
man2html.py.
|
|
* slurm.spec - override "hardening" linker flags to ensure RHEL8 builds
|
|
in a usable manner.
|
|
* Fix type mismatches in the perl API.
|
|
* Prevent use of uninitialized slurmctld_diag_stats.
|
|
* Fixed various Coverity issues.
|
|
* Only show warning about root-less topology in daemons.
|
|
* Fix accounting of jobs in IGNORE_JOBS reservations.
|
|
* Fix issue with batch steps state not loading correctly when upgrading from
|
|
19.05.
|
|
* Deprecate max_depend_depth in SchedulerParameters and move it to
|
|
DependencyParameters.
|
|
* Silence erroneous error on slurmctld upgrade when loading federation state.
|
|
* Break infinite loop in cons_tres dealing with incorrect tasks per tres
|
|
request resulting in slurmctld hang.
|
|
* Improve handling of --gpus-per-task to make sure appropriate number of GPUs
|
|
is assigned to job.
|
|
* Fix seg fault on cons_res when requesting --spread-job.
|
|
- Move to python3 for everything but SLE-11-SP4
|
|
* For SLE-11-SP4 add a workaround to handle a python3 script (python2.7
|
|
compliant).
|
|
|
|
-------------------------------------------------------------------
|
|
Wed Feb 19 21:27:00 UTC 2020 - Egbert Eich <eich@suse.com>
|
|
|
|
- Add explicit version dependency to libpmix as well.
|
|
'slurm-devel' has a tight version dependency on libpmix -
|
|
allowing multiple libpmix versions in one package repository
|
|
is therefore essential.
|
|
|
|
-------------------------------------------------------------------
|
|
Thu Feb 13 22:34:48 UTC 2020 - Egbert Eich <eich@suse.com>
|
|
|
|
- Update to version 20.02.0-rc1
|
|
* sbatch - fix segfault when no newline at the end of a burst buffer file.
|
|
* Change scancel to only check job's base state when matching -t options.
|
|
* Save job dependency list in state files.
|
|
* cons_tres - allow jobs to be run on systems with root-less topologies.
|
|
* Restore pre-20.02pre1 PrologSlurmctld synchronization behavior to avoid
|
|
various race conditions, and ensure proper batch job launch.
|
|
* Add new slurmrestd command/daemon which implements the Slurm REST API.
|
|
|
|
-------------------------------------------------------------------
|
|
Tue Feb 11 10:09:43 UTC 2020 - Christian Goll <cgoll@suse.com>
|
|
|
|
- Update to version 20.02.0-0pre1.
|
|
Highlights:
|
|
* Exclusive behavior of a node includes all GRES on a node as well
|
|
as the cpus.
|
|
* Use python3 instead of python for internal build/test scripts.
|
|
The slurm.spec file has been updated to depend on python3 as well.
|
|
* Added new NodeSet configuration option to help simplify partition
|
|
configuration sections for heterogeneous / condo-style clusters.
|
|
* Added slurm.conf option MaxDBDMsgs to control how many messages will be
|
|
stored in the slurmctld before throwing them away when the slurmdbd is down.
|
|
* The checkpoint plugin interface and all associated API calls have been
|
|
removed.
|
|
* slurm_init_job_desc_msg() initializes mail_type as uint16_t. This allows
|
|
mail_type to be set to NONE with scontrol.
|
|
* Add new slurm_spank_log() function to print messages back to the user from
|
|
within a SPANK plugin without prepending "error: " from slurm_error().
|
|
* Enforce having partition name and nodelist=ALL when creating reservations
|
|
with flags=PART_NODES.
|
|
* SPANK - removed never-implemented slurm_spank_slurmd_init() interface. This
|
|
hook has always been accessible through slurm_spank_init() in the
|
|
S_CTX_SLURMD context instead.
|
|
* sbcast - add new BcastAddr option to NodeName lines to allow sbcast traffic
|
|
to flow over an alternate network path.
|
|
* Added auth/jwt plugin, and 'scontrol token' subcommand.
* PMIx - improve performance of proc map generation.
* Deprecate kill_invalid_depend in SchedulerParameters and move it to a new
option called DependencyParameters.
|
|
* Enable job dependencies for any job on any cluster in the same federation.
|
|
* Allow clusters to be added automatically to db at startup of ctld.
* Add AccountingStorageExternalHost slurm.conf parameter.
* The "ConditionPathExists" condition in slurmd.service has been disabled by
default to permit simpler installation of a "configless" Slurm cluster.
|
|
* In SchedulerParameters remove deprecated max_job_bf and replace with
|
|
bf_max_job_test.
|
|
* Disable sbatch, salloc, srun --reboot for non-admins.
* SPANK - added support for S_JOB_GID in the job script context with
spank_get_item().
|
|
* Prolog/Epilog - add SLURM_JOB_GID environment variable.
|
|
Configuration file changes:
|
|
* The mpi/openmpi plugin has been removed as it does nothing.
|
|
MpiDefault=openmpi will be translated to the functionally-equivalent
|
|
MpiDefault=none.
|
|
Command changes (see man pages for details):
|
|
* Display StepId=<jobid>.batch instead of StepId=<jobid>.4294967294 in output
|
|
of "scontrol show step". (slurm_sprint_job_step_info())
|
|
* MPMD in srun will now defer PATH resolution for the commands to launch to
|
|
slurmstepd. Previously it would handle resolution client-side, but with
|
|
a non-standard approach that walked PATH in reverse.
|
|
* squeue - added "--me" option, equivalent to --user=$USER.
|
|
* The LicensesUsed line has been removed from 'scontrol show config'.
|
|
Please see the 'scontrol show licenses' command as an alternative.
|
|
* sbatch - adjusted backoff times for "--wait" option to reduce load on
|
|
slurmctld. This results in a steady-state delay of 32s between queries,
|
|
instead of the prior 10s delay.
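The sbatch "--wait" backoff described above can be pictured with a small
sketch. The entry only states the steady-state delay of 32s; the doubling
schedule, the 1s starting value and the helper name next_wait_delay() are
assumptions made for illustration and are not taken from the Slurm sources.

```c
/* Hypothetical helper, not part of Slurm: returns the next polling
 * delay in seconds for a "--wait"-style query loop.
 * Assumption: the delay doubles after every query and is capped at
 * 32 s, the steady-state value named in the changelog entry. */
#define MAX_WAIT_DELAY_SEC 32u

unsigned int next_wait_delay(unsigned int current_sec)
{
	if (current_sec == 0)
		return 1; /* assumed first delay */

	/* Compare before doubling so the multiplication cannot overflow. */
	if (current_sec >= MAX_WAIT_DELAY_SEC / 2)
		return MAX_WAIT_DELAY_SEC;

	return current_sec * 2;
}
```

Starting from 0 and feeding each result back in, the helper settles at 32
and stays there, so a long-running job is polled far less often than with a
fixed 10s interval.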
|
|
- Removed following deprecated patches:
|
|
* removed patch slurmctld-rerun-agent_init-when-backup-controller-takes-over.patch
|
|
* removed patch split-xdaemon-in-xdaemon_init-and-xdaemon_finish-for.patch
|
|
* removed patch slurmctld-uses-xdaemon_-for-systemd.patch
|
|
* removed patch slurmd-uses-xdaemon_-for-systemd.patch
|
|
* removed patch slurmdbd-uses-xdaemon_-for-systemd.patch
|
|
* removed patch slurmsmwd-uses-xdaemon_-for-systemd.patch
|
|
* removed patch removed-deprecated-xdaemon.patch
|
|
|
|
-------------------------------------------------------------------
|
|
Wed Feb 5 15:37:05 UTC 2020 - Christian Goll <cgoll@suse.com>
|
|
|
|
- standard slurm.conf uses now also SlurmctldHost on all build
|
|
targets (bsc#1162377)
|
|
|
|
-------------------------------------------------------------------
|
|
Mon Jan 27 08:42:55 UTC 2020 - Egbert Eich <eich@suse.com>
|
|
|
|
- Fix a missed systemd_requires -> systemd_ordering conversion.
|
|
|
|
-------------------------------------------------------------------
|
|
Fri Jan 24 17:31:18 UTC 2020 - Egbert Eich <eich@suse.com>
|
|
|
|
- Remove special OHPC compatibility macro: these settings should
|
|
be applied universally.
|
|
- Add a Recommends for mariadb to slurm-slurmdbd: it is recommended
|
|
to run the database on the same machine as the daemon.
|
|
|
|
-------------------------------------------------------------------
|
|
Fri Jan 24 11:47:58 UTC 2020 - Dominique Leuenberger <dimstar@opensuse.org>
|
|
|
|
- BuildRequire pkgconfig(systemd) instead of systemd: allow OBS to
|
|
shortcut through the -mini flavors.
|
|
- Use systemd_ordering instead of systemd_requires: systemd is
|
|
never a strict requirement; but in case the system is scheduled
|
|
for installation together with systemd, we want systemd to be
|
|
installed prior to slurm.
|
|
|
|
-------------------------------------------------------------------
|
|
Thu Jan 23 17:44:29 UTC 2020 - Christian Goll <cgoll@suse.com>
|
|
|
|
- start slurmdbd after mariadb (bsc#1161716)
|
|
|
|
-------------------------------------------------------------------
|
|
Mon Jan 13 15:41:48 UTC 2020 - Egbert Eich <eich@suse.com>
|
|
|
|
- Fix base_ver for SLE 15 SP2.
|
|
|
|
-------------------------------------------------------------------
|
|
Wed Jan 8 20:01:19 UTC 2020 - Egbert Eich <eich@suse.com>
|
|
|
|
- Update to version 19.05.5 (jsc#SLE-8491)
|
|
* Check %docdir/NEWS for details.
|
|
* Includes security fixes CVE-2019-19727, CVE-2019-19728,
|
|
CVE-2019-12838.
|
|
* Disable i586 builds as this is no longer supported.
|
|
* Create libnss_slurm package to support user and group resolution
|
|
thru slurmstepd.
|
|
* slurm-2.4.4-rpath.patch -> Remove-rpath-from-build.patch
|
|
Obsoleted:
|
|
- pam_slurm_adopt-avoid-running-outside-of-the-sshd-PA.patch
|
|
- pam_slurm_adopt-send_user_msg-don-t-copy-undefined-d.patch
|
|
- pam_slurm_adopt-use-uid-to-determine-whether-root-is.patch
|
|
|
|
-------------------------------------------------------------------
|
|
Thu Jan 2 09:14:56 UTC 2020 - Egbert Eich <eich@suse.com>
|
|
|
|
- Deprecate "ControlMachine" only for SLURM version upgrades and
|
|
products newer than 1501. This ensures that the original setting
|
|
is retained for the SLURM version shipped originally with SLE-15-SP1
|
|
or Leap 15.1.
|
|
|
|
-------------------------------------------------------------------
|
|
Sat Dec 21 09:07:42 UTC 2019 - Egbert Eich <eich@suse.com>
|
|
|
|
- Update to v18.08.9 for fixing CVE-2019-19728 (bsc#1159692).
|
|
* Wrap END_TIMER{,2,3} macro definition in "do {} while (0)" block.
|
|
* Make sview work with glib2 v2.62.
|
|
* Make Slurm compile on linux after sys/sysctl.h was deprecated.
|
|
* Install slurmdbd.conf.example with 0600 permissions to encourage secure
|
|
use. CVE-2019-19727.
|
|
* srun - do not continue with job launch if --uid fails. CVE-2019-19728.
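As background to the END_TIMER change listed above, a minimal sketch of why
multi-statement macros are usually wrapped in "do {} while (0)". The macro
LOG_AND_COUNT and the function count_if_enabled() are invented for this
illustration and do not come from Slurm.

```c
#include <stdio.h>

/* Hypothetical multi-statement macro, not taken from Slurm.
 * The do { } while (0) wrapper turns the expansion into a single
 * statement, so it stays intact under an unbraced if/else and the
 * caller's trailing semicolon does not detach the else branch. */
#define LOG_AND_COUNT(counter, msg)     \
	do {                            \
		fputs((msg), stderr);   \
		(counter)++;            \
	} while (0)

/* Returns 1 when the macro body ran, -1 when the else branch ran. */
int count_if_enabled(int enabled)
{
	int hits = 0;

	if (enabled)
		LOG_AND_COUNT(hits, "timer stopped\n");
	else
		hits = -1; /* still pairs with the if above */

	return hits;
}
```

With a bare "{ ... }" body instead of the wrapper, the semicolon after the
macro call would end the if statement and the else would fail to compile.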
|
|
|
|
-------------------------------------------------------------------
|
|
Wed Dec 11 18:23:46 UTC 2019 - Christian Goll <cgoll@suse.com>
|
|
|
|
- added pmix support (jsc#SLE-10800)
|
|
|
|
-------------------------------------------------------------------
|
|
Sun Dec 8 11:33:42 UTC 2019 - Egbert Eich <eich@suse.com>
|
|
|
|
- Use --with-shared-libslurm to build slurm binaries using libslurm.
|
|
- Make libslurm depend on slurm-config.
|
|
|
|
-------------------------------------------------------------------
|
|
Fri Dec 6 17:06:32 UTC 2019 - Egbert Eich <eich@suse.com>
|
|
|
|
- Fix ownership of /var/spool/slurm on new installations
|
|
and upgrade (boo#1158696).
|
|
|
|
-------------------------------------------------------------------
|
|
Thu Oct 31 10:18:21 UTC 2019 - Egbert Eich <eich@suse.com>
|
|
|
|
- Fix permissions of slurmdbd.conf (bsc#1155784, CVE-2019-19727).
|
|
- Fix %posttrans macro _res_update to cope with added newline
|
|
(bsc#1153259).
|
|
|
|
-------------------------------------------------------------------
|
|
Mon Oct 21 15:54:43 UTC 2019 - Egbert Eich <eich@suse.com>
|
|
|
|
- Add package slurm-webdoc which sets up a web server to provide
|
|
the documentation for the version shipped.
|
|
|
|
-------------------------------------------------------------------
|
|
Mon Oct 7 15:39:43 UTC 2019 - Egbert Eich <eich@suse.com>
|
|
|
|
- Move srun from 'slurm' to 'slurm-node': srun is required on the
|
|
nodes as well so sbatch will work. 'slurm-node' is a requirement
|
|
when 'slurm' is installed (bsc#1153095).
|
|
|
|
-------------------------------------------------------------------
|
|
Wed Oct 2 08:26:02 UTC 2019 - Egbert Eich <eich@suse.com>
|
|
|
|
- Set %base_ver for SLE-15-SP2 to 18.08 (for now).
|
|
|
|
-------------------------------------------------------------------
|
|
Wed Sep 11 10:55:25 UTC 2019 - Egbert Eich <eich@suse.com>
|
|
|
|
- Edit sample configuration to deprecate "ControlMachine",
"ControlAddr", "BackupController" and "BackupAddr" in favor of
"SlurmctldHost".
|
|
|
|
-------------------------------------------------------------------
|
|
Sat Aug 17 14:20:35 UTC 2019 - Egbert Eich <eich@suse.com>
|
|
|
|
- Fix logic of slurm-munge recommends: slurm-munge requires munge
|
|
already, so if we have munge installed we recommend slurm-munge
|
|
as the authentication when installing slurm or slurm-node.
|
|
|
|
-------------------------------------------------------------------
|
|
Sun Jul 14 13:28:13 UTC 2019 - Egbert Eich <eich@suse.com>
|
|
|
|
- Fix build for SLE-11-SP4 and older.
|
|
|
|
-------------------------------------------------------------------
|
|
Fri Jul 12 09:04:55 UTC 2019 - Christian Goll <cgoll@suse.com>
|
|
|
|
- added cray-dependent libraries to a separate package, as they are now
|
|
built, since json is enabled
|
|
|
|
-------------------------------------------------------------------
|
|
Thu Jul 11 10:57:52 UTC 2019 - Christian Goll <cgoll@suse.com>
|
|
|
|
- Updated to 18.08.8 for fixing (CVE-2019-12838, bsc#1140709, jsc#SLE-7341,
|
|
jsc#SLE-7342)
|
|
* Update "xauth list" to use the same 10000ms timeout as the other xauth
|
|
commands.
|
|
* Fix issue in gres code to handle a gres cnt of 0.
|
|
* Don't purge jobs if backfill is running.
|
|
* Verify job is pending when adding/removing accrual time.
|
|
* Don't abort when the job doesn't have an association that was removed
|
|
before the job was able to make it to the database.
|
|
* Set state_reason if select_nodes() fails job for QOS or Account.
|
|
* Avoid seg_fault on referencing association without a valid_qos bitmap.
|
|
* If Association/QOS is removed on a pending job set that job as ineligible.
|
|
* When changing a job's account/qos always make sure you remove the old limits.
|
|
* Don't reset a FAIL_QOS or FAIL_ACCOUNT job reason until the qos or
|
|
account changed.
|
|
* Restore "sreport -T ALL" functionality.
|
|
* Correctly typecast signals being sent through the api.
|
|
* Properly initialize structures throughout Slurm.
|
|
* Sync "numtask" squeue format option for jobs and steps to "numtasks".
|
|
* Fix sacct -PD to avoid CA before start jobs.
|
|
* Fix potential deadlock with backup slurmctld.
|
|
* Fixed issue with jobs not appearing in sacct after dependency satisfied.
|
|
* Fix showing non-eligible jobs when asking with -j and not -s.
|
|
* Fix issue with backfill scheduler scheduling tasks of an array
|
|
when not the head job.
|
|
* accounting_storage/mysql - fix SIGABRT in the archive load logic.
|
|
* accounting_storage/mysql - fix memory leak in the archive load logic.
|
|
* Limit records per single SQL statement when loading archived data.
|
|
* Fix unnecessary reloading of job submit plugins.
|
|
* Allow job submit plugins to be turned on/off with a reconfigure.
|
|
* Fix segfault when loading/unloading Lua job submit plugin multiple times.
|
|
* Fix printing duplicate error messages of jobs rejected by job submit plugin.
|
|
* Fix printing of job submit plugin messages of het jobs without pack id.
|
|
* Fix memory leak in group_cache.c
|
|
* Fix jobs stuck from FedJobLock when requeueing in a federation
|
|
* Fix requeueing job in a federation of clusters with differing associations
|
|
* sacctmgr - free memory before exiting in 'sacctmgr show runaway'.
|
|
* Fix seff showing memory overflow when steps tres mem usage is 0.
|
|
* Upon archive file name collision, create new archive file instead of
|
|
overwriting the old one to prevent lost records.
|
|
* Limit archive files to 50000 records per file so that archiving large
|
|
databases will succeed.
|
|
* Remove stray newlines in SPANK plugin error messages.
|
|
* Fix archive loading events.
|
|
* In select/cons_res: Only allocate 1 CPU per node with the --overcommit and
|
|
--nodelist options.
|
|
* Fix main scheduler from potentially not running through whole queue.
|
|
* cons_res/job_test - prevent a job from overallocating a node memory.
|
|
* cons_res/job_test - fix to consider a node's current allocated memory when
|
|
testing a job's memory request.
|
|
* Fix issue where multi-node job steps on cloud nodes wouldn't finish cleaning
|
|
up until the end of the job (rather than the end of the step).
|
|
* Fix issue with a 17.11 sbcast call to a 18.08 daemon.
|
|
* Add new job bit_flags of JOB_DEPENDENT.
|
|
* Make it so dependent jobs reset the AccrueTime and do not count against any
|
|
AccrueTime limits.
|
|
* Fix sacctmgr --parsable2 output for reservations and tres.
|
|
* Prevent slurmctld from potential segfault after job_start_data() called
|
|
for completing job.
|
|
* Fix jobs getting on nodes with "scontrol reboot asap".
|
|
* Record node reboot events to database.
|
|
* Fix node reboot failure message getting to event table.
|
|
* Don't write "(null)" to event table when no event reason exists.
|
|
* Fix minor memory leak when clearing runaway jobs.
|
|
* Avoid flooding slurmctld and logging when prolog complete RPC errors occur.
|
|
* Fix GCC 9 compiler warnings.
|
|
* Fix seff human readable memory string for values below a megabyte.
|
|
* Fix dump/load of rejected heterogeneous jobs.
|
|
* For heterogeneous jobs, do not count each component against the QOS or
|
|
association job limit multiple times.
|
|
* slurmdbd - avoid reservation flag column corruption with the use of newer
|
|
flags, instead preserve the older flag fields that we can still fit in the
|
|
smallint field, and discard the rest.
|
|
* Fix security issue in accounting_storage/mysql plugin on archive file loads
|
|
by always escaping strings within the slurmdbd. CVE-2019-12838.
|
|
|
|
|
|
-------------------------------------------------------------------
|
|
Mon Jul 8 08:19:23 UTC 2019 - Egbert Eich <eich@suse.com>
|
|
|
|
- Fix build dependency issue around libibmad-devel introduced
|
|
in SLE-12-SP4.
|
|
|
|
-------------------------------------------------------------------
|
|
Mon Jul 8 05:41:11 UTC 2019 - Egbert Eich <eich@suse.com>
|
|
|
|
- Add BuildRequires to address warnings during build:
|
|
* for libcurl-devel, libssh2-devel and rrdtool-devel
|
|
* for libjson-c-devel and liblz4-devel where available,
|
|
disable these with --without-json and --without-lz4
|
|
where not.
|
|
* disable DataWarp (--without-datawarp).
|
|
|
|
-------------------------------------------------------------------
|
|
Sat Jul 6 20:07:53 UTC 2019 - Egbert Eich <eich@suse.com>
|
|
|
|
- Update SLURM to 18.08.7:
|
|
* Set debug statement to debug2 to avoid benign error messages.
|
|
* Add SchedulerParameters option of bf_hetjob_immediate to attempt to start
|
|
a heterogeneous job as soon as all of its components are determined able
|
|
to do so.
|
|
* Fix underflow causing decay thread to exit.
|
|
* Fix main scheduler not considering hetjobs when building the job queue.
|
|
* Fix regression for sacct to display old jobs without a start time.
|
|
* Fix setting correct number of gres topology bits.
|
|
* Update hetjobs pending state reason when appropriate.
|
|
* Fix accounting_storage/filetxt's understanding of TRES.
|
|
* Set Accrue time when not enforcing limits.
|
|
* Fix srun segfault when requesting a hetjob with test_exec or bcast
|
|
options.
|
|
* Hide multipart priorities log message behind Priority debug flag.
|
|
* sched/backfill - Make hetjobs sensitive to bf_max_job_start.
|
|
* Fix slurmctld segfault due to job's partition pointer NULL dereference.
|
|
* Fix issue with OR'ed job dependencies.
|
|
* Add new job's bit_flags of INVALID_DEPEND to prevent rebuilding a job's
|
|
dependency string when it has at least one invalid and purged dependency.
|
|
* Promote federation unsynced siblings log message from debug to info.
|
|
* burst_buffer/cray - fix slurmctld SIGABRT due to illegal read/writes.
|
|
* burst_buffer/cray - fix memory leak due to unfreed job script content.
|
|
* node_features/knl_cray - fix script_argv use-after-free.
|
|
* burst_buffer/cray - fix script_argv use-after-free.
|
|
* Fix invalid reads of size 1 due to non null-terminated string reads.
|
|
* Add extra debug2 logs to identify why BadConstraints reason is set.
|
|
|
|
-------------------------------------------------------------------
|
|
Sat Jul 6 18:05:33 UTC 2019 - Egbert Eich <eich@suse.com>
|
|
|
|
- Do not build hdf5 support where not available.
|
|
|
|
-------------------------------------------------------------------
|
|
Sat Jul 6 11:21:08 UTC 2019 - Egbert Eich <eich@suse.com>
|
|
|
|
- Add support for version updates on SLE: Update packages to a
|
|
later version than the version supported originally on SLE
|
|
will receive a version string in their package name.
|
|
|
|
-------------------------------------------------------------------
|
|
Wed Feb 27 11:06:10 UTC 2019 - Christian Goll <cgoll@suse.com>
|
|
|
|
- added the hdf5 job data gathering plugin
|
|
|
|
-------------------------------------------------------------------
|
|
Fri Feb 1 19:27:10 UTC 2019 - eich@suse.com
|
|
|
|
- Add backward compatibility with SLE-11 SP4
|
|
|
|
-------------------------------------------------------------------
|
|
Thu Jan 31 20:30:32 UTC 2019 - eich@suse.com
|
|
|
|
- Update to version 18.08.05-2:
|
|
This version obsoletes:
|
|
Fix-contrib-perlapi-to-build-with-the-fix-for-CVE-2019-6438-750cc23ed.patch
|
|
- Fix spec file for older SUSE versions.
|
|
|
|
-------------------------------------------------------------------
|
|
Thu Jan 31 09:00:06 UTC 2019 - eich@suse.com
|
|
|
|
- Update to version 18.08.05:
|
|
* Add mitigation for a potential heap overflow on 32-bit systems in xmalloc.
|
|
(CVE-2019-6438, bsc#1123304).
|
|
* Other fixes:
|
|
+ Backfill - If a job has a time_limit guess the end time of a job better
|
|
if OverTimeLimit is Unlimited.
|
|
+ Fix "sacctmgr show events event=cluster"
|
|
+ Fix sacctmgr show runawayjobs from sibling cluster
|
|
+ Avoid bit offset of -1 in call to bit_nclear().
|
|
+ Ensure that "hbm" is a configured GresType on knl systems.
|
|
+ Fix NodeFeaturesPlugins=node_features/knl_generic to allow gres
|
|
other than knl.
|
|
+ cons_res: Prevent overflow on multiply.
|
|
+ Better debug for bad values in gres.conf.
|
|
+ Fix double accounting of energy at end of job.
|
|
+ Read gres.conf for cloud nodes on slurmctld.
|
|
+ Don't assume the first node of a job is the batch host when purging jobs
|
|
from a node.
|
|
+ Better debugging when a job doesn't have a job_resrcs ptr.
|
|
+ Store ave watts in energy plugins.
|
|
+ Add XCC plugin for reading Lenovo Power.
|
|
+ Fix minor memory leak when scheduling rebootable nodes.
|
|
+ Fix debug2 prefix for sched log.
|
|
+ Fix printing correct SLURM_JOB_ACCOUNT_PACK_GROUP_* in env for a Het Job.
|
|
+ sbatch - search current working directory first for job script.
|
|
+ Make it so held jobs reset the AccrueTime and do not count against any
|
|
AccrueTime limits.
|
|
+ Add SchedulerParameters option of bf_hetjob_prio=[min|avg|max] to alter
|
|
the job sorting algorithm for scheduling heterogeneous jobs.
|
|
+ Fix initialization of assoc_mgr_locks and slurmctld_locks lock
|
|
structures.
|
|
+ Fix segfault with job arrays using X11 forwarding.
|
|
+ Revert regression caused by e0ee1c7054 which caused negative values and
|
|
values starting with a decimal to be invalid for PriorityWeightTRES and
|
|
TRESBillingWeight.
|
|
+ Fix possibility to update a job's reservation to none.
|
|
+ Suppress connection errors to primary slurmdbd when backup dbd is active.
|
|
+ Suppress connection errors to primary db when backup db kicks in
|
|
+ Add missing fields for sacct --completion when using jobcomp/filetxt.
|
|
+ Fix incorrect values set for UserCPU, SystemCPU, and TotalCPU sacct
|
|
fields when JobAcctGatherType=jobacct_gather/cgroup.
|
|
+ Fix srun printing the invalid option message twice.
|
|
+ Remove unused -b flag from getopt call in sbatch.
|
|
+ Disable reporting of node TRES in sreport.
|
|
+ Re-enable features combined by OR within parentheses for non-knl
|
|
setups.
|
|
+ Prevent sending duplicate requests to reboot a node before ResumeTimeout.
|
|
+ Down nodes that don't reboot by ResumeTimeout.
|
|
+ Update seff to reflect API change from rss_max to tres_usage_in_max.
|
|
+ Add missing TRES constants from perl API.
|
|
+ Fix issue where sacct would return incorrect array tasks when querying
|
|
specific tasks.
|
|
+ Add missing variables to slurmdb_stats_t in the perlapi.
|
|
+ Fix nodes not getting reboot RPC when job requires reboot of nodes.
|
|
+ Fix failure to update the partition list of a job.
|
|
+ Use slurm.conf gres ids instead of gres.conf names to get a gres type
|
|
name.
|
|
* Disable
|
|
slurmctld-rerun-agent_init-when-backup-controller-takes-over.patch:
|
|
Believed to be fixed by commit c1a537dbbe6
|
|
See: https://bugs.schedmd.com/show_bug.cgi?id=5511
|
|
* Add
|
|
Fix-contrib-perlapi-to-build-with-the-fix-for-CVE-2019-6438-750cc23ed.patch:
|
|
Fix fallout from 750cc23ed for CVE-2019-6438.
|
|
|
|
-------------------------------------------------------------------
|
|
Thu Dec 13 10:07:00 UTC 2018 - cgoll@suse.com
|
|
- Update to 18.08.04, with following highlights
|
|
* Fix message sent to user to display preempted instead of time limit when
|
|
a job is preempted.
|
|
* Fix memory leak when a failure happens processing a node's gres config.
|
|
* Improve error message when failures happen processing a node's gres config.
|
|
* Don't skip jobs in scontrol hold.
|
|
* Allow --cpu-bind=verbose to be used with SLURM_HINT environment variable.
|
|
* Enhanced handling for runaway jobs
|
|
* cons_res: Delay exiting cr_job_test until after cores/cpus are calculated
|
|
and distributed.
|
|
* Don't check existence of srun --prolog or --epilog executables when set to
|
|
"none" and SLURM_TEST_EXEC is used.
|
|
* Add "P" suffix support to job and step tres specifications.
|
|
* Fix jobacct_gather/cgroup to work correctly when more than one task is
|
|
started on a node.
|
|
* salloc - set SLURM_NTASKS_PER_CORE and SLURM_NTASKS_PER_SOCKET in the
|
|
environment if the corresponding command line options are used.
|
|
* slurmd - fix handling of the -f flag to specify alternate config file
|
|
locations.
|
|
* Add SchedulerParameters option of bf_ignore_newly_avail_nodes to avoid
|
|
scheduling lower priority jobs on resources that become available during
|
|
the backfill scheduling cycle when bf_continue is enabled.
|
|
* job_submit/lua: Add several slurmctld return codes and add user/group info
|
|
* salloc/sbatch/srun - print warning if mutually exclusive options of --mem
|
|
and --mem-per-cpu are both set.
|
|
- Refreshed:
|
|
* pam_slurm_adopt-avoid-running-outside-of-the-sshd-PA.patch
|
|
|
|
-------------------------------------------------------------------
|
|
Mon Dec 10 10:49:14 UTC 2018 - cgoll@suse.com
|
|
|
|
- restarting services on update only when activated
|
|
- added rotation of logs
|
|
- Added backported patches which harden the pam module pam_slurm_adopt
|
|
(BOO#1116758) which will be in slurm 19.05.x
|
|
* added pam_slurm_adopt-avoid-running-outside-of-the-sshd-PA.patch
|
|
[PATCH 1/3] pam_slurm_adopt: avoid running outside of the sshd PAM
|
|
* added pam_slurm_adopt-send_user_msg-don-t-copy-undefined-d.patch
|
|
[PATCH 2/3] pam_slurm_adopt: send_user_msg: don't copy undefined data
|
|
* added pam_slurm_adopt-use-uid-to-determine-whether-root-is.patch
|
|
[PATCH 3/3] pam_slurm_adopt: use uid to determine whether root is
|
|
logging on
|
|
- package slurm-pam_slurm now depends on slurm-node and not on slurm
|
|
|
|
|
|
-------------------------------------------------------------------
|
|
Wed Dec 5 16:00:50 UTC 2018 - Christian Goll <cgoll@suse.com>
|
|
|
|
- fixed code in %pretrans section to be compatible with lua 5.1
|
|
|
|
-------------------------------------------------------------------
|
|
Tue Nov 20 11:21:37 UTC 2018 - eich@suse.com
|
|
|
|
- Added missing perl-base dependency.
|
|
|
|
-------------------------------------------------------------------
|
|
Tue Nov 20 11:21:14 UTC 2018 - eich@suse.com
|
|
|
|
- Moved HTML docs to doc package.
|
|
|
|
-------------------------------------------------------------------
|
|
Tue Nov 20 11:20:05 UTC 2018 - eich@suse.com
|
|
|
|
- Moved config man pages to a separate package: This way, they won't
|
|
get installed on compute nodes.
|
|
|
|
-------------------------------------------------------------------
|
|
Tue Nov 20 11:11:15 UTC 2018 - eich@suse.com
|
|
|
|
- Update to 18.08.3
|
|
* Add new burst buffer state of "teardown-fail" to indicate the burst
|
|
buffer teardown operation is failing on specific buffers.
|
|
* Multiple backup slurmctld daemons can be configured
|
|
* Enable jobs with zero node count for creation and/or deletion of persistent
|
|
burst buffers.
|
|
* Add "scontrol show dwstat" command to display Cray burst buffer status.
|
|
* Add "GetSysStatus" option to burst_buffer.conf file.
|
|
* Add node and partition configuration options of "CpuBind" to control
|
|
default task binding.
|
|
* Add "NumaCpuBind" option to knl.conf
|
|
* Add sbatch "--batch" option to identify features required on batch node.
|
|
* Add "BatchFeatures" field to output of "scontrol show job".
|
|
* Add support for "--bb" option to sbatch command.
|
|
* Add new SystemComment field to job data structure and database.
|
|
* Expand reservation "flags" field from 32 to 64 bits.
|
|
* Add job state flag of "SIGNALING" to avoid race condition.
|
|
* Properly handle srun --will-run option when there are jobs in COMPLETING
|
|
state.
|
|
* Properly report who is signaling a step.
|
|
* Don't combine updated reservation records in sreport's reservation report.
|
|
* node_features plugin - Add support for XOR & XAND of job constraints (node
|
|
feature specifications).
|
|
* Improvements to how srun searches for the executable when using cwd.
|
|
* Now programs can be checked before execution if test_exec is set.
|
|
* Report NodeFeatures plugin configuration with scontrol and sview commands.
|
|
* Add acct_gather_profile/influxdb plugin.
|
|
* Add new job state of SO/STAGE_OUT
|
|
* Correct SLURM_NTASKS and SLURM_NPROCS environment variable for
|
|
heterogeneous job step.
|
|
* Expand advanced reservation feature specification to support parentheses
|
|
and counts of nodes with specified features.
|
|
* Defer job signaling until prolog is completed
|
|
* Have the primary slurmctld wait until the backup has completely shut down
|
|
before taking control.
|
|
* Fix issue where unpacking job state after TRES count changed could lead to
|
|
invalid reads.
|
|
* Heterogeneous job steps allocations supported with Open MPI.
|
|
* Remove redundant function arguments from task plugins.
|
|
* Add Slurm configuration file check logic using "slurmctld -t" command.
|
|
* Add the use of an XML file to help performance when using hwloc.
|
|
* Remove support for "ChosLoc" configuration parameter.
|
|
* Configuration parameters "ControlMachine", "ControlAddr",
|
|
"BackupController" and "BackupAddr" replaced by an ordered list of
|
|
"SlurmctldHost" records.
|
|
* Remove --immediate option from sbatch.
|
|
* Add infrastructure for per-job and per-step TRES parameters.
|
|
* Add DefCpuPerGpu and DefMemPerGpu to global and per-partition configuration
|
|
parameters.
|
|
* Add ValidateMode configuration parameter to knl_cray.conf.
|
|
* Disable local PTY output processing when using 'srun --unbuffered'.
|
|
* Change the column name for the %U (User ID) field in squeue to 'UID'.
|
|
* CRAY - Add CheckGhalQuiesce to the CommunicationParameters.
|
|
* When a process is core dumping, avoid terminating other processes in that
|
|
task group.
|
|
* CPU frequency management enhancements: If scaling_available_frequencies
|
|
file is not available, then derive values from scaling_min_freq and
|
|
scaling_max_freq values.
|
|
* Add pending jobs count to sdiag output.
|
|
* Add configuration parameters SlurmctldPrimaryOnProg and
|
|
SlurmctldPrimaryOffProg, which define programs to execute when a slurmctld
|
|
daemon changes state.
|
|
* Add configuration parameter SlurmctldAddr for use with virtual IP to
|
|
manage backup slurmctld daemons.
|
|
* Explicitly shutdown the slurmd process when instructed to reboot.
|
|
* Add ability to create/update partition with TRESBillingWeights through
|
|
scontrol.
|
|
* Calculate TRES billing values at submission.
|
|
* Add node_features plugin function "node_features_p_reboot_weight()".
|
|
* Add NodeRebootWeight parameter to knl.conf configuration file.
|
|
* Completely remove "gres" field from step record. Use "tres_per_node",
|
|
"tres_per_socket", etc.
|
|
* Add "Links" parameter to gres.conf configuration file.
|
|
* Force slurm_mktime() to set tm_isdst to -1.
|
|
* burst_buffer.conf - Add SetExecHost flag to enable burst buffer access
|
|
from the login node for interactive jobs.
|
|
* Append ", with requeued tasks" to job array "end" emails if any tasks in
|
|
the array were requeued.
|
|
* Add ResumeFailProgram slurm.conf option to specify a program that is called
|
|
when a node fails to respond by ResumeTimeout.
|
|
* Add new job pending reason of "ReqNodeNotAvail, reserved for maintenance".
|
|
* Remove AdminComment += syntax from 'scontrol update job'.
|
|
* sched/backfill: Reset job time limit if needed for deadline scheduling.
|
|
* For heterogeneous job component with required nodes, explicitly exclude
|
|
those nodes from all other job components.
|
|
* Add name of partition used to srun --test-only output.
|
|
* sdiag output now reports outgoing slurmctld message queue contents.
|
|
* Improve escaping special characters on user commands when specifying paths.
|
|
* Add salloc/sbatch/srun option of --gres-flags=disable-binding to disable
|
|
filtering of CPUs with respect to generic resource locality.
|
|
* SlurmDBD - Print warning if MySQL/MariaDB internal tuning is not at least
|
|
half of the recommended values.
|
|
* Add ability to specify a node reason when rebooting nodes with "scontrol
|
|
reboot".
|
|
* Add nextstate option to "scontrol reboot".
|
|
* Consider "resuming" (nextstate=resume) nodes as available in backfill
|
|
future scheduling.
|
|
* Add TimelimitRaw sacct output field to display timelimit numbers.
|
|
* Add support for sacct --whole-hetjob=[yes|no] option.
|
|
* Make salloc handle node requests the same as sbatch.
|
|
* Add shutdown_on_reboot SlurmdParameter to control whether the Slurmd will
|
|
shut itself down or not when a reboot request is received.
|
|
* Add cancel_reboot scontrol option to cancel pending reboot of nodes.
|
|
* Make Users case insensitive in the database based on
|
|
Parameters=PreserveCaseUser in the slurmdbd.conf.
|
|
* Improve scheduling when dealing with node_features that could have a
|
|
boot delay.
|
|
* Changed the default AuthType for slurmdbd to auth/munge.
|
|
* Added 'remote-fs.target' to After directive of slurmd.service file.
|
|
* Remove drain on node when reboot nextstate used.
|
|
* Speed up pack of job's qos.
|
|
* Add sacctmgr options to prevent/manage job queue stuffing:
|
|
- GrpJobsAccrue=<max_jobs>
|
|
- MaxJobsAccrue=<max_jobs>
|
|
* MinPrioThreshold
|
|
Minimum priority required to reserve resources when scheduling.
|
|
* Add control_inx value to trigger_info_msg_t to permit future work in the
|
|
trigger management code to distinguish which of multiple backup controllers
|
|
has changed state.
|
|
* NOTES:
|
|
PreemptType=preempt/job_prio has been removed - use PreemptType=preempt/qos
|
|
instead.
|
|
* Bluegene support, which was deprecated, has now been removed.
|
|
* cgroup_allowed_devices_file.conf was removed. It was never used by
|
|
default, as ConstrainDevices was not set. If needed, refer to the
|
|
cgroups.conf man page on how to create one.
|
|
* slurm.epilog.clean: Removed. User should use pam_slurm_adopt instead.
|
|
- Refreshed:
|
|
* removed-deprecated-xdaemon.patch
|
|
* slurmctld-uses-xdaemon_-for-systemd.patch
|
|
* slurmd-uses-xdaemon_-for-systemd.patch
|
|
* slurmdbd-uses-xdaemon_-for-systemd.patch
|
|
* slurmsmwd-uses-xdaemon_-for-systemd.patch
|
|
* slurmctld-rerun-agent_init-when-backup-controller-takes-over.patch
|
|
|
|
-------------------------------------------------------------------
|
|
Sun Sep 30 15:18:08 UTC 2018 - eich@suse.com
|
|
|
|
- Move config man-pages to config package.
|
|
|
|
-------------------------------------------------------------------
|
|
Mon Sep 24 09:25:57 UTC 2018 - cgoll@suse.com
|
|
|
|
- added correct link flags for perl bindings (bsc#1108671)
|
|
* added correct linker search path in slurm-2.4.4-rpath.patch
|
|
* perl:Switch is required by slurm torque wrappers
|
|
|
|
-------------------------------------------------------------------
|
|
Sat Sep 22 06:09:18 UTC 2018 - eich@suse.com
|
|
|
|
- Fix Requires(pre) and Requires(post) for slurm-config and slurm-node.
|
|
This fixes issues with failing slurm user creation when installed
|
|
during initial system installation (bsc#1109373).
|
|
|
|
-------------------------------------------------------------------
|
|
Tue Aug 14 10:26:43 UTC 2018 - eich@suse.com
|
|
|
|
- Update to 17.11.9
|
|
* Fix segfault in slurmctld when a job's node bitmap is NULL during a
|
|
scheduling cycle. Primarily caused by EnforcePartLimits=ALL.
|
|
* Remove erroneous unlock in acct_gather_energy/ipmi.
|
|
* Enable support for hwloc version 2.0.1.
|
|
* Fix 'srun -q' (--qos) option handling.
|
|
* Fix socket communication issue that can lead to lost task completion
|
|
messages, which will cause a permanently stuck srun process.
|
|
* Handle creation of TMPDIR if environment variable is set or changed in
|
|
a task prolog script.
|
|
* Avoid node layout fragmentation if running with a fixed CPU count but
|
|
without Sockets and CoresPerSocket defined.
|
|
* burst_buffer/cray - Fix datawarp swap default pool overriding jobdw.
|
|
* Fix incorrect job priority assignment for multi-partition job with
|
|
different PriorityTier settings on the partitions.
|
|
* Fix sinfo to print correct node state.
|
|
|
|
-------------------------------------------------------------------
|
|
Thu Aug 2 11:35:55 UTC 2018 - eich@suse.com
|
|
|
|
- When using a remote shared StateSaveLocation, slurmctld needs to
|
|
be started after remote filesystems have become available.
|
|
Add 'remote-fs.target' to the 'After=' directive in slurmctld.service
|
|
(boo#1103561).
|
|
|
|
-------------------------------------------------------------------
|
|
Tue Jul 31 18:29:40 UTC 2018 - eich@suse.com
|
|
|
|
- Update to 17.11.8
|
|
* Fix incomplete RESPONSE_[RESOURCE|JOB_PACK]_ALLOCATION building path.
|
|
* Do not allocate nodes that were marked down due to the node not responding
|
|
by ResumeTimeout.
|
|
* task/cray plugin - search for "mems" cgroup information in the file
|
|
"cpuset.mems" then fall back to the file "mems".
|
|
* Fix ipmi profile debug uninitialized variable.
|
|
* PMIx: fixed the direct connect inline msg sending.
|
|
* MYSQL: Fix issue not handling all fields when loading an archive dump.
|
|
* Allow a job_submit plugin to change the admin_comment field during
|
|
job_submit_plugin_modify().
|
|
* job_submit/lua - fix access into reservation table.
|
|
* MySQL - Prevent deadlock caused by archive logic locking reads.
|
|
* Don't enforce MaxQueryTimeRange when requesting specific jobs.
|
|
* Modify --test-only logic to properly support jobs submitted to more than
|
|
one partition.
|
|
* Prevent slurmctld from abort when attempting to set non-existing
|
|
qos as def_qos_id.
|
|
* Add new job dependency type of "afterburstbuffer". The pending job will be
|
|
delayed until the first job completes execution and its burst buffer
|
|
stage-out is completed.
|
|
* Reorder proctrack/task plugin load in the slurmstepd to match that of
|
|
slurmd and avoid the race condition that calling task before proctrack
|
|
can introduce.
|
|
* Prevent reboot of a busy KNL node when requesting inactive features.
|
|
* Revert to previous behavior when requesting memory per cpu/node introduced
|
|
in 17.11.7.
|
|
* Fix to reinitialize previously adjusted job members to their original
|
|
value when validating the job memory in multi-partition requests.
|
|
* Fix _step_signal() from always returning SLURM_SUCCESS.
|
|
* Combine active and available node feature change logs on one line rather
|
|
than one line per node for performance reasons.
|
|
* Prevent occasionally leaking freezer cgroups.
|
|
* Fix potential segfault when closing the mpi/pmi2 plugin.
|
|
* Fix issues with --exclusive=[user|mcs] to work correctly
|
|
with preemption or when job requests a specific list of hosts.
|
|
* Make code compile with hdf5 1.10.2+
|
|
* mpi/pmix: Fixed the collectives canceling.
|
|
* SlurmDBD: improve error message handling on archive load failure.
|
|
* Fix incorrect locking when deleting reservations.
|
|
* Fix incorrect locking when setting up the power save module.
|
|
* Fix setting format output length for squeue when showing array jobs.
|
|
* Add xstrstr function.
|
|
* Fix printing out of --hint options in sbatch, salloc --help.
|
|
* Prevent possible divide by zero in _validate_time_limit().
|
|
* Add Delegate=yes to the slurmd.service file to prevent systemd from
|
|
interfering with the jobs' cgroup hierarchies.
|
|
* Change the backlog argument to the listen() syscall within srun to 4096
|
|
to match elsewhere in the code, and avoid communication problems at scale.
|
|
|
|
-------------------------------------------------------------------
|
|
Tue Jul 31 17:30:08 UTC 2018 - eich@suse.com
|
|
|
|
- slurmctld-rerun-agent_init-when-backup-controller-takes-over.patch:
|
|
Fix race in the slurmctld backup controller which prevents it
|
|
from cleaning up allocations on nodes properly after failing over
|
|
(bsc#1084917).
|
|
- Handled %license in a backward compatible manner.
|
|
|
|
-------------------------------------------------------------------
|
|
Sat Jul 28 15:30:58 UTC 2018 - eich@suse.com
|
|
|
|
- Add a 'Recommends: slurm-munge' to slurm-slurmdbd.
|
|
|
|
-------------------------------------------------------------------
|
|
Wed Jul 11 12:04:55 UTC 2018 - eich@suse.com
|
|
|
|
- Shield comments between script snippets with a %{!?nil:...} to
|
|
avoid them being interpreted as scripts - in which case the update
|
|
level is passed as argument (see chapter 'Shared libraries' in:
|
|
https://en.opensuse.org/openSUSE:Packaging_scriptlet_snippets)
|
|
(bsc#1100850).
|
|
|
|
-------------------------------------------------------------------
|
|
Tue Jun 5 13:24:43 UTC 2018 - cgoll@suse.com
|
|
|
|
- Update from 17.11.5 to 17.11.7
|
|
- Fix security issue in handling of username and gid fields
|
|
(CVE-2018-10995, bsc#1095508), which required an
|
|
update from 17.11.5 to 17.11.7.
|
|
Highlights of 17.11.6:
|
|
* CRAY - Add slurmsmwd to the contribs/cray dir
|
|
* PMIX - Added the direct connect authentication.
|
|
* Prevent the backup slurmctld from losing the active/available node
|
|
features list on takeover.
|
|
* Be able to force power_down of cloud node even if in power_save state.
|
|
* Allow cloud nodes to be recognized in Slurm when booted out of band.
|
|
* Numerous fixes - check 'NEWS' file.
|
|
Highlights of 17.11.7:
|
|
* Notify srun and ctld when unkillable stepd exits.
|
|
* Numerous fixes - check 'NEWS' file.
|
|
- Add: slurmsmwd-uses-xdaemon_-for-systemd.patch
|
|
* Fixes daemonization in newly introduced slurmsmwd daemon.
|
|
- Rename:
|
|
split-xdaemon-in-xdaemon_init-and-xdaemon_finish-for-systemd-compatibilty.patch
|
|
to split-xdaemon-in-xdaemon_init-and-xdaemon_finish-for.patch
|
|
* remain in sync with commit messages which introduced that file
|
|
|
|
|
|
-------------------------------------------------------------------
|
|
Thu Apr 19 21:05:04 UTC 2018 - eich@suse.com
|
|
|
|
- Avoid running pretrans scripts when running in an instsys:
|
|
there may not be much installed yet. pretrans code should
|
|
be done in lua, this way, it will be executed by the rpm-internal
|
|
lua interpreter and not be passed to a shell which may not be
|
|
around at the time this scriptlet is run (bsc#1090292).
|
|
|
|
-------------------------------------------------------------------
|
|
Fri Apr 13 10:03:05 UTC 2018 - eich@suse.com
|
|
|
|
- Add requires for slurm-sql to the slurmdbd package.
|
|
|
|
-------------------------------------------------------------------
|
|
Thu Apr 12 17:20:03 UTC 2018 - eich@suse.com
|
|
|
|
- Package READMEs for pam and pam_slurm_adopt.
|
|
- Use the new %%license directive for COPYING file.
|
|
|
|
-------------------------------------------------------------------
|
|
Thu Apr 12 16:40:44 UTC 2018 - eich@suse.com
|
|
|
|
- Add:
|
|
* split-xdaemon-in-xdaemon_init-and-xdaemon_finish-for-systemd-compatibilty.patch
|
|
* slurmctld-uses-xdaemon_-for-systemd.patch
|
|
* slurmd-uses-xdaemon_-for-systemd.patch
|
|
* slurmdbd-uses-xdaemon_-for-systemd.patch
|
|
* removed-deprecated-xdaemon.patch
|
|
Fix interaction with systemd: systemd expects that a
|
|
daemonizing process doesn't go away until the PID file
|
|
with the PID of the daemon has been written (bsc#1084125).
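
The ordering requirement described in this entry can be illustrated with a
minimal Python sketch. It is not the code from the patches listed above: the
function name start_daemon and the one-byte pipe protocol are assumptions made
purely for illustration. The launching process forks, then blocks on a pipe
until the child has written its PID file, so a supervisor such as systemd
would never see the launcher return before the PID file exists.

```python
import os


def start_daemon(pidfile: str) -> int:
    """Fork a child and return only after the child has written its PID file.

    Hypothetical helper for illustration; returns the PID read from the file.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: detach, write the PID file, then tell the parent it is ready.
        try:
            os.close(read_fd)
            os.setsid()
            with open(pidfile, "w") as fh:
                fh.write(f"{os.getpid()}\n")
            os.write(write_fd, b"1")  # one byte meaning "PID file written"
            os.close(write_fd)
            # A real daemon would enter its main loop here; the sketch exits.
            os._exit(0)
        except BaseException:
            os._exit(1)
    # Parent: wait for the ready byte (or EOF if the child died early).
    os.close(write_fd)
    ready = os.read(read_fd, 1)
    os.close(read_fd)
    # The sketch's child exits at once, so reap it to avoid a zombie.
    # A real launcher would simply exit at this point instead of waiting.
    os.waitpid(pid, 0)
    if ready != b"1":
        raise RuntimeError("child exited before writing its PID file")
    with open(pidfile) as fh:
        return int(fh.read())
```

Calling start_daemon("/tmp/demo.pid") (the path is only an example) returns
the child's PID once the file is in place. Splitting the work into an "init"
step before the fork and a "finish" step after the PID file is written
mirrors the idea suggested by the patch names, but the details here are
assumed.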
|
|
|
|
-------------------------------------------------------------------
|
|
Wed Apr 11 11:27:31 UTC 2018 - eich@suse.com
|
|
|
|
- Make sure systemd services get restarted only when all
|
|
packages are in a consistent state, not in the middle
|
|
of an 'update' transaction (bsc#1088693).
|
|
Since the %postun scripts that run on update are from
|
|
the old package they cannot be changed - thus we work
|
|
around the restart breakage.
|
|
|
|
-------------------------------------------------------------------
|
|
Fri Mar 23 13:50:14 UTC 2018 - cgoll@suse.com
|
|
|
|
- fixed wrong log file location in slurmdbd.conf and
|
|
fixed pid location for slurmdbd and made slurm-slurmdbd
|
|
depend on slurm config which provides the dir /var/run/slurm
|
|
(bsc#1086859).
|
|
|
|
-------------------------------------------------------------------
|
|
Fri Mar 16 08:57:20 UTC 2018 - cgoll@suse.com
|
|
|
|
- added comment for (bsc#1085606)
|
|
|
|
-------------------------------------------------------------------
|
|
Wed Mar 14 19:34:58 UTC 2018 - eich@suse.com
|
|
|
|
- Fix security issue in accounting_storage/mysql plugin by always escaping
|
|
strings within the slurmdbd. CVE-2018-7033
|
|
http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2018-7033
|
|
(bsc#1085240).
|
|
- Update slurm to v17.11.5 (FATE#325451)
|
|
Highlights of 17.11:
|
|
* Support for federated clusters to manage a single work-flow
|
|
across a set of clusters.
|
|
* Support for heterogeneous job allocations (various processor types,
|
|
memory sizes, etc. by job component). Support for heterogeneous job
|
|
steps within a single MPI_COMM_WORLD is not yet supported for most
|
|
configurations.
|
|
* X11 support is now fully integrated with the main Slurm code. Remove
|
|
any X11 plugin configured in your plugstack.conf file to avoid errors
|
|
being logged about conflicting options.
|
|
* Added new advanced reservation flag of "flex", which permits jobs
|
|
requesting the reservation to begin prior to the reservation's
|
|
start time and use resources inside or outside of the reservation.
|
|
A typical use case is to prevent jobs not explicitly requesting the
|
|
reservation from using those reserved resources rather than forcing
|
|
jobs requesting the reservation to use those resources in the time
|
|
frame reserved.
|
|
* The sprio command has been modified to report a job's priority
|
|
information for every partition the job has been submitted to.
|
|
* Group ID lookup performed at job submit time to avoid lookup on
|
|
all compute nodes. Enable with PrologFlags=SendGIDs configuration
|
|
parameter.
|
|
* Slurm commands and daemons dynamically link to libslurmfull.so
|
|
instead of statically linking. This dramatically reduces the
|
|
footprint of Slurm.
|
|
* In switch plugin, added plugin_id symbol to plugins and wrapped
|
|
switch_jobinfo_t with dynamic_plugin_data_t in interface calls
|
|
in order to pass switch information between clusters with different
|
|
switch types.
|
|
* Changed default ProctrackType to cgroup.
|
|
* Changed default sched_min_interval from 0 to 2 microseconds.
|
|
* Added new 'scontrol write batch_script ' command to fetch a job's
|
|
batch script. Removed the ability to see the script as part of the
|
|
'scontrol -dd show job' command.
|
|
* Add new "billing" TRES which allows jobs to be limited based on the
|
|
job's billable TRES calculated by the job's partition's
|
|
TRESBillingWeights.
|
|
* Regular user use of "scontrol top" command is now disabled. Use the
|
|
configuration parameter "SchedulerParameters=enable_user_top" to
|
|
enable that functionality. The configuration parameter
|
|
"SchedulerParameters=disable_user_top" will be silently ignored.
|
|
* Change the default for pending jobs whose reservation is gone: they
|
|
are now put in held state instead of running outside the reservation.
|
|
Added NO_HOLD_JOBS_AFTER_END reservation flag to use old default.
|
|
* Support for PMIx v2.0 as well as UCX support.
|
|
* Remove plugins for obsolete MPI stacks:
|
|
- lam
|
|
- mpich1_p4
|
|
- mpich1_shmem
|
|
- mvapich
|
|
* Numerous fixes - check 'NEWS' file.
|
|
- slurmd-Fix-slurmd-for-new-API-in-hwloc-2.0.patch
|
|
plugins-cgroup-Fix-slurmd-for-new-API-in-hwloc-2.0.patch:
|
|
Removed. Code upstream.
|
|
- slurmctld-service-var-run-path.patch:
|
|
Replaced by sed script.
|
|
- Fix some rpmlint warnings.
|
|
|
|
-------------------------------------------------------------------
|
|
Mon Jan 29 13:43:57 UTC 2018 - cgoll@suse.com
|
|
|
|
- moved config files to slurm-config package (FATE#324574).
|
|
|
|
-------------------------------------------------------------------
|
|
Mon Jan 29 04:01:28 UTC 2018 - jjolly@suse.com
|
|
|
|
- Moved slurmstepd and man page into slurm-node due to slurmd dependency
|
|
- Moved config files into slurm-node
|
|
- Moved slurmd rc scripts into slurm-node
|
|
- Made slurm-munge require slurm-plugins instead of slurm itself
|
|
- slurm-node suggested slurm-munge, causing the whole slurm to be
|
|
installed. The slurm-plugins seems to be a more base class
|
|
(FATE#324574).
|
|
|
|
-------------------------------------------------------------------
|
|
Wed Jan 17 14:21:49 UTC 2018 - cgoll@suse.com
|
|
|
|
- split up lightweight slurm-node package for deployment on nodes
|
|
(FATE#324574).
|
|
|
|
-------------------------------------------------------------------
|
|
Fri Dec 1 16:04:55 UTC 2017 - cgoll@suse.com
|
|
|
|
- added /var/spool/ directory and removed duplicated entries from slurm.conf
|
|
|
|
-------------------------------------------------------------------
|
|
Fri Nov 10 13:52:30 UTC 2017 - eich@suse.com
|
|
|
|
- Package so-versioned libs separately. libslurm is expected
|
|
to change more frequently and thus is packaged separately
|
|
from libpmi.
|
|
|
|
-------------------------------------------------------------------
|
|
Wed Nov 1 16:15:04 UTC 2017 - eich@suse.com
|
|
|
|
- Updated to 17.02.9 to fix CVE-2017-15566 (bsc#1065697).
|
|
Changes in 17.02.9:
|
|
* When resuming powered down nodes, mark DOWN nodes right after
|
|
ResumeTimeout has been reached (previous logic would wait about one
|
|
minute longer).
|
|
* Fix sreport not showing full column name for TRES Count.
|
|
* Fix slurmdb_reservations_get() giving wrong usage data when jobs spanned
|
|
a reservation that was modified.
|
|
* Fix sreport reservation utilization report showing bad data.
|
|
* Show all TRES' on a reservation in sreport reservation utilization report
|
|
by default.
|
|
* Fix sacctmgr show reservation handling "end" parameter.
|
|
* Work around issue with sysmacros.h and gcc7 / glibc 2.25.
|
|
* Fix layouts code to only allow setting a boolean.
|
|
* Fix sbatch --wait to keep waiting even if a message timeout occurs.
|
|
* CRAY - If configured with NodeFeatures=knl_cray and there are non-KNL
|
|
nodes which include no features the slurmctld will abort without
|
|
this patch when attempting strtok_r(NULL).
|
|
* Fix regression in 17.02.7 which would run the spank_task_privileged as
|
|
part of the slurmstepd instead of its child process.
|
|
* Fix security issue in Prolog and Epilog by always prepending SPANK_ to
|
|
all user-set environment variables. CVE-2017-15566.
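
The prefixing idea behind the CVE-2017-15566 fix can be sketched in a few
lines of Python. This is an illustration only and not Slurm's implementation;
the function name prefix_user_env is hypothetical. Every user-supplied
variable name gets the SPANK_ prefix unconditionally before it is placed in a
Prolog or Epilog environment, so a variable such as LD_PRELOAD would arrive
as SPANK_LD_PRELOAD rather than under its original name.

```python
def prefix_user_env(user_env: dict[str, str], prefix: str = "SPANK_") -> dict[str, str]:
    """Return a copy of user_env with every variable name prefixed.

    The prefix is always added, even to names that already start with it,
    so no user-chosen name can pass through unchanged.
    """
    return {prefix + name: value for name, value in user_env.items()}


if __name__ == "__main__":
    # Example input; the variable names and values are made up.
    for name, value in prefix_user_env({"LD_PRELOAD": "/tmp/x.so"}).items():
        print(f"{name}={value}")
```

Prefixing unconditionally, instead of only when the prefix is missing, keeps
the rule simple: nothing the user sets can reach the script under a name the
user chose.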
|
|
Changes in 17.02.8:
|
|
* Add 'slurmdbd:' to the accounting plugin to notify message is from dbd
|
|
instead of local.
|
|
* mpi/mvapich - Buffer being only partially cleared. No failures observed.
|
|
* Fix for job --switch option on dragonfly network.
|
|
* In salloc with --uid option, drop supplementary groups before changing UID.
|
|
* jobcomp/elasticsearch - strip any trailing slashes from JobCompLoc.
|
|
* jobcomp/elasticsearch - fix memory leak when transferring generated buffer.
|
|
* Prevent slurmstepd ABRT when parsing gres.conf CPUs.
|
|
* Fix sbatch --signal to signal all MPI ranks in a step instead of just those
|
|
on node 0.
|
|
* Check multiple partition limits when scheduling a job that were previously
|
|
only checked on submit.
|
|
* Cray: Avoid running application/step Node Health Check on the external
|
|
job step.
|
|
* Optimization enhancements for partition based job preemption.
|
|
* Address some build warnings from GCC 7.1, and one possible memory leak if
|
|
/proc is inaccessible.
|
|
* If creating/altering a core based reservation with scontrol/sview on a
|
|
remote cluster correctly determine the select type.
|
|
* Fix autoconf test for libcurl when clang is used.
|
|
* Fix default location for cgroup_allowed_devices_file.conf to use correct
|
|
default path.
|
|
* Document NewName option to sacctmgr.
|
|
* Reject a second PMI2_Init call within a single step to prevent slurmstepd
|
|
from hanging.
|
|
* Handle old 32bit values stored in the database for requested memory
|
|
correctly in sacct.
|
|
* Fix memory leaks in the task/cgroup plugin when constraining devices.
|
|
* Make extremely verbose info messages debug2 messages in the task/cgroup
|
|
plugin when constraining devices.
|
|
* Fix issue that would deny the stepd access to /dev/null where GRES has a
|
|
'type' but no file defined.
|
|
* Fix issue where the slurmstepd would fatal on job launch if you have no
|
|
gres listed in your slurm.conf but some in gres.conf.
|
|
* Fix validating time spec to correctly validate various time formats.
|
|
* Make scontrol work correctly with job update timelimit [+|-]=.
|
|
* Reduce the visibily of a number of warnings in _part_access_check.
|
|
* Prevent segfault in sacctmgr if no association name is specified for
|
|
an update command.
|
|
* burst_buffer/cray plugin modified to work with changes in Cray UP05
|
|
software release.
|
|
* Fix job reasons for jobs that are violating assoc MaxTRESPerNode limits.
|
|
* Fix segfault when unpacking a 16.05 slurm_cred in a 17.02 daemon.
|
|
* Fix setting TRES limits with case insensitive TRES names.
|
|
* Add alias for xstrncmp() -- slurm_xstrncmp().
|
|
* Fix sorting of case insensitive strings when using xstrcasecmp().
|
|
* Gracefully handle race condition when reading /proc as process exits.
|
|
* Avoid error on Cray duplicate setup of core specialization.
|
|
* Skip over undefined (hidden in Slurm) nodes in pbsnodes.
|
|
* Add empty hashes in perl api's slurm_load_node() for hidden nodes.
|
|
* CRAY - Add rpath logic to work for the alpscomm libs.
|
|
* Fixes for administrator extended TimeLimit (job reason & time limit reset).
|
|
* Fix gres selection on systems running select/linear.
|
|
* sview: Added window decorator for maximize,minimize,close buttons for all
|
|
systems.
|
|
* squeue: interpret negative length format specifiers as a request to
|
|
delimit values with spaces.
|
|
* Fix the torque pbsnodes wrapper script to parse a gres field with a type
|
|
set correctly.
|
|
- Fixed ABI version of libslurm.
|
|
|
|
-------------------------------------------------------------------
Fri Oct 6 13:53:08 UTC 2017 - jengelh@inai.de

- Trim redundant wording in descriptions.

-------------------------------------------------------------------
Wed Sep 27 11:08:29 UTC 2017 - jjolly@suse.com

- Updated to slurm 17-02-7-1
  * Added python as BuildRequires
  * Removed sched-wiki package
  * Removed slurmdb-direct package
  * Obsoleted sched-wiki and slurmdb-direct packages
  * Removing Cray-specific files
  * Added /etc/slurm/layout.d files (new for this version)
  * Remove /etc/slurm/cgroup files from package
  * Added lib/slurm/mcs_account.so
  * Removed lib/slurm/jobacct_gather_aix.so
  * Removed lib/slurm/job_submit_cnode.so
- Created slurm-sql package
- Moved files from slurm-plugins to slurm-torque package
- Moved creation of /usr/lib/tmpfiles.d/slurm.conf into slurm.spec
  * Removed tmpfiles.d-slurm.conf
- Changed /var/run path for slurm daemons to /var/run/slurm
  * Added slurmctld-service-var-run-path.patch
  (FATE#324026).

-------------------------------------------------------------------
Tue Sep 12 16:00:11 UTC 2017 - jjolly@suse.com

- Made tmpfiles_create post-install macro SLE12 SP2 or greater
- Directly calling systemd-tmpfiles --create for before SLE12 SP2

-------------------------------------------------------------------
Mon Jul 10 03:35:41 UTC 2017 - jjolly@suse.com

- Allows openSUSE Factory build as well
- Removes unused .service files from project
- Adds /var/run/slurm to /usr/lib/tmpfiles.d for boottime creation
  * Patches upstream .service files to allow for /var/run/slurm path
  * Modifies slurm.conf to allow for /var/run/slurm path

-------------------------------------------------------------------
Tue May 30 10:24:09 UTC 2017 - eich@suse.com

- Move wrapper script mpiexec provided by slurm-torque to
  mpiexec.slurm to avoid conflicts. This file is normally
  provided by the MPI implementation (boo#1041706).

-------------------------------------------------------------------
Mon May 8 10:10:04 UTC 2017 - eich@suse.com

- Replace remaining ${RPM_BUILD_ROOT}s.
- Improve description.
- Fix up changelog.

-------------------------------------------------------------------
Fri Mar 31 12:43:25 UTC 2017 - eich@suse.com

- Spec file: Replace "Requires : slurm-perlapi" by
  "Requires: perl-slurm = %{version}" (boo#1031872).

-------------------------------------------------------------------
Thu Feb 16 12:12:45 UTC 2017 - jengelh@inai.de

- Trim redundant parts of description. Fixup RPM groups.
- Replace unnecessary %__ macro indirections;
  replace historic $RPM_* variables by macros.

-------------------------------------------------------------------
Wed Feb 15 18:55:28 UTC 2017 - eich@suse.com

- slurmd-Fix-for-newer-API-versions.patch:
  Stale patch removed.

-------------------------------------------------------------------
Tue Feb 7 16:47:17 UTC 2017 - eich@suse.com

- Use %slurm_u and %slurm_g macros defined at the beginning of the spec
  file when adding the slurm user/group for consistency.
- Define these macros to daemon,root for non-systemd.
- For anything newer than Leap 42.1 or SLE-12-SP1 build OpenHPC compatible.

-------------------------------------------------------------------
Wed Feb 1 20:17:47 UTC 2017 - eich@suse.com

- Updated to 16.05.8.1
  * Remove StoragePass from being printed out in the slurmdbd log at debug2
    level.
  * Defer PATH search for task program until launch in slurmstepd.
  * Modify regression test1.89 to avoid leaving vestigial job. Also reduce
    logging to reduce likelihood of Expect buffer overflow.
  * Do not PATH search for multi-prog launches if LaunchParameters=test_exec is
    enabled.
  * Fix for possible infinite loop in select/cons_res plugin when trying to
    satisfy a job's ntasks_per_core or socket specification.
  * If job is held for bad constraints make it so once updated the job doesn't
    go into JobAdminHeld.
  * sched/backfill - Fix logic to reserve resources for jobs that require a
    node reboot (i.e. to change KNL mode) in order to start.
  * When unpacking a node or front_end record from state and the protocol
    version is lower than the min version, set it to the min.
  * Remove redundant lookup for part_ptr when updating a reservation's nodes.
  * Fix memory and file descriptor leaks in slurmd daemon's sbcast logic.
  * Do not allocate specialized cores to jobs using the --exclusive option.
  * Cancel interactive job if Prolog failure with "PrologFlags=contain" or
    "PrologFlags=alloc" configured. Send new error prolog failure message to
    the salloc or srun command as needed.
  * Prevent possible out-of-bounds read in slurmstepd on an invalid #! line.
  * Fix check for PluginDir within slurmctld to work with multiple directories.
  * Cancel interactive jobs automatically on communication error to launching
    srun/salloc process.
  * Fix security issue caused by insecure file path handling triggered by the
    failure of a Prolog script. To exploit this a user needs to anticipate or
    cause the Prolog to fail for their job. CVE-2016-10030 (bsc#1018371).
- Replace group/user add macros with function calls.
- Fix array initialization and ensure strings are always NULL terminated in
  pam_slurm.c (bsc#1007053).
- Disable building with netloc support: the netloc API is part of the devel
  branch of hwloc. Since this devel branch was included accidentally and has
  been reversed since, we need to disable this for the time being.
- Conditionalized architecture specific pieces to support non-x86 architectures
  better.

-------------------------------------------------------------------
Tue Jan 3 17:21:58 UTC 2017 - eich@suse.com

- Remove: unneeded 'BuildRequires: python'
- Add:
  BuildRequires: freeipmi-devel
  BuildRequires: libibmad-devel
  BuildRequires: libibumad-devel
  so they are picked up by the slurm build.
- Enable modifications from openHPC Project.
- Enable lua API package build.
- Add a recommends for slurm-munge to the slurm package:
  This way, the munge auth method is available and slurm
  works out of the box.
- Create /var/lib/slurm as StateSaveLocation directory.
  /tmp is dangerous.

-------------------------------------------------------------------
Fri Dec 2 19:39:56 UTC 2016 - eich@suse.com

- Create slurm user/group in preinstall script.

-------------------------------------------------------------------
Wed Nov 30 15:16:05 UTC 2016 - eich@suse.com

- Keep %{_libdir}/libpmi* and %{_libdir}/mpi_pmi2* on SUSE.

-------------------------------------------------------------------
Tue Nov 22 21:42:04 UTC 2016 - eich@suse.com

- Fix build with and without OHCP_BUILD define.
- Fix build for systemd and non-systemd.

-------------------------------------------------------------------
Fri Nov 4 20:15:47 UTC 2016 - eich@suse.com

- Updated to 16-05-5 - equivalent to OpenHPC 1.2.
  * Fix issue with resizing jobs and limits not being kept track of correctly.
  * BGQ - Remove redeclaration of job_read_lock.
  * BGQ - Tighter locks around structures when nodes/cables change state.
  * Make it possible to change CPUsPerTask with scontrol.
  * Make it so scontrol update part qos= will take away a partition QOS from
    a partition.
  * Backfill scheduling properly synchronized with Cray Node Health Check.
    Prior logic could result in highest priority job getting improperly
    postponed.
  * Make it so daemons also support TopologyParam=NoInAddrAny.
  * If scancel is operating on large number of jobs and RPC responses from
    slurmctld daemon are slow then introduce a delay in sending the cancel job
    requests from scancel in order to reduce load on slurmctld.
  * Remove redundant logic when updating a job's task count.
  * MySQL - Fix querying jobs with reservations when the IDs have rolled.
  * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids.
  * Launch batch job requesting --reboot after the boot completes.
  * Do not attempt to power down a node which has never responded if the
    slurmctld daemon restarts without state.
  * Fix for possible slurmstepd segfault on invalid user ID.
  * MySQL - Fix for possible race condition when archiving multiple clusters
    at the same time.
  * Add logic so that slurmstepd can be launched under valgrind.
  * Increase buffer size to read /proc/*/stat files.
  * Remove the SchedulerParameters option of "assoc_limit_continue", making it
    the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop"
    is set and a job cannot start due to association limits, then do not attempt
    to initiate any lower priority jobs in that partition. Setting this can
    decrease system throughput and utilization, but avoid potentially starving
    larger jobs by preventing them from launching indefinitely.
  * Update a node's socket and cores per socket counts as needed after a node
    boot to reflect configuration changes which can occur on KNL processors.
    Note that the node's total core count must not change, only the distribution
    of cores across varying socket counts (KNL NUMA nodes treated as sockets by
    Slurm).
  * Rename partition configuration from "Shared" to "OverSubscribe". Rename
    salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old
    options will continue to function. Output field names also changed in
    scontrol, sinfo, squeue and sview.
  * Add SLURM_UMASK environment variable to user job.
  * knl_conf: Added new configuration parameter of CapmcPollFreq.
  * Cleanup two minor Coverity warnings.
  * Make it so the tres units in a job's formatted string are converted like
    they are in a step.
  * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by
    multiple partitions.
  * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references.
  * Display thread name instead of thread id and remove process name in stderr
    logging for "thread_id" LogTimeFormat.
  * Log IP address of bad incoming message to slurmctld.
  * If a user requests tasks, nodes and ntasks-per-node and
    tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node.
  * Release CPU "owner" file locks.
  * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency
    in spec file.
  * Allow QOS timelimit to override partition timelimit when EnforcePartLimits
    is set to all/any.
  * Make it so qsub will do a "basename" on a wrapped command for the output
    and error files.
  * Prevent job stuck in configuring state if slurmctld daemon restarted while
    PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation
    as needed.
  * Move test for job wait reason value of BurstBufferResources and
    BurstBufferStageIn later in the scheduling logic.
  * Document which srun options apply to only job, only step, or job and step
    allocations.
  * Use more compatible function to get thread name (>= 2.6.11).
  * Make it so the extern step uses a reverse tree when cleaning up.
  * If extern step doesn't get added into the proctrack plugin make sure the
    sleep is killed.
  * Add web links to Slurm Diamond Collectors (from Harvard University) and
    collectd (from EDF).
  * Add job_submit plugin for the "reboot" field.
  * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to
    job_submit/lua plugins.
  * Send in a -1 for a taskid into spank_task_post_fork for the extern_step.
  * MYSQL - Slightly better logic if a job completion comes in with an end time
    of 0.
  * If task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set
    soft memory limit to allocated memory limit (previously no soft limit was
    set).
  * Streamline when schedule() is called when running with message aggregation
    on batch script completes.
  * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t.
  * Document that persistent burst buffers can not be created or destroyed using
    the salloc or srun --bb options.
  * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and
    SLURM_JOB_RESERVATION environment variables for the salloc command.
    Document the same environment variables for the salloc, sbatch and srun
    commands in their man pages.
  * Fix issue where sacctmgr load cluster.cfg wouldn't load associations
    that had a partition in them.
  * Don't return the extern step from sstat by default.
  * In sstat print 'extern' instead of 4294967295 for the extern step.
  * Make advanced reservations work properly with core specialization.
  * slurmstepd modified to pre-load all relevant plugins at startup to avoid
    the possibility of modified plugins later resulting in inconsistent API
    or data structures and a failure of slurmstepd.
  * Export functions from parse_time.c in libslurm.so.
  * Export unit convert functions from slurm_protocol_api.c in libslurm.so.
  * Fix scancel to allow multiple steps from a job to be cancelled at once.
  * Update and expand upgrade guide (in Quick Start Administrator web page).
  * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run
    operation.
  * Ensure reported expected job start time is not in the past for pending jobs.
  * Add support for PMIx v2.

  Required for FATE#316379.

-------------------------------------------------------------------
Mon Oct 17 13:25:52 UTC 2016 - eich@suse.com

- Setting 'download_files' service to mode='localonly'
  and adding source tarball. (Required for Factory).

-------------------------------------------------------------------
Sat Oct 15 18:11:39 UTC 2016 - eich@suse.com

- version 15.08.7.1
  * Remove the 1024-character limit on lines in batch scripts.
  * task/affinity: Disable core-level task binding if more CPUs required than
    available cores.
  * Preemption/gang scheduling: If a job is suspended at slurmctld restart or
    reconfiguration time, then leave it suspended rather than resume+suspend.
  * Don't use lower weight nodes for job allocation when topology/tree used.
  * Don't allow user specified reservation names to disrupt the normal
    reservation sequence numbering scheme.
  * Avoid hard-link/copy of script/environment files for job arrays. Use the
    master job record file for all tasks of the job array.
    NOTE: Job arrays submitted to Slurm version 15.08.6 or later will fail if
    the slurmctld daemon is downgraded to an earlier version of Slurm.
  * In slurmctld log file, log duplicate job ID found by slurmd. Previously was
    being logged as prolog/epilog failure.
  * If a job is requeued while in the process of being launched, remove its
    job ID from slurmd's record of active jobs in order to avoid generating a
    duplicate job ID error when launched for the second time (which would
    drain the node).
  * Cleanup messages when handling job script and environment variables in
    older directory structure formats.
  * Prevent triggering gang scheduling within a partition if configured with
    PreemptType=partition_prio and PreemptMode=suspend,gang.
  * Decrease parallelism in job cancel request to prevent denial of service
    when cancelling huge numbers of jobs.
  * If all ephemeral ports are in use, try using other port numbers.
  * Prevent "scontrol update job" from updating jobs that have already finished.
  * Show requested TRES in "squeue -O tres" when job is pending.
  * Backfill scheduler: Test association and QOS node limits before reserving
    resources for pending job.
  * Many bug fixes.
- Use source services to download package.
- Fix code for new API of hwloc-2.0.
- package netloc_to_topology where available.
- Package documentation.

-------------------------------------------------------------------
Sun Nov 1 13:45:52 UTC 2015 - scorot@free.fr

- version 15.08.3
  * Many new features and bug fixes. See NEWS file
- update files list accordingly
- fix wrong end of line in some files

-------------------------------------------------------------------
Thu Aug 6 19:06:18 UTC 2015 - scorot@free.fr

- version 14.11.8
  * Many bug fixes. See NEWS file
- update files list accordingly

-------------------------------------------------------------------
Sun Nov 2 22:12:34 UTC 2014 - scorot@free.fr

- add missing systemd requirements
- add missing rclink

-------------------------------------------------------------------
Sun Nov 2 15:04:42 UTC 2014 - scorot@free.fr

- version 14.03.9
  * Many bug fixes. See NEWS file
- add systemd support

-------------------------------------------------------------------
Sat Jul 26 10:22:32 UTC 2014 - scorot@free.fr

- version 14.03.6
  * Added support for native Slurm operation on Cray systems
    (without ALPS).
  * Added partition configuration parameters AllowAccounts,
    AllowQOS, DenyAccounts and DenyQOS to provide greater control
    over use.
  * Added the ability to perform load based scheduling, allocating
    resources to jobs on the nodes with the largest number of idle
    CPUs.
  * Added support for reserving cores on a compute node for system
    services (core specialization)
  * Add mechanism for job_submit plugin to generate error message
    for srun, salloc or sbatch to stderr.
  * Support for Postgres database has long since been out of date
    and problematic, so it has been removed entirely. If you
    would like to use it the code still exists in <= 2.6, but will
    not be included in this and future versions of the code.
  * Added new structures and support for both server and cluster
    resources.
  * Significant performance improvements, especially with respect
    to job array support.
- update files list

-------------------------------------------------------------------
Sun Mar 16 15:59:01 UTC 2014 - scorot@free.fr

- update to version 2.6.7
  * Support for job arrays, which increases performance and ease of
    use for sets of similar jobs.
  * Job profiling capability added to record a wide variety of job
    characteristics for each task on a user configurable periodic
    basis. Data currently available includes CPU use, memory use,
    energy use, Infiniband network use, Lustre file system use, etc.
  * Support for MPICH2 using PMI2 communications interface with much
    greater scalability.
  * Prolog and epilog support for advanced reservations.
  * Much faster throughput for job step execution with --exclusive
    option. The srun process is notified when resources become
    available rather than periodic polling.
  * Support improved for Intel MIC (Many Integrated Core) processor.
  * Advanced reservations with hostname and core counts now support
    asymmetric reservations (e.g. specific different core count for
    each node).
  * External sensor plugin infrastructure added to record power
    consumption, temperature, etc.
  * Improved performance for high-throughput computing.
  * MapReduce+ support (launches ~1000x faster, runs ~10x faster).
  * Added "MaxCPUsPerNode" partition configuration parameter. This
    can be especially useful to schedule GPUs. For example a node
    can be associated with two Slurm partitions (e.g. "cpu" and
    "gpu") and the partition/queue "cpu" could be limited to only a
    subset of the node's CPUs, ensuring that one or more CPUs would
    be available to jobs in the "gpu" partition/queue.
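
  The arithmetic behind the "MaxCPUsPerNode" example can be sketched in a
  few lines of Python. This is a simplified model for illustration, not
  Slurm's scheduler logic; the function name and the node size of 16 CPUs
  with a limit of 12 are assumptions chosen for the example.

```python
def cpus_left_for_other_partitions(node_cpus, max_cpus_per_node):
    """CPUs on one node that a partition capped at max_cpus_per_node can
    never occupy, and which therefore stay free for other partitions.

    Simplified, hypothetical model; the real enforcement happens inside
    Slurm's select plugins.
    """
    usable = min(node_cpus, max_cpus_per_node)
    return node_cpus - usable


# Hypothetical node with 16 CPUs shared by the "cpu" and "gpu" partitions;
# the "cpu" partition is assumed to be configured with MaxCPUsPerNode=12.
print(cpus_left_for_other_partitions(16, 12))  # prints 4
```

  With these assumed numbers, four CPUs always remain for jobs in the "gpu"
  partition, however busy the "cpu" partition is.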
-------------------------------------------------------------------
Thu Jun 6 20:31:49 UTC 2013 - scorot@free.fr

- version 2.5.7
  * Fix for linking to the select/cray plugin to not give warning
    about undefined variable.
  * Add missing symbols to the xlator.h
  * Avoid placing pending jobs in AdminHold state due to backfill
    scheduler interactions with advanced reservation.
  * Accounting - make average by task not cpu.
  * POE - Correct logic to support poe option "-euidevice sn_all"
    and "-euidevice sn_single".
  * Accounting - Fix minor initialization error.
  * POE - Correct logic to support srun network instances count
    with POE.
  * POE - With the srun --launch-cmd option, report proper task
    count when the --cpus-per-task option is used without the
    --ntasks option.
  * POE - Fix logic binding tasks to CPUs.
  * sview - Fix race condition where new information could have
    slipped past the node tab and we didn't notice.
  * Accounting - Fix an invalid memory read when slurmctld sends
    data about start job to slurmdbd.
  * If a prolog or epilog failure occurs, drain the node rather
    than setting it down and killing all of its jobs.
  * Priority/multifactor - Avoid underflow in half-life calculation.
  * POE - pack missing variable to allow fanout (more than 32
    nodes)
  * Prevent clearing reason field for pending jobs. This bug was
    introduced in v2.5.5 (see "Reject job at submit time ...").
  * BGQ - Fix issue with preemption on sub-block jobs where a job
    would kill all preemptable jobs on the midplane instead of just
    the ones it needed to.
  * switch/nrt - Validate dynamic window allocation size.
  * BGQ - When --geo is requested do not impose the default
    conn_types.
  * RebootNode logic - Defers (rather than forgets) reboot request
    with job running on the node within a reservation.
  * switch/nrt - Correct network_id use logic. Correct support for
    user sn_all and sn_single options.
  * sched/backfill - Modify logic to reduce overhead under heavy
    load.
  * Fix job step allocation with --exclusive and --hostlist option.
  * Select/cons_res - Fix bug resulting in error of "cons_res: sync
    loop not progressing, holding job #"
  * checkpoint/blcr - Reset max_nodes from zero to NO_VAL on job
    restart.
  * launch/poe - Fix for hostlist file support with repeated host
    names.
  * priority/multifactor2 - Prevent possible divide by zero.
  * srun - Don't check for executable if --test-only flag is
    used.
  * energy - On a single node only use the last task for gathering
    energy, since we don't currently track energy usage per task
    (only per step). Otherwise we get double the energy.

-------------------------------------------------------------------
Sat Apr 6 11:13:17 UTC 2013 - scorot@free.fr

- version 2.5.4
  * Support for Intel® Many Integrated Core (MIC) processors.
  * User control over CPU frequency of each job step.
  * Recording power usage information for each job.
  * Advanced reservation of cores rather than whole nodes.
  * Integration with IBM's Parallel Environment including POE (Parallel
    Operating Environment) and NRT (Network Resource Table) API.
  * Highly optimized throughput for serial jobs in a new
    "select/serial" plugin.
  * CPU load information is available
  * Configurable number of CPUs available to jobs in each SLURM
    partition, which provides a mechanism to reserve CPUs for use
    with GPUs.

-------------------------------------------------------------------
Sat Nov 17 18:02:16 UTC 2012 - scorot@free.fr

- remove runlevel 4 from init script thanks to patch1
- fix self obsoletion of slurm-munge package
- use fdupes to remove duplicates
- spec file reformatting

-------------------------------------------------------------------
Sat Nov 17 17:30:11 UTC 2012 - scorot@free.fr

- put perl macro in a better place within install section

-------------------------------------------------------------------
Sat Nov 17 17:01:20 UTC 2012 - scorot@free.fr

- enable numa on x86_64 arch only

-------------------------------------------------------------------
Sat Nov 17 16:54:18 UTC 2012 - scorot@free.fr

- add numa and hwloc support
- fix rpath with patch0

-------------------------------------------------------------------
Fri Nov 16 21:46:49 UTC 2012 - scorot@free.fr

- fix perl module files list

-------------------------------------------------------------------
Mon Nov 5 21:48:52 UTC 2012 - scorot@free.fr

- use perl_process_packlist macro for the perl files cleanup
- fix some summaries length
- add cgroups directory and the example cgroup.release_common file

-------------------------------------------------------------------
Sat Nov 3 18:19:59 UTC 2012 - scorot@free.fr

- spec file cleanup

-------------------------------------------------------------------
Sat Nov 3 15:57:47 UTC 2012 - scorot@free.fr

- first package
