SHA256
1
0
forked from pool/slurm
slurm/slurm.spec

1302 lines
38 KiB
RPMSpec
Raw Normal View History

#
# spec file
#
# Copyright (c) 2021 SUSE LLC
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.
Accepting request 670322 from home:eeich:branches:network:cluster - Update to version 18.08.05: * Add mitigation for a potential heap overflow on 32-bit systems in xmalloc. (CVE-2019-6438, bsc#1123304). * Other fixes: + Backfill - If a job has a time_limit guess the end time of a job better if OverTimeLimit is Unlimited. + Fix "sacctmgr show events event=cluster" + Fix sacctmgr show runawayjobs from sibling cluster + Avoid bit offset of -1 in call to bit_nclear(). + Insure that "hbm" is a configured GresType on knl systems. + Fix NodeFeaturesPlugins=node_features/knl_generic to allow other gres other than knl. + cons_res: Prevent overflow on multiply. + Better debug for bad values in gres.conf. + Fix double accounting of energy at end of job. + Read gres.conf for cloud nodes on slurmctld. + Don't assume the first node of a job is the batch host when purging jobs from a node. + Better debugging when a job doesn't have a job_resrcs ptr. + Store ave watts in energy plugins. + Add XCC plugin for reading Lenovo Power. + Fix minor memory leak when scheduling rebootable nodes. + Fix debug2 prefix for sched log. + Fix printing correct SLURM_JOB_ACCOUNT_PACK_GROUP_* in env for a Het Job. + sbatch - search current working directory first for job script. + Make it so held jobs reset the AccrueTime and do not count against any AccrueTime limits. + Add SchedulerParameters option of bf_hetjob_prio=[min|avg|max] to alter the job sorting algorithm for scheduling heterogeneous jobs. + Fix initialization of assoc_mgr_locks and slurmctld_locks lock OBS-URL: https://build.opensuse.org/request/show/670322 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=83
2019-01-31 12:56:59 +01:00
# Please submit bugfixes or comments via https://bugs.opensuse.org/
#
# Check file META in sources: update so_version to (API_CURRENT - API_AGE)
Accepting request 915777 from home:mslacken:slurm_update - updated to 21.08.1, major changes: * A new "AccountingStoreFlags=job_script" option to store the job scripts directly in SlurmDBD. * Added "sacct -o SubmitLine" format option to get the submit line of a job/step. * Changes to the node state management so that nodes are marked as PLANNED instead of IDLE if the scheduler is still accumulating resources while waiting to launch a job on them. * RS256 token support in auth/jwt. * Overhaul of the cgroup subsystems to simplify operation, mitigate a number of inherent race conditions, and prepare for future cgroup v2 support. * Further improvements to cloud node power state management. * A new child process of the Slurm controller called "slurmscriptd" responsible for executing PrologSlurmctld and EpilogSlurmctld scripts, which significantly reduces performance issues associated with enabling those options. * A new burst_buffer/lua plugin allowing for site-specific asynchronous job data management. * Fixes to the job_container/tmpfs plugin to allow the slurmd process to be restarted while the job is running without issue. * Added json/yaml output to sacct, squeue, and sinfo commands. * Added a new node_features/helpers plugin to provide a generic way to change settings on a compute node across a reboot. * Added support for automatically detecting and broadcasting shared libraries for an executable launched with "srun --bcast". * Added initial OCI container execution support with a new --container option to sbatch and srun. * Improved "configless" support by allowing multiple control servers to be specified through the slurmd --conf-server option, and send additional configuration files at startup including cli_filter.lua. OBS-URL: https://build.opensuse.org/request/show/915777 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=184
2021-09-06 15:29:00 +02:00
%define so_version 37
Accepting request 942081 from home:mslacken:branches:network:cluster - update to 21.08.5 with following changes: * Fix issue where typeless GRES node updates were not immediately reflected. * Fix setting the default scrontab job working directory so that it's the home of the different user (*u <user>) and not that of root or SlurmUser editor. * Fix stepd not respecting SlurmdSyslogDebug. * Fix concurrency issue with squeue. * Fix job start time not being reset after launch when job is packed onto already booting node. * Fix updating SLURM_NODE_ALIASES for jobs packed onto powering up nodes. * Cray - Fix issues with starting hetjobs. * auth/jwks - Print fatal() message when jwks is configured but file could not be opened. * If sacctmgr has an association with an unknown qos as the default qos print 'UNKN*###' instead of leaving a blank name. * Correctly determine task count when giving --cpus-per-gpu, --gpus and *-ntasks-per-node without task count. * slurmctld - Fix places where the global last_job_update was not being set to the time of update when a job's reason and description were updated. * slurmctld - Fix case where a job submitted with more than one partition would not have its reason updated while waiting to start. * Fix memory leak in node feature rebooting. * Fix time limit permanetly set to 1 minute by backfill for job array tasks higher than the first with QOS NoReserve flag and PreemptMode configured. * Fix sacct -N to show jobs that started in the current second * Fix issue on running steps where both SLURM_NTASKS_PER_TRES and SLURM_NTASKS_PER_GPU are set. * Handle oversubscription request correctly when also requesting *-ntasks-per-tres. * Correctly detect when a step requests bad gres inside an allocation. * slurmstepd - Correct possible deadlock when UnkillableStepTimeout triggers. OBS-URL: https://build.opensuse.org/request/show/942081 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=195
2021-12-23 11:26:41 +01:00
%define ver 21.08.5
Accepting request 915777 from home:mslacken:slurm_update - updated to 21.08.1, major changes: * A new "AccountingStoreFlags=job_script" option to store the job scripts directly in SlurmDBD. * Added "sacct -o SubmitLine" format option to get the submit line of a job/step. * Changes to the node state management so that nodes are marked as PLANNED instead of IDLE if the scheduler is still accumulating resources while waiting to launch a job on them. * RS256 token support in auth/jwt. * Overhaul of the cgroup subsystems to simplify operation, mitigate a number of inherent race conditions, and prepare for future cgroup v2 support. * Further improvements to cloud node power state management. * A new child process of the Slurm controller called "slurmscriptd" responsible for executing PrologSlurmctld and EpilogSlurmctld scripts, which significantly reduces performance issues associated with enabling those options. * A new burst_buffer/lua plugin allowing for site-specific asynchronous job data management. * Fixes to the job_container/tmpfs plugin to allow the slurmd process to be restarted while the job is running without issue. * Added json/yaml output to sacct, squeue, and sinfo commands. * Added a new node_features/helpers plugin to provide a generic way to change settings on a compute node across a reboot. * Added support for automatically detecting and broadcasting shared libraries for an executable launched with "srun --bcast". * Added initial OCI container execution support with a new --container option to sbatch and srun. * Improved "configless" support by allowing multiple control servers to be specified through the slurmd --conf-server option, and send additional configuration files at startup including cli_filter.lua. OBS-URL: https://build.opensuse.org/request/show/915777 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=184
2021-09-06 15:29:00 +02:00
%define _ver _21_08
%define dl_ver %{ver}
# so-version is 0 and seems to be stable
%define pmi_so 0
%define nss_so 2
%define pmix_so 2
%define ver_major %(ver=%{version}; echo ${ver%.*})
Accepting request 538161 from home:eeich:branches:network:cluster - Updated to 17.02.9 to fix CVE-2017-15566 (bsc#1065697). Changes in 17.0.9 * When resuming powered down nodes, mark DOWN nodes right after ResumeTimeout has been reached (previous logic would wait about one minute longer). * Fix sreport not showing full column name for TRES Count. * Fix slurmdb_reservations_get() giving wrong usage data when job's spanned reservation that was modified. * Fix sreport reservation utilization report showing bad data. * Show all TRES' on a reservation in sreport reservation utilization report by default. * Fix sacctmgr show reservation handling "end" parameter. * Work around issue with sysmacros.h and gcc7 / glibc 2.25. * Fix layouts code to only allow setting a boolean. * Fix sbatch --wait to keep waiting even if a message timeout occurs. * CRAY - If configured with NodeFeatures=knl_cray and there are non-KNL nodes which include no features the slurmctld will abort without this patch when attemping strtok_r(NULL). * Fix regression in 17.02.7 which would run the spank_task_privileged as part of the slurmstepd instead of it's child process. * Fix security issue in Prolog and Epilog by always prepending SPANK_ to all user-set environment variables. CVE-2017-15566. Changes in 17.0.8: * Add 'slurmdbd:' to the accounting plugin to notify message is from dbd instead of local. * mpi/mvapich - Buffer being only partially cleared. No failures observed. * Fix for job --switch option on dragonfly network. * In salloc with --uid option, drop supplementary groups before changing UID. * jobcomp/elasticsearch - strip any trailing slashes from JobCompLoc. * jobcomp/elasticsearch - fix memory leak when transferring generated buffer. OBS-URL: https://build.opensuse.org/request/show/538161 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=29
2017-11-01 18:01:38 +01:00
%define pname slurm
%ifarch i586 %arm s390
ExclusiveArch: do_not_build
%endif
%if 0%{?suse_version} < 1315
ExclusiveArch: do_not_build
%endif
%if 0%{?sle_version} == 120200
%define base_ver 1702
%define nocheck 1
%endif
%if 0%{?sle_version} == 150000
%define base_ver 1711
%endif
%if 0%{?sle_version} == 150100
%define base_ver 1808
%endif
%if 0%{?sle_version} == 150200
%define base_ver 2002
%endif
%if 0%{?sle_version} == 150300
%define base_ver 2011
%endif
%if 0%{?suse_version} >= 1500
%define have_sysuser 1
%endif
%if 0%{?base_ver} > 0 && 0%{?base_ver} < %(echo %{_ver} | tr -d _)
%define upgrade 1
%endif
# Build with PMIx only for SLE >= 15.0 and TW
%if 0%{?sle_version} >= 150000 || 0%{suse_version} >= 1550
%{bcond_without pmix}
%else
%{bcond_with pmix}
%endif
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
%if 0%{?suse_version} >= 1220 || 0%{?sle_version} >= 120000
Accepting request 454272 from home:eeich:branches:network:cluster - Updated to 16.05.8.1 * Remove StoragePass from being printed out in the slurmdbd log at debug2 level. * Defer PATH search for task program until launch in slurmstepd. * Modify regression test1.89 to avoid leaving vestigial job. Also reduce logging to reduce likelyhood of Expect buffer overflow. * Do not PATH search for mult-prog launches if LaunchParamters=test_exec is enabled. * Fix for possible infinite loop in select/cons_res plugin when trying to satisfy a job's ntasks_per_core or socket specification. * If job is held for bad constraints make it so once updated the job doesn't go into JobAdminHeld. * sched/backfill - Fix logic to reserve resources for jobs that require a node reboot (i.e. to change KNL mode) in order to start. * When unpacking a node or front_end record from state and the protocol version is lower than the min version, set it to the min. * Remove redundant lookup for part_ptr when updating a reservation's nodes. * Fix memory and file descriptor leaks in slurmd daemon's sbcast logic. * Do not allocate specialized cores to jobs using the --exclusive option. * Cancel interactive job if Prolog failure with "PrologFlags=contain" or "PrologFlags=alloc" configured. Send new error prolog failure message to the salloc or srun command as needed. * Prevent possible out-of-bounds read in slurmstepd on an invalid #! line. * Fix check for PluginDir within slurmctld to work with multiple directories. * Cancel interactive jobs automatically on communication error to launching srun/salloc process. * Fix security issue caused by insecure file path handling triggered by the failure of a Prolog script. To exploit this a user needs to anticipate or cause the Prolog to fail for their job. CVE-2016-10030 (bsc#1018371). - Replace group/user add macros with function calls. - Disable building with netloc support: the netloc API is part of the devel branch of hwloc. Since this devel branch was included accidentally and has been reversed since, we need to disable this for the time being. - Conditionalized architecture specific pieces to support non-x86 architectures better. - Remove: unneeded 'BuildRequires: python' - Add: BuildRequires: freeipmi-devel BuildRequires: libibmad-devel BuildRequires: libibumad-devel so they are picked up by the slurm build. - Enable modifications from openHPC Project. - Enable lua API package build. - Add a recommends for slurm-munge to the slurm package: This is way, the munge auth method is available and slurm works out of the box. - Create /var/lib/slurm as StateSaveLocation directory. /tmp is dangerous. - Keep %{_libdir}/libpmi* and %{_libdir}/mpi_pmi2* on SUSE. OBS-URL: https://build.opensuse.org/request/show/454272 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=13
2017-02-02 21:23:02 +01:00
%define with_systemd 1
%endif
%if 0%{?suse_version:1} && 0%{?suse_version} <= 1140
%define comp_at %defattr(-,root,root)
%undefine python_ver
%else
%define have_json_c 1
%define python_ver 3
%if 0%{?sle_version} >= 150000 || 0%{?is_opensuse}
%define have_apache_rpm_macros 1
%endif
%endif
Accepting request 454272 from home:eeich:branches:network:cluster - Updated to 16.05.8.1 * Remove StoragePass from being printed out in the slurmdbd log at debug2 level. * Defer PATH search for task program until launch in slurmstepd. * Modify regression test1.89 to avoid leaving vestigial job. Also reduce logging to reduce likelyhood of Expect buffer overflow. * Do not PATH search for mult-prog launches if LaunchParamters=test_exec is enabled. * Fix for possible infinite loop in select/cons_res plugin when trying to satisfy a job's ntasks_per_core or socket specification. * If job is held for bad constraints make it so once updated the job doesn't go into JobAdminHeld. * sched/backfill - Fix logic to reserve resources for jobs that require a node reboot (i.e. to change KNL mode) in order to start. * When unpacking a node or front_end record from state and the protocol version is lower than the min version, set it to the min. * Remove redundant lookup for part_ptr when updating a reservation's nodes. * Fix memory and file descriptor leaks in slurmd daemon's sbcast logic. * Do not allocate specialized cores to jobs using the --exclusive option. * Cancel interactive job if Prolog failure with "PrologFlags=contain" or "PrologFlags=alloc" configured. Send new error prolog failure message to the salloc or srun command as needed. * Prevent possible out-of-bounds read in slurmstepd on an invalid #! line. * Fix check for PluginDir within slurmctld to work with multiple directories. * Cancel interactive jobs automatically on communication error to launching srun/salloc process. * Fix security issue caused by insecure file path handling triggered by the failure of a Prolog script. To exploit this a user needs to anticipate or cause the Prolog to fail for their job. CVE-2016-10030 (bsc#1018371). - Replace group/user add macros with function calls. - Disable building with netloc support: the netloc API is part of the devel branch of hwloc. Since this devel branch was included accidentally and has been reversed since, we need to disable this for the time being. - Conditionalized architecture specific pieces to support non-x86 architectures better. - Remove: unneeded 'BuildRequires: python' - Add: BuildRequires: freeipmi-devel BuildRequires: libibmad-devel BuildRequires: libibumad-devel so they are picked up by the slurm build. - Enable modifications from openHPC Project. - Enable lua API package build. - Add a recommends for slurm-munge to the slurm package: This is way, the munge auth method is available and slurm works out of the box. - Create /var/lib/slurm as StateSaveLocation directory. /tmp is dangerous. - Keep %{_libdir}/libpmi* and %{_libdir}/mpi_pmi2* on SUSE. OBS-URL: https://build.opensuse.org/request/show/454272 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=13
2017-02-02 21:23:02 +01:00
%if 0%{?sle_version} >= 150000 || 0%{?is_opensuse}
%define have_http_parser 1
%endif
%if 0%{?have_http_parser} && 0%{?have_json_c}
%define build_slurmrestd 1
%endif
%if 0
%define have_netloc 1
%endif
%if 0%{?is_opensuse} && 0%{!?sle_version:1}
%define is_factory 1
%endif
%if 0%{?is_factory} || 0%{?sle_version} >= 150000
%define have_hdf5 1
%define have_boolean_deps 1
%define have_lz4 1
%define have_firewalld 1
%endif
Accepting request 454272 from home:eeich:branches:network:cluster - Updated to 16.05.8.1 * Remove StoragePass from being printed out in the slurmdbd log at debug2 level. * Defer PATH search for task program until launch in slurmstepd. * Modify regression test1.89 to avoid leaving vestigial job. Also reduce logging to reduce likelyhood of Expect buffer overflow. * Do not PATH search for mult-prog launches if LaunchParamters=test_exec is enabled. * Fix for possible infinite loop in select/cons_res plugin when trying to satisfy a job's ntasks_per_core or socket specification. * If job is held for bad constraints make it so once updated the job doesn't go into JobAdminHeld. * sched/backfill - Fix logic to reserve resources for jobs that require a node reboot (i.e. to change KNL mode) in order to start. * When unpacking a node or front_end record from state and the protocol version is lower than the min version, set it to the min. * Remove redundant lookup for part_ptr when updating a reservation's nodes. * Fix memory and file descriptor leaks in slurmd daemon's sbcast logic. * Do not allocate specialized cores to jobs using the --exclusive option. * Cancel interactive job if Prolog failure with "PrologFlags=contain" or "PrologFlags=alloc" configured. Send new error prolog failure message to the salloc or srun command as needed. * Prevent possible out-of-bounds read in slurmstepd on an invalid #! line. * Fix check for PluginDir within slurmctld to work with multiple directories. * Cancel interactive jobs automatically on communication error to launching srun/salloc process. * Fix security issue caused by insecure file path handling triggered by the failure of a Prolog script. To exploit this a user needs to anticipate or cause the Prolog to fail for their job. CVE-2016-10030 (bsc#1018371). - Replace group/user add macros with function calls. - Disable building with netloc support: the netloc API is part of the devel branch of hwloc. Since this devel branch was included accidentally and has been reversed since, we need to disable this for the time being. - Conditionalized architecture specific pieces to support non-x86 architectures better. - Remove: unneeded 'BuildRequires: python' - Add: BuildRequires: freeipmi-devel BuildRequires: libibmad-devel BuildRequires: libibumad-devel so they are picked up by the slurm build. - Enable modifications from openHPC Project. - Enable lua API package build. - Add a recommends for slurm-munge to the slurm package: This is way, the munge auth method is available and slurm works out of the box. - Create /var/lib/slurm as StateSaveLocation directory. /tmp is dangerous. - Keep %{_libdir}/libpmi* and %{_libdir}/mpi_pmi2* on SUSE. OBS-URL: https://build.opensuse.org/request/show/454272 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=13
2017-02-02 21:23:02 +01:00
%ifarch x86_64
%define have_libnuma 1
%else
%ifarch %{ix86}
Accepting request 454272 from home:eeich:branches:network:cluster - Updated to 16.05.8.1 * Remove StoragePass from being printed out in the slurmdbd log at debug2 level. * Defer PATH search for task program until launch in slurmstepd. * Modify regression test1.89 to avoid leaving vestigial job. Also reduce logging to reduce likelyhood of Expect buffer overflow. * Do not PATH search for mult-prog launches if LaunchParamters=test_exec is enabled. * Fix for possible infinite loop in select/cons_res plugin when trying to satisfy a job's ntasks_per_core or socket specification. * If job is held for bad constraints make it so once updated the job doesn't go into JobAdminHeld. * sched/backfill - Fix logic to reserve resources for jobs that require a node reboot (i.e. to change KNL mode) in order to start. * When unpacking a node or front_end record from state and the protocol version is lower than the min version, set it to the min. * Remove redundant lookup for part_ptr when updating a reservation's nodes. * Fix memory and file descriptor leaks in slurmd daemon's sbcast logic. * Do not allocate specialized cores to jobs using the --exclusive option. * Cancel interactive job if Prolog failure with "PrologFlags=contain" or "PrologFlags=alloc" configured. Send new error prolog failure message to the salloc or srun command as needed. * Prevent possible out-of-bounds read in slurmstepd on an invalid #! line. * Fix check for PluginDir within slurmctld to work with multiple directories. * Cancel interactive jobs automatically on communication error to launching srun/salloc process. * Fix security issue caused by insecure file path handling triggered by the failure of a Prolog script. To exploit this a user needs to anticipate or cause the Prolog to fail for their job. CVE-2016-10030 (bsc#1018371). - Replace group/user add macros with function calls. - Disable building with netloc support: the netloc API is part of the devel branch of hwloc. Since this devel branch was included accidentally and has been reversed since, we need to disable this for the time being. - Conditionalized architecture specific pieces to support non-x86 architectures better. - Remove: unneeded 'BuildRequires: python' - Add: BuildRequires: freeipmi-devel BuildRequires: libibmad-devel BuildRequires: libibumad-devel so they are picked up by the slurm build. - Enable modifications from openHPC Project. - Enable lua API package build. - Add a recommends for slurm-munge to the slurm package: This is way, the munge auth method is available and slurm works out of the box. - Create /var/lib/slurm as StateSaveLocation directory. /tmp is dangerous. - Keep %{_libdir}/libpmi* and %{_libdir}/mpi_pmi2* on SUSE. OBS-URL: https://build.opensuse.org/request/show/454272 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=13
2017-02-02 21:23:02 +01:00
%if 0%{?sle_version} >= 120200
%define have_libnuma 1
%endif
%endif
Accepting request 435622 from home:eeich:branches:network:cluster - version 15.08.7.1 * Remove the 1024-character limit on lines in batch scripts. task/affinity: Disable core-level task binding if more CPUs required than available cores. * Preemption/gang scheduling: If a job is suspended at slurmctld restart or reconfiguration time, then leave it suspended rather than resume+suspend. * Don't use lower weight nodes for job allocation when topology/tree used. * Don't allow user specified reservation names to disrupt the normal reservation sequeuece numbering scheme. * Avoid hard-link/copy of script/environment files for job arrays. Use the master job record file for all tasks of the job array. NOTE: Job arrays submitted to Slurm version 15.08.6 or later will fail if the slurmctld daemon is downgraded to an earlier version of Slurm. * In slurmctld log file, log duplicate job ID found by slurmd. Previously was being logged as prolog/epilog failure. * If a job is requeued while in the process of being launch, remove it's job ID from slurmd's record of active jobs in order to avoid generating a duplicate job ID error when launched for the second time (which would drain the node). * Cleanup messages when handling job script and environment variables in older directory structure formats. * Prevent triggering gang scheduling within a partition if configured with PreemptType=partition_prio and PreemptMode=suspend,gang. * Decrease parallelism in job cancel request to prevent denial of service when cancelling huge numbers of jobs. * If all ephemeral ports are in use, try using other port numbers. * Prevent "scontrol update job" from updating jobs that have already finished. * Show requested TRES in "squeue -O tres" when job is pending. * Backfill scheduler: Test association and QOS node limits before reserving resources for pending job. * Many bug fixes. - Use source services to download package. - Fix code for new API of hwloc-2.0. - package netloc_to_topology where avialable. - Package documentation. OBS-URL: https://build.opensuse.org/request/show/435622 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=10
2016-10-16 21:51:20 +02:00
%endif
%if 0%{?with_systemd}
%define slurm_u %pname
%define slurm_g %pname
%else
%define slurm_u daemon
%define slurm_g root
%endif
%define slurm_uid 120
%define slurmdir %{_sysconfdir}/slurm
%define slurmdescr "SLURM workload manager"
%define libslurm libslurm%{so_version}
%{!?_rundir:%define _rundir /var/run}
%if !0%{?_pam_moduledir:1}
%define _pam_moduledir /%_lib
%endif
Name: %{pname}%{?upgrade:%{_ver}}
Accepting request 538161 from home:eeich:branches:network:cluster - Updated to 17.02.9 to fix CVE-2017-15566 (bsc#1065697). Changes in 17.0.9 * When resuming powered down nodes, mark DOWN nodes right after ResumeTimeout has been reached (previous logic would wait about one minute longer). * Fix sreport not showing full column name for TRES Count. * Fix slurmdb_reservations_get() giving wrong usage data when job's spanned reservation that was modified. * Fix sreport reservation utilization report showing bad data. * Show all TRES' on a reservation in sreport reservation utilization report by default. * Fix sacctmgr show reservation handling "end" parameter. * Work around issue with sysmacros.h and gcc7 / glibc 2.25. * Fix layouts code to only allow setting a boolean. * Fix sbatch --wait to keep waiting even if a message timeout occurs. * CRAY - If configured with NodeFeatures=knl_cray and there are non-KNL nodes which include no features the slurmctld will abort without this patch when attemping strtok_r(NULL). * Fix regression in 17.02.7 which would run the spank_task_privileged as part of the slurmstepd instead of it's child process. * Fix security issue in Prolog and Epilog by always prepending SPANK_ to all user-set environment variables. CVE-2017-15566. Changes in 17.0.8: * Add 'slurmdbd:' to the accounting plugin to notify message is from dbd instead of local. * mpi/mvapich - Buffer being only partially cleared. No failures observed. * Fix for job --switch option on dragonfly network. * In salloc with --uid option, drop supplementary groups before changing UID. * jobcomp/elasticsearch - strip any trailing slashes from JobCompLoc. * jobcomp/elasticsearch - fix memory leak when transferring generated buffer. OBS-URL: https://build.opensuse.org/request/show/538161 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=29
2017-11-01 18:01:38 +01:00
Version: %{ver}
Release: 0
Summary: Simple Linux Utility for Resource Management
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
License: SUSE-GPL-2.0-with-openssl-exception
Group: Productivity/Clustering/Computing
URL: https://www.schedmd.com
Source: https://download.schedmd.com/slurm/%{pname}-%{dl_ver}.tar.bz2
Source1: slurm-rpmlintrc
Source10: https://raw.githubusercontent.com/openSUSE/hpc/10c105e/files/slurm/slurmd.xml
Source11: https://raw.githubusercontent.com/openSUSE/hpc/10c105e/files/slurm/slurmctld.xml
Source12: https://raw.githubusercontent.com/openSUSE/hpc/10c105e/files/slurm/slurmdbd.xml
Patch0: Remove-rpath-from-build.patch
Patch1: slurm-2.4.4-init.patch
Patch2: pam_slurm-Initialize-arrays-and-pass-sizes.patch
Accepting request 874647 from home:mslacken:branches:network:cluster - Udpate to 20.11.04 * Fix node selection for advanced reservations with features. * mpi/pmix: Handle pipe failure better when using ucx. * mpi/pmix: include PMIX_NODEID for each process entry. * Fix job getting rejected after being requeued on same node that died. * job_submit/lua - add "network" field. * Fix situations when a reoccuring reservation could erroneously skip a period. * Ensure that a reservations [pro|epi]log are ran on reoccuring reservations. * Fix threads-per-core memory allocation issue when using CR_CPU_MEMORY. * Fix scheduling issue with --gpus. * Fix gpu allocations that request --cpus-per-task. * mpi/pmix: fixed print messages for all PMIXP_* macros * Add mapping for XCPU to --signal option. * Fix regression in 20.11 that prevented a full pass of the main scheduler from ever executing. * Work around a glibc bug in which "0" is incorrectly printed as "nan" which will result in corrupted association state on restart. * Fix regression in 20.11 which made slurmd incorrectly attempt to find the parent slurmd address when not applicable and send incorrect reverse*tree info to the slurmstepd. * Fix cgroup ns detection when using containers (e.g. LXC or Docker). * scrontab - change temporary file handling to work with emacs. - Removed check-for-lipmix.so.MAJOR.patch - Added: load-pmix-major-version.patch OBS-URL: https://build.opensuse.org/request/show/874647 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=173
2021-02-24 10:49:16 +01:00
Patch3: load-pmix-major-version.patch
%{?upgrade:Provides: %{pname} = %{version}}
%{?upgrade:Conflicts: %{pname}}
Requires: %{name}-config = %{version}
%if 0%{?have_boolean_deps}
Recommends: (%{name}-munge = %version if munge)
%else
Recommends: %{name}-munge = %version
%endif
Requires(pre): %{name}-node = %{version}
Recommends: %{name}-config-man = %{version}
Recommends: %{name}-doc = %{version}
Accepting request 874647 from home:mslacken:branches:network:cluster - Udpate to 20.11.04 * Fix node selection for advanced reservations with features. * mpi/pmix: Handle pipe failure better when using ucx. * mpi/pmix: include PMIX_NODEID for each process entry. * Fix job getting rejected after being requeued on same node that died. * job_submit/lua - add "network" field. * Fix situations when a reoccuring reservation could erroneously skip a period. * Ensure that a reservations [pro|epi]log are ran on reoccuring reservations. * Fix threads-per-core memory allocation issue when using CR_CPU_MEMORY. * Fix scheduling issue with --gpus. * Fix gpu allocations that request --cpus-per-task. * mpi/pmix: fixed print messages for all PMIXP_* macros * Add mapping for XCPU to --signal option. * Fix regression in 20.11 that prevented a full pass of the main scheduler from ever executing. * Work around a glibc bug in which "0" is incorrectly printed as "nan" which will result in corrupted association state on restart. * Fix regression in 20.11 which made slurmd incorrectly attempt to find the parent slurmd address when not applicable and send incorrect reverse*tree info to the slurmstepd. * Fix cgroup ns detection when using containers (e.g. LXC or Docker). * scrontab - change temporary file handling to work with emacs. - Removed check-for-lipmix.so.MAJOR.patch - Added: load-pmix-major-version.patch OBS-URL: https://build.opensuse.org/request/show/874647 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=173
2021-02-24 10:49:16 +01:00
BuildRequires: autoconf
BuildRequires: automake
BuildRequires: coreutils
BuildRequires: fdupes
%{?have_firewalld:BuildRequires: firewalld}
BuildRequires: gcc-c++
BuildRequires: gtk2-devel
%if 0%{?have_hdf5}
BuildRequires: hdf5-devel
%endif
BuildRequires: libbitmask-devel
BuildRequires: libcpuset-devel
BuildRequires: python%{?python_ver}
Accepting request 454272 from home:eeich:branches:network:cluster - Updated to 16.05.8.1 * Remove StoragePass from being printed out in the slurmdbd log at debug2 level. * Defer PATH search for task program until launch in slurmstepd. * Modify regression test1.89 to avoid leaving vestigial job. Also reduce logging to reduce likelyhood of Expect buffer overflow. * Do not PATH search for mult-prog launches if LaunchParamters=test_exec is enabled. * Fix for possible infinite loop in select/cons_res plugin when trying to satisfy a job's ntasks_per_core or socket specification. * If job is held for bad constraints make it so once updated the job doesn't go into JobAdminHeld. * sched/backfill - Fix logic to reserve resources for jobs that require a node reboot (i.e. to change KNL mode) in order to start. * When unpacking a node or front_end record from state and the protocol version is lower than the min version, set it to the min. * Remove redundant lookup for part_ptr when updating a reservation's nodes. * Fix memory and file descriptor leaks in slurmd daemon's sbcast logic. * Do not allocate specialized cores to jobs using the --exclusive option. * Cancel interactive job if Prolog failure with "PrologFlags=contain" or "PrologFlags=alloc" configured. Send new error prolog failure message to the salloc or srun command as needed. * Prevent possible out-of-bounds read in slurmstepd on an invalid #! line. * Fix check for PluginDir within slurmctld to work with multiple directories. * Cancel interactive jobs automatically on communication error to launching srun/salloc process. * Fix security issue caused by insecure file path handling triggered by the failure of a Prolog script. To exploit this a user needs to anticipate or cause the Prolog to fail for their job. CVE-2016-10030 (bsc#1018371). - Replace group/user add macros with function calls. - Disable building with netloc support: the netloc API is part of the devel branch of hwloc. Since this devel branch was included accidentally and has been reversed since, we need to disable this for the time being. - Conditionalized architecture specific pieces to support non-x86 architectures better. - Remove: unneeded 'BuildRequires: python' - Add: BuildRequires: freeipmi-devel BuildRequires: libibmad-devel BuildRequires: libibumad-devel so they are picked up by the slurm build. - Enable modifications from openHPC Project. - Enable lua API package build. - Add a recommends for slurm-munge to the slurm package: This is way, the munge auth method is available and slurm works out of the box. - Create /var/lib/slurm as StateSaveLocation directory. /tmp is dangerous. - Keep %{_libdir}/libpmi* and %{_libdir}/mpi_pmi2* on SUSE. OBS-URL: https://build.opensuse.org/request/show/454272 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=13
2017-02-02 21:23:02 +01:00
%if 0%{?have_libnuma}
BuildRequires: libnuma-devel
%endif
BuildRequires: mysql-devel >= 5.0.0
BuildRequires: ncurses-devel
%{?with_pmix:BuildRequires: pmix-devel}
BuildRequires: openssl-devel >= 0.9.6
BuildRequires: pkgconfig
BuildRequires: readline-devel
Accepting request 454272 from home:eeich:branches:network:cluster - Updated to 16.05.8.1 * Remove StoragePass from being printed out in the slurmdbd log at debug2 level. * Defer PATH search for task program until launch in slurmstepd. * Modify regression test1.89 to avoid leaving vestigial job. Also reduce logging to reduce likelyhood of Expect buffer overflow. * Do not PATH search for mult-prog launches if LaunchParamters=test_exec is enabled. * Fix for possible infinite loop in select/cons_res plugin when trying to satisfy a job's ntasks_per_core or socket specification. * If job is held for bad constraints make it so once updated the job doesn't go into JobAdminHeld. * sched/backfill - Fix logic to reserve resources for jobs that require a node reboot (i.e. to change KNL mode) in order to start. * When unpacking a node or front_end record from state and the protocol version is lower than the min version, set it to the min. * Remove redundant lookup for part_ptr when updating a reservation's nodes. * Fix memory and file descriptor leaks in slurmd daemon's sbcast logic. * Do not allocate specialized cores to jobs using the --exclusive option. * Cancel interactive job if Prolog failure with "PrologFlags=contain" or "PrologFlags=alloc" configured. Send new error prolog failure message to the salloc or srun command as needed. * Prevent possible out-of-bounds read in slurmstepd on an invalid #! line. * Fix check for PluginDir within slurmctld to work with multiple directories. * Cancel interactive jobs automatically on communication error to launching srun/salloc process. * Fix security issue caused by insecure file path handling triggered by the failure of a Prolog script. To exploit this a user needs to anticipate or cause the Prolog to fail for their job. CVE-2016-10030 (bsc#1018371). - Replace group/user add macros with function calls. - Disable building with netloc support: the netloc API is part of the devel branch of hwloc. Since this devel branch was included accidentally and has been reversed since, we need to disable this for the time being. - Conditionalized architecture specific pieces to support non-x86 architectures better. - Remove: unneeded 'BuildRequires: python' - Add: BuildRequires: freeipmi-devel BuildRequires: libibmad-devel BuildRequires: libibumad-devel so they are picked up by the slurm build. - Enable modifications from openHPC Project. - Enable lua API package build. - Add a recommends for slurm-munge to the slurm package: This is way, the munge auth method is available and slurm works out of the box. - Create /var/lib/slurm as StateSaveLocation directory. /tmp is dangerous. - Keep %{_libdir}/libpmi* and %{_libdir}/mpi_pmi2* on SUSE. OBS-URL: https://build.opensuse.org/request/show/454272 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=13
2017-02-02 21:23:02 +01:00
%if 0%{?suse_version} > 1310 || 0%{?sle_version}
%if 0%{?sle_version} >= 120400 && 0%{?sle_version} < 150000
BuildRequires: infiniband-diags-devel
%else
Accepting request 454272 from home:eeich:branches:network:cluster - Updated to 16.05.8.1 * Remove StoragePass from being printed out in the slurmdbd log at debug2 level. * Defer PATH search for task program until launch in slurmstepd. * Modify regression test1.89 to avoid leaving vestigial job. Also reduce logging to reduce likelyhood of Expect buffer overflow. * Do not PATH search for mult-prog launches if LaunchParamters=test_exec is enabled. * Fix for possible infinite loop in select/cons_res plugin when trying to satisfy a job's ntasks_per_core or socket specification. * If job is held for bad constraints make it so once updated the job doesn't go into JobAdminHeld. * sched/backfill - Fix logic to reserve resources for jobs that require a node reboot (i.e. to change KNL mode) in order to start. * When unpacking a node or front_end record from state and the protocol version is lower than the min version, set it to the min. * Remove redundant lookup for part_ptr when updating a reservation's nodes. * Fix memory and file descriptor leaks in slurmd daemon's sbcast logic. * Do not allocate specialized cores to jobs using the --exclusive option. * Cancel interactive job if Prolog failure with "PrologFlags=contain" or "PrologFlags=alloc" configured. Send new error prolog failure message to the salloc or srun command as needed. * Prevent possible out-of-bounds read in slurmstepd on an invalid #! line. * Fix check for PluginDir within slurmctld to work with multiple directories. * Cancel interactive jobs automatically on communication error to launching srun/salloc process. * Fix security issue caused by insecure file path handling triggered by the failure of a Prolog script. To exploit this a user needs to anticipate or cause the Prolog to fail for their job. CVE-2016-10030 (bsc#1018371). - Replace group/user add macros with function calls. - Disable building with netloc support: the netloc API is part of the devel branch of hwloc. Since this devel branch was included accidentally and has been reversed since, we need to disable this for the time being. - Conditionalized architecture specific pieces to support non-x86 architectures better. - Remove: unneeded 'BuildRequires: python' - Add: BuildRequires: freeipmi-devel BuildRequires: libibmad-devel BuildRequires: libibumad-devel so they are picked up by the slurm build. - Enable modifications from openHPC Project. - Enable lua API package build. - Add a recommends for slurm-munge to the slurm package: This is way, the munge auth method is available and slurm works out of the box. - Create /var/lib/slurm as StateSaveLocation directory. /tmp is dangerous. - Keep %{_libdir}/libpmi* and %{_libdir}/mpi_pmi2* on SUSE. OBS-URL: https://build.opensuse.org/request/show/454272 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=13
2017-02-02 21:23:02 +01:00
BuildRequires: libibmad-devel
%endif
Accepting request 454272 from home:eeich:branches:network:cluster - Updated to 16.05.8.1 * Remove StoragePass from being printed out in the slurmdbd log at debug2 level. * Defer PATH search for task program until launch in slurmstepd. * Modify regression test1.89 to avoid leaving vestigial job. Also reduce logging to reduce likelyhood of Expect buffer overflow. * Do not PATH search for mult-prog launches if LaunchParamters=test_exec is enabled. * Fix for possible infinite loop in select/cons_res plugin when trying to satisfy a job's ntasks_per_core or socket specification. * If job is held for bad constraints make it so once updated the job doesn't go into JobAdminHeld. * sched/backfill - Fix logic to reserve resources for jobs that require a node reboot (i.e. to change KNL mode) in order to start. * When unpacking a node or front_end record from state and the protocol version is lower than the min version, set it to the min. * Remove redundant lookup for part_ptr when updating a reservation's nodes. * Fix memory and file descriptor leaks in slurmd daemon's sbcast logic. * Do not allocate specialized cores to jobs using the --exclusive option. * Cancel interactive job if Prolog failure with "PrologFlags=contain" or "PrologFlags=alloc" configured. Send new error prolog failure message to the salloc or srun command as needed. * Prevent possible out-of-bounds read in slurmstepd on an invalid #! line. * Fix check for PluginDir within slurmctld to work with multiple directories. * Cancel interactive jobs automatically on communication error to launching srun/salloc process. * Fix security issue caused by insecure file path handling triggered by the failure of a Prolog script. To exploit this a user needs to anticipate or cause the Prolog to fail for their job. CVE-2016-10030 (bsc#1018371). - Replace group/user add macros with function calls. - Disable building with netloc support: the netloc API is part of the devel branch of hwloc. Since this devel branch was included accidentally and has been reversed since, we need to disable this for the time being. - Conditionalized architecture specific pieces to support non-x86 architectures better. - Remove: unneeded 'BuildRequires: python' - Add: BuildRequires: freeipmi-devel BuildRequires: libibmad-devel BuildRequires: libibumad-devel so they are picked up by the slurm build. - Enable modifications from openHPC Project. - Enable lua API package build. - Add a recommends for slurm-munge to the slurm package: This is way, the munge auth method is available and slurm works out of the box. - Create /var/lib/slurm as StateSaveLocation directory. /tmp is dangerous. - Keep %{_libdir}/libpmi* and %{_libdir}/mpi_pmi2* on SUSE. OBS-URL: https://build.opensuse.org/request/show/454272 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=13
2017-02-02 21:23:02 +01:00
BuildRequires: libibumad-devel
%endif
%if 0%{?suse_version} > 1140
BuildRequires: libhwloc-devel
%ifarch %{ix86} x86_64
BuildRequires: freeipmi-devel
%endif
%endif
BuildRequires: libcurl-devel
%if 0%{?have_json_c}
BuildRequires: libjson-c-devel
%endif
%if 0%{?have_lz4}
BuildRequires: liblz4-devel
%endif
BuildRequires: libssh2-devel
BuildRequires: libyaml-devel
BuildRequires: rrdtool-devel
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
%if 0%{?with_systemd}
%{?have_sysuser:BuildRequires: sysuser-tools}
%{?systemd_ordering}
BuildRequires: dejagnu
BuildRequires: pkgconfig(systemd)
%else
Requires(post): %insserv_prereq %fillup_prereq
%endif
BuildRoot: %{_tmppath}/%{name}-%{version}-build
Obsoletes: slurm-sched-wiki < %{version}
Obsoletes: slurmdb-direct < %{version}
%description
SLURM is a fault-tolerant scalable cluster management and job
scheduling system for Linux clusters containing up to 65,536 nodes.
Components include machine status, partition management, job
management, scheduling and accounting modules.
Accepting request 435622 from home:eeich:branches:network:cluster - version 15.08.7.1 * Remove the 1024-character limit on lines in batch scripts. task/affinity: Disable core-level task binding if more CPUs required than available cores. * Preemption/gang scheduling: If a job is suspended at slurmctld restart or reconfiguration time, then leave it suspended rather than resume+suspend. * Don't use lower weight nodes for job allocation when topology/tree used. * Don't allow user specified reservation names to disrupt the normal reservation sequeuece numbering scheme. * Avoid hard-link/copy of script/environment files for job arrays. Use the master job record file for all tasks of the job array. NOTE: Job arrays submitted to Slurm version 15.08.6 or later will fail if the slurmctld daemon is downgraded to an earlier version of Slurm. * In slurmctld log file, log duplicate job ID found by slurmd. Previously was being logged as prolog/epilog failure. * If a job is requeued while in the process of being launch, remove it's job ID from slurmd's record of active jobs in order to avoid generating a duplicate job ID error when launched for the second time (which would drain the node). * Cleanup messages when handling job script and environment variables in older directory structure formats. * Prevent triggering gang scheduling within a partition if configured with PreemptType=partition_prio and PreemptMode=suspend,gang. * Decrease parallelism in job cancel request to prevent denial of service when cancelling huge numbers of jobs. * If all ephemeral ports are in use, try using other port numbers. * Prevent "scontrol update job" from updating jobs that have already finished. * Show requested TRES in "squeue -O tres" when job is pending. * Backfill scheduler: Test association and QOS node limits before reserving resources for pending job. * Many bug fixes. - Use source services to download package. - Fix code for new API of hwloc-2.0. - package netloc_to_topology where avialable. - Package documentation. OBS-URL: https://build.opensuse.org/request/show/435622 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=10
2016-10-16 21:51:20 +02:00
%package doc
Summary: Documentation for SLURM
Group: Documentation/HTML
%{?upgrade:Provides: %{pname}-doc = %{version}}
%{?upgrade:Conflicts: %{pname}-doc}
Accepting request 435622 from home:eeich:branches:network:cluster - version 15.08.7.1 * Remove the 1024-character limit on lines in batch scripts. task/affinity: Disable core-level task binding if more CPUs required than available cores. * Preemption/gang scheduling: If a job is suspended at slurmctld restart or reconfiguration time, then leave it suspended rather than resume+suspend. * Don't use lower weight nodes for job allocation when topology/tree used. * Don't allow user specified reservation names to disrupt the normal reservation sequeuece numbering scheme. * Avoid hard-link/copy of script/environment files for job arrays. Use the master job record file for all tasks of the job array. NOTE: Job arrays submitted to Slurm version 15.08.6 or later will fail if the slurmctld daemon is downgraded to an earlier version of Slurm. * In slurmctld log file, log duplicate job ID found by slurmd. Previously was being logged as prolog/epilog failure. * If a job is requeued while in the process of being launch, remove it's job ID from slurmd's record of active jobs in order to avoid generating a duplicate job ID error when launched for the second time (which would drain the node). * Cleanup messages when handling job script and environment variables in older directory structure formats. * Prevent triggering gang scheduling within a partition if configured with PreemptType=partition_prio and PreemptMode=suspend,gang. * Decrease parallelism in job cancel request to prevent denial of service when cancelling huge numbers of jobs. * If all ephemeral ports are in use, try using other port numbers. * Prevent "scontrol update job" from updating jobs that have already finished. * Show requested TRES in "squeue -O tres" when job is pending. * Backfill scheduler: Test association and QOS node limits before reserving resources for pending job. * Many bug fixes. - Use source services to download package. - Fix code for new API of hwloc-2.0. - package netloc_to_topology where avialable. - Package documentation. OBS-URL: https://build.opensuse.org/request/show/435622 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=10
2016-10-16 21:51:20 +02:00
%package webdoc
Summary: Set up SLURM Documentation Server
Group: Productivity/Clustering/Computing
%if 0%{?have_apache_rpm_macros}
BuildRequires: apache-rpm-macros
%else
%define apache_sysconfdir /etc/apache2
%endif
Requires: slurm-doc = %{version}
Requires(pre): apache2
%description webdoc
Set up HTTP server for SLURM configuration.
Accepting request 435622 from home:eeich:branches:network:cluster - version 15.08.7.1 * Remove the 1024-character limit on lines in batch scripts. task/affinity: Disable core-level task binding if more CPUs required than available cores. * Preemption/gang scheduling: If a job is suspended at slurmctld restart or reconfiguration time, then leave it suspended rather than resume+suspend. * Don't use lower weight nodes for job allocation when topology/tree used. * Don't allow user specified reservation names to disrupt the normal reservation sequeuece numbering scheme. * Avoid hard-link/copy of script/environment files for job arrays. Use the master job record file for all tasks of the job array. NOTE: Job arrays submitted to Slurm version 15.08.6 or later will fail if the slurmctld daemon is downgraded to an earlier version of Slurm. * In slurmctld log file, log duplicate job ID found by slurmd. Previously was being logged as prolog/epilog failure. * If a job is requeued while in the process of being launch, remove it's job ID from slurmd's record of active jobs in order to avoid generating a duplicate job ID error when launched for the second time (which would drain the node). * Cleanup messages when handling job script and environment variables in older directory structure formats. * Prevent triggering gang scheduling within a partition if configured with PreemptType=partition_prio and PreemptMode=suspend,gang. * Decrease parallelism in job cancel request to prevent denial of service when cancelling huge numbers of jobs. * If all ephemeral ports are in use, try using other port numbers. * Prevent "scontrol update job" from updating jobs that have already finished. * Show requested TRES in "squeue -O tres" when job is pending. * Backfill scheduler: Test association and QOS node limits before reserving resources for pending job. * Many bug fixes. - Use source services to download package. - Fix code for new API of hwloc-2.0. - package netloc_to_topology where avialable. - Package documentation. OBS-URL: https://build.opensuse.org/request/show/435622 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=10
2016-10-16 21:51:20 +02:00
%description doc
Documentation (HTML) for the SLURM cluster managment software.
Accepting request 435622 from home:eeich:branches:network:cluster - version 15.08.7.1 * Remove the 1024-character limit on lines in batch scripts. task/affinity: Disable core-level task binding if more CPUs required than available cores. * Preemption/gang scheduling: If a job is suspended at slurmctld restart or reconfiguration time, then leave it suspended rather than resume+suspend. * Don't use lower weight nodes for job allocation when topology/tree used. * Don't allow user specified reservation names to disrupt the normal reservation sequeuece numbering scheme. * Avoid hard-link/copy of script/environment files for job arrays. Use the master job record file for all tasks of the job array. NOTE: Job arrays submitted to Slurm version 15.08.6 or later will fail if the slurmctld daemon is downgraded to an earlier version of Slurm. * In slurmctld log file, log duplicate job ID found by slurmd. Previously was being logged as prolog/epilog failure. * If a job is requeued while in the process of being launch, remove it's job ID from slurmd's record of active jobs in order to avoid generating a duplicate job ID error when launched for the second time (which would drain the node). * Cleanup messages when handling job script and environment variables in older directory structure formats. * Prevent triggering gang scheduling within a partition if configured with PreemptType=partition_prio and PreemptMode=suspend,gang. * Decrease parallelism in job cancel request to prevent denial of service when cancelling huge numbers of jobs. * If all ephemeral ports are in use, try using other port numbers. * Prevent "scontrol update job" from updating jobs that have already finished. * Show requested TRES in "squeue -O tres" when job is pending. * Backfill scheduler: Test association and QOS node limits before reserving resources for pending job. * Many bug fixes. - Use source services to download package. - Fix code for new API of hwloc-2.0. - package netloc_to_topology where avialable. - Package documentation. OBS-URL: https://build.opensuse.org/request/show/435622 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=10
2016-10-16 21:51:20 +02:00
%package -n perl-%{name}
Summary: Perl API to SLURM
Group: Development/Languages/Perl
Requires: %{name} = %{version}
%if 0%{?suse_version} < 1140
Requires: perl = %{perl_version}
%else
%{libperl_requires}
%{perl_requires}
%endif
%{?upgrade:Provides: perl-%{pname} = %{version}}
%{?upgrade:Conflicts: perl-%{pname}}
%description -n perl-%{name}
This package includes the Perl API to provide an interface to SLURM
through Perl.
%package -n %{libslurm}
# the .so number of libslurm is bumped with each major release
# therefore no need for a version string for Leap/SLE upgrade packages
Summary: Libraries for SLURM
Group: System/Libraries
Requires: %{name}-config
Conflicts: %{name}-config < %{ver_major}
Conflicts: %{name}-config > %{ver_major}.99
Provides: libslurm = %{version}
Conflicts: libslurm
%description -n %{libslurm}
This package contains the library needed to run programs dynamically linked
with SLURM.
%package -n libpmi%{pmi_so}%{?upgrade:%{_ver}}
Summary: SLURM PMI Library
Group: System/Libraries
%{?upgrade:Provides: libpmi%{pmi_so} = %{version}}
%{?upgrade:Conflicts: libpmi%{pmi_so}}
%description -n libpmi%{pmi_so}%{?upgrade:%{_ver}}
This package contains the library needed to run programs dynamically linked
with SLURM.
%package -n libnss_%{pname}%{nss_so}%{?upgrade:%{_ver}}
Summary: NSS Plugin for SLURM
Group: System/Libraries
%{?upgrade:Provides: libnss_%{pname}%{nss_so} = %{version}}
%{?upgrade:Conflicts: libnss_%{pname}%{nss_so}}
%description -n libnss_%{pname}%{nss_so}%{?upgrade:%{_ver}}
libnss_slurm is an optional NSS plugin that permits password and group
resolution for a job on a compute node to be serviced through the local
slurmstepd process.
%package devel
Summary: Development package for SLURM
Group: Development/Libraries/C and C++
Requires: %{libslurm} = %{version}
Requires: %{name} = %{version}
Requires: libpmi%{pmi_so} = %{version}
%{?upgrade:Provides: %{pname}-devel = %{version}}
%{?upgrade:Conflicts: %{pname}-devel}
%description devel
This package includes the header files for the SLURM API.
%package auth-none
Summary: SLURM auth NULL implementation (no authentication)
Group: Productivity/Clustering/Computing
Requires: %{name} = %{version}
%{?upgrade:Provides: %{pname}-auth-none = %{version}}
%{?upgrade:Conflicts: %{pname}-auth-none}
%description auth-none
This package cobtains the SLURM NULL authentication module.
%package munge
Summary: SLURM authentication and crypto implementation using Munge
Group: Productivity/Clustering/Computing
Requires: %{name}-plugins = %{version}
Requires: munge
BuildRequires: munge-devel
Obsoletes: %{name}-auth-munge < %{version}
Provides: %{name}-auth-munge = %{version}
%{?upgrade:Provides: %{pname}-munge = %{version}}
%{?upgrade:Conflicts: %{pname}-munge}
%description munge
This package contains the SLURM authentication module for Chris Dunlap's Munge.
%package sview
Summary: SLURM graphical interface
Group: Productivity/Clustering/Computing
%{?upgrade:Provides: %{pname}-sview = %{version}}
%{?upgrade:Conflicts: %{pname}-sview}
%description sview
sview is a graphical user interface to get and update state information for
jobs, partitions, and nodes managed by SLURM.
%package slurmdbd
Summary: SLURM database daemon
Group: Productivity/Clustering/Computing
Requires: %{name}-config = %{version}
Requires: %{name}-plugins = %{version}
Requires: %{name}-sql = %{version}
%if 0%{?suse_version} > 1310
Recommends: mariadb
%endif
%if 0%{?have_boolean_deps}
Recommends: (%{name}-munge = %version if munge)
%else
Recommends: %{name}-munge = %version
%endif
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
%if 0%{?with_systemd}
%{?systemd_ordering}
%else
Requires(post): %insserv_prereq %fillup_prereq
%endif
Obsoletes: slurm-sched-wiki < %{version}
Obsoletes: slurmdb-direct < %{version}
%{?upgrade:Provides: %{pname}-slurmdbd = %{version}}
%{?upgrade:Conflicts: %{pname}-slurmdb}
%description slurmdbd
The SLURM database daemon provides accounting of jobs in a database.
%package sql
Summary: Slurm SQL support
Group: Productivity/Clustering/Computing
%{?upgrade:Provides: %{pname}-sql = %{version}}
%{?upgrade:Conflicts: %{pname}-sql}
%description sql
Contains interfaces to MySQL for use by SLURM.
%package plugins
Summary: SLURM plugins (loadable shared objects)
Group: Productivity/Clustering/Computing
%{?upgrade:Provides: %{pname}-plugins = %{version}}
%{?upgrade:Conflicts: %{pname}-plugins}
%if %{with pmix}
Requires: libpmix%{pmix_so}
Requires: pmix
%endif
%description plugins
This package contains the SLURM plugins (loadable shared objects)
%package torque
Summary: Wrappers for transitition from Torque/PBS to SLURM
Group: Productivity/Clustering/Computing
Requires: perl-%{name} = %{version}
Requires: perl-Switch
Provides: torque-client
%{?upgrade:Provides: %{pname}-torque = %{version}}
%{?upgrade:Conflicts: %{pname}-torque}
%description torque
Wrapper scripts for aiding migration from Torque/PBS to SLURM.
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
%package openlava
Summary: Wrappers for transitition from OpenLava/LSF to Slurm
Group: Productivity/Clustering/Computing
Requires: perl-%{name} = %{version}
%{?upgrade:Provides: %{pname}-openlava = %{version}}
%{?upgrade:Conflicts: %{pname}-openlava}
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
%description openlava
Wrapper scripts for aiding migration from OpenLava/LSF to Slurm
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
%package seff
Summary: Mail tool that includes job statistics in user notification email
Group: Productivity/Clustering/Computing
Requires: perl-%{name} = %{version}
%{?upgrade:Provides: %{pname}-seff = %{version}}
%{?upgrade:Conflicts: %{pname}-seff}
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
%description seff
Mail program used directly by the SLURM daemons. On completion of a job,
it waits for accounting information to be available and includes that
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
information in the email body.
%package sjstat
Summary: Perl tool to print SLURM job state information
Group: Productivity/Clustering/Computing
Requires: %{name} = %{version}
%{?upgrade:Provides: %{pname}-sjstat = %{version}}
%{?upgrade:Conflicts: %{pname}-sjstat}
%if 0%{?suse_version} < 1140
Requires: perl = %{perl_version}
%else
%{perl_requires}
%endif
%description sjstat
This package contains a Perl tool to print SLURM job state information.
%package pam_slurm
Summary: PAM module for restricting access to compute nodes via SLURM
Group: Productivity/Clustering/Computing
Requires: %{name}-node = %{version}
%{?upgrade:Provides: %{pname}-pam_slurm = %{version}}
%{?upgrade:Conflicts: %{pname}-pam_slurm}
BuildRequires: pam-devel
%description pam_slurm
This module restricts access to compute nodes in a cluster where the Simple
Linux Utility for Resource Managment (SLURM) is in use. Access is granted
to root, any user with an SLURM-launched job currently running on the node,
or any user who has allocated resources on the node according to the SLURM.
Accepting request 454272 from home:eeich:branches:network:cluster - Updated to 16.05.8.1 * Remove StoragePass from being printed out in the slurmdbd log at debug2 level. * Defer PATH search for task program until launch in slurmstepd. * Modify regression test1.89 to avoid leaving vestigial job. Also reduce logging to reduce likelyhood of Expect buffer overflow. * Do not PATH search for mult-prog launches if LaunchParamters=test_exec is enabled. * Fix for possible infinite loop in select/cons_res plugin when trying to satisfy a job's ntasks_per_core or socket specification. * If job is held for bad constraints make it so once updated the job doesn't go into JobAdminHeld. * sched/backfill - Fix logic to reserve resources for jobs that require a node reboot (i.e. to change KNL mode) in order to start. * When unpacking a node or front_end record from state and the protocol version is lower than the min version, set it to the min. * Remove redundant lookup for part_ptr when updating a reservation's nodes. * Fix memory and file descriptor leaks in slurmd daemon's sbcast logic. * Do not allocate specialized cores to jobs using the --exclusive option. * Cancel interactive job if Prolog failure with "PrologFlags=contain" or "PrologFlags=alloc" configured. Send new error prolog failure message to the salloc or srun command as needed. * Prevent possible out-of-bounds read in slurmstepd on an invalid #! line. * Fix check for PluginDir within slurmctld to work with multiple directories. * Cancel interactive jobs automatically on communication error to launching srun/salloc process. * Fix security issue caused by insecure file path handling triggered by the failure of a Prolog script. To exploit this a user needs to anticipate or cause the Prolog to fail for their job. CVE-2016-10030 (bsc#1018371). - Replace group/user add macros with function calls. - Disable building with netloc support: the netloc API is part of the devel branch of hwloc. Since this devel branch was included accidentally and has been reversed since, we need to disable this for the time being. - Conditionalized architecture specific pieces to support non-x86 architectures better. - Remove: unneeded 'BuildRequires: python' - Add: BuildRequires: freeipmi-devel BuildRequires: libibmad-devel BuildRequires: libibumad-devel so they are picked up by the slurm build. - Enable modifications from openHPC Project. - Enable lua API package build. - Add a recommends for slurm-munge to the slurm package: This is way, the munge auth method is available and slurm works out of the box. - Create /var/lib/slurm as StateSaveLocation directory. /tmp is dangerous. - Keep %{_libdir}/libpmi* and %{_libdir}/mpi_pmi2* on SUSE. OBS-URL: https://build.opensuse.org/request/show/454272 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=13
2017-02-02 21:23:02 +01:00
%package lua
Summary: Lua API for SLURM
Group: Development/Languages/Other
Requires: %{name} = %{version}
%{?upgrade:Provides: %{pname}-lua = %{version}}
%{?upgrade:Conflicts: %{pname}-lua}
Accepting request 454272 from home:eeich:branches:network:cluster - Updated to 16.05.8.1 * Remove StoragePass from being printed out in the slurmdbd log at debug2 level. * Defer PATH search for task program until launch in slurmstepd. * Modify regression test1.89 to avoid leaving vestigial job. Also reduce logging to reduce likelyhood of Expect buffer overflow. * Do not PATH search for mult-prog launches if LaunchParamters=test_exec is enabled. * Fix for possible infinite loop in select/cons_res plugin when trying to satisfy a job's ntasks_per_core or socket specification. * If job is held for bad constraints make it so once updated the job doesn't go into JobAdminHeld. * sched/backfill - Fix logic to reserve resources for jobs that require a node reboot (i.e. to change KNL mode) in order to start. * When unpacking a node or front_end record from state and the protocol version is lower than the min version, set it to the min. * Remove redundant lookup for part_ptr when updating a reservation's nodes. * Fix memory and file descriptor leaks in slurmd daemon's sbcast logic. * Do not allocate specialized cores to jobs using the --exclusive option. * Cancel interactive job if Prolog failure with "PrologFlags=contain" or "PrologFlags=alloc" configured. Send new error prolog failure message to the salloc or srun command as needed. * Prevent possible out-of-bounds read in slurmstepd on an invalid #! line. * Fix check for PluginDir within slurmctld to work with multiple directories. * Cancel interactive jobs automatically on communication error to launching srun/salloc process. * Fix security issue caused by insecure file path handling triggered by the failure of a Prolog script. To exploit this a user needs to anticipate or cause the Prolog to fail for their job. CVE-2016-10030 (bsc#1018371). - Replace group/user add macros with function calls. - Disable building with netloc support: the netloc API is part of the devel branch of hwloc. Since this devel branch was included accidentally and has been reversed since, we need to disable this for the time being. - Conditionalized architecture specific pieces to support non-x86 architectures better. - Remove: unneeded 'BuildRequires: python' - Add: BuildRequires: freeipmi-devel BuildRequires: libibmad-devel BuildRequires: libibumad-devel so they are picked up by the slurm build. - Enable modifications from openHPC Project. - Enable lua API package build. - Add a recommends for slurm-munge to the slurm package: This is way, the munge auth method is available and slurm works out of the box. - Create /var/lib/slurm as StateSaveLocation directory. /tmp is dangerous. - Keep %{_libdir}/libpmi* and %{_libdir}/mpi_pmi2* on SUSE. OBS-URL: https://build.opensuse.org/request/show/454272 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=13
2017-02-02 21:23:02 +01:00
BuildRequires: lua-devel
%description lua
This package includes the Lua API to provide an interface to SLURM
through Lua.
%package rest
Summary: Slurm REST API Interface
Group: Productivity/Clustering/Computing
Requires: %{name}-config = %{version}
%if 0%{?have_http_parser}
BuildRequires: http-parser-devel
%endif
%if 0%{?have_boolean_deps}
Recommends: (%{name}-munge = %version if munge)
%else
Recommends: %{name}-munge = %version
%endif
%{?upgrade:Provides: %{pname}-rest = %{version}}
%{?upgrade:Conflicts: %{pname}-rest}
%description rest
This package provides the interface to SLURM via REST API.
%package node
Summary: Minimal slurm node
Group: Productivity/Clustering/Computing
Requires: %{name}-config = %{version}
Requires: %{name}-plugins = %{version}
%if 0%{?have_boolean_deps}
Recommends: (%{name}-munge = %version if munge)
%else
Recommends: %{name}-munge = %version
%endif
%if 0%{?with_systemd}
%{?systemd_ordering}
%else
Requires(post): %insserv_prereq %fillup_prereq
%endif
%{?upgrade:Provides: %{pname}-node = %{version}}
%{?upgrade:Conflicts: %{pname}-node}
%description node
This package contains just the minmal code to run a compute node.
%package config
Summary: Config files and directories for slurm services
Group: Productivity/Clustering/Computing
Requires: logrotate
%if 0%{?suse_version} <= 1140
Requires(pre): pwdutils
%else
Requires(pre): shadow
%endif
%if 0%{?with_systemd}
%{?systemd_ordering}
%endif
%{?upgrade:Provides: %{pname}-config = %{version}}
%{?upgrade:Conflicts: %{pname}-config}
%description config
This package contains the slurm config files necessary direcories
for the slurm daemons.
Accepting request 650545 from home:eeich:branches:network:cluster - Added missing perl-base dependency. - Moved HTML docs to doc package. - Moved config man pages to a separate package: This way, they won't get installed on compute nodes. - Update to 18.08.3 * Add new burst buffer state of "teardown-fail" to indicate the burst buffer teardown operation is failing on specific buffers. * Multiple backup slurmctld daemons can be configured * Enable jobs with zero node count for creation and/or deletion of persistent burst buffers. * Add "scontrol show dwstat" command to display Cray burst buffer status. * Add "GetSysStatus" option to burst_buffer.conf file. * Add node and partition configuration options of "CpuBind" to control default task binding. * Add "NumaCpuBind" option to knl.conf * Add sbatch "--batch" option to identify features required on batch node. * Add "BatchFeatures" field to output of "scontrol show job". * Add support for "--bb" option to sbatch command. * Add new SystemComment field to job data structure and database. * Expand reservation "flags" field from 32 to 64 bits. * Add job state flag of "SIGNALING" to avoid race condition. * Properly handle srun --will-run option when there are jobs in COMPLETING state. * Properly report who is signaling a step. * Don't combine updated reservation records in sreport's reservation report. * node_features plugin - Add suport for XOR & XAND of job constraints (node feature specifications). OBS-URL: https://build.opensuse.org/request/show/650545 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=75
2018-11-20 18:07:44 +01:00
%package config-man
Summary: Config files and directories for slurm services
Group: Documentation/Man
%{?upgrade:Provides: %{pname}-config-man = %{version}}
%{?upgrade:Conflicts: %{pname}-config-man}
Accepting request 650545 from home:eeich:branches:network:cluster - Added missing perl-base dependency. - Moved HTML docs to doc package. - Moved config man pages to a separate package: This way, they won't get installed on compute nodes. - Update to 18.08.3 * Add new burst buffer state of "teardown-fail" to indicate the burst buffer teardown operation is failing on specific buffers. * Multiple backup slurmctld daemons can be configured * Enable jobs with zero node count for creation and/or deletion of persistent burst buffers. * Add "scontrol show dwstat" command to display Cray burst buffer status. * Add "GetSysStatus" option to burst_buffer.conf file. * Add node and partition configuration options of "CpuBind" to control default task binding. * Add "NumaCpuBind" option to knl.conf * Add sbatch "--batch" option to identify features required on batch node. * Add "BatchFeatures" field to output of "scontrol show job". * Add support for "--bb" option to sbatch command. * Add new SystemComment field to job data structure and database. * Expand reservation "flags" field from 32 to 64 bits. * Add job state flag of "SIGNALING" to avoid race condition. * Properly handle srun --will-run option when there are jobs in COMPLETING state. * Properly report who is signaling a step. * Don't combine updated reservation records in sreport's reservation report. * node_features plugin - Add suport for XOR & XAND of job constraints (node feature specifications). OBS-URL: https://build.opensuse.org/request/show/650545 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=75
2018-11-20 18:07:44 +01:00
%description config-man
Man pages for the SLURM cluster managment software config files.
%package hdf5
Summary: Store accounting data in hdf5
Group: Productivity/Clustering/Computing
%description hdf5
Plugin to store accounting in the hdf5 file format. This plugin has to be
activated in the slurm configuration. Includes also utility the program
sh5utils to merge this hdf5 files or extract data from them.
Accepting request 714908 from home:mslacken:branches:network:cluster - added cray depend libraries to seperate package, as they are now built, since json is enabled - Updated to 18.0.7 for fixing CVE-2019-12838 and (bsc#1140709) * Update "xauth list" to use the same 10000ms timeout as the other xauth commands. * Fix issue in gres code to handle a gres cnt of 0. * Don't purge jobs if backfill is running. * Verify job is pending add/removing accrual time. * Don't abort when the job doesn't have an association that was removed before the job was able to make it to the database. * Set state_reason if select_nodes() fails job for QOS or Account. * Avoid seg_fault on referencing association without a valid_qos bitmap. * If Association/QOS is removed on a pending job set that job as ineligible. * When changing a jobs account/qos always make sure you remove the old limits. * Don't reset a FAIL_QOS or FAIL_ACCOUNT job reason until the qos or account changed. * Restore "sreport -T ALL" functionality. * Correctly typecast signals being sent through the api. * Properly initialize structures throughout Slurm. * Sync "numtask" squeue format option for jobs and steps to "numtasks". * Fix sacct -PD to avoid CA before start jobs. * Fix potential deadlock with backup slurmctld. * Fixed issue with jobs not appearing in sacct after dependency satisfied. * Fix showing non-eligible jobs when asking with -j and not -s. * Fix issue with backfill scheduler scheduling tasks of an array when not the head job. * accounting_storage/mysql - fix SIGABRT in the archive load logic. * accounting_storage/mysql - fix memory leak in the archive load logic. * Limit records per single SQL statement when loading archived data. OBS-URL: https://build.opensuse.org/request/show/714908 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=100
2019-07-12 20:09:50 +02:00
%package cray
Summary: Cray specific plugins
Group: Productivity/Clustering/Computing
%description cray
Plugins for specific cray hardware, includes power and knl node management.
Contains also cray specific documentation.
%prep
%setup -q -n %{pname}-%{dl_ver}
%patch0 -p1
%patch1 -p1
Accepting request 435622 from home:eeich:branches:network:cluster - version 15.08.7.1 * Remove the 1024-character limit on lines in batch scripts. task/affinity: Disable core-level task binding if more CPUs required than available cores. * Preemption/gang scheduling: If a job is suspended at slurmctld restart or reconfiguration time, then leave it suspended rather than resume+suspend. * Don't use lower weight nodes for job allocation when topology/tree used. * Don't allow user specified reservation names to disrupt the normal reservation sequeuece numbering scheme. * Avoid hard-link/copy of script/environment files for job arrays. Use the master job record file for all tasks of the job array. NOTE: Job arrays submitted to Slurm version 15.08.6 or later will fail if the slurmctld daemon is downgraded to an earlier version of Slurm. * In slurmctld log file, log duplicate job ID found by slurmd. Previously was being logged as prolog/epilog failure. * If a job is requeued while in the process of being launch, remove it's job ID from slurmd's record of active jobs in order to avoid generating a duplicate job ID error when launched for the second time (which would drain the node). * Cleanup messages when handling job script and environment variables in older directory structure formats. * Prevent triggering gang scheduling within a partition if configured with PreemptType=partition_prio and PreemptMode=suspend,gang. * Decrease parallelism in job cancel request to prevent denial of service when cancelling huge numbers of jobs. * If all ephemeral ports are in use, try using other port numbers. * Prevent "scontrol update job" from updating jobs that have already finished. * Show requested TRES in "squeue -O tres" when job is pending. * Backfill scheduler: Test association and QOS node limits before reserving resources for pending job. * Many bug fixes. - Use source services to download package. - Fix code for new API of hwloc-2.0. - package netloc_to_topology where avialable. - Package documentation. OBS-URL: https://build.opensuse.org/request/show/435622 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=10
2016-10-16 21:51:20 +02:00
%patch2 -p1
%patch3 -p1
%if 0%{?python_ver} < 3
# Workaround for wrongly flagged python3 to keep SLE-11-SP4 building
mkdir -p mybin; ln -s /usr/bin/python2 mybin/python3
%endif
%build
Accepting request 874647 from home:mslacken:branches:network:cluster - Udpate to 20.11.04 * Fix node selection for advanced reservations with features. * mpi/pmix: Handle pipe failure better when using ucx. * mpi/pmix: include PMIX_NODEID for each process entry. * Fix job getting rejected after being requeued on same node that died. * job_submit/lua - add "network" field. * Fix situations when a reoccuring reservation could erroneously skip a period. * Ensure that a reservations [pro|epi]log are ran on reoccuring reservations. * Fix threads-per-core memory allocation issue when using CR_CPU_MEMORY. * Fix scheduling issue with --gpus. * Fix gpu allocations that request --cpus-per-task. * mpi/pmix: fixed print messages for all PMIXP_* macros * Add mapping for XCPU to --signal option. * Fix regression in 20.11 that prevented a full pass of the main scheduler from ever executing. * Work around a glibc bug in which "0" is incorrectly printed as "nan" which will result in corrupted association state on restart. * Fix regression in 20.11 which made slurmd incorrectly attempt to find the parent slurmd address when not applicable and send incorrect reverse*tree info to the slurmstepd. * Fix cgroup ns detection when using containers (e.g. LXC or Docker). * scrontab - change temporary file handling to work with emacs. - Removed check-for-lipmix.so.MAJOR.patch - Added: load-pmix-major-version.patch OBS-URL: https://build.opensuse.org/request/show/874647 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=173
2021-02-24 10:49:16 +01:00
autoreconf
Accepting request 714908 from home:mslacken:branches:network:cluster - added cray depend libraries to seperate package, as they are now built, since json is enabled - Updated to 18.0.7 for fixing CVE-2019-12838 and (bsc#1140709) * Update "xauth list" to use the same 10000ms timeout as the other xauth commands. * Fix issue in gres code to handle a gres cnt of 0. * Don't purge jobs if backfill is running. * Verify job is pending add/removing accrual time. * Don't abort when the job doesn't have an association that was removed before the job was able to make it to the database. * Set state_reason if select_nodes() fails job for QOS or Account. * Avoid seg_fault on referencing association without a valid_qos bitmap. * If Association/QOS is removed on a pending job set that job as ineligible. * When changing a jobs account/qos always make sure you remove the old limits. * Don't reset a FAIL_QOS or FAIL_ACCOUNT job reason until the qos or account changed. * Restore "sreport -T ALL" functionality. * Correctly typecast signals being sent through the api. * Properly initialize structures throughout Slurm. * Sync "numtask" squeue format option for jobs and steps to "numtasks". * Fix sacct -PD to avoid CA before start jobs. * Fix potential deadlock with backup slurmctld. * Fixed issue with jobs not appearing in sacct after dependency satisfied. * Fix showing non-eligible jobs when asking with -j and not -s. * Fix issue with backfill scheduler scheduling tasks of an array when not the head job. * accounting_storage/mysql - fix SIGABRT in the archive load logic. * accounting_storage/mysql - fix memory leak in the archive load logic. * Limit records per single SQL statement when loading archived data. OBS-URL: https://build.opensuse.org/request/show/714908 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=100
2019-07-12 20:09:50 +02:00
%define _lto_cflags %{nil}
[ -e $(pwd)/mybin ] && PATH=$(pwd)/mybin:$PATH
%configure --enable-shared \
--disable-static \
--without-rpath \
--without-datawarp \
--with-shared-libslurm \
--with-pam_dir=%_pam_moduledir \
%{?with_pmix:--with-pmix=/usr/} \
%if 0%{?build_slurmrestd}
--enable-slurmrestd \
%endif
--with-yaml \
%{!?have_netloc:--without-netloc} \
--sysconfdir=%{_sysconfdir}/%{pname} \
%{!?have_hdf5:--without-hdf5} \
%{!?have_lz4:--without-lz4} \
Accepting request 714908 from home:mslacken:branches:network:cluster - added cray depend libraries to seperate package, as they are now built, since json is enabled - Updated to 18.0.7 for fixing CVE-2019-12838 and (bsc#1140709) * Update "xauth list" to use the same 10000ms timeout as the other xauth commands. * Fix issue in gres code to handle a gres cnt of 0. * Don't purge jobs if backfill is running. * Verify job is pending add/removing accrual time. * Don't abort when the job doesn't have an association that was removed before the job was able to make it to the database. * Set state_reason if select_nodes() fails job for QOS or Account. * Avoid seg_fault on referencing association without a valid_qos bitmap. * If Association/QOS is removed on a pending job set that job as ineligible. * When changing a jobs account/qos always make sure you remove the old limits. * Don't reset a FAIL_QOS or FAIL_ACCOUNT job reason until the qos or account changed. * Restore "sreport -T ALL" functionality. * Correctly typecast signals being sent through the api. * Properly initialize structures throughout Slurm. * Sync "numtask" squeue format option for jobs and steps to "numtasks". * Fix sacct -PD to avoid CA before start jobs. * Fix potential deadlock with backup slurmctld. * Fixed issue with jobs not appearing in sacct after dependency satisfied. * Fix showing non-eligible jobs when asking with -j and not -s. * Fix issue with backfill scheduler scheduling tasks of an array when not the head job. * accounting_storage/mysql - fix SIGABRT in the archive load logic. * accounting_storage/mysql - fix memory leak in the archive load logic. * Limit records per single SQL statement when loading archived data. OBS-URL: https://build.opensuse.org/request/show/714908 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=100
2019-07-12 20:09:50 +02:00
%{!?have_json_c:--without-json}
make %{?_smp_mflags}
%install
[ -e $(pwd)/mybin ] && PATH=$(pwd)/mybin:$PATH
Accepting request 454272 from home:eeich:branches:network:cluster - Updated to 16.05.8.1 * Remove StoragePass from being printed out in the slurmdbd log at debug2 level. * Defer PATH search for task program until launch in slurmstepd. * Modify regression test1.89 to avoid leaving vestigial job. Also reduce logging to reduce likelyhood of Expect buffer overflow. * Do not PATH search for mult-prog launches if LaunchParamters=test_exec is enabled. * Fix for possible infinite loop in select/cons_res plugin when trying to satisfy a job's ntasks_per_core or socket specification. * If job is held for bad constraints make it so once updated the job doesn't go into JobAdminHeld. * sched/backfill - Fix logic to reserve resources for jobs that require a node reboot (i.e. to change KNL mode) in order to start. * When unpacking a node or front_end record from state and the protocol version is lower than the min version, set it to the min. * Remove redundant lookup for part_ptr when updating a reservation's nodes. * Fix memory and file descriptor leaks in slurmd daemon's sbcast logic. * Do not allocate specialized cores to jobs using the --exclusive option. * Cancel interactive job if Prolog failure with "PrologFlags=contain" or "PrologFlags=alloc" configured. Send new error prolog failure message to the salloc or srun command as needed. * Prevent possible out-of-bounds read in slurmstepd on an invalid #! line. * Fix check for PluginDir within slurmctld to work with multiple directories. * Cancel interactive jobs automatically on communication error to launching srun/salloc process. * Fix security issue caused by insecure file path handling triggered by the failure of a Prolog script. To exploit this a user needs to anticipate or cause the Prolog to fail for their job. CVE-2016-10030 (bsc#1018371). - Replace group/user add macros with function calls. - Disable building with netloc support: the netloc API is part of the devel branch of hwloc. Since this devel branch was included accidentally and has been reversed since, we need to disable this for the time being. - Conditionalized architecture specific pieces to support non-x86 architectures better. - Remove: unneeded 'BuildRequires: python' - Add: BuildRequires: freeipmi-devel BuildRequires: libibmad-devel BuildRequires: libibumad-devel so they are picked up by the slurm build. - Enable modifications from openHPC Project. - Enable lua API package build. - Add a recommends for slurm-munge to the slurm package: This is way, the munge auth method is available and slurm works out of the box. - Create /var/lib/slurm as StateSaveLocation directory. /tmp is dangerous. - Keep %{_libdir}/libpmi* and %{_libdir}/mpi_pmi2* on SUSE. OBS-URL: https://build.opensuse.org/request/show/454272 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=13
2017-02-02 21:23:02 +01:00
%make_install
make install-contrib DESTDIR=%{buildroot} PERL_MM_PARAMS="INSTALLDIRS=vendor"
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
%if 0%{?with_systemd}
mkdir -p %{buildroot}%{_unitdir}
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
install -p -m644 etc/slurmd.service etc/slurmdbd.service etc/slurmctld.service %{buildroot}%{_unitdir}
ln -s /usr/sbin/service %{buildroot}%{_sbindir}/rcslurmd
ln -s /usr/sbin/service %{buildroot}%{_sbindir}/rcslurmdbd
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
ln -s /usr/sbin/service %{buildroot}%{_sbindir}/rcslurmctld
install -d -m 0755 %{buildroot}/%{_tmpfilesdir}/
cat <<-EOF > %{buildroot}/%{_tmpfilesdir}/%{pname}.conf
# Create a directory with permissions 0700 owned by user slurm, group slurm
d %{_rundir}/slurm 0700 slurm slurm
EOF
chmod 0644 %{buildroot}/%{_tmpfilesdir}/%{pname}.conf
%else
install -D -m755 etc/init.d.slurm %{buildroot}%{_initrddir}/slurm
install -D -m755 etc/init.d.slurmdbd %{buildroot}%{_initrddir}/slurmdbd
ln -sf %{_initrddir}/slurm %{buildroot}%{_sbindir}/rcslurm
ln -sf %{_initrddir}/slurmdbd %{buildroot}%{_sbindir}/rcslurmdbd
%endif
mkdir -p %{buildroot}%{_localstatedir}/spool/slurm
install -D -m644 etc/cgroup.conf.example %{buildroot}/%{_sysconfdir}/%{pname}/cgroup.conf
install -D -m644 etc/slurm.conf.example %{buildroot}/%{_sysconfdir}/%{pname}/slurm.conf.example
install -D -m600 etc/slurmdbd.conf.example %{buildroot}/%{_sysconfdir}/%{pname}/slurmdbd.conf
install -D -m600 etc/slurmdbd.conf.example %{buildroot}%{_sysconfdir}/%{pname}/slurmdbd.conf.example
install -D -m755 contribs/sjstat %{buildroot}%{_bindir}/sjstat
install -D -m755 contribs/sgather/sgather %{buildroot}%{_bindir}/sgather
%if 0%{?have_firewalld}
install -D -m644 %{S:10} %{buildroot}/%{_prefix}/lib/firewalld/services/slurmd.xml
install -D -m644 %{S:11} %{buildroot}/%{_prefix}/lib/firewalld/services/slurmctld.xml
install -D -m644 %{S:12} %{buildroot}/%{_prefix}/lib/firewalld/services/slurmdbd.xml
%endif
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
cat <<EOF >%{buildroot}%{_sysconfdir}/%{pname}/plugstack.conf
include %{_sysconfdir}/%{pname}/plugstack.conf.d/*.conf
EOF
mkdir -p %{buildroot}%{_sysconfdir}/%{pname}/plugstack.conf.d
cp contribs/pam_slurm_adopt/README ../README.pam_slurm_adopt
cp contribs/pam/README ../README.pam_slurm
# remove static pam libs
rm -v %{buildroot}%{_pam_moduledir}/*la
# change slurm.conf for our needs
head -n -2 %{buildroot}/%{_sysconfdir}/%{pname}/slurm.conf.example | grep -v ReturnToService > %{buildroot}/%{_sysconfdir}/%{pname}/slurm.conf
sed -i 's#\(StateSaveLocation=\).*#\1%_localstatedir/lib/slurm#' %{buildroot}/%{_sysconfdir}/%{pname}/slurm.conf
sed -i 's#^\(SlurmdPidFile=\).*$#\1%{_localstatedir}/run/slurm/slurmd.pid#' %{buildroot}/%{_sysconfdir}/%{pname}/slurm.conf
sed -i 's#^\(SlurmctldPidFile=\).*$#\1%{_localstatedir}/run/slurm/slurmctld.pid#' %{buildroot}/%{_sysconfdir}/%{pname}/slurm.conf
sed -i 's#^\(SlurmdSpoolDir=\)/.*#\1%{_localstatedir}/spool/slurm#' %{buildroot}/%{_sysconfdir}/%{pname}/slurm.conf
sed -i -e '/^ControlMachine=/i# Ordered List of Control Nodes' \
-e 's#ControlMachine=\(.*\)$#SlurmctldHost=\1(10.0.10.20)#' \
-e 's#BackupController=.*#SlurmctldHost=linux1(10.0.10.21)#' \
-e '/.*ControlAddr=.*/d' \
-e '/.*BackupAddr=.*/d' %{buildroot}/%{_sysconfdir}/%{pname}/slurm.conf
cat >>%{buildroot}/%{_sysconfdir}/%{pname}/slurm.conf <<EOF
# SUSE default configuration
PropagateResourceLimitsExcept=MEMLOCK
NodeName=linux State=UNKNOWN
PartitionName=normal Nodes=linux Default=YES MaxTime=24:00:00 State=UP
EOF
Accepting request 629222 from home:eeich:branches:network:cluster - Update to 17.11.9 * Fix segfault in slurmctld when a job's node bitmap is NULL during a scheduling cycle. Primarily caused by EnforcePartLimits=ALL. * Remove erroneous unlock in acct_gather_energy/ipmi. * Enable support for hwloc version 2.0.1. * Fix 'srun -q' (--qos) option handling. * Fix socket communication issue that can lead to lost task completition messages, which will cause a permanently stuck srun process. * Handle creation of TMPDIR if environment variable is set or changed in a task prolog script. * Avoid node layout fragmentation if running with a fixed CPU count but without Sockets and CoresPerSocket defined. * burst_buffer/cray - Fix datawarp swap default pool overriding jobdw. * Fix incorrect job priority assignment for multi-partition job with different PriorityTier settings on the partitions. * Fix sinfo to print correct node state. - When using a remote shared StateSaveLocation, slurmctld needs to be started after remote filesystems have become available. Add 'remote-fs.target' to the 'After=' directive in slurmctld.service (boo#1103561). - Update to 17.11.8 * Fix incomplete RESPONSE_[RESOURCE|JOB_PACK]_ALLOCATION building path. * Do not allocate nodes that were marked down due to the node not responding by ResumeTimeout. * task/cray plugin - search for "mems" cgroup information in the file "cpuset.mems" then fall back to the file "mems". * Fix ipmi profile debug uninitialized variable. * PMIx: fixed the direct connect inline msg sending. OBS-URL: https://build.opensuse.org/request/show/629222 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=64
2018-08-14 15:00:16 +02:00
# change slurmdbd.conf for our needs
sed -i 's@LogFile=/var/log/slurm/slurmdbd.log@LogFile=/var/log/slurmdbd.log@'\
%{buildroot}/%{_sysconfdir}/%{pname}/slurmdbd.conf
Accepting request 629222 from home:eeich:branches:network:cluster - Update to 17.11.9 * Fix segfault in slurmctld when a job's node bitmap is NULL during a scheduling cycle. Primarily caused by EnforcePartLimits=ALL. * Remove erroneous unlock in acct_gather_energy/ipmi. * Enable support for hwloc version 2.0.1. * Fix 'srun -q' (--qos) option handling. * Fix socket communication issue that can lead to lost task completition messages, which will cause a permanently stuck srun process. * Handle creation of TMPDIR if environment variable is set or changed in a task prolog script. * Avoid node layout fragmentation if running with a fixed CPU count but without Sockets and CoresPerSocket defined. * burst_buffer/cray - Fix datawarp swap default pool overriding jobdw. * Fix incorrect job priority assignment for multi-partition job with different PriorityTier settings on the partitions. * Fix sinfo to print correct node state. - When using a remote shared StateSaveLocation, slurmctld needs to be started after remote filesystems have become available. Add 'remote-fs.target' to the 'After=' directive in slurmctld.service (boo#1103561). - Update to 17.11.8 * Fix incomplete RESPONSE_[RESOURCE|JOB_PACK]_ALLOCATION building path. * Do not allocate nodes that were marked down due to the node not responding by ResumeTimeout. * task/cray plugin - search for "mems" cgroup information in the file "cpuset.mems" then fall back to the file "mems". * Fix ipmi profile debug uninitialized variable. * PMIx: fixed the direct connect inline msg sending. OBS-URL: https://build.opensuse.org/request/show/629222 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=64
2018-08-14 15:00:16 +02:00
sed -i -e "s@PidFile=.*@PidFile=%{_localstatedir}/run/slurm/slurmdbd.pid@" \
%{buildroot}/%{_sysconfdir}/%{pname}/slurmdbd.conf
Accepting request 629222 from home:eeich:branches:network:cluster - Update to 17.11.9 * Fix segfault in slurmctld when a job's node bitmap is NULL during a scheduling cycle. Primarily caused by EnforcePartLimits=ALL. * Remove erroneous unlock in acct_gather_energy/ipmi. * Enable support for hwloc version 2.0.1. * Fix 'srun -q' (--qos) option handling. * Fix socket communication issue that can lead to lost task completition messages, which will cause a permanently stuck srun process. * Handle creation of TMPDIR if environment variable is set or changed in a task prolog script. * Avoid node layout fragmentation if running with a fixed CPU count but without Sockets and CoresPerSocket defined. * burst_buffer/cray - Fix datawarp swap default pool overriding jobdw. * Fix incorrect job priority assignment for multi-partition job with different PriorityTier settings on the partitions. * Fix sinfo to print correct node state. - When using a remote shared StateSaveLocation, slurmctld needs to be started after remote filesystems have become available. Add 'remote-fs.target' to the 'After=' directive in slurmctld.service (boo#1103561). - Update to 17.11.8 * Fix incomplete RESPONSE_[RESOURCE|JOB_PACK]_ALLOCATION building path. * Do not allocate nodes that were marked down due to the node not responding by ResumeTimeout. * task/cray plugin - search for "mems" cgroup information in the file "cpuset.mems" then fall back to the file "mems". * Fix ipmi profile debug uninitialized variable. * PMIx: fixed the direct connect inline msg sending. OBS-URL: https://build.opensuse.org/request/show/629222 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=64
2018-08-14 15:00:16 +02:00
# manage local state dir and a remote states save location
mkdir -p %{buildroot}/%_localstatedir/lib/slurm
%if 0%{?with_systemd}
Accepting request 629222 from home:eeich:branches:network:cluster - Update to 17.11.9 * Fix segfault in slurmctld when a job's node bitmap is NULL during a scheduling cycle. Primarily caused by EnforcePartLimits=ALL. * Remove erroneous unlock in acct_gather_energy/ipmi. * Enable support for hwloc version 2.0.1. * Fix 'srun -q' (--qos) option handling. * Fix socket communication issue that can lead to lost task completition messages, which will cause a permanently stuck srun process. * Handle creation of TMPDIR if environment variable is set or changed in a task prolog script. * Avoid node layout fragmentation if running with a fixed CPU count but without Sockets and CoresPerSocket defined. * burst_buffer/cray - Fix datawarp swap default pool overriding jobdw. * Fix incorrect job priority assignment for multi-partition job with different PriorityTier settings on the partitions. * Fix sinfo to print correct node state. - When using a remote shared StateSaveLocation, slurmctld needs to be started after remote filesystems have become available. Add 'remote-fs.target' to the 'After=' directive in slurmctld.service (boo#1103561). - Update to 17.11.8 * Fix incomplete RESPONSE_[RESOURCE|JOB_PACK]_ALLOCATION building path. * Do not allocate nodes that were marked down due to the node not responding by ResumeTimeout. * task/cray plugin - search for "mems" cgroup information in the file "cpuset.mems" then fall back to the file "mems". * Fix ipmi profile debug uninitialized variable. * PMIx: fixed the direct connect inline msg sending. OBS-URL: https://build.opensuse.org/request/show/629222 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=64
2018-08-14 15:00:16 +02:00
sed -i -e "s@PIDFile=.*@PIDFile=%{_localstatedir}/run/slurm/slurmctld.pid@" \
-e "s@After=.*@After=network.target munge.service remote-fs.target@" \
%{buildroot}/%{_unitdir}/slurmctld.service
sed -i -e "s@PIDFile=.*@PIDFile=%{_localstatedir}/run/slurm/slurmd.pid@" \
%{buildroot}/%{_unitdir}/slurmd.service
sed -i -e "s@PIDFile=.*@PIDFile=%{_localstatedir}/run/slurm/slurmdbd.pid@" \
-e 's@After=\(.*\)@After=\1 mariadb.service@' \
Accepting request 629222 from home:eeich:branches:network:cluster - Update to 17.11.9 * Fix segfault in slurmctld when a job's node bitmap is NULL during a scheduling cycle. Primarily caused by EnforcePartLimits=ALL. * Remove erroneous unlock in acct_gather_energy/ipmi. * Enable support for hwloc version 2.0.1. * Fix 'srun -q' (--qos) option handling. * Fix socket communication issue that can lead to lost task completition messages, which will cause a permanently stuck srun process. * Handle creation of TMPDIR if environment variable is set or changed in a task prolog script. * Avoid node layout fragmentation if running with a fixed CPU count but without Sockets and CoresPerSocket defined. * burst_buffer/cray - Fix datawarp swap default pool overriding jobdw. * Fix incorrect job priority assignment for multi-partition job with different PriorityTier settings on the partitions. * Fix sinfo to print correct node state. - When using a remote shared StateSaveLocation, slurmctld needs to be started after remote filesystems have become available. Add 'remote-fs.target' to the 'After=' directive in slurmctld.service (boo#1103561). - Update to 17.11.8 * Fix incomplete RESPONSE_[RESOURCE|JOB_PACK]_ALLOCATION building path. * Do not allocate nodes that were marked down due to the node not responding by ResumeTimeout. * task/cray plugin - search for "mems" cgroup information in the file "cpuset.mems" then fall back to the file "mems". * Fix ipmi profile debug uninitialized variable. * PMIx: fixed the direct connect inline msg sending. OBS-URL: https://build.opensuse.org/request/show/629222 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=64
2018-08-14 15:00:16 +02:00
%{buildroot}/%{_unitdir}/slurmdbd.service
%if 0%{?have_sysuser}
echo "u %slurm_u %{slurm_uid} \"%slurmdescr\" %{slurmdir}\n" > system-user-%{pname}.conf
%sysusers_generate_pre system-user-%{pname}.conf %{pname} system-user-%{pname}.conf
install -D -m 644 system-user-%{pname}.conf %{buildroot}%{_sysusersdir}/system-user-%{pname}.conf
%endif
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
%endif
# Delete static files:
rm -rf %{buildroot}/%{_libdir}/slurm/*.{a,la} \
%{buildroot}/%{_libdir}/*.la \
%{buildroot}/%_lib/security/*.la
rm %{buildroot}/%{perl_archlib}/perllocal.pod \
%{buildroot}/%{perl_vendorarch}/auto/Slurm/.packlist \
%{buildroot}/%{perl_vendorarch}/auto/Slurmdb/.packlist
# Remove Cray specific binaries
rm -f %{buildroot}/%{_sbindir}/capmc_suspend \
%{buildroot}/%{_sbindir}/capmc_resume
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
# Build man pages that are generated directly by the tools
%{buildroot}%{_bindir}/sjobexitmod --roff > %{buildroot}/%{_mandir}/man1/sjobexitmod.1
%{buildroot}%{_bindir}/sjstat --roff > %{buildroot}/%{_mandir}/man1/sjstat.1
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
# avoid conflicts with other packages, make wrapper unique
mv %{buildroot}/%{_bindir}/mpiexec %{buildroot}/%{_bindir}/mpiexec.slurm
mkdir -p %{buildroot}/etc/ld.so.conf.d
echo '%{_libdir}/slurm' > %{buildroot}/etc/ld.so.conf.d/slurm.conf
chmod 644 %{buildroot}/etc/ld.so.conf.d/slurm.conf
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
# Make pkg-config file
mkdir -p %{buildroot}/%{_libdir}/pkgconfig
cat > %{buildroot}/%{_libdir}/pkgconfig/slurm.pc <<EOF
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
includedir=%{_prefix}/include
libdir=%{_libdir}
Cflags: -I\${includedir}
Libs: -L\${libdir} -lslurm
Description: Slurm API
Name: %{pname}
Version: %{version}
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
EOF
# Enable rotation of log files
mkdir -p %{buildroot}/%{_sysconfdir}/logrotate.d/
for service in slurmd slurmctld slurmdbd ; do
cat <<EOF > %{buildroot}/%{_sysconfdir}/logrotate.d/${service}.conf
/var/log/${service}.log {
compress
dateext
missingok
nocreate
notifempty
maxage 365
rotate 99
copytruncate
postrotate
pgrep ${service} && killall -SIGUSR2 ${service} || exit 0
endscript
}
EOF
done
mkdir -p %{buildroot}/%{apache_sysconfdir}/conf.d
cat > %{buildroot}/%{apache_sysconfdir}/conf.d/slurm.conf <<EOF
Alias /slurm/ "/usr/share/doc/slurm-%{ver}/html/"
<Directory "/usr/share/doc/slurm-%{ver}/html/">
AllowOverride None
DirectoryIndex slurm.html
# Controls who can get stuff from this server.
<IfModule !mod_access_compat.c>
Require all granted
</IfModule>
<IfModule mod_access_compat.c>
Order allow,deny
Allow from all
</IfModule>
</Directory>
EOF
cat > %{buildroot}/%{_sysconfdir}/%{pname}/nss_slurm.conf <<EOF
## Optional config for libnss_slurm
## Specify if different from default
# SlurmdSpoolDir /var/spool/slurmd
## Specify if does not match hostname
# NodeName myname
EOF
%fdupes -s %{buildroot}
# Temporary - remove when build is fixed upstream.
%if !0%{?build_slurmrestd}
rm -f %{buildroot}/%{_mandir}/man8/slurmrestd.*
rm -f %{buildroot}/%{_libdir}/slurm/openapi_*
%endif
%check
%{!?nocheck:make check}
%define fixperm() [ $1 -eq 1 -a -e %2 ] && /bin/chmod %1 %2
%if 0%{!?service_del_postun_without_restart:1}
%define service_del_postun_without_restart() %{expand:%%service_del_postun -n %{**}}
%endif
%pre
Accepting request 454272 from home:eeich:branches:network:cluster - Updated to 16.05.8.1 * Remove StoragePass from being printed out in the slurmdbd log at debug2 level. * Defer PATH search for task program until launch in slurmstepd. * Modify regression test1.89 to avoid leaving vestigial job. Also reduce logging to reduce likelyhood of Expect buffer overflow. * Do not PATH search for mult-prog launches if LaunchParamters=test_exec is enabled. * Fix for possible infinite loop in select/cons_res plugin when trying to satisfy a job's ntasks_per_core or socket specification. * If job is held for bad constraints make it so once updated the job doesn't go into JobAdminHeld. * sched/backfill - Fix logic to reserve resources for jobs that require a node reboot (i.e. to change KNL mode) in order to start. * When unpacking a node or front_end record from state and the protocol version is lower than the min version, set it to the min. * Remove redundant lookup for part_ptr when updating a reservation's nodes. * Fix memory and file descriptor leaks in slurmd daemon's sbcast logic. * Do not allocate specialized cores to jobs using the --exclusive option. * Cancel interactive job if Prolog failure with "PrologFlags=contain" or "PrologFlags=alloc" configured. Send new error prolog failure message to the salloc or srun command as needed. * Prevent possible out-of-bounds read in slurmstepd on an invalid #! line. * Fix check for PluginDir within slurmctld to work with multiple directories. * Cancel interactive jobs automatically on communication error to launching srun/salloc process. * Fix security issue caused by insecure file path handling triggered by the failure of a Prolog script. To exploit this a user needs to anticipate or cause the Prolog to fail for their job. CVE-2016-10030 (bsc#1018371). - Replace group/user add macros with function calls. - Disable building with netloc support: the netloc API is part of the devel branch of hwloc. Since this devel branch was included accidentally and has been reversed since, we need to disable this for the time being. - Conditionalized architecture specific pieces to support non-x86 architectures better. - Remove: unneeded 'BuildRequires: python' - Add: BuildRequires: freeipmi-devel BuildRequires: libibmad-devel BuildRequires: libibumad-devel so they are picked up by the slurm build. - Enable modifications from openHPC Project. - Enable lua API package build. - Add a recommends for slurm-munge to the slurm package: This is way, the munge auth method is available and slurm works out of the box. - Create /var/lib/slurm as StateSaveLocation directory. /tmp is dangerous. - Keep %{_libdir}/libpmi* and %{_libdir}/mpi_pmi2* on SUSE. OBS-URL: https://build.opensuse.org/request/show/454272 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=13
2017-02-02 21:23:02 +01:00
%if 0%{?with_systemd}
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
%service_add_pre slurmctld.service
%endif
%post
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
%if 0%{?with_systemd}
%service_add_post slurmctld.service
%else
%fillup_and_insserv slurm
%endif
%preun
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
%if 0%{?with_systemd}
%service_del_preun slurmctld.service
%endif
%postun
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
%if 0%{?with_systemd}
%service_del_postun_without_restart slurmctld.service
%else
%insserv_cleanup
%endif
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
%if 0%{?with_systemd}
%pre slurmdbd
%service_add_pre slurmdbd.service
%endif
%post slurmdbd
%{fixperm 0600 %{_sysconfdir}/%{pname}/slurmdbd.conf}
%{fixperm 0600 %{_sysconfdir}/%{pname}/slurmdbd.conf.example}
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
%if 0%{?with_systemd}
%service_add_post slurmdbd.service
%else
%fillup_and_insserv slurmdbd
%endif
%preun slurmdbd
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
%if 0%{?with_systemd}
%service_del_preun slurmdbd.service
%else
%stop_on_removal slurmdbd
%endif
%postun slurmdbd
%{fixperm 0600 %{_sysconfdir}/%{pname}/slurmdbd.conf}
%{fixperm 0600 %{_sysconfdir}/%{pname}/slurmdbd.conf.example}
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
%if 0%{?with_systemd}
%service_del_postun_without_restart slurmdbd.service
%else
%restart_on_update slurmdbd
%insserv_cleanup
%endif
%pre node
%if 0%{?with_systemd}
%service_add_pre slurmd.service
%endif
%post node
%if 0%{?with_systemd}
%service_add_post slurmd.service
%endif
%preun node
%if 0%{?with_systemd}
%service_del_preun slurmd.service
%else
%stop_on_removal slurmd
%endif
%postun node
%if 0%{?with_systemd}
%service_del_postun_without_restart slurmd.service
%else
%restart_on_update slurmd
%insserv_cleanup
%endif
%pre config %{?have_sysuser:-f %{pname}.pre}
%if 0%{!?have_sysuser:1}
getent group %slurm_g >/dev/null || groupadd -r %slurm_g
getent passwd %slurm_u >/dev/null || useradd -r -g %slurm_g -d %slurmdir -s /bin/false -c %{slurmdescr} %slurm_u
[ -d %{_localstatedir}/spool/slurm ] && /bin/chown -h %slurm_u:%slurm_g %{_localstatedir}/spool/slurm
exit 0
%endif
%post config
%if 0%{?with_systemd}
%if 0%{?tmpfiles_create:1}
%tmpfiles_create slurm.conf
%else
systemd-tmpfiles --create slurm.conf
%endif
%endif
%post -n %{libslurm} -p /sbin/ldconfig
%postun -n %{libslurm} -p /sbin/ldconfig
%post -n libpmi%{pmi_so}%{?upgrade:%{_ver}} -p /sbin/ldconfig
%postun -n libpmi%{pmi_so}%{?upgrade:%{_ver}} -p /sbin/ldconfig
%post -n libnss_%{pname}%{nss_so}%{?upgrade:%{_ver}} -p /sbin/ldconfig
%postun -n libnss_%{pname}%{nss_so}%{?upgrade:%{_ver}} -p /sbin/ldconfig
%{!?nil:
# On update the %%postun code of the old package restarts the
# service. This breaks in case the ABI between slurm and its
# plugins has changed as updates are not atomic. Since we cannot
# fix the old scripts we need these macros as a workaround.
# They should be removed at some point.
# Do pretrans in lua: https://fedoraproject.org/wiki/Packaging:Scriptlets
}
%define _test_rest() %{?with_systemd: os.remove("/run/%{1}.rst")
if os.execute() and os.getenv("YAST_IS_RUNNING") ~= "instsys" then
local handle = io.popen("systemctl is-active %{1} 2>&1")
local str = handle:read("*a"); handle:close()
str = string.gsub(str, '^%%s+', '')
str = string.gsub(str, '%%s+$', '')
str = string.gsub(str, '[\\n\\r]+', ' ')
if str == "active" then
local file = io.open("/run/%{1}.rst","w"); file:close()
end
end
}
%define _rest() %{?with_systemd:[ -e /run/%{1}.rst ] && { systemctl status %{1} &>/dev/null || systemctl restart %{1}; }; rm -f /run/%{1}.rst;}
%{!?nil:
# Until a posttrans macro has been added to macros.systemd, we need this
# Do NOT delete the line breaks in the macro definition: they help
# to cope with different versions of the %%_restart_on_update.
}
%define _res_update() %{?with_systemd:
%{expand:%%_restart_on_update %{?*}}
}
%pretrans -p <lua>
%_test_rest slurmctld
%pretrans node -p <lua>
%_test_rest slurmd
%pretrans slurmdbd -p <lua>
%_test_rest slurmdbd
%posttrans
%_res_update slurmctld
%_rest slurmctld
%posttrans node
%_res_update slurmd
%_rest slurmd
%posttrans slurmdbd
%_res_update slurmdbd.service
%_rest slurmdbd
Accepting request 629222 from home:eeich:branches:network:cluster - Update to 17.11.9 * Fix segfault in slurmctld when a job's node bitmap is NULL during a scheduling cycle. Primarily caused by EnforcePartLimits=ALL. * Remove erroneous unlock in acct_gather_energy/ipmi. * Enable support for hwloc version 2.0.1. * Fix 'srun -q' (--qos) option handling. * Fix socket communication issue that can lead to lost task completition messages, which will cause a permanently stuck srun process. * Handle creation of TMPDIR if environment variable is set or changed in a task prolog script. * Avoid node layout fragmentation if running with a fixed CPU count but without Sockets and CoresPerSocket defined. * burst_buffer/cray - Fix datawarp swap default pool overriding jobdw. * Fix incorrect job priority assignment for multi-partition job with different PriorityTier settings on the partitions. * Fix sinfo to print correct node state. - When using a remote shared StateSaveLocation, slurmctld needs to be started after remote filesystems have become available. Add 'remote-fs.target' to the 'After=' directive in slurmctld.service (boo#1103561). - Update to 17.11.8 * Fix incomplete RESPONSE_[RESOURCE|JOB_PACK]_ALLOCATION building path. * Do not allocate nodes that were marked down due to the node not responding by ResumeTimeout. * task/cray plugin - search for "mems" cgroup information in the file "cpuset.mems" then fall back to the file "mems". * Fix ipmi profile debug uninitialized variable. * PMIx: fixed the direct connect inline msg sending. OBS-URL: https://build.opensuse.org/request/show/629222 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=64
2018-08-14 15:00:16 +02:00
%if 0%{?sle_version} > 120200 || 0%{?suse_version} > 1320
%define my_license %license
%else
%define my_license %doc
Accepting request 629222 from home:eeich:branches:network:cluster - Update to 17.11.9 * Fix segfault in slurmctld when a job's node bitmap is NULL during a scheduling cycle. Primarily caused by EnforcePartLimits=ALL. * Remove erroneous unlock in acct_gather_energy/ipmi. * Enable support for hwloc version 2.0.1. * Fix 'srun -q' (--qos) option handling. * Fix socket communication issue that can lead to lost task completition messages, which will cause a permanently stuck srun process. * Handle creation of TMPDIR if environment variable is set or changed in a task prolog script. * Avoid node layout fragmentation if running with a fixed CPU count but without Sockets and CoresPerSocket defined. * burst_buffer/cray - Fix datawarp swap default pool overriding jobdw. * Fix incorrect job priority assignment for multi-partition job with different PriorityTier settings on the partitions. * Fix sinfo to print correct node state. - When using a remote shared StateSaveLocation, slurmctld needs to be started after remote filesystems have become available. Add 'remote-fs.target' to the 'After=' directive in slurmctld.service (boo#1103561). - Update to 17.11.8 * Fix incomplete RESPONSE_[RESOURCE|JOB_PACK]_ALLOCATION building path. * Do not allocate nodes that were marked down due to the node not responding by ResumeTimeout. * task/cray plugin - search for "mems" cgroup information in the file "cpuset.mems" then fall back to the file "mems". * Fix ipmi profile debug uninitialized variable. * PMIx: fixed the direct connect inline msg sending. OBS-URL: https://build.opensuse.org/request/show/629222 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=64
2018-08-14 15:00:16 +02:00
%endif
%files
%{?comp_at}
%doc AUTHORS NEWS RELEASE_NOTES DISCLAIMER
Accepting request 629222 from home:eeich:branches:network:cluster - Update to 17.11.9 * Fix segfault in slurmctld when a job's node bitmap is NULL during a scheduling cycle. Primarily caused by EnforcePartLimits=ALL. * Remove erroneous unlock in acct_gather_energy/ipmi. * Enable support for hwloc version 2.0.1. * Fix 'srun -q' (--qos) option handling. * Fix socket communication issue that can lead to lost task completition messages, which will cause a permanently stuck srun process. * Handle creation of TMPDIR if environment variable is set or changed in a task prolog script. * Avoid node layout fragmentation if running with a fixed CPU count but without Sockets and CoresPerSocket defined. * burst_buffer/cray - Fix datawarp swap default pool overriding jobdw. * Fix incorrect job priority assignment for multi-partition job with different PriorityTier settings on the partitions. * Fix sinfo to print correct node state. - When using a remote shared StateSaveLocation, slurmctld needs to be started after remote filesystems have become available. Add 'remote-fs.target' to the 'After=' directive in slurmctld.service (boo#1103561). - Update to 17.11.8 * Fix incomplete RESPONSE_[RESOURCE|JOB_PACK]_ALLOCATION building path. * Do not allocate nodes that were marked down due to the node not responding by ResumeTimeout. * task/cray plugin - search for "mems" cgroup information in the file "cpuset.mems" then fall back to the file "mems". * Fix ipmi profile debug uninitialized variable. * PMIx: fixed the direct connect inline msg sending. OBS-URL: https://build.opensuse.org/request/show/629222 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=64
2018-08-14 15:00:16 +02:00
%my_license COPYING
%{_bindir}/sacct
%{_bindir}/sacctmgr
%{_bindir}/salloc
%{_bindir}/sattach
%{_bindir}/sbatch
%{_bindir}/sbcast
%{_bindir}/scancel
%{_bindir}/scrontab
%{_bindir}/scontrol
%{_bindir}/sdiag
%{_bindir}/sgather
%{_bindir}/sinfo
%{_bindir}/sjobexitmod
%{_bindir}/sprio
%{_bindir}/squeue
%{_bindir}/sreport
%{_bindir}/sshare
%{_bindir}/sstat
%{_bindir}/strigger
%{?have_netloc:%{_bindir}/netloc_to_topology}
%{_sbindir}/slurmctld
%{_sbindir}/slurmsmwd
Accepting request 650545 from home:eeich:branches:network:cluster - Added missing perl-base dependency. - Moved HTML docs to doc package. - Moved config man pages to a separate package: This way, they won't get installed on compute nodes. - Update to 18.08.3 * Add new burst buffer state of "teardown-fail" to indicate the burst buffer teardown operation is failing on specific buffers. * Multiple backup slurmctld daemons can be configured * Enable jobs with zero node count for creation and/or deletion of persistent burst buffers. * Add "scontrol show dwstat" command to display Cray burst buffer status. * Add "GetSysStatus" option to burst_buffer.conf file. * Add node and partition configuration options of "CpuBind" to control default task binding. * Add "NumaCpuBind" option to knl.conf * Add sbatch "--batch" option to identify features required on batch node. * Add "BatchFeatures" field to output of "scontrol show job". * Add support for "--bb" option to sbatch command. * Add new SystemComment field to job data structure and database. * Expand reservation "flags" field from 32 to 64 bits. * Add job state flag of "SIGNALING" to avoid race condition. * Properly handle srun --will-run option when there are jobs in COMPLETING state. * Properly report who is signaling a step. * Don't combine updated reservation records in sreport's reservation report. * node_features plugin - Add suport for XOR & XAND of job constraints (node feature specifications). OBS-URL: https://build.opensuse.org/request/show/650545 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=75
2018-11-20 18:07:44 +01:00
%dir %{_libdir}/slurm/src
%if 0%{?with_systemd}
%{_unitdir}/slurmctld.service
%{_sbindir}/rcslurmctld
%endif
%{_mandir}/man1/sacct.1*
%{_mandir}/man1/sacctmgr.1*
%{_mandir}/man1/salloc.1*
%{_mandir}/man1/sattach.1*
%{_mandir}/man1/sbatch.1*
%{_mandir}/man1/sbcast.1*
%{_mandir}/man1/scancel.1*
%{_mandir}/man1/scrontab.1*
%{_mandir}/man1/scontrol.1*
%{_mandir}/man1/sdiag.1.*
%{_mandir}/man1/sgather.1.*
%{_mandir}/man1/sinfo.1*
%{_mandir}/man1/slurm.1*
%{_mandir}/man1/sprio.1*
%{_mandir}/man1/squeue.1*
%{_mandir}/man1/sreport.1*
%{_mandir}/man1/sshare.1*
%{_mandir}/man1/sstat.1*
%{_mandir}/man1/strigger.1*
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
%{_mandir}/man1/sjobexitmod.1.*
%{_mandir}/man1/sjstat.1.*
%{_mandir}/man8/slurmctld.*
%{_mandir}/man8/spank*
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
%files openlava
%{?comp_at}
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
%{_bindir}/bjobs
%{_bindir}/bkill
%{_bindir}/bsub
%{_bindir}/lsid
%files seff
%{?comp_at}
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
%{_bindir}/seff
%{_bindir}/smail
Accepting request 435622 from home:eeich:branches:network:cluster - version 15.08.7.1 * Remove the 1024-character limit on lines in batch scripts. task/affinity: Disable core-level task binding if more CPUs required than available cores. * Preemption/gang scheduling: If a job is suspended at slurmctld restart or reconfiguration time, then leave it suspended rather than resume+suspend. * Don't use lower weight nodes for job allocation when topology/tree used. * Don't allow user specified reservation names to disrupt the normal reservation sequeuece numbering scheme. * Avoid hard-link/copy of script/environment files for job arrays. Use the master job record file for all tasks of the job array. NOTE: Job arrays submitted to Slurm version 15.08.6 or later will fail if the slurmctld daemon is downgraded to an earlier version of Slurm. * In slurmctld log file, log duplicate job ID found by slurmd. Previously was being logged as prolog/epilog failure. * If a job is requeued while in the process of being launch, remove it's job ID from slurmd's record of active jobs in order to avoid generating a duplicate job ID error when launched for the second time (which would drain the node). * Cleanup messages when handling job script and environment variables in older directory structure formats. * Prevent triggering gang scheduling within a partition if configured with PreemptType=partition_prio and PreemptMode=suspend,gang. * Decrease parallelism in job cancel request to prevent denial of service when cancelling huge numbers of jobs. * If all ephemeral ports are in use, try using other port numbers. * Prevent "scontrol update job" from updating jobs that have already finished. * Show requested TRES in "squeue -O tres" when job is pending. * Backfill scheduler: Test association and QOS node limits before reserving resources for pending job. * Many bug fixes. - Use source services to download package. - Fix code for new API of hwloc-2.0. - package netloc_to_topology where avialable. - Package documentation. OBS-URL: https://build.opensuse.org/request/show/435622 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=10
2016-10-16 21:51:20 +02:00
%files doc
%{?comp_at}
%dir %{_datadir}/doc/%{pname}-%{dl_ver}
%{_datadir}/doc/%{pname}-%{dl_ver}/*
Accepting request 435622 from home:eeich:branches:network:cluster - version 15.08.7.1 * Remove the 1024-character limit on lines in batch scripts. task/affinity: Disable core-level task binding if more CPUs required than available cores. * Preemption/gang scheduling: If a job is suspended at slurmctld restart or reconfiguration time, then leave it suspended rather than resume+suspend. * Don't use lower weight nodes for job allocation when topology/tree used. * Don't allow user specified reservation names to disrupt the normal reservation sequeuece numbering scheme. * Avoid hard-link/copy of script/environment files for job arrays. Use the master job record file for all tasks of the job array. NOTE: Job arrays submitted to Slurm version 15.08.6 or later will fail if the slurmctld daemon is downgraded to an earlier version of Slurm. * In slurmctld log file, log duplicate job ID found by slurmd. Previously was being logged as prolog/epilog failure. * If a job is requeued while in the process of being launch, remove it's job ID from slurmd's record of active jobs in order to avoid generating a duplicate job ID error when launched for the second time (which would drain the node). * Cleanup messages when handling job script and environment variables in older directory structure formats. * Prevent triggering gang scheduling within a partition if configured with PreemptType=partition_prio and PreemptMode=suspend,gang. * Decrease parallelism in job cancel request to prevent denial of service when cancelling huge numbers of jobs. * If all ephemeral ports are in use, try using other port numbers. * Prevent "scontrol update job" from updating jobs that have already finished. * Show requested TRES in "squeue -O tres" when job is pending. * Backfill scheduler: Test association and QOS node limits before reserving resources for pending job. * Many bug fixes. - Use source services to download package. - Fix code for new API of hwloc-2.0. - package netloc_to_topology where avialable. - Package documentation. OBS-URL: https://build.opensuse.org/request/show/435622 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=10
2016-10-16 21:51:20 +02:00
%files webdoc
%{?comp_at}
%{apache_sysconfdir}/conf.d/slurm.conf
%files -n %{libslurm}
%{?comp_at}
%{_libdir}/libslurm*.so.%{so_version}*
%files -n libpmi%{pmi_so}%{?upgrade:%{_ver}}
%{?comp_at}
%{_libdir}/libpmi*.so.%{pmi_so}*
%files -n libnss_%{pname}%{nss_so}%{?upgrade:%{_ver}}
%{?comp_at}
%config(noreplace) %{_sysconfdir}/%{pname}/nss_slurm.conf
%{_libdir}/libnss_slurm.so.%{nss_so}
%files devel
%{?comp_at}
%{_prefix}/include/slurm
Accepting request 454272 from home:eeich:branches:network:cluster - Updated to 16.05.8.1 * Remove StoragePass from being printed out in the slurmdbd log at debug2 level. * Defer PATH search for task program until launch in slurmstepd. * Modify regression test1.89 to avoid leaving vestigial job. Also reduce logging to reduce likelyhood of Expect buffer overflow. * Do not PATH search for mult-prog launches if LaunchParamters=test_exec is enabled. * Fix for possible infinite loop in select/cons_res plugin when trying to satisfy a job's ntasks_per_core or socket specification. * If job is held for bad constraints make it so once updated the job doesn't go into JobAdminHeld. * sched/backfill - Fix logic to reserve resources for jobs that require a node reboot (i.e. to change KNL mode) in order to start. * When unpacking a node or front_end record from state and the protocol version is lower than the min version, set it to the min. * Remove redundant lookup for part_ptr when updating a reservation's nodes. * Fix memory and file descriptor leaks in slurmd daemon's sbcast logic. * Do not allocate specialized cores to jobs using the --exclusive option. * Cancel interactive job if Prolog failure with "PrologFlags=contain" or "PrologFlags=alloc" configured. Send new error prolog failure message to the salloc or srun command as needed. * Prevent possible out-of-bounds read in slurmstepd on an invalid #! line. * Fix check for PluginDir within slurmctld to work with multiple directories. * Cancel interactive jobs automatically on communication error to launching srun/salloc process. * Fix security issue caused by insecure file path handling triggered by the failure of a Prolog script. To exploit this a user needs to anticipate or cause the Prolog to fail for their job. CVE-2016-10030 (bsc#1018371). - Replace group/user add macros with function calls. - Disable building with netloc support: the netloc API is part of the devel branch of hwloc. Since this devel branch was included accidentally and has been reversed since, we need to disable this for the time being. - Conditionalized architecture specific pieces to support non-x86 architectures better. - Remove: unneeded 'BuildRequires: python' - Add: BuildRequires: freeipmi-devel BuildRequires: libibmad-devel BuildRequires: libibumad-devel so they are picked up by the slurm build. - Enable modifications from openHPC Project. - Enable lua API package build. - Add a recommends for slurm-munge to the slurm package: This is way, the munge auth method is available and slurm works out of the box. - Create /var/lib/slurm as StateSaveLocation directory. /tmp is dangerous. - Keep %{_libdir}/libpmi* and %{_libdir}/mpi_pmi2* on SUSE. OBS-URL: https://build.opensuse.org/request/show/454272 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=13
2017-02-02 21:23:02 +01:00
%{_libdir}/libpmi.so
%{_libdir}/libpmi2.so
%{_libdir}/libslurm.so
%{_libdir}/slurm/src/*
Accepting request 915777 from home:mslacken:slurm_update - updated to 21.08.1, major changes: * A new "AccountingStoreFlags=job_script" option to store the job scripts directly in SlurmDBD. * Added "sacct -o SubmitLine" format option to get the submit line of a job/step. * Changes to the node state management so that nodes are marked as PLANNED instead of IDLE if the scheduler is still accumulating resources while waiting to launch a job on them. * RS256 token support in auth/jwt. * Overhaul of the cgroup subsystems to simplify operation, mitigate a number of inherent race conditions, and prepare for future cgroup v2 support. * Further improvements to cloud node power state management. * A new child process of the Slurm controller called "slurmscriptd" responsible for executing PrologSlurmctld and EpilogSlurmctld scripts, which significantly reduces performance issues associated with enabling those options. * A new burst_buffer/lua plugin allowing for site-specific asynchronous job data management. * Fixes to the job_container/tmpfs plugin to allow the slurmd process to be restarted while the job is running without issue. * Added json/yaml output to sacct, squeue, and sinfo commands. * Added a new node_features/helpers plugin to provide a generic way to change settings on a compute node across a reboot. * Added support for automatically detecting and broadcasting shared libraries for an executable launched with "srun --bcast". * Added initial OCI container execution support with a new --container option to sbatch and srun. * Improved "configless" support by allowing multiple control servers to be specified through the slurmd --conf-server option, and send additional configuration files at startup including cli_filter.lua. OBS-URL: https://build.opensuse.org/request/show/915777 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=184
2021-09-06 15:29:00 +02:00
#%{_mandir}/man3/slurm_*
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
%{_libdir}/pkgconfig/slurm.pc
%files sview
%{?comp_at}
%{_bindir}/sview
%{_mandir}/man1/sview.1*
%files auth-none
%{?comp_at}
%{_libdir}/slurm/auth_none.so
%files munge
%{?comp_at}
%{_libdir}/slurm/auth_munge.so
%{_libdir}/slurm/cred_munge.so
%files -n perl-%{name}
%{?comp_at}
%{perl_vendorarch}/Slurm.pm
%{perl_vendorarch}/Slurm
%{perl_vendorarch}/auto/Slurm
%{perl_vendorarch}/Slurmdb.pm
%{perl_vendorarch}/auto/Slurmdb
%{_mandir}/man3/Slurm*.3pm.*
%files slurmdbd
%{?comp_at}
%{_sbindir}/slurmdbd
%{_mandir}/man5/slurmdbd.*
%{_mandir}/man8/slurmdbd.*
%config(noreplace) %attr(0600,%slurm_u,%slurm_g) %{_sysconfdir}/%{pname}/slurmdbd.conf
%attr(0600,%slurm_u,%slurm_g) %{_sysconfdir}/%{pname}/slurmdbd.conf.example
Accepting request 441490 from home:eeich:branches:network:cluster - Fix build with and without OHCP_BUILD define. - Fix build for systemd and non-systemd. - Updated to 16-05-5 - equvalent to OpenHPC 1.2. * Fix issue with resizing jobs and limits not be kept track of correctly. * BGQ - Remove redeclaration of job_read_lock. * BGQ - Tighter locks around structures when nodes/cables change state. * Make it possible to change CPUsPerTask with scontrol. * Make it so scontrol update part qos= will take away a partition QOS from a partition. * Backfill scheduling properly synchronized with Cray Node Health Check. Prior logic could result in highest priority job getting improperly postponed. * Make it so daemons also support TopologyParam=NoInAddrAny. * If scancel is operating on large number of jobs and RPC responses from slurmctld daemon are slow then introduce a delay in sending the cancel job requests from scancel in order to reduce load on slurmctld. * Remove redundant logic when updating a job's task count. * MySQL - Fix querying jobs with reservations when the id's have rolled. * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids. * Launch batch job requsting --reboot after the boot completes. * Do not attempt to power down a node which has never responded if the slurmctld daemon restarts without state. * Fix for possible slurmstepd segfault on invalid user ID. * MySQL - Fix for possible race condition when archiving multiple clusters at the same time. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Remove the SchedulerParameters option of "assoc_limit_continue", making it the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop" is set and a job cannot start due to association limits, then do not attempt to initiate any lower priority jobs in that partition. Setting this can decrease system throughput and utlization, but avoid potentially starving larger jobs by preventing them from launching indefinitely. * Update a node's socket and cores per socket counts as needed after a node boot to reflect configuration changes which can occur on KNL processors. Note that the node's total core count must not change, only the distribution of cores across varying socket counts (KNL NUMA nodes treated as sockets by Slurm). * Rename partition configuration from "Shared" to "OverSubscribe". Rename salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old options will continue to function. Output field names also changed in scontrol, sinfo, squeue and sview. * Add SLURM_UMASK environment variable to user job. * knl_conf: Added new configuration parameter of CapmcPollFreq. * Cleanup two minor Coverity warnings. * Make it so the tres units in a job's formatted string are converted like they are in a step. * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by multiple partitions. * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references. * Display thread name instead of thread id and remove process name in stderr logging for "thread_id" LogTimeFormat. * Log IP address of bad incomming message to slurmctld. * If a user requests tasks, nodes and ntasks-per-node and tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node. * Release CPU "owner" file locks. * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency in spec file. * Allow QOS timelimit to override partition timelimit when EnforcePartLimits is set to all/any. * Make it so qsub will do a "basename" on a wrapped command for the output and error files. * Add logic so that slurmstepd can be launched under valgrind. * Increase buffer size to read /proc/*/stat files. * Prevent job stuck in configuring state if slurmctld daemon restarted while PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation as needed. * Move test for job wait reason value of BurstBufferResources and BurstBufferStageIn later in the scheduling logic. * Document which srun options apply to only job, only step, or job and step allocations. * Use more compatible function to get thread name (>= 2.6.11). * Make it so the extern step uses a reverse tree when cleaning up. * If extern step doesn't get added into the proctrack plugin make sure the sleep is killed. * Add web links to Slurm Diamond Collectors (from Harvard University) and collectd (from EDF). * Add job_submit plugin for the "reboot" field. * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to job_submit/lua plugins. * Send in a -1 for a taskid into spank_task_post_fork for the extern_step. * MYSQL - Sightly better logic if a job completion comes in with an end time of 0. * task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft memory limit to allocated memory limit (previously no soft limit was set). * Streamline when schedule() is called when running with message aggregation on batch script completes. * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t. * Document that persistent burst buffers can not be created or destroyed using the salloc or srun --bb options. * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and SLURM_JOB_RESERVAION environment variables are set for the salloc command. Document the same environment variables for the salloc, sbatch and srun commands in their man pages. * Fix issue where sacctmgr load cluster.cfg wouldn't load associations that had a partition in them. * Don't return the extern step from sstat by default. * In sstat print 'extern' instead of 4294967295 for the extern step. * Make advanced reservations work properly with core specialization. * slurmstepd modified to pre-load all relevant plugins at startup to avoid the possibility of modified plugins later resulting in inconsistent API or data structures and a failure of slurmstepd. * Export functions from parse_time.c in libslurm.so. * Export unit convert functions from slurm_protocol_api.c in libslurm.so. * Fix scancel to allow multiple steps from a job to be cancelled at once. * Update and expand upgrade guide (in Quick Start Administrator web page). * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run operation. * Insure reported expected job start time is not in the past for pending jobs. * Add support for PMIx v2. OBS-URL: https://build.opensuse.org/request/show/441490 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 23:01:51 +01:00
%if 0%{?with_systemd}
%{_unitdir}/slurmdbd.service
%else
%{_initrddir}/slurmdbd
%endif
%{_sbindir}/rcslurmdbd
%files sql
%{?comp_at}
%dir %{_libdir}/slurm
%{_libdir}/slurm/accounting_storage_mysql.so
%{_libdir}/slurm/jobcomp_mysql.so
%files plugins
%{?comp_at}
%config %{_sysconfdir}/ld.so.conf.d/slurm.conf
%config(noreplace) %{_sysconfdir}/%{pname}/plugstack.conf
%dir %{_sysconfdir}/%{pname}/plugstack.conf.d
%dir %{_libdir}/slurm
%{_libdir}/slurm/libslurmfull.so
%{_libdir}/slurm/accounting_storage_none.so
%{_libdir}/slurm/accounting_storage_slurmdbd.so
%{_libdir}/slurm/acct_gather_energy_pm_counters.so
%{_libdir}/slurm/acct_gather_energy_ibmaem.so
%{_libdir}/slurm/acct_gather_energy_none.so
%{_libdir}/slurm/acct_gather_energy_rapl.so
Accepting request 226317 from home:scorot:branches:network:cluster - update to version 2.6.7 * Support for job arrays, which increases performance and ease of use for sets of similar jobs. * Job profiling capability added to record a wide variety of job characteristics for each task on a user configurable periodic basis. Data currently available includes CPU use, memory use, energy use, Infiniband network use, Lustre file system use, etc. * Support for MPICH2 using PMI2 communications interface with much greater scalability. * Prolog and epilog support for advanced reservations. * Much faster throughput for job step execution with --exclusive option. The srun process is notified when resources become available rather than periodic polling. * Support improved for Intel MIC (Many Integrated Core) processor. * Advanced reservations with hostname and core counts now supports asymmetric reservations (e.g. specific different core count for each node). * External sensor plugin infrastructure added to record power consumption, temperature, etc. * Improved performance for high-throughput computing. * MapReduce+ support (launches ~1000x faster, runs ~10x faster). * Added "MaxCPUsPerNode" partition configuration parameter. This can be especially useful to schedule GPUs. For example a node can be associated with two Slurm partitions (e.g. "cpu" and "gpu") and the partition/queue "cpu" could be limited to only a subset of the node's CPUs, insuring that one or more CPUs would be available to jobs in the "gpu" partition/queue. OBS-URL: https://build.opensuse.org/request/show/226317 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=4
2014-03-16 21:42:08 +01:00
%{_libdir}/slurm/acct_gather_filesystem_lustre.so
%{_libdir}/slurm/acct_gather_filesystem_none.so
%{_libdir}/slurm/acct_gather_interconnect_none.so
Accepting request 226317 from home:scorot:branches:network:cluster - update to version 2.6.7 * Support for job arrays, which increases performance and ease of use for sets of similar jobs. * Job profiling capability added to record a wide variety of job characteristics for each task on a user configurable periodic basis. Data currently available includes CPU use, memory use, energy use, Infiniband network use, Lustre file system use, etc. * Support for MPICH2 using PMI2 communications interface with much greater scalability. * Prolog and epilog support for advanced reservations. * Much faster throughput for job step execution with --exclusive option. The srun process is notified when resources become available rather than periodic polling. * Support improved for Intel MIC (Many Integrated Core) processor. * Advanced reservations with hostname and core counts now supports asymmetric reservations (e.g. specific different core count for each node). * External sensor plugin infrastructure added to record power consumption, temperature, etc. * Improved performance for high-throughput computing. * MapReduce+ support (launches ~1000x faster, runs ~10x faster). * Added "MaxCPUsPerNode" partition configuration parameter. This can be especially useful to schedule GPUs. For example a node can be associated with two Slurm partitions (e.g. "cpu" and "gpu") and the partition/queue "cpu" could be limited to only a subset of the node's CPUs, insuring that one or more CPUs would be available to jobs in the "gpu" partition/queue. OBS-URL: https://build.opensuse.org/request/show/226317 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=4
2014-03-16 21:42:08 +01:00
%{_libdir}/slurm/acct_gather_profile_none.so
Accepting request 915777 from home:mslacken:slurm_update - updated to 21.08.1, major changes: * A new "AccountingStoreFlags=job_script" option to store the job scripts directly in SlurmDBD. * Added "sacct -o SubmitLine" format option to get the submit line of a job/step. * Changes to the node state management so that nodes are marked as PLANNED instead of IDLE if the scheduler is still accumulating resources while waiting to launch a job on them. * RS256 token support in auth/jwt. * Overhaul of the cgroup subsystems to simplify operation, mitigate a number of inherent race conditions, and prepare for future cgroup v2 support. * Further improvements to cloud node power state management. * A new child process of the Slurm controller called "slurmscriptd" responsible for executing PrologSlurmctld and EpilogSlurmctld scripts, which significantly reduces performance issues associated with enabling those options. * A new burst_buffer/lua plugin allowing for site-specific asynchronous job data management. * Fixes to the job_container/tmpfs plugin to allow the slurmd process to be restarted while the job is running without issue. * Added json/yaml output to sacct, squeue, and sinfo commands. * Added a new node_features/helpers plugin to provide a generic way to change settings on a compute node across a reboot. * Added support for automatically detecting and broadcasting shared libraries for an executable launched with "srun --bcast". * Added initial OCI container execution support with a new --container option to sbatch and srun. * Improved "configless" support by allowing multiple control servers to be specified through the slurmd --conf-server option, and send additional configuration files at startup including cli_filter.lua. OBS-URL: https://build.opensuse.org/request/show/915777 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=184
2021-09-06 15:29:00 +02:00
%{_libdir}/slurm/burst_buffer_lua.so
%{?have_json_c:%{_libdir}/slurm/burst_buffer_datawarp.so}
Accepting request 915777 from home:mslacken:slurm_update - updated to 21.08.1, major changes: * A new "AccountingStoreFlags=job_script" option to store the job scripts directly in SlurmDBD. * Added "sacct -o SubmitLine" format option to get the submit line of a job/step. * Changes to the node state management so that nodes are marked as PLANNED instead of IDLE if the scheduler is still accumulating resources while waiting to launch a job on them. * RS256 token support in auth/jwt. * Overhaul of the cgroup subsystems to simplify operation, mitigate a number of inherent race conditions, and prepare for future cgroup v2 support. * Further improvements to cloud node power state management. * A new child process of the Slurm controller called "slurmscriptd" responsible for executing PrologSlurmctld and EpilogSlurmctld scripts, which significantly reduces performance issues associated with enabling those options. * A new burst_buffer/lua plugin allowing for site-specific asynchronous job data management. * Fixes to the job_container/tmpfs plugin to allow the slurmd process to be restarted while the job is running without issue. * Added json/yaml output to sacct, squeue, and sinfo commands. * Added a new node_features/helpers plugin to provide a generic way to change settings on a compute node across a reboot. * Added support for automatically detecting and broadcasting shared libraries for an executable launched with "srun --bcast". * Added initial OCI container execution support with a new --container option to sbatch and srun. * Improved "configless" support by allowing multiple control servers to be specified through the slurmd --conf-server option, and send additional configuration files at startup including cli_filter.lua. OBS-URL: https://build.opensuse.org/request/show/915777 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=184
2021-09-06 15:29:00 +02:00
%{_libdir}/slurm/cgroup_v1.so
%{_libdir}/slurm/core_spec_none.so
%{_libdir}/slurm/cli_filter_none.so
Accepting request 773459 from home:mslacken:branches:network:cluster - Updated to version 20.02.0-0pre1, highlights are Highlights: * Exclusive behavior of a node includes all GRES on a node as well as the cpus. * Use python3 instead of python for internal build/test scripts. The slurm.spec file has been updated to depend on python3 as well. * Added new NodeSet configuration option to help simplify partition configuration sections for heterogeneous / condo*style clusters. * Added slurm.conf option MaxDBDMsgs to control how many messages will be stored in the slurmctld before throwing them away when the slurmdbd is down. * The checkpoint plugin interface and all associated API calls have been removed. * slurm_init_job_desc_msg() initializes mail_type as uint16_t. This allows mail_type to be set to NONE with scontrol. * Add new slurm_spank_log() function to print messages back to the user from within a SPANK plugin without prepending "error: " from slurm_error(). * Enforce having partition name and nodelist=ALL when creating reservations with flags=PART_NODES. * SPANK - removed never-implemented slurm_spank_slurmd_init() interface. This hook has always been accessible through slurm_spank_init() in the S_CTX_SLURMD context instead. * sbcast - add new BcastAddr option to NodeName lines to allow sbcast traffic to flow over an alternate network path. * Added auth/jwt plugin, and 'scontrol token' subcommand. PMIx - improve * performance of proc map generation. Deprecate kill_invalid_depend in * SchedulerParameters and move it to a new option called DependencyParameters. * Enable job dependencies for any job on any cluster in the same federation. * Allow clusters to be added automatically to db at startup of ctld. Add * AccountingStorageExternalHost slurm.conf parameter. The OBS-URL: https://build.opensuse.org/request/show/773459 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=130
2020-02-11 15:31:26 +01:00
%{_libdir}/slurm/cli_filter_lua.so
%{_libdir}/slurm/cli_filter_syslog.so
%{_libdir}/slurm/cli_filter_user_defaults.so
%{_libdir}/slurm/cred_none.so
Accepting request 226317 from home:scorot:branches:network:cluster - update to version 2.6.7 * Support for job arrays, which increases performance and ease of use for sets of similar jobs. * Job profiling capability added to record a wide variety of job characteristics for each task on a user configurable periodic basis. Data currently available includes CPU use, memory use, energy use, Infiniband network use, Lustre file system use, etc. * Support for MPICH2 using PMI2 communications interface with much greater scalability. * Prolog and epilog support for advanced reservations. * Much faster throughput for job step execution with --exclusive option. The srun process is notified when resources become available rather than periodic polling. * Support improved for Intel MIC (Many Integrated Core) processor. * Advanced reservations with hostname and core counts now supports asymmetric reservations (e.g. specific different core count for each node). * External sensor plugin infrastructure added to record power consumption, temperature, etc. * Improved performance for high-throughput computing. * MapReduce+ support (launches ~1000x faster, runs ~10x faster). * Added "MaxCPUsPerNode" partition configuration parameter. This can be especially useful to schedule GPUs. For example a node can be associated with two Slurm partitions (e.g. "cpu" and "gpu") and the partition/queue "cpu" could be limited to only a subset of the node's CPUs, insuring that one or more CPUs would be available to jobs in the "gpu" partition/queue. OBS-URL: https://build.opensuse.org/request/show/226317 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=4
2014-03-16 21:42:08 +01:00
%{_libdir}/slurm/ext_sensors_none.so
%{_libdir}/slurm/gpu_generic.so
%{_libdir}/slurm/gres_gpu.so
%{_libdir}/slurm/gres_mps.so
%{_libdir}/slurm/gres_nic.so
%{_libdir}/slurm/jobacct_gather_cgroup.so
%{_libdir}/slurm/jobacct_gather_linux.so
%{_libdir}/slurm/jobacct_gather_none.so
%{_libdir}/slurm/jobcomp_filetxt.so
%{_libdir}/slurm/jobcomp_none.so
Accepting request 773459 from home:mslacken:branches:network:cluster - Updated to version 20.02.0-0pre1, highlights are Highlights: * Exclusive behavior of a node includes all GRES on a node as well as the cpus. * Use python3 instead of python for internal build/test scripts. The slurm.spec file has been updated to depend on python3 as well. * Added new NodeSet configuration option to help simplify partition configuration sections for heterogeneous / condo*style clusters. * Added slurm.conf option MaxDBDMsgs to control how many messages will be stored in the slurmctld before throwing them away when the slurmdbd is down. * The checkpoint plugin interface and all associated API calls have been removed. * slurm_init_job_desc_msg() initializes mail_type as uint16_t. This allows mail_type to be set to NONE with scontrol. * Add new slurm_spank_log() function to print messages back to the user from within a SPANK plugin without prepending "error: " from slurm_error(). * Enforce having partition name and nodelist=ALL when creating reservations with flags=PART_NODES. * SPANK - removed never-implemented slurm_spank_slurmd_init() interface. This hook has always been accessible through slurm_spank_init() in the S_CTX_SLURMD context instead. * sbcast - add new BcastAddr option to NodeName lines to allow sbcast traffic to flow over an alternate network path. * Added auth/jwt plugin, and 'scontrol token' subcommand. PMIx - improve * performance of proc map generation. Deprecate kill_invalid_depend in * SchedulerParameters and move it to a new option called DependencyParameters. * Enable job dependencies for any job on any cluster in the same federation. * Allow clusters to be added automatically to db at startup of ctld. Add * AccountingStorageExternalHost slurm.conf parameter. The OBS-URL: https://build.opensuse.org/request/show/773459 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=130
2020-02-11 15:31:26 +01:00
%{_libdir}/slurm/jobcomp_lua.so
%{_libdir}/slurm/jobcomp_script.so
%{_libdir}/slurm/job_container_cncu.so
%{_libdir}/slurm/job_container_none.so
%{_libdir}/slurm/job_container_tmpfs.so
%{_libdir}/slurm/job_submit_all_partitions.so
%{_libdir}/slurm/job_submit_defaults.so
%{_libdir}/slurm/job_submit_logging.so
%{_libdir}/slurm/job_submit_partition.so
Accepting request 226317 from home:scorot:branches:network:cluster - update to version 2.6.7 * Support for job arrays, which increases performance and ease of use for sets of similar jobs. * Job profiling capability added to record a wide variety of job characteristics for each task on a user configurable periodic basis. Data currently available includes CPU use, memory use, energy use, Infiniband network use, Lustre file system use, etc. * Support for MPICH2 using PMI2 communications interface with much greater scalability. * Prolog and epilog support for advanced reservations. * Much faster throughput for job step execution with --exclusive option. The srun process is notified when resources become available rather than periodic polling. * Support improved for Intel MIC (Many Integrated Core) processor. * Advanced reservations with hostname and core counts now supports asymmetric reservations (e.g. specific different core count for each node). * External sensor plugin infrastructure added to record power consumption, temperature, etc. * Improved performance for high-throughput computing. * MapReduce+ support (launches ~1000x faster, runs ~10x faster). * Added "MaxCPUsPerNode" partition configuration parameter. This can be especially useful to schedule GPUs. For example a node can be associated with two Slurm partitions (e.g. "cpu" and "gpu") and the partition/queue "cpu" could be limited to only a subset of the node's CPUs, insuring that one or more CPUs would be available to jobs in the "gpu" partition/queue. OBS-URL: https://build.opensuse.org/request/show/226317 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=4
2014-03-16 21:42:08 +01:00
%{_libdir}/slurm/job_submit_require_timelimit.so
%{_libdir}/slurm/job_submit_throttle.so
%{_libdir}/slurm/launch_slurm.so
%{_libdir}/slurm/libslurm_pmi.so
%{_libdir}/slurm/mcs_account.so
%{_libdir}/slurm/mcs_group.so
%{_libdir}/slurm/mcs_none.so
%{_libdir}/slurm/mcs_user.so
%{_libdir}/slurm/mpi_none.so
%{_libdir}/slurm/mpi_pmi2.so
%if %{with pmix}
%{_libdir}/slurm/mpi_pmix.so
%{_libdir}/slurm/mpi_pmix_v3.so
%endif
Accepting request 915777 from home:mslacken:slurm_update - updated to 21.08.1, major changes: * A new "AccountingStoreFlags=job_script" option to store the job scripts directly in SlurmDBD. * Added "sacct -o SubmitLine" format option to get the submit line of a job/step. * Changes to the node state management so that nodes are marked as PLANNED instead of IDLE if the scheduler is still accumulating resources while waiting to launch a job on them. * RS256 token support in auth/jwt. * Overhaul of the cgroup subsystems to simplify operation, mitigate a number of inherent race conditions, and prepare for future cgroup v2 support. * Further improvements to cloud node power state management. * A new child process of the Slurm controller called "slurmscriptd" responsible for executing PrologSlurmctld and EpilogSlurmctld scripts, which significantly reduces performance issues associated with enabling those options. * A new burst_buffer/lua plugin allowing for site-specific asynchronous job data management. * Fixes to the job_container/tmpfs plugin to allow the slurmd process to be restarted while the job is running without issue. * Added json/yaml output to sacct, squeue, and sinfo commands. * Added a new node_features/helpers plugin to provide a generic way to change settings on a compute node across a reboot. * Added support for automatically detecting and broadcasting shared libraries for an executable launched with "srun --bcast". * Added initial OCI container execution support with a new --container option to sbatch and srun. * Improved "configless" support by allowing multiple control servers to be specified through the slurmd --conf-server option, and send additional configuration files at startup including cli_filter.lua. OBS-URL: https://build.opensuse.org/request/show/915777 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=184
2021-09-06 15:29:00 +02:00
%{_libdir}/slurm/node_features_helpers.so
%{_libdir}/slurm/power_none.so
%{_libdir}/slurm/preempt_none.so
%{_libdir}/slurm/preempt_partition_prio.so
%{_libdir}/slurm/preempt_qos.so
Accepting request 773459 from home:mslacken:branches:network:cluster - Updated to version 20.02.0-0pre1, highlights are Highlights: * Exclusive behavior of a node includes all GRES on a node as well as the cpus. * Use python3 instead of python for internal build/test scripts. The slurm.spec file has been updated to depend on python3 as well. * Added new NodeSet configuration option to help simplify partition configuration sections for heterogeneous / condo*style clusters. * Added slurm.conf option MaxDBDMsgs to control how many messages will be stored in the slurmctld before throwing them away when the slurmdbd is down. * The checkpoint plugin interface and all associated API calls have been removed. * slurm_init_job_desc_msg() initializes mail_type as uint16_t. This allows mail_type to be set to NONE with scontrol. * Add new slurm_spank_log() function to print messages back to the user from within a SPANK plugin without prepending "error: " from slurm_error(). * Enforce having partition name and nodelist=ALL when creating reservations with flags=PART_NODES. * SPANK - removed never-implemented slurm_spank_slurmd_init() interface. This hook has always been accessible through slurm_spank_init() in the S_CTX_SLURMD context instead. * sbcast - add new BcastAddr option to NodeName lines to allow sbcast traffic to flow over an alternate network path. * Added auth/jwt plugin, and 'scontrol token' subcommand. PMIx - improve * performance of proc map generation. Deprecate kill_invalid_depend in * SchedulerParameters and move it to a new option called DependencyParameters. * Enable job dependencies for any job on any cluster in the same federation. * Allow clusters to be added automatically to db at startup of ctld. Add * AccountingStorageExternalHost slurm.conf parameter. The OBS-URL: https://build.opensuse.org/request/show/773459 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=130
2020-02-11 15:31:26 +01:00
%{_libdir}/slurm/prep_script.so
%{_libdir}/slurm/priority_basic.so
%{_libdir}/slurm/priority_multifactor.so
%{_libdir}/slurm/proctrack_cgroup.so
%{_libdir}/slurm/proctrack_linuxproc.so
%{_libdir}/slurm/proctrack_pgid.so
%{_libdir}/slurm/route_default.so
%{_libdir}/slurm/route_topology.so
%{_libdir}/slurm/sched_backfill.so
%{_libdir}/slurm/sched_builtin.so
%{_libdir}/slurm/select_cons_res.so
%{_libdir}/slurm/select_cons_tres.so
%{_libdir}/slurm/select_linear.so
Accepting request 915777 from home:mslacken:slurm_update - updated to 21.08.1, major changes: * A new "AccountingStoreFlags=job_script" option to store the job scripts directly in SlurmDBD. * Added "sacct -o SubmitLine" format option to get the submit line of a job/step. * Changes to the node state management so that nodes are marked as PLANNED instead of IDLE if the scheduler is still accumulating resources while waiting to launch a job on them. * RS256 token support in auth/jwt. * Overhaul of the cgroup subsystems to simplify operation, mitigate a number of inherent race conditions, and prepare for future cgroup v2 support. * Further improvements to cloud node power state management. * A new child process of the Slurm controller called "slurmscriptd" responsible for executing PrologSlurmctld and EpilogSlurmctld scripts, which significantly reduces performance issues associated with enabling those options. * A new burst_buffer/lua plugin allowing for site-specific asynchronous job data management. * Fixes to the job_container/tmpfs plugin to allow the slurmd process to be restarted while the job is running without issue. * Added json/yaml output to sacct, squeue, and sinfo commands. * Added a new node_features/helpers plugin to provide a generic way to change settings on a compute node across a reboot. * Added support for automatically detecting and broadcasting shared libraries for an executable launched with "srun --bcast". * Added initial OCI container execution support with a new --container option to sbatch and srun. * Improved "configless" support by allowing multiple control servers to be specified through the slurmd --conf-server option, and send additional configuration files at startup including cli_filter.lua. OBS-URL: https://build.opensuse.org/request/show/915777 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=184
2021-09-06 15:29:00 +02:00
%{_libdir}/slurm/serializer_json.so
%{_libdir}/slurm/serializer_url_encoded.so
%{_libdir}/slurm/serializer_yaml.so
%{_libdir}/slurm/site_factor_none.so
%{_libdir}/slurm/slurmctld_nonstop.so
%{_libdir}/slurm/switch_none.so
%{_libdir}/slurm/task_affinity.so
%{_libdir}/slurm/task_cgroup.so
%{_libdir}/slurm/task_none.so
%{_libdir}/slurm/topology_3d_torus.so
%{_libdir}/slurm/topology_hypercube.so
%{_libdir}/slurm/topology_none.so
%{_libdir}/slurm/topology_tree.so
Accepting request 454272 from home:eeich:branches:network:cluster - Updated to 16.05.8.1 * Remove StoragePass from being printed out in the slurmdbd log at debug2 level. * Defer PATH search for task program until launch in slurmstepd. * Modify regression test1.89 to avoid leaving vestigial job. Also reduce logging to reduce likelyhood of Expect buffer overflow. * Do not PATH search for mult-prog launches if LaunchParamters=test_exec is enabled. * Fix for possible infinite loop in select/cons_res plugin when trying to satisfy a job's ntasks_per_core or socket specification. * If job is held for bad constraints make it so once updated the job doesn't go into JobAdminHeld. * sched/backfill - Fix logic to reserve resources for jobs that require a node reboot (i.e. to change KNL mode) in order to start. * When unpacking a node or front_end record from state and the protocol version is lower than the min version, set it to the min. * Remove redundant lookup for part_ptr when updating a reservation's nodes. * Fix memory and file descriptor leaks in slurmd daemon's sbcast logic. * Do not allocate specialized cores to jobs using the --exclusive option. * Cancel interactive job if Prolog failure with "PrologFlags=contain" or "PrologFlags=alloc" configured. Send new error prolog failure message to the salloc or srun command as needed. * Prevent possible out-of-bounds read in slurmstepd on an invalid #! line. * Fix check for PluginDir within slurmctld to work with multiple directories. * Cancel interactive jobs automatically on communication error to launching srun/salloc process. * Fix security issue caused by insecure file path handling triggered by the failure of a Prolog script. To exploit this a user needs to anticipate or cause the Prolog to fail for their job. CVE-2016-10030 (bsc#1018371). - Replace group/user add macros with function calls. - Disable building with netloc support: the netloc API is part of the devel branch of hwloc. Since this devel branch was included accidentally and has been reversed since, we need to disable this for the time being. - Conditionalized architecture specific pieces to support non-x86 architectures better. - Remove: unneeded 'BuildRequires: python' - Add: BuildRequires: freeipmi-devel BuildRequires: libibmad-devel BuildRequires: libibumad-devel so they are picked up by the slurm build. - Enable modifications from openHPC Project. - Enable lua API package build. - Add a recommends for slurm-munge to the slurm package: This is way, the munge auth method is available and slurm works out of the box. - Create /var/lib/slurm as StateSaveLocation directory. /tmp is dangerous. - Keep %{_libdir}/libpmi* and %{_libdir}/mpi_pmi2* on SUSE. OBS-URL: https://build.opensuse.org/request/show/454272 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=13
2017-02-02 21:23:02 +01:00
%if 0%{?suse_version} > 1310
%{_libdir}/slurm/acct_gather_interconnect_ofed.so
Accepting request 454272 from home:eeich:branches:network:cluster - Updated to 16.05.8.1 * Remove StoragePass from being printed out in the slurmdbd log at debug2 level. * Defer PATH search for task program until launch in slurmstepd. * Modify regression test1.89 to avoid leaving vestigial job. Also reduce logging to reduce likelyhood of Expect buffer overflow. * Do not PATH search for mult-prog launches if LaunchParamters=test_exec is enabled. * Fix for possible infinite loop in select/cons_res plugin when trying to satisfy a job's ntasks_per_core or socket specification. * If job is held for bad constraints make it so once updated the job doesn't go into JobAdminHeld. * sched/backfill - Fix logic to reserve resources for jobs that require a node reboot (i.e. to change KNL mode) in order to start. * When unpacking a node or front_end record from state and the protocol version is lower than the min version, set it to the min. * Remove redundant lookup for part_ptr when updating a reservation's nodes. * Fix memory and file descriptor leaks in slurmd daemon's sbcast logic. * Do not allocate specialized cores to jobs using the --exclusive option. * Cancel interactive job if Prolog failure with "PrologFlags=contain" or "PrologFlags=alloc" configured. Send new error prolog failure message to the salloc or srun command as needed. * Prevent possible out-of-bounds read in slurmstepd on an invalid #! line. * Fix check for PluginDir within slurmctld to work with multiple directories. * Cancel interactive jobs automatically on communication error to launching srun/salloc process. * Fix security issue caused by insecure file path handling triggered by the failure of a Prolog script. To exploit this a user needs to anticipate or cause the Prolog to fail for their job. CVE-2016-10030 (bsc#1018371). - Replace group/user add macros with function calls. - Disable building with netloc support: the netloc API is part of the devel branch of hwloc. Since this devel branch was included accidentally and has been reversed since, we need to disable this for the time being. - Conditionalized architecture specific pieces to support non-x86 architectures better. - Remove: unneeded 'BuildRequires: python' - Add: BuildRequires: freeipmi-devel BuildRequires: libibmad-devel BuildRequires: libibumad-devel so they are picked up by the slurm build. - Enable modifications from openHPC Project. - Enable lua API package build. - Add a recommends for slurm-munge to the slurm package: This is way, the munge auth method is available and slurm works out of the box. - Create /var/lib/slurm as StateSaveLocation directory. /tmp is dangerous. - Keep %{_libdir}/libpmi* and %{_libdir}/mpi_pmi2* on SUSE. OBS-URL: https://build.opensuse.org/request/show/454272 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=13
2017-02-02 21:23:02 +01:00
%endif
%if 0%{?suse_version} > 1140
%ifarch %{ix86} x86_64
%{_libdir}/slurm/acct_gather_energy_ipmi.so
%{_libdir}/slurm/acct_gather_energy_xcc.so
Accepting request 454272 from home:eeich:branches:network:cluster - Updated to 16.05.8.1 * Remove StoragePass from being printed out in the slurmdbd log at debug2 level. * Defer PATH search for task program until launch in slurmstepd. * Modify regression test1.89 to avoid leaving vestigial job. Also reduce logging to reduce likelyhood of Expect buffer overflow. * Do not PATH search for mult-prog launches if LaunchParamters=test_exec is enabled. * Fix for possible infinite loop in select/cons_res plugin when trying to satisfy a job's ntasks_per_core or socket specification. * If job is held for bad constraints make it so once updated the job doesn't go into JobAdminHeld. * sched/backfill - Fix logic to reserve resources for jobs that require a node reboot (i.e. to change KNL mode) in order to start. * When unpacking a node or front_end record from state and the protocol version is lower than the min version, set it to the min. * Remove redundant lookup for part_ptr when updating a reservation's nodes. * Fix memory and file descriptor leaks in slurmd daemon's sbcast logic. * Do not allocate specialized cores to jobs using the --exclusive option. * Cancel interactive job if Prolog failure with "PrologFlags=contain" or "PrologFlags=alloc" configured. Send new error prolog failure message to the salloc or srun command as needed. * Prevent possible out-of-bounds read in slurmstepd on an invalid #! line. * Fix check for PluginDir within slurmctld to work with multiple directories. * Cancel interactive jobs automatically on communication error to launching srun/salloc process. * Fix security issue caused by insecure file path handling triggered by the failure of a Prolog script. To exploit this a user needs to anticipate or cause the Prolog to fail for their job. CVE-2016-10030 (bsc#1018371). - Replace group/user add macros with function calls. - Disable building with netloc support: the netloc API is part of the devel branch of hwloc. Since this devel branch was included accidentally and has been reversed since, we need to disable this for the time being. - Conditionalized architecture specific pieces to support non-x86 architectures better. - Remove: unneeded 'BuildRequires: python' - Add: BuildRequires: freeipmi-devel BuildRequires: libibmad-devel BuildRequires: libibumad-devel so they are picked up by the slurm build. - Enable modifications from openHPC Project. - Enable lua API package build. - Add a recommends for slurm-munge to the slurm package: This is way, the munge auth method is available and slurm works out of the box. - Create /var/lib/slurm as StateSaveLocation directory. /tmp is dangerous. - Keep %{_libdir}/libpmi* and %{_libdir}/mpi_pmi2* on SUSE. OBS-URL: https://build.opensuse.org/request/show/454272 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=13
2017-02-02 21:23:02 +01:00
%endif
%endif
%{_libdir}/slurm/node_features_knl_generic.so
%{_libdir}/slurm/acct_gather_profile_influxdb.so
%{_libdir}/slurm/ext_sensors_rrd.so
%{_libdir}/slurm/jobcomp_elasticsearch.so
Accepting request 454272 from home:eeich:branches:network:cluster - Updated to 16.05.8.1 * Remove StoragePass from being printed out in the slurmdbd log at debug2 level. * Defer PATH search for task program until launch in slurmstepd. * Modify regression test1.89 to avoid leaving vestigial job. Also reduce logging to reduce likelyhood of Expect buffer overflow. * Do not PATH search for mult-prog launches if LaunchParamters=test_exec is enabled. * Fix for possible infinite loop in select/cons_res plugin when trying to satisfy a job's ntasks_per_core or socket specification. * If job is held for bad constraints make it so once updated the job doesn't go into JobAdminHeld. * sched/backfill - Fix logic to reserve resources for jobs that require a node reboot (i.e. to change KNL mode) in order to start. * When unpacking a node or front_end record from state and the protocol version is lower than the min version, set it to the min. * Remove redundant lookup for part_ptr when updating a reservation's nodes. * Fix memory and file descriptor leaks in slurmd daemon's sbcast logic. * Do not allocate specialized cores to jobs using the --exclusive option. * Cancel interactive job if Prolog failure with "PrologFlags=contain" or "PrologFlags=alloc" configured. Send new error prolog failure message to the salloc or srun command as needed. * Prevent possible out-of-bounds read in slurmstepd on an invalid #! line. * Fix check for PluginDir within slurmctld to work with multiple directories. * Cancel interactive jobs automatically on communication error to launching srun/salloc process. * Fix security issue caused by insecure file path handling triggered by the failure of a Prolog script. To exploit this a user needs to anticipate or cause the Prolog to fail for their job. CVE-2016-10030 (bsc#1018371). - Replace group/user add macros with function calls. - Disable building with netloc support: the netloc API is part of the devel branch of hwloc. Since this devel branch was included accidentally and has been reversed since, we need to disable this for the time being. - Conditionalized architecture specific pieces to support non-x86 architectures better. - Remove: unneeded 'BuildRequires: python' - Add: BuildRequires: freeipmi-devel BuildRequires: libibmad-devel BuildRequires: libibumad-devel so they are picked up by the slurm build. - Enable modifications from openHPC Project. - Enable lua API package build. - Add a recommends for slurm-munge to the slurm package: This is way, the munge auth method is available and slurm works out of the box. - Create /var/lib/slurm as StateSaveLocation directory. /tmp is dangerous. - Keep %{_libdir}/libpmi* and %{_libdir}/mpi_pmi2* on SUSE. OBS-URL: https://build.opensuse.org/request/show/454272 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=13
2017-02-02 21:23:02 +01:00
%files lua
%{?comp_at}
Accepting request 454272 from home:eeich:branches:network:cluster - Updated to 16.05.8.1 * Remove StoragePass from being printed out in the slurmdbd log at debug2 level. * Defer PATH search for task program until launch in slurmstepd. * Modify regression test1.89 to avoid leaving vestigial job. Also reduce logging to reduce likelyhood of Expect buffer overflow. * Do not PATH search for mult-prog launches if LaunchParamters=test_exec is enabled. * Fix for possible infinite loop in select/cons_res plugin when trying to satisfy a job's ntasks_per_core or socket specification. * If job is held for bad constraints make it so once updated the job doesn't go into JobAdminHeld. * sched/backfill - Fix logic to reserve resources for jobs that require a node reboot (i.e. to change KNL mode) in order to start. * When unpacking a node or front_end record from state and the protocol version is lower than the min version, set it to the min. * Remove redundant lookup for part_ptr when updating a reservation's nodes. * Fix memory and file descriptor leaks in slurmd daemon's sbcast logic. * Do not allocate specialized cores to jobs using the --exclusive option. * Cancel interactive job if Prolog failure with "PrologFlags=contain" or "PrologFlags=alloc" configured. Send new error prolog failure message to the salloc or srun command as needed. * Prevent possible out-of-bounds read in slurmstepd on an invalid #! line. * Fix check for PluginDir within slurmctld to work with multiple directories. * Cancel interactive jobs automatically on communication error to launching srun/salloc process. * Fix security issue caused by insecure file path handling triggered by the failure of a Prolog script. To exploit this a user needs to anticipate or cause the Prolog to fail for their job. CVE-2016-10030 (bsc#1018371). - Replace group/user add macros with function calls. - Disable building with netloc support: the netloc API is part of the devel branch of hwloc. Since this devel branch was included accidentally and has been reversed since, we need to disable this for the time being. - Conditionalized architecture specific pieces to support non-x86 architectures better. - Remove: unneeded 'BuildRequires: python' - Add: BuildRequires: freeipmi-devel BuildRequires: libibmad-devel BuildRequires: libibumad-devel so they are picked up by the slurm build. - Enable modifications from openHPC Project. - Enable lua API package build. - Add a recommends for slurm-munge to the slurm package: This is way, the munge auth method is available and slurm works out of the box. - Create /var/lib/slurm as StateSaveLocation directory. /tmp is dangerous. - Keep %{_libdir}/libpmi* and %{_libdir}/mpi_pmi2* on SUSE. OBS-URL: https://build.opensuse.org/request/show/454272 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=13
2017-02-02 21:23:02 +01:00
%{_libdir}/slurm/job_submit_lua.so
%files torque
%{?comp_at}
%{_bindir}/pbsnodes
Accepting request 226317 from home:scorot:branches:network:cluster - update to version 2.6.7 * Support for job arrays, which increases performance and ease of use for sets of similar jobs. * Job profiling capability added to record a wide variety of job characteristics for each task on a user configurable periodic basis. Data currently available includes CPU use, memory use, energy use, Infiniband network use, Lustre file system use, etc. * Support for MPICH2 using PMI2 communications interface with much greater scalability. * Prolog and epilog support for advanced reservations. * Much faster throughput for job step execution with --exclusive option. The srun process is notified when resources become available rather than periodic polling. * Support improved for Intel MIC (Many Integrated Core) processor. * Advanced reservations with hostname and core counts now supports asymmetric reservations (e.g. specific different core count for each node). * External sensor plugin infrastructure added to record power consumption, temperature, etc. * Improved performance for high-throughput computing. * MapReduce+ support (launches ~1000x faster, runs ~10x faster). * Added "MaxCPUsPerNode" partition configuration parameter. This can be especially useful to schedule GPUs. For example a node can be associated with two Slurm partitions (e.g. "cpu" and "gpu") and the partition/queue "cpu" could be limited to only a subset of the node's CPUs, insuring that one or more CPUs would be available to jobs in the "gpu" partition/queue. OBS-URL: https://build.opensuse.org/request/show/226317 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=4
2014-03-16 21:42:08 +01:00
%{_bindir}/qalter
%{_bindir}/qdel
%{_bindir}/qhold
%{_bindir}/qrls
Accepting request 226317 from home:scorot:branches:network:cluster - update to version 2.6.7 * Support for job arrays, which increases performance and ease of use for sets of similar jobs. * Job profiling capability added to record a wide variety of job characteristics for each task on a user configurable periodic basis. Data currently available includes CPU use, memory use, energy use, Infiniband network use, Lustre file system use, etc. * Support for MPICH2 using PMI2 communications interface with much greater scalability. * Prolog and epilog support for advanced reservations. * Much faster throughput for job step execution with --exclusive option. The srun process is notified when resources become available rather than periodic polling. * Support improved for Intel MIC (Many Integrated Core) processor. * Advanced reservations with hostname and core counts now supports asymmetric reservations (e.g. specific different core count for each node). * External sensor plugin infrastructure added to record power consumption, temperature, etc. * Improved performance for high-throughput computing. * MapReduce+ support (launches ~1000x faster, runs ~10x faster). * Added "MaxCPUsPerNode" partition configuration parameter. This can be especially useful to schedule GPUs. For example a node can be associated with two Slurm partitions (e.g. "cpu" and "gpu") and the partition/queue "cpu" could be limited to only a subset of the node's CPUs, insuring that one or more CPUs would be available to jobs in the "gpu" partition/queue. OBS-URL: https://build.opensuse.org/request/show/226317 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=4
2014-03-16 21:42:08 +01:00
%{_bindir}/qrerun
%{_bindir}/qstat
%{_bindir}/qsub
%{_bindir}/mpiexec.slurm
%{_bindir}/generate_pbs_nodefile
%{_libdir}/slurm/job_submit_pbs.so
%{_libdir}/slurm/spank_pbs.so
%files sjstat
%{?comp_at}
%{_bindir}/sjstat
%files pam_slurm
%{?comp_at}
%doc ../README.pam_slurm ../README.pam_slurm_adopt
%{_pam_moduledir}/pam_slurm.so
%{_pam_moduledir}/pam_slurm_adopt.so
%if 0%{?build_slurmrestd}
%files rest
%{?comp_at}
%{_sbindir}/slurmrestd
%{_mandir}/man8/slurmrestd.*
%{_libdir}/slurm/openapi_dbv0_0_37.so
%{_libdir}/slurm/openapi_v0_0_37.so
%{_libdir}/slurm/openapi_dbv0_0_36.so
%{_libdir}/slurm/openapi_v0_0_35.so
%{_libdir}/slurm/openapi_v0_0_36.so
%{_libdir}/slurm/rest_auth_jwt.so
%{_libdir}/slurm/rest_auth_local.so
%endif
%files node
%{?comp_at}
%{_sbindir}/slurmd
%{_sbindir}/slurmstepd
# bsc#1153095
%{_bindir}/srun
%{_mandir}/man1/srun.1*
%{_mandir}/man8/slurmd.*
%{_mandir}/man8/slurmstepd*
%if 0%{?with_systemd}
%{_sbindir}/rcslurmd
%{_unitdir}/slurmd.service
%else
%{_initrddir}/slurm
%{_sbindir}/rcslurm
%endif
%files config
%{?comp_at}
%dir %{_sysconfdir}/%{pname}
%config(noreplace) %{_sysconfdir}/%{pname}/slurm.conf
%config %{_sysconfdir}/%{pname}/slurm.conf.example
%config(noreplace) %{_sysconfdir}/%{pname}/cgroup.conf
%attr(0755, %slurm_u, %slurm_g) %_localstatedir/lib/slurm
%{?with_systemd:%{_tmpfilesdir}/%{pname}.conf}
%{?_rundir:%ghost %{_rundir}/slurm}
%dir %attr(0755, %slurm_u, %slurm_g)%{_localstatedir}/spool/slurm
%config(noreplace) %{_sysconfdir}/logrotate.d/slurm*
%if 0%{?have_firewalld}
%{_prefix}/lib/firewalld/services/slurmd.xml
%{_prefix}/lib/firewalld/services/slurmctld.xml
%{_prefix}/lib/firewalld/services/slurmdbd.xml
%endif
%{?have_sysuser:%{_sysusersdir}/system-user-%{pname}.conf}
Accepting request 650545 from home:eeich:branches:network:cluster - Added missing perl-base dependency. - Moved HTML docs to doc package. - Moved config man pages to a separate package: This way, they won't get installed on compute nodes. - Update to 18.08.3 * Add new burst buffer state of "teardown-fail" to indicate the burst buffer teardown operation is failing on specific buffers. * Multiple backup slurmctld daemons can be configured * Enable jobs with zero node count for creation and/or deletion of persistent burst buffers. * Add "scontrol show dwstat" command to display Cray burst buffer status. * Add "GetSysStatus" option to burst_buffer.conf file. * Add node and partition configuration options of "CpuBind" to control default task binding. * Add "NumaCpuBind" option to knl.conf * Add sbatch "--batch" option to identify features required on batch node. * Add "BatchFeatures" field to output of "scontrol show job". * Add support for "--bb" option to sbatch command. * Add new SystemComment field to job data structure and database. * Expand reservation "flags" field from 32 to 64 bits. * Add job state flag of "SIGNALING" to avoid race condition. * Properly handle srun --will-run option when there are jobs in COMPLETING state. * Properly report who is signaling a step. * Don't combine updated reservation records in sreport's reservation report. * node_features plugin - Add suport for XOR & XAND of job constraints (node feature specifications). OBS-URL: https://build.opensuse.org/request/show/650545 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=75
2018-11-20 18:07:44 +01:00
%files config-man
%{?comp_at}
%{_mandir}/man5/acct_gather.conf.*
%{_mandir}/man5/burst_buffer.conf.*
%{_mandir}/man5/ext_sensors.conf.*
%{_mandir}/man5/slurm.*
%{_mandir}/man5/cgroup.*
%{_mandir}/man5/gres.*
Accepting request 942081 from home:mslacken:branches:network:cluster - update to 21.08.5 with following changes: * Fix issue where typeless GRES node updates were not immediately reflected. * Fix setting the default scrontab job working directory so that it's the home of the different user (*u <user>) and not that of root or SlurmUser editor. * Fix stepd not respecting SlurmdSyslogDebug. * Fix concurrency issue with squeue. * Fix job start time not being reset after launch when job is packed onto already booting node. * Fix updating SLURM_NODE_ALIASES for jobs packed onto powering up nodes. * Cray - Fix issues with starting hetjobs. * auth/jwks - Print fatal() message when jwks is configured but file could not be opened. * If sacctmgr has an association with an unknown qos as the default qos print 'UNKN*###' instead of leaving a blank name. * Correctly determine task count when giving --cpus-per-gpu, --gpus and *-ntasks-per-node without task count. * slurmctld - Fix places where the global last_job_update was not being set to the time of update when a job's reason and description were updated. * slurmctld - Fix case where a job submitted with more than one partition would not have its reason updated while waiting to start. * Fix memory leak in node feature rebooting. * Fix time limit permanetly set to 1 minute by backfill for job array tasks higher than the first with QOS NoReserve flag and PreemptMode configured. * Fix sacct -N to show jobs that started in the current second * Fix issue on running steps where both SLURM_NTASKS_PER_TRES and SLURM_NTASKS_PER_GPU are set. * Handle oversubscription request correctly when also requesting *-ntasks-per-tres. * Correctly detect when a step requests bad gres inside an allocation. * slurmstepd - Correct possible deadlock when UnkillableStepTimeout triggers. OBS-URL: https://build.opensuse.org/request/show/942081 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=195
2021-12-23 11:26:41 +01:00
%{_mandir}/man5/helpers.*
%{_mandir}/man5/nonstop.conf.5.*
Accepting request 915777 from home:mslacken:slurm_update - updated to 21.08.1, major changes: * A new "AccountingStoreFlags=job_script" option to store the job scripts directly in SlurmDBD. * Added "sacct -o SubmitLine" format option to get the submit line of a job/step. * Changes to the node state management so that nodes are marked as PLANNED instead of IDLE if the scheduler is still accumulating resources while waiting to launch a job on them. * RS256 token support in auth/jwt. * Overhaul of the cgroup subsystems to simplify operation, mitigate a number of inherent race conditions, and prepare for future cgroup v2 support. * Further improvements to cloud node power state management. * A new child process of the Slurm controller called "slurmscriptd" responsible for executing PrologSlurmctld and EpilogSlurmctld scripts, which significantly reduces performance issues associated with enabling those options. * A new burst_buffer/lua plugin allowing for site-specific asynchronous job data management. * Fixes to the job_container/tmpfs plugin to allow the slurmd process to be restarted while the job is running without issue. * Added json/yaml output to sacct, squeue, and sinfo commands. * Added a new node_features/helpers plugin to provide a generic way to change settings on a compute node across a reboot. * Added support for automatically detecting and broadcasting shared libraries for an executable launched with "srun --bcast". * Added initial OCI container execution support with a new --container option to sbatch and srun. * Improved "configless" support by allowing multiple control servers to be specified through the slurmd --conf-server option, and send additional configuration files at startup including cli_filter.lua. OBS-URL: https://build.opensuse.org/request/show/915777 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=184
2021-09-06 15:29:00 +02:00
%{_mandir}/man5/oci.conf.5.gz
%{_mandir}/man5/topology.*
%{_mandir}/man5/knl.conf.5.*
%{_mandir}/man5/job_container.conf.5.*
%if 0%{?have_hdf5}
%files hdf5
%{_bindir}/sh5util
%{_libdir}/slurm/acct_gather_profile_hdf5.so
%{_mandir}/man1/sh5util.1.gz
%endif
Accepting request 714908 from home:mslacken:branches:network:cluster - added cray depend libraries to seperate package, as they are now built, since json is enabled - Updated to 18.0.7 for fixing CVE-2019-12838 and (bsc#1140709) * Update "xauth list" to use the same 10000ms timeout as the other xauth commands. * Fix issue in gres code to handle a gres cnt of 0. * Don't purge jobs if backfill is running. * Verify job is pending add/removing accrual time. * Don't abort when the job doesn't have an association that was removed before the job was able to make it to the database. * Set state_reason if select_nodes() fails job for QOS or Account. * Avoid seg_fault on referencing association without a valid_qos bitmap. * If Association/QOS is removed on a pending job set that job as ineligible. * When changing a jobs account/qos always make sure you remove the old limits. * Don't reset a FAIL_QOS or FAIL_ACCOUNT job reason until the qos or account changed. * Restore "sreport -T ALL" functionality. * Correctly typecast signals being sent through the api. * Properly initialize structures throughout Slurm. * Sync "numtask" squeue format option for jobs and steps to "numtasks". * Fix sacct -PD to avoid CA before start jobs. * Fix potential deadlock with backup slurmctld. * Fixed issue with jobs not appearing in sacct after dependency satisfied. * Fix showing non-eligible jobs when asking with -j and not -s. * Fix issue with backfill scheduler scheduling tasks of an array when not the head job. * accounting_storage/mysql - fix SIGABRT in the archive load logic. * accounting_storage/mysql - fix memory leak in the archive load logic. * Limit records per single SQL statement when loading archived data. OBS-URL: https://build.opensuse.org/request/show/714908 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=100
2019-07-12 20:09:50 +02:00
%files cray
%{?comp_at}
Accepting request 714908 from home:mslacken:branches:network:cluster - added cray depend libraries to seperate package, as they are now built, since json is enabled - Updated to 18.0.7 for fixing CVE-2019-12838 and (bsc#1140709) * Update "xauth list" to use the same 10000ms timeout as the other xauth commands. * Fix issue in gres code to handle a gres cnt of 0. * Don't purge jobs if backfill is running. * Verify job is pending add/removing accrual time. * Don't abort when the job doesn't have an association that was removed before the job was able to make it to the database. * Set state_reason if select_nodes() fails job for QOS or Account. * Avoid seg_fault on referencing association without a valid_qos bitmap. * If Association/QOS is removed on a pending job set that job as ineligible. * When changing a jobs account/qos always make sure you remove the old limits. * Don't reset a FAIL_QOS or FAIL_ACCOUNT job reason until the qos or account changed. * Restore "sreport -T ALL" functionality. * Correctly typecast signals being sent through the api. * Properly initialize structures throughout Slurm. * Sync "numtask" squeue format option for jobs and steps to "numtasks". * Fix sacct -PD to avoid CA before start jobs. * Fix potential deadlock with backup slurmctld. * Fixed issue with jobs not appearing in sacct after dependency satisfied. * Fix showing non-eligible jobs when asking with -j and not -s. * Fix issue with backfill scheduler scheduling tasks of an array when not the head job. * accounting_storage/mysql - fix SIGABRT in the archive load logic. * accounting_storage/mysql - fix memory leak in the archive load logic. * Limit records per single SQL statement when loading archived data. OBS-URL: https://build.opensuse.org/request/show/714908 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=100
2019-07-12 20:09:50 +02:00
# do not remove cray sepcific packages from SLES update
# Only for Cray
%{_libdir}/slurm/core_spec_cray_aries.so
%{_libdir}/slurm/job_submit_cray_aries.so
%{_libdir}/slurm/select_cray_aries.so
%{_libdir}/slurm/switch_cray_aries.so
%{_libdir}/slurm/task_cray_aries.so
%{_libdir}/slurm/mpi_cray_shasta.so
%if 0%{?have_json_c}
Accepting request 714908 from home:mslacken:branches:network:cluster - added cray depend libraries to seperate package, as they are now built, since json is enabled - Updated to 18.0.7 for fixing CVE-2019-12838 and (bsc#1140709) * Update "xauth list" to use the same 10000ms timeout as the other xauth commands. * Fix issue in gres code to handle a gres cnt of 0. * Don't purge jobs if backfill is running. * Verify job is pending add/removing accrual time. * Don't abort when the job doesn't have an association that was removed before the job was able to make it to the database. * Set state_reason if select_nodes() fails job for QOS or Account. * Avoid seg_fault on referencing association without a valid_qos bitmap. * If Association/QOS is removed on a pending job set that job as ineligible. * When changing a jobs account/qos always make sure you remove the old limits. * Don't reset a FAIL_QOS or FAIL_ACCOUNT job reason until the qos or account changed. * Restore "sreport -T ALL" functionality. * Correctly typecast signals being sent through the api. * Properly initialize structures throughout Slurm. * Sync "numtask" squeue format option for jobs and steps to "numtasks". * Fix sacct -PD to avoid CA before start jobs. * Fix potential deadlock with backup slurmctld. * Fixed issue with jobs not appearing in sacct after dependency satisfied. * Fix showing non-eligible jobs when asking with -j and not -s. * Fix issue with backfill scheduler scheduling tasks of an array when not the head job. * accounting_storage/mysql - fix SIGABRT in the archive load logic. * accounting_storage/mysql - fix memory leak in the archive load logic. * Limit records per single SQL statement when loading archived data. OBS-URL: https://build.opensuse.org/request/show/714908 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=100
2019-07-12 20:09:50 +02:00
%{_libdir}/slurm/node_features_knl_cray.so
%{_libdir}/slurm/power_cray_aries.so
%endif
Accepting request 714908 from home:mslacken:branches:network:cluster - added cray depend libraries to seperate package, as they are now built, since json is enabled - Updated to 18.0.7 for fixing CVE-2019-12838 and (bsc#1140709) * Update "xauth list" to use the same 10000ms timeout as the other xauth commands. * Fix issue in gres code to handle a gres cnt of 0. * Don't purge jobs if backfill is running. * Verify job is pending add/removing accrual time. * Don't abort when the job doesn't have an association that was removed before the job was able to make it to the database. * Set state_reason if select_nodes() fails job for QOS or Account. * Avoid seg_fault on referencing association without a valid_qos bitmap. * If Association/QOS is removed on a pending job set that job as ineligible. * When changing a jobs account/qos always make sure you remove the old limits. * Don't reset a FAIL_QOS or FAIL_ACCOUNT job reason until the qos or account changed. * Restore "sreport -T ALL" functionality. * Correctly typecast signals being sent through the api. * Properly initialize structures throughout Slurm. * Sync "numtask" squeue format option for jobs and steps to "numtasks". * Fix sacct -PD to avoid CA before start jobs. * Fix potential deadlock with backup slurmctld. * Fixed issue with jobs not appearing in sacct after dependency satisfied. * Fix showing non-eligible jobs when asking with -j and not -s. * Fix issue with backfill scheduler scheduling tasks of an array when not the head job. * accounting_storage/mysql - fix SIGABRT in the archive load logic. * accounting_storage/mysql - fix memory leak in the archive load logic. * Limit records per single SQL statement when loading archived data. OBS-URL: https://build.opensuse.org/request/show/714908 OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=100
2019-07-12 20:09:50 +02:00
%changelog