-------------------------------------------------------------------
Thu Feb 16 12:12:45 UTC 2017 - jengelh@inai.de

- Trim redundant parts of description. Fixup RPM groups.
- Replace unnecessary %__ macro indirections;
  replace historic $RPM_* variables by macros.

-------------------------------------------------------------------
Wed Feb 15 18:55:28 UTC 2017 - eich@suse.com

- slurmd-Fix-for-newer-API-versions.patch:
  Stale patch removed.

-------------------------------------------------------------------
Tue Feb 7 16:47:17 UTC 2017 - eich@suse.com

- Use %slurm_u and %slurm_g macros defined at the beginning of the spec
  file when adding the slurm user/group for consistency.
- Define these macros to daemon,root for non-systemd.
- For anything newer than Leap 42.1 or SLE-12-SP1, build OpenHPC compatible.

-------------------------------------------------------------------
Wed Feb 1 20:17:47 UTC 2017 - eich@suse.com

- Updated to 16.05.8.1
  * Remove StoragePass from being printed out in the slurmdbd log at debug2
    level.
  * Defer PATH search for task program until launch in slurmstepd.
  * Modify regression test1.89 to avoid leaving vestigial job. Also reduce
    logging to reduce likelihood of Expect buffer overflow.
  * Do not PATH search for multi-prog launches if LaunchParameters=test_exec is
    enabled.
  * Fix for possible infinite loop in select/cons_res plugin when trying to
    satisfy a job's ntasks_per_core or socket specification.
  * If job is held for bad constraints, make it so once updated the job doesn't
    go into JobAdminHeld.
  * sched/backfill - Fix logic to reserve resources for jobs that require a
    node reboot (i.e. to change KNL mode) in order to start.
  * When unpacking a node or front_end record from state and the protocol
    version is lower than the min version, set it to the min.
  * Remove redundant lookup for part_ptr when updating a reservation's nodes.
  * Fix memory and file descriptor leaks in slurmd daemon's sbcast logic.
  * Do not allocate specialized cores to jobs using the --exclusive option.
  * Cancel interactive job if Prolog failure with "PrologFlags=contain" or
    "PrologFlags=alloc" configured. Send new error prolog failure message to
    the salloc or srun command as needed.
  * Prevent possible out-of-bounds read in slurmstepd on an invalid #! line.
  * Fix check for PluginDir within slurmctld to work with multiple directories.
  * Cancel interactive jobs automatically on communication error to launching
    srun/salloc process.
  * Fix security issue caused by insecure file path handling triggered by the
    failure of a Prolog script. To exploit this, a user needs to anticipate or
    cause the Prolog to fail for their job. CVE-2016-10030 (bsc#1018371).
- Replace group/user add macros with function calls.
- Disable building with netloc support: the netloc API is part of the devel
  branch of hwloc. Since this devel branch was included accidentally and has
  since been reverted, we need to disable this for the time being.
- Conditionalized architecture specific pieces to support non-x86 architectures
  better.

-------------------------------------------------------------------
Tue Jan 3 17:21:58 UTC 2017 - eich@suse.com

- Remove: unneeded 'BuildRequires: python'
- Add:
  BuildRequires: freeipmi-devel
  BuildRequires: libibmad-devel
  BuildRequires: libibumad-devel
  so they are picked up by the slurm build.
- Enable modifications from the OpenHPC project.
- Enable lua API package build.
- Add a recommends for slurm-munge to the slurm package:
  This way, the munge auth method is available and slurm
  works out of the box.
- Create /var/lib/slurm as StateSaveLocation directory.
  /tmp is dangerous.

-------------------------------------------------------------------
Wed Nov 30 15:16:05 UTC 2016 - eich@suse.com

- Keep %{_libdir}/libpmi* and %{_libdir}/mpi_pmi2* on SUSE.

-------------------------------------------------------------------
Tue Nov 22 21:42:04 UTC 2016 - eich@suse.com

- Fix build with and without OHCP_BUILD define.
- Fix build for systemd and non-systemd.

-------------------------------------------------------------------
Fri Nov 4 20:15:47 UTC 2016 - eich@suse.com

- Updated to 16.05.5 - equivalent to OpenHPC 1.2.
  * Fix issue with resizing jobs and limits not being tracked correctly.
  * BGQ - Remove redeclaration of job_read_lock.
  * BGQ - Tighter locks around structures when nodes/cables change state.
  * Make it possible to change CPUsPerTask with scontrol.
  * Make it so scontrol update part qos= will take away a partition QOS from
    a partition.
  * Backfill scheduling properly synchronized with Cray Node Health Check.
    Prior logic could result in highest priority job getting improperly
    postponed.
  * Make it so daemons also support TopologyParam=NoInAddrAny.
  * If scancel is operating on a large number of jobs and RPC responses from
    slurmctld daemon are slow, then introduce a delay in sending the cancel job
    requests from scancel in order to reduce load on slurmctld.
  * Remove redundant logic when updating a job's task count.
  * MySQL - Fix querying jobs with reservations when the id's have rolled.
  * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids.
  * Launch batch job requesting --reboot after the boot completes.
  * Do not attempt to power down a node which has never responded if the
    slurmctld daemon restarts without state.
  * Fix for possible slurmstepd segfault on invalid user ID.
  * MySQL - Fix for possible race condition when archiving multiple clusters
    at the same time.
  * Add logic so that slurmstepd can be launched under valgrind.
  * Increase buffer size to read /proc/*/stat files.
  * Remove the SchedulerParameters option of "assoc_limit_continue", making it
    the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop"
    is set and a job cannot start due to association limits, then do not attempt
    to initiate any lower priority jobs in that partition. Setting this can
    decrease system throughput and utilization, but avoid potentially starving
    larger jobs by preventing them from launching indefinitely.
  * Update a node's socket and cores per socket counts as needed after a node
    boot to reflect configuration changes which can occur on KNL processors.
    Note that the node's total core count must not change, only the distribution
    of cores across varying socket counts (KNL NUMA nodes treated as sockets by
    Slurm).
  * Rename partition configuration from "Shared" to "OverSubscribe". Rename
    salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old
    options will continue to function. Output field names also changed in
    scontrol, sinfo, squeue and sview.
  * Add SLURM_UMASK environment variable to user job.
  * knl_conf: Added new configuration parameter of CapmcPollFreq.
  * Cleanup two minor Coverity warnings.
  * Make it so the tres units in a job's formatted string are converted like
    they are in a step.
  * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by
    multiple partitions.
  * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references.
  * Display thread name instead of thread id and remove process name in stderr
    logging for "thread_id" LogTimeFormat.
  * Log IP address of bad incoming message to slurmctld.
  * If a user requests tasks, nodes and ntasks-per-node and
    tasks-per-node/nodes != tasks, print a warning and ignore ntasks-per-node.
  * Release CPU "owner" file locks.
  * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency
    in spec file.
  * Allow QOS timelimit to override partition timelimit when EnforcePartLimits
    is set to all/any.
  * Make it so qsub will do a "basename" on a wrapped command for the output
    and error files.
  * Prevent job stuck in configuring state if slurmctld daemon restarted while
    PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation
    as needed.
  * Move test for job wait reason value of BurstBufferResources and
    BurstBufferStageIn later in the scheduling logic.
  * Document which srun options apply to only job, only step, or job and step
    allocations.
  * Use more compatible function to get thread name (>= 2.6.11).
  * Make it so the extern step uses a reverse tree when cleaning up.
  * If extern step doesn't get added into the proctrack plugin, make sure the
    sleep is killed.
  * Add web links to Slurm Diamond Collectors (from Harvard University) and
    collectd (from EDF).
  * Add job_submit plugin for the "reboot" field.
  * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to
    job_submit/lua plugins.
  * Send in a -1 for a taskid into spank_task_post_fork for the extern_step.
  * MYSQL - Slightly better logic if a job completion comes in with an end time
    of 0.
  * If task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set
    soft memory limit to allocated memory limit (previously no soft limit was
    set).
  * Streamline when schedule() is called when running with message aggregation
    on batch script completion.
  * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t.
  * Document that persistent burst buffers cannot be created or destroyed using
    the salloc or srun --bb options.
  * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and
    SLURM_JOB_RESERVATION environment variables for the salloc command.
    Document the same environment variables for the salloc, sbatch and srun
    commands in their man pages.
  * Fix issue where sacctmgr load cluster.cfg wouldn't load associations
    that had a partition in them.
  * Don't return the extern step from sstat by default.
  * In sstat print 'extern' instead of 4294967295 for the extern step.
  * Make advanced reservations work properly with core specialization.
  * slurmstepd modified to pre-load all relevant plugins at startup to avoid
    the possibility of modified plugins later resulting in inconsistent API
    or data structures and a failure of slurmstepd.
  * Export functions from parse_time.c in libslurm.so.
  * Export unit convert functions from slurm_protocol_api.c in libslurm.so.
  * Fix scancel to allow multiple steps from a job to be cancelled at once.
  * Update and expand upgrade guide (in Quick Start Administrator web page).
  * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run
    operation.
  * Ensure reported expected job start time is not in the past for pending jobs.
  * Add support for PMIx v2.

  Required for FATE#316379.

-------------------------------------------------------------------
Mon Oct 17 13:25:52 UTC 2016 - eich@suse.com

- Setting 'download_files' service to mode='localonly'
  and adding source tarball. (Required for Factory).

-------------------------------------------------------------------
Sat Oct 15 18:11:39 UTC 2016 - eich@suse.com

- version 15.08.7.1
  * Remove the 1024-character limit on lines in batch scripts.
  * task/affinity: Disable core-level task binding if more CPUs required than
    available cores.
  * Preemption/gang scheduling: If a job is suspended at slurmctld restart or
    reconfiguration time, then leave it suspended rather than resume+suspend.
  * Don't use lower weight nodes for job allocation when topology/tree used.
  * Don't allow user specified reservation names to disrupt the normal
    reservation sequence numbering scheme.
  * Avoid hard-link/copy of script/environment files for job arrays. Use the
    master job record file for all tasks of the job array.
    NOTE: Job arrays submitted to Slurm version 15.08.6 or later will fail if
    the slurmctld daemon is downgraded to an earlier version of Slurm.
  * In slurmctld log file, log duplicate job ID found by slurmd. Previously it
    was being logged as prolog/epilog failure.
  * If a job is requeued while in the process of being launched, remove its
    job ID from slurmd's record of active jobs in order to avoid generating a
    duplicate job ID error when launched for the second time (which would
    drain the node).
  * Cleanup messages when handling job script and environment variables in
    older directory structure formats.
  * Prevent triggering gang scheduling within a partition if configured with
    PreemptType=partition_prio and PreemptMode=suspend,gang.
  * Decrease parallelism in job cancel request to prevent denial of service
    when cancelling huge numbers of jobs.
  * If all ephemeral ports are in use, try using other port numbers.
  * Prevent "scontrol update job" from updating jobs that have already finished.
  * Show requested TRES in "squeue -O tres" when job is pending.
  * Backfill scheduler: Test association and QOS node limits before reserving
    resources for pending job.
  * Many bug fixes.
- Use source services to download package.
- Fix code for new API of hwloc-2.0.
- Package netloc_to_topology where available.
- Package documentation.

-------------------------------------------------------------------
Sun Nov 1 13:45:52 UTC 2015 - scorot@free.fr

- version 15.08.3
  * Many new features and bug fixes. See NEWS file
- update files list accordingly
- fix wrong end of line in some files

-------------------------------------------------------------------
Thu Aug 6 19:06:18 UTC 2015 - scorot@free.fr

- version 14.11.8
  * Many bug fixes. See NEWS file
- update files list accordingly

-------------------------------------------------------------------
Sun Nov 2 22:12:34 UTC 2014 - scorot@free.fr

- add missing systemd requirements
- add missing rclink

-------------------------------------------------------------------
Sun Nov 2 15:04:42 UTC 2014 - scorot@free.fr

- version 14.03.9
  * Many bug fixes. See NEWS file
- add systemd support

-------------------------------------------------------------------
Sat Jul 26 10:22:32 UTC 2014 - scorot@free.fr

- version 14.03.6
  * Added support for native Slurm operation on Cray systems
    (without ALPS).
  * Added partition configuration parameters AllowAccounts,
    AllowQOS, DenyAccounts and DenyQOS to provide greater control
    over use.
  * Added the ability to perform load based scheduling, allocating
    resources to jobs on the nodes with the largest number of idle
    CPUs.
  * Added support for reserving cores on a compute node for system
    services (core specialization).
  * Add mechanism for job_submit plugin to generate error message
    for srun, salloc or sbatch to stderr.
  * Support for Postgres database has long since been out of date
    and problematic, so it has been removed entirely. If you
    would like to use it, the code still exists in <= 2.6, but will
    not be included in this and future versions of the code.
  * Added new structures and support for both server and cluster
    resources.
  * Significant performance improvements, especially with respect
    to job array support.
- update files list

-------------------------------------------------------------------
Sun Mar 16 15:59:01 UTC 2014 - scorot@free.fr

- update to version 2.6.7
  * Support for job arrays, which increases performance and ease of
    use for sets of similar jobs.
  * Job profiling capability added to record a wide variety of job
    characteristics for each task on a user configurable periodic
    basis. Data currently available includes CPU use, memory use,
    energy use, Infiniband network use, Lustre file system use, etc.
  * Support for MPICH2 using PMI2 communications interface with much
    greater scalability.
  * Prolog and epilog support for advanced reservations.
  * Much faster throughput for job step execution with --exclusive
    option. The srun process is notified when resources become
    available rather than by periodic polling.
  * Support improved for Intel MIC (Many Integrated Core) processor.
  * Advanced reservations with hostname and core counts now support
    asymmetric reservations (e.g. specific different core count for
    each node).
  * External sensor plugin infrastructure added to record power
    consumption, temperature, etc.
  * Improved performance for high-throughput computing.
  * MapReduce+ support (launches ~1000x faster, runs ~10x faster).
  * Added "MaxCPUsPerNode" partition configuration parameter. This
    can be especially useful to schedule GPUs. For example, a node
    can be associated with two Slurm partitions (e.g. "cpu" and
    "gpu") and the partition/queue "cpu" could be limited to only a
    subset of the node's CPUs, ensuring that one or more CPUs would
    be available to jobs in the "gpu" partition/queue.

-------------------------------------------------------------------
Thu Jun 6 20:31:49 UTC 2013 - scorot@free.fr

- version 2.5.7
  * Fix for linking to the select/cray plugin to not give warning
    about undefined variable.
  * Add missing symbols to the xlator.h
  * Avoid placing pending jobs in AdminHold state due to backfill
    scheduler interactions with advanced reservation.
  * Accounting - make average by task not cpu.
  * POE - Correct logic to support poe option "-euidevice sn_all"
    and "-euidevice sn_single".
  * Accounting - Fix minor initialization error.
  * POE - Correct logic to support srun network instances count
    with POE.
  * POE - With the srun --launch-cmd option, report proper task
    count when the --cpus-per-task option is used without the
    --ntasks option.
  * POE - Fix logic binding tasks to CPUs.
  * sview - Fix race condition where new information could have
    slipped past the node tab and we didn't notice.
  * Accounting - Fix an invalid memory read when slurmctld sends
    data about start job to slurmdbd.
  * If a prolog or epilog failure occurs, drain the node rather
    than setting it down and killing all of its jobs.
  * Priority/multifactor - Avoid underflow in half-life calculation.
  * POE - pack missing variable to allow fanout (more than 32
    nodes)
  * Prevent clearing reason field for pending jobs. This bug was
    introduced in v2.5.5 (see "Reject job at submit time ...").
  * BGQ - Fix issue with preemption on sub-block jobs where a job
    would kill all preemptable jobs on the midplane instead of just
    the ones it needed to.
  * switch/nrt - Validate dynamic window allocation size.
  * BGQ - When --geo is requested do not impose the default
    conn_types.
  * RebootNode logic - Defers (rather than forgets) reboot request
    with job running on the node within a reservation.
  * switch/nrt - Correct network_id use logic. Correct support for
    user sn_all and sn_single options.
  * sched/backfill - Modify logic to reduce overhead under heavy
    load.
  * Fix job step allocation with --exclusive and --hostlist option.
  * Select/cons_res - Fix bug resulting in error of "cons_res: sync
    loop not progressing, holding job #"
  * checkpoint/blcr - Reset max_nodes from zero to NO_VAL on job
    restart.
  * launch/poe - Fix for hostlist file support with repeated host
    names.
  * priority/multifactor2 - Prevent possible divide by zero.
  * srun - Don't check for executable if --test-only flag is
    used.
  * energy - On a single node, only use the last task for gathering
    energy, since we don't currently track energy usage per task
    (only per step); otherwise we get double the energy.

-------------------------------------------------------------------
Sat Apr 6 11:13:17 UTC 2013 - scorot@free.fr

- version 2.5.4
  * Support for Intel® Many Integrated Core (MIC) processors.
  * User control over CPU frequency of each job step.
  * Recording power usage information for each job.
  * Advanced reservation of cores rather than whole nodes.
  * Integration with IBM's Parallel Environment including POE (Parallel
    Operating Environment) and NRT (Network Resource Table) API.
  * Highly optimized throughput for serial jobs in a new
    "select/serial" plugin.
  * CPU load information is available.
  * Configurable number of CPUs available to jobs in each SLURM
    partition, which provides a mechanism to reserve CPUs for use
    with GPUs.

-------------------------------------------------------------------
Sat Nov 17 18:02:16 UTC 2012 - scorot@free.fr

- remove runlevel 4 from init script thanks to patch1
- fix self obsoletion of slurm-munge package
- use fdupes to remove duplicates
- spec file reformatting

-------------------------------------------------------------------
Sat Nov 17 17:30:11 UTC 2012 - scorot@free.fr

- put perl macro in a better place within install section

-------------------------------------------------------------------
Sat Nov 17 17:01:20 UTC 2012 - scorot@free.fr

- enable numa on x86_64 arch only

-------------------------------------------------------------------
Sat Nov 17 16:54:18 UTC 2012 - scorot@free.fr

- add numa and hwloc support
- fix rpath with patch0

-------------------------------------------------------------------
Fri Nov 16 21:46:49 UTC 2012 - scorot@free.fr

- fix perl module files list

-------------------------------------------------------------------
Mon Nov 5 21:48:52 UTC 2012 - scorot@free.fr

- use perl_process_packlist macro for the perl files cleanup
- fix length of some summaries
- add cgroups directory and the example cgroup.release_common file

-------------------------------------------------------------------
Sat Nov 3 18:19:59 UTC 2012 - scorot@free.fr

- spec file cleanup

-------------------------------------------------------------------
Sat Nov 3 15:57:47 UTC 2012 - scorot@free.fr

- first package