-------------------------------------------------------------------
Mon Oct 17 13:25:52 UTC 2016 - eich@suse.com
- Setting 'download_files' service to mode='localonly'
and adding source tarball. (Required for Factory).
-------------------------------------------------------------------
Sat Oct 15 18:11:39 UTC 2016 - eich@suse.com
- version 15.08.7.1
* Remove the 1024-character limit on lines in batch scripts.
* task/affinity: Disable core-level task binding if more CPUs required than
available cores.
* Preemption/gang scheduling: If a job is suspended at slurmctld restart or
reconfiguration time, then leave it suspended rather than resume+suspend.
* Don't use lower weight nodes for job allocation when topology/tree used.
* Don't allow user specified reservation names to disrupt the normal
reservation sequence numbering scheme.
* Avoid hard-link/copy of script/environment files for job arrays. Use the
master job record file for all tasks of the job array.
NOTE: Job arrays submitted to Slurm version 15.08.6 or later will fail if
the slurmctld daemon is downgraded to an earlier version of Slurm.
* In slurmctld log file, log duplicate job ID found by slurmd. Previously was
being logged as prolog/epilog failure.
* If a job is requeued while in the process of being launched, remove its
job ID from slurmd's record of active jobs in order to avoid generating a
duplicate job ID error when launched for the second time (which would
drain the node).
* Clean up messages when handling job script and environment variables in
older directory structure formats.
* Prevent triggering gang scheduling within a partition if configured with
PreemptType=partition_prio and PreemptMode=suspend,gang.
* Decrease parallelism in job cancel request to prevent denial of service
when cancelling huge numbers of jobs.
* If all ephemeral ports are in use, try using other port numbers.
* Prevent "scontrol update job" from updating jobs that have already finished.
* Show requested TRES in "squeue -O tres" when job is pending.
* Backfill scheduler: Test association and QOS node limits before reserving
resources for pending job.
* Many bug fixes.
- Use source services to download package.
- Fix code for new API of hwloc-2.0.
- Package netloc_to_topology where available.
- Package documentation.
-------------------------------------------------------------------
Sun Nov 1 13:45:52 UTC 2015 - scorot@free.fr
- version 15.08.3
* Many new features and bug fixes. See NEWS file
- update files list accordingly
- fix wrong end of line in some files
-------------------------------------------------------------------
Thu Aug 6 19:06:18 UTC 2015 - scorot@free.fr
- version 14.11.8
* Many bug fixes. See NEWS file
- update files list accordingly
-------------------------------------------------------------------
Sun Nov 2 22:12:34 UTC 2014 - scorot@free.fr
- add missing systemd requirements
- add missing rclink
-------------------------------------------------------------------
Sun Nov 2 15:04:42 UTC 2014 - scorot@free.fr
- version 14.03.9
* Many bug fixes. See NEWS file
- add systemd support
-------------------------------------------------------------------
Sat Jul 26 10:22:32 UTC 2014 - scorot@free.fr
- version 14.03.6
* Added support for native Slurm operation on Cray systems
(without ALPS).
* Added partition configuration parameters AllowAccounts,
AllowQOS, DenyAccounts and DenyQOS to provide greater control
over use.
* Added the ability to perform load-based scheduling, allocating
resources to jobs on the nodes with the largest number of idle
CPUs.
* Added support for reserving cores on a compute node for system
services (core specialization)
* Add mechanism for job_submit plugin to generate error message
for srun, salloc or sbatch to stderr.
* Support for Postgres database has long since been out of date
and problematic, so it has been removed entirely. If you
would like to use it the code still exists in <= 2.6, but will
not be included in this and future versions of the code.
* Added new structures and support for both server and cluster
resources.
* Significant performance improvements, especially with respect
to job array support.
- update files list
-------------------------------------------------------------------
Sun Mar 16 15:59:01 UTC 2014 - scorot@free.fr
- update to version 2.6.7
* Support for job arrays, which increases performance and ease of
use for sets of similar jobs.
* Job profiling capability added to record a wide variety of job
characteristics for each task on a user configurable periodic
basis. Data currently available includes CPU use, memory use,
energy use, Infiniband network use, Lustre file system use, etc.
* Support for MPICH2 using PMI2 communications interface with much
greater scalability.
* Prolog and epilog support for advanced reservations.
* Much faster throughput for job step execution with --exclusive
option. The srun process is notified when resources become
available rather than periodic polling.
* Support improved for Intel MIC (Many Integrated Core) processor.
* Advanced reservations with hostname and core counts now support
asymmetric reservations (e.g. specific different core count for
each node).
* External sensor plugin infrastructure added to record power
consumption, temperature, etc.
* Improved performance for high-throughput computing.
* MapReduce+ support (launches ~1000x faster, runs ~10x faster).
* Added "MaxCPUsPerNode" partition configuration parameter. This
can be especially useful to schedule GPUs. For example a node
can be associated with two Slurm partitions (e.g. "cpu" and
"gpu") and the partition/queue "cpu" could be limited to only a
subset of the node's CPUs, ensuring that one or more CPUs would
be available to jobs in the "gpu" partition/queue.
-------------------------------------------------------------------
Thu Jun 6 20:31:49 UTC 2013 - scorot@free.fr
- version 2.5.7
* Fix for linking to the select/cray plugin to not give warning
about undefined variable.
* Add missing symbols to the xlator.h
* Avoid placing pending jobs in AdminHold state due to backfill
scheduler interactions with advanced reservation.
* Accounting - make average by task not cpu.
* POE - Correct logic to support poe option "-euidevice sn_all"
and "-euidevice sn_single".
* Accounting - Fix minor initialization error.
* POE - Correct logic to support srun network instances count
with POE.
* POE - With the srun --launch-cmd option, report proper task
count when the --cpus-per-task option is used without the
--ntasks option.
* POE - Fix logic binding tasks to CPUs.
* sview - Fix race condition where new information could have
slipped past the node tab and we didn't notice.
* Accounting - Fix an invalid memory read when slurmctld sends
data about start job to slurmdbd.
* If a prolog or epilog failure occurs, drain the node rather
than setting it down and killing all of its jobs.
* Priority/multifactor - Avoid underflow in half-life calculation.
* POE - pack missing variable to allow fanout (more than 32
nodes)
* Prevent clearing reason field for pending jobs. This bug was
introduced in v2.5.5 (see "Reject job at submit time ...").
* BGQ - Fix issue with preemption on sub-block jobs where a job
would kill all preemptable jobs on the midplane instead of just
the ones it needed to.
* switch/nrt - Validate dynamic window allocation size.
* BGQ - When --geo is requested do not impose the default
conn_types.
* RebootNode logic - Defers (rather than forgets) reboot request
with job running on the node within a reservation.
* switch/nrt - Correct network_id use logic. Correct support for
user sn_all and sn_single options.
* sched/backfill - Modify logic to reduce overhead under heavy
load.
* Fix job step allocation with --exclusive and --hostlist option.
* Select/cons_res - Fix bug resulting in error of "cons_res: sync
loop not progressing, holding job #"
* checkpoint/blcr - Reset max_nodes from zero to NO_VAL on job
restart.
* launch/poe - Fix for hostlist file support with repeated host
names.
* priority/multifactor2 - Prevent possible divide by zero.
* srun - Don't check for executable if --test-only flag is
used.
* energy - On a single node, only use the last task for gathering
energy, since we don't currently track energy usage per task
(only per step). Otherwise we get double the energy.
-------------------------------------------------------------------
Sat Apr 6 11:13:17 UTC 2013 - scorot@free.fr
- version 2.5.4
* Support for Intel® Many Integrated Core (MIC) processors.
* User control over CPU frequency of each job step.
* Recording power usage information for each job.
* Advanced reservation of cores rather than whole nodes.
* Integration with IBM's Parallel Environment including POE (Parallel
Operating Environment) and NRT (Network Resource Table) API.
* Highly optimized throughput for serial jobs in a new
"select/serial" plugin.
* CPU load information is available
* Configurable number of CPUs available to jobs in each SLURM
partition, which provides a mechanism to reserve CPUs for use
with GPUs.
-------------------------------------------------------------------
Sat Nov 17 18:02:16 UTC 2012 - scorot@free.fr
- remove runlevel 4 from init script thanks to patch1
- fix self obsoletion of slurm-munge package
- use fdupes to remove duplicates
- spec file reformatting
-------------------------------------------------------------------
Sat Nov 17 17:30:11 UTC 2012 - scorot@free.fr
- put perl macro in a better place within install section
-------------------------------------------------------------------
Sat Nov 17 17:01:20 UTC 2012 - scorot@free.fr
- enable numa on x86_64 arch only
-------------------------------------------------------------------
Sat Nov 17 16:54:18 UTC 2012 - scorot@free.fr
- add numa and hwloc support
- fix rpath with patch0
-------------------------------------------------------------------
Fri Nov 16 21:46:49 UTC 2012 - scorot@free.fr
- fix perl module files list
-------------------------------------------------------------------
Mon Nov 5 21:48:52 UTC 2012 - scorot@free.fr
- use perl_process_packlist macro for the perl files cleanup
- fix some summaries length
- add cgroups directory and the example cgroup.release_common file
-------------------------------------------------------------------
Sat Nov 3 18:19:59 UTC 2012 - scorot@free.fr
- spec file cleanup
-------------------------------------------------------------------
Sat Nov 3 15:57:47 UTC 2012 - scorot@free.fr
- first package