Compare commits

315 Commits

Author SHA256 Message Date
Dominique Leuenberger
8a2be70840 Accepting request 1238577 from network:cluster
* `slurmrestd` - Remove deprecated fields from the following
     `.result` from `POST /slurm/v0.0.42/job/submit`.  
     `.job_id`, `.step_id`, `.job_submit_user_msg` from `POST /slurm/v0.0.42/job/{job_id}`.  
     `.job.exclusive`, `.jobs[].exclusive` to `POST /slurm/v0.0.42/job/submit`.  
     `.jobs[].exclusive` from `GET /slurm/v0.0.42/job/{job_id}`.  
     `.jobs[].exclusive` from `GET /slurm/v0.0.42/jobs`.  
     `.job.oversubscribe`, `.jobs[].oversubscribe` to `POST /slurm/v0.0.42/job/submit`.  
     `.jobs[].oversubscribe` from `GET /slurm/v0.0.42/job/{job_id}`.  
     `.jobs[].oversubscribe` from `GET /slurm/v0.0.42/jobs`.  
     `DELETE /slurm/v0.0.40/jobs`  
     `DELETE /slurm/v0.0.41/jobs`  
     `DELETE /slurm/v0.0.42/jobs`  
    allocation is granted.
    `job|socket|task` or `cpus|mem` per GRES.
    node update whereas previously only single nodes could be
    updated through `/node/<nodename>` endpoint:
    `POST /slurm/v0.0.42/nodes`
    partition as this is a cluster-wide option.
    `REQUEST_NODE_INFO RPC`.
    the db server is not reachable.
    (`.jobs[].priority_by_partition`) to JSON and YAML output.
    connection` error if the error was the result of an
    authentication failure.
    errors with the `SLURM_PROTOCOL_AUTHENTICATION_ERROR` error
    code.
    of `Unspecified error` if querying the following endpoints
    fails:  
    `GET /slurm/v0.0.40/diag/`  
    `GET /slurm/v0.0.41/diag/`  
    `GET /slurm/v0.0.42/diag/` (forwarded request 1238576 from eeich)

OBS-URL: https://build.opensuse.org/request/show/1238577
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=111
2025-01-18 12:18:25 +00:00
Ana Guerrero
3b4d2235f3 Accepting request 1236247 from network:cluster
- Fix testsuite:
  Cater for erroneous: `#include </src/[slurm_internal_header]>`
  statements. (forwarded request 1236246 from eeich)

OBS-URL: https://build.opensuse.org/request/show/1236247
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=110
2025-01-12 10:14:54 +00:00
bf43fd9d06 - Fix testsuite:
Cater for erroneous: `#include </src/[slurm_internal_header]>`
  statements.

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=304
2025-01-09 15:43:36 +00:00
Ana Guerrero
e8b6930a42 Accepting request 1235784 from network:cluster
- Update to version 24.11
  * `slurmctld` - Reject arbitrary distribution jobs that do not
    specify a task count.
  * Fix backwards compatibility of the `RESPONSE_JOB_INFO RPC`
    (used by `squeue`, `scontrol show job`, etc.) with Slurm clients
    version 24.05 and below. This was a regression in 24.11.0rc1.
  * Do not let `slurmctld`/`slurmd` start if there are more nodes
    defined in `slurm.conf` than the maximum supported amount
    (64k nodes).
  * `slurmctld` - Set job's exit code to 1 when a job fails with
    state `JOB_NODE_FAIL`. This fixes `sbatch --wait` not being able
    to exit with error code when a job fails for this reason in
    some cases.
  * Fix certain reservation updates requested from 23.02 clients.
  * `slurmrestd` - Fix populating non-required object fields of
    objects as `{}` in JSON/YAML instead of `null` causing compiled
    OpenAPI clients to reject the response to
    `GET /slurm/v0.0.40/jobs` due to validation failure of
    `.jobs[].job_resources`.
  * Fix issue where older versions of Slurm talking to a 24.11 dbd
    could lose step accounting.
  * Fix minor memory leaks.
  * Fix bad memory reference when `xstrchr` fails to find char.
  * Remove duplicate checks for a data structure.
  * Fix race condition in `stepmgr` step completion handling.
  * `slurm.spec` - add ability to specify patches to apply on the
    command line.
  * `slurm.spec` - add ability to supply extra version information.
  * Fix 24.11 HA issues.
  * Fix requeued jobs keeping their priority until the decay thread (forwarded request 1235783 from eeich)
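
The `sbatch --wait` item above concerns scripts that rely on the exit status of `sbatch`. A minimal Python sketch of such a caller follows; the helper name `submit_and_wait` and the injectable `run` parameter are illustrative assumptions, not part of Slurm.

```python
import subprocess
from typing import Callable, Optional, Sequence


def submit_and_wait(
    script: str,
    run: Optional[Callable[[Sequence[str]], int]] = None,
) -> bool:
    """Submit `script` with `sbatch --wait` and report whether it succeeded.

    `run` is a hypothetical hook that takes an argv list and returns an
    exit code; by default it executes the real command via subprocess.
    """
    if run is None:
        run = lambda argv: subprocess.run(argv).returncode
    # With --wait, sbatch blocks until the job terminates and exits with
    # the job's exit code, so a non-zero status signals a failed job.
    return run(["sbatch", "--wait", script]) == 0
```

With the change described above, a job that ends in state `JOB_NODE_FAIL` yields exit code 1, so a caller like this one would report the failure instead of treating the job as successful.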

OBS-URL: https://build.opensuse.org/request/show/1235784
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=109
2025-01-09 14:07:22 +00:00
626fb47a3b - Update to version 24.11
* `slurmctld` - Reject arbitrary distribution jobs that do not
    specify a task count.
  * Fix backwards compatibility of the `RESPONSE_JOB_INFO RPC`
    (used by `squeue`, `scontrol show job`, etc.) with Slurm clients
    version 24.05 and below. This was a regression in 24.11.0rc1.
  * Do not let `slurmctld`/`slurmd` start if there are more nodes
    defined in `slurm.conf` than the maximum supported amount
    (64k nodes).
  * `slurmctld` - Set job's exit code to 1 when a job fails with
    state `JOB_NODE_FAIL`. This fixes `sbatch --wait` not being able
    to exit with error code when a job fails for this reason in
    some cases.
  * Fix certain reservation updates requested from 23.02 clients.
  * `slurmrestd` - Fix populating non-required object fields of
    objects as `{}` in JSON/YAML instead of `null` causing compiled
    OpenAPI clients to reject the response to
    `GET /slurm/v0.0.40/jobs` due to validation failure of
    `.jobs[].job_resources`.
  * Fix issue where older versions of Slurm talking to a 24.11 dbd
    could lose step accounting.
  * Fix minor memory leaks.
  * Fix bad memory reference when `xstrchr` fails to find char.
  * Remove duplicate checks for a data structure.
  * Fix race condition in `stepmgr` step completion handling.
  * `slurm.spec` - add ability to specify patches to apply on the
    command line.
  * `slurm.spec` - add ability to supply extra version information.
  * Fix 24.11 HA issues.
  * Fix requeued jobs keeping their priority until the decay thread

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=302
2025-01-08 06:03:29 +00:00
Dominique Leuenberger
17d576bce0 Accepting request 1220076 from network:cluster
- Update to version 24.05.4 & fix for CVE-2024-48936.
  * Fix generic int sort functions.
  * Fix user look up using possible unrealized uid in the dbd.
  * `slurmrestd` - Fix regressions that allowed `slurmrestd` to
    be run as SlurmUser when `SlurmUser` was not root.
  * mpi/pmix - Fix race conditions with het jobs at step start/end
    which could make srun hang.
  * Fix not showing some `SelectTypeParameters` in `scontrol show
    config`.
  * Avoid assert when dumping certain removed fields in JSON/YAML.
  * Improve how shards are scheduled with affinity in mind.
  * Fix `MaxJobsAccruePU` not being respected when `MaxJobsAccruePA`
    is set in the same QOS.
  * Prevent backfill from planning jobs that use overlapping
    resources for the same time slot if the job's time limit is
    less than `bf_resolution`.
  * Fix memory leak when requesting typed gres and
    `--[cpus|mem]-per-gpu`.
  * Prevent backfill from breaking out due to "system state
    changed" every 30 seconds if reservations use `REPLACE` or
    `REPLACE_DOWN` flags.
  * `slurmrestd` - Make sure that scheduler_unset parameter defaults
    to true even when the following flags are also set:
    `show_duplicates`, `skip_steps`, `disable_truncate_usage_time`,
    `run_away_jobs`, `whole_hetjob`, `disable_whole_hetjob`,
    `disable_wait_for_result`, `usage_time_as_submit_time`,
    `show_batch_script`, and/or `show_job_environment`. Additionally,
    always make sure show_duplicates and
    `disable_truncate_usage_time` default to true when the following
    flags are also set: `scheduler_unset`, `scheduled_on_submit`, (forwarded request 1220075 from eeich)

OBS-URL: https://build.opensuse.org/request/show/1220076
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=108
2024-11-01 20:07:50 +00:00
b1107f7a34 - Update to version 24.05.4 & fix for CVE-2024-48936.
* Fix generic int sort functions.
  * Fix user look up using possible unrealized uid in the dbd.
  * `slurmrestd` - Fix regressions that allowed `slurmrestd` to
    be run as SlurmUser when `SlurmUser` was not root.
  * mpi/pmix - Fix race conditions with het jobs at step start/end
    which could make srun hang.
  * Fix not showing some `SelectTypeParameters` in `scontrol show
    config`.
  * Avoid assert when dumping certain removed fields in JSON/YAML.
  * Improve how shards are scheduled with affinity in mind.
  * Fix `MaxJobsAccruePU` not being respected when `MaxJobsAccruePA`
    is set in the same QOS.
  * Prevent backfill from planning jobs that use overlapping
    resources for the same time slot if the job's time limit is
    less than `bf_resolution`.
  * Fix memory leak when requesting typed gres and
    `--[cpus|mem]-per-gpu`.
  * Prevent backfill from breaking out due to "system state
    changed" every 30 seconds if reservations use `REPLACE` or
    `REPLACE_DOWN` flags.
  * `slurmrestd` - Make sure that scheduler_unset parameter defaults
    to true even when the following flags are also set:
    `show_duplicates`, `skip_steps`, `disable_truncate_usage_time`,
    `run_away_jobs`, `whole_hetjob`, `disable_whole_hetjob`,
    `disable_wait_for_result`, `usage_time_as_submit_time`,
    `show_batch_script`, and/or `show_job_environment`. Additionally,
    always make sure show_duplicates and
    `disable_truncate_usage_time` default to true when the following
    flags are also set: `scheduler_unset`, `scheduled_on_submit`,

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=300
2024-11-01 13:22:34 +00:00
Ana Guerrero
3133935d61 Accepting request 1217321 from network:cluster
- Add %{?sysusers_requires} to slurm-config.
  This fixes issues when building against Slurm. (forwarded request 1217300 from eeich)

OBS-URL: https://build.opensuse.org/request/show/1217321
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=107
2024-10-24 13:42:28 +00:00
427f09ad29 - Add %{?sysusers_requires} to slurm-config.
This fixes issues when building against Slurm.

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=298
2024-10-23 09:42:56 +00:00
Ana Guerrero
de9dc95156 Accepting request 1208086 from network:cluster
- Update to version 24.05.3
  * `data_parser/v0.0.40` - Added field descriptions.
  * `slurmrestd` - Avoid creating new slurmdbd connection per request
    to `* /slurm/slurmctld/*/*` endpoints.
  * Fix compilation issue with `switch/hpe_slingshot` plugin.
  * Fix gres per task allocation with threads-per-core.
  * `data_parser/v0.0.41` - Added field descriptions.
  * `slurmrestd` - Change back generated OpenAPI schema for
    `DELETE /slurm/v0.0.40/jobs/` to `RequestBody` instead of using
    parameters for request. `slurmrestd` will continue to accept endpoint
    requests via `RequestBody` or HTTP query.
  * `topology/tree` - Fix issues with switch distance optimization.
  * Fix potential segfault of secondary `slurmctld` when falling back
    to the primary when running with a `JobComp` plugin.
  * Enable `--json`/`--yaml=v0.0.39` options on client commands to
    dump data using data_parser/v0.0.39 instead of outputting nothing.
  * `switch/hpe_slingshot` - Fix issue that could result in a 0 length
    state file.
  * Fix unnecessary message protocol downgrade for unregistered nodes.
  * Fix unnecessarily packing alias addrs when terminating jobs with
    a mix of non-cloud/dynamic nodes and powered down cloud/dynamic
    nodes.
  * `accounting_storage/mysql` - Fix issue when deleting a qos that
    could remove too many commas from the qos and/or delta_qos fields
    of the assoc table.
  * `slurmctld` - Fix memory leak when using RestrictedCoresPerGPU.
  * Fix allowing access to reservations without `MaxStartDelay` set.
  * Fix regression introduced in 24.05.0rc1 breaking
    `srun --send-libs` parsing.
  * Fix slurmd vsize memory leak when using job submission/allocation

OBS-URL: https://build.opensuse.org/request/show/1208086
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=106
2024-10-15 13:01:34 +00:00
1cc2983ebe - Removed Fix-test-21.41.patch as upstream test changed.
- Dropped package plugin-ext-sensors-rrd as the plugin module no
  longer exists.

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=296
2024-10-15 10:19:24 +00:00
b2f6e848a1 - Update to version 24.05.3
* `data_parser/v0.0.40` - Added field descriptions.
  * `slurmrestd` - Avoid creating new slurmdbd connection per request
    to `* /slurm/slurmctld/*/*` endpoints.
  * Fix compilation issue with `switch/hpe_slingshot` plugin.
  * Fix gres per task allocation with threads-per-core.
  * `data_parser/v0.0.41` - Added field descriptions.
  * `slurmrestd` - Change back generated OpenAPI schema for
    `DELETE /slurm/v0.0.40/jobs/` to `RequestBody` instead of using
    parameters for request. `slurmrestd` will continue to accept endpoint
    requests via `RequestBody` or HTTP query.
  * `topology/tree` - Fix issues with switch distance optimization.
  * Fix potential segfault of secondary `slurmctld` when falling back
    to the primary when running with a `JobComp` plugin.
  * Enable `--json`/`--yaml=v0.0.39` options on client commands to
    dump data using data_parser/v0.0.39 instead of outputting nothing.
  * `switch/hpe_slingshot` - Fix issue that could result in a 0 length
    state file.
  * Fix unnecessary message protocol downgrade for unregistered nodes.
  * Fix unnecessarily packing alias addrs when terminating jobs with
    a mix of non-cloud/dynamic nodes and powered down cloud/dynamic
    nodes.
  * `accounting_storage/mysql` - Fix issue when deleting a qos that
    could remove too many commas from the qos and/or delta_qos fields
    of the assoc table.
  * `slurmctld` - Fix memory leak when using RestrictedCoresPerGPU.
  * Fix allowing access to reservations without `MaxStartDelay` set.
  * Fix regression introduced in 24.05.0rc1 breaking
    `srun --send-libs` parsing.
  * Fix slurmd vsize memory leak when using job submission/allocation

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=295
2024-10-15 06:51:09 +00:00
fc209e050f - updated to new release 24.05.0 with following major changes
- IMPORTANT NOTES:
  If using the slurmdbd (Slurm DataBase Daemon) you must update
  this first.  NOTE: If using a backup DBD you must start the
  primary first to do any database conversion, the backup will not
  start until this has happened.  The 24.05 slurmdbd will work
  with Slurm daemons of version 23.02 and above.  You will not
  need to update all clusters at the same time, but it is very
  important to update slurmdbd first and have it running before
  updating any other clusters making use of it.
- HIGHLIGHTS
  * Federation - allow client command operation when slurmdbd is
    unavailable.
  * burst_buffer/lua - Added two new hooks: slurm_bb_test_data_in
    and slurm_bb_test_data_out. The syntax and use of the new hooks
    are documented in etc/burst_buffer.lua.example. These are
    required to exist. slurmctld now checks on startup if the
    burst_buffer.lua script loads and contains all required hooks;
    slurmctld will exit with a fatal error if this is not
    successful. Added PollInterval to burst_buffer.conf. Removed
    the arbitrary limit of 512 copies of the script running
    simultaneously.
  * Add QOS limit MaxTRESRunMinsPerAccount. 
  * Add QOS limit MaxTRESRunMinsPerUser.
  * Add ELIGIBLE environment variable to jobcomp/script plugin.
  * Always use the QOS name for SLURM_JOB_QOS environment variables.
    Previously the batch environment would use the description field,
    which was usually equivalent to the name. 
  * cgroup/v2 - Require dbus-1 version >= 1.11.16.
  * Allow NodeSet names to be used in SuspendExcNodes.

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=294
2024-10-14 10:03:00 +00:00
Ana Guerrero
61add11d2b Accepting request 1161658 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/1161658
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=105
2024-03-26 18:27:40 +00:00
cda5ce024e Accepting request 1161499 from home:mslacken:branches:network:cluster
- removed Keep-logs-of-skipped-test-when-running-test-cases-sequentially.patch
  as incorporated upstream
* Changes in Slurm 23.02.5
 * Add the JobId to debug() messages indicating when cpus_per_task/mem_per_cpu
   or pn_min_cpus are being automatically adjusted.
 * Fix regression in 23.02.2 that caused slurmctld -R to crash on startup if
   a node features plugin is configured.
 * Fix and prevent reoccurring reservations from overlapping.
 * job_container/tmpfs - Avoid attempts to share BasePath between nodes.
 * Change the log message warning for rate limited users from verbose to info.
 * With CR_Cpu_Memory, fix node selection for jobs that request gres and
   *-mem-per-cpu.
 * Fix a regression from 22.05.7 in which some jobs were allocated too few
   nodes, thus overcommitting cpus to some tasks.
 * Fix a job being stuck in the completing state if the job ends while the
   primary controller is down or unresponsive and the backup controller has
   not yet taken over.
 * Fix slurmctld segfault when a node registers with a configured CpuSpecList
   while slurmctld configuration has the node without CpuSpecList.
 * Fix cloud nodes getting stuck in POWERED_DOWN+NO_RESPOND state after not
   registering by ResumeTimeout.
 * slurmstepd - Avoid cleanup of config.json-less containers spooldir getting
   skipped.
 * slurmstepd - Cleanup per task generated environment for containers in
   spooldir.
 * Fix scontrol segfault when 'completing' command requested repeatedly in
   interactive mode.
 * Properly handle a race condition between bind() and listen() calls in the
   network stack when running with SrunPortRange set.
 * Federation - Fix revoked jobs being returned regardless of the -a/--all

OBS-URL: https://build.opensuse.org/request/show/1161499
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=292
2024-03-26 08:40:44 +00:00
2bd53c8d44 work correctly (boo#1204697).
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=291
2024-03-23 10:05:59 +00:00
Ana Guerrero
4ec0f5cd48 Accepting request 1151965 from network:cluster
- Update to version 23.11.03
  * slurmrestd - Reject single http query with multiple path
    requests.
  * Fix launching Singularity v4.x containers with
    `srun --container` by setting .process.terminal to true in
    generated `config.json` when step has pseudoterminal (`--pty`)
    requested.
  * Fix loading in `dynamic/cloud` node jobs after `net_cred`
    expired.
  * Fix cgroup null path error on `slurmd/slurmstepd` tear down.
  * `data_parser/v0.0.40` - Prevent failure if accounting is
    disabled, instead issue a warning if needed data from the
    database can not be retrieved.
  * `openapi/slurmctld` - Prevent failure if accounting is disabled.
  * Prevent `slurmscriptd` processing delays from blocking other
    threads in `slurmctld` while trying to launch various scripts.
    This is additional work for a fix in 23.02.6.
  * Fix memory leak when receiving alias addrs from controller.
  * `scontrol` - Accept `scontrol token lifespan=infinite` to
    create tokens that effectively do not expire.
  * Avoid errors when Slurmdb accounting disabled when `--json` or
    `--yaml` is invoked with CLI commands and `slurmrestd`. Add
    warnings when query would have populated data from Slurmdb
    instead of errors.
  * Fix `slurmctld` memory leak when running job with
    `--tres-per-task=gres:shard:#`
  * Fix backfill trying to start jobs outside of backfill window.
  * Fix oversubscription on partitions with `PreemptMode=OFF`.
  * Preserve node reason on power up if the node is downed
    or drained. (forwarded request 1150524 from eeich)

OBS-URL: https://build.opensuse.org/request/show/1151965
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=104
2024-02-27 21:47:57 +00:00
fb460ebe6a Accepting request 1150524 from home:eeich:branches:network:cluster
- Update to version 23.11.03
  * slurmrestd - Reject single http query with multiple path
    requests.
  * Fix launching Singularity v4.x containers with
    `srun --container` by setting .process.terminal to true in
    generated `config.json` when step has pseudoterminal (`--pty`)
    requested.
  * Fix loading in `dynamic/cloud` node jobs after `net_cred`
    expired.
  * Fix cgroup null path error on `slurmd/slurmstepd` tear down.
  * `data_parser/v0.0.40` - Prevent failure if accounting is
    disabled, instead issue a warning if needed data from the
    database can not be retrieved.
  * `openapi/slurmctld` - Prevent failure if accounting is disabled.
  * Prevent `slurmscriptd` processing delays from blocking other
    threads in `slurmctld` while trying to launch various scripts.
    This is additional work for a fix in 23.02.6.
  * Fix memory leak when receiving alias addrs from controller.
  * `scontrol` - Accept `scontrol token lifespan=infinite` to
    create tokens that effectively do not expire.
  * Avoid errors when Slurmdb accounting disabled when `--json` or
    `--yaml` is invoked with CLI commands and `slurmrestd`. Add
    warnings when query would have populated data from Slurmdb
    instead of errors.
  * Fix `slurmctld` memory leak when running job with
    `--tres-per-task=gres:shard:#`
  * Fix backfill trying to start jobs outside of backfill window.
  * Fix oversubscription on partitions with `PreemptMode=OFF`.
  * Preserve node reason on power up if the node is downed
    or drained.

OBS-URL: https://build.opensuse.org/request/show/1150524
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=289
2024-02-26 21:40:59 +00:00
Ana Guerrero
6a021ebb80 Accepting request 1141442 from network:cluster
- Update to 23.11.1 with following major improvements and fixing
  CVE-2023-49933, CVE-2023-49934, CVE-2023-49935, CVE-2023-49936
  and CVE-2023-49937
  * Substantially overhauled the SlurmDBD association management
    code. For clusters updated to 23.11, account and user
    additions or removals are significantly faster than in prior
    releases.
  * Overhauled `scontrol reconfigure` to prevent configuration
    mistakes from disabling slurmctld and slurmd. Instead, an
    error will be returned, and the running configuration will
    persist. This does require updates to the systemd service
    files to use the `--systemd` option to `slurmctld` and `slurmd`.
  * Added a new internal `auth/cred` plugin - `auth/slurm`. This
    builds off the prior `auth/jwt` model, and permits operation
    of the `slurmdbd` and `slurmctld` without access to full
    directory information with a suitable configuration.
  * Added a new `--external-launcher` option to `srun`, which is
    automatically set by common MPI launcher implementations and
    ensures processes using those non-srun launchers have full
    access to all resources allocated on each node.
  * Reworked the dynamic/cloud modes of operation to allow for
    "fanout" - where Slurm communication can be automatically
    offloaded to compute nodes for increased cluster scalability.
  * Overhauled and extended the Reservation subsystem to allow
    for most of the same resource requirements as are placed on
    the job. Notably, this permits reservations to now reserve
    GRES directly.
- Details of changes:
  * Fix `scontrol update job=... TimeLimit+=/-=` when used with a
    raw JobId of job array element.

OBS-URL: https://build.opensuse.org/request/show/1141442
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=103
2024-01-25 17:41:05 +00:00
f98ecb23d5 - Remove last change. This is not how it is intended to work
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=287
2024-01-25 07:58:54 +00:00
a95f2355d0 Accepting request 1141020 from home:dimstar:Factory
- Fix dependency of testsuite when building without hdf5
  (have_hdf5=0). The previously used construct
  %{?have_hdf5:%ts_depends: does not behave as intended by the
  line author: %{?…:} does not test the value of the variable,
  only whether the variable is defined or undefined.
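
The distinction described above can be sketched in Python. This is a model of the macro semantics only, not RPM code: the `macros` dictionary and both function names are hypothetical, and the value-aware variant mirrors the common spec idiom `%if 0%{?have_hdf5}`, which is an assumption about a suitable replacement rather than the change actually made here.

```python
from typing import Dict


def expand_if_defined(macros: Dict[str, str], name: str, text: str) -> str:
    """Model of %{?name:text}: expands to `text` whenever `name` is
    defined, regardless of the value it holds."""
    return text if name in macros else ""


def is_enabled(macros: Dict[str, str], name: str) -> bool:
    """Model of `%if 0%{?name}`: prefix the (possibly empty) value with
    "0" and test the resulting integer. Assumes integer-valued macros."""
    return int("0" + macros.get(name, "")) != 0


# have_hdf5 is defined but set to 0: the first form still expands.
macros = {"have_hdf5": "0"}
dependency = expand_if_defined(macros, "have_hdf5", "hdf5-dependency")
enabled = is_enabled(macros, "have_hdf5")
```

Here `dependency` is non-empty even though the feature is switched off, which is the unintended behaviour; `enabled` is `False` because the value itself is tested.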

OBS-URL: https://build.opensuse.org/request/show/1141020
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=286
2024-01-24 14:43:56 +00:00
e59754da76 CVE-2023-49933, CVE-2023-49934, CVE-2023-49935, CVE-2023-49936
and CVE-2023-49937
  * Substantially overhauled the SlurmDBD association management
    code. For clusters updated to 23.11, account and user
    additions or removals are significantly faster than in prior
    releases.
  * Overhauled `scontrol reconfigure` to prevent configuration
    mistakes from disabling slurmctld and slurmd. Instead, an
    error will be returned, and the running configuration will
    persist. This does require updates to the systemd service
    files to use the `--systemd` option to `slurmctld` and `slurmd`.
  * Added a new internal `auth/cred` plugin - `auth/slurm`. This
    builds off the prior `auth/jwt` model, and permits operation
    of the `slurmdbd` and `slurmctld` without access to full
    directory information with a suitable configuration.
  * Added a new `--external-launcher` option to `srun`, which is
    automatically set by common MPI launcher implementations and
    ensures processes using those non-srun launchers have full
    access to all resources allocated on each node.
  * Reworked the dynamic/cloud modes of operation to allow for
    "fanout" - where Slurm communication can be automatically
    offloaded to compute nodes for increased cluster scalability.
  * Overhauled and extended the Reservation subsystem to allow
    for most of the same resource requirements as are placed on
    the job. Notably, this permits reservations to now reserve
    GRES directly.
  * Fix `scontrol update job=... TimeLimit+=/-=` when used with a
    raw JobId of job array element.
  * Reject `TimeLimit` increment/decrement when called on job with
    `TimeLimit=UNLIMITED`.

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=285
2024-01-22 16:26:43 +00:00
e7275730c8 Accepting request 1138332 from home:mslacken:branches:network:cluster
- Update to 23.11.1 with following major improvements and fixing
  CVE-2023-49933, CVE-2023-49934, CVE-2023-49935, CVE-2023-49936 and
  CVE-2023-49937
  * Substantially overhauled the SlurmDBD association management code. For
    clusters updated to 23.11, account and user additions or removals are
    significantly faster than in prior releases.
  * Overhauled 'scontrol reconfigure' to prevent configuration mistakes from
    disabling slurmctld and slurmd. Instead, an error will be returned, and the
    running configuration will persist. This does require updates to the
    systemd service files to use the --systemd option to slurmctld and slurmd.
  * Added a new internal auth/cred plugin - "auth/slurm". This builds off the
    prior auth/jwt model, and permits operation of the slurmdbd and slurmctld
    without access to full directory information with a suitable configuration.
  * Added a new --external-launcher option to srun, which is automatically set
    by common MPI launcher implementations and ensures processes using those
    non-srun launchers have full access to all resources allocated on each
    node.
  * Reworked the dynamic/cloud modes of operation to allow for "fanout" - where
    Slurm communication can be automatically offloaded to compute nodes for
    increased cluster scalability.
  * Added initial official Debian packaging support.
  * Overhauled and extended the Reservation subsystem to allow for most of the
    same resource requirements as are placed on the job. Notably, this permits
    reservations to now reserve GRES directly.
- Details of changes:
  * Fix scontrol update job=... TimeLimit+=/-= when used with a raw JobId of job
    array element.
  * Reject TimeLimit increment/decrement when called on job with
    TimeLimit=UNLIMITED.
  * Fix issue with requesting a job with  *licenses as well as

OBS-URL: https://build.opensuse.org/request/show/1138332
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=284
2024-01-22 15:21:33 +00:00
Dominique Leuenberger
1f813cb386 Accepting request 1137045 from network:cluster
- Update to 23.02.6 to fix (CVE-2023-49933 - bsc#1218046, CVE-2023-49935 -
  bsc#1218049, CVE-2023-49936 - bsc#1218050, CVE-2023-49937 - bsc#1218051,
  CVE-2023-49938 - bsc#1218053)
  * Security Fixes:
    + Add `JobAcctGatherParams=DisableGPUAcct` to disable gpu accounting.
    + `acct_gather_energy/ipmi` - Improve logging of DCMI issues.
    + `gpu/oneapi` - Add support for new env vars `ZE_FLAT_DEVICE_HIERARCHY`
      and `ZE_ENABLE_PCI_ID_DEVICE_ORDER`.
    + `data_parser/v0.0.39` - skip empty string when parsing QOS ids.
    + Remove error message from `assoc_mgr_update_assocs` when purposefully
      resetting the default QOS.
  * Bug Fixes:
    + `libslurm_nss` - Avoid causing glibc to assert due to an unexpected
      return from slurm_nss due to an error during lookup.
    + Fix job requests with `--tres-per-task` sometimes resulting in bad
      allocations that cannot run subsequent job steps.
    + Fix issue with `slurmd` where `srun` fails to be warned when a node
      prolog script runs beyond `MsgTimeout` set in `slurm.conf`.
    + `gres/shard` - Fix plugin functions to have matching parameter orders.
    + `gpu/nvml` - Fix issue that resulted in the wrong MIG devices being
      constrained to a job
    + `gpu/nvml` - Fix linking issue with MIGs that prevented multiple MIGs
      being used in a single job for certain MIG configurations
    + Fix file descriptor leak in slurmd when using `acct_gather_energy/ipmi`
      with DCMI devices.
    + `sview` - avoid crash when job has a node list string > 49 characters.
    + Prevent `slurmctld` crash during reconfigure when packing job start
      messages.
    + Preserve reason uid on reconfig.
    + Update node reason with updated `INVAL` state reason if different from (forwarded request 1136624 from eeich)

OBS-URL: https://build.opensuse.org/request/show/1137045
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=102
2024-01-05 20:45:15 +00:00
af603b8163 Accepting request 1136624 from home:eeich:branches:network:cluster
- Update to 23.02.6 to fix (CVE-2023-49933 - bsc#1218046, CVE-2023-49935 -
  bsc#1218049, CVE-2023-49936 - bsc#1218050, CVE-2023-49937 - bsc#1218051,
  CVE-2023-49938 - bsc#1218053)
  * Security Fixes:
    + Add `JobAcctGatherParams=DisableGPUAcct` to disable gpu accounting.
    + `acct_gather_energy/ipmi` - Improve logging of DCMI issues.
    + `gpu/oneapi` - Add support for new env vars `ZE_FLAT_DEVICE_HIERARCHY`
      and `ZE_ENABLE_PCI_ID_DEVICE_ORDER`.
    + `data_parser/v0.0.39` - skip empty string when parsing QOS ids.
    + Remove error message from `assoc_mgr_update_assocs` when purposefully
      resetting the default QOS.
  * Bug Fixes:
    + `libslurm_nss` - Avoid causing glibc to assert due to an unexpected
      return from slurm_nss due to an error during lookup.
    + Fix job requests with `--tres-per-task` sometimes resulting in bad
      allocations that cannot run subsequent job steps.
    + Fix issue with `slurmd` where `srun` fails to be warned when a node
      prolog script runs beyond `MsgTimeout` set in `slurm.conf`.
    + `gres/shard` - Fix plugin functions to have matching parameter orders.
    + `gpu/nvml` - Fix issue that resulted in the wrong MIG devices being
      constrained to a job.
    + `gpu/nvml` - Fix linking issue with MIGs that prevented multiple MIGs
      being used in a single job for certain MIG configurations.
    + Fix file descriptor leak in slurmd when using `acct_gather_energy/ipmi`
      with DCMI devices.
    + `sview` - avoid crash when job has a node list string > 49 characters.
    + Prevent `slurmctld` crash during reconfigure when packing job start
      messages.
    + Preserve reason uid on reconfig.
    + Update node reason with updated `INVAL` state reason if different from

OBS-URL: https://build.opensuse.org/request/show/1136624
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=282
2024-01-05 12:29:13 +00:00
Ana Guerrero
0db8ed8d95 Accepting request 1130097 from network:cluster
- Add missing service file for slurmrestd (boo#1217711). (forwarded request 1130096 from eeich)

OBS-URL: https://build.opensuse.org/request/show/1130097
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=101
2023-12-04 21:59:28 +00:00
bbe01bb79f Accepting request 1130096 from home:eeich:branches:network:cluster
- Add missing service file for slurmrestd (boo#1217711).

OBS-URL: https://build.opensuse.org/request/show/1130096
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=280
2023-11-30 19:27:08 +00:00
5a1d72f62c Accepting request 1129638 from home:eeich:branches:network:cluster
- Explicitly create an Obsoletes: entry for each package version
  that is obsoleted by the present version. These are all published
  versions of the last two major releases as well as all minor
  versions of the present release lower than the current one
  (bsc#1216869 2nd part).
  This prevents the current version from upgrading an old Slurm version
  for which no upgrade path exists.
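
The selection rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names, the sample version list, and the `slurm` package name in the printed lines are hypothetical and are not taken from the actual spec file.

```python
def parse(version):
    """Split 'YY.MM.PATCH' (e.g. '23.02.6') into ((YY, MM), PATCH)."""
    year, month, patch = version.split(".")
    return (int(year), int(month)), int(patch)


def obsoleted_versions(published, current, previous_majors=2):
    """Return the published versions that `current` would obsolete.

    Assumed rule, paraphrasing the entry above: every published version of
    the last `previous_majors` major releases, plus every version of the
    current major release that is lower than `current`.
    """
    cur_major, cur_patch = parse(current)
    # Distinct major releases older than the current one, newest first.
    older = sorted(
        {parse(v)[0] for v in published if parse(v)[0] < cur_major},
        reverse=True,
    )
    allowed = set(older[:previous_majors])
    result = []
    for v in published:
        major, patch = parse(v)
        if major in allowed or (major == cur_major and patch < cur_patch):
            result.append(v)
    return sorted(result, key=parse)


# Hypothetical list of published versions, for illustration only.
sample = ["20.02.7", "20.11.9", "22.05.10", "23.02.5", "23.02.6"]
for v in obsoleted_versions(sample, "23.02.6"):
    print(f"Obsoletes: slurm = {v}")
```

With this sample, the oldest major release (20.02) falls outside the two-release window and is therefore not obsoleted, which mirrors the intent of refusing upgrades for which no path exists.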

OBS-URL: https://build.opensuse.org/request/show/1129638
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=279
2023-11-28 18:02:52 +00:00
Ana Guerrero
1e8971e87a Accepting request 1129192 from network:cluster
Automatic submission by obs-autosubmit

OBS-URL: https://build.opensuse.org/request/show/1129192
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=100
2023-11-27 21:44:42 +00:00
db15cbcf3e - On SLE-12 exclude build for s390x.
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=277
2023-11-20 15:31:39 +00:00
Ana Guerrero
ccb26326c7 Accepting request 1123596 from network:cluster
- Add missing dependencies on slurm-config to the plugins package.
  These should help to tie down the slurm version and help to avoid
  a package mix (bsc#1216869). (forwarded request 1123595 from eeich)

OBS-URL: https://build.opensuse.org/request/show/1123596
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=99
2023-11-06 20:14:38 +00:00
961668403a Accepting request 1123595 from home:eeich:branches:network:cluster
- Add missing dependencies on slurm-config to the plugins package.
  These should help to tie down the slurm version and help to avoid
  a package mix (bsc#1216869).

OBS-URL: https://build.opensuse.org/request/show/1123595
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=275
2023-11-06 14:56:24 +00:00
Dominique Leuenberger
b28d182fe8 Accepting request 1121548 from network:cluster
Automatic submission by obs-autosubmit

OBS-URL: https://build.opensuse.org/request/show/1121548
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=98
2023-11-01 21:09:57 +00:00
c9c235c313 Format fix to changes file:
`GET /slurmdb/v0.0.39/associations` and `GET /slurmdb/v0.0.39/qos` to

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=273
2023-10-25 07:12:31 +00:00
Ana Guerrero
150d433676 Accepting request 1118220 from network:cluster
- update to 23.02.6 to fix (CVE-2023-41914, bsc#1216207)

OBS-URL: https://build.opensuse.org/request/show/1118220
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=97
2023-10-17 18:24:48 +00:00
37c34593a9 - update to 23.02.6 to fix (CVE-2023-41914, bsc#1216207)
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=271
2023-10-17 08:09:39 +00:00
Ana Guerrero
f946358d8c Accepting request 1117163 from network:cluster
- update to 23.02.6 to fix (CVE-2023-41914)
  * Removed Fix-test-32.8.patch as fixed upstream
  * Bug Fixes:
    + Fix `CpusPerTres=` not updatable with scontrol update.
    + Fix unintentional gres removal when validating the gres job state.
    + Fix `--without-hpe-slingshot` configure option.
    + Fix cgroup v2 memory calculations when transparent huge pages are used.
    + Fix parsing of `sgather --timeout` option.
    + Fix regression from 22.05.0 that caused `srun --cpu-bind "=verbose"`
      and `"=v"` options to give different CPU bind masks.
    + Fix "_find_node_record: lookup failure for node" error message appearing
      for all dynamic nodes during reconfigure.
    + Avoid segfault if loading serializer plugin fails.
    + `slurmrestd` - Correct OpenAPI format for `GET /slurm/v0.0.39/licenses`.
    + `slurmrestd` - Correct OpenAPI format for
      `GET /slurm/v0.0.39/job/{job_id}`.
    + `slurmrestd` - Change format to multiple fields in
      `GET /slurmdb/v0.0.39/associations` and `GET /slurmdb/v0.0.39/qos` to
      handle infinite and unset states.
    + When a node fails in a job with `--no-kill`, preserve the extern step on the
      remaining nodes to avoid breaking features that rely on the extern step
      such as `pam_slurm_adopt`, `x11`, and `job_container/tmpfs`.
    + `auth/jwt` - Ignore `x5c` field in JWKS files.
    + `auth/jwt` - Treat 'alg' field as optional in JWKS files.
    + Allow job_desc.selinux_context to be read from the job_submit.lua script.
    + Skip check in slurmstepd that causes a large number of errors in the
      munge log: "Unauthorized credential for client UID=0 GID=0".
      This error will still appear on `slurmd`/`slurmctld`/`slurmdbd` start up
      and is not a cause for concern.
    + `slurmctld` - Allow startup with zero partitions.

OBS-URL: https://build.opensuse.org/request/show/1117163
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=96
2023-10-12 21:41:42 +00:00
449ea49bf9 - Fix changes file formatting
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=269
2023-10-12 10:02:10 +00:00
cd2c5bfc50 Accepting request 1117145 from home:mslacken:branches:network:cluster
* Bug Fixes:
   + Fix CpusPerTres= not updatable with scontrol update.
   + Fix unintentional gres removal when validating the gres job state.
   + Fix --without-hpe-slingshot configure option.
   + Fix cgroup v2 memory calculations when transparent huge pages are used.
   + Fix parsing of sgather --timeout option.
   + Fix regression from 22.05.0 that caused srun --cpu-bind "=verbose" and "=v"
     options to give different CPU bind masks.
   + Fix "_find_node_record: lookup failure for node" error message appearing
     for all dynamic nodes during reconfigure.
   + Avoid segfault if loading serializer plugin fails.
   + slurmrestd - Correct OpenAPI format for 'GET /slurm/v0.0.39/licenses'.
   + slurmrestd - Correct OpenAPI format for 'GET /slurm/v0.0.39/job/{job_id}'.
   + slurmrestd - Change format to multiple fields in 'GET
     /slurmdb/v0.0.39/associations' and 'GET /slurmdb/v0.0.39/qos' to handle
     infinite and unset states.
   + When a node fails in a job with --no-kill, preserve the extern step on the
     remaining nodes to avoid breaking features that rely on the extern step
     such as pam_slurm_adopt, x11, and job_container/tmpfs.
   + auth/jwt - Ignore 'x5c' field in JWKS files.
   + auth/jwt - Treat 'alg' field as optional in JWKS files.
   + Allow job_desc.selinux_context to be read from the job_submit.lua script.
   + Skip check in slurmstepd that causes a large number of errors in the munge
     log: "Unauthorized credential for client UID=0 GID=0".  This error will
     still appear on slurmd/slurmctld/slurmdbd start up and is not a cause for
     concern.
   + slurmctld - Allow startup with zero partitions.
   + Fix some mig profile names in slurm not matching nvidia mig profiles.
   + Prevent slurmscriptd processing delays from blocking other threads in
     slurmctld while trying to launch {Prolog|Epilog}Slurmctld.

OBS-URL: https://build.opensuse.org/request/show/1117145
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=268
2023-10-12 09:09:32 +00:00
90bba6a8aa Accepting request 1117137 from home:mslacken:branches:network:cluster
- update to 23.02.6 to fix (CVE-2023-41914) 
  * Removed Fix-test-32.8.patch as fixed upstream

OBS-URL: https://build.opensuse.org/request/show/1117137
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=267
2023-10-12 08:49:44 +00:00
Dominique Leuenberger
12bf38b1d0 Accepting request 1111943 from network:cluster
- Updated to version 23.02.5 with the following changes:
  * Bug Fixes:
    + Revert a change in 23.02 where `SLURM_NTASKS` was no longer set in the
      job's environment when `--ntasks-per-node` was requested.
      The method by which it is set, however, is different and should be more
      accurate in more situations.
    + Change pmi2 plugin to honor the `SrunPortRange` option. This matches the
      new behavior of the pmix plugin in 23.02.0. Note that neither of these
      plugins makes use of the `MpiParams=ports=` option, and previously
      were only limited by the system's ephemeral port range.
    + Fix regression in 23.02.2 that caused slurmctld -R to crash on startup if
      a node features plugin is configured.
    + Fix and prevent reoccurring reservations from overlapping.
    + `job_container/tmpfs` - Avoid attempts to share BasePath between nodes.
    + With `CR_Cpu_Memory`, fix node selection for jobs that request gres and
      `--mem-per-cpu`.
    + Fix a regression from 22.05.7 in which some jobs were allocated too few
      nodes, thus overcommitting cpus to some tasks.
    + Fix a job being stuck in the completing state if the job ends while the
      primary controller is down or unresponsive and the backup controller has
      not yet taken over.
    + Fix `slurmctld` segfault when a node registers with a configured
      `CpuSpecList` while `slurmctld` configuration has the node without
      `CpuSpecList`.
    + Fix cloud nodes getting stuck in `POWERED_DOWN+NO_RESPOND` state after
      not registering by `ResumeTimeout`.
    + `slurmstepd` - Avoid cleanup of `config.json-less` containers spooldir
      getting skipped.
    + Fix scontrol segfault when 'completing' command requested repeatedly in
      interactive mode.

OBS-URL: https://build.opensuse.org/request/show/1111943
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=95
2023-09-20 11:26:46 +00:00
f0b994e220 plugins makes use of the MpiParams=ports= option, and previously
features with the `|` operator, which could prevent jobs from
    + `node_features/helpers` - Fix inconsistent handling of `&` and `|`,
      instead of just the current set. E.g. `foo|bar&baz` was interpreted
      `{foo} or {bar,baz}`.
      tasks fewer than GPUs, which resulted in incorrectly rejecting these
      jobs.
    + `slurmrestd` - For `GET /slurm/v0.0.39/node[s]`, change format of
      node's energy field `current_watts` to a dictionary to account for
    + `slurmrestd` - For `GET /slurm/v0.0.39/qos`, change format of QOS's
    + slurmrestd - For `GET /slurm/v0.0.39/job[s]`, the 'return code'
      `GET /slurmdb/v0.0.39/jobs` from slurmrestd.
      were present in the log: `error: Attempt to change gres/gpu Count`.
    + Hold the job with `(Reservation ... invalid)` state reason if the

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=265
2023-09-18 05:43:58 +00:00
74529b6cc2 - Updated to version 23.02.5 with the following changes:
* Bug Fixes:
    + Revert a change in 23.02 where `SLURM_NTASKS` was no longer set in the
      job's environment when `--ntasks-per-node` was requested.
      The method by which it is set, however, is different and should be more
      accurate in more situations.
    + Change pmi2 plugin to honor the `SrunPortRange` option. This matches the
      new behavior of the pmix plugin in 23.02.0. Note that neither of these
      plugins makes use of the "`MpiParams=ports=`" option, and previously
      were only limited by the system's ephemeral port range.
    + Fix regression in 23.02.2 that caused slurmctld -R to crash on startup if
      a node features plugin is configured.
    + Fix and prevent reoccurring reservations from overlapping.
    + `job_container/tmpfs` - Avoid attempts to share BasePath between nodes.
    + With `CR_Cpu_Memory`, fix node selection for jobs that request gres and
      `--mem-per-cpu`.
    + Fix a regression from 22.05.7 in which some jobs were allocated too few
      nodes, thus overcommitting cpus to some tasks.
    + Fix a job being stuck in the completing state if the job ends while the
      primary controller is down or unresponsive and the backup controller has
      not yet taken over.
    + Fix `slurmctld` segfault when a node registers with a configured
      `CpuSpecList` while `slurmctld` configuration has the node without
      `CpuSpecList`.
    + Fix cloud nodes getting stuck in `POWERED_DOWN+NO_RESPOND` state after
      not registering by `ResumeTimeout`.
    + `slurmstepd` - Avoid cleanup of `config.json-less` containers spooldir
      getting skipped.
    + Fix scontrol segfault when 'completing' command requested repeatedly in
      interactive mode.

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=264
2023-09-18 05:24:51 +00:00
Ana Guerrero
3825e9fab0 Accepting request 1110422 from network:cluster
- Create a macro for upgrade dependency to ensure uniform handling. (forwarded request 1110421 from eeich)

OBS-URL: https://build.opensuse.org/request/show/1110422
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=94
2023-09-12 19:02:53 +00:00
a323feff42 Accepting request 1110421 from home:eeich:branches:network:cluster
- Create a macro for upgrade dependency to ensure uniform handling.

OBS-URL: https://build.opensuse.org/request/show/1110421
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=262
2023-09-12 04:52:56 +00:00
Ana Guerrero
3bcde4bfd9 Accepting request 1110259 from network:cluster
- Updated to 23.02.4 with the following changes:
  * Bug Fixes:
    + Fix main scheduler loop not starting after a failover to backup
      controller. Avoid slurmctld segfault when specifying
     `AccountingStorageExternalHost` (bsc#1214983).
    + Fix sbatch return code when `--wait` is requested on a job array.
    + Fix collected `GPUUtilization` values for `acct_gather_profile` plugins.
    + Fix `slurmrestd` handling of job hold/release operations.
    + Fix step running indefinitely when slurmctld takes more than
      `MessageTimeout` to respond. Now, `slurmctld` will cancel the step when
       detected, preventing following steps from getting stuck waiting for
       resources to be released.
    + Fix regression to make `job_desc.min_cpus` accurate again in `job_submit`
      when requesting a job with `--ntasks-per-node`.
    + Fix handling of `ArrayTaskThrottle` in backfill.
    + Fix regression in 23.02.2 when checking gres state on `slurmctld`
      startup or reconfigure. Gres changes in the configuration were not
      updated on slurmctld startup. On startup or reconfigure, these messages
      were present in the log: `error: Attempt to change gres/gpu Count`.
    + Fix potential double count of gres when dealing with limits.
    + Fix `slurmstepd` segfault when `ContainerPath` is not set in `oci.conf`
    + Fixed an issue where jobs requesting licenses were incorrectly rejected.
    + `scrontab` - Fix cutting off the final character of quoted variables.
    + `smail` - Fix issues where e-mails at job completion were not being sent.
    + `scontrol/slurmctld` - fix comma parsing when updating a reservation's
       nodes.
    + Fix `--gpu-bind=single` binding tasks to wrong gpus, leading to some gpus
      having more tasks than they should and other gpus being unused.
    + Fix regression in 23.02 that causes slurmstepd to crash when `srun`
      requests more than `TreeWidth` nodes in a step and uses the pmi2 or

OBS-URL: https://build.opensuse.org/request/show/1110259
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=93
2023-09-11 19:22:19 +00:00
f9646ba945 - Updated to 23.02.4 with the following changes:
* Bug Fixes:
    + Fix main scheduler loop not starting after a failover to backup
      controller. Avoid slurmctld segfault when specifying
     `AccountingStorageExternalHost` (bsc#1214983).
    + Fix sbatch return code when `--wait` is requested on a job array.
    + Fix collected `GPUUtilization` values for `acct_gather_profile` plugins.
    + Fix `slurmrestd` handling of job hold/release operations.
    + Fix step running indefinitely when slurmctld takes more than
      `MessageTimeout` to respond. Now, `slurmctld` will cancel the step when
       detected, preventing following steps from getting stuck waiting for
       resources to be released.
    + Fix regression to make `job_desc.min_cpus` accurate again in `job_submit`
      when requesting a job with `--ntasks-per-node`.
    + Fix handling of `ArrayTaskThrottle` in backfill.
    + Fix regression in 23.02.2 when checking gres state on `slurmctld`
      startup or reconfigure. Gres changes in the configuration were not
      updated on slurmctld startup. On startup or reconfigure, these messages
      were present in the log: `error: Attempt to change gres/gpu Count`.
    + Fix potential double count of gres when dealing with limits.
    + Fix `slurmstepd` segfault when `ContainerPath` is not set in `oci.conf`
    + Fixed an issue where jobs requesting licenses were incorrectly rejected.
    + `scrontab` - Fix cutting off the final character of quoted variables.
    + `smail` - Fix issues where e-mails at job completion were not being sent.
    + `scontrol/slurmctld` - fix comma parsing when updating a reservation's
       nodes.
    + Fix `--gpu-bind=single` binding tasks to wrong gpus, leading to some gpus
      having more tasks than they should and other gpus being unused.
    + Fix regression in 23.02 that causes slurmstepd to crash when `srun`
      requests more than `TreeWidth` nodes in a step and uses the pmi2 or

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=260
2023-09-11 07:21:32 +00:00
Ana Guerrero
6b47182efe Accepting request 1109308 from network:cluster
- Fixes since 23.02.03:
  Highlights:
  * Fix main scheduler loop not starting after a failover to backup controller.
  * Avoid slurmctld segfault when specifying `AccountingStorageExternalHost`
    (bsc#1214983).
  Other:
  * Fix sbatch return code when `--wait` is requested on a job array.
  * Fix collected `GPUUtilization` values for `acct_gather_profile` plugins.
  * Fix `slurmrestd` handling of job hold/release operations.
  * Make spank `S_JOB_ARGV` item value hold the requested command `argv`
    instead of the `srun --bcast` value when `--bcast` requested (only in local
    context).
  * Fix step running indefinitely when slurmctld takes more than
    `MessageTimeout` to respond. Now, slurmctld will cancel the step when
    detected, preventing following steps from getting stuck waiting for
    resources to be released.
  * Fix regression to make `job_desc.min_cpus` accurate again in job_submit when
    requesting a job with `--ntasks-per-node`.
  * Fix handling of `ArrayTaskThrottle` in backfill.
  * Fix regression in 23.02.2 when checking gres state on `slurmctld` startup or
    reconfigure. Gres changes in the configuration were not updated on slurmctld
    startup. On startup or reconfigure, these messages were present in the log:
    `error: Attempt to change gres/gpu Count`.
  * Fix potential double count of gres when dealing with limits.
  * Fix slurmstepd segfault when ContainerPath is not set in `oci.conf`
  * Fixed an issue where jobs requesting licenses were incorrectly rejected.
  * `scrontab` - Fix cutting off the final character of quoted variables.
  * `smail` - Fix issues where e-mails at job completion were not being sent.
  * `scontrol/slurmctld` - fix comma parsing when updating a reservation's
    nodes.

OBS-URL: https://build.opensuse.org/request/show/1109308
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=92
2023-09-07 19:12:41 +00:00
c63b605916 - Fixes since 23.02.03:
Highlights:
  * Fix main scheduler loop not starting after a failover to backup controller.
  * Avoid slurmctld segfault when specifying `AccountingStorageExternalHost`
    (bsc#1214983).
  Other:
  * Fix sbatch return code when `--wait` is requested on a job array.
  * Fix collected `GPUUtilization` values for `acct_gather_profile` plugins.
  * Fix `slurmrestd` handling of job hold/release operations.
  * Make spank `S_JOB_ARGV` item value hold the requested command `argv`
    instead of the `srun --bcast` value when `--bcast` requested (only in local
    context).
  * Fix step running indefinitely when slurmctld takes more than
    `MessageTimeout` to respond. Now, slurmctld will cancel the step when
    detected, preventing following steps from getting stuck waiting for
    resources to be released.
  * Fix regression to make `job_desc.min_cpus` accurate again in job_submit when
    requesting a job with `--ntasks-per-node`.
  * Fix handling of `ArrayTaskThrottle` in backfill.
  * Fix regression in 23.02.2 when checking gres state on `slurmctld` startup or
    reconfigure. Gres changes in the configuration were not updated on slurmctld
    startup. On startup or reconfigure, these messages were present in the log:
    `error: Attempt to change gres/gpu Count`.
  * Fix potential double count of gres when dealing with limits.
  * Fix slurmstepd segfault when ContainerPath is not set in `oci.conf`
  * Fixed an issue where jobs requesting licenses were incorrectly rejected.
  * `scrontab` - Fix cutting off the final character of quoted variables.
  * `smail` - Fix issues where e-mails at job completion were not being sent.
  * `scontrol/slurmctld` - fix comma parsing when updating a reservation's
    nodes.

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=258
2023-09-06 17:11:37 +00:00
Ana Guerrero
51bec69223 Accepting request 1109029 from network:cluster
- updated to 23.02.04, which includes the following changes:
  * Fixes for the main scheduler loop not starting on the backup controller
    after a failover event, a segfault when attempting to use
    AccountingStorageExternalHost, and an issue where steps could continue
    running indefinitely if the slurmctld takes too long to respond
    (bsc#1214983).
  * Includes a fix for potential slurmctld crashes when the backup slurmctld
    takes over.
  * This also fixes some issues when using older versions of the command line
    tools with a 23.02 controller.
  * srun/sbatch/salloc - In order to support user namespaces, process user and
    group ids are no longer used unless explicitly requested as an argument and
    are left as nobody(99) by default. Any cli_filters or SPANK plugins need to
    ignore any uid or gid that equals SLURM_AUTH_NOBODY (99). User and group ids
    are now resolved by the active auth plugin. To determine the actual job uid
    or gid you should use the RESPONSE_RESOURCE_ALLOCATION RPC.
- removed Fix-test-3.13.patch as fixed upstream
- removed Fix-test-38.11.patch as test changed upstream (forwarded request 1109009 from mslacken)

OBS-URL: https://build.opensuse.org/request/show/1109029
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=91
2023-09-06 16:57:11 +00:00
47d665607b Accepting request 1109009 from home:mslacken:branches:network:cluster
- updated to 23.02.04, which includes the following changes:
  * Fixes for the main scheduler loop not starting on the backup controller
    after a failover event, a segfault when attempting to use
    AccountingStorageExternalHost, and an issue where steps could continue
    running indefinitely if the slurmctld takes too long to respond
    (bsc#1214983).
  * Includes a fix for potential slurmctld crashes when the backup slurmctld
    takes over.
  * This also fixes some issues when using older versions of the command line
    tools with a 23.02 controller.
  * srun/sbatch/salloc - In order to support user namespaces, process user and
    group ids are no longer used unless explicitly requested as an argument and
    are left as nobody(99) by default. Any cli_filters or SPANK plugins need to
    ignore any uid or gid that equals SLURM_AUTH_NOBODY (99). User and group ids
    are now resolved by the active auth plugin. To determine the actual job uid
    or gid you should use the RESPONSE_RESOURCE_ALLOCATION RPC.
- removed Fix-test-3.13.patch as fixed upstream
- removed Fix-test-38.11.patch as test changed upstream
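
As a rough illustration of the uid/gid guidance above, a filter might treat the placeholder id as "unknown" and fall back to an id obtained elsewhere. The sketch below is plain Python and does not use the cli_filter or SPANK interfaces; the helper name `effective_uid` and its `allocation_uid` parameter are hypothetical, and only the value 99 comes from the entry above.

```python
SLURM_AUTH_NOBODY = 99  # placeholder id quoted in the changelog entry


def effective_uid(reported_uid, allocation_uid=None):
    """Return a uid that is safe to act on.

    `reported_uid` is whatever the client-side code was handed.
    `allocation_uid` stands in for a uid learned later, for example from
    the allocation response (an assumption made for this sketch).
    """
    if reported_uid != SLURM_AUTH_NOBODY:
        return reported_uid
    # The reported id is only the placeholder: use the resolved id if one
    # is available, otherwise signal "unknown" with None.
    return allocation_uid
```

Returning `None` rather than 99 keeps callers from mistaking the placeholder for a real account.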

OBS-URL: https://build.opensuse.org/request/show/1109009
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=256
2023-09-05 11:47:06 +00:00
Dominique Leuenberger
03d2eefa9e Accepting request 1085677 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/1085677
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=90
2023-05-09 11:09:16 +00:00
532aa1e96d Accepting request 1085668 from home:mslacken:branches:network:cluster
- updated to 23.02.02 which includes a number of fixes to Slurm stability
  * Includes a fix for a regression in 23.02 that caused openmpi mpirun to fail
    to launch tasks. 
  * It also includes two functional changes: Don't update the cron job tasks if
    the whole crontab file is left untouched after opening it with scrontab -e
  * Sort dynamic nodes and include them in topology after scontrol reconfigure
    or a slurmctld restart.

OBS-URL: https://build.opensuse.org/request/show/1085668
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=254
2023-05-09 10:35:16 +00:00
Dominique Leuenberger
0d5e08df4b Accepting request 1083466 from network:cluster
- Web-configurator: changed presets to SUSE defaults.
- If %_restart_on_update is no longer defined replace by own
  macro.
- Marked slurm-openlava, slurm-seff and slurm-sjstat noarch.
- rpmlint:
  * dropped some rpmlint filters which are no longer relevant.
  * added/refreshed filters. For Details, see rpmlintrc.
- Remove workaround to fix the restart issue in a Slurm package
  described in bsc#1088693.
  The Slurm version in this package was 16.05. Any attempt to
  directly migrate to the current version is bound to fail
  anyway.
- Now require slurm-munge if munge authentication is installed.

OBS-URL: https://build.opensuse.org/request/show/1083466
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=89
2023-04-28 14:23:13 +00:00
33bf8791ac - Require slurm-munge if munge authentication is installed.
- Replace 'Require: config(pam)' by 'Require: pam'.

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=252
2023-04-28 07:46:44 +00:00
392bec3223 Accepting request 1082770 from home:eeich:branches:network:cluster
- Web-configurator: changed presets to SUSE defaults.
- If %_restart_on_update is no longer defined replace by own
  macro.
- Marked slurm-openlava, slurm-seff and slurm-sjstat noarch.
- rpmlint:
  * dropped some rpmlint filters which are no longer relevant.
  * added/refreshed filters. For Details, see rpmlintrc.
- Remove workaround to fix the restart issue in a Slurm package
  described in bsc#1088693.
  The Slurm version in this package was 16.05. Any attempt to
  directly migrate to the current version is bound to fail
  anyway.

OBS-URL: https://build.opensuse.org/request/show/1082770
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=251
2023-04-27 13:24:37 +00:00
Dominique Leuenberger
e27e58c1b6 Accepting request 1076522 from network:cluster
- updated to 23.02.1 with the following changes:
  * job_container/tmpfs - cleanup job container even if namespace mount is
    already unmounted.
  * openapi/dbv0.0.38 - Fix not displaying an error when updating QOS or
    associations fails.
  * Fix nodes remaining as PLANNED after slurmctld save state recovery.
  * Add cgroup.conf EnableControllers option for cgroup/v2.
  * Get correct cgroup root to allow slurmd to run in containers like Docker.
  * slurmctld - add missing PrivateData=jobs check to step ContainerID lookup
    requests originated from 'scontrol show step container-id=<id>' or certain
    scrun operations when container state can't be directly queried.
  * Fix nodes un-draining after being drained due to unkillable step.
  * Fix remote licenses allowed percentages reset to 0 during upgrade.
  * sacct - Avoid truncating time strings when using SLURM_TIME_FORMAT with
    the --parsable option.
  * Fix regression in 22.05.0rc1 that broke Nodes=ALL in a NodeSet.
  * openapi/v0.0.39 - fix jobs submitted via slurmrestd being allocated fewer
    CPUs than tasks when requesting multiple tasks.
  * Fix job not being scheduled on valid nodes and potentially being rejected
    when using parentheses at the beginning of square brackets in a feature
    request, for example: "feat1&[(feat2|feat3)]".
  * Fix regression in 23.02.0rc1 which made --gres-flags=enforce-binding no
    longer enforce optimal core-gpu job placement.
  * mpi/pmix - Fix v5 to load correctly when libpmix.so isn't in the normal
    lib path.
  * data_parser/v0.0.39 - fix regression where "memory_per_node" would be
    rejected for job submission.
  * data_parser/v0.0.39 - fix regression where "memory_per_cpu" would be
    rejected for job submission.
  * slurmctld - add an assert to check for magic number presence before deleting

OBS-URL: https://build.opensuse.org/request/show/1076522
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=88
2023-04-01 17:32:20 +00:00
5a68fc8e5f - updated to 23.02.1 with the following changes:
- removed right-pmix-path.patch as fixed upstream

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=249
2023-03-31 15:48:27 +00:00
d2a2e0a1e8 Accepting request 1076461 from home:mslacken:branches:network:cluster
- updated to 23.02.1 with the following changes:
  * job_container/tmpfs - cleanup job container even if namespace mount is
    already unmounted.
  * openapi/dbv0.0.38 - Fix not displaying an error when updating QOS or
    associations fails.
  * Fix nodes remaining as PLANNED after slurmctld save state recovery.
  * Add cgroup.conf EnableControllers option for cgroup/v2.
  * Get correct cgroup root to allow slurmd to run in containers like Docker.
  * slurmctld - add missing PrivateData=jobs check to step ContainerID lookup
    requests originated from 'scontrol show step container-id=<id>' or certain
    scrun operations when container state can't be directly queried.
  * Fix nodes un-draining after being drained due to unkillable step.
  * Fix remote licenses allowed percentages reset to 0 during upgrade.
  * sacct - Avoid truncating time strings when using SLURM_TIME_FORMAT with
    the --parsable option.
  * Fix regression in 22.05.0rc1 that broke Nodes=ALL in a NodeSet.
  * openapi/v0.0.39 - fix jobs submitted via slurmrestd being allocated fewer
    CPUs than tasks when requesting multiple tasks.
  * Fix job not being scheduled on valid nodes and potentially being rejected
    when using parentheses at the beginning of square brackets in a feature
    request, for example: "feat1&[(feat2|feat3)]".
  * Fix regression in 23.02.0rc1 which made --gres-flags=enforce-binding no
    longer enforce optimal core-gpu job placement.
  * mpi/pmix - Fix v5 to load correctly when libpmix.so isn't in the normal
    lib path.
  * data_parser/v0.0.39 - fix regression where "memory_per_node" would be
    rejected for job submission.
  * data_parser/v0.0.39 - fix regression where "memory_per_cpu" would be
    rejected for job submission.
  * slurmctld - add an assert to check for magic number presence before deleting

OBS-URL: https://build.opensuse.org/request/show/1076461
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=248
2023-03-31 15:44:08 +00:00
Dominique Leuenberger
c7d67ed696 Accepting request 1072592 from network:cluster
added: right-pmix-path.patch (forwarded request 1072591 from mslacken)

OBS-URL: https://build.opensuse.org/request/show/1072592
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=87
2023-03-17 16:05:03 +00:00
5c3d4865a1 Accepting request 1072591 from home:mslacken:branches:network:cluster
added: right-pmix-path.patch

OBS-URL: https://build.opensuse.org/request/show/1072591
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=246
2023-03-17 10:52:44 +00:00
9883ad6d58 Accepting request 1072585 from home:mslacken:branches:network:cluster
- use libpmix.so.2 instead of libpmix.so to fix bsc#1209260;
  this removes the need for pmix-pluginlib

OBS-URL: https://build.opensuse.org/request/show/1072585
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=245
2023-03-17 10:42:09 +00:00
Dominique Leuenberger
2de2dcca49 Accepting request 1072087 from network:cluster
- slurm-plugins needs to require pmix-pluginlib (bsc#1209260) (forwarded request 1072084 from mslacken)

OBS-URL: https://build.opensuse.org/request/show/1072087
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=86
2023-03-15 17:56:12 +00:00
521f372d87 Accepting request 1072084 from home:mslacken:branches:network:cluster
- slurm-plugins needs to require pmix-pluginlib (bsc#1209260)

OBS-URL: https://build.opensuse.org/request/show/1072084
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=243
2023-03-15 10:57:09 +00:00
Dominique Leuenberger
c224ea00c3 Accepting request 1070214 from network:cluster
- Fixing dependencies for slurm-plugin-ext-sensors-rrd again. (forwarded request 1070212 from eeich)

OBS-URL: https://build.opensuse.org/request/show/1070214
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=85
2023-03-09 16:45:23 +00:00
e85b508441 Accepting request 1070212 from home:eeich:branches:network:cluster
- Fixing dependencies for slurm-plugin-ext-sensors-rrd again.

OBS-URL: https://build.opensuse.org/request/show/1070212
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=241
2023-03-08 15:43:28 +00:00
86940cb8c4 Accepting request 1070094 from home:eeich:branches:network:cluster
- Fix conflicts for plugin-ext-sensors-rrd

OBS-URL: https://build.opensuse.org/request/show/1070094
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=240
2023-03-08 07:58:58 +00:00
0f04c66747 Accepting request 1070043 from home:eeich:branches:network:cluster
- Fixup previous submission.

OBS-URL: https://build.opensuse.org/request/show/1070043
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=239
2023-03-07 22:14:15 +00:00
da464bfaae Accepting request 1070038 from home:eeich:branches:network:cluster
- Stop pulling firewall rules from github. There is no benefit in
  hosting these separately.
- Remove pre-sle12 pieces.

- Add missing Provides:, Conflicts: and Obsoletes: to slurm-cray,
  slurm-hdf5 and slurm-testsuite to avoid package conflicts.
- Unify Obsoletes:.
- Consolidate spec files between different Slurm releases in
  Leap/SLE maintenance.

OBS-URL: https://build.opensuse.org/request/show/1070038
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=238
2023-03-07 21:33:03 +00:00
Dominique Leuenberger
50b2b76a05 Accepting request 1068523 from network:cluster
- Add missing Provides: and Obsoletes: to slurm-cray, slurm-hdf5
  and slurm-testsuite to avoid package conflicts.
- Add dependency for the general plugin package to the
  AcctGatherProfile HDF5 plugin.
- Adjust node RealMemory in slurm.conf of test suite for 8G test
  nodes. (forwarded request 1068522 from eeich)

OBS-URL: https://build.opensuse.org/request/show/1068523
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=84
2023-03-02 22:03:34 +00:00
6997bacde0 Accepting request 1068522 from home:eeich:branches:network:cluster
- Add missing Provides: and Obsoletes: to slurm-cray, slurm-hdf5
  and slurm-testsuite to avoid package conflicts.
- Add dependency for the general plugin package to the
  AcctGatherProfile HDF5 plugin.
- Adjust node RealMemory in slurm.conf of test suite for 8G test
  nodes.

OBS-URL: https://build.opensuse.org/request/show/1068522
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=236
2023-03-01 17:58:54 +00:00
Dominique Leuenberger
8a8f7dcb78 Accepting request 1068320 from network:cluster
- updated to 23.02.0
  * Highlights
    + slurmctld - Add new RPC rate limiting feature. This is enabled through
      SlurmctldParameters=rl_enable, otherwise disabled by default.
    + Make scontrol reconfigure and sending a SIGHUP to the slurmctld behave
      the same. If you were using SIGHUP as a 'lighter' scontrol reconfigure
      to rotate logs please update your scripts to use SIGUSR2 instead.
    + Change cloud nodes to show by default. PrivateData=cloud is no longer
      needed.
    + sreport - Count planned (FKA reserved) time for jobs running in
      IGNORE_JOBS reservations. Previously was lumped into IDLE time.
    + job_container/tmpfs - Support running with an arbitrary list of private
      mount points (/tmp and /dev/shm are the default, but not required).
    + job_container/tmpfs - Set more environment variables in InitScript.
    + Make all cgroup directories created by Slurm owned by root. This was the
      behavior in cgroup/v2 but not in cgroup/v1, where by default the
      ownership of the step directories was set to the user and group of the job.
    + accounting_storage/mysql - change purge/archive to calculate record ages
      based on end time, rather than start or submission times.
    + job_submit/lua - add support for log_user() from slurm_job_modify().
    + Run the following scripts in slurmscriptd instead of slurmctld:
      ResumeProgram, ResumeFailProgram, SuspendProgram, ResvProlog, ResvEpilog,
      and RebootProgram (only with SlurmctldParameters=reboot_from_controller).
    + Only permit changing log levels with 'srun --slurmd-debug' by root
      or SlurmUser.
    + slurmctld will fatal() when reconfiguring the job_submit plugin fails.
    + Add PowerDownOnIdle partition option to power down nodes after nodes
      become idle.
    + Add "[jobid.stepid]" prefix from slurmstepd and "slurmscriptd" prefix
      from slurmscriptd to Syslog logging. Previously was only happening when

OBS-URL: https://build.opensuse.org/request/show/1068320
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=83
2023-03-01 15:14:17 +00:00
e60f39a466 - updated to 23.02.0
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=234
2023-02-28 20:50:48 +00:00
8899aac00b - testsuite: on later SUSE versions claim ownership of directory
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=233
2023-02-28 20:34:03 +00:00
18aa012ab9 Accepting request 1068316 from home:eeich:branches:network:cluster
+ Fixed GpuFreqDef option. When set in slurm.conf, it will be used if
      --gpu-freq was not explicitly set by the job step.
    + topology/tree - Add new TopologyParam=SwitchAsNodeRank option to reorder
      nodes based on switch layout. This can be useful if the naming convention
      for the nodes does not naturally map to the network topology.
    + Removed the default setting for GpuFreqDef. If unset, no attempt to change
      the GPU frequency will be made if --gpu-freq is not set for the step.
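
The two GpuFreqDef entries above describe a precedence rule: an explicit
--gpu-freq on the step wins, otherwise GpuFreqDef from slurm.conf is used, and
if neither is set no frequency change is attempted. A minimal Python sketch of
that rule follows. The helper name `effective_gpu_freq` and the use of `None`
for "unset" are assumptions made for illustration and are not part of Slurm.

```python
from typing import Optional


def effective_gpu_freq(step_gpu_freq: Optional[str],
                       gpu_freq_def: Optional[str]) -> Optional[str]:
    """Return the GPU frequency request that applies to a job step.

    Hypothetical helper, not part of Slurm: it only mirrors the
    precedence described in the changelog entries.
    """
    if step_gpu_freq is not None:
        # --gpu-freq was set explicitly for the step, so it wins.
        return step_gpu_freq
    # Otherwise fall back to GpuFreqDef from slurm.conf. None means
    # "unset", in which case no frequency change is attempted.
    return gpu_freq_def
```

A return value of `None` stands for "leave the GPU frequency alone", which
matches the removal of the built-in default.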

OBS-URL: https://build.opensuse.org/request/show/1068316
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=232
2023-02-28 20:30:32 +00:00
ef6d6521aa Accepting request 1067475 from home:eeich:branches:network:cluster
- updated to 23.02.0-0rc1
  * Highlights
    + slurmctld - Add new RPC rate limiting feature. This is enabled through
      SlurmctldParameters=rl_enable, otherwise disabled by default.
    + Make scontrol reconfigure and sending a SIGHUP to the slurmctld behave
      the same. If you were using SIGHUP as a 'lighter' scontrol reconfigure
      to rotate logs please update your scripts to use SIGUSR2 instead.
    + Change cloud nodes to show by default. PrivateData=cloud is no longer
      needed.
    + sreport - Count planned (FKA reserved) time for jobs running in
      IGNORE_JOBS reservations. Previously was lumped into IDLE time.
    + job_container/tmpfs - Support running with an arbitrary list of private
      mount points (/tmp and /dev/shm are the default, but not required).
    + job_container/tmpfs - Set more environment variables in InitScript.
    + Make all cgroup directories created by Slurm owned by root. This was the
      behavior in cgroup/v2 but not in cgroup/v1 where by default the step
      directories' ownership was set to the user and group of the job.
    + accounting_storage/mysql - change purge/archive to calculate record ages
      based on end time, rather than start or submission times.
    + job_submit/lua - add support for log_user() from slurm_job_modify().
    + Run the following scripts in slurmscriptd instead of slurmctld:
      ResumeProgram, ResumeFailProgram, SuspendProgram, ResvProlog, ResvEpilog,
      and RebootProgram (only with SlurmctldParameters=reboot_from_controller).
    + Only permit changing log levels with 'srun --slurmd-debug' by root
      or SlurmUser.
    + slurmctld will fatal() when reconfiguring the job_submit plugin fails.
    + Add PowerDownOnIdle partition option to power down nodes after nodes
      become idle.
    + Add "[jobid.stepid]" prefix from slurmstepd and "slurmscriptd" prefix
      from slurmscriptd to Syslog logging. Previously was only happening when

OBS-URL: https://build.opensuse.org/request/show/1067475
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=231
2023-02-23 19:32:51 +00:00
Dominique Leuenberger
d1ebf00ba6 Accepting request 1063957 from network:cluster
- testsuite: on later SUSE versions claim ownership of directory
  /etc/security/limits.d. (forwarded request 1063954 from eeich)

OBS-URL: https://build.opensuse.org/request/show/1063957
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=82
2023-02-09 15:23:26 +00:00
4693e39860 Accepting request 1063954 from home:eeich:branches:network:cluster
- testsuite: on later SUSE versions claim ownership of directory
  /etc/security/limits.d.

OBS-URL: https://build.opensuse.org/request/show/1063954
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=229
2023-02-09 08:22:55 +00:00
Dominique Leuenberger
a4484c7dc2 Accepting request 1042071 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/1042071
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=81
2022-12-11 16:16:58 +00:00
6f080824a4 Accepting request 1039957 from home:eeich:branches:network:cluster
- Move the ext_sensors/rrd plugin to a separate package: this
  plugin requires librrd which in turn requires huge parts of
  the client side X Window System stack.
  There is probably no use in cluttering up a system for a
  plugin that is probably only used by a few.

OBS-URL: https://build.opensuse.org/request/show/1039957
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=227
2022-12-11 07:58:12 +00:00
Dominique Leuenberger
30dd030610 Accepting request 1031255 from network:cluster
- Test Suite fixes:
  * Update README_Testsuite.md.
  * Clean up left over files when de-installing test suite.
  * Adjustment to test suite package: for SLE mark the openmpi4
    devel package and slurm-hdf5 optional.
  * Add -ffat-lto-objects to the build flags when LTO is set to
    make sure the object files we ship with the test suite still
    work correctly.
  * Improve setup-testsuite.sh: copy ssh fingerprints from all nodes.

- set environment variable SUSE_ZNOW to 0 in %build to avoid module load
  failures due to unresolved symbols as modules take advantage of lazy
  bindings (bsc#1200030).

OBS-URL: https://build.opensuse.org/request/show/1031255
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=80
2022-10-26 10:32:00 +00:00
212048404b * Improve setup-testsuite.sh: copy ssh fingerprints from all nodes.
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=225
2022-10-26 06:23:36 +00:00
776ce8f23b - Test Suite fixes:
  * Update README_Testsuite.md.
  * Clean up left over files when de-installing test suite.
  * Adjustment to test suite package: for SLE mark the openmpi4
    devel package and slurm-hdf5 optional.
  * Add -ffat-lto-objects to the build flags when LTO is set to
    make sure the object files we ship with the test suite still
    work correctly.

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=224
2022-10-25 11:33:49 +00:00
642a47efa7 - Adjustment to test suite package: only recommend openmpi4
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=223
2022-10-24 08:54:35 +00:00
52046053d5 Accepting request 1030610 from home:eeich:branches:network:cluster
- Update README_Testsuite.md.
- Make hdf5 package optional for test suite.
- Clean up left over files when de-installing test suite.

- set environment variable SUSE_ZNOW to 0 in %build to avoid module load
  failures due to unresolved symbols as modules take advantage of lazy
  bindings (bsc#1200030).

OBS-URL: https://build.opensuse.org/request/show/1030610
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=222
2022-10-24 05:31:40 +00:00
Dominique Leuenberger
220eec76a4 Accepting request 1030432 from network:cluster
- updated to 22.05.5
- NOTE: Slurm validates that libraries are of the same version. Unfortunately,
  due to an oversight, we failed to notice that the slurmstepd loads the
  hash_k12 library only after a job has completed. This means that if the
  hash_k12 library is upgraded before a job finishes, the slurmstepd will load
  the new library when the job finishes, and will fail due to a mismatch of
  versions.  This results in nodes with slurmstepd processes stuck
  indefinitely. These processes require manual intervention to clean up. There
  is no clean way to resolve these hung slurmstepd processes.
  The only recommended way to upgrade between minor versions of 22.05 with
  RPMs or upgrades that replace current binaries and libraries is to drain the
  nodes of running jobs first.
- Fixes a number of moderate severity issues, notably:
  * Load hash plugin at slurmstepd launch time to prevent issues loading the
    plugin at step completion if the Slurm installation is upgraded.
  * Update nvml plugin to match the unique id format for MIG devices in new
    Nvidia drivers.
  * Fix multi-node step launch failure when nodes in the controller aren't in
    natural order. This can happen with inconsistent node naming (such as
    node15 and node052) or with dynamic nodes which can register in any order.
  * job_container/tmpfs - cleanup containers even when the .ns file isn't
    mounted anymore.
  * Wait up to PrologEpilogTimeout before shutting down slurmd to allow prolog
    and epilog scripts to complete or time out. Previously, slurmd waited 120
    seconds before timing out and killing prolog and epilog scripts. (forwarded request 1010642 from mslacken)

OBS-URL: https://build.opensuse.org/request/show/1030432
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=79
2022-10-22 12:13:18 +00:00
c2551ab47f Accepting request 1010642 from home:mslacken:branches:network:cluster
- updated to 22.05.5
- NOTE: Slurm validates that libraries are of the same version. Unfortunately,
  due to an oversight, we failed to notice that the slurmstepd loads the
  hash_k12 library only after a job has completed. This means that if the
  hash_k12 library is upgraded before a job finishes, the slurmstepd will load
  the new library when the job finishes, and will fail due to a mismatch of
  versions.  This results in nodes with slurmstepd processes stuck
  indefinitely. These processes require manual intervention to clean up. There
  is no clean way to resolve these hung slurmstepd processes.
  The only recommended way to upgrade between minor versions of 22.05 with
  RPMs or upgrades that replace current binaries and libraries is to drain the
  nodes of running jobs first.
- Fixes a number of moderate severity issues, notably:
  * Load hash plugin at slurmstepd launch time to prevent issues loading the
    plugin at step completion if the Slurm installation is upgraded.
  * Update nvml plugin to match the unique id format for MIG devices in new
    Nvidia drivers.
  * Fix multi-node step launch failure when nodes in the controller aren't in
    natural order. This can happen with inconsistent node naming (such as
    node15 and node052) or with dynamic nodes which can register in any order.
  * job_container/tmpfs - cleanup containers even when the .ns file isn't
    mounted anymore.
  * Wait up to PrologEpilogTimeout before shutting down slurmd to allow prolog
    and epilog scripts to complete or time out. Previously, slurmd waited 120
    seconds before timing out and killing prolog and epilog scripts.
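
The node15/node052 example in the step launch fix above shows how two orderings
of the same node names can disagree. The entry does not spell out what "natural
order" means inside the controller, so the Python sketch below is only one
reading: it compares a plain string sort with a sort that treats the numeric
suffix as a number. The helper `natural_key` is hypothetical and not taken from
Slurm.

```python
import re


def natural_key(name: str):
    """Split a node name into text and integer parts for sorting."""
    # re.split with a capturing group keeps the digit runs in the result,
    # e.g. "node052" -> ["node", "052", ""].
    parts = re.split(r"(\d+)", name)
    return [int(p) if p.isdigit() else p for p in parts]


nodes = ["node15", "node052"]

# Plain string comparison looks at "0" vs "1" after the "node" prefix.
lexicographic = sorted(nodes)

# Numeric-aware comparison looks at 15 vs 52 instead.
natural = sorted(nodes, key=natural_key)
```

With inconsistent zero padding the two lists come out in opposite order, which
is the kind of mismatch the entry describes.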

OBS-URL: https://build.opensuse.org/request/show/1010642
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=220
2022-10-21 15:00:25 +00:00
Dominique Leuenberger
edd405b2c8 Accepting request 1006180 from network:cluster
- Do not deduplicate files of testsuite Slurm configuration.
  This directory is supposed to be mounted over /etc/slurm
  therefore it must not contain softlinks to the files in
  this directory.
- Improve .a and .o file collection for test suite: find these
  files even if there are multiple ones in a single line. (forwarded request 1005746 from eeich)

OBS-URL: https://build.opensuse.org/request/show/1006180
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=78
2022-09-26 16:48:44 +00:00
09aecc2015 Accepting request 1005746 from home:eeich:branches:network:cluster
- Do not deduplicate files of testsuite Slurm configuration.
  This directory is supposed to be mounted over /etc/slurm
  therefore it must not contain softlinks to the files in
  this directory.
- Improve .a and .o file collection for test suite: find these
  files even if there are multiple ones in a single line.

OBS-URL: https://build.opensuse.org/request/show/1005746
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=218
2022-09-26 15:01:51 +00:00
Dominique Leuenberger
ae04ec8787 Accepting request 1005247 from network:cluster
- Fix build for older product version. (forwarded request 1005246 from eeich)

OBS-URL: https://build.opensuse.org/request/show/1005247
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=77
2022-09-22 12:49:55 +00:00
3f68233e21 Accepting request 1005246 from home:eeich:branches:network:cluster
- Fix build for older product version.

OBS-URL: https://build.opensuse.org/request/show/1005246
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=216
2022-09-21 15:33:09 +00:00
Dominique Leuenberger
d3bcbab808 Accepting request 992362 from network:cluster
- Fix a potential security vulnerability in the test package
  (bsc#1201674, CVE-2022-31251).

- Patch NOFILE Limit in the slurmd.service copy for the testsuite. (forwarded request 992353 from eeich)

OBS-URL: https://build.opensuse.org/request/show/992362
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=76
2022-08-02 20:09:54 +00:00
b60ac5f569 Accepting request 992353 from home:eeich:branches:network:cluster
- Fix a potential security vulnerability in the test package
  (bsc#1201674, CVE-2022-31251).

- Patch NOFILE Limit in the slurmd.service copy for the testsuite.

OBS-URL: https://build.opensuse.org/request/show/992353
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=214
2022-08-02 15:34:01 +00:00
fd509c0258 Accepting request 990637 from home:bmwiedemann:branches:network:cluster
make slurmtest.tar reproducible

OBS-URL: https://build.opensuse.org/request/show/990637
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=213
2022-08-02 13:14:07 +00:00
Richard Brown
7a8e082057 Accepting request 990643 from network:cluster
Automatic submission by obs-autosubmit

OBS-URL: https://build.opensuse.org/request/show/990643
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=75
2022-07-22 17:21:25 +00:00
e067a36989 - Fix a typo which prevented the nproc limit for slurmd from being
  raised for the test suite.

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=211
2022-07-15 07:15:34 +00:00
69890cab1e Accepting request 989256 from home:eeich:branches:network:cluster
- Improve check for mpicc in testsuite package: if binary isn't
  found, don't crash.

OBS-URL: https://build.opensuse.org/request/show/989256
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=210
2022-07-15 07:13:32 +00:00
167150eca6 - Fix a typo
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=209
2022-07-15 07:12:53 +00:00
Dominique Leuenberger
e57307d81e Accepting request 988733 from network:cluster
- Package the Slurm testsuite for QA purposes.
  * Fixes for test suite:
    Keep-logs-of-skipped-test-when-running-test-cases-sequentially.patch
    Fix-test-21.41.patch
    Fix-test-38.11.patch
    Fix-test-32.8.patch
    Fix-test-3.13.patch
    Fix-test7.2-to-find-libpmix-under-lib64-as-well.patch
  * Add documentation:
    README_Testsuite.md
- Allow log in as user 'slurm'. This allows admins to run certain
  privileged commands more easily without becoming root. (forwarded request 988732 from eeich)

OBS-URL: https://build.opensuse.org/request/show/988733
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=74
2022-07-13 11:45:23 +00:00
7d13a7ba97 Accepting request 988732 from home:eeich:branches:network:cluster
- Package the Slurm testsuite for QA purposes.
  * Fixes for test suite:
    Keep-logs-of-skipped-test-when-running-test-cases-sequentially.patch
    Fix-test-21.41.patch
    Fix-test-38.11.patch
    Fix-test-32.8.patch
    Fix-test-3.13.patch
    Fix-test7.2-to-find-libpmix-under-lib64-as-well.patch
  * Add documentation:
    README_Testsuite.md
- Allow log in as user 'slurm'. This allows admins to run certain
  privileged commands more easily without becoming root.

OBS-URL: https://build.opensuse.org/request/show/988732
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=207
2022-07-12 20:03:18 +00:00
52adf61c22 Accepting request 983910 from home:mslacken:branches:network:cluster
- update to 22.05.2 with the following fixes:
  * Fix regression which allowed the oversubscription of licenses.
  * Fix a segfault in slurmctld when requesting gres in job arrays.

OBS-URL: https://build.opensuse.org/request/show/983910
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=206
2022-06-20 11:58:11 +00:00
2951a00ce2 - Package the Slurm testsuite for QA purposes.
NOTE: This package is not meant to be used for testing by the
  user but rather for testing by the maintainers to ensure the
  package is working properly.
  DO NOT report test suite failures unless you are able to confirm
  that the failure is really a bug.

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=205
2022-06-08 13:21:55 +00:00
Dominique Leuenberger
13c4d39104 Accepting request 980097 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/980097
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=73
2022-05-31 14:04:51 +00:00
faa19fe22b Accepting request 980093 from home:mslacken:branches:network:cluster
- update to 22.05.0 with the following changes:
- Support for dynamic node addition and removal
- Support for native Linux cgroup v2 operation
- Newly added plugins to support HPE Slingshot 11 networks
  (switch/hpe_slingshot), and Intel Xe GPUs (gpu/oneapi)
- Added new acct_gather_interconnect/sysfs plugin to collect statistics
  from arbitrary network interfaces.
- Expanded and synced set of environment variables available in the
  Prolog/Epilog/PrologSlurmctld/EpilogSlurmctld scripts.
- New "--prefer" option to job submissions to allow for a "soft
  constraint" request to influence node selection.
- Optional support for license planning in the backfill scheduler with
  "bf_licenses" option in SchedulerParameters.
- removed file slurm-2.4.4-init.patch as sysvinit is now really deprecated
- removed file load-pmix-major-version.patch as fixed upstream

OBS-URL: https://build.opensuse.org/request/show/980093
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=203
2022-05-31 13:38:54 +00:00
Dominique Leuenberger
737b47d2be Accepting request 976280 from network:cluster
- Add a comment about the CommunicationParameters=block_null_hash
  option warning users who migrate - just in case.

- Update to 21.08.8 which fixes CVE-2022-29500 (bsc#1199278),
  CVE-2022-29501 (bsc#1199279), and CVE-2022-29502 (bsc#1199281).

OBS-URL: https://build.opensuse.org/request/show/976280
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=72
2022-05-12 20:59:35 +00:00
a07f819c2f - Update to 21.08.8 which fixes CVE-2022-29500 (bsc#1199278),
CVE-2022-29501 (bsc#1199279), and CVE-2022-29502 (bsc#1199281).

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=201
2022-05-11 10:26:59 +00:00
5f6ca5dea6 Accepting request 976056 from home:eeich:branches:network:cluster
- Add a comment about the CommunicationParameters=block_null_hash
  option warning users who migrate - just in case.

OBS-URL: https://build.opensuse.org/request/show/976056
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=200
2022-05-11 10:25:15 +00:00
Dominique Leuenberger
62db1261ed Accepting request 975440 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/975440
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=71
2022-05-06 17:00:14 +00:00
950ae37e78 Accepting request 975374 from home:mslacken:branches:network:cluster
- Update to 21.08.8 which fixes CVE-2022-29500, CVE-2022-29501
  and CVE-2022-29502
- Added 'CommunicationParameters=block_null_hash' to slurm.conf, please
  add this parameter to existing configurations.
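
Since existing configurations need the parameter added by hand, a quick check
can help when auditing a slurm.conf. The Python sketch below is a minimal
illustration and not a full slurm.conf parser. It assumes one `Key=Value`
setting per line, a comma-separated option list, `#` comments, and
case-insensitive matching, and it does not follow included files. The function
name `has_block_null_hash` is hypothetical.

```python
def has_block_null_hash(conf_text: str) -> bool:
    """Return True if CommunicationParameters includes block_null_hash.

    Simplified reading of slurm.conf (an assumption for this sketch):
    one Key=Value per line, comma-separated options, '#' comments.
    """
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if "=" not in line:
            continue
        key, value = line.split("=", 1)
        if key.strip().lower() != "communicationparameters":
            continue
        options = [opt.strip().lower() for opt in value.split(",")]
        if "block_null_hash" in options:
            return True
    return False
```

Calling it on the contents of a configuration file returns `False` when the
line is missing or commented out, which flags the file for an update.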

OBS-URL: https://build.opensuse.org/request/show/975374
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=198
2022-05-06 15:13:12 +00:00
Dominique Leuenberger
2450bd4dcd Accepting request 974456 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/974456
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=70
2022-05-03 19:19:04 +00:00
30c749c9e0 Accepting request 974433 from home:mslacken:branches:network:cluster
- Update to 21.08.7 with the following changes:
  * openapi/v0.0.37 - correct calculation for bf_queue_len_mean in /diag.
  * Avoid shrinking a reservation when overlapping with downed nodes.
  * Only check TRES limits against current usage for TRES requested by the job.
  * Do not allocate shared gres (MPS) in whole-node allocations
  * Constrain slurmstepd to job/step cgroup like in previous versions of Slurm.
  * Fix warnings on 32-bit compilers related to printf() formats.
  * Fix reconfigure issues after disabling/reenabling the GANG PreemptMode.
  * Fix race condition where a cgroup was being deleted while another step
    was creating it.
  * Set the slurmd port correctly if multi-slurmd
  * Fix FAIL mail not being sent if a job was cancelled due to preemption.
  * slurmrestd - move debug logs for HTTP handling to be gated by debugflag
    NETWORK to avoid unnecessary logging of communication contents.
  * Fix issue with bad memory access when shrinking running steps.
  * Fix various issues with internal job accounting with GRES when jobs are
    shrunk.
  * Fix ipmi polling on slurmd reconfig or restart.
  * Fix srun crash when reserved ports are being used and het step fails
    to launch.
  * openapi/dbv0.0.37 - fix DELETE execution path on /user/{user_name}.
  * slurmctld - Properly requeue all components of a het job if PrologSlurmctld
    fails.
  * rlimits - remove final calls to limit nofiles to 4096 but to instead use
    the max possible nofiles in slurmd and slurmdbd.
  * Allow the DBD agent to load large messages (up to MAX_BUF_SIZE) from state.
  * Fix potential deadlock during slurmctld restart when there is a completing
    job.
  * slurmstepd - reduce user requested soft rlimits when they are above max
    hard rlimits to avoid rlimit request being completely ignored and

OBS-URL: https://build.opensuse.org/request/show/974433
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=196
2022-05-02 17:06:13 +00:00
Dominique Leuenberger
ec8df38732 Accepting request 942222 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/942222
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=69
2021-12-23 16:53:52 +00:00
d442993ff4 Accepting request 942081 from home:mslacken:branches:network:cluster
- update to 21.08.5 with the following changes:
  * Fix issue where typeless GRES node updates were not immediately reflected.
  * Fix setting the default scrontab job working directory so that it's the home
    of the different user (-u <user>) and not that of root or SlurmUser editor.
  * Fix stepd not respecting SlurmdSyslogDebug.
  * Fix concurrency issue with squeue.
  * Fix job start time not being reset after launch when job is packed onto
    already booting node.
  * Fix updating SLURM_NODE_ALIASES for jobs packed onto powering up nodes.
  * Cray - Fix issues with starting hetjobs.
  * auth/jwks - Print fatal() message when jwks is configured but file could
    not be opened.
  * If sacctmgr has an association with an unknown qos as the default qos
    print 'UNKN*###' instead of leaving a blank name.
  * Correctly determine task count when giving --cpus-per-gpu, --gpus and
    --ntasks-per-node without task count.
  * slurmctld - Fix places where the global last_job_update was not being set
    to the time of update when a job's reason and description were updated.
  * slurmctld - Fix case where a job submitted with more than one partition
    would not have its reason updated while waiting to start.
  * Fix memory leak in node feature rebooting.
  * Fix time limit permanently set to 1 minute by backfill for job array tasks
    higher than the first with QOS NoReserve flag and PreemptMode configured.
  * Fix sacct -N to show jobs that started in the current second
  * Fix issue on running steps where both SLURM_NTASKS_PER_TRES and
    SLURM_NTASKS_PER_GPU are set.
  * Handle oversubscription request correctly when also requesting
    --ntasks-per-tres.
  * Correctly detect when a step requests bad gres inside an allocation.
  * slurmstepd - Correct possible deadlock when UnkillableStepTimeout triggers.

OBS-URL: https://build.opensuse.org/request/show/942081
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=195
2021-12-23 10:26:41 +00:00
Dominique Leuenberger
0793824683 Accepting request 932162 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/932162
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=68
2021-11-21 22:51:50 +00:00
350be975f5 Accepting request 932063 from home:aginies:branches:network:cluster
add a ref to SLE-22741 (firewall config) in changelog

OBS-URL: https://build.opensuse.org/request/show/932063
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=194
2021-11-18 09:37:45 +00:00
d4c2b2bcf3 - updated to 21.08.4, which fixes CVE-2021-43337; the issue is only present
in the 21.08 tree.
  * CVE-2021-43337:
    For sites using the new AccountingStoreFlags=job_script and/or job_env
    options, an issue was reported with the access control rules in SlurmDBD
    that will permit users to request job scripts and environment files that
    they should not have access to. (Scripts/environments are meant to only be
    accessible by user accounts with administrator privileges, by account
    coordinators for jobs submitted under their account, and by the user
    themselves.)
- changes from 21.08.3:
  * This includes a number of fixes since the last release a month ago,
    including one critical fix to prevent a communication issue between
    slurmctld and slurmdbd for sites that have started using the new
    AccountingStoreFlags=job_script functionality.

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=193
2021-11-17 08:37:51 +00:00
Dominique Leuenberger
147f929296 Accepting request 928192 from network:cluster
- Utilize sysuser infrastructure to set user/group slurm.
  For munge authentication slurm should have a fixed UID across
  all nodes including the management server. Set it to 120
- Limit firewalld service definitions to SUSE versions >= 15. (forwarded request 928191 from eeich)

OBS-URL: https://build.opensuse.org/request/show/928192
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=67
2021-10-29 20:34:40 +00:00
c67f43163f Accepting request 928191 from home:eeich:branches:network:cluster
- Utilize sysuser infrastructure to set user/group slurm.
  For munge authentication slurm should have a fixed UID across
  all nodes including the management server. Set it to 120
- Limit firewalld service definitions to SUSE versions >= 15.

OBS-URL: https://build.opensuse.org/request/show/928191
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=192
2021-10-29 17:38:05 +00:00
f4a3f06e75 Accepting request 926016 from home:mslacken:branches:network:cluster
- added service definitions for firewalld

OBS-URL: https://build.opensuse.org/request/show/926016
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=191
2021-10-29 14:17:34 +00:00
Dominique Leuenberger
2cf5062473 Accepting request 924633 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/924633
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=66
2021-10-11 13:31:58 +00:00
7a20fda376 Accepting request 923425 from home:mslacken:branches:network:cluster
- update to 21.08.2 
- major change:
  * removed support for the TaskAffinity=yes option in cgroup.conf. Please
    consider using "TaskPlugins=cgroup,affinity" in slurm.conf as an option.
- minor changes and bugfixes:
  * slurmctld - fix how the max number of cores on a node in a partition are
    calculated when the partition contains multi-socket nodes. This in turn
    corrects certain jobs node count estimations displayed client-side.
  * job_submit/cray_aries - fix "craynetwork" GRES specification after changes
    introduced in 21.08.0rc1 that made TRES always have a type prefix.
  * Ignore nonsensical check in the slurmd for [Pro|Epi]logSlurmctld.
  * Fix writing to stderr/syslog when systemd runs slurmctld in the foreground.
  * Fix issue with updating job started with node range.
  * Fix issue with nodes not clearing state in the database when the slurmctld
    is started with clean-start.
  * Fix hetjob components > 1 timing out due to InactiveLimit.
  * Fix sprio printing -nan for normalized association priority if
    PriorityWeightAssoc was not defined.
  * Disallow FirstJobId=0.
  * Preserve job start info in the database for a requeued job that hadn't
    registered the first time in the database yet.
  * Only send one message on prolog failure from the slurmd.
  * Remove support for TaskAffinity=yes in cgroup.conf.
  * accounting_storage/mysql - fix issue where querying jobs via sacct
    --whole-hetjob=yes or slurmrestd (which automatically includes this flag)
    could in some cases return more records than expected.
  * Fix issue for preemption of job array task that makes afterok dependency
    fail. Additionally, send emails when requeueing happens due to preemption.
  * Fix sending requeue mail type.
  * Properly resize a job's GRES bitmaps and counts when resizing the job.
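
The removal of TaskAffinity=yes noted above can break an upgrade if the
setting is still present in cgroup.conf. The following is a minimal sketch of
a pre-upgrade check, assuming a plain key=value cgroup.conf; the
`has_task_affinity` helper and the scratch file are hypothetical and not part
of Slurm or of this package.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given cgroup.conf still sets
# TaskAffinity=yes (case-insensitive, ignoring commented-out lines).
has_task_affinity() {
    grep -Eiq '^[[:space:]]*TaskAffinity[[:space:]]*=[[:space:]]*yes' "$1"
}

# Demo against a scratch file rather than a real cgroup.conf.
conf=$(mktemp)
printf 'ConstrainCores=yes\nTaskAffinity=yes\n' > "$conf"

if has_task_affinity "$conf"; then
    echo "TaskAffinity=yes found: remove it before upgrading to 21.08"
fi
```

If the check fires, the changelog entry suggests moving task binding to the
task plugins in slurm.conf; the exact parameter spelling should be taken from
the slurm.conf man page of the installed version.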

OBS-URL: https://build.opensuse.org/request/show/923425
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=190
2021-10-11 08:40:56 +00:00
Dominique Leuenberger
ad0b52bd59 Accepting request 922117 from network:cluster
- moved pam module from /lib64 to /usr/lib64 which fixes boo#1191095 
  via the macro %_pam_moduledir

OBS-URL: https://build.opensuse.org/request/show/922117
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=65
2021-09-29 18:18:55 +00:00
64b9f7f60a macro fixed
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=189
2021-09-29 07:35:03 +00:00
1b26b8910b via the macro %_pam_moduledir
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=188
2021-09-29 07:08:48 +00:00
728a1b3c1e updated major version
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=187
2021-09-28 15:54:50 +00:00
Dominique Leuenberger
de212e226d Accepting request 921717 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/921717
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=64
2021-09-27 18:08:57 +00:00
5b07269e3d Accepting request 919668 from home:mslacken:branches:network:cluster
- updated to 21.08.1 with following bug fixes:
  * Fix potential memory leak if a problem happens while allocating GRES for
    a job.
  * If an overallocation of GRES happens terminate the creation of a job.
  * AutoDetect=nvml: Fatal if no devices found in MIG mode.
  * Print federation and cluster sacctmgr error messages to stderr.
  * Fix off by one error in --gpu-bind=mask_gpu.
  * Add --gpu-bind=none to disable gpu binding when using --gpus-per-task.
  * Handle the burst buffer state "alloc-revoke" which previously would not
    display in the job correctly.
  * Fix issue in the slurmstepd SPANK prolog/epilog handler where configuration
    values were used before being initialized.
  * Restore a step's ability to utilize all of an allocation's memory if --mem=0.
  * Fix --cpu-bind=verbose garbage taskid.
  * Fix cgroup task affinity issues from garbage taskid info.
  * Make gres_job_state_validate() client logging behavior as before 44466a4641.
  * Fix steps with --hint overriding an allocation with --threads-per-core.
  * Require requesting a GPU if --mem-per-gpu is requested.
  * Return error early if a job is requesting --ntasks-per-gpu and no gpus or
    task count.
  * Properly clear out pending step if unavailable to run with available
    resources.
  * Kill all processes spawned by burst_buffer.lua including descendants.
  * openapi/v0.0.{35,36,37} - Avoid setting default values of min_cpus,
    job name, cwd, mail_type, and contiguous on job update.
  * openapi/v0.0.{35,36,37} - Clear user hold on job update if hold=false.
  * Prevent CRON_JOB flag from being cleared when loading job state.
  * sacctmgr - Fix deleting WCKeys when not specifying a cluster.
  * Fix getting memory for a step when the first node in the step isn't the
    first node in the allocation.

OBS-URL: https://build.opensuse.org/request/show/919668
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=186
2021-09-27 09:23:35 +00:00
Dominique Leuenberger
8a0b85fee5 Accepting request 917457 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/917457
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=63
2021-09-08 19:36:49 +00:00
e22daa9ce5 Accepting request 917243 from home:eeich:branches:network:cluster
- Fix-statement-condition-in-netloc-autoconf-macro.patch:
  Fix netloc check, reestablish netloc disable code.
- Make configure arg '--with-pmix' conditional.
- Move openapi plugins to package slurm-restd.

OBS-URL: https://build.opensuse.org/request/show/917243
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=185
2021-09-08 07:34:10 +00:00
Dominique Leuenberger
b0f9c9aa23 Accepting request 917119 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/917119
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=62
2021-09-07 19:21:20 +00:00
562a595d05 Accepting request 915777 from home:mslacken:slurm_update
- updated to 21.08.1, major changes:
  * A new "AccountingStoreFlags=job_script" option to store the job scripts
    directly in SlurmDBD.
  * Added "sacct -o SubmitLine" format option to get the submit line 
    of a job/step.
  * Changes to the node state management so that nodes are marked as PLANNED
    instead of IDLE if the scheduler is still accumulating resources while
    waiting to launch a job on them.
  * RS256 token support in auth/jwt.
  * Overhaul of the cgroup subsystems to simplify operation, mitigate a number
    of inherent race conditions, and prepare for future cgroup v2 support.
  * Further improvements to cloud node power state management.
  * A new child process of the Slurm controller called "slurmscriptd"
    responsible for executing PrologSlurmctld and EpilogSlurmctld scripts,
    which significantly reduces performance issues associated with enabling
    those options.
  * A new burst_buffer/lua plugin allowing for site-specific asynchronous job
    data management.
  * Fixes to the job_container/tmpfs plugin to allow the slurmd process to be
    restarted while the job is running without issue.
  * Added json/yaml output to sacct, squeue, and sinfo commands.
  * Added a new node_features/helpers plugin to provide a generic way to change
    settings on a compute node across a reboot.
  * Added support for automatically detecting and broadcasting shared libraries
    for an executable launched with "srun --bcast".
  * Added initial OCI container execution support with a new --container option
    to sbatch and srun.
  * Improved "configless" support by allowing multiple control servers to be
    specified through the slurmd --conf-server option, and send additional
    configuration files at startup including cli_filter.lua.

OBS-URL: https://build.opensuse.org/request/show/915777
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=184
2021-09-06 13:29:00 +00:00
Dominique Leuenberger
2c3271fa4b Accepting request 903746 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/903746
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=61
2021-07-03 18:50:46 +00:00
b61c5b25fa Accepting request 903744 from home:mslacken:slurm_update
- Updated to  20.11.8:
  * slurmctld - fix erroneous "StepId=CORRUPT" messages in error logs.
  * Correct the error given when auth plugin fails to pack a credential.
  * Fix unused-variable compiler warning on FreeBSD in fd_resolve_path().
  * acct_gather_filesystem/lustre - only emit collection error once per step.
  * Add GRES environment variables (e.g., CUDA_VISIBLE_DEVICES) into the
    interactive step, the same as is done for the batch step.
  * Fix various potential deadlocks when altering objects in the database
    dealing with every cluster in the database.
  * slurmrestd:
   - handle slurmdbd connection failures without segfaulting.
   - fix segfault for searches in slurmdb/v0.0.36/jobs.
   - remove (non-functioning) users query parameter for
     slurmdb/v0.0.36/jobs from openapi.json
   - fix segfault in slurmrestd db/jobs with numeric queries
   - add argv handling for job/submit endpoint.
   - add description for slurmdb/job endpoint.
  * slurmrestd/dbv0.0.36:
   - Fix values dumped in job state/current and
     job step state.
   - Correct description for previous state property.
  * srun:
   - fix broken node step allocation in a heterogeneous allocation.
   - leave SLURM_DIST_UNKNOWN as default for --interactive.
  * Fail step creation if -n is not multiple of --ntasks-per-gpu.
  * job_container/tmpfs - Fix slowdown on teardown.
  * Fix problem with SlurmctldProlog where requeued jobs would never launch.
  * job_container/tmpfs - Fix issue when restarting slurmd where the namespace
    mount points could disappear.
  * sacct:

OBS-URL: https://build.opensuse.org/request/show/903744
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=183
2021-07-02 15:32:26 +00:00
Dominique Leuenberger
fa3ad08714 Accepting request 894432 from network:cluster
- New features in 20.11.7:
- New features in 20.11.6:
- Fix Provides:/Conflicts: for libnss_slurm (bsc#1180700).

OBS-URL: https://build.opensuse.org/request/show/894432
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=60
2021-05-20 17:25:01 +00:00
b4f7e9209d - New features in 20.11.7:
- New features in 20.11.6:
- Fix Provides:/Conflicts: for libnss_slurm (bsc#1180700).

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=181
2021-05-19 18:34:28 +00:00
Dominique Leuenberger
8c583ef6c0 Accepting request 893087 from network:cluster
- Updated to 20.11.7 which fixes CVE-2021-31215 (bsc#1186024)
- New features from 20.11.7:
 * slurmd - handle configless failures gracefully instead of hanging
   indefinitely.
 * select/cons_tres - fix Dragonfly topology not selecting nodes in the same
   leaf switch when it should as well as requests with --switches option.
 * Fix issue where certain step requests wouldn't run if the first node in the
   job allocation was full and there were idle resources on other nodes in
   the job allocation.
 * Fix deadlock issue with <Prolog|Epilog>Slurmctld.
 * torque/qstat - fix printf error message in output.
 * When adding associations or wckeys avoid checking multiple times a user or
   cluster name.
 * Fix wrong jobacctgather information on a step on multiple nodes
   due to timeouts sending the information gathered on its node.
 * Fix missing xstrdup which could result in slurmctld segfault on array jobs.
 * Fix security issue in PrologSlurmctld and EpilogSlurmctld by always
   prepending SPANK_ to all user-set environment variables. CVE-2021-31215.
- New features from 20.11.6:
 * Fix sacct assert with the --qos option.
 * Use pkg-config --atleast-version instead of --modversion for systemd.
 * common/fd - fix getsockopt() call in fd_get_socket_error().
 * Properly handle the return from fd_get_socket_error() in _conn_readable().
 * cons_res - Fix issue where running jobs were not taken into consideration
   when creating a reservation.
 * Avoid a deadlock between job_list for_each and assoc QOS_LOCK.
 * Fix TRESRunMins usage for partition qos on restart/reconfig.
 * Fix printing of number of tasks on a completed job that didn't request
   tasks.
 * Fix updating GrpTRESRunMins when decrementing job time is bigger than it.
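
The CVE-2021-31215 fix listed above works by renaming: a variable set by the
user can no longer shadow one that PrologSlurmctld or EpilogSlurmctld relies
on, because it only arrives under a SPANK_ name. The sketch below mimics that
renaming in plain shell to show the effect; `prefix_user_env` is a
hypothetical helper and is not Slurm's implementation.

```shell
#!/bin/sh
# Hypothetical helper: reads NAME=VALUE lines on stdin and emits them with
# SPANK_ prepended to every name, illustrating the renaming described above.
# Lines without an '=' are skipped.
prefix_user_env() {
    while IFS= read -r line; do
        case $line in
            *=*) printf 'SPANK_%s\n' "$line" ;;
        esac
    done
}

# A user trying to influence the prolog via PATH or LD_PRELOAD only ends up
# defining SPANK_PATH and SPANK_LD_PRELOAD.
printf 'PATH=/tmp/evil\nLD_PRELOAD=/tmp/x.so\n' | prefix_user_env
```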

OBS-URL: https://build.opensuse.org/request/show/893087
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=59
2021-05-14 23:24:22 +00:00
89b4ed3f9f - Updated to 20.11.7 which fixes CVE-2021-31215 (bsc#1186024)
- New features from 20.11.7:
 * slurmd - handle configless failures gracefully instead of hanging
   indefinitely.
 * select/cons_tres - fix Dragonfly topology not selecting nodes in the same
   leaf switch when it should as well as requests with --switches option.
 * Fix issue where certain step requests wouldn't run if the first node in the
   job allocation was full and there were idle resources on other nodes in
   the job allocation.
 * Fix deadlock issue with <Prolog|Epilog>Slurmctld.
 * torque/qstat - fix printf error message in output.
 * When adding associations or wckeys avoid checking multiple times a user or
   cluster name.
 * Fix wrong jobacctgather information on a step on multiple nodes
   due to timeouts sending the information gathered on its node.
 * Fix missing xstrdup which could result in slurmctld segfault on array jobs.
 * Fix security issue in PrologSlurmctld and EpilogSlurmctld by always
   prepending SPANK_ to all user-set environment variables. CVE-2021-31215.
- New features from 20.11.6:
 * Fix sacct assert with the --qos option.
 * Use pkg-config --atleast-version instead of --modversion for systemd.
 * common/fd - fix getsockopt() call in fd_get_socket_error().
 * Properly handle the return from fd_get_socket_error() in _conn_readable().
 * cons_res - Fix issue where running jobs were not taken into consideration
   when creating a reservation.
 * Avoid a deadlock between job_list for_each and assoc QOS_LOCK.
 * Fix TRESRunMins usage for partition qos on restart/reconfig.
 * Fix printing of number of tasks on a completed job that didn't request
   tasks.
 * Fix updating GrpTRESRunMins when decrementing job time is bigger than it.

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=179
2021-05-14 10:35:47 +00:00
Dominique Leuenberger
7cb151db13 Accepting request 890262 from network:cluster
- Ship REST API version and auth plugins with slurmrestd.
- Add YAML support for REST API to build (bsc#1185603). (forwarded request 890261 from eeich)

OBS-URL: https://build.opensuse.org/request/show/890262
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=58
2021-05-04 20:00:59 +00:00
47fc726263 Accepting request 890261 from home:eeich:branches:network:cluster
- Ship REST API version and auth plugins with slurmrestd.
- Add YAML support for REST API to build (bsc#1185603).

OBS-URL: https://build.opensuse.org/request/show/890261
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=177
2021-05-04 08:36:53 +00:00
Dominique Leuenberger
a4d0f3eef7 Accepting request 879660 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/879660
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=57
2021-03-17 19:16:54 +00:00
Ana Guerrero
ff5dc58526 Accepting request 879659 from home:anag:branches:home:mslacken:slurm_up
update + typo fix

OBS-URL: https://build.opensuse.org/request/show/879659
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=175
2021-03-17 10:26:51 +00:00
Richard Brown
7f8f9f1010 Accepting request 874787 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/874787
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=56
2021-02-25 17:27:59 +00:00
927cd6ab24 Accepting request 874647 from home:mslacken:branches:network:cluster
- Update to 20.11.04
 * Fix node selection for advanced reservations with features.
 * mpi/pmix: Handle pipe failure better when using ucx.
 * mpi/pmix: include PMIX_NODEID for each process entry.
 * Fix job getting rejected after being requeued on same node that died.
 * job_submit/lua - add "network" field.
 * Fix situations when a reoccurring reservation could erroneously skip a
   period.
 * Ensure that a reservation's [pro|epi]log are run on reoccurring reservations.
 * Fix threads-per-core memory allocation issue when using CR_CPU_MEMORY.
 * Fix scheduling issue with --gpus.
 * Fix gpu allocations that request --cpus-per-task.
 * mpi/pmix: fixed print messages for all PMIXP_* macros
 * Add mapping for XCPU to --signal option.
 * Fix regression in 20.11 that prevented a full pass of the main scheduler
   from ever executing.
 * Work around a glibc bug in which "0" is incorrectly printed as "nan"
   which will result in corrupted association state on restart.
 * Fix regression in 20.11 which made slurmd incorrectly attempt to find the
   parent slurmd address when not applicable and send incorrect reverse-tree
   info to the slurmstepd.
 * Fix cgroup ns detection when using containers (e.g. LXC or Docker).
 * scrontab - change temporary file handling to work with emacs. 
- Removed check-for-lipmix.so.MAJOR.patch
- Added: load-pmix-major-version.patch

OBS-URL: https://build.opensuse.org/request/show/874647
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=173
2021-02-24 09:49:16 +00:00
Dominique Leuenberger
1a5fe227cc Accepting request 865000 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/865000
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=55
2021-01-20 17:29:15 +00:00
Ana Guerrero
4ab9986278 Accepting request 864993 from home:anag:branches:network:cluster
- Update to 20.11.03
- This release includes a major functional change to how job step launch is 
  handled compared to the previous 20.11 releases. This affects srun as 
  well as MPI stacks - such as Open MPI - which may use srun internally as 
  part of the process launch.
  One of the changes made in the Slurm 20.11 release was to the semantics 
  for job steps launched through the 'srun' command. This also 
  inadvertently impacts many MPI releases that use srun underneath their 
  own mpiexec/mpirun command.
  For 20.11.{0,1,2} releases, the default behavior for srun was changed  
  such that each step was allocated exactly what was requested by the 
  options given to srun, and did not have access to all resources assigned 
  to the job on the node by default. This change was equivalent to Slurm 
  setting the --exclusive option by default on all job steps. Job steps 
  desiring all resources on the node needed to explicitly request them 
  through the new '--whole' option.
  In the 20.11.3 release, we have reverted to the 20.02 and older behavior 
  of assigning all resources on a node to the job step by default.
  This reversion is a major behavioral change which we would not generally 
  do on a maintenance release, but is being done in the interest of 
  restoring compatibility with the large number of existing Open MPI (and 
  other MPI flavors) and job scripts that exist in production, and to 
  remove what has proven to be a significant hurdle in moving to the new 
  release.
  Please note that one change to step launch remains - by default, in 
  20.11 steps are no longer permitted to overlap on the resources they 
  have been assigned. If that behavior is desired, all steps must 
  explicitly opt-in through the newly added '--overlap' option.
  Further details and a full explanation of the issue can be found at:
  https://bugs.schedmd.com/show_bug.cgi?id=10383#c63
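
As a rough illustration of the step-launch semantics described above, the
sketch below prints the srun invocations a batch script might use with
20.11.3. The `run` helper, the DRY_RUN variable, and the program names
(`./solver`, `./monitor`, `./worker`) are hypothetical; only the --overlap
and --exclusive options come from the text.

```shell
#!/bin/sh
# Hypothetical dry-run wrapper: prints each command unless DRY_RUN=0,
# so the sketch can be read (and run) without a Slurm cluster.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# 20.11.3 default: the step can use every resource the job holds on the node.
run srun -n 4 ./solver

# Steps no longer overlap by default; steps that should share resources
# must all opt in with --overlap.
run srun --overlap -n 4 ./solver &
run srun --overlap -n 1 ./monitor &
wait

# --exclusive limits a step to exactly what it requests
# (the text equates this with the 20.11.0-20.11.2 default).
run srun --exclusive -n 2 ./worker
```

Setting DRY_RUN=0 inside a real allocation would execute the commands instead
of printing them.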

OBS-URL: https://build.opensuse.org/request/show/864993
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=171
2021-01-20 13:58:46 +00:00
Dominique Leuenberger
9123a17403 Accepting request 861777 from network:cluster
- Fix fallout introduced by:
  "Replace  '%service_del_postun -n' with '%service_del_postun_without_restart'"
  for older Leap/SLE versions. (forwarded request 861776 from eeich)

OBS-URL: https://build.opensuse.org/request/show/861777
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=54
2021-01-10 18:43:37 +00:00
82c61d739d Accepting request 861776 from home:eeich:branches:network:cluster
- Fix fallout introduced by:
  "Replace  '%service_del_postun -n' with '%service_del_postun_without_restart'"
  for older Leap/SLE versions.

OBS-URL: https://build.opensuse.org/request/show/861776
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=169
2021-01-08 17:40:48 +00:00
Dominique Leuenberger
7cf27bf750 Accepting request 861655 from network:cluster
- Fix Provides:/Conflicts: for libnss_slurm.

- Replace  '%service_del_postun -n' with '%service_del_postun_without_restart'
  '-n' is deprecated and will be removed in the future.

OBS-URL: https://build.opensuse.org/request/show/861655
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=53
2021-01-08 16:39:19 +00:00
0d02ad4cfa - Fix Provides:/Conflicts: for libnss_slurm.
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=167
2021-01-08 12:21:49 +00:00
c50d4048dc Accepting request 845752 from home:fbui:branches:network:cluster
- Replace  '%service_del_postun -n' with '%service_del_postun_without_restart'
  '-n' is deprecated and will be removed in the future.

OBS-URL: https://build.opensuse.org/request/show/845752
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=166
2021-01-08 12:18:52 +00:00
Dominique Leuenberger
a8ec215de5 Accepting request 860691 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/860691
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=52
2021-01-06 18:57:04 +00:00
Ana Guerrero
08c7233b38 Accepting request 860690 from home:anag:branches:network:cluster
- Add support for configuration files from external plugins. 
  While built-in plugins have their configuration added in slurm.conf,
  external SPANK plugins add their configuration to plugstack.conf
  To allow easy packaging of SPANK plugins, their configuration files
  should be added independently at /etc/spack/plugstack.conf.d and
  plugstack.conf should be left with a one-liner including all the
  files under /etc/spack/plugstack.conf.d
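
The layout described above can be sketched as follows. This is a minimal
illustration in a scratch directory, assuming the `include` directive of
plugstack.conf with a glob pattern; the plugin name `example_plugin.so` and
the drop-in file name are hypothetical, and the real directory is the one
named in the entry above.

```shell
#!/bin/sh
set -eu
# Scratch directory standing in for the real configuration directory.
root=$(mktemp -d)
mkdir -p "$root/plugstack.conf.d"

# plugstack.conf itself stays a one-liner that pulls in every drop-in file.
printf 'include %s/plugstack.conf.d/*\n' "$root" > "$root/plugstack.conf"

# Each plugin package ships its own drop-in (hypothetical plugin name).
printf 'optional example_plugin.so\n' > "$root/plugstack.conf.d/example.conf"

cat "$root/plugstack.conf"
ls "$root/plugstack.conf.d"
```

With this split, installing or removing a plugin package only adds or removes
one file and never edits plugstack.conf itself.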

OBS-URL: https://build.opensuse.org/request/show/860690
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=164
2021-01-06 10:42:08 +00:00
Dominique Leuenberger
3f82fae399 Accepting request 859115 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/859115
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=51
2020-12-29 14:52:58 +00:00
Ana Guerrero
caa18eaeaa Accepting request 859114 from home:anag:branches:network:cluster
- Update to 20.11.02 
  * Fix older versions of sacct not working with 20.11.
  * Fix slurmctld crash when using a pre-20.11 srun in a job allocation.
  * Correct logic problem in _validate_user_access.
  * Fix libpmi to initialize Slurm configuration correctly.
- Update to 20.11.01
  * Fix spelling of "overcomited" to "overcomitted" in sreport's cluster
    utilization report.
  * Silence debug message about shutting down backup controllers if none are
    configured.
  * Don't create interactive srun until PrologSlurmctld is done.
  * Fix fd symlink path resolution.
  * Fix slurmctld segfault on subnode reservation restore after node
    configuration change.
  * Fix resource allocation response message environment allocation size.
  * Ensure that details->env_sup is NULL terminated.
  * select/cray_aries - Correctly remove jobs/steps from blades using NPC.
  * cons_tres - Avoid max_node_gres when entire node is allocated with
    --ntasks-per-gpu.
  * Allow NULL arg to data_get_type().
  * In sreport have usage for a reservation contain all jobs that ran in the
    reservation instead of just the ones that ran in the time specified. This
    matches how the report for the reservation is not truncated for a time period.
  * Fix issue with sending wrong batch step id to a < 20.11 slurmd.
  * Add a job's alloc_node to lua for job modification and completion.
  * Fix regression getting a slurmdbd connection through the perl API.
  * Stop the extern step terminate monitor right after proctrack_g_wait().
  * Fix removing the normalized priority of assocs.
  * slurmrestd/v0.0.36 - Use correct name for partition field:
    "min nodes per job" -> "min_nodes_per_job".

OBS-URL: https://build.opensuse.org/request/show/859114
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=162
2020-12-29 03:15:30 +00:00
Dominique Leuenberger
72671d260f Accepting request 853268 from network:cluster
- Update to version 20.11.0
  Slurm 20.11 includes a number of new features including:
  * Overhaul of the job step management and launch code, alongside improved
    GPU task placement support.
  * A new "Interactive Step" mode of operation for salloc.
  * A new "scrontab" command that can be used to submit and manage
    periodically repeating jobs.
  * IPv6 support.
  * Changes to the reservation logic, with new options allowing users
    to delete reservations, allowing admins to skip the next occurrence of a
    repeated reservation, and allowing for a job to be submitted and eligible
    to run within multiple reservations.
  * Dynamic Future Nodes - automatically associate a dynamically
    provisioned (or "cloud") node against a NodeName definition with matching
    hardware.
  * An experimental new RPC queuing mode for slurmctld to reduce thread
    contention on heavily loaded clusters.
  * SlurmDBD integration with the Slurm REST API.
  Also check
  https://github.com/SchedMD/slurm/blob/slurm-20-11-0-1/RELEASE_NOTES (forwarded request 852039 from eeich)

OBS-URL: https://build.opensuse.org/request/show/853268
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=50
2020-12-05 19:37:37 +00:00
d5d3aa2162 Accepting request 852039 from home:eeich:branches:network:cluster
- Update to version 20.11.0
  Slurm 20.11 includes a number of new features including:
  * Overhaul of the job step management and launch code, alongside improved
    GPU task placement support.
  * A new "Interactive Step" mode of operation for salloc.
  * A new "scrontab" command that can be used to submit and manage
    periodically repeating jobs.
  * IPv6 support.
  * Changes to the reservation logic, with new options allowing users
    to delete reservations, allowing admins to skip the next occurrence of a
    repeated reservation, and allowing for a job to be submitted and eligible
    to run within multiple reservations.
  * Dynamic Future Nodes - automatically associate a dynamically
    provisioned (or "cloud") node against a NodeName definition with matching
    hardware.
  * An experimental new RPC queuing mode for slurmctld to reduce thread
    contention on heavily loaded clusters.
  * SlurmDBD integration with the Slurm REST API.
  Also check
  https://github.com/SchedMD/slurm/blob/slurm-20-11-0-1/RELEASE_NOTES

OBS-URL: https://build.opensuse.org/request/show/852039
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=160
2020-12-05 14:46:07 +00:00
Dominique Leuenberger
b50ab109f0 Accepting request 849253 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/849253
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=49
2020-11-19 10:59:53 +00:00
Ana Guerrero
370ac32279 Accepting request 849252 from home:anag:branches:network:cluster
- Updated to 20.02.6, addresses two security fixes:
  * PMIx - fix potential buffer overflows from use of unpackmem().
    CVE-2020-27745 (bsc#1178890)
  * X11 forwarding - fix potential leak of the magic cookie when sent as an
     argument to the xauth command. CVE-2020-27746 (bsc#1178891)
- And many other bugfixes, full log and details available at:
  * https://lists.schedmd.com/pipermail/slurm-announce/2020/000045.html

OBS-URL: https://build.opensuse.org/request/show/849252
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=158
2020-11-18 09:57:56 +00:00
Dominique Leuenberger
c14773b6e9 Accepting request 845437 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/845437
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=48
2020-11-03 14:22:10 +00:00
e481851f5a Accepting request 845108 from home:anag:branches:network:cluster
- Updated to 20.02.5, changes:
 * Fix leak of TRESRunMins when job time is changed with --time-min
 * pam_slurm - explicitly initialize slurm config to support configless mode.
 * scontrol - Fix exit code when creating/updating reservations with wrong
   Flags.
 * When a GRES has a no_consume flag, report 0 for allocated.
 * Fix cgroup cleanup by jobacct_gather/cgroup.
 * When creating reservations/jobs don't allow counts on a feature unless
   using an XOR.
 * Improve number of boards discovery
 * Fix updating a reservation NodeCnt on a zero-count reservation.
 * slurmrestd - provide explicit error messages when PSK auth fails.
 * cons_tres - fix job requesting single gres per-node getting two or more
   nodes with less CPUs than requested per-task.
 * cons_tres - fix calculation of cores when using gres and cpus-per-task.
 * cons_tres - fix job not getting access to socket without GPU or with less
   than --gpus-per-socket when not enough cpus available on required socket
   and not using --gres-flags=enforce-binding.
 * Fix HDF5 type version build error.
 * Fix creation of CoreCnt only reservations when the first node isn't
   available.
 * Fix wrong DBD Agent queue size in sdiag when using accounting_storage/none.
 * Improve job constraints XOR option logic.
 * Fix preemption of hetjobs when needed nodes not in leader component.
 * Fix wrong bit_or() messing potential preemptor jobs node bitmap, causing
   bad node deallocations and even allocation of nodes from other partitions.
 * Fix double-deallocation of preempted non-leader hetjob components.
 * slurmdbd - prevent truncation of the step nodelists over 4095.
 * Fix nodes remaining in drain state after rebooting with ASAP option.
 - changes from 20.02.4:

OBS-URL: https://build.opensuse.org/request/show/845108
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=156
2020-11-02 13:42:03 +00:00
Dominique Leuenberger
b0358c26fb Accepting request 819285 from network:cluster
- Add support for openPMIx also for Leap/SLE 15.0/1 (bsc#1173805).
- Do not run %check on SLE-12-SP2: Some incompatibility in tcl
  makes this fail.
- Remove unneeded build dependency to postgresql-devel.
- Disable build on s390 (requires 64bit).

OBS-URL: https://build.opensuse.org/request/show/819285
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=47
2020-07-08 17:16:29 +00:00
e3512185d8 - Disable build on s390 (requires 64bit).
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=154
2020-07-07 20:14:00 +00:00
361d99b111 - Add support for openPMIx also for Leap/SLE 15.0/1 (bsc#1173805).
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=153
2020-07-07 16:20:06 +00:00
4b04d88697 Accepting request 819233 from home:eeich:branches:network:cluster
- Add support for openPMIx also for Leap/SLE 15.0/1.
- Do not run %check on SLE-12-SP2: Some incompatibility in tcl
  makes this fail.
- Remove unneeded build dependency to postgresql-devel.

OBS-URL: https://build.opensuse.org/request/show/819233
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=152
2020-07-07 13:08:10 +00:00
Dominique Leuenberger
a8d157b928 Accepting request 815491 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/815491
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=46
2020-07-02 22:07:05 +00:00
e8d4b0e920 Accepting request 811475 from home:eeich:branches:network:cluster
- Bring QA to the package build: add %%check stage.
- Remove cruft that isn't needed any longer.
- Add 'ghosted' run-file.
- Add rpmlint filter to handle issues with library packages
  for Leap and enterprise upgrade versions.

- Treat libnss_slurm like any other package: add version string to
  upgrade package.

OBS-URL: https://build.opensuse.org/request/show/811475
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=150
2020-06-17 11:15:39 +00:00
Yuchen Lin
5da789b9e4 Accepting request 808569 from network:cluster
- Updated to 20.02.3 which fixes CVE-2020-12693 (bsc#1172004).
- Other changes are:
 * Factor in ntasks-per-core=1 with cons_tres.
 * Fix formatting in error message in cons_tres.
 * Fix calling stat on a NULL variable.
 * Fix minor memory leak when using reservations with flags=first_cores.
 * Fix gpu bind issue when CPUs=Cores and ThreadsPerCore > 1 on a node.
 * Fix --mem-per-gpu for heterogeneous --gres requests.
 * Fix slurmctld load order in load_all_part_state().
 * Fix race condition not finding jobacct gather task cgroup entry.
 * Suppress error message when selecting nodes on disjoint topologies.
 * Improve performance of _pack_default_job_details() with large number of job
   arguments.
 * Fix archive loading previous to 17.11 jobs per-node req_mem.
 * Fix regression validating that --gpus-per-socket requires --sockets-per-node
   for steps. Should only validate allocation requests.
 * error() instead of fatal() when parsing an invalid hostlist.
 * nss_slurm - fix potential deadlock in slurmstepd on overloaded systems.
 * cons_tres - fix --gres-flags=enforce-binding and related --cpus-per-gres.
 * cons_tres - Allocate lowest numbered cores when filtering cores with gres.
 * Fix getting system counts for named GRES/TRES.
 * MySQL - Fix for handling typed GRES for association rollups.
 * Fix step allocations when tasks_per_core > 1.
 * Fix allocating more GRES than requested when asking for multiple GRES types.

OBS-URL: https://build.opensuse.org/request/show/808569
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=45
2020-05-26 15:21:07 +00:00
85a31ae1b5 - Updated to 20.02.3 which fixes CVE-2020-12693 (bsc#1172004).
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=148
2020-05-25 05:01:16 +00:00
6f1a2e50da Accepting request 808130 from home:mslacken:branches:network:cluster
- Updated to 20.02.3 which fixes CVE-2020-12693
- Other changes are:
 * Factor in ntasks-per-core=1 with cons_tres.
 * Fix formatting in error message in cons_tres.
 * Fix calling stat on a NULL variable.
 * Fix minor memory leak when using reservations with flags=first_cores.
 * Fix gpu bind issue when CPUs=Cores and ThreadsPerCore > 1 on a node.
 * Fix --mem-per-gpu for heterogeneous --gres requests.
 * Fix slurmctld load order in load_all_part_state().
 * Fix race condition not finding jobacct gather task cgroup entry.
 * Suppress error message when selecting nodes on disjoint topologies.
 * Improve performance of _pack_default_job_details() with large number of job
   arguments.
 * Fix archive loading previous to 17.11 jobs per-node req_mem.
 * Fix regression validating that --gpus-per-socket requires --sockets-per-node
   for steps. Should only validate allocation requests.
 * error() instead of fatal() when parsing an invalid hostlist.
 * nss_slurm - fix potential deadlock in slurmstepd on overloaded systems.
 * cons_tres - fix --gres-flags=enforce-binding and related --cpus-per-gres.
 * cons_tres - Allocate lowest numbered cores when filtering cores with gres.
 * Fix getting system counts for named GRES/TRES.
 * MySQL - Fix for handling typed GRES for association rollups.
 * Fix step allocations when tasks_per_core > 1.
 * Fix allocating more GRES than requested when asking for multiple GRES types.

OBS-URL: https://build.opensuse.org/request/show/808130
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=147
2020-05-22 09:31:56 +00:00
Dominique Leuenberger
5a31b5df57 Accepting request 788917 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/788917
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=44
2020-03-27 20:58:27 +00:00
8ae99b8cc0 Accepting request 788905 from home:mslacken:branches:network:cluster
- Updated to 20.02.1 with the following changes:
 * Improve job state reason for jobs hitting partition_job_depth.
 * Speed up testing of singleton dependencies.
 * Fix negative loop bound in cons_tres.
 * srun - capture the MPI plugin return code from mpi_hook_client_fini() and
   use as final return code for step failure.
 * Fix segfault in cli_filter/lua.
 * Fix --gpu-bind=map_gpu reusability if tasks > elements.
 * Make sure config_flags on a gres are sent to the slurmctld on node
   registration.
 * Prolog/Epilog - Fix missing GPU information.
 * Fix segfault when using config parser for expanded lines.
 * Fix bit overlap test function.
 * Don't accrue time if job begin time is in the future.
 * Remove accrue time when updating a job start/eligible time to the future.
 * Fix regression in 20.02.0 that broke --depend=expand.
 * Reset begin time on job release if it's not in the future.
 * Fix for recovering burst buffers when using high-availability.
 * Fix invalid read due to freeing an incorrectly allocated env array.
 * Update slurmctld -i message to warn about losing data.
 * Fix scontrol cancel_reboot so it clears the DRAIN flag and node reason for a
   pending ASAP reboot.

OBS-URL: https://build.opensuse.org/request/show/788905
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=145
2020-03-27 08:46:13 +00:00
Dominique Leuenberger
f1816bc1fc Accepting request 784699 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/784699
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=43
2020-03-14 08:56:29 +00:00
efb023382f Accepting request 783058 from home:eeich:branches:network:cluster
- Remove legacy_cray: with 20.02 the special treatment for
  cray-specific plugins on SLE versions prior to 15SP2 is
  no longer required.

OBS-URL: https://build.opensuse.org/request/show/783058
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=143
2020-03-13 17:33:40 +00:00
Dominique Leuenberger
1721f9e4de Accepting request 781815 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/781815
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=42
2020-03-05 22:23:46 +00:00
cf20470554 Accepting request 781517 from home:mslacken:branches:network:cluster
- slurm-plugins will now also require pmix, not only libpmix
  (bsc#1164326)

OBS-URL: https://build.opensuse.org/request/show/781517
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=141
2020-03-05 10:42:25 +00:00
Dominique Leuenberger
d674bda646 Accepting request 780356 from network:cluster
- Removed autopatch as it doesn't work for the SLE-11-SP4 build.

- pmix searches now also for libpmix.so.2 so that there is no dependency
  for devel package (bsc#1164386)
  * added patch file check-for-lipmix.so.MAJOR.patch
  * reworded patch file Remove-rpath-from-build.patch to use %autopatch (forwarded request 780353 from eeich)

OBS-URL: https://build.opensuse.org/request/show/780356
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=41
2020-03-01 20:27:50 +00:00
fd9e32c9b0 Accepting request 780353 from home:eeich:branches:network:cluster
- Removed autopatch as it doesn't work for the SLE-11-SP4 build.

- pmix searches now also for libpmix.so.2 so that there is no dependency
  for devel package (bsc#1164386)
  * added patch file check-for-lipmix.so.MAJOR.patch
  * reworded patch file Remove-rpath-from-build.patch to use %autopatch

OBS-URL: https://build.opensuse.org/request/show/780353
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=139
2020-02-28 17:43:45 +00:00
Dominique Leuenberger
146edf5651 Accepting request 780146 from network:cluster
- Disable %arm builds as this is no longer supported. (forwarded request 780053 from kasimir)

OBS-URL: https://build.opensuse.org/request/show/780146
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=40
2020-02-28 14:21:36 +00:00
6bfc8d389d Accepting request 780053 from home:kasimir:branches:network:cluster
- Disable %arm builds as this is no longer supported.

OBS-URL: https://build.opensuse.org/request/show/780053
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=137
2020-02-28 07:48:43 +00:00
63d5c47eb1 Accepting request 779379 from home:eeich:branches:network:cluster
- Update to version 20.02.0 (jsc#SLE-8491)
  * Fix minor memory leak in slurmd on reconfig.
  * Fix invalid ptr reference when rolling up data in the database.
  * Change shtml2html.py to require python3 for RHEL8 support, and match
    man2html.py.
  * slurm.spec - override "hardening" linker flags to ensure RHEL8 builds
    in a usable manner.
  * Fix type mismatches in the perl API.
  * Prevent use of uninitialized slurmctld_diag_stats.
  * Fixed various Coverity issues.
  * Only show warning about root-less topology in daemons.
  * Fix accounting of jobs in IGNORE_JOBS reservations.
  * Fix issue with batch steps state not loading correctly when upgrading from
    19.05.
  * Deprecate max_depend_depth in SchedulerParameters and move it to
    DependencyParameters.
  * Silence erroneous error on slurmctld upgrade when loading federation state.
  * Break infinite loop in cons_tres dealing with incorrect tasks per tres
    request resulting in slurmctld hang.
  * Improve handling of --gpus-per-task to make sure appropriate number of GPUs
    is assigned to job.
  * Fix seg fault on cons_res when requesting --spread-job.
- Move to python3 for everything but SLE-11-SP4
  * For SLE-11-SP4 add a workaround to handle a python3 script (python2.7
    compliant).

OBS-URL: https://build.opensuse.org/request/show/779379
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=136
2020-02-26 11:12:32 +00:00
e5be8f4bf8 - Add explicit version dependency to libpmix as well.
'slurm-devel' has a tight version dependency on libpmix -
  allowing multiple libpmix versions in one package repository
  is therefore essential.

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=135
2020-02-19 21:31:15 +00:00
f9c5d7da3d Accepting request 774250 from home:eeich:branches:network:cluster
- Update to version 20.02.0-rc1
  * sbatch - fix segfault when no newline at the end of a burst buffer file.
  * Change scancel to only check job's base state when matching -t options.
  * Save job dependency list in state files.
  * cons_tres - allow jobs to be run on systems with root-less topologies.
  * Restore pre-20.02pre1 PrologSlurmctld synchronization behavior to avoid
    various race conditions, and ensure proper batch job launch.
  * Add new slurmrestd command/daemon which implements the Slurm REST API.

- Update to version 20.02.0-0pre1, highlights are

OBS-URL: https://build.opensuse.org/request/show/774250
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=134
2020-02-14 07:52:54 +00:00
37bd1399a2 - set %base_ver for SLE-15-SP2 to 2002.
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=133
2020-02-12 14:08:43 +00:00
dcb09720c2 - Remove man5/cray.* from legacy packages.
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=132
2020-02-11 20:53:34 +00:00
2ba901b4dc - Add mpi_cray_shasta.so to plugins for legacy.
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=131
2020-02-11 20:44:25 +00:00
54640668e5 Accepting request 773459 from home:mslacken:branches:network:cluster
- Updated to version 20.02.0-0pre1.
  Highlights:
 * Exclusive behavior of a node includes all GRES on a node as well
   as the cpus.
 * Use python3 instead of python for internal build/test scripts.
   The slurm.spec file has been updated to depend on python3 as well.
 * Added new NodeSet configuration option to help simplify partition
   configuration sections for heterogeneous / condo-style clusters.
 * Added slurm.conf option MaxDBDMsgs to control how many messages will be
   stored in the slurmctld before throwing them away when the slurmdbd is down.
 * The checkpoint plugin interface and all associated API calls have been
   removed.
 * slurm_init_job_desc_msg() initializes mail_type as uint16_t. This allows
   mail_type to be set to NONE with scontrol.
 * Add new slurm_spank_log() function to print messages back to the user from
   within a SPANK plugin without prepending "error: " from slurm_error().
 * Enforce having partition name and nodelist=ALL when creating reservations
   with flags=PART_NODES.
 * SPANK - removed never-implemented slurm_spank_slurmd_init() interface. This
   hook has always been accessible through slurm_spank_init() in the
   S_CTX_SLURMD context instead.
 * sbcast - add new BcastAddr option to NodeName lines to allow sbcast traffic
   to flow over an alternate network path.
 * Added auth/jwt plugin, and 'scontrol token' subcommand.
 * PMIx - improve performance of proc map generation.
 * Deprecate kill_invalid_depend in SchedulerParameters and move it to a new
   option called DependencyParameters.
 * Enable job dependencies for any job on any cluster in the same federation.
 * Allow clusters to be added automatically to db at startup of ctld.
 * Add AccountingStorageExternalHost slurm.conf parameter.  The

OBS-URL: https://build.opensuse.org/request/show/773459
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=130
2020-02-11 14:31:26 +00:00
Dominique Leuenberger
2d575836ec Accepting request 770336 from network:cluster
- standard slurm.conf now also uses SlurmctldHost on all build
  targets (bsc#1162377)

OBS-URL: https://build.opensuse.org/request/show/770336
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=39
2020-02-06 12:08:18 +00:00
d94a66a178 - standard slurm.conf now also uses SlurmctldHost on all build
targets (bsc#1162377)

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=128
2020-02-05 15:38:55 +00:00
Dominique Leuenberger
07b9288aee Accepting request 767593 from network:cluster
- Fix a missed systemd_requires -> systemd_ordering conversion.

OBS-URL: https://build.opensuse.org/request/show/767593
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=38
2020-01-27 19:17:10 +00:00
17b070147f - Fix a missed systemd_requires -> systemd_ordering conversion.
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=126
2020-01-27 08:54:27 +00:00
Dominique Leuenberger
c06abc983b Accepting request 767017 from network:cluster
- Remove special OHPC compatibility macro: these settings should
  be applied universally.
- Add a Recommends for mariadb to slurm-slurmdbd: it is recommended
  to run the database on the same machine as the daemon. (forwarded request 767005 from eeich)

OBS-URL: https://build.opensuse.org/request/show/767017
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=37
2020-01-25 12:23:59 +00:00
73e298f12f Accepting request 767005 from home:eeich:branches:network:cluster
- Remove special OHPC compatibility macro: these settings should
  be applied universally.
- Add a Recommends for mariadb to slurm-slurmdbd: it is recommended
  to run the database on the same machine as the daemon.

OBS-URL: https://build.opensuse.org/request/show/767005
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=124
2020-01-25 06:14:47 +00:00
345d1bbb94 Accepting request 766872 from home:dimstar:Factory
- BuildRequire pkgconfig(systemd) instead of systemd: allow OBS to
  shortcut through the -mini flavors.
- Use systemd_ordering instead of systemd_requires: systemd is
  never a strict requirement; but in case the system is scheduled
  for installation together with systemd, we want systemd to be
  installed prior to slurm.

- start slurmdbd after mariadb (bsc#1161716)

OBS-URL: https://build.opensuse.org/request/show/766872
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=123
2020-01-24 17:12:50 +00:00
995841bad4 Accepting request 766677 from home:mslacken:branches:network:cluster
- start slurmdbd after mariadb (bsc#1161716)

OBS-URL: https://build.opensuse.org/request/show/766677
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=122
2020-01-23 17:49:33 +00:00
Dominique Leuenberger
33c1c63828 Accepting request 764070 from network:cluster
- Fix base_ver for SLE 15 SP2.

OBS-URL: https://build.opensuse.org/request/show/764070
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=36
2020-01-13 21:22:52 +00:00
c39f0bf6fb - Fix base_ver for SLE 15 SP2.
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=120
2020-01-13 15:42:28 +00:00
Dominique Leuenberger
0e9c3285f6 Accepting request 762780 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/762780
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=35
2020-01-10 16:50:05 +00:00
0581b91660 Accepting request 762650 from home:eeich:branches:network:cluster
- Update to version 19.05.5 (jsc#SLE-8491)
  * Check %docdir/NEWS for details.
  * Includes security fixes CVE-2019-19727, CVE-2019-19728,
    CVE-2019-12838.
  * Disable i586 builds as this is no longer supported.
  * Create libnss_slurm package to support user and group resolution
    thru slurmstepd.
  * slurm-2.4.4-rpath.patch -> Remove-rpath-from-build.patch
    Obsoleted:
    - pam_slurm_adopt-avoid-running-outside-of-the-sshd-PA.patch
    - pam_slurm_adopt-send_user_msg-don-t-copy-undefined-d.patch
    - pam_slurm_adopt-use-uid-to-determine-whether-root-is.patch

OBS-URL: https://build.opensuse.org/request/show/762650
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=118
2020-01-10 10:38:48 +00:00
Dominique Leuenberger
148129d300 Accepting request 761961 from network:cluster
- Deprecate "ControlMachine" only for SLURM version upgrades and
  products newer than 1501. This ensures that the original setting
  is retained for the SLURM version shipped originally with SLE-15-SP1
  or Leap 15.1.

- Update to v18.08.9 for fixing CVE-2019-19728 (bsc#1159692).
  * Wrap END_TIMER{,2,3} macro definition in "do {} while (0)" block.
  * Make sview work with glib2 v2.62.
  * Make Slurm compile on linux after sys/sysctl.h was deprecated.
  * Install slurmdbd.conf.example with 0600 permissions to encourage secure
    use. CVE-2019-19727.
  * srun - do not continue with job launch if --uid fails. CVE-2019-19728.

- Added PMIx support (jsc#SLE-10800).

- Use --with-shared-libslurm to build slurm binaries using libslurm.
- Make libslurm depend on slurm-config.

- Fix ownership of /var/spool/slurm on new installations
  and upgrade (boo#1158696).

- Fix permissions of slurmdbd.conf (bsc#1155784, CVE-2019-19727).
- Fix %posttrans macro _res_update to cope with added newline
  (bsc#1153259).

- Add package slurm-webdoc which sets up a web server to provide
  the documentation for the version shipped.

- Move srun from 'slurm' to 'slurm-node': srun is required on the
  nodes as well so sbatch will work. 'slurm-node' is a requirement (forwarded request 760450 from eeich)

OBS-URL: https://build.opensuse.org/request/show/761961
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=34
2020-01-09 21:50:29 +00:00
69c13014d9 Accepting request 760450 from home:eeich:branches:network:cluster
- Deprecate "ControlMachine" only for SLURM version upgrades and
  products newer than 1501. This ensures that the original setting
  is retained for the SLURM version shipped originally with SLE-15-SP1
  or Leap 15.1.

- Update to v18.08.9 for fixing CVE-2019-19728 (bsc#1159692).
  * Wrap END_TIMER{,2,3} macro definition in "do {} while (0)" block.
  * Make sview work with glib2 v2.62.
  * Make Slurm compile on linux after sys/sysctl.h was deprecated.
  * Install slurmdbd.conf.example with 0600 permissions to encourage secure
    use. CVE-2019-19727.
  * srun - do not continue with job launch if --uid fails. CVE-2019-19728.

- Added PMIx support (jsc#SLE-10800).

- Use --with-shared-libslurm to build slurm binaries using libslurm.
- Make libslurm depend on slurm-config.

- Fix ownership of /var/spool/slurm on new installations
  and upgrade (boo#1158696).

- Fix permissions of slurmdbd.conf (bsc#1155784, CVE-2019-19727).
- Fix %posttrans macro _res_update to cope with added newline
  (bsc#1153259).

- Add package slurm-webdoc which sets up a web server to provide
  the documentation for the version shipped.

- Move srun from 'slurm' to 'slurm-node': srun is required on the
  nodes as well so sbatch will work. 'slurm-node' is a requirement

OBS-URL: https://build.opensuse.org/request/show/760450
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=116
2020-01-08 19:27:10 +00:00
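The slurmdbd.conf permission fix in the entry above (CVE-2019-19727) comes down to keeping the file readable by its owner only. The following is a minimal sketch of such a check, not part of the package itself; the function name `is_owner_only` is hypothetical and the path in the usage note is only an assumed default location.

```python
import os
import stat


def is_owner_only(path: str) -> bool:
    """Return True if neither group nor others have any permission bits set.

    A file with mode 0600 (as the changelog recommends for slurmdbd.conf)
    passes this check; a file with mode 0644 does not.
    """
    mode = stat.S_IMODE(os.stat(path).st_mode)
    # Mask out everything except the group and other permission bits.
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0
```

A caller might run `is_owner_only("/etc/slurm/slurmdbd.conf")` after installation and warn when it returns False; the actual location of the file depends on the build configuration.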
Dominique Leuenberger
9bd54c2a5c Accepting request 734512 from network:cluster
- Set %base_ver for SLE-15-SP2 to 18.08 (for now).

OBS-URL: https://build.opensuse.org/request/show/734512
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=33
2019-10-02 10:00:54 +00:00
163930db89 - Set %base_ver for SLE-15-SP2 to 18.08 (for now).
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=114
2019-10-02 08:27:50 +00:00
Yuchen Lin
fde06d1d7d Accepting request 731005 from network:cluster
- Edit sample configuration to deprecate "ControlMachine",
  "ControlAddr", "BackupController" and "BackupAddr" in favor of
  "SlurmctldHost". (forwarded request 731004 from eeich)

OBS-URL: https://build.opensuse.org/request/show/731005
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=32
2019-09-16 08:52:58 +00:00
e3e7bce7dc Accepting request 731004 from home:eeich:branches:network:cluster
- Edit sample configuration to deprecate "ControlMachine",
  "ControlAddr", "BackupController" and "BackupAddr" in favor of
  "SlurmctldHost".

OBS-URL: https://build.opensuse.org/request/show/731004
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=112
2019-09-14 21:47:11 +00:00
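To illustrate the configuration change described in the entry above, the sketch below rewrites the four deprecated keys into "SlurmctldHost" lines, using the `SlurmctldHost=<host>(<address>)` form with the primary controller listed first. It is an illustration only, not the script used by the package; the function name `to_slurmctld_host` and the sample host names are hypothetical.

```python
import re

# Matches the four deprecated keys at the start of an uncommented line.
_LEGACY = re.compile(
    r"^\s*(ControlMachine|ControlAddr|BackupController|BackupAddr)\s*=\s*(\S+)",
    re.IGNORECASE,
)


def to_slurmctld_host(conf_text: str) -> str:
    """Replace legacy controller keys in slurm.conf text with SlurmctldHost lines."""
    legacy = {}
    kept = []
    for line in conf_text.splitlines():
        match = _LEGACY.match(line)
        if match:
            legacy[match.group(1).lower()] = match.group(2)
        else:
            kept.append(line)

    def entry(host_key, addr_key):
        host = legacy.get(host_key)
        if host is None:
            return None
        addr = legacy.get(addr_key)
        # The address is optional; it is appended in parentheses when present.
        return f"SlurmctldHost={host}({addr})" if addr else f"SlurmctldHost={host}"

    # Order matters: the first SlurmctldHost line names the primary controller.
    new_lines = [
        e
        for e in (
            entry("controlmachine", "controladdr"),
            entry("backupcontroller", "backupaddr"),
        )
        if e
    ]
    return "\n".join(new_lines + kept)
```

For simplicity the new lines are placed at the top of the file rather than at the position of the removed keys, and commented-out legacy keys are left untouched.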
Dominique Leuenberger
a65036f38d Accepting request 724412 from network:cluster
- Fix logic of slurm-munge recommends: slurm-munge requires munge
  already, so if we have munge installed we recommend slurm-munge
  as the authentication when installing slurm or slurm-node.

- Updated to 18.08.8 for fixing (CVE-2019-12838, bsc#1140709, jsc#SLE-7341,
  jsc#SLE-7342)

OBS-URL: https://build.opensuse.org/request/show/724412
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=31
2019-08-19 19:41:36 +00:00
9c7abff085 - Updated to 18.08.8 for fixing (CVE-2019-12838, bsc#1140709, jsc#SLE-7341,
jsc#SLE-7342)

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=110
2019-08-18 20:13:20 +00:00
c0e29e647e - Updated to 18.08.8 for fixing (CVE-2019-12838, bsc#1140709, jsc#SLE-7341,
jsc#SLE-7342)

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=109
2019-08-18 18:46:31 +00:00
f2775f6e1e - Fix logic of slurm-munge recommends: slurm-munge requires munge
already, so if we have munge installed we recommend slurm-munge
  as the authentication when installing slurm or slurm-node.

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=108
2019-08-17 14:25:47 +00:00
Dominique Leuenberger
62722daee0 Accepting request 715614 from network:cluster
removed explanation of changelog entry (forwarded request 715613 from mslacken)

OBS-URL: https://build.opensuse.org/request/show/715614
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=30
2019-07-17 11:20:26 +00:00
89f111874a Accepting request 715613 from home:mslacken:branches:network:cluster
removed explanation of changelog entry

OBS-URL: https://build.opensuse.org/request/show/715613
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=106
2019-07-16 08:32:48 +00:00
5a7922ceef Accepting request 715604 from home:mslacken:branches:network:cluster
- Fixed changelog entry from Jul 11 in order to use the right

OBS-URL: https://build.opensuse.org/request/show/715604
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=105
2019-07-16 08:18:32 +00:00
9d923e48e1 Accepting request 715597 from home:mslacken:branches:network:cluster
- Fixed changelog entry of Jul 11 in order to use the right
  version slurm 18.08.8

- Updated to 18.08.8 for fixing CVE-2019-12838 and (bsc#1140709)

OBS-URL: https://build.opensuse.org/request/show/715597
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=104
2019-07-16 07:57:42 +00:00
Dominique Leuenberger
424501c95a Accepting request 715349 from network:cluster
- Fix build for SLE-11-SP4 and older. (forwarded request 715348 from eeich)

OBS-URL: https://build.opensuse.org/request/show/715349
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=29
2019-07-16 06:41:17 +00:00
f88a1f8e69 Accepting request 715348 from home:eeich:branches:network:cluster
- Fix build for SLE-11-SP4 and older.

OBS-URL: https://build.opensuse.org/request/show/715348
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=102
2019-07-14 21:25:41 +00:00
Dominique Leuenberger
8991e2f1ad Accepting request 714909 from network:cluster
- Added Cray-dependent libraries to a separate package, as they are now
  built, since json is enabled

- Updated to 18.0.7 for fixing CVE-2019-12838 and (bsc#1140709)
  * Update "xauth list" to use the same 10000ms timeout as the other xauth
    commands.
  * Fix issue in gres code to handle a gres cnt of 0.
  * Don't purge jobs if backfill is running.
  * Verify job is pending add/removing accrual time.
  * Don't abort when the job doesn't have an association that was removed
    before the job was able to make it to the database.
  * Set state_reason if select_nodes() fails job for QOS or Account.
  * Avoid seg_fault on referencing association without a valid_qos bitmap.
  * If Association/QOS is removed on a pending job set that job as ineligible.
  * When changing a job's account/qos always make sure you remove the old limits.
  * Don't reset a FAIL_QOS or FAIL_ACCOUNT job reason until the qos or
    account changed.
  * Restore "sreport -T ALL" functionality.
  * Correctly typecast signals being sent through the api.
  * Properly initialize structures throughout Slurm.
  * Sync "numtask" squeue format option for jobs and steps to "numtasks".
  * Fix sacct -PD to avoid CA before start jobs.
  * Fix potential deadlock with backup slurmctld.
  * Fixed issue with jobs not appearing in sacct after dependency satisfied.
  * Fix showing non-eligible jobs when asking with -j and not -s.
  * Fix issue with backfill scheduler scheduling tasks of an array
    when not the head job.
  * accounting_storage/mysql - fix SIGABRT in the archive load logic.
  * accounting_storage/mysql - fix memory leak in the archive load logic.
  * Limit records per single SQL statement when loading archived data. (forwarded request 714908 from mslacken)

OBS-URL: https://build.opensuse.org/request/show/714909
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=28
2019-07-13 11:50:15 +00:00
257676d4f2 Accepting request 714908 from home:mslacken:branches:network:cluster
- added cray depend libraries to separate package, as they are now
  built, since json is enabled

- Updated to 18.08.8 to fix CVE-2019-12838 (bsc#1140709)
  * Update "xauth list" to use the same 10000ms timeout as the other xauth
    commands.
  * Fix issue in gres code to handle a gres cnt of 0.
  * Don't purge jobs if backfill is running.
  * Verify job is pending when adding/removing accrual time.
  * Don't abort when the job doesn't have an association that was removed
    before the job was able to make it to the database.
  * Set state_reason if select_nodes() fails job for QOS or Account.
  * Avoid seg_fault on referencing association without a valid_qos bitmap.
  * If Association/QOS is removed on a pending job set that job as ineligible.
  * When changing a job's account/qos always make sure you remove the old limits.
  * Don't reset a FAIL_QOS or FAIL_ACCOUNT job reason until the qos or
    account changed.
  * Restore "sreport -T ALL" functionality.
  * Correctly typecast signals being sent through the api.
  * Properly initialize structures throughout Slurm.
  * Sync "numtask" squeue format option for jobs and steps to "numtasks".
  * Fix sacct -PD to avoid CA before start jobs.
  * Fix potential deadlock with backup slurmctld.
  * Fixed issue with jobs not appearing in sacct after dependency satisfied.
  * Fix showing non-eligible jobs when asking with -j and not -s.
  * Fix issue with backfill scheduler scheduling tasks of an array
    when not the head job.
  * accounting_storage/mysql - fix SIGABRT in the archive load logic.
  * accounting_storage/mysql - fix memory leak in the archive load logic.
  * Limit records per single SQL statement when loading archived data.

OBS-URL: https://build.opensuse.org/request/show/714908
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=100
2019-07-12 18:09:50 +00:00
fa2138ebce Accepting request 714002 from home:eeich:slurm-staging
- Fix build dependency issue around libibmad-devel introduced
  in SLE-12-SP4.

OBS-URL: https://build.opensuse.org/request/show/714002
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=99
2019-07-08 08:21:33 +00:00
5a25a5ea8b Accepting request 713918 from home:eeich:slurm-staging
- Add BuildRequires to address warnings during build:
  * for libcurl-devel, libssh2-devel and rrdtool-devel
  * for libjson-c-devel and liblz4-devel where available,
    disable these with --without-json and --without-lz4
    where not.
  * disable DataWarp (--without-datawarp).

OBS-URL: https://build.opensuse.org/request/show/713918
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=98
2019-07-08 05:48:14 +00:00
5f6fddfc21 - Remove stray BuildRequires for infiniband-diags-devel
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=97
2019-07-07 19:03:24 +00:00
db5ace2fb9 - Fix test for Factory
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=96
2019-07-07 14:57:15 +00:00
69c4464cd5 - Fix test for oS Factory
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=95
2019-07-07 12:42:55 +00:00
d212ad0245 Accepting request 713773 from home:eeich:branches:network:cluster
- Update SLURM to 18.08.7:
  * Set debug statement to debug2 to avoid benign error messages.
  * Add SchedulerParameters option of bf_hetjob_immediate to attempt to start
    a heterogeneous job as soon as all of its components are determined able
    to do so.
  * Fix underflow causing decay thread to exit.
  * Fix main scheduler not considering hetjobs when building the job queue.
  * Fix regression for sacct to display old jobs without a start time.
  * Fix setting correct number of gres topology bits.
  * Update hetjobs pending state reason when appropriate.
  * Fix accounting_storage/filetxt's understanding of TRES.
  * Set Accrue time when not enforcing limits.
  * Fix srun segfault when requesting a hetjob with test_exec or bcast
    options.
  * Hide multipart priorities log message behind Priority debug flag.
  * sched/backfill - Make hetjobs sensitive to bf_max_job_start.
  * Fix slurmctld segfault due to job's partition pointer NULL dereference.
  * Fix issue with OR'ed job dependencies.
  * Add new job's bit_flags of INVALID_DEPEND to prevent rebuilding a job's
    dependency string when it has at least one invalid and purged dependency.
  * Promote federation unsynced siblings log message from debug to info.
  * burst_buffer/cray - fix slurmctld SIGABRT due to illegal read/writes.
  * burst_buffer/cray - fix memory leak due to unfreed job script content.
  * node_features/knl_cray - fix script_argv use-after-free.
  * burst_buffer/cray - fix script_argv use-after-free.
  * Fix invalid reads of size 1 due to non null-terminated string reads.
  * Add extra debug2 logs to identify why BadConstraints reason is set.

OBS-URL: https://build.opensuse.org/request/show/713773
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=94
2019-07-07 04:27:16 +00:00
0c8ed23dc7 Accepting request 713744 from home:eeich:branches:network:cluster
- Do not build hdf5 support where not available.

OBS-URL: https://build.opensuse.org/request/show/713744
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=93
2019-07-06 20:02:33 +00:00
2536acafc5 Accepting request 713735 from home:eeich:branches:network:cluster
- Add support for version updates on SLE: Packages updated to a
  later version than the one originally supported on SLE
  will receive a version string in their package name.

OBS-URL: https://build.opensuse.org/request/show/713735
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=92
2019-07-06 17:41:00 +00:00
Dominique Leuenberger
aa393a5ef3 Accepting request 706361 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/706361
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=27
2019-06-01 07:55:54 +00:00
4a0199d836 Accepting request 679787 from home:mslacken:slurm18
- added the hdf5 job data gathering plugin

OBS-URL: https://build.opensuse.org/request/show/679787
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=90
2019-05-29 15:15:25 +00:00
Stephan Kulow
3f77b9a7fc Accepting request 670636 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/670636
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=26
2019-02-02 20:50:18 +00:00
2b7d9f397e Accepting request 670635 from home:eeich:branches:network:cluster
- Add backward compatibility with SLE-11 SP4

OBS-URL: https://build.opensuse.org/request/show/670635
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=88
2019-02-01 19:44:10 +00:00
Dominique Leuenberger
d8bad37648 Accepting request 670462 from network:cluster
- Update to version 18.08.05-2:
  This version obsoletes:
  Fix-contrib-perlapi-to-build-with-the-fix-for-CVE-2019-6438-750cc23ed.patch
- Fix spec file for older SUSE versions.

- Update to version 18.08.05:
  * Add mitigation for a potential heap overflow on 32-bit systems in xmalloc.
    (CVE-2019-6438, bsc#1123304).
  * Other fixes:
    + Backfill - If a job has a time_limit guess the end time of a job better
      if OverTimeLimit is Unlimited.
    + Fix "sacctmgr show events event=cluster"
    + Fix sacctmgr show runawayjobs from sibling cluster
    + Avoid bit offset of -1 in call to bit_nclear().
    + Ensure that "hbm" is a configured GresType on knl systems.
    + Fix NodeFeaturesPlugins=node_features/knl_generic to allow gres
      other than knl.
    + cons_res: Prevent overflow on multiply.
    + Better debug for bad values in gres.conf.
    + Fix double accounting of energy at end of job.
    + Read gres.conf for cloud nodes on slurmctld.
    + Don't assume the first node of a job is the batch host when purging jobs
      from a node.
    + Better debugging when a job doesn't have a job_resrcs ptr.
    + Store ave watts in energy plugins.
    + Add XCC plugin for reading Lenovo Power.
    + Fix minor memory leak when scheduling rebootable nodes.
    + Fix debug2 prefix for sched log.
    + Fix printing correct SLURM_JOB_ACCOUNT_PACK_GROUP_* in env for a Het Job.
    + sbatch - search current working directory first for job script.

OBS-URL: https://build.opensuse.org/request/show/670462
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=25
2019-02-01 10:48:34 +00:00
a857bd00b6 - Fix build.
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=86
2019-01-31 21:19:18 +00:00
acb7e0505a - Update to version 18.08.05-2:
  This version obsoletes:
  Fix-contrib-perlapi-to-build-with-the-fix-for-CVE-2019-6438-750cc23ed.patch
- Fix spec file for older SUSE versions.

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=85
2019-01-31 20:33:20 +00:00
2ff256ff3d - Structural fixes to build on older openSUSE and SLE versions.
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=84
2019-01-31 20:14:27 +00:00
c9da5cd5a9 Accepting request 670322 from home:eeich:branches:network:cluster
- Update to version 18.08.05:
  * Add mitigation for a potential heap overflow on 32-bit systems in xmalloc.
    (CVE-2019-6438, bsc#1123304).
  * Other fixes:
    + Backfill - If a job has a time_limit guess the end time of a job better
      if OverTimeLimit is Unlimited.
    + Fix "sacctmgr show events event=cluster"
    + Fix sacctmgr show runawayjobs from sibling cluster
    + Avoid bit offset of -1 in call to bit_nclear().
    + Ensure that "hbm" is a configured GresType on knl systems.
    + Fix NodeFeaturesPlugins=node_features/knl_generic to allow gres
      other than knl.
    + cons_res: Prevent overflow on multiply.
    + Better debug for bad values in gres.conf.
    + Fix double accounting of energy at end of job.
    + Read gres.conf for cloud nodes on slurmctld.
    + Don't assume the first node of a job is the batch host when purging jobs
      from a node.
    + Better debugging when a job doesn't have a job_resrcs ptr.
    + Store ave watts in energy plugins.
    + Add XCC plugin for reading Lenovo Power.
    + Fix minor memory leak when scheduling rebootable nodes.
    + Fix debug2 prefix for sched log.
    + Fix printing correct SLURM_JOB_ACCOUNT_PACK_GROUP_* in env for a Het Job.
    + sbatch - search current working directory first for job script.
    + Make it so held jobs reset the AccrueTime and do not count against any
      AccrueTime limits.
    + Add SchedulerParameters option of bf_hetjob_prio=[min|avg|max] to alter
      the job sorting algorithm for scheduling heterogeneous jobs.
    + Fix initialization of assoc_mgr_locks and slurmctld_locks lock

OBS-URL: https://build.opensuse.org/request/show/670322
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=83
2019-01-31 11:56:59 +00:00
Dominique Leuenberger
74b4d5ddb3 Accepting request 663813 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/663813
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=24
2019-01-21 09:47:31 +00:00
364aa9908a Accepting request 663733 from home:mslacken:slurm18
- Update to 18.08.04, with following highlights
  * Fix message sent to user to display preempted instead of time limit when
    a job is preempted.
  * Fix memory leak when a failure happens processing a nodes gres config.
  * Improve error message when failures happen processing a nodes gres config.
  * Don't skip jobs in scontrol hold.
  * Allow --cpu-bind=verbose to be used with SLURM_HINT environment variable.
  * Enhanced handling for runaway jobs
  * cons_res: Delay exiting cr_job_test until after cores/cpus are calculated
    and distributed.
  * Don't check existence of srun --prolog or --epilog executables when set to
    "none" and SLURM_TEST_EXEC is used.
  * Add "P" suffix support to job and step tres specifications.
  * Fix jobacct_gather/cgroup to work correctly when more than one task is
    started on a node.
  * salloc - set SLURM_NTASKS_PER_CORE and SLURM_NTASKS_PER_SOCKET in the
    environment if the corresponding command line options are used.
  * slurmd - fix handling of the -f flag to specify alternate config file
    locations.
  * Add SchedulerParameters option of bf_ignore_newly_avail_nodes to avoid
    scheduling lower priority jobs on resources that become available during
    the backfill scheduling cycle when bf_continue is enabled.
  * job_submit/lua: Add several slurmctld return codes and add user/group info
  * salloc/sbatch/srun - print warning if mutually exclusive options of --mem
    and --mem-per-cpu are both set.
- Refreshed:
  * pam_slurm_adopt-avoid-running-outside-of-the-sshd-PA.patch

OBS-URL: https://build.opensuse.org/request/show/663733
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=81
2019-01-08 19:05:14 +00:00
Dominique Leuenberger
5f5fe54c27 Accepting request 657426 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/657426
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=23
2018-12-12 16:31:01 +00:00
9eefc8e774 Accepting request 657422 from home:mslacken:slurm18
- restarting services on update only when activated 
- added rotation of logs
- Added backported patches which harden the pam module pam_slurm_adopt
  (BOO#1116758) which will be in slurm 19.05.x
  * added pam_slurm_adopt-avoid-running-outside-of-the-sshd-PA.patch
    [PATCH 1/3] pam_slurm_adopt: avoid running outside of the sshd PAM
  * added pam_slurm_adopt-send_user_msg-don-t-copy-undefined-d.patch
    [PATCH 2/3] pam_slurm_adopt: send_user_msg: don't copy undefined data
  * added pam_slurm_adopt-use-uid-to-determine-whether-root-is.patch
    [PATCH 3/3] pam_slurm_adopt: use uid to determine whether root is
    logging on
- package slurm-pam_slurm now depends on slurm-node and not on slurm

OBS-URL: https://build.opensuse.org/request/show/657422
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=79
2018-12-12 09:28:26 +00:00
Dominique Leuenberger
d456564fc2 Accepting request 655559 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/655559
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=22
2018-12-07 13:34:03 +00:00
8ddf42df7f Accepting request 655364 from home:mslacken:slurm18
- fixed code in %pretrans section to be compatible with lua 5.1

OBS-URL: https://build.opensuse.org/request/show/655364
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=77
2018-12-06 09:50:36 +00:00
Dominique Leuenberger
ce6f6d350e Accepting request 653720 from network:cluster
Automatic submission by obs-autosubmit

OBS-URL: https://build.opensuse.org/request/show/653720
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=21
2018-12-04 19:57:26 +00:00
f21d191e3c Accepting request 650545 from home:eeich:branches:network:cluster
- Added missing perl-base dependency.

- Moved HTML docs to doc package.

- Moved config man pages to a separate package: This way, they won't
  get installed on compute nodes.

- Update to 18.08.3
  * Add new burst buffer state of "teardown-fail" to indicate the burst
    buffer teardown operation is failing on specific buffers.
  * Multiple backup slurmctld daemons can be configured
  * Enable jobs with zero node count for creation and/or deletion of persistent
    burst buffers.
  * Add "scontrol show dwstat" command to display Cray burst buffer status.
  * Add "GetSysStatus" option to burst_buffer.conf file.
  * Add node and partition configuration options of "CpuBind" to control
    default task binding.
  * Add "NumaCpuBind" option to knl.conf
  * Add sbatch "--batch" option to identify features required on batch node.
  * Add "BatchFeatures" field to output of "scontrol show job".
  * Add support for "--bb" option to sbatch command.
  * Add new SystemComment field to job data structure and database.
  * Expand reservation "flags" field from 32 to 64 bits.
  * Add job state flag of "SIGNALING" to avoid race condition.
  * Properly handle srun --will-run option when there are jobs in COMPLETING
    state.
  * Properly report who is signaling a step.
  * Don't combine updated reservation records in sreport's reservation report.
  * node_features plugin - Add support for XOR & XAND of job constraints (node
    feature specifications).

OBS-URL: https://build.opensuse.org/request/show/650545
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=75
2018-11-20 17:07:44 +00:00
Dominique Leuenberger
86c9afa17d Accepting request 639245 from network:cluster
- Move config man-pages to config package. (forwarded request 639244 from eeich)

OBS-URL: https://build.opensuse.org/request/show/639245
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=20
2018-10-01 07:08:06 +00:00
2390a20289 Accepting request 639244 from home:eeich:branches:network:cluster
- Move config man-pages to config package.

OBS-URL: https://build.opensuse.org/request/show/639244
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=73
2018-09-30 15:33:20 +00:00
Dominique Leuenberger
d570b4c591 Accepting request 637642 from network:cluster
- added correct link flags for perl bindings (bsc#1108671)
  * added correct linker search path in slurm-2.4.4-rpath.patch
  * perl:Switch is required by slurm torque wrappers

OBS-URL: https://build.opensuse.org/request/show/637642
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=19
2018-09-25 13:42:36 +00:00
Dominique Leuenberger
e67629e7ac Accepting request 637167 from network:cluster
- Fix Requires(pre) and Requires(post) for slurm-config and slurm-node.
  This fixes issues with failing slurm user creation when installed
  during initial system installation (bsc#1109373). (forwarded request 637165 from eeich)

OBS-URL: https://build.opensuse.org/request/show/637167
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=18
2018-09-24 11:13:28 +00:00
39fedd2ce8 - added correct link flags for perl bindings (bsc#1108671)
  * added correct linker search path in slurm-2.4.4-rpath.patch
  * perl:Switch is required by slurm torque wrappers

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=70
2018-09-24 09:37:13 +00:00
410ad28aca Accepting request 637165 from home:eeich:branches:network:cluster
- Fix Requires(pre) and Requires(post) for slurm-config and slurm-node.
  This fixes issues with failing slurm user creation when installed
  during initial system installation (bsc#1109373).

OBS-URL: https://build.opensuse.org/request/show/637165
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=69
2018-09-22 07:50:55 +00:00
Dominique Leuenberger
787af67337 Accepting request 631120 from network:cluster
- slurmctld-rerun-agent_init-when-backup-controller-takes-over.patch:
  Fix race in the slurmctld backup controller which prevents it

OBS-URL: https://build.opensuse.org/request/show/631120
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=17
2018-08-24 15:11:07 +00:00
dbb82d64bd - slurmctld-rerun-agent_init-when-backup-controller-takes-over.patch:
  Fix race in the slurmctld backup controller which prevents it

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=67
2018-08-23 13:54:53 +00:00
Dominique Leuenberger
86275b2ca6 Accepting request 629227 from network:cluster
- slurmctld-rerun-agent_init-when-backup-controller-takes-over.patch
  Fix an issue where the fallback controller will not be able to idle
  nodes after a failover when a process has terminated (bsc#1084917). (forwarded request 629226 from eeich)

OBS-URL: https://build.opensuse.org/request/show/629227
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=16
2018-08-17 22:02:08 +00:00
fafb5a0196 Accepting request 629226 from home:eeich:branches:network:cluster
- slurmctld-rerun-agent_init-when-backup-controller-takes-over.patch
  Fix an issue where the fallback controller will not be able to idle
  nodes after a failover when a process has terminated (bsc#1084917).

OBS-URL: https://build.opensuse.org/request/show/629226
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=65
2018-08-14 13:18:35 +00:00
d5a2e95d8c Accepting request 629222 from home:eeich:branches:network:cluster
- Update to 17.11.9
  * Fix segfault in slurmctld when a job's node bitmap is NULL during a
    scheduling cycle.  Primarily caused by EnforcePartLimits=ALL.
  * Remove erroneous unlock in acct_gather_energy/ipmi.
  * Enable support for hwloc version 2.0.1.
  * Fix 'srun -q' (--qos) option handling.
  * Fix socket communication issue that can lead to lost task completion
    messages, which will cause a permanently stuck srun process.
  * Handle creation of TMPDIR if environment variable is set or changed in
    a task prolog script.
  * Avoid node layout fragmentation if running with a fixed CPU count but
    without Sockets and CoresPerSocket defined.
  * burst_buffer/cray - Fix datawarp swap default pool overriding jobdw.
  * Fix incorrect job priority assignment for multi-partition job with
    different PriorityTier settings on the partitions.
  * Fix sinfo to print correct node state.

- When using a remote shared StateSaveLocation, slurmctld needs to
  be started after remote filesystems have become available.
  Add 'remote-fs.target' to the 'After=' directive in slurmctld.service
  (boo#1103561).

- Update to 17.11.8
  * Fix incomplete RESPONSE_[RESOURCE|JOB_PACK]_ALLOCATION building path.
  * Do not allocate nodes that were marked down due to the node not responding
    by ResumeTimeout.
  * task/cray plugin - search for "mems" cgroup information in the file
    "cpuset.mems" then fall back to the file "mems".
  * Fix ipmi profile debug uninitialized variable.
  * PMIx: fixed the direct connect inline msg sending.

OBS-URL: https://build.opensuse.org/request/show/629222
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=64
2018-08-14 13:00:16 +00:00
Dominique Leuenberger
1a766a5938 Accepting request 622077 from network:cluster
- Shield comments between script snippets with a %{!?nil:...} to
  avoid them being interpreted as scripts - in which case the update
  level is passed as argument (see chapter 'Shared libraries' in:
  https://en.opensuse.org/openSUSE:Packaging_scriptlet_snippets)
  (bsc#1100850).

OBS-URL: https://build.opensuse.org/request/show/622077
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=15
2018-07-13 08:20:52 +00:00
62ef6634bc - Shield comments between script snippets with a %{!?nil:...} to
  avoid them being interpreted as scripts - in which case the update
  level is passed as argument (see chapter 'Shared libraries' in:
  https://en.opensuse.org/openSUSE:Packaging_scriptlet_snippets)
  (bsc#1100850).

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=62
2018-07-11 12:08:06 +00:00
Yuchen Lin
502ac7ba66 Accepting request 616050 from network:cluster
- Update from 17.11.5 to 17.11.7
- Fix security issue in handling of username and gid fields
  (CVE-2018-10995, bsc#1095508), which required an
  update from 17.11.5 to 17.11.7.
  Highlights of 17.11.6:
  * CRAY - Add slurmsmwd to the contribs/cray dir
  * PMIX - Added the direct connect authentication.
  * Prevent the backup slurmctld from losing the active/available node
    features list on takeover.
  * Be able to force power_down of cloud node even if in power_save state.
  * Allow cloud nodes to be recognized in Slurm when booted out of band.
  * Numerous fixes - check 'NEWS' file.
  Highlights of 17.11.7:
  * Notify srun and ctld when unkillable stepd exits.
  * Numerous fixes - check 'NEWS' file.
- Add: slurmsmwd-uses-xdaemon_-for-systemd.patch
  * Fixes daemonization in newly introduced slurmsmwd daemon.
- Rename:
  split-xdaemon-in-xdaemon_init-and-xdaemon_finish-for-systemd-compatibilty.patch
  to split-xdaemon-in-xdaemon_init-and-xdaemon_finish-for.patch
  * remain in sync with commit messages which introduced that file

OBS-URL: https://build.opensuse.org/request/show/616050
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=14
2018-06-13 13:39:46 +00:00
1337fac8b2 - Add: slurmsmwd-uses-xdaemon_-for-systemd.patch
  * Fixes daemonization in newly introduced slurmsmwd daemon.
- Rename:
  split-xdaemon-in-xdaemon_init-and-xdaemon_finish-for-systemd-compatibilty.patch
  * remain in sync with commit messages which introduced that file

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=60
2018-06-11 14:22:14 +00:00
3e1fd5dae9 Accepting request 616031 from home:mslacken
- Fix security issue in handling of username and gid fields
  (CVE-2018-10995, bsc#1095508), which required an
  update from 17.11.5 to 17.11.7.
- renamed split-xdaemon-in-xdaemon_init-and-xdaemon_finish-for-systemd-compatibilty.patch
  to split-xdaemon-in-xdaemon_init-and-xdaemon_finish-for.patch
  in order to be in sync with commit messages which introduced that file

OBS-URL: https://build.opensuse.org/request/show/616031
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=59
2018-06-11 14:18:08 +00:00
55d6d2b0c7 Accepting request 615950 from home:mslacken
- Fix security issue in handling of username and gid fields
  (CVE-2018-10995), which required an update from 17.11.5 to 17.11.7.
- Update from 17.11.5 to 17.11.7
  Highlights of 17.11.6:
  * CRAY - Add slurmsmwd to the contribs/cray dir
  * PMIX - Added the direct connect authentication.
  * Prevent the backup slurmctld from losing the active/available node
    features list on takeover.
  * Be able to force power_down of cloud node even if in power_save state.
  * Allow cloud nodes to be recognized in Slurm when booted out of band.
  * Numerous fixes - check 'NEWS' file.
  Highlights of 17.11.7:
  * Notify srun and ctld when unkillable stepd exits.
  * Numerous fixes - check 'NEWS' file.

OBS-URL: https://build.opensuse.org/request/show/615950
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=58
2018-06-11 10:31:14 +00:00
Dominique Leuenberger
eecd28fef6 Accepting request 599202 from network:cluster
- Avoid running pretrans scripts when running in an instsys:
  there may not be much installed yet. pretrans code should
  be done in lua, this way, it will be executed by the rpm-internal
  lua interpreter and not be passed to a shell which may not be
  around at the time this scriptlet is run (bsc#1090292). (forwarded request 599201 from eeich)

OBS-URL: https://build.opensuse.org/request/show/599202
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=13
2018-04-20 15:32:10 +00:00
7d56316590 Accepting request 599201 from home:eeich:branches:network:cluster
- Avoid running pretrans scripts when running in an instsys:
  there may not be much installed yet. pretrans code should
  be done in lua, this way, it will be executed by the rpm-internal
  lua interpreter and not be passed to a shell which may not be
  around at the time this scriptlet is run (bsc#1090292).

OBS-URL: https://build.opensuse.org/request/show/599201
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=56
2018-04-20 09:24:13 +00:00
Dominique Leuenberger
648ad9864b Accepting request 596387 from network:cluster
- Add requires for slurm-sql to the slurmdbd package.

- Package READMEs for pam and pam_slurm_adopt.
- Use the new %%license directive for COPYING file.

- Add:
  * split-xdaemon-in-xdaemon_init-and-xdaemon_finish-for-systemd-compatibilty.patch
  * slurmctld-uses-xdaemon_-for-systemd.patch
  * slurmd-uses-xdaemon_-for-systemd.patch
  * slurmdbd-uses-xdaemon_-for-systemd.patch
  * removed-deprecated-xdaemon.patch
  Fix interaction with systemd: systemd expects that a
  daemonizing process doesn't go away until the PID file
  with the PID of the daemon has been written (bsc#1084125).

- Make sure systemd services get restarted only when all
  packages are in a consistent state, not in the middle
  of an 'update' transaction (bsc#1088693).
  Since the %postun scripts that run on update are from
  the old package they cannot be changed - thus we work
  around the restart breakage.

  (bsc#1086859).

OBS-URL: https://build.opensuse.org/request/show/596387
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=12
2018-04-16 10:49:00 +00:00
df7fca5b1f - Add requires for slurm-sql to the slurmdbd package.
- Add:
  * split-xdaemon-in-xdaemon_init-and-xdaemon_finish-for-systemd-compatibilty.patch
  * slurmctld-uses-xdaemon_-for-systemd.patch
  * slurmd-uses-xdaemon_-for-systemd.patch
  * slurmdbd-uses-xdaemon_-for-systemd.patch
  * removed-deprecated-xdaemon.patch
  Fix interaction with systemd: systemd expects that a

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=54
2018-04-13 15:08:24 +00:00
d892c59e4e - Package READMEs for pam and pam_slurm_adopt.
- Use the new %%license directive for COPYING file.

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=53
2018-04-12 17:22:25 +00:00
8d80dfc527 - Fix interaction with systemd: systemd expects that a
  daemonizing process doesn't go away until the PID file
  with the PID of the daemon has been written (bsc#1084125).

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=52
2018-04-12 16:42:36 +00:00
7dbbe8e89d - Make sure systemd services get restarted only when all
  packages are in a consistent state, not in the middle
  of an 'update' transaction (bsc#1088693).
  Since the %postun scripts that run on update are from
  the old package they cannot be changed - thus we work
  around the restart breakage.
  (bsc#1086859).

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=51
2018-04-11 11:50:15 +00:00
Dominique Leuenberger
3297ea57b6 Accepting request 591864 from network:cluster
- fixed wrong log file location in slurmdbd.conf and 
  fixed pid location for slurmdbd and made slurm-slurmdbd
  depend on slurm config which provides the dir /var/run/slurm
  (bsc#1086859) (forwarded request 591103 from mslacken)

OBS-URL: https://build.opensuse.org/request/show/591864
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=11
2018-03-29 09:57:05 +00:00
7025591d0d Accepting request 591103 from home:mslacken:hpc
- fixed wrong log file location in slurmdbd.conf and 
  fixed pid location for slurmdbd and made slurm-slurmdbd
  depend on slurm config which provides the dir /var/run/slurm
  (bsc#1086859)

OBS-URL: https://build.opensuse.org/request/show/591103
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=49
2018-03-28 08:20:56 +00:00
Dominique Leuenberger
d6b57f2b49 Accepting request 587828 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/587828
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=10
2018-03-20 20:59:01 +00:00
003175f991 Accepting request 587822 from home:mslacken
- added comment for (bsc#1085606)

OBS-URL: https://build.opensuse.org/request/show/587822
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=47
2018-03-16 09:52:14 +00:00
cbe6c9fcaa Accepting request 587617 from home:eeich:branches:network:cluster
- Fix security issue in accounting_storage/mysql plugin by always escaping
  strings within the slurmdbd. CVE-2018-7033
  http://web.nvd.nist.gov/view/vuln/detail?vulnId=CVE-2018-7033
  (bsc#1085240).
- Update slurm to v17.11.5 (FATE#325451)
  Highlights of 17.11:
  * Support for federated clusters to manage a single work-flow 
    across a set of clusters.
  * Support for heterogeneous job allocations (various processor types,
    memory sizes, etc. by job component). Heterogeneous job steps within
    a single MPI_COMM_WORLD are not yet supported for most
    configurations.
  * X11 support is now fully integrated with the main Slurm code. Remove
    any X11 plugin configured in your plugstack.conf file to avoid errors
    being logged about conflicting options.
  * Added new advanced reservation flag of "flex", which permits jobs
    requesting the reservation to begin prior to the reservation's 
    start time and use resources inside or outside of the reservation.
    A typical use case is to prevent jobs not explicitly requesting the
    reservation from using those reserved resources rather than forcing
    jobs requesting the reservation to use those resources in the time
    frame reserved.
  * The sprio command has been modified to report a job's priority
    information for every partition the job has been submitted to.
  * Group ID lookup performed at job submit time to avoid lookup on
    all compute nodes. Enable with PrologFlags=SendGIDs configuration
    parameter.
  * Slurm commands and daemons dynamically link to libslurmfull.so
    instead of statically linking. This dramatically reduces the
    footprint of Slurm.

OBS-URL: https://build.opensuse.org/request/show/587617
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=46
2018-03-15 19:52:49 +00:00
19ceb304e2 - Fixed some rpmlint warnings.
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=45
2018-03-15 12:23:19 +00:00
23b2a195ba Accepting request 587092 from home:eeich:branches:network:cluster
- Update slurm to v17.11.4 (FATE#325451)
  * Link dynamically to libslurm.so to reduce footprint
    of all binaries.
  * Remove plugins for obsolete MPI stacks:
    - lam
    - mpich1_p4
    - mpich1_shmem
    - mvapich
  * Numerous fixes - check 'NEWS' file.
- slurmd-Fix-slurmd-for-new-API-in-hwloc-2.0.patch
  plugins-cgroup-Fix-slurmd-for-new-API-in-hwloc-2.0.patch:
  Removed. Code upstream.
- slurmctld-service-var-run-path.patch:
  Replaced by sed script.

OBS-URL: https://build.opensuse.org/request/show/587092
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=44
2018-03-15 07:03:02 +00:00
d6c16c524d - Remove the last two commits, changes were invalid.
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=43
2018-03-07 20:21:50 +00:00
903545a8b9 * Create a separate logdir for slurm as well.
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=42
2018-03-07 16:57:41 +00:00
a59a0c2ced - Fix user/group settings (boo#1084333)
* Fix user/group for /var/run/slurm, the PID file directory.
  * Fix user/group in systemd service files for process ownership.

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=41
2018-03-07 15:28:17 +00:00
Dominique Leuenberger
9b876f9095 Accepting request 571153 from network:cluster
- moved config files to slurm-config package (FATE#324574).

- Moved slurmstepd and man page into slurm-node due to slurmd dependency
- Moved config files into slurm-node
- Moved slurmd rc scripts into slurm-node
- Made slurm-munge require slurm-plugins instead of slurm itself
  - slurm-node suggested slurm-munge, causing the whole slurm to be
    installed. The slurm-plugins seems to be a more base class
    (FATE#324574).

- split up lightweight slurm-node package for deployment on nodes
  (FATE#324574).

OBS-URL: https://build.opensuse.org/request/show/571153
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=9
2018-01-31 18:52:24 +00:00
9c6e84b74f - Added FATE to log file (FATE#324574).
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=39
2018-01-30 16:27:16 +00:00
9127e7f808 Accepting request 570889 from home:mslacken
- moved config files to slurm-config package 

- Moved slurmstepd and man page into slurm-node due to slurmd dependency
- Moved config files into slurm-node
- Moved slurmd rc scripts into slurm-node
- Made slurm-munge require slurm-plugins instead of slurm itself
  - slurm-node suggested slurm-munge, causing the whole slurm to be
    installed. The slurm-plugins seems to be a more base class

- split up lightweight slurm-node package for deployment on nodes

OBS-URL: https://build.opensuse.org/request/show/570889
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=38
2018-01-30 16:25:18 +00:00
Dominique Leuenberger
78dffd3c9b Accepting request 554997 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/554997
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=8
2017-12-22 11:18:54 +00:00
6579dd2427 Accepting request 553521 from home:mslacken:hpc
- added /var/spool/ directory and removed duplicated entries from slurm.conf

OBS-URL: https://build.opensuse.org/request/show/553521
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=36
2017-12-07 11:18:20 +00:00
Dominique Leuenberger
3a83cab0fd Accepting request 543899 from network:cluster
- Package so-versioned libs separately; currently: libslurm31, libpmi0 (forwarded request 543898 from eeich)

OBS-URL: https://build.opensuse.org/request/show/543899
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=7
2017-11-21 14:33:05 +00:00
2bb9a2359c Accepting request 543898 from home:eeich:branches:network:cluster
- Package so-versioned libs separately; currently: libslurm31, libpmi0

OBS-URL: https://build.opensuse.org/request/show/543898
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=34
2017-11-20 14:01:44 +00:00
Dominique Leuenberger
10ac344cf4 Accepting request 542025 from network:cluster
- Package so-versioned libs separately. libslurm is expected
  to change more frequently and thus is packaged separately
  from libpmi. (forwarded request 542024 from eeich)

OBS-URL: https://build.opensuse.org/request/show/542025
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=6
2017-11-15 16:04:41 +00:00
8ae70e8d08 Accepting request 542024 from home:eeich:branches:network:cluster
- Package so-versioned libs separately. libslurm is expected
  to change more frequently and thus is packaged separately
  from libpmi.

OBS-URL: https://build.opensuse.org/request/show/542024
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=32
2017-11-15 12:49:05 +00:00
2dc1925cbd Accepting request 540570 from home:eeich:branches:network:cluster
- Package so-versioned libs separate from non-so-versioned.
  This way, the non-so-versioned libs can remain installed
  without conflict.

OBS-URL: https://build.opensuse.org/request/show/540570
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=31
2017-11-10 14:19:18 +00:00
Dominique Leuenberger
890109c64c Accepting request 538162 from network:cluster
- Updated to 17.02.9 to fix CVE-2017-15566 (bsc#1065697).
   Changes in 17.02.9:
   * When resuming powered down nodes, mark DOWN nodes right after
     ResumeTimeout has been reached (previous logic would wait about
     one minute longer).
   * Fix sreport not showing full column name for TRES Count.
   * Fix slurmdb_reservations_get() giving wrong usage data when jobs spanned
     a reservation that was modified.
   * Fix sreport reservation utilization report showing bad data.
   * Show all TRES' on a reservation in sreport reservation utilization report
     by default.
   * Fix sacctmgr show reservation handling "end" parameter.
   * Work around issue with sysmacros.h and gcc7 / glibc 2.25.
   * Fix layouts code to only allow setting a boolean.
   * Fix sbatch --wait to keep waiting even if a message timeout occurs.
   * CRAY - If configured with NodeFeatures=knl_cray and there are non-KNL
     nodes which include no features the slurmctld will abort without
     this patch when attempting strtok_r(NULL).
   * Fix regression in 17.02.7 which would run the spank_task_privileged as
     part of the slurmstepd instead of its child process.
   * Fix security issue in Prolog and Epilog by always prepending SPANK_ to
     all user-set environment variables. CVE-2017-15566.
   Changes in 17.02.8:
   * Add 'slurmdbd:' to the accounting plugin to notify message is from dbd
     instead of local.
   * mpi/mvapich - Buffer being only partially cleared. No failures observed.
   * Fix for job --switch option on dragonfly network.
   * In salloc with --uid option, drop supplementary groups before changing UID.
   * jobcomp/elasticsearch - strip any trailing slashes from JobCompLoc.
   * jobcomp/elasticsearch - fix memory leak when transferring generated buffer. (forwarded request 538161 from eeich)
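
The CVE-2017-15566 fix listed above can be pictured with a short Python sketch. It illustrates the idea only and is not Slurm's implementation: the function name `prefix_user_env` and the sample variable are hypothetical. Prefixing every user-supplied name means a job cannot place a variable under its real name into the environment of a Prolog or Epilog script.

```python
SPANK_PREFIX = "SPANK_"


def prefix_user_env(user_env):
    """Return a copy of `user_env` with SPANK_ prepended to every name.

    In this sketch the prefix is added unconditionally, even when a name
    already starts with SPANK_, so no user-chosen name passes through
    unchanged.
    """
    return {SPANK_PREFIX + name: value for name, value in user_env.items()}


# Example: a user-set variable arrives under a prefixed, harmless name.
example = prefix_user_env({"LD_PRELOAD": "/tmp/user.so"})
```

With this sketch, `example` holds the value under the key `SPANK_LD_PRELOAD` and contains no `LD_PRELOAD` key.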

OBS-URL: https://build.opensuse.org/request/show/538162
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=5
2017-11-03 15:25:41 +00:00
2ea5b3f2de Accepting request 538161 from home:eeich:branches:network:cluster
- Updated to 17.02.9 to fix CVE-2017-15566 (bsc#1065697).
   Changes in 17.02.9:
   * When resuming powered down nodes, mark DOWN nodes right after
     ResumeTimeout has been reached (previous logic would wait about
     one minute longer).
   * Fix sreport not showing full column name for TRES Count.
   * Fix slurmdb_reservations_get() giving wrong usage data when jobs spanned
     a reservation that was modified.
   * Fix sreport reservation utilization report showing bad data.
   * Show all TRES' on a reservation in sreport reservation utilization report
     by default.
   * Fix sacctmgr show reservation handling "end" parameter.
   * Work around issue with sysmacros.h and gcc7 / glibc 2.25.
   * Fix layouts code to only allow setting a boolean.
   * Fix sbatch --wait to keep waiting even if a message timeout occurs.
   * CRAY - If configured with NodeFeatures=knl_cray and there are non-KNL
     nodes which include no features the slurmctld will abort without
     this patch when attempting strtok_r(NULL).
   * Fix regression in 17.02.7 which would run the spank_task_privileged as
     part of the slurmstepd instead of its child process.
   * Fix security issue in Prolog and Epilog by always prepending SPANK_ to
     all user-set environment variables. CVE-2017-15566.
   Changes in 17.02.8:
   * Add 'slurmdbd:' to the accounting plugin to notify message is from dbd
     instead of local.
   * mpi/mvapich - Buffer being only partially cleared. No failures observed.
   * Fix for job --switch option on dragonfly network.
   * In salloc with --uid option, drop supplementary groups before changing UID.
   * jobcomp/elasticsearch - strip any trailing slashes from JobCompLoc.
   * jobcomp/elasticsearch - fix memory leak when transferring generated buffer.

OBS-URL: https://build.opensuse.org/request/show/538161
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=29
2017-11-01 17:01:38 +00:00
Dominique Leuenberger
ffb5750e23 Accepting request 534182 from network:cluster
- Add feature request number to changelog entry (FATE#324026).

OBS-URL: https://build.opensuse.org/request/show/534182
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=4
2017-10-19 17:32:26 +00:00
2f4ce2f8e9 - Fix FATE number (FATE#324026).
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=27
2017-10-16 10:21:20 +00:00
845da57e5e - Add feature request no to this. (FATE#323998).
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=26
2017-10-16 10:19:02 +00:00
Dominique Leuenberger
395325315d Accepting request 532262 from network:cluster
- Trim redundant wording in descriptions. (forwarded request 532228 from jengelh)

OBS-URL: https://build.opensuse.org/request/show/532262
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=3
2017-10-13 12:13:38 +00:00
848ae55172 Accepting request 532228 from home:jengelh:branches:network:cluster
- Trim redundant wording in descriptions.

OBS-URL: https://build.opensuse.org/request/show/532228
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=24
2017-10-06 15:44:30 +00:00
106d3595fa Accepting request 532221 from home:jjolly:hpc
Updated slurm.changes with patch information

OBS-URL: https://build.opensuse.org/request/show/532221
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=23
2017-10-06 13:44:28 +00:00
7eaadcf9e1 - Minor formatting fixes to the changelog:
* Patches upstream .service files to allow for /var/run/slurm path
  * Modifies slurm.conf to allow for /var/run/slurm path

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=22
2017-10-05 05:15:20 +00:00
3c1a1d0ac8 Accepting request 531499 from home:jjolly:hpc
- Updated to slurm 17-02-7-1
  * Added python as BuildRequires
  * Removed sched-wiki package
  * Removed slurmdb-direct package
  * Obsoleted sched-wiki and slurmdb-direct packages
  * Removing Cray-specific files
  * Added /etc/slurm/layout.d files (new for this version)
  * Remove /etc/slurm/cgroup files from package
  * Added lib/slurm/mcs_account.so
  * Removed lib/slurm/jobacct_gather_aix.so
  * Removed lib/slurm/job_submit_cnode.so
- Created slurm-sql package
- Moved files from slurm-plugins to slurm-torque package
- Moved creation of /usr/lib/tmpfiles.d/slurm.conf into slurm.spec
  * Removed tmpfiles.d-slurm.conf

- Made tmpfiles_create post-install macro SLE12 SP2 or greater
- Directly calling systemd-tmpfiles --create for before SLE12 SP2

- Allows openSUSE Factory build as well
- Removes unused .service files from project
- Adds /var/run/slurm to /usr/lib/tmpfiles.d for boottime creation
  - Patches upstream .service files to allow for /var/run/slurm path
  - Modifies slurm.conf to allow for /var/run/slurm path

OBS-URL: https://build.opensuse.org/request/show/531499
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=21
2017-10-05 05:13:44 +00:00
Dominique Leuenberger
a8f243d156 Accepting request 499661 from network:cluster
- Move wrapper script mpiexec provided by slurm-torque to
  mpiexec.slurm to avoid conflicts. This file is normally
  provided by the MPI implementation (boo#1041706). 

- Replace remaining ${RPM_BUILD_ROOT}s.
- Improve description.
- Fix up changelog.

- Spec file: Replace "Requires : slurm-perlapi" by
  "Requires: perl-slurm = %{version}" (boo#1031872).

- Fix array initialization and ensure strings are always NULL terminated in
  pam_slurm.c (bsc#1007053).
- Create slurm user/group in preinstall script.

OBS-URL: https://build.opensuse.org/request/show/499661
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=2
2017-06-08 13:01:56 +00:00
a09041a8b6 - Move wrapper script mpiexec provided by slurm-torque to
mpiexec.slurm to avoid conflicts. This file is normally
  provided by the MPI implementation (boo#1041706).

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=19
2017-05-30 10:41:04 +00:00
8b5c455f6f Accepting request 493422 from home:eeich:branches:network:cluster
- Replace remaining ${RPM_BUILD_ROOT}s.
- Improve description.
- Fix up changelog.

OBS-URL: https://build.opensuse.org/request/show/493422
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=18
2017-05-27 13:49:16 +00:00
Dominique Leuenberger
fc6f02d304 Accepting request 458674 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/458674
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=1
2017-03-04 15:36:42 +00:00
e4e11a7864 Accepting request 458469 from home:jengelh:branches:network:cluster
- Trim redundant parts of description. Fixup RPM groups.
- Replace unnecessary %__ macro indirections;
  replace historic $RPM_* variables by macros.

OBS-URL: https://build.opensuse.org/request/show/458469
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=16
2017-02-17 12:37:31 +00:00
8821ad8303 Accepting request 457538 from home:eeich:branches:network:cluster
- slurmd-Fix-for-newer-API-versions.patch:
  Stale patch removed.

OBS-URL: https://build.opensuse.org/request/show/457538
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=15
2017-02-15 22:24:41 +00:00
Corot Sebastien
a47a9ab6d8 Accepting request 455364 from home:eeich:branches:network:cluster
- Use %slurm_u and %slurm_g macros defined at the beginning of the spec
  file when adding the slurm user/group for consistency.
- Define these macros to daemon,root for non-systemd.
- For anything newer than Leap 42.1 or SLE-12-SP1 build OpenHPC compatible.

OBS-URL: https://build.opensuse.org/request/show/455364
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=14
2017-02-11 20:31:48 +00:00
Corot Sebastien
bd06e0c765 Accepting request 454272 from home:eeich:branches:network:cluster
- Updated to 16.05.8.1
 * Remove StoragePass from being printed out in the slurmdbd log at debug2
   level.
 * Defer PATH search for task program until launch in slurmstepd.
 * Modify regression test1.89 to avoid leaving vestigial job. Also reduce
    logging to reduce likelihood of Expect buffer overflow.
 * Do not PATH search for multi-prog launches if LaunchParameters=test_exec is
    enabled.
 * Fix for possible infinite loop in select/cons_res plugin when trying to
    satisfy a job's ntasks_per_core or socket specification.
 * If job is held for bad constraints make it so once updated the job doesn't
    go into JobAdminHeld.
 * sched/backfill - Fix logic to reserve resources for jobs that require a
    node reboot (i.e. to change KNL mode) in order to start.
 * When unpacking a node or front_end record from state and the protocol
    version is lower than the min version, set it to the min.
 * Remove redundant lookup for part_ptr when updating a reservation's nodes.
 * Fix memory and file descriptor leaks in slurmd daemon's sbcast logic.
 * Do not allocate specialized cores to jobs using the --exclusive option.
 * Cancel interactive job if Prolog failure with "PrologFlags=contain" or
   "PrologFlags=alloc" configured. Send new error prolog failure message to
   the salloc or srun command as needed.
 * Prevent possible out-of-bounds read in slurmstepd on an invalid #! line.
 * Fix check for PluginDir within slurmctld to work with multiple directories.
 * Cancel interactive jobs automatically on communication error to launching
   srun/salloc process.
 * Fix security issue caused by insecure file path handling triggered by the
   failure of a Prolog script. To exploit this a user needs to anticipate or
   cause the Prolog to fail for their job. CVE-2016-10030 (bsc#1018371).
- Replace group/user add macros with function calls.
- Disable building with netloc support: the netloc API is part of the devel
  branch of hwloc. Since this devel branch was included accidentally and has
  been reversed since, we need to disable this for the time being.
- Conditionalized architecture specific pieces to support non-x86 architectures
  better.

- Remove: unneeded 'BuildRequires:  python'
- Add:
  BuildRequires:  freeipmi-devel
  BuildRequires:  libibmad-devel
  BuildRequires:  libibumad-devel
  so they are picked up by the slurm build.
- Enable modifications from openHPC Project.
- Enable lua API package build.
- Add a recommends for slurm-munge to the slurm package:
  This way, the munge auth method is available and slurm
  works out of the box.
- Create /var/lib/slurm as StateSaveLocation directory.
  /tmp is dangerous. 

- Keep %{_libdir}/libpmi* and %{_libdir}/mpi_pmi2* on SUSE.

OBS-URL: https://build.opensuse.org/request/show/454272
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=13
2017-02-02 20:23:02 +00:00
Corot Sebastien
7bac92b6f9 Accepting request 441490 from home:eeich:branches:network:cluster
- Fix build with and without OHCP_BUILD define.
- Fix build for systemd and non-systemd.

- Updated to 16-05-5 - equivalent to OpenHPC 1.2.
  * Fix issue with resizing jobs and limits not being tracked correctly.
  * BGQ - Remove redeclaration of job_read_lock.
  * BGQ - Tighter locks around structures when nodes/cables change state.
  * Make it possible to change CPUsPerTask with scontrol.
  * Make it so scontrol update part qos= will take away a partition QOS from
    a partition.
  * Backfill scheduling properly synchronized with Cray Node Health Check.
    Prior logic could result in highest priority job getting improperly
    postponed.
  * Make it so daemons also support TopologyParam=NoInAddrAny.
  * If scancel is operating on large number of jobs and RPC responses from
    slurmctld daemon are slow then introduce a delay in sending the cancel job
    requests from scancel in order to reduce load on slurmctld.
  * Remove redundant logic when updating a job's task count.
  * MySQL - Fix querying jobs with reservations when the id's have rolled.
  * Perl - Fix use of uninitialized variable in slurm_job_step_get_pids.
  * Launch batch job requesting --reboot after the boot completes.
  * Do not attempt to power down a node which has never responded if the
    slurmctld daemon restarts without state.
  * Fix for possible slurmstepd segfault on invalid user ID.
  * MySQL - Fix for possible race condition when archiving multiple clusters
    at the same time.
  * Add logic so that slurmstepd can be launched under valgrind.
  * Increase buffer size to read /proc/*/stat files.
  * Remove the SchedulerParameters option of "assoc_limit_continue", making it
    the default value. Add option of "assoc_limit_stop". If "assoc_limit_stop"
    is set and a job cannot start due to association limits, then do not attempt
    to initiate any lower priority jobs in that partition. Setting this can
    decrease system throughput and utilization, but avoid potentially starving
    larger jobs by preventing them from launching indefinitely.
  * Update a node's socket and cores per socket counts as needed after a node
    boot to reflect configuration changes which can occur on KNL processors.
    Note that the node's total core count must not change, only the distribution
    of cores across varying socket counts (KNL NUMA nodes treated as sockets by
    Slurm).
  * Rename partition configuration from "Shared" to "OverSubscribe". Rename
    salloc, sbatch, srun option from "--shared" to "--oversubscribe". The old
    options will continue to function. Output field names also changed in
    scontrol, sinfo, squeue and sview.
  * Add SLURM_UMASK environment variable to user job.
  * knl_conf: Added new configuration parameter of CapmcPollFreq.
  * Cleanup two minor Coverity warnings.
  * Make it so the tres units in a job's formatted string are converted like
    they are in a step.
  * Correct partition's MaxCPUsPerNode enforcement when nodes are shared by
    multiple partitions.
  * node_feature/knl_cray - Prevent slurmctld GRES errors for "hbm" references.
  * Display thread name instead of thread id and remove process name in stderr
    logging for "thread_id" LogTimeFormat.
  * Log IP address of bad incoming message to slurmctld.
  * If a user requests tasks, nodes and ntasks-per-node and
    tasks-per-node/nodes != tasks print warning and ignore ntasks-per-node.
  * Release CPU "owner" file locks.
  * Update seff to fix warnings with ncpus, and list slurm-perlapi dependency
    in spec file.
  * Allow QOS timelimit to override partition timelimit when EnforcePartLimits
    is set to all/any.
  * Make it so qsub will do a "basename" on a wrapped command for the output
    and error files.
  * Prevent job stuck in configuring state if slurmctld daemon restarted while
    PrologSlurmctld is running. Also re-issue burst_buffer/pre-load operation
    as needed.
  * Move test for job wait reason value of BurstBufferResources and
    BurstBufferStageIn later in the scheduling logic.
  * Document which srun options apply to only job, only step, or job and step
    allocations.
  * Use more compatible function to get thread name (>= 2.6.11).
  * Make it so the extern step uses a reverse tree when cleaning up.
  * If extern step doesn't get added into the proctrack plugin make sure the
    sleep is killed.
  * Add web links to Slurm Diamond Collectors (from Harvard University) and
    collectd (from EDF).
  * Add job_submit plugin for the "reboot" field.
  * Make some more Slurm constants (INFINITE, NO_VAL64, etc.) available to
    job_submit/lua plugins.
  * Send in a -1 for a taskid into spank_task_post_fork for the extern_step.
  * MYSQL - Slightly better logic if a job completion comes in with an end time
    of 0.
  * If task/cgroup plugin is configured with ConstrainRAMSpace=yes, then set soft
    memory limit to allocated memory limit (previously no soft limit was set).
  * Streamline when schedule() is called when running with message aggregation
    on batch script completes.
  * Fix incorrect casting when [un]packing derived_ec on slurmdb_job_rec_t.
  * Document that persistent burst buffers can not be created or destroyed using
    the salloc or srun --bb options.
  * Add support for setting the SLURM_JOB_ACCOUNT, SLURM_JOB_QOS and
    SLURM_JOB_RESERVATION environment variables for the salloc command.
    Document the same environment variables for the salloc, sbatch and srun
    commands in their man pages.
  * Fix issue where sacctmgr load cluster.cfg wouldn't load associations
    that had a partition in them.
  * Don't return the extern step from sstat by default.
  * In sstat print 'extern' instead of 4294967295 for the extern step.
  * Make advanced reservations work properly with core specialization.
  * slurmstepd modified to pre-load all relevant plugins at startup to avoid
    the possibility of modified plugins later resulting in inconsistent API
    or data structures and a failure of slurmstepd.
  * Export functions from parse_time.c in libslurm.so.
  * Export unit convert functions from slurm_protocol_api.c in libslurm.so.
  * Fix scancel to allow multiple steps from a job to be cancelled at once.
  * Update and expand upgrade guide (in Quick Start Administrator web page).
  * burst_buffer/cray: Requeue, but do not hold a job which fails the pre_run
    operation.
  * Ensure reported expected job start time is not in the past for pending jobs.
  * Add support for PMIx v2.

OBS-URL: https://build.opensuse.org/request/show/441490
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=12
2016-11-24 22:01:51 +00:00
3a8ac1a69b Accepting request 435788 from home:eeich:branches:network:cluster
- Setting 'download_files' service to mode='localonly'
  and adding source tarball. (Required for Factory).

OBS-URL: https://build.opensuse.org/request/show/435788
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=11
2016-10-17 15:40:31 +00:00
Corot Sebastien
2028708d3a Accepting request 435622 from home:eeich:branches:network:cluster
- version 15.08.7.1
  * Remove the 1024-character limit on lines in batch scripts.
  * task/affinity: Disable core-level task binding if more CPUs required than
    available cores.
  * Preemption/gang scheduling: If a job is suspended at slurmctld restart or
    reconfiguration time, then leave it suspended rather than resume+suspend.
  * Don't use lower weight nodes for job allocation when topology/tree used.
  * Don't allow user specified reservation names to disrupt the normal
    reservation sequence numbering scheme.
  * Avoid hard-link/copy of script/environment files for job arrays. Use the
    master job record file for all tasks of the job array.
    NOTE: Job arrays submitted to Slurm version 15.08.6 or later will fail if
    the slurmctld daemon is downgraded to an earlier version of Slurm.
  * In slurmctld log file, log duplicate job ID found by slurmd. Previously was
    being logged as prolog/epilog failure.
  * If a job is requeued while in the process of being launched, remove its
    job ID from slurmd's record of active jobs in order to avoid generating a
    duplicate job ID error when launched for the second time (which would
    drain the node).
  * Cleanup messages when handling job script and environment variables in
    older directory structure formats.
  * Prevent triggering gang scheduling within a partition if configured with
    PreemptType=partition_prio and PreemptMode=suspend,gang.
  * Decrease parallelism in job cancel request to prevent denial of service
    when cancelling huge numbers of jobs.
  * If all ephemeral ports are in use, try using other port numbers.
  * Prevent "scontrol update job" from updating jobs that have already finished.
  * Show requested TRES in "squeue -O tres" when job is pending.
  * Backfill scheduler: Test association and QOS node limits before reserving
    resources for pending job.
  * Many bug fixes.
- Use source services to download package.
- Fix code for new API of hwloc-2.0.
- Package netloc_to_topology where available.
- Package documentation.

OBS-URL: https://build.opensuse.org/request/show/435622
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=10
2016-10-16 19:51:20 +00:00
Corot Sebastien
d541e7d46f Accepting request 341936 from home:scorot:branches:network:cluster
- version 15.08.3
  * Many new features and bug fixes. See NEWS file 
- update files list accordingly
- fix wrong end of line in some files

OBS-URL: https://build.opensuse.org/request/show/341936
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=9
2015-11-01 16:14:37 +00:00
Corot Sebastien
b0a822f465 Accepting request 321014 from home:scorot:branches:network:cluster
- version 14.11.8

OBS-URL: https://build.opensuse.org/request/show/321014
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=8
2015-08-06 19:51:59 +00:00
Corot Sebastien
1648adac4b Accepting request 259704 from home:scorot:branches:network:cluster
- add missing systemd requirements 
- add missing rclink

OBS-URL: https://build.opensuse.org/request/show/259704
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=7
2014-11-04 20:12:14 +00:00
Corot Sebastien
ee0741e9cc Accepting request 259383 from home:scorot:branches:network:cluster
- version 14.03.9
  * Many bug fixes. See NEWS file
- add systemd support

OBS-URL: https://build.opensuse.org/request/show/259383
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=6
2014-11-02 20:27:29 +00:00
Corot Sebastien
926d205df2 Accepting request 242503 from home:scorot:branches:network:cluster
- version 14.03.6
  * Added support for native Slurm operation on Cray systems (without ALPS).
  * Added partition configuration parameters AllowAccounts, AllowQOS, DenyAccounts and DenyQOS to provide greater control over use.
  * Added the ability to perform load-based scheduling, allocating resources to jobs on the nodes with the largest number of idle CPUs.
  * Added support for reserving cores on a compute node for system services (core specialization)
  * Add mechanism for job_submit plugin to generate error message for srun, salloc or sbatch to stderr.
  * Support for the Postgres database has long been out of date and problematic, so it has been removed entirely. If you would like to use it, the code still exists in versions <= 2.6, but it will not be included in this or future versions.
  * Added new structures and support for both server and cluster resources.
  * Significant performance improvements, especially with respect to job array support. 
- update files list

OBS-URL: https://build.opensuse.org/request/show/242503
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=5
2014-07-26 14:36:03 +00:00
Corot Sebastien
9d43bcd4c5 Accepting request 226317 from home:scorot:branches:network:cluster
- update to version 2.6.7
  * Support for job arrays, which increases performance and ease of
    use for sets of similar jobs.
  * Job profiling capability added to record a wide variety of job
    characteristics for each task on a user configurable periodic
    basis. Data currently available includes CPU use, memory use,
    energy use, Infiniband network use, Lustre file system use, etc.
  * Support for MPICH2 using PMI2 communications interface with much
    greater scalability.
  * Prolog and epilog support for advanced reservations.
  * Much faster throughput for job step execution with --exclusive
    option. The srun process is notified when resources become
    available rather than periodic polling.
  * Support improved for Intel MIC (Many Integrated Core) processor.
  * Advanced reservations with hostname and core counts now support
    asymmetric reservations (e.g. a different core count for
    each node).
  * External sensor plugin infrastructure added to record power
    consumption, temperature, etc.
  * Improved performance for high-throughput computing.
  * MapReduce+ support (launches ~1000x faster, runs ~10x faster).
  * Added "MaxCPUsPerNode" partition configuration parameter. This
    can be especially useful to schedule GPUs. For example a node
    can be associated with two Slurm partitions (e.g. "cpu" and
    "gpu") and the partition/queue "cpu" could be limited to only a
    subset of the node's CPUs, ensuring that one or more CPUs would
    be available to jobs in the "gpu" partition/queue.

OBS-URL: https://build.opensuse.org/request/show/226317
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=4
2014-03-16 20:42:08 +00:00
Corot Sebastien
0425ab0b3d Accepting request 177944 from home:scorot:branches:network:cluster
- version 2.5.7
  * Fix for linking to the select/cray plugin to not give warning
    about undefined variable.
  * Add missing symbols to the xlator.h
  * Avoid placing pending jobs in AdminHold state due to backfill
    scheduler interactions with advanced reservation.
  * Accounting - make average by task not cpu.
  * POE - Correct logic to support poe option "-euidevice sn_all"
    and "-euidevice sn_single".
  * Accounting - Fix minor initialization error.
  * POE - Correct logic to support srun network instances count
    with POE.
  * POE - With the srun --launch-cmd option, report proper task
    count when the --cpus-per-task option is used without the
    --ntasks option.
  * POE - Fix logic binding tasks to CPUs.
  * sview - Fix race condition where new information could have
    slipped past the node tab unnoticed.
  * Accounting - Fix an invalid memory read when slurmctld sends
    data about start job to slurmdbd.
  * If a prolog or epilog failure occurs, drain the node rather
    than setting it down and killing all of its jobs.
  * Priority/multifactor - Avoid underflow in half-life calculation.
  * POE - pack missing variable to allow fanout (more than 32
    nodes)
  * Prevent clearing reason field for pending jobs. This bug was
    introduced in v2.5.5 (see "Reject job at submit time ...").
  * BGQ - Fix issue with preemption on sub-block jobs where a job
    would kill all preemptable jobs on the midplane instead of just
    the ones it needed to.

OBS-URL: https://build.opensuse.org/request/show/177944
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=3
2013-06-06 21:03:00 +00:00
Corot Sebastien
47c8ca2b53 Accepting request 163479 from home:scorot:branches:network:cluster
- version 2.5.4
  * Support for Intel® Many Integrated Core (MIC) processors.
  * User control over CPU frequency of each job step.
  * Recording power usage information for each job.
  * Advanced reservation of cores rather than whole nodes.
  * Integration with IBM's Parallel Environment including POE (Parallel Operating Environment) and NRT (Network Resource Table) API.
  * Highly optimized throughput for serial jobs in a new "select/serial" plugin.
  * CPU load information is available.
  * Configurable number of CPUs available to jobs in each SLURM partition, which provides a mechanism to reserve CPUs for use with GPUs.

OBS-URL: https://build.opensuse.org/request/show/163479
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=2
2013-04-09 20:09:17 +00:00
Tobias Burnus
1403874780 Accepting request 163005 from home:scorot:branches:network:cluster
Simple Linux Utility for Resource Management

OBS-URL: https://build.opensuse.org/request/show/163005
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=1
2013-04-08 21:59:36 +00:00
3 changed files with 0 additions and 71 deletions


@@ -1,65 +0,0 @@
From: Egbert Eich <eich@suse.com>
Date: Wed Jun 22 14:39:10 2022 +0200
Subject: Fix test 21.41
Patch-mainline: Not yet
Git-repo: https://github.com/SchedMD/slurm
Git-commit: 21619ffa15d1d656ee11a477ebb8215a06387fdd
References:
Since expect is not line oriented, the output is not matched line by line.
Thus the order in which results are returned by sacctmgr actually matters:
If the first test case matches what is returned first, this part will be
consumed. If the 2nd test case then matches what is left over, the
test will actually succeed.
If this is not the case, i.e. if the first test case matches a part that is
actually sent later, the earlier parts will be discarded and
won't match at all.
To make the test resilient to a different order of results, the test has
been rewritten to contain only a single match line.
Signed-off-by: Egbert Eich <eich@suse.com>
Signed-off-by: Egbert Eich <eich@suse.de>
---
testsuite/expect/test21.41 | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)
diff --git a/testsuite/expect/test21.41 b/testsuite/expect/test21.41
index c0961522db..1fd921a48f 100755
--- a/testsuite/expect/test21.41
+++ b/testsuite/expect/test21.41
@@ -372,21 +372,21 @@ expect {
-re "There was a problem" {
fail "There was a problem with the sacctmgr command"
}
- -re "$user1.$wckey1.($number)." {
- set user1wckey1 $expect_out(1,string)
- exp_continue
- }
- -re "$user2.$wckey1.($number)." {
- set user2wckey1 $expect_out(1,string)
- exp_continue
- }
- -re "$user1.$wckey2.($number)." {
- set user1wckey2 $expect_out(1,string)
- exp_continue
- }
- -re "$user2.$wckey2.($number)." {
- set user2wckey2 $expect_out(1,string)
- exp_continue
+ -re "($user1|$user2).($wckey1|$wckey2).($number)." {
+ if { $expect_out(1,string) eq $user1 } {
+ if { $expect_out(2,string) eq $wckey1 } {
+ set user1wckey1 $expect_out(3,string)
+ } elseif { $expect_out(2,string) eq $wckey2 } {
+ set user1wckey2 $expect_out(3,string)
+ }
+ } elseif { $expect_out(1,string) eq $user2 } {
+ if { $expect_out(2,string) eq $wckey1 } {
+ set user2wckey1 $expect_out(3,string)
+ } elseif { $expect_out(2,string) eq $wckey2 } {
+ set user2wckey2 $expect_out(3,string)
+ }
+ }
+ exp_continue
}
timeout {
fail "sacctmgr wckeys not responding"
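
The ordering problem described in the patch header can be reproduced outside of expect. The following is a minimal Python sketch, not part of the patch: it assumes the whole sacctmgr output is already in the buffer, and it models expect as "try the patterns in listed order, take the first one that matches anywhere, discard everything up to the end of that match". The helper expect_loop, the sample output, and the names u1, u2, k1, k2 are hypothetical stand-ins for $user1, $user2, $wckey1, $wckey2.

```python
import re


def expect_loop(buffer, patterns):
    """Rough model of an expect block whose branches all call exp_continue.

    Patterns are tried in listed order; the first one that matches anywhere
    in the buffer wins, and the buffer is truncated to what follows the match.
    """
    found = []
    while True:
        for pattern in patterns:
            m = re.search(pattern, buffer)
            if m:
                found.append(m.groups())
                # Everything up to the end of the match is consumed,
                # including any earlier, still unmatched records.
                buffer = buffer[m.end():]
                break
        else:
            # No pattern matched the remaining buffer.
            return found


# Hypothetical output in which the u2 record arrives before the u1 record.
output = "u2|k1|5|\nu1|k1|7|\n"

# Old style: one pattern per user/wckey pair, tried in a fixed order.
separate = [r"(u1).(k1).(\d+).", r"(u2).(k1).(\d+)."]

# New style: a single alternation, so the leftmost record always matches first.
combined = [r"(u1|u2).(k1|k2).(\d+)."]

lost = expect_loop(output, separate)
kept = expect_loop(output, combined)
print(lost)
print(kept)
```

With the separate patterns, the u1 pattern matches the second record first and the u2 record before it is thrown away. The single alternation matches the records in the order they appear, so none is skipped.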


@@ -1,3 +0,0 @@
version https://git-lfs.github.com/spec/v1
oid sha256:7a8f4b1b46d3a8ec9a95066b04635c97f9095877f6189a8ff7388e5e74daeef3
size 7365175


@@ -1,3 +0,0 @@
version https://git-lfs.github.com/spec/v1
oid sha256:240a2105c8801bc0d222fa2bbcf46f71392ef94cce9253357e5f43f029adaf9b
size 7183430