forked from pool/slurm

287 Commits

Author SHA256 Message Date
Ana Guerrero
1e8971e87a Accepting request 1129192 from network:cluster
Automatic submission by obs-autosubmit

OBS-URL: https://build.opensuse.org/request/show/1129192
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=100
2023-11-27 21:44:42 +00:00
db15cbcf3e - On SLE-12 exclude build for s390x.
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=277
2023-11-20 15:31:39 +00:00
Ana Guerrero
ccb26326c7 Accepting request 1123596 from network:cluster
- Add missing dependencies on slurm-config to the plugins package.
  These should help to tie down the slurm version and help to avoid
  a package mix (bsc#1216869). (forwarded request 1123595 from eeich)

OBS-URL: https://build.opensuse.org/request/show/1123596
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=99
2023-11-06 20:14:38 +00:00
961668403a Accepting request 1123595 from home:eeich:branches:network:cluster
- Add missing dependencies on slurm-config to the plugins package.
  These should help to tie down the slurm version and help to avoid
  a package mix (bsc#1216869).

OBS-URL: https://build.opensuse.org/request/show/1123595
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=275
2023-11-06 14:56:24 +00:00
Dominique Leuenberger
b28d182fe8 Accepting request 1121548 from network:cluster
Automatic submission by obs-autosubmit

OBS-URL: https://build.opensuse.org/request/show/1121548
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=98
2023-11-01 21:09:57 +00:00
c9c235c313 Format fix to changes file:
`GET /slurmdb/v0.0.39/associations` and `GET /slurmdb/v0.0.39/qos` to

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=273
2023-10-25 07:12:31 +00:00
Ana Guerrero
150d433676 Accepting request 1118220 from network:cluster
- update to 23.02.6 to fix (CVE-2023-41914, bsc#1216207)

OBS-URL: https://build.opensuse.org/request/show/1118220
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=97
2023-10-17 18:24:48 +00:00
37c34593a9 - update to 23.02.6 to fix (CVE-2023-41914, bsc#1216207)
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=271
2023-10-17 08:09:39 +00:00
Ana Guerrero
f946358d8c Accepting request 1117163 from network:cluster
- update to 23.02.6 to fix (CVE-2023-41914)
  * Removed Fix-test-32.8.patch as fixed upstream
  * Bug Fixes:
    + Fix `CpusPerTres=` not updatable with scontrol update
    + Fix unintentional gres removal when validating the gres job state.
    + Fix `--without-hpe-slingshot` configure option.
    + Fix cgroup v2 memory calculations when transparent huge pages are used.
    + Fix parsing of `sgather --timeout` option.
    + Fix regression from 22.05.0 that caused `srun --cpu-bind "=verbose"`
      and `"=v"` options to give different CPU bind masks.
    + Fix "_find_node_record: lookup failure for node" error message appearing
      for all dynamic nodes during reconfigure.
    + Avoid segfault if loading serializer plugin fails.
    + `slurmrestd` - Correct OpenAPI format for `GET /slurm/v0.0.39/licenses`.
    + `slurmrestd` - Correct OpenAPI format for
      `GET /slurm/v0.0.39/job/{job_id}`.
    + `slurmrestd` - Change format to multiple fields in
      `GET /slurmdb/v0.0.39/associations` and `GET /slurmdb/v0.0.39/qos` to
      handle infinite and unset states.
    + When a node fails in a job with `--no-kill`, preserve the extern step on the
      remaining nodes to avoid breaking features that rely on the extern step
      such as `pam_slurm_adopt`, `x11`, and `job_container/tmpfs`.
    + `auth/jwt` - Ignore `x5c` field in JWKS files.
    + `auth/jwt` - Treat 'alg' field as optional in JWKS files.
    + Allow job_desc.selinux_context to be read from the job_submit.lua script.
    + Skip check in slurmstepd that causes a large number of errors in the
      munge log: "Unauthorized credential for client UID=0 GID=0".
      This error will still appear on `slurmd`/`slurmctld`/`slurmdbd` start up
      and is not a cause for concern.
    + `slurmctld` - Allow startup with zero partitions.

OBS-URL: https://build.opensuse.org/request/show/1117163
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=96
2023-10-12 21:41:42 +00:00
449ea49bf9 - Fix changes file formatting
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=269
2023-10-12 10:02:10 +00:00
cd2c5bfc50 Accepting request 1117145 from home:mslacken:branches:network:cluster
* Bug Fixes:
   + Fix CpusPerTres= not updatable with scontrol update
   + Fix unintentional gres removal when validating the gres job state.
   + Fix --without-hpe-slingshot configure option.
   + Fix cgroup v2 memory calculations when transparent huge pages are used.
   + Fix parsing of sgather --timeout option.
   + Fix regression from 22.05.0 that caused srun --cpu-bind "=verbose" and "=v"
     options to give different CPU bind masks.
   + Fix "_find_node_record: lookup failure for node" error message appearing
     for all dynamic nodes during reconfigure.
   + Avoid segfault if loading serializer plugin fails.
   + slurmrestd - Correct OpenAPI format for 'GET /slurm/v0.0.39/licenses'.
   + slurmrestd - Correct OpenAPI format for 'GET /slurm/v0.0.39/job/{job_id}'.
   + slurmrestd - Change format to multiple fields in 'GET
     /slurmdb/v0.0.39/associations' and 'GET /slurmdb/v0.0.39/qos' to handle
     infinite and unset states.
   + When a node fails in a job with --no-kill, preserve the extern step on the
     remaining nodes to avoid breaking features that rely on the extern step
     such as pam_slurm_adopt, x11, and job_container/tmpfs.
   + auth/jwt - Ignore 'x5c' field in JWKS files.
   + auth/jwt - Treat 'alg' field as optional in JWKS files.
   + Allow job_desc.selinux_context to be read from the job_submit.lua script.
   + Skip check in slurmstepd that causes a large number of errors in the munge
     log: "Unauthorized credential for client UID=0 GID=0".  This error will
     still appear on slurmd/slurmctld/slurmdbd start up and is not a cause for
     concern.
   + slurmctld - Allow startup with zero partitions.
   + Fix some mig profile names in slurm not matching nvidia mig profiles.
   + Prevent slurmscriptd processing delays from blocking other threads in
     slurmctld while trying to launch {Prolog|Epilog}Slurmctld.

OBS-URL: https://build.opensuse.org/request/show/1117145
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=268
2023-10-12 09:09:32 +00:00
90bba6a8aa Accepting request 1117137 from home:mslacken:branches:network:cluster
- update to 23.02.6 to fix (CVE-2023-41914) 
  * Removed Fix-test-32.8.patch as fixed upstream

OBS-URL: https://build.opensuse.org/request/show/1117137
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=267
2023-10-12 08:49:44 +00:00
Dominique Leuenberger
12bf38b1d0 Accepting request 1111943 from network:cluster
- Updated to version 23.02.5 with the following changes:
  * Bug Fixes:
    + Revert a change in 23.02 where `SLURM_NTASKS` was no longer set in the
      job's environment when `--ntasks-per-node` was requested.
      The method by which it is being set, however, is different and should be more
      accurate in more situations.
    + Change pmi2 plugin to honor the `SrunPortRange` option. This matches the
      new behavior of the pmix plugin in 23.02.0. Note that neither of these
      plugins makes use of the `MpiParams=ports=` option, and previously
      were only limited by the system's ephemeral port range.
    + Fix regression in 23.02.2 that caused slurmctld -R to crash on startup if
      a node features plugin is configured.
    + Fix and prevent reoccurring reservations from overlapping.
    + `job_container/tmpfs` - Avoid attempts to share BasePath between nodes.
    + With `CR_Cpu_Memory`, fix node selection for jobs that request gres and
      `--mem-per-cpu`.
    + Fix a regression from 22.05.7 in which some jobs were allocated too few
      nodes, thus overcommitting cpus to some tasks.
    + Fix a job being stuck in the completing state if the job ends while the
      primary controller is down or unresponsive and the backup controller has
      not yet taken over.
    + Fix `slurmctld` segfault when a node registers with a configured
      `CpuSpecList` while `slurmctld` configuration has the node without
      `CpuSpecList`.
    + Fix cloud nodes getting stuck in `POWERED_DOWN+NO_RESPOND` state after
      not registering by `ResumeTimeout`.
    + `slurmstepd` - Avoid cleanup of `config.json`-less containers spooldir
      getting skipped.
    + Fix scontrol segfault when 'completing' command requested repeatedly in
      interactive mode.

OBS-URL: https://build.opensuse.org/request/show/1111943
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=95
2023-09-20 11:26:46 +00:00
f0b994e220 plugins makes use of the MpiParams=ports= option, and previously
features with the `|` operator, which could prevent jobs from
    + `node_features/helpers` - Fix inconsistent handling of `&` and `|`,
      instead of just the current set. E.g. `foo|bar&baz` was interpreted
      `{foo} or {bar,baz}`.
      tasks fewer than GPUs, which resulted in incorrectly rejecting these
      jobs.
    + `slurmrestd` - For `GET /slurm/v0.0.39/node[s]`, change format of
      node's energy field `current_watts` to a dictionary to account for
    + `slurmrestd` - For `GET /slurm/v0.0.39/qos`, change format of QOS's
    + slurmrestd - For `GET /slurm/v0.0.39/job[s]`, the 'return code'
      `GET /slurmdb/v0.0.39/jobs` from slurmrestd.
      were present in the log: `error: Attempt to change gres/gpu Count`.
    + Hold the job with `(Reservation ... invalid)` state reason if the

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=265
2023-09-18 05:43:58 +00:00
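
The `node_features/helpers` fragment in the entry above gives `foo|bar&baz` and `{foo} or {bar,baz}` as an example. A minimal Python sketch of that reading, assuming `&` binds tighter than `|` and ignoring parentheses, brackets and counts; `parse_features` and `node_matches` are hypothetical helper names, not part of Slurm:

```python
def parse_features(expr: str) -> list[set[str]]:
    """Split a feature expression into OR'd alternatives of AND'd names.

    Assumption: '&' binds tighter than '|', so each '|'-separated term
    is its own set and '&' only joins names inside that term.
    """
    return [set(term.split("&")) for term in expr.split("|")]


def node_matches(expr: str, node_features: set[str]) -> bool:
    """True if the node carries every feature of at least one alternative."""
    return any(required <= node_features for required in parse_features(expr))
```

Under this model a node with only `foo` matches `foo|bar&baz`, while a node with only `bar` does not, because `baz` is required alongside `bar` but not alongside `foo`.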
74529b6cc2 - Updated to version 23.02.5 with the following changes:
  * Bug Fixes:
    + Revert a change in 23.02 where `SLURM_NTASKS` was no longer set in the
      job's environment when `--ntasks-per-node` was requested.
      The method by which it is being set, however, is different and should be more
      accurate in more situations.
    + Change pmi2 plugin to honor the `SrunPortRange` option. This matches the
      new behavior of the pmix plugin in 23.02.0. Note that neither of these
      plugins makes use of the "`MpiParams=ports=`" option, and previously
      were only limited by the system's ephemeral port range.
    + Fix regression in 23.02.2 that caused slurmctld -R to crash on startup if
      a node features plugin is configured.
    + Fix and prevent reoccurring reservations from overlapping.
    + `job_container/tmpfs` - Avoid attempts to share BasePath between nodes.
    + With `CR_Cpu_Memory`, fix node selection for jobs that request gres and
      `--mem-per-cpu`.
    + Fix a regression from 22.05.7 in which some jobs were allocated too few
      nodes, thus overcommitting cpus to some tasks.
    + Fix a job being stuck in the completing state if the job ends while the
      primary controller is down or unresponsive and the backup controller has
      not yet taken over.
    + Fix `slurmctld` segfault when a node registers with a configured
      `CpuSpecList` while `slurmctld` configuration has the node without
      `CpuSpecList`.
    + Fix cloud nodes getting stuck in `POWERED_DOWN+NO_RESPOND` state after
      not registering by `ResumeTimeout`.
    + `slurmstepd` - Avoid cleanup of `config.json`-less containers spooldir
      getting skipped.
    + Fix scontrol segfault when 'completing' command requested repeatedly in
      interactive mode.

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=264
2023-09-18 05:24:51 +00:00
Ana Guerrero
3825e9fab0 Accepting request 1110422 from network:cluster
- Create a macro for upgrade dependency to ensure uniform handling. (forwarded request 1110421 from eeich)

OBS-URL: https://build.opensuse.org/request/show/1110422
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=94
2023-09-12 19:02:53 +00:00
a323feff42 Accepting request 1110421 from home:eeich:branches:network:cluster
- Create a macro for upgrade dependency to ensure uniform handling.

OBS-URL: https://build.opensuse.org/request/show/1110421
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=262
2023-09-12 04:52:56 +00:00
Ana Guerrero
3bcde4bfd9 Accepting request 1110259 from network:cluster
- Updated to 23.02.4 with the following changes:
  * Bug Fixes:
    + Fix main scheduler loop not starting after a failover to backup
      controller. Avoid slurmctld segfault when specifying
     `AccountingStorageExternalHost` (bsc#1214983).
    + Fix sbatch return code when `--wait` is requested on a job array.
    + Fix collected `GPUUtilization` values for `acct_gather_profile` plugins.
    + Fix `slurmrestd` handling of job hold/release operations.
    + Fix step running indefinitely when slurmctld takes more than
      `MessageTimeout` to respond. Now, `slurmctld` will cancel the step when
       detected, preventing following steps from getting stuck waiting for
       resources to be released.
    + Fix regression to make `job_desc.min_cpus` accurate again in `job_submit`
      when requesting a job with `--ntasks-per-node`.
    + Fix handling of `ArrayTaskThrottle` in backfill.
    + Fix regression in 23.02.2 when checking gres state on `slurmctld`
      startup or reconfigure. Gres changes in the configuration were not
      updated on slurmctld startup. On startup or reconfigure, these messages
      were present in the log: `"error: Attempt to change gres/gpu Count"`.
    + Fix potential double count of gres when dealing with limits.
    + Fix `slurmstepd` segfault when `ContainerPath` is not set in `oci.conf`
    + Fixed an issue where jobs requesting licenses were incorrectly rejected.
    + `scrontab` - Fix cutting off the final character of quoted variables.
    + `smail` - Fix issues where e-mails at job completion were not being sent.
    + `scontrol/slurmctld` - fix comma parsing when updating a reservation's
       nodes.
    + Fix `--gpu-bind=single` binding tasks to wrong gpus, leading to some gpus
      having more tasks than they should and other gpus being unused.
    + Fix regression in 23.02 that causes slurmstepd to crash when `srun`
      requests more than `TreeWidth` nodes in a step and uses the pmi2 or

OBS-URL: https://build.opensuse.org/request/show/1110259
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=93
2023-09-11 19:22:19 +00:00
f9646ba945 - Updated to 23.02.4 with the following changes:
  * Bug Fixes:
    + Fix main scheduler loop not starting after a failover to backup
      controller. Avoid slurmctld segfault when specifying
     `AccountingStorageExternalHost` (bsc#1214983).
    + Fix sbatch return code when `--wait` is requested on a job array.
    + Fix collected `GPUUtilization` values for `acct_gather_profile` plugins.
    + Fix `slurmrestd` handling of job hold/release operations.
    + Fix step running indefinitely when slurmctld takes more than
      `MessageTimeout` to respond. Now, `slurmctld` will cancel the step when
       detected, preventing following steps from getting stuck waiting for
       resources to be released.
    + Fix regression to make `job_desc.min_cpus` accurate again in `job_submit`
      when requesting a job with `--ntasks-per-node`.
    + Fix handling of `ArrayTaskThrottle` in backfill.
    + Fix regression in 23.02.2 when checking gres state on `slurmctld`
      startup or reconfigure. Gres changes in the configuration were not
      updated on slurmctld startup. On startup or reconfigure, these messages
      were present in the log: `"error: Attempt to change gres/gpu Count"`.
    + Fix potential double count of gres when dealing with limits.
    + Fix `slurmstepd` segfault when `ContainerPath` is not set in `oci.conf`
    + Fixed an issue where jobs requesting licenses were incorrectly rejected.
    + `scrontab` - Fix cutting off the final character of quoted variables.
    + `smail` - Fix issues where e-mails at job completion were not being sent.
    + `scontrol/slurmctld` - fix comma parsing when updating a reservation's
       nodes.
    + Fix `--gpu-bind=single` binding tasks to wrong gpus, leading to some gpus
      having more tasks than they should and other gpus being unused.
    + Fix regression in 23.02 that causes slurmstepd to crash when `srun`
      requests more than `TreeWidth` nodes in a step and uses the pmi2 or

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=260
2023-09-11 07:21:32 +00:00
Ana Guerrero
6b47182efe Accepting request 1109308 from network:cluster
- Fixes since 23.02.03:
  Highlights:
  * Fix main scheduler loop not starting after a failover to backup controller.
  * Avoid slurmctld segfault when specifying `AccountingStorageExternalHost`
    (bsc#1214983).
  Other:
  * Fix sbatch return code when `--wait` is requested on a job array.
  * Fix collected `GPUUtilization` values for `acct_gather_profile` plugins.
  * Fix `slurmrestd` handling of job hold/release operations.
  * Make spank `S_JOB_ARGV` item value hold the requested command `argv`
    instead of the `srun --bcast` value when `--bcast` requested (only in local
    context).
  * Fix step running indefinitely when slurmctld takes more than
    `MessageTimeout` to respond. Now, slurmctld will cancel the step when
    detected, preventing following steps from getting stuck waiting for
    resources to be released.
  * Fix regression to make `job_desc.min_cpus` accurate again in job_submit when
    requesting a job with `--ntasks-per-node`.
  * Fix handling of `ArrayTaskThrottle` in backfill.
  * Fix regression in 23.02.2 when checking gres state on `slurmctld` startup or
    reconfigure. Gres changes in the configuration were not updated on slurmctld
    startup. On startup or reconfigure, these messages were present in the log:
    `"error: Attempt to change gres/gpu Count"`.
  * Fix potential double count of gres when dealing with limits.
  * Fix slurmstepd segfault when ContainerPath is not set in `oci.conf`
  * Fixed an issue where jobs requesting licenses were incorrectly rejected.
  * `scrontab` - Fix cutting off the final character of quoted variables.
  * `smail` - Fix issues where e-mails at job completion were not being sent.
  * `scontrol/slurmctld` - fix comma parsing when updating a reservation's
    nodes.

OBS-URL: https://build.opensuse.org/request/show/1109308
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=92
2023-09-07 19:12:41 +00:00
c63b605916 - Fixes since 23.02.03:
  Highlights:
  * Fix main scheduler loop not starting after a failover to backup controller.
  * Avoid slurmctld segfault when specifying `AccountingStorageExternalHost`
    (bsc#1214983).
  Other:
  * Fix sbatch return code when `--wait` is requested on a job array.
  * Fix collected `GPUUtilization` values for `acct_gather_profile` plugins.
  * Fix `slurmrestd` handling of job hold/release operations.
  * Make spank `S_JOB_ARGV` item value hold the requested command `argv`
    instead of the `srun --bcast` value when `--bcast` requested (only in local
    context).
  * Fix step running indefinitely when slurmctld takes more than
    `MessageTimeout` to respond. Now, slurmctld will cancel the step when
    detected, preventing following steps from getting stuck waiting for
    resources to be released.
  * Fix regression to make `job_desc.min_cpus` accurate again in job_submit when
    requesting a job with `--ntasks-per-node`.
  * Fix handling of `ArrayTaskThrottle` in backfill.
  * Fix regression in 23.02.2 when checking gres state on `slurmctld` startup or
    reconfigure. Gres changes in the configuration were not updated on slurmctld
    startup. On startup or reconfigure, these messages were present in the log:
    `"error: Attempt to change gres/gpu Count"`.
  * Fix potential double count of gres when dealing with limits.
  * Fix slurmstepd segfault when ContainerPath is not set in `oci.conf`
  * Fixed an issue where jobs requesting licenses were incorrectly rejected.
  * `scrontab` - Fix cutting off the final character of quoted variables.
  * `smail` - Fix issues where e-mails at job completion were not being sent.
  * `scontrol/slurmctld` - fix comma parsing when updating a reservation's
    nodes.

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=258
2023-09-06 17:11:37 +00:00
Ana Guerrero
51bec69223 Accepting request 1109029 from network:cluster
- updated to 23.02.04, which includes the following changes:
  * fixes for the main scheduler loop not starting on the backup controller
    after a failover event, a segfault when attempting to use
    AccountingStorageExternalHost, and an issue where steps could continue
    running indefinitely if the slurmctld takes too long to respond (bsc#1214983)
  * includes a fix for potential slurmctld crashes when the backup slurmctld
    takes over.
  * This also fixes some issues when using older versions of the command line
    tools with a 23.02 controller.
  * srun/sbatch/salloc - In order to support user namespaces, process user and
    group ids are no longer used unless explicitly requested as an argument and
    are left as nobody(99) by default. Any cli_filters or SPANK plugins need to
    ignore any uid or gid that equal SLURM_AUTH_NOBODY (99). User and group ids
    are now resolved by the active auth plugin. To determine the actual job uid
    or gid you should use the RESPONSE_RESOURCE_ALLOCATION RPC.
- removed Fix-test-3.13.patch as fixed upstream
- removed Fix-test-38.11.patch as test changed upstream (forwarded request 1109009 from mslacken)

OBS-URL: https://build.opensuse.org/request/show/1109029
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=91
2023-09-06 16:57:11 +00:00
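
The entry above asks authors of cli_filters and SPANK plugins to ignore any uid or gid equal to SLURM_AUTH_NOBODY (99). The check itself could look roughly like the Python sketch below. This is an illustration only: real plugins would typically be written in C or Lua, and `explicit_id` is a hypothetical helper, not a Slurm API.

```python
from typing import Optional

# Placeholder value quoted in the changelog entry above.
SLURM_AUTH_NOBODY = 99


def explicit_id(value: int) -> Optional[int]:
    """Return the uid/gid only if it was explicitly requested.

    A value equal to SLURM_AUTH_NOBODY means "not set by the user", so
    the caller should skip any per-user or per-group logic for it.
    """
    return None if value == SLURM_AUTH_NOBODY else value
```

A filter built on this would apply its per-user rules only when `explicit_id(uid)` is not `None`, and otherwise leave the decision to the ids resolved by the active auth plugin.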
47d665607b Accepting request 1109009 from home:mslacken:branches:network:cluster
- updated to 23.02.04, which includes the following changes:
  * fixes for the main scheduler loop not starting on the backup controller
    after a failover event, a segfault when attempting to use
    AccountingStorageExternalHost, and an issue where steps could continue
    running indefinitely if the slurmctld takes too long to respond (bsc#1214983)
  * includes a fix for potential slurmctld crashes when the backup slurmctld
    takes over.
  * This also fixes some issues when using older versions of the command line
    tools with a 23.02 controller.
  * srun/sbatch/salloc - In order to support user namespaces, process user and
    group ids are no longer used unless explicitly requested as an argument and
    are left as nobody(99) by default. Any cli_filters or SPANK plugins need to
    ignore any uid or gid that equal SLURM_AUTH_NOBODY (99). User and group ids
    are now resolved by the active auth plugin. To determine the actual job uid
    or gid you should use the RESPONSE_RESOURCE_ALLOCATION RPC.
- removed Fix-test-3.13.patch as fixed upstream
- removed Fix-test-38.11.patch as test changed upstream

OBS-URL: https://build.opensuse.org/request/show/1109009
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=256
2023-09-05 11:47:06 +00:00
Dominique Leuenberger
03d2eefa9e Accepting request 1085677 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/1085677
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=90
2023-05-09 11:09:16 +00:00
532aa1e96d Accepting request 1085668 from home:mslacken:branches:network:cluster
- updated to 23.02.02 which includes a number of fixes to Slurm stability
  * Includes a fix for a regression in 23.02 that caused openmpi mpirun to fail
    to launch tasks. 
  * It also includes two functional changes: Don't update the cron job tasks if
    the whole crontab file is left untouched after opening it with scrontab -e
  * Sort dynamic nodes and include them in topology after scontrol reconfigure
    or a slurmctld restart.

OBS-URL: https://build.opensuse.org/request/show/1085668
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=254
2023-05-09 10:35:16 +00:00
Dominique Leuenberger
0d5e08df4b Accepting request 1083466 from network:cluster
- Web-configurator: changed presets to SUSE defaults.
- If %_restart_on_update is no longer defined, replace it with our own
  macro.
- Marked slurm-openlava, slurm-seff and slurm-sjstat noarch.
- rpmlint:
  * dropped some rpmlint filters which are no longer relevant.
  * added/refreshed filters. For Details, see rpmlintrc.
- Remove workaround to fix the restart issue in a Slurm package
  described in bsc#1088693.
  The Slurm version in this package was 16.05. Any attempt to
  directly migrate to the current version is bound to fail
  anyway.
- Now require slurm-munge if munge authentication is installed.

OBS-URL: https://build.opensuse.org/request/show/1083466
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=89
2023-04-28 14:23:13 +00:00
33bf8791ac - Require slurm-munge if munge authentication is installed.
- Replace 'Require: config(pam)' by 'Require: pam'.

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=252
2023-04-28 07:46:44 +00:00
392bec3223 Accepting request 1082770 from home:eeich:branches:network:cluster
- Web-configurator: changed presets to SUSE defaults.
- If %_restart_on_update is no longer defined, replace it with our own
  macro.
- Marked slurm-openlava, slurm-seff and slurm-sjstat noarch.
- rpmlint:
  * dropped some rpmlint filters which are no longer relevant.
  * added/refreshed filters. For Details, see rpmlintrc.
- Remove workaround to fix the restart issue in a Slurm package
  described in bsc#1088693.
  The Slurm version in this package was 16.05. Any attempt to
  directly migrate to the current version is bound to fail
  anyway.

OBS-URL: https://build.opensuse.org/request/show/1082770
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=251
2023-04-27 13:24:37 +00:00
Dominique Leuenberger
e27e58c1b6 Accepting request 1076522 from network:cluster
- updated to 23.02.1 with the following changes:
  * job_container/tmpfs - cleanup job container even if namespace mount is
    already unmounted.
  * openapi/dbv0.0.38 - Fix not displaying an error when updating QOS or
    associations fails.
  * Fix nodes remaining as PLANNED after slurmctld save state recovery.
  * Add cgroup.conf EnableControllers option for cgroup/v2.
  * Get correct cgroup root to allow slurmd to run in containers like Docker.
  * slurmctld - add missing PrivateData=jobs check to step ContainerID lookup
    requests originated from 'scontrol show step container-id=<id>' or certain
    scrun operations when container state can't be directly queried.
  * Fix nodes un-draining after being drained due to unkillable step.
  * Fix remote licenses allowed percentages reset to 0 during upgrade.
  * sacct - Avoid truncating time strings when using SLURM_TIME_FORMAT with
    the --parsable option.
  * Fix regression in 22.05.0rc1 that broke Nodes=ALL in a NodeSet.
  * openapi/v0.0.39 - fix jobs submitted via slurmrestd being allocated fewer
    CPUs than tasks when requesting multiple tasks.
  * Fix job not being scheduled on valid nodes and potentially being rejected
    when using parentheses at the beginning of square brackets in a feature
    request, for example: "feat1&[(feat2|feat3)]".
  * Fix regression in 23.02.0rc1 which made --gres-flags=enforce-binding no
    longer enforce optimal core-gpu job placement.
  * mpi/pmix - Fix v5 to load correctly when libpmix.so isn't in the normal
    lib path.
  * data_parser/v0.0.39 - fix regression where "memory_per_node" would be
    rejected for job submission.
  * data_parser/v0.0.39 - fix regression where "memory_per_cpu" would be
    rejected for job submission.
  * slurmctld - add an assert to check for magic number presence before deleting

OBS-URL: https://build.opensuse.org/request/show/1076522
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=88
2023-04-01 17:32:20 +00:00
5a68fc8e5f - updated to 23.02.1 with the following changes:
- removed right-pmix-path.patch as fixed upstream

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=249
2023-03-31 15:48:27 +00:00
d2a2e0a1e8 Accepting request 1076461 from home:mslacken:branches:network:cluster
- updated to 23.02.1 with the following changes:
  * job_container/tmpfs - cleanup job container even if namespace mount is
    already unmounted.
  * openapi/dbv0.0.38 - Fix not displaying an error when updating QOS or
    associations fails.
  * Fix nodes remaining as PLANNED after slurmctld save state recovery.
  * Add cgroup.conf EnableControllers option for cgroup/v2.
  * Get correct cgroup root to allow slurmd to run in containers like Docker.
  * slurmctld - add missing PrivateData=jobs check to step ContainerID lookup
    requests originated from 'scontrol show step container-id=<id>' or certain
    scrun operations when container state can't be directly queried.
  * Fix nodes un-draining after being drained due to unkillable step.
  * Fix remote licenses allowed percentages reset to 0 during upgrade.
  * sacct - Avoid truncating time strings when using SLURM_TIME_FORMAT with
    the --parsable option.
  * Fix regression in 22.05.0rc1 that broke Nodes=ALL in a NodeSet.
  * openapi/v0.0.39 - fix jobs submitted via slurmrestd being allocated fewer
    CPUs than tasks when requesting multiple tasks.
  * Fix job not being scheduled on valid nodes and potentially being rejected
    when using parentheses at the beginning of square brackets in a feature
    request, for example: "feat1&[(feat2|feat3)]".
  * Fix regression in 23.02.0rc1 which made --gres-flags=enforce-binding no
    longer enforce optimal core-gpu job placement.
  * mpi/pmix - Fix v5 to load correctly when libpmix.so isn't in the normal
    lib path.
  * data_parser/v0.0.39 - fix regression where "memory_per_node" would be
    rejected for job submission.
  * data_parser/v0.0.39 - fix regression where "memory_per_cpu" would be
    rejected for job submission.
  * slurmctld - add an assert to check for magic number presence before deleting

OBS-URL: https://build.opensuse.org/request/show/1076461
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=248
2023-03-31 15:44:08 +00:00
Dominique Leuenberger
c7d67ed696 Accepting request 1072592 from network:cluster
added: right-pmix-path.patch (forwarded request 1072591 from mslacken)

OBS-URL: https://build.opensuse.org/request/show/1072592
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=87
2023-03-17 16:05:03 +00:00
5c3d4865a1 Accepting request 1072591 from home:mslacken:branches:network:cluster
added: right-pmix-path.patch

OBS-URL: https://build.opensuse.org/request/show/1072591
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=246
2023-03-17 10:52:44 +00:00
9883ad6d58 Accepting request 1072585 from home:mslacken:branches:network:cluster
- use libpmix.so.2 instead of libpmix.so to fix (bsc#1209260);
  this removes the need for pmix-pluginlib

OBS-URL: https://build.opensuse.org/request/show/1072585
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=245
2023-03-17 10:42:09 +00:00
Dominique Leuenberger
2de2dcca49 Accepting request 1072087 from network:cluster
- slurm-plugins need to require pmix-pluginlib (bsc#1209260) (forwarded request 1072084 from mslacken)

OBS-URL: https://build.opensuse.org/request/show/1072087
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=86
2023-03-15 17:56:12 +00:00
521f372d87 Accepting request 1072084 from home:mslacken:branches:network:cluster
- slurm-plugins needs to require pmix-pluginlib (bsc#1209260)

OBS-URL: https://build.opensuse.org/request/show/1072084
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=243
2023-03-15 10:57:09 +00:00
Dominique Leuenberger
c224ea00c3 Accepting request 1070214 from network:cluster
- Fixing dependencies for slurm-plugin-ext-sensors-rrd again. (forwarded request 1070212 from eeich)

OBS-URL: https://build.opensuse.org/request/show/1070214
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=85
2023-03-09 16:45:23 +00:00
e85b508441 Accepting request 1070212 from home:eeich:branches:network:cluster
- Fixing dependencies for slurm-plugin-ext-sensors-rrd again.

OBS-URL: https://build.opensuse.org/request/show/1070212
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=241
2023-03-08 15:43:28 +00:00
86940cb8c4 Accepting request 1070094 from home:eeich:branches:network:cluster
- Fix conflicts for plugin-ext-sensors-rrd

OBS-URL: https://build.opensuse.org/request/show/1070094
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=240
2023-03-08 07:58:58 +00:00
0f04c66747 Accepting request 1070043 from home:eeich:branches:network:cluster
- Fixup previous submission.

OBS-URL: https://build.opensuse.org/request/show/1070043
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=239
2023-03-07 22:14:15 +00:00
da464bfaae Accepting request 1070038 from home:eeich:branches:network:cluster
- Stop pulling firewall rules from github. There is no benefit in
  hosting these separately.
- Remove pre-sle12 pieces.

- Add missing Provides:, Conflicts: and Obsoletes: to slurm-cray,
  slurm-hdf5 and slurm-testsuite to avoid package conflicts.
- Unify Obsoletes:.
- Consolidate spec files between different Slurm releases in
  Leap/SLE maintenance.

OBS-URL: https://build.opensuse.org/request/show/1070038
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=238
2023-03-07 21:33:03 +00:00
Dominique Leuenberger
50b2b76a05 Accepting request 1068523 from network:cluster
- Add missing Provides: and Obsoletes: to slurm-cray, slurm-hdf5
  and slurm-testsuite to avoid package conflicts.
- Add dependency for the general plugin package to the
  AcctGatherProfile HDF5 plugin.
- Adjust node RealMemory in slurm.conf of test suite for 8G test
  nodes. (forwarded request 1068522 from eeich)

OBS-URL: https://build.opensuse.org/request/show/1068523
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=84
2023-03-02 22:03:34 +00:00
6997bacde0 Accepting request 1068522 from home:eeich:branches:network:cluster
- Add missing Provides: and Obsoletes: to slurm-cray, slurm-hdf5
  and slurm-testsuite to avoid package conflicts.
- Add dependency for the general plugin package to the
  AcctGatherProfile HDF5 plugin.
- Adjust node RealMemory in slurm.conf of test suite for 8G test
  nodes.

OBS-URL: https://build.opensuse.org/request/show/1068522
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=236
2023-03-01 17:58:54 +00:00
Dominique Leuenberger
8a8f7dcb78 Accepting request 1068320 from network:cluster
- updated to 23.02.0
  * Highlights
    + slurmctld - Add new RPC rate limiting feature. This is enabled through
      SlurmctldParameters=rl_enable, otherwise disabled by default.
    + Make scontrol reconfigure and sending a SIGHUP to the slurmctld behave
      the same. If you were using SIGHUP as a 'lighter' scontrol reconfigure
      to rotate logs, please update your scripts to use SIGUSR2 instead.
    + Change cloud nodes to show by default. PrivateData=cloud is no longer
      needed.
    + sreport - Count planned (FKA reserved) time for jobs running in
      IGNORE_JOBS reservations. Previously was lumped into IDLE time.
    + job_container/tmpfs - Support running with an arbitrary list of private
      mount points (/tmp and /dev/shm are the default, but not required).
    + job_container/tmpfs - Set more environment variables in InitScript.
    + Make all cgroup directories created by Slurm owned by root. This was the
      behavior in cgroup/v2 but not in cgroup/v1 where, by default, ownership
      of the step directories was set to the user and group of the job.
    + accounting_storage/mysql - change purge/archive to calculate record ages
      based on end time, rather than start or submission times.
    + job_submit/lua - add support for log_user() from slurm_job_modify().
    + Run the following scripts in slurmscriptd instead of slurmctld:
      ResumeProgram, ResumeFailProgram, SuspendProgram, ResvProlog, ResvEpilog,
      and RebootProgram (only with SlurmctldParameters=reboot_from_controller).
    + Only permit changing log levels with 'srun --slurmd-debug' by root
      or SlurmUser.
    + slurmctld will fatal() when reconfiguring the job_submit plugin fails.
    + Add PowerDownOnIdle partition option to power down nodes after nodes
      become idle.
    + Add "[jobid.stepid]" prefix from slurmstepd and "slurmscriptd" prefix
      from slurmscriptd to Syslog logging. Previously was only happening when

OBS-URL: https://build.opensuse.org/request/show/1068320
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=83
2023-03-01 15:14:17 +00:00
e60f39a466 - updated to 23.02.0
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=234
2023-02-28 20:50:48 +00:00
8899aac00b - testsuite: on later SUSE versions claim ownership of directory
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=233
2023-02-28 20:34:03 +00:00
18aa012ab9 Accepting request 1068316 from home:eeich:branches:network:cluster
+ Fixed GpuFreqDef option. When set in slurm.conf, it will be used if
      --gpu-freq was not explicitly set by the job step.
    + topology/tree - Add new TopologyParam=SwitchAsNodeRank option to reorder
      nodes based on switch layout. This can be useful if the naming convention
      for the nodes does not naturally map to the network topology.
    + Removed the default setting for GpuFreqDef. If unset, no attempt to change
      the GPU frequency will be made if --gpu-freq is not set for the step.

OBS-URL: https://build.opensuse.org/request/show/1068316
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=232
2023-02-28 20:30:32 +00:00
ef6d6521aa Accepting request 1067475 from home:eeich:branches:network:cluster
- updated to 23.02.0-0rc1
  * Highlights
    + slurmctld - Add new RPC rate limiting feature. This is enabled through
      SlurmctldParameters=rl_enable, otherwise disabled by default.
    + Make scontrol reconfigure and sending a SIGHUP to the slurmctld behave
      the same. If you were using SIGHUP as a 'lighter' scontrol reconfigure
      to rotate logs, please update your scripts to use SIGUSR2 instead.
    + Change cloud nodes to show by default. PrivateData=cloud is no longer
      needed.
    + sreport - Count planned (FKA reserved) time for jobs running in
      IGNORE_JOBS reservations. Previously was lumped into IDLE time.
    + job_container/tmpfs - Support running with an arbitrary list of private
      mount points (/tmp and /dev/shm are the default, but not required).
    + job_container/tmpfs - Set more environment variables in InitScript.
    + Make all cgroup directories created by Slurm owned by root. This was the
      behavior in cgroup/v2 but not in cgroup/v1 where, by default, ownership
      of the step directories was set to the user and group of the job.
    + accounting_storage/mysql - change purge/archive to calculate record ages
      based on end time, rather than start or submission times.
    + job_submit/lua - add support for log_user() from slurm_job_modify().
    + Run the following scripts in slurmscriptd instead of slurmctld:
      ResumeProgram, ResumeFailProgram, SuspendProgram, ResvProlog, ResvEpilog,
      and RebootProgram (only with SlurmctldParameters=reboot_from_controller).
    + Only permit changing log levels with 'srun --slurmd-debug' by root
      or SlurmUser.
    + slurmctld will fatal() when reconfiguring the job_submit plugin fails.
    + Add PowerDownOnIdle partition option to power down nodes after nodes
      become idle.
    + Add "[jobid.stepid]" prefix from slurmstepd and "slurmscriptd" prefix
      from slurmscriptd to Syslog logging. Previously was only happening when

OBS-URL: https://build.opensuse.org/request/show/1067475
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=231
2023-02-23 19:32:51 +00:00
Dominique Leuenberger
d1ebf00ba6 Accepting request 1063957 from network:cluster
- testsuite: on later SUSE versions claim ownership of directory
  /etc/security/limits.d. (forwarded request 1063954 from eeich)

OBS-URL: https://build.opensuse.org/request/show/1063957
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=82
2023-02-09 15:23:26 +00:00
4693e39860 Accepting request 1063954 from home:eeich:branches:network:cluster
- testsuite: on later SUSE versions claim ownership of directory
  /etc/security/limits.d.

OBS-URL: https://build.opensuse.org/request/show/1063954
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=229
2023-02-09 08:22:55 +00:00