Commit Graph

5 Commits

Author SHA256 Message Date
b1107f7a34 - Update to version 24.05.4 & fix for CVE-2024-48936.
  * Fix generic int sort functions.
  * Fix user look up using possible unrealized uid in the dbd.
  * `slurmrestd` - Fix regressions that allowed `slurmrestd` to
    be run as SlurmUser when `SlurmUser` was not root.
  * `mpi/pmix` - Fix race conditions with het jobs at step start/end
    which could cause srun to hang.
  * Fix not showing some `SelectTypeParameters` in `scontrol show
    config`.
  * Avoid assert when dumping certain removed fields in JSON/YAML.
  * Improve how shards are scheduled with affinity in mind.
  * Fix `MaxJobsAccruePU` not being respected when `MaxJobsAccruePA`
    is set in the same QOS.
  * Prevent backfill from planning jobs that use overlapping
    resources for the same time slot if the job's time limit is
    less than `bf_resolution`.
  * Fix memory leak when requesting typed gres and
    `--[cpus|mem]-per-gpu`.
  * Prevent backfill from breaking out due to "system state
    changed" every 30 seconds if reservations use `REPLACE` or
    `REPLACE_DOWN` flags.
  * `slurmrestd` - Make sure that scheduler_unset parameter defaults
    to true even when the following flags are also set:
    `show_duplicates`, `skip_steps`, `disable_truncate_usage_time`,
    `run_away_jobs`, `whole_hetjob`, `disable_whole_hetjob`,
    `disable_wait_for_result`, `usage_time_as_submit_time`,
    `show_batch_script`, and/or `show_job_environment`. Additionally,
    always make sure `show_duplicates` and
    `disable_truncate_usage_time` default to true when the following
    flags are also set: `scheduler_unset`, `scheduled_on_submit`,

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=300
2024-11-01 13:22:34 +00:00
427f09ad29 - Add %{?sysusers_requires} to slurm-config.
This fixes issues when building against Slurm.

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=298
2024-10-23 09:42:56 +00:00
1cc2983ebe - Removed Fix-test-21.41.patch as upstream test changed.
- Dropped package plugin-ext-sensors-rrd as the plugin module no
  longer exists.

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=296
2024-10-15 10:19:24 +00:00
b2f6e848a1 - Update to version 24.05.3
  * `data_parser/v0.0.40` - Added field descriptions.
  * `slurmrestd` - Avoid creating new slurmdbd connection per request
    to `* /slurm/slurmctld/*/*` endpoints.
  * Fix compilation issue with `switch/hpe_slingshot` plugin.
  * Fix gres per task allocation with threads-per-core.
  * `data_parser/v0.0.41` - Added field descriptions.
  * `slurmrestd` - Change back generated OpenAPI schema for
    `DELETE /slurm/v0.0.40/jobs/` to `RequestBody` instead of using
    parameters for request. `slurmrestd` will continue to accept endpoint
    requests via `RequestBody` or HTTP query.
  * `topology/tree` - Fix issues with switch distance optimization.
  * Fix potential segfault of secondary `slurmctld` when falling back
    to the primary when running with a `JobComp` plugin.
  * Enable `--json`/`--yaml=v0.0.39` options on client commands to
    dump data using `data_parser/v0.0.39` instead of outputting nothing.
  * `switch/hpe_slingshot` - Fix issue that could result in a 0 length
    state file.
  * Fix unnecessary message protocol downgrade for unregistered nodes.
  * Fix unnecessarily packing alias addrs when terminating jobs with
    a mix of non-cloud/dynamic nodes and powered down cloud/dynamic
    nodes.
  * `accounting_storage/mysql` - Fix issue when deleting a qos that
    could remove too many commas from the qos and/or delta_qos fields
    of the assoc table.
  * `slurmctld` - Fix memory leak when using RestrictedCoresPerGPU.
  * Fix allowing access to reservations without `MaxStartDelay` set.
  * Fix regression introduced in 24.05.0rc1 breaking
    `srun --send-libs` parsing.
  * Fix slurmd vsize memory leak when using job submission/allocation

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=295
2024-10-15 06:51:09 +00:00
fc209e050f - Updated to new release 24.05.0 with the following major changes:
- IMPORTANT NOTES:
  If using the slurmdbd (Slurm DataBase Daemon) you must update
  this first.  NOTE: If using a backup DBD you must start the
  primary first to do any database conversion, the backup will not
  start until this has happened.  The 24.05 slurmdbd will work
  with Slurm daemons of version 23.02 and above.  You will not
  need to update all clusters at the same time, but it is very
  important to update slurmdbd first and have it running before
  updating any other clusters making use of it.
- HIGHLIGHTS
  * Federation - allow client command operation when slurmdbd is
    unavailable.
  * burst_buffer/lua - Added two new hooks: slurm_bb_test_data_in
    and slurm_bb_test_data_out. The syntax and use of the new hooks
    are documented in etc/burst_buffer.lua.example. These are
    required to exist. slurmctld now checks on startup if the
    burst_buffer.lua script loads and contains all required hooks;
    slurmctld will exit with a fatal error if this is not
    successful. Added PollInterval to burst_buffer.conf. Removed
    the arbitrary limit of 512 copies of the script running
    simultaneously.
  * Add QOS limit MaxTRESRunMinsPerAccount. 
  * Add QOS limit MaxTRESRunMinsPerUser.
  * Add ELIGIBLE environment variable to jobcomp/script plugin.
  * Always use the QOS name for SLURM_JOB_QOS environment variables.
    Previously the batch environment would use the description field,
    which was usually equivalent to the name. 
  * cgroup/v2 - Require dbus-1 version >= 1.11.16.
  * Allow NodeSet names to be used in SuspendExcNodes.

OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=294
2024-10-14 10:03:00 +00:00
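
The IMPORTANT NOTES of the 24.05.0 entry state that a 24.05 slurmdbd works with Slurm daemons of version 23.02 and above, and that slurmdbd must be updated first. A minimal sketch of that rule as a pre-upgrade check follows. The function names are hypothetical, and the upper bound (daemons no newer than 24.05) is an assumption read out of the "update slurmdbd first" advice rather than an explicit statement in the changelog.

```python
def parse_release(version: str) -> tuple[int, int]:
    """Return (major, minor) from a Slurm version string such as '24.05.4'."""
    major, minor, *_ = version.split(".")
    return int(major), int(minor)


# From the changelog: the 24.05 slurmdbd accepts daemons from 23.02 upward.
OLDEST_SUPPORTED = (23, 2)
# Assumption: daemons newer than the slurmdbd release are not supported.
DBD_RELEASE = (24, 5)


def works_with_2405_dbd(daemon_version: str) -> bool:
    """True if a daemon of this version falls inside the stated window."""
    return OLDEST_SUPPORTED <= parse_release(daemon_version) <= DBD_RELEASE
```

For example, `works_with_2405_dbd("23.02.7")` is true while `works_with_2405_dbd("22.05.11")` is false, so a cluster still on 22.05 would need its own upgrade before talking to the new slurmdbd.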