forked from pool/slurm
Commit Graph

4 Commits

Author SHA256 Message Date
e7275730c8 Accepting request 1138332 from home:mslacken:branches:network:cluster
- Update to 23.11.1 with the following major improvements, fixing
  CVE-2023-49933, CVE-2023-49934, CVE-2023-49935, CVE-2023-49936 and
  CVE-2023-49937
  * Substantially overhauled the SlurmDBD association management code. For
    clusters updated to 23.11, account and user additions or removals are
    significantly faster than in prior releases.
  * Overhauled 'scontrol reconfigure' to prevent configuration mistakes from
    disabling slurmctld and slurmd. Instead, an error will be returned, and the
    running configuration will persist. This does require updates to the
    systemd service files to use the --systemd option to slurmctld and slurmd.
  * Added a new internal auth/cred plugin - "auth/slurm". This builds off the
    prior auth/jwt model, and permits operation of the slurmdbd and slurmctld
    without access to full directory information with a suitable configuration.
  * Added a new --external-launcher option to srun, which is automatically set
    by common MPI launcher implementations and ensures processes using those
    non-srun launchers have full access to all resources allocated on each
    node.
  * Reworked the dynamic/cloud modes of operation to allow for "fanout" - where
    Slurm communication can be automatically offloaded to compute nodes for
    increased cluster scalability.
  * Added initial official Debian packaging support.
  * Overhauled and extended the Reservation subsystem to allow for most of the
    same resource requirements as are placed on the job. Notably, this permits
    reservations to now reserve GRES directly.
- Details of changes:
  * Fix scontrol update job=... TimeLimit+=/-= when used with a raw JobId of a job
    array element.
  * Reject TimeLimit increment/decrement when called on a job with
    TimeLimit=UNLIMITED.
  * Fix issue with requesting a job with  *licenses as well as

OBS-URL: https://build.opensuse.org/request/show/1138332
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=284
2024-01-22 15:21:33 +00:00
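The 23.11 entry above notes that the systemd service files must pass the --systemd option to slurmctld and slurmd. A minimal sketch of a systemd drop-in for slurmctld follows. It rests on several assumptions:

- the binary is installed at /usr/sbin/slurmctld;
- extra options come from a SLURMCTLD_OPTIONS environment variable;
- the drop-in file name is hypothetical;
- the unit uses the Type=notify readiness handshake, as the upstream 23.11 unit files are believed to do.

Adapt these to the local installation, and apply the same pattern to slurmd with its own options variable.

```
# /etc/systemd/system/slurmctld.service.d/systemd-flag.conf
# Hypothetical drop-in name; binary path and variable name are assumptions.
[Service]
# Assumption: with --systemd, slurmctld reports readiness to systemd itself.
Type=notify
# An empty ExecStart= clears the command inherited from the packaged unit,
# so the line after it replaces rather than adds to it.
ExecStart=
ExecStart=/usr/sbin/slurmctld --systemd $SLURMCTLD_OPTIONS
```

After creating or editing a drop-in, run `systemctl daemon-reload` so systemd picks up the change before slurmctld is restarted.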
ef6d6521aa Accepting request 1067475 from home:eeich:branches:network:cluster
- updated to 23.02.0-0rc1
  * Highlights
    + slurmctld - Add new RPC rate limiting feature. This is enabled through
      SlurmctldParameters=rl_enable, otherwise disabled by default.
    + Make scontrol reconfigure and sending a SIGHUP to the slurmctld behave
      the same. If you were using SIGHUP as a 'lighter' scontrol reconfigure
      to rotate logs please update your scripts to use SIGUSR2 instead.
    + Change cloud nodes to show by default. PrivateData=cloud is no longer
      needed.
    + sreport - Count planned (FKA reserved) time for jobs running in
      IGNORE_JOBS reservations. Previously, this time was lumped into IDLE time.
    + job_container/tmpfs - Support running with an arbitrary list of private
      mount points (/tmp and /dev/shm are the default, but not required).
    + job_container/tmpfs - Set more environment variables in InitScript.
    + Make all cgroup directories created by Slurm owned by root. This was
      already the behavior in cgroup/v2, but not in cgroup/v1, where by default
      the ownership of step directories was set to the user and group of the job.
    + accounting_storage/mysql - change purge/archive to calculate record ages
      based on end time, rather than start or submission times.
    + job_submit/lua - add support for log_user() from slurm_job_modify().
    + Run the following scripts in slurmscriptd instead of slurmctld:
      ResumeProgram, ResumeFailProgram, SuspendProgram, ResvProlog, ResvEpilog,
      and RebootProgram (only with SlurmctldParameters=reboot_from_controller).
    + Only permit changing log levels with 'srun --slurmd-debug' by root
      or SlurmUser.
    + slurmctld will fatal() when reconfiguring the job_submit plugin fails.
    + Add PowerDownOnIdle partition option to power down nodes after they
      become idle.
    + Add "[jobid.stepid]" prefix from slurmstepd and "slurmscriptd" prefix
      from slurmscriptd to Syslog logging. Previously this only happened when

OBS-URL: https://build.opensuse.org/request/show/1067475
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=231
2023-02-23 19:32:51 +00:00
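Two of the 23.02 highlights above are controlled from slurm.conf. The fragment below is only a sketch. The partition name and node list are hypothetical. PowerDownOnIdle is assumed to take effect only when power saving is configured for the partition's nodes.

```
# slurm.conf (fragment)
# Enable the RPC rate limiting added in 23.02; it is disabled by default.
# SlurmctldParameters takes a comma-separated list, so append rl_enable to
# any values already set instead of replacing them.
SlurmctldParameters=rl_enable
# Hypothetical partition whose nodes are powered down once they become idle.
PartitionName=cloud Nodes=cloud[1-10] PowerDownOnIdle=YES
```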
e481851f5a Accepting request 845108 from home:anag:branches:network:cluster
- Updated to 20.02.5, changes:
 * Fix leak of TRESRunMins when job time is changed with --time-min
 * pam_slurm - explicitly initialize slurm config to support configless mode.
 * scontrol - Fix exit code when creating/updating reservations with wrong
   Flags.
 * When a GRES has a no_consume flag, report 0 for allocated.
 * Fix cgroup cleanup by jobacct_gather/cgroup.
 * When creating reservations/jobs don't allow counts on a feature unless
   using an XOR.
 * Improve discovery of the number of boards.
 * Fix updating a reservation NodeCnt on a zero-count reservation.
 * slurmrestd - provide an explicit error message when PSK auth fails.
 * cons_tres - fix job requesting single gres per-node getting two or more
   nodes with fewer CPUs than requested per-task.
 * cons_tres - fix calculation of cores when using gres and cpus-per-task.
 * cons_tres - fix job not getting access to socket without GPU or with less
   than --gpus-per-socket when not enough cpus available on required socket
   and not using --gres-flags=enforce-binding.
 * Fix HDF5 type version build error.
 * Fix creation of CoreCnt only reservations when the first node isn't
   available.
 * Fix wrong DBD Agent queue size in sdiag when using accounting_storage/none.
 * Improve job constraints XOR option logic.
 * Fix preemption of hetjobs when needed nodes not in leader component.
 * Fix an incorrect bit_or() corrupting the node bitmap of potential preemptor
   jobs, causing bad node deallocations and even allocation of nodes from other
   partitions.
 * Fix double-deallocation of preempted non-leader hetjob components.
 * slurmdbd - prevent truncation of the step nodelists over 4095.
 * Fix nodes remaining in drain state after rebooting with ASAP option.
 - changes from 20.02.4:

OBS-URL: https://build.opensuse.org/request/show/845108
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=156
2020-11-02 13:42:03 +00:00
Corot Sebastien
bd06e0c765 Accepting request 454272 from home:eeich:branches:network:cluster
- Updated to 16.05.8.1
 * Remove StoragePass from being printed out in the slurmdbd log at debug2
   level.
 * Defer PATH search for task program until launch in slurmstepd.
 * Modify regression test1.89 to avoid leaving a vestigial job. Also reduce
   logging to reduce the likelihood of an Expect buffer overflow.
 * Do not PATH search for multi-prog launches if LaunchParameters=test_exec is
   enabled.
 * Fix for possible infinite loop in select/cons_res plugin when trying to
   satisfy a job's ntasks_per_core or socket specification.
 * If a job is held for bad constraints, make sure that once it is updated the
   job does not go into JobAdminHeld.
 * sched/backfill - Fix logic to reserve resources for jobs that require a
   node reboot (i.e. to change KNL mode) in order to start.
 * When unpacking a node or front_end record from state and the protocol
    version is lower than the min version, set it to the min.
 * Remove redundant lookup for part_ptr when updating a reservation's nodes.
 * Fix memory and file descriptor leaks in slurmd daemon's sbcast logic.
 * Do not allocate specialized cores to jobs using the --exclusive option.
 * Cancel an interactive job if the Prolog fails with "PrologFlags=contain" or
   "PrologFlags=alloc" configured. Send a new prolog failure error message to
   the salloc or srun command as needed.
 * Prevent possible out-of-bounds read in slurmstepd on an invalid #! line.
 * Fix check for PluginDir within slurmctld to work with multiple directories.
 * Cancel interactive jobs automatically on communication error to launching
   srun/salloc process.
 * Fix security issue caused by insecure file path handling triggered by the
   failure of a Prolog script. To exploit this a user needs to anticipate or
   cause the Prolog to fail for their job. CVE-2016-10030 (bsc#1018371).
- Replace group/user add macros with function calls.
- Disable building with netloc support: the netloc API is part of the devel
  branch of hwloc. Since this devel branch was included accidentally and has
  since been reverted, we need to disable this for the time being.
- Conditionalized architecture specific pieces to support non-x86 architectures
  better.

- Remove: unneeded 'BuildRequires:  python'
- Add:
  BuildRequires:  freeipmi-devel
  BuildRequires:  libibmad-devel
  BuildRequires:  libibumad-devel
  so they are picked up by the slurm build.
- Enable modifications from openHPC Project.
- Enable lua API package build.
- Add a recommends for slurm-munge to the slurm package:
  This way, the munge auth method is available and slurm
  works out of the box.
- Create /var/lib/slurm as StateSaveLocation directory.
  Using /tmp for this is dangerous.

- Keep %{_libdir}/libpmi* and %{_libdir}/mpi_pmi2* on SUSE.

OBS-URL: https://build.opensuse.org/request/show/454272
OBS-URL: https://build.opensuse.org/package/show/network:cluster/slurm?expand=0&rev=13
2017-02-02 20:23:02 +00:00