diff --git a/pam_slurm_adopt-avoid-running-outside-of-the-sshd-PA.patch b/pam_slurm_adopt-avoid-running-outside-of-the-sshd-PA.patch
index f16ee26..e5317fa 100644
--- a/pam_slurm_adopt-avoid-running-outside-of-the-sshd-PA.patch
+++ b/pam_slurm_adopt-avoid-running-outside-of-the-sshd-PA.patch
@@ -1,4 +1,4 @@
-From 9f13f7450cb38ac099d2887ab42f588f9dd35306 Mon Sep 17 00:00:00 2001
+From 4c38389917a54e137a4578b45f0f6a821c8c591a Mon Sep 17 00:00:00 2001
 From: Matthias Gerstner
 Date: Wed, 5 Dec 2018 15:03:19 +0100
 Subject: [PATCH 1/3] pam_slurm_adopt: avoid running outside of the sshd PAM
@@ -16,18 +16,115 @@ this behaviour, if different behaviour is explicitly desired.
 Signed-off-by: Christian Goll
 ---
- contribs/pam_slurm_adopt/README | 9 ++++++
- contribs/pam_slurm_adopt/pam_slurm_adopt.c | 46 ++++++++++++++++++++++++++++++
- 2 files changed, 55 insertions(+)
+ contribs/pam_slurm_adopt/README | 172 ++++++++++++++++++++++++++++-
+ contribs/pam_slurm_adopt/pam_slurm_adopt.c | 46 ++++++++
+ 2 files changed, 217 insertions(+), 1 deletion(-)
 diff --git a/contribs/pam_slurm_adopt/README b/contribs/pam_slurm_adopt/README
-index a84480c1a6..a2d61a977b 100644
+index 07039740f8..8baece6d2e 100644
 --- a/contribs/pam_slurm_adopt/README
 +++ b/contribs/pam_slurm_adopt/README
-@@ -97,6 +97,15 @@ This module has the following options (* = default):
- 0* = If the step the job is adopted into has X11 enabled, set
- the DISPLAY variable in the processes environment accordingly.
+@@ -1,5 +1,175 @@
+ Current documentation can be found here:
+ https://slurm.schedmd.com/pam_slurm_adopt.html
+-
+ (Which is generated from docs/html/pam_slurm_adopt.shtml.)
++
++=======
++AUTHOR
++ Ryan Cox
++
++MODULE TYPES PROVIDED
++ account
++
++DESCRIPTION
++ This module attempts to determine the job which originated this connection.
++ The module is configurable; these are the default steps:
++
++ 1) Check the local stepd for a count of jobs owned by the non-root user
++ a) If none, deny (option action_no_jobs)
++ b) If only one, adopt the process into that job
++ c) If multiple, continue
++ 2) Determine src/dst IP/port of socket
++ 3) Issue callerid RPC to slurmd at IP address of source
++ a) If the remote slurmd can identify the source job, adopt into that job
++ b) If not, continue
++ 4) Pick a random local job from the user to adopt into (option action_unknown)
++
++ Jobs are adopted into a job's allocation step.
++
++MODULE OPTIONS
++This module has the following options (* = default):
++
++ ignore_root - By default, all root connections are ignored. If the RPC
++ is sent to a node which drops packets to the slurmd port, the
++ RPC will block for some time before failing. This is
++ unlikely to be desirable. Likewise, root may be trying to
++ administer the system and not do work that should be in a job.
++ The job may trigger oom-killer or just exit. If root restarts
++ a service or similar, it will be tracked and killed by Slurm
++ when the job exits. This sounds bad because it is bad.
++
++ 1* = Let the connection through without adoption
++ 0 = I am crazy. I want random services to die when root jobs exit. I
++ also like it when RPCs block for a while then time out.
++
++
++ action_no_jobs - The action to perform if the user has no jobs on the node
++
++ ignore = Do nothing. Fall through to the next pam module
++ deny* = Deny the connection
++
++
++ action_unknown - The action to perform when the user has multiple jobs on
++ the node *and* the RPC does not locate the source job.
++ If the RPC mechanism works properly in your environment,
++ this option will likely be relevant *only* when connecting
++ from a login node.
++
++ newest* = Pick the newest job on the node. The "newest" job is chosen
++ based on the mtime of the job's step_extern cgroup; asking
++ Slurm would require an RPC to the controller. The user can ssh
++ in but may be adopted into a job that exits earlier than the
++ job they intended to check on. The ssh connection will at
++ least be subject to appropriate limits and the user can be
++ informed of better ways to accomplish their objectives if this
++ becomes a problem
++ allow = Let the connection through without adoption
++ deny = Deny the connection
++
++
++ action_adopt_failure - The action to perform if the process is unable to be
++ adopted into any job for whatever reason. If the
++ process cannot be adopted into the job identified by
++ the callerid RPC, it will fall through to the
++ action_unknown code and try to adopt there. A failure
++ at that point or if there is only one job will result
++ in this action being taken.
++
++ allow* = Let the connection through without adoption
++ deny = Deny the connection
++
++ action_generic_failure - The action to perform if there are certain failures
++ such as the inability to talk to the local slurmd
++ or if the kernel doesn't offer the correct
++ facilities.
++
++ ignore* = Do nothing. Fall through to the next pam module
++ allow = Let the connection through without adoption
++ deny = Deny the connection
++
++ log_level - See SlurmdDebug in slurm.conf(5) for available options. The
++ default log_level is info.
++
++ disable_x11 - turn off Slurm built-in X11 forwarding support.
++
++ 1 = Do not check for Slurm's X11 forwarding support, and no not
++ alter the DISPLAY variable.
++ 0* = If the step the job is adopted into has X11 enabled, set
++ the DISPLAY variable in the processes environment accordingly.
++
 + service - The pam service name for which this module should run. By default
 + it only runs for sshd for which it was designed for. A
 + different service name can be specified like "login" or "*" to
@@ -37,11 +134,75 @@ index a84480c1a6..a2d61a977b 100644
 + module will not perform the adoption logic and returns
 + PAM_IGNORE immediately.
 +
- SLURM.CONF CONFIGURATION
- PrologFlags=contain must be set in slurm.conf. This sets up the "extern" step
- into which ssh-launched processes will be adopted.
++SLURM.CONF CONFIGURATION
++ PrologFlags=contain must be set in slurm.conf. This sets up the "extern" step
++ into which ssh-launched processes will be adopted.
++
++ **** IMPORTANT ****
++ PrologFlags=contain must be in place *before* using this module.
++ The module bases its checks on local steps that have already been launched. If
++ the user has no steps on the node, such as the extern step, the module will
++ assume that the user has no jobs allocated to the node. Depending on your
++ configuration of the pam module, you might deny *all* user ssh attempts.
++
++NOTES
++ This module and the related RPC currently support Linux systems which
++ have network connection information available through /proc/net/tcp{,6}. A
++ proccess's sockets must exist as symlinks in its /proc/self/fd directory.
++
++ The RPC data structure itself is OS-agnostic. If support is desired for a
++ different OS, relevant code must be added to find one's socket information
++ then match that information on the remote end to a particular process which
++ Slurm is tracking.
++
++ IPv6 is supported by the RPC data structure itself and the code which sends it
++ and receives it. Sending the RPC to an IPv6 address is not currently
++ supported by Slurm. Once support is added, remove the relevant check in
++ slurm_network_callerid().
++
++ For the action_unknown=newest setting to work, the memory cgroup must be in
++ use so that the code can check mtimes of cgroup directories. If you would
++ prefer to use a different subsystem, modify the _indeterminate_multiple
++ function.
++
++FIREWALLS, IP ADDRESSES, ETC.
++ slurmd should be accessible on any IP address from which a user might launch
++ ssh. The RPC to determine the source job must be able to reach the slurmd
++ port on that particular IP address.
++
++ If there is no slurmd on the source node, such as on a login node, it is
++ better to have the RPC be rejected rather than silently dropped. This
++ will allow better responsiveness to the RPC initiator.
++
++EXAMPLES / SUGGESTED USAGE
++ Use of this module is recommended on any compute node.
++
++ Add the following line to the appropriate file in /etc/pam.d, such as
++ system-auth or sshd:
++
++ account sufficient pam_slurm_adopt.so
++
++ If you always want to allow access for an administrative group (e.g. wheel),
++ stack the pam_access module after pam_slurm_adopt. A success with
++ pam_slurm_adopt is sufficient to allow access but the pam_access module can
++ allow others, such as staff, access even without jobs.
++
++ account sufficient pam_slurm_adopt.so
++ account required pam_access.so
++
++
++ Then edit the pam_access configuration file (/etc/security/access.conf):
++
++ +:wheel:ALL
++ -:ALL:ALL
++
++ When access is denied, the user will receive a relevant error message.
++
++ pam_systemd.so is known to not play nice with Slurm's usage of cgroups. It is
++ recommended that you disable it or possibly add pam_slurm_adopt.so after
++ pam_systemd.so.
 diff --git a/contribs/pam_slurm_adopt/pam_slurm_adopt.c b/contribs/pam_slurm_adopt/pam_slurm_adopt.c
-index 3f23c2ec77..da21479f61 100644
+index 51f21e8729..dccad90185 100644
 --- a/contribs/pam_slurm_adopt/pam_slurm_adopt.c
 +++ b/contribs/pam_slurm_adopt/pam_slurm_adopt.c
 @@ -94,6 +94,7 @@ static struct {
@@ -124,7 +285,7 @@ index 3f23c2ec77..da21479f61 100644
  _log_init(opts.log_level);
  switch (opts.action_generic_failure) {
-@@ -762,6 +807,7 @@ cleanup:
+@@ -765,6 +810,7 @@ cleanup:
  xfree(buf);
  xfree(slurm_cgroup_conf);
  xfree(opts.node_name);
diff --git a/pam_slurm_adopt-send_user_msg-don-t-copy-undefined-d.patch b/pam_slurm_adopt-send_user_msg-don-t-copy-undefined-d.patch
index 53a178c..417ad26 100644
--- a/pam_slurm_adopt-send_user_msg-don-t-copy-undefined-d.patch
+++ b/pam_slurm_adopt-send_user_msg-don-t-copy-undefined-d.patch
@@ -1,4 +1,4 @@
-From 33d78f2db60d3a86c38512f0502df559782cbdf6 Mon Sep 17 00:00:00 2001
+From a5d4481c05e2afa1ff1920446663e66c48ef9277 Mon Sep 17 00:00:00 2001
 From: Matthias Gerstner
 Date: Wed, 5 Dec 2018 14:08:07 +0100
 Subject: [PATCH 2/3] pam_slurm_adopt: send_user_msg: don't copy undefined data
diff --git a/pam_slurm_adopt-use-uid-to-determine-whether-root-is.patch b/pam_slurm_adopt-use-uid-to-determine-whether-root-is.patch
index 5286328..c0fa6de 100644
--- a/pam_slurm_adopt-use-uid-to-determine-whether-root-is.patch
+++ b/pam_slurm_adopt-use-uid-to-determine-whether-root-is.patch
@@ -1,4 +1,4 @@
-From 86f74afb04f2f8f40751ccc0bdbfd77b99035d8d Mon Sep 17 00:00:00 2001
+From d630acbf5709dcf03f9e8cd1739a77cfe6c1e4b8 Mon Sep 17 00:00:00 2001
 From: Matthias Gerstner
 Date: Wed, 5 Dec 2018 15:08:53 +0100
 Subject: [PATCH 3/3] pam_slurm_adopt: use uid to determine whether root is
@@ -13,7 +13,7 @@ Signed-off-by: Christian Goll
  1 file changed, 10 insertions(+), 11 deletions(-)
 diff --git a/contribs/pam_slurm_adopt/pam_slurm_adopt.c b/contribs/pam_slurm_adopt/pam_slurm_adopt.c
-index da21479f61..c4635b4693 100644
+index dccad90185..f1d062885e 100644
 --- a/contribs/pam_slurm_adopt/pam_slurm_adopt.c
 +++ b/contribs/pam_slurm_adopt/pam_slurm_adopt.c
 @@ -708,17 +708,6 @@ PAM_EXTERN int pam_sm_acct_mgmt(pam_handle_t *pamh, int flags
@@ -49,8 +49,8 @@ index da21479f61..c4635b4693 100644
 + }
 + }
- /* Check if there are any steps on the node from any user. A failure here
- * likely means failures everywhere so exit on failure or if no local jobs
+ /*
+ * Check if there are any steps on the node from any user. A failure here
 --
 2.16.4
diff --git a/slurm-18.08.3.tar.bz2 b/slurm-18.08.3.tar.bz2
deleted file mode 100644
index b5ece48..0000000
--- a/slurm-18.08.3.tar.bz2
+++ /dev/null
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:959df5d07563f2f472376a57bdafe61b4c44fe183a4a2c279c83607336dff806
-size 6092020
diff --git a/slurm-18.08.4.tar.bz2 b/slurm-18.08.4.tar.bz2
new file mode 100644
index 0000000..3f317fd
--- /dev/null
+++ b/slurm-18.08.4.tar.bz2
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:d215ef87481e48032ac7c3bcf61aac40b5258dedfbab3f56af5d53d59f22b4c8
+size 6069605
diff --git a/slurm.changes b/slurm.changes
index b5750d4..f93c247 100644
--- a/slurm.changes
+++ b/slurm.changes
@@ -1,3 +1,33 @@
+-------------------------------------------------------------------
+Thu Dec 13 10:07:00 UTC 2018 - cgoll@suse.com
+- Update to 18.08.04, with following highlights
+ * Fix message sent to user to display preempted instead of time limit when
+ a job is preempted.
+ * Fix memory leak when a failure happens processing a nodes gres config.
+ * Improve error message when failures happen processing a nodes gres config.
+ * Don't skip jobs in scontrol hold.
+ * Allow --cpu-bind=verbose to be used with SLURM_HINT environment variable.
+ * Enhanced handling for runaway jobs
+ * cons_res: Delay exiting cr_job_test until after cores/cpus are calculated
+ and distributed.
+ * Don't check existence of srun --prolog or --epilog executables when set to
+ "none" and SLURM_TEST_EXEC is used.
+ * Add "P" suffix support to job and step tres specifications.
+ * Fix jobacct_gather/cgroup to work correctly when more than one task is
+ started on a node.
+ * salloc - set SLURM_NTASKS_PER_CORE and SLURM_NTASKS_PER_SOCKET in the
+ environment if the corresponding command line options are used.
+ * slurmd - fix handling of the -f flag to specify alternate config file
+ locations.
+ * Add SchedulerParameters option of bf_ignore_newly_avail_nodes to avoid
+ scheduling lower priority jobs on resources that become available during
+ the backfill scheduling cycle when bf_continue is enabled.
+ * job_submit/lua: Add several slurmctld return codes and add user/group info
+ * salloc/sbatch/srun - print warning if mutually exclusive options of --mem
+ and --mem-per-cpu are both set.
+ - Refreshed:
+ * pam_slurm_adopt-avoid-running-outside-of-the-sshd-PA.patch
+
 -------------------------------------------------------------------
 Mon Dec 10 10:49:14 UTC 2018 - cgoll@suse.com
diff --git a/slurm.spec b/slurm.spec
index 2156827..ef3ab4c 100644
--- a/slurm.spec
+++ b/slurm.spec
@@ -1,7 +1,7 @@
 #
 # spec file for package slurm
 #
-# Copyright (c) 2018 SUSE LINUX GmbH, Nuernberg, Germany.
+# Copyright (c) 2019 SUSE LINUX GmbH, Nuernberg, Germany.
 #
 # All modifications and additions to the file contributed by third parties
 # remain the property of their copyright owners, unless otherwise agreed
@@ -18,7 +18,7 @@
 # Check file META in sources: update so_version to (API_CURRENT - API_AGE)
 %define so_version 33
-%define ver 18.08.3
+%define ver 18.08.4
 # so-version is 0 and seems to be stable
 %define pmi_so 0