Accepting request 663813 from network:cluster
OBS-URL: https://build.opensuse.org/request/show/663813
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/slurm?expand=0&rev=24
This commit is contained in:
commit
74b4d5ddb3
@@ -1,4 +1,4 @@
|
|||||||
From 9f13f7450cb38ac099d2887ab42f588f9dd35306 Mon Sep 17 00:00:00 2001
|
From 4c38389917a54e137a4578b45f0f6a821c8c591a Mon Sep 17 00:00:00 2001
|
||||||
From: Matthias Gerstner <matthias.gerstner@suse.de>
|
From: Matthias Gerstner <matthias.gerstner@suse.de>
|
||||||
Date: Wed, 5 Dec 2018 15:03:19 +0100
|
Date: Wed, 5 Dec 2018 15:03:19 +0100
|
||||||
Subject: [PATCH 1/3] pam_slurm_adopt: avoid running outside of the sshd PAM
|
Subject: [PATCH 1/3] pam_slurm_adopt: avoid running outside of the sshd PAM
|
||||||
@@ -16,18 +16,115 @@ this behaviour, if different behaviour is explicitly desired.
|
|||||||
|
|
||||||
Signed-off-by: Christian Goll <cgoll@suse.de>
|
Signed-off-by: Christian Goll <cgoll@suse.de>
|
||||||
---
|
---
|
||||||
contribs/pam_slurm_adopt/README | 9 ++++++
|
contribs/pam_slurm_adopt/README | 172 ++++++++++++++++++++++++++++-
|
||||||
contribs/pam_slurm_adopt/pam_slurm_adopt.c | 46 ++++++++++++++++++++++++++++++
|
contribs/pam_slurm_adopt/pam_slurm_adopt.c | 46 ++++++++
|
||||||
2 files changed, 55 insertions(+)
|
2 files changed, 217 insertions(+), 1 deletion(-)
|
||||||
|
|
||||||
diff --git a/contribs/pam_slurm_adopt/README b/contribs/pam_slurm_adopt/README
|
diff --git a/contribs/pam_slurm_adopt/README b/contribs/pam_slurm_adopt/README
|
||||||
index a84480c1a6..a2d61a977b 100644
|
index 07039740f8..8baece6d2e 100644
|
||||||
--- a/contribs/pam_slurm_adopt/README
|
--- a/contribs/pam_slurm_adopt/README
|
||||||
+++ b/contribs/pam_slurm_adopt/README
|
+++ b/contribs/pam_slurm_adopt/README
|
||||||
@@ -97,6 +97,15 @@ This module has the following options (* = default):
|
@@ -1,5 +1,175 @@
|
||||||
0* = If the step the job is adopted into has X11 enabled, set
|
Current documentation can be found here:
|
||||||
the DISPLAY variable in the process's environment accordingly.
|
|
||||||
|
|
||||||
|
https://slurm.schedmd.com/pam_slurm_adopt.html
|
||||||
|
-
|
||||||
|
(Which is generated from docs/html/pam_slurm_adopt.shtml.)
|
||||||
|
+
|
||||||
|
+=======
|
||||||
|
+AUTHOR
|
||||||
|
+ Ryan Cox <ryan_cox@byu.edu>
|
||||||
|
+
|
||||||
|
+MODULE TYPES PROVIDED
|
||||||
|
+ account
|
||||||
|
+
|
||||||
|
+DESCRIPTION
|
||||||
|
+ This module attempts to determine the job which originated this connection.
|
||||||
|
+ The module is configurable; these are the default steps:
|
||||||
|
+
|
||||||
|
+ 1) Check the local stepd for a count of jobs owned by the non-root user
|
||||||
|
+ a) If none, deny (option action_no_jobs)
|
||||||
|
+ b) If only one, adopt the process into that job
|
||||||
|
+ c) If multiple, continue
|
||||||
|
+ 2) Determine src/dst IP/port of socket
|
||||||
|
+ 3) Issue callerid RPC to slurmd at IP address of source
|
||||||
|
+ a) If the remote slurmd can identify the source job, adopt into that job
|
||||||
|
+ b) If not, continue
|
||||||
|
+ 4) Pick a random local job from the user to adopt into (option action_unknown)
|
||||||
|
+
|
||||||
|
+ Jobs are adopted into a job's allocation step.
|
||||||
|
+
|
||||||
|
+MODULE OPTIONS
|
||||||
|
+This module has the following options (* = default):
|
||||||
|
+
|
||||||
|
+ ignore_root - By default, all root connections are ignored. If the RPC
|
||||||
|
+ is sent to a node which drops packets to the slurmd port, the
|
||||||
|
+ RPC will block for some time before failing. This is
|
||||||
|
+ unlikely to be desirable. Likewise, root may be trying to
|
||||||
|
+ administer the system and not do work that should be in a job.
|
||||||
|
+ The job may trigger oom-killer or just exit. If root restarts
|
||||||
|
+ a service or similar, it will be tracked and killed by Slurm
|
||||||
|
+ when the job exits. This sounds bad because it is bad.
|
||||||
|
+
|
||||||
|
+ 1* = Let the connection through without adoption
|
||||||
|
+ 0 = I am crazy. I want random services to die when root jobs exit. I
|
||||||
|
+ also like it when RPCs block for a while then time out.
|
||||||
|
+
|
||||||
|
+
|
||||||
|
+ action_no_jobs - The action to perform if the user has no jobs on the node
|
||||||
|
+
|
||||||
|
+ ignore = Do nothing. Fall through to the next pam module
|
||||||
|
+ deny* = Deny the connection
|
||||||
|
+
|
||||||
|
+
|
||||||
|
+ action_unknown - The action to perform when the user has multiple jobs on
|
||||||
|
+ the node *and* the RPC does not locate the source job.
|
||||||
|
+ If the RPC mechanism works properly in your environment,
|
||||||
|
+ this option will likely be relevant *only* when connecting
|
||||||
|
+ from a login node.
|
||||||
|
+
|
||||||
|
+ newest* = Pick the newest job on the node. The "newest" job is chosen
|
||||||
|
+ based on the mtime of the job's step_extern cgroup; asking
|
||||||
|
+ Slurm would require an RPC to the controller. The user can ssh
|
||||||
|
+ in but may be adopted into a job that exits earlier than the
|
||||||
|
+ job they intended to check on. The ssh connection will at
|
||||||
|
+ least be subject to appropriate limits and the user can be
|
||||||
|
+ informed of better ways to accomplish their objectives if this
|
||||||
|
+ becomes a problem
|
||||||
|
+ allow = Let the connection through without adoption
|
||||||
|
+ deny = Deny the connection
|
||||||
|
+
|
||||||
|
+
|
||||||
|
+ action_adopt_failure - The action to perform if the process is unable to be
|
||||||
|
+ adopted into any job for whatever reason. If the
|
||||||
|
+ process cannot be adopted into the job identified by
|
||||||
|
+ the callerid RPC, it will fall through to the
|
||||||
|
+ action_unknown code and try to adopt there. A failure
|
||||||
|
+ at that point or if there is only one job will result
|
||||||
|
+ in this action being taken.
|
||||||
|
+
|
||||||
|
+ allow* = Let the connection through without adoption
|
||||||
|
+ deny = Deny the connection
|
||||||
|
+
|
||||||
|
+ action_generic_failure - The action to perform if there are certain failures
|
||||||
|
+ such as the inability to talk to the local slurmd
|
||||||
|
+ or if the kernel doesn't offer the correct
|
||||||
|
+ facilities.
|
||||||
|
+
|
||||||
|
+ ignore* = Do nothing. Fall through to the next pam module
|
||||||
|
+ allow = Let the connection through without adoption
|
||||||
|
+ deny = Deny the connection
|
||||||
|
+
|
||||||
|
+ log_level - See SlurmdDebug in slurm.conf(5) for available options. The
|
||||||
|
+ default log_level is info.
|
||||||
|
+
|
||||||
|
+ disable_x11 - turn off Slurm built-in X11 forwarding support.
|
||||||
|
+
|
||||||
|
+ 1 = Do not check for Slurm's X11 forwarding support, and do not
|
||||||
|
+ alter the DISPLAY variable.
|
||||||
|
+ 0* = If the step the job is adopted into has X11 enabled, set
|
||||||
|
+ the DISPLAY variable in the process's environment accordingly.
|
||||||
|
+
|
||||||
+ service - The pam service name for which this module should run. By default
|
+ service - The pam service name for which this module should run. By default
|
||||||
+ it only runs for sshd, for which it was designed. A
|
+ it only runs for sshd, for which it was designed. A
|
||||||
+ different service name can be specified like "login" or "*" to
|
+ different service name can be specified like "login" or "*" to
|
||||||
@@ -37,11 +134,75 @@ index a84480c1a6..a2d61a977b 100644
|
|||||||
+ module will not perform the adoption logic and returns
|
+ module will not perform the adoption logic and returns
|
||||||
+ PAM_IGNORE immediately.
|
+ PAM_IGNORE immediately.
|
||||||
+
|
+
|
||||||
SLURM.CONF CONFIGURATION
|
+SLURM.CONF CONFIGURATION
|
||||||
PrologFlags=contain must be set in slurm.conf. This sets up the "extern" step
|
+ PrologFlags=contain must be set in slurm.conf. This sets up the "extern" step
|
||||||
into which ssh-launched processes will be adopted.
|
+ into which ssh-launched processes will be adopted.
|
||||||
|
+
|
||||||
|
+ **** IMPORTANT ****
|
||||||
|
+ PrologFlags=contain must be in place *before* using this module.
|
||||||
|
+ The module bases its checks on local steps that have already been launched. If
|
||||||
|
+ the user has no steps on the node, such as the extern step, the module will
|
||||||
|
+ assume that the user has no jobs allocated to the node. Depending on your
|
||||||
|
+ configuration of the pam module, you might deny *all* user ssh attempts.
|
||||||
|
+
|
||||||
|
+NOTES
|
||||||
|
+ This module and the related RPC currently support Linux systems which
|
||||||
|
+ have network connection information available through /proc/net/tcp{,6}. A
|
||||||
|
+ process's sockets must exist as symlinks in its /proc/self/fd directory.
|
||||||
|
+
|
||||||
|
+ The RPC data structure itself is OS-agnostic. If support is desired for a
|
||||||
|
+ different OS, relevant code must be added to find one's socket information
|
||||||
|
+ then match that information on the remote end to a particular process which
|
||||||
|
+ Slurm is tracking.
|
||||||
|
+
|
||||||
|
+ IPv6 is supported by the RPC data structure itself and the code which sends it
|
||||||
|
+ and receives it. Sending the RPC to an IPv6 address is not currently
|
||||||
|
+ supported by Slurm. Once support is added, remove the relevant check in
|
||||||
|
+ slurm_network_callerid().
|
||||||
|
+
|
||||||
|
+ For the action_unknown=newest setting to work, the memory cgroup must be in
|
||||||
|
+ use so that the code can check mtimes of cgroup directories. If you would
|
||||||
|
+ prefer to use a different subsystem, modify the _indeterminate_multiple
|
||||||
|
+ function.
|
||||||
|
+
|
||||||
|
+FIREWALLS, IP ADDRESSES, ETC.
|
||||||
|
+ slurmd should be accessible on any IP address from which a user might launch
|
||||||
|
+ ssh. The RPC to determine the source job must be able to reach the slurmd
|
||||||
|
+ port on that particular IP address.
|
||||||
|
+
|
||||||
|
+ If there is no slurmd on the source node, such as on a login node, it is
|
||||||
|
+ better to have the RPC be rejected rather than silently dropped. This
|
||||||
|
+ will allow better responsiveness to the RPC initiator.
|
||||||
|
+
|
||||||
|
+EXAMPLES / SUGGESTED USAGE
|
||||||
|
+ Use of this module is recommended on any compute node.
|
||||||
|
+
|
||||||
|
+ Add the following line to the appropriate file in /etc/pam.d, such as
|
||||||
|
+ system-auth or sshd:
|
||||||
|
+
|
||||||
|
+ account sufficient pam_slurm_adopt.so
|
||||||
|
+
|
||||||
|
+ If you always want to allow access for an administrative group (e.g. wheel),
|
||||||
|
+ stack the pam_access module after pam_slurm_adopt. A success with
|
||||||
|
+ pam_slurm_adopt is sufficient to allow access but the pam_access module can
|
||||||
|
+ allow others, such as staff, access even without jobs.
|
||||||
|
+
|
||||||
|
+ account sufficient pam_slurm_adopt.so
|
||||||
|
+ account required pam_access.so
|
||||||
|
+
|
||||||
|
+
|
||||||
|
+ Then edit the pam_access configuration file (/etc/security/access.conf):
|
||||||
|
+
|
||||||
|
+ +:wheel:ALL
|
||||||
|
+ -:ALL:ALL
|
||||||
|
+
|
||||||
|
+ When access is denied, the user will receive a relevant error message.
|
||||||
|
+
|
||||||
|
+ pam_systemd.so is known to not play nice with Slurm's usage of cgroups. It is
|
||||||
|
+ recommended that you disable it or possibly add pam_slurm_adopt.so after
|
||||||
|
+ pam_systemd.so.
|
||||||
diff --git a/contribs/pam_slurm_adopt/pam_slurm_adopt.c b/contribs/pam_slurm_adopt/pam_slurm_adopt.c
|
diff --git a/contribs/pam_slurm_adopt/pam_slurm_adopt.c b/contribs/pam_slurm_adopt/pam_slurm_adopt.c
|
||||||
index 3f23c2ec77..da21479f61 100644
|
index 51f21e8729..dccad90185 100644
|
||||||
--- a/contribs/pam_slurm_adopt/pam_slurm_adopt.c
|
--- a/contribs/pam_slurm_adopt/pam_slurm_adopt.c
|
||||||
+++ b/contribs/pam_slurm_adopt/pam_slurm_adopt.c
|
+++ b/contribs/pam_slurm_adopt/pam_slurm_adopt.c
|
||||||
@@ -94,6 +94,7 @@ static struct {
|
@@ -94,6 +94,7 @@ static struct {
|
||||||
@@ -124,7 +285,7 @@ index 3f23c2ec77..da21479f61 100644
|
|||||||
_log_init(opts.log_level);
|
_log_init(opts.log_level);
|
||||||
|
|
||||||
switch (opts.action_generic_failure) {
|
switch (opts.action_generic_failure) {
|
||||||
@@ -762,6 +807,7 @@ cleanup:
|
@@ -765,6 +810,7 @@ cleanup:
|
||||||
xfree(buf);
|
xfree(buf);
|
||||||
xfree(slurm_cgroup_conf);
|
xfree(slurm_cgroup_conf);
|
||||||
xfree(opts.node_name);
|
xfree(opts.node_name);
|
||||||
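The first patch adds the `service` option documented in the README hunk above. By default the module only runs for the `sshd` PAM service. A different service name such as `login`, or `*` for every service, can be configured. For any other service the module returns `PAM_IGNORE` immediately.

The sketch below shows that check and assumes the service name has already been obtained from PAM. Linux-PAM exposes it through `pam_get_item()` with `PAM_SERVICE`. The function name `service_matches` is hypothetical and not taken from the patch. It takes plain strings, so it compiles without the PAM headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): returns true when the adoption
 * logic should run for the PAM service `service`.
 * `configured` is the value of the module's service= option, or NULL when
 * the option was not given, in which case the default "sshd" applies.
 */
bool service_matches(const char *configured, const char *service)
{
	const char *wanted = configured ? configured : "sshd";

	/* "*" means: run for every PAM service. */
	if (strcmp(wanted, "*") == 0)
		return true;

	/* Without a known service name, do not run the adoption logic. */
	if (service == NULL)
		return false;

	return strcmp(wanted, service) == 0;
}
```

In the module, a `false` result would mean returning `PAM_IGNORE` before any Slurm lookup happens. Other PAM stacks that include the module, for example `login` or `su`, are then left alone unless explicitly configured.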
|
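The DESCRIPTION section of the README above lists the module's default decision steps. The sketch below models only the decision itself. It assumes the socket lookup (step 2) and the callerid RPC (step 3) have already run and their result is passed in.

All type and function names are hypothetical. The `newest` branch follows the README's rule of picking the job whose step_extern cgroup has the latest mtime. Here the mtimes are supplied by the caller instead of being read with `stat()`.

One further assumption: a job identified by the RPC is only adopted if it is among the user's local jobs. Otherwise the sketch falls through to `action_unknown`, as the README describes for a failed adoption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical outcome codes mirroring the README's action values. */
typedef enum { OUT_ADOPT, OUT_ALLOW, OUT_DENY, OUT_IGNORE } outcome_t;

/* Values of the action_unknown option. */
typedef enum { UNKNOWN_NEWEST, UNKNOWN_ALLOW, UNKNOWN_DENY } action_unknown_t;

typedef struct {
	uint32_t job_id;
	time_t extern_mtime;	/* mtime of the job's step_extern cgroup */
} local_job_t;

typedef struct {
	outcome_t outcome;
	uint32_t job_id;	/* meaningful only when outcome == OUT_ADOPT */
} decision_t;

/*
 * jobs/njobs:     the user's jobs found via the local stepds (step 1)
 * callerid_found: whether the callerid RPC identified a source job (step 3)
 * callerid_job:   that job's id, if found
 * action_no_jobs: OUT_IGNORE or OUT_DENY (README default: deny)
 */
decision_t decide(const local_job_t *jobs, size_t njobs,
		  bool callerid_found, uint32_t callerid_job,
		  outcome_t action_no_jobs, action_unknown_t action_unknown)
{
	decision_t d = { action_no_jobs, 0 };
	size_t i, newest;

	/* 1a) no jobs on this node: apply action_no_jobs */
	if (njobs == 0)
		return d;

	/* 1b) exactly one job: adopt into it */
	if (njobs == 1) {
		d.outcome = OUT_ADOPT;
		d.job_id = jobs[0].job_id;
		return d;
	}

	/* 3a) the remote slurmd identified the source job */
	if (callerid_found) {
		for (i = 0; i < njobs; i++) {
			if (jobs[i].job_id == callerid_job) {
				d.outcome = OUT_ADOPT;
				d.job_id = callerid_job;
				return d;
			}
		}
	}

	/* 4) fall back to action_unknown */
	switch (action_unknown) {
	case UNKNOWN_NEWEST:
		/* latest step_extern mtime wins; ties keep the earlier entry */
		newest = 0;
		for (i = 1; i < njobs; i++)
			if (jobs[i].extern_mtime > jobs[newest].extern_mtime)
				newest = i;
		d.outcome = OUT_ADOPT;
		d.job_id = jobs[newest].job_id;
		break;
	case UNKNOWN_ALLOW:
		d.outcome = OUT_ALLOW;
		break;
	case UNKNOWN_DENY:
	default:
		d.outcome = OUT_DENY;
		break;
	}
	return d;
}
```

The sketch leaves out `ignore_root`, `action_adopt_failure` and `action_generic_failure`. Those apply before this decision (root connections) or after it (when the adoption itself, or the communication with the local slurmd, fails).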
@@ -1,4 +1,4 @@
|
|||||||
From 33d78f2db60d3a86c38512f0502df559782cbdf6 Mon Sep 17 00:00:00 2001
|
From a5d4481c05e2afa1ff1920446663e66c48ef9277 Mon Sep 17 00:00:00 2001
|
||||||
From: Matthias Gerstner <matthias.gerstner@suse.de>
|
From: Matthias Gerstner <matthias.gerstner@suse.de>
|
||||||
Date: Wed, 5 Dec 2018 14:08:07 +0100
|
Date: Wed, 5 Dec 2018 14:08:07 +0100
|
||||||
Subject: [PATCH 2/3] pam_slurm_adopt: send_user_msg: don't copy undefined data
|
Subject: [PATCH 2/3] pam_slurm_adopt: send_user_msg: don't copy undefined data
|
||||||
|
@@ -1,4 +1,4 @@
|
|||||||
From 86f74afb04f2f8f40751ccc0bdbfd77b99035d8d Mon Sep 17 00:00:00 2001
|
From d630acbf5709dcf03f9e8cd1739a77cfe6c1e4b8 Mon Sep 17 00:00:00 2001
|
||||||
From: Matthias Gerstner <matthias.gerstner@suse.de>
|
From: Matthias Gerstner <matthias.gerstner@suse.de>
|
||||||
Date: Wed, 5 Dec 2018 15:08:53 +0100
|
Date: Wed, 5 Dec 2018 15:08:53 +0100
|
||||||
Subject: [PATCH 3/3] pam_slurm_adopt: use uid to determine whether root is
|
Subject: [PATCH 3/3] pam_slurm_adopt: use uid to determine whether root is
|
||||||
@@ -13,7 +13,7 @@ Signed-off-by: Christian Goll <cgoll@suse.de>
|
|||||||
1 file changed, 10 insertions(+), 11 deletions(-)
|
1 file changed, 10 insertions(+), 11 deletions(-)
|
||||||
|
|
||||||
diff --git a/contribs/pam_slurm_adopt/pam_slurm_adopt.c b/contribs/pam_slurm_adopt/pam_slurm_adopt.c
|
diff --git a/contribs/pam_slurm_adopt/pam_slurm_adopt.c b/contribs/pam_slurm_adopt/pam_slurm_adopt.c
|
||||||
index da21479f61..c4635b4693 100644
|
index dccad90185..f1d062885e 100644
|
||||||
--- a/contribs/pam_slurm_adopt/pam_slurm_adopt.c
|
--- a/contribs/pam_slurm_adopt/pam_slurm_adopt.c
|
||||||
+++ b/contribs/pam_slurm_adopt/pam_slurm_adopt.c
|
+++ b/contribs/pam_slurm_adopt/pam_slurm_adopt.c
|
||||||
@@ -708,17 +708,6 @@ PAM_EXTERN int pam_sm_acct_mgmt(pam_handle_t *pamh, int flags
|
@@ -708,17 +708,6 @@ PAM_EXTERN int pam_sm_acct_mgmt(pam_handle_t *pamh, int flags
|
||||||
@@ -49,8 +49,8 @@ index da21479f61..c4635b4693 100644
|
|||||||
+ }
|
+ }
|
||||||
+ }
|
+ }
|
||||||
|
|
||||||
/* Check if there are any steps on the node from any user. A failure here
|
/*
|
||||||
* likely means failures everywhere so exit on failure or if no local jobs
|
* Check if there are any steps on the node from any user. A failure here
|
||||||
--
|
--
|
||||||
2.16.4
|
2.16.4
|
||||||
|
|
||||||
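The subject of patch 3 (truncated above) says the module should use the uid to determine whether the user is root. The `ignore_root` option from the patch 1 README depends on exactly that check. Comparing the resolved uid against 0 covers any account mapped to uid 0, not just one literally named "root".

The sketch below shows that kind of check using POSIX `getpwnam_r()`. The function name `user_is_root` and the fixed 4096-byte buffer are assumptions for illustration, not code from the patch.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <pwd.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Hypothetical helper: returns true if `user_name` resolves to uid 0.
 * Uses the reentrant getpwnam_r() so the lookup does not share
 * getpwnam()'s static result buffer with the host process (e.g. sshd).
 */
bool user_is_root(const char *user_name)
{
	struct passwd pwd;
	struct passwd *result = NULL;
	char buf[4096];	/* assumed large enough; ERANGE counts as "not root" */

	if (user_name == NULL)
		return false;

	if (getpwnam_r(user_name, &pwd, buf, sizeof(buf), &result) != 0 ||
	    result == NULL)
		return false;	/* lookup error or unknown user */

	return result->pw_uid == 0;
}
```

With `ignore_root=1` (the default), a `true` result would let the connection through without any adoption or RPC.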
|
@@ -1,3 +0,0 @@
|
|||||||
version https://git-lfs.github.com/spec/v1
|
|
||||||
oid sha256:959df5d07563f2f472376a57bdafe61b4c44fe183a4a2c279c83607336dff806
|
|
||||||
size 6092020
|
|
3
slurm-18.08.4.tar.bz2
Normal file
@@ -0,0 +1,3 @@
|
|||||||
|
version https://git-lfs.github.com/spec/v1
|
||||||
|
oid sha256:d215ef87481e48032ac7c3bcf61aac40b5258dedfbab3f56af5d53d59f22b4c8
|
||||||
|
size 6069605
|
@@ -1,3 +1,33 @@
|
|||||||
|
-------------------------------------------------------------------
|
||||||
|
Thu Dec 13 10:07:00 UTC 2018 - cgoll@suse.com
|
||||||
|
- Update to 18.08.4, with the following highlights
|
||||||
|
* Fix message sent to user to display preempted instead of time limit when
|
||||||
|
a job is preempted.
|
||||||
|
* Fix memory leak when a failure happens processing a node's gres config.
|
||||||
|
* Improve error message when failures happen processing a node's gres config.
|
||||||
|
* Don't skip jobs in scontrol hold.
|
||||||
|
* Allow --cpu-bind=verbose to be used with SLURM_HINT environment variable.
|
||||||
|
* Enhanced handling for runaway jobs
|
||||||
|
* cons_res: Delay exiting cr_job_test until after cores/cpus are calculated
|
||||||
|
and distributed.
|
||||||
|
* Don't check existence of srun --prolog or --epilog executables when set to
|
||||||
|
"none" and SLURM_TEST_EXEC is used.
|
||||||
|
* Add "P" suffix support to job and step tres specifications.
|
||||||
|
* Fix jobacct_gather/cgroup to work correctly when more than one task is
|
||||||
|
started on a node.
|
||||||
|
* salloc - set SLURM_NTASKS_PER_CORE and SLURM_NTASKS_PER_SOCKET in the
|
||||||
|
environment if the corresponding command line options are used.
|
||||||
|
* slurmd - fix handling of the -f flag to specify alternate config file
|
||||||
|
locations.
|
||||||
|
* Add SchedulerParameters option of bf_ignore_newly_avail_nodes to avoid
|
||||||
|
scheduling lower priority jobs on resources that become available during
|
||||||
|
the backfill scheduling cycle when bf_continue is enabled.
|
||||||
|
* job_submit/lua: Add several slurmctld return codes and add user/group info
|
||||||
|
* salloc/sbatch/srun - print warning if mutually exclusive options of --mem
|
||||||
|
and --mem-per-cpu are both set.
|
||||||
|
- Refreshed:
|
||||||
|
* pam_slurm_adopt-avoid-running-outside-of-the-sshd-PA.patch
|
||||||
|
|
||||||
-------------------------------------------------------------------
|
-------------------------------------------------------------------
|
||||||
Mon Dec 10 10:49:14 UTC 2018 - cgoll@suse.com
|
Mon Dec 10 10:49:14 UTC 2018 - cgoll@suse.com
|
||||||
|
|
||||||
|
@@ -1,7 +1,7 @@
|
|||||||
#
|
#
|
||||||
# spec file for package slurm
|
# spec file for package slurm
|
||||||
#
|
#
|
||||||
# Copyright (c) 2018 SUSE LINUX GmbH, Nuernberg, Germany.
|
# Copyright (c) 2019 SUSE LINUX GmbH, Nuernberg, Germany.
|
||||||
#
|
#
|
||||||
# All modifications and additions to the file contributed by third parties
|
# All modifications and additions to the file contributed by third parties
|
||||||
# remain the property of their copyright owners, unless otherwise agreed
|
# remain the property of their copyright owners, unless otherwise agreed
|
||||||
@@ -18,7 +18,7 @@
|
|||||||
|
|
||||||
# Check file META in sources: update so_version to (API_CURRENT - API_AGE)
|
# Check file META in sources: update so_version to (API_CURRENT - API_AGE)
|
||||||
%define so_version 33
|
%define so_version 33
|
||||||
%define ver 18.08.3
|
%define ver 18.08.4
|
||||||
# so-version is 0 and seems to be stable
|
# so-version is 0 and seems to be stable
|
||||||
%define pmi_so 0
|
%define pmi_so 0
|
||||||
|
|
||||||
|