forked from pool/openucx
openucx/openucx-s390x-support.patch
Nicolas Morey-Chaisemartin 878438d42d Accepting request 1006486 from home:NMoreyChaisemartin:branches:science:HPC
- Update to v1.13.1 (jsc#PED-912)
  - Core
    - Added new objects to VFS: local and remote address of endpoint,
      statistics of ucp_ep_create success/failure, failed/destroyed endpoints
    - Added support for UCX static libraries
    - Added profiling for rkey management routines
    - PCIe relaxed order enabled by default for AMD CPUs
    - Fixed not deallocating memory from ucp_mem_unmap if no rcache
    - Fixed versioning infrastructure
    - Multiple code improvements: refactoring, debug prints and assertions, etc.
    - Multiple improvements in build, test and docs infrastructure
    - Added new objects to VFS (md, component, log_level, etc.)
    - Added configuration variable to specify which loadable modules are allowed
    - Added build-time configuration to disable sigaction overriding
  - UCP
    - Added API to pass pre-registered memory handle to UCP operations
    - Added implementation of AM rendezvous protocol
    - Added 2-stage pipeline rendezvous protocol for GPU
    - Added support for fragment mem_type for v1 pipeline proto, disabled by default
    - Added active message support for proto v2
    - Added UCP memory registration cache
    - Improved adaptive progress - deactivate iface when all p2p lanes are destroyed
    - Added support for user memh in proto_v1
    - Added support for selecting local address when creating a client endpoint
    - Added option to limit GPUDirectRDMA size in rendezvous protocol, UCX_RNDV_MEMTYPE_DIRECT_SIZE
    - Deprecated UCX_SOCKADDR_AUX_TLS configuration parameter
    - Resolving remote EP ID when creating local EP disabled by default
    - Added client_id to ucp_worker_create() and ucp_conn_request_query() APIs
    - Added ucp_worker_address_query() API
    - Updated ucp_ep_query() API for getting local and remote addresses
    - Added address versioning to correctly preserve wire compatibility starting from version 1.11.0
    - Added new client/server connection establishment packet header format
    - Enabled rendezvous and tag sync protocols when error handling is enabled on the endpoint
    - Added iov zcopy support to RMA operations
    - Reduced memory usage of unexpected messages by fitting receive buffer size to packet size
    - Added support for modifying UCT and UCS configs by ucp_config_modify() API
    - Optimized unpacked rkeys memory consumption
    - Added request flag to influence latency vs. bandwidth protocol
    - Reduced memory management overhead with new protocols
    - Improved performance calculations for new protocols
    - Added AMO support with GPU memory target using new protocols
    - Added put_zcopy, get_zcopy and pipeline based rendezvous in new protocols
    - Added support for user-defined alignment in Active Messages
    - Added support for offload tag sync in new protocols
    - Updated ucp_atomic_post() to use NBX flow
  - UCT
    - Introduced API uct_md_mkey_pack_v2
    - Introduced UCT iface features API
    - Introduced max_inflight_eps parameter in perf_attr API
    - Introduced UCT_SEND_FLAG_PEER_CHECK flag that forces checking connectivity to a peer
    - Introduced UCX_RCACHE_PURGE_ON_FORK to enable/disable cleaning regions when application is forking
    - Disabled PEER_FAILURE capability for XPMEM
    - Added API - uct_iface_is_reachable_v2()
    - Added IPv6 address support in TCP
    - Added latency estimation to uct_iface_estimate_perf()
    - Adjusted knem and cma overhead cost
    - Increased built-in TCP keep-alive interval to 2 seconds
  - RDMA CORE (IB, ROCE, etc.)
    - Introduced NDR autorecognition
    - Introduced CQE zipping support
    - Set the default MAX_RD_ATOMIC to maximum value supported by the hardware
    - Disabled mlx5 ifaces on verbs MD
    - Added detection of IB NDR devices
    - Added check for CQ overrun in assert mode
    - Added bitmap usage for releasing detached DCIs
    - Added configuration for requests ack frequency with DevX
    - Added remote QP info to tx error CQE traces
  - ROCM
    - Increased maximum number of HSA agents
  - UCS
    - Added topo module infrastructure
    - Added memtrack and rcache information to VFS
    - Added API for a per-process aggregate-sum statistics report
    - Added memory pool set data structure
    - Added new ptr_array API for bulk allocation
    - Added ucs_string_buffer_append_flags() for string buffer
    - Added ucs_ffs32()
    - Added ucs_vsnprintf_safe() which always adds '\0'
    - Added thread-safe put to ptr_map
    - Improved accuracy of the topology distance estimation
    - Added prints of leaked callbacks from the callback queue
    - Removed a diagnostic message when fuse thread is stopped
    - Added configurable limit for the memory consumed by rcache
    - Added configuration for VFS(FUSE) thread affinity
    - Added memory limit support to memtrack
  - Packaging
    - Added cmake config files for better integration with external cmake based projects
  - Tools
    - Added loop-back transport support in ucx_perftest
    - Split ucx_perftest into separate modules
    - Added process placement option for ucx_info
    - Extended parameters correctness check in ucx_perftest
- Backported UCS-DEBUG-replace-PTR-with-void.patch
  from upstream to fix compilation

OBS-URL: https://build.opensuse.org/request/show/1006486
OBS-URL: https://build.opensuse.org/package/show/science:HPC/openucx?expand=0&rev=48
2022-09-29 15:27:45 +00:00


commit 9d5c0d189d4cd5413089bd65fed1e87293e15763
Author: Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com>
Date: Tue Sep 27 17:47:15 2022 +0200
openucx s390x support
Signed-off-by: Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com>
diff --git config/m4/ucm.m4 config/m4/ucm.m4
index 8d7a9e40ec06..df7508e1e71a 100644
--- config/m4/ucm.m4
+++ config/m4/ucm.m4
@@ -80,9 +80,20 @@ AC_CHECK_DECLS([SYS_ipc],
[ipc_hooks_happy=no],
[#include <sys/syscall.h>])
+
+SAVE_CFLAGS=$CFLAGS
+CFLAGS="$CFLAGS -Isrc/"
+bistro_arch_happy=yes
+AC_CHECK_DECLS([ucm_bistro_patch],
+ [],
+ [bistro_arch_happy=no],
+ [#include <ucm/bistro/bistro.h>])
+CFLAGS=$SAVE_CFLAGS
+
AS_IF([test "x$mmap_hooks_happy" = "xyes"],
AS_IF([test "x$ipc_hooks_happy" = "xyes" -o "x$shm_hooks_happy" = "xyes"],
- [bistro_hooks_happy=yes]))
+ AS_IF([test "x$bistro_arch_happy" = "xyes"],
+ [bistro_hooks_happy=yes])))
AS_IF([test "x$bistro_hooks_happy" = "xyes"],
[AC_DEFINE([UCM_BISTRO_HOOKS], [1], [Enable BISTRO hooks])],
diff --git src/tools/info/sys_info.c src/tools/info/sys_info.c
index 5316b1c4336e..e910bc53572d 100644
--- src/tools/info/sys_info.c
+++ src/tools/info/sys_info.c
@@ -46,7 +46,8 @@ static const char* cpu_vendor_names[] = {
[UCS_CPU_VENDOR_GENERIC_ARM] = "Generic ARM",
[UCS_CPU_VENDOR_GENERIC_PPC] = "Generic PPC",
[UCS_CPU_VENDOR_FUJITSU_ARM] = "Fujitsu ARM",
- [UCS_CPU_VENDOR_ZHAOXIN] = "Zhaoxin"
+ [UCS_CPU_VENDOR_ZHAOXIN] = "Zhaoxin",
+ [UCS_CPU_VENDOR_GENERIC_IBM] = "Generic IBM"
};
static double measure_memcpy_bandwidth(size_t size)
diff --git src/ucm/Makefile.am src/ucm/Makefile.am
index 5140b5acf5bf..8805124befee 100644
--- src/ucm/Makefile.am
+++ src/ucm/Makefile.am
@@ -31,7 +31,8 @@ noinst_HEADERS = \
bistro/bistro.h \
bistro/bistro_x86_64.h \
bistro/bistro_aarch64.h \
- bistro/bistro_ppc64.h
+ bistro/bistro_ppc64.h \
+ bistro/bistro_s390x.h
libucm_la_SOURCES = \
event/event.c \
diff --git src/ucm/bistro/bistro.h src/ucm/bistro/bistro.h
index 101000455e66..0ae947429796 100644
--- src/ucm/bistro/bistro.h
+++ src/ucm/bistro/bistro.h
@@ -20,6 +20,8 @@ typedef struct ucm_bistro_restore_point ucm_bistro_restore_point_t;
# include "bistro_aarch64.h"
#elif defined(__x86_64__)
# include "bistro_x86_64.h"
+#elif defined(__s390x__)
+# include "bistro_s390x.h"
#else
# error "Unsupported architecture"
#endif
diff --git src/ucm/bistro/bistro_s390x.h src/ucm/bistro/bistro_s390x.h
new file mode 100644
index 000000000000..c0f427f4984a
--- /dev/null
+++ src/ucm/bistro/bistro_s390x.h
@@ -0,0 +1,18 @@
+#ifndef UCM_BISTRO_BISTRO_S390X_H_
+#define UCM_BISTRO_BISTRO_S390X_H_
+
+#include <stdint.h>
+
+#include <ucs/type/status.h>
+#include <ucs/sys/compiler_def.h>
+
+#define UCM_BISTRO_PROLOGUE
+#define UCM_BISTRO_EPILOGUE
+
+static inline ucs_status_t ucm_bistro_patch(void *func_ptr, void *hook, const char *symbol,
+ void **orig_func_p,
+ ucm_bistro_restore_point_t **rp){
+ return UCS_ERR_UNSUPPORTED;
+}
+
+#endif
diff --git src/ucs/Makefile.am src/ucs/Makefile.am
index 77680021d725..29f31aabd958 100644
--- src/ucs/Makefile.am
+++ src/ucs/Makefile.am
@@ -22,6 +22,7 @@ libucs_la_LIBADD = $(LIBM) $(top_builddir)/src/ucm/libucm.la $(BFD_LIBS)
nobase_dist_libucs_la_HEADERS = \
arch/aarch64/bitops.h \
arch/ppc64/bitops.h \
+ arch/s390x/bitops.h \
arch/x86_64/bitops.h \
arch/bitops.h \
algorithm/crc.h \
@@ -81,12 +82,14 @@ nobase_dist_libucs_la_HEADERS = \
arch/aarch64/global_opts.h \
arch/generic/atomic.h \
arch/ppc64/global_opts.h \
+ arch/s390x/global_opts.h \
arch/global_opts.h
noinst_HEADERS = \
arch/aarch64/cpu.h \
arch/generic/cpu.h \
arch/ppc64/cpu.h \
+ arch/s390x/cpu.h \
arch/x86_64/cpu.h \
arch/cpu.h \
config/ucm_opts.h \
@@ -134,6 +137,7 @@ libucs_la_SOURCES = \
algorithm/qsort_r.c \
arch/aarch64/cpu.c \
arch/aarch64/global_opts.c \
+ arch/s390x/global_opts.c \
arch/ppc64/timebase.c \
arch/ppc64/global_opts.c \
arch/x86_64/cpu.c \
diff --git src/ucs/arch/atomic.h src/ucs/arch/atomic.h
index 6a8551f592e1..e3a9f4641383 100644
--- src/ucs/arch/atomic.h
+++ src/ucs/arch/atomic.h
@@ -15,6 +15,8 @@
# include "generic/atomic.h"
#elif defined(__aarch64__)
# include "generic/atomic.h"
+#elif defined(__s390x__)
+# include "generic/atomic.h"
#else
# error "Unsupported architecture"
#endif
diff --git src/ucs/arch/bitops.h src/ucs/arch/bitops.h
index 77e00571e04f..bbdea0ceb210 100644
--- src/ucs/arch/bitops.h
+++ src/ucs/arch/bitops.h
@@ -20,6 +20,8 @@ BEGIN_C_DECLS
# include "ppc64/bitops.h"
#elif defined(__aarch64__)
# include "aarch64/bitops.h"
+#elif defined(__s390x__)
+# include "s390x/bitops.h"
#else
# error "Unsupported architecture"
#endif
diff --git src/ucs/arch/cpu.c src/ucs/arch/cpu.c
index 9e6fab0904eb..c912e991586c 100644
--- src/ucs/arch/cpu.c
+++ src/ucs/arch/cpu.c
@@ -61,6 +61,10 @@ const ucs_cpu_builtin_memcpy_t ucs_cpu_builtin_memcpy[UCS_CPU_VENDOR_LAST] = {
.min = UCS_MEMUNITS_INF,
.max = UCS_MEMUNITS_INF
},
+ [UCS_CPU_VENDOR_GENERIC_IBM] = {
+ .min = UCS_MEMUNITS_INF,
+ .max = UCS_MEMUNITS_INF
+ },
[UCS_CPU_VENDOR_FUJITSU_ARM] = {
.min = UCS_MEMUNITS_INF,
.max = UCS_MEMUNITS_INF
@@ -77,6 +81,7 @@ const size_t ucs_cpu_est_bcopy_bw[UCS_CPU_VENDOR_LAST] = {
[UCS_CPU_VENDOR_AMD] = 5008 * UCS_MBYTE,
[UCS_CPU_VENDOR_GENERIC_ARM] = 5800 * UCS_MBYTE,
[UCS_CPU_VENDOR_GENERIC_PPC] = 5800 * UCS_MBYTE,
+ [UCS_CPU_VENDOR_GENERIC_IBM] = 5800 * UCS_MBYTE,
[UCS_CPU_VENDOR_FUJITSU_ARM] = 12000 * UCS_MBYTE
};
diff --git src/ucs/arch/cpu.h src/ucs/arch/cpu.h
index 719913fb0b8c..04d69ca01533 100644
--- src/ucs/arch/cpu.h
+++ src/ucs/arch/cpu.h
@@ -63,6 +63,7 @@ typedef enum ucs_cpu_vendor {
UCS_CPU_VENDOR_AMD,
UCS_CPU_VENDOR_GENERIC_ARM,
UCS_CPU_VENDOR_GENERIC_PPC,
+ UCS_CPU_VENDOR_GENERIC_IBM,
UCS_CPU_VENDOR_FUJITSU_ARM,
UCS_CPU_VENDOR_ZHAOXIN,
UCS_CPU_VENDOR_LAST
@@ -98,6 +99,8 @@ typedef struct ucs_cpu_builtin_memcpy {
# include "ppc64/cpu.h"
#elif defined(__aarch64__)
# include "aarch64/cpu.h"
+#elif defined(__s390x__)
+# include "s390x/cpu.h"
#else
# error "Unsupported architecture"
#endif
diff --git src/ucs/arch/global_opts.h src/ucs/arch/global_opts.h
index 8786f130290a..0d251fb91868 100644
--- src/ucs/arch/global_opts.h
+++ src/ucs/arch/global_opts.h
@@ -15,6 +15,8 @@
# include "ppc64/global_opts.h"
#elif defined(__aarch64__)
# include "aarch64/global_opts.h"
+#elif defined(__s390x__)
+# include "s390x/global_opts.h"
#else
# error "Unsupported architecture"
#endif
diff --git src/ucs/arch/s390x/bitops.h src/ucs/arch/s390x/bitops.h
new file mode 100644
index 000000000000..39ad125107e9
--- /dev/null
+++ src/ucs/arch/s390x/bitops.h
@@ -0,0 +1,32 @@
+/**
+* Copyright (C) Mellanox Technologies Ltd. 2001-2015. ALL RIGHTS RESERVED.
+*
+* See file LICENSE for terms.
+*/
+
+#ifndef UCS_S390X_BITOPS_H_
+#define UCS_S390X_BITOPS_H_
+
+#include <stdint.h>
+
+
+static inline unsigned __ucs_ilog2_u32(uint32_t n)
+{
+ if (!n)
+ return 0;
+ return 31 - __builtin_clz(n);
+}
+
+static inline unsigned __ucs_ilog2_u64(uint64_t n)
+{
+ if (!n)
+ return 0;
+ return 63 - __builtin_clzll(n);
+}
+
+static inline unsigned ucs_ffs64(uint64_t n)
+{
+ return __ucs_ilog2_u64(n & -n);
+}
+
+#endif
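
Review note: the bit-scan helpers in this header depend only on compiler builtins, so they can be checked on any GCC or Clang target. The following is a minimal standalone sketch with hypothetical `sketch_` names that are not part of UCX. It assumes `__builtin_clz` operates on `unsigned int` and `__builtin_clzll` on `unsigned long long`, which is why the 64-bit variant has to use the latter to see the upper 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical standalone helpers; the names are illustrative, not UCX API. */

/* floor(log2(n)) for a 32-bit value; returns 0 for n == 0 by convention. */
static inline unsigned sketch_ilog2_u32(uint32_t n)
{
    if (!n) {
        return 0;
    }
    return 31u - (unsigned)__builtin_clz(n);
}

/* floor(log2(n)) for a 64-bit value; needs the long long builtin. */
static inline unsigned sketch_ilog2_u64(uint64_t n)
{
    if (!n) {
        return 0;
    }
    return 63u - (unsigned)__builtin_clzll(n);
}

/* Index of the lowest set bit: n & -n isolates that bit. */
static inline unsigned sketch_ffs64(uint64_t n)
{
    assert(n != 0); /* there is no lowest set bit in zero */
    return sketch_ilog2_u64(n & -n);
}
```

A value such as `1ULL << 40` is a useful check here, because a 32-bit count-leading-zeros would only ever see its lower half.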
diff --git src/ucs/arch/s390x/cpu.h src/ucs/arch/s390x/cpu.h
new file mode 100644
index 000000000000..4f0a87006118
--- /dev/null
+++ src/ucs/arch/s390x/cpu.h
@@ -0,0 +1,84 @@
+/**
+* Copyright (C) Mellanox Technologies Ltd. 2001-2013. ALL RIGHTS RESERVED.
+* Copyright (C) ARM Ltd. 2016-2017. ALL RIGHTS RESERVED.
+*
+* See file LICENSE for terms.
+*/
+
+
+#ifndef UCS_S390X_CPU_H_
+#define UCS_S390X_CPU_H_
+
+#include <ucs/sys/compiler.h>
+#include <ucs/arch/generic/cpu.h>
+#include <stdint.h>
+#include <string.h>
+#include <ucs/type/status.h>
+
+
+#define UCS_ARCH_CACHE_LINE_SIZE 256
+
+BEGIN_C_DECLS
+
+/* Compiler-only barriers: no hardware fence instruction is emitted */
+#define ucs_memory_bus_fence() asm volatile (""::: "memory")
+#define ucs_memory_bus_store_fence() ucs_memory_bus_fence()
+#define ucs_memory_bus_load_fence() ucs_memory_bus_fence()
+#define ucs_memory_bus_wc_flush() ucs_memory_bus_fence()
+#define ucs_memory_cpu_fence() ucs_memory_bus_fence()
+#define ucs_memory_cpu_store_fence() ucs_memory_bus_fence()
+#define ucs_memory_cpu_load_fence() ucs_memory_bus_fence()
+#define ucs_memory_cpu_wc_fence() ucs_memory_bus_fence()
+
+
+static inline uint64_t ucs_arch_read_hres_clock()
+{
+ unsigned long clk;
+ asm volatile("stck %0" : "=Q" (clk) : : "cc");
+ return clk >> 2;
+}
+#define ucs_arch_get_clocks_per_sec ucs_arch_generic_get_clocks_per_sec
+
+
+static inline ucs_cpu_model_t ucs_arch_get_cpu_model()
+{
+ return UCS_CPU_MODEL_UNKNOWN;
+}
+
+static inline ucs_cpu_vendor_t ucs_arch_get_cpu_vendor()
+{
+ return UCS_CPU_VENDOR_GENERIC_IBM;
+}
+
+static inline int ucs_arch_get_cpu_flag()
+{
+ return UCS_CPU_FLAG_UNKNOWN;
+}
+
+double ucs_arch_get_clocks_per_sec();
+
+#define ucs_arch_wait_mem ucs_arch_generic_wait_mem
+
+static inline void ucs_cpu_init()
+{
+}
+
+static inline void *ucs_memcpy_relaxed(void *dst, const void *src, size_t len)
+{
+ return memcpy(dst, src, len);
+}
+
+static UCS_F_ALWAYS_INLINE void
+ucs_memcpy_nontemporal(void *dst, const void *src, size_t len)
+{
+ memcpy(dst, src, len);
+}
+
+static inline ucs_status_t ucs_arch_get_cache_size(size_t *cache_sizes)
+{
+ return UCS_ERR_UNSUPPORTED;
+}
+
+END_C_DECLS
+
+#endif
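
Review note: `stck` stores the 64-bit TOD clock, and the header shifts it right by 2 before returning it. Assuming the z/Architecture TOD format, in which bit 51 (with bit 0 as the most significant bit) increments once per microsecond, the raw value advances 4096 units per microsecond. The sketch below only works through that arithmetic. Its names are hypothetical, and the patch itself relies on `ucs_arch_generic_get_clocks_per_sec` rather than on a fixed constant.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 4096 raw TOD units per microsecond (z/Architecture TOD format). */
#define SKETCH_TOD_UNITS_PER_USEC 4096ULL
#define SKETCH_USEC_PER_SEC       1000000ULL

/* Convert a raw TOD value to whole microseconds. */
static inline uint64_t sketch_tod_to_usec(uint64_t raw_tod)
{
    return raw_tod / SKETCH_TOD_UNITS_PER_USEC;
}

/* Nominal tick rate of a counter derived as (raw_tod >> shift). */
static inline uint64_t sketch_ticks_per_sec(unsigned shift)
{
    assert(shift < 64);
    return (SKETCH_TOD_UNITS_PER_USEC * SKETCH_USEC_PER_SEC) >> shift;
}
```

Under that assumption, a shift of 2 corresponds to a nominal rate of 1,024,000,000 ticks per second.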
diff --git src/ucs/arch/s390x/global_opts.c src/ucs/arch/s390x/global_opts.c
new file mode 100644
index 000000000000..4fa0c74034a7
--- /dev/null
+++ src/ucs/arch/s390x/global_opts.c
@@ -0,0 +1,24 @@
+/**
+* Copyright (C) Mellanox Technologies Ltd. 2019. ALL RIGHTS RESERVED.
+*
+* See file LICENSE for terms.
+*/
+
+#if defined(__s390x__)
+
+#ifdef HAVE_CONFIG_H
+# include "config.h"
+#endif
+
+#include <ucs/arch/global_opts.h>
+#include <ucs/config/parser.h>
+
+ucs_config_field_t ucs_arch_global_opts_table[] = {
+ {NULL}
+};
+
+void ucs_arch_print_memcpy_limits(ucs_arch_global_opts_t *config)
+{
+}
+
+#endif
diff --git src/ucs/arch/s390x/global_opts.h src/ucs/arch/s390x/global_opts.h
new file mode 100644
index 000000000000..225e4e5e896a
--- /dev/null
+++ src/ucs/arch/s390x/global_opts.h
@@ -0,0 +1,25 @@
+/**
+* Copyright (C) Mellanox Technologies Ltd. 2019. ALL RIGHTS RESERVED.
+*
+* See file LICENSE for terms.
+*/
+
+
+#ifndef UCS_S390X_GLOBAL_OPTS_H_
+#define UCS_S390X_GLOBAL_OPTS_H_
+
+#include <ucs/sys/compiler_def.h>
+
+BEGIN_C_DECLS
+
+#define UCS_ARCH_GLOBAL_OPTS_INITALIZER {}
+
+/* built-in memcpy config */
+typedef struct ucs_arch_global_opts {
+ char dummy;
+} ucs_arch_global_opts_t;
+
+END_C_DECLS
+
+#endif
+
diff --git src/ucs/sys/sys.c src/ucs/sys/sys.c
index 88f4a147315e..0b6d186265a8 100644
--- src/ucs/sys/sys.c
+++ src/ucs/sys/sys.c
@@ -1224,8 +1224,19 @@ void *ucs_sys_realloc(void *old_ptr, size_t old_length, size_t new_length)
if (old_ptr == NULL) {
/* Note: Must pass the 0 offset as "long", otherwise it will be
* partially undefined when converted to syscall arguments */
+#if defined(__s390x__)
+ long int _args[6] = {
+ (long int) NULL,
+ (long int) new_length,
+ (long int) (PROT_READ|PROT_WRITE),
+ (long int) (MAP_PRIVATE|MAP_ANONYMOUS),
+ (long int) -1,
+ (long int) 0ul};
+ ptr = (void*)syscall(__NR_mmap, _args);
+#else
ptr = (void*)syscall(__NR_mmap, NULL, new_length, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0ul);
+#endif
if (ptr == MAP_FAILED) {
ucs_log_fatal_error("mmap(NULL, %zu, READ|WRITE, PRIVATE|ANON) failed: %m",
new_length);
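
Review note: the s390x branch is needed because the legacy `mmap` system call on that architecture takes a single pointer to a block of six `long` arguments, not six separate arguments. A minimal sketch of the same dispatch follows. `sketch_anon_map` is a hypothetical name, and the code assumes 64-bit Linux with glibc's `syscall()` wrapper. Every argument is widened to `long` so that nothing is left partially undefined when passed through the variadic call.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper: anonymous private mapping through the raw syscall. */
static void *sketch_anon_map(size_t length)
{
    assert(length > 0);
#if defined(__s390x__)
    /* Legacy s390x convention: one pointer to the six packed arguments. */
    long args[6] = {
        0L,                                  /* addr: let the kernel choose */
        (long)length,
        (long)(PROT_READ | PROT_WRITE),
        (long)(MAP_PRIVATE | MAP_ANONYMOUS),
        -1L,                                 /* fd: unused for anonymous maps */
        0L                                   /* offset */
    };
    return (void*)syscall(SYS_mmap, args);
#else
    /* Other 64-bit targets: six separate arguments, each passed as long. */
    return (void*)syscall(SYS_mmap, 0L, (long)length,
                          (long)(PROT_READ | PROT_WRITE),
                          (long)(MAP_PRIVATE | MAP_ANONYMOUS), -1L, 0L);
#endif
}
```

On failure `syscall()` returns -1, which compares equal to `MAP_FAILED` after the cast, so callers can test the result the same way as with `mmap()`.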