openucx/openucx-s390x-support.patch


commit 28ffffe90896cbd655466b870b74d8304736a316
Author: Nicolas Morey <nmorey@suse.com>
Date: Wed Jun 26 17:36:58 2024 +0200
openucx s390x support
Signed-off-by: Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com>
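The pattern this patch relies on can be sketched in a few lines of self-contained C. bistro.h selects a per-architecture header through an #elif chain. On s390x, the new header declares ucm_bistro_patch() as an inline stub that always returns UCS_ERR_UNSUPPORTED. Because the symbol is then declared, the AC_CHECK_DECLS probe added to ucm.m4 succeeds on s390x. On a truly unsupported architecture the #error branch makes the probe fail instead. Everything named demo_* is hypothetical, as are the numeric status values and the exact architecture macro list. None of this is UCX API, and the sketch does not claim to show how UCM itself reacts to the unsupported status.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for ucs_status_t / UCS_ERR_UNSUPPORTED; the real
 * definitions live in ucs/type/status.h. Values here are arbitrary. */
typedef enum {
    DEMO_OK              = 0,
    DEMO_ERR_UNSUPPORTED = -1
} demo_status_t;

typedef enum {
    DEMO_HOOK_MODE_BISTRO,   /* binary patching succeeded */
    DEMO_HOOK_MODE_FALLBACK, /* patching unavailable on this architecture */
    DEMO_HOOK_MODE_ERROR     /* any other failure */
} demo_hook_mode_t;

/* Assumed macro list, loosely mirroring the #elif chain in bistro.h. */
#if defined(__x86_64__) || defined(__aarch64__) || \
    defined(__powerpc64__) || defined(__riscv)
/* A real backend would rewrite the target function's prologue here;
 * this sketch only reports success. */
static inline demo_status_t demo_bistro_patch(void *func_ptr, void *hook)
{
    (void)func_ptr;
    (void)hook;
    return DEMO_OK;
}
#else
/* Stub in the spirit of bistro_s390x.h: the symbol is declared, so a
 * declaration probe compiles, but every call reports "unsupported". */
static inline demo_status_t demo_bistro_patch(void *func_ptr, void *hook)
{
    (void)func_ptr;
    (void)hook;
    return DEMO_ERR_UNSUPPORTED;
}
#endif

/* Map a patch result to the strategy a hypothetical caller might pick. */
static demo_hook_mode_t demo_choose_hook_mode(demo_status_t status)
{
    switch (status) {
    case DEMO_OK:
        return DEMO_HOOK_MODE_BISTRO;
    case DEMO_ERR_UNSUPPORTED:
        return DEMO_HOOK_MODE_FALLBACK;
    default:
        return DEMO_HOOK_MODE_ERROR;
    }
}

/* Whichever branch was compiled in, the result must be a non-error mode. */
static void demo_check(void)
{
    demo_status_t status = demo_bistro_patch(NULL, NULL);
    assert(demo_choose_hook_mode(status) != DEMO_HOOK_MODE_ERROR);
}
```

On an s390x build, demo_check() exercises the stub branch. On x86-64 it exercises the other branch. In both cases a caller has to treat the unsupported status as a non-fatal outcome. Shipping a stub instead of falling through to #error presumably keeps UCM compiling on s390x while leaving binary patching disabled at run time.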
diff --git config/m4/ucm.m4 config/m4/ucm.m4
index e5e66266d695..ef7e4ede93ce 100644
--- config/m4/ucm.m4
+++ config/m4/ucm.m4
@@ -80,9 +80,20 @@ AC_CHECK_DECLS([SYS_ipc],
[ipc_hooks_happy=no],
[#include <sys/syscall.h>])
+
+SAVE_CFLAGS=$CFLAGS
+CFLAGS="$CFLAGS -Isrc/"
+bistro_arch_happy=yes
+AC_CHECK_DECLS([ucm_bistro_patch],
+ [],
+ [bistro_arch_happy=no],
+ [#include <ucm/bistro/bistro.h>])
+CFLAGS=$SAVE_CFLAGS
+
AS_IF([test "x$mmap_hooks_happy" = "xyes"],
AS_IF([test "x$ipc_hooks_happy" = "xyes" -o "x$shm_hooks_happy" = "xyes"],
- [bistro_hooks_happy=yes]))
+ AS_IF([test "x$bistro_arch_happy" = "xyes"],
+ [bistro_hooks_happy=yes])))
AS_IF([test "x$bistro_hooks_happy" = "xyes"],
[AC_DEFINE([UCM_BISTRO_HOOKS], [1], [Enable BISTRO hooks])],
diff --git src/ucm/Makefile.am src/ucm/Makefile.am
index fa7a722f2d31..e6df414a4ecb 100644
--- src/ucm/Makefile.am
+++ src/ucm/Makefile.am
@@ -34,6 +34,7 @@ noinst_HEADERS = \
bistro/bistro_aarch64.h \
bistro/bistro_ppc64.h \
- bistro/bistro_rv64.h
+ bistro/bistro_rv64.h \
+ bistro/bistro_s390x.h
libucm_la_SOURCES = \
event/event.c \
diff --git src/ucm/bistro/bistro.h src/ucm/bistro/bistro.h
index 8d0b90751676..a0b9d3f064c3 100644
--- src/ucm/bistro/bistro.h
+++ src/ucm/bistro/bistro.h
@@ -23,6 +23,8 @@ typedef struct ucm_bistro_restore_point ucm_bistro_restore_point_t;
# include "bistro_x86_64.h"
#elif defined(__riscv)
# include "bistro_rv64.h"
+#elif defined(__s390x__)
+# include "bistro_s390x.h"
#else
# error "Unsupported architecture"
#endif
diff --git src/ucm/bistro/bistro_s390x.h src/ucm/bistro/bistro_s390x.h
new file mode 100644
index 000000000000..2beb5de54fab
--- /dev/null
+++ src/ucm/bistro/bistro_s390x.h
@@ -0,0 +1,27 @@
+#ifndef UCM_BISTRO_BISTRO_S390X_H_
+#define UCM_BISTRO_BISTRO_S390X_H_
+
+#include <stdint.h>
+
+#include <ucs/type/status.h>
+#include <ucs/sys/compiler_def.h>
+
+#define UCM_BISTRO_PROLOGUE
+#define UCM_BISTRO_EPILOGUE
+
+typedef struct ucm_bistro_patch {
+} UCS_S_PACKED ucm_bistro_patch_t;
+typedef struct {
+} UCS_S_PACKED ucm_bistro_lock_t;
+
+static inline ucs_status_t ucm_bistro_patch(void *func_ptr, void *hook,
+                                            const char *symbol, void **orig_func_p,
+                                            ucm_bistro_restore_point_t **rp) {
+ return UCS_ERR_UNSUPPORTED;
+}
+
+static inline void ucm_bistro_patch_lock(void * UCS_V_UNUSED dst)
+{
+}
+
+#endif
diff --git src/ucs/Makefile.am src/ucs/Makefile.am
index 4a05f47b6369..c1cd2fb2cb57 100644
--- src/ucs/Makefile.am
+++ src/ucs/Makefile.am
@@ -24,6 +24,7 @@ nobase_dist_libucs_la_HEADERS = \
arch/aarch64/bitops.h \
arch/ppc64/bitops.h \
arch/rv64/bitops.h \
+ arch/s390x/bitops.h \
arch/x86_64/bitops.h \
arch/bitops.h \
algorithm/crc.h \
@@ -87,6 +88,7 @@ nobase_dist_libucs_la_HEADERS = \
arch/generic/atomic.h \
arch/ppc64/global_opts.h \
arch/rv64/global_opts.h \
+ arch/s390x/global_opts.h \
arch/global_opts.h
noinst_HEADERS = \
@@ -94,6 +96,7 @@ noinst_HEADERS = \
arch/generic/cpu.h \
arch/ppc64/cpu.h \
arch/rv64/cpu.h \
+ arch/s390x/cpu.h \
arch/x86_64/cpu.h \
Accepting request 921702 from home:NMoreyChaisemartin:branches:science:HPC - Update to v1.11.1 (jsc#SLE-19260) - Core: - Added support for UCX monitoring using virtual file system (VFS)/FUSE - Added support for applications with static CUDA runtime linking - Added support for a configuration file - Updated clang format configuration - UCP - Added rendezvous API for active messages - Added user-defined name to context, worker, and endpoint objects - Added flag to silence request leak check - Added API for endpoint performance evaluation - Added API - ucp_request_query - Added API - ucp_lib_query - Added bandwidth optimizations for new protocols multi-lane - Added support for multi-rail over lanes with BW ratio >= 1/4 - Added support for tracking outstanding requests and aborting those in case of connection failure - Refactored keep-alive protocol - Added device id to wireup protocol - Added support for up to 128 transport layer resources in UCP context - Added support for CUDA memory allocations with ucp_mem_map - Increased UCP_WORKER_MAX_EP_CONFIG to 64 - Adjusted memory type zcopy threshold when UCX_ZCOPY_THRESH is set - Refactored wireup protocols, rendezvous, get, zcopy protocols - Added put zcopy multi-rail - Improved logging for new protocols - Added system topology information - Added new protocols for eager offload protocols - UCT - Extended connection establishment API OBS-URL: https://build.opensuse.org/request/show/921702 OBS-URL: https://build.opensuse.org/package/show/science:HPC/openucx?expand=0&rev=46
2021-09-27 09:00:18 +00:00
arch/cpu.h \
Accepting request 1006486 from home:NMoreyChaisemartin:branches:science:HPC - Update to v1.13.1 (jsc#PED-912) - Core - Added new objects to VFS: local and remote address of endpoint, statistics of ucp_ep_create success/failure, failed/destroyed endpoints - Added support for UCX static libraries - Added profiling for rkey management routines - PCIe relaxed order enabled by default for AMD CPUs - Fixed not deallocating memory from ucp_mem_unmap if no rcache - Fixed versioning infrastructure - Multiple code improvements: refactoring, debug prints and assertions, etc. - Multiple improvements in build, test and docs infrastructure - Added new objects to VFS (md, component, log_level, etc.) - Added configuration variable to specify which loadable modules are allowed - Added build-time configuration to disable sigaction overriding - UCP - Added API to pass pre-registered memory handle to UCP operations - Added implementation of AM rendezvous protocol - Added 2-stage pipeline rendezvous protocol for GPU - Added support for fragment mem_type for v1 pipeline proto, disabled by default - Added active message support for proto v2 - Added UCP memory registration cache - Improved adaptive progress - deactivate iface when all p2p lanes are destroyed - Added support for user memh in proto_v1 - Added support for selecting local address when creating a client endpoint - Added option to limit GPUDirectRDMA size in rendezvous protocol, UCX_RNDV_MEMTYPE_DIRECT_SIZE - Deprecated UCX_SOCKADDR_AUX_TLS configuration parameter - Resolving remote EP ID when creating local EP disabled by default - Added client_id to ucp_worker_create() and ucp_conn_request_query() APIs - Added ucp_worker_address_query() API - Updated ucp_ep_query() API for getting local and remote addresses - Added address versioning to correctly preserve wire compatibility starting from version 1.11.0 - Added new client/server connection establishment packet header format - Enabled rendezvous and tag sync protocols when 
error handling is enabled on the endpoint - Added iov zcopy support to RMA operations - Reduced memory usage of unexpected messages by fitting receive buffer size to packet size - Added support for modifying UCT and UCS configs by ucp_config_modify() API - Optimized unpacked rkeys memory consumption - Added request flag to influence latency vs. bandwidth protocol - Reduced memory management overhead with new protocols - Improved performance calculations for new protocols - Added AMO support with GPU memory target using new protocols - Added put_zcopy, get_zcopy and pipeline based rendezvous in new protocols - Added support for user-defined alignment in Active Messages - Added support for offload tag sync in new protocols - Updated ucp_atomic_post() to use NBX flow - UCT - Introduced API uct_md_mkey_pack_v2 - Introduced UCT iface features API - Introduced max_inflight_eps parameter in perf_attr API - Introduced UCT_SEND_FLAG_PEER_CHECK flag that forces checking connectivity to a peer - Introduced UCX_RCACHE_PURGE_ON_FORK to enable/disable cleaning regions when application is forking - Disabled PEER_FAILURE capability for XPMEM - Added API - uct_iface_is_reachable_v2() - Added IPv6 address support in TCP - Added latency estimation to uct_iface_estimate_perf() - Adjusted knem and cma overhead cost - Increased built-in TCP keep-alive interval to 2 seconds - RDMA CORE (IB, ROCE, etc.) 
- Introduced NDR autorecognition - Introduced CQE zipping support - Set the default MAX_RD_ATOMIC to maximum value supported by the hardware - Disabled mlx5 ifaces on verbs MD - Added detection of IB NDR devices - Added check for CQ overrun in assert mode - Added bitmap usage for releasing detached DCIs - Added configuration for requests ack frequency with DevX - Added remote QP info to tx error CQE traces - ROCM - Increased maximum number of HSA agents - UCS - Added topo module infrastructure - Added memtrack and rcache information to VFS - Added API for a per-process aggregate-sum statistics report - Added memory pool set data structure - Added new ptr_array API for bulk allocation - Added ucs_string_buffer_append_flags() for string buffer - Added ucs_ffs32() - Added ucs_vsnprintf_safe() which always adds '\0' - Added thread-safe put to ptr_map - Improved accuracy of the topology distance estimation - Added prints of leaked callbacks from the callback queue - Removed a diagnostic message when fuse thread is stopped - Added configurable limit for the memory consumed by rcache - Added configuration for VFS(FUSE) thread affinity - Added memory limit support to memtrack - Packaging - Added cmake config files for better integration with external cmake based projects - Tools - Added loop-back transport support in ucx_perftest - Split ucx_perftest into separate modules - Added process placement option for ucx_info - Extended parameters correctness check in ucx_perftest - Backported UCS-DEBUG-replace-PTR-with-void.patch from upstream to fix compilation OBS-URL: https://build.opensuse.org/request/show/1006486 OBS-URL: https://build.opensuse.org/package/show/science:HPC/openucx?expand=0&rev=48
2022-09-29 15:27:45 +00:00
config/ucm_opts.h \
@@ -149,6 +152,7 @@ libucs_la_SOURCES = \
algorithm/string_distance.c \
arch/aarch64/cpu.c \
arch/aarch64/global_opts.c \
+ arch/s390x/global_opts.c \
arch/ppc64/timebase.c \
arch/ppc64/global_opts.c \
arch/rv64/cpu.c \
diff --git src/ucs/arch/atomic.h src/ucs/arch/atomic.h
index 849647902fab..a328c37e2020 100644
--- src/ucs/arch/atomic.h
+++ src/ucs/arch/atomic.h
@@ -18,6 +18,8 @@
# include "generic/atomic.h"
#elif defined(__riscv)
# include "generic/atomic.h"
+#elif defined(__s390x__)
+# include "generic/atomic.h"
#else
# error "Unsupported architecture"
#endif
diff --git src/ucs/arch/bitops.h src/ucs/arch/bitops.h
index 3e0e530f1336..f887e03ebac0 100644
--- src/ucs/arch/bitops.h
+++ src/ucs/arch/bitops.h
@@ -23,6 +23,8 @@ BEGIN_C_DECLS
# include "aarch64/bitops.h"
#elif defined(__riscv)
# include "rv64/bitops.h"
+#elif defined(__s390x__)
+# include "s390x/bitops.h"
#else
# error "Unsupported architecture"
#endif
diff --git src/ucs/arch/cpu.c src/ucs/arch/cpu.c
index 307fb61bfc4a..4356fff36f8b 100644
--- src/ucs/arch/cpu.c
+++ src/ucs/arch/cpu.c
@@ -64,6 +64,10 @@ const ucs_cpu_builtin_memcpy_t ucs_cpu_builtin_memcpy[UCS_CPU_VENDOR_LAST] = {
.min = UCS_MEMUNITS_INF,
.max = UCS_MEMUNITS_INF
Accepting request 840386 from home:NMoreyChaisemartin:branches:science:HPC - Update to v1.9.0 (jsc#SLE-15163) - Features: - Added a new class of communication APIs '*_nbx' that enable API extendability while preserving ABI backward compatibility - Added asynchronous event support to UCT/IB/DEVX - Added support for latest CUDA library version - Added NAK-based reliability protocol for UCT/IB/UD to optimize resends - Added new tests for ROCm - Added new configuration parameters for protocol selection - Added performance optimization for Fujitsu A64FX with InfiniBand - Added performance optimization for clear cache code aarch64 - Added support for relaxed-order PCIe access in IB RDMA transports - Added new TCP connection manager - Added support for UCT/IB PKey with partial membership in IB transports - Added support for RoCE LAG - Added support for ROCm 3.7 and above - Added flow control for RDMA read operations - Improved endpoint flush implementation for UCT/IB - Improved UD timer to avoid interrupting the main thread when not in use - Improved latency estimation for network path with CUDA - Improved error reporting messages - Improved performance in active message flow (removed malloc call) - Improved performance in ptr_array flow - Improved performance in UCT/SM progress engine flow - Improved I/O demo code - Improved rendezvous protocol for CUDA - Updated examples code - Bugfixes: - Fixes for most recent versions of GCC, CLANG, ARMCLANG, PGI - Fixes in UCT/IB for strict order keys OBS-URL: https://build.opensuse.org/request/show/840386 OBS-URL: https://build.opensuse.org/package/show/science:HPC/openucx?expand=0&rev=42
2020-10-09 06:50:44 +00:00
},
+ [UCS_CPU_VENDOR_GENERIC_IBM] = {
+ .min = UCS_MEMUNITS_INF,
+ .max = UCS_MEMUNITS_INF
+ },
[UCS_CPU_VENDOR_FUJITSU_ARM] = {
.min = UCS_MEMUNITS_INF,
.max = UCS_MEMUNITS_INF
@@ -89,6 +93,7 @@ const size_t ucs_cpu_est_bcopy_bw[UCS_CPU_VENDOR_LAST] = {
[UCS_CPU_VENDOR_GENERIC_ARM] = UCS_CPU_EST_BCOPY_BW_DEFAULT,
[UCS_CPU_VENDOR_GENERIC_PPC] = UCS_CPU_EST_BCOPY_BW_DEFAULT,
[UCS_CPU_VENDOR_GENERIC_RV64G] = UCS_CPU_EST_BCOPY_BW_DEFAULT,
Accepting request 1115979 from home:NMorey:branches:science:HPC - Update to 1.15.0 - UCP - Added 2-stage pipeline protocol in the new protocol infrastructure - Added reset and abort functionality of rendezvous protocols in the new infrastructure - Added zero-copy rendezvous data send protocol in the new infrastructure - Added support for user memory handle in the new protocol infrastructure - Added option to force ODP registration for certain memory types - Enabled lock-free memory region deregistration - Updated allow/deny transport list feature to control auxiliary transport selection - Multiple performance improvements of the new protocol infrastructure - Multiple improvements in error and debug messages - Fixed assertion when sending from non-contiguous GPU buffer to managed buffer - Fixed the race condition on endpoint configurations - Fixed endpoint reconfiguration issues due to asymmetrical selection - Fixed endpoint reconfiguration error due to wrong locality detection - Fixed crash during connection manager cleanup - Fixed rkey index calculation for rendezvous protocol - Fixed rcache dump function - Removed logging from rkey unpack in release mode - Fixed double free of rkey in rendezvous protocol - Fixed rendezvous pipeline protocol error flow - Fixed error handling in rendezvous get zcopy protocol - Replay pending requests of wireup EP CM during connection establishment to prevent potential ordering issues and wrong configuration - Pass user-provided memory type to the function that checks whether the buffer can be sent inline or not - Avoid memory registration during UCP context initialization - Fixed CPU/device atomics selection in the new protocol infrastructure - Multiple fixes in the new protocol infrastructure information output OBS-URL: https://build.opensuse.org/request/show/1115979 OBS-URL: https://build.opensuse.org/package/show/science:HPC/openucx?expand=0&rev=63
2023-10-06 09:59:22 +00:00
+ [UCS_CPU_VENDOR_GENERIC_IBM] = UCS_CPU_EST_BCOPY_BW_DEFAULT,
[UCS_CPU_VENDOR_FUJITSU_ARM] = UCS_CPU_EST_BCOPY_BW_FUJITSU_ARM,
[UCS_CPU_VENDOR_ZHAOXIN] = UCS_CPU_EST_BCOPY_BW_DEFAULT,
[UCS_CPU_VENDOR_NVIDIA] = UCS_CPU_EST_BCOPY_BW_DEFAULT
@@ -183,6 +188,7 @@ const char *ucs_cpu_vendor_name()
[UCS_CPU_VENDOR_GENERIC_ARM] = "Generic ARM",
[UCS_CPU_VENDOR_GENERIC_PPC] = "Generic PPC",
[UCS_CPU_VENDOR_GENERIC_RV64G] = "Generic RV64G",
+ [UCS_CPU_VENDOR_GENERIC_IBM] = "Generic IBM",
[UCS_CPU_VENDOR_FUJITSU_ARM] = "Fujitsu ARM",
[UCS_CPU_VENDOR_ZHAOXIN] = "Zhaoxin",
[UCS_CPU_VENDOR_NVIDIA] = "Nvidia"
@@ -212,6 +218,7 @@ const char *ucs_cpu_model_name()
[UCS_CPU_MODEL_ZHAOXIN_WUDAOKOU] = "Wudaokou",
[UCS_CPU_MODEL_ZHAOXIN_LUJIAZUI] = "Lujiazui",
[UCS_CPU_MODEL_RV64G] = "RV64G",
+ [UCS_CPU_MODEL_S390X] = "S390x",
[UCS_CPU_MODEL_NVIDIA_GRACE] = "Grace"
};
diff --git src/ucs/arch/cpu.h src/ucs/arch/cpu.h
index ca25e714d141..e97405c30d52 100644
--- src/ucs/arch/cpu.h
+++ src/ucs/arch/cpu.h
@@ -39,6 +39,7 @@ typedef enum ucs_cpu_model {
UCS_CPU_MODEL_ZHAOXIN_WUDAOKOU,
UCS_CPU_MODEL_ZHAOXIN_LUJIAZUI,
UCS_CPU_MODEL_RV64G,
+ UCS_CPU_MODEL_S390X,
UCS_CPU_MODEL_NVIDIA_GRACE,
UCS_CPU_MODEL_LAST
} ucs_cpu_model_t;
@@ -68,6 +69,7 @@ typedef enum ucs_cpu_vendor {
UCS_CPU_VENDOR_AMD,
UCS_CPU_VENDOR_GENERIC_ARM,
UCS_CPU_VENDOR_GENERIC_PPC,
+ UCS_CPU_VENDOR_GENERIC_IBM,
UCS_CPU_VENDOR_FUJITSU_ARM,
UCS_CPU_VENDOR_ZHAOXIN,
UCS_CPU_VENDOR_GENERIC_RV64G,
@@ -107,6 +109,8 @@ typedef struct ucs_cpu_builtin_memcpy {
# include "aarch64/cpu.h"
#elif defined(__riscv)
# include "rv64/cpu.h"
+#elif defined(__s390x__)
+# include "s390x/cpu.h"
#else
# error "Unsupported architecture"
#endif
diff --git src/ucs/arch/global_opts.h src/ucs/arch/global_opts.h
index 550d22b8b751..d8e4a7cca694 100644
--- src/ucs/arch/global_opts.h
+++ src/ucs/arch/global_opts.h
@@ -18,6 +18,8 @@
# include "aarch64/global_opts.h"
#elif defined(__riscv)
# include "rv64/global_opts.h"
+#elif defined(__s390x__)
+# include "s390x/global_opts.h"
#else
# error "Unsupported architecture"
#endif
diff --git src/ucs/arch/s390x/bitops.h src/ucs/arch/s390x/bitops.h
new file mode 100644
index 000000000000..ce48ff1ff451
--- /dev/null
+++ src/ucs/arch/s390x/bitops.h
@@ -0,0 +1,37 @@
+/**
+* Copyright (C) Mellanox Technologies Ltd. 2001-2015. ALL RIGHTS RESERVED.
+*
+* See file LICENSE for terms.
+*/
+
+#ifndef UCS_S390X_BITOPS_H_
+#define UCS_S390X_BITOPS_H_
+
+#include <stdint.h>
+
+
+static inline unsigned __ucs_ilog2_u32(uint32_t n)
+{
+ if (!n)
+ return 0;
+ return 31 - __builtin_clz(n);
+}
+
+static inline unsigned __ucs_ilog2_u64(uint64_t n)
+{
+ if (!n)
+ return 0;
+ return 63 - __builtin_clzll(n);
+}
+
+static UCS_F_ALWAYS_INLINE unsigned ucs_ffs32(uint32_t n)
+{
+ return __ucs_ilog2_u32(n & -n);
+}
+
+static inline unsigned ucs_ffs64(uint64_t n)
+{
+ return __ucs_ilog2_u64(n & -n);
+}
+
+#endif
diff --git src/ucs/arch/s390x/cpu.h src/ucs/arch/s390x/cpu.h
new file mode 100644
index 000000000000..033f58f7c047
--- /dev/null
+++ src/ucs/arch/s390x/cpu.h
@@ -0,0 +1,84 @@
+/**
+* Copyright (C) Mellanox Technologies Ltd. 2001-2013. ALL RIGHTS RESERVED.
+* Copyright (C) ARM Ltd. 2016-2017. ALL RIGHTS RESERVED.
+*
+* See file LICENSE for terms.
+*/
+
+
+#ifndef UCS_S390X_CPU_H_
+#define UCS_S390X_CPU_H_
+
+#include <ucs/sys/compiler.h>
+#include <ucs/arch/generic/cpu.h>
+#include <stdint.h>
+#include <string.h>
+#include <ucs/type/status.h>
+
+
+#define UCS_ARCH_CACHE_LINE_SIZE 256
+
+BEGIN_C_DECLS
+
+/* Assume the worst - weak memory ordering */
+#define ucs_memory_bus_fence() asm volatile (""::: "memory")
+#define ucs_memory_bus_store_fence() ucs_memory_bus_fence()
+#define ucs_memory_bus_load_fence() ucs_memory_bus_fence()
+#define ucs_memory_bus_wc_flush() ucs_memory_bus_fence()
+#define ucs_memory_cpu_fence() ucs_memory_bus_fence()
+#define ucs_memory_cpu_store_fence() ucs_memory_bus_fence()
+#define ucs_memory_cpu_load_fence() ucs_memory_bus_fence()
+#define ucs_memory_cpu_wc_fence() ucs_memory_bus_fence()
+
+
+static inline uint64_t ucs_arch_read_hres_clock()
+{
+ unsigned long clk;
+ asm volatile("stck %0" : "=Q" (clk) : : "cc");
+ return clk >> 2;
+}
+#define ucs_arch_get_clocks_per_sec ucs_arch_generic_get_clocks_per_sec
+
+
+static inline ucs_cpu_model_t ucs_arch_get_cpu_model()
+{
+ return UCS_CPU_MODEL_S390X;
+}
+
+static inline ucs_cpu_vendor_t ucs_arch_get_cpu_vendor()
+{
+ return UCS_CPU_VENDOR_GENERIC_IBM;
+}
+
+static inline int ucs_arch_get_cpu_flag()
+{
+ return UCS_CPU_FLAG_UNKNOWN;
+}
+
+double ucs_arch_get_clocks_per_sec();
+
+#define ucs_arch_wait_mem ucs_arch_generic_wait_mem
+
+static inline void ucs_cpu_init()
+{
+}
+
+static inline void *ucs_memcpy_relaxed(void *dst, const void *src, size_t len)
+{
+ return memcpy(dst, src, len);
+}
+
+static UCS_F_ALWAYS_INLINE void
+ucs_memcpy_nontemporal(void *dst, const void *src, size_t len)
+{
+ memcpy(dst, src, len);
+}
+
+static inline ucs_status_t ucs_arch_get_cache_size(size_t *cache_sizes)
+{
+ return UCS_ERR_UNSUPPORTED;
+}
+
+END_C_DECLS
+
+#endif
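The `stck` instruction in `ucs_arch_read_hres_clock()` stores the 64-bit z/Architecture TOD clock, in which bit 51 (bits numbered from the most significant bit as bit 0) is incremented once per microsecond. The raw value therefore advances by 4096 per microsecond, and after the `>> 2` shift the returned clock advances by 1024 per microsecond, nominally 1.024e9 ticks per second; whatever clocks-per-second value is paired with this clock has to agree with that rate for time conversions to be correct. The sketch below only illustrates that arithmetic; the helper names are hypothetical and not part of UCX, and it uses plain integer math so it builds on any architecture.

```c
#include <stdint.h>

/* z/Architecture TOD clock: bit 51 (bits numbered from the MSB as bit 0)
 * is incremented once per microsecond, so the raw 64-bit value advances
 * by 2^12 = 4096 per microsecond. */
#define TOD_RAW_TICKS_PER_USEC 4096ULL

/* Same shift as "return clk >> 2" in ucs_arch_read_hres_clock(). */
#define HRES_SHIFT 2
#define HRES_TICKS_PER_USEC (TOD_RAW_TICKS_PER_USEC >> HRES_SHIFT) /* 1024 */

/* Hypothetical helper: apply the shift to a raw STCK value. */
static inline uint64_t tod_raw_to_hres(uint64_t raw)
{
    return raw >> HRES_SHIFT;
}

/* Hypothetical helper: nominal rate of the shifted clock in ticks/second. */
static inline double hres_nominal_ticks_per_sec(void)
{
    return (double) HRES_TICKS_PER_USEC * 1.0e6;
}

/* Hypothetical helper: convert a delta of shifted ticks to nanoseconds,
 * i.e. ticks * 1000 / 1024, split so large deltas cannot overflow. */
static inline uint64_t hres_ticks_to_ns(uint64_t ticks)
{
    return (ticks / HRES_TICKS_PER_USEC) * 1000ULL +
           (ticks % HRES_TICKS_PER_USEC) * 1000ULL / HRES_TICKS_PER_USEC;
}
```

For example, a raw TOD delta of 4096 becomes 1024 shifted ticks, which converts to 1000 ns.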
diff --git src/ucs/arch/s390x/global_opts.c src/ucs/arch/s390x/global_opts.c
new file mode 100644
index 000000000000..4fa0c74034a7
--- /dev/null
+++ src/ucs/arch/s390x/global_opts.c
@@ -0,0 +1,24 @@
+/**
+* Copyright (C) Mellanox Technologies Ltd. 2019. ALL RIGHTS RESERVED.
+*
+* See file LICENSE for terms.
+*/
+
+#if defined(__s390x__)
+
+#ifdef HAVE_CONFIG_H
+# include "config.h"
+#endif
+
+#include <ucs/arch/global_opts.h>
+#include <ucs/config/parser.h>
+
+ucs_config_field_t ucs_arch_global_opts_table[] = {
+ {NULL}
+};
+
+void ucs_arch_print_memcpy_limits(ucs_arch_global_opts_t *config)
+{
+}
+
+#endif
diff --git src/ucs/arch/s390x/global_opts.h src/ucs/arch/s390x/global_opts.h
new file mode 100644
index 000000000000..225e4e5e896a
--- /dev/null
+++ src/ucs/arch/s390x/global_opts.h
@@ -0,0 +1,25 @@
+/**
+* Copyright (C) Mellanox Technologies Ltd. 2019. ALL RIGHTS RESERVED.
+*
+* See file LICENSE for terms.
+*/
+
+
+#ifndef UCS_S390X_GLOBAL_OPTS_H_
+#define UCS_S390X_GLOBAL_OPTS_H_
+
+#include <ucs/sys/compiler_def.h>
+
+BEGIN_C_DECLS
+
+#define UCS_ARCH_GLOBAL_OPTS_INITALIZER {}
+
+/* built-in memcpy config */
+typedef struct ucs_arch_global_opts {
+ char dummy;
+} ucs_arch_global_opts_t;
+
+END_C_DECLS
+
+#endif
+
diff --git src/ucs/sys/sys.c src/ucs/sys/sys.c
index 42ff75f64af5..b22418e3f4b0 100644
--- src/ucs/sys/sys.c
+++ src/ucs/sys/sys.c
Accepting request 1115979 from home:NMorey:branches:science:HPC
- Update to 1.15.0
  - UCP
    - Added 2-stage pipeline protocol in the new protocol infrastructure
    - Added reset and abort functionality of rendezvous protocols in the new infrastructure
    - Added zero-copy rendezvous data send protocol in the new infrastructure
    - Added support for user memory handle in the new protocol infrastructure
    - Added option to force ODP registration for certain memory types
    - Enabled lock-free memory region deregistration
    - Updated allow/deny transport list feature to control auxiliary transport selection
    - Multiple performance improvements of the new protocol infrastructure
    - Multiple improvements in error and debug messages
    - Fixed assertion when sending from non-contiguous GPU buffer to managed buffer
    - Fixed the race condition on endpoint configurations
    - Fixed endpoint reconfiguration issues due to asymmetrical selection
    - Fixed endpoint reconfiguration error due to wrong locality detection
    - Fixed crash during connection manager cleanup
    - Fixed rkey index calculation for rendezvous protocol
    - Fixed rcache dump function
    - Removed logging from rkey unpack in release mode
    - Fixed double free of rkey in rendezvous protocol
    - Fixed rendezvous pipeline protocol error flow
    - Fixed error handling in rendezvous get zcopy protocol
    - Replay pending requests of wireup EP CM during connection establishment to prevent potential ordering issues and wrong configuration
    - Pass user-provided memory type to the function that checks whether the buffer can be sent inline or not
    - Avoid memory registration during UCP context initialization
    - Fixed CPU/device atomics selection in the new protocol infrastructure
    - Multiple fixes in the new protocol infrastructure information output
OBS-URL: https://build.opensuse.org/request/show/1115979
OBS-URL: https://build.opensuse.org/package/show/science:HPC/openucx?expand=0&rev=63
2023-10-06 09:59:22 +00:00
@@ -1258,8 +1258,19 @@ void *ucs_sys_realloc(void *old_ptr, size_t old_length, size_t new_length)
if (old_ptr == NULL) {
/* Note: Must pass the 0 offset as "long", otherwise it will be
* partially undefined when converted to syscall arguments */
+#if defined(__s390x__)
+ long int _args[6] = {
+ (long int) NULL,
+ (long int) new_length,
+ (long int) (PROT_READ|PROT_WRITE),
+ (long int) (MAP_PRIVATE|MAP_ANONYMOUS),
+ (long int) -1,
+ (long int) 0ul};
+ ptr = (void*)syscall(__NR_mmap, _args);
+#else
ptr = (void*)syscall(__NR_mmap, NULL, new_length, PROT_READ|PROT_WRITE,
MAP_PRIVATE|MAP_ANONYMOUS, -1, 0ul);
+#endif
if (ptr == MAP_FAILED) {
ucs_log_fatal_error("mmap(NULL, %zu, READ|WRITE, PRIVATE|ANON) failed: %m",
new_length);
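The s390x branch above exists because on s390x Linux the `__NR_mmap` system call is the "old" mmap, which takes a single pointer to a block of six longs (addr, length, prot, flags, fd, offset) instead of six register arguments. The sketch below isolates that difference in one hypothetical helper, `raw_anon_mmap()`, which is not a UCX function. The non-s390x branch assumes a 64-bit target such as x86_64 or aarch64, where `__NR_mmap` takes its six arguments directly; it would not be correct on 32-bit x86, which also uses the pointer-based old mmap.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for the syscall() declaration */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Hypothetical helper (not part of UCX): create a private anonymous
 * read/write mapping through the raw mmap system call.
 * Returns MAP_FAILED on error, like mmap(). */
static void *raw_anon_mmap(size_t length)
{
#if defined(__s390x__)
    /* s390x: __NR_mmap is the old mmap, which takes one pointer to
     * six longs: addr, length, prot, flags, fd, offset. */
    long args[6] = {
        (long) NULL,
        (long) length,
        (long) (PROT_READ | PROT_WRITE),
        (long) (MAP_PRIVATE | MAP_ANONYMOUS),
        -1L,
        0L
    };
    return (void *) syscall(__NR_mmap, args);
#else
    /* Assumed 64-bit targets (x86_64, aarch64): six direct arguments.
     * Casting to long keeps every variadic argument register-sized,
     * including the 0 offset. */
    return (void *) syscall(__NR_mmap, (long) NULL, (long) length,
                            (long) (PROT_READ | PROT_WRITE),
                            (long) (MAP_PRIVATE | MAP_ANONYMOUS),
                            -1L, 0L);
#endif
}
```

On failure, `syscall()` returns -1, which compares equal to `MAP_FAILED` after the cast, so callers can check the result exactly as `ucs_sys_realloc()` does.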