From 878438d42d6923ab391e22faae6581286754e400598555cc7f1f1ab461cd9606 Mon Sep 17 00:00:00 2001
From: Nicolas Morey-Chaisemartin
Date: Thu, 29 Sep 2022 15:27:45 +0000
Subject: [PATCH] Accepting request 1006486 from home:NMoreyChaisemartin:branches:science:HPC

- Update to v1.13.1 (jsc#PED-912)
  - Core
    - Added new objects to VFS: local and remote address of endpoint,
      statistics of ucp_ep_create success/failure, failed/destroyed endpoints
    - Added support for UCX static libraries
    - Added profiling for rkey management routines
    - PCIe relaxed order enabled by default for AMD CPUs
    - Fixed not deallocating memory from ucp_mem_unmap if no rcache
    - Fixed versioning infrastructure
    - Multiple code improvements: refactoring, debug prints and assertions, etc.
    - Multiple improvements in build, test and docs infrastructure
    - Added new objects to VFS (md, component, log_level, etc.)
    - Added configuration variable to specify which loadable modules are allowed
    - Added build-time configuration to disable sigaction overriding
  - UCP
    - Added API to pass pre-registered memory handle to UCP operations
    - Added implementation of AM rendezvous protocol
    - Added 2-stage pipeline rendezvous protocol for GPU
    - Added support for fragment mem_type for v1 pipeline proto, disabled by default
    - Added active message support for proto v2
    - Added UCP memory registration cache
    - Improved adaptive progress - deactivate iface when all p2p lanes are destroyed
    - Added support for user memh in proto_v1
    - Added support for selecting local address when creating a client endpoint
    - Added option to limit GPUDirectRDMA size in rendezvous protocol, UCX_RNDV_MEMTYPE_DIRECT_SIZE
    - Deprecated UCX_SOCKADDR_AUX_TLS configuration parameter
    - Resolving remote EP ID when creating local EP disabled by default
    - Added client_id to ucp_worker_create() and ucp_conn_request_query() APIs
    - Added ucp_worker_address_query() API
    - Updated ucp_ep_query() API for getting local and remote addresses
    - Added address versioning to correctly preserve wire compatibility starting from version 1.11.0
    - Added new client/server connection establishment packet header format
    - Enabled rendezvous and tag sync protocols when error handling is enabled on the endpoint
    - Added iov zcopy support to RMA operations
    - Reduced memory usage of unexpected messages by fitting receive buffer size to packet size
    - Added support for modifying UCT and UCS configs by ucp_config_modify() API
    - Optimized unpacked rkeys memory consumption
    - Added request flag to influence latency vs. bandwidth protocol
    - Reduced memory management overhead with new protocols
    - Improved performance calculations for new protocols
    - Added AMO support with GPU memory target using new protocols
    - Added put_zcopy, get_zcopy and pipeline based rendezvous in new protocols
    - Added support for user-defined alignment in Active Messages
    - Added support for offload tag sync in new protocols
    - Updated ucp_atomic_post() to use NBX flow
  - UCT
    - Introduced API uct_md_mkey_pack_v2
    - Introduced UCT iface features API
    - Introduced max_inflight_eps parameter in perf_attr API
    - Introduced UCT_SEND_FLAG_PEER_CHECK flag that forces checking connectivity to a peer
    - Introduced UCX_RCACHE_PURGE_ON_FORK to enable/disable cleaning regions when application is forking
    - Disabled PEER_FAILURE capability for XPMEM
    - Added API - uct_iface_is_reachable_v2()
    - Added IPv6 address support in TCP
    - Added latency estimation to uct_iface_estimate_perf()
    - Adjusted knem and cma overhead cost
    - Increased built-in TCP keep-alive interval to 2 seconds
  - RDMA CORE (IB, ROCE, etc.)
    - Introduced NDR autorecognition
    - Introduced CQE zipping support
    - Set the default MAX_RD_ATOMIC to maximum value supported by the hardware
    - Disabled mlx5 ifaces on verbs MD
    - Added detection of IB NDR devices
    - Added check for CQ overrun in assert mode
    - Added bitmap usage for releasing detached DCIs
    - Added configuration for requests ack frequency with DevX
    - Added remote QP info to tx error CQE traces
  - ROCM
    - Increased maximum number of HSA agents
  - UCS
    - Added topo module infrastructure
    - Added memtrack and rcache information to VFS
    - Added API for a per-process aggregate-sum statistics report
    - Added memory pool set data structure
    - Added new ptr_array API for bulk allocation
    - Added ucs_string_buffer_append_flags() for string buffer
    - Added ucs_ffs32()
    - Added ucs_vsnprintf_safe() which always adds '\0'
    - Added thread-safe put to ptr_map
    - Improved accuracy of the topology distance estimation
    - Added prints of leaked callbacks from the callback queue
    - Removed a diagnostic message when fuse thread is stopped
    - Added configurable limit for the memory consumed by rcache
    - Added configuration for VFS(FUSE) thread affinity
    - Added memory limit support to memtrack
  - Packaging
    - Added cmake config files for better integration with external cmake based projects
  - Tools
    - Added loop-back transport support in ucx_perftest
    - Split ucx_perftest into separate modules
    - Added process placement option for ucx_info
    - Extended parameters correctness check in ucx_perftest
- Backported UCS-DEBUG-replace-PTR-with-void.patch
  from upstream to fix compilation

OBS-URL: https://build.opensuse.org/request/show/1006486
OBS-URL: https://build.opensuse.org/package/show/science:HPC/openucx?expand=0&rev=48
---
 UCS-DEBUG-replace-PTR-with-void.patch | 25 +++++++
 openucx-s390x-support.patch           | 40 +++++------
 openucx.changes                       | 99 +++++++++++++++++++++++++++
 openucx.spec                          | 13 +++-
 ucx-1.11.1.tar.gz                     |  3 -
 ucx-1.13.1.tar.gz                     |  3 +
 6 files changed, 159 insertions(+), 24 deletions(-)
 create mode 100644 UCS-DEBUG-replace-PTR-with-void.patch
 delete mode 100644 ucx-1.11.1.tar.gz
 create mode 100644 ucx-1.13.1.tar.gz

diff --git a/UCS-DEBUG-replace-PTR-with-void.patch b/UCS-DEBUG-replace-PTR-with-void.patch
new file mode 100644
index 0000000..16ce7d0
--- /dev/null
+++ b/UCS-DEBUG-replace-PTR-with-void.patch
@@ -0,0 +1,25 @@
+commit 2ba0935afc92b9288c01da246b3be53822277add
+Author: Hui Zhou
+Date:   Sun Aug 14 23:29:09 2022 -0500
+
+    UCS/DEBUG: replace PTR with void *
+
+    The PTR macro is missing on the latest Arch linux.
+
+diff --git src/ucs/debug/debug.c src/ucs/debug/debug.c
+index b803636c2221..4bbf5095c419 100644
+--- src/ucs/debug/debug.c
++++ src/ucs/debug/debug.c
+@@ -272,10 +272,10 @@ static int load_file(struct backtrace_file *file)
+         goto err_close;
+     }
+ 
+-    symcount = bfd_read_minisymbols(file->abfd, 0, (PTR)&file->syms, &size);
++    symcount = bfd_read_minisymbols(file->abfd, 0, (void *)&file->syms, &size);
+     if (symcount == 0) {
+         free(file->syms);
+-        symcount = bfd_read_minisymbols(file->abfd, 1, (PTR)&file->syms, &size);
++        symcount = bfd_read_minisymbols(file->abfd, 1, (void *)&file->syms, &size);
+     }
+     if (symcount < 0) {
+         goto err_close;
diff --git a/openucx-s390x-support.patch b/openucx-s390x-support.patch
index fa7f7ed..539f7e7 100644
--- a/openucx-s390x-support.patch
+++ b/openucx-s390x-support.patch
@@ -1,16 +1,16 @@
-commit 71d28736870f46080b8187bf2ba64920c87dc7e4
+commit 9d5c0d189d4cd5413089bd65fed1e87293e15763
 Author: Nicolas Morey-Chaisemartin
-Date:   Thu Aug 9 07:41:24 2018 +0200
+Date:   Tue Sep 27 17:47:15 2022 +0200
 
     openucx s390x support
     
     Signed-off-by: Nicolas Morey-Chaisemartin
 
 diff --git config/m4/ucm.m4 config/m4/ucm.m4
-index 1e229edc51f2..3f74fca02976 100644
+index 8d7a9e40ec06..df7508e1e71a 100644
 --- config/m4/ucm.m4
 +++ config/m4/ucm.m4
-@@ -73,9 +73,20 @@ AC_CHECK_DECLS([SYS_ipc],
+@@ -80,9 +80,20 @@ AC_CHECK_DECLS([SYS_ipc],
                 [ipc_hooks_happy=no],
                 [#include ])
  
@@ -33,10 +33,10 @@ index 1e229edc51f2..3f74fca02976 100644
  AS_IF([test "x$bistro_hooks_happy" = "xyes"],
        [AC_DEFINE([UCM_BISTRO_HOOKS], [1], [Enable BISTRO hooks])],
 diff --git src/tools/info/sys_info.c src/tools/info/sys_info.c
-index 7c355a264c2b..52efadec096c 100644
+index 5316b1c4336e..e910bc53572d 100644
 --- src/tools/info/sys_info.c
 +++ src/tools/info/sys_info.c
-@@ -44,7 +44,8 @@ static const char* cpu_vendor_names[] = {
+@@ -46,7 +46,8 @@ static const char* cpu_vendor_names[] = {
      [UCS_CPU_VENDOR_GENERIC_ARM] = "Generic ARM",
      [UCS_CPU_VENDOR_GENERIC_PPC] = "Generic PPC",
      [UCS_CPU_VENDOR_FUJITSU_ARM] = "Fujitsu ARM",
@@ -47,7 +47,7 @@ index 7c355a264c2b..52efadec096c 100644
  
  static double measure_memcpy_bandwidth(size_t size)
 diff --git src/ucm/Makefile.am src/ucm/Makefile.am
-index 55784d0c31f4..a6003eda0333 100644
+index 5140b5acf5bf..8805124befee 100644
 --- src/ucm/Makefile.am
 +++ src/ucm/Makefile.am
 @@ -31,7 +31,8 @@ noinst_HEADERS = \
@@ -98,10 +98,10 @@ index 000000000000..c0f427f4984a
 +
 +#endif
 diff --git src/ucs/Makefile.am src/ucs/Makefile.am
-index 8cc77e87da3f..2fbb53188a58 100644
+index 77680021d725..29f31aabd958 100644
 --- src/ucs/Makefile.am
 +++ src/ucs/Makefile.am
-@@ -21,6 +21,7 @@ libucs_la_LIBADD = $(LIBM) $(top_builddir)/src/ucm/libucm.la
+@@ -22,6 +22,7 @@ libucs_la_LIBADD = $(LIBM) $(top_builddir)/src/ucm/libucm.la $(BFD_LIBS)
  nobase_dist_libucs_la_HEADERS = \
  	arch/aarch64/bitops.h \
  	arch/ppc64/bitops.h \
@@ -109,7 +109,7 @@ index 8cc77e87da3f..2fbb53188a58 100644
  	arch/x86_64/bitops.h \
  	arch/bitops.h \
  	algorithm/crc.h \
-@@ -77,12 +78,14 @@ nobase_dist_libucs_la_HEADERS = \
+@@ -81,12 +82,14 @@ nobase_dist_libucs_la_HEADERS = \
  	arch/aarch64/global_opts.h \
  	arch/generic/atomic.h \
  	arch/ppc64/global_opts.h \
@@ -123,8 +123,8 @@ index 8cc77e87da3f..2fbb53188a58 100644
 +	arch/s390x/cpu.h \
  	arch/x86_64/cpu.h \
  	arch/cpu.h \
- 	datastruct/arbiter.h \
+ 	config/ucm_opts.h \
-@@ -127,6 +130,7 @@ libucs_la_SOURCES = \
+@@ -134,6 +137,7 @@ libucs_la_SOURCES = \
  	algorithm/qsort_r.c \
  	arch/aarch64/cpu.c \
  	arch/aarch64/global_opts.c \
@@ -146,7 +146,7 @@ index 6a8551f592e1..e3a9f4641383 100644
  # error "Unsupported architecture"
  #endif
 diff --git src/ucs/arch/bitops.h src/ucs/arch/bitops.h
-index a890cd255295..badc12419b5b 100644
+index 77e00571e04f..bbdea0ceb210 100644
 --- src/ucs/arch/bitops.h
 +++ src/ucs/arch/bitops.h
 @@ -20,6 +20,8 @@ BEGIN_C_DECLS
@@ -159,7 +159,7 @@ index a890cd255295..badc12419b5b 100644
  # error "Unsupported architecture"
  #endif
 diff --git src/ucs/arch/cpu.c src/ucs/arch/cpu.c
-index 210a49c8e717..4018392ebed3 100644
+index 9e6fab0904eb..c912e991586c 100644
 --- src/ucs/arch/cpu.c
 +++ src/ucs/arch/cpu.c
 @@ -61,6 +61,10 @@ const ucs_cpu_builtin_memcpy_t ucs_cpu_builtin_memcpy[UCS_CPU_VENDOR_LAST] = {
@@ -178,14 +178,14 @@ index 210a49c8e717..4018392ebed3 100644
      [UCS_CPU_VENDOR_GENERIC_ARM] = 5800 * UCS_MBYTE,
      [UCS_CPU_VENDOR_GENERIC_PPC] = 5800 * UCS_MBYTE,
 +    [UCS_CPU_VENDOR_GENERIC_IBM] = 5800 * UCS_MBYTE,
-     [UCS_CPU_VENDOR_FUJITSU_ARM] = 5800 * UCS_MBYTE
+     [UCS_CPU_VENDOR_FUJITSU_ARM] = 12000 * UCS_MBYTE
  };
  
 diff --git src/ucs/arch/cpu.h src/ucs/arch/cpu.h
-index e06f6b95ebb1..15f3198976a9 100644
+index 719913fb0b8c..04d69ca01533 100644
 --- src/ucs/arch/cpu.h
 +++ src/ucs/arch/cpu.h
-@@ -62,6 +62,7 @@ typedef enum ucs_cpu_vendor {
+@@ -63,6 +63,7 @@ typedef enum ucs_cpu_vendor {
      UCS_CPU_VENDOR_AMD,
      UCS_CPU_VENDOR_GENERIC_ARM,
      UCS_CPU_VENDOR_GENERIC_PPC,
@@ -193,7 +193,7 @@ index e06f6b95ebb1..15f3198976a9 100644
      UCS_CPU_VENDOR_FUJITSU_ARM,
      UCS_CPU_VENDOR_ZHAOXIN,
      UCS_CPU_VENDOR_LAST
-@@ -97,6 +98,8 @@ typedef struct ucs_cpu_builtin_memcpy {
+@@ -98,6 +99,8 @@ typedef struct ucs_cpu_builtin_memcpy {
  # include "ppc64/cpu.h"
  #elif defined(__aarch64__)
  # include "aarch64/cpu.h"
@@ -405,10 +405,10 @@ index 000000000000..225e4e5e896a
 +#endif
 +
 diff --git src/ucs/sys/sys.c src/ucs/sys/sys.c
-index 59836aaa51c2..3975db7f6be3 100644
+index 88f4a147315e..0b6d186265a8 100644
 --- src/ucs/sys/sys.c
 +++ src/ucs/sys/sys.c
-@@ -1223,8 +1223,19 @@ void *ucs_sys_realloc(void *old_ptr, size_t old_length, size_t new_length)
+@@ -1224,8 +1224,19 @@ void *ucs_sys_realloc(void *old_ptr, size_t old_length, size_t new_length)
      if (old_ptr == NULL) {
          /* Note: Must pass the 0 offset as "long", otherwise it will be
           * partially undefined when converted to syscall arguments */
diff --git a/openucx.changes b/openucx.changes
index a6d45c8..e1e6145 100644
--- a/openucx.changes
+++ b/openucx.changes
@@ -1,3 +1,102 @@
+-------------------------------------------------------------------
+Tue Sep 27 15:55:19 UTC 2022 - Nicolas Morey-Chaisemartin
+
+- Update to v1.13.1 (jsc#PED-912)
+  - Core
+    - Added new objects to VFS: local and remote address of endpoint,
+      statistics of ucp_ep_create success/failure, failed/destroyed endpoints
+    - Added support for UCX static libraries
+    - Added profiling for rkey management routines
+    - PCIe relaxed order enabled by default for AMD CPUs
+    - Fixed not deallocating memory from ucp_mem_unmap if no rcache
+    - Fixed versioning infrastructure
+    - Multiple code improvements: refactoring, debug prints and assertions, etc.
+    - Multiple improvements in build, test and docs infrastructure
+    - Added new objects to VFS (md, component, log_level, etc.)
+    - Added configuration variable to specify which loadable modules are allowed
+    - Added build-time configuration to disable sigaction overriding
+  - UCP
+    - Added API to pass pre-registered memory handle to UCP operations
+    - Added implementation of AM rendezvous protocol
+    - Added 2-stage pipeline rendezvous protocol for GPU
+    - Added support for fragment mem_type for v1 pipeline proto, disabled by default
+    - Added active message support for proto v2
+    - Added UCP memory registration cache
+    - Improved adaptive progress - deactivate iface when all p2p lanes are destroyed
+    - Added support for user memh in proto_v1
+    - Added support for selecting local address when creating a client endpoint
+    - Added option to limit GPUDirectRDMA size in rendezvous protocol, UCX_RNDV_MEMTYPE_DIRECT_SIZE
+    - Deprecated UCX_SOCKADDR_AUX_TLS configuration parameter
+    - Resolving remote EP ID when creating local EP disabled by default
+    - Added client_id to ucp_worker_create() and ucp_conn_request_query() APIs
+    - Added ucp_worker_address_query() API
+    - Updated ucp_ep_query() API for getting local and remote addresses
+    - Added address versioning to correctly preserve wire compatibility starting from version 1.11.0
+    - Added new client/server connection establishment packet header format
+    - Enabled rendezvous and tag sync protocols when error handling is enabled on the endpoint
+    - Added iov zcopy support to RMA operations
+    - Reduced memory usage of unexpected messages by fitting receive buffer size to packet size
+    - Added support for modifying UCT and UCS configs by ucp_config_modify() API
+    - Optimized unpacked rkeys memory consumption
+    - Added request flag to influence latency vs. bandwidth protocol
+    - Reduced memory management overhead with new protocols
+    - Improved performance calculations for new protocols
+    - Added AMO support with GPU memory target using new protocols
+    - Added put_zcopy, get_zcopy and pipeline based rendezvous in new protocols
+    - Added support for user-defined alignment in Active Messages
+    - Added support for offload tag sync in new protocols
+    - Updated ucp_atomic_post() to use NBX flow
+  - UCT
+    - Introduced API uct_md_mkey_pack_v2
+    - Introduced UCT iface features API
+    - Introduced max_inflight_eps parameter in perf_attr API
+    - Introduced UCT_SEND_FLAG_PEER_CHECK flag that forces checking connectivity to a peer
+    - Introduced UCX_RCACHE_PURGE_ON_FORK to enable/disable cleaning regions when application is forking
+    - Disabled PEER_FAILURE capability for XPMEM
+    - Added API - uct_iface_is_reachable_v2()
+    - Added IPv6 address support in TCP
+    - Added latency estimation to uct_iface_estimate_perf()
+    - Adjusted knem and cma overhead cost
+    - Increased built-in TCP keep-alive interval to 2 seconds
+  - RDMA CORE (IB, ROCE, etc.)
+    - Introduced NDR autorecognition
+    - Introduced CQE zipping support
+    - Set the default MAX_RD_ATOMIC to maximum value supported by the hardware
+    - Disabled mlx5 ifaces on verbs MD
+    - Added detection of IB NDR devices
+    - Added check for CQ overrun in assert mode
+    - Added bitmap usage for releasing detached DCIs
+    - Added configuration for requests ack frequency with DevX
+    - Added remote QP info to tx error CQE traces
+  - ROCM
+    - Increased maximum number of HSA agents
+  - UCS
+    - Added topo module infrastructure
+    - Added memtrack and rcache information to VFS
+    - Added API for a per-process aggregate-sum statistics report
+    - Added memory pool set data structure
+    - Added new ptr_array API for bulk allocation
+    - Added ucs_string_buffer_append_flags() for string buffer
+    - Added ucs_ffs32()
+    - Added ucs_vsnprintf_safe() which always adds '\0'
+    - Added thread-safe put to ptr_map
+    - Improved accuracy of the topology distance estimation
+    - Added prints of leaked callbacks from the callback queue
+    - Removed a diagnostic message when fuse thread is stopped
+    - Added configurable limit for the memory consumed by rcache
+    - Added configuration for VFS(FUSE) thread affinity
+    - Added memory limit support to memtrack
+  - Packaging
+    - Added cmake config files for better integration with external cmake based projects
+  - Tools
+    - Added loop-back transport support in ucx_perftest
+    - Split ucx_perftest into separate modules
+    - Added process placement option for ucx_info
+    - Extended parameters correctness check in ucx_perftest
+- Backported UCS-DEBUG-replace-PTR-with-void.patch
+  from upstream to fix compilation
+
+
 -------------------------------------------------------------------
 Thu Jan 13 08:42:05 UTC 2022 - Nicolas Morey-Chaisemartin
 
diff --git a/openucx.spec b/openucx.spec
index 661f206..af804a4 100644
--- a/openucx.spec
+++ b/openucx.spec
@@ -17,7 +17,7 @@
 
 
 Name: openucx
-Version: 1.11.1
+Version: 1.13.1
 Release: 0
 Summary: Communication layer for Message Passing (MPI)
 License: BSD-3-Clause
@@ -30,6 +30,7 @@ Source: https://github.com/openucx/ucx/releases/download/v%version/ucx-%
 Source1: baselibs.conf
 Patch1: openucx-s390x-support.patch
 Patch2: ucm-fix-UCX_MEM_MALLOC_RELOC.patch
+Patch3: UCS-DEBUG-replace-PTR-with-void.patch
 BuildRequires: autoconf >= 2.63
 BuildRequires: automake >= 1.10
 BuildRequires: binutils-devel
@@ -138,6 +139,7 @@ hardware.
 %patch1
 %endif
 %patch2
+%patch3
 
 %build
 autoreconf -fi
@@ -188,6 +190,8 @@ mv %buildroot/%_bindir/io_demo %buildroot/%_libexecdir/%{name}/
 %_datadir/%{name}/
 %_libexecdir/%{name}
 %_libdir/pkgconfig/ucx.pc
+%dir %_libdir/cmake/
+%_libdir/cmake/ucx/
 %doc LICENSE NEWS
 
 %files -n libucm0
@@ -211,11 +215,14 @@ mv %buildroot/%_bindir/io_demo %buildroot/%_libexecdir/%{name}/
 %files -n libucs0
 %defattr(-,root,root)
 %_libdir/libucs.so.*
+%_libdir/libucs_signal.so.*
 
 %files -n libucs-devel
 %defattr(-,root,root)
 %_includedir/ucs/
 %_libdir/libucs.so
+%_libdir/libucs_signal.so
+%_libdir/pkgconfig/ucx-ucs.pc
 
 %files -n libuct0
 %defattr(-,root,root)
@@ -229,5 +236,9 @@ mv %buildroot/%_bindir/io_demo %buildroot/%_libexecdir/%{name}/
 %_libdir/libuct.so
 %dir %_libdir/ucx/
 %_libdir/ucx/libuct_*.so
+%_libdir/pkgconfig/ucx-uct.pc
+%_libdir/pkgconfig/ucx-cma.pc
+%_libdir/pkgconfig/ucx-ib.pc
+%_libdir/pkgconfig/ucx-rdmacm.pc
 
 %changelog
diff --git a/ucx-1.11.1.tar.gz b/ucx-1.11.1.tar.gz
deleted file mode 100644
index a8cfdb2..0000000
--- a/ucx-1.11.1.tar.gz
+++ /dev/null
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:29338cad18858517f96b46ff83bdd259a5899e274792cebd269717c660aa86fd
-size 2746949
diff --git a/ucx-1.13.1.tar.gz b/ucx-1.13.1.tar.gz
new file mode 100644
index 0000000..906d7dd
--- /dev/null
+++ b/ucx-1.13.1.tar.gz
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:efc37829b68e131d2acc82a3fd4334bfd611156a756837ffeb650ab9a9dd3828
+size 2979566
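The backported UCS-DEBUG-replace-PTR-with-void.patch replaces the `(PTR)` casts in `load_file()` with `(void *)`. On ANSI C compilers, libiberty's `ansidecl.h` historically defined `PTR` as `void *`, so the change should not alter behavior. It only removes the dependency on a macro that, according to the upstream commit message, is missing on the latest Arch Linux. The minimal sketch below reproduces the call pattern without linking against BFD. `demo_read_minisymbols()` is a hypothetical stand-in that has the same out-parameter shape as `bfd_read_minisymbols()`, and the symbol table is invented for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical symbol record standing in for BFD's asymbol. */
typedef struct demo_symbol {
    const char *name;
} demo_symbol_t;

static demo_symbol_t demo_table[] = { { "main" }, { "load_file" } };

/*
 * Hypothetical stand-in with the same out-parameter shape as
 * bfd_read_minisymbols(): the caller passes the address of its own
 * pointer, and the callee stores a malloc'ed array into it.
 */
static long demo_read_minisymbols(int dynamic, void **minisymsp,
                                  unsigned int *sizep)
{
    size_t count = dynamic ? 0 : sizeof(demo_table) / sizeof(demo_table[0]);
    demo_symbol_t **syms = malloc((count > 0 ? count : 1) * sizeof(*syms));
    size_t i;

    assert(minisymsp != NULL && sizep != NULL);
    if (syms == NULL) {
        return -1;
    }
    for (i = 0; i < count; ++i) {
        syms[i] = &demo_table[i];
    }
    /* Copy the pointer bytes into the caller's variable; memcpy keeps the
     * store well-defined whatever pointer type the caller declared. */
    memcpy(minisymsp, &syms, sizeof(syms));
    *sizep = (unsigned int)sizeof(*syms);
    return (long)count;
}

/* Mirrors the patched sequence in load_file(): try the regular symbol
 * table first, then fall back to the dynamic one. */
static long demo_load_symbols(demo_symbol_t ***syms_out)
{
    demo_symbol_t **syms = NULL;
    unsigned int size;
    long symcount;

    /* (void *) is what PTR expanded to on ANSI C compilers. */
    symcount = demo_read_minisymbols(0, (void *)&syms, &size);
    if (symcount == 0) {
        free(syms);
        symcount = demo_read_minisymbols(1, (void *)&syms, &size);
    }
    *syms_out = syms;
    return symcount;
}
```

In C, the `void *` produced by the cast converts implicitly to the callee's `void **` parameter. This is why the explicit cast is a drop-in replacement that needs no compatibility macro.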
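In the s390x patch, the new `UCS_CPU_VENDOR_GENERIC_IBM` enumerator is inserted between `UCS_CPU_VENDOR_GENERIC_PPC` and `UCS_CPU_VENDOR_FUJITSU_ARM`. This shifts the numeric value of every later vendor. The per-vendor table in `src/ucs/arch/cpu.c` uses designated initializers, so each value stays attached to its enumerator regardless of position. The sketch below demonstrates that property using hypothetical `DEMO_` names. Only the four values visible in the hunk are taken from the patch; the enum is abbreviated, and treating `UCS_MBYTE` as 2^20 bytes is an assumption.

```c
#include <assert.h>

/* Assumption: stands in for UCS_MBYTE, taken here as 2^20 bytes. */
#define DEMO_MBYTE (1024.0 * 1024.0)

/* Hypothetical, abbreviated copy of the vendor enum; only the order
 * around the new GENERIC_IBM entry follows the hunk in cpu.h. */
typedef enum demo_cpu_vendor {
    DEMO_CPU_VENDOR_UNKNOWN,
    DEMO_CPU_VENDOR_AMD,
    DEMO_CPU_VENDOR_GENERIC_ARM,
    DEMO_CPU_VENDOR_GENERIC_PPC,
    DEMO_CPU_VENDOR_GENERIC_IBM,   /* inserted: shifts every later value */
    DEMO_CPU_VENDOR_FUJITSU_ARM,
    DEMO_CPU_VENDOR_ZHAOXIN,
    DEMO_CPU_VENDOR_LAST
} demo_cpu_vendor_t;

/* Designated initializers bind each value to its enumerator, not to its
 * position, so inserting GENERIC_IBM cannot misalign FUJITSU_ARM.
 * Vendors without an entry in this sketch default to 0. */
static const double demo_vendor_bw[DEMO_CPU_VENDOR_LAST] = {
    [DEMO_CPU_VENDOR_GENERIC_ARM] = 5800 * DEMO_MBYTE,
    [DEMO_CPU_VENDOR_GENERIC_PPC] = 5800 * DEMO_MBYTE,
    [DEMO_CPU_VENDOR_GENERIC_IBM] = 5800 * DEMO_MBYTE,
    [DEMO_CPU_VENDOR_FUJITSU_ARM] = 12000 * DEMO_MBYTE
};

static double demo_get_vendor_bw(demo_cpu_vendor_t vendor)
{
    assert(vendor < DEMO_CPU_VENDOR_LAST);
    return demo_vendor_bw[vendor];
}
```

A positional initializer list would have silently given FUJITSU_ARM's value to GENERIC_IBM once the enum grew. The designated form only needs the new line that the patch adds.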
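The UCT notes say the built-in TCP keep-alive interval was raised to 2 seconds. On Linux, the per-socket setting for the gap between keep-alive probes is the `TCP_KEEPINTVL` option, which takes effect once `SO_KEEPALIVE` is enabled. The following is a generic sketch of that socket API, not UCX's TCP transport code. Whether UCX applies its default through exactly these calls is an assumption, and the `demo_` helpers are hypothetical.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Turn on TCP keep-alive for fd and set the interval, in seconds, between
 * successive keep-alive probes. Returns 0 on success, -1 on failure or if
 * the platform lacks TCP_KEEPINTVL.
 */
static int demo_set_keepalive_interval(int fd, int interval_sec)
{
    int on = 1;

    assert(interval_sec > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof(on)) < 0) {
        return -1;
    }
#ifdef TCP_KEEPINTVL
    return setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_sec,
                      sizeof(interval_sec)) < 0 ? -1 : 0;
#else
    return -1;
#endif
}

/* Read the probe interval back; returns -1 if it cannot be queried. */
static int demo_get_keepalive_interval(int fd)
{
#ifdef TCP_KEEPINTVL
    int value;
    socklen_t len = sizeof(value);

    if (getsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &value, &len) < 0) {
        return -1;
    }
    return value;
#else
    (void)fd;
    return -1;
#endif
}
```

A shorter probe interval lets a dead peer be detected sooner, at the cost of slightly more idle traffic. The idle time before the first probe and the probe count are separate options (`TCP_KEEPIDLE`, `TCP_KEEPCNT`) that this sketch leaves at their system defaults.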
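The UCS notes list a new `ucs_ffs32()` helper, but its exact contract is not shown in this patch. The sketch below assumes a "find first set" that returns the 0-based index of the least significant set bit of a non-zero 32-bit argument. `demo_ffs32()` is a hypothetical name, and the portable loop is only a fallback for compilers without `__builtin_ctz`.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical 32-bit "find first set": returns the 0-based index of the
 * least significant set bit of n, which must be non-zero. Unlike POSIX
 * ffs(), which is 1-based and returns 0 for 0, there is no sentinel here.
 */
static unsigned demo_ffs32(uint32_t n)
{
    assert(n != 0);
#if defined(__GNUC__) || defined(__clang__)
    return (unsigned)__builtin_ctz(n);  /* count trailing zero bits */
#else
    unsigned pos = 0;

    while ((n & 1u) == 0) {
        n >>= 1;
        ++pos;
    }
    return pos;
#endif
}
```

For example, `demo_ffs32(12u)` returns 2, because 12 is binary 1100.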
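The UCS notes also add `ucs_vsnprintf_safe()`, which is described only as always adding `'\0'`. Below is a minimal sketch of such a wrapper. This is not UCX's code: the `demo_` names, the return-value convention, and the handling of `size == 0` and of negative `vsnprintf()` results are all assumptions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper in the spirit of ucs_vsnprintf_safe(): format into
 * buf and guarantee that, whenever size > 0, buf holds a '\0'-terminated
 * string afterwards, even if vsnprintf() reports an error.
 * Returns whatever vsnprintf() returned.
 */
static int demo_vsnprintf_safe(char *buf, size_t size, const char *fmt,
                               va_list ap)
{
    int ret;

    assert(fmt != NULL);
    if (size == 0) {
        /* Nothing can be written, not even the terminator. */
        return vsnprintf(NULL, 0, fmt, ap);
    }

    assert(buf != NULL);
    ret = vsnprintf(buf, size, fmt, ap);
    if (ret < 0) {
        buf[0] = '\0';          /* encoding error: leave an empty string */
    } else {
        buf[size - 1] = '\0';   /* redundant on conforming libcs, but cheap */
    }
    return ret;
}

/* Variadic front end, so callers can pass arguments directly. */
static int demo_snprintf_safe(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int ret;

    va_start(ap, fmt);
    ret = demo_vsnprintf_safe(buf, size, fmt, ap);
    va_end(ap);
    return ret;
}
```

As with `vsnprintf()`, a return value greater than or equal to `size` signals truncation. Formatting `"ucx-1131"` into an 8-byte buffer, for example, stores `"ucx-113"` and returns 8.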