Accepting request 1006486 from home:NMoreyChaisemartin:branches:science:HPC

- Update to v1.13.1 (jsc#PED-912)
  - Core
    - Added new objects to VFS: local and remote address of endpoint,
      statistics of ucp_ep_create success/failure, failed/destroyed endpoints
    - Added support for UCX static libraries
    - Added profiling for rkey management routines
    - PCIe relaxed order enabled by default for AMD CPUs
    - Fixed not deallocating memory from ucp_mem_unmap if no rcache
    - Fixed versioning infrastructure
    - Multiple code improvements: refactoring, debug prints and assertions, etc.
    - Multiple improvements in build, test and docs infrastructure
    - Added new objects to VFS (md, component, log_level, etc.)
    - Added configuration variable to specify which loadable modules are allowed
    - Added build-time configuration to disable sigaction overriding
  - UCP
    - Added API to pass pre-registered memory handle to UCP operations
    - Added implementation of AM rendezvous protocol
    - Added 2-stage pipeline rendezvous protocol for GPU
    - Added support for fragment mem_type for v1 pipeline proto, disabled by default
    - Added active message support for proto v2
    - Added UCP memory registration cache
    - Improved adaptive progress - deactivate iface when all p2p lanes are destroyed
    - Added support for user memh in proto_v1
    - Added support for selecting local address when creating a client endpoint
    - Added option to limit GPUDirectRDMA size in rendezvous protocol, UCX_RNDV_MEMTYPE_DIRECT_SIZE
    - Deprecated UCX_SOCKADDR_AUX_TLS configuration parameter
    - Resolving remote EP ID when creating local EP disabled by default
    - Added client_id to ucp_worker_create() and ucp_conn_request_query() APIs
    - Added ucp_worker_address_query() API
    - Updated ucp_ep_query() API for getting local and remote addresses
    - Added address versioning to correctly preserve wire compatibility starting from version 1.11.0
    - Added new client/server connection establishment packet header format
    - Enabled rendezvous and tag sync protocols when error handling is enabled on the endpoint
    - Added iov zcopy support to RMA operations
    - Reduced memory usage of unexpected messages by fitting receive buffer size to packet size
    - Added support for modifying UCT and UCS configs by ucp_config_modify() API
    - Optimized unpacked rkeys memory consumption
    - Added request flag to influence latency vs. bandwidth protocol
    - Reduced memory management overhead with new protocols
    - Improved performance calculations for new protocols
    - Added AMO support with GPU memory target using new protocols
    - Added put_zcopy, get_zcopy and pipeline based rendezvous in new protocols
    - Added support for user-defined alignment in Active Messages
    - Added support for offload tag sync in new protocols
    - Updated ucp_atomic_post() to use NBX flow
  - UCT
    - Introduced API uct_md_mkey_pack_v2
    - Introduced UCT iface features API
    - Introduced max_inflight_eps parameter in perf_attr API
    - Introduced UCT_SEND_FLAG_PEER_CHECK flag that forces checking connectivity to a peer
    - Introduced UCX_RCACHE_PURGE_ON_FORK to enable/disable cleaning regions when application is forking
    - Disabled PEER_FAILURE capability for XPMEM
    - Added API - uct_iface_is_reachable_v2()
    - Added IPv6 address support in TCP
    - Added latency estimation to uct_iface_estimate_perf()
    - Adjusted knem and cma overhead cost
    - Increased built-in TCP keep-alive interval to 2 seconds
  - RDMA CORE (IB, ROCE, etc.)
    - Introduced NDR autorecognition
    - Introduced CQE zipping support
    - Set the default MAX_RD_ATOMIC to maximum value supported by the hardware
    - Disabled mlx5 ifaces on verbs MD
    - Added detection of IB NDR devices
    - Added check for CQ overrun in assert mode
    - Added bitmap usage for releasing detached DCIs
    - Added configuration for requests ack frequency with DevX
    - Added remote QP info to tx error CQE traces
  - ROCM
    - Increased maximum number of HSA agents
  - UCS
    - Added topo module infrastructure
    - Added memtrack and rcache information to VFS
    - Added API for a per-process aggregate-sum statistics report
    - Added memory pool set data structure
    - Added new ptr_array API for bulk allocation
    - Added ucs_string_buffer_append_flags() for string buffer
    - Added ucs_ffs32()
    - Added ucs_vsnprintf_safe() which always adds '\0'
    - Added thread-safe put to ptr_map
    - Improved accuracy of the topology distance estimation
    - Added prints of leaked callbacks from the callback queue
    - Removed a diagnostic message when fuse thread is stopped
    - Added configurable limit for the memory consumed by rcache
    - Added configuration for VFS(FUSE) thread affinity
    - Added memory limit support to memtrack
  - Packaging
    - Added cmake config files for better integration with external cmake based projects
  - Tools
    - Added loop-back transport support in ucx_perftest
    - Split ucx_perftest into separate modules
    - Added process placement option for ucx_info
    - Extended parameters correctness check in ucx_perftest
- Backported UCS-DEBUG-replace-PTR-with-void.patch
  from upstream to fix compilation

OBS-URL: https://build.opensuse.org/request/show/1006486
OBS-URL: https://build.opensuse.org/package/show/science:HPC/openucx?expand=0&rev=48
Nicolas Morey-Chaisemartin 2022-09-29 15:27:45 +00:00 committed by Git OBS Bridge
parent 6e22959692
commit 878438d42d
6 changed files with 159 additions and 24 deletions

@@ -0,0 +1,25 @@
+commit 2ba0935afc92b9288c01da246b3be53822277add
+Author: Hui Zhou <hzhou321@anl.gov>
+Date: Sun Aug 14 23:29:09 2022 -0500
+UCS/DEBUG: replace PTR with void *
+The PTR macro is missing on the latest Arch linux.
+diff --git src/ucs/debug/debug.c src/ucs/debug/debug.c
+index b803636c2221..4bbf5095c419 100644
+--- src/ucs/debug/debug.c
++++ src/ucs/debug/debug.c
+@@ -272,10 +272,10 @@ static int load_file(struct backtrace_file *file)
+goto err_close;
+}
+- symcount = bfd_read_minisymbols(file->abfd, 0, (PTR)&file->syms, &size);
++ symcount = bfd_read_minisymbols(file->abfd, 0, (void *)&file->syms, &size);
+if (symcount == 0) {
+free(file->syms);
+- symcount = bfd_read_minisymbols(file->abfd, 1, (PTR)&file->syms, &size);
++ symcount = bfd_read_minisymbols(file->abfd, 1, (void *)&file->syms, &size);
+}
+if (symcount < 0) {
+goto err_close;
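
For context, a minimal sketch of the call pattern this backported patch touches. It assumes that `PTR` was a convenience macro for `void *` that older binutils headers provided and newer ones no longer do, which would explain the compile failure. The names `demo_symbol_t`, `demo_read_minisymbols` and `demo_load_symbols` are hypothetical stand-ins, not the libbfd or UCX API; the sketch only shows that a plain `(void *)` cast is enough for a generic out-parameter and that the static-then-dynamic fallback is unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a symbol table entry; not the libbfd type. */
typedef struct demo_symbol {
    const char *name;
} demo_symbol_t;

/*
 * Hypothetical reader that returns an allocated table through a generic
 * out-parameter. The "static" table (dynamic == 0) is pretended to be
 * empty so that the fallback path below is exercised.
 */
static long demo_read_minisymbols(int dynamic, void *table_out,
                                  unsigned int *size_p)
{
    static const char *names[] = {"main", "load_file"};
    demo_symbol_t *table;
    size_t i;

    table = calloc(2, sizeof(*table));
    if (table == NULL) {
        return -1;
    }

    for (i = 0; i < 2; ++i) {
        table[i].name = names[i];
    }

    /* Copy the pointer value into the caller's slot without type punning. */
    memcpy(table_out, &table, sizeof(table));
    *size_p = sizeof(*table);
    return dynamic ? 2 : 0;
}

/* Mirrors the shape of the patched call site: try static, then dynamic. */
static long demo_load_symbols(demo_symbol_t **syms_p)
{
    unsigned int size;
    long count;

    assert(syms_p != NULL);

    /* A plain (void *) cast replaces the former (PTR) cast. */
    count = demo_read_minisymbols(0, (void *)syms_p, &size);
    if (count == 0) {
        free(*syms_p);
        count = demo_read_minisymbols(1, (void *)syms_p, &size);
    }

    return count;
}
```

A caller would pass the address of a `demo_symbol_t *` variable to `demo_load_symbols()` and free the returned table when done. Since the cast target is spelled out as `void *`, the code no longer depends on any header defining `PTR`.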

@@ -1,16 +1,16 @@
-commit 71d28736870f46080b8187bf2ba64920c87dc7e4
+commit 9d5c0d189d4cd5413089bd65fed1e87293e15763
 Author: Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com>
-Date: Thu Aug 9 07:41:24 2018 +0200
+Date: Tue Sep 27 17:47:15 2022 +0200
 openucx s390x support
 Signed-off-by: Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com>
 diff --git config/m4/ucm.m4 config/m4/ucm.m4
-index 1e229edc51f2..3f74fca02976 100644
+index 8d7a9e40ec06..df7508e1e71a 100644
 --- config/m4/ucm.m4
 +++ config/m4/ucm.m4
-@@ -73,9 +73,20 @@ AC_CHECK_DECLS([SYS_ipc],
+@@ -80,9 +80,20 @@ AC_CHECK_DECLS([SYS_ipc],
 [ipc_hooks_happy=no],
 [#include <sys/syscall.h>])
@@ -33,10 +33,10 @@ index 1e229edc51f2..3f74fca02976 100644
 AS_IF([test "x$bistro_hooks_happy" = "xyes"],
 [AC_DEFINE([UCM_BISTRO_HOOKS], [1], [Enable BISTRO hooks])],
 diff --git src/tools/info/sys_info.c src/tools/info/sys_info.c
-index 7c355a264c2b..52efadec096c 100644
+index 5316b1c4336e..e910bc53572d 100644
 --- src/tools/info/sys_info.c
 +++ src/tools/info/sys_info.c
-@@ -44,7 +44,8 @@ static const char* cpu_vendor_names[] = {
+@@ -46,7 +46,8 @@ static const char* cpu_vendor_names[] = {
 [UCS_CPU_VENDOR_GENERIC_ARM] = "Generic ARM",
 [UCS_CPU_VENDOR_GENERIC_PPC] = "Generic PPC",
 [UCS_CPU_VENDOR_FUJITSU_ARM] = "Fujitsu ARM",
@@ -47,7 +47,7 @@ index 7c355a264c2b..52efadec096c 100644
 static double measure_memcpy_bandwidth(size_t size)
 diff --git src/ucm/Makefile.am src/ucm/Makefile.am
-index 55784d0c31f4..a6003eda0333 100644
+index 5140b5acf5bf..8805124befee 100644
 --- src/ucm/Makefile.am
 +++ src/ucm/Makefile.am
 @@ -31,7 +31,8 @@ noinst_HEADERS = \
@@ -98,10 +98,10 @@ index 000000000000..c0f427f4984a
 +
 +#endif
 diff --git src/ucs/Makefile.am src/ucs/Makefile.am
-index 8cc77e87da3f..2fbb53188a58 100644
+index 77680021d725..29f31aabd958 100644
 --- src/ucs/Makefile.am
 +++ src/ucs/Makefile.am
-@@ -21,6 +21,7 @@ libucs_la_LIBADD = $(LIBM) $(top_builddir)/src/ucm/libucm.la
+@@ -22,6 +22,7 @@ libucs_la_LIBADD = $(LIBM) $(top_builddir)/src/ucm/libucm.la $(BFD_LIBS)
 nobase_dist_libucs_la_HEADERS = \
 arch/aarch64/bitops.h \
 arch/ppc64/bitops.h \
@@ -109,7 +109,7 @@ index 8cc77e87da3f..2fbb53188a58 100644
 arch/x86_64/bitops.h \
 arch/bitops.h \
 algorithm/crc.h \
-@@ -77,12 +78,14 @@ nobase_dist_libucs_la_HEADERS = \
+@@ -81,12 +82,14 @@ nobase_dist_libucs_la_HEADERS = \
 arch/aarch64/global_opts.h \
 arch/generic/atomic.h \
 arch/ppc64/global_opts.h \
@@ -123,8 +123,8 @@ index 8cc77e87da3f..2fbb53188a58 100644
 + arch/s390x/cpu.h \
 arch/x86_64/cpu.h \
 arch/cpu.h \
-datastruct/arbiter.h \
-@@ -127,6 +130,7 @@ libucs_la_SOURCES = \
+config/ucm_opts.h \
+@@ -134,6 +137,7 @@ libucs_la_SOURCES = \
 algorithm/qsort_r.c \
 arch/aarch64/cpu.c \
 arch/aarch64/global_opts.c \
@@ -146,7 +146,7 @@ index 6a8551f592e1..e3a9f4641383 100644
 # error "Unsupported architecture"
 #endif
 diff --git src/ucs/arch/bitops.h src/ucs/arch/bitops.h
-index a890cd255295..badc12419b5b 100644
+index 77e00571e04f..bbdea0ceb210 100644
 --- src/ucs/arch/bitops.h
 +++ src/ucs/arch/bitops.h
 @@ -20,6 +20,8 @@ BEGIN_C_DECLS
@@ -159,7 +159,7 @@ index a890cd255295..badc12419b5b 100644
 # error "Unsupported architecture"
 #endif
 diff --git src/ucs/arch/cpu.c src/ucs/arch/cpu.c
-index 210a49c8e717..4018392ebed3 100644
+index 9e6fab0904eb..c912e991586c 100644
 --- src/ucs/arch/cpu.c
 +++ src/ucs/arch/cpu.c
 @@ -61,6 +61,10 @@ const ucs_cpu_builtin_memcpy_t ucs_cpu_builtin_memcpy[UCS_CPU_VENDOR_LAST] = {
@@ -178,14 +178,14 @@ index 210a49c8e717..4018392ebed3 100644
 [UCS_CPU_VENDOR_GENERIC_ARM] = 5800 * UCS_MBYTE,
 [UCS_CPU_VENDOR_GENERIC_PPC] = 5800 * UCS_MBYTE,
 + [UCS_CPU_VENDOR_GENERIC_IBM] = 5800 * UCS_MBYTE,
-[UCS_CPU_VENDOR_FUJITSU_ARM] = 5800 * UCS_MBYTE
+[UCS_CPU_VENDOR_FUJITSU_ARM] = 12000 * UCS_MBYTE
 };
 diff --git src/ucs/arch/cpu.h src/ucs/arch/cpu.h
-index e06f6b95ebb1..15f3198976a9 100644
+index 719913fb0b8c..04d69ca01533 100644
 --- src/ucs/arch/cpu.h
 +++ src/ucs/arch/cpu.h
-@@ -62,6 +62,7 @@ typedef enum ucs_cpu_vendor {
+@@ -63,6 +63,7 @@ typedef enum ucs_cpu_vendor {
 UCS_CPU_VENDOR_AMD,
 UCS_CPU_VENDOR_GENERIC_ARM,
 UCS_CPU_VENDOR_GENERIC_PPC,
@@ -193,7 +193,7 @@ index e06f6b95ebb1..15f3198976a9 100644
 UCS_CPU_VENDOR_FUJITSU_ARM,
 UCS_CPU_VENDOR_ZHAOXIN,
 UCS_CPU_VENDOR_LAST
-@@ -97,6 +98,8 @@ typedef struct ucs_cpu_builtin_memcpy {
+@@ -98,6 +99,8 @@ typedef struct ucs_cpu_builtin_memcpy {
 # include "ppc64/cpu.h"
 #elif defined(__aarch64__)
 # include "aarch64/cpu.h"
@@ -405,10 +405,10 @@ index 000000000000..225e4e5e896a
 +#endif
 +
 diff --git src/ucs/sys/sys.c src/ucs/sys/sys.c
-index 59836aaa51c2..3975db7f6be3 100644
+index 88f4a147315e..0b6d186265a8 100644
 --- src/ucs/sys/sys.c
 +++ src/ucs/sys/sys.c
-@@ -1223,8 +1223,19 @@ void *ucs_sys_realloc(void *old_ptr, size_t old_length, size_t new_length)
+@@ -1224,8 +1224,19 @@ void *ucs_sys_realloc(void *old_ptr, size_t old_length, size_t new_length)
 if (old_ptr == NULL) {
 /* Note: Must pass the 0 offset as "long", otherwise it will be
 * partially undefined when converted to syscall arguments */

@@ -1,3 +1,102 @@
+-------------------------------------------------------------------
+Tue Sep 27 15:55:19 UTC 2022 - Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com>
+- Update to v1.13.1 (jsc#PED-912)
+  - Core
+    - Added new objects to VFS: local and remote address of endpoint,
+      statistics of ucp_ep_create success/failure, failed/destroyed endpoints
+    - Added support for UCX static libraries
+    - Added profiling for rkey management routines
+    - PCIe relaxed order enabled by default for AMD CPUs
+    - Fixed not deallocating memory from ucp_mem_unmap if no rcache
+    - Fixed versioning infrastructure
+    - Multiple code improvements: refactoring, debug prints and assertions, etc.
+    - Multiple improvements in build, test and docs infrastructure
+    - Added new objects to VFS (md, component, log_level, etc.)
+    - Added configuration variable to specify which loadable modules are allowed
+    - Added build-time configuration to disable sigaction overriding
+  - UCP
+    - Added API to pass pre-registered memory handle to UCP operations
+    - Added implementation of AM rendezvous protocol
+    - Added 2-stage pipeline rendezvous protocol for GPU
+    - Added support for fragment mem_type for v1 pipeline proto, disabled by default
+    - Added active message support for proto v2
+    - Added UCP memory registration cache
+    - Improved adaptive progress - deactivate iface when all p2p lanes are destroyed
+    - Added support for user memh in proto_v1
+    - Added support for selecting local address when creating a client endpoint
+    - Added option to limit GPUDirectRDMA size in rendezvous protocol, UCX_RNDV_MEMTYPE_DIRECT_SIZE
+    - Deprecated UCX_SOCKADDR_AUX_TLS configuration parameter
+    - Resolving remote EP ID when creating local EP disabled by default
+    - Added client_id to ucp_worker_create() and ucp_conn_request_query() APIs
+    - Added ucp_worker_address_query() API
+    - Updated ucp_ep_query() API for getting local and remote addresses
+    - Added address versioning to correctly preserve wire compatibility starting from version 1.11.0
+    - Added new client/server connection establishment packet header format
+    - Enabled rendezvous and tag sync protocols when error handling is enabled on the endpoint
+    - Added iov zcopy support to RMA operations
+    - Reduced memory usage of unexpected messages by fitting receive buffer size to packet size
+    - Added support for modifying UCT and UCS configs by ucp_config_modify() API
+    - Optimized unpacked rkeys memory consumption
+    - Added request flag to influence latency vs. bandwidth protocol
+    - Reduced memory management overhead with new protocols
+    - Improved performance calculations for new protocols
+    - Added AMO support with GPU memory target using new protocols
+    - Added put_zcopy, get_zcopy and pipeline based rendezvous in new protocols
+    - Added support for user-defined alignment in Active Messages
+    - Added support for offload tag sync in new protocols
+    - Updated ucp_atomic_post() to use NBX flow
+  - UCT
+    - Introduced API uct_md_mkey_pack_v2
+    - Introduced UCT iface features API
+    - Introduced max_inflight_eps parameter in perf_attr API
+    - Introduced UCT_SEND_FLAG_PEER_CHECK flag that forces checking connectivity to a peer
+    - Introduced UCX_RCACHE_PURGE_ON_FORK to enable/disable cleaning regions when application is forking
+    - Disabled PEER_FAILURE capability for XPMEM
+    - Added API - uct_iface_is_reachable_v2()
+    - Added IPv6 address support in TCP
+    - Added latency estimation to uct_iface_estimate_perf()
+    - Adjusted knem and cma overhead cost
+    - Increased built-in TCP keep-alive interval to 2 seconds
+  - RDMA CORE (IB, ROCE, etc.)
+    - Introduced NDR autorecognition
+    - Introduced CQE zipping support
+    - Set the default MAX_RD_ATOMIC to maximum value supported by the hardware
+    - Disabled mlx5 ifaces on verbs MD
+    - Added detection of IB NDR devices
+    - Added check for CQ overrun in assert mode
+    - Added bitmap usage for releasing detached DCIs
+    - Added configuration for requests ack frequency with DevX
+    - Added remote QP info to tx error CQE traces
+  - ROCM
+    - Increased maximum number of HSA agents
+  - UCS
+    - Added topo module infrastructure
+    - Added memtrack and rcache information to VFS
+    - Added API for a per-process aggregate-sum statistics report
+    - Added memory pool set data structure
+    - Added new ptr_array API for bulk allocation
+    - Added ucs_string_buffer_append_flags() for string buffer
+    - Added ucs_ffs32()
+    - Added ucs_vsnprintf_safe() which always adds '\0'
+    - Added thread-safe put to ptr_map
+    - Improved accuracy of the topology distance estimation
+    - Added prints of leaked callbacks from the callback queue
+    - Removed a diagnostic message when fuse thread is stopped
+    - Added configurable limit for the memory consumed by rcache
+    - Added configuration for VFS(FUSE) thread affinity
+    - Added memory limit support to memtrack
+  - Packaging
+    - Added cmake config files for better integration with external cmake based projects
+  - Tools
+    - Added loop-back transport support in ucx_perftest
+    - Split ucx_perftest into separate modules
+    - Added process placement option for ucx_info
+    - Extended parameters correctness check in ucx_perftest
+- Backported UCS-DEBUG-replace-PTR-with-void.patch
+  from upstream to fix compilation
 -------------------------------------------------------------------
 Thu Jan 13 08:42:05 UTC 2022 - Nicolas Morey-Chaisemartin <nmoreychaisemartin@suse.com>

@@ -17,7 +17,7 @@
 Name: openucx
-Version: 1.11.1
+Version: 1.13.1
 Release: 0
 Summary: Communication layer for Message Passing (MPI)
 License: BSD-3-Clause
@@ -30,6 +30,7 @@ Source: https://github.com/openucx/ucx/releases/download/v%version/ucx-%
 Source1: baselibs.conf
 Patch1: openucx-s390x-support.patch
 Patch2: ucm-fix-UCX_MEM_MALLOC_RELOC.patch
+Patch3: UCS-DEBUG-replace-PTR-with-void.patch
 BuildRequires: autoconf >= 2.63
 BuildRequires: automake >= 1.10
 BuildRequires: binutils-devel
@@ -138,6 +139,7 @@ hardware.
 %patch1
 %endif
 %patch2
+%patch3
 %build
 autoreconf -fi
@@ -188,6 +190,8 @@ mv %buildroot/%_bindir/io_demo %buildroot/%_libexecdir/%{name}/
 %_datadir/%{name}/
 %_libexecdir/%{name}
 %_libdir/pkgconfig/ucx.pc
+%dir %_libdir/cmake/
+%_libdir/cmake/ucx/
 %doc LICENSE NEWS
 %files -n libucm0
@@ -211,11 +215,14 @@ mv %buildroot/%_bindir/io_demo %buildroot/%_libexecdir/%{name}/
 %files -n libucs0
 %defattr(-,root,root)
 %_libdir/libucs.so.*
+%_libdir/libucs_signal.so.*
 %files -n libucs-devel
 %defattr(-,root,root)
 %_includedir/ucs/
 %_libdir/libucs.so
+%_libdir/libucs_signal.so
+%_libdir/pkgconfig/ucx-ucs.pc
 %files -n libuct0
 %defattr(-,root,root)
@@ -229,5 +236,9 @@ mv %buildroot/%_bindir/io_demo %buildroot/%_libexecdir/%{name}/
 %_libdir/libuct.so
 %dir %_libdir/ucx/
 %_libdir/ucx/libuct_*.so
+%_libdir/pkgconfig/ucx-uct.pc
+%_libdir/pkgconfig/ucx-cma.pc
+%_libdir/pkgconfig/ucx-ib.pc
+%_libdir/pkgconfig/ucx-rdmacm.pc
 %changelog

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:29338cad18858517f96b46ff83bdd259a5899e274792cebd269717c660aa86fd
-size 2746949

ucx-1.13.1.tar.gz Normal file
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:efc37829b68e131d2acc82a3fd4334bfd611156a756837ffeb650ab9a9dd3828
+size 2979566