22 Commits

Author SHA256 Message Date
Nicolas Morey
3ed2204149 Update to v1.19.1
  - Features
    - UCP
      - Do not require transport memory support if rendezvous protocol is not used
    - Build
      - Added CUDA 13 support to the release pipeline
      - Added Rocky OS support to the release pipeline
  - Bugfixes
    - UCS
      - Fixed Netlink fetch mechanism

Signed-off-by: Nicolas Morey <nmorey@suse.com>
2026-01-02 14:34:18 +01:00
Nicolas Morey
ed9e44370b Add patches to fix a badly initialized value in settings
Signed-off-by: Nicolas Morey <nmorey@suse.com>
2025-11-07 17:54:15 +01:00
Nicolas Morey
7690a30a01 Fix a badly initialized value in settings
Signed-off-by: Nicolas Morey <nmorey@suse.com>
2025-11-07 17:22:29 +01:00
Nicolas Morey
a1035f1e89 Minor fixes to openucx-s390x-support.patch
Signed-off-by: Nicolas Morey <nmorey@suse.com>
2025-11-05 17:49:32 +01:00
2e169061f4 Add Gitea build results 2025-10-27 17:35:42 +01:00
0e3357c05c Accepting request 1298351 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/1298351
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/openucx?expand=0&rev=37
2025-08-09 17:58:51 +00:00
5f25c6c29c - Update to ucx 1.19.0
  - UCP
    - Enabled multi-GPU support within a single process
    - Added dynamic selection between strong and weak fences in RMA flush operations
    - Improved endpoint reconfiguration capabilities
    - Added All2All lane selection for multi-NIC-GPU systems
    - Improved rkey debug info when config cache limit is reached
    - Improved UCP protocol selection based on available memory types
    - Removed dummy memory key from irrelevant transports (TCP, CMA and CUDA)
    - Improved RNDV performance with device-local staging buffers
    - Enabled error handling for RMA get_offload protocols
    - Made UCX_TLS=^ib disable all transports including auxiliary
    - Fixed send request status handling
    - Fixed performance degradation in RNDV by optimizing md cache updates
    - Fixed protocol selection when first lane is filtered out by fragment size
    - Fixed rkey selection by using memory registration flag
  - UCT
    - Defined uct_rkey_unpack_v2 API to support passing sys-dev
  - RDMA CORE (IB, ROCE, etc.)
    - Added SRD transport support in EFA with reordering, AM, and control operations
    - Removed XGVMI BF2 support (umem)
    - Removed device memory indirect key
    - Fixed VFS objects for DCIs and pools
    - Added routing table cache to the reachability check
    - Fixed strict order usage in IB auxiliary rkeys
    - Improved various init logging messages
    - Improved reliability of DC transport by adding DCI validation and separating connection logic
    - Fixed segfault in DC fence operation
  - UCS
    - Removed compilation warnings

OBS-URL: https://build.opensuse.org/package/show/science:HPC/openucx?expand=0&rev=82
2025-08-08 08:15:59 +00:00
f8b8d435cc Accepting request 1285180 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/1285180
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/openucx?expand=0&rev=36
2025-06-13 16:42:51 +00:00
e6035d1f52 - Update to ucx 1.18.1
  - CUDA
    - Added config keys to update cuda_copy bandwidth for coherent platforms
    - Improved cache invalidation of memory allocated using CUDA memory pool
  - AZP
    - Added Ubuntu 24.04 to build and release pipeline
  - UCP
    - Fixed assertion failure when maximum lane fragment is smaller than AM header
    - Fixed potential active message user header use after free with protocol reconfiguration
  - CUDA
    - Fixed registration of CUDA Fabric memory allocated by UCT
    - Fixed VA recycling check of memory allocated using VMM and CUDA memory pool
  - RDMA CORE (IB, ROCE, etc.)
    - Do not use ConnectX-8 SMI subdevices for communication
    - Fixed remote access error by disabling ODP when the device supports DDP
    - Fixed configuration logic by disabling DDP when AR is disabled
  - UCM
    - Fixed crash with bistro hooks for CUDA 12.9 on amd64

OBS-URL: https://build.opensuse.org/package/show/science:HPC/openucx?expand=0&rev=80
2025-06-12 14:32:39 +00:00
f22c7e86d8 Accepting request 1277496 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/1277496
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/openucx?expand=0&rev=35
2025-05-23 12:29:12 +00:00
d8d8c7c955 add patches to fix gcc-15 compile errors (boo#1241939)
OBS-URL: https://build.opensuse.org/package/show/science:HPC/openucx?expand=0&rev=78
2025-05-14 20:27:27 +00:00
77c5e72d38 Accepting request 1266178 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/1266178
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/openucx?expand=0&rev=34
2025-04-02 15:09:07 +00:00
28afc5599d - Add UCT-IB-UD-Use-GRH-to-detect-address-family-on-non-Mellanox-hardware.patch
to fix an UD init issue on non-Mellanox RDMA HW (bsc#1240204).

OBS-URL: https://build.opensuse.org/package/show/science:HPC/openucx?expand=0&rev=76
2025-04-01 13:23:59 +00:00
ad2b6e1eb3 Accepting request 1247274 from science:HPC
- Update to ucx 1.18.0
  - UCP
    - Enabled using CUDA staging buffers for pipeline protocols by default
    - Added endpoint reconfiguration support for non-reused p2p scenarios
    - Enabled non-cacheable memory domains, activated for gdr_copy
    - Added user_data parameter to ucp_ep_query
    - Added support for host memory pipeline through CUDA buffers for rendezvous protocol
    - Added global VA infrastructure and memory region in absence of error handling
    - Made protocol performance node names more informative
    - Enforced always running on the same thread in single thread mode
    - Multiple improvements in protocols selection infrastructure
    - Added UCP_MEM_MAP_LOCK API flag to enforce locked memory mapping
    - Allowed up-to 64 endpoint lanes for systems with many transports or devices
    - Added usage tracker to worker
    - Improved various logging messages
    - Fixed stack overflow in exported rkey unpack
    - Removed extra remote-cpu overhead from protocol estimation for zcopy
    - Fixed performance estimation for rndv pipeline protocols
    - Fixed ATP sending by picking the correct lane
    - Fixed missing reg_id on memh creation
    - Fixed repeated invalidations by retaining existing access flags
    - Fixed abort reason propagation for rendezvous RTR mtype
    - Do not check transport availability if it is disabled by UCX_TLS environment variable
    - Fixed wrong flag being used for checking BCOPY capability
    - Fixed sending too many ATPs for small messages
    - Enforced 16 bits size for Active Messages identifiers
    - Fixed unnecessary status check for emulated AMO
    - Fixed more than one fragment sending in rendezvous pipeline
    - Fixed crash by using biggest max frag across all lanes
    - Fixed missing memory handle flags by copying from parent to child

OBS-URL: https://build.opensuse.org/request/show/1247274
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/openucx?expand=0&rev=33
2025-02-20 15:28:03 +00:00
47635a7117 Accepting request 1247273 from home:NMorey:branches:science:HPC
- Refresh openucx-s390x-support.patch due to API changes

OBS-URL: https://build.opensuse.org/request/show/1247273
OBS-URL: https://build.opensuse.org/package/show/science:HPC/openucx?expand=0&rev=74
2025-02-20 06:38:45 +00:00
9a474b25ce Accepting request 1247161 from home:NMorey:branches:science:HPC
- Update to ucx 1.18.0
  - UCP
    - Enabled using CUDA staging buffers for pipeline protocols by default
    - Added endpoint reconfiguration support for non-reused p2p scenarios
    - Enabled non-cacheable memory domains, activated for gdr_copy
    - Added user_data parameter to ucp_ep_query
    - Added support for host memory pipeline through CUDA buffers for rendezvous protocol
    - Added global VA infrastructure and memory region in absence of error handling
    - Made protocol performance node names more informative
    - Enforced always running on the same thread in single thread mode
    - Multiple improvements in protocols selection infrastructure
    - Added UCP_MEM_MAP_LOCK API flag to enforce locked memory mapping
    - Allowed up-to 64 endpoint lanes for systems with many transports or devices
    - Added usage tracker to worker
    - Improved various logging messages
    - Fixed stack overflow in exported rkey unpack
    - Removed extra remote-cpu overhead from protocol estimation for zcopy
    - Fixed performance estimation for rndv pipeline protocols
    - Fixed ATP sending by picking the correct lane
    - Fixed missing reg_id on memh creation
    - Fixed repeated invalidations by retaining existing access flags
    - Fixed abort reason propagation for rendezvous RTR mtype
    - Do not check transport availability if it is disabled by UCX_TLS environment variable
    - Fixed wrong flag being used for checking BCOPY capability
    - Fixed sending too many ATPs for small messages
    - Enforced 16 bits size for Active Messages identifiers
    - Fixed unnecessary status check for emulated AMO
    - Fixed more than one fragment sending in rendezvous pipeline
    - Fixed crash by using biggest max frag across all lanes
    - Fixed missing memory handle flags by copying from parent to child

OBS-URL: https://build.opensuse.org/request/show/1247161
OBS-URL: https://build.opensuse.org/package/show/science:HPC/openucx?expand=0&rev=73
2025-02-19 20:35:36 +00:00
83523eaad4 Accepting request 1199376 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/1199376
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/openucx?expand=0&rev=32
2024-09-09 12:43:20 +00:00
68685ed0da Accepting request 1199375 from home:NMorey:branches:science:HPC
- Refresh openucx-s390x-support.patch to fix compilation on s390x

OBS-URL: https://build.opensuse.org/request/show/1199375
OBS-URL: https://build.opensuse.org/package/show/science:HPC/openucx?expand=0&rev=71
2024-09-07 14:26:13 +00:00
a5f1adbb12 Accepting request 1184228 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/1184228
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/openucx?expand=0&rev=31
2024-07-03 18:26:35 +00:00
de09e2a891 Accepting request 1184022 from openSUSE:Factory:RISCV
- Enable build on riscv64

OBS-URL: https://build.opensuse.org/request/show/1184022
OBS-URL: https://build.opensuse.org/package/show/science:HPC/openucx?expand=0&rev=69
2024-07-01 08:27:55 +00:00
49c5ede7c9 Accepting request 1183479 from science:HPC
- Update to 1.17.0
  - See NEWS for the complete CHANGELOG
- Refresh openucx-s390x-support.patch against the latest sources
- Add upstream fix UCS-TIME-Add-math.h-to-provide-INFINITY.patch
  to fix compilation on ppc64

OBS-URL: https://build.opensuse.org/request/show/1183479
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/openucx?expand=0&rev=30
2024-06-29 13:16:13 +00:00
b79725a512 Accepting request 1183477 from home:NMorey:branches:science:HPC
- Update to 1.17.0
  - See NEWS for the complete CHANGELOG
- Refresh openucx-s390x-support.patch against the latest sources
- Add upstream fix UCS-TIME-Add-math.h-to-provide-INFINITY.patch
  to fix compilation on ppc64

OBS-URL: https://build.opensuse.org/request/show/1183477
OBS-URL: https://build.opensuse.org/package/show/science:HPC/openucx?expand=0&rev=67
2024-06-26 17:49:24 +00:00
8 changed files with 422 additions and 88 deletions

12
README.md Normal file

@@ -0,0 +1,12 @@
## Build Results
Current state of openucx in openSUSE:Factory is
![Factory build results](https://br.opensuse.org/status/openSUSE:Factory/openucx/standard)
The current state of openucx in the devel project build (science:HPC)
![Devel project build results](https://br.opensuse.org/status/science:HPC/openucx)


@@ -0,0 +1,21 @@
commit 2d79ffee423fd4570599258e00689cc745e8785e
Author: Nicolas Morey <nmorey@suse.com>
Date: Fri Nov 7 17:19:54 2025 +0100
UCP/CORE: Fix config type for dynamic_tl_progress_factor
Signed-off-by: Nicolas Morey <nmorey@suse.com>
diff --git src/ucp/core/ucp_context.c src/ucp/core/ucp_context.c
index 8b9dbeaca9ea..4cbae096ed93 100644
--- src/ucp/core/ucp_context.c
+++ src/ucp/core/ucp_context.c
@@ -440,7 +440,7 @@ static ucs_config_field_t ucp_context_config_table[] = {
"Number of usage tracker rounds performed for each progress operation. Must be\n"
"non-zero value.",
ucs_offsetof(ucp_context_config_t, dynamic_tl_progress_factor),
- UCS_CONFIG_TYPE_TIME_UNITS},
+ UCS_CONFIG_TYPE_UINT},
{"RESOLVE_REMOTE_EP_ID", "n",
"Defines whether resolving remote endpoint ID is required or not when\n"

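The one-line change above matters because, in a table-driven configuration parser, the type tag decides how many bytes get written at the field's offset. The sketch below is a simplified, hypothetical model of that mechanism, not UCX's actual parser: `demo_config_t`, `parse_uint`, `parse_wide`, and `apply` are invented names, and the assumption that the mismatched parser stores an 8-byte `double` is for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical config struct: two adjacent unsigned fields. */
typedef struct {
    unsigned progress_factor;
    unsigned neighbour;
} demo_config_t;

/* Parser matching the field type: writes sizeof(unsigned) bytes. */
static void parse_uint(const char *str, void *dest)
{
    unsigned value = (unsigned)strtoul(str, NULL, 10);
    memcpy(dest, &value, sizeof(value));
}

/* Parser for a wider type (assumed here to be a double): writes 8 bytes. */
static void parse_wide(const char *str, void *dest)
{
    double value = strtod(str, NULL);
    memcpy(dest, &value, sizeof(value));
}

/* Applies a parser at the field's offset, as a config table entry would. */
static demo_config_t apply(void (*parser)(const char *, void *),
                           const char *str)
{
    demo_config_t cfg = {0, 7};

    /* The wide store must still fit inside the struct for this demo. */
    assert(sizeof(cfg) >= offsetof(demo_config_t, progress_factor) +
                          sizeof(double));
    parser(str, (char *)&cfg + offsetof(demo_config_t, progress_factor));
    return cfg;
}
```

In this model, `apply(parse_uint, "2")` leaves `progress_factor` at 2 and `neighbour` untouched, while `apply(parse_wide, "2")` stores a bit pattern that is not 2 and spills into the adjacent field.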

@@ -0,0 +1,23 @@
commit 9655ec674b1d6278a80705eeb1e5bf0a36d7a211
Author: Nicolas Morey <nmorey@suse.com>
Date: Fri Nov 7 17:51:31 2025 +0100
UCT/SELF: Fix config type for num_devices
size_t may be larger than an int. This causes issue on big endian systems
Signed-off-by: Nicolas Morey <nmorey@suse.com>
diff --git src/uct/sm/self/self.c src/uct/sm/self/self.c
index 6e7815c21dfa..1986e9cde290 100644
--- src/uct/sm/self/self.c
+++ src/uct/sm/self/self.c
@@ -57,7 +57,7 @@ static ucs_config_field_t uct_self_md_config_table[] = {
UCS_CONFIG_TYPE_TABLE(uct_md_config_table)},
{"NUM_DEVICES", "1", "Number of \"self\" devices to create",
- ucs_offsetof(uct_self_md_config_t, num_devices), UCS_CONFIG_TYPE_INT},
+ ucs_offsetof(uct_self_md_config_t, num_devices), UCS_CONFIG_TYPE_ULONG},
{NULL}
};
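The patch description says the mismatch between `int` and `size_t` causes problems on big-endian systems. A minimal sketch of why, assuming an `int`-typed parser writes `sizeof(int)` bytes at the start of a `size_t` field (all function names here are hypothetical, and the effect only shows up where `size_t` is wider than `int`):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stores an int-sized value at the start of a zeroed size_t field, which is
 * what an int-typed parser does when pointed at a size_t member. */
static size_t store_int_into_size_t(int value)
{
    size_t field = 0;

    assert(sizeof(int) <= sizeof(size_t));
    memcpy(&field, &value, sizeof(value));
    return field;
}

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
static int is_little_endian(void)
{
    unsigned probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Value a reader of the size_t field sees after an int-sized store of 1. */
static size_t expected_after_int_store(void)
{
    size_t shift = 8 * (sizeof(size_t) - sizeof(int));

    return is_little_endian() ? (size_t)1 : ((size_t)1 << shift);
}
```

On a little-endian host the low-order bytes come first, so the field still reads as 1 and the bug stays hidden. On a big-endian host with an 8-byte `size_t`, the same store lands in the high-order half and the field reads as a very large number.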


@@ -1,6 +1,6 @@
commit 328a69d07b618e0aa83fe2351e8d7ca4fc1b2f00
commit e5fd9ff24191cfd99b5759bdaf291cc36aaa6346
Author: Nicolas Morey <nmorey@suse.com>
Date: Mon Feb 13 17:04:14 2023 +0100
Date: Wed Feb 19 16:46:33 2025 +0100
openucx s390x support
@@ -32,42 +32,26 @@ index e5e66266d695..ef7e4ede93ce 100644
AS_IF([test "x$bistro_hooks_happy" = "xyes"],
[AC_DEFINE([UCM_BISTRO_HOOKS], [1], [Enable BISTRO hooks])],
diff --git src/tools/info/sys_info.c src/tools/info/sys_info.c
index e5aff871d491..2b7c54319f53 100644
--- src/tools/info/sys_info.c
+++ src/tools/info/sys_info.c
@@ -47,7 +47,8 @@ static const char* cpu_vendor_names[] = {
[UCS_CPU_VENDOR_GENERIC_ARM] = "Generic ARM",
[UCS_CPU_VENDOR_GENERIC_PPC] = "Generic PPC",
[UCS_CPU_VENDOR_FUJITSU_ARM] = "Fujitsu ARM",
- [UCS_CPU_VENDOR_ZHAOXIN] = "Zhaoxin"
+ [UCS_CPU_VENDOR_ZHAOXIN] = "Zhaoxin",
+ [UCS_CPU_VENDOR_GENERIC_IBM] = "Generic IBM"
};
static double measure_memcpy_bandwidth(size_t size)
diff --git src/ucm/Makefile.am src/ucm/Makefile.am
index 48b82bf89cbe..582f83d1ea82 100644
index 7866aa0ac13b..2d44e20f124d 100644
--- src/ucm/Makefile.am
+++ src/ucm/Makefile.am
@@ -31,7 +31,8 @@ noinst_HEADERS = \
bistro/bistro.h \
bistro/bistro_x86_64.h \
@@ -35,6 +35,7 @@ noinst_HEADERS = \
bistro/bistro_aarch64.h \
- bistro/bistro_ppc64.h
+ bistro/bistro_ppc64.h \
bistro/bistro_ppc64.h \
bistro/bistro_rv64.h
+ bistro/bistro_s390x.h
libucm_la_SOURCES = \
event/event.c \
diff --git src/ucm/bistro/bistro.h src/ucm/bistro/bistro.h
index b622e3c14fbb..4acd9e9cdb83 100644
index fffbe738b116..31859a84b159 100644
--- src/ucm/bistro/bistro.h
+++ src/ucm/bistro/bistro.h
@@ -20,6 +20,8 @@ typedef struct ucm_bistro_restore_point ucm_bistro_restore_point_t;
# include "bistro_aarch64.h"
#elif defined(__x86_64__)
@@ -23,6 +23,8 @@ typedef struct ucm_bistro_restore_point ucm_bistro_restore_point_t;
# include "bistro_x86_64.h"
#elif defined(__riscv)
# include "bistro_rv64.h"
+#elif defined(__s390x__)
+# include "bistro_s390x.h"
#else
@@ -75,10 +59,10 @@ index b622e3c14fbb..4acd9e9cdb83 100644
#endif
diff --git src/ucm/bistro/bistro_s390x.h src/ucm/bistro/bistro_s390x.h
new file mode 100644
index 000000000000..c0f427f4984a
index 000000000000..2beb5de54fab
--- /dev/null
+++ src/ucm/bistro/bistro_s390x.h
@@ -0,0 +1,18 @@
@@ -0,0 +1,27 @@
+#ifndef UCM_BISTRO_BISTRO_S390X_H_
+#define UCM_BISTRO_BISTRO_S390X_H_
+
@@ -90,55 +74,65 @@ index 000000000000..c0f427f4984a
+#define UCM_BISTRO_PROLOGUE
+#define UCM_BISTRO_EPILOGUE
+
+typedef struct ucm_bistro_patch {
+} UCS_S_PACKED ucm_bistro_patch_t;
+typedef struct {
+} UCS_S_PACKED ucm_bistro_lock_t;
+
+static inline ucs_status_t ucm_bistro_patch(void *func_ptr, void *hook, const char *symbol,
+ void **orig_func_p,
+ ucm_bistro_restore_point_t **rp){
+ void **orig_func_p,
+ ucm_bistro_restore_point_t **rp){
+ return UCS_ERR_UNSUPPORTED;
+}
+
+static inline void ucm_bistro_patch_lock(void * UCS_V_UNUSED dst)
+{
+}
+
+#endif
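The s390x header above provides a patching function that only returns `UCS_ERR_UNSUPPORTED`. A sketch of that stub pattern, using hypothetical names (`demo_status_t`, `demo_patch`, `demo_needs_fallback`) rather than the UCX types, and assuming the caller treats "unsupported" as a signal to use another hooking method:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status codes, standing in for a library's status enum. */
typedef enum {
    DEMO_OK              = 0,
    DEMO_ERR_UNSUPPORTED = -1
} demo_status_t;

/* Stub for an architecture without binary patching: always reports
 * "unsupported" and never touches the target function. */
static demo_status_t demo_patch(void *func_ptr, void *hook, void **orig_func_p)
{
    (void)func_ptr;
    (void)hook;
    if (orig_func_p != NULL) {
        *orig_func_p = NULL;
    }
    return DEMO_ERR_UNSUPPORTED;
}

/* Caller side: try patching and report whether a fallback is needed. */
static int demo_needs_fallback(void *func_ptr, void *hook)
{
    void *orig = NULL;

    assert(func_ptr != NULL && hook != NULL);
    return demo_patch(func_ptr, hook, &orig) == DEMO_ERR_UNSUPPORTED;
}
```

Keeping the stub's signature identical to the real implementations lets shared code compile unchanged on every architecture; only the returned status differs.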
diff --git src/ucs/Makefile.am src/ucs/Makefile.am
index c7696d56f25d..c63b32bad844 100644
index 699a4addcd29..2f20f9945411 100644
--- src/ucs/Makefile.am
+++ src/ucs/Makefile.am
@@ -22,6 +22,7 @@ libucs_la_LIBADD = $(LIBM) $(top_builddir)/src/ucm/libucm.la $(BFD_LIBS)
nobase_dist_libucs_la_HEADERS = \
@@ -24,6 +24,7 @@ nobase_dist_libucs_la_HEADERS = \
arch/aarch64/bitops.h \
arch/ppc64/bitops.h \
arch/rv64/bitops.h \
+ arch/s390x/bitops.h \
arch/x86_64/bitops.h \
arch/bitops.h \
algorithm/crc.h \
@@ -82,12 +83,14 @@ nobase_dist_libucs_la_HEADERS = \
arch/aarch64/global_opts.h \
@@ -87,6 +88,7 @@ nobase_dist_libucs_la_HEADERS = \
arch/generic/atomic.h \
arch/ppc64/global_opts.h \
arch/rv64/global_opts.h \
+ arch/s390x/global_opts.h \
arch/global_opts.h
noinst_HEADERS = \
arch/aarch64/cpu.h \
@@ -94,6 +96,7 @@ noinst_HEADERS = \
arch/generic/cpu.h \
arch/ppc64/cpu.h \
arch/rv64/cpu.h \
+ arch/s390x/cpu.h \
arch/x86_64/cpu.h \
arch/cpu.h \
config/ucm_opts.h \
@@ -138,6 +141,7 @@ libucs_la_SOURCES = \
@@ -150,6 +153,7 @@ libucs_la_SOURCES = \
algorithm/string_distance.c \
arch/aarch64/cpu.c \
arch/aarch64/global_opts.c \
+ arch/s390x/global_opts.c \
arch/ppc64/timebase.c \
arch/ppc64/global_opts.c \
arch/x86_64/cpu.c \
arch/rv64/cpu.c \
diff --git src/ucs/arch/atomic.h src/ucs/arch/atomic.h
index 52be711c1d0a..8f1d62a28dc9 100644
index 849647902fab..a328c37e2020 100644
--- src/ucs/arch/atomic.h
+++ src/ucs/arch/atomic.h
@@ -15,6 +15,8 @@
@@ -18,6 +18,8 @@
# include "generic/atomic.h"
#elif defined(__aarch64__)
#elif defined(__riscv)
# include "generic/atomic.h"
+#elif defined(__s390x__)
+# include "generic/atomic.h"
@@ -146,23 +140,23 @@ index 52be711c1d0a..8f1d62a28dc9 100644
# error "Unsupported architecture"
#endif
diff --git src/ucs/arch/bitops.h src/ucs/arch/bitops.h
index e89a37d0b673..dd2b9d5b6bcb 100644
index ae531834451e..d4228b135641 100644
--- src/ucs/arch/bitops.h
+++ src/ucs/arch/bitops.h
@@ -20,6 +20,8 @@ BEGIN_C_DECLS
# include "ppc64/bitops.h"
#elif defined(__aarch64__)
@@ -23,6 +23,8 @@ BEGIN_C_DECLS
# include "aarch64/bitops.h"
#elif defined(__riscv)
# include "rv64/bitops.h"
+#elif defined(__s390x__)
+# include "s390x/bitops.h"
#else
# error "Unsupported architecture"
#endif
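The hunks above all extend the same construct: a chain of `#elif defined(...)` tests on compiler-predefined macros that selects one per-architecture header. A small self-contained sketch of that dispatch follows. `DEMO_ARCH_NAME` and `demo_arch_name` are hypothetical, and unlike the patched headers, which end in `#error`, this sketch falls back to a generic value so that it compiles anywhere.

```c
#include <assert.h>
#include <string.h>

/* Pick one definition per architecture from compiler-predefined macros.
 * DEMO_ARCH_NAME is a hypothetical macro used only in this sketch. */
#if defined(__x86_64__)
#  define DEMO_ARCH_NAME "x86_64"
#elif defined(__aarch64__)
#  define DEMO_ARCH_NAME "aarch64"
#elif defined(__powerpc64__)
#  define DEMO_ARCH_NAME "ppc64"
#elif defined(__riscv)
#  define DEMO_ARCH_NAME "rv64"
#elif defined(__s390x__)
#  define DEMO_ARCH_NAME "s390x"
#else
#  define DEMO_ARCH_NAME "generic"
#endif

/* Returns the name chosen at compile time for the target architecture. */
static const char *demo_arch_name(void)
{
    return DEMO_ARCH_NAME;
}
```

Because the choice is made by the preprocessor, a new architecture needs a branch in every such chain, which is why the s390x patch touches `atomic.h`, `bitops.h`, `cpu.h`, and `global_opts.h` in the same way.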
diff --git src/ucs/arch/cpu.c src/ucs/arch/cpu.c
index ece8f7fb82dd..b35b10ad090a 100644
index 6fe5e31dba31..f92c53f303cd 100644
--- src/ucs/arch/cpu.c
+++ src/ucs/arch/cpu.c
@@ -63,6 +63,10 @@ const ucs_cpu_builtin_memcpy_t ucs_cpu_builtin_memcpy[UCS_CPU_VENDOR_LAST] = {
@@ -64,6 +64,10 @@ const ucs_cpu_builtin_memcpy_t ucs_cpu_builtin_memcpy[UCS_CPU_VENDOR_LAST] = {
.min = UCS_MEMUNITS_INF,
.max = UCS_MEMUNITS_INF
},
@@ -173,43 +167,67 @@ index ece8f7fb82dd..b35b10ad090a 100644
[UCS_CPU_VENDOR_FUJITSU_ARM] = {
.min = UCS_MEMUNITS_INF,
.max = UCS_MEMUNITS_INF
@@ -78,6 +82,7 @@ const size_t ucs_cpu_est_bcopy_bw[UCS_CPU_VENDOR_LAST] = {
[UCS_CPU_VENDOR_INTEL] = UCS_CPU_EST_BCOPY_BW_DEFAULT,
[UCS_CPU_VENDOR_AMD] = UCS_CPU_EST_BCOPY_BW_AMD,
[UCS_CPU_VENDOR_GENERIC_ARM] = UCS_CPU_EST_BCOPY_BW_DEFAULT,
+ [UCS_CPU_VENDOR_GENERIC_IBM] = UCS_CPU_EST_BCOPY_BW_DEFAULT,
[UCS_CPU_VENDOR_GENERIC_PPC] = UCS_CPU_EST_BCOPY_BW_DEFAULT,
[UCS_CPU_VENDOR_FUJITSU_ARM] = UCS_CPU_EST_BCOPY_BW_FUJITSU_ARM,
[UCS_CPU_VENDOR_ZHAOXIN] = UCS_CPU_EST_BCOPY_BW_DEFAULT
@@ -82,7 +86,6 @@ const ucs_cpu_builtin_memcpy_t ucs_cpu_builtin_memcpy[UCS_CPU_VENDOR_LAST] = {
}
};
-
static void ucs_sysfs_get_cache_size()
{
char type_str[32]; /* Data/Instruction/Unified */
@@ -167,6 +170,7 @@ const char *ucs_cpu_vendor_name()
[UCS_CPU_VENDOR_GENERIC_ARM] = "Generic ARM",
[UCS_CPU_VENDOR_GENERIC_PPC] = "Generic PPC",
[UCS_CPU_VENDOR_GENERIC_RV64G] = "Generic RV64G",
+ [UCS_CPU_VENDOR_GENERIC_IBM] = "Generic IBM",
[UCS_CPU_VENDOR_FUJITSU_ARM] = "Fujitsu ARM",
[UCS_CPU_VENDOR_ZHAOXIN] = "Zhaoxin",
[UCS_CPU_VENDOR_NVIDIA] = "Nvidia"
@@ -197,6 +201,7 @@ const char *ucs_cpu_model_name()
[UCS_CPU_MODEL_ZHAOXIN_WUDAOKOU] = "Wudaokou",
[UCS_CPU_MODEL_ZHAOXIN_LUJIAZUI] = "Lujiazui",
[UCS_CPU_MODEL_RV64G] = "RV64G",
+ [UCS_CPU_MODEL_S390X] = "S390x",
[UCS_CPU_MODEL_NVIDIA_GRACE] = "Grace"
};
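The name tables above are arrays indexed by enum value and filled with designated initializers, so every new enum member needs a matching table entry. A minimal sketch of the pattern, with hypothetical names (`demo_vendor_t`, `demo_vendor_names`, `demo_vendor_name`) and a defensive lookup that is an addition of this sketch, not something shown in the patch:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical vendor enum; DEMO_VENDOR_LAST is a sentinel, not a vendor. */
typedef enum {
    DEMO_VENDOR_UNKNOWN,
    DEMO_VENDOR_GENERIC_PPC,
    DEMO_VENDOR_GENERIC_IBM,
    DEMO_VENDOR_LAST
} demo_vendor_t;

/* Designated initializers keep each name next to its enum value. An enum
 * value with no entry here would be left as a NULL pointer. */
static const char *const demo_vendor_names[DEMO_VENDOR_LAST] = {
    [DEMO_VENDOR_UNKNOWN]     = "unknown",
    [DEMO_VENDOR_GENERIC_PPC] = "Generic PPC",
    [DEMO_VENDOR_GENERIC_IBM] = "Generic IBM"
};

/* Bounds-checked lookup that also tolerates a missing table entry. */
static const char *demo_vendor_name(int vendor)
{
    if (vendor < 0 || vendor >= DEMO_VENDOR_LAST ||
        demo_vendor_names[vendor] == NULL) {
        return "unknown";
    }
    return demo_vendor_names[vendor];
}
```

Sizing the array with the sentinel means the table grows automatically with the enum, but a forgotten initializer silently becomes `NULL`, hence the extra check in the lookup.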
diff --git src/ucs/arch/cpu.h src/ucs/arch/cpu.h
index eb557d385670..cfd297e24558 100644
index 857b8b804cf7..89461d52d406 100644
--- src/ucs/arch/cpu.h
+++ src/ucs/arch/cpu.h
@@ -64,6 +64,7 @@ typedef enum ucs_cpu_vendor {
@@ -41,6 +41,7 @@ typedef enum ucs_cpu_model {
UCS_CPU_MODEL_ZHAOXIN_WUDAOKOU,
UCS_CPU_MODEL_ZHAOXIN_LUJIAZUI,
UCS_CPU_MODEL_RV64G,
+ UCS_CPU_MODEL_S390X,
UCS_CPU_MODEL_NVIDIA_GRACE,
UCS_CPU_MODEL_LAST
} ucs_cpu_model_t;
@@ -70,6 +71,7 @@ typedef enum ucs_cpu_vendor {
UCS_CPU_VENDOR_AMD,
UCS_CPU_VENDOR_GENERIC_ARM,
UCS_CPU_VENDOR_GENERIC_PPC,
+ UCS_CPU_VENDOR_GENERIC_IBM,
UCS_CPU_VENDOR_FUJITSU_ARM,
UCS_CPU_VENDOR_ZHAOXIN,
UCS_CPU_VENDOR_LAST
@@ -99,6 +100,8 @@ typedef struct ucs_cpu_builtin_memcpy {
# include "ppc64/cpu.h"
#elif defined(__aarch64__)
UCS_CPU_VENDOR_GENERIC_RV64G,
@@ -109,6 +111,8 @@ typedef struct ucs_cpu_builtin_memcpy {
# include "aarch64/cpu.h"
#elif defined(__riscv)
# include "rv64/cpu.h"
+#elif defined(__s390x__)
+# include "s390x/cpu.h"
#else
# error "Unsupported architecture"
#endif
diff --git src/ucs/arch/global_opts.h src/ucs/arch/global_opts.h
index 75d086177a7f..96c670cb60d3 100644
index 550d22b8b751..d8e4a7cca694 100644
--- src/ucs/arch/global_opts.h
+++ src/ucs/arch/global_opts.h
@@ -15,6 +15,8 @@
# include "ppc64/global_opts.h"
#elif defined(__aarch64__)
@@ -18,6 +18,8 @@
# include "aarch64/global_opts.h"
#elif defined(__riscv)
# include "rv64/global_opts.h"
+#elif defined(__s390x__)
+# include "s390x/global_opts.h"
#else
@@ -217,7 +235,7 @@ index 75d086177a7f..96c670cb60d3 100644
#endif
diff --git src/ucs/arch/s390x/bitops.h src/ucs/arch/s390x/bitops.h
new file mode 100644
index 000000000000..ce48ff1ff451
index 000000000000..88b74558f333
--- /dev/null
+++ src/ucs/arch/s390x/bitops.h
@@ -0,0 +1,37 @@
@@ -244,7 +262,7 @@ index 000000000000..ce48ff1ff451
+{
+ if (!n)
+ return 0;
+ return 63 - __builtin_clz(n);
+ return 63 - __builtin_clzll(n);
+}
+
+static UCS_F_ALWAYS_INLINE unsigned ucs_ffs32(uint32_t n)
@@ -260,10 +278,10 @@ index 000000000000..ce48ff1ff451
+#endif
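The two `return` lines in the hunk above differ only in the builtin used: `__builtin_clz` versus `__builtin_clzll`. Assuming they are the old and new versions of a 64-bit log2 helper, the sketch below shows why the choice matters. The function names are hypothetical, and the narrow variant's result assumes a 32-bit `unsigned int`.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the highest set bit of a 64-bit value; returns 0 for n == 0. */
static unsigned demo_ilog2_64(uint64_t n)
{
    if (n == 0) {
        return 0;
    }
    /* __builtin_clzll takes unsigned long long, so all 64 bits are counted. */
    return 63 - (unsigned)__builtin_clzll(n);
}

/* Same formula with __builtin_clz, which takes unsigned int: the argument
 * is converted to unsigned int before the leading zeros are counted. */
static unsigned demo_ilog2_64_narrow(uint64_t n)
{
    unsigned narrowed = (unsigned)n;

    if (narrowed == 0) {
        return 0; /* __builtin_clz(0) is undefined, so guard it */
    }
    return 63 - (unsigned)__builtin_clz(narrowed);
}
```

With a 32-bit `unsigned int`, `demo_ilog2_64_narrow(1)` yields 32 instead of 0, because only 31 leading zeros are counted while the formula still subtracts from 63. The `clzll` variant returns 0 for the same input.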
diff --git src/ucs/arch/s390x/cpu.h src/ucs/arch/s390x/cpu.h
new file mode 100644
index 000000000000..4f0a87006118
index 000000000000..e1d41a0ef8b8
--- /dev/null
+++ src/ucs/arch/s390x/cpu.h
@@ -0,0 +1,84 @@
@@ -0,0 +1,86 @@
+/**
+* Copyright (C) Mellanox Technologies Ltd. 2001-2013. ALL RIGHTS RESERVED.
+* Copyright (C) ARM Ltd. 2016-2017. ALL RIGHTS RESERVED.
@@ -290,7 +308,7 @@ index 000000000000..4f0a87006118
+#define ucs_memory_bus_fence() asm volatile (""::: "memory")
+#define ucs_memory_bus_store_fence() ucs_memory_bus_fence()
+#define ucs_memory_bus_load_fence() ucs_memory_bus_fence()
+#define ucs_memory_bus_wc_flush() ucs_memory_bus_fence()
+#define ucs_memory_bus_cacheline_wc_flush() ucs_memory_bus_fence()
+#define ucs_memory_cpu_fence() ucs_memory_bus_fence()
+#define ucs_memory_cpu_store_fence() ucs_memory_bus_fence()
+#define ucs_memory_cpu_load_fence() ucs_memory_bus_fence()
@@ -308,7 +326,7 @@ index 000000000000..4f0a87006118
+
+static inline ucs_cpu_model_t ucs_arch_get_cpu_model()
+{
+ return UCS_CPU_MODEL_UNKNOWN;
+ return UCS_CPU_MODEL_S390X;
+}
+
+static inline ucs_cpu_vendor_t ucs_arch_get_cpu_vendor()
@@ -329,7 +347,9 @@ index 000000000000..4f0a87006118
+{
+}
+
+static inline void *ucs_memcpy_relaxed(void *dst, const void *src, size_t len)
+static inline void *ucs_memcpy_relaxed(void *dst, const void *src, size_t len,
+ ucs_arch_memcpy_hint_t hint,
+ size_t total_len)
+{
+ return memcpy(dst, src, len);
+}
@@ -380,7 +400,7 @@ index 000000000000..4fa0c74034a7
+#endif
diff --git src/ucs/arch/s390x/global_opts.h src/ucs/arch/s390x/global_opts.h
new file mode 100644
index 000000000000..225e4e5e896a
index 000000000000..b7c5693266d9
--- /dev/null
+++ src/ucs/arch/s390x/global_opts.h
@@ -0,0 +1,25 @@
@@ -391,8 +411,8 @@ index 000000000000..225e4e5e896a
+*/
+
+
+#ifndef UCS_PPC64_GLOBAL_OPTS_H_
+#define UCS_PPC64_GLOBAL_OPTS_H_
+#ifndef UCS_S390X_GLOBAL_OPTS_H_
+#define UCS_S390X_GLOBAL_OPTS_H_
+
+#include <ucs/sys/compiler_def.h>
+
@@ -410,10 +430,10 @@ index 000000000000..225e4e5e896a
+#endif
+
diff --git src/ucs/sys/sys.c src/ucs/sys/sys.c
index 58e67835c4d0..308f03606d5b 100644
index 7cd875e8f7b2..b8b2d3c026be 100644
--- src/ucs/sys/sys.c
+++ src/ucs/sys/sys.c
@@ -1258,8 +1258,19 @@ void *ucs_sys_realloc(void *old_ptr, size_t old_length, size_t new_length)
@@ -1265,8 +1265,19 @@ void *ucs_sys_realloc(void *old_ptr, size_t old_length, size_t new_length)
if (old_ptr == NULL) {
/* Note: Must pass the 0 offset as "long", otherwise it will be
* partially undefined when converted to syscall arguments */


@@ -1,3 +1,254 @@
-------------------------------------------------------------------
Fri Jan 02 14:32:48 UTC 2026 - Nicolas Morey <nmorey@suse.com>
- Update to v1.19.1
  - Features
    - UCP
      - Do not require transport memory support if rendezvous protocol is not used
    - Build
      - Added CUDA 13 support to the release pipeline
      - Added Rocky OS support to the release pipeline
  - Bugfixes
    - UCS
      - Fixed Netlink fetch mechanism
-------------------------------------------------------------------
Wed Nov 5 16:48:53 UTC 2025 - Nicolas Morey <nicolas.morey@suse.com>
- Minor fixes to openucx-s390x-support.patch
- Add patches to fix a badly initialized value in settings
  - UCP-CORE-Fix-config-type-for-dynamic_tl_progress_factor.patch
  - UCT-SELF-Fix-config-type-for-num_devices.patch
-------------------------------------------------------------------
Wed Jun 25 15:49:50 UTC 2025 - Nicolas Morey <nicolas.morey@suse.com>
- Update to ucx 1.19.0
  - UCP
    - Enabled multi-GPU support within a single process
    - Added dynamic selection between strong and weak fences in RMA flush operations
    - Improved endpoint reconfiguration capabilities
    - Added All2All lane selection for multi-NIC-GPU systems
    - Improved rkey debug info when config cache limit is reached
    - Improved UCP protocol selection based on available memory types
    - Removed dummy memory key from irrelevant transports (TCP, CMA and CUDA)
    - Improved RNDV performance with device-local staging buffers
    - Enabled error handling for RMA get_offload protocols
    - Made UCX_TLS=^ib disable all transports including auxiliary
    - Fixed send request status handling
    - Fixed performance degradation in RNDV by optimizing md cache updates
    - Fixed protocol selection when first lane is filtered out by fragment size
    - Fixed rkey selection by using memory registration flag
  - UCT
    - Defined uct_rkey_unpack_v2 API to support passing sys-dev
  - RDMA CORE (IB, ROCE, etc.)
    - Added SRD transport support in EFA with reordering, AM, and control operations
    - Removed XGVMI BF2 support (umem)
    - Removed device memory indirect key
    - Fixed VFS objects for DCIs and pools
    - Added routing table cache to the reachability check
    - Fixed strict order usage in IB auxiliary rkeys
    - Improved various init logging messages
    - Improved reliability of DC transport by adding DCI validation and separating connection logic
    - Fixed segfault in DC fence operation
  - UCS
    - Removed compilation warnings
    - Use UCS function for counting leading zeros on x86 architecture
    - Fixed a compilation warning
  - Shared Memory
    - Fixed FIFO availability check for sm transport
  - Tools
    - Added name filter option (-F 'str') to ucx_info for config and feature dumps
    - Improved ucx_info input validation
  - Documentation
    - Fixed open-mpi clone instruction
  - Build
    - Fixed enum-int-mismatch warnings with GCC 15
- Drop patches merged upstream:
  - UCT-IB-UD-Use-GRH-to-detect-address-family-on-non-Mellanox-hardware.patch
  - openucx-extern-c.patch
  - openucx-strict-headers-additional.patch
  - openucx-strict-headers.patch
-------------------------------------------------------------------
Thu Jun 12 08:28:59 UTC 2025 - Nicolas Morey <nicolas.morey@suse.com>
- Update to ucx 1.18.1
  - CUDA
    - Added config keys to update cuda_copy bandwidth for coherent platforms
    - Improved cache invalidation of memory allocated using CUDA memory pool
  - AZP
    - Added Ubuntu 24.04 to build and release pipeline
  - UCP
    - Fixed assertion failure when maximum lane fragment is smaller than AM header
    - Fixed potential active message user header use after free with protocol reconfiguration
  - CUDA
    - Fixed registration of CUDA Fabric memory allocated by UCT
    - Fixed VA recycling check of memory allocated using VMM and CUDA memory pool
  - RDMA CORE (IB, ROCE, etc.)
    - Do not use ConnectX-8 SMI subdevices for communication
    - Fixed remote access error by disabling ODP when the device supports DDP
    - Fixed configuration logic by disabling DDP when AR is disabled
  - UCM
    - Fixed crash with bistro hooks for CUDA 12.9 on amd64
-------------------------------------------------------------------
Wed May 2 14:16:35 UTC 2025 - Friedrich Haubensak <hsk17@mail.de>
- Add openucx-strict-headers.patch and openucx-extern-c.patch from
  upstream and additional openucx-strict-headers-additional.patch
  to build w/ gcc-15 (boo#1241939)
-------------------------------------------------------------------
Tue Apr 1 12:31:11 UTC 2025 - Nicolas Morey <nicolas.morey@suse.com>
- Add UCT-IB-UD-Use-GRH-to-detect-address-family-on-non-Mellanox-hardware.patch
  to fix an UD init issue on non-Mellanox RDMA HW (bsc#1240204).
-------------------------------------------------------------------
Wed Feb 19 15:47:23 UTC 2025 - Nicolas Morey <nicolas.morey@suse.com>
- Update to ucx 1.18.0
- UCP
- Enabled using CUDA staging buffers for pipeline protocols by default
- Added endpoint reconfiguration support for non-reused p2p scenarios
- Enabled non-cacheable memory domains, activated for gdr_copy
- Added user_data parameter to ucp_ep_query
- Added support for host memory pipeline through CUDA buffers for rendezvous protocol
- Added global VA infrastructure and memory region in absence of error handling
- Made protocol performance node names more informative
- Enforced always running on the same thread in single thread mode
- Multiple improvements in protocols selection infrastructure
- Added UCP_MEM_MAP_LOCK API flag to enforce locked memory mapping
- Allowed up-to 64 endpoint lanes for systems with many transports or devices
- Added usage tracker to worker
- Improved various logging messages
- Fixed stack overflow in exported rkey unpack
- Removed extra remote-cpu overhead from protocol estimation for zcopy
- Fixed performance estimation for rndv pipeline protocols
- Fixed ATP sending by picking the correct lane
- Fixed missing reg_id on memh creation
- Fixed repeated invalidations by retaining existing access flags
- Fixed abort reason propagation for rendezvous RTR mtype
- Do not check transport availability if it is disabled by UCX_TLS environment variable
- Fixed wrong flag being used for checking BCOPY capability
- Fixed sending too many ATPs for small messages
- Enforced 16 bits size for Active Messages identifiers
- Fixed unnecessary status check for emulated AMO
- Fixed more than one fragment sending in rendezvous pipeline
- Fixed crash by using biggest max frag across all lanes
- Fixed missing memory handle flags by copying from parent to child
- Fixed worker interface activate count
- Fixed flush requests by replacing ATP/flush lane map with lane indexes
- Fixed lost uct_flags when merging memory regions
- UCT
- Fixed memory domain UCT flags description
- RDMA CORE (IB, ROCE, etc.)
- Added environment variable to manage DC initiator capacity
- Added DC dcs_hybrid policy
- Reduced MLX5/DV stack size consumption
- Added ODP support for verbs and mlx5dv
- Added support of CUDA managed memory on IB when ODP is available
- Added support of Adaptive Routing on RoCE
- Enabled use of implicit ODP with relaxed ordering
- Improved GPU-Direct detection in IB transport
- Increased DC initiator default count to 32 for performance optimization
- Added ConnectX-8 device support with DDP
- Added support for subnet filter list for RoCE interfaces
- Enhanced the error message to provide more details when a connection cannot be established due to unreachable transports
- Added IB MLX5 as a separate UCX module with separate RPM sub-package
- Added initial support for GGA transport, for fast DPU memory access
- Set IB DevX atomic mode based on device capabilities
- Removed DC keepalive mechanism, since the keepalive is done on UCP layer
- Optimized cross-gVMI memory registration using indirect memory keys cache
- Improved various logging messages
- Fixed FETCH_ADD remote access error for ODP/KSM case
- Fixed missing conditional compilation checks for DM
- Fixed IB MD allocation naming typo
- Fixed invalid GIDs filter in IB
- Fixed flags usage in MLX5 zcopy_post
- Do not limit ODP registration retries
- Fixed JUCX failures by considering the number of supported completion vectors
- UCS
- Added support for wildcards in configuration parameter names
- Added ASAN protection to several internal data structures
- Reduced stack usage in topology detection code
- Improved bitmaps configuration parsing with wider bitfield
- Added options to set topology distance between devices
- Optimized VFS unix socket watch by using user private folder
- Added general IP subnet matching infrastructure
- Extended array data structure to support user-provided array copy routine
- Improved time units description
- Fixed a crash by using heap allocation to process expired timers in batch
- Fixed allocation issue on memtrack dump
- Fixed deletion of the monitored folder in VFS
- Fixed unsafe resize for DC initiator array
- Fixed function macro invocation to match C standard
- Fixed calling async handler on already released resource
- Fixed performance by setting higher bandwidth for different NUMA nodes on Grace
- Fixed undeclared value error in timer conversion routine
- Fixed uninitialized value access in registration cache
- UCM
- Extended CUDA memory hooks to include memory mapping APIs
- Fixed race condition in parsing proc maps
- Fixed mremap failure while parsing /proc/self/maps
- TCP
- Always bind endpoint to interface
- Tools
- Improved performance by increasing window size for put_bw and add get_bw in ucx_perftest
- Added multi-send flag for receive operations in bandwidth benchmarks in ucx_perftest
- Improved ucx_perftest uni-directional test with added fence
- Detailed ucx_perftest batch section of command-line documentation
- Fixed buffer size potential overflow in ucx_perftest
- Fixed missing address when packing memory keys on ucx_perftest
- Fixed memory leak for endpoint report in ucx_info
- Fixed build without openmp in ucx_perftest
- Fixed UCT device override on server side on ucx_perftest
- Documentation
- Added a section regarding adaptive routing on RoCE
- Architecture
- Added CPU Model for MI300A
- Added Fujitsu ARM specific values to ucx.conf
- Added AMD Turin support
- Added an optimized non-temporal memory copy implementation for AMD CPU
- Build
- Improved compiler error reporting with added flag
- Improved coverity script to allow faster turnaround time
- Improved Intel Compiler detection and support
- Fixed using correct ASAN version for running tests
- Configuration
- Used POSIX Bourne shell syntax to check equality
- Fixed build failure by using proper flags in compiler.m4
- Fixed perftest MAD support default guessing
- GO
- Added multi-send flag and user memh support in request params
- Added serialized thread mode to avoid subtle races between threads
- Fixed make distcheck
- Packaging
- Improved dpkg-buildpackage sample command by explicitly adding mlx5 related arguments
- Delete UCS-TIME-Add-math.h-to-provide-INFINITY.patch which was merged upstream
- Refresh openucx-s390x-support.patch due to API changes
-------------------------------------------------------------------
Sat Sep 7 14:22:20 UTC 2024 - Nicolas Morey <nicolas.morey@suse.com>
- Refresh openucx-s390x-support.patch to fix compilation on s390x
-------------------------------------------------------------------
Sat Jun 29 16:55:27 UTC 2024 - Andreas Schwab <schwab@suse.de>
- Enable build on riscv64
-------------------------------------------------------------------
Wed Jun 26 15:43:05 UTC 2024 - Nicolas Morey <nicolas.morey@suse.com>
- Update to 1.17.0
- See NEWS for the complete CHANGELOG
- Refresh openucx-s390x-support.patch against the latest sources
- Add upstream fix UCS-TIME-Add-math.h-to-provide-INFINITY.patch to fix compilation on ppc64
-------------------------------------------------------------------
Mon Feb 26 12:49:43 UTC 2024 - Dominique Leuenberger <dimstar@opensuse.org>

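The Configuration notes in the changelog above mention switching equality checks to POSIX Bourne syntax. As a rough illustration of what that kind of change usually looks like (not the actual upstream patch), the sketch below uses the single `=` operator defined by POSIX `test`. The function name `feature_state` and its argument are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Portable string equality: POSIX test(1) defines '=' for comparison.
# '==' is a bash extension and may be rejected by stricter shells such as dash.

# Hypothetical helper: $1 is a configure-style option value ("yes" or "no").
feature_state() {
    # The leading "x" guards against values that are empty or start with '-'.
    if [ "x$1" = "xyes" ]; then
        echo "enabled"
    else
        echo "disabled"
    fi
}

feature_state yes
feature_state no
```

Because the script only relies on `[ ... = ... ]`, it should behave the same whether `/bin/sh` is bash, dash, or another POSIX shell.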

@@ -1,7 +1,7 @@
#
# spec file for package openucx
#
# Copyright (c) 2023 SUSE LLC
# Copyright (c) 2025 SUSE LLC
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
@@ -20,7 +20,7 @@
%define version_suf %{nil}
Name: openucx
Version: 1.15.0
Version: 1.19.1
Release: 0
Summary: Communication layer for Message Passing (MPI)
License: BSD-3-Clause
@@ -30,8 +30,11 @@ URL: http://openucx.org/
#Git-Clone: git://github.com/openucx/ucx
#Git-Web: https://github.com/openucx/ucx
Source: https://github.com/openucx/ucx/releases/download/v%version%{?version_suf}/ucx-%version.tar.gz
Source100: README.md
Patch1: openucx-s390x-support.patch
Patch2: ucm-fix-UCX_MEM_MALLOC_RELOC.patch
Patch3: UCP-CORE-Fix-config-type-for-dynamic_tl_progress_factor.patch
Patch4: UCT-SELF-Fix-config-type-for-num_devices.patch
BuildRequires: autoconf >= 2.63
BuildRequires: automake >= 1.10
BuildRequires: binutils-devel
@@ -48,7 +51,7 @@ BuildRequires: libtool
BuildRequires: pkg-config
BuildRequires: zlib-devel
BuildRoot: %{_tmppath}/%{name}-%{version}-build
ExclusiveArch: aarch64 %power64 x86_64 s390x
ExclusiveArch: aarch64 %power64 x86_64 s390x riscv64
%description
UCX stands for Unified Communication X. UCX provides a communication
@@ -136,10 +139,7 @@ hardware.
%prep
%setup -qn ucx-%version
%ifarch s390x
%patch -P 1
%endif
%patch -P 2
%autopatch -p0
%build
autoreconf -fi
@@ -160,7 +160,8 @@ export UCX_CFLAGS="$UCX_CFLAGS -mno-sse -mno-sse2"
--disable-debug --disable-assertions \
--disable-params-check \
--with-rc --with-ud --with-dc \
--with-mlx5-dv --with-rdmacm
--with-ib-hw-tm --with-dm --with-devx \
--with-mlx5 --with-rdmacm
# Override BASE_CFLAGS to disable Werror (boo#1121267)
make %{?_smp_mflags} V=1 BASE_CFLAGS="-g -Wall"
@@ -192,6 +193,8 @@ mv %buildroot/%_bindir/io_demo %buildroot/%_libexecdir/%{name}/
%_libdir/pkgconfig/ucx.pc
%dir %_libdir/cmake/
%_libdir/cmake/ucx/
%dir %{_sysconfdir}/ucx/
%config %{_sysconfdir}/ucx/ucx.conf
%license LICENSE
%doc NEWS
@@ -230,6 +233,7 @@ mv %buildroot/%_bindir/io_demo %buildroot/%_libexecdir/%{name}/
%_libdir/libuct.so.*
%dir %_libdir/ucx/
%_libdir/ucx/libuct_*.so.*
%_libdir/ucx/libucx_perftest_mad.so.*
%files -n libuct-devel
%defattr(-,root,root)
@@ -237,9 +241,12 @@ mv %buildroot/%_bindir/io_demo %buildroot/%_libexecdir/%{name}/
%_libdir/libuct.so
%dir %_libdir/ucx/
%_libdir/ucx/libuct_*.so
%_libdir/ucx/libucx_perftest_mad.so
%_libdir/pkgconfig/ucx-uct.pc
%_libdir/pkgconfig/ucx-cma.pc
%_libdir/pkgconfig/ucx-ib.pc
%_libdir/pkgconfig/ucx-ib-efa.pc
%_libdir/pkgconfig/ucx-ib-mlx5.pc
%_libdir/pkgconfig/ucx-rdmacm.pc
%changelog

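The spec diff above changes the configure flags to `--with-ib-hw-tm --with-dm --with-devx --with-mlx5`. One way to check that an installed build picked them up is sketched below. It assumes that `ucx_info -v` prints the configure line of the installed library; the helper `has_configure_flag` is a hypothetical name introduced for this example, not part of UCX or of the package.

```shell
#!/bin/sh
# Hypothetical helper: reads text on stdin and succeeds only if it contains
# the exact flag given as $1 (so --with-mlx5 does not match --with-mlx5-dv).
has_configure_flag() {
    # Split on spaces and tabs so each flag sits on its own line,
    # then require a whole-line, fixed-string match.
    tr -s ' \t' '\n\n' | grep -qxF -e "$1"
}

# Only query the library if ucx_info is actually installed.
if command -v ucx_info >/dev/null 2>&1; then
    for flag in --with-ib-hw-tm --with-dm --with-devx --with-mlx5; do
        # Assumption: "ucx_info -v" output includes the configure flags.
        if ucx_info -v | has_configure_flag "$flag"; then
            echo "$flag: present"
        else
            echo "$flag: missing"
        fi
    done
else
    echo "ucx_info not found, skipping check"
fi
```

The exact-token match matters here because the old flag `--with-mlx5-dv` contains the new flag `--with-mlx5` as a prefix, so a plain substring search could report a false positive on an older build.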
Binary file not shown.

ucx-1.19.1.tar.gz LFS Normal file

Binary file not shown.