* devel/main:
Update to v1.19.1
Add patches to fix a badly initialized value in settings
Fix a badly initialized value in settings
Minor fixes to openucx-s390x-support.patch
Add Gitea build results
- Update to ucx 1.19.0
- UCP
- Enabled multi-GPU support within a single process
- Added dynamic selection between strong and weak fences in RMA flush operations
- Improved endpoint reconfiguration capabilities
- Added All2All lane selection for multi-NIC-GPU systems
- Improved rkey debug info when config cache limit is reached
- Improved UCP protocol selection based on available memory types
- Removed dummy memory key from irrelevant transports (TCP, CMA and CUDA)
- Improved RNDV performance with device-local staging buffers
- Enabled error handling for RMA get_offload protocols
- Made UCX_TLS=^ib disable all transports including auxiliary
- Fixed send request status handling
- Fixed performance degradation in RNDV by optimizing md cache updates
- Fixed protocol selection when first lane is filtered out by fragment size
- Fixed rkey selection by using memory registration flag
- UCT
- Defined uct_rkey_unpack_v2 API to support passing sys-dev
- RDMA CORE (IB, ROCE, etc.)
- Added SRD transport support in EFA with reordering, AM, and control operations
- Removed XGVMI BF2 support (umem)
- Removed device memory indirect key
- Fixed VFS objects for DCIs and pools
- Added routing table cache to the reachability check
- Fixed strict order usage in IB auxiliary rkeys
- Improved various init logging messages
- Improved reliability of DC transport by adding DCI validation and separating connection logic
- Fixed segfault in DC fence operation
- UCS
- Removed compilation warnings
- Update to ucx 1.18.1
- CUDA
- Added config keys to update cuda_copy bandwidth for coherent platforms
- Improved cache invalidation of memory allocated using CUDA memory pool
- AZP
- Added Ubuntu 24.04 to build and release pipeline
- UCP
- Fixed assertion failure when maximum lane fragment is smaller than AM header
- Fixed potential active message user header use after free with protocol reconfiguration
- CUDA
- Fixed registration of CUDA Fabric memory allocated by UCT
- Fixed VA recycling check of memory allocated using VMM and CUDA memory pool
- RDMA CORE (IB, ROCE, etc.)
- Do not use ConnectX-8 SMI subdevices for communication
- Fixed remote access error by disabling ODP when the device supports DDP
- Fixed configuration logic by disabling DDP when AR is disabled
- UCM
- Fixed crash with bistro hooks for CUDA 12.9 on amd64
Add patches to fix gcc-15 compile errors (boo#1241939)
- Add UCT-IB-UD-Use-GRH-to-detect-address-family-on-non-Mellanox-hardware.patch to fix a UD init issue on non-Mellanox RDMA HW (bsc#1240204).
Accepting request 1247273 from home:NMorey:branches:science:HPC
Accepting request 1247161 from home:NMorey:branches:science:HPC
Accepting request 1199375 from home:NMorey:branches:science:HPC
Accepting request 1184022 from openSUSE:Factory:RISCV
Accepting request 1183477 from home:NMorey:branches:science:HPC
Signed-off-by: Nicolas Morey <nmorey@suse.com>
- Features
- UCP
- Do not require transport memory support if rendezvous protocol is not used
- Build
- Added CUDA 13 support to the release pipeline
- Added Rocky OS support to the release pipeline
- Bugfixes
- UCS
- Fixed Netlink fetch mechanism
Signed-off-by: Nicolas Morey <nmorey@suse.com>
OBS-URL: https://build.opensuse.org/package/show/science:HPC/openucx?expand=0&rev=82
OBS-URL: https://build.opensuse.org/package/show/science:HPC/openucx?expand=0&rev=80
- Update to ucx 1.18.0
- UCP
- Enabled using CUDA staging buffers for pipeline protocols by default
- Added endpoint reconfiguration support for non-reused p2p scenarios
- Enabled non-cacheable memory domains, activated for gdr_copy
- Added user_data parameter to ucp_ep_query
- Added support for host memory pipeline through CUDA buffers for rendezvous protocol
- Added global VA infrastructure and memory region in absence of error handling
- Made protocol performance node names more informative
- Enforced always running on the same thread in single thread mode
- Multiple improvements in protocols selection infrastructure
- Added UCP_MEM_MAP_LOCK API flag to enforce locked memory mapping
- Allowed up to 64 endpoint lanes for systems with many transports or devices
- Added usage tracker to worker
- Improved various logging messages
- Fixed stack overflow in exported rkey unpack
- Removed extra remote-cpu overhead from protocol estimation for zcopy
- Fixed performance estimation for rndv pipeline protocols
- Fixed ATP sending by picking the correct lane
- Fixed missing reg_id on memh creation
- Fixed repeated invalidations by retaining existing access flags
- Fixed abort reason propagation for rendezvous RTR mtype
- Do not check transport availability if it is disabled by UCX_TLS environment variable
- Fixed wrong flag being used for checking BCOPY capability
- Fixed sending too many ATPs for small messages
- Enforced 16-bit size for Active Message identifiers
- Fixed unnecessary status check for emulated AMO
- Fixed more than one fragment sending in rendezvous pipeline
- Fixed crash by using biggest max frag across all lanes
- Fixed missing memory handle flags by copying from parent to child
OBS-URL: https://build.opensuse.org/request/show/1247274
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/openucx?expand=0&rev=33
OBS-URL: https://build.opensuse.org/request/show/1247161
OBS-URL: https://build.opensuse.org/package/show/science:HPC/openucx?expand=0&rev=73
- Update to 1.15.0
- UCP
- Added 2-stage pipeline protocol in the new protocol infrastructure
- Added reset and abort functionality of rendezvous protocols in the new infrastructure
- Added zero-copy rendezvous data send protocol in the new infrastructure
- Added support for user memory handle in the new protocol infrastructure
- Added option to force ODP registration for certain memory types
- Enabled lock free memory region deregistration
- Updated allow/deny transport list feature to control auxiliary transport selection
- Multiple performance improvements of the new protocol infrastructure
- Multiple improvements in error and debug messages
- Fixed assertion when sending from non-contiguous GPU buffer to managed buffer
- Fixed the race condition on endpoint configurations
- Fixed endpoint reconfiguration issues due to asymmetrical selection
- Fixed endpoint reconfiguration error due to wrong locality detection
- Fixed crash during connection manager cleanup
- Fixed rkey index calculation for rendezvous protocol
- Fixed rcache dump function
- Removed logging from rkey unpack in release mode
- Fixed double free of rkey in rendezvous protocol
- Fixed rendezvous pipeline protocol error flow
- Fixed error handling in rendezvous get zcopy protocol
- Replay pending requests of wireup EP CM during connection establishment to prevent potential ordering issues and wrong configuration
- Pass user-provided memory type to the function that checks whether the buffer can be sent inline or not
- Avoid memory registration during UCP context initialization
- Fixed CPU/device atomics selection in the new protocol infrastructure
- Multiple fixes in the new protocol infrastructure information output
OBS-URL: https://build.opensuse.org/request/show/1115979
OBS-URL: https://build.opensuse.org/package/show/science:HPC/openucx?expand=0&rev=63
- Update to v1.14.1
- Fixed ROCm to prevent the locking of host pinned memory
- Added CUDA 12 based UCX builds to the release flow
- Increased the maximal number of endpoint configurations
- Fixed filter for slow lanes in selection logic
- Fixed TCP transport bandwidth calculation
- Fixed device detection for ROCm
- Fixed compatibility with CUDA 12
- Fixed rendezvous threshold for multi-path configurations
- Fixed error message in case of static link
- Fixed BlueField-3 detection
- Multiple fixes for Azure CI pipeline
OBS-URL: https://build.opensuse.org/request/show/1100640
OBS-URL: https://build.opensuse.org/package/show/science:HPC/openucx?expand=0&rev=61
- Update to v1.14.0
- UCP
- Added API for querying transport and device names on endpoint
- Added API for querying datatype object
- Added API for exporting and importing memory keys (no implementation yet)
- Added support for non-persistent active message header
- Added infrastructure to print protocols v2 performance
- Multiple performance improvements for protocols v2
- Added support for non-contiguous datatypes for rendezvous protocols v2
- Added support for reset and abort request in protocols v2
- Added support for user memory handles in RMA API
- Added multi-rail support for RMA API in protocols v2
- Added support for up to 16 different lanes per endpoint
- Added support for dmabuf memory registration in protocols v2
- Added strong fence mode for ucp_worker_fence() API
- UCT
- Added new uct_md_mem_attach() API to support exported memory handles
- Added remote completion mode for endpoint flush (via new flag)
- Added support for dmabuf registration
- Added new uct_ep_connect_to_ep_v2() API
- Added new uct_mem_reg_v2() API
- Added new uct_md_query_v2() API
- Added support for IPv6 loopback address in TCP transport
- RDMA CORE (IB, ROCE, etc.)
- Added ECE (enhanced connection establishment) support for RC and DC transports
- Added support for hardware DCS in DC transport
- Added UD interface and endpoint resource information to VFS
- Added CQ creation via DEVX API
- Removed support for accelerated IB transports over legacy experimental verbs
- UCS
- Added support for auto-correction of user environment variables
- UCM
- Implemented CUDA bistro hooks for aarch64 (to enable memory cache on this platform)
- Added support for CUDA virtual/stream-ordered memory with cudaMallocAsync
- Documentation
- Added FAQ for using pkg-config tool to build applications with UCX
- Tools
- Added runtime library version to the 'ucx_info -v' output
- Added support for memory types in ucx_info
- Many bugfixes. See NEWS.
- Drop patches merged upstream:
- UCS-DEBUG-replace-PTR-with-void.patch
- gcc13-fix.patch
- Refresh openucx-s390x-support.patch
OBS-URL: https://build.opensuse.org/request/show/1075600
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/openucx?expand=0&rev=26
OBS-URL: https://build.opensuse.org/request/show/1075167
OBS-URL: https://build.opensuse.org/package/show/science:HPC/openucx?expand=0&rev=57