-------------------------------------------------------------------
Mon Mar 6 12:18:52 UTC 2023 - Martin Liška

- Add upstream gcc13-fix.patch fix.

-------------------------------------------------------------------
Mon Jan 16 09:45:05 UTC 2023 - Andreas Schwab

- openucx-s390x-support.patch: fix use of clz builtin for 64-bit value

-------------------------------------------------------------------
Tue Oct 4 16:39:30 UTC 2022 - Nicolas Morey-Chaisemartin

- Update openucx-s390x-support.patch to add missing ucs_ffs32 on s390x
- Drop baselibs.conf as openucx only works on 64b systems

-------------------------------------------------------------------
Tue Sep 27 15:55:19 UTC 2022 - Nicolas Morey-Chaisemartin

- Update to v1.13.1 (jsc#PED-912)
  - Core
    - Added new objects to VFS: local and remote address of endpoint, statistics of ucp_ep_create success/failure, failed/destroyed endpoints
    - Added support for UCX static libraries
    - Added profiling for rkey management routines
    - PCIe relaxed order enabled by default for AMD CPUs
    - Fixed not deallocating memory from ucp_mem_unmap if no rcache
    - Fixed versioning infrastructure
    - Multiple code improvements: refactoring, debug prints and assertions, etc.
    - Multiple improvements in build, test and docs infrastructure
    - Added new objects to VFS (md, component, log_level, etc.)
    - Added configuration variable to specify which loadable modules are allowed
    - Added build-time configuration to disable sigaction overriding
  - UCP
    - Added API to pass pre-registered memory handle to UCP operations
    - Added implementation of AM rendezvous protocol
    - Added 2-stage pipeline rendezvous protocol for GPU
    - Added support for fragment mem_type for v1 pipeline proto, disabled by default
    - Added active message support for proto v2
    - Added UCP memory registration cache
    - Improved adaptive progress - deactivate iface when all p2p lanes are destroyed
    - Added support for user memh in proto_v1
    - Added support for selecting local address when creating a client endpoint
    - Added option to limit GPUDirectRDMA size in rendezvous protocol, UCX_RNDV_MEMTYPE_DIRECT_SIZE
    - Deprecated UCX_SOCKADDR_AUX_TLS configuration parameter
    - Resolving remote EP ID when creating local EP disabled by default
    - Added client_id to ucp_worker_create() and ucp_conn_request_query() APIs
    - Added ucp_worker_address_query() API
    - Updated ucp_ep_query() API for getting local and remote addresses
    - Added address versioning to correctly preserve wire compatibility starting from version 1.11.0
    - Added new client/server connection establishment packet header format
    - Enabled rendezvous and tag sync protocols when error handling is enabled on the endpoint
    - Added iov zcopy support to RMA operations
    - Reduced memory usage of unexpected messages by fitting receive buffer size to packet size
    - Added support for modifying UCT and UCS configs by ucp_config_modify() API
    - Optimized unpacked rkeys memory consumption
    - Added request flag to influence latency vs. bandwidth protocol
    - Reduced memory management overhead with new protocols
    - Improved performance calculations for new protocols
    - Added AMO support with GPU memory target using new protocols
    - Added put_zcopy, get_zcopy and pipeline based rendezvous in new protocols
    - Added support for user-defined alignment in Active Messages
    - Added support for offload tag sync in new protocols
    - Updated ucp_atomic_post() to use NBX flow
  - UCT
    - Introduced API uct_md_mkey_pack_v2
    - Introduced UCT iface features API
    - Introduced max_inflight_eps parameter in perf_attr API
    - Introduced UCT_SEND_FLAG_PEER_CHECK flag that forces checking connectivity to a peer
    - Introduced UCX_RCACHE_PURGE_ON_FORK to enable/disable cleaning regions when application is forking
    - Disabled PEER_FAILURE capability for XPMEM
    - Added API - uct_iface_is_reachable_v2()
    - Added IPv6 address support in TCP
    - Added latency estimation to uct_iface_estimate_perf()
    - Adjusted knem and cma overhead cost
    - Increased built-in TCP keep-alive interval to 2 seconds
  - RDMA CORE (IB, ROCE, etc.)
    - Introduced NDR autorecognition
    - Introduced CQE zipping support
    - Set the default MAX_RD_ATOMIC to maximum value supported by the hardware
    - Disabled mlx5 ifaces on verbs MD
    - Added detection of IB NDR devices
    - Added check for CQ overrun in assert mode
    - Added bitmap usage for releasing detached DCIs
    - Added configuration for requests ack frequency with DevX
    - Added remote QP info to tx error CQE traces
  - ROCM
    - Increased maximum number of HSA agents
  - UCS
    - Added topo module infrastructure
    - Added memtrack and rcache information to VFS
    - Added API for a per-process aggregate-sum statistics report
    - Added memory pool set data structure
    - Added new ptr_array API for bulk allocation
    - Added ucs_string_buffer_append_flags() for string buffer
    - Added ucs_ffs32()
    - Added ucs_vsnprintf_safe() which always adds '\0'
    - Added thread-safe put to ptr_map
    - Improved accuracy of the topology distance estimation
    - Added prints of leaked callbacks from the callback queue
    - Removed a diagnostic message when fuse thread is stopped
    - Added configurable limit for the memory consumed by rcache
    - Added configuration for VFS(FUSE) thread affinity
    - Added memory limit support to memtrack
  - Packaging
    - Added cmake config files for better integration with external cmake based projects
  - Tools
    - Added loop-back transport support in ucx_perftest
    - Split ucx_perftest into separate modules
    - Added process placement option for ucx_info
    - Extended parameters correctness check in ucx_perftest
- Backported UCS-DEBUG-replace-PTR-with-void.patch from upstream to fix compilation

-------------------------------------------------------------------
Thu Jan 13 08:42:05 UTC 2022 - Nicolas Morey-Chaisemartin

- Fix UCM bistro support on non s390x archs
- Add ucm-fix-UCX_MEM_MALLOC_RELOC.patch to disable malloc relocations by default (bsc#1194369)

-------------------------------------------------------------------
Thu Sep 23 07:35:57 UTC 2021 - Nicolas Morey-Chaisemartin

- Update to v1.11.1 (jsc#SLE-19260)

-------------------------------------------------------------------
Wed Feb 24 16:34:54 UTC 2021 - Nicolas Morey-Chaisemartin

- Update openucx-s390x-support.patch to fix mmap syscall on s390x (bsc#1182691)
- Core:
  - Added support for UCX monitoring using virtual file system (VFS)/FUSE
  - Added support for applications with static CUDA runtime linking
  - Added support for a configuration file
  - Updated clang format configuration
- UCP
  - Added rendezvous API for active messages
  - Added user-defined name to context, worker, and endpoint objects
  - Added flag to silence request leak check
  - Added API for endpoint performance evaluation
  - Added API - ucp_request_query
  - Added API - ucp_lib_query
  - Added bandwidth optimizations for new protocols multi-lane
  - Added support for multi-rail over lanes with BW ratio >= 1/4
  - Added support for tracking outstanding requests and aborting those in case of connection failure
  - Refactored keep-alive protocol
  - Added device id to wireup protocol
  - Added support for up to 128 transport layer resources in UCP context
  - Added support for CUDA memory allocations with ucp_mem_map
  - Increased UCP_WORKER_MAX_EP_CONFIG to 64
  - Adjusted memory type zcopy threshold when UCX_ZCOPY_THRESH set
  - Refactored wireup protocols, rendezvous, get, zcopy protocols
  - Added put zcopy multi-rail
  - Improved logging for new protocols
  - Added system topology information
  - Added new protocols for eager offload protocols
- UCT
  - Extended connection establishment API
  - Added active message AM alignment in iface params
  - Added active message short IOV API.
  - Added support for interface query by operation and memory type
  - Added API to get allocation base address and length
  - Added md_dereg_v2 API
- UCS
  - Added log filter by source file name.
  - Added checking for last element in fraglist queue
  - Added a method to get IP address from sockaddr.
  - Added memory usage limits to registration cache
- RDMA CORE (IB, ROCE, etc.)
  - Added report of QP info in case of completion with error
  - Refactored FC send operations
  - Added support for DevX unique QPN allocation
  - Optimized endpoint lookup for DCI
  - Added support for RDMA sub-function (SF)
  - Added support for DCI via DEVX
  - Added DCI pool per LAG port
  - Added support for RoCE IP reachability check using a subnet mask
  - Added active message short IOV for UD/DC/RC mlx, UD/RC verbs
  - Added endpoint keep alive check for UD
  - Suppressed warning if device can't be opened
  - Added support for multiple flush cancel without completion
  - Added ignore for devices with invalid GID
  - Added support for SRQ linked list reordering
  - Added flush by flow control on old devices
  - Added support for configurable rdma_resolve_addr/route timeout
- Shared memory
  - Added active message short IOV support for posix, sysv, and self transports
- TCP
  - Added support for peer failure in case of CONNECT_TO_EP
  - Added support for active message short IOV
- See NEWS for a complete changelog and bug fixes
- Refresh openucx-s390x-support against latest sources

-------------------------------------------------------------------
Mon Oct 5 13:21:34 UTC 2020 - Nicolas Morey-Chaisemartin

- Update to v1.9.0 (jsc#SLE-15163)
  - Features:
    - Added a new class of communication APIs '*_nbx' that enable API extendability while preserving ABI backward compatibility
    - Added asynchronous event support to UCT/IB/DEVX
    - Added support for latest CUDA library version
    - Added NAK-based reliability protocol for UCT/IB/UD to optimize resends
    - Added new tests for ROCm
    - Added new configuration parameters for protocol selection
    - Added performance optimization for Fujitsu A64FX with InfiniBand
    - Added performance optimization for clear cache code aarch64
    - Added support for relaxed-order PCIe access in IB RDMA transports
    - Added new TCP connection manager
    - Added support for UCT/IB PKey with partial membership in IB transports
    - Added support for RoCE LAG
    - Added support for ROCm 3.7 and above
    - Added flow control for RDMA read operations
    - Improved endpoint flush implementation for UCT/IB
    - Improved UD timer to avoid interrupting the main thread when not in use
    - Improved latency estimation for network path with CUDA
    - Improved error reporting messages
    - Improved performance in active message flow (removed malloc call)
    - Improved performance in ptr_array flow
    - Improved performance in UCT/SM progress engine flow
    - Improved I/O demo code
    - Improved rendezvous protocol for CUDA
    - Updated examples code
  - Bugfixes:
    - Fixes for most recent versions of GCC, CLANG, ARMCLANG, PGI
    - Fixes in UCT/IB for strict order keys
    - Fixes in memory barrier code for aarch64
    - Fixes in UCT/IB/DEVX for fork system call
    - Fixes in UCT/IB for rand() call in rdma-core
    - Fixes in group rescheduling for UCT/IB/DC
    - Fixes in UCT/CUDA bandwidth reporting
    - Fixes in rkey_ptr protocol
    - Fixes in lane selection for rendezvous protocol based on get-zero-copy flow
    - Fixes for ROCm build
    - Fixes for XPMEM transport
    - Fixes in closing endpoint code
    - Fixes in RDMACM code
    - Fixes in memcpy selection for AMD
    - Fixes in UCT/UD endpoint flush functionality
    - Fixes in XPMEM detection
    - Fixes in rendezvous staging protocol
    - Fixes in ROCEv1 mlx5 UDP source port configuration
    - Multiple fixes in RPM spec file
    - Multiple fixes in UCP documentation
    - Multiple fixes in socket connection manager
    - Multiple fixes in gtest
    - Multiple fixes in JAVA API implementation
- Refresh openucx-s390x-support.patch against new version

-------------------------------------------------------------------
Mon Jul 13 08:19:45 UTC 2020 - Nicolas Morey-Chaisemartin

- Update to v1.8.1
  - Features:
    - Added binary release pipeline in Azure CI
  - Bugfixes:
    - Multiple fixes in testing environment
    - Fixes in InfiniBand DEVX transport
    - Fixes in memory management for CUDA IPC transport
    - Fixes for binutils 2.34+
    - Fixes for AMD ROCM build environment

-------------------------------------------------------------------
Fri Jun 5 09:38:40 UTC 2020 - Jan Engelhardt

- Trim bias and filler wording from descriptions.

-------------------------------------------------------------------
Thu Jun 4 08:18:26 UTC 2020 - Nicolas Morey-Chaisemartin

- Update to v1.8.0
  - Features:
    - Improved detection for DEVX support
    - Improved TCP scalability
    - Added support for ROCM to perftest
    - Added support for different source and target memory types to perftest
    - Added optimized memcpy for ROCM devices
    - Added hardware tag-matching for CUDA buffers
    - Added support for CUDA and ROCM managed memories
    - Added support for client/server disconnect protocol over rdma connection manager
    - Added support for striding receive queue for hardware tag-matching
    - Added XPMEM-based rendezvous protocol for shared memory
    - Added support for shared memory communication between containers on same machine
    - Added support for multi-threaded RDMA memory registration for large regions
    - Added new test cases to Azure CI
    - Added support for multiple listening transports
    - Added UCT socket-based connection manager transport
    - Updated API for UCT component management
    - Added API to retrieve the listening port
    - Added UCP active message API
    - Removed deprecated API for querying UCT memory domains
    - Refactored server/client examples
    - Added support for dlopen interception in UCM
    - Added support for PCIe atomics
    - Updated Java API: added support for most of UCP layer operations
    - Updated support for Mellanox DevX API
    - Added multiple UCT/TCP transport performance optimizations
    - Optimized memcpy() for Intel platforms
    - Added protection from non-UCX socket based app connections
    - Improved search time for PKEY object
    - Enabled gtest over IPv6 interfaces
    - Updated Mellanox and Bull device IDs
    - Added support for CUDA_VISIBLE_DEVICES
    - Increased limits for CUDA IPC registration
  - Bugfixes:
    - Multiple fixes in JUCX
    - Fixes in UCP thread safety
    - Fixes for most recent versions of GCC, PGI, and ICC
    - Fixes for CPU affinity on Azure instances
    - Fixes in XPMEM support on PPC64
    - Performance fixes in CUDA IPC
    - Fixes in RDMA CM flows
    - Multiple fixes in TCP transport
    - Multiple fixes in documentation
    - Fixes in transport lane selection logic
    - Fixes in Java jar build
    - Fixes in socket connection manager for Nvidia DGX-2 platform
    - Multiple fixes in UCP, UCT, UCM libraries
    - Multiple fixes for BSD and Mac OS systems
    - Fixes for Clang compiler
    - Fix CPU optimization configuration options
    - Fix JUCX build on GPU nodes
    - Fix in Azure release pipeline flow
    - Fix in CUDA memory hooks management
    - Fix in GPU memory peer direct gtest
    - Fix in TCP connection establishment flow
    - Fix in GPU IPC check
    - Fix in CUDA Jenkins test flow
    - Multiple fixes in CUDA IPC flow
    - Fix adding missing header files
    - Fix to prevent failures in presence of VPN enabled Ethernet interfaces
- Refresh openucx-s390x-support.patch against new version

-------------------------------------------------------------------
Fri Oct 4 08:11:49 UTC 2019 - Jan Engelhardt

- Ensure /usr/lib/ucx is owned at all times.

-------------------------------------------------------------------
Wed Sep 18 10:16:05 UTC 2019 - Nicolas Morey-Chaisemartin

- Update to v1.6.0
  - Features:
    - Modular architecture for UCT transports
    - ROCm transport re-design: support for managed memory, direct copy, ROCm GDR
    - Random scheduling policy for DC transport
    - Optimized out-of-box settings for multi-rail
    - Added support for OmniPath (using Verbs)
    - Support for PCI atomics with IB transports
    - Reduced UCP address size for homogeneous environments
  - Bugfixes:
    - Multiple stability and performance improvements in TCP transport
    - Multiple stability fixes in Verbs and MLX5 transports
    - Multiple stability fixes in UCM memory hooks
    - Multiple stability fixes in UGNI transport
    - RPM Spec file cleanup
    - Fixing compilation issues with most recent clang and gcc compilers
    - Fixing the wrong name of aliases
    - Fix data race in UCP wireup
    - Fix segfault when libuct.so is reloaded - issue #3558
    - Include Java sources in distribution
    - Handle EADDRNOTAVAIL in rdma_cm connection manager
    - Disable ibcm on RHEL7+ by default
    - Fix data race in UCP proxy endpoint
    - Static checker fixes
    - Fall back to ibv_create_cq() if ibv_create_cq_ex() returns ENOSYS
    - Fix malloc hooks test
    - Fix checking return status in ucp_client_server example
    - Fix gdrcopy libdir config value
    - Fix printing atomic capabilities in ucx_info
    - Fix perftest warmup iterations to be non-zero
    - Fixing default values for configure logic
    - Fix race condition updating fired_events from multiple threads
    - Fix madvise() hook
- Refresh openucx-s390x-support.patch against new version

-------------------------------------------------------------------
Wed May 15 05:52:55 UTC 2019 - Nicolas Morey-Chaisemartin

- Disable Werror to handle boo#1121267

-------------------------------------------------------------------
Mon Feb 25 07:56:39 UTC 2019 - nmorey

- Update openucx-s390x-support.patch to fix support of 1.5.0 on s390x (bsc#1121267)
- Add baselibs.conf for ppc

-------------------------------------------------------------------
Fri Feb 22 12:11:57 UTC 2019 - Martin Liška

- Update to v1.5.0 (bsc#1121267)
  * Features:
    * New emulation mode enabling full UCX functionality (Atomic, Put, Get) over TCP and RDMA-CORE interconnects which don't implement full RDMA semantics
    * Non-blocking API for all one-sided operations. All blocking communication APIs marked as deprecated
    * New client/server connection establishment API, which allows connected handover between workers
    * Support for rdma-core direct-verbs (DEVX) and DC with mlx5 transports
    * GPU - Support for stream API and receive side pipelining
    * Malloc hooks using binary instrumentation instead of symbol override
    * Statistics for UCT tag API
    * GPU-to-Infiniband HCA affinity support based on locality/distance (PCIe)
  * Bugfixes:
    * Fix overflow in RC/DC flush operations
    * Update description in SPEC file and README
    * Fix RoCE source port for dc_mlx5 flow control
    * Improve ucx_info help message
    * Fix segfault in UCP, due to int truncation in count_one_bits()
    * Multiple other bugfixes (full list on github)
  * Tested configurations:
    * InfiniBand: MLNX_OFED 4.4-4.5, distribution inbox drivers, rdma-core
    * CUDA: gdrcopy 1.2, cuda 9.1.85
    * XPMEM: 2.6.2
    * KNEM: 1.1.2

-------------------------------------------------------------------
Tue Nov 6 07:18:34 UTC 2018 - nmoreychaisemartin@suse.com

- Update to v1.4.0 (bsc#1103494)
  * Features:
    * Improved support for installation with latest ROCm
    * Improved support for latest rdma-core
    * Added support for CUDA IPC for intra-node GPU, CUDA memory allocation cache for mem-type detection, latest Mellanox devices, Nvidia GPU managed memory, multiple connections between the same pair of workers, large worker address for client/server connection establishment and INADDR_ANY, and for bitwise atomics operations.
  * Bugfixes:
    * Performance fixes for rendezvous protocol
    * Memory hook fixes
    * Clang support fixes
    * Self tl multi-rail fix
    * Thread safety fixes in IB/RDMA transport
    * Compilation fixes with upstream rdma-core
    * Multiple minor bugfixes (full list on github)
    * Segfault fix for code generated by the armclang compiler
    * UCP memory-domain index fix for zero-copy active messages

-------------------------------------------------------------------
Mon Oct 15 07:51:12 UTC 2018 - nmoreychaisemartin@suse.com

- Update to v1.3.1 (fate#325996)
  - Prevent potential out-of-order sending in shared memory active messages
  - CUDA: Include cudamem.h in source tarball, pass cudaFree memory size
  - Registration cache: fix large range lookup, handle shmat(REMAP)/mmap(FIXED)
  - Limit IB CQE size for specific ARM boards

-------------------------------------------------------------------
Thu Aug 9 05:57:24 UTC 2018 - nmoreychaisemartin@suse.com

- Update to v1.3.0 (bsc#1104159)
  - Added stream-based communication API to UCP
  - Added support for GPU platforms: Nvidia CUDA and AMD ROCM software stacks
  - Added API for client/server based connection establishment
  - Added support for TCP transport
  - Support for InfiniBand tag-matching offload for DC and accelerated transports
  - Multi-rail support for eager and rendezvous protocols
  - Added support for tag-matching communications with CUDA buffers
  - Added ucp_rkey_ptr() to obtain pointer for shared memory region
  - Avoid progress overhead on unused transports
  - Improved scalability of software tag-matching by using a hash table
  - Added transparent huge-pages allocator
  - Added non-blocking flush and disconnect for UCP
  - Support fixed-address memory allocation via ucp_mem_map()
  - Added ucp_tag_send_nbr() API to avoid send request allocation
  - Support global addressing in all IB transports
  - Add support for external epoll fd and edge-triggered events
  - Added registration cache for knem
  - Initial support for Java bindings
  - Multiple bugfixes (full list on github)
- Drop UCT-UD-fixed-compilation-by-gcc8.patch as it was fixed upstream
- Refresh openucx-s390x-support.patch against latest sources

-------------------------------------------------------------------
Wed Jun 13 12:45:34 UTC 2018 - nmoreychaisemartin@suse.com

- Remove libnuma-devel on s390x for older releases

-------------------------------------------------------------------
Tue Mar 27 07:12:37 UTC 2018 - nmoreychaisemartin@suse.com

- Add UCT-UD-fixed-compilation-by-gcc8.patch to fix compilation with GCC8 (bsc#1084635)

-------------------------------------------------------------------
Sat Jan 20 15:40:43 UTC 2018 - jengelh@inai.de

- Use right documentation path.

-------------------------------------------------------------------
Fri Jan 19 10:12:04 UTC 2018 - nmoreychaisemartin@suse.com

- Update to 1.2.2
  - Support including UCX API headers from C++ code
  - UD transport to handle unicast flood on RoCE fabric
  - Compilation fixes for gcc 7.1.1, clang 3.6, clang 5
  - When UD transport is used with RoCE, packets intended for other peers may arrive on different adapters (as a result of unicast flooding).
  - This change adds packet filtering based on destination GIDs. A packet is now silently dropped if its destination GID does not match the local GID.
  - Added a new device ID for InfiniBand HCA

-------------------------------------------------------------------
Fri Dec 8 21:19:11 UTC 2017 - dimstar@opensuse.org

- Drop doxygen BuildRequires: The documentation was already not built with this enabled. Removing the BR causes no regression in the package but eliminates a build cycle boost -> curl -> doxygen -> openucx -> boost

-------------------------------------------------------------------
Tue Sep 19 13:52:13 UTC 2017 - jengelh@inai.de

- Rediff openucx-s390x-support.patch as p1 to be in line with potential git-generated patches.

-------------------------------------------------------------------
Tue Sep 19 09:26:07 UTC 2017 - nmoreychaisemartin@suse.com

- Switch to version 1.2.1 (Fate#324050)
  Previous 1.3+ version was based on a development branch.
  Supported platforms:
  - Shared memory: KNEM, CMA, XPMEM, SYSV, Posix
  - VERBs over InfiniBand and RoCE. VERBS over other RDMA interconnects (iWarp, OmniPath, etc.) is available for community evaluation and has not been tested in context of this release
  - Cray Gemini and Aries
  - Architectures: x86_64, ARMv8 (64bit), Power64
  Features:
  - Added support for InfiniBand DC and UD transports, including accelerated verbs for Mellanox devices
  - Full support for PGAS/SHMEM interfaces, blocking and non-blocking APIs
  - Support for MPI tag matching, both in software and offload mode
  - Zero copy protocols and rendezvous, registration cache
  - Handling transport errors
  - Flow control for DC/RC
  - Datatypes support: contiguous, IOV, generic
  - Multi-threading support
  - Support for ARMv8 64bit architecture
  - A new API for efficient memory polling
  - Support for malloc-hooks and memory registration caching

-------------------------------------------------------------------
Fri Jun 30 09:30:58 UTC 2017 - nmoreychaisemartin@suse.com

- Disable avx at configure level

-------------------------------------------------------------------
Wed Jun 28 16:46:31 UTC 2017 - nmoreychaisemartin@suse.com

- Add openucx-s390x-support.patch to fix compilation on s390x
- Compile openucx on s390x

-------------------------------------------------------------------
Thu Jun 8 12:12:59 UTC 2017 - nmoreychaisemartin@suse.com

- Fix compilation on ppc

-------------------------------------------------------------------
Fri May 26 08:29:51 UTC 2017 - jengelh@inai.de

- Update to snapshot 1.3+git44
  * No changelog was found
- Add -Wno-error and disable AVX/SSE as it is not guaranteed to exist.

-------------------------------------------------------------------
Sat Jun 18 07:36:59 UTC 2016 - jengelh@inai.de

- Update to snapshot 0~git1727
  * New: libucm. libucm is a standalone non-unloadable library which installs hooks for virtual memory changes in the current process.

-------------------------------------------------------------------
Sun Sep 13 18:35:15 UTC 2015 - jengelh@inai.de

- Update to snapshot 0~git862
  * License clarification on upstream's behalf

-------------------------------------------------------------------
Mon Jul 27 18:32:48 UTC 2015 - jengelh@inai.de

- Initial package for build.opensuse.org (version 0~git713)