forked from pool/libfabric

106 Commits

Author SHA256 Message Date
Ana Guerrero
85e3cca968 Accepting request 1164392 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/1164392
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/libfabric?expand=0&rev=47
2024-04-04 20:24:35 +00:00
779fd8ecd7 Accepting request 1164368 from home:NMorey:branches:science:HPC
- Enable the ucx and new efa providers on 64-bit architectures.
- Use a single changes file for libfabric and fabtests.
- Update to 1.21.0
  - Core
    - Various updates and fixes in man pages
    - Fix xpmem memory corruption
    - Extend FI_PROVIDER_PATH to allow setting preferred DL provider
    - Add a SECURITY.md file
    - Document preferred threading model for scalable endpoints
    - Move FI_PRIORITY to internal flag
    - Remove FI_PROV_SPECIFIC
    - Remove unimplemented or unused features
    - Support cntr byte counting
    - configure: Do not check for xpmem if disabled
    - Add FI_PROGRESS_CONTROL_UNIFIED
    - hmem/cuda: Get multiple attributes at once in cuda_is_addr_valid
    - configure: Add -pipe by default to CFLAGS
    - Selectively generate warnings on failed loading of DL providers
    - hmem: introduce ofi_dev_reg_copy_*_iov ops
    - Print provider path on fabric creation
    - Introduce FI_OPT_SHARED_MEMORY_PERMITTED
    - README.md: Add badge for openssf scorecard
    - man: Regulate the fi_setopt call sequence.
    - man: Clarify the usage of the FI_REMOTE_CQ_DATA flag
    - man: Add ucx provider to the fi_provider man page
    - configure.ac: add extra check for 128 bit atomic support
    - include/osd: align atomic complex definitions
    - hmem/synapseai: Refine the error handling and warning
    - Specify C11 standard for Visual Studio builds

OBS-URL: https://build.opensuse.org/request/show/1164368
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=101
2024-04-03 15:32:26 +00:00
Ana Guerrero
1b10814640 Accepting request 1161340 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/1161340
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/libfabric?expand=0&rev=46
2024-03-25 20:07:15 +00:00
0dfc65be02 Accepting request 1161331 from home:NMorey:branches:science:HPC
- Update to 1.20.1
  - Core
    - hmem/ze: Change the library name passed to dlopen
    - hmem/ze: map device id to physical device
    - hmem/ze: skip duplicate initialization
    - hmem/ze: dynamically allocate device resources based on number of devices
    - hmem/ze: fix hmem_ze_copy_engine variable look up
    - hmem/ze: Increase ZE_MAX_DEVICES to 32
    - man: Fix typo in fi_getinfo man page
    - Fix compiler warning when compiling with ICX
    - man: Fix fi_rxm.7 and fi_collective.3 man pages
    - man: Update EFA docs for FI_EFA_INTER_MIN_READ_WRITE_SIZE
  - EFA
    - efa_rdm_ep_record_tx_op_submitted() rm peer lookup
    - Remove peer lookup from efa_rdm_pke_sendv()
    - Make handshake response use txe
    - test: Only close SHM if SHM peer is created
    - Handshake code allocs txe via efa util
    - Initialize txe.rma_iov_count to 0
    - Switch fi_addr to efa_rdm_peer in trigger_handshake
    - Downgrade EFA Endpoint Creation WARN to INFO
    - Init srx_ctx before use
    - Clean up generic_send path
    - Pass in efa_rdm_ep to efa_rdm_msg_generic_recv()
    - Make recv path slightly more efficient
    - re-org rma write to avoid duplicate checks
    - Add missing sync_memops call to writedata
    - use peer pointer from txe in read, write and send
    - Pass in peer pointer to txe
    - Get rid of noop instruction from empty #define

OBS-URL: https://build.opensuse.org/request/show/1161331
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=99
2024-03-25 08:50:35 +00:00
Dominique Leuenberger
d15d9152ef Accepting request 1155207 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/1155207
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/libfabric?expand=0&rev=45
2024-03-06 22:03:43 +00:00
73658dedfa Accepting request 1153473 from home:pgajdos:l
- Use the %autosetup macro. This eliminates the use of the deprecated
  %patchN macros.

OBS-URL: https://build.opensuse.org/request/show/1153473
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=97
2024-03-05 13:40:29 +00:00
Ana Guerrero
77be0f1aba Accepting request 1127574 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/1127574
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/libfabric?expand=0&rev=44
2023-11-20 20:19:00 +00:00
5587bbb374 Accepting request 1127573 from home:NMorey:branches:science:HPC
- Update to 1.20.0 (jsc#PED-5777, jsc#PED-5893, jsc#PED-5889)

OBS-URL: https://build.opensuse.org/request/show/1127573
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=95
2023-11-19 18:58:48 +00:00
Ana Guerrero
f6a72224bc Accepting request 1108987 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/1108987
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/libfabric?expand=0&rev=43
2023-09-06 16:55:45 +00:00
a07202472f Accepting request 1108986 from home:NMorey:branches:science:HPC
- Update to 1.19.0
  - Core
    - General code cleanup and restructuring
    - Add ofi_hmem_any_ipc_enabled()
    - ofi_consume_iov allows 0-byte consume
    - ofi_consume_iov consistency
    - ofi_indexer: return error code when iterating
    - getinfo: Add post filters for domain and fabric names
    - Filter loopback device if iface is specified
    - bsock: Fix error checking for -EAGAIN
    - windows/osd: Remove unneeded check to silence coverity
    - windows/osd: Move variable declaration to silence coverity
    - Introduce gdrcopy awareness to hmem copy
    - mr/cache: Fix fi_mr_info initialization
    - hmem_cuda: remove gdrcopy from cuda hmem copy path
    - iouring: Fix wrong indent in ofi_sockapi_accept_uring()
    - Implement ofi_sockctx_uring_poll_add()
    - hmem: introduce gdrcopy from/to cuda iov functions
    - hmem: Deprecate `FI_HMEM_CUDA_ENABLE_XFER`
    - hmem_cuda: Restrict CUDA IPC based on peer accessibility
    - hmem_cuda: Log number of CUDA devices detected
    - hmem_cuda: Refactor global variables
    - tostr: Remove the extra "shared/" directory from "include/" and "src/"
    - hmem_ze: fix ZE is valid check
    - hmem_rocr: fix offset calculation
    - hmem_rocr: use ofi spinlock functions
    - hmem_rocr: minor fixes
    - hmem_neuron: convert warn to info for nrt_get_dmabuf_fd not found
    - hmem_neuron: check existence of neuron devices during initialization
    - tostr: Moved Windows functions in shared/ofi_str.c to windows/osd.h

OBS-URL: https://build.opensuse.org/request/show/1108986
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=93
2023-09-05 07:23:01 +00:00
Dominique Leuenberger
03adceadba Accepting request 1102763 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/1102763
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/libfabric?expand=0&rev=42
2023-08-09 15:23:55 +00:00
fd28efa431 Accepting request 1102753 from home:NMorey:branches:science:HPC
- Drop support for obsolete TrueScale (bsc#1212146)

OBS-URL: https://build.opensuse.org/request/show/1102753
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=91
2023-08-07 17:25:39 +00:00
Dominique Leuenberger
c63f177d4e Accepting request 1096632 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/1096632
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/libfabric?expand=0&rev=41
2023-07-04 13:21:43 +00:00
1545d1225e Accepting request 1096631 from home:NMorey:branches:science:HPC
- Update to 1.18.1
  - Core
    - Fix build warning for ofi_dynpoll_get_fd
  - EFA
    - Handle 0-byte writes
    - Apply byte_in_order_128_byte for all memory types
    - Increase default shm_av_size to 256
    - Force handshake before selecting rtm for non-system ifaces.
    - Only select readbase_rtm when both sides support rdma-read
    - Bugfix for initializing SHM offload
    - Correct CPPFLAGS during configure
    - Make setopt support sendrecv aligned 128 bytes
    - Make data size a multiple of 128 bytes for in-order aligned send/recv
    - prepare local read pkt entry for in-order aligned send/recv.
    - Disable gdrcopy and cudamemcpy for in-order aligned recv.
    - Increase the pad size in rxr_pkt_entry
    - Make readcopy pkt pool 128 byte aligned
    - Introduce alignment to support in order aligned ops
    - Fix a bug when calling ibv_query_qp_data_in_order
    - RMA operations will ensure FI_ATOMIC cap
    - RMA operations will ensure FI_RMA cap
    - Unittest atomics without FI_ATOMIC cap.
    - Unittest RMA without FI_RMA cap.
    - Refactor pkt_entry assignment in poll_ibv loop
    - Fixes for RDMA Write and Writedata
  - RXM
    - Revert rxm util peer CQ support
    - Fix credit size parameter for flow ctrl
  - SHM
    - Fix DSA enable

OBS-URL: https://build.opensuse.org/request/show/1096631
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=89
2023-07-03 16:43:43 +00:00
Dominique Leuenberger
f78d1c2529 Accepting request 1085713 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/1085713
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/libfabric?expand=0&rev=40
2023-05-10 14:16:42 +00:00
bd0842cd83 Accepting request 1084707 from home:fcrozat:branches:science:HPC
- Add _multibuild to define additional spec files as additional
  flavors.
  Eliminates the need for source package links in OBS.

OBS-URL: https://build.opensuse.org/request/show/1084707
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=87
2023-05-09 12:56:31 +00:00
Dominique Leuenberger
bb5d2fb283 Accepting request 1080189 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/1080189
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/libfabric?expand=0&rev=39
2023-04-20 13:13:15 +00:00
add54a60b7 Accepting request 1080188 from home:NMorey:branches:science:HPC
- Update to 1.18.0
  - Core
    - rocr: fix offset calculation
    - rocr: use ofi spinlock functions
    - rocr: minor fixes
    - neuron: convert warn to info for nrt_get_dmabuf_fd not found
    - neuron: check existence of neuron devices during initialization
    - neuron: Add support for neuron dma-buf
    - ze: update ZE to support new driver index specification
    - List variables read from config file
    - Add switch to prefer system-config over environment
    - Add basic system-config support for setting library variables
    - Move peer provider defines into new header
    - rocr: Support asynchronous memory copies
    - rocr: Add support for ROCR IPC
    - rocr: rename rocr data-structures
    - synapseai: return 0 for host_register and host_deregister
    - fabric: Improve log level of provider mismatch
    - cuda: Allow CUDA IPC when P2P disabled
    - ze: add ZE command list pool to reuse command lists
    - cuda: implement cuda_get_xfer_setting for non cuda build
    - cuda: adjust FI_HMEM_CUDA_ENABLE_XFER behavior
    - cuda.c: Add const to param to remove warning
    - Add IFF_RUNNING check to indicate iface is up and running
    - io_uring support enhancements
  - EFA
    - Implement CUDA support on instance types that do not support GPUDirect RDMA
    - Implement fi_write using device's RDMA write capability
    - Enrich error messages with debug and connection info
    - Implement support for FI_OPT_EFA_USE_DEVICE_RDMA in fi_setopt

OBS-URL: https://build.opensuse.org/request/show/1080188
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=85
2023-04-18 20:47:57 +00:00
Dominique Leuenberger
1e1e226034 Accepting request 1075156 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/1075156
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/libfabric?expand=0&rev=38
2023-03-30 20:50:41 +00:00
6d086ca72a Accepting request 1075155 from home:NMorey:branches:science:HPC
- Update to 1.17.1
  - Core
    - hmem_cuda Add const to param to remove warning
    - Fix typos in fi_ext.h
    - ofi_epoll: Remove unused hot_index struct member
  - EFA
    - Print local/peer addresses for RX write errors
    - Unit test to verify no copy with shm for small host message
    - Avoid unnecessary copy when sending data from shm
    - Compare pci bus id in hints
    - Fix double free in rxr endpoint init
  - Hooks
    - dmabuf_peer_mem: Handle IPC handle caching in L0
  - OPX
    - Exclude from build if missing needed defines
    - Move some logs to optimized builds
    - Fix build warnings for unused return code from posix_memalign
    - Add reliability sanity check to detect when send buffer is illegally altered
    - SDMA Completion workaround for driver cache invalidation race condition
    - Fix replay payload pointer increment
    - Handle completion counter across multiple writes in SDMA
    - Cleanup pointers after free()
    - Modify domain creation to handle soft cache errors
    - Two biband performance improvements
    - Fixes based on Coverity Scan related to auto progress patch
    - Changed poll many argument to rx_caps instead of caps
    - Resync with server configured for Multi-Engines (DAOS CART Self Tests)
    - Remove import_monitor as ENOSYS case
    - Address memory leaks reported on OFIWG issues page
    - Remove unused fields
    - Fix unwanted print statement case
    - Add replays over SDMA
    - Implement basic TID Cache
    - Revert work_pending check change
    - Fix use_immediate_blocks
    - Restore state after replay packet is NULL
    - Fix memory leak from early arrival packets.
    - Fix segfault in SHM operations from uninitialized value in atomic path.
    - Prevent SDMA work entries from being reused with outstanding
      replays pointing to bounce buf.
    - Set runtime as default for OPX_AV
    - Fix RTS replay immediate data
    - Fix errors caught by the upstream libfabric Coverity Scan
    - Support multiple HFI devices
    - Support OFI_PORT and Contiguous endpoint addresses
    - Update man pages
  - Util
    - util_cq: Remove annoying WARNING message for FI_AFFINITY

OBS-URL: https://build.opensuse.org/request/show/1075155
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=83
2023-03-29 08:24:52 +00:00
Dominique Leuenberger
d5c883a19d Accepting request 1034518 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/1034518
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/libfabric?expand=0&rev=37
2022-11-09 11:56:28 +00:00
Nicolas Morey-Chaisemartin
1b73b978dd Accepting request 1034517 from home:NMoreyChaisemartin:branches:science:HPC
- Add prov-net-fix-error-path-in-xnet_enable_rdm.patch to fix a deadlock
  when no network interfaces are available (bsc#1205139)

OBS-URL: https://build.opensuse.org/request/show/1034517
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=81
2022-11-08 12:08:06 +00:00
Dominique Leuenberger
b41af68ed2 Accepting request 1012024 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/1012024
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/libfabric?expand=0&rev=36
2022-10-18 10:44:22 +00:00
Nicolas Morey-Chaisemartin
b4457cf5d3 Accepting request 1012023 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.16.1
  - Core
    - Fix windows implementation to remove fd from poll set
  - PSM3
    - Add missing files to release tarball
  - Util
    - Handle NULL address insertion to fi_av_insert
- Drop prov-rxm-Disable-128-bit-atomics.patch which was merged upstream

OBS-URL: https://build.opensuse.org/request/show/1012023
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=79
2022-10-17 08:21:22 +00:00
Fabian Vogt
99fd313f39 Accepting request 1008574 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/1008574
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/libfabric?expand=0&rev=35
2022-10-10 16:43:27 +00:00
Nicolas Morey-Chaisemartin
f1f52ea9c9 Accepting request 1008573 from home:NMoreyChaisemartin:branches:science:HPC
- Add prov-rxm-Disable-128-bit-atomics.patch to fix a potential
  segfault on misaligned buffers.

OBS-URL: https://build.opensuse.org/request/show/1008573
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=77
2022-10-06 17:01:30 +00:00
Richard Brown
36926c25e7 Accepting request 1007632 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/1007632
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/libfabric?expand=0&rev=34
2022-10-04 18:36:52 +00:00
Nicolas Morey-Chaisemartin
d98f48a74f Accepting request 1007631 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.16.0 (jsc#PED-351, jsc#PED-190)
  - Core
    - Added HMEM IPC cache
    - Use exact string comparison checks for network interfaces
    - Restructuring of poll/epoll abstraction
    - Add ability to disable locks completely in debug builds
    - Serialize access to modifying the logging calls
    - Minor fixes to fi_tostr text formatting
    - Add hmem interface checks to memory registration
  - EFA
    - Added support of Synapse AI memory.
    - Improved error message
  - Net
    - Temporarily forked, optimized version of tcp provider
    - Focused on improved performance and scalability over tcp sockets
    - Fork ensures tcp provider stability while net provider is developed
    - Shares the tcp provider protocol and base implementation for msg endpoints
    - Integrates direct support for rdm endpoints, using a derivative from rxm
    - Implements own protocol for rdm endpoints, separate from rxm;tcp
  - OPX
    - Added initial support for SDMA
    - General performance enhancements
    - Performance improvements to reliability protocol
    - Improved deferred work pending complete
    - Added support for OPX_AV=runtime
    - Support iov memory registration ops
    - Added DAOS RPC support
    - Atomic ops enhancements
    - Improved documentation
    - Debug build enhancements
    - Fixed compiler warnings
    - Reduced time to compile prov/opx code
    - General bug fixes
    - Fixed PSN wrapping scaling
    - Added intranode fence
    - Addressed bugs discovered by coverity scan
  - PSM2
    - Fix sending CQ data in some instances of fi_tsendmsg
  - PSM3
    - Updated to match Intel Ethernet Fabric Suite (IEFS) 11.3 release
  - RxM
    - Update to read multiple completions at once from msg provider
    - Move RxM AV implementation to util code to share with net provider
    - Minor code cleanups
  - SHM
    - Implement and use ipc_cache
    - Add log messages for debugging and error tracking
    - Fix check for FI_MR_HMEM mr_mode
    - Move shm signal handlers initialization to EP
    - Added log messages for errors detected
  - TCP
    - Fix incorrect signaling of the CQ
    - Increase max number of poll events to retrieve
    - Acquire ep lock prior to flushing socket in shutdown
    - Verify ep state prior to progressing socket data
    - Read cm error data when receiving connreq response
    - Log error on connect failure
    - Fix assertion failure in CQ progress function
  - Util
    - Fix text in log of UFFD ioctl failure
    - Introduce cuda ipc monitor
    - Fix CQ memory leak handling overflow
    - Fix MR mode bit check for ver 1.5 and greater
    - Add max_array_size to track/check array overflow
    - Always progress transfers when reading from a CQ
    - Handle NULL address insertion
    - Try IPv4 before IPv6 addresses when starting name server
    - Fix IP util av default address length
    - Fix util IP getinfo path to read hints->addr_format
    - Fix debug print mismatch
    - Fix return code when memory allocation fails.
    - Fix build sign warning in ofi_bufpool_region_alloc
    - Minor code cleanups
    - Print warning if an addr is inserted into an AV again
  - Verbs
    - Fix support of FI_SOCKADDR_IB when requested by the application
    - Ensure all posted receives are flushed to the application
    - Update ofi_mr_cache_search API for hmem IPC support
    - Reduce logging verbosity for "no active ports"
    - Fix incorrect length used in memory registration
    - Various minor bug fixes for test failures
    - Fix a memory leak getting IB address
    - Implement verbs provider on Windows over NetworkDirect API
    - Set and check address format correctly
    - Only close qp if it was initialized
    - Portable detection of loopback device
  - Fabtests
    - multi_ep: Separate EP resources and fix MR registration
    - multi_recv: Fix possible crash and check for valid buffer
    - unexpected_msg: Fix printf compiler warning
    - dgram_pingpong.c: Use out-of-band sync
    - multinode: Make multinode tests platform agnostic, fix formatting
    - ubertest: Fix string comparison to include length, fix writedata completion check
    - av_test: add support for -e <ep_type>
    - New tests:
      - dmabuf-rdma: Component level test for dma-buf RDMA
      - sock_test: Component level performance test of poll, epoll, and select
      - rdm_stress: Multi-threaded, multi-process stress test for RDM endpoints
      - sighandler_test: Regression test for signal handler restoration
- Drop patches fixed upstream:
  - prov-opx-Correctly-disable-OPX-if-unsupported.patch
  - disable-flatten-attr.patch

OBS-URL: https://build.opensuse.org/request/show/1007631
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=75
2022-10-03 07:34:47 +00:00
Dominique Leuenberger
727dd06214 Accepting request 998811 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/998811
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/libfabric?expand=0&rev=33
2022-08-24 13:10:49 +00:00
Nicolas Morey-Chaisemartin
deb2507db0 Accepting request 998810 from home:marxin:branches:science:HPC
- Add disable-flatten-attr.patch that drops flatten attribute.
  Note: the flatten attribute causes a huge compile-time hog in the
  inliner (and the resulting binary would be huge as well).
- Use %make_build and enable LTO (boo#1133235).
- Synchronize used Patches.

OBS-URL: https://build.opensuse.org/request/show/998810
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=73
2022-08-23 12:14:10 +00:00
Fabian Vogt
8bb8f3d9c7 Accepting request 989962 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/989962
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/libfabric?expand=0&rev=32
2022-07-31 21:00:32 +00:00
Nicolas Morey-Chaisemartin
abc00bb762 Accepting request 989191 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.15.1
  - Core
    - Fix fi_info indentation error in fi_tostr
    - hmem_ze: Add runtime option to choose specific copy engine
    - Cleanup of configure HMEM checks
    - Fixed stringop-truncation in ofi_ifaddr_get_speed
    - Add utility provider log suffix to make logs easier to read
    - Fix truncation of ipv6 addressing
    - hmem: add support for AWS Trainium devices
    - Fix potential sscanf overflows
    - hmem: pass through device and flags when querying memory interface
    - Rework locking in several areas to convert spinlocks to mutexes
    - Add new locking abstractions to select lock types at runtime
    - Add new FI_PROTO_RXM_TCP for optimized rxm over tcp path
    - Fix windows implementation to remove fd from poll set
  - EFA
    - Added windows support through efawin (https://github.com/aws/efawin)
    - Added support of AWS neuron.
    - Added support of using gdrcopy to copy data from host to device.
    - Fixed a bug that caused 0-byte reads to fail.
    - Fixed a memory corruption issue that could cause a forked process to crash.
    - Extended testing coverage through new pytest based testing framework.
  - HOOKS
    - Add new hooking provider dmabuf_peer_mem
    - Enable DL build of hooking providers
    - Add HMEM memory registration hook
  - OPX
    - New provider supporting Cornelis Networks Omni-path hardware
  - PSM3
    - Updated psm3 to match IEFS 11.2.0.0 release
    - Added support for sockets (TCP/UDP) via a runtime selectable Hardware
      Abstraction Layer (HAL)
    - Added support for IPv6 addressing in RoCE and sockets
    - Added various NIC selection filtering options (wildcarded NIC name,
      address format, wildcarded IP subnet, link speed)
    - Performance tuning in conjunction with OneAPI and OneCCL
    - Improved PSM3_IDENTIFY output
    - Rename most internal symbols to psm3_
    - Corrected vulnerabilities found during Coverity scans
    - configure options refined and help text improved
    - PSM3_MULTI_EP has been deprecated (recommend always enabled, default
      is enabled [same default as previous releases])
    - Various bug fixes
  - RxM
    - Add check that atomic size is valid
    - Add support to passthru calls to tcp provider in specific
  - TCP
    - Add assert to verify RMA source/target msg sizes match
    - Wake-up threads blocked on CQ to update their poll events
    - Fix use of incorrect events in progress handler
    - Fixes for various compile warnings, mostly on Windows
    - Add support for FI_RMA_EVENT capability
    - Add support for completion counters
    - Fix check for CQ data in tagged messages
    - Add cancel support to shared rx context
    - Add src_addr receive buffer matching
    - Add provider control to assign a src_addr with an ep
    - Handle trecv with FI_PEEK flag
    - Allow binding a CQ with an SRX
    - Restructuring of code in source files
    - Handle EWOULDBLOCK returned by send call
    - Add hot (active) pollfd
  - SHM
    - Properly chain the original signal handlers
    - Avoid uninitialized variable with invalid atomic parameters
    - Fix 0 byte SAR read
    - Initialize len parameter to accept
    - Refactor and simplify protocol code
    - Remove broken support for 128-bit atomics
    - Fix FI_INJECT flag support
    - Add assert to verify RMA source/target msg sizes match
    - Set domain threading to thread safe
    - Fix possible use of uninitialized variable in av_insert
  - Util
    - Fix sign warning in ofi_bufpool_region_alloc
    - Remove unused variable from ofi_bufpool_destroy
    - Fix check for valid datatype in ofi_atomic_valid
    - Return with error if util_coll_sched_copy fails
    - Fix use of uninitialized variable in ofi_ep_allreduce
    - Fix memory access in ip_av_insertsym
    - Track ep per collective operation not with multicast
    - Restructure collective av set creation/destruction
    - Change most locks from spin locks to mutexes
    - Allow selection of spinlocks for CQ and domain objects
    - Fix AV default addrlen
    - Update fi_getinfo checks to include hints->addr_
    - Handle NULL address insertion to fi_av_insert
  - Verbs
    - Initial changes for compiling on Windows (via NetworkDirect)
    - Add a failover path to dma-buf based memory registration
    - Replace use of spin locks with mutexes
    - Check for valid qp prior to cleanup
    - Set and check address format correctly in fi_getinfo
  - Fabtests
    - hmem_cuda: use device-allocated host buffer to fill device buffer
    - Add python scripts to control test execution
    - test_configs: include util provider in core config file
    - Add option "--pin-core"
    - Only call nrt_init once
    - Fix a bug in ft_neuron_cleanup
    - Correct help for unit test programs
    - Remove duplicate help prints from fi_mcast
    - configure.ac: fix --enable-debug=no not properly detected
    - msg_inject: handle the case ft_tsendmsg return -FI_EAGAIN
    - Add AWS Trainium device support
    - fi_inj_complete: Add FI_INJECT to fabtests
    - inj_complete.c: Make arguments align with the other tests
    - dgram_pingpong: handle the error return of fi_recv
    - recv_cancel: Remove requirement for unexpected msg handling
    - poll: Fix crash if unable to allocate pollset
    - ubertest: Add GPU testing and validation support
    - Add HMEM options parsing support
    - Update and re-enable fi_multi_ep test
- Add prov-opx-Correctly-disable-OPX-if-unsupported.patch to disable
  OPX compilation on non x86_64 systems

OBS-URL: https://build.opensuse.org/request/show/989191
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=71
2022-07-18 13:06:07 +00:00
Dominique Leuenberger
698f4fa244 Accepting request 971080 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/971080
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/libfabric?expand=0&rev=31
2022-04-22 19:53:05 +00:00
Nicolas Morey-Chaisemartin
36cbb47841 Accepting request 971079 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.14.1
  - Core
    - Use non-shared memory allocations to use MADV_DONTFORK safely
    - Fix incorrect use of gdr_copy_from_mapping
    - Ensure proper timeout time for pollfds to avoid early exit
  - EFA
    - Handle read completion properly for multi_recv
    - Use shm's inject write when possible
    - Support 0 byte read
  - RxM
    - Ensure signaling the CQ fd after writing completion
    - Fix inject path for sending tagged messages with cq data
    - Negotiate credit based flow control support over CM
    - Add PID to CM messages to detect stale vs duplicate connections
    - Fix race handling unexpected messages from unknown peers
    - Fix possible leak of stack data in cm_accept
    - Restrict reported caps based on core provider
    - Delay starting listen until endpoint fully initialized
    - Verify valid atomic size
  - Sockets
    - Fix coverity reports on uninitialized data
    - Check for NULL pointers passed to memcpy
    - Add missing error return code from sock_ep_enable
  - TCP
    - Fix performance regression resulting from sparse pollfd sets
    - Fix assertion failure in CQ progress function
    - Do not generate error completions for inject msgs
    - Fix use of incorrect event names in progress handler
    - Fix check for CQ data in tagged messages
    - Make start_op array static to reduce memory usage

OBS-URL: https://build.opensuse.org/request/show/971079
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=69
2022-04-20 11:30:05 +00:00
Dominique Leuenberger
77029653be Accepting request 933768 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/933768
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/libfabric?expand=0&rev=30
2021-11-28 20:29:57 +00:00
Nicolas Morey-Chaisemartin
a69e2dce28 Accepting request 932983 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.14.0
  - Add time stamps to log messages
  - Fix gdrcopy calculation of memory region size when aligned
  - Allow user to disable use of p2p transfers
  - Update fi_tostr to print FI_SHARED_CONTEXT text instead of value
  - Update fi_tostr to output field names matching header file names
  - Fix narrow race condition in ofi_init
  - Add new fi_log_sparse API to rate limit repeated log output
  - Define memory registration for buffers used for collective operations
  - EFA, SHM, TCP, RXM, and verbs fixes

OBS-URL: https://build.opensuse.org/request/show/932983
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=68
2021-11-25 14:12:36 +00:00
Dominique Leuenberger
b350b1181b Accepting request 928954 from science:HPC
- Enable PSM3 provider (jsc#SLE-18754)

- Update to 1.13.2
  - Sort DL providers to ensure consistent load ordering
  - Update hooking providers to handle fi_open_ops calls to avoid crashes
  - Replace cassert with assert.h to avoid C++ headers in C code
  - Enhance serialization for memory monitors to handle external monitors
  - EFA, SHM, TCP, RxM and verbs fixes

OBS-URL: https://build.opensuse.org/request/show/928954
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/libfabric?expand=0&rev=29
2021-11-08 16:24:08 +00:00
Nicolas Morey-Chaisemartin
ad6d9ec62e Accepting request 928952 from home:NMoreyChaisemartin:branches:science:HPC
- Enable PSM3 provider (jsc#SLE-18754)

OBS-URL: https://build.opensuse.org/request/show/928952
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=67
2021-11-03 08:07:55 +00:00
Nicolas Morey-Chaisemartin
dd36aca7a8 Accepting request 928694 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.13.2
  - Sort DL providers to ensure consistent load ordering
  - Update hooking providers to handle fi_open_ops calls to avoid crashes
  - Replace cassert with assert.h to avoid C++ headers in C code
  - Enhance serialization for memory monitors to handle external monitors
  - EFA, SHM, TCP, RxM and verbs fixes

OBS-URL: https://build.opensuse.org/request/show/928694
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=66
2021-11-02 09:39:02 +00:00
Dominique Leuenberger
1668f45c04 Accepting request 917139 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/917139
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/libfabric?expand=0&rev=28
2021-09-08 19:36:33 +00:00
Nicolas Morey-Chaisemartin
a480721370 Accepting request 917134 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.13.1
  - Enable loading ZE library with dlopen()
  - Add IPv6 support to fi_pingpong
  - EFA, PSM3 and SHM fixes

OBS-URL: https://build.opensuse.org/request/show/917134
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=65
2021-09-06 14:36:39 +00:00
Dominique Leuenberger
7c552a978d Accepting request 905237 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/905237
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/libfabric?expand=0&rev=27
2021-07-16 20:12:28 +00:00
Nicolas Morey-Chaisemartin
c26bb2e322 Accepting request 905235 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.13.0
  - Fix behavior of fi_param_get when parsing an invalid boolean value
  - Add new APIs to open, export, and import specialized fids
  - Define ability to import a monitor into the registration cache
  - Add API support for INT128/UINT128 atomics
  - Fix incorrect check for provider name in getinfo filtering path
  - Allow core providers to return default attributes which are lower than
    maximum supported attributes in getinfo call
  - Add option to prefer external providers (in order discovered) over internal
    providers, regardless of provider version
  - Separate Ze (level-0) and DRM dependencies
  - Always maintain a list of all discovered providers
  - Fix incorrect CUDA warnings
  - Fix bug in cuda init/cleanup checking for gdrcopy support
  - Shift the order in which providers are called in fi_getinfo: move psm2
    ahead of psm3 and efa ahead of psmX
  - See NEWS.md for changelog

OBS-URL: https://build.opensuse.org/request/show/905235
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=64
2021-07-09 10:59:38 +00:00
Richard Brown
890f767b43 Accepting request 882724 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/882724
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/libfabric?expand=0&rev=26
2021-04-08 19:01:51 +00:00
Nicolas Morey-Chaisemartin
948cc1e28f Accepting request 882701 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.12.1
  - Fix initialization checks for CUDA HMEM support
  - Fail if a memory monitor is requested but not available
  - Adjust priority of psm3 provider to prefer HW specific providers,
    such as efa and psm2
  - EFA and PSM3 fixes
  - See NEWS.md for changelog

OBS-URL: https://build.opensuse.org/request/show/882701
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=62
2021-04-02 13:48:53 +00:00
Richard Brown
83459a9280 Accepting request 879116 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/879116
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/libfabric?expand=0&rev=25
2021-03-16 14:42:51 +00:00
Nicolas Morey-Chaisemartin
1cc7aa642e Accepting request 879115 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.12.0
  - See NEWS.md for changelog

OBS-URL: https://build.opensuse.org/request/show/879115
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=60
2021-03-15 09:05:41 +00:00
Richard Brown
43f306a87d Accepting request 872745 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/872745
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/libfabric?expand=0&rev=24
2021-02-22 13:22:22 +00:00
Nicolas Morey-Chaisemartin
c71b12878d Accepting request 872743 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.11.2 (bsc#1181983)
  - See NEWS.md for changelog

OBS-URL: https://build.opensuse.org/request/show/872743
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=58
2021-02-16 09:10:48 +00:00
Dominique Leuenberger
1c055cec97 Accepting request 841254 from science:HPC
OBS-URL: https://build.opensuse.org/request/show/841254
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/libfabric?expand=0&rev=23
2020-10-14 13:37:52 +00:00