Commit Graph

36 Commits

Author SHA256 Message Date
779fd8ecd7 Accepting request 1164368 from home:NMorey:branches:science:HPC
- Enable ucx and the new efa provider on 64-bit architectures.
- Use a single changes file for libfabric and fabtests.
- Update to 1.21.0
  - Core
    - Various updates and fixes in man pages
    - Fix xpmem memory corruption
    - Extend FI_PROVIDER_PATH to allow setting preferred DL provider
    - Add a SECURITY.md file
    - Document preferred threading model for scalable endpoints
    - Move FI_PRIORITY to internal flag
    - Remove FI_PROV_SPECIFIC
    - Remove unimplemented or unused features
    - Support cntr byte counting
    - configure: Do not check for xpmem if disabled
    - Add FI_PROGRESS_CONTROL_UNIFIED
    - hmem/cuda: Get multiple attributes at once in cuda_is_addr_valid
    - configure: Add -pipe by default to CFLAGS
    - Selectively generate warnings on failed loading of DL providers
    - hmem: introduce ofi_dev_reg_copy_*_iov ops
    - Print provider path on fabric creation
    - Introduce FI_OPT_SHARED_MEMORY_PERMITTED
    - README.md: Add badge for openssf scorecard
    - man: Regulate the fi_setopt call sequence.
    - man: Clarify the usage of the FI_REMOTE_CQ_DATA flag
    - man: Add ucx provider to the fi_provider man page
    - configure.ac: add extra check for 128 bit atomic support
    - include/osd: align atomic complex definitions
    - hmem/synapseai: Refine the error handling and warning
    - Specify C11 standard for Visual Studio builds

OBS-URL: https://build.opensuse.org/request/show/1164368
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=101
2024-04-03 15:32:26 +00:00
0dfc65be02 Accepting request 1161331 from home:NMorey:branches:science:HPC
- Update to 1.20.1
  - Core
    - hmem/ze: Change the library name passed to dlopen
    - hmem/ze: map device id to physical device
    - hmem/ze: skip duplicate initialization
    - hmem/ze: dynamically allocate device resources based on number of devices
    - hmem/ze: fix hmem_ze_copy_engine variable look up
    - hmem/ze: Increase ZE_MAX_DEVICES to 32
    - man: Fix typo in fi_getinfo man page
    - Fix compiler warning when compiling with ICX
    - man: Fix fi_rxm.7 and fi_collective.3 man pages
    - man: Update EFA docs for FI_EFA_INTER_MIN_READ_WRITE_SIZE
  - EFA
    - Remove peer lookup from efa_rdm_ep_record_tx_op_submitted()
    - Remove peer lookup from efa_rdm_pke_sendv()
    - Make handshake response use txe
    - test: Only close SHM if SHM peer is created
    - Handshake code allocs txe via efa util
    - Initialize txe.rma_iov_count to 0
    - Switch fi_addr to efa_rdm_peer in trigger_handshake
    - Downgrade EFA Endpoint Creation WARN to INFO
    - Init srx_ctx before use
    - Clean up generic_send path
    - Pass in efa_rdm_ep to efa_rdm_msg_generic_recv()
    - Make recv path slightly more efficient
    - Reorganize RMA write to avoid duplicate checks
    - Add missing sync_memops call to writedata
    - Use peer pointer from txe in read, write and send
    - Pass in peer pointer to txe
    - Get rid of noop instruction from empty #define

OBS-URL: https://build.opensuse.org/request/show/1161331
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=99
2024-03-25 08:50:35 +00:00
5587bbb374 Accepting request 1127573 from home:NMorey:branches:science:HPC
- Update to 1.20.0 (jsc#PED-5777, jsc#PED-5893, jsc#PED-5889)

OBS-URL: https://build.opensuse.org/request/show/1127573
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=95
2023-11-19 18:58:48 +00:00
a07202472f Accepting request 1108986 from home:NMorey:branches:science:HPC
- Update to 1.19.0
  - Core
    - General code cleanup and restructuring
    - Add ofi_hmem_any_ipc_enabled()
    - ofi_consume_iov allows 0-byte consume
    - ofi_consume_iov consistency
    - ofi_indexer: return error code when iterating
    - getinfo: Add post filters for domain and fabric names
    - Filter loopback device if iface is specified
    - bsock: Fix error checking for -EAGAIN
    - windows/osd: Remove unneeded check to silence coverity
    - windows/osd: Move variable declaration to silence coverity
    - Introduce gdrcopy awareness to hmem copy
    - mr/cache: Fix fi_mr_info initialization
    - hmem_cuda: remove gdrcopy from cuda hmem copy path
    - iouring: Fix wrong indent in ofi_sockapi_accept_uring()
    - Implement ofi_sockctx_uring_poll_add()
    - hmem: introduce gdrcopy from/to cuda iov functions
    - hmem: Deprecate `FI_HMEM_CUDA_ENABLE_XFER`
    - hmem_cuda: Restrict CUDA IPC based on peer accessibility
    - hmem_cuda: Log number of CUDA devices detected
    - hmem_cuda: Refactor global variables
    - tostr: Remove the extra dir "shared/" from "include/" and "src/".
    - hmem_ze: fix ZE is valid check
    - hmem_rocr: fix offset calculation
    - hmem_rocr: use ofi spinlock functions
    - hmem_rocr: minor fixes
    - hmem_neuron: convert warn to info for nrt_get_dmabuf_fd not found
    - hmem_neuron: check existence of neuron devices during initialization
    - tostr: Moved Windows functions in shared/ofi_str.c to windows/osd.h

OBS-URL: https://build.opensuse.org/request/show/1108986
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=93
2023-09-05 07:23:01 +00:00
1545d1225e Accepting request 1096631 from home:NMorey:branches:science:HPC
- Update to 1.18.1
  - Core
    - Fix build warning for ofi_dynpoll_get_fd
  - EFA
    - Handle 0-byte writes
    - Apply byte_in_order_128_byte for all memory types
    - Increase default shm_av_size to 256
    - Force handshake before selecting rtm for non-system ifaces.
    - Only select readbase_rtm when both sides support rdma-read
    - Bugfix for initializing SHM offload
    - Correct CPPFLAGS during configure
    - Make setopt support sendrecv aligned 128 bytes
    - Make data size a multiple of 128 bytes for in-order aligned send/recv
    - Prepare local read pkt entry for in-order aligned send/recv.
    - Disable gdrcopy and cudamemcpy for in-order aligned recv.
    - Increase the pad size in rxr_pkt_entry
    - Make readcopy pkt pool 128 byte aligned
    - Introduce alignment to support in order aligned ops
    - Fix a bug when calling ibv_query_qp_data_in_order
    - RMA operations will ensure FI_ATOMIC cap
    - RMA operations will ensure FI_RMA cap
    - Unittest atomics without FI_ATOMIC cap.
    - Unittest RMA without FI_RMA cap.
    - Refactor pkt_entry assignment in poll_ibv loop
    - Fixes for RDMA Write and Writedata
  - RXM
    - Revert rxm util peer CQ support
    - Fix credit size parameter for flow ctrl
  - SHM
    - Fix DSA enable

OBS-URL: https://build.opensuse.org/request/show/1096631
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=89
2023-07-03 16:43:43 +00:00
add54a60b7 Accepting request 1080188 from home:NMorey:branches:science:HPC
- Update to 1.18.0
  - Core
    - rocr: fix offset calculation
    - rocr: use ofi spinlock functions
    - rocr: minor fixes
    - neuron: convert warn to info for nrt_get_dmabuf_fd not found
    - neuron: check existence of neuron devices during initialization
    - neuron: Add support for neuron dma-buf
    - ze: update ZE to support new driver index specification
    - List variables read from config file
    - Add switch to prefer system-config over environment
    - Add basic system-config support for setting library variables
    - Move peer provider defines into new header
    - rocr: Support asynchronous memory copies
    - rocr: Add support for ROCR IPC
    - rocr: rename rocr data-structures
    - synapseai: return 0 for host_register and host_deregister
    - fabric: Improve log level of provider mismatch
    - cuda: Allow CUDA IPC when P2P disabled
    - ze: add ZE command list pool to reuse command lists
    - cuda: implement cuda_get_xfer_setting for non cuda build
    - cuda: adjust FI_HMEM_CUDA_ENABLE_XFER behavior
    - cuda.c: Add const to param to remove warning
    - Add IFF_RUNNING check to indicate iface is up and running
    - io_uring support enhancements
  - EFA
    - Implement CUDA support on instance types that do not support GPUDirect RDMA
    - Implement fi_write using device's RDMA write capability
    - Enrich error messages with debug and connection info
    - Implement support for FI_OPT_EFA_USE_DEVICE_RDMA in fi_setopt

OBS-URL: https://build.opensuse.org/request/show/1080188
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=85
2023-04-18 20:47:57 +00:00
6d086ca72a Accepting request 1075155 from home:NMorey:branches:science:HPC
- Update to 1.17.1
  - Core
    - hmem_cuda: Add const to param to remove warning
    - Fix typos in fi_ext.h
    - ofi_epoll: Remove unused hot_index struct member
  - EFA
    - Print local/peer addresses for RX write errors
    - Unit test to verify no copy with shm for small host message
    - Avoid unnecessary copy when sending data from shm
    - Compare pci bus id in hints
    - Fix double free in rxr endpoint init
  - Hooks
    - dmabuf_peer_mem: Handle IPC handle caching in L0
  - OPX
    - Exclude from build if missing needed defines
    - Move some logs to optimized builds
    - Fix build warnings for unused return code from posix_memalign
    - Add reliability sanity check to detect when send buffer is illegally altered
    - SDMA Completion workaround for driver cache invalidation race condition
    - Fix replay payload pointer increment
    - Handle completion counter across multiple writes in SDMA
    - Cleanup pointers after free()
    - Modify domain creation to handle soft cache errors
    - Two biband performance improvements
    - Fixes based on Coverity Scan related to auto progress patch
    - Changed poll many argument to rx_caps instead of caps
    - Resync with server configured for Multi-Engines (DAOS CART Self Tests)
    - Remove import_monitor as ENOSYS case
    - Address memory leaks reported on OFIWG issues page
    - Remove unused fields
    - Fix unwanted print statement case
    - Add replays over SDMA
    - Implement basic TID Cache
    - Revert work_pending check change
    - Fix use_immediate_blocks
    - Restore state after replay packet is NULL
    - Fix memory leak from early arrival packets.
    - Fix segfault in SHM operations from uninitialized value in atomic path.
    - Prevent SDMA work entries from being reused with outstanding
      replays pointing to bounce buf.
    - Set runtime as default for OPX_AV
    - Fix RTS replay immediate data
    - Fix errors caught by the upstream libfabric Coverity Scan
    - Support multiple HFI devices
    - Support OFI_PORT and Contiguous endpoint addresses
    - Update man pages
  - Util
    - util_cq: Remove annoying WARNING message for FI_AFFINITY

OBS-URL: https://build.opensuse.org/request/show/1075155
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=83
2023-03-29 08:24:52 +00:00
Nicolas Morey-Chaisemartin
b4457cf5d3 Accepting request 1012023 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.16.1
  - Core
    - Fix windows implementation to remove fd from poll set
  - PSM3
    - Add missing files to release tarball
  - Util
    - Handle NULL address insertion to fi_av_insert
- Drop prov-rxm-Disable-128-bit-atomics.patch which was merged upstream

OBS-URL: https://build.opensuse.org/request/show/1012023
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=79
2022-10-17 08:21:22 +00:00
Nicolas Morey-Chaisemartin
d98f48a74f Accepting request 1007631 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.16.0 (jsc#PED-351, jsc#PED-190)
  - Core
    - Added HMEM IPC cache
    - Use exact string comparison checks for network interfaces
    - Restructuring of poll/epoll abstraction
    - Add ability to disable locks completely in debug builds
    - Serialize access to modifying the logging calls
    - Minor fixes to fi_tostr text formatting
    - Add hmem interface checks to memory registration
  - EFA
    - Added support of Synapse AI memory.
    - Improved error message
  - Net
    - Temporarily forked, optimized version of tcp provider
    - Focused on improved performance and scalability over tcp sockets
    - Fork ensures tcp provider stability while net provider is developed
    - Shares the tcp provider protocol and base implementation for msg endpoints
    - Integrates direct support for rdm endpoints, using a derivative from rxm
    - Implements own protocol for rdm endpoints, separate from rxm;tcp
  - OPX
    - Added initial support for SDMA
    - General performance enhancements
    - Performance improvements to reliability protocol
    - Improved deferred work pending complete
    - Added support for OPX_AV=runtime
    - Support iov memory registration ops
    - Added DAOS RPC support
    - Atomic ops enhancements
    - Improved documentation
    - Debug build enhancements
    - Fixed compiler warnings
    - Reduced time to compile prov/opx code
    - General bug fixes
    - Fixed PSN wrapping scaling
    - Added intranode fence
    - Addressed bugs discovered by coverity scan
  - PSM2
    - Fix sending CQ data in some instances of fi_tsendmsg
  - PSM3
    - Updated to match Intel Ethernet Fabric Suite (IEFS) 11.3 release
  - RxM
    - Update to read multiple completions at once from msg provider
    - Move RxM AV implementation to util code to share with net provider
    - Minor code cleanups
  - SHM
    - Implement and use ipc_cache
    - Add log messages for debugging and error tracking
    - Fix check for FI_MR_HMEM mr_mode
    - Move shm signal handlers initialization to EP
    - Added log messages for errors detected
  - TCP
    - Fix incorrect signaling of the CQ
    - Increase max number of poll events to retrieve
    - Acquire ep lock prior to flushing socket in shutdown
    - Verify ep state prior to progressing socket data
    - Read cm error data when receiving connreq response
    - Log error on connect failure
    - Fix assertion failure in CQ progress function
  - Util
    - Fix text in log of UFFD ioctl failure
    - Introduce cuda ipc monitor
    - Fix CQ memory leak handling overflow
    - Fix MR mode bit check for ver 1.5 and greater
    - Add max_array_size to track/check array overflow
    - Always progress transfers when reading from a CQ
    - Handle NULL address insertion
    - Try IPv4 before IPv6 addresses when starting name server
    - Fix IP util av default address length
    - Fix util IP getinfo path to read hints->addr_format
    - Fix debug print mismatch
    - Fix return code when memory allocation fails.
    - Fix build sign warning in ofi_bufpool_region_alloc
    - Minor code cleanups
    - Print warning if an addr is inserted into an AV again
  - Verbs
    - Fix support of FI_SOCKADDR_IB when requested by the application
    - Ensure all posted receives are flushed to the application
    - Update ofi_mr_cache_search API for hmem IPC support
    - Reduce logging verbosity for "no active ports"
    - Fix incorrect length used in memory registration
    - Various minor bug fixes for test failures
    - Fix a memory leak getting IB address
    - Implement verbs provider on Windows over NetworkDirect API
    - Set and check address format correctly
    - Only close qp if it was initialized
    - Portable detection of loopback device
  - Fabtests
    - multi_ep: Separate EP resources and fix MR registration
    - multi_recv: Fix possible crash and check for valid buffer
    - unexpected_msg: Fix printf compiler warning
    - dgram_pingpong.c: Use out-of-band sync
    - multinode: Make multinode tests platform agnostic, fix formatting
    - ubertest: Fix string comparison to include length, fix writedata completion check
    - av_test: add support for -e <ep_type>
    - New tests:
      - dmabuf-rdma: Component level test for dma-buf RDMA
      - sock_test: Component level performance test of poll, epoll, and select
      - rdm_stress: Multi-threaded, multi-process stress test for RDM endpoints
      - sighandler_test: Regression test for signal handler restoration
- Drop patches fixed upstream:
  - prov-opx-Correctly-disable-OPX-if-unsupported.patch
  - disable-flatten-attr.patch

OBS-URL: https://build.opensuse.org/request/show/1007631
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=75
2022-10-03 07:34:47 +00:00
Nicolas Morey-Chaisemartin
abc00bb762 Accepting request 989191 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.15.1
  - Core
    - Fix fi_info indentation error in fi_tostr
    - hmem_ze: Add runtime option to choose specific copy engine
    - Cleanup of configure HMEM checks
    - Fixed stringop-truncation in ofi_ifaddr_get_speed
    - Add utility provider log suffix to make logs easier to read
    - Fix truncation of ipv6 addressing
    - hmem: add support for AWS Trainium devices
    - Fix potential sscanf overflows
    - hmem: pass through device and flags when querying memory interface
    - Rework locking in several areas to convert spinlocks to mutexes
    - Add new locking abstractions to select lock types at runtime
    - Add new FI_PROTO_RXM_TCP for optimized rxm over tcp path
    - Fix windows implementation to remove fd from poll set
  - EFA
    - Added windows support through efawin (https://github.com/aws/efawin)
    - Added support of AWS neuron.
    - Added support of using gdrcopy to copy data from host to device.
    - Fixed a bug that caused 0-byte reads to fail.
    - Fixed a memory corruption issue that could cause forked processes to crash.
    - Extended testing coverage through new pytest based testing framework.
  - HOOKS
    - Add new hooking provider dmabuf_peer_mem
    - Enable DL build of hooking providers
    - Add HMEM memory registration hook
  - OPX
    - New provider supporting Cornelis Networks Omni-path hardware
  - PSM3
    - Updated psm3 to match IEFS 11.2.0.0 release
    - Added support for sockets (TCP/UDP) via a runtime selectable Hardware
      Abstraction Layer (HAL)
    - Added support for IPv6 addressing in RoCE and sockets
    - Added various NIC selection filtering options (wildcarded NIC name,
      address format, wildcarded IP subnet, link speed)
    - Performance tuning in conjunction with OneAPI and OneCCL
    - Improved PSM3_IDENTIFY output
    - Rename most internal symbols to psm3_
    - Corrected vulnerabilities found during Coverity scans
    - configure options refined and help text improved
    - PSM3_MULTI_EP has been deprecated (recommend always enabled, default
      is enabled [same default as previous releases])
    - Various bug fixes
  - RxM
    - Add check that atomic size is valid
    - Add support to passthru calls to tcp provider in specific
  - TCP
    - Add assert to verify RMA source/target msg sizes match
    - Wake-up threads blocked on CQ to update their poll events
    - Fix use of incorrect events in progress handler
    - Fixes for various compile warnings, mostly on Windows
    - Add support for FI_RMA_EVENT capability
    - Add support for completion counters
    - Fix check for CQ data in tagged messages
    - Add cancel support to shared rx context
    - Add src_addr receive buffer matching
    - Add provider control to assign a src_addr with an ep
    - Handle trecv with FI_PEEK flag
    - Allow binding a CQ with an SRX
    - Restructuring of code in source files
    - Handle EWOULDBLOCK returned by send call
    - Add hot (active) pollfd
  - SHM
    - Properly chain the original signal handlers
    - Avoid uninitialized variable with invalid atomic parameters
    - Fix 0 byte SAR read
    - Initialize len parameter to accept
    - Refactor and simplify protocol code
    - Remove broken support for 128-bit atomics
    - Fix FI_INJECT flag support
    - Add assert to verify RMA source/target msg sizes match
    - Set domain threading to thread safe
    - Fix possible use of uninitialized variable in av_insert
  - Util
    - Fix sign warning in ofi_bufpool_region_alloc
    - Remove unused variable from ofi_bufpool_destroy
    - Fix check for valid datatype in ofi_atomic_valid
    - Return with error if util_coll_sched_copy fails
    - Fix use of uninitialized variable in ofi_ep_allreduce
    - Fix memory access in ip_av_insertsym
    - Track ep per collective operation not with multicast
    - Restructure collective av set creation/destruction
    - Change most locks from spin locks to mutexes
    - Allow selection of spinlocks for CQ and domain objects
    - Fix AV default addrlen
    - Update fi_getinfo checks to include hints->addr_
    - Handle NULL address insertion to fi_av_insert
  - Verbs
    - Initial changes for compiling on Windows (via NetworkDirect)
    - Add a failover path to dma-buf based memory registration
    - Replace use of spin locks with mutexes
    - Check for valid qp prior to cleanup
    - Set and check the address format correctly in fi_getinfo
  - Fabtests
    - hmem_cuda: use device-allocated host buffer to fill device buffer
    - Add python scripts to control test execution
    - test_configs: include util provider in core config file
    - Add option "--pin-core"
    - Only call nrt_init once
    - Fix a bug in ft_neuron_cleanup
    - Correct help for unit test programs
    - Remove duplicate help prints from fi_mcast
    - configure.ac: fix --enable-debug=no not properly detected
    - msg_inject: handle the case where ft_tsendmsg returns -FI_EAGAIN
    - Add AWS Trainium device support
    - fi_inj_complete: Add FI_INJECT to fabtests
    - inj_complete.c: Make arguments align with the other tests
    - dgram_pingpong: handle the error return of fi_recv
    - recv_cancel: Remove requirement for unexpected msg handling
    - poll: Fix crash if unable to allocate pollset
    - ubertest: Add GPU testing and validation support
    - Add HMEM options parsing support
    - Update and re-enable fi_multi_ep test
- Add prov-opx-Correctly-disable-OPX-if-unsupported.patch to disable
  OPX compilation on non x86_64 systems

OBS-URL: https://build.opensuse.org/request/show/989191
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=71
2022-07-18 13:06:07 +00:00
Nicolas Morey-Chaisemartin
36cbb47841 Accepting request 971079 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.14.1
  - Core
    - Use non-shared memory allocations to use MADV_DONTFORK safely
    - Fix incorrect use of gdr_copy_from_mapping
    - Ensure proper timeout time for pollfds to avoid early exit
  - EFA
    - Handle read completion properly for multi_recv
    - Use shm's inject write when possible
    - Support 0 byte read
  - RxM
    - Ensure signaling the CQ fd after writing completion
    - Fix inject path for sending tagged messages with cq data
    - Negotiate credit based flow control support over CM
    - Add PID to CM messages to detect stale vs duplicate connections
    - Fix race handling unexpected messages from unknown peers
    - Fix possible leak of stack data in cm_accept
    - Restrict reported caps based on core provider
    - Delay starting listen until endpoint fully initialized
    - Verify valid atomic size
  - Sockets
    - Fix coverity reports on uninitialized data
    - Check for NULL pointers passed to memcpy
    - Add missing error return code from sock_ep_enable
  - TCP
    - Fix performance regression resulting from sparse pollfd sets
    - Fix assertion failure in CQ progress function
    - Do not generate error completions for inject msgs
    - Fix use of incorrect event names in progress handler
    - Fix check for CQ data in tagged messages
    - Make start_op array a static to reduce memory

OBS-URL: https://build.opensuse.org/request/show/971079
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=69
2022-04-20 11:30:05 +00:00
Nicolas Morey-Chaisemartin
a69e2dce28 Accepting request 932983 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.14.0
  - Add time stamps to log messages
  - Fix gdrcopy calculation of memory region size when aligned
  - Allow user to disable use of p2p transfers
  - Update fi_tostr to print FI_SHARED_CONTEXT text instead of value
  - Update fi_tostr to output field names matching header file names
  - Fix narrow race condition in ofi_init
  - Add new fi_log_sparse API to rate limit repeated log output
  - Define memory registration for buffers used for collective operations
  - EFA, SHM, TCP, RXM, and verbs fixes

OBS-URL: https://build.opensuse.org/request/show/932983
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=68
2021-11-25 14:12:36 +00:00
Nicolas Morey-Chaisemartin
dd36aca7a8 Accepting request 928694 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.13.2
  - Sort DL providers to ensure consistent load ordering
  - Update hooking providers to handle fi_open_ops calls to avoid crashes
  - Replace cassert with assert.h to avoid C++ headers in C code
  - Enhance serialization for memory monitors to handle external monitors
  - EFA, SHM, TCP, RxM and verbs fixes

OBS-URL: https://build.opensuse.org/request/show/928694
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=66
2021-11-02 09:39:02 +00:00
Nicolas Morey-Chaisemartin
a480721370 Accepting request 917134 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.13.1
  - Enable loading ZE library with dlopen()
  - Add IPv6 support to fi_pingpong
  - EFA, PSM3 and SHM fixes

OBS-URL: https://build.opensuse.org/request/show/917134
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=65
2021-09-06 14:36:39 +00:00
Nicolas Morey-Chaisemartin
c26bb2e322 Accepting request 905235 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.13.0
  - Fix behavior of fi_param_get parsing an invalid boolean value
  - Add new APIs to open, export, and import specialized fid's
  - Define ability to import a monitor into the registration cache
  - Add API support for INT128/UINT128 atomics
  - Fix incorrect check for provider name in getinfo filtering path
  - Allow core providers to return default attributes which are lower than
    maximum supported attributes in getinfo call
  - Add option to prefer external providers (in order discovered) over internal
    providers, regardless of provider version
  - Separate Ze (level-0) and DRM dependencies
  - Always maintain a list of all discovered providers
  - Fix incorrect CUDA warnings
  - Fix bug in cuda init/cleanup checking for gdrcopy support
  - Shift the order in which providers are called in fi_getinfo: move psm2
    ahead of psm3 and efa ahead of psmX
  - See NEWS.md for changelog

OBS-URL: https://build.opensuse.org/request/show/905235
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=64
2021-07-09 10:59:38 +00:00
Nicolas Morey-Chaisemartin
948cc1e28f Accepting request 882701 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.12.1
  - Fix initialization checks for CUDA HMEM support
  - Fail if a memory monitor is requested but not available
  - Adjust priority of psm3 provider to prefer HW specific providers,
    such as efa and psm2
  - EFA and PSM3 fixes
  - See NEWS.md for changelog

OBS-URL: https://build.opensuse.org/request/show/882701
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=62
2021-04-02 13:48:53 +00:00
Nicolas Morey-Chaisemartin
1cc7aa642e Accepting request 879115 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.12.0
  - See NEWS.md for changelog

OBS-URL: https://build.opensuse.org/request/show/879115
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=60
2021-03-15 09:05:41 +00:00
Nicolas Morey-Chaisemartin
c71b12878d Accepting request 872743 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.11.2 (bsc#1181983)
  - See NEWS.md for changelog

OBS-URL: https://build.opensuse.org/request/show/872743
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=58
2021-02-16 09:10:48 +00:00
Nicolas Morey-Chaisemartin
039b03f454 Accepting request 841253 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.11.1 (jsc#SLE-13312)
  - See NEWS.md for changelog

OBS-URL: https://build.opensuse.org/request/show/841253
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=56
2020-10-12 11:41:12 +00:00
Nicolas Morey-Chaisemartin
f456b2d922 Accepting request 839527 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.11.0
  - See NEWS.md for changelog

OBS-URL: https://build.opensuse.org/request/show/839527
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=54
2020-10-05 13:29:09 +00:00
Nicolas Morey-Chaisemartin
0d953f5d66 Accepting request 806799 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.10.1
  - See NEWS.md for changelog

OBS-URL: https://build.opensuse.org/request/show/806799
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=52
2020-05-18 08:15:17 +00:00
Nicolas Morey-Chaisemartin
ffab131426 Accepting request 798314 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.10.0
  - See NEWS.md for changelog

OBS-URL: https://build.opensuse.org/request/show/798314
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=50
2020-04-27 15:27:43 +00:00
Nicolas Morey-Chaisemartin
e8ffab7679 Accepting request 786346 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.9.1 (bsc#1160275)
  - See NEWS.md for changelog

OBS-URL: https://build.opensuse.org/request/show/786346
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=48
2020-03-19 08:44:16 +00:00
Nicolas Morey-Chaisemartin
1191900c72 Accepting request 750766 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.9.0 (jsc#SLE-8257)
  - See NEWS.md for changelog

OBS-URL: https://build.opensuse.org/request/show/750766
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=46
2019-11-25 14:46:04 +00:00
Nicolas Morey-Chaisemartin
078084ead9 Accepting request 734300 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.8.1 (jsc#SLE-8257)
  - See NEWS.md for changelog

OBS-URL: https://build.opensuse.org/request/show/734300
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=44
2019-10-01 12:37:21 +00:00
Nicolas Morey-Chaisemartin
ec4c504a89 Accepting request 733593 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.8.0
  - See NEWS.md for changelog

- Disable LTO (boo#1133235).

OBS-URL: https://build.opensuse.org/request/show/733593
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=43
2019-09-27 07:19:29 +00:00
Nicolas Morey-Chaisemartin
6a1a2a0cfa Accepting request 692501 from home:NMoreyChaisemartin:branches:science:HPC
- Update to 1.7.1
  - See NEWS.md for changelog

OBS-URL: https://build.opensuse.org/request/show/692501
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=39
2019-04-09 07:08:43 +00:00
Nicolas Morey-Chaisemartin
99d2939085 Accepting request 672842 from home:NMoreyChaisemartin:branches:libfabric-1.7
- Update to v1.7.0
  - fabtests and libfabric repos have been merged upstream

OBS-URL: https://build.opensuse.org/request/show/672842
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=35
2019-02-08 16:06:13 +00:00
Nicolas Morey-Chaisemartin
83b7fc95da Accepting request 644645 from home:NMoreyChaisemartin:branches:sp1-staging
- Update to v1.6.2 (fate#325852)
  - Core
    - Cleanup of debug messages
    - Fix compile issues with older compilers
    - Check that all debug compiler flags are supported by compiler
  - GNI
    - Fix problems with Scalable Endpoint creation
    - Fix interoperability problem with HPC toolkit
    - Improve configuration check for kdreg
  - PSM
    - Enforce FI_RMA_EVENT checking when updating counters
    - Fix race condition in fi_cq_readerr()
    - Always try to make progress when fi_cntr_read is called
  - PSM2
    - Revert "Avoid long delay in psm2_ep_close"
    - Fix memory corruption related to sendv
    - Performance tweak for bi-directional send/recv on KNL
    - Fix CPU detection
    - Enforce FI_RMA_EVENT checking when updating counters
    - Remove stale info from address vector when disconnecting
    - Fix race condition in fi_cq_readerr()
    - Adjust reported context numbers for special cases
    - Always try to make progress when fi_cntr_read is called
    - Support control functions related to MR mode
    - Unblock fi_cntr_wait on errors
    - Properly update error counters
    - Fix irregular performance drop for aggregated RMA operations
    - Reset Tx/Rx context counter when fabric is initialized
    - Fix incorrect completion event for iov send
    - Fix occasional assertion failure in psm2_ep_close
    - Avoid long delay in psm2_ep_close
    - Fix potential duplication of iov send completion
    - Replace some parameter checking with assertions
    - Check iov limit in sendmsg
    - Avoid adding FI_TRIGGER caps automatically
    - Avoid unnecessary calls to psmx2_am_progress()
  - RXM
    - Fix incorrect increments of error counters for small messages
    - Increment write completion counter for small transfers
    - Use FI_UNIVERSE_SIZE when defining MSG provider CQ size
    - Make TX, RX queue sizes independent of MSG provider
    - Make deferred requests opt-in
    - Fill missing rxm_conn in rx_buf when shared context is not used
    - Fix an issue where MSG endpoint recv queue got empty, resulting
      in a hang
    - Set FI_ORDER_NONE for tx and rx completion ordering
    - Serialize access to repost_ready_list
    - Reprocess unexpected messages on av update
    - Fix a bug in matching directed receives
    - Fix desc field when postponing RMA ops
    - Fix incorrect reporting of mem_tag format
    - Don't include FI_DIRECTED_RECV, FI_SOURCE caps if they're not needed
    - Fix matching for RMA I/O vectors
    - Fix reading pointer after freeing it.
    - Avoid reading invalid AV entry
    - Handle deleting the same address multiple times
    - Fix crash in fi_av_remove if FI_SOURCE wasn't enabled
  - Sockets
    - Increase maximum message size as a work-around for an MPICH bug
    - Fix use after free error handling triggered ops.
  - Verbs
    - Detect string format of wildcard address in node argument
    - Don't report unusable fi_info (no source IP address)
    - Don't assert when a verbs device exposes unsupported MTU types
    - Report correct rma_iov_limit
    - Add new variable - FI_VERBS_MR_CACHE_MERGE_REGIONS
    - eq->err.err must return a positive error code

OBS-URL: https://build.opensuse.org/request/show/644645
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=32
2018-10-25 13:11:09 +00:00
Nicolas Morey-Chaisemartin
1f3a59b06d Accepting request 587420 from home:NMoreyChaisemartin:branches:science:HPC
- Update to v1.6.0
  - Core
    - Introduces support for performing RMA operations to persistent memory
      See FI_RMA_PMEM capability in fi_getinfo.3
    - Define additional errno values
    - General code cleanups and restructuring
    - Force provider ordering when using dynamically loaded providers
    - Add const to fi_getinfo() hints parameter
    - Improve use of epoll for better scalability
    - Fixes to generic name service
  - PSM
    - Move environment variable reading out from fi_getinfo()
    - Shortcut obviously unsuccessful fi_getinfo() calls
    - Remove excessive name server implementation
    - Enable ordering of RMA operations
  - PSM2
    - Skip inactive units in round-robin context allocation
    - Allow contexts to be shared by Tx-only and Rx-only endpoints
    - Use utility functions to check provider attributes
    - Turn on FI_THREAD_SAFE support
    - Make address vector operations thread-safe
    - Move environment variable reading out from fi_getinfo()
    - Reduce noise when optimizing tagged message functions
    - Shortcut obviously unsuccessful fi_getinfo() calls
    - Improve how Tx/Rx context limits are handled
    - Support auto selection from two different tag layout schemes
    - Add provider build options to debug output
    - Support remote CQ data for tagged messages, add specialization.
    - Support opening multiple domains
    - Put trigger implementation into a separate file
    - Update makefile and configure script
    - Replace allocated context with reserved space in psm2_mq_req
    - Limit exported symbols for DSO provider
    - Reduce HW context usage for certain TX only endpoints
    - Remove unnecessary dependencies from the configure script
    - Refactor the handling of op context type
    - Optimize the conversion between 96-bit and 64-bit tags
    - Code refactoring for completion generation
    - Remove obsolete feature checking code
    - Report correct source address for scalable endpoints
    - Allow binding any number of endpoints to a CQ/counter
    - Add shared Tx context support
    - Add alternative implementation for completion polling
    - Change the default value of FI_PSM2_DELAY to 0
    - Add an environment variable for automatic connection cleanup
    - Abstract the completion polling mechanism
    - Use the new psm2_am_register_handlers_2 function when available
    - Allow specialization when FI_COMPLETION op_flag is set.
    - Put Tx/Rx context related functions into a separate file
    - Enable PSM2 multi-ep feature by default
    - Add option to build with PSM2 source included
    - Simplify the code for checking endpoint capabilities
    - Simplify the handling of self-targeted RMA operations
    - Allow all free contexts to be used for scalable endpoints
    - Enable ordering of RMA operations
    - Enable multiple endpoints over PSM2 multi-ep support
    - Support multiple Tx/Rx contexts in address vector
    - Remove the virtual lane mechanism
    - Reduce code duplication in tagged messaging, add more specialization
    - Allow PSM2 epid to be reused within the same session
    - Turn on user adjustable inject size for all operations
    - Use pre-allocated memory pool for RMA requests
    - Add support for lazy connection
    - Various bug fixes
  - SHM
    - Initial release of shared memory provider
    - See the fi_shm.7 man page for details on available features and limitations
  - Sockets
    - Scalability enhancements
    - Fix issue associating a connection with an AV entry that could result in
      application hangs
    - Add support for new persistent memory capabilities
    - Fix fi_cq_signal to unblock threads waiting on cq sread calls
    - Fix epoll_wait loop handling to avoid out of memory errors
    - Add support for TCP keepalives, controllable via environment variables
    - Reduce the number of threads allocated for handling connections
    - Several code cleanups in response to static code analysis reports
    - Fix reporting multiple completion events for the same request in error cases
  - usNIC
    - Minor adjustments to match new core MR mode bits functionality
    - Several code cleanups in response to static code analysis reports
  - Verbs
    - Code cleanups and simplifications
    - General code optimizations to improve performance
    - Fix handling of wildcard addresses
    - Check for fatal errors during connection establishment
    - Support larger inject sizes
    - Fix double locking issue
    - Add support for memory registration caching (disabled by default)
    - Enable setting thread affinity for CM threads
    - Fix hangs in MPI closing RDM endpoints
    - Add support for different CQ formats
    - Fix RMA read operations over iWarp devices
    - Optimize CM progress handling
    - Several bug fixes

OBS-URL: https://build.opensuse.org/request/show/587420
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=30
2018-03-15 08:24:31 +00:00
Nicolas Morey-Chaisemartin
c053ee0122 Accepting request 558744 from home:NMoreyChaisemartin:branches:science:HPC
- Update to v1.5.3
  - Core
    - Handle malloc failures
    - Ensure global lock is initialized on Windows
    - Fix spelling and formatting errors in man pages
  - PSM
    - Fix print format mismatches
    - Remove 15 second startup delay when no hardware is installed
    - Preserve FI_MR_SCALABLE mode bit for backwards compatibility
  - PSM2
    - Fix print format mismatches
    - Allow all to all communication between scalable endpoints
    - Preserve FI_MR_SCALABLE mode bit for backwards compatibility
    - Fix reference counting issue with opened domains
    - Fix segfault for RMA/atomic operations to local scalable endpoints
    - Fix resource counting related issues for Tx/Rx contexts
    - Allow completion suppression when fi_context is non-NULL
    - Use correct queue for triggered operations with scalable endpoints
  - Sockets
    - Fix check for invalid connection handle
    - Fix crash in fi_av_remove
  - Util
    - Fix number of bits used for connection index
  - Verbs
    - Fix incorrect CQ entry data for MSG endpoints
    - Properly check for errors from getifaddrs
    - Retry getifaddrs on failure because of busy netlink sockets
    - Ack CM events on error paths
- Remove 0001-prov-psm-Eliminate-psm2-compat-library-delay-with-hf.patch
   as it was merged upstream

OBS-URL: https://build.opensuse.org/request/show/558744
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=28
2017-12-20 09:03:14 +00:00
Nicolas Morey-Chaisemartin
ad87934964 Accepting request 544093 from home:NMoreyChaisemartin:branches:science:HPC
- Update to v1.5.2
  - Core
    - Fix PowerPC 32-bit build
  - Sockets
    - Fix incorrect reporting of counter attributes
  - Verbs
    - Fix reporting attributes based on device limits
    - Fix incorrect CQ size reported for iWarp NICs
    - Update man page with known issues for specific NICs
    - Fix FI_RX_CQ_DATA mode check
    - Disable on-demand paging by default (can cause data corruption)
    - Disable loopback (localhost) addressing (causing failures in MPI)

OBS-URL: https://build.opensuse.org/request/show/544093
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=26
2017-11-21 08:58:36 +00:00
Nicolas Morey-Chaisemartin
db51c4fa52 Accepting request 532702 from home:NMoreyChaisemartin:branches:science:HPC
- Update to v1.5.1
  - Core
    - Fix initialization used by DL providers to avoid crash
    - Add checks for null hints and improperly terminated strings
    - Check for invalid core names passed to fabric open
    - Provide consistent provider ordering when using DL providers
    - Fix OFI_LIKELY definitions when GNUC is not present
  - GNI
    - Add ability to detect local PE rank
    - Fix compiler/config problems
    - Fix CQ read error corruption
    - Remove tests of deprecated interfaces
  - PSM
    - Fix CQ corruption reporting errors
    - Always generate a completion on error
  - PSM2
    - Fix CQ corruption reporting errors
    - Always generate a completion on error
    - Add checks to handle out of memory errors
    - Add NULL check for iov in atomic readv/writev calls
    - Fix FI_PEEK src address matching
    - Fix bug in scalable endpoint address resolution
    - Fix segfault bug in RMA completion generation
  - Sockets
    - Fix missing FI_CLAIM src address data on completion
    - Fix CQ corruption reporting errors
    - Fix serialization issue with out-of-order CPU writes to the Tx ring buffer
  - Verbs
    - Allow modifying RNR retry timeout to improve performance
    - Add checks to handle out of memory errors

OBS-URL: https://build.opensuse.org/request/show/532702
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=21
2017-10-09 09:47:51 +00:00
Nicolas Morey-Chaisemartin
ec21810cab Accepting request 521126 from home:NMoreyChaisemartin:branches:science:HPC
- Update _service to allow auto updates from GitHub

OBS-URL: https://build.opensuse.org/request/show/521126
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=19
2017-09-05 13:35:45 +00:00
Nicolas Morey-Chaisemartin
077af6fa82 Accepting request 515855 from home:NMoreyChaisemartin:branches:science:HPC
- Update to v1.5.0
  * Authorization keys: Authorization keys, commonly referred to as job keys,
    are used to isolate processes from communicating with other processes
    for security purposes.
  * Multicast support: Datagram endpoints can now support multicast communication.
  * (Experimental) socket-like endpoint types: New FI_SOCK_STREAM and FI_SOCK_DGRAM
    endpoint types are introduced. These endpoint types target support of cloud
    and enterprise based middleware and applications.
  * Tagged atomic support: Atomic operations can now target tagged receive
    buffers, in addition to RMA buffers.
  * (Experimental) deferred work queues: Deferred work queues are enhanced triggered
    operations. They target support for collective-based operations.
  * New mode bits: FI_RESTRICTED_COMP and FI_NOTIFY_FLAGS_ONLY. These mode bits
    support optimized completion processing to minimize software overhead.
  * Multi-threaded error reporting: Reading CQ and EQ errors now allows the application
    to provide the error buffer, eliminating the need for the application to
    synchronize between multiple threads when handling errors.
  * FI_SOURCE_ERR capability: This feature allows the provider to validate and
    report the source address for any received messages.
  * FI_ADDR_STR string based addressing: Applications can now request and use
    addresses provided using a standardized string format. This makes it easier
    to pass full addressing data through a command line, or handle address exchange
    through text files.
  * Communication scope capabilities: FI_LOCAL_COMM and FI_REMOTE_COMM. These are
    used to indicate whether an application requires communication with peers on
    the same node and/or remote nodes.
  * New memory registration modes: The FI_MR_BASIC and FI_MR_SCALABLE memory registration
    modes have been replaced by more refined registration mode bits. This allows
    applications to make better use of provider hardware capabilities when dealing
    with registered memory regions.
  * New mode bit: FI_CONTEXT2. Some providers need more than the size provided by the
    FI_CONTEXT mode bit setting. To accommodate such providers, an FI_CONTEXT2 mode bit
    was added. This mode bit doubles the amount of context space that an application
    allocates on behalf of the provider.
  * PSM provider notes
    * Improve the name server functionality and move to the utility code
    * Handle updated mr_mode definitions
    * Add support of 32 and 64 bit atomic values
  * PSM2 provider notes
    * Add option to adjust the locking level
    * Improve the name server functionality and move to the utility code
    * Add support for string address format
    * Add an environment variable for message inject size
    * Handle FI_DISCARD in tagged receive functions
    * Handle updated mr_mode definitions
    * Add support for scalable endpoint
    * Add support of 32 and 64 bit atomic values
    * Add FI_SOURCE_ERR to the supported caps
    * Improve the method of checking device existence
  * Sockets provider notes
    * Updated and enhanced atomic operation support.
    * Add support for experimental deferred work queue operations.
    * Fixed counter signaling when used with wait sets.
    * Improved support on Windows.
    * Cleaned up event reporting for destroyed endpoints.
    * Fixed several possible crash scenarios.
    * Fixed handling socket disconnect events which could hang the provider.
  * UDP provider notes
    * Add support for multicast data transfers
  * Verbs provider notes
    * Fix an issue where user-requested tx and rx context sizes larger than
      the defaults were not honored.
    * Introduce env variables for setting default tx, rx context sizes and iov limits.
    * Report correct completion ordering supported by MSG endpoints.
  * Fix rpmbuild warnings

OBS-URL: https://build.opensuse.org/request/show/515855
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=16
2017-08-10 08:57:14 +00:00
Nicolas Morey-Chaisemartin
eec67e9b6b Accepting request 495397 from science:HPC:rdma-core
- Update to v1.4.2 (bsc#1036907).

OBS-URL: https://build.opensuse.org/request/show/495397
OBS-URL: https://build.opensuse.org/package/show/science:HPC/libfabric?expand=0&rev=9
2017-05-16 15:52:20 +00:00