Accepting request 1160107 from home:eeich:branches:science

- Cleaned up changelog:
  * Added missing changes from the 0.3.22 to 0.3.24 releases.
  * Formatted the list of package changes in markdown for easier
    conversion.
  * Dropped all entries that are irrelevant to SUSE or to
    users:
    - build-related, in particular CMAKE
    - OS-related, except Linux
    - related to compilers not supported on SUSE
    - related to architectures presently not supported on SUSE

OBS-URL: https://build.opensuse.org/request/show/1160107
OBS-URL: https://build.opensuse.org/package/show/science/openblas?expand=0&rev=173
Egbert Eich 2024-05-13 12:15:17 +00:00 committed by Git OBS Bridge
parent b0b71280c4
commit 92f8b96ec2
2 changed files with 176 additions and 46 deletions

@@ -9,53 +9,39 @@ Wed Jan 17 08:47:55 UTC 2024 - Egbert Eich <eich@suse.com>
 - Update to version 0.3.26:
   * General:
-    - Improved the version of openblas.pc that is created by the
-      CMAKE build.
-    - Fixed a CMAKE-specific build problem on older versions of
-      MacOS.
-    - Worked around linking problems on old versions of MacOS.
-    - Corrected installation location of the lapacke_mangling
-      header in CMAKE builds.
     - Added type declarations for complex variables to the
       MSVC-specific parts of the LAPACK header.
-    - Significantly sped up ?GESV for small problem sizes by
+    - Significantly sped up `?GESV` for small problem sizes by
       introducing a lower bound for multithreading.
     - Imported additions and corrections from the Reference-LAPACK
       project:
-      + Added new LAPACK functions for truncated QR with pivoting
+      + Added new LAPACK functions for truncated `QR` with pivoting
         (Reference-LAPACK PRs 891&941).
       + Handle miscalculation of minimum work array size in corner
         cases (Reference-LAPACK PR 942).
-      + Fixed use of uninitialized variables in ?GEDMD and
+      + Fixed use of uninitialized variables in `?GEDMD` and
         improved inline documentation.
       + Fixed use of uninitialized variables (and consequential
-        failures) in ?BBCSD.
+        failures) in `?BBCSD`.
       + Added tests for the recently introduced Dynamic Mode
         Decomposition functions.
       + Fixed several memory leaks in the LAPACK testsuite.
-      + Fixed counting of testsuite results by the Python script.
   * x86-64:
-    - Fixed computation of CASUM on SkylakeX and newer targets in
+    - Fixed computation of `CASUM` on SkylakeX and newer targets in
       the special case that AVX512 is not supported by the compiler
       or operating environment.
-    - Fixed potential undefined behaviour in the CASUM/ZASUM
+    - Fixed potential undefined behaviour in the `CASUM`/`ZASUM`
       kernels for AVX512 targets.
-    - worked around a problem in the pre-AVX kernels for GEMV
+    - worked around a problem in the pre-AVX kernels for `GEMV`
   * arm64:
-    - Sped up SGEMM and DGEMM on Neoverse V1 and N1.
-    - Sped up ?DOT on SVE-capable targets.
-    - Reduced the number of targets in DYNAMIC_ARCH builds by
+    - Sped up `SGEMM` and `DGEMM` on Neoverse V1 and N1.
+    - Sped up `?DOT` on SVE-capable targets.
+    - Reduced the number of targets in `DYNAMIC_ARCH` builds by
       eliminating functionally equivalent ones.
   * POWER:
     - Improved the SGEMM kernel for POWER10.
     - Fixed compilation with (very) old versions of gcc.
-    - Fixed detection of old 32bit PPC targets in CMAKE-based
-      builds.
     - Added autodetection of the POWERPC 7400 subtype.
-    - Fixed CMAKE-based compilation for PPCG4 and PPC970 targets.
-  * LONGARCH64:
-    - Added and improved optimized kernels for almost all BLAS
-      functions.
 -------------------------------------------------------------------
 Wed Dec 20 12:02:55 UTC 2023 - Giacomo Comes <gcomes.obs@gmail.com>
@@ -72,44 +58,188 @@ Wed Nov 29 05:43:18 UTC 2023 - Atri Bhattacharya <badshah400@gmail.com>
       thread count
     - improved the code to add supplementary thread buffers in
       case of overflow
-    - fixed a potential division by zero in ?ROTG
-    - improved the ?MATCOPY functions to accept zero-sized rows or
+    - fixed a potential division by zero in `?ROTG`
+    - improved the `?MATCOPY` functions to accept zero-sized rows or
       columns
     - corrected empty prototypes in function declarations
     - cleaned up unused declarations in the f2c-converted versions
       of the LAPACK sources
-    - fixed compilation with the Cray CCE Compiler suite
     - improved link line rewriting to avoid mixed libgomp/libomp
       builds with clang&gfortran
-    - worked around OPENMP builds with LLVM14's libomp hanging on
-      FreeBSD
-    - improved the Makefiles to require less option duplication on
-      "make install"
     - imported the following changes from the upcoming release
       3.12 of Reference-LAPACK: LAPACK PR 900, LAPACK PR 904,
       LAPACK PR 907, LAPACK PR 909, LAPACK PR 926, LAPACK PR 927,
       LAPACK PR 928 & 930
   * x86-64:
-    - fixed compile-time autodetection of AMD Ryzen3 and Ryzen4
-      cpus
     - fixed capability-based fallback selection for unknown cpus
-      in DYNAMIC_ARCH
-    - added AVX512 optimizations for ?ASUM on Sapphire Rapids and
+      in `DYNAMIC_ARCH`
+    - added AVX512 optimizations for `?ASUM` on Intel Sapphire Rapids and
       Cooper Lake
   * ARM64:
-    - fixed building on Apple with homebrew gcc
     - fixed building with XCODE 15
     - fixed building on A64FX and Cortex A710/X1/X2
-    - increased the default buffer size for recent ARM server cpus
+    - increased the default buffer size for recent arm server cpus
   * POWER:
-    - fixed building with the IBM xlf 16.1.1 compiler
-    - fixed building with IBM XL C
-    - added support for DYNAMIC_ARCH builds with clang
-    - fixed union declaration in the BFLOAT16 test case
-    - enable optimizations for the AIX assembler on POWER10
-  * LOONGARCH64:
-    - added an optimized SGEMV kernel
-    - added an optimized DTRSM kernel
+    - added support for `DYNAMIC_ARCH` builds with clang
+    - fixed union declaration in the `BFLOAT16` test case
+- Changes in version 0.3.24
+  * General:
+    - Declared the arguments of `cblas_xerbla` as `const`
+      (in accordance with the reference implementation
+      and others, the previous discrepancy appears to have dated
+      back to GotoBLAS)
+    - fixed the implementation of `?GEMMT` that was added in 0.3.23
+    - made cpu-specific `SWITCH_RATIO` parameters for GEMM
+      available to `DYNAMIC_ARCH` builds
+    - fixed missing `SSYCONVF` function in the shared library
+    - fixed parallel build logic used with gmake
+    - fixed several issues with the handling of runtime limits on
+      the number of OPENMP threads
+    - corrected the error code returned by `SGEADD`/`DGEADD` when
+      LDA is too small
+    - corrected the error code returned by `IMATCOPY` when LDB
+      is too small
+    - updated `?NRM2` to support negative increment values (as
+      introduced in release 3.10.0 of the Reference BLAS)
+    - updated `?ROTG` to use the safe scaling algorithm introduced
+      in release 3.10.0 of the Reference BLAS
+    - fixed OpenMP builds with CLANG for the case where libomp is
+      not in a standard location
+    - fixed a potential overwrite of unrelated memory during
+      thread initialisation on startup
+    - fixed a potential integer overflow in the multithreading
+      threshold for `?SYMM`/`?SYRK`
+    - fixed build of the LAPACKE interfaces for the LAPACK 3.11.0
+      `?TRSYL` functions added in 0.3.22
+    - applied additions and corrections from the development
+      branch of Reference-LAPACK:
+      - fixed actual arguments passed to a number of LAPACK
+        functions (from Reference-LAPACK PR 885)
+      - fixed workspace query results in LAPACK `?SYTRF`/`?TRECV3`
+        (from Reference-LAPACK PR 883)
+      - fixed derivation of the UPLO parameter in `LAPACKE_?larfb`
+        (from Reference-LAPACK PR 878)
+      - fixed a crash in LAPACK `?GELSDD` on `NRHS=0` (from
+        Reference-LAPACK PR 876)
+      - added new LAPACK utility functions `CRSCL` and `ZRSCL`
+        (from Reference-LAPACK PR 839)
+      - corrected the order of eigenvalues for 2x2 matrices in
+        `?STEMR` (Reference-LAPACK PR 867)
+      - removed spurious reference to OpenMP variables outside
+        OpenMP contexts (Reference-LAPACK PR 860)
+      - updated file comments on use of `LAMBDA` variable in
+        LAPACK (Reference-LAPACK PR 852)
+      - fixed documentation of LAPACK `SLASD0`/`DLASD0`
+        (Reference-LAPACK PR 855)
+      - fixed confusing use of "minor" in LAPACK documentation
+        (Reference-LAPACK PR 849)
+      - added new LAPACK functions ?GEDMD for dynamic mode
+        decomposition (Reference-LAPACK PR 736)
+      - fixed potential stack overflows in the `EIG` part of the
+        LAPACK testsuite (Reference-LAPACK PR 854)
+      - applied small improvements to the variants of
+        Cholesky and QR functions (Reference-LAPACK PR 847)
+      - removed unused variables from LAPACK `?BDSQR`
+        (Reference-LAPACK PR 832)
+      - fixed a potential crash on allocation failure in LAPACKE
+        `SGEESX`/`DGEESX` (Reference-LAPACK PR 836)
+      - added a quick return from `SLARUV`/`DLARUV` for N < 1
+        (Reference-LAPACK PR 837)
+      - updated function descriptions in LAPACK `?GEGS`/`?GEGV`
+        (Reference-LAPACK PR 831)
+      - improved algorithm description in `?GELSY`
+        (Reference-LAPACK PR 833)
+      - fixed scaling in LAPACK `STGSNA`/`DTGSNA`
+        (Reference-LAPACK PR 830)
+      - fixed crash in `LAPACKE_?geqrt` with row-major data
+        (Reference-LAPACK PR 768)
+      - added LAPACKE interfaces for `C/ZUNHR_COL` and
+        `S/DORHR_COL` (Reference-LAPACK PR 827)
+      - added error exit tests for `SYSV`/`SYTD2`/`GEHD2` to
+        the testsuite (Reference-LAPACK PR 795)
+      - fixed typos in LAPACK source and comments
+        (Reference-LAPACK PRs 809,811,812,814,820)
+      - adopt refactored `?GEBAL` implementation
+        (Reference-LAPACK PR 808)
+  * x86_64:
+    - added cpu model autodetection for Intel Alder Lake N
+    - added activation of the AMX tile to the Sapphire Rapids
+      `SBGEMM` kernel
+    - worked around miscompilations of GEMV/SYMV kernels by
+      gcc's tree-vectorizer
+    - fixed runtime detection of Cooperlake and Sapphire Rapids
+      in `DYNAMIC_ARCH`
+    - fixed feature-based cputype fallback in `DYNAMIC_ARCH`
+    - corrected `ZAXPY` result on old pre-AVX hardware for the
+      `INCX=0` case
+    - fixed a potential use of uninitialized variables in ZTRSM
+  * ARMV8:
+    - implemented SWITCH_RATIO parameter for improved GEMM
+      performance on Neoverse
+    - activated SVE SGEMM and DGEMM kernels for Neoverse V1
+    - improved performance of the SVE CGEMM and ZGEMM kernels
+      on Neoverse V1
+    - improved kernel selection for the ARMV8SVE target and added
+      it to `DYNAMIC_ARCH`
+    - fixed runtime check for SVE availability in `DYNAMIC_ARCH`
+      builds to take OS or container restrictions into account
+    - fixed a potential use of uninitialized variables in ZTRSM
+  * POWER:
+    - fixed compiler warnings in the POWER10 SBGEMM kernel
+- Changes in version 0.3.23
+  * General:
+    - fixed a serious regression in `GETRF`/`GETF2` and
+      `ZGETRF`/`ZGETF2` where subnormal but nonzero data elements
+      triggered the singularity flag
+    - fixed a long-standing bug in `CSPR`/`ZSPR` in single-threaded
+      operation
+    - for cases where elements of the X vector are real numbers (or
+      complex with only the real part zero)
+  * x86_64:
+    - added further CPUID values for Intel Raptor Lake
+- Changes in version 0.3.22
+  * General:
+    - Updated the included LAPACK to Reference-LAPACK release 3.11.0
+      plus post-release corrections and improvements
+    - Added a threshold for multithreading in `SYMM`, `SYMV` and
+      `SYR2K`
+    - Increased the threshold for multithreading in `SYRK`
+    - OpenBLAS no longer decreases the global `OMP_NUM_THREADS`
+      when it exceeds the maximum thread count the library was
+      compiled for.
+    - fixed `?GETF2` potentially returning `NaN` with tiny matrix
+      elements
+    - fixed `openblas_set_num_threads` to work in `USE_OPENMP`
+      builds.
+    - fixed cpu core counting in `USE_OPENMP` builds returning the
+      number of OMP "places" rather than cores
+    - fixed stride calculation in the optimized small-matrix path of
+      complex `SYR`
+    - fixed building of Reference-LAPACK with recent gfortran
+    - added new environment variable `OPENBLAS_DEFAULT_NUM_THREADS`
+    - added a GEMV-based implementation of `GEMMT`
+  * x86_64:
+    - added autodetection of Intel Raptor Lake cpu models
+    - added SSCAL microkernels for Haswell and newer targets
+    - improved the performance of the Haswell DSCAL microkernel
+    - added CSCAL and ZSCAL microkernels for SkylakeX targets
+    - fixed detection of gfortran and Cray CCE compilers
+    - fixed runtime selection of COOPERLAKE in `DYNAMIC_ARCH` builds
+    - worked around gcc/llvm using risky FMA operations in
+      CSCAL/ZSCAL
+  * ARMV8:
+    - fixed cross-compilation to CortexA53 with CMAKE
+    - fixed compilation with CMAKE and "Arm Compiler for Linux 22.1"
+    - added cpu autodetection for Cortex X3 and A715
+    - fixed conditional compilation of SVE-capable targets in
+      `DYNAMIC_ARCH`
+    - sped up SVE kernels by removing unnecessary prefetches
+    - improved the GEMM performance of Neoverse V1
+    - added SVE kernels for SDOT and DDOT
+    - added an SBGEMM kernel for Neoverse N2
+    - improved cpu-specific compiler option selection for
+      Neoverse cpus
+    - added support for setting `CONSISTENT_FPCSR`
 - Minor rebase of openblas-ppc64be_up2_p8.patch to apply cleanly.
 - Drop upstreamed patches:
   * Use-blasint-for-INTERFACE64-compatibility.patch

@@ -434,7 +434,7 @@ make MAKE_NB_JOBS=$jobs %{?openblas_target} %{?build_flags} \
 %{?dynamic_list} \
 %{!?with_hpc:%{?libnamesuffix} FC=gfortran CC=gcc%{?cc_v:-%{cc_v}} %{?cc_v:CEXTRALIB=""}} \
 %{?ldflags_tests:LDFLAGS_TESTS=%{ldflags_tests}} \
-%{?with_hpc:%{?cc_v:CC=gcc-%{cc_v} CEXTRALIB=""}}
+%{?with_hpc:%{?cc_v:CC=gcc-%{cc_v} FC=gfortran-%{cc_v} CEXTRALIB=""}}
 %install
 %if %{with hpc}