Accepting request 1173654 from science

- Cleaned up changelog:
  * Added missing changes from the 0.3.22 to 0.3.24 releases.
  * Formatted the list of package changes as markdown for easier
    conversion.
  * Dropped all entries that are irrelevant for SUSE or to users:
    - build related, in particular CMAKE
    - OS-related except Linux
    - related to compilers not supported on SUSE
    - related to architectures presently not supported on SUSE

(forwarded request 1160107 from eeich)

OBS-URL: https://build.opensuse.org/request/show/1173654
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/openblas?expand=0&rev=61
2024-05-14 11:37:27 +00:00
commit 7de1c526a3
parent 9beac0d378 92f8b96ec2
2 changed files with 176 additions and 46 deletions
--- a/openblas.changes
+++ b/openblas.changes
@@ -9,53 +9,39 @@ Wed Jan 17 08:47:55 UTC 2024 - Egbert Eich <eich@suse.com>

 - Update to version 0.3.26:
  * General:
-    - Improved the version of openblas.pc that is created by the
-      CMAKE build.
-    - Fixed a CMAKE-specific build problem on older versions of
-      MacOS.
-    - Worked around linking problems on old versions of MacOS.
-    - Corrected installation location of the lapacke_mangling
-      header in CMAKE builds.
    - Added type declarations for complex variables to the
      MSVC-specific parts of the LAPACK header.
-    - Significantly sped up ?GESV for small problem sizes by
+    - Significantly sped up `?GESV` for small problem sizes by
      introducing a lower bound for multithreading.
    - Imported additions and corrections from the Reference-LAPACK
      project:
-      + Added new LAPACK functions for truncated QR with pivoting
+      + Added new LAPACK functions for truncated `QR` with pivoting
        (Reference-LAPACK PRs 891&941).
      + Handle miscalculation of minimum work array size in corner
      	cases (Reference-LAPACK PR 942).
-      + Fixed use of uninitialized variables in ?GEDMD and
+      + Fixed use of uninitialized variables in `?GEDMD` and
      	improved inline documentation.
      + Fixed use of uninitialized variables (and consequential
-      	failures) in ?BBCSD.
+      	failures) in `?BBCSD`.
      + Added tests for the recently introduced Dynamic Mode
      	Decomposition functions.
      + Fixed several memory leaks in the LAPACK testsuite.
-      + Fixed counting of testsuite results by the Python script.
  * x86-64:
-    - Fixed computation of CASUM on SkylakeX and newer targets in
+    - Fixed computation of `CASUM` on SkylakeX and newer targets in
      the special case that AVX512 is not supported by the compiler
      or operating environment.
-    - Fixed potential undefined behaviour in the CASUM/ZASUM
+    - Fixed potential undefined behaviour in the `CASUM`/`ZASUM`
      kernels for AVX512 targets.
-    - worked around a problem in the pre-AVX kernels for GEMV
+    - Worked around a problem in the pre-AVX kernels for `GEMV`
  * arm64:
-    - Sped up SGEMM and DGEMM on Neoverse V1 and N1.
-    - Sped up ?DOT on SVE-capable targets.
-    - Reduced the number of targets in DYNAMIC_ARCH builds by
+    - Sped up `SGEMM` and `DGEMM` on Neoverse V1 and N1.
+    - Sped up `?DOT` on SVE-capable targets.
+    - Reduced the number of targets in `DYNAMIC_ARCH` builds by
      eliminating functionally equivalent ones.
  * POWER:
    - Improved the SGEMM kernel for POWER10.
    - Fixed compilation with (very) old versions of gcc.
-    - Fixed detection of old 32bit PPC targets in CMAKE-based
-      builds.
    - Added autodetection of the POWERPC 7400 subtype.
-    - Fixed CMAKE-based compilation for PPCG4 and PPC970 targets.
-  * LONGARCH64:
-    - Added and improved optimized kernels for almost all BLAS
-      functions.

 -------------------------------------------------------------------
 Wed Dec 20 12:02:55 UTC 2023 - Giacomo Comes <gcomes.obs@gmail.com>
@@ -72,44 +58,188 @@ Wed Nov 29 05:43:18 UTC 2023 - Atri Bhattacharya <badshah400@gmail.com>
      thread count
    - improved the code to add supplementary thread buffers in
      case of overflow
-    - fixed a potential division by zero in ?ROTG
-    - improved the ?MATCOPY functions to accept zero-sized rows or
+    - fixed a potential division by zero in `?ROTG`
+    - improved the `?MATCOPY` functions to accept zero-sized rows or
      columns
    - corrected empty prototypes in function declarations
    - cleaned up unused declarations in the f2c-converted versions
      of the LAPACK sources
-    - fixed compilation with the Cray CCE Compiler suite
    - improved link line rewriting to avoid mixed libgomp/libomp
      builds with clang&gfortran
-    - worked around OPENMP builds with LLVM14's libomp hanging on
-      FreeBSD
-    - improved the Makefiles to require less option duplication on
-      "make install"
    - imported the following changes from the upcoming release
      3.12 of Reference-LAPACK: LAPACK PR 900, LAPACK PR 904,
      LAPACK PR 907, LAPACK PR 909, LAPACK PR 926, LAPACK PR 927,
      LAPACK PR 928 & 930
  * x86-64:
-    - fixed compile-time autodetection of AMD Ryzen3 and Ryzen4
-      cpus
    - fixed capability-based fallback selection for unknown cpus
-      in DYNAMIC_ARCH
-    - added AVX512 optimizations for ?ASUM on Sapphire Rapids and
+      in `DYNAMIC_ARCH`
+    - added AVX512 optimizations for `?ASUM` on Intel Sapphire Rapids and
      Cooper Lake
  * ARM64:
-    - fixed building on Apple with homebrew gcc
    - fixed building with XCODE 15
    - fixed building on A64FX and Cortex A710/X1/X2
-    - increased the default buffer size for recent ARM server cpus
+    - increased the default buffer size for recent arm server cpus
  * POWER:
-    - fixed building with the IBM xlf 16.1.1 compiler
-    - fixed building with IBM XL C
-    - added support for DYNAMIC_ARCH builds with clang
-    - fixed union declaration in the BFLOAT16 test case
-    - enable optimizations for the AIX assembler on POWER10
-  * LOONGARCH64:
-    - added an optimized SGEMV kernel
-    - added an optimized DTRSM kernel
+    - added support for `DYNAMIC_ARCH` builds with clang
+    - fixed union declaration in the `BFLOAT16` test case
+- Changes in version 0.3.24
+  * General:
+    - Declared the arguments of `cblas_xerbla` as `const`
+      (in accordance with the reference implementation
+      and others, the previous discrepancy appears to have dated
+      back to GotoBLAS)
+    - fixed the implementation of `?GEMMT` that was added in 0.3.23
+    - made cpu-specific `SWITCH_RATIO` parameters for GEMM
+      available to `DYNAMIC_ARCH` builds
+    - fixed missing `SSYCONVF` function in the shared library
+    - fixed parallel build logic used with gmake
+    - fixed several issues with the handling of runtime limits on
+      the number of OPENMP threads
+    - corrected the error code returned by `SGEADD`/`DGEADD` when
+      LDA is too small
+    - corrected the error code returned by `IMATCOPY` when LDB
+      is too small
+    - updated `?NRM2` to support negative increment values (as
+      introduced in release 3.10.0 of the Reference BLAS)
+    - updated `?ROTG` to use the safe scaling algorithm introduced
+      in release 3.10.0 of the Reference BLAS
+    - fixed OpenMP builds with CLANG for the case where libomp is
+      not in a standard location
+    - fixed a potential overwrite of unrelated memory during
+      thread initialisation on startup
+    - fixed a potential integer overflow in the multithreading
+      threshold for `?SYMM`/`?SYRK`
+    - fixed build of the LAPACKE interfaces for the LAPACK 3.11.0
+      `?TRSYL` functions added in 0.3.22
+    - applied additions and corrections from the development
+      branch of Reference-LAPACK:
+      - fixed actual arguments passed to a number of LAPACK
+        functions (from Reference-LAPACK PR 885)
+      - fixed workspace query results in LAPACK `?SYTRF`/`?TRECV3`
+        (from Reference-LAPACK PR 883)
+      - fixed derivation of the UPLO parameter in `LAPACKE_?larfb`
+        (from Reference-LAPACK PR 878)
+      - fixed a crash in LAPACK `?GELSDD` on `NRHS=0` (from
+        Reference-LAPACK PR 876)
+      - added new LAPACK utility functions `CRSCL` and `ZRSCL`
+        (from Reference-LAPACK PR 839)
+      - corrected the order of eigenvalues for 2x2 matrices in
+        `?STEMR` (Reference-LAPACK PR 867)
+      - removed spurious reference to OpenMP variables outside
+        OpenMP contexts (Reference-LAPACK PR 860)
+      - updated file comments on use of `LAMBDA` variable in
+        LAPACK (Reference-LAPACK PR 852)
+      - fixed documentation of LAPACK `SLASD0`/`DLASD0`
+        (Reference-LAPACK PR 855)
+      - fixed confusing use of "minor" in LAPACK documentation
+        (Reference-LAPACK PR 849)
+      - added new LAPACK functions `?GEDMD` for dynamic mode
+        decomposition (Reference-LAPACK PR 736)
+      - fixed potential stack overflows in the `EIG` part of the
+        LAPACK testsuite (Reference-LAPACK PR 854)
+      - applied small improvements to the variants of
+        Cholesky and QR functions (Reference-LAPACK PR 847)
+      - removed unused variables from LAPACK `?BDSQR`
+        (Reference-LAPACK PR 832)
+      - fixed a potential crash on allocation failure in LAPACKE
+        `SGEESX`/`DGEESX` (Reference-LAPACK PR 836)
+      - added a quick return from `SLARUV`/`DLARUV` for N < 1
+        (Reference-LAPACK PR 837)
+      - updated function descriptions in LAPACK `?GEGS`/`?GEGV`
+        (Reference-LAPACK PR 831)
+      - improved algorithm description in `?GELSY`
+        (Reference-LAPACK PR 833)
+      - fixed scaling in LAPACK `STGSNA`/`DTGSNA`
+        (Reference-LAPACK PR 830)
+      - fixed crash in `LAPACKE_?geqrt` with row-major data
+        (Reference-LAPACK PR 768)
+      - added LAPACKE interfaces for `C/ZUNHR_COL` and
+        `S/DORHR_COL` (Reference-LAPACK PR 827)
+      - added error exit tests for `SYSV`/`SYTD2`/`GEHD2` to
+        the testsuite (Reference-LAPACK PR 795)
+      - fixed typos in LAPACK source and comments
+        (Reference-LAPACK PRs 809,811,812,814,820)
+      - adopted refactored `?GEBAL` implementation
+        (Reference-LAPACK PR 808)
+  * x86_64:
+    - added cpu model autodetection for Intel Alder Lake N
+    - added activation of the AMX tile to the Sapphire Rapids
+      `SBGEMM` kernel
+    - worked around miscompilations of `GEMV`/`SYMV` kernels by
+      gcc's tree-vectorizer
+    - fixed runtime detection of Cooperlake and Sapphire Rapids
+      in `DYNAMIC_ARCH`
+    - fixed feature-based cputype fallback in `DYNAMIC_ARCH`
+    - corrected `ZAXPY` result on old pre-AVX hardware for the
+      `INCX=0` case
+    - fixed a potential use of uninitialized variables in `ZTRSM`
+  * ARMV8:
+    - implemented `SWITCH_RATIO` parameter for improved GEMM
+      performance on Neoverse
+    - activated SVE `SGEMM` and `DGEMM` kernels for Neoverse V1
+    - improved performance of the SVE `CGEMM` and `ZGEMM` kernels
+      on Neoverse V1
+    - improved kernel selection for the ARMV8SVE target and added
+      it to `DYNAMIC_ARCH`
+    - fixed runtime check for SVE availability in `DYNAMIC_ARCH`
+      builds to take OS or container restrictions into account
+    - fixed a potential use of uninitialized variables in `ZTRSM`
+  * POWER:
+    - fixed compiler warnings in the POWER10 `SBGEMM` kernel
+- Changes in version 0.3.23
+  * General:
+    - fixed a serious regression in `GETRF`/`GETF2` and
+      `ZGETRF`/`ZGETF2` where subnormal but nonzero data elements
+      triggered the singularity flag
+    - fixed a long-standing bug in `CSPR`/`ZSPR` in
+      single-threaded operation for cases where elements of the
+      X vector are real numbers (or complex with only the real
+      part zero)
+  * x86_64:
+    - added further CPUID values for Intel Raptor Lake
+- Changes in version 0.3.22
+  * General:
+    - Updated the included LAPACK to Reference-LAPACK release 3.11.0
+      plus post-release corrections and improvements
+    - Added a threshold for multithreading in `SYMM`, `SYMV` and
+      `SYR2K`
+    - Increased the threshold for multithreading in `SYRK`
+    - OpenBLAS no longer decreases the global `OMP_NUM_THREADS`
+      when it exceeds the maximum thread count the library was
+      compiled for.
+    - fixed `?GETF2` potentially returning `NaN` with tiny matrix
+      elements
+    - fixed `openblas_set_num_threads` to work in `USE_OPENMP`
+      builds.
+    - fixed cpu core counting in `USE_OPENMP` builds returning the
+      number of OMP "places" rather than cores
+    - fixed stride calculation in the optimized small-matrix path of
+      complex `SYR`
+    - fixed building of Reference-LAPACK with recent gfortran
+    - added new environment variable `OPENBLAS_DEFAULT_NUM_THREADS`
+    - added a `GEMV`-based implementation of `GEMMT`
+  * x86_64:
+    - added autodetection of Intel Raptor Lake cpu models
+    - added `SSCAL` microkernels for Haswell and newer targets
+    - improved the performance of the Haswell `DSCAL` microkernel
+    - added `CSCAL` and `ZSCAL` microkernels for SkylakeX targets
+    - fixed detection of gfortran and Cray CCE compilers
+    - fixed runtime selection of COOPERLAKE in `DYNAMIC_ARCH` builds
+    - worked around gcc/llvm using risky FMA operations in
+      `CSCAL`/`ZSCAL`
+  * ARMV8:
+    - fixed cross-compilation to CortexA53 with CMAKE
+    - fixed compilation with CMAKE and "Arm Compiler for Linux 22.1"
+    - added cpu autodetection for Cortex X3 and A715
+    - fixed conditional compilation of SVE-capable targets in
+      `DYNAMIC_ARCH`
+    - sped up SVE kernels by removing unnecessary prefetches
+    - improved the GEMM performance of Neoverse V1
+    - added SVE kernels for `SDOT` and `DDOT`
+    - added an `SBGEMM` kernel for Neoverse N2
+    - improved cpu-specific compiler option selection for
+      Neoverse cpus
+    - added support for setting `CONSISTENT_FPCSR`
 - Minor rebase of openblas-ppc64be_up2_p8.patch to apply cleanly.
 - Drop upstreamed patches:
  * Use-blasint-for-INTERFACE64-compatibility.patch
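The `?ROTG` entries in this changelog (the 0.3.25 fix for a potential division by zero and the 0.3.24 move to the safe scaling algorithm of Reference BLAS 3.10.0) are easier to follow with the algorithm spelled out. The following Python sketch mirrors the double-precision logic of Reference BLAS 3.10 `drotg`. It is an illustration under that assumption, not OpenBLAS's kernel, and the name `safe_rotg` and its `(r, z, c, s)` return tuple are hypothetical conveniences.

```python
from __future__ import annotations

import math
import sys

# Radix-2 double-precision constants in the style of Reference BLAS
# la_constants: safmin = 2**-1022 (smallest normal), safmax = 1 / safmin.
SAFMIN = sys.float_info.min
SAFMAX = 1.0 / SAFMIN


def safe_rotg(a: float, b: float) -> tuple[float, float, float, float]:
    """Return (r, z, c, s) for the Givens rotation that maps (a, b) to (r, 0).

    Illustrative sketch following the safe-scaling logic of Reference BLAS
    3.10 drotg; it is not OpenBLAS's own kernel.
    """
    anorm, bnorm = abs(a), abs(b)
    if bnorm == 0.0:
        # Nothing to rotate away: identity rotation.
        return a, 0.0, 1.0, 0.0
    if anorm == 0.0:
        # Pure swap; no division involving a zero a or c is attempted.
        return b, 1.0, 0.0, 1.0
    # Dividing by scl before squaring keeps the sum of squares in range.
    scl = min(SAFMAX, max(SAFMIN, anorm, bnorm))
    sigma = math.copysign(1.0, a if anorm > bnorm else b)
    r = sigma * (scl * math.sqrt((a / scl) ** 2 + (b / scl) ** 2))
    c, s = a / r, b / r
    # z lets a caller reconstruct (c, s) from a single stored number.
    if anorm > bnorm:
        z = s
    elif c != 0.0:
        z = 1.0 / c
    else:
        z = 1.0  # guard: c underflowed to zero, avoid 1 / 0
    return r, z, c, s


if __name__ == "__main__":
    r, z, c, s = safe_rotg(3.0, 4.0)
    print(r, c, s)  # → 5.0 0.6 0.8
    # A naive hypot overflows for large inputs; the scaled version does not.
    print(math.sqrt(1e300 * 1e300 + 1e300 * 1e300))  # → inf
    print(safe_rotg(1e300, 1e300)[0])
```

Dividing by `scl` before squaring is what keeps `safe_rotg(1e300, 1e300)` finite while the naive expression overflows; the `anorm == 0.0` branch and the `c != 0.0` guard mark the places where a division by zero could otherwise occur.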
--- a/openblas.spec
+++ b/openblas.spec
@@ -434,7 +434,7 @@ make MAKE_NB_JOBS=$jobs %{?openblas_target} %{?build_flags} \
     %{?dynamic_list} \
     %{!?with_hpc:%{?libnamesuffix} FC=gfortran CC=gcc%{?cc_v:-%{cc_v}} %{?cc_v:CEXTRALIB=""}} \
     %{?ldflags_tests:LDFLAGS_TESTS=%{ldflags_tests}} \
-     %{?with_hpc:%{?cc_v:CC=gcc-%{cc_v} CEXTRALIB=""}}
+     %{?with_hpc:%{?cc_v:CC=gcc-%{cc_v} FC=gfortran-%{cc_v} CEXTRALIB=""}}

 %install
 %if %{with hpc}
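The 0.3.24 changelog entry stating that `?NRM2` now supports negative increment values refers to the BLAS stride convention: with `incx < 0`, the n logical elements start at storage offset `(n-1)*|incx|` and are read backwards, so the norm matches the one computed with `|incx|`. The pre-3.10.0 Reference BLAS `dnrm2` instead returned zero for any `incx < 1`. The following Python sketch illustrates the indexing only. `blas_index` and `nrm2` are hypothetical helper names, and the accumulation uses the older scale/sum-of-squares update rather than the Blue's-algorithm accumulators that Reference BLAS 3.10.0 adopted, so it is not OpenBLAS's kernel.

```python
from __future__ import annotations

import math
from collections.abc import Sequence


def blas_index(i: int, n: int, inc: int) -> int:
    """0-based storage offset of logical element i of a strided BLAS vector.

    For inc < 0 the first logical element sits at offset (n - 1) * |inc|
    and the walk proceeds backwards through storage.
    """
    return i * inc if inc > 0 else (n - 1 - i) * (-inc)


def nrm2(n: int, x: Sequence[float], incx: int) -> float:
    """Euclidean norm of n strided elements of x; incx may be negative.

    Hypothetical helper: accumulates with the pre-3.10 scale/sum-of-squares
    update, not the Blue's-algorithm accumulators of Reference BLAS 3.10.
    """
    if n < 1:
        return 0.0
    scale, ssq = 0.0, 1.0
    for i in range(n):
        v = abs(x[blas_index(i, n, incx)])
        if v == 0.0:
            continue
        if scale < v:
            # Rescale the running sum to the new largest magnitude.
            ssq = 1.0 + ssq * (scale / v) ** 2
            scale = v
        else:
            ssq += (v / scale) ** 2
    return scale * math.sqrt(ssq)


if __name__ == "__main__":
    x = [3.0, 0.0, 4.0, 0.0, 12.0]
    # Both strides visit 3, 4 and 12, just in opposite orders.
    print(nrm2(3, x, 2), nrm2(3, x, -2))
```

Because a negative stride only reverses the visiting order, the two calls in the demo agree up to rounding in the last bits and both come out at about 13.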