- Update to version 0.3.5
common:
* Loop unrolling in TRMV has been enabled again.
* A domain error in the thread workload distribution for SYRK
has been fixed.
* gmake builds will now automatically add -fPIC to the build
options if the platform requires it.
* A pthreads key leakage (and associated crash on dlclose) in
the USE_TLS codepath was fixed.
* Building of the utest cases on systems that do not provide
an implementation of complex.h was fixed.
x86_64:
* The SkylakeX code was changed to compile on OSX.
* Unwanted application of the -march=skylake-avx512 option
to the common code parts of a DYNAMIC_ARCH build was fixed.
* Improved performance of SGEMM for small workloads on Skylake X.
* Performance of SGEMM and DGEMM was improved on Haswell.
armv8:
* A configuration error that broke the CNRM2 kernel was corrected.
* Compilation of the GEMM kernels with CMAKE was fixed.
* DYNAMIC_ARCH builds are now available with CMAKE as well.
* Using CMAKE for cross-compilation to the new cpu TARGETs
introduced in 0.3.4 now works.
power:
* A problem in cpu autodetection for AIX has been corrected.
OBS-URL: https://build.opensuse.org/request/show/663321
OBS-URL: https://build.opensuse.org/package/show/science/openblas?expand=0&rev=72
- Update to version 0.3.4
common:
* The new, experimental thread-local memory allocation had
inadvertently been left enabled for gmake builds in 0.3.3
despite the announcement. It is now disabled by default,
and single-threaded builds will keep using the old
allocator even if the USE_TLS option is turned on.
* OpenBLAS will now provide enough buffer space for at least
50 threads by default.
* The output of openblas_get_config() now contains the version
number.
* A serious thread safety bug in GEMV operation with small M and
large N size has been fixed.
* The code will now automatically call blas_thread_init after
a fork if needed before handling a call to
openblas_set_num_threads
* Accesses to parallelized level3 functions from multiple
callers are now serialized to avoid thread races
(unless using OpenMP). This should provide better
performance than the known-threadsafe (but non-default)
USE_SIMPLE_THREADED_LEVEL3 option.
* When building LAPACK with gfortran, -frecursive is now
(again) enabled by default to ensure correct behaviour.
* The OpenBLAS version cblas.h now supports both CBLAS_ORDER
and CBLAS_LAYOUT as the name of the matrix row/column order
option.
* Externally set LDFLAGS are now passed through to the final
compile/link steps to facilitate setting platform-specific
linker flags.
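The CBLAS_ORDER / CBLAS_LAYOUT item above amounts to giving one enum type two names. A minimal, self-contained sketch of that idea follows. It deliberately does not include the real cblas.h: every `demo_` identifier is hypothetical, and the actual header may be organized differently.

```c
#include <stddef.h>

/* Illustrative stand-in for the row/column order enum. All demo_ names
   are hypothetical and not part of OpenBLAS. */
typedef enum demo_CBLAS_ORDER {
    demo_CblasRowMajor = 101,
    demo_CblasColMajor = 102
} demo_CBLAS_ORDER;

/* Second name for the same type, so that code written against either
   spelling compiles unchanged. */
typedef demo_CBLAS_ORDER demo_CBLAS_LAYOUT;

/* Linear index of element (i, j) in a matrix with leading dimension ld. */
static size_t demo_index(demo_CBLAS_LAYOUT layout, size_t i, size_t j, size_t ld)
{
    return layout == demo_CblasRowMajor ? i * ld + j : j * ld + i;
}
```

Because the second name is a plain typedef, a value declared as `demo_CBLAS_ORDER` can be passed where `demo_CBLAS_LAYOUT` is expected without a cast.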
OBS-URL: https://build.opensuse.org/request/show/656046
OBS-URL: https://build.opensuse.org/package/show/science/openblas?expand=0&rev=70
- Update to version 0.3.3
common:
* Thread memory allocation has been switched back to the
method used before version 0.3.1 due to unexpected problems
caused by the new code under some circumstances.
* LAPACK PR272 has been integrated, which fixes spurious
errors in DSYEVR and related functions caused by missing
conversion from ILAENV to ILAENV_2STAGE in several _2stage
routines.
x86_64:
* added AVX512 implementations of SDOT, DDOT, SAXPY, DAXPY, DSCAL, DGEMVN and DSYMVL
* added a workaround for a cygwin issue that prevented compilation of AVX512 code
OBS-URL: https://build.opensuse.org/request/show/640892
OBS-URL: https://build.opensuse.org/package/show/science/openblas?expand=0&rev=68
- Update to version 0.3.2
common:
* Fixes for regressions caused by the rewrite of the thread
initialization code in 0.3.1
x86_64:
* Added autodetection of AMD Ryzen 2
* Fixed build with older versions of MSVC
power:
* Fixed cpu autodetection for the BSDs
mips64:
* Fixed utest errors in AXPY, DSDOT, ROT and SWAP
- Update to version 0.3.1
common:
* Rewritten thread initialization code with significantly
reduced overhead
* Added CBLAS interfaces to the IxAMIN BLAS extension functions
* Fixed the lapack-test target
* CMAKE builds now create an OpenBLASConfig.cmake file
* ZAXPY now uses a single thread for small input sizes
* The LAPACK code was updated from Reference-LAPACK/lapack#253
power:
* Corrected CROT and ZROT behaviour with zero INC_X
armv7:
* Corrected xDOT behaviour with zero INC_X or INC_Y
x86_64:
* Moved some older targets of DYNAMIC_ARCH builds to a
new option, DYNAMIC_OLDER. This affects PENRYN, DUNNINGTON,
OPTERON, OPTERON_SSE3, BOBCAT, ATOM and NANO (which will still
be supported via the slower PRESCOTT kernels when this option
is not set)
OBS-URL: https://build.opensuse.org/request/show/629943
OBS-URL: https://build.opensuse.org/package/show/science/openblas?expand=0&rev=67
- Update to version 0.2.20:
* common:
- Improved CMake support
- Fixed several thread race and locking bugs
- Fixed default LAPACK optimization level
- Updated LAPACK to 3.7.0
- Added ReLAPACK (https://github.com/HPAC/ReLAPACK), make
BUILD_RELAPACK=1
* POWER:
- Optimizations for Power9
- Fixed several Power8 assembly bugs
* ARM:
- New optimized Vulcan and ThunderX2T99 targets
- Support for ARMV7 SOFT_FP ABI (make ARM_SOFTFP_ABI=1)
- Detect all cpu cores including offline ones
- Fix compilation with CLANG
- Support building a shared library for Android
* MIPS:
- Fixed several threading issues
- Fix compilation with CLANG
* x86_64:
- Detect Intel Bay Trail and Apollo Lake
- Detect Intel Sky Lake and Kaby Lake
- Detect Intel Knights Landing
- Detect AMD A8, A10, A12 and Ryzen
- Support 64bit builds with Visual Studio
- Fix building with Intel and PGI compilers
- Fix building with MINGW and TDM-GCC
- Fix cmake builds for Haswell and related cpus
- Fix building for Sandybridge with CLANG 3.9
OBS-URL: https://build.opensuse.org/request/show/513093
OBS-URL: https://build.opensuse.org/package/show/science/openblas?expand=0&rev=52
- Update to version 0.2.16
* Upgrade LAPACK to version 3.6.0.
* Disable multi-threading for small size swap and ger.
* Improve small zger, zgemv, ztrmv using stack allocation.
* Let openblas_get_num_threads return the number of active threads.
* Fix LAPACK Dormbr, Dormlq bug.
* Avoid potential getenv segfault.
* Import LAPACK svn bugfix #142-#147,#150-#155
* Optimize trsm kernels for AMD Bulldozer, Piledriver, Steamroller.
* Detect Intel Avoton.
* Detect AMD Trinity, Richland, E2-3200.
* Optimize c/zgemv for AMD Bulldozer, Piledriver, Steamroller
* Fix bug with scipy linalg test.
* Support and optimize Cortex-A57 AArch64.
* Update ARMV6 kernels.
* Improve DGEMM for ARM Cortex-A57.
* Fix detection of POWER architecture.
* Optimize D and Z BLAS3 functions for Power8.
- Remove openblas-libs.patch, not needed.
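The "disable multi-threading for small size swap and ger" item above can be pictured as a size check made before any work is handed to the thread pool. The sketch below only illustrates that idea: the function name and the cut-off value are hypothetical, and the limits actually used by the library are not stated in this changelog.

```c
/* Hypothetical cut-off, chosen only for illustration. */
#define DEMO_SMALL_N 10000

/* Returns how many threads to use for a vector operation of length n.
   Small inputs run single-threaded, since splitting them across
   threads would cost more than it saves. */
static int demo_pick_threads(long n, int max_threads)
{
    if (max_threads < 1 || n <= DEMO_SMALL_N)
        return 1;
    return max_threads;
}
```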
OBS-URL: https://build.opensuse.org/request/show/374076
OBS-URL: https://build.opensuse.org/package/show/science/openblas?expand=0&rev=42
- Update to version 0.2.15
* Enable MAX_STACK_ALLOC flags by default.
* Improve ger and gemv for small matrices.
* Improve parallel gemv for the small m, large n case.
* Improve ?imatcopy when lda==ldb
* Add vecLib benchmarks
* Fix LAPACK lantr for row major matrices
* Fix LAPACKE lansy
* Import bug fixes for LAPACKE s/dormlq, c/zunmlq
* Raise the signal when pthread_create fails
* Drop obsolete openblas-arm64-build.patch
x86/x86-64:
* Support pure C generic kernels for x86/x86-64.
* Support Intel Broadwell and Skylake by Haswell kernels.
* Support AMD Excavator by Steamroller kernels.
* Optimize s/d/c/zdot for Intel SandyBridge and Haswell.
* Optimize s/d/c/zdot for AMD Piledriver and Steamroller.
* Optimize s/d/c/zaxpy for Intel SandyBridge and Haswell.
* Optimize s/d/c/zaxpy for AMD Piledriver and Steamroller.
* Optimize d/c/zscal for Intel Haswell, dscal for Intel SandyBridge.
* Optimize d/c/zscal for AMD Bulldozer, Piledriver and Steamroller.
* Optimize s/dger for Intel SandyBridge.
* Optimize s/dsymv for Intel SandyBridge.
* Optimize ssymv for Intel Haswell.
* Optimize dgemv for Intel Nehalem and Haswell.
* Optimize dtrmm for Intel Haswell.
ARM:
* Support Android NDK armeabi-v7a-hard ABI (-mfloat-abi=hard)
* Fix lock, rpcc bugs
POWER:
OBS-URL: https://build.opensuse.org/request/show/341265
OBS-URL: https://build.opensuse.org/package/show/science/openblas?expand=0&rev=40
- Change library name suffix
* drop openblas-soname.patch
- Add RPM %post script for manual BLAS/LAPACK update-alternatives
configuration update
- Use update-alternatives mechanism for OpenBLAS variants (serial,
openmp, pthreads). pthreads variant is default for x86 and x86_64,
OpenMP for other architectures.
- Fix build on ARM64
* openblas-arm64-build.patch
- Add update-alternatives mechanism for CBLAS
- Provide cmake module
- Delete info about host cpu from openblas_config.h for dynamic arch
- Add update-alternatives to 'preup' and 'post' requires list for
libraries
- Add README.SUSE
OBS-URL: https://build.opensuse.org/request/show/322040
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/openblas?expand=0&rev=9
OBS-URL: https://build.opensuse.org/package/show/science/openblas?expand=0&rev=38
- Update to version 0.2.14
* Improve ger and gemv for small matrices by stack allocation.
e.g. make -DMAX_STACK_ALLOC=2048
* Introduce openblas_get_num_threads and openblas_get_num_procs.
* Add ATLAS-style ?geadd function.
* Fix c/zsyr bug with negative incx.
* Fix race condition during shutdown causing a crash in
gotoblas_set_affinity().
x86/x86-64:
* Support AMD Steamroller.
ARM:
* Add Cortex-A9 and Cortex-A15 targets.
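The stack-allocation item above (MAX_STACK_ALLOC) describes a common pattern: use a fixed stack buffer for small scratch areas and fall back to the heap otherwise. The sketch below shows that pattern in isolation. It is not OpenBLAS code: the `demo_` names are hypothetical, and treating 2048 as a byte limit is an assumption.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical limit, assumed here to be in bytes. */
#define DEMO_MAX_STACK_ALLOC 2048

/* Sums n doubles through a scratch copy. Small inputs use a fixed
   stack buffer; larger ones fall back to malloc. Returns 0 on
   success, -1 if the heap allocation fails. */
static int demo_sum_with_scratch(const double *x, size_t n, double *out)
{
    double stack_buf[DEMO_MAX_STACK_ALLOC / sizeof(double)];
    double *buf = stack_buf;
    int on_heap = 0;

    if (n * sizeof(double) > DEMO_MAX_STACK_ALLOC) {
        buf = malloc(n * sizeof(double));
        if (buf == NULL)
            return -1;
        on_heap = 1;
    }

    memcpy(buf, x, n * sizeof(double));
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += buf[i];

    if (on_heap)
        free(buf);
    *out = s;
    return 0;
}
```

The benefit for small matrices is that the common case avoids a malloc/free pair per call, while large inputs still work through the heap path.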
OBS-URL: https://build.opensuse.org/request/show/305140
OBS-URL: https://build.opensuse.org/package/show/science/openblas?expand=0&rev=36
- Update to version 0.2.13
* Add SYMBOLPREFIX and SYMBOLSUFFIX makefile options
for adding a prefix or suffix to all exported symbol names
in the shared library.
* Remove openblas-0.1.0-soname.patch
* Add openblas-soname.patch
* Rebase openblas-noexecstack.patch
x86/x86-64:
* Add generic kernel files for x86-64. make TARGET=GENERIC
* Fix a bug of sgemm kernel on Intel Sandy Bridge.
* Fix c_check bug on some amd64 systems.
ARM:
* Support APM's X-Gene 1 AArch64 processors.
* Optimize trmm and sgemm.
OBS-URL: https://build.opensuse.org/request/show/263947
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/openblas?expand=0&rev=7
- Update to version 0.2.12
* Added CBLAS interface for ?omatcopy and ?imatcopy.
* Enable ?gemm3m functions.
* Added benchmark for ?gemm3m.
* Optimized multithreading lower limits.
* Disabled SYMM3M and HEMM3M functions because of segment violations.
x86/x86-64:
* Improved axpy and symv performance on AMD Bulldozer.
* Improved gemv performance on modern Intel and AMD CPUs.
OBS-URL: https://build.opensuse.org/request/show/257419
OBS-URL: https://build.opensuse.org/package/show/science/openblas?expand=0&rev=31
- Update to version 0.2.9
* Update LAPACK to version 3.5.0.
* Fixed compatibility issues with Clang and Pathscale compilers.
* Added OPENBLAS_VERBOSE environment variable. (#338)
* Make OpenBLAS thread-pool resilient to fork via pthread_atfork.
(#294)
* Rewrote rotmg
* Fixed sdsdot bug.
* Improved the result for LAPACK testing. (#372)
x86/x86-64:
* Optimization on Intel Haswell.
* Enable optimization kernels on AMD Bulldozer and Piledriver.
* Detect Intel Haswell for new Macbook.
* To improve LAPACK testing, we fallback some kernels. (#372)
https://github.com/xianyi/OpenBLAS/wiki/Fixed-optimized-kernels-To-do-List
ARM:
* Support ARMv6 and ARMv7 ISA.
* Optimization on ARM Cortex-A9.
- Update patches:
* openblas-0.2.8-libs.patch
* openblas-0.2.8-noexecstack.patch
to
* openblas-libs.patch
* openblas-noexecstack.patch
- Fix gcc warnings (#385)
* openblas-0.2.9-gcc-warnings.patch
OBS-URL: https://build.opensuse.org/request/show/237017
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/openblas?expand=0&rev=2
OBS-URL: https://build.opensuse.org/request/show/237012
OBS-URL: https://build.opensuse.org/package/show/science/openblas?expand=0&rev=23