-------------------------------------------------------------------
Tue May 28 09:30:26 UTC 2024 - Daniel Garcia <daniel.garcia@suse.com>
- Skip broken test on ppc64le
bsc#1225394, gh#numba/numba#8489
-------------------------------------------------------------------
Fri Mar 22 20:05:25 UTC 2024 - Dirk Müller <dmueller@suse.com>
- update to 0.59.1:
* Fixed caching of kernels that use target-specific overloads
* Fixed a performance regression introduced in Numba 0.59 which
made ``np.searchsorted`` considerably slower.
* Fixed two issues with ``np.searchsorted``: a regression in the
support of ``np.datetime64``, and mishandling of ``NAT`` values,
now fixed by adopting ``NAT``-aware comparisons.
* Allow use of Python 3.12 PEP-695 type parameter syntax
-------------------------------------------------------------------
Fri Mar 8 15:37:58 UTC 2024 - Ben Greiner <code@bnavigator.de>
- Stop testing python39: dropped since ipython 8.19
-------------------------------------------------------------------
Wed Feb 21 15:35:47 UTC 2024 - Ben Greiner <code@bnavigator.de>
- Simplify test flavor logic
- Prepare for python39 flavor drop: Exclude build in empty test
flavors
- Don't test on 32-bit platforms
-------------------------------------------------------------------
Sat Feb 3 07:04:27 UTC 2024 - Dirk Müller <dmueller@suse.com>
- update to 0.59.0
* Python 3.12 support
* raise minimum supported Python version to 3.9
* Add support for ufunc attributes and reduce
* Add a config variable to enable / disable the llvmlite memory
manager
* see https://numba.readthedocs.io/en/stable/release/0.59.0-notes.html#highlights
-------------------------------------------------------------------
Mon Nov 20 12:15:07 UTC 2023 - Markéta Machová <mmachova@suse.com>
- Update to 0.58.1
* Added towncrier
* The minimum supported NumPy version is 1.22.
* Add support for NumPy 1.26
* Remove NVVM 3.4 and CTK 11.0 / 11.1 support
* Removal of Windows 32-bit Support
* The minimum llvmlite version is now 0.41.0.
* Added RVSDG-frontend
- Drop merged patches:
* numba-pr9105-np1.25.patch
* multiprocessing-context.patch
-------------------------------------------------------------------
Tue Sep 19 12:08:03 UTC 2023 - Markéta Machová <mmachova@suse.com>
- Add multiprocessing-context.patch fixing tests for Python 3.11.5
-------------------------------------------------------------------
Mon Aug 21 19:53:19 UTC 2023 - Ben Greiner <code@bnavigator.de>
- Add numba-pr9105-np1.25.patch, raise (reintroduced) numpy pin
* gh#numba/numba#9105
* Adapted gh#numba/numba#9138
-------------------------------------------------------------------
Mon Aug 14 06:47:15 UTC 2023 - Dirk Müller <dmueller@suse.com>
- update to 0.57.1:
* fix regressions with 0.57.0
- remove upper bound on numpy - upstream does not have it either
-------------------------------------------------------------------
Fri May 26 13:28:26 UTC 2023 - Steve Kowalik <steven.kowalik@suse.com>
- Update to 0.57.0:
* Support for Python 3.11 (minimum is moved to 3.8)
* Support for NumPy 1.24 (minimum is moved to 1.21)
* Python language support enhancements:
+ Exception classes now support arguments that are not compile time
constant.
+ The built-in functions hasattr and getattr are supported for compile
time constant attributes.
+ The built-in functions str and repr are now implemented similarly to
their Python implementations. Custom __str__ and __repr__ functions
can be associated with types and work as expected.
+ Numba's unicode functionality in str.startswith now supports kwargs
start and end.
+ min and max now support boolean types.
+ Support is added for the dict(iterable) constructor.
- Dropped patches:
* numba-pr8620-np1.24.patch
* update-tbb-backend-calls-2021.6.patch
- Rebased existing patch.
-------------------------------------------------------------------
Wed Apr 12 05:53:24 UTC 2023 - Steve Kowalik <steven.kowalik@suse.com>
- Clean up leftover Python 3.8 gubbins, look forward to Python 3.11 support.
-------------------------------------------------------------------
Tue Apr 11 08:30:00 UTC 2023 - Dominique Leuenberger <dimstar@opensuse.org>
- Remove test-py38 flavor from multibuild: Python 3.8 is no longer
supported.
-------------------------------------------------------------------
Tue Jan 3 12:13:00 UTC 2023 - Ben Greiner <code@bnavigator.de>
- Split out python flavors into testing multibuilds. Depending on
the obs worker, the test suite can take almost an hour per
flavor.
- Replace allow-numpy-1.24.patch with an updated
numba-pr8620-np1.24.patch to also work with still present numpy
1.23 in Factory (discussed upstream in gh#numba/numba#8620)
- Merge fix-cli-test.patch into skip-failing-tests.patch
-------------------------------------------------------------------
Mon Jan 2 21:27:24 UTC 2023 - Ben Greiner <code@bnavigator.de>
- Clean up the specfile
* restore the multibuild
* Patch allow-numpy-1.24.patch is the WIP gh#numba/numba#8620
-------------------------------------------------------------------
Sun Jan 1 11:41:11 UTC 2023 - Matej Cepl <mcepl@suse.com>
- Update to 0.56.4:
- This is a bugfix release to fix a regression in the CUDA
target in relation to the .view() method on CUDA device
arrays that is present when using NumPy version 1.23.0 or
later.
- This is a bugfix release to remove the version restriction
applied to the setuptools package and to fix a bug in the
CUDA target in relation to copying zero length device arrays
to zero length host arrays.
- Add allow-numpy-1.24.patch to allow work with numpy 1.24
-------------------------------------------------------------------
Mon Oct 10 10:07:52 UTC 2022 - John Vandenberg <jayvdb@gmail.com>
- Allow numpy 1.23
-------------------------------------------------------------------
Mon Oct 3 12:02:05 UTC 2022 - Daniel Garcia <daniel.garcia@suse.com>
- Update to 0.56.2
This release continues to add new features, bug fixes and stability
improvements to Numba. Please note that this will be the last release that
has support for Python 3.7 as the next release series (Numba 0.57) will
support Python 3.11! Also note that this will be the last release to support
linux-32 packages produced by the Numba team.
- Remove fix-max-name-size.patch, it's included in the new version.
- Add update-tbb-backend-calls-2021.6.patch to make it compatible with the
latest tbb-devel version.
- Add fix-cli-test.patch to disable one test that fails with OBS.
-------------------------------------------------------------------
Mon Jul 11 16:05:33 UTC 2022 - Ben Greiner <code@bnavigator.de>
- Update to 0.55.2
* This is a maintenance release to support NumPy 1.22 and Apple
M1.
* Backport #8027: Support for NumPy 1.22
* update max NumPy for 0.55.2
* Backport #8052 Ensure pthread is linked in when building for
ppc64le.
* Backport #8102 to fix numpy requirements
* Backport #8109 Pin TBB support with respect to incompatible
2021.6 API.
-------------------------------------------------------------------
Sat Jan 29 13:23:43 UTC 2022 - Ben Greiner <code@bnavigator.de>
- Update to 0.55.1
* This is a bugfix release that closes all the remaining issues
from the accelerated release of 0.55.0 and also any release
critical regressions discovered since then.
* CUDA target deprecation notices:
- Support for CUDA toolkits < 10.2 is deprecated and will be
removed in Numba 0.56.
- Support for devices with Compute Capability < 5.3 is
deprecated and will be removed in Numba 0.56.
- Drop numba-pr7748-random32bitwidth.patch
- Explicitly declare supported platforms (avoid failing tests on
ppc64)
-------------------------------------------------------------------
Fri Jan 14 16:55:37 UTC 2022 - Ben Greiner <code@bnavigator.de>
- Update to 0.55.0
* This release includes a significant number of important
dependency upgrades along with a number of new features and bug
fixes.
* NOTE: Due to NumPy CVE-2021-33430 this release has bypassed the
usual release process so as to promptly provide a Numba release
that supports NumPy 1.21. A single release candidate (RC1) was
made and a few issues were reported; these are summarised as
follows and will be fixed in a subsequent 0.55.1 release.
* Known issues with this release:
- Incorrect result copying array-typed field of structured
array (#7693)
- Two issues in DebugInfo generation (#7726, #7730)
- Compilation failure for hash of floating point values on 32
bit Windows when using Python 3.10 (#7713).
* Support for Python 3.10
* Support for NumPy 1.21
* The minimum supported NumPy version is raised to 1.18 for
runtime (compilation however remains compatible with NumPy
1.11).
* Experimental support for isinstance.
* The following functions are now supported:
- np.broadcast_to
- np.float_power
- np.cbrt
- np.logspace
- np.take_along_axis
- np.average
- np.argmin gains support for the axis kwarg.
- np.ndarray.astype gains support for types expressed as
literal strings.
* For users of the Numba extension API, Numba now has a new error
handling mode whereby it will treat all exceptions that do not
inherit from numba.errors.NumbaException as a “hard error” and
immediately unwind the stack. This makes it much easier to
debug when writing @overloads etc from the extension API as
there's now no confusion between Python errors and Numba
errors. This feature can be enabled by setting the environment
variable: NUMBA_CAPTURED_ERRORS='new_style'.
* The threading layer selection priority can now be changed via
the environment variable NUMBA_THREADING_LAYER_PRIORITY.
* Support for NVIDIA's CUDA Python bindings.
* Support for 16-bit floating point numbers and their basic
operations via intrinsics.
* Streams are provided in the Stream.async_done result, making it
easier to implement asynchronous work queues.
* Support for structured types in device arrays, character
sequences in NumPy arrays, and some array operations on nested
arrays.
* Much underlying refactoring to align the CUDA target more
closely with the CPU target, which lays the groundwork for
supporting the high level extension API in CUDA in future
releases.
* Intel also kindly sponsored research and development into
native debug (DWARF) support and handling per-function
compilation flags:
- Line number/location tracking is much improved.
- Numba's internal representation of containers (e.g. tuples,
arrays) is now encoded as structures.
- Numba's per-function compilation flags are encoded into the ABI
field of the mangled name of the function such that it's
possible to compile and differentiate between versions of the
same function with different flags set.
* There are no new general deprecations.
* There are no new CUDA target deprecations.
- Drop numba-pr7483-numpy1_21.patch
- Add numba-pr7748-random32bitwidth.patch -- gh#numba/numba#7748
-------------------------------------------------------------------
Sat Jan 8 22:19:07 UTC 2022 - Ben Greiner <code@bnavigator.de>
- Numba <0.55 is not compatible with Python 3.10 or NumPy 1.22
gh#numba/numba#7557
- Add test skip to numba-pr7483-numpy1_21.patch due to numpy update
gh#numpy/numpy#20376
-------------------------------------------------------------------
Thu Nov 18 18:42:21 UTC 2021 - Ben Greiner <code@bnavigator.de>
- Update to 0.54.1
* This is a bugfix release for 0.54.0. It fixes a regression in
structured array type handling, a potential leak on
initialization failure in the CUDA target, a regression caused
by Numba's vendored cloudpickle module resetting dynamic
classes and a few minor testing/infrastructure related
problems.
- Release summary for 0.54.0
* This release includes a significant number of new features,
important refactoring, critical bug fixes and a number of
dependency upgrades.
* Python language support enhancements:
- Basic support for f-strings.
- dict comprehensions are now supported.
- The sum built-in function is implemented.
* NumPy features/enhancements. The following functions are now
supported:
- np.clip
- np.iscomplex
- np.iscomplexobj
- np.isneginf
- np.isposinf
- np.isreal
- np.isrealobj
- np.isscalar
- np.random.dirichlet
- np.rot90
- np.swapaxes
* Also np.argmax has gained support for the axis keyword argument
and it's now possible to use 0d NumPy arrays as scalars in
__setitem__ calls.
Internal changes:
* Debugging support through DWARF has been fixed and enhanced.
* Numba now optimises the way in which locals are emitted to help
reduce time spent in LLVM's SROA passes.
CUDA target changes:
* Support for emitting lineinfo to be consumed by profiling tools
such as Nsight Compute
* Improved fastmath code generation for various trig, division,
and other functions
* Faster compilation using lazy addition of libdevice to compiled
units
* Support for IPC on Windows
* Support for passing tuples to CUDA ufuncs
* Performance warnings:
- When making implicit copies by calling a kernel on arrays in
host memory
- When occupancy is poor due to kernel or ufunc/gufunc
configuration
* Support for implementing warp-aggregated intrinsics:
- Using support for more CUDA functions: activemask(),
lanemask_lt()
- The ffs() function now works correctly!
* Support for @overload in the CUDA target
Intel kindly sponsored research and development that led to a
number of new features and internal support changes:
* Dispatchers can now be retargeted to a new target via a user
defined context manager.
* Support for custom NumPy array subclasses has been added
(including an overloadable memory allocator).
* An inheritance based model for targets that permits targets to
share @overload implementations.
* Per function compiler flags with inheritance behaviours.
* The extension API now has support for overloading class methods
via the @overload_classmethod decorator.
Deprecations:
* The ROCm target (for AMD ROC GPUs) has been moved to an
“unmaintained” status and a separate repository stub has been
created for it at: https://github.com/numba/numba-rocm
CUDA target deprecations and breaking changes:
* Relaxed strides checking is now the default when computing the
contiguity of device arrays.
* The inspect_ptx() method is deprecated. For use cases that
obtain PTX for further compilation outside of Numba, use
compile_ptx() instead.
* Eager compilation of device functions (the case when
device=True and a signature is provided) is deprecated.
Version support/dependency changes:
* LLVM 11 is now supported on all platforms via llvmlite.
* The minimum supported Python version is raised to 3.7.
* NumPy version 1.20 is supported.
* The minimum supported NumPy version is raised to 1.17 for
runtime (compilation however remains compatible with NumPy
1.11).
* Vendored cloudpickle v1.6.0 is now used for all pickle operations.
* TBB >= 2021 is now supported and all prior versions are
unsupported (not easily possible to maintain the ABI breaking
changes).
- Full release notes:
https://numba.readthedocs.io/en/0.54.1/release-notes.html
- Drop patches merged upstream:
* packaging-ignore-setuptools-deprecation.patch
* numba-pr6851-llvm-timings.patch
- Refresh skip-failing-tests.patch, fix-max-name-size.patch
- Add numba-pr7483-numpy1_21.patch gh#numba/numba#7176,
gh#numba/numba#7483
-------------------------------------------------------------------
Wed Mar 17 16:51:46 UTC 2021 - Ben Greiner <code@bnavigator.de>
- Update to 0.53.0
* Support for Python 3.9
* Function sub-typing
* Initial support for dynamic gufuncs (i.e. from @guvectorize)
* Parallel Accelerator (@njit(parallel=True)) now supports
Fortran ordered arrays
* Full release notes at
https://numba.readthedocs.io/en/0.53.0/release-notes.html
- Don't unpin-llvmlite.patch. It really needs to be the correct
version.
- Refresh skip-failing-tests.patch
- Add packaging-ignore-setuptools-deprecation.patch
gh#numba/numba#6837
- Add numba-pr6851-llvm-timings.patch gh#numba/numba#6851 in order
to fix 32-bit issues gh#numba/numba#6832
-------------------------------------------------------------------
Wed Feb 17 09:49:48 UTC 2021 - Ben Greiner <code@bnavigator.de>
- Update to 0.52.0
https://numba.readthedocs.io/en/stable/release-notes.html
This release focuses on performance improvements, but also adds
some new features and contains numerous bug fixes and stability
improvements.
Highlights of core performance improvements include:
* Intel kindly sponsored research and development into producing
a new reference count pruning pass. This pass operates at the
LLVM level and can prune a number of common reference counting
patterns. This will improve performance for two primary
reasons:
- There will be less pressure on the atomic locks used to do
the reference counting.
- Removal of reference counting operations permits more
inlining and the optimisation passes can in general do more
with what is present.
(Siu Kwan Lam).
* Intel also sponsored work to improve the performance of the
numba.typed.List container, particularly in the case of
__getitem__ and iteration (Stuart Archibald).
* Superword-level parallelism vectorization is now switched on
and the optimisation pipeline has been lightly analysed and
tuned so as to be able to vectorize more and more often
(Stuart Archibald).
Highlights of core feature changes include:
* The inspect_cfg method on the JIT dispatcher object has been
significantly enhanced and now includes highlighted output and
interleaved line markers and Python source (Stuart Archibald).
* The BSD operating system is now unofficially supported (Stuart
Archibald).
* Numerous features/functionality improvements to NumPy support,
including support for:
- np.asfarray (Guilherme Leobas)
- “subtyping” in record arrays (Lucio Fernandez-Arjona)
- np.split and np.array_split (Isaac Virshup)
- operator.contains with ndarray (@mugoh).
- np.asarray_chkfinite (Rishabh Varshney).
- NumPy 1.19 (Stuart Archibald).
- the ndarray allocators, empty, ones and zeros, accepting a
dtype specified as a string literal (Stuart Archibald).
* Booleans are now supported as literal types (Alexey Kozlov).
* On the CUDA target:
- CUDA 9.0 is now the minimum supported version (Graham Markall).
- Support for Unified Memory has been added (Max Katz).
- Kernel launch overhead is reduced (Graham Markall).
- Cudasim support for mapped array, memcopies and memset has
been added (Mike Williams).
- Access has been wired in to all libdevice functions (Graham
Markall).
- Additional CUDA atomic operations have been added (Michael
Collison).
- Additional math library functions (frexp, ldexp, isfinite)
(Zhihao Yuan).
- Support for power on complex numbers (Graham Markall).
Deprecations to note:
* There are no new deprecations. However, note that
“compatibility” mode, which was added some 40 releases ago to
help transition from 0.11 to 0.12+, has been removed! Also,
the shim to permit the import of jitclass from Numba's top
level namespace has now been removed as per the deprecation
schedule.
- NEP 29: Skip python36 build. Python 3.6 is dropped by NumPy 1.20
-------------------------------------------------------------------
Mon Nov 2 16:34:48 UTC 2020 - Marketa Calabkova <mcalabkova@suse.com>
- Update to 0.51.2
* The compilation chain is now based on LLVM 10 (Valentin Haenel).
* Numba has internally switched to prefer non-literal types over literal ones so
as to reduce function over-specialisation, with a view to speeding up
compile times (Siu Kwan Lam).
* On the CUDA target: Support for CUDA Toolkit 11, Ampere, and Compute
Capability 8.0; Printing of ``SASS`` code for kernels; Callbacks to Python
functions can be inserted into CUDA streams, and streams are async awaitable;
Atomic ``nanmin`` and ``nanmax`` functions are added; Fixes for various
miscompilations and segfaults (mostly Graham Markall; callbacks on
streams by Peter Würtz).
* Support for heterogeneous immutable lists and heterogeneous immutable string
key dictionaries. Also optional initial/construction value capturing for all
lists and dictionaries containing literal values (Stuart Archibald).
* A new pass-by-reference mutable structure extension type ``StructRef`` (Siu
Kwan Lam).
* Object mode blocks are now cacheable, with the side effect of numerous bug
fixes and performance improvements in caching. This also permits caching of
functions defined in closures (Siu Kwan Lam).
* The error handling and reporting system has been improved to reduce the size
of error messages, and also improve quality and specificity.
* The CUDA target has more stream constructors available and a new function for
compiling to PTX without linking and loading the code to a device. Further,
the macro-based system for describing CUDA threads and blocks has been
replaced with standard typing and lowering implementations, for improved
debugging and extensibility.
- Better unpin llvmlite with unpin-llvmlite.patch to avoid breakages
-------------------------------------------------------------------
Wed May 27 07:24:32 UTC 2020 - pgajdos@suse.com
- version update to 0.49.1
* PR #5587: Fixed #5586 Threading Implementation Typos
* PR #5592: Fixes #5583 Remove references to cffi_support from docs and examples
* PR #5614: Fix invalid type in resolve for comparison expr in parfors.
* PR #5624: Fix erroneous rewrite of predicate to bit const on prune.
* PR #5627: Fixes #5623, SSA local def scan based on invalid equality
assumption.
* PR #5629: Fixes naming error in array_exprs
* PR #5630: Fix #5570. Incorrect race variable detection due to SSA naming.
* PR #5638: Make literal_unroll function work as a freevar.
* PR #5648: Unset the memory manager after EMM Plugin tests
* PR #5651: Fix some SSA issues
* PR #5652: Pin to sphinx=2.4.4 to avoid problem with C declaration
* PR #5658: Fix unifying undefined first class function types issue
* PR #5669: Update example in 5m guide WRT SSA type stability.
* PR #5676: Restore ``numba.types`` as public API
-------------------------------------------------------------------
Fri Apr 24 14:07:35 UTC 2020 - Marketa Calabkova <mcalabkova@suse.com>
- Update to 0.49.0
* Removal of all Python 2 related code and also updating the minimum supported
Python version to 3.6, the minimum supported NumPy version to 1.15 and the
minimum supported SciPy version to 1.0. (Stuart Archibald).
* Refactoring of the Numba code base. The code is now organised into submodules
by functionality. This cleans up Numba's top level namespace.
(Stuart Archibald).
* Introduction of an ``ir.Del`` free static single assignment form for Numba's
intermediate representation (Siu Kwan Lam and Stuart Archibald).
* An OpenMP-like thread masking API has been added for use with code using the
parallel CPU backends (Aaron Meurer and Stuart Archibald).
* For the CUDA target, all kernel launches now require a configuration, which
prevents accidental launches of kernels with the old default of a single
thread in a single block. The hard-coded autotuner is also now removed; such
tuning is deferred to CUDA API calls that provide the same functionality
(Graham Markall).
* The CUDA target also gained an External Memory Management plugin interface to
allow Numba to use another CUDA-aware library for all memory allocations and
deallocations (Graham Markall).
* The Numba Typed List container gained support for construction from iterables
(Valentin Haenel).
* Experimental support was added for first-class function types
(Pearu Peterson).
- Refreshed patch skip-failing-tests.patch
* the troublesome tests are skipped upstream on 32-bit
- Unpin llvmlite
-------------------------------------------------------------------
Mon Apr 6 07:56:16 UTC 2020 - Tomáš Chvátal <tchvatal@suse.com>
- Switch to multibuild as the tests take ages to build and we
could speed things up in 2 loops
-------------------------------------------------------------------
Fri Feb 21 09:39:07 UTC 2020 - Tomáš Chvátal <tchvatal@suse.com>
- Update to 0.48.0:
* Many fixes for llvm/cuda updates; see CHANGE_LOG for details
* Drop python2 support
- Add one more failing test to skip:
* skip-failing-tests.patch
-------------------------------------------------------------------
Tue Dec 17 23:28:40 CET 2019 - Matej Cepl <mcepl@suse.com>
- Clean up SPEC file (mostly just testing new python-llvmlite
package)
-------------------------------------------------------------------
Thu Oct 24 20:55:10 UTC 2019 - Todd R <toddrme2178@gmail.com>
- Restore python2 support.
-------------------------------------------------------------------
Thu Sep 26 08:06:01 UTC 2019 - Tomáš Chvátal <tchvatal@suse.com>
- Update to 0.46.0:
* Many fixes and changes for llvm/cuda updates
See CHANGE_LOG file for details
- Add fix-max-name-size.patch to fix issue with numba
identifier length on recent LLVM versions.
- Remove test from skip-failing-tests.patch fixed by
fix-max-name-size.patch. The test is important, if it is failing
numba will not work reliably.
-------------------------------------------------------------------
Thu Sep 26 08:06:01 UTC 2019 - Tomáš Chvátal <tchvatal@suse.com>
- Update to 0.45.1:
* Many fixes and changes for llvm/cuda updates
See CHANGE_LOG file for details
- Update skip-failing-tests.patch to skip one more failing test
-------------------------------------------------------------------
Thu Apr 11 21:52:30 CEST 2019 - Matej Cepl <mcepl@suse.com>
- Update to 0.43.1, which is a bugfix release.
-------------------------------------------------------------------
Mon Mar 18 18:05:34 CET 2019 - Matej Cepl <mcepl@suse.com>
- Update to 0.43.0:
- Initial support for statically typed dictionaries
- Improvements to `hash()` to match Python 3 behavior
- Support for the heapq module
- Ability to pass C structs to Numba
- More NumPy functions: asarray, trapz, roll, ptp, extract
- Add skip-failing-tests.patch to avoid problems with possibly
incompatible version of NumPy 1.16.
-------------------------------------------------------------------
Sat Jan 26 17:06:14 UTC 2019 - Arun Persaud <arun@gmx.de>
- specfile:
* update copyright year
- update to version 0.42.0:
* In this release the major features are:
+ The capability to launch and attach the GDB debugger from within
a jitted function.
+ The upgrading of LLVM to version 7.0.0.
* We added a draft of the project roadmap to the developer
manual. The roadmap is for informational purposes only as
priorities and resources may change.
* Here are some enhancements from contributed PRs:
+ #3532. Daniel Wennberg improved the "cuda.{pinned, mapped}" API
so that the associated memory is released immediately at the
exit of the context manager.
+ #3531. Dimitri Vorona enabled the inlining of jitclass methods.
+ #3516. Simon Perkins added the support for passing numpy dtypes
(i.e. "np.dtype("int32")") and their type constructor
(i.e. "np.int32") into a jitted function.
+ #3509. Rob Ennis added support for "np.corrcoef".
* A regression issue (#3554, #3461) relating to making an empty
slice in parallel mode is resolved by #3558.
* General Enhancements:
+ PR #3392: Launch and attach gdb directly from Numba.
+ PR #3437: Changes to accommodate LLVM 7.0.x
+ PR #3509: Support for np.corrcoef
+ PR #3516: Typeof dtype values
+ PR #3520: Fix @stencil ignoring cval if out kwarg supplied.
+ PR #3531: Fix jitclass method inlining and avoid unnecessary
increfs
+ PR #3538: Avoid future C-level assertion error due to invalid
visibility
+ PR #3543: Avoid implementation error being hidden by the
try-except
+ PR #3544: Add `long_running` test flag and feature to exclude
tests.
+ PR #3549: ParallelAccelerator caching improvements
+ PR #3558: Fixes array analysis for inplace binary operators.
+ PR #3566: Skip alignment tests on armv7l.
+ PR #3567: Fix unifying literal types in namedtuple
+ PR #3576: Add special copy routine for NumPy out arrays
+ PR #3577: Fix example and docs typos for `objmode` context
manager. Reorder statements.
+ PR #3580: Use alias information when determining whether it is
safe to
+ PR #3583: Use `ir.unknown_loc` for unknown `Loc`, as #3390 with
tests
+ PR #3587: Fix llvm.memset usage changes in llvm7
+ PR #3596: Fix Array Analysis for Global Namedtuples
+ PR #3597: Warn users if threading backend init unsafe.
+ PR #3605: Add guard for writing to read only arrays from ufunc
calls
+ PR #3606: Improve the accuracy of error message wording for
undefined type.
+ PR #3611: gdb test guard needs to ack ptrace permissions
+ PR #3616: Skip gdb tests on ARM.
* CUDA Enhancements:
+ PR #3532: Unregister temporarily pinned host arrays at once
+ PR #3552: Handle broadcast arrays correctly in host->device
transfer.
+ PR #3578: Align cuda and cuda simulator kwarg names.
* Documentation Updates:
+ PR #3545: Fix @njit description in 5 min guide
+ PR #3570: Minor documentation fixes for numba.cuda
+ PR #3581: Fixing minor typo in `reference/types.rst`
+ PR #3594: Changing `@stencil` docs to correctly reflect
`func_or_mode` param
+ PR #3617: Draft roadmap as of Dec 2018
-------------------------------------------------------------------
Sat Dec 1 18:34:28 UTC 2018 - Arun Persaud <arun@gmx.de>
- update to version 0.41.0:
* major features:
+ Diagnostics showing the optimizations done by
ParallelAccelerator
+ Support for profiling Numba-compiled functions in Intel VTune
+ Additional NumPy functions: partition, nancumsum, nancumprod,
ediff1d, cov, conj, conjugate, tri, tril, triu
+ Initial support for Python 3 Unicode strings
* General Enhancements:
+ PR #1968: armv7 support
+ PR #2983: invert mapping b/w binop operators and the operator
module #2297
+ PR #3160: First attempt at parallel diagnostics
+ PR #3307: Adding NUMBA_ENABLE_PROFILING envvar, enabling jit
event
+ PR #3320: Support for np.partition
+ PR #3324: Support for np.nancumsum and np.nancumprod
+ PR #3325: Add location information to exceptions.
+ PR #3337: Support for np.ediff1d
+ PR #3345: Support for np.cov
+ PR #3348: Support user pipeline class in with lifting
+ PR #3363: string support
+ PR #3373: Improve error message for empty imprecise lists.
+ PR #3375: Enable overload(operator.getitem)
+ PR #3402: Support negative indexing in tuple.
+ PR #3414: Refactor Const type
+ PR #3416: Optimized usage of alloca out of the loop
+ PR #3424: Updates for llvmlite 0.26
+ PR #3462: Add support for `np.conj/np.conjugate`.
+ PR #3480: np.tri, np.tril, np.triu - default optional args
+ PR #3481: Permit dtype argument as sole kwarg in np.eye
* CUDA Enhancements:
+ PR #3399: Add max_registers Option to cuda.jit
* Continuous Integration / Testing:
+ PR #3303: CI with Azure Pipelines
+ PR #3309: Workaround race condition with apt
+ PR #3371: Fix issues with Azure Pipelines
+ PR #3362: Fix #3360: `RuntimeWarning: 'numba.runtests' found in
sys.modules`
+ PR #3374: Disable openmp in wheel building
+ PR #3404: Azure Pipelines templates
+ PR #3419: Fix cuda tests and error reporting in test discovery
+ PR #3491: Prevent faulthandler installation on armv7l
+ PR #3493: Fix CUDA test that used negative indexing behaviour
that's fixed.
+ PR #3495: Start Flake8 checking of Numba source
* Fixes:
+ PR #2950: Fix dispatcher to only consider contiguous-ness.
+ PR #3124: Fix 3119, raise for 0d arrays in reductions
+ PR #3228: Reduce redundant module linking
+ PR #3329: Fix AOT on windows.
+ PR #3335: Fix memory management of __cuda_array_interface__
views.
+ PR #3340: Fix typo in error name.
+ PR #3365: Fix the default unboxing logic
+ PR #3367: Allow non-global reference to objmode()
context-manager
+ PR #3381: Fix global reference in objmode for dynamically
created function
+ PR #3382: CUDA_ERROR_MISALIGNED_ADDRESS Using Multiple Const
Arrays
+ PR #3384: Correctly handle very old versions of colorama
+ PR #3394: Add 32bit package guard for non-32bit installs
+ PR #3397: Fix with-objmode warning
+ PR #3403: Fix label offset in call inline after parfor pass
+ PR #3429: Fixes raising of user defined exceptions for
exec(<string>).
+ PR #3432: Fix error due to function naming in CI in py2.7
+ PR #3444: Fixed TBB's single thread execution and test added for
#3440
+ PR #3449: Allow matching non-array objects in find_callname()
+ PR #3455: Change getiter and iternext to not be pure. Resolves
#3425
+ PR #3467: Make ir.UndefinedType singleton class.
+ PR #3478: Fix np.random.shuffle sideeffect
+ PR #3487: Raise unsupported for kwargs given to `print()`
+ PR #3488: Remove dead script.
+ PR #3498: Fix stencil support for boolean as return type
+ PR #3511: Fix handling make_function literals (regression of
#3414)
+ PR #3514: Add missing unicode != unicode
+ PR #3527: Fix complex math sqrt implementation for large -ve
values
+ PR #3530: This adds an arg check for the pattern supplied to
Parfors.
+ PR #3536: Sets list dtor linkage to `linkonce_odr` to fix
visibility in AOT
* Documentation Updates:
+ PR #3316: Update 0.40 changelog with additional PRs
+ PR #3318: Tweak spacing to avoid search box wrapping onto second
line
+ PR #3321: Add note about memory leaks with exceptions to
docs. Fixes #3263
+ PR #3322: Add FAQ on CUDA + fork issue. Fixes #3315.
+ PR #3343: Update docs for argsort, kind kwarg partially
supported.
+ PR #3357: Added mention of njit in 5minguide.rst
+ PR #3434: Fix parallel reduction example in docs.
+ PR #3452: Fix broken link and mark up problem.
+ PR #3484: Size Numba logo in docs in em units. Fixes #3313
+ PR #3502: just two typos
+ PR #3506: Document string support
+ PR #3513: Documentation for parallel diagnostics.
+ PR #3526: Fix 5 min guide with respect to @njit decl
-------------------------------------------------------------------
Fri Oct 26 21:28:50 UTC 2018 - Jan Engelhardt <jengelh@inai.de>
- Use noun phrase in summary.
-------------------------------------------------------------------
Fri Oct 26 19:45:47 UTC 2018 - Todd R <toddrme2178@gmail.com>
- Update to Version 0.40.1
* PR #3338: Accidentally left Anton off contributor list for 0.40.0
* PR #3374: Disable OpenMP in wheel building
* PR #3376: Update 0.40.1 changelog and docs on OpenMP backend
- Update to Version 0.40.0
+ This release adds a number of major features:
* A new GPU backend: kernels for AMD GPUs can now be compiled using the ROCm
driver on Linux.
* The thread pool implementation used by Numba for automatic multithreading
is configurable to use TBB, OpenMP, or the old "workqueue" implementation.
(TBB is likely to become the preferred default in a future release.)
* New documentation on thread and fork-safety with Numba, along with overall
improvements in thread-safety.
* Experimental support for executing a block of code inside a nopython mode
function in object mode.
* Parallel loops now allow arrays as reduction variables
* CUDA improvements: FMA, faster float64 atomics on supporting hardware,
records in const memory, and improved datetime dtype support
* More NumPy functions: vander, tri, triu, tril, fill_diagonal
+ General Enhancements:
* PR #3017: Add facility to support with-contexts
* PR #3033: Add support for multidimensional CFFI arrays
* PR #3122: Add inliner to object mode pipeline
* PR #3127: Support for reductions on arrays.
* PR #3145: Support for np.fill_diagonal
* PR #3151: Keep a queue of references to last N deserialized functions. Fixes #3026
* PR #3154: Support use of list() if typeable.
* PR #3166: Objmode with-block
* PR #3179: Updates for llvmlite 0.25
* PR #3181: Support function extension in alias analysis
* PR #3189: Support literal constants in typing of object methods
* PR #3190: Support passing closures as literal values in typing
* PR #3199: Support inferring stencil index as constant in simple unary expressions
* PR #3202: Threading layer backend refactor/rewrite/reinvention!
* PR #3209: Support for np.tri, np.tril and np.triu
* PR #3211: Handle unpacking in building tuple (BUILD_TUPLE_UNPACK opcode)
* PR #3212: Support for np.vander
* PR #3227: Add NumPy 1.15 support
* PR #3272: Add MemInfo_data to runtime._nrt_python.c_helpers
* PR #3273: Refactor. Removing thread-local-storage based context nesting.
* PR #3278: compiler threadsafety lockdown
* PR #3291: Add CPU count and CFS restrictions info to numba -s.
+ CUDA Enhancements:
* PR #3152: Use cuda driver api to get best blocksize for best occupancy
* PR #3165: Add FMA intrinsic support
* PR #3172: Use float64 add Atomics, Where Available
* PR #3186: Support Records in CUDA Const Memory
* PR #3191: CUDA: fix log size
* PR #3198: Fix GPU datetime timedelta types usage
* PR #3221: Support datetime/timedelta scalar argument to a CUDA kernel.
* PR #3259: Add DeviceNDArray.view method to reinterpret data as a different type.
* PR #3310: Fix IPC handling of sliced cuda array.
+ ROCm Enhancements:
* PR #3023: Support for AMDGCN/ROCm.
* PR #3108: Add ROC info to `numba -s` output.
* PR #3176: Move ROC vectorize init to npyufunc
* PR #3177: Add auto_synchronize support to ROC stream
* PR #3178: Update ROC target documentation.
* PR #3294: Add compiler lock to ROC compilation path.
* PR #3280: Add wavebits property to the HSA Agent.
* PR #3281: Fix ds_permute types and add tests
+ Continuous Integration / Testing:
* PR #3091: Remove old recipes, switch to test config based on env var.
* PR #3094: Add higher ULP tolerance for products in complex space.
* PR #3096: Set exit on error in incremental scripts
* PR #3109: Add skip to test needing jinja2 if no jinja2.
* PR #3125: Skip cudasim only tests
* PR #3126: add slack, drop flowdock
* PR #3147: Improve error message for arg type unsupported during typing.
* PR #3128: Fix recipe/build for jetson tx2/ARM
* PR #3167: In build script activate env before installing.
* PR #3180: Add skip to broken test.
* PR #3216: Fix libcuda.so loading in some container setup
* PR #3224: Switch to new Gitter notification webhook URL and encrypt it
* PR #3235: Add 32bit Travis CI jobs
* PR #3257: This adds scipy/ipython back into windows conda test phase.
+ Fixes:
* PR #3038: Fix random integer generation to match results from NumPy.
* PR #3045: Fix #3027 - Numba reassigns sys.stdout
* PR #3059: Handler for known LoweringErrors.
* PR #3060: Adjust attribute error for NumPy functions.
* PR #3067: Abort simulator threads on exception in thread block.
* PR #3079: Implement +/-(types.boolean) Fix #2624
* PR #3080: Compute np.var and np.std correctly for complex types.
* PR #3088: Fix #3066 (array.dtype.type in prange)
* PR #3089: Fix invalid ParallelAccelerator hoisting issue.
* PR #3136: Fix #3135 (lowering error)
* PR #3137: Fix for issue3103 (race condition detection)
* PR #3142: Fix Issue #3139 (parfors reuse of reduction variable across prange blocks)
* PR #3148: Remove dead array equal @infer code
* PR #3153: Fix canonicalize_array_math typing for calls with kw args
* PR #3156: Fixes issue with missing pygments in testing and adds guards.
* PR #3168: Py37 bytes output fix.
* PR #3171: Fix #3146. Fix CFUNCTYPE void* return-type handling
* PR #3193: Fix setitem/getitem resolvers
* PR #3222: Fix #3214. Mishandling of POP_BLOCK in while True loop.
* PR #3230: Fixes liveness analysis issue in looplifting
* PR #3233: Fix return type difference for 32bit ctypes.c_void_p
* PR #3234: Fix types and layout for `np.where`.
* PR #3237: Fix DeprecationWarning about imp module
* PR #3241: Fix #3225. Normalize 0nd array to scalar in typing of indexing code.
* PR #3256: Fix #3251: Move imports of ABCs to collections.abc for Python >= 3.3
* PR #3292: Fix issue3279.
* PR #3302: Fix error due to mismatching dtype
+ Documentation Updates:
* PR #3104: Workaround for #3098 (test_optional_unpack Heisenbug)
* PR #3132: Adds an ~5 minute guide to Numba.
* PR #3194: Fix docs RE: np.random generator fork/thread safety
* PR #3242: Page with Numba talks and tutorial links
* PR #3258: Allow users to choose the type of issue they are reporting.
* PR #3260: Fixed broken link
* PR #3266: Fix cuda pointer ownership problem with user/externally allocated pointer
* PR #3269: Tweak typography with CSS
* PR #3270: Update FAQ for functions passed as arguments
* PR #3274: Update installation instructions
* PR #3275: Note pyobject and voidptr are types in docs
* PR #3288: Do not need to call parallel optimizations "experimental" anymore
* PR #3318: Tweak spacing to avoid search box wrapping onto second line
- Remove upstream-included numba-0.39.0-fix-3135.patch
-------------------------------------------------------------------
Fri Jul 20 13:09:58 UTC 2018 - mcepl@suse.com
- Add patch numba-0.39.0-fix-3135.patch to fix failing datashader
tests. (https://github.com/bokeh/datashader/issues/620)
-------------------------------------------------------------------
Fri Jul 13 09:20:32 UTC 2018 - tchvatal@suse.com
- Fix version requirement to ask for new llvmlite
-------------------------------------------------------------------
Thu Jul 12 03:31:08 UTC 2018 - arun@gmx.de
- update to version 0.39.0:
* Here are the highlights for the Numba 0.39.0 release.
+ This is the first version that supports Python 3.7.
+ With help from Intel, we have fixed the issues with SVML support
(related issues #2938, #2998, #3006).
+ List has gained support for containing reference-counted types
like NumPy arrays and `list`. Note, list still cannot hold
heterogeneous types.
+ We have made a significant change to the internal
calling-convention, which should be transparent to most users,
to allow for a future feature that will permit jumping back
into python-mode from a nopython-mode function. This also fixes
a limitation to `print` that disabled its use from nopython
functions that were deep in the call-stack.
+ For CUDA GPU support, we added a `__cuda_array_interface__`
following the NumPy array interface specification to allow Numba
to consume externally defined device arrays. We have opened a
corresponding pull request to CuPy to test out the concept and
be able to use a CuPy GPU array.
+ The Numba dispatcher `inspect_types()` method now supports the
kwarg `pretty` which if set to `True` will produce ANSI/HTML
output, showing the annotated types, when invoked from
ipython/jupyter-notebook respectively.
+ The NumPy functions `ndarray.dot`, `np.percentile` and
`np.nanpercentile`, and `np.unique` are now supported.
+ Numba now supports the use of a per-project configuration file
to permanently set behaviours typically set via `NUMBA_*` family
environment variables.
+ Support for the `ppc64le` architecture has been added.
* Enhancements:
+ PR #2793: Simplify and remove javascript from html_annotate
templates.
+ PR #2840: Support list of refcounted types
+ PR #2902: Support for np.unique
+ PR #2926: Enable fence for all architecture and add developer
notes
+ PR #2928: Making error about untyped list more informative.
+ PR #2930: Add configuration file and color schemes.
+ PR #2932: Fix encoding to 'UTF-8' in `check_output` decode.
+ PR #2938: Python 3.7 compat: _Py_Finalizing becomes
_Py_IsFinalizing()
+ PR #2939: Comprehensive SVML unit test
+ PR #2946: Add support for `ndarray.dot` method and tests.
+ PR #2953: percentile and nanpercentile
+ PR #2957: Add new 3.7 opcode support.
+ PR #2963: Improve alias analysis to be more comprehensive
+ PR #2984: Support for namedtuples in array analysis
+ PR #2986: Fix environment propagation
+ PR #2990: Improve function call matching for intrinsics
+ PR #3002: Second pass at error rewrites (interpreter errors).
+ PR #3004: Add numpy.empty to the list of pure functions.
+ PR #3008: Augment SVML detection with llvmlite SVML patch
detection.
+ PR #3012: Make use of the common spelling of
heterogeneous/homogeneous.
+ PR #3032: Fix pycc ctypes test due to mismatch in
calling-convention
+ PR #3039: Add SVML detection to Numba environment diagnostic
tool.
+ PR #3041: This adds @needs_blas to tests that use BLAS
+ PR #3056: Require llvmlite>=0.24.0
* CUDA Enhancements:
+ PR #2860: __cuda_array_interface__
+ PR #2910: More CUDA intrinsics
+ PR #2929: Add Flag To Prevent Unnecessary D->H Copies
+ PR #3037: Add CUDA IPC support on non-peer-accessible devices
* CI Enhancements:
+ PR #3021: Update appveyor config.
+ PR #3040: Add fault handler to all builds
+ PR #3042: Add catchsegv
+ PR #3077: Adds optional number of processes for `-m` in testing
* Fixes:
+ PR #2897: Fix line position of delete statement in numba ir
+ PR #2905: Fix for #2862
+ PR #3009: Fix optional type returning in recursive call
+ PR #3019: workaround and unittest for issue #3016
+ PR #3035: [TESTING] Attempt delayed removal of Env
+ PR #3048: [WIP] Fix cuda tests failure on buildfarm
+ PR #3054: Make test work on 32-bit
+ PR #3062: Fix cuda.In freeing devary before the kernel launch
+ PR #3073: Workaround #3072
+ PR #3076: Avoid ignored exception due to missing globals at
interpreter teardown
* Documentation Updates:
+ PR #2966: Fix syntax in env var docs.
+ PR #2967: Fix typo in CUDA kernel layout example.
+ PR #2970: Fix docstring copy paste error.
-------------------------------------------------------------------
Sun Jun 24 01:05:37 UTC 2018 - arun@gmx.de
- update to version 0.38.1:
This is a critical bug fix release addressing:
https://github.com/numba/numba/issues/3006
The bug does not impact users using conda packages from Anaconda or Intel Python
Distribution (but it does impact conda-forge). It does not impact users of pip
using wheels from PyPI.
This only impacts a small number of users where:
* The ICC runtime (specifically libsvml) is present in the user's environment.
* The user is using an llvmlite statically linked against a version of LLVM
that has not been patched with SVML support.
* The platform is 64-bit.
The release fixes a code generation path that could lead to the production of
incorrect results under the above situation.
Fixes:
* PR #3007: Augment SVML detection with llvmlite SVML patch
detection.
-------------------------------------------------------------------
Fri May 18 08:06:59 UTC 2018 - tchvatal@suse.com
- Fix dependencies to match reality
- Add more items to make python2 build
-------------------------------------------------------------------
Sat May 12 16:21:24 UTC 2018 - arun@gmx.de
- update to version 0.38.0:
* highlights:
+ Numba (via llvmlite) is now backed by LLVM 6.0, general
vectorization is improved as a result. A significant long
standing LLVM bug that was causing corruption was also found and
fixed.
+ Further considerable improvements in vectorization are made
available as Numba now supports Intel's short vector math
library (SVML). Try it out with `conda install -c numba
icc_rt`.
+ CUDA 8.0 is now the minimum supported CUDA version.
* Other highlights include:
+ Bug fixes to `parallel=True` have enabled more vectorization
opportunities when using the ParallelAccelerator technology.
+ Much effort has gone into improving error reporting and the
general usability of Numba. This includes highlighted error
messages and performance tips documentation. Try it out with
`conda install colorama`.
+ A number of new NumPy functions are supported, `np.convolve`,
`np.correlate`, `np.reshape`, `np.transpose`, `np.permutation`,
`np.real`, `np.imag`, and `np.searchsorted` now supports
the `side` kwarg. Further, `np.argsort` now supports the `kind`
kwarg with `quicksort` and `mergesort` available.
+ The Numba extension API has gained the ability to operate more
easily with functions from Cython modules through the use of
`numba.extending.get_cython_function_address` to obtain function
addresses for direct use in `ctypes.CFUNCTYPE`.
+ Numba now allows the passing of jitted functions (and containers
of jitted functions) as arguments to other jitted functions.
+ The CUDA functionality has gained support for a larger selection
of bit manipulation intrinsics, also SELP, and has had a number
of bugs fixed.
+ Initial work to support the PPC64LE platform has been added,
full support is however waiting on the LLVM 6.0.1 release as it
contains critical patches not present in 6.0.0. It is hoped
that any remaining issues will be fixed in the next release.
+ Advanced users/compiler engineers now have the capacity to define
their own compilation pipelines.
-------------------------------------------------------------------
Mon Apr 23 14:55:41 UTC 2018 - toddrme2178@gmail.com
- Fix dependency versions
-------------------------------------------------------------------
Fri Mar 2 23:16:36 UTC 2018 - arun@gmx.de
- specfile:
* update required llvmlite version
- update to version 0.37.0:
* Misc enhancements:
+ PR #2627: Remove hacks to make llvmlite threadsafe
+ PR #2672: Add ascontiguousarray
+ PR #2678: Add Gitter badge
+ PR #2691: Fix #2690: add intrinsic to convert array to tuple
+ PR #2703: Test runner feature: failed-first and last-failed
+ PR #2708: Patch for issue #1907
+ PR #2732: Add support for array.fill
* Misc Fixes:
+ PR #2610: Fix #2606 lowering of optional.setattr
+ PR #2650: Remove skip for win32 cosine test
+ PR #2668: Fix empty_like from readonly arrays.
+ PR #2682: Fixes 2210, remove _DisableJitWrapper
+ PR #2684: Fix #2340, generator error yielding bool
+ PR #2693: Add travis-ci testing of NumPy 1.14, and also check on
Python 2.7
+ PR #2694: Avoid type inference failure due to a typing template
rejection
+ PR #2695: Update llvmlite version dependency.
+ PR #2696: Fix tuple indexing code generation for empty tuple
+ PR #2698: Fix #2697 by deferring deletion in the simplify_CFG
loop.
+ PR #2701: Small fix to avoid tempfiles being created in the
current directory
+ PR #2725: Fix 2481, LLVM IR parsing error due to mutated IR
+ PR #2726: Fix #2673: incorrect fork error msg.
+ PR #2728: Alternative to #2620. Remove dead code
ByteCodeInst.get.
+ PR #2730: Add guard for test needing SciPy/BLAS
* Documentation updates:
+ PR #2670: Update communication channels
+ PR #2671: Add docs about diagnosing loop vectorizer
+ PR #2683: Add docs on const arg requirements and on const mem
alloc
+ PR #2722: Add docs on numpy support in cuda
+ PR #2724: Update doc: warning about unsupported arguments
* ParallelAccelerator enhancements/fixes:
+ Parallel support for `np.arange` and `np.linspace`, also
`np.mean`, `np.std` and `np.var` are added. This was performed
as part of a general refactor and cleanup of the core ParallelAccelerator code.
+ PR #2674: Core pa
+ PR #2704: Generate Dels after parfor sequential lowering
+ PR #2716: Handle matching directly supported functions
* CUDA enhancements:
+ PR #2665: CUDA DeviceNDArray: Support numpy transpose API
+ PR #2681: Allow Assigning to DeviceNDArrays
+ PR #2702: Make DummyArray do High Dimensional Reshapes
+ PR #2714: Use CFFI to Reuse Code
* CUDA fixes:
+ PR #2667: Fix CUDA DeviceNDArray slicing
+ PR #2686: Fix #2663: incorrect offset when indexing cuda array.
+ PR #2687: Ensure Constructed Stream Bound
+ PR #2706: Workaround for unexpected warp divergence due to
exception raising code
+ PR #2707: Fix regression: cuda test submodules not loading
properly in runtests
+ PR #2731: Use more challenging values in slice tests.
+ PR #2720: A quick testsuite fix to not run the new cuda testcase
in the multiprocess pool
-------------------------------------------------------------------
Thu Jan 11 19:25:55 UTC 2018 - toddrme2178@gmail.com
- Bump minimum llvmlite version.
-------------------------------------------------------------------
Thu Dec 21 18:33:16 UTC 2017 - arun@gmx.de
- update to version 0.36.2:
* PR #2645: Avoid CPython bug with "exec" in older 2.7.x.
* PR #2652: Add support for CUDA 9.
-------------------------------------------------------------------
Fri Dec 8 17:59:51 UTC 2017 - arun@gmx.de
- update to version 0.36.1:
* ParallelAccelerator features:
+ PR #2457: Stencil Computations in ParallelAccelerator
+ PR #2548: Slice and range fusion, parallelizing bitarray and
slice assignment
+ PR #2516: Support general reductions in ParallelAccelerator
* ParallelAccelerator fixes:
+ PR #2540: Fix bug #2537
+ PR #2566: Fix issue #2564.
+ PR #2599: Fix nested multi-dimensional parfor type inference
issue
+ PR #2604: Fixes for stencil tests and cmath sin().
+ PR #2605: Fixes issue #2603.
* PR #2568: Update for LLVM 5
* PR #2607: Fixes abort when getting address to
"nrt_unresolved_abort"
* PR #2615: Working towards conda build 3
* Misc fixes/enhancements:
+ PR #2534: Add tuple support to np.take.
+ PR #2551: Rebranding fix
+ PR #2552: relative doc links
+ PR #2570: Fix issue #2561, handle missing successor on loop exit
+ PR #2588: Fix #2555. Disable libpython.so linking on linux
+ PR #2601: Update llvmlite version dependency.
+ PR #2608: Fix potential cache file collision
+ PR #2612: Fix NRT test failure due to increased overhead when
running in coverage
+ PR #2619: Fix dubious pthread_cond_signal not in lock
+ PR #2622: Fix `np.nanmedian` for all NaN case.
+ PR #2633: Fix markdown in CONTRIBUTING.md
+ PR #2635: Make the dependency on compilers for AOT optional.
* CUDA support fixes:
+ PR #2523: Fix invalid cuda context in memory transfer calls in
another thread
+ PR #2575: Use CPU to initialize xoroshiro states for GPU
RNG. Fixes #2573
+ PR #2581: Fix cuda gufunc mishandling of scalar arg as array and
out argument
-------------------------------------------------------------------
Tue Oct 3 06:05:20 UTC 2017 - arun@gmx.de
- update to version 0.35.0:
* ParallelAccelerator:
+ PR #2400: Array comprehension
+ PR #2405: Support printing Numpy arrays
+ PR #2438: Support more np.random functions in
ParallelAccelerator
+ PR #2482: Support for sum with axis in nopython mode.
+ PR #2487: Adding developer documentation for ParallelAccelerator
technology.
+ PR #2492: Core PA refactor adds assertions for broadcast
semantics
* ParallelAccelerator fixes:
+ PR #2478: Rename cfg before parfor translation (#2477)
+ PR #2479: Fix broken array comprehension tests on unsupported
platforms
+ PR #2484: Fix array comprehension test on win64
+ PR #2506: Fix for 32-bit machines.
* Additional features of note:
+ PR #2490: Implement np.take and ndarray.take
+ PR #2493: Display a warning if parallel=True is set but not
possible.
+ PR #2513: Add np.MachAr, np.finfo, np.iinfo
+ PR #2515: Allow environ overriding of cpu target and cpu
features.
* Misc fixes/enhancements:
+ PR #2455: add contextual information to runtime errors
+ PR #2470: Fixes #2458, poor performance in np.median
+ PR #2471: Ensure LLVM threadsafety in {g,}ufunc building.
+ PR #2494: Update doc theme
+ PR #2503: Remove hacky code added in 2482 and feature
enhancement
+ PR #2505: Serialise env mutation tests during multithreaded
testing.
+ PR #2520: Fix failing cpu-target override tests
* CUDA support fixes:
+ PR #2504: Enable CUDA toolkit version testing
+ PR #2509: Disable tests generating code unavailable in lower CC
versions.
+ PR #2511: Fix Windows 64 bit CUDA tests.
- changes from version 0.34.0:
* ParallelAccelerator features:
+ PR #2318: Transfer ParallelAccelerator technology to Numba
+ PR #2379: ParallelAccelerator Core Improvements
+ PR #2367: Add support for len(range(...))
+ PR #2369: List comprehension
+ PR #2391: Explicit Parallel Loop Support (prange)
* CUDA support enhancements:
+ PR #2377: New GPU reduction algorithm
* CUDA support fixes:
+ PR #2397: Fix #2393, always set alignment of cuda static memory
regions
* Misc Fixes:
+ PR #2373, Issue #2372: 32-bit compatibility fix for parfor
related code
+ PR #2376: Fix #2375 missing stdint.h for py2.7 vc9
+ PR #2378: Fix deadlock in parallel gufunc when kernel acquires
the GIL.
+ PR #2382: Forbid unsafe casting in bitwise operation
+ PR #2385: docs: fix Sphinx errors
+ PR #2396: Use 64-bit RHS operand for shift
+ PR #2404: Fix threadsafety logic issue in ufunc compilation
cache.
+ PR #2424: Ensure consistent iteration order of blocks for type
inference.
+ PR #2425: Guard code to prevent the use of parallel on win32 +
py27
+ PR #2426: Basic test for Enum member type recovery.
+ PR #2433: Fix up the parfors tests with respect to windows py2.7
+ PR #2442: Skip tests that need BLAS/LAPACK if scipy is not
available.
+ PR #2444: Add test for invalid array setitem
+ PR #2449: Make the runtime initialiser threadsafe
+ PR #2452: Skip CFG test on 64bit windows
* Misc Enhancements:
+ PR #2366: Improvements to IR utils
+ PR #2388: Update README.rst to indicate the proper version of
LLVM
+ PR #2394: Upgrade to llvmlite 0.19.*
+ PR #2395: Update llvmlite version to 0.19
+ PR #2406: Expose environment object to ufuncs
+ PR #2407: Expose environment object to target-context inside
lowerer
+ PR #2413: Add flags to pass through to conda build for buildbot
+ PR #2414: Add cross compile flags to local recipe
+ PR #2415: A few cleanups for rewrites
+ PR #2418: Add getitem support for Enum classes
+ PR #2419: Add support for returning enums in vectorize
+ PR #2421: Add copyright notice for Intel contributed files.
+ PR #2422: Patch code base to work with np 1.13 release
+ PR #2448: Adds in warning message when using parallel if
cache=True
+ PR #2450: Add test for keyword arg on .sum-like and .cumsum-like
array methods
- changes from version 0.33.0:
* There are also several enhancements to the CUDA GPU support:
+ A GPU random number generator based on xoroshiro128+ algorithm
is added. See details and examples in documentation.
+ @cuda.jit CUDA kernels can now call @jit and @njit CPU functions
and they will automatically be compiled as CUDA device
functions.
+ CUDA IPC memory API is exposed for sharing memory between
processes. See usage details in documentation.
* Reference counting enhancements:
+ PR #2346, Issue #2345, #2248: Add extra refcount pruning after
inlining
+ PR #2349: Fix refct pruning not removing refct op with tail
call.
+ PR #2352, Issue #2350: Add refcount pruning pass for function
that does not need refcount
* CUDA support enhancements:
+ PR #2023: Supports CUDA IPC for device array
+ PR #2343, Issue #2335: Allow CPU jit decorated function to be
used as cuda device function
+ PR #2347: Add random number generator support for CUDA device
code
+ PR #2361: Update autotune table for CC: 5.3, 6.0, 6.1, 6.2
* Misc fixes:
+ PR #2362: Avoid test failure due to typing to int32 on 32-bit
platforms
+ PR #2359: Fixed nogil example that threw a TypeError when
executed.
+ PR #2357, Issue #2356: Fix fragile test that depends on how the
script is executed.
+ PR #2355: Fix cpu dispatcher referenced as attribute of another
module
+ PR #2354: Fixes an issue with caching when function needs NRT
and refcount pruning
+ PR #2342, Issue #2339: Add warnings to inspection when it is
used on unserialized cached code
+ PR #2329, Issue #2250: Better handling of missing op codes
* Misc enhancements:
+ PR #2360: Adds missing values in error message interp.
+ PR #2353: Handle when get_host_cpu_features() raises
RuntimeError
+ PR #2351: Enable SVML for erf/erfc/gamma/lgamma/log2
+ PR #2344: Expose error_model setting in jit decorator
+ PR #2337: Align blocking terminate support for fork() with new
TBB version
+ PR #2336: Bump llvmlite version to 0.18
+ PR #2330: Core changes in PR #2318
-------------------------------------------------------------------
Wed May 3 18:23:09 UTC 2017 - toddrme2178@gmail.com
- update to version 0.32.0:
+ Improvements:
* PR #2322: Suppress test error due to unknown but consistent error with tgamma
* PR #2320: Update llvmlite dependency to 0.17
* PR #2308: Add details to error message on why cuda support is disabled.
* PR #2302: Add os x to travis
* PR #2294: Disable remove_module on MCJIT due to memory leak inside LLVM
* PR #2291: Split parallel tests and recycle workers to tame memory usage
* PR #2253: Remove the pointer-stuffing hack for storing meminfos in lists
+ Fixes:
* PR #2331: Fix a bug in the GPU array indexing
* PR #2326: Fix #2321 docs referring to non-existing function.
* PR #2316: Fixing more race-condition problems
* PR #2315: Fix #2314. Relax strict type check to allow optional type.
* PR #2310: Fix race condition due to concurrent compilation and cache loading
* PR #2304: Fix intrinsic 1st arg not a typing.Context as stated by the docs.
* PR #2287: Fix int64 atomic min-max
* PR #2286: Fix #2285 `@overload_method` not linking dependent libs
* PR #2303: Missing import statements to interval-example.rst
- Implement single-spec version
-------------------------------------------------------------------
Wed Feb 22 22:15:53 UTC 2017 - arun@gmx.de
- update to version 0.31.0:
* Improvements:
+ PR #2281: Update for numpy1.12
+ PR #2278: Add CUDA atomic.{max, min, compare_and_swap}
+ PR #2277: Add about section to conda recipes to identify
license and other metadata in Anaconda Cloud
+ PR #2271: Adopt itanium C++-style mangling for CPU and CUDA
targets
+ PR #2267: Add fastmath flags
+ PR #2261: Support dtype.type
+ PR #2249: Changes for llvm3.9
+ PR #2234: Bump llvmlite requirement to 0.16 and add
install_name_tool_fixer to mviewbuf for OS X
+ PR #2230: Add python3.6 to TravisCi
+ PR #2227: Enable caching for gufunc wrapper
+ PR #2170: Add debugging support
+ PR #2037: inspect_cfg() for easier visualization of the function
operation
* Fixes:
+ PR #2274: Fix nvvm ir patch in mishandling “load”
+ PR #2272: Fix breakage to cuda7.5
+ PR #2269: Fix caching of copy_strides kernel in cuda.reduce
+ PR #2265: Fix #2263: error when linking two modules with dynamic
globals
+ PR #2252: Fix path separator in test
+ PR #2246: Fix overuse of memory in some system with fork
+ PR #2241: Fix #2240: __module__ in dynamically created function
not a str
+ PR #2239: Fix fingerprint computation failure preventing
fallback
-------------------------------------------------------------------
Sun Jan 15 00:33:08 UTC 2017 - arun@gmx.de
- update to version 0.30.1:
* Fixes:
+ PR #2232: Fix name clashes with _Py_hashtable_xxx in Python 3.6.
* Improvements:
+ PR #2217: Add Intel TBB threadpool implementation for parallel
ufunc.
-------------------------------------------------------------------
Tue Jan 10 17:17:33 UTC 2017 - arun@gmx.de
- specfile:
* update copyright year
- update to version 0.30.0:
* Improvements:
+ PR #2209: Support Python 3.6.
+ PR #2175: Support np.trace(), np.outer() and np.kron().
+ PR #2197: Support np.nanprod().
+ PR #2190: Support caching for ufunc.
+ PR #2186: Add system reporting tool.
* Fixes:
+ PR #2214, Issue #2212: Fix memory error with ndenumerate and
flat iterators.
+ PR #2206, Issue #2163: Fix zip() consuming extra elements in
early exhaustion.
+ PR #2185, Issue #2159, #2169: Fix rewrite pass affecting objmode
fallback.
+ PR #2204, Issue #2178: Fix annotation for liftedloop.
+ PR #2203: Fix Appveyor segfault with Python 3.5.
+ PR #2202, Issue #2198: Fix target context not initialized when
loading from ufunc cache.
+ PR #2172, Issue #2171: Fix optional type unpacking.
+ PR #2189, Issue #2188: Disable freezing of big (>1MB) global
arrays.
+ PR #2180, Issue #2179: Fix invalid variable version in
looplifting.
+ PR #2156, Issue #2155: Fix divmod, floordiv segfault on CUDA.
-------------------------------------------------------------------
Fri Dec 2 21:07:51 UTC 2016 - jengelh@inai.de
- remove subjective words from description
-------------------------------------------------------------------
Sat Nov 5 17:53:40 UTC 2016 - arun@gmx.de
- update to version 0.29.0:
* Improvements:
+ PR #2130, #2137: Add type-inferred recursion with docs and
examples.
+ PR #2134: Add np.linalg.matrix_power.
+ PR #2125: Add np.roots.
+ PR #2129: Add np.linalg.{eigvals,eigh,eigvalsh}.
+ PR #2126: Add array-to-array broadcasting.
+ PR #2069: Add hstack and related functions.
+ PR #2128: Allow for vectorizing a jitted function. (thanks to
@dhirschfeld)
+ PR #2117: Update examples and make them test-able.
+ PR #2127: Refactor interpreter class and its results.
* Fixes:
+ PR #2149: Workaround MSVC9.0 SP1 fmod bug kb982107.
+ PR #2145, Issue #2009: Fixes kwargs for jitclass __init__
method.
+ PR #2150: Fix slowdown in objmode fallback.
+ PR #2050, Issue #1258: Fix liveness problem with some generator
loops.
+ PR #2072, Issue #1995: Right shift of unsigned LHS should be
logical.
+ PR #2115, Issue #1466: Fix inspect_types() error due to mangled
variable name.
+ PR #2119, Issue #2118: Fix array type created from record-dtype.
+ PR #2122, Issue #1808: Fix returning a generator due to
datamodel error.
-------------------------------------------------------------------
Fri Sep 23 23:38:02 UTC 2016 - toddrme2178@gmail.com
- Initial version