-------------------------------------------------------------------
Fri Dec 8 17:59:51 UTC 2017 - arun@gmx.de

- update to version 0.36.1:
  * ParallelAccelerator features:
    + PR #2457: Stencil Computations in ParallelAccelerator
    + PR #2548: Slice and range fusion, parallelizing bitarray and
      slice assignment
    + PR #2516: Support general reductions in ParallelAccelerator
  * ParallelAccelerator fixes:
    + PR #2540: Fix bug #2537
    + PR #2566: Fix issue #2564.
    + PR #2599: Fix nested multi-dimensional parfor type inference
      issue
    + PR #2604: Fixes for stencil tests and cmath sin().
    + PR #2605: Fixes issue #2603.
  * PR #2568: Update for LLVM 5
  * PR #2607: Fixes abort when getting address to
    "nrt_unresolved_abort"
  * PR #2615: Working towards conda build 3
  * Misc fixes/enhancements:
    + PR #2534: Add tuple support to np.take.
    + PR #2551: Rebranding fix
    + PR #2552: relative doc links
    + PR #2570: Fix issue #2561, handle missing successor on loop
      exit
    + PR #2588: Fix #2555. Disable libpython.so linking on linux
    + PR #2601: Update llvmlite version dependency.
    + PR #2608: Fix potential cache file collision
    + PR #2612: Fix NRT test failure due to increased overhead when
      running in coverage
    + PR #2619: Fix dubious pthread_cond_signal not in lock
    + PR #2622: Fix `np.nanmedian` for all NaN case.
    + PR #2633: Fix markdown in CONTRIBUTING.md
    + PR #2635: Make the dependency on compilers for AOT optional.
  * CUDA support fixes:
    + PR #2523: Fix invalid cuda context in memory transfer calls in
      another thread
    + PR #2575: Use CPU to initialize xoroshiro states for GPU RNG.
      Fixes #2573
    + PR #2581: Fix cuda gufunc mishandling of scalar arg as array
      and out argument

-------------------------------------------------------------------
Tue Oct 3 06:05:20 UTC 2017 - arun@gmx.de

- update to version 0.35.0:
  * ParallelAccelerator:
    + PR #2400: Array comprehension
    + PR #2405: Support printing Numpy arrays
    + PR #2438: Support more np.random functions in
      ParallelAccelerator
    + PR #2482: Support for sum with axis in nopython mode.
    + PR #2487: Adding developer documentation for
      ParallelAccelerator technology.
    + PR #2492: Core PA refactor adds assertions for broadcast
      semantics
  * ParallelAccelerator fixes:
    + PR #2478: Rename cfg before parfor translation (#2477)
    + PR #2479: Fix broken array comprehension tests on unsupported
      platforms
    + PR #2484: Fix array comprehension test on win64
    + PR #2506: Fix for 32-bit machines.
  * Additional features of note:
    + PR #2490: Implement np.take and ndarray.take
    + PR #2493: Display a warning if parallel=True is set but not
      possible.
    + PR #2513: Add np.MachAr, np.finfo, np.iinfo
    + PR #2515: Allow environ overriding of cpu target and cpu
      features.
  * Misc fixes/enhancements:
    + PR #2455: add contextual information to runtime errors
    + PR #2470: Fixes #2458, poor performance in np.median
    + PR #2471: Ensure LLVM threadsafety in {g,}ufunc building.
    + PR #2494: Update doc theme
    + PR #2503: Remove hacky code added in 2482 and feature
      enhancement
    + PR #2505: Serialise env mutation tests during multithreaded
      testing.
    + PR #2520: Fix failing cpu-target override tests
  * CUDA support fixes:
    + PR #2504: Enable CUDA toolkit version testing
    + PR #2509: Disable tests generating code unavailable in lower
      CC versions.
    + PR #2511: Fix Windows 64 bit CUDA tests.
- changes from version 0.34.0:
  * ParallelAccelerator features:
    + PR #2318: Transfer ParallelAccelerator technology to Numba
    + PR #2379: ParallelAccelerator Core Improvements
    + PR #2367: Add support for len(range(...))
    + PR #2369: List comprehension
    + PR #2391: Explicit Parallel Loop Support (prange)
  * CUDA support enhancements:
    + PR #2377: New GPU reduction algorithm
  * CUDA support fixes:
    + PR #2397: Fix #2393, always set alignment of cuda static
      memory regions
  * Misc Fixes:
    + PR #2373, Issue #2372: 32-bit compatibility fix for parfor
      related code
    + PR #2376: Fix #2375 missing stdint.h for py2.7 vc9
    + PR #2378: Fix deadlock in parallel gufunc when kernel acquires
      the GIL.
    + PR #2382: Forbid unsafe casting in bitwise operation
    + PR #2385: docs: fix Sphinx errors
    + PR #2396: Use 64-bit RHS operand for shift
    + PR #2404: Fix threadsafety logic issue in ufunc compilation
      cache.
    + PR #2424: Ensure consistent iteration order of blocks for type
      inference.
    + PR #2425: Guard code to prevent the use of ‘parallel’ on
      win32 + py27
    + PR #2426: Basic test for Enum member type recovery.
    + PR #2433: Fix up the parfors tests with respect to windows
      py2.7
    + PR #2442: Skip tests that need BLAS/LAPACK if scipy is not
      available.
    + PR #2444: Add test for invalid array setitem
    + PR #2449: Make the runtime initialiser threadsafe
    + PR #2452: Skip CFG test on 64bit windows
  * Misc Enhancements:
    + PR #2366: Improvements to IR utils
    + PR #2388: Update README.rst to indicate the proper version of
      LLVM
    + PR #2394: Upgrade to llvmlite 0.19.*
    + PR #2395: Update llvmlite version to 0.19
    + PR #2406: Expose environment object to ufuncs
    + PR #2407: Expose environment object to target-context inside
      lowerer
    + PR #2413: Add flags to pass through to conda build for
      buildbot
    + PR #2414: Add cross compile flags to local recipe
    + PR #2415: A few cleanups for rewrites
    + PR #2418: Add getitem support for Enum classes
    + PR #2419: Add support for returning enums in vectorize
    + PR #2421: Add copyright notice for Intel contributed files.
    + PR #2422: Patch code base to work with np 1.13 release
    + PR #2448: Adds in warning message when using ‘parallel’ if
      cache=True
    + PR #2450: Add test for keyword arg on .sum-like and
      .cumsum-like array methods
- changes from version 0.33.0:
  * There are also several enhancements to the CUDA GPU support:
    + A GPU random number generator based on xoroshiro128+
      algorithm is added. See details and examples in
      documentation.
    + @cuda.jit CUDA kernels can now call @jit and @njit CPU
      functions and they will automatically be compiled as CUDA
      device functions.
    + CUDA IPC memory API is exposed for sharing memory between
      processes. See usage details in documentation.
  * Reference counting enhancements:
    + PR #2346, Issue #2345, #2248: Add extra refcount pruning
      after inlining
    + PR #2349: Fix refct pruning not removing refct op with tail
      call.
    + PR #2352, Issue #2350: Add refcount pruning pass for function
      that does not need refcount
  * CUDA support enhancements:
    + PR #2023: Supports CUDA IPC for device array
    + PR #2343, Issue #2335: Allow CPU jit decorated function to be
      used as cuda device function
    + PR #2347: Add random number generator support for CUDA device
      code
    + PR #2361: Update autotune table for CC: 5.3, 6.0, 6.1, 6.2
  * Misc fixes:
    + PR #2362: Avoid test failure due to typing to int32 on 32-bit
      platforms
    + PR #2359: Fixed nogil example that threw a TypeError when
      executed.
    + PR #2357, Issue #2356: Fix fragile test that depends on how
      the script is executed.
    + PR #2355: Fix cpu dispatcher referenced as attribute of
      another module
    + PR #2354: Fixes an issue with caching when function needs NRT
      and refcount pruning
    + PR #2342, Issue #2339: Add warnings to inspection when it is
      used on unserialized cached code
    + PR #2329, Issue #2250: Better handling of missing op codes
  * Misc enhancements:
    + PR #2360: Adds missing values in error message interp.
    + PR #2353: Handle when get_host_cpu_features() raises
      RuntimeError
    + PR #2351: Enable SVML for erf/erfc/gamma/lgamma/log2
    + PR #2344: Expose error_model setting in jit decorator
    + PR #2337: Align blocking terminate support for fork() with new
      TBB version
    + PR #2336: Bump llvmlite version to 0.18
    + PR #2330: Core changes in PR #2318

-------------------------------------------------------------------
Wed May 3 18:23:09 UTC 2017 - toddrme2178@gmail.com

- update to version 0.32.0:
  + Improvements:
    * PR #2322: Suppress test error due to unknown but consistent
      error with tgamma
    * PR #2320: Update llvmlite dependency to 0.17
    * PR #2308: Add details to error message on why cuda support is
      disabled.
    * PR #2302: Add os x to travis
    * PR #2294: Disable remove_module on MCJIT due to memory leak
      inside LLVM
    * PR #2291: Split parallel tests and recycle workers to tame
      memory usage
    * PR #2253: Remove the pointer-stuffing hack for storing
      meminfos in lists
  + Fixes:
    * PR #2331: Fix a bug in the GPU array indexing
    * PR #2326: Fix #2321 docs referring to non-existing function.
    * PR #2316: Fixing more race-condition problems
    * PR #2315: Fix #2314. Relax strict type check to allow optional
      type.
    * PR #2310: Fix race condition due to concurrent compilation and
      cache loading
    * PR #2304: Fix intrinsic 1st arg not a typing.Context as stated
      by the docs.
    * PR #2287: Fix int64 atomic min-max
    * PR #2286: Fix #2285 `@overload_method` not linking dependent
      libs
    * PR #2303: Missing import statements to interval-example.rst
- Implement single-spec version

-------------------------------------------------------------------
Wed Feb 22 22:15:53 UTC 2017 - arun@gmx.de

- update to version 0.31.0:
  * Improvements:
    + PR #2281: Update for numpy1.12
    + PR #2278: Add CUDA atomic.{max, min, compare_and_swap}
    + PR #2277: Add about section to conda recipes to identify
      license and other metadata in Anaconda Cloud
    + PR #2271: Adopt itanium C++-style mangling for CPU and CUDA
      targets
    + PR #2267: Add fastmath flags
    + PR #2261: Support dtype.type
    + PR #2249: Changes for llvm3.9
    + PR #2234: Bump llvmlite requirement to 0.16 and add
      install_name_tool_fixer to mviewbuf for OS X
    + PR #2230: Add python3.6 to TravisCi
    + PR #2227: Enable caching for gufunc wrapper
    + PR #2170: Add debugging support
    + PR #2037: inspect_cfg() for easier visualization of the
      function operation
  * Fixes:
    + PR #2274: Fix nvvm ir patch in mishandling “load”
    + PR #2272: Fix breakage to cuda7.5
    + PR #2269: Fix caching of copy_strides kernel in cuda.reduce
    + PR #2265: Fix #2263: error when linking two modules with
      dynamic globals
    + PR #2252: Fix path separator in test
    + PR #2246: Fix overuse of memory in some system with fork
    + PR #2241: Fix #2240: __module__ in dynamically created
      function not a str
    + PR #2239: Fix fingerprint computation failure preventing
      fallback

-------------------------------------------------------------------
Sun Jan 15 00:33:08 UTC 2017 - arun@gmx.de

- update to version 0.30.1:
  * Fixes:
    + PR #2232: Fix name clashes with _Py_hashtable_xxx in Python
      3.6.
  * Improvements:
    + PR #2217: Add Intel TBB threadpool implementation for parallel
      ufunc.

-------------------------------------------------------------------
Tue Jan 10 17:17:33 UTC 2017 - arun@gmx.de

- specfile:
  * update copyright year
- update to version 0.30.0:
  * Improvements:
    + PR #2209: Support Python 3.6.
    + PR #2175: Support np.trace(), np.outer() and np.kron().
    + PR #2197: Support np.nanprod().
    + PR #2190: Support caching for ufunc.
    + PR #2186: Add system reporting tool.
  * Fixes:
    + PR #2214, Issue #2212: Fix memory error with ndenumerate and
      flat iterators.
    + PR #2206, Issue #2163: Fix zip() consuming extra elements in
      early exhaustion.
    + PR #2185, Issue #2159, #2169: Fix rewrite pass affecting
      objmode fallback.
    + PR #2204, Issue #2178: Fix annotation for liftedloop.
    + PR #2203: Fix Appveyor segfault with Python 3.5.
    + PR #2202, Issue #2198: Fix target context not initialized when
      loading from ufunc cache.
    + PR #2172, Issue #2171: Fix optional type unpacking.
    + PR #2189, Issue #2188: Disable freezing of big (>1MB) global
      arrays.
    + PR #2180, Issue #2179: Fix invalid variable version in
      looplifting.
    + PR #2156, Issue #2155: Fix divmod, floordiv segfault on CUDA.

-------------------------------------------------------------------
Fri Dec 2 21:07:51 UTC 2016 - jengelh@inai.de

- remove subjective words from description

-------------------------------------------------------------------
Sat Nov 5 17:53:40 UTC 2016 - arun@gmx.de

- update to version 0.29.0:
  * Improvements:
    + PR #2130, #2137: Add type-inferred recursion with docs and
      examples.
    + PR #2134: Add np.linalg.matrix_power.
    + PR #2125: Add np.roots.
    + PR #2129: Add np.linalg.{eigvals,eigh,eigvalsh}.
    + PR #2126: Add array-to-array broadcasting.
    + PR #2069: Add hstack and related functions.
    + PR #2128: Allow for vectorizing a jitted function. (thanks to
      @dhirschfeld)
    + PR #2117: Update examples and make them test-able.
    + PR #2127: Refactor interpreter class and its results.
  * Fixes:
    + PR #2149: Workaround MSVC9.0 SP1 fmod bug kb982107.
    + PR #2145, Issue #2009: Fixes kwargs for jitclass __init__
      method.
    + PR #2150: Fix slowdown in objmode fallback.
    + PR #2050, Issue #1258: Fix liveness problem with some
      generator loops.
    + PR #2072, Issue #1995: Right shift of unsigned LHS should be
      logical.
    + PR #2115, Issue #1466: Fix inspect_types() error due to
      mangled variable name.
    + PR #2119, Issue #2118: Fix array type created from
      record-dtype.
    + PR #2122, Issue #1808: Fix returning a generator due to
      datamodel error.

-------------------------------------------------------------------
Fri Sep 23 23:38:02 UTC 2016 - toddrme2178@gmail.com

- Initial version