apache-arrow/python-pyarrow.changes
Benjamin Greiner 853a205aac - Update to 20.0.0

OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=55
2025-06-13 18:31:56 +00:00

-------------------------------------------------------------------
Fri Jun 13 18:22:38 UTC 2025 - Ben Greiner <code@bnavigator.de>
- Update to 20.0.0
## Bug Fixes
* GH-36628 - [Python][Parquet] Fail when instantiating internal
Parquet metadata classes (#45549)
* GH-37630 - [C++][Python][Dataset] Allow disabling fragment
metadata caching (#45330)
* GH-44188 - [Python] Fix pandas roundtrip with bytes column
names (#44171)
* GH-45129 - [Python][C++] Fix usage of deprecated C++
functionality on pyarrow (#45189)
* GH-45155 - [Python][CI] Fix path for scientific nightly windows
wheel upload (#45222)
* GH-45169 - [Python] Adapt to modified pytest ignore collect
hook api (#45170)
* GH-45380 - [Python] Expose RankQuantileOptions to Python
(#45392)
* GH-45530 - [Python][Packaging] Add pyarrow.libs dir to
get_library_dirs (#45766)
* GH-45582 - [Python] Preserve decimal32/64/256 metadata in
Schema.metadata (#45583)
* GH-45733 - [C++][Python] Add biased/unbiased toggle to skew and
kurtosis functions (#45762)
* GH-45739 - [C++][Python] Fix crash when calling
hash_pivot_wider without options (#45740)
* GH-45758 - [Python] Add AzureFileSystem documentation (#45759)
* GH-45926 - [Python] Use pytest.approx for float values on
unbiased skew and kurtosis tests (#45929)
* GH-46041 - [Python][Packaging] Temporarily remove pandas from
being installed on free-threaded Windows wheel tests (#46042)
## New Features and Improvements
* GH-14932 - [Python] Add python bindings for JSON streaming
reader (#45084)
* GH-35289 - [Python] Support large variable width types in numpy
conversion (#36701)
* GH-36412 - [Python][CI] Fix deprecation warnings in the pandas
nightly build
* GH-39010 - [Python] Introduce maps_as_pydicts parameter for
to_pylist, to_pydict, as_py (#45471)
* GH-41002 - [Python] Remove pins for pytest-cython and
conda-docs pytest (#45240)
* GH-41985 - [Python][Docs] Clarify docstring of
pyarrow.compute.scalar() (#45668)
* GH-43587 - [Python] Remove no longer used serialize/deserialize
PyArrow C++ code (#45743)
* GH-44421 - [Python] Add configuration for building & testing
free-threaded wheels on Windows (#44804)
* GH-44790 - [Python] Remove use_legacy_dataset from code base
(#45742)
* GH-45156 - [Python][Packaging] Refactor Python Windows wheel
images to use newer base image (#45442)
* GH-45237 - [Python] Raise minimum supported cython to >=3
(#45238)
* GH-45278 - [Python][Packaging] Updated delvewheel install
command and updated flags used with delvewheel repair (#45323)
* GH-45282 - [Python][Parquet] Remove unused readonly properties
of ParquetWriter (#45281)
* GH-45288 - [Python][Packaging][Docs] Update documentation for
PyArrow nightly wheels (#45289)
* GH-45358 - [C++][Python] Add MemoryPool method to print
statistics (#45359)
* GH-45433 - [Python] Remove Cython workarounds (#45437)
* GH-45457 - [Python] Add pyarrow.ArrayStatistics (#45550)
* GH-45482 - [CI][Python] Don't use Ubuntu 20.04 for wheel test
(#45483)
* GH-45570 - [Python] Allow Decimal32/64Array.to_pandas (#45571)
* GH-45676 - [C++][Python][Compute] Add skew and kurtosis
functions (#45677)
* GH-45680 - [C++][Python] Remove deprecated functions in 20.0
* GH-45705 - [Python] Add support for SAS token in
AzureFileSystem (#45706)
* GH-45755 - [C++][Python][Compute] Add winsorize function
(#45763)
* GH-45848 - [C++][Python][R] Remove deprecated PARQUET_2_0
(#45849)
* GH-45920 - [Release][Python] Upload sdist and wheels to GitHub
Releases not apache.jfrog.io (#45962)
-------------------------------------------------------------------
Mon Feb 17 19:17:26 UTC 2025 - Ben Greiner <code@bnavigator.de>
- Update to 19.0.1
## Bug Fixes
* [Python][CI] Make download_tzdata_on_windows more robust and
use tzdata package for tzinfo database on Windows for ORC
(#45425)
* [Python] Only enable the string dtype on pandas export for
pandas>=2.3 (#45383)
* [Python] Fix version comparison in pandas compat for pandas 2.3
dev version (#45428)
## Improvements
* [CI][Python] Temporarily avoid newer boto3 version (#45311)
* [CI] Bump Minio version and unpin boto3 (#45320)
- Release 19.0.0
## New Features and Improvements
* [Python] Add more FlightInfo / FlightEndpoint attributes
(#43537)
* [Python] Support Arrow PyCapsule stream objects in
write_dataset (#43771)
* [Python] Support pandas future default string dtype
* [CI][Python] Use GitHub Packages for vcpkg cache (#44644)
* [Python] Add Python wrapper for JsonExtensionType (#44070)
* [Python][C++] Add version suffix to libarrow_python* libraries
(#44702)
* [Python] Add support for Decimal32 and Decimal64 types (#44882)
* [C++][Python] Add Hyperbolic Trig functions (#44630)
* [Python] Clean-up name / field_name handling in pandas compat
(#44963)
* [CI][Python][Packaging] Test 3.12 wheels on Ubuntu 24.04
(#45042)
* [CI][Packaging][Python] Simplify
dev/tasks/python-wheels/github.linux.yml (#45077)
* [Python] Honor the strings_to_categorical keyword in to_pandas
for string view type (#45176)
## Bug Fixes
* [C++][Python] Fix ORC crash when file contains unknown timezone
(#45051)
* [Python] Converting month_day_nano_interval to numpy crashes
* [Python] Allow from_buffers to work with StringView on Python
(#44701)
* [C++][Python] Fix Flight Timestamp precision, revert workaround
from #43537 (#44681)
* [Docs][Python] Add missing canonical extension types to PyArrow
arrays and datatypes docs (#44880)
* [Python] Trigger manual Garbage collection before checking
allocated bytes for dlpack tests (#44793)
* [Python][Packaging] Use delvewheel to repair Windows wheels
(#35323)
* [CI][Python] Fix and modernize AppVeyor build (#44999)
* [Python][Docs] Update docstrings for metadata methods on Field
and Schema classes (#45004)
* [CI][Python] Fix test_memory failures (#45007)
* [CI][Packaging][Python] Fix Docker push step for free-threaded
wheel builds (#45040)
* [Packaging][Python] Use ORC from vcpkg instead of bundled on
Linux and macOS (#45046)
- Release 18.1.0
## Bug Fixes
* [Release][Packaging][Python] Set PARQUET_TEST_DATA on
verify-release-candidate-wheels.bat (#44462)
## New Features and Improvements
- Release 18.0.0
## Bug Fixes
* [Python][Packaging] Bump MACOSX_DEPLOYMENT_TARGET to 12 instead
of 11 (#43137)
* [Release][Packaging][Python] Add tzdata as conda env
requirement to avoid ORC failure (#43233)
* [Python] Give precedence to pycapsule interface in
pa.schema(..) (#43486)
* [Python] Sanitize Python reference handling in UDF
implementation (#43557)
* [Python] Allow tuple for rename columns (#43609)
* [Packaging][Python] Fix vcpkg version detection in macOS wheel
build jobs (#43615)
* [Python] Fix compilation on Cython<3 (#43765)
* [Python][CI] Correct PARQUET_TEST_DATA path in wheel tests
(#43786)
* [CI][Packaging][Python] Avoid uploading wheel to gemfury if
version already exists (#43816)
* [CI][Python] Skip test that requires PARQUET_TEST_DATA env on
emscripten (#43906)
* [Python] Fix threading issues with borrowed refs and pandas
(#44047)
* [Benchmarking][Python] Avoid uwsgi install failure on macOS
(#44221)
* [CI][Release][Python] Do not verify Python on Ubuntu 20.04
(#44254)
* [CI][Python] Remove ds requirement from test collection on
test_dataset.py (#44370)
## New Features and Improvements
* [C++][Python] Native support for UUID (#37298)
* [C++][Python] Bool8 Extension Type Implementation (#43488)
* [Python] Make NumPy an optional runtime dependency (#41904)
* [Python] Add StructType attribute to access all its fields
(#43481)
* [CI][Python] Use pipx to install GCS testbench (#43852)
* [Python][CI][Packaging] Don't upload sdist to scientific-python
nightly channel (only wheels) (#43943)
* [Python][CI][Packaging] Upload nightly wheels to main label of
scientific-python-nightly-wheels channel (#43932)
* [CI][Packaging][Python] Upload pyarrow nightly wheels to
scientific python channel on Anaconda (#43862)
* [C++][Python][Parquet] Support reading/writing key-value
metadata from/to ColumnChunkMetaData (#41580)
* [Python] Ensure (Chunked)Array/RecordBatch/Table methods don't
crash with non-CPU data
* [Python] Let StructArray.from_array accept a type in addition
to names or fields (#43047)
* [Python] Test FlightStreamReader iterator (#42086)
* [Python] Add bindings for CopyTo on RecordBatch and Array
classes (#42223)
* [Python] Use Py_IsFinalizing from pythoncapi_compat.h (#43767)
* [Python] Add bindings for memory manager and device to Context
class (#43392)
* [C++][Python] Add Opaque canonical extension type (#43458)
* [Python] Deprecate passing build flags to setup.py (#43515)
* [Python][Packaging][CI] Drop Python 3.8 support (#43970)
* [Python][CI] Add Python 3.13 conda test build (#44192)
* [Python][CI][Packaging] Use released versions to build and test
wheels on Python 3.13 (#44193)
* [Python] Set up wheel building for Python 3.13 (#43539)
* [Python] Remove usage of deprecated pkg_resources in setup.py
(#43602)
* [Python][CI] Add a Crossbow job with the free-threaded build
(#43671)
* [Python] Do not use borrowed references APIs (#43540)
* [Python] Declare support for free-threading in Cython (#43606)
* [Python][CI] Add a Crossbow job with a debug CPython
interpreter (#43565)
* [Python][Dataset] Python / Cython interface to C++
arrow::dataset::Partitioning::Format (#43740)
* [Python][CI] Simplify python/requirements-wheel-test.txt file
(#43691)
* [Python] RecordBatch fails gracefully on non-cpu devices
(#43729)
* [Python] ChunkedArray fails gracefully on non-cpu devices
(#43795)
* [Python][Packaging] Remove numpy dependency from pyarrow
packaging (#44148)
* [Python] Build macOS and manylinux wheels for free-threading
(#43965)
* [Python] Table fails gracefully on non-cpu devices (#43974)
* [Python] Deprecate the no longer used serialize/deserialize
Pyarrow C++ functions (#44064)
* [CI][Python] Enable S3 testing on Windows wheel builds (#44093)
* [CI][Python] Enable S3 tests on macOS CI (#44129)
* [Packaging][Python] Use macOS 12 as deployment target to have
macOS 12 pyarrow wheels (#44315)
* [Packaging][Python] Disable interactive deb configuration in
wheel-manylinux--cp313t- (#44362)
- Drop pyarrow-pr433325-extradirs.patch
-------------------------------------------------------------------
Thu Sep 26 23:24:22 UTC 2024 - Guang Yee <gyee@suse.com>
- Enable sle15_python_module_pythons.
-------------------------------------------------------------------
Wed Aug 14 20:27:48 UTC 2024 - Ben Greiner <code@bnavigator.de>
- Update to 17.0.0
## Bug Fixes
* [C++][Python] Fix casting to extension type with fixed size
list storage type (#42219)
* [Python] Include metadata when creating pa.schema from
PyCapsule (#41538)
* [C++][Python] RecordBatch.filter() segfaults if passed a
ChunkedArray (#40971)
* [Python] pa.array: add check for byte-swapped numpy arrays
inside python objects (#41549)
* [Python] Fix read_table for encrypted parquet (#39438)
* [Python] RunEndEncodedArray.from_arrays: bugfix for Array
arguments (#40560) (#41093)
* [C++][Python] Map child Array constructed from keys and items
shouldn't have offset (#40871)
* [Python] `test_numpy_array_protocol` test failures with numpy
2.0.0rc1
* [Python] Fix StructArray.sort() for by=None (#41495)
* [Python] Build with Python 3.13 (#42034)
* [Python] remove special methods related to buffers in python
<2.6 (#41492)
* [Python] Fix reading column index with decimal values (#41503)
* [Docs][Python] Remove duplicate contents (#41588)
* [C++][Python] Add optional null_bitmap to MapArray::FromArrays
(#41757)
* [Python][Parquet] Implement to_dict method on SortingColumn
(#41704)
* [Python] CMake: ignore Parquet encryption option if Parquet
itself is not enabled (fix Java integration build) (#41776)
* [Python] Disallow direct pa.RecordBatchReader() construction to
avoid segfaults (#41773)
* [Python] Fix RecordBatchReader.cast to support casting to equal
schema for all types (#42098)
* [Python] Fix tests when using NumPy 2.0 on Windows (#42099)
* [CI][Python] Use pip install -e instead of setup.py build_ext
inplace for installing pyarrow on verification script (#42007)
* [CI][Python][C++] Fix utf8proc detection for wheel on Windows
(#42022)
* [Python][CI] Update expected output for numpy 2.0.0 (#42172)
## New Features and Improvements
* [Python] Replace pandas.util.testing.rands with vendored
version (#42089)
* [Python] begin moving static settings to pyproject.toml
(#41041)
* [Python] Implement PyCapsule interface for Device data in
PyArrow (#40717)
* [Python] Expand the Arrow PyCapsule Interface with C Device
Data support (#40708)
* [Python] Let RecordBatch.filter accept a boolean expression in
addition to mask array (#43043)
* [Python] Fix pickling of LocalFileSystem for cython 2 (#41459)
* [Python] Expand the C Device Interface bindings to support
import on CUDA device (#40385)
* [Python] Allow passing a mapping of column names to
rename_columns (#40645)
* [Python][Packaging] Strip unnecessary symbols when building
wheels (#42028)
* [Python][Docs] Update PyArrow installation docs for conda
package split (#41135)
* [Python] Basic bindings for Device and MemoryManager classes
(#41685)
* [C++][Python] Expose recursive flatten for lists on
list_flatten kernel function and pyarrow bindings (#41295)
* [Python][Packaging] Ensure to build with released numpy 2.0
(instead of RC) in the wheel building workflows (#42194)
* [CI][Python] Add a job on ARM64 macOS (#41313)
* [CI][Python] Reduce CI time on macOS (#41378)
* [Python] Expose byte_width and bit_width of ExtensionType in
terms of the storage type (#41413)
* [Python] Update Python development guide about components being
enabled by default based on Arrow C++ (#41705)
* [Python] Building PyArrow: enable/disable python components by
default based on availability in Arrow C++ (#41494)
* [C++][Python] Extends the add_key_value to parquet::arrow and
PyArrow (#41633)
* [Python] Ensure Buffer methods don't crash with non-CPU data
(#41889)
* [C++][Python] PrettyPrint non-cpu data by copying to default
CPU device (#42010)
* [Python][Parquet] Update BYTE_STREAM_SPLIT description in
write_table() docstring (#41759)
* [Python] Add support for Pyodide (#37822)
* [Python] Fix pandas tests to follow downstream datetime64 unit
changes (#41979)
* [Python] Allow Array.filter() to take general array input
(#42051)
* [Python] Expose new FLOAT16 logical type in the pyarrow.parquet
bindings (#42103)
* [Python] Array gracefully fails on non-cpu device (#42113)
* [Python][Parquet] Pyarrow store decimal as integer (#42169)
* [Python] Add CI job for Numpy 1.X (#42189)
* [CI][Python] Pin openjdk=17 in python substrait integration
(#43051)
- Drop pyarrow-pr41319-numpy2-tests.patch
- Add pyarrow-pr433325-extradirs.patch gh#apache/arrow/pull/43325
-------------------------------------------------------------------
Thu Apr 25 08:58:22 UTC 2024 - Ben Greiner <code@bnavigator.de>
- Update to 16.0.0
## Bug Fixes
* [Python] construct pandas.DataFrame with public API in
to_pandas (#40897)
* [Python] Fix ORC test segfault in the python wheel windows test
(#40609)
* [Python] Attach Python stacktrace to errors in ConvertPyError
(#39380)
* [Python] Plug reference leaks when creating Arrow array from
Python list of dicts (#40412)
* [Python] Empty slicing an array backwards beyond the start is
now empty (#40682)
* [Python] Slicing an array backwards beyond the start now
includes first item. (#39240)
* [Python] Calling
pyarrow.dataset.ParquetFileFormat.make_write_options as a class
method results in a segfault (#40976)
* [Python] Fix parquet import in encryption test (#40505)
* [Python] fix raising ValueError on _ensure_partitioning
(#39593)
* [Python] Validate max_chunksize in Table.to_batches (#39796)
* [C++][Python] Fix test_gdb failures on 32-bit (#40293)
* [Python] Make Tensor.__getbuffer__ work on 32-bit platforms
(#40294)
* [Python] Avoid using np.take in Array.to_numpy() (#40295)
* [Python][C++] Fix large file handling on 32-bit Python build
(#40176)
* [Python] Update size assumptions for 32-bit platforms (#40165)
* [Python] Fix OverflowError in foreign_buffer on 32-bit
platforms (#40158)
* [Python] Add Type_FIXED_SIZE_LIST to _NESTED_TYPES set (#40172)
* [Python] Mark ListView as a nested type (#40265)
* [Python] only allocate the ScalarMemoTable when used (#40565)
* [Python] Error compiling Cython files on Windows during release
verification
* [Python] Fix flake8 failures in python/benchmarks/parquet.py
(#40440)
* [Python] Suppress python/examples/minimal_build/Dockerfile.*
warnings (#40444)
* [Python][Docs] Add workaround for autosummary (#40739)
* [Python] BUG: Empty slicing an array backwards beyond the start
should be empty
* [CI][Python] Activate ARROW_PYTHON_VENV if defined in
sdist-test job (#40707)
* [CI][Python] CI failures on Python builds due to pytest_cython
(#40975)
* [Python] ListView pandas tests should use np.nan instead of
None (#41040)
* [C++][Python] Sporadic asof_join failures in PyArrow
## New Features and Improvements
* [Python][CI] Remove legacy hdfs tests from hdfs and hypothesis
setup (#40363)
* [Python] Remove deprecated pyarrow.filesystem legacy
implementations (#39825)
* [C++][Python] Add missing methods to RecordBatch (#39506)
* [Python][CI] Support ORC in Windows wheels
* [Python] Correct test marker for join_asof tests (#40666)
* [Python] Add join_asof binding (#34234)
* [Python] Add a function to download and extract timezone
database on Windows (#38179)
* [Python][CI][Packaging] Enable ORC on Windows Appveyor CI and
Windows wheels for pyarrow
* [Python] Add a FixedSizeTensorScalar class (#37533)
* [Python][CI][Dev][Python] Release and merge script errors
(#37819) (#40150)
* [Python] Construct pyarrow.Field and ChunkedArray through Arrow
PyCapsule Protocol (#40818)
* [Python] Fix missing byte_width attribute on DataType class
(#39592)
* [Python] Compatibility with NumPy 2.0
* [Packaging][Python] Enable building pyarrow against numpy 2.0
(#39557)
* [Python] Basic pyarrow bindings for Binary/StringView classes
(#39652)
* [Python] Expose force_virtual_addressing in PyArrow (#39819)
* [Python][Parquet] Support hashing for FileMetaData and
ParquetSchema (#39781)
* [Python] Add bindings for ListView and LargeListView (#39813)
* [Python][Packaging] Build pyarrow wheels with numpy RC instead
of nightly (#41097)
* [Python] Support creating Binary/StringView arrays from python
objects (#39853)
* [Python] ListView support for pa.array() (#40160)
* [Python][CI] Remove upper pin on pytest (#40487)
* [Python][FS][Azure] Minimal Python bindings for AzureFileSystem
(#40021)
* [Python] Low-level bindings for exporting/importing the C
Device Interface (#39980)
* [Python] Add ChunkedArray import/export to/from C (#39985)
* [Python] Use Cast() instead of CastTo (#40116)
* [C++][Python] Basic conversion of RecordBatch to Arrow Tensor
(#40064)
* [C++][Python] Basic conversion of RecordBatch to Arrow Tensor -
add support for different data types (#40359)
* [C++][Python] Basic conversion of RecordBatch to Arrow Tensor -
add option to cast NULL to NaN (#40803)
* [Python] Support requested_schema in __arrow_c_stream__()
(#40070)
* [Python] Support Binary/StringView conversion to numpy/pandas
(#40093)
* [Python] Allow FileInfo instances to be passed to dataset init
(#40143)
* [Python][CI] Add 32-bit Debian build on Crossbow (#40164)
* [Python] ListView arrow-to-pandas conversion (#40482)
* [Python][CI] Disable generating C lines in Cython tracebacks
(#40225)
* [Python] Support construction of Run-End Encoded arrays in
pa.array(..) (#40341)
* [Python] Accept dict in pyarrow.record_batch() function
(#40292)
* [Python] Update for NumPy 2.0 ABI change in
PyArray_Descr->elsize (#40418)
* [Python][CI] Fix install of nightly dask in integration tests
(#40378)
* [Python] Fix byte_width for binary(0) + fix hypothesis tests
(#40381)
* [Python][CI] Fix dataset partition filter tests with pandas
nightly (#40429)
* [Docs][Python] Added JsonFileFormat to docs (#40585)
* [Dev][C++][Python][R] Use pre-commit for clang-format (#40587)
* [Python][C++] Support conversion of pyarrow.RunEndEncodedArray
to numpy/pandas (#40661)
* [Python] Simplify and improve perf of creation of the column
names in Table.to_pandas (#40721)
* [Docs][C++][Python] Add initial documentation for
RecordBatch::Tensor conversion (#40842)
* [C++][Python] Basic conversion of RecordBatch to Arrow Tensor -
add support for row-major (#40867)
* [CI][Python] check message in test_make_write_options_error for
Cython 2 (#41059)
* [Python] Add copy keyword in Array.__array__ for numpy 2.0+
compatibility (#41071)
* [Python][Packaging] PyArrow wheel building is failing because
of disabled vcpkg install of liblzma
- Drop apache-arrow-pr40230-glog-0.7.patch
- Drop apache-arrow-pr40275-glog-0.7-2.patch
- Add pyarrow-pr41319-numpy2-tests.patch gh#apache/arrow#41319
-------------------------------------------------------------------
Sat Mar 23 15:23:23 UTC 2024 - Ben Greiner <code@bnavigator.de>
- Update to 15.0.2
## Bug Fixes
* [Python] Fix except clauses (#40387)
* [Python][CI] Skip failing test_dateutil_tzinfo_to_string
(#40486)
-------------------------------------------------------------------
Wed Feb 28 12:12:36 UTC 2024 - Ben Greiner <code@bnavigator.de>
- Move to science/apache-arrow as multibuild package
- Also needs the cpp GLOG patches
* Add apache-arrow-pr40230-glog-0.7.patch
* Add apache-arrow-pr40275-glog-0.7-2.patch
-------------------------------------------------------------------
Fri Feb 23 17:35:37 UTC 2024 - Ben Greiner <code@bnavigator.de>
- Update to 15.0.1
## Bug Fixes
* [Python] Fix race condition in _pandas_api#_check_import
(#39314)
* [Python] Avoid leaking references to Numpy dtypes (#39636)
* [Release] Update platform tags for macOS wheels to macosx_10_15
(#39657)
* [Python][CI] Fix test failures with latest/nightly pandas
(#39760)
* [C#] Restore support for .NET 4.6.2 (#40008)
* [Python] Make capsule name check more lenient (#39977)
* [Python][FlightRPC] Release GIL in GeneratorStream (#40005)
## New Features and Improvements
* [Python] Remove the use of pytest-lazy-fixture (#39850)
* [Python][CI] Pin moto<5 for dask integration tests (#39881)
* [Python] Fix tests for pandas with CoW / nightly integration
tests (#40000)
- Release 15.0.0
## Bug Fixes
* [C++][Python] Add a no-op kernel for
dictionary_encode(dictionary) (#38349)
* [Python] Fix S3FileSystem equals None segfault (#39276)
* Fix TestArrowReaderAdHoc.ReadFloat16Files to use new
uncompressed files (#38825)
* [Python] Fix spelling (#38945)
* [CI][Python] Update pandas tests failing on pandas nightly CI
build (#39498)
* [CI][JS] Force node 20 on JS build on arm64 to fix build issues
(#39499)
## New Features and Improvements
* [C++][Python] Add "Z" to the end of timestamp print string when
tz defined (#39272)
* [Python] Remove the legacy ParquetDataset custom python-based
implementation (#39112)
* [Python] add Table.to/from_struct_array (#38520)
* [C++][Python] DLPack implementation for Arrow Arrays (producer)
(#38472)
* [Python] FixedSizeListArray.from_arrays supports mask parameter
(#39396)
* [C++][Python][R] Allow users to adjust S3 log level by
environment variable (#38267)
* [Python] Expose Parquet sorting metadata (#37665)
* [C++][Python][Parquet] Implement Float16 logical type (#36073)
* [Python] Make CacheOptions configurable from Python (#36627)
* [Python][Parquet] Parquet Support write and validate Page CRC
(#38360)
* [Python][Dataset] Expose file size to python dataset (#37868)
* [R] Allow code() to return package name prefix. (#38144)
* [Python] Remove usage of pandas internals DatetimeTZBlock
(#38321)
* Add validation logic for offsets and values to
arrow.array.ListArray.fromArrays (#38531)
* [Python][Compute] Describe strptime format semantics (#38665)
* [Python] Remove dead code in _reconstruct_block (#38714)
* [Python] Fix append mode for cython 2 (#39027)
* [Python] Add append mode for pyarrow.OsFile (#38820)
* [Python] Extract libparquet requirements out of
libarrow_python.so to new libarrow_python_parquet_encryption.so
(#39316)
* Create module info compiler plugin (#39135)
* [Python] RecordBatchReader.from_stream constructor for objects
implementing the Arrow PyCapsule protocol (#39218)
* [Python] Pass in type to MapType.from_arrays (#39516)
* [Python][CI] Skip failing dask tests: test_describe_empty and
test_view (#39534)
* [Python] NumPy 2.0 compat: remove usage of np.core (#39535)
* [Packaging][Python] Add a numpy<2 pin to the install
requirements for the 15.x release branch (#39538)
-------------------------------------------------------------------
Mon Jan 15 20:42:25 UTC 2024 - Ben Greiner <code@bnavigator.de>
- Update to 14.0.2
## New Features and Improvements
* GH-38342 - [Python] Update to_pandas to use non-deprecated
DataFrame constructor (#38374)
* GH-38364 - [Python] Initialize S3 on first use (#38375)
## Bug Fixes
* GH-38345 - [Release] Use local test data for verification if
possible (#38362)
* GH-38577 - Reading parquet file behavior change from 13.0.0 to
14.0.0
* GH-38626 - [Python] Fix segfault when PyArrow is imported at
shutdown (#38637)
* GH-38676 - [Python] Fix potential deadlock when CSV reading
errors out (#38713)
* GH-38984 - [Python][Packaging] Verification of wheels on
AlmaLinux 8 are failing due to missing pip (#38985)
* GH-39074 - [Release][Packaging] Use UTF-8 explicitly for KEYS
(#39082)
-------------------------------------------------------------------
Tue Nov 14 23:29:03 UTC 2023 - Ondřej Súkup <mimi.vx@gmail.com>
- Fix cve in changelog
-------------------------------------------------------------------
Tue Nov 14 09:28:23 UTC 2023 - Ondřej Súkup <mimi.vx@gmail.com>
- Update to 14.0.1
- Drop pyarrow-pr37481-pandas2.1.patch
- Fixes boo#1216991 CVE-2023-47248
* GH-38431 - [Python][CI] Update fs.type_name checks for s3fs tests
* GH-38607 - [Python] Disable PyExtensionType autoload
- Update to 14.0.0
* very long list of changes can be found here:
https://arrow.apache.org/release/14.0.0.html
-------------------------------------------------------------------
Thu Aug 31 18:43:55 UTC 2023 - Ben Greiner <code@bnavigator.de>
- Update to 13.0.0
## Compatibility notes:
* The default format version for Parquet has been bumped from 2.4
to 2.6 GH-35746. In practice, this means that nanosecond
timestamps now preserve their resolution instead of being
converted to microseconds.
* Support for Python 3.7 is dropped GH-34788
## New features:
* Conversion to non-nano datetime64 for pandas >= 2.0 is now
supported GH-33321
* Write page index is now supported GH-36284
* Bindings for reading JSON format in Dataset are added GH-34216
* keys_sorted property of MapType is now exposed GH-35112
## Other improvements:
* Common python functionality between Table and RecordBatch
classes has been consolidated (GH-36129, GH-35415, GH-35390,
GH-34979, GH-34868, GH-31868)
* Some functionality for FixedShapeTensorType has been improved
(__reduce__ GH-36038, picklability GH-35599)
* Pyarrow scalars can now be accepted in the array constructor
GH-21761
* DataFrame Interchange Protocol implementation and usage is now
documented GH-33980
* Conversion between Arrow and Pandas for map/pydict now has
enhanced support GH-34729
* Usability of pc.map_lookup / MapLookupOptions is improved
GH-36045
* zero_copy_only keyword can now also be accepted in
ChunkedArray.to_numpy() GH-34787
* Python C++ codebase now has linter support in Archery and the
CI GH-35485
## Relevant bug fixes:
* __array__ numpy conversion for Table and RecordBatch is now
corrected so that np.asarray(pa.Table) doesn't return a
transposed result GH-34886
* parquet.write_to_dataset doesn't create empty files for
non-observed dictionary (category) values anymore GH-23870
* Dataset writer now also correctly follows default Parquet
version of 2.6 GH-36537
* Comparing pyarrow.dataset.Partitioning with other type is now
correctly handled GH-36659
* Pickling of pyarrow.dataset PartitioningFactory objects is now
supported GH-34884
* None schema is now disallowed in parquet writer GH-35858
* pa.FixedShapeTensorArray.to_numpy_ndarray no longer fails on
sliced arrays GH-35573
* Halffloat type is now supported in the conversion from Arrow
list to pandas GH-36168
* __from_arrow__ is now also implemented for Array.to_pandas for
pandas extension data types GH-36096
- Add pyarrow-pr37481-pandas2.1.patch gh#apache/arrow#37481
-------------------------------------------------------------------
Fri Aug 25 12:52:17 UTC 2023 - Ben Greiner <code@bnavigator.de>
- Limit to Cython < 3
-------------------------------------------------------------------
Mon Jun 12 12:22:31 UTC 2023 - Ben Greiner <code@bnavigator.de>
- Update to 12.0.1
## Bug Fixes
* [GH-35389] - [Python] Fix coalesce_keys=False option in join
operation (#35505)
* [GH-35821] - [Python][CI] Skip extension type test failing with
pandas 2.0.2 (#35822)
* [GH-35845] - [CI][Python] Fix usage of assert_frame_equal in
test_hdfs.py (#35842)
## New Features and Improvements
* [GH-35329] - [Python] Address pandas.types.is_sparse deprecation
(#35366)
- Drop pyarrow-pr35822-pandas2-extensiontype.patch
-------------------------------------------------------------------
Wed Jun 7 07:39:44 UTC 2023 - Ben Greiner <code@bnavigator.de>
- Skip invalid pandas 2 test
* pyarrow-pr35822-pandas2-extensiontype.patch
* gh#apache/arrow#35822
* gh#apache/arrow#35839
-------------------------------------------------------------------
Thu May 18 07:28:28 UTC 2023 - Ben Greiner <code@bnavigator.de>
- Update to 12.0.0
## Compatibility notes:
* Plasma has been removed in this release (GH-33243). In
addition, the deprecated serialization module in PyArrow was
also removed (GH-29705). IPC (Inter-Process Communication)
functionality of pyarrow or the standard library pickle should
be used instead.
* The deprecated use_async keyword has been removed from the
dataset module (GH-30774)
* Minimum Cython version to build PyArrow from source has been
raised to 0.29.31 (GH-34933). In addition, PyArrow can now be
compiled using Cython 3 (GH-34564).
## New features:
* A new pyarrow.acero module with initial bindings for the Acero
execution engine has been added (GH-33976)
* A new canonical extension type for fixed shaped tensor data has
been defined. This is exposed in PyArrow as the
FixedShapeTensorType (GH-34882, GH-34956)
* Run-End Encoded arrays binding has been implemented (GH-34686,
GH-34568)
* Method is_nan has been added to Array, ChunkedArray and
Expression (GH-34154)
* Dataframe interchange protocol has been implemented for
RecordBatch (GH-33926)
## Other improvements:
* Extension arrays can now be concatenated (GH-31868)
* get_partition_keys helper function is implemented in the
dataset module to access the partitioning field's key/value
from the partition expression of a certain dataset fragment
(GH-33825)
* PyArrow Array objects can now be accepted by the pa.array()
constructor (GH-34411)
* The default row group size when writing parquet files has been
changed (GH-34280)
* RecordBatch has the select() method implemented (GH-34359)
* New method drop_column on the pyarrow.Table supports passing a
single column as a string (GH-33377)
* User-defined tabular functions, which are user functions
implemented in Python that return a stateful stream of tabular
data, are now also supported (GH-32916)
* Arrow Archery tool now includes linting of the Cython files
(GH-31905)
* Breaking Change: Reorder output fields of "group_by" node so
that keys/segment keys come before aggregates (GH-33616)
## Relevant bug fixes:
* Acero can now detect and raise an error in case a join
operation needs too many bytes of key data (GH-34474)
* Fix for converting non-sequence object in pa.array() (GH-34944)
* Fix erroneous table conversion to pandas if table includes an
extension array that does not implement to_pandas_dtype
(GH-34906)
* Reading from a closed ArrayStreamBatchReader now returns
invalid status instead of segfaulting (GH-34165)
* array() now returns pyarrow.Array and not pyarrow.ChunkedArray
for columns with __arrow_array__ method and only one chunk so
that the conversion of pandas dataframe with categorical column
of dtype string[pyarrow] does not fail (GH-33727)
* Custom type mapper in to_pandas now converts index dtypes
together with column dtypes (GH-34283)
-------------------------------------------------------------------
Wed Mar 29 13:25:55 UTC 2023 - Ben Greiner <code@bnavigator.de>
- Fix tests expecting the jemalloc backend which was disabled in
the apache-arrow package
-------------------------------------------------------------------
Sun Mar 12 05:31:32 UTC 2023 - Ben Greiner <code@bnavigator.de>
- Update to v11.0.0
* [Python][Doc] Add five more numpydoc checks to CI (#15214)
* [Python][CI][Doc] Enable numpydoc check PR03 (#13983)
* [Python] Expose flag to enable/disable storing Arrow schema in Parquet metadata (#13000)
* [Python] Add support for reading record batch custom metadata API (#13041)
* [Python] Add lazy Dataset.filter() method (#13409)
* [Python] ParquetDataset to still take legacy code path when old filesystem is passed (#15269)
* [Python] Switch default and deprecate use_legacy_dataset=True in ParquetDataset (#14052)
* [Python] Support lazy Dataset.filter
* [Python] Order of columns in pyarrow.feather.read_table (#14528)
* [Python] Construct MapArray from sequence of dicts (instead of list of tuples) (#14547)
* [Python] Unify CMakeLists.txt in python/ (#14925)
* [C++][Python] Implement list_slice kernel (#14395)
* [C++][Python] Enable struct_field kernel to accept string field names (#14495)
* [Python][C++] Add use_threads to run_substrait_query
* [Python][Docs] adding info about TableGroupBy.aggregation with empty list (#14482)
* [Python] DataFrame Interchange Protocol for pyarrow Table
* [Python] Drop older versions of Pandas (<1.0) (#14631)
* [Python] Pass Cmake args to Python CPP
* [Docs][Python] Improve docs for S3FileSystem (#14599)
* [Python] Add missing value accessor to temporal types (#14746)
* [Python] Expose time32/time64 scalar values (#14637)
* [Python] Remove gcc 4.9 compatibility code (#14602)
* [C++][Python] Support slicing to end in list_slice kernel (#14749)
* [C++][Python] Support step >= 1 in list_slice kernel (#14696)
* [Release][Python] Upload .wheel/.tar.gz for release not RC (#14708)
* [Python] Expose Scalar.validate() (#15149)
* [Python] PyArrow C++ header files no longer always included in installed pyarrow (#14656)
* [Doc][Python] Update note about bundling Arrow C++ on Windows (#14660)
* [Python] Reduce warnings during tests (#14729)
* [Python] Expose reading a schema from an IPC message (#14831)
* [Python] Expose QuotingStyle to Python (#14722)
* [Python] Add (Chunked)Array sort() method (#14781)
* [Python] Dataset.sort_by (#14976)
* [Python] Avoid dependency on exec plan in Table.sort_by to fix minimal tests (#15268)
* [Python] Remove auto generated pyarrow_api.h and pyarrow_lib.h (#15219)
* [Python] Error if datetime.timedelta to pyarrow.duration conversion overflows (#13718)
* [Python] to_pandas fails with FixedOffset timezones when timestamp_as_object is used (#14448)
* [Python] Pass **kwargs in read_feather to to_pandas() (#14492)
* [Python] Add python test for decimals to csv (#14525)
* [Python] Test that reading of timedelta is stable (read_feather/to_pandas) (#14531)
* [C++][Python] Improve s3fs error message when wrong region (#14601)
* [Python][C++] Adding support for IpcWriteOptions to the dataset ipc file writer (#14414)
* [Python] Support passing create_dir thru pq.write_to_dataset (#14459)
* [CI][Python] Fix pandas master/nightly build failure related to timedelta (#14460)
* [Python] Fix writing files with multi-byte characters in file name (#14764)
* [Python] Handle pytest 8 deprecations about pytest.warns(None)
* [Python] Remove ARROW_BUILD_DIR in building pyarrow C++ (#14498)
* [Python] Honor default memory pool in Dataset scanning (#14516)
* [Python] Fully support filesystem in parquet.write_metadata (#14574)
* [Python] Check schema argument type in RecordBatchReader.from_batches (#14583)
* [Python][Docs] PyArrow table join docstring typos for left and right suffix arguments (#14591)
* [Python] pass back time types with correct type class (#14633)
* [Python] Support filesystem parameter in ParquetFile (#14717)
* [Python][Docs] Add missing CMAKE_PREFIX_PATH to allow setup.py CMake invocations to find Arrow CMake package (#14586)
* [Python][CI] Add DYLD_LIBRARY_PATH to avoid requiring PYARROW_BUNDLE_ARROW_CPP on macOS job (#14643)
* [Python] Don't crash when schema=None in FlightClient.do_put (#14698)
* [Python] Change warnings to _warnings in _plasma_store_entry_point (#14695)
* [CI][Python] Update nightly test-conda-python-3.7-pandas-0.24 to pandas >= 1.0 (#14714)
* [CI][Python] Update spark test modules to match spark master (#14715)
* [Python] Fix test_s3fs_wrong_region; set anonymous=True (#14716)
* [Python][CI] Fix nightly job using pandas dev (temporarily skip tests) (#15048)
* [Python] Quadratic memory usage of Table.to_pandas with nested data
* [Python] Fix pyarrow.get_libraries() order (#14944)
* [Python] Fix segfault for dataset ORC write (#15049)
* [Python][Docs] Update docstring for pyarrow.decompress (#15061)
* [Python][CI] Dask nightly tests are failing due to fsspec bug (#15065)
* [C++][Python][FlightRPC] Make DoAction truly streaming (#15118)
* [Benchmarking][Python] Set ARROW_INSTALL_NAME_RPATH=ON for benchmark builds (#15123)
* [Python][macOS] Use `@rpath` for libarrow_python.dylib (#15143)
* [Python] Docstring test failure (#15186)
* [Python] Don't use target_include_directories() for imported target (#33606)
* [Python] Make CSV cancellation test more robust
* [Python][CI] Python sdist installation fails with latest setuptools 58.5
* [Python] Missing bindings for existing_data_behavior makes it impossible to maintain old behavior
* [Python] update trove classifiers to include Python 3.10
* [Release][Python] Use python -m pytest
* [Python][C++] Non-deterministic segfault in "AMD64 MacOS 10.15 Python 3.7" build
* [Python][Doc] Clarify what should be expected if read_table is passed an empty list of columns
* [Python][Packaging] Set deployment target to 10.13 for universal2 wheels
* [Python] Fix crash in take/filter of empty ExtensionArray
* [Python] Move marks from fixtures to individual tests/params
* [Python][CI] Requiring s3fs >= 2021.8
* [Python] Allow writing datasets using a partitioning that only specifies field_names
* [Python] Table.from_arrays should raise an error when array is empty but names is not
* [Python][Packaging] Pin minimum setuptools version for the macos wheels
* [Python][Doc] Document nullable dtypes handling and usage of types_mapper in to_pandas conversion
* [C++][Python] Fix unique/value_counts on empty dictionary arrays
* [Python][CI] Fix tests using OrcFileFormat for Python 3.6 + orc not built
* [Python] Fix FlightClient.do_action
* [Python][Docs] Fix usage of sync scanner in dataset writing docs
* [Packaging][Python] Python 3.9 installation fails in macOS wheel build
* [CI][Python] Fix Spark integration failures
* [Python] Fix version constraints in pyproject.toml
* [Packaging][Python] Disable windows wheel testing for python 3.6
* [Python][C++] Segfault with read_json when a field is missing
* [Python] Support for set/list columns when converting from Pandas
* [Python] Support converting nested sets when converting to arrow
* [Python] Make filesystems compatible with fsspec
* [C++][Python][R] Consolidate coalesce/fill_null
* [Python][Doc] Document the fsspec wrapper for pyarrow.fs filesystems
* [Python] Support core-site.xml default filesystem.
* [Python] Improve HadoopFileSystem docstring
* [Python][Doc] Document missing pandas to arrow conversions
* [Python] Make SubTreeFileSystem print method more informative
* [Doc][Python] Improve documentation regarding dealing with memory mapped files
* [C++][Python] Implement a new scalar function: list_element
* [Python] Allow creating RecordBatch from Python dict
* [Python] Update HadoopFileSystem docs to clarify setting CLASSPATH env variable is required
* [Python] Improve documentation on what 'use_threads' does in 'read_feather'
* [C++][Python] Improve consistency of explicit C++ types in PyArrow files
* [Doc][Python] Improve PyArrow documentation for new users
* [C++][Python] Add CSV convert option to change decimal point
* [Python][Packaging] Build M1 wheels for python 3.8
* [Release][Python] Verify python 3.8 macOS arm64 wheel
* [Doc][Python] Switch ipc/io doc to use context managers
* [Python] Mention alternative deprecation message for ParquetDataset.partitions
* [C++][Python] Implement ExtensionScalar
* [Packaging][Python] Skip test_cancellation test case on M1
* [Python][FlightRPC] pyarrow client do_put close method after write_table did not throw flight error
* [Packaging][Python] Define --with-lg-page for jemalloc in the arm manylinux builds
* [Python] Fix docstrings
* [Python] Expose copy_files in pyarrow.fs
* [Doc][Python] Add a recipe on how to save partitioned datasets to the Cookbook
* [Python] Update deprecated pytest yield_fixture functions
* [Python] Support for MapType with Fields
* [Python][Docs] Improve filesystem documentation
* [Python] Add dataset mark to test_parquet_dataset_deprecated_properties
* [Python] Preview data when printing tables
* [C++][Python] Column projection pushdown for ORC dataset reading + use liborc for column selection
* [C++][Python] Add support for new MonthDayNano Interval Type
* [Doc][Python] Add documentation for unify_schemas
* [C++][Python] Implement C data interface support for extension types
* [Python] Allow more than numpy.array as masks when creating arrays
* [Python] Correct TimestampScalar.as_py() and DurationScalar.as_py() docstrings
* [Python] Migrate Python ORC bindings to use new Result-based APIs
* [Python] Support tuples in unify_schemas
* [C++][Python] Not providing a sort_key in the "select_k_unstable" kernel crashes
* [C++][Python] Support cast of naive timestamps to strings
* [Python] Update kernel categories in compute doc to match C++
* [C++][Python][R] Implement count distinct kernel
* [Python] Allow unsigned integer index type in dictionary() type factory function
* [Python] Missing Python tests for compute kernels
* [Python][CI] Add support for python 3.10
* [C++][Python] Improve error message when trying to use SyncScanner when requiring async
* [Python] Extend CompressedInputStream to work with paths, strings and files
* [Packaging][Python] Enable NEON SIMD optimization for M1 wheels
* [C++][Python] Use std::move() explicitly for g++ 4.8.5
* [Python][Packaging] Use numpy 1.21.3 to build python 3.10 wheels for macOS and windows
- Build via PEP517
-------------------------------------------------------------------
Mon Aug 22 07:06:44 UTC 2022 - John Vandenberg <jayvdb@gmail.com>
- Update to v9.0.0
-------------------------------------------------------------------
Mon Jan 21 03:51:32 UTC 2019 - Todd R <toddrme2178@gmail.com>
- Initial version for v0.13.0