Compare commits

25 Commits

Author SHA256 Message Date
29b55664af Accepting request 1285645 from science
- Update to 20.0.0
  ## Bug Fixes
  * GH-30302 - [C++][Parquet] Preserve the bitwidth of integer
    dictionary indices on round-trip to Parquet (#45685)
  * GH-31992 - [C++][Parquet] Handling the special case when
    DataPageV2 values buffer is empty (#45252)
  * GH-37630 - [C++][Python][Dataset] Allow disabling fragment
    metadata caching (#45330)
  * GH-39023 - [C++][CMake] Add missing launcher path conversion
    for ExternalPackage (#45349)
  * GH-43057 - [C++] Thread-safe AesEncryptor / AesDecryptor
    (#44990)
  * GH-45048 - [C++][Parquet] Deprecate unused chunk_size parameter
    in parquet::arrow::FileWriter::NewRowGroup() (#45088)
  * GH-45129 - [Python][C++] Fix usage of deprecated C++
    functionality on pyarrow (#45189)
  * GH-45132 - [C++][Gandiva] Update LLVM to 18.1 (#45114)
  * GH-45185 - [C++][Parquet] Raise an error for invalid repetition
    levels when delimiting records (#45186)
  * GH-45254 - [C++][Acero] Fix the row offset truncation in row
    table merge (#45255)
  * GH-45266 - [C++][Acero] Fix the running tasks count of
    Scheduler when get error tasks in multi-threads (#45268)
  * GH-45270 - [C++][CI] Disable mimalloc on Valgrind builds
    (#45271)
  * GH-45301 - [C++] Change PrimitiveArray ctor to protected
    (#45444)
  * GH-45334 - [C++][Acero] Fix swiss join overflow issues in row
    offset calculation for fixed length and null masks (#45336)
  * GH-45362 - [C++] Fix identity cast for time and list scalar

OBS-URL: https://build.opensuse.org/request/show/1285645
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/apache-arrow?expand=0&rev=20
2025-06-14 14:17:55 +00:00
1aba9e9712 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=57 2025-06-13 18:46:54 +00:00
8697b15a63 .
OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=56
2025-06-13 18:39:08 +00:00
853a205aac - Update to 20.0.0
## Bug Fixes
  * GH-30302 - [C++][Parquet] Preserve the bitwidth of integer
    dictionary indices on round-trip to Parquet (#45685)
  * GH-31992 - [C++][Parquet] Handling the special case when
    DataPageV2 values buffer is empty (#45252)
  * GH-37630 - [C++][Python][Dataset] Allow disabling fragment
    metadata caching (#45330)
  * GH-39023 - [C++][CMake] Add missing launcher path conversion
    for ExternalPackage (#45349)
  * GH-43057 - [C++] Thread-safe AesEncryptor / AesDecryptor
    (#44990)
  * GH-45048 - [C++][Parquet] Deprecate unused chunk_size parameter
    in parquet::arrow::FileWriter::NewRowGroup() (#45088)
  * GH-45129 - [Python][C++] Fix usage of deprecated C++
    functionality on pyarrow (#45189)
  * GH-45132 - [C++][Gandiva] Update LLVM to 18.1 (#45114)
  * GH-45185 - [C++][Parquet] Raise an error for invalid repetition
    levels when delimiting records (#45186)
  * GH-45254 - [C++][Acero] Fix the row offset truncation in row
    table merge (#45255)
  * GH-45266 - [C++][Acero] Fix the running tasks count of
    Scheduler when get error tasks in multi-threads (#45268)
  * GH-45270 - [C++][CI] Disable mimalloc on Valgrind builds
    (#45271)
  * GH-45301 - [C++] Change PrimitiveArray ctor to protected
    (#45444)
  * GH-45334 - [C++][Acero] Fix swiss join overflow issues in row
    offset calculation for fixed length and null masks (#45336)
  * GH-45362 - [C++] Fix identity cast for time and list scalar

OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=55
2025-06-13 18:31:56 +00:00
d35123d1c1 Accepting request 1271193 from science
- to fix cmake-4 build problems, upgrade bundled mimalloc from
  2.0.6 to 2.0.9 and add apache-arrow-19.0.1-mimalloc-version.patch;
  mimalloc changes according to readme.md:
  * 2.0.9:
    - Supports building with asan and improved [Valgrind] support.
    - Support arbitrarily large alignments, in particular for
      `std::pmr` pools.
    - Added C++ STL allocators attached to a specific heap.
    - Heap walks now visit all objects (including huge objects).
    - Support Windows nano server containers.
    - Various small bug fixes.
  * 2.0.7:
    - Initial support for [Valgrind] for leak testing and heap
      block overflow detection.
    - Initial support for attaching heaps to a specific memory area.
    - Fix `realloc` behavior for zero size blocks.
    - Remove restriction to integral multiple of the alignment in
      `alloc_align`.
    - Improved aligned allocation performance.
    - Reduced contention with many threads on few processors.
    - VS2022 support.
    - Support `pkg-config`.

OBS-URL: https://build.opensuse.org/request/show/1271193
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/apache-arrow?expand=0&rev=19
2025-04-22 15:28:04 +00:00
a03ab640dd OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=53 2025-04-21 16:33:45 +00:00
12b0bf8517 Accepting request 1271189 from home:hsk17:branches:home:simotek:cmake4b
changes to fix cmake-4 build problems

OBS-URL: https://build.opensuse.org/request/show/1271189
OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=52
2025-04-21 16:30:53 +00:00
4d14f521d8 Accepting request 1264972 from science
- Re-enable flight, grpc has been fixed boo#1237422

OBS-URL: https://build.opensuse.org/request/show/1264972
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/apache-arrow?expand=0&rev=18
2025-04-02 19:05:38 +00:00
986ddd3f2e Accepting request 1264971 from home:bnavigator:branches:science
- Re-enable flight, grpc has been fixed boo#1237422

OBS-URL: https://build.opensuse.org/request/show/1264971
OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=50
2025-03-28 08:48:20 +00:00
c3e5d75605 Accepting request 1252869 from science
- Add missing dependencies for libboost_process explicitly
  boo#1239599 (forwarded request 1252868 from bnavigator)

OBS-URL: https://build.opensuse.org/request/show/1252869
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/apache-arrow?expand=0&rev=17
2025-03-13 21:47:20 +00:00
1d57fa866b Accepting request 1252868 from home:bnavigator:branches:science
- Add missing dependencies for libboost_process explicitly
  boo#1239599

OBS-URL: https://build.opensuse.org/request/show/1252868
OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=48
2025-03-13 19:08:14 +00:00
01acc18061 Accepting request 1247454 from science
- disable flight because of gh#grpc/grpc#37968 boo#1237422 (forwarded request 1247453 from bnavigator)

OBS-URL: https://build.opensuse.org/request/show/1247454
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/apache-arrow?expand=0&rev=16
2025-02-20 18:53:17 +00:00
285eb6979a Accepting request 1247453 from home:bnavigator:branches:science
- disable flight because of gh#grpc/grpc#37968 boo#1237422

OBS-URL: https://build.opensuse.org/request/show/1247453
OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=46
2025-02-20 16:43:05 +00:00
ea30dc8735 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=45 2025-02-18 19:10:15 +00:00
e02b9f7269 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=44 2025-02-18 19:08:10 +00:00
2c44dc303e OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=43 2025-02-18 15:09:20 +00:00
1392e3167f OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=42 2025-02-18 13:00:56 +00:00
d9b8e0ac6e OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=41 2025-02-17 22:38:30 +00:00
55775895c9 - Update to 19.0.1
## Bug Fixes
  * [C++] Fix overflow issues for large build side in swiss join
    (#45108)
  * [C++][Fuzzing] Fix Negation bug discovered by fuzzing (#45181)
  * [C++][Parquet] Omit level histogram when max level is 0
    (#45285)
  * [Parquet][C++] Fix statistics load logic for no row group and
    multiple row groups (#45350)
  * [C++] Disable Flight test (#45232)
  ## Improvements
  * [C++][Parquet] Improve performance of generating size
    statistics (#45202)
  * [C++][S3] Workaround compatibility issue between AWS SDK and
    MinIO (#45310)
- Release 19.0.0
  ## New Features and Improvements
  * [CI][C++] Add a nightly job to test offline build (#44721)
  * [C++] GcsFileSystem::Make should return Result (#44503)
  * [C++][Parquet] Implement SizeStatistics (#40594)
  * [C++] Reduce string inlining in Substrait serde (#45174)
  * [C++][Acero] Enhance asof_join to work in multi-threaded
    execution by sequencing input (#44083)
  * [C++] Support the AWS S3 SSE-C encryption (#43601)
  * [C++][Parquet] Parquet Metadata Printer supports print
    sort-columns (#43599)
  * [C++] Add C++ implementation of Async C Data Interface (#44495)
  * [C++][Acero] Support AVX2 swiss join decoding (#43832)
  * [C++] skip -0117 in StrptimeZoneOffset for old glibc (#44621)
  * [C++] Add arrow::RecordBatch::MakeStatisticsArray() (#44252)

OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=40
2025-02-17 22:32:29 +00:00
ffdc9dadfc Accepting request 1218457 from science
OBS-URL: https://build.opensuse.org/request/show/1218457
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/apache-arrow?expand=0&rev=15
2024-10-27 10:25:51 +00:00
be27bc1230 Accepting request 1218425 from home:yeey:OpenWebUI
- Set the appropriate C++ compiler for the given platform so
  it will compile on Leap 15.x.

- Enable sle15_python_module_pythons.

OBS-URL: https://build.opensuse.org/request/show/1218425
OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=38
2024-10-26 01:06:02 +00:00
758d4c683d Accepting request 1201792 from science
OBS-URL: https://build.opensuse.org/request/show/1201792
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/apache-arrow?expand=0&rev=14
2024-09-22 09:05:54 +00:00
3f02fd3dcd Accepting request 1201791 from home:bnavigator:branches:science
- Add apache-arrow-pr43766-boost1_86.patch for Boost 1.86
  * gh#apache/arrow#43766

OBS-URL: https://build.opensuse.org/request/show/1201791
OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=36
2024-09-18 12:46:47 +00:00
1db8e83530 Accepting request 1194086 from science
OBS-URL: https://build.opensuse.org/request/show/1194086
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/apache-arrow?expand=0&rev=13
2024-08-16 10:23:38 +00:00
9bed06f66b Accepting request 1194085 from home:bnavigator:branches:science
- Update to 17.0.0
  ## Bug Fixes
  * [C++] Add option to string ‘center’ kernel to control
    left/right alignment on odd number of padding (#41449)
  * [C++][Python] Fix casting to extension type with fixed size
    list storage type (#42219)
  * [C++] Replace null_count with MayHaveNulls in
    ListArrayFromArray and MapArray (#41957)
  * [C++][Python] RecordBatch.filter() segfaults if passed a
    ChunkedArray (#40971)
  * [C++][Parquet] Timestamp conversion from Parquet to Arrow does
    not follow compatibility guidelines for convertedType
  * [C++] Use LargeStringArray for casting when writing tables to
    CSV (#40271)
  * [C++][Python] Map child Array constructed from keys and items
    shouldn’t have offset (#40871)
  * [C++] Fix compile warning with ‘implicitly-defined constructor
    does not initialize’ in encoding_benchmark (#41060)
  * [C++] Get null_bit_id according to are_cols_in_encoding_order
    in NullUpdateColumnToRow_avx2 (#40998)
  * [C++] Clean up unused parameter warnings (#41111)
  * [C++][Acero] Fix asof join race (#41614)
  * [C++] support for single threaded joins (#41125)
  * [C++] Fix hashjoin benchmark failed at make utf8’s random
    batches (#41195)
  * [C++] Check to avoid copying when NullBitmapBuffer is Null
    (#41452)
  * [C++] Fix crash on invalid Parquet file (#41366)
  * [C++][Parquet] More strict Parquet level checking (#41346)
  * [C++][Gandiva] Fix gandiva cache size env var (#41330)
  * [C++][CMake][Windows] Remove needless .dll suffix from link
    libraries (#41341)
  * [C++][CMake] Remove unused ARROW_NO_DEPRECATED_API (#41345)
  * [C++][maybe_unused] with Arrow macro (#41359)
  * [C++][Large] ListView and Map nested types for scalar_if_else’s
    kernel functions (#41419)
  * [C++][Gandiva] Fix ascii_utf8 function to return same result on
    x86 and Arm (#41434)
  * [C++] Reuse deduplication logic for direct registration
    (#41466)
  * [C++] Clean up more redundant move warnings (#41487)
  * [C++][Compute] Remove redundant logic for ArrayData as
    ExecResults in ExecScalarCaseWhen (#41380)
  * [C++][CMake] correctly use Protobuf_PROTOC_EXECUTABLE (#41582)
  * [C++][CMake] Fix ARROW_USE_BOOST detect condition (#41622)
  * [C++][Python] Add optional null_bitmap to MapArray::FromArrays
    (#41757)
  * [C++] macros.h: Fix ARROW_FORCE_INLINE for MSVC (#41712)
  * [C++][Acero] Remove a useless parameter for QueryContext::Init
    called in hash_join_benchmark (#41716)
  * [C++] Fix the issue that temp vector stack may be under sized
    (#41746)
  * [C++] Check that extension metadata key is present before
    attempting to delete it (#41763)
  * [C++] Iterator releases its resource immediately when it reads
    all values (#41824)
  * [C++][Flight][Benchmark] Ensure waiting server ready (#41793)
  * [C++] Fix avx2 gather offset larger than 2GB in
    CompareColumnsToRows (#42188)
  * [C++][S3] Fix potential deadlock when closing output stream
    (#41876)
  * [CI][C++] Clear cache for mamba on AppVeyor (#41977)
  * [CI][Python][C++] Fix utf8proc detection for wheel on Windows
    (#42022)
  * [C++] Support list-views on list_slice (#42067)
  * [C++] Fix an OTel test failure and remove needless logs
    (#42122)
  * [C++][FS][Azure] Ensure setting BlobSasBuilder::Protocol
    (#42108)
  * [C++] Support list-view typed arrays in array_take and
    array_filter (#42117)
  * [C++] Fix some potential uninitialized variable warnings
    (#42207)
  * [C++] Avoid invalid accesses in parquet-encoding-benchmark
    (#42141)
  * [C++] Use FetchContent for bundled ORC (#43011)
  * [C++] Fix GetRecordBatchPayload crashes for device data
    (#42199)
  * [C++] Use non-stale c-ares download URL (#42250)
  * [C++][Parquet] Check for valid ciphertext length to prevent
    segfault (#43071)
  * [C++][Compute] Mark KeyCompare.CompareColumnsToRowsLarge as
    large memory test (#43128)
  * [C++] Upgrade bundled google-cloud-cpp to 2.22.0 (#43136)
  ## New Features and Improvements
  * [C++][Compute] Implement Grouper::Reset (#41352)
  * [Go][C++] Implement Flight SQL Bulk Ingestion (#38385)
  * [C++][FS][Azure] Support azure cli auth (#41976)
  * [C++][FS][Azure] Add support for environment credential
    (#41715)
  * [C++] Optimize Take for fixed-size types including nested
    fixed-size lists (#41297)
  * [C++][Device] Add Copy/View slice functions to a CPU pointer
    (#41477)
  * [C++] Add support for OpenTelemetry logging (#39905)
  * [C++] Import/Export ArrowDeviceArrayStream (#40807)
  * [C++] move LocalFileSystem to the registry (#40356)
  * [C++] Make flatbuffers serialization more deterministic
    (#40392)
  * [C++][Gandiva] add RE2::Options set_dot_nl(true) for Like
    function (#40970)
  * [C++] Introduce portable compiler assumptions (#41021)
  * [C++] Add a grouper benchmark for preventing performance
    regression (#41036)
  * [C++] Support flatten for combining nested list related types
    (#41092)
  * [C++] Clean up remaining tasks related to half float casts
    (#41084)
  * [C++][FS][Azure] Add support for CopyFile with hierarchical
    namespace support (#41276)
  * [C++] Add is_validity_defined_by_bitmap() predicate (#41115)
  * [C++] IO: enhance boundary checking in CompressedInputStream
    (#41117)
  * [C++][Python] Expose recursive flatten for lists on
    list_flatten kernel function and pyarrow bindings (#41295)
  * [C++][Parquet][Doc] Denote PARQUET:field_id in parquet.rst
    (#41187)
  * [C++] Extract the kernel loops used for PrimitiveTakeExec and
    generalize to any fixed-width type (#41373)
  * [C++][Acero] Use per-node basis temp vector stack to mitigate
    overflow (#41335)
  * [C++][Parquet] Optimize DelimitRecords by batch execution when
    max_rep_level > 1 (#41362)
  * [C++][FS][Azure][Docs] Add AzureFileSystem to Filesystems API
    reference (#41411)
  * [C++] Use ASAN to poison temp vector stack memory (#41695)
  * [C++][S3] Add a new option to check existence before CreateDir
    (#41822)
  * [C++][Parquet] Fix
    DeltaLengthByteArrayEncoder::EstimatedDataEncodedSize (#41546)
  * [C++] Thirdparty: Upgrade xsimd to 13.0.0 (#41548)
  * [C++] Improve fixed_width_test_util.h (#41575)
  * [C++] ChunkResolver: Implement ResolveMany and add unit tests
    (#41561)
  * [C++] fixed_width_internal.h: Simplify docstring and support
    bit-sized types (BOOL) (#41597)
  * [C++][Python] Extends the add_key_value to parquet::arrow and
    PyArrow (#41633)
  * [C++][CMake][Windows] Don’t build needless object libraries
    (#41658)
  * [C++][Python] PrettyPrint non-cpu data by copying to default
    CPU device (#42010)
  * [C++][Parquet] Thrift: generate template method to accelerate
    reading thrift (#41703)
  * [C++][Parquet] Minor: moving EncodedStats by default rather
    than copying (#41727)
  * [C++][ORC] Ensure setting detected ORC version (#41767)
  * [C++][Parquet] Add file metadata read/write benchmark (#41761)
  * [C++] Make git-dependent definitions internal (#41781)
  * [C++][S3] Remove GetBucketRegion hack for newer AWS SDK
    versions (#41798)
  * [C++][Parquet] normalize dictionary encoding to use
    RLE_DICTIONARY (#41819)
  * [C++] IPC: Minor enhance the code of writer (#41900)
  * [C++] Fix ExecuteScalar deduce all_scalar with chunked_array
    (#41925)
  * [C++] Minor enhance code style for FixedShapeTensorType
    (#41954)
  * [C++] Follow up of adding null_bitmap to MapArray::FromArrays
    (#41956)
  * [C++] Misc changes making code around list-like types and
    list-view types behave the same way (#41971)
  * [C++] : kernel.cc: Remove defaults on switch so that compiler
    can check full enum coverage for us (#41995)
  * [C++][Parquet] ParquetFilePrinter::JSONPrint print length of
    FLBA (#41981)
  * [C++][CMake] Add preset for Valgrind (#42110)
  * [C++] Move TakeXXX free functions into TakeMetaFunction and
    make them private (#42127)
  * [C++][FS][Azure] Validate
    AzureOptions::{blob,dfs}_storage_scheme (#42135)
  * [C++] list_parent_indices: Add support for list-view types
    (#42236)
  * [C++] Reduce the recursion of many-join test (#43042)
  * [C++] Limit buffer size in BufferedInputStream::SetBufferSize
    with raw_read_bound (#43064)
- Require cmake lz4 for 1.10
- Update to 17.0.0
  ## Bug Fixes
  * [C++][Python] Fix casting to extension type with fixed size
    list storage type (#42219)
  * [Python] Include metadata when creating pa.schema from
    PyCapsule (#41538)
  * [C++][Python] RecordBatch.filter() segfaults if passed a
    ChunkedArray (#40971)
  * [Python] pa.array: add check for byte-swapped numpy arrays
    inside python objects (#41549)
  * [Python] Fix read_table for encrypted parquet (#39438)
  * [Python] RunEndEncodedArray.from_arrays: bugfix for Array
    arguments (#40560) (#41093)
  * [C++][Python] Map child Array constructed from keys and items
    shouldn’t have offset (#40871)
  * [Python] `test_numpy_array_protocol` test failures with numpy
    2.0.0rc1
  * [Python] Fix StructArray.sort() for by=None (#41495)
  * [Python] Build with Python 3.13 (#42034)
  * [Python] remove special methods related to buffers in python
    <2.6 (#41492)
  * [Python] Fix reading column index with decimal values (#41503)
  * [Docs][Python] Remove duplicate contents (#41588)
  * [C++][Python] Add optional null_bitmap to MapArray::FromArrays
    (#41757)
  * [Python][Parquet] Implement to_dict method on SortingColumn
    (#41704)
  * [Python] CMake: ignore Parquet encryption option if Parquet
    itself is not enabled (fix Java integration build) (#41776)
  * [Python] Disallow direct pa.RecordBatchReader() construction to
    avoid segfaults (#41773)
  * [Python] Fix RecordBatchReader.cast to support casting to equal
    schema for all types (#42098)
  * [Python] Fix tests when using NumPy 2.0 on Windows (#42099)
  * [CI][Python] Use pip install -e instead of setup.py build_ext
    --inplace for installing pyarrow on verification script (#42007)
  * [CI][Python][C++] Fix utf8proc detection for wheel on Windows
    (#42022)
  * [Python][CI] Update expected output for numpy 2.0.0 (#42172)
  ## New Features and Improvements
  * [Python] Replace pandas.util.testing.rands with vendored
    version (#42089)
  * [Python] begin moving static settings to pyproject.toml
    (#41041)
  * [Python] Implement PyCapsule interface for Device data in
    PyArrow (#40717)
  * [Python] Expand the Arrow PyCapsule Interface with C Device
    Data support (#40708)
  * [Python] Let RecordBatch.filter accept a boolean expression in
    addition to mask array (#43043)
  * [Python] Fix pickling of LocalFileSystem for cython 2 (#41459)
  * [Python] Expand the C Device Interface bindings to support
    import on CUDA device (#40385)
  * [Python] Allow passing a mapping of column names to
    rename_columns (#40645)
  * [Python][Packaging] Strip unnecessary symbols when building
    wheels (#42028)
  * [Python][Docs] Update PyArrow installation docs for conda
    package split (#41135)
  * [Python] Basic bindings for Device and MemoryManager classes
    (#41685)
  * [C++][Python] Expose recursive flatten for lists on
    list_flatten kernel function and pyarrow bindings (#41295)
  * [Python][Packaging] Ensure to build with released numpy 2.0
    (instead of RC) in the wheel building workflows (#42194)
  * [CI][Python] Add a job on ARM64 macOS (#41313)
  * [CI][Python] Reduce CI time on macOS (#41378)
  * [Python] Expose byte_width and bit_width of ExtensionType in
    terms of the storage type (#41413)
  * [Python] Update Python development guide about components being
    enabled by default based on Arrow C++ (#41705)
  * [Python] Building PyArrow: enable/disable python components by
    default based on availability in Arrow C++ (#41494)
  * [C++][Python] Extends the add_key_value to parquet::arrow and
    PyArrow (#41633)
  * [Python] Ensure Buffer methods don’t crash with non-CPU data
    (#41889)
  * [C++][Python] PrettyPrint non-cpu data by copying to default
    CPU device (#42010)
  * [Python][Parquet] Update BYTE_STREAM_SPLIT description in
    write_table() docstring (#41759)
  * [Python] Add support for Pyodide (#37822)
  * [Python] Fix pandas tests to follow downstream datetime64 unit
    changes (#41979)
  * [Python] Allow Array.filter() to take general array input
    (#42051)
  * [Python] Expose new FLOAT16 logical type in the pyarrow.parquet
    bindings (#42103)
  * [Python] Array gracefully fails on non-cpu device (#42113)
  * [Python][Parquet] Pyarrow store decimal as integer (#42169)
  * [Python] Add CI job for Numpy 1.X (#42189)
  * [CI][Python] Pin openjdk=17 in python substrait integration
    (#43051)
- Drop pyarrow-pr41319-numpy2-tests.patch
- Add pyarrow-pr433325-extradirs.patch gh#apache/arrow/pull/43325

OBS-URL: https://build.opensuse.org/request/show/1194085
OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=34
2024-08-15 09:43:24 +00:00
13 changed files with 1275 additions and 81 deletions

@@ -1,3 +0,0 @@
version https://git-lfs.github.com/spec/v1
oid sha256:423eb4c1d6dbbcb7ca429d548e94f8a99cd4603bc023de9c0578d1950ce0f21d
size 21350177

@@ -0,0 +1,13 @@
--- a/cpp/thirdparty/versions.txt 2025-02-11 23:16:06.000000000 +0100
+++ b/cpp/thirdparty/versions.txt 2025-04-21 15:34:04.565829184 +0200
@@ -82,8 +82,8 @@
ARROW_JEMALLOC_BUILD_SHA256_CHECKSUM=2db82d1e7119df3e71b7640219b6dfe84789bc0537983c3b7ac4f7189aecfeaa
ARROW_LZ4_BUILD_VERSION=v1.10.0
ARROW_LZ4_BUILD_SHA256_CHECKSUM=537512904744b35e232912055ccf8ec66d768639ff3abe5788d90d792ec5f48b
-ARROW_MIMALLOC_BUILD_VERSION=v2.0.6
-ARROW_MIMALLOC_BUILD_SHA256_CHECKSUM=9f05c94cc2b017ed13698834ac2a3567b6339a8bde27640df5a1581d49d05ce5
+ARROW_MIMALLOC_BUILD_VERSION=v2.0.9
+ARROW_MIMALLOC_BUILD_SHA256_CHECKSUM=4a29edae32a914a706715e2ac8e7e4109e25353212edeed0888f4e3e15db5850
ARROW_NLOHMANN_JSON_BUILD_VERSION=v3.10.5
ARROW_NLOHMANN_JSON_BUILD_SHA256_CHECKSUM=5daca6ca216495edf89d167f808d1d03c4a4d929cef7da5e10f135ae1540c7e4
ARROW_OPENTELEMETRY_BUILD_VERSION=v1.13.0
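
The hunk above changes the mimalloc version and its SHA-256 checksum together, because a bundled download is only useful if it matches the recorded digest. A minimal sketch of that kind of check, assuming the entries can be read as plain `NAME=value` lines (the real file also contains shell constructs that this sketch ignores; `parse_versions` and `checksum_matches` are hypothetical helper names, not part of Arrow's build scripts):

```python
import hashlib


def parse_versions(text: str) -> dict[str, str]:
    """Collect NAME=value pairs, skipping blank lines and '#' comments."""
    entries: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        entries[name.strip()] = value.strip()
    return entries


def checksum_matches(data: bytes, expected_hex: str) -> bool:
    """Return True if the SHA-256 digest of data equals expected_hex."""
    return hashlib.sha256(data).hexdigest() == expected_hex.strip().lower()


# The two lines added by the patch, copied from the hunk above.
VERSIONS_SNIPPET = """\
ARROW_MIMALLOC_BUILD_VERSION=v2.0.9
ARROW_MIMALLOC_BUILD_SHA256_CHECKSUM=4a29edae32a914a706715e2ac8e7e4109e25353212edeed0888f4e3e15db5850
"""

entries = parse_versions(VERSIONS_SNIPPET)
```

With a downloaded archive read into `data`, `checksum_matches(data, entries["ARROW_MIMALLOC_BUILD_SHA256_CHECKSUM"])` would report whether the tarball is the one the patch expects.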

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:67e31a4f46528634b8c3cbb0dc60ac8f85859d906b400d83d0b6f732b0c5b0e3
size 17592223
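
The three-line files in these hunks are Git LFS pointers: the source tarball itself lives in LFS storage, and the repository only tracks its hash and size. A small sketch of reading such a pointer, assuming the simple `key value` layout shown above (the `LfsPointer` class and `parse_lfs_pointer` function are illustrative names, not part of any Git LFS tooling):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LfsPointer:
    version: str    # spec URL from the "version" line
    hash_algo: str  # part of the oid before the colon, e.g. "sha256"
    digest: str     # hex digest after the colon
    size: int       # size of the real object in bytes


def parse_lfs_pointer(text: str) -> LfsPointer:
    """Parse 'key value' lines of a pointer file into an LfsPointer."""
    fields: dict[str, str] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        key, _, value = line.partition(" ")
        fields[key] = value.strip()
    algo, _, digest = fields["oid"].partition(":")
    return LfsPointer(fields["version"], algo, digest, int(fields["size"]))


# Pointer for the newly added tarball, copied from the hunk above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:67e31a4f46528634b8c3cbb0dc60ac8f85859d906b400d83d0b6f732b0c5b0e3
size 17592223
"""

pointer = parse_lfs_pointer(POINTER)
```

Comparing `pointer.size` and `pointer.digest` against a fetched file is one way to confirm that the object pulled from LFS is the one this revision refers to.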

@@ -1,3 +1,818 @@
-------------------------------------------------------------------
Fri Jun 13 18:22:55 UTC 2025 - Ben Greiner <code@bnavigator.de>
- Update to 20.0.0
## Bug Fixes
* GH-30302 - [C++][Parquet] Preserve the bitwidth of integer
dictionary indices on round-trip to Parquet (#45685)
* GH-31992 - [C++][Parquet] Handling the special case when
DataPageV2 values buffer is empty (#45252)
* GH-37630 - [C++][Python][Dataset] Allow disabling fragment
metadata caching (#45330)
* GH-39023 - [C++][CMake] Add missing launcher path conversion
for ExternalPackage (#45349)
* GH-43057 - [C++] Thread-safe AesEncryptor / AesDecryptor
(#44990)
* GH-45048 - [C++][Parquet] Deprecate unused chunk_size parameter
in parquet::arrow::FileWriter::NewRowGroup() (#45088)
* GH-45129 - [Python][C++] Fix usage of deprecated C++
functionality on pyarrow (#45189)
* GH-45132 - [C++][Gandiva] Update LLVM to 18.1 (#45114)
* GH-45185 - [C++][Parquet] Raise an error for invalid repetition
levels when delimiting records (#45186)
* GH-45254 - [C++][Acero] Fix the row offset truncation in row
table merge (#45255)
* GH-45266 - [C++][Acero] Fix the running tasks count of
Scheduler when get error tasks in multi-threads (#45268)
* GH-45270 - [C++][CI] Disable mimalloc on Valgrind builds
(#45271)
* GH-45301 - [C++] Change PrimitiveArray ctor to protected
(#45444)
* GH-45334 - [C++][Acero] Fix swiss join overflow issues in row
offset calculation for fixed length and null masks (#45336)
* GH-45362 - [C++] Fix identity cast for time and list scalar
(#45370)
* GH-45371 - [C++] Fix data race in SimpleRecordBatch::columns
(#45372)
* GH-45393 - [C++][Compute] Fix wrong decoding for 32-bit column
in row table (#45473)
* GH-45396 - [C++] Use Boost with ARROW_FUZZING (#45397)
* GH-45423 - [C++] Don't require Boost library with
ARROW_TESTING=ON/ARROW_BUILD_SHARED=OFF (#45424)
* GH-45497 - [C++][CSV] Avoid buffer overflow when a line has too
many columns (#45498)
* GH-45510 - [CI][C++] Fix LLVM APT repository preparation on
Debian (#45511)
* GH-45512 - [C++] Clean up undefined symbols in libarrow without
IPC (#45513)
* GH-45514 - [CI][C++][Docs] Set CUDAToolkit_ROOT explicitly in
debian-docs (#45520)
* GH-45537 - [CI][C++] Add missing includes (iwyu) to
file_skyhook.cc (#45538)
* GH-45541 - [Doc][C++] Render ASCII art as-is (#45542)
* GH-45545 - [C++][Parquet] Add missing includes (#45554)
* GH-45564 - [C++][Acero] Add size validation for names and
expressions vectors in ProjectNode (#45565)
* GH-45568 - [C++][Parquet][CMake] Enable zlib automatically when
Thrift is needed (#45569)
* GH-45578 - [C++] Use max not min in
MakeStatisticsArrayMaxApproximate test (#45579)
* GH-45587 - [C++][Docs] Fix the statistics schema link in
arrow::RecordBatch::MakeStatisticsArray()'s docstring (#45588)
* GH-45614 - [C++] Use Boost's CMake packages instead of
FindBoost.cmake in CMake (#45623)
* GH-45628 - [C++] Ensure specifying Boost include directory for
bundled Thrift (#45637)
* GH-45669 - [C++][Parquet] Add missing
ParquetFileReader::GetReadRanges() definition (#45684)
* GH-45693 - [C++][Gandiva] Fix aes_encrypt/decrypt algorithm
selection (#45695)
* GH-45700 - [C++][Compute] Added nullptr check in Equals method
to handle null impl_ pointers (#45701)
* GH-45733 - [C++][Python] Add biased/unbiased toggle to skew and
kurtosis functions (#45762)
* GH-45739 - [C++][Python] Fix crash when calling
hash_pivot_wider without options (#45740)
* GH-45788 - [C++][Acero] Fix data race in aggregate node
(#45789)
* GH-45868 - [C++][CI] Fix test for ambiguous initialization on
C++ 20 (#45871)
* GH-45905 - [C++][Acero] Enlarge the timeout in ConcurrentQueue
test to reduce sporadical failures (#45923)
* GH-45930 - [C++] Don't use ICU C++ API in Azure SDK C++
(#45952)
* GH-45939 - [C++][Benchmarking] Fix compilation failures
(#45942)
* GH-45959 - [C++][CMake] Fix Protobuf dependency in
Arrow::arrow_static (#45960)
* GH-45980 - [C++] Bump Bundled Snappy version to 1.2.2 (#45981)
* GH-45999 - [C++][Gandiva] Fix crashes on LLVM 20.1.1 (#46000)
* GH-46022 - [C++] Fix build error with g++ 7.5.0 (#46028)
* GH-46067 - [CI][C++] Remove system Flatbuffers from macOS
(#46105)
* GH-46077 - [CI][C++] Disable -Werror on macos-13 (#46106)
* GH-46111 - [C++][CI] Fix boost 1.88 on MinGW (#46113)
* GH-46123 - [C++] Undefined behavior in compare_internal.cc and
light_array_internal.cc (#46124)
* GH-46134 - [CI][C++] Explicit conversion of possible
absl::string_view on protobuf methods to std::string (#46136)
* GH-46159 - [CI][C++] Stop using possibly missing
boost/process/v2.hpp on boost 1.88 and use individual includes
(#46160)
* GH-46195 - [Release][C++] verify-rc-source-cpp-macos-amd64
failed to build googlemock
## New Features and Improvements
* GH-26648 - [C++] Optimize union equality comparison (#45384)
* GH-33592 - [C++] support casting nullable fields to
non-nullable if there are no null values (#43782)
* GH-41764 - [Parquet][C++] Support future logical types in the
Parquet reader (#41765)
* GH-41816 - [C++] Add Minimal Meson Build of libarrow (#45441)
* GH-43296 - [C++][FlightRPC] Remove Flight UCX transport
(#43297)
* GH-43573 - [C++] Copy bitmap when casting from string-view to
offset string and binary types (#44822)
* GH-44042 - [C++][Parquet] Limit num-of row-groups when building
parquet for encrypted file (#44043)
* GH-44393 - [C++][Compute] Vector selection functions
inverse_permutation and scatter (#44394)
* GH-44615 - [C++][Compute] Add extract_regex_span function
(#45577)
* GH-44629 - [C++][Acero] Use implicit_ordering for asof_join
rather than require_sequenced_output (#44616)
* GH-44950 - [C++] Bump minimum CMake version to 3.25 (#44989)
* GH-45045 - [C++][Parquet] Add a benchmark for
size_statistics_level (#45085)
* GH-45190 - [C++][Compute] Add rank_quantile function (#45259)
* GH-45196 - [C++][Acero] Small refinement to hash join (#45197)
* GH-45206 - [C++][CMake] Add sanitizer presets (#45207)
* GH-45209 - [C++][CMake] Fix the issue that allocator not
disabled for sanitizer cmake presets (#45210)
* GH-45215 - [C++][Acero] Export SequencingQueue and
SerialSequencingQueue (#45221)
* GH-45216 - [C++][Compute] Refactor Rank implementation (#45217)
* GH-45219 - [C++][Examples] Update examples to disable mimalloc
(#45220)
* GH-45225 - [C++] Upgrade ORC to 2.1.0 (#45226)
* GH-45227 - [C++][Parquet] Enable Size Stats and Page Index by
default (#45249)
* GH-45269 - [C++][Compute] Add “pivot_wider” and
“hash_pivot_wider” functions (#45562)
* GH-45279 - [C++][Compute] Move all Grouper tests to
grouper_test.cc (#45280)
* GH-45344 - [C++][Testing] Generic StepGenerator (#45345)
* GH-45358 - [C++][Python] Add MemoryPool method to print
statistics (#45359)
* GH-45361 - [CI][C++] Curate ci/vcpkg/vcpkg.json (#45081)
* GH-45366 - [C++][Parquet] Set is_compressed to false when data
page v2 is not compressed (#45367)
* GH-45416 - [CI][C++][Homebrew] Backport the latest formula
changes (#45460)
* GH-45478 - [CI][C++] Drop support for Ubuntu 20.04 (#45519)
* GH-45506 - [C++][Acero] More overflow-safe Swiss table (#45515)
* GH-45551 - [C++][Acero] Release temp states of Swiss join
building hash table to reduce memory consumption (#45552)
* GH-45563 - [C++][Compute] Split up hash_aggregate.cc (#45725)
* GH-45566 - [C++][Parquet][CMake] Remove a workaround for
Windows in FindThriftAlt.cmake (#45567)
* GH-45572 - [C++][Compute] Add rank_normal function (#45573)
* GH-45584 - [C++][Thirdparty] Bump zstd to v1.5.7 (#45585)
* GH-45589 - [C++] Enable singular test in Meson configuration
(#45596)
* GH-45591 - [C++][Acero] Refine hash join benchmark and remove
openmp from the project (#45593)
* GH-45605 - [R][C++] Fix identifier … preceded by whitespace
warnings (#45606)
* GH-45611 - [C++][Acero] Improve Swiss join build performance by
partitioning batches ahead to reduce contention (#45612)
* GH-45620 - [CI][C++] Use Visual Studio 2022 not 2019 (#45621)
* GH-45652 - [C++][Acero] Unify ConcurrentQueue and
BackpressureConcurrentQueue API (#45421)
* GH-45676 - [C++][Python][Compute] Add skew and kurtosis
functions (#45677)
* GH-45680 - [C++][Python] Remove deprecated functions in 20.0
* GH-45689 - [C++][Thirdparty] Bump Apache ORC to 2.1.1 (#45600)
* GH-45694 - [C++] Bump vendored flatbuffers to 24.3.6 (#45687)
* GH-45696 - [C++][Gandiva] Accept LLVM 20.1 (#45697)
* GH-45732 - [C++][Compute] Accept more pivot key types (#45945)
* GH-45744 - [C++] Remove deprecated GetNextSegment (#45745)
* GH-45746 - [C++] Remove deprecated functions in 20.0 (C++
subset) (#45748)
* GH-45755 - [C++][Python][Compute] Add winsorize function
(#45763)
* GH-45771 - [C++] Add tests to top level Meson configuration
(#45773)
* GH-45772 - [C++] Export Arrow as dependency from Meson
configuration (#45774)
* GH-45775 - [C++] Use dict.get() in Meson configuration (#45776)
* GH-45779 - [C++] Add testing directory to Meson configuration
(#45780)
* GH-45784 - [C++] Unpin LLVM and OpenSSL in Brewfile (#45785)
* GH-45792 - [C++] Add benchmarks to Meson configuration (#45793)
* GH-45816 - [C++] Make VisitType() fallback branch unreachable
(#45815)
* GH-45820 - [C++] Add optional out_offset for Buffer-returning
CopyBitmap function (#45852)
* GH-45821 - [C++][Compute] Grouper improvements (#45822)
* GH-45825 - [C++] Add c directory to Meson configuration
(#45826)
* GH-45827 - [C++] Add io directory to Meson configuration
(#45828)
* GH-45831 - [C++] Add CSV directory to Meson configuration
(#45832)
* GH-45848 - [C++][Python][R] Remove deprecated PARQUET_2_0
(#45849)
* GH-45877 - [C++][Acero] Cleanup 64-bit temp states of Swiss
join by using 32-bit (#45878)
* GH-45917 - [C++][Acero] Add flush taskgroup to enable
parallelization (#45918)
* GH-45922 - [C++][Flight] Remove deprecated Authenticate and
StartCall (#45932)
* GH-45953 - [C++] Use lock to fix atomic bug in
ReadaheadGenerator (#45954)
* GH-45986 - [C++] Update bundled GoogleTest (#45996)
* GH-45987 - [C++] Set CMAKE_POLICY_VERSION_MINIMUM=3.5 for
bundled dependencies (#45997)
-------------------------------------------------------------------
Mon Apr 21 14:34:37 UTC 2025 - Friedrich Haubensak <hsk17@mail.de>
- to fix cmake-4 build problems, upgrade bundled mimalloc from
2.0.6 to 2.0.9 and add apache-arrow-19.0.1-mimalloc-version.patch;
mimalloc changes according to readme.md:
* 2.0.9:
- Supports building with asan and improved [Valgrind] support.
- Support arbitrary large alignments, in particular for
`std::pmr` pools.
- Added C++ STL allocators attached to a specific heap.
- Heap walks now visit all objects (including huge objects).
- Support Windows nano server containers.
- Various small bug fixes.
* 2.0.7:
- Initial support for [Valgrind] for leak testing and heap
block overflow detection.
- Initial support for attaching heaps to a specific memory area.
- Fix `realloc` behavior for zero size blocks.
- Remove restriction to integral multiple of the alignment in
`alloc_align`.
- Improved aligned allocation performance.
- Reduced contention with many threads on few processors.
- VS2022 support.
- Support `pkg-config`.
-------------------------------------------------------------------
Fri Mar 28 08:47:10 UTC 2025 - Ben Greiner <code@bnavigator.de>
- Re-enable flight, grpc has been fixed boo#1237422
-------------------------------------------------------------------
Thu Mar 13 18:57:51 UTC 2025 - Ben Greiner <code@bnavigator.de>
- Add missing dependencies for libboost_process explicitly
boo#1239599
-------------------------------------------------------------------
Wed Feb 19 15:58:28 UTC 2025 - Ben Greiner <code@bnavigator.de>
- disable flight because of gh#grpc/grpc#37968 boo#1237422
-------------------------------------------------------------------
Mon Feb 17 19:17:26 UTC 2025 - Ben Greiner <code@bnavigator.de>
- Update to 19.0.1
## Bug Fixes
* [C++] Fix overflow issues for large build side in swiss join
(#45108)
* [C++][Fuzzing] Fix Negation bug discovered by fuzzing (#45181)
* [C++][Parquet] Omit level histogram when max level is 0
(#45285)
* [Parquet][C++] Fix statistics load logic for no row group and
multiple row groups (#45350)
* [C++] Disable Flight test (#45232)
## Improvements
* [C++][Parquet] Improve performance of generating size
statistics (#45202)
* [C++][S3] Workaround compatibility issue between AWS SDK and
MinIO (#45310)
- Release 19.0.0
## New Features and Improvements
* [CI][C++] Add a nightly job to test offline build (#44721)
* [C++] GcsFileSystem::Make should return Result (#44503)
* [C++][Parquet] Implement SizeStatistics (#40594)
* [C++] Reduce string inlining in Substrait serde (#45174)
* [C++][Acero] Enhance asof_join to work in multi-threaded
execution by sequencing input (#44083)
* [C++] Support the AWS S3 SSE-C encryption (#43601)
* [C++][Parquet] Parquet Metadata Printer supports print
sort-columns (#43599)
* [C++] Add C++ implementation of Async C Data Interface (#44495)
* [C++][Acero] Support AVX2 swiss join decoding (#43832)
* [C++] skip -0117 in StrptimeZoneOffset for old glibc (#44621)
* [C++] Add arrow::RecordBatch::MakeStatisticsArray() (#44252)
* [C++] Improve merge step in chunked sorting (#44217)
* [C++][Parquet] Tools: Debug Print for Json should be valid JSON
(#44532)
* [C++][FS][Azure] Implement SAS token authentication (#45021)
* [C++] Don't export template class (#44365)
* [C++][Docs] Update the URL to C++ Development in README.md
(#44427)
* [C++] Added rvalue-reference-qualified overload for
arrow::Result::status() returning value instead of reference
(#44477)
* [C++] StatusConstant- cheaply copied const Status (#44493)
* [C++][Compute] Allow casting struct to bigger nullable struct
(#44587)
* [C++] Use array type to compute min/max statistics Arrow type
(#45094)
* [C++] Minor: ArrayData ctor can assign null_count directly
(#44582)
* [C++] Add const and & to arrow::Array::statistics() return type
(#44592)
* [Python][C++] Add version suffix to libarrow_python* libraries
(#44702)
* [C++] NumericBuilder::AppendValues append vector prevent from
ub (#44794)
* [C++][Parquet] Remove obsolete parquet_constants generated
files from old thrift (#44772)
* [Docs][C++] Add arrow::ArrayStatistics to API doc (#44764)
* [C++] Upgrade ORC to 2.0.3 (#44745)
* [C++][Parquet] Add arrow::Result version of
parquet::arrow::OpenFile() (#44785)
* [C++] Fix a couple of maybe-uninitialized warnings (#44789)
* [C++] Use arrow::util::span on
arrow::util::bitmap_builders_utilities instead of std::vector
(#44796)
* [C++][Parquet] Add arrow::Result version of
parquet::arrow::FileReader::GetRecordBatchReader() (#44809)
* [C++] minor optimize cancel and thread pool (#44812)
* [C++][Parquet] Add an example to dump statistics read as
arrow::ArrayStatistics (#44816)
* [C++] Add the Expm1(exponent) scalar arithmetic function
(#44904)
* [C++] Add WithinUlp testing functions (#44906)
* [C++][Python] Add Hyperbolic Trig functions (#44630)
* [C++] Enable mimalloc by default, disable jemalloc by default
and more (#44951)
* [C++] Add support for building system OpenTelemetry (#44983)
* [C++][CMake] Use librt only for Linux (#44984)
* [C++] Support for fixed-size list in conversion of range tuple
(#45008)
* [C++][Parquet] Allow configuring the default footer read size
(#45016)
* [C++] Remove result_internal.h (#45066)
* [FlightRPC][C++] Deprecate InitializeFlightUcx before removing
UCX (#45080)
* [C++][Parquet] Add GetReadRanges function to FileReader
(#45093)
* [C++] Apply a cstdint patch to bundled Thrift for GCC 15
(#45097)
* [C++] Remove useless “hash table ready” states in swiss join
(#45136)
* [CI][C++] Add a GCC 15 job (#45138)
* [C++] Ensure using cpp/cmake_modules/*.cmake (#45143)
* [CI][C++] Upgrade Alpine Linux to 3.18 from 3.16 (#45168)
## Bug Fixes
* [C++] Fix CopyFiles when destination is a FileSystem with
background_writes (#44897)
* [C++][Python] Fix ORC crash when file contains unknown timezone
(#45051)
* [C++] Replace std::aligned_storage that is deprecated in C++23
(#45019)
* [C++][Parquet] Refuse writing non-nullable column that contains
nulls (#44921)
* [C++] Initialize offset vector head as 0 after memory allocated
in grouper.cc (#43123)
* [C++] io::BufferedInput: Fix invalid state after SetBufferSize
(#44387)
* [C++][Parquet] Fix schema conversion from two-level encoding
nested list (#43995)
* [C++] Use “lib” for generating bundled dependencies even with
“clang-cl” (#44391)
* [C++] Fix unaligned load/store implementation for clang-18
(#44468)
* [C++] Use CMAKE_LIBTOOL on macOS (#44385)
* [CI][C++] Use setup-python on hosted runner (#44411)
* [C++] Update vendored date to 3.0.3 (#44482)
* [GLib][C++] Meson searches libraries with specific versions.
(#44475)
* [C++][Acero] Fix crash when thread in asof_join is not running
(#44584)
* [C++] NumericArray should not use ctor from parent directly
(#44542)
* [C++] FunctionOptions::{Serialize,Deserialize}() return an
error without ARROW_IPC (#45171)
* [C++][Acero] Enhance partition sort example (#44678)
* [C++][Python] Fix Flight Timestamp precision, revert workaround
from #43537 (#44681)
* [C++] Add S3 option to ignore SIGPIPE signals (#44735)
* [C++] Keep field metadata for keys and values when importing a
map type via the C data interface (#44715)
* [C++][CI] Fix arrow-c-bridge-test timeout with threading
disabled (#44737)
* [C++] Use lowercased windows.h to enable cross-platform builds
(#44755)
* [C++] Fix Float16.To{Little,Big}Endian on big endian machines
(#44768)
* [C++][Parquet] Fix read/write of metadata length footer on
big-endian systems (#44787)
* [C++][CI] Migrate to arrow::Result based
parquet::arrow::OpenFile() API in example tutorials (#44807)
* [C++] Fix thread-unsafe access in ConcurrentQueue::UnsyncFront
(#44849)
* [C++] Fix compilation error on GCC 8 (#44899)
* [C++][CI] Silence protobuf-generated deprecations (#44955)
* [C++] Use recommended downloads URLs for ORC and Thrift
(#44977)
* [C++] Include path in the documentation is wrong (#45031)
* [C++] Remove Parquet requirement from Arrow Acero and from
Arrow Dataset when not necessary (#45035)
* [C++] Add support for Boost 1.87.0 (#45057)
* [C++][CI] Fix test-build-cpp-fuzz failures (#45060)
* [C++][Parquet] Fix generation of repetition levels for
encryption test data (#45074)
* [C++] Avoid static const variable in the status.h (#45100)
* [C++][Parquet] Fix Null-dereference READ in
parquet::arrow::ListToSchemaField (#45152)
* [C++][Release] Add llvm-dev back to setup-ubuntu.sh (#45184)
* [C++][Parquet] test-conda-cpp-valgrind fails on
arrow-dataset-file-parquet-encryption-test
- Release 18.1.0
## Bug Fixes
* [C++] Add support for overwriting grpc_cpp_plugin path for
cross-compiling (#44507)
* [Docs][C++] Fix documentation directive for ChunkLocation
(#44505)
* [C++] Add find module for abseil that handles missing version
(#44613)
* [C++][Dev] Update bundled Thrift, update mirrors to use CDN
(#44685)
## New Features and Improvements
* [C++] Move ChunkResolver to the public API (#44357)
- Release 18.0.0
## Bug Fixes
* [C++] data corruption when using `group_by` and `aggregate` on
large data sets
* [C++] Use PutObject request for S3 in OutputStream when only
uploading small data (#41564)
* [C++] Clean up implicit fallthrough warnings (#41892)
* [C++] Fix avx2 gather rows more than 2^31 issue in
CompareColumnsToRows (#43065)
* [C++][ArrowFlight] Crash due to UCS thread mode
* [C++] Add workaround for missing Boost dependency of Thrift
(#43328)
* [C++] Skip not Emscripten ready tests in CSV tests (#43724)
* [C++] Add date{32,64} to date{32,64} cast (#43192)
* [C++][Compute] Detect and explicit error for offset overflow in
row table (#43226)
* [C++] Fix decimal benchmarks to avoid out-of-bounds accesses
(#43212)
* [C++] Resolve Abseil like any other dependency in the build
system (#43219)
* [C++][Parquet] Refactor parquet::encryption::AesEncryptor to
use unique_ptr (#43222)
* [C++] Fix Abseil compile error on GCC 13 (#43157)
* [C++] Add missing serde methods to Location (#43332)
* [C++][Parquet] min-max Statistics doesn't work well when one of
min-max is truncated (#43383)
* [C++][Parquet] parquet-dump-footer: Remove redundant link and
fix debug processing (#43375)
* [C++] Ensure using bundled GoogleTest when we use bundled
GoogleTest (#43465)
* [C++][Compute] Fix invalid memory access when resizing
var-length buffer in row table (#43415)
* [C++][FlightRPC] Fix Flight UCX build issues (#43430)
* [C++] Filter out zero length buffers on gRPC transport (#43448)
* [C++][Gandiva] Always use gdv_function_stubs.h in
context_helper.cc (#43464)
* [C++] Add support for the official LZ4 CMake package (#43468)
* [C++] Register the new Opaque extension type by default
(#43788)
* [C++][Acero] Fix typos in join benchmark (#43871)
* [C++][CI] Catch potential integer overflow in PoolBuffer
(#43886)
* [C++] Leak S3 structures if finalization happens too late
(#44090)
* [C++][Parquet] Fix reported metrics in
parquet-arrow-reader-writer-benchmark (#44082)
* [C++] Don't use Boost.Process with Emscripten (#44097)
* [C++] Add home made _mm256_set_m128i for compilers who are
missing it (#44116)
* [C++] JsonExtensionType equality check ignores storage type
(#44215)
* [CI][C++][AppVeyor] Use conda instead of Mamba (#44235)
* [C++][FS][Azure] Fix edgecase where GetFileInfo incorrectly
returns NotFound on flat namespace and Azurite (#44302)
* [C++][FS][Azure] Catch missing exceptions on HNS support check
(#44274)
* [C++][FS][Azure] Fix minor hierarchical namespace bugs (#44307)
* [C++] Fix S3 error handling in ObjectOutputStream (#44335)
* [C++] Disable jemalloc by default on ARM (#44380)
## New Features and Improvements
* [C++][Python] Native support for UUID (#37298)
* [C++][Python] Bool8 Extension Type Implementation (#43488)
* [C++][Parquet] Add JSON canonical extension type (#13901)
* [C++][Compute] Replace explicit checking with DCHECK for
invariants in row segmenter (#44236)
* [C++][CI] Improve IPC fuzzing seed corpus (#43621)
* [Documentation][C++] Explicitly note that compute is optional
(#43629)
* [C++] Azure file system write buffering & async writes (#43096)
* [C++][Parquet] Separate encoders and decoder (#43972)
* [C++][Python][Parquet] Support reading/writing key-value
metadata from/to ColumnChunkMetaData (#41580)
* [Docs][C++] Is arrow::dataset namespace still experimental?
* [C++] Add arrow::ArrayStatistics (#43273)
* [CI][C++] Update Minio version (#44225)
* [C++][Parquet] Add binary that extracts a footer from a parquet
file (#42174)
* [C++] Support casting to and from utf8_view/binary_view
(#43302)
* [C++] Update bundled vendor/datetime to support for building
with libc++ and C++20 (#43094)
* [C++] Implement PathFromUri support for Azure file system
(#43098)
* [C++][Compute] Fix the unnecessary allocation of extra bytes
when encoding row table (#43125)
* [C++][Parquet] Replace use of int with int32_t in the internal
Parquet encryption APIs (#43413)
* [C++][Parquet] Refactor Encryptor API to use arrow::util::span
instead of raw pointers (#43195)
* [C++][Parquet] Default initialize some parquet metadata
variables (#43144)
* [C++] Fix CMake link order for AWS SDK (#43230)
* [C++] Suggest a cast when Concatenate fails due to offsets
overflow (#43190)
* [C++] Support basic is_in predicate simplification (#43761)
* [C++][AzureFS] Ignore password field in URI (#44220)
* [C++] Add lint for DCHECK in public headers (#43248)
* [C++][FlightRPC] Reduce repetition in flight/types.cc in serde
functions (#43237)
* [C++][Parquet] remove useless template parameter of
DeltaLengthByteArrayEncoder (#43250)
* [C++] Always prefer mimalloc to jemalloc (#40875)
* [C++][Flight] Use a Base CRTP type for the types used in RPC
calls (#43255)
* [C++] Expand the take function tests to cover more
chunked-array cases (#43292)
* [C++][Parquet] Enhance the comment for ColumnReader/Decoder
(#44003)
* [C++] Order classes in flight/types.h according to Flight.proto
(#43330)
* [C++][Parquet] Deprecate ColumnChunk::file_offset field and no
longer write Metadata at end of Chunk (#43428)
* [C++] Add benchmark for binary view builder (#43445)
* [C++][Python] Add Opaque canonical extension type (#43458)
* [Java][C++] Support more CsvFragmentScanOptions in JNI call
(#43482)
* [C++] Thirdparty: Bump lz4 to 1.10.0 (#43493)
* [C++][Compute] Widen the row offset of the row table to 64-bit
(#43389)
* [C++] Use ViewOrCopyTo instead of CopyTo when pretty printing
non-CPU data (#43508)
* [FlightRPC][C++] Reduce the number of references to
protobuf::Any (#43544)
* [C++] Simplify arrow::ArrayStatistics::ValueType (#43581)
* [C++][GLib] Don't install arrow-cuda.pc/arrow-cuda-glib.pc on
Windows (#43593)
* [C++] Remove redundant default constructor/deconstructor in
arrow::ArrayStatistics (#43579)
* [C++] Remove std::optional from
arrow::ArrayStatistics::is_{min,max}_exact (#43595)
* [C++][FlightRPC] Move the FlightTestServer to its own .cc and
.h files (#43678)
* [C++] Compute: fix register kernel SimdLevel for
AddMinMax512AggKernels (#43704)
* [C++] Prevent Snappy from disabling RTTI when bundled (#43706)
* [C++][FS][Azure] Use the latest Azurite and update the bundled
Azure SDK for C++ to azure-identity_1.9.0 (#43723)
* [C++][Parquet][CI] Parquet: Introducing more bad_data for
testing (#43708)
* [C++][Parquet] Dataset: Handle num-nulls in Parquet correctly
when !HasNullCount() (#43726)
* [C++] Clarify the way SIMD-enabled agg kernels come from the
same code in different compilation units (#43720)
* [C++] Fix Scalar boolean handling in row encoder (#43734)
* [C++] Add support for Boost 1.86 (#43766)
* [C++] Compute: More comment in RowEncoder (#43763)
* [C++] Acero: Minor code enhancement for Join (#43760)
* [C++] Fix the case when boolean_{any all} meets constant input
with length in Acero (#43799)
* [C++] Add chunked Take benchmarks with a small selection factor
(#43772)
* [C++] Indent preprocessor directives (#43798)
* [C++] Attach arrow::ArrayStatistics to arrow::ArrayData
(#43801)
* [C++] Enable filesystem automatically when one of
ARROW_{AZURE,GCS,HDFS,S3}=ON is specified (#43806)
* [C++] Expose the set of device types where a ChunkedArray is
allocated (#43853)
* [C++] Make ChunkResolver::ResolveMany output a list of
ChunkLocations (#43928)
* [C++][Parquet] Add support for arrow::ArrayStatistics: non
zero-copy int based types (#43945)
* [C++][Parquet] Guard against use of cleared decryptor/encryptor
(#43947)
* [C++] Add tests based on random data and benchmarks to
ChunkResolver::ResolveMany (#43954)
* [C++] Enhance error message for URI parsing (#43938)
* [CI][C++][Dev] Add cpplint to pre-commit (#43982)
* [C++][Parquet] Add support for arrow::ArrayStatistics:
zero-copy types (#43984)
* [C++][Acero] Some code cleanup to Grouper (#43988)
* [C++] Add missing std::move() in array_nested.cc (#43993)
* [C++][Docs] Add missing install command in building docs
(#44000)
* [C++][Parquet] Add support for arrow::ArrayStatistics: boolean
(#44009)
* [C++] IPC: ipc reader/writer code enhancement (#44019)
* [C++][Compute] Reduce the complexity of row segmenter (#44053)
* [C++][Parquet] Add Float16 reading benchmarks (#44073)
* [C++][Parquet] Remove deprecated APIs (#44080)
* [C++][Acero] Add more row segmenter tests (#44166)
* [C++][Parquet] Fix typo in parquet/column_writer.cc (#40856)
* [C++] Avoid repeated ArrayData::offset lookups (#44190)
* [C++][Gandiva] Accept LLVM 19.1 (#44233)
* [C++] Unify simd header includings (#44250)
* [C++][Decimal] Use 0E+1 not 0.E+1 for broader compatibility
(#44275)
* [Packaging][C++] Enable Azure file system for deb/rpm (#44348)
- Drop apache-arrow-pr43766-boost1_86.patch
- Release notes for 18.0.0 and 19.0.0
-------------------------------------------------------------------
Fri Sep 27 05:31:41 UTC 2024 - Guang Yee <gyee@suse.com>
- Set the appropriate C++ compiler for the given platform so
it will compile on Leap 15.x.
-------------------------------------------------------------------
Wed Sep 18 06:59:36 UTC 2024 - Ben Greiner <code@bnavigator.de>
- Add apache-arrow-pr43766-boost1_86.patch for Boost 1.86
* gh#apache/arrow#43766
-------------------------------------------------------------------
Mon Aug 12 17:11:06 UTC 2024 - Ben Greiner <code@bnavigator.de>
- Update to 17.0.0
## Bug Fixes
* [C++] Add option to string center kernel to control
left/right alignment on odd number of padding (#41449)
* [C++][Python] Fix casting to extension type with fixed size
list storage type (#42219)
* [C++] Replace null_count with MayHaveNulls in
ListArrayFromArray and MapArray (#41957)
* [C++][Python] RecordBatch.filter() segfaults if passed a
ChunkedArray (#40971)
* [C++][Parquet] Timestamp conversion from Parquet to Arrow does
not follow compatibility guidelines for convertedType
* [C++] Use LargeStringArray for casting when writing tables to
CSV (#40271)
* [C++][Python] Map child Array constructed from keys and items
shouldn't have offset (#40871)
* [C++] Fix compile warning with implicitly-defined constructor
does not initialize in encoding_benchmark (#41060)
* [C++] Get null_bit_id according to are_cols_in_encoding_order
in NullUpdateColumnToRow_avx2 (#40998)
* [C++] Clean up unused parameter warnings (#41111)
* [C++][Acero] Fix asof join race (#41614)
* [C++] support for single threaded joins (#41125)
* [C++] Fix hashjoin benchmark failed at make utf8s random
batches (#41195)
* [C++] Check to avoid copying when NullBitmapBuffer is Null
(#41452)
* [C++] Fix crash on invalid Parquet file (#41366)
* [C++][Parquet] More strict Parquet level checking (#41346)
* [C++][Gandiva] Fix gandiva cache size env var (#41330)
* [C++][CMake][Windows] Remove needless .dll suffix from link
libraries (#41341)
* [C++][CMake] Remove unused ARROW_NO_DEPRECATED_API (#41345)
* [C++] Replace [[maybe_unused]] with Arrow macro (#41359)
* [C++] Support [Large]ListView and Map nested types for
scalar_if_else's kernel functions (#41419)
* [C++][Gandiva] Fix ascii_utf8 function to return same result on
x86 and Arm (#41434)
* [C++] Reuse deduplication logic for direct registration
(#41466)
* [C++] Clean up more redundant move warnings (#41487)
* [C++][Compute] Remove redundant logic for ArrayData as
ExecResults in ExecScalarCaseWhen (#41380)
* [C++][CMake] correctly use Protobuf_PROTOC_EXECUTABLE (#41582)
* [C++][CMake] Fix ARROW_USE_BOOST detect condition (#41622)
* [C++][Python] Add optional null_bitmap to MapArray::FromArrays
(#41757)
* [C++] macros.h: Fix ARROW_FORCE_INLINE for MSVC (#41712)
* [C++][Acero] Remove an useless parameter for QueryContext::Init
called in hash_join_benchmark (#41716)
* [C++] Fix the issue that temp vector stack may be under sized
(#41746)
* [C++] Check that extension metadata key is present before
attempting to delete it (#41763)
* [C++] Iterator releases its resource immediately when it reads
all values (#41824)
* [C++][Flight][Benchmark] Ensure waiting server ready (#41793)
* [C++] Fix avx2 gather offset larger than 2GB in
CompareColumnsToRows (#42188)
* [C++][S3] Fix potential deadlock when closing output stream
(#41876)
* [CI][C++] Clear cache for mamba on AppVeyor (#41977)
* [CI][Python][C++] Fix utf8proc detection for wheel on Windows
(#42022)
* [C++] Support list-views on list_slice (#42067)
* [C++] Fix an OTel test failure and remove needless logs
(#42122)
* [C++][FS][Azure] Ensure setting BlobSasBuilder::Protocol
(#42108)
* [C++] Support list-view typed arrays in array_take and
array_filter (#42117)
* [C++] Fix some potential uninitialized variable warnings
(#42207)
* [C++] Avoid invalid accesses in parquet-encoding-benchmark
(#42141)
* [C++] Use FetchContent for bundled ORC (#43011)
* [C++] Fix GetRecordBatchPayload crashes for device data
(#42199)
* [C++] Use non-stale c-ares download URL (#42250)
* [C++][Parquet] Check for valid ciphertext length to prevent
segfault (#43071)
* [C++][Compute] Mark KeyCompare.CompareColumnsToRowsLarge as
large memory test (#43128)
* [C++] Upgrade bundled google-cloud-cpp to 2.22.0 (#43136)
## New Features and Improvements
* [C++][Compute] Implement Grouper::Reset (#41352)
* [Go][C++] Implement Flight SQL Bulk Ingestion (#38385)
* [C++][FS][Azure] Support azure cli auth (#41976)
* [C++][FS][Azure] Add support for environment credential
(#41715)
* [C++] Optimize Take for fixed-size types including nested
fixed-size lists (#41297)
* [C++][Device] Add Copy/View slice functions to a CPU pointer
(#41477)
* [C++] Add support for OpenTelemetry logging (#39905)
* [C++] Import/Export ArrowDeviceArrayStream (#40807)
* [C++] move LocalFileSystem to the registry (#40356)
* [C++] Make flatbuffers serialization more deterministic
(#40392)
* [C++][Gandiva] add RE2::Options set_dot_nl(true) for Like
function (#40970)
* [C++] Introduce portable compiler assumptions (#41021)
* [C++] Add a grouper benchmark for preventing performance
regression (#41036)
* [C++] Support flatten for combining nested list related types
(#41092)
* [C++] Clean up remaining tasks related to half float casts
(#41084)
* [C++][FS][Azure] Add support for CopyFile with hierarchical
namespace support (#41276)
* [C++] Add is_validity_defined_by_bitmap() predicate (#41115)
* [C++] IO: enhance boundary checking in CompressedInputStream
(#41117)
* [C++][Python] Expose recursive flatten for lists on
list_flatten kernel function and pyarrow bindings (#41295)
* [C++][Parquet][Doc] Denote PARQUET:field_id in parquet.rst
(#41187)
* [C++] Extract the kernel loops used for PrimitiveTakeExec and
generalize to any fixed-width type (#41373)
* [C++][Acero] Use per-node basis temp vector stack to mitigate
overflow (#41335)
* [C++][Parquet] Optimize DelimitRecords by batch execution when
max_rep_level > 1 (#41362)
* [C++][FS][Azure][Docs] Add AzureFileSystem to Filesystems API
reference (#41411)
* [C++] Use ASAN to poison temp vector stack memory (#41695)
* [C++][S3] Add a new option to check existence before CreateDir
(#41822)
* [C++][Parquet] Fix
DeltaLengthByteArrayEncoder::EstimatedDataEncodedSize (#41546)
* [C++] Thirdparty: Upgrade xsimd to 13.0.0 (#41548)
* [C++] Improve fixed_width_test_util.h (#41575)
* [C++] ChunkResolver: Implement ResolveMany and add unit tests
(#41561)
* [C++] fixed_width_internal.h: Simplify docstring and support
bit-sized types (BOOL) (#41597)
* [C++][Python] Extends the add_key_value to parquet::arrow and
PyArrow (#41633)
* [C++][CMake][Windows] Don't build needless object libraries
(#41658)
* [C++][Python] PrettyPrint non-cpu data by copying to default
CPU device (#42010)
* [C++][Parquet] Thrift: generate template method to accelerate
reading thrift (#41703)
* [C++][Parquet] Minor: moving EncodedStats by default rather
than copying (#41727)
* [C++][ORC] Ensure setting detected ORC version (#41767)
* [C++][Parquet] Add file metadata read/write benchmark (#41761)
* [C++] Make git-dependent definitions internal (#41781)
* [C++][S3] Remove GetBucketRegion hack for newer AWS SDK
versions (#41798)
* [C++][Parquet] normalize dictionary encoding to use
RLE_DICTIONARY (#41819)
* [C++] IPC: Minor enhance the code of writer (#41900)
* [C++] Fix ExecuteScalar deduce all_scalar with chunked_array
(#41925)
* [C++] Minor enhance code style for FixedShapeTensorType
(#41954)
* [C++] Follow up of adding null_bitmap to MapArray::FromArrays
(#41956)
* [C++] Misc changes making code around list-like types and
list-view types behave the same way (#41971)
* [C++] kernel.cc: Remove defaults on switch so that compiler
can check full enum coverage for us (#41995)
* [C++][Parquet] ParquetFilePrinter::JSONPrint print length of
FLBA (#41981)
* [C++][CMake] Add preset for Valgrind (#42110)
* [C++] Move TakeXXX free functions into TakeMetaFunction and
make them private (#42127)
* [C++][FS][Azure] Validate
AzureOptions::{blob,dfs}_storage_scheme (#42135)
* [C++] list_parent_indices: Add support for list-view types
(#42236)
* [C++] Reduce the recursion of many-join test (#43042)
* [C++] Limit buffer size in BufferedInputStream::SetBufferSize
with raw_read_bound (#43064)
- Require cmake lz4 for 1.10
-------------------------------------------------------------------
Sun Apr 21 16:35:21 UTC 2024 - Ben Greiner <code@bnavigator.de>


@@ -1,7 +1,7 @@
#
# spec file for package apache-arrow
#
# Copyright (c) 2024 SUSE LLC
# Copyright (c) 2025 SUSE LLC
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
@@ -16,19 +16,29 @@
#
%bcond_without tests
%bcond_without flight
# Remove static build due to devel-static packages being required by the generated CMake Targets
%bcond_with static
%bcond_without tests
# Required for runtime dispatch, not yet packaged
%bcond_with xsimd
%define sonum 1600
%if %{suse_version} <= 1500
# requires __has_builtin with keywords
%define gccver 13
%endif
%define sonum 2000
# See git submodule /testing pointing to the correct revision
%define arrow_testing_commit 25d16511e8d42c2744a1d94d90169e3a36e92631
%define arrow_testing_commit d2a13712303498963395318a4eb42872e66aead7
# See git submodule /cpp/submodules/parquet-testing pointing to the correct revision
%define parquet_testing_commit 74278bc4a1122d74945969e6dec405abd1533ec3
%define parquet_testing_commit 18d17540097fca7c40be3d42c167e6bfad90763c
# See cpp/thirdparty/versions.txt, replace by BuildRequires: pkgconfig(mimalloc) as soon as gh#apache/arrow#42211 is resolved
# mimalloc version bumped, see Patch100
%define arrow_mimalloc_build_version v2.0.9
Name: apache-arrow
Version: 16.0.0
Version: 20.0.0
Release: 0
Summary: A development platform for in-memory data
License: Apache-2.0 AND BSD-3-Clause AND BSD-2-Clause AND MIT
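
Review note: the `sonum` change from 1600 to 2000 in the hunk above moves together with the `Version` change from 16.0.0 to 20.0.0. A minimal Python sketch of that relationship follows, assuming the shared-library number is derived as major * 100 + minor, which is consistent with both value pairs in this diff. The helper name `sonum_for` is hypothetical and is not part of the spec file.

```python
def sonum_for(version: str) -> int:
    """Return the expected soname number for a release string like "20.0.0".

    Assumption (not stated in the spec file): sonum = major * 100 + minor.
    This matches the two pairs visible in the diff:
    16.0.0 -> 1600 and 20.0.0 -> 2000.
    """
    major, minor, _patch = (int(part) for part in version.split("."))
    return major * 100 + minor


if __name__ == "__main__":
    # Print the derived value for a few release strings from the changelog.
    for version in ("16.0.0", "19.0.1", "20.0.0"):
        print(version, "->", sonum_for(version))
```

A check like this could be run during a version update to confirm that `%define sonum` still agrees with `Version:`.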
@@ -38,12 +48,17 @@ URL: https://arrow.apache.org/
Source0: https://github.com/apache/arrow/archive/apache-arrow-%{version}.tar.gz
Source1: https://github.com/apache/arrow-testing/archive/%{arrow_testing_commit}.tar.gz#/arrow-testing-%{version}.tar.gz
Source2: https://github.com/apache/parquet-testing/archive/%{parquet_testing_commit}.tar.gz#/parquet-testing-%{version}.tar.gz
Source3: https://github.com/microsoft/mimalloc/archive/%{arrow_mimalloc_build_version}.tar.gz#/mimalloc-%{arrow_mimalloc_build_version}.tar.gz
Patch100: apache-arrow-19.0.1-mimalloc-version.patch
BuildRequires: bison
BuildRequires: cmake >= 3.16
BuildRequires: cmake >= 3.25
BuildRequires: fdupes
BuildRequires: flex
BuildRequires: gcc-c++
BuildRequires: gcc%{?gccver}-c++
BuildRequires: libboost_context-devel
BuildRequires: libboost_date_time-devel
BuildRequires: libboost_filesystem-devel
BuildRequires: libboost_process-devel
BuildRequires: libboost_system-devel >= 1.64.0
%if %{with static}
BuildRequires: libzstd-devel-static
@@ -51,6 +66,7 @@ BuildRequires: libzstd-devel-static
BuildRequires: pkgconfig
BuildRequires: python-rpm-macros
BuildRequires: python3-base
BuildRequires: (cmake(lz4) >= 1.10 or (pkgconfig(liblz4) >= 1.8.3 with pkgconfig(liblz4) < 1.10))
BuildRequires: cmake(Snappy) >= 1.1.7
BuildRequires: cmake(absl)
BuildRequires: cmake(double-conversion) >= 3.1.5
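
Review note: the rich dependency added in this hunk accepts either the lz4 CMake package at 1.10 or newer, or the pkg-config `liblz4` module in the range from 1.8.3 up to, but not including, 1.10. A rough Python sketch of that logic follows. It assumes versions are plain dotted integers (real RPM version comparison is more involved) and that each capability is offered by at most one package. The names `parse` and `lz4_requirement_met` are hypothetical and exist only for illustration.

```python
from typing import Optional, Tuple

Version = Tuple[int, ...]


def parse(version: str) -> Version:
    """Turn "1.8.3" into (1, 8, 3). Simplified: dotted integers only."""
    return tuple(int(part) for part in version.split("."))


def lz4_requirement_met(cmake_lz4: Optional[str],
                        pkgconfig_liblz4: Optional[str]) -> bool:
    """Evaluate the boolean dependency from the spec file:

        (cmake(lz4) >= 1.10
         or (pkgconfig(liblz4) >= 1.8.3 with pkgconfig(liblz4) < 1.10))

    Each argument is the version offered for that capability,
    or None if no installed package provides it.
    """
    # First alternative: the CMake config package, 1.10 or newer.
    if cmake_lz4 is not None and parse(cmake_lz4) >= parse("1.10"):
        return True
    # Second alternative: pkg-config module within [1.8.3, 1.10).
    if pkgconfig_liblz4 is not None:
        version = parse(pkgconfig_liblz4)
        return parse("1.8.3") <= version < parse("1.10")
    return False
```

This mirrors the changelog entry "Require cmake lz4 for 1.10": once lz4 reaches 1.10, only the CMake package satisfies the requirement, while older releases are still accepted through pkg-config.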
@@ -64,7 +80,6 @@ BuildRequires: pkgconfig(libbrotlidec) >= 1.0.7
BuildRequires: pkgconfig(libbrotlienc) >= 1.0.7
BuildRequires: pkgconfig(libcares) >= 1.15.0
BuildRequires: pkgconfig(libglog) >= 0.3.5
BuildRequires: pkgconfig(liblz4) >= 1.8.3
BuildRequires: pkgconfig(libopenssl)
BuildRequires: pkgconfig(liburiparser) >= 0.9.3
BuildRequires: pkgconfig(libutf8proc)
@@ -177,15 +192,19 @@ Group: Development/Libraries/C and C++
Requires: libarrow%{sonum} = %{version}
Requires: libarrow_acero%{sonum} = %{version}
Requires: libarrow_dataset%{sonum} = %{version}
%if %{with flight}
Requires: libarrow_flight%{sonum} = %{version}
Requires: libarrow_flight_sql%{sonum} = %{version}
%endif
%if %{with static}
Suggests: %{name}-devel-static = %{version}
Suggests: %{name}-acero-devel-static = %{version}
Suggests: %{name}-dataset-devel-static = %{version}
%if %{with flight}
Suggests: %{name}-flight-devel-static = %{version}
Suggests: %{name}-flight-sql-devel-static = %{version}
%endif
%endif
%description devel
Apache Arrow is a cross-language development platform for in-memory
@@ -329,8 +348,11 @@ This package provides utilities for working with the Parquet format.
sed -i 's/find_package(Protobuf/find_package(Protobuf CONFIG/' cpp/cmake_modules/FindProtobufAlt.cmake
%build
%{?gccver:export CXX=g++-%{gccver}}
%{?gccver:export CC=gcc-%{gccver}}
export CFLAGS="%{optflags} -ffat-lto-objects"
export CXXFLAGS="%{optflags} -ffat-lto-objects"
export ARROW_MIMALLOC_URL=%{SOURCE3}
pushd cpp
%cmake \
@@ -351,14 +373,15 @@ pushd cpp
-DARROW_CSV:BOOL=ON \
-DARROW_DATASET:BOOL=ON \
-DARROW_FILESYSTEM:BOOL=ON \
%if %{with flight}
-DARROW_FLIGHT:BOOL=ON \
-DARROW_FLIGHT_SQL:BOOL=ON \
%endif
-DARROW_GANDIVA:BOOL=OFF \
-DARROW_SKYHOOK:BOOL=OFF \
-DARROW_HDFS:BOOL=ON \
-DARROW_HIVESERVER2:BOOL=OFF \
-DARROW_IPC:BOOL=ON \
-DARROW_JEMALLOC:BOOL=OFF \
-DARROW_JSON:BOOL=ON \
-DARROW_ORC:BOOL=OFF \
-DARROW_PARQUET:BOOL=ON \
@@ -387,16 +410,20 @@ pushd cpp
popd
%if %{with tests}
rm %{buildroot}%{_libdir}/libarrow_testing.so*
rm %{buildroot}%{_libdir}/libarrow_flight_testing.so*
rm %{buildroot}%{_libdir}/pkgconfig/arrow-testing.pc
rm -Rf %{buildroot}%{_libdir}/cmake/ArrowTesting
rm -Rf %{buildroot}%{_includedir}/arrow/testing
%if %{with flight}
rm %{buildroot}%{_libdir}/libarrow_flight_testing.so*
rm %{buildroot}%{_libdir}/pkgconfig/arrow-flight-testing.pc
rm -Rf %{buildroot}%{_libdir}/cmake/ArrowFlightTesting
%endif
%if %{with static}
rm %{buildroot}%{_libdir}/libarrow_testing.a
%if %{with flight}
rm %{buildroot}%{_libdir}/libarrow_flight_testing.a
%endif
rm -Rf %{buildroot}%{_libdir}/cmake/ArrowTesting
rm -Rf %{buildroot}%{_libdir}/cmake/ArrowFlightTesting
rm -Rf %{buildroot}%{_includedir}/arrow/testing
%endif
%endif
rm -r %{buildroot}%{_datadir}/doc/arrow/
%fdupes %{buildroot}%{_libdir}/cmake
@@ -421,7 +448,7 @@ if [ -n "${GTEST_failing}" ]; then
fi
%ifarch s390x
# bsc#1218592
exclude_regex='--exclude-regex (arrow-dataset-file-parquet-test|parquet-internals-test|parquet-reader-test|parquet-arrow-test|parquet-arrow-internals-test|parquet-encryption-test|arquet-encryption-key-management-test)'
exclude_regex='--exclude-regex (arrow-dataset-file-parquet-test|parquet-internals-test|parquet-reader-test|parquet-arrow-test|parquet-arrow-internals-test|parquet-encryption-test|parquet-encryption-key-management-test)'
%endif
%ctest --label-regex unittest $exclude_regex
popd
@@ -431,54 +458,60 @@ popd
%postun -n libarrow%{sonum} -p /sbin/ldconfig
%post -n libarrow_acero%{sonum} -p /sbin/ldconfig
%postun -n libarrow_acero%{sonum} -p /sbin/ldconfig
%if %{with flight}
%post -n libarrow_flight%{sonum} -p /sbin/ldconfig
%postun -n libarrow_flight%{sonum} -p /sbin/ldconfig
%post -n libarrow_flight_sql%{sonum} -p /sbin/ldconfig
%postun -n libarrow_flight_sql%{sonum} -p /sbin/ldconfig
%endif
%post -n libarrow_dataset%{sonum} -p /sbin/ldconfig
%postun -n libarrow_dataset%{sonum} -p /sbin/ldconfig
%post -n libparquet%{sonum} -p /sbin/ldconfig
%postun -n libparquet%{sonum} -p /sbin/ldconfig
%files
%license LICENSE.txt NOTICE.txt header
%license LICENSE.txt NOTICE.txt
%{_bindir}/arrow-file-to-stream
%{_bindir}/arrow-stream-to-file
%files -n libarrow%{sonum}
%license LICENSE.txt NOTICE.txt header
%license LICENSE.txt NOTICE.txt
%{_libdir}/libarrow.so.*
%files -n libarrow_acero%{sonum}
%license LICENSE.txt NOTICE.txt header
%license LICENSE.txt NOTICE.txt
%{_libdir}/libarrow_acero.so.*
%if %{with flight}
%files -n libarrow_flight%{sonum}
%license LICENSE.txt NOTICE.txt header
%license LICENSE.txt NOTICE.txt
%{_libdir}/libarrow_flight.so.*
%files -n libarrow_flight_sql%{sonum}
%license LICENSE.txt NOTICE.txt header
%license LICENSE.txt NOTICE.txt
%{_libdir}/libarrow_flight_sql.so.*
%endif
%files -n libarrow_dataset%{sonum}
%license LICENSE.txt NOTICE.txt header
%license LICENSE.txt NOTICE.txt
%{_libdir}/libarrow_dataset.so.*
%files -n libparquet%{sonum}
%license LICENSE.txt NOTICE.txt header
%license LICENSE.txt NOTICE.txt
%{_libdir}/libparquet.so.*
%files devel
%doc README.md
%license LICENSE.txt NOTICE.txt header
%license LICENSE.txt NOTICE.txt
%{_includedir}/arrow/
%{_libdir}/cmake/Arrow*
%{_libdir}/libarrow.so
%{_libdir}/libarrow_acero.so
%{_libdir}/libarrow_dataset.so
%if %{with flight}
%{_libdir}/libarrow_flight.so
%{_libdir}/libarrow_flight_sql.so
%endif
%{_libdir}/pkgconfig/arrow*.pc
%dir %{_datadir}/arrow
%{_datadir}/arrow/gdb
@@ -490,29 +523,31 @@ popd
%if %{with static}
%files devel-static
%license LICENSE.txt NOTICE.txt header
%license LICENSE.txt NOTICE.txt
%{_libdir}/libarrow.a
%files acero-devel-static
%license LICENSE.txt NOTICE.txt header
%license LICENSE.txt NOTICE.txt
%{_libdir}/libarrow_acero.a
%files dataset-devel-static
%license LICENSE.txt NOTICE.txt header
%license LICENSE.txt NOTICE.txt
%{_libdir}/libarrow_dataset.a
%if %{with flight}
%files flight-devel-static
%license LICENSE.txt NOTICE.txt header
%license LICENSE.txt NOTICE.txt
%{_libdir}/libarrow_flight.a
%files flight-sql-devel-static
%license LICENSE.txt NOTICE.txt header
%license LICENSE.txt NOTICE.txt
%{_libdir}/libarrow_flight_sql.a
%endif
%endif
%files -n apache-parquet-devel
%doc README.md
%license LICENSE.txt NOTICE.txt header
%license LICENSE.txt NOTICE.txt
%{_includedir}/parquet/
%{_libdir}/cmake/Parquet
%{_libdir}/libparquet.so
@@ -520,13 +555,13 @@ popd
%if %{with static}
%files -n apache-parquet-devel-static
%license LICENSE.txt NOTICE.txt header
%license LICENSE.txt NOTICE.txt
%{_libdir}/libparquet.a
%endif
%files -n apache-parquet-utils
%doc README.md
%license LICENSE.txt NOTICE.txt header
%license LICENSE.txt NOTICE.txt
%{_bindir}/parquet-*
%changelog
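
Note on the s390x hunk in the test section above: the only change is the last alternative of the exclude pattern, from `arquet-encryption-key-management-test` to `parquet-encryption-key-management-test`. The sketch below suggests why the misspelled pattern most likely still excluded the intended test. It assumes that ctest applies `--exclude-regex` as an unanchored search, which is emulated here with Python's `re.search`. The `excluded` helper and the name `arrow-compute-test` are illustrative and not taken from the spec file.

```python
import re

# Pattern before the fix (last alternative misspelled) and after it.
OLD = ("(arrow-dataset-file-parquet-test|parquet-internals-test|parquet-reader-test|"
       "parquet-arrow-test|parquet-arrow-internals-test|parquet-encryption-test|"
       "arquet-encryption-key-management-test)")
NEW = OLD.replace("|arquet-", "|parquet-")


def excluded(pattern, names):
    """Return the names that an unanchored search with `pattern` would exclude."""
    rx = re.compile(pattern)
    return [name for name in names if rx.search(name)]


names = [
    "parquet-encryption-key-management-test",
    "parquet-encryption-test",
    "arrow-compute-test",  # hypothetical name that should keep running
]

print(excluded(OLD, names))
print(excluded(NEW, names))
```

Under that assumption both patterns select the same tests, because `arquet-...` matches as a substring of `parquet-...`; the corrected spelling mainly makes the intent readable.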


@@ -1,3 +0,0 @@
version https://git-lfs.github.com/spec/v1
oid sha256:87fa36b469cac0a0c95596e7be39548ddf20c8f737a02ea559e30fbebd12c7d3
size 3571960


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9cca062005e329a6a60a30e28f509f5f4bd12384035b64fcaab19a5a46343cc1
size 3572581

mimalloc-v2.0.9.tar.gz Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4a29edae32a914a706715e2ac8e7e4109e25353212edeed0888f4e3e15db5850
size 1143452


@@ -1,3 +0,0 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ac6331205baec1b97e8115de22efaf84561483623e5792d58060e91e84304bce
size 1037654


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4496522640dc88635a8bf3c8e7572a5815549188fa00df132eef6e2a97ce0652
size 1077258
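
The three-line files in the hunks above are Git LFS pointers: a spec version URL, an `oid` of the form `<algorithm>:<hex digest>`, and the size in bytes. As a minimal sketch, a downloaded tarball could be checked against such a pointer as follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the pointer text is copied from the last hunk above.

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the `key value` lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(pointer, data):
    """Check that `data` (bytes) has the size and digest the pointer records."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:4496522640dc88635a8bf3c8e7572a5815549188fa00df132eef6e2a97ce0652
size 1077258
"""

pointer = parse_lfs_pointer(POINTER)
print(pointer["algo"], pointer["size"])
```

For a real check, `data` would be the bytes of the corresponding source archive, for example `open(path, "rb").read()`.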


@@ -1,26 +0,0 @@
Index: arrow-apache-arrow-16.0.0/python/pyarrow/tests/test_array.py
===================================================================
--- arrow-apache-arrow-16.0.0.orig/python/pyarrow/tests/test_array.py
+++ arrow-apache-arrow-16.0.0/python/pyarrow/tests/test_array.py
@@ -3323,7 +3323,7 @@ def test_numpy_array_protocol():
result = np.asarray(arr)
np.testing.assert_array_equal(result, expected)
- if Version(np.__version__) < Version("2.0"):
+ if Version(np.__version__) < Version("2.0.0rc1"):
# copy keyword is not strict and not passed down to __array__
result = np.array(arr, copy=False)
np.testing.assert_array_equal(result, expected)
Index: arrow-apache-arrow-16.0.0/python/pyarrow/tests/test_table.py
===================================================================
--- arrow-apache-arrow-16.0.0.orig/python/pyarrow/tests/test_table.py
+++ arrow-apache-arrow-16.0.0/python/pyarrow/tests/test_table.py
@@ -3244,7 +3244,7 @@ def test_numpy_array_protocol(constructo
table = constructor([[1, 2, 3], [4.0, 5.0, 6.0]], names=["a", "b"])
expected = np.array([[1, 4], [2, 5], [3, 6]], dtype="float64")
- if Version(np.__version__) < Version("2.0"):
+ if Version(np.__version__) < Version("2.0.0rc1"):
# copy keyword is not strict and not passed down to __array__
result = np.array(table, copy=False)
np.testing.assert_array_equal(result, expected)
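
The patch removed above changed the guard from `Version("2.0")` to `Version("2.0.0rc1")`. The reason is that a release candidate sorts before the final release, so with the old guard NumPy 2.0.0rc1 still took the pre-2.0 branch. The sketch below illustrates that ordering with a simplified, standard-library-only stand-in for the pre-release ordering that `packaging.version.Version` follows. `version_key` is a hypothetical helper and handles only plain `X.Y.Z` releases with an optional `a`, `b` or `rc` suffix.

```python
import re

# Pre-release phases sort in this order; a final release sorts after all of them.
_PRE_RANK = {"a": 0, "b": 1, "rc": 2}
_FINAL_RANK = len(_PRE_RANK)


def version_key(text):
    """Return a sortable key for versions such as '2.0' or '2.0.0rc1'."""
    match = re.fullmatch(r"(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?", text)
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    release = [int(part) for part in match.group(1).split(".")]
    release += [0] * (3 - len(release))  # treat '2.0' like '2.0.0'
    if match.group(2) is None:
        pre = (_FINAL_RANK, 0)
    else:
        pre = (_PRE_RANK[match.group(2)], int(match.group(3)))
    return (tuple(release), pre)


numpy_version = "2.0.0rc1"
# Old guard: true for the release candidate, so the pre-2.0 branch still ran.
print(version_key(numpy_version) < version_key("2.0"))
# Patched guard: false for the release candidate.
print(version_key(numpy_version) < version_key("2.0.0rc1"))
```

With the changelog entry for 17.0.0 dropping this patch, the comparison is presumably handled upstream; the sketch only explains the ordering issue.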


@@ -1,3 +1,340 @@
-------------------------------------------------------------------
Fri Jun 13 18:22:38 UTC 2025 - Ben Greiner <code@bnavigator.de>
- Update to 20.0.0
## Bug Fixes
* GH-36628 - [Python][Parquet] Fail when instantiating internal
Parquet metadata classes (#45549)
* GH-37630 - [C++][Python][Dataset] Allow disabling fragment
metadata caching (#45330)
* GH-44188 - [Python] Fix pandas roundtrip with bytes column
names (#44171)
* GH-45129 - [Python][C++] Fix usage of deprecated C++
functionality on pyarrow (#45189)
* GH-45155 - [Python][CI] Fix path for scientific nightly windows
wheel upload (#45222)
* GH-45169 - [Python] Adapt to modified pytest ignore collect
hook api (#45170)
* GH-45380 - [Python] Expose RankQuantileOptions to Python
(#45392)
* GH-45530 - [Python][Packaging] Add pyarrow.libs dir to
get_library_dirs (#45766)
* GH-45582 - [Python] Preserve decimal32/64/256 metadata in
Schema.metadata (#45583)
* GH-45733 - [C++][Python] Add biased/unbiased toggle to skew and
kurtosis functions (#45762)
* GH-45739 - [C++][Python] Fix crash when calling
hash_pivot_wider without options (#45740)
* GH-45758 - [Python] Add AzureFileSystem documentation (#45759)
* GH-45926 - [Python] Use pytest.approx for float values on
unbiased skew and kurtosis tests (#45929)
* GH-46041 - [Python][Packaging] Temporarily remove pandas from
being installed on free-threaded Windows wheel tests (#46042)
## New Features and Improvements
* GH-14932 - [Python] Add python bindings for JSON streaming
reader (#45084)
* GH-35289 - [Python] Support large variable width types in numpy
conversion (#36701)
* GH-36412 - [Python][CI] Fix deprecation warnings in the pandas
nightly build
* GH-39010 - [Python] Introduce maps_as_pydicts parameter for
to_pylist, to_pydict, as_py (#45471)
* GH-41002 - [Python] Remove pins for pytest-cython and
conda-docs pytest (#45240)
* GH-41985 - [Python][Docs] Clarify docstring of
pyarrow.compute.scalar() (#45668)
* GH-43587 - [Python] Remove no longer used serialize/deserialize
PyArrow C++ code (#45743)
* GH-44421 - [Python] Add configuration for building & testing
free-threaded wheels on Windows (#44804)
* GH-44790 - [Python] Remove use_legacy_dataset from code base
(#45742)
* GH-45156 - [Python][Packaging] Refactor Python Windows wheel
images to use newer base image (#45442)
* GH-45237 - [Python] Raise minimum supported cython to >=3
(#45238)
* GH-45278 - [Python][Packaging] Updated delvewheel install
command and updated flags used with delvewheel repair (#45323)
* GH-45282 - [Python][Parquet] Remove unused readonly properties
of ParquetWriter (#45281)
* GH-45288 - [Python][Packaging][Docs] Update documentation for
PyArrow nightly wheels (#45289)
* GH-45358 - [C++][Python] Add MemoryPool method to print
statistics (#45359)
* GH-45433 - [Python] Remove Cython workarounds (#45437)
* GH-45457 - [Python] Add pyarrow.ArrayStatistics (#45550)
* GH-45482 - [CI][Python] Don't use Ubuntu 20.04 for wheel test
(#45483)
* GH-45570 - [Python] Allow Decimal32/64Array.to_pandas (#45571)
* GH-45676 - [C++][Python][Compute] Add skew and kurtosis
functions (#45677)
* GH-45680 - [C++][Python] Remove deprecated functions in 20.0
* GH-45705 - [Python] Add support for SAS token in
AzureFileSystem (#45706)
* GH-45755 - [C++][Python][Compute] Add winsorize function
(#45763)
* GH-45848 - [C++][Python][R] Remove deprecated PARQUET_2_0
(#45849)
* GH-45920 - [Release][Python] Upload sdist and wheels to GitHub
Releases not apache.jfrog.io (#45962)
-------------------------------------------------------------------
Mon Feb 17 19:17:26 UTC 2025 - Ben Greiner <code@bnavigator.de>
- Update to 19.0.1
## Bug Fixes
* [Python][CI] Make download_tzdata_on_windows more robust and
use tzdata package for tzinfo database on Windows for ORC
(#45425)
* [Python] Only enable the string dtype on pandas export for
pandas>=2.3 (#45383) [Python] Fix version comparison in pandas
compat for pandas 2.3 dev version (#45428)
## Improvements
* [CI][Python] Temporarily avoid newer boto3 version (#45311)
[CI] Bump Minio version and unpin boto3 (#45320)
- Release 19.0.0
## New Features and Improvements
* [Python] Add more FlightInfo / FlightEndpoint attributes
(#43537)
* [Python] Support Arrow PyCapsule stream objects in
write_dataset (#43771)
* [Python] Support pandas future default string dtype
* [CI][Python] Use GitHub Packages for vcpkg cache (#44644)
* [Python] Add Python wrapper for JsonExtensionType (#44070)
* [Python][C++] Add version suffix to libarrow_python* libraries
(#44702)
* [Python] Add support for Decimal32 and Decimal64 types (#44882)
* [C++][Python] Add Hyperbolic Trig functions (#44630)
* [Python] Clean-up name / field_name handling in pandas compat
(#44963)
* [CI][Python][Packaging] Test 3.12 wheels on Ubuntu 24.04
(#45042)
* [CI][Packaging][Python] Simplify
dev/tasks/python-wheels/github.linux.yml (#45077)
* [Python] Honor the strings_to_categorical keyword in to_pandas
for string view type (#45176)
## Bug Fixes
* [C++][Python] Fix ORC crash when file contains unknown timezone
(#45051)
* [Python] Converting month_day_nano_interval to numpy crashes
* [Python] Allow from_buffers to work with StringView on Python
(#44701)
* [C++][Python] Fix Flight Timestamp precision, revert workaround
from #43537 (#44681)
* [Docs][Python] Add missing canonical extension types to PyArrow
arrays and datatypes docs (#44880)
* [Python] Trigger manual Garbage collection before checking
allocated bytes for dlpack tests (#44793)
* [Python][Packaging] Use delvewheel to repair Windows wheels
(#35323)
* [CI][Python] Fix and modernize AppVeyor build (#44999)
* [Python][Docs] Update docstrings for metadata methods on Field
and Schema classes (#45004)
* [CI][Python] Fix test_memory failures (#45007)
* [CI][Packaging][Python] Fix Docker push step for free-threaded
wheel builds (#45040)
* [Packaging][Python] Use ORC from vcpkg instead of bundled on
Linux and macOS (#45046)
- Release 18.1.0
## Bug Fixes
* [Release][Packaging][Python] Set PARQUET_TEST_DATA on
verify-release-candidate-wheels.bat (#44462)
## New Features and Improvements
- Release 18.0.0
## Bug Fixes
* [Python][Packaging] Bump MACOSX_DEPLOYMENT_TARGET to 12 instead
of 11 (#43137)
* [Release][Packaging][Python] Add tzdata as conda env
requirement to avoid ORC failure (#43233)
* [Python] Give precedence to pycapsule interface in
pa.schema(..) (#43486)
* [Python] Sanitize Python reference handling in UDF
implementation (#43557)
* [Python] Allow tuple for rename columns (#43609)
* [Packaging][Python] Fix vcpkg version detection in macOS wheel
build jobs (#43615)
* [Python] Fix compilation on Cython<3 (#43765)
* [Python][CI] Correct PARQUET_TEST_DATA path in wheel tests
(#43786)
* [CI][Packaging][Python] Avoid uploading wheel to gemfury if
version already exists (#43816)
* [CI][Python] Skip test that requires PARQUET_TEST_DATA env on
emscripten (#43906)
* [Python] Fix threading issues with borrowed refs and pandas
(#44047)
* [Benchmarking][Python] Avoid uwsgi install failure on macOS
(#44221)
* [CI][Release][Python] Do not verify Python on Ubuntu 20.04
(#44254)
* [CI][Python] Remove ds requirement from test collection on
test_dataset.py (#44370)
## New Features and Improvements
* [C++][Python] Native support for UUID (#37298)
* [C++][Python] Bool8 Extension Type Implementation (#43488)
* [Python] Make NumPy an optional runtime dependency (#41904)
* [Python] Add StructType attribute to access all its fields
(#43481)
* [CI][Python] Use pipx to install GCS testbench (#43852)
* [Python][CI][Packaging] Don't upload sdist to scientific-python
nightly channel (only wheels) (#43943)
* [Python][CI][Packaging] Upload nightly wheels to main label of
scientific-python-nightly-wheels channel (#43932)
* [CI][Packaging][Python] Upload pyarrow nightly wheels to
scientific python channel on Anaconda (#43862)
* [C++][Python][Parquet] Support reading/writing key-value
metadata from/to ColumnChunkMetaData (#41580)
* [Python] Ensure (Chunked)Array/RecordBatch/Table methods don't
crash with non-CPU data
* [Python] Let StructArray.from_array accept a type in addition
to names or fields (#43047)
* [Python] Test FlightStreamReader iterator (#42086)
* [Python] Add bindings for CopyTo on RecordBatch and Array
classes (#42223)
* [Python] Use Py_IsFinalizing from pythoncapi_compat.h (#43767)
* [Python] Add bindings for memory manager and device to Context
class (#43392)
* [C++][Python] Add Opaque canonical extension type (#43458)
* [Python] Deprecate passing build flags to setup.py (#43515)
* [Python][Packaging][CI] Drop Python 3.8 support (#43970)
* [Python][CI] Add Python 3.13 conda test build (#44192)
* [Python][CI][Packaging] Use released versions to build and test
wheels on Python 3.13 (#44193)
* [Python] Set up wheel building for Python 3.13 (#43539)
* [Python] Remove usage of deprecated pkg_resources in setup.py
(#43602)
* [Python][CI] Add a Crossbow job with the free-threaded build
(#43671)
* [Python] Do not use borrowed references APIs (#43540)
* [Python] Declare support for free-threading in Cython (#43606)
* [Python][CI] Add a Crossbow job with a debug CPython
interpreter (#43565)
* [Python][Dataset] Python / Cython interface to C++
arrow::dataset::Partitioning::Format (#43740)
* [Python][CI] Simplify python/requirements-wheel-test.txt file
(#43691)
* [Python] RecordBatch fails gracefully on non-cpu devices
(#43729)
* [Python] ChunkedArray fails gracefully on non-cpu devices
(#43795)
* [Python][Packaging] Remove numpy dependency from pyarrow
packaging (#44148)
* [Python] Build macOS and manylinux wheels for free-threading
(#43965)
* [Python] Table fails gracefully on non-cpu devices (#43974)
* [Python] Deprecate the no longer used serialize/deserialize
Pyarrow C++ functions (#44064)
* [CI][Python] Enable S3 testing on Windows wheel builds (#44093)
* [CI][Python] Enable S3 tests on macOS CI (#44129)
* [Packaging][Python] Use macOS 12 as deployment target to have
macOS 12 pyarrow wheels (#44315)
* [Packaging][Python] Disable interactive deb configuration in
wheel-manylinux--cp313t- (#44362)
- Drop pyarrow-pr433325-extradirs.patch
-------------------------------------------------------------------
Thu Sep 26 23:24:22 UTC 2024 - Guang Yee <gyee@suse.com>
- Enable sle15_python_module_pythons.
-------------------------------------------------------------------
Wed Aug 14 20:27:48 UTC 2024 - Ben Greiner <code@bnavigator.de>
- Update to 17.0.0
## Bug Fixes
* [C++][Python] Fix casting to extension type with fixed size
list storage type (#42219)
* [Python] Include metadata when creating pa.schema from
PyCapsule (#41538)
* [C++][Python] RecordBatch.filter() segfaults if passed a
ChunkedArray (#40971)
* [Python] pa.array: add check for byte-swapped numpy arrays
inside python objects (#41549)
* [Python] Fix read_table for encrypted parquet (#39438)
* [Python] RunEndEncodedArray.from_arrays: bugfix for Array
arguments (#40560) (#41093)
* [C++][Python] Map child Array constructed from keys and items
shouldn't have offset (#40871)
* [Python] `test_numpy_array_protocol` test failures with numpy
2.0.0rc1
* [Python] Fix StructArray.sort() for by=None (#41495)
* [Python] Build with Python 3.13 (#42034)
* [Python] remove special methods related to buffers in python
<2.6 (#41492)
* [Python] Fix reading column index with decimal values (#41503)
* [Docs][Python] Remove duplicate contents (#41588)
* [C++][Python] Add optional null_bitmap to MapArray::FromArrays
(#41757)
* [Python][Parquet] Implement to_dict method on SortingColumn
(#41704)
* [Python] CMake: ignore Parquet encryption option if Parquet
itself is not enabled (fix Java integration build) (#41776)
* [Python] Disallow direct pa.RecordBatchReader() construction to
avoid segfaults (#41773)
* [Python] Fix RecordBatchReader.cast to support casting to equal
schema for all types (#42098)
* [Python] Fix tests when using NumPy 2.0 on Windows (#42099)
* [CI][Python] Use pip install -e instead of setup.py build_ext
inplace for installing pyarrow on verification script (#42007)
* [CI][Python][C++] Fix utf8proc detection for wheel on Windows
(#42022)
* [Python][CI] Update expected output for numpy 2.0.0 (#42172)
## New Features and Improvements
* [Python] Replace pandas.util.testing.rands with vendored
version (#42089)
* [Python] begin moving static settings to pyproject.toml
(#41041)
* [Python] Implement PyCapsule interface for Device data in
PyArrow (#40717)
* [Python] Expand the Arrow PyCapsule Interface with C Device
Data support (#40708)
* [Python] Let RecordBatch.filter accept a boolean expression in
addition to mask array (#43043)
* [Python] Fix pickling of LocalFileSystem for cython 2 (#41459)
* [Python] Expand the C Device Interface bindings to support
import on CUDA device (#40385)
* [Python] Allow passing a mapping of column names to
rename_columns (#40645)
* [Python][Packaging] Strip unnecessary symbols when building
wheels (#42028)
* [Python][Docs] Update PyArrow installation docs for conda
package split (#41135)
* [Python] Basic bindings for Device and MemoryManager classes
(#41685)
* [C++][Python] Expose recursive flatten for lists on
list_flatten kernel function and pyarrow bindings (#41295)
* [Python][Packaging] Ensure to build with released numpy 2.0
(instead of RC) in the wheel building workflows (#42194)
* [CI][Python] Add a job on ARM64 macOS (#41313)
* [CI][Python] Reduce CI time on macOS (#41378)
* [Python] Expose byte_width and bit_width of ExtensionType in
terms of the storage type (#41413)
* [Python] Update Python development guide about components being
enabled by default based on Arrow C++ (#41705)
* [Python] Building PyArrow: enable/disable python components by
default based on availability in Arrow C++ (#41494)
* [C++][Python] Extends the add_key_value to parquet::arrow and
PyArrow (#41633)
* [Python] Ensure Buffer methods don't crash with non-CPU data
(#41889)
* [C++][Python] PrettyPrint non-cpu data by copying to default
CPU device (#42010)
* [Python][Parquet] Update BYTE_STREAM_SPLIT description in
write_table() docstring (#41759)
* [Python] Add support for Pyodide (#37822)
* [Python] Fix pandas tests to follow downstream datetime64 unit
changes (#41979)
* [Python] Allow Array.filter() to take general array input
(#42051)
* [Python] Expose new FLOAT16 logical type in the pyarrow.parquet
bindings (#42103)
* [Python] Array gracefully fails on non-cpu device (#42113)
* [Python][Parquet] Pyarrow store decimal as integer (#42169)
* [Python] Add CI job for Numpy 1.X (#42189)
* [CI][Python] Pin openjdk=17 in python substrait integration
(#43051)
- Drop pyarrow-pr41319-numpy2-tests.patch
- Add pyarrow-pr433325-extradirs.patch gh#apache/arrow/pull/43325
-------------------------------------------------------------------
Thu Apr 25 08:58:22 UTC 2024 - Ben Greiner <code@bnavigator.de>
@@ -252,12 +589,12 @@ Mon Jan 15 20:42:25 UTC 2024 - Ben Greiner <code@bnavigator.de>
-------------------------------------------------------------------
Tue Nov 14 23:29:03 UTC 2023 - Ondřej Súkup <mimi.vx@gmail.com>
- Fix cve in changelog
- Fix cve in changelog
-------------------------------------------------------------------
Tue Nov 14 09:28:23 UTC 2023 - Ondřej Súkup <mimi.vx@gmail.com>
- Update to 14.0.1
- Update to 14.0.1
- drop pyarrow-pr37481-pandas2.1.patch
- fixes boo#1216991 CVE-2023-47248
* GH-38431 - [Python][CI] Update fs.type_name checks for s3fs tests
@@ -440,7 +777,7 @@ Sun Mar 12 05:31:32 UTC 2023 - Ben Greiner <code@bnavigator.de>
* [Python][Docs] adding info about TableGroupBy.aggregation with empty list (#14482)
* [Python] DataFrame Interchange Protocol for pyarrow Table
* [Python] Drop older versions of Pandas (<1.0) (#14631)
* [Python] Pass Cmake args to Python CPP
* [Python] Pass Cmake args to Python CPP
* [Docs][Python] Improve docs for S3FileSystem (#14599)
* [Python] Add missing value accessor to temporal types (#14746)
* [Python] Expose time32/time64 scalar values (#14637)
@@ -468,7 +805,7 @@ Sun Mar 12 05:31:32 UTC 2023 - Ben Greiner <code@bnavigator.de>
* [Python] Support passing create_dir thru pq.write_to_dataset (#14459)
* [CI][Python] Fix pandas master/nightly build failure related to timedelta (#14460)
* [Python] Fix writing files with multi-byte characters in file name (#14764)
* [Python] Handle pytest 8 deprecations about pytest.warns(None)
* [Python] Handle pytest 8 deprecations about pytest.warns(None)
* [Python] Remove ARROW_BUILD_DIR in building pyarrow C++ (#14498)
* [Python] Honor default memory pool in Dataset scanning (#14516)
* [Python] Fully support filesystem in parquet.write_metadata (#14574)


@@ -1,7 +1,7 @@
#
# spec file for package python-pyarrow
#
# Copyright (c) 2024 SUSE LLC
# Copyright (c) 2025 SUSE LLC
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
@@ -16,29 +16,38 @@
#
%{?sle15_python_module_pythons}
%bcond_with xsimd
%define plainpython python
# See git submodule /testing pointing to the correct revision
%define arrow_testing_commit d2a13712303498963395318a4eb42872e66aead7
# See git submodule /cpp/submodules/parquet-testing pointing to the correct revision
%define parquet_testing_commit 18d17540097fca7c40be3d42c167e6bfad90763c
%if %{suse_version} <= 1500
# requires __has_builtin with keywords
%define gccver 13
%endif
Name: python-pyarrow
Version: 16.0.0
Version: 20.0.0
Release: 0
Summary: Python library for Apache Arrow
License: Apache-2.0 AND BSD-3-Clause AND BSD-2-Clause AND MIT
URL: https://arrow.apache.org/
# SourceRepository: https://github.com/apache/arrow
Source0: apache-arrow-%{version}.tar.gz
Source1: arrow-testing-%{version}.tar.gz
Source2: parquet-testing-%{version}.tar.gz
Source99: python-pyarrow.rpmlintrc
# PATCH-FIX-UPSTREAM pyarrow-pr41319-numpy2-tests.patch gh#apache/arrow#41319
Patch0: pyarrow-pr41319-numpy2-tests.patch
BuildRequires: %{python_module Cython >= 0.29.31}
BuildRequires: %{python_module Cython >= 3}
BuildRequires: %{python_module devel >= 3.8}
BuildRequires: %{python_module numpy-devel >= 1.25}
BuildRequires: %{python_module pip}
BuildRequires: %{python_module setuptools_scm}
BuildRequires: %{python_module setuptools}
BuildRequires: %{python_module wheel}
BuildRequires: cmake
BuildRequires: cmake >= 3.25
BuildRequires: fdupes
BuildRequires: gcc-c++
BuildRequires: gcc%{?gccver}-c++
BuildRequires: openssl-devel
BuildRequires: pkgconfig
BuildRequires: python-rpm-macros
@@ -88,12 +97,13 @@ This package provides the header files within the python
platlib for consuming modules using cythonization.
%prep
%autosetup -p1 -n arrow-apache-arrow-%{version}
# we disabled the jemalloc backend in apache-arrow
sed -i 's/should_have_jemalloc = sys.platform == "linux"/should_have_jemalloc = False/' python/pyarrow/tests/test_memory.py
%setup -n arrow-apache-arrow-%{version} -a1 -a2
%autopatch -p1
%build
pushd python
%{?gccver:export CXX=g++-%{gccver}}
%{?gccver:export CC=gcc-%{gccver}}
export CFLAGS="%{optflags}"
export PYARROW_BUILD_TYPE=relwithdebinfo
export PYARROW_BUILD_VERBOSE=1
@@ -122,8 +132,15 @@ pushd python
popd
%check
# flaky
%{?gccver:export CXX=g++-%{gccver}}
%{?gccver:export CC=gcc-%{gccver}}
export ARROW_TEST_DATA="${PWD}/arrow-testing-%{arrow_testing_commit}/data"
export PARQUET_TEST_DATA="${PWD}/parquet-testing-%{parquet_testing_commit}/data"
# flaky tests
donttest="test_total_bytes_allocated"
donttest="$donttest or test_batch_lifetime"
# worker crashes, we don't have an s3 setup in obs anyway
donttest="$donttest or test_s3fs_limited_permissions_create_bucket"
%ifarch %{ix86} %{arm32}
# tests conversion to 64bit datatypes
donttest="$donttest or test_conversion"