Sync from SUSE:ALP:Source:Standard:1.0 apache-arrow revision 93f04aba74c56d8de77af9dd5d33dad0
Commit fa16e2d0e2

23 .gitattributes (vendored, new file)
@@ -0,0 +1,23 @@
## Default LFS
*.7z filter=lfs diff=lfs merge=lfs -text
*.bsp filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.gem filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.jar filter=lfs diff=lfs merge=lfs -text
*.lz filter=lfs diff=lfs merge=lfs -text
*.lzma filter=lfs diff=lfs merge=lfs -text
*.obscpio filter=lfs diff=lfs merge=lfs -text
*.oxt filter=lfs diff=lfs merge=lfs -text
*.pdf filter=lfs diff=lfs merge=lfs -text
*.png filter=lfs diff=lfs merge=lfs -text
*.rpm filter=lfs diff=lfs merge=lfs -text
*.tbz filter=lfs diff=lfs merge=lfs -text
*.tbz2 filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.ttf filter=lfs diff=lfs merge=lfs -text
*.txz filter=lfs diff=lfs merge=lfs -text
*.whl filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
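The .gitattributes above routes common binary and archive formats through Git LFS, which is why the source tarball added in this commit is stored as an LFS pointer. A minimal sketch of how these glob patterns classify files, assuming Python's fnmatch approximates gitattributes pattern matching (the file names used are illustrative):

```python
from fnmatch import fnmatch

# Patterns from the .gitattributes above that route matching files to Git LFS.
LFS_PATTERNS = [
    "*.7z", "*.bsp", "*.bz2", "*.gem", "*.gz", "*.jar", "*.lz", "*.lzma",
    "*.obscpio", "*.oxt", "*.pdf", "*.png", "*.rpm", "*.tbz", "*.tbz2",
    "*.tgz", "*.ttf", "*.txz", "*.whl", "*.xz", "*.zip", "*.zst",
]

def uses_lfs(filename: str) -> bool:
    """Return True if the file would be stored as an LFS pointer."""
    return any(fnmatch(filename, pat) for pat in LFS_PATTERNS)

print(uses_lfs("apache-arrow-14.0.1.tar.gz"))  # True: matches *.gz
print(uses_lfs("apache-arrow.spec"))           # False: plain text stays in git
```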
11 _constraints (new file)
@@ -0,0 +1,11 @@
<constraints>
  <hardware>
    <memory>
      <size unit="G">10</size>
    </memory>
    <disk>
      <size unit="G">10</size>
    </disk>
  </hardware>
</constraints>
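The _constraints file asks OBS build workers for at least 10 GB of RAM and 10 GB of disk, which a large C++ build like Arrow needs. A small sketch of reading those values back out of the schema, assuming the structure shown above (the helper function is illustrative, not part of OBS tooling):

```python
import xml.etree.ElementTree as ET

# The _constraints file added above: 10 GB memory and 10 GB disk per worker.
CONSTRAINTS = """\
<constraints>
  <hardware>
    <memory>
      <size unit="G">10</size>
    </memory>
    <disk>
      <size unit="G">10</size>
    </disk>
  </hardware>
</constraints>
"""

def required_gb(xml_text: str, resource: str) -> int:
    """Extract the requested size in GB for 'memory' or 'disk'."""
    root = ET.fromstring(xml_text)
    size = root.find(f"./hardware/{resource}/size")
    if size is None or size.get("unit") != "G":
        raise ValueError(f"no size in GB found for {resource!r}")
    return int(size.text)

print(required_gb(CONSTRAINTS, "memory"))  # 10
print(required_gb(CONSTRAINTS, "disk"))    # 10
```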
BIN apache-arrow-14.0.1.tar.gz (stored with Git LFS, new file)
Binary file not shown.
418 apache-arrow.changes (new file)
@@ -0,0 +1,418 @@
-------------------------------------------------------------------
Mon Nov 13 23:51:00 UTC 2023 - Ondřej Súkup <mimi.vx@gmail.com>

- update to 14.0.1
  * GH-38431 - [Python][CI] Update fs.type_name checks for s3fs tests
  * GH-38607 - [Python] Disable PyExtensionType autoload
- update to 14.0.0
  * very long list of changes can be found here:
    https://arrow.apache.org/release/14.0.0.html

-------------------------------------------------------------------
Fri Aug 25 09:05:09 UTC 2023 - Ben Greiner <code@bnavigator.de>

- Update to 13.0.0
  ## Acero
  * Handling of unaligned buffers in input nodes can be configured
    programmatically or by setting the environment variable
    ACERO_ALIGNMENT_HANDLING. The default behavior is to warn when
    an unaligned buffer is detected GH-35498.
  ## Compute
  * Several new functions have been added:
    - aggregate functions “first”, “last”, “first_last” GH-34911;
    - vector functions “cumulative_prod”, “cumulative_min”,
      “cumulative_max” GH-32190;
    - vector function “pairwise_diff” GH-35786.
  * Sorting now works on dictionary arrays, with much better
    performance than the naive approach of sorting the decoded
    dictionary GH-29887. Sorting also works on struct arrays, and
    nested sort keys are supported using FieldRef GH-33206.
  * The check_overflow option has been removed from
    CumulativeSumOptions as it was redundant with the availability
    of two different functions: “cumulative_sum” and
    “cumulative_sum_checked” GH-35789.
  * Run-end encoded filters are efficiently supported GH-35749.
  * Duration types are supported with the “is_in” and “index_in”
    functions GH-36047. They can be multiplied with all integer
    types GH-36128.
  * “is_in” and “index_in” now cast their inputs more flexibly:
    they first attempt to cast the value set to the input type,
    then in the other direction if the former fails GH-36203.
  * Multiple bugs have been fixed in “utf8_slice_codeunits” when
    the stop option is omitted GH-36311.
  ## Dataset
  * A custom schema can now be passed when writing a dataset
    GH-35730. The custom schema can alter nullability or metadata
    information, but is not allowed to change the datatypes
    written.
  ## Filesystems
  * The S3 filesystem now writes files in equal-sized chunks, for
    compatibility with Cloudflare’s “R2” Storage GH-34363.
  * A long-standing issue where S3 support could crash at shutdown
    because of resources still being alive after S3 finalization
    has been fixed GH-36346. Now, attempts to use S3 resources
    (such as making filesystem calls) after S3 finalization should
    result in a clean error.
  * The GCS filesystem accepts a new option to set the project id
    GH-36227.
  ## IPC
  * Nullability and metadata information for sub-fields of map
    types is now preserved when deserializing Arrow IPC GH-35297.
  ## Orc
  * The Orc adapter now maps Arrow field metadata to Orc type
    attributes when writing, and vice-versa when reading GH-35304.
  ## Parquet
  * It is now possible to write additional metadata while a
    ParquetFileWriter is open GH-34888.
  * Writing a page index can be enabled selectively per-column
    GH-34949. In addition, page header statistics are not written
    anymore if the page index is enabled for the given column
    GH-34375, as the information would be redundant and less
    efficiently accessed.
  * Parquet writer properties allow specifying the sorting columns
    GH-35331. The user is responsible for ensuring that the data
    written to the file actually complies with the given sorting.
  * CRC computation has been implemented for v2 data pages
    GH-35171. It was already implemented for v1 data pages.
  * Writing compliant nested types is now enabled by default
    GH-29781. This should not have any negative implication.
  * Attempting to load a subset of an Arrow extension type is now
    forbidden GH-20385. Previously, if an extension type’s storage
    is nested (for example a “Point” extension type backed by a
    struct<x: float64, y: float64>), it was possible to load
    selectively some of the columns of the storage type.
  ## Substrait
  * Support for various functions has been added: “stddev”,
    “variance”, “first”, “last” (GH-35247, GH-35506).
  * Deserializing sorts is now supported GH-32763. However, some
    features, such as clustered sort direction or custom sort
    functions, are not implemented.
  ## Miscellaneous
  * FieldRef sports additional methods to get a flattened version
    of nested fields GH-14946. Compared to their non-flattened
    counterparts, the methods GetFlattened, GetAllFlattened,
    GetOneFlattened and GetOneOrNoneFlattened combine a child’s
    null bitmap with its ancestors’ null bitmaps so as to compute
    the field’s overall logical validity bitmap.
  * In other words, given the struct array [null, {'x': null},
    {'x': 5}], FieldRef("x")::Get might return [0, null, 5] while
    FieldRef("x")::GetFlattened will always return [null, null, 5].
  * Scalar::hash() has been fixed for sliced nested arrays
    GH-35360.
  * A new floating-point to decimal conversion algorithm exhibits
    much better precision GH-35576.
  * It is now possible to cast between scalars of different
    list-like types GH-36309.

-------------------------------------------------------------------
Mon Jun 12 12:13:18 UTC 2023 - Ben Greiner <code@bnavigator.de>

- Update to 12.0.1
  * [GH-35423] - [C++][Parquet] Parquet PageReader Force
    decompression buffer resize smaller (#35428)
  * [GH-35498] - [C++] Relax EnsureAlignment check in Acero from
    requiring 64-byte aligned buffers to requiring value-aligned
    buffers (#35565)
  * [GH-35519] - [C++][Parquet] Fixing exception handling in parquet
    FileSerializer (#35520)
  * [GH-35538] - [C++] Remove unnecessary status.h include from
    protobuf (#35673)
  * [GH-35730] - [C++] Add the ability to specify custom schema on a
    dataset write (#35860)
  * [GH-35850] - [C++] Don't disable optimization with
    RelWithDebInfo (#35856)
- Drop cflags.patch -- fixed upstream

-------------------------------------------------------------------
Thu May 18 07:00:43 UTC 2023 - Ben Greiner <code@bnavigator.de>

- Update to 12.0.0
  * Run-End Encoded Arrays have been implemented and are accessible
    (GH-32104)
  * The FixedShapeTensor Logical value type has been implemented
    using ExtensionType (GH-15483, GH-34796)
  ## Compute
  * New kernel to convert timestamp with timezone to wall time
    (GH-33143)
  * Cast kernels are now built into libarrow by default (GH-34388)
  ## Acero
  * Acero has been moved out of libarrow into its own shared
    library, allowing for smaller builds of the core libarrow
    (GH-15280)
  * Exec nodes now can have a concept of “ordering” and will reject
    non-sensible plans (GH-34136)
  * New exec nodes: “pivot_longer” (GH-34266), “order_by”
    (GH-34248) and “fetch” (GH-34059)
  * Breaking Change: Reorder output fields of “group_by” node so
    that keys/segment keys come before aggregates (GH-33616)
  ## Substrait
  * Add support for the round function GH-33588
  * Add support for the cast expression element GH-31910
  * Added API reference documentation GH-34011
  * Added an extension relation to support segmented aggregation
    GH-34626
  * The output of the aggregate relation now conforms to the spec
    GH-34786
  ## Parquet
  * Added support for DeltaLengthByteArray encoding to the Parquet
    writer (GH-33024)
  * NaNs are correctly handled now for Parquet predicate push-downs
    (GH-18481)
  * Added support for reading Parquet page indexes (GH-33596) and
    writing page indexes (GH-34053)
  * Parquet writer can write columns in parallel now (GH-33655)
  * Fixed incorrect number of rows in Parquet V2 page headers
    (GH-34086)
  * Fixed incorrect Parquet page null_count when stats are disabled
    (GH-34326)
  * Added support for reading BloomFilters to the Parquet Reader
    (GH-34665)
  * Parquet File-writer can now add additional key-value metadata
    after it has been opened (GH-34888)
  * Breaking Change: The default row group size for the Arrow
    writer changed from 64Mi rows to 1Mi rows. GH-34280
  ## ORC
  * Added support for the union type in ORC writer (GH-34262)
  * Fixed ORC CHAR type mapping with Arrow (GH-34823)
  * Fixed timestamp type mapping between ORC and Arrow (GH-34590)
  ## Datasets
  * Added support for reading JSON datasets (GH-33209)
  * Dataset writer now supports specifying a function callback to
    construct the file name in addition to the existing file name
    template (GH-34565)
  ## Filesystems
  * GcsFileSystem::OpenInputFile avoids unnecessary downloads
    (GH-34051)
  ## Other changes
  * Convenience Append(std::optional...) methods have been added to
    array builders
    ([GH-14863](https://github.com/apache/arrow/issues/14863))
  * A deprecated OpenTelemetry header was removed from the Flight
    library (GH-34417)
  * Fixed crash in “take” kernels on ExtensionArrays with an
    underlying dictionary type (GH-34619)
  * Fixed bug where the C-Data bridge did not preserve nullability
    of map values on import (GH-34983)
  * Added support for EqualOptions to RecordBatch::Equals
    (GH-34968)
  * zstd dependency upgraded to v1.5.5 (GH-34899)
  * Improved handling of “logical” nulls such as with union and
    RunEndEncoded arrays (GH-34361)
  * Fixed incorrect handling of uncompressed body buffers in IPC
    reader, added IpcWriteOptions::min_space_savings for optional
    compression optimizations (GH-15102)

-------------------------------------------------------------------
Mon Apr 3 11:09:06 UTC 2023 - Andreas Schwab <schwab@suse.de>

- cflags.patch: fix option order to compile with optimisation
- Adjust constraints

-------------------------------------------------------------------
Wed Mar 29 13:13:13 UTC 2023 - Ben Greiner <code@bnavigator.de>

- Remove gflags-static. It was only needed due to a packaging error
  with gflags which is about to be fixed in Tumbleweed
- Disable build of the jemalloc memory pool backend
  * It requires every consuming application to LD_PRELOAD
    libjemalloc.so.2, even when it is not set as the default memory
    pool, due to static TLS block allocation errors
  * Usage of the bundled jemalloc as a workaround is not desired
    (gh#apache/arrow#13739)
  * jemalloc does not seem to have a clear advantage over the
    system glibc allocator:
    https://ursalabs.org/blog/2021-r-benchmarks-part-1
  * This overrides the default behavior documented in
    https://arrow.apache.org/docs/cpp/memory.html#default-memory-pool

-------------------------------------------------------------------
Sun Mar 12 04:28:52 UTC 2023 - Ben Greiner <code@bnavigator.de>

- Update to v11.0.0
  * ARROW-4709 - [C++] Optimize for ordered JSON fields (#14100)
  * ARROW-11776 - [C++][Java] Support parquet write from ArrowReader
    to file (#14151)
  * ARROW-13938 - [C++] Date and datetime types should autocast from
    strings
  * ARROW-14161 - [C++][Docs] Improve Parquet C++ docs (#14018)
  * ARROW-14999 - [C++] Optional field name equality checks for map
    and list type (#14847)
  * ARROW-15538 - [C++] Expanding coverage of math functions from
    Substrait to Acero (#14434)
  * ARROW-15592 - [C++] Add support for custom output field names in
    a substrait::PlanRel (#14292)
  * ARROW-15732 - [C++] Do not use any CPU threads in execution plan
    when use_threads is false (#15104)
  * ARROW-16782 - [Format] Add REE definitions to FlatBuffers
    (#14176)
  * ARROW-17144 - [C++][Gandiva] Add sqrt function (#13656)
  * ARROW-17301 - [C++] Implement compute function "binary_slice"
    (#14550)
  * ARROW-17509 - [C++] Simplify async scheduler by removing the
    need to call End (#14524)
  * ARROW-17520 - [C++] Implement SubStrait SetRel (UnionAll)
    (#14186)
  * ARROW-17610 - [C++] Support additional source types in
    SourceNode (#14207)
  * ARROW-17613 - [C++] Add function execution API for a
    preconfigured kernel (#14043)
  * ARROW-17640 - [C++] Add File Handling Test cases for GlobFile
    handling in Substrait Read (#14132)
  * ARROW-17798 - [C++][Parquet] Add DELTA_BINARY_PACKED encoder to
    Parquet writer (#14191)
  * ARROW-17825 - [C++] Allow the possibility to write several
    tables in ORCFileWriter (#14219)
  * ARROW-17836 - [C++] Allow specifying alignment of buffers
    (#14225)
  * ARROW-17837 - [C++][Acero] Create ExecPlan-owned QueryContext
    that will store a plan's shared data structures (#14227)
  * ARROW-17859 - [C++] Use self-pipe in signal-receiving StopSource
    (#14250)
  * ARROW-17867 - [C++][FlightRPC] Expose bulk parameter binding in
    Flight SQL (#14266)
  * ARROW-17932 - [C++] Implement streaming RecordBatchReader for
    JSON (#14355)
  * ARROW-17960 - [C++][Python] Implement list_slice kernel (#14395)
  * ARROW-17966 - [C++] Adjust to new format for Substrait optional
    arguments (#14415)
  * ARROW-17975 - [C++] Create at-fork facility (#14594)
  * ARROW-17980 - [C++] As-of-Join Substrait extension (#14485)
  * ARROW-17989 - [C++][Python] Enable struct_field kernel to accept
    string field names (#14495)
  * ARROW-18008 - [Python][C++] Add use_threads to
    run_substrait_query
  * ARROW-18051 - [C++] Enable tests skipped by ARROW-16392 (#14425)
  * ARROW-18095 - [CI][C++][MinGW] All tests exited with 0xc0000139
  * ARROW-18113 - [C++] Add RandomAccessFile::ReadManyAsync (#14723)
  * ARROW-18135 - [C++] Avoid warnings that ExecBatch::length may be
    uninitialized (#14480)
  * ARROW-18144 - [C++] Improve JSONTypeError error message in
    testing (#14486)
  * ARROW-18184 - [C++] Improve JSON parser benchmarks (#14552)
  * ARROW-18206 - [C++][CI] Add a nightly build for C++20
    compilation (#14571)
  * ARROW-18235 - [C++][Gandiva] Fix the like function
    implementation for escape chars (#14579)
  * ARROW-18249 - [C++] Update vcpkg port to arrow 10.0.0
  * ARROW-18253 - [C++][Parquet] Add additional bounds safety checks
    (#14592)
  * ARROW-18259 - [C++][CMake] Add support for system Thrift CMake
    package (#14597)
  * ARROW-18280 - [C++][Python] Support slicing to end in list_slice
    kernel (#14749)
  * ARROW-18282 - [C++][Python] Support step >= 1 in list_slice
    kernel (#14696)
  * ARROW-18287 - [C++][CMake] Add support for Brotli/utf8proc
    provided by vcpkg (#14609)
  * ARROW-18342 - [C++] AsofJoinNode support for Boolean data field
    (#14658)
  * ARROW-18350 - [C++] Use std::to_chars instead of std::to_string
    (#14666)
  * ARROW-18367 - [C++] Enable the creation of named table relations
    (#14681)
  * ARROW-18373 - Fix component drop-down, add license text (#14688)
  * ARROW-18377 - MIGRATION: Automate component labels from issue
    form content (#15245)
  * ARROW-18395 - [C++] Move select-k implementation into separate
    module
  * ARROW-18402 - [C++] Expose DeclarationInfo (#14765)
  * ARROW-18406 - [C++] Can't build Arrow with Substrait on Ubuntu
    20.04 (#14735)
  * ARROW-18409 - [GLib][Plasma] Suppress deprecated warning in
    building plasma-glib (#14739)
  * ARROW-18413 - [C++][Parquet] Expose page index info from
    ColumnChunkMetaData (#14742)
  * ARROW-18419 - [C++] Update vendored fast_float (#14817)
  * ARROW-18420 - [C++][Parquet] Introduce ColumnIndex & OffsetIndex
    (#14803)
  * ARROW-18421 - [C++][ORC] Add accessor for stripe information in
    reader (#14806)
  * ARROW-18427 - [C++] Support negative tolerance in AsofJoinNode
    (#14934)
  * ARROW-18435 - [C++][Java] Update ORC to 1.8.1 (#14942)
  * GH-14869 - [C++] Add Cflags.private defining _STATIC to .pc.in.
    (#14900)
  * GH-14920 - [C++][CMake] Add missing -latomic to Arrow CMake
    package (#15251)
  * GH-14937 - [C++] Add rank kernel benchmarks (#14938)
  * GH-14951 - [C++][Parquet] Add benchmarks for DELTA_BINARY_PACKED
    encoding (#15140)
  * GH-15072 - [C++] Move the round functionality into a separate
    module (#15073)
  * GH-15074 - [Parquet][C++] change 16-bit page_ordinal to 32-bit
    (#15182)
  * GH-15096 - [C++] Substrait ProjectRel Emit Optimization (#15097)
  * GH-15100 - [C++][Parquet] Add benchmark for reading strings from
    Parquet (#15101)
  * GH-15151 - [C++] Adding RecordBatchReaderSource to solve an
    issue in R API (#15183)
  * GH-15185 - [C++][Parquet] Improve documentation for Parquet
    Reader column_indices (#15184)
  * GH-15199 - [C++][Substrait] Allow
    AGGREGATION_INVOCATION_UNSPECIFIED as valid invocation (#15198)
  * GH-15200 - [C++] Created benchmarks for round kernels. (#15201)
  * GH-15216 - [C++][Parquet] Parquet writer accepts RecordBatch
    (#15240)
  * GH-15226 - [C++] Add DurationType to hash kernels (#33685)
  * GH-15237 - [C++] Add ::arrow::Unreachable() using
    std::string_view (#15238)
  * GH-15239 - [C++][Parquet] Parquet writer writes decimal as
    int32/64 (#15244)
  * GH-15290 - [C++][Compute] Optimize IfElse kernel AAS/ASA case
    when the scalar is null (#15291)
  * GH-33607 - [C++] Support optional additional arguments for
    inline visit functions (#33608)
  * GH-33657 - [C++] arrow-dataset.pc doesn't depend on parquet.pc
    without ARROW_PARQUET=ON (#33665)
  * PARQUET-2179 - [C++][Parquet] Add a test for skipping repeated
    fields (#14366)
  * PARQUET-2188 - [parquet-cpp] Add SkipRecords API to RecordReader
    (#14142)
  * PARQUET-2204 - [parquet-cpp] TypedColumnReaderImpl::Skip should
    reuse scratch space (#14509)
  * PARQUET-2206 - [parquet-cpp] Microbenchmark for ColumnReader
    ReadBatch and Skip (#14523)
  * PARQUET-2209 - [parquet-cpp] Optimize skip for the case that
    number of values to skip equals page size (#14545)
  * PARQUET-2210 - [C++][Parquet] Skip pages based on header
    metadata using a callback (#14603)
  * PARQUET-2211 - [C++] Print ColumnMetaData.encoding_stats field
    (#14556)
- Remove unused python3-arrow package declaration
  * Add options as recommended for python support
- Provide test data for unittests
- Don't use system jemalloc but bundle it in order to avoid
  static TLS errors in consuming packages like python-pyarrow
  * gh#apache/arrow#13739

-------------------------------------------------------------------
Sun Aug 28 19:30:50 UTC 2022 - Stefan Brüns <stefan.bruens@rwth-aachen.de>

- Revert ccache change, using ccache in a pristine buildroot
  just slows down OBS builds (use --ccache for local builds).
- Remove unused gflags-static-devel dependency.

-------------------------------------------------------------------
Mon Aug 22 06:22:43 UTC 2022 - John Vandenberg <jayvdb@gmail.com>

- Speed up builds with ccache

-------------------------------------------------------------------
Sat Aug 6 01:59:08 UTC 2022 - Stefan Brüns <stefan.bruens@rwth-aachen.de>

- Update to v9.0.0
  No (current) changelog provided
- Spec file cleanup:
  * Remove lots of duplicate, unused, or wrong build dependencies
  * Do not package outdated Readmes and Changelogs
- Enable tests, disable ones requiring external test data

-------------------------------------------------------------------
Sat Nov 14 09:07:59 UTC 2020 - John Vandenberg <jayvdb@gmail.com>

- Update to v2.0.0

-------------------------------------------------------------------
Wed Nov 13 21:14:00 UTC 2019 - TheBlackCat <toddrme2178@gmail.com>

- Initial spec for v0.12.0
414 apache-arrow.spec (new file)
@@ -0,0 +1,414 @@
#
# spec file for package apache-arrow
#
# Copyright (c) 2023 SUSE LLC
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.

# Please submit bugfixes or comments via https://bugs.opensuse.org/
#

%bcond_without tests
# Required for runtime dispatch, not yet packaged
%bcond_with xsimd

%define sonum 1400
# See git submodule /testing pointing to the correct revision
%define arrow_testing_commit 47f7b56b25683202c1fd957668e13f2abafc0f12
# See git submodule /cpp/submodules/parquet-testing pointing to the correct revision
%define parquet_testing_commit e45cd23f784aab3d6bf0701f8f4e621469ed3be7
Name:           apache-arrow
Version:        14.0.1
Release:        0
Summary:        A development platform for in-memory data
License:        Apache-2.0 AND BSD-3-Clause AND BSD-2-Clause AND MIT
Group:          Development/Tools/Other
URL:            https://arrow.apache.org/
Source0:        https://github.com/apache/arrow/archive/apache-arrow-%{version}.tar.gz
Source1:        https://github.com/apache/arrow-testing/archive/%{arrow_testing_commit}.tar.gz#/arrow-testing-%{version}.tar.gz
Source2:        https://github.com/apache/parquet-testing/archive/%{parquet_testing_commit}.tar.gz#/parquet-testing-%{version}.tar.gz
BuildRequires:  bison
BuildRequires:  cmake >= 3.16
BuildRequires:  fdupes
BuildRequires:  flex
BuildRequires:  gcc-c++
BuildRequires:  libboost_filesystem-devel
BuildRequires:  libboost_system-devel >= 1.64.0
BuildRequires:  libzstd-devel-static
BuildRequires:  llvm-devel >= 7
BuildRequires:  pkgconfig
BuildRequires:  python-rpm-macros
BuildRequires:  python3-base
BuildRequires:  cmake(Snappy) >= 1.1.7
BuildRequires:  cmake(absl)
BuildRequires:  cmake(double-conversion) >= 3.1.5
BuildRequires:  cmake(re2)
BuildRequires:  pkgconfig(RapidJSON)
BuildRequires:  pkgconfig(bzip2) >= 1.0.8
BuildRequires:  pkgconfig(gflags) >= 2.2.0
BuildRequires:  pkgconfig(grpc++) >= 1.20.0
BuildRequires:  pkgconfig(libbrotlicommon) >= 1.0.7
BuildRequires:  pkgconfig(libbrotlidec) >= 1.0.7
BuildRequires:  pkgconfig(libbrotlienc) >= 1.0.7
BuildRequires:  pkgconfig(libcares) >= 1.15.0
BuildRequires:  pkgconfig(libglog) >= 0.3.5
BuildRequires:  pkgconfig(liblz4) >= 1.8.3
BuildRequires:  pkgconfig(libopenssl)
BuildRequires:  pkgconfig(liburiparser) >= 0.9.3
BuildRequires:  pkgconfig(libutf8proc)
BuildRequires:  pkgconfig(libzstd) >= 1.4.3
BuildRequires:  pkgconfig(protobuf) >= 3.7.1
BuildRequires:  pkgconfig(thrift) >= 0.11.0
BuildRequires:  pkgconfig(zlib) >= 1.2.11
%if %{with tests}
BuildRequires:  timezone
BuildRequires:  pkgconfig(gmock) >= 1.10
BuildRequires:  pkgconfig(gtest) >= 1.10
%endif

%description
|
||||
Apache Arrow is a cross-language development platform for in-memory
|
||||
data. It specifies a standardized language-independent columnar memory
|
||||
format for flat and hierarchical data, organized for efficient
|
||||
analytic operations on modern hardware. It also provides computational
|
||||
libraries and zero-copy streaming messaging and interprocess
|
||||
communication.
|
||||
|
||||
%package -n libarrow%{sonum}
|
||||
Summary: Development platform for in-memory data - shared library
|
||||
Group: System/Libraries
|
||||
|
||||
%description -n libarrow%{sonum}
|
||||
Apache Arrow is a cross-language development platform for in-memory
|
||||
data. It specifies a standardized language-independent columnar memory
|
||||
format for flat and hierarchical data, organized for efficient
|
||||
analytic operations on modern hardware. It also provides computational
|
||||
libraries and zero-copy streaming messaging and interprocess
|
||||
communication.
|
||||
|
||||
This package provides the shared library for Apache Arrow.
|
||||
|
||||
%package -n libarrow_acero%{sonum}
|
||||
Summary: Development platform for in-memory data - shared library
|
||||
Group: System/Libraries
|
||||
|
||||
%description -n libarrow_acero%{sonum}
|
||||
Apache Arrow is a cross-language development platform for in-memory
|
||||
data. It specifies a standardized language-independent columnar memory
|
||||
format for flat and hierarchical data, organized for efficient
|
||||
analytic operations on modern hardware. It also provides computational
|
||||
libraries and zero-copy streaming messaging and interprocess
|
||||
communication.
|
||||
|
||||
This package provides the shared library for the Acero streaming execution engine
|
||||
|
||||
%package -n libarrow_dataset%{sonum}
|
||||
Summary: Development platform for in-memory data - shared library
|
||||
Group: System/Libraries
|
||||
|
||||
%description -n libarrow_dataset%{sonum}
|
||||
Apache Arrow is a cross-language development platform for in-memory
|
||||
data. It specifies a standardized language-independent columnar memory
|
||||
format for flat and hierarchical data, organized for efficient
|
||||
analytic operations on modern hardware. It also provides computational
|
||||
libraries and zero-copy streaming messaging and interprocess
|
||||
communication.
|
||||
|
||||
This package provides the shared library for Dataset API support.
|
||||
|
||||
%package -n libparquet%{sonum}
|
||||
Summary: Development platform for in-memory data - shared library
|
||||
Group: System/Libraries
|
||||
|
||||
%description -n libparquet%{sonum}
|
||||
Apache Arrow is a cross-language development platform for in-memory
|
||||
data. It specifies a standardized language-independent columnar memory
|
||||
format for flat and hierarchical data, organized for efficient
|
||||
analytic operations on modern hardware. It also provides computational
|
||||
libraries and zero-copy streaming messaging and interprocess
|
||||
communication.
|
||||
|
||||
This package provides the shared library for the Parquet format.
|
||||
|
||||
%package devel
|
||||
Summary: Development platform for in-memory data - development files
|
||||
Group: Development/Libraries/C and C++
|
||||
Requires: libarrow%{sonum} = %{version}
|
||||
Requires: libarrow_acero%{sonum} = %{version}
|
||||
Requires: libarrow_dataset%{sonum} = %{version}
|
||||
|
||||
%description devel
|
||||
Apache Arrow is a cross-language development platform for in-memory
|
||||
data. It specifies a standardized language-independent columnar memory
|
||||
format for flat and hierarchical data, organized for efficient
|
||||
analytic operations on modern hardware. It also provides computational
|
||||
libraries and zero-copy streaming messaging and interprocess
|
||||
communication.
|
||||
|
||||
This package provides the development libraries and headers for
|
||||
Apache Arrow.
|
||||
|
||||
%package devel-static
|
||||
Summary: Development platform for in-memory data - development files
|
||||
Group: Development/Libraries/C and C++
|
||||
Requires: %{name}-devel = %{version}
|
||||
|
||||
%description devel-static
|
||||
Apache Arrow is a cross-language development platform for in-memory
|
||||
data. It specifies a standardized language-independent columnar memory
|
||||
format for flat and hierarchical data, organized for efficient
|
||||
analytic operations on modern hardware. It also provides computational
|
||||
libraries and zero-copy streaming messaging and interprocess
|
||||
communication.
|
||||
|
||||
This package provides the static library
|
||||
|
||||
%package acero-devel-static
Summary: Development platform for in-memory data - static libraries
Group: Development/Libraries/C and C++
Requires: %{name}-devel = %{version}

%description acero-devel-static
Apache Arrow is a cross-language development platform for in-memory
data. It specifies a standardized language-independent columnar memory
format for flat and hierarchical data, organized for efficient
analytic operations on modern hardware. It also provides computational
libraries and zero-copy streaming messaging and interprocess
communication.

This package provides the static library for the Acero streaming execution engine.

%package dataset-devel-static
Summary: Development platform for in-memory data - static libraries
Group: Development/Libraries/C and C++
Requires: %{name}-devel = %{version}

%description dataset-devel-static
Apache Arrow is a cross-language development platform for in-memory
data. It specifies a standardized language-independent columnar memory
format for flat and hierarchical data, organized for efficient
analytic operations on modern hardware. It also provides computational
libraries and zero-copy streaming messaging and interprocess
communication.

This package provides the static library for Dataset API support.

%package -n apache-parquet-devel
Summary: Development platform for in-memory data - development files
Group: Development/Libraries/C and C++
Requires: libparquet%{sonum} = %{version}

%description -n apache-parquet-devel
Apache Arrow is a cross-language development platform for in-memory
data. It specifies a standardized language-independent columnar memory
format for flat and hierarchical data, organized for efficient
analytic operations on modern hardware. It also provides computational
libraries and zero-copy streaming messaging and interprocess
communication.

This package provides the development libraries and headers for
the Parquet format.

%package -n apache-parquet-devel-static
Summary: Development platform for in-memory data - static libraries
Group: Development/Libraries/C and C++
Requires: apache-parquet-devel = %{version}

%description -n apache-parquet-devel-static
Apache Arrow is a cross-language development platform for in-memory
data. It specifies a standardized language-independent columnar memory
format for flat and hierarchical data, organized for efficient
analytic operations on modern hardware. It also provides computational
libraries and zero-copy streaming messaging and interprocess
communication.

This package provides the static library for the Parquet format.

%package -n apache-parquet-utils
Summary: Development platform for in-memory data - utilities
Group: Productivity/Scientific/Math

%description -n apache-parquet-utils
Apache Arrow is a cross-language development platform for in-memory
data. It specifies a standardized language-independent columnar memory
format for flat and hierarchical data, organized for efficient
analytic operations on modern hardware. It also provides computational
libraries and zero-copy streaming messaging and interprocess
communication.

This package provides utilities for working with the Parquet format.

%prep
%setup -q -n arrow-apache-arrow-%{version} -a1 -a2

%build
export CFLAGS="%{optflags} -ffat-lto-objects"
export CXXFLAGS="%{optflags} -ffat-lto-objects"

pushd cpp
%cmake \
  -DARROW_BUILD_EXAMPLES:BOOL=ON \
  -DARROW_BUILD_SHARED:BOOL=ON \
  -DARROW_BUILD_STATIC:BOOL=ON \
  -DARROW_BUILD_TESTS:BOOL=%{?with_tests:ON}%{!?with_tests:OFF} \
  -DARROW_BUILD_UTILITIES:BOOL=ON \
  -DARROW_DEPENDENCY_SOURCE=SYSTEM \
  -DARROW_BUILD_BENCHMARKS:BOOL=OFF \
%ifarch aarch64
  -DARROW_SIMD_LEVEL:STRING=%{?with_xsimd:NEON}%{!?with_xsimd:NONE} \
%else
  -DARROW_SIMD_LEVEL:STRING="NONE" \
%endif
  -DARROW_RUNTIME_SIMD_LEVEL:STRING=%{?with_xsimd:MAX}%{!?with_xsimd:NONE} \
  -DARROW_COMPUTE:BOOL=ON \
  -DARROW_CSV:BOOL=ON \
  -DARROW_DATASET:BOOL=ON \
  -DARROW_FILESYSTEM:BOOL=ON \
  -DARROW_FLIGHT:BOOL=OFF \
  -DARROW_GANDIVA:BOOL=OFF \
  -DARROW_HDFS:BOOL=ON \
  -DARROW_HIVESERVER2:BOOL=OFF \
  -DARROW_IPC:BOOL=ON \
  -DARROW_JEMALLOC:BOOL=OFF \
  -DARROW_JSON:BOOL=ON \
  -DARROW_ORC:BOOL=OFF \
  -DARROW_PARQUET:BOOL=ON \
  -DARROW_USE_GLOG:BOOL=ON \
  -DARROW_USE_OPENSSL:BOOL=ON \
  -DARROW_WITH_BACKTRACE:BOOL=ON \
  -DARROW_WITH_BROTLI:BOOL=ON \
  -DARROW_WITH_BZ2:BOOL=ON \
  -DARROW_WITH_LZ4:BOOL=ON \
  -DARROW_WITH_SNAPPY:BOOL=ON \
  -DARROW_WITH_ZLIB:BOOL=ON \
  -DARROW_WITH_ZSTD:BOOL=ON \
  -DPARQUET_BUILD_EXAMPLES:BOOL=ON \
  -DPARQUET_BUILD_EXECUTABLES:BOOL=ON \
  -DPARQUET_REQUIRE_ENCRYPTION:BOOL=ON \
  -DARROW_VERBOSE_THIRDPARTY_BUILD:BOOL=ON \
  -DARROW_CUDA:BOOL=OFF \
  -DARROW_GANDIVA_JAVA:BOOL=OFF

%cmake_build
popd

%install
pushd cpp
%cmake_install
popd
%if %{with tests}
rm %{buildroot}%{_libdir}/libarrow_testing.so*
rm %{buildroot}%{_libdir}/libarrow_testing.a
rm %{buildroot}%{_libdir}/pkgconfig/arrow-testing.pc
rm -Rf %{buildroot}%{_includedir}/arrow/testing
%endif
rm -r %{buildroot}%{_datadir}/doc/arrow/
%fdupes %{buildroot}%{_libdir}/cmake

%check
%if %{with tests}
export PARQUET_TEST_DATA="${PWD}/parquet-testing-%{parquet_testing_commit}/data"
export ARROW_TEST_DATA="${PWD}/arrow-testing-%{arrow_testing_commit}/data"
pushd cpp
export PYTHON=%{_bindir}/python3
%ifarch %ix86 %arm32
GTEST_failing="TestDecimalFromReal*"
GTEST_failing="${GTEST_failing}:*TestDecryptionConfiguration.TestDecryption*"
%endif
%ifnarch x86_64
GTEST_failing="${GTEST_failing}:Jemalloc.GetAllocationStats"
%endif
# Run the known-failing tests separately (ignoring their result), then
# exclude them from the main run via a GoogleTest negative filter.
if [ -n "${GTEST_failing}" ]; then
export GTEST_FILTER=${GTEST_failing}
%ctest --label-regex unittest || true
export GTEST_FILTER=*:-${GTEST_failing}
fi
%ctest --label-regex unittest
popd
%endif

%post -n libarrow%{sonum} -p /sbin/ldconfig
%postun -n libarrow%{sonum} -p /sbin/ldconfig
%post -n libarrow_acero%{sonum} -p /sbin/ldconfig
%postun -n libarrow_acero%{sonum} -p /sbin/ldconfig
%post -n libarrow_dataset%{sonum} -p /sbin/ldconfig
%postun -n libarrow_dataset%{sonum} -p /sbin/ldconfig
%post -n libparquet%{sonum} -p /sbin/ldconfig
%postun -n libparquet%{sonum} -p /sbin/ldconfig

%files
%license LICENSE.txt NOTICE.txt header
%{_bindir}/arrow-file-to-stream
%{_bindir}/arrow-stream-to-file

%files -n libarrow%{sonum}
%license LICENSE.txt NOTICE.txt header
%{_libdir}/libarrow.so.*

%files -n libarrow_acero%{sonum}
%license LICENSE.txt NOTICE.txt header
%{_libdir}/libarrow_acero.so.*

%files -n libarrow_dataset%{sonum}
%license LICENSE.txt NOTICE.txt header
%{_libdir}/libarrow_dataset.so.*

%files -n libparquet%{sonum}
%license LICENSE.txt NOTICE.txt header
%{_libdir}/libparquet.so.*

%files devel
%doc README.md
%license LICENSE.txt NOTICE.txt header
%{_includedir}/arrow/
%{_libdir}/cmake/Arrow*
%{_libdir}/libarrow.so
%{_libdir}/libarrow_acero.so
%{_libdir}/libarrow_dataset.so
%{_libdir}/pkgconfig/arrow*.pc
%dir %{_datadir}/arrow
%{_datadir}/arrow/gdb
%dir %{_datadir}/gdb
%dir %{_datadir}/gdb/auto-load
%dir %{_datadir}/gdb/auto-load/%{_prefix}
%dir %{_datadir}/gdb/auto-load/%{_libdir}
%{_datadir}/gdb/auto-load/%{_libdir}/libarrow.so.*.py

%files devel-static
%license LICENSE.txt NOTICE.txt header
%{_libdir}/libarrow.a

%files acero-devel-static
%license LICENSE.txt NOTICE.txt header
%{_libdir}/libarrow_acero.a

%files dataset-devel-static
%license LICENSE.txt NOTICE.txt header
%{_libdir}/libarrow_dataset.a

%files -n apache-parquet-devel
%doc README.md
%license LICENSE.txt NOTICE.txt header
%{_includedir}/parquet/
%{_libdir}/cmake/Parquet
%{_libdir}/libparquet.so
%{_libdir}/pkgconfig/parquet.pc

%files -n apache-parquet-devel-static
%license LICENSE.txt NOTICE.txt header
%{_libdir}/libparquet.a

%files -n apache-parquet-utils
%doc README.md
%license LICENSE.txt NOTICE.txt header
%{_bindir}/parquet-*

%changelog
BIN arrow-testing-14.0.1.tar.gz (Stored with Git LFS) Normal file
BIN parquet-testing-14.0.1.tar.gz (Stored with Git LFS) Normal file