Sync from SUSE:ALP:Source:Standard:1.0 apache-arrow revision 93f04aba74c56d8de77af9dd5d33dad0
Commit fa16e2d0e2

23 .gitattributes (vendored, new file)
@@ -0,0 +1,23 @@
## Default LFS
*.7z filter=lfs diff=lfs merge=lfs -text
*.bsp filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.gem filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.jar filter=lfs diff=lfs merge=lfs -text
*.lz filter=lfs diff=lfs merge=lfs -text
*.lzma filter=lfs diff=lfs merge=lfs -text
*.obscpio filter=lfs diff=lfs merge=lfs -text
*.oxt filter=lfs diff=lfs merge=lfs -text
*.pdf filter=lfs diff=lfs merge=lfs -text
*.png filter=lfs diff=lfs merge=lfs -text
*.rpm filter=lfs diff=lfs merge=lfs -text
*.tbz filter=lfs diff=lfs merge=lfs -text
*.tbz2 filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.ttf filter=lfs diff=lfs merge=lfs -text
*.txz filter=lfs diff=lfs merge=lfs -text
*.whl filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
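The .gitattributes above routes common binary and archive formats through Git LFS, which is why the source tarball added in this commit is stored as an LFS pointer. A minimal sketch of how these glob patterns classify files, assuming Python's fnmatch approximates gitattributes pattern matching (the file names used are illustrative):

```python
from fnmatch import fnmatch

# Patterns from the .gitattributes above that route matching files to Git LFS.
LFS_PATTERNS = [
    "*.7z", "*.bsp", "*.bz2", "*.gem", "*.gz", "*.jar", "*.lz", "*.lzma",
    "*.obscpio", "*.oxt", "*.pdf", "*.png", "*.rpm", "*.tbz", "*.tbz2",
    "*.tgz", "*.ttf", "*.txz", "*.whl", "*.xz", "*.zip", "*.zst",
]

def uses_lfs(filename: str) -> bool:
    """Return True if the file would be stored as an LFS pointer."""
    return any(fnmatch(filename, pat) for pat in LFS_PATTERNS)

print(uses_lfs("apache-arrow-14.0.1.tar.gz"))  # True: matches *.gz
print(uses_lfs("apache-arrow.spec"))           # False: plain text stays in git
```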
11 _constraints (new file)
@@ -0,0 +1,11 @@
<constraints>
  <hardware>
    <memory>
      <size unit="G">10</size>
    </memory>
    <disk>
      <size unit="G">10</size>
    </disk>
  </hardware>
</constraints>
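The _constraints file asks OBS build workers for at least 10 GB of RAM and 10 GB of disk, which a large C++ build like Arrow needs. A small sketch of reading those values back out of the schema, assuming the structure shown above (the helper function is illustrative, not part of OBS tooling):

```python
import xml.etree.ElementTree as ET

# The _constraints file added above: 10 GB memory and 10 GB disk per worker.
CONSTRAINTS = """\
<constraints>
  <hardware>
    <memory>
      <size unit="G">10</size>
    </memory>
    <disk>
      <size unit="G">10</size>
    </disk>
  </hardware>
</constraints>
"""

def required_gb(xml_text: str, resource: str) -> int:
    """Extract the requested size in GB for 'memory' or 'disk'."""
    root = ET.fromstring(xml_text)
    size = root.find(f"./hardware/{resource}/size")
    if size is None or size.get("unit") != "G":
        raise ValueError(f"no size in GB found for {resource!r}")
    return int(size.text)

print(required_gb(CONSTRAINTS, "memory"))  # 10
print(required_gb(CONSTRAINTS, "disk"))    # 10
```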
BIN apache-arrow-14.0.1.tar.gz (stored with Git LFS, new file)
Binary file not shown.
418 apache-arrow.changes (new file)
@@ -0,0 +1,418 @@
-------------------------------------------------------------------
Mon Nov 13 23:51:00 UTC 2023 - Ondřej Súkup <mimi.vx@gmail.com>

- update to 14.0.1
  * GH-38431 - [Python][CI] Update fs.type_name checks for s3fs tests
  * GH-38607 - [Python] Disable PyExtensionType autoload
- update to 14.0.0
  * very long list of changes can be found here:
    https://arrow.apache.org/release/14.0.0.html

-------------------------------------------------------------------
Fri Aug 25 09:05:09 UTC 2023 - Ben Greiner <code@bnavigator.de>

- Update to 13.0.0
  ## Acero
  * Handling of unaligned buffers in input nodes can be configured
    programmatically or by setting the environment variable
    ACERO_ALIGNMENT_HANDLING. The default behavior is to warn when
    an unaligned buffer is detected GH-35498.
  ## Compute
  * Several new functions have been added:
    - aggregate functions “first”, “last”, “first_last” GH-34911;
    - vector functions “cumulative_prod”, “cumulative_min”,
      “cumulative_max” GH-32190;
    - vector function “pairwise_diff” GH-35786.
  * Sorting now works on dictionary arrays, with much better
    performance than the naive approach of sorting the decoded
    dictionary GH-29887. Sorting also works on struct arrays, and
    nested sort keys are supported using FieldRef GH-33206.
  * The check_overflow option has been removed from
    CumulativeSumOptions as it was redundant with the availability
    of two different functions: “cumulative_sum” and
    “cumulative_sum_checked” GH-35789.
  * Run-end encoded filters are efficiently supported GH-35749.
  * Duration types are supported with the “is_in” and “index_in”
    functions GH-36047. They can be multiplied with all integer
    types GH-36128.
  * “is_in” and “index_in” now cast their inputs more flexibly:
    they first attempt to cast the value set to the input type,
    then in the other direction if the former fails GH-36203.
  * Multiple bugs have been fixed in “utf8_slice_codeunits” when
    the stop option is omitted GH-36311.
  ## Dataset
  * A custom schema can now be passed when writing a dataset
    GH-35730. The custom schema can alter nullability or metadata
    information, but is not allowed to change the datatypes
    written.
  ## Filesystems
  * The S3 filesystem now writes files in equal-sized chunks, for
    compatibility with Cloudflare’s “R2” Storage GH-34363.
  * A long-standing issue where S3 support could crash at shutdown
    because of resources still being alive after S3 finalization
    has been fixed GH-36346. Now, attempts to use S3 resources
    (such as making filesystem calls) after S3 finalization should
    result in a clean error.
  * The GCS filesystem accepts a new option to set the project id
    GH-36227.
  ## IPC
  * Nullability and metadata information for sub-fields of map
    types is now preserved when deserializing Arrow IPC GH-35297.
  ## Orc
  * The Orc adapter now maps Arrow field metadata to Orc type
    attributes when writing, and vice-versa when reading GH-35304.
  ## Parquet
  * It is now possible to write additional metadata while a
    ParquetFileWriter is open GH-34888.
  * Writing a page index can be enabled selectively per-column
    GH-34949. In addition, page header statistics are not written
    anymore if the page index is enabled for the given column
    GH-34375, as the information would be redundant and less
    efficiently accessed.
  * Parquet writer properties allow specifying the sorting columns
    GH-35331. The user is responsible for ensuring that the data
    written to the file actually complies with the given sorting.
  * CRC computation has been implemented for v2 data pages
    GH-35171. It was already implemented for v1 data pages.
  * Writing compliant nested types is now enabled by default
    GH-29781. This should not have any negative implication.
  * Attempting to load a subset of an Arrow extension type is now
    forbidden GH-20385. Previously, if an extension type’s storage
    is nested (for example a “Point” extension type backed by a
    struct<x: float64, y: float64>), it was possible to load
    selectively some of the columns of the storage type.
  ## Substrait
  * Support for various functions has been added: “stddev”,
    “variance”, “first”, “last” (GH-35247, GH-35506).
  * Deserializing sorts is now supported GH-32763. However, some
    features, such as clustered sort direction or custom sort
    functions, are not implemented.
  ## Miscellaneous
  * FieldRef sports additional methods to get a flattened version
    of nested fields GH-14946. Compared to their non-flattened
    counterparts, the methods GetFlattened, GetAllFlattened,
    GetOneFlattened and GetOneOrNoneFlattened combine a child’s
    null bitmap with its ancestors’ null bitmaps so as to compute
    the field’s overall logical validity bitmap.
  * In other words, given the struct array [null, {'x': null},
    {'x': 5}], FieldRef("x")::Get might return [0, null, 5] while
    FieldRef("x")::GetFlattened will always return [null, null, 5].
  * Scalar::hash() has been fixed for sliced nested arrays
    GH-35360.
  * A new floating-point to decimal conversion algorithm exhibits
    much better precision GH-35576.
  * It is now possible to cast between scalars of different
    list-like types GH-36309.

-------------------------------------------------------------------
Mon Jun 12 12:13:18 UTC 2023 - Ben Greiner <code@bnavigator.de>

- Update to 12.0.1
  * [GH-35423] - [C++][Parquet] Parquet PageReader Force
    decompression buffer resize smaller (#35428)
  * [GH-35498] - [C++] Relax EnsureAlignment check in Acero from
    requiring 64-byte aligned buffers to requiring value-aligned
    buffers (#35565)
  * [GH-35519] - [C++][Parquet] Fixing exception handling in parquet
    FileSerializer (#35520)
  * [GH-35538] - [C++] Remove unnecessary status.h include from
    protobuf (#35673)
  * [GH-35730] - [C++] Add the ability to specify custom schema on a
    dataset write (#35860)
  * [GH-35850] - [C++] Don't disable optimization with
    RelWithDebInfo (#35856)
- Drop cflags.patch -- fixed upstream

-------------------------------------------------------------------
Thu May 18 07:00:43 UTC 2023 - Ben Greiner <code@bnavigator.de>

- Update to 12.0.0
  * Run-End Encoded Arrays have been implemented and are accessible
    (GH-32104)
  * The FixedShapeTensor Logical value type has been implemented
    using ExtensionType (GH-15483, GH-34796)
  ## Compute
  * New kernel to convert timestamp with timezone to wall time
    (GH-33143)
  * Cast kernels are now built into libarrow by default (GH-34388)
  ## Acero
  * Acero has been moved out of libarrow into its own shared
    library, allowing for smaller builds of the core libarrow
    (GH-15280)
  * Exec nodes now can have a concept of “ordering” and will reject
    non-sensible plans (GH-34136)
  * New exec nodes: “pivot_longer” (GH-34266), “order_by”
    (GH-34248) and “fetch” (GH-34059)
  * Breaking Change: Reorder output fields of “group_by” node so
    that keys/segment keys come before aggregates (GH-33616)
  ## Substrait
  * Add support for the round function GH-33588
  * Add support for the cast expression element GH-31910
  * Added API reference documentation GH-34011
  * Added an extension relation to support segmented aggregation
    GH-34626
  * The output of the aggregate relation now conforms to the spec
    GH-34786
  ## Parquet
  * Added support for DeltaLengthByteArray encoding to the Parquet
    writer (GH-33024)
  * NaNs are correctly handled now for Parquet predicate push-downs
    (GH-18481)
  * Added support for reading Parquet page indexes (GH-33596) and
    writing page indexes (GH-34053)
  * Parquet writer can write columns in parallel now (GH-33655)
  * Fixed incorrect number of rows in Parquet V2 page headers
    (GH-34086)
  * Fixed incorrect Parquet page null_count when stats are disabled
    (GH-34326)
  * Added support for reading BloomFilters to the Parquet Reader
    (GH-34665)
  * Parquet File-writer can now add additional key-value metadata
    after it has been opened (GH-34888)
  * Breaking Change: The default row group size for the Arrow
    writer changed from 64Mi rows to 1Mi rows. GH-34280
  ## ORC
  * Added support for the union type in ORC writer (GH-34262)
  * Fixed ORC CHAR type mapping with Arrow (GH-34823)
  * Fixed timestamp type mapping between ORC and Arrow (GH-34590)
  ## Datasets
  * Added support for reading JSON datasets (GH-33209)
  * Dataset writer now supports specifying a function callback to
    construct the file name in addition to the existing file name
    template (GH-34565)
  ## Filesystems
  * GcsFileSystem::OpenInputFile avoids unnecessary downloads
    (GH-34051)
  ## Other changes
  * Convenience Append(std::optional...) methods have been added to
    array builders
    ([GH-14863](https://github.com/apache/arrow/issues/14863))
  * A deprecated OpenTelemetry header was removed from the Flight
    library (GH-34417)
  * Fixed crash in “take” kernels on ExtensionArrays with an
    underlying dictionary type (GH-34619)
  * Fixed bug where the C-Data bridge did not preserve nullability
    of map values on import (GH-34983)
  * Added support for EqualOptions to RecordBatch::Equals
    (GH-34968)
  * zstd dependency upgraded to v1.5.5 (GH-34899)
  * Improved handling of “logical” nulls such as with union and
    RunEndEncoded arrays (GH-34361)
  * Fixed incorrect handling of uncompressed body buffers in IPC
    reader, added IpcWriteOptions::min_space_savings for optional
    compression optimizations (GH-15102)

-------------------------------------------------------------------
Mon Apr 3 11:09:06 UTC 2023 - Andreas Schwab <schwab@suse.de>

- cflags.patch: fix option order to compile with optimisation
- Adjust constraints

-------------------------------------------------------------------
Wed Mar 29 13:13:13 UTC 2023 - Ben Greiner <code@bnavigator.de>

- Remove gflags-static. It was only needed due to a packaging error
  with gflags which is about to be fixed in Tumbleweed
- Disable build of the jemalloc memory pool backend
  * It requires every consuming application to LD_PRELOAD
    libjemalloc.so.2, even when it is not set as the default memory
    pool, due to static TLS block allocation errors
  * Usage of the bundled jemalloc as a workaround is not desired
    (gh#apache/arrow#13739)
  * jemalloc does not seem to have a clear advantage over the
    system glibc allocator:
    https://ursalabs.org/blog/2021-r-benchmarks-part-1
  * This overrides the default behavior documented in
    https://arrow.apache.org/docs/cpp/memory.html#default-memory-pool

-------------------------------------------------------------------
Sun Mar 12 04:28:52 UTC 2023 - Ben Greiner <code@bnavigator.de>

- Update to v11.0.0
  * ARROW-4709 - [C++] Optimize for ordered JSON fields (#14100)
  * ARROW-11776 - [C++][Java] Support parquet write from ArrowReader
    to file (#14151)
  * ARROW-13938 - [C++] Date and datetime types should autocast from
    strings
  * ARROW-14161 - [C++][Docs] Improve Parquet C++ docs (#14018)
  * ARROW-14999 - [C++] Optional field name equality checks for map
    and list type (#14847)
  * ARROW-15538 - [C++] Expanding coverage of math functions from
    Substrait to Acero (#14434)
  * ARROW-15592 - [C++] Add support for custom output field names in
    a substrait::PlanRel (#14292)
  * ARROW-15732 - [C++] Do not use any CPU threads in execution plan
    when use_threads is false (#15104)
  * ARROW-16782 - [Format] Add REE definitions to FlatBuffers
    (#14176)
  * ARROW-17144 - [C++][Gandiva] Add sqrt function (#13656)
  * ARROW-17301 - [C++] Implement compute function "binary_slice"
    (#14550)
  * ARROW-17509 - [C++] Simplify async scheduler by removing the
    need to call End (#14524)
  * ARROW-17520 - [C++] Implement SubStrait SetRel (UnionAll)
    (#14186)
  * ARROW-17610 - [C++] Support additional source types in
    SourceNode (#14207)
  * ARROW-17613 - [C++] Add function execution API for a
    preconfigured kernel (#14043)
  * ARROW-17640 - [C++] Add File Handling Test cases for GlobFile
    handling in Substrait Read (#14132)
  * ARROW-17798 - [C++][Parquet] Add DELTA_BINARY_PACKED encoder to
    Parquet writer (#14191)
  * ARROW-17825 - [C++] Allow the possibility to write several
    tables in ORCFileWriter (#14219)
  * ARROW-17836 - [C++] Allow specifying alignment of buffers
    (#14225)
  * ARROW-17837 - [C++][Acero] Create ExecPlan-owned QueryContext
    that will store a plan's shared data structures (#14227)
  * ARROW-17859 - [C++] Use self-pipe in signal-receiving StopSource
    (#14250)
  * ARROW-17867 - [C++][FlightRPC] Expose bulk parameter binding in
    Flight SQL (#14266)
  * ARROW-17932 - [C++] Implement streaming RecordBatchReader for
    JSON (#14355)
  * ARROW-17960 - [C++][Python] Implement list_slice kernel (#14395)
  * ARROW-17966 - [C++] Adjust to new format for Substrait optional
    arguments (#14415)
  * ARROW-17975 - [C++] Create at-fork facility (#14594)
  * ARROW-17980 - [C++] As-of-Join Substrait extension (#14485)
  * ARROW-17989 - [C++][Python] Enable struct_field kernel to accept
    string field names (#14495)
  * ARROW-18008 - [Python][C++] Add use_threads to
    run_substrait_query
  * ARROW-18051 - [C++] Enable tests skipped by ARROW-16392 (#14425)
  * ARROW-18095 - [CI][C++][MinGW] All tests exited with 0xc0000139
  * ARROW-18113 - [C++] Add RandomAccessFile::ReadManyAsync (#14723)
  * ARROW-18135 - [C++] Avoid warnings that ExecBatch::length may be
    uninitialized (#14480)
  * ARROW-18144 - [C++] Improve JSONTypeError error message in
    testing (#14486)
  * ARROW-18184 - [C++] Improve JSON parser benchmarks (#14552)
  * ARROW-18206 - [C++][CI] Add a nightly build for C++20
    compilation (#14571)
  * ARROW-18235 - [C++][Gandiva] Fix the like function
    implementation for escape chars (#14579)
  * ARROW-18249 - [C++] Update vcpkg port to arrow 10.0.0
  * ARROW-18253 - [C++][Parquet] Add additional bounds safety checks
    (#14592)
  * ARROW-18259 - [C++][CMake] Add support for system Thrift CMake
    package (#14597)
  * ARROW-18280 - [C++][Python] Support slicing to end in list_slice
    kernel (#14749)
  * ARROW-18282 - [C++][Python] Support step >= 1 in list_slice
    kernel (#14696)
  * ARROW-18287 - [C++][CMake] Add support for Brotli/utf8proc
    provided by vcpkg (#14609)
  * ARROW-18342 - [C++] AsofJoinNode support for Boolean data field
    (#14658)
  * ARROW-18350 - [C++] Use std::to_chars instead of std::to_string
    (#14666)
  * ARROW-18367 - [C++] Enable the creation of named table relations
    (#14681)
  * ARROW-18373 - Fix component drop-down, add license text (#14688)
  * ARROW-18377 - MIGRATION: Automate component labels from issue
    form content (#15245)
  * ARROW-18395 - [C++] Move select-k implementation into separate
    module
  * ARROW-18402 - [C++] Expose DeclarationInfo (#14765)
  * ARROW-18406 - [C++] Can't build Arrow with Substrait on Ubuntu
    20.04 (#14735)
  * ARROW-18409 - [GLib][Plasma] Suppress deprecated warning in
    building plasma-glib (#14739)
  * ARROW-18413 - [C++][Parquet] Expose page index info from
    ColumnChunkMetaData (#14742)
  * ARROW-18419 - [C++] Update vendored fast_float (#14817)
  * ARROW-18420 - [C++][Parquet] Introduce ColumnIndex & OffsetIndex
    (#14803)
  * ARROW-18421 - [C++][ORC] Add accessor for stripe information in
    reader (#14806)
  * ARROW-18427 - [C++] Support negative tolerance in AsofJoinNode
    (#14934)
  * ARROW-18435 - [C++][Java] Update ORC to 1.8.1 (#14942)
  * GH-14869 - [C++] Add Cflags.private defining _STATIC to .pc.in.
    (#14900)
  * GH-14920 - [C++][CMake] Add missing -latomic to Arrow CMake
    package (#15251)
  * GH-14937 - [C++] Add rank kernel benchmarks (#14938)
  * GH-14951 - [C++][Parquet] Add benchmarks for DELTA_BINARY_PACKED
    encoding (#15140)
  * GH-15072 - [C++] Move the round functionality into a separate
    module (#15073)
  * GH-15074 - [Parquet][C++] change 16-bit page_ordinal to 32-bit
    (#15182)
  * GH-15096 - [C++] Substrait ProjectRel Emit Optimization (#15097)
  * GH-15100 - [C++][Parquet] Add benchmark for reading strings from
    Parquet (#15101)
  * GH-15151 - [C++] Adding RecordBatchReaderSource to solve an
    issue in R API (#15183)
  * GH-15185 - [C++][Parquet] Improve documentation for Parquet
    Reader column_indices (#15184)
  * GH-15199 - [C++][Substrait] Allow
    AGGREGATION_INVOCATION_UNSPECIFIED as valid invocation (#15198)
  * GH-15200 - [C++] Created benchmarks for round kernels. (#15201)
  * GH-15216 - [C++][Parquet] Parquet writer accepts RecordBatch
    (#15240)
  * GH-15226 - [C++] Add DurationType to hash kernels (#33685)
  * GH-15237 - [C++] Add ::arrow::Unreachable() using
    std::string_view (#15238)
  * GH-15239 - [C++][Parquet] Parquet writer writes decimal as
    int32/64 (#15244)
  * GH-15290 - [C++][Compute] Optimize IfElse kernel AAS/ASA case
    when the scalar is null (#15291)
  * GH-33607 - [C++] Support optional additional arguments for
    inline visit functions (#33608)
  * GH-33657 - [C++] arrow-dataset.pc doesn't depend on parquet.pc
    without ARROW_PARQUET=ON (#33665)
  * PARQUET-2179 - [C++][Parquet] Add a test for skipping repeated
    fields (#14366)
  * PARQUET-2188 - [parquet-cpp] Add SkipRecords API to RecordReader
    (#14142)
  * PARQUET-2204 - [parquet-cpp] TypedColumnReaderImpl::Skip should
    reuse scratch space (#14509)
  * PARQUET-2206 - [parquet-cpp] Microbenchmark for ColumnReader
    ReadBatch and Skip (#14523)
  * PARQUET-2209 - [parquet-cpp] Optimize skip for the case that
    number of values to skip equals page size (#14545)
  * PARQUET-2210 - [C++][Parquet] Skip pages based on header
    metadata using a callback (#14603)
  * PARQUET-2211 - [C++] Print ColumnMetaData.encoding_stats field
    (#14556)
- Remove unused python3-arrow package declaration
  * Add options as recommended for python support
- Provide test data for unittests
- Don't use system jemalloc but bundle it in order to avoid
  static TLS errors in consuming packages like python-pyarrow
  * gh#apache/arrow#13739

-------------------------------------------------------------------
Sun Aug 28 19:30:50 UTC 2022 - Stefan Brüns <stefan.bruens@rwth-aachen.de>

- Revert ccache change, using ccache in a pristine buildroot
  just slows down OBS builds (use --ccache for local builds).
- Remove unused gflags-static-devel dependency.

-------------------------------------------------------------------
Mon Aug 22 06:22:43 UTC 2022 - John Vandenberg <jayvdb@gmail.com>

- Speed up builds with ccache

-------------------------------------------------------------------
Sat Aug 6 01:59:08 UTC 2022 - Stefan Brüns <stefan.bruens@rwth-aachen.de>

- Update to v9.0.0
  No (current) changelog provided
- Spec file cleanup:
  * Remove lots of duplicate, unused, or wrong build dependencies
  * Do not package outdated Readmes and Changelogs
- Enable tests, disable ones requiring external test data

-------------------------------------------------------------------
Sat Nov 14 09:07:59 UTC 2020 - John Vandenberg <jayvdb@gmail.com>

- Update to v2.0.0

-------------------------------------------------------------------
Wed Nov 13 21:14:00 UTC 2019 - TheBlackCat <toddrme2178@gmail.com>

- Initial spec for v0.12.0
414 apache-arrow.spec (new file)
@@ -0,0 +1,414 @@
#
# spec file for package apache-arrow
#
# Copyright (c) 2023 SUSE LLC
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.

# Please submit bugfixes or comments via https://bugs.opensuse.org/
#

%bcond_without tests
# Required for runtime dispatch, not yet packaged
%bcond_with xsimd

%define sonum 1400
# See git submodule /testing pointing to the correct revision
%define arrow_testing_commit 47f7b56b25683202c1fd957668e13f2abafc0f12
# See git submodule /cpp/submodules/parquet-testing pointing to the correct revision
%define parquet_testing_commit e45cd23f784aab3d6bf0701f8f4e621469ed3be7
Name:           apache-arrow
Version:        14.0.1
Release:        0
Summary:        A development platform for in-memory data
License:        Apache-2.0 AND BSD-3-Clause AND BSD-2-Clause AND MIT
Group:          Development/Tools/Other
URL:            https://arrow.apache.org/
Source0:        https://github.com/apache/arrow/archive/apache-arrow-%{version}.tar.gz
Source1:        https://github.com/apache/arrow-testing/archive/%{arrow_testing_commit}.tar.gz#/arrow-testing-%{version}.tar.gz
Source2:        https://github.com/apache/parquet-testing/archive/%{parquet_testing_commit}.tar.gz#/parquet-testing-%{version}.tar.gz
BuildRequires:  bison
BuildRequires:  cmake >= 3.16
BuildRequires:  fdupes
BuildRequires:  flex
BuildRequires:  gcc-c++
BuildRequires:  libboost_filesystem-devel
BuildRequires:  libboost_system-devel >= 1.64.0
BuildRequires:  libzstd-devel-static
BuildRequires:  llvm-devel >= 7
BuildRequires:  pkgconfig
BuildRequires:  python-rpm-macros
BuildRequires:  python3-base
BuildRequires:  cmake(Snappy) >= 1.1.7
BuildRequires:  cmake(absl)
BuildRequires:  cmake(double-conversion) >= 3.1.5
BuildRequires:  cmake(re2)
BuildRequires:  pkgconfig(RapidJSON)
BuildRequires:  pkgconfig(bzip2) >= 1.0.8
BuildRequires:  pkgconfig(gflags) >= 2.2.0
BuildRequires:  pkgconfig(grpc++) >= 1.20.0
BuildRequires:  pkgconfig(libbrotlicommon) >= 1.0.7
BuildRequires:  pkgconfig(libbrotlidec) >= 1.0.7
BuildRequires:  pkgconfig(libbrotlienc) >= 1.0.7
BuildRequires:  pkgconfig(libcares) >= 1.15.0
BuildRequires:  pkgconfig(libglog) >= 0.3.5
BuildRequires:  pkgconfig(liblz4) >= 1.8.3
BuildRequires:  pkgconfig(libopenssl)
BuildRequires:  pkgconfig(liburiparser) >= 0.9.3
BuildRequires:  pkgconfig(libutf8proc)
BuildRequires:  pkgconfig(libzstd) >= 1.4.3
BuildRequires:  pkgconfig(protobuf) >= 3.7.1
BuildRequires:  pkgconfig(thrift) >= 0.11.0
BuildRequires:  pkgconfig(zlib) >= 1.2.11
%if %{with tests}
BuildRequires:  timezone
BuildRequires:  pkgconfig(gmock) >= 1.10
BuildRequires:  pkgconfig(gtest) >= 1.10
%endif

%description
|
||||
Apache Arrow is a cross-language development platform for in-memory
|
||||
data. It specifies a standardized language-independent columnar memory
|
||||
format for flat and hierarchical data, organized for efficient
|
||||
analytic operations on modern hardware. It also provides computational
|
||||
libraries and zero-copy streaming messaging and interprocess
|
||||
communication.
|
||||
|
||||
%package -n libarrow%{sonum}
|
||||
Summary: Development platform for in-memory data - shared library
|
||||
Group: System/Libraries
|
||||
|
||||
%description -n libarrow%{sonum}
|
||||
Apache Arrow is a cross-language development platform for in-memory
|
||||
data. It specifies a standardized language-independent columnar memory
|
||||
format for flat and hierarchical data, organized for efficient
|
||||
analytic operations on modern hardware. It also provides computational
|
||||
libraries and zero-copy streaming messaging and interprocess
|
||||
communication.
|
||||
|
||||
This package provides the shared library for Apache Arrow.
|
||||
|
||||
%package -n libarrow_acero%{sonum}
|
||||
Summary: Development platform for in-memory data - shared library
|
||||
Group: System/Libraries
|
||||
|
||||
%description -n libarrow_acero%{sonum}
|
||||
Apache Arrow is a cross-language development platform for in-memory
|
||||
data. It specifies a standardized language-independent columnar memory
|
||||
format for flat and hierarchical data, organized for efficient
|
||||
analytic operations on modern hardware. It also provides computational
|
||||
libraries and zero-copy streaming messaging and interprocess
|
||||
communication.
|
||||
|
||||
This package provides the shared library for the Acero streaming execution engine
|
||||
|
||||
%package -n libarrow_dataset%{sonum}
|
||||
Summary: Development platform for in-memory data - shared library
|
||||
Group: System/Libraries
|
||||
|
||||
%description -n libarrow_dataset%{sonum}
|
||||
Apache Arrow is a cross-language development platform for in-memory
|
||||
data. It specifies a standardized language-independent columnar memory
|
||||
format for flat and hierarchical data, organized for efficient
|
||||
analytic operations on modern hardware. It also provides computational
|
||||
libraries and zero-copy streaming messaging and interprocess
|
||||
communication.
|
||||
|
||||
This package provides the shared library for Dataset API support.
|
||||
|
||||
%package -n libparquet%{sonum}
|
||||
Summary: Development platform for in-memory data - shared library
|
||||
Group: System/Libraries
|
||||
|
||||
%description -n libparquet%{sonum}
|
||||
Apache Arrow is a cross-language development platform for in-memory
|
||||
data. It specifies a standardized language-independent columnar memory
|
||||
format for flat and hierarchical data, organized for efficient
|
||||
analytic operations on modern hardware. It also provides computational
|
||||
libraries and zero-copy streaming messaging and interprocess
|
||||
communication.
|
||||
|
||||
This package provides the shared library for the Parquet format.
|
||||
|
||||
%package devel
|
||||
Summary: Development platform for in-memory data - development files
|
||||
Group: Development/Libraries/C and C++
|
||||
Requires: libarrow%{sonum} = %{version}
|
||||
Requires: libarrow_acero%{sonum} = %{version}
|
||||
Requires: libarrow_dataset%{sonum} = %{version}
|
||||
|
||||
%description devel
|
||||
Apache Arrow is a cross-language development platform for in-memory
|
||||
data. It specifies a standardized language-independent columnar memory
|
||||
format for flat and hierarchical data, organized for efficient
|
||||
analytic operations on modern hardware. It also provides computational
|
||||
libraries and zero-copy streaming messaging and interprocess
|
||||
communication.
|
||||
|
||||
This package provides the development libraries and headers for
|
||||
Apache Arrow.
|
||||
|
||||
%package devel-static
|
||||
Summary: Development platform for in-memory data - development files
|
||||
Group: Development/Libraries/C and C++
|
||||
Requires: %{name}-devel = %{version}
|
||||
|
||||
%description devel-static
|
||||
Apache Arrow is a cross-language development platform for in-memory
|
||||
data. It specifies a standardized language-independent columnar memory
|
||||
format for flat and hierarchical data, organized for efficient
|
||||
analytic operations on modern hardware. It also provides computational
|
||||
libraries and zero-copy streaming messaging and interprocess
|
||||
communication.
|
||||
|
||||
This package provides the static library
|
||||
|
||||
%package acero-devel-static
Summary: Development platform for in-memory data - static libraries
Group: Development/Libraries/C and C++
Requires: %{name}-devel = %{version}

%description acero-devel-static
Apache Arrow is a cross-language development platform for in-memory
data. It specifies a standardized language-independent columnar memory
format for flat and hierarchical data, organized for efficient
analytic operations on modern hardware. It also provides computational
libraries and zero-copy streaming messaging and interprocess
communication.

This package provides the static library for the Acero streaming execution engine.

%package dataset-devel-static
Summary: Development platform for in-memory data - static libraries
Group: Development/Libraries/C and C++
Requires: %{name}-devel = %{version}

%description dataset-devel-static
Apache Arrow is a cross-language development platform for in-memory
data. It specifies a standardized language-independent columnar memory
format for flat and hierarchical data, organized for efficient
analytic operations on modern hardware. It also provides computational
libraries and zero-copy streaming messaging and interprocess
communication.

This package provides the static library for Dataset API support.

%package -n apache-parquet-devel
Summary: Development platform for in-memory data - development files
Group: Development/Libraries/C and C++
Requires: libparquet%{sonum} = %{version}

%description -n apache-parquet-devel
Apache Arrow is a cross-language development platform for in-memory
data. It specifies a standardized language-independent columnar memory
format for flat and hierarchical data, organized for efficient
analytic operations on modern hardware. It also provides computational
libraries and zero-copy streaming messaging and interprocess
communication.

This package provides the development libraries and headers for
the Parquet format.

%package -n apache-parquet-devel-static
Summary: Development platform for in-memory data - static libraries
Group: Development/Libraries/C and C++
Requires: apache-parquet-devel = %{version}

%description -n apache-parquet-devel-static
Apache Arrow is a cross-language development platform for in-memory
data. It specifies a standardized language-independent columnar memory
format for flat and hierarchical data, organized for efficient
analytic operations on modern hardware. It also provides computational
libraries and zero-copy streaming messaging and interprocess
communication.

This package provides the static library for the Parquet format.

%package -n apache-parquet-utils
Summary: Development platform for in-memory data - utilities
Group: Productivity/Scientific/Math

%description -n apache-parquet-utils
Apache Arrow is a cross-language development platform for in-memory
data. It specifies a standardized language-independent columnar memory
format for flat and hierarchical data, organized for efficient
analytic operations on modern hardware. It also provides computational
libraries and zero-copy streaming messaging and interprocess
communication.

This package provides utilities for working with the Parquet format.

%prep
%setup -q -n arrow-apache-arrow-%{version} -a1 -a2

%build
export CFLAGS="%{optflags} -ffat-lto-objects"
export CXXFLAGS="%{optflags} -ffat-lto-objects"

pushd cpp
%cmake \
  -DARROW_BUILD_EXAMPLES:BOOL=ON \
  -DARROW_BUILD_SHARED:BOOL=ON \
  -DARROW_BUILD_STATIC:BOOL=ON \
  -DARROW_BUILD_TESTS:BOOL=%{?with_tests:ON}%{!?with_tests:OFF} \
  -DARROW_BUILD_UTILITIES:BOOL=ON \
  -DARROW_DEPENDENCY_SOURCE=SYSTEM \
  -DARROW_BUILD_BENCHMARKS:BOOL=OFF \
%ifarch aarch64
  -DARROW_SIMD_LEVEL:STRING=%{?with_xsimd:NEON}%{!?with_xsimd:NONE} \
%else
  -DARROW_SIMD_LEVEL:STRING="NONE" \
%endif
  -DARROW_RUNTIME_SIMD_LEVEL:STRING=%{?with_xsimd:MAX}%{!?with_xsimd:NONE} \
  -DARROW_COMPUTE:BOOL=ON \
  -DARROW_CSV:BOOL=ON \
  -DARROW_DATASET:BOOL=ON \
  -DARROW_FILESYSTEM:BOOL=ON \
  -DARROW_FLIGHT:BOOL=OFF \
  -DARROW_GANDIVA:BOOL=OFF \
  -DARROW_HDFS:BOOL=ON \
  -DARROW_HIVESERVER2:BOOL=OFF \
  -DARROW_IPC:BOOL=ON \
  -DARROW_JEMALLOC:BOOL=OFF \
  -DARROW_JSON:BOOL=ON \
  -DARROW_ORC:BOOL=OFF \
  -DARROW_PARQUET:BOOL=ON \
  -DARROW_USE_GLOG:BOOL=ON \
  -DARROW_USE_OPENSSL:BOOL=ON \
  -DARROW_WITH_BACKTRACE:BOOL=ON \
  -DARROW_WITH_BROTLI:BOOL=ON \
  -DARROW_WITH_BZ2:BOOL=ON \
  -DARROW_WITH_LZ4:BOOL=ON \
  -DARROW_WITH_SNAPPY:BOOL=ON \
  -DARROW_WITH_ZLIB:BOOL=ON \
  -DARROW_WITH_ZSTD:BOOL=ON \
  -DPARQUET_BUILD_EXAMPLES:BOOL=ON \
  -DPARQUET_BUILD_EXECUTABLES:BOOL=ON \
  -DPARQUET_REQUIRE_ENCRYPTION:BOOL=ON \
  -DARROW_VERBOSE_THIRDPARTY_BUILD:BOOL=ON \
  -DARROW_CUDA:BOOL=OFF \
  -DARROW_GANDIVA_JAVA:BOOL=OFF

%cmake_build
popd

%install
pushd cpp
%cmake_install
popd
%if %{with tests}
rm %{buildroot}%{_libdir}/libarrow_testing.so*
rm %{buildroot}%{_libdir}/libarrow_testing.a
rm %{buildroot}%{_libdir}/pkgconfig/arrow-testing.pc
rm -Rf %{buildroot}%{_includedir}/arrow/testing
%endif
rm -r %{buildroot}%{_datadir}/doc/arrow/
%fdupes %{buildroot}%{_libdir}/cmake

%check
%if %{with tests}
export PARQUET_TEST_DATA="${PWD}/parquet-testing-%{parquet_testing_commit}/data"
export ARROW_TEST_DATA="${PWD}/arrow-testing-%{arrow_testing_commit}/data"
pushd cpp
export PYTHON=%{_bindir}/python3
%ifarch %ix86 %arm32
GTEST_failing="TestDecimalFromReal*"
GTEST_failing="${GTEST_failing}:*TestDecryptionConfiguration.TestDecryption*"
%endif
%ifnarch x86_64
GTEST_failing="${GTEST_failing}:Jemalloc.GetAllocationStats"
%endif
# Run the known-failing tests separately (ignoring their result), then
# exclude them from the main run via a GoogleTest negative filter.
if [ -n "${GTEST_failing}" ]; then
export GTEST_FILTER=${GTEST_failing}
%ctest --label-regex unittest || true
export GTEST_FILTER=*:-${GTEST_failing}
fi
%ctest --label-regex unittest
popd
%endif

%post -n libarrow%{sonum} -p /sbin/ldconfig
%postun -n libarrow%{sonum} -p /sbin/ldconfig
%post -n libarrow_acero%{sonum} -p /sbin/ldconfig
%postun -n libarrow_acero%{sonum} -p /sbin/ldconfig
%post -n libarrow_dataset%{sonum} -p /sbin/ldconfig
%postun -n libarrow_dataset%{sonum} -p /sbin/ldconfig
%post -n libparquet%{sonum} -p /sbin/ldconfig
%postun -n libparquet%{sonum} -p /sbin/ldconfig

%files
%license LICENSE.txt NOTICE.txt header
%{_bindir}/arrow-file-to-stream
%{_bindir}/arrow-stream-to-file

%files -n libarrow%{sonum}
%license LICENSE.txt NOTICE.txt header
%{_libdir}/libarrow.so.*

%files -n libarrow_acero%{sonum}
%license LICENSE.txt NOTICE.txt header
%{_libdir}/libarrow_acero.so.*

%files -n libarrow_dataset%{sonum}
%license LICENSE.txt NOTICE.txt header
%{_libdir}/libarrow_dataset.so.*

%files -n libparquet%{sonum}
%license LICENSE.txt NOTICE.txt header
%{_libdir}/libparquet.so.*

%files devel
%doc README.md
%license LICENSE.txt NOTICE.txt header
%{_includedir}/arrow/
%{_libdir}/cmake/Arrow*
%{_libdir}/libarrow.so
%{_libdir}/libarrow_acero.so
%{_libdir}/libarrow_dataset.so
%{_libdir}/pkgconfig/arrow*.pc
%dir %{_datadir}/arrow
%{_datadir}/arrow/gdb
%dir %{_datadir}/gdb
%dir %{_datadir}/gdb/auto-load
%dir %{_datadir}/gdb/auto-load/%{_prefix}
%dir %{_datadir}/gdb/auto-load/%{_libdir}
%{_datadir}/gdb/auto-load/%{_libdir}/libarrow.so.*.py

%files devel-static
%license LICENSE.txt NOTICE.txt header
%{_libdir}/libarrow.a

%files acero-devel-static
%license LICENSE.txt NOTICE.txt header
%{_libdir}/libarrow_acero.a

%files dataset-devel-static
%license LICENSE.txt NOTICE.txt header
%{_libdir}/libarrow_dataset.a

%files -n apache-parquet-devel
%doc README.md
%license LICENSE.txt NOTICE.txt header
%{_includedir}/parquet/
%{_libdir}/cmake/Parquet
%{_libdir}/libparquet.so
%{_libdir}/pkgconfig/parquet.pc

%files -n apache-parquet-devel-static
%license LICENSE.txt NOTICE.txt header
%{_libdir}/libparquet.a

%files -n apache-parquet-utils
%doc README.md
%license LICENSE.txt NOTICE.txt header
%{_bindir}/parquet-*

%changelog
BIN arrow-testing-14.0.1.tar.gz (Stored with Git LFS) Normal file
BIN parquet-testing-14.0.1.tar.gz (Stored with Git LFS) Normal file