commit fa16e2d0e2c2bc0f5ccbaaad719dc86b4eb33c2a Author: Adrian Schröter Date: Fri Jan 5 09:32:04 2024 +0100 Sync from SUSE:ALP:Source:Standard:1.0 apache-arrow revision 93f04aba74c56d8de77af9dd5d33dad0 diff --git a/.gitattributes b/.gitattributes new file mode 100644 index 0000000..fecc750 --- /dev/null +++ b/.gitattributes @@ -0,0 +1,23 @@ +## Default LFS +*.7z filter=lfs diff=lfs merge=lfs -text +*.bsp filter=lfs diff=lfs merge=lfs -text +*.bz2 filter=lfs diff=lfs merge=lfs -text +*.gem filter=lfs diff=lfs merge=lfs -text +*.gz filter=lfs diff=lfs merge=lfs -text +*.jar filter=lfs diff=lfs merge=lfs -text +*.lz filter=lfs diff=lfs merge=lfs -text +*.lzma filter=lfs diff=lfs merge=lfs -text +*.obscpio filter=lfs diff=lfs merge=lfs -text +*.oxt filter=lfs diff=lfs merge=lfs -text +*.pdf filter=lfs diff=lfs merge=lfs -text +*.png filter=lfs diff=lfs merge=lfs -text +*.rpm filter=lfs diff=lfs merge=lfs -text +*.tbz filter=lfs diff=lfs merge=lfs -text +*.tbz2 filter=lfs diff=lfs merge=lfs -text +*.tgz filter=lfs diff=lfs merge=lfs -text +*.ttf filter=lfs diff=lfs merge=lfs -text +*.txz filter=lfs diff=lfs merge=lfs -text +*.whl filter=lfs diff=lfs merge=lfs -text +*.xz filter=lfs diff=lfs merge=lfs -text +*.zip filter=lfs diff=lfs merge=lfs -text +*.zst filter=lfs diff=lfs merge=lfs -text diff --git a/_constraints b/_constraints new file mode 100644 index 0000000..fa5aa27 --- /dev/null +++ b/_constraints @@ -0,0 +1,11 @@ + + + + 10 + + + 10 + + + + diff --git a/apache-arrow-14.0.1.tar.gz b/apache-arrow-14.0.1.tar.gz new file mode 100644 index 0000000..0f6665f --- /dev/null +++ b/apache-arrow-14.0.1.tar.gz @@ -0,0 +1,3 @@ +version https://git-lfs.github.com/spec/v1 +oid sha256:a48e54a09d58168bc04d86b13e7dab04f0aaba18a6f7e4dadf3e9c7bb835c8f1 +size 20634558 diff --git a/apache-arrow.changes b/apache-arrow.changes new file mode 100644 index 0000000..6244ce8 --- /dev/null +++ b/apache-arrow.changes @@ -0,0 +1,418 @@ 
+------------------------------------------------------------------- +Mon Nov 13 23:51:00 UTC 2023 - Ondřej Súkup + +- update to 14.0.1 + * GH-38431 - [Python][CI] Update fs.type_name checks for s3fs tests + * GH-38607 - [Python] Disable PyExtensionType autoload +- update to 14.0.0 + * very long list of changes can be found here: + https://arrow.apache.org/release/14.0.0.html + +------------------------------------------------------------------- +Fri Aug 25 09:05:09 UTC 2023 - Ben Greiner + +- Update to 13.0.0 + ## Acero + * Handling of unaligned buffers in input nodes can be configured + programmatically or by setting the environment variable + ACERO_ALIGNMENT_HANDLING. The default behavior is to warn when + an unaligned buffer is detected GH-35498. + ## Compute + * Several new functions have been added: + - aggregate functions “first”, “last”, “first_last” GH-34911; + - vector functions “cumulative_prod”, “cumulative_min”, + “cumulative_max” GH-32190; + - vector function “pairwise_diff” GH-35786. + * Sorting now works on dictionary arrays, with a much better + performance than the naive approach of sorting the decoded + dictionary GH-29887. Sorting also works on struct arrays, and + nested sort keys are supported using FieldRef GH-33206. + * The check_overflow option has been removed from + CumulativeSumOptions as it was redundant with the availability + of two different functions: “cumulative_sum” and + “cumulative_sum_checked” GH-35789. + * Run-end encoded filters are efficiently supported GH-35749. + * Duration types are supported with the “is_in” and “index_in” + functions GH-36047. They can be multiplied with all integer + types GH-36128. + * “is_in” and “index_in” now cast their inputs more flexibly: + they first attempt to cast the value set to the input type, + then in the other direction if the former fails GH-36203. + * Multiple bugs have been fixed in “utf8_slice_codeunits” when + the stop option is omitted GH-36311.
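The split of check_overflow into the two kernels “cumulative_sum” and “cumulative_sum_checked” noted above can be illustrated with a minimal sketch. This is plain Python standing in for the Arrow C++ compute kernels (int8 arithmetic modeled by hand, not the actual Arrow API):

```python
# Hypothetical illustration of the two kernels' semantics on int8 data;
# plain Python, not the Arrow C++ compute API.
INT8_MIN, INT8_MAX = -128, 127

def cumulative_sum(values):
    """Unchecked variant: overflow wraps around, int8-style."""
    out, total = [], 0
    for v in values:
        # Wrap the running total into the int8 range [-128, 127].
        total = (total + v - INT8_MIN) % 256 + INT8_MIN
        out.append(total)
    return out

def cumulative_sum_checked(values):
    """Checked variant: overflow raises instead of wrapping."""
    out, total = [], 0
    for v in values:
        total += v
        if not INT8_MIN <= total <= INT8_MAX:
            raise OverflowError("cumulative sum overflowed int8")
        out.append(total)
    return out
```

With this split, callers pick overflow behavior by choosing a function rather than passing an option: `cumulative_sum([100, 100])` wraps to `[100, -56]`, while the checked variant raises on the same input.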
+ ## Dataset + * A custom schema can now be passed when writing a dataset + GH-35730. The custom schema can alter nullability or metadata + information, but is not allowed to change the datatypes + written. + ## Filesystems + * The S3 filesystem now writes files in equal-sized chunks, for + compatibility with Cloudflare’s “R2” Storage GH-34363. + * A long-standing issue where S3 support could crash at shutdown + because of resources still being alive after S3 finalization + has been fixed GH-36346. Now, attempts to use S3 resources + (such as making filesystem calls) after S3 finalization should + result in a clean error. + * The GCS filesystem accepts a new option to set the project id + GH-36227. + ## IPC + * Nullability and metadata information for sub-fields of map + types is now preserved when deserializing Arrow IPC GH-35297. + ## Orc + * The Orc adapter now maps Arrow field metadata to Orc type + attributes when writing, and vice-versa when reading GH-35304. + ## Parquet + * It is now possible to write additional metadata while a + ParquetFileWriter is open GH-34888. + * Writing a page index can be enabled selectively per-column + GH-34949. In addition, page header statistics are not written + anymore if the page index is enabled for the given column + GH-34375, as the information would be redundant and less + efficiently accessed. + * Parquet writer properties allow specifying the sorting columns + GH-35331. The user is responsible for ensuring that the data + written to the file actually complies with the given sorting. + * CRC computation has been implemented for v2 data pages + GH-35171. It was already implemented for v1 data pages. + * Writing compliant nested types is now enabled by default + GH-29781. This should not have any negative implication. + * Attempting to load a subset of an Arrow extension type is now + forbidden GH-20385. 
Previously, if an extension type’s storage + is nested (for example a “Point” extension type backed by a + struct), it was possible to load + selectively some of the columns of the storage type. + ## Substrait + * Support for various functions has been added: “stddev”, + “variance”, “first”, “last” (GH-35247, GH-35506). + * Deserializing sorts is now supported GH-32763. However, some + features, such as clustered sort direction or custom sort + functions, are not implemented. + ## Miscellaneous + * FieldRef sports additional methods to get a flattened version + of nested fields GH-14946. Compared to their non-flattened + counterparts, the methods GetFlattened, GetAllFlattened, + GetOneFlattened and GetOneOrNoneFlattened combine a child’s + null bitmap with its ancestors’ null bitmaps so as to compute + the field’s overall logical validity bitmap. + * In other words, given the struct array [null, {'x': null}, + {'x': 5}], FieldRef("x")::Get might return [0, null, 5] while + FieldRef("x")::GetFlattened will always return [null, null, 5]. + * Scalar::hash() has been fixed for sliced nested arrays + GH-35360. + * A new floating-point to decimal conversion algorithm exhibits + much better precision GH-35576. + * It is now possible to cast between scalars of different + list-like types GH-36309.
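The flattened-field semantics from the FieldRef bullets above can be mimicked in a short sketch. This is illustrative Python, not the Arrow C++ FieldRef API: None stands in for null and dicts for struct values:

```python
# Hypothetical model of GetFlattened; None plays the role of null.
def get_flattened(structs, field):
    """Combine parent validity with child validity: a slot is null if
    the struct itself is null or the child field is null."""
    return [None if s is None or s.get(field) is None else s[field]
            for s in structs]

structs = [None, {"x": None}, {"x": 5}]
# A non-flattened Get returns the raw child array, where the value under
# a null parent is unspecified (the release note's example shows 0).
# GetFlattened masks that slot with the parent's null instead:
print(get_flattened(structs, "x"))  # [None, None, 5]
```

The design point is that the flattened accessors compute the logical validity of the nested field, so downstream code never observes garbage values hiding under a null ancestor.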
+ +------------------------------------------------------------------- +Mon Jun 12 12:13:18 UTC 2023 - Ben Greiner + +- Update to 12.0.1 + * [GH-35423] - [C++][Parquet] Parquet PageReader Force + decompression buffer resize smaller (#35428) + * [GH-35498] - [C++] Relax EnsureAlignment check in Acero from + requiring 64-byte aligned buffers to requiring value-aligned + buffers (#35565) + * [GH-35519] - [C++][Parquet] Fixing exception handling in parquet + FileSerializer (#35520) + * [GH-35538] - [C++] Remove unnecessary status.h include from + protobuf (#35673) + * [GH-35730] - [C++] Add the ability to specify custom schema on a + dataset write (#35860) + * [GH-35850] - [C++] Don't disable optimization with + RelWithDebInfo (#35856) +- Drop cflags.patch -- fixed upstream + +------------------------------------------------------------------- +Thu May 18 07:00:43 UTC 2023 - Ben Greiner + +- Update to 12.0.0 + * Run-End Encoded Arrays have been implemented and are accessible + (GH-32104) + * The FixedShapeTensor Logical value type has been implemented + using ExtensionType (GH-15483, GH-34796) + ## Compute + * New kernel to convert timestamp with timezone to wall time + (GH-33143) + * Cast kernels are now built into libarrow by default (GH-34388) + ## Acero + * Acero has been moved out of libarrow into its own shared + library, allowing for smaller builds of the core libarrow + (GH-15280) + * Exec nodes now can have a concept of “ordering” and will reject + non-sensible plans (GH-34136) + * New exec nodes: “pivot_longer” (GH-34266), “order_by” + (GH-34248) and “fetch” (GH-34059) + * Breaking Change: Reorder output fields of “group_by” node so + that keys/segment keys come before aggregates (GH-33616) + ## Substrait + * Add support for the round function GH-33588 + * Add support for the cast expression element GH-31910 + * Added API reference documentation GH-34011 + * Added an extension relation to support segmented aggregation + GH-34626 + * The output of the
aggregate relation now conforms to the spec + GH-34786 + ## Parquet + * Added support for DeltaLengthByteArray encoding to the Parquet + writer (GH-33024) + * NaNs are correctly handled now for Parquet predicate push-downs + (GH-18481) + * Added support for reading Parquet page indexes (GH-33596) and + writing page indexes (GH-34053) + * Parquet writer can write columns in parallel now (GH-33655) + * Fixed incorrect number of rows in Parquet V2 page headers + (GH-34086) + * Fixed incorrect Parquet page null_count when stats are disabled + (GH-34326) + * Added support for reading BloomFilters to the Parquet Reader + (GH-34665) + * Parquet File-writer can now add additional key-value metadata + after it has been opened (GH-34888) + * Breaking Change: The default row group size for the Arrow + writer changed from 64Mi rows to 1Mi rows. GH-34280 + ## ORC + * Added support for the union type in ORC writer (GH-34262) + * Fixed ORC CHAR type mapping with Arrow (GH-34823) + * Fixed timestamp type mapping between ORC and arrow (GH-34590) + ## Datasets + * Added support for reading JSON datasets (GH-33209) + * Dataset writer now supports specifying a function callback to + construct the file name in addition to the existing file name + template (GH-34565) + ## Filesystems + * GcsFileSystem::OpenInputFile avoids unnecessary downloads + (GH-34051) + ## Other changes + * Convenience Append(std::optional...) 
methods have been added to + array builders + ([GH-14863](https://github.com/apache/arrow/issues/14863)) + * A deprecated OpenTelemetry header was removed from the Flight + library (GH-34417) + * Fixed crash in “take” kernels on ExtensionArrays with an + underlying dictionary type (GH-34619) + * Fixed bug where the C-Data bridge did not preserve nullability + of map values on import (GH-34983) + * Added support for EqualOptions to RecordBatch::Equals + (GH-34968) + * zstd dependency upgraded to v1.5.5 (GH-34899) + * Improved handling of “logical” nulls such as with union and + RunEndEncoded arrays (GH-34361) + * Fixed incorrect handling of uncompressed body buffers in IPC + reader, added IpcWriteOptions::min_space_savings for optional + compression optimizations (GH-15102) + +------------------------------------------------------------------- +Mon Apr 3 11:09:06 UTC 2023 - Andreas Schwab + +- cflags.patch: fix option order to compile with optimisation +- Adjust constraints + +------------------------------------------------------------------- +Wed Mar 29 13:13:13 UTC 2023 - Ben Greiner + +- Remove gflags-static. 
It was only needed due to a packaging error + with gflags which is about to be fixed in Tumbleweed +- Disable build of the jemalloc memory pool backend + * It requires every consuming application to LD_PRELOAD + libjemalloc.so.2, even when it is not set as the default memory + pool, due to static TLS block allocation errors + * Usage of the bundled jemalloc as a workaround is not desired + (gh#apache/arrow#13739) + * jemalloc does not seem to have a clear advantage over the + system glibc allocator: + https://ursalabs.org/blog/2021-r-benchmarks-part-1 + * This overrides the default behavior documented in + https://arrow.apache.org/docs/cpp/memory.html#default-memory-pool + +------------------------------------------------------------------- +Sun Mar 12 04:28:52 UTC 2023 - Ben Greiner + +- Update to v11.0.0 + * ARROW-4709 - [C++] Optimize for ordered JSON fields (#14100) + * ARROW-11776 - [C++][Java] Support parquet write from ArrowReader + to file (#14151) + * ARROW-13938 - [C++] Date and datetime types should autocast from + strings + * ARROW-14161 - [C++][Docs] Improve Parquet C++ docs (#14018) + * ARROW-14999 - [C++] Optional field name equality checks for map + and list type (#14847) + * ARROW-15538 - [C++] Expanding coverage of math functions from + Substrait to Acero (#14434) + * ARROW-15592 - [C++] Add support for custom output field names in + a substrait::PlanRel (#14292) + * ARROW-15732 - [C++] Do not use any CPU threads in execution plan + when use_threads is false (#15104) + * ARROW-16782 - [Format] Add REE definitions to FlatBuffers + (#14176) + * ARROW-17144 - [C++][Gandiva] Add sqrt function (#13656) + * ARROW-17301 - [C++] Implement compute function "binary_slice" + (#14550) + * ARROW-17509 - [C++] Simplify async scheduler by removing the + need to call End (#14524) + * ARROW-17520 - [C++] Implement SubStrait SetRel (UnionAll) + (#14186) + * ARROW-17610 - [C++] Support additional source types in + SourceNode (#14207) + * ARROW-17613 - [C++] Add 
function execution API for a + preconfigured kernel (#14043) + * ARROW-17640 - [C++] Add File Handling Test cases for GlobFile + handling in Substrait Read (#14132) + * ARROW-17798 - [C++][Parquet] Add DELTA_BINARY_PACKED encoder to + Parquet writer (#14191) + * ARROW-17825 - [C++] Allow the possibility to write several + tables in ORCFileWriter (#14219) + * ARROW-17836 - [C++] Allow specifying alignment of buffers + (#14225) + * ARROW-17837 - [C++][Acero] Create ExecPlan-owned QueryContext + that will store a plan's shared data structures (#14227) + * ARROW-17859 - [C++] Use self-pipe in signal-receiving StopSource + (#14250) + * ARROW-17867 - [C++][FlightRPC] Expose bulk parameter binding in + Flight SQL (#14266) + * ARROW-17932 - [C++] Implement streaming RecordBatchReader for + JSON (#14355) + * ARROW-17960 - [C++][Python] Implement list_slice kernel (#14395) + * ARROW-17966 - [C++] Adjust to new format for Substrait optional + arguments (#14415) + * ARROW-17975 - [C++] Create at-fork facility (#14594) + * ARROW-17980 - [C++] As-of-Join Substrait extension (#14485) + * ARROW-17989 - [C++][Python] Enable struct_field kernel to accept + string field names (#14495) + * ARROW-18008 - [Python][C++] Add use_threads to + run_substrait_query + * ARROW-18051 - [C++] Enable tests skipped by ARROW-16392 (#14425) + * ARROW-18095 - [CI][C++][MinGW] All tests exited with 0xc0000139 + * ARROW-18113 - [C++] Add RandomAccessFile::ReadManyAsync (#14723) + * ARROW-18135 - [C++] Avoid warnings that ExecBatch::length may be + uninitialized (#14480) + * ARROW-18144 - [C++] Improve JSONTypeError error message in + testing (#14486) + * ARROW-18184 - [C++] Improve JSON parser benchmarks (#14552) + * ARROW-18206 - [C++][CI] Add a nightly build for C++20 + compilation (#14571) + * ARROW-18235 - [C++][Gandiva] Fix the like function + implementation for escape chars (#14579) + * ARROW-18249 - [C++] Update vcpkg port to arrow 10.0.0 + * ARROW-18253 - [C++][Parquet] Add additional bounds 
safety checks + (#14592) + * ARROW-18259 - [C++][CMake] Add support for system Thrift CMake + package (#14597) + * ARROW-18280 - [C++][Python] Support slicing to end in list_slice + kernel (#14749) + * ARROW-18282 - [C++][Python] Support step >= 1 in list_slice + kernel (#14696) + * ARROW-18287 - [C++][CMake] Add support for Brotli/utf8proc + provided by vcpkg (#14609) + * ARROW-18342 - [C++] AsofJoinNode support for Boolean data field + (#14658) + * ARROW-18350 - [C++] Use std::to_chars instead of std::to_string + (#14666) + * ARROW-18367 - [C++] Enable the creation of named table relations + (#14681) + * ARROW-18373 - Fix component drop-down, add license text (#14688) + * ARROW-18377 - MIGRATION: Automate component labels from issue + form content (#15245) + * ARROW-18395 - [C++] Move select-k implementation into separate + module + * ARROW-18402 - [C++] Expose DeclarationInfo (#14765) + * ARROW-18406 - [C++] Can't build Arrow with Substrait on Ubuntu + 20.04 (#14735) + * ARROW-18409 - [GLib][Plasma] Suppress deprecated warning in + building plasma-glib (#14739) + * ARROW-18413 - [C++][Parquet] Expose page index info from + ColumnChunkMetaData (#14742) + * ARROW-18419 - [C++] Update vendored fast_float (#14817) + * ARROW-18420 - [C++][Parquet] Introduce ColumnIndex & OffsetIndex + (#14803) + * ARROW-18421 - [C++][ORC] Add accessor for stripe information in + reader (#14806) + * ARROW-18427 - [C++] Support negative tolerance in AsofJoinNode + (#14934) + * ARROW-18435 - [C++][Java] Update ORC to 1.8.1 (#14942) + * GH-14869 - [C++] Add Cflags.private defining _STATIC to .pc.in. 
+ (#14900) + * GH-14920 - [C++][CMake] Add missing -latomic to Arrow CMake + package (#15251) + * GH-14937 - [C++] Add rank kernel benchmarks (#14938) + * GH-14951 - [C++][Parquet] Add benchmarks for DELTA_BINARY_PACKED + encoding (#15140) + * GH-15072 - [C++] Move the round functionality into a separate + module (#15073) + * GH-15074 - [Parquet][C++] change 16-bit page_ordinal to 32-bit + (#15182) + * GH-15096 - [C++] Substrait ProjectRel Emit Optimization (#15097) + * GH-15100 - [C++][Parquet] Add benchmark for reading strings from + Parquet (#15101) + * GH-15151 - [C++] Adding RecordBatchReaderSource to solve an + issue in R API (#15183) + * GH-15185 - [C++][Parquet] Improve documentation for Parquet + Reader column_indices (#15184) + * GH-15199 - [C++][Substrait] Allow + AGGREGATION_INVOCATION_UNSPECIFIED as valid invocation (#15198) + * GH-15200 - [C++] Created benchmarks for round kernels. (#15201) + * GH-15216 - [C++][Parquet] Parquet writer accepts RecordBatch + (#15240) + * GH-15226 - [C++] Add DurationType to hash kernels (#33685) + * GH-15237 - [C++] Add ::arrow::Unreachable() using + std::string_view (#15238) + * GH-15239 - [C++][Parquet] Parquet writer writes decimal as + int32/64 (#15244) + * GH-15290 - [C++][Compute] Optimize IfElse kernel AAS/ASA case + when the scalar is null (#15291) + * GH-33607 - [C++] Support optional additional arguments for + inline visit functions (#33608) + * GH-33657 - [C++] arrow-dataset.pc doesn't depend on parquet.pc + without ARROW_PARQUET=ON (#33665) + * PARQUET-2179 - [C++][Parquet] Add a test for skipping repeated + fields (#14366) + * PARQUET-2188 - [parquet-cpp] Add SkipRecords API to RecordReader + (#14142) + * PARQUET-2204 - [parquet-cpp] TypedColumnReaderImpl::Skip should + reuse scratch space (#14509) + * PARQUET-2206 - [parquet-cpp] Microbenchmark for ColumnReader + ReadBatch and Skip (#14523) + * PARQUET-2209 - [parquet-cpp] Optimize skip for the case that + number of values to skip equals page size (#14545) 
+ * PARQUET-2210 - [C++][Parquet] Skip pages based on header + metadata using a callback (#14603) + * PARQUET-2211 - [C++] Print ColumnMetaData.encoding_stats field + (#14556) +- Remove unused python3-arrow package declaration + * Add options as recommended for python support +- Provide test data for unittests +- Don't use system jemalloc but bundle it in order to avoid + static TLS errors in consuming packages like python-pyarrow + * gh#apache/arrow#13739 + +------------------------------------------------------------------- +Sun Aug 28 19:30:50 UTC 2022 - Stefan Brüns + +- Revert ccache change, using ccache in a pristine buildroot + just slows down OBS builds (use --ccache for local builds). +- Remove unused gflags-static-devel dependency. + +------------------------------------------------------------------- +Mon Aug 22 06:22:43 UTC 2022 - John Vandenberg + +- Speed up builds with ccache + +------------------------------------------------------------------- +Sat Aug 6 01:59:08 UTC 2022 - Stefan Brüns + +- Update to v9.0.0 + No (current) changelog provided +- Spec file cleanup: + * Remove lots of duplicate, unused, or wrong build dependencies + * Do not package outdated Readmes and Changelogs +- Enable tests, disable ones requiring external test data + +------------------------------------------------------------------- +Sat Nov 14 09:07:59 UTC 2020 - John Vandenberg + +- Update to v2.0.0 + +------------------------------------------------------------------- +Wed Nov 13 21:14:00 UTC 2019 - TheBlackCat + +- Initial spec for v0.12.0 diff --git a/apache-arrow.spec b/apache-arrow.spec new file mode 100644 index 0000000..535f3db --- /dev/null +++ b/apache-arrow.spec @@ -0,0 +1,414 @@ +# +# spec file for package apache-arrow +# +# Copyright (c) 2023 SUSE LLC +# +# All modifications and additions to the file contributed by third parties +# remain the property of their copyright owners, unless otherwise agreed +# upon. 
The license for this file, and modifications and additions to the +# file, is the same license as for the pristine package itself (unless the +# license for the pristine package is not an Open Source License, in which +# case the license is the MIT License). An "Open Source License" is a +# license that conforms to the Open Source Definition (Version 1.9) +# published by the Open Source Initiative. + +# Please submit bugfixes or comments via https://bugs.opensuse.org/ +# + + +%bcond_without tests +# Required for runtime dispatch, not yet packaged +%bcond_with xsimd + +%define sonum 1400 +# See git submodule /testing pointing to the correct revision +%define arrow_testing_commit 47f7b56b25683202c1fd957668e13f2abafc0f12 +# See git submodule /cpp/submodules/parquet-testing pointing to the correct revision +%define parquet_testing_commit e45cd23f784aab3d6bf0701f8f4e621469ed3be7 +Name: apache-arrow +Version: 14.0.1 +Release: 0 +Summary: A development platform for in-memory data +License: Apache-2.0 AND BSD-3-Clause AND BSD-2-Clause AND MIT +Group: Development/Tools/Other +URL: https://arrow.apache.org/ +Source0: https://github.com/apache/arrow/archive/apache-arrow-%{version}.tar.gz +Source1: https://github.com/apache/arrow-testing/archive/%{arrow_testing_commit}.tar.gz#/arrow-testing-%{version}.tar.gz +Source2: https://github.com/apache/parquet-testing/archive/%{parquet_testing_commit}.tar.gz#/parquet-testing-%{version}.tar.gz +BuildRequires: bison +BuildRequires: cmake >= 3.16 +BuildRequires: fdupes +BuildRequires: flex +BuildRequires: gcc-c++ +BuildRequires: libboost_filesystem-devel +BuildRequires: libboost_system-devel >= 1.64.0 +BuildRequires: libzstd-devel-static +BuildRequires: llvm-devel >= 7 +BuildRequires: pkgconfig +BuildRequires: python-rpm-macros +BuildRequires: python3-base +BuildRequires: cmake(Snappy) >= 1.1.7 +BuildRequires: cmake(absl) +BuildRequires: cmake(double-conversion) >= 3.1.5 +BuildRequires: cmake(re2) +BuildRequires: pkgconfig(RapidJSON) 
+BuildRequires: pkgconfig(bzip2) >= 1.0.8 +BuildRequires: pkgconfig(gflags) >= 2.2.0 +BuildRequires: pkgconfig(grpc++) >= 1.20.0 +BuildRequires: pkgconfig(libbrotlicommon) >= 1.0.7 +BuildRequires: pkgconfig(libbrotlidec) >= 1.0.7 +BuildRequires: pkgconfig(libbrotlienc) >= 1.0.7 +BuildRequires: pkgconfig(libcares) >= 1.15.0 +BuildRequires: pkgconfig(libglog) >= 0.3.5 +BuildRequires: pkgconfig(liblz4) >= 1.8.3 +BuildRequires: pkgconfig(libopenssl) +BuildRequires: pkgconfig(liburiparser) >= 0.9.3 +BuildRequires: pkgconfig(libutf8proc) +BuildRequires: pkgconfig(libzstd) >= 1.4.3 +BuildRequires: pkgconfig(protobuf) >= 3.7.1 +BuildRequires: pkgconfig(thrift) >= 0.11.0 +BuildRequires: pkgconfig(zlib) >= 1.2.11 +%if %{with tests} +BuildRequires: timezone +BuildRequires: pkgconfig(gmock) >= 1.10 +BuildRequires: pkgconfig(gtest) >= 1.10 +%endif + +%description +Apache Arrow is a cross-language development platform for in-memory +data. It specifies a standardized language-independent columnar memory +format for flat and hierarchical data, organized for efficient +analytic operations on modern hardware. It also provides computational +libraries and zero-copy streaming messaging and interprocess +communication. + +%package -n libarrow%{sonum} +Summary: Development platform for in-memory data - shared library +Group: System/Libraries + +%description -n libarrow%{sonum} +Apache Arrow is a cross-language development platform for in-memory +data. It specifies a standardized language-independent columnar memory +format for flat and hierarchical data, organized for efficient +analytic operations on modern hardware. It also provides computational +libraries and zero-copy streaming messaging and interprocess +communication. + +This package provides the shared library for Apache Arrow. 
+ +%package -n libarrow_acero%{sonum} +Summary: Development platform for in-memory data - shared library +Group: System/Libraries + +%description -n libarrow_acero%{sonum} +Apache Arrow is a cross-language development platform for in-memory +data. It specifies a standardized language-independent columnar memory +format for flat and hierarchical data, organized for efficient +analytic operations on modern hardware. It also provides computational +libraries and zero-copy streaming messaging and interprocess +communication. + +This package provides the shared library for the Acero streaming execution engine + +%package -n libarrow_dataset%{sonum} +Summary: Development platform for in-memory data - shared library +Group: System/Libraries + +%description -n libarrow_dataset%{sonum} +Apache Arrow is a cross-language development platform for in-memory +data. It specifies a standardized language-independent columnar memory +format for flat and hierarchical data, organized for efficient +analytic operations on modern hardware. It also provides computational +libraries and zero-copy streaming messaging and interprocess +communication. + +This package provides the shared library for Dataset API support. + +%package -n libparquet%{sonum} +Summary: Development platform for in-memory data - shared library +Group: System/Libraries + +%description -n libparquet%{sonum} +Apache Arrow is a cross-language development platform for in-memory +data. It specifies a standardized language-independent columnar memory +format for flat and hierarchical data, organized for efficient +analytic operations on modern hardware. It also provides computational +libraries and zero-copy streaming messaging and interprocess +communication. + +This package provides the shared library for the Parquet format. 
+ +%package devel +Summary: Development platform for in-memory data - development files +Group: Development/Libraries/C and C++ +Requires: libarrow%{sonum} = %{version} +Requires: libarrow_acero%{sonum} = %{version} +Requires: libarrow_dataset%{sonum} = %{version} + +%description devel +Apache Arrow is a cross-language development platform for in-memory +data. It specifies a standardized language-independent columnar memory +format for flat and hierarchical data, organized for efficient +analytic operations on modern hardware. It also provides computational +libraries and zero-copy streaming messaging and interprocess +communication. + +This package provides the development libraries and headers for +Apache Arrow. + +%package devel-static +Summary: Development platform for in-memory data - development files +Group: Development/Libraries/C and C++ +Requires: %{name}-devel = %{version} + +%description devel-static +Apache Arrow is a cross-language development platform for in-memory +data. It specifies a standardized language-independent columnar memory +format for flat and hierarchical data, organized for efficient +analytic operations on modern hardware. It also provides computational +libraries and zero-copy streaming messaging and interprocess +communication. + +This package provides the static library + +%package acero-devel-static +Summary: Development platform for in-memory data - development files +Group: Development/Libraries/C and C++ +Requires: %{name}-devel = %{version} + +%description acero-devel-static +Apache Arrow is a cross-language development platform for in-memory +data. It specifies a standardized language-independent columnar memory +format for flat and hierarchical data, organized for efficient +analytic operations on modern hardware. It also provides computational +libraries and zero-copy streaming messaging and interprocess +communication. 
+ +This package provides the static library for the Acero streaming execution engine + +%package dataset-devel-static +Summary: Development platform for in-memory data - development files +Group: Development/Libraries/C and C++ +Requires: %{name}-devel = %{version} + +%description dataset-devel-static +Apache Arrow is a cross-language development platform for in-memory +data. It specifies a standardized language-independent columnar memory +format for flat and hierarchical data, organized for efficient +analytic operations on modern hardware. It also provides computational +libraries and zero-copy streaming messaging and interprocess +communication. + +This package provides the static library for Dataset API support + +%package -n apache-parquet-devel +Summary: Development platform for in-memory data - development files +Group: Development/Libraries/C and C++ +Requires: libparquet%{sonum} = %{version} + +%description -n apache-parquet-devel +Apache Arrow is a cross-language development platform for in-memory +data. It specifies a standardized language-independent columnar memory +format for flat and hierarchical data, organized for efficient +analytic operations on modern hardware. It also provides computational +libraries and zero-copy streaming messaging and interprocess +communication. + +This package provides the development libraries and headers for +the Parquet format. + +%package -n apache-parquet-devel-static +Summary: Development platform for in-memory data - development files +Group: Development/Libraries/C and C++ +Requires: apache-parquet-devel = %{version} + +%description -n apache-parquet-devel-static +Apache Arrow is a cross-language development platform for in-memory +data. It specifies a standardized language-independent columnar memory +format for flat and hierarchical data, organized for efficient +analytic operations on modern hardware. It also provides computational +libraries and zero-copy streaming messaging and interprocess +communication. 
+
+This package provides the static library for the Parquet format.
+
+%package -n apache-parquet-utils
+Summary: Development platform for in-memory data - Parquet utilities
+Group: Productivity/Scientific/Math
+
+%description -n apache-parquet-utils
+Apache Arrow is a cross-language development platform for in-memory
+data. It specifies a standardized language-independent columnar memory
+format for flat and hierarchical data, organized for efficient
+analytic operations on modern hardware. It also provides computational
+libraries and zero-copy streaming messaging and interprocess
+communication.
+
+This package provides utilities for working with the Parquet format.
+
+%prep
+%setup -q -n arrow-apache-arrow-%{version} -a1 -a2
+
+%build
+export CFLAGS="%{optflags} -ffat-lto-objects"
+export CXXFLAGS="%{optflags} -ffat-lto-objects"
+
+pushd cpp
+%cmake \
+  -DARROW_BUILD_EXAMPLES:BOOL=ON \
+  -DARROW_BUILD_SHARED:BOOL=ON \
+  -DARROW_BUILD_STATIC:BOOL=ON \
+  -DARROW_BUILD_TESTS:BOOL=%{?with_tests:ON}%{!?with_tests:OFF} \
+  -DARROW_BUILD_UTILITIES:BOOL=ON \
+  -DARROW_DEPENDENCY_SOURCE=SYSTEM \
+  -DARROW_BUILD_BENCHMARKS:BOOL=OFF \
+%ifarch aarch64
+  -DARROW_SIMD_LEVEL:STRING=%{?with_xsimd:NEON}%{!?with_xsimd:NONE} \
+%else
+  -DARROW_SIMD_LEVEL:STRING="NONE" \
+%endif
+  -DARROW_RUNTIME_SIMD_LEVEL:STRING=%{?with_xsimd:MAX}%{!?with_xsimd:NONE} \
+  -DARROW_COMPUTE:BOOL=ON \
+  -DARROW_CSV:BOOL=ON \
+  -DARROW_DATASET:BOOL=ON \
+  -DARROW_FILESYSTEM:BOOL=ON \
+  -DARROW_FLIGHT:BOOL=OFF \
+  -DARROW_GANDIVA:BOOL=OFF \
+  -DARROW_HDFS:BOOL=ON \
+  -DARROW_HIVESERVER2:BOOL=OFF \
+  -DARROW_IPC:BOOL=ON \
+  -DARROW_JEMALLOC:BOOL=OFF \
+  -DARROW_JSON:BOOL=ON \
+  -DARROW_ORC:BOOL=OFF \
+  -DARROW_PARQUET:BOOL=ON \
+  -DARROW_USE_GLOG:BOOL=ON \
+  -DARROW_USE_OPENSSL:BOOL=ON \
+  -DARROW_WITH_BACKTRACE:BOOL=ON \
+  -DARROW_WITH_BROTLI:BOOL=ON \
+  -DARROW_WITH_BZ2:BOOL=ON \
+  -DARROW_WITH_LZ4:BOOL=ON \
+  -DARROW_WITH_SNAPPY:BOOL=ON \
+  -DARROW_WITH_ZLIB:BOOL=ON \
+  -DARROW_WITH_ZSTD:BOOL=ON \
+  -DPARQUET_BUILD_EXAMPLES:BOOL=ON \
+  -DPARQUET_BUILD_EXECUTABLES:BOOL=ON \
+  -DPARQUET_REQUIRE_ENCRYPTION:BOOL=ON \
+  -DARROW_VERBOSE_THIRDPARTY_BUILD:BOOL=ON \
+  -DARROW_CUDA:BOOL=OFF \
+  -DARROW_GANDIVA_JAVA:BOOL=OFF
+
+%cmake_build
+popd
+
+%install
+pushd cpp
+%cmake_install
+popd
+%if %{with tests}
+rm %{buildroot}%{_libdir}/libarrow_testing.so*
+rm %{buildroot}%{_libdir}/libarrow_testing.a
+rm %{buildroot}%{_libdir}/pkgconfig/arrow-testing.pc
+rm -Rf %{buildroot}%{_includedir}/arrow/testing
+%endif
+rm -r %{buildroot}%{_datadir}/doc/arrow/
+%fdupes %{buildroot}%{_libdir}/cmake
+
+%check
+%if %{with tests}
+export PARQUET_TEST_DATA="${PWD}/parquet-testing-%{parquet_testing_commit}/data"
+export ARROW_TEST_DATA="${PWD}/arrow-testing-%{arrow_testing_commit}/data"
+pushd cpp
+export PYTHON=%{_bindir}/python3
+%ifarch %ix86 %arm32
+GTEST_failing="TestDecimalFromReal*"
+GTEST_failing="${GTEST_failing}:*TestDecryptionConfiguration.TestDecryption*"
+%endif
+%ifnarch x86_64
+GTEST_failing="${GTEST_failing:+${GTEST_failing}:}Jemalloc.GetAllocationStats"
+%endif
+# Run the known-failing tests first, ignoring their result, then run the
+# full suite with those tests excluded.
+if [ -n "${GTEST_failing}" ]; then
+  export GTEST_FILTER="${GTEST_failing}"
+  %ctest --label-regex unittest || true
+  export GTEST_FILTER="*:-${GTEST_failing}"
+fi
+%ctest --label-regex unittest
+popd
+%endif
+
+%post -n libarrow%{sonum} -p /sbin/ldconfig
+%postun -n libarrow%{sonum} -p /sbin/ldconfig
+%post -n libarrow_acero%{sonum} -p /sbin/ldconfig
+%postun -n libarrow_acero%{sonum} -p /sbin/ldconfig
+%post -n libarrow_dataset%{sonum} -p /sbin/ldconfig
+%postun -n libarrow_dataset%{sonum} -p /sbin/ldconfig
+%post -n libparquet%{sonum} -p /sbin/ldconfig
+%postun -n libparquet%{sonum} -p /sbin/ldconfig
+
+%files
+%license LICENSE.txt NOTICE.txt header
+%{_bindir}/arrow-file-to-stream
+%{_bindir}/arrow-stream-to-file
+
+%files -n libarrow%{sonum}
+%license LICENSE.txt NOTICE.txt header
+%{_libdir}/libarrow.so.*
+
+%files -n libarrow_acero%{sonum}
+%license LICENSE.txt NOTICE.txt header
+%{_libdir}/libarrow_acero.so.*
+
+%files -n libarrow_dataset%{sonum}
+%license LICENSE.txt NOTICE.txt header
+%{_libdir}/libarrow_dataset.so.*
+
+%files -n libparquet%{sonum}
+%license LICENSE.txt NOTICE.txt header
+%{_libdir}/libparquet.so.*
+
+%files devel
+%doc README.md
+%license LICENSE.txt NOTICE.txt header
+%{_includedir}/arrow/
+%{_libdir}/cmake/Arrow*
+%{_libdir}/libarrow.so
+%{_libdir}/libarrow_acero.so
+%{_libdir}/libarrow_dataset.so
+%{_libdir}/pkgconfig/arrow*.pc
+%dir %{_datadir}/arrow
+%{_datadir}/arrow/gdb
+%dir %{_datadir}/gdb
+%dir %{_datadir}/gdb/auto-load
+%dir %{_datadir}/gdb/auto-load/%{_prefix}
+%dir %{_datadir}/gdb/auto-load/%{_libdir}
+%{_datadir}/gdb/auto-load/%{_libdir}/libarrow.so.*.py
+
+%files devel-static
+%license LICENSE.txt NOTICE.txt header
+%{_libdir}/libarrow.a
+
+%files acero-devel-static
+%license LICENSE.txt NOTICE.txt header
+%{_libdir}/libarrow_acero.a
+
+%files dataset-devel-static
+%license LICENSE.txt NOTICE.txt header
+%{_libdir}/libarrow_dataset.a
+
+%files -n apache-parquet-devel
+%doc README.md
+%license LICENSE.txt NOTICE.txt header
+%{_includedir}/parquet/
+%{_libdir}/cmake/Parquet
+%{_libdir}/libparquet.so
+%{_libdir}/pkgconfig/parquet.pc
+
+%files -n apache-parquet-devel-static
+%license LICENSE.txt NOTICE.txt header
+%{_libdir}/libparquet.a
+
+%files -n apache-parquet-utils
+%doc README.md
+%license LICENSE.txt NOTICE.txt header
+%{_bindir}/parquet-*
+
+%changelog
diff --git a/arrow-testing-14.0.1.tar.gz b/arrow-testing-14.0.1.tar.gz
new file mode 100644
index 0000000..9c1183e
--- /dev/null
+++ b/arrow-testing-14.0.1.tar.gz
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:e147c003222abae832c9ab1bdc79699fcc59dcc3bc56c3fe59538df5056a6b75
+size 3567158
diff --git a/parquet-testing-14.0.1.tar.gz b/parquet-testing-14.0.1.tar.gz
new file mode 100644
index 0000000..197aa5f
--- /dev/null
+++ b/parquet-testing-14.0.1.tar.gz
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:88c47cd41f36f4b8d8138c6f4e92580922568e3e4dc5dbae88a64d2ebab82396
+size 1018484