forked from pool/apache-arrow
apache-arrow/apache-arrow.changes
Benjamin Greiner 853a205aac - Update to 20.0.0

OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=55
2025-06-13 18:31:56 +00:00

-------------------------------------------------------------------
Fri Jun 13 18:22:55 UTC 2025 - Ben Greiner <code@bnavigator.de>
- Update to 20.0.0
## Bug Fixes
* GH-30302 - [C++][Parquet] Preserve the bitwidth of integer
dictionary indices on round-trip to Parquet (#45685)
* GH-31992 - [C++][Parquet] Handling the special case when
DataPageV2 values buffer is empty (#45252)
* GH-37630 - [C++][Python][Dataset] Allow disabling fragment
metadata caching (#45330)
* GH-39023 - [C++][CMake] Add missing launcher path conversion
for ExternalPackage (#45349)
* GH-43057 - [C++] Thread-safe AesEncryptor / AesDecryptor
(#44990)
* GH-45048 - [C++][Parquet] Deprecate unused chunk_size parameter
in parquet::arrow::FileWriter::NewRowGroup() (#45088)
* GH-45129 - [Python][C++] Fix usage of deprecated C++
functionality on pyarrow (#45189)
* GH-45132 - [C++][Gandiva] Update LLVM to 18.1 (#45114)
* GH-45185 - [C++][Parquet] Raise an error for invalid repetition
levels when delimiting records (#45186)
* GH-45254 - [C++][Acero] Fix the row offset truncation in row
table merge (#45255)
* GH-45266 - [C++][Acero] Fix the running tasks count of
Scheduler when get error tasks in multi-threads (#45268)
* GH-45270 - [C++][CI] Disable mimalloc on Valgrind builds
(#45271)
* GH-45301 - [C++] Change PrimitiveArray ctor to protected
(#45444)
* GH-45334 - [C++][Acero] Fix swiss join overflow issues in row
offset calculation for fixed length and null masks (#45336)
* GH-45362 - [C++] Fix identity cast for time and list scalar
(#45370)
* GH-45371 - [C++] Fix data race in SimpleRecordBatch::columns
(#45372)
* GH-45393 - [C++][Compute] Fix wrong decoding for 32-bit column
in row table (#45473)
* GH-45396 - [C++] Use Boost with ARROW_FUZZING (#45397)
* GH-45423 - [C++] Don't require Boost library with
ARROW_TESTING=ON/ARROW_BUILD_SHARED=OFF (#45424)
* GH-45497 - [C++][CSV] Avoid buffer overflow when a line has too
many columns (#45498)
* GH-45510 - [CI][C++] Fix LLVM APT repository preparation on
Debian (#45511)
* GH-45512 - [C++] Clean up undefined symbols in libarrow without
IPC (#45513)
* GH-45514 - [CI][C++][Docs] Set CUDAToolkit_ROOT explicitly in
debian-docs (#45520)
* GH-45537 - [CI][C++] Add missing includes (iwyu) to
file_skyhook.cc (#45538)
* GH-45541 - [Doc][C++] Render ASCII art as-is (#45542)
* GH-45545 - [C++][Parquet] Add missing includes (#45554)
* GH-45564 - [C++][Acero] Add size validation for names and
expressions vectors in ProjectNode (#45565)
* GH-45568 - [C++][Parquet][CMake] Enable zlib automatically when
Thrift is needed (#45569)
* GH-45578 - [C++] Use max not min in
MakeStatisticsArrayMaxApproximate test (#45579)
* GH-45587 - [C++][Docs] Fix the statistics schema link in
arrow::RecordBatch::MakeStatisticsArray()'s docstring (#45588)
* GH-45614 - [C++] Use Boost's CMake packages instead of
FindBoost.cmake in CMake (#45623)
* GH-45628 - [C++] Ensure specifying Boost include directory for
bundled Thrift (#45637)
* GH-45669 - [C++][Parquet] Add missing
ParquetFileReader::GetReadRanges() definition (#45684)
* GH-45693 - [C++][Gandiva] Fix aes_encrypt/decrypt algorithm
selection (#45695)
* GH-45700 - [C++][Compute] Added nullptr check in Equals method
to handle null impl_ pointers (#45701)
* GH-45733 - [C++][Python] Add biased/unbiased toggle to skew and
kurtosis functions (#45762)
* GH-45739 - [C++][Python] Fix crash when calling
hash_pivot_wider without options (#45740)
* GH-45788 - [C++][Acero] Fix data race in aggregate node
(#45789)
* GH-45868 - [C++][CI] Fix test for ambiguous initialization on
C++ 20 (#45871)
* GH-45905 - [C++][Acero] Enlarge the timeout in ConcurrentQueue
test to reduce sporadic failures (#45923)
* GH-45930 - [C++] Don't use ICU C++ API in Azure SDK C++
(#45952)
* GH-45939 - [C++][Benchmarking] Fix compilation failures
(#45942)
* GH-45959 - [C++][CMake] Fix Protobuf dependency in
Arrow::arrow_static (#45960)
* GH-45980 - [C++] Bump Bundled Snappy version to 1.2.2 (#45981)
* GH-45999 - [C++][Gandiva] Fix crashes on LLVM 20.1.1 (#46000)
* GH-46022 - [C++] Fix build error with g++ 7.5.0 (#46028)
* GH-46067 - [CI][C++] Remove system Flatbuffers from macOS
(#46105)
* GH-46077 - [CI][C++] Disable -Werror on macos-13 (#46106)
* GH-46111 - [C++][CI] Fix boost 1.88 on MinGW (#46113)
* GH-46123 - [C++] Undefined behavior in compare_internal.cc and
light_array_internal.cc (#46124)
* GH-46134 - [CI][C++] Explicit conversion of possible
absl::string_view on protobuf methods to std::string (#46136)
* GH-46159 - [CI][C++] Stop using possibly missing
boost/process/v2.hpp on boost 1.88 and use individual includes
(#46160)
* GH-46195 - [Release][C++] verify-rc-source-cpp-macos-amd64
failed to build googlemock
## New Features and Improvements
* GH-26648 - [C++] Optimize union equality comparison (#45384)
* GH-33592 - [C++] support casting nullable fields to
non-nullable if there are no null values (#43782)
* GH-41764 - [Parquet][C++] Support future logical types in the
Parquet reader (#41765)
* GH-41816 - [C++] Add Minimal Meson Build of libarrow (#45441)
* GH-43296 - [C++][FlightRPC] Remove Flight UCX transport
(#43297)
* GH-43573 - [C++] Copy bitmap when casting from string-view to
offset string and binary types (#44822)
* GH-44042 - [C++][Parquet] Limit num-of row-groups when building
parquet for encrypted file (#44043)
* GH-44393 - [C++][Compute] Vector selection functions
inverse_permutation and scatter (#44394)
* GH-44615 - [C++][Compute] Add extract_regex_span function
(#45577)
* GH-44629 - [C++][Acero] Use implicit_ordering for asof_join
rather than require_sequenced_output (#44616)
* GH-44950 - [C++] Bump minimum CMake version to 3.25 (#44989)
* GH-45045 - [C++][Parquet] Add a benchmark for
size_statistics_level (#45085)
* GH-45190 - [C++][Compute] Add rank_quantile function (#45259)
* GH-45196 - [C++][Acero] Small refinement to hash join (#45197)
* GH-45206 - [C++][CMake] Add sanitizer presets (#45207)
* GH-45209 - [C++][CMake] Fix the issue that allocator not
disabled for sanitizer cmake presets (#45210)
* GH-45215 - [C++][Acero] Export SequencingQueue and
SerialSequencingQueue (#45221)
* GH-45216 - [C++][Compute] Refactor Rank implementation (#45217)
* GH-45219 - [C++][Examples] Update examples to disable mimalloc
(#45220)
* GH-45225 - [C++] Upgrade ORC to 2.1.0 (#45226)
* GH-45227 - [C++][Parquet] Enable Size Stats and Page Index by
default (#45249)
* GH-45269 - [C++][Compute] Add “pivot_wider” and
“hash_pivot_wider” functions (#45562)
* GH-45279 - [C++][Compute] Move all Grouper tests to
grouper_test.cc (#45280)
* GH-45344 - [C++][Testing] Generic StepGenerator (#45345)
* GH-45358 - [C++][Python] Add MemoryPool method to print
statistics (#45359)
* GH-45361 - [CI][C++] Curate ci/vcpkg/vcpkg.json (#45081)
* GH-45366 - [C++][Parquet] Set is_compressed to false when data
page v2 is not compressed (#45367)
* GH-45416 - [CI][C++][Homebrew] Backport the latest formula
changes (#45460)
* GH-45478 - [CI][C++] Drop support for Ubuntu 20.04 (#45519)
* GH-45506 - [C++][Acero] More overflow-safe Swiss table (#45515)
* GH-45551 - [C++][Acero] Release temp states of Swiss join
building hash table to reduce memory consumption (#45552)
* GH-45563 - [C++][Compute] Split up hash_aggregate.cc (#45725)
* GH-45566 - [C++][Parquet][CMake] Remove a workaround for
Windows in FindThriftAlt.cmake (#45567)
* GH-45572 - [C++][Compute] Add rank_normal function (#45573)
* GH-45584 - [C++][Thirdparty] Bump zstd to v1.5.7 (#45585)
* GH-45589 - [C++] Enable singular test in Meson configuration
(#45596)
* GH-45591 - [C++][Acero] Refine hash join benchmark and remove
openmp from the project (#45593)
* GH-45605 - [R][C++] Fix identifier … preceded by whitespace
warnings (#45606)
* GH-45611 - [C++][Acero] Improve Swiss join build performance by
partitioning batches ahead to reduce contention (#45612)
* GH-45620 - [CI][C++] Use Visual Studio 2022 not 2019 (#45621)
* GH-45652 - [C++][Acero] Unify ConcurrentQueue and
BackpressureConcurrentQueue API (#45421)
* GH-45676 - [C++][Python][Compute] Add skew and kurtosis
functions (#45677)
* GH-45680 - [C++][Python] Remove deprecated functions in 20.0
* GH-45689 - [C++][Thirdparty] Bump Apache ORC to 2.1.1 (#45600)
* GH-45694 - [C++] Bump vendored flatbuffers to 24.3.6 (#45687)
* GH-45696 - [C++][Gandiva] Accept LLVM 20.1 (#45697)
* GH-45732 - [C++][Compute] Accept more pivot key types (#45945)
* GH-45744 - [C++] Remove deprecated GetNextSegment (#45745)
* GH-45746 - [C++] Remove deprecated functions in 20.0 (C++
subset) (#45748)
* GH-45755 - [C++][Python][Compute] Add winsorize function
(#45763)
* GH-45771 - [C++] Add tests to top level Meson configuration
(#45773)
* GH-45772 - [C++] Export Arrow as dependency from Meson
configuration (#45774)
* GH-45775 - [C++] Use dict.get() in Meson configuration (#45776)
* GH-45779 - [C++] Add testing directory to Meson configuration
(#45780)
* GH-45784 - [C++] Unpin LLVM and OpenSSL in Brewfile (#45785)
* GH-45792 - [C++] Add benchmarks to Meson configuration (#45793)
* GH-45816 - [C++] Make VisitType() fallback branch unreachable
(#45815)
* GH-45820 - [C++] Add optional out_offset for Buffer-returning
CopyBitmap function (#45852)
* GH-45821 - [C++][Compute] Grouper improvements (#45822)
* GH-45825 - [C++] Add c directory to Meson configuration
(#45826)
* GH-45827 - [C++] Add io directory to Meson configuration
(#45828)
* GH-45831 - [C++] Add CSV directory to Meson configuration
(#45832)
* GH-45848 - [C++][Python][R] Remove deprecated PARQUET_2_0
(#45849)
* GH-45877 - [C++][Acero] Cleanup 64-bit temp states of Swiss
join by using 32-bit (#45878)
* GH-45917 - [C++][Acero] Add flush taskgroup to enable
parallelization (#45918)
* GH-45922 - [C++][Flight] Remove deprecated Authenticate and
StartCall (#45932)
* GH-45953 - [C++] Use lock to fix atomic bug in
ReadaheadGenerator (#45954)
* GH-45986 - [C++] Update bundled GoogleTest (#45996)
* GH-45987 - [C++] Set CMAKE_POLICY_VERSION_MINIMUM=3.5 for
bundled dependencies (#45997)
-------------------------------------------------------------------
Mon Apr 21 14:34:37 UTC 2025 - Friedrich Haubensak <hsk17@mail.de>
- to fix cmake-4 build problems, upgrade bundled mimalloc from
2.0.6 to 2.0.9 and add apache-arrow-19.0.1-mimalloc-version.patch;
mimalloc changes according to readme.md:
* 2.0.9:
- Supports building with asan and improved [Valgrind] support.
- Support arbitrarily large alignments, in particular for
`std::pmr` pools.
- Added C++ STL allocators attached to a specific heap.
- Heap walks now visit all objects (including huge objects).
- Support Windows nano server containers.
- Various small bug fixes.
* 2.0.7:
- Initial support for [Valgrind] for leak testing and heap
block overflow detection.
- Initial support for attaching heaps to a specific memory area.
- Fix `realloc` behavior for zero size blocks.
- Remove restriction to integral multiple of the alignment in
`alloc_align`.
- Improved aligned allocation performance.
- Reduced contention with many threads on few processors.
- VS2022 support.
- Support `pkg-config`.
-------------------------------------------------------------------
Fri Mar 28 08:47:10 UTC 2025 - Ben Greiner <code@bnavigator.de>
- Re-enable flight, grpc has been fixed boo#1237422
-------------------------------------------------------------------
Thu Mar 13 18:57:51 UTC 2025 - Ben Greiner <code@bnavigator.de>
- Add missing dependencies for libboost_process explicitly
boo#1239599
-------------------------------------------------------------------
Wed Feb 19 15:58:28 UTC 2025 - Ben Greiner <code@bnavigator.de>
- disable flight because of gh#grpc/grpc#37968 boo#1237422
-------------------------------------------------------------------
Mon Feb 17 19:17:26 UTC 2025 - Ben Greiner <code@bnavigator.de>
- Update to 19.0.1
## Bug Fixes
* [C++] Fix overflow issues for large build side in swiss join
(#45108)
* [C++][Fuzzing] Fix Negation bug discovered by fuzzing (#45181)
* [C++][Parquet] Omit level histogram when max level is 0
(#45285)
* [Parquet][C++] Fix statistics load logic for no row group and
multiple row groups (#45350)
* [C++] Disable Flight test (#45232)
## Improvements
* [C++][Parquet] Improve performance of generating size
statistics (#45202)
* [C++][S3] Workaround compatibility issue between AWS SDK and
MinIO (#45310)
- Release 19.0.0
## New Features and Improvements
* [CI][C++] Add a nightly job to test offline build (#44721)
* [C++] GcsFileSystem::Make should return Result (#44503)
* [C++][Parquet] Implement SizeStatistics (#40594)
* [C++] Reduce string inlining in Substrait serde (#45174)
* [C++][Acero] Enhance asof_join to work in multi-threaded
execution by sequencing input (#44083)
* [C++] Support the AWS S3 SSE-C encryption (#43601)
* [C++][Parquet] Parquet Metadata Printer supports print
sort-columns (#43599)
* [C++] Add C++ implementation of Async C Data Interface (#44495)
* [C++][Acero] Support AVX2 swiss join decoding (#43832)
* [C++] skip -0117 in StrptimeZoneOffset for old glibc (#44621)
* [C++] Add arrow::RecordBatch::MakeStatisticsArray() (#44252)
* [C++] Improve merge step in chunked sorting (#44217)
* [C++][Parquet] Tools: Debug Print for Json should be valid JSON
(#44532)
* [C++][FS][Azure] Implement SAS token authentication (#45021)
* [C++] Don't export template class (#44365)
* [C++][Docs] Update the URL to C++ Development in README.md
(#44427)
* [C++] Added rvalue-reference-qualified overload for
arrow::Result::status() returning value instead of reference
(#44477)
* [C++] StatusConstant: cheaply copied const Status (#44493)
* [C++][Compute] Allow casting struct to bigger nullable struct
(#44587)
* [C++] Use array type to compute min/max statistics Arrow type
(#45094)
* [C++] Minor: ArrayData ctor can assign null_count directly
(#44582)
* [C++] Add const and & to arrow::Array::statistics() return type
(#44592)
* [Python][C++] Add version suffix to libarrow_python* libraries
(#44702)
* [C++] NumericBuilder::AppendValues append vector prevent from
ub (#44794)
* [C++][Parquet] Remove obsolete parquet_constants generated
files from old thrift (#44772)
* [Docs][C++] Add arrow::ArrayStatistics to API doc (#44764)
* [C++] Upgrade ORC to 2.0.3 (#44745)
* [C++][Parquet] Add arrow::Result version of
parquet::arrow::OpenFile() (#44785)
* [C++] Fix a couple of maybe-uninitialized warnings (#44789)
* [C++] Use arrow::util::span on
arrow::util::bitmap_builders_utilities instead of std::vector
(#44796)
* [C++][Parquet] Add arrow::Result version of
parquet::arrow::FileReader::GetRecordBatchReader() (#44809)
* [C++] minor optimize cancel and thread pool (#44812)
* [C++][Parquet] Add an example to dump statistics read as
arrow::ArrayStatistics (#44816)
* [C++] Add the Expm1(exponent) scalar arithmetic function
(#44904)
* [C++] Add WithinUlp testing functions (#44906)
* [C++][Python] Add Hyperbolic Trig functions (#44630)
* [C++] Enable mimalloc by default, disable jemalloc by default
and more (#44951)
* [C++] Add support for building system OpenTelemetry (#44983)
* [C++][CMake] Use librt only for Linux (#44984)
* [C++] Support for fixed-size list in conversion of range tuple
(#45008)
* [C++][Parquet] Allow configuring the default footer read size
(#45016)
* [C++] Remove result_internal.h (#45066)
* [FlightRPC][C++] Deprecate InitializeFlightUcx before removing
UCX (#45080)
* [C++][Parquet] Add GetReadRanges function to FileReader
(#45093)
* [C++] Apply a cstdint patch to bundled Thrift for GCC 15
(#45097)
* [C++] Remove useless “hash table ready” states in swiss join
(#45136)
* [CI][C++] Add a GCC 15 job (#45138)
* [C++] Ensure using cpp/cmake_modules/*.cmake (#45143)
* [CI][C++] Upgrade Alpine Linux to 3.18 from 3.16 (#45168)
## Bug Fixes
* [C++] Fix CopyFiles when destination is a FileSystem with
background_writes (#44897)
* [C++][Python] Fix ORC crash when file contains unknown timezone
(#45051)
* [C++] Replace std::aligned_storage that is deprecated in C++23
(#45019)
* [C++][Parquet] Refuse writing non-nullable column that contains
nulls (#44921)
* [C++] Initialize offset vector head as 0 after memory allocated
in grouper.cc (#43123)
* [C++] io::BufferedInput: Fix invalid state after SetBufferSize
(#44387)
* [C++][Parquet] Fix schema conversion from two-level encoding
nested list (#43995)
* [C++] Use “lib” for generating bundled dependencies even with
“clang-cl” (#44391)
* [C++] Fix unaligned load/store implementation for clang-18
(#44468)
* [C++] Use CMAKE_LIBTOOL on macOS (#44385)
* [CI][C++] Use setup-python on hosted runner (#44411)
* [C++] Update vendored date to 3.0.3 (#44482)
* [GLib][C++] Meson searches libraries with specific versions.
(#44475)
* [C++][Acero] Fix crash when thread in asof_join is not running
(#44584)
* [C++] NumericArray should not use ctor from parent directly
(#44542)
* [C++] FunctionOptions::{Serialize,Deserialize}() return an
error without ARROW_IPC (#45171)
* [C++][Acero] Enhance partition sort example (#44678)
* [C++][Python] Fix Flight Timestamp precision, revert workaround
from #43537 (#44681)
* [C++] Add S3 option to ignore SIGPIPE signals (#44735)
* [C++] Keep field metadata for keys and values when importing a
map type via the C data interface (#44715)
* [C++][CI] Fix arrow-c-bridge-test timeout with threading
disabled (#44737)
* [C++] Use lowercased windows.h to enable cross-platform builds
(#44755)
* [C++] Fix Float16.To{Little,Big}Endian on big endian machines
(#44768)
* [C++][Parquet] Fix read/write of metadata length footer on
big-endian systems (#44787)
* [C++][CI] Migrate to arrow::Result based
parquet::arrow::OpenFile() API in example tutorials (#44807)
* [C++] Fix thread-unsafe access in ConcurrentQueue::UnsyncFront
(#44849)
* [C++] Fix compilation error on GCC 8 (#44899)
* [C++][CI] Silence protobuf-generated deprecations (#44955)
* [C++] Use recommended downloads URLs for ORC and Thrift
(#44977)
* [C++] Include path in the documentation is wrong (#45031)
* [C++] Remove Parquet requirement from Arrow Acero and from
Arrow Dataset when not necessary (#45035)
* [C++] Add support for Boost 1.87.0 (#45057)
* [C++][CI] Fix test-build-cpp-fuzz failures (#45060)
* [C++][Parquet] Fix generation of repetition levels for
encryption test data (#45074)
* [C++] Avoid static const variable in the status.h (#45100)
* [C++][Parquet] Fix Null-dereference READ in
parquet::arrow::ListToSchemaField (#45152)
* [C++][Release] Add llvm-dev back to setup-ubuntu.sh (#45184)
* [C++][Parquet] test-conda-cpp-valgrind fails on
arrow-dataset-file-parquet-encryption-test
- Release 18.1.0
## Bug Fixes
* [C++] Add support for overwriting grpc_cpp_plugin path for
cross-compiling (#44507)
* [Docs][C++] Fix documentation directive for ChunkLocation
(#44505)
* [C++] Add find module for abseil that handles missing version
(#44613)
* [C++][Dev] Update bundled Thrift, update mirrors to use CDN
(#44685)
## New Features and Improvements
* [C++] Move ChunkResolver to the public API (#44357)
- Release 18.0.0
## Bug Fixes
* [C++] data corruption when using `group_by` and `aggregate` on
large data sets
* [C++] Use PutObject request for S3 in OutputStream when only
uploading small data (#41564)
* [C++] Clean up implicit fallthrough warnings (#41892)
* [C++] Fix avx2 gather rows more than 2^31 issue in
CompareColumnsToRows (#43065)
* [C++][ArrowFlight] Crash due to UCS thread mode
* [C++] Add workaround for missing Boost dependency of Thrift
(#43328)
* [C++] Skip not Emscripten ready tests in CSV tests (#43724)
* [C++] Add date{32,64} to date{32,64} cast (#43192)
* [C++][Compute] Detect and explicit error for offset overflow in
row table (#43226)
* [C++] Fix decimal benchmarks to avoid out-of-bounds accesses
(#43212)
* [C++] Resolve Abseil like any other dependency in the build
system (#43219)
* [C++][Parquet] Refactor parquet::encryption::AesEncryptor to
use unique_ptr (#43222)
* [C++] Fix Abseil compile error on GCC 13 (#43157)
* [C++] Add missing serde methods to Location (#43332)
* [C++][Parquet] min-max Statistics doesn't work well when one of
min-max is truncated (#43383)
* [C++][Parquet] parquet-dump-footer: Remove redundant link and
fix debug processing (#43375)
* [C++] Ensure using bundled GoogleTest when we use bundled
GoogleTest (#43465)
* [C++][Compute] Fix invalid memory access when resizing
var-length buffer in row table (#43415)
* [C++][FlightRPC] Fix Flight UCX build issues (#43430)
* [C++] Filter out zero length buffers on gRPC transport (#43448)
* [C++][Gandiva] Always use gdv_function_stubs.h in
context_helper.cc (#43464)
* [C++] Add support for the official LZ4 CMake package (#43468)
* [C++] Register the new Opaque extension type by default
(#43788)
* [C++][Acero] Fix typos in join benchmark (#43871)
* [C++][CI] Catch potential integer overflow in PoolBuffer
(#43886)
* [C++] Leak S3 structures if finalization happens too late
(#44090)
* [C++][Parquet] Fix reported metrics in
parquet-arrow-reader-writer-benchmark (#44082)
* [C++] Don't use Boost.Process with Emscripten (#44097)
* [C++] Add home made _mm256_set_m128i for compilers who are
missing it (#44116)
* [C++] JsonExtensionType equality check ignores storage type
(#44215)
* [CI][C++][AppVeyor] Use conda instead of Mamba (#44235)
* [C++][FS][Azure] Fix edgecase where GetFileInfo incorrectly
returns NotFound on flat namespace and Azurite (#44302)
* [C++][FS][Azure] Catch missing exceptions on HNS support check
(#44274)
* [C++][FS][Azure] Fix minor hierarchical namespace bugs (#44307)
* [C++] Fix S3 error handling in ObjectOutputStream (#44335)
* [C++] Disable jemalloc by default on ARM (#44380)
## New Features and Improvements
* [C++][Python] Native support for UUID (#37298)
* [C++][Python] Bool8 Extension Type Implementation (#43488)
* [C++][Parquet] Add JSON canonical extension type (#13901)
* [C++][Compute] Replace explicit checking with DCHECK for
invariants in row segmenter (#44236)
* [C++][CI] Improve IPC fuzzing seed corpus (#43621)
* [Documentation][C++] Explicitly note that compute is optional
(#43629)
* [C++] Azure file system write buffering & async writes (#43096)
* [C++][Parquet] Separate encoders and decoder (#43972)
* [C++][Python][Parquet] Support reading/writing key-value
metadata from/to ColumnChunkMetaData (#41580)
* [Docs][C++] Is arrow::dataset namespace still experimental?
* [C++] Add arrow::ArrayStatistics (#43273)
* [CI][C++] Update Minio version (#44225)
* [C++][Parquet] Add binary that extracts a footer from a parquet
file (#42174)
* [C++] Support casting to and from utf8_view/binary_view
(#43302)
* [C++] Update bundled vendor/datetime to support for building
with libc++ and C++20 (#43094)
* [C++] Implement PathFromUri support for Azure file system
(#43098)
* [C++][Compute] Fix the unnecessary allocation of extra bytes
when encoding row table (#43125)
* [C++][Parquet] Replace use of int with int32_t in the internal
Parquet encryption APIs (#43413)
* [C++][Parquet] Refactor Encryptor API to use arrow::util::span
instead of raw pointers (#43195)
* [C++][Parquet] Default initialize some parquet metadata
variables (#43144)
* [C++] Fix CMake link order for AWS SDK (#43230)
* [C++] Suggest a cast when Concatenate fails due to offsets
overflow (#43190)
* [C++] Support basic is_in predicate simplification (#43761)
* [C++][AzureFS] Ignore password field in URI (#44220)
* [C++] Add lint for DCHECK in public headers (#43248)
* [C++][FlightRPC] Reduce repetition in flight/types.cc in serde
functions (#43237)
* [C++][Parquet] remove useless template parameter of
DeltaLengthByteArrayEncoder (#43250)
* [C++] Always prefer mimalloc to jemalloc (#40875)
* [C++][Flight] Use a Base CRTP type for the types used in RPC
calls (#43255)
* [C++] Expand the take function tests to cover more
chunked-array cases (#43292)
* [C++][Parquet] Enhance the comment for ColumnReader/Decoder
(#44003)
* [C++] Order classes in flight/types.h according to Flight.proto
(#43330)
* [C++][Parquet] Deprecate ColumnChunk::file_offset field and no
longer write Metadata at end of Chunk (#43428)
* [C++] Add benchmark for binary view builder (#43445)
* [C++][Python] Add Opaque canonical extension type (#43458)
* [Java][C++] Support more CsvFragmentScanOptions in JNI call
(#43482)
* [C++] Thirdparty: Bump lz4 to 1.10.0 (#43493)
* [C++][Compute] Widen the row offset of the row table to 64-bit
(#43389)
* [C++] Use ViewOrCopyTo instead of CopyTo when pretty printing
non-CPU data (#43508)
* [FlightRPC][C++] Reduce the number of references to
protobuf::Any (#43544)
* [C++] Simplify arrow::ArrayStatistics::ValueType (#43581)
* [C++][GLib] Don't install arrow-cuda.pc/arrow-cuda-glib.pc on
Windows (#43593)
* [C++] Remove redundant default constructor/deconstructor in
arrow::ArrayStatistics (#43579)
* [C++] Remove std::optional from
arrow::ArrayStatistics::is_{min,max}_exact (#43595)
* [C++][FlightRPC] Move the FlightTestServer to its own .cc and
.h files (#43678)
* [C++] Compute: fix register kernel SimdLevel for
AddMinMax512AggKernels (#43704)
* [C++] Prevent Snappy from disabling RTTI when bundled (#43706)
* [C++][FS][Azure] Use the latest Azurite and update the bundled
Azure SDK for C++ to azure-identity_1.9.0 (#43723)
* [C++][Parquet][CI] Parquet: Introducing more bad_data for
testing (#43708)
* [C++][Parquet] Dataset: Handle num-nulls in Parquet correctly
when !HasNullCount() (#43726)
* [C++] Clarify the way SIMD-enabled agg kernels come from the
same code in different compilation units (#43720)
* [C++] Fix Scalar boolean handling in row encoder (#43734)
* [C++] Add support for Boost 1.86 (#43766)
* [C++] Compute: More comment in RowEncoder (#43763)
* [C++] Acero: Minor code enhancement for Join (#43760)
* [C++] Fix the case when boolean_{any,all} meets constant input
with length in Acero (#43799)
* [C++] Add chunked Take benchmarks with a small selection factor
(#43772)
* [C++] Indent preprocessor directives (#43798)
* [C++] Attach arrow::ArrayStatistics to arrow::ArrayData
(#43801)
* [C++] Enable filesystem automatically when one of
ARROW_{AZURE,GCS,HDFS,S3}=ON is specified (#43806)
* [C++] Expose the set of device types where a ChunkedArray is
allocated (#43853)
* [C++] Make ChunkResolver::ResolveMany output a list of
ChunkLocations (#43928)
* [C++][Parquet] Add support for arrow::ArrayStatistics: non
zero-copy int based types (#43945)
* [C++][Parquet] Guard against use of cleared decryptor/encryptor
(#43947)
* [C++] Add tests based on random data and benchmarks to
ChunkResolver::ResolveMany (#43954)
* [C++] Enhance error message for URI parsing (#43938)
* [CI][C++][Dev] Add cpplint to pre-commit (#43982)
* [C++][Parquet] Add support for arrow::ArrayStatistics:
zero-copy types (#43984)
* [C++][Acero] Some code cleanup to Grouper (#43988)
* [C++] Add missing std::move() in array_nested.cc (#43993)
* [C++][Docs] Add missing install command in building docs
(#44000)
* [C++][Parquet] Add support for arrow::ArrayStatistics: boolean
(#44009)
* [C++] IPC: ipc reader/writer code enhancement (#44019)
* [C++][Compute] Reduce the complexity of row segmenter (#44053)
* [C++][Parquet] Add Float16 reading benchmarks (#44073)
* [C++][Parquet] Remove deprecated APIs (#44080)
* [C++][Acero] Add more row segmenter tests (#44166)
* [C++][Parquet] Fix typo in parquet/column_writer.cc (#40856)
* [C++] Avoid repeated ArrayData::offset lookups (#44190)
* [C++][Gandiva] Accept LLVM 19.1 (#44233)
* [C++] Unify simd header includings (#44250)
* [C++][Decimal] Use 0E+1 not 0.E+1 for broader compatibility
(#44275)
* [Packaging][C++] Enable Azure file system for deb/rpm (#44348)
- Drop apache-arrow-pr43766-boost1_86.patch
- Release notes for 18.0.0 and 19.0.0
-------------------------------------------------------------------
Fri Sep 27 05:31:41 UTC 2024 - Guang Yee <gyee@suse.com>
- Set the appropriate C++ compiler for the given platform so
it will compile on Leap 15.x.
-------------------------------------------------------------------
Wed Sep 18 06:59:36 UTC 2024 - Ben Greiner <code@bnavigator.de>
- Add apache-arrow-pr43766-boost1_86.patch for Boost 1.86
* gh#apache/arrow#43766
-------------------------------------------------------------------
Mon Aug 12 17:11:06 UTC 2024 - Ben Greiner <code@bnavigator.de>
- Update to 17.0.0
## Bug Fixes
* [C++] Add option to string center kernel to control
left/right alignment on odd number of padding (#41449)
* [C++][Python] Fix casting to extension type with fixed size
list storage type (#42219)
* [C++] Replace null_count with MayHaveNulls in
ListArrayFromArray and MapArray (#41957)
* [C++][Python] RecordBatch.filter() segfaults if passed a
ChunkedArray (#40971)
* [C++][Parquet] Timestamp conversion from Parquet to Arrow does
not follow compatibility guidelines for convertedType
* [C++] Use LargeStringArray for casting when writing tables to
CSV (#40271)
* [C++][Python] Map child Array constructed from keys and items
shouldn't have offset (#40871)
* [C++] Fix compile warning with implicitly-defined constructor
does not initialize in encoding_benchmark (#41060)
* [C++] Get null_bit_id according to are_cols_in_encoding_order
in NullUpdateColumnToRow_avx2 (#40998)
* [C++] Clean up unused parameter warnings (#41111)
* [C++][Acero] Fix asof join race (#41614)
* [C++] support for single threaded joins (#41125)
* [C++] Fix hashjoin benchmark failed at make utf8's random
batches (#41195)
* [C++] Check to avoid copying when NullBitmapBuffer is Null
(#41452)
* [C++] Fix crash on invalid Parquet file (#41366)
* [C++][Parquet] More strict Parquet level checking (#41346)
* [C++][Gandiva] Fix gandiva cache size env var (#41330)
* [C++][CMake][Windows] Remove needless .dll suffix from link
libraries (#41341)
* [C++][CMake] Remove unused ARROW_NO_DEPRECATED_API (#41345)
* [C++] Replace [[maybe_unused]] with Arrow macro (#41359)
* [C++][Large] ListView and Map nested types for scalar_if_else's
kernel functions (#41419)
* [C++][Gandiva] Fix ascii_utf8 function to return same result on
x86 and Arm (#41434)
* [C++] Reuse deduplication logic for direct registration
(#41466)
* [C++] Clean up more redundant move warnings (#41487)
* [C++][Compute] Remove redundant logic for ArrayData as
ExecResults in ExecScalarCaseWhen (#41380)
* [C++][CMake] correctly use Protobuf_PROTOC_EXECUTABLE (#41582)
* [C++][CMake] Fix ARROW_USE_BOOST detect condition (#41622)
* [C++][Python] Add optional null_bitmap to MapArray::FromArrays
(#41757)
* [C++] macros.h: Fix ARROW_FORCE_INLINE for MSVC (#41712)
* [C++][Acero] Remove a useless parameter for QueryContext::Init
called in hash_join_benchmark (#41716)
* [C++] Fix the issue that temp vector stack may be under sized
(#41746)
* [C++] Check that extension metadata key is present before
attempting to delete it (#41763)
* [C++] Iterator releases its resource immediately when it reads
all values (#41824)
* [C++][Flight][Benchmark] Ensure waiting server ready (#41793)
* [C++] Fix avx2 gather offset larger than 2GB in
CompareColumnsToRows (#42188)
* [C++][S3] Fix potential deadlock when closing output stream
(#41876)
* [CI][C++] Clear cache for mamba on AppVeyor (#41977)
* [CI][Python][C++] Fix utf8proc detection for wheel on Windows
(#42022)
* [C++] Support list-views on list_slice (#42067)
* [C++] Fix an OTel test failure and remove needless logs
(#42122)
* [C++][FS][Azure] Ensure setting BlobSasBuilder::Protocol
(#42108)
* [C++] Support list-view typed arrays in array_take and
array_filter (#42117)
* [C++] Fix some potential uninitialized variable warnings
(#42207)
* [C++] Avoid invalid accesses in parquet-encoding-benchmark
(#42141)
* [C++] Use FetchContent for bundled ORC (#43011)
* [C++] Fix GetRecordBatchPayload crashes for device data
(#42199)
* [C++] Use non-stale c-ares download URL (#42250)
* [C++][Parquet] Check for valid ciphertext length to prevent
segfault (#43071)
* [C++][Compute] Mark KeyCompare.CompareColumnsToRowsLarge as
large memory test (#43128)
* [C++] Upgrade bundled google-cloud-cpp to 2.22.0 (#43136)
## New Features and Improvements
* [C++][Compute] Implement Grouper::Reset (#41352)
* [Go][C++] Implement Flight SQL Bulk Ingestion (#38385)
* [C++][FS][Azure] Support azure cli auth (#41976)
* [C++][FS][Azure] Add support for environment credential
(#41715)
* [C++] Optimize Take for fixed-size types including nested
fixed-size lists (#41297)
* [C++][Device] Add Copy/View slice functions to a CPU pointer
(#41477)
* [C++] Add support for OpenTelemetry logging (#39905)
* [C++] Import/Export ArrowDeviceArrayStream (#40807)
* [C++] move LocalFileSystem to the registry (#40356)
* [C++] Make flatbuffers serialization more deterministic
(#40392)
* [C++][Gandiva] add RE2::Options set_dot_nl(true) for Like
function (#40970)
* [C++] Introduce portable compiler assumptions (#41021)
* [C++] Add a grouper benchmark for preventing performance
regression (#41036)
* [C++] Support flatten for combining nested list related types
(#41092)
* [C++] Clean up remaining tasks related to half float casts
(#41084)
* [C++][FS][Azure] Add support for CopyFile with hierarchical
namespace support (#41276)
* [C++] Add is_validity_defined_by_bitmap() predicate (#41115)
* [C++] IO: enhance boundary checking in CompressedInputStream
(#41117)
* [C++][Python] Expose recursive flatten for lists on
list_flatten kernel function and pyarrow bindings (#41295)
* [C++][Parquet][Doc] Denote PARQUET:field_id in parquet.rst
(#41187)
* [C++] Extract the kernel loops used for PrimitiveTakeExec and
generalize to any fixed-width type (#41373)
* [C++][Acero] Use per-node basis temp vector stack to mitigate
overflow (#41335)
* [C++][Parquet] Optimize DelimitRecords by batch execution when
max_rep_level > 1 (#41362)
* [C++][FS][Azure][Docs] Add AzureFileSystem to Filesystems API
reference (#41411)
* [C++] Use ASAN to poison temp vector stack memory (#41695)
* [C++][S3] Add a new option to check existence before CreateDir
(#41822)
* [C++][Parquet] Fix
DeltaLengthByteArrayEncoder::EstimatedDataEncodedSize (#41546)
* [C++] Thirdparty: Upgrade xsimd to 13.0.0 (#41548)
* [C++] Improve fixed_width_test_util.h (#41575)
* [C++] ChunkResolver: Implement ResolveMany and add unit tests
(#41561)
* [C++] fixed_width_internal.h: Simplify docstring and support
bit-sized types (BOOL) (#41597)
* [C++][Python] Extends the add_key_value to parquet::arrow and
PyArrow (#41633)
* [C++][CMake][Windows] Don't build needless object libraries
(#41658)
* [C++][Python] PrettyPrint non-cpu data by copying to default
CPU device (#42010)
* [C++][Parquet] Thrift: generate template method to accelerate
reading thrift (#41703)
* [C++][Parquet] Minor: moving EncodedStats by default rather
than copying (#41727)
* [C++][ORC] Ensure setting detected ORC version (#41767)
* [C++][Parquet] Add file metadata read/write benchmark (#41761)
* [C++] Make git-dependent definitions internal (#41781)
* [C++][S3] Remove GetBucketRegion hack for newer AWS SDK
versions (#41798)
* [C++][Parquet] normalize dictionary encoding to use
RLE_DICTIONARY (#41819)
* [C++] IPC: Minor enhance the code of writer (#41900)
* [C++] Fix ExecuteScalar deduce all_scalar with chunked_array
(#41925)
* [C++] Minor enhance code style for FixedShapeTensorType
(#41954)
* [C++] Follow up of adding null_bitmap to MapArray::FromArrays
(#41956)
* [C++] Misc changes making code around list-like types and
list-view types behave the same way (#41971)
* [C++] kernel.cc: Remove defaults on switch so that compiler
can check full enum coverage for us (#41995)
* [C++][Parquet] ParquetFilePrinter::JSONPrint print length of
FLBA (#41981)
* [C++][CMake] Add preset for Valgrind (#42110)
* [C++] Move TakeXXX free functions into TakeMetaFunction and
make them private (#42127)
* [C++][FS][Azure] Validate
AzureOptions::{blob,dfs}_storage_scheme (#42135)
* [C++] list_parent_indices: Add support for list-view types
(#42236)
* [C++] Reduce the recursion of many-join test (#43042)
* [C++] Limit buffer size in BufferedInputStream::SetBufferSize
with raw_read_bound (#43064)
- Require cmake lz4 for 1.10
-------------------------------------------------------------------
Sun Apr 21 16:35:21 UTC 2024 - Ben Greiner <code@bnavigator.de>
- Update to 16.0.0
## Bug Fixes
* [C++][ORC] Catch all ORC exceptions to avoid crash (#40697)
* [C++][S3] Handle conventional content-type for directories
(#40147)
* [C++] Strengthen handling of duplicate slashes in S3, GCS
(#40371)
* [C++] Avoid hash_mean overflow (#39349)
* [C++] Fix spelling (array) (#38963)
* [C++][Parquet] Fix crash in Modular Encryption (#39623)
* [C++][Dataset] Fix failures in dataset-scanner-benchmark
(#39794)
* [C++][Device] Fix Importing nested and string types for
DeviceArray (#39770)
* [C++] Use correct (non-CPU) address of buffer in
ExportDeviceArray (#39783)
* [C++] Improve error message for "chunker out of sync" condition
(#39892)
* [C++] Use make -j1 to install bundled bzip2 (#39956)
* [C++] DatasetWriter avoid creating zero-sized batch when
max_rows_per_file enabled (#39995)
* [C++][CI] Disable debug memory pool for ASAN and Valgrind
(#39975)
* [C++][Gandiva] Make Gandiva's default cache size to be 5000 for
object code cache (#40041)
* [C++][FS][Azure] Fix CreateDir and DeleteDir trailing slash
issues on hierarchical namespace accounts (#40054)
* [C++][FS][Azure] Validate containers in
AzureFileSystem::Impl::MovePaths() (#40086)
* [C++] Decimal types with different precisions and scales bind
failed in resolve type when call arithmetic function (#40223)
* [C++][Docs] Correct the console emitter link (#40146)
* [C++][Python] Fix test_gdb failures on 32-bit (#40293)
* [Python][C++] Fix large file handling on 32-bit Python build
(#40176)
* [C++] Support glog 0.7 build (#40230)
* [C++] Fix cast function bind failed after add an alias name
through AddAlias (#40200)
* [C++] TakeCC: Concatenate only once and delegate to TakeAA
instead of TakeCA (#40206)
* [C++] Fix an abort in asof_join_benchmark run caused by a
missing arg (#40234)
* [C++] Fix a simple buffer-overflow case in decimal_benchmark
(#40277)
* [C++] Reduce S3Client initialization time (#40299)
* [C++] Fix a wrong total_bytes to generate StringType's test
data in vector_hash_benchmark (#40307)
* [C++][Gandiva] Add support for compute module's decimal
promotion rules (#40434)
* [C++][Parquet] Add missing config.h include in
key_management_test.cc (#40330)
* [C++][CMake] Add missing glog::glog dependency to arrow_util
(#40332)
* [C++][Gandiva] Add missing OpenSSL dependency to
encrypt_utils_test.cc (#40338)
* [C++] Remove const qualifier from Buffer::mutable_span_as
(#40367)
* [C++] Avoid simplifying expressions which call impure functions
(#40396)
* [C++] Expose protobuf dependency if opentelemetry or ORC are
enabled (#40399)
* [C++][FlightRPC] Add missing expiration_time arguments (#40425)
* [C++] Move key_hash/key_map/light_array related files to
internal to prevent use by users (#40484)
* [C++] Add missing Threads::Threads dependency to arrow_static
(#40433)
* [C++] Fix static build on Windows (#40446)
* [C++] Ensure using bundled FlatBuffers (#40519)
* [C++][CI] Fix TSAN and ASAN/UBSAN crashes (#40559)
* [C++] Repair FileSystem merge error (#40564)
* [C++] Fix 3.12 Python support (#40322)
* [C++] Move mold linker flags to variables (#40603)
* [C++] Enlarge dest buffer according to dest offset for
CopyBitmap benchmark (#40769)
* [C++][Gandiva] 'ilike' function does not work (#40728)
* [C++] Fix protobuf package name setting for builds with
substrait (#40753)
* [C++][ORC] Fix std::filesystem related link error with ORC
2.0.0 or later (#41023)
* [C++] Fix TSAN link error for module library (#40864)
* [C++][FS][Azure] Don't run TestGetFileInfoGenerator() with
Valgrind (#41163)
* [C++] Fix null count check in BooleanArray.true_count()
(#41070)
* [C++] IO: fixing compiling in gcc 7.5.0 (#41025)
* [C++][Parquet] Bugfixes and more tests in boolean arrow
decoding (#41037)
* [C++] formatting.h: Make sure space is allocated for the 'Z'
when formatting timestamps (#41045)
* [C++] Ignore ARROW_USE_MOLD/ARROW_USE_LLD with clang < 12
(#41062)
* [C++] Fix: left anti join filter empty rows. (#41122)
* [CI][C++] Don't use CMake 3.29.1 with vcpkg (#41151)
* [CI][C++] Use newer LLVM on Ubuntu 24.04 (#41150)
* [CI][R][C++] test-r-linux-valgrind has started failing
* [C++][Python] Sporadic asof_join failures in PyArrow
* [C++] Fix Valgrind error in string-to-float16 conversion
(#41155)
* [C++] Stop defining ARROW_TEST_MEMCHECK in config.h.cmake
(#41177)
* [C++] Fix mistake in integration test. Explicitly cast
std::string to avoid compiler interpreting char* -> bool
(#41202)
## New Features and Improvements
* [C++] Filesystem implementation for Azure Blob Storage
* [C++] Implement cast to/from halffloat (#40067)
* [C++] Add residual filter support to swiss join (#39487)
* [C++] Add support for building with Emscripten (#37821)
* [C++][Python] Add missing methods to RecordBatch (#39506)
* [C++][Java][Flight RPC] Add Session management messages
(#34817)
* [C++] build filesystems as separate modules (#39067)
* [C++][Parquet] Rewrite BYTE_STREAM_SPLIT SSE optimizations
using xsimd (#40335)
* [C++] Add support for service-specific endpoint for S3 using
AWS_ENDPOINT_URL_S3 (#39160)
* [C++][FS][Azure] Implement DeleteFile() (#39840)
* [C++] Implement Azure FileSystem Move() via Azure DataLake
Storage Gen 2 API (#39904)
* [C++] Add ImportChunkedArray and ExportChunkedArray to/from
ArrowArrayStream (#39455)
* [CI][C++][Go] Don't run jobs that use a self-hosted GitHub
Actions Runner on fork (#39903)
* [C++][FS][Azure] Use the generic filesystem tests (#40567)
* [C++][Compute] Add binary_slice kernel for fixed size binary
(#39245)
* [C++] Avoid creating memory manager instance for every buffer
view/copy (#39271)
* [C++][Parquet] Minor: Style enhancement for
parquet::FileMetaData (#39337)
* [C++] IO: Reuse same buffer in CompressedInputStream (#39807)
* [C++] Use more permissible return code for rename (#39481)
* [C++][Parquet] Use std::count in ColumnReader ReadLevels
(#39397)
* [C++] Support cast kernel from large string, (large) binary to
dictionary (#40017)
* [C++] Pass -jN to make in external projects (#39550)
* [C++][Parquet] Add integration test for BYTE_STREAM_SPLIT
(#39570)
* [C++] Ensure top-level benchmarks present informative metrics
(#40091)
* [C++] Ensure CSV and JSON benchmarks present a bytes/s or
items/s metric (#39764)
* [C++] Ensure dataset benchmarks present a bytes/s or items/s
metric (#39766)
* [C++][Gandiva] Ensure Gandiva benchmarks present a bytes/s or
items/s metric (#40435)
* [C++][Parquet] Benchmark levels decoding (#39705)
* [C++][FS][Azure] Remove StatusFromErrorResponse as it's not
necessary (#39719)
* [C++][Parquet] Make BYTE_STREAM_SPLIT routines type-agnostic
(#39748)
* [C++][Device] Generic CopyBatchTo/CopyArrayTo memory types
(#39772)
* [C++] Document and micro-optimize ChunkResolver::Resolve()
(#39817)
* [C++] Allow building cpp/src/arrow/**/*.cc without waiting for
bundled libraries (#39824)
* [C++][Parquet] Parquet binary length overflow exception should
contain the length of binary (#39844)
* [C++][Parquet] Minor: avoid creating a new Reader object in
Decoder::SetData (#39847)
* [C++] Thirdparty: Bump google benchmark to 1.8.3 (#39878)
* [C++] DataType::ToString support optionally show metadata
(#39888)
* [C++][Gandiva] Accept LLVM 18 (#39934)
* [C++] Use Requires instead of Libs for system RE2 in arrow.pc
(#39932)
* [C++] Small CSV reader refactoring (#39963)
* [C++][Parquet] Expand BYTE_STREAM_SPLIT to support
FIXED_LEN_BYTE_ARRAY, INT32 and INT64 (#40094)
* [C++][FS][Azure] Add support for reading user defined metadata
(#40671)
* [C++][FS][Azure] Add AzureFileSystem support to
FileSystemFromUri() (#40325)
* [C++][FS][Azure] Make attempted reads and writes against
directories fail fast (#40119)
* [C++][Python] Basic conversion of RecordBatch to Arrow Tensor
(#40064)
* [C++][Python] Basic conversion of RecordBatch to Arrow Tensor -
add support for different data types (#40359)
* [C++][Python] Basic conversion of RecordBatch to Arrow Tensor -
add option to cast NULL to NaN (#40803)
* [C++][FS][Azure] Implement DeleteFile() for flat-namespace
storage accounts (#40075)
* [CI][C++] Add a job on ARM64 macOS (#40456)
* [C++][Parquet] Remove AVX512 variants of BYTE_STREAM_SPLIT
encoding (#40127)
* [C++][Parquet][Tools] Print FIXED_LEN_BYTE_ARRAY length
(#40132)
* [C++] Make S3 narrative test more flexible (#40144)
* [C++] Remove redundant invocation of BatchesFromTable (#40173)
* [C++][CMake] Use "RapidJSON" CMake target for RapidJSON
(#40210)
* [C++][CMake] Use arrow/util/config.h.cmake instead of
add_definitions() (#40222)
* [C++] Fix: improve the backpressure handling in the dataset
writer (#40722)
* [C++][CMake] Improve description why we need to initialize AWS
C++ SDK in arrow-s3fs-test (#40229)
* [C++] Add support for system glog 0.7 (#40275)
* [C++] Specialize ResolvedChunk::Value on value-specific types
instead of entire class (#40281)
* [C++][Docs] Add documentation of array factories (#40373)
* [C++][Parquet] Allow use of FileDecryptionProperties after the
CryptoFactory is destroyed (#40329)
* [FlightRPC][C++][Java][Go] Add URI scheme to reuse connection
(#40084)
* [C++] Add benchmark for ToTensor conversions (#40358)
* [C++] Define ARROW_FORCE_INLINE for non-MSVC builds (#40372)
* [C++] Add support for mold (#40397)
* [C++] Add support for LLD (#40927)
* [C++] Produce better error message when Move is attempted on
flat-namespace accounts (#40406)
* [C++][ORC] Upgrade ORC to 2.0.0 (#40508)
* [CI][C++] Don't install FlatBuffers (#40541)
* [C++] Ensure pkg-config flags include -ldl for static builds
(#40578)
* [Dev][C++][Python][R] Use pre-commit for clang-format (#40587)
* [C++] Rename Function::is_impure() to is_pure() (#40608)
* [C++] Add missing util/config.h in arrow/io/compressed_test.cc
(#40625)
* [Python][C++] Support conversion of pyarrow.RunEndEncodedArray
to numpy/pandas (#40661)
* [C++] Expand Substrait type support (#40696)
* [C++] Create registry for Devices to map DeviceType to
MemoryManager in C Device Data import (#40699)
* [C++][Parquet] Minor enhancement code of encryption (#40732)
* [C++][Parquet] Simplify PageWriter and ColumnWriter creation
(#40768)
* [C++] Re-order loads and stores in MemoryPoolStats update
(#40647)
* [C++] Revert changes from PR #40857 (#40980)
* [C++] Correctly report asimd/neon in GetRuntimeInfo (#40857)
* [C++] Thirdparty: bump zstd to 1.5.6 (#40837)
* [Docs][C++][Python] Add initial documentation for
RecordBatch::Tensor conversion (#40842)
* [C++][Python] Basic conversion of RecordBatch to Arrow Tensor -
add support for row-major (#40867)
* [C++][Parquet] Encoding: Optimize DecodeArrow/Decode(bitmap)
for PlainBooleanDecoder (#40876)
* [C++] Suppress shorten-64-to-32 warnings in CUDA/Skyhook codes
(#40883)
* [C++] Fix unused function build error (#40984)
* [C++][Parquet] RleBooleanDecoder supports DecodeArrow with
nulls (#40995)
* [C++][FS][Azure] Adjust
DeleteDir/DeleteDirContents/GetFileInfoSelector behaviors
against Azure for generic filesystem tests (#41068)
* [C++][Parquet] Avoid allocating buffer object in RecordReader's
SkipRecords (#39818)
- Drop apache-arrow-pr40230-glog-0.7.patch
- Drop apache-arrow-pr40275-glog-0.7-2.patch
- Belated inclusion of submission without changelog by
Shani Hadiyanto <shanipribadi@gmail.com>
* disable static devel packages by default: The CMake targets
require them for all builds, if not disabled
* Add subpackages for Apache Arrow Flight and Flight SQL
-------------------------------------------------------------------
Sat Mar 23 15:23:23 UTC 2024 - Ben Greiner <code@bnavigator.de>
- Update to 15.0.2
## Bug Fixes
* [C++][Acero] Increase size of Acero TempStack (#40007)
* [C++][Dataset] Add missing Protobuf static link dependency
(#40015)
* [C++] Possible data race when reading metadata of a parquet
file (#40111)
* [C++] Make span SFINAE standards-conforming to enable
compilation with nvcc (#40253)
-------------------------------------------------------------------
Wed Feb 28 08:08:44 UTC 2024 - Ben Greiner <code@bnavigator.de>
- Reenable logging
* Add apache-arrow-pr40230-glog-0.7.patch
* Add apache-arrow-pr40275-glog-0.7-2.patch
* now requires glog devel files to be present for
apache-arrow-devel; ArrowConfig.cmake fails otherwise
* gh#apache/arrow#40181
* gh#apache/arrow#40230
* gh#apache/arrow#40275
-------------------------------------------------------------------
Fri Feb 23 17:35:45 UTC 2024 - Ben Greiner <code@bnavigator.de>
- Update to 15.0.1
## Bug Fixes
* [C++] "iso_calendar" kernel returns incorrect results for array
length > 32 (#39360)
* [C++] Explicit error in ExecBatchBuilder when appending var
length data exceeds offset limit (int32 max) (#39383)
* [C++][Parquet] Pass memory pool to decoders (#39526)
* [C++][Parquet] Validate page sizes before truncating to int32
(#39528)
* [C++] Fix tail-word access cross buffer boundary in
`CompareBinaryColumnToRow` (#39606)
* [C++] Fix the issue of ExecBatchBuilder when appending
consecutive tail rows with the same id may exceed buffer
boundary (for fixed size types) (#39585)
* [Release] Update platform tags for macOS wheels to macosx_10_15
(#39657)
* [C++][FlightRPC] Fix nullptr dereference in PollInfo (#39711)
* [C++] Fix tail-byte access cross buffer boundary in key hash
avx2 (#39800)
* [C++][Acero] Fix AsOfJoin with differently ordered schemas than
the output (#39804)
* [C++] Expression ExecuteScalarExpression execute empty args
function with a wrong result (#39908)
* [C++] Strip extension metadata when importing a registered
extension (#39866)
* [C#] Restore support for .NET 4.6.2 (#40008)
* [C++] Fix out-of-line data size calculation in
BinaryViewBuilder::AppendArraySlice (#39994)
* [C++][CI][Parquet] Fixing parquet column_writer_test building
(#40175)
## New Features and Improvements
* [C++] PollFlightInfo does not follow rule of 5
* [C++] Fix filter and take kernel for month_day_nano intervals
(#39795)
* [C++] Thirdparty: Bump zlib to 1.3.1 (#39877)
* [C++] Add missing "#include <algorithm>" (#40010)
- Release 15.0.0
## Bug Fixes
* [C++] Bring back case_when tests for union types (#39308)
* [C++] Fix the issue of ExecBatchBuilder when appending
consecutive tail rows with the same id may exceed buffer
boundary (#39234)
* [C++][Python] Add a no-op kernel for
dictionary_encode(dictionary) (#38349)
* [C++] Use the latest tagged version of flatbuffers (#38192)
* [C++] Don't use MSVC_VERSION to determine
-fms-compatibility-version (#36595)
* [C++] Optimize hash kernels for Dictionary ChunkedArrays
(#38394)
* [C++][Gandiva] Avoid registering exported functions multiple
times in gandiva (#37752)
* [C++][Acero] Fix race condition caused by straggling input in
the as-of-join node (#37839)
* [C++][Parquet] add more closed file checks for
ParquetFileWriter (#38390)
* [C++][FlightRPC] Add missing app_metadata arguments (#38231)
* [C++][Parquet] Fix Valgrind memory leak in
arrow-dataset-file-parquet-encryption-test (#38306)
* [C++][Parquet] Don't initialize OpenSSL explicitly with OpenSSL
1.1 (#38379)
* [C++] Re-generate flatbuffers C++ for Skyhook (#38405)
* [C++] Avoid passing null pointer to LZ4 frame decompressor
(#39125)
* [C++] Add missing explicit size_t cast for i386 (#38557)
* [C++] Fix: add TestingEqualOptions for gtest functions.
(#38642)
* [C++][Gandiva] Use arrow io util to replace
std::filesystem::path in gandiva (#38698)
* [C++] Protect against PREALLOCATE preprocessor defined on macOS
(#38760)
* [C++] Check variadic buffer counts in bounds (#38740)
* [C++][FS][Azure] Do nothing for CreateDir("/container", true)
(#38783)
* Fix TestArrowReaderAdHoc.ReadFloat16Files to use new
uncompressed files (#38825)
* [C++] S3FileSystem export s3 sdk config
"use_virtual_addressing" to arrow::fs::S3Options (#38858)
* [C++][Gandiva] Fix Gandiva to_date function's validation for
suppress errors parameter (#38987)
* [C++][Parquet] Fix spelling (#38959)
* [C++] Fix spelling (acero) (#38961)
* [C++] Fix spelling (compute) (#38965)
* [C++] Fix spelling (util) (#38967)
* [C++] Fix spelling (dataset) (#38969)
* [C++] Fix spelling (filesystem) (#38972)
* [C++] Fix spelling (#38978)
* [C++] Fix spelling (#38980)
* [C++][Acero] union node output batches should be unordered
(#39046)
* [C++][CI] Fix Valgrind failures (#39127)
* [C++] Remove needless system Protobuf dependency with
-DARROW_HDFS=ON (#39137)
* [C++][Compute] Fix negative duration division (#39158)
* [C++] Add missing data copy in StreamDecoder::Consume(data)
(#39164)
* [C++] Remove compiler warnings with -Wconversion
-Wno-sign-conversion in public headers (#39186)
* [C++][Benchmarking] Remove hardcoded min times (#39307)
* [C++] Don't use "if constexpr" in lambda (#39334)
* [C++] Disable -Werror=attributes for Azure SDK's identity.hpp
(#39448)
* [C++] Fix compile warning (#39389)
* [CI][JS] Force node 20 on JS build on arm64 to fix build issues
(#39499)
* [C++] Disable parallelism for jemalloc external project
(#39522)
* [C++][Parquet] Fix crash in test_parquet_dataset_lazy_filtering
(#39632)
* [C++] Disable parallelism for all `make`-based externalProjects
when CMake >= 3.28 is used
## New Features and Improvements
* [C++][JSON] Change the max rows to Unlimited(int_32) (#38582)
* [C++][Python] Add "Z" to the end of timestamp print string when
tz defined (#39272)
* [C++][Python] DLPack implementation for Arrow Arrays (producer)
(#38472)
* [C++] Diffing of Run-End Encoded arrays (#35003)
* [C++][Python][R] Allow users to adjust S3 log level by
environment variable (#38267)
* [C++][Format] Implementation of the LIST_VIEW and
LARGE_LIST_VIEW array formats (#35345)
* [C++] Use Cast() instead of CastTo() for Scalar in test
(#39044)
* [C++][Python][Parquet] Implement Float16 logical type (#36073)
* [C++] Add Utf8View and BinaryView to the c ABI (#38443)
* [C++][Parquet] Add api to get RecordReader from RowGroupReader
(#37003)
* [C++] Expose a span converter for Buffer and ArraySpan (#38027)
* [C++] Add A Dictionary Compaction Function For DictionaryArray
(#37418)
* [C++] Add arrow::ipc::StreamDecoder::Reset() (#37970)
* [C++] Implement file reads for Azure filesystem (#38269)
* [C++][Integration] Add C++ Utf8View implementation (#37792)
* [C++][Gandiva] Add external function registry support (#38116)
* [C++][Gandiva] Migrate LLVM JIT engine from MCJIT to ORC
v2/LLJIT (#39098)
* [C++] Feature: support concatenate recordbatches. (#37896)
* [C++] Add support for specifying custom Array opening and
closing delimiters to arrow::PrettyPrintDelimiters (#38187)
* [R] Allow code() to return package name prefix. (#38144)
* [C++][Benchmark] Add non-stream Codec Compression/Decompression
(#38067)
* [C++][Parquet] Change DictEncoder dtor checking to warning log
(#38118)
* [C++][Parquet] Support reading parquet files with multiple gzip
members (#38272)
* [C++][Parquet] check the decompressed page size same as size in
page header (#38327)
* [C++][Azure] Use properties for input stream metadata (#38524)
* [C++][FS][Azure] Implement file writes (#38780)
* [C++] Implement GetFileInfo for a single file in Azure
filesystem (#38505)
* [C++][CMake] Use transitive dependency for system GoogleTest
(#38340)
* [C++][Parquet] Use new encrypted files for page index
encryption test (#38347)
* Add validation logic for offsets and values to
arrow.array.ListArray.fromArrays (#38531)
* [C++][Acero] Create a sorted merge node (#38380)
* [C++][Benchmark] Adding benchmark for LZ4/Snappy Compression
(#38453)
* [C++] Support LogicalNullCount for DictionaryArray (#38681)
* [C++][Parquet] Faster scalar BYTE_STREAM_SPLIT (#38529)
* [C++][Gandiva] Support registering external C functions
(#38632)
* [C++] Implement GetFileInfo(selector) for Azure filesystem
(#39009)
* [C++][FS][Azure] Implement CreateDir() (#38708)
* [C++][FS][Azure] Implement DeleteDir() (#38793)
* [C++][FS][Azure] Implement DeleteDirContents() (#38888)
* [C++] Implement AzureFileSystem::DeleteRootDirContents
(#39151)
* [C++][FS][Azure] Implement CopyFile() (#39058)
* [C++][Go][Parquet] Add tests for reading Float16 files in
parquet-testing (#38753)
* [C++][FS][Azure] Rename AzurePath to AzureLocation (#38773)
* [C++] Implement directory semantics even when the storage
account doesn't support HNS (#39361)
* [C++][Parquet] Update parquet.thrift to sync with 2.10.0
(#38815)
* [C++] Replace "#ifdef ARROW_WITH_GZIP" in dataset test to
ARROW_WITH_ZLIB (#38853)
* [C++][Parquet] Using length to optimize bloom filter read
(#38863)
* [C++][Parquet] Minor: making parquet TypedComparator operation
as const method (#38875)
* [C++] DatasetWriter release rows_in_flight_throttle when
allocate writing failed (#38885)
* [C++][Parquet] Move EstimatedBufferedValueBytes from
TypedColumnWriter to ColumnWriter (#39055)
* [C++] Stop installing internal bpacking_simd* headers (#38908)
* [C++][Gandiva] Refactor function holder to return arrow Result
(#38873)
* [C++] Use Cast() instead of CastTo() for Dictionary Scalar in
test (#39362)
* [C++] Use Cast() instead of CastTo() for Timestamp Scalar in
test (#39060)
* [C++] Use Cast() instead of CastTo() for List Scalar in test
(#39353)
* [C++][Parquet] Support row group filtering for nested paths for
struct fields (#39065)
* [C++] Refactor the Azure FS tests and filesystem class
instantiation (#39207)
* [C++][Parquet] Optimize FLBA record reader (#39124)
* Create module info compiler plugin (#39135)
* [C++] Try to make Buffer::device_type_ non-optional (#39150)
* [C++][Parquet] Remove deprecated AppendRowGroup(int64_t
num_rows) (#39209)
* [C++][Parquet] Avoid WriteRecordBatch producing a zero-sized
RowGroup (#39211)
* [C++] Support binary to fixed_size_binary cast (#39236)
* [C++][Azure][FS] Add default credential auth configuration
(#39263)
* [C++] Don't install bundled Azure SDK for C++ with CMake 3.28+
(#39269)
* [C++][FS] Remove the AzureBackend enum and add more flexible
connection options (#39293)
* [C++][FS] Inform caller of container not-existing when
checking for HNS support (#39298)
* [C++][FS][Azure] Add workload identity auth configuration
(#39319)
* [C++][FS][Azure] Add managed identity auth configuration
(#39321)
* [C++] Forward arguments to ExceptionToStatus all the way to
Status::FromArgs (#39323)
* [C++] Flaky DatasetWriterTestFixture.MaxRowsOneWriteBackpresure
test (#39379)
* [C++] Add ForceCachedHierarchicalNamespaceSupport to help with
testing (#39340)
* [C++][FS][Azure] Add client secret auth configuration (#39346)
* [C++] Reduce function.h includes (#39312)
* [C++] Use Cast() instead of CastTo() for Parquet (#39364)
* [C++][Parquet] Vectorize decode plain on FLBA (#39414)
* [C++][Parquet] Style: Using arrow::Buffer data_as api rather
than reinterpret_cast (#39420)
* [C++][ORC] Upgrade ORC to 1.9.2 (#39431)
* [C++] Use default Azure credentials implicitly and support
anonymous credentials explicitly (#39450)
* [C++][Parquet] Allow reading dictionary without reading data
via ByteArrayDictionaryRecordReader (#39153)
- Disable logging until compatibility with glog is restored
gh#apache/arrow#40181
-------------------------------------------------------------------
Mon Jan 15 20:38:45 UTC 2024 - Ben Greiner <code@bnavigator.de>
- Update to 14.0.2
## New Features and Improvements
* GH-38449 - [Release][Go][macOS] Use local test data if possible
(#38450)
* GH-38591 - [Parquet][C++] Remove redundant open calls in
ParquetFileFormat::GetReaderAsync (#38621)
## Bug Fixes
* GH-38345 - [Release] Use local test data for verification if
possible (#38362)
* GH-38438 - [C++] Dataset: Trying to fix the async bug in
Parquet dataset (#38466)
* GH-38577 - Reading parquet file behavior change from 13.0.0 to
14.0.0
* GH-38618 - [C++] S3FileSystem: fix regression in deleting
explicitly created sub-directories (#38845)
* GH-38861 - [C++] Add missing “-framework Security” to
Libs.private in arrow.pc (#38869)
* GH-39072 - [Release][CI] Python3.11-devel is required for the
verification job on AlmaLinux 8 (#39073)
* GH-39074 - [Release][Packaging] Use UTF-8 explicitly for KEYS
(#39082)
-------------------------------------------------------------------
Thu Jan 11 20:27:13 UTC 2024 - pgajdos@suse.com
- disable some tests for s390x [bsc#1218592]
-------------------------------------------------------------------
Mon Nov 13 23:51:00 UTC 2023 - Ondřej Súkup <mimi.vx@gmail.com>
- Update to 14.0.1
* GH-38431 - [Python][CI] Update fs.type_name checks for s3fs tests
* GH-38607 - [Python] Disable PyExtensionType autoload
- Update to 14.0.0
* very long list of changes can be found here:
https://arrow.apache.org/release/14.0.0.html
-------------------------------------------------------------------
Fri Aug 25 09:05:09 UTC 2023 - Ben Greiner <code@bnavigator.de>
- Update to 13.0.0
## Acero
* Handling of unaligned buffers in input nodes can be configured
programmatically or by setting the environment variable
ACERO_ALIGNMENT_HANDLING. The default behavior is to warn when
an unaligned buffer is detected GH-35498.
## Compute
* Several new functions have been added:
- aggregate functions “first”, “last”, “first_last” GH-34911;
- vector functions “cumulative_prod”, “cumulative_min”,
“cumulative_max” GH-32190;
- vector function “pairwise_diff” GH-35786.
* Sorting now works on dictionary arrays, with a much better
performance than the naive approach of sorting the decoded
dictionary GH-29887. Sorting also works on struct arrays, and
nested sort keys are supported using FieldRef GH-33206.
* The check_overflow option has been removed from
CumulativeSumOptions as it was redundant with the availability
of two different functions: “cumulative_sum” and
“cumulative_sum_checked” GH-35789.
* Run-end encoded filters are efficiently supported GH-35749.
* Duration types are supported with the “is_in” and “index_in”
functions GH-36047. They can be multiplied with all integer
types GH-36128.
* “is_in” and “index_in” now cast their inputs more flexibly:
they first attempt to cast the value set to the input type,
then in the other direction if the former fails GH-36203.
* Multiple bugs have been fixed in “utf8_slice_codeunits” when
the stop option is omitted GH-36311.
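As a rough illustration of what the new vector functions listed above
compute, the following pure-Python sketch mimics their semantics on
plain lists. It does not call Arrow. The helper implementations, the
use of None for a missing value, and the restriction of the `period`
argument to positive values are assumptions made for this example only.

```python
from itertools import accumulate
import operator


def cumulative_prod(values):
    """Running product: out[i] = values[0] * ... * values[i]."""
    return list(accumulate(values, operator.mul))


def cumulative_min(values):
    """Running minimum of all elements seen so far."""
    return list(accumulate(values, min))


def cumulative_max(values):
    """Running maximum of all elements seen so far."""
    return list(accumulate(values, max))


def pairwise_diff(values, period=1):
    """out[i] = values[i] - values[i - period].

    The first `period` slots have no predecessor, so this sketch
    fills them with None (standing in for a null value).
    """
    return [
        values[i] - values[i - period] if i >= period else None
        for i in range(len(values))
    ]


data = [3, 1, 4, 1, 5]
print(cumulative_prod(data))  # [3, 3, 12, 12, 60]
print(cumulative_min(data))   # [3, 1, 1, 1, 1]
print(cumulative_max(data))   # [3, 3, 4, 4, 5]
print(pairwise_diff(data))    # [None, -2, 3, -3, 4]
```

The sketch ignores null inputs and overflow behaviour, which the real
kernels handle through their options.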
## Dataset
* A custom schema can now be passed when writing a dataset
GH-35730. The custom schema can alter nullability or metadata
information, but is not allowed to change the datatypes
written.
## Filesystems
* The S3 filesystem now writes files in equal-sized chunks, for
compatibility with Cloudflare's “R2” Storage GH-34363.
* A long-standing issue where S3 support could crash at shutdown
because of resources still being alive after S3 finalization
has been fixed GH-36346. Now, attempts to use S3 resources
(such as making filesystem calls) after S3 finalization should
result in a clean error.
* The GCS filesystem accepts a new option to set the project id
GH-36227.
## IPC
* Nullability and metadata information for sub-fields of map
types is now preserved when deserializing Arrow IPC GH-35297.
## Orc
* The Orc adapter now maps Arrow field metadata to Orc type
attributes when writing, and vice-versa when reading GH-35304.
## Parquet
* It is now possible to write additional metadata while a
ParquetFileWriter is open GH-34888.
* Writing a page index can be enabled selectively per-column
GH-34949. In addition, page header statistics are not written
anymore if the page index is enabled for the given column
GH-34375, as the information would be redundant and less
efficiently accessed.
* Parquet writer properties allow specifying the sorting columns
GH-35331. The user is responsible for ensuring that the data
written to the file actually complies with the given sorting.
* CRC computation has been implemented for v2 data pages
GH-35171. It was already implemented for v1 data pages.
* Writing compliant nested types is now enabled by default
GH-29781. This should not have any negative implication.
* Attempting to load a subset of an Arrow extension type is now
forbidden GH-20385. Previously, if an extension type's storage
is nested (for example a “Point” extension type backed by a
struct<x: float64, y: float64>), it was possible to selectively
load some of the columns of the storage type.
## Substrait
* Support for various functions has been added: “stddev”,
“variance”, “first”, “last” (GH-35247, GH-35506).
* Deserializing sorts is now supported GH-32763. However, some
features, such as clustered sort direction or custom sort
functions, are not implemented.
## Miscellaneous
* FieldRef sports additional methods to get a flattened version
of nested fields GH-14946. Compared to their non-flattened
counterparts, the methods GetFlattened, GetAllFlattened,
GetOneFlattened and GetOneOrNoneFlattened combine a child's
null bitmap with its ancestors' null bitmaps so as to compute
the field's overall logical validity bitmap.
* In other words, given the struct array [null, {'x': null},
{'x': 5}], FieldRef("x")::Get might return [0, null, 5] while
FieldRef("x")::GetFlattened will always return [null, null, 5].
* Scalar::hash() has been fixed for sliced nested arrays
GH-35360.
* A new floating-point to decimal conversion algorithm exhibits
much better precision GH-35576.
* It is now possible to cast between scalars of different
list-like types GH-36309.
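The difference between the plain and flattened FieldRef accessors described above can be modelled in a few lines of Python. This is a conceptual sketch only: the struct array is represented as a parent validity list plus a child value list, and the function names get_child and get_child_flattened are hypothetical, not part of the Arrow API.

```python
# Struct array [null, {'x': null}, {'x': 5}] modelled as two parallel lists.
# parent_valid[i] is False where the struct slot itself is null.
parent_valid = [False, True, True]
# Child 'x' as physically stored: the slot under the null parent holds an
# arbitrary value (0 here), and the child has its own null at index 1.
child_x = [0, None, 5]


def get_child(child):
    # Non-flattened access: return the child exactly as stored,
    # ignoring the parent's validity.
    return list(child)


def get_child_flattened(child, ancestor_valid):
    # Flattened access: a slot is valid only if the child and every
    # ancestor are valid at that position.
    return [
        value if valid else None
        for value, valid in zip(child, ancestor_valid)
    ]


print(get_child(child_x))
print(get_child_flattened(child_x, parent_valid))
```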
-------------------------------------------------------------------
Mon Jun 12 12:13:18 UTC 2023 - Ben Greiner <code@bnavigator.de>
- Update to 12.0.1
* [GH-35423] - [C++][Parquet] Parquet PageReader Force
decompression buffer resize smaller (#35428)
* [GH-35498] - [C++] Relax EnsureAlignment check in Acero from
requiring 64-byte aligned buffers to requiring value-aligned
buffers (#35565)
* [GH-35519] - [C++][Parquet] Fixing exception handling in parquet
FileSerializer (#35520)
* [GH-35538] - [C++] Remove unnecessary status.h include from
protobuf (#35673)
* [GH-35730] - [C++] Add the ability to specify custom schema on a
dataset write (#35860)
* [GH-35850] - [C++] Don't disable optimization with
RelWithDebInfo (#35856)
- Drop cflags.patch -- fixed upstream
-------------------------------------------------------------------
Thu May 18 07:00:43 UTC 2023 - Ben Greiner <code@bnavigator.de>
- Update to 12.0.0
* Run-End Encoded Arrays have been implemented and are accessible
(GH-32104)
* The FixedShapeTensor Logical value type has been implemented
using ExtensionType (GH-15483, GH-34796)
## Compute
* New kernel to convert timestamp with timezone to wall time
(GH-33143)
* Cast kernels are now built into libarrow by default (GH-34388)
## Acero
* Acero has been moved out of libarrow into its own shared
library, allowing for smaller builds of the core libarrow
(GH-15280)
* Exec nodes now can have a concept of “ordering” and will reject
non-sensible plans (GH-34136)
* New exec nodes: “pivot_longer” (GH-34266), “order_by”
(GH-34248) and “fetch” (GH-34059)
* Breaking Change: Reorder output fields of “group_by” node so
that keys/segment keys come before aggregates (GH-33616)
## Substrait
* Add support for the round function GH-33588
* Add support for the cast expression element GH-31910
* Added API reference documentation GH-34011
* Added an extension relation to support segmented aggregation
GH-34626
* The output of the aggregate relation now conforms to the spec
GH-34786
## Parquet
* Added support for DeltaLengthByteArray encoding to the Parquet
writer (GH-33024)
* NaNs are correctly handled now for Parquet predicate push-downs
(GH-18481)
* Added support for reading Parquet page indexes (GH-33596) and
writing page indexes (GH-34053)
* Parquet writer can write columns in parallel now (GH-33655)
* Fixed incorrect number of rows in Parquet V2 page headers
(GH-34086)
* Fixed incorrect Parquet page null_count when stats are disabled
(GH-34326)
* Added support for reading BloomFilters to the Parquet Reader
(GH-34665)
* Parquet File-writer can now add additional key-value metadata
after it has been opened (GH-34888)
* Breaking Change: The default row group size for the Arrow
writer changed from 64Mi rows to 1Mi rows (GH-34280)
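To give a sense of scale for the row group default change listed above, the following sketch computes how many row groups a table of a given length would be split into under each default. It assumes the writer splits purely by row count at the configured maximum, and the function name row_group_count is hypothetical.

```python
OLD_DEFAULT_ROWS = 64 * 1024 * 1024  # 64Mi rows
NEW_DEFAULT_ROWS = 1024 * 1024       # 1Mi rows


def row_group_count(num_rows, max_rows_per_group):
    # Ceiling division without floating point: number of groups needed
    # so that no group holds more than max_rows_per_group rows.
    return -(-num_rows // max_rows_per_group)


num_rows = 10_000_000
print(row_group_count(num_rows, OLD_DEFAULT_ROWS))
print(row_group_count(num_rows, NEW_DEFAULT_ROWS))
```

Applications that relied on a single large row group may want to set the row group size explicitly when writing.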
## ORC
* Added support for the union type in ORC writer (GH-34262)
* Fixed ORC CHAR type mapping with Arrow (GH-34823)
* Fixed timestamp type mapping between ORC and Arrow (GH-34590)
## Datasets
* Added support for reading JSON datasets (GH-33209)
* Dataset writer now supports specifying a function callback to
construct the file name in addition to the existing file name
template (GH-34565)
## Filesystems
* GcsFileSystem::OpenInputFile avoids unnecessary downloads
(GH-34051)
## Other changes
* Convenience Append(std::optional...) methods have been added to
array builders
(GH-14863)
* A deprecated OpenTelemetry header was removed from the Flight
library (GH-34417)
* Fixed crash in “take” kernels on ExtensionArrays with an
underlying dictionary type (GH-34619)
* Fixed bug where the C-Data bridge did not preserve nullability
of map values on import (GH-34983)
* Added support for EqualOptions to RecordBatch::Equals
(GH-34968)
* zstd dependency upgraded to v1.5.5 (GH-34899)
* Improved handling of “logical” nulls such as with union and
RunEndEncoded arrays (GH-34361)
* Fixed incorrect handling of uncompressed body buffers in IPC
reader, added IpcWriteOptions::min_space_savings for optional
compression optimizations (GH-15102)
-------------------------------------------------------------------
Mon Apr 3 11:09:06 UTC 2023 - Andreas Schwab <schwab@suse.de>
- cflags.patch: fix option order to compile with optimisation
- Adjust constraints
-------------------------------------------------------------------
Wed Mar 29 13:13:13 UTC 2023 - Ben Greiner <code@bnavigator.de>
- Remove gflags-static. It was only needed due to a packaging error
with gflags which is about to be fixed in Tumbleweed
- Disable build of the jemalloc memory pool backend
* It requires every consuming application to LD_PRELOAD
libjemalloc.so.2, even when it is not set as the default memory
pool, due to static TLS block allocation errors
* Usage of the bundled jemalloc as a workaround is not desired
(gh#apache/arrow#13739)
* jemalloc does not seem to have a clear advantage over the
system glibc allocator:
https://ursalabs.org/blog/2021-r-benchmarks-part-1
* This overrides the default behavior documented in
https://arrow.apache.org/docs/cpp/memory.html#default-memory-pool
-------------------------------------------------------------------
Sun Mar 12 04:28:52 UTC 2023 - Ben Greiner <code@bnavigator.de>
- Update to v11.0.0
* ARROW-4709 - [C++] Optimize for ordered JSON fields (#14100)
* ARROW-11776 - [C++][Java] Support parquet write from ArrowReader
to file (#14151)
* ARROW-13938 - [C++] Date and datetime types should autocast from
strings
* ARROW-14161 - [C++][Docs] Improve Parquet C++ docs (#14018)
* ARROW-14999 - [C++] Optional field name equality checks for map
and list type (#14847)
* ARROW-15538 - [C++] Expanding coverage of math functions from
Substrait to Acero (#14434)
* ARROW-15592 - [C++] Add support for custom output field names in
a substrait::PlanRel (#14292)
* ARROW-15732 - [C++] Do not use any CPU threads in execution plan
when use_threads is false (#15104)
* ARROW-16782 - [Format] Add REE definitions to FlatBuffers
(#14176)
* ARROW-17144 - [C++][Gandiva] Add sqrt function (#13656)
* ARROW-17301 - [C++] Implement compute function "binary_slice"
(#14550)
* ARROW-17509 - [C++] Simplify async scheduler by removing the
need to call End (#14524)
* ARROW-17520 - [C++] Implement SubStrait SetRel (UnionAll)
(#14186)
* ARROW-17610 - [C++] Support additional source types in
SourceNode (#14207)
* ARROW-17613 - [C++] Add function execution API for a
preconfigured kernel (#14043)
* ARROW-17640 - [C++] Add File Handling Test cases for GlobFile
handling in Substrait Read (#14132)
* ARROW-17798 - [C++][Parquet] Add DELTA_BINARY_PACKED encoder to
Parquet writer (#14191)
* ARROW-17825 - [C++] Allow the possibility to write several
tables in ORCFileWriter (#14219)
* ARROW-17836 - [C++] Allow specifying alignment of buffers
(#14225)
* ARROW-17837 - [C++][Acero] Create ExecPlan-owned QueryContext
that will store a plan's shared data structures (#14227)
* ARROW-17859 - [C++] Use self-pipe in signal-receiving StopSource
(#14250)
* ARROW-17867 - [C++][FlightRPC] Expose bulk parameter binding in
Flight SQL (#14266)
* ARROW-17932 - [C++] Implement streaming RecordBatchReader for
JSON (#14355)
* ARROW-17960 - [C++][Python] Implement list_slice kernel (#14395)
* ARROW-17966 - [C++] Adjust to new format for Substrait optional
arguments (#14415)
* ARROW-17975 - [C++] Create at-fork facility (#14594)
* ARROW-17980 - [C++] As-of-Join Substrait extension (#14485)
* ARROW-17989 - [C++][Python] Enable struct_field kernel to accept
string field names (#14495)
* ARROW-18008 - [Python][C++] Add use_threads to
run_substrait_query
* ARROW-18051 - [C++] Enable tests skipped by ARROW-16392 (#14425)
* ARROW-18095 - [CI][C++][MinGW] All tests exited with 0xc0000139
* ARROW-18113 - [C++] Add RandomAccessFile::ReadManyAsync (#14723)
* ARROW-18135 - [C++] Avoid warnings that ExecBatch::length may be
uninitialized (#14480)
* ARROW-18144 - [C++] Improve JSONTypeError error message in
testing (#14486)
* ARROW-18184 - [C++] Improve JSON parser benchmarks (#14552)
* ARROW-18206 - [C++][CI] Add a nightly build for C++20
compilation (#14571)
* ARROW-18235 - [C++][Gandiva] Fix the like function
implementation for escape chars (#14579)
* ARROW-18249 - [C++] Update vcpkg port to arrow 10.0.0
* ARROW-18253 - [C++][Parquet] Add additional bounds safety checks
(#14592)
* ARROW-18259 - [C++][CMake] Add support for system Thrift CMake
package (#14597)
* ARROW-18280 - [C++][Python] Support slicing to end in list_slice
kernel (#14749)
* ARROW-18282 - [C++][Python] Support step >= 1 in list_slice
kernel (#14696)
* ARROW-18287 - [C++][CMake] Add support for Brotli/utf8proc
provided by vcpkg (#14609)
* ARROW-18342 - [C++] AsofJoinNode support for Boolean data field
(#14658)
* ARROW-18350 - [C++] Use std::to_chars instead of std::to_string
(#14666)
* ARROW-18367 - [C++] Enable the creation of named table relations
(#14681)
* ARROW-18373 - Fix component drop-down, add license text (#14688)
* ARROW-18377 - MIGRATION: Automate component labels from issue
form content (#15245)
* ARROW-18395 - [C++] Move select-k implementation into separate
module
* ARROW-18402 - [C++] Expose DeclarationInfo (#14765)
* ARROW-18406 - [C++] Can't build Arrow with Substrait on Ubuntu
20.04 (#14735)
* ARROW-18409 - [GLib][Plasma] Suppress deprecated warning in
building plasma-glib (#14739)
* ARROW-18413 - [C++][Parquet] Expose page index info from
ColumnChunkMetaData (#14742)
* ARROW-18419 - [C++] Update vendored fast_float (#14817)
* ARROW-18420 - [C++][Parquet] Introduce ColumnIndex & OffsetIndex
(#14803)
* ARROW-18421 - [C++][ORC] Add accessor for stripe information in
reader (#14806)
* ARROW-18427 - [C++] Support negative tolerance in AsofJoinNode
(#14934)
* ARROW-18435 - [C++][Java] Update ORC to 1.8.1 (#14942)
* GH-14869 - [C++] Add Cflags.private defining _STATIC to .pc.in.
(#14900)
* GH-14920 - [C++][CMake] Add missing -latomic to Arrow CMake
package (#15251)
* GH-14937 - [C++] Add rank kernel benchmarks (#14938)
* GH-14951 - [C++][Parquet] Add benchmarks for DELTA_BINARY_PACKED
encoding (#15140)
* GH-15072 - [C++] Move the round functionality into a separate
module (#15073)
* GH-15074 - [Parquet][C++] change 16-bit page_ordinal to 32-bit
(#15182)
* GH-15096 - [C++] Substrait ProjectRel Emit Optimization (#15097)
* GH-15100 - [C++][Parquet] Add benchmark for reading strings from
Parquet (#15101)
* GH-15151 - [C++] Adding RecordBatchReaderSource to solve an
issue in R API (#15183)
* GH-15185 - [C++][Parquet] Improve documentation for Parquet
Reader column_indices (#15184)
* GH-15199 - [C++][Substrait] Allow
AGGREGATION_INVOCATION_UNSPECIFIED as valid invocation (#15198)
* GH-15200 - [C++] Created benchmarks for round kernels. (#15201)
* GH-15216 - [C++][Parquet] Parquet writer accepts RecordBatch
(#15240)
* GH-15226 - [C++] Add DurationType to hash kernels (#33685)
* GH-15237 - [C++] Add ::arrow::Unreachable() using
std::string_view (#15238)
* GH-15239 - [C++][Parquet] Parquet writer writes decimal as
int32/64 (#15244)
* GH-15290 - [C++][Compute] Optimize IfElse kernel AAS/ASA case
when the scalar is null (#15291)
* GH-33607 - [C++] Support optional additional arguments for
inline visit functions (#33608)
* GH-33657 - [C++] arrow-dataset.pc doesn't depend on parquet.pc
without ARROW_PARQUET=ON (#33665)
* PARQUET-2179 - [C++][Parquet] Add a test for skipping repeated
fields (#14366)
* PARQUET-2188 - [parquet-cpp] Add SkipRecords API to RecordReader
(#14142)
* PARQUET-2204 - [parquet-cpp] TypedColumnReaderImpl::Skip should
reuse scratch space (#14509)
* PARQUET-2206 - [parquet-cpp] Microbenchmark for ColumnReader
ReadBatch and Skip (#14523)
* PARQUET-2209 - [parquet-cpp] Optimize skip for the case that
number of values to skip equals page size (#14545)
* PARQUET-2210 - [C++][Parquet] Skip pages based on header
metadata using a callback (#14603)
* PARQUET-2211 - [C++] Print ColumnMetaData.encoding_stats field
(#14556)
- Remove unused python3-arrow package declaration
* Add options as recommended for python support
- Provide test data for unittests
- Don't use system jemalloc but bundle it in order to avoid
static TLS errors in consuming packages like python-pyarrow
* gh#apache/arrow#13739
-------------------------------------------------------------------
Sun Aug 28 19:30:50 UTC 2022 - Stefan Brüns <stefan.bruens@rwth-aachen.de>
- Revert ccache change, using ccache in a pristine buildroot
just slows down OBS builds (use --ccache for local builds).
- Remove unused gflags-static-devel dependency.
-------------------------------------------------------------------
Mon Aug 22 06:22:43 UTC 2022 - John Vandenberg <jayvdb@gmail.com>
- Speed up builds with ccache
-------------------------------------------------------------------
Sat Aug 6 01:59:08 UTC 2022 - Stefan Brüns <stefan.bruens@rwth-aachen.de>
- Update to v9.0.0
No (current) changelog provided
- Spec file cleanup:
* Remove lots of duplicate, unused, or wrong build dependencies
* Do not package outdated Readmes and Changelogs
- Enable tests, disable ones requiring external test data
-------------------------------------------------------------------
Sat Nov 14 09:07:59 UTC 2020 - John Vandenberg <jayvdb@gmail.com>
- Update to v2.0.0
-------------------------------------------------------------------
Wed Nov 13 21:14:00 UTC 2019 - TheBlackCat <toddrme2178@gmail.com>
- Initial spec for v0.12.0