## Bug Fixes
* [C++] Fix overflow issues for large build side in swiss join
(#45108)
* [C++][Fuzzing] Fix Negation bug discovered by fuzzing (#45181)
* [C++][Parquet] Omit level histogram when max level is 0
(#45285)
* [Parquet][C++] Fix statistics load logic for no row group and
multiple row groups (#45350)
* [C++] Disable Flight test (#45232)
## Improvements
* [C++][Parquet] Improve performance of generating size
statistics (#45202)
* [C++][S3] Workaround compatibility issue between AWS SDK and
MinIO (#45310)
- Release 19.0.0
## New Features and Improvements
* [CI][C++] Add a nightly job to test offline build (#44721)
* [C++] GcsFileSystem::Make should return Result (#44503)
* [C++][Parquet] Implement SizeStatistics (#40594)
* [C++] Reduce string inlining in Substrait serde (#45174)
* [C++][Acero] Enhance asof_join to work in multi-threaded
execution by sequencing input (#44083)
* [C++] Support the AWS S3 SSE-C encryption (#43601)
* [C++][Parquet] Parquet Metadata Printer supports print
sort-columns (#43599)
* [C++] Add C++ implementation of Async C Data Interface (#44495)
* [C++][Acero] Support AVX2 swiss join decoding (#43832)
* [C++] skip -0117 in StrptimeZoneOffset for old glibc (#44621)
* [C++] Add arrow::RecordBatch::MakeStatisticsArray() (#44252)
OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=40
- Update to 17.0.0
## Bug Fixes
* [C++] Add option to string ‘center’ kernel to control
left/right alignment on odd number of padding (#41449)
* [C++][Python] Fix casting to extension type with fixed size
list storage type (#42219)
* [C++] Replace null_count with MayHaveNulls in
ListArrayFromArray and MapArray (#41957)
* [C++][Python] RecordBatch.filter() segfaults if passed a
ChunkedArray (#40971)
* [C++][Parquet] Timestamp conversion from Parquet to Arrow does
not follow compatibility guidelines for convertedType
* [C++] Use LargeStringArray for casting when writing tables to
CSV (#40271)
* [C++][Python] Map child Array constructed from keys and items
shouldn’t have offset (#40871)
* [C++] Fix compile warning with ‘implicitly-defined constructor
does not initialize’ in encoding_benchmark (#41060)
* [C++] Get null_bit_id according to are_cols_in_encoding_order
in NullUpdateColumnToRow_avx2 (#40998)
* [C++] Clean up unused parameter warnings (#41111)
* [C++][Acero] Fix asof join race (#41614)
* [C++] support for single threaded joins (#41125)
* [C++] Fix hashjoin benchmark failed at make utf8’s random
batches (#41195)
* [C++] Check to avoid copying when NullBitmapBuffer is Null
(#41452)
* [C++] Fix crash on invalid Parquet file (#41366)
* [C++][Parquet] More strict Parquet level checking (#41346)
* [C++][Gandiva] Fix gandiva cache size env var (#41330)
* [C++][CMake][Windows] Remove needless .dll suffix from link
libraries (#41341)
* [C++][CMake] Remove unused ARROW_NO_DEPRECATED_API (#41345)
* [C++] Replace [[maybe_unused]] with Arrow macro (#41359)
* [C++] [Large]ListView and Map nested types for scalar_if_else’s
kernel functions (#41419)
* [C++][Gandiva] Fix ascii_utf8 function to return same result on
x86 and Arm (#41434)
* [C++] Reuse deduplication logic for direct registration
(#41466)
* [C++] Clean up more redundant move warnings (#41487)
* [C++][Compute] Remove redundant logic for ArrayData as
ExecResults in ExecScalarCaseWhen (#41380)
* [C++][CMake] correctly use Protobuf_PROTOC_EXECUTABLE (#41582)
* [C++][CMake] Fix ARROW_USE_BOOST detect condition (#41622)
* [C++][Python] Add optional null_bitmap to MapArray::FromArrays
(#41757)
* [C++] macros.h: Fix ARROW_FORCE_INLINE for MSVC (#41712)
* [C++][Acero] Remove a useless parameter for QueryContext::Init
called in hash_join_benchmark (#41716)
* [C++] Fix the issue that temp vector stack may be undersized
(#41746)
* [C++] Check that extension metadata key is present before
attempting to delete it (#41763)
* [C++] Iterator releases its resource immediately when it reads
all values (#41824)
* [C++][Flight][Benchmark] Ensure waiting server ready (#41793)
* [C++] Fix avx2 gather offset larger than 2GB in
CompareColumnsToRows (#42188)
* [C++][S3] Fix potential deadlock when closing output stream
(#41876)
* [CI][C++] Clear cache for mamba on AppVeyor (#41977)
* [CI][Python][C++] Fix utf8proc detection for wheel on Windows
(#42022)
* [C++] Support list-views on list_slice (#42067)
* [C++] Fix an OTel test failure and remove needless logs
(#42122)
* [C++][FS][Azure] Ensure setting BlobSasBuilder::Protocol
(#42108)
* [C++] Support list-view typed arrays in array_take and
array_filter (#42117)
* [C++] Fix some potential uninitialized variable warnings
(#42207)
* [C++] Avoid invalid accesses in parquet-encoding-benchmark
(#42141)
* [C++] Use FetchContent for bundled ORC (#43011)
* [C++] Fix GetRecordBatchPayload crashes for device data
(#42199)
* [C++] Use non-stale c-ares download URL (#42250)
* [C++][Parquet] Check for valid ciphertext length to prevent
segfault (#43071)
* [C++][Compute] Mark KeyCompare.CompareColumnsToRowsLarge as
large memory test (#43128)
* [C++] Upgrade bundled google-cloud-cpp to 2.22.0 (#43136)
## New Features and Improvements
* [C++][Compute] Implement Grouper::Reset (#41352)
* [Go][C++] Implement Flight SQL Bulk Ingestion (#38385)
* [C++][FS][Azure] Support azure cli auth (#41976)
* [C++][FS][Azure] Add support for environment credential
(#41715)
* [C++] Optimize Take for fixed-size types including nested
fixed-size lists (#41297)
* [C++][Device] Add Copy/View slice functions to a CPU pointer
(#41477)
* [C++] Add support for OpenTelemetry logging (#39905)
* [C++] Import/Export ArrowDeviceArrayStream (#40807)
* [C++] move LocalFileSystem to the registry (#40356)
* [C++] Make flatbuffers serialization more deterministic
(#40392)
* [C++][Gandiva] add RE2::Options set_dot_nl(true) for Like
function (#40970)
* [C++] Introduce portable compiler assumptions (#41021)
* [C++] Add a grouper benchmark for preventing performance
regression (#41036)
* [C++] Support flatten for combining nested list related types
(#41092)
* [C++] Clean up remaining tasks related to half float casts
(#41084)
* [C++][FS][Azure] Add support for CopyFile with hierarchical
namespace support (#41276)
* [C++] Add is_validity_defined_by_bitmap() predicate (#41115)
* [C++] IO: enhance boundary checking in CompressedInputStream
(#41117)
* [C++][Python] Expose recursive flatten for lists on
list_flatten kernel function and pyarrow bindings (#41295)
* [C++][Parquet][Doc] Denote PARQUET:field_id in parquet.rst
(#41187)
* [C++] Extract the kernel loops used for PrimitiveTakeExec and
generalize to any fixed-width type (#41373)
* [C++][Acero] Use per-node basis temp vector stack to mitigate
overflow (#41335)
* [C++][Parquet] Optimize DelimitRecords by batch execution when
max_rep_level > 1 (#41362)
* [C++][FS][Azure][Docs] Add AzureFileSystem to Filesystems API
reference (#41411)
* [C++] Use ASAN to poison temp vector stack memory (#41695)
* [C++][S3] Add a new option to check existence before CreateDir
(#41822)
* [C++][Parquet] Fix
DeltaLengthByteArrayEncoder::EstimatedDataEncodedSize (#41546)
* [C++] Thirdparty: Upgrade xsimd to 13.0.0 (#41548)
* [C++] Improve fixed_width_test_util.h (#41575)
* [C++] ChunkResolver: Implement ResolveMany and add unit tests
(#41561)
* [C++] fixed_width_internal.h: Simplify docstring and support
bit-sized types (BOOL) (#41597)
* [C++][Python] Extends the add_key_value to parquet::arrow and
PyArrow (#41633)
* [C++][CMake][Windows] Don’t build needless object libraries
(#41658)
* [C++][Python] PrettyPrint non-cpu data by copying to default
CPU device (#42010)
* [C++][Parquet] Thrift: generate template method to accelerate
reading thrift (#41703)
* [C++][Parquet] Minor: moving EncodedStats by default rather
than copying (#41727)
* [C++][ORC] Ensure setting detected ORC version (#41767)
* [C++][Parquet] Add file metadata read/write benchmark (#41761)
* [C++] Make git-dependent definitions internal (#41781)
* [C++][S3] Remove GetBucketRegion hack for newer AWS SDK
versions (#41798)
* [C++][Parquet] normalize dictionary encoding to use
RLE_DICTIONARY (#41819)
* [C++] IPC: Minor enhance the code of writer (#41900)
* [C++] Fix ExecuteScalar deduce all_scalar with chunked_array
(#41925)
* [C++] Minor enhance code style for FixedShapeTensorType
(#41954)
* [C++] Follow up of adding null_bitmap to MapArray::FromArrays
(#41956)
* [C++] Misc changes making code around list-like types and
list-view types behave the same way (#41971)
* [C++] kernel.cc: Remove defaults on switch so that compiler
can check full enum coverage for us (#41995)
* [C++][Parquet] ParquetFilePrinter::JSONPrint print length of
FLBA (#41981)
* [C++][CMake] Add preset for Valgrind (#42110)
* [C++] Move TakeXXX free functions into TakeMetaFunction and
make them private (#42127)
* [C++][FS][Azure] Validate
AzureOptions::{blob,dfs}_storage_scheme (#42135)
* [C++] list_parent_indices: Add support for list-view types
(#42236)
* [C++] Reduce the recursion of many-join test (#43042)
* [C++] Limit buffer size in BufferedInputStream::SetBufferSize
with raw_read_bound (#43064)
- Require cmake lz4 for 1.10
- Update to 17.0.0
## Bug Fixes
* [C++][Python] Fix casting to extension type with fixed size
list storage type (#42219)
* [Python] Include metadata when creating pa.schema from
PyCapsule (#41538)
* [C++][Python] RecordBatch.filter() segfaults if passed a
ChunkedArray (#40971)
* [Python] pa.array: add check for byte-swapped numpy arrays
inside python objects (#41549)
* [Python] Fix read_table for encrypted parquet (#39438)
* [Python] RunEndEncodedArray.from_arrays: bugfix for Array
arguments (#40560) (#41093)
* [C++][Python] Map child Array constructed from keys and items
shouldn’t have offset (#40871)
* [Python] `test_numpy_array_protocol` test failures with numpy
2.0.0rc1
* [Python] Fix StructArray.sort() for by=None (#41495)
* [Python] Build with Python 3.13 (#42034)
* [Python] remove special methods related to buffers in python
<2.6 (#41492)
* [Python] Fix reading column index with decimal values (#41503)
* [Docs][Python] Remove duplicate contents (#41588)
* [C++][Python] Add optional null_bitmap to MapArray::FromArrays
(#41757)
* [Python][Parquet] Implement to_dict method on SortingColumn
(#41704)
* [Python] CMake: ignore Parquet encryption option if Parquet
itself is not enabled (fix Java integration build) (#41776)
* [Python] Disallow direct pa.RecordBatchReader() construction to
avoid segfaults (#41773)
* [Python] Fix RecordBatchReader.cast to support casting to equal
schema for all types (#42098)
* [Python] Fix tests when using NumPy 2.0 on Windows (#42099)
* [CI][Python] Use pip install -e instead of setup.py build_ext
--inplace for installing pyarrow on verification script (#42007)
* [CI][Python][C++] Fix utf8proc detection for wheel on Windows
(#42022)
* [Python][CI] Update expected output for numpy 2.0.0 (#42172)
## New Features and Improvements
* [Python] Replace pandas.util.testing.rands with vendored
version (#42089)
* [Python] begin moving static settings to pyproject.toml
(#41041)
* [Python] Implement PyCapsule interface for Device data in
PyArrow (#40717)
* [Python] Expand the Arrow PyCapsule Interface with C Device
Data support (#40708)
* [Python] Let RecordBatch.filter accept a boolean expression in
addition to mask array (#43043)
* [Python] Fix pickling of LocalFileSystem for cython 2 (#41459)
* [Python] Expand the C Device Interface bindings to support
import on CUDA device (#40385)
* [Python] Allow passing a mapping of column names to
rename_columns (#40645)
* [Python][Packaging] Strip unnecessary symbols when building
wheels (#42028)
* [Python][Docs] Update PyArrow installation docs for conda
package split (#41135)
* [Python] Basic bindings for Device and MemoryManager classes
(#41685)
* [C++][Python] Expose recursive flatten for lists on
list_flatten kernel function and pyarrow bindings (#41295)
* [Python][Packaging] Ensure to build with released numpy 2.0
(instead of RC) in the wheel building workflows (#42194)
* [CI][Python] Add a job on ARM64 macOS (#41313)
* [CI][Python] Reduce CI time on macOS (#41378)
* [Python] Expose byte_width and bit_width of ExtensionType in
terms of the storage type (#41413)
* [Python] Update Python development guide about components being
enabled by default based on Arrow C++ (#41705)
* [Python] Building PyArrow: enable/disable python components by
default based on availability in Arrow C++ (#41494)
* [C++][Python] Extends the add_key_value to parquet::arrow and
PyArrow (#41633)
* [Python] Ensure Buffer methods don’t crash with non-CPU data
(#41889)
* [C++][Python] PrettyPrint non-cpu data by copying to default
CPU device (#42010)
* [Python][Parquet] Update BYTE_STREAM_SPLIT description in
write_table() docstring (#41759)
* [Python] Add support for Pyodide (#37822)
* [Python] Fix pandas tests to follow downstream datetime64 unit
changes (#41979)
* [Python] Allow Array.filter() to take general array input
(#42051)
* [Python] Expose new FLOAT16 logical type in the pyarrow.parquet
bindings (#42103)
* [Python] Array gracefully fails on non-cpu device (#42113)
* [Python][Parquet] Pyarrow store decimal as integer (#42169)
* [Python] Add CI job for Numpy 1.X (#42189)
* [CI][Python] Pin openjdk=17 in python substrait integration
(#43051)
- Drop pyarrow-pr41319-numpy2-tests.patch
- Add pyarrow-pr433325-extradirs.patch gh#apache/arrow/pull/43325
OBS-URL: https://build.opensuse.org/request/show/1194085
OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=34
- Reenable logging
* Add apache-arrow-pr40230-glog-0.7.patch
* Add apache-arrow-pr40275-glog-0.7-2.patch
* now requires glog devel files to be present for
apache-arrow-devel; ArrowConfig.cmake fails otherwise
* gh#apache/arrow#40181
* gh#apache/arrow#40230
* gh#apache/arrow#40275
- Move d:l:p:n/python-pyarrow to the science/apache-arrow as multibuild package: Uses the same source and is tightly connected.
OBS-URL: https://build.opensuse.org/request/show/1152980
OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=25
- Update to 15.0.1
## Bug Fixes
* [C++] "iso_calendar" kernel returns incorrect results for array
length > 32 (#39360)
* [C++] Explicit error in ExecBatchBuilder when appending var
length data exceeds offset limit (int32 max) (#39383)
* [C++][Parquet] Pass memory pool to decoders (#39526)
* [C++][Parquet] Validate page sizes before truncating to int32
(#39528)
* [C++] Fix tail-word access cross buffer boundary in
`CompareBinaryColumnToRow` (#39606)
* [C++] Fix the issue of ExecBatchBuilder when appending
consecutive tail rows with the same id may exceed buffer
boundary (for fixed size types) (#39585)
* [Release] Update platform tags for macOS wheels to macosx_10_15
(#39657)
* [C++][FlightRPC] Fix nullptr dereference in PollInfo (#39711)
* [C++] Fix tail-byte access cross buffer boundary in key hash
avx2 (#39800)
* [C++][Acero] Fix AsOfJoin with differently ordered schemas than
the output (#39804)
* [C++] Expression ExecuteScalarExpression execute empty args
function with a wrong result (#39908)
* [C++] Strip extension metadata when importing a registered
extension (#39866)
* [C#] Restore support for .NET 4.6.2 (#40008)
* [C++] Fix out-of-line data size calculation in
BinaryViewBuilder::AppendArraySlice (#39994)
* [C++][CI][Parquet] Fixing parquet column_writer_test building
(#40175)
## New Features and Improvements
* [C++] PollFlightInfo does not follow rule of 5
* [C++] Fix filter and take kernel for month_day_nano intervals
(#39795)
* [C++] Thirdparty: Bump zlib to 1.3.1 (#39877)
* [C++] Add missing "#include <algorithm>" (#40010)
- Release 15.0.0
## Bug Fixes
* [C++] Bring back case_when tests for union types (#39308)
* [C++] Fix the issue of ExecBatchBuilder when appending
consecutive tail rows with the same id may exceed buffer
boundary (#39234)
* [C++][Python] Add a no-op kernel for
dictionary_encode(dictionary) (#38349)
* [C++] Use the latest tagged version of flatbuffers (#38192)
* [C++] Don't use MSVC_VERSION to determine
-fms-compatibility-version (#36595)
* [C++] Optimize hash kernels for Dictionary ChunkedArrays
(#38394)
* [C++][Gandiva] Avoid registering exported functions multiple
times in gandiva (#37752)
* [C++][Acero] Fix race condition caused by straggling input in
the as-of-join node (#37839)
* [C++][Parquet] add more closed file checks for
ParquetFileWriter (#38390)
* [C++][FlightRPC] Add missing app_metadata arguments (#38231)
* [C++][Parquet] Fix Valgrind memory leak in
arrow-dataset-file-parquet-encryption-test (#38306)
* [C++][Parquet] Don't initialize OpenSSL explicitly with OpenSSL
1.1 (#38379)
* [C++] Re-generate flatbuffers C++ for Skyhook (#38405)
* [C++] Avoid passing null pointer to LZ4 frame decompressor
(#39125)
* [C++] Add missing explicit size_t cast for i386 (#38557)
* [C++] Fix: add TestingEqualOptions for gtest functions.
(#38642)
* [C++][Gandiva] Use arrow io util to replace
std::filesystem::path in gandiva (#38698)
* [C++] Protect against PREALLOCATE preprocessor defined on macOS
(#38760)
* [C++] Check variadic buffer counts in bounds (#38740)
* [C++][FS][Azure] Do nothing for CreateDir("/container", true)
(#38783)
* Fix TestArrowReaderAdHoc.ReadFloat16Files to use new
uncompressed files (#38825)
* [C++] S3FileSystem export s3 sdk config
"use_virtual_addressing" to arrow::fs::S3Options (#38858)
* [C++][Gandiva] Fix Gandiva to_date function's validation for
suppress errors parameter (#38987)
* [C++][Parquet] Fix spelling (#38959)
* [C++] Fix spelling (acero) (#38961)
* [C++] Fix spelling (compute) (#38965)
* [C++] Fix spelling (util) (#38967)
* [C++] Fix spelling (dataset) (#38969)
* [C++] Fix spelling (filesystem) (#38972)
* [C++] Fix spelling (#38978)
* [C++] Fix spelling (#38980)
* [C++][Acero] union node output batches should be unordered
(#39046)
* [C++][CI] Fix Valgrind failures (#39127)
* [C++] Remove needless system Protobuf dependency with
-DARROW_HDFS=ON (#39137)
* [C++][Compute] Fix negative duration division (#39158)
* [C++] Add missing data copy in StreamDecoder::Consume(data)
(#39164)
* [C++] Remove compiler warnings with -Wconversion
-Wno-sign-conversion in public headers (#39186)
* [C++][Benchmarking] Remove hardcoded min times (#39307)
* [C++] Don't use "if constexpr" in lambda (#39334)
* [C++] Disable -Werror=attributes for Azure SDK's identity.hpp
(#39448)
* [C++] Fix compile warning (#39389)
* [CI][JS] Force node 20 on JS build on arm64 to fix build issues
(#39499)
* [C++] Disable parallelism for jemalloc external project
(#39522)
* [C++][Parquet] Fix crash in test_parquet_dataset_lazy_filtering
(#39632)
* [C++] Disable parallelism for all `make`-based externalProjects
when CMake >= 3.28 is used
## New Features and Improvements
* [C++][JSON] Change the max rows to Unlimited(int_32) (#38582)
* [C++][Python] Add "Z" to the end of timestamp print string when
tz defined (#39272)
* [C++][Python] DLPack implementation for Arrow Arrays (producer)
(#38472)
* [C++] Diffing of Run-End Encoded arrays (#35003)
* [C++][Python][R] Allow users to adjust S3 log level by
environment variable (#38267)
* [C++][Format] Implementation of the LIST_VIEW and
LARGE_LIST_VIEW array formats (#35345)
* [C++] Use Cast() instead of CastTo() for Scalar in test
(#39044)
* [C++][Python][Parquet] Implement Float16 logical type (#36073)
* [C++] Add Utf8View and BinaryView to the c ABI (#38443)
* [C++][Parquet] Add api to get RecordReader from RowGroupReader
(#37003)
* [C++] Expose a span converter for Buffer and ArraySpan (#38027)
* [C++] Add A Dictionary Compaction Function For DictionaryArray
(#37418)
* [C++] Add arrow::ipc::StreamDecoder::Reset() (#37970)
* [C++] Implement file reads for Azure filesystem (#38269)
* [C++][Integration] Add C++ Utf8View implementation (#37792)
* [C++][Gandiva] Add external function registry support (#38116)
* [C++][Gandiva] Migrate LLVM JIT engine from MCJIT to ORC
v2/LLJIT (#39098)
* [C++] Feature: support concatenate recordbatches. (#37896)
* [C++] Add support for specifying custom Array opening and
closing delimiters to arrow::PrettyPrintDelimiters (#38187)
* [R] Allow code() to return package name prefix. (#38144)
* [C++][Benchmark] Add non-stream Codec Compression/Decompression
(#38067)
* [C++][Parquet] Change DictEncoder dtor checking to warning log
(#38118)
* [C++][Parquet] Support reading parquet files with multiple gzip
members (#38272)
* [C++][Parquet] check the decompressed page size same as size in
page header (#38327)
* [C++][Azure] Use properties for input stream metadata (#38524)
* [C++][FS][Azure] Implement file writes (#38780)
* [C++] Implement GetFileInfo for a single file in Azure
filesystem (#38505)
* [C++][CMake] Use transitive dependency for system GoogleTest
(#38340)
* [C++][Parquet] Use new encrypted files for page index
encryption test (#38347)
* Add validation logic for offsets and values to
arrow.array.ListArray.fromArrays (#38531)
* [C++][Acero] Create a sorted merge node (#38380)
* [C++][Benchmark] Adding benchmark for LZ4/Snappy Compression
(#38453)
* [C++] Support LogicalNullCount for DictionaryArray (#38681)
* [C++][Parquet] Faster scalar BYTE_STREAM_SPLIT (#38529)
* [C++][Gandiva] Support registering external C functions
(#38632)
* [C++] Implement GetFileInfo(selector) for Azure filesystem
(#39009)
* [C++][FS][Azure] Implement CreateDir() (#38708)
* [C++][FS][Azure] Implement DeleteDir() (#38793)
* [C++][FS][Azure] Implement DeleteDirContents() (#38888)
* [C++] Implement AzureFileSystem::DeleteRootDirContents
(#39151)
* [C++][FS][Azure] Implement CopyFile() (#39058)
* [C++][Go][Parquet] Add tests for reading Float16 files in
parquet-testing (#38753)
* [C++][FS][Azure] Rename AzurePath to AzureLocation (#38773)
* [C++] Implement directory semantics even when the storage
account doesn't support HNS (#39361)
* [C++][Parquet] Update parquet.thrift to sync with 2.10.0
(#38815)
* [C++] Replace "#ifdef ARROW_WITH_GZIP" in dataset test to
ARROW_WITH_ZLIB (#38853)
* [C++][Parquet] Using length to optimize bloom filter read
(#38863)
* [C++][Parquet] Minor: making parquet TypedComparator operation
as const method (#38875)
* [C++] DatasetWriter release rows_in_flight_throttle when
allocate writing failed (#38885)
* [C++][Parquet] Move EstimatedBufferedValueBytes from
TypedColumnWriter to ColumnWriter (#39055)
* [C++] Stop installing internal bpacking_simd* headers (#38908)
* [C++][Gandiva] Refactor function holder to return arrow Result
(#38873)
* [C++] Use Cast() instead of CastTo() for Dictionary Scalar in
test (#39362)
* [C++] Use Cast() instead of CastTo() for Timestamp Scalar in
test (#39060)
* [C++] Use Cast() instead of CastTo() for List Scalar in test
(#39353)
* [C++][Parquet] Support row group filtering for nested paths for
struct fields (#39065)
* [C++] Refactor the Azure FS tests and filesystem class
instantiation (#39207)
* [C++][Parquet] Optimize FLBA record reader (#39124)
* Create module info compiler plugin (#39135)
* [C++] Try to make Buffer::device_type_ non-optional (#39150)
* [C++][Parquet] Remove deprecated AppendRowGroup(int64_t
num_rows) (#39209)
* [C++][Parquet] Prevent WriteRecordBatch from producing a
zero-sized RowGroup (#39211)
* [C++] Support binary to fixed_size_binary cast (#39236)
* [C++][Azure][FS] Add default credential auth configuration
(#39263)
* [C++] Don't install bundled Azure SDK for C++ with CMake 3.28+
(#39269)
* [C++][FS] Remove the AzureBackend enum and add more flexible
connection options (#39293)
* [C++][FS] Inform caller of container not-existing when
checking for HNS support (#39298)
* [C++][FS][Azure] Add workload identity auth configuration
(#39319)
* [C++][FS][Azure] Add managed identity auth configuration
(#39321)
* [C++] Forward arguments to ExceptionToStatus all the way to
Status::FromArgs (#39323)
* [C++] Flaky DatasetWriterTestFixture.MaxRowsOneWriteBackpresure
test (#39379)
* [C++] Add ForceCachedHierarchicalNamespaceSupport to help with
testing (#39340)
* [C++][FS][Azure] Add client secret auth configuration (#39346)
* [C++] Reduce function.h includes (#39312)
* [C++] Use Cast() instead of CastTo() for Parquet (#39364)
* [C++][Parquet] Vectorize decode plain on FLBA (#39414)
* [C++][Parquet] Style: Using arrow::Buffer data_as api rather
than reinterpret_cast (#39420)
* [C++][ORC] Upgrade ORC to 1.9.2 (#39431)
* [C++] Use default Azure credentials implicitly and support
anonymous credentials explicitly (#39450)
* [C++][Parquet] Allow reading dictionary without reading data
via ByteArrayDictionaryRecordReader (#39153)
- Disable logging until compatibility with glog is restored
gh#apache/arrow#40181
OBS-URL: https://build.opensuse.org/request/show/1150081
OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=23
- Update to 14.0.2
## New Features and Improvements
* GH-38449 - [Release][Go][macOS] Use local test data if possible
(#38450)
* GH-38591 - [Parquet][C++] Remove redundant open calls in
ParquetFileFormat::GetReaderAsync (#38621)
## Bug Fixes
* GH-38345 - [Release] Use local test data for verification if
possible (#38362)
* GH-38438 - [C++] Dataset: Trying to fix the async bug in
Parquet dataset (#38466)
* GH-38577 - Reading parquet file behavior change from 13.0.0 to
14.0.0
* GH-38618 - [C++] S3FileSystem: fix regression in deleting
explicitly created sub-directories (#38845)
* GH-38861 - [C++] Add missing “-framework Security” to
Libs.private in arrow.pc (#38869)
* GH-39072 - [Release][CI] Python3.11-devel is required for the
verification job on AlmaLinux 8 (#39073)
* GH-39074 - [Release][Packaging] Use UTF-8 explicitly for KEYS
(#39082)
OBS-URL: https://build.opensuse.org/request/show/1139092
OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=21
- Update to 13.0.0
## Acero
* Handling of unaligned buffers in input nodes can be configured
programmatically or by setting the environment variable
ACERO_ALIGNMENT_HANDLING. The default behavior is to warn when
an unaligned buffer is detected GH-35498.
## Compute
* Several new functions have been added:
- aggregate functions “first”, “last”, “first_last” GH-34911;
- vector functions “cumulative_prod”, “cumulative_min”,
“cumulative_max” GH-32190;
- vector function “pairwise_diff” GH-35786.
* Sorting now works on dictionary arrays, with a much better
performance than the naive approach of sorting the decoded
dictionary GH-29887. Sorting also works on struct arrays, and
nested sort keys are supported using FieldRef GH-33206.
* The check_overflow option has been removed from
CumulativeSumOptions as it was redundant with the availability
of two different functions: “cumulative_sum” and
“cumulative_sum_checked” GH-35789.
* Run-end encoded filters are efficiently supported GH-35749.
* Duration types are supported with the “is_in” and “index_in”
functions GH-36047. They can be multiplied with all integer
types GH-36128.
* “is_in” and “index_in” now cast their inputs more flexibly:
they first attempt to cast the value set to the input type,
then in the other direction if the former fails GH-36203.
* Multiple bugs have been fixed in “utf8_slice_codeunits” when
the stop option is omitted GH-36311.
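The semantics of the new cumulative and pairwise functions listed above can be sketched in plain Python. This is only an illustration of what the kernels compute, not Arrow code: the helper functions below are hypothetical stand-ins that share the kernel names, nulls in the input are not handled, and `period` is assumed to be positive.

```python
from itertools import accumulate
import operator


def cumulative_prod(values):
    """Running product: out[i] = values[0] * ... * values[i]."""
    return list(accumulate(values, operator.mul))


def cumulative_min(values):
    """Running minimum: out[i] = min(values[0..i])."""
    return list(accumulate(values, min))


def cumulative_max(values):
    """Running maximum: out[i] = max(values[0..i])."""
    return list(accumulate(values, max))


def pairwise_diff(values, period=1):
    """out[i] = values[i] - values[i - period].

    The first `period` slots have no predecessor and are left as None.
    """
    return [
        None if i < period else values[i] - values[i - period]
        for i in range(len(values))
    ]


print(cumulative_prod([1, 2, 3, 4]))   # [1, 2, 6, 24]
print(pairwise_diff([1, 4, 9, 16]))    # [None, 3, 5, 7]
```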
## Dataset
* A custom schema can now be passed when writing a dataset
GH-35730. The custom schema can alter nullability or metadata
information, but is not allowed to change the datatypes
written.
## Filesystems
* The S3 filesystem now writes files in equal-sized chunks, for
compatibility with Cloudflare’s “R2” Storage GH-34363.
* A long-standing issue where S3 support could crash at shutdown
because of resources still being alive after S3 finalization
has been fixed GH-36346. Now, attempts to use S3 resources
(such as making filesystem calls) after S3 finalization should
result in a clean error.
* The GCS filesystem accepts a new option to set the project id
GH-36227.
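The equal-sized chunking mentioned for the S3 filesystem can be pictured as follows. This is a sketch in plain Python, not Arrow's S3 code: `split_into_parts` and the part size are hypothetical, and it assumes the target store wants every uploaded part except the last one to have the same size.

```python
def split_into_parts(data: bytes, part_size: int):
    """Split `data` into parts of exactly `part_size` bytes.

    Only the final part may be shorter than `part_size`.
    """
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


parts = split_into_parts(b"x" * 25, part_size=10)
print([len(p) for p in parts])  # [10, 10, 5]
```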
## IPC
* Nullability and metadata information for sub-fields of map
types is now preserved when deserializing Arrow IPC GH-35297.
## Orc
* The Orc adapter now maps Arrow field metadata to Orc type
attributes when writing, and vice-versa when reading GH-35304.
## Parquet
* It is now possible to write additional metadata while a
ParquetFileWriter is open GH-34888.
* Writing a page index can be enabled selectively per-column
GH-34949. In addition, page header statistics are not written
anymore if the page index is enabled for the given column
GH-34375, as the information would be redundant and less
efficiently accessed.
* Parquet writer properties allow specifying the sorting columns
GH-35331. The user is responsible for ensuring that the data
written to the file actually complies with the given sorting.
* CRC computation has been implemented for v2 data pages
GH-35171. It was already implemented for v1 data pages.
* Writing compliant nested types is now enabled by default
GH-29781. This should not have any negative implication.
* Attempting to load a subset of an Arrow extension type is now
forbidden GH-20385. Previously, if an extension type’s storage
was nested (for example a “Point” extension type backed by a
struct<x: float64, y: float64>), it was possible to selectively
load some of the columns of the storage type.
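As a rough illustration of the data page CRC item above: assuming the checksum is the standard CRC-32 (the gzip polynomial, which Python's `zlib.crc32` computes) taken over the serialized page bytes, a writer and a reader could agree on it as shown here. `page_crc` and `verify_page` are hypothetical helpers, not Parquet or Arrow APIs.

```python
import zlib


def page_crc(page_bytes: bytes) -> int:
    """CRC-32 of the serialized page, as an unsigned 32-bit value."""
    return zlib.crc32(page_bytes) & 0xFFFFFFFF


def verify_page(page_bytes: bytes, stored_crc: int) -> bool:
    """Recompute the checksum on read and compare it with the stored one."""
    return page_crc(page_bytes) == stored_crc


stored = page_crc(b"page payload")
print(verify_page(b"page payload", stored))   # True
print(verify_page(b"page pay1oad", stored))   # False
```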
## Substrait
* Support for various functions has been added: “stddev”,
“variance”, “first”, “last” (GH-35247, GH-35506).
* Deserializing sorts is now supported GH-32763. However, some
features, such as clustered sort direction or custom sort
functions, are not implemented.
## Miscellaneous
* FieldRef sports additional methods to get a flattened version
of nested fields GH-14946. Compared to their non-flattened
counterparts, the methods GetFlattened, GetAllFlattened,
GetOneFlattened and GetOneOrNoneFlattened combine a child’s
null bitmap with its ancestors’ null bitmaps so as to compute
the field’s overall logical validity bitmap.
* In other words, given the struct array [null, {'x': null},
{'x': 5}], FieldRef("x")::Get might return [0, null, 5] while
FieldRef("x")::GetFlattened will always return [null, null, 5].
* Scalar::hash() has been fixed for sliced nested arrays
GH-35360.
* A new floating-point to decimal conversion algorithm exhibits
much better precision GH-35576.
* It is now possible to cast between scalars of different
list-like types GH-36309.
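The difference between the plain and the flattened FieldRef getters described above comes down to combining validity. The sketch below models it with Python lists rather than Arrow arrays: `flatten_child` is a hypothetical helper, and the placeholder `0` stored under the null parent slot is an assumption chosen to match the example in the text.

```python
def flatten_child(parent_valid, child_values):
    """Return the child's values with the parent's validity applied.

    A child slot is logically valid only if the parent slot is valid too.
    """
    return [
        value if is_valid else None
        for is_valid, value in zip(parent_valid, child_values)
    ]


# Struct array [null, {'x': null}, {'x': 5}]
parent_valid = [False, True, True]
child_x = [0, None, 5]  # slot 0 holds an arbitrary value under a null parent

print(child_x)                               # [0, None, 5]
print(flatten_child(parent_valid, child_x))  # [None, None, 5]
```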
OBS-URL: https://build.opensuse.org/request/show/1109685
OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=15
- Update to 12.0.0
* Run-End Encoded Arrays have been implemented and are accessible
(GH-32104)
* The FixedShapeTensor Logical value type has been implemented
using ExtensionType (GH-15483, GH-34796)
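For readers unfamiliar with run-end encoding, the idea can be sketched in plain Python: each run of equal values is stored once, together with the logical index at which the run ends. The helper names are hypothetical and this is not Arrow's implementation, only a model of the layout.

```python
from bisect import bisect_right


def ree_encode(values):
    """Encode a list as (run_ends, run_values); run ends are exclusive indices."""
    run_ends, run_values = [], []
    for i, value in enumerate(values):
        if run_values and run_values[-1] == value:
            run_ends[-1] = i + 1          # extend the current run
        else:
            run_values.append(value)      # start a new run
            run_ends.append(i + 1)
    return run_ends, run_values


def ree_decode(run_ends, run_values):
    """Expand (run_ends, run_values) back into the logical list."""
    out, start = [], 0
    for end, value in zip(run_ends, run_values):
        out.extend([value] * (end - start))
        start = end
    return out


def ree_get(run_ends, run_values, index):
    """Look up one logical element with a binary search over the run ends."""
    return run_values[bisect_right(run_ends, index)]


ends, vals = ree_encode(["a", "a", "b", "c", "c", "c"])
print(ends, vals)              # [2, 3, 6] ['a', 'b', 'c']
print(ree_get(ends, vals, 2))  # b
```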
## Compute
* New kernel to convert timestamp with timezone to wall time
(GH-33143)
* Cast kernels are now built into libarrow by default (GH-34388)
## Acero
* Acero has been moved out of libarrow into its own shared
library, allowing for smaller builds of the core libarrow
(GH-15280)
* Exec nodes now can have a concept of “ordering” and will reject
non-sensible plans (GH-34136)
* New exec nodes: “pivot_longer” (GH-34266), “order_by”
(GH-34248) and “fetch” (GH-34059)
* Breaking Change: Reorder output fields of “group_by” node so
that keys/segment keys come before aggregates (GH-33616)
## Substrait
* Add support for the round function GH-33588
* Add support for the cast expression element GH-31910
* Added API reference documentation GH-34011
* Added an extension relation to support segmented aggregation
GH-34626
* The output of the aggregate relation now conforms to the spec
GH-34786
## Parquet
* Added support for DeltaLengthByteArray encoding to the Parquet
writer (GH-33024)
OBS-URL: https://build.opensuse.org/request/show/1087839
OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=11