## Bug Fixes
* [C++] Fix overflow issues for large build side in swiss join
(#45108)
* [C++][Fuzzing] Fix Negation bug discovered by fuzzing (#45181)
* [C++][Parquet] Omit level histogram when max level is 0
(#45285)
* [Parquet][C++] Fix statistics load logic for no row group and
multiple row groups (#45350)
* [C++] Disable Flight test (#45232)
## Improvements
* [C++][Parquet] Improve performance of generating size
statistics (#45202)
* [C++][S3] Work around compatibility issue between AWS SDK and
MinIO (#45310)
- Release 19.0.0
## New Features and Improvements
* [CI][C++] Add a nightly job to test offline build (#44721)
* [C++] GcsFileSystem::Make should return Result (#44503)
* [C++][Parquet] Implement SizeStatistics (#40594)
* [C++] Reduce string inlining in Substrait serde (#45174)
* [C++][Acero] Enhance asof_join to work in multi-threaded
execution by sequencing input (#44083)
* [C++] Support the AWS S3 SSE-C encryption (#43601)
* [C++][Parquet] Parquet Metadata Printer supports printing
sort-columns (#43599)
* [C++] Add C++ implementation of Async C Data Interface (#44495)
* [C++][Acero] Support AVX2 swiss join decoding (#43832)
* [C++] skip -0117 in StrptimeZoneOffset for old glibc (#44621)
* [C++] Add arrow::RecordBatch::MakeStatisticsArray() (#44252)
OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=40
- Update to 17.0.0
## Bug Fixes
* [C++] Add option to string ‘center’ kernel to control
left/right alignment on odd number of padding (#41449)
* [C++][Python] Fix casting to extension type with fixed size
list storage type (#42219)
* [C++] Replace null_count with MayHaveNulls in
ListArrayFromArray and MapArray (#41957)
* [C++][Python] RecordBatch.filter() segfaults if passed a
ChunkedArray (#40971)
* [C++][Parquet] Timestamp conversion from Parquet to Arrow does
not follow compatibility guidelines for convertedType
* [C++] Use LargeStringArray for casting when writing tables to
CSV (#40271)
* [C++][Python] Map child Array constructed from keys and items
shouldn’t have offset (#40871)
* [C++] Fix compile warning with ‘implicitly-defined constructor
does not initialize’ in encoding_benchmark (#41060)
* [C++] Get null_bit_id according to are_cols_in_encoding_order
in NullUpdateColumnToRow_avx2 (#40998)
* [C++] Clean up unused parameter warnings (#41111)
* [C++][Acero] Fix asof join race (#41614)
* [C++] support for single threaded joins (#41125)
* [C++] Fix hashjoin benchmark failure when making random utf8
batches (#41195)
* [C++] Check to avoid copying when NullBitmapBuffer is Null
(#41452)
* [C++] Fix crash on invalid Parquet file (#41366)
* [C++][Parquet] More strict Parquet level checking (#41346)
* [C++][Gandiva] Fix gandiva cache size env var (#41330)
* [C++][CMake][Windows] Remove needless .dll suffix from link
libraries (#41341)
* [C++][CMake] Remove unused ARROW_NO_DEPRECATED_API (#41345)
* [C++] Replace [[maybe_unused]] with Arrow macro (#41359)
* [C++] Add [Large]ListView and Map nested types for
scalar_if_else’s kernel functions (#41419)
* [C++][Gandiva] Fix ascii_utf8 function to return same result on
x86 and Arm (#41434)
* [C++] Reuse deduplication logic for direct registration
(#41466)
* [C++] Clean up more redundant move warnings (#41487)
* [C++][Compute] Remove redundant logic for ArrayData as
ExecResults in ExecScalarCaseWhen (#41380)
* [C++][CMake] correctly use Protobuf_PROTOC_EXECUTABLE (#41582)
* [C++][CMake] Fix ARROW_USE_BOOST detect condition (#41622)
* [C++][Python] Add optional null_bitmap to MapArray::FromArrays
(#41757)
* [C++] macros.h: Fix ARROW_FORCE_INLINE for MSVC (#41712)
* [C++][Acero] Remove a useless parameter for QueryContext::Init
called in hash_join_benchmark (#41716)
* [C++] Fix the issue that temp vector stack may be undersized
(#41746)
* [C++] Check that extension metadata key is present before
attempting to delete it (#41763)
* [C++] Iterator releases its resource immediately when it reads
all values (#41824)
* [C++][Flight][Benchmark] Ensure waiting server ready (#41793)
* [C++] Fix avx2 gather offset larger than 2GB in
CompareColumnsToRows (#42188)
* [C++][S3] Fix potential deadlock when closing output stream
(#41876)
* [CI][C++] Clear cache for mamba on AppVeyor (#41977)
* [CI][Python][C++] Fix utf8proc detection for wheel on Windows
(#42022)
* [C++] Support list-views on list_slice (#42067)
* [C++] Fix an OTel test failure and remove needless logs
(#42122)
* [C++][FS][Azure] Ensure setting BlobSasBuilder::Protocol
(#42108)
* [C++] Support list-view typed arrays in array_take and
array_filter (#42117)
* [C++] Fix some potential uninitialized variable warnings
(#42207)
* [C++] Avoid invalid accesses in parquet-encoding-benchmark
(#42141)
* [C++] Use FetchContent for bundled ORC (#43011)
* [C++] Fix GetRecordBatchPayload crashes for device data
(#42199)
* [C++] Use non-stale c-ares download URL (#42250)
* [C++][Parquet] Check for valid ciphertext length to prevent
segfault (#43071)
* [C++][Compute] Mark KeyCompare.CompareColumnsToRowsLarge as
large memory test (#43128)
* [C++] Upgrade bundled google-cloud-cpp to 2.22.0 (#43136)
## New Features and Improvements
* [C++][Compute] Implement Grouper::Reset (#41352)
* [Go][C++] Implement Flight SQL Bulk Ingestion (#38385)
* [C++][FS][Azure] Support azure cli auth (#41976)
* [C++][FS][Azure] Add support for environment credential
(#41715)
* [C++] Optimize Take for fixed-size types including nested
fixed-size lists (#41297)
* [C++][Device] Add Copy/View slice functions to a CPU pointer
(#41477)
* [C++] Add support for OpenTelemetry logging (#39905)
* [C++] Import/Export ArrowDeviceArrayStream (#40807)
* [C++] move LocalFileSystem to the registry (#40356)
* [C++] Make flatbuffers serialization more deterministic
(#40392)
* [C++][Gandiva] add RE2::Options set_dot_nl(true) for Like
function (#40970)
* [C++] Introduce portable compiler assumptions (#41021)
* [C++] Add a grouper benchmark for preventing performance
regression (#41036)
* [C++] Support flatten for combining nested list related types
(#41092)
* [C++] Clean up remaining tasks related to half float casts
(#41084)
* [C++][FS][Azure] Add support for CopyFile with hierarchical
namespace support (#41276)
* [C++] Add is_validity_defined_by_bitmap() predicate (#41115)
* [C++] IO: enhance boundary checking in CompressedInputStream
(#41117)
* [C++][Python] Expose recursive flatten for lists on
list_flatten kernel function and pyarrow bindings (#41295)
* [C++][Parquet][Doc] Denote PARQUET:field_id in parquet.rst
(#41187)
* [C++] Extract the kernel loops used for PrimitiveTakeExec and
generalize to any fixed-width type (#41373)
* [C++][Acero] Use per-node basis temp vector stack to mitigate
overflow (#41335)
* [C++][Parquet] Optimize DelimitRecords by batch execution when
max_rep_level > 1 (#41362)
* [C++][FS][Azure][Docs] Add AzureFileSystem to Filesystems API
reference (#41411)
* [C++] Use ASAN to poison temp vector stack memory (#41695)
* [C++][S3] Add a new option to check existence before CreateDir
(#41822)
* [C++][Parquet] Fix
DeltaLengthByteArrayEncoder::EstimatedDataEncodedSize (#41546)
* [C++] Thirdparty: Upgrade xsimd to 13.0.0 (#41548)
* [C++] Improve fixed_width_test_util.h (#41575)
* [C++] ChunkResolver: Implement ResolveMany and add unit tests
(#41561)
* [C++] fixed_width_internal.h: Simplify docstring and support
bit-sized types (BOOL) (#41597)
* [C++][Python] Extends the add_key_value to parquet::arrow and
PyArrow (#41633)
* [C++][CMake][Windows] Don’t build needless object libraries
(#41658)
* [C++][Python] PrettyPrint non-cpu data by copying to default
CPU device (#42010)
* [C++][Parquet] Thrift: generate template method to accelerate
reading thrift (#41703)
* [C++][Parquet] Minor: moving EncodedStats by default rather
than copying (#41727)
* [C++][ORC] Ensure setting detected ORC version (#41767)
* [C++][Parquet] Add file metadata read/write benchmark (#41761)
* [C++] Make git-dependent definitions internal (#41781)
* [C++][S3] Remove GetBucketRegion hack for newer AWS SDK
versions (#41798)
* [C++][Parquet] normalize dictionary encoding to use
RLE_DICTIONARY (#41819)
* [C++] IPC: Minor enhance the code of writer (#41900)
* [C++] Fix ExecuteScalar deduce all_scalar with chunked_array
(#41925)
* [C++] Minor enhance code style for FixedShapeTensorType
(#41954)
* [C++] Follow up of adding null_bitmap to MapArray::FromArrays
(#41956)
* [C++] Misc changes making code around list-like types and
list-view types behave the same way (#41971)
* [C++] kernel.cc: Remove defaults on switch so that compiler
can check full enum coverage for us (#41995)
* [C++][Parquet] ParquetFilePrinter::JSONPrint print length of
FLBA (#41981)
* [C++][CMake] Add preset for Valgrind (#42110)
* [C++] Move TakeXXX free functions into TakeMetaFunction and
make them private (#42127)
* [C++][FS][Azure] Validate
AzureOptions::{blob,dfs}_storage_scheme (#42135)
* [C++] list_parent_indices: Add support for list-view types
(#42236)
* [C++] Reduce the recursion of many-join test (#43042)
* [C++] Limit buffer size in BufferedInputStream::SetBufferSize
with raw_read_bound (#43064)
- Require cmake lz4 for 1.10
- Update to 17.0.0
## Bug Fixes
* [C++][Python] Fix casting to extension type with fixed size
list storage type (#42219)
* [Python] Include metadata when creating pa.schema from
PyCapsule (#41538)
* [C++][Python] RecordBatch.filter() segfaults if passed a
ChunkedArray (#40971)
* [Python] pa.array: add check for byte-swapped numpy arrays
inside python objects (#41549)
* [Python] Fix read_table for encrypted parquet (#39438)
* [Python] RunEndEncodedArray.from_arrays: bugfix for Array
arguments (#40560) (#41093)
* [C++][Python] Map child Array constructed from keys and items
shouldn’t have offset (#40871)
* [Python] `test_numpy_array_protocol` test failures with numpy
2.0.0rc1
* [Python] Fix StructArray.sort() for by=None (#41495)
* [Python] Build with Python 3.13 (#42034)
* [Python] remove special methods related to buffers in python
<2.6 (#41492)
* [Python] Fix reading column index with decimal values (#41503)
* [Docs][Python] Remove duplicate contents (#41588)
* [C++][Python] Add optional null_bitmap to MapArray::FromArrays
(#41757)
* [Python][Parquet] Implement to_dict method on SortingColumn
(#41704)
* [Python] CMake: ignore Parquet encryption option if Parquet
itself is not enabled (fix Java integration build) (#41776)
* [Python] Disallow direct pa.RecordBatchReader() construction to
avoid segfaults (#41773)
* [Python] Fix RecordBatchReader.cast to support casting to equal
schema for all types (#42098)
* [Python] Fix tests when using NumPy 2.0 on Windows (#42099)
* [CI][Python] Use pip install -e instead of setup.py build_ext
--inplace for installing pyarrow on verification script (#42007)
* [CI][Python][C++] Fix utf8proc detection for wheel on Windows
(#42022)
* [Python][CI] Update expected output for numpy 2.0.0 (#42172)
## New Features and Improvements
* [Python] Replace pandas.util.testing.rands with vendored
version (#42089)
* [Python] begin moving static settings to pyproject.toml
(#41041)
* [Python] Implement PyCapsule interface for Device data in
PyArrow (#40717)
* [Python] Expand the Arrow PyCapsule Interface with C Device
Data support (#40708)
* [Python] Let RecordBatch.filter accept a boolean expression in
addition to mask array (#43043)
* [Python] Fix pickling of LocalFileSystem for cython 2 (#41459)
* [Python] Expand the C Device Interface bindings to support
import on CUDA device (#40385)
* [Python] Allow passing a mapping of column names to
rename_columns (#40645)
* [Python][Packaging] Strip unnecessary symbols when building
wheels (#42028)
* [Python][Docs] Update PyArrow installation docs for conda
package split (#41135)
* [Python] Basic bindings for Device and MemoryManager classes
(#41685)
* [C++][Python] Expose recursive flatten for lists on
list_flatten kernel function and pyarrow bindings (#41295)
* [Python][Packaging] Ensure to build with released numpy 2.0
(instead of RC) in the wheel building workflows (#42194)
* [CI][Python] Add a job on ARM64 macOS (#41313)
* [CI][Python] Reduce CI time on macOS (#41378)
* [Python] Expose byte_width and bit_width of ExtensionType in
terms of the storage type (#41413)
* [Python] Update Python development guide about components being
enabled by default based on Arrow C++ (#41705)
* [Python] Building PyArrow: enable/disable python components by
default based on availability in Arrow C++ (#41494)
* [C++][Python] Extends the add_key_value to parquet::arrow and
PyArrow (#41633)
* [Python] Ensure Buffer methods don’t crash with non-CPU data
(#41889)
* [C++][Python] PrettyPrint non-cpu data by copying to default
CPU device (#42010)
* [Python][Parquet] Update BYTE_STREAM_SPLIT description in
write_table() docstring (#41759)
* [Python] Add support for Pyodide (#37822)
* [Python] Fix pandas tests to follow downstream datetime64 unit
changes (#41979)
* [Python] Allow Array.filter() to take general array input
(#42051)
* [Python] Expose new FLOAT16 logical type in the pyarrow.parquet
bindings (#42103)
* [Python] Array gracefully fails on non-cpu device (#42113)
* [Python][Parquet] Pyarrow store decimal as integer (#42169)
* [Python] Add CI job for Numpy 1.X (#42189)
* [CI][Python] Pin openjdk=17 in python substrait integration
(#43051)
- Drop pyarrow-pr41319-numpy2-tests.patch
- Add pyarrow-pr433325-extradirs.patch gh#apache/arrow/pull/43325
OBS-URL: https://build.opensuse.org/request/show/1194085
OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=34
I would like to have the Apache Arrow Flight and Flight SQL libraries built.
Also disable the static build: the generated CMake targets include the static libraries, so building against libarrow requires not just apache-arrow-devel but also all of the devel-static packages.
Note: flight and flight-sql are packaged separately.
In the upstream RPM and in the Fedora repo, flight-sql is included in libarrow-flight-libs.
OBS-URL: https://build.opensuse.org/request/show/1163690
OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=29
- Reenable logging
* Add apache-arrow-pr40230-glog-0.7.patch
* Add apache-arrow-pr40275-glog-0.7-2.patch
* now requires glog devel files to be present for
apache-arrow-devel; ArrowConfig.cmake fails otherwise
* gh#apache/arrow#40181
* gh#apache/arrow#40230
* gh#apache/arrow#40275
- Move d:l:p:n/python-pyarrow to the science/apache-arrow as multibuild package: Uses the same source and is tightly connected.
OBS-URL: https://build.opensuse.org/request/show/1152980
OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=25
- Update to 15.0.1
## Bug Fixes
* [C++] "iso_calendar" kernel returns incorrect results for array
length > 32 (#39360)
* [C++] Explicit error in ExecBatchBuilder when appending var
length data exceeds offset limit (int32 max) (#39383)
* [C++][Parquet] Pass memory pool to decoders (#39526)
* [C++][Parquet] Validate page sizes before truncating to int32
(#39528)
* [C++] Fix tail-word access cross buffer boundary in
`CompareBinaryColumnToRow` (#39606)
* [C++] Fix the issue of ExecBatchBuilder when appending
consecutive tail rows with the same id may exceed buffer
boundary (for fixed size types) (#39585)
* [Release] Update platform tags for macOS wheels to macosx_10_15
(#39657)
* [C++][FlightRPC] Fix nullptr dereference in PollInfo (#39711)
* [C++] Fix tail-byte access cross buffer boundary in key hash
avx2 (#39800)
* [C++][Acero] Fix AsOfJoin with differently ordered schemas than
the output (#39804)
* [C++] Expression ExecuteScalarExpression execute empty args
function with a wrong result (#39908)
* [C++] Strip extension metadata when importing a registered
extension (#39866)
* [C#] Restore support for .NET 4.6.2 (#40008)
* [C++] Fix out-of-line data size calculation in
BinaryViewBuilder::AppendArraySlice (#39994)
* [C++][CI][Parquet] Fixing parquet column_writer_test building
(#40175)
## New Features and Improvements
* [C++] PollFlightInfo does not follow rule of 5
* [C++] Fix filter and take kernel for month_day_nano intervals
(#39795)
* [C++] Thirdparty: Bump zlib to 1.3.1 (#39877)
* [C++] Add missing "#include <algorithm>" (#40010)
- Release 15.0.0
## Bug Fixes
* [C++] Bring back case_when tests for union types (#39308)
* [C++] Fix the issue of ExecBatchBuilder when appending
consecutive tail rows with the same id may exceed buffer
boundary (#39234)
* [C++][Python] Add a no-op kernel for
dictionary_encode(dictionary) (#38349)
* [C++] Use the latest tagged version of flatbuffers (#38192)
* [C++] Don't use MSVC_VERSION to determine
-fms-compatibility-version (#36595)
* [C++] Optimize hash kernels for Dictionary ChunkedArrays
(#38394)
* [C++][Gandiva] Avoid registering exported functions multiple
times in gandiva (#37752)
* [C++][Acero] Fix race condition caused by straggling input in
the as-of-join node (#37839)
* [C++][Parquet] add more closed file checks for
ParquetFileWriter (#38390)
* [C++][FlightRPC] Add missing app_metadata arguments (#38231)
* [C++][Parquet] Fix Valgrind memory leak in
arrow-dataset-file-parquet-encryption-test (#38306)
* [C++][Parquet] Don't initialize OpenSSL explicitly with OpenSSL
1.1 (#38379)
* [C++] Re-generate flatbuffers C++ for Skyhook (#38405)
* [C++] Avoid passing null pointer to LZ4 frame decompressor
(#39125)
* [C++] Add missing explicit size_t cast for i386 (#38557)
* [C++] Fix: add TestingEqualOptions for gtest functions.
(#38642)
* [C++][Gandiva] Use arrow io util to replace
std::filesystem::path in gandiva (#38698)
* [C++] Protect against PREALLOCATE preprocessor defined on macOS
(#38760)
* [C++] Check variadic buffer counts in bounds (#38740)
* [C++][FS][Azure] Do nothing for CreateDir("/container", true)
(#38783)
* Fix TestArrowReaderAdHoc.ReadFloat16Files to use new
uncompressed files (#38825)
* [C++] S3FileSystem export s3 sdk config
"use_virtual_addressing" to arrow::fs::S3Options (#38858)
* [C++][Gandiva] Fix Gandiva to_date function's validation for
suppress errors parameter (#38987)
* [C++][Parquet] Fix spelling (#38959)
* [C++] Fix spelling (acero) (#38961)
* [C++] Fix spelling (compute) (#38965)
* [C++] Fix spelling (util) (#38967)
* [C++] Fix spelling (dataset) (#38969)
* [C++] Fix spelling (filesystem) (#38972)
* [C++] Fix spelling (#38978)
* [C++] Fix spelling (#38980)
* [C++][Acero] union node output batches should be unordered
(#39046)
* [C++][CI] Fix Valgrind failures (#39127)
* [C++] Remove needless system Protobuf dependency with
-DARROW_HDFS=ON (#39137)
* [C++][Compute] Fix negative duration division (#39158)
* [C++] Add missing data copy in StreamDecoder::Consume(data)
(#39164)
* [C++] Remove compiler warnings with -Wconversion
-Wno-sign-conversion in public headers (#39186)
* [C++][Benchmarking] Remove hardcoded min times (#39307)
* [C++] Don't use "if constexpr" in lambda (#39334)
* [C++] Disable -Werror=attributes for Azure SDK's identity.hpp
(#39448)
* [C++] Fix compile warning (#39389)
* [CI][JS] Force node 20 on JS build on arm64 to fix build issues
(#39499)
* [C++] Disable parallelism for jemalloc external project
(#39522)
* [C++][Parquet] Fix crash in test_parquet_dataset_lazy_filtering
(#39632)
* [C++] Disable parallelism for all `make`-based externalProjects
when CMake >= 3.28 is used
## New Features and Improvements
* [C++][JSON] Change the max rows to Unlimited(int_32) (#38582)
* [C++][Python] Add "Z" to the end of timestamp print string when
tz defined (#39272)
* [C++][Python] DLPack implementation for Arrow Arrays (producer)
(#38472)
* [C++] Diffing of Run-End Encoded arrays (#35003)
* [C++][Python][R] Allow users to adjust S3 log level by
environment variable (#38267)
* [C++][Format] Implementation of the LIST_VIEW and
LARGE_LIST_VIEW array formats (#35345)
* [C++] Use Cast() instead of CastTo() for Scalar in test
(#39044)
* [C++][Python][Parquet] Implement Float16 logical type (#36073)
* [C++] Add Utf8View and BinaryView to the c ABI (#38443)
* [C++][Parquet] Add api to get RecordReader from RowGroupReader
(#37003)
* [C++] Expose a span converter for Buffer and ArraySpan (#38027)
* [C++] Add A Dictionary Compaction Function For DictionaryArray
(#37418)
* [C++] Add arrow::ipc::StreamDecoder::Reset() (#37970)
* [C++] Implement file reads for Azure filesystem (#38269)
* [C++][Integration] Add C++ Utf8View implementation (#37792)
* [C++][Gandiva] Add external function registry support (#38116)
* [C++][Gandiva] Migrate LLVM JIT engine from MCJIT to ORC
v2/LLJIT (#39098)
* [C++] Feature: support concatenate recordbatches. (#37896)
* [C++] Add support for specifying custom Array opening and
closing delimiters to arrow::PrettyPrintDelimiters (#38187)
* [R] Allow code() to return package name prefix. (#38144)
* [C++][Benchmark] Add non-stream Codec Compression/Decompression
(#38067)
* [C++][Parquet] Change DictEncoder dtor checking to warning log
(#38118)
* [C++][Parquet] Support reading parquet files with multiple gzip
members (#38272)
* [C++][Parquet] Check that the decompressed page size matches
the size in the page header (#38327)
* [C++][Azure] Use properties for input stream metadata (#38524)
* [C++][FS][Azure] Implement file writes (#38780)
* [C++] Implement GetFileInfo for a single file in Azure
filesystem (#38505)
* [C++][CMake] Use transitive dependency for system GoogleTest
(#38340)
* [C++][Parquet] Use new encrypted files for page index
encryption test (#38347)
* Add validation logic for offsets and values to
arrow.array.ListArray.fromArrays (#38531)
* [C++][Acero] Create a sorted merge node (#38380)
* [C++][Benchmark] Adding benchmark for LZ4/Snappy Compression
(#38453)
* [C++] Support LogicalNullCount for DictionaryArray (#38681)
* [C++][Parquet] Faster scalar BYTE_STREAM_SPLIT (#38529)
* [C++][Gandiva] Support registering external C functions
(#38632)
* [C++] Implement GetFileInfo(selector) for Azure filesystem
(#39009)
* [C++][FS][Azure] Implement CreateDir() (#38708)
* [C++][FS][Azure] Implement DeleteDir() (#38793)
* [C++][FS][Azure] Implement DeleteDirContents() (#38888)
* [C++] Implement AzureFileSystem::DeleteRootDirContents
(#39151)
* [C++][FS][Azure] Implement CopyFile() (#39058)
* [C++][Go][Parquet] Add tests for reading Float16 files in
parquet-testing (#38753)
* [C++][FS][Azure] Rename AzurePath to AzureLocation (#38773)
* [C++] Implement directory semantics even when the storage
account doesn't support HNS (#39361)
* [C++][Parquet] Update parquet.thrift to sync with 2.10.0
(#38815)
* [C++] Replace "#ifdef ARROW_WITH_GZIP" in dataset test to
ARROW_WITH_ZLIB (#38853)
* [C++][Parquet] Using length to optimize bloom filter read
(#38863)
* [C++][Parquet] Minor: making parquet TypedComparator operation
as const method (#38875)
* [C++] DatasetWriter release rows_in_flight_throttle when
allocate writing failed (#38885)
* [C++][Parquet] Move EstimatedBufferedValueBytes from
TypedColumnWriter to ColumnWriter (#39055)
* [C++] Stop installing internal bpacking_simd* headers (#38908)
* [C++][Gandiva] Refactor function holder to return arrow Result
(#38873)
* [C++] Use Cast() instead of CastTo() for Dictionary Scalar in
test (#39362)
* [C++] Use Cast() instead of CastTo() for Timestamp Scalar in
test (#39060)
* [C++] Use Cast() instead of CastTo() for List Scalar in test
(#39353)
* [C++][Parquet] Support row group filtering for nested paths for
struct fields (#39065)
* [C++] Refactor the Azure FS tests and filesystem class
instantiation (#39207)
* [C++][Parquet] Optimize FLBA record reader (#39124)
* Create module info compiler plugin (#39135)
* [C++] Try to make Buffer::device_type_ non-optional (#39150)
* [C++][Parquet] Remove deprecated AppendRowGroup(int64_t
num_rows) (#39209)
* [C++][Parquet] Prevent WriteRecordBatch from producing a
zero-sized RowGroup (#39211)
* [C++] Support binary to fixed_size_binary cast (#39236)
* [C++][Azure][FS] Add default credential auth configuration
(#39263)
* [C++] Don't install bundled Azure SDK for C++ with CMake 3.28+
(#39269)
* [C++][FS] Remove the AzureBackend enum and add more flexible
connection options (#39293)
* [C++][FS] Inform caller of container not-existing when
checking for HNS support (#39298)
* [C++][FS][Azure] Add workload identity auth configuration
(#39319)
* [C++][FS][Azure] Add managed identity auth configuration
(#39321)
* [C++] Forward arguments to ExceptionToStatus all the way to
Status::FromArgs (#39323)
* [C++] Flaky DatasetWriterTestFixture.MaxRowsOneWriteBackpresure
test (#39379)
* [C++] Add ForceCachedHierarchicalNamespaceSupport to help with
testing (#39340)
* [C++][FS][Azure] Add client secret auth configuration (#39346)
* [C++] Reduce function.h includes (#39312)
* [C++] Use Cast() instead of CastTo() for Parquet (#39364)
* [C++][Parquet] Vectorize decode plain on FLBA (#39414)
* [C++][Parquet] Style: Using arrow::Buffer data_as api rather
than reinterpret_cast (#39420)
* [C++][ORC] Upgrade ORC to 1.9.2 (#39431)
* [C++] Use default Azure credentials implicitly and support
anonymous credentials explicitly (#39450)
* [C++][Parquet] Allow reading dictionary without reading data
via ByteArrayDictionaryRecordReader (#39153)
- Disable logging until compatibility with glog is restored
gh#apache/arrow#40181
OBS-URL: https://build.opensuse.org/request/show/1150081
OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=23
- Update to 14.0.2
## New Features and Improvements
* GH-38449 - [Release][Go][macOS] Use local test data if possible
(#38450)
* GH-38591 - [Parquet][C++] Remove redundant open calls in
ParquetFileFormat::GetReaderAsync (#38621)
## Bug Fixes
* GH-38345 - [Release] Use local test data for verification if
possible (#38362)
* GH-38438 - [C++] Dataset: Trying to fix the async bug in
Parquet dataset (#38466)
* GH-38577 - Reading parquet file behavior change from 13.0.0 to
14.0.0
* GH-38618 - [C++] S3FileSystem: fix regression in deleting
explicitly created sub-directories (#38845)
* GH-38861 - [C++] Add missing “-framework Security” to
Libs.private in arrow.pc (#38869)
* GH-39072 - [Release][CI] Python3.11-devel is required for the
verification job on AlmaLinux 8 (#39073)
* GH-39074 - [Release][Packaging] Use UTF-8 explicitly for KEYS
(#39082)
OBS-URL: https://build.opensuse.org/request/show/1139092
OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=21
- Update to 13.0.0
## Acero
* Handling of unaligned buffers in input nodes can be configured
programmatically or by setting the environment variable
ACERO_ALIGNMENT_HANDLING. The default behavior is to warn when
an unaligned buffer is detected GH-35498.
## Compute
* Several new functions have been added:
- aggregate functions “first”, “last”, “first_last” GH-34911;
- vector functions “cumulative_prod”, “cumulative_min”,
“cumulative_max” GH-32190;
- vector function “pairwise_diff” GH-35786.
* Sorting now works on dictionary arrays, with much better
performance than the naive approach of sorting the decoded
dictionary GH-29887. Sorting also works on struct arrays, and
nested sort keys are supported using FieldRef GH-33206.
* The check_overflow option has been removed from
CumulativeSumOptions as it was redundant with the availability
of two different functions: “cumulative_sum” and
“cumulative_sum_checked” GH-35789.
* Run-end encoded filters are efficiently supported GH-35749.
* Duration types are supported with the “is_in” and “index_in”
functions GH-36047. They can be multiplied with all integer
types GH-36128.
* “is_in” and “index_in” now cast their inputs more flexibly:
they first attempt to cast the value set to the input type,
then in the other direction if the former fails GH-36203.
* Multiple bugs have been fixed in “utf8_slice_codeunits” when
the stop option is omitted GH-36311.
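The semantics of the new vector functions listed above can be sketched
in plain Python. This is an illustration only: it does not call Arrow,
the helper functions are hypothetical stand-ins that merely reuse the
kernel names, null inputs are not handled, and pairwise_diff is shown
for a positive period only (assuming the first `period` outputs are
null).

```python
from itertools import accumulate
import operator


def cumulative_prod(values):
    # Running product: out[i] = values[0] * ... * values[i]
    return list(accumulate(values, operator.mul))


def cumulative_min(values):
    # Running minimum: out[i] = min(values[0..i])
    return list(accumulate(values, min))


def cumulative_max(values):
    # Running maximum: out[i] = max(values[0..i])
    return list(accumulate(values, max))


def pairwise_diff(values, period=1):
    # out[i] = values[i] - values[i - period]; the first `period`
    # slots have no predecessor and are modelled as None (null).
    return [
        None if i < period else values[i] - values[i - period]
        for i in range(len(values))
    ]


prods = cumulative_prod([1, 2, 3, 4])
diffs = pairwise_diff([1, 4, 9, 16])
```

Here `prods` is the running product of the input and `diffs` starts
with one null slot followed by the differences between neighbours.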
## Dataset
* A custom schema can now be passed when writing a dataset
GH-35730. The custom schema can alter nullability or metadata
information, but is not allowed to change the datatypes
written.
## Filesystems
* The S3 filesystem now writes files in equal-sized chunks, for
compatibility with Cloudflare’s “R2” Storage GH-34363.
* A long-standing issue where S3 support could crash at shutdown
because of resources still being alive after S3 finalization
has been fixed GH-36346. Now, attempts to use S3 resources
(such as making filesystem calls) after S3 finalization should
result in a clean error.
* The GCS filesystem accepts a new option to set the project id
GH-36227.
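The idea of writing in equal-sized chunks can be sketched as follows.
This is a minimal illustration, not Arrow's S3 code: the function name
and the part size are hypothetical, and it assumes that every part
except possibly the last one must have the same size.

```python
from typing import List


def split_into_parts(data: bytes, part_size: int) -> List[bytes]:
    # Cut the payload into consecutive parts of exactly `part_size`
    # bytes; only the final part is allowed to be shorter.
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    return [data[i:i + part_size] for i in range(0, len(data), part_size)]


payload = b"x" * 25
parts = split_into_parts(payload, 10)
part_lengths = [len(p) for p in parts]
```

Concatenating `parts` in order gives back the original payload.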
## IPC
* Nullability and metadata information for sub-fields of map
types is now preserved when deserializing Arrow IPC GH-35297.
## Orc
* The Orc adapter now maps Arrow field metadata to Orc type
attributes when writing, and vice-versa when reading GH-35304.
## Parquet
* It is now possible to write additional metadata while a
ParquetFileWriter is open GH-34888.
* Writing a page index can be enabled selectively per-column
GH-34949. In addition, page header statistics are not written
anymore if the page index is enabled for the given column
GH-34375, as the information would be redundant and less
efficiently accessed.
* Parquet writer properties allow specifying the sorting columns
GH-35331. The user is responsible for ensuring that the data
written to the file actually complies with the given sorting.
* CRC computation has been implemented for v2 data pages
GH-35171. It was already implemented for v1 data pages.
* Writing compliant nested types is now enabled by default
GH-29781. This should not have any negative implication.
* Attempting to load a subset of an Arrow extension type is now
forbidden GH-20385. Previously, if an extension type’s storage
was nested (for example a “Point” extension type backed by a
struct<x: float64, y: float64>), it was possible to selectively
load some of the columns of the storage type.
## Substrait
* Support for various functions has been added: “stddev”,
“variance”, “first”, “last” (GH-35247, GH-35506).
* Deserializing sorts is now supported GH-32763. However, some
features, such as clustered sort direction or custom sort
functions, are not implemented.
## Miscellaneous
* FieldRef sports additional methods to get a flattened version
of nested fields GH-14946. Compared to their non-flattened
counterparts, the methods GetFlattened, GetAllFlattened,
GetOneFlattened and GetOneOrNoneFlattened combine a child’s
null bitmap with its ancestors’ null bitmaps so as to compute
the field’s overall logical validity bitmap.
* In other words, given the struct array [null, {'x': null},
{'x': 5}], FieldRef("x")::Get might return [0, null, 5] while
FieldRef("x")::GetFlattened will always return [null, null, 5].
* Scalar::hash() has been fixed for sliced nested arrays
GH-35360.
* A new floating-point to decimal conversion algorithm exhibits
much better precision GH-35576.
* It is now possible to cast between scalars of different
list-like types GH-36309.
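The difference between the plain and the flattened FieldRef lookups
described above can be modelled in plain Python. This is a sketch
under stated assumptions, not the Arrow API: the struct array is
represented by a hypothetical parent validity list plus a child column,
and the child slot under a null parent holds an arbitrary value (0
here).

```python
def get_child(child_values):
    # Non-flattened lookup: return the child column as stored,
    # ignoring whether the parent struct slot is null.
    return list(child_values)


def get_flattened(parent_valid, child_values):
    # Flattened lookup: a slot is valid only if the parent struct
    # and the child value are both valid.
    return [
        value if valid else None
        for valid, value in zip(parent_valid, child_values)
    ]


# Models the struct array [null, {'x': null}, {'x': 5}]
parent_valid = [False, True, True]
child_x = [0, None, 5]

plain = get_child(child_x)
flattened = get_flattened(parent_valid, child_x)
```

In this model `plain` still exposes the arbitrary 0 stored under the
null parent, while `flattened` masks it out as null.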
OBS-URL: https://build.opensuse.org/request/show/1109685
OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=15
- Update to 12.0.0
* Run-End Encoded Arrays have been implemented and are accessible
(GH-32104)
* The FixedShapeTensor Logical value type has been implemented
using ExtensionType (GH-15483, GH-34796)
## Compute
* New kernel to convert timestamp with timezone to wall time
(GH-33143)
* Cast kernels are now built into libarrow by default (GH-34388)
## Acero
* Acero has been moved out of libarrow into its own shared
library, allowing for smaller builds of the core libarrow
(GH-15280)
* Exec nodes now can have a concept of “ordering” and will reject
non-sensible plans (GH-34136)
* New exec nodes: “pivot_longer” (GH-34266), “order_by”
(GH-34248) and “fetch” (GH-34059)
* Breaking Change: Reorder output fields of “group_by” node so
that keys/segment keys come before aggregates (GH-33616)
## Substrait
* Add support for the round function GH-33588
* Add support for the cast expression element GH-31910
* Added API reference documentation GH-34011
* Added an extension relation to support segmented aggregation
GH-34626
* The output of the aggregate relation now conforms to the spec
GH-34786
## Parquet
* Added support for DeltaLengthByteArray encoding to the Parquet
writer (GH-33024)
OBS-URL: https://build.opensuse.org/request/show/1087839
OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=11