apache-arrow

Author	SHA256	Message	Date
dimstar_suse	29b55664af	Accepting request 1285645 from science - Update to 20.0.0 ## Bug Fixes * GH-30302 - [C++][Parquet] Preserve the bitwidth of integer dictionary indices on round-trip to Parquet (#45685) * GH-31992 - [C++][Parquet] Handling the special case when DataPageV2 values buffer is empty (#45252) * GH-37630 - [C++][Python][Dataset] Allow disabling fragment metadata caching (#45330) * GH-39023 - [C++][CMake] Add missing launcher path conversion for ExternalPackage (#45349) * GH-43057 - [C++] Thread-safe AesEncryptor / AesDecryptor (#44990) * GH-45048 - [C++][Parquet] Deprecate unused chunk_size parameter in parquet::arrow::FileWriter::NewRowGroup() (#45088) * GH-45129 - [Python][C++] Fix usage of deprecated C++ functionality on pyarrow (#45189) * GH-45132 - [C++][Gandiva] Update LLVM to 18.1 (#45114) * GH-45185 - [C++][Parquet] Raise an error for invalid repetition levels when delimiting records (#45186) * GH-45254 - [C++][Acero] Fix the row offset truncation in row table merge (#45255) * GH-45266 - [C++][Acero] Fix the running tasks count of Scheduler when get error tasks in multi-threads (#45268) * GH-45270 - [C++][CI] Disable mimalloc on Valgrind builds (#45271) * GH-45301 - [C++] Change PrimitiveArray ctor to protected (#45444) * GH-45334 - [C++][Acero] Fix swiss join overflow issues in row offset calculation for fixed length and null masks (#45336) * GH-45362 - [C++] Fix identity cast for time and list scalar OBS-URL: https://build.opensuse.org/request/show/1285645 OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/apache-arrow?expand=0&rev=20	2025-06-14 14:17:55 +00:00
bnavigator	1aba9e9712	OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=57	2025-06-13 18:46:54 +00:00
bnavigator	8697b15a63	. OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=56	2025-06-13 18:39:08 +00:00
bnavigator	853a205aac	- Update to 20.0.0 ## Bug Fixes * GH-30302 - [C++][Parquet] Preserve the bitwidth of integer dictionary indices on round-trip to Parquet (#45685) * GH-31992 - [C++][Parquet] Handling the special case when DataPageV2 values buffer is empty (#45252) * GH-37630 - [C++][Python][Dataset] Allow disabling fragment metadata caching (#45330) * GH-39023 - [C++][CMake] Add missing launcher path conversion for ExternalPackage (#45349) * GH-43057 - [C++] Thread-safe AesEncryptor / AesDecryptor (#44990) * GH-45048 - [C++][Parquet] Deprecate unused chunk_size parameter in parquet::arrow::FileWriter::NewRowGroup() (#45088) * GH-45129 - [Python][C++] Fix usage of deprecated C++ functionality on pyarrow (#45189) * GH-45132 - [C++][Gandiva] Update LLVM to 18.1 (#45114) * GH-45185 - [C++][Parquet] Raise an error for invalid repetition levels when delimiting records (#45186) * GH-45254 - [C++][Acero] Fix the row offset truncation in row table merge (#45255) * GH-45266 - [C++][Acero] Fix the running tasks count of Scheduler when get error tasks in multi-threads (#45268) * GH-45270 - [C++][CI] Disable mimalloc on Valgrind builds (#45271) * GH-45301 - [C++] Change PrimitiveArray ctor to protected (#45444) * GH-45334 - [C++][Acero] Fix swiss join overflow issues in row offset calculation for fixed length and null masks (#45336) * GH-45362 - [C++] Fix identity cast for time and list scalar OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=55	2025-06-13 18:31:56 +00:00
anag_factory	d35123d1c1	Accepting request 1271193 from science - to fix cmake-4 build problems, upgrade bundled mimalloc from 2.0.6 to 2.0.9 and add apache-arrow-19.0.1-mimalloc-version.patch; mimalloc changes according to readme.md: * 2.0.9: - Supports building with asan and improved [Valgrind] support. - Support arbitrary large alignments, in particular for `std::pmr` pools. - Added C++ STL allocators attached to a specific heap. - Heap walks now visit all objects (including huge objects). - Support Windows nano server containers. - Various small bug fixes. * 2.0.7: - Initial support for [Valgrind] for leak testing and heap block overflow detection. - Initial support for attaching heaps to a specific memory area. - Fix `realloc` behavior for zero size blocks. - Remove restriction to integral multiple of the alignment in `alloc_align`. - Improved aligned allocation performance. - Reduced contention with many threads on few processors. - VS2022 support. - Support `pkg-config`. OBS-URL: https://build.opensuse.org/request/show/1271193 OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/apache-arrow?expand=0&rev=19	2025-04-22 15:28:04 +00:00
bnavigator	a03ab640dd	OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=53	2025-04-21 16:33:45 +00:00
bnavigator	12b0bf8517	Accepting request 1271189 from home:hsk17:branches:home:simotek:cmake4b changes to fix cmake-4 build problems OBS-URL: https://build.opensuse.org/request/show/1271189 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=52	2025-04-21 16:30:53 +00:00
anag_factory	4d14f521d8	Accepting request 1264972 from science - Re-enable flight, grpc has been fixed boo#1237422 OBS-URL: https://build.opensuse.org/request/show/1264972 OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/apache-arrow?expand=0&rev=18	2025-04-02 19:05:38 +00:00
bnavigator	986ddd3f2e	Accepting request 1264971 from home:bnavigator:branches:science - Re-enable flight, grpc has been fixed boo#1237422 OBS-URL: https://build.opensuse.org/request/show/1264971 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=50	2025-03-28 08:48:20 +00:00
anag_factory	c3e5d75605	Accepting request 1252869 from science - Add missing dependencies for libboost_process explicitly boo#1239599 (forwarded request 1252868 from bnavigator) OBS-URL: https://build.opensuse.org/request/show/1252869 OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/apache-arrow?expand=0&rev=17	2025-03-13 21:47:20 +00:00
bnavigator	1d57fa866b	Accepting request 1252868 from home:bnavigator:branches:science - Add missing dependencies for libboost_process explicitly boo#1239599 OBS-URL: https://build.opensuse.org/request/show/1252868 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=48	2025-03-13 19:08:14 +00:00
dimstar_suse	01acc18061	Accepting request 1247454 from science - disable flight because of gh#grpc/grpc#37968 boo#1237422 (forwarded request 1247453 from bnavigator) OBS-URL: https://build.opensuse.org/request/show/1247454 OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/apache-arrow?expand=0&rev=16	2025-02-20 18:53:17 +00:00
bnavigator	285eb6979a	Accepting request 1247453 from home:bnavigator:branches:science - disable flight because of gh#grpc/grpc#37968 boo#1237422 OBS-URL: https://build.opensuse.org/request/show/1247453 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=46	2025-02-20 16:43:05 +00:00
bnavigator	ea30dc8735	OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=45	2025-02-18 19:10:15 +00:00
bnavigator	e02b9f7269	OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=44	2025-02-18 19:08:10 +00:00
bnavigator	2c44dc303e	OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=43	2025-02-18 15:09:20 +00:00
bnavigator	1392e3167f	OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=42	2025-02-18 13:00:56 +00:00
bnavigator	d9b8e0ac6e	OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=41	2025-02-17 22:38:30 +00:00
bnavigator	55775895c9	- Update to 19.0.1 ## Bug Fixes * [C++] Fix overflow issues for large build side in swiss join (#45108) * [C++][Fuzzing] Fix Negation bug discovered by fuzzing (#45181) * [C++][Parquet] Omit level histogram when max level is 0 (#45285) * [Parquet][C++] Fix statistics load logic for no row group and multiple row groups (#45350) * [C++] Disable Flight test (#45232) ## Improvements * [C++][Parquet] Improve performance of generating size statistics (#45202) * [C++][S3] Workaround compatibility issue between AWS SDK and MinIO (#45310) - Release 19.0.0 ## New Features and Improvements * [CI][C++] Add a nightly job to test offline build (#44721) * [C++] GcsFileSystem::Make should return Result (#44503) * [C++][Parquet] Implement SizeStatistics (#40594) * [C++] Reduce string inlining in Substrait serde (#45174) * [C++][Acero] Enhance asof_join to work in multi-threaded execution by sequencing input (#44083) * [C++] Support the AWS S3 SSE-C encryption (#43601) * [C++][Parquet] Parquet Metadata Printer supports print sort-columns (#43599) * [C++] Add C++ implementation of Async C Data Interface (#44495) * [C++][Acero] Support AVX2 swiss join decoding (#43832) * [C++] skip -0117 in StrptimeZoneOffset for old glibc (#44621) * [C++] Add arrow::RecordBatch::MakeStatisticsArray() (#44252) OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=40	2025-02-17 22:32:29 +00:00
anag_factory	ffdc9dadfc	Accepting request 1218457 from science OBS-URL: https://build.opensuse.org/request/show/1218457 OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/apache-arrow?expand=0&rev=15	2024-10-27 10:25:51 +00:00
bnavigator	be27bc1230	Accepting request 1218425 from home:yeey:OpenWebUI - Set the appropriate C++ compiler for the given platform so it will compile on Leap 15.x. - Enable sle15_python_module_pythons. OBS-URL: https://build.opensuse.org/request/show/1218425 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=38	2024-10-26 01:06:02 +00:00
anag_factory	758d4c683d	Accepting request 1201792 from science OBS-URL: https://build.opensuse.org/request/show/1201792 OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/apache-arrow?expand=0&rev=14	2024-09-22 09:05:54 +00:00
bnavigator	3f02fd3dcd	Accepting request 1201791 from home:bnavigator:branches:science - Add apache-arrow-pr43766-boost1_86.patch for Boost 1.86 * gh#apache/arrow#43766 OBS-URL: https://build.opensuse.org/request/show/1201791 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=36	2024-09-18 12:46:47 +00:00
dimstar_suse	1db8e83530	Accepting request 1194086 from science OBS-URL: https://build.opensuse.org/request/show/1194086 OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/apache-arrow?expand=0&rev=13	2024-08-16 10:23:38 +00:00
bnavigator	9bed06f66b	Accepting request 1194085 from home:bnavigator:branches:science - Update to 17.0.0 ## Bug Fixes * [C++] Add option to string ‘center’ kernel to control left/right alignment on odd number of padding (#41449) * [C++][Python] Fix casting to extension type with fixed size list storage type (#42219) * [C++] Replace null_count with MayHaveNulls in ListArrayFromArray and MapArray (#41957) * [C++][Python] RecordBatch.filter() segfaults if passed a ChunkedArray (#40971) * [C++][Parquet] Timestamp conversion from Parquet to Arrow does not follow compatibility guidelines for convertedType * [C++] Use LargeStringArray for casting when writing tables to CSV (#40271) * [C++][Python] Map child Array constructed from keys and items shouldn’t have offset (#40871) * [C++] Fix compile warning with ‘implicitly-defined constructor does not initialize’ in encoding_benchmark (#41060) * [C++] Get null_bit_id according to are_cols_in_encoding_order in NullUpdateColumnToRow_avx2 (#40998) * [C++] Clean up unused parameter warnings (#41111) * [C++][Acero] Fix asof join race (#41614) * [C++] support for single threaded joins (#41125) * [C++] Fix hashjoin benchmark failed at make utf8’s random batches (#41195) * [C++] Check to avoid copying when NullBitmapBuffer is Null (#41452) * [C++] Fix crash on invalid Parquet file (#41366) * [C++][Parquet] More strict Parquet level checking (#41346) * [C++][Gandiva] Fix gandiva cache size env var (#41330) * [C++][CMake][Windows] Remove needless .dll suffix from link libraries (#41341) * [C++][CMake] Remove unused ARROW_NO_DEPRECATED_API (#41345) * [C++][maybe_unused] with Arrow macro (#41359) * [C++][Large] ListView and Map nested types for scalar_if_else’s kernel functions (#41419) * [C++][Gandiva] Fix ascii_utf8 function to return same result on x86 and Arm (#41434) * [C++] Reuse deduplication logic for direct registration (#41466) * [C++] Clean up more redundant move warnings (#41487) * [C++][Compute] Remove redundant logic for 
ArrayData as ExecResults in ExecScalarCaseWhen (#41380) * [C++][CMake] correctly use Protobuf_PROTOC_EXECUTABLE (#41582) * [C++][CMake] Fix ARROW_USE_BOOST detect condition (#41622) * [C++][Python] Add optional null_bitmap to MapArray::FromArrays (#41757) * [C++] macros.h: Fix ARROW_FORCE_INLINE for MSVC (#41712) * [C++][Acero] Remove an useless parameter for QueryContext::Init called in hash_join_benchmark (#41716) * [C++] Fix the issue that temp vector stack may be under sized (#41746) * [C++] Check that extension metadata key is present before attempting to delete it (#41763) * [C++] Iterator releases its resource immediately when it reads all values (#41824) * [C++][Flight][Benchmark] Ensure waiting server ready (#41793) * [C++] Fix avx2 gather offset larger than 2GB in CompareColumnsToRows (#42188) * [C++][S3] Fix potential deadlock when closing output stream (#41876) * [CI][C++] Clear cache for mamba on AppVeyor (#41977) * [CI][Python][C++] Fix utf8proc detection for wheel on Windows (#42022) * [C++] Support list-views on list_slice (#42067) * [C++] Fix an OTel test failure and remove needless logs (#42122) * [C++][FS][Azure] Ensure setting BlobSasBuilder::Protocol (#42108) * [C++] Support list-view typed arrays in array_take and array_filter (#42117) * [C++] Fix some potential uninitialized variable warnings (#42207) * [C++] Avoid invalid accesses in parquet-encoding-benchmark (#42141) * [C++] Use FetchContent for bundled ORC (#43011) * [C++] Fix GetRecordBatchPayload crashes for device data (#42199) * [C++] Use non-stale c-ares download URL (#42250) * [C++][Parquet] Check for valid ciphertext length to prevent segfault (#43071) * [C++][Compute] Mark KeyCompare.CompareColumnsToRowsLarge as large memory test (#43128) * [C++] Upgrade bundled google-cloud-cpp to 2.22.0 (#43136) ## New Features and Improvements * [C++][Compute] Implement Grouper::Reset (#41352) * [Go][C++] Implement Flight SQL Bulk Ingestion (#38385) * [C++][FS][Azure] Support azure cli auth 
(#41976) * [C++][FS][Azure] Add support for environment credential (#41715) * [C++] Optimize Take for fixed-size types including nested fixed-size lists (#41297) * [C++][Device] Add Copy/View slice functions to a CPU pointer (#41477) * [C++] Add support for OpenTelemetry logging (#39905) * [C++] Import/Export ArrowDeviceArrayStream (#40807) * [C++] move LocalFileSystem to the registry (#40356) * [C++] Make flatbuffers serialization more deterministic (#40392) * [C++][Gandiva] add RE2::Options set_dot_nl(true) for Like function (#40970) * [C++] Introduce portable compiler assumptions (#41021) * [C++] Add a grouper benchmark for preventing performance regression (#41036) * [C++] Support flatten for combining nested list related types (#41092) * [C++] Clean up remaining tasks related to half float casts (#41084) * [C++][FS][Azure] Add support for CopyFile with hierarchical namespace support (#41276) * [C++] Add is_validity_defined_by_bitmap() predicate (#41115) * [C++] IO: enhance boundary checking in CompressedInputStream (#41117) * [C++][Python] Expose recursive flatten for lists on list_flatten kernel function and pyarrow bindings (#41295) * [C++][Parquet][Doc] Denote PARQUET:field_id in parquet.rst (#41187) * [C++] Extract the kernel loops used for PrimitiveTakeExec and generalize to any fixed-width type (#41373) * [C++][Acero] Use per-node basis temp vector stack to mitigate overflow (#41335) * [C++][Parquet] Optimize DelimitRecords by batch execution when max_rep_level > 1 (#41362) * [C++][FS][Azure][Docs] Add AzureFileSystem to Filesystems API reference (#41411) * [C++] Use ASAN to poison temp vector stack memory (#41695) * [C++][S3] Add a new option to check existence before CreateDir (#41822) * [C++][Parquet] Fix DeltaLengthByteArrayEncoder::EstimatedDataEncodedSize (#41546) * [C++] Thirdparty: Upgrade xsimd to 13.0.0 (#41548) * [C++] Improve fixed_width_test_util.h (#41575) * [C++] ChunkResolver: Implement ResolveMany and add unit tests (#41561) * [C++] 
fixed_width_internal.h: Simplify docstring and support bit-sized types (BOOL) (#41597) * [C++][Python] Extends the add_key_value to parquet::arrow and PyArrow (#41633) * [C++][CMake][Windows] Don’t build needless object libraries (#41658) * [C++][Python] PrettyPrint non-cpu data by copying to default CPU device (#42010) * [C++][Parquet] Thrift: generate template method to accelerate reading thrift (#41703) * [C++][Parquet] Minor: moving EncodedStats by default rather than copying (#41727) * [C++][ORC] Ensure setting detected ORC version (#41767) * [C++][Parquet] Add file metadata read/write benchmark (#41761) * [C++] Make git-dependent definitions internal (#41781) * [C++][S3] Remove GetBucketRegion hack for newer AWS SDK versions (#41798) * [C++][Parquet] normalize dictionary encoding to use RLE_DICTIONARY (#41819) * [C++] IPC: Minor enhance the code of writer (#41900) * [C++] Fix ExecuteScalar deduce all_scalar with chunked_array (#41925) * [C++] Minor enhance code style for FixedShapeTensorType (#41954) * [C++] Follow up of adding null_bitmap to MapArray::FromArrays (#41956) * [C++] Misc changes making code around list-like types and list-view types behave the same way (#41971) * [C++] : kernel.cc: Remove defaults on switch so that compiler can check full enum coverage for us (#41995) * [C++][Parquet] ParquetFilePrinter::JSONPrint print length of FLBA (#41981) * [C++][CMake] Add preset for Valgrind (#42110) * [C++] Move TakeXXX free functions into TakeMetaFunction and make them private (#42127) * [C++][FS][Azure] Validate AzureOptions::{blob,dfs}_storage_scheme (#42135) * [C++] list_parent_indices: Add support for list-view types (#42236) * [C++] Reduce the recursion of many-join test (#43042) * [C++] Limit buffer size in BufferedInputStream::SetBufferSize with raw_read_bound (#43064) - Require cmake lz4 for 1.10 - Update to 17.0.0 ## Bug Fixes * [C++][Python] Fix casting to extension type with fixed size list storage type (#42219) * [Python] Include metadata 
when creating pa.schema from PyCapsule (#41538) * [C++][Python] RecordBatch.filter() segfaults if passed a ChunkedArray (#40971) * [Python] pa.array: add check for byte-swapped numpy arrays inside python objects (#41549) * [Python] Fix read_table for encrypted parquet (#39438) * [Python] RunEndEncodedArray.from_arrays: bugfix for Array arguments (#40560) (#41093) * [C++][Python] Map child Array constructed from keys and items shouldn’t have offset (#40871) * [Python] `test_numpy_array_protocol` test failures with numpy 2.0.0rc1 * [Python] Fix StructArray.sort() for by=None (#41495) * [Python] Build with Python 3.13 (#42034) * [Python] remove special methods related to buffers in python <2.6 (#41492) * [Python] Fix reading column index with decimal values (#41503) * [Docs][Python] Remove duplicate contents (#41588) * [C++][Python] Add optional null_bitmap to MapArray::FromArrays (#41757) * [Python][Parquet] Implement to_dict method on SortingColumn (#41704) * [Python] CMake: ignore Parquet encryption option if Parquet itself is not enabled (fix Java integration build) (#41776) * [Python] Disallow direct pa.RecordBatchReader() construction to avoid segfaults (#41773) * [Python] Fix RecordBatchReader.cast to support casting to equal schema for all types (#42098) * [Python] Fix tests when using NumPy 2.0 on Windows (#42099) * [CI][Python] Use pip install -e instead of setup.py build_ext –inplace for installing pyarrow on verification script (#42007) * [CI][Python][C++] Fix utf8proc detection for wheel on Windows (#42022) * [Python][CI] Update expected output for numpy 2.0.0 (#42172) ## New Features and Improvements * [Python] Replace pandas.util.testing.rands with vendored version (#42089) * [Python] begin moving static settings to pyproject.toml (#41041) * [Python] Implement PyCapsule interface for Device data in PyArrow (#40717) * [Python] Expand the Arrow PyCapsule Interface with C Device Data support (#40708) * [Python] Let RecordBatch.filter accept a boolean 
expression in addition to mask array (#43043) * [Python] Fix pickling of LocalFileSystem for cython 2 (#41459) * [Python] Expand the C Device Interface bindings to support import on CUDA device (#40385) * [Python] Allow passing a mapping of column names to rename_columns (#40645) * [Python][Packaging] Strip unnecessary symbols when building wheels (#42028) * [Python][Docs] Update PyArrow installation docs for conda package split (#41135) * [Python] Basic bindings for Device and MemoryManager classes (#41685) * [C++][Python] Expose recursive flatten for lists on list_flatten kernel function and pyarrow bindings (#41295) * [Python][Packaging] Ensure to build with released numpy 2.0 (instead of RC) in the wheel building workflows (#42194) * [CI][Python] Add a job on ARM64 macOS (#41313) * [CI][Python] Reduce CI time on macOS (#41378) * [Python] Expose byte_width and bit_width of ExtensionType in terms of the storage type (#41413) * [Python] Update Python development guide about components being enabled by default based on Arrow C++ (#41705) * [Python] Building PyArrow: enable/disable python components by default based on availability in Arrow C++ (#41494) * [C++][Python] Extends the add_key_value to parquet::arrow and PyArrow (#41633) * [Python] Ensure Buffer methods don’t crash with non-CPU data (#41889) * [C++][Python] PrettyPrint non-cpu data by copying to default CPU device (#42010) * [Python][Parquet] Update BYTE_STREAM_SPLIT description in write_table() docstring (#41759) * [Python] Add support for Pyodide (#37822) * [Python] Fix pandas tests to follow downstream datetime64 unit changes (#41979) * [Python] Allow Array.filter() to take general array input (#42051) * [Python] Expose new FLOAT16 logical type in the pyarrow.parquet bindings (#42103) * [Python] Array gracefully fails on non-cpu device (#42113) * [Python][Parquet] Pyarrow store decimal as integer (#42169) * [Python] Add CI job for Numpy 1.X (#42189) * [CI][Python] Pin openjdk=17 in python substrait 
integration (#43051) - Drop pyarrow-pr41319-numpy2-tests.patch - Add pyarrow-pr433325-extradirs.patch gh#apache/arrow/pull/43325 OBS-URL: https://build.opensuse.org/request/show/1194085 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=34	2024-08-15 09:43:24 +00:00
anag_factory	9c4175a075	Accepting request 1170145 from science - Update to 16.0.0 ## Bug Fixes * [C++][ORC] Catch all ORC exceptions to avoid crash (#40697) * [C++][S3] Handle conventional content-type for directories (#40147) * [C++] Strengthen handling of duplicate slashes in S3, GCS (#40371) * [C++] Avoid hash_mean overflow (#39349) * [C++] Fix spelling (array) (#38963) * [C++][Parquet] Fix crash in Modular Encryption (#39623) * [C++][Dataset] Fix failures in dataset-scanner-benchmark (#39794) * [C++][Device] Fix Importing nested and string types for DeviceArray (#39770) * [C++] Use correct (non-CPU) address of buffer in ExportDeviceArray (#39783) * [C++] Improve error message for "chunker out of sync" condition (#39892) * [C++] Use make -j1 to install bundled bzip2 (#39956) * [C++] DatasetWriter avoid creating zero-sized batch when max_rows_per_file enabled (#39995) * [C++][CI] Disable debug memory pool for ASAN and Valgrind (#39975) * [C++][Gandiva] Make Gandiva's default cache size to be 5000 for object code cache (#40041) * [C++][FS][Azure] Fix CreateDir and DeleteDir trailing slash issues on hierarchical namespace accounts (#40054) * [C++][FS][Azure] Validate containers in AzureFileSystem::Impl::MovePaths() (#40086) * [C++] Decimal types with different precisions and scales bind failed in resolve type when call arithmetic function (#40223) * [C++][Docs] Correct the console emitter link (#40146) * [C++][Python] Fix test_gdb failures on 32-bit (#40293) * [Python][C++] Fix large file handling on 32-bit Python build (#40176) * [C++] Support glog 0.7 build (#40230) * [C++] Fix cast function bind failed after add an alias name through AddAlias (#40200) * [C++] TakeCC: Concatenate only once and delegate to TakeAA instead of TakeCA (#40206) * [C++] Fix an abort on asof_join_benchmark run for lost an arg (#40234) * [C++] Fix an simple buffer-overflow case in decimal_benchmark (#40277) * [C++] Reduce S3Client initialization time (#40299) * [C++] Fix a wrong 
total_bytes to generate StringType's test data in vector_hash_benchmark (#40307) * [C++][Gandiva] Add support for compute module's decimal promotion rules (#40434) * [C++][Parquet] Add missing config.h include in key_management_test.cc (#40330) * [C++][CMake] Add missing glog::glog dependency to arrow_util (#40332) * [C++][Gandiva] Add missing OpenSSL dependency to encrypt_utils_test.cc (#40338) * [C++] Remove const qualifier from Buffer::mutable_span_as (#40367) * [C++] Avoid simplifying expressions which call impure functions (#40396) * [C++] Expose protobuf dependency if opentelemetry or ORC are enabled (#40399) * [C++][FlightRPC] Add missing expiration_time arguments (#40425) * [C++] Move key_hash/key_map/light_array related files to internal for prevent using by users (#40484) * [C++] Add missing Threads::Threads dependency to arrow_static (#40433) * [C++] Fix static build on Windows (#40446) * [C++] Ensure using bundled FlatBuffers (#40519) * [C++][CI] Fix TSAN and ASAN/UBSAN crashes (#40559) * [C++] Repair FileSystem merge error (#40564) * [C++] Fix 3.12 Python support (#40322) * [C++] Move mold linker flags to variables (#40603) * [C++] Enlarge dest buffer according to dest offset for CopyBitmap benchmark (#40769) * [C++][Gandiva] 'ilike' function does not work (#40728) * [C++] Fix protobuf package name setting for builds with substrait (#40753) * [C++][ORC] Fix std::filesystem related link error with ORC 2.0.0 or later (#41023) * [C++] Fix TSAN link error for module library (#40864) * [C++][FS][Azure] Don't run TestGetFileInfoGenerator() with Valgrind (#41163) * [C++] Fix null count check in BooleanArray.true_count() (#41070) * [C++] IO: fixing compiling in gcc 7.5.0 (#41025) * [C++][Parquet] Bugfixes and more tests in boolean arrow decoding (#41037) * [C++] formatting.h: Make sure space is allocated for the 'Z' when formatting timestamps (#41045) * [C++] Ignore ARROW_USE_MOLD/ARROW_USE_LLD with clang < 12 (#41062) * [C++] Fix: left anti join filter empty 
rows. (#41122) * [CI][C++] Don't use CMake 3.29.1 with vcpkg (#41151) * [CI][C++] Use newer LLVM on Ubuntu 24.04 (#41150) * [CI][R][C++] test-r-linux-valgrind has started failing * [C++][Python] Sporadic asof_join failures in PyArrow * [C++] Fix Valgrind error in string-to-float16 conversion (#41155) * [C++] Stop defining ARROW_TEST_MEMCHECK in config.h.cmake (#41177) * [C++] Fix mistake in integration test. Explicitly cast std::string to avoid compiler interpreting char* -> bool (#41202) ## New Features and Improvements * [C++] Filesystem implementation for Azure Blob Storage * [C++] Implement cast to/from halffloat (#40067) * [C++] Add residual filter support to swiss join (#39487) * [C++] Add support for building with Emscripten (#37821) * [C++][Python] Add missing methods to RecordBatch (#39506) * [C++][Java][Flight RPC] Add Session management messages (#34817) * [C++] build filesystems as separate modules (#39067) * [C++][Parquet] Rewrite BYTE_STREAM_SPLIT SSE optimizations using xsimd (#40335) * [C++] Add support for service-specific endpoint for S3 using AWS_ENDPOINT_URL_S3 (#39160) * [C++][FS][Azure] Implement DeleteFile() (#39840) * [C++] Implement Azure FileSystem Move() via Azure DataLake Storage Gen 2 API (#39904) * [C++] Add ImportChunkedArray and ExportChunkedArray to/from ArrowArrayStream (#39455) * [CI][C++][Go] Don't run jobs that use a self-hosted GitHub Actions Runner on fork (#39903) * [C++][FS][Azure] Use the generic filesystem tests (#40567) * [C++][Compute] Add binary_slice kernel for fixed size binary (#39245) * [C++] Avoid creating memory manager instance for every buffer view/copy (#39271) * [C++][Parquet] Minor: Style enhancement for parquet::FileMetaData (#39337) * [C++] IO: Reuse same buffer in CompressedInputStream (#39807) * [C++] Use more permissable return code for rename (#39481) * [C++][Parquet] Use std::count in ColumnReader ReadLevels (#39397) * [C++] Support cast kernel from large string, (large) binary to dictionary (#40017) * 
[C++] Pass -jN to make in external projects (#39550) * [C++][Parquet] Add integration test for BYTE_STREAM_SPLIT (#39570) * [C++] Ensure top-level benchmarks present informative metrics (#40091) * [C++] Ensure CSV and JSON benchmarks present a bytes/s or items/s metric (#39764) * [C++] Ensure dataset benchmarks present a bytes/s or items/s metric (#39766) * [C++][Gandiva] Ensure Gandiva benchmarks present a bytes/s or items/s metric (#40435) * [C++][Parquet] Benchmark levels decoding (#39705) * [C++][FS][Azure] Remove StatusFromErrorResponse as it's not necessary (#39719) * [C++][Parquet] Make BYTE_STREAM_SPLIT routines type-agnostic (#39748) * [C++][Device] Generic CopyBatchTo/CopyArrayTo memory types (#39772) * [C++] Document and micro-optimize ChunkResolver::Resolve() (#39817) * [C++] Allow building cpp/src/arrow/*/.cc without waiting bundled libraries (#39824) * [C++][Parquet] Parquet binary length overflow exception should contain the length of binary (#39844) * [C++][Parquet] Minor: avoid creating a new Reader object in Decoder::SetData (#39847) * [C++] Thirdparty: Bump google benchmark to 1.8.3 (#39878) * [C++] DataType::ToString support optionally show metadata (#39888) * [C++][Gandiva] Accept LLVM 18 (#39934) * [C++] Use Requires instead of Libs for system RE2 in arrow.pc (#39932) * [C++] Small CSV reader refactoring (#39963) * [C++][Parquet] Expand BYTE_STREAM_SPLIT to support FIXED_LEN_BYTE_ARRAY, INT32 and INT64 (#40094) * [C++][FS][Azure] Add support for reading user defined metadata (#40671) * [C++][FS][Azure] Add AzureFileSystem support to FileSystemFromUri() (#40325) * [C++][FS][Azure] Make attempted reads and writes against directories fail fast (#40119) * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor (#40064) * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - add support for different data types (#40359) * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - add option to cast NULL to NaN (#40803) * 
[C++][FS][Azure] Implement DeleteFile() for flat-namespace storage accounts (#40075) * [CI][C++] Add a job on ARM64 macOS (#40456) * [C++][Parquet] Remove AVX512 variants of BYTE_STREAM_SPLIT encoding (#40127) * [C++][Parquet][Tools] Print FIXED_LEN_BYTE_ARRAY length (#40132) * [C++] Make S3 narrative test more flexible (#40144) * [C++] Remove redundant invocation of BatchesFromTable (#40173) * [C++][CMake] Use "RapidJSON" CMake target for RapidJSON (#40210) * [C++][CMake] Use arrow/util/config.h.cmake instead of add_definitions() (#40222) * [C++] Fix: improve the backpressure handling in the dataset writer (#40722) * [C++][CMake] Improve description why we need to initialize AWS C++ SDK in arrow-s3fs-test (#40229) * [C++] Add support for system glog 0.7 (#40275) * [C++] Specialize ResolvedChunk::Value on value-specific types instead of entire class (#40281) * [C++][Docs] Add documentation of array factories (#40373) * [C++][Parquet] Allow use of FileDecryptionProperties after the CryptoFactory is destroyed (#40329) * [FlightRPC][C++][Java][Go] Add URI scheme to reuse connection (#40084) * [C++] Add benchmark for ToTensor conversions (#40358) * [C++] Define ARROW_FORCE_INLINE for non-MSVC builds (#40372) * [C++] Add support for mold (#40397) * [C++] Add support for LLD (#40927) * [C++] Produce better error message when Move is attempted on flat-namespace accounts (#40406) * [C++][ORC] Upgrade ORC to 2.0.0 (#40508) * [CI][C++] Don't install FlatBuffers (#40541) * [C++] Ensure pkg-config flags include -ldl for static builds (#40578) * [Dev][C++][Python][R] Use pre-commit for clang-format (#40587) * [C++] Rename Function::is_impure() to is_pure() (#40608) * [C++] Add missing util/config.h in arrow/io/compressed_test.cc (#40625) * [Python][C++] Support conversion of pyarrow.RunEndEncodedArray to numpy/pandas (#40661) * [C++] Expand Substrait type support (#40696) * [C++] Create registry for Devices to map DeviceType to MemoryManager in C Device Data import (#40699) * 
[C++][Parquet] Minor enhancement code of encryption (#40732) * [C++][Parquet] Simplify PageWriter and ColumnWriter creation (#40768) * [C++] Re-order loads and stores in MemoryPoolStats update (#40647) * [C++] Revert changes from PR #40857 (#40980) * [C++] Correctly report asimd/neon in GetRuntimeInfo (#40857) * [C++] Thirdparty: bump zstd to 1.5.6 (#40837) * [Docs][C++][Python] Add initial documentation for RecordBatch::Tensor conversion (#40842) * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - add support for row-major (#40867) * [C++][Parquet] Encoding: Optimize DecodeArrow/Decode(bitmap) for PlainBooleanDecoder (#40876) * [C++] Suppress shorten-64-to-32 warnings in CUDA/Skyhook codes (#40883) * [C++] Fix unused function build error (#40984) * [C++][Parquet] RleBooleanDecoder supports DecodeArrow with nulls (#40995) * [C++][FS][Azure] Adjust DeleteDir/DeleteDirContents/GetFileInfoSelector behaviors against Azure for generic filesystem tests (#41068) * [C++][Parquet] Avoid allocating buffer object in RecordReader's SkipRecords (#39818) - Drop apache-arrow-pr40230-glog-0.7.patch - Drop apache-arrow-pr40275-glog-0.7-2.patch - Belated inclusion of submission without changelog by Shani Hadiyanto <shanipribadi@gmail.com>) * disable static devel packages by default: The CMake targets require them for all builds, if not disabled * Add subpackages for Apache Arrow Flight and Flight SQL - Update to 16.0.0 * [Python] construct pandas.DataFrame with public API in to_pandas (#40897) * [Python] Fix ORC test segfault in the python wheel windows test (#40609) * [Python] Attach Python stacktrace to errors in ConvertPyError (#39380) * [Python] Plug reference leaks when creating Arrow array from Python list of dicts (#40412) * [Python] Empty slicing an array backwards beyond the start is now empty (#40682) * [Python] Slicing an array backwards beyond the start now includes first item. 
(#39240) * [Python] Calling pyarrow.dataset.ParquetFileFormat.make_write_options as a class method results in a segfault (#40976) * [Python] Fix parquet import in encryption test (#40505) * [Python] fix raising ValueError on _ensure_partitioning (#39593) * [Python] Validate max_chunksize in Table.to_batches (#39796) * [C++][Python] Fix test_gdb failures on 32-bit (#40293) * [Python] Make Tensor.__getbuffer__ work on 32-bit platforms (#40294) * [Python] Avoid using np.take in Array.to_numpy() (#40295) * [Python][C++] Fix large file handling on 32-bit Python build (#40176) * [Python] Update size assumptions for 32-bit platforms (#40165) * [Python] Fix OverflowError in foreign_buffer on 32-bit platforms (#40158) * [Python] Add Type_FIXED_SIZE_LIST to _NESTED_TYPES set (#40172) * [Python] Mark ListView as a nested type (#40265) * [Python] only allocate the ScalarMemoTable when used (#40565) * [Python] Error compiling Cython files on Windows during release verification * [Python] Fix flake8 failures in python/benchmarks/parquet.py (#40440) * [Python] Suppress python/examples/minimal_build/Dockerfile.* warnings (#40444) * [Python][Docs] Add workaround for autosummary (#40739) * [Python] BUG: Empty slicing an array backwards beyond the start should be empty * [CI][Python] Activate ARROW_PYTHON_VENV if defined in sdist-test job (#40707) * [CI][Python] CI failures on Python builds due to pytest_cython (#40975) * [Python] ListView pandas tests should use np.nan instead of None (#41040) * [C++][Python] Sporadic asof_join failures in PyArrow ## New Features and Improvements * [Python][CI] Remove legacy hdfs tests from hdfs and hypothesis setup (#40363) * [Python] Remove deprecated pyarrow.filesystem legacy implementations (#39825) * [C++][Python] Add missing methods to RecordBatch (#39506) * [Python][CI] Support ORC in Windows wheels * [Python] Correct test marker for join_asof tests (#40666) * [Python] Add join_asof binding (#34234) * [Python] Add a function to download and 
extract timezone database on Windows (#38179) * [Python][CI][Packaging] Enable ORC on Windows Appveyor CI and Windows wheels for pyarrow * [Python] Add a FixedSizeTensorScalar class (#37533) * [Python][CI][Dev][Python] Release and merge script errors (#37819)" (#40150) * [Python] Construct pyarrow.Field and ChunkedArray through Arrow PyCapsule Protocol (#40818) * [Python] Fix missing byte_width attribute on DataType class (#39592) * [Python] Compatibility with NumPy 2.0 * [Packaging][Python] Enable building pyarrow against numpy 2.0 (#39557) * [Python] Basic pyarrow bindings for Binary/StringView classes (#39652) * [Python] Expose force_virtual_addressing in PyArrow (#39819) * [Python][Parquet] Support hashing for FileMetaData and ParquetSchema (#39781) * [Python] Add bindings for ListView and LargeListView (#39813) * [Python][Packaging] Build pyarrow wheels with numpy RC instead of nightly (#41097) * [Python] Support creating Binary/StringView arrays from python objects (#39853) * [Python] ListView support for pa.array() (#40160) * [Python][CI] Remove upper pin on pytest (#40487) * [Python][FS][Azure] Minimal Python bindings for AzureFileSystem (#40021) * [Python] Low-level bindings for exporting/importing the C Device Interface (#39980) * [Python] Add ChunkedArray import/export to/from C (#39985) * [Python] Use Cast() instead of CastTo (#40116) * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor (#40064) * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - add support for different data types (#40359) * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - add option to cast NULL to NaN (#40803) * [Python] Support requested_schema in __arrow_c_stream__() (#40070) * [Python] Support Binary/StringView conversion to numpy/pandas (#40093) * [Python] Allow FileInfo instances to be passed to dataset init (#40143) * [Python][CI] Add 32-bit Debian build on Crossbow (#40164) * [Python] ListView arrow-to-pandas conversion (#40482) * 
[Python][CI] Disable generating C lines in Cython tracebacks (#40225) * [Python] Support construction of Run-End Encoded arrays in pa.array(..) (#40341) * [Python] Accept dict in pyarrow.record_batch() function (#40292) * [Python] Update for NumPy 2.0 ABI change in PyArray_Descr->elsize (#40418) * [Python][CI] Fix install of nightly dask in integration tests (#40378) * [Python] Fix byte_width for binary(0) + fix hypothesis tests (#40381) * [Python][CI] Fix dataset partition filter tests with pandas nightly (#40429) * [Docs][Python] Added JsonFileFormat to docs (#40585) * [Dev][C++][Python][R] Use pre-commit for clang-format (#40587) * [Python][C++] Support conversion of pyarrow.RunEndEncodedArray to numpy/pandas (#40661) * [Python] Simplify and improve perf of creation of the column names in Table.to_pandas (#40721) * [Docs][C++][Python] Add initial documentation for RecordBatch::Tensor conversion (#40842) * [C++][Python] Basic conversion of RecordBatch to Arrow Tensor - add support for row-major (#40867) * [CI][Python] check message in test_make_write_options_error for Cython 2 (#41059) * [Python] Add copy keyword in Array.array for numpy 2.0+ compatibility (#41071) * [Python][Packaging] PyArrow wheel building is failing because of disabled vcpkg install of liblzma - Drop apache-arrow-pr40230-glog-0.7.patch - Drop apache-arrow-pr40275-glog-0.7-2.patch - Add pyarrow-pr41319-numpy2-tests.patch gh#apache/arrow#41319 OBS-URL: https://build.opensuse.org/request/show/1170145 OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/apache-arrow?expand=0&rev=12	2024-04-25 18:50:23 +00:00
bnavigator	fc3315cd8b	OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=32	2024-04-25 13:14:01 +00:00
bnavigator	d947cb7cd2	OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=31	2024-04-25 09:12:59 +00:00
bnavigator	c159005cc1	Accepting request 1170120 from home:bnavigator:numpy - Update to 16.0.0 ## Bug Fixes * [C++][ORC] Catch all ORC exceptions to avoid crash (#40697) * [C++][S3] Handle conventional content-type for directories (#40147) * [C++] Strengthen handling of duplicate slashes in S3, GCS (#40371) * [C++] Avoid hash_mean overflow (#39349) * [C++] Fix spelling (array) (#38963) * [C++][Parquet] Fix crash in Modular Encryption (#39623) * [C++][Dataset] Fix failures in dataset-scanner-benchmark (#39794) * [C++][Device] Fix Importing nested and string types for DeviceArray (#39770) * [C++] Use correct (non-CPU) address of buffer in ExportDeviceArray (#39783) * [C++] Improve error message for "chunker out of sync" condition (#39892) * [C++] Use make -j1 to install bundled bzip2 (#39956) * [C++] DatasetWriter avoid creating zero-sized batch when max_rows_per_file enabled (#39995) * [C++][CI] Disable debug memory pool for ASAN and Valgrind (#39975) * [C++][Gandiva] Make Gandiva's default cache size to be 5000 for object code cache (#40041) * [C++][FS][Azure] Fix CreateDir and DeleteDir trailing slash issues on hierarchical namespace accounts (#40054) * [C++][FS][Azure] Validate containers in AzureFileSystem::Impl::MovePaths() (#40086) * [C++] Decimal types with different precisions and scales bind OBS-URL: https://build.opensuse.org/request/show/1170120 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=30	2024-04-25 09:07:39 +00:00
bnavigator	525207619b	Accepting request 1163690 from home:shanipribadi I would like to have apache flight and apache flight sql library built. also disabling the static build because the generated CMake Targets includes them, making builds against libarrow requiring not just apache-arrow-devel but also all of the devel-static packages. note: flight and flight-sql are packaged separately. in upstream rpm and fedora repo, flight-sql is included in libarrow-flight-libs. OBS-URL: https://build.opensuse.org/request/show/1163690 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=29	2024-03-30 15:01:58 +00:00
anag_factory	b132f0a6a2	Accepting request 1160967 from science - Update to 15.0.2 ## Bug Fixes * [C++][Acero] Increase size of Acero TempStack (#40007) * [C++][Dataset] Add missing Protobuf static link dependency (#40015) * [C++] Possible data race when reading metadata of a parquet file (#40111) * [C++] Make span SFINAE standards-conforming to enable compilation with nvcc (#40253) - Update to 15.0.2 ## Bug Fixes * [Python] Fix except clauses (#40387) * [Python][CI] Skip failing test_dateutil_tzinfo_to_string (#40486) (forwarded request 1160966 from bnavigator) OBS-URL: https://build.opensuse.org/request/show/1160967 OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/apache-arrow?expand=0&rev=11	2024-03-25 20:09:02 +00:00
bnavigator	8d99637b3c	Accepting request 1160966 from home:bnavigator:branches:science - Update to 15.0.2 ## Bug Fixes * [C++][Acero] Increase size of Acero TempStack (#40007) * [C++][Dataset] Add missing Protobuf static link dependency (#40015) * [C++] Possible data race when reading metadata of a parquet file (#40111) * [C++] Make span SFINAE standards-conforming to enable compilation with nvcc (#40253) - Update to 15.0.2 ## Bug Fixes * [Python] Fix except clauses (#40387) * [Python][CI] Skip failing test_dateutil_tzinfo_to_string (#40486) OBS-URL: https://build.opensuse.org/request/show/1160966 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=27	2024-03-23 16:14:18 +00:00
dimstar_suse	74e375d960	Accepting request 1152982 from science OBS-URL: https://build.opensuse.org/request/show/1152982 OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/apache-arrow?expand=0&rev=10	2024-03-01 22:36:05 +00:00
bnavigator	f4b994c8a2	Accepting request 1152980 from home:bnavigator:branches:science - Reenable logging * Add apache-arrow-pr40230-glog-0.7.patch * Add apache-arrow-pr40275-glog-0.7-2.patch * now requires glog devel files to be present for apache-arrow-devel; ArrowConfig.cmake fails otherwise * gh#apache/arrow#40181 * gh#apache/arrow#40230 * gh#apache/arrow#40275 - Move d:l:p:n/python-pyarrow to the science/apache-arrow as multibuild package: Uses the same source and is tightly connected. OBS-URL: https://build.opensuse.org/request/show/1152980 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=25	2024-02-28 16:27:53 +00:00
anag_factory	d8d03abd38	Accepting request 1150089 from science OBS-URL: https://build.opensuse.org/request/show/1150089 OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/apache-arrow?expand=0&rev=9	2024-02-25 13:06:15 +00:00
bnavigator	b029d62e8c	Accepting request 1150081 from home:bnavigator:branches:science - Update to 15.0.1 ## Bug Fixes * [C++] "iso_calendar" kernel returns incorrect results for array length > 32 (#39360) * [C++] Explicit error in ExecBatchBuilder when appending var length data exceeds offset limit (int32 max) (#39383) * [C++][Parquet] Pass memory pool to decoders (#39526) * [C++][Parquet] Validate page sizes before truncating to int32 (#39528) * [C++] Fix tail-word access cross buffer boundary in `CompareBinaryColumnToRow` (#39606) * [C++] Fix the issue of ExecBatchBuilder when appending consecutive tail rows with the same id may exceed buffer boundary (for fixed size types) (#39585) * [Release] Update platform tags for macOS wheels to macosx_10_15 (#39657) * [C++][FlightRPC] Fix nullptr dereference in PollInfo (#39711) * [C++] Fix tail-byte access cross buffer boundary in key hash avx2 (#39800) * [C++][Acero] Fix AsOfJoin with differently ordered schemas than the output (#39804) * [C++] Expression ExecuteScalarExpression execute empty args function with a wrong result (#39908) * [C++] Strip extension metadata when importing a registered extension (#39866) * [C#] Restore support for .NET 4.6.2 (#40008) * [C++] Fix out-of-line data size calculation in BinaryViewBuilder::AppendArraySlice (#39994) * [C++][CI][Parquet] Fixing parquet column_writer_test building (#40175) ## New Features and Improvements * [C++] PollFlightInfo does not follow rule of 5 * [C++] Fix filter and take kernel for month_day_nano intervals (#39795) * [C++] Thirdparty: Bump zlib to 1.3.1 (#39877) * [C++] Add missing "#include <algorithm>" (#40010) - Release 15.0.0 ## Bug Fixes * [C++] Bring back case_when tests for union types (#39308) * [C++] Fix the issue of ExecBatchBuilder when appending consecutive tail rows with the same id may exceed buffer boundary (#39234) * [C++][Python] Add a no-op kernel for dictionary_encode(dictionary) (#38349) * [C++] Use the latest tagged version of flatbuffers 
(#38192) * [C++] Don't use MSVC_VERSION to determin -fms-compatibility-version (#36595) * [C++] Optimize hash kernels for Dictionary ChunkedArrays (#38394) * [C++][Gandiva] Avoid registering exported functions multiple times in gandiva (#37752) * [C++][Acero] Fix race condition caused by straggling input in the as-of-join node (#37839) * [C++][Parquet] add more closed file checks for ParquetFileWriter (#38390) * [C++][FlightRPC] Add missing app_metadata arguments (#38231) * [C++][Parquet] Fix Valgrind memory leak in arrow-dataset-file-parquet-encryption-test (#38306) * [C++][Parquet] Don't initialize OpenSSL explicitly with OpenSSL 1.1 (#38379) * [C++] Re-generate flatbuffers C++ for Skyhook (#38405) * [C++] Avoid passing null pointer to LZ4 frame decompressor (#39125) * [C++] Add missing explicit size_t cast for i386 (#38557) * [C++] Fix: add TestingEqualOptions for gtest functions. (#38642) * [C++][Gandiva] Use arrow io util to replace std::filesystem::path in gandiva (#38698) * [C++] Protect against PREALLOCATE preprocessor defined on macOS (#38760) * [C++] Check variadic buffer counts in bounds (#38740) * [C++][FS][Azure] Do nothing for CreateDir("/container", true) (#38783) * Fix TestArrowReaderAdHoc.ReadFloat16Files to use new uncompressed files (#38825) * [C++] S3FileSystem export s3 sdk config "use_virtual_addressing" to arrow::fs::S3Options (#38858) * [C++][Gandiva] Fix Gandiva to_date function's validation for supress errors parameter (#38987) * [C++][Parquet] Fix spelling (#38959) * [C++] Fix spelling (acero) (#38961) * [C++] Fix spelling (compute) (#38965) * [C++] Fix spelling (util) (#38967) * [C++] Fix spelling (dataset) (#38969) * [C++] Fix spelling (filesystem) (#38972) * [C++] Fix spelling (#38978) * [C++] Fix spelling (#38980) * [C++][Acero] union node output batches should be unordered (#39046) * [C++][CI] Fix Valgrind failures (#39127) * [C++] Remove needless system Protobuf dependency with -DARROW_HDFS=ON (#39137) * [C++][Compute] Fix negative 
duration division (#39158) * [C++] Add missing data copy in StreamDecoder::Consume(data) (#39164) * [C++] Remove compiler warnings with -Wconversion -Wno-sign-conversion in public headers (#39186) * [C++][Benchmarking] Remove hardcoded min times (#39307) * [C++] Don't use "if constexpr" in lambda (#39334) * [C++] Disable -Werror=attributes for Azure SDK's identity.hpp (#39448) * [C++] Fix compile warning (#39389) * [CI][JS] Force node 20 on JS build on arm64 to fix build issues (#39499) * [C++] Disable parallelism for jemalloc external project (#39522) * [C++][Parquet] Fix crash in test_parquet_dataset_lazy_filtering (#39632) * [C++] Disable parallelism for all `make`-based externalProjects when CMake >= 3.28 is used ## New Features and Improvements * [C++][JSON] Change the max rows to Unlimited(int_32) (#38582) * [C++][Python] Add "Z" to the end of timestamp print string when tz defined (#39272) * [C++][Python] DLPack implementation for Arrow Arrays (producer) (#38472) * [C++] Diffing of Run-End Encoded arrays (#35003) * [C++][Python][R] Allow users to adjust S3 log level by environment variable (#38267) * [C++][Format] Implementation of the LIST_VIEW and LARGE_LIST_VIEW array formats (#35345) * [C++] Use Cast() instead of CastTo() for Scalar in test (#39044) * [C++][Python][Parquet] Implement Float16 logical type (#36073) * [C++] Add Utf8View and BinaryView to the c ABI (#38443) * [C++][Parquet] Add api to get RecordReader from RowGroupReader (#37003) * [C++] Expose a span converter for Buffer and ArraySpan (#38027) * [C++] Add A Dictionary Compaction Function For DictionaryArray (#37418) * [C++] Add arrow::ipc::StreamDecoder::Reset() (#37970) * [C++] Implement file reads for Azure filesystem (#38269) * [C++][Integration] Add C++ Utf8View implementation (#37792) * [C++][Gandiva] Add external function registry support (#38116) * [C++][Gandiva] Migrate LLVM JIT engine from MCJIT to ORC v2/LLJIT (#39098) * [C++] Feature: support concatenate recordbatches. 
(#37896) * [C++] Add support for specifying custom Array opening and closing delimiters to arrow::PrettyPrintDelimiters (#38187) * [R] Allow code() to return package name prefix. (#38144) * [C++][Benchmark] Add non-stream Codec Compression/Decompression (#38067) * [C++][Parquet] Change DictEncoder dtor checking to warning log (#38118) * [C++][Parquet] Support reading parquet files with multiple gzip members (#38272) * [C++][Parquet] check the decompressed page size same as size in page header (#38327) * [C++][Azure] Use properties for input stream metadata (#38524) * [C++][FS][Azure] Implement file writes (#38780) * [C++] Implement GetFileInfo for a single file in Azure filesystem (#38505) * [C++][CMake] Use transitive dependency for system GoogleTest (#38340) * [C++][Parquet] Use new encrypted files for page index encryption test (#38347) * Add validation logic for offsets and values to arrow.array.ListArray.fromArrays (#38531) * [C++][Acero] Create a sorted merge node (#38380) * [C++][Benchmark] Adding benchmark for LZ4/Snappy Compression (#38453) * [C++] Support LogicalNullCount for DictionaryArray (#38681) * [C++][Parquet] Faster scalar BYTE_STREAM_SPLIT (#38529) * [C++][Gandiva] Support registering external C functions (#38632) * [C++] Implement GetFileInfo(selector) for Azure filesystem (#39009) * [C++][FS][Azure] Implement CreateDir() (#38708) * [C++][FS][Azure] Implement DeleteDir() (#38793) * [C++][FS][Azure] Implement DeleteDirContents() (#38888) * [C++] : Implement AzureFileSystem::DeleteRootDirContents (#39151) * [C++][FS][Azure] Implement CopyFile() (#39058) * [C++][Go][Parquet] Add tests for reading Float16 files in parquet-testing (#38753) * [C++][FS][Azure] Rename AzurePath to AzureLocation (#38773) * [C++] Implement directory semantics even when the storage account doesn't support HNS (#39361) * [C++][Parquet] Update parquet.thrift to sync with 2.10.0 (#38815) * [C++] Replace "#ifdef ARROW_WITH_GZIP" in dataset test to ARROW_WITH_ZLIB (#38853) * 
[C++][Parquet] Using length to optimize bloom filter read (#38863) * [C++][Parquet] Minor: making parquet TypedComparator operation as const method (#38875) * [C++] DatasetWriter release rows_in_flight_throttle when allocate writing failed (#38885) * [C++][Parquet] Move EstimatedBufferedValueBytes from TypedColumnWriter to ColumnWriter (#39055) * [C++] Stop installing internal bpacking_simd* headers (#38908) * [C++][Gandiva] Refactor function holder to return arrow Result (#38873) * [C++] Use Cast() instead of CastTo() for Dictionary Scalar in test (#39362) * [C++] Use Cast() instead of CastTo() for Timestamp Scalar in test (#39060) * [C++] Use Cast() instead of CastTo() for List Scalar in test (#39353) * [C++][Parquet] Support row group filtering for nested paths for struct fields (#39065) * [C++] Refactor the Azure FS tests and filesystem class instantiation (#39207) * [C++][Parquet] Optimize FLBA record reader (#39124) * Create module info compiler plugin (#39135) * [C++] : Try to make Buffer::device_type_ non-optional (#39150) * [C++][Parquet] Remove deprecated AppendRowGroup(int64_t num_rows) (#39209) * [C++][Parquet] Avoid WriteRecordBatch from produce zero-sized RowGroup (#39211) * [C++] Support binary to fixed_size_binary cast (#39236) * [C++][Azure][FS] Add default credential auth configuration (#39263) * [C++] Don't install bundled Azure SDK for C++ with CMake 3.28+ (#39269) * [C++][FS] : Remove the AzureBackend enum and add more flexible connection options (#39293) * [C++][FS] : Inform caller of container not-existing when checking for HNS support (#39298) * [C++][FS][Azure] Add workload identity auth configuration (#39319) * [C++][FS][Azure] Add managed identity auth configuration (#39321) * [C++] Forward arguments to ExceptionToStatus all the way to Status::FromArgs (#39323) * [C++] Flaky DatasetWriterTestFixture.MaxRowsOneWriteBackpresure test (#39379) * [C++] Add ForceCachedHierarchicalNamespaceSupport to help with testing (#39340) * [C++][FS][Azure] 
Add client secret auth configuration (#39346) * [C++] Reduce function.h includes (#39312) * [C++] Use Cast() instead of CastTo() for Parquet (#39364) * [C++][Parquet] Vectorize decode plain on FLBA (#39414) * [C++][Parquet] Style: Using arrow::Buffer data_as api rather than reinterpret_cast (#39420) * [C++][ORC] Upgrade ORC to 1.9.2 (#39431) * [C++] Use default Azure credentials implicitly and support anonymous credentials explicitly (#39450) * [C++][Parquet] Allow reading dictionary without reading data via ByteArrayDictionaryRecordReader (#39153) - Disable logging until compatibility with glog is restored gh#apache/arrow#40181 OBS-URL: https://build.opensuse.org/request/show/1150081 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=23	2024-02-24 09:07:04 +00:00
anag_factory	78e62c5074	Accepting request 1139093 from science OBS-URL: https://build.opensuse.org/request/show/1139093 OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/apache-arrow?expand=0&rev=8	2024-01-16 20:38:38 +00:00
bnavigator	40e5983a49	Accepting request 1139092 from home:bnavigator:branches:science - Update to 14.0.2 ## New Features and Improvements * GH-38449 - [Release][Go][macOS] Use local test data if possible (#38450) * GH-38591 - [Parquet][C++] Remove redundant open calls in ParquetFileFormat::GetReaderAsync (#38621) ## Bug Fixes * GH-38345 - [Release] Use local test data for verification if possible (#38362) * GH-38438 - [C++] Dataset: Trying to fix the async bug in Parquet dataset (#38466) * GH-38577 - Reading parquet file behavior change from 13.0.0 to 14.0.0 * GH-38618 - [C++] S3FileSystem: fix regression in deleting explicitly created sub-directories (#38845) * GH-38861 - [C++] Add missing “-framework Security” to Libs.private in arrow.pc (#38869) * GH-39072 - [Release][CI] Python3.11-devel is required for the verification job on AlmaLinux 8 (#39073) * GH-39074 - [Release][Packaging] Use UTF-8 explicitly for KEYS (#39082) OBS-URL: https://build.opensuse.org/request/show/1139092 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=21	2024-01-16 09:00:47 +00:00
anag_factory	5938d9209e	Accepting request 1138300 from science OBS-URL: https://build.opensuse.org/request/show/1138300 OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/apache-arrow?expand=0&rev=7	2024-01-12 22:46:14 +00:00
bnavigator	6b4b71e17d	Accepting request 1138181 from home:pgajdos - disable some tests for s390x [bsc#1218592] OBS-URL: https://build.opensuse.org/request/show/1138181 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=19	2024-01-12 11:03:12 +00:00
anag_factory	39fe80b539	Accepting request 1125775 from science OBS-URL: https://build.opensuse.org/request/show/1125775 OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/apache-arrow?expand=0&rev=6	2023-11-14 20:42:29 +00:00
John Vandenberg	59b113ad72	Accepting request 1125774 from home:mimi_vx:branches:science - update 14.0.1 * GH-38431 - [Python][CI] Update fs.type_name checks for s3fs tests * GH-38607 - [Python] Disable PyExtensionType autoload - update to 14.0.1 * very long list of changes can be found here: https://arrow.apache.org/release/14.0.0.html OBS-URL: https://build.opensuse.org/request/show/1125774 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=17	2023-11-14 01:23:03 +00:00
anag_factory	110bca2ab1	Accepting request 1109686 from science OBS-URL: https://build.opensuse.org/request/show/1109686 OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/apache-arrow?expand=0&rev=5	2023-09-08 19:16:00 +00:00
bnavigator	0d83feb674	Accepting request 1109685 from home:bnavigator:branches:devel:languages:python:numeric - Update to 13.0.0 ## Acero * Handling of unaligned buffers in input nodes can be configured programmatically or by setting the environment variable ACERO_ALIGNMENT_HANDLING. The default behavior is to warn when an unaligned buffer is detected GH-35498. ## Compute * Several new functions have been added: - aggregate functions “first”, “last”, “first_last” GH-34911; - vector functions “cumulative_prod”, “cumulative_min”, “cumulative_max” GH-32190; - vector function “pairwise_diff” GH-35786. * Sorting now works on dictionary arrays, with much better performance than the naive approach of sorting the decoded dictionary GH-29887. Sorting also works on struct arrays, and nested sort keys are supported using FieldRef GH-33206. * The check_overflow option has been removed from CumulativeSumOptions as it was redundant with the availability of two different functions: “cumulative_sum” and “cumulative_sum_checked” GH-35789. * Run-end encoded filters are efficiently supported GH-35749. * Duration types are supported with the “is_in” and “index_in” functions GH-36047. They can be multiplied with all integer types GH-36128. * “is_in” and “index_in” now cast their inputs more flexibly: they first attempt to cast the value set to the input type, then in the other direction if the former fails GH-36203. * Multiple bugs have been fixed in “utf8_slice_codeunits” when the stop option is omitted GH-36311. ## Dataset * A custom schema can now be passed when writing a dataset GH-35730. The custom schema can alter nullability or metadata information, but is not allowed to change the datatypes written. ## Filesystems * The S3 filesystem now writes files in equal-sized chunks, for compatibility with Cloudflare’s “R2” Storage GH-34363. 
* A long-standing issue where S3 support could crash at shutdown because of resources still being alive after S3 finalization has been fixed GH-36346. Now, attempts to use S3 resources (such as making filesystem calls) after S3 finalization should result in a clean error. * The GCS filesystem accepts a new option to set the project id GH-36227. ## IPC * Nullability and metadata information for sub-fields of map types is now preserved when deserializing Arrow IPC GH-35297. ## Orc * The Orc adapter now maps Arrow field metadata to Orc type attributes when writing, and vice-versa when reading GH-35304. ## Parquet * It is now possible to write additional metadata while a ParquetFileWriter is open GH-34888. * Writing a page index can be enabled selectively per-column GH-34949. In addition, page header statistics are not written anymore if the page index is enabled for the given column GH-34375, as the information would be redundant and less efficiently accessed. * Parquet writer properties allow specifying the sorting columns GH-35331. The user is responsible for ensuring that the data written to the file actually complies with the given sorting. * CRC computation has been implemented for v2 data pages GH-35171. It was already implemented for v1 data pages. * Writing compliant nested types is now enabled by default GH-29781. This should not have any negative implication. * Attempting to load a subset of an Arrow extension type is now forbidden GH-20385. Previously, if an extension type’s storage is nested (for example a “Point” extension type backed by a struct<x: float64, y: float64>), it was possible to load selectively some of the columns of the storage type. ## Substrait * Support for various functions has been added: “stddev”, “variance”, “first”, “last” (GH-35247, GH-35506). * Deserializing sorts is now supported GH-32763. However, some features, such as clustered sort direction or custom sort functions, are not implemented. 
## Miscellaneous * FieldRef sports additional methods to get a flattened version of nested fields GH-14946. Compared to their non-flattened counterparts, the methods GetFlattened, GetAllFlattened, GetOneFlattened and GetOneOrNoneFlattened combine a child’s null bitmap with its ancestors’ null bitmaps so as to compute the field’s overall logical validity bitmap. * In other words, given the struct array [null, {'x': null}, {'x': 5}], FieldRef("x")::Get might return [0, null, 5] while FieldRef("x")::GetFlattened will always return [null, null, 5]. * Scalar::hash() has been fixed for sliced nested arrays GH-35360. * A new floating-point to decimal conversion algorithm exhibits much better precision GH-35576. * It is now possible to cast between scalars of different list-like types GH-36309. OBS-URL: https://build.opensuse.org/request/show/1109685 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=15	2023-09-08 07:18:56 +00:00
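The flattened FieldRef lookup described in the 13.0.0 notes above can be pictured with a small pure-Python model. This is only a sketch of the bitmap logic, not the Arrow C++ or pyarrow API: the helper names `flattened_validity` and `apply_validity` are hypothetical, and the child's own bitmap marking slot 0 as valid is an assumption chosen to reproduce the `[0, null, 5]` case from the notes.

```python
# Illustrative model only; these helpers are hypothetical, not Arrow functions.

def flattened_validity(parent_valid, child_valid):
    """A slot is logically valid only if both the parent and the child are valid."""
    return [p and c for p, c in zip(parent_valid, child_valid)]


def apply_validity(values, validity):
    """Render physical values, showing None wherever the validity flag is False."""
    return [v if ok else None for v, ok in zip(values, validity)]


# Struct array [null, {'x': null}, {'x': 5}]
parent_valid = [False, True, True]   # validity of the struct itself
child_values = [0, 0, 5]             # physical storage of 'x'; 0 is a placeholder
child_valid = [True, False, True]    # assumed: child's own bitmap ignores the parent

# Non-flattened view: only the child's own bitmap is consulted.
unflattened = apply_validity(child_values, child_valid)

# Flattened view: the parent's bitmap is combined with the child's.
flattened = apply_validity(
    child_values, flattened_validity(parent_valid, child_valid)
)

print(unflattened)  # [0, None, 5]
print(flattened)    # [None, None, 5]
```

The non-flattened result depends on whatever happens to sit in the child's storage under a null parent, which is why the notes say it "might" return `[0, null, 5]`; the flattened result does not.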
dimstar_suse	ad607e3932	Accepting request 1092627 from science OBS-URL: https://build.opensuse.org/request/show/1092627 OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/apache-arrow?expand=0&rev=4	2023-06-13 14:09:16 +00:00
bnavigator	cd7a2c42f0	Accepting request 1092619 from home:bnavigator:pyarrow - Update to 12.0.1 * [GH-35423] - [C++][Parquet] Parquet PageReader Force decompression buffer resize smaller (#35428) * [GH-35498] - [C++] Relax EnsureAlignment check in Acero from requiring 64-byte aligned buffers to requiring value-aligned buffers (#35565) * [GH-35519] - [C++][Parquet] Fixing exception handling in parquet FileSerializer (#35520) * [GH-35538] - [C++] Remove unnecessary status.h include from protobuf (#35673) * [GH-35730] - [C++] Add the ability to specify custom schema on a dataset write (#35860) * [GH-35850] - [C++] Don't disable optimization with RelWithDebInfo (#35856) - Drop cflags.patch -- fixed upstream OBS-URL: https://build.opensuse.org/request/show/1092619 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=13	2023-06-12 15:49:46 +00:00
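The GH-35498 entry above relaxes the Acero alignment requirement from 64-byte alignment to value alignment. The difference comes down to simple modular arithmetic, sketched below. The function `is_aligned`, the address and the value size are hypothetical illustration values, not anything taken from the Acero code.

```python
def is_aligned(address: int, alignment: int) -> bool:
    """Return True if the address is a multiple of the requested alignment."""
    return address % alignment == 0


# Hypothetical buffer start address and element width (e.g. an int64 column).
address = 0x1008
value_size = 8

strict_ok = is_aligned(address, 64)           # old rule: 64-byte aligned
relaxed_ok = is_aligned(address, value_size)  # new rule: aligned to the value size

print(strict_ok, relaxed_ok)  # False True
```

A buffer like this one would have been flagged under the stricter rule but passes the relaxed one.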
dimstar_suse	b96a9994bf	Accepting request 1087840 from science - Update to 12.0.0 * Run-End Encoded Arrays have been implemented and are accessible (GH-32104) * The FixedShapeTensor Logical value type has been implemented using ExtensionType (GH-15483, GH-34796) ## Compute * New kernel to convert timestamp with timezone to wall time (GH-33143) * Cast kernels are now built into libarrow by default (GH-34388) ## Acero * Acero has been moved out of libarrow into its own shared library, allowing for smaller builds of the core libarrow (GH-15280) * Exec nodes now can have a concept of “ordering” and will reject non-sensible plans (GH-34136) * New exec nodes: “pivot_longer” (GH-34266), “order_by” (GH-34248) and “fetch” (GH-34059) * Breaking Change: Reorder output fields of “group_by” node so that keys/segment keys come before aggregates (GH-33616) ## Substrait * Add support for the round function GH-33588 * Add support for the cast expression element GH-31910 * Added API reference documentation GH-34011 * Added an extension relation to support segmented aggregation GH-34626 * The output of the aggregate relation now conforms to the spec GH-34786 ## Parquet * Added support for DeltaLengthByteArray encoding to the Parquet writer (GH-33024) (forwarded request 1087839 from bnavigator) OBS-URL: https://build.opensuse.org/request/show/1087840 OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/apache-arrow?expand=0&rev=3	2023-05-19 09:55:41 +00:00
bnavigator	f0e79bb038	Accepting request 1087839 from home:bnavigator:pyarrow - Update to 12.0.0 * Run-End Encoded Arrays have been implemented and are accessible (GH-32104) * The FixedShapeTensor Logical value type has been implemented using ExtensionType (GH-15483, GH-34796) ## Compute * New kernel to convert timestamp with timezone to wall time (GH-33143) * Cast kernels are now built into libarrow by default (GH-34388) ## Acero * Acero has been moved out of libarrow into its own shared library, allowing for smaller builds of the core libarrow (GH-15280) * Exec nodes now can have a concept of “ordering” and will reject non-sensible plans (GH-34136) * New exec nodes: “pivot_longer” (GH-34266), “order_by” (GH-34248) and “fetch” (GH-34059) * Breaking Change: Reorder output fields of “group_by” node so that keys/segment keys come before aggregates (GH-33616) ## Substrait * Add support for the round function GH-33588 * Add support for the cast expression element GH-31910 * Added API reference documentation GH-34011 * Added an extension relation to support segmented aggregation GH-34626 * The output of the aggregate relation now conforms to the spec GH-34786 ## Parquet * Added support for DeltaLengthByteArray encoding to the Parquet writer (GH-33024) OBS-URL: https://build.opensuse.org/request/show/1087839 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=11	2023-05-18 17:02:09 +00:00
dimstar_suse	41d9d0fb5f	Accepting request 1076956 from science OBS-URL: https://build.opensuse.org/request/show/1076956 OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/apache-arrow?expand=0&rev=2	2023-04-03 15:47:02 +00:00
bnavigator	5313afc3ac	Accepting request 1076954 from home:Andreas_Schwab:Factory - cflags.patch: fix option order to compile with optimisation - Adjust constraints OBS-URL: https://build.opensuse.org/request/show/1076954 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=9	2023-04-03 12:19:46 +00:00

58 Commits