apache-arrow

Author	SHA256	Message	Date
Benjamin Greiner	853a205aac	- Update to 20.0.0 ## Bug Fixes * GH-30302 - [C++][Parquet] Preserve the bitwidth of integer dictionary indices on round-trip to Parquet (#45685) * GH-31992 - [C++][Parquet] Handling the special case when DataPageV2 values buffer is empty (#45252) * GH-37630 - [C++][Python][Dataset] Allow disabling fragment metadata caching (#45330) * GH-39023 - [C++][CMake] Add missing launcher path conversion for ExternalPackage (#45349) * GH-43057 - [C++] Thread-safe AesEncryptor / AesDecryptor (#44990) * GH-45048 - [C++][Parquet] Deprecate unused chunk_size parameter in parquet::arrow::FileWriter::NewRowGroup() (#45088) * GH-45129 - [Python][C++] Fix usage of deprecated C++ functionality on pyarrow (#45189) * GH-45132 - [C++][Gandiva] Update LLVM to 18.1 (#45114) * GH-45185 - [C++][Parquet] Raise an error for invalid repetition levels when delimiting records (#45186) * GH-45254 - [C++][Acero] Fix the row offset truncation in row table merge (#45255) * GH-45266 - [C++][Acero] Fix the running tasks count of Scheduler when get error tasks in multi-threads (#45268) * GH-45270 - [C++][CI] Disable mimalloc on Valgrind builds (#45271) * GH-45301 - [C++] Change PrimitiveArray ctor to protected (#45444) * GH-45334 - [C++][Acero] Fix swiss join overflow issues in row offset calculation for fixed length and null masks (#45336) * GH-45362 - [C++] Fix identity cast for time and list scalar OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=55	2025-06-13 18:31:56 +00:00
Benjamin Greiner	12b0bf8517	Accepting request 1271189 from home:hsk17:branches:home:simotek:cmake4b changes to fix cmake-4 build problems OBS-URL: https://build.opensuse.org/request/show/1271189 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=52	2025-04-21 16:30:53 +00:00
Benjamin Greiner	986ddd3f2e	Accepting request 1264971 from home:bnavigator:branches:science - Re-enable flight, grpc has been fixed boo#1237422 OBS-URL: https://build.opensuse.org/request/show/1264971 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=50	2025-03-28 08:48:20 +00:00
Benjamin Greiner	1d57fa866b	Accepting request 1252868 from home:bnavigator:branches:science - Add missing dependencies for libboost_process explicitly boo#1239599 OBS-URL: https://build.opensuse.org/request/show/1252868 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=48	2025-03-13 19:08:14 +00:00
Benjamin Greiner	285eb6979a	Accepting request 1247453 from home:bnavigator:branches:science - disable flight because of gh#grpc/grpc#37968 boo#1237422 OBS-URL: https://build.opensuse.org/request/show/1247453 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=46	2025-02-20 16:43:05 +00:00
Benjamin Greiner	55775895c9	- Update to 19.0.1 ## Bug Fixes * [C++] Fix overflow issues for large build side in swiss join (#45108) * [C++][Fuzzing] Fix Negation bug discovered by fuzzing (#45181) * [C++][Parquet] Omit level histogram when max level is 0 (#45285) * [Parquet][C++] Fix statistics load logic for no row group and multiple row groups (#45350) * [C++] Disable Flight test (#45232) ## Improvements * [C++][Parquet] Improve performance of generating size statistics (#45202) * [C++][S3] Workaround compatibility issue between AWS SDK and MinIO (#45310) - Release 19.0.0 ## New Features and Improvements * [CI][C++] Add a nightly job to test offline build (#44721) * [C++] GcsFileSystem::Make should return Result (#44503) * [C++][Parquet] Implement SizeStatistics (#40594) * [C++] Reduce string inlining in Substrait serde (#45174) * [C++][Acero] Enhance asof_join to work in multi-threaded execution by sequencing input (#44083) * [C++] Support the AWS S3 SSE-C encryption (#43601) * [C++][Parquet] Parquet Metadata Printer supports print sort-columns (#43599) * [C++] Add C++ implementation of Async C Data Interface (#44495) * [C++][Acero] Support AVX2 swiss join decoding (#43832) * [C++] skip -0117 in StrptimeZoneOffset for old glibc (#44621) * [C++] Add arrow::RecordBatch::MakeStatisticsArray() (#44252) OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=40	2025-02-17 22:32:29 +00:00
Benjamin Greiner	be27bc1230	Accepting request 1218425 from home:yeey:OpenWebUI - Set the appropriate C++ compiler for the given platform so it will compile on Leap 15.x. - Enable sle15_python_module_pythons. OBS-URL: https://build.opensuse.org/request/show/1218425 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=38	2024-10-26 01:06:02 +00:00
Benjamin Greiner	3f02fd3dcd	Accepting request 1201791 from home:bnavigator:branches:science - Add apache-arrow-pr43766-boost1_86.patch for Boost 1.86 * gh#apache/arrow#43766 OBS-URL: https://build.opensuse.org/request/show/1201791 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=36	2024-09-18 12:46:47 +00:00
Benjamin Greiner	9bed06f66b	Accepting request 1194085 from home:bnavigator:branches:science - Update to 17.0.0 ## Bug Fixes * [C++] Add option to string ‘center’ kernel to control left/right alignment on odd number of padding (#41449) * [C++][Python] Fix casting to extension type with fixed size list storage type (#42219) * [C++] Replace null_count with MayHaveNulls in ListArrayFromArray and MapArray (#41957) * [C++][Python] RecordBatch.filter() segfaults if passed a ChunkedArray (#40971) * [C++][Parquet] Timestamp conversion from Parquet to Arrow does not follow compatibility guidelines for convertedType * [C++] Use LargeStringArray for casting when writing tables to CSV (#40271) * [C++][Python] Map child Array constructed from keys and items shouldn’t have offset (#40871) * [C++] Fix compile warning with ‘implicitly-defined constructor does not initialize’ in encoding_benchmark (#41060) * [C++] Get null_bit_id according to are_cols_in_encoding_order in NullUpdateColumnToRow_avx2 (#40998) * [C++] Clean up unused parameter warnings (#41111) * [C++][Acero] Fix asof join race (#41614) * [C++] support for single threaded joins (#41125) * [C++] Fix hashjoin benchmark failed at make utf8’s random batches (#41195) * [C++] Check to avoid copying when NullBitmapBuffer is Null (#41452) * [C++] Fix crash on invalid Parquet file (#41366) * [C++][Parquet] More strict Parquet level checking (#41346) * [C++][Gandiva] Fix gandiva cache size env var (#41330) * [C++][CMake][Windows] Remove needless .dll suffix from link libraries (#41341) * [C++][CMake] Remove unused ARROW_NO_DEPRECATED_API (#41345) * [C++][maybe_unused] with Arrow macro (#41359) * [C++][Large] ListView and Map nested types for scalar_if_else’s kernel functions (#41419) * [C++][Gandiva] Fix ascii_utf8 function to return same result on x86 and Arm (#41434) * [C++] Reuse deduplication logic for direct registration (#41466) * [C++] Clean up more redundant move warnings (#41487) * [C++][Compute] Remove redundant logic 
for ArrayData as ExecResults in ExecScalarCaseWhen (#41380) * [C++][CMake] correctly use Protobuf_PROTOC_EXECUTABLE (#41582) * [C++][CMake] Fix ARROW_USE_BOOST detect condition (#41622) * [C++][Python] Add optional null_bitmap to MapArray::FromArrays (#41757) * [C++] macros.h: Fix ARROW_FORCE_INLINE for MSVC (#41712) * [C++][Acero] Remove an useless parameter for QueryContext::Init called in hash_join_benchmark (#41716) * [C++] Fix the issue that temp vector stack may be under sized (#41746) * [C++] Check that extension metadata key is present before attempting to delete it (#41763) * [C++] Iterator releases its resource immediately when it reads all values (#41824) * [C++][Flight][Benchmark] Ensure waiting server ready (#41793) * [C++] Fix avx2 gather offset larger than 2GB in CompareColumnsToRows (#42188) * [C++][S3] Fix potential deadlock when closing output stream (#41876) * [CI][C++] Clear cache for mamba on AppVeyor (#41977) * [CI][Python][C++] Fix utf8proc detection for wheel on Windows (#42022) * [C++] Support list-views on list_slice (#42067) * [C++] Fix an OTel test failure and remove needless logs (#42122) * [C++][FS][Azure] Ensure setting BlobSasBuilder::Protocol (#42108) * [C++] Support list-view typed arrays in array_take and array_filter (#42117) * [C++] Fix some potential uninitialized variable warnings (#42207) * [C++] Avoid invalid accesses in parquet-encoding-benchmark (#42141) * [C++] Use FetchContent for bundled ORC (#43011) * [C++] Fix GetRecordBatchPayload crashes for device data (#42199) * [C++] Use non-stale c-ares download URL (#42250) * [C++][Parquet] Check for valid ciphertext length to prevent segfault (#43071) * [C++][Compute] Mark KeyCompare.CompareColumnsToRowsLarge as large memory test (#43128) * [C++] Upgrade bundled google-cloud-cpp to 2.22.0 (#43136) ## New Features and Improvements * [C++][Compute] Implement Grouper::Reset (#41352) * [Go][C++] Implement Flight SQL Bulk Ingestion (#38385) * [C++][FS][Azure] Support azure cli auth 
(#41976) * [C++][FS][Azure] Add support for environment credential (#41715) * [C++] Optimize Take for fixed-size types including nested fixed-size lists (#41297) * [C++][Device] Add Copy/View slice functions to a CPU pointer (#41477) * [C++] Add support for OpenTelemetry logging (#39905) * [C++] Import/Export ArrowDeviceArrayStream (#40807) * [C++] move LocalFileSystem to the registry (#40356) * [C++] Make flatbuffers serialization more deterministic (#40392) * [C++][Gandiva] add RE2::Options set_dot_nl(true) for Like function (#40970) * [C++] Introduce portable compiler assumptions (#41021) * [C++] Add a grouper benchmark for preventing performance regression (#41036) * [C++] Support flatten for combining nested list related types (#41092) * [C++] Clean up remaining tasks related to half float casts (#41084) * [C++][FS][Azure] Add support for CopyFile with hierarchical namespace support (#41276) * [C++] Add is_validity_defined_by_bitmap() predicate (#41115) * [C++] IO: enhance boundary checking in CompressedInputStream (#41117) * [C++][Python] Expose recursive flatten for lists on list_flatten kernel function and pyarrow bindings (#41295) * [C++][Parquet][Doc] Denote PARQUET:field_id in parquet.rst (#41187) * [C++] Extract the kernel loops used for PrimitiveTakeExec and generalize to any fixed-width type (#41373) * [C++][Acero] Use per-node basis temp vector stack to mitigate overflow (#41335) * [C++][Parquet] Optimize DelimitRecords by batch execution when max_rep_level > 1 (#41362) * [C++][FS][Azure][Docs] Add AzureFileSystem to Filesystems API reference (#41411) * [C++] Use ASAN to poison temp vector stack memory (#41695) * [C++][S3] Add a new option to check existence before CreateDir (#41822) * [C++][Parquet] Fix DeltaLengthByteArrayEncoder::EstimatedDataEncodedSize (#41546) * [C++] Thirdparty: Upgrade xsimd to 13.0.0 (#41548) * [C++] Improve fixed_width_test_util.h (#41575) * [C++] ChunkResolver: Implement ResolveMany and add unit tests (#41561) * [C++] 
fixed_width_internal.h: Simplify docstring and support bit-sized types (BOOL) (#41597) * [C++][Python] Extends the add_key_value to parquet::arrow and PyArrow (#41633) * [C++][CMake][Windows] Don’t build needless object libraries (#41658) * [C++][Python] PrettyPrint non-cpu data by copying to default CPU device (#42010) * [C++][Parquet] Thrift: generate template method to accelerate reading thrift (#41703) * [C++][Parquet] Minor: moving EncodedStats by default rather than copying (#41727) * [C++][ORC] Ensure setting detected ORC version (#41767) * [C++][Parquet] Add file metadata read/write benchmark (#41761) * [C++] Make git-dependent definitions internal (#41781) * [C++][S3] Remove GetBucketRegion hack for newer AWS SDK versions (#41798) * [C++][Parquet] normalize dictionary encoding to use RLE_DICTIONARY (#41819) * [C++] IPC: Minor enhance the code of writer (#41900) * [C++] Fix ExecuteScalar deduce all_scalar with chunked_array (#41925) * [C++] Minor enhance code style for FixedShapeTensorType (#41954) * [C++] Follow up of adding null_bitmap to MapArray::FromArrays (#41956) * [C++] Misc changes making code around list-like types and list-view types behave the same way (#41971) * [C++] : kernel.cc: Remove defaults on switch so that compiler can check full enum coverage for us (#41995) * [C++][Parquet] ParquetFilePrinter::JSONPrint print length of FLBA (#41981) * [C++][CMake] Add preset for Valgrind (#42110) * [C++] Move TakeXXX free functions into TakeMetaFunction and make them private (#42127) * [C++][FS][Azure] Validate AzureOptions::{blob,dfs}_storage_scheme (#42135) * [C++] list_parent_indices: Add support for list-view types (#42236) * [C++] Reduce the recursion of many-join test (#43042) * [C++] Limit buffer size in BufferedInputStream::SetBufferSize with raw_read_bound (#43064) - Require cmake lz4 for 1.10 - Update to 17.0.0 ## Bug Fixes * [C++][Python] Fix casting to extension type with fixed size list storage type (#42219) * [Python] Include metadata 
when creating pa.schema from PyCapsule (#41538) * [C++][Python] RecordBatch.filter() segfaults if passed a ChunkedArray (#40971) * [Python] pa.array: add check for byte-swapped numpy arrays inside python objects (#41549) * [Python] Fix read_table for encrypted parquet (#39438) * [Python] RunEndEncodedArray.from_arrays: bugfix for Array arguments (#40560) (#41093) * [C++][Python] Map child Array constructed from keys and items shouldn’t have offset (#40871) * [Python] `test_numpy_array_protocol` test failures with numpy 2.0.0rc1 * [Python] Fix StructArray.sort() for by=None (#41495) * [Python] Build with Python 3.13 (#42034) * [Python] remove special methods related to buffers in python <2.6 (#41492) * [Python] Fix reading column index with decimal values (#41503) * [Docs][Python] Remove duplicate contents (#41588) * [C++][Python] Add optional null_bitmap to MapArray::FromArrays (#41757) * [Python][Parquet] Implement to_dict method on SortingColumn (#41704) * [Python] CMake: ignore Parquet encryption option if Parquet itself is not enabled (fix Java integration build) (#41776) * [Python] Disallow direct pa.RecordBatchReader() construction to avoid segfaults (#41773) * [Python] Fix RecordBatchReader.cast to support casting to equal schema for all types (#42098) * [Python] Fix tests when using NumPy 2.0 on Windows (#42099) * [CI][Python] Use pip install -e instead of setup.py build_ext --inplace for installing pyarrow on verification script (#42007) * [CI][Python][C++] Fix utf8proc detection for wheel on Windows (#42022) * [Python][CI] Update expected output for numpy 2.0.0 (#42172) ## New Features and Improvements * [Python] Replace pandas.util.testing.rands with vendored version (#42089) * [Python] begin moving static settings to pyproject.toml (#41041) * [Python] Implement PyCapsule interface for Device data in PyArrow (#40717) * [Python] Expand the Arrow PyCapsule Interface with C Device Data support (#40708) * [Python] Let RecordBatch.filter accept a boolean 
expression in addition to mask array (#43043) * [Python] Fix pickling of LocalFileSystem for cython 2 (#41459) * [Python] Expand the C Device Interface bindings to support import on CUDA device (#40385) * [Python] Allow passing a mapping of column names to rename_columns (#40645) * [Python][Packaging] Strip unnecessary symbols when building wheels (#42028) * [Python][Docs] Update PyArrow installation docs for conda package split (#41135) * [Python] Basic bindings for Device and MemoryManager classes (#41685) * [C++][Python] Expose recursive flatten for lists on list_flatten kernel function and pyarrow bindings (#41295) * [Python][Packaging] Ensure to build with released numpy 2.0 (instead of RC) in the wheel building workflows (#42194) * [CI][Python] Add a job on ARM64 macOS (#41313) * [CI][Python] Reduce CI time on macOS (#41378) * [Python] Expose byte_width and bit_width of ExtensionType in terms of the storage type (#41413) * [Python] Update Python development guide about components being enabled by default based on Arrow C++ (#41705) * [Python] Building PyArrow: enable/disable python components by default based on availability in Arrow C++ (#41494) * [C++][Python] Extends the add_key_value to parquet::arrow and PyArrow (#41633) * [Python] Ensure Buffer methods don’t crash with non-CPU data (#41889) * [C++][Python] PrettyPrint non-cpu data by copying to default CPU device (#42010) * [Python][Parquet] Update BYTE_STREAM_SPLIT description in write_table() docstring (#41759) * [Python] Add support for Pyodide (#37822) * [Python] Fix pandas tests to follow downstream datetime64 unit changes (#41979) * [Python] Allow Array.filter() to take general array input (#42051) * [Python] Expose new FLOAT16 logical type in the pyarrow.parquet bindings (#42103) * [Python] Array gracefully fails on non-cpu device (#42113) * [Python][Parquet] Pyarrow store decimal as integer (#42169) * [Python] Add CI job for Numpy 1.X (#42189) * [CI][Python] Pin openjdk=17 in python substrait 
integration (#43051) - Drop pyarrow-pr41319-numpy2-tests.patch - Add pyarrow-pr433325-extradirs.patch gh#apache/arrow/pull/43325 OBS-URL: https://build.opensuse.org/request/show/1194085 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=34	2024-08-15 09:43:24 +00:00
Benjamin Greiner	fc3315cd8b	OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=32	2024-04-25 13:14:01 +00:00
Benjamin Greiner	d947cb7cd2	OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=31	2024-04-25 09:12:59 +00:00
Benjamin Greiner	c159005cc1	Accepting request 1170120 from home:bnavigator:numpy - Update to 16.0.0 ## Bug Fixes * [C++][ORC] Catch all ORC exceptions to avoid crash (#40697) * [C++][S3] Handle conventional content-type for directories (#40147) * [C++] Strengthen handling of duplicate slashes in S3, GCS (#40371) * [C++] Avoid hash_mean overflow (#39349) * [C++] Fix spelling (array) (#38963) * [C++][Parquet] Fix crash in Modular Encryption (#39623) * [C++][Dataset] Fix failures in dataset-scanner-benchmark (#39794) * [C++][Device] Fix Importing nested and string types for DeviceArray (#39770) * [C++] Use correct (non-CPU) address of buffer in ExportDeviceArray (#39783) * [C++] Improve error message for "chunker out of sync" condition (#39892) * [C++] Use make -j1 to install bundled bzip2 (#39956) * [C++] DatasetWriter avoid creating zero-sized batch when max_rows_per_file enabled (#39995) * [C++][CI] Disable debug memory pool for ASAN and Valgrind (#39975) * [C++][Gandiva] Make Gandiva's default cache size to be 5000 for object code cache (#40041) * [C++][FS][Azure] Fix CreateDir and DeleteDir trailing slash issues on hierarchical namespace accounts (#40054) * [C++][FS][Azure] Validate containers in AzureFileSystem::Impl::MovePaths() (#40086) * [C++] Decimal types with different precisions and scales bind OBS-URL: https://build.opensuse.org/request/show/1170120 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=30	2024-04-25 09:07:39 +00:00
Benjamin Greiner	8d99637b3c	Accepting request 1160966 from home:bnavigator:branches:science - Update to 15.0.2 ## Bug Fixes * [C++][Acero] Increase size of Acero TempStack (#40007) * [C++][Dataset] Add missing Protobuf static link dependency (#40015) * [C++] Possible data race when reading metadata of a parquet file (#40111) * [C++] Make span SFINAE standards-conforming to enable compilation with nvcc (#40253) - Update to 15.0.2 ## Bug Fixes * [Python] Fix except clauses (#40387) * [Python][CI] Skip failing test_dateutil_tzinfo_to_string (#40486) OBS-URL: https://build.opensuse.org/request/show/1160966 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=27	2024-03-23 16:14:18 +00:00
Benjamin Greiner	f4b994c8a2	Accepting request 1152980 from home:bnavigator:branches:science - Reenable logging * Add apache-arrow-pr40230-glog-0.7.patch * Add apache-arrow-pr40275-glog-0.7-2.patch * now requires glog devel files to be present for apache-arrow-devel; ArrowConfig.cmake fails otherwise * gh#apache/arrow#40181 * gh#apache/arrow#40230 * gh#apache/arrow#40275 - Move d:l:p:n/python-pyarrow to the science/apache-arrow as multibuild package: Uses the same source and is tightly connected. OBS-URL: https://build.opensuse.org/request/show/1152980 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=25	2024-02-28 16:27:53 +00:00
Benjamin Greiner	b029d62e8c	Accepting request 1150081 from home:bnavigator:branches:science - Update to 15.0.1 ## Bug Fixes * [C++] "iso_calendar" kernel returns incorrect results for array length > 32 (#39360) * [C++] Explicit error in ExecBatchBuilder when appending var length data exceeds offset limit (int32 max) (#39383) * [C++][Parquet] Pass memory pool to decoders (#39526) * [C++][Parquet] Validate page sizes before truncating to int32 (#39528) * [C++] Fix tail-word access cross buffer boundary in `CompareBinaryColumnToRow` (#39606) * [C++] Fix the issue of ExecBatchBuilder when appending consecutive tail rows with the same id may exceed buffer boundary (for fixed size types) (#39585) * [Release] Update platform tags for macOS wheels to macosx_10_15 (#39657) * [C++][FlightRPC] Fix nullptr dereference in PollInfo (#39711) * [C++] Fix tail-byte access cross buffer boundary in key hash avx2 (#39800) * [C++][Acero] Fix AsOfJoin with differently ordered schemas than the output (#39804) * [C++] Expression ExecuteScalarExpression execute empty args function with a wrong result (#39908) * [C++] Strip extension metadata when importing a registered extension (#39866) * [C#] Restore support for .NET 4.6.2 (#40008) * [C++] Fix out-of-line data size calculation in BinaryViewBuilder::AppendArraySlice (#39994) * [C++][CI][Parquet] Fixing parquet column_writer_test building (#40175) ## New Features and Improvements * [C++] PollFlightInfo does not follow rule of 5 * [C++] Fix filter and take kernel for month_day_nano intervals (#39795) * [C++] Thirdparty: Bump zlib to 1.3.1 (#39877) * [C++] Add missing "#include <algorithm>" (#40010) - Release 15.0.0 ## Bug Fixes * [C++] Bring back case_when tests for union types (#39308) * [C++] Fix the issue of ExecBatchBuilder when appending consecutive tail rows with the same id may exceed buffer boundary (#39234) * [C++][Python] Add a no-op kernel for dictionary_encode(dictionary) (#38349) * [C++] Use the latest tagged version of 
flatbuffers (#38192) * [C++] Don't use MSVC_VERSION to determine -fms-compatibility-version (#36595) * [C++] Optimize hash kernels for Dictionary ChunkedArrays (#38394) * [C++][Gandiva] Avoid registering exported functions multiple times in gandiva (#37752) * [C++][Acero] Fix race condition caused by straggling input in the as-of-join node (#37839) * [C++][Parquet] add more closed file checks for ParquetFileWriter (#38390) * [C++][FlightRPC] Add missing app_metadata arguments (#38231) * [C++][Parquet] Fix Valgrind memory leak in arrow-dataset-file-parquet-encryption-test (#38306) * [C++][Parquet] Don't initialize OpenSSL explicitly with OpenSSL 1.1 (#38379) * [C++] Re-generate flatbuffers C++ for Skyhook (#38405) * [C++] Avoid passing null pointer to LZ4 frame decompressor (#39125) * [C++] Add missing explicit size_t cast for i386 (#38557) * [C++] Fix: add TestingEqualOptions for gtest functions. (#38642) * [C++][Gandiva] Use arrow io util to replace std::filesystem::path in gandiva (#38698) * [C++] Protect against PREALLOCATE preprocessor defined on macOS (#38760) * [C++] Check variadic buffer counts in bounds (#38740) * [C++][FS][Azure] Do nothing for CreateDir("/container", true) (#38783) * Fix TestArrowReaderAdHoc.ReadFloat16Files to use new uncompressed files (#38825) * [C++] S3FileSystem export s3 sdk config "use_virtual_addressing" to arrow::fs::S3Options (#38858) * [C++][Gandiva] Fix Gandiva to_date function's validation for suppress errors parameter (#38987) * [C++][Parquet] Fix spelling (#38959) * [C++] Fix spelling (acero) (#38961) * [C++] Fix spelling (compute) (#38965) * [C++] Fix spelling (util) (#38967) * [C++] Fix spelling (dataset) (#38969) * [C++] Fix spelling (filesystem) (#38972) * [C++] Fix spelling (#38978) * [C++] Fix spelling (#38980) * [C++][Acero] union node output batches should be unordered (#39046) * [C++][CI] Fix Valgrind failures (#39127) * [C++] Remove needless system Protobuf dependency with -DARROW_HDFS=ON (#39137) * [C++][Compute] 
Fix negative duration division (#39158) * [C++] Add missing data copy in StreamDecoder::Consume(data) (#39164) * [C++] Remove compiler warnings with -Wconversion -Wno-sign-conversion in public headers (#39186) * [C++][Benchmarking] Remove hardcoded min times (#39307) * [C++] Don't use "if constexpr" in lambda (#39334) * [C++] Disable -Werror=attributes for Azure SDK's identity.hpp (#39448) * [C++] Fix compile warning (#39389) * [CI][JS] Force node 20 on JS build on arm64 to fix build issues (#39499) * [C++] Disable parallelism for jemalloc external project (#39522) * [C++][Parquet] Fix crash in test_parquet_dataset_lazy_filtering (#39632) * [C++] Disable parallelism for all `make`-based externalProjects when CMake >= 3.28 is used ## New Features and Improvements * [C++][JSON] Change the max rows to Unlimited(int_32) (#38582) * [C++][Python] Add "Z" to the end of timestamp print string when tz defined (#39272) * [C++][Python] DLPack implementation for Arrow Arrays (producer) (#38472) * [C++] Diffing of Run-End Encoded arrays (#35003) * [C++][Python][R] Allow users to adjust S3 log level by environment variable (#38267) * [C++][Format] Implementation of the LIST_VIEW and LARGE_LIST_VIEW array formats (#35345) * [C++] Use Cast() instead of CastTo() for Scalar in test (#39044) * [C++][Python][Parquet] Implement Float16 logical type (#36073) * [C++] Add Utf8View and BinaryView to the c ABI (#38443) * [C++][Parquet] Add api to get RecordReader from RowGroupReader (#37003) * [C++] Expose a span converter for Buffer and ArraySpan (#38027) * [C++] Add A Dictionary Compaction Function For DictionaryArray (#37418) * [C++] Add arrow::ipc::StreamDecoder::Reset() (#37970) * [C++] Implement file reads for Azure filesystem (#38269) * [C++][Integration] Add C++ Utf8View implementation (#37792) * [C++][Gandiva] Add external function registry support (#38116) * [C++][Gandiva] Migrate LLVM JIT engine from MCJIT to ORC v2/LLJIT (#39098) * [C++] Feature: support concatenate 
recordbatches. (#37896) * [C++] Add support for specifying custom Array opening and closing delimiters to arrow::PrettyPrintDelimiters (#38187) * [R] Allow code() to return package name prefix. (#38144) * [C++][Benchmark] Add non-stream Codec Compression/Decompression (#38067) * [C++][Parquet] Change DictEncoder dtor checking to warning log (#38118) * [C++][Parquet] Support reading parquet files with multiple gzip members (#38272) * [C++][Parquet] check the decompressed page size same as size in page header (#38327) * [C++][Azure] Use properties for input stream metadata (#38524) * [C++][FS][Azure] Implement file writes (#38780) * [C++] Implement GetFileInfo for a single file in Azure filesystem (#38505) * [C++][CMake] Use transitive dependency for system GoogleTest (#38340) * [C++][Parquet] Use new encrypted files for page index encryption test (#38347) * Add validation logic for offsets and values to arrow.array.ListArray.fromArrays (#38531) * [C++][Acero] Create a sorted merge node (#38380) * [C++][Benchmark] Adding benchmark for LZ4/Snappy Compression (#38453) * [C++] Support LogicalNullCount for DictionaryArray (#38681) * [C++][Parquet] Faster scalar BYTE_STREAM_SPLIT (#38529) * [C++][Gandiva] Support registering external C functions (#38632) * [C++] Implement GetFileInfo(selector) for Azure filesystem (#39009) * [C++][FS][Azure] Implement CreateDir() (#38708) * [C++][FS][Azure] Implement DeleteDir() (#38793) * [C++][FS][Azure] Implement DeleteDirContents() (#38888) * [C++] : Implement AzureFileSystem::DeleteRootDirContents (#39151) * [C++][FS][Azure] Implement CopyFile() (#39058) * [C++][Go][Parquet] Add tests for reading Float16 files in parquet-testing (#38753) * [C++][FS][Azure] Rename AzurePath to AzureLocation (#38773) * [C++] Implement directory semantics even when the storage account doesn't support HNS (#39361) * [C++][Parquet] Update parquet.thrift to sync with 2.10.0 (#38815) * [C++] Replace "#ifdef ARROW_WITH_GZIP" in dataset test to 
ARROW_WITH_ZLIB (#38853) * [C++][Parquet] Using length to optimize bloom filter read (#38863) * [C++][Parquet] Minor: making parquet TypedComparator operation as const method (#38875) * [C++] DatasetWriter release rows_in_flight_throttle when allocate writing failed (#38885) * [C++][Parquet] Move EstimatedBufferedValueBytes from TypedColumnWriter to ColumnWriter (#39055) * [C++] Stop installing internal bpacking_simd* headers (#38908) * [C++][Gandiva] Refactor function holder to return arrow Result (#38873) * [C++] Use Cast() instead of CastTo() for Dictionary Scalar in test (#39362) * [C++] Use Cast() instead of CastTo() for Timestamp Scalar in test (#39060) * [C++] Use Cast() instead of CastTo() for List Scalar in test (#39353) * [C++][Parquet] Support row group filtering for nested paths for struct fields (#39065) * [C++] Refactor the Azure FS tests and filesystem class instantiation (#39207) * [C++][Parquet] Optimize FLBA record reader (#39124) * Create module info compiler plugin (#39135) * [C++] : Try to make Buffer::device_type_ non-optional (#39150) * [C++][Parquet] Remove deprecated AppendRowGroup(int64_t num_rows) (#39209) * [C++][Parquet] Avoid WriteRecordBatch from produce zero-sized RowGroup (#39211) * [C++] Support binary to fixed_size_binary cast (#39236) * [C++][Azure][FS] Add default credential auth configuration (#39263) * [C++] Don't install bundled Azure SDK for C++ with CMake 3.28+ (#39269) * [C++][FS] : Remove the AzureBackend enum and add more flexible connection options (#39293) * [C++][FS] : Inform caller of container not-existing when checking for HNS support (#39298) * [C++][FS][Azure] Add workload identity auth configuration (#39319) * [C++][FS][Azure] Add managed identity auth configuration (#39321) * [C++] Forward arguments to ExceptionToStatus all the way to Status::FromArgs (#39323) * [C++] Flaky DatasetWriterTestFixture.MaxRowsOneWriteBackpresure test (#39379) * [C++] Add ForceCachedHierarchicalNamespaceSupport to help with testing 
(#39340) * [C++][FS][Azure] Add client secret auth configuration (#39346) * [C++] Reduce function.h includes (#39312) * [C++] Use Cast() instead of CastTo() for Parquet (#39364) * [C++][Parquet] Vectorize decode plain on FLBA (#39414) * [C++][Parquet] Style: Using arrow::Buffer data_as api rather than reinterpret_cast (#39420) * [C++][ORC] Upgrade ORC to 1.9.2 (#39431) * [C++] Use default Azure credentials implicitly and support anonymous credentials explicitly (#39450) * [C++][Parquet] Allow reading dictionary without reading data via ByteArrayDictionaryRecordReader (#39153) - Disable logging until compatibility with glog is restored gh#apache/arrow#40181 OBS-URL: https://build.opensuse.org/request/show/1150081 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=23	2024-02-24 09:07:04 +00:00
Benjamin Greiner	40e5983a49	Accepting request 1139092 from home:bnavigator:branches:science - Update to 14.0.2 ## New Features and Improvements * GH-38449 - [Release][Go][macOS] Use local test data if possible (#38450) * GH-38591 - [Parquet][C++] Remove redundant open calls in ParquetFileFormat::GetReaderAsync (#38621) ## Bug Fixes * GH-38345 - [Release] Use local test data for verification if possible (#38362) * GH-38438 - [C++] Dataset: Trying to fix the async bug in Parquet dataset (#38466) * GH-38577 - Reading parquet file behavior change from 13.0.0 to 14.0.0 * GH-38618 - [C++] S3FileSystem: fix regression in deleting explicitly created sub-directories (#38845) * GH-38861 - [C++] Add missing “-framework Security” to Libs.private in arrow.pc (#38869) * GH-39072 - [Release][CI] Python3.11-devel is required for the verification job on AlmaLinux 8 (#39073) * GH-39074 - [Release][Packaging] Use UTF-8 explicitly for KEYS (#39082) OBS-URL: https://build.opensuse.org/request/show/1139092 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=21	2024-01-16 09:00:47 +00:00
Benjamin Greiner	6b4b71e17d	Accepting request 1138181 from home:pgajdos - disable some tests for s390x [bsc#1218592] OBS-URL: https://build.opensuse.org/request/show/1138181 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=19	2024-01-12 11:03:12 +00:00
John Vandenberg	59b113ad72	Accepting request 1125774 from home:mimi_vx:branches:science - update 14.0.1 * GH-38431 - [Python][CI] Update fs.type_name checks for s3fs tests * GH-38607 - [Python] Disable PyExtensionType autoload - update to 14.0.1 * very long list of changes can be found here: https://arrow.apache.org/release/14.0.0.html OBS-URL: https://build.opensuse.org/request/show/1125774 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=17	2023-11-14 01:23:03 +00:00
Benjamin Greiner	0d83feb674	Accepting request 1109685 from home:bnavigator:branches:devel:languages:python:numeric - Update to 13.0.0 ## Acero * Handling of unaligned buffers in input nodes can be configured programmatically or by setting the environment variable ACERO_ALIGNMENT_HANDLING. The default behavior is to warn when an unaligned buffer is detected GH-35498. ## Compute * Several new functions have been added: - aggregate functions “first”, “last”, “first_last” GH-34911; - vector functions “cumulative_prod”, “cumulative_min”, “cumulative_max” GH-32190; - vector function “pairwise_diff” GH-35786. * Sorting now works on dictionary arrays, with a much better performance than the naive approach of sorting the decoded dictionary GH-29887. Sorting also works on struct arrays, and nested sort keys are supported using FieldRef GH-33206. * The check_overflow option has been removed from CumulativeSumOptions as it was redundant with the availability of two different functions: “cumulative_sum” and “cumulative_sum_checked” GH-35789. * Run-end encoded filters are efficiently supported GH-35749. * Duration types are supported with the “is_in” and “index_in” functions GH-36047. They can be multiplied with all integer types GH-36128. * “is_in” and “index_in” now cast their inputs more flexibly: they first attempt to cast the value set to the input type, then in the other direction if the former fails GH-36203. * Multiple bugs have been fixed in “utf8_slice_codeunits” when the stop option is omitted GH-36311. ## Dataset * A custom schema can now be passed when writing a dataset GH-35730. The custom schema can alter nullability or metadata information, but is not allowed to change the datatypes written. ## Filesystems * The S3 filesystem now writes files in equal-sized chunks, for compatibility with Cloudflare’s “R2” Storage GH-34363. 
* A long-standing issue where S3 support could crash at shutdown because of resources still being alive after S3 finalization has been fixed GH-36346. Now, attempts to use S3 resources (such as making filesystem calls) after S3 finalization should result in a clean error. * The GCS filesystem accepts a new option to set the project id GH-36227. ## IPC * Nullability and metadata information for sub-fields of map types is now preserved when deserializing Arrow IPC GH-35297. ## Orc * The Orc adapter now maps Arrow field metadata to Orc type attributes when writing, and vice-versa when reading GH-35304. ## Parquet * It is now possible to write additional metadata while a ParquetFileWriter is open GH-34888. * Writing a page index can be enabled selectively per-column GH-34949. In addition, page header statistics are not written anymore if the page index is enabled for the given column GH-34375, as the information would be redundant and less efficiently accessed. * Parquet writer properties allow specifying the sorting columns GH-35331. The user is responsible for ensuring that the data written to the file actually complies with the given sorting. * CRC computation has been implemented for v2 data pages GH-35171. It was already implemented for v1 data pages. * Writing compliant nested types is now enabled by default GH-29781. This should not have any negative implication. * Attempting to load a subset of an Arrow extension type is now forbidden GH-20385. Previously, if an extension type’s storage is nested (for example a “Point” extension type backed by a struct<x: float64, y: float64>), it was possible to load selectively some of the columns of the storage type. ## Substrait * Support for various functions has been added: “stddev”, “variance”, “first”, “last” (GH-35247, GH-35506). * Deserializing sorts is now supported GH-32763. However, some features, such as clustered sort direction or custom sort functions, are not implemented. 
## Miscellaneous * FieldRef sports additional methods to get a flattened version of nested fields GH-14946. Compared to their non-flattened counterparts, the methods GetFlattened, GetAllFlattened, GetOneFlattened and GetOneOrNoneFlattened combine a child’s null bitmap with its ancestors’ null bitmaps such as to compute the field’s overall logical validity bitmap. * In other words, given the struct array [null, {'x': null}, {'x': 5}], FieldRef("x")::Get might return [0, null, 5] while FieldRef("x")::GetFlattened will always return [null, null, 5]. * Scalar::hash() has been fixed for sliced nested arrays GH-35360. * A new floating-point to decimal conversion algorithm exhibits much better precision GH-35576. * It is now possible to cast between scalars of different list-like types GH-36309. OBS-URL: https://build.opensuse.org/request/show/1109685 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=15	2023-09-08 07:18:56 +00:00
Benjamin Greiner	cd7a2c42f0	Accepting request 1092619 from home:bnavigator:pyarrow - Update to 12.0.1 * [GH-35423] - [C++][Parquet] Parquet PageReader Force decompression buffer resize smaller (#35428) * [GH-35498] - [C++] Relax EnsureAlignment check in Acero from requiring 64-byte aligned buffers to requiring value-aligned buffers (#35565) * [GH-35519] - [C++][Parquet] Fixing exception handling in parquet FileSerializer (#35520) * [GH-35538] - [C++] Remove unnecessary status.h include from protobuf (#35673) * [GH-35730] - [C++] Add the ability to specify custom schema on a dataset write (#35860) * [GH-35850] - [C++] Don't disable optimization with RelWithDebInfo (#35856) - Drop cflags.patch -- fixed upstream OBS-URL: https://build.opensuse.org/request/show/1092619 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=13	2023-06-12 15:49:46 +00:00
Benjamin Greiner	f0e79bb038	Accepting request 1087839 from home:bnavigator:pyarrow - Update to 12.0.0 * Run-End Encoded Arrays have been implemented and are accessible (GH-32104) * The FixedShapeTensor Logical value type has been implemented using ExtensionType (GH-15483, GH-34796) ## Compute * New kernel to convert timestamp with timezone to wall time (GH-33143) * Cast kernels are now built into libarrow by default (GH-34388) ## Acero * Acero has been moved out of libarrow into its own shared library, allowing for smaller builds of the core libarrow (GH-15280) * Exec nodes now can have a concept of “ordering” and will reject non-sensible plans (GH-34136) * New exec nodes: “pivot_longer” (GH-34266), “order_by” (GH-34248) and “fetch” (GH-34059) * Breaking Change: Reorder output fields of “group_by” node so that keys/segment keys come before aggregates (GH-33616) ## Substrait * Add support for the round function GH-33588 * Add support for the cast expression element GH-31910 * Added API reference documentation GH-34011 * Added an extension relation to support segmented aggregation GH-34626 * The output of the aggregate relation now conforms to the spec GH-34786 ## Parquet * Added support for DeltaLengthByteArray encoding to the Parquet writer (GH-33024) OBS-URL: https://build.opensuse.org/request/show/1087839 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=11	2023-05-18 17:02:09 +00:00
Benjamin Greiner	5313afc3ac	Accepting request 1076954 from home:Andreas_Schwab:Factory - cflags.patch: fix option order to compile with optimisation - Adjust constraints OBS-URL: https://build.opensuse.org/request/show/1076954 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=9	2023-04-03 12:19:46 +00:00
John Vandenberg	d957479563	Accepting request 1075316 from home:bnavigator:branches:science - Remove gflags-static. It was only needed due to a packaging error with gflags which is about to be fixed in Tumbleweed - Disable build of the jemalloc memory pool backend * It requires every consuming application to LD_PRELOAD libjemalloc.so.2, even when it is not set as the default memory pool, due to static TLS block allocation errors * Usage of the bundled jemalloc as a workaround is not desired (gh#apache/arrow#13739) * jemalloc does not seem to have a clear advantage over the system glibc allocator: https://ursalabs.org/blog/2021-r-benchmarks-part-1 * This overrides the default behavior documented in https://arrow.apache.org/docs/cpp/memory.html#default-memory-pool OBS-URL: https://build.opensuse.org/request/show/1075316 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=7	2023-03-29 20:39:22 +00:00
John Vandenberg	ba553e9510	Accepting request 1074321 from home:bnavigator:pyarrow update to 11.0 OBS-URL: https://build.opensuse.org/request/show/1074321 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=6	2023-03-28 08:42:12 +00:00
Stefan Brüns	880ac17313	Accepting request 1001057 from home:StefanBruens:branches:science - Revert ccache change, using ccache in a pristine buildroot just slows down OBS builds (use --ccache for local builds). - Remove unused gflags-static-devel dependency. OBS-URL: https://build.opensuse.org/request/show/1001057 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=5	2022-09-07 13:19:56 +00:00
John Vandenberg	3d9efad54e	Accepting request 998575 from home:jayvdb:pyarrow - Speed up builds with ccache OBS-URL: https://build.opensuse.org/request/show/998575 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=4	2022-08-22 07:46:52 +00:00
John Vandenberg	34d29c598e	Accepting request 994163 from home:StefanBruens:branches:science - Update to v9.0.0 No (current) changelog provided - Spec file cleanup: * Remove lots of duplicate, unused, or wrong build dependencies * Do not package outdated Readmes and Changelogs - Enable tests, disable ones requiring external test data OBS-URL: https://build.opensuse.org/request/show/994163 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=3	2022-08-10 03:06:39 +00:00
Atri Bhattacharya	b913299c7b	Accepting request 849131 from home:jayvdb:branches:science - Update to v2.0.0 OBS-URL: https://build.opensuse.org/request/show/849131 OBS-URL: https://build.opensuse.org/package/show/science/apache-arrow?expand=0&rev=1	2020-11-18 08:11:49 +00:00

28 Commits