1539a8cfb2
- Update to 19.0.1
  ## Bug Fixes
  * [C++] Fix overflow issues for large build side in swiss join (#45108)
  * [C++][Fuzzing] Fix Negation bug discovered by fuzzing (#45181)
  * [C++][Parquet] Omit level histogram when max level is 0 (#45285)
  * [Parquet][C++] Fix statistics load logic for no row group and multiple row groups (#45350)
  * [C++] Disable Flight test (#45232)
  ## Improvements
  * [C++][Parquet] Improve performance of generating size statistics (#45202)
  * [C++][S3] Workaround compatibility issue between AWS SDK and MinIO (#45310)
- Release 19.0.0
  ## New Features and Improvements
  * [CI][C++] Add a nightly job to test offline build (#44721)
  * [C++] GcsFileSystem::Make should return Result (#44503)
  * [C++][Parquet] Implement SizeStatistics (#40594)
  * [C++] Reduce string inlining in Substrait serde (#45174)
  * [C++][Acero] Enhance asof_join to work in multi-threaded execution by sequencing input (#44083)
  * [C++] Support the AWS S3 SSE-C encryption (#43601)
  * [C++][Parquet] Parquet Metadata Printer supports print sort-columns (#43599)
  * [C++] Add C++ implementation of Async C Data Interface (#44495)
  * [C++][Acero] Support AVX2 swiss join decoding (#43832)
  * [C++] skip -0117 in StrptimeZoneOffset for old glibc (#44621)
  * [C++] Add arrow::RecordBatch::MakeStatisticsArray() (#44252)
Benjamin Greiner 2025-02-17 22:32:29 +00:00
22a0ee3370
Accepting request 1218457 from science
Ana Guerrero 2024-10-27 10:25:51 +00:00
20345967c9
- Set the appropriate C++ compiler for the given platform so it will compile on Leap 15.x.
Benjamin Greiner 2024-10-26 01:06:02 +00:00
6f40ca4abe
Accepting request 1201792 from science
Ana Guerrero 2024-09-22 09:05:54 +00:00
174a699a90
- Add apache-arrow-pr43766-boost1_86.patch for Boost 1.86
  * gh#apache/arrow#43766
Benjamin Greiner 2024-09-18 12:46:47 +00:00
ada1664357
- Update to 17.0.0
  ## Bug Fixes
  * [C++] Add option to string ‘center’ kernel to control left/right alignment on odd number of padding (#41449)
  * [C++][Python] Fix casting to extension type with fixed size list storage type (#42219)
  * [C++] Replace null_count with MayHaveNulls in ListArrayFromArray and MapArray (#41957)
  * [C++][Python] RecordBatch.filter() segfaults if passed a ChunkedArray (#40971)
  * [C++][Parquet] Timestamp conversion from Parquet to Arrow does not follow compatibility guidelines for convertedType
  * [C++] Use LargeStringArray for casting when writing tables to CSV (#40271)
  * [C++][Python] Map child Array constructed from keys and items shouldn’t have offset (#40871)
  * [C++] Fix compile warning with ‘implicitly-defined constructor does not initialize’ in encoding_benchmark (#41060)
  * [C++] Get null_bit_id according to are_cols_in_encoding_order in NullUpdateColumnToRow_avx2 (#40998)
  * [C++] Clean up unused parameter warnings (#41111)
  * [C++][Acero] Fix asof join race (#41614)
  * [C++] support for single threaded joins (#41125)
  * [C++] Fix hashjoin benchmark failed at make utf8’s random batches (#41195)
  * [C++] Check to avoid copying when NullBitmapBuffer is Null (#41452)
  * [C++] Fix crash on invalid Parquet file (#41366)
  * [C++][Parquet] More strict Parquet level checking (#41346)
  * [C++][Gandiva] Fix gandiva cache size env var (#41330)
  * [C++][CMake][Windows] Remove needless .dll suffix from link libraries (#41341)
  * [C++][CMake] Remove unused ARROW_NO_DEPRECATED_API (#41345)
  * [C++][maybe_unused] with Arrow macro (#41359)
  * [C++][Large] ListView and Map nested types for scalar_if_else’s kernel functions (#41419)
  * [C++][Gandiva] Fix ascii_utf8 function to return same result on x86 and Arm (#41434)
  * [C++] Reuse deduplication logic for direct registration (#41466)
  * [C++] Clean up more redundant move warnings (#41487)
  * [C++][Compute] Remove redundant logic for ArrayData as ExecResults in ExecScalarCaseWhen (#41380)
  * [C++][CMake] correctly use Protobuf_PROTOC_EXECUTABLE (#41582)
  * [C++][CMake] Fix ARROW_USE_BOOST detect condition (#41622)
  * [C++][Python] Add optional null_bitmap to MapArray::FromArrays (#41757)
  * [C++] macros.h: Fix ARROW_FORCE_INLINE for MSVC (#41712)
  * [C++][Acero] Remove an useless parameter for QueryContext::Init called in hash_join_benchmark (#41716)
  * [C++] Fix the issue that temp vector stack may be under sized (#41746)
  * [C++] Check that extension metadata key is present before attempting to delete it (#41763)
  * [C++] Iterator releases its resource immediately when it reads all values (#41824)
  * [C++][Flight][Benchmark] Ensure waiting server ready (#41793)
  * [C++] Fix avx2 gather offset larger than 2GB in CompareColumnsToRows (#42188)
  * [C++][S3] Fix potential deadlock when closing output stream (#41876)
  * [CI][C++] Clear cache for mamba on AppVeyor (#41977)
  * [CI][Python][C++] Fix utf8proc detection for wheel on Windows (#42022)
  * [C++] Support list-views on list_slice (#42067)
  * [C++] Fix an OTel test failure and remove needless logs (#42122)
  * [C++][FS][Azure] Ensure setting BlobSasBuilder::Protocol (#42108)
  * [C++] Support list-view typed arrays in array_take and array_filter (#42117)
  * [C++] Fix some potential uninitialized variable warnings (#42207)
  * [C++] Avoid invalid accesses in parquet-encoding-benchmark (#42141)
  * [C++] Use FetchContent for bundled ORC (#43011)
  * [C++] Fix GetRecordBatchPayload crashes for device data (#42199)
  * [C++] Use non-stale c-ares download URL (#42250)
  * [C++][Parquet] Check for valid ciphertext length to prevent segfault (#43071)
  * [C++][Compute] Mark KeyCompare.CompareColumnsToRowsLarge as large memory test (#43128)
  * [C++] Upgrade bundled google-cloud-cpp to 2.22.0 (#43136)
  ## New Features and Improvements
  * [C++][Compute] Implement Grouper::Reset (#41352)
  * [Go][C++] Implement Flight SQL Bulk Ingestion (#38385)
  * [C++][FS][Azure] Support azure cli auth (#41976)
  * [C++][FS][Azure] Add support for environment credential (#41715)
  * [C++] Optimize Take for fixed-size types including nested fixed-size lists (#41297)
  * [C++][Device] Add Copy/View slice functions to a CPU pointer (#41477)
  * [C++] Add support for OpenTelemetry logging (#39905)
  * [C++] Import/Export ArrowDeviceArrayStream (#40807)
  * [C++] move LocalFileSystem to the registry (#40356)
  * [C++] Make flatbuffers serialization more deterministic (#40392)
  * [C++][Gandiva] add RE2::Options set_dot_nl(true) for Like function (#40970)
  * [C++] Introduce portable compiler assumptions (#41021)
  * [C++] Add a grouper benchmark for preventing performance regression (#41036)
  * [C++] Support flatten for combining nested list related types (#41092)
  * [C++] Clean up remaining tasks related to half float casts (#41084)
  * [C++][FS][Azure] Add support for CopyFile with hierarchical namespace support (#41276)
  * [C++] Add is_validity_defined_by_bitmap() predicate (#41115)
  * [C++] IO: enhance boundary checking in CompressedInputStream (#41117)
  * [C++][Python] Expose recursive flatten for lists on list_flatten kernel function and pyarrow bindings (#41295)
  * [C++][Parquet][Doc] Denote PARQUET:field_id in parquet.rst (#41187)
  * [C++] Extract the kernel loops used for PrimitiveTakeExec and generalize to any fixed-width type (#41373)
  * [C++][Acero] Use per-node basis temp vector stack to mitigate overflow (#41335)
  * [C++][Parquet] Optimize DelimitRecords by batch execution when max_rep_level > 1 (#41362)
  * [C++][FS][Azure][Docs] Add AzureFileSystem to Filesystems API reference (#41411)
  * [C++] Use ASAN to poison temp vector stack memory (#41695)
  * [C++][S3] Add a new option to check existence before CreateDir (#41822)
  * [C++][Parquet] Fix DeltaLengthByteArrayEncoder::EstimatedDataEncodedSize (#41546)
  * [C++] Thirdparty: Upgrade xsimd to 13.0.0 (#41548)
  * [C++] Improve fixed_width_test_util.h (#41575)
  * [C++] ChunkResolver: Implement ResolveMany and add unit tests (#41561)
  * [C++] fixed_width_internal.h: Simplify docstring and support bit-sized types (BOOL) (#41597)
  * [C++][Python] Extends the add_key_value to parquet::arrow and PyArrow (#41633)
  * [C++][CMake][Windows] Don’t build needless object libraries (#41658)
  * [C++][Python] PrettyPrint non-cpu data by copying to default CPU device (#42010)
  * [C++][Parquet] Thrift: generate template method to accelerate reading thrift (#41703)
  * [C++][Parquet] Minor: moving EncodedStats by default rather than copying (#41727)
  * [C++][ORC] Ensure setting detected ORC version (#41767)
  * [C++][Parquet] Add file metadata read/write benchmark (#41761)
  * [C++] Make git-dependent definitions internal (#41781)
  * [C++][S3] Remove GetBucketRegion hack for newer AWS SDK versions (#41798)
  * [C++][Parquet] normalize dictionary encoding to use RLE_DICTIONARY (#41819)
  * [C++] IPC: Minor enhance the code of writer (#41900)
  * [C++] Fix ExecuteScalar deduce all_scalar with chunked_array (#41925)
  * [C++] Minor enhance code style for FixedShapeTensorType (#41954)
  * [C++] Follow up of adding null_bitmap to MapArray::FromArrays (#41956)
  * [C++] Misc changes making code around list-like types and list-view types behave the same way (#41971)
  * [C++] kernel.cc: Remove defaults on switch so that compiler can check full enum coverage for us (#41995)
  * [C++][Parquet] ParquetFilePrinter::JSONPrint print length of FLBA (#41981)
  * [C++][CMake] Add preset for Valgrind (#42110)
  * [C++] Move TakeXXX free functions into TakeMetaFunction and make them private (#42127)
  * [C++][FS][Azure] Validate AzureOptions::{blob,dfs}_storage_scheme (#42135)
  * [C++] list_parent_indices: Add support for list-view types (#42236)
  * [C++] Reduce the recursion of many-join test (#43042)
  * [C++] Limit buffer size in BufferedInputStream::SetBufferSize with raw_read_bound (#43064)
- Require cmake lz4 for 1.10
- Update to 17.0.0
  ## Bug Fixes
  * [C++][Python] Fix casting to extension type with fixed size list storage type (#42219)
  * [Python] Include metadata when creating pa.schema from PyCapsule (#41538)
  * [C++][Python] RecordBatch.filter() segfaults if passed a ChunkedArray (#40971)
  * [Python] pa.array: add check for byte-swapped numpy arrays inside python objects (#41549)
  * [Python] Fix read_table for encrypted parquet (#39438)
  * [Python] RunEndEncodedArray.from_arrays: bugfix for Array arguments (#40560) (#41093)
  * [C++][Python] Map child Array constructed from keys and items shouldn’t have offset (#40871)
  * [Python] test_numpy_array_protocol test failures with numpy 2.0.0rc1
  * [Python] Fix StructArray.sort() for by=None (#41495)
  * [Python] Build with Python 3.13 (#42034)
  * [Python] remove special methods related to buffers in python <2.6 (#41492)
  * [Python] Fix reading column index with decimal values (#41503)
  * [Docs][Python] Remove duplicate contents (#41588)
  * [C++][Python] Add optional null_bitmap to MapArray::FromArrays (#41757)
  * [Python][Parquet] Implement to_dict method on SortingColumn (#41704)
  * [Python] CMake: ignore Parquet encryption option if Parquet itself is not enabled (fix Java integration build) (#41776)
  * [Python] Disallow direct pa.RecordBatchReader() construction to avoid segfaults (#41773)
  * [Python] Fix RecordBatchReader.cast to support casting to equal schema for all types (#42098)
  * [Python] Fix tests when using NumPy 2.0 on Windows (#42099)
  * [CI][Python] Use pip install -e instead of setup.py build_ext --inplace for installing pyarrow on verification script (#42007)
  * [CI][Python][C++] Fix utf8proc detection for wheel on Windows (#42022)
  * [Python][CI] Update expected output for numpy 2.0.0 (#42172)
  ## New Features and Improvements
  * [Python] Replace pandas.util.testing.rands with vendored version (#42089)
  * [Python] begin moving static settings to pyproject.toml (#41041)
  * [Python] Implement PyCapsule interface for Device data in PyArrow (#40717)
  * [Python] Expand the Arrow PyCapsule Interface with C Device Data support (#40708)
  * [Python] Let RecordBatch.filter accept a boolean expression in addition to mask array (#43043)
  * [Python] Fix pickling of LocalFileSystem for cython 2 (#41459)
  * [Python] Expand the C Device Interface bindings to support import on CUDA device (#40385)
  * [Python] Allow passing a mapping of column names to rename_columns (#40645)
  * [Python][Packaging] Strip unnecessary symbols when building wheels (#42028)
  * [Python][Docs] Update PyArrow installation docs for conda package split (#41135)
  * [Python] Basic bindings for Device and MemoryManager classes (#41685)
  * [C++][Python] Expose recursive flatten for lists on list_flatten kernel function and pyarrow bindings (#41295)
  * [Python][Packaging] Ensure to build with released numpy 2.0 (instead of RC) in the wheel building workflows (#42194)
  * [CI][Python] Add a job on ARM64 macOS (#41313)
  * [CI][Python] Reduce CI time on macOS (#41378)
  * [Python] Expose byte_width and bit_width of ExtensionType in terms of the storage type (#41413)
  * [Python] Update Python development guide about components being enabled by default based on Arrow C++ (#41705)
  * [Python] Building PyArrow: enable/disable python components by default based on availability in Arrow C++ (#41494)
  * [C++][Python] Extends the add_key_value to parquet::arrow and PyArrow (#41633)
  * [Python] Ensure Buffer methods don’t crash with non-CPU data (#41889)
  * [C++][Python] PrettyPrint non-cpu data by copying to default CPU device (#42010)
  * [Python][Parquet] Update BYTE_STREAM_SPLIT description in write_table() docstring (#41759)
  * [Python] Add support for Pyodide (#37822)
  * [Python] Fix pandas tests to follow downstream datetime64 unit changes (#41979)
  * [Python] Allow Array.filter() to take general array input (#42051)
  * [Python] Expose new FLOAT16 logical type in the pyarrow.parquet bindings (#42103)
  * [Python] Array gracefully fails on non-cpu device (#42113)
  * [Python][Parquet] Pyarrow store decimal as integer (#42169)
  * [Python] Add CI job for Numpy 1.X (#42189)
  * [CI][Python] Pin openjdk=17 in python substrait integration (#43051)
- Drop pyarrow-pr41319-numpy2-tests.patch
- Add pyarrow-pr433325-extradirs.patch gh#apache/arrow/pull/43325
Benjamin Greiner 2024-08-15 09:43:24 +00:00
9c4175a075
Accepting request 1170145 from science
Ana Guerrero 2024-04-25 18:50:23 +00:00