-------------------------------------------------------------------
Mon Nov 13 23:51:00 UTC 2023 - Ondřej Súkup <mimi.vx@gmail.com>
- update to 14.0.1
* GH-38431 - [Python][CI] Update fs.type_name checks for s3fs tests
* GH-38607 - [Python] Disable PyExtensionType autoload
- update to 14.0.0
* very long list of changes can be found here:
https://arrow.apache.org/release/14.0.0.html
-------------------------------------------------------------------
Fri Aug 25 09:05:09 UTC 2023 - Ben Greiner <code@bnavigator.de>
- Update to 13.0.0
## Acero
* Handling of unaligned buffers in input nodes can be configured
programmatically or by setting the environment variable
ACERO_ALIGNMENT_HANDLING. The default behavior is to warn when
an unaligned buffer is detected GH-35498.
## Compute
* Several new functions have been added:
- aggregate functions “first”, “last”, “first_last” GH-34911;
- vector functions “cumulative_prod”, “cumulative_min”,
“cumulative_max” GH-32190;
- vector function “pairwise_diff” GH-35786.
* Sorting now works on dictionary arrays, with a much better
performance than the naive approach of sorting the decoded
dictionary GH-29887. Sorting also works on struct arrays, and
nested sort keys are supported using FieldRef GH-33206.
* The check_overflow option has been removed from
CumulativeSumOptions as it was redundant with the availability
of two different functions: “cumulative_sum” and
“cumulative_sum_checked” GH-35789.
* Run-end encoded filters are efficiently supported GH-35749.
* Duration types are supported with the “is_in” and “index_in”
functions GH-36047. They can be multiplied with all integer
types GH-36128.
* “is_in” and “index_in” now cast their inputs more flexibly:
they first attempt to cast the value set to the input type,
then in the other direction if the former fails GH-36203.
* Multiple bugs have been fixed in “utf8_slice_codeunits” when
the stop option is omitted GH-36311.
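
The new vector functions listed above can be illustrated with a small
pure-Python sketch. This is not Arrow's implementation or API: the helper
names simply mirror the kernel names, inputs are assumed to be non-null
integers, and pairwise_diff is assumed to take a positive period and to
yield null (None here) for positions that have no predecessor.

```python
from itertools import accumulate
import operator


def cumulative_prod(values):
    # Running product: element i is the product of values[0..i].
    return list(accumulate(values, operator.mul))


def cumulative_min(values):
    # Running minimum: element i is the smallest of values[0..i].
    return list(accumulate(values, min))


def cumulative_max(values):
    # Running maximum: element i is the largest of values[0..i].
    return list(accumulate(values, max))


def pairwise_diff(values, period=1):
    # Difference between each element and the one `period` positions
    # earlier; the first `period` positions have no predecessor.
    # Assumption for this sketch: period >= 1.
    return [
        None if i < period else values[i] - values[i - period]
        for i in range(len(values))
    ]


if __name__ == "__main__":
    print(cumulative_prod([1, 2, 3, 4]))
    print(cumulative_min([3, 1, 2]))
    print(cumulative_max([1, 3, 2]))
    print(pairwise_diff([1, 4, 9, 16]))
```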
## Dataset
* A custom schema can now be passed when writing a dataset
GH-35730. The custom schema can alter nullability or metadata
information, but is not allowed to change the datatypes
written.
## Filesystems
* The S3 filesystem now writes files in equal-sized chunks, for
compatibility with Cloudflare's “R2” Storage GH-34363.
* A long-standing issue where S3 support could crash at shutdown
because of resources still being alive after S3 finalization
has been fixed GH-36346. Now, attempts to use S3 resources
(such as making filesystem calls) after S3 finalization should
result in a clean error.
* The GCS filesystem accepts a new option to set the project id
GH-36227.
## IPC
* Nullability and metadata information for sub-fields of map
types is now preserved when deserializing Arrow IPC GH-35297.
## Orc
* The Orc adapter now maps Arrow field metadata to Orc type
attributes when writing, and vice-versa when reading GH-35304.
## Parquet
* It is now possible to write additional metadata while a
ParquetFileWriter is open GH-34888.
* Writing a page index can be enabled selectively per-column
GH-34949. In addition, page header statistics are not written
anymore if the page index is enabled for the given column
GH-34375, as the information would be redundant and less
efficiently accessed.
* Parquet writer properties allow specifying the sorting columns
GH-35331. The user is responsible for ensuring that the data
written to the file actually complies with the given sorting.
* CRC computation has been implemented for v2 data pages
GH-35171. It was already implemented for v1 data pages.
* Writing compliant nested types is now enabled by default
GH-29781. This should not have any negative implication.
* Attempting to load a subset of an Arrow extension type is now
forbidden GH-20385. Previously, if an extension type's storage
is nested (for example a “Point” extension type backed by a
struct<x: float64, y: float64>), it was possible to load
selectively some of the columns of the storage type.
## Substrait
* Support for various functions has been added: “stddev”,
“variance”, “first”, “last” (GH-35247, GH-35506).
* Deserializing sorts is now supported GH-32763. However, some
features, such as clustered sort direction or custom sort
functions, are not implemented.
## Miscellaneous
* FieldRef sports additional methods to get a flattened version
of nested fields GH-14946. Compared to their non-flattened
counterparts, the methods GetFlattened, GetAllFlattened,
GetOneFlattened and GetOneOrNoneFlattened combine a child's
null bitmap with its ancestors' null bitmaps so as to compute
the field's overall logical validity bitmap.
* In other words, given the struct array [null, {'x': null},
{'x': 5}], FieldRef("x")::Get might return [0, null, 5] while
FieldRef("x")::GetFlattened will always return [null, null, 5].
* Scalar::hash() has been fixed for sliced nested arrays
GH-35360.
* A new floating-point to decimal conversion algorithm exhibits
much better precision GH-35576.
* It is now possible to cast between scalars of different
list-like types GH-36309.
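
The difference between the flattened and non-flattened FieldRef lookups
described above can be sketched in plain Python. This is only a model of
the validity-bitmap logic, not Arrow code: the struct array is represented
by hypothetical parallel lists (parent validity, child values, child
validity), and the value 0 stored under the null parent row is an assumed
placeholder.

```python
def render(values, valid):
    # Show a column as a list, with None wherever the validity bit is unset.
    return [v if ok else None for v, ok in zip(values, valid)]


def get(child_values, child_valid):
    # Non-flattened lookup: only the child's own validity bitmap is used,
    # so a slot under a null parent may expose a leftover value.
    return render(child_values, child_valid)


def get_flattened(parent_valid, child_values, child_valid):
    # Flattened lookup: a slot is valid only if the child AND its parent
    # are both valid.
    combined = [p and c for p, c in zip(parent_valid, child_valid)]
    return render(child_values, combined)


# Model of the struct array [null, {'x': null}, {'x': 5}].
parent_valid = [False, True, True]
x_values = [0, 0, 5]          # 0 is an arbitrary placeholder in null slots
x_valid = [True, False, True]

if __name__ == "__main__":
    print(get(x_values, x_valid))
    print(get_flattened(parent_valid, x_values, x_valid))
```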
-------------------------------------------------------------------
Mon Jun 12 12:13:18 UTC 2023 - Ben Greiner <code@bnavigator.de>
- Update to 12.0.1
* [GH-35423] - [C++][Parquet] Parquet PageReader Force
decompression buffer resize smaller (#35428)
* [GH-35498] - [C++] Relax EnsureAlignment check in Acero from
requiring 64-byte aligned buffers to requiring value-aligned
buffers (#35565)
* [GH-35519] - [C++][Parquet] Fixing exception handling in parquet
FileSerializer (#35520)
* [GH-35538] - [C++] Remove unnecessary status.h include from
protobuf (#35673)
* [GH-35730] - [C++] Add the ability to specify custom schema on a
dataset write (#35860)
* [GH-35850] - [C++] Don't disable optimization with
RelWithDebInfo (#35856)
- Drop cflags.patch -- fixed upstream
-------------------------------------------------------------------
Thu May 18 07:00:43 UTC 2023 - Ben Greiner <code@bnavigator.de>
- Update to 12.0.0
* Run-End Encoded Arrays have been implemented and are accessible
(GH-32104)
* The FixedShapeTensor Logical value type has been implemented
using ExtensionType (GH-15483, GH-34796)
## Compute
* New kernel to convert timestamp with timezone to wall time
(GH-33143)
* Cast kernels are now built into libarrow by default (GH-34388)
## Acero
* Acero has been moved out of libarrow into its own shared
library, allowing for smaller builds of the core libarrow
(GH-15280)
* Exec nodes now can have a concept of “ordering” and will reject
non-sensible plans (GH-34136)
* New exec nodes: “pivot_longer” (GH-34266), “order_by”
(GH-34248) and “fetch” (GH-34059)
* Breaking Change: Reorder output fields of “group_by” node so
that keys/segment keys come before aggregates (GH-33616)
## Substrait
* Add support for the round function GH-33588
* Add support for the cast expression element GH-31910
* Added API reference documentation GH-34011
* Added an extension relation to support segmented aggregation
GH-34626
* The output of the aggregate relation now conforms to the spec
GH-34786
## Parquet
* Added support for DeltaLengthByteArray encoding to the Parquet
writer (GH-33024)
* NaNs are correctly handled now for Parquet predicate push-downs
(GH-18481)
* Added support for reading Parquet page indexes (GH-33596) and
writing page indexes (GH-34053)
* Parquet writer can write columns in parallel now (GH-33655)
* Fixed incorrect number of rows in Parquet V2 page headers
(GH-34086)
* Fixed incorrect Parquet page null_count when stats are disabled
(GH-34326)
* Added support for reading BloomFilters to the Parquet Reader
(GH-34665)
* Parquet File-writer can now add additional key-value metadata
after it has been opened (GH-34888)
* Breaking Change: The default row group size for the Arrow
writer changed from 64Mi rows to 1Mi rows. GH-34280
## ORC
* Added support for the union type in ORC writer (GH-34262)
* Fixed ORC CHAR type mapping with Arrow (GH-34823)
* Fixed timestamp type mapping between ORC and arrow (GH-34590)
## Datasets
* Added support for reading JSON datasets (GH-33209)
* Dataset writer now supports specifying a function callback to
construct the file name in addition to the existing file name
template (GH-34565)
## Filesystems
* GcsFileSystem::OpenInputFile avoids unnecessary downloads
(GH-34051)
## Other changes
* Convenience Append(std::optional...) methods have been added to
array builders
(GH-14863)
* A deprecated OpenTelemetry header was removed from the Flight
library (GH-34417)
* Fixed crash in “take” kernels on ExtensionArrays with an
underlying dictionary type (GH-34619)
* Fixed bug where the C-Data bridge did not preserve nullability
of map values on import (GH-34983)
* Added support for EqualOptions to RecordBatch::Equals
(GH-34968)
* zstd dependency upgraded to v1.5.5 (GH-34899)
* Improved handling of “logical” nulls such as with union and
RunEndEncoded arrays (GH-34361)
* Fixed incorrect handling of uncompressed body buffers in IPC
reader, added IpcWriteOptions::min_space_savings for optional
compression optimizations (GH-15102)
-------------------------------------------------------------------
Mon Apr 3 11:09:06 UTC 2023 - Andreas Schwab <schwab@suse.de>
- cflags.patch: fix option order to compile with optimisation
- Adjust constraints
-------------------------------------------------------------------
Wed Mar 29 13:13:13 UTC 2023 - Ben Greiner <code@bnavigator.de>
- Remove gflags-static. It was only needed due to a packaging error
with gflags which is about to be fixed in Tumbleweed
- Disable build of the jemalloc memory pool backend
* It requires every consuming application to LD_PRELOAD
libjemalloc.so.2, even when it is not set as the default memory
pool, due to static TLS block allocation errors
* Usage of the bundled jemalloc as a workaround is not desired
(gh#apache/arrow#13739)
* jemalloc does not seem to have a clear advantage over the
system glibc allocator:
https://ursalabs.org/blog/2021-r-benchmarks-part-1
* This overrides the default behavior documented in
https://arrow.apache.org/docs/cpp/memory.html#default-memory-pool
-------------------------------------------------------------------
Sun Mar 12 04:28:52 UTC 2023 - Ben Greiner <code@bnavigator.de>
- Update to v11.0.0
* ARROW-4709 - [C++] Optimize for ordered JSON fields (#14100)
* ARROW-11776 - [C++][Java] Support parquet write from ArrowReader
to file (#14151)
* ARROW-13938 - [C++] Date and datetime types should autocast from
strings
* ARROW-14161 - [C++][Docs] Improve Parquet C++ docs (#14018)
* ARROW-14999 - [C++] Optional field name equality checks for map
and list type (#14847)
* ARROW-15538 - [C++] Expanding coverage of math functions from
Substrait to Acero (#14434)
* ARROW-15592 - [C++] Add support for custom output field names in
a substrait::PlanRel (#14292)
* ARROW-15732 - [C++] Do not use any CPU threads in execution plan
when use_threads is false (#15104)
* ARROW-16782 - [Format] Add REE definitions to FlatBuffers
(#14176)
* ARROW-17144 - [C++][Gandiva] Add sqrt function (#13656)
* ARROW-17301 - [C++] Implement compute function "binary_slice"
(#14550)
* ARROW-17509 - [C++] Simplify async scheduler by removing the
need to call End (#14524)
* ARROW-17520 - [C++] Implement SubStrait SetRel (UnionAll)
(#14186)
* ARROW-17610 - [C++] Support additional source types in
SourceNode (#14207)
* ARROW-17613 - [C++] Add function execution API for a
preconfigured kernel (#14043)
* ARROW-17640 - [C++] Add File Handling Test cases for GlobFile
handling in Substrait Read (#14132)
* ARROW-17798 - [C++][Parquet] Add DELTA_BINARY_PACKED encoder to
Parquet writer (#14191)
* ARROW-17825 - [C++] Allow the possibility to write several
tables in ORCFileWriter (#14219)
* ARROW-17836 - [C++] Allow specifying alignment of buffers
(#14225)
* ARROW-17837 - [C++][Acero] Create ExecPlan-owned QueryContext
that will store a plan's shared data structures (#14227)
* ARROW-17859 - [C++] Use self-pipe in signal-receiving StopSource
(#14250)
* ARROW-17867 - [C++][FlightRPC] Expose bulk parameter binding in
Flight SQL (#14266)
* ARROW-17932 - [C++] Implement streaming RecordBatchReader for
JSON (#14355)
* ARROW-17960 - [C++][Python] Implement list_slice kernel (#14395)
* ARROW-17966 - [C++] Adjust to new format for Substrait optional
arguments (#14415)
* ARROW-17975 - [C++] Create at-fork facility (#14594)
* ARROW-17980 - [C++] As-of-Join Substrait extension (#14485)
* ARROW-17989 - [C++][Python] Enable struct_field kernel to accept
string field names (#14495)
* ARROW-18008 - [Python][C++] Add use_threads to
run_substrait_query
* ARROW-18051 - [C++] Enable tests skipped by ARROW-16392 (#14425)
* ARROW-18095 - [CI][C++][MinGW] All tests exited with 0xc0000139
* ARROW-18113 - [C++] Add RandomAccessFile::ReadManyAsync (#14723)
* ARROW-18135 - [C++] Avoid warnings that ExecBatch::length may be
uninitialized (#14480)
* ARROW-18144 - [C++] Improve JSONTypeError error message in
testing (#14486)
* ARROW-18184 - [C++] Improve JSON parser benchmarks (#14552)
* ARROW-18206 - [C++][CI] Add a nightly build for C++20
compilation (#14571)
* ARROW-18235 - [C++][Gandiva] Fix the like function
implementation for escape chars (#14579)
* ARROW-18249 - [C++] Update vcpkg port to arrow 10.0.0
* ARROW-18253 - [C++][Parquet] Add additional bounds safety checks
(#14592)
* ARROW-18259 - [C++][CMake] Add support for system Thrift CMake
package (#14597)
* ARROW-18280 - [C++][Python] Support slicing to end in list_slice
kernel (#14749)
* ARROW-18282 - [C++][Python] Support step >= 1 in list_slice
kernel (#14696)
* ARROW-18287 - [C++][CMake] Add support for Brotli/utf8proc
provided by vcpkg (#14609)
* ARROW-18342 - [C++] AsofJoinNode support for Boolean data field
(#14658)
* ARROW-18350 - [C++] Use std::to_chars instead of std::to_string
(#14666)
* ARROW-18367 - [C++] Enable the creation of named table relations
(#14681)
* ARROW-18373 - Fix component drop-down, add license text (#14688)
* ARROW-18377 - MIGRATION: Automate component labels from issue
form content (#15245)
* ARROW-18395 - [C++] Move select-k implementation into separate
module
* ARROW-18402 - [C++] Expose DeclarationInfo (#14765)
* ARROW-18406 - [C++] Can't build Arrow with Substrait on Ubuntu
20.04 (#14735)
* ARROW-18409 - [GLib][Plasma] Suppress deprecated warning in
building plasma-glib (#14739)
* ARROW-18413 - [C++][Parquet] Expose page index info from
ColumnChunkMetaData (#14742)
* ARROW-18419 - [C++] Update vendored fast_float (#14817)
* ARROW-18420 - [C++][Parquet] Introduce ColumnIndex & OffsetIndex
(#14803)
* ARROW-18421 - [C++][ORC] Add accessor for stripe information in
reader (#14806)
* ARROW-18427 - [C++] Support negative tolerance in AsofJoinNode
(#14934)
* ARROW-18435 - [C++][Java] Update ORC to 1.8.1 (#14942)
* GH-14869 - [C++] Add Cflags.private defining _STATIC to .pc.in.
(#14900)
* GH-14920 - [C++][CMake] Add missing -latomic to Arrow CMake
package (#15251)
* GH-14937 - [C++] Add rank kernel benchmarks (#14938)
* GH-14951 - [C++][Parquet] Add benchmarks for DELTA_BINARY_PACKED
encoding (#15140)
* GH-15072 - [C++] Move the round functionality into a separate
module (#15073)
* GH-15074 - [Parquet][C++] change 16-bit page_ordinal to 32-bit
(#15182)
* GH-15096 - [C++] Substrait ProjectRel Emit Optimization (#15097)
* GH-15100 - [C++][Parquet] Add benchmark for reading strings from
Parquet (#15101)
* GH-15151 - [C++] Adding RecordBatchReaderSource to solve an
issue in R API (#15183)
* GH-15185 - [C++][Parquet] Improve documentation for Parquet
Reader column_indices (#15184)
* GH-15199 - [C++][Substrait] Allow
AGGREGATION_INVOCATION_UNSPECIFIED as valid invocation (#15198)
* GH-15200 - [C++] Created benchmarks for round kernels. (#15201)
* GH-15216 - [C++][Parquet] Parquet writer accepts RecordBatch
(#15240)
* GH-15226 - [C++] Add DurationType to hash kernels (#33685)
* GH-15237 - [C++] Add ::arrow::Unreachable() using
std::string_view (#15238)
* GH-15239 - [C++][Parquet] Parquet writer writes decimal as
int32/64 (#15244)
* GH-15290 - [C++][Compute] Optimize IfElse kernel AAS/ASA case
when the scalar is null (#15291)
* GH-33607 - [C++] Support optional additional arguments for
inline visit functions (#33608)
* GH-33657 - [C++] arrow-dataset.pc doesn't depend on parquet.pc
without ARROW_PARQUET=ON (#33665)
* PARQUET-2179 - [C++][Parquet] Add a test for skipping repeated
fields (#14366)
* PARQUET-2188 - [parquet-cpp] Add SkipRecords API to RecordReader
(#14142)
* PARQUET-2204 - [parquet-cpp] TypedColumnReaderImpl::Skip should
reuse scratch space (#14509)
* PARQUET-2206 - [parquet-cpp] Microbenchmark for ColumnReader
ReadBatch and Skip (#14523)
* PARQUET-2209 - [parquet-cpp] Optimize skip for the case that
number of values to skip equals page size (#14545)
* PARQUET-2210 - [C++][Parquet] Skip pages based on header
metadata using a callback (#14603)
* PARQUET-2211 - [C++] Print ColumnMetaData.encoding_stats field
(#14556)
- Remove unused python3-arrow package declaration
* Add options as recommended for python support
- Provide test data for unittests
- Don't use system jemalloc but bundle it in order to avoid
static TLS errors in consuming packages like python-pyarrow
* gh#apache/arrow#13739
-------------------------------------------------------------------
Sun Aug 28 19:30:50 UTC 2022 - Stefan Brüns <stefan.bruens@rwth-aachen.de>
- Revert ccache change, using ccache in a pristine buildroot
just slows down OBS builds (use --ccache for local builds).
- Remove unused gflags-static-devel dependency.
-------------------------------------------------------------------
Mon Aug 22 06:22:43 UTC 2022 - John Vandenberg <jayvdb@gmail.com>
- Speed up builds with ccache
-------------------------------------------------------------------
Sat Aug 6 01:59:08 UTC 2022 - Stefan Brüns <stefan.bruens@rwth-aachen.de>
- Update to v9.0.0
No (current) changelog provided
- Spec file cleanup:
* Remove lots of duplicate, unused, or wrong build dependencies
* Do not package outdated Readmes and Changelogs
- Enable tests, disable ones requiring external test data
-------------------------------------------------------------------
Sat Nov 14 09:07:59 UTC 2020 - John Vandenberg <jayvdb@gmail.com>
- Update to v2.0.0
-------------------------------------------------------------------
Wed Nov 13 21:14:00 UTC 2019 - TheBlackCat <toddrme2178@gmail.com>
- Initial spec for v0.12.0