python-pyarrow/python-pyarrow.spec


#
# spec file for package python-pyarrow
#
# Copyright (c) 2023 SUSE LLC
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.
# Please submit bugfixes or comments via https://bugs.opensuse.org/
#
%bcond_with xsimd
%define plainpython python
Name: python-pyarrow
Accepting request 1109687 from home:bnavigator:branches:devel:languages:python:numeric - Update to 13.0.0 ## Compatibility notes: * The default format version for Parquet has been bumped from 2.4 to 2.6 GH-35746. In practice, this means that nanosecond timestamps now preserve its resolution instead of being converted to microseconds. * Support for Python 3.7 is dropped GH-34788 ## New features: * Conversion to non-nano datetime64 for pandas >= 2.0 is now supported GH-33321 * Write page index is now supported GH-36284 * Bindings for reading JSON format in Dataset are added GH-34216 * keys_sorted property of MapType is now exposed GH-35112 ## Other improvements: * Common python functionality between Table and RecordBatch classes has been consolidated ( GH-36129, GH-35415, GH-35390, GH-34979, GH-34868, GH-31868) * Some functionality for FixedShapeTensorType has been improved (__reduce__ GH-36038, picklability GH-35599) * Pyarrow scalars can now be accepted in the array constructor GH-21761 * DataFrame Interchange Protocol implementation and usage is now documented GH-33980 * Conversion between Arrow and Pandas for map/pydict now has enhanced support GH-34729 * Usability of pc.map_lookup / MapLookupOptions is improved GH-36045 * zero_copy_only keyword can now also be accepted in ChunkedArray.to_numpy() GH-34787 * Python C++ codebase now has linter support in Archery and the CI GH-35485 ## Relevant bug fixes: * __array__ numpy conversion for Table and RecordBatch is now corrected so that np.asarray(pa.Table) doesn’t return a transposed result GH-34886 * parquet.write_to_dataset doesn’t create empty files for non-observed dictionary (category) values anymore GH-23870 * Dataset writer now also correctly follows default Parquet version of 2.6 GH-36537 * Comparing pyarrow.dataset.Partitioning with other type is now correctly handled GH-36659 * Pickling of pyarrow.dataset PartitioningFactory objects is now supported GH-34884 * None schema is now disallowed in parquet writer 
GH-35858 * pa.FixedShapeTensorArray.to_numpy_ndarray is not failing on sliced arrays GH-35573 * Halffloat type is now supported in the conversion from Arrow list to pandas GH-36168 * __from_arrow__ is now also implemented for Array.to_pandas for pandas extension data types GH-36096 - Add pyarrow-pr37481-pandas2.1.patch gh#apache/arrow#37481 OBS-URL: https://build.opensuse.org/request/show/1109687 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-pyarrow?expand=0&rev=13
2023-09-08 07:19:13 +00:00
Version: 13.0.0
Release: 0
Summary: Python library for Apache Arrow
License: Apache-2.0 AND BSD-3-Clause AND BSD-2-Clause AND MIT
Group: Development/Languages/Python
URL: https://arrow.apache.org/
Source0: https://github.com/apache/arrow/archive/apache-arrow-%{version}.tar.gz
Source99: python-pyarrow.rpmlintrc
# PATCH-FIX-UPSTREAM pyarrow-pr37481-pandas2.1.patch gh#apache/arrow#37481
Patch0: pyarrow-pr37481-pandas2.1.patch
BuildRequires: %{python_module Cython >= 0.29.31 with %python-Cython < 3}
BuildRequires: %{python_module devel >= 3.8}
BuildRequires: %{python_module numpy-devel >= 1.16.6}
BuildRequires: %{python_module pip}
BuildRequires: %{python_module setuptools_scm}
BuildRequires: %{python_module setuptools}
BuildRequires: %{python_module wheel}
Accepting request 1087838 from home:bnavigator:pyarrow - Update to 12.0.0 ## Compatibility notes: * Plasma has been removed in this release (GH-33243). In addition, the deprecated serialization module in PyArrow was also removed (GH-29705). IPC (Inter-Process Communication) functionality of pyarrow or the standard library pickle should be used instead. * The deprecated use_async keyword has been removed from the dataset module (GH-30774) * Minimum Cython version to build PyArrow from source has been raised to 0.29.31 (GH-34933). In addition, PyArrow can now be compiled using Cython 3 (GH-34564). ## New features: * A new pyarrow.acero module with initial bindings for the Acero execution engine has been added (GH-33976) * A new canonical extension type for fixed shaped tensor data has been defined. This is exposed in PyArrow as the FixedShapeTensorType (GH-34882, GH-34956) * Run-End Encoded arrays binding has been implemented (GH-34686, GH-34568) * Method is_nan has been added to Array, ChunkedArray and Expression (GH-34154) * Dataframe interchange protocol has been implemented for RecordBatch (GH-33926) ## Other improvements: * Extension arrays can now be concatenated (GH-31868) * get_partition_keys helper function is implemented in the dataset module to access the partitioning field’s key/value from the partition expression of a certain dataset fragment (GH-33825) OBS-URL: https://build.opensuse.org/request/show/1087838 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-pyarrow?expand=0&rev=4
2023-05-18 17:02:23 +00:00
BuildRequires: apache-arrow-acero-devel-static = %{version}
BuildRequires: apache-arrow-dataset-devel-static = %{version}
BuildRequires: apache-arrow-devel = %{version}
BuildRequires: apache-arrow-devel-static = %{version}
BuildRequires: apache-parquet-devel = %{version}
BuildRequires: apache-parquet-devel-static = %{version}
BuildRequires: cmake
BuildRequires: fdupes
BuildRequires: gcc-c++
BuildRequires: libzstd-devel-static
BuildRequires: openssl-devel
BuildRequires: pkgconfig
BuildRequires: python-rpm-macros
BuildRequires: cmake(re2)
BuildRequires: pkgconfig(bzip2) >= 1.0.8
BuildRequires: pkgconfig(gmock) >= 1.10
BuildRequires: pkgconfig(gtest) >= 1.10
Requires: python-numpy >= 1.16.6
# SECTION test requirements
BuildRequires: %{python_module hypothesis}
BuildRequires: %{python_module pandas}
BuildRequires: %{python_module pytest-lazy-fixture}
BuildRequires: %{python_module pytest-xdist}
BuildRequires: %{python_module pytest}
# /SECTION
%python_subpackages
%description
Python library for Apache Arrow.

Apache Arrow defines a language-independent columnar
memory format for flat and hierarchical data, organized
for efficient analytic operations on modern hardware like
CPUs and GPUs. The Arrow memory format also supports
zero-copy reads for lightning-fast data access without
serialization overhead.

Arrow's libraries implement the format and provide building
blocks for a range of use cases, including high-performance
analytics. Many popular projects use Arrow to ship columnar
data efficiently or as the basis for analytic engines.
%package devel
Summary: Python library for Apache Arrow - header files
Requires: python-Cython
Requires: python-pyarrow = %{version}
Requires: %plainpython(abi) = %python_version
Supplements: (python-devel and python-pyarrow)
%description devel
Python library for Apache Arrow.

This package provides the header files within the Python
platlib for modules that consume pyarrow through Cython.
%prep
%autosetup -p1 -n arrow-apache-arrow-%{version}
# we disabled the jemalloc backend in apache-arrow
sed -i 's/should_have_jemalloc = sys.platform == "linux"/should_have_jemalloc = False/' python/pyarrow/tests/test_memory.py
%build
pushd python
export CFLAGS="%{optflags}"
export PYARROW_BUILD_TYPE=relwithdebinfo
export PYARROW_BUILD_VERBOSE=1
%{?_smp_build_ncpus:export PYARROW_PARALLEL=%{_smp_build_ncpus}}
export PYARROW_WITH_HDFS=1
export PYARROW_WITH_DATASET=1
export PYARROW_WITH_PARQUET=1
export PYARROW_WITH_PARQUET_ENCRYPTION=0
export PYARROW_PARQUET_USE_SHARED=1
# The x86_64-v1 baseline lacks the advanced SIMD instructions. Tumbleweed is stuck on that
# baseline, and we can't provide -v3 through hwcaps because this is not a shared library package.
export PYARROW_CMAKE_OPTIONS=" \
%ifarch aarch64
-DARROW_SIMD_LEVEL:STRING=%{?with_xsimd:NEON}%{!?with_xsimd:NONE} \
%else
-DARROW_SIMD_LEVEL:STRING=NONE \
%endif
-DARROW_RUNTIME_SIMD_LEVEL:STRING=%{?with_xsimd:MAX}%{!?with_xsimd:NONE} \
"
%pyproject_wheel
popd
%install
pushd python
%pyproject_install
%python_expand %fdupes %{buildroot}%{$python_sitearch}
popd
%check
# Unexpected additional warning
donttest="test_env_var"
# flaky
donttest="$donttest or test_total_bytes_allocated"
%ifarch %{ix86} %{arm32}
# these tests exercise conversion to 64-bit datatypes
donttest="$donttest or test_conversion"
donttest="$donttest or test_dictionary_to_numpy"
donttest="$donttest or test_foreign_buffer"
donttest="$donttest or test_from_numpy_nested"
donttest="$donttest or test_integer_limits"
donttest="$donttest or test_memory_map_large_seeks"
donttest="$donttest or test_primitive_serialization"
donttest="$donttest or test_python_file_large_seeks"
donttest="$donttest or test_schema_sizeof"
%endif
# run the test suite, deselecting every test collected in $donttest
%pytest_arch --pyargs pyarrow -n auto -k "not ($donttest)"
# also run the deselected tests, but ignore any failures
%pytest_arch --pyargs pyarrow -n auto -k "$donttest" || :
%files %{python_files}
%doc README.md
%license LICENSE.txt NOTICE.txt
%{python_sitearch}/pyarrow
%exclude %{python_sitearch}/pyarrow/include
%exclude %{python_sitearch}/pyarrow/src
%exclude %{python_sitearch}/pyarrow/lib.h
%exclude %{python_sitearch}/pyarrow/lib_api.h
%{python_sitearch}/pyarrow-%{version}.dist-info
%files %{python_files devel}
%doc README.md
%license LICENSE.txt NOTICE.txt
%{python_sitearch}/pyarrow/include
%{python_sitearch}/pyarrow/src
%{python_sitearch}/pyarrow/lib.h
%{python_sitearch}/pyarrow/lib_api.h
%changelog