Dirk Mueller
548c6cc76f
* More general timestamp units (#874)
* ReadTheDocs V2 (#871)
* Better roundtrip dtypes (#861, 859)
* No convert when computing bytes-per-item for str (#858)
- Add patch to fix the test test_delta_from_def_2 on
* row-level filtering of the data. Whereas previously, only full row-groups could be excluded on the basis of their parquet metadata statistics (if present), filtering can now be done within row-groups too. The syntax is the same as before, allowing for multiple column expressions to be combined with AND|OR, depending on the list structure. This mechanism requires two passes: one to load the columns needed to create the boolean mask, and another to load the columns actually needed in the output. This will not be faster, and may be slower, but in some cases it can significantly reduce the memory footprint, if only a small fraction of rows are considered good and the columns for the filter expression are not in the output.
* DELTA integer encoding (read-only): experimentally working, but we only have one test file to verify against, since it is not trivial to persuade Spark to produce files encoded this way. DELTA can be an extremely compact representation for
* nanosecond resolution times: the new extended "logical" types system supports nanoseconds alongside the previous millis and micros. We now emit these for the default pandas time type, and produce full parquet schema including both "converted" and "logical" type information. Note that all output has isAdjustedToUTC=True, i.e., these are timestamps rather than local time. The time-zone is stored in the metadata, as before, and will be successfully recreated only in fastparquet
OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=52
87 lines
3.2 KiB
RPMSpec
#
# spec file for package python-fastparquet
#
# Copyright (c) 2023 SUSE LLC
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.

# Please submit bugfixes or comments via https://bugs.opensuse.org/
#


Name: python-fastparquet
Version: 2023.8.0
Release: 0
Summary: Python support for Parquet file format
License: Apache-2.0
URL: https://github.com/dask/fastparquet/
# Use the GitHub archive because it contains the test modules and data; this requires setting the version manually for setuptools_scm
Source: https://github.com/dask/fastparquet/archive/%{version}.tar.gz#/fastparquet-%{version}.tar.gz
BuildRequires: %{python_module Cython >= 0.29.23}
BuildRequires: %{python_module base >= 3.8}
BuildRequires: %{python_module cramjam >= 2.3.0}
# version requirement not declared for runtime, but necessary for tests.
BuildRequires: %{python_module fsspec >= 2021.6.0}
BuildRequires: %{python_module numpy-devel >= 1.20.3}
BuildRequires: %{python_module packaging}
BuildRequires: %{python_module pandas >= 1.5.0}
BuildRequires: %{python_module pip}
BuildRequires: %{python_module pytest-asyncio}
BuildRequires: %{python_module pytest-xdist}
BuildRequires: %{python_module pytest}
BuildRequires: %{python_module python-lzo}
BuildRequires: %{python_module setuptools_scm > 1.5.4}
BuildRequires: %{python_module setuptools}
BuildRequires: %{python_module wheel}
BuildRequires: fdupes
BuildRequires: git-core
BuildRequires: python-rpm-macros
Requires: python-cramjam >= 2.3.0
Requires: python-fsspec
Requires: python-numpy >= 1.20.3
Requires: python-packaging
Requires: python-pandas >= 1.5.0
Recommends: python-python-lzo
%python_subpackages

%description
This is a Python implementation of the parquet format
for integrating it into python-based Big Data workflows.

%prep
%autosetup -p1 -n fastparquet-%{version}
# remove pytest-runner from setup_requires
sed -i "s/'pytest-runner',//" setup.py
# this is not meant for setup.py
sed -i "s/oldest-supported-numpy/numpy/" setup.py
# the tests import the fastparquet.test module and we need to import from sitearch, so install it.
sed -i -e "s/^\s*packages=\[/&'fastparquet.test', /" -e "/exclude_package_data/ d" setup.py

%build
export CFLAGS="%{optflags}"
export SETUPTOOLS_SCM_PRETEND_VERSION=%{version}
%pyproject_wheel

%install
%pyproject_install
%python_expand rm -v %{buildroot}%{$python_sitearch}/fastparquet/{speedups,cencoding}.c
%python_expand %fdupes %{buildroot}%{$python_sitearch}

%check
%pytest_arch --pyargs fastparquet --import-mode append -n auto

%files %{python_files}
%doc README.rst
%license LICENSE
%{python_sitearch}/fastparquet
%{python_sitearch}/fastparquet-%{version}*-info

%changelog