forked from pool/python-fastparquet
Accepting request 911011 from devel:languages:python:numeric

OBS-URL: https://build.opensuse.org/request/show/911011
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/python-fastparquet?expand=0&rev=18

commit 58f5dcc434
@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:ae834d98670b7d67fd3dbadd09c6475de4a675e74eca9160969a9bd0fef2f4c2
-size 29120288

fastparquet-0.7.1.tar.gz (new file, 3 lines)
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:cc55e0f9048394e3b67d3af934bc690572e81c0c22488b63960bbe67d16e113e
+size 29164760
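The tarball above is tracked via Git LFS, so what is actually committed is a three-line pointer file (version, oid, size) rather than the archive itself. A minimal sketch of reading such a pointer — the helper name is invented for illustration and is not part of git-lfs:

```python
# Parse a git-lfs pointer file into a dict (hypothetical helper;
# the three-key layout follows the pointer committed above).
def parse_lfs_pointer(text):
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields

pointer = """version https://git-lfs.github.com/spec/v1
oid sha256:cc55e0f9048394e3b67d3af934bc690572e81c0c22488b63960bbe67d16e113e
size 29164760
"""
info = parse_lfs_pointer(pointer)
print(info["size"])  # size in bytes of the real tarball: 29164760
print(info["oid"].split(":", 1)[0])  # hash algorithm: sha256
```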
@@ -1,3 +1,79 @@
+-------------------------------------------------------------------
+Sun Aug  8 15:13:55 UTC 2021 - Ben Greiner <code@bnavigator.de>
+
+- Update to version 0.7.1
+  * Back compile for older versions of numpy
+  * Make pandas nullable types opt-out. The old behaviour (casting
+    to float) is still available with ParquetFile(...,
+    pandas_nulls=False).
+  * Fix time field regression: IsAdjustedToUTC will be False when
+    there is no timezone
+  * Micro improvements to the speed of ParquetFile creation by
+    using simple string ops instead of regex and regularising
+    filenames once at the start. Affects datasets with many files.
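The micro-optimisation noted above is the classic swap of a regex for plain string operations on hot paths. A hypothetical before/after sketch — the filename pattern and helper names are invented for illustration, not fastparquet's code:

```python
import re

# Regex version: extract a row-group index from a part filename.
def rg_index_re(name):
    m = re.match(r"part\.(\d+)\.parquet$", name)
    return int(m.group(1)) if m else None

# Plain string-ops version of the same extraction; typically faster
# because it avoids the regex engine entirely.
def rg_index_str(name):
    if name.startswith("part.") and name.endswith(".parquet"):
        middle = name[len("part."):-len(".parquet")]
        if middle.isdigit():
            return int(middle)
    return None

print(rg_index_re("part.0.parquet"), rg_index_str("part.0.parquet"))  # 0 0
```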
+- Release 0.7.0
+  * This version institutes major, breaking changes, listed here,
+    and incremental fixes and additions.
+  * Reading a directory without a _metadata summary file now works
+    by providing only the directory, instead of a list of
+    constituent files. This change also makes direct use of fsspec
+    filesystems, if given, to load the footer metadata areas of
+    the files concurrently, if the storage backend supports it,
+    without directly instantiating intermediate ParquetFile
+    instances.
+  * Row-level filtering of the data. Whereas previously only full
+    row-groups could be excluded on the basis of their parquet
+    metadata statistics (if present), filtering can now be done
+    within row-groups too. The syntax is the same as before,
+    allowing multiple column expressions to be combined with
+    AND|OR, depending on the list structure. This mechanism
+    requires two passes: one to load the columns needed to create
+    the boolean mask, and another to load the columns actually
+    needed in the output. This will not be faster, and may be
+    slower, but in some cases can save a significant memory
+    footprint if a small fraction of rows are considered good and
+    the columns for the filter expression are not in the output.
+    Not currently supported for reading with DataPageV2.
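The AND|OR list structure mentioned above follows the common parquet-filter convention: a flat list of (column, op, value) tuples is ANDed, and a list of such lists is an OR of AND-groups. A toy evaluator of that structure over plain dict rows, purely to illustrate the semantics — this is not fastparquet's implementation:

```python
import operator

OPS = {"==": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge,
       "in": lambda a, b: a in b}

def match(row, filters):
    # A list of lists is an OR of AND-groups; a flat tuple list is one AND-group.
    groups = filters if filters and isinstance(filters[0], list) else [filters]
    return any(all(OPS[op](row[col], val) for col, op, val in group)
               for group in groups)

rows = [{"x": 1, "y": "a"}, {"x": 5, "y": "b"}, {"x": 9, "y": "a"}]
flt = [[("x", ">", 4), ("y", "==", "b")],  # (x > 4 AND y == "b")
       [("x", "<", 2)]]                    # OR (x < 2)
print([r["x"] for r in rows if match(r, flt)])  # → [1, 5]
```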
+  * DELTA integer encoding (read-only): experimentally working,
+    but we only have one test file to verify against, since it is
+    not trivial to persuade Spark to produce files encoded this
+    way. DELTA can be an extremely compact representation for
+    slowly varying and/or monotonically increasing integers.
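The core idea behind why delta encoding compresses slowly varying integers so well can be sketched in a few lines. Note this is only the basic principle — parquet's actual DELTA_BINARY_PACKED format adds block structure and bit-packing on top of it:

```python
# Store a start value plus small differences instead of large absolutes.
def delta_encode(values):
    return values[0], [b - a for a, b in zip(values, values[1:])]

def delta_decode(start, deltas):
    out = [start]
    for d in deltas:
        out.append(out[-1] + d)
    return out

timestamps = [1_000_000, 1_000_003, 1_000_004, 1_000_010]
start, deltas = delta_encode(timestamps)
print(deltas)  # the deltas are tiny: [3, 1, 6]
print(delta_decode(start, deltas) == timestamps)  # round-trips: True
```

Small deltas need far fewer bits each than the original values, which is where the compactness comes from once bit-packing is applied.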
+  * Nanosecond resolution times: the new extended "logical" types
+    system supports nanoseconds alongside the previous millis and
+    micros. We now emit these for the default pandas time type,
+    and produce a full parquet schema including both "converted"
+    and "logical" type information. Note that all output has
+    isAdjustedToUTC=True, i.e., these are timestamps rather than
+    local time. The time-zone is stored in the metadata, as
+    before, and will be successfully recreated only by fastparquet
+    and (py)arrow. Otherwise, the times will appear to be UTC. For
+    compatibility with Spark, you may still want to use
+    times="int96" when writing.
+  * DataPageV2 writing: we now support both reading and writing.
+    For writing, it can be enabled with the environment variable
+    FASTPARQUET_DATAPAGE_V2 or the module global
+    fastparquet.writer.DATAPAGE_VERSION, and is off by default. It
+    will become on by default in the future. In many cases, V2
+    will result in better read performance, because the data and
+    page headers are encoded separately, so data can be read
+    directly into the output without additional allocation/copies.
+    This feature is considered experimental, but we believe it
+    works well for most use cases (i.e., our test suite) and
+    should be readable by all modern parquet frameworks including
+    arrow and spark.
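The opt-in described above is an environment-variable toggle. A minimal sketch of that pattern — only the variable name comes from the changelog; the truthiness rule below is an assumption for illustration, not fastparquet's exact parsing:

```python
import os

# Opt in before the writer is used; the variable name is the one
# the changelog documents.
os.environ["FASTPARQUET_DATAPAGE_V2"] = "1"

def datapage_v2_enabled():
    # Assumed truthiness rule, for illustration only.
    return os.environ.get("FASTPARQUET_DATAPAGE_V2", "").lower() in ("1", "true", "yes")

print(datapage_v2_enabled())  # True
```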
+  * Pandas nullable types: pandas supports "masked" extension
+    arrays for types that previously could not support NULL at
+    all: ints and bools. Fastparquet used to cast such columns to
+    float, so that we could represent NULLs as NaN; now we use the
+    new(er) masked types by default. This means faster reading of
+    such columns, as there is no conversion. If the metadata
+    guarantees that there are no nulls, we still use the
+    non-nullable variant unless the data was written with
+    fastparquet/pyarrow and the metadata indicates that the
+    original datatype was nullable. We already handled writing of
+    nullable columns.
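The "masked" idea above — a values buffer plus a validity mask, instead of casting everything to float and using NaN — can be sketched in plain Python. This mirrors the concept behind pandas' nullable extension arrays, not their implementation:

```python
# An int column with NULLs: values plus a boolean validity mask.
values = [10, 20, 0, 40]              # the 0 is a placeholder under a NULL
mask   = [False, False, True, False]  # True marks a NULL entry

# Old behaviour (pre-0.7): cast to float so NULL can be NaN.
as_float = [float("nan") if null else float(v)
            for v, null in zip(values, mask)]

# Masked behaviour: ints stay ints, NULLs stay explicit.
non_null = [v for v, null in zip(values, mask) if not null]
print(non_null)                    # [10, 20, 40] -- still integers
print(as_float[2] != as_float[2])  # NaN compares unequal to itself: True
```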
+
 -------------------------------------------------------------------
 Tue May 18 14:41:46 UTC 2021 - Ben Greiner <code@bnavigator.de>
@@ -21,7 +21,7 @@
 %define skip_python2 1
 %define skip_python36 1
 Name:           python-fastparquet
-Version:        0.6.3
+Version:        0.7.1
 Release:        0
 Summary:        Python support for Parquet file format
 License:        Apache-2.0
@@ -29,8 +29,9 @@ URL: https://github.com/dask/fastparquet/
 Source:         https://github.com/dask/fastparquet/archive/%{version}.tar.gz#/fastparquet-%{version}.tar.gz
 BuildRequires:  %{python_module Cython}
 BuildRequires:  %{python_module cramjam >= 2.3.0}
-BuildRequires:  %{python_module fsspec}
-BuildRequires:  %{python_module numpy-devel >= 1.11}
+# version requirement not declared for runtime, but necessary for tests.
+BuildRequires:  %{python_module fsspec >= 2021.6.0}
+BuildRequires:  %{python_module numpy-devel >= 1.18}
 BuildRequires:  %{python_module pandas >= 1.1.0}
 BuildRequires:  %{python_module pytest}
 BuildRequires:  %{python_module python-lzo}
@@ -40,7 +41,7 @@ BuildRequires:  fdupes
 BuildRequires:  python-rpm-macros
 Requires:       python-cramjam >= 2.3.0
 Requires:       python-fsspec
-Requires:       python-numpy >= 1.11
+Requires:       python-numpy >= 1.18
 Requires:       python-pandas >= 1.1.0
 Requires:       python-thrift >= 0.11.0
 Recommends:     python-python-lzo
@@ -54,6 +55,8 @@ for integrating it into python-based Big Data workflows.
 %setup -q -n fastparquet-%{version}
 # remove pytest-runner from setup_requires
 sed -i "s/'pytest-runner',//" setup.py
+# this is not meant for setup.py
+sed -i "s/oldest-supported-numpy/numpy/" setup.py
 # the tests import the fastparquet.test module and we need to import from sitearch, so install it.
 sed -i -e "s/^\s*packages=\[/&'fastparquet.test', /" -e "/exclude_package_data/ d" setup.py