Accepting request 1130498 from devel:languages:python:numeric

- update to 2023.8.0:
  * More general timestamp units (#874)
  * ReadTheDocs V2 (#871)
  * Better roundtrip dtypes (#861, 859)
  * No convert when computing bytes-per-item for str (#858)
- Add patch to fix the test test_delta_from_def_2 on
  * row-level filtering of the data. Previously, only full row-groups
    could be excluded on the basis of their parquet metadata statistics
    (if present); filtering can now be done within row-groups too. The
    syntax is the same as before, allowing multiple column expressions
    to be combined with AND|OR, depending on the list structure. This
    mechanism requires two passes: one to load the columns needed to
    create the boolean mask, and another to load the columns actually
    needed in the output. This will not be faster, and may be slower,
    but in some cases it can significantly reduce the memory footprint,
    if only a small fraction of rows are considered good and the columns
    for the filter expression are not in the output.
  * DELTA integer encoding (read-only): experimentally working, but we
    only have one test file to verify against, since it is not trivial
    to persuade Spark to produce files encoded this way. DELTA can be
    an extremely compact representation for
  * nanosecond resolution times: the new extended "logical" types system
    supports nanoseconds alongside the previous millis and micros. We
    now emit these for the default pandas time type, and produce full
    parquet schema including both "converted" and "logical" type
    information. Note that all output has isAdjustedToUTC=True, i.e.,
    these are timestamps rather than local time. The time-zone is stored
    in the metadata, as

OBS-URL: https://build.opensuse.org/request/show/1130498
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/python-fastparquet?expand=0&rev=29
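
The two-pass row-level filtering described in the message can be illustrated with a small standard-library sketch. This is not fastparquet's implementation: the helper names `build_mask` and `select_rows`, the in-memory `row_group` dict, and the sample data are all hypothetical. It assumes the filter list is read as an outer list of groups that are OR-ed, each group being a list of `(column, op, value)` tuples that are AND-ed.

```python
import operator

# Comparison operators supported by this sketch (an assumption, not a library list).
OPS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def build_mask(columns, filters):
    """Pass 1: evaluate the filter expression on the filter columns only.

    `filters` is a list of groups; tuples inside a group are AND-ed,
    and the groups themselves are OR-ed.
    """
    n_rows = len(next(iter(columns.values())))
    mask = [False] * n_rows
    for group in filters:
        group_mask = [True] * n_rows
        for col, op, val in group:
            compare = OPS[op]
            group_mask = [keep and compare(x, val)
                          for keep, x in zip(group_mask, columns[col])]
        mask = [a or b for a, b in zip(mask, group_mask)]
    return mask


def select_rows(columns, wanted, mask):
    """Pass 2: read only the output columns and keep rows where mask is True."""
    return {name: [x for x, keep in zip(columns[name], mask) if keep]
            for name in wanted}


# Hypothetical row group held in memory as column lists.
row_group = {
    "a": [1, 5, 9, 12],
    "b": ["x", "y", "x", "z"],
    "payload": [10.0, 20.0, 30.0, 40.0],
}
# (a > 4 AND b == "x") OR (a == 1)
filters = [[("a", ">", 4), ("b", "==", "x")], [("a", "==", 1)]]

mask = build_mask(row_group, filters)
result = select_rows(row_group, ["payload"], mask)
print(mask)    # [True, False, True, False]
print(result)  # {'payload': [10.0, 30.0]}
```

The filter columns `a` and `b` are touched only in the first pass and never appear in the output, which is the case where the message says memory can be saved even though the work is done twice.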
2023-12-03 19:49:03 +00:00
commit 981ea802f6
parent dd692944e8 548c6cc76f
4 changed files with 61 additions and 52 deletions
--- a/fastparquet-2023.7.0.tar.gz
+++ b/fastparquet-2023.7.0.tar.gz
@@ -1,3 +0,0 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:3347318ce53194498e81b0203e0a3e0b2ab5dec946d274756bb44dbc5610cc0e
 size 28907973
--- a/fastparquet-2023.8.0.tar.gz
+++ b/fastparquet-2023.8.0.tar.gz
@@ -0,0 +1,3 @@
 version https://git-lfs.github.com/spec/v1
 oid sha256:67cf29707c47003d33609a3c9a973714ab3646fc87c30a5b1eefc81d0c4048e1
 size 28904480
--- a/python-fastparquet.changes
+++ b/python-fastparquet.changes
@@ -1,3 +1,12 @@
 -------------------------------------------------------------------
 Mon Sep 11 21:29:16 UTC 2023 - Dirk Müller <dmueller@suse.com>
 - update to 2023.8.0:
  * More general timestamp units (#874)
  * ReadTheDocs V2 (#871)
  * Better roundtrip dtypes (#861, 859)
  * No convert when computing bytes-per-item for str (#858)
 -------------------------------------------------------------------
 Sat Jul  1 20:05:36 UTC 2023 - Arun Persaud <arun@gmx.de>
--- a/python-fastparquet.spec
+++ b/python-fastparquet.spec
@@ -17,7 +17,7 @@
 Name:           python-fastparquet
-Version:        2023.7.0
+Version:        2023.8.0
 Release:        0
 Summary:        Python support for Parquet file format
 License:        Apache-2.0