Dirk Mueller
593e78e75e
* Datetime units in empty() with tz (#893) * Fewer inplace decompressions for V2 pages (#890) * Allow writing categorical column with no categories (#888) * Fixes for new numpy (#886) * RLE bools and DELTA for v1 pages (#885, 883) OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=54
-------------------------------------------------------------------
Tue Dec 5 12:23:32 UTC 2023 - Dirk Müller <dmueller@suse.com>

- update to 2023.10.0:
  * Datetime units in empty() with tz (#893)
  * Fewer inplace decompressions for V2 pages (#890)
  * Allow writing categorical column with no categories (#888)
  * Fixes for new numpy (#886)
  * RLE bools and DELTA for v1 pages (#885, 883)

-------------------------------------------------------------------
Mon Sep 11 21:29:16 UTC 2023 - Dirk Müller <dmueller@suse.com>

- update to 2023.8.0:
  * More general timestamp units (#874)
  * ReadTheDocs V2 (#871)
  * Better roundtrip dtypes (#861, 859)
  * No convert when computing bytes-per-item for str (#858)

-------------------------------------------------------------------
Sat Jul 1 20:05:36 UTC 2023 - Arun Persaud <arun@gmx.de>

- update to version 2023.7.0:
  * Add test case for reading non-pandas parquet file (#870)
  * Extra field when cloning ParquetFile (#866)

-------------------------------------------------------------------
Fri Apr 28 08:10:46 UTC 2023 - Dirk Müller <dmueller@suse.com>

- update to 2023.4.0:
  * allow loading categoricals even if not so in the pandas metadata,
    when a column is dict-encoded and we only have one row-group (#863)
  * apply dtype to the columns names series, even when selecting no
    columns (#861, 859)
  * don't make strings while estimating byte column size (#858)
  * handle upstream depr (#857, 856)

-------------------------------------------------------------------
Thu Feb 9 15:55:08 UTC 2023 - Arun Persaud <arun@gmx.de>

- update to version 2023.2.0:
  * revert one-level set of filters (#852)
  * full size dict for decoding V2 pages (#850)
  * infer_object_encoding fix (#847)
  * row filtering with V2 pages (#845)

-------------------------------------------------------------------
Wed Feb 8 18:25:03 UTC 2023 - Arun Persaud <arun@gmx.de>

- specfile:
  * remove fastparquet-pr835.patch, implemented upstream

- update to version 2023.1.0:
  * big improvement to write speed
  * paging support for bigger row-groups
  * pandas 2.0 support
  * delta for big-endian architecture

-------------------------------------------------------------------
Mon Jan 2 20:38:49 UTC 2023 - Ben Greiner <code@bnavigator.de>

- Update to 2022.12.0
  * check all int32 values before passing to thrift writer
  * fix type of num_rows to i64 for big single file
- Release 2022.11.0
  * Switch to calver
  * Speed up loading of nullable types
  * Allow schema evolution by addition of columns
  * Allow specifying dtypes of output
  * update to scm versioning
  * fixes to row filter, statistics and tests
  * support pathlib.Paths
  * JSON encoder options
- Drop fastparquet-pr813-updatefixes.patch

-------------------------------------------------------------------
Fri Dec 23 09:18:39 UTC 2022 - Guillaume GARDET <guillaume.gardet@opensuse.org>

- Add patch to fix the test test_delta_from_def_2 on
  aarch64, armv7 and ppc64le:
  * fastparquet-pr835.patch

-------------------------------------------------------------------
Fri Oct 28 15:47:41 UTC 2022 - Ben Greiner <code@bnavigator.de>

- Update to 0.8.3
  * improved key/value handling and rejection of bad types
  * fix regression in consolidate_cats (caught in dask tests)
- Release 0.8.2
  * datetime indexes initialised to 0 to prevent overflow from
    random memory
  * case from csv_to_parquet where stats exists but has no nulls
    entry
  * define len and bool for ParquetFile
  * maintain int types of optional data that came from pandas
  * fix for delta encoding
- Add fastparquet-pr813-updatefixes.patch gh#dask/fastparquet#813

-------------------------------------------------------------------
Tue Apr 26 11:02:27 UTC 2022 - Ben Greiner <code@bnavigator.de>

- Update to 0.8.1
  * fix critical buffer overflow crash for large number of columns
    and long column names
  * metadata handling
  * thrift int32 for list
  * avoid error storing NaNs in column stats

-------------------------------------------------------------------
Sat Jan 29 21:36:38 UTC 2022 - Ben Greiner <code@bnavigator.de>

- Update to 0.8.0
  * our own cythonic thrift implementation (drop thrift dependency)
  * more in-place dataset editing and reordering
  * python 3.10 support
  * fixes for multi-index and pandas types
- Clean test skips

-------------------------------------------------------------------
Sun Jan 16 13:34:53 UTC 2022 - Ben Greiner <code@bnavigator.de>

- Clean specfile from unused python36 conditionals
- Require thrift 0.15.0 (+patch) for Python 3.10 compatibility
  * gh#dask/fastparquet#514

-------------------------------------------------------------------
Sat Nov 27 20:34:53 UTC 2021 - Arun Persaud <arun@gmx.de>

- update to version 0.7.2:
  * Ability to remove row-groups in-place for multifile datasets
  * Accept pandas nullable Float type
  * allow empty strings and fix min/max when there is no data
  * make writing statistics optional
  * row selection in to_pandas()

-------------------------------------------------------------------
Sun Aug 8 15:13:55 UTC 2021 - Ben Greiner <code@bnavigator.de>

- Update to version 0.7.1
  * Back compile for older versions of numpy
  * Make pandas nullable types opt-out. The old behaviour (casting
    to float) is still available with ParquetFile(...,
    pandas_nulls=False).
  * Fix time field regression: IsAdjustedToUTC will be False when
    there is no timezone
  * Micro improvements to the speed of ParquetFile creation by
    using simple string ops instead of regex and
    regularising filenames once at the start. Affects datasets with
    many files.
- Release 0.7.0
  * This version institutes major, breaking changes, listed here,
    and incremental fixes and additions.
  * Reading a directory without a _metadata summary file now works
    by providing only the directory, instead of a list of
    constituent files. This change also makes direct use of
    fsspec filesystems, if given, to be able to load the footer
    metadata areas of the files concurrently, if the storage
    backend supports it, and not directly instantiating
    intermediate ParquetFile instances
  * row-level filtering of the data. Whereas previously, only full
    row-groups could be excluded on the basis of their parquet
    metadata statistics (if present), filtering can now be done
    within row-groups too. The syntax is the same as before,
    allowing for multiple column expressions to be combined with
    AND|OR, depending on the list structure. This mechanism
    requires two passes: one to load the columns needed to create
    the boolean mask, and another to load the columns actually
    needed in the output. This will not be faster, and may be
    slower, but in some cases can save significant memory
    footprint, if a small fraction of rows are considered good and
    the columns for the filter expression are not in the output.
    Not currently supported for reading with DataPageV2.
  * DELTA integer encoding (read-only): experimentally working,
    but we only have one test file to verify against, since it is
    not trivial to persuade Spark to produce files encoded this
    way. DELTA can be an extremely compact representation for
    slowly varying and/or monotonically increasing integers.
  * nanosecond resolution times: the new extended "logical" types
    system supports nanoseconds alongside the previous millis and
    micros. We now emit these for the default pandas time type,
    and produce full parquet schema including both "converted" and
    "logical" type information. Note that all output has
    isAdjustedToUTC=True, i.e., these are timestamps rather than
    local time. The time-zone is stored in the metadata, as
    before, and will be successfully recreated only in fastparquet
    and (py)arrow. Otherwise, the times will appear to be UTC. For
    compatibility with Spark, you may still want to use
    times="int96" when writing.
  * DataPageV2 writing: now we support both reading and writing.
    For writing, can be enabled with the environment variable
    FASTPARQUET_DATAPAGE_V2, or module global
    fastparquet.writer.DATAPAGE_VERSION and is off by default. It
    will become on by default in the future. In many cases, V2 will
    result in better read performance, because the data and page
    headers are encoded separately, so data can be directly read
    into the output without additional allocation/copies. This
    feature is considered experimental, but we believe it is
    working well for most use cases (i.e., our test suite) and
    should be readable by all modern parquet frameworks including
    arrow and spark.
  * pandas nullable types: pandas supports "masked" extension
    arrays for types that previously could not support NULL at
    all: ints and bools. Fastparquet used to cast such columns to
    float, so that we could represent NULLs as NaN; now we use the
    new(er) masked types by default. This means faster reading of
    such columns, as there is no conversion. If the metadata
    guarantees that there are no nulls, we still use the
    non-nullable variant unless the data was written with
    fastparquet/pyarrow, and the metadata indicates that the
    original datatype was nullable. We already handled writing of
    nullable columns.

-------------------------------------------------------------------
Tue May 18 14:41:46 UTC 2021 - Ben Greiner <code@bnavigator.de>

- Update to version 0.6.3
  * no release notes
  * new requirement: cramjam instead of separate compression libs
    and their bindings
  * switch from numba to Cython

-------------------------------------------------------------------
Fri Feb 12 14:50:18 UTC 2021 - Dirk Müller <dmueller@suse.com>

- skip python 36 build

-------------------------------------------------------------------
Thu Feb 4 17:50:32 UTC 2021 - Jan Engelhardt <jengelh@inai.de>

- Use of "+=" in %check warrants bash as buildshell.

-------------------------------------------------------------------
Wed Feb 3 21:43:10 UTC 2021 - Ben Greiner <code@bnavigator.de>

- Skip the import without warning test gh#dask/fastparquet#558
- Apply the Cepl-Strangelove-Parameter to pytest
  (--import-mode append)

-------------------------------------------------------------------
Sat Jan 2 21:04:30 UTC 2021 - Benjamin Greiner <code@bnavigator.de>

- update to version 0.5
  * no changelog
- update test suite setup -- install the .test module

-------------------------------------------------------------------
Sat Jul 18 18:13:53 UTC 2020 - Arun Persaud <arun@gmx.de>

- specfile:
  * update requirements: version numbers and added packaging

- update to version 0.4.1:
  * nulls, fixes #504
  * deps: Add missing dependency on packaging. (#502)

-------------------------------------------------------------------
Thu Jul 9 14:04:10 UTC 2020 - Marketa Calabkova <mcalabkova@suse.com>

- Update to 0.4.0
  * Changed RangeIndex private methods to public ones
  * Use the python executable used to run the code
  * Add support for Python 3.8
  * support for numba > 0.48
- drop upstreamed patch use-python-exec.patch

-------------------------------------------------------------------
Mon Apr 6 06:54:36 UTC 2020 - Tomáš Chvátal <tchvatal@suse.com>

- Add patch to use sys.executable and not call py2 binary directly:
  * use-python-exec.patch

-------------------------------------------------------------------
Mon Apr 6 06:50:26 UTC 2020 - Tomáš Chvátal <tchvatal@suse.com>

- Update to 0.3.3:
  * no upstream changelog

-------------------------------------------------------------------
Fri Oct 25 17:50:50 UTC 2019 - Todd R <toddrme2178@gmail.com>

- Drop broken python 2 support.
- Testing fixes

-------------------------------------------------------------------
Sat Aug 3 15:10:41 UTC 2019 - Arun Persaud <arun@gmx.de>

- update to version 0.3.2:
  * Only calculate dataset stats once (#453)
  * Fixes #436 (#452)
  * Fix a crash if trying to read a file whose created_by value is not
    set
  * COMPAT: Fix for pandas DeprecationWarning (#446)
  * Apply timezone to index (#439)
  * Handle NaN partition values (#438)
  * Pandas meta (#431)
  * Only strip _metadata from end of file path (#430)
  * Simple nesting fix (#428)
  * Disallow bad tz on save, warn on load (#427)

-------------------------------------------------------------------
Tue Jul 30 14:23:21 UTC 2019 - Todd R <toddrme2178@gmail.com>

- Fix spurious test failure

-------------------------------------------------------------------
Mon May 20 15:12:11 CEST 2019 - Matej Cepl <mcepl@suse.com>

- Clean up SPEC file.

-------------------------------------------------------------------
Tue Apr 30 14:28:46 UTC 2019 - Todd R <toddrme2178@gmail.com>

- update to 0.3.1
  * Add schema == (__eq__) and != (__ne__) methods and tests.
  * Fix item iteration for decimals
  * List missing columns in error message
  * Fix tz being None case
- Update to 0.3.0
  * Squash some warnings and import failures
  * Improvements to in and not in operators
  * Fixes because pandas released

-------------------------------------------------------------------
Sat Jan 26 17:05:09 UTC 2019 - Arun Persaud <arun@gmx.de>

- specfile:
  * update copyright year

- update to version 0.2.1:
  * Compat for pandas 0.24.0 refactor (#390)
  * Change OverflowError message when failing on large pages (#387)
  * Allow for changes in dictionary while reading a row-group column
    (#367)
  * Correct pypi project names for compression libraries (#385)

-------------------------------------------------------------------
Thu Nov 22 22:47:24 UTC 2018 - Arun Persaud <arun@gmx.de>

- update to version 0.2.0:
  * Don't mutate column list input (#383) (#384)
  * Add optional requirements to extras_require (#380)
  * Fix "broken link to parquet-format page" (#377)
  * Add .c file to repo
  * Handle rows split across 2 pages in the case of a map (#369)
  * Fixes 370 (#371)
  * Handle multi-page maps (#368)
  * Handle zero-column files. Closes #361. (#363)

-------------------------------------------------------------------
Sun Sep 30 16:22:56 UTC 2018 - Arun Persaud <arun@gmx.de>

- specfile:
  * update url
  * make %files section more specific

- update to version 0.1.6:
  * Restrict what categories get passed through (#358)
  * Deep digging for multi-indexes (#356)
  * allow_empty is the default in >=zstandard-0.9 (#355)
  * Remove setup_requires from setup.py (#345)
  * Fixed error if a certain partition is empty, when writing a
    partitioned (#347)
  * Allow UTF8 column names to be read (#342)
  * readd test file
  * Allow for NULL converted type (#340)
  * Robust partition names (#336)
  * Fix accidental multiindex
  * Read multi indexes (#331)
  * Allow reading from any file-like (#330)
  * change `parquet-format` link to apache repo (#328)
  * Remove extra space from api.py (#325)
  * numba bool fun (#324)

- changes from version 0.1.5:
  * Fix _dtypes to be more efficient, to work with files with lots of
    columns (#318)
  * Buildfix (#313)
  * Use LZ4 block compression for compatibility with parquet-cpp
    (#314) (#315)
  * Fix typo in ParquetFile docstring (#312)
  * Remove annoying print() when reading file with CategoricalDtype
    index (#311)
  * Allow lists of multi-file data-sets (#309)
  * Accelerate dataframe.empty for small/medium sizes (#307)
  * Include dictionary page in column size (#306)
  * Fix for selecting columns which were used for partitioning (#304)
  * Remove occurrences of np.fromstring (#303)
  * Add support for zstandard compression (#296)
  * Int96time order (#298)

- changes from version 0.1.4:
  * Add handling of keyword arguments for compressor (#294)
  * Fix setup.py duplication (#295)
  * Integrate pytest with setup.py (#293)
  * Get setup.py pytest to work. (#287)
  * Add LZ4 support (#292)
  * Update for forthcoming thrift release (#281)
  * If timezones are in pandas metadata, assign columns as required
    (#285)
  * Pandas import (#284)
  * Copy FMDs instead of mutate (#279)
  * small fixes (#278)
  * fixes to get benchmark to work (#276)
  * backwards compat with Dask
  * Fix test_time_millis on Windows (#275)
  * join paths os-independently (#271)
  * Adds int32 support for object encoding (#268)
  * Fix a couple small typos in documentation (#267)
  * Partition order should be sorted (#265)
  * COMPAT: Update thrift (#264)
  * Speedups result (#253)
  * Remove thrift_copy
  * Define `__copy__` on thrift structures
  * Update rtd deps

- changes from version 0.1.3:
  * More care over append when partitioning multiple columns
  * Sep for windows cats filtering
  * Move pytest imports to tests/ remove requirement
  * Special-case only zeros
  * Cope with partition values like "07"
  * fix for s3
  * Fix for list of paths rooted in the current directory
  * add test
  * Explicit file opens
  * update docstring
  * Refactor partition interpretation
  * py2 fix
  * Error in test changed
  * Better error messages when failed to convert on write

- changes from version 0.1.2:
  * Revert accidental removal of s3 import
  * Move thrift things together, and make thrift serializer for pickle
  * COMPAT: for new pandas CategoricalDtype
  * Fixup for backwards seeking.
  * Fix some test failures
  * Prototype version using thrift instead of thriftpy
  * Not all mergers have cats
  * Revert accidental deletion
  * remove warnings
  * Sort keys in json for metadata
  * Check column chunks for categories sizes
  * Account for partition dir names with numbers
  * Fix map/list doc
  * Catch more stats errors
  * Prevent pandas auto-names being given to index

- changes from version 0.1.1:
  * Add workaround for single-value-partition
  * update test
  * Simplify and fix for py2
  * Use thrift encoding on statistics strings
  * remove redundant SNAPPY from supported compressions list
  * Fix statistics
  * lists again
  * Always convert int96 to times
  * Update docs
  * attribute typo
  * Fix definition level
  * Add test, clean columns
  * Allow optional->optional lists and maps
  * Flatten schema to enable loading of non-repeated columns
  * Remove extra file
  * Fix py2
  * Fix "in" filter to cope with strings that could be numbers
  * Allow pip install without NumPy or Cython

- changes from version 0.1.0:
  * Add ParquetFile attribute documentation
  * Fix tests
  * Enable append to an empty dataset
  * More warning words and check on partition_on
  * Do not fail stats if there are no row-groups
  * Fix "numpy_dtype"->"numpy_type"
  * "in" was checking range not exact membership of set
  * If metadata gives index, put in columns
  * Fix pytest warning
  * Fail on ordering dict statistics
  * Fix stats filter
  * clean test
  * Fix ImportWarning on Python 3.6+
  * TEST: added updated test file for special strings used in filters
  * fix links
  * [README]: indicate dependency on LLVM 4.0.x.
  * Filter stats had unfortunate converted_type check
  * Ignore exceptions in val_to_num
  * Also for TODAY
  * Very special case for partition: NOW should be kept as string
  * Allow partition_on; fix category nulls
  * Remove old category key/values on writing
  * Implement writing pandas metadata and auto-setting cats/index
  * Pandas compatibility
  * Test and fix for filter on single file
  * Do not attempt to recurse into schema elements with zero children

-------------------------------------------------------------------
Thu Jun 7 20:41:31 UTC 2018 - jengelh@inai.de

- Fixup grammar./Replace future aims with what it does now.

-------------------------------------------------------------------
Thu May 3 14:07:08 UTC 2018 - toddrme2178@gmail.com

- Use %license tag

-------------------------------------------------------------------
Thu May 25 12:19:26 UTC 2017 - toddrme2178@gmail.com

- Initial version