python-fastparquet

Author	SHA256	Message	Date
Markéta Machová	62ed31f2b5	Accepting request 1179058 from home:bnavigator:numpy - Update to 2024.5.0 * Allow zoneinfo objects (#916) * Use np.int64 type for day to nanosecond conversion (NEP50) (#922) OBS-URL: https://build.opensuse.org/request/show/1179058 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=61	2024-06-07 09:29:40 +00:00
Dirk Mueller	6b501fc0a4	Accepting request 1154922 from home:bnavigator:branches:devel:languages:python:numeric - Update to 2024.2.0 * allow loading categoricals even if not so in the pandas metadata, when a column is dict-encoded and we only have one row-group (#863) * apply dtype to the columns names series, even when selecting no columns (#861, 859) * don’t make strings while estimating byte column size (#858) * handle upstream depr (#857, 856) OBS-URL: https://build.opensuse.org/request/show/1154922 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=59	2024-03-05 08:52:14 +00:00
Daniel Garcia	6d11a3d361	- Do not run tests in s390x, bsc#1218603 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=57	2024-02-07 09:00:48 +00:00
Dirk Mueller	3a99f6d72a	OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=55	2023-12-05 12:26:45 +00:00
Dirk Mueller	593e78e75e	- update to 2023.10.0: * Datetime units in empty() with tz (#893) * Fewer inplace decompressions for V2 pages (#890) * Allow writing categorical column with no categories (#888) * Fixes for new numpy (#886) * RLE bools and DELTA for v1 pages (#885, 883) OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=54	2023-12-05 12:23:57 +00:00
Dirk Mueller	548c6cc76f	- update to 2023.8.0: * More general timestamp units (#874) * ReadTheDocs V2 (#871) * Better roundtrip dtypes (#861, 859) * No convert when computing bytes-per-item for str (#858) OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=52	2023-12-02 17:26:53 +00:00
Markéta Machová	c916408743	Accepting request 1096315 from home:apersaud:branches:devel:languages:python:numeric update to latest version OBS-URL: https://build.opensuse.org/request/show/1096315 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=50	2023-07-02 15:14:02 +00:00
Dirk Mueller	f1d61b0abc	- update to 2023.4.0: * allow loading categoricals even if not so in the pandas metadata, when a column is dict-encoded and we only have one row-group (#863) * apply dtype to the columns names series, even when selecting no columns (#861, 859) * don't make strings while estimating byte column size (#858) * handle upstream depr (#857, 856) OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=48	2023-04-28 08:11:58 +00:00
Matej Cepl	42fa3a1c16	Accepting request 1064736 from home:apersaud:branches:devel:languages:python:numeric update to latest version OBS-URL: https://build.opensuse.org/request/show/1064736 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=46	2023-02-12 22:54:14 +00:00
Dirk Mueller	8b4bcd5004	Accepting request 1046313 from home:bnavigator:branches:devel:languages:python:numeric - Update to 2022.12.0 * check all int32 values before passing to thrift writer * fix type of num_rows to i64 for big single file - Release 2022.11.0 * Switch to calver * Speed up loading of nullable types * Allow schema evolution by addition of columns * Allow specifying dtypes of output * update to scm versioning * fixes to row filter, statistics and tests * support pathlib.Paths * JSON encoder options - Drop fastparquet-pr813-updatefixes.patch OBS-URL: https://build.opensuse.org/request/show/1046313 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=44	2023-01-03 07:37:17 +00:00
Dirk Mueller	68c0cc4bdd	Accepting request 1044387 from home:Guillaume_G:fastparquet - Add patch to fix the test test_delta_from_def_2 on aarch64, armv7 and ppc64le: * fastparquet-pr835.patch OBS-URL: https://build.opensuse.org/request/show/1044387 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=42	2022-12-23 16:12:14 +00:00
Markéta Machová	4d8484644b	Accepting request 1032127 from home:bnavigator:branches:devel:languages:python:numeric - Update to 0.8.3 * improved key/value handling and rejection of bad types * fix regression in consolidate_cats (caught in dask tests) - Release 0.8.2 * datetime indexes initialised to 0 to prevent overflow from random memory * case from csv_to_parquet where stats exists but has no nulls entry * define len and bool for ParquetFile * maintain int types of optional data that came from pandas * fix for delta encoding - Add fastparquet-pr813-updatefixes.patch gh#dask/fastparquet#813 OBS-URL: https://build.opensuse.org/request/show/1032127 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=40	2022-10-31 09:54:02 +00:00
Markéta Machová	6a03a42003	Accepting request 972857 from home:bnavigator:branches:devel:languages:python:numeric - Update to 0.8.1 * fix critical buffer overflow crash for large number of columns and long column names * metadata handling * thrift int32 for list * avoid error storing NaNs in column stats OBS-URL: https://build.opensuse.org/request/show/972857 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=38	2022-04-26 14:21:38 +00:00
Matej Cepl	59c9a3b022	Accepting request 950136 from home:bnavigator:branches:devel:languages:python:numeric - Update to 0.8.0 * our own cythonic thrift implementation (drop thrift dependency) * more in-place dataset editing and reordering * python 3.10 support * fixes for multi-index and pandas types - Clean test skips OBS-URL: https://build.opensuse.org/request/show/950136 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=37	2022-01-31 20:28:31 +00:00
Matej Cepl	c9d6b4d9cf	Accepting request 946801 from home:bnavigator:branches:devel:languages:python:numeric - Clean specfile from unused python36 conditionals - Require thrift 0.15.0 (+patch) for Python 3.10 compatibility * gh#dask/fastparquet#514 OBS-URL: https://build.opensuse.org/request/show/946801 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=36	2022-01-17 06:31:02 +00:00
Dirk Mueller	dc8c5ab00b	Accepting request 934308 from home:apersaud:branches:devel:languages:python:numeric - still some failed builds, but they are also in the current package (and I don't know how to fix them) - update to version 0.7.2: * Ability to remove row-groups in-place for multifile datasets * Accept pandas nullable Float type * allow empty strings and fix min/max when there is no data * make writing statistics optional * row selection in to_pandas() OBS-URL: https://build.opensuse.org/request/show/934308 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=35	2021-11-28 19:15:14 +00:00
Matej Cepl	4cca0bf52a	Accepting request 910725 from home:bnavigator:branches:devel:languages:python:numeric - Update to version 0.7.1 * Back compile for older versions of numpy * Make pandas nullable types opt-out. The old behaviour (casting to float) is still available with ParquetFile(..., pandas_nulls=False). * Fix time field regression: IsAdjustedToUTC will be False when there is no timezone * Micro improvements to the speed of ParquetFile creation by using simple string ops instead of regex and regularising filenames once at the start. Affects datasets with many files. - Release 0.7.0 * This version institutes major, breaking changes, listed here, and incremental fixes and additions. * Reading a directory without a _metadata summary file now works by providing only the directory, instead of a list of constituent files. This change also makes direct use of fsspec filesystems, if given, to be able to load the footer metadata areas of the files concurrently, if the storage backend supports it, and not directly instantiating intermediate ParquetFile instances * row-level filtering of the data. Whereas previously, only full row-groups could be excluded on the basis of their parquet metadata statistics (if present), filtering can now be done within row-groups too. The syntax is the same as before, allowing for multiple column expressions to be combined with AND\|OR, depending on the list structure. This mechanism requires two passes: one to load the columns needed to create the boolean mask, and another to load the columns actually needed in the output. This will not be faster, and may be slower, but in some cases can save significant memory footprint, if a small fraction of rows are considered good and the columns for the filter expression are not in the output. Not currently supported for reading with DataPageV2.
* DELTA integer encoding (read-only): experimentally working, but we only have one test file to verify against, since it is not trivial to persuade Spark to produce files encoded this way. DELTA can be an extremely compact representation for slowly varying and/or monotonically increasing integers. * nanosecond resolution times: the new extended "logical" types system supports nanoseconds alongside the previous millis and micros. We now emit these for the default pandas time type, and produce full parquet schema including both "converted" and "logical" type information. Note that all output has isAdjustedToUTC=True, i.e., these are timestamps rather than local time. The time-zone is stored in the metadata, as before, and will be successfully recreated only in fastparquet and (py)arrow. Otherwise, the times will appear to be UTC. For compatibility with Spark, you may still want to use times="int96" when writing. * DataPageV2 writing: now we support both reading and writing. For writing, can be enabled with the environment variable FASTPARQUET_DATAPAGE_V2, or module global fastparquet.writer.DATAPAGE_VERSION and is off by default. It will become on by default in the future. In many cases, V2 will result in better read performance, because the data and page headers are encoded separately, so data can be directly read into the output without additional allocation/copies. This feature is considered experimental, but we believe it is working well for most use cases (i.e., our test suite) and should be readable by all modern parquet frameworks including arrow and spark. * pandas nullable types: pandas supports "masked" extension arrays for types that previously could not support NULL at all: ints and bools. Fastparquet used to cast such columns to float, so that we could represent NULLs as NaN; now we use the new(er) masked types by default. This means faster reading of such columns, as there is no conversion.
If the metadata guarantees that there are no nulls, we still use the non-nullable variant unless the data was written with fastparquet/pyarrow, and the metadata indicates that the original datatype was nullable. We already handled writing of nullable columns. OBS-URL: https://build.opensuse.org/request/show/910725 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=34	2021-08-09 13:21:06 +00:00
Matej Cepl	9dfe59d4e3	Accepting request 894265 from home:bnavigator:branches:devel:languages:python:numeric - Update to version 0.6.3 * no release notes * new requirement: cramjam instead of separate compression libs and their bindings * switch from numba to Cython OBS-URL: https://build.opensuse.org/request/show/894265 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=32	2021-05-19 10:12:11 +00:00
Dirk Mueller	c13c6c472e	- skip python 36 build OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=30	2021-02-12 14:50:26 +00:00
Matej Cepl	6eec839c0e	Accepting request 869540 from home:jengelh:branches:devel:languages:python:numeric - Use of "+=" in %check warrants bash as buildshell. OBS-URL: https://build.opensuse.org/request/show/869540 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=28	2021-02-09 21:28:22 +00:00
Matej Cepl	3b3a9ec57f	Accepting request 869041 from home:bnavigator:branches:devel:languages:python:numeric - Skip the import without warning test gh#dask/fastparquet#558 - Apply the Cepl-Strangelove-Parameter to pytest (--import-mode append) OBS-URL: https://build.opensuse.org/request/show/869041 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=26	2021-02-04 16:34:25 +00:00
Matej Cepl	3fc19bda15	Accepting request 859934 from home:bnavigator:branches:devel:languages:python:numeric - update to version 0.5 * no changelog - update test suite setup -- install the .test module OBS-URL: https://build.opensuse.org/request/show/859934 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=24	2021-01-03 10:17:10 +00:00
Todd R	e8e5e639c7	Accepting request 821674 from home:apersaud:branches:devel:languages:python:numeric update to latest version OBS-URL: https://build.opensuse.org/request/show/821674 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=22	2020-07-18 19:17:22 +00:00
Tomáš Chvátal	4c289424b1	Accepting request 819735 from home:mcalabkova:branches:devel:languages:python:numeric - Update to 0.4.0 * Changed RangeIndex private methods to public ones * Use the python executable used to run the code * Add support for Python 3.8 * support for numba > 0.48 - drop upstreamed patch use-python-exec.patch OBS-URL: https://build.opensuse.org/request/show/819735 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=20	2020-07-09 21:09:42 +00:00
Tomáš Chvátal	1e90d85788	- Add patch to use sys.executable and not call py2 binary directly: * use-python-exec.patch - Update to 0.3.3: * no upstream changelog OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=18	2020-04-06 07:07:54 +00:00
Todd R	b6df563592	Accepting request 742992 from home:TheBlackCat:branches:devel:languages:python:numeric - Drop broken python 2 support. - Testing fixes OBS-URL: https://build.opensuse.org/request/show/742992 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=16	2019-10-25 17:52:59 +00:00
Todd R	62d9529bea	Accepting request 720816 from home:apersaud:branches:devel:languages:python:numeric update to latest version OBS-URL: https://build.opensuse.org/request/show/720816 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=14	2019-08-04 14:05:57 +00:00
Todd R	c46e61d62b	Accepting request 719841 from home:TheBlackCat:branches:devel:languages:python:numeric - Fix spurious test failure OBS-URL: https://build.opensuse.org/request/show/719841 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=13	2019-07-30 14:23:51 +00:00
Tomáš Chvátal	6f19f6c0ef	Accepting request 704253 from home:mcepl:branches:devel:languages:python:numeric - Clean up SPEC file. OBS-URL: https://build.opensuse.org/request/show/704253 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=11	2019-05-20 13:24:28 +00:00
Todd R	bc29c40d35	Accepting request 699761 from home:TheBlackCat:branches:devel:languages:python:numeric - update to 0.3.1 * Add schema == (__eq__) and != (__ne__) methods and tests. * Fix item iteration for decimals * List missing columns in error message * Fix tz being None case - Update to 0.3.0 * Squash some warnings and import failures * Improvements to in and not in operators * Fixes because pandas released OBS-URL: https://build.opensuse.org/request/show/699761 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=9	2019-04-30 20:10:30 +00:00
Tomáš Chvátal	4186890208	Accepting request 668821 from home:apersaud:branches:devel:languages:python:numeric update to latest version OBS-URL: https://build.opensuse.org/request/show/668821 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=7	2019-01-26 21:38:54 +00:00
Tomáš Chvátal	dfe04380f9	Accepting request 651237 from home:apersaud:branches:devel:languages:python:numeric update to latest version OBS-URL: https://build.opensuse.org/request/show/651237 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python:numeric/python-fastparquet?expand=0&rev=5	2018-11-23 08:26:16 +00:00
Dominique Leuenberger	c8e155b151	Accepting request 639321 from devel:languages:python:numeric OBS-URL: https://build.opensuse.org/request/show/639321 OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/python-fastparquet?expand=0&rev=3	2018-10-02 17:47:24 +00:00
Yuchen Lin	85cef9f15d	Accepting request 615166 from devel:languages:python OBS-URL: https://build.opensuse.org/request/show/615166 OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/python-fastparquet?expand=0&rev=2	2018-06-13 13:37:19 +00:00
Dominique Leuenberger	67a551df18	Accepting request 603775 from devel:languages:python Needed by python-datashader OBS-URL: https://build.opensuse.org/request/show/603775 OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/python-fastparquet?expand=0&rev=1	2018-05-15 08:08:00 +00:00

35 Commits
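
The 0.7.0 release notes in the log above describe row-level filtering as a two-pass mechanism: first load the columns needed to build a boolean mask, then load the columns wanted in the output. The sketch below illustrates that idea only. Plain Python lists stand in for column data, and `load_columns` and `two_pass_filter` are hypothetical names invented for this example; this is not fastparquet's implementation or API.

```python
# Illustrative sketch of a two-pass row filter.
# Assumption: a "table" is a dict mapping column names to equal-length lists.
# load_columns and two_pass_filter are hypothetical helpers, not fastparquet functions.

def load_columns(table, names):
    """Return only the requested columns (stands in for reading them from storage)."""
    return {name: table[name] for name in names}


def two_pass_filter(table, predicate_column, predicate, output_columns):
    # Pass 1: load only the column used by the filter and build a boolean mask.
    mask_source = load_columns(table, [predicate_column])[predicate_column]
    mask = [bool(predicate(value)) for value in mask_source]

    # Pass 2: load the output columns and keep only the rows selected by the mask.
    loaded = load_columns(table, output_columns)
    return {
        name: [value for value, keep in zip(values, mask) if keep]
        for name, values in loaded.items()
    }


table = {
    "id": [1, 2, 3, 4],
    "score": [0.1, 0.9, 0.4, 0.95],
    "label": ["a", "b", "c", "d"],
}

# The filter column ("score") is not part of the output, as in the case the notes describe.
result = two_pass_filter(table, "score", lambda s: s > 0.5, ["id", "label"])
print(result)
```

Because the filter column is read separately from the output columns, the approach costs a second pass, and in exchange the full output columns are only kept for the rows that pass the filter. That matches the trade-off the release notes mention: not faster, but potentially a smaller memory footprint.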