Accepting request 724138 from devel:languages:python:numeric

Update to Version 0.25.0

All packages broken by this update should be fixed now.

OBS-URL: https://build.opensuse.org/request/show/724138
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/python-pandas?expand=0&rev=18
Dominique Leuenberger 2019-08-19 18:48:18 +00:00 committed by Git OBS Bridge
commit 0fd36f4242
5 changed files with 465 additions and 104 deletions


@ -1,3 +0,0 @@
version https://git-lfs.github.com/spec/v1
oid sha256:4f919f409c433577a501e023943e582c57355d50a724c589e78bc1d551a535a2
size 11837693

pandas-0.25.0.tar.gz Normal file

@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:914341ad2d5b1ea522798efa4016430b66107d05781dbfe7cf05eba8f37df995
size 12616848
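
The three-line blocks above are Git LFS pointer files: the tarball itself is kept in LFS storage, and the repository tracks only a spec version, a SHA-256 object ID and a byte size. A minimal sketch of checking a downloaded file against such a pointer, using only the Python standard library. The helper names parse_lfs_pointer and matches_pointer are hypothetical and not part of any Git LFS tooling:

```python
import hashlib


def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form '<hash algorithm>:<hex digest>'.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(data, pointer):
    """Return True if the bytes have the size and hash recorded in the pointer."""
    if len(data) != pointer["size"]:
        return False
    return hashlib.new(pointer["algo"], data).hexdigest() == pointer["digest"]


# Pointer text copied from the pandas-0.25.0.tar.gz entry above.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:914341ad2d5b1ea522798efa4016430b66107d05781dbfe7cf05eba8f37df995
size 12616848
"""
pointer = parse_lfs_pointer(POINTER)
```

Checking the size first avoids hashing a file that is obviously truncated or not the expected tarball.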


@ -1,48 +0,0 @@
From 5a73ff8b4e10d016e0fd4162fa14c8f1a41345d9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Tom=C3=A1=C5=A1=20Chv=C3=A1tal?= <tchvatal@suse.com>
Date: Thu, 21 Feb 2019 15:05:21 +0100
Subject: [PATCH] Mark test_pct_max_many_rows as high memory
Fixes issue #25384
---
pandas/tests/frame/test_rank.py | 1 +
pandas/tests/series/test_rank.py | 1 +
pandas/tests/test_algos.py | 1 +
3 files changed, 3 insertions(+)
diff --git a/pandas/tests/frame/test_rank.py b/pandas/tests/frame/test_rank.py
index 10c42e0d1a1..6bb9dea15d1 100644
--- a/pandas/tests/frame/test_rank.py
+++ b/pandas/tests/frame/test_rank.py
@@ -310,6 +310,7 @@ def test_rank_pct_true(self, method, exp):
tm.assert_frame_equal(result, expected)
@pytest.mark.single
+ @pytest.mark.high_memory
def test_pct_max_many_rows(self):
# GH 18271
df = DataFrame({'A': np.arange(2**24 + 1),
diff --git a/pandas/tests/series/test_rank.py b/pandas/tests/series/test_rank.py
index 510a51e0029..dfcda889269 100644
--- a/pandas/tests/series/test_rank.py
+++ b/pandas/tests/series/test_rank.py
@@ -499,6 +499,7 @@ def test_rank_first_pct(dtype, ser, exp):
@pytest.mark.single
+@pytest.mark.high_memory
def test_pct_max_many_rows():
# GH 18271
s = Series(np.arange(2**24 + 1))
diff --git a/pandas/tests/test_algos.py b/pandas/tests/test_algos.py
index 888cf78a1c6..cb7426ce2f7 100644
--- a/pandas/tests/test_algos.py
+++ b/pandas/tests/test_algos.py
@@ -1484,6 +1484,7 @@ def test_too_many_ndims(self):
algos.rank(arr)
@pytest.mark.single
+ @pytest.mark.high_memory
@pytest.mark.parametrize('values', [
np.arange(2**24 + 1),
np.arange(2**25 + 2).reshape(2**24 + 1, 2)],


@ -1,3 +1,409 @@
-------------------------------------------------------------------
Mon Jul 22 15:36:34 UTC 2019 - Todd R <toddrme2178@gmail.com>
- Update to Version 0.25.0
+ Warning
* Starting with the 0.25.x series of releases, pandas only supports Python 3.5.3 and higher.
* The minimum supported Python version will be bumped to 3.6 in a future release.
* Panel has been fully removed. For N-D labeled data structures, please
use xarray
* read_pickle and read_msgpack are only guaranteed backwards compatible back to
pandas version 0.20.3
+ Enhancements
* Groupby aggregation with relabeling
Pandas has added special groupby behavior, known as "named aggregation", for naming the
output columns when applying multiple aggregation functions to specific columns.
* Groupby Aggregation with multiple lambdas
You can now provide multiple lambda functions to a list-like aggregation in
pandas.core.groupby.GroupBy.agg.
* Better repr for MultiIndex
Printing of MultiIndex instances now shows tuples of each row and ensures
that the tuple items are vertically aligned, so it's now easier to understand
the structure of the MultiIndex.
* Shorter truncated repr for Series and DataFrame
Currently, the default display options of pandas ensure that when a Series
or DataFrame has more than 60 rows, its repr gets truncated to this maximum
of 60 rows (the display.max_rows option). However, this still gives
a repr that takes up a large part of the vertical screen estate. Therefore,
a new option display.min_rows is introduced with a default of 10 which
determines the number of rows shown in the truncated repr.
* Json normalize with max_level param support
json_normalize normalizes the provided input dict to all
nested levels. The new max_level parameter provides more control over
which level to end normalization.
* Series.explode to split list-like values to rows
Series and DataFrame have gained the Series.explode and DataFrame.explode methods to transform
list-likes to individual rows.
* DataFrame.plot keywords logy, logx and loglog can now accept the value 'sym' for symlog scaling.
* Added support for ISO week year format ('%G-%V-%u') when parsing datetimes using to_datetime
* Indexing of DataFrame and Series now accepts zero-dimensional np.ndarray
* Timestamp.replace now supports the fold argument to disambiguate DST transition times
* DataFrame.at_time and Series.at_time now support datetime.time objects with timezones
* DataFrame.pivot_table now accepts an observed parameter which is passed to underlying calls to DataFrame.groupby to speed up grouping categorical data.
* Series.str has gained the Series.str.casefold method to remove all case distinctions present in a string
* DataFrame.set_index now works for instances of abc.Iterator, provided their output is of the same length as the calling frame
* DatetimeIndex.union now supports the sort argument. The behavior of the sort parameter matches that of Index.union
* RangeIndex.union now supports the sort argument. If sort=False an unsorted Int64Index is always returned. sort=None is the default and returns a monotonically increasing RangeIndex if possible or a sorted Int64Index if not
* TimedeltaIndex.intersection now also supports the sort keyword
* DataFrame.rename now supports the errors argument to raise errors when attempting to rename nonexistent keys
* Added api.frame.sparse for working with a DataFrame whose values are sparse
* RangeIndex has gained RangeIndex.start, RangeIndex.stop, and RangeIndex.step attributes
* datetime.timezone objects are now supported as arguments to timezone methods and constructors
* DataFrame.query and DataFrame.eval now support quoting column names with backticks to refer to names with spaces
* merge_asof now gives a clearer error message when merge keys are categoricals that are not equal
* pandas.core.window.Rolling supports exponential (or Poisson) window type
* Error message for missing required imports now includes the original import error's text
* DatetimeIndex and TimedeltaIndex now have a mean method
* DataFrame.describe now formats integer percentiles without decimal point
* Added support for reading SPSS .sav files using read_spss
* Added new option plotting.backend to be able to select a plotting backend different than the existing matplotlib one. Use pandas.set_option('plotting.backend', '<backend-module>') where <backend-module> is a library implementing the pandas plotting API
* pandas.offsets.BusinessHour supports multiple opening hours intervals
* read_excel can now use openpyxl to read Excel files via the engine='openpyxl' argument. This will become the default in a future release
* pandas.io.excel.read_excel supports reading OpenDocument tables. Specify engine='odf' to enable. Consult the IO User Guide for more details
* Interval, IntervalIndex, and arrays.IntervalArray have gained an Interval.is_empty attribute denoting if the given interval(s) are empty
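
Among the enhancements listed above, the ISO week-year format ('%G-%V-%u') is easy to misread: %G is the ISO year, %V the ISO week number and %u the weekday with Monday as 1. The sketch below uses only the standard library's datetime.strptime, which accepts the same directives on Python 3.6 and later, as a stand-in for to_datetime. It illustrates the directives and is not pandas' own parsing code; parse_iso_week is a hypothetical helper name:

```python
from datetime import datetime


def parse_iso_week(text):
    """Parse 'ISO year-ISO week-weekday' text such as '2019-01-1'."""
    return datetime.strptime(text, "%G-%V-%u")


# ISO week 1 of 2019 starts on Monday 2018-12-31, so the ISO year
# and the calendar year can differ near year boundaries.
first_day = parse_iso_week("2019-01-1")
print(first_day.date(), tuple(first_day.isocalendar()))
```

Going by the changelog entry, the pandas equivalent would presumably be to_datetime('2019-01-1', format='%G-%V-%u').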
+ Backwards incompatible API changes
* Indexing with date strings with UTC offsets
Indexing a DataFrame or Series with a DatetimeIndex with a
date string with a UTC offset would previously ignore the UTC offset. Now, the UTC offset
is respected in indexing.
* MultiIndex constructed from levels and codes
Constructing a MultiIndex with NaN levels or codes value < -1 was allowed previously.
Now, construction with codes value < -1 is not allowed and NaN levels' corresponding codes
would be reassigned as -1.
* Groupby.apply on DataFrame evaluates first group only once
The implementation of DataFrameGroupBy.apply()
previously evaluated the supplied function consistently twice on the first group
to infer if it is safe to use a fast code path. Particularly for functions with
side effects, this was an undesired behavior and may have led to surprises.
* Concatenating sparse values
When passed DataFrames whose values are sparse, concat will now return a
Series or DataFrame with sparse values, rather than a SparseDataFrame.
* The .str-accessor performs stricter type checks
Due to the lack of more fine-grained dtypes, Series.str so far only checked whether the data was
of object dtype. Series.str will now infer the dtype data *within* the Series; in particular,
'bytes'-only data will raise an exception (except for Series.str.decode, Series.str.get,
Series.str.len, Series.str.slice).
* Categorical dtypes are preserved during groupby
Previously, columns that were categorical, but not the groupby key(s) would be converted to object dtype during groupby operations. Pandas now will preserve these dtypes.
* Incompatible Index type unions
When performing Index.union operations between objects of incompatible dtypes,
the result will be a base Index of dtype object. This behavior holds true for
unions between Index objects that previously would have been prohibited. The dtype
of empty Index objects will now be evaluated before performing union operations
rather than simply returning the other Index object. Index.union can now be
considered commutative, such that A.union(B) == B.union(A).
* DataFrame groupby ffill/bfill no longer return group labels
The methods ffill, bfill, pad and backfill of
DataFrameGroupBy
previously included the group labels in the return value, which was
inconsistent with other groupby transforms. Now only the filled values
are returned.
* DataFrame describe on an empty categorical / object column will return top and freq
When calling DataFrame.describe with an empty categorical / object
column, the 'top' and 'freq' columns were previously omitted, which was inconsistent with
the output for non-empty columns. Now the 'top' and 'freq' columns will always be included,
with numpy.nan in the case of an empty DataFrame
* __str__ methods now call __repr__ rather than vice versa
Pandas has until now mostly defined string representations in a pandas object's
__str__/__unicode__/__bytes__ methods, and called __str__ from the __repr__
method, if a specific __repr__ method is not found. This is not needed for Python3.
In Pandas 0.25, the string representations of Pandas objects are now generally
defined in __repr__, and calls to __str__ in general now pass the call on to
the __repr__, if a specific __str__ method doesn't exist, as is standard for Python.
This change is backward compatible for direct usage of Pandas, but if you subclass
Pandas objects *and* give your subclasses specific __str__/__repr__ methods,
you may have to adjust your __str__/__repr__ methods.
* Indexing an IntervalIndex with Interval objects
Indexing methods for IntervalIndex have been modified to require exact matches only for Interval queries.
IntervalIndex methods previously matched on any overlapping Interval. Behavior with scalar points, e.g. querying
with an integer, is unchanged.
* Binary ufuncs on Series now align
Applying a binary ufunc like numpy.power now aligns the inputs
when both are Series.
* Categorical.argsort now places missing values at the end
Categorical.argsort now places missing values at the end of the array, making it
consistent with NumPy and the rest of pandas.
* Column order is preserved when passing a list of dicts to DataFrame
Starting with Python 3.7 the key-order of dict is guaranteed (https://mail.python.org/pipermail/python-dev/2017-December/151283.html). In practice, this has been true since
Python 3.6. The DataFrame constructor now treats a list of dicts in the same way as
it does a list of OrderedDict, i.e. preserving the order of the dicts.
This change applies only when pandas is running on Python>=3.6.
* Increased minimum versions for dependencies
* DatetimeTZDtype will now standardize pytz timezones to a common timezone instance
* Timestamp and Timedelta scalars now implement the to_numpy method as aliases to Timestamp.to_datetime64 and Timedelta.to_timedelta64, respectively.
* Timestamp.strptime will now raise a NotImplementedError
* Comparing Timestamp with unsupported objects now returns NotImplemented instead of raising TypeError. This implies that unsupported rich comparisons are delegated to the other object, and are now consistent with Python 3 behavior for datetime objects
* Bug in DatetimeIndex.snap which didn't preserve the name of the input Index
* The arg argument in pandas.core.groupby.DataFrameGroupBy.agg has been renamed to func
* The arg argument in pandas.core.window._Window.aggregate has been renamed to func
* Most Pandas classes had a __bytes__ method, which was used for getting a python2-style bytestring representation of the object. This method has been removed as a part of dropping Python2
* The .str-accessor has been disabled for 1-level MultiIndex, use MultiIndex.to_flat_index if necessary
* Removed support of gtk package for clipboards
* Using an unsupported version of Beautiful Soup 4 will now raise an ImportError instead of a ValueError
* Series.to_excel and DataFrame.to_excel will now raise a ValueError when saving timezone aware data.
* ExtensionArray.argsort places NA values at the end of the sorted array.
* DataFrame.to_hdf and Series.to_hdf will now raise a NotImplementedError when saving a MultiIndex with extension data types for a fixed format.
* Passing duplicate names in read_csv will now raise a ValueError
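
The __str__/__repr__ change in the list above follows standard Python behaviour: object.__str__ falls back to __repr__ when a class defines only the latter. A small sketch with hypothetical classes, where Base stands in for a pandas object and Child for a user subclass (neither is pandas code):

```python
class Base:
    """Defines only __repr__, as pandas objects generally do from 0.25 on."""

    def __init__(self, values):
        self.values = list(values)

    def __repr__(self):
        return "Base({!r})".format(self.values)


class Child(Base):
    """A subclass that customizes __str__ only; repr() is unaffected."""

    def __str__(self):
        return "Child with {} values".format(len(self.values))


base = Base([1, 2])
child = Child([1, 2])
print(str(base))    # falls back to Base.__repr__
print(repr(child))  # still Base.__repr__
print(str(child))   # uses Child.__str__
```

Under the earlier arrangement, where __repr__ called __str__, overriding __str__ in a subclass would presumably also have changed its repr() output; that is the case subclass authors may need to revisit.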
+ Deprecations
* Sparse subclasses
The SparseSeries and SparseDataFrame subclasses are deprecated. Their functionality is better-provided
by a Series or DataFrame with sparse values.
* msgpack format
The msgpack format is deprecated as of 0.25 and will be removed in a future version. It is recommended to use pyarrow for on-the-wire transmission of pandas objects.
* The deprecated .ix[] indexer now raises a more visible FutureWarning instead of DeprecationWarning.
* Deprecated the units=M (months) and units=Y (year) parameters for units of pandas.to_timedelta, pandas.Timedelta and pandas.TimedeltaIndex
* pandas.concat has deprecated the join_axes-keyword. Instead, use DataFrame.reindex or DataFrame.reindex_like on the result or on the inputs
* The SparseArray.values attribute is deprecated. You can use np.asarray(...) or
the SparseArray.to_dense method instead.
* The functions pandas.to_datetime and pandas.to_timedelta have deprecated the box keyword. Instead, use to_numpy or Timestamp.to_datetime64 or Timedelta.to_timedelta64.
* The DataFrame.compound and Series.compound methods are deprecated and will be removed in a future version.
* The internal _start, _stop and _step attributes of RangeIndex have been deprecated.
Use the public attributes RangeIndex.start, RangeIndex.stop and RangeIndex.step instead.
* The Series.ftype, Series.ftypes and DataFrame.ftypes methods are deprecated and will be removed in a future version.
Instead, use Series.dtype and DataFrame.dtypes.
* The Series.get_values, DataFrame.get_values, Index.get_values,
SparseArray.get_values and Categorical.get_values methods are deprecated.
One of np.asarray(..) or Series.to_numpy can be used instead.
* The 'outer' method on NumPy ufuncs, e.g. np.subtract.outer has been deprecated on Series objects. Convert the input to an array with Series.array first
* Timedelta.resolution is deprecated and replaced with Timedelta.resolution_string. In a future version, Timedelta.resolution will be changed to behave like the standard library datetime.timedelta.resolution
* read_table has been undeprecated.
* Index.dtype_str is deprecated.
* Series.imag and Series.real are deprecated.
* Series.put is deprecated.
* Index.item and Series.item are deprecated.
* The default value ordered=None in pandas.api.types.CategoricalDtype has been deprecated in favor of ordered=False. When converting between categorical types ordered=True must be explicitly passed in order to be preserved.
* Index.contains is deprecated. Use key in index (__contains__) instead.
* DataFrame.get_dtype_counts is deprecated.
* Categorical.ravel will return a Categorical instead of a np.ndarray
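
Several deprecations above replace a method with an operator or attribute, for example Index.contains with key in index. A minimal sketch of how such a shim typically looks, using a hypothetical LabelIndex class and assuming a FutureWarning as the warning category; this is not the pandas implementation:

```python
import warnings


class LabelIndex:
    """Hypothetical container standing in for an index of labels."""

    def __init__(self, labels):
        self._labels = set(labels)

    def __contains__(self, key):
        return key in self._labels

    def contains(self, key):
        """Deprecated spelling: warn, then delegate to the `in` operator."""
        warnings.warn(
            "contains is deprecated; use 'key in index' instead",
            FutureWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        return key in self


index = LabelIndex(["a", "b"])
print("a" in index)
```

Callers that migrate to the in operator get the same result without the warning.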
+ Removal of prior version deprecations/changes
* Removed Panel
* Removed the previously deprecated sheetname keyword in read_excel
* Removed the previously deprecated TimeGrouper
* Removed the previously deprecated parse_cols keyword in read_excel
* Removed the previously deprecated pd.options.html.border
* Removed the previously deprecated convert_objects
* Removed the previously deprecated select method of DataFrame and Series
* Removed the previously deprecated behavior of Series treated as list-like in Series.cat.rename_categories
* Removed the previously deprecated DataFrame.reindex_axis and Series.reindex_axis
* Removed the previously deprecated behavior of altering column or index labels with Series.rename_axis or DataFrame.rename_axis
* Removed the previously deprecated tupleize_cols keyword argument in read_html, read_csv, and DataFrame.to_csv
* Removed the previously deprecated DataFrame.from_csv and Series.from_csv
* Removed the previously deprecated raise_on_error keyword argument in DataFrame.where and DataFrame.mask
* Removed the previously deprecated ordered and categories keyword arguments in astype
* Removed the previously deprecated cdate_range
* Removed the previously deprecated True option for the dropna keyword argument in SeriesGroupBy.nth
* Removed the previously deprecated convert keyword argument in Series.take and DataFrame.take
+ Performance improvements
* Significant speedup in SparseArray initialization that benefits most operations, fixing performance regression introduced in v0.20.0
* DataFrame.to_stata() is now faster when outputting data with any string or non-native endian columns
* Improved performance of Series.searchsorted. The speedup is especially large when the dtype is
int8/int16/int32 and the searched key is within the integer bounds for the dtype
* Improved performance of pandas.core.groupby.GroupBy.quantile
* Improved performance of slicing and other selected operations on a RangeIndex
* RangeIndex now performs standard lookup without instantiating an actual hashtable, hence saving memory
* Improved performance of read_csv by faster tokenizing and faster parsing of small float numbers
* Improved performance of read_csv by faster parsing of N/A and boolean values
* Improved performance of IntervalIndex.is_monotonic, IntervalIndex.is_monotonic_increasing and IntervalIndex.is_monotonic_decreasing by removing conversion to MultiIndex
* Improved performance of DataFrame.to_csv when writing datetime dtypes
* Improved performance of read_csv by much faster parsing of MM/YYYY and DD/MM/YYYY datetime formats
* Improved performance of nanops for dtypes that cannot store NaNs. Speedup is particularly prominent for Series.all and Series.any
* Improved performance of Series.map for dictionary mappers on categorical series by mapping the categories instead of mapping all values
* Improved performance of IntervalIndex.intersection
* Improved performance of read_csv by faster concatenating date columns without extra conversion to string for integer/float zero and float NaN; by faster checking the string for the possibility of being a date
* Improved performance of IntervalIndex.is_unique by removing conversion to MultiIndex
* Restored performance of DatetimeIndex.__iter__ by re-enabling specialized code path
* Improved performance when building MultiIndex with at least one CategoricalIndex level
* Improved performance by removing the need for a garbage collect when checking for SettingWithCopyWarning
* For to_datetime changed default value of cache parameter to True
* Improved performance of DatetimeIndex and PeriodIndex slicing given non-unique, monotonic data.
* Improved performance of pd.read_json for index-oriented data.
* Improved performance of MultiIndex.shape.
+ Bug fixes
> Categorical
* Bug in DataFrame.at and Series.at that would raise exception if the index was a CategoricalIndex
* Fixed bug in comparison of ordered Categorical that contained missing values with a scalar which sometimes incorrectly resulted in True
* Bug in DataFrame.dropna when the DataFrame has a CategoricalIndex containing Interval objects incorrectly raised a TypeError
> Datetimelike
* Bug in to_datetime which would raise an (incorrect) ValueError when called with a date far into the future and the format argument specified instead of raising OutOfBoundsDatetime
* Bug in to_datetime which would raise InvalidIndexError: Reindexing only valid with uniquely valued Index objects when called with cache=True, with arg including at least two different elements from the set {None, numpy.nan, pandas.NaT}
* Bug in DataFrame and Series where timezone aware data with dtype='datetime64[ns]' was not cast to naive
* Improved Timestamp type checking in various datetime functions to prevent exceptions when using a subclassed datetime
* Bug in Series and DataFrame repr where np.datetime64('NaT') and np.timedelta64('NaT') with dtype=object would be represented as NaN
* Bug in to_datetime which does not replace the invalid argument with NaT when errors is set to 'coerce'
* Bug in adding DateOffset with nonzero month to DatetimeIndex would raise ValueError
* Bug in to_datetime which raises unhandled OverflowError when called with mix of invalid dates and NaN values with format='%Y%m%d' and errors='coerce'
* Bug in isin for datetimelike indexes; DatetimeIndex, TimedeltaIndex and PeriodIndex where the levels parameter was ignored.
* Bug in to_datetime which raises TypeError for format='%Y%m%d' when called for invalid integer dates with length >= 6 digits with errors='ignore'
* Bug when comparing a PeriodIndex against a zero-dimensional numpy array
* Bug in constructing a Series or DataFrame from a numpy datetime64 array with a non-ns unit and out-of-bound timestamps generating rubbish data, which will now correctly raise an OutOfBoundsDatetime error.
* Bug in date_range with unnecessary OverflowError being raised for very large or very small dates
* Bug where adding Timestamp to a np.timedelta64 object would raise instead of returning a Timestamp
* Bug where comparing a zero-dimensional numpy array containing a np.datetime64 object to a Timestamp would incorrectly raise TypeError
* Bug in to_datetime which would raise ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True when called with cache=True, with arg including datetime strings with different offset
> Timedelta
* Bug in TimedeltaIndex.intersection where for non-monotonic indices in some cases an empty Index was returned when in fact an intersection existed
* Bug with comparisons between Timedelta and NaT raising TypeError
* Bug when adding or subtracting a BusinessHour to a Timestamp with the resulting time landing in a following or prior day respectively
* Bug when comparing a TimedeltaIndex against a zero-dimensional numpy array
> Timezones
* Bug in DatetimeIndex.to_frame where timezone aware data would be converted to timezone naive data
* Bug in to_datetime with utc=True and datetime strings that would apply previously parsed UTC offsets to subsequent arguments
* Bug in Timestamp.tz_localize and Timestamp.tz_convert does not propagate freq
* Bug in Series.at where setting Timestamp with timezone raises TypeError
* Bug in DataFrame.update when updating with timezone aware data would return timezone naive data
* Bug in to_datetime where an uninformative RuntimeError was raised when passing a naive Timestamp with datetime strings with mixed UTC offsets
* Bug in to_datetime with unit='ns' would drop timezone information from the parsed argument
* Bug in DataFrame.join where joining a timezone aware index with a timezone aware column would result in a column of NaN
* Bug in date_range where ambiguous or nonexistent start or end times were not handled by the ambiguous or nonexistent keywords respectively
* Bug in DatetimeIndex.union when combining a timezone aware and timezone unaware DatetimeIndex
* Bug when applying a numpy reduction function (e.g. numpy.minimum) to a timezone aware Series
> Numeric
* Bug in to_numeric in which large negative numbers were being improperly handled
* Bug in to_numeric in which numbers were being coerced to float, even though errors was not coerce
* Bug in to_numeric in which invalid values for errors were being allowed
* Bug in format in which floating point complex numbers were not being formatted to proper display precision and trimming
* Bug in error messages in DataFrame.corr and Series.corr. Added the possibility of using a callable.
* Bug in Series.divmod and Series.rdivmod which would raise an (incorrect) ValueError rather than return a pair of Series objects as result
* Raises a helpful exception when a non-numeric index is sent to interpolate with methods which require numeric index.
* Bug in pandas.eval when comparing floats with scalar operators, for example: x < -0.1
* Fixed bug where casting all-boolean array to integer extension array failed
* Bug in divmod with a Series object containing zeros incorrectly raising AttributeError
* Inconsistency in Series floor-division (//) and divmod filling positive//zero with NaN instead of Inf
> Conversion
* Bug in DataFrame.astype() when passing a dict of columns and types the errors parameter was ignored.
> Strings
* Bug in the __name__ attribute of several methods of Series.str, which were set incorrectly
* Improved error message when passing Series of wrong dtype to Series.str.cat
> Interval
* Construction of Interval is restricted to numeric, Timestamp and Timedelta endpoints
* Fixed bug in Series/DataFrame not displaying NaN in IntervalIndex with missing values
* Bug in IntervalIndex.get_loc where a KeyError would be incorrectly raised for a decreasing IntervalIndex
* Bug in Index constructor where passing mixed closed Interval objects would result in a ValueError instead of an object dtype Index
> Indexing
* Improved exception message when calling DataFrame.iloc with a list of non-numeric objects.
* Improved exception message when calling .iloc or .loc with a boolean indexer with different length.
* Bug in KeyError exception message when indexing a MultiIndex with a nonexistent key not displaying the original key.
* Bug in .iloc and .loc with a boolean indexer not raising an IndexError when too few items are passed.
* Bug in DataFrame.loc and Series.loc where KeyError was not raised for a MultiIndex when the key was less than or equal to the number of levels in the MultiIndex.
* Bug in which DataFrame.append produced an erroneous warning indicating that a KeyError will be thrown in the future when the data to be appended contains new columns.
* Bug in which DataFrame.to_csv caused a segfault for a reindexed data frame, when the indices were single-level MultiIndex.
* Fixed bug where assigning an arrays.PandasArray to a pandas.core.frame.DataFrame would raise error
* Allow keyword arguments for callable local reference used in the DataFrame.query string
* Fixed a KeyError when indexing a MultiIndex level with a list containing exactly one label, which is missing
* Bug which produced AttributeError on partial matching Timestamp in a MultiIndex
* Bug in Categorical and CategoricalIndex with Interval values when using the in operator (__contains__) with objects that are not comparable to the values in the Interval
* Bug in DataFrame.loc and DataFrame.iloc on a DataFrame with a single timezone-aware datetime64[ns] column incorrectly returning a scalar instead of a Series
* Bug in CategoricalIndex and Categorical incorrectly raising ValueError instead of TypeError when a list is passed using the in operator (__contains__)
* Bug in setting a new value in a Series with a Timedelta object incorrectly casting the value to an integer
* Bug in Series setting a new key (__setitem__) with a timezone-aware datetime incorrectly raising ValueError
* Bug in DataFrame.iloc when indexing with a read-only indexer
* Bug in Series setting an existing tuple key (__setitem__) with timezone-aware datetime values incorrectly raising TypeError
> Missing
* Fixed misleading exception message in Series.interpolate if argument order is required, but omitted.
* Fixed class type displayed in exception message in DataFrame.dropna if invalid axis parameter passed
* A ValueError will now be thrown by DataFrame.fillna when limit is not a positive integer
> MultiIndex
* Bug in which an incorrect exception was raised by Timedelta when testing the membership of a MultiIndex
> I/O
* Bug in DataFrame.to_html() where values were truncated using display options instead of outputting the full content
* Fixed bug in missing text when using to_clipboard if copying utf-16 characters in Python 3 on Windows
* Bug in read_json for orient='table' when it tries to infer dtypes by default, which is not applicable as dtypes are already defined in the JSON schema
* Bug in read_json for orient='table' and float index, as it infers index dtype by default, which is not applicable because index dtype is already defined in the JSON schema
* Bug in read_json for orient='table' and string of float column names, as it makes a column name type conversion to Timestamp, which is not applicable because column names are already defined in the JSON schema
* Bug in json_normalize for errors='ignore' where missing values in the input data were filled in the resulting DataFrame with the string "nan" instead of numpy.nan
* DataFrame.to_html now raises TypeError when using an invalid type for the classes parameter instead of AssertionError
* Bug in DataFrame.to_string and DataFrame.to_latex that would lead to incorrect output when the header keyword is used
* Bug in read_csv not properly interpreting the UTF8 encoded filenames on Windows on Python 3.6+
* Improved performance in pandas.read_stata and pandas.io.stata.StataReader when converting columns that have missing values
* Bug in DataFrame.to_html where header numbers would ignore display options when rounding
* Bug in read_hdf where reading a table from an HDF5 file written directly with PyTables fails with a ValueError when using a sub-selection via the start or stop arguments
* Bug in read_hdf not properly closing store after a KeyError is raised
* Improved the explanation for the failure when value labels are repeated in Stata dta files and suggested work-arounds
* Improved pandas.read_stata and pandas.io.stata.StataReader to read incorrectly formatted 118 format files saved by Stata
* Improved the col_space parameter in DataFrame.to_html to accept a string so CSS length values can be set correctly
* Fixed bug in loading objects from S3 that contain # characters in the URL
* Adds use_bqstorage_api parameter to read_gbq to speed up downloads of large data frames. This feature requires version 0.10.0 of the pandas-gbq library as well as the google-cloud-bigquery-storage and fastavro libraries.
* Fixed memory leak in DataFrame.to_json when dealing with numeric data
* Bug in read_json where date strings with Z were not converted to a UTC timezone
* Added cache_dates=True parameter to read_csv, which allows caching unique dates when they are parsed
* DataFrame.to_excel now raises a ValueError when the caller's dimensions exceed the limitations of Excel
* Fixed bug in pandas.read_csv where a BOM would result in incorrect parsing using engine='python'
* read_excel now raises a ValueError when input is of type pandas.io.excel.ExcelFile and engine param is passed since pandas.io.excel.ExcelFile has an engine defined
* Bug while selecting from HDFStore with where='' specified.
* Fixed bug in DataFrame.to_excel() where custom objects (i.e. PeriodIndex) inside merged cells were not being converted into types safe for the Excel writer
* Bug in read_hdf where reading a timezone aware DatetimeIndex would raise a TypeError
* Bug in to_msgpack and read_msgpack which would raise a ValueError rather than a FileNotFoundError for an invalid path
* Fixed bug in DataFrame.to_parquet which would raise a ValueError when the dataframe had no columns
* Allow parsing of PeriodDtype columns when using read_csv
> Plotting
* Fixed bug where api.extensions.ExtensionArray could not be used in matplotlib plotting
* Bug in an error message in DataFrame.plot. Improved the error message if non-numerics are passed to DataFrame.plot
* Bug causing incorrect ticklabel positions when plotting an index that is non-numeric / non-datetime
* Fixed bug causing plots of PeriodIndex timeseries to fail if the frequency is a multiple of the frequency rule code
* Fixed bug when plotting a DatetimeIndex with datetime.timezone.utc timezone
> Groupby/resample/rolling
* Bug in pandas.core.resample.Resampler.agg with a timezone aware index where OverflowError would raise when passing a list of functions
* Bug in pandas.core.groupby.DataFrameGroupBy.nunique in which the names of column levels were lost
* Bug in pandas.core.groupby.GroupBy.agg when applying an aggregation function to timezone aware data
* Bug in pandas.core.groupby.GroupBy.first and pandas.core.groupby.GroupBy.last where timezone information would be dropped
* Bug in pandas.core.groupby.GroupBy.size when grouping only NA values
* Bug in Series.groupby where observed kwarg was previously ignored
* Bug in Series.groupby where using groupby with a MultiIndex Series with a list of labels equal to the length of the series caused incorrect grouping
* Ensured that ordering of outputs in groupby aggregation functions is consistent across all versions of Python
* Ensured that result group order is correct when grouping on an ordered Categorical and specifying observed=True
* Bug in pandas.core.window.Rolling.min and pandas.core.window.Rolling.max that caused a memory leak
* Bug in pandas.core.window.Rolling.count and pandas.core.window.Expanding.count where the axis keyword was previously ignored
* Bug in pandas.core.groupby.GroupBy.idxmax and pandas.core.groupby.GroupBy.idxmin with datetime column would return incorrect dtype
* Bug in pandas.core.groupby.GroupBy.cumsum, pandas.core.groupby.GroupBy.cumprod, pandas.core.groupby.GroupBy.cummin and pandas.core.groupby.GroupBy.cummax with categorical column having absent categories, would return incorrect result or segfault
* Bug in pandas.core.groupby.GroupBy.nth where NA values in the grouping would return incorrect results
* Bug in pandas.core.groupby.SeriesGroupBy.transform where transforming an empty group would raise a ValueError
* Bug in pandas.core.frame.DataFrame.groupby where passing a pandas.core.groupby.grouper.Grouper would return incorrect groups when using the .groups accessor
* Bug in pandas.core.groupby.GroupBy.agg where incorrect results are returned for uint64 columns.
* Bug in pandas.core.window.Rolling.median and pandas.core.window.Rolling.quantile where MemoryError is raised with empty window
* Bug in pandas.core.window.Rolling.median and pandas.core.window.Rolling.quantile where incorrect results are returned with closed='left' and closed='neither'
* Improved pandas.core.window.Rolling, pandas.core.window.Window and pandas.core.window.EWM functions to exclude nuisance columns from results instead of raising errors and raise a DataError only if all columns are nuisance
* Bug in pandas.core.window.Rolling.max and pandas.core.window.Rolling.min where incorrect results are returned with an empty variable window
* Raise a helpful exception when an unsupported weighted window function is used as an argument of pandas.core.window.Window.aggregate
> Reshaping
* Bug in pandas.merge where the string "None" was added to column names if None was assigned in suffixes, instead of the column name being kept as-is
* Bug in merge when merging by index name would sometimes result in an incorrectly numbered index (missing index values are now assigned NA)
* to_records now accepts dtypes to its column_dtypes parameter
* Bug in concat where order of OrderedDict (and dict in Python 3.6+) is not respected, when passed in as objs argument
* Bug in pivot_table where columns with NaN values are dropped even if dropna argument is False, when the aggfunc argument contains a list
* Bug in concat where the resulting freq of two DatetimeIndex with the same freq would be dropped
* Bug in merge where merging with equivalent Categorical dtypes was raising an error
* Bug in DataFrame where instantiating with a dict of iterators or generators (e.g. pd.DataFrame({'A': reversed(range(3))})) raised an error
* Bug in DataFrame where instantiating with a range (e.g. pd.DataFrame(range(3))) raised an error
* Bug in DataFrame constructor when passing non-empty tuples would cause a segmentation fault
* Bug in Series.apply where it failed when the series is a timezone aware DatetimeIndex
* Bug in pandas.cut where large bins could incorrectly raise an error due to an integer overflow
* Bug in DataFrame.sort_index where an error is thrown when a multi-indexed DataFrame is sorted on all levels with the initial level sorted last
* Bug in Series.nlargest where True was treated as smaller than False
* Bug in DataFrame.pivot_table with an IntervalIndex as pivot index would raise TypeError
* Bug in which DataFrame.from_dict ignored order of OrderedDict when orient='index'
* Bug in DataFrame.transpose where transposing a DataFrame with a timezone-aware datetime column would incorrectly raise ValueError
* Bug in pivot_table when pivoting a timezone aware column as the values would remove timezone information
* Bug in merge_asof when specifying multiple by columns where one is datetime64[ns, tz] dtype
> Sparse
* Significant speedup in SparseArray initialization that benefits most operations, fixing performance regression introduced in v0.20.0
* Bug in SparseFrame constructor where passing None as the data would cause default_fill_value to be ignored
* Bug in SparseDataFrame where adding a column whose length of values does not match the length of the index raised AssertionError instead of ValueError
* Introduce a better error message in Series.sparse.from_coo so it returns a TypeError for inputs that are not coo matrices
* Bug in numpy.modf on a SparseArray. Now a tuple of SparseArray is returned.
> Build Changes
* Fix install error with PyPy on macOS
> ExtensionArray
* Bug in factorize when passing an ExtensionArray with a custom na_sentinel
* Bug in Series.count where NA values in ExtensionArrays were miscounted
* Added Series.__array_ufunc__ to better handle NumPy ufuncs applied to Series backed by extension arrays
* Keyword argument deep has been removed from ExtensionArray.copy
> Other
* Removed unused C functions from vendored UltraJSON implementation
* Allow Index and RangeIndex to be passed to numpy min and max functions
* Use actual class name in repr of empty objects of a Series subclass
* Bug in DataFrame where passing an object array of timezone-aware datetime objects would incorrectly raise ValueError
- Remove upstream-included pandas-tests-memory.patch
-------------------------------------------------------------------
Sat Mar 16 22:35:08 UTC 2019 - Arun Persaud <arun@gmx.de>
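
Two of the Reshaping fixes in the 0.25.0 entry above (DataFrame construction from
a range and from a dict of iterators) can be checked after the update with a short
sketch like the following. It is only an illustration, assuming pandas >= 0.25.0
is importable; the variable names are hypothetical and not part of the package or
its test suite.

```python
import pandas as pd

# Constructing from a range: listed in the changelog as raising an error
# before 0.25.0.
df_range = pd.DataFrame(range(3))

# Constructing from a dict of iterators: also listed as raising an error
# before 0.25.0. The iterator is consumed once to build column "A".
df_iter = pd.DataFrame({"A": reversed(range(3))})

# Plain lists make the results easy to compare.
range_values = df_range[0].tolist()
iter_values = df_iter["A"].tolist()

print(range_values)
print(iter_values)
```

On an older pandas the two constructor calls are expected to raise rather than
return frames, so the sketch doubles as a quick smoke test of the updated package.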

View File

@ -17,84 +17,81 @@
%{?!python_module:%define python_module() python-%{**} python3-%{**}}
%define oldpython python
%define skip_python2 1
Name: python-pandas
Version: 0.24.2
Version: 0.25.0
Release: 0
Summary: Python module for working with "relational" or "labeled" data
Summary: Python data structures for data analysis, time series, and statistics
License: BSD-3-Clause
Group: Development/Libraries/Python
URL: http://pandas.pydata.org/
Source0: https://files.pythonhosted.org/packages/source/p/pandas/pandas-%{version}.tar.gz
Patch0: pandas-tests-memory.patch
BuildRequires: %{python_module Cython >= 0.28.2}
BuildRequires: %{python_module SQLAlchemy}
BuildRequires: %{python_module XlsxWriter}
BuildRequires: %{python_module beautifulsoup4 >= 4.2.1}
BuildRequires: %{python_module devel}
BuildRequires: %{python_module hypothesis}
BuildRequires: %{python_module lxml}
BuildRequires: %{python_module nose}
BuildRequires: %{python_module numpy-devel >= 1.15.0}
BuildRequires: %{python_module pytest-mock}
BuildRequires: %{python_module pytest}
BuildRequires: %{python_module python-dateutil >= 2.5}
BuildRequires: %{python_module pytz >= 2011k}
BuildRequires: %{python_module numpy-devel >= 1.13.3}
BuildRequires: %{python_module setuptools >= 24.2.0}
BuildRequires: %{python_module six}
BuildRequires: %{python_module xlrd}
BuildRequires: fdupes
BuildRequires: gcc-c++
BuildRequires: python-rpm-macros
# SECTION test requirements
BuildRequires: %{python_module SQLAlchemy >= 1.1.4}
BuildRequires: %{python_module XlsxWriter >= 0.9.8}
BuildRequires: %{python_module beautifulsoup4 >= 4.6.0}
BuildRequires: %{python_module hypothesis}
BuildRequires: %{python_module lxml >= 3.8.0}
BuildRequires: %{python_module openpyxl >= 2.4.8}
BuildRequires: %{python_module pytest-mock}
BuildRequires: %{python_module pytest >= 4.0.2}
BuildRequires: %{python_module python-dateutil >= 2.6.1}
BuildRequires: %{python_module pytz >= 2015.4}
BuildRequires: %{python_module xlrd >= 1.1.0}
BuildRequires: %{python_module xlwt >= 1.2.0}
BuildRequires: xvfb-run
# /SECTION
Requires: python-Cython >= 0.28.2
Requires: python-Tempita
Requires: python-lxml
Requires: python-numpy >= 1.15.0
Requires: python-python-dateutil >= 2.5
Requires: python-pytz >= 2011k
Requires: python-six
Recommends: python-Bottleneck
Requires: python-numpy >= 1.13.3
Requires: python-python-dateutil >= 2.6.1
Requires: python-pytz >= 2015.4
Recommends: python-Bottleneck >= 1.2.1
Recommends: python-Jinja2
Recommends: python-SQLAlchemy >= 0.8.1
Recommends: python-XlsxWriter
Recommends: python-beautifulsoup4 >= 4.2.1
Recommends: python-QtPy
Recommends: python-SQLAlchemy >= 1.1.4
Recommends: python-XlsxWriter >= 0.9.8
Recommends: python-beautifulsoup4 >= 4.6.0
Recommends: python-blosc
Recommends: python-boto
Recommends: python-google-api-python-client
Recommends: python-fastparquet >= 0.2.1
Recommends: python-gcsfs >= 0.2.2
Recommends: python-html5lib
Recommends: python-matplotlib
Recommends: python-numexpr >= 2.1
Recommends: python-oauth2client
Recommends: python-openpyxl >= 2.4
Recommends: python-pandas-gbq
Recommends: python-python-gflags
Recommends: python-s3fs
Recommends: python-scipy
Recommends: python-tables >= 3.0.0
Recommends: python-xarray >= 0.7.0
Recommends: python-xlrd
Recommends: python-xlwt
Recommends: python-lxml >= 3.8.0
Recommends: python-matplotlib >= 2.2.2
Recommends: python-numexpr >= 2.6.2
Recommends: python-openpyxl >= 2.4.8
Recommends: python-pandas-gbq >= 0.8.0
Recommends: python-psycopg2
Recommends: python-pyarrow >= 0.9.0
Recommends: python-PyMySQL >= 0.7.11
Recommends: python-pyreadstat
Recommends: python-qt5
Recommends: python-scipy >= 0.19.0
Recommends: python-tables >= 3.4.2
Recommends: python-xarray >= 0.8.2
Recommends: python-xlrd >= 1.1.0
Recommends: python-xlwt >= 1.2.0
Recommends: xclip
Recommends: xsel
Recommends: python-zlib
Obsoletes: python-pandas-doc < %{version}
Provides: python-pandas-doc = %{version}
%ifpython2
Recommends: python-backports.lzma
Obsoletes: %{oldpython}-pandas-doc < %{version}
Provides: %{oldpython}-pandas-doc = %{version}
%endif
%python_subpackages
%description
pandas is a Python package providing flexible and expressive data
structures for working with "relational" or "labeled" data.
Documentation is located at
http://pandas.pydata.org/pandas-docs/stable/ .
Pandas is a Python package providing data structures designed for
working with structured (tabular, multidimensional, potentially
heterogeneous) and time series data. It is a high-level building
block for doing data analysis in Python.
%prep
%setup -q -n pandas-%{version}
%patch0 -p1
sed -i -e '/^#!\//, 1d' pandas/core/computation/eval.py
%build
@ -107,7 +104,13 @@ export CFLAGS="%{optflags} -fno-strict-aliasing"
%check
# skip test that tries to compile stuff in buildroot test_oo_optimizable
%python_expand PYTHONPATH=%{buildroot}%{$python_sitearch} xvfb-run py.test-%{$python_version} -v %{buildroot}%{$python_sitearch}/pandas/tests -k 'not test_oo_optimizable'
export PYTHONHASHSEED=$(python -c 'import random; print(random.randint(1, 4294967295))')
export http_proxy=http://1.2.3.4 https_proxy=http://1.2.3.4;
export LANG=en_US.UTF-8
export LC_ALL=en_US.UTF-8
%{python_expand export PYTHONPATH=%{buildroot}%{$python_sitearch}
xvfb-run py.test-%{$python_version} -v %{buildroot}%{$python_sitearch}/pandas/tests -k 'not test_oo_optimizable'
}
%files %{python_files}
%license LICENSE