|
|
|
@@ -1,3 +1,409 @@
|
|
|
|
|
-------------------------------------------------------------------
|
|
|
|
|
Mon Jul 22 15:36:34 UTC 2019 - Todd R <toddrme2178@gmail.com>
|
|
|
|
|
|
|
|
|
|
- Update to Version 0.25.0
|
|
|
|
|
+ Warning
|
|
|
|
|
* Starting with the 0.25.x series of releases, pandas only supports Python 3.5.3 and higher.
|
|
|
|
|
* The minimum supported Python version will be bumped to 3.6 in a future release.
|
|
|
|
|
* Panel has been fully removed. For N-D labeled data structures, please
|
|
|
|
|
use xarray
|
|
|
|
|
* read_pickle and read_msgpack are only guaranteed backwards compatible back to
|
|
|
|
|
pandas version 0.20.3
|
|
|
|
|
+ Enhancements
|
|
|
|
|
* Groupby aggregation with relabeling
|
|
|
|
|
Pandas has added special groupby behavior, known as "named aggregation", for naming the
|
|
|
|
|
output columns when applying multiple aggregation functions to specific columns.
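A minimal sketch of named aggregation, assuming pandas 0.25 or later is installed. The animals frame and its column names are made up for illustration; each keyword passed to agg names an output column and maps it to an (input column, aggregation) pair.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
animals = pd.DataFrame({
    "kind": ["cat", "dog", "cat", "dog"],
    "height": [9.1, 6.0, 9.5, 34.0],
    "weight": [7.9, 7.5, 9.9, 198.0],
})

# Keyword = output column name; value = (input column, aggregation).
summary = animals.groupby("kind").agg(
    min_height=("height", "min"),
    max_height=("height", "max"),
    average_weight=("weight", "mean"),
)
print(summary)
```

The output columns come back in the order the keywords were given.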
|
|
|
|
|
* Groupby Aggregation with multiple lambdas
|
|
|
|
|
You can now provide multiple lambda functions to a list-like aggregation in
|
|
|
|
|
pandas.core.groupby.GroupBy.agg.
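As a rough illustration, assuming pandas 0.25 or later, two lambdas can be passed in one list; the sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
animals = pd.DataFrame({
    "kind": ["cat", "dog", "cat", "dog"],
    "height": [9.1, 6.0, 9.5, 34.0],
})

# Two anonymous functions in one list-like aggregation:
# the first value and the last value of each group.
result = animals.groupby("kind")["height"].agg(
    [lambda x: x.iloc[0], lambda x: x.iloc[-1]]
)
print(result)
```

Each lambda yields its own output column, so the result has one column per function.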
|
|
|
|
|
* Better repr for MultiIndex
|
|
|
|
|
Printing of MultiIndex instances now shows tuples of each row and ensures
|
|
|
|
|
that the tuple items are vertically aligned, so it's now easier to understand
|
|
|
|
|
the structure of the MultiIndex.
|
|
|
|
|
* Shorter truncated repr for Series and DataFrame
|
|
|
|
|
Currently, the default display options of pandas ensure that when a Series
|
|
|
|
|
or DataFrame has more than 60 rows, its repr gets truncated to this maximum
|
|
|
|
|
of 60 rows (the display.max_rows option). However, this still gives
|
|
|
|
|
a repr that takes up a large part of the vertical screen estate. Therefore,
|
|
|
|
|
a new option display.min_rows is introduced with a default of 10 which
|
|
|
|
|
determines the number of rows shown in the truncated repr.
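A small sketch of how the two options interact, assuming pandas 0.25 or later. The option values are set explicitly here so the example does not depend on local configuration.

```python
import pandas as pd

s = pd.Series(range(100))

# With more than max_rows entries, the repr is truncated down to min_rows rows.
with pd.option_context("display.max_rows", 60, "display.min_rows", 10):
    text = repr(s)

print(text)
```

Setting display.min_rows to None should fall back to the previous behavior, where the truncated repr uses max_rows.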
|
|
|
|
|
* Json normalize with max_level param support
|
|
|
|
|
json_normalize normalizes the provided input dict to all
|
|
|
|
|
nested levels. The new max_level parameter provides more control over
|
|
|
|
|
which level to end normalization.
|
|
|
|
|
* Series.explode to split list-like values to rows
|
|
|
|
|
Series and DataFrame have gained the Series.explode and DataFrame.explode methods to transform
|
|
|
|
|
list-likes to individual rows.
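A minimal sketch of Series.explode, assuming pandas 0.25 or later; the values are arbitrary.

```python
import pandas as pd

s = pd.Series([[1, 2, 3], "foo", [], [4, 5]])

# Each list element gets its own row; the original index label is repeated.
# Scalars are kept as-is and an empty list becomes a missing value.
exploded = s.explode()
print(exploded)
```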
|
|
|
|
|
* DataFrame.plot keywords logy, logx and loglog can now accept the value 'sym' for symlog scaling.
|
|
|
|
|
* Added support for ISO week year format ('%G-%V-%u') when parsing datetimes using to_datetime
|
|
|
|
|
* Indexing of DataFrame and Series now accepts zero-dimensional np.ndarray
|
|
|
|
|
* Timestamp.replace now supports the fold argument to disambiguate DST transition times
|
|
|
|
|
* DataFrame.at_time and Series.at_time now support datetime.time objects with timezones
|
|
|
|
|
* DataFrame.pivot_table now accepts an observed parameter which is passed to underlying calls to DataFrame.groupby to speed up grouping categorical data.
|
|
|
|
|
* Series.str has gained the Series.str.casefold method to remove all case distinctions present in a string
|
|
|
|
|
* DataFrame.set_index now works for instances of abc.Iterator, provided their output is of the same length as the calling frame
|
|
|
|
|
* DatetimeIndex.union now supports the sort argument. The behavior of the sort parameter matches that of Index.union
|
|
|
|
|
* RangeIndex.union now supports the sort argument. If sort=False an unsorted Int64Index is always returned. sort=None is the default and returns a monotonically increasing RangeIndex if possible or a sorted Int64Index if not
|
|
|
|
|
* TimedeltaIndex.intersection now also supports the sort keyword
|
|
|
|
|
* DataFrame.rename now supports the errors argument to raise errors when attempting to rename nonexistent keys
|
|
|
|
|
* Added api.frame.sparse for working with a DataFrame whose values are sparse
|
|
|
|
|
* RangeIndex has gained RangeIndex.start, RangeIndex.stop, and RangeIndex.step attributes
|
|
|
|
|
* datetime.timezone objects are now supported as arguments to timezone methods and constructors
|
|
|
|
|
* DataFrame.query and DataFrame.eval now support quoting column names with backticks to refer to names with spaces
|
|
|
|
|
* merge_asof now gives a clearer error message when merge keys are categoricals that are not equal
|
|
|
|
|
* pandas.core.window.Rolling supports exponential (or Poisson) window type
|
|
|
|
|
* Error message for missing required imports now includes the original import error's text
|
|
|
|
|
* DatetimeIndex and TimedeltaIndex now have a mean method
|
|
|
|
|
* DataFrame.describe now formats integer percentiles without decimal point
|
|
|
|
|
* Added support for reading SPSS .sav files using read_spss
|
|
|
|
|
* Added new option plotting.backend to be able to select a plotting backend different than the existing matplotlib one. Use pandas.set_option('plotting.backend', '<backend-module>') where <backend-module> is a library implementing the pandas plotting API
|
|
|
|
|
* pandas.offsets.BusinessHour supports multiple opening hours intervals
|
|
|
|
|
* read_excel can now use openpyxl to read Excel files via the engine='openpyxl' argument. This will become the default in a future release
|
|
|
|
|
* pandas.io.excel.read_excel supports reading OpenDocument tables. Specify engine='odf' to enable. Consult the IO User Guide for more details
|
|
|
|
|
* Interval, IntervalIndex, and arrays.IntervalArray have gained an Interval.is_empty attribute denoting if the given interval(s) are empty
|
|
|
|
|
+ Backwards incompatible API changes
|
|
|
|
|
* Indexing with date strings with UTC offsets
|
|
|
|
|
Indexing a DataFrame or Series with a DatetimeIndex with a
|
|
|
|
|
date string with a UTC offset would previously ignore the UTC offset. Now, the UTC offset
|
|
|
|
|
is respected in indexing.
|
|
|
|
|
* MultiIndex constructed from levels and codes
|
|
|
|
|
Constructing a MultiIndex with NaN levels or codes value < -1 was allowed previously.
|
|
|
|
|
Now, construction with codes value < -1 is not allowed and NaN levels' corresponding codes
|
|
|
|
|
would be reassigned as -1.
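A hedged sketch of the new validation, assuming pandas 0.25 or later. The levels and codes are arbitrary; only the rejection of a code below -1 is shown.

```python
import pandas as pd

# A code of -1 marks a missing value; anything lower is now rejected.
try:
    pd.MultiIndex(levels=[[1, 2]], codes=[[0, -2]])
    rejected = False
except ValueError as exc:
    rejected = True
    print("construction failed:", exc)
```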
|
|
|
|
|
* Groupby.apply on DataFrame evaluates first group only once
|
|
|
|
|
The implementation of DataFrameGroupBy.apply()
|
|
|
|
|
previously evaluated the supplied function consistently twice on the first group
|
|
|
|
|
to infer if it is safe to use a fast code path. Particularly for functions with
|
|
|
|
|
side effects, this was an undesired behavior and may have led to surprises.
|
|
|
|
|
* Concatenating sparse values
|
|
|
|
|
When passed DataFrames whose values are sparse, concat will now return a
|
|
|
|
|
Series or DataFrame with sparse values, rather than a SparseDataFrame.
|
|
|
|
|
* The .str-accessor performs stricter type checks
|
|
|
|
|
Due to the lack of more fine-grained dtypes, Series.str so far only checked whether the data was
|
|
|
|
|
of object dtype. Series.str will now infer the dtype of the data *within* the Series; in particular,
|
|
|
|
|
'bytes'-only data will raise an exception (except for Series.str.decode, Series.str.get,
|
|
|
|
|
Series.str.len, Series.str.slice).
|
|
|
|
|
* Categorical dtypes are preserved during groupby
|
|
|
|
|
Previously, columns that were categorical, but not the groupby key(s) would be converted to object dtype during groupby operations. Pandas now will preserve these dtypes.
|
|
|
|
|
* Incompatible Index type unions
|
|
|
|
|
When performing Index.union operations between objects of incompatible dtypes,
|
|
|
|
|
the result will be a base Index of dtype object. This behavior holds true for
|
|
|
|
|
unions between Index objects that previously would have been prohibited. The dtype
|
|
|
|
|
of empty Index objects will now be evaluated before performing union operations
|
|
|
|
|
rather than simply returning the other Index object. Index.union can now be
|
|
|
|
|
considered commutative, such that A.union(B) == B.union(A).
|
|
|
|
|
* DataFrame groupby ffill/bfill no longer return group labels
|
|
|
|
|
The methods ffill, bfill, pad and backfill of
|
|
|
|
|
pandas.core.groupby.DataFrameGroupBy
|
|
|
|
|
previously included the group labels in the return value, which was
|
|
|
|
|
inconsistent with other groupby transforms. Now only the filled values
|
|
|
|
|
are returned.
|
|
|
|
|
* DataFrame describe on an empty categorical / object column will return top and freq
|
|
|
|
|
When calling DataFrame.describe with an empty categorical / object
|
|
|
|
|
column, the 'top' and 'freq' columns were previously omitted, which was inconsistent with
|
|
|
|
|
the output for non-empty columns. Now the 'top' and 'freq' columns will always be included,
|
|
|
|
|
with numpy.nan in the case of an empty DataFrame
|
|
|
|
|
* __str__ methods now call __repr__ rather than vice versa
|
|
|
|
|
Pandas has until now mostly defined string representations in a pandas object's
|
|
|
|
|
__str__/__unicode__/__bytes__ methods, and called __str__ from the __repr__
|
|
|
|
|
method, if a specific __repr__ method is not found. This is not needed for Python3.
|
|
|
|
|
In Pandas 0.25, the string representations of Pandas objects are now generally
|
|
|
|
|
defined in __repr__, and calls to __str__ in general now pass the call on to
|
|
|
|
|
the __repr__, if a specific __str__ method doesn't exist, as is standard for Python.
|
|
|
|
|
This change is backward compatible for direct usage of Pandas, but if you subclass
|
|
|
|
|
Pandas objects *and* give your subclasses specific __str__/__repr__ methods,
|
|
|
|
|
you may have to adjust your __str__/__repr__ methods.
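The delegation described above is standard Python behavior and can be sketched without pandas. The Base and Child classes below are hypothetical stand-ins, not pandas classes: when a class defines only __repr__, str() falls back to it.

```python
class Base:
    """Stand-in for a library class that defines only __repr__."""

    def __repr__(self):
        return "Base()"


class Child(Base):
    """Subclass that overrides __repr__ and defines no __str__."""

    def __repr__(self):
        return "Child()"


# object.__str__ calls __repr__, so str() picks up the override.
as_str = str(Child())
as_repr = repr(Child())
print(as_str, as_repr)
```

A subclass that previously overrode only __str__ would keep its custom str() output, but repr() would no longer go through it.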
|
|
|
|
|
* Indexing an IntervalIndex with Interval objects
|
|
|
|
|
Indexing methods for IntervalIndex have been modified to require exact matches only for Interval queries.
|
|
|
|
|
IntervalIndex methods previously matched on any overlapping Interval. Behavior with scalar points, e.g. querying
|
|
|
|
|
with an integer, is unchanged.
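A small sketch of the exact-match rule, assuming pandas 0.25 or later; the intervals are arbitrary.

```python
import pandas as pd

index = pd.IntervalIndex.from_tuples([(0, 4), (1, 5), (5, 8)])

# An Interval query must match an entry exactly.
exact = index.get_loc(pd.Interval(1, 5))

# An Interval that merely overlaps existing entries no longer matches.
try:
    index.get_loc(pd.Interval(2, 6))
    overlapping_matched = True
except KeyError:
    overlapping_matched = False

print(exact, overlapping_matched)
```

For overlap-based lookups, IntervalIndex.overlaps can be used to build a boolean mask instead.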
|
|
|
|
|
* Binary ufuncs on Series now align
|
|
|
|
|
Applying a binary ufunc like numpy.power now aligns the inputs
|
|
|
|
|
when both are Series.
|
|
|
|
|
* Categorical.argsort now places missing values at the end
|
|
|
|
|
Categorical.argsort now places missing values at the end of the array, making it
|
|
|
|
|
consistent with NumPy and the rest of pandas.
|
|
|
|
|
* Column order is preserved when passing a list of dicts to DataFrame
|
|
|
|
|
Starting with Python 3.7 the key-order of dict is guaranteed (https://mail.python.org/pipermail/python-dev/2017-December/151283.html). In practice, this has been true since
|
|
|
|
|
Python 3.6. The DataFrame constructor now treats a list of dicts in the same way as
|
|
|
|
|
it does a list of OrderedDict, i.e. preserving the order of the dicts.
|
|
|
|
|
This change applies only when pandas is running on Python>=3.6.
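A short sketch of the preserved column order, assuming pandas 0.25 or later on Python 3.6 or later; the records are made up.

```python
import pandas as pd

# Hypothetical records; the keys are deliberately not in alphabetical order.
data = [
    {"name": "Joe", "state": "NY", "age": 18},
    {"name": "Jane", "state": "KY", "age": 19, "hobby": "Minecraft"},
    {"name": "Jean", "state": "OK", "age": 20, "finances": "good"},
]

# Columns follow the order in which keys are first seen across the dicts.
df = pd.DataFrame(data)
print(list(df.columns))
```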
|
|
|
|
|
* Increased minimum versions for dependencies
|
|
|
|
|
* DatetimeTZDtype will now standardize pytz timezones to a common timezone instance
|
|
|
|
|
* Timestamp and Timedelta scalars now implement the to_numpy method as aliases to Timestamp.to_datetime64 and Timedelta.to_timedelta64, respectively.
|
|
|
|
|
* Timestamp.strptime will now raise a NotImplementedError
|
|
|
|
|
* Comparing Timestamp with unsupported objects now returns NotImplemented instead of raising TypeError. This implies that unsupported rich comparisons are delegated to the other object, and are now consistent with Python 3 behavior for datetime objects
|
|
|
|
|
* Bug in DatetimeIndex.snap which didn't preserve the name of the input Index
|
|
|
|
|
* The arg argument in pandas.core.groupby.DataFrameGroupBy.agg has been renamed to func
|
|
|
|
|
* The arg argument in pandas.core.window._Window.aggregate has been renamed to func
|
|
|
|
|
* Most Pandas classes had a __bytes__ method, which was used for getting a python2-style bytestring representation of the object. This method has been removed as a part of dropping Python2
|
|
|
|
|
* The .str-accessor has been disabled for 1-level MultiIndex, use MultiIndex.to_flat_index if necessary
|
|
|
|
|
* Removed support of gtk package for clipboards
|
|
|
|
|
* Using an unsupported version of Beautiful Soup 4 will now raise an ImportError instead of a ValueError
|
|
|
|
|
* Series.to_excel and DataFrame.to_excel will now raise a ValueError when saving timezone aware data.
|
|
|
|
|
* ExtensionArray.argsort places NA values at the end of the sorted array.
|
|
|
|
|
* DataFrame.to_hdf and Series.to_hdf will now raise a NotImplementedError when saving a MultiIndex with extension data types for a fixed format.
|
|
|
|
|
* Passing duplicate names in read_csv will now raise a ValueError
|
|
|
|
|
+ Deprecations
|
|
|
|
|
* Sparse subclasses
|
|
|
|
|
The SparseSeries and SparseDataFrame subclasses are deprecated. Their functionality is better-provided
|
|
|
|
|
by a Series or DataFrame with sparse values.
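A hedged sketch of the suggested replacement, assuming pandas 0.25 or later: a regular Series with a sparse dtype, inspected through the .sparse accessor. The values are arbitrary.

```python
import pandas as pd

# A regular Series holding sparse values, instead of a SparseSeries.
s = pd.Series([0, 0, 1, 2], dtype="Sparse[int]")

# The .sparse accessor exposes sparse-specific information.
density = s.sparse.density      # fraction of values that are actually stored
dense = s.sparse.to_dense()     # back to an ordinary dense Series
print(s.dtype, density)
```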
|
|
|
|
|
* msgpack format
|
|
|
|
|
The msgpack format is deprecated as of 0.25 and will be removed in a future version. It is recommended to use pyarrow for on-the-wire transmission of pandas objects.
|
|
|
|
|
* The deprecated .ix[] indexer now raises a more visible FutureWarning instead of DeprecationWarning.
|
|
|
|
|
* Deprecated the units=M (months) and units=Y (year) parameters for units of pandas.to_timedelta, pandas.Timedelta and pandas.TimedeltaIndex
|
|
|
|
|
* pandas.concat has deprecated the join_axes-keyword. Instead, use DataFrame.reindex or DataFrame.reindex_like on the result or on the inputs
|
|
|
|
|
* The SparseArray.values attribute is deprecated. You can use np.asarray(...) or
|
|
|
|
|
the SparseArray.to_dense method instead.
|
|
|
|
|
* The functions pandas.to_datetime and pandas.to_timedelta have deprecated the box keyword. Instead, use to_numpy or Timestamp.to_datetime64 or Timedelta.to_timedelta64.
|
|
|
|
|
* The DataFrame.compound and Series.compound methods are deprecated and will be removed in a future version.
|
|
|
|
|
* The internal attributes _start, _stop and _step of RangeIndex have been deprecated.
|
|
|
|
|
Use the public attributes RangeIndex.start, RangeIndex.stop and RangeIndex.step instead.
|
|
|
|
|
* The Series.ftype, Series.ftypes and DataFrame.ftypes methods are deprecated and will be removed in a future version.
|
|
|
|
|
Instead, use Series.dtype and DataFrame.dtypes.
|
|
|
|
|
* The Series.get_values, DataFrame.get_values, Index.get_values,
|
|
|
|
|
SparseArray.get_values and Categorical.get_values methods are deprecated.
|
|
|
|
|
One of np.asarray(..) or Series.to_numpy can be used instead.
|
|
|
|
|
* The 'outer' method on NumPy ufuncs, e.g. np.subtract.outer has been deprecated on Series objects. Convert the input to an array with Series.array first
|
|
|
|
|
* Timedelta.resolution is deprecated and replaced with Timedelta.resolution_string. In a future version, Timedelta.resolution will be changed to behave like the standard library datetime.timedelta.resolution
|
|
|
|
|
* read_table has been undeprecated.
|
|
|
|
|
* Index.dtype_str is deprecated.
|
|
|
|
|
* Series.imag and Series.real are deprecated.
|
|
|
|
|
* Series.put is deprecated.
|
|
|
|
|
* Index.item and Series.item are deprecated.
|
|
|
|
|
* The default value ordered=None in pandas.api.types.CategoricalDtype has been deprecated in favor of ordered=False. When converting between categorical types ordered=True must be explicitly passed in order to be preserved.
|
|
|
|
|
* Index.contains is deprecated. Use key in index (__contains__) instead.
|
|
|
|
|
* DataFrame.get_dtype_counts is deprecated.
|
|
|
|
|
* Categorical.ravel will return a Categorical instead of a np.ndarray
|
|
|
|
|
+ Removal of prior version deprecations/changes
|
|
|
|
|
* Removed Panel
|
|
|
|
|
* Removed the previously deprecated sheetname keyword in read_excel
|
|
|
|
|
* Removed the previously deprecated TimeGrouper
|
|
|
|
|
* Removed the previously deprecated parse_cols keyword in read_excel
|
|
|
|
|
* Removed the previously deprecated pd.options.html.border
|
|
|
|
|
* Removed the previously deprecated convert_objects
|
|
|
|
|
* Removed the previously deprecated select method of DataFrame and Series
|
|
|
|
|
* Removed the previously deprecated behavior of Series treated as list-like in Series.cat.rename_categories
|
|
|
|
|
* Removed the previously deprecated DataFrame.reindex_axis and Series.reindex_axis
|
|
|
|
|
* Removed the previously deprecated behavior of altering column or index labels with Series.rename_axis or DataFrame.rename_axis
|
|
|
|
|
* Removed the previously deprecated tupleize_cols keyword argument in read_html, read_csv, and DataFrame.to_csv
|
|
|
|
|
* Removed the previously deprecated DataFrame.from_csv and Series.from_csv
|
|
|
|
|
* Removed the previously deprecated raise_on_error keyword argument in DataFrame.where and DataFrame.mask
|
|
|
|
|
* Removed the previously deprecated ordered and categories keyword arguments in astype
|
|
|
|
|
* Removed the previously deprecated cdate_range
|
|
|
|
|
* Removed the previously deprecated True option for the dropna keyword argument in SeriesGroupBy.nth
|
|
|
|
|
* Removed the previously deprecated convert keyword argument in Series.take and DataFrame.take
|
|
|
|
|
+ Performance improvements
|
|
|
|
|
* Significant speedup in SparseArray initialization that benefits most operations, fixing performance regression introduced in v0.20.0
|
|
|
|
|
* DataFrame.to_stata() is now faster when outputting data with any string or non-native endian columns
|
|
|
|
|
* Improved performance of Series.searchsorted. The speedup is especially large when the dtype is
|
|
|
|
|
int8/int16/int32 and the searched key is within the integer bounds for the dtype
|
|
|
|
|
* Improved performance of pandas.core.groupby.GroupBy.quantile
|
|
|
|
|
* Improved performance of slicing and other selected operations on a RangeIndex
|
|
|
|
|
* RangeIndex now performs standard lookup without instantiating an actual hashtable, hence saving memory
|
|
|
|
|
* Improved performance of read_csv by faster tokenizing and faster parsing of small float numbers
|
|
|
|
|
* Improved performance of read_csv by faster parsing of N/A and boolean values
|
|
|
|
|
* Improved performance of IntervalIndex.is_monotonic, IntervalIndex.is_monotonic_increasing and IntervalIndex.is_monotonic_decreasing by removing conversion to MultiIndex
|
|
|
|
|
* Improved performance of DataFrame.to_csv when writing datetime dtypes
|
|
|
|
|
* Improved performance of read_csv by much faster parsing of MM/YYYY and DD/MM/YYYY datetime formats
|
|
|
|
|
* Improved performance of nanops for dtypes that cannot store NaNs. Speedup is particularly prominent for Series.all and Series.any
|
|
|
|
|
* Improved performance of Series.map for dictionary mappers on categorical series by mapping the categories instead of mapping all values
|
|
|
|
|
* Improved performance of IntervalIndex.intersection
|
|
|
|
|
* Improved performance of read_csv by faster concatenating date columns without extra conversion to string for integer/float zero and float NaN; by faster checking the string for the possibility of being a date
|
|
|
|
|
* Improved performance of IntervalIndex.is_unique by removing conversion to MultiIndex
|
|
|
|
|
* Restored performance of DatetimeIndex.__iter__ by re-enabling specialized code path
|
|
|
|
|
* Improved performance when building MultiIndex with at least one CategoricalIndex level
|
|
|
|
|
* Improved performance by removing the need for a garbage collect when checking for SettingWithCopyWarning
|
|
|
|
|
* For to_datetime changed default value of cache parameter to True
|
|
|
|
|
* Improved performance of DatetimeIndex and PeriodIndex slicing given non-unique, monotonic data.
|
|
|
|
|
* Improved performance of pd.read_json for index-oriented data.
|
|
|
|
|
* Improved performance of MultiIndex.shape.
|
|
|
|
|
+ Bug fixes
|
|
|
|
|
> Categorical
|
|
|
|
|
* Bug in DataFrame.at and Series.at that would raise exception if the index was a CategoricalIndex
|
|
|
|
|
* Fixed bug in comparison of ordered Categorical that contained missing values with a scalar which sometimes incorrectly resulted in True
|
|
|
|
|
* Bug in DataFrame.dropna when the DataFrame has a CategoricalIndex containing Interval objects incorrectly raised a TypeError
|
|
|
|
|
> Datetimelike
|
|
|
|
|
* Bug in to_datetime which would raise an (incorrect) ValueError when called with a date far into the future and the format argument specified instead of raising OutOfBoundsDatetime
|
|
|
|
|
* Bug in to_datetime which would raise InvalidIndexError: Reindexing only valid with uniquely valued Index objects when called with cache=True, with arg including at least two different elements from the set {None, numpy.nan, pandas.NaT}
|
|
|
|
|
* Bug in DataFrame and Series where timezone aware data with dtype='datetime64[ns]' was not cast to naive
|
|
|
|
|
* Improved Timestamp type checking in various datetime functions to prevent exceptions when using a subclassed datetime
|
|
|
|
|
* Bug in Series and DataFrame repr where np.datetime64('NaT') and np.timedelta64('NaT') with dtype=object would be represented as NaN
|
|
|
|
|
* Bug in to_datetime which did not replace the invalid argument with NaT when errors is set to 'coerce'
|
|
|
|
|
* Bug in adding DateOffset with nonzero month to DatetimeIndex would raise ValueError
|
|
|
|
|
* Bug in to_datetime which raises unhandled OverflowError when called with mix of invalid dates and NaN values with format='%Y%m%d' and errors='coerce'
|
|
|
|
|
* Bug in isin for datetimelike indexes; DatetimeIndex, TimedeltaIndex and PeriodIndex where the levels parameter was ignored.
|
|
|
|
|
* Bug in to_datetime which raises TypeError for format='%Y%m%d' when called for invalid integer dates with length >= 6 digits with errors='ignore'
|
|
|
|
|
* Bug when comparing a PeriodIndex against a zero-dimensional numpy array
|
|
|
|
|
* Bug in constructing a Series or DataFrame from a numpy datetime64 array with a non-ns unit and out-of-bound timestamps generating rubbish data, which will now correctly raise an OutOfBoundsDatetime error.
|
|
|
|
|
* Bug in date_range with unnecessary OverflowError being raised for very large or very small dates
|
|
|
|
|
* Bug where adding Timestamp to a np.timedelta64 object would raise instead of returning a Timestamp
|
|
|
|
|
* Bug where comparing a zero-dimensional numpy array containing a np.datetime64 object to a Timestamp would incorrectly raise TypeError
|
|
|
|
|
* Bug in to_datetime which would raise ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True when called with cache=True, with arg including datetime strings with different offset
|
|
|
|
|
> Timedelta
|
|
|
|
|
* Bug in TimedeltaIndex.intersection where for non-monotonic indices in some cases an empty Index was returned when in fact an intersection existed
|
|
|
|
|
* Bug with comparisons between Timedelta and NaT raising TypeError
|
|
|
|
|
* Bug when adding or subtracting a BusinessHour to a Timestamp with the resulting time landing in a following or prior day respectively
|
|
|
|
|
* Bug when comparing a TimedeltaIndex against a zero-dimensional numpy array
|
|
|
|
|
> Timezones
|
|
|
|
|
* Bug in DatetimeIndex.to_frame where timezone aware data would be converted to timezone naive data
|
|
|
|
|
* Bug in to_datetime with utc=True and datetime strings that would apply previously parsed UTC offsets to subsequent arguments
|
|
|
|
|
* Bug in Timestamp.tz_localize and Timestamp.tz_convert where freq was not propagated
|
|
|
|
|
* Bug in Series.at where setting Timestamp with timezone raises TypeError
|
|
|
|
|
* Bug in DataFrame.update when updating with timezone aware data would return timezone naive data
|
|
|
|
|
* Bug in to_datetime where an uninformative RuntimeError was raised when passing a naive Timestamp with datetime strings with mixed UTC offsets
|
|
|
|
|
* Bug in to_datetime with unit='ns' would drop timezone information from the parsed argument
|
|
|
|
|
* Bug in DataFrame.join where joining a timezone aware index with a timezone aware column would result in a column of NaN
|
|
|
|
|
* Bug in date_range where ambiguous or nonexistent start or end times were not handled by the ambiguous or nonexistent keywords respectively
|
|
|
|
|
* Bug in DatetimeIndex.union when combining a timezone aware and timezone unaware DatetimeIndex
|
|
|
|
|
* Bug when applying a numpy reduction function (e.g. numpy.minimum) to a timezone aware Series
|
|
|
|
|
> Numeric
|
|
|
|
|
* Bug in to_numeric in which large negative numbers were being improperly handled
|
|
|
|
|
* Bug in to_numeric in which numbers were being coerced to float, even though errors was not coerce
|
|
|
|
|
* Bug in to_numeric in which invalid values for errors were being allowed
|
|
|
|
|
* Bug in format in which floating point complex numbers were not being formatted to proper display precision and trimming
|
|
|
|
|
* Bug in error messages in DataFrame.corr and Series.corr. Added the possibility of using a callable.
|
|
|
|
|
* Bug in Series.divmod and Series.rdivmod which would raise an (incorrect) ValueError rather than return a pair of Series objects as result
|
|
|
|
|
* Raises a helpful exception when a non-numeric index is sent to interpolate with methods which require numeric index.
|
|
|
|
|
* Bug in pandas.eval when comparing floats with scalar operators, for example: x < -0.1
|
|
|
|
|
* Fixed bug where casting all-boolean array to integer extension array failed
|
|
|
|
|
* Bug in divmod with a Series object containing zeros incorrectly raising AttributeError
|
|
|
|
|
* Inconsistency in Series floor-division (//) and divmod filling positive//zero with NaN instead of Inf
|
|
|
|
|
> Conversion
|
|
|
|
|
* Bug in DataFrame.astype() when passing a dict of columns and types the errors parameter was ignored.
|
|
|
|
|
> Strings
|
|
|
|
|
* Bug in the __name__ attribute of several methods of Series.str, which were set incorrectly
|
|
|
|
|
* Improved error message when passing Series of wrong dtype to Series.str.cat
|
|
|
|
|
> Interval
|
|
|
|
|
* Construction of Interval is restricted to numeric, Timestamp and Timedelta endpoints
|
|
|
|
|
* Fixed bug in Series/DataFrame not displaying NaN in IntervalIndex with missing values
|
|
|
|
|
* Bug in IntervalIndex.get_loc where a KeyError would be incorrectly raised for a decreasing IntervalIndex
|
|
|
|
|
* Bug in Index constructor where passing mixed closed Interval objects would result in a ValueError instead of an object dtype Index
|
|
|
|
|
> Indexing
|
|
|
|
|
* Improved exception message when calling DataFrame.iloc with a list of non-numeric objects.
|
|
|
|
|
* Improved exception message when calling .iloc or .loc with a boolean indexer with different length.
|
|
|
|
|
* Bug in KeyError exception message when indexing a MultiIndex with a nonexistent key not displaying the original key.
|
|
|
|
|
* Bug in .iloc and .loc with a boolean indexer not raising an IndexError when too few items are passed.
|
|
|
|
|
* Bug in DataFrame.loc and Series.loc where KeyError was not raised for a MultiIndex when the key was less than or equal to the number of levels in the MultiIndex.
|
|
|
|
|
* Bug in which DataFrame.append produced an erroneous warning indicating that a KeyError will be thrown in the future when the data to be appended contains new columns.
|
|
|
|
|
* Bug in which DataFrame.to_csv caused a segfault for a reindexed data frame, when the indices were single-level MultiIndex.
|
|
|
|
|
* Fixed bug where assigning an arrays.PandasArray to a pandas.core.frame.DataFrame would raise error
|
|
|
|
|
* Allow keyword arguments for callable local reference used in the DataFrame.query string
|
|
|
|
|
* Fixed a KeyError when indexing a MultiIndex level with a list containing exactly one label, which is missing
|
|
|
|
|
* Bug which produced AttributeError on partial matching Timestamp in a MultiIndex
|
|
|
|
|
* Bug in Categorical and CategoricalIndex with Interval values when using the in operator (__contains__) with objects that are not comparable to the values in the Interval
|
|
|
|
|
* Bug in DataFrame.loc and DataFrame.iloc on a DataFrame with a single timezone-aware datetime64[ns] column incorrectly returning a scalar instead of a Series
|
|
|
|
|
* Bug in CategoricalIndex and Categorical incorrectly raising ValueError instead of TypeError when a list is passed using the in operator (__contains__)
|
|
|
|
|
* Bug in setting a new value in a Series with a Timedelta object incorrectly casting the value to an integer
|
|
|
|
|
* Bug in Series setting a new key (__setitem__) with a timezone-aware datetime incorrectly raising ValueError
|
|
|
|
|
* Bug in DataFrame.iloc when indexing with a read-only indexer
|
|
|
|
|
* Bug in Series setting an existing tuple key (__setitem__) with timezone-aware datetime values incorrectly raising TypeError
|
|
|
|
|
> Missing
|
|
|
|
|
* Fixed misleading exception message in Series.interpolate if argument order is required, but omitted.
|
|
|
|
|
* Fixed class type displayed in exception message in DataFrame.dropna if invalid axis parameter passed
|
|
|
|
|
* A ValueError will now be thrown by DataFrame.fillna when limit is not a positive integer
|
|
|
|
|
> MultiIndex
|
|
|
|
|
* Bug in which an incorrect exception was raised by Timedelta when testing the membership of MultiIndex
|
|
|
|
|
> I/O
|
|
|
|
|
* Bug in DataFrame.to_html() where values were truncated using display options instead of outputting the full content
|
|
|
|
|
* Fixed bug in missing text when using to_clipboard if copying utf-16 characters in Python 3 on Windows
|
|
|
|
|
* Bug in read_json for orient='table' when it tries to infer dtypes by default, which is not applicable as dtypes are already defined in the JSON schema
|
|
|
|
|
* Bug in read_json for orient='table' and float index, as it infers index dtype by default, which is not applicable because index dtype is already defined in the JSON schema
|
|
|
|
|
* Bug in read_json for orient='table' and string of float column names, as it makes a column name type conversion to Timestamp, which is not applicable because column names are already defined in the JSON schema
|
|
|
|
|
* Bug in json_normalize for errors='ignore' where missing values in the input data were filled in the resulting DataFrame with the string "nan" instead of numpy.nan
|
|
|
|
|
* DataFrame.to_html now raises TypeError when using an invalid type for the classes parameter instead of AssertionError
|
|
|
|
|
* Bug in DataFrame.to_string and DataFrame.to_latex that would lead to incorrect output when the header keyword is used
|
|
|
|
|
* Bug in read_csv not properly interpreting the UTF8 encoded filenames on Windows on Python 3.6+
|
|
|
|
|
* Improved performance in pandas.read_stata and pandas.io.stata.StataReader when converting columns that have missing values
|
|
|
|
|
* Bug in DataFrame.to_html where header numbers would ignore display options when rounding
|
|
|
|
|
* Bug in read_hdf where reading a table from an HDF5 file written directly with PyTables fails with a ValueError when using a sub-selection via the start or stop arguments
|
|
|
|
|
* Bug in read_hdf not properly closing store after a KeyError is raised
|
|
|
|
|
* Improved the explanation for the failure when value labels are repeated in Stata dta files and suggested work-arounds
|
|
|
|
|
* Improved pandas.read_stata and pandas.io.stata.StataReader to read incorrectly formatted 118 format files saved by Stata
|
|
|
|
|
* Improved the col_space parameter in DataFrame.to_html to accept a string so CSS length values can be set correctly
|
|
|
|
|
* Fixed bug in loading objects from S3 that contain # characters in the URL
|
|
|
|
|
* Adds use_bqstorage_api parameter to read_gbq to speed up downloads of large data frames. This feature requires version 0.10.0 of the pandas-gbq library as well as the google-cloud-bigquery-storage and fastavro libraries.
|
|
|
|
|
* Fixed memory leak in DataFrame.to_json when dealing with numeric data
|
|
|
|
|
* Bug in read_json where date strings with Z were not converted to a UTC timezone
|
|
|
|
|
* Added cache_dates=True parameter to read_csv, which allows unique dates to be cached when they are parsed
|
|
|
|
|
* DataFrame.to_excel now raises a ValueError when the caller's dimensions exceed the limitations of Excel
|
|
|
|
|
* Fixed bug in pandas.read_csv where a BOM would result in incorrect parsing using engine='python'
|
|
|
|
|
* read_excel now raises a ValueError when input is of type pandas.io.excel.ExcelFile and engine param is passed since pandas.io.excel.ExcelFile has an engine defined
|
|
|
|
|
* Bug while selecting from HDFStore with where='' specified.
|
|
|
|
|
* Fixed bug in DataFrame.to_excel() where custom objects (e.g. PeriodIndex) inside merged cells were not being converted into types safe for the Excel writer
|
|
|
|
|
* Bug in read_hdf where reading a timezone aware DatetimeIndex would raise a TypeError
|
|
|
|
|
* Bug in to_msgpack and read_msgpack which would raise a ValueError rather than a FileNotFoundError for an invalid path
|
|
|
|
|
* Fixed bug in DataFrame.to_parquet which would raise a ValueError when the dataframe had no columns
|
|
|
|
|
* Allow parsing of PeriodDtype columns when using read_csv
|
|
|
|
|
> Plotting
|
|
|
|
|
* Fixed bug where api.extensions.ExtensionArray could not be used in matplotlib plotting
|
|
|
|
|
* Bug in an error message in DataFrame.plot. Improved the error message if non-numerics are passed to DataFrame.plot
|
|
|
|
|
* Bug causing incorrect ticklabel positions when plotting an index that is non-numeric / non-datetime
|
|
|
|
|
* Fixed bug causing plots of PeriodIndex timeseries to fail if the frequency is a multiple of the frequency rule code
|
|
|
|
|
* Fixed bug when plotting a DatetimeIndex with datetime.timezone.utc timezone
|
|
|
|
|
> Groupby/resample/rolling
|
|
|
|
|
* Bug in pandas.core.resample.Resampler.agg with a timezone aware index where OverflowError would be raised when passing a list of functions
|
|
|
|
|
* Bug in pandas.core.groupby.DataFrameGroupBy.nunique in which the names of column levels were lost
|
|
|
|
|
* Bug in pandas.core.groupby.GroupBy.agg when applying an aggregation function to timezone aware data
|
|
|
|
|
* Bug in pandas.core.groupby.GroupBy.first and pandas.core.groupby.GroupBy.last where timezone information would be dropped
|
|
|
|
|
* Bug in pandas.core.groupby.GroupBy.size when grouping only NA values
|
|
|
|
|
* Bug in Series.groupby where observed kwarg was previously ignored
|
|
|
|
|
* Bug in Series.groupby where using groupby with a MultiIndex Series with a list of labels equal to the length of the series caused incorrect grouping
|
|
|
|
|
* Ensured that ordering of outputs in groupby aggregation functions is consistent across all versions of Python
|
|
|
|
|
* Ensured that result group order is correct when grouping on an ordered Categorical and specifying observed=True
|
|
|
|
|
* Bug in pandas.core.window.Rolling.min and pandas.core.window.Rolling.max that caused a memory leak
|
|
|
|
|
* Bug in pandas.core.window.Rolling.count and pandas.core.window.Expanding.count where the axis keyword was previously ignored
|
|
|
|
|
* Bug in pandas.core.groupby.GroupBy.idxmax and pandas.core.groupby.GroupBy.idxmin where a datetime column would return an incorrect dtype
|
|
|
|
|
* Bug in pandas.core.groupby.GroupBy.cumsum, pandas.core.groupby.GroupBy.cumprod, pandas.core.groupby.GroupBy.cummin and pandas.core.groupby.GroupBy.cummax where a categorical column having absent categories would return an incorrect result or segfault
|
|
|
|
|
* Bug in pandas.core.groupby.GroupBy.nth where NA values in the grouping would return incorrect results
|
|
|
|
|
* Bug in pandas.core.groupby.SeriesGroupBy.transform where transforming an empty group would raise a ValueError
|
|
|
|
|
* Bug in pandas.core.frame.DataFrame.groupby where passing a pandas.core.groupby.grouper.Grouper would return incorrect groups when using the .groups accessor
|
|
|
|
|
* Bug in pandas.core.groupby.GroupBy.agg where incorrect results are returned for uint64 columns.
|
|
|
|
|
* Bug in pandas.core.window.Rolling.median and pandas.core.window.Rolling.quantile where MemoryError is raised with empty window
|
|
|
|
|
* Bug in pandas.core.window.Rolling.median and pandas.core.window.Rolling.quantile where incorrect results are returned with closed='left' and closed='neither'
|
|
|
|
|
* Improved pandas.core.window.Rolling, pandas.core.window.Window and pandas.core.window.EWM functions to exclude nuisance columns from results instead of raising errors and raise a DataError only if all columns are nuisance
|
|
|
|
|
* Bug in pandas.core.window.Rolling.max and pandas.core.window.Rolling.min where incorrect results are returned with an empty variable window
|
|
|
|
|
* Raise a helpful exception when an unsupported weighted window function is used as an argument of pandas.core.window.Window.aggregate
|
|
|
|
|
> Reshaping
|
|
|
|
|
* Bug in pandas.merge where the string "None" was added to the column name if None was assigned in suffixes, instead of leaving the column name as-is.
|
|
|
|
|
* Bug in merge when merging by index name would sometimes result in an incorrectly numbered index (missing index values are now assigned NA)
|
|
|
|
|
* to_records now accepts dtypes to its column_dtypes parameter
|
|
|
|
|
* Bug in concat where order of OrderedDict (and dict in Python 3.6+) is not respected, when passed in as objs argument
|
|
|
|
|
* Bug in pivot_table where columns with NaN values are dropped even if dropna argument is False, when the aggfunc argument contains a list
|
|
|
|
|
* Bug in concat where the resulting freq of two DatetimeIndex with the same freq would be dropped.
|
|
|
|
|
* Bug in merge where merging with equivalent Categorical dtypes was raising an error
|
|
|
|
|
* Bug in DataFrame where instantiating with a dict of iterators or generators (e.g. pd.DataFrame({'A': reversed(range(3))})) raised an error.
|
|
|
|
|
* Bug in DataFrame where instantiating with a range (e.g. pd.DataFrame(range(3))) raised an error.
|
|
|
|
|
* Bug in DataFrame constructor when passing non-empty tuples would cause a segmentation fault
|
|
|
|
|
* Bug in Series.apply where it failed when the series is a timezone aware DatetimeIndex
|
|
|
|
|
* Bug in pandas.cut where large bins could incorrectly raise an error due to an integer overflow
|
|
|
|
|
* Bug in DataFrame.sort_index where an error is thrown when a multi-indexed DataFrame is sorted on all levels with the initial level sorted last
|
|
|
|
|
* Bug in Series.nlargest where True was treated as smaller than False
|
|
|
|
|
* Bug in DataFrame.pivot_table where using an IntervalIndex as pivot index would raise TypeError
|
|
|
|
|
* Bug in which DataFrame.from_dict ignored order of OrderedDict when orient='index'.
|
|
|
|
|
* Bug in DataFrame.transpose where transposing a DataFrame with a timezone-aware datetime column would incorrectly raise ValueError
|
|
|
|
|
* Bug in pivot_table when pivoting a timezone aware column as the values would remove timezone information
|
|
|
|
|
* Bug in merge_asof when specifying multiple by columns where one is datetime64[ns, tz] dtype
|
|
|
|
|
> Sparse
|
|
|
|
|
* Significant speedup in SparseArray initialization that benefits most operations, fixing performance regression introduced in v0.20.0
|
|
|
|
|
* Bug in SparseDataFrame constructor where passing None as the data would cause default_fill_value to be ignored
|
|
|
|
|
* Bug in SparseDataFrame where adding a column in which the length of values does not match the length of the index raised AssertionError instead of ValueError
|
|
|
|
|
* Introduce a better error message in Series.sparse.from_coo so it raises a TypeError for inputs that are not coo matrices
|
|
|
|
|
* Bug in numpy.modf on a SparseArray. Now a tuple of SparseArray is returned.
|
|
|
|
|
> Build Changes
|
|
|
|
|
* Fix install error with PyPy on macOS
|
|
|
|
|
> ExtensionArray
|
|
|
|
|
* Bug in factorize when passing an ExtensionArray with a custom na_sentinel.
|
|
|
|
|
* Series.count miscounts NA values in ExtensionArrays
|
|
|
|
|
* Added Series.__array_ufunc__ to better handle NumPy ufuncs applied to Series backed by extension arrays.
|
|
|
|
|
* Keyword argument deep has been removed from ExtensionArray.copy
|
|
|
|
|
> Other
|
|
|
|
|
* Removed unused C functions from vendored UltraJSON implementation
|
|
|
|
|
* Allow Index and RangeIndex to be passed to numpy min and max functions
|
|
|
|
|
* Use actual class name in repr of empty objects of a Series subclass.
|
|
|
|
|
* Bug in DataFrame where passing an object array of timezone-aware datetime objects would incorrectly raise ValueError
|
|
|
|
|
- Remove upstream-included pandas-tests-memory.patch
|
|
|
|
|
|
|
|
|
|
-------------------------------------------------------------------
|
|
|
|
|
Sat Mar 16 22:35:08 UTC 2019 - Arun Persaud <arun@gmx.de>
|
|
|
|
|
|
|
|
|
|