-------------------------------------------------------------------
Sun Jul 13 14:04:39 UTC 2025 - Ben Greiner <code@bnavigator.de>

- Update to 4.13.4
  * If you pass a function as the first argument to a find* method,
    the function will only ever be called once per tag, with the
    Tag object as the argument. Starting in 4.13.0, there were
    cases where the function would be called with a Tag object and
    then called again with the name of the tag. [bug=2106435]
  * Added a passthrough implementation for
    NavigableString.__getitem__ which gives a more helpful
    exception if the user tries to treat it as a Tag and access its
    HTML attributes.
  * Fixed a bug that caused an exception when unpickling the result
    of parsing certain invalid markup with lxml as the tree
    builder. [bug=2103126]
  * Converted the AUTHORS file to UTF-8 for PEP8 compliance.
    [bug=2107405]
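As a rough illustration of the first 4.13.4 item (a sketch only, assuming Beautiful Soup 4.13.4 or later and the built-in html.parser; the markup, the `seen` list and the `is_paragraph` helper are made up for this example), a function passed to find_all() should receive each Tag exactly once:

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup, used only for this illustration.
soup = BeautifulSoup("<div><p>one</p><p>two</p></div>", "html.parser")

seen = []  # records every argument the matching function receives


def is_paragraph(tag):
    """Match <p> tags; tolerate a non-Tag argument on affected releases."""
    seen.append(tag)
    return isinstance(tag, Tag) and tag.name == "p"


paragraphs = soup.find_all(is_paragraph)

print([p.get_text() for p in paragraphs])
# On affected 4.13.0-4.13.3 releases this list could also contain 'str'.
print([type(arg).__name__ for arg in seen])
```

The isinstance() check keeps the matcher safe on the affected releases, where the same function could also be handed the tag name as a string.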
- Release 4.13.3 (20250204)
  * Modified the 4.13.2 change slightly to restore backwards
    compatibility. Specifically, calling a find_* method with no
    arguments should return the first Tag out of the iterator, not
    the first PageElement. [bug=2097333]
- Release 4.13.2 (20250204)
  * Gave ElementFilter the ability to explicitly say that it
    excludes every item in the parse tree. This is used internally
    in situations where the provided filters are logically
    inconsistent or match a value against the null set.

    Without this, it's not always possible to distinguish between a
    SoupStrainer that excludes everything and one that excludes
    nothing.

    This fixes a bug where calls to find_* methods with no
    arguments returned None, instead of the first item out of the
    iterator. [bug=2097333]

    Things added to the API to support this:

    - The ElementFilter.includes_everything property
    - The MatchRule.exclude_everything member
    - The _known_rules argument to ElementFilter.match. This is an
      optional argument used internally to indicate that an
      optimization is safe.
- Release 4.13.1 (20250203)
  * Updated pyproject.toml to require Python 3.7 or above.
    [bug=2097263]
  * Pinned the typing-extensions dependency to a minimum version of
    4.0.0. [bug=2097262]
  * Restored the English documentation to the source distribution.
    [bug=2097237]
  * Fixed a regression where HTMLFormatter and XMLFormatter were
    not propagating the indent parameter to the superconstructor.
    [bug=2097272]
- Release 4.13.0 (20250202)
  * This release introduces Python type hints to all public classes
    and methods in Beautiful Soup. The addition of these type hints
    exposed a large number of very small inconsistencies in the
    code, which I've fixed, but the result is a larger-than-usual
    number of deprecations and changes that may break backwards
    compatibility.

    Chris Papademetrious deserves a special thanks for his work on
    this release through its long beta process.
  ## Deprecation notices
  * These things now give DeprecationWarnings when you try to use
    them, and are scheduled to be removed in Beautiful Soup 4.15.0.
  * Every deprecated method, attribute and class from the 3.0 and
    2.0 major versions of Beautiful Soup. These have been
    deprecated for a very long time, but they didn't issue
    DeprecationWarning when you tried to use them. Now they do, and
    they're all going away soon.

    This mainly refers to methods and attributes with camelCase
    names, for example: renderContents, replaceWith,
    replaceWithChildren, findAll, findAllNext, findAllPrevious,
    findNext, findNextSibling, findNextSiblings, findParent,
    findParents, findPrevious, findPreviousSibling,
    findPreviousSiblings, getText, nextSibling, previousSibling,
    isSelfClosing, fetchNextSiblings, fetchPreviousSiblings,
    fetchPrevious, fetchPreviousSiblings, fetchParents, findChild,
    findChildren, childGenerator, nextGenerator,
    nextSiblingGenerator, previousGenerator,
    previousSiblingGenerator, recursiveChildGenerator, and
    parentGenerator.

    This also includes the BeautifulStoneSoup class.
  * The SAXTreeBuilder class, which was never officially supported
    or tested.
  * The private class method BeautifulSoup._decode_markup(), which
    has not been used inside Beautiful Soup for many years.
  * The first argument to BeautifulSoup.decode has been changed
    from pretty_print:bool to indent_level:int, to match the
    signature of Tag.decode. Using a bool will still work but will
    give you a DeprecationWarning.
  * SoupStrainer.text and SoupStrainer.string are both deprecated,
    since a single item can't capture all the possibilities of a
    SoupStrainer designed to match strings.
  * SoupStrainer.search_tag(). It was never a documented method,
    but if you use it, you should start using
    SoupStrainer.allow_tag_creation() instead.
  * The soup:BeautifulSoup argument to the TreeBuilderForHtml5lib
    constructor is now required, not optional. It's unclear why it
    was optional in the first place, so if you discover you need
    this, contact me for possible un-deprecation.
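One way to find deprecated camelCase calls before they are removed is to record warnings around the old spelling and compare the result with the snake_case replacement. This is a minimal sketch, assuming a Beautiful Soup release in which findAll still exists (it is slated for removal in 4.15.0); the markup is invented for the example:

```python
import warnings

from bs4 import BeautifulSoup

soup = BeautifulSoup('<p><a href="/a">A</a> <a href="/b">B</a></p>', "html.parser")

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # make sure no warning is suppressed
    old_style = soup.findAll("a")    # deprecated camelCase spelling

deprecations = [w for w in caught if issubclass(w.category, DeprecationWarning)]

new_style = soup.find_all("a")       # supported snake_case spelling

print(old_style == new_style)
# Non-zero is expected on 4.13.x; older releases stay silent here.
print(len(deprecations))
```

Running a test suite with `python -W error::DeprecationWarning` is a stricter variant of the same idea: every deprecated call then fails loudly instead of being counted.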
  ## Compatibility notices
  * This version drops support for Python 3.6. The minimum
    supported major Python version for Beautiful Soup is now Python
    3.7.
  * Deprecation warnings have been added for all deprecated methods
    and attributes (see above). Going forward, deprecated names
    will be removed two feature releases or one major release after
    the deprecation warning is added.
  * The storage for a tag's attribute values now modifies incoming
    values to be consistent with the HTML or XML spec. This means
    that if you set an attribute value to a number, it will be
    converted to a string immediately, rather than being converted
    when you output the document. [bug=2065525]

    More importantly for backwards compatibility, setting an HTML
    attribute value to True will set the attribute's value to the
    appropriate string per the HTML spec. Setting an attribute
    value to False or None will remove the attribute value from the
    tag altogether, rather than (effectively, as before) setting
    the value to the string "False" or the string "None".

    This means that some programs that modify documents will
    generate different output than they would in earlier versions
    of Beautiful Soup, but the new documents are more likely to
    represent the intent behind the modifications.

    To give a specific example, if you have code that looks
    something like this:

      checkbox1['checked'] = True
      checkbox2['checked'] = False

    Then a document that used to look like this (with most browsers
    treating both boxes as checked):

      <input type="checkbox" checked="True"/>
      <input type="checkbox" checked="False"/>

    Will now look like this (with browsers treating only the first
    box as checked):

      <input type="checkbox" checked="checked"/>
      <input type="checkbox"/>

    You can get the old behavior back by instantiating a
    TreeBuilder with `attribute_dict_class=dict`, or you can
    customize how Beautiful Soup treats attribute values by
    passing in a custom subclass of dict.
  * If Tag.get_attribute_list() is used to access an attribute
    that's not set, the return value is now an empty list rather
    than [None].
  * If you pass an empty list as the attribute value when searching
    the tree, you will now find all tags which have that attribute
    set to a value in the empty list--that is, you will find
    nothing. This is consistent with other situations where a list
    of acceptable values is provided. Previously, an empty list was
    treated the same as None and False, and you would have found
    the tags which did not have that attribute set at all.
    [bug=2045469]
  * For similar reasons, if you pass in limit=0 to a find() method,
    you will now get zero results. Previously, you would get all
    matching results.
  * When using one of the find() methods or creating a
    SoupStrainer, if you specify the same attribute value in
    ``attrs`` and the keyword arguments, you'll end up with two
    different ways to match that attribute. Previously the value in
    keyword arguments would override the value in ``attrs``.
  * All exceptions were moved to the bs4.exceptions module, and all
    warnings to the bs4._warnings module (named so as not to shadow
    Python's built-in warnings module). All warnings and exceptions
    are exported from the bs4 module, which is probably the safest
    place to import them from in your own code.
  * As a side effect of this, the string constant
    BeautifulSoup.NO_PARSER_SPECIFIED_WARNING was moved to
    GuessedAtParserWarning.MESSAGE.
  * The 'html5' formatter is now much less aggressive about
    escaping ampersands, escaping only the ampersands considered
    "ambiguous" by the HTML5 spec (which is almost none of them).
    This is the sort of change that might break your unit test
    suite, but the resulting markup will be much more readable and
    more HTML5-ish.

    To quickly get the old behavior back, change code like this:

      tag.encode(formatter='html5')

    to this:

      tag.encode(formatter='html5-4.12')

    In the future, the 'html5' formatter may become the default
    HTML formatter, which will change Beautiful Soup's default
    output. This will break a lot of test suites so it's not going
    to happen for a while. [bug=1902431]
  * Tag.sourceline and Tag.sourcepos now always have a consistent
    data type: Optional[int]. Previously these values were
    sometimes an Optional[int], and sometimes they were
    Optional[Tag], the result of searching for a child tag called
    <sourceline> or <sourcepos>. [bug=2065904]

    If your code does search for a tag called <sourceline> or
    <sourcepos>, it may stop finding that tag when you upgrade to
    Beautiful Soup 4.13. If this happens, you'll need to replace
    code that treats "sourceline" or "sourcepos" as tag names:

      tag.sourceline

    with code that explicitly calls the find() method:

      tag.find("sourceline").name

    Making the behavior of sourceline and sourcepos consistent has
    the side effect of fixing a major performance problem when a
    Tag is copied.

    With this change, the store_line_numbers argument to the
    BeautifulSoup constructor becomes much less useful, and its use
    is now discouraged, though I'm not deprecating it yet. Please
    contact me if you have a performance or security rationale for
    setting store_line_numbers=False.
  * append(), extend(), insert(), and unwrap() were moved from
    PageElement to Tag. Those methods manipulate the 'contents'
    collection, so they would only have ever worked on Tag objects.
  * The BeautifulSoupHTMLParser constructor now requires a
    BeautifulSoup object as its first argument. This almost
    certainly does not affect you, since you probably use
    HTMLParserTreeBuilder, not BeautifulSoupHTMLParser directly.
  * The TreeBuilderForHtml5lib methods fragmentClass(),
    getFragment(), and testSerializer() now raise
    NotImplementedError. These methods are called only by
    html5lib's test suite, and Beautiful Soup isn't integrated into
    that test suite, so this code was long since unused and
    untested.

    These methods are _not_ deprecated, since they are methods
    defined by html5lib. They may one day have real
    implementations, as part of a future effort to integrate
    Beautiful Soup into html5lib's test suite.
  * AttributeValueWithCharsetSubstitution.encode() is renamed to
    substitute_encoding, to avoid confusion with the much different
    str.encode().
  * Using PageElement.replace_with() to replace an element with
    itself returns the element instead of None.
  * All TreeBuilder constructors now take the empty_element_tags
    argument. The sets of tags found in
    HTMLTreeBuilder.empty_element_tags and
    HTMLTreeBuilder.block_elements are now in
    HTMLTreeBuilder.DEFAULT_EMPTY_ELEMENT_TAGS and
    HTMLTreeBuilder.DEFAULT_BLOCK_ELEMENTS, to avoid confusing them
    with instance variables.
  * The unused constant LXMLTreeBuilderForXML.DEFAULT_PARSER_CLASS
    has been removed.
  * Some of the arguments in the methods of LXMLTreeBuilderForXML
    have been renamed for consistency with the names lxml uses for
    those arguments in the superclass. This won't affect you unless
    you were calling methods like LXMLTreeBuilderForXML.start()
    directly.
  * In particular, the arguments to
    LXMLTreeBuilderForXML.prepare_markup have been changed to match
    the arguments to the superclass, TreeBuilder.prepare_markup.
    Specifically, document_declared_encoding now appears before
    exclude_encodings, not after. If you were calling this method
    yourself, I recommend switching to using keyword arguments
    instead.
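The attribute-value rules described earlier in this section (numbers become strings at once, True becomes the attribute's own name, False and None remove the attribute) can be pictured with a plain dict subclass. This is a standard-library sketch of those rules only, not Beautiful Soup's implementation; the class name `SpecLikeAttributes` is hypothetical:

```python
class SpecLikeAttributes(dict):
    """Hypothetical stand-in that mimics the rules described above."""

    def __setitem__(self, key, value):
        if value is False or value is None:
            # A false boolean attribute is represented by leaving it out.
            self.pop(key, None)
            return
        if value is True:
            # A true boolean attribute repeats its own name as the value.
            value = key
        elif not isinstance(value, (str, list)):
            # Numbers and similar objects are stored as strings right away;
            # lists are left alone, as multi-valued attributes use them.
            value = str(value)
        super().__setitem__(key, value)


attrs = SpecLikeAttributes()
attrs["checked"] = True
attrs["disabled"] = False
attrs["width"] = 100

print(attrs)  # {'checked': 'checked', 'width': '100'}
```

A custom dict subclass passed to a TreeBuilder would hook in at the same point, `__setitem__`, to apply whatever normalization a project prefers.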
  ## New features
  * The new ElementFilter class encapsulates Beautiful Soup's rules
    about matching elements and deciding which parts of a document
    to parse. It's easy to override those rules with subclassing or
    function composition. The SoupStrainer class, which contains
    all the matching logic you're familiar with from the find_*
    methods, is now a subclass of ElementFilter.
  * The new PageElement.filter() method provides a fully general
    way of finding elements in a Beautiful Soup parse tree. You can
    specify a function to iterate over the tree and an
    ElementFilter to determine what matches.
  * The new_tag() method now takes a 'string' argument. This allows
    you to set the string contents of a Tag when creating it. Patch
    by Chris Papademetrious. [bug=2044599]
  * Defined a number of new iterators which are the same as
    existing iterators, but which yield the element itself before
    beginning to traverse the tree. [bug=2052936] [bug=2067634]

    - PageElement.self_and_parents
    - PageElement.self_and_descendants
    - PageElement.self_and_next_elements
    - PageElement.self_and_next_siblings
    - PageElement.self_and_previous_elements
    - PageElement.self_and_previous_siblings

    self_and_parents yields the element you call it on and then all
    of its parents. self_and_next_elements yields the element you
    call it on and then every element parsed afterwards; and so on.
  * The NavigableString class now has a .string property which
    returns the string itself. This makes it easier to iterate over
    a mixed list of Tag and NavigableString objects. [bug=2044794]
  * Defined a new method, Tag.copy_self(), which creates a copy of
    a Tag with the same attributes but no contents. [bug=2065120]

    Note that this method used to be a private method named
    _clone(). The _clone() method has been removed, so if you were
    using it, change your code to call copy_self() instead.
  * The PageElement.append() method now returns the element that
    was appended; it used to have no return value. [bug=2093025]
  * The methods PageElement.insert(), PageElement.extend(),
    PageElement.insert_before(), and PageElement.insert_after() now
    return a list of the items inserted. These methods used to have
    no return value. [bug=2093025]
  * The PageElement.insert() method now takes a variable number of
    arguments and returns a list of all elements inserted, to match
    insert_before() and insert_after(). (Even if I hadn't made the
    variable-argument change, an edge case around inserting one
    Beautiful Soup object into another means that insert()'s return
    value needs to be a list.) [bug=2093025]
  * Defined a new warning class, UnusualUsageWarning, which is a
    superclass for all of the warnings issued when Beautiful Soup
    notices something unusual but not guaranteed to be wrong, like
    markup that looks like a URL (MarkupResemblesLocatorWarning) or
    XML being run through an HTML parser (XMLParsedAsHTMLWarning).

    The text of these warnings has been revamped to explain in more
    detail what is going on, how to check if you've made a mistake,
    and how to make the warning go away if you are acting
    deliberately.

    If these warnings are interfering with your workflow, or simply
    annoying you, you can filter all of them by filtering
    UnusualUsageWarning, without worrying about losing the warnings
    Beautiful Soup issues when there *definitely* is a problem you
    need to correct.
  * It's now possible to modify the behavior of the list used to
    store the values of multi-valued attributes such as HTML
    'class', by passing in whatever class you want instantiated
    (instead of a normal Python list) to the TreeBuilder
    constructor as attribute_value_list_class. [bug=2052943]
  ## Improvements
  * decompose() was moved from Tag to its superclass PageElement,
    since there's no reason it won't also work on NavigableString
    objects.
  * Emit an UnusualUsageWarning if the user tries to search for an
    attribute called _class; they probably mean "class_".
    [bug=2025089]
  * The MarkupResemblesLocatorWarning issued when the markup
    resembles a filename is now issued less often, due to
    improvements in detecting markup that's unlikely to be a
    filename. [bug=2052988]
  * Emit a warning if a document is parsed using a SoupStrainer
    that's set up to filter everything. In these cases, filtering
    everything is the most consistent thing to do, but there was no
    indication that this was happening, so the behavior may have
    seemed mysterious.
  * When using one of the find() methods or creating a
    SoupStrainer, you can pass a list of any accepted object
    (strings, regular expressions, etc.) for any of the objects.
    Previously you could only pass in a list of strings.
  * A SoupStrainer can now filter tag creation based on a tag's
    namespaced name. Previously only the unqualified name could be
    used.
  * Added the correct stacklevel to another instance of the
    XMLParsedAsHTMLWarning. [bug=2034451]
  * Improved the wording of the TypeError raised when you pass
    something other than markup into the BeautifulSoup constructor.
    [bug=2071530]
  * Optimized the case where you use Tag.insert() to "insert" a
    PageElement into its current location. [bug=2077020]
  * Changes to make tests work whether tests are run under
    soupsieve 2.6 or an earlier version. Based on a patch by
    Stefano Rivera.
  * Removed the strip_cdata argument to lxml's HTMLParser
    constructor, which never did anything and is deprecated as of
    lxml 5.3.0. Patch by Stefano Rivera. [bug=2076897]
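The item above about lists of accepted objects could look like this in practice. It is a small sketch with invented markup, assuming the built-in html.parser: one list mixes a plain tag name with a compiled regular expression.

```python
import re

from bs4 import BeautifulSoup

soup = BeautifulSoup(
    "<h1>Title</h1><p>Body</p><h2>Section</h2><b>bold</b>", "html.parser"
)

# One list mixing a plain string with a compiled regular expression.
matches = soup.find_all(["b", re.compile(r"^h[1-6]$")])

print([tag.name for tag in matches])  # ['h1', 'h2', 'b']
```

Results come back in document order, not in the order of the items in the list.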
  ## Bug fixes
  * Copying a tag with a multi-valued attribute now makes a copy of
    the list of values, eliminating a bug where both the old and
    new copy shared the same list. [bug=2067412]
  * The lxml TreeBuilder, like the other TreeBuilders, now filters
    a document's initial DOCTYPE if you've set up a SoupStrainer
    that eliminates it. [bug=2062000]
  * A lot of things can go wrong if you modify the parse tree while
    iterating over it, especially if you are removing or replacing
    elements. Most of those things fall under the category of
    unexpected behavior (which is why I don't recommend doing
    this), but there are a few ways that caused unhandled
    exceptions. The list comprehensions used by Beautiful Soup
    (e.g. .descendants, which powers the find* methods) should now
    work correctly in those cases, or at least not raise
    exceptions.

    As part of this work, I changed when the list comprehension
    determines the next element. Previously it was done after the
    yield statement; now it's done before the yield statement. This
    lets you remove the yielded element in calling code, or modify
    it in a way that would break this calculation, without causing
    an exception.

    So if your code relies on modifying the tree in a way that
    'steers' a list comprehension, rather than using the list
    comprehension to decide which bits of the tree to modify, it will
    probably stop working at this point. [bug=2091118]
  * Fixed an error in the lookup table used when converting
    ISO-Latin-1 to ASCII, which no one should do anyway.
  * Corrected the markup that's output in the unlikely event that
    you encode a document to a Python internal encoding (like
    "palmos") that's not recognized by the HTML or XML standard.
  * UnicodeDammit.markup is now always a bytestring representing
    the *original* markup (sans BOM), and
    UnicodeDammit.unicode_markup is always the converted Unicode
    equivalent of the original markup. Previously,
    UnicodeDammit.markup was treated inconsistently and would often
    end up containing Unicode. UnicodeDammit.markup was not a
    documented attribute, but if you were using it, you probably
    want to switch to using .unicode_markup instead.
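The "work out the next element before the yield" change described in this section is a general generator pattern. The following is a standard-library sketch of that pattern on a toy linked list, not Beautiful Soup's code; `Node`, `link` and `walk` are hypothetical names:

```python
class Node:
    """A tiny singly linked node, standing in for a parse-tree element."""

    def __init__(self, value):
        self.value = value
        self.next = None


def link(values):
    """Build a linked chain of nodes and return the head."""
    head = previous = None
    for value in values:
        node = Node(value)
        if previous is None:
            head = node
        else:
            previous.next = node
        previous = node
    return head


def walk(node):
    """Yield each node, looking up the successor *before* yielding."""
    while node is not None:
        successor = node.next  # decided before control returns to the caller
        yield node
        node = successor


head = link(["a", "b", "c"])
visited = []
for node in walk(head):
    visited.append(node.value)
    node.next = None  # "removing" the yielded node does not derail the walk

print(visited)  # ['a', 'b', 'c']
```

Had `walk` read `node.next` after the yield instead, the loop above would stop after the first node. The flip side is the one the changelog warns about: changes made to a yielded node can no longer redirect the traversal.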
- Drop soupsieve26-compat.patch

-------------------------------------------------------------------
Wed Jun 18 07:05:52 UTC 2025 - Matej Cepl <mcepl@cepl.eu>

- Skip the failing test test_rejected_input; it is known to be
  flaky and depends on various changes in Python (more of which
  are coming in a few days).

-------------------------------------------------------------------
Fri Nov 1 07:22:57 UTC 2024 - Matej Cepl <mcepl@cepl.eu>

- Add soupsieve26-compat.patch to make tests more tolerant with
  various versions of soupsieve (better solution for lp#2086199).

-------------------------------------------------------------------
Thu Oct 31 14:24:07 UTC 2024 - Matej Cepl <mcepl@cepl.eu>

- Skip the test test_unsupported_pseudoclass (lp#2086199).

-------------------------------------------------------------------
Sat Jan 20 13:11:41 UTC 2024 - Dirk Müller <dmueller@suse.com>

- update to 4.12.3:
  * Fixed a regression such that if you set .hidden on a tag, the
    tag becomes invisible but its contents are still visible. User
    manipulation of .hidden is not a documented or supported
    feature, so don't do this, but it wasn't too difficult to
    keep the old behavior working.
  * Fixed a case found by Mengyuhan where html.parser giving up
    on markup would result in an AssertionError instead of a
    ParserRejectedMarkup exception.
  * Added the correct stacklevel to instances of the
    XMLParsedAsHTMLWarning.
  * Corrected the syntax of the license definition in
    pyproject.toml.
  * Corrected a typo in a test that was causing test failures
    when run against libxml2 2.12.1.

-------------------------------------------------------------------
Thu Nov 23 03:40:05 UTC 2023 - Steve Kowalik <steven.kowalik@suse.com>

- Require cchardet explicitly to avoid charset-normalizer braindamage.

-------------------------------------------------------------------
Mon May 8 11:39:40 UTC 2023 - Daniel Garcia <daniel.garcia@suse.com>

- Update to 4.12.2:
  * Fixed an unhandled exception in BeautifulSoup.decode_contents
    and methods that call it. [bug=2015545]
- 4.12.1:
  * This version of Beautiful Soup replaces setup.py and setup.cfg
    with pyproject.toml. Beautiful Soup now uses tox as its test backend
    and hatch to do builds.
  * The main functional improvement in this version is a nonrecursive technique
    for regenerating a tree. This technique is used to avoid situations where,
    in previous versions, doing something to a very deeply nested tree
    would overflow the Python interpreter stack:
    1. Outputting a tree as a string, e.g. with
       BeautifulSoup.encode() [bug=1471755]
    2. Making copies of trees (copy.copy() and
       copy.deepcopy() from the Python standard library). [bug=1709837]
    3. Pickling a BeautifulSoup object. (Note that pickling a Tag
       object can still cause an overflow.)
  * Making a copy of a BeautifulSoup object no longer parses the
    document again, which should improve performance significantly.
  * When a BeautifulSoup object is unpickled, Beautiful Soup now
    tries to associate an appropriate TreeBuilder object with it.
  * Tag.prettify() will now consistently end prettified markup with
    a newline.
  * Added unit tests for fuzz test cases created by third
    parties. Some of these tests are skipped since they point
    to problems outside of Beautiful Soup, but this change
    puts them all in one convenient place.
  * PageElement now implements the known_xml attribute. (This was technically
    a bug, but it shouldn't be an issue in normal use.) [bug=2007895]
  * The demonstrate_parser_differences.py script was still written in
    Python 2. I've converted it to Python 3, but since no one has
    mentioned this over the years, it's a sign that no one uses this
    script and it's not serving its purpose.
- 4.12.0:
  * Introduced the .css property, which centralizes all access to
    the Soup Sieve API. This allows Beautiful Soup to give direct
    access to as much of Soup Sieve as makes sense, without cluttering
    the BeautifulSoup and Tag classes with a lot of new methods.
    This does mean one addition to the BeautifulSoup and Tag classes
    (the .css property itself), so this might be a breaking change if you
    happen to use Beautiful Soup to parse XML that includes a tag called
    <css>. In particular, code like this will stop working in 4.12.0:

      soup.css['id']

    Code like this will work just as before:

      soup.find('css')['id']

    The Soup Sieve methods supported through the .css property are
    select(), select_one(), iselect(), closest(), match(), filter(),
    escape(), and compile(). The BeautifulSoup and Tag classes still
    support the select() and select_one() methods; they have not been
    deprecated, but they have been demoted to convenience methods.

    [bug=2003677]

  * When the html.parser parser decides it can't parse a document, Beautiful
    Soup now consistently propagates this fact by raising a
    ParserRejectedMarkup error. [bug=2007343]
  * Removed some error checking code from diagnose(), which is redundant with
    similar (but more Pythonic) code in the BeautifulSoup constructor.
    [bug=2007344]
  * Added intersphinx references to the documentation so that other
    projects have a target to point to when they reference Beautiful
    Soup classes. [bug=1453370]
- 4.11.2:
  * Fixed test failures caused by nondeterministic behavior of
    UnicodeDammit's character detection, depending on the platform setup.
    [bug=1973072]
  * Fixed another crash when overriding multi_valued_attributes and using the
    html5lib parser. [bug=1948488]
  * The HTMLFormatter and XMLFormatter constructors no longer return a
    value. [bug=1992693]
  * Tag.interesting_string_types is now propagated when a tag is
    copied. [bug=1990400]
  * Warnings now do their best to provide an appropriate stacklevel,
    improving the usefulness of the message. [bug=1978744]
  * Passing a Tag's .contents into PageElement.extend() now works the
    same way as passing the Tag itself.
  * Soup Sieve tests will be skipped if the library is not installed.
- 4.11.1:
  This release was done to ensure that the unit tests are packaged along
  with the released source. There are no functionality changes in this
  release, but there are a few other packaging changes:
  * The Japanese and Korean translations of the documentation are included.
  * The changelog is now packaged as CHANGELOG, and the license file is
    packaged as LICENSE. NEWS.txt and COPYING.txt are still present,
    but may be removed in the future.
  * TODO.txt is no longer packaged, since a TODO is not relevant for released
    code.
- 4.11.0:
  * Ported unit tests to use pytest.
  * Added special string classes, RubyParenthesisString and RubyTextString,
    to make it possible to treat ruby text specially in get_text() calls.
    [bug=1941980]
  * It's now possible to customize the way output is indented by
    providing a value for the 'indent' argument to the Formatter
    constructor. The 'indent' argument works very similarly to the
    argument of the same name in the Python standard library's
    json.dump() function. [bug=1955497]
  * If the charset-normalizer Python module
    (https://pypi.org/project/charset-normalizer/) is installed, Beautiful
    Soup will use it to detect the character sets of incoming documents.
    This is also the module used by newer versions of the Requests library.
    For the sake of backwards compatibility, chardet and cchardet both take
    precedence if installed. [bug=1955346]
  * Added a workaround for an lxml bug
    (https://bugs.launchpad.net/lxml/+bug/1948551) that causes
    problems when parsing a Unicode string beginning with BYTE ORDER MARK.
    [bug=1947768]
  * Issue a warning when an HTML parser is used to parse a document that
    looks like XML but not XHTML. [bug=1939121]
  * Do a better job of keeping track of namespaces as an XML document is
    parsed, so that CSS selectors that use namespaces will do the right
    thing more often. [bug=1946243]
  * Some time ago, the misleadingly named "text" argument to find-type
    methods was renamed to the more accurate "string." But this supposed
    "renaming" didn't make it into important places like the method
    signatures or the docstrings. That's corrected in this
    version. "text" still works, but will give a DeprecationWarning.
    [bug=1947038]
  * Fixed a crash when pickling a BeautifulSoup object that has no
    tree builder. [bug=1934003]
  * Fixed a crash when overriding multi_valued_attributes and using the
    html5lib parser. [bug=1948488]
  * Standardized the wording of the MarkupResemblesLocatorWarning
    warnings to omit untrusted input and make the warnings less
    judgmental about what you ought to be doing. [bug=1955450]
  * Removed support for the iconv_codec library, which doesn't seem
    to exist anymore and was never put up on PyPI. (The closest
    replacement on PyPI, iconv_codecs, is GPL-licensed, so we can't use
    it--it's also quite old.)
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Sun Apr 23 23:26:12 UTC 2023 - Matej Cepl <mcepl@suse.com>
|
||
|
|
||
|
- Switch documentation to be within the main package.
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Fri Apr 21 12:22:35 UTC 2023 - Dirk Müller <dmueller@suse.com>
|
||
|
|
||
|
- add sle15_python_module_pythons (jsc#PED-68)
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Thu Apr 13 22:40:14 UTC 2023 - Matej Cepl <mcepl@suse.com>
|
||
|
|
||
|
- Make calling of %{sle15modernpython} optional.
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Wed Feb 9 10:14:27 UTC 2022 - Steve Kowalik <steven.kowalik@suse.com>
|
||
|
|
||
|
- Update to 4.10.0:
|
||
|
* This is the first release of Beautiful Soup to only support Python 3.
|
||
|
* The behavior of methods like .get_text() and .strings now differs
|
||
|
depending on the type of tag.
|
||
|
* NavigableString and its subclasses now implement the get_text()
|
||
|
method, as well as the properties .strings and
|
||
|
.stripped_strings.
|
||
|
* The 'html5' formatter now treats attributes whose values are the
|
||
|
empty string as HTML boolean attributes.
|
||
|
* The 'replace_with()' method now takes a variable number of arguments,
|
||
|
and can be used to replace a single element with a sequence of elements.
|
||
|
* Corrected output when the namespace prefix associated with a
|
||
|
namespaced attribute is the empty string, as opposed to
|
||
|
None.
|
||
|
* Performance improvement when processing tags that speeds up overall
|
||
|
tree construction by 2%. Patch by Morotti. [bug=1899358]
|
||
|
* Corrected the use of special string container classes in cases when a
|
||
|
single tag may contain strings with different containers; such as
|
||
|
the <template> tag, which may contain both TemplateString objects
|
||
|
and Comment objects.
|
||
|
* The html.parser tree builder can now handle named entities
|
||
|
found in the HTML5 spec in much the same way that the html5lib
|
||
|
tree builder does.
|
||
|
* Added a second way to specify encodings to UnicodeDammit and
|
||
|
EncodingDetector, based on the order of precedence defined in the
|
||
|
HTML5 spec.
|
||
|
* Improve the warning issued when a directory name (as opposed to
|
||
|
the name of a regular file) is passed as markup into the BeautifulSoup
|
||
|
constructor.
|
||
|
- Do not pass the directory to pytest.
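The variable-argument replace_with() mentioned in the 4.10.0 notes above can be sketched like this. The markup and replacement values are invented for the example; plain strings are assumed to be converted to NavigableString objects on insertion.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup.
soup = BeautifulSoup("<p>one <b>two</b> three</p>", "html.parser")

# Build a replacement tag, then swap the single <b> element
# for a sequence of three elements (a tag and two strings).
new_tag = soup.new_tag("i")
new_tag.string = "2a"
soup.b.replace_with(new_tag, " and ", "2b")

result = str(soup)
print(result)
```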
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Sat Oct 10 18:34:16 UTC 2020 - Arun Persaud <arun@gmx.de>
|
||
|
|
||
|
- update to version 4.9.3:
|
||
|
* Implemented a significant performance optimization to the process
|
||
|
of searching the parse tree. Patch by Morotti. [bug=1898212]
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Mon Sep 28 11:41:27 UTC 2020 - Dirk Mueller <dmueller@suse.com>
|
||
|
|
||
|
- update to 4.9.2:
|
||
|
* Fixed a bug that caused too many tags to be popped from the tag
|
||
|
stack during tree building, when encountering a closing tag that had
|
||
|
no matching opening tag. [bug=1880420]
|
||
|
|
||
|
* Fixed a bug that inconsistently moved elements over when passing
|
||
|
a Tag, rather than a list, into Tag.extend(). [bug=1885710]
|
||
|
|
||
|
* Specify the soupsieve dependency in a way that complies with
|
||
|
PEP 508. Patch by Mike Nerone. [bug=1893696]
|
||
|
|
||
|
* Change the signatures for BeautifulSoup.insert_before and insert_after
|
||
|
(which are not implemented) to match PageElement.insert_before and
|
||
|
insert_after, quieting warnings in some IDEs. [bug=1897120]
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Wed Jun 3 11:10:03 UTC 2020 - Dirk Mueller <dmueller@suse.com>
|
||
|
|
||
|
- update to 4.9.1:
|
||
|
* Added a keyword argument 'on_duplicate_attribute' to the
|
||
|
BeautifulSoupHTMLParser constructor (used by the html.parser tree
|
||
|
builder) which lets you customize the handling of markup that
|
||
|
contains the same attribute more than once, as in:
|
||
|
<a href="url1" href="url2"> [bug=1878209]
|
||
|
* Added a distinct subclass, GuessedAtParserWarning, for the warning
|
||
|
issued when BeautifulSoup is instantiated without a parser being
|
||
|
specified. [bug=1873787]
|
||
|
* Added a distinct subclass, MarkupResemblesLocatorWarning, for the
|
||
|
warning issued when BeautifulSoup is instantiated with 'markup' that
|
||
|
actually seems to be a URL or the path to a file on
|
||
|
disk. [bug=1873787]
|
||
|
* The new NavigableString subclasses (Stylesheet, Script, and
|
||
|
TemplateString) can now be imported directly from the bs4 package.
|
||
|
* If you encode a document with a Python-specific encoding like
|
||
|
'unicode_escape', that encoding is no longer mentioned in the final
|
||
|
XML or HTML document. Instead, encoding information is omitted or
|
||
|
left blank. [bug=1874955]
|
||
|
* Fixed test failures when run against soupsieve 2.0. Patch by Tomáš
|
||
|
Chvátal. [bug=1872279]
|
||
|
- remove soupsieve2-tests.patch: upstreamed
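The 'on_duplicate_attribute' option from the 4.9.1 notes above might be used roughly as follows. This sketch assumes the keyword can be passed through the BeautifulSoup constructor to the html.parser tree builder; the markup is the example from the entry itself with link text added.

```python
from bs4 import BeautifulSoup

markup = '<a href="url1" href="url2">link</a>'

# Default behavior: a later duplicate replaces an earlier one.
default_soup = BeautifulSoup(markup, "html.parser")
default_href = default_soup.a["href"]

# 'ignore' keeps the first value and drops later duplicates.
ignore_soup = BeautifulSoup(
    markup, "html.parser", on_duplicate_attribute="ignore"
)
ignored_href = ignore_soup.a["href"]

print(default_href, ignored_href)
```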
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Sun Apr 12 08:31:00 UTC 2020 - Tomáš Chvátal <tchvatal@suse.com>
|
||
|
|
||
|
- Add patch to fix the tests to pass with new soupsieve too:
|
||
|
* soupsieve2-tests.patch
|
||
|
* The assert name changed
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Sun Apr 12 07:50:37 UTC 2020 - Tomáš Chvátal <tchvatal@suse.com>
|
||
|
|
||
|
- Update to 4.9.0:
|
||
|
* fixes to work with new soupsieve
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Wed Jan 1 08:52:41 UTC 2020 - Ismail Dönmez <idonmez@suse.com>
|
||
|
|
||
|
- Update to 4.8.2
|
||
|
* Added Python docstrings to all public methods of the most commonly
|
||
|
used classes.
|
||
|
* Fixed two deprecation warnings. Patches by Colin
|
||
|
Watson and Nicholas Neumann. [bug=1847592] [bug=1855301]
|
||
|
* The html.parser tree builder now correctly handles DOCTYPEs that are
|
||
|
not uppercase. [bug=1848401]
|
||
|
* PageElement.select() now returns a ResultSet rather than a regular
|
||
|
list, making it consistent with methods like find_all().
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Fri Nov 1 08:59:57 UTC 2019 - Tomáš Chvátal <tchvatal@suse.com>
|
||
|
|
||
|
- Update to 4.8.1:
|
||
|
* When the html.parser or html5lib parsers are in use, Beautiful Soup
|
||
|
will, by default, record the position in the original document where
|
||
|
each tag was encountered.
|
||
|
* Fixed the definition of the default XML namespace when using
|
||
|
lxml 4.4.
|
||
|
* Avoid a crash when unpickling certain parse trees generated
|
||
|
using html5lib on Python 3.
|
||
|
* Avoid a crash when trying to detect the declared encoding of a
|
||
|
Unicode document.
|
||
|
- Drop patch beautifulsoup4-lxml-fixes.patch as it seems not needed
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Mon Oct 14 11:41:52 UTC 2019 - Matej Cepl <mcepl@suse.com>
|
||
|
|
||
|
- Replace %fdupes -s with plain %fdupes; hardlinks are better.
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Mon Jul 22 16:18:23 UTC 2019 - Todd R <toddrme2178@gmail.com>
|
||
|
|
||
|
- Update to 4.8.0
|
||
|
* It's now possible to customize the TreeBuilder object by passing
|
||
|
keyword arguments into the BeautifulSoup constructor. The main
|
||
|
reason to do this right now is to change which attributes are
|
||
|
treated as multi-valued attributes (the way 'class' is treated by
|
||
|
default). You can do this with the `multi_valued_attributes` argument.
|
||
|
* The role of Formatter objects has been greatly expanded. The Formatter
|
||
|
class now controls the following:
|
||
|
> The function to call to perform entity substitution. (This was
|
||
|
previously Formatter's only job.)
|
||
|
> Which tags should be treated as containing CDATA and have their
|
||
|
contents exempt from entity substitution.
|
||
|
> The order in which a tag's attributes are output.
|
||
|
> Whether or not to put a '/' inside a void element, e.g. '<br/>' vs '<br>'
|
||
|
All preexisting code should work as before.
|
||
|
* Added a new method to the API, Tag.smooth(), which consolidates
|
||
|
multiple adjacent NavigableString elements.
|
||
|
* &apos; (which is valid in XML, XHTML, and HTML 5, but not HTML 4) is now
|
||
|
recognized as a named entity and converted to a single quote.
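Two of the 4.8.0 items above, the `multi_valued_attributes` constructor argument and Tag.smooth(), can be illustrated with a short sketch. The markup is made up for the example.

```python
from bs4 import BeautifulSoup

markup = '<p class="body strikeout">a</p>'

# By default 'class' is split into a list of values.
as_list = BeautifulSoup(markup, "html.parser").p["class"]

# Passing multi_valued_attributes=None turns that treatment off,
# so the attribute value stays a single string.
as_string = BeautifulSoup(
    markup, "html.parser", multi_valued_attributes=None
).p["class"]

# smooth() merges adjacent NavigableString elements into one.
soup = BeautifulSoup(markup, "html.parser")
soup.p.append("b")                    # now two adjacent strings
count_before = len(soup.p.contents)
soup.p.smooth()
count_after = len(soup.p.contents)

print(as_list, as_string, count_before, count_after)
```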
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Fri Mar 1 11:23:21 UTC 2019 - Tomáš Chvátal <tchvatal@suse.com>
|
||
|
|
||
|
- Do not generate doc for both the py2 and py3 variants; they are the same,
|
||
|
so keep just one around
|
||
|
- Update to 4.7.1:
|
||
|
* Fixed a significant performance problem introduced in 4.7.0. [bug=1810617]
|
||
|
* Fixed an incorrectly raised exception when inserting a tag before or
|
||
|
after an identical tag. [bug=1810692]
|
||
|
* Beautiful Soup will no longer try to keep track of namespaces that
|
||
|
are not defined with a prefix; this can confuse soupsieve. [bug=1810680]
|
||
|
* Tried even harder to avoid the deprecation warning originally fixed in
|
||
|
4.6.1. [bug=1778909]
|
||
|
* Beautiful Soup's CSS Selector implementation has been replaced by a
|
||
|
dependency on Isaac Muse's SoupSieve project (the soupsieve package
|
||
|
on PyPI). The good news is that SoupSieve has a much more robust and
|
||
|
complete implementation of CSS selectors, resolving a large number
|
||
|
of longstanding issues. The bad news is that from this point onward,
|
||
|
SoupSieve must be installed if you want to use the select() method.
|
||
|
* Added the PageElement.extend() method, which works like list.extend().
|
||
|
[bug=1514970]
|
||
|
* PageElement.insert_before() and insert_after() now take a variable
|
||
|
number of arguments. [bug=1514970]
|
||
|
* Fix a number of problems with the tree builder that caused
|
||
|
trees that were superficially okay, but which fell apart when bits
|
||
|
were extracted. Patch by Isaac Muse. [bug=1782928,1809910]
|
||
|
* Fixed a problem with the tree builder in which elements that
|
||
|
contained no content (such as empty comments and all-whitespace
|
||
|
elements) were not being treated as part of the tree. Patch by Isaac
|
||
|
Muse. [bug=1798699]
|
||
|
* Fixed a problem with multi-valued attributes where the value
|
||
|
contained whitespace. Thanks to Jens Svalgaard for the
|
||
|
fix. [bug=1787453]
|
||
|
* Clarified ambiguous license statements in the source code. Beautiful
|
||
|
Soup is released under the MIT license, and has been since 4.4.0.
|
||
|
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Thu Dec 6 14:47:30 UTC 2018 - Ondřej Súkup <mimi.vx@gmail.com>
|
||
|
|
||
|
- update to 4.6.3
|
||
|
* Fix an exception when a custom formatter was asked to format
|
||
|
a void element
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Sun Aug 5 11:02:25 UTC 2018 - adrian@suse.de
|
||
|
|
||
|
- update to 4.6.1:
|
||
|
* Stop data loss when encountering an empty numeric entity, and
|
||
|
possibly in other cases. Thanks to tos.kamiya for the fix. [bug=1698503]
|
||
|
|
||
|
* Preserve XML namespaces introduced inside an XML document, not just
|
||
|
the ones introduced at the top level. [bug=1718787]
|
||
|
|
||
|
* Added a new formatter, "html5", which represents void elements
|
||
|
as "<element>" rather than "<element/>". [bug=1716272]
|
||
|
|
||
|
* Fixed a problem where the html.parser tree builder interpreted
|
||
|
a string like "&foo " as the character entity "&foo;" [bug=1728706]
|
||
|
|
||
|
* Correctly handle invalid HTML numeric character entities like &#147;
|
||
|
which reference code points that are not Unicode code points. Note
|
||
|
that this is only fixed when Beautiful Soup is used with the
|
||
|
html.parser parser -- html5lib already worked and I couldn't fix it
|
||
|
with lxml. [bug=1782933]
|
||
|
|
||
|
* Improved the warning given when no parser is specified. [bug=1780571]
|
||
|
|
||
|
* When markup contains duplicate elements, a select() call that
|
||
|
includes multiple match clauses will match all relevant
|
||
|
elements. [bug=1770596]
|
||
|
|
||
|
* Fixed code that was causing deprecation warnings in recent Python 3
|
||
|
versions. Includes a patch from Ville Skyttä. [bug=1778909] [bug=1689496]
|
||
|
|
||
|
* Fixed a Windows crash in diagnose() when checking whether a long
|
||
|
markup string is a filename. [bug=1737121]
|
||
|
|
||
|
* Stopped HTMLParser from raising an exception in very rare cases of
|
||
|
bad markup. [bug=1708831]
|
||
|
|
||
|
* Fixed a bug where find_all() was not working when asked to find a
|
||
|
tag with a namespaced name in an XML document that was parsed as
|
||
|
HTML. [bug=1723783]
|
||
|
|
||
|
* You can get finer control over formatting by subclassing
|
||
|
bs4.element.Formatter and passing a Formatter instance into (e.g.)
|
||
|
encode(). [bug=1716272]
|
||
|
|
||
|
* You can pass a dictionary of `attrs` into
|
||
|
BeautifulSoup.new_tag. This makes it possible to create a tag with
|
||
|
an attribute like 'name' that would otherwise be masked by another
|
||
|
argument of new_tag. [bug=1779276]
|
||
|
|
||
|
* Clarified the deprecation warning when accessing tag.fooTag, to cover
|
||
|
the possibility that you might really have been looking for a tag
|
||
|
called 'fooTag'.
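The `attrs` dictionary for new_tag() and the "html5" formatter from the 4.6.1 notes above can be combined in a small sketch. The tag and attribute values are invented for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("", "html.parser")

# 'name' is also the first positional parameter of new_tag(), so an
# HTML attribute called "name" has to go through the attrs dictionary.
tag = soup.new_tag("input", attrs={"name": "q"})

default_output = str(tag)                    # void element closed with "/"
html5_output = tag.decode(formatter="html5") # void element without "/"

print(default_output, html5_output)
```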
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Mon Jul 16 18:08:01 UTC 2018 - mcepl@suse.com
|
||
|
|
||
|
- Clean SPEC file
|
||
|
Use py.test for running the tests instead of nosetests, which
|
||
|
breaks with python 3.7.
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Tue Mar 6 12:27:41 UTC 2018 - aplanas@suse.com
|
||
|
|
||
|
- Allows Recommends and Suggest in Fedora
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Tue Feb 27 17:00:11 UTC 2018 - aplanas@suse.com
|
||
|
|
||
|
- Recommends and Suggest are for SUSE
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Thu Aug 10 13:38:03 UTC 2017 - tbechtold@suse.com
|
||
|
|
||
|
- Only Suggests python-html5lib and python-lxml (instead of Requires
|
||
|
them). Both are not strictly needed. See
|
||
|
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Wed Jul 5 06:28:31 UTC 2017 - dmueller@suse.com
|
||
|
|
||
|
- update to 4.6.0:
|
||
|
* Added the `Tag.get_attribute_list` method, which acts like `Tag.get` for
|
||
|
getting the value of an attribute, but which always returns a list,
|
||
|
whether or not the attribute is a multi-value attribute. [bug=1678589]
|
||
|
* Improved the handling of empty-element tags like <br> when using the
|
||
|
html.parser parser. [bug=1676935]
|
||
|
* HTML parsers treat all HTML4 and HTML5 empty element tags (aka void
|
||
|
element tags) correctly. [bug=1656909]
|
||
|
* Namespace prefix is preserved when an XML tag is copied. Thanks
|
||
|
to Vikas for a patch and test. [bug=1685172]
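A brief sketch of `Tag.get_attribute_list` from the 4.6.0 notes above, next to plain `Tag.get` for comparison. The markup is a made-up example.

```python
from bs4 import BeautifulSoup

tag = BeautifulSoup('<p id="x" class="a b">hi</p>', "html.parser").p

# Tag.get returns a string for single-valued attributes
# and a list for multi-valued ones such as 'class'.
plain_id = tag.get("id")
plain_class = tag.get("class")

# get_attribute_list always returns a list.
list_id = tag.get_attribute_list("id")
list_class = tag.get_attribute_list("class")

print(plain_id, plain_class, list_id, list_class)
```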
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Mon May 22 13:25:06 UTC 2017 - aloisio@gmx.com
|
||
|
|
||
|
- Fixed failing tests in python3
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Sat Apr 8 17:35:17 UTC 2017 - aloisio@gmx.com
|
||
|
|
||
|
- update to version 4.5.3:
|
||
|
* Fixed foster parenting when html5lib is the tree builder. Thanks
|
||
|
to Geoffrey Sneddon for a patch and test.
|
||
|
* Fixed yet another problem that caused the html5lib tree builder to
|
||
|
create a disconnected parse tree. [bug=1629825]
|
||
|
changes from version 4.5.2:
|
||
|
* Apart from the version number, this release is identical to
|
||
|
4.5.3. Due to user error, it could not be completely uploaded to
|
||
|
PyPI. Use 4.5.3 instead.
|
||
|
|
||
|
- Converted to single-spec
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Thu Sep 1 19:20:36 UTC 2016 - tbechtold@suse.com
|
||
|
|
||
|
- Relax BuildRequires for python-Sphinx
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Thu Sep 1 10:26:24 UTC 2016 - tbechtold@suse.com
|
||
|
|
||
|
- update to 4.5.1:
|
||
|
* Fixed a crash when passing Unicode markup that contained a
|
||
|
processing instruction into the lxml HTML parser on Python
|
||
|
3. [bug=1608048]
|
||
|
* Beautiful Soup is no longer compatible with Python 2.6. This
|
||
|
actually happened a few releases ago, but it's now official.
|
||
|
* Beautiful Soup will now work with versions of html5lib greater than
|
||
|
0.99999999. [bug=1603299]
|
||
|
* If a search against each individual value of a multi-valued
|
||
|
attribute fails, the search will be run one final time against the
|
||
|
complete attribute value considered as a single string. That is, if
|
||
|
a tag has class="foo bar" and neither "foo" nor "bar" matches, but
|
||
|
"foo bar" does, the tag is now considered a match.
|
||
|
This happened in previous versions, but only when the value being
|
||
|
searched for was a string. Now it also works when that value is
|
||
|
a regular expression, a list of strings, etc. [bug=1476868]
|
||
|
* Fixed a bug that deranged the tree when a whitespace element was
|
||
|
reparented into a tag that contained an identical whitespace
|
||
|
element. [bug=1505351]
|
||
|
* Added support for CSS selector values that contain quoted spaces,
|
||
|
such as tag[style="display: foo"]. [bug=1540588]
|
||
|
* Corrected handling of XML processing instructions. [bug=1504393]
|
||
|
* Corrected an encoding error that happened when a BeautifulSoup
|
||
|
object was copied. [bug=1554439]
|
||
|
* The contents of <textarea> tags will no longer be modified when the
|
||
|
tree is prettified. [bug=1555829]
|
||
|
* When a BeautifulSoup object is pickled but its tree builder cannot
|
||
|
be pickled, its .builder attribute is set to None instead of being
|
||
|
destroyed. This avoids a performance problem once the object is
|
||
|
unpickled. [bug=1523629]
|
||
|
* Specify the file and line number when warning about a
|
||
|
BeautifulSoup object being instantiated without a parser being
|
||
|
specified. [bug=1574647]
|
||
|
* The `limit` argument to `select()` now works correctly, though it's
|
||
|
not implemented very efficiently. [bug=1520530]
|
||
|
* Fixed a Python 3 BytesWarning when a URL was passed in as though it
|
||
|
were markup. Thanks to James Salter for a patch and
|
||
|
test. [bug=1533762]
|
||
|
* We don't run the check for a filename passed in as markup if the
|
||
|
'filename' contains a less-than character; the less-than character
|
||
|
indicates it's most likely a very small document. [bug=1577864]
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Sun Nov 15 16:31:46 UTC 2015 - idonmez@suse.com
|
||
|
|
||
|
- Update to version 4.4.1
|
||
|
* Fixed a bug that deranged the tree when part of it was
|
||
|
removed. Thanks to Eric Weiser for the patch and John Wiseman for a
|
||
|
test. lp#1481520
|
||
|
* Fixed a parse bug with the html5lib tree-builder. Thanks to Roel
|
||
|
Kramer for the patch. lp#1483781
|
||
|
* Improved the implementation of CSS selector grouping. Thanks to
|
||
|
Orangain for the patch. lp#1484543
|
||
|
* Fixed the test_detect_utf8 test so that it works when chardet is
|
||
|
installed. lp#1471359
|
||
|
* Corrected the output of Declaration objects. lp#1477847
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Mon Jul 27 18:54:20 UTC 2015 - aloisio@gmx.com
|
||
|
|
||
|
- update to 4.4.0
|
||
|
Especially important changes:
|
||
|
* Added a warning when you instantiate a BeautifulSoup object without
|
||
|
explicitly naming a parser. [bug=1398866]
|
||
|
* __repr__ now returns an ASCII bytestring in Python 2, and a Unicode
|
||
|
string in Python 3, instead of a UTF8-encoded bytestring in both
|
||
|
versions. In Python 3, __str__ now returns a Unicode string instead
|
||
|
of a bytestring. [bug=1420131]
|
||
|
* The `text` argument to the find_* methods is now called `string`,
|
||
|
which is more accurate. `text` still works, but `string` is the
|
||
|
argument described in the documentation. `text` may eventually
|
||
|
change its meaning, but not for a very long time. [bug=1366856]
|
||
|
* Changed the way soup objects work under copy.copy(). Copying a
|
||
|
NavigableString or a Tag will give you a new object that's
|
||
|
equal to the old one but not connected to the parse tree. Patch by
|
||
|
Martijn Pieters. [bug=1307490]
|
||
|
* Started using a standard MIT license. [bug=1294662]
|
||
|
* Added a Chinese translation of the documentation by Delong .w.
|
||
|
New features:
|
||
|
* Introduced the select_one() method, which uses a CSS selector but
|
||
|
only returns the first match, instead of a list of
|
||
|
matches. [bug=1349367]
|
||
|
* You can now create a Tag object without specifying a
|
||
|
TreeBuilder. Patch by Martijn Pieters. [bug=1307471]
|
||
|
* You can now create a NavigableString or a subclass just by invoking
|
||
|
the constructor. [bug=1294315]
|
||
|
* Added an `exclude_encodings` argument to UnicodeDammit and to the
|
||
|
Beautiful Soup constructor, which lets you prohibit the detection of
|
||
|
an encoding that you know is wrong. [bug=1469408]
|
||
|
* The select() method now supports selector grouping. Patch by
|
||
|
Francisco Canas [bug=1191917]
|
||
|
Bug fixes:
|
||
|
* Fixed yet another problem that caused the html5lib tree builder to
|
||
|
create a disconnected parse tree. [bug=1237763]
|
||
|
* Force object_was_parsed() to keep the tree intact even when an element
|
||
|
from later in the document is moved into place. [bug=1430633]
|
||
|
* Fixed yet another bug that caused a disconnected tree when html5lib
|
||
|
copied an element from one part of the tree to another. [bug=1270611]
|
||
|
* Fixed a bug where Element.extract() could create an infinite loop in
|
||
|
the remaining tree.
|
||
|
* The select() method can now find tags whose names contain
|
||
|
dashes. Patch by Francisco Canas. [bug=1276211]
|
||
|
* The select() method can now find tags with attributes whose names
|
||
|
contain dashes. Patch by Marek Kapolka. [bug=1304007]
|
||
|
* Improved the lxml tree builder's handling of processing
|
||
|
instructions. [bug=1294645]
|
||
|
* Restored the helpful syntax error that happens when you try to
|
||
|
import the Python 2 edition of Beautiful Soup under Python 3.
|
||
|
[bug=1213387]
|
||
|
* In Python 3.4 and above, set the new convert_charrefs argument to
|
||
|
the html.parser constructor to avoid a warning and future
|
||
|
failures. Patch by Stefano Revera. [bug=1375721]
|
||
|
* The warning when you pass in a filename or URL as markup will now be
|
||
|
displayed correctly even if the filename or URL is a Unicode
|
||
|
string. [bug=1268888]
|
||
|
* If the initial <html> tag contains a CDATA list attribute such as
|
||
|
'class', the html5lib tree builder will now turn its value into a
|
||
|
list, as it would with any other tag. [bug=1296481]
|
||
|
* Fixed an import error in Python 3.5 caused by the removal of the
|
||
|
HTMLParseError class. [bug=1420063]
|
||
|
* Improved docstring for encode_contents() and
|
||
|
decode_contents(). [bug=1441543]
|
||
|
* Fixed a crash in Unicode, Dammit's encoding detector when the name
|
||
|
of the encoding itself contained invalid bytes. [bug=1360913]
|
||
|
* Improved the exception raised when you call .unwrap() or
|
||
|
.replace_with() on an element that's not attached to a tree.
|
||
|
* Raise a NotImplementedError whenever an unsupported CSS pseudoclass
|
||
|
is used in select(). Previously some cases did not result in a
|
||
|
NotImplementedError.
|
||
|
* It's now possible to pickle a BeautifulSoup object no matter which
|
||
|
tree builder was used to create it. However, the only tree builder
|
||
|
that survives the pickling process is the HTMLParserTreeBuilder
|
||
|
('html.parser'). If you unpickle a BeautifulSoup object created with
|
||
|
some other tree builder, soup.builder will be None. [bug=1231545]
|
||
|
- Aligned requirement version with PyPI
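Two of the 4.4.0 items above, select_one() and the copy.copy() behavior, can be sketched as follows. The markup is hypothetical, and in current releases select_one() relies on the soupsieve package being installed.

```python
import copy

from bs4 import BeautifulSoup

soup = BeautifulSoup(
    '<p class="a">one</p><p class="b">two</p>', "html.parser"
)

# select_one() returns the first match, or None when nothing matches.
first_match = soup.select_one("p.b")
no_match = soup.select_one("p.missing")

# A copied Tag compares equal to the original
# but is not attached to the parse tree.
clone = copy.copy(soup.p)

print(first_match, no_match, clone.parent)
```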
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Fri Jul 24 20:25:54 UTC 2015 - seife+obs@b1-systems.com
|
||
|
|
||
|
- fix non-SUSE build by conditionalizing Recommends: tag
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Wed Jan 8 15:05:55 UTC 2014 - speilicke@suse.com
|
||
|
|
||
|
- Add beautifulsoup4-lxml-fixes.patch: LXML fixes
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Tue Oct 22 13:00:00 UTC 2013 - toddrme2178@gmail.com
|
||
|
|
||
|
- update to 4.3.2
|
||
|
* Fixed a bug in which short Unicode input was improperly encoded to
|
||
|
ASCII when checking whether or not it was the name of a file on
|
||
|
disk. [bug=1227016]
|
||
|
* Fixed a crash when a short input contains data not valid in
|
||
|
filenames. [bug=1232604]
|
||
|
* Fixed a bug that caused Unicode data put into UnicodeDammit to
|
||
|
return None instead of the original data. [bug=1214983]
|
||
|
* Combined two tests to stop a spurious test failure when tests are
|
||
|
run by nosetests. [bug=1212445]
|
||
|
- update to 4.3.1
|
||
|
* Fixed yet another problem with the html5lib tree builder, caused by
|
||
|
html5lib's tendency to rearrange the tree during
|
||
|
parsing. [bug=1189267]
|
||
|
* Fixed a bug that caused the optimized version of find_all() to
|
||
|
return nothing. [bug=1212655]
|
||
|
- update to 4.3.0
|
||
|
* Instead of converting incoming data to Unicode and feeding it to the
|
||
|
lxml tree builder in chunks, Beautiful Soup now makes successive
|
||
|
guesses at the encoding of the incoming data, and tells lxml to
|
||
|
parse the data as that encoding. Giving lxml more control over the
|
||
|
parsing process improves performance and avoids a number of bugs and
|
||
|
issues with the lxml parser which had previously required elaborate
|
||
|
workarounds:
|
||
|
- An issue in which lxml refuses to parse Unicode strings on some
|
||
|
systems. [bug=1180527]
|
||
|
- A returning bug that truncated documents longer than a (very
|
||
|
small) size. [bug=963880]
|
||
|
- A returning bug in which extra spaces were added to a document if
|
||
|
the document defined a charset other than UTF-8. [bug=972466]
|
||
|
This required a major overhaul of the tree builder architecture. If
|
||
|
you wrote your own tree builder and didn't tell me, you'll need to
|
||
|
modify your prepare_markup() method.
|
||
|
* The UnicodeDammit code that makes guesses at encodings has been
|
||
|
split into its own class, EncodingDetector. A lot of apparently
|
||
|
redundant code has been removed from Unicode, Dammit, and some
|
||
|
undocumented features have also been removed.
|
||
|
* Beautiful Soup will issue a warning if instead of markup you pass it
|
||
|
a URL or the name of a file on disk (a common beginner's mistake).
|
||
|
* A number of optimizations improve the performance of the lxml tree
|
||
|
builder by about 33%, the html.parser tree builder by about 20%, and
|
||
|
the html5lib tree builder by about 15%.
|
||
|
* All find_all calls should now return a ResultSet object. Patch by
|
||
|
Aaron DeVore. [bug=1194034]
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Fri Jul 19 17:07:52 UTC 2013 - berendt@b1-systems.de
|
||
|
|
||
|
- remove .buildinfo before installation
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Thu Jul 18 08:25:50 UTC 2013 - berendt@b1-systems.de
|
||
|
|
||
|
- removed python-lxml as build requirement to be able to
|
||
|
successfully pass the check section on SLES11 SP3
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Thu Jun 27 13:32:06 UTC 2013 - speilicke@suse.com
|
||
|
|
||
|
- Update upstream URL
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Tue Jun 25 11:52:34 UTC 2013 - dmueller@suse.com
|
||
|
|
||
|
- update to 4.2.1:
|
||
|
* The default XML formatter will now replace ampersands even if they
|
||
|
appear to be part of entities. That is, "&lt;" will become
|
||
|
"&amp;lt;". The old code was left over from Beautiful Soup 3, which
|
||
|
didn't always turn entities into Unicode characters.
|
||
|
|
||
|
If you really want the old behavior (maybe because you add new
|
||
|
strings to the tree, those strings include entities, and you want
|
||
|
the formatter to leave them alone on output), it can be found in
|
||
|
EntitySubstitution.substitute_xml_containing_entities(). [bug=1182183]
|
||
|
|
||
|
* Gave new_string() the ability to create subclasses of
|
||
|
NavigableString. [bug=1181986]
|
||
|
|
||
|
* Fixed another bug by which the html5lib tree builder could create a
|
||
|
disconnected tree. [bug=1182089]
|
||
|
|
||
|
* The .previous_element of a BeautifulSoup object is now always None,
|
||
|
not the last element to be parsed. [bug=1182089]
|
||
|
|
||
|
* Fixed test failures when lxml is not installed. [bug=1181589]
|
||
|
|
||
|
* html5lib now supports Python 3. Fixed some Python 2-specific
|
||
|
code in the html5lib test suite. [bug=1181624]
|
||
|
|
||
|
* The html.parser treebuilder can now handle numeric attributes in
|
||
|
text when the hexadecimal name of the attribute starts with a
|
||
|
capital X. Patch by Tim Shirley. [bug=1186242]
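The difference between the default XML substitution and the older entity-preserving behavior described in the 4.2.1 notes above can be seen by calling the two EntitySubstitution methods directly. The input string is an invented example.

```python
from bs4.dammit import EntitySubstitution

text = "AT&T &lt;"

# Default behavior: every ampersand is escaped,
# even one that looks like the start of an entity.
escaped_all = EntitySubstitution.substitute_xml(text)

# Old behavior: ampersands that already belong to an entity are left alone.
kept_entities = EntitySubstitution.substitute_xml_containing_entities(text)

print(escaped_all)
print(kept_entities)
```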
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Mon Jun 10 20:34:00 UTC 2013 - dmueller@suse.com
|
||
|
|
||
|
- disable tests on SLE_11, fail due to too old python-lxml
|
||
|
|
||
|
-------------------------------------------------------------------
|
||
|
Sat May 18 13:30:00 UTC 2013 - toddrme2178@gmail.com
|
||
|
|
||
|
- Update to 4.2.0
|
||
|
* The Tag.select() method now supports a much wider variety of CSS
|
||
|
selectors.
|
||
|
- Added support for the adjacent sibling combinator (+) and the
|
||
|
general sibling combinator (~). Tests by "liquider". [bug=1082144]
|
||
|
- The combinators (>, +, and ~) can now combine with any supported
|
||
|
selector, not just one that selects based on tag name.
|
||
|
- Added limited support for the "nth-of-type" pseudo-class. Code
|
||
|
by Sven Slootweg. [bug=1109952]
|
||
|
* The BeautifulSoup class is now aliased to "_s" and "_soup", making
|
||
|
it quicker to type the import statement in an interactive session.
|
||
|
The alias may change in the future, so don't use this in code you're
|
||
|
going to run more than once.
  * Added the 'diagnose' submodule, which includes several useful
    functions for reporting problems and doing tech support.
    - diagnose(data) tries the given markup on every installed parser,
      reporting exceptions and displaying successes. If a parser is not
      installed, diagnose() mentions this fact.
    - lxml_trace(data, html=True) runs the given markup through lxml's
      XML parser or HTML parser, and prints out the parser events as
      they happen. This helps you quickly determine whether a given
      problem occurs in lxml code or Beautiful Soup code.
    - htmlparser_trace(data) is the same thing, but for Python's
      built-in HTMLParser class.
  * In an HTML document, the contents of a <script> or <style> tag will
    no longer undergo entity substitution by default. XML documents work
    the same way they did before. [bug=1085953]
  * Methods like get_text() and properties like .strings now only give
    you strings that are visible in the document--no comments or
    processing instructions. [bug=1050164]
  * The prettify() method now leaves the contents of <pre> tags
    alone. [bug=1095654]
  * Fix a bug in the html5lib treebuilder which sometimes created
    disconnected trees. [bug=1039527]
  * Fix a bug in the lxml treebuilder which crashed when a tag included
    an attribute from the predefined "xml:" namespace. [bug=1065617]
  * Fix a bug by which keyword arguments to find_parent() were not
    being passed on. [bug=1126734]
  * Stop a crash when unwisely messing with a tag that's been
    decomposed. [bug=1097699]
  * Now that lxml's segfault on invalid doctype has been fixed, fixed a
    corresponding problem on the Beautiful Soup end that was previously
    invisible. [bug=984936]
  * Fixed an exception when an overspecified CSS selector didn't match
    anything. Code by Stefaan Lippens. [bug=1168167]
- Re-enable lxml support (unit tests require it)
- Build documentation and add doc sub-package

-------------------------------------------------------------------
Tue Apr 30 12:59:02 UTC 2013 - dmueller@suse.com

- remove lxml support (fails unit test)

-------------------------------------------------------------------
Sat Jan 12 14:10:18 UTC 2013 - toddrme2178@gmail.com

- Use explicit file list
- Fix building on openSUSE 12.1 and 12.2
- Use recommended lxml parser instead of native one
  (native fails for some python versions)

-------------------------------------------------------------------
Wed Jan 9 21:15:18 UTC 2013 - cfarrell@suse.com

- license update: MIT
  See COPYING.txt

-------------------------------------------------------------------
Mon Sep 10 18:52:45 UTC 2012 - nmo.marques@gmail.com

- initial package from version 4.1.3
- based on spec file from python-beautifulsoup
- requires python >= 2.6