forked from pool/python-beautifulsoup4
Dirk Mueller
1611f8c551
* Fixed a regression such that if you set .hidden on a tag, the tag becomes invisible but its contents are still visible. User manipulation of .hidden is not a documented or supported feature, so don't do this, but it wasn't too difficult to keep the old behavior working. * Fixed a case found by Mengyuhan where html.parser giving up on markup would result in an AssertionError instead of a ParserRejectedMarkup exception. * Added the correct stacklevel to instances of the XMLParsedAsHTMLWarning. * Corrected the syntax of the license definition in pyproject.toml. * Corrected a typo in a test that was causing test failures when run against libxml2 2.12.1. - Require cchardet explicitly to avoid charset-normalizer braindamage. - disable tests on SLE_11, fail due to too old python-lxml - remove lxml support (fails unit test) - Use recommended lxml parser instead of native one OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-beautifulsoup4?expand=0&rev=93
820 lines
39 KiB
Plaintext
820 lines
39 KiB
Plaintext
-------------------------------------------------------------------
|
|
Sat Jan 20 13:11:41 UTC 2024 - Dirk Müller <dmueller@suse.com>
|
|
|
|
- update to 4.12.3:
|
|
* Fixed a regression such that if you set .hidden on a tag, the
|
|
tag becomes invisible but its contents are still visible. User
|
|
manipulation of .hidden is not a documented or supported
|
|
feature, so don't do this, but it wasn't too difficult to
|
|
keep the old behavior
|
|
working.
|
|
* Fixed a case found by Mengyuhan where html.parser giving up
|
|
on markup would result in an AssertionError instead of a
|
|
ParserRejectedMarkup exception.
|
|
* Added the correct stacklevel to instances of the
|
|
XMLParsedAsHTMLWarning.
|
|
* Corrected the syntax of the license definition in
|
|
pyproject.toml.
|
|
* Corrected a typo in a test that was causing test failures
|
|
when run against libxml2 2.12.1.
|
|
|
|
-------------------------------------------------------------------
|
|
Thu Nov 23 03:40:05 UTC 2023 - Steve Kowalik <steven.kowalik@suse.com>
|
|
|
|
- Require cchardet explicitly to avoid charset-normalizer braindamage.
|
|
|
|
-------------------------------------------------------------------
|
|
Mon May 8 11:39:40 UTC 2023 - Daniel Garcia <daniel.garcia@suse.com>
|
|
|
|
- Update to 4.12.2:
|
|
* Fixed an unhandled exception in BeautifulSoup.decode_contents
|
|
and methods that call it. [bug=2015545]
|
|
- 4.12.1:
|
|
* This version of Beautiful Soup replaces setup.py and setup.cfg
|
|
with pyproject.toml. Beautiful Soup now uses tox as its test backend
|
|
and hatch to do builds.
|
|
* The main functional improvement in this version is a nonrecursive technique
|
|
for regenerating a tree. This technique is used to avoid situations where,
|
|
in previous versions, doing something to a very deeply nested tree
|
|
would overflow the Python interpreter stack:
|
|
1. Outputting a tree as a string, e.g. with
|
|
BeautifulSoup.encode() [bug=1471755]
|
|
2. Making copies of trees (copy.copy() and
|
|
copy.deepcopy() from the Python standard library). [bug=1709837]
|
|
3. Pickling a BeautifulSoup object. (Note that pickling a Tag
|
|
object can still cause an overflow.)
|
|
* Making a copy of a BeautifulSoup object no longer parses the
|
|
document again, which should improve performance significantly.
|
|
* When a BeautifulSoup object is unpickled, Beautiful Soup now
|
|
tries to associate an appropriate TreeBuilder object with it.
|
|
* Tag.prettify() will now consistently end prettified markup with
|
|
a newline.
|
|
* Added unit tests for fuzz test cases created by third
|
|
parties. Some of these tests are skipped since they point
|
|
to problems outside of Beautiful Soup, but this change
|
|
puts them all in one convenient place.
|
|
* PageElement now implements the known_xml attribute. (This was technically
|
|
a bug, but it shouldn't be an issue in normal use.) [bug=2007895]
|
|
* The demonstrate_parser_differences.py script was still written in
|
|
Python 2. I've converted it to Python 3, but since no one has
|
|
mentioned this over the years, it's a sign that no one uses this
|
|
script and it's not serving its purpose.
|
|
- 4.12.0:
|
|
* Introduced the .css property, which centralizes all access to
|
|
the Soup Sieve API. This allows Beautiful Soup to give direct
|
|
access to as much of Soup Sieve that makes sense, without cluttering
|
|
the BeautifulSoup and Tag classes with a lot of new methods.
|
|
This does mean one addition to the BeautifulSoup and Tag classes
|
|
(the .css property itself), so this might be a breaking change if you
|
|
happen to use Beautiful Soup to parse XML that includes a tag called
|
|
<css>. In particular, code like this will stop working in 4.12.0:
|
|
|
|
soup.css['id']
|
|
|
|
Code like this will work just as before:
|
|
|
|
soup.find_one('css')['id']
|
|
|
|
The Soup Sieve methods supported through the .css property are
|
|
select(), select_one(), iselect(), closest(), match(), filter(),
|
|
escape(), and compile(). The BeautifulSoup and Tag classes still
|
|
support the select() and select_one() methods; they have not been
|
|
deprecated, but they have been demoted to convenience methods.
|
|
|
|
[bug=2003677]
|
|
|
|
* When the html.parser parser decides it can't parse a document, Beautiful
|
|
Soup now consistently propagates this fact by raising a
|
|
ParserRejectedMarkup error. [bug=2007343]
|
|
* Removed some error checking code from diagnose(), which is redundant with
|
|
similar (but more Pythonic) code in the BeautifulSoup constructor.
|
|
[bug=2007344]
|
|
* Added intersphinx references to the documentation so that other
|
|
projects have a target to point to when they reference Beautiful
|
|
Soup classes. [bug=1453370]
|
|
- 4.11.2:
|
|
* Fixed test failures caused by nondeterministic behavior of
|
|
UnicodeDammit's character detection, depending on the platform setup.
|
|
[bug=1973072]
|
|
* Fixed another crash when overriding multi_valued_attributes and using the
|
|
html5lib parser. [bug=1948488]
|
|
* The HTMLFormatter and XMLFormatter constructors no longer return a
|
|
value. [bug=1992693]
|
|
* Tag.interesting_string_types is now propagated when a tag is
|
|
copied. [bug=1990400]
|
|
* Warnings now do their best to provide an appropriate stacklevel,
|
|
improving the usefulness of the message. [bug=1978744]
|
|
* Passing a Tag's .contents into PageElement.extend() now works the
|
|
same way as passing the Tag itself.
|
|
* Soup Sieve tests will be skipped if the library is not installed.
|
|
- 4.11.1:
|
|
This release was done to ensure that the unit tests are packaged along
|
|
with the released source. There are no functionality changes in this
|
|
release, but there are a few other packaging changes:
|
|
* The Japanese and Korean translations of the documentation are included.
|
|
* The changelog is now packaged as CHANGELOG, and the license file is
|
|
packaged as LICENSE. NEWS.txt and COPYING.txt are still present,
|
|
but may be removed in the future.
|
|
* TODO.txt is no longer packaged, since a TODO is not relevant for released
|
|
code.
|
|
- 4.11.0:
|
|
* Ported unit tests to use pytest.
|
|
* Added special string classes, RubyParenthesisString and RubyTextString,
|
|
to make it possible to treat ruby text specially in get_text() calls.
|
|
[bug=1941980]
|
|
* It's now possible to customize the way output is indented by
|
|
providing a value for the 'indent' argument to the Formatter
|
|
constructor. The 'indent' argument works very similarly to the
|
|
argument of the same name in the Python standard library's
|
|
json.dump() function. [bug=1955497]
|
|
* If the charset-normalizer Python module
|
|
(https://pypi.org/project/charset-normalizer/) is installed, Beautiful
|
|
Soup will use it to detect the character sets of incoming documents.
|
|
This is also the module used by newer versions of the Requests library.
|
|
For the sake of backwards compatibility, chardet and cchardet both take
|
|
precedence if installed. [bug=1955346]
|
|
* Added a workaround for an lxml bug
|
|
(https://bugs.launchpad.net/lxml/+bug/1948551) that causes
|
|
problems when parsing a Unicode string beginning with BYTE ORDER MARK.
|
|
[bug=1947768]
|
|
* Issue a warning when an HTML parser is used to parse a document that
|
|
looks like XML but not XHTML. [bug=1939121]
|
|
* Do a better job of keeping track of namespaces as an XML document is
|
|
parsed, so that CSS selectors that use namespaces will do the right
|
|
thing more often. [bug=1946243]
|
|
* Some time ago, the misleadingly named "text" argument to find-type
|
|
methods was renamed to the more accurate "string." But this supposed
|
|
"renaming" didn't make it into important places like the method
|
|
signatures or the docstrings. That's corrected in this
|
|
version. "text" still works, but will give a DeprecationWarning.
|
|
[bug=1947038]
|
|
* Fixed a crash when pickling a BeautifulSoup object that has no
|
|
tree builder. [bug=1934003]
|
|
* Fixed a crash when overriding multi_valued_attributes and using the
|
|
html5lib parser. [bug=1948488]
|
|
* Standardized the wording of the MarkupResemblesLocatorWarning
|
|
warnings to omit untrusted input and make the warnings less
|
|
judgmental about what you ought to be doing. [bug=1955450]
|
|
* Removed support for the iconv_codec library, which doesn't seem
|
|
to exist anymore and was never put up on PyPI. (The closest
|
|
replacement on PyPI, iconv_codecs, is GPL-licensed, so we can't use
|
|
it--it's also quite old.)
|
|
|
|
-------------------------------------------------------------------
|
|
Sun Apr 23 23:26:12 UTC 2023 - Matej Cepl <mcepl@suse.com>
|
|
|
|
- Switch documentation to be within the main package.
|
|
|
|
-------------------------------------------------------------------
|
|
Fri Apr 21 12:22:35 UTC 2023 - Dirk Müller <dmueller@suse.com>
|
|
|
|
- add sle15_python_module_pythons (jsc#PED-68)
|
|
|
|
-------------------------------------------------------------------
|
|
Thu Apr 13 22:40:14 UTC 2023 - Matej Cepl <mcepl@suse.com>
|
|
|
|
- Make calling of %{sle15modernpython} optional.
|
|
|
|
-------------------------------------------------------------------
|
|
Wed Feb 9 10:14:27 UTC 2022 - Steve Kowalik <steven.kowalik@suse.com>
|
|
|
|
- Update to 4.10.0:
|
|
* This is the first release of Beautiful Soup to only support Python 3.
|
|
* The behavior of methods like .get_text() and .strings now differs
|
|
depending on the type of tag.
|
|
* NavigableString and its subclasses now implement the get_text()
|
|
method, as well as the properties .strings and
|
|
.stripped_strings.
|
|
* The 'html5' formatter now treats attributes whose values are the
|
|
empty string as HTML boolean attributes.
|
|
* The 'replace_with()' method now takes a variable number of arguments,
|
|
and can be used to replace a single element with a sequence of elements.
|
|
* Corrected output when the namespace prefix associated with a
|
|
namespaced attribute is the empty string, as opposed to
|
|
None.
|
|
* Performance improvement when processing tags that speeds up overall
|
|
tree construction by 2%. Patch by Morotti. [bug=1899358]
|
|
* Corrected the use of special string container classes in cases when a
|
|
single tag may contain strings with different containers; such as
|
|
the <template> tag, which may contain both TemplateString objects
|
|
and Comment objects.
|
|
* The html.parser tree builder can now handle named entities
|
|
found in the HTML5 spec in much the same way that the html5lib
|
|
tree builder does.
|
|
* Added a second way to pass specify encodings to UnicodeDammit and
|
|
EncodingDetector, based on the order of precedence defined in the
|
|
HTML5 spec.
|
|
* Improve the warning issued when a directory name (as opposed to
|
|
the name of a regular file) is passed as markup into the BeautifulSoup
|
|
constructor.
|
|
- Do not pass the directory to pytest.
|
|
|
|
-------------------------------------------------------------------
|
|
Sat Oct 10 18:34:16 UTC 2020 - Arun Persaud <arun@gmx.de>
|
|
|
|
- update to version 4.9.3:
|
|
* Implemented a significant performance optimization to the process
|
|
of searching the parse tree. Patch by Morotti. [bug=1898212]
|
|
|
|
-------------------------------------------------------------------
|
|
Mon Sep 28 11:41:27 UTC 2020 - Dirk Mueller <dmueller@suse.com>
|
|
|
|
- update to 4.9.2:
|
|
* Fixed a bug that caused too many tags to be popped from the tag
|
|
stack during tree building, when encountering a closing tag that had
|
|
no matching opening tag. [bug=1880420]
|
|
|
|
* Fixed a bug that inconsistently moved elements over when passing
|
|
a Tag, rather than a list, into Tag.extend(). [bug=1885710]
|
|
|
|
* Specify the soupsieve dependency in a way that complies with
|
|
PEP 508. Patch by Mike Nerone. [bug=1893696]
|
|
|
|
* Change the signatures for BeautifulSoup.insert_before and insert_after
|
|
(which are not implemented) to match PageElement.insert_before and
|
|
insert_after, quieting warnings in some IDEs. [bug=1897120]
|
|
|
|
-------------------------------------------------------------------
|
|
Wed Jun 3 11:10:03 UTC 2020 - Dirk Mueller <dmueller@suse.com>
|
|
|
|
- update to 4.9.1:
|
|
* Added a keyword argument 'on_duplicate_attribute' to the
|
|
BeautifulSoupHTMLParser constructor (used by the html.parser tree
|
|
builder) which lets you customize the handling of markup that
|
|
contains the same attribute more than once, as in:
|
|
<a href="url1" href="url2"> [bug=1878209]
|
|
* Added a distinct subclass, GuessedAtParserWarning, for the warning
|
|
issued when BeautifulSoup is instantiated without a parser being
|
|
specified. [bug=1873787]
|
|
* Added a distinct subclass, MarkupResemblesLocatorWarning, for the
|
|
warning issued when BeautifulSoup is instantiated with 'markup' that
|
|
actually seems to be a URL or the path to a file on
|
|
disk. [bug=1873787]
|
|
* The new NavigableString subclasses (Stylesheet, Script, and
|
|
TemplateString) can now be imported directly from the bs4 package.
|
|
* If you encode a document with a Python-specific encoding like
|
|
'unicode_escape', that encoding is no longer mentioned in the final
|
|
XML or HTML document. Instead, encoding information is omitted or
|
|
left blank. [bug=1874955]
|
|
* Fixed test failures when run against soupselect 2.0. Patch by Tomáš
|
|
Chvátal. [bug=1872279]
|
|
- remove soupsieve2-tests.patch: upstreamed
|
|
|
|
-------------------------------------------------------------------
|
|
Sun Apr 12 08:31:00 UTC 2020 - Tomáš Chvátal <tchvatal@suse.com>
|
|
|
|
- Add patch to fix the tests to pass with new soupsieve too:
|
|
* soupsieve2-tests.patch
|
|
* The assert name changed
|
|
|
|
-------------------------------------------------------------------
|
|
Sun Apr 12 07:50:37 UTC 2020 - Tomáš Chvátal <tchvatal@suse.com>
|
|
|
|
- Update to 4.9.0:
|
|
* fixes to work with new soupsieve
|
|
|
|
-------------------------------------------------------------------
|
|
Wed Jan 1 08:52:41 UTC 2020 - Ismail Dönmez <idonmez@suse.com>
|
|
|
|
- Update to 4.8.2
|
|
* Added Python docstrings to all public methods of the most commonly
|
|
used classes.
|
|
* Fixed two deprecation warnings. Patches by Colin
|
|
Watson and Nicholas Neumann. [bug=1847592] [bug=1855301]
|
|
* The html.parser tree builder now correctly handles DOCTYPEs that are
|
|
not uppercase. [bug=1848401]
|
|
* PageElement.select() now returns a ResultSet rather than a regular
|
|
list, making it consistent with methods like find_all().
|
|
|
|
-------------------------------------------------------------------
|
|
Fri Nov 1 08:59:57 UTC 2019 - Tomáš Chvátal <tchvatal@suse.com>
|
|
|
|
- Update to 4.8.1:
|
|
* When the html.parser or html5lib parsers are in use, Beautiful Soup
|
|
will, by default, record the position in the original document where
|
|
each tag was encountered.
|
|
* Fixed the definition of the default XML namespace when using
|
|
lxml 4.4.
|
|
* Avoid a crash when unpickling certain parse trees generated
|
|
using html5lib on Python 3.
|
|
* Avoid a crash when trying to detect the declared encoding of a
|
|
Unicode document.
|
|
- Drop patch beautifulsoup4-lxml-fixes.patch as it seems not needed
|
|
|
|
-------------------------------------------------------------------
|
|
Mon Oct 14 11:41:52 UTC 2019 - Matej Cepl <mcepl@suse.com>
|
|
|
|
- Replace %fdupes -s with plain %fdupes; hardlinks are better.
|
|
|
|
-------------------------------------------------------------------
|
|
Mon Jul 22 16:18:23 UTC 2019 - Todd R <toddrme2178@gmail.com>
|
|
|
|
- Update to 4.8.0
|
|
* It's now possible to customize the TreeBuilder object by passing
|
|
keyword arguments into the BeautifulSoup constructor. The main
|
|
reason to do this right now is to change how which attributes are
|
|
treated as multi-valued attributes (the way 'class' is treated by
|
|
default). You can do this with the `multi_valued_attributes` argument.
|
|
* The role of Formatter objects has been greatly expanded. The Formatter
|
|
class now controls the following:
|
|
> The function to call to perform entity substitution. (This was
|
|
previously Formatter's only job.)
|
|
> Which tags should be treated as containing CDATA and have their
|
|
contents exempt from entity substitution.
|
|
> The order in which a tag's attributes are output.
|
|
> Whether or not to put a '/' inside a void element, e.g. '<br/>' vs '<br>'
|
|
All preexisting code should work as before.
|
|
* Added a new method to the API, Tag.smooth(), which consolidates
|
|
multiple adjacent NavigableString elements.
|
|
* ' (which is valid in XML, XHTML, and HTML 5, but not HTML 4) is now
|
|
recognized as a named entity and converted to a single quote.
|
|
|
|
-------------------------------------------------------------------
|
|
Fri Mar 1 11:23:21 UTC 2019 - Tomáš Chvátal <tchvatal@suse.com>
|
|
|
|
- Do not generate doc for py2 and py3 variant they are the same
|
|
so keep just one around
|
|
- Update to 4.7.1:
|
|
* Fixed a significant performance problem introduced in 4.7.0. [bug=1810617]
|
|
* Fixed an incorrectly raised exception when inserting a tag before or
|
|
after an identical tag. [bug=1810692]
|
|
* Beautiful Soup will no longer try to keep track of namespaces that
|
|
are not defined with a prefix; this can confuse soupselect. [bug=1810680]
|
|
* Tried even harder to avoid the deprecation warning originally fixed in
|
|
4.6.1. [bug=1778909]
|
|
* Beautiful Soup's CSS Selector implementation has been replaced by a
|
|
dependency on Isaac Muse's SoupSieve project (the soupsieve package
|
|
on PyPI). The good news is that SoupSieve has a much more robust and
|
|
complete implementation of CSS selectors, resolving a large number
|
|
of longstanding issues. The bad news is that from this point onward,
|
|
SoupSieve must be installed if you want to use the select() method.
|
|
* Added the PageElement.extend() method, which works like list.append().
|
|
[bug=1514970]
|
|
* PageElement.insert_before() and insert_after() now take a variable
|
|
number of arguments. [bug=1514970]
|
|
* Fix a number of problems with the tree builder that caused
|
|
trees that were superficially okay, but which fell apart when bits
|
|
were extracted. Patch by Isaac Muse. [bug=1782928,1809910]
|
|
* Fixed a problem with the tree builder in which elements that
|
|
contained no content (such as empty comments and all-whitespace
|
|
elements) were not being treated as part of the tree. Patch by Isaac
|
|
Muse. [bug=1798699]
|
|
* Fixed a problem with multi-valued attributes where the value
|
|
contained whitespace. Thanks to Jens Svalgaard for the
|
|
fix. [bug=1787453]
|
|
* Clarified ambiguous license statements in the source code. Beautiful
|
|
Soup is released under the MIT license, and has been since 4.4.0.
|
|
|
|
|
|
-------------------------------------------------------------------
|
|
Thu Dec 6 14:47:30 UTC 2018 - Ondřej Súkup <mimi.vx@gmail.com>
|
|
|
|
- update to 4.6.3
|
|
* Fix an exception when a custom formatter was asked to format
|
|
a void element
|
|
|
|
-------------------------------------------------------------------
|
|
Sun Aug 5 11:02:25 UTC 2018 - adrian@suse.de
|
|
|
|
- update to 4.6.1:
|
|
* Stop data loss when encountering an empty numeric entity, and
|
|
possibly in other cases. Thanks to tos.kamiya for the fix. [bug=1698503]
|
|
|
|
* Preserve XML namespaces introduced inside an XML document, not just
|
|
the ones introduced at the top level. [bug=1718787]
|
|
|
|
* Added a new formatter, "html5", which represents void elements
|
|
as "<element>" rather than "<element/>". [bug=1716272]
|
|
|
|
* Fixed a problem where the html.parser tree builder interpreted
|
|
a string like "&foo " as the character entity "&foo;" [bug=1728706]
|
|
|
|
* Correctly handle invalid HTML numeric character entities like “
|
|
which reference code points that are not Unicode code points. Note
|
|
that this is only fixed when Beautiful Soup is used with the
|
|
html.parser parser -- html5lib already worked and I couldn't fix it
|
|
with lxml. [bug=1782933]
|
|
|
|
* Improved the warning given when no parser is specified. [bug=1780571]
|
|
|
|
* When markup contains duplicate elements, a select() call that
|
|
includes multiple match clauses will match all relevant
|
|
elements. [bug=1770596]
|
|
|
|
* Fixed code that was causing deprecation warnings in recent Python 3
|
|
versions. Includes a patch from Ville Skyttä. [bug=1778909] [bug=1689496]
|
|
|
|
* Fixed a Windows crash in diagnose() when checking whether a long
|
|
markup string is a filename. [bug=1737121]
|
|
|
|
* Stopped HTMLParser from raising an exception in very rare cases of
|
|
bad markup. [bug=1708831]
|
|
|
|
* Fixed a bug where find_all() was not working when asked to find a
|
|
tag with a namespaced name in an XML document that was parsed as
|
|
HTML. [bug=1723783]
|
|
|
|
* You can get finer control over formatting by subclassing
|
|
bs4.element.Formatter and passing a Formatter instance into (e.g.)
|
|
encode(). [bug=1716272]
|
|
|
|
* You can pass a dictionary of `attrs` into
|
|
BeautifulSoup.new_tag. This makes it possible to create a tag with
|
|
an attribute like 'name' that would otherwise be masked by another
|
|
argument of new_tag. [bug=1779276]
|
|
|
|
* Clarified the deprecation warning when accessing tag.fooTag, to cover
|
|
the possibility that you might really have been looking for a tag
|
|
called 'fooTag'.
|
|
|
|
-------------------------------------------------------------------
|
|
Mon Jul 16 18:08:01 UTC 2018 - mcepl@suse.com
|
|
|
|
- Clean SPEC file
|
|
Use py.test for running the tests instead of nosetests, which
|
|
breaks with python 3.7.
|
|
|
|
-------------------------------------------------------------------
|
|
Tue Mar 6 12:27:41 UTC 2018 - aplanas@suse.com
|
|
|
|
- Allows Recommends and Suggest in Fedora
|
|
|
|
-------------------------------------------------------------------
|
|
Tue Feb 27 17:00:11 UTC 2018 - aplanas@suse.com
|
|
|
|
- Recommends and Suggest are for SUSE
|
|
|
|
-------------------------------------------------------------------
|
|
Thu Aug 10 13:38:03 UTC 2017 - tbechtold@suse.com
|
|
|
|
- Only Suggests python-html5lib and python-lxml (instead of Requires
|
|
them). Both are not striclty needed. See
|
|
https://www.crummy.com/software/BeautifulSoup/bs4/doc/#installing-a-parser
|
|
|
|
-------------------------------------------------------------------
|
|
Wed Jul 5 06:28:31 UTC 2017 - dmueller@suse.com
|
|
|
|
- update to 4.6.0:
|
|
* Added the `Tag.get_attribute_list` method, which acts like `Tag.get` for
|
|
getting the value of an attribute, but which always returns a list,
|
|
whether or not the attribute is a multi-value attribute. [bug=1678589]
|
|
* Improved the handling of empty-element tags like <br> when using the
|
|
html.parser parser. [bug=1676935]
|
|
* HTML parsers treat all HTML4 and HTML5 empty element tags (aka void
|
|
element tags) correctly. [bug=1656909]
|
|
* Namespace prefix is preserved when an XML tag is copied. Thanks
|
|
to Vikas for a patch and test. [bug=1685172]
|
|
|
|
-------------------------------------------------------------------
|
|
Mon May 22 13:25:06 UTC 2017 - aloisio@gmx.com
|
|
|
|
- Fixed failing tests in python3
|
|
|
|
-------------------------------------------------------------------
|
|
Sat Apr 8 17:35:17 UTC 2017 - aloisio@gmx.com
|
|
|
|
- update to version 4.5.3:
|
|
* Fixed foster parenting when html5lib is the tree builder. Thanks
|
|
to Geoffrey Sneddon for a patch and test.
|
|
* Fixed yet another problem that caused the html5lib tree builder to
|
|
create a disconnected parse tree. [bug=1629825]
|
|
changes from version 4.5.2:
|
|
* Apart from the version number, this release is identical to
|
|
4.5.3. Due to user error, it could not be completely uploaded to
|
|
PyPI. Use 4.5.3 instead.
|
|
|
|
- Converted to single-spec
|
|
|
|
-------------------------------------------------------------------
|
|
Thu Sep 1 19:20:36 UTC 2016 - tbechtold@suse.com
|
|
|
|
- Relax BuildRequires for python-Sphinx
|
|
|
|
-------------------------------------------------------------------
|
|
Thu Sep 1 10:26:24 UTC 2016 - tbechtold@suse.com
|
|
|
|
- update to 4.5.1:
|
|
* Fixed a crash when passing Unicode markup that contained a
|
|
processing instruction into the lxml HTML parser on Python
|
|
3. [bug=1608048]
|
|
* Beautiful Soup is no longer compatible with Python 2.6. This
|
|
actually happened a few releases ago, but it's now official.
|
|
* Beautiful Soup will now work with versions of html5lib greater than
|
|
0.99999999. [bug=1603299]
|
|
* If a search against each individual value of a multi-valued
|
|
attribute fails, the search will be run one final time against the
|
|
complete attribute value considered as a single string. That is, if
|
|
a tag has class="foo bar" and neither "foo" nor "bar" matches, but
|
|
"foo bar" does, the tag is now considered a match.
|
|
This happened in previous versions, but only when the value being
|
|
searched for was a string. Now it also works when that value is
|
|
a regular expression, a list of strings, etc. [bug=1476868]
|
|
* Fixed a bug that deranged the tree when a whitespace element was
|
|
reparented into a tag that contained an identical whitespace
|
|
element. [bug=1505351]
|
|
* Added support for CSS selector values that contain quoted spaces,
|
|
such as tag[style="display: foo"]. [bug=1540588]
|
|
* Corrected handling of XML processing instructions. [bug=1504393]
|
|
* Corrected an encoding error that happened when a BeautifulSoup
|
|
object was copied. [bug=1554439]
|
|
* The contents of <textarea> tags will no longer be modified when the
|
|
tree is prettified. [bug=1555829]
|
|
* When a BeautifulSoup object is pickled but its tree builder cannot
|
|
be pickled, its .builder attribute is set to None instead of being
|
|
destroyed. This avoids a performance problem once the object is
|
|
unpickled. [bug=1523629]
|
|
* Specify the file and line number when warning about a
|
|
BeautifulSoup object being instantiated without a parser being
|
|
specified. [bug=1574647]
|
|
* The `limit` argument to `select()` now works correctly, though it's
|
|
not implemented very efficiently. [bug=1520530]
|
|
* Fixed a Python 3 ByteWarning when a URL was passed in as though it
|
|
were markup. Thanks to James Salter for a patch and
|
|
test. [bug=1533762]
|
|
* We don't run the check for a filename passed in as markup if the
|
|
'filename' contains a less-than character; the less-than character
|
|
indicates it's most likely a very small document. [bug=1577864]
|
|
|
|
-------------------------------------------------------------------
|
|
Sun Nov 15 16:31:46 UTC 2015 - idonmez@suse.com
|
|
|
|
- Update to version 4.4.1
|
|
* Fixed a bug that deranged the tree when part of it was
|
|
removed. Thanks to Eric Weiser for the patch and John Wiseman for a
|
|
test. lp#1481520
|
|
* Fixed a parse bug with the html5lib tree-builder. Thanks to Roel
|
|
Kramer for the patch. lp#1483781
|
|
* Improved the implementation of CSS selector grouping. Thanks to
|
|
Orangain for the patch. lp#1484543
|
|
* Fixed the test_detect_utf8 test so that it works when chardet is
|
|
installed. lp#1471359
|
|
* Corrected the output of Declaration objects. lp#1477847
|
|
|
|
-------------------------------------------------------------------
|
|
Mon Jul 27 18:54:20 UTC 2015 - aloisio@gmx.com
|
|
|
|
- update to 4.4.0
|
|
Especially important changes:
|
|
* Added a warning when you instantiate a BeautifulSoup object without
|
|
explicitly naming a parser. [bug=1398866]
|
|
* __repr__ now returns an ASCII bytestring in Python 2, and a Unicode
|
|
string in Python 3, instead of a UTF8-encoded bytestring in both
|
|
versions. In Python 3, __str__ now returns a Unicode string instead
|
|
of a bytestring. [bug=1420131]
|
|
* The `text` argument to the find_* methods is now called `string`,
|
|
which is more accurate. `text` still works, but `string` is the
|
|
argument described in the documentation. `text` may eventually
|
|
change its meaning, but not for a very long time. [bug=1366856]
|
|
* Changed the way soup objects work under copy.copy(). Copying a
|
|
NavigableString or a Tag will give you a new NavigableString that's
|
|
equal to the old one but not connected to the parse tree. Patch by
|
|
Martijn Peters. [bug=1307490]
|
|
* Started using a standard MIT license. [bug=1294662]
|
|
* Added a Chinese translation of the documentation by Delong .w.
|
|
New features:
|
|
* Introduced the select_one() method, which uses a CSS selector but
|
|
only returns the first match, instead of a list of
|
|
matches. [bug=1349367]
|
|
* You can now create a Tag object without specifying a
|
|
TreeBuilder. Patch by Martijn Pieters. [bug=1307471]
|
|
* You can now create a NavigableString or a subclass just by invoking
|
|
the constructor. [bug=1294315]
|
|
* Added an `exclude_encodings` argument to UnicodeDammit and to the
|
|
Beautiful Soup constructor, which lets you prohibit the detection of
|
|
an encoding that you know is wrong. [bug=1469408]
|
|
* The select() method now supports selector grouping. Patch by
|
|
Francisco Canas [bug=1191917]
|
|
Bug fixes:
|
|
* Fixed yet another problem that caused the html5lib tree builder to
|
|
create a disconnected parse tree. [bug=1237763]
|
|
* Force object_was_parsed() to keep the tree intact even when an element
|
|
from later in the document is moved into place. [bug=1430633]
|
|
* Fixed yet another bug that caused a disconnected tree when html5lib
|
|
copied an element from one part of the tree to another. [bug=1270611]
|
|
* Fixed a bug where Element.extract() could create an infinite loop in
|
|
the remaining tree.
|
|
* The select() method can now find tags whose names contain
|
|
dashes. Patch by Francisco Canas. [bug=1276211]
|
|
* The select() method can now find tags with attributes whose names
|
|
contain dashes. Patch by Marek Kapolka. [bug=1304007]
|
|
* Improved the lxml tree builder's handling of processing
|
|
instructions. [bug=1294645]
|
|
* Restored the helpful syntax error that happens when you try to
|
|
import the Python 2 edition of Beautiful Soup under Python 3.
|
|
[bug=1213387]
|
|
* In Python 3.4 and above, set the new convert_charrefs argument to
|
|
the html.parser constructor to avoid a warning and future
|
|
failures. Patch by Stefano Revera. [bug=1375721]
|
|
* The warning when you pass in a filename or URL as markup will now be
|
|
displayed correctly even if the filename or URL is a Unicode
|
|
string. [bug=1268888]
|
|
* If the initial <html> tag contains a CDATA list attribute such as
|
|
'class', the html5lib tree builder will now turn its value into a
|
|
list, as it would with any other tag. [bug=1296481]
|
|
* Fixed an import error in Python 3.5 caused by the removal of the
|
|
HTMLParseError class. [bug=1420063]
|
|
* Improved docstring for encode_contents() and
|
|
decode_contents(). [bug=1441543]
|
|
* Fixed a crash in Unicode, Dammit's encoding detector when the name
|
|
of the encoding itself contained invalid bytes. [bug=1360913]
|
|
* Improved the exception raised when you call .unwrap() or
|
|
.replace_with() on an element that's not attached to a tree.
|
|
* Raise a NotImplementedError whenever an unsupported CSS pseudoclass
|
|
is used in select(). Previously some cases did not result in a
|
|
NotImplementedError.
|
|
* It's now possible to pickle a BeautifulSoup object no matter which
|
|
tree builder was used to create it. However, the only tree builder
|
|
that survives the pickling process is the HTMLParserTreeBuilder
|
|
('html.parser'). If you unpickle a BeautifulSoup object created with
|
|
some other tree builder, soup.builder will be None. [bug=1231545]
|
|
- Aligned requirement version with PyPI
|
|
|
|
-------------------------------------------------------------------
|
|
Fri Jul 24 20:25:54 UTC 2015 - seife+obs@b1-systems.com
|
|
|
|
- fix non-SUSE build by conditionalizing Recommends: tag
|
|
|
|
-------------------------------------------------------------------
|
|
Wed Jan 8 15:05:55 UTC 2014 - speilicke@suse.com
|
|
|
|
- Add beautifulsoup4-lxml-fixes.patch: LXML fixes
|
|
|
|
-------------------------------------------------------------------
|
|
Tue Oct 22 13:00:00 UTC 2013 - toddrme2178@gmail.com
|
|
|
|
- update to 4.3.2
|
|
* Fixed a bug in which short Unicode input was improperly encoded to
|
|
ASCII when checking whether or not it was the name of a file on
|
|
disk. [bug=1227016]
|
|
* Fixed a crash when a short input contains data not valid in
|
|
filenames. [bug=1232604]
|
|
* Fixed a bug that caused Unicode data put into UnicodeDammit to
|
|
return None instead of the original data. [bug=1214983]
|
|
* Combined two tests to stop a spurious test failure when tests are
|
|
run by nosetests. [bug=1212445]
|
|
- update to 4.3.1
|
|
* Fixed yet another problem with the html5lib tree builder, caused by
|
|
html5lib's tendency to rearrange the tree during
|
|
parsing. [bug=1189267]
|
|
* Fixed a bug that caused the optimized version of find_all() to
|
|
return nothing. [bug=1212655]
|
|
- update to 4.3.0
|
|
* Instead of converting incoming data to Unicode and feeding it to the
|
|
lxml tree builder in chunks, Beautiful Soup now makes successive
|
|
guesses at the encoding of the incoming data, and tells lxml to
|
|
parse the data as that encoding. Giving lxml more control over the
|
|
parsing process improves performance and avoids a number of bugs and
|
|
issues with the lxml parser which had previously required elaborate
|
|
workarounds:
|
|
- An issue in which lxml refuses to parse Unicode strings on some
|
|
systems. [bug=1180527]
|
|
- A returning bug that truncated documents longer than a (very
|
|
small) size. [bug=963880]
|
|
- A returning bug in which extra spaces were added to a document if
|
|
the document defined a charset other than UTF-8. [bug=972466]
|
|
This required a major overhaul of the tree builder architecture. If
|
|
you wrote your own tree builder and didn't tell me, you'll need to
|
|
modify your prepare_markup() method.
|
|
* The UnicodeDammit code that makes guesses at encodings has been
|
|
split into its own class, EncodingDetector. A lot of apparently
|
|
redundant code has been removed from Unicode, Dammit, and some
|
|
undocumented features have also been removed.
|
|
* Beautiful Soup will issue a warning if instead of markup you pass it
|
|
a URL or the name of a file on disk (a common beginner's mistake).
|
|
* A number of optimizations improve the performance of the lxml tree
|
|
builder by about 33%, the html.parser tree builder by about 20%, and
|
|
the html5lib tree builder by about 15%.
|
|
* All find_all calls should now return a ResultSet object. Patch by
|
|
Aaron DeVore. [bug=1194034]
|
|
|
|
-------------------------------------------------------------------
|
|
Fri Jul 19 17:07:52 UTC 2013 - berendt@b1-systems.de
|
|
|
|
- remove .buildinfo before installation
|
|
|
|
-------------------------------------------------------------------
|
|
Thu Jul 18 08:25:50 UTC 2013 - berendt@b1-systems.de
|
|
|
|
- removed python-lxml as build requirement to be able to
|
|
successfully pass the check section on SLES11 SP3
|
|
|
|
-------------------------------------------------------------------
|
|
Thu Jun 27 13:32:06 UTC 2013 - speilicke@suse.com
|
|
|
|
- Update upstream URL
|
|
|
|
-------------------------------------------------------------------
|
|
Tue Jun 25 11:52:34 UTC 2013 - dmueller@suse.com
|
|
|
|
- update to 4.2.1:
|
|
* The default XML formatter will now replace ampersands even if they
|
|
appear to be part of entities. That is, "<" will become
|
|
"&lt;". The old code was left over from Beautiful Soup 3, which
|
|
didn't always turn entities into Unicode characters.
|
|
|
|
If you really want the old behavior (maybe because you add new
|
|
strings to the tree, those strings include entities, and you want
|
|
the formatter to leave them alone on output), it can be found in
|
|
EntitySubstitution.substitute_xml_containing_entities(). [bug=1182183]
|
|
|
|
* Gave new_string() the ability to create subclasses of
|
|
NavigableString. [bug=1181986]
|
|
|
|
* Fixed another bug by which the html5lib tree builder could create a
|
|
disconnected tree. [bug=1182089]
|
|
|
|
* The .previous_element of a BeautifulSoup object is now always None,
|
|
not the last element to be parsed. [bug=1182089]
|
|
|
|
* Fixed test failures when lxml is not installed. [bug=1181589]
|
|
|
|
* html5lib now supports Python 3. Fixed some Python 2-specific
|
|
code in the html5lib test suite. [bug=1181624]
|
|
|
|
* The html.parser treebuilder can now handle numeric attributes in
|
|
text when the hexidecimal name of the attribute starts with a
|
|
capital X. Patch by Tim Shirley. [bug=1186242]
|
|
|
|
-------------------------------------------------------------------
|
|
Mon Jun 10 20:34:00 UTC 2013 - dmueller@suse.com
|
|
|
|
- disable tests on SLE_11, fail due to too old python-lxml
|
|
|
|
-------------------------------------------------------------------
|
|
Sat May 18 13:30:00 UTC 2013 - toddrme2178@gmail.com
|
|
|
|
- Update to 4.2.0
|
|
* The Tag.select() method now supports a much wider variety of CSS
|
|
selectors.
|
|
- Added support for the adjacent sibling combinator (+) and the
|
|
general sibling combinator (~). Tests by "liquider". [bug=1082144]
|
|
- The combinators (>, +, and ~) can now combine with any supported
|
|
selector, not just one that selects based on tag name.
|
|
- Added limited support for the "nth-of-type" pseudo-class. Code
|
|
by Sven Slootweg. [bug=1109952]
|
|
* The BeautifulSoup class is now aliased to "_s" and "_soup", making
|
|
it quicker to type the import statement in an interactive session
|
|
The alias may change in the future, so don't use this in code you're
|
|
going to run more than once.
|
|
* Added the 'diagnose' submodule, which includes several useful
|
|
functions for reporting problems and doing tech support.
|
|
- diagnose(data) tries the given markup on every installed parser,
|
|
reporting exceptions and displaying successes. If a parser is not
|
|
installed, diagnose() mentions this fact.
|
|
- lxml_trace(data, html=True) runs the given markup through lxml's
|
|
XML parser or HTML parser, and prints out the parser events as
|
|
they happen. This helps you quickly determine whether a given
|
|
problem occurs in lxml code or Beautiful Soup code.
|
|
- htmlparser_trace(data) is the same thing, but for Python's
|
|
built-in HTMLParser class.
|
|
* In an HTML document, the contents of a <script> or <style> tag will
|
|
no longer undergo entity substitution by default. XML documents work
|
|
the same way they did before. [bug=1085953]
|
|
* Methods like get_text() and properties like .strings now only give
|
|
you strings that are visible in the document--no comments or
|
|
processing commands. [bug=1050164]
|
|
* The prettify() method now leaves the contents of <pre> tags
|
|
alone. [bug=1095654]
|
|
* Fix a bug in the html5lib treebuilder which sometimes created
|
|
disconnected trees. [bug=1039527]
|
|
* Fix a bug in the lxml treebuilder which crashed when a tag included
|
|
an attribute from the predefined "xml:" namespace. [bug=1065617]
|
|
* Fix a bug by which keyword arguments to find_parent() were not
|
|
being passed on. [bug=1126734]
|
|
* Stop a crash when unwisely messing with a tag that's been
|
|
decomposed. [bug=1097699]
|
|
* Now that lxml's segfault on invalid doctype has been fixed, fixed a
|
|
corresponding problem on the Beautiful Soup end that was previously
|
|
invisible. [bug=984936]
|
|
* Fixed an exception when an overspecified CSS selector didn't match
|
|
anything. Code by Stefaan Lippens. [bug=1168167]
|
|
- Re-enable lxml support (unit tests require it)
|
|
- Build documentation and add doc sub-package
|
|
|
|
-------------------------------------------------------------------
|
|
Tue Apr 30 12:59:02 UTC 2013 - dmueller@suse.com
|
|
|
|
- remove lxml support (fails unit test)
|
|
|
|
-------------------------------------------------------------------
|
|
Sat Jan 12 14:10:18 UTC 2013 - toddrme2178@gmail.com
|
|
|
|
- Use explicit file list
|
|
- Fix building on openSUSE 12.1 and 12.2
|
|
- Use recommended lxml parser instead of native one
|
|
(native fails fails for some python versions)
|
|
|
|
-------------------------------------------------------------------
|
|
Wed Jan 9 21:15:18 UTC 2013 - cfarrell@suse.com
|
|
|
|
- license update: MIT
|
|
See COPYING.txt
|
|
|
|
-------------------------------------------------------------------
|
|
Mon Sep 10 18:52:45 UTC 2012 - nmo.marques@gmail.com
|
|
|
|
- initial package from version 4.1.3
|
|
- based on spec file from python-beautifulsoup
|
|
- requires python >= 2.6
|
|
|