  * Fixed a bug where Unicode escapes in CSS were not properly decoded before
    security checks. The fix prevents attackers from bypassing filters with
    escape sequences. (CVE-2026-28348) (bsc#1259378)
  * Fixed a security issue where <base> tags could be used for URL hijacking
    attacks. The <base> tag is now automatically removed whenever the <head>
    tag is removed (via page_structure=True or manual configuration), because
    the HTML specification requires <base> to be inside <head>.
    (CVE-2026-28350) (bsc#1259379)
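The CSS escape fix can be illustrated with a minimal sketch, assuming escapes follow the CSS Syntax Level 3 rules (a backslash plus 1-6 hex digits and one optional whitespace character, or a backslash plus a literal character). decode_css_escapes() and looks_dangerous() are hypothetical helpers, not lxml_html_clean's code, and the two keywords checked are only examples. The point is that the check runs on the decoded text, so `\6a avascript:` is seen as `javascript:`.

```python
import re

# One CSS escape: a backslash followed by 1-6 hex digits plus one optional
# whitespace character, or a backslash followed by any other non-newline char.
_CSS_ESCAPE = re.compile(
    r"\\(?:([0-9a-fA-F]{1,6})(?:\r\n|[ \t\r\n\f])?|([^\r\n\f0-9a-fA-F]))"
)


def decode_css_escapes(css: str) -> str:
    """Resolve CSS escapes so later checks see the real characters."""
    def replace(match: re.Match) -> str:
        hex_digits, literal = match.groups()
        if literal is not None:
            # "\s" simply means "s".
            return literal
        code_point = int(hex_digits, 16)
        # Zero, surrogates and out-of-range values become U+FFFD.
        if code_point == 0 or 0xD800 <= code_point <= 0xDFFF or code_point > 0x10FFFF:
            return "\ufffd"
        return chr(code_point)

    return _CSS_ESCAPE.sub(replace, css)


def looks_dangerous(css: str) -> bool:
    """Hypothetical check, applied to the decoded text, not the raw input."""
    decoded = decode_css_escapes(css).lower()
    return "javascript:" in decoded or "expression(" in decoded
```

Checking the raw string for "javascript:" would miss `\6a avascript:` entirely, which is the class of bypass the entry describes.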
OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-lxml_html_clean?expand=0&rev=11
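The <base> rule can be sketched as a computation of the effective tag sets. effective_tag_sets() is hypothetical, and the assumption that page_structure=True strips html, head and title mirrors lxml's historical Cleaner behaviour rather than quoting the current code. Here "remove" means the tag is dropped but its children are kept, and "kill" means the whole element is dropped.

```python
# Tags assumed to be stripped by page_structure=True in this sketch.
PAGE_STRUCTURE_TAGS = {"html", "head", "title"}


def effective_tag_sets(page_structure: bool, remove_tags=(), kill_tags=()):
    """Return (remove, kill) tag sets, killing <base> whenever <head> goes."""
    remove = set(remove_tags)
    kill = set(kill_tags)
    if page_structure:
        remove |= PAGE_STRUCTURE_TAGS
    # <base> belongs inside <head>; once <head> is stripped, a surviving
    # <base> could redirect every relative URL on the page, so kill it too.
    if "head" in remove or "head" in kill:
        kill.add("base")
    return remove, kill
```

Both routes the entry mentions, page_structure=True and a manual configuration that removes <head>, end up killing <base>.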
- Update to 0.4.1
  * Bugs fixed
    - Removed superfluous debug prints.
- Changes from 0.4.0
  * Bugs fixed
    - The Cleaner() now scans for hidden JavaScript code embedded
      within CSS comments. In certain contexts, such as within
      <svg> or <math> tags, <style> tags may lose their intended
      function, so the content of comments like /* foo */ could
      potentially be executed by the browser. If suspicious
      content is detected, only the comment is removed.
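A hedged sketch of scanning CSS comments and removing only the suspicious ones, as the 0.4.0 entry describes. strip_suspicious_comments() is hypothetical, and the patterns in _SUSPICIOUS are illustrative assumptions; the actual detection rules in lxml_html_clean may differ.

```python
import re

# One CSS comment, including comments that span several lines.
_CSS_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)

# Hypothetical markers of script hidden inside a comment.
_SUSPICIOUS = re.compile(r"<\s*script|javascript\s*:|\bon\w+\s*=", re.IGNORECASE)


def strip_suspicious_comments(css: str) -> str:
    """Remove only comments that look like hidden script; keep all other CSS."""
    def replace(match: re.Match) -> str:
        comment = match.group(0)
        return "" if _SUSPICIOUS.search(comment) else comment

    return _CSS_COMMENT.sub(replace, css)
```

Ordinary comments and the surrounding rules pass through unchanged; only a comment that matches one of the patterns disappears.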
- Changes from 0.3.1
  * Features added
    - URLs are no longer parsed when it is not necessary.
- Changes from 0.3.0
  * Features added
    - URL parsing has been improved, and Cleaner now removes
      ambiguous URLs.
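The 0.3.0 entry does not say which URLs count as ambiguous. A minimal sketch, assuming one such case is a backslash inside an http(s) URL: browsers treat it like "/", while Python's urllib.parse.urlsplit() does not, so the two can disagree about the host. is_allowed_embed() and HOST_ALLOWLIST are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts for embedded content (e.g. <iframe src>).
HOST_ALLOWLIST = {"www.youtube.com", "player.vimeo.com"}


def is_allowed_embed(url: str) -> bool:
    """Accept a URL only if it is unambiguous and its host is allowlisted."""
    # A backslash or control character can make the browser and urlsplit()
    # disagree about the host, so such URLs are rejected outright.
    if "\\" in url or any(ord(ch) < 0x20 for ch in url):
        return False
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False
    return parts.hostname in HOST_ALLOWLIST
```

For "https://evil.example\@www.youtube.com/", urlsplit() reports the host as www.youtube.com, while a browser loads evil.example. Rejecting the URL, rather than guessing which host a browser would pick, keeps the check conservative.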
- Changes from 0.2.2
  * Bugs fixed
    - The sdist now includes all test files and the changelog.
- Changes from 0.2.1
  * Bugs fixed
    - Memory efficiency is now much better for HTML pages where
      the cleaner removes many elements. (#14)
- Changes from 0.2.0
  * Features added
    - ASCII control characters (except HT, VT, CR and LF) are now
      removed from string inputs before they are parsed by lxml/libxml2.
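The 0.2.0 pre-parse filtering could look like the following sketch. strip_control_chars() is a hypothetical helper. Treating DEL (0x7F) as a control character to remove is an assumption. FF (0x0C) is removed because it is not among the listed exceptions.

```python
import re

# C0 controls and DEL, except HT (\x09), LF (\x0A), VT (\x0B) and CR (\x0D).
_ASCII_CONTROL = re.compile(r"[\x00-\x08\x0C\x0E-\x1F\x7F]")


def strip_control_chars(html: str) -> str:
    """Drop ASCII control characters before the string reaches the parser."""
    return _ASCII_CONTROL.sub("", html)
```

Filtering before parsing means a payload such as "jav\x00ascript:" is already "javascript:" by the time any later check sees it.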
- Fix boo#1233541
OBS-URL: https://build.opensuse.org/request/show/1225615
OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-lxml_html_clean?expand=0&rev=5