forked from pool/jsoup
201 lines
9.9 KiB
Plaintext
201 lines
9.9 KiB
Plaintext
-------------------------------------------------------------------
|
|
Mon Oct 17 05:42:39 UTC 2022 - Fridrich Strba <fstrba@suse.com>
|
|
|
|
- Upgrade to upstream version 1.15.3
|
|
- Changes of 1.15.3
|
|
* Security
|
|
+ Fixed bsc#1203459 (CVE-2022-36033), an issue where the jsoup
|
|
cleaner may incorrectly sanitize crafted XSS attempts if
|
|
SafeList.preserveRelativeLinks is enabled. See the security
|
|
advisory for more details.
|
|
* Improvements
|
|
+ The Cleaner will preserve the source position of cleaned
|
|
elements, if source tracking is enabled in the original parse.
|
|
+ The error messages output from Validate are more descriptive.
|
|
Exceptions are now ValidationExceptions
|
|
(extending IllegalArgumentException). Stack traces do not
|
|
include the Validate class, to make it simpler to see where
|
|
the exception originated. Common validation errors including
|
|
malformed URLs and empty selector results have more explicit
|
|
error messages.
|
|
+ Build Improvement: added implementation version and related
|
|
fields to the jar manifest.
|
|
* Bug Fixes
|
|
+ The DataUtil would incorrectly read from InputStreams that
|
|
emitted reads less than the requested size. This lead to
|
|
incorrect results when parsing from chunked server responses,
|
|
for example.
|
|
- Changes of 1.15.2
|
|
* Improvements
|
|
+ Added the ability to track the position (line, column, index)
|
|
in the original input source from where a given node was
|
|
parsed. Accessible via Node.sourceRange() and
|
|
Element.endSourceRange().
|
|
+ Added Element.firstElementChild(), Element.lastElementChild(),
|
|
Node.firstChild(), Node.lastChild(), as convenient accessors
|
|
to those child nodes and elements.
|
|
+ Added Element.expectFirst(), which is just like
|
|
Element.selectFirst(), but instead of returning a null if
|
|
there is no match, will throw an IllegalArgumentException.
|
|
This is useful if you want to simply abort processing if an
|
|
expected match is not found, such as in test cases.
|
|
+ When pretty-printing HTML, doctypes are emitted on a newline
|
|
if there is a preceding comment.
|
|
+ When pretty-printing, trim the leading and trailing spaces of
|
|
textnodes in block tags when possible, so that they are
|
|
indented correctly.
|
|
+ In Element.selectXpath(), disable namespace awareness. This
|
|
makes it possible to always select elements by their simple
|
|
local name, regardless of whether an xmlns attribute was set.
|
|
* Bug Fixes
|
|
+ When using the DataUtil.readToByteBuffer() method, such as in
|
|
Connection.Response.body(), if the document has not already
|
|
been parsed and must be read fully, and there is any maximum
|
|
buffer size being applied, only the default internal buffer
|
|
size was read.
|
|
+ When serializing HTML, newlines in elements descending from a
|
|
pre tag were incorrectly skipped. That caused what should have
|
|
been preformatted output to instead be a run of text.
|
|
+ When pretty-print serializing HTML, newlines separating
|
|
phrasing content (e.g. a <span> tag within a <p> tag would be
|
|
incorrectly skipped, instead of normalized to a space.
|
|
Additionally, improved space normalization between other end
|
|
of line occurences, and whitespace handling after a closing
|
|
</body>
|
|
- Changes of 1.15.1
|
|
* Changes
|
|
+ Removed previously deprecated methods and classes (including
|
|
org.jsoup.safety.Whitelist; use org.jsoup.safety.Safelist
|
|
instead).
|
|
* Improvements
|
|
+ When converting jsoup Documents to W3C Documents in W3CDom,
|
|
preserve HTML valid attribute names if the input document is
|
|
using the HTML syntax. (Previously, would always coerce using
|
|
the more restrictive XML syntax.)
|
|
+ Added the :containsWholeText(text) selector, to match against
|
|
non-normalized Element text. That can be useful when elements
|
|
can only be distinguished by e.g. specific case, or leading
|
|
whitespace, etc.
|
|
+ Added Element#wholeOwnText() to retrieve the original
|
|
(non-normalized) ownText of an Element. Also added the
|
|
:containsWholeOwnText(text) selector, to match against that.
|
|
BR elements are now treated as newlines in the wholeText
|
|
methods.
|
|
+ Added the :matchesWholeText(regex) and
|
|
:matchesWholeOwnText(regex) selectors, to match against whole
|
|
(non-normalized, case sensitive) element text and own text,
|
|
respectively.
|
|
+ When evaluating an XPath query against a context element, the
|
|
complete document is now visible to the query, vs only the
|
|
context element's sub-tree. This enables support for queries
|
|
outside (parent or sibling) the element, e.g.
|
|
ancestor-or-self::*.
|
|
+ Allow a maxPaddingWidth on the indent level in OutputSettings
|
|
when pretty printing. This defaults to 30 to limit the indent
|
|
level for very deeply nested elements, and may be disabled by
|
|
setting to -1.
|
|
+ When cloning a Node or an Element, the clone gets a cloned
|
|
OwnerDocument containing only that clone, so as to preserve
|
|
applicable settings, such as the Pretty Print settings.
|
|
+ Added a convenience method Jsoup.parse(File).
|
|
+ In the NodeTraversor, added default implementations for
|
|
NodeVisitor.tail() and NodeFilter.tail(), so that code using
|
|
only head() methods can be written as lambdas.
|
|
+ In NodeTraversor, added support for removing nodes via
|
|
Node.remove() during NodeVisitor.head().
|
|
+ Added Node.forEachNode(Consumer<Node>) and
|
|
Element.forEach(Consumer<Element) methods, to efficiently
|
|
traverse the DOM with a functional interface.
|
|
* Bug Fixes
|
|
+ Boolean attribute names should be case-insensitive, but were
|
|
not when the parser was configured to preserve case.
|
|
+ When reading from SequenceInputStreams across the buffer, the
|
|
input stream was closed too early, resulting in missed
|
|
content.
|
|
+ A comment with all dashes (<!----->) should not emit a parse
|
|
error.
|
|
+ When throwing a SelectorParseException for an invalid
|
|
selector, don't try to String.format the input, as that could
|
|
throw an IllegalFormatException.
|
|
+ When serializing HTML with Pretty Print enabled, extraneous
|
|
whitespace may be added on closing tags, or extra newlines may
|
|
be added at the end of script blocks.
|
|
+ When copy-creating a Safelist from another, perform a
|
|
deep-copy of the original's settings, so that changes to the
|
|
original after creation do not affect the copy.
|
|
+ Speed improvement when parsing constructed HTML containing
|
|
very deeply incorrectly stacked formatting elements with many
|
|
attributes.
|
|
+ During parsing, a StackOverflowException was possible given
|
|
crafted HTML with hundreds of nested table elements followed
|
|
by invalid formatting elements.
|
|
- Changes of 1.14.3
|
|
* Improvements
|
|
+ Added native XPath support with Element.selectXpath(String)
|
|
+ Added full support for the <template> tag, up to the HTML5
|
|
parser spec.
|
|
+ Added support in CharacterReader to track newlines, so that
|
|
parse errors can be reported more intuitively.
|
|
+ Tracked parse errors now have more details, including the
|
|
erroneous token, to help clarify the errors.
|
|
+ Speed and memory optimizations for the :has(subquery)
|
|
selector.
|
|
+ The :contains(text) and :containsOwn(text) selectors are now
|
|
whitespace normalized, aligning to the document text that they
|
|
are matching against.
|
|
+ In Element, speed optimized adopting all of an element's child
|
|
nodes into a currently empty element. Improves the HTML
|
|
adoption agency algorithm when adopting elements with many
|
|
children.
|
|
+ Increased the parse speed when in RCData (e.g. <title>) and
|
|
unescaped <tag> tokens are found, by memoizing the </title>
|
|
scan and reducing GC.
|
|
+ When parsing custom tags (in HTML or XML), added a flyweight
|
|
cache on Tag.valueOf(String) to reduce memory overhead when
|
|
many tags are repeated. Also tuned other areas of the parser
|
|
when many very deeply stacked custom elements were present.
|
|
* Bug Fixes
|
|
+ The OSGi bundle meta-data incorrectly set a version on the
|
|
import of javax.annotation (used as a build-time dependency
|
|
for nullability assertions).
|
|
+ When tracking errors or checking for validity in the Cleaner,
|
|
errors were incorrectly raised for missing optional closing tags.
|
|
+ The Attributes.equals() method was sensitive to the order of
|
|
its contents, but it should not be.
|
|
+ When the HTML parser was configured to preserve case, Element
|
|
text methods would miss adding whitespace for BR tags.
|
|
+ Attribute names are now normalized & validated correctly for
|
|
the specific output syntax (HTML or XML). Previously,
|
|
syntactically invalid attribute names could be output by the
|
|
html() methods. Such attributes are still available in the
|
|
DOM, and will be normalized if possible on output.
|
|
+ Fixed an IOOB when an empty select tag was followed by a body
|
|
tag that needed reparenting.
|
|
* Build Improvements
|
|
+ Fixed nullability annotations for Node.equals(Object) and
|
|
other equals methods.
|
|
+ Added JDK 17 to the CI builds.
|
|
|
|
-------------------------------------------------------------------
|
|
Fri Aug 27 06:57:23 UTC 2021 - Fridrich Strba <fstrba@suse.com>
|
|
|
|
- Upgrade to upstream version 1.14.2
|
|
* fixes bsc#1189749, CVE-2021-37714
|
|
- Generate tarball using source service instead of a script
|
|
|
|
-------------------------------------------------------------------
|
|
Fri Feb 22 22:39:00 UTC 2019 - Fridrich Strba <fstrba@suse.com>
|
|
|
|
- Remove from the tarball the non-free test data
|
|
|
|
-------------------------------------------------------------------
|
|
Sat Feb 2 18:52:01 UTC 2019 - Jan Engelhardt <jengelh@inai.de>
|
|
|
|
- Ensure neutrality of descriptions.
|
|
|
|
-------------------------------------------------------------------
|
|
Fri Feb 1 08:53:28 UTC 2019 - Fridrich Strba <fstrba@suse.com>
|
|
|
|
- Initial packaging of jsoup version 1.11.3
|
|
- Added jsoup-build.xml file to build with ant
|