- update to 4.6.1:
  * Stop data loss when encountering an empty numeric entity, and
    possibly in other cases. Thanks to tos.kamiya for the fix. [bug=1698503]
  * Preserve XML namespaces introduced inside an XML document, not just
    the ones introduced at the top level. [bug=1718787]
  * Added a new formatter, "html5", which represents void elements as
    "<element>" rather than "<element/>". [bug=1716272]
  * Fixed a problem where the html.parser tree builder interpreted a
    string like "&foo " as the character entity "&foo;" [bug=1728706]
  * Correctly handle invalid HTML numeric character entities like &#147;
    which reference code points that are not Unicode code points. Note
    that this is only fixed when Beautiful Soup is used with the
    html.parser parser -- html5lib already worked and I couldn't fix it
    with lxml. [bug=1782933]
  * Improved the warning given when no parser is specified. [bug=1780571]
  * When markup contains duplicate elements, a select() call that
    includes multiple match clauses will match all relevant
    elements. [bug=1770596]
  * Fixed code that was causing deprecation warnings in recent Python 3
    versions. Includes a patch from Ville Skyttä. [bug=1778909] [bug=1689496]
  * Fixed a Windows crash in diagnose() when checking whether a long
    markup string is a filename. [bug=1737121]

OBS-URL: https://build.opensuse.org/request/show/627521
OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-beautifulsoup4?expand=0&rev=62
#
# spec file for package python-beautifulsoup4
#
# Copyright (c) 2018 SUSE LINUX GmbH, Nuernberg, Germany.
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.

# Please submit bugfixes or comments via http://bugs.opensuse.org/
#


%{?!python_module:%define python_module() python-%{**} python3-%{**}}
Name: python-beautifulsoup4
Version: 4.6.1
Release: 0
Summary: HTML/XML Parser for Quick-Turnaround Applications Like Screen-Scraping
License: MIT
Group: Development/Libraries/Python
URL: https://www.crummy.com/software/BeautifulSoup/
Source: https://files.pythonhosted.org/packages/source/b/beautifulsoup4/beautifulsoup4-%{version}.tar.gz
# PATCH-FIX-UPSTREAM speilicke@suse.com -- Backport of https://code.launchpad.net/~saschpe/beautifulsoup/beautifulsoup/+merge/200849
Patch0: beautifulsoup4-lxml-fixes.patch
# Documentation requirements:
BuildRequires: %{python_module devel >= 2.6}
# Test requirements
BuildRequires: %{python_module pytest}
BuildRequires: %{python_module setuptools}
BuildRequires: fdupes
BuildRequires: python-rpm-macros
BuildRequires: python3-Sphinx
BuildArch: noarch
%if 0%{?suse_version} >= 1000 || 0%{?fedora_version} >= 24
Suggests: python-html5lib >= 0.999999
Suggests: python-lxml >= 3.4.4
%endif
%python_subpackages

%description
Beautiful Soup is a Python HTML/XML parser designed for quick turnaround
projects like screen-scraping. Three features make it powerful:

* Beautiful Soup won't choke if you give it bad markup. It yields a parse tree
that makes approximately as much sense as your original document. This is
usually good enough to collect the data you need and run away.

* Beautiful Soup provides a few simple methods and Pythonic idioms for
navigating, searching, and modifying a parse tree: a toolkit for dissecting a
document and extracting what you need. You don't have to create a custom
parser for each application.

* Beautiful Soup automatically converts incoming documents to Unicode and
outgoing documents to UTF-8. You don't have to think about encodings unless
the document doesn't specify an encoding and Beautiful Soup can't autodetect
one. Then you just have to specify the original encoding.

Beautiful Soup parses anything you give it, and does the tree traversal stuff
for you. You can tell it "Find all the links", or "Find all the links of class
externalLink", or "Find all the links whose URLs match 'foo.com'", or "Find the
table heading that's got bold text, then give me that text."

Valuable data that was once locked up in poorly designed websites is now within
your reach. Projects that would have taken hours take only minutes with
Beautiful Soup.

%package doc
Summary: Documentation for %{name}
Group: Development/Libraries/Python
%if 0%{?suse_version} >= 1000 || 0%{?fedora_version} >= 24
Recommends: %{name} = %{version}
%endif

%description doc
Documentation and help files for %{name}

%prep
%setup -q -n beautifulsoup4-%{version}
%patch0 -p1

%build
%python_build
pushd doc && make html && rm build/html/.buildinfo build/html/objects.inv && popd

%install
%python_install
%python_expand %fdupes -s %{buildroot}%{$python_sitelib}

%check
export LANG=en_US.UTF-8
%{python_expand export TESTROOT=%{buildroot}%{$python_sitelib}/bs4/tests
py.test-%{$python_bin_suffix} $TESTROOT
rm -rf $TESTROOT/__pycache__
}

%files %{python_files}
%license COPYING.txt
%doc AUTHORS.txt
%{python_sitelib}/bs4/
%{python_sitelib}/beautifulsoup4-%{version}-py*.egg-info

%files %{python_files doc}
%doc NEWS.txt README.txt TODO.txt doc/build/html

%changelog