#
# spec file for package python-Scrapy
#
# Copyright (c) 2024 SUSE LLC
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.

# Please submit bugfixes or comments via https://bugs.opensuse.org/
#
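# on SLE-15 based distributions this selects the newer Python 3 flavors
# shipped in the Python module; it expands to nothing where it is undefined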
%{?sle15_python_module_pythons}
Name:           python-Scrapy
Version:        2.11.2
Release:        0
Summary:        A high-level Python Screen Scraping framework
License:        BSD-3-Clause
Group:          Development/Languages/Python
URL:            https://scrapy.org
Source:         https://files.pythonhosted.org/packages/source/s/scrapy/scrapy-%{version}.tar.gz
BuildRequires:  %{python_module Brotli}
BuildRequires:  %{python_module Pillow}
BuildRequires:  %{python_module Protego}
BuildRequires:  %{python_module PyDispatcher >= 2.0.5}
BuildRequires:  %{python_module Twisted >= 18.9.0}
BuildRequires:  %{python_module attrs}
BuildRequires:  %{python_module base >= 3.8}
BuildRequires:  %{python_module botocore >= 1.4.87}
BuildRequires:  %{python_module cryptography >= 36.0.0}
BuildRequires:  %{python_module cssselect >= 0.9.1}
BuildRequires:  %{python_module dbm}
BuildRequires:  %{python_module defusedxml >= 0.7.1}
BuildRequires:  %{python_module itemadapter >= 0.1.0}
BuildRequires:  %{python_module itemloaders >= 1.0.1}
BuildRequires:  %{python_module lxml >= 4.4.1}
BuildRequires:  %{python_module parsel >= 1.5.0}
BuildRequires:  %{python_module pexpect >= 4.8.1}
BuildRequires:  %{python_module pip}
BuildRequires:  %{python_module pyOpenSSL >= 21.0.0}
BuildRequires:  %{python_module pyftpdlib >= 1.5.8}
BuildRequires:  %{python_module pytest-xdist}
BuildRequires:  %{python_module pytest}
BuildRequires:  %{python_module queuelib >= 1.4.2}
BuildRequires:  %{python_module service_identity >= 18.1.0}
BuildRequires:  %{python_module setuptools}
BuildRequires:  %{python_module sybil}
BuildRequires:  %{python_module testfixtures}
BuildRequires:  %{python_module tldextract}
BuildRequires:  %{python_module uvloop}
BuildRequires:  %{python_module w3lib >= 1.17.0}
BuildRequires:  %{python_module wheel}
BuildRequires:  %{python_module zope.interface >= 5.1.0}
BuildRequires:  fdupes
BuildRequires:  python-rpm-macros
BuildRequires:  python3-Sphinx
Requires:       python-Protego >= 0.1.15
Requires:       python-PyDispatcher >= 2.0.5
Requires:       python-Twisted >= 18.9.0
Requires:       python-cryptography >= 36.0.0
Requires:       python-cssselect >= 0.9.1
Requires:       python-defusedxml >= 0.7.1
Requires:       python-itemadapter >= 0.1.0
Requires:       python-itemloaders >= 1.0.1
Requires:       python-lxml >= 4.4.1
Requires:       python-packaging
Requires:       python-parsel >= 1.5.0
Requires:       python-pyOpenSSL >= 21.0.0
Requires:       python-queuelib >= 1.4.2
Requires:       python-service_identity >= 18.1.0
Requires:       python-setuptools
Requires:       python-tldextract
Requires:       python-w3lib >= 1.17.2
Requires:       python-zope.interface >= 5.1.0
Requires(post): update-alternatives
Requires(postun): update-alternatives
BuildArch:      noarch
%python_subpackages

%description
Scrapy is a high-level scraping and web crawling framework for writing spiders
to crawl and parse web pages for all kinds of purposes, from information
retrieval to monitoring or testing web sites.

%package -n %{name}-doc
Summary:        Documentation for %{name}
Group:          Documentation/HTML

%description -n %{name}-doc
Provides documentation for %{name}.

%prep
%autosetup -p1 -n scrapy-%{version}
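# docs/Makefile calls plain "python"; point it at python3 for the Sphinx build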
sed -i -e 's:= python:= python3:g' docs/Makefile

%build
%pyproject_wheel
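# build the HTML documentation for the -doc subpackage; Sphinx's .buildinfo
# is dropped as it is not needed in the packaged docs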
pushd docs
%make_build html && rm -r build/html/.buildinfo
popd

%install
%pyproject_install
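# prepare per-flavor copies of /usr/bin/scrapy for update-alternatives handling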
%python_clone -a %{buildroot}%{_bindir}/scrapy
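# deduplicate identical files in each flavor's site-packages tree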
%python_expand %fdupes %{buildroot}%{$python_sitelib}

%check
# no color in obs chroot console
skiplist="test_pformat"
# no online connection to toscrape.com
skiplist="$skiplist or CheckCommandTest or test_file_path"
# Flaky test gh#scrapy/scrapy#5703
skiplist="$skiplist or test_start_requests_laziness"
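# the pytest macro runs the test suite once per Python flavor; -x stops at the
# first failure and DeprecationWarnings are silenced to keep the logs readable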
%{pytest -x \
-k "not (${skiplist})" \
-W ignore::DeprecationWarning \
tests}

%post
%python_install_alternative scrapy

%postun
%python_uninstall_alternative scrapy

%files %{python_files}
%license LICENSE
%doc AUTHORS README.rst
%{python_sitelib}/scrapy
%{python_sitelib}/Scrapy-%{version}.dist-info
%python_alternative %{_bindir}/scrapy

%files -n %{name}-doc
%doc docs/build/html

%changelog