#
# spec file for package python-Scrapy
#
# Copyright (c) 2021 SUSE LLC
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.
# Please submit bugfixes or comments via https://bugs.opensuse.org/
#
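# Fallback definition of the python_module macro: expands to python3-<name>
# where python-rpm-macros does not already provide it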
%{?!python_module:%define python_module() python3-%{**}}
%define skip_python2 1
Name: python-Scrapy
# 2.5.1 is a security release (boo#1191446, CVE-2021-41125): HttpAuthMiddleware
# now sends credentials only to the domain set in the new http_auth_domain
# spider attribute, or to the domain of the spider's first request if unset
Version: 2.5.1
Release: 0
Summary: A high-level Python Screen Scraping framework
License: BSD-3-Clause
Group: Development/Languages/Python
URL: https://scrapy.org
Source: https://files.pythonhosted.org/packages/source/S/Scrapy/Scrapy-%{version}.tar.gz
# PATCH-FIX-OPENSUSE remove-h2-version-restriction.patch boo#1190035 -- run scrapy with h2 >= 4.0.0
Patch0: remove-h2-version-restriction.patch
# PATCH-FIX-UPSTREAM add-peak-method-to-queues.patch https://github.com/scrapy/scrapy/commit/68379197986ae3deb81a545b5fd6920ea3347094
Patch1: add-peak-method-to-queues.patch
BuildRequires: %{python_module Pillow}
BuildRequires: %{python_module Protego >= 0.1.15}
BuildRequires: %{python_module PyDispatcher >= 2.0.5}
BuildRequires: %{python_module Twisted >= 17.9.0}
BuildRequires: %{python_module botocore}
BuildRequires: %{python_module cryptography >= 2.0}
BuildRequires: %{python_module cssselect >= 0.9.1}
BuildRequires: %{python_module dbm}
BuildRequires: %{python_module itemadapter >= 0.1.0}
BuildRequires: %{python_module itemloaders >= 1.0.1}
BuildRequires: %{python_module jmespath}
BuildRequires: %{python_module lxml >= 3.5.0}
BuildRequires: %{python_module parsel >= 1.5.0}
BuildRequires: %{python_module pyOpenSSL >= 16.2.0}
BuildRequires: %{python_module pyftpdlib}
BuildRequires: %{python_module pytest-xdist}
BuildRequires: %{python_module pytest}
BuildRequires: %{python_module queuelib >= 1.4.2}
BuildRequires: %{python_module service_identity >= 16.0.0}
BuildRequires: %{python_module setuptools}
BuildRequires: %{python_module sybil}
BuildRequires: %{python_module testfixtures >= 6.0.0}
BuildRequires: %{python_module uvloop}
BuildRequires: %{python_module w3lib >= 1.17.2}
BuildRequires: %{python_module zope.interface >= 4.1.3}
BuildRequires: fdupes
BuildRequires: python-rpm-macros
BuildRequires: python3-Sphinx
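# The dataclasses backport is only needed when building against Python < 3.7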
BuildRequires: (python3-dataclasses if python3-base < 3.7)
Requires: python-Protego >= 0.1.15
Requires: python-PyDispatcher >= 2.0.5
Requires: python-Twisted >= 17.9.0
Requires: python-cryptography >= 2.0
Requires: python-cssselect >= 0.9.1
Requires: python-itemadapter >= 0.1.0
Requires: python-itemloaders >= 1.0.1
Requires: python-lxml >= 3.5.0
Requires: python-parsel >= 1.5.0
Requires: python-pyOpenSSL >= 16.2.0
Requires: python-queuelib >= 1.4.2
Requires: python-service_identity >= 16.0.0
Requires: python-setuptools
Requires: python-w3lib >= 1.17.2
Requires: python-zope.interface >= 4.1.3
Requires(post): update-alternatives
Requires(postun): update-alternatives
BuildArch: noarch
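# Generate a subpackage for each configured Python 3 flavor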
%python_subpackages
%description
Scrapy is a high-level scraping and web crawling framework for writing spiders
that crawl and parse web pages for all kinds of purposes, from information
retrieval to monitoring and testing websites.
%package -n %{name}-doc
Summary: Documentation for %{name}
Group: Documentation/HTML
%description -n %{name}-doc
Provides documentation for %{name}.
%prep
%setup -n Scrapy-%{version}
%autopatch -p1
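# Point the documentation Makefile at python3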
sed -i -e 's:= python:= python3:g' docs/Makefile
%build
%python_build
pushd docs
%make_build html && rm -r build/html/.buildinfo
popd
%install
%python_install
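# Create flavor-suffixed copies of the scrapy executable so it can be
# managed by update-alternatives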
%python_clone -a %{buildroot}%{_bindir}/scrapy
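# Deduplicate files in each flavor's site-packages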
%python_expand %fdupes %{buildroot}%{$python_sitelib}
%check
# tests/test_proxy_connect.py: requires mitmproxy == 0.10.1
# tests/test_downloader_handlers_*.py and test_http2_client_protocol.py: no network
# tests/test_command_check.py: Twisted DNS resolution of example.com fails in the build environment
# no color in obs chroot console
skiplist="test_pformat"
# the correct exception is raised, but Python 3.10 formats it differently, so the test does not recognize it
python310_skiplist=" or test_callback_kwargs"
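# inside the pytest macro, $python expands to the current flavor, so
# ${$python_skiplist} picks up e.g. ${python310_skiplist}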
%{pytest \
--ignore tests/test_proxy_connect.py \
--ignore tests/test_command_check.py \
--ignore tests/test_downloader_handlers.py \
--ignore tests/test_downloader_handlers_http2.py \
--ignore tests/test_http2_client_protocol.py \
-k "not (${skiplist} ${$python_skiplist})" \
-W ignore::DeprecationWarning \
tests}
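# Register the scrapy binary as an update-alternatives entry on install and
# drop it again on removal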
%post
%python_install_alternative scrapy
%postun
%python_uninstall_alternative scrapy
%files %{python_files}
%license LICENSE
%doc AUTHORS README.rst
%{python_sitelib}/scrapy
%{python_sitelib}/Scrapy-%{version}*-info
%python_alternative %{_bindir}/scrapy
%files -n %{name}-doc
%doc docs/build/html
%changelog