#
# spec file for package python-Scrapy
#
# Copyright (c) 2022 SUSE LLC
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.
# Please submit bugfixes or comments via https://bugs.opensuse.org/
#
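
# On distributions that do not ship the python_module macro, define it so that
# %%{python_module foo} expands to python3-foo; Python 2 builds are skipped.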
%{?!python_module:%define python_module() python3-%{**}}
%define skip_python2 1
Name: python-Scrapy
Version: 2.6.2
Release: 0
Summary: A high-level Python Screen Scraping framework
License: BSD-3-Clause
Group: Development/Languages/Python
URL: https://scrapy.org
Source: https://files.pythonhosted.org/packages/source/S/Scrapy/Scrapy-%{version}.tar.gz
BuildRequires: %{python_module Pillow}
BuildRequires: %{python_module Protego >= 0.1.15}
BuildRequires: %{python_module PyDispatcher >= 2.0.5}
BuildRequires: %{python_module Twisted >= 17.9.0}
BuildRequires: %{python_module botocore}
BuildRequires: %{python_module cryptography >= 2.0}
BuildRequires: %{python_module cssselect >= 0.9.1}
BuildRequires: %{python_module dbm}
BuildRequires: %{python_module itemadapter >= 0.1.0}
BuildRequires: %{python_module itemloaders >= 1.0.1}
BuildRequires: %{python_module jmespath}
BuildRequires: %{python_module lxml >= 3.5.0}
BuildRequires: %{python_module parsel >= 1.5.0}
BuildRequires: %{python_module pyOpenSSL >= 16.2.0}
BuildRequires: %{python_module pyftpdlib}
BuildRequires: %{python_module pytest-xdist}
BuildRequires: %{python_module pytest}
BuildRequires: %{python_module queuelib >= 1.4.2}
BuildRequires: %{python_module service_identity >= 16.0.0}
BuildRequires: %{python_module setuptools}
BuildRequires: %{python_module sybil}
BuildRequires: %{python_module testfixtures >= 6.0.0}
BuildRequires: %{python_module tldextract}
BuildRequires: %{python_module uvloop}
BuildRequires: %{python_module w3lib >= 1.17.0}
BuildRequires: %{python_module zope.interface >= 4.1.3}
BuildRequires: fdupes
BuildRequires: python-rpm-macros
BuildRequires: python3-Sphinx
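# the dataclasses backport is only needed on Python < 3.7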
BuildRequires: (python3-dataclasses if python3-base < 3.7)
Requires: python-Protego >= 0.1.15
Requires: python-PyDispatcher >= 2.0.5
Requires: python-Twisted >= 17.9.0
Requires: python-cryptography >= 2.0
Requires: python-cssselect >= 0.9.1
Requires: python-itemadapter >= 0.1.0
Requires: python-itemloaders >= 1.0.1
Requires: python-lxml >= 3.5.0
Requires: python-parsel >= 1.5.0
Requires: python-pyOpenSSL >= 16.2.0
Requires: python-queuelib >= 1.4.2
Requires: python-service_identity >= 16.0.0
Requires: python-setuptools
Requires: python-tldextract
Requires: python-w3lib >= 1.17.2
Requires: python-zope.interface >= 4.1.3
Requires(post): update-alternatives
Requires(postun): update-alternatives
BuildArch: noarch
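
# generate one subpackage per supported Python 3 flavor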
%python_subpackages

%description
Scrapy is a high-level scraping and web crawling framework for writing spiders
to crawl and parse web pages for all kinds of purposes, from information
retrieval to monitoring or testing websites.

%package -n %{name}-doc
Summary: Documentation for %{name}
Group: Documentation/HTML

%description -n %{name}-doc
Provides documentation for %{name}.

%prep
%setup -n Scrapy-%{version}
%autopatch -p1
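# make the documentation build use python3 explicitly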
sed -i -e 's:= python:= python3:g' docs/Makefile

%build
%python_build
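# build the HTML documentation; drop Sphinx's .buildinfo, which is build-time
# metadata that does not belong in the package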
pushd docs
%make_build html && rm -r build/html/.buildinfo
popd

%install
%python_install
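# prepare /usr/bin/scrapy for update-alternatives and hardlink duplicate files
# in site-packages for each flavor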
%python_clone -a %{buildroot}%{_bindir}/scrapy
%python_expand %fdupes %{buildroot}%{$python_sitelib}

%check
# no color in obs chroot console
skiplist="test_pformat"
# no online connection to toscrape.com
skiplist="$skiplist or CheckCommandTest"
%{pytest \
-k "not (${skiplist})" \
-W ignore::DeprecationWarning \
tests}
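
# keep the update-alternatives entry for /usr/bin/scrapy in sync on (un)install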
%post
%python_install_alternative scrapy

%postun
%python_uninstall_alternative scrapy

%files %{python_files}
%license LICENSE
%doc AUTHORS README.rst
%{python_sitelib}/scrapy
%{python_sitelib}/Scrapy-%{version}*-info
%python_alternative %{_bindir}/scrapy

%files -n %{name}-doc
%doc docs/build/html

%changelog