#
# spec file for package python-Scrapy
#
# Copyright (c) 2024 SUSE LLC
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.
# Please submit bugfixes or comments via https://bugs.opensuse.org/
#
Name: python-Scrapy
Version: 2.11.0
Release: 0
Summary: A high-level Python screen scraping and web crawling framework
License: BSD-3-Clause
Group: Development/Languages/Python
URL: https://scrapy.org
Source: https://files.pythonhosted.org/packages/source/S/Scrapy/Scrapy-%{version}.tar.gz
# PATCH-FIX-UPSTREAM twisted-23.8.0-compat.patch gh#scrapy/scrapy#6064
Patch1: twisted-23.8.0-compat.patch
BuildRequires: %{python_module Pillow}
BuildRequires: %{python_module Protego >= 0.1.15}
BuildRequires: %{python_module PyDispatcher >= 2.0.5}
BuildRequires: %{python_module Twisted >= 18.9.0}
BuildRequires: %{python_module attrs}
BuildRequires: %{python_module botocore >= 1.4.87}
BuildRequires: %{python_module cryptography >= 36.0.0}
BuildRequires: %{python_module cssselect >= 0.9.1}
BuildRequires: %{python_module dbm}
BuildRequires: %{python_module itemadapter >= 0.1.0}
BuildRequires: %{python_module itemloaders >= 1.0.1}
BuildRequires: %{python_module lxml >= 4.4.1}
BuildRequires: %{python_module parsel >= 1.5.0}
BuildRequires: %{python_module pexpect >= 4.8.1}
BuildRequires: %{python_module pyOpenSSL >= 21.0.0}
BuildRequires: %{python_module pyftpdlib}
BuildRequires: %{python_module pytest-xdist}
BuildRequires: %{python_module pytest}
BuildRequires: %{python_module queuelib >= 1.4.2}
BuildRequires: %{python_module service_identity >= 18.1.0}
BuildRequires: %{python_module setuptools}
BuildRequires: %{python_module sybil}
BuildRequires: %{python_module testfixtures}
BuildRequires: %{python_module tldextract}
BuildRequires: %{python_module uvloop}
BuildRequires: %{python_module w3lib >= 1.17.0}
BuildRequires: %{python_module zope.interface >= 5.1.0}
BuildRequires: fdupes
BuildRequires: python-rpm-macros
BuildRequires: python3-Sphinx
BuildRequires: (python3-dataclasses if python3-base < 3.7)
Requires: python-Protego >= 0.1.15
Requires: python-PyDispatcher >= 2.0.5
Requires: python-Twisted >= 18.9.0
Requires: python-cryptography >= 36.0.0
Requires: python-cssselect >= 0.9.1
Requires: python-itemadapter >= 0.1.0
Requires: python-itemloaders >= 1.0.1
Requires: python-lxml >= 4.4.1
Requires: python-parsel >= 1.5.0
Requires: python-pyOpenSSL >= 21.0.0
Requires: python-queuelib >= 1.4.2
Requires: python-service_identity >= 18.1.0
Requires: python-setuptools
Requires: python-tldextract
Requires: python-w3lib >= 1.17.2
Requires: python-zope.interface >= 5.1.0
Requires(post): update-alternatives
Requires(postun): update-alternatives
BuildArch: noarch
%python_subpackages

%description
Scrapy is a high-level scraping and web crawling framework for writing spiders
to crawl and parse web pages for all kinds of purposes, from information
retrieval to monitoring or testing web sites.
%package -n %{name}-doc
Summary: Documentation for %{name}
Group: Documentation/HTML

%description -n %{name}-doc
Provides documentation for %{name}.

%prep
%autosetup -p1 -n Scrapy-%{version}
sed -i -e 's:= python:= python3:g' docs/Makefile

%build
%python_build
pushd docs
%make_build html && rm -r build/html/.buildinfo
popd

%install
%python_install
%python_clone -a %{buildroot}%{_bindir}/scrapy
%python_expand %fdupes %{buildroot}%{$python_sitelib}

%check
# no color in obs chroot console
skiplist="test_pformat"
# no online connection to toscrape.com
skiplist="$skiplist or CheckCommandTest"
# Flaky test gh#scrapy/scrapy#5703
skiplist="$skiplist or test_start_requests_laziness"
%{pytest -x \
-k "not (${skiplist})" \
-W ignore::DeprecationWarning \
tests}

%post
%python_install_alternative scrapy

%postun
%python_uninstall_alternative scrapy

%files %{python_files}
%license LICENSE
%doc AUTHORS README.rst
%{python_sitelib}/scrapy
%{python_sitelib}/Scrapy-%{version}*-info
%python_alternative %{_bindir}/scrapy

%files -n %{name}-doc
%doc docs/build/html

%changelog