Compare commits

10 Commits

Author SHA256 Message Date
8062fb28e7 Accepting request 1296688 from devel:languages:python
- Update to 2.13.3:
  * Changed the values for DOWNLOAD_DELAY (from 0 to 1) and
    CONCURRENT_REQUESTS_PER_DOMAIN (from 8 to 1) in the default project
    template.
  * Fixed several bugs in the engine initialization and exception handling
    logic.
  * Allowed running tests with Twisted 25.5.0+ again and fixed test failures
    with lxml 6.0.0.
  * Gave callback requests precedence over start requests when priority
    values are the same.
  * The asyncio reactor is now enabled by default.
  * Replaced start_requests() (sync) with start() (async) and changed how it
    is iterated (see the sketch after this changelog entry).
  * Added the allow_offsite request meta key.
  * Spider middlewares that don't support asynchronous spider output are
    deprecated.
  * Added a base class for universal spider middlewares.
- Add patch remove-hoverxref.patch:
  * Do not use deprecated sphinx-hoverxref extension.
- Add patch no-dark-mode.patch:
  * Do not use unavailable sphinx-rtd-dark-mode extension.
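
Editor's sketch (not part of the commit message): a minimal spider migrating
from the deprecated synchronous start_requests() to the asynchronous start()
method introduced in Scrapy 2.13. The spider name and URL are illustrative.

import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"

    async def start(self):
        # Scrapy 2.13+: async generator that replaces start_requests()
        yield scrapy.Request("https://example.com", callback=self.parse)

    def parse(self, response):
        yield {"title": response.css("title::text").get()}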

OBS-URL: https://build.opensuse.org/request/show/1296688
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/python-Scrapy?expand=0&rev=24
2025-07-31 15:47:02 +00:00
99a9fd84ae skip another test
OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-Scrapy?expand=0&rev=49
2025-07-31 05:18:58 +00:00
820484d3f8 - Update to 2.13.3:
  * Changed the values for DOWNLOAD_DELAY (from 0 to 1) and
    CONCURRENT_REQUESTS_PER_DOMAIN (from 8 to 1) in the default project
    template.
  * Fixed several bugs in the engine initialization and exception handling
    logic.
  * Allowed running tests with Twisted 25.5.0+ again and fixed test failures
    with lxml 6.0.0.
  * Gave callback requests precedence over start requests when priority
    values are the same.
  * The asyncio reactor is now enabled by default.
  * Replaced start_requests() (sync) with start() (async) and changed how it
    is iterated.
  * Added the allow_offsite request meta key.
  * Spider middlewares that don't support asynchronous spider output are
    deprecated.
  * Added a base class for universal spider middlewares.
- Add patch remove-hoverxref.patch:
  * Do not use deprecated sphinx-hoverxref extension.
- Add patch no-dark-mode.patch:
  * Do not use unavailable sphinx-rtd-dark-mode extension.

OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-Scrapy?expand=0&rev=48
2025-07-31 04:43:53 +00:00
18faaccdb2 Accepting request 1264848 from devel:languages:python
- Normalize metadata directory name.

Requires python-setuptools 78 to build successfully.

OBS-URL: https://build.opensuse.org/request/show/1264848
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/python-Scrapy?expand=0&rev=23
2025-04-16 18:38:35 +00:00
17fd446a8c - Normalize metadata directory name.
OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-Scrapy?expand=0&rev=46
2025-03-27 05:46:50 +00:00
83613729bd Accepting request 1227933 from devel:languages:python
- Update to 2.12.0:
  * Dropped support for Python 3.8, added support for Python 3.13.
  * start_requests can now yield items (see the sketch after this changelog
    entry).
  * Added scrapy.http.JsonResponse.
  * Added the CLOSESPIDER_PAGECOUNT_NO_ITEM setting.
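
Editor's sketch (not part of the commit message): since Scrapy 2.12,
start_requests() may yield items as well as requests. The spider name, URL,
and item fields below are illustrative.

import scrapy

class SeedSpider(scrapy.Spider):
    name = "seed"

    def start_requests(self):
        # Scrapy 2.12+: items can be yielded here directly
        yield {"note": "emitted before any download"}
        yield scrapy.Request("https://example.com", callback=self.parse)

    def parse(self, response):
        yield {"url": response.url}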

OBS-URL: https://build.opensuse.org/request/show/1227933
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/python-Scrapy?expand=0&rev=22
2024-12-03 19:47:04 +00:00
969a5b5698 - Update to 2.12.0:
  * Dropped support for Python 3.8, added support for Python 3.13.
  * start_requests can now yield items.
  * Added scrapy.http.JsonResponse.
  * Added the CLOSESPIDER_PAGECOUNT_NO_ITEM setting.

OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-Scrapy?expand=0&rev=44
2024-12-03 08:25:27 +00:00
6dac99b3c7 Accepting request 1186841 from devel:languages:python
- update to 2.11.2 (bsc#1224474, CVE-2024-1968):
  * Redirects to non-HTTP protocols are no longer followed.
    Please, see the 23j4-mw76-5v7h security advisory for more
    information. (:issue:`457`)
  * The Authorization header is now dropped on redirects to a
    different scheme (http:// or https://) or port, even if the
    domain is the same. Please, see the 4qqq-9vqf-3h3f security
    advisory for more information.
  * When using system proxy settings that are different for
    http:// and https://, redirects to a different URL scheme
    will now also trigger the corresponding change in proxy
    settings for the redirected request. Please, see the
    jm3v-qxmh-hxwv security advisory for more information.
    (:issue:`767`)
  * :attr:`Spider.allowed_domains
    <scrapy.Spider.allowed_domains>` is now enforced for all
    requests, and not only requests from spider callbacks.
  * :func:`~scrapy.utils.iterators.xmliter_lxml` no longer
    resolves XML entities.
  * defusedxml is now used to make
    :class:`scrapy.http.request.rpc.XmlRpcRequest` more secure.
  * Restored support for brotlipy_, which had been dropped in
    Scrapy 2.11.1 in favor of brotli. (:issue:`6261`)  Note
    brotlipy is deprecated, both in Scrapy and upstream. Use
    brotli instead if you can.
  * Make :setting:`METAREFRESH_IGNORE_TAGS` ["noscript"] by
    default. This prevents :class:`~scrapy.downloadermiddlewares.
    redirect.MetaRefreshMiddleware` from following redirects that
    would not be followed by web browsers with JavaScript
    enabled.
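
Editor's sketch (not part of the commit message): the new 2.11.2 default for
the meta-refresh hardening, as it would appear in a project's settings.py.
Set the value explicitly only to override it.

# Default since Scrapy 2.11.2: ignore <meta refresh> inside <noscript>,
# matching what JavaScript-enabled browsers do.
METAREFRESH_IGNORE_TAGS = ["noscript"]
# To restore the pre-2.11.2 behaviour of following such redirects:
# METAREFRESH_IGNORE_TAGS = []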

OBS-URL: https://build.opensuse.org/request/show/1186841
OBS-URL: https://build.opensuse.org/package/show/openSUSE:Factory/python-Scrapy?expand=0&rev=21
2024-07-11 18:33:34 +00:00
1c6fbdfae1 OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-Scrapy?expand=0&rev=42
2024-07-11 11:08:02 +00:00
add99b967c - update to 2.11.2 (bsc#1224474, CVE-2024-1968):
  * Redirects to non-HTTP protocols are no longer followed.
    Please, see the 23j4-mw76-5v7h security advisory for more
    information. (:issue:`457`)
  * The Authorization header is now dropped on redirects to a
    different scheme (http:// or https://) or port, even if the
    domain is the same. Please, see the 4qqq-9vqf-3h3f security
    advisory for more information.
  * When using system proxy settings that are different for
    http:// and https://, redirects to a different URL scheme
    will now also trigger the corresponding change in proxy
    settings for the redirected request. Please, see the
    jm3v-qxmh-hxwv security advisory for more information.
    (:issue:`767`)
  * :attr:`Spider.allowed_domains
    <scrapy.Spider.allowed_domains>` is now enforced for all
    requests, and not only requests from spider callbacks.
  * :func:`~scrapy.utils.iterators.xmliter_lxml` no longer
    resolves XML entities.
  * defusedxml is now used to make
    :class:`scrapy.http.request.rpc.XmlRpcRequest` more secure.
  * Restored support for brotlipy_, which had been dropped in
    Scrapy 2.11.1 in favor of brotli. (:issue:`6261`)  Note
    brotlipy is deprecated, both in Scrapy and upstream. Use
    brotli instead if you can.
  * Make :setting:`METAREFRESH_IGNORE_TAGS` ["noscript"] by
    default. This prevents :class:`~scrapy.downloadermiddlewares.
    redirect.MetaRefreshMiddleware` from following redirects that
    would not be followed by web browsers with JavaScript
    enabled.

OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-Scrapy?expand=0&rev=41
2024-07-11 10:53:38 +00:00
6 changed files with 6 additions and 750 deletions

.gitattributes

@@ -21,5 +21,3 @@
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text
## Specific LFS patterns
CVE-2025-6176-testfile-bomb-br-64GiB.bin filter=lfs diff=lfs merge=lfs -text


@@ -1,3 +0,0 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5d5b6139298c899595f784cdd36ff195dbdd479504c4a48d2a8e0a43d2e7a03d
size 51713


@@ -1,684 +0,0 @@
From e3673d5a42cdd8be95c09982240317af1410fea3 Mon Sep 17 00:00:00 2001
From: Rui Xi <Cycloctane@outlook.com>
Date: Thu, 6 Nov 2025 18:53:35 +0800
Subject: [PATCH 01/18] mitigate brotli decompression bomb
drop brotlicffi
---
.../downloadermiddlewares/httpcompression.py | 10 +--
scrapy/utils/_compression.py | 75 +++++--------------
scrapy/utils/gz.py | 9 +--
...st_downloadermiddleware_httpcompression.py | 16 +---
4 files changed, 29 insertions(+), 81 deletions(-)
Index: scrapy-2.13.3/scrapy/downloadermiddlewares/httpcompression.py
===================================================================
--- scrapy-2.13.3.orig/scrapy/downloadermiddlewares/httpcompression.py
+++ scrapy-2.13.3/scrapy/downloadermiddlewares/httpcompression.py
@@ -29,14 +29,20 @@ logger = getLogger(__name__)
ACCEPTED_ENCODINGS: list[bytes] = [b"gzip", b"deflate"]
try:
- try:
- import brotli # noqa: F401
- except ImportError:
- import brotlicffi # noqa: F401
+ import brotli
except ImportError:
pass
else:
- ACCEPTED_ENCODINGS.append(b"br")
+ try:
+ brotli.Decompressor.can_accept_more_data
+ except AttributeError:
+ warnings.warn(
+ "You have brotli installed. But 'br' encoding support now requires "
+ "brotli version >= 1.2.0. Please upgrade brotli version to make Scrapy "
+ "decode 'br' encoded responses.",
+ )
+ else:
+ ACCEPTED_ENCODINGS.append(b"br")
try:
import zstandard # noqa: F401
@@ -98,13 +104,13 @@ class HttpCompressionMiddleware:
decoded_body, content_encoding = self._handle_encoding(
response.body, content_encoding, max_size
)
- except _DecompressionMaxSizeExceeded:
+ except _DecompressionMaxSizeExceeded as e:
raise IgnoreRequest(
f"Ignored response {response} because its body "
- f"({len(response.body)} B compressed) exceeded "
- f"DOWNLOAD_MAXSIZE ({max_size} B) during "
- f"decompression."
- )
+ f"({len(response.body)} B compressed, "
+ f"{e.decompressed_size} B decompressed so far) exceeded "
+ f"DOWNLOAD_MAXSIZE ({max_size} B) during decompression."
+ ) from e
if len(response.body) < warn_size <= len(decoded_body):
logger.warning(
f"{response} body size after decompression "
@@ -187,7 +193,7 @@ class HttpCompressionMiddleware:
f"from unsupported encoding(s) '{encodings_str}'."
)
if b"br" in encodings:
- msg += " You need to install brotli or brotlicffi to decode 'br'."
+ msg += " You need to install brotli >= 1.2.0 to decode 'br'."
if b"zstd" in encodings:
msg += " You need to install zstandard to decode 'zstd'."
logger.warning(msg)
Index: scrapy-2.13.3/scrapy/utils/_compression.py
===================================================================
--- scrapy-2.13.3.orig/scrapy/utils/_compression.py
+++ scrapy-2.13.3/scrapy/utils/_compression.py
@@ -1,42 +1,9 @@
import contextlib
import zlib
from io import BytesIO
-from warnings import warn
-
-from scrapy.exceptions import ScrapyDeprecationWarning
-
-try:
- try:
- import brotli
- except ImportError:
- import brotlicffi as brotli
-except ImportError:
- pass
-else:
- try:
- brotli.Decompressor.process
- except AttributeError:
- warn(
- (
- "You have brotlipy installed, and Scrapy will use it, but "
- "Scrapy support for brotlipy is deprecated and will stop "
- "working in a future version of Scrapy. brotlipy itself is "
- "deprecated, it has been superseded by brotlicffi. Please, "
- "uninstall brotlipy and install brotli or brotlicffi instead. "
- "brotlipy has the same import name as brotli, so keeping both "
- "installed is strongly discouraged."
- ),
- ScrapyDeprecationWarning,
- )
-
- def _brotli_decompress(decompressor, data):
- return decompressor.decompress(data)
-
- else:
-
- def _brotli_decompress(decompressor, data):
- return decompressor.process(data)
+with contextlib.suppress(ImportError):
+ import brotli
with contextlib.suppress(ImportError):
import zstandard
@@ -46,62 +13,64 @@ _CHUNK_SIZE = 65536 # 64 KiB
class _DecompressionMaxSizeExceeded(ValueError):
- pass
+ def __init__(self, decompressed_size: int, max_size: int) -> None:
+ self.decompressed_size = decompressed_size
+ self.max_size = max_size
+
+ def __str__(self) -> str:
+ return (
+ "The number of bytes decompressed so far "
+ f"({self.decompressed_size} B) exceeded the specified maximum "
+ f"({self.max_size} B)."
+ )
+
+
+def _check_max_size(decompressed_size: int, max_size: int) -> None:
+ if max_size and decompressed_size > max_size:
+ raise _DecompressionMaxSizeExceeded(decompressed_size, max_size)
def _inflate(data: bytes, *, max_size: int = 0) -> bytes:
decompressor = zlib.decompressobj()
- raw_decompressor = zlib.decompressobj(wbits=-15)
- input_stream = BytesIO(data)
+ try:
+ first_chunk = decompressor.decompress(data, max_length=_CHUNK_SIZE)
+ except zlib.error:
+        # to work with raw deflate content that may be sent by microsoft servers.
+ decompressor = zlib.decompressobj(wbits=-15)
+ first_chunk = decompressor.decompress(data, max_length=_CHUNK_SIZE)
+ decompressed_size = len(first_chunk)
+ _check_max_size(decompressed_size, max_size)
output_stream = BytesIO()
- output_chunk = b"."
- decompressed_size = 0
- while output_chunk:
- input_chunk = input_stream.read(_CHUNK_SIZE)
- try:
- output_chunk = decompressor.decompress(input_chunk)
- except zlib.error:
- if decompressor != raw_decompressor:
- # ugly hack to work with raw deflate content that may
- # be sent by microsoft servers. For more information, see:
- # http://carsten.codimi.de/gzip.yaws/
- # http://www.port80software.com/200ok/archive/2005/10/31/868.aspx
- # http://www.gzip.org/zlib/zlib_faq.html#faq38
- decompressor = raw_decompressor
- output_chunk = decompressor.decompress(input_chunk)
- else:
- raise
+ output_stream.write(first_chunk)
+ while decompressor.unconsumed_tail:
+ output_chunk = decompressor.decompress(
+ decompressor.unconsumed_tail, max_length=_CHUNK_SIZE
+ )
decompressed_size += len(output_chunk)
- if max_size and decompressed_size > max_size:
- raise _DecompressionMaxSizeExceeded(
- f"The number of bytes decompressed so far "
- f"({decompressed_size} B) exceed the specified maximum "
- f"({max_size} B)."
- )
+ _check_max_size(decompressed_size, max_size)
output_stream.write(output_chunk)
- output_stream.seek(0)
- return output_stream.read()
+ if tail := decompressor.flush():
+ decompressed_size += len(tail)
+ _check_max_size(decompressed_size, max_size)
+ output_stream.write(tail)
+ return output_stream.getvalue()
def _unbrotli(data: bytes, *, max_size: int = 0) -> bytes:
decompressor = brotli.Decompressor()
- input_stream = BytesIO(data)
+ first_chunk = decompressor.process(data, output_buffer_limit=_CHUNK_SIZE)
+ decompressed_size = len(first_chunk)
+ _check_max_size(decompressed_size, max_size)
output_stream = BytesIO()
- output_chunk = b"."
- decompressed_size = 0
- while output_chunk:
- input_chunk = input_stream.read(_CHUNK_SIZE)
- output_chunk = _brotli_decompress(decompressor, input_chunk)
+ output_stream.write(first_chunk)
+ while not decompressor.is_finished():
+ output_chunk = decompressor.process(b"", output_buffer_limit=_CHUNK_SIZE)
+ if not output_chunk:
+ raise ValueError("Truncated brotli compressed data")
decompressed_size += len(output_chunk)
- if max_size and decompressed_size > max_size:
- raise _DecompressionMaxSizeExceeded(
- f"The number of bytes decompressed so far "
- f"({decompressed_size} B) exceed the specified maximum "
- f"({max_size} B)."
- )
+ _check_max_size(decompressed_size, max_size)
output_stream.write(output_chunk)
- output_stream.seek(0)
- return output_stream.read()
+ return output_stream.getvalue()
def _unzstd(data: bytes, *, max_size: int = 0) -> bytes:
@@ -113,12 +82,6 @@ def _unzstd(data: bytes, *, max_size: in
while output_chunk:
output_chunk = stream_reader.read(_CHUNK_SIZE)
decompressed_size += len(output_chunk)
- if max_size and decompressed_size > max_size:
- raise _DecompressionMaxSizeExceeded(
- f"The number of bytes decompressed so far "
- f"({decompressed_size} B) exceed the specified maximum "
- f"({max_size} B)."
- )
+ _check_max_size(decompressed_size, max_size)
output_stream.write(output_chunk)
- output_stream.seek(0)
- return output_stream.read()
+ return output_stream.getvalue()
Index: scrapy-2.13.3/scrapy/utils/gz.py
===================================================================
--- scrapy-2.13.3.orig/scrapy/utils/gz.py
+++ scrapy-2.13.3/scrapy/utils/gz.py
@@ -5,7 +5,7 @@ from gzip import GzipFile
from io import BytesIO
from typing import TYPE_CHECKING
-from ._compression import _CHUNK_SIZE, _DecompressionMaxSizeExceeded
+from ._compression import _CHUNK_SIZE, _check_max_size
if TYPE_CHECKING:
from scrapy.http import Response
@@ -31,15 +31,9 @@ def gunzip(data: bytes, *, max_size: int
break
raise
decompressed_size += len(chunk)
- if max_size and decompressed_size > max_size:
- raise _DecompressionMaxSizeExceeded(
- f"The number of bytes decompressed so far "
- f"({decompressed_size} B) exceed the specified maximum "
- f"({max_size} B)."
- )
+ _check_max_size(decompressed_size, max_size)
output_stream.write(chunk)
- output_stream.seek(0)
- return output_stream.read()
+ return output_stream.getvalue()
def gzip_magic_number(response: Response) -> bool:
Index: scrapy-2.13.3/tests/test_downloadermiddleware_httpcompression.py
===================================================================
--- scrapy-2.13.3.orig/tests/test_downloadermiddleware_httpcompression.py
+++ scrapy-2.13.3/tests/test_downloadermiddleware_httpcompression.py
@@ -2,7 +2,6 @@ from gzip import GzipFile
from io import BytesIO
from logging import WARNING
from pathlib import Path
-from unittest import SkipTest
import pytest
from testfixtures import LogCapture
@@ -48,9 +47,26 @@ FORMAT = {
"zstd", # 1 096 → 11 511 612
)
},
+ "bomb-br-64GiB": ("bomb-br-64GiB.bin", "br"), # 51K → 64 GiB decompression bomb
}
+def _skip_if_no_br() -> None:
+ try:
+ import brotli # noqa: PLC0415
+
+ brotli.Decompressor.can_accept_more_data
+ except (ImportError, AttributeError):
+ pytest.skip("no brotli support")
+
+
+def _skip_if_no_zstd() -> None:
+ try:
+ import zstandard # noqa: F401,PLC0415
+ except ImportError:
+ pytest.skip("no zstd support (zstandard)")
+
+
class TestHttpCompression:
def setup_method(self):
self.crawler = get_crawler(Spider)
@@ -124,13 +140,8 @@ class TestHttpCompression:
self.assertStatsEqual("httpcompression/response_bytes", 74837)
def test_process_response_br(self):
- try:
- try:
- import brotli # noqa: F401
- except ImportError:
- import brotlicffi # noqa: F401
- except ImportError:
- raise SkipTest("no brotli")
+ _skip_if_no_br()
+
response = self._getresponse("br")
request = response.request
assert response.headers["Content-Encoding"] == b"br"
@@ -143,14 +154,9 @@ class TestHttpCompression:
def test_process_response_br_unsupported(self):
try:
- try:
- import brotli # noqa: F401
-
- raise SkipTest("Requires not having brotli support")
- except ImportError:
- import brotlicffi # noqa: F401
+ import brotli # noqa: F401,PLC0415
- raise SkipTest("Requires not having brotli support")
+ pytest.skip("Requires not having brotli support")
except ImportError:
pass
response = self._getresponse("br")
@@ -169,7 +175,7 @@ class TestHttpCompression:
(
"HttpCompressionMiddleware cannot decode the response for"
" http://scrapytest.org/ from unsupported encoding(s) 'br'."
- " You need to install brotli or brotlicffi to decode 'br'."
+ " You need to install brotli >= 1.2.0 to decode 'br'."
),
),
)
@@ -177,10 +183,8 @@ class TestHttpCompression:
assert newresponse.headers.getlist("Content-Encoding") == [b"br"]
def test_process_response_zstd(self):
- try:
- import zstandard # noqa: F401
- except ImportError:
- raise SkipTest("no zstd support (zstandard)")
+ _skip_if_no_zstd()
+
raw_content = None
for check_key in FORMAT:
if not check_key.startswith("zstd-"):
@@ -199,9 +203,9 @@ class TestHttpCompression:
def test_process_response_zstd_unsupported(self):
try:
- import zstandard # noqa: F401
+ import zstandard # noqa: F401,PLC0415
- raise SkipTest("Requires not having zstandard support")
+ pytest.skip("Requires not having zstandard support")
except ImportError:
pass
response = self._getresponse("zstd-static-content-size")
@@ -503,24 +507,20 @@ class TestHttpCompression:
self.assertStatsEqual("httpcompression/response_bytes", None)
def _test_compression_bomb_setting(self, compression_id):
- settings = {"DOWNLOAD_MAXSIZE": 10_000_000}
+ settings = {"DOWNLOAD_MAXSIZE": 1_000_000}
crawler = get_crawler(Spider, settings_dict=settings)
spider = crawler._create_spider("scrapytest.org")
mw = HttpCompressionMiddleware.from_crawler(crawler)
mw.open_spider(spider)
- response = self._getresponse(f"bomb-{compression_id}")
- with pytest.raises(IgnoreRequest):
- mw.process_response(response.request, response, spider)
+ response = self._getresponse(f"bomb-{compression_id}") # 11_511_612 B
+ with pytest.raises(IgnoreRequest) as exc_info:
+ mw.process_response(response.request, response, self.spider)
+ assert exc_info.value.__cause__.decompressed_size < 1_100_000
def test_compression_bomb_setting_br(self):
- try:
- try:
- import brotli # noqa: F401
- except ImportError:
- import brotlicffi # noqa: F401
- except ImportError:
- raise SkipTest("no brotli")
+ _skip_if_no_br()
+
self._test_compression_bomb_setting("br")
def test_compression_bomb_setting_deflate(self):
@@ -530,15 +530,13 @@ class TestHttpCompression:
self._test_compression_bomb_setting("gzip")
def test_compression_bomb_setting_zstd(self):
- try:
- import zstandard # noqa: F401
- except ImportError:
- raise SkipTest("no zstd support (zstandard)")
+ _skip_if_no_zstd()
+
self._test_compression_bomb_setting("zstd")
def _test_compression_bomb_spider_attr(self, compression_id):
class DownloadMaxSizeSpider(Spider):
- download_maxsize = 10_000_000
+ download_maxsize = 1_000_000
crawler = get_crawler(DownloadMaxSizeSpider)
spider = crawler._create_spider("scrapytest.org")
@@ -546,30 +544,28 @@ class TestHttpCompression:
mw.open_spider(spider)
response = self._getresponse(f"bomb-{compression_id}")
- with pytest.raises(IgnoreRequest):
- mw.process_response(response.request, response, spider)
+ with pytest.raises(IgnoreRequest) as exc_info:
+ mw.process_response(response.request, response, self.spider)
+ assert exc_info.value.__cause__.decompressed_size < 1_100_000
+ @pytest.mark.filterwarnings("ignore::scrapy.exceptions.ScrapyDeprecationWarning")
def test_compression_bomb_spider_attr_br(self):
- try:
- try:
- import brotli # noqa: F401
- except ImportError:
- import brotlicffi # noqa: F401
- except ImportError:
- raise SkipTest("no brotli")
+ _skip_if_no_br()
+
self._test_compression_bomb_spider_attr("br")
+ @pytest.mark.filterwarnings("ignore::scrapy.exceptions.ScrapyDeprecationWarning")
def test_compression_bomb_spider_attr_deflate(self):
self._test_compression_bomb_spider_attr("deflate")
+ @pytest.mark.filterwarnings("ignore::scrapy.exceptions.ScrapyDeprecationWarning")
def test_compression_bomb_spider_attr_gzip(self):
self._test_compression_bomb_spider_attr("gzip")
+ @pytest.mark.filterwarnings("ignore::scrapy.exceptions.ScrapyDeprecationWarning")
def test_compression_bomb_spider_attr_zstd(self):
- try:
- import zstandard # noqa: F401
- except ImportError:
- raise SkipTest("no zstd support (zstandard)")
+ _skip_if_no_zstd()
+
self._test_compression_bomb_spider_attr("zstd")
def _test_compression_bomb_request_meta(self, compression_id):
@@ -579,18 +575,14 @@ class TestHttpCompression:
mw.open_spider(spider)
response = self._getresponse(f"bomb-{compression_id}")
- response.meta["download_maxsize"] = 10_000_000
- with pytest.raises(IgnoreRequest):
- mw.process_response(response.request, response, spider)
+ response.meta["download_maxsize"] = 1_000_000
+ with pytest.raises(IgnoreRequest) as exc_info:
+ mw.process_response(response.request, response, self.spider)
+ assert exc_info.value.__cause__.decompressed_size < 1_100_000
def test_compression_bomb_request_meta_br(self):
- try:
- try:
- import brotli # noqa: F401
- except ImportError:
- import brotlicffi # noqa: F401
- except ImportError:
- raise SkipTest("no brotli")
+ _skip_if_no_br()
+
self._test_compression_bomb_request_meta("br")
def test_compression_bomb_request_meta_deflate(self):
@@ -600,12 +592,38 @@ class TestHttpCompression:
self._test_compression_bomb_request_meta("gzip")
def test_compression_bomb_request_meta_zstd(self):
- try:
- import zstandard # noqa: F401
- except ImportError:
- raise SkipTest("no zstd support (zstandard)")
+ _skip_if_no_zstd()
+
self._test_compression_bomb_request_meta("zstd")
+ def test_compression_bomb_output_buffer_limit(self):
+ """Test that the 64 GiB brotli decompression bomb is properly handled.
+
+ This test ensures that the output_buffer_limit parameter in the brotli
+ decompressor prevents the decompression bomb attack. The bomb file is
+ approximately 51 KB compressed but would decompress to 64 GiB, which
+ should trigger IgnoreRequest when DOWNLOAD_MAXSIZE is exceeded.
+ """
+ _skip_if_no_br()
+
+ settings = {"DOWNLOAD_MAXSIZE": 10_000_000} # 10 MB limit
+ crawler = get_crawler(Spider, settings_dict=settings)
+ spider = crawler._create_spider("scrapytest.org")
+ mw = HttpCompressionMiddleware.from_crawler(crawler)
+ mw.open_spider(spider)
+
+ response = self._getresponse("bomb-br-64GiB")
+
+ # Verify the response is properly configured
+ assert response.headers["Content-Encoding"] == b"br"
+
+ # The middleware should raise IgnoreRequest due to exceeding DOWNLOAD_MAXSIZE
+ with pytest.raises(IgnoreRequest) as exc_info:
+ mw.process_response(response.request, response, self.spider)
+
+ # Verify the exception message mentions the download size limits
+ assert "exceeded DOWNLOAD_MAXSIZE (10000000 B)" in str(exc_info.value)
+
def _test_download_warnsize_setting(self, compression_id):
settings = {"DOWNLOAD_WARNSIZE": 10_000_000}
crawler = get_crawler(Spider, settings_dict=settings)
@@ -619,7 +637,7 @@ class TestHttpCompression:
propagate=False,
level=WARNING,
) as log:
- mw.process_response(response.request, response, spider)
+ mw.process_response(response.request, response, self.spider)
log.check(
(
"scrapy.downloadermiddlewares.httpcompression",
@@ -633,13 +651,8 @@ class TestHttpCompression:
)
def test_download_warnsize_setting_br(self):
- try:
- try:
- import brotli # noqa: F401
- except ImportError:
- import brotlicffi # noqa: F401
- except ImportError:
- raise SkipTest("no brotli")
+ _skip_if_no_br()
+
self._test_download_warnsize_setting("br")
def test_download_warnsize_setting_deflate(self):
@@ -649,10 +662,8 @@ class TestHttpCompression:
self._test_download_warnsize_setting("gzip")
def test_download_warnsize_setting_zstd(self):
- try:
- import zstandard # noqa: F401
- except ImportError:
- raise SkipTest("no zstd support (zstandard)")
+ _skip_if_no_zstd()
+
self._test_download_warnsize_setting("zstd")
def _test_download_warnsize_spider_attr(self, compression_id):
@@ -670,7 +681,7 @@ class TestHttpCompression:
propagate=False,
level=WARNING,
) as log:
- mw.process_response(response.request, response, spider)
+ mw.process_response(response.request, response, self.spider)
log.check(
(
"scrapy.downloadermiddlewares.httpcompression",
@@ -683,27 +694,24 @@ class TestHttpCompression:
),
)
+ @pytest.mark.filterwarnings("ignore::scrapy.exceptions.ScrapyDeprecationWarning")
def test_download_warnsize_spider_attr_br(self):
- try:
- try:
- import brotli # noqa: F401
- except ImportError:
- import brotlicffi # noqa: F401
- except ImportError:
- raise SkipTest("no brotli")
+ _skip_if_no_br()
+
self._test_download_warnsize_spider_attr("br")
+ @pytest.mark.filterwarnings("ignore::scrapy.exceptions.ScrapyDeprecationWarning")
def test_download_warnsize_spider_attr_deflate(self):
self._test_download_warnsize_spider_attr("deflate")
+ @pytest.mark.filterwarnings("ignore::scrapy.exceptions.ScrapyDeprecationWarning")
def test_download_warnsize_spider_attr_gzip(self):
self._test_download_warnsize_spider_attr("gzip")
+ @pytest.mark.filterwarnings("ignore::scrapy.exceptions.ScrapyDeprecationWarning")
def test_download_warnsize_spider_attr_zstd(self):
- try:
- import zstandard # noqa: F401
- except ImportError:
- raise SkipTest("no zstd support (zstandard)")
+ _skip_if_no_zstd()
+
self._test_download_warnsize_spider_attr("zstd")
def _test_download_warnsize_request_meta(self, compression_id):
@@ -719,7 +727,7 @@ class TestHttpCompression:
propagate=False,
level=WARNING,
) as log:
- mw.process_response(response.request, response, spider)
+ mw.process_response(response.request, response, self.spider)
log.check(
(
"scrapy.downloadermiddlewares.httpcompression",
@@ -733,13 +741,8 @@ class TestHttpCompression:
)
def test_download_warnsize_request_meta_br(self):
- try:
- try:
- import brotli # noqa: F401
- except ImportError:
- import brotlicffi # noqa: F401
- except ImportError:
- raise SkipTest("no brotli")
+ _skip_if_no_br()
+
self._test_download_warnsize_request_meta("br")
def test_download_warnsize_request_meta_deflate(self):
@@ -749,8 +752,6 @@ class TestHttpCompression:
self._test_download_warnsize_request_meta("gzip")
def test_download_warnsize_request_meta_zstd(self):
- try:
- import zstandard # noqa: F401
- except ImportError:
- raise SkipTest("no zstd support (zstandard)")
+ _skip_if_no_zstd()
+
self._test_download_warnsize_request_meta("zstd")
Index: scrapy-2.13.3/tox.ini
===================================================================
--- scrapy-2.13.3.orig/tox.ini
+++ scrapy-2.13.3/tox.ini
@@ -141,8 +141,7 @@ deps =
Twisted[http2]
boto3
bpython # optional for shell wrapper tests
- brotli; implementation_name != "pypy" # optional for HTTP compress downloader middleware tests
- brotlicffi; implementation_name == "pypy" # optional for HTTP compress downloader middleware tests
+ brotli >= 1.2.0 # optional for HTTP compress downloader middleware tests
google-cloud-storage
ipython
robotexclusionrulesparser
@@ -156,9 +155,7 @@ deps =
Pillow==8.0.0
boto3==1.20.0
bpython==0.7.1
- brotli==0.5.2; implementation_name != "pypy"
- brotlicffi==0.8.0; implementation_name == "pypy"
- brotlipy
+ brotli==1.2.0
google-cloud-storage==1.29.0
ipython==2.0.0
robotexclusionrulesparser==1.6.2
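
Editor's note: the patch above bounds decompression by reading output in
fixed-size chunks and aborting once the maximum size is exceeded. The sketch
below distills that pattern from the patched _unbrotli(); it assumes
brotli >= 1.2.0, which provides output_buffer_limit and is_finished(), as the
patch itself requires. The function name and error wording are illustrative.

import brotli  # requires brotli >= 1.2.0 for output_buffer_limit

_CHUNK_SIZE = 65536  # 64 KiB, same chunk size as the patch

def bounded_unbrotli(data: bytes, max_size: int = 0) -> bytes:
    decompressor = brotli.Decompressor()
    out = bytearray(decompressor.process(data, output_buffer_limit=_CHUNK_SIZE))
    while not decompressor.is_finished():
        if max_size and len(out) > max_size:
            # a ~51 KiB bomb stops here instead of expanding to 64 GiB
            raise ValueError(f"decompressed {len(out)} B, exceeds {max_size} B")
        chunk = decompressor.process(b"", output_buffer_limit=_CHUNK_SIZE)
        if not chunk:
            raise ValueError("Truncated brotli compressed data")
        out += chunk
    if max_size and len(out) > max_size:
        raise ValueError(f"decompressed {len(out)} B, exceeds {max_size} B")
    return bytes(out)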


@@ -1,3 +0,0 @@
<multibuild>
<package>test</package>
</multibuild>


@@ -1,14 +1,3 @@
-------------------------------------------------------------------
Wed Nov 12 12:28:41 UTC 2025 - Daniel Garcia <daniel.garcia@suse.com>
- Use libalternatives
- Use multibuild to run tests in a subpackage
- add upstream patch CVE-2025-6176.patch to mitigate brotli and
  deflate decompression bombs DoS.
  This patch adds a new bin test file that was added as a new source
  as CVE-2025-6176-testfile-bomb-br-64GiB.bin
  (gh#scrapy/scrapy#7134, bsc#1252945, CVE-2025-6176)
-------------------------------------------------------------------
Thu Jul 31 05:18:40 UTC 2025 - Steve Kowalik <steven.kowalik@suse.com>


@@ -1,7 +1,7 @@
#
# spec file for package python-Scrapy
#
# Copyright (c) 2025 SUSE LLC and contributors
# Copyright (c) 2025 SUSE LLC
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
@@ -16,60 +16,37 @@
#
%global flavor @BUILD_FLAVOR@%{nil}
%if "%{flavor}" == "test"
%define psuffix -test
%bcond_without test
%endif
%if "%{flavor}" == ""
%define psuffix %{nil}
%bcond_with test
%endif
%if 0%{?suse_version} > 1500
%bcond_without libalternatives
%else
%bcond_with libalternatives
%endif
%{?sle15_python_module_pythons}
Name: python-Scrapy%{?psuffix}
Name: python-Scrapy
Version: 2.13.3
Release: 0
Summary: A high-level Python Screen Scraping framework
License: BSD-3-Clause
URL: https://scrapy.org
Source: https://files.pythonhosted.org/packages/source/s/scrapy/scrapy-%{version}.tar.gz
# New test file added in the gh#scrapy/scrapy#7134, needed for Patch2
# related to CVE-2025-6176
Source1: CVE-2025-6176-testfile-bomb-br-64GiB.bin
# PATCH-FIX-UPSTREAM gh#scrapy/scrapy#6922
Patch0: remove-hoverxref.patch
# PATCH-FIX-OPENSUSE No sphinx-rtd-dark-mode
Patch1: no-dark-mode.patch
# PATCH-FIX-UPSTREAM CVE-2025-6176.patch gh#scrapy/scrapy#7134
Patch2: CVE-2025-6176.patch
BuildRequires: %{python_module base >= 3.9}
BuildRequires: %{python_module hatchling}
BuildRequires: %{python_module pip}
BuildRequires: %{python_module wheel}
%if %{with test}
# Test requirements:
BuildRequires: %{python_module Scrapy = %{version}}
BuildRequires: %{python_module Brotli}
BuildRequires: %{python_module Pillow}
BuildRequires: %{python_module Protego}
BuildRequires: %{python_module PyDispatcher >= 2.0.5}
BuildRequires: %{python_module Twisted >= 18.9.0}
BuildRequires: %{python_module attrs}
BuildRequires: %{python_module base >= 3.9}
BuildRequires: %{python_module botocore >= 1.4.87}
BuildRequires: %{python_module cryptography >= 36.0.0}
BuildRequires: %{python_module cssselect >= 0.9.1}
BuildRequires: %{python_module dbm}
BuildRequires: %{python_module defusedxml >= 0.7.1}
BuildRequires: %{python_module hatchling}
BuildRequires: %{python_module itemadapter >= 0.1.0}
BuildRequires: %{python_module itemloaders >= 1.0.1}
BuildRequires: %{python_module lxml >= 4.4.1}
BuildRequires: %{python_module parsel >= 1.5.0}
BuildRequires: %{python_module pexpect >= 4.8.1}
BuildRequires: %{python_module pip}
BuildRequires: %{python_module pyOpenSSL >= 21.0.0}
BuildRequires: %{python_module pyftpdlib >= 1.5.8}
BuildRequires: %{python_module pytest-xdist}
@@ -82,7 +59,6 @@ BuildRequires: %{python_module tldextract}
BuildRequires: %{python_module uvloop}
BuildRequires: %{python_module w3lib >= 1.17.0}
BuildRequires: %{python_module zope.interface >= 5.1.0}
%endif
BuildRequires: fdupes
BuildRequires: python-rpm-macros
BuildRequires: python3-Sphinx
@@ -105,14 +81,9 @@ Requires: python-service_identity >= 18.1.0
Requires: python-tldextract
Requires: python-w3lib >= 1.17.2
Requires: python-zope.interface >= 5.1.0
BuildArch: noarch
%if %{with libalternatives}
BuildRequires: alts
Requires: alts
%else
Requires(post): update-alternatives
Requires(postun): update-alternatives
%endif
BuildArch: noarch
%python_subpackages
%description
@@ -131,7 +102,6 @@ Provides documentation for %{name}.
sed -i -e 's:= python:= python3:g' docs/Makefile
%if %{without test}
%build
%pyproject_wheel
pushd docs
@@ -142,12 +112,8 @@ popd
%pyproject_install
%python_clone -a %{buildroot}%{_bindir}/scrapy
%python_expand %fdupes %{buildroot}%{$python_sitelib}
%endif
%if %{with test}
%check
cp %{SOURCE1} tests/sample_data/compressed/bomb-br-64GiB.bin
# no color in obs chroot console
skiplist="test_pformat"
# no online connection to toscrapy.com
@@ -160,12 +126,6 @@ skiplist="$skiplist or test_queue_push_pop_priorities"
-k "not (${skiplist})" \
-W ignore::DeprecationWarning \
tests}
%endif
%if %{without test}
%pre
# If libalternatives is used: Removing old update-alternatives entries.
%python_libalternatives_reset_alternative scrapy
%post
%python_install_alternative scrapy
@@ -182,6 +142,5 @@ skiplist="$skiplist or test_queue_push_pop_priorities"
%files -n %{name}-doc
%doc docs/build/html
%endif
%changelog