#
# spec file for package python-pdfminer.six
#
# Copyright (c) 2025 SUSE LLC
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.
# Please submit bugfixes or comments via https://bugs.opensuse.org/
#


%{?sle15_python_module_pythons}
Name: python-pdfminer.six
Version: 20250327
Release: 0
Summary: PDF parser and analyzer
License: MIT
URL: https://github.com/pdfminer/pdfminer.six
Source: https://github.com/pdfminer/pdfminer.six/archive/%{version}.tar.gz#/pdfminer.six-%{version}.tar.gz
BuildRequires: %{python_module base >= 3.9}
BuildRequires: %{python_module charset-normalizer >= 2.0.0}
BuildRequires: %{python_module cryptography >= 36.0.0}
BuildRequires: %{python_module pip}
BuildRequires: %{python_module pytest}
BuildRequires: %{python_module setuptools_scm >= 8}
BuildRequires: %{python_module wheel}
BuildRequires: fdupes
BuildRequires: python-rpm-macros
Requires: python-charset-normalizer >= 2.0.0
Requires: python-cryptography >= 36.0.0
Requires(post): update-alternatives
Requires(postun): update-alternatives
Provides: python-pdfminer3k = %{version}
Obsoletes: python-pdfminer3k < %{version}
BuildArch: noarch
%python_subpackages

%description
Pdfminer.six is a community-maintained fork of the original PDFMiner. It
is a tool for extracting information from PDF documents. It focuses on
getting and analyzing text data. Pdfminer.six extracts the text from a
page directly from the source code of the PDF. It can also be used to get
the exact location, font or color of the text.

%prep
%autosetup -p1 -n pdfminer.six-%{version}
sed -i -e '/^#!\//, 1d' pdfminer/psparser.py
sed -i '1i #!%{_bindir}/python3' tools/dumppdf.py tools/pdf2txt.py
sed -i "s/__VERSION__/%{version}/g" pdfminer/__init__.py

%build
export SETUPTOOLS_SCM_PRETEND_VERSION="%{version}"
%pyproject_wheel

%install
%pyproject_install
%python_expand %fdupes %{buildroot}%{$python_sitelib}
mv %{buildroot}%{_bindir}/dumppdf.py %{buildroot}%{_bindir}/dumppdf
mv %{buildroot}%{_bindir}/pdf2txt.py %{buildroot}%{_bindir}/pdf2txt
%python_clone -a %{buildroot}%{_bindir}/pdf2txt
%python_clone -a %{buildroot}%{_bindir}/dumppdf

%check
%pytest

%post
%python_install_alternative pdf2txt
%python_install_alternative dumppdf

%postun
%python_uninstall_alternative pdf2txt
%python_uninstall_alternative dumppdf

%files %{python_files}
%license LICENSE
%doc README.md
%python_alternative %{_bindir}/dumppdf
%python_alternative %{_bindir}/pdf2txt
%{python_sitelib}/pdfminer
%{python_sitelib}/pdfminer[_.]six-%{version}.dist-info

%changelog