------------------------------------------------------------------- Mon Apr 7 05:17:52 UTC 2025 - Steve Kowalik - Update to 20250327: * Added + Support for Python 3.13 + Support for zipped jpeg's + Fuzzing harnesses for integration into Google's OSS-Fuzz + Support for setuptools-git-versioning version 2.0.0 * Changed + Reduce memory overhead on runlength encoding by using lists + Using pyproject.toml instead of setup.py + Updated Python 3.7 syntax to 3.8 + Updated all Python version specifications to a minimum of 3.8 + Using absolute instead of relative imports + Using standard library functions for ascii85 and asciihex * Fixed + TypeError when CID character widths are not parseable as floats + TypeError raised by extract_text method with compressed PDF file + PSBaseParser can't handle tokens split across end of buffer + TypeError when CropBox is an indirect object reference + Remove redundant line to be able to recognize rectangles + Support indirect objects for filters + Make sure bytes is bytes where it counts + TypeError when corrupt PDF object reference cannot be parsed as int + TypeError when corrupt PDF literal cannot be converted to str + ValueError when corrupt PDF specifies a negative xref location + ValueError when corrupt PDF specifies an invalid mediabox + RecursionError when corrupt PDF specifies a recursive /Pages object + TypeError when corrupt PDF specifies text-positioning operators with invalid values + inline image parsing fails when stream data contains "EI\n" + TypeError when parsing object reference as mediabox + Resolving mediabox and pdffont + Keywords that aren't terminated by the pattern END_KEYWORD before end-of-stream are parsed + ValueError wrong error message when specifying codec for text output + Resolve stream filter parameters + Reading cmap's with whitespace in the name + Optimize apply_png_predictor by using lists * Deprecated + The third argument (generation number) to PDFObjRef * Removed + Support for Python 3.8 + Deprecated tools, functions and classes ------------------------------------------------------------------- Sun Jan 7 20:34:47 UTC 2024 - Dirk Müller - update to 20231228: * Removed Support for Python 3.6 and 3.7 * Output converter for the hOCR format * Font name aliases for Arial, Courier New and Times New Roman * Documentation on why special characters can sometimes not be extracted * Storing Bezier path and dashing style of line in LTCurve * Broken CI/CD pipeline by setting upper version limit for black, mypy, pip and setuptools * `flake8` failures * `ValueError` when bmp images with 1 bit channel are decoded * `ValueError` when trying to decrypt empty metadata values * Sphinx errors during building of documentation * `TypeError` when getting default width of font * Installing typing-extensions on Python 3.6 and 3.7 * `TypeError` in cmapdb.py when parsing null characters * Color "convenience operators" now (per spec) also set color space * `ValueError` when extracting images, due to breaking changes in Pillow * Small typo's and issues in the documentation * Ignore non-Unicode cmaps in TrueType fonts * Using non-hardcoded version string and setuptools-git- versioning to enable installation from source and building on Python 3.12 * Usage of `if __name__ == "__main__"` where it was only intended for testing purposes - drop import-from-non-pythonpath-files.patch (upstream) ------------------------------------------------------------------- Mon Dec 11 17:24:21 UTC 2023 - Jonathan Papineau - Update to 20221105 - Option to disable boxes flow layout analysis when using pdf2txt - Add support for PDF 2.0 (ISO 32000-2) AES-256 encryption - Support for Paeth PNG filter compression (predictor value = 4) - Type annotations - Export type annotations from pypi package per PEP561 - Support for identity cmap's - Add support for PDF page labels - Installation of Pillow as an optional extra dependency - Exporting images without any specific encoding - Output converter for the hOCR format - Font name aliases for Arial, Courier New and Times New Roman - Documentation on why special characters can sometimes not be extracted - Remove patch python-pdfminer.six-remove-nose.patch - Update dependencies ------------------------------------------------------------------- Fri Aug 25 14:07:07 UTC 2023 - ecsos - Add %{?sle15_python_module_pythons} ------------------------------------------------------------------- Tue Apr 11 14:39:08 UTC 2023 - pgajdos@suse.com - python-six is not required - python-pycryptodome is not required ------------------------------------------------------------------- Tue Nov 9 07:32:27 UTC 2021 - Steve Kowalik - Use pytest to run the testsuite. - Add patch import-from-non-pythonpath-files.patch: * Allow the test suite to find modules not shipped as modules. ------------------------------------------------------------------- Tue Sep 8 16:58:08 UTC 2020 - pgajdos@suse.com - version update to 20200726 - Rename PDFTextExtractionNotAllowedError to PDFTextExtractionNotAllowed to revert breaking change - Always try to get CMap, not only for identity encodings - Support for painting multiple rectangles at once - Validate image object in do_EI is a PDFStream - Hiding fallback xref by default from dumppdf.py output - Raise a warning instead of an error when extracting text from a non-extractable PDF - Switched from pycryptodome to cryptography package for AES decryption - Python3 shebang line to script in tools - Fix ordering of textlines within a textbox when `boxes_flow=None` - Allow boxes_flow LAParam to be passed as None, validate the input, and update documentation - Also accept file-like objects in high level functions `extract_text` and `extract_pages` - Text no longer comes in reverse order when advanced layout analysis is disabled - Updated misleading documentation for `word_margin` and `char_margin` - Ignore ValueError when converting font encoding differences - Grouping of text lines outside of parent container bounding box - Group text lines if they are centered - Python3 shebang line to script in tools - Fix ordering of textlines within a textbox when `boxes_flow=None` - do not require nose for testing - added patches fix https://github.com/pdfminer/pdfminer.six/pull/489 + python-pdfminer.six-remove-nose.patch ------------------------------------------------------------------- Wed May 20 07:26:10 UTC 2020 - Petr Gajdos - %python3_only -> %python_alternative ------------------------------------------------------------------- Thu Feb 13 19:29:31 UTC 2020 - Martin Hauke - Update to version v20200124 - Drop support for python2 (not longer supported by upstream) - Specfile cleanup - Run testsuite ------------------------------------------------------------------- Sun Mar 17 11:48:09 UTC 2019 - John Vandenberg - Update to v20181108 ------------------------------------------------------------------- Thu Oct 26 17:23:08 UTC 2017 - toddrme2178@gmail.com - Initial version