
- Update to v1.19.6

  * Fixed #1620. The TextPage created by Page.get_textpage() will
    now be freed correctly (removed memory leak).
  * Fixed #1601. Document open errors should now be more concise
    and easier to interpret. In the course of this, two
    PyMuPDF-specific Python exceptions have been added:
    EmptyFileError – raised when trying to create a Document
    (fitz.open()) from an empty file or zero-length memory.
    FileDataError – raised when MuPDF encounters irrecoverable
    document structure issues.
  * Added Page.load_widget() given a PDF field’s xref.
  * Added dictionary pdfcolor, which provides about 500 colors
    defined as PDF color values with the lower-case color name as
    key.
  * Added algebra functionality to the Quad class. These objects
    can now also be added and subtracted among themselves, and be
    multiplied by numbers and matrices.
  * Added new constants defining the default text extraction flags
    for more comfortable handling. Their naming convention is like
    TEXTFLAGS_WORDS for page.get_text("words"). See Text Extraction
    Flags Defaults.
  * Changed Page.annots() and Page.widgets() to detect and prevent
    reloading the page (illegally) inside the iterator loops via
    Document.reload_page(). Doing this brings down the interpreter.
    Documented clean ways to do annotation and widget mass updates
    within properly designed loops.
  * Changed several internal utility functions to become
    standalone (“SWIG inline”) as opposed to being part of the Tools
    class. This, among other things, increases the performance of
    geometry object creation.

OBS-URL: https://build.opensuse.org/package/show/devel:languages:python/python-PyMuPDF?expand=0&rev=41
2022-05-10 00:11:10 +00:00
committed by Git OBS Bridge
parent 80bbe9de57
commit 7b05a7d1ac
4 changed files with 366 additions and 4 deletions


@@ -1,3 +0,0 @@
version https://git-lfs.github.com/spec/v1
oid sha256:cb6ba6c5038ce590a088b9bf320c6e9ce714c1fa304181ece8b551d8589a8b21
size 308451

PyMuPDF-1.19.6.tar.gz Normal file

@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ef3d13e27f1585d776f6a2597f113aabd28d36b648b983a72850b21c5399ab08
size 2270248


@@ -1,4 +1,366 @@
-------------------------------------------------------------------
Sun Mar 6 12:27:52 UTC 2022 - Hsiu-Ming Chang <cges30901@gmail.com>
- Update to v1.19.6
* Fixed #1620. The TextPage created by Page.get_textpage() will
now be freed correctly (removed memory leak).
* Fixed #1601. Document open errors should now be more concise
and easier to interpret. In the course of this, two
PyMuPDF-specific Python exceptions have been added:
EmptyFileError: raised when trying to create a Document
(fitz.open()) from an empty file or zero-length memory.
FileDataError: raised when MuPDF encounters irrecoverable
document structure issues.
* Added Page.load_widget() given a PDF field's xref.
* Added dictionary pdfcolor, which provides about 500 colors
defined as PDF color values with the lower-case color name as
key.
* Added algebra functionality to the Quad class. These objects
can now also be added and subtracted among themselves, and be
multiplied by numbers and matrices.
* Added new constants defining the default text extraction flags
for more comfortable handling. Their naming convention is like
TEXTFLAGS_WORDS for page.get_text("words"). See Text Extraction
Flags Defaults.
* Changed Page.annots() and Page.widgets() to detect and prevent
reloading the page (illegally) inside the iterator loops via
Document.reload_page(). Doing this brings down the interpreter.
Documented clean ways to do annotation and widget mass updates
within properly designed loops.
* Changed several internal utility functions to become
standalone (“SWIG inline”) as opposed to being part of the Tools
class. This, among other things, increases the performance of
geometry object creation.
* Changed Document.update_stream() to always accept stream
updates - whether or not the dictionary object behind the xref
already is a stream. Thus the former parameter "new" is now
ignored and will be removed in v1.20.0.
-------------------------------------------------------------------
Sun Feb 6 14:02:23 UTC 2022 - Hsiu-Ming Chang <cges30901@gmail.com>
- Update to v1.19.5
* Fixed #1518. A limited “fix”: in some cases, rectangles and
quadruples were not correctly encoded to support re-drawing by
Shape.
* Fixed #1521. This had the same root cause as issue
#1510.
* Fixed #1513. Some Optional Content functions did not support
non-ASCII characters.
* Fixed #1510. Support more soft-mask image subtypes.
* Fixed #1507. Immunize against items in the outlines chain,
that are "null" objects.
* Fixed re-opened #1417. (“too many open files”). This was due
to insufficient calls to MuPDF's fz_drop_document(). This also
fixes #1550.
* Fixed several undocumented issues in relation to incorrectly
setting the text span origin point_like.
* Fixed undocumented error computing the character bbox in
method Page.get_texttrace() when text is flipped (as opposed to
just rotated).
* Added items to the dictionary returned by image_properties():
orientation and transform report the natural image orientation
(EXIF data).
* Added method Document.xref_copy(). It will make a given target
PDF object an exact copy of a source object.
-------------------------------------------------------------------
Mon Jan 10 12:52:19 UTC 2022 - Hsiu-Ming Chang <cges30901@gmail.com>
- Update to v1.19.4
* Fixed #1505. Immunize against circular outline items.
* Fixed #1484. Correct CropBox coordinates are now returned in
all situations.
* Fixed #1479.
* Fixed #1474. TextPage objects are now properly deleted again.
* Added Page methods and attributes for PDF /ArtBox, /BleedBox,
/TrimBox.
* Added global attribute TESSDATA_PREFIX for easy checking of OCR
support.
* Changed Document.xref_set_key() such that dictionary keys will
physically be removed if set to value "null".
* Changed Document.extract_font() to optionally return a
dictionary (instead of a tuple).
-------------------------------------------------------------------
Fri Dec 17 13:03:20 UTC 2021 - Hsiu-Ming Chang <cges30901@gmail.com>
- Update to v1.19.3
* Fixed #1351. Reverted code that introduced the memory growth
in v1.18.15.
* Fixed #1417. Developed circumvention for growth of open file
handles using Document.insert_pdf().
* Fixed #1418. Developed circumvention for memory growth using
Document.insert_pdf().
* Fixed #1430. Developed circumvention for mass pixmap
generations of document pages.
* Fixed #1433. Solves a bbox error for some Type 3 fonts in
PyMuPDF text processing.
* Added Pixmap.color_topusage() to determine the share of the
most frequently used color. Solves #1397.
* Added Pixmap.warp() which makes a new pixmap from a given
arbitrary convex quad inside the pixmap.
* Added Annot.irt_xref and Annot.set_irt_xref() to inquire or
set the /IRT (“In Response To”) property of an annotation.
Implements #1450.
* Added Rect.torect() and IRect.torect() which compute a matrix
that transforms to a given other rectangle.
* Changed Pixmap.color_count() to also return the count of each
color.
* Changed Page.get_texttrace() to also return correct span and
character bboxes if span["dir"] != (1, 0).
-------------------------------------------------------------------
Mon Nov 22 10:33:01 UTC 2021 - Hsiu-Ming Chang <cges30901@gmail.com>
- Update to v1.19.2
* Fixed #1388. Fixed intermittent memory corruption when inserting
or updating annotations.
* Fixed #1375. Inconsistencies between line numbers as returned
by the “words” and the “dict” options of `Page.get_text()` have
been corrected.
* Fixed #1364. The check for being a "rawdict" span in
`recover_span_quad()` now works correctly.
* Fixed #1342. Corrected the check for rectangle infiniteness in
`Page.show_pdf_page()`.
* Changed `Page.get_drawings()`, `Page.get_cdrawings()` to return
an indicator on the area orientation covered by a rectangle. This
implements #1355. Also, the recognition rate for rectangles and
quads has been significantly improved.
* Changed all text search and extraction methods to set the new
flags option TEXT_MEDIABOX_CLIP to ON by default. That bit causes
the automatic suppression of all characters that are completely
outside a page's mediabox (in as far as that notion is supported
for a document type). This eliminates the need for using
clip=page.rect or similar for omitting text outside the visible
area.
* Added parameter "dpi" to `Page.get_pixmap()` and
`Annot.get_pixmap()`. When given, parameter "matrix" is ignored,
and a Pixmap with the desired dots per inch is created.
* Added attributes `Pixmap.is_monochrome` and `Pixmap.is_unicolor`
allowing fast checks of pixmap properties. Addresses #1397.
* Added method `Pixmap.color_count()` to determine the unique
colors in the pixmap.
* Added boolean parameter "compress" to PDF document method
`Document.update_stream()`. Addresses / enables solution for
#1408.
- from v1.19.1
* Fixed #1328. “words” text extraction again returns correct (x0,
y0) coordinates.
* Changed `Page.get_textpage_ocr()`: it now supports parameter
dpi to control OCR quality. It is also possible to choose whether
the full page should be OCRed or only the images displayed by the
page.
* Changed `Page.get_drawings()` and `Page.get_cdrawings()` to
automatically convert colors to RGB color tuples. Implements
#1332. Similar change was applied to `Page.get_texttrace()`.
* Changed `Page.get_text()` to support a parameter sort. If set
to True the output is conveniently sorted.
- from v1.19.0
* Supports MuPDF 1.19.*
* Changed terminology and meaning of important geometry concepts:
Rectangles are now characterized as finite, valid or empty, while
the definitions of these terms have also changed. Rectangles
specifically are now thought of as being “open”: not all corners
and sides are considered part of the rectangle. Please do read
the Rect section for details.
* Added new parameter “no_new_id” to `Document.save()` /
`Document.tobytes()` methods. Use it to suppress updating the
second item of the document /ID which in PDF indicates that the
original file has been updated. If the PDF has no /ID at all yet,
then no new one will be created either.
* Added a journalling facility for PDF updates. This allows logging
changes, undoing or redoing them, or saving the journal for later
use. Refer to `Document.journal_enable()` and friends.
* Added new Pixmap methods `Pixmap.pdfocr_save()` and
`Pixmap.pdfocr_tobytes()`, which generate a 1-page PDF containing
the pixmap as PNG image with OCR text layer.
* Added `Page.get_textpage_ocr()` which executes optical character
recognition for the page, then extracts the results and stores
them together with “normal” page content in a TextPage. Use or
reuse this object in subsequent text extractions and text
searches to avoid multiple efforts. The existing text search
and text extraction methods have been extended to support a
separately created textpage (see next item).
* Added a new parameter textpage to text extraction and text search
methods. This allows reuse of a previously created TextPage and
thus achieves significant runtime benefits which is especially
important for the new OCR features. But “normal” text extractions
can definitely also benefit.
* Added `Page.get_texttrace()`, a technical method delivering
low-level text character properties. It was present before as a
private method, but the author felt it now is mature enough to be
officially available. It specifically includes a “sequence
number” which indicates the page appearance build operation that
painted the text.
* Added `Page.get_bboxlog()` which delivers the list of
rectangles of page objects like text, images or drawings. Its
significance lies in its sequence: rectangles intersecting areas
with a lower index are covering or hiding them.
* Changed methods `Page.get_drawings()` and
`Page.get_cdrawings()` to include a “sequence number” indicating
the page appearance build operation that created the drawing.
* Fixed #1311. Field values in comboboxes should now be handled
correctly.
* Fixed #1290. Error was caused by incorrect rectangle emptiness
check, which is fixed due to new geometry logic of this version.
* Fixed #1286. Text alignment for redact annotations is working
again.
* Fixed #1287. Infinite loop issue for non-Windows systems when
applying some redactions has been resolved.
* Fixed #1284. Text layout destruction after applying redactions in
some cases has been resolved.
- from v1.18.19
* Fixed issue #1266. Failure to set `Pixmap.samples` in important
cases, was hotfixed in a new version 1.18.19.
- from v1.18.18
* Fixed issue #1257. Removing the read-only flag from PDF fields
is now possible.
* Fixed issue #1252. Now correctly specifying the zoom value for
PDF link annotations.
* Fixed issue #1244. Now correctly computing the transform matrix
in `Page.get_image_bbox()`.
* Fixed issue #1241. Prevent returning artifact characters in
`Page.get_textbox()`, which happened in certain constellations.
* Fixed issue #1234. Avoid creating infinite rectangles in corner
cases of `Page.get_drawings()`, `Page.get_cdrawings()`.
* Added test data and test scripts to the PyPI source
distribution.
- from v1.18.17
* Fixed issue #1199. Using a non-existing page number in
`Document.get_page_images()` and friends will no longer lead to
segfaults.
* Changed `Page.get_drawings()` to now differentiate between
“stroke”, “fill” and combined paths. Paths containing more than
one rectangle (i.e. “re” items) are now supported. Extracting
“clipped” paths is now available as an option.
* Added `Page.get_cdrawings()`, performance-optimized version of
`Page.get_drawings()`.
* Added `Pixmap.samples_mv`, memoryview of a pixmap's pixel area.
Does not copy and thus always accesses the current state of that
area.
* Added `Pixmap.samples_ptr`, Python “pointer” to a pixmap's pixel
area. Allows much faster creation (factor 800+) of Qt images.
- from v1.18.16
* Fixed issue #1184. Existing PDF widget fonts in a PDF are now
accepted (i.e. not forcibly changed to a Base-14 font).
* Fixed issue #1154. Text search hits should now be correct when
clip is specified.
* Fixed issue #1152.
* Fixed issue #1146.
* Added `Link.flags` and `Link.set_flags()` to the Link class.
Implements enhancement request #1187.
* Added option to simulate `TextWriter.fill_textbox()` output for
predicting the number of lines that a given text would occupy in
the textbox.
* Added text output support as subcommand gettext to the fitz CLI
module. Most importantly, original physical text layout
reproduction is now supported.
- from v1.18.15
* Fixed issue #1088. Removing an annotation's fill color should now
work again both ways, using the fill_color=[] argument in
`Annot.update()` as well as fill=[] in `Annot.set_colors()`.
* Fixed issue #1081. `Document.subset_fonts()`: fixed an error
which created wrong character widths for some fonts.
* Fixed issue #1078. `Page.get_text()` and other methods related to
text extraction: changed the default value of the TextPage flags
parameter. All whitespace and ligatures are now preserved.
* Fixed issue #1085. The old snake_cased alias of
`fitz.getTextlength` is now defined correctly.
* Changed `Document.subset_fonts()` to now correctly prefix font
subsets with an appropriate six letter uppercase tag, complying
with the PDF specification.
* Added new method `Widget.button_states()` which returns the
possible values that a button-type field can have when being set
to “on” or “off”.
* Added support of text with Small Capital letters to the Font and
TextWriter classes. This is reflected by an additional bool
parameter small_caps in various of their methods.
- from v1.18.14
* Finished implementing new, “snake_cased” names for methods and
properties, that were “camelCased” and awkward in many aspects.
At the end of this documentation, there is a section Deprecated
Names with more background and a mapping of old to new names.
* Fixed issue #1053. `Page.insert_image()`: when given, include
image mask in the hash computation.
* Fixed issue #1043. Added `Pixmap.getPNGdata` to the aliases of
`Pixmap.tobytes()`.
* Fixed an internal error when computing the enveloping
rectangle of drawn paths as returned by `Page.get_drawings()`.
* Fixed an internal error occasionally causing loops when
outputting text via `TextWriter.fill_textbox()`.
* Added `Font.char_lengths()`, which returns a tuple of character
widths of a string.
* Added more ways to specify pages in `Document.delete_pages()`.
Now a sequence (list, tuple or range) can be specified, and the
Python del statement can be used. In the latter case, Python
slices are also accepted.
* Changed `Document.del_toc_item()`, which disables a single item
of the TOC: previously, the title text was removed. Instead, now
the complete item will be shown grayed-out by supporting viewers.
- from v1.18.13
* Fixed issue #1014.
* Fixed an internal memory leak when computing image bboxes in
`Page.get_image_bbox()`.
* Added support for low-level access and modification of the PDF
trailer. Applies to `Document.xref_get_keys()`,
`Document.xref_get_key()`, and `Document.xref_set_key()`.
* Added documentation for maintaining private entries in PDF
metadata.
* Added documentation for handling transparent image insertions,
`Page.insert_image()`.
* Added `Page.get_image_rects()`, an improved version of
`Page.get_image_bbox()`.
* Changed `Document.delete_pages()` to support various ways of
specifying pages to delete.
* Changed `Page.insert_image()` to also accept the xref of an
existing image in the file. This allows “copying” images between
pages, and extremely fast multiple insertions.
* Changed `Page.insert_image()` to also accept the integer
parameter alpha. To be used for performance improvements.
* Changed `Pixmap.set_alpha()` to support new parameters for
pre-multiplying colors with their alpha values and setting a
specific color to fully transparent (e.g. white).
* Changed `Document.embfile_add()` to automatically set creation
and modification date-time. Correspondingly,
`Document.embfile_upd()` automatically maintains modification
date-time (/ModDate PDF key), and `Document.embfile_info()`
correspondingly reports these data. In addition, the embedded
file's associated “collection item” is included via its xref.
This supports the development of PDF portfolio applications.
-------------------------------------------------------------------
Sat Apr 10 12:56:40 UTC 2021 - John Vandenberg <jayvdb@gmail.com>
- Update to v1.18.11
* Improved layout of source distribution material.
* Stabilized Linux distribution detection for generating PyMuPDF
from sources.
* Page.get_xobjects delivers the result of Document.get_page_xobjects.
* Page.get_image_info delivers meta information for all images shown
on the page.
* Tools.mupdf_display_warnings allows setting on / off the display
of MuPDF-generated warnings. The default is off.
* Document.ez_save is a convenience alias of `Document.save`
with some different defaults.
* Image extractions of document pages now also contain the image's
**transformation matrix**. This concerns `Page.get_image_bbox`
and the DICT, JSON, RAWDICT, and RAWJSON variants of `Page.get_text`.
- from v1.18.10
* Added old aliases for `DisplayList.get_pixmap` and
`DisplayList.get_textpage`.
* Stabilized removal of JavaScript objects with `Document.scrub`.
* Removed a loop in the reworked `TextWriter.fill_textbox`.
* Changed `Document.xref_get_keys` and `Document.xref_get_key` to also allow
accessing the PDF trailer dictionary. This can be done by using
`-1` as the xref number argument.
* Added a number of functions for reconstructing the quads for text
lines, spans and characters extracted by `Page.get_text` options
"dict" and "rawdict".
* Added `Tools.unset_quad_corrections` to suppress character quad
corrections (occasionally required for erroneous fonts).
--------------------------------------------------------------------
Sat Feb 27 00:04:25 UTC 2021 - John Vandenberg <jayvdb@gmail.com>
- Revised License to be AGPL-3.0-only


@@ -21,7 +21,7 @@
%define skip_python2 1
%define pypi_name PyMuPDF
Name: python-%{pypi_name}
-Version: 1.18.9
+Version: 1.19.6
Release: 0
Summary: Python binding for MuPDF, a PDF and XPS viewer
License: AGPL-3.0-only