#
# spec file for package python-pytesseract
#
# Copyright (c) 2018 SUSE LINUX GmbH, Nuernberg, Germany.
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.

# Please submit bugfixes or comments via http://bugs.opensuse.org/
#


%{?!python_module:%define python_module() python-%{**} python3-%{**}}
Name:           python-pytesseract
Version:        0.2.0
Release:        0
Summary:        Python wrapper for Google's Tesseract-OCR
License:        GPL-3.0
Group:          Development/Languages/Python
Url:            https://github.com/madmaze/python-tesseract
Source:         https://files.pythonhosted.org/packages/source/p/pytesseract/pytesseract-%{version}.tar.gz
Source10:       https://raw.githubusercontent.com/madmaze/pytesseract/v%{version}/LICENSE
BuildRequires:  %{python_module devel}
BuildRequires:  %{python_module setuptools}
BuildRequires:  fdupes
BuildRequires:  python-rpm-macros
# SECTION test requirements
BuildRequires:  %{python_module Pillow}
BuildRequires:  pkgconfig(tesseract)
BuildRequires:  tesseract-traineddata-eng
BuildRequires:  tesseract-traineddata-deu
# /SECTION
Requires:       python-Pillow
Requires:       pkgconfig(tesseract)
Requires:       tesseract-traineddata-eng
Requires:       tesseract-traineddata-deu
BuildArch:      noarch

%python_subpackages

%description
Python-tesseract is an optical character recognition (OCR) tool for Python.
That is, it will recognize and "read" the text embedded in images.

Python-tesseract is a wrapper for Google's Tesseract-OCR Engine.
It is also useful as a stand-alone invocation script to tesseract, as it can
read all image types supported by the Python Imaging Library, including jpeg,
png, gif, bmp, tiff, and others, whereas tesseract-ocr by default only
supports tiff and bmp. Additionally, if used as a script, Python-tesseract
will print the recognized text instead of writing it to a file.
Support for confidence estimates and bounding box data is planned for
future releases.

%prep
%setup -q -n pytesseract-%{version}
sed -i -e '/^#!\//, 1d' src/pytesseract.py
cp %{SOURCE10} .

%build
%python_build

%install
%python_install
%python_expand %fdupes %{buildroot}%{$python_sitelib}

%files %{python_files}
%doc README.rst
%license LICENSE
%python3_only %{_bindir}/pytesseract
%{python_sitelib}/*

%changelog