#
# spec file for package perl-Text-Unidecode
#
# Copyright (c) 2014 SUSE LINUX Products GmbH, Nuernberg, Germany.
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.

# Please submit bugfixes or comments via http://bugs.opensuse.org/
#


%define real_name Text-Unidecode
Name: perl-Text-Unidecode
Version: 1.01
Release: 0
Summary: US-ASCII transliterations of Unicode text
License: Artistic-1.0
Group: Development/Libraries/Perl
Url: http://search.cpan.org/perldoc?Text::Unidecode
Source: http://www.cpan.org/authors/id/S/SB/SBURKE/%{real_name}-%{version}.tar.gz
BuildRequires: perl
BuildRequires: perl-macros
BuildRoot: %{_tmppath}/%{name}-%{version}-build
%{perl_requires}

%description
It often happens that you have non-Roman text data in Unicode, but you can't
display it -- usually because you're trying to show it to a user via an
application that doesn't support Unicode, or because the fonts you need aren't
accessible. You could represent the Unicode characters as "???????" or
"\15BA\15A0\1610...", but that's nearly useless to the user who actually wants
to read what the text says.

What Text::Unidecode provides is a function, unidecode(...), that takes Unicode
data and tries to represent it in US-ASCII characters (i.e., the universally
displayable characters between 0x00 and 0x7F). The representation is almost
always an attempt at transliteration -- i.e., conveying, in Roman letters, the
pronunciation expressed by the text in some other writing system. (See the
example in the SYNOPSIS of the module documentation.)

Unidecode's ability to transliterate is limited by two factors:

* The amount and quality of data in the original

So if you have Hebrew data that has no vowel points in it, then Unidecode
cannot guess what vowels should appear in a pronunciation. S f y hv n vwls n
th npt, y wn't gt ny vwls n th tpt. (This is a specific application of the
general principle of "Garbage In, Garbage Out".)

* Basic limitations in the Unidecode design

Writing a real and clever transliteration algorithm for any single
language usually requires a lot of time, and at least a passable knowledge of
the language involved. But Unicode text can convey more languages than I could
possibly learn (much less create a transliterator for) in the entire rest of my
lifetime. So I put a cap on how intelligent Unidecode could be, by insisting
that it support only context-insensitive transliteration. That means missing
the finer details of any given writing system, while still hopefully being
useful.

Unidecode, in other words, is quick and dirty. Sometimes the output is not so
dirty at all: Russian and Greek seem to work passably; and while Thaana
(Divehi, AKA Maldivian) is a definitely non-Western writing system, setting up
a mapping from it to Roman letters seems to work pretty well. But sometimes the
output is very dirty: Unidecode does quite badly on Japanese and Thai.

If you want a smarter transliteration for a particular language than Unidecode
provides, then you should look for (or write) a transliteration algorithm
specific to that language, and apply it instead of (or at least before)
applying Unidecode.

In other words, Unidecode's approach is broad (knowing about dozens of writing
systems), but shallow (not being meticulous about any of them).

Author:
-------
Sean M. Burke sburke@cpan.org

%prep
%setup -q -n %{real_name}-%{version}

%build
perl Makefile.PL
make %{?_smp_mflags}

%check
make test

%install
%perl_make_install
%perl_process_packlist

%files
%defattr(-, root, root)
%doc ChangeLog README MANIFEST TODO.txt
%doc %{_mandir}/man?/*
%dir %{perl_vendorarch}/auto/Text
%dir %{perl_vendorarch}/auto/Text/Unidecode
%dir %{perl_vendorlib}/Text
%dir %{perl_vendorlib}/Text/Unidecode
%{perl_vendorlib}/Text/Unidecode/*.pm
%{perl_vendorlib}/Text/Unidecode.pm

%changelog