#
# spec file for package perl-Text-Unidecode
#

# norootforbuild

Name: perl-Text-Unidecode
%define real_name Text-Unidecode
Summary: US-ASCII transliterations of Unicode text
Url: http://search.cpan.org/perldoc?Text::Unidecode
Group: Development/Libraries/Perl
License: Artistic License
Version: 0.04
Release: 1
Vendor: openSUSE-Education
Source: %{real_name}-%{version}.tar.bz2
BuildRoot: %{_tmppath}/%{name}-%{version}-build
%{perl_requires}
BuildRequires: perl
BuildRequires: perl-macros

%description
It often happens that you have non-Roman text data in Unicode, but you can't
display it -- usually because you're trying to show it to a user via an
application that doesn't support Unicode, or because the fonts you need aren't
accessible. You could represent the Unicode characters as "???????" or
"\15BA\15A0\1610...", but that's nearly useless to the user who actually wants
to read what the text says.

What Text::Unidecode provides is a function, unidecode(...), that takes Unicode
data and tries to represent it in US-ASCII characters (i.e., the universally
displayable characters between 0x00 and 0x7F). The representation is almost
always an attempt at transliteration -- i.e., conveying, in Roman letters, the
pronunciation expressed by the text in some other writing system. (See the
example in the synopsis.)

Unidecode's ability to transliterate is limited by two factors:

* The amount and quality of data in the original

So if you have Hebrew data that has no vowel points in it, then Unidecode
cannot guess what vowels should appear in a pronunciation. S f y hv n vwls n
th npt, y wn't gt ny vwls n th tpt. (This is a specific application of the
general principle of "Garbage In, Garbage Out".)

* Basic limitations in the Unidecode design

Writing a real and clever transliteration algorithm for any single
language usually requires a lot of time, and at least a passable knowledge of
the language involved. But Unicode text can convey more languages than I could
possibly learn (much less create a transliterator for) in the entire rest of my
lifetime. So I put a cap on how intelligent Unidecode could be, by insisting
that it support only context-insensitive transliteration. That means missing
the finer details of any given writing system, while still hopefully being
useful.

Unidecode, in other words, is quick and dirty. Sometimes the output is not so
dirty at all: Russian and Greek seem to work passably; and while Thaana
(Divehi, AKA Maldivian) is a definitely non-Western writing system, setting up
a mapping from it to Roman letters seems to work pretty well. But sometimes the
output is very dirty: Unidecode does quite badly on Japanese and Thai.

If you want a smarter transliteration for a particular language than Unidecode
provides, then you should look for (or write) a transliteration algorithm
specific to that language, and apply it instead of (or at least before)
applying Unidecode.

In other words, Unidecode's approach is broad (knowing about dozens of writing
systems), but shallow (not being meticulous about any of them).

Author:
-------
Sean M. Burke sburke@cpan.org

%prep
%setup -n %{real_name}-%{version}

%build
perl Makefile.PL
make %{?jobs:-j%jobs}

%check
make test

%install
%perl_make_install
%perl_process_packlist

%clean
rm -rf %{buildroot}

%files
%defattr(-, root, root)
%doc ChangeLog README MANIFEST TODO.txt
%doc %{_mandir}/man?/*
%dir %{perl_vendorarch}/auto/Text
%dir %{perl_vendorarch}/auto/Text/Unidecode
%dir %{perl_vendorlib}/Text
%dir %{perl_vendorlib}/Text/Unidecode
%{perl_vendorlib}/Text/Unidecode/*.pm
%{perl_vendorlib}/Text/Unidecode.pm

%changelog