forked from pool/perl-Text-Unidecode
- updated to 1.22
    * RELEASE 1.22.  (The dev release works, so this is a version bump.)
    * See notes for 2014-07-25, because this is the first public release
      with significant changes since 2001!

  2014-07-25  Sean M. Burke  sburke@cpan.org
    * !DEVELOPER RELEASE!
    * !Release 1.20_01!
    * Many bugfixes.  Thanks especially to Tomaž Šolc!
    * Yet more *.t files added for improved sanity checking.
    * Shuffling around the internals of Unidecode.pm
    * Putting in some vacuous 0x__.pm files where previously there
      would just be a load failure

OBS-URL: https://build.opensuse.org/package/show/devel:languages:perl/perl-Text-Unidecode?expand=0&rev=11
committed by Git OBS Bridge
parent e22a75bb72
commit 43b83c895f
@@ -16,96 +16,62 @@
#


%define real_name Text-Unidecode
Name: perl-Text-Unidecode
Version: 1.01
Version: 1.22
Release: 0
Summary: US-ASCII transliterations of Unicode text
License: Artistic-1.0
%define cpan_name Text-Unidecode
Summary: Provide plain ASCII transliterations of Unicode text
License: Artistic-1.0 or GPL-1.0+
Group: Development/Libraries/Perl
Url: http://search.cpan.org/perldoc?Text::Unidecode
Source: http://www.cpan.org/authors/id/S/SB/SBURKE/%{real_name}-%{version}.tar.gz
Url: http://search.cpan.org/dist/Text-Unidecode/
Source: http://www.cpan.org/authors/id/S/SB/SBURKE/%{cpan_name}-%{version}.tar.gz
BuildArch: noarch
BuildRoot: %{_tmppath}/%{name}-%{version}-build
BuildRequires: perl
BuildRequires: perl-macros
BuildRoot: %{_tmppath}/%{name}-%{version}-build
%{perl_requires}

%description
It often happens that you have non-Roman text data in Unicode, but you can't
display it -- usually because you're trying to show it to a user via an
application that doesn't support Unicode, or because the fonts you need aren't
accessible. You could represent the Unicode characters as "???????" or
"\15BA\15A0\1610...", but that's nearly useless to the user who actually wants
to read what the text says.
It often happens that you have non-Roman text data in Unicode, but you
can't display it-- usually because you're trying to show it to a user via
an application that doesn't support Unicode, or because the fonts you need
aren't accessible. You could represent the Unicode characters as "???????"
or "\15BA\15A0\1610...", but that's nearly useless to the user who actually
wants to read what the text says.

What Text::Unidecode provides is a function, unidecode(...) that takes Unicode
data and tries to represent it in US-ASCII characters (i.e., the universally
displayable characters between 0x00 and 0x7F). The representation is almost
always an attempt at transliteration -- i.e., conveying, in Roman letters, the
pronunciation expressed by the text in some other writing system. (See the
example in the synopsis.)
What Text::Unidecode provides is a function, 'unidecode(...)' that takes
Unicode data and tries to represent it in US-ASCII characters (i.e., the
universally displayable characters between 0x00 and 0x7F). The
representation is almost always an attempt at _transliteration_-- i.e.,
conveying, in Roman letters, the pronunciation expressed by the text in
some other writing system. (See the example in the synopsis.)

Unidecode's ability to transliterate is limited by two factors:
NOTE:

* The amount and quality of data in the original
To make sure your perldoc/Pod viewing setup for viewing this page is
working: The six-letter word "résumé" should look like "resume" with an "/"
accent on each "e".

So if you have Hebrew data that has no vowel points in it, then Unidecode
cannot guess what vowels should appear in a pronounciation. S f y hv n vwls n
th npt, y wn't gt ny vwls n th tpt. (This is a specific application of the
general principle of "Garbage In, Garbage Out".)

* Basic limitations in the Unidecode design

Writing a real and clever transliteration algorithm for any single
language usually requires a lot of time, and at least a passable knowledge of
the language involved. But Unicode text can convey more languages than I could
possibly learn (much less create a transliterator for) in the entire rest of my
lifetime. So I put a cap on how intelligent Unidecode could be, by insisting
that it support only context-insensitive transliteration. That means missing
the finer details of any given writing system, while still hopefully being
useful.

Unidecode, in other words, is quick and dirty. Sometimes the output is not so
dirty at all: Russian and Greek seem to work passably; and while Thaana
(Divehi, AKA Maldivian) is a definitely non-Western writing system, setting up
a mapping from it to Roman letters seems to work pretty well. But sometimes the
output is very dirty: Unidecode does quite badly on Japanese and Thai.

If you want a smarter transliteration for a particular language than Unidecode
provides, then you should look for (or write) a transliteration algorithm
specific to that language, and apply it instead of (or at least before)
applying Unidecode.

In other words, Unidecode's approach is broad (knowing about dozens of writing
systems), but shallow (not being meticulous about any of them).

Author:
-------
Sean M. Burke sburke@cpan.org
For further tests, and help if that doesn't work, see below, the /A POD
ENCODING TEST manpage.

%prep
%setup -q -n %{real_name}-%{version}
%setup -q -n %{cpan_name}-%{version}

%build
perl Makefile.PL
make %{?_smp_mflags}
%{__perl} Makefile.PL INSTALLDIRS=vendor
%{__make} %{?_smp_mflags}

%check
make test
%{__make} test

%install
%perl_make_install
%perl_process_packlist
%perl_gen_filelist

%files
%defattr(-, root, root)
%doc ChangeLog README MANIFEST TODO.txt
%doc %{_mandir}/man?/*
%dir %{perl_vendorarch}/auto/Text
%dir %{perl_vendorarch}/auto/Text/Unidecode
%dir %{perl_vendorlib}/Text
%dir %{perl_vendorlib}/Text/Unidecode
%{perl_vendorlib}/Text/Unidecode/*.pm
%{perl_vendorlib}/Text/Unidecode.pm
%files -f %{name}.files
%defattr(-,root,root,755)
%doc ChangeLog LICENSE README TODO.txt

%changelog
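
Note on the %description text in this spec: it says Unidecode supports only
context-insensitive transliteration. The sketch below illustrates that idea
in Python. It is not the module's implementation. The table contents, the
function name transliterate, and the "?" fallback for unmapped characters are
all assumptions made for this example; the real module is Perl and ships its
own, far larger, per-block tables.

```python
# Hypothetical, deliberately tiny lookup table (assumption for illustration).
TABLE = {
    "\u00e9": "e",   # LATIN SMALL LETTER E WITH ACUTE
    "\u00df": "ss",  # LATIN SMALL LETTER SHARP S
    "\u0416": "Zh",  # CYRILLIC CAPITAL LETTER ZHE
    "\u0438": "i",   # CYRILLIC SMALL LETTER I
}


def transliterate(text: str, unknown: str = "?") -> str:
    """Map each character on its own, never looking at its neighbours."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            # Already US-ASCII (0x00 to 0x7F): pass through unchanged.
            out.append(ch)
        else:
            # One table lookup per character; unmapped characters fall back
            # to the placeholder (an assumption of this sketch).
            out.append(TABLE.get(ch, unknown))
    return "".join(out)
```

Because every character is looked up in isolation, a scheme like this cannot
apply spelling rules that depend on neighbouring characters, which matches the
"broad but shallow" trade-off the description talks about.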