Lars Vogdt 2009-03-11 12:44:36 +00:00 committed by Git OBS Bridge
commit ed01d948f2
5 changed files with 136 additions and 0 deletions

23
.gitattributes vendored Normal file

@@ -0,0 +1,23 @@
## Default LFS
*.7z filter=lfs diff=lfs merge=lfs -text
*.bsp filter=lfs diff=lfs merge=lfs -text
*.bz2 filter=lfs diff=lfs merge=lfs -text
*.gem filter=lfs diff=lfs merge=lfs -text
*.gz filter=lfs diff=lfs merge=lfs -text
*.jar filter=lfs diff=lfs merge=lfs -text
*.lz filter=lfs diff=lfs merge=lfs -text
*.lzma filter=lfs diff=lfs merge=lfs -text
*.obscpio filter=lfs diff=lfs merge=lfs -text
*.oxt filter=lfs diff=lfs merge=lfs -text
*.pdf filter=lfs diff=lfs merge=lfs -text
*.png filter=lfs diff=lfs merge=lfs -text
*.rpm filter=lfs diff=lfs merge=lfs -text
*.tbz filter=lfs diff=lfs merge=lfs -text
*.tbz2 filter=lfs diff=lfs merge=lfs -text
*.tgz filter=lfs diff=lfs merge=lfs -text
*.ttf filter=lfs diff=lfs merge=lfs -text
*.txz filter=lfs diff=lfs merge=lfs -text
*.whl filter=lfs diff=lfs merge=lfs -text
*.xz filter=lfs diff=lfs merge=lfs -text
*.zip filter=lfs diff=lfs merge=lfs -text
*.zst filter=lfs diff=lfs merge=lfs -text

1
.gitignore vendored Normal file

@@ -0,0 +1 @@
.osc


@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a23f0bb769d8507495bd06b269e8c1b50e4d55854447509a5d586880bb3886ae
size 79278


@@ -0,0 +1,5 @@
-------------------------------------------------------------------
Wed Mar 11 13:33:23 CET 2009 - lars@linux-schulserver.de
- initial version 0.04

104
perl-Text-Unidecode.spec Normal file

@@ -0,0 +1,104 @@
#
# spec file for package perl-Text-Unidecode
#
# norootforbuild
Name: perl-Text-Unidecode
%define real_name Text-Unidecode
Summary: US-ASCII transliterations of Unicode text
Url: http://search.cpan.org/perldoc?Text::Unidecode
Group: Development/Libraries/Perl
License: Artistic License
Version: 0.04
Release: 1
Vendor: openSUSE-Education
Source: %{real_name}-%{version}.tar.bz2
Requires: perl = %{perl_version}
BuildRoot: %{_tmppath}/%{name}-%{version}-build
%description
It often happens that you have non-Roman text data in Unicode, but you can't
display it -- usually because you're trying to show it to a user via an
application that doesn't support Unicode, or because the fonts you need aren't
accessible. You could represent the Unicode characters as "???????" or
"\15BA\15A0\1610...", but that's nearly useless to the user who actually wants
to read what the text says.
What Text::Unidecode provides is a function, unidecode(...), that takes Unicode
data and tries to represent it in US-ASCII characters (i.e., the universally
displayable characters between 0x00 and 0x7F). The representation is almost
always an attempt at transliteration -- i.e., conveying, in Roman letters, the
pronunciation expressed by the text in some other writing system. (See the
example in the synopsis.)
Unidecode's ability to transliterate is limited by two factors:
* The amount and quality of data in the original
So if you have Hebrew data that has no vowel points in it, then Unidecode
cannot guess what vowels should appear in a pronunciation. S f y hv n vwls n
th npt, y wn't gt ny vwls n th tpt. (This is a specific application of the
general principle of "Garbage In, Garbage Out".)
* Basic limitations in the Unidecode design
Writing a real and clever transliteration algorithm for any single
language usually requires a lot of time, and at least a passable knowledge of
the language involved. But Unicode text can convey more languages than I could
possibly learn (much less create a transliterator for) in the entire rest of my
lifetime. So I put a cap on how intelligent Unidecode could be, by insisting
that it support only context-insensitive transliteration. That means missing
the finer details of any given writing system, while still hopefully being
useful.
Unidecode, in other words, is quick and dirty. Sometimes the output is not so
dirty at all: Russian and Greek seem to work passably; and while Thaana
(Divehi, AKA Maldivian) is a definitely non-Western writing system, setting up
a mapping from it to Roman letters seems to work pretty well. But sometimes the
output is very dirty: Unidecode does quite badly on Japanese and Thai.
If you want a smarter transliteration for a particular language than Unidecode
provides, then you should look for (or write) a transliteration algorithm
specific to that language, and apply it instead of (or at least before)
applying Unidecode.
In other words, Unidecode's approach is broad (knowing about dozens of writing
systems), but shallow (not being meticulous about any of them).
Author:
-------
Sean M. Burke sburke@cpan.org
%prep
%setup -n %{real_name}-%{version}
%build
perl Makefile.PL
make %{?jobs:-j%jobs}
%check
make test
%install
%perl_make_install
%perl_process_packlist
%clean
rm -rf %{buildroot}
%files
%defattr(-, root, root)
%doc ChangeLog README MANIFEST TODO.txt
%doc %{_mandir}/man?/*
%dir %{perl_vendorarch}/auto/Text
%dir %{perl_vendorarch}/auto/Text/Unidecode
%dir %{perl_vendorlib}/Text
%dir %{perl_vendorlib}/Text/Unidecode
%{perl_vendorarch}/auto/Text/Unidecode/.packlist
%{perl_vendorlib}/Text/Unidecode/*.pm
%{perl_vendorlib}/Text/Unidecode.pm
/var/adm/perl-modules/%{name}
%changelog
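
The package description above says Unidecode supports only context-insensitive transliteration. As a rough illustration of that idea (not the module's actual code or data), here is a minimal Python sketch: each character is replaced by a fixed string regardless of its neighbours. The table contents, the function name `transliterate`, and the `"?"` fallback are all assumptions made for this example; the real Text::Unidecode ships far larger tables and may treat unmapped characters differently.

```python
# Hypothetical, tiny mapping table for illustration only.
# It is NOT taken from Text::Unidecode's data files.
TABLE = {
    "α": "a",
    "β": "b",
    "γ": "g",
    "é": "e",
    "ß": "ss",
}


def transliterate(text, table=TABLE, unknown="?"):
    """Map each character to ASCII on its own, ignoring surrounding context."""
    out = []
    for ch in text:
        if ord(ch) < 0x80:
            # Already US-ASCII (0x00-0x7F): pass through unchanged.
            out.append(ch)
        else:
            # One fixed replacement per character; unmapped characters get
            # the fallback marker (an assumption of this sketch).
            out.append(table.get(ch, unknown))
    return "".join(out)


if __name__ == "__main__":
    print(transliterate("café αβγ"))
```

Because every character is looked up in isolation, the sketch cannot resolve cases where the correct romanization depends on neighbouring characters, which is the trade-off the description calls "broad, but shallow".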