forked from pool/perl-HTML-TableExtract
- updated to 2.11
OBS-URL: https://build.opensuse.org/package/show/devel:languages:perl/perl-HTML-TableExtract?expand=0&rev=24
Committed by: Git OBS Bridge
Parent: 3d4e564a24
Commit: 80be1dc906
@@ -1,3 +1,8 @@
+-------------------------------------------------------------------
+Tue Dec 20 13:38:24 UTC 2011 - coolo@suse.com
+
+- updated to 2.11
+
 -------------------------------------------------------------------
 Tue Dec 20 09:13:30 UTC 2011 - coolo@suse.com
 
@@ -20,18 +20,21 @@ Name: perl-HTML-TableExtract
 Version: 2.11
 Release: 0
 %define cpan_name HTML-TableExtract
-Summary: For extracting the content contained in tables within an HTML document
+Summary: Perl module for extracting the content contained in tables within an HTM[cut]
 License: GPL-1.0+ or Artistic-1.0
 Group: Development/Libraries/Perl
 Url: http://search.cpan.org/dist/HTML-TableExtract/
-Source: http://www.cpan.org/authors/id/M/MS/MSISK/HTML-TableExtract-%{version}.tar.gz
-Patch0: %{cpan_name}-2.10-HTML.patch
+Source: http://www.cpan.org/authors/id/M/MS/MSISK/%{cpan_name}-%{version}.tar.gz
+Patch0: HTML-TableExtract-2.10-HTML.patch
 BuildArch: noarch
 BuildRoot: %{_tmppath}/%{name}-%{version}-build
 BuildRequires: perl
 BuildRequires: perl-macros
 BuildRequires: perl(HTML::ElementTable) >= 1.16
 BuildRequires: perl(HTML::Parser)
+#BuildRequires: perl(HTML::Entities)
+#BuildRequires: perl(HTML::TableExtract)
+#BuildRequires: perl(testload)
 Requires: perl(HTML::ElementTable) >= 1.16
 Requires: perl(HTML::Parser)
 %{perl_requires}
@@ -94,45 +97,10 @@ When extracting only text from tables, the text is decoded with
 HTML::Entities by default; this can be disabled by setting the _decode_
 parameter to 0.
 
-Extraction Modes
-The default mode of extraction for HTML::TableExtract is raw text or
-HTML. In this mode, embedded tables are completely decoupled from one
-another. In this case, HTML::TableExtract is a subclass of
-HTML::Parser:
-
-use HTML::TableExtract;
-
-Alternativevly, tables can be extracted as HTML::ElementTable
-structures, which are in turn embedded in an HTML::Element tree
-representing the entire HTML document. Embedded tables are not
-decoupled from one another since this tree structure must be
-manitained. In this case, HTML::TableExtract is a subclass of
-HTML::TreeBuilder (itself a subclass of HTML:::Parser):
-
-use HTML::TableExtract qw(tree);
-
-In either case, the basic interface for HTML::TableExtract and the
-resulting table objects remains the same -- all that changes is what
-you can do with the resulting data.
-
-HTML::TableExtract is a subclass of HTML::Parser, and as such inherits
-all of its basic methods such as 'parse()' and 'parse_file()'. During
-scans, 'start()', 'end()', and 'text()' are utilized. Feel free to
-override them, but if you do not eventually invoke them in the SUPER
-class with some content, results are not guaranteed.
-
-Advice
-The main point of this module was to provide a flexible method of
-extracting tabular information from HTML documents without relying to
-heavily on the document layout. For that reason, I suggest using
-_Headers_ whenever possible -- that way, you are anchoring your
-extraction on what the document is trying to communicate rather than
-some feature of the HTML comprising the document (other than the fact
-that the data is contained in a table).
-
 %prep
 %setup -q -n %{cpan_name}-%{version}
 %patch0 -p1
+find . -type f -print0 | xargs -0 chmod 644
 
 %build
 %{__perl} Makefile.PL INSTALLDIRS=vendor
@@ -146,11 +114,8 @@ Advice
 %perl_process_packlist
 %perl_gen_filelist
 
-%clean
-%{__rm} -rf %{buildroot}
-
 %files -f %{name}.files
-%defattr(644,root,root,755)
+%defattr(-,root,root,755)
 %doc Changes README
 
 %changelog