forked from pool/perl-HTML-TableExtract
- updated to 2.11
OBS-URL: https://build.opensuse.org/package/show/devel:languages:perl/perl-HTML-TableExtract?expand=0&rev=24
Committed by Git OBS Bridge
Parent: 3d4e564a24
Commit: 80be1dc906
@@ -1,3 +1,8 @@
-------------------------------------------------------------------
Tue Dec 20 13:38:24 UTC 2011 - coolo@suse.com

- updated to 2.11

-------------------------------------------------------------------
Tue Dec 20 09:13:30 UTC 2011 - coolo@suse.com

@@ -20,18 +20,21 @@ Name: perl-HTML-TableExtract
Version: 2.11
Release: 0
%define cpan_name HTML-TableExtract
Summary: For extracting the content contained in tables within an HTML document
Summary: Perl module for extracting the content contained in tables within an HTM[cut]
License: GPL-1.0+ or Artistic-1.0
Group: Development/Libraries/Perl
Url: http://search.cpan.org/dist/HTML-TableExtract/
Source: http://www.cpan.org/authors/id/M/MS/MSISK/HTML-TableExtract-%{version}.tar.gz
Patch0: %{cpan_name}-2.10-HTML.patch
Source: http://www.cpan.org/authors/id/M/MS/MSISK/%{cpan_name}-%{version}.tar.gz
Patch0: HTML-TableExtract-2.10-HTML.patch
BuildArch: noarch
BuildRoot: %{_tmppath}/%{name}-%{version}-build
BuildRequires: perl
BuildRequires: perl-macros
BuildRequires: perl(HTML::ElementTable) >= 1.16
BuildRequires: perl(HTML::Parser)
#BuildRequires: perl(HTML::Entities)
#BuildRequires: perl(HTML::TableExtract)
#BuildRequires: perl(testload)
Requires: perl(HTML::ElementTable) >= 1.16
Requires: perl(HTML::Parser)
%{perl_requires}
@@ -94,45 +97,10 @@ When extracting only text from tables, the text is decoded with
HTML::Entities by default; this can be disabled by setting the _decode_
parameter to 0.

Extraction Modes
The default mode of extraction for HTML::TableExtract is raw text or
HTML. In this mode, embedded tables are completely decoupled from one
another. In this case, HTML::TableExtract is a subclass of
HTML::Parser:

use HTML::TableExtract;

Alternatively, tables can be extracted as HTML::ElementTable
structures, which are in turn embedded in an HTML::Element tree
representing the entire HTML document. Embedded tables are not
decoupled from one another since this tree structure must be
maintained. In this case, HTML::TableExtract is a subclass of
HTML::TreeBuilder (itself a subclass of HTML::Parser):

use HTML::TableExtract qw(tree);

In either case, the basic interface for HTML::TableExtract and the
resulting table objects remains the same -- all that changes is what
you can do with the resulting data.

HTML::TableExtract is a subclass of HTML::Parser, and as such inherits
all of its basic methods such as 'parse()' and 'parse_file()'. During
scans, 'start()', 'end()', and 'text()' are utilized. Feel free to
override them, but if you do not eventually invoke them in the SUPER
class with some content, results are not guaranteed.

Advice
The main point of this module was to provide a flexible method of
extracting tabular information from HTML documents without relying too
heavily on the document layout. For that reason, I suggest using
_Headers_ whenever possible -- that way, you are anchoring your
extraction on what the document is trying to communicate rather than
some feature of the HTML comprising the document (other than the fact
that the data is contained in a table).

%prep
%setup -q -n %{cpan_name}-%{version}
%patch0 -p1
find . -type f -print0 | xargs -0 chmod 644

%build
%{__perl} Makefile.PL INSTALLDIRS=vendor
@@ -146,11 +114,8 @@ Advice
%perl_process_packlist
%perl_gen_filelist

%clean
%{__rm} -rf %{buildroot}

%files -f %{name}.files
%defattr(644,root,root,755)
%defattr(-,root,root,755)
%doc Changes README

%changelog