#
# spec file for package perl-HTML-Strip
#
# Copyright (c) 2016 SUSE LINUX GmbH, Nuernberg, Germany.
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.

# Please submit bugfixes or comments via http://bugs.opensuse.org/
#


Name: perl-HTML-Strip
Version: 2.10
Release: 0
%define cpan_name HTML-Strip
Summary: Perl extension for stripping HTML markup from text
License: Artistic-1.0 or GPL-1.0+
Group: Development/Libraries/Perl
Url: http://search.cpan.org/dist/HTML-Strip/
Source0: http://www.cpan.org/authors/id/K/KI/KILINRAX/%{cpan_name}-%{version}.tar.gz
Source1: cpanspec.yml
BuildRoot: %{_tmppath}/%{name}-%{version}-build
BuildRequires: perl
BuildRequires: perl-macros
BuildRequires: perl(Test::Exception)
Requires: perl(Test::Exception)
%{perl_requires}

%description
This module simply strips HTML-like markup from text rapidly and brutally.
It could easily be used to strip XML or SGML markup instead, but as
removing HTML is a much more common problem, this module lives in the
HTML:: namespace.

It is written in XS, and thus about five times quicker than using regular
expressions for the same task.

It does _not_ do any syntax checking (if you want that, use HTML::Parser);
instead, it merely applies the following rules:

* 1

Anything that looks like a tag, or group of tags, will be replaced with a
single space character. Tags are considered to be anything that starts with
a '<' and ends with a '>', with the caveat that a '>' character may appear
in either of the following without ending the tag:

* Quote

Quotes are considered to start with either a ''' or a '"' character, and
end with a matching character _not_ preceded by an odd number of escaping
backslashes (i.e. '\"' does not end the quote but '\\\\"' does).

* Comment

If the tag starts with an exclamation mark, it is assumed to be a
declaration or a comment. Within such tags, '>' characters do not end the
tag if they appear within pairs of double dashes (e.g. '<!-- <a
href="old.htm">old page</a> -->' would be stripped completely). No parsing
for quotes is performed within comments, so for instance '<!-- comment with
both ' quote types " -->' would be entirely stripped.

* 2

Anything that appears within what we term _strip tags_ is stripped as well.
By default, these tags are 'title', 'script', 'style' and 'applet'.

HTML::Strip maintains state between calls, so you can parse a document in
chunks should you wish. If one chunk ends half-way through a tag, quote,
comment, or whatever, it will remember this, and expect the next call to
parse to start with the remains of said tag.

If this is not going to be the case, be sure to call $hs->eof() between
calls to $hs->parse(). Alternatively, you may set 'auto_reset' to true in
the constructor, or at any time afterwards with 'set_auto_reset', so that the
parser always operates on a one-shot basis (resetting after each parsed chunk).

%prep
%setup -q -n %{cpan_name}-%{version}

%build
%{__perl} Makefile.PL INSTALLDIRS=vendor OPTIMIZE="%{optflags}"
%{__make} %{?_smp_mflags}

%check
%{__make} test

%install
%perl_make_install
%perl_process_packlist
%perl_gen_filelist

%files -f %{name}.files
%defattr(-,root,root,755)
%doc Changes README

%changelog