#
# spec file for package perl-Regexp-Grammars
#
# Copyright (c) 2024 SUSE LLC
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.

# Please submit bugfixes or comments via https://bugs.opensuse.org/
#


%define cpan_name Regexp-Grammars
Name:           perl-Regexp-Grammars
Version:        1.58.0
Release:        0
# 1.058 -> normalize -> 1.58.0
%define cpan_version 1.058
License:        Artistic-1.0 OR GPL-1.0-or-later
Summary:        Add grammatical parsing features to Perl 5.10 regexes
URL:            https://metacpan.org/release/%{cpan_name}
Source0:        https://cpan.metacpan.org/authors/id/D/DC/DCONWAY/%{cpan_name}-%{cpan_version}.tar.gz
Source1:        cpanspec.yml
Source100:      README.md
BuildArch:      noarch
BuildRequires:  perl
BuildRequires:  perl-macros
BuildRequires:  perl(Module::Build)
BuildRequires:  perl(version)
Requires:       perl(version)
Provides:       perl(Regexp::Grammars) = %{version}
Provides:       perl(Regexp::Grammars::Precursor)
%undefine __perllib_provides
%{perl_requires}

%description
This module adds a small number of new regex constructs that can be used
within Perl 5.10 patterns to implement complete recursive-descent parsing.

Perl 5.10 already supports recursive-descent _matching_, via the new
'(?<name>...)' and '(?&name)' constructs. For example, here is a simple
matcher for a subset of the LaTeX markup language:

    $matcher = qr{
        (?&File)

        (?(DEFINE)
            (?<File>     (?&Element)* )

            (?<Element>  \s* (?&Command)
                      |  \s* (?&Literal)
            )

            (?<Command>  \\ \s* (?&Literal) \s* (?&Options)? \s* (?&Args)? )

            (?<Options>  \[ \s* (?:(?&Option) (?:\s*,\s* (?&Option) )*)? \s* \])

            (?<Args>     \{ \s* (?&Element)* \s* \} )

            (?<Option>   \s* [^][\$&%#_{}~^\s,]+ )

            (?<Literal>  \s* [^][\$&%#_{}~^\s]+ )
        )
    }xms

This technique makes it possible to use regexes to recognize complex,
hierarchical--and even recursive--textual structures. The problem is that
Perl 5.10 doesn't provide any support for extracting that hierarchical data
into nested data structures. In other words, using Perl 5.10 you can
_match_ complex data, but not _parse_ it into an internally useful form.

An additional problem when using Perl 5.10 regexes to match complex data
formats is that you have to make sure you remember to insert
whitespace-matching constructs (such as '\s*') at every possible position
where the data might contain ignorable whitespace. This reduces the
readability of such patterns, and increases the chance of errors (typically
caused by overlooking a location where whitespace might appear).

The Regexp::Grammars module solves both those problems.

If you import the module into a particular lexical scope, it preprocesses
any regex in that scope, so as to implement a number of extensions to the
standard Perl 5.10 regex syntax. These extensions simplify the task of
defining and calling subrules within a grammar, and allow those subrule
calls to capture and retain the components that they match in a proper
hierarchical manner.

For example, the above LaTeX matcher could be converted to a full LaTeX
parser (and considerably tidied up at the same time), like so:

    use Regexp::Grammars;
    $parser = qr{
        <File>

        <rule: File>       <[Element]>*

        <rule: Element>    <Command> | <Literal>

        <rule: Command>    \\ <Literal> <Options>? <Args>?

        <rule: Options>    \[ <[Option]>+ % (,) \]

        <rule: Args>       \{ <[Element]>* \}

        <rule: Option>     [^][\$&%#_{}~^\s,]+

        <rule: Literal>    [^][\$&%#_{}~^\s]+
    }xms

Note that there is no need to explicitly place '\s*' subpatterns throughout
the rules; that is taken care of automatically.

If the Regexp::Grammars version of this regex were successfully matched
against some appropriate LaTeX document, each rule would call the subrules
specified within it, and then return a hash containing whatever result each
of those subrules returned, with each result indexed by the subrule's name.

That is, if the rule named 'Command' were invoked, it would first try to
match a backslash, then it would call the three subrules '<Literal>',
'<Options>', and '<Args>' (in that sequence). If they all matched
successfully, the 'Command' rule would then return a hash with three keys:
'Literal', 'Options', and 'Args'. The value for each of those hash
entries would be whatever result-hash the subrules themselves had returned
when matched.

In this way, each level of the hierarchical regex can generate hashes
recording everything its own subrules matched, so when the entire pattern
matches, it produces a tree of nested hashes that represent the structured
data the pattern matched.

For example, if the previous regex grammar were matched against a string
containing:

    \documentclass[a4paper,11pt]{article}
    \author{D. Conway}

it would automatically extract a data structure equivalent to the following
(but with several extra "empty" keys, which are described under "Subrule
results" in the module documentation):

    {
        'file' => {
            'element' => [
                {
                    'command' => {
                        'literal' => 'documentclass',
                        'options' => {
                            'option' => [ 'a4paper', '11pt' ],
                        },
                        'args' => {
                            'element' => [ 'article' ],
                        }
                    }
                },
                {
                    'command' => {
                        'literal' => 'author',
                        'args' => {
                            'element' => [
                                {
                                    'literal' => 'D.',
                                },
                                {
                                    'literal' => 'Conway',
                                }
                            ]
                        }
                    }
                }
            ]
        }
    }

The data structure that Regexp::Grammars produces from a regex match is
available to the surrounding program in the magic variable '%/'.

Regexp::Grammars provides many features that simplify the extraction of
hierarchical data via a regex match, and also some features that can
simplify the processing of that data once it has been extracted. The
following sections explain each of those features, and some of the parsing
techniques they support.

%prep
%autosetup -n %{cpan_name}-%{cpan_version}

find . -type f ! -path "*/t/*" ! -name "*.pl" ! -path "*/bin/*" ! -path "*/script/*" ! -path "*/scripts/*" ! -name "configure" -print0 | xargs -0 chmod 644

%build
perl Build.PL --installdirs=vendor
./Build build --flags=%{?_smp_mflags}

%check
./Build test

%install
./Build install --destdir=%{buildroot} --create_packlist=0
%perl_gen_filelist

%files -f %{name}.files
%doc Changes README

%changelog