#
# spec file for package perl-Regexp-Grammars
#
# Copyright (c) 2024 SUSE LLC
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.

# Please submit bugfixes or comments via https://bugs.opensuse.org/
#


%define cpan_name Regexp-Grammars
Name:           perl-Regexp-Grammars
Version:        1.58.0
Release:        0
# 1.058 -> normalize -> 1.58.0
%define cpan_version 1.058
License:        Artistic-1.0 OR GPL-1.0-or-later
Summary:        Add grammatical parsing features to Perl 5.10 regexes
URL:            https://metacpan.org/release/%{cpan_name}
Source0:        https://cpan.metacpan.org/authors/id/D/DC/DCONWAY/%{cpan_name}-%{cpan_version}.tar.gz
Source1:        cpanspec.yml
Source100:      README.md
BuildArch:      noarch
BuildRequires:  perl
BuildRequires:  perl-macros
BuildRequires:  perl(Module::Build)
BuildRequires:  perl(version)
Requires:       perl(version)
Provides:       perl(Regexp::Grammars) = %{version}
Provides:       perl(Regexp::Grammars::Precursor)
%undefine       __perllib_provides
%{perl_requires}

%description
This module adds a small number of new regex constructs that can be used
within Perl 5.10 patterns to implement complete recursive-descent parsing.

Perl 5.10 already supports recursive-descent _matching_, via the new
'(?<name>...)' and '(?&name)' constructs. For example, here is a simple
matcher for a subset of the LaTeX markup language:

    $matcher = qr{
        (?&File)

        (?(DEFINE)
            (?<File>     (?&Element)* )

            (?<Element>  \s* (?&Command)
                      |  \s* (?&Literal)
            )

            (?<Command>  \\ \s* (?&Literal) \s* (?&Options)? \s* (?&Args)? )

            (?<Options>  \[ \s* (?:(?&Option) (?:\s*,\s* (?&Option) )*)? \s* \])

            (?<Args>     \{ \s* (?&Element)* \s* \}  )

            (?<Option>   \s* [^][\$&%#_{}~^\s,]+     )

            (?<Literal>  \s* [^][\$&%#_{}~^\s]+      )
        )
    }xms

This technique makes it possible to use regexes to recognize complex,
hierarchical--and even recursive--textual structures. The problem is that
Perl 5.10 doesn't provide any support for extracting that hierarchical data
into nested data structures. In other words, using Perl 5.10 you can
_match_ complex data, but not _parse_ it into an internally useful form.

An additional problem when using Perl 5.10 regexes to match complex data
formats is that you must remember to insert whitespace-matching constructs
(such as '\s*') at every position where the data might contain ignorable
whitespace. This reduces the readability of such patterns and increases
the chance of errors (typically caused by overlooking a location where
whitespace might appear).

The Regexp::Grammars module solves both those problems.

If you import the module into a particular lexical scope, it preprocesses
any regex in that scope, so as to implement a number of extensions to the
standard Perl 5.10 regex syntax. These extensions simplify the task of
defining and calling subrules within a grammar, and allow those subrule
calls to capture and retain the components they match in a proper
hierarchical manner.

For example, the above LaTeX matcher could be converted to a full LaTeX
parser (and considerably tidied up at the same time), like so:

    use Regexp::Grammars;
    $parser = qr{
        <File>

        <rule: File>       <[Element]>*

        <rule: Element>    <Command> | <Literal>

        <rule: Command>    \\  <Literal>  <Options>?  <Args>?

        <rule: Options>    \[  <[Option]>+ % (,)  \]

        <rule: Args>       \{  <[Element]>*  \}

        <rule: Option>     [^][\$&%#_{}~^\s,]+

        <rule: Literal>    [^][\$&%#_{}~^\s]+
    }xms

Note that there is no need to explicitly place '\s*' subpatterns throughout
the rules; that is taken care of automatically.

If the Regexp::Grammars version of this regex were successfully matched
against some appropriate LaTeX document, each rule would call the subrules
specified within it, and then return a hash containing whatever result each
of those subrules returned, with each result indexed by the subrule's name.

That is, if the rule named 'Command' were invoked, it would first try to
match a backslash, then it would call the three subrules '<Literal>',
'<Options>', and '<Args>' (in that sequence). If they all matched
successfully, the 'Command' rule would then return a hash with three keys:
'Literal', 'Options', and 'Args'. The value for each of those hash
entries would be whatever result-hash the subrules themselves had returned
when matched.

In this way, each level of the hierarchical regex can generate hashes
recording everything its own subrules matched, so when the entire pattern
matches, it produces a tree of nested hashes that represent the structured
data the pattern matched.

For example, if the previous regex grammar were matched against a string
containing:

    \documentclass[a4paper,11pt]{article}
    \author{D. Conway}

it would automatically extract a data structure equivalent to the following
(but with several extra "empty" keys, which are described in the "Subrule
results" section of the module's documentation):

    {
        'File' => {
            'Element' => [
                {
                    'Command' => {
                        'Literal' => 'documentclass',
                        'Options' => {
                            'Option'  => [ 'a4paper', '11pt' ],
                        },
                        'Args'    => {
                            'Element' => [
                                {
                                    'Literal' => 'article',
                                }
                            ]
                        }
                    }
                },
                {
                    'Command' => {
                        'Literal' => 'author',
                        'Args' => {
                            'Element' => [
                                {
                                    'Literal' => 'D.',
                                },
                                {
                                    'Literal' => 'Conway',
                                }
                            ]
                        }
                    }
                }
            ]
        }
    }

The data structure that Regexp::Grammars produces from a regex match is
available to the surrounding program in the magic variable '%/'.
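
As a minimal sketch of reading that variable (the grammar, the rule names
'Pair', 'Key', and 'Value', and the input string are illustrative
assumptions, not taken from the module's documentation), the results of a
successful match can be accessed like any other nested hash:

    use Regexp::Grammars;

    # Hypothetical grammar for a single "key = number" pair
    my $pair = qr{
        <Pair>

        <rule: Pair>     <Key> = <Value>

        <token: Key>     \w+

        <token: Value>   \d+
    }xms;

    if ('answer = 42' =~ $pair) {
        # Each subrule's result is stored under the subrule's name
        my $key   = $/{Pair}{Key};
        my $value = $/{Pair}{Value};
        print "$key is $value\n";
    }

The lookups are done immediately after the match because the result hash
is presumably replaced by the next successful grammar match.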

Regexp::Grammars provides many features that simplify the extraction of
hierarchical data via a regex match, and also some features that can
simplify the processing of that data once it has been extracted. The
module's documentation explains each of those features, and some of the
parsing techniques they support.

%prep
%autosetup  -n %{cpan_name}-%{cpan_version}

find . -type f ! -path "*/t/*" ! -name "*.pl" ! -path "*/bin/*" ! -path "*/script/*" ! -path "*/scripts/*" ! -name "configure" -print0 | xargs -0 chmod 644

%build
perl Build.PL --installdirs=vendor
./Build build --flags=%{?_smp_mflags}

%check
./Build test

%install
./Build install --destdir=%{buildroot} --create_packlist=0
%perl_gen_filelist

%files -f %{name}.files
%doc Changes README

%changelog