#
# spec file for package perl-Code-DRY
#
# Copyright (c) 2016 SUSE LINUX GmbH, Nuernberg, Germany.
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
# upon. The license for this file, and modifications and additions to the
# file, is the same license as for the pristine package itself (unless the
# license for the pristine package is not an Open Source License, in which
# case the license is the MIT License). An "Open Source License" is a
# license that conforms to the Open Source Definition (Version 1.9)
# published by the Open Source Initiative.
# Please submit bugfixes or comments via http://bugs.opensuse.org/
#
Name: perl-Code-DRY
Version: 0.03
Release: 0
%define cpan_name Code-DRY
Summary: Cut-and-Paste-Detector for Perl code
License: GPL-1.0+ or Artistic-1.0
Group: Development/Libraries/Perl
Url: http://search.cpan.org/dist/Code-DRY/
Source0: http://www.cpan.org/authors/id/H/HE/HEXCODER/%{cpan_name}-%{version}.tar.gz
BuildRoot: %{_tmppath}/%{name}-%{version}-build
BuildRequires: perl
BuildRequires: perl-macros
%{perl_requires}
%description
The module's main purpose is to report repeated text fragments (typically
Perl code) that could be considered for isolation and/or abstraction in
order to reduce multiple copies of the same code (aka cut and paste code).
Code duplicates may occur in the same line, file or directory.
The ad hoc approach of comparing every item against every other item leads to
computing times that grow quadratically with the amount of code, which is not
useful for anything but the smallest code bases.
So an efficient data structure is needed.
This module can create the suffix array and the longest common prefix array
for a string of 8-bit characters. These data structures can be used to
search for repetitions of substrings in O(n) time.
The current strategy is to concatenate code from all files into one string
and then use the suffix array and its companion, the longest-common-prefix
(lcp) array on this string.
Example:
Instead of real Perl code I use the string 'mississippi' for
simplicity. A *suffix* is a substring of the input string that
ends at the end of the input string. A *prefix* is a substring of
the input string that starts at the start of the input string. The
*suffix array* of a string is the list of the start offsets of all its
suffixes, sorted lexicographically by suffix:
 #  offset  suffix
==================
 0  10:     i
 1   7:     ippi
 2   4:     issippi
 3   1:     ississippi
 4   0:     mississippi
 5   9:     pi
 6   8:     ppi
 7   6:     sippi
 8   3:     sissippi
 9   5:     ssippi
10   2:     ssissippi
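The table can be reproduced with a few lines of Python. This is only a naive sketch for illustration: it sorts the offsets by comparing whole suffixes, which is far slower than a real suffix array construction, and the function name suffix_array is an assumption, not part of the module's API.

```python
def suffix_array(text):
    """Return the start offsets of all suffixes of text, sorted by suffix."""
    # Naive construction: sort every offset by the suffix that starts there.
    return sorted(range(len(text)), key=lambda i: text[i:])


if __name__ == "__main__":
    print(suffix_array("mississippi"))
    # prints [10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2]
```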
The other structure needed is the *longest common prefix array* (lcp).
For each entry it holds the length of the longest prefix that this
suffix shares with the previous suffix in the suffix array. For this
example it looks like this:
 #  offset  lcp  (common prefixes shown in ())
==============================================
 0  10:     0    ()
 1   7:     1    (i)
 2   4:     1    (i)
 3   1:     4    (issi)  overlap!
            3    (iss)   corrected non overlapping prefixes
 4   0:     0    ()
 5   9:     0    ()
 6   8:     1    (p)
 7   6:     0    ()
 8   3:     2    (si)
 9   5:     1    (s)
10   2:     3    (ssi)
The standard lcp array may contain overlapping prefixes, but for our
purposes only non-overlapping prefix lengths are needed. A similar
problem occurs for prefixes that extend from the end of one source
file into the start of the next when the content of the source files
is concatenated. Two separate functions, applied afterwards, limit the
prefix lengths with respect to internal overlaps and file crossings.
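The lcp values and the overlap correction for 'mississippi' can be sketched in Python as follows. The helper names lcp_array and clip_overlaps are hypothetical and the comparison loop is deliberately naive. The clipping rule used here (an lcp value may not exceed the distance between the two suffix offsets) is an assumption that reproduces the corrected value in the table; it is not taken from the module's source. Limiting prefixes at file boundaries is not shown.

```python
def lcp_array(text, sa):
    """lcp[k] is the length of the prefix shared by suffixes sa[k-1] and sa[k]."""
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = sa[k - 1], sa[k]
        n = 0
        # Extend the match character by character until it breaks.
        while a + n < len(text) and b + n < len(text) and text[a + n] == text[b + n]:
            n += 1
        lcp[k] = n
    return lcp


def clip_overlaps(sa, lcp):
    """Limit each lcp value so that the two occurrences do not overlap."""
    clipped = list(lcp)
    for k in range(1, len(sa)):
        # Two copies starting d characters apart can share at most d characters.
        clipped[k] = min(lcp[k], abs(sa[k] - sa[k - 1]))
    return clipped


if __name__ == "__main__":
    text = "mississippi"
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    lcp = lcp_array(text, sa)
    print(lcp)                     # [0, 1, 1, 4, 0, 0, 1, 0, 2, 1, 3]
    print(clip_overlaps(sa, lcp))  # [0, 1, 1, 3, 0, 0, 1, 0, 2, 1, 3]
```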
If we sort the lcp values obtained this way in descending order, we get:
 #  offset  lcp  (prefix shown in ())
=====================================
 3   1:     3    (iss)  now corrected to non overlapping prefixes
10   2:     3    (ssi)
 8   3:     2    (si)
 1   7:     1    (i)
 2   4:     1    (i)
 6   8:     1    (p)
 9   5:     1    (s)
 0  10:     0    ()
 4   0:     0    ()
 5   9:     0    ()
 7   6:     0    ()
The first entry shows the longest repetition in the given string. Not
all entries are of interest, since shorter copies are contained in the
longest match. After removing all 'shadowed' repetitions, the next
entry can be reported. Eventually the remaining lcp values are too
small to be of any interest.
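The ordering step can be sketched in Python using the suffix array and the corrected lcp values copied from the tables above. The function name ranked_repeats and the min_len threshold are hypothetical. The sketch only sorts and filters by length; it does not remove 'shadowed' repetitions.

```python
text = "mississippi"
sa = [10, 7, 4, 1, 0, 9, 8, 6, 3, 5, 2]   # suffix array from the first table
lcp = [0, 1, 1, 3, 0, 0, 1, 0, 2, 1, 3]   # corrected (non-overlapping) lcp values


def ranked_repeats(text, sa, lcp, min_len=2):
    """Return (length, offset, other offset, fragment) tuples, longest first.

    min_len is assumed to be at least 1, so entry 0 (no predecessor) is skipped.
    """
    # sorted() is stable, so entries with equal lcp keep their table order.
    order = sorted(range(len(sa)), key=lambda k: -lcp[k])
    return [
        (lcp[k], sa[k], sa[k - 1], text[sa[k]:sa[k] + lcp[k]])
        for k in order
        if lcp[k] >= min_len
    ]


if __name__ == "__main__":
    for entry in ranked_repeats(text, sa, lcp):
        print(entry)
```

With the default threshold this lists 'iss' (offsets 1 and 4), 'ssi' (offsets 2 and 5) and 'si' (offsets 3 and 6), in that order.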
Currently this is experimental code.
The most appropriate mailing list on which to discuss this module would
be perl-qa. See http://lists.perl.org/list/perl-qa.html.
%prep
%setup -q -n %{cpan_name}-%{version}
find . -type f ! -name \*.pl -print0 | xargs -0 chmod 644
%build
%{__perl} Makefile.PL INSTALLDIRS=vendor OPTIMIZE="%{optflags}"
%{__make} %{?_smp_mflags}
%check
%{__make} test
%install
%perl_make_install
%perl_process_packlist
%perl_gen_filelist
%files -f %{name}.files
%defattr(-,root,root,755)
%doc Changes README
%changelog