forked from pool/perl-Code-DRY
Accepting request 578020 from home:jengelh:branches:devel:languages:perl
- Compact description. OBS-URL: https://build.opensuse.org/request/show/578020 OBS-URL: https://build.opensuse.org/package/show/devel:languages:perl/perl-Code-DRY?expand=0&rev=2
This commit is contained in:
@@ -1,3 +1,8 @@
|
||||
-------------------------------------------------------------------
|
||||
Mon Feb 19 11:33:42 UTC 2018 - jengelh@inai.de
|
||||
|
||||
- Compact description.
|
||||
|
||||
-------------------------------------------------------------------
|
||||
Wed Jan 25 11:37:54 UTC 2017 - okurz@suse.com
|
||||
|
||||
|
@@ -1,7 +1,7 @@
|
||||
#
|
||||
# spec file for package perl-Code-DRY
|
||||
#
|
||||
# Copyright (c) 2016 SUSE LINUX GmbH, Nuernberg, Germany.
|
||||
# Copyright (c) 2018 SUSE LINUX GmbH, Nuernberg, Germany.
|
||||
#
|
||||
# All modifications and additions to the file contributed by third parties
|
||||
# remain the property of their copyright owners, unless otherwise agreed
|
||||
@@ -31,101 +31,20 @@ BuildRequires: perl-macros
|
||||
%{perl_requires}
|
||||
|
||||
%description
|
||||
The module's main purpose is to report repeated text fragments (typically
|
||||
Perl code) that could be considered for isolation and/or abstraction in
|
||||
order to reduce multiple copies of the same code (aka cut and paste code).
|
||||
The module reports repeated text fragments (typically Perl code) that
|
||||
could be considered for isolation and/or abstraction in order to
|
||||
reduce multiple copies of the same code (a.k.a. cut and paste code).
|
||||
|
||||
Code duplicates may occur in the same line, file or directory.
|
||||
|
||||
The ad hoc approach to compare every item against every other item leads to
|
||||
computing times growing exponentially with the amount of code, which is not
|
||||
useful for anything but the smallest code bases.
|
||||
|
||||
So a efficient data structure is needed.
|
||||
|
||||
This module can create the suffix array and the longest common prefix array
|
||||
for a string of 8-bit characters. These data structures can be used to
|
||||
search for repetitions of substrings in O(n) time.
|
||||
|
||||
The current strategy is to concatenate code from all files into one string
|
||||
and then use the suffix array and its companion, the longest-common-prefix
|
||||
(lcp) array on this string.
|
||||
|
||||
Example:
|
||||
Instead of real Perl code I use the string 'mississippi' for
|
||||
simplicity. A *suffix* is a partial string of an input string, which
|
||||
ends at the end of the input string. A *prefix* is a partial string of
|
||||
an input string, which starts at the start of the input string. The
|
||||
*suffix array* of a string is a list of offsets (each one for a
|
||||
suffix), which is sorted lexicographically by suffix:
|
||||
|
||||
# offset suffix
|
||||
================
|
||||
0 10: i
|
||||
1 7: ippi
|
||||
2 4: issippi
|
||||
3 1: ississippi
|
||||
4 0: mississippi
|
||||
5 9: pi
|
||||
6 8: ppi
|
||||
7 6: sippi
|
||||
8 3: sissippi
|
||||
9 5: ssippi
|
||||
10 2: ssissippi
|
||||
|
||||
The other structure needed is the *longest common prefix array* (lcp).
|
||||
It contains the maximal length of the prefixes for this entry shared
|
||||
with the previous entry from the suffix array. For this example it
|
||||
looks like this:
|
||||
|
||||
# offset lcp (common prefixes shown in ())
|
||||
=====================
|
||||
0 10: 0 ()
|
||||
1 7: 1 (i)
|
||||
2 4: 1 (i)
|
||||
3 1: 4 (issi) overlap!
|
||||
3 3 (iss) corrected non overlapping prefixes
|
||||
4 0: 0 ()
|
||||
5 9: 0 ()
|
||||
6 8: 1 (p)
|
||||
7 6: 0 ()
|
||||
8 3: 2 (si)
|
||||
9 5: 1 (s)
|
||||
10 2: 3 (ssi)
|
||||
|
||||
The standard lcp array may contain overlapping prefixes, but for our
|
||||
purposes we need only non overlapping prefixes lengths. The same
|
||||
overlap may occur for prefixes that extend from the end of one source
|
||||
file to the start of the next file when we use concatenated content of
|
||||
source files. The limiting with respect to internal overlaps and file
|
||||
crossing prefix lengths is done by two respective functions afterwards.
|
||||
|
||||
If we sort the so obtained lcp values in descending order we get
|
||||
|
||||
# offset lcp (prefix shown in ())
|
||||
===================================
|
||||
3 1: 3 (iss) now corrected to non overlapping prefixes
|
||||
10 2: 3 (ssi)
|
||||
8 3: 2 (si)
|
||||
1 7: 1 (i)
|
||||
2 4: 1 (i)
|
||||
6 8: 1 (p)
|
||||
9 5: 1 (s)
|
||||
0 10: 0 ()
|
||||
4 0: 0 ()
|
||||
5 9: 0 ()
|
||||
7 6: 0 ()
|
||||
|
||||
The first entry shows the longest repetition in the given string. Not
|
||||
all entries are of interest since smaller copies are contained in the
|
||||
longest match. After removing all 'shadowed' repetitions, the next
|
||||
entry can be reported. Finally the lcp values are too small to be of
|
||||
any interest.
|
||||
|
||||
Currently this is experimental code.
|
||||
|
||||
The most appropriate mailing list on which to discuss this module would
|
||||
be perl-qa. See http://lists.perl.org/list/perl-qa.html.
|
||||
Rather than the exponential-time ad-hoc approach to compare every
|
||||
item against every other item, this module can create the suffix
|
||||
array and the longest common prefix array for a string of 8-bit
|
||||
characters. These data structures can be used to search for
|
||||
repetitions of substrings in O(n) time. The current strategy is to
|
||||
concatenate code from all files into one string and then use the
|
||||
suffix array and its companion, the longest-common-prefix (lcp) array
|
||||
on this string.
|
||||
|
||||
%prep
|
||||
%setup -q -n %{cpan_name}-%{version}
|
||||
|
Reference in New Issue
Block a user