8
0
forked from pool/perl-Code-DRY

Accepting request 578020 from home:jengelh:branches:devel:languages:perl

- Compact description.

OBS-URL: https://build.opensuse.org/request/show/578020
OBS-URL: https://build.opensuse.org/package/show/devel:languages:perl/perl-Code-DRY?expand=0&rev=2
This commit is contained in:
2018-02-20 06:05:44 +00:00
committed by Git OBS Bridge
parent e8d7173271
commit 39b1d0bfe7
2 changed files with 17 additions and 93 deletions

View File

@@ -1,3 +1,8 @@
-------------------------------------------------------------------
Mon Feb 19 11:33:42 UTC 2018 - jengelh@inai.de
- Compact description.
-------------------------------------------------------------------
Wed Jan 25 11:37:54 UTC 2017 - okurz@suse.com

View File

@@ -1,7 +1,7 @@
#
# spec file for package perl-Code-DRY
#
# Copyright (c) 2016 SUSE LINUX GmbH, Nuernberg, Germany.
# Copyright (c) 2018 SUSE LINUX GmbH, Nuernberg, Germany.
#
# All modifications and additions to the file contributed by third parties
# remain the property of their copyright owners, unless otherwise agreed
@@ -31,101 +31,20 @@ BuildRequires: perl-macros
%{perl_requires}
%description
The module's main purpose is to report repeated text fragments (typically
Perl code) that could be considered for isolation and/or abstraction in
order to reduce multiple copies of the same code (aka cut and paste code).
The module reports repeated text fragments (typically Perl code) that
could be considered for isolation and/or abstraction in order to
reduce multiple copies of the same code (a.k.a. cut and paste code).
Code duplicates may occur in the same line, file or directory.
The ad hoc approach to compare every item against every other item leads to
computing times growing exponentially with the amount of code, which is not
useful for anything but the smallest code bases.
So a efficient data structure is needed.
This module can create the suffix array and the longest common prefix array
for a string of 8-bit characters. These data structures can be used to
search for repetitions of substrings in O(n) time.
The current strategy is to concatenate code from all files into one string
and then use the suffix array and its companion, the longest-common-prefix
(lcp) array on this string.
Example:
Instead of real Perl code I use the string 'mississippi' for
simplicity. A *suffix* is a partial string of an input string, which
ends at the end of the input string. A *prefix* is a partial string of
an input string, which starts at the start of the input string. The
*suffix array* of a string is a list of offsets (each one for a
suffix), which is sorted lexicographically by suffix:
# offset suffix
================
0 10: i
1 7: ippi
2 4: issippi
3 1: ississippi
4 0: mississippi
5 9: pi
6 8: ppi
7 6: sippi
8 3: sissippi
9 5: ssippi
10 2: ssissippi
The other structure needed is the *longest common prefix array* (lcp).
It contains the maximal length of the prefixes for this entry shared
with the previous entry from the suffix array. For this example it
looks like this:
# offset lcp (common prefixes shown in ())
=====================
0 10: 0 ()
1 7: 1 (i)
2 4: 1 (i)
3 1: 4 (issi) overlap!
3 3 (iss) corrected non overlapping prefixes
4 0: 0 ()
5 9: 0 ()
6 8: 1 (p)
7 6: 0 ()
8 3: 2 (si)
9 5: 1 (s)
10 2: 3 (ssi)
The standard lcp array may contain overlapping prefixes, but for our
purposes we need only non overlapping prefixes lengths. The same
overlap may occur for prefixes that extend from the end of one source
file to the start of the next file when we use concatenated content of
source files. The limiting with respect to internal overlaps and file
crossing prefix lengths is done by two respective functions afterwards.
If we sort the so obtained lcp values in descending order we get
# offset lcp (prefix shown in ())
===================================
3 1: 3 (iss) now corrected to non overlapping prefixes
10 2: 3 (ssi)
8 3: 2 (si)
1 7: 1 (i)
2 4: 1 (i)
6 8: 1 (p)
9 5: 1 (s)
0 10: 0 ()
4 0: 0 ()
5 9: 0 ()
7 6: 0 ()
The first entry shows the longest repetition in the given string. Not
all entries are of interest since smaller copies are contained in the
longest match. After removing all 'shadowed' repetitions, the next
entry can be reported. Finally the lcp values are too small to be of
any interest.
Currently this is experimental code.
The most appropriate mailing list on which to discuss this module would
be perl-qa. See http://lists.perl.org/list/perl-qa.html.
Rather than the exponential-time ad-hoc approach to compare every
item against every other item, this module can create the suffix
array and the longest common prefix array for a string of 8-bit
characters. These data structures can be used to search for
repetitions of substrings in O(n) time. The current strategy is to
concatenate code from all files into one string and then use the
suffix array and its companion, the longest-common-prefix (lcp) array
on this string.
%prep
%setup -q -n %{cpan_name}-%{version}