This repository was archived by the owner on Jan 10, 2023. It is now read-only.

Add new algorithm: suffixtree #199

Closed
wants to merge 29 commits into from


olleharstedt
Contributor

@olleharstedt olleharstedt commented Jun 17, 2021

Full description of algorithm here: https://www.cqse.eu/fileadmin/content/news/publications/2009-do-code-clones-matter.pdf

Original source available here: https://www.cqse.eu/en/news/blog/conqat-end-of-life/

This algorithm makes it possible to detect Type 3 clones.
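For illustration only (this is not code from the pull request): a Type 3 clone is usually understood as a copied fragment with statements added, removed, or changed. One way to picture how much two fragments differ is an edit distance over their token sequences. The sketch below assumes tokens are plain strings in list-style arrays; the function name `tokenEditDistance` is hypothetical.

```php
<?php declare(strict_types=1);

/**
 * Illustration only: edit distance between two token sequences.
 * Assumes both arrays are lists (keys 0..n-1) of string tokens.
 *
 * @param string[] $a
 * @param string[] $b
 */
function tokenEditDistance(array $a, array $b): int
{
    // Distance from the empty prefix of $a to every prefix of $b.
    $previous = range(0, count($b));

    foreach ($a as $i => $tokenA) {
        $current = [$i + 1];

        foreach ($b as $j => $tokenB) {
            $cost      = $tokenA === $tokenB ? 0 : 1;
            $current[] = min(
                $previous[$j + 1] + 1, // drop a token from $a
                $current[$j] + 1,      // insert a token from $b
                $previous[$j] + $cost  // keep or substitute
            );
        }

        $previous = $current;
    }

    return $previous[count($b)];
}

$original = ['$total', '=', '$price', '*', '$quantity', ';'];
$modified = ['$total', '=', '$price', '*', '$quantity', '+', '$shipping', ';'];

// The second fragment adds two tokens, so this prints 2.
echo tokenEditDistance($original, $modified), PHP_EOL;
```

A detector restricted to exact token matches would split these two fragments at the inserted tokens. A detector that tolerates a small edit distance could report them as one clone pair.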

Example run:

php phpcpd --algorithm suffixtree --min-tokens 25 QuestionTheme.php

Questions:

  • I need to add new options that the abstract strategy does not support. How about replacing the input arguments with a new class, StrategyConfiguration? The options needed are --edit-distance (int) and --head-equality (int).
  • To analyze more than one file, the input should be a list of tokens rather than a single file. Maybe there is a memory-saving mechanism in place right now where only the hashes are kept between analyses? That would work for the default Rabin-Karp algorithm, but maybe not for the new one. I have to double-check.
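A minimal sketch of what such a StrategyConfiguration class might look like. Only the class name and the option names --min-tokens, --edit-distance, and --head-equality come from this thread. The array keys, getter names, and default values below are assumptions made for illustration.

```php
<?php declare(strict_types=1);

/**
 * Hypothetical sketch of a StrategyConfiguration value object.
 * Option keys, getter names, and default values are assumptions,
 * not taken from the pull request.
 */
final class StrategyConfiguration
{
    /** @var int */
    private $minTokens;

    /** @var int */
    private $editDistance;

    /** @var int */
    private $headEquality;

    /**
     * @param array<string, mixed> $options parsed CLI options, keyed without the leading dashes
     */
    public function __construct(array $options = [])
    {
        // Placeholder defaults; the real values are up to the maintainers.
        $this->minTokens    = (int) ($options['min-tokens'] ?? 70);
        $this->editDistance = (int) ($options['edit-distance'] ?? 5);
        $this->headEquality = (int) ($options['head-equality'] ?? 10);
    }

    public function minTokens(): int
    {
        return $this->minTokens;
    }

    public function editDistance(): int
    {
        return $this->editDistance;
    }

    public function headEquality(): int
    {
        return $this->headEquality;
    }
}

// CLI values typically arrive as strings, hence the casts above.
$config = new StrategyConfiguration(['min-tokens' => '25', 'edit-distance' => '3']);

printf(
    "min-tokens=%d edit-distance=%d head-equality=%d\n",
    $config->minTokens(),
    $config->editDistance(),
    $config->headEquality()
);
```

Passing one configuration object to every strategy would let a new algorithm read its own options without changing the signature shared by the existing strategies.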

@olleharstedt olleharstedt changed the title from "Add new algorithm: suffix-tree" to "Add new algorithm: suffixtree" Jun 17, 2021
@olleharstedt
Contributor Author

Hm, Psalm reports lots of issues.

@olleharstedt
Contributor Author

Is it possible to configure cs-fixer so that it spits out the offending line?

  1. phpcpd/src/Detector/Strategy/SuffixTree/CloneInfo.php (ordered_class_elements, phpdoc_annotation_without_dot, phpdoc_scalar, no_superfluous_phpdoc_tags, declare_strict_types, no_blank_lines_before_namespace, no_trailing_whitespace_in_comment, phpdoc_summary, self_accessor, phpdoc_separation, header_comment, binary_operator_spaces)

@sebastianbergmann
Owner

sebastianbergmann commented Jun 23, 2021

Is it possible to configure cs-fixer so that it spits out the offending line?

Simply run php-cs-fixer on your machine like so:

$ ./tools/php-cs-fixer fix --dry-run --diff
Loaded config default from "/usr/local/src/phpcpd/.php-cs-fixer.dist.php".

   1) src/Detector/Detector.php
      ---------- begin diff ----------
--- /usr/local/src/phpcpd/src/Detector/Detector.php
+++ /usr/local/src/phpcpd/src/Detector/Detector.php
@@ -20,19 +20,11 @@
      */
     private $strategy;
 
-    /**
-     * @param AbstractStrategy $strategy
-     */
     public function __construct(AbstractStrategy $strategy)
     {
         $this->strategy = $strategy;
     }
 
-    /**
-     * @param iterable $files
-     * @param StrategyConfiguration $config
-     * @return CodeCloneMap
-     */
     public function copyPasteDetection(iterable $files, StrategyConfiguration $config): CodeCloneMap
     {
         $result = new CodeCloneMap;

      ----------- end diff -----------

   2) src/Detector/Strategy/SuffixTree/JavaObjectInterface.php
      ---------- begin diff ----------
--- /usr/local/src/phpcpd/src/Detector/Strategy/SuffixTree/JavaObjectInterface.php
+++ /usr/local/src/phpcpd/src/Detector/Strategy/SuffixTree/JavaObjectInterface.php
@@ -1,9 +1,17 @@
-<?php
-
+<?php declare(strict_types=1);
+/*
+ * This file is part of PHP Copy/Paste Detector (PHPCPD).
+ *
+ * (c) Sebastian Bergmann <sebastian@phpunit.de>
+ *
+ * For the full copyright and license information, please view the LICENSE
+ * file that was distributed with this source code.
+ */
 namespace SebastianBergmann\PHPCPD\Detector\Strategy\SuffixTree;
 
 interface JavaObjectInterface
 {
     public function hashCode(): int;
-    public function equals(JavaObjectInterface $obj): bool;
+
+    public function equals(self $obj): bool;
 }

      ----------- end diff -----------

   3) src/Detector/Strategy/SuffixTree/SuffixTree.php
      ---------- begin diff ----------
--- /usr/local/src/phpcpd/src/Detector/Strategy/SuffixTree/SuffixTree.php
+++ /usr/local/src/phpcpd/src/Detector/Strategy/SuffixTree/SuffixTree.php
@@ -1,22 +1,12 @@
-<?php
-
-/*-------------------------------------------------------------------------+
-|                                                                          |
-| Copyright 2005-2011 The ConQAT Project                                   |
-|                                                                          |
-| Licensed under the Apache License, Version 2.0 (the "License");          |
-| you may not use this file except in compliance with the License.         |
-| You may obtain a copy of the License at                                  |
-|                                                                          |
-|    http://www.apache.org/licenses/LICENSE-2.0                            |
-|                                                                          |
-| Unless required by applicable law or agreed to in writing, software      |
-| distributed under the License is distributed on an "AS IS" BASIS,        |
-| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
-| See the License for the specific language governing permissions and      |
-| limitations under the License.                                           |
-+-------------------------------------------------------------------------*/
-
+<?php declare(strict_types=1);
+/*
+ * This file is part of PHP Copy/Paste Detector (PHPCPD).
+ *
+ * (c) Sebastian Bergmann <sebastian@phpunit.de>
+ *
+ * For the full copyright and license information, please view the LICENSE
+ * file that was distributed with this source code.
+ */
 namespace SebastianBergmann\PHPCPD\Detector\Strategy\SuffixTree;
 
 /**
@@ -41,10 +31,10 @@
  * <p>
  * Everything but the construction itself is protected to simplify increasing
  * its functionality by subclassing but without introducing new method calls.
- * 
+ *
  * @author Benjamin Hummel
  * @author $Author: kinnen $
- * 
+ *
  * @version $Revision: 41751 $
  * @ConQAT.Rating GREEN Hash: 4B2EF0606B3085A6831764ED042FF20D
  */
@@ -52,240 +42,284 @@
 {
     /**
      * Infinity in this context.
-     * @var int 
+     *
+     * @var int
      */
-	protected $INFTY;
+    protected $INFTY;
 
     /** The word we are working on.
-        * @var array */
-	protected $word;
+     * @var array */
+    protected $word;
 
     /** The number of nodes created so far.
-        * @var int */
-	protected $numNodes = 0;
+     * @var int */
+    protected $numNodes = 0;
 
-	/**
-	 * For each node this holds the index of the first character of
-	 * {@link #word} labeling the transition <b>to</b> this node. This
-	 * corresponds to the <em>k</em> for a transition used in Ukkonen's paper.
+    /**
+     * For each node this holds the index of the first character of
+     * {@link #word} labeling the transition <b>to</b> this node. This
+     * corresponds to the <em>k</em> for a transition used in Ukkonen's paper.
      *
      * @var int[]
-	 */
-	protected $nodeWordBegin;
+     */
+    protected $nodeWordBegin;
 
-	/**
-	 * For each node this holds the index of the one after the last character of
-	 * {@link #word} labeling the transition <b>to</b> this node. This
-	 * corresponds to the <em>p</em> for a transition used in Ukkonen's paper.
+    /**
+     * For each node this holds the index of the one after the last character of
+     * {@link #word} labeling the transition <b>to</b> this node. This
+     * corresponds to the <em>p</em> for a transition used in Ukkonen's paper.
      *
      * @var int[]
-	 */
-	protected $nodeWordEnd;
+     */
+    protected $nodeWordEnd;
 
     /** For each node its suffix link (called function <em>f</em> by Ukkonen).
-        * @var int[] */
-	protected $suffixLink;
+     * @var int[] */
+    protected $suffixLink;
 
-	/**
-	 * The next node function realized as a hash table. This corresponds to the
-	 * <em>g</em> function used in Ukkonen's paper.
+    /**
+     * The next node function realized as a hash table. This corresponds to the
+     * <em>g</em> function used in Ukkonen's paper.
      *
      * @var SuffixTreeHashTable
-	 */
-	protected $nextNode;
+     */
+    protected $nextNode;
 
-	/**
-	 * An array giving for each node the index where the first child will be
-	 * stored (or -1 if it has no children). It is initially empty and will be
-	 * filled "on demand" using
-	 * {@link org.conqat.engine.code_clones.detection.suffixtree.SuffixTreeHashTable#extractChildLists(int[], int[], int[])}
-	 * .
+    /**
+     * An array giving for each node the index where the first child will be
+     * stored (or -1 if it has no children). It is initially empty and will be
+     * filled "on demand" using
+     * {@link org.conqat.engine.code_clones.detection.suffixtree.SuffixTreeHashTable#extractChildLists(int[], int[], int[])}
+     * .
      *
      * @var int[]
-	 */
-	protected $nodeChildFirst;
+     */
+    protected $nodeChildFirst;
 
-	/**
-	 * This array gives the next index of the child list or -1 if this is the
-	 * last one. It is initially empty and will be filled "on demand" using
-	 * {@link org.conqat.engine.code_clones.detection.suffixtree.SuffixTreeHashTable#extractChildLists(int[], int[], int[])}
-	 * .
+    /**
+     * This array gives the next index of the child list or -1 if this is the
+     * last one. It is initially empty and will be filled "on demand" using
+     * {@link org.conqat.engine.code_clones.detection.suffixtree.SuffixTreeHashTable#extractChildLists(int[], int[], int[])}
+     * .
      *
      * @var int[]
-	 */
-	protected $nodeChildNext;
+     */
+    protected $nodeChildNext;
 
-	/**
-	 * This array stores the actual name (=number) of the mode in the child
-	 * list. It is initially empty and will be filled "on demand" using
-	 * {@link org.conqat.engine.code_clones.detection.suffixtree.SuffixTreeHashTable#extractChildLists(int[], int[], int[])}
-	 * .
+    /**
+     * This array stores the actual name (=number) of the mode in the child
+     * list. It is initially empty and will be filled "on demand" using
+     * {@link org.conqat.engine.code_clones.detection.suffixtree.SuffixTreeHashTable#extractChildLists(int[], int[], int[])}
+     * .
      *
      * @var int[]
-	 */
-	protected $nodeChildNode;
+     */
+    protected $nodeChildNode;
 
-	/**
-	 * The node we are currently at as a "global" variable (as it is always
-	 * passed unchanged). This is called <i>s</i> in Ukkonen's paper.
+    /**
+     * The node we are currently at as a "global" variable (as it is always
+     * passed unchanged). This is called <i>s</i> in Ukkonen's paper.
      *
      * @var int
-	 */
-	private $currentNode = 0;
+     */
+    private $currentNode = 0;
 
-	/**
-	 * Beginning of the word part of the reference pair. This is kept "global"
-	 * (in constrast to the end) as this is passed unchanged to all functions.
-	 * Ukkonen calls this <em>k</em>.
+    /**
+     * Beginning of the word part of the reference pair. This is kept "global"
+     * (in constrast to the end) as this is passed unchanged to all functions.
+     * Ukkonen calls this <em>k</em>.
      *
      * @var int
-	 */
-	private $refWordBegin = 0;
+     */
+    private $refWordBegin = 0;
 
-	/**
-	 * This is the new (or old) explicit state as returned by
-	 * {@link #testAndSplit(int, Object)}. Ukkonen calls this <em>r</em>.
+    /**
+     * This is the new (or old) explicit state as returned by
+     * {@link #testAndSplit(int, Object)}. Ukkonen calls this <em>r</em>.
      *
      * @var int
-	 */
-	private $explicitNode;
+     */
+    private $explicitNode;
 
-	/**
-	 * Create a new suffix tree from a given word. The word given as parameter
-	 * is used internally and should not be modified anymore, so copy it before
-	 * if required.
-     *
-     * @param array $word
-	 */
+    /**
+     * Create a new suffix tree from a given word. The word given as parameter
+     * is used internally and should not be modified anymore, so copy it before
+     * if required.
+     */
     public function __construct(array $word)
     {
-		$this->word = $word;
-		$size = count($word);
-		$this->INFTY = $size;
+        $this->word  = $word;
+        $size        = count($word);
+        $this->INFTY = $size;
 
-		$expectedNodes = 2 * $size;
+        $expectedNodes       = 2 * $size;
         $this->nodeWordBegin = array_fill(0, $expectedNodes, 0);
-		$this->nodeWordEnd = array_fill(0, $expectedNodes, 0);
-		$this->suffixLink = array_fill(0, $expectedNodes, 0);
-		$this->nextNode = new SuffixTreeHashTable($expectedNodes);
+        $this->nodeWordEnd   = array_fill(0, $expectedNodes, 0);
+        $this->suffixLink    = array_fill(0, $expectedNodes, 0);
+        $this->nextNode      = new SuffixTreeHashTable($expectedNodes);
 
-		$this->createRootNode();
+        $this->createRootNode();
 
-		for ($i = 0; $i < $size; ++$i) {
-			$this->update($i);
+        for ($i = 0; $i < $size; $i++) {
+            $this->update($i);
             $this->canonize($i + 1);
-		}
-	}
+        }
+    }
 
     /**
+     * Returns whether the given word is contained in the string given at
+     * construction time.
+     *
+     * @return bool
+     */
+    public function containsWord(array $find)
+    {
+        $node     = 0;
+        $findSize = count($find);
+
+        for ($i = 0; $i < $findSize;) {
+            /** @var int */
+            $next = $this->nextNode->get($node, $find[$i]);
+
+            if ($next < 0) {
+                return false;
+            }
+
+            for ($j = $this->nodeWordBegin[$next]; $j < $this->nodeWordEnd[$next] && $i < $findSize; ++$i, ++$j) {
+                if (!$this->word[$j]->equals($find[$i])) {
+                    return false;
+                }
+            }
+            $node = $next;
+        }
+
+        return true;
+    }
+
+    /**
+     * This method makes sure the child lists are filled (required for
+     * traversing the tree).
+     */
+    protected function ensureChildLists(): void
+    {
+        if ($this->nodeChildFirst == null || count($this->nodeChildFirst) < $this->numNodes) {
+            $this->nodeChildFirst = array_fill(0, $this->numNodes, 0);
+            $this->nodeChildNext  = array_fill(0, $this->numNodes, 0);
+            $this->nodeChildNode  = array_fill(0, $this->numNodes, 0);
+            $this->nextNode->extractChildLists($this->nodeChildFirst, $this->nodeChildNext, $this->nodeChildNode);
+        }
+    }
+
+    /**
      * Creates the root node.
-     *
-     * @return void
      */
-    private function createRootNode()
+    private function createRootNode(): void
     {
-		$this->numNodes = 1;
-		$this->nodeWordBegin[0] = 0;
-		$this->nodeWordEnd[0] = 0;
-		$this->suffixLink[0] = -1;
-	}
+        $this->numNodes         = 1;
+        $this->nodeWordBegin[0] = 0;
+        $this->nodeWordEnd[0]   = 0;
+        $this->suffixLink[0]    = -1;
+    }
 
-	/**
-	 * The <em>update</em> function as defined in Ukkonen's paper. This inserts
-	 * the character at charPos into the tree. It works on the canonical
-	 * reference pair ({@link #currentNode}, ({@link #refWordBegin}, charPos)).
-     *
-     * @param int $charPos
-     * @return void
-	 */
-	private function update(int $charPos) {
-		$lastNode = 0;
-		while (!$this->testAndSplit($charPos, $this->word[$charPos])) {
-			$newNode = $this->numNodes++;
-			$this->nodeWordBegin[$newNode] = $charPos;
-			$this->nodeWordEnd[$newNode] = $this->INFTY;
-			$this->nextNode->put($this->explicitNode, $this->word[$charPos], $newNode);
+    /**
+     * The <em>update</em> function as defined in Ukkonen's paper. This inserts
+     * the character at charPos into the tree. It works on the canonical
+     * reference pair ({@link #currentNode}, ({@link #refWordBegin}, charPos)).
+     */
+    private function update(int $charPos): void
+    {
+        $lastNode = 0;
 
-			if ($lastNode != 0) {
-				$this->suffixLink[$lastNode] = $this->explicitNode;
-			}
-			$lastNode = $this->explicitNode;
-			$this->currentNode = $this->suffixLink[$this->currentNode];
-			$this->canonize($charPos);
-		}
-		if ($lastNode != 0) {
-			$this->suffixLink[$lastNode] = $this->currentNode;
-		}
-	}
+        while (!$this->testAndSplit($charPos, $this->word[$charPos])) {
+            $newNode                       = $this->numNodes++;
+            $this->nodeWordBegin[$newNode] = $charPos;
+            $this->nodeWordEnd[$newNode]   = $this->INFTY;
+            $this->nextNode->put($this->explicitNode, $this->word[$charPos], $newNode);
 
-	/**
-	 * The <em>test-and-split</em> function as defined in Ukkonen's paper. This
-	 * checks whether the state given by the canonical reference pair (
-	 * {@link #currentNode}, ({@link #refWordBegin}, refWordEnd)) is the end
-	 * point (by checking whether a transition for the
-	 * <code>nextCharacter</code> exists). Additionally the state is made
-	 * explicit if it not already is and this is not the end-point. It returns
-	 * true if the end-point was reached. The newly created (or reached)
-	 * explicit node is returned in the "global" variable.
+            if ($lastNode != 0) {
+                $this->suffixLink[$lastNode] = $this->explicitNode;
+            }
+            $lastNode          = $this->explicitNode;
+            $this->currentNode = $this->suffixLink[$this->currentNode];
+            $this->canonize($charPos);
+        }
+
+        if ($lastNode != 0) {
+            $this->suffixLink[$lastNode] = $this->currentNode;
+        }
+    }
+
+    /**
+     * The <em>test-and-split</em> function as defined in Ukkonen's paper. This
+     * checks whether the state given by the canonical reference pair (
+     * {@link #currentNode}, ({@link #refWordBegin}, refWordEnd)) is the end
+     * point (by checking whether a transition for the
+     * <code>nextCharacter</code> exists). Additionally the state is made
+     * explicit if it not already is and this is not the end-point. It returns
+     * true if the end-point was reached. The newly created (or reached)
+     * explicit node is returned in the "global" variable.
      *
-     * @param int $refWordEnd
      * @param object $nextCharacter
-     * @return boolean
-	 */
+     *
+     * @return bool
+     */
     private function testAndSplit(int $refWordEnd, JavaObjectInterface $nextCharacter)
     {
-		if ($this->currentNode < 0) {
-			// trap state is always end state
-			return true;
-		}
+        if ($this->currentNode < 0) {
+            // trap state is always end state
+            return true;
+        }
 
-		if ($refWordEnd <= $this->refWordBegin) {
-			if ($this->nextNode->get($this->currentNode, $nextCharacter) < 0) {
-				$this->explicitNode = $this->currentNode;
-				return false;
-			}
-			return true;
-		}
+        if ($refWordEnd <= $this->refWordBegin) {
+            if ($this->nextNode->get($this->currentNode, $nextCharacter) < 0) {
+                $this->explicitNode = $this->currentNode;
 
+                return false;
+            }
+
+            return true;
+        }
+
         /** @var int */
-		$next = $this->nextNode->get($this->currentNode, $this->word[$this->refWordBegin]);
-		if ($nextCharacter->equals($this->word[$this->nodeWordBegin[$next] + $refWordEnd - $this->refWordBegin])) {
-			return true;
-		}
+        $next = $this->nextNode->get($this->currentNode, $this->word[$this->refWordBegin]);
 
-		// not an end-point and not explicit, so make it explicit.
-		$this->explicitNode = $this->numNodes++;
-		$this->nodeWordBegin[$this->explicitNode] = $this->nodeWordBegin[$next];
-		$this->nodeWordEnd[$this->explicitNode] = $this->nodeWordBegin[$next] + $refWordEnd - $this->refWordBegin;
-		$this->nextNode->put($this->currentNode, $this->word[$this->refWordBegin], $this->explicitNode);
+        if ($nextCharacter->equals($this->word[$this->nodeWordBegin[$next] + $refWordEnd - $this->refWordBegin])) {
+            return true;
+        }
 
-		$this->nodeWordBegin[$next] += $refWordEnd - $this->refWordBegin;
-		$this->nextNode->put($this->explicitNode, $this->word[$this->nodeWordBegin[$next]], $next);
-		return false;
-	}
+        // not an end-point and not explicit, so make it explicit.
+        $this->explicitNode                       = $this->numNodes++;
+        $this->nodeWordBegin[$this->explicitNode] = $this->nodeWordBegin[$next];
+        $this->nodeWordEnd[$this->explicitNode]   = $this->nodeWordBegin[$next] + $refWordEnd - $this->refWordBegin;
+        $this->nextNode->put($this->currentNode, $this->word[$this->refWordBegin], $this->explicitNode);
 
-	/**
-	 * The <em>canonize</em> function as defined in Ukkonen's paper. Changes the
-	 * reference pair (currentNode, (refWordBegin, refWordEnd)) into a canonical
-	 * reference pair. It works on the "global" variables {@link #currentNode}
-	 * and {@link #refWordBegin} and the parameter, writing the result back to
-	 * the globals.
-	 * 
-	 * @param int $refWordEnd one after the end index for the word of the reference pair.
-     * @return void
-	 */
+        $this->nodeWordBegin[$next] += $refWordEnd - $this->refWordBegin;
+        $this->nextNode->put($this->explicitNode, $this->word[$this->nodeWordBegin[$next]], $next);
+
+        return false;
+    }
+
+    /**
+     * The <em>canonize</em> function as defined in Ukkonen's paper. Changes the
+     * reference pair (currentNode, (refWordBegin, refWordEnd)) into a canonical
+     * reference pair. It works on the "global" variables {@link #currentNode}
+     * and {@link #refWordBegin} and the parameter, writing the result back to
+     * the globals.
+     *
+     * @param int $refWordEnd one after the end index for the word of the reference pair
+     */
     private function canonize(int $refWordEnd): void
     {
-		if ($this->currentNode === -1) {
-			// explicitly handle trap state
-			$this->currentNode = 0;
-			$this->refWordBegin++;
-		}
+        if ($this->currentNode === -1) {
+            // explicitly handle trap state
+            $this->currentNode = 0;
+            $this->refWordBegin++;
+        }
 
-		if ($refWordEnd <= $this->refWordBegin) {
-			// empty word, so already canonical
-			return;
-		}
+        if ($refWordEnd <= $this->refWordBegin) {
+            // empty word, so already canonical
+            return;
+        }
 
         /** @var int */
         $next = $this->nextNode->get(
@@ -292,57 +326,17 @@
             $this->currentNode,
             $this->word[$this->refWordBegin]
         );
-		while ($this->nodeWordEnd[$next] - $this->nodeWordBegin[$next] <= $refWordEnd
-				- $this->refWordBegin) {
-                $this->refWordBegin += $this->nodeWordEnd[$next] - $this->nodeWordBegin[$next];
-                $this->currentNode = $next;
-                if ($refWordEnd > $this->refWordBegin) {
-                    $next = $this->nextNode->get($this->currentNode, $this->word[$this->refWordBegin]);
-                } else {
-                    break;
-                }
+
+        while ($this->nodeWordEnd[$next] - $this->nodeWordBegin[$next] <= $refWordEnd
+                - $this->refWordBegin) {
+            $this->refWordBegin += $this->nodeWordEnd[$next] - $this->nodeWordBegin[$next];
+            $this->currentNode = $next;
+
+            if ($refWordEnd > $this->refWordBegin) {
+                $next = $this->nextNode->get($this->currentNode, $this->word[$this->refWordBegin]);
+            } else {
+                break;
+            }
         }
     }
-
-	/**
-	 * This method makes sure the child lists are filled (required for
-	 * traversing the tree).
-     *
-     * @return void
-	 */
-    protected function ensureChildLists()
-    {
-		if ($this->nodeChildFirst == null || count($this->nodeChildFirst) < $this->numNodes) {
-			$this->nodeChildFirst = array_fill(0, $this->numNodes, 0);
-			$this->nodeChildNext = array_fill(0, $this->numNodes, 0);
-			$this->nodeChildNode = array_fill(0, $this->numNodes, 0);
-			$this->nextNode->extractChildLists($this->nodeChildFirst, $this->nodeChildNext, $this->nodeChildNode);
-		}
-	}
-
-	/**
-	 * Returns whether the given word is contained in the string given at
-	 * construction time.
-     *
-     * @param array $find
-     * @return boolean
-	 */
-	public function containsWord(array $find) {
-		$node = 0;
-		$findSize = count($find);
-		for ($i = 0; $i < $findSize;) {
-            /** @var int */
-			$next = $this->nextNode->get($node, $find[$i]);
-			if ($next < 0) {
-				return false;
-			}
-			for ($j = $this->nodeWordBegin[$next]; $j < $this->nodeWordEnd[$next] && $i < $findSize; ++$i, ++$j) {
-				if (!$this->word[$j]->equals($find[$i])) {
-					return false;
-				}
-			}
-			$node = $next;
-		}
-		return true;
-	}
 }

      ----------- end diff -----------

   4) src/Detector/Strategy/SuffixTree/PairList.php
      ---------- begin diff ----------
--- /usr/local/src/phpcpd/src/Detector/Strategy/SuffixTree/PairList.php
+++ /usr/local/src/phpcpd/src/Detector/Strategy/SuffixTree/PairList.php
@@ -1,28 +1,19 @@
-<?php
-
-/*-------------------------------------------------------------------------+
-|                                                                          |
-| Copyright 2005-2011 The ConQAT Project                                   |
-|                                                                          |
-| Licensed under the Apache License, Version 2.0 (the "License");          |
-| you may not use this file except in compliance with the License.         |
-| You may obtain a copy of the License at                                  |
-|                                                                          |
-|    http://www.apache.org/licenses/LICENSE-2.0                            |
-|                                                                          |
-| Unless required by applicable law or agreed to in writing, software      |
-| distributed under the License is distributed on an "AS IS" BASIS,        |
-| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
-| See the License for the specific language governing permissions and      |
-| limitations under the License.                                           |
-+-------------------------------------------------------------------------*/
-
+<?php declare(strict_types=1);
+/*
+ * This file is part of PHP Copy/Paste Detector (PHPCPD).
+ *
+ * (c) Sebastian Bergmann <sebastian@phpunit.de>
+ *
+ * For the full copyright and license information, please view the LICENSE
+ * file that was distributed with this source code.
+ */
 namespace SebastianBergmann\PHPCPD\Detector\Strategy\SuffixTree;
 
 /**
  * A list for storing pairs in a specific order.
- * 
+ *
  * @author $Author: hummelb $
+ *
  * @version $Rev: 51770 $
  * @ConQAT.Rating GREEN Hash: 7459D6D0F59028B37DD23DD091BDCEEA
  */
@@ -30,231 +21,247 @@
 {
     /**
      * Version used for serialization.
+     *
      * @var int
      */
-	private $serialVersionUID = 1;
+    private $serialVersionUID = 1;
 
     /**
      * The current size.
+     *
      * @var int
      */
-	private $size = 0;
+    private $size = 0;
 
     /**
      * The array used for storing the S.
+     *
      * @var object[]
      */
-	private $firstElements;
+    private $firstElements;
 
     /**
      * The array used for storing the T.
-     * @var object[] 
+     *
+     * @var object[]
      */
-	private $secondElements;
+    private $secondElements;
 
     public function __construct(int $initialCapacity = 16)
     {
-		if ($initialCapacity < 1) {
-			$initialCapacity = 1;
-		}
-        $this->firstElements = array_fill(0, $initialCapacity, null);
+        if ($initialCapacity < 1) {
+            $initialCapacity = 1;
+        }
+        $this->firstElements  = array_fill(0, $initialCapacity, null);
         $this->secondElements = array_fill(0, $initialCapacity, null);
-	}
+    }
 
-
-	/** Returns whether the list is empty. */
+    /** Returns whether the list is empty. */
     public function isEmpty(): bool
     {
-		return $this->size == 0;
-	}
+        return $this->size == 0;
+    }
 
-	/** Returns the size of the list. */
+    /** Returns the size of the list. */
     public function size(): int
     {
-		return $this->size;
-	}
+        return $this->size;
+    }
 
     /**
      * Add the given pair to the list.
-     * @return void
      */
     public function add($first, $second): void
     {
-		$this->firstElements[$this->size] = $first;
-		$this->secondElements[$this->size] = $second;
-		++$this->size;
-	}
+        $this->firstElements[$this->size]  = $first;
+        $this->secondElements[$this->size] = $second;
+        $this->size++;
+    }
 
-	/** Adds all pairs from another list. */
-    public function addAll(PairList $other): void
+    /** Adds all pairs from another list. */
+    public function addAll(self $other): void
     {
-		// we have to store this in a local var, as other.$this->size may change if
-		// other == this
-		$otherSize = $other->size;
+        // we have to store this in a local var, as other.$this->size may change if
+        // other == this
+        $otherSize = $other->size;
 
-		for ($i = 0; $i < $otherSize; ++$i) {
-			$this->firstElements[$this->size] = $other->firstElements[$i];
-			$this->secondElements[$this->size] = $other->secondElements[$i];
-			++$this->size;
-		}
-	}
+        for ($i = 0; $i < $otherSize; $i++) {
+            $this->firstElements[$this->size]  = $other->firstElements[$i];
+            $this->secondElements[$this->size] = $other->secondElements[$i];
+            $this->size++;
+        }
+    }
 
-	/** Make sure there is space for at least the given amount of elements. */
-    protected function ensureSpace(int $space): void
-    {
-		if ($space <= count($this->firstElements)) {
-			return;
-		}
-
-		$oldFirst = $this->firstElements;
-		$oldSecond = $this->secondElements;
-		$newSize = count($this->firstElements) * 2;
-		while ($newSize < $space) {
-			$newSize *= 2;
-		}
-	}
-
-	/** Returns the first element at given index. */
+    /** Returns the first element at given index. */
     public function getFirst(int $i)
     {
-		$this->checkWithinBounds($i);
-		return $this->firstElements[$i];
-	}
+        $this->checkWithinBounds($i);
 
-	/**
-	 * Checks whether the given <code>$i</code> is within the bounds. Throws an
-	 * exception otherwise.
-	 */
-    private function checkWithinBounds(int $i): void
-    {
-		if ($i < 0 || $i >= $this->size) {
-			throw new Exception("Out of bounds: " + $i);
-		}
-	}
+        return $this->firstElements[$i];
+    }
 
-	/** Sets the first element at given index. */
+    /** Sets the first element at given index. */
     public function setFirst(int $i, $value): void
     {
-		$this->checkWithinBounds($i);
-		$this->firstElements[$i] = $value;
-	}
+        $this->checkWithinBounds($i);
+        $this->firstElements[$i] = $value;
+    }
 
-	/** Returns the second element at given index. */
+    /** Returns the second element at given index. */
     public function getSecond(int $i)
     {
-		$this->checkWithinBounds($i);
-		return $this->secondElements[$i];
-	}
+        $this->checkWithinBounds($i);
 
-	/** Sets the first element at given index. */
+        return $this->secondElements[$i];
+    }
+
+    /** Sets the first element at given index. */
     public function setSecond(int $i, $value): void
     {
-		$this->checkWithinBounds($i);
-		$this->secondElements[$i] = $value;
-	}
+        $this->checkWithinBounds($i);
+        $this->secondElements[$i] = $value;
+    }
 
-	/** Creates a new list containing all first elements. */
+    /** Creates a new list containing all first elements. */
     public function extractFirstList(): array
     {
-		//array $result = new ArrayList<S>($this->size + 1);
-		$result = [];
-		for ($i = 0; $i < $this->size; ++$i) {
-			$result[] = $this->firstElements[$i];
-		}
-		return $result;
-	}
+        //array $result = new ArrayList<S>($this->size + 1);
+        $result = [];
 
-	/** Creates a new list containing all second elements. */
+        for ($i = 0; $i < $this->size; $i++) {
+            $result[] = $this->firstElements[$i];
+        }
+
+        return $result;
+    }
+
+    /** Creates a new list containing all second elements. */
     public function extractSecondList(): array
     {
-		//$result = new ArrayList<T>($this->size + 1);
-		$result = [];
-		for ($i = 0; $i < $this->size; ++$i) {
-			$result[] = $this->secondElements[$i];
-		}
-		return $result;
-	}
+        //$result = new ArrayList<T>($this->size + 1);
+        $result = [];
 
-	/**
-	 * Swaps the pairs of this list. Is S and T are different types, this will
-	 * be extremely dangerous.
-	 */
+        for ($i = 0; $i < $this->size; $i++) {
+            $result[] = $this->secondElements[$i];
+        }
+
+        return $result;
+    }
+
+    /**
+     * Swaps the pairs of this list. If S and T are different types, this will
+     * be extremely dangerous.
+     */
     public function swapPairs(): void
     {
-        $temp = $this->firstElements;
-		$this->firstElements = $this->secondElements;
-		$this->secondElements = $temp;
-	}
+        $temp                 = $this->firstElements;
+        $this->firstElements  = $this->secondElements;
+        $this->secondElements = $temp;
+    }
 
-	/** Swaps the entries located at indexes $i and $j. */
+    /** Swaps the entries located at indexes $i and $j. */
     public function swapEntries(int $i, int $j): void
     {
-		$tmp1 = $this->getFirst($i);
-		$tmp2 = $this->getSecond($i);
-		$this->setFirst($i, $this->getFirst($j));
-		$this->setSecond($i, $this->getSecond($j));
-		$this->setFirst($j, $tmp1);
-		$this->setSecond($j, $tmp2);
-	}
+        $tmp1 = $this->getFirst($i);
+        $tmp2 = $this->getSecond($i);
+        $this->setFirst($i, $this->getFirst($j));
+        $this->setSecond($i, $this->getSecond($j));
+        $this->setFirst($j, $tmp1);
+        $this->setSecond($j, $tmp2);
+    }
 
-	/** Clears this list. */
+    /** Clears this list. */
     public function clear(): void
     {
-		$this->size = 0;
-	}
+        $this->size = 0;
+    }
 
-	/** Removes the last element of the list. */
+    /** Removes the last element of the list. */
     public function removeLast(): void
     {
-		$this->size -= 1;
-	}
+        $this->size--;
+    }
 
     public function toString(): string
     {
-		$result = '';
-		$result += ('[');
-		for ($i = 0; $i < $this->size; $i++) {
-			if ($i != 0) {
-				$result .= ',';
-			}
-			$result .= '(';
-			$result .= (string) $this->firstElements[$i];
-			$result .= ',';
-			$result .= (string) $this->secondElements[$i];
-			$result .= ')';
-		}
-		$result .= ']';
-		return $result;
-	}
+        $result = '';
+        $result .= '[';
 
+        for ($i = 0; $i < $this->size; $i++) {
+            if ($i != 0) {
+                $result .= ',';
+            }
+            $result .= '(';
+            $result .= (string) $this->firstElements[$i];
+            $result .= ',';
+            $result .= (string) $this->secondElements[$i];
+            $result .= ')';
+        }
+        $result .= ']';
+
+        return $result;
+    }
+
     public function hashCode(): int
     {
-		$prime = 31;
-		$hash = $this->size;
-		$hash = $prime * $hash + crc32(serialize($this->firstElements));
-		return $prime * $hash + crc32(serialize($this->secondElements));
-	}
+        $prime = 31;
+        $hash  = $this->size;
+        $hash  = $prime * $hash + crc32(serialize($this->firstElements));
 
-    public function equals(PairList $obj): bool
+        return $prime * $hash + crc32(serialize($this->secondElements));
+    }
+
+    public function equals(self $obj): bool
     {
         // TODO: Doesn't work in PHP
-		if ($this === $obj) {
-			return true;
-		}
-		if (!($obj instanceof PairList)) {
-			return false;
-		}
+        if ($this === $obj) {
+            return true;
+        }
 
-		$other = $obj;
-		if ($this->size !== $other->size) {
-			return false;
-		}
-		for ($i = 0; $i < $this->size; $i++) {
-			if (!($this->firstElements[$i] == $other->firstElements[$i])
-					|| !($this->secondElements[$i] != $this->secondElements[$i])) {
-				return false;
-			}
-		}
-		return true;
-	}
+        if (!($obj instanceof self)) {
+            return false;
+        }
+
+        $other = $obj;
+
+        if ($this->size !== $other->size) {
+            return false;
+        }
+
+        for ($i = 0; $i < $this->size; $i++) {
+            if (!($this->firstElements[$i] == $other->firstElements[$i]) ||
+                    !($this->secondElements[$i] == $other->secondElements[$i])) {
+                return false;
+            }
+        }
+
+        return true;
+    }
+
+    /** Make sure there is space for at least the given amount of elements. */
+    protected function ensureSpace(int $space): void
+    {
+        if ($space <= count($this->firstElements)) {
+            return;
+        }
+
+        $oldFirst  = $this->firstElements;
+        $oldSecond = $this->secondElements;
+        $newSize   = count($this->firstElements) * 2;
+
+        while ($newSize < $space) {
+            $newSize *= 2;
+        }
+    }
+
+    /**
+     * Checks whether the given <code>$i</code> is within the bounds. Throws an
+     * exception otherwise.
+     */
+    private function checkWithinBounds(int $i): void
+    {
+        if ($i < 0 || $i >= $this->size) {
+            throw new Exception('Out of bounds: ' . $i);
+        }
+    }
 }

      ----------- end diff -----------

   5) src/Detector/Strategy/SuffixTree/SuffixTreeHashTable.php
      ---------- begin diff ----------
--- /usr/local/src/phpcpd/src/Detector/Strategy/SuffixTree/SuffixTreeHashTable.php
+++ /usr/local/src/phpcpd/src/Detector/Strategy/SuffixTree/SuffixTreeHashTable.php
@@ -1,22 +1,12 @@
-<?php
-
-/*-------------------------------------------------------------------------+
-|                                                                          |
-| Copyright 2005-2011 The ConQAT Project                                   |
-|                                                                          |
-| Licensed under the Apache License, Version 2.0 (the "License");          |
-| you may not use this file except in compliance with the License.         |
-| You may obtain a copy of the License at                                  |
-|                                                                          |
-|    http://www.apache.org/licenses/LICENSE-2.0                            |
-|                                                                          |
-| Unless required by applicable law or agreed to in writing, software      |
-| distributed under the License is distributed on an "AS IS" BASIS,        |
-| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
-| See the License for the specific language governing permissions and      |
-| limitations under the License.                                           |
-+-------------------------------------------------------------------------*/
-
+<?php declare(strict_types=1);
+/*
+ * This file is part of PHP Copy/Paste Detector (PHPCPD).
+ *
+ * (c) Sebastian Bergmann <sebastian@phpunit.de>
+ *
+ * For the full copyright and license information, please view the LICENSE
+ * file that was distributed with this source code.
+ */
 namespace SebastianBergmann\PHPCPD\Detector\Strategy\SuffixTree;
 
 /**
@@ -27,215 +17,229 @@
  * It hashes from (node, character) pairs to the next node, where nodes are
  * represented by integers and the type of characters is determined by the
  * generic parameter.
- * 
+ *
  * @author Benjamin Hummel
  * @author $Author: juergens $
- * 
+ *
  * @version $Revision: 34670 $
  * @ConQAT.Rating GREEN Hash: 6A7A830078AF0CA9C2D84C148F336DF4
  */
 class SuffixTreeHashTable
 {
-	/**
-	 * These numbers were taken from
-	 * http://planetmath.org/encyclopedia/GoodHashTablePrimes.html
+    /**
+     * These numbers were taken from
+     * http://planetmath.org/encyclopedia/GoodHashTablePrimes.html.
+     *
      * @var int[]
-	 */
-	private $allowedSizes = [ 53, 97, 193, 389, 769, 1543,
-			3079, 6151, 12289, 24593, 49157, 98317, 196613, 393241, 786433,
-			1572869, 3145739, 6291469, 12582917, 25165843, 50331653, 100663319,
-			201326611, 402653189, 805306457, 1610612741 ];
+     */
+    private $allowedSizes = [53, 97, 193, 389, 769, 1543,
+        3079, 6151, 12289, 24593, 49157, 98317, 196613, 393241, 786433,
+        1572869, 3145739, 6291469, 12582917, 25165843, 50331653, 100663319,
+        201326611, 402653189, 805306457, 1610612741, ];
 
     /**
      * The size of the hash table.
+     *
      * @var int
      */
-	private $tableSize;
+    private $tableSize;
 
     /**
-     * Storage space for the node part of the key
+     * Storage space for the node part of the key.
+     *
      * @var int[]
      */
-	private $keyNodes;
+    private $keyNodes;
 
     /**
      * Storage space for the character part of the key.
+     *
      * @var object[]
      */
-	private $keyChars;
+    private $keyChars;
 
     /**
      * Storage space for the result node.
+     *
      * @var int[]
      */
-	private $resultNodes;
+    private $resultNodes;
 
     /**
      * Debug info: number of stored nodes.
-     * @var int 
+     *
+     * @var int
      */
-	private $_numStoredNodes = 0;
+    private $_numStoredNodes = 0;
 
     /**
      * Debug info: number of calls to find so far.
+     *
      * @var int
      */
-	private $_numFind = 0;
+    private $_numFind = 0;
 
     /**
      * Debug info: number of collisions (i.e. wrong finds) during find so far.
+     *
      * @var int
      */
-	private $_numColl = 0;
+    private $_numColl = 0;
 
-	/**
-	 * Creates a new hash table for the given number of nodes. Trying to add
-	 * more nodes will result in worse performance down to entering an infinite
-	 * loop on some operations.
+    /**
+     * Creates a new hash table for the given number of nodes. Trying to add
+     * more nodes will result in worse performance down to entering an infinite
+     * loop on some operations.
+     */
+    public function __construct(int $numNodes)
+    {
+        $minSize   = (int) ceil(1.5 * $numNodes);
+        $sizeIndex = 0;
+
+        while ($this->allowedSizes[$sizeIndex] < $minSize) {
+            $sizeIndex++;
+        }
+        $this->tableSize = $this->allowedSizes[$sizeIndex];
+
+        $this->keyNodes    = array_fill(0, $this->tableSize, 0);
+        $this->keyChars    = array_fill(0, $this->tableSize, null);
+        $this->resultNodes = array_fill(0, $this->tableSize, 0);
+    }
+
+    /**
+     * Returns the next node for the given (node, character) key pair or a
+     * negative value if no next node is stored for this key.
+     */
+    public function get(int $keyNode, JavaObjectInterface $keyChar): int
+    {
+        $pos = $this->hashFind($keyNode, $keyChar);
+
+        if ($this->keyChars[$pos] === null) {
+            return -1;
+        }
+
+        return $this->resultNodes[$pos];
+    }
+
+    /**
+     * Inserts the given result node for the (node, character) key pair.
+     */
+    public function put(int $keyNode, JavaObjectInterface $keyChar, int $resultNode): void
+    {
+        $pos = $this->hashFind($keyNode, $keyChar);
+
+        if ($this->keyChars[$pos] === null) {
+            $this->_numStoredNodes++;
+            $this->keyChars[$pos] = $keyChar;
+            $this->keyNodes[$pos] = $keyNode;
+        }
+        $this->resultNodes[$pos] = $resultNode;
+    }
+
+    /**
+     * Extracts the list of child nodes for each node from the hash table
+     * entries as a linked list. All arrays are expected to be initially empty
+     * and of suitable size (i.e. for <em>n</em> nodes it should have size
+     * <em>n</em> given that nodes are numbered 0 to n-1). Those arrays will be
+     * filled from this method.
+     * <p>
+     * The method is package visible, as it is tightly coupled to the
+     * {@link SuffixTree} class.
      *
-     * @param int $numNodes
-	 */
-    public function __construct(int $numNodes)
+     * @param int[] nodeFirstIndex an array giving for each node the index where the first child
+     *            will be stored (or -1 if it has no children)
+     * @param int[] nodeNextIndex this array gives the next index of the child list or -1 if
+     *            this is the last one
+     * @param int[] nodeChild this array stores the actual name (=number) of the node in the
+     *            child list
+     *
+     * @throws ArrayIndexOutOfBoundsException if any of the given arrays was too small
+     */
+    public function extractChildLists(array &$nodeFirstIndex, array &$nodeNextIndex, array &$nodeChild): void
     {
-		$minSize = (int) ceil(1.5 * $numNodes);
-		$sizeIndex = 0;
-		while ($this->allowedSizes[$sizeIndex] < $minSize) {
-			$sizeIndex++;
-		}
-		$this->tableSize = $this->allowedSizes[$sizeIndex];
+        // Instead of Arrays.fill($nodeFirstIndex, -1);
+        foreach ($nodeFirstIndex as $k => $v) {
+            $nodeFirstIndex[$k] = -1;
+        }
+        $free = 0;
 
-		$this->keyNodes = array_fill(0, $this->tableSize, 0);
-		$this->keyChars = array_fill(0, $this->tableSize, null);
-		$this->resultNodes = array_fill(0, $this->tableSize, 0);
-	}
+        for ($i = 0; $i < $this->tableSize; $i++) {
+            if ($this->keyChars[$i] !== null) {
+                // insert $this->keyNodes[$i] -> $this->resultNodes[$i]
+                $nodeChild[$free]                    = $this->resultNodes[$i];
+                $nodeNextIndex[$free]                = $nodeFirstIndex[$this->keyNodes[$i]];
+                $nodeFirstIndex[$this->keyNodes[$i]] = $free++;
+            }
+        }
+    }
 
-	/**
-	 * Returns the position of the (node,char) key in the hash map or the
-	 * position to insert it into if it is not yet in.
+    /**
+     * Returns the position of the (node,char) key in the hash map or the
+     * position to insert it into if it is not yet in.
      *
-     * @param int $keyNode
-     * @param JavaObjectInterface $keyChar
      * @return int
-	 */
+     */
     private function hashFind(int $keyNode, JavaObjectInterface $keyChar)
     {
-		++$this->_numFind;
+        $this->_numFind++;
         /** @var int */
-		$hash = $keyChar->hashCode();
+        $hash = $keyChar->hashCode();
         /** @var int */
-		$pos = $this->posMod($this->primaryHash($keyNode, $hash));
+        $pos = $this->posMod($this->primaryHash($keyNode, $hash));
         /** @var int */
-		$secondary = $this->secondaryHash($keyNode, $hash);
-		while ($this->keyChars[$pos] !== null) {
-			if ($this->keyNodes[$pos] === $keyNode && $keyChar->equals($this->keyChars[$pos])) {
-				break;
-			}
-			++$this->_numColl;
-			$pos = ($pos + $secondary) % $this->tableSize;
-		}
-		return $pos;
-	}
+        $secondary = $this->secondaryHash($keyNode, $hash);
 
-	/**
-	 * Returns the next node for the given (node, character) key pair or a
-	 * negative value if no next node is stored for this key.
-     *
-     * @return int
-	 */
-    public function get(int $keyNode, JavaObjectInterface $keyChar): int
-    {
-		$pos = $this->hashFind($keyNode, $keyChar);
-		if ($this->keyChars[$pos] === null) {
-			return -1;
-		}
-		return $this->resultNodes[$pos];
-	}
+        while ($this->keyChars[$pos] !== null) {
+            if ($this->keyNodes[$pos] === $keyNode && $keyChar->equals($this->keyChars[$pos])) {
+                break;
+            }
+            $this->_numColl++;
+            $pos = ($pos + $secondary) % $this->tableSize;
+        }
 
-    /**
-     * Inserts the given result node for the (node, character) key pair.
-     * @return void
-     */
-    public function put(int $keyNode, JavaObjectInterface $keyChar, int $resultNode)
-    {
-		$pos = $this->hashFind($keyNode, $keyChar);
-		if ($this->keyChars[$pos] == null) {
-			++$this->_numStoredNodes;
-			$this->keyChars[$pos] = $keyChar;
-			$this->keyNodes[$pos] = $keyNode;
-		}
-		$this->resultNodes[$pos] = $resultNode;
-	}
+        return $pos;
+    }
 
     /**
      * Returns the primary hash value for a (node, character) key pair.
+     *
      * @return int
      */
     private function primaryHash(int $keyNode, int $keyCharHash)
     {
-		$res =  $keyCharHash ^ (13 * $keyNode);
-        return $res;
-	}
+        return $keyCharHash ^ (13 * $keyNode);
+    }
 
     /**
      * Returns the secondary hash value for a (node, character) key pair.
+     *
      * @return int
      */
     private function secondaryHash(int $keyNode, int $keyCharHash)
     {
-		$result = $this->posMod(($keyCharHash ^ (1025 * $keyNode)));
-		if ($result == 0) {
-			return 2;
-		}
-		return $result;
-	}
+        $result = $this->posMod(($keyCharHash ^ (1025 * $keyNode)));
 
-	/**
-	 * Returns the smallest non-negative number congruent to x modulo
-	 * {@link #tableSize}.
+        if ($result == 0) {
+            return 2;
+        }
+
+        return $result;
+    }
+
+    /**
+     * Returns the smallest non-negative number congruent to x modulo
+     * {@link #tableSize}.
+     *
      * @return int
-	 */
+     */
     private function posMod(int $x)
     {
-		$x %= $this->tableSize;
-		if ($x < 0) {
-			$x += $this->tableSize;
-		}
-		return $x;
-	}
+        $x %= $this->tableSize;
 
-	/**
-	 * Extracts the list of child nodes for each node from the hash table
-	 * entries as a linked list. All arrays are expected to be initially empty
-	 * and of suitable size (i.e. for <em>n</em> nodes it should have size
-	 * <em>n</em> given that nodes are numbered 0 to n-1). Those arrays will be
-	 * filled from this method.
-	 * <p>
-	 * The method is package visible, as it is tighly coupled to the
-	 * {@link SuffixTree} class.
-	 * 
-	 * @param int[] nodeFirstIndex an array giving for each node the index where the first child
-	 *            will be stored (or -1 if it has no children).
-	 * @param int[] nodeNextIndex this array gives the next index of the child list or -1 if
-	 *            this is the last one.
-	 * @param int[] nodeChild this array stores the actual name (=number) of the mode in the
-	 *            child list.
-     * @return void
-	 * @throws ArrayIndexOutOfBoundsException if any of the given arrays was too small.
-	 */
-    public function extractChildLists(array &$nodeFirstIndex, array &$nodeNextIndex, array &$nodeChild)
-    {
-		// Instead of Arrays.fill($nodeFirstIndex, -1);
-        foreach ($nodeFirstIndex as $k => $v) {
-            $nodeFirstIndex[$k] = -1;
+        if ($x < 0) {
+            $x += $this->tableSize;
         }
-		$free = 0;
-		for ($i = 0; $i < $this->tableSize; ++$i) {
-			if ($this->keyChars[$i] !== null) {
-				// insert $this->keyNodes[$i] -> $this->resultNodes[$i]
-				$nodeChild[$free] = $this->resultNodes[$i];
-				$nodeNextIndex[$free] = $nodeFirstIndex[$this->keyNodes[$i]];
-				$nodeFirstIndex[$this->keyNodes[$i]] = $free++;
-			}
-		}
-	}
+
+        return $x;
+    }
 }

      ----------- end diff -----------

   6) src/Detector/Strategy/SuffixTree/CloneInfo.php
      ---------- begin diff ----------
--- /usr/local/src/phpcpd/src/Detector/Strategy/SuffixTree/CloneInfo.php
+++ /usr/local/src/phpcpd/src/Detector/Strategy/SuffixTree/CloneInfo.php
@@ -1,22 +1,12 @@
-<?php
-
-/*-------------------------------------------------------------------------+
-|                                                                          |
-| Copyright 2005-2011 The ConQAT Project                                   |
-|                                                                          |
-| Licensed under the Apache License, Version 2.0 (the "License");          |
-| you may not use this file except in compliance with the License.         |
-| You may obtain a copy of the License at                                  |
-|                                                                          |
-|    http://www.apache.org/licenses/LICENSE-2.0                            |
-|                                                                          |
-| Unless required by applicable law or agreed to in writing, software      |
-| distributed under the License is distributed on an "AS IS" BASIS,        |
-| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
-| See the License for the specific language governing permissions and      |
-| limitations under the License.                                           |
-+-------------------------------------------------------------------------*/
-
+<?php declare(strict_types=1);
+/*
+ * This file is part of PHP Copy/Paste Detector (PHPCPD).
+ *
+ * (c) Sebastian Bergmann <sebastian@phpunit.de>
+ *
+ * For the full copyright and license information, please view the LICENSE
+ * file that was distributed with this source code.
+ */
 namespace SebastianBergmann\PHPCPD\Detector\Strategy\SuffixTree;
 
 /** Stores information on a clone. */
@@ -24,40 +14,44 @@
 {
     /**
      * Length of the clone in tokens.
+     *
      * @var int
      */
     public $length;
 
     /**
-     * Position in word list
+     * Position in word list.
+     *
      * @var int
      */
     public $position;
 
     /**
-     * Number of occurrences of the clone.
-     * @var int
-     */
-    private $occurrences;
-
-    /**
      * @var PhpToken
      */
     public $token;
 
     /**
-     * Related clones
+     * Related clones.
+     *
      * @var PairList
      */
     public $otherClones;
 
+    /**
+     * Number of occurrences of the clone.
+     *
+     * @var int
+     */
+    private $occurrences;
+
     /** Constructor. */
     public function __construct(int $length, int $position, int $occurrences, PhpToken $token, PairList $otherClones)
     {
-        $this->length = $length;
-        $this->position = $position;
+        $this->length      = $length;
+        $this->position    = $position;
         $this->occurrences = $occurrences;
-        $this->token = $token;
+        $this->token       = $token;
         $this->otherClones = $otherClones;
     }
 
@@ -64,12 +58,10 @@
     /**
      * Returns whether this clone info dominates the given one, i.e. whether
      * both {@link #length} and {@link #occurrences} are not smaller.
-     * 
-     * @param CloneInfo $ci
-     * @param later The amount the given clone starts later than the "this" clone.
-     * @return boolean
+     *
+     * @param later the amount the given clone starts later than the "this" clone
      */
-    public function dominates(CloneInfo $ci, int $later): bool
+    public function dominates(self $ci, int $later): bool
     {
         return $this->length - $later >= $ci->length && $this->occurrences >= $ci->occurrences;
     }

      ----------- end diff -----------

   7) src/Detector/Strategy/SuffixTree/Sentinel.php
      ---------- begin diff ----------
--- /usr/local/src/phpcpd/src/Detector/Strategy/SuffixTree/Sentinel.php
+++ /usr/local/src/phpcpd/src/Detector/Strategy/SuffixTree/Sentinel.php
@@ -1,22 +1,12 @@
-<?php
-
-/*-------------------------------------------------------------------------+
-|                                                                          |
-| Copyright 2005-2011 The ConQAT Project                                   |
-|                                                                          |
-| Licensed under the Apache License, Version 2.0 (the "License");          |
-| you may not use this file except in compliance with the License.         |
-| You may obtain a copy of the License at                                  |
-|                                                                          |
-|    http://www.apache.org/licenses/LICENSE-2.0                            |
-|                                                                          |
-| Unless required by applicable law or agreed to in writing, software      |
-| distributed under the License is distributed on an "AS IS" BASIS,        |
-| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
-| See the License for the specific language governing permissions and      |
-| limitations under the License.                                           |
-+-------------------------------------------------------------------------*/
-
+<?php declare(strict_types=1);
+/*
+ * This file is part of PHP Copy/Paste Detector (PHPCPD).
+ *
+ * (c) Sebastian Bergmann <sebastian@phpunit.de>
+ *
+ * For the full copyright and license information, please view the LICENSE
+ * file that was distributed with this source code.
+ */
 namespace SebastianBergmann\PHPCPD\Detector\Strategy\SuffixTree;
 
 /**
@@ -43,11 +33,11 @@
     public function equals(object $obj): bool
     {
         // Original code uses physical object equality, not present in PHP.
-        return $obj instanceof Sentinel;
+        return $obj instanceof self;
     }
 
     public function toString(): string
     {
-        return "$";
+        return '$';
     }
 }

      ----------- end diff -----------

   8) src/Detector/Strategy/SuffixTree/PhpToken.php
      ---------- begin diff ----------
--- /usr/local/src/phpcpd/src/Detector/Strategy/SuffixTree/PhpToken.php
+++ /usr/local/src/phpcpd/src/Detector/Strategy/SuffixTree/PhpToken.php
@@ -1,13 +1,24 @@
-<?php
-
+<?php declare(strict_types=1);
+/*
+ * This file is part of PHP Copy/Paste Detector (PHPCPD).
+ *
+ * (c) Sebastian Bergmann <sebastian@phpunit.de>
+ *
+ * For the full copyright and license information, please view the LICENSE
+ * file that was distributed with this source code.
+ */
 namespace SebastianBergmann\PHPCPD\Detector\Strategy\SuffixTree;
 
 class PhpToken implements JavaObjectInterface
 {
     public $tokenCode;
+
     public $line;
+
     public $file;
+
     public $tokenName;
+
     public $content;
 
     public function __construct(
@@ -19,14 +30,16 @@
     ) {
         $this->tokenCode = $tokenCode;
         $this->tokenName = $tokenName;
-        $this->line = $line;
-        $this->content = $content;
-        $this->file = $file;
+        $this->line      = $line;
+        $this->content   = $content;
+        $this->file      = $file;
     }
 
-    /**
-     * @return int
-     */
+    public function __toString()
+    {
+        return $this->tokenName;
+    }
+
     public function hashCode(): int
     {
         return (int) crc32($this->content);
@@ -64,10 +77,8 @@
         //return $tokenCode;
     }
 
-    /**
-     * @return boolean
-     */
-    public function equals(JavaObjectInterface $token): bool {
+    public function equals(JavaObjectInterface $token): bool
+    {
         return $token->hashCode() === $this->hashCode();
     }
 
@@ -74,11 +85,8 @@
     /**
      * @return string
      */
-    public function toString() {
-        return $this->tokenName;
-    }
-
-    public function __tostring() {
+    public function toString()
+    {
         return $this->tokenName;
     }
 }

      ----------- end diff -----------

   9) src/Detector/Strategy/SuffixTree/ApproximateCloneDetectingSuffixTree.php
      ---------- begin diff ----------
--- /usr/local/src/phpcpd/src/Detector/Strategy/SuffixTree/ApproximateCloneDetectingSuffixTree.php
+++ /usr/local/src/phpcpd/src/Detector/Strategy/SuffixTree/ApproximateCloneDetectingSuffixTree.php
@@ -1,29 +1,20 @@
-<?php
-
-/*-------------------------------------------------------------------------+
-|                                                                          |
-| Copyright 2005-2011 The ConQAT Project                                   |
-|                                                                          |
-| Licensed under the Apache License, Version 2.0 (the "License");          |
-| you may not use this file except in compliance with the License.         |
-| You may obtain a copy of the License at                                  |
-|                                                                          |
-|    http://www.apache.org/licenses/LICENSE-2.0                            |
-|                                                                          |
-| Unless required by applicable law or agreed to in writing, software      |
-| distributed under the License is distributed on an "AS IS" BASIS,        |
-| WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
-| See the License for the specific language governing permissions and      |
-| limitations under the License.                                           |
-+-------------------------------------------------------------------------*/
-
+<?php declare(strict_types=1);
+/*
+ * This file is part of PHP Copy/Paste Detector (PHPCPD).
+ *
+ * (c) Sebastian Bergmann <sebastian@phpunit.de>
+ *
+ * For the full copyright and license information, please view the LICENSE
+ * file that was distributed with this source code.
+ */
 namespace SebastianBergmann\PHPCPD\Detector\Strategy\SuffixTree;
 
 /**
  * An extension of the suffix tree adding an algorithm for finding approximate
  * clones, i.e. substrings which are similar.
- * 
+ *
  * @author $Author: hummelb $
+ *
  * @version $Revision: 43151 $
  * @ConQAT.Rating GREEN Hash: BB94CD690760BC239F04D32D5BCAC33E
  */
@@ -30,89 +21,77 @@
 class ApproximateCloneDetectingSuffixTree extends SuffixTree
 {
     /**
+     * The minimal length of clones to return.
+     *
+     * @var int
+     */
+    protected $minLength;
+
+    /**
      * The number of leaves reachable from the given node (1 for leaves).
+     *
      * @var int[]
      * */
-	private $leafCount;
+    private $leafCount;
 
     /**
      * This is the distance between two entries in the {@link #cloneInfos} map.
+     *
      * @var int
      */
-	private $INDEX_SPREAD = 10;
+    private $INDEX_SPREAD = 10;
 
     /**
      * This map stores for each position the relevant clone infos.
+     *
      * @var array<int, CloneInfo>
      */
-	//private final ListMap<Integer, CloneInfo> cloneInfos = new ListMap<Integer, CloneInfo>();
-	private $cloneInfos = [];
+    //private final ListMap<Integer, CloneInfo> cloneInfos = new ListMap<Integer, CloneInfo>();
+    private $cloneInfos = [];
 
-	/**
-	 * The maximal length of a clone. This influences the size of the
-	 * (quadratic) {@link #edBuffer}.
+    /**
+     * The maximal length of a clone. This influences the size of the
+     * (quadratic) {@link #edBuffer}.
+     *
      * @var int
-	 */
-	private $MAX_LENGTH = 1024;
+     */
+    private $MAX_LENGTH = 1024;
 
     /**
      * Buffer used for calculating edit distance.
+     *
      * @var array<int[]>
      */
-	private $edBuffer = [];
+    private $edBuffer = [];
 
     /**
-     * The minimal length of clones to return.
+     * Number of units that must be equal at the start of a clone.
+     *
      * @var int
      */
-	protected $minLength;
+    private $headEquality;
 
     /**
-     * Number of units that must be equal at the start of a clone
-     * @var int
-     */
-	private $headEquality;
-
-	/**
-	 * Create a new suffix tree from a given word. The word given as parameter
-	 * is used internally and should not be modified anymore, so copy it before
-	 * if required.
-	 * <p>
-	 * This only word correctly if the given word is closed using a sentinel
-	 * character.
+     * Create a new suffix tree from a given word. The word given as parameter
+     * is used internally and should not be modified anymore, so copy it before
+     * if required.
+     * <p>
+     * This only works correctly if the given word is closed using a sentinel
+     * character.
      *
      * @param array $word List of tokens to analyze
-	 */
+     */
     public function __construct(array $word)
     {
-        $arr = array_fill(0, $this->MAX_LENGTH, 0);
+        $arr            = array_fill(0, $this->MAX_LENGTH, 0);
         $this->edBuffer = array_fill(0, $this->MAX_LENGTH, $arr);
 
         parent::__construct($word);
-		$this->ensureChildLists();
-		$this->leafCount = array_fill(0, $this->numNodes, 0);
-		$this->initLeafCount(0);
-	}
+        $this->ensureChildLists();
+        $this->leafCount = array_fill(0, $this->numNodes, 0);
+        $this->initLeafCount(0);
+    }
 
-	/**
-	 * Initializes the {@link #leafCount} array which gives for each node the
-	 * number of leaves reachable from it (where leaves obtain a value of 1).
-     *
-     * @param int $node
-     * @return void
-	 */
-    private function initLeafCount(int $node)
-    {
-		$this->leafCount[$node] = 0;
-		for ($e = $this->nodeChildFirst[$node]; $e >= 0; $e = $this->nodeChildNext[$e]) {
-			$this->initLeafCount($this->nodeChildNode[$e]);
-			$this->leafCount[$node] += $this->leafCount[$this->nodeChildNode[$e]];
-		}
-		if ($this->leafCount[$node] == 0) {
-			$this->leafCount[$node] = 1;
-		}
-	}
-
     /**
      * @todo Add options:
      *   --min-tokens
@@ -121,48 +100,57 @@
      * @todo Possibly add consumer from original code.
      */
 
-	/**
-	 * Finds all clones in the string (List) used in the constructor.
-	 * 
-	 * @param int $minLength the minimal length of a clone in tokens (not lines)
-	 * @param int $maxErrors the maximal number of errors/gaps allowed
-	 * @param int $headEquality the number of elements which have to be the same at the beginning of a clone
-     * @return void
+    /**
+     * Finds all clones in the string (List) used in the constructor.
+     *
+     * @param int $minLength    the minimal length of a clone in tokens (not lines)
+     * @param int $maxErrors    the maximal number of errors/gaps allowed
+     * @param int $headEquality the number of elements which have to be the same at the beginning of a clone
+     *
      * @throws ConQATException
-	 */
+     */
     public function findClones(int $minLength, int $maxErrors, int $headEquality)
     {
-		$this->minLength = $minLength;
-		$this->headEquality = $headEquality;
-		$this->cloneInfos = [];
+        $this->minLength    = $minLength;
+        $this->headEquality = $headEquality;
+        $this->cloneInfos   = [];
 
-		for ($i = 0; $i < count($this->word); ++$i) {
-			// Do quick start, as first character has to match anyway.
-			$node = $this->nextNode->get(0, $this->word[$i]);
-			if ($node < 0 || $this->leafCount[$node] <= 1) {
-				continue;
-			}
+        for ($i = 0; $i < count($this->word); $i++) {
+            // Do quick start, as first character has to match anyway.
+            $node = $this->nextNode->get(0, $this->word[$i]);
 
-			// we know that we have an exact match of at least 'length'
-			// characters, as the word itself is part of the suffix tree.
-			$length = $this->nodeWordEnd[$node] - $this->nodeWordBegin[$node];
-			$numReported = 0;
-			for ($e = $this->nodeChildFirst[$node]; $e >= 0; $e = $this->nodeChildNext[$e]) {
-				if ($this->matchWord($i, $i + $length, $this->nodeChildNode[$e], $length,
-						$maxErrors)) {
-					++$numReported;
-				}
-			}
-			if ($length >= $this->minLength && $numReported != 1) {
-				$this->reportClone($i, $i + $length, $node, $length, $length);
-			}
-		}
+            if ($node < 0 || $this->leafCount[$node] <= 1) {
+                continue;
+            }
 
+            // we know that we have an exact match of at least 'length'
+            // characters, as the word itself is part of the suffix tree.
+            $length      = $this->nodeWordEnd[$node] - $this->nodeWordBegin[$node];
+            $numReported = 0;
+
+            for ($e = $this->nodeChildFirst[$node]; $e >= 0; $e = $this->nodeChildNext[$e]) {
+                if ($this->matchWord(
+                    $i,
+                    $i + $length,
+                    $this->nodeChildNode[$e],
+                    $length,
+                    $maxErrors
+                )) {
+                    $numReported++;
+                }
+            }
+
+            if ($length >= $this->minLength && $numReported != 1) {
+                $this->reportClone($i, $i + $length, $node, $length, $length);
+            }
+        }
+
         $map = [];
 
-		for ($index = 0; $index <= count($this->word); ++$index) {
-			$existingClones = $this->cloneInfos[$index] ?? null;
-			if ($existingClones != null) {
+        for ($index = 0; $index <= count($this->word); $index++) {
+            $existingClones = $this->cloneInfos[$index] ?? null;
+
+            if ($existingClones != null) {
                 foreach ($existingClones as $ci) {
                     // length = number of tokens
                     // TODO: min token length
@@ -169,13 +157,15 @@
                     if ($ci->length > $minLength) {
                         /** @var CloneInfo */
                         $previousCi = $map[$ci->token->line] ?? null;
+
                         if ($previousCi == null) {
-                            $map[$ci->token->line] =  $ci;
-                        } else if ($ci->length > $previousCi->length) {
                             $map[$ci->token->line] = $ci;
+                        } elseif ($ci->length > $previousCi->length) {
+                            $map[$ci->token->line] = $ci;
                         }
                         /** @var int[] */
                         $others = $ci->otherClones->extractFirstList();
+
                         for ($j = 0; $j < count($others); $j++) {
                             $otherStart = $others[$j];
                             /** @var PhpToken */
@@ -182,217 +172,304 @@
                             $t = $this->word[$otherStart];
                         }
                     }
-				}
-			}
-		}
+                }
+            }
+        }
 
         /** @var CloneInfo[] */
         $values = array_values($map);
-        usort($values, function ($a, $b) { return $b->length - $a->length;});
+        usort($values, static function ($a, $b) {
+            return $b->length - $a->length;
+        });
+
         return $values;
-	}
+    }
 
-	/**
-	 * Performs the approximative matching between the input word and the tree.
-	 * 
-	 * @param int $wordStart the start position of the currently matched word (position in
-	 *            the input word).
-	 * @param int $wordPosition the current position along the input word.
-	 * @param int $node the node we are currently at (i.e. the edge leading to this
-	 *            node is relevant to us).
-	 * @param int $nodeWordLength the length of the word found along the nodes (this may be
-	 *            different from the length along the input word due to gaps).
-	 * @param int $maxErrors the number of errors still allowed.
-	 * @return boolean whether some clone was reported
+    /**
+     * This should return true, if the provided character is not allowed to
+     * match with anything else (e.g. is a sentinel).
+     */
+    protected function mayNotMatch(JavaObjectInterface $character)
+    {
+        return $character instanceof Sentinel;
+    }
+
+    /**
+     * This method is called whenever the {@link #MAX_LENGTH} is too small and
+     * hence the {@link #edBuffer} was not large enough. This may cause that a
+     * really large clone is reported in multiple chunks of size
+     * {@link #MAX_LENGTH} and potentially minor parts of such a clone might be
+     * lost.
+     */
+    protected function reportBufferShortage(int $leafStart, int $leafLength): void
+    {
+        print 'Encountered buffer shortage: ' . $leafStart . ' ' . $leafLength . "\n";
+    }
+
+    /**
+     * Initializes the {@link #leafCount} array which gives for each node the
+     * number of leaves reachable from it (where leaves obtain a value of 1).
+     */
+    private function initLeafCount(int $node): void
+    {
+        $this->leafCount[$node] = 0;
+
+        for ($e = $this->nodeChildFirst[$node]; $e >= 0; $e = $this->nodeChildNext[$e]) {
+            $this->initLeafCount($this->nodeChildNode[$e]);
+            $this->leafCount[$node] += $this->leafCount[$this->nodeChildNode[$e]];
+        }
+
+        if ($this->leafCount[$node] == 0) {
+            $this->leafCount[$node] = 1;
+        }
+    }
+
+    /**
+     * Performs the approximative matching between the input word and the tree.
+     *
+     * @param int $wordStart      the start position of the currently matched word (position in
+     *                            the input word)
+     * @param int $wordPosition   the current position along the input word
+     * @param int $node           the node we are currently at (i.e. the edge leading to this
+     *                            node is relevant to us).
+     * @param int $nodeWordLength the length of the word found along the nodes (this may be
+     *                            different from the length along the input word due to gaps)
+     * @param int $maxErrors      the number of errors still allowed
+     *
      * @throws ConQATException
-	 */
+     *
+     * @return bool whether some clone was reported
+     */
     private function matchWord(int $wordStart, int $wordPosition, int $node, int $nodeWordLength, int $maxErrors)
     {
-		// We are aware that this method is longer than desirable for code
-		// reading. However, we currently do not see a refactoring that has a
-		// sensible cost-benefit ratio. Suggestions are welcome!
+        // We are aware that this method is longer than desirable for code
+        // reading. However, we currently do not see a refactoring that has a
+        // sensible cost-benefit ratio. Suggestions are welcome!
 
-		// self match?
-		if ($this->leafCount[$node] == 1 && $this->nodeWordBegin[$node] == $wordPosition) {
-			return false;
-		}
+        // self match?
+        if ($this->leafCount[$node] == 1 && $this->nodeWordBegin[$node] == $wordPosition) {
+            return false;
+        }
 
-		$currentNodeWordLength = min($this->nodeWordEnd[$node] - $this->nodeWordBegin[$node], $this->MAX_LENGTH - 1);
+        $currentNodeWordLength = min($this->nodeWordEnd[$node] - $this->nodeWordBegin[$node], $this->MAX_LENGTH - 1);
 
-		// do min edit distance
+        // do min edit distance
         /** @var int */
-		$currentLength = $this->calculateMaxLength($wordStart, $wordPosition, $node,
-				$maxErrors, $currentNodeWordLength);
+        $currentLength = $this->calculateMaxLength(
+            $wordStart,
+            $wordPosition,
+            $node,
+            $maxErrors,
+            $currentNodeWordLength
+        );
 
-		if ($currentLength == 0) {
-			return false;
-		}
+        if ($currentLength == 0) {
+            return false;
+        }
 
-		if ($currentLength >= $this->MAX_LENGTH - 1) {
-			$this->reportBufferShortage($this->nodeWordBegin[$node], $currentNodeWordLength);
-		}
+        if ($currentLength >= $this->MAX_LENGTH - 1) {
+            $this->reportBufferShortage($this->nodeWordBegin[$node], $currentNodeWordLength);
+        }
 
-		// calculate cheapest match
-		$best = $maxErrors + 42;
-		$iBest = 0;
-		$jBest = 0;
-		for ($k = 0; $k <= $currentLength; ++$k) {
-			$i = $currentLength - $k;
-			$j = $currentLength;
-			if ($this->edBuffer[$i][$j] < $best) {
-				$best = $this->edBuffer[$i][$j];
-				$iBest = $i;
-				$jBest = $j;
-			}
+        // calculate cheapest match
+        $best  = $maxErrors + 42;
+        $iBest = 0;
+        $jBest = 0;
 
-			$i = $currentLength;
-			$j = $currentLength - $k;
-			if ($this->edBuffer[$i][$j] < $best) {
-				$best = $this->edBuffer[$i][$j];
-				$iBest = $i;
-				$jBest = $j;
-			}
-		}
+        for ($k = 0; $k <= $currentLength; $k++) {
+            $i = $currentLength - $k;
+            $j = $currentLength;
 
-		while ($wordPosition + $iBest < count($this->word)
-				&& $jBest < $currentNodeWordLength
-				&& $this->word[$wordPosition + $iBest] != $this->word[$this->nodeWordBegin[$node] + $jBest]
-				&& $this->word[$wordPosition + $iBest]->equals(
-						$this->word[$this->nodeWordBegin[$node] + $jBest])) {
-			++$iBest;
-			++$jBest;
-		}
+            if ($this->edBuffer[$i][$j] < $best) {
+                $best  = $this->edBuffer[$i][$j];
+                $iBest = $i;
+                $jBest = $j;
+            }
 
-		$numReported = 0;
-		if ($currentLength == $currentNodeWordLength) {
-			// we may proceed
-			for ($e = $this->nodeChildFirst[$node]; $e >= 0; $e = $this->nodeChildNext[$e]) {
-				if ($this->matchWord($wordStart, $wordPosition + $iBest,
-						$this->nodeChildNode[$e], $nodeWordLength + $jBest, $maxErrors
-								- $best)) {
-					++$numReported;
-				}
-			}
-		}
+            $i = $currentLength;
+            $j = $currentLength - $k;
 
-		// do not report locally if had reports in exactly one subtree (would be
-		// pure subclone)
-		if ($numReported == 1) {
-			return true;
-		}
+            if ($this->edBuffer[$i][$j] < $best) {
+                $best  = $this->edBuffer[$i][$j];
+                $iBest = $i;
+                $jBest = $j;
+            }
+        }
 
-		// disallow tail changes
-		while ($iBest > 0
-				&& $jBest > 0
-				&& !$this->word[$wordPosition + $iBest - 1]->equals(
-						$this->word[$this->nodeWordBegin[$node] + $jBest - 1])) {
+        while ($wordPosition + $iBest < count($this->word) &&
+                $jBest < $currentNodeWordLength &&
+                $this->word[$wordPosition + $iBest] != $this->word[$this->nodeWordBegin[$node] + $jBest] &&
+                $this->word[$wordPosition + $iBest]->equals(
+                    $this->word[$this->nodeWordBegin[$node] + $jBest]
+                )) {
+            $iBest++;
+            $jBest++;
+        }
 
-			if ($iBest > 1
-					&& $this->word[$wordPosition + $iBest - 2]->equals(
-							$this->word[$this->nodeWordBegin[$node] + $jBest - 1])) {
-				--$iBest;
-			} else if ($jBest > 1
-					&& $this->word[$wordPosition + $iBest - 1]->equals(
-							$this->word[$this->nodeWordBegin[$node] + $jBest - 2])) {
-				--$jBest;
-			} else {
-				--$iBest;
-				--$jBest;
-			}
-		}
+        $numReported = 0;
 
-		// report if real clone
-		if ($iBest > 0 && $jBest > 0) {
-			$numReported += 1;
-			$this->reportClone($wordStart, $wordPosition + $iBest, $node, $jBest, $nodeWordLength + $jBest);
-		}
+        if ($currentLength == $currentNodeWordLength) {
+            // we may proceed
+            for ($e = $this->nodeChildFirst[$node]; $e >= 0; $e = $this->nodeChildNext[$e]) {
+                if ($this->matchWord(
+                    $wordStart,
+                    $wordPosition + $iBest,
+                    $this->nodeChildNode[$e],
+                    $nodeWordLength + $jBest,
+                    $maxErrors
+                                - $best
+                )) {
+                    $numReported++;
+                }
+            }
+        }
 
-		return $numReported > 0;
-	}
+        // do not report locally if had reports in exactly one subtree (would be
+        // pure subclone)
+        if ($numReported == 1) {
+            return true;
+        }
 
-	/**
-	 * Calculates the maximum length we may take along the word to the current
-	 * $node (respecting the number of errors to make). *
-	 * 
-	 * @param int $wordStart the start position of the currently matched word (position in
-	 *            the input word).
-	 * @param int $wordPosition the current position along the input word.
-	 * @param int $node the node we are currently at (i.e. the edge leading to this
-	 *            node is relevant to us).
-	 * @param int $maxErrors the number of errors still allowed.
-	 * @param int $currentNodeWordLength the length of the word found along the nodes (this may be
-	 *            different from the actual length due to buffer limits).
-	 * @return int the maximal length that can be taken.
-	 */
+        // disallow tail changes
+        while ($iBest > 0 &&
+                $jBest > 0 &&
+                !$this->word[$wordPosition + $iBest - 1]->equals(
+                    $this->word[$this->nodeWordBegin[$node] + $jBest - 1]
+                )) {
+            if ($iBest > 1 &&
+                    $this->word[$wordPosition + $iBest - 2]->equals(
+                        $this->word[$this->nodeWordBegin[$node] + $jBest - 1]
+                    )) {
+                $iBest--;
+            } elseif ($jBest > 1 &&
+                    $this->word[$wordPosition + $iBest - 1]->equals(
+                        $this->word[$this->nodeWordBegin[$node] + $jBest - 2]
+                    )) {
+                $jBest--;
+            } else {
+                $iBest--;
+                $jBest--;
+            }
+        }
+
+        // report if real clone
+        if ($iBest > 0 && $jBest > 0) {
+            $numReported++;
+            $this->reportClone($wordStart, $wordPosition + $iBest, $node, $jBest, $nodeWordLength + $jBest);
+        }
+
+        return $numReported > 0;
+    }
+
+    /**
+     * Calculates the maximum length we may take along the word to the current
+     * $node (respecting the number of errors to make). *.
+     *
+     * @param int $wordStart             the start position of the currently matched word (position in
+     *                                   the input word)
+     * @param int $wordPosition          the current position along the input word
+     * @param int $node                  the node we are currently at (i.e. the edge leading to this
+     *                                   node is relevant to us).
+     * @param int $maxErrors             the number of errors still allowed
+     * @param int $currentNodeWordLength the length of the word found along the nodes (this may be
+     *                                   different from the actual length due to buffer limits)
+     *
+     * @return int the maximal length that can be taken
+     */
     private function calculateMaxLength(
         int $wordStart,
         int $wordPosition,
         int $node,
         int $maxErrors,
-        int $currentNodeWordLength)
+        int $currentNodeWordLength
+    )
     {
-		$this->edBuffer[0][0] = 0;
-		$currentLength = 1;
-		for (; $currentLength <= $currentNodeWordLength; ++$currentLength) {
+        $this->edBuffer[0][0] = 0;
+        $currentLength        = 1;
+
+        for (; $currentLength <= $currentNodeWordLength; $currentLength++) {
             /** @var int */
-			$best = $currentLength;
-			$this->edBuffer[0][$currentLength] = $currentLength;
-			$this->edBuffer[$currentLength][0] = $currentLength;
+            $best                              = $currentLength;
+            $this->edBuffer[0][$currentLength] = $currentLength;
+            $this->edBuffer[$currentLength][0] = $currentLength;
 
-			if ($wordPosition + $currentLength >= count($this->word)) {
-				break;
-			}
+            if ($wordPosition + $currentLength >= count($this->word)) {
+                break;
+            }
 
-			// deal with case that character may not be matched (sentinel!)
-			$iChar = $this->word[$wordPosition + $currentLength - 1];
-			$jChar = $this->word[$this->nodeWordBegin[$node] + $currentLength - 1];
-			if ($this->mayNotMatch($iChar) || $this->mayNotMatch($jChar)) {
-				break;
-			}
+            // deal with case that character may not be matched (sentinel!)
+            $iChar = $this->word[$wordPosition + $currentLength - 1];
+            $jChar = $this->word[$this->nodeWordBegin[$node] + $currentLength - 1];
 
-			// usual matrix completion for edit distance
-			for ($k = 1; $k < $currentLength; ++$k) {
-				$best = min(
-						$best,
-						$this->fillEDBuffer($k, $currentLength, $wordPosition,
-								$this->nodeWordBegin[$node]));
-			}
-			for ($k = 1; $k < $currentLength; ++$k) {
-				$best = min(
-						$best,
-						$this->fillEDBuffer($currentLength, $k, $wordPosition,
-								$this->nodeWordBegin[$node]));
-			}
-			$best = min(
-					$best,
-					$this->fillEDBuffer($currentLength, $currentLength, $wordPosition,
-							$this->nodeWordBegin[$node]));
+            if ($this->mayNotMatch($iChar) || $this->mayNotMatch($jChar)) {
+                break;
+            }
 
-			if ($best > $maxErrors
-					|| $wordPosition - $wordStart + $currentLength <= $this->headEquality
-					&& $best > 0) {
-				break;
-			}
-		}
-		--$currentLength;
-		return $currentLength;
-	}
+            // usual matrix completion for edit distance
+            for ($k = 1; $k < $currentLength; $k++) {
+                $best = min(
+                    $best,
+                    $this->fillEDBuffer(
+                            $k,
+                            $currentLength,
+                            $wordPosition,
+                            $this->nodeWordBegin[$node]
+                        )
+                );
+            }
 
+            for ($k = 1; $k < $currentLength; $k++) {
+                $best = min(
+                    $best,
+                    $this->fillEDBuffer(
+                            $currentLength,
+                            $k,
+                            $wordPosition,
+                            $this->nodeWordBegin[$node]
+                        )
+                );
+            }
+            $best = min(
+                $best,
+                $this->fillEDBuffer(
+                        $currentLength,
+                        $currentLength,
+                        $wordPosition,
+                        $this->nodeWordBegin[$node]
+                    )
+            );
+
+            if ($best > $maxErrors ||
+                    $wordPosition - $wordStart + $currentLength <= $this->headEquality &&
+                    $best > 0) {
+                break;
+            }
+        }
+        $currentLength--;
+
+        return $currentLength;
+    }
+
     /**
-     * @return void
      * @throws ConQATException
      */
-	private function reportClone(int $wordBegin, int $wordEnd, int $currentNode,
-        int $nodeWordPos, int $nodeWordLength)
+    private function reportClone(
+        int $wordBegin,
+        int $wordEnd,
+        int $currentNode,
+        int $nodeWordPos,
+        int $nodeWordLength
+    ): void
     {
         /** @var int */
-		$length = $wordEnd - $wordBegin;
-		if ($length < $this->minLength || $nodeWordLength < $this->minLength) {
-			return;
-		}
+        $length = $wordEnd - $wordBegin;
 
+        if ($length < $this->minLength || $nodeWordLength < $this->minLength) {
+            return;
+        }
+
         /** @var PairList */
-		$otherClones = new PairList();
+        $otherClones = new PairList();
         $this->findRemainingClones(
             $otherClones,
             $nodeWordLength,
@@ -401,118 +478,105 @@
             $wordBegin
         );
 
-		$occurrences = 1 + $otherClones->size();
+        $occurrences = 1 + $otherClones->size();
 
-		// check whether we may start from here
+        // check whether we may start from here
         /** @var PhpToken */
         $t = $this->word[$wordBegin];
         /** @var CloneInfo */
-		$newInfo = new CloneInfo($length, $wordBegin, $occurrences, $t, $otherClones);
-		for ($index = max(0, $wordBegin - $this->INDEX_SPREAD + 1); $index <= $wordBegin; ++$index) {
+        $newInfo = new CloneInfo($length, $wordBegin, $occurrences, $t, $otherClones);
+
+        for ($index = max(0, $wordBegin - $this->INDEX_SPREAD + 1); $index <= $wordBegin; $index++) {
             /** @var CloneInfo */
-			$existingClones = $this->cloneInfos[$index] ?? null;
-			if ($existingClones != null) {
-				//for (CloneInfo cloneInfo : $existingClones) {
+            $existingClones = $this->cloneInfos[$index] ?? null;
+
+            if ($existingClones != null) {
+                //for (CloneInfo cloneInfo : $existingClones) {
                 foreach ($existingClones as $cloneInfo) {
-					if ($cloneInfo->dominates($newInfo, $wordBegin - $index)) {
-						// we already have a dominating clone, so ignore
-						return;
-					}
-				}
-			}
-		}
+                    if ($cloneInfo->dominates($newInfo, $wordBegin - $index)) {
+                        // we already have a dominating clone, so ignore
+                        return;
+                    }
+                }
+            }
+        }
 
-		// add clone to $otherClones to avoid getting more duplicates
-		for ($i = $wordBegin; $i < $wordEnd; $i += $this->INDEX_SPREAD) {
-			$this->cloneInfos[$i][] = new CloneInfo($length - ($i - $wordBegin), $wordBegin, $occurrences, $t, $otherClones);
-		}
+        // add clone to $otherClones to avoid getting more duplicates
+        for ($i = $wordBegin; $i < $wordEnd; $i += $this->INDEX_SPREAD) {
+            $this->cloneInfos[$i][] = new CloneInfo($length - ($i - $wordBegin), $wordBegin, $occurrences, $t, $otherClones);
+        }
         /** @var PhpToken */
         $t = $this->word[$wordBegin];
-		for ($clone = 0; $clone < $otherClones->size(); ++$clone) {
-			$start = $otherClones->getFirst($clone);
-			$otherLength = $otherClones->getSecond($clone);
+
+        for ($clone = 0; $clone < $otherClones->size(); $clone++) {
+            $start       = $otherClones->getFirst($clone);
+            $otherLength = $otherClones->getSecond($clone);
+
             for ($j = 0; $j < $otherLength; $j++) {
                 /** @var PhpToken */
                 $r = $this->word[$j + $start];
             }
-			for ($i = 0; $i < $otherLength; $i += $this->INDEX_SPREAD) {
-				//$this->cloneInfos.add($start + $i, new CloneInfo($otherLength - $i, $wordBegin, occurrences, $t, $otherClones));
-				$this->cloneInfos[$start + $i][] = new CloneInfo($otherLength - $i, $wordBegin, $occurrences, $t, $otherClones);
-			}
-		}
-	}
 
+            for ($i = 0; $i < $otherLength; $i += $this->INDEX_SPREAD) {
+                //$this->cloneInfos.add($start + $i, new CloneInfo($otherLength - $i, $wordBegin, occurrences, $t, $otherClones));
+                $this->cloneInfos[$start + $i][] = new CloneInfo($otherLength - $i, $wordBegin, $occurrences, $t, $otherClones);
+            }
+        }
+    }
 
-	/**
-	 * Fills the edit distance buffer at position (i,j).
-	 * 
-	 * @param int $i the first index of the buffer.
-	 * @param int $j the second index of the buffer.
-	 * @param int $iOffset the offset where the word described by $i starts.
-	 * @param int $jOffset the offset where the word described by $j starts.
-	 * @return int the value inserted into the buffer.
-	 */
+    /**
+     * Fills the edit distance buffer at position (i,j).
+     *
+     * @param int $i       the first index of the buffer
+     * @param int $j       the second index of the buffer
+     * @param int $iOffset the offset where the word described by $i starts
+     * @param int $jOffset the offset where the word described by $j starts
+     *
+     * @return int the value inserted into the buffer
+     */
     private function fillEDBuffer(int $i, int $j, int $iOffset, int $jOffset)
     {
         /** @var JavaObjectInterface */
-		$iChar = $this->word[$iOffset + $i - 1];
+        $iChar = $this->word[$iOffset + $i - 1];
         /** @var JavaObjectInterface */
-		$jChar = $this->word[$jOffset + $j - 1];
+        $jChar = $this->word[$jOffset + $j - 1];
 
-		$insertDelete = 1 + min($this->edBuffer[$i - 1][$j], $this->edBuffer[$i][$j - 1]);
-		$change = $this->edBuffer[$i - 1][$j - 1] + ($iChar->equals($jChar) ? 0 : 1);
-		return $this->edBuffer[$i][$j] = min($insertDelete, $change);
-	}
+        $insertDelete = 1 + min($this->edBuffer[$i - 1][$j], $this->edBuffer[$i][$j - 1]);
+        $change       = $this->edBuffer[$i - 1][$j - 1] + ($iChar->equals($jChar) ? 0 : 1);
 
-	/**
-	 * Fills a list of pairs giving the start positions and lengths of the
-	 * remaining clones.
-	 * 
-	 * @param array<array{int, int}> $clonePositions the clone positions being filled (start position and length)
-	 * @param int $nodeWordLength the length of the word along the nodes.
-	 * @param int $currentNode the node we are currently at.
-	 * @param int $distance the distance along the word leading to the current node.
-	 * @param int $wordStart the start of the currently searched word.
-     * @return void
-	 */
+        return $this->edBuffer[$i][$j] = min($insertDelete, $change);
+    }
+
+    /**
+     * Fills a list of pairs giving the start positions and lengths of the
+     * remaining clones.
+     *
+     * @param array<array{int, int}> $clonePositions the clone positions being filled (start position and length)
+     * @param int                    $nodeWordLength the length of the word along the nodes
+     * @param int                    $currentNode    the node we are currently at
+     * @param int                    $distance       the distance along the word leading to the current node
+     * @param int                    $wordStart      the start of the currently searched word
+     */
     private function findRemainingClones(
         PairList $clonePositions,
         int $nodeWordLength,
         int $currentNode,
         int $distance,
-        int $wordStart)
+        int $wordStart
+    ): void
     {
-		for ($nextNode = $this->nodeChildFirst[$currentNode]; $nextNode >= 0; $nextNode = $this->nodeChildNext[$nextNode]) {
-			$node = $this->nodeChildNode[$nextNode];
-			$this->findRemainingClones($clonePositions, $nodeWordLength, $node, $distance
-					+ $this->nodeWordEnd[$node] - $this->nodeWordBegin[$node], $wordStart);
-		}
+        for ($nextNode = $this->nodeChildFirst[$currentNode]; $nextNode >= 0; $nextNode = $this->nodeChildNext[$nextNode]) {
+            $node = $this->nodeChildNode[$nextNode];
+            $this->findRemainingClones($clonePositions, $nodeWordLength, $node, $distance
+                    + $this->nodeWordEnd[$node] - $this->nodeWordBegin[$node], $wordStart);
+        }
 
-		if ($this->nodeChildFirst[$currentNode] < 0) {
-			$start = count($this->word) - $distance - $nodeWordLength;
-			if ($start != $wordStart) {
-				$clonePositions->add($start, $nodeWordLength);
-			}
-		}
-	}
+        if ($this->nodeChildFirst[$currentNode] < 0) {
+            $start = count($this->word) - $distance - $nodeWordLength;
 
-	/**
-	 * This should return true, if the provided character is not allowed to
-	 * match with anything else (e.g. is a sentinel).
-	 */
-    protected function mayNotMatch(JavaObjectInterface $character)
-    {
-        return $character instanceof Sentinel;
-    }
-
-	/**
-	 * This method is called whenever the {@link #MAX_LENGTH} is too small and
-	 * hence the {@link #edBuffer} was not large enough. This may cause that a
-	 * really large clone is reported in multiple chunks of size
-	 * {@link #MAX_LENGTH} and potentially minor parts of such a clone might be
-	 * lost.
-	 */
-    protected function reportBufferShortage(int $leafStart, int $leafLength) {
-        echo "Encountered buffer shortage: " . $leafStart . " " . $leafLength . "\n";
+            if ($start != $wordStart) {
+                $clonePositions->add($start, $nodeWordLength);
+            }
+        }
     }
 }

      ----------- end diff -----------
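
For anyone reading the `fillEDBuffer()` hunk above: the recurrence it applies to each cell looks like the usual edit-distance one (insert/delete costs 1, a change costs 0 or 1). Below is a minimal standalone sketch of that recurrence. It assumes plain string tokens compared with `===` rather than `JavaObjectInterface::equals()`, and the function name `editDistance` is hypothetical, not part of the PR. Unlike `calculateMaxLength()`, it fills the whole table and does not stop early on `$maxErrors` or sentinels.

```php
<?php declare(strict_types=1);

/**
 * Hypothetical helper (not part of the PR): edit distance between two
 * token lists, using the same per-cell recurrence as fillEDBuffer().
 *
 * @param string[] $a
 * @param string[] $b
 */
function editDistance(array $a, array $b): int
{
    $n      = count($a);
    $m      = count($b);
    $buffer = [];

    // First column and row: i deletions or j insertions.
    for ($i = 0; $i <= $n; $i++) {
        $buffer[$i][0] = $i;
    }

    for ($j = 0; $j <= $m; $j++) {
        $buffer[0][$j] = $j;
    }

    for ($i = 1; $i <= $n; $i++) {
        for ($j = 1; $j <= $m; $j++) {
            $insertDelete   = 1 + min($buffer[$i - 1][$j], $buffer[$i][$j - 1]);
            $change         = $buffer[$i - 1][$j - 1] + ($a[$i - 1] === $b[$j - 1] ? 0 : 1);
            $buffer[$i][$j] = min($insertDelete, $change);
        }
    }

    return $buffer[$n][$m];
}
```

With `--edit-distance 1`, two token runs that differ in a single token (for example `['a', 'b', 'c']` and `['a', 'x', 'c']`) would still be within the allowed error budget under this sketch.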

  10) src/Detector/Strategy/SuffixTreeStrategy.php
      ---------- begin diff ----------
--- /usr/local/src/phpcpd/src/Detector/Strategy/SuffixTreeStrategy.php
+++ /usr/local/src/phpcpd/src/Detector/Strategy/SuffixTreeStrategy.php
@@ -1,4 +1,4 @@
-<?php
+<?php declare(strict_types=1);
 /*
  * This file is part of PHP Copy/Paste Detector (PHPCPD).
  *
@@ -9,17 +9,17 @@
  */
 namespace SebastianBergmann\PHPCPD\Detector\Strategy;
 
-use function is_array;
 use function array_keys;
 use function file_get_contents;
+use function is_array;
 use function token_get_all;
+use SebastianBergmann\PHPCPD\CodeClone;
+use SebastianBergmann\PHPCPD\CodeCloneFile;
+use SebastianBergmann\PHPCPD\CodeCloneMap;
 use SebastianBergmann\PHPCPD\Detector\Strategy\SuffixTree\ApproximateCloneDetectingSuffixTree;
+use SebastianBergmann\PHPCPD\Detector\Strategy\SuffixTree\CloneInfo;
 use SebastianBergmann\PHPCPD\Detector\Strategy\SuffixTree\PhpToken;
-use SebastianBergmann\PHPCPD\Detector\Strategy\SuffixTree\CloneInfo;
 use SebastianBergmann\PHPCPD\Detector\Strategy\SuffixTree\Sentinel;
-use SebastianBergmann\PHPCPD\CodeClone;
-use SebastianBergmann\PHPCPD\CodeCloneFile;
-use SebastianBergmann\PHPCPD\CodeCloneMap;
 
 final class SuffixTreeStrategy extends AbstractStrategy
 {
@@ -33,16 +33,11 @@
      */
     private $config;
 
-    /**
-     * @param string $file
-     * @param CodeCloneMap $result
-     * @return void
-     */
     public function processFile(string $file, CodeCloneMap $result, StrategyConfiguration $config): void
     {
         $this->config = $config;
-        $content = file_get_contents($file);
-        $tokens = token_get_all($content);
+        $content      = file_get_contents($file);
+        $tokens       = token_get_all($content);
 
         foreach (array_keys($tokens) as $key) {
             $token = $tokens[$key];
@@ -79,6 +74,7 @@
         foreach ($cloneInfos as $cloneInfo) {
             /** @var int[] */
             $others = $cloneInfo->otherClones->extractFirstList();
+
             for ($j = 0; $j < count($others); $j++) {
                 $otherStart = $others[$j];
                 /** @var PhpToken */

      ----------- end diff -----------

  11) src/Detector/Strategy/DefaultStrategy.php
      ---------- begin diff ----------
--- /usr/local/src/phpcpd/src/Detector/Strategy/DefaultStrategy.php
+++ /usr/local/src/phpcpd/src/Detector/Strategy/DefaultStrategy.php
@@ -44,11 +44,6 @@
      */
     protected $hashes = [];
 
-    /**
-     * @param string $file
-     * @param CodeCloneMap $result
-     * @return void
-     */
     public function processFile(string $file, CodeCloneMap $result, StrategyConfiguration $config): void
     {
         $buffer                    = file_get_contents($file);

      ----------- end diff -----------

  12) src/Detector/Strategy/AbstractStrategy.php
      ---------- begin diff ----------
--- /usr/local/src/phpcpd/src/Detector/Strategy/AbstractStrategy.php
+++ /usr/local/src/phpcpd/src/Detector/Strategy/AbstractStrategy.php
@@ -39,5 +39,7 @@
 
     abstract public function processFile(string $file, CodeCloneMap $result, StrategyConfiguration $config): void;
 
-    public function postProcess(): void { }
+    public function postProcess(): void
+    {
+    }
 }

      ----------- end diff -----------

  13) src/Detector/Strategy/StrategyConfiguration.php
      ---------- begin diff ----------
--- /usr/local/src/phpcpd/src/Detector/Strategy/StrategyConfiguration.php
+++ /usr/local/src/phpcpd/src/Detector/Strategy/StrategyConfiguration.php
@@ -1,4 +1,4 @@
-<?php
+<?php declare(strict_types=1);
 /*
  * This file is part of PHP Copy/Paste Detector (PHPCPD).
  *
@@ -18,7 +18,8 @@
 final class StrategyConfiguration
 {
     /**
-     * Minimum lines to consider
+     * Minimum lines to consider.
+     *
      * @var int
      */
     private $minLines = 5;
@@ -25,6 +26,7 @@
 
     /**
      * Minimum tokens to consider in a clone.
+     *
      * @var int
      */
     private $minTokens = 70;
@@ -31,7 +33,8 @@
 
     /**
      * Edit distance to consider when comparing two clones
-     * Only available for the suffix-tree algorithm
+     * Only available for the suffix-tree algorithm.
+     *
      * @var int
      */
     private $editDistance = 5;
@@ -38,7 +41,8 @@
 
     /**
      * Tokens that must be equal to consider a clone
-     * Only available for the suffix-tree algorithm
+     * Only available for the suffix-tree algorithm.
+     *
      * @var int
      */
     private $headEquality = 10;
@@ -45,14 +49,12 @@
 
     /**
      * Fuzz variable names
-     * suffixtree always makes variables and functions fuzzy
+     * suffixtree always makes variables and functions fuzzy.
+     *
      * @var bool
      */
     private $fuzzy = false;
 
-    /**
-     * @param Arguments $arguments
-     */
     public function __construct(Arguments $arguments)
     {
         $this->minLines     = $arguments->linesThreshold();

      ----------- end diff -----------

  14) src/CLI/Application.php
      ---------- begin diff ----------
--- /usr/local/src/phpcpd/src/CLI/Application.php
+++ /usr/local/src/phpcpd/src/CLI/Application.php
@@ -17,8 +17,8 @@
 use SebastianBergmann\PHPCPD\Detector\Detector;
 use SebastianBergmann\PHPCPD\Detector\Strategy\AbstractStrategy;
 use SebastianBergmann\PHPCPD\Detector\Strategy\DefaultStrategy;
+use SebastianBergmann\PHPCPD\Detector\Strategy\StrategyConfiguration;
 use SebastianBergmann\PHPCPD\Detector\Strategy\SuffixTreeStrategy;
-use SebastianBergmann\PHPCPD\Detector\Strategy\StrategyConfiguration;
 use SebastianBergmann\PHPCPD\Log\PMD;
 use SebastianBergmann\PHPCPD\Log\Text;
 use SebastianBergmann\Timer\ResourceUsageFormatter;

      ----------- end diff -----------


Checked all files in 1.228 seconds, 8.000 MB memory used

To be honest, though, I would simply run

$ ./tools/php-cs-fixer fix

and be done with it :)

@olleharstedt
Contributor Author

Got it!

Looks like it wants to remove the ConQAT Apache license header?

@sebastianbergmann
Owner

Looks like it wants to remove the ConQAT Apache license header?

Yes.

@olleharstedt
Contributor Author

Looks like it wants to remove the ConQAT Apache license header?

Yes.

Is this legally correct? IIRC, BSD license is more permissive than Apache 2.0.

@sebastianbergmann
Owner

Is this legally correct? IIRC, BSD license is more permissive than Apache 2.0.

I would think so. You wrote new PHP code, you are not copying code as-is from another project.

@olleharstedt
Contributor Author

olleharstedt commented Jun 23, 2021

Is this legally correct? IIRC, BSD license is more permissive than Apache 2.0.

I would think so. You wrote new PHP code, you are not copying code as-is from another project.

Depends on your definition of "new". :) The code is manually transpiled from the original Java version. From my discussions on the #fsf IRC channel, it seemed like a good idea to keep it.

@olleharstedt
Contributor Author

Not done, need to update tests.

@olleharstedt
Contributor Author

OK, please run the GitHub Actions suite.

I did not add the SuffixTreeStrategy to the old test file, since it behaved slightly differently. The main problem was how it behaves when all tokens repeat in a small pattern, e.g. with the fixtures a.php, b.php and c.php. Instead, I added a new test for edit distance inside a clone.

One todo is to make EditDistanceTest work with three files. For some reason, the reporting behaviour was different with three files than with two. It might be related to the part of the algorithm I didn't transpile (yet), GappedCloneConsumer, which detects clones with gaps as part of a post-processing step. I do think the new algorithm is still useful without it, though.
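
For readers unfamiliar with the term, "edit distance inside a clone" can be illustrated with a minimal sketch. This is not the code from this PR: the function name `tokenEditDistance` is hypothetical, and it computes a plain Levenshtein distance over two token lists, whereas the suffix-tree implementation tracks the distance while walking the tree. It only shows what the `--edit-distance` threshold is measured in.

```php
<?php declare(strict_types=1);

/**
 * Hypothetical helper: Levenshtein distance between two token sequences.
 * Each inserted, deleted or substituted token counts as one edit.
 *
 * @param string[] $a
 * @param string[] $b
 */
function tokenEditDistance(array $a, array $b): int
{
    // Normalise keys so both inputs are indexed 0..n-1.
    $a = array_values($a);
    $b = array_values($b);

    // Distance from the empty prefix of $a to every prefix of $b.
    $previous = range(0, count($b));

    foreach ($a as $i => $tokenA) {
        // Distance from the first $i + 1 tokens of $a to the empty prefix of $b.
        $current = [$i + 1];

        foreach ($b as $j => $tokenB) {
            $substitution = $previous[$j] + ($tokenA === $tokenB ? 0 : 1);
            $insertion    = $current[$j] + 1;
            $deletion     = $previous[$j + 1] + 1;
            $current[]    = min($substitution, $insertion, $deletion);
        }

        $previous = $current;
    }

    return $previous[count($b)];
}

// Two statements that differ in a single token.
echo tokenEditDistance(['$a', '=', '1', ';'], ['$a', '=', '2', ';']), "\n"; // 1
```

Under this reading, two token runs would still count as one clone as long as the number of differing tokens stays at or below the configured threshold.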

@olleharstedt
Contributor Author

Fix pushed. Didn't have xdebug installed locally, so missed the coverage issue.

@codecov

codecov bot commented Jul 6, 2021

Codecov Report

Merging #199 (6231b93) into master (951cfbe) will increase coverage by 36.59%.
The diff coverage is 83.96%.


@@              Coverage Diff              @@
##             master     #199       +/-   ##
=============================================
+ Coverage     29.19%   65.79%   +36.59%     
- Complexity      121      293      +172     
=============================================
  Files            12       22       +10     
  Lines           387      877      +490     
=============================================
+ Hits            113      577      +464     
- Misses          274      300       +26     
| Impacted Files | Coverage Δ |
|---|---|
| src/CLI/Application.php | 0.00% <0.00%> (ø) |
| src/CLI/ArgumentsBuilder.php | 51.51% <25.00%> (+51.51%) ⬆️ |
| src/Detector/Strategy/SuffixTree/PairList.php | 38.33% <38.33%> (ø) |
| src/Detector/Strategy/DefaultStrategy.php | 79.26% <75.00%> (ø) |
| src/CLI/Arguments.php | 60.00% <80.00%> (+60.00%) ⬆️ |
| src/Detector/Strategy/SuffixTree/Token.php | 83.33% <83.33%> (ø) |
| src/Detector/Strategy/SuffixTree/Sentinel.php | 84.61% <84.61%> (ø) |
| ...SuffixTree/ApproximateCloneDetectingSuffixTree.php | 93.61% <93.61%> (ø) |
| src/Detector/Strategy/SuffixTreeStrategy.php | 94.44% <94.44%> (ø) |
| ...tector/Strategy/SuffixTree/SuffixTreeHashTable.php | 96.22% <96.22%> (ø) |
| ... and 17 more | |

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 951cfbe...6231b93. Read the comment docs.

@olleharstedt
Contributor Author

Do you want me to act on the Codecov report? Maybe also run the CI automatically on push? It's free, AFAIK.

@olleharstedt
Contributor Author

Hm, the last failing 8.1 test is due to a PHPUnit PHAR issue, right? I'll just wait and see. :)

@olleharstedt
Contributor Author

Any updates on PHPUnit and PHP 8.1?

@sebastianbergmann
Owner

Hm, the last failing 8.1 test is due to phpunit phar issue, right?

Looks like it.

@sebastianbergmann
Owner

I have merged this into master, thank you for contributing this alternative strategy.

There is at least one TODO in the code at https://github.com/sebastianbergmann/phpcpd/blob/master/src/Detector/Strategy/SuffixTreeStrategy.php#L105. Can you have a look at this and check whether there are any other loose ends that need tidying?

There is at least one property that I think should be a constant instead: https://github.com/sebastianbergmann/phpcpd/blob/master/src/Detector/Strategy/SuffixTree/ApproximateCloneDetectingSuffixTree.php#L42 Can you have a look at this, please?

There is at least one file that contains version control placeholders from the original code: https://github.com/sebastianbergmann/phpcpd/blob/master/src/Detector/Strategy/SuffixTree/PairList.php#L17 Please remove these, thanks.

I do not understand what you're doing with templates here:

As far as I can see, $firstType and $secondType are not used. Can they be removed?

@olleharstedt
Contributor Author

Thanks!

Sure, will look at those issues.

@olleharstedt
Contributor Author

As far as I can see, $firstType and $secondType are not used. Can they be removed?

I wish! Since PHP does not support generics, this is how you tell Psalm the types of the templates.
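
As a rough illustration of that pattern (a hypothetical `TypedPairList`, not the actual `PairList` from this PR), one documented way to bind class-level Psalm templates is to pass `class-string` arguments to the constructor. The arguments look unused at runtime; they exist so the static analyser can infer `T` and `S`. This sketch assumes object element types, which may differ from what the PR does.

```php
<?php declare(strict_types=1);

/**
 * Hypothetical example of Psalm templates on a pair container.
 *
 * @template T of object
 * @template S of object
 */
final class TypedPairList
{
    /** @var array<int, T> */
    private $firsts = [];

    /** @var array<int, S> */
    private $seconds = [];

    /**
     * Both parameters are intentionally ignored at runtime. They only
     * let Psalm bind T and S from the call site.
     *
     * @param class-string<T> $firstType
     * @param class-string<S> $secondType
     */
    public function __construct(string $firstType, string $secondType)
    {
    }

    /**
     * @param T $first
     * @param S $second
     */
    public function add($first, $second): void
    {
        $this->firsts[]  = $first;
        $this->seconds[] = $second;
    }

    /** @return T */
    public function getFirst(int $index)
    {
        return $this->firsts[$index];
    }

    /** @return S */
    public function getSecond(int $index)
    {
        return $this->seconds[$index];
    }

    public function size(): int
    {
        return count($this->firsts);
    }
}

// With these arguments, Psalm can treat $pairs as TypedPairList<stdClass, ArrayObject>.
$pairs = new TypedPairList(stdClass::class, ArrayObject::class);
$pairs->add(new stdClass, new ArrayObject);
```

Removing the two constructor parameters would not change runtime behaviour, but the analyser would lose the information it needs to check `add()` and `getFirst()` calls.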

@olleharstedt
Contributor Author

There is at least one TODO in the code at https://github.com/sebastianbergmann/phpcpd/blob/master/src/Detector/Strategy/SuffixTreeStrategy.php#L105. Can you have a look at this and check whether there are any other loose ends that need tidying?

The current index "+1" works in the "normal" case, but it's pretty easy to "freak out" the algorithm and get nonsensical results, especially when you lower the token count. The original implementation had an extra post-processing step that I didn't port over. I guess it's somewhat open which type of code-base should be considered "normal". Maybe I can run a "call to action" and ask people to test the algorithm and tell me how it reacts. So far, I've only used it on parts of the Yii framework and our own code-base (fairly big, lots of legacy code). Hm, or maybe I can run it on a couple of the most popular Composer packages and see...

@olleharstedt
Contributor Author

Hm, seems like there's a bug that causes an infinite loop now. Didn't see that before.

@olleharstedt
Contributor Author

olleharstedt commented Aug 30, 2021

Oh, it stopped, but it took like 15 minutes... Phew. It successfully found duplicates in the symfony/string library, which is among phpcpd's dependencies, e.g.

- phpcpd/vendor/symfony/string/AbstractString.php:98-158 (60 lines)
phpcpd/vendor/symfony/string/AbstractString.php:165-225

One issue was this:

100.00% duplicated lines out of 0 total lines of code.

So it doesn't report the scanned lines properly.
