Skip to content
tchule edited this page Feb 27, 2017 · 3 revisions

Short description of how things are done. For those who would like to contribute to the project.

Components of the project

  • The run.php file launch the scan on all the selected files / directories.
  • The Tokenizer.php file take a file in entry and return an array of TokenInfo items.
  • the PHPCheckStyle.php file is the main part of the project. It launch the tokenization of the files and then analyse the stream of tokens.
    • The processToken method is a big SWITCH / CASE that launch a processXXX method depending on the token .
    • The processXXX methods detect different cases and launch the "check" rules.
    • The checkXXX methods do the checks of the rules that are activated.

Steps

Tokenization

We use the default PHP Tokenizer that we extend to identify tabs and returns and to add a few tokens. This allow the project to work on any computer having PHP installed without any modification.

A cleaner / more complete solution would be to use a proper AST and parse the files with complete information about each token and its context.

The difficulty is that PHP didn't have a real official grammar until PHP7/HHVM and it's not easy to build such an analyser. Some projects like could PHP-Parser help do that.

Statements Stack

To compensate for the lack of a real AST, we build during the analysis a stack of currently opened statements.

StatementItem objets are stored in the StatementStack. this allow us to have some limited contextual information.

This can be easily visualised by launching the tool with the --debug flag. It will display something like this:

CLASS(PHPCheckstyle) -> FUNCTION(_processControlStatement) -> IF -> IF

Analysis

The main analysis is done in 1 pass for each file.

For each token encountered, the processToken method can start an analysis. This analysis is done using the statementStack and often by looking ahead the next tokens. We currently use some global flags too, removing those flags would be an improvement.

The tokenizer has some helper functions to peek the value for the next token or the next valid token (not a space or a tab).

For performance reasons, we try to limit the number of tokens we have to look for around the current token.

Reporting

When a check rule is not verified an error message is sent to one or more reporters. All the reporters (Console, HTML, XML, ...) extend the Reporter class.