How It Works
A short description of how things are done, for those who would like to contribute to the project.
- The run.php file launches the scan on all the selected files and directories.
- The Tokenizer.php file takes a file as input and returns an array of TokenInfo items.
- The PHPCheckStyle.php file is the main part of the project. It launches the tokenization of the files and then analyses the stream of tokens.
  - The processToken method is a big switch / case that launches a processXXX method depending on the token.
  - The processXXX methods detect different cases and launch the "check" rules.
  - The checkXXX methods perform the checks for the rules that are activated.
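The chain described above (processToken, then a processXXX method, then a checkXXX method) can be sketched roughly as follows. This is only an illustration: the class name, the two rules and the message format are hypothetical and are not taken from the project's code.

```php
<?php
// Hypothetical sketch: names below are illustrative, not the project's real code.
class DispatchSketch
{
    public $messages = array();
    private $activeRules;

    public function __construct(array $activeRules)
    {
        $this->activeRules = $activeRules;
    }

    // Big switch: route each token to the matching processXXX method.
    public function processToken($token)
    {
        // token_get_all() returns arrays [id, text, line] or plain one-character strings.
        $id = is_array($token) ? $token[0] : $token;
        switch ($id) {
            case T_VARIABLE:
                $this->processVariable($token);
                break;
            case T_GOTO:
                $this->processGoto($token);
                break;
            default:
                break; // tokens this sketch does not care about
        }
    }

    // processXXX: recognise the case, then launch the relevant checks.
    private function processVariable(array $token)
    {
        $this->checkVariableNaming($token[1], $token[2]);
    }

    private function processGoto(array $token)
    {
        $this->checkNoGoto($token[2]);
    }

    // checkXXX: report only when the rule is activated.
    private function checkVariableNaming($name, $line)
    {
        if (!empty($this->activeRules['variableNaming']) && !preg_match('/^\$[a-z]/', $name)) {
            $this->messages[] = sprintf('line %d: variable %s should start with a lowercase letter', $line, $name);
        }
    }

    private function checkNoGoto($line)
    {
        if (!empty($this->activeRules['noGoto'])) {
            $this->messages[] = sprintf('line %d: goto is not allowed', $line);
        }
    }
}

$checker = new DispatchSketch(array('variableNaming' => true, 'noGoto' => false));
foreach (token_get_all("<?php\n\$Bad = 1;\n\$good = 2;\ngoto end;\nend:\n") as $token) {
    $checker->processToken($token);
}
print_r($checker->messages);
```

Here the goto rule is switched off, so only the naming check on $Bad produces a message.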
We use the default PHP Tokenizer, which we extend to identify tabs and line returns and to add a few tokens. This allows the project to work on any computer that has PHP installed, without any modification.
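As a rough idea of what extending the default tokenizer can look like, the sketch below wraps PHP's token_get_all() and splits each whitespace token into separate tab, newline and space tokens. The function name and the T_TAB, T_NEW_LINE and T_SPACE names are assumptions made for this example; the project's own token names and TokenInfo structure may differ.

```php
<?php
// Sketch only: the function name and the extra token names are hypothetical.
function sketchTokenize($source)
{
    $result = array();
    foreach (token_get_all($source) as $token) {
        if (!is_array($token)) {
            // Single-character tokens such as "(" or "{" come back as plain strings.
            $result[] = array('name' => $token, 'text' => $token);
        } elseif ($token[0] === T_WHITESPACE) {
            // Split a whitespace run into separate tab, newline and space tokens.
            preg_match_all('/\t|\r\n|\n|\r| +/', $token[1], $matches);
            foreach ($matches[0] as $piece) {
                if ($piece === "\t") {
                    $name = 'T_TAB';
                } elseif ($piece[0] === ' ') {
                    $name = 'T_SPACE';
                } else {
                    $name = 'T_NEW_LINE';
                }
                $result[] = array('name' => $name, 'text' => $piece);
            }
        } else {
            // Every other token keeps the name given by the standard tokenizer.
            $result[] = array('name' => token_name($token[0]), 'text' => $token[1]);
        }
    }
    return $result;
}

$names = array_column(sketchTokenize("<?php\n\tif (\$a) {\n}\n"), 'name');
echo implode(' ', $names), "\n";
```

Note that the standard tokenizer includes the newline that directly follows the opening tag in the T_OPEN_TAG token, so this sketch does not report that one separately.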
A cleaner / more complete solution would be to use a proper AST and parse the files with complete information about each token and its context.
The difficulty is that PHP didn't have a real official grammar until PHP7/HHVM, and it's not easy to build such an analyser. Some projects like PHP-Parser could help do that.
To compensate for the lack of a real AST, we build a stack of the currently open statements during the analysis.
StatementItem objects are stored in the StatementStack. This allows us to have some limited contextual information.
This can be easily visualised by launching the tool with the --debug flag. It will display something like this:
CLASS(PHPCheckstyle) -> FUNCTION(_processControlStatement) -> IF -> IF
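A minimal sketch of such a stack is shown below. The two classes are simplified, hypothetical stand-ins for StatementItem and StatementStack (the method names push, pop, getCurrent and dump are assumptions), but they are enough to reproduce the debug line above.

```php
<?php
// Hypothetical, simplified stand-ins for StatementItem and StatementStack.
class SketchStatementItem
{
    public $type; // e.g. CLASS, FUNCTION, IF
    public $name; // class or function name, null for control statements

    public function __construct($type, $name = null)
    {
        $this->type = $type;
        $this->name = $name;
    }
}

class SketchStatementStack
{
    private $items = array();

    // Called when a statement is opened.
    public function push(SketchStatementItem $item)
    {
        $this->items[] = $item;
    }

    // Called when the innermost statement is closed.
    public function pop()
    {
        return array_pop($this->items);
    }

    // The innermost open statement, or null when the stack is empty.
    public function getCurrent()
    {
        return count($this->items) > 0 ? $this->items[count($this->items) - 1] : null;
    }

    // Render the stack from outermost to innermost statement.
    public function dump()
    {
        $parts = array();
        foreach ($this->items as $item) {
            $parts[] = $item->name === null ? $item->type : $item->type . '(' . $item->name . ')';
        }
        return implode(' -> ', $parts);
    }
}

$stack = new SketchStatementStack();
$stack->push(new SketchStatementItem('CLASS', 'PHPCheckstyle'));
$stack->push(new SketchStatementItem('FUNCTION', '_processControlStatement'));
$stack->push(new SketchStatementItem('IF'));
$stack->push(new SketchStatementItem('IF'));
$dump = $stack->dump();
echo $dump, "\n"; // CLASS(PHPCheckstyle) -> FUNCTION(_processControlStatement) -> IF -> IF

$stack->pop(); // the inner IF is closed
$current = $stack->getCurrent();
```

A check can then ask the stack what the innermost open statement is, which is the kind of limited context a flat token stream does not give on its own.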
The main analysis is done in one pass for each file.
For each token encountered, the processToken method can start an analysis. This analysis is done using the statementStack and often by looking ahead at the next tokens. We currently use some global flags too; removing those flags would be an improvement.
The tokenizer has some helper functions to peek at the value of the next token or of the next valid token (one that is not a space or a tab).
For performance reasons, we try to limit the number of tokens we have to look at around the current token.
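The peek helpers could look something like the sketch below, which is built directly on token_get_all(). The class and method names are assumptions made for this example, and the sketch treats every T_WHITESPACE token (spaces, tabs and newlines) as not valid.

```php
<?php
// Sketch only: class and method names are hypothetical.
class PeekSketch
{
    private $tokens;
    private $index = 0;

    public function __construct($source)
    {
        $this->tokens = token_get_all($source);
    }

    // Move the "current token" cursor to a given position.
    public function moveTo($index)
    {
        $this->index = $index;
    }

    // Text of the token right after the current one, or null at the end.
    public function peekNextToken()
    {
        return $this->textAt($this->index + 1);
    }

    // Text of the next token that is not whitespace, or null if there is none.
    public function peekNextValidToken()
    {
        $count = count($this->tokens);
        for ($i = $this->index + 1; $i < $count; $i++) {
            $token = $this->tokens[$i];
            if (!(is_array($token) && $token[0] === T_WHITESPACE)) {
                return $this->textAt($i);
            }
        }
        return null;
    }

    private function textAt($i)
    {
        if (!isset($this->tokens[$i])) {
            return null;
        }
        $token = $this->tokens[$i];
        return is_array($token) ? $token[1] : $token;
    }
}

$peek = new PeekSketch("<?php\nif (\$a) {}");
$peek->moveTo(1); // the cursor is now on the "if" token
$next = $peek->peekNextToken();
$nextValid = $peek->peekNextValidToken();
```

Neither helper moves the cursor, so a processXXX method can look ahead without disturbing the single pass over the file.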
When a check rule is not satisfied, an error message is sent to one or more reporters. All the reporters (Console, HTML, XML, ...) extend the Reporter class.
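The reporter arrangement can be pictured with the sketch below: an abstract base class and two concrete reporters that each receive the same error. The class names, the writeError signature and the output formats are hypothetical and do not describe the actual Reporter API.

```php
<?php
// Hypothetical sketch: class names, method signature and formats are illustrative.
abstract class SketchReporter
{
    abstract public function writeError($file, $line, $message);
}

class SketchConsoleReporter extends SketchReporter
{
    public $lines = array();

    public function writeError($file, $line, $message)
    {
        $this->lines[] = sprintf('%s:%d %s', $file, $line, $message);
    }
}

class SketchXmlReporter extends SketchReporter
{
    public $lines = array();

    public function writeError($file, $line, $message)
    {
        // Escape values so the attributes stay well-formed.
        $this->lines[] = sprintf(
            '<error file="%s" line="%d" message="%s"/>',
            htmlspecialchars($file, ENT_QUOTES),
            $line,
            htmlspecialchars($message, ENT_QUOTES)
        );
    }
}

// Send one error to every configured reporter.
function sketchReport(array $reporters, $file, $line, $message)
{
    foreach ($reporters as $reporter) {
        $reporter->writeError($file, $line, $message);
    }
}

$console = new SketchConsoleReporter();
$xml = new SketchXmlReporter();
sketchReport(array($console, $xml), 'Foo.php', 12, 'goto is not allowed');
print_r($console->lines);
print_r($xml->lines);
```

Because the checks only talk to the base class, adding a new output format means adding one more subclass.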