- Take the ascending set of unique lengths of all dataset words.
- For each length in this set, sequentially apply Lange and Wiehagen's algorithm to the words of that length.
- If the pattern inclusion condition is not satisfied, apply Angluin's algorithm to these patterns.
- The resulting pattern corresponds to the words of minimum length.
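The per-length step above can be sketched as follows. This is a minimal illustration of the same-length pattern construction used in Lange and Wiehagen's algorithm, not the repo's actual code: positions where all words agree become constants, and positions whose letter columns coincide share one variable.

```python
def lw_pattern(words):
    """Build a pattern from equal-length words (illustrative sketch):
    constant where all words agree, shared variable for identical columns."""
    assert len({len(w) for w in words}) == 1, "words must have equal length"
    columns = list(zip(*words))      # the letter column at each position
    var_of = {}                      # column -> variable name
    pattern = []
    for col in columns:
        if len(set(col)) == 1:       # all words agree: keep the constant
            pattern.append(col[0])
        else:                        # disagreement: introduce a variable,
            if col not in var_of:    # reusing it for identical columns
                var_of[col] = f"x{len(var_of) + 1}"
            pattern.append(var_of[col])
    return pattern

print(lw_pattern(["abab", "acac"]))  # -> ['a', 'x1', 'a', 'x1']
```

Running the algorithm length by length and then reconciling the resulting patterns (Angluin's algorithm when inclusion fails) yields the final hypothesis.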
```shell
git clone https://github.com/julia-bel/machine_learning_of_patterns
cd machine_learning_of_patterns
pip install -r requirements.txt
```

```shell
python main.py [-h] -d DATASET_PATH [-o]
```
- `-d, --dataset_path` - path to the file with words for learning (default: `datasets/dataset.csv`).
- `-o, --optimize` - whether to use the optimized version of Angluin's algorithm.
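For background on what is being learned, here is a hedged sketch of non-erasing pattern semantics (illustrative only, not the repo's `pattern.py` API): a pattern is a sequence of constants and variables `x1, x2, ...`, and a word matches if every variable can be replaced by some non-empty string, with the same string at every occurrence of that variable.

```python
def matches(pattern, word):
    """Check whether `word` is in the language of the non-erasing
    `pattern` (a list of constants and 'x...'-named variables),
    by backtracking over non-empty variable substitutions."""
    def backtrack(i, pos, subst):
        if i == len(pattern):
            return pos == len(word)
        item = pattern[i]
        if not item.startswith("x"):               # constant symbol
            if word.startswith(item, pos):
                return backtrack(i + 1, pos + len(item), subst)
            return False
        if item in subst:                          # bound variable: must repeat
            val = subst[item]
            if word.startswith(val, pos):
                return backtrack(i + 1, pos + len(val), subst)
            return False
        for end in range(pos + 1, len(word) + 1):  # try each non-empty value
            subst[item] = word[pos:end]
            if backtrack(i + 1, end, subst):
                return True
            del subst[item]
        return False
    return backtrack(0, 0, {})

print(matches(["a", "x1", "a", "x1"], "abab"))  # True  (x1 = "b")
print(matches(["a", "x1", "a", "x1"], "abac"))  # False
```

The "non-erasing" restriction is the `range(pos + 1, ...)` lower bound: variables may never be substituted by the empty string.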
```
.
|-- README.md
|-- assets
|   `-- readme visualizations
|-- automaton
|   |-- abstract_automaton.py - superclass for NFA and DFA
|   `-- automaton.py - implementation of NFA and DFA for regex matching
|-- datasets
|   `-- files with datasets
|-- experiments
|   |-- matching_time.ipynb - notebook with time measurements
|   |-- two_steps_generation.ipynb - notebook for dataset creation
|   `-- dataset.csv - dataset regexes
|-- learning_algorithm
|   |-- angluin_learning.py - implementation of Angluin's algorithm
|   `-- lange_weihagen_learning.py - implementation of LWA
|-- pattern
|   |-- abstract_pattern.py - superclass for non-erasing patterns
|   `-- pattern.py - implementation of non-erasing patterns
|-- regex
|   |-- abstract_regex.py - superclass for regexes
|   |-- const.py - constants for the regex module
|   |-- generator.py - random regex generator
|   |-- parser.py - regex parser
|   `-- regex.py - implementation of regexes
|-- utils
|   `-- utils.py - common utilities
|-- visualization
|   `-- *.gv, *.gv.png files for DFA, NFA and regex structure visualization
|-- requirements.txt
`-- main.py - script for pattern learning
```
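The `automaton` module matches regexes with NFAs and DFAs. As a minimal illustration of the underlying idea (an assumed sketch, not the repo's `automaton.py` API), an NFA can be simulated by tracking the set of currently reachable states:

```python
def nfa_accepts(transitions, start, accepting, word):
    """Simulate an NFA: `transitions` maps (state, symbol) to a set of
    successor states; accept if any reachable state is accepting."""
    current = {start}
    for symbol in word:
        # union of all successors of the current state set on this symbol
        current = set().union(*(transitions.get((q, symbol), set())
                                for q in current))
    return bool(current & accepting)

# NFA for the language of words over {a, b} ending in "ab"
delta = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
}
print(nfa_accepts(delta, 0, {2}, "aab"))  # True
print(nfa_accepts(delta, 0, {2}, "aba"))  # False
```

Determinizing this state-set simulation (the subset construction) is exactly what turns such an NFA into a DFA.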