A C implementation of suffix trees in two variants: a standard tree that stores one character per node, and a compressed tree that stores a string label per node. The project covers tree construction, pattern matching, tree statistics, and breadth-first traversal output.
    typedef struct node {
        char c;                     // Character stored in node
        struct node *children[27];  // Array for a-z + '$'
    } node, *tree;
    typedef struct nodeComp {
        char *label;                // Edge label string
        struct nodeComp *children[27];
    } nodeC, *treeC;
| Operation | Description |
|---|---|
| initTree() | Initializes empty tree with root node |
| createTree() | Builds complete suffix tree from input |
| insertWord() | Inserts single suffix into tree |
| decomposeWord() | Generates all suffixes of input string |
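
The construction operations above can be sketched roughly as follows. This is a minimal sketch, not the project's actual code: the function names mirror the table, but their signatures, the `childIndex`, `newNode`, and `freeTree` helpers, and the choice to store '$' in slot 0 are all assumptions. Suffix generation (the role of decomposeWord()) is done here by pointer offsets into one terminated buffer.

```c
#include <stdlib.h>
#include <string.h>

#define ALPHABET 27 /* 'a'..'z' plus the terminator '$' */

typedef struct node {
    char c;                          /* character stored in node */
    struct node *children[ALPHABET]; /* assumed layout: slot 0 is '$', 1..26 are a-z */
} node, *tree;

/* Hypothetical helper: map a character to its child slot, or -1 if invalid. */
static int childIndex(char ch) {
    if (ch == '$') return 0;
    if (ch >= 'a' && ch <= 'z') return ch - 'a' + 1;
    return -1;
}

/* Hypothetical helper: allocate a node with every child pointer set to NULL. */
static tree newNode(char ch) {
    tree t = calloc(1, sizeof(node));
    if (t != NULL) t->c = ch;
    return t;
}

/* Hypothetical helper: release a whole subtree in post-order. */
static void freeTree(tree t) {
    if (t == NULL) return;
    for (int i = 0; i < ALPHABET; i++) freeTree(t->children[i]);
    free(t);
}

/* Create an empty tree consisting only of the root node. */
tree initTree(void) { return newNode('\0'); }

/* Insert one '$'-terminated suffix, creating missing nodes along the path.
   Returns 0 on success, -1 on an invalid character or allocation failure. */
int insertWord(tree root, const char *suffix) {
    tree cur = root;
    for (; *suffix != '\0'; suffix++) {
        int i = childIndex(*suffix);
        if (i < 0) return -1;
        if (cur->children[i] == NULL) {
            cur->children[i] = newNode(*suffix);
            if (cur->children[i] == NULL) return -1;
        }
        cur = cur->children[i];
    }
    return 0;
}

/* Build the tree for `text` by inserting every suffix of text + "$". */
tree createTree(const char *text) {
    size_t n = strlen(text);
    char *buf = malloc(n + 2);
    tree root = initTree();
    if (buf == NULL || root == NULL) { free(buf); free(root); return NULL; }
    memcpy(buf, text, n);
    buf[n] = '$';
    buf[n + 1] = '\0';
    for (size_t i = 0; i <= n; i++) { /* i == n inserts the lone "$" */
        if (insertWord(root, buf + i) != 0) {
            freeTree(root);
            free(buf);
            return NULL;
        }
    }
    free(buf);
    return root;
}
```

Calling `createTree("banana")` in this sketch inserts the seven suffixes `banana$`, `anana$`, ..., `a$`, `$`; input containing anything other than a-z makes it return NULL.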
| Operation | Description |
|---|---|
| calculateNumOfLeaves() | Counts terminal (leaf) nodes |
| countNodesOnLevel() | Counts nodes at a specific depth |
| maxNumOfDescendants() | Finds maximum branching factor |
| findSuffixes() | Pattern matching in tree |
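
A rough sketch of how these analysis operations could work on the standard tree is shown below. The signatures are assumptions, `childIndex` and `buildTree` are hypothetical helpers included only to keep the example self-contained, and `findSuffixes` is assumed to report whether the pattern is a suffix of the text (that is, whether the pattern's path is followed by a '$' child).

```c
#include <stdlib.h>
#include <string.h>

#define ALPHABET 27

typedef struct node {
    char c;
    struct node *children[ALPHABET]; /* assumed layout: slot 0 is '$', 1..26 are a-z */
} node, *tree;

/* Hypothetical helper: map a character to its child slot, or -1 if invalid. */
static int childIndex(char ch) {
    if (ch == '$') return 0;
    if (ch >= 'a' && ch <= 'z') return ch - 'a' + 1;
    return -1;
}

/* Hypothetical builder: inserts every suffix of text + "$". Returns NULL on
   invalid input or allocation failure (partial-tree cleanup omitted here). */
static tree buildTree(const char *text) {
    size_t n = strlen(text);
    tree root = calloc(1, sizeof(node));
    if (root == NULL) return NULL;
    for (size_t start = 0; start <= n; start++) {
        tree cur = root;
        for (size_t j = start; j <= n; j++) {
            char ch = (j == n) ? '$' : text[j];
            int i = childIndex(ch);
            if (i < 0) return NULL;
            if (cur->children[i] == NULL) {
                cur->children[i] = calloc(1, sizeof(node));
                if (cur->children[i] == NULL) return NULL;
                cur->children[i]->c = ch;
            }
            cur = cur->children[i];
        }
    }
    return root;
}

/* Count nodes without children (a lone root counts as one leaf). */
int calculateNumOfLeaves(const node *t) {
    if (t == NULL) return 0;
    int kids = 0, leaves = 0;
    for (int i = 0; i < ALPHABET; i++) {
        if (t->children[i] != NULL) {
            kids++;
            leaves += calculateNumOfLeaves(t->children[i]);
        }
    }
    return kids == 0 ? 1 : leaves;
}

/* Count nodes at the given depth; the root is at level 0. */
int countNodesOnLevel(const node *t, int level) {
    if (t == NULL || level < 0) return 0;
    if (level == 0) return 1;
    int count = 0;
    for (int i = 0; i < ALPHABET; i++)
        count += countNodesOnLevel(t->children[i], level - 1);
    return count;
}

/* Largest number of direct children held by any single node. */
int maxNumOfDescendants(const node *t) {
    if (t == NULL) return 0;
    int kids = 0, best = 0;
    for (int i = 0; i < ALPHABET; i++) {
        if (t->children[i] != NULL) {
            int sub = maxNumOfDescendants(t->children[i]);
            if (sub > best) best = sub;
            kids++;
        }
    }
    return kids > best ? kids : best;
}

/* Return 1 if `pattern` is a suffix of the indexed text, else 0. */
int findSuffixes(const node *root, const char *pattern) {
    const node *cur = root;
    for (; cur != NULL && *pattern != '\0'; pattern++) {
        int i = childIndex(*pattern);
        if (i < 0) return 0;
        cur = cur->children[i];
    }
    return cur != NULL && cur->children[0] != NULL;
}
```

For a tree built from "banana", this sketch finds 7 leaves (one per suffix, including the lone `$`) and a maximum of 4 children, at the root.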
| Operation | Description |
|---|---|
| stToCst() | Converts to compressed format |
| initTreeC() | Initializes compressed tree |
| isSingleChild() | Checks for path compression opportunity |
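
One way the conversion could work is sketched below: every chain of nodes that each have exactly one child is merged into a single compressed node whose label is the concatenated characters. This is an illustration under assumptions, not the project's code: the signatures, the `childIndex`, `onlyChild`, `compressFrom`, and `buildTree` helpers are hypothetical, and merging the '$' terminator into labels is a choice made here that the project may not share.

```c
#include <stdlib.h>
#include <string.h>

#define ALPHABET 27

typedef struct node {
    char c;
    struct node *children[ALPHABET]; /* assumed layout: slot 0 is '$', 1..26 are a-z */
} node, *tree;

typedef struct nodeComp {
    char *label;                     /* edge label string, NULL for the root */
    struct nodeComp *children[ALPHABET];
} nodeC, *treeC;

static int childIndex(char ch) {
    if (ch == '$') return 0;
    if (ch >= 'a' && ch <= 'z') return ch - 'a' + 1;
    return -1;
}

/* Hypothetical builder for the standard tree (cleanup on failure omitted). */
static tree buildTree(const char *text) {
    size_t n = strlen(text);
    tree root = calloc(1, sizeof(node));
    if (root == NULL) return NULL;
    for (size_t start = 0; start <= n; start++) {
        tree cur = root;
        for (size_t j = start; j <= n; j++) {
            char ch = (j == n) ? '$' : text[j];
            int i = childIndex(ch);
            if (i < 0) return NULL;
            if (cur->children[i] == NULL) {
                cur->children[i] = calloc(1, sizeof(node));
                if (cur->children[i] == NULL) return NULL;
                cur->children[i]->c = ch;
            }
            cur = cur->children[i];
        }
    }
    return root;
}

/* Allocate an empty compressed node (NULL label, NULL children). */
treeC initTreeC(void) { return calloc(1, sizeof(nodeC)); }

/* Return 1 if the node has exactly one child, i.e. it can be merged with it. */
int isSingleChild(const node *t) {
    int kids = 0;
    for (int i = 0; i < ALPHABET; i++)
        if (t->children[i] != NULL) kids++;
    return kids == 1;
}

/* Hypothetical helper: first non-NULL child, or NULL if there is none. */
static const node *onlyChild(const node *t) {
    for (int i = 0; i < ALPHABET; i++)
        if (t->children[i] != NULL) return t->children[i];
    return NULL;
}

/* Merge the single-child chain starting at `t` into one compressed node.
   A failed allocation deeper down leaves that child NULL; a fuller version
   would propagate the error. */
static treeC compressFrom(const node *t) {
    size_t len = 1, k = 0;
    for (const node *p = t; isSingleChild(p); p = onlyChild(p)) len++;
    treeC out = initTreeC();
    if (out == NULL) return NULL;
    out->label = malloc(len + 1);
    if (out->label == NULL) { free(out); return NULL; }
    out->label[k++] = t->c;
    while (isSingleChild(t)) {
        t = onlyChild(t);
        out->label[k++] = t->c;
    }
    out->label[k] = '\0';
    for (int i = 0; i < ALPHABET; i++)
        if (t->children[i] != NULL)
            out->children[i] = compressFrom(t->children[i]);
    return out;
}

/* Convert a standard tree into a compressed one; the root is never merged. */
treeC stToCst(const node *root) {
    if (root == NULL) return NULL;
    treeC out = initTreeC();
    if (out == NULL) return NULL;
    for (int i = 0; i < ALPHABET; i++)
        if (root->children[i] != NULL)
            out->children[i] = compressFrom(root->children[i]);
    return out;
}
```

For "banana", the root of the compressed tree in this sketch gets four children labelled `$`, `a`, `banana$`, and `na`; the `na` node in turn has the children `$` and `na$`.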
- BFS traversal output
- Complete suffix tree building
- Dynamic memory management
Output format:

    <leaf_count>
    <k_length_suffixes>
    <max_descendants>
Input: "banana", ["na", "ana"]
Output (one line per pattern, 1 if the pattern is found):
1
1
- Path compression
- Space optimization
- Equivalent functionality
./stree [-c1 | -c2 <K> | -c3 | -c4] [input_file] [output_file]
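
The usage line above could be parsed along these lines. This is only a sketch: the `options` struct and `parseArgs` function are hypothetical names, and treating both file arguments as required and K as a non-negative integer are assumptions, since the usage line does not say.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the parsed command line. */
typedef struct {
    int task;           /* 1..4, taken from -c1 .. -c4 */
    long k;             /* only meaningful when task == 2 */
    const char *input;  /* input file path */
    const char *output; /* output file path */
} options;

/* Parse argv according to the usage line. Returns 0 on success, -1 on error. */
int parseArgs(int argc, char **argv, options *opt) {
    int next = 2; /* index of the first file argument */
    if (argc < 4 || opt == NULL) return -1;

    if (strcmp(argv[1], "-c1") == 0) {
        opt->task = 1;
    } else if (strcmp(argv[1], "-c2") == 0) {
        char *end;
        if (argc < 5) return -1;
        opt->task = 2;
        opt->k = strtol(argv[2], &end, 10);
        /* Reject empty, non-numeric, or negative K. */
        if (end == argv[2] || *end != '\0' || opt->k < 0) return -1;
        next = 3;
    } else if (strcmp(argv[1], "-c3") == 0) {
        opt->task = 3;
    } else if (strcmp(argv[1], "-c4") == 0) {
        opt->task = 4;
    } else {
        return -1;
    }

    if (argc != next + 2) return -1; /* exactly two file paths must follow */
    opt->input = argv[next];
    opt->output = argv[next + 1];
    return 0;
}
```

Under these assumptions, `./stree -c2 3 in.txt out.txt` yields task 2 with K = 3, while an unknown flag or a missing file path is rejected.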
- Dynamic allocation for all nodes
- Proper initialization of pointers
- Fixed-size arrays (27) for alphabet
- Cleanup routines for all allocations
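
A cleanup routine for each tree type might look like the sketch below. The names `freeTree` and `freeTreeC` are assumptions (the project's own cleanup functions are not listed here), and returning the number of released nodes is an addition made only so the behaviour is easy to check.

```c
#include <stdlib.h>

#define ALPHABET 27

typedef struct node {
    char c;
    struct node *children[ALPHABET];
} node, *tree;

typedef struct nodeComp {
    char *label;
    struct nodeComp *children[ALPHABET];
} nodeC, *treeC;

/* Free a standard tree in post-order: children first, then the node itself.
   Returns the number of nodes released. */
size_t freeTree(tree t) {
    if (t == NULL) return 0;
    size_t freed = 1;
    for (int i = 0; i < ALPHABET; i++)
        freed += freeTree(t->children[i]);
    free(t);
    return freed;
}

/* Same traversal for the compressed tree, which also owns its label string. */
size_t freeTreeC(treeC t) {
    if (t == NULL) return 0;
    size_t freed = 1;
    for (int i = 0; i < ALPHABET; i++)
        freed += freeTreeC(t->children[i]);
    free(t->label); /* free(NULL) is a no-op, so an unlabelled root is fine */
    free(t);
    return freed;
}
```

Freeing in post-order matters: releasing a node before visiting its children would read pointers from memory that has already been freed.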
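Construction: O(n²), where n is the length of the input text
Pattern Match: O(m), where m is the length of the pattern
Space (Standard): O(n²)
Space (Compressed): O(n) nodes (labels held as separate strings can still add up to O(n²) characters)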
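
The quadratic bounds can be checked by summing the suffix lengths, assuming each suffix of the '$'-terminated text is inserted one character at a time. A text of length n has n + 1 suffixes once '$' is appended, with lengths 1 through n + 1:

```latex
\sum_{i=1}^{n+1} i \;=\; \frac{(n+1)(n+2)}{2} \;=\; O(n^2)
```

For "banana" (n = 6) this gives 28 character steps. The same sum is an upper bound on the number of nodes in the standard tree, since each step creates at most one node.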
- Null pointer validation
- Input file verification
- Memory allocation checks
- Index bounds verification
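
The checks listed above could be combined roughly as follows. This is a sketch under assumptions: the status codes and the `openInput` and `terminateWord` functions are hypothetical and only illustrate validating pointers, files, allocations, and character ranges before any tree is built.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical status codes; the project may report errors differently. */
enum status { ST_OK = 0, ST_NULL_ARG, ST_BAD_FILE, ST_NO_MEMORY, ST_BAD_CHAR };

/* Open the input file for reading, validating the arguments first. */
enum status openInput(const char *path, FILE **out) {
    if (path == NULL || out == NULL) return ST_NULL_ARG;
    *out = fopen(path, "r");
    return *out != NULL ? ST_OK : ST_BAD_FILE;
}

/* Copy `word` and append '$'. Characters outside a-z are rejected so that
   later child-array lookups can never index out of bounds. The caller owns
   the returned string and must free() it. */
enum status terminateWord(const char *word, char **out) {
    if (word == NULL || out == NULL) return ST_NULL_ARG;
    size_t n = strlen(word);
    for (size_t i = 0; i < n; i++)
        if (word[i] < 'a' || word[i] > 'z') return ST_BAD_CHAR;
    char *copy = malloc(n + 2); /* room for '$' and the final '\0' */
    if (copy == NULL) return ST_NO_MEMORY;
    memcpy(copy, word, n);
    copy[n] = '$';
    copy[n + 1] = '\0';
    *out = copy;
    return ST_OK;
}
```

Returning a status code and passing results through an out-parameter keeps every failure visible to the caller, at the cost of slightly noisier call sites.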
make build # Compiles the project
make clean # Removes artifacts
- All suffixes are terminated with '$'
- Input strings use only lowercase English letters (a-z)
- Pattern matching returns a binary result (1 if the pattern is found, 0 otherwise)
- The compressed tree supports the same operations as the standard tree