SparkGST

Parallel construction of a generalized suffix tree (GST) in Spark.

Compile

git clone https://github.com/shad0w-walker233/SparkGST.git
cd SparkGST
sbt package

Execute

${SPARK_HOME}/bin/spark-submit \
--master <spark cluster master uri> \
--class GST.Main \
--executor-memory 15G \
--driver-memory 15G \
--executor-cores 4 \
<jar file path> \
hdfs://input_path \
hdfs://output_path \
<TASK_MUL (optional, default 7)> \
<MAX_PREFIX_LEN (optional, default 4)>
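
The last two arguments are optional and fall back to the defaults listed in the command. As a rough illustration only, positional arguments with defaults could be read as in the sketch below. The names ArgsSketch, Config, and parse are hypothetical and are not taken from GST.Main, which may handle its arguments differently.

```scala
// Hypothetical sketch: reading the four positional arguments with defaults.
// Not the project's actual parsing code.
object ArgsSketch {
  // Illustrative holder for the parsed arguments.
  final case class Config(input: String, output: String, taskMul: Int, maxPrefixLen: Int)

  def parse(args: Array[String]): Config = {
    require(args.length >= 2, "usage: <input path> <output path> [TASK_MUL] [MAX_PREFIX_LEN]")
    Config(
      input = args(0),
      output = args(1),
      // Defaults 7 and 4 come from the Execute section above.
      taskMul = args.lift(2).map(_.toInt).getOrElse(7),
      maxPrefixLen = args.lift(3).map(_.toInt).getOrElse(4)
    )
  }

  def main(args: Array[String]): Unit = println(parse(args))
}
```

Because the arguments are positional, MAX_PREFIX_LEN can only be set when TASK_MUL is given as well.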

Algorithm

1. Read all the files under the input path.
2. Preprocessing: determine which substrings can serve as keys.
3. Map stage: for each suffix, create a node linked to the root node, keyed by the leading characters of the suffix that form a valid key.
4. Reduce stage: for each key, merge the trees that share the key into the corresponding subtree of the GST.
5. Traverse each subtree recursively and output the information stored in its leaf nodes.
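
The map, reduce, and traversal steps can be sketched with plain Scala collections, without a Spark dependency. This is a simplified illustration, not the code in src/main/scala/GST. It assumes every prefix of up to maxPrefixLen characters is a valid key (the real preprocessing step selects keys), uses groupBy in place of a Spark reduce by key, builds an uncompressed trie rather than a compressed suffix tree, and assumes '$' does not occur in the input. All names (GstSketch, Suffix, Node, mapStage, reduceStage, collectLeaves) are hypothetical.

```scala
// Plain-Scala sketch of the map / reduce / traversal steps. Illustrative only.
object GstSketch {
  // A suffix is identified by its document index and start offset.
  final case class Suffix(doc: Int, start: Int)

  // Uncompressed trie node: children by character, plus the suffixes ending here.
  final class Node {
    val children = scala.collection.mutable.TreeMap.empty[Char, Node]
    val leaves = scala.collection.mutable.ListBuffer.empty[Suffix]
  }

  val Terminator = '$' // assumed absent from the input texts

  // Map stage: key every suffix by its first maxPrefixLen characters.
  def mapStage(docs: Seq[String], maxPrefixLen: Int): Seq[(String, Suffix)] =
    docs.zipWithIndex.flatMap { case (text, doc) =>
      val t = text + Terminator
      t.indices.map(start => (t.slice(start, start + maxPrefixLen), Suffix(doc, start)))
    }

  // Reduce stage: for each key, insert all of its suffixes into one subtree.
  def reduceStage(docs: Seq[String], pairs: Seq[(String, Suffix)]): Map[String, Node] =
    pairs.groupBy(_._1).map { case (key, group) =>
      val root = new Node
      for ((_, s) <- group) {
        var node = root
        val t = docs(s.doc) + Terminator
        for (c <- t.substring(s.start)) node = node.children.getOrElseUpdate(c, new Node)
        node.leaves += s
      }
      key -> root
    }

  // Traversal: collect every leaf together with its depth in the subtree.
  def collectLeaves(node: Node, depth: Int = 0): List[(Suffix, Int)] =
    node.leaves.toList.map(s => (s, depth)) ++
      node.children.values.toList.flatMap(child => collectLeaves(child, depth + 1))

  def main(args: Array[String]): Unit = {
    val docs = Seq("banana", "ananas")
    val subtrees = reduceStage(docs, mapStage(docs, maxPrefixLen = 2))
    for ((key, root) <- subtrees.toSeq.sortBy(_._1); (s, depth) <- collectLeaves(root))
      println(s"key=$key doc=${s.doc} start=${s.start} depth=$depth")
  }
}
```

Keying by prefix lets each subtree be built independently of the others, which is what makes the reduce step parallelizable. The uncompressed trie here uses quadratic space in the text length, so it is suitable only for small inputs.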
