Incremental Rule Discovery in Response to Parameter Updates

Overview

This paper has been accepted by SIGMOD 2025

This project proposes an incremental rule discovery algorithm. The goal is to efficiently mine rules from a dataset, ensuring their support and confidence meet specified thresholds. This algorithm allows for dynamic adjustments to these thresholds without restarting the entire process, saving time and computational resources.

Datasets

You can get datasets from this page

Packaged App Jar

You can find the packaged app jar in the release page.

Quick Start From Docker

Download Release

Download the jar file into target/scala-2.12/ directory.

wget https://github.com/Xpectuer/IncREE_HUME/releases/download/1.0.0/HUME-assembly-0.2.0-CDF.jar

Build Docker Image

docker build -f .devcontainer/Dockerfile -t incree .

Run the Spark App in container

docker run --rm \
  -v `pwd`/datasets:/app/datasets \
  -v `pwd`/in:/app/in \
  -e DATASET_PATH=/app/datasets/hospital.csv \
  -e PARAMS_PATH=/app/in/in-exp.csv \
  -e OUTPUT_DIR=/app/out \
  -e NUM_EXECUTORS=10 \
  incree

Manually Install

Prerequisites

Ensure the following software versions are installed:

Ubuntu 22.04 (tested)
Scala 2.12.10
SBT 1.9.2
Apache Spark 3.5.1

Steps

Download Datasets: Obtain the CSV datasets from the specified URL.
Organize Files: Place datasets in a directory, e.g., <PROJECT_ROOT>/datasets, and parameter input files in <PROJECT_ROOT>/in.
Compile the project with following command

sbt +clean +assembly

Make sure there is a file named plugins.sbt located in the <PROJECT_ROOT>/project directory, and that it contains the following content.

addSbtPlugin( "com.eed3si9n" %% "sbt-assembly" % "2.2.0")
addSbtPlugin("ch.epfl.scala" % "sbt-scalafix" % "0.10.1")

Run the Algorithm: Use the following shell command to execute the algorithm:

spark-submit \
--master local[*] \
--deploy-mode client \
--class org.dsl.SparkMain \
--name "IncREE" \
--conf spark.driver.memory=16g \
--conf spark.executor.memory=8g \
--conf spark.local.dir=<PROJECT_ROOT> \
--conf spark.executor.instances=20 \
<PROJECT_ROOT>/target/scala-2.12/HUME-assembly-0.2.0-CDF.jar \
-d datasets/adult.csv \
-p in/exp-test.csv \
-n 20 \
-o adult_output \
-m inc -l -c

Note: Key command options include:

Replace <PROJECT_ROOT> with the path to the root directory of your project.
-d: Specifies the dataset path.
-p: Specifies the parameter input file path.
-o: Specifies the output file path (created if it doesn't exist).

Input Parameter File

Input Parameter File Format The input parameter file should be in CSV format, structured as follows:

Supp1,Supp2,Conf1,Conf2,Recall,Radius
1E-06,1E-05,0.95,0.85,1.0,3
1E-06,1E-05,0.95,0.85,0.7,3

Supp1/Supp2: Old and new support parameters.
Conf1/Conf2: Old and new confidence parameters.
Recall: The recall guarantee for the incremental miner (applicable for certain parameter variations).
Radius: Sampling radius for the procedure.

This guide should help you set up and run the incremental rule discovery algorithm efficiently. For more detailed information, please refer to the documentation provided within the project.

Name		Name	Last commit message	Last commit date
Latest commit History 3 Commits
.devcontainer		.devcontainer
.github/workflows		.github/workflows
.metals		.metals
in		in
src		src
.gitattributes		.gitattributes
.gitignore		.gitignore
.jvmopts		.jvmopts
MANIFEST.MF		MANIFEST.MF
README.md		README.md
build.sbt		build.sbt
plugins.sbt		plugins.sbt
run-apriori.py		run-apriori.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Incremental Rule Discovery in Response to Parameter Updates

Overview

Datasets

Packaged App Jar

Quick Start From Docker

Download Release

Build Docker Image

Run the Spark App in container

Manually Install

Prerequisites

Steps

Input Parameter File

About

Releases 1

Packages

Languages

Xpectuer/IncREE_HUME

Folders and files

Latest commit

History

Repository files navigation

Incremental Rule Discovery in Response to Parameter Updates

Overview

Datasets

Packaged App Jar

Quick Start From Docker

Download Release

Build Docker Image

Run the Spark App in container

Manually Install

Prerequisites

Steps

Input Parameter File

About

Topics

Resources

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages