The main implementation of the LBMC (Learning-Based Multi-dimensional Cost model) can be found in the following files:
- Cost Calculation: The cost calculation code is in python/global_cost.py and python/local_cost.py. These files implement the global and local cost calculations described in the paper.
- Verification and Utilities: Additional verification and utility functions, including those for cost verification, drop patterns, and rise patterns, are located in python/verify_cost.py and python/utils.py.
The key idea of verification is to prove that our proposed cost algorithms produce exactly the same results as the naive cost calculation.
In verify_cost.py, the following code snippet verifies the results of the cost algorithms:
assert my_gc == naive_gc, "wrong global cost calculation my_gc:%d, naive_gc:%d" % (my_gc, naive_gc)
assert my_gc_all == naive_gc, "wrong global cost calculation my_gc_all:%d, naive_gc:%d" % (my_gc_all, naive_gc)
assert my_lc == naive_lc, "wrong local cost calculation my_lc:%d, naive_lc:%d" % (my_lc, naive_lc)
assert my_lc_all == naive_lc, "wrong local cost calculation my_lc_all:%d, naive_lc:%d" % (my_lc_all, naive_lc)
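The same check-against-a-naive-baseline pattern can be sketched on a toy problem. The example below is only an illustration of the verification idea, not the LBMC cost code: the functions naive_range_sum, fast_range_sum, build_prefix, and verify are hypothetical stand-ins, with a prefix-sum lookup playing the role of the optimized algorithm and a plain loop playing the role of the naive one.

```python
import random


def naive_range_sum(values, lo, hi):
    """Reference implementation: add the elements of values[lo:hi] one by one."""
    total = 0
    for i in range(lo, hi):
        total += values[i]
    return total


def build_prefix(values):
    """Build a prefix-sum table where prefix[i] is the sum of values[:i]."""
    prefix = [0]
    for v in values:
        prefix.append(prefix[-1] + v)
    return prefix


def fast_range_sum(prefix, lo, hi):
    """Optimized implementation: constant-time lookup in the prefix-sum table."""
    return prefix[hi] - prefix[lo]


def verify(trials=200, seed=0):
    """Compare the optimized and naive results on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        values = [rng.randint(0, 100) for _ in range(rng.randint(1, 50))]
        prefix = build_prefix(values)
        lo = rng.randint(0, len(values))
        hi = rng.randint(lo, len(values))
        fast = fast_range_sum(prefix, lo, hi)
        naive = naive_range_sum(values, lo, hi)
        # Any mismatch stops the run and reports both values.
        assert fast == naive, "wrong range sum fast:%d, naive:%d" % (fast, naive)
    return trials


if __name__ == "__main__":
    print("verified %d random cases" % verify())
```

Fixing the random seed keeps a failing case reproducible, which makes a mismatch between the two implementations easier to debug.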
Please refer to calculate_drop_pattern and calculate_rise_pattern.
The experiments in the paper are divided into three main sections, each with corresponding code and dataset information as follows:
Global Cost: Please refer to global_cost and to formula (5).
Local Cost: Please refer to local_cost and to Algorithm 1.
All used datasets are listed: here
Please refer to Learned-BMTree
For BMTree, please refer to Learned-BMTree pg_test.py
For others, please refer to LearnSFC pg_test.py
TPC:
https://www.tpc.org/tpc_documents_current_versions/current_specifications5.asp
./dbgen -s 1 -T o
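For a quick look at the generated data, the following is a minimal Python sketch for reading the orders table. It assumes that dbgen has written a file named orders.tbl in its usual pipe-delimited format (each row terminated by a trailing `|`) with the standard TPC-H orders columns; the helper name read_orders and the sample row are hypothetical. The actual loading code used for the experiments is in tpc.scala.

```python
import csv
import io

# Column names of the TPC-H orders table, in file order
# (assumed to match the standard TPC-H schema).
ORDERS_COLUMNS = [
    "o_orderkey", "o_custkey", "o_orderstatus", "o_totalprice",
    "o_orderdate", "o_orderpriority", "o_clerk", "o_shippriority",
    "o_comment",
]


def read_orders(lines):
    """Parse pipe-delimited rows into a list of dicts keyed by column name."""
    reader = csv.reader(lines, delimiter="|", quoting=csv.QUOTE_NONE)
    rows = []
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        # The trailing '|' on each row leaves one empty field at the end.
        if fields[-1] == "":
            fields = fields[:-1]
        rows.append(dict(zip(ORDERS_COLUMNS, fields)))
    return rows


# Hypothetical sample row in the assumed layout, used instead of a real file.
sample = (
    "1|36901|O|173665.47|1996-01-02|5-LOW|Clerk#000000951|0|"
    "nstructions sleep furiously among |\n"
)
rows = read_orders(io.StringIO(sample))
print(rows[0]["o_orderkey"], rows[0]["o_orderdate"])
```

To read a real file, pass an open file object instead of the in-memory sample, for example `read_orders(open("orders.tbl", newline=""))`. All values are returned as strings; convert keys and prices to numeric types as needed.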
NYC:
https://data.cityofnewyork.us/Transportation/2017-Yellow-Taxi-Trip-Data/biws-g3hs/about_data
For NYC dataset, please refer to nyc.scala
For TPC-H dataset, please refer to tpc.scala