Future Work
Thomas Schweizer edited this page Oct 18, 2023 · 4 revisions
There are many opportunities for the future of this project. We list a few ideas below.
- (medium) Synthetic commits:
  - Do untangling tools perform similarly on synthetic and real commits?
  - What are synthetic datasets made of? Calculate tangledness at file, hunk, and line granularity.
- (low) Measure performance on different granularities
  - Iteratively change parameters to produce a smaller number of groups, or modify the SmartCommit and Flexeme implementations to create 2-3 groups.
  - This also applies to the file-based implementation: if there are too many files, move to a coarser granularity (folder, package).
  - Take Flexeme and coarsen it to hunk granularity with the same process as SmartCommit. How much does it help?
  - Refactor the implementation to separate coarsening from clustering so we can evaluate on hunks or lines using the same process for each tool.
- (low) Identify the importance of each component by swapping the graph clustering algorithm in SmartCommit and Flexeme for a different one.
- (medium) Try using the original Flexeme data to calculate the line-based Rand index. This would be valuable data to have, and we already have the scripts to calculate it. The remaining steps are:
  - Unzip the PDGs from Flexeme's data.
  - Update the script that translates a PDG into CSV so that it also exports a column with the true label (it's another attribute of the node).
  - Run the script that translates a PDG into CSV.
  - Calculate the Rand score between the true label and the grouping.
- (high) There are a few offshoots of Flexeme that should be easier to integrate into the pipeline.
- (high) There are also a few more recent untangling tools.
  - CoRA (from WangLZX2019, ASE 2019)
  - UTango (2022). No tool is publicly available. We reached out to the authors.
  - ComUnt (2022). No tool is publicly available. We reached out to the authors.
- (low) Build our own tool: use past changes to predict whether changes are similar or not.
- (low) Add ChatGPT as a tool in the evaluation
- (high) Add dataset "A fine-grained data set and analysis of tangling in bug fixing commits", Herbold et al., 2022 (DONE)
- (medium) Add the CoRA dataset "CoRA: Decomposing and Describing Tangled Code Changes for Reviewer", Wang et al., 2019. This is only 50 commits.
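
The last step of the Flexeme item above (calculating the Rand score between the true label and the grouping) could look roughly like the sketch below. It is only an illustration, not the project's existing script: the column names `true_label` and `group`, the helper names, and the inline sample data are all assumptions, since the actual CSV layout produced by the PDG-to-CSV script is not described here. The score is the plain (unadjusted) Rand index, the same quantity that scikit-learn exposes as `sklearn.metrics.rand_score`.

```python
import csv
import io
from collections import Counter
from math import comb


def rand_index(labels_true, labels_pred):
    """Rand index: fraction of line pairs on which two partitions agree."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # zero or one line: the partitions trivially agree
    # Count pairs placed together in both partitions, in the truth only, in the grouping only.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    together_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    together_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    # Pairs separated in both partitions, by inclusion-exclusion.
    separated_both = total_pairs - together_true - together_pred + together_both
    return (together_both + separated_both) / total_pairs


def rand_index_from_csv(handle, true_col="true_label", pred_col="group"):
    """Read one row per changed line; the column names are hypothetical."""
    rows = list(csv.DictReader(handle))
    return rand_index([r[true_col] for r in rows], [r[pred_col] for r in rows])


# Made-up sample standing in for a CSV exported from a PDG.
sample = io.StringIO(
    "line,true_label,group\n"
    "1,a,0\n"
    "2,a,0\n"
    "3,b,1\n"
    "4,b,0\n"
)
score = rand_index_from_csv(sample)
print(score)
```

For a real run, the `io.StringIO` sample would be replaced by an opened CSV file, and the two column arguments adjusted to whatever the export script actually writes.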