This project aims to create a supervised image-recognition dataset in the medical field for training synthetic cannabis detection. A portion of the images in the dataset comes from the web, while the rest are synthetically generated through Grok. The dataset contains both natural cannabis images and synthetic cannabis (spice) images, and is designed to train a classifier that, given an input image, recognizes whether it depicts a synthetic or natural cannabinoid.
Instructions on setting up the project environment:
- Clone the repository:
git clone https://gitlab.com/aigh1/synthetic-cannabis-detection.git
- Install dependencies:
pip install -r requirements.txt
Link to the dataset used for training: https://liveunibo-my.sharepoint.com/:u:/g/personal/edoardo_tommasi_studio_unibo_it/EfS138cZFMBHgHPbzsAkMUYBUsnSp5SIfiQwK17Jz3223Q?e=3JnVry
leafly_images.rar is the entire dataset used for training, composed of:
- cannabis.zip (uncompressed): web-scraped images of natural cannabis
- Granular.zip: images of synthetic cannabis generated by Grok using our prompts
- synthetic-prompts-output.zip: images of synthetic cannabis generated by Grok using prompts suggested by Grok itself
- synthetic_cannabis.zip: web-scraped images of synthetic cannabis
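As an illustration of how these archives fit together, here is a minimal sketch of collecting and labeling the extracted images for training; the folder layout and the load_dataset helper are assumptions, not part of the repository, so adjust the paths to whatever the archives actually produce:

```python
from pathlib import Path

# Hypothetical layout after extracting leafly_images.rar; adjust these
# directory names to match the contents of the archives listed above.
CLASS_DIRS = {
    "natural": ["cannabis/uncompressed"],
    "synthetic": ["Granular", "synthetic-prompts-output", "synthetic_cannabis"],
}

def load_dataset(root="leafly_images"):
    """Return (path, label) pairs: label 0 = natural, 1 = synthetic (spice)."""
    samples = []
    for label, dirs in enumerate(CLASS_DIRS.values()):
        for d in dirs:
            samples += [(p, label) for p in (Path(root) / d).glob("**/*.jpg")]
    return samples

if __name__ == "__main__":
    print(f"{len(load_dataset())} labeled images found")
```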
Links to the datasets used for inference:
- natural cannabis images: https://drive.google.com/file/d/1EPfs6_8H02vEbVjC8prD7QKN6Kwwj7yz/view?usp=drive_link
- synthetic cannabis images: https://drive.google.com/file/d/1l4h1rQ1bhsBoVZ53L84VvCGEMbSteQ_s/view?usp=sharing
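The inference sets can also be fetched programmatically; a small sketch assuming the third-party gdown package (pip install gdown), which is not part of the project's requirements, and with output filenames chosen here purely for illustration:

```python
import gdown

# fuzzy=True lets gdown extract the file id from a full Drive share link.
gdown.download(
    "https://drive.google.com/file/d/1EPfs6_8H02vEbVjC8prD7QKN6Kwwj7yz/view?usp=drive_link",
    "natural_inference.zip", fuzzy=True,
)
gdown.download(
    "https://drive.google.com/file/d/1l4h1rQ1bhsBoVZ53L84VvCGEMbSteQ_s/view?usp=sharing",
    "synthetic_inference.zip", fuzzy=True,
)
```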
How to run the project:
To run the web scraper, cd into /src/scripts/webScraping and run "python scraper.py". The scraper generates a folder called "leafly_images" with two subfolders, "compressed" and "uncompressed", containing the compressed and uncompressed scraped pictures respectively.
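For orientation, a minimal sketch of the kind of download-and-compress step scraper.py performs; requests and Pillow are assumed to be available, and the function below is illustrative rather than the script's actual code:

```python
import io
from pathlib import Path

import requests
from PIL import Image

def save_image(url, name, root="leafly_images"):
    """Download one image and store an original plus a JPEG-compressed copy."""
    raw = requests.get(url, timeout=30).content
    for sub in ("uncompressed", "compressed"):
        (Path(root) / sub).mkdir(parents=True, exist_ok=True)
    suffix = Path(url).suffix or ".jpg"  # keep the original format as-is
    (Path(root) / "uncompressed" / (name + suffix)).write_bytes(raw)
    # Re-encode at a lower JPEG quality for the compressed variant.
    img = Image.open(io.BytesIO(raw)).convert("RGB")
    img.save(Path(root) / "compressed" / f"{name}.jpg", quality=60)
```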
The command python3 gen_img_grok.py -l ./prompts.json -o ./results -n 1 --no-luizo is an example of how to launch the script that generates Grok images; note that you have to set the right CHROME_PATH first.
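As a rough sketch of how such a command line could be wired up: the option names mirror the example above, but CHROME_PATH's value, the help strings, and the defaults are assumptions, so check the real gen_img_grok.py before relying on any of this:

```python
import argparse

# Point this at the Chrome/Chromium binary on your machine before running.
CHROME_PATH = "/usr/bin/google-chrome"

parser = argparse.ArgumentParser(description="Generate cannabis images via Grok")
parser.add_argument("-l", "--list", default="./prompts.json",
                    help="JSON file containing the prompts to send to Grok")
parser.add_argument("-o", "--output", default="./results",
                    help="directory where the generated images are saved")
parser.add_argument("-n", type=int, default=1,
                    help="number of images to generate per prompt")
parser.add_argument("--no-luizo", action="store_true",
                    help="optional flag from the original script")
args = parser.parse_args()
print(f"Prompts: {args.list} -> output: {args.output} (n={args.n})")
```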
Regarding the training and evaluation phase of the classification models presented in the notebooks, read each script and its accompanying comments thoroughly before execution. These scripts are not meant to be run in sequence: each serves a distinct purpose, such as image preprocessing or training a different classifier.
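For orientation only, a condensed sketch of the kind of binary classifier the notebooks train; the framework (Keras), the backbone, the paths, and the hyperparameters here are illustrative choices, not the notebooks' actual configuration:

```python
from tensorflow import keras

# Expects one subfolder per class, e.g. dataset/natural and dataset/synthetic.
train_ds = keras.utils.image_dataset_from_directory(
    "dataset", image_size=(224, 224), batch_size=32)

base = keras.applications.MobileNetV2(
    include_top=False, pooling="avg", input_shape=(224, 224, 3))
base.trainable = False  # train only the classification head

model = keras.Sequential([
    keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # map pixels to [-1, 1]
    base,
    keras.layers.Dense(1, activation="sigmoid"),  # natural vs. synthetic
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(train_ds, epochs=5)
```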
Project structure:
- /src: Source code for the project.
  - /scripts: Individual scripts for web scraping and for generating synthetic cannabis images through Grok.
  - /notebooks: Jupyter notebooks for training the classifiers and running inference.
- /docs: Additional documentation in PDF format.