- We started with using yolov5 with large parameters but then switched to
yolov11nano
for faster inference time - We used
yolov11nano
model to segment products from the shelf image into images of each product - Results from
yolov11nano
: - After getting the bounding boxes on the products, we cropped them and fed the images to
GEMINI API
with propmts for it to identify the product in the image - Made calls to the model and the api using flask and the results are as follows:
- Clone the repository
git clone https://github.com/
cd cloned-repo/frontend
npm install
cd ..
cd backend
pip install -r requirements.txt
To run the frontend
run this command in the frontend directory
npm run dev
To run the backend
run this command in the backend directory
python app.py
- For frontend reactjs with tailwind was used
- The backend runs using flask to make requests to the model and the Gemini API
- The model utilizes YOLO v11n for nearly instant and fairly accurate object detection and cropping in the shelf image.
- Gemini 1.5 flash with 8 billion parameters was used for quick item recognition with added benefits of formatting, filtering unique items, fuzzy identification and context awareness at a low cost both computationally and in terms of API cost.