The Spring AI LLM Evaluator is an application that integrates with large language models (LLMs) to:
- Generate responses using pre-configured or custom prompts
- Evaluate and refine prompts to improve response quality
This project leverages Spring AI for LLM integration and a Python-based evaluation service to score and analyze LLM-generated responses. It is designed to assist in prompt engineering, code generation, and response evaluation, making it easier to use LLMs effectively.
The project has two main components:
- Spring Boot Application: Hosts the endpoints and integrates with Spring AI for LLM access.
- Python Evaluation Service:
  - Evaluates responses and returns a score indicating their quality
  - Runs as a Docker service, accessed via a REST endpoint

  For more information, see https://github.com/vudayani/spring-ai-llm-demo/tree/main/llm-response-evaluator
The application exposes two endpoints:
- Code Generation Endpoint (/generateResponse)
  - Purpose: Generate responses using user-defined or default prompts (example request after this list)
  - Functionality:
    - Supports custom prompts
    - Provides sample JPA code for an application by default if no custom prompt is provided
  - Models Supported: OpenAI, Anthropic
- Prompt Tuning Endpoint (/promptTuning)
  - Purpose: Fine-tune prompts by evaluating LLM responses (example request after this list)
  - Functionality:
    - Evaluates response quality using the Python-based evaluation service
    - Returns feedback or guidelines when responses score below the threshold
    - Leverages the LLM itself to suggest improved prompts
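The request contract for these endpoints is not documented in this section, so the calls below are only a sketch: the paths come from this README, while the POST method, the JSON field names (prompt, model), and the default port 8080 are assumptions to adjust to the actual controller.

```bash
# Hypothetical example requests; HTTP method, field names, and port are assumptions
# Generate a response (omit "prompt" to fall back to the default JPA sample prompt)
curl -X POST http://localhost:8080/generateResponse \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Generate JPA entities for a bookstore application", "model": "openai"}'

# Evaluate the response quality for a prompt and get improvement suggestions
curl -X POST http://localhost:8080/promptTuning \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Generate JPA entities for a bookstore application"}'
```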
## Prerequisites
- Java 17+
- Docker (for Python evaluation service)
- Maven or Gradle (for Spring Boot application)
- API Keys: The project supports both OpenAI and Anthropic API keys for LLM integration. However, the Python evaluation service requires an OpenAI API key to work
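How the keys are supplied is not spelled out here; a common setup with Spring AI is to export them as environment variables that the application configuration references. The variable names below are assumptions, so align them with the project's application properties.

```bash
# Assumed variable names; adjust to the application's configuration
export OPENAI_API_KEY=<your-openai-api-key>        # used by the OpenAI model and the evaluation service
export ANTHROPIC_API_KEY=<your-anthropic-api-key>  # only needed when using the Anthropic model
```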
## Setup
- Clone the repository:
  ```bash
  git clone https://github.com/vudayani/spring-ai-llm-demo.git
  cd spring-ai-llm-evaluator
  ```
- Build and start the Spring Boot application:
  ```bash
  mvn clean install
  java -jar target/spring-ai-prompt-evaluator.jar
  ```
- Set up the Python evaluation service:
  ```bash
  cd llm-response-evaluator
  ```
- Build and run the Python evaluation service (Docker):
  ```bash
  docker build -t evaluation-service .
  docker run -p 8000:8000 -e OPENAI_API_KEY=<your-openai-api-key> evaluation-service
  ```
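Once the container is up, the evaluation service listens on port 8000. The actual endpoint path and payload are defined in the llm-response-evaluator module linked above, not in this section; the call below uses a hypothetical /evaluate path and field names purely to illustrate the REST interaction.

```bash
# Hypothetical smoke test; the real path and payload are defined in llm-response-evaluator
curl -X POST http://localhost:8000/evaluate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Generate JPA entities for a bookstore application", "response": "...LLM-generated code..."}'
```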