This specification describes the functional requirements, architecture considerations, data handling details, error handling approaches, and a testing plan for creating a single-page web app that takes an image, obtains bounding box data from an AI (LLM or computer vision model), and validates the returned data against a custom JSON Schema.

1. Overview

You want a single-page HTML web application (SPA) that:

Lets a user upload or provide an image.
Sends the image to one or more Large Language Models (LLMs) or computer vision services (collectively, "the AI") for analysis.
Receives a JSON response describing bounding boxes within the image.
Validates the JSON response against a custom bounding box schema (which enforces normalized coordinates, mandatory labels with confidence scores, etc.).
Displays the validation results (pass/fail, any errors) and potentially the bounding box overlays on the image for manual inspection.
Supports multiple AI models (OpenAI GPT-4o, Google Gemini, Anthropic Claude 3).
Creates heat map visualizations that aggregate results from multiple AI queries.

1.1 Primary Use Case

Goal: Evaluate and compare how different LLMs/CV models perform image analysis tasks by returning bounding boxes (along with descriptive labels).
Scenario: The user uploads an image (e.g., "excited dog running in grass"). The system calls the AI, which returns bounding box data in a JSON structure conforming to the schema below, then checks for schema validity and presents the results.
Comparison: The app generates multiple responses from the same AI model to analyze consistency and renders an aggregated heat map visualization.

2. Requirements

Single-Page Interface
- A minimal HTML page with embedded JavaScript.
- File/image upload functionality (or drag-and-drop).
- A place to show the validated JSON output and/or any error messages.
- Model selector dropdown for choosing different AI providers.
- API key input field that stores keys locally in the browser.
- Response selector to view individual AI responses.
Image Upload and AI Request
- The user selects an image (JPEG, PNG, etc.).
- The app sends this image (or a reference/URL to it) to an AI service endpoint.
- Supports OpenAI GPT-4o, Google Gemini, and Anthropic Claude 3 models.
- Makes multiple queries (5 by default) to the same model for consistency analysis.
JSON Schema Validation
- The AI responds with JSON describing bounding boxes.
- The app validates that JSON against the agreed-upon schema using Ajv.
- The result of the validation is displayed to the user (success, or a list of validation errors).
Custom Bounding Box Schema
- Must require width, height, bounding_boxes and contextual_attributes in image.
- width/height: integers.
- bounding_boxes: an array of 0 or more bounding box objects.
  - Each bounding box has x_min, y_min, x_max, y_max as integer pixel coordinates.
  - Each box must have a non-empty labels array.
    - Each label has name (string) and confidence (integer in [0, 100]).
- contextual_attributes: an array of objects with attribute and description strings.
Visualization Features
- Bounding boxes displayed as overlays on the image with different colors.
- Heat map visualization showing aggregated results from multiple queries.
- Ability to switch between individual AI responses for comparison.
- Labels displayed next to bounding boxes with confidence scores.
Flexibility and Extensibility
- The schema allows additional properties beyond those required.
- The user might pass extra data, but the core required fields must conform to the schema.
API Key Management
- API keys stored locally in browser's localStorage.
- Different key fields for each supported AI provider.
- Dynamic labeling based on selected model.

3. Implemented Architecture

A Front-End Only SPA that:

HTML + JavaScript:
- Renders a form for selecting an image and AI model.
- Manages API keys for different providers.
- On form submission, calls the selected AI service.
AI Integration:
- Uses the LangChain library for standardized integration with multiple AI providers.
- Sends the image as base64 to the selected AI model.
- Makes multiple queries (default: 5) for consistency analysis.
- Receives JSON responses with bounding box data.
JSON Validation:
- Uses Ajv for JSON Schema validation.
- Validates the AI's JSON against the custom bounding box schema.
- Displays validation results.
Visualization:
- Draws bounding boxes with labels on a canvas overlay.
- Generates heat map visualizations showing consistency across multiple responses.
- Applies color coding for different object types.

4. Data Handling Workflow

User Action
- The user selects an AI model and provides their API key.
- The user selects an image file from their device.
Front-End Processing
- Converts image to base64 and sends it to the selected AI service.
- Makes multiple identical queries to analyze consistency.
AI Response

The AI returns JSON structures that should conform to this schema:

{  
  "image": {  
    "width": 1024,  
    "height": 768,  
    "bounding_boxes": [  
      {  
        "x_min": 200,  
        "y_min": 100,  
        "x_max": 400,  
        "y_max": 300,  
        "labels": [  
          { "name": "dog", "confidence": 95 },  
          { "name": "running", "confidence": 80 }  
        ]  
      }  
      // ... more bounding boxes ...  
    ],
    "contextual_attributes": [
      { "attribute": "weather", "description": "Sunny day" },
      { "attribute": "location", "description": "Public park" }
    ]  
  }  
}

Validation
- Uses the JSON Schema to validate each response.
- Filters out invalid responses.
Visualization
- Renders individual bounding boxes for each valid response.
- Generates a heat map overlay showing agreement across multiple queries.
- Allows switching between individual responses.

5. Error Handling Strategy

AI Call Failures
- If the AI endpoint is unreachable or returns an error code, skips that response.
- Continues processing other successful responses.
Malformed JSON
- Attempts to clean JSON from markdown code blocks if needed.
- If the response is not valid JSON, skips that response.
Schema Validation Failures
- If the JSON doesn't validate against the schema, logs errors.
- Only processes valid responses for visualization.
Partial Success Strategy
- If some queries succeed and others fail, works with the successful ones.
- Shows a count of successful vs. total queries.

6. JSON Schema

The implemented schema includes support for bounding boxes and contextual attributes:

{
  "$id": "https://example.com/my-bbox-schema.json",
  "type": "object",
  "title": "Image Bounding Box Schema",
  "properties": {
    "image": {
      "type": "object",
      "properties": {
        "width": { "type": "integer", "minimum": 1 },
        "height": { "type": "integer", "minimum": 1 },
        "bounding_boxes": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "x_min": { "type": "integer", "minimum": 0 },
              "y_min": { "type": "integer", "minimum": 0 },
              "x_max": { "type": "integer", "minimum": 0 },
              "y_max": { "type": "integer", "minimum": 0 },
              "labels": {
                "type": "array",
                "minItems": 1,
                "items": {
                  "type": "object",
                  "properties": {
                    "name": { "type": "string" },
                    "confidence": { "type": "integer", "minimum": 0, "maximum": 100 }
                  },
                  "required": ["name", "confidence"],
                  "additionalProperties": true
                }
              }
            },
            "required": ["x_min", "y_min", "x_max", "y_max", "labels"],
            "additionalProperties": true
          }
        },
        "contextual_attributes": {
          "type": "array",
          "items": {
            "type": "object",
            "properties": {
              "attribute": { "type": "string" },
              "description": { "type": "string" }
            },
            "required": ["attribute", "description"],
            "additionalProperties": true
          }
        }
      },
      "required": ["width", "height", "bounding_boxes", "contextual_attributes"],
      "additionalProperties": true
    }
  },
  "required": ["image"],
  "additionalProperties": true
}

Validation Library

Using Ajv for JSON Schema validation with specific options to handle potential issues:

const ajv = new Ajv({
  strict: false,
  strictSchema: false,
  strictTypes: false,
  validateSchema: false
});

7. Heat Map Implementation

The application implements a heat map visualization that:

Makes multiple queries (5 by default) to the same AI model.
Aggregates responses into a grid-based heat map.
Applies a Gaussian blur for smoother visualization.
Uses different colors for different object labels.
Shows transparency based on confidence/agreement level.

This helps visualize:

Areas where the AI consistently detects objects
Differences in boundary precision across queries
Confidence levels for different parts of the image

8. Multi-Model Support

The application supports these AI models:

OpenAI GPT-4o - Using the ChatOpenAI class from LangChain
Google Gemini - Using the ChatGoogleGenerativeAI class from LangChain
Anthropic Claude 3 - Using the ChatAnthropic class from LangChain

Each model requires its own API key, which is:

Stored securely in the browser's localStorage
Not transmitted except to the respective AI provider
Referenced by a unique key based on the model name

9. Testing & Future Enhancements

Testing Considerations
- Test with various image types and sizes
- Compare responses across different AI models
- Validate heat map generation with different sample counts
Potential Enhancements
- Save/export analyzed images with bounding boxes
- Add additional AI model providers
- Include confidence thresholds for filtering results
- Implement side-by-side comparison between models
- Add annotation tools to correct or refine AI-generated boxes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

specification.md

specification.md

1. Overview

1.1 Primary Use Case

2. Requirements

3. Implemented Architecture

4. Data Handling Workflow

5. Error Handling Strategy

6. JSON Schema

Validation Library

7. Heat Map Implementation

8. Multi-Model Support

9. Testing & Future Enhancements

Files

specification.md

Latest commit

History

specification.md

File metadata and controls

1. Overview

1.1 Primary Use Case

2. Requirements

3. Implemented Architecture

4. Data Handling Workflow

5. Error Handling Strategy

6. JSON Schema

Validation Library

7. Heat Map Implementation

8. Multi-Model Support

9. Testing & Future Enhancements