🌊 DataDiver AI

datadiver-ai is the ultimate tool for 📊 web scraping, transforming 🕸️ unstructured websites into ✨ clean JSON. Easily extract 📝 paragraphs, 📋 lists, 🔗 links, and 🖼️ images with our 🧠 AI-powered processing.



Made with XediX

Important

Extract structured data from any website with a simple API! 🚀


🔍 Overview

DataDiver AI is an intelligent web scraping tool that transforms unstructured web pages into clean, organized JSON data. Perfect for research, data analysis, content aggregation, and more!


✨ Features

  • 🌐 Universal Scraping - Works with virtually any website
  • 🧠 AI-Powered - Uses Mistral AI for intelligent data processing
  • 🧩 Structured Output - Converts messy HTML into clean, consistent JSON
  • 🔄 Content Categorization - Automatically organizes content by section
  • 📊 Rich Content Support - Extracts paragraphs, lists, links, and images
  • 💻 Simple API - Easy-to-use interface for quick integration
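
To make the "Content Categorization" idea concrete, here is a minimal sketch of how content could be grouped by section. It is not taken from the project's source: the `Block` type, `toSectionKey`, and `groupBySection` are hypothetical names, and the sketch assumes an earlier parsing step has already produced a flat list of headings and paragraphs.

```typescript
// Hypothetical flat block, assumed to come from an HTML parsing step.
type Block =
  | { kind: "heading"; text: string }
  | { kind: "paragraph"; text: string };

interface Section {
  title: string;
  items: { type: "paragraph"; text: string }[];
}

// Turn a heading such as "About Us" into an object key such as "about_us".
function toSectionKey(title: string): string {
  return title
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_") // collapse anything non-alphanumeric
    .replace(/^_+|_+$/g, ""); // drop leading and trailing underscores
}

// Group each paragraph under the most recent heading.
// Assumption: a repeated heading replaces the earlier section with the same key.
function groupBySection(blocks: Block[]): Record<string, Section> {
  const sections: Record<string, Section> = {};
  let current: Section | null = null;

  for (const block of blocks) {
    if (block.kind === "heading") {
      current = { title: block.text, items: [] };
      sections[toSectionKey(block.text) || "untitled"] = current;
    } else {
      // Paragraphs that appear before any heading go into a fallback section.
      if (current === null) {
        current = { title: "Untitled", items: [] };
        sections["untitled"] = current;
      }
      current.items.push({ type: "paragraph", text: block.text });
    }
  }
  return sections;
}
```

Keying sections by a normalized heading keeps the output stable across pages that differ only in capitalization or punctuation, at the cost of merging headings that normalize to the same key.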

🛠️ Tech Stack

  • ⚛️ Next.js + React
  • 📘 TypeScript
  • 🔍 JSDOM for HTML parsing
  • 🧠 Mistral API for optimization
  • 🎨 Custom CSS for beautiful UI

📦 Installation

# Clone the repository
git clone https://github.com/divyanshudhruv/datadiver-ai.git

# Navigate to project directory
cd datadiver-ai

# Install dependencies
npm install

# Set up environment variables
cp .env.example .env
# Add your Mistral API key to .env file

🚀 Getting Started

# Start the development server
npm run dev

# Then open http://localhost:3000 in your browser

📋 Usage

Web Interface

  1. Enter the URL you want to scrape
  2. Click "Scrape"
  3. View the structured JSON output

API Example

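// Send the target URL to the scrape endpoint
const response = await fetch("/api/scrape", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ url: "https://example.com" })
});

// Stop on HTTP errors before trying to parse the body
if (!response.ok) {
  throw new Error(`Scrape request failed with status ${response.status}`);
}

const data = await response.json();
console.log(data);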

📊 Example Response

{
  "success": true,
  "url": "https://example.com",
  "data": {
    "title": "Example Website",
    "meta": {
      "description": "This is an example website"
    },
    "content": {
      "about_us": {
        "title": "About Us",
        "items": [
          {
            "type": "paragraph",
            "text": "We are a sample company demonstrating DataDiver AI"
          },
          {
            "type": "list",
            "listType": "unordered",
            "items": ["Feature 1", "Feature 2", "Feature 3"]
          }
        ]
      }
    }
  }
}
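
For TypeScript consumers, the response above can be described with a few types. The sketch below is inferred only from this one example, so treat it as an assumption: the real API may return extra fields, and item shapes for links and images are not shown here and are left out. The helper names `sectionToLines` and `responseToLines` are hypothetical.

```typescript
// Types inferred from the example response; the real API may return more fields.
interface ParagraphItem {
  type: "paragraph";
  text: string;
}

interface ListItem {
  type: "list";
  listType: string; // "unordered" in the example
  items: string[];
}

// Assumption: only the two item types visible in the example are modeled.
type ContentItem = ParagraphItem | ListItem;

interface Section {
  title: string;
  items: ContentItem[];
}

interface ScrapeResponse {
  success: boolean;
  url: string;
  data: {
    title: string;
    meta: { description: string };
    content: Record<string, Section>;
  };
}

// Hypothetical helper: flatten one section into plain-text lines.
function sectionToLines(section: Section): string[] {
  const lines: string[] = [section.title];
  for (const item of section.items) {
    if (item.type === "paragraph") {
      lines.push(item.text);
    } else {
      // List entries become "- entry" lines.
      for (const entry of item.items) {
        lines.push(`- ${entry}`);
      }
    }
  }
  return lines;
}

// Hypothetical helper: flatten every section of a response, in key order.
function responseToLines(response: ScrapeResponse): string[] {
  const lines: string[] = [];
  for (const key of Object.keys(response.data.content)) {
    lines.push(...sectionToLines(response.data.content[key]));
  }
  return lines;
}
```

Calling `responseToLines` on the parsed JSON from `/api/scrape` would give a flat list of strings that is easy to log or index. Checking the `success` flag first is advisable, since the shape of a failed response is not documented here.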

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

# Create a new branch
git checkout -b feature/amazing-feature

# Make your changes and commit them
git commit -m 'Add some amazing feature'

# Push to the branch
git push origin feature/amazing-feature

# Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
