🐓

Rooster

A Java-Based Search Engine

View Demo · Report Bug · Request Feature

📖 About the Project

Rooster is a small-scale search engine that helps users find information by keyword. It indexes text files and web pages, then returns a ranked list of the results that contain the word or phrase being searched for.
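
For readers new to the term, an inverted index maps each word to the places where it occurs. The following is a minimal sketch of that idea, not the project's actual implementation: the class name `SimpleInvertedIndex`, its methods, and the whitespace-based tokenizing are all assumptions made for illustration.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch of an inverted index; not the project's actual class. */
class SimpleInvertedIndex {
    // word -> (location -> positions of the word within that location)
    private final TreeMap<String, TreeMap<String, TreeSet<Integer>>> index = new TreeMap<>();

    /** Records that a word occurs at the given position in a location. */
    public void add(String word, String location, int position) {
        index.computeIfAbsent(word, k -> new TreeMap<>())
             .computeIfAbsent(location, k -> new TreeSet<>())
             .add(position);
    }

    /** Adds every whitespace-separated token of the text, lowercased, counting from 1. */
    public void addAll(String text, String location) {
        int position = 1;
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                add(token, location, position++);
            }
        }
    }

    /** Returns the locations that contain the word, or an empty set. */
    public Set<String> locations(String word) {
        Map<String, TreeSet<Integer>> found = index.get(word);
        return found == null ? Collections.emptySet() : Collections.unmodifiableSet(found.keySet());
    }

    /** Returns the positions of the word at a location, or an empty set. */
    public Set<Integer> positions(String word, String location) {
        Map<String, TreeSet<Integer>> found = index.get(word);
        if (found == null || !found.containsKey(location)) {
            return Collections.emptySet();
        }
        return Collections.unmodifiableSet(found.get(location));
    }

    public static void main(String[] args) {
        SimpleInvertedIndex index = new SimpleInvertedIndex();
        index.addAll("hello world hello", "hello.txt");
        System.out.println(index.locations("hello"));              // [hello.txt]
        System.out.println(index.positions("hello", "hello.txt")); // [1, 3]
    }
}
```

Sorted maps and sets keep the words, locations, and positions in a stable order, which makes the structure easy to write out as JSON and to compare between runs.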

🛠️ Tech Stack

Java
Jetty
Terminal CSS

📦 Getting Started

💾 Installation

git clone https://github.com/nestrada2/Search-Engine.git

▶️ Running the Program

java Driver [options]

⚙️ Options

  • 📂 -text [path]
    • Path to a single file or directory of text files to add to the inverted index. If a directory is specified, all .txt and .text files in its subdirectories are added.
    • Example:
      • -text "input/text/simple/hello.txt"
      • -text "input/text/simple/"
  • 📄 -index [path]
    • Writes the inverted index to a JSON file. If a path is specified, it is used as the output file; otherwise the output defaults to index.json.
    • Example:
      • -index "actual/index-simple-hello.json"
  • 📊 -counts [path]
    • Saves word counts to the specified file path (default is counts.json).
    • Example:
      • -counts wordcounts.json
  • 📝 -query [path]
    • Path to a file of search queries. No search is performed if not provided.
    • Example:
      • -query "input/query/simple.txt"
  • 🔎 -exact
    • Performs exact search instead of partial search (partial search is the default if this flag is not provided).
  • 📈 -results [path]
    • Saves search results to the specified file path (default is results.json).
    • Example:
      • -results actual/search-exact-simple.json
  • 🧵 -threads [num]
    • Enables multithreading with the specified number of threads (defaults to 5 if the [num] argument is missing, not a number, or less than 1).
    • Example:
      • -threads 3
  • 🌐 -html [seed]
    • The seed URL for the web crawler to start building the inverted index.
    • Example:
      • -html "https://usf-cs272-fall2022.github.io/project-web/input/simple/"
  • 🔍 -max [total]
    • Sets the maximum number of URLs to crawl (including the seed URL) when building the index (default is 1).
    • Example:
      • -max 15
  • 🖥️ -server [port]
    • Starts a multithreaded search engine web server on the specified port (default is 8080).
    • Example:
      • -server 8080
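
The flag and default-value behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual parser: the class `ArgumentSketch` and its methods are hypothetical, and only the flag names and defaults (index.json, 5 threads) come from the option list.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of flag/value parsing; not the project's actual parser. */
class ArgumentSketch {
    private final Map<String, String> flags = new HashMap<>();

    ArgumentSketch(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if (isFlag(args[i])) {
                // A value belongs to the flag only if the next token is not itself a flag.
                boolean hasValue = i + 1 < args.length && !isFlag(args[i + 1]);
                flags.put(args[i], hasValue ? args[++i] : null);
            }
        }
    }

    /** Treats a dash followed by a letter as a flag, so "-5" is still read as a value. */
    static boolean isFlag(String arg) {
        return arg != null && arg.length() > 1
                && arg.charAt(0) == '-' && Character.isLetter(arg.charAt(1));
    }

    /** Returns true if the flag was present, with or without a value. */
    boolean hasFlag(String flag) {
        return flags.containsKey(flag);
    }

    /** Returns the flag's value, or the fallback if the flag or its value is missing. */
    String getString(String flag, String fallback) {
        String value = flags.get(flag);
        return value == null ? fallback : value;
    }

    /** Returns the flag's integer value, or the fallback if missing, non-numeric, or below 1. */
    int getPositiveInt(String flag, int fallback) {
        String value = flags.get(flag);
        if (value == null) {
            return fallback;
        }
        try {
            int parsed = Integer.parseInt(value);
            return parsed < 1 ? fallback : parsed;
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        String[] example = {"-text", "input/text/simple/", "-threads", "abc", "-index"};
        ArgumentSketch parsed = new ArgumentSketch(example);
        System.out.println(parsed.getString("-index", "index.json")); // index.json
        System.out.println(parsed.getPositiveInt("-threads", 5));     // 5
        System.out.println(parsed.hasFlag("-server"));                // false
    }
}
```

In a design like this, the caller would check `hasFlag("-threads")` before asking for the thread count, since omitting the flag altogether means running single-threaded rather than with the default of 5 threads.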

🏗️ Usage

📋 Examples

  • 🧵 Run in Single-Threaded Mode (No Server or Crawling)

       java Driver
  • 📄 Build an Inverted Index from a Text File

       java Driver -text "input/text/simple/hello.txt" -index "actual/index-simple-hello.json"
  • 🔍📁 Perform Search with Queries from a File, Exact Search, and Save Results

       java Driver -text "input/text/simple/" -query "input/query/simple.txt" -exact -results actual/search-exact-simple.json
  • 🔍📂🔄 Perform Search with Queries from a File, Partial Search, Save Results, and Enable Multithreading

       java Driver -text "input/text/simple/" -query "input/query/simple.txt" -results actual/search-partial-simple.json -threads 3
  • 🌐🔄📁 Web Crawl with Multithreading and Save Inverted Index to File

       java Driver -html "https://usf-cs272-fall2022.github.io/project-web/input/simple/" -max 15 -threads 3 -index index-crawl.json
  • 🌐💻🔍 Web Crawl with Multithreading and Run Server for User to Start Searching

       java Driver -html "https://usf-cs272-fall2022.github.io/project-web/input/simple/" -max 15 -threads 3 -server 8080
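
To illustrate how the exact and partial searches in the examples above can differ, here is a minimal sketch. It rests on assumptions this README does not confirm: partial search is treated as prefix matching, and results are ranked by match count with ties broken alphabetically by location. The class `SearchSketch` is hypothetical and is not the project's actual code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of exact vs. partial search; not the project's actual code. */
class SearchSketch {
    // word -> (location -> number of times the word appears there)
    private final TreeMap<String, Map<String, Integer>> counts = new TreeMap<>();

    /** Records one occurrence of a word at a location. */
    void add(String word, String location) {
        counts.computeIfAbsent(word, k -> new HashMap<>()).merge(location, 1, Integer::sum);
    }

    /** Exact search: only the query word itself contributes matches. */
    List<String> exact(String query) {
        Map<String, Integer> totals = new HashMap<>();
        Map<String, Integer> found = counts.get(query);
        if (found != null) {
            totals.putAll(found);
        }
        return rank(totals);
    }

    /** Partial search (assumed to mean prefix match): every indexed word starting with the query contributes. */
    List<String> partial(String query) {
        Map<String, Integer> totals = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> entry : counts.tailMap(query, true).entrySet()) {
            if (!entry.getKey().startsWith(query)) {
                break; // keys are sorted, so no later key can share the prefix
            }
            entry.getValue().forEach((location, n) -> totals.merge(location, n, Integer::sum));
        }
        return rank(totals);
    }

    /** Orders locations by match count (highest first), then alphabetically (assumed ranking). */
    private static List<String> rank(Map<String, Integer> totals) {
        List<String> locations = new ArrayList<>(totals.keySet());
        locations.sort((a, b) -> {
            int byCount = Integer.compare(totals.get(b), totals.get(a));
            return byCount != 0 ? byCount : a.compareTo(b);
        });
        return locations;
    }

    public static void main(String[] args) {
        SearchSketch search = new SearchSketch();
        search.add("search", "a.txt");
        search.add("search", "a.txt");
        search.add("searching", "b.txt");
        System.out.println(search.exact("search"));   // [a.txt]
        System.out.println(search.partial("search")); // [a.txt, b.txt]
    }
}
```

Keeping the words in a sorted map lets the prefix scan start at the first candidate key with `tailMap` and stop as soon as a key no longer starts with the query, instead of checking every indexed word.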

📜 License

Distributed under the MIT License. See LICENSE.txt for more information.

📚 Resources

Oracle, Jetty, Stack Overflow, W3Schools, MDN, GeeksforGeeks, Terminal CSS, regex101