GitHub - nestrada2/Search-Engine: Java-Based Search Engine

🐓

Rooster

A Java-Based Search Engine

View Demo · Report Bug · Request Feature

📖 About the Project

Rooster is a small-scale search engine that helps users find information by keyword. It indexes pages and returns a ranked list of results that contain the word or phrase being searched for.
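To make the idea of indexing concrete, here is a minimal sketch of an inverted index, the structure the options below refer to. It is not taken from this repository: the class name `InvertedIndexSketch`, its methods, the word-to-location-to-positions layout, and the plain whitespace tokenization are all assumptions chosen for illustration.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: maps each word to the locations and positions where it occurs. */
final class InvertedIndexSketch {
    // word -> location (file path or URL) -> 1-based positions of the word in that location
    private final Map<String, Map<String, Set<Integer>>> index = new TreeMap<>();

    /** Records one occurrence of a word at a position within a location. */
    void add(String word, String location, int position) {
        index.computeIfAbsent(word, w -> new TreeMap<>())
             .computeIfAbsent(location, l -> new TreeSet<>())
             .add(position);
    }

    /** Adds every whitespace-separated token of the text, lowercased (an assumed tokenizer). */
    void addText(String text, String location) {
        int position = 1;
        for (String token : text.toLowerCase().trim().split("\\s+")) {
            if (!token.isEmpty()) {
                add(token, location, position++);
            }
        }
    }

    /** Returns a sorted copy of the locations containing the word (empty if none). */
    Set<String> locations(String word) {
        Map<String, Set<Integer>> found = index.get(word);
        return found == null ? new TreeSet<>() : new TreeSet<>(found.keySet());
    }

    /** Returns how many times the word occurs in the given location. */
    int count(String word, String location) {
        Map<String, Set<Integer>> found = index.get(word);
        if (found == null || !found.containsKey(location)) {
            return 0;
        }
        return found.get(location).size();
    }

    public static void main(String[] args) {
        InvertedIndexSketch sketch = new InvertedIndexSketch();
        // The text below is made up; it is not the content of any file in the repository.
        sketch.addText("hello world hello", "input/text/simple/hello.txt");
        System.out.println(sketch.locations("hello"));
        System.out.println(sketch.count("hello", "input/text/simple/hello.txt"));
    }
}
```

Sorted maps keep words and locations in a stable order, which is convenient when the index is written out to a file.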

🛠️ Tech Stack

📦 Getting Started

💾 Installation

git clone https://github.com/nestrada2/Search-Engine.git

▶️ Running the Program

java Driver [options]

⚙️ Options

📂 -text [path]
- Path to a single file or directory of text files to add to the inverted index. If a directory is specified, all .txt and .text files in it and its subdirectories are added.
- Examples:
  - -text "input/text/simple/hello.txt"
  - -text "input/text/simple/"
📄 -index [path]
- Writes the inverted index to a JSON file. If a path is specified, it is used as the output file; otherwise the output defaults to index.json.
- Example:
  - -index "actual/index-simple-hello.json"
📊 -counts [path]
- Saves word counts to the specified file path (default is counts.json).
- Example:
  - -counts wordcounts.json
📝 -query [path]
- Path to a file of search queries. No search is performed if not provided.
- Example:
  - -query "input/query/simple.txt"
🔎 -exact
- Performs exact search instead of partial search (partial search is the default when this flag is not provided).
📈 -results [path]
- Saves search results to the specified file path (default is results.json).
- Example:
  - -results actual/search-exact-simple.json
🧵 -threads [num]
- Enables multithreading with the specified number of threads (defaults to 5 if the [num] argument is missing, not a number, or less than 1).
- Example:
  - -threads 3
🌐 -html [seed]
- Seed URL from which the web crawler starts building the inverted index.
- Example:
  - -html "https://usf-cs272-fall2022.github.io/project-web/input/simple/"
🔍 -max [total]
- Sets the maximum number of URLs to crawl (including the seed URL) when building the index (default is 1).
- Example:
  - -max 15
🖥️ -server [port]
- Starts a multithreaded search engine web server on the specified port (default is 8080).
- Example:
  - -server 8080
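
The flag-and-default rules listed above can be sketched roughly as follows. This is an illustration, not the repository's actual parser: `ArgumentSketch`, `parse`, `getString`, and `threadCount` are hypothetical names, and treating a token such as `-3` as a value rather than a flag is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of flag/value parsing with the defaults described in the options list. */
final class ArgumentSketch {
    static final int DEFAULT_THREADS = 5;

    /** A flag starts with '-' followed by a non-digit, so "-3" is treated as a value (assumption). */
    static boolean isFlag(String arg) {
        return arg.length() > 1 && arg.charAt(0) == '-' && !Character.isDigit(arg.charAt(1));
    }

    /** Maps each flag to the token that follows it, or to null when no value follows. */
    static Map<String, String> parse(String[] args) {
        Map<String, String> flags = new HashMap<>();
        for (int i = 0; i < args.length; i++) {
            if (isFlag(args[i])) {
                boolean hasValue = i + 1 < args.length && !isFlag(args[i + 1]);
                flags.put(args[i], hasValue ? args[++i] : null);
            }
        }
        return flags;
    }

    /** Returns the flag's value, or the fallback when the flag or its value is missing. */
    static String getString(Map<String, String> flags, String flag, String fallback) {
        String value = flags.get(flag);
        return value == null ? fallback : value;
    }

    /** Applies the -threads rule: 5 when the value is missing, not a number, or less than 1. */
    static int threadCount(String value) {
        try {
            int threads = Integer.parseInt(value); // parseInt(null) also throws NumberFormatException
            return threads < 1 ? DEFAULT_THREADS : threads;
        } catch (NumberFormatException e) {
            return DEFAULT_THREADS;
        }
    }

    public static void main(String[] args) {
        Map<String, String> flags = parse(new String[] {"-text", "input/text/simple/", "-index", "-threads", "abc"});
        System.out.println(getString(flags, "-index", "index.json"));
        if (flags.containsKey("-threads")) {
            System.out.println(threadCount(flags.get("-threads")));
        }
    }
}
```

Keeping flags without a value in the map (mapped to `null`) lets the caller distinguish "flag present, use the default path" from "flag absent, skip this step".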

🏗️ Usage

📋 Examples

🧵 Run in Single-Threaded Mode (No Server or Crawling)
```
   java Driver
```

📄 Build an Inverted Index from a Text File

   java Driver -text "input/text/simple/hello.txt" -index "actual/index-simple-hello.json"

🔍📁 Perform Search with Queries from a File, Exact Search, and Save Results

   java Driver -text "input/text/simple/" -query "input/query/simple.txt" -exact -results actual/search-exact-simple.json

🔍📂🔄 Perform Search with Queries from a File, Partial Search, Save Results, and Enable Multithreading

   java Driver -text "input/text/simple/" -query "input/query/simple.txt" -results actual/search-partial-simple.json -threads 3
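
The two search examples above differ in whether `-exact` is passed. The README does not spell out the matching rule, so the following sketch assumes a common interpretation: exact search matches an indexed word only if it equals the query word, while partial search matches every indexed word that starts with it. `SearchModeSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Hypothetical sketch contrasting exact and partial (prefix) matching over sorted index words. */
final class SearchModeSketch {
    /** Exact search: the query matches only an identical indexed word. */
    static List<String> exact(TreeSet<String> words, String query) {
        List<String> matches = new ArrayList<>();
        if (words.contains(query)) {
            matches.add(query);
        }
        return matches;
    }

    /** Partial search (assumed rule): the query matches every indexed word that starts with it. */
    static List<String> partial(TreeSet<String> words, String query) {
        List<String> matches = new ArrayList<>();
        // tailSet starts at the first word >= query; words sharing the prefix are contiguous.
        for (String word : words.tailSet(query)) {
            if (!word.startsWith(query)) {
                break;
            }
            matches.add(word);
        }
        return matches;
    }

    public static void main(String[] args) {
        TreeSet<String> words = new TreeSet<>(List.of("hello", "help", "helpful", "world"));
        System.out.println(exact(words, "help"));
        System.out.println(partial(words, "help"));
    }
}
```

Because the words are sorted, the prefix scan can stop at the first word that no longer starts with the query instead of visiting the whole index.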

🌐🔄📁 Web Crawl with Multithreading and Save Inverted Index to File

   java Driver -html "https://usf-cs272-fall2022.github.io/project-web/input/simple/" -max 15 -threads 3 -index index-crawl.json
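
The `-max` limit counts the seed URL itself, which is easy to get off by one. The sketch below shows one way such a bound might be enforced in a breadth-first crawl. It is an assumption-laden illustration: `CrawlLimitSketch` is a hypothetical class, the traversal order is assumed, and an in-memory map of links stands in for fetching and parsing real pages.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.Set;

/** Hypothetical sketch of a breadth-first crawl capped at a maximum number of unique URLs. */
final class CrawlLimitSketch {
    /**
     * Returns the URLs that would be crawled, in discovery order.
     * The links map stands in for fetching a page and extracting its links.
     */
    static Set<String> crawl(String seed, int max, Map<String, List<String>> links) {
        Set<String> seen = new LinkedHashSet<>();
        Queue<String> queue = new ArrayDeque<>();
        if (max >= 1) {
            seen.add(seed); // the seed counts toward the limit
            queue.add(seed);
        }
        while (!queue.isEmpty()) {
            String url = queue.remove();
            for (String next : links.getOrDefault(url, List.of())) {
                if (seen.size() >= max) {
                    return seen; // limit reached: stop accepting new URLs
                }
                if (seen.add(next)) { // add returns false for URLs already seen
                    queue.add(next);
                }
            }
        }
        return seen;
    }

    public static void main(String[] args) {
        // Made-up link graph for illustration only.
        Map<String, List<String>> links = Map.of(
                "a", List.of("b", "c", "a"),
                "b", List.of("d"));
        System.out.println(crawl("a", 3, links));
    }
}
```

With the default `-max` of 1, this sketch visits only the seed URL.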

🌐💻🔍 Web Crawl with Multithreading and Run Server for User to Start Searching

   java Driver -html "https://usf-cs272-fall2022.github.io/project-web/input/simple/" -max 15 -threads 3 -server 8080

📜 License

Distributed under the MIT License. See LICENSE.txt for more information.

📚 Resources

Oracle, Jetty, Stack Overflow, W3Schools, MDN, GeeksforGeeks, Terminal CSS, regex101
