A Java-Based Search Engine
This project is a small-scale search engine that helps users find information quickly by keyword. It indexes text files or crawled web pages and returns a ranked list of results containing the word or phrase that was searched for.
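As a rough illustration of the indexing idea, here is a minimal sketch of an inverted index in Java. The class name `InvertedIndexSketch`, its methods, and the whitespace-and-lowercase tokenization are assumptions made for this example; the project's actual classes and text cleaning may differ.

```java
import java.util.Locale;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: maps each word to the locations and positions where it occurs. */
final class InvertedIndexSketch {
    // word -> location (file path or URL) -> 1-based positions of that word
    private final TreeMap<String, TreeMap<String, TreeSet<Integer>>> index = new TreeMap<>();

    /** Splits text on whitespace, lowercases each token, and records its position. */
    void addText(String location, String text) {
        int position = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.isEmpty()) {
                continue; // skip the empty token produced by leading whitespace
            }
            position++;
            index.computeIfAbsent(token, k -> new TreeMap<>())
                 .computeIfAbsent(location, k -> new TreeSet<>())
                 .add(position);
        }
    }

    /** Returns the positions of a word at a location, or an empty set if absent. */
    TreeSet<Integer> positions(String word, String location) {
        return index.getOrDefault(word, new TreeMap<>())
                    .getOrDefault(location, new TreeSet<>());
    }

    /** Returns the number of distinct words stored in the index. */
    int wordCount() {
        return index.size();
    }
}
```

Sorted maps keep the words and locations in a stable order, which is convenient when the index is later written out as JSON.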
git clone https://github.com/nestrada2/Search-Engine.git
java Driver [options]
⚙️ Options
- 📂 -text [path]
- Path to a single file or directory of text files to add to the inverted index. If a directory is specified, all .txt and .text files in its subdirectories are added.
- Example:
-text "input/text/simple/hello.txt"
-text "input/text/simple/"
- 📄 -index [path]
- Outputs the inverted index to a JSON file. If a path is specified, it is used for the output file. Defaults to index.json if not provided.
- Example:
-index "actual/index-simple-hello.json"
- 📊 -counts [path]
- Saves word counts to the specified file path (default is counts.json).
- Example:
-counts wordcounts.json
- 📝 -query [path]
- Path to a file of search queries. No search is performed if not provided.
- Example:
-query "input/query/simple.txt"
- 🔎 -exact
- Performs exact search instead of partial search (partial search is the default if this flag is not provided).
- 📈 -results [path]
- Saves search results to the specified file path (default is results.json).
- Example:
-results actual/search-exact-simple.json
- 🧵 -threads [num]
- Enables multithreading with the specified number of threads (defaults to 5 if [num] argument is not provided, not a number, or less than 1).
- Example:
-threads 3
- 🌐 -html [seed]
- The seed URL for the web crawler to start building the inverted index.
- Example:
-html "https://usf-cs272-fall2022.github.io/project-web/input/simple/"
- 🔍 -max [total]
- Sets the maximum number of URLs to crawl (including the seed URL) when building the index (default is 1).
- Example:
-max 15
- 🖥️ -server [port]
- Starts a multithreaded search engine web server on the specified port (default is 8080).
- Example:
-server 8080
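The fallback rule described for -threads (use 5 when the value is missing, not a number, or less than 1) can be sketched in Java as follows. The class `ThreadOption`, the method `parseThreads`, and the choice to return 0 when the flag is absent are hypothetical and exist only for this example; they are not taken from the project's source.

```java
/** Hypothetical sketch of the -threads fallback rule described in the options list. */
final class ThreadOption {
    static final int DEFAULT_THREADS = 5;

    /**
     * Returns the worker count requested by -threads.
     * Returns 0 when the flag is absent (an assumption meaning single-threaded mode).
     */
    static int parseThreads(String[] args) {
        for (int i = 0; i < args.length; i++) {
            if (!args[i].equals("-threads")) {
                continue;
            }
            if (i + 1 >= args.length) {
                return DEFAULT_THREADS; // flag given without a value
            }
            try {
                int requested = Integer.parseInt(args[i + 1]);
                return requested < 1 ? DEFAULT_THREADS : requested;
            } catch (NumberFormatException e) {
                return DEFAULT_THREADS; // value is not a number (for example, another flag)
            }
        }
        return 0;
    }
}
```

Under these assumptions, `parseThreads(new String[] {"-threads", "3"})` yields 3, while `-threads` followed by `abc`, `0`, or nothing at all yields 5.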
- 🧵 Run in Single-Threaded Mode (No Server or Crawling)
java Driver
- 📄 Build an Inverted Index from a Text File
java Driver -text "input/text/simple/hello.txt" -index "actual/index-simple-hello.json"
- 🔍📁 Perform Search with Queries from a File, Exact Search, and Save Results
java Driver -text "input/text/simple/" -query "input/query/simple.txt" -exact -results actual/search-exact-simple.json
- 🔍📂🔄 Perform Search with Queries from a File, Partial Search, Save Results, and Enable Multithreading
java Driver -text "input/text/simple/" -query "input/query/simple.txt" -results actual/search-exact-simple.json -threads 3
- 🌐🔄📁 Web Crawl with Multithreading and Save Inverted Index to File
java Driver -html "https://usf-cs272-fall2022.github.io/project-web/input/simple/" -max 15 -threads 3 -index index-crawl.json
- 🌐💻🔍 Web Crawl with Multithreading and Run Server for User to Start Searching
java Driver -html "https://usf-cs272-fall2022.github.io/project-web/input/simple/" -max 15 -threads 3 -server 8080
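To illustrate the difference between the exact and partial search modes used in the examples above, here is a minimal Java sketch. It assumes that partial search means matching every indexed word that starts with the query word, and it uses a simplified index (word to set of locations). The class `SearchSketch` and its methods are hypothetical, and ranking of results is left out.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch of exact versus partial (prefix) lookup on a sorted index. */
final class SearchSketch {
    /** Exact search: locations of the one word equal to the query. */
    static Set<String> exact(TreeMap<String, TreeSet<String>> index, String query) {
        return new TreeSet<>(index.getOrDefault(query, new TreeSet<>()));
    }

    /** Partial search (assumed prefix match): locations of every word starting with the query. */
    static Set<String> partial(TreeMap<String, TreeSet<String>> index, String query) {
        Set<String> found = new TreeSet<>();
        // tailMap starts at the first key >= query; prefix matches are contiguous from there
        for (Map.Entry<String, TreeSet<String>> entry : index.tailMap(query).entrySet()) {
            if (!entry.getKey().startsWith(query)) {
                break; // past the last word sharing the prefix
            }
            found.addAll(entry.getValue());
        }
        return found;
    }
}
```

Because the keys are sorted, the partial lookup only visits the words that share the prefix instead of scanning the whole index.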
Distributed under the MIT License. See LICENSE.txt for more information.
Oracle, Jetty, Stack Overflow, W3Schools, MDN, GeeksforGeeks, Terminal CSS, regex101