ikazakof/java-search-engine

java-search-engine

This Java search engine indexes a configured group of sites in multiple threads and then searches their content (Russian words).

Indexing speed is achieved by:

  • Indexing each site/page in a separate thread;
  • Using a ForkJoinPool to crawl each site recursively and lemmatize its pages.
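
The recursive crawl described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the names `CrawlSketch`, `CrawlTask`, `LinkSource`, and `crawl` are hypothetical, and an in-memory link map stands in for real page fetching and link extraction (which the project does with JSOUP).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical sketch: all names here are illustrative, not taken from the project.
final class CrawlSketch {

    // Stand-in for "download the page and extract its links".
    interface LinkSource {
        List<String> linksOf(String url);
    }

    // One task per page: collect its links and fork a subtask for each unseen one.
    static final class CrawlTask extends RecursiveAction {
        private final String url;
        private final LinkSource source;
        private final Set<String> visited;

        CrawlTask(String url, LinkSource source, Set<String> visited) {
            this.url = url;
            this.source = source;
            this.visited = visited;
        }

        @Override
        protected void compute() {
            List<CrawlTask> subtasks = new ArrayList<>();
            for (String link : source.linksOf(url)) {
                // add() returns true only for the first task that claims the link,
                // so each page is scheduled exactly once.
                if (visited.add(link)) {
                    subtasks.add(new CrawlTask(link, source, visited));
                }
            }
            invokeAll(subtasks); // run the subtasks in the pool and wait for them
        }
    }

    // Crawls everything reachable from rootUrl and returns the set of visited URLs.
    static Set<String> crawl(String rootUrl, LinkSource source) {
        Set<String> visited = ConcurrentHashMap.newKeySet();
        visited.add(rootUrl);
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(new CrawlTask(rootUrl, source, visited));
        } finally {
            pool.shutdown();
        }
        return visited;
    }

    public static void main(String[] args) {
        // Tiny fake site: page -> links found on that page.
        Map<String, List<String>> site = Map.of(
                "/", List.of("/a", "/b"),
                "/a", List.of("/b", "/c"),
                "/b", List.of("/"),
                "/c", List.of());
        Set<String> pages = crawl("/", url -> site.getOrDefault(url, List.of()));
        System.out.println("Pages found: " + pages.size());
    }
}
```

A shared concurrent set keeps pages that link to each other from being crawled twice. In a real crawler, lemmatization of each fetched page could happen inside `compute()` before the subtasks are forked.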

Search engine technology stack:

  • Language - Java 11
  • Framework - Spring Framework
  • Database - MySQL 8.0.26
  • Library - Russianmorphology 1.5
  • Library - JSOUP 1.20.2
  • Library - Lombok 1.18.24
  • Library - Json-simple 1.1.1
  • FrontEnd - HTML, CSS, JavaScript
    Try live DEMO

    Open the live demo and follow the "Indexing and search" section, starting from step 2.

    1. Hosting:

      Simple Cloud

    2. Server characteristics:

      Processor: 1 core, 2 GHz;

      Memory: 2 GB;

      SSD: 20 GB.

    Prepare and start project on your device

    1. Install prerequisites:

      Install MySQL 8.0.26 or later.

    2. Clone repository.
    3. Configure application.yml:

      Enter the username and password of a database user with the required privileges;

      Enter the URL and name of each site to index;

      Set the maximum percentage of pages (out of the total number of pages) on which a lemma may appear and still be used in search. Default: 60%.

    4. Configure your IDE:

      Increase the maximum heap size in VM options: -Xmx4096m;

      Attach the project's "lib" directory, which contains Russianmorphology, in Project Settings -> Libraries;

      Run the Main method after Maven has downloaded all project dependencies.
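
The lemma percentage setting from step 3 can be illustrated with a small sketch. It rests on one reading of that setting: lemmas found on more than the configured share of pages are treated as too common and left out of the search. The class and method names (`LemmaFilterSketch`, `usableLemmas`) are hypothetical, and keeping a lemma that sits exactly on the threshold is an assumption as well.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the lemma frequency threshold; not the project's code.
final class LemmaFilterSketch {

    // Returns, in alphabetical order, the lemmas that occur on at most
    // maxPercent of all indexed pages.
    static List<String> usableLemmas(Map<String, Integer> pagesPerLemma,
                                     int totalPages,
                                     double maxPercent) {
        if (totalPages <= 0) {
            return List.of(); // nothing indexed yet, so nothing to search by
        }
        return pagesPerLemma.entrySet().stream()
                // Assumption: a lemma exactly at the threshold is still kept.
                .filter(e -> 100.0 * e.getValue() / totalPages <= maxPercent)
                .map(Map.Entry::getKey)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Transliterated sample lemmas: number of pages each one appears on.
        Map<String, Integer> pagesPerLemma = Map.of(
                "i", 95,        // very common conjunction
                "poisk", 40,
                "dvizhok", 10);
        System.out.println(usableLemmas(pagesPerLemma, 100, 60.0));
    }
}
```

With the default of 60%, a lemma found on 95 of 100 pages would be dropped from the query, while rarer lemmas remain.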

    Indexing and search

    1. Open the search engine start page in a browser - http://localhost:8080
    2. Go to the management tab and click the "Start indexing" button;

      ATTENTION!
      In this implementation, starting a full indexing run deletes all previously indexed data!

    3. On the dashboard tab, you can monitor the indexing progress;
    4. On the search tab, you can enter a search query as soon as at least one site has been indexed;
    5. If there are more than 10 results, click "show more".

    Indexing a specific page

    1. On the management tab, enter the page URL and click the "Add/update" button;

      NOTE
      The page must belong to one of the configured target sites.

    2. Check the result on search tab.
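
The rule in the note above could be checked roughly as in the following sketch. It assumes that membership is decided by comparing host names only; the project may apply a different rule. The names `SiteMembershipSketch` and `owningSite` are hypothetical.

```java
import java.net.URI;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the "page must belong to a target site" check.
final class SiteMembershipSketch {

    // Returns the configured site URL whose host matches the page's host, if any.
    // URI.create throws IllegalArgumentException for a syntactically invalid URL.
    static Optional<String> owningSite(String pageUrl, List<String> siteUrls) {
        String pageHost = URI.create(pageUrl).getHost();
        if (pageHost == null) {
            return Optional.empty(); // relative or host-less URL cannot be matched
        }
        for (String siteUrl : siteUrls) {
            // Assumption: two URLs belong together when their hosts are equal.
            if (pageHost.equalsIgnoreCase(URI.create(siteUrl).getHost())) {
                return Optional.of(siteUrl);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> sites = List.of("https://example.org", "https://example.com");
        System.out.println(owningSite("https://example.com/news/1", sites).isPresent());
        System.out.println(owningSite("https://other.net/page", sites).isPresent());
    }
}
```

A page whose host matches none of the configured sites would be rejected before any indexing work starts.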