
Term project involving recommending researchers to other researchers keeping their diversity in mind


priyankpalod/Information_Retrieval


Researcher Recommendation System


Progress:

Total number of authors now: 11,143. We shortlisted the authors who have more than 31 papers up to 2005 and more than 5 papers after 2005.
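The shortlisting rule can be sketched as follows. This is a minimal illustration, not the project's code. It assumes a hypothetical `paper_years` mapping from each author to the publication years of their papers, and it assumes that "up to 2005" includes 2005 itself.

```python
from typing import Dict, List

# Cut-offs from the README. Treating 2005 as part of the "before" period is an assumption.
CUTOFF_YEAR = 2005
MIN_PAPERS_BEFORE = 31
MIN_PAPERS_AFTER = 5


def shortlist_authors(paper_years: Dict[str, List[int]]) -> List[str]:
    """Return authors with > 31 papers up to CUTOFF_YEAR and > 5 papers after it.

    paper_years is a hypothetical input: author -> one year per paper.
    """
    shortlisted = []
    for author, years in paper_years.items():
        before = sum(1 for y in years if y <= CUTOFF_YEAR)
        after = sum(1 for y in years if y > CUTOFF_YEAR)
        if before > MIN_PAPERS_BEFORE and after > MIN_PAPERS_AFTER:
            shortlisted.append(author)
    return shortlisted


if __name__ == "__main__":
    toy = {
        "alice": [2000] * 32 + [2010] * 6,  # 32 before, 6 after: kept
        "bob": [2000] * 31 + [2010] * 10,   # only 31 before: dropped
        "carol": [2001] * 40 + [2008] * 5,  # only 5 after: dropped
    }
    print(shortlist_authors(toy))  # ['alice']
```

Both thresholds are strict (`>`), which matches the "> 31" and "> 5" wording above.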

The diversity between every pair of authors is stored as a matrix in the file 'cosine_distances_fields.npy'.
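A matrix like this could be built as shown below. This is a sketch under stated assumptions: the README does not say how the file was produced. It assumes each author is represented by a hypothetical vector of per-field paper counts, with one row per author, and computes cosine distance as 1 minus cosine similarity. scikit-learn's `sklearn.metrics.pairwise.cosine_distances` computes the same quantity.

```python
import numpy as np


def cosine_distance_matrix(field_vectors: np.ndarray) -> np.ndarray:
    """Pairwise cosine distances (1 - cosine similarity) between rows.

    field_vectors: shape (n_authors, n_fields). Row i is a hypothetical
    per-field paper count for author i.
    """
    norms = np.linalg.norm(field_vectors, axis=1, keepdims=True)
    # All-zero rows are divided by 1 instead of 0, so they get similarity 0
    # (distance 1) to every author, including themselves.
    unit = field_vectors / np.where(norms == 0, 1.0, norms)
    similarity = unit @ unit.T
    # Clip tiny floating-point overshoots so distances stay within [0, 2].
    return np.clip(1.0 - similarity, 0.0, 2.0)


if __name__ == "__main__":
    vectors = np.array([
        [4.0, 0.0, 0.0],  # author 0: only field 0
        [2.0, 0.0, 0.0],  # author 1: same direction as author 0
        [0.0, 3.0, 0.0],  # author 2: a disjoint field
    ])
    dist = cosine_distance_matrix(vectors)
    print(np.round(dist, 3))
    # np.save("cosine_distances_fields.npy", dist) would write this in .npy format,
    # and np.load("cosine_distances_fields.npy") reads it back.
```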

We have created the datasets: for each author, the co-authors they worked with after 2005 are split into two groups:

  • Diverse (cosine similarity <= 0.2), stored in 'diverse_coauthors.p'
  • Similar (cosine similarity > 0.2), stored in 'similar_coauthors.p'
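The split can be sketched as follows. This is a hypothetical illustration, not the project's code. It assumes authors are integer row indices into the distance matrix, that `coauthors_after_2005` is a hypothetical dict of co-author lists, and that the matrix holds cosine distances, so similarity is taken as 1 minus distance before applying the 0.2 threshold.

```python
import pickle
from typing import Dict, List, Tuple

import numpy as np

DIVERSITY_THRESHOLD = 0.2  # cosine similarity cut-off from the README


def split_coauthors(
    coauthors_after_2005: Dict[int, List[int]],
    distances: np.ndarray,
    threshold: float = DIVERSITY_THRESHOLD,
) -> Tuple[Dict[int, List[int]], Dict[int, List[int]]]:
    """Split each author's post-2005 co-authors into diverse and similar groups.

    distances holds cosine distances, so similarity = 1 - distance.
    """
    diverse: Dict[int, List[int]] = {}
    similar: Dict[int, List[int]] = {}
    for author, coauthors in coauthors_after_2005.items():
        diverse[author] = [c for c in coauthors if 1.0 - distances[author, c] <= threshold]
        similar[author] = [c for c in coauthors if 1.0 - distances[author, c] > threshold]
    return diverse, similar


def save_splits(diverse: Dict[int, List[int]], similar: Dict[int, List[int]]) -> None:
    """Pickle both groups under the file names used in this README."""
    with open("diverse_coauthors.p", "wb") as f:
        pickle.dump(diverse, f)
    with open("similar_coauthors.p", "wb") as f:
        pickle.dump(similar, f)


if __name__ == "__main__":
    dist = np.array([
        [0.0, 0.9, 0.5],
        [0.9, 0.0, 0.3],
        [0.5, 0.3, 0.0],
    ])
    coauthors = {0: [1, 2]}  # hypothetical: author 0 co-wrote with 1 and 2 after 2005
    diverse, similar = split_coauthors(coauthors, dist)
    print(diverse, similar)  # {0: [1]} {0: [2]}
```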

We wrote code that records, for every paper, the fields of the papers it cites, and stores the result as a dictionary in 'citation_field_vectors.p'. However, the citation network information we have seems inadequate, or too different from what we need, to be meaningful.
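One way such a dictionary could be built is sketched below. The exact layout of 'citation_field_vectors.p' is not documented, so this is an assumption throughout. It uses a hypothetical `references` map (paper to cited papers) and a hypothetical `paper_fields` map (paper to field label), and counts the fields of each paper's citations. Cited papers with no known field are skipped, and that is exactly where sparse citation data shows up.

```python
from collections import Counter
from typing import Dict, List


def citation_field_vectors(
    references: Dict[str, List[str]],
    paper_fields: Dict[str, str],
) -> Dict[str, Counter]:
    """For each paper, count the fields of the papers it cites.

    references: paper index -> indices of cited papers (hypothetical layout).
    paper_fields: paper index -> field label (hypothetical layout).
    """
    vectors: Dict[str, Counter] = {}
    for paper, cited in references.items():
        vectors[paper] = Counter(paper_fields[c] for c in cited if c in paper_fields)
    return vectors


if __name__ == "__main__":
    refs = {"p1": ["p2", "p3", "p4"]}
    fields = {"p2": "AI", "p3": "AI"}  # p4 has no known field, so it is skipped
    print(citation_field_vectors(refs, fields))  # {'p1': Counter({'AI': 2})}
```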

I thought we could instead grep '#index|#%f' in the full papers data, but even that is not possible: dumped_data_all_authors has no lines starting with '#%f' (the dataset has no field information for citations). We may have to drop this feature.
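A check like this can be done without loading the whole dump into memory. The sketch below is a hypothetical helper that streams lines and counts how many start with each prefix. Overlapping prefixes such as '#%' and '#%f' would both count the same line. On the command line, `grep -c '^#%f' <file>` gives the same count for a single prefix.

```python
import io
from typing import Dict, Iterable


def count_prefixed_lines(lines: Iterable[str], prefixes: Iterable[str]) -> Dict[str, int]:
    """Count lines starting with each prefix, reading one line at a time."""
    prefix_tuple = tuple(prefixes)
    counts = {p: 0 for p in prefix_tuple}
    for line in lines:
        for p in prefix_tuple:
            if line.startswith(p):
                counts[p] += 1
    return counts


if __name__ == "__main__":
    sample = io.StringIO("#index 1\n#* Some title\n#% 5\n#index 2\n")
    print(count_prefixed_lines(sample, ["#index", "#%f"]))  # {'#index': 2, '#%f': 0}
    # Hypothetical usage on the real dump (iterating the file object streams it):
    # with open("dumped_data_all_authors", encoding="utf-8", errors="replace") as f:
    #     print(count_prefixed_lines(f, ["#index", "#%f"]))
```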

Some code has been written for LDA training, but it has not yet been updated.

Run the following commands to set up the system:

First, clone this repo to your local machine if you have not done so already. (duhh!!)

The command for cloning is:

git clone https://github.com/priyankpalod/Information_Retrieval.git

If that gives an error, you may not have git installed on your machine. Install git (on Debian/Ubuntu, as below) and retry:

sudo apt-get install git

After cloning successfully, enter the repository directory (cd Information_Retrieval) and run the commands below.

  • To copy the 17 GB papers file, run the following command in your terminal. This may take a while depending on your network speed.

scp 13CS10037@10.5.18.104:./Information_Retrieval/dumped_data_all_papers .

  • To copy the authors file, run the following command in your terminal.

scp 13CS10037@10.5.18.104:./Information_Retrieval/dumped_data_authors.txt .

  • To create the fieldsData.txt file with a simple grep, run this command. It may take some time.

grep -E '#index|^#f' dumped_data_all_papers > fieldsData.txt
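Once fieldsData.txt exists, it could be read into a dictionary as shown below. This is a sketch that assumes a hypothetical layout: each '#index<id>' line is followed by zero or more '#f<field>' lines for that paper (grep preserves file order). The grep pattern above does not anchor '#index' to the start of the line, so this parser only treats lines that begin with '#index' as paper headers.

```python
import io
from typing import Dict, Iterable, List, Optional


def parse_fields(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each paper index to the field labels listed under it (hypothetical format)."""
    fields: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#index"):
            current = line[len("#index"):].strip()
            fields.setdefault(current, [])
        elif line.startswith("#f") and current is not None:
            fields[current].append(line[len("#f"):].strip())
    return fields


if __name__ == "__main__":
    sample = io.StringIO("#index1\n#f Databases\n#index2\n#index3\n#f AI\n#f Theory\n")
    print(parse_fields(sample))  # {'1': ['Databases'], '2': [], '3': ['AI', 'Theory']}
    # Hypothetical usage on the grep output:
    # with open("fieldsData.txt", encoding="utf-8", errors="replace") as f:
    #     fields = parse_fields(f)
```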

  • Finally, run the following command. It takes a long time, but don't worry: it's a one-time command :)

python system.py
