Cord19 solr blacklight replication doc #1233

Merged
merged 5 commits into from
May 28, 2020

shaneding

An extension of experiments-cord19.md.

I hit the following error while indexing through Solr. I think it comes from the format of the data rather than from the code:
Request to collection [cord19] failed due to (400) org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://192.168.1.20:8983/solr/cord19_shard1_replica_n1: ERROR: [doc=y6o5e2k5] multiple values encountered for non multiValued field source_x: [Elsevier, Medline, PMC], retry=0 commError=false errorCode=400

codecov bot commented May 27, 2020

Codecov Report

Merging #1233 into master will increase coverage by 0.22%.
The diff coverage is n/a.

@@             Coverage Diff              @@
##             master    #1233      +/-   ##
============================================
+ Coverage     48.11%   48.33%   +0.22%     
- Complexity      729      739      +10     
============================================
  Files           147      147              
  Lines          8559     8559              
  Branches       1217     1217              
============================================
+ Hits           4118     4137      +19     
+ Misses         4101     4082      -19     
  Partials        340      340              
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| ...java/io/anserini/ltr/feature/CountBigramPairs.java | 89.61% <0.00%> (+24.67%) | 33.00% <0.00%> (+10.00%) |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

lintool commented May 27, 2020

@edwinzhng can you take a look? That multi-value issue sounds familiar.

@shaneding we can remove the Solr section from experiments-cord19.md, right?

@@ -0,0 +1,94 @@
## Title: Ingesting CORD-19 into Solr and Blacklight

Remove "Title: "

@shaneding

Yes, the Solr section in experiments-cord19-extra.md is basically a copy and paste from experiments-cord19.md. Also, note that the error didn't actually stop the indexing; indexing just continued past it.

@edwinzhng edwinzhng left a comment

The issue is caused by #1231 from yesterday; I forgot to update the Solr config. To fix it, can you change src/main/resources/solr/schemas/covid.json so that the source_x field definition becomes:

  "add-field": {
    "name":"source_x",
    "type":"string",
    "stored":true,
    "multiValued": true
  },

Once this is done, you can verify that it works by recreating the collection with the new schema, as per the README, and then re-indexing:

solrini/bin/solr delete -c cord19
pushd src/main/resources/solr && ./solr.sh ../../../../solrini localhost:9983 && popd
solrini/bin/solr create -n anserini -c cord19
curl -X POST -H 'Content-type:application/json' --data-binary @src/main/resources/solr/schemas/covid.json http://localhost:8983/solr/cord19/schema
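
Before re-indexing, it may be worth confirming that the new field definition actually took effect. The following is a minimal sketch, not part of the README: `is_multivalued` is a hypothetical helper name, and it assumes the Schema API's `GET /schema/fields/<name>` response reports `"multiValued":true` for the field, with Solr on the same host and port as the commands above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads a Solr Schema API field response (JSON) on stdin
# and succeeds only if the field is reported as multiValued.
# Assumption: the response contains "multiValued":true, possibly with
# whitespace around the colon when the JSON is pretty-printed.
is_multivalued() {
  grep -Eq '"multiValued"[[:space:]]*:[[:space:]]*true'
}

# Usage against a running Solr instance (not executed here):
#   curl -s http://localhost:8983/solr/cord19/schema/fields/source_x | is_multivalued \
#     && echo "source_x is multiValued" \
#     || echo "source_x is NOT multiValued"
```

A plain `grep` keeps the check dependency-free; if `jq` is available, parsing the JSON properly would be more robust than pattern matching.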

@edwinzhng

Also, would you be able to rename covid.json to cord19.json for consistency (and update the docs as well)? 😄

@shaneding

The error was resolved after updating the JSON file 👍

@lintool lintool merged commit 94893f1 into castorini:master May 28, 2020
crystina-z pushed a commit to crystina-z/anserini that referenced this pull request Oct 28, 2022