This repository has been archived by the owner on Jan 25, 2021. It is now read-only.

Search with * wildcard has an effect on the relevance ranking #700

Open
liowalter opened this issue May 23, 2019 · 4 comments

Comments

@liowalter
Member Author

Here are the Solr scoring explanations for the document https://test.swissbib.ch/Record/316493929

This document is ranked 1st for quarteroni but only 28th for quarteroni*.

quarteroni search
debug link

{
  "316493929": {
    "match": true,
    "value": 6917.0166,
    "description": "sum of:",
    "details": [
      {
        "match": true,
        "value": 6916.366,
        "description": "max of:",
        "details": [
          {
            "match": true,
            "value": 690.38696,
            "description": "weight(author_additional_gnd_txt_mv:quarteroni in 1258316) [ClassicSimilarity], result of:",
            "details": [
              {
                "match": true,
                "value": 690.38696,
                "description": "score(doc=1258316,freq=3.0), product of:",
                "details": [
                  {
                    "match": true,
                    "value": 100,
                    "description": "boost"
                  },
                  {
                    "match": true,
                    "value": 6.9038696,
                    "description": "fieldWeight in 1258316, product of:",
                    "details": [
                      {
                        "match": true,
                        "value": 1.7320508,
                        "description": "tf(freq=3.0), with freq of:",
                        "details": [
                          {
                            "match": true,
                            "value": 3,
                            "description": "termFreq=3.0"
                          }
                        ]
                      },
                      {
                        "match": true,
                        "value": 11.957853,
                        "description": "idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:",
                        "details": [
                          {
                            "match": true,
                            "value": 28,
                            "description": "docFreq"
                          },
                          {
                            "match": true,
                            "value": 1664689,
                            "description": "docCount"
                          }
                        ]
                      },
                      {
                        "match": true,
                        "value": 0.33333334,
                        "description": "fieldNorm(doc=1258316)"
                      }
                    ]
                  }
                ]
              }
            ]
          },
          {
            "match": true,
            "value": 6916.366,
            "description": "weight(author:quarteroni in 1258316) [ClassicSimilarity], result of:",
            "details": [
              {
                "match": true,
                "value": 6916.366,
                "description": "score(doc=1258316,freq=1.0), product of:",
                "details": [
                  {
                    "match": true,
                    "value": 750,
                    "description": "boost"
                  },
                  {
                    "match": true,
                    "value": 9.221822,
                    "description": "fieldWeight in 1258316, product of:",
                    "details": [
                      {
                        "match": true,
                        "value": 1,
                        "description": "tf(freq=1.0), with freq of:",
                        "details": [
                          {
                            "match": true,
                            "value": 1,
                            "description": "termFreq=1.0"
                          }
                        ]
                      },
                      {
                        "match": true,
                        "value": 13.041626,
                        "description": "idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:",
                        "details": [
                          {
                            "match": true,
                            "value": 46,
                            "description": "docFreq"
                          },
                          {
                            "match": true,
                            "value": 7974611,
                            "description": "docCount"
                          }
                        ]
                      },
                      {
                        "match": true,
                        "value": 0.70710677,
                        "description": "fieldNorm(doc=1258316)"
                      }
                    ]
                  }
                ]
              }
            ]
          },
          {
            "match": true,
            "value": 189.3192,
            "description": "weight(addfields_txt_mv:quarteroni in 1258316) [ClassicSimilarity], result of:",
            "details": [
              {
                "match": true,
                "value": 189.3192,
                "description": "score(doc=1258316,freq=1.0), product of:",
                "details": [
                  {
                    "match": true,
                    "value": 50,
                    "description": "boost"
                  },
                  {
                    "match": true,
                    "value": 3.7863839,
                    "description": "fieldWeight in 1258316, product of:",
                    "details": [
                      {
                        "match": true,
                        "value": 1,
                        "description": "tf(freq=1.0), with freq of:",
                        "details": [
                          {
                            "match": true,
                            "value": 1,
                            "description": "termFreq=1.0"
                          }
                        ]
                      },
                      {
                        "match": true,
                        "value": 13.116419,
                        "description": "idf, computed as log((docCount+1)/(docFreq+1)) + 1 from:",
                        "details": [
                          {
                            "match": true,
                            "value": 48,
                            "description": "docFreq"
                          },
                          {
                            "match": true,
                            "value": 8959630,
                            "description": "docCount"
                          }
                        ]
                      },
                      {
                        "match": true,
                        "value": 0.28867513,
                        "description": "fieldNorm(doc=1258316)"
                      }
                    ]
                  }
                ]
              }
            ]
          }
        ]
      },
      {
        "match": true,
        "value": 0.65048635,
        "description": "FunctionQuery(100.0/(3.16E-10*float(abs(ms(const(1558569600000),date(freshness))))+100.0)), product of:",
        "details": [
          {
            "match": true,
            "value": 0.65048635,
            "description": "100.0/(3.16E-10*float(abs(ms(const(1558569600000),date(freshness)=2014-01-01T00:00:00Z)))+100.0)"
          },
          {
            "match": true,
            "value": 1,
            "description": "boost"
          }
        ]
      }
    ]
  }
}
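
The numbers in this explain output can be reproduced by hand. The following is a minimal sketch, assuming the ClassicSimilarity layout shown above (boost * tf * idf * fieldNorm, with tf = sqrt(freq) and the idf formula quoted in the output). The function names are illustrative and not part of Solr.

```python
import math
from datetime import datetime, timezone


def classic_field_score(boost, freq, doc_freq, doc_count, field_norm):
    """Term score as laid out in the explain output: boost * tf * idf * fieldNorm."""
    tf = math.sqrt(freq)                                  # tf(freq=3.0) = 1.7320508
    idf = math.log((doc_count + 1) / (doc_freq + 1)) + 1  # formula quoted in the output
    return boost * tf * idf * field_norm


def freshness_boost(now_ms, freshness_ms):
    """The FunctionQuery from the output: 100 / (3.16E-10 * |now - freshness| + 100)."""
    return 100.0 / (3.16e-10 * abs(now_ms - freshness_ms) + 100.0)


# All inputs are copied from the explain output for document 316493929.
field_scores = {
    "author_additional_gnd_txt_mv": classic_field_score(100, 3.0, 28, 1664689, 0.33333334),
    "author": classic_field_score(750, 1.0, 46, 7974611, 0.70710677),
    "addfields_txt_mv": classic_field_score(50, 1.0, 48, 8959630, 0.28867513),
}

# freshness=2014-01-01T00:00:00Z, converted to epoch milliseconds.
freshness_ms = datetime(2014, 1, 1, tzinfo=timezone.utc).timestamp() * 1000

# "sum of:" the best field score ("max of:") and the freshness function.
total = max(field_scores.values()) + freshness_boost(1558569600000, freshness_ms)

for field, score in field_scores.items():
    print(f"{field}: {score:.3f}")
print(f"total: {total:.3f}")
```

The author field dominates because its boost (750) multiplies a high idf: the term occurs in only 46 of roughly 8 million documents.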

quarteroni* search
debug link

{
  "316493929": {
    "match": true,
    "value": 750.6505,
    "description": "sum of:",
    "details": [
      {
        "match": true,
        "value": 750,
        "description": "max of:",
        "details": [
          {
            "match": true,
            "value": 100,
            "description": "author_additional_gnd_txt_mv:quarteroni*^100.0"
          },
          {
            "match": true,
            "value": 750,
            "description": "author:quarteroni*^750.0"
          },
          {
            "match": true,
            "value": 50,
            "description": "addfields_txt_mv:quarteroni*^50.0"
          }
        ]
      },
      {
        "match": true,
        "value": 0.65048635,
        "description": "FunctionQuery(100.0/(3.16E-10*float(abs(ms(const(1558569600000),date(freshness))))+100.0)), product of:",
        "details": [
          {
            "match": true,
            "value": 0.65048635,
            "description": "100.0/(3.16E-10*float(abs(ms(const(1558569600000),date(freshness)=2014-01-01T00:00:00Z)))+100.0)"
          },
          {
            "match": true,
            "value": 1,
            "description": "boost"
          }
        ]
      }
    ]
  }
}

@liowalter
Member Author

Looks like prefix queries ("a*") are constant-scoring: all matching documents get an equal score per field, which is simply the field boost (750 for author in the quarteroni* explain output above). The scoring factors TF, IDF, index boost, and "coord" are not used.
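
As a rough illustration of the constant scoring, the sketch below recomputes the quarteroni* score from the second explain output. It assumes each matching field clause contributes exactly its boost, the best field is taken ("max of:"), and the freshness function is added on top. The function name is hypothetical.

```python
def prefix_query_score(field_boosts, function_boost):
    """Constant-score model: each matching field scores its boost, the best field wins."""
    return max(field_boosts.values()) + function_boost


# Boosts taken from the quarteroni* explain output.
boosts = {
    "author_additional_gnd_txt_mv": 100.0,
    "author": 750.0,
    "addfields_txt_mv": 50.0,
}

score = prefix_query_score(boosts, 0.65048635)
print(round(score, 4))

# Term frequency and rarity do not appear anywhere in the computation, so any
# other document matching quarteroni* in the author field gets the same 750.
# Only the small freshness term is left to separate such documents.
```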

@liowalter
Member Author

Looks like VuFind suffers from the same problem:

https://vufind.org/demo/Search/Results?lookfor=quarteroni*&type=AllFields&limit=20
https://vufind.org/demo/Search/Results?lookfor=quarteroni&type=AllFields&limit=20

This is not really bad for searches, but it is very bad for suggestions, because suggestions are built from wildcard queries. One more reason to use the Solr suggester: https://lucene.apache.org/solr/guide/7_3/suggester.html

@liowalter
Member Author

I solved it by using "quarteroni OR quarteroni*" as the query. This is not a fully convincing solution, though, because it has side effects: for example, with the Solr pf parameter, documents that contain the query word twice get boosted.
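
A minimal sketch of how such a query string could be built for a single search term. The helper name is hypothetical, and the sketch does not escape Solr query syntax characters or handle multi-word input.

```python
def with_prefix_fallback(term: str) -> str:
    """Hypothetical helper: combine the exact term with its prefix form."""
    term = term.strip()
    # Leave empty input and terms that already carry a wildcard unchanged.
    if not term or term.endswith("*"):
        return term
    return f"{term} OR {term}*"


print(with_prefix_fallback("quarteroni"))  # quarteroni OR quarteroni*
```

The exact-term clause is scored with TF and IDF as usual, while the prefix clause still matches longer completions of the word.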
