Extract method comes with no specified headers #96

TonyEight · 2013-10-21T09:45:31Z

I may be getting silly, or maybe I just don't understand the implementation, but it seems the extract method is not implemented in a way that passes proper parameters to _send_request.

No headers are given. With the current _send_request implementation, when no Content-type header is provided, it falls back to application/xml:

if not 'content-type' in [key.lower() for key in headers.keys()]:
    headers['Content-type'] = 'application/xml; charset=UTF-8'

Using pysolr with django-haystack, I was unable to get the extract_file_contents method from SolrBackend working because of this. Every submitted file was treated as XML instead of the content type detected by Tika, resulting in a ParseError every time...

Commenting out that part of the code solves my case, but I'm sure there is a better option.

Do you have any clue?
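
The defaulting behaviour described above, and one possible way around it, can be sketched as follows. This is a minimal sketch and not pysolr's actual code: `build_headers` and its `default_content_type` parameter are hypothetical names introduced only for illustration. The idea is that a caller such as extract could opt out of the XML default instead of having the check commented out for every request.

```python
def build_headers(headers=None, default_content_type="application/xml; charset=UTF-8"):
    """Return a copy of `headers`, adding a default Content-type if none is set.

    Hypothetical helper: passing default_content_type=None is an assumed
    opt-out for callers that must not force a content type.
    """
    merged = dict(headers or {})
    # Header names are case-insensitive, so compare in lower case.
    has_content_type = any(key.lower() == "content-type" for key in merged)
    if default_content_type is not None and not has_content_type:
        merged["Content-type"] = default_content_type
    return merged


# Ordinary XML update: the default is applied.
xml_headers = build_headers()

# Extract-style call: no Content-type is forced.
extract_headers = build_headers(default_content_type=None)

# A caller-supplied header is left untouched, whatever its casing.
custom_headers = build_headers({"CONTENT-TYPE": "text/plain"})
```

With this shape, other requests keep the XML default and only the caller that asks for it goes out without a forced Content-type.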

tongwang · 2013-12-04T18:43:16Z

Having the same issue here. extract is broken.

acdha · 2013-12-04T22:31:19Z

This apparently changed at some point in Solr's release cycle. I'd want to test to confirm that it works as far back as we support. Otherwise, my first question is whether this works if you use a generic MIME type like application/octet-stream. That would let extract set the header for its own requests and avoid any chance of affecting other requests.

tongwang · 2013-12-06T17:39:10Z

application/octet-stream won't work. It seems the only way to make it work is to not set Content-type at all and let requests set it to the correct multipart/form-data with boundary.
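
The behaviour of requests described above can be checked without a running Solr server by preparing a request and inspecting its headers. This is a sketch under assumptions: the URL, the query parameters, and the `prepare_extract_request` helper are placeholders for illustration, not pysolr's implementation, and nothing is sent over the network.

```python
import io

import requests


def prepare_extract_request(file_obj, headers=None):
    """Build, but do not send, a multipart POST similar to an extract call.

    Hypothetical helper; the URL and parameters below are placeholders.
    """
    request = requests.Request(
        "POST",
        "http://localhost:8983/solr/update/extract",
        params={"extractOnly": "true", "wt": "json"},
        files={"file": ("sample.pdf", file_obj)},
        headers=headers or {},
    )
    return request.prepare()


# No Content-type given: requests generates the multipart header itself.
default_ct = prepare_extract_request(io.BytesIO(b"%PDF-1.4 fake")).headers["Content-Type"]

# Content-type forced to XML: requests keeps the caller's value, so the
# multipart body is sent under a header that carries no boundary.
forced_ct = prepare_extract_request(
    io.BytesIO(b"%PDF-1.4 fake"),
    headers={"Content-type": "application/xml; charset=UTF-8"},
).headers["Content-Type"]

print(default_ct)
print(forced_ct)
```

The first value starts with multipart/form-data; boundary= followed by a generated boundary string. The second stays application/xml; charset=UTF-8, which matches the failure reported in this issue.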

stale · 2018-06-05T16:16:03Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

TonyEight pushed a commit to TonyEight/pysolr that referenced this issue Nov 2, 2013

Added a hacky workaround to allow documents data extraction from Hays…

3d44471

…tack see django-haystack#96

tongwang mentioned this issue Dec 4, 2013

Solr backend extract_file_contents fails due to incorrect default Content-type header django-haystack/django-haystack#912

Closed

ghost assigned acdha Dec 4, 2013

tongwang added a commit to tongwang/pysolr that referenced this issue Dec 6, 2013

fixed both issues django-haystack#96 and django-haystack#90 and updat…

5dbb1a5

…ed test solr sever from 4.1.0 to 4.6.0. All tests pass

tongwang mentioned this issue Dec 6, 2013

Fixed broken extract function #104

Merged

pyup-bot mentioned this issue Nov 8, 2016

Pin pysolr to latest version 3.6.0 mytardis/mytardis#694

Merged

stale bot added the stale label Jun 5, 2018

stale bot closed this as completed Jul 5, 2018
