CDXJ: Error: no such capture field: method #106

Open
edsu opened this issue May 16, 2022 · 7 comments

edsu commented May 16, 2022

When posting a CDXJ file (generated with pywb 2.6.7) to the OutbackCDX image from Docker Hub (v0.11.0?) like so:

curl -X POST --data-binary @index.cdxj http://localhost:8080/coll

I'm seeing the following error get printed to the console:

At line: com,google-analytics)/collect?__wb_method=post&__wb_post_data=dj0xjl92pwo5nizhaxa9mszhptc2ndcxodg1myz0pxbhz2v2awv3jl9zptemzgw9ahr0chmlm0elmkylmkzhcg9klm5hc2euz292jtjgyxbvzcuyrmfwmjiwmza3lmh0bwwmzha9jtjgyxbvzcuyrmfwmjiwmza3lmh0bwwmdww9zw4tdxmmzgu9vvrgltgmzhq9qvbprcuzqsuymdiwmjilmjbnyxjjacuymdclmjatjtiwqsuymexpb24lmjbpbiuyme9yaw9ujnnkpte2lwjpdczzcj0xmzywedewmjamdna9mta1mhg4odamamu9mczfdxrtyt0xmtm2otk1ndmumtc1ntcxnza2mi4xnjuymtq0nja0lje2ntixndq2mjaumty1mje0ndyymc4xjl91dg16ptexmzy5otu0my4xnjuymtq0njiwljeums51dg1jc3ilm0qozglyzwn0ksu3q3v0bwnjbiuzrchkaxjly3qpjtdddxrty21kjtnekg5vbmupjl91dg1odd0xnjuymtq0njk0mdg5jl91pvfbq0nbuufcfizqawq9jmdqawq9jmnpzd0xnzu1nze3mdyylje2ntixndq2mdqmdglkpvvbltmzntizmtq1ltemx2dpzd00ntc1ndm3mc4xnjuymtq0nja0jmnkmt1oqvnbjmnkmj1oqvnbjtiwlsuymgfwb2qubmfzys5nb3ymy2qzptiwmtgxmdewjtiwdjqumsuymc0lmjbvbml2zxjzywwlmjbbbmfsexrpy3mmy2q0pxvuc3bly2lmawvkjtnbyxbvzc5uyxnhlmdvdizjzdu9dw5zcgvjawzpzwqlm0fhcg9klm5hc2euz292jmnknj1odhrwcyuzqsuyriuyrmrhcc5kawdpdgfsz292lmdvdiuyrlvuaxzlcnnhbc1gzwrlcmf0zwqtqw5hbhl0awnzlu1pbi5qcyzjzdc9ahr0chmlm0emej0xmjc2mdq0mjew 20220510010455 
{"url":"https://www.google-analytics.com/collect","mime":"image/gif","status":"200","digest":"B5HJFHOVXMSWJ55LTR3DHDQE4KJKIKWO","length":"651","offset":"49132028","method":"POST","requestBody":"__wb_post_data=dj0xJl92PWo5NiZhaXA9MSZhPTc2NDcxODg1MyZ0PXBhZ2V2aWV3Jl9zPTEmZGw9aHR0cHMlM0ElMkYlMkZhcG9kLm5hc2EuZ292JTJGYXBvZCUyRmFwMjIwMzA3Lmh0bWwmZHA9JTJGYXBvZCUyRmFwMjIwMzA3Lmh0bWwmdWw9ZW4tdXMmZGU9VVRGLTgmZHQ9QVBPRCUzQSUyMDIwMjIlMjBNYXJjaCUyMDclMjAtJTIwQSUyMExpb24lMjBpbiUyME9yaW9uJnNkPTE2LWJpdCZzcj0xMzYweDEwMjAmdnA9MTA1MHg4ODAmamU9MCZfdXRtYT0xMTM2OTk1NDMuMTc1NTcxNzA2Mi4xNjUyMTQ0NjA0LjE2NTIxNDQ2MjAuMTY1MjE0NDYyMC4xJl91dG16PTExMzY5OTU0My4xNjUyMTQ0NjIwLjEuMS51dG1jc3IlM0QoZGlyZWN0KSU3Q3V0bWNjbiUzRChkaXJlY3QpJTdDdXRtY21kJTNEKG5vbmUpJl91dG1odD0xNjUyMTQ0Njk0MDg5Jl91PVFBQ0NBUUFCfiZqaWQ9JmdqaWQ9JmNpZD0xNzU1NzE3MDYyLjE2NTIxNDQ2MDQmdGlkPVVBLTMzNTIzMTQ1LTEmX2dpZD00NTc1NDM3MC4xNjUyMTQ0NjA0JmNkMT1OQVNBJmNkMj1OQVNBJTIwLSUyMGFwb2QubmFzYS5nb3YmY2QzPTIwMTgxMDEwJTIwdjQuMSUyMC0lMjBVbml2ZXJzYWwlMjBBbmFseXRpY3MmY2Q0PXVuc3BlY2lmaWVkJTNBYXBvZC5uYXNhLmdvdiZjZDU9dW5zcGVjaWZpZWQlM0FhcG9kLm5hc2EuZ292JmNkNj1odHRwcyUzQSUyRiUyRmRhcC5kaWdpdGFsZ292LmdvdiUyRlVuaXZlcnNhbC1GZWRlcmF0ZWQtQW5hbHl0aWNzLU1pbi5qcyZjZDc9aHR0cHMlM0Emej0xMjc2MDQ0MjEw","filename":"apod.warc.gz"}
java.lang.IllegalArgumentException: no such capture field: method
	at outbackcdx.Capture.put(Capture.java:548)
	at outbackcdx.Capture.fromCdxjLine(Capture.java:434)
	at outbackcdx.Capture.fromCdxLine(Capture.java:385)
	at outbackcdx.Webapp.post(Webapp.java:249)
	at outbackcdx.Webapp.lambda$new$3(Webapp.java:102)
	at outbackcdx.Web$Route.handle(Web.java:312)
	at outbackcdx.Web$Router.handle(Web.java:236)
	at outbackcdx.Webapp.handle(Webapp.java:594)
	at outbackcdx.Web$Server.serve(Web.java:50)
	at outbackcdx.NanoHTTPD$HTTPSession.execute(NanoHTTPD.java:848)
	at outbackcdx.NanoHTTPD$1$1.run(NanoHTTPD.java:207)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

Other CDXJ files seem to work normally, however.

Author

edsu commented May 16, 2022

If it's helpful to have the WARC and CDXJ files please let me know!

Member

ato commented May 16, 2022

OutbackCDX does not (yet) support storing arbitrarily named fields.

ato changed the title from "Error: no such capture field: method" to "CDXJ: Error: no such capture field: method" on May 16, 2022

Member

ato commented May 17, 2022

Note the "Things it doesn't do (yet): CDXJ" in the README. :-) While it can now map CDX11 fields to CDXJ for input/output it doesn't actually support storing arbitrary CDXJ data.

I don't have any short-term plans to implement this myself but would be happy to accept a pull request.

Author

edsu commented May 17, 2022

Now I'm confused why another CDXJ file worked.

Author

edsu commented May 17, 2022

I think I understand now: current OutbackCDX can store CDXJ data of a known shape? And the method property is not something it is expecting?

Member

ato commented May 17, 2022

Yes, if the CDXJ input is limited to just the basic CDX11 fields, it works.
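
For readers hitting the same error, one possible workaround is to drop the unsupported keys from each CDXJ line before posting it. The sketch below is only an illustration: the function name `strip_extra_fields` and the allow-list are assumptions, with the allow-list taken from the keys in the sample record above other than `method` and `requestBody`, not from OutbackCDX's own field list. Note that removing `method` and `requestBody` discards information about the request.

```python
import json

# Assumption: keys seen in the sample record above, minus "method" and
# "requestBody". This is not an authoritative list of supported fields.
ASSUMED_BASIC_FIELDS = {
    "url", "mime", "status", "digest", "length", "offset", "filename",
}


def strip_extra_fields(line, allowed=ASSUMED_BASIC_FIELDS):
    """Return a CDXJ line whose JSON block keeps only the allowed keys."""
    # A CDXJ line is "<urlkey> <timestamp> <json>"; split on the first two
    # spaces only, since the JSON block may itself contain spaces.
    urlkey, timestamp, payload = line.rstrip("\n").split(" ", 2)
    fields = json.loads(payload)
    kept = {key: value for key, value in fields.items() if key in allowed}
    return f"{urlkey} {timestamp} {json.dumps(kept, separators=(',', ':'))}"


if __name__ == "__main__":
    sample = (
        'com,example)/ 20220510010455 '
        '{"url":"https://example.com/","status":"200",'
        '"method":"POST","requestBody":"a=b","filename":"x.warc.gz"}'
    )
    print(strip_extra_fields(sample))
```

Applying this to every line of `index.cdxj` and posting the result with the same `curl` command should avoid the `no such capture field` exception, provided the remaining keys are all ones OutbackCDX accepts.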

edsu added a commit to edsu/outbackcdx that referenced this issue May 17, 2022
Allow for a `method` property in the CDXJ, which is occasionally
emitted by pywb.

Fixes nla#106

Member

ato commented May 30, 2023

Commit 9d73df3 added support for storing arbitrary extra CDXJ fields using a CBOR-based record encoding, which can be enabled with --index-version 5. This is still experimental, and a little more work is needed to actually make use of the method and requestBody fields when constructing the urlkey for compatibility with pywb.
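
To illustrate what using `method` and `requestBody` in the urlkey could look like, here is a rough sketch based only on the sample record earlier in this thread, where the urlkey is the URL with `__wb_method=...` and the request body appended as a query string, then lowercased and host-reversed. The function names are hypothetical, and `simple_urlkey` is a deliberately simplified stand-in for real SURT canonicalization (it does no query normalization or other canonicalization rules); this is not the OutbackCDX or pywb implementation.

```python
from urllib.parse import urlsplit


def append_post_query(url, method, request_body):
    """Append the method and request body to the URL as query parameters."""
    # Assumption: use "?" when the URL has no query yet, otherwise "&".
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}__wb_method={method}&{request_body}"


def simple_urlkey(url):
    """Simplified urlkey: lowercase, drop "www.", reverse the host labels."""
    parts = urlsplit(url.lower())
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    key = ",".join(reversed(host.split("."))) + ")" + (parts.path or "/")
    if parts.query:
        key += "?" + parts.query
    return key


if __name__ == "__main__":
    # Shortened request body; the real one in the sample record is much longer.
    url = append_post_query(
        "https://www.google-analytics.com/collect",
        "POST",
        "__wb_post_data=dj0xJl92",
    )
    print(simple_urlkey(url))
```

With the shortened body above, the result has the same shape as the urlkey in the failing record: `com,google-analytics)/collect?__wb_method=post&__wb_post_data=...`.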

ato added a commit that referenced this issue Jun 9, 2023
This should improve compatibility with Pywb for POST and PUT requests.

#106