Rubygem for interacting with the DocumentCloud API
gem install documentcloud
or put in your gemfile and bundle install
then
require 'documentcloud'
if not in Rails
To authenticate, initialize the configuration and pass in a block:
DocumentCloud.configure do |config|
config.email = 'my_email@somedomain.com'
config.password = 'my_secret_password'
end
All search results return Document objects that has methods and accessors for the information.
search = DocumentCloud.search('my query')
puts search.total # returns number of results
puts search.documents[0].pdf
You can upload a document by providing a ruby File object, or a string with the url to a file.
local_doc = DocumentCloud.upload(File.new('/my/file/path.pdf','rb'), 'My Document Title')
remote_doc = DocumentCloud.upload('http://somesite.com/file.pdf', 'Document Title')
Both return a document object which can then be used.
This gem creates several objects built from the JSON response in order to provide convenience methods and accessors.
Document objects wrap any document JSON. They are returned in search results, upload, and document methods.
doc = DocumentCloud.document('1234-document-id')
# Accessors
doc.id
doc.title
doc.access
doc.pages
doc.description
doc.source
doc.canonical_url
doc.language
doc.display_langauge
# Methods
doc.pdf # Full URL to pdf
doc.print_annotations
doc.related_article
doc.text # Raw parsed text from document
doc.thumbnail # Primary thumbnail of document
doc.entities # Returns a hash of entities and their relevance
# Images
doc.image(page, size) # Returns the image of a page at the specified size
SearchResults objects return information about the search, and an array of Document objects
results = DocumentCloud.search('my query')
results.total # Total number of results
results.page # The current results page. Used by specifying page param: DocumentCloud.search('my query', page: 2)
results.per_page # Number of results per page
results.query # Query string
results.documents # Array of documents of results:
results.documents[0].title # First result title
Projects returns an array of Project objects which contain a list of Documents and metadata about the project.
projects = DocumentCloud.projects
project = projects.first
# Accessors
project.id
project.title
project.description
project.documents # Returns an array of Document objects
Entities takes a document id and returns a hash of entities, which include a value and relevance for each type of entity.
entities = DocumentCloud.entities(2072158)
# example output
{
:person=>[{:value=>"Pita Sharples", :relevance=>0.112}],
:organization=>[{:value=>"Māori Party", :relevance=>0.48}],
:place=>[],
:term=>
[{:value=>"finance costs", :relevance=>0.552},
{:value=>"public services", :relevance=>0.526}],
:email=>[],
:phone=>[],
:city=>[],
:state=>[],
:country=>
[{:value=>"New Zealand", :relevance=>0.628},
{:value=>"Australia", :relevance=>0.302},
{:value=>"United States", :relevance=>0.295},
{:value=>"United Kingdom", :relevance=>0.295},
{:value=>"China", :relevance=>0.294}]
}
DocumentCloud API info: http://www.documentcloud.org/help/api