Just import the module and go on. Note that PyPaperless is async, so everything must run inside an event loop.
import asyncio
from pypaperless import Paperless
paperless = Paperless("localhost:8000", "your-secret-token")
# see main() examples
asyncio.run(main())
main() Example 1
async def main():
    await paperless.initialize()
    # do something
    await paperless.close()
main() Example 2
async def main():
    async with paperless:
        # do something
        ...
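Both examples should be equivalent if async with calls initialize() on entry and close() on exit. The minimal sketch below shows that async context manager pattern with a hypothetical FakeClient class; it does not import PyPaperless and is not its actual implementation.

```python
import asyncio
from typing import List


class FakeClient:
    """Hypothetical stand-in for a client like Paperless (not the real class)."""

    def __init__(self) -> None:
        self.events: List[str] = []

    async def initialize(self) -> None:
        # A real client would open its http session here.
        self.events.append("initialize")

    async def close(self) -> None:
        # A real client would close its http session here.
        self.events.append("close")

    async def __aenter__(self) -> "FakeClient":
        await self.initialize()
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Runs even if the body raised, so the session is always closed.
        await self.close()


async def main() -> List[str]:
    client = FakeClient()
    async with client:
        client.events.append("work")
    return client.events


events = asyncio.run(main())
print(events)  # ['initialize', 'work', 'close']
```

One advantage of the context manager form: __aexit__ also runs when the body raises an exception, so the session is closed either way.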
There are some rules for the Paperless-ngx url:
- No scheme applied to it? https is used automatically.
- Does the url explicitly start with http? Okay, be unsafe 😵.
- Only use the base url of your Paperless-ngx instance. Don't add /api to it.
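The url rules above can be sketched as a small normalization step. The helpers normalize_base_url and api_url are hypothetical and only illustrate the rules; PyPaperless handles this internally and its exact behavior may differ.

```python
def normalize_base_url(url: str) -> str:
    """Hypothetical helper mirroring the url rules (not PyPaperless code)."""
    url = url.strip().rstrip("/")
    if not url.startswith(("http://", "https://")):
        # No scheme given: default to https.
        url = f"https://{url}"
    # An explicit http:// (or https://) scheme is kept as-is.
    return url


def api_url(base: str, path: str) -> str:
    # /api is added here, which is why the base url must not already contain it.
    return f"{normalize_base_url(base)}/api/{path.strip('/')}/"


print(normalize_base_url("localhost:8000"))         # https://localhost:8000
print(normalize_base_url("http://localhost:8000"))  # http://localhost:8000
print(api_url("localhost:8000", "documents/1337"))  # https://localhost:8000/api/documents/1337/
```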
You may want to use an existing aiohttp.ClientSession in some cases. Simply pass it to the Paperless object.
import aiohttp
from pypaperless import Paperless
my_session = aiohttp.ClientSession()
# ...
paperless = Paperless("localhost:8000", "your-secret-token", session=my_session)
PyPaperless needs an API token to request and send data from and to Paperless-ngx for authentication purposes. I recommend creating a technical user and assigning a token to it via Django Admin when you bootstrap any project with PyPaperless. If you need to create that token by providing credentials, PyPaperless ships with a little helper for that task.
token = Paperless.generate_api_token(
    "localhost:8000",
    "test_user",
    "not-so-secret-password-anymore",
)
As with Paperless itself, you can provide a custom aiohttp.ClientSession object.
url = "localhost:8000"
my_session = aiohttp.ClientSession()
token = Paperless.generate_api_token(
    url,
    "test_user",
    "not-so-secret-password-anymore",
    session=my_session,
)
Caution
Hardcoding credentials or tokens is never good practice. Use this helper with caution.
The code above executes one http request:
POST https://localhost:8000/api/token/
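That single POST can be reproduced with the standard library to see what goes over the wire. This sketch only builds the request and parses a made-up sample response, without sending anything. It assumes the endpoint accepts form-encoded username and password fields and responds with {"token": "..."}, as Django REST framework's token view does, and that the token is then sent in an "Authorization: Token <token>" header; the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_token_request(base_url: str, username: str, password: str) -> urllib.request.Request:
    """Hypothetical helper: build (but do not send) the POST to /api/token/."""
    body = urllib.parse.urlencode({"username": username, "password": password}).encode()
    return urllib.request.Request(
        f"{base_url}/api/token/",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def parse_token_response(raw: bytes) -> str:
    # Assumed response shape: {"token": "..."}
    return json.loads(raw)["token"]


req = build_token_request("https://localhost:8000", "test_user", "not-so-secret-password-anymore")
print(req.get_method(), req.full_url)  # POST https://localhost:8000/api/token/

token = parse_token_response(b'{"token": "abc123"}')  # made-up sample response
auth_header = {"Authorization": f"Token {token}"}  # assumed header format for later requests
```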
| Resource | Request | Iterate | Create | Update | Delete | Permissions |
|---|---|---|---|---|---|---|
| config | x | | | | | |
| correspondents | x | x | x | x | x | x |
| custom_fields | x | x | x | x | x | |
| document_types | x | x | x | x | x | x |
| documents | x | x | x | x | x | x |
| groups | x | x | | | | |
| logs | n.a. | | | | | |
| mail_accounts | x | x | x | | | |
| mail_rules | x | x | x | | | |
| saved_views | x | x | x | | | |
| share_links | x | x | x | x | x | |
| storage_paths | x | x | x | x | x | x |
| tags | x | x | x | x | x | x |
| tasks | x | x* | | | | |
| users | x | x | | | | |
| workflows | x | x | | | | |
*: Only __aiter__ is supported.
logs are not implemented, as they return plain text. I cannot imagine any use case that would need them.
Retrieving data from Paperless-ngx is really easy, and there are several ways to do it.
You'll need this in most cases, as PyPaperless always returns references to other resource items as their primary keys. You must resolve these references on your own. The returned objects are always PaperlessModels.
document = await paperless.documents(1337)
doc_type = await paperless.document_types(document.document_type) # 23
print(f"Document '{document.title}' is an {doc_type.name}.")
#-> Document 'Order #23: Desktop Table' is an Invoice.
The code above executes two http requests:
GET https://localhost:8000/api/documents/1337/
GET https://localhost:8000/api/document_types/23/
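Because references come back as primary keys, resolving them for many documents can trigger the same request over and over. A small cache avoids that. The sketch below is a hypothetical helper, not part of PyPaperless, with a fake fetch function standing in for calls like await paperless.document_types(pk).

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Optional, Tuple


class ReferenceCache:
    """Hypothetical helper: resolves each primary key once and reuses the result."""

    def __init__(self, fetch: Callable[[int], Awaitable[dict]]) -> None:
        self._fetch = fetch
        self._items: Dict[int, dict] = {}
        self.requests = 0  # how many times the fetch function was awaited

    async def resolve(self, pk: Optional[int]) -> Optional[dict]:
        if pk is None:  # unassigned reference, e.g. a document without a type
            return None
        if pk not in self._items:
            self.requests += 1
            self._items[pk] = await self._fetch(pk)
        return self._items[pk]


async def fake_fetch_document_type(pk: int) -> dict:
    # Stand-in for a real request such as `await paperless.document_types(pk)`.
    return {"id": pk, "name": "Invoice" if pk == 23 else "Other"}


async def main() -> Tuple[List[str], int]:
    cache = ReferenceCache(fake_fetch_document_type)
    type_pks = [23, 23, 7, 23]  # document_type fields of four documents
    names = []
    for pk in type_pks:
        doc_type = await cache.resolve(pk)
        names.append(doc_type["name"])
    return names, cache.requests


names, requests = asyncio.run(main())
print(names, requests)  # ['Invoice', 'Invoice', 'Other', 'Invoice'] 2
```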
Since resource items are requested by their primary key, it could be useful to request a list of all available primary keys.
item_keys = await paperless.documents.all()
#-> [1, 2, 3, ...]
The code above executes one http request:
GET https://localhost:8000/api/documents/?page=1
Iteration enables you to execute mass operations of any kind. Like requesting single items, the iterator always returns PaperlessModels.
count = 0
async for item in paperless.documents:
    if item.correspondent == 1:
        count += 1
print(f"{count} documents are currently stored for correspondent 1.")
#-> 5 documents are currently stored for correspondent 1.
The code above executes many http requests, depending on the number of your stored documents:
GET https://localhost:8000/api/documents/?page=1
GET https://localhost:8000/api/documents/?page=2
...
GET https://localhost:8000/api/documents/?page=19
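Under the hood, iteration is a loop over pages. The sketch below assumes Django REST framework style page responses with count, next, previous and results keys, and follows the next link until it is empty. The fake pages and the iterate_items helper are hypothetical, not PyPaperless internals.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, Dict, List, Optional

BASE = "https://localhost:8000/api/documents/"

# Hypothetical fake server data: two pages in DRF-style pagination format.
PAGES: Dict[str, dict] = {
    BASE + "?page=1": {
        "count": 3, "next": BASE + "?page=2", "previous": None,
        "results": [{"id": 1, "correspondent": 1}, {"id": 2, "correspondent": 2}],
    },
    BASE + "?page=2": {
        "count": 3, "next": None, "previous": BASE + "?page=1",
        "results": [{"id": 3, "correspondent": 1}],
    },
}
requested: List[str] = []  # records every (fake) page request


async def fake_fetch_page(url: str) -> dict:
    requested.append(url)
    return PAGES[url]


async def iterate_items(
    first_url: str, fetch_page: Callable[[str], Awaitable[dict]]
) -> AsyncIterator[dict]:
    """Yield every item, fetching one page at a time by following "next" links."""
    url: Optional[str] = first_url
    while url is not None:
        page = await fetch_page(url)
        for item in page["results"]:
            yield item
        url = page["next"]


async def main() -> int:
    count = 0
    async for item in iterate_items(BASE + "?page=1", fake_fetch_page):
        if item["correspondent"] == 1:
            count += 1
    return count


count = asyncio.run(main())
print(count, requested)
```

This also shows why the unfiltered loop costs one request per page: the filter runs on the client, after every page has been downloaded.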
Instead of iterating over resource items, you may want to iterate over pagination results in some cases. The Page model lets you check whether previous and next pages exist, read item counts, access the raw (.results) or processed data (.items), and so on.
page_iter = aiter(paperless.documents.pages())
page = await anext(page_iter)
#-> page.current_page == 1
page = await anext(page_iter)
#-> page.current_page == 2
The code above executes two http requests:
GET https://localhost:8000/api/documents/?page=1
GET https://localhost:8000/api/documents/?page=2
Requesting many pages can be time-consuming, so a better way to apply the filter from the iteration example above is the reduce context. Technically, it applies query parameters to the http request, which Paperless-ngx interprets as filters.
filters = {
    "correspondent__id": 1,
}
count = 0
async with paperless.documents.reduce(**filters) as filtered:
    async for item in filtered:
        count += 1
# ...
#-> 5 documents are currently stored for correspondent 1.
The code above executes just one http request, and achieves the same:
GET https://localhost:8000/api/documents/?page=1&correspondent__id=1
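The reduce context boils down to adding filter keys to the query string. A minimal sketch, using a hypothetical build_list_url helper rather than PyPaperless code, produces the same url as the request above.

```python
from urllib.parse import urlencode


def build_list_url(base_url: str, resource: str, page: int = 1, **filters: object) -> str:
    """Hypothetical helper: list url with page number plus filter query parameters."""
    query = urlencode({"page": page, **filters})
    return f"{base_url}/api/{resource}/?{query}"


url = build_list_url("https://localhost:8000", "documents", correspondent__id=1)
print(url)  # https://localhost:8000/api/documents/?page=1&correspondent__id=1
```

Unknown keys are appended just the same, so a typo like correspondnet__id=1 still yields a valid url, which Paperless-ngx would then silently ignore.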
Tip
The reduce context works with all previously mentioned methods: __aiter__, all and pages.
Note
Many filters are available, but PyPaperless doesn't provide a complete list yet. I am working on that. For now, open the Django REST framework http endpoint of Paperless-ngx in your browser and play around with the Filter button on each resource.
Be careful: Paperless-ngx silently ignores filters which don't exist and treats them as no filter instead of raising errors.
PyPaperless offers creation, update and deletion of resource items. These features are enabled where they make sense (at least to me), although Paperless-ngx itself offers full CRUD functionality. Please check the resource features table at the top of this README. If you need CRUD for another resource, please let me know by opening an issue with your specific use case.
The process of creating items consists of three parts: retrieving a new draft instance from PyPaperless, applying data to it, and calling save. You can apply data to the draft via kwargs, by assigning it to the draft instance, or both. You may want to request the newly created item by the returned primary key and compare it against the data from the draft. If not, you can safely discard the draft instance after saving, as it cannot be saved twice (database constraint violation).
from pypaperless.models.common import MatchingAlgorithmType
draft = paperless.correspondents.draft(
name="New correspondent",
is_insensitive=True, # this works
)
draft.matching_algorithm = MatchingAlgorithmType.ANY
draft.match = 'any word "or small strings" match'
draft.is_insensitive = False # and this, too!
new_pk = await draft.save()
#-> 42
The code above executes one http request:
POST https://localhost:8000/api/correspondents/
When it comes to updating data, you can choose between the http methods PATCH (only changed fields) and PUT (all fields). Usually, updating only the changed fields does the trick. You can continue working with the class instance after updating, as the update method applies the new data from Paperless-ngx to it.
item = await paperless.documents(23)
item.title = "New document title"
success = await item.update()
success = await item.update(only_changed=False) # put all fields
#-> True
The code above executes two http requests:
PATCH https://localhost:8000/api/documents/23/
PUT https://localhost:8000/api/documents/23/
Note
The payloads of the two requests are completely different, and I recommend using PATCH whenever possible. It is cleaner and much safer, as it only updates fields which have actually changed.
PATCH
{
  "title": "New document title"
}
PUT
{
  "title": "New document title",
  "content": "...",
  "correspondent": "...",
  "document_type": "...",
  "storage_path": "...",
  "...": "..."
  // and every other field
}
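The difference between PATCH and PUT comes down to a field-level diff. The sketch below compares a snapshot taken at fetch time with the current values and keeps only what changed. The dicts and the changed_fields helper are hypothetical and do not show how PyPaperless tracks changes internally.

```python
import copy
from typing import Any, Dict


def changed_fields(snapshot: Dict[str, Any], current: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: keep fields that are new or differ from the snapshot."""
    return {k: v for k, v in current.items() if k not in snapshot or snapshot[k] != v}


fetched = {"title": "Order #23: Desktop Table", "content": "...", "correspondent": 1, "tags": [4, 5]}
snapshot = copy.deepcopy(fetched)  # deep copy, so in-place list edits are detected

fetched["title"] = "New document title"
fetched["tags"].append(6)

patch_payload = changed_fields(snapshot, fetched)  # PATCH: only the changed fields
put_payload = fetched  # PUT: every field, changed or not

print(patch_payload)  # {'title': 'New document title', 'tags': [4, 5, 6]}
```

The snapshot uses copy.deepcopy because a shallow copy would share the tags list, hiding the in-place append from the comparison.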
Last but not least, it is also possible to remove data from Paperless-ngx.
Caution
This will permanently delete data from your database. This is the point of no return. Be careful.
item = await paperless.documents(23)
success = await item.delete()
#-> True
The code above executes one http request:
DELETE https://localhost:8000/api/documents/23/
Some Paperless-ngx resources provide more features than others, especially Documents.
You can access the binary data with the following methods. They all return a DownloadedDocument instance, which holds the binary data and provides some more useful attributes, like content type, disposition type and filename.
Example 1: Provide a primary key
download = await paperless.documents.download(23)
preview = await paperless.documents.preview(23)
thumbnail = await paperless.documents.thumbnail(23)
Example 2: Already fetched item
document = await paperless.documents(23)
download = await document.get_download()
preview = await document.get_preview()
thumbnail = await document.get_thumbnail()
Both examples above execute all of these http requests:
GET https://localhost:8000/api/documents/23/download/
GET https://localhost:8000/api/documents/23/preview/
GET https://localhost:8000/api/documents/23/thumb/
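The content type, disposition type and filename of a DownloadedDocument most likely come from the Content-Type and Content-Disposition response headers. The sketch below parses such headers with the standard library's email.message.Message; the header values are made-up examples and describe_download is a hypothetical helper, not PyPaperless code.

```python
from email.message import Message
from typing import Dict, Optional, Tuple


def describe_download(headers: Dict[str, str]) -> Tuple[str, Optional[str], Optional[str]]:
    """Hypothetical helper: content type, disposition type and filename from headers."""
    msg = Message()
    for name, value in headers.items():
        msg[name] = value
    return msg.get_content_type(), msg.get_content_disposition(), msg.get_filename()


info = describe_download({
    "Content-Type": "application/pdf",
    "Content-Disposition": 'attachment; filename="order-23.pdf"',
})
print(info)  # ('application/pdf', 'attachment', 'order-23.pdf')
```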
Paperless-ngx stores some metadata about your documents. If you wish to access that, there are again two possibilities.
Example 1: Provide a primary key
metadata = await paperless.documents.metadata(23)
Example 2: Already fetched item
document = await paperless.documents(23)
metadata = await document.get_metadata()
Both examples above execute one http request:
GET https://localhost:8000/api/documents/23/metadata/
Documents can be annotated with so-called notes. Paperless-ngx supports requesting, creating and deleting those notes, and PyPaperless supports them, too.
Getting notes
Document notes are always returned as list[DocumentNote] after requesting them.
# by primary key
list_of_notes = await paperless.documents.notes(23)
# by already fetched item
document = await paperless.documents(23)
list_of_notes = await document.notes()
The code above executes one http request:
GET https://localhost:8000/api/documents/23/notes/
Creating notes
You can add new notes. Updating existing notes isn't possible due to Paperless-ngx API limitations.
# by primary key
draft = paperless.documents.notes.draft(23)
# by already fetched item
document = await paperless.documents(23)
draft = document.notes.draft()
draft.note = "Lorem ipsum"
new_note_pk, document_pk = await draft.save()
#-> 42, 23
The code above executes one http request:
POST https://localhost:8000/api/documents/23/notes/
Deleting notes
Sometimes it may be necessary to delete document notes.
Caution
This will permanently delete data from your database. This is the point of no return. Be careful.
a_note = list_of_notes.pop() # document note with example pk 42
success = await a_note.delete()
#-> True
The code above executes one http request:
DELETE https://localhost:8000/api/documents/23/notes/?id=42
If you want to search for documents, Paperless-ngx offers two possibilities. PyPaperless implements an iterable shortcut for each of them.
Search query
Search query documentation: https://docs.paperless-ngx.com/usage/#basic-usage_searching
async for document in paperless.documents.search("type:invoice"):
    # do something
    ...
The code above executes many http requests, depending on the number of matched documents:
GET https://localhost:8000/api/documents/?page=1&query=type%3Ainvoice
GET https://localhost:8000/api/documents/?page=2&query=type%3Ainvoice
...
GET https://localhost:8000/api/documents/?page=19&query=type%3Ainvoice
More like
Search for documents similar to the document with the given primary key.
async for document in paperless.documents.more_like(23):
    # do something
    ...
The code above executes many http requests, depending on the number of matched documents:
GET https://localhost:8000/api/documents/?page=1&more_like_id=23
GET https://localhost:8000/api/documents/?page=2&more_like_id=23
...
GET https://localhost:8000/api/documents/?page=19&more_like_id=23
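Both shortcuts pass their argument as a query parameter. The sketch below shows the encoding with urllib.parse.urlencode: the colon in type:invoice becomes %3A, matching the request urls above. It only builds urls and does not call PyPaperless.

```python
from urllib.parse import urlencode

BASE = "https://localhost:8000/api/documents/"

# Full-text search: the colon in the query is percent-encoded as %3A.
search_url = BASE + "?" + urlencode({"page": 1, "query": "type:invoice"})
# Similar documents: the primary key goes into the more_like_id parameter.
more_like_url = BASE + "?" + urlencode({"page": 1, "more_like_id": 23})

print(search_url)     # https://localhost:8000/api/documents/?page=1&query=type%3Ainvoice
print(more_like_url)  # https://localhost:8000/api/documents/?page=1&more_like_id=23
```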
Search results
While iterating over search results, Document models are extended with another field: search_hit. Let's take a closer look at it.
async for document in paperless.documents.more_like(23):
    print(f"{document.id} matched query by {document.search_hit.score}.")
#-> 42 matched query by 13.37.
To make life easier, you can check whether a Document model has been initialized from a search or not:
document = await paperless.documents(23)  # no search
if document.has_search_hit:
    print("result of a search query")
else:
    print("not a result from a query")
#-> not a result from a query
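The has_search_hit check can be pictured as an optional field plus a convenience property. The dataclasses below are hypothetical stand-ins, not the real PyPaperless models, and only carry the score field used above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchHit:
    """Hypothetical stand-in for the search_hit field."""
    score: float


@dataclass
class Document:
    """Hypothetical stand-in for a document model, not the PyPaperless class."""
    id: int
    title: str
    search_hit: Optional[SearchHit] = None  # only set for search results

    @property
    def has_search_hit(self) -> bool:
        return self.search_hit is not None


plain = Document(id=23, title="Order #23: Desktop Table")
found = Document(id=42, title="Invoice", search_hit=SearchHit(score=13.37))
print(plain.has_search_hit, found.has_search_hit)  # False True
```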
One of the biggest tasks of Paperless-ngx is classification: it is the workflow of assigning classifiers to your documents, like correspondents or tags. Paperless does that by auto-assigning or suggesting them to you. These suggestions can be accessed by PyPaperless, as well.
Example 1: Provide a primary key
suggestions = await paperless.documents.suggestions(23)
Example 2: Already fetched item
document = await paperless.documents(23)
suggestions = await document.get_suggestions()
Both examples above execute one http request:
GET https://localhost:8000/api/documents/23/suggestions/
The returned DocumentSuggestions instance stores a list of suggested resource items for each classifier: correspondents, tags, document_types, storage_paths and dates.
get_next_asn simply returns the next available archive serial number as an int.
next_asn = await paperless.documents.get_next_asn()
#-> 1337
The code above executes one http request:
GET https://localhost:8000/api/documents/next_asn/
Some Paperless-ngx resources support getting and setting object-level permissions. By default, Paperless-ngx delivers two permission fields when data is requested: owner and user_can_change. To get the full permissions table, the API has to be called with an explicit toggle parameter.
If you want to access the permissions table, you have to enable it separately for each resource.
paperless.documents.request_permissions = True
document = await paperless.documents(23)
if document.has_permissions:
    print(document.permissions)
    #-> PermissionTableType(
    #     view=PermissionSetType(users=[...], groups=[...]),
    #     change=PermissionSetType(...),
    # )
    for viewing_user in document.permissions.view.users:
        # do something with the user
        ...
Requesting permissions stays enabled until it gets disabled again.
paperless.documents.request_permissions = False
document = await paperless.documents(23)
print(document.has_permissions)
#-> False
When creating new resource items, you can apply permissions by setting a PermissionTableType to the optional set_permissions field.
Note
Both PermissionTableType and PermissionSetType automatically initialize empty lists for their fields unless you provide a value.
from pypaperless.models.common import PermissionSetType, PermissionTableType
draft = paperless.correspondents.draft()
draft.name = "Correspondent with perms"
draft.set_permissions = PermissionTableType(
view=PermissionSetType(
users=[23],
),
)
# ...
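The note about empty default lists maps naturally onto dataclasses with default_factory. The sketch below uses hypothetical stand-ins for PermissionTableType and PermissionSetType and assumes the set_permissions payload is sent as nested view and change objects with users and groups lists, mirroring the PermissionTableType repr shown earlier in this section.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class PermissionSet:
    """Hypothetical stand-in for PermissionSetType: empty lists unless provided."""
    users: List[int] = field(default_factory=list)
    groups: List[int] = field(default_factory=list)


@dataclass
class PermissionTable:
    """Hypothetical stand-in for PermissionTableType."""
    view: PermissionSet = field(default_factory=PermissionSet)
    change: PermissionSet = field(default_factory=PermissionSet)


table = PermissionTable(view=PermissionSet(users=[23]))
# asdict() converts nested dataclasses recursively into plain dicts.
payload = {"name": "Correspondent with perms", "set_permissions": asdict(table)}
print(json.dumps(payload, indent=2))
```

default_factory=list gives every instance its own list; dataclasses reject a plain [] default precisely to prevent instances from sharing one mutable list.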
If you want to change the permissions of a resource item, you have to enable requesting them before fetching it. The permissions field then becomes available, ready for modification.
paperless.documents.request_permissions = True
document = await paperless.documents(23)
if document.has_permissions:
    document.permissions.view.users.append(5)
    await document.update()
As of release 2.6.0, Paperless-ngx supports displaying information about the current system health.
info = await paperless.status()
The code above executes one http request:
GET https://localhost:8000/api/status/