Authentication and read endpoints #2

lindsay-stevens · 2022-08-02T10:12:26Z

Authentication and config implemented and tested
Replicate the original gist
Implement the first few steps from the spec
Maybe some of these demo endpoints
~~[ ] Maybe some API path building fanciness like Birdy~~

Why is this the best possible solution? Were any other approaches considered?

Simple structure for reading use cases. The Client is a context manager which does login on entry. The session token is cached in a file so multiple client instances could share the same token. Service objects for the Central API endpoints are attached to the Client object. Service classes retrieve data and (generally) return it Python objects with JSON data mapped into them.

Avoided fanciness in the structure at this stage: 1) in the interests of time, 2) the initial target audience is programmers so they probably don't need much help, 3) simple stuff is easier to maintain.

About the data model classes

Initially I looked at dataclasses to take care of the boilerplate around the domain data types (currently Project, Form, Submission). Although dataclasses has the benefit of being built-in, it became awkward when considering things such as:

converting datetimes from strings
validating optional/mandatory fields
serialisation e.g. to write back into files or something else
adding a field-that's-not-a-field for the "fluent" interface (see below, this was a problem for all options)

I then looked at attrs, which was OK but was a bit verbose when it came to defining validations and data conversions, because it ignores type annotations so you have to write it out again via the fields API. Serialisation was not really included, except for asdict and astuple, where again defining how to convert types back to JSON is an exercise for the user.

I then looked at marshmallow, which was better for serialisation, but ultimately it's a library about validating against a schema, i.e. the class defined for the data schema isn't used as an instance containing the actual data. So you have to write another dataclass that looks the same, to put the data in after it's been validated, or use a third-party library to generate such a class.

Lastly I looked at pydantic, which uses type annotations to work out what to validate (type, optional or not) and how to convert data for de/serialisation e.g. string to date incoming, date to string outgoing. It can also generate a json schema for the class, but at the moment that doesn't work with the manager field (it probably can, somehow).

About the fluent interface

The approach is to put an attribute m on the data model class, which is a model manager (like in django). The manager remembers the project_id and/or form_id, and is a namespace for services with those defaults set, thereby making it easier to interact with related endpoints further down the object hierarchy.

For example, given a Project instance, we may want to manipulate related Forms (or other Projects), or given a Form we may want manipulate related Submissions (or other Forms). So a user can whip through the hierarchy like this:

with Client() as client:
    data = client.projects.read(8).m.forms.read("birds").m.submissions.read_all_table()

What are the regression risks?

Hopefully none, it's brand new!

Does this change require updates to documentation?

Yes and the Readme is updated.

Before submitting this PR, please make sure you have:

included test cases for core behavior and edge cases in tests
run nosetests and verified all tests pass
run black pyodk tests to format code
verified that any code or assets from external sources are properly credited in comments

- some wording changes, mainly to push some issues to the forum

- organise now since there'll probably be a few modules

lindsay-stevens · 2022-08-26T12:12:32Z

Hi @lognaturel , this is about ready for a closer look.

~~There are a few TODO's mentioned above, but are not critical for using this to put together a~~ demo: perhaps something like a pandas chart of submissions count by month? Or a table with top 5 most common responses for a form question?

lognaturel

SO exciting to see all this come together. Liking many of the decisions you took here. I have a few questions inline. I'm also curious to hear more about why you chose to start with the designed calls rather than the dynamic query builder.

@samwel2000, @Thaliehln your thoughts on the initial shape of this are also very much appreciated. 🙏

README.md

lognaturel · 2022-08-29T23:22:33Z

README.md

+Available endpoints on `Client`:
+
+- Projects
+  - read


How did you pick this read/read_all convention (e.g. is it common in similar Python libs)? How do you imagine it extending to creation? How do you see these calls interacting with the dynamic query builder we've discussed?

I'm not sure I would have separated Submissions and OData in this custom part of the API (vs. the dynamic query builder). I don't think "OData" is particularly meaningful to users -- it's just a means to get a json representation of their data. Do you think that separation is important? The OData call could be something like client.submissions.get_json

I'm asking mostly to have a record of the design decisions so we can continue being consistent!

How did you pick this read/read_all convention

It's following CRUD which is a plain and protocol-agnostic way to refer to manipulating objects. Any of the below may be IO targets for pyodk (e.g. cache or helpers to SQL [sqlite or postgres] or files).

CRUD HTTP SQL Files

create post insert write

read get select read

update put/patch update write/append

delete delete delete delete

The all suffix is in preference to 1) a 5th verb list, or 2) a suffix many which implies specifying more than one ID to read (but it accepts no parameters and returns all IDs). Specific formats get a suffix too e.g. read_xml, read_all_table. I've started mapping this out in the file at docs/api_index.csv.

separated Submissions and OData in this custom part of the API

I agree the name OData isn't very obvious, but I used it to follow the central docs. The exercise of making the API index doc helped clarify that these fit well under forms and submissions so I've moved them and deleted the odata endpoint.

I'd like to propose a more freeform and descriptive approach to naming. With this part of the library's API we're not aiming for 1:1 parity with Central's API or full coverage so I think we should aim for as descriptive of names as we can and to also carefully design what parameters can be passed in.

The guiding question I'd like to use for naming is something like

How do people talk about the action that’s being performed? How do ODK docs and Central frontend talk about it?

E.g. we have the concept of “submission edits” so use client.submissions.edit rather than update. In the case of these reads, I think someone would say something like "ok, now I need to get this form's metadata". That leads me to client.forms.get_metadata(projectid) and client.forms.get_metadata(projectid, formid).

This we mean we value expressiveness and consistency with ODK concepts over consistency between method names.

Fine by me to get this merged with the current names and then to focus on naming in another PR!

I think someone would say something like "ok, now I need to get this form's metadata".

Right, but before that I think they'd say "let me list all of the forms in this project". So maybe it's client.forms.list(projectid) and client.forms.get_metadata(projectid, formid, extended=False).

I know I just said I'd value consistency between method names less than other things but it would be nice if this were consistent across resources. I think the following work equally well:

client.submissions.list(projectid, formid, extended=False), client.submissions.get_metadata(projectid, formid, extended=False)

client.app_users.list(projectid, extended=False) (no way to get metadata for a single one)

And wonderfully enough, that's exactly what the API docs say: "listing all xyz" and "getting xyz metadata".

README.md

samwel2000 · 2022-08-30T05:05:44Z

@lindsay-stevens this is already exciting, so far I like the decision of the design (The client is a context manager who does login and the Central API endpoints are attached to it), I just have a few comments.

First, regarding the read/read_all convention, I normally see in the python libs is the get convention or perhaps we can reserve this for the dynamic query builder as discussed here
For the designed calls just in case I missed it out I think it should also support reading submissions from encrypted projects

- "odata" not particularly meaningful so move endpoints to form (service, metadata) and submission (tables). - test_auth tests were manipulating default files so fixed to use temp files instead.

lindsay-stevens · 2022-08-30T18:03:45Z

Thanks for the review @lognaturel and comments @samwel2000 !

I'm also curious to hear more about why you chose to start with the designed calls rather than the dynamic query builder.

Main reasons: 1) the style of services and entities added here are building blocks for it, 2) it was lower down the priority list 😄, 3) limiting scope for this PR, 4) I'm not sure about the exact implementation strategy for it yet.

pyodk/endpoints/projects.py

- fluent style for chaining calls through the object hierarchy - remove "Entity" suffixes to avoid confusion with upcoming ODK feature with the same name, or weirdness like an "EntityEntity" - add pydantic for decent validation and de/serialization

- rename methods: read -> get etc. following feedback (naming). - subclass requests.Session to make error handling cleaner - build URL from base_url to make get/post/etc simpler - add client open/close following feedback (indentation).

- following feedback, it is not a desired approach.

- default is otherwise requests version

lognaturel · 2022-09-15T21:59:03Z

Thanks, @lindsay-stevens!

Here's a quick summary of some of the behind-the-scenes conversations @lindsay-stevens an I had:

For the dynamic/self-serve portion of this library intended for folks who are comfortable-ish with using the API docs directly, we're going with exposing the HTTP methods directly on the client. This is much simpler to implement than a fluent API and is also simpler to use since we can copy and paste from the API docs! I'll follow up with some tutorial-style instructions on that.
For the "designed" portion of this library, we are going to be very intentional about exposing the actions that users want to perform. Those may or may not map to a single Central API call.
We'll use documentation and "how people talk about things" to guide naming (more here)
We'll expose methods from service objects on the client roughly mapping to Central resources (currently projects, forms, submissions).
In the case of nested resources (e.g. App Users and Forms are in Projects, Submissions are in Forms), we won't offer a way to chain calls. @lindsay-stevens did some good exploration of the concept at d342233 I feel that the additional complexity is not worth it. I'd also rather not have too many ways to do the same thing because that makes documentation and maintenance harder.

I'm going to merge this now and follow up with a couple of small proposed changes.

Let's try to get a first release out next week. @samwel2000, please chime in here or file issues if you have any suggestions to make.

lindsay-stevens added 8 commits August 2, 2022 13:47

chg: project files to follow getodk/collect

83650c5

- some wording changes, mainly to push some issues to the forum

add: initial client/auth/config modules

4d426fd

add: auth tests, project module

7179564

chg: move endpoints into subfolder

75a020f

- organise now since there'll probably be a few modules

add: form endpoint and tests

df99fd7

add: submissions endpoint and tests

4c077a6

add: odata endpoint, update readme

f955cc5

fix: readme typo

d67fe3e

lindsay-stevens requested a review from lognaturel August 26, 2022 12:12

lindsay-stevens added 3 commits August 29, 2022 20:08

chg: validate config on read, log errors

7ff3eb9

chg: convert timestamp strings to datetimes in entity objects

caeafa3

add: map odata table params to args, odata metadata xml

de2f921

lognaturel reviewed Aug 29, 2022

View reviewed changes

lognaturel changed the title ~~MVP~~ Authentication and read endpoints Aug 30, 2022

lindsay-stevens added 3 commits August 31, 2022 02:48

add: set the config / cache paths via client init

1e5fecd

chg: move odata endpoints to forms/submissions, fix test_auth

730f5e8

- "odata" not particularly meaningful so move endpoints to form (service, metadata) and submission (tables). - test_auth tests were manipulating default files so fixed to use temp files instead.

add: api_index doc

719bd91

lindsay-stevens requested a review from lognaturel August 30, 2022 17:45

lognaturel reviewed Aug 30, 2022

View reviewed changes

pyodk/endpoints/projects.py Outdated Show resolved Hide resolved

lindsay-stevens marked this pull request as ready for review September 2, 2022 20:07

lindsay-stevens requested a review from lognaturel September 2, 2022 20:08

lindsay-stevens added 3 commits September 16, 2022 05:23

chg: remove fluent api / model managers

0daa9a2

- following feedback, it is not a desired approach.

add: user-agent for pyodk

6559ada

- default is otherwise requests version

lognaturel approved these changes Sep 15, 2022

View reviewed changes

lognaturel merged commit 7906600 into getodk:master Sep 15, 2022

lindsay-stevens deleted the mvp branch September 16, 2022 10:30

lognaturel mentioned this pull request Sep 20, 2022

README: add goals and how we think about design #10

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Authentication and read endpoints #2

Authentication and read endpoints #2

lindsay-stevens commented Aug 2, 2022 •

edited

Loading

lindsay-stevens commented Aug 26, 2022 •

edited

Loading

lognaturel left a comment •

edited

Loading

lognaturel Aug 29, 2022

lindsay-stevens Aug 30, 2022

lognaturel Sep 14, 2022

lognaturel Sep 14, 2022

lognaturel Sep 14, 2022

samwel2000 commented Aug 30, 2022

lindsay-stevens commented Aug 30, 2022

lognaturel commented Sep 15, 2022

CRUD	HTTP	SQL	Files
create	post	insert	write
read	get	select	read
update	put/patch	update	write/append
delete	delete	delete	delete

Authentication and read endpoints #2

Authentication and read endpoints #2

Conversation

lindsay-stevens commented Aug 2, 2022 • edited Loading

Why is this the best possible solution? Were any other approaches considered?

What are the regression risks?

Does this change require updates to documentation?

Before submitting this PR, please make sure you have:

lindsay-stevens commented Aug 26, 2022 • edited Loading

lognaturel left a comment • edited Loading

Choose a reason for hiding this comment

lognaturel Aug 29, 2022

Choose a reason for hiding this comment

lindsay-stevens Aug 30, 2022

Choose a reason for hiding this comment

lognaturel Sep 14, 2022

Choose a reason for hiding this comment

lognaturel Sep 14, 2022

Choose a reason for hiding this comment

lognaturel Sep 14, 2022

Choose a reason for hiding this comment

samwel2000 commented Aug 30, 2022

lindsay-stevens commented Aug 30, 2022

lognaturel commented Sep 15, 2022

lindsay-stevens commented Aug 2, 2022 •

edited

Loading

lindsay-stevens commented Aug 26, 2022 •

edited

Loading

lognaturel left a comment •

edited

Loading