Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Authentication and read endpoints #2

Merged
merged 18 commits into from
Sep 15, 2022
Merged

Conversation

lindsay-stevens
Copy link
Contributor

@lindsay-stevens lindsay-stevens commented Aug 2, 2022

Why is this the best possible solution? Were any other approaches considered?

Simple structure for reading use cases. The Client is a context manager which does login on entry. The session token is cached in a file so multiple client instances could share the same token. Service objects for the Central API endpoints are attached to the Client object. Service classes retrieve data and (generally) return it Python objects with JSON data mapped into them.

Avoided fanciness in the structure at this stage: 1) in the interests of time, 2) the initial target audience is programmers so they probably don't need much help, 3) simple stuff is easier to maintain.

About the data model classes

Initially I looked at dataclasses to take care of the boilerplate around the domain data types (currently Project, Form, Submission). Although dataclasses has the benefit of being built-in, it became awkward when considering things such as:

  • converting datetimes from strings
  • validating optional/mandatory fields
  • serialisation e.g. to write back into files or something else
  • adding a field-that's-not-a-field for the "fluent" interface (see below, this was a problem for all options)

I then looked at attrs, which was OK but was a bit verbose when it came to defining validations and data conversions, because it ignores type annotations so you have to write it out again via the fields API. Serialisation was not really included, except for asdict and astuple, where again defining how to convert types back to JSON is an exercise for the user.

I then looked at marshmallow, which was better for serialisation, but ultimately it's a library about validating against a schema, i.e. the class defined for the data schema isn't used as an instance containing the actual data. So you have to write another dataclass that looks the same, to put the data in after it's been validated, or use a third-party library to generate such a class.

Lastly I looked at pydantic, which uses type annotations to work out what to validate (type, optional or not) and how to convert data for de/serialisation e.g. string to date incoming, date to string outgoing. It can also generate a json schema for the class, but at the moment that doesn't work with the manager field (it probably can, somehow).

About the fluent interface

The approach is to put an attribute m on the data model class, which is a model manager (like in django). The manager remembers the project_id and/or form_id, and is a namespace for services with those defaults set, thereby making it easier to interact with related endpoints further down the object hierarchy.

For example, given a Project instance, we may want to manipulate related Forms (or other Projects), or given a Form we may want manipulate related Submissions (or other Forms). So a user can whip through the hierarchy like this:

with Client() as client:
    data = client.projects.read(8).m.forms.read("birds").m.submissions.read_all_table()

What are the regression risks?

Hopefully none, it's brand new!

Does this change require updates to documentation?

Yes and the Readme is updated.

Before submitting this PR, please make sure you have:

  • included test cases for core behavior and edge cases in tests
  • run nosetests and verified all tests pass
  • run black pyodk tests to format code
  • verified that any code or assets from external sources are properly credited in comments

@lindsay-stevens
Copy link
Contributor Author

lindsay-stevens commented Aug 26, 2022

Hi @lognaturel , this is about ready for a closer look.

There are a few TODO's mentioned above, but are not critical for using this to put together a demo: perhaps something like a pandas chart of submissions count by month? Or a table with top 5 most common responses for a form question?

Copy link
Member

@lognaturel lognaturel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SO exciting to see all this come together. Liking many of the decisions you took here. I have a few questions inline. I'm also curious to hear more about why you chose to start with the designed calls rather than the dynamic query builder.

@samwel2000, @Thaliehln your thoughts on the initial shape of this are also very much appreciated. 🙏

README.md Outdated
Available endpoints on `Client`:

- Projects
- read
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you pick this read/read_all convention (e.g. is it common in similar Python libs)? How do you imagine it extending to creation? How do you see these calls interacting with the dynamic query builder we've discussed?

I'm not sure I would have separated Submissions and OData in this custom part of the API (vs. the dynamic query builder). I don't think "OData" is particularly meaningful to users -- it's just a means to get a json representation of their data. Do you think that separation is important? The OData call could be something like client.submissions.get_json

I'm asking mostly to have a record of the design decisions so we can continue being consistent!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How did you pick this read/read_all convention

It's following CRUD which is a plain and protocol-agnostic way to refer to manipulating objects. Any of the below may be IO targets for pyodk (e.g. cache or helpers to SQL [sqlite or postgres] or files).

CRUD HTTP SQL Files
create post insert write
read get select read
update put/patch update write/append
delete delete delete delete

The all suffix is in preference to 1) a 5th verb list, or 2) a suffix many which implies specifying more than one ID to read (but it accepts no parameters and returns all IDs). Specific formats get a suffix too e.g. read_xml, read_all_table. I've started mapping this out in the file at docs/api_index.csv.

separated Submissions and OData in this custom part of the API

I agree the name OData isn't very obvious, but I used it to follow the central docs. The exercise of making the API index doc helped clarify that these fit well under forms and submissions so I've moved them and deleted the odata endpoint.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to propose a more freeform and descriptive approach to naming. With this part of the library's API we're not aiming for 1:1 parity with Central's API or full coverage so I think we should aim for as descriptive of names as we can and to also carefully design what parameters can be passed in.

The guiding question I'd like to use for naming is something like

How do people talk about the action that’s being performed? How do ODK docs and Central frontend talk about it?

E.g. we have the concept of “submission edits” so use client.submissions.edit rather than update. In the case of these reads, I think someone would say something like "ok, now I need to get this form's metadata". That leads me to client.forms.get_metadata(projectid) and client.forms.get_metadata(projectid, formid).

This we mean we value expressiveness and consistency with ODK concepts over consistency between method names.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine by me to get this merged with the current names and then to focus on naming in another PR!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think someone would say something like "ok, now I need to get this form's metadata".

Right, but before that I think they'd say "let me list all of the forms in this project". So maybe it's client.forms.list(projectid) and client.forms.get_metadata(projectid, formid, extended=False).

I know I just said I'd value consistency between method names less than other things but it would be nice if this were consistent across resources. I think the following work equally well:

  • client.submissions.list(projectid, formid, extended=False), client.submissions.get_metadata(projectid, formid, extended=False)
  • client.app_users.list(projectid, extended=False) (no way to get metadata for a single one)

And wonderfully enough, that's exactly what the API docs say: "listing all xyz" and "getting xyz metadata".

@samwel2000
Copy link

@lindsay-stevens this is already exciting, so far I like the decision of the design (The client is a context manager who does login and the Central API endpoints are attached to it), I just have a few comments.

  • First, regarding the read/read_all convention, I normally see in the python libs is the get convention or perhaps we can reserve this for the dynamic query builder as discussed here

  • For the designed calls just in case I missed it out I think it should also support reading submissions from encrypted projects

@lognaturel lognaturel changed the title MVP Authentication and read endpoints Aug 30, 2022
- "odata" not particularly meaningful so move endpoints to
  form (service, metadata) and submission (tables).
- test_auth tests were manipulating default files so fixed to
  use temp files instead.
@lindsay-stevens
Copy link
Contributor Author

Thanks for the review @lognaturel and comments @samwel2000 !

I'm also curious to hear more about why you chose to start with the designed calls rather than the dynamic query builder.

Main reasons: 1) the style of services and entities added here are building blocks for it, 2) it was lower down the priority list 😄, 3) limiting scope for this PR, 4) I'm not sure about the exact implementation strategy for it yet.

- fluent style for chaining calls through the object hierarchy
- remove "Entity" suffixes to avoid confusion with upcoming ODK feature
  with the same name, or weirdness like an "EntityEntity"
- add pydantic for decent validation and de/serialization
@lindsay-stevens lindsay-stevens marked this pull request as ready for review September 2, 2022 20:07
- rename methods: read -> get etc. following feedback (naming).
- subclass requests.Session to make error handling cleaner
- build URL from base_url to make get/post/etc simpler
- add client open/close following feedback (indentation).
- following feedback, it is not a desired approach.
- default is otherwise requests version
@lognaturel
Copy link
Member

Thanks, @lindsay-stevens!

Here's a quick summary of some of the behind-the-scenes conversations @lindsay-stevens an I had:

  • For the dynamic/self-serve portion of this library intended for folks who are comfortable-ish with using the API docs directly, we're going with exposing the HTTP methods directly on the client. This is much simpler to implement than a fluent API and is also simpler to use since we can copy and paste from the API docs! I'll follow up with some tutorial-style instructions on that.
  • For the "designed" portion of this library, we are going to be very intentional about exposing the actions that users want to perform. Those may or may not map to a single Central API call.
  • We'll use documentation and "how people talk about things" to guide naming (more here)
  • We'll expose methods from service objects on the client roughly mapping to Central resources (currently projects, forms, submissions).
  • In the case of nested resources (e.g. App Users and Forms are in Projects, Submissions are in Forms), we won't offer a way to chain calls. @lindsay-stevens did some good exploration of the concept at d342233 I feel that the additional complexity is not worth it. I'd also rather not have too many ways to do the same thing because that makes documentation and maintenance harder.

I'm going to merge this now and follow up with a couple of small proposed changes.

Let's try to get a first release out next week. @samwel2000, please chime in here or file issues if you have any suggestions to make.

# for free to join this conversation on GitHub. Already have an account? # to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants