Skip to content
New issue

Have a question about this project? # for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “#”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? # to your account

Add ?_extra= mechanism for requesting extra properties in JSON #262

Open
Tracked by #2333
simonw opened this issue May 16, 2018 · 27 comments
Open
Tracked by #2333

Add ?_extra= mechanism for requesting extra properties in JSON #262

simonw opened this issue May 16, 2018 · 27 comments

Comments

@simonw
Copy link
Owner

simonw commented May 16, 2018

Datasette views currently work by creating a set of data that should be returned as JSON, then defining an additional, optional template_data() function which is called if the view is being rendered as HTML.

This template_data() function calculates extra template context variables which are necessary for the HTML view but should not be included in the JSON.

Example of how that is used today:

async def template_data():
display_columns, display_rows = await self.display_columns_and_rows(
name,
table,
description,
rows,
link_column=False,
expand_foreign_keys=True,
)
for column in display_columns:
column["sortable"] = False
return {
"database_hash": hash,
"foreign_key_tables": await self.foreign_key_tables(
name, table, pk_values
),
"display_columns": display_columns,
"display_rows": display_rows,
"custom_rows_and_columns_templates": [
"_rows_and_columns-{}-{}.html".format(
to_css_class(name), to_css_class(table)
),
"_rows_and_columns-row-{}-{}.html".format(
to_css_class(name), to_css_class(table)
),
"_rows_and_columns.html",
],
"metadata": self.ds.metadata.get("databases", {}).get(name, {}).get(
"tables", {}
).get(
table, {}
),
}

With features like Facets in #255 I'm beginning to want to move more items into the template_data() - in the case of facets it's the suggested_facets array. This saves that feature from being calculated (involving several SQL queries) for the JSON case where it is unlikely to be used.

But... as an API user, I want to still optionally be able to access that information.

Solution: Add a ?_extra=suggested_facets&_extra=table_metadata argument which can be used to optionally request additional blocks to be added to the JSON API.

Then redefine as many of the current template_data() features as extra arguments instead, and teach Datasette to return certain extras by default when rendering templates.

This could allow the JSON representation to be slimmed down further (removing e.g. the table_definition and view_definition keys) while still making that information available to API users who need it.

@simonw
Copy link
Owner Author

simonw commented May 17, 2018

Idea: ?_extra=sqllog could output a lot of every individual SQL statement that was executed in order to generate the page - useful for seeing how foreign key expansion and faceting actually works.

@simonw simonw added the feature label Jun 21, 2018
@simonw simonw added this to the Next release milestone Jul 24, 2018
@simonw simonw removed this from the Next release milestone May 13, 2019
@simonw simonw changed the title Add ?_extras= mechanism for requesting extra properties in JSON Add ?_extra= mechanism for requesting extra properties in JSON Sep 12, 2020
@simonw
Copy link
Owner Author

simonw commented Sep 12, 2020

Idea: ?_extra=sqllog could output a lot of every individual SQL statement that was executed in order to generate the page - useful for seeing how foreign key expansion and faceting actually works.

I built a version of that a while ago as the ?_trace=1 argument.

@simonw
Copy link
Owner Author

simonw commented Sep 12, 2020

Are there any interesting use-cases for a plugin hook that allows plugins to define their own ?_extra= blocks?

@simonw
Copy link
Owner Author

simonw commented Oct 20, 2020

Just realized I added an undocumented ?_extras= option to the row view years ago and forgot about it - it's not even documented. Added in a30c5b2 - https://latest.datasette.io/fixtures/attraction_characteristic/2.json?_extras=foreign_key_tables

That will need to be made consistent with the new mechanism. I think ?_extra=a&_extra=b is more consistent with other Datasette features (like ?_facet=col1&_facet=col2) but potentially quite verbose.

So I could support ?_extra=a,b,c as an alternative allowed syntax, or I could allow ?_extra=single and ?_extras=comma,separated.

I think I prefer allowing commas in ?_extra=.

@simonw
Copy link
Owner Author

simonw commented Oct 20, 2020

In the documentation for ?_extra= I think I'll emphasize the comma-separated version of it. Also: there will be ?_extra= values which act as aliases for collection combinations - e.g. ?_extra=full will toggle everything.

@simonw
Copy link
Owner Author

simonw commented Oct 20, 2020

I think it's worth having a plugin hook for this - it can be same hook that is used internally. Maybe register_extra - it lets you return one or more extra implementations, each with a name and an async function that gets called.

Things like suggested facets will become register_extra hooks. Maybe actual facets too?

@simonw
Copy link
Owner Author

simonw commented Oct 20, 2020

This is now blocking simonw/datasette-graphql#61 because that issue needs a way to turn off suggested facets when retrieving the results of a table query.

@simonw
Copy link
Owner Author

simonw commented Oct 21, 2020

I think I should prioritize the facets component of this, since that could have significant performance wins while also supporting datasette-graphql.

@simonw
Copy link
Owner Author

simonw commented Dec 15, 2021

This is relevant to the big refactor in:

@simonw simonw pinned this issue Jan 19, 2023
@simonw
Copy link
Owner Author

simonw commented Jan 20, 2023

I'm going to write code which parses ?_extra= in the comma separated or multiple parameter format and then looks up functions in a dictionary.

It will return an error if you ask for an extra that does not exist.

@simonw
Copy link
Owner Author

simonw commented Jan 21, 2023

Got first prototype working using asyncinject and it's pretty nice:

diff --git a/datasette/views/table.py b/datasette/views/table.py
index ad45ecd3..c8690b22 100644
--- a/datasette/views/table.py
+++ b/datasette/views/table.py
@@ -2,6 +2,7 @@ import asyncio
 import itertools
 import json
 
+from asyncinject import Registry
 import markupsafe
 
 from datasette.plugins import pm
@@ -538,57 +539,60 @@ class TableView(DataView):
         # Execute the main query!
         results = await db.execute(sql, params, truncate=True, **extra_args)
 
-        # Calculate the total count for this query
-        count = None
-        if (
-            not db.is_mutable
-            and self.ds.inspect_data
-            and count_sql == f"select count(*) from {table_name} "
-        ):
-            # We can use a previously cached table row count
-            try:
-                count = self.ds.inspect_data[database_name]["tables"][table_name][
-                    "count"
-                ]
-            except KeyError:
-                pass
-
-        # Otherwise run a select count(*) ...
-        if count_sql and count is None and not nocount:
-            try:
-                count_rows = list(await db.execute(count_sql, from_sql_params))
-                count = count_rows[0][0]
-            except QueryInterrupted:
-                pass
-
-        # Faceting
-        if not self.ds.setting("allow_facet") and any(
-            arg.startswith("_facet") for arg in request.args
-        ):
-            raise BadRequest("_facet= is not allowed")
+        # Resolve extras
+        extras = _get_extras(request)
+        if request.args.getlist("_facet"):
+            extras.add("facet_results")
 
-        # pylint: disable=no-member
-        facet_classes = list(
-            itertools.chain.from_iterable(pm.hook.register_facet_classes())
-        )
-        facet_results = {}
-        facets_timed_out = []
-        facet_instances = []
-        for klass in facet_classes:
-            facet_instances.append(
-                klass(
-                    self.ds,
-                    request,
-                    database_name,
-                    sql=sql_no_order_no_limit,
-                    params=params,
-                    table=table_name,
-                    metadata=table_metadata,
-                    row_count=count,
-                )
+        async def extra_count():
+            # Calculate the total count for this query
+            count = None
+            if (
+                not db.is_mutable
+                and self.ds.inspect_data
+                and count_sql == f"select count(*) from {table_name} "
+            ):
+                # We can use a previously cached table row count
+                try:
+                    count = self.ds.inspect_data[database_name]["tables"][table_name][
+                        "count"
+                    ]
+                except KeyError:
+                    pass
+
+            # Otherwise run a select count(*) ...
+            if count_sql and count is None and not nocount:
+                try:
+                    count_rows = list(await db.execute(count_sql, from_sql_params))
+                    count = count_rows[0][0]
+                except QueryInterrupted:
+                    pass
+            return count
+
+        async def facet_instances(extra_count):
+            facet_instances = []
+            facet_classes = list(
+                itertools.chain.from_iterable(pm.hook.register_facet_classes())
             )
+            for facet_class in facet_classes:
+                facet_instances.append(
+                    facet_class(
+                        self.ds,
+                        request,
+                        database_name,
+                        sql=sql_no_order_no_limit,
+                        params=params,
+                        table=table_name,
+                        metadata=table_metadata,
+                        row_count=extra_count,
+                    )
+                )
+            return facet_instances
+
+        async def extra_facet_results(facet_instances):
+            facet_results = {}
+            facets_timed_out = []
 
-        async def execute_facets():
             if not nofacet:
                 # Run them in parallel
                 facet_awaitables = [facet.facet_results() for facet in facet_instances]
@@ -607,9 +611,13 @@ class TableView(DataView):
                         facet_results[key] = facet_info
                     facets_timed_out.extend(instance_facets_timed_out)
 
-        suggested_facets = []
+            return {
+                "results": facet_results,
+                "timed_out": facets_timed_out,
+            }
 
-        async def execute_suggested_facets():
+        async def extra_suggested_facets(facet_instances):
+            suggested_facets = []
             # Calculate suggested facets
             if (
                 self.ds.setting("suggest_facets")
@@ -624,8 +632,15 @@ class TableView(DataView):
                 ]
                 for suggest_result in await gather(*facet_suggest_awaitables):
                     suggested_facets.extend(suggest_result)
+            return suggested_facets
+
+        # Faceting
+        if not self.ds.setting("allow_facet") and any(
+            arg.startswith("_facet") for arg in request.args
+        ):
+            raise BadRequest("_facet= is not allowed")
 
-        await gather(execute_facets(), execute_suggested_facets())
+        # pylint: disable=no-member
 
         # Figure out columns and rows for the query
         columns = [r[0] for r in results.description]
@@ -732,17 +747,56 @@ class TableView(DataView):
             rows = rows[:page_size]
 
         # human_description_en combines filters AND search, if provided
-        human_description_en = filters.human_description_en(
-            extra=extra_human_descriptions
-        )
+        async def extra_human_description_en():
+            human_description_en = filters.human_description_en(
+                extra=extra_human_descriptions
+            )
+            if sort or sort_desc:
+                human_description_en = " ".join(
+                    [b for b in [human_description_en, sorted_by] if b]
+                )
+            return human_description_en
 
         if sort or sort_desc:
             sorted_by = "sorted by {}{}".format(
                 (sort or sort_desc), " descending" if sort_desc else ""
             )
-            human_description_en = " ".join(
-                [b for b in [human_description_en, sorted_by] if b]
-            )
+
+        async def extra_next_url():
+            return next_url
+
+        async def extra_columns():
+            return columns
+
+        async def extra_primary_keys():
+            return pks
+
+        registry = Registry(
+            extra_count,
+            extra_facet_results,
+            extra_suggested_facets,
+            facet_instances,
+            extra_human_description_en,
+            extra_next_url,
+            extra_columns,
+            extra_primary_keys,
+        )
+
+        results = await registry.resolve_multi(
+            ["extra_{}".format(extra) for extra in extras]
+        )
+        data = {
+            "ok": True,
+            "rows": rows[:page_size],
+            "next": next_value and str(next_value) or None,
+        }
+        data.update({
+            key.replace("extra_", ""): value
+            for key, value in results.items()
+            if key.startswith("extra_")
+            and key.replace("extra_", "") in extras
+        })
+        return Response.json(data, default=repr)
 
         async def extra_template():
             nonlocal sort
@@ -1334,3 +1388,11 @@ class TableDropView(BaseView):
 
         await db.execute_write_fn(drop_table)
         return Response.json({"ok": True}, status=200)
+
+
+def _get_extras(request):
+    extra_bits = request.args.getlist("_extra")
+    extras = set()
+    for bit in extra_bits:
+        extras.update(bit.split(","))
+    return extras

With that in place, http://127.0.0.1:8001/content/releases?author=25778&_size=1&_extra=count,primary_keys,columns&_facet=author returns:

{
  "ok": true,
  "rows": [
    {
      "html_url": "https://github.com/eyeseast/geocode-sqlite/releases/tag/0.1.2",
      "id": 30926270,
      "author": {
        "value": 25778,
        "label": "eyeseast"
      },
      "node_id": "MDc6UmVsZWFzZTMwOTI2Mjcw",
      "tag_name": "0.1.2",
      "target_commitish": "master",
      "name": "v0.1.2",
      "draft": 0,
      "prerelease": 1,
      "created_at": "2020-09-08T17:48:24Z",
      "published_at": "2020-09-08T17:50:15Z",
      "body": "Basic API is in place, with CLI support for Google, Bing, MapQuest and Nominatum (OSM) geocoders.",
      "repo": {
        "value": 293361514,
        "label": "geocode-sqlite"
      },
      "reactions": null,
      "mentions_count": null
    }
  ],
  "next": "30926270",
  "primary_keys": [
    "id"
  ],
  "columns": [
    "html_url",
    "id",
    "author",
    "node_id",
    "tag_name",
    "target_commitish",
    "name",
    "draft",
    "prerelease",
    "created_at",
    "published_at",
    "body",
    "repo",
    "reactions",
    "mentions_count"
  ],
  "count": 25,
  "facet_results": {
    "results": {
      "author": {
        "name": "author",
        "type": "column",
        "hideable": true,
        "toggle_url": "/content/releases?author=25778&_size=1&_extra=count%2Cprimary_keys%2Ccolumns",
        "results": [
          {
            "value": 25778,
            "label": "eyeseast",
            "count": 25,
            "toggle_url": "http://127.0.0.1:8001/content/releases?_size=1&_extra=count%2Cprimary_keys%2Ccolumns&_facet=author",
            "selected": true
          }
        ],
        "truncated": false
      }
    },
    "timed_out": []
  }
}

@simonw
Copy link
Owner Author

simonw commented Jan 21, 2023

Implementing this to work with the .json extension is going to be a lot harder.

The challenge here is that we're working with the whole BaseView() v.s. TableView() abstraction, which I've been wanting to get rid of for a long time.

BaseView() calls .data() and expects to get back a (data, extra_template_data, templates) tuple - then if a format is in play (.json or .geojson or similar from a plugin) it hands off data to that. If .csv is involved it does something special, in order to support streaming responses. And if it's regular HTML it calls await extra_template_data() and combines that with data and passes it to the template.

I want this to work completely differently: I want the formats (including HTML) to have the option of adding some extra ?_extra= extras, then I want HTML to be able to render the page entirely from the JSON if necessary.

simonw added a commit that referenced this issue Jan 21, 2023
@simonw
Copy link
Owner Author

simonw commented Jan 21, 2023

I pushed my prototype so far, going to start a draft PR for it.

@simonw
Copy link
Owner Author

simonw commented Jan 21, 2023

It's annoying that the https://docs.datasette.io/en/0.64.1/plugin_hooks.html#register-output-renderer-datasette plugin hook passes rows as "list of sqlite3.Row objects" - I'd prefer it if that plugin hook worked with JSON data, not sqlite3.Row.

https://docs.datasette.io/en/0.64.1/plugin_hooks.html#render-cell-row-value-column-table-database-datasette is documented as accepting Row but actually gets CustomRow, see:

@simonw
Copy link
Owner Author

simonw commented Jan 21, 2023

Maybe "rows" should be a default ?_extra=... but it should be possible to request "arrays" instead which would be a list of arrays, more suitable perhaps for custom renderers such as the CSV one.

This could be quite neat, in that EVERY key in the JSON representation would be defined as an extra - just some would be on by default. There could even be a mechanism for turning them back off again, maybe using ?_extra=-rows.

In which case maybe ?_extra= isn't actually the right name for this feature. It could be ?_key= perhaps, or ?_field=.

Being able to pass ?_field=count,-rows to get back just the count (and skip executing the count entirely) would be pretty neat.

Although ?_only=count would be tidier. So maybe the pair of ?_only= and ?_extra= would make sense.

Would ?_only=rows still return the "ok" field so you can always look at that to confirm an error didn't occur?

@simonw
Copy link
Owner Author

simonw commented Jan 25, 2023

This issue here would benefit from some kid of mechanism for returning just the HTML of the table itself, without any of the surrounding material. I'm not sure if that would make sense as an extra or not:

@simonw
Copy link
Owner Author

simonw commented Feb 5, 2023

I think that does make sense: ?_extra=table perhaps, which would add {"table": "..."}.

@simonw
Copy link
Owner Author

simonw commented Feb 8, 2023

Just realized that it's useful to be able to tell what parameters were used to generate a page... but reflecting things like _next back in the JSON is confusing in the presence of next.

So I'm going to add an extra for that information too.

Not sure what to call it though:

  • params - confusing because in the code that's usually used for params passed to SQL queries
  • query_string - wouldn't that be a string, not params as a dictionary?

I'm going to experiment with a request extra that returns some bits of information about the request.

simonw added a commit that referenced this issue Mar 22, 2023
* Implemented ?_extra= option for JSON views, refs #262
* New dependency: asyncinject
* Remove now-obsolete TableView class
@simonw
Copy link
Owner Author

simonw commented Mar 22, 2023

I just landed this PR so this feature is now in main:

Still needs documentation and maybe some extra tests too.

@simonw
Copy link
Owner Author

simonw commented Mar 29, 2023

I need to get the arbitrary query page to return the same format. It likely won't have nearly as many extras.

# for free to join this conversation on GitHub. Already have an account? # to comment
Projects
None yet
Development

No branches or pull requests

1 participant