[lang] Add escape syntax for field names
Related to rcoh#99

Currently field names containing a space or period, e.g. `date received`
or `grpc.method`, cannot be parsed. This could be worked around using
`jq` or similar tools to rewrite the field name, but that's a pain.

This commit adds an escaped field name syntax of `["<FIELD>"]` which is
based on the Object Identifier-Index syntax[0] used by `jq`, so it
should be somewhat familiar to many people who parse JSON on the
command line.

The more obvious option of delimiting with just quotes, e.g.
"date received", creates an ambiguity between string literals and
escaped field names. For example, does `where foo == "date received"`
mean field `foo` matches field `date received`, or field `foo` matches
the string "date received"?

Example query:

```
* | json | where ["grpc.method"] == "Foo" | count by ["date received"]
```

[0] https://stedolan.github.io/jq/manual/#ObjectIdentifier-Index:.foo,.foo.bar
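
The escaping rule described in this commit message can be sketched without parser-combinator macros. The following is a minimal, standalone illustration, not the code in `src/lang.rs`: `parse_field_name` and its helper predicates are hypothetical names, and whitespace inside an escaped name is assumed to mean a plain space only.

```rust
// Hypothetical standalone sketch; the real parser lives in src/lang.rs.

/// First character of any field name: ASCII letter or underscore.
fn starts_name(c: char) -> bool {
    c.is_ascii_alphabetic() || c == '_'
}

/// Characters allowed in a bare (unescaped) field name.
fn is_bare(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Escaped names additionally allow spaces and periods (assumption: plain space only).
fn is_escaped(c: char) -> bool {
    is_bare(c) || c == ' ' || c == '.'
}

/// Parses a field name at the start of `input`.
/// Returns the name and the unparsed remainder, or `None` if no name is found.
fn parse_field_name(input: &str) -> Option<(String, &str)> {
    if let Some(body) = input.strip_prefix("[\"") {
        // Escaped form: ["<FIELD>"]
        let end = body.find(|c: char| !is_escaped(c)).unwrap_or(body.len());
        let (name, rest) = body.split_at(end);
        if !name.starts_with(starts_name) {
            return None; // empty name, or name starting with a digit, space, or period
        }
        let rest = rest.strip_prefix("\"]")?; // the closing delimiter is required
        Some((name.to_owned(), rest))
    } else {
        // Bare form: letters, digits, and underscores only
        let end = input.find(|c: char| !is_bare(c)).unwrap_or(input.len());
        let (name, rest) = input.split_at(end);
        if !name.starts_with(starts_name) {
            return None;
        }
        Some((name.to_owned(), rest))
    }
}
```

Under this sketch, a bare `"date received"` is rejected as a field name, so a double-quoted token stays available as a string literal and only the `["` opener introduces an escaped field name.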
Will Chandler committed Jul 23, 2021
1 parent e00d6ab commit f6f1bd6
Showing 3 changed files with 45 additions and 1 deletion.
8 changes: 8 additions & 0 deletions README.md
@@ -66,6 +66,14 @@ A simple query that operates on JSON logs and counts the number of logs per leve
agrind '* | json | count by log_level'
```

### Escaping Field Names

Field names containing a space or period must be escaped using `["<FIELD>"]`:

```bash
agrind '* | json | count by ["date received"], ["grpc.method"]'
```
### Filters
There are three basic filters:
25 changes: 24 additions & 1 deletion src/lang.rs
@@ -366,6 +366,14 @@ fn is_ident(c: char) -> bool {
    is_alphanumeric(c as u8) || c == '_'
}

fn is_escaped_ident(c: char) -> bool {
    match c {
        space if is_space(space as u8) => true,
        '.' => true,
        _ => is_ident(c),
    }
}

fn starts_ident(c: char) -> bool {
    is_alphabetic(c as u8) || c == '_'
}
@@ -431,12 +439,20 @@ named!(column_ref<Span, Expr>, do_parse!(
    (Expr::Column { head: DataAccessAtom::Key(head), rest: rest })
));
-named!(ident<Span, String>, do_parse!(
+named!(ident<Span, String>, alt!(bare_ident | escaped_ident));

named!(bare_ident<Span, String>, do_parse!(
    start: take_while1!(starts_ident) >>
    rest: take_while!(is_ident) >>
    (start.fragment.0.to_owned() + rest.fragment.0)
));

named!(escaped_ident<Span, String>, do_parse!(
    start: preceded!(tag!("[\""), take_while1!(starts_ident)) >>
    rest: terminated!(take_while!(is_escaped_ident), tag!("\"]")) >>
    (start.fragment.0.to_owned() + rest.fragment.0)
));

named!(arguments<Span, Vec<Expr>>, add_return_error!(SyntaxErrors::StartOfError.into(), delimited!(
    tag!("("),
    separated_list!(tag!(","), expr),
@@ -1166,6 +1182,13 @@ mod tests {
        expect_fail!(ident, "5x");
    }

    #[test]
    fn parse_quoted_ident() {
        expect!(ident, "[\"hello world\"]", "hello world".to_string());
        expect!(ident, "[\"hello.world\"]", "hello.world".to_string());
        expect_fail!(ident, "\"\"");
    }

    #[test]
    fn parse_var_list() {
        expect!(
13 changes: 13 additions & 0 deletions tests/structured_tests/escaped_ident.toml
@@ -0,0 +1,13 @@
query = """
* | json | count by ["grpc.method"], ["start time"]
"""
input = """
{"start time": "today", "grpc.method": "Foo"}
{"start time": "today", "grpc.method": "Bar"}
"""
output = """
["grpc.method"] ["start time"] _count
-----------------------------------------------------------
Bar today 1
Foo today 1
"""

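What the structured test above checks can be approximated in a few lines: group records by the exact field names, spaces and periods included, and count each group. This is a sketch under assumptions, not agrind's aggregation code: `count_by` is a hypothetical helper, records are modelled as pre-parsed key/value pairs rather than JSON, records missing a grouping field are skipped, and the sorted order comes from `BTreeMap`.

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: counts records by the values of the given fields.
/// Each record is a list of (field name, value) pairs.
fn count_by<'a>(
    records: &[Vec<(&'a str, &'a str)>],
    fields: &[&str],
) -> BTreeMap<Vec<&'a str>, usize> {
    let mut counts = BTreeMap::new();
    for record in records {
        // Look up every grouping field by its exact name; `None` if any is missing.
        let key: Option<Vec<&str>> = fields
            .iter()
            .map(|f| record.iter().find(|(k, _)| k == f).map(|(_, v)| *v))
            .collect();
        if let Some(key) = key {
            *counts.entry(key).or_insert(0) += 1;
        }
    }
    counts
}
```

Called with the two input records from the test and the fields `grpc.method` and `start time`, this yields one group for `Bar`/`today` and one for `Foo`/`today`, each with a count of 1.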