FAQ

Why create zawk?

frawk is a good tool created by Eli Rosenthal. We just want to make AWK more powerful with a standard library: zawk = frawk + stdlib.

Time flies, and we need a modern AWK that works with DuckDB, ClickHouse, S3, KV stores, etc. for text processing.

Why not just contribute to frawk?

frawk is the foundation of zawk for syntax, types, lexing, etc., and zawk focuses on making AWK more powerful with a standard library. For now I'm not sure the frawk developers would accept my changes, and zawk is just experimental work: zawk = AWK + stdlib + Rust.

frawk is still good for text processing, embedding, etc., and if possible I will contribute some work back to frawk, for example:

  • Upgrade to Rust 2021
  • Upgrade to Clap 4.5
  • Update dependencies to their latest versions
  • gawk compatibility: global variables (ENVIRON, PROCINFO) and functions (datetime, etc.)

Will zawk fix some bugs in frawk?

Yes. Eli Rosenthal has had much less time over the last 1-2 years to devote to bug fixes and feature requests for frawk, so I will try my best to fix bugs in frawk.

Is there a roadmap for zawk?

I'm not sure about the roadmap yet, but I will try my best to make zawk more powerful and easier to use. Current focus areas:

  • gawk compatibility
  • stdlib enhancements
  • performance optimization
  • UX: installation, usage, documentation, examples, etc.

What are the limitations compared with gawk?

zawk limitations:

  • No BEGINFILE and ENDFILE blocks
  • CONVFMT and OFMT are not supported

How to query Apache Parquet?

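One approach is to export the Parquet file to CSV with DuckDB, then process family.csv with zawk -i csv:

$ duckdb -c "COPY (select * from 'family.parquet') TO 'family.csv' (FORMAT CSV)"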

Special types in text

  • bool: mkbool("true")
  • Tuple: tuple("('abc',123)"): IntMap
  • Array: parse_array("[1,2,3]"): IntMap
  • Record: record("{field1:1,field2:'two'}"): StrMap
  • variants: days(30), week(2): StrMap with the keys name and value.
  • flags: {read,write}: StrMap

You can use the functions above to parse special types in text. If possible, avoid spaces in the value text.

Tip: whatever type you use, keep the format regular-expression friendly.
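As one reading of the tip above, a compact value with no spaces can be validated with a single short regular expression. The sketch below uses plain awk rather than any zawk function; the helper name is_flags and the pattern are illustrative assumptions, not part of zawk.

```shell
# Hypothetical helper: report whether a value looks like a compact flags set,
# e.g. {read,write}. Bracket expressions [{] and [}] keep the braces literal.
is_flags() {
  printf '%s\n' "$1" |
    awk '{ print (($0 ~ /^[{][a-z]+(,[a-z]+)*[}]$/) ? "yes" : "no") }'
}

is_flags '{read,write}'    # compact form, matches the pattern
is_flags '{read, write}'   # the extra space means the simple pattern no longer matches
```

Allowing optional spaces would require a longer pattern, which is why the compact form is easier to work with.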

Nushell integration

Use to csv, then pipe the output to zawk for CSV processing.

$ ls | to csv | ^zawk -i csv '{print $1}'

Supported Nushell types:

  • duration: duration("2min + 32sec")
  • timestamp: mktime("2024-04-27 17:07:25.684184848 +08:00")
  • lists: parse_array("[0 1 'two' 3]")
  • file size: to_bytes("1.5GB")
  • records: record("{name:'Nushell', lang: 'Rust'}")

awk file help support

You can add help information to an awk file to make it friendlier to use. Use zawk init demo.awk; an example follows:

#!/usr/bin/env zawk -f

# @desc this is a demo awk
# @meta author linux_china
# @meta version 0.1.0
# @var nick current user nick
# @var email current user email
# @env DB_NAME database name

Then run ./demo.awk --help to display the help information.

  • @desc: description for awk file
  • @meta: metadata for script, such as author, version etc.
  • @var: a variable for the script; a trailing ? (for example email?) means the variable is optional. Pass it with awk -v varName="$PWD" 'END {print varName}'.
  • @env: an environment variable; access it with ENVIRON["USER"].
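To make the difference between @var and @env concrete, here is a minimal sketch of how a script sees each kind of input. It uses plain awk, which provides -v and ENVIRON as described above; the helper name show_config and the sample values are hypothetical.

```shell
# Hypothetical helper: $1 is passed as an awk variable (-v),
# $2 is exported to awk's environment as DB_NAME.
show_config() {
  DB_NAME="$2" awk -v nick="$1" \
    'BEGIN { printf "nick=%s db=%s\n", nick, ENVIRON["DB_NAME"] }'
}

show_config linux_china demo_db
```

A variable given with -v is a regular awk variable, while an environment variable is only visible through the ENVIRON array.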

Call a zawk function from the command line

Create a shell function cawk that calls zawk:

cawk() { zawk "BEGIN{ print ${1} }"; }

Then call cawk 'uuid()' to get the result.
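The wrapper works by pasting its first argument into a BEGIN block as the expression to print. The sketch below shows the same pattern with plain awk substituted for zawk (an assumption, so that it runs without zawk installed); the name calc_awk is hypothetical, and only standard awk built-ins are available in this variant.

```shell
# Hypothetical variant of the wrapper using plain awk instead of zawk.
# The ';' before the closing brace keeps the definition valid in bash as well as zsh.
calc_awk() { awk "BEGIN{ print ${1} }"; }

calc_awk '1 + 2'            # arithmetic expression
calc_awk 'length("zawk")'   # standard built-in string function
calc_awk 'toupper("abc")'   # another standard built-in
```

Because the argument is interpolated into the program text, wrap it in single quotes so the shell does not expand parentheses or double quotes first.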