Standard library for AWK with text, math, crypto, kv, database, network etc.
zawk stdlib Cheat Sheet: https://cheatography.com/linux-china/cheat-sheets/zawk/
Text is encoded as UTF-8 by default.
Get the Unicode character length of text: length($1). For example, length("你好") returns 2.
Get the byte length of text: strlen($1). For example, strlen("你好") returns 6.
Get the char at an index: char_at($1, 1). Indexes start from 1. If the index is out of range, an empty string is returned.
Return the char array of text: ar=chars($1). Array indexes start from 1.
match(s, re): tests whether string s matches the regular expression re. If s matches, the RSTART variable is set to the start of the leftmost match of re, and RLENGTH is set to the length of this match.
substr(s, i[, j]): the 1-indexed substring of string s starting from index i and continuing for the next j characters, or until the end of s if i+j exceeds the length of s or if j is not provided.
sub(re, t, s): substitutes t for the first matching occurrence of regular expression re in the string s.
gsub(re, t, s): like sub, but with all occurrences substituted, not just the first.
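The match, substr, sub and gsub functions described above come from standard AWK, so their behaviour can be checked with any POSIX awk. The following is a minimal sketch, not taken from the zawk sources: it assumes zawk agrees with POSIX awk on plain ASCII input, and the shell helper name demo_text_funcs is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: runs a small awk program that exercises
# match, substr, sub and gsub on one sample string.
demo_text_funcs() {
  awk 'BEGIN {
    s = "hello world"
    # match() sets RSTART and RLENGTH for the leftmost match
    if (match(s, /o+/)) print "match: " RSTART " " RLENGTH
    # substr() is 1-indexed: take 5 characters starting at index 7
    print "substr: " substr(s, 7, 5)
    # sub() replaces only the first occurrence, editing t in place
    t = s; sub(/o/, "0", t); print "sub: " t
    # gsub() replaces every occurrence and returns how many it replaced
    u = s; n = gsub(/o/, "0", u); print "gsub: " u " (" n ")"
  }'
}

demo_text_funcs
```

Note that sub and gsub modify their third argument in place, which is why the sketch copies s into t and u first.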
index(text, "s"): the first index within text at which the string "s" occurs, or 0 if it does not appear.
last_index(text, "s"): the last index within text at which the string "s" occurs, or 0 if it does not appear.
split(s, arr[, fs]): splits the string s according to fs, placing the results in the array arr. If fs is not specified, the FS variable is used to split s.
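A short sketch of the two ways of splitting described above, written against POSIX awk. It is an illustration only: zawk is assumed, not verified, to behave the same way, and the helper name demo_split is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: shows split() with an explicit separator
# and with the default FS.
demo_split() {
  awk 'BEGIN {
    # explicit separator: split() returns the number of pieces
    n = split("a/b/c", parts, "/")
    print n, parts[1], parts[3]
    # no third argument: the default FS (a single space) splits on
    # runs of blanks and ignores leading and trailing blanks
    m = split("  x   y z ", words)
    print m, words[1], words[m]
  }'
}

demo_split
```

The resulting array is 1-indexed, so the last piece is always arr[n] where n is the return value.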
Get the last part by separator: last_part("a/b/c", "/") returns c.
If sep is not provided, zawk searches for / first and, if it is not found, searches for . instead: last_part("a/b/c") returns c, and last_part("a.b.c") returns c.
sprintf(fmt, ...): returns a string formatted according to fmt and the provided arguments. The goal is to provide the semantics of the libc sprintf function.
printf(fmt, ...): like sprintf, but the result is written to standard output, or to out according to the overwrite or append semantics specified by > or >>. Like print, printf can be called without parentheses around its arguments, though arguments are parsed differently in this mode to avoid ambiguities.
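As a minimal sketch of the difference between the two formatting functions, the snippet below uses POSIX awk; the format strings are arbitrary examples and the helper name demo_format is hypothetical. It assumes zawk follows the same libc-style format semantics.

```shell
#!/bin/sh
# Hypothetical helper: contrasts sprintf (returns a string)
# with printf (writes to standard output).
demo_format() {
  awk 'BEGIN {
    # sprintf returns the formatted string instead of printing it
    s = sprintf("%05d|%-6s|%.2f", 42, "ab", 3.14159)
    print s
    # printf writes directly to standard output; parentheses are optional
    printf "%s=%d\n", "count", 3
    printf("%x\n", 255)
  }'
}

demo_format
```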
hex(s): returns the hexadecimal integer (e.g. 0x123abc) encoded in s, or 0 otherwise. For example, hex("0xFF") returns 255. Please use strtonum("0x11") instead.
join_fields(i, j[, sep]): returns columns i through j (1-indexed, inclusive) concatenated together, joined by sep, or by OFS if sep is not provided.
join_csv(i, j): like join_fields, but with columns joined by , and escaped using escape_csv.
join_tsv(i, j): like join_fields, but with columns joined by tabs and escaped using escape_tsv.
tolower(s): returns a copy of s where all uppercase ASCII characters are replaced with their lowercase counterparts; other characters are unchanged.
toupper(s): returns a copy of s where all lowercase ASCII characters are replaced with their uppercase counterparts; other characters are unchanged.
strtonum(s): returns the numeric (decimal) value of s, for example strtonum("0x11").
Trim whitespace by default: trim($1).
Trim specific chars: trim($1, "[]()").
truncate($1, 10) or truncate($1, 10, "...")
capitalize("hello") # Hello
uncapitalize("Hello") # hello
camel_case("hello World") # helloWorld
kebab_case("hello world") # hello-world
snake_case("hello world") # hello_world
title_case("hello world") # Hello World
isint("123")
isnum("1234.01")
Validate the text format, such as is("email", "demo@example.com"). Format list:
- url
- phone
- ip: IP v4/v6
The return value is 1 or 0.
starts_with($1, "https://")
ends_with($1, ".com")
contains($1, "//")
Why not use regex? Because starts_with/ends_with/contains are easy to use and understand. Most libraries include these functions, and I don't want the AWK stdlib to feel weird.
Tips: you can also use a regular expression instead of the is_xxx(), contains(), starts_with() and ends_with() functions:
- is_int: /^\d+$/
- contains: /xxxx/
- starts_with: /^xxxx/
- ends_with: /xxxx$/
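The regex equivalents listed above can be sketched with the ~ operator in any POSIX awk. This is an illustration under assumptions: the helper name check_url is hypothetical, and [0-9] is used in place of \d because \d is not part of POSIX regular expressions.

```shell
#!/bin/sh
# Hypothetical helper: prints four 0/1 flags for the given text,
# each computed with a regular expression match.
check_url() {
  printf '%s\n' "$1" | awk '{
    starts = ($1 ~ /^https:\/\//)   # like starts_with($1, "https://")
    ends   = ($1 ~ /\.com$/)        # like ends_with($1, ".com")
    has    = ($1 ~ /\/\//)          # like contains($1, "//")
    isint  = ($1 ~ /^[0-9]+$/)      # like is_int, with [0-9] in place of \d
    print starts, ends, has, isint
  }'
}

check_url "https://example.com"
check_url "12345"
```

The regex forms need escaping for characters such as / and ., which is one practical reason the plain-string helpers are easier to read.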
mask("abc@example.com"), mask("186612347")
- pad: pad($1, 10, "*") to ***hello**, pad_start($1, 10, "*") to ***hello, pad_end($1, 10, "**") to hello***
Compare text: strcmp($1, $2) returns -1, 0 or 1.
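For comparison, a three-way string compare can be sketched in POSIX awk as follows. This is not zawk's strcmp implementation, only an illustration of the -1/0/1 contract; the helper name str_compare is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints -1, 0 or 1 for a string comparison
# of its two arguments.
str_compare() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    # appending "" forces a string comparison, not a numeric one
    x = a ""; y = b ""
    if (x < y) r = -1
    else if (x > y) r = 1
    else r = 0
    print r
  }'
}

str_compare apple banana
```

Because the comparison is done on strings, "10" sorts before "9" here.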
Split text into non-empty lines: lines(text) returns an array of text.
Split text into words: words("hello world? 你好") # ["hello", "world", "你", "好"]
repeat("*",3) # ***
Return the default value if the text is empty or does not exist.
default_if_empty(" ", "demo") # demo
or default_if_empty(var_is_null, "demo") # demo
Add a suffix/prefix if it is missing, or remove it if it is present.
append_if_missing("nats://example.com","/") # nats://example.com/
preappend_if_missing("example.com","https://") # https://example.com
remove_if_end("demo.json", ".json") # demo
remove_if_begin("file://./demo.json", "file://./") # demo.json
Quote or double-quote text if it is not already quoted.
quote("hello world") # 'hello world'
double_quote("hello world") # "hello world"
- parse: use wildcard matching, e.g. parse("Hello World","{greet} {name}")["greet"]
- rparse: use regex groups, e.g. rparse("Hello World","(\\w+) (\\w+)")[1]
Convert bytes to a human-readable format, and vice versa. Units (case-insensitive): B, KB, MB, GB, TB, PB, EB, ZB, YB, kib, mib, gib, tib, pib, eib, zib, yib.
format_bytes(1024): 1 KB
to_bytes("2 KB"): 2048
Generate a password with numbers, lowercase/uppercase letters, and special chars.
mkpass(): 8 chars password
mkpass(12): 12 chars password
Generate ASCII art text with figlet: BEGIN { print figlet("Hello zawk"); }.
Attention: ASCII characters only; don't use i18n characters. :)
- escape: escape("format", $1) supports json, csv, tsv, xml, html, sql, shell
- escape_csv(s): Returns s escaped as a CSV column, adding quotes if necessary, replacing quotes with double-quotes, and escaping other whitespace.
- escape_tsv(s): Returns s escaped as a TSV column. There is less to do than with CSV, but tab and newline characters are replaced with \t and \n.
If you want to see the returned data structure, you can use the var_dump function, such as var_dump(semver("1.2.3-alpha.1+zstd.1.5.0")).
url(url_text) parses a URL and returns an array with the following fields:
- schema
- user
- password
- host
- port
- path
- query
- fragment
Examples: url("https://example.com/user/1"), url("jdbc:mysql://localhost:3306/test")
data_url("data:text/plain;base64,SGVsbG8sIFdvcmxkIQ==")
- data
- mime_type
- encoding
shlex("ls -l"), see https://crates.io/crates/shlex
path("./demo.txt")
- exists: 0 or 1
- full_path
- parent
- file_name
- file_stem
- file_ext
- content_type
semver("1.2.3-alpha"), semver("1.2.3-alpha.1+zstd.1.5.0")
Array fields:
- major
- minor
- patch
- pre
- build
Parse pairs text into an array (MapStrStr), for example:
- URL query string: id=1&name=Hello%20World1
- Trace Context tracestate: congo=congosSecondPosition,rojo=rojosFirstPosition
- Cookies: pairs(cookies_text, ";", "="), such as _device_id=c49fdb13b5c41be361ee80236919ba50; user_session=qDSJ7GlA3aLriNnDG-KJsqw_QIFpmTBjt0vcLy5Vq2ay6StZ;
Usage: pairs("a=b,c=d"), pairs("id=1&name=Hello%20World","&"), pairs("a=b;c=d",";","=").
Tips: with pairs("id=1&name=Hello%20World","&"), the text is treated as a URL query string, and the values are URL-decoded automatically.
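To make the idea of pair parsing concrete, here is a rough POSIX awk sketch of splitting text into key/value pairs with two separators. It is not zawk's implementation: it performs no URL decoding, and the helper name parse_pairs and its output format (key=>value, one per line) are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: $1 = text, $2 = pair separator,
# $3 = key/value separator. Prints one key=>value line per pair.
parse_pairs() {
  awk -v text="$1" -v psep="$2" -v kvsep="$3" 'BEGIN {
    n = split(text, items, psep)
    for (i = 1; i <= n; i++) {
      item = items[i]
      gsub(/^ +| +$/, "", item)   # drop blanks around each pair
      if (item == "") continue    # skip empty pieces, e.g. after a trailing separator
      pos = index(item, kvsep)
      if (pos == 0) continue      # no key/value separator: ignore the piece
      key = substr(item, 1, pos - 1)
      val = substr(item, pos + length(kvsep))
      print key "=>" val
    }
  }'
}

parse_pairs "a=b;c=d" ";" "="
```

Using index rather than a second split keeps any later occurrence of the key/value separator inside the value.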
Prometheus/OpenMetrics text format, such as http_requests_total{method="post",code="200"}
Usage:
record("http_requests_total{method='post',code=200}")
record("mysql{host=localhost user=root password=123456 database=test}")
record("table1(id int, age int)"): DB table design
A message (a record with a body) always contains a name, headers and a body, and its text format is like http_requests_total{method="post",code="200"}(100)
Usage:
message("http_requests_total{method='post',code=200}(100)")
message("login_event{method='post',code=200}('xxx@example.com')")
Parse a function invocation into IntMap<Str>, where index 0 holds the function name.
arr=func("hello(1,2,3)"): arr[0]=>hello, arr[1]=>1
arr=func("welcome('Jackie Chan',3)"): arr[0]=>welcome, arr[1]=>Jackie Chan
uuid: uuid(), uuid("v7")
ID specs:
- length: 128 bits
- version: v4 and v7; the default is v4.
ulid: Universally Unique Lexicographically Sortable Identifier; please refer to https://github.com/ulid/spec for details.
ulid() #01ARZ3NDEKTSV4RRFFQ69G5FAV
ID specs:
- length: 128 bits
tsid: TSID generator tsid()
Snowflake ID is a form of unique identifier used in distributed computing.
snowflake(machine_id), where the max value for machine_id is 65535.
ID specs:
- length: 64 bits
- machine_id: 16 bits, so the max value is 65535
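The machine_id limit follows directly from its width: an unsigned field of 16 bits can hold values from 0 to 2^16 - 1. The one-liner below is just that arithmetic in POSIX awk; the helper name max_machine_id is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: $1 is the number of bits reserved for machine_id.
max_machine_id() {
  awk -v bits="$1" 'BEGIN { print 2 ^ bits - 1 }'
}

max_machine_id 16
```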
length(arr)
- delete item: delete arr[1]
- delete array: delete arr
seq(start, end, step): compatible with the seq command
uniq(arr): IntMap -> IntMap, compatible with the uniq command
n = asort(arr): sort the array and return the sorted array length
_max(arr): IntIntMap -> Int, IntFloatMap -> Float
_join(arr, ","): IntMap -> Str
parse_array("['first','second','third']"): IntMap
tuple("(1,2,'first','second')"): IntMap
variant("week(5)"): StrMap
flags("{vip,top20}"): StrMap
bf_insert(item) or bf_insert(item, group)
bf_contains(item) or bf_contains(item, group)
bf_icontains(item) or bf_icontains(item, group): insert if not found. It's useful for duplication checks.
Find unique phone numbers: !bf_icontains(phone) { }
Floating-point operations: sin, cos, atan, atan2, log, log2, log10, sqrt, exp are delegated to the Rust standard library, or LLVM intrinsics where available.
rand(): returns a uniform random floating-point number between 0 and 1.
srand(x): seeds the random number generator used by rand, and returns the old seed.
Bitwise operations: all of these operations coerce their operands to integers before being evaluated.
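Going back to rand and srand: a small POSIX awk sketch of seeding and of the "returns the old seed" behaviour. It assumes zawk matches POSIX awk here; the seeds 42 and 7 and the helper name demo_rand are arbitrary.

```shell
#!/bin/sh
# Hypothetical helper: seeds the generator, draws one number and
# reports whether it lies in [0, 1) plus the previous seed.
demo_rand() {
  awk 'BEGIN {
    srand(42)                    # fixed seed so a run is repeatable
    x = rand()
    ok = (x >= 0 && x < 1)       # rand() stays within [0, 1)
    prev = srand(7)              # srand() returns the previous seed
    print ok, prev
  }'
}

demo_rand
```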
abs(-1) # 1
floor(4.5) # 4
ceil(4.5) # 5
round(4.4) # 4
eval("1+2") or eval("a + 2", context); the return type is Float.
Please refer to https://github.com/isibboi/evalexpr for more.
Attention: currently only Int/Float/Boolean are supported, and booleans are converted to 0/1.
fend("1+2") # 3
Please refer to https://github.com/printfn/fend for more.
min(1,2,3), max("A","B")
The return value of mkbool(s) is 0 or 1.
Examples: mkbool("true"), mkbool("false"), mkbool("1"), mkbool("0"), mkbool("0.0"), mkbool(" 0 "), mkbool("Y"), mkbool("Yes"), mkbool(""), mkbool("✓")
int("11") # 11
float("11.2") # 11.2
UTC by default.
systime(): current Unix time
strftime format reference: https://docs.rs/chrono/latest/chrono/format/strftime/index.html
strftime("%Y-%m-%d %H:%M:%S")
strftime() or strftime("%+"): ISO 8601 / RFC 3339 date & time format.
For the date formats accepted by mktime, please refer to https://docs.rs/dateparser/latest/dateparser/#accepted-date-formats
mktime("2012 12 21 0 0 0"), mktime("2019-11-29 08:08-08")
Convert a duration to seconds: duration("2min + 12sec") # 132.
Time units: sec, secs, min, minute, minutes, hour, h, day, d, week, wk, month, mo, year, yr.
- Nushell Durations: https://www.nushell.sh/book/types_of_data.html#durations
- Fend: https://github.com/printfn/fend/blob/main/core/src/units/builtin.rs
Convert between hex and RGB.
hex2rgb("#FF0000") # [255,0,0]: the result is an array [r,g,b]
rgb2hex(255,0,0) # #FF0000
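The conversion itself is plain base-16 arithmetic: each pair of hex digits is one colour channel. The POSIX awk sketch below illustrates it; it is not zawk's implementation, it prints r,g,b as text instead of returning an array, and the helper names hex_to_rgb and rgb_to_hex are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: prints "r,g,b" for a colour such as #FF0000.
hex_to_rgb() {
  awk -v hex="$1" 'BEGIN {
    digits = "0123456789ABCDEF"
    h = toupper(hex)
    sub(/^#/, "", h)                       # drop the leading #
    for (i = 0; i < 3; i++) {
      # the position in digits, minus 1, is the value of a hex digit
      hi = index(digits, substr(h, 2 * i + 1, 1)) - 1
      lo = index(digits, substr(h, 2 * i + 2, 1)) - 1
      c[i] = hi * 16 + lo
    }
    print c[0] "," c[1] "," c[2]
  }'
}

# Hypothetical helper: prints #RRGGBB for three channel values (0-255).
rgb_to_hex() {
  awk -v r="$1" -v g="$2" -v b="$3" 'BEGIN {
    printf "#%02X%02X%02X\n", r, g, b
  }'
}

hex_to_rgb "#FF0000"
rgb_to_hex 255 0 0
```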
Generate fake data for testing: fake("name") or fake("name","cn").
- locale: EN (default) and CN are supported now.
- data: name, phone, cell, email, wechat, ip, creditcard, zipcode, plate, postcode, id (身份证, the Chinese resident identity card).
from_json(json_text)
to_json(array)
json_value(json_text, json_path): return only one text value, e.g. json_value(json_text, '$.store.book[0].title')
Tips: JSONPath follows RFC 9535 (JSONPath: Query Expressions for JSON).
json_query(json_text, json_path): return an array of text values
from_csv(csv_row): array of text values for one row
to_csv(array): csv row
xml_value(xml_text, xpath): the node's inner_text
Attention: please refer to an XPath cheatsheet for xpath syntax.
xml_query(xml_text, xpath): array of the elements' string values
html_value(html_text, selector): the node's inner_text
Attention: please follow standard CSS selector syntax.
html_query(html_text, selector): array of the nodes' inner_text
encode("format",$1)
Formats:
- hex, base32 (RFC4648 without padding), base58, base62, base64
- base64url: URL-safe without padding
- zlib2base64url: zlib then base64url, good for online diagram services such as PlantUML and Kroki
- url, hex-base64, hex-base64url, base64-hex, base64url-hex
digest("algorithm",$1)
Algorithms:
- md5, sha256, sha512, bcrypt, murmur3, xxh32 or xxh64, blake3
- crc32: checksum
- adler32: checksum
- hmac: hmac("HmacSHA256","your-secret-key", $1) or hmac("HmacSHA512","your-secret-key", $1)
- encrypt: encrypt("aes-128-cbc", "Secret Text", "your_pass_key"), encrypt("aes-256-gcm", "Secret Text", "your_pass_key")
- decrypt: decrypt("aes-128-cbc", "7b9c07a4903c9768ceeeb922bcb33448", "your_pass_key")
Parameters for encrypt and decrypt:
- mode: encryption mode. Currently only aes-128-cbc, aes-256-cbc, aes-128-gcm and aes-256-gcm are supported.
- plaintext: the text that needs to be encrypted.
- key: encryption key, 16 bytes (16 ASCII chars) for 128 and 32 bytes (32 ASCII chars) for 256.
HMAC signature: HS256, HS384, HS512:
- jwt: jwt("HS256","your-secret-key", payload_map)
- dejwt: dejwt("your-secret-key", token), returns the payload map.
RSA/ECDSA/EdDSA: RS256, RS384, RS512, ES256, ES384, EdDSA:
- jwt: jwt("RS256", private_key_pem_text, payload_map)
- dejwt: dejwt(public_key_pem_text, token)
JWK: RS256, RS384, RS512, ES256, ES384, ES512:
- dejwt: dejwt("http://example.com/jwks.json#kid", token): please add the kid as the URL anchor.
Tips: you can use https://jwkset.com/generate to generate the JWK JSON and the keys' PEM text.
Key/Value Functions:
kv_get(namespace, key)
kv_put(namespace, key, text)
kv_delete(namespace, key)
kv_clear(namespace)
namespace is the SQLite db name, and the db path is $HOME/.awk/sqlite.db.
Examples: kv_get("namespace1", "nick").
namespace is a Redis URL: redis://localhost:6379/namespace, or redis://localhost:6379/0/namespace, where namespace is the key name of a Hash data structure.
kv_get("redis://user:password@host:6379/db/namespace")
namespace is a NATS URL: nats://localhost:4222/bucket_name; please use nats kv add bucket_name to create the bucket.
kv_get("nats://localhost:4222/bucket_name/nick")
http_get(url, headers), http_post(url, body, headers).
You can omit headers if they are not required.
Response array:
- status: such as 200 or 404. 0 means a network error.
- text: response as text
- HTTP header names: response headers, such as Content-Type
Attention: if body is JSON text that starts with { or [ and ends with } or ], Content-Type = application/json will be added as an HTTP header by default.
send_mail(from, to, subject, body) sends email by REST API, and to could be multiple emails separated by ,.
Environment variables for email sending:
- MLSN_API_KEY: API key for MailerSend
- RESEND_API_KEY: API key for Resend
smtp_send(smtp_url, from, to, subject, body): send email by SMTP
SMTP URL format:
- SMTP basic: smtp://localhost:1025
- SMTP with auth: smtp://user:password@host:25
- SMTP + TLS: smtps://user:password@host:465
s3_get(bucket, object_name): get an object; the return value is text.
s3_put(bucket, object_name, body): put an object; body is text.
Environment variables for S3 access:
- S3_ENDPOINT
- S3_ACCESS_KEY_ID
- S3_ACCESS_KEY_SECRET
- S3_REGION
Publish events to NATS: publish("nats://host:4222/topic", body)
Publish events to MQTT: publish("mqtt://servername:1883/topic", body)
- CloudFlare Pub/Sub: mqtts://BROKER_TOKEN@YOUR-BROKER.YOUR-NAMESPACE.cloudflarepubsub.com/topic
local_ip() # 192.168.1.3
url: the db path, such as sqlite.db
sqlite_query("sqlite.db", "select nick,email,age from user"), sqlite_query("sqlite.db", "select nick,email,age from user")[1]
sqlite_execute("sqlite.db", "update users set nick ='demo' where id = 1")
libSQL url: ./demo.db, http://127.0.0.1:8080 or libsql://db-name-your-name.turso.io?authToken=xxxx.
libsql_query(url, "select id, email from users"), libsql_execute(url,"update users set nick ='demo' where id = 1")
Tip: if you don't want to put authToken in the url, for example libsql://db-name-your-name.turso.io, you can set the LIBSQL_AUTH_TOKEN environment variable.
url: postgresql://postgres:postgres@localhost/db_name
pg_query(url, "select id, name from people"), pg_execute(url,"update users set nick ='demo' where id = 1")
url: mysql://root:123456@localhost:3306/test
mysql_query(url, "select id, name from people"), mysql_execute(url,"update users set nick ='demo' where id = 1")
UTC by default.
Functions:
- systime: current Unix time
- strftime: strftime("%Y-%m-%dT%H:%M:%S"), see https://docs.rs/chrono/latest/chrono/format/strftime/index.html
- mktime: mktime("2021-05-01T01:17:02"), see https://docs.rs/dateparser/latest/dateparser/#accepted-date-formats
Parse date time text into an array: datetime(), datetime(1621530000)["year"], datetime("2020-02-02")["year"]
datetime text format:
- systemd.time: https://www.freedesktop.org/software/systemd/man/latest/systemd.time.html
- dateparser: https://github.com/waltzofpearls/dateparser
date/time array:
- year: 2024
- month: 1, 2
- monthday: 24
- hour
- minute
- second
- yearday
- weekday
- hour: 1-24
- althour: 1-12
whoami(), os(), arch(), os_family(), pwd(), user_home()
getenv() is a function for ENVIRON["NAME"] with a default value: getenv("NAME", "default value").
Attention: zawk reads the .env file and injects its entries as environment variables by default.
system2(cmd) is different from system(cmd): it returns an array with code, stdout and stderr.
To capture the output of a command, you can use getline and a pipe:
function get_output(command, lines,    n, line) {
    # awk functions cannot return arrays, so fill the caller's array and return the line count
    n = 0
    while ((command | getline line) > 0) {
        lines[++n] = line
    }
    close(command)
    return n
}
With the new system2 function, you can get the output directly:
result = system2("curl ifconfig.me")
print result["stdout"]
Attention: if you don't want to capture the output, you can use the system function.
- read a file into text: read_all(file_path), read_all("https://example.com/text.gz")
- write text into a file: write_all(file_path, text). The file is replaced if it exists.
Tips: the read_all function uses OneIO, so remote sources (https or ftp) and compression (gz, bz, lz, xz) are supported.
Read a config file into StrStrMap: read_config("tests/demo.ini").
Currently only *.ini and *.properties are supported.
Tips: zawk will load .env as environment variables automatically if it exists in the current directory.
Please visit: https://www.gnu.org/software/gawk/manual/html_node/Getline.html and http://awk.freeshell.org/AllAboutGetline
- dump: var_dump(name)
- logging: log_debug(msg), log_info(), log_warn(), log_error()
Attention: dump/logging output is directed to stderr to avoid polluting stdout.
isarray(x), typeof(x), see https://www.gnu.org/software/gawk/manual/html_node/Type-Functions.html
version(): returns the zawk version
Thanks to:
- DuckDB Functions: https://duckdb.org/docs/sql/functions/overview
- ClickHouse String Functions: https://clickhouse.com/docs/en/sql-reference/functions/string-functions
- Golang stdlib: https://pkg.go.dev/std
- Rust stdlib: https://doc.rust-lang.org/std/
- Deno stdlib: https://deno.land/std
- PHP stdlib: https://www.php.net/manual/en/book.strings.php
- sttr: https://github.com/abhimanyu003/sttr
- Java: