This document describes the style that Prettier plugin SQL-CST uses for formatting SQL.
- ✅ - decided and implemented
- ✔️ - decided, but not yet implemented
- ❓ - undecided
- 🔧 - can be disabled by setting
sqlCanonicalSyntax: false
This is based on the defaults used by Prettier:
- ✅ 2 spaces for indentation
- ✅ UNIX
\n
for newlines - ✅ Final newline at the end of a file
- ✅ Preserve single empty line between statements.
- ✅ UPPERCASE all keywords. Rule_L010, Rule_L040
- ✅ UPPERCASE all type names. Rule_L063
- ✅ preserve case of function names. Rule_L030
- ✅ preserve case of all other identifiers. Rule_L014
Keywords include literal values like NULL
, TRUE
, FALSE
,
but don't include function names that SQL dialects might themselves label as keywords (like sum()
, count()
),
notably the parameter-less builtin functions like current_date
aren't considered to be keywords.
At the same time CAST(x AS type)
is considered to be language construct
and is therefore formatted in uppercase.
- ✅ Finish all statements with a semicolon. Rule_L052
Undecided:
- ❓ Should single statement without a trailing semicolon be supported?
- ✔️ Use single quotes for strings if possible.
- ✔️ Use different quote style when string contains single quotes or is multi-line.
This might need to be configurable, though the support of single-quotes for strings should IMHO be available in all dialects.
- ✔️ Don't unnecessarily quote an identifier. Rule_L059
- ✔️ Always quote identifiers that are keywords. Rule_L029
For quoting style there are various options here:
- ❓
Default to double quotes (SQL standard)Not possible in some dialects like BigQuery. - ❓ Default to the preferred/default quoting style of a dialect, like backticks in MySQL and double quotes in SQLite.
- ❓ Preserve existing quoting style.
This definitely needs to be configurable.
- ✅ Surround binary operators with whitespace:
1 + 2
- ✅ Surround unary keyword-operators with whitespace:
NOT foo
- ✅ Don't add whitespace to unary punctuation-operators:
-price
- ✅ 🔧 Use
!=
instead of<>
for "not equal to" comparisons. Rule_L061 - ✔️ Add linebreak before binary operator: Rule_L007
WHERE
first_name = 'John'
AND age_bracket BETWEEN 12 AND 18
AND income > 1000
similarly:
SELECT
sqrt(normalized_power)
+ interval_duration
+ rest_duration
+ average_speed
- ✅ No whitespace before comma
- ✅ Single space after comma for inline list. Rule_L008
- ✅ Newline after comma for multiline list
- ✅ Forbid trailing commas in select clause. Rule_L038
- ✅ No space between function name and arguments.
my_func(1, 2, 3)
Rule_L017 - ✅ No space between single-word type name and arguments.
VARCHAR(100)
- ✅ Space between multi-word type name and arguments.
UNSIGNED NUMERIC (10, 5)
- ✅ Space after
USING
inJOIN
:JOIN foo USING (col1)
- ✅ Space after
OPTIONS
in BigQuery:OPTIONS (foo = bar)
- ✅ Space between table/view/CTE name and columns list:
CREATE TABLE foo (
id INT
);
CREATE VIEW my_view (col1, col2) AS
SELECT 1, 2;
WITH my_cte (c1, c2) AS (SELECT 1, 2)
SELECT * FROM my_cte;
- ✅ Discard unnecessary nested parenthesis.
((1 + 2)) * 3
-->(1 + 2) * 3
- ✅ Discard parenthesis around function arguments.
count((id))
-->count(id)
- ✔️ Top-level statements should not be wrapped in brackets. Rule_L053
- ✔️ No parenthesis after
DISTINCT
. Rule_L015
- ✅ Ensure space at the start of a line-comment:
--comment
-->-- comment
Often SQL allows for syntax variations. For example ALTER COLUMN col_name
or just ALTER col_name
.
It would be nice to enforce one consistent way of expressing this.
We should take into account:
- Which variant is preferred by the SQL standard
- Which variant is most common across various SQL dialects
Some candidates:
- ✅ 🔧
INSERT
->INSERT INTO
(standard syntax) - ✅ 🔧
REPLACE
->REPLACE INTO
(non-standard. More widely supported) - ✅ 🔧
DELETE
->DELETE FROM
(standard syntax) - ✅ 🔧
TRUNCATE
->TRUNCATE TABLE
(standard syntax.) - ✅ 🔧
MERGE
->MERGE INTO
(standard syntax.) - ✅ 🔧
CREATE TEMP TABLE
->CREATE TEMPORARY TABLE
(standard syntax) ADD
->ADD COLUMN
(more commonly supported)DROP
->DROP COLUMN
(more commonly supported)ALTER
->ALTER COLUMN
(more commonly supported)- ✅ 🔧
RENAME new_table_name
->RENAME TO new_table_name
(more commonly supported)
A related case is the use of non-standard function names where standard alternatives are available, like:
NOW()
->CURRENT_TIMESTAMP()
CURDATE()
->CURRENT_DATE()
Another case is with some syntax where some keywords are effectively redundant, as it specifies the default behavior. For example:
UNION DISTINCT
is the same asUNION
. Some dialects only supportUNION
andUNION ALL
, leaving out theDISTINCT
. SQL standard includes bothALL
andDISTINCT
.SELECT ALL
is the same asSELECT
.ORDER BY foo ASC
is the same asORDER BY foo
.- PostgreSQL
table_name *
is the same as justtable_name
. The*
indicates that inherited tables are to be included, which is the default.
In these cases the additional redundant syntax can be useful to clarify the intention of the code. So the pretty-printer probably shouldn't eliminate such redundant syntax.
- ✅ 🔧 Convert deprecated
:=
named argument syntax to standard=>
. - ❓
LIMIT ALL
-> (it's the same as specifying no limit)
- ✅ 🔧 Convert
SELECT DISTINCTROW
toSELECT DISTINCT
. - ✅ 🔧 Convert
RENAME AS tbl
toRENAME TO tbl
. - ✔️ Convert
CREATE TABLE foo (id INT KEY)
toCREATE TABLE foo (id INT PRIMARY KEY)
- ✅ 🔧 Convert
&&
,||
toAND
,OR
.
- Rule_L022: Blank line after CTE.
- Rule_L033:
UNION [DISTINCT|ALL]
is preferred over justUNION
. - Rule_L034: Select wildcards then simple targets before calculations and aggregates.
- Rule_L036: Select targets should be on a new line unless there is only one select target.
- Rule_L037: Ambiguous ordering directions for columns in order by clause.
- Rule_L047: Use consistent syntax to express "count number of rows", e.g.
count(*)
instead ofcount(1)
- Rule_L049: Comparisons with NULL should use
IS
orIS NOT
. - Rule_L051: Join clauses should be fully qualified, e.g.
INNER JOIN
instead of plainJOIN
. - Rule_L058: Nested CASE statement in ELSE clause could be flattened.
- Rule_L067: Enforce consistent type casting style, e.g. use
CAST(10 AS TEXT)
instead of10::TEXT
orCONVERT(10, TEXT)