We take a holistic approach towards best practices at 173Tech. Data and analytics are here to guide us in our daily decisions. It needs to be clearly defined, trusted, and easily digestible for everyone.
We build quality and thoughtfulness into all aspects of our processes. It is the key to a successful data strategy.
In this post, we would like to share our SQL formatting guidelines. We standardize because:
– Codes should always be peer-reviewed before release. Clearly formatted scripts cut downtime required to comprehend someone’s work.
– It helps spot and eliminates human errors, hence increasing accuracy and trust in your data solutions.
– It facilitates smooth and efficient handover. The project evolves, people move on. Well-documented scripts minimize disruptions.
– It spreads knowledge. Commenting on a data ‘trick’ costs little but could be greatly beneficial to others. Sometimes even pros can learn from other pros!
1. Write all SQL keywords and functions in capital letters. It helps to identify immediately the main components of the query.
2. Align top-level clauses (SELECT, FROM, WHERE, HAVING, GROUP BY) with river blanks.
3. Do not write more than one of the above clauses in a single line.
4. Keep the tabulation consistent throughout the query.
5. When selecting multiple columns, put each in a new line starting with spaces, comma, space then column name.
6. Use ‘AS’ for column alias.
7. Prefix column selection with table alias, e.g. t.user_id.
Common Table Expression (CTE)
1. Use CTEs instead of nested subquery.
2. Indent CTEs based on river of blanks with the “WITH” statement.
3. Specify the column names in the outermost query, rather than using “*”. SELECT * is acceptable for WHERE NOT EXISTS clauses or intermediate queries (eg: CTEs).
WITH registration AS (SELECT ... FROM ...) , activity AS (SELECT ... FROM ...) SELECT ... FROM registration
Structure and Comments
– Use comments to explain where code is complicated or not obvious or involves special scenarios or business logic.
– Avoid empty lines with only spaces.
– Use meaningful CTE names, e.g. registrations instead of table1.
– Use meaningful column alias, especially with aggregate functions.
– When possible, use common prefixes and suffixes. Examples:
On long lines (eg: CASE WHEN) use a logical point for wrapping the line manually. Some options:
a) Align the WHEN keywords
SELECT CASE WHEN t.column = 'a very long value' THEN 1 WHEN t.column = 'another very long value' THEN 2 END AS column_alias
b) Split on operator
SELECT first_picking_cost_usd + extra_picking_cost_usd * (n_items - 1) + packaging_cost_usd + shipping_cost_usd AS fulfilment_cost_usd FROM orders
c) One line per parameter to a function
SELECT COALESCE(t.a_very_looooooooooooooooooong_value, t.another_very_looooooooooooooooooong_value) AS long_values
The Open Data Science community is passionate and diverse, and we always welcome contributions from data science professionals! All of the articles under this profile are from our community, with individual authors mentioned in the text itself.