How many times has one switched between Python to Java, resulting in constant backspaces to correct missing semicolons and other syntax idiosyncrasies to appease stubborn compilers? As with any language, SQL implementations also have their own quirks and tricks that can lead to irritating troubleshooting when syntax differences lead to undefined behavior or inaccuracies, yet the nature of data science demands a competency in switching between database management systems to meet project requirements.
To simplify this process, I’ve compiled a list of some of the more peculiar and important differences between MySQL, PostgreSQL, and SQLite, all of which are widely used relational database management systems (RDBMS).
But first, what is SQL? Originally developed by IBM Corporation, Inc. based on a model for RDBMS by Dr. E. F. Codd’s paper in 1970, SQL, or Structured Query Language, was later commercialized by Oracle in 1979. It has now become the standard for database systems over the past few decades. The American National Standards Institute (ANSI) and the International Standards Organization (ISO) later published standards for the language for widespread reference. Since the creation of SQL, various RDBMS have implemented their own versions of the language in their systems, leading to a wide range of systems to choose from like MySQL or PostgreSQL with a small learning curve to each.
Unlike many other database systems, SQLite does not use a separate server but rather stores the complete database in a file. Since 2000, an international group of developers has offered support for it; however, SQLite remains in the public domain. The language is an effective option for projects with embedded devices, application file formats, websites with low to moderate traffic, data analysis and others. You can find it being used at Airbus for its flight software for the A350 XWB aircrafts or in Adobe for its application file format in Photoshop Lightroom. All of the Python versions since Python 2.5 also include SQLite.
SQLite Tips to Remember:
- There is no regular “join” in SQLite. Instead, you must explicitly state either a cross, inner or outer join.
- Unlike stricter languages, SQLite has flexible typing. Due to this, the system fixes some data entry errors. For example, if an integer were imported as ‘452’, then SQLite would convert this value to an integer. In addition to this, SQLite doesn’t truncate strings that are longer than what is defined.
- The language does not have booleans as a built-in data type. The ability to use ‘TRUE’ or ‘FALSE’ as aliases for 1 or 0 was just adopted in version 3.23.0 that was released in April.
- SQLite does not have an explicit DATETIME data type. Instead, it interprets dates from data that is inserted as a text in ISO-8610 format, as an integer in Unix time or as a real value in Julian Day number. The built-in date and time functions can then recognize and use dates and times stored in one of these formats.
- While SQL sets the standard for string literals being used in single quotes and identifiers being used in double quotes, SQLite also recognizes literals in double quotes if there is no corresponding identifier.
Created in the last 90’s, co-founders David Azmark and Michael ‘Monty’ Widenius saw MySQL grow to be one of the most popular database management systems. From its start, the system has been offered as open source under the GNU General Public License. MySQL was later acquired by Sun Microsystems, Inc. in 2008 before Oracle acquired Sun in 2012. MySQL is still maintained and owned by Oracle, which offers both free and enterprise versions of the system. Clients use MySQL across a variety of industries with customers like Netflix, NASA, Tesla, and Airbnb.
MySQL Tips to Remember:
- Schemas and databases in MySQL are pretty much synonymous. You could write both ‘CREATE SCHEMA’ and ‘CREATE DATABASE’ to define a new database.
- MySQL uses backticks instead of quotes for identifiers. It is necessary to use backticks when the identifier has whitespace or reserved words.
- The language does not have a full outer join, which allows one to pull all information from two tables with NULL values being returned where there is no match
- While subqueries can be mimicked with temporary tables, MySQL still does not allow subqueries to access data from previous queries.
The predecessor of PostgreSQL, the POSTGRES package developed at the University of California at Berkeley, started in 1986 through sponsorship by the Defense Advanced Research Projects Agency, the Army Research Office, the National Science Foundation and ESL, Inc. In 1994, an open source version of POSTGRES with a SQL interpreter was then released, which eventually took its current name of PostgreSQL in 1996. You can find companies like Uber, Netflix, Instagram and Spotify using PostgreSQL.
PostgreSQL Tips to Remember:
- When using aliases in nested queries or renaming fields, “as” is needed in between the column and the name.
- Each PostgreSQL database also has a schema which contains its tables. Queries cannot be made in PostgreSQL across different databases. One can have a database cluster where users are shared; however, data is not.
- Each database has a public schema where tables are stored without having a schema associated. When using PostGIS, geometry metadata is stored in the public schema.