Uncovering MySQL's Gotchas

5 min read

Dolt is Git for data. Git versions files, Dolt versions tables. Dolt comes with a SQL engine built in, which lets you run SQL queries against any version of the data you've committed. Our goal is to become fully SQL compliant and compatible with MySQL's dialect. To this end, we've contributed to sqllogictest and adopted the go-mysql-server project. We've made significant progress on our journey to full SQL correctness. As of publishing time, we're 92% correct according to sqllogictest. Along the way we've learned a lot about MySQL's idiosyncrasies and found some pretty inexplicable behavior. Today we're going to share some of our favorites.

MySQL: the Javascript of Databases

People love the hate. I've watched endless software talks on brilliant technologies given by top industry experts. My favorite software talk? "Wat", the JS-roasting classic by Gary Bernhardt. Sure, self-driving cars and planet-scale data systems are cool, but still not as interesting as

% node
> {is: "this"} + ["a", "number?"]    // object + array
NaN

In the words of one of DoltHub's founders, "MySQL is the Javascript of databases". It's got its rough spots, but it's a known entity. Let's take a look at some of those rough spots.

GOTO FAIL

MySQL has no problem letting you shoot yourself in the foot.

CREATE TABLE default_fail (
    pk int NOT NULL PRIMARY KEY DEFAULT (NULL)
);

Much like {} + [] this example is something you're not likely to run into, on the other hand why doesn't this just error?

Foreign Keys, like all database constraints, are crucial to ensuring data integrity. That's why you probably shouldn't be able to link these two columns with disparate data domains:

CREATE TABLE parent (
    id ENUM('a','b','c') PRIMARY KEY
);
CREATE TABLE child (
    parent_id ENUM('x','y','z') PRIMARY KEY,
    FOREIGN KEY (parent_id) REFERENCES parent(id)
);

Things are complicated even further by MySQL's implementation of enums:

INSERT INTO parent (id) VALUES ('a');
INSERT INTO child (parent_id) VALUES ('x');

This actually doesn't fail because enums are translated to integers and in this instance both 'a' and 'x' evaluate to 1. Gotcha.

The first two examples can really be attributed to user error. If you're going to mess with the schema of a database, you should know enough not to make those mistakes. But it turns out there are even more interesting ways to break relations between MySQL objects. Consider this table, view, and insert trigger:

mysql> CREATE TABLE users (
    id int PRIMARY KEY,
    level int
);
mysql> CREATE VIEW cohorts AS
    SELECT level, count(level) FROM users
        GROUP BY level ORDER BY level ASC;
mysql> CREATE TRIGGER start_level_zero
    BEFORE INSERT ON users
    FOR EACH ROW
        SET NEW.level = 0;

Now say we want to change the name of the column level...

mysql> ALTER TABLE users RENAME COLUMN level TO status;
    Query OK, 0 rows affected (0.01 sec)
    Records: 0  Duplicates: 0  Warnings: 0

mysql> INSERT INTO users VALUES (4,0);
    ERROR 1054 (42S22): Unknown column 'level' in 'NEW'
mysql> SELECT  * FROM cohorts;
    ERROR 1356 (HY000): View 'cohorts' references invalid table(s) or column(s) or function(s) or definer/invoker of view lack rights to use them

Broken! The column rename left the view and the trigger statement referencing the old column name. Everything that depends on that column name must be updated. Unknown column 'level' in 'NEW' doesn't clue you in that your trigger is broken, and until you figure it out you won't be able to make any inserts to users.

Now let's take a look at MySQL's type system, which I have to say is especially Javascripty. Our table definition is:

CREATE TABLE types_table (
    pk int primary key, s varchar(20), b bool
);
INSERT INTO types_table VALUES
    (0, "abc", true),
    (1, "xyz", false);

Suppose we made wanted to find rows with s = "abc", but we made a mistake and typed b instead of s.


mysql> SELECT * FROM types_table WHERE b = "abc"  -- heck it, we'll do it live
+----+------+------+
| pk | s    | b    |
+----+------+------+
|  1 | xyz  |    0 |
+----+------+------+
1 row in set, 1 warning (0.00 sec)

MySQL is happy to perform type coercion, interpret "abc" as false, and execute the query. It even spits out 1 warning at the end, which is mea culpa if I've ever seen it.

Compatibility Tradeoffs

Understanding and modeling MySQL's behaviors is important because it lets us leverage the massive MySQL ecosystem. Our SQL parser and server are based on Vitess and support the standard MySQL wire protocol. This means we can support a wide range of sql integrations with minimal effort. However, as some of the examples above showed, following along bit for bit isn't necessarily a good idea. Datetime conversion is one example of how we choose to deviate from the MySQL standard. The following query casts a datetime to an integer.

-- MySQL
SELECT CONVERT(CONVERT("2020-08-07 06:05:04", DATETIME), SIGNED); -- -> 20200807060504

MySQL chooses to use ISO8601 which have the format YYYYMMDDHHMMSS. Dolt, on the other hand, uses standard Unix timestamps for this cast.

-- Dolt
SELECT CONVERT(CONVERT("2020-08-07 06:05:04", DATETIME), SIGNED); -- -> 1596780304

Choosing where and how to deviate from the standard isn't an exact science, it involves some estimation and a lot of listening to your users. Generally we seek to maximize compatibility while also creating intuitive behaviors.

Conclusion

Building a relational database is a lot of work. Reaching 100% compatibility with an industry standard like MySQL adds another layer of difficulty. SQL is the lingua franca of data analysis and Git is the same for version control. We believe that your data deserves the power of both, and that migrating to Dolt should be a seamless transition. Our journey is ongoing, but we hope you'll come along for the ride. As always, let us know what you think about Dolt and what we should build next.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt