Miscellaneous#
Misc features#
| Component | Supported | Notes and limitations |
|---|---|---|
| Information schema | ✅ | |
| Views | ✅ | |
| Window functions | 🟠 | Some functions not supported, see window function docs |
| Common table expressions (CTEs) | ✅ | |
| Stored procedures | 🟠 | Only a few statements are not yet supported, see compound statements |
| Cursors | ✅ | |
| Triggers | ✅ |
Client Compatibility#
Some MySQL features are client features, not server features. Dolt ships with a client (ie. dolt sql) and a server (dolt sql-server). The Dolt client is not as sophisticated as the mysql client. To access these features you can use the mysql client that ships with MySQL.
| Feature | Supported | Notes and limitations |
|---|---|---|
| SOURCE | ❌ | Works with Dolt via the mysql client |
| LOAD DATA LOCAL INFILE | ❌ | LOAD DATA INFILE works with the Dolt client. The LOCAL option only works with Dolt via the mysql client |
Join hints#
Dolt supports the following join hints:
| name | supported | detail |
|---|---|---|
| JOIN_ORDER( | ✅ | Join tree in scope should use the following join execution order. Must include all table names. |
| LOOKUP_JOIN( | ✅ | Use LOOKUP strategy joining two tables. |
| MERGE_JOIN( | ✅ | Use MERGE strategy joining two tables. |
| HASH_JOIN( | ✅ | Use HASH strategy joining two tables. |
| INNER_JOIN( | ✅ | Use INNER strategy joining two tables. |
| SEMI_JOIN( | ✅ | Use SEMI strategy joining two tables (for EXISTS or IN queries). |
| ANTI_JOIN( | ✅ | Use ANTI strategy joining two tables (for NOT EXISTS or NOT IN queries). |
| JOIN_FIXED_ORDER | ❌ | Join tree uses in-place table order for execution. |
| NO_ICP | ❌ | Disable indexed range scans on index using filters. |
Join hints are indicated immediately after a SELECT token in a special
comment format /*+ */. Multiple hints should be separated by spaces:
SELECT /*+ JOIN_ORDER(arg1,arg2) */ 1
SELECT /*+ JOIN_ORDER(arg1,arg2) NO_ICP */ 1
Join hints currently require a full set of valid hints for all to be applied. For example, if we have a three table join we can enforce JOIN_ORDER on its own, join strategies on their own, or both order and strategy:
SELECT /*+ JOIN_ORDER(xy,uv,ab) LOOKUP_JOIN(xy,uv) HASH_JOIN(uv,ab) */ 1
FROM xy
JOIN uv on x = u
JOIN ab on a = u;
Additional notes:
- If one hint is invalid given the execution options, no hints are applied and the engine falls back to default costing.
- Join operator hints are order-insensitive
- Join operator hints apply as long as the indicated tables are subsets of the join left/right.
Table Statistics#
ANALYZE table#
Dolt currently supports table statistics for index and join costing.
Statistics are auto-collected by default for servers, but can be manually collected by running ANALYZE TABLE <table, ...>.
Here is an example of how to initialize and observe statistics:
CREATE TABLE xy (x int primary key, y int);
INSERT INTO xy values (1,1), (2,2);
ANALYZE TABLE xy;
SELECT * from information_schema.tables;
+-------------+------------+-------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| SCHEMA_NAME | TABLE_NAME | COLUMN_NAME | HISTOGRAM |
+-------------+------------+-------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| tmp4 | xy | x | {"statistic": {"avg_size": 0, "buckets": [{"bound_count": 1, "distinct_count": 2, "mcv_counts": [1,1], "mcvs": [[1],[2]], "null_count": 0, "row_count": 2, "upper_bound": [2]}], "columns": ["x"], "created_at": "2023-11-14T11:33:32.250178-08:00", "distinct_count": 2, "null_count": 2, "qualifier": "tmp4.xy.PRIMARY", "row_count": 2, "types:": ["int"]}} |
+-------------+------------+-------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
create table horses (id int primary key, name varchar(10), key(name));
insert into horses select x, 'Steve' from (with recursive inputs(x) as (select 1 union select x+1 from inputs where x < 1000) select * from inputs) dt;
analyze table horses;
select `index`, `position`, row_count, distinct_count, columns, upper_bound, upper_bound_cnt, mcv1 from dolt_statistics;
+---------+----------+-----------+----------------+----------+-------------+-----------------+-----------+
| index | position | row_count | distinct_count | columns | upper_bound | upper_bound_cnt | mcv1 |
+---------+----------+-----------+----------------+----------+-------------+-----------------+-----------+
| primary | 0 | 344 | 344 | ["id"] | [344] | 1 | [344] |
| primary | 1 | 125 | 125 | ["id"] | [469] | 1 | [469] |
| primary | 2 | 249 | 249 | ["id"] | [718] | 1 | [718] |
| primary | 3 | 112 | 112 | ["id"] | [830] | 1 | [830] |
| primary | 4 | 170 | 170 | ["id"] | [1000] | 1 | [1000] |
| name | 5 | 260 | 1 | ["name"] | ["Steve"] | 260 | ["Steve"] |
| name | 6 | 237 | 1 | ["name"] | ["Steve"] | 237 | ["Steve"] |
| name | 7 | 137 | 1 | ["name"] | ["Steve"] | 137 | ["Steve"] |
| name | 8 | 188 | 1 | ["name"] | ["Steve"] | 188 | ["Steve"] |
| name | 9 | 178 | 1 | ["name"] | ["Steve"] | 178 | ["Steve"] |
+---------+----------+-----------+----------------+----------+-------------+-----------------+-----------+
Disable#
Some workloads, like batch imports, perform strictly better without the overhead of statistics collection. In these cases, we can explicitly stop or purge (stop + delete) statistics on a running server:
call dolt_stats_stop();
call dolt_stats_purge();
A stopped-stats server can be restarted, or have a single collection cycle performed by an operator:
call dolt_stats_restart();
call dolt_stats_once();
An environment variable can disable statistics on server reboots:
— on version 1.51.0 or higher
SET @@PERSIST.dolt_stats_enabled = 0;
— up to 1.50.x
SET @@PERSIST.dolt_stats_auto_refresh_enabled = 0;
A rebooted server with stats turned off has no reversal mechanism at the moment. All stats operations are no-ops if a server starts with the above variables set.
Auto-Refresh#
Statistics automatically update for servers by default. Stats are stored in a database in .dolt/stats separate from user data. This folder can safely be deleted offline.
Stats throughput can be lowered by raising the the dolt_stats_job_interval variable, which indicates the milliseconds of delay between processing steps. The higher the delay and more branches in a database, the longer it will take for statistic updates to materialize. High delays reduce the fraction of runtime resources diverted to managing background statistics.
Stats can be disabled with the dolt_stats_enabled=0 variable.
Stats persistence can be disabled with the dolt_stats_memory_only=1 variable.
Stats Garbage Collection#
The stats in-memory cache accumulates new histograms proportionally to the write rate and stats update rate. Periodically, an update cycle will swap the currently active histogram buckets to a new in-memory map and clear the old set.
Stats garbage collection can be disabled with the dolt_stats_gc_enabled=0 variable.
Garbage collection frequency can be tuned with the dolt_stats_gc_interval variable (default 1 hour).
Stats Controller Functions#
Dolt exposes a set of helper procedures for managing statistics collection and use:
dolt_stats_stop: clear queue and disable threaddolt_stats_restart: clear queue, refresh queue, start threaddolt_stats_purge: clear queue, refresh queue, clear cache disable threaddolt_stats_once: collect statistics once, ex: in sql-shelldolt_stats_wait: block on a full queue cycledolt_stats_gc: block waiting for a GC signaldolt_stats_flush: block waiting for a flush signaldolt_stats_info: print the current state of the stats provider (optional'-short'flag)