Announcing Full-Text Indexes

FEATURE RELEASE
5 min read

Today, we are proud to announce the initial release of Full-Text indexes! This is very important for us, as Dolt is a drop-in replacement for MySQL, that gives you all the versioning features that you're used to from Git. As we're developing our own custom SQL engine (currently based on MySQL, but PostgreSQL support may not be far away) to support our versioning functionality, this means that we don't get any features for free. Happily, we've heard our customer's desires, and we've executed!

What are Full-Text indexes?

Full-Text indexes provide an efficient way to search through text. In its default form, you declare a FULLTEXT index over one or more text columns, and Dolt breaks up each document into words. The Full-Text search engine may then be invoked using the MATCH ... AGAINST ... expression, which will then match any given words to their parent documents. This can be far more powerful than a regex-based search, as words do not have to be in order, and also the text is normalized (according to MySQL's rules on normalization).

This search efficiency is especially evident when there are a large number of documents, or the documents themselves are very large. The tradeoff is that we pay a penalty during insertion, as we break each document down into its individual words.

How to use Full-Text indexes

Let's walk through a short example, just to see how they work in action. Feel free to download Dolt and follow along! Our installation process is the easiest in the business, as we're just a single binary!

First, we'll create our repository:

$ mkdir example

$ cd example

$ dolt init
Successfully initialized dolt data repository.

Next, let's create a table, with the FULLTEXT index declared during the table's creation. We'll then insert a few rows.

$ dolt sql <<SQL
> CREATE TABLE articles (pk BIGINT PRIMARY KEY, title VARCHAR(200), body VARCHAR(200), FULLTEXT (title, body)) COLLATE utf8mb4_0900_ai_ci;
> INSERT INTO articles VALUES 
> (1, 'MySQL Tutorial','DBMS stands for DataBase ...'), 
> (2, 'How To Use MySQL Well','After you went through a ...'), 
> (3, 'Optimizing MySQL','In this tutorial we will show ...'), 
> (4, '1001 MySQL Tricks','1. Never run mysqld as root. 2. ...'), 
> (5, 'MySQL vs. YourSQL','In the following database comparison ...'), 
> (6, 'MySQL Security','When configured properly, MySQL ...');
> SQL
Query OK, 6 rows affected (0.00 sec)

Now, to make use of a Full-Text index, you need to use the MATCH ... AGAINST ... expression. If this is used after the WHERE clause, then it will filter results and return only those documents that match the string given after AGAINST. For this example, we'll search our index (defined on columns title and body) for the string tutorial.

$ dolt sql -q "SELECT * FROM articles WHERE MATCH(title, body) AGAINST ('tutorial');"
+----+------------------+-----------------------------------+
| pk | title            | body                              |
+----+------------------+-----------------------------------+
| 1  | MySQL Tutorial   | DBMS stands for DataBase ...      |
| 3  | Optimizing MySQL | In this tutorial we will show ... |
+----+------------------+-----------------------------------+

You can also use the MATCH ... AGAINST ... expression like any other, so let's return a value from our SELECT statement. The value returned is called the "relevancy", which is a sort of magic number that states how "good" of a match the document is. This is another key differentiator to a basic regex search, as regex will only give two binary values: match or no match. It is worth noting that our relevancy values are subject to change as we tweak the algorithm.

$ dolt sql -q "SELECT MATCH(title, body) AGAINST ('tutorial'), pk, title, body FROM articles;"
+-----------------------------------------+----+-----------------------+------------------------------------------+
| MATCH(title, body) AGAINST ('tutorial') | pk | title                 | body                                     |
+-----------------------------------------+----+-----------------------+------------------------------------------+
| 0.79585797                              | 1  | MySQL Tutorial        | DBMS stands for DataBase ...             |
| 0                                       | 2  | How To Use MySQL Well | After you went through a ...             |
| 0.79585797                              | 3  | Optimizing MySQL      | In this tutorial we will show ...        |
| 0                                       | 4  | 1001 MySQL Tricks     | 1. Never run mysqld as root. 2. ...      |
| 0                                       | 5  | MySQL vs. YourSQL     | In the following database comparison ... |
| 0                                       | 6  | MySQL Security        | When configured properly, MySQL ...      |
+-----------------------------------------+----+-----------------------+------------------------------------------+

How about using some Version Control?

Of course, Dolt isn't just a MySQL replacement, we are a fully versioned database! Let's build off of our previous example, and create a branch that we'll call other. We also need to commit our changes (we default to the main branch).

$ dolt add -A

$ dolt commit -m "Initial commit"
commit 5gs6916ul0qmiajesvag5cfc5s6qe1kg (HEAD -> main)
Author: Daylon Wilkins <daylon@dolthub.com>
Date:  Wed Jul 26 12:00:00 -0700 2023

        Initial commit


$ dolt branch other

Before we switch to our other branch, let's add a new row to our main branch.

$ dolt sql -q "INSERT INTO articles VALUES (7, 'Versioning Features', 'Versioning is something that has ...');"
Query OK, 1 row affected (0.00 sec)

$ dolt add -A

$ dolt commit -m "Added an article"
commit 52j66t4qd6urnd3ls9380es5qtadmhdg (HEAD -> main)
Author: Daylon Wilkins <daylon@dolthub.com>
Date:  Wed Jul 26 12:00:01 -0700 2023

        Added an article

Now, we'll switch to our other branch, and add a another new row.

$ dolt checkout other
Switched to branch 'other'

$ dolt sql -q "INSERT INTO articles VALUES (8, 'Versioned Data', 'Applying versioning to data is ...');"
Query OK, 1 row affected (0.00 sec)

$ dolt add -A

$ dolt commit -m "Added another article"
commit 5uj1gqjh6g15oqbt8cfp96vedljrm02h (HEAD -> other)
Author: Daylon Wilkins <daylon@dolthub.com>
Date:  Wed Jul 26 12:00:02 -0700 2023

        Added another article

Finally, we'll switch back to our main branch, and merge our other branch into main, which merges both the table data, along with merging the Full-Text index data. We'll then search for a new word, 'versioning', which will return the new results from our main branch and the other branch!

$ dolt checkout main
Switched to branch 'main'

$ dolt merge other
Updating htfqt7598q1cqj2gt8dfgv8l5ho569it..5uj1gqjh6g15oqbt8cfp96vedljrm02h
commit htfqt7598q1cqj2gt8dfgv8l5ho569it (HEAD -> main)
Merge: 52j66t4qd6urnd3ls9380es5qtadmhdg 5uj1gqjh6g15oqbt8cfp96vedljrm02h
Author: Daylon Wilkins <daylon@dolthub.com>
Date:  Wed Jul 26 12:00:03 -0700 2023

        Merge branch 'other' into main

articles | 1 +
1 tables changed, 1 rows added(+), 0 rows modified(*), 0 rows deleted(-)

$ dolt sql -q "SELECT * FROM articles WHERE MATCH(title, body) AGAINST ('versioning');"
+----+---------------------+--------------------------------------+
| pk | title               | body                                 |
+----+---------------------+--------------------------------------+
| 7  | Versioning Features | Versioning is something that has ... |
| 8  | Versioned Data      | Applying versioning to data is ...   |
+----+---------------------+--------------------------------------+

Of course, this is all just the tip of the iceberg. Feel free to play around and explore how much power lies in Dolt!

What's Missing?

We've only implemented the baseline functionality, as Full-Text indexes are very expansive. A few missing features are:

  • Other search modifiers
  • Different text parsers
  • Stopword support
  • Minimum word length adjustments
  • Phrase matching

In addition, we have additional improvements to make regarding optimization and ergonomics. These will be added in the coming weeks, so stay tuned! If any of the aforementioned missing features are something that you need, then submit an issue and let us know!

Conclusion

Implementing just this amount of support for Full-Text has been a herculean task, and there is much more work to be done! If you've enjoyed this blog, feel free to check out some of our other blogs! To stay up to date, you can follow us on Twitter, or you can chat directly with us through Discord.

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.