Announcing DoltHub SQL API

FEATURE RELEASE
4 min read

Dolt is Git for data, a relational database built to create, publish and consume datasets. DoltHub hosts a growing collection of public open datasets stored as Dolt databases. DoltHub allows you explore data through its SQL query interface. We're excited to announce that you can now query DoltHub datasets via the SQL API.

How it works

DoltHub's SQL API exposes repository data via an HTTP endpoint at www.dolthub.com/api/v1alpha1. Query parameters are encoded in the URL in the form of:

https://www.dolthub.com/api/v1alpha1/<owner_name>/<repo_name>/<branch_name>?q=<sql_query>

The <sql_query> is specified as a query string keyed by "q". Any valid SQL read query is acceptable.

An Example

Basketball is back y'all! The NBA bubble is fully inflated and sports fans out there are getting a much appreciated distraction from the day-to-day.

It's Dame Time!

The NBA hit pause on the regular season all the way back in March, so let's check out DoltHub's nba-players dataset and figure out where we left off. We'll use Python's requests library as a simple way to hit the API.

>>> import requests
>>> owner, repo = 'dolthub', 'nba-players'
>>> res = requests.get('https://dolthub.com/api/v1alpha1/{}/{}'.format(owner, repo))
>>> res.json()
{
     'query_execution_status': 'Success',
     'query_execution_message': '',
     'repository_owner': 'dolthub',
     'repository_name': 'nba-players',
     'commit_ref': 'master',
     'sql_query': 'SHOW TABLES;',
     'schema': [
            {'columnName': 'Table','columnType': 'String','isPrimaryKey': False}
        ],
     'rows': [
            {'Table': 'career_totals_allstar'},
            {'Table': 'career_totals_post_season'},
            {'Table': 'career_totals_regular_season'},
            ...
        ]
}

The response gives some general metadata about the query as well as the schema of the query and its result rows. A couple things to note here: <branch_name> and <sql_query> are optional parameters. If they're not specified they default to 'master' and 'SHOW TABLES' respectively. The nba-players dataset includes all of this season's stat totals up to the March stoppage. Let's find out who's leading the league in scoring.

>>> owner, repo, branch = 'dolthub', 'nba-players', 'master'
>>> query = '''SELECT tot.pts / tot.gp as ppg, ply.full_name as player
    FROM `season_totals_regular_season` as tot
    JOIN `players` as ply ON tot.player_id = ply.id
    WHERE tot.season_id = '2019-20'
    ORDER BY ppg DESC LIMIT 10;'''
>>> res = requests.get('https://www.dolthub.com/api/v1alpha1/{}/{}/{}'.format(owner, repo, branch), params={'q': query})
>>> res.json()
{
    'query_execution_status': 'Success',
    'query_execution_message': '',
    'repository_owner': 'dolthub',
    'repository_name': 'nba-players',
    'commit_ref': 'master',
    'sql_query': "SELECT tot.pts / tot.gp as ppg, ply.full_name as player\n\tFROM `season_totals_regular_season` as tot \n\tJOIN `players` as ply ON tot.player_id = ply.id \n\tWHERE tot.season_id = '2019-20' \n\tORDER BY ppg DESC LIMIT 10;",
    'schema': [
    	{'columnName': 'ppg','columnType': 'Int','isPrimaryKey': False},
    	{'columnName': 'player','columnType': 'String','isPrimaryKey': False}],
    'rows': [
    	{'ppg': '34', 'player': 'James Harden'},
    	{'ppg': '30', 'player': 'Bradley Beal'},
    	{'ppg': '29', 'player': 'Giannis Antetokounmpo'},
    	{'ppg': '29', 'player': 'Trae Young'},
    	{'ppg': '28', 'player': 'Damian Lillard'},
    	{'ppg': '28', 'player': 'Luka Doncic'},
    	{'ppg': '27', 'player': 'Russell Westbrook'},
    	{'ppg': '27', 'player': 'Kyrie Irving'},
    	{'ppg': '26', 'player': 'Kawhi Leonard'},
    	{'ppg': '26', 'player': 'Anthony Davis'}
    ]
}

This is a great example of the power of the SQL API. You can execute joins against that API rather than having to fetch all the data and compute the matches yourself.

Why Harden is a distant third in the MVP race... we'll never know. Speaking of Harden getting robbed, back in December this happened. Harden's dunk wasn't counted and could not be overturned with video replay. (The Rockets went on to lose in double OT.) Thankfully DoltHub has corrected the record on a separate branch within this repo. You can choose to believe either version of history:

>>> query = "SELECT id FROM `players` where last_name='Harden'"
>>> res = requests.get('http://www.dolthub.com/api/v1alpha1/{}/{}'.format(owner, repo), params={'q': query})
>>> res.json()
{
    ...
    'rows': [{'id': '201935'}]
}
>>> query = '''SELECT pts FROM `season_totals_regular_season`
    WHERE player_id = 201935 AND season_id = '2019-20';'''
>>> branch = 'master'
>>> res = requests.get('http://www.dolthub.com/api/v1alpha1/{}/{}/{}'.format(owner, repo, branch), params={'q': query})
>>> res.json()
{
    ...
    'rows': [{'pts': '2096'}]}
}
>>> branch = 'james-harden-basket-counts'
>>> res = requests.get('http://www.dolthub.com/api/v1alpha1/{}/{}/{}'.format(owner, repo, branch), params={'q': query})
>>> res.json()
{
    ...
    'rows': [{'pts': '2098'}]
}

And because much of Dolt's functionality is exposed via SQL, you can even query the commit log over the API:

>>> branch = 'james-harden-basket-counts'
>>> query = 'SELECT committer, message FROM dolt_log'
>>> res = requests.get('https://www.dolthub.com/api/v1alpha1/{}/{}/{}'.format(owner, repo, branch), params={'q': query})
>>> res.json()
{
    ...
    'rows': [
        {'committer': 'Tim Sehn', 'message': 'Added a field goal made as well'},
        {'committer': 'Tim Sehn', 'message': "James Harden's basket counts on Dec. 3, 2019"},
        {'committer': 'Tim Sehn', 'message': 'Relabeled columns in per36 views as per36 instead of per_game'},
        {'committer': 'Tim Sehn', 'message': 'Changed description of views'},
        {'committer': 'Tim Sehn', 'message': 'Added per36 views to README'},
        {'committer': 'Tim Sehn', 'message': 'Merged master'},
        {'committer': 'Tim Sehn', 'message': 'Added views for per36 stats'},
        {'committer': 'Andy Arthur', 'message': 'putting description back'},
        {'committer': 'Andy Arthur', 'message': 'moving player name to the front of saved query'},
        {'committer': 'Andy Arthur', 'message': 'removing where clause from save query'},
        {'committer': 'Tim Sehn', 'message': 'Added README'},
        {'committer': 'Tim Sehn', 'message': 'Added views for per game stats'},
        {'committer': 'Tim Sehn', 'message': 'Added sample query'},
        {'committer': 'Tim Sehn', 'message': 'Initial import of player data'},
        {'committer': 'Tim Sehn', 'message': 'Made appropriate columns doubles'},
        {'committer': 'Tim Sehn', 'message': 'Initial schema for this NBA player database'},
        {'committer': 'Tim Sehn', 'message': 'Initial import of teams and players'},
        {'committer': 'Tim Sehn', 'message': 'Initialize data repository'}
    ]
}

Anything is possible

Anything is possible

The flexibility of the SQL API allows you to do anything you want with your data. You can use the API to fetch data for visualization or as a simple backend for web app. The SQL API is currently in alpha. Its shape may change over time, but the basic principle of data access through SQL will remain. The API is built of top of Dolt's SQL engine powered by go-mysql-server so it's performance and compatibility are improving on a weekly basis. Checkout our SQL support on DoltHub. We're excited to see how the SQL API gets used in the wild. If you have any questions or requests, reach out to us!

SHARE

JOIN THE DATA EVOLUTION

Get started with Dolt

Or join our mailing list to get product updates.