September Dataset Spotlight


Every month we highlight some interesting datasets on DoltHub. The focus is on new or updated datasets but sometimes we shed fresh light on a classic.

For those new to Dolt and DoltHub, Dolt is Git for data. Git versions files. Dolt versions SQL tables. DoltHub is a place on the internet to share Dolt repositories.

We think the way we share data with each other is broken, and we think Dolt is the fix. Whenever you see a link to a CSV, JSON, or XML file, you should think of Dolt. Whenever you see an API but want all the data, not just a few entries, you should think of Dolt. We are working hard to move data shared in these formats to Dolt. This blog series will update you on our progress.

US Stock Options

Link: post-no-preference/options
Contributor: post-no-preference
First Published: August 27, 2020

post-no-preference has been an active user and is publishing and updating two databases. The first is a US stock options database with greeks and volatility history. The data starts in February 2019.

Calendar Estimates

Link: post-no-preference/calendar-estimates-statements
Contributor: post-no-preference
First Published: August 27, 2020

The second post-no-preference database is an earnings calendar for US companies. In other words, it records when companies announce earnings. It also contains earnings per share estimates. We're excited to see maintained financial data on DoltHub. Thanks to post-no-preference for the contribution.

Sample Network Topology

Link: itdependsnetworks/netbox_doltdb
Contributor: itdependsnetworks
First Published: September 16, 2020

Network To Code and DoltHub are co-presenting virtually at Interop in a few weeks. The presentation shows how Dolt can be used to apply code management practices (version control, integration testing, and continuous deployment) to network configuration. This is a sample database from the presentation.

This use case inspired a user on our Dolt Discord:

"I picked this repo because, the application domain of the popular https://github.com/netbox-community/netbox application and ecosystem - tracking and planning the Infrastructure state, is a "textbook" example of how badly the world needs branching/merging in the databases.

With the infrastructure we have: chain of past states (for auditing and past incidents investigations), present states: recorded, buggy-automation-discovered and real (we fight to minimize the difference between these three), and future states: chain of future desired states (possibly with branched variants for evaluation/simulation from the architects/xyz-ops)."

Check out our Interop presentation or contact us if you want more information.

Google Open Images

Link: dolthub/open-images
Contributor: dolthub
First Published: September 11, 2019

We're highlighting this one for a couple of reasons. First, we changed our company name to DoltHub, which meant migrating our organization to the new name. From now on, datasets we maintain will be published under the dolthub organization. Second, we released Dolt tags, and as part of that release we highlighted data releases using the Google Open Images dataset. We imported version 5 and created releases for versions 1 through 5.

Conclusion

That's it for this month. Most of the datasets on DoltHub are still published by us. For Dolt and DoltHub to continue to exist, we need a community of data publishers to emerge. Help us build that community by publishing. We've published a blog on how to publish with SQL and another on how to publish CSVs.

That said, if you want data in Dolt format but don't have the time or expertise to import and maintain it, send us a note or chat with us on Discord. We're happy to be an open data provider for your projects.
