Introducing Data Bounties

In 2018, we started the company that is now DoltHub to "create a place on the internet to get access to interesting, maintained data".

The data ecosystem of today reminds us a lot of the open source ecosystem of the late 1990s and early 2000s. It's there, but it's scattered and uncoordinated. OpenStreetMap and ImageNet gained recognition and popularity. Those datasets even drove radical technological change. But the open data community has no unifying force to rally around the way software has Git and GitHub.

At DoltHub, we believe that if we allow branching and merging of data, as in software, the barrier to sharing and collaborating on data is dramatically reduced and a vibrant open data community will emerge. We created Dolt, a SQL database you can branch and merge. Then we launched DoltHub, a place on the internet to share Dolt databases. Later, we launched forks, further enhancing the ability to collaborate on data in a distributed way.

In the beginning, DoltHub felt a little empty, so we set out to load it with a bunch of interesting open datasets. The results are impressive. We learned a ton about open data and what data interests people. We even had a few intrepid souls start publishing their own data, like post-no-preference with stock market data.

In the summer of 2020, we asked ourselves how we could move faster getting interesting, maintained data on DoltHub. One hypothesis we have had since we started this company is that working with data is not as fun as working with code, so we might need to incentivize data creation and cleaning work more than code work. This meant adding currency to the arrangement somehow.

Today, we are announcing our test of that hypothesis. It's called Bounties. People who want data can create a bounty to have a Dolt database created, cleaned, or updated. Dolt users around the world can participate in the bounty and make real money doing data work. Because of Dolt's cell-wise tracking and merge capabilities, the bounty creator can implement sophisticated payment plans, like percentage of cells edited.

In the early phases of Bounties, we, DoltHub, will act on the buy side of the bounty. We have a number of interesting open datasets we would love to host on DoltHub and we're willing to pay to have those created. Over time, if cool data is being produced at a reasonable price, other buyers will show up and start running bounties as well.

We think Bounties might be a big part of the way "a place on the internet to get access to interesting, maintained data" emerges. Make money participating in a bounty today. Let us know if you would like to run a bounty.

How do Bounties work?

Bounties can have different payout structures: winner-take-all, percentage of edits, last write wins, or anything else we can determine from the metadata in the Dolt commit graph. Bounty creators define the bounty. It can be as simple as a request for data in any schema or as complicated as a fully built schema with example data. This first bounty is a fully built schema with example data.
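To make the difference between payout structures concrete, here is a minimal sketch of two of them in Python. It is not the actual payout code. The function names, the contributor names, and the idea of starting from a simple per-contributor count of accepted edits are all assumptions made for illustration.

```python
from typing import Dict

# Hypothetical input: accepted edit counts per contributor, as might be
# derived from commit metadata. The names and numbers are made up.
EditCounts = Dict[str, int]


def winner_take_all(edits: EditCounts, pool: float) -> Dict[str, float]:
    """Pay the whole pool to the contributor with the most accepted edits.

    Ties go to whichever contributor appears first in the mapping.
    """
    if not edits:
        return {}
    winner = max(edits, key=edits.get)
    return {name: (pool if name == winner else 0.0) for name in edits}


def percentage_of_edits(edits: EditCounts, pool: float) -> Dict[str, float]:
    """Split the pool in proportion to each contributor's accepted edits."""
    total = sum(edits.values())
    if total == 0:
        return {name: 0.0 for name in edits}
    return {name: pool * count / total for name, count in edits.items()}


edits = {"alice": 600, "bob": 300, "carol": 100}
print(winner_take_all(edits, 1000.0))
print(percentage_of_edits(edits, 1000.0))
```

With the same edit counts, winner-take-all pays everything to alice, while the proportional split pays every contributor something.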

Anyone can participate in a public bounty. We may have private bounties later, but for now all bounties are public. To participate, fork the database on DoltHub. Now you have your own copy. This new fork is where you make all your writes. Make your modifications and push them to your fork. If you are worried about people sniping your work, make your fork private until you submit a pull request. That will require a DoltHub subscription, but you'll make the money back and more from the bounty. Otherwise, don't push to DoltHub until you're ready to make a pull request.

Once you think your data is up to snuff, submit a pull request to the original repository from your fork. The bounty maintainer will review your pull request for correctness, give you feedback, and may ask you questions about your work. After the review, the maintainer will accept or reject your pull request. If your pull request is accepted, you are eligible for the bounty payout.

As pull requests are accepted, bounty payouts will be posted in the scoreboard. Bounty payouts are calculated as if the bounty stopped at the time you view the scoreboard. As more submissions are accepted, your expected payout will decline. Keep submitting pull requests to win a bigger share of the bounty.
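A quick way to see why your expected payout moves around is to recompute a scoreboard after each accepted pull request. The sketch below assumes a simple proportional split of the pool by accepted cells. The scoreboard function, the contributor names, and the cell counts are hypothetical and are not taken from the real payout code.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def scoreboard(accepted: List[Tuple[str, int]], pool: float) -> Dict[str, float]:
    """Payouts as if the bounty stopped right after the given accepted PRs.

    Each entry in `accepted` is (contributor, cells accepted in that PR).
    """
    cells: Dict[str, int] = defaultdict(int)
    for contributor, count in accepted:
        cells[contributor] += count
    total = sum(cells.values())
    if total == 0:
        return {}
    return {name: round(pool * count / total, 2) for name, count in cells.items()}


# Hypothetical sequence of accepted pull requests, in order of acceptance.
accepted = [("you", 1000), ("bob", 3000), ("you", 4000)]

for i in range(1, len(accepted) + 1):
    print(f"after PR {i}:", scoreboard(accepted[:i], 25_000.0))
```

In this made-up sequence, your share starts at the full pool, drops once bob's pull request lands, and climbs again after your second submission is accepted.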

The code for calculating bounty payouts is open source. We want to be as transparent as possible with bounty participants about how their hard work is rewarded. We also accept new payout strategies from bounty creators.

Please review the bounties help page or the bounties terms if you have any further questions.

The First Bounty

In the United States, we recently had a presidential election. Election data should be free and open. We've worked with the OpenElections project in the past. We also admire the MIT Election Data and Science Lab. Inspired by these two data projects, for our first bounty, we decided to offer a $25,000 prize pool for building precinct-level election data for the 2016 and 2020 US presidential elections.

We started with a cleaned version of the MIT Election Data and Science Lab database. The cleaned results are in dolthub/us-president-precinct-results. We are accepting any inserted or updated rows for either the 2016 or 2020 presidential elections. The $25,000 bounty will be divided based on the percentage of cells modified in any table in the database, with the last write to each cell winning. Get more details on how to contribute in Monday's blog post.
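Here is a rough sketch of what "percentage of cells modified, last write wins" could look like in Python. It assumes a cell is identified by table, primary key, and column, and that edits arrive in the order they were merged. The table name, keys, contributors, and function names are hypothetical, and this is not the open source payout code itself.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (table, primary key, column, contributor), listed in merge order.
CellEdit = Tuple[str, str, str, str]


def attribute_cells(edits: List[CellEdit]) -> Counter:
    """Credit each cell to the last contributor who wrote it."""
    owner: Dict[Tuple[str, str, str], str] = {}
    for table, pk, column, contributor in edits:
        owner[(table, pk, column)] = contributor  # a later write replaces an earlier one
    return Counter(owner.values())


def split_pool(cell_counts: Counter, pool: float) -> Dict[str, float]:
    """Divide the pool by each contributor's share of credited cells."""
    total = sum(cell_counts.values())
    if total == 0:
        return {}
    return {name: pool * count / total for name, count in cell_counts.items()}


# Hypothetical edit history for an assumed table named "vote_tallies".
edits = [
    ("vote_tallies", "precinct-1", "votes", "alice"),
    ("vote_tallies", "precinct-2", "votes", "alice"),
    ("vote_tallies", "precinct-3", "votes", "alice"),
    ("vote_tallies", "precinct-4", "votes", "alice"),
    ("vote_tallies", "precinct-1", "votes", "bob"),  # overwrites alice's cell
    ("vote_tallies", "precinct-5", "votes", "bob"),
]

counts = attribute_cells(edits)
print(counts)
print(split_pool(counts, 25_000.0))
```

In this example alice wrote four cells but is credited with three, because bob's later write to precinct-1 takes that cell. The pool then splits three to two.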

Start Contributing Today

We are really excited to launch Bounties, but we'll be even more excited to start seeing contributions come in. Come hang out with us on our Discord channel. We have a Bounties room to help get you started. Let's build the best precinct-level US presidential election database out there and get paid doing it.
