Today, we’re launching our 4th bounty.
Our previous three on the Presidential Election, Hospital Pricing, and College Course Catalogs have been eminent successes, if I do say so myself. Bounties are a really great way to assemble open datasets. Super affordable, super fun.
We really believe open ML datasets (think ImageNet) have changed the world, and leaders in the space are clamoring for more. With this in mind, we’re launching our first data bounty designed specifically to improve an ML dataset.
As our first foray into collaborative ML, we’ve decided to launch our Logo-2K+Extended dataset bounty. In 2019, researchers at Shandong University, the Chinese Academy of Sciences, and Fairfield University undertook the laborious task of assembling and validating the Logo-2K+ dataset to enable the creation of logo classification and detection models. Of course, at the time DoltHub bounties didn’t exist. We’re curious how easily and quickly such a dataset could be constructed in 2021. Our goal is not just to replicate the existing Logo-2K+ dataset but, if possible, to go beyond it. If you’re interested, come check us out, contribute to a public open-source ML project, and claim your share of the bounty!
I initialized our Dolt repository with the 2,341 brand classes and the 10 root categories from Wang et al. (2019). It’s up to participants to propose logo images for each of the 2k+ brands, and to propose new brands and/or categories to include in the dataset. Of the proposed images, we at DoltHub will curate the most suitable subset, optimizing for high coverage and high diversity. Having your submissions included in the final dataset will earn you a portion of the $10k bounty.
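One way to picture "high coverage and high diversity" is the small sketch below. It is only an illustration under assumptions, not a description of how DoltHub curates submissions: the `curate` and `coverage` functions, the `(brand, image_id, feature_vector)` tuple layout, and the tiny 2-D feature vectors are all hypothetical stand-ins. For each brand it keeps a handful of images using a greedy farthest-point rule (diversity), then reports the fraction of brands with at least one image (coverage).

```python
from collections import defaultdict
from math import dist


def curate(proposals, per_brand=2):
    """Pick up to `per_brand` images per brand, greedily maximizing spread.

    `proposals` is a list of (brand, image_id, feature_vector) tuples.
    The feature vectors are hypothetical; any image embedding could play this role.
    """
    by_brand = defaultdict(list)
    for brand, image_id, vec in proposals:
        by_brand[brand].append((image_id, vec))

    selected = {}
    for brand, items in by_brand.items():
        chosen = [items[0]]          # seed with the first proposal for this brand
        remaining = items[1:]
        while remaining and len(chosen) < per_brand:
            # Farthest-point step: take the candidate whose nearest
            # already-chosen image is as far away as possible.
            best = max(
                remaining,
                key=lambda it: min(dist(it[1], c[1]) for c in chosen),
            )
            chosen.append(best)
            remaining.remove(best)
        selected[brand] = [image_id for image_id, _ in chosen]
    return selected


def coverage(selected, all_brands):
    """Fraction of brands that ended up with at least one curated image."""
    return sum(1 for b in all_brands if selected.get(b)) / len(all_brands)


# Made-up example data: three proposals for one brand, one for another.
proposals = [
    ("acme", "a1", (0.0, 0.0)),
    ("acme", "a2", (0.1, 0.0)),   # near-duplicate of a1
    ("acme", "a3", (5.0, 5.0)),   # visually distinct from a1
    ("globex", "g1", (1.0, 1.0)),
]
picked = curate(proposals, per_brand=2)
print(picked)
print(coverage(picked, ["acme", "globex", "initech"]))
```

In this toy setup the near-duplicate image loses out to the more distinct one, which is the intuition behind preferring a diverse subset over simply accepting the first few submissions per brand.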
For more information about how the table is structured, check out the database README; you can see an example logo submission here. We want DoltHub bounties to become an avenue for generating publication-worthy results in machine learning. I hope you’ll join us for our exciting first steps!