$10,000 Basketball Statistics Bounty


It's that time again. We just finished the US businesses bounty, so it's time for a new one. We are launching a $10,000 bounty to build SHAQ:

Shared Hoops Database for Accessible Querying.


In September, some Dolt-curious folks who work with the National Basketball Association (NBA) approached us. NBA stat heads have plenty of online resources, like NBA.com and Basketball Reference, for browsing basketball statistics. But there's no great solution for getting basketball statistics in a queryable form. These folks thought Dolt might be a good way to bootstrap a global basketball statistics database.

We came up with SHAQ, which for now contains uniquely keyed players, teams, leagues, seasons, and player season statistics. We've already imported most of the NBA data. We hope this bounty helps fill in women's, NCAA, and international league data. We might even get some high school and AAU data.

If this goes well, we imagine extending SHAQ to games and play-by-play data. Maybe the next bounty...

SHAQ will be the first publicly available database that links players across leagues. You can use SHAQ to answer questions like "Who is the best Euroleague player to never play in the NBA?" and back your analysis with a link to SHAQ. We're excited!

Data we want

We've inserted regular season and playoff NBA data from dolthub/nba-players. This data was sourced from the NBA.com API. There's a lot more data to add. There may also be some errors from the quick-and-dirty import process I ran, so feel free to correct and clean the data.

WNBA

We would love to get data for the WNBA. Here is a good place to start.

2020 Season NBA rookies

The Python package we used to import NBA data has not been updated with the 2020 NBA rookies. Grabbing these new NBA players and their 2020 statistics would be great.

NBA All-Star data

The NBA All-Star data has not been imported. I thought figuring out how to model All-Star teams would be a fun challenge for a bounty participant. Once you've done that, you can run a SQL query similar to the one in this PR to insert the player statistics.

NCAA Men's

We would love to get NCAA men's player statistics. A good place to start is the NCAA website. The challenge here is to link any player that played in the NBA to their SHAQ Player ID.
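Linking players across leagues comes down to deciding when two records describe the same person. Below is a minimal sketch of one way to do this. It assumes both sources carry a name and a birth date (the field names `player_id`, `ncaa_id`, `name` and `birth_date` are hypothetical, not taken from the SHAQ schema). It auto-links a record only when exactly one SHAQ player shares the normalized name and birth date, and leaves everything else for manual review.

```python
import unicodedata
from collections import defaultdict
from typing import Any, Dict, List, Optional, Tuple


def match_key(name: str, birth_date: str) -> Tuple[str, str]:
    """Build a comparison key: ASCII-folded, lowercased name plus birth date."""
    # Decompose accented letters (e.g. "č" -> "c" + combining caron), then drop
    # anything that is not ASCII.
    plain = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode("ascii")
    # Lowercase, drop periods ("J.J." vs "JJ"), and collapse repeated whitespace.
    cleaned = " ".join(plain.lower().replace(".", "").split())
    return (cleaned, birth_date)


def link_players(
    shaq_players: List[Dict[str, Any]],
    ncaa_players: List[Dict[str, Any]],
) -> Dict[Any, Optional[int]]:
    """Map each NCAA record's ID to a SHAQ player_id, or None if there is no unique match.

    Hypothetical record shapes:
      shaq_players: {"player_id": int, "name": str, "birth_date": str}
      ncaa_players: {"ncaa_id": Any, "name": str, "birth_date": str}
    """
    index: Dict[Tuple[str, str], List[int]] = defaultdict(list)
    for player in shaq_players:
        index[match_key(player["name"], player["birth_date"])].append(player["player_id"])

    links: Dict[Any, Optional[int]] = {}
    for player in ncaa_players:
        candidates = index.get(match_key(player["name"], player["birth_date"]), [])
        # Only auto-link when exactly one SHAQ player matches. Zero or several
        # candidates are left as None for a human to review.
        links[player["ncaa_id"]] = candidates[0] if len(candidates) == 1 else None
    return links
```

Name plus birth date is only a heuristic. Real data will have nicknames, suffixes and missing birth dates, so unmatched and ambiguous records still need a human eye.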

NCAA Women's

We would also love to get NCAA women's statistics, which the NCAA website also posts. Again, the challenge here is linking each player's WNBA and NCAA records to a single SHAQ player ID.

International Leagues

We want to start grabbing data about men's international leagues. Here's a good resource to get started. We're not sure how deep we'll get here, but bounty hunters have impressed us in the past.

High School, AAU

This is where it will start to get a little crazy. How much pre-college basketball data can we get for the men's and women's game?

Schema

The schema is pretty straightforward. The tables are:

  1. Leagues
  2. Teams
  3. Seasons
  4. Season Types
  5. Team Seasons
  6. Players
  7. Player Season Statistics

Use the NBA data as a model for your data. You will need to create new IDs. For everything other than players, just increment by one. Be prepared to revise your PR if we run into collisions. Player IDs for a new league start at the 100,000 mark and increment by one for each new player. For instance, if we added NCAA men's data next, any player not already in SHAQ (that is, anyone who never played in the NBA) would get an ID starting at 100,000. Players who already exist in SHAQ keep their existing ID.
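The ID rules above can be sketched roughly as follows. Two assumptions are baked in. First, "increment by one" means one past the current maximum ID in a table. Second, if other contributors have already used IDs at or above 100,000, new players continue from the highest one. The player keys are hypothetical matching keys produced by whatever linking step you use.

```python
from typing import Dict, Iterable, List

# From the bounty rules: players from a new league start at the 100,000 mark.
NEW_PLAYER_ID_START = 100_000


def next_sequential_id(existing_ids: Iterable[int]) -> int:
    """For leagues, teams, seasons, etc.: one more than the current maximum ID (1 if empty)."""
    return max(existing_ids, default=0) + 1


def assign_player_ids(existing: Dict[str, int], incoming: List[str]) -> Dict[str, int]:
    """Give each incoming player key a SHAQ player ID.

    `existing` maps a (hypothetical) matching key to an already-created player ID.
    Players already in SHAQ keep their ID. New players are numbered sequentially
    from 100,000, or from just past the highest ID already used in that range.
    """
    high_ids = [pid for pid in existing.values() if pid >= NEW_PLAYER_ID_START]
    next_id = max(high_ids, default=NEW_PLAYER_ID_START - 1) + 1
    assigned: Dict[str, int] = {}
    for key in incoming:
        if key in assigned:
            continue  # duplicate row for the same player in the incoming data
        if key in existing:
            assigned[key] = existing[key]  # reuse the existing ID
        else:
            assigned[key] = next_id
            next_id += 1
    return assigned
```

For example, `assign_player_ids({"nba-player": 123}, ["nba-player", "college-only"])` keeps 123 for the existing player and gives the new one 100000.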

Other Rules

  • snake_case, not camelCase
  • No special characters in names (e.g. Luka Doncic, not Luka Dončić). This decision makes it easier to match players to different public data sources, most of which do not use special characters.
  • A season_type differentiates both team_seasons and player_seasons between preseason, regular season, playoffs, all-star, etc.
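The naming rules can be applied mechanically before inserting rows. The sketch below is one way to do it using Python's standard library. It strips diacritics with Unicode NFKD decomposition and converts camelCase identifiers to snake_case. NFKD does not split a few letters (such as "Đ" or "ø") into a base letter plus an accent, so the small manual replacement table is an assumed, non-exhaustive addition.

```python
import re
import unicodedata

# Letters that NFKD does not decompose into base letter + accent, so they need
# an explicit ASCII replacement (an assumed, non-exhaustive list).
_MANUAL_FOLDS = str.maketrans(
    {"Đ": "D", "đ": "d", "Ø": "O", "ø": "o", "Ł": "L", "ł": "l", "ß": "ss"}
)


def to_plain_ascii(name: str) -> str:
    """Strip diacritics: "Luka Dončić" -> "Luka Doncic"."""
    folded = name.translate(_MANUAL_FOLDS)
    decomposed = unicodedata.normalize("NFKD", folded)
    # The combining accent marks left by NFKD are non-ASCII and get dropped here.
    return decomposed.encode("ascii", "ignore").decode("ascii")


def camel_to_snake(identifier: str) -> str:
    """Convert a camelCase column name to snake_case: "seasonTypeId" -> "season_type_id"."""
    step = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", identifier)
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", step).lower()
```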

Payout

As with the previous US businesses bounty, we are going to pay out based on rank. Your ranking is determined by the number of non-NULL cells you contribute to the database.

The prize structure is fixed and is as follows:

| Your place  | You make |
|-------------|----------|
| 1st         | $5,000   |
| 2nd         | $2,500   |
| 3rd         | $1,250   |
| 4th         | $625     |
| 5th         | $320     |
| 6th         | $150     |
| 7th         | $100     |
| 8th-20th    | $50      |
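To make the ranking metric concrete, here is a small sketch that counts non-NULL cells per contributor and maps each place to the payouts in the table above. The actual scoring is done by the DoltHub team on the data you contribute. This only illustrates the idea. Rows are modeled as Python dicts with `None` standing in for NULL. Two tie-breaking details are assumptions, not bounty rules: contributors with zero non-NULL cells are not ranked, and ties are broken alphabetically.

```python
from typing import Any, Dict, List, Mapping, Tuple

# Payouts for places 1-7 from the table above. Places 8 through 20 each earn $50.
_TOP_PAYOUTS = {1: 5000, 2: 2500, 3: 1250, 4: 625, 5: 320, 6: 150, 7: 100}


def payout_for_place(place: int) -> int:
    """Dollar payout for a finishing place (0 outside the top 20)."""
    if place in _TOP_PAYOUTS:
        return _TOP_PAYOUTS[place]
    return 50 if 8 <= place <= 20 else 0


def count_non_null_cells(rows: List[Mapping[str, Any]]) -> int:
    """Count cells that are not NULL (represented here as Python None)."""
    return sum(1 for row in rows for value in row.values() if value is not None)


def rank_contributors(
    contributions: Dict[str, List[Mapping[str, Any]]],
) -> List[Tuple[str, int, int]]:
    """Return (contributor, score, payout) tuples, best score first.

    Assumptions: contributors with zero non-NULL cells are not ranked, and ties
    are broken alphabetically by name.
    """
    scores = {who: count_non_null_cells(rows) for who, rows in contributions.items()}
    ranked = sorted(
        (item for item in scores.items() if item[1] > 0),
        key=lambda kv: (-kv[1], kv[0]),
    )
    return [
        (who, score, payout_for_place(place))
        for place, (who, score) in enumerate(ranked, start=1)
    ]
```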

Happy Hunting

Let's build a great hoops database for everyone to use! If you have any questions, join our Discord. The #data-bounties channel is always lively.
