The Web3 Data Ecosystem



Dolt is a decentralized database. Dolt applies Git's model of decentralization to SQL database tables. Anyone can clone a Dolt database and make reads and writes offline. When that person is ready to share her changes, she can push them to any one of many copies. Whoever receives the push decides what to do with the change: merge it, make additional changes, or just leave it on a branch for others to read.

The Web3 space has embraced decentralization as a core tenet through blockchains. Blockchains allow anyone to submit writes and use consensus mechanisms like proof of stake or proof of work to determine which writes get accepted.
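To make "proof of work" a little more concrete, here is a toy sketch in Python. It is not how any real chain encodes blocks (Bitcoin, for instance, double-hashes a binary block header and compares it against a numeric target); the hex-prefix difficulty rule and the function names are illustrative assumptions. It only shows the core asymmetry: finding an acceptable nonce takes many hash attempts, while checking one takes a single hash.

```python
import hashlib

# Toy proof of work: not any real chain's block format or difficulty rule.


def block_hash(block_data: str, nonce: int) -> str:
    """SHA-256 of the block contents plus a candidate nonce, as hex."""
    return hashlib.sha256(f"{block_data}{nonce}".encode("utf-8")).hexdigest()


def mine(block_data: str, difficulty: int) -> int:
    """Try nonces until the hash starts with `difficulty` zero hex digits.
    Each extra zero digit means about 16x more attempts on average."""
    nonce = 0
    while not block_hash(block_data, nonce).startswith("0" * difficulty):
        nonce += 1
    return nonce


def verify(block_data: str, nonce: int, difficulty: int) -> bool:
    """Anyone can check a proposed block with a single hash."""
    return block_hash(block_data, nonce).startswith("0" * difficulty)


nonce = mine("alice pays bob 5", difficulty=4)
print(nonce, block_hash("alice pays bob 5", nonce))
print(verify("alice pays bob 5", nonce, difficulty=4))  # True
```

Raising `difficulty` is the lever that makes writing expensive, which is why the amount of hashing power behind a chain matters for its security.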

We believe that Dolt and Web3 are spiritually aligned, and we want to know how Dolt could help the Web3 ecosystem. Our first intuition is decentralized data distribution. Most common blockchains are not great at storing gigabytes of data. Could Dolt collect, store, and distribute interesting Web3 data in a decentralized way? What data would help the Web3 community the most? This blog post surveys the Web3 data ecosystem to try to answer those questions.

Blockchain Data

Blockchains are the data structures that back cryptocurrencies. Blockchains are maintained by nodes: computers around the world running programs that validate the state of the blockchain. If you run your own node (the node software is open source), you can collect this data for free, although doing so can be cumbersome. Luckily, a wide variety of services provide convenient access to data stored across multiple chains.
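If you do run your own Ethereum node, the usual way to pull block data out of it is the node's JSON-RPC interface. The Python sketch below builds an `eth_getBlockByNumber` request and posts it using only the standard library. It assumes an execution client (for example geth with its HTTP JSON-RPC server enabled) listening at `http://localhost:8545`; that URL and the helper names are our own placeholders.

```python
import json
import urllib.request

# Assumption: an Ethereum execution client (for example geth with its HTTP
# JSON-RPC server enabled) is listening here. Replace with your node's URL.
NODE_URL = "http://localhost:8545"


def build_request(method: str, params: list, request_id: int = 1) -> dict:
    """Build a JSON-RPC 2.0 request body."""
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}


def get_block_request(block_number: int, full_transactions: bool = False) -> dict:
    """Request for eth_getBlockByNumber. Block numbers are sent as hex strings;
    full_transactions=False asks for transaction hashes only."""
    return build_request("eth_getBlockByNumber", [hex(block_number), full_transactions])


def call_node(payload: dict, url: str = NODE_URL) -> dict:
    """POST a JSON-RPC payload to the node and return its 'result' field."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    if "error" in body:
        raise RuntimeError(f"node returned an error: {body['error']}")
    return body["result"]
```

With a node running, `call_node(get_block_request(15_000_000))` should return the block as a dictionary. Numeric fields such as `number` and `gasUsed` come back as hex strings, so decode them with `int(value, 16)`.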

  1. Blockchain Scanners

The best way to get a sense of what exactly is stored on a blockchain is to use a blockchain scanner. Blockchain scanners index blockchains and show which transactions occurred in each block. The most popular blockchain scanner is Etherscan. Let's examine the following block.


We can see all sorts of information associated with this Ethereum block, including the number of transactions, the gas fees (the fees paid to execute transactions), and the miner reward (the ETH paid to the miner for producing the block). We could also click the transaction count to see the details of each transaction.
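The gas fee for a transaction is simple arithmetic: gas used multiplied by the price paid per unit of gas. Since Ethereum's London upgrade (EIP-1559), that price is split into a base fee, which is burned, and a priority fee (tip), which goes to the block producer as part of its reward. The sketch below assumes a plain ETH transfer (21,000 gas) with a hypothetical 28 gwei base fee and 2 gwei tip; real values change from block to block.

```python
from decimal import Decimal

WEI_PER_GWEI = 10**9
WEI_PER_ETH = 10**18


def fee_breakdown_wei(gas_used: int, base_fee_wei: int, priority_fee_wei: int) -> tuple[int, int]:
    """Split a transaction fee into (burned base fee, tip to the block producer), in wei."""
    burned = gas_used * base_fee_wei
    tip = gas_used * priority_fee_wei
    return burned, tip


def wei_to_eth(wei: int) -> Decimal:
    """Convert wei to ETH exactly, avoiding float rounding."""
    return Decimal(wei) / Decimal(WEI_PER_ETH)


# Hypothetical example: a simple ETH transfer at a 28 gwei base fee plus a 2 gwei tip.
burned, tip = fee_breakdown_wei(21_000, 28 * WEI_PER_GWEI, 2 * WEI_PER_GWEI)
print(wei_to_eth(burned + tip))  # 0.00063 (total fee in ETH)
print(wei_to_eth(tip))           # 0.000042 (tip in ETH)
```

Working in integer wei and converting with `Decimal` at the end keeps the amounts exact, which matters when summing fees over many transactions.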

Other awesome blockchain scanners include:

These are super useful tools for getting real-time insight into what exactly is happening on the blockchain.

  2. Blockchain Metrics

Blockchains also act as application ecosystems that compete for developers. If you're a developer, what metrics and data do you look at to evaluate the health of a blockchain? Similarly, as a blockchain investor, you want to constantly monitor the "health" of a blockchain to protect your investment. Some blockchain health metrics include:

  • Transaction Fees - Usually you want to minimize the fees you pay to transact, to save money for yourself or your users.
  • Hash Rate - The amount of computing power used to validate a proof-of-work blockchain. The higher it is, the more expensive the chain is to attack, and so the more secure it is.
  • Number of Addresses - The number of unique addresses on the blockchain, a rough proxy for its number of users. The more users, the more reach your application has.
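As a rough illustration of how two of these metrics fall out of raw transaction data, the sketch below computes the number of unique addresses and the median fee over a batch of transactions. The `Tx` record and the sample values are hypothetical. Hash rate cannot be derived from a transaction list like this; it is estimated from mining difficulty and block times instead.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Tx:
    # Hypothetical, simplified transaction record; real nodes and scanners expose many more fields.
    sender: str
    recipient: str
    fee_eth: float


def health_metrics(txs: list[Tx]) -> dict:
    """Count transactions and unique addresses, and take the median fee."""
    addresses = {tx.sender for tx in txs} | {tx.recipient for tx in txs}
    fees = [tx.fee_eth for tx in txs]
    return {
        "transactions": len(txs),
        "unique_addresses": len(addresses),
        "median_fee_eth": median(fees) if fees else 0.0,
    }


sample = [
    Tx("0xaaa", "0xbbb", 0.0006),
    Tx("0xbbb", "0xccc", 0.0012),
    Tx("0xaaa", "0xccc", 0.0009),
]
print(health_metrics(sample))
# {'transactions': 3, 'unique_addresses': 3, 'median_fee_eth': 0.0009}
```

The median is used rather than the mean because a handful of very expensive transactions can skew an average fee.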

You can find most of this data through the following providers:

Trading Data

The best-known parts of the Web3 ecosystem are cryptocurrencies like Bitcoin, Ethereum, and Solana. These cryptocurrencies enable commerce in the Web3 ecosystem through transactions that are recorded on blockchains. As a crypto newcomer, you can buy these currencies on centralized exchanges like Coinbase, Gemini, and Binance. You can then hold them, transact with them, or trade them on decentralized exchanges like Uniswap. Trading data feels a lot like traditional financial data, which tends to be highly proprietary.

  1. Coin Prices and Exchange Rates Data

Some of the most important data in the ecosystem is coin pricing data, which answers the question "How many dollars is a coin worth?" Providers of this data include:

CoinGecko

These data points are sourced by tracking supply and demand on exchanges. Different exchanges may quote a different USD price for the same asset, so buying a coin on one exchange and selling it on another is a common arbitrage strategy.
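The arbitrage idea can be sketched in a few lines: buy on the exchange with the lowest quote, sell on the one with the highest, and check whether the spread survives trading fees. The quotes below are made up, and the flat per-trade `fee_rate` is a simplifying assumption; real arbitrage also has to account for withdrawal fees, transfer delays, and prices moving in the meantime.

```python
from typing import Optional


def best_arbitrage(prices_usd: dict, fee_rate: float) -> Optional[tuple]:
    """Return (buy_exchange, sell_exchange, profit_per_coin_usd), or None if fees eat the spread.

    fee_rate is an assumed flat fraction charged on each leg (the buy and the sell).
    """
    buy_at = min(prices_usd, key=prices_usd.get)
    sell_at = max(prices_usd, key=prices_usd.get)
    cost = prices_usd[buy_at] * (1 + fee_rate)
    proceeds = prices_usd[sell_at] * (1 - fee_rate)
    profit = proceeds - cost
    return (buy_at, sell_at, profit) if profit > 0 else None


# Hypothetical USD quotes for the same coin, not real market data.
quotes = {"exchange_a": 1990.0, "exchange_b": 2010.0, "exchange_c": 2000.0}
print(best_arbitrage(quotes, fee_rate=0.001))  # buy on exchange_a, sell on exchange_b, roughly $16 per coin
print(best_arbitrage(quotes, fee_rate=0.01))   # None: a 1% fee per trade wipes out the spread
```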

  2. Individual Trade Data

Another data component is trade-level data, which answers the question "What trades were executed, and at what time?" across every exchange. This data is much larger and much more difficult to collect, often requiring proprietary sources. Some providers include:

Application Level Data

Different types of applications use a blockchain as a backend to provide some sort of service. These applications range from DeFi trading protocols to NFT marketplaces. A transaction on the blockchain could represent something like a Uniswap trade or the purchase of an NFT on OpenSea. Typically, these applications are backed by smart contracts, which are open for everyone to see. You can collect this data yourself by monitoring smart contracts with your own node, but doing so quickly becomes inconvenient. As a result, normalized and curated application data tends to be proprietary.
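As an example of what monitoring a smart contract involves: token contracts emit a standard `Transfer(address,address,uint256)` event, and an NFT collection like Bored Ape Yacht Club (an ERC-721 contract) emits one whenever a token changes hands. The Python sketch below decodes such an event from a log entry shaped like the objects a node's `eth_getLogs` method returns (`address`, `topics`, `data`). The sample log is fabricated for illustration, and marketplace details such as sale prices typically require decoding each marketplace's own contract events, which this sketch does not attempt.

```python
from typing import Optional

# keccak256("Transfer(address,address,uint256)"): the first topic of every
# ERC-20 and ERC-721 Transfer event log.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"


def topic_to_address(topic: str) -> str:
    """Indexed addresses are left-padded to 32 bytes; keep the last 20 bytes (40 hex chars)."""
    return "0x" + topic[-40:].lower()


def decode_transfer(log: dict) -> Optional[dict]:
    """Decode a Transfer event log, or return None if the log is some other event."""
    topics = log.get("topics", [])
    if len(topics) not in (3, 4) or topics[0].lower() != TRANSFER_TOPIC:
        return None
    event = {
        "contract": log["address"],
        "from": topic_to_address(topics[1]),
        "to": topic_to_address(topics[2]),
    }
    if len(topics) == 4:
        # ERC-721: tokenId is indexed, so it is the fourth topic.
        event["token_id"] = int(topics[3], 16)
    else:
        # ERC-20: the amount is not indexed, so it is ABI-encoded in `data`.
        event["value"] = int(log["data"], 16)
    return event


# Fabricated example: a mint (from the zero address) of token #42 on a hypothetical contract.
sample_log = {
    "address": "0x" + "11" * 20,
    "topics": [
        TRANSFER_TOPIC,
        "0x" + "00" * 32,
        "0x" + "00" * 12 + "ab" * 20,
        "0x" + format(42, "064x"),
    ],
    "data": "0x",
}
print(decode_transfer(sample_log))
```

The topic count is what tells the two standards apart here: both share the same event signature, but ERC-721 indexes the token ID while ERC-20 stores the amount in `data`.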

Here are some providers:

Below is an example of a Dune dashboard showing sales data for Bored Ape Yacht Club NFTs.



Web3 has certainly piqued our curiosity, and we are eager to get involved. Dolt's decentralized properties have the team exploring use cases like decentralized databases, crypto data bounties, and decentralized application architecture. We are starting by publishing Web3 datasets, like the Bitcoin blockchain, on DoltHub. Swing by our Discord if you want to chat more about Dolt or Web3.


