DoltHub is a place on the internet to share, discover, and collaborate on Dolt databases. We're committed to making data collaboration seamless and effective for you, which is why we're excited to share a new addition to our pull request workflow on DoltHub: the ability to comment on cells in pull request diffs. While you've always had the option to leave general comments or questions on pull request pages, diff comments let you attach feedback directly to the data it refers to.
What are data diffs?
The diff pages on DoltHub provide a clear visual representation of changes between commits, making it easier to understand and review modifications. They highlight additions, modifications, and deletions – crucial for maintaining data integrity and ensuring everyone is on the same page.
With diff comments, you can pinpoint your feedback to specific cells within the table. You can provide direct and precise comments on the exact data points in question, as well as highlight and discuss the changes right there. This feature improves communication by providing a direct line of contact between you and the pull request author. It eliminates ambiguity, making collaboration more efficient and effective.
DoltHub's diff comments share similarities with GitHub diff comments, but with a focus on data instead of code. Both let users comment on a diff and reply to those comments, show context around the comments, and allow resolving conversations. There are some GitHub features we have not yet implemented on DoltHub, such as committing comment suggestions and associating diff comments with a review.
Because reviewing data differs from reviewing code, you can comment on a single cell on DoltHub instead of an entire line or row. On GitHub it's less common for a pull request to have a single file containing thousands of line changes, while this is much more common for data tables. DoltHub handles this by displaying the diff of one table at a time and paginating the results. Pagination makes it difficult to link to a particular comment on the diff page, which is something we are looking to solve in the near future.
Tracking the destination of cell comments wasn't easy. To locate the comment's exact location, we needed a distinctive identifier for each cell. For tables with primary keys, using the primary keys and column names is a straightforward approach, but it does not work for keyless tables. Dolt employs Prolly Trees for storing table indexes. For diffs, every row is assigned a unique key, regardless of whether it's part of a keyless table or not. We use this row key in combination with the column name to find the cell associated with a comment.
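To make the idea concrete, here is a minimal sketch of addressing a cell by its row key plus column name. This is not DoltHub's actual code: `CellRef`, `index_comments`, and `find_cell_comments` are hypothetical names, and the row key is treated as an opaque string handed to us by the diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellRef:
    """Hypothetical cell address: an opaque row key plus a column name."""
    row_key: str  # assumed to be unique per row in the diff, keyed table or not
    column: str


def index_comments(comments):
    """Group (CellRef, body) pairs by the cell they point at."""
    by_cell = {}
    for ref, body in comments:
        by_cell.setdefault(ref, []).append(body)
    return by_cell


def find_cell_comments(by_cell, row_key, column):
    """Return the comments attached to one cell, or an empty list."""
    return by_cell.get(CellRef(row_key, column), [])
```

Because the address does not depend on primary key columns, the same lookup works for keyless tables as long as the diff supplies a row key.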
To mark a comment as outdated, we store both the row key and a hash of the row's content at the time the comment was created. When changes are pushed to the pull request, we use the row key to check if the modifications in the diff include the row that the conversation is on. If it does, we compare the row hash to identify any changes. Based on this comparison, we determine whether the conversation should be marked as outdated.
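The check described above could be sketched roughly like this. It is an illustration under assumptions, not our implementation: `hash_row`, `CellComment`, and `mark_outdated` are hypothetical names, SHA-256 over a sorted serialization stands in for whatever row hash is actually stored, and the updated diff is modeled as a plain dict from row key to row contents.

```python
import hashlib
from dataclasses import dataclass


def hash_row(row):
    """Hash a row's contents (assumed scheme: SHA-256 over sorted columns)."""
    canonical = "\x1f".join(f"{col}={row[col]!r}" for col in sorted(row))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass
class CellComment:
    row_key: str    # opaque key of the row the comment was left on
    column: str
    row_hash: str   # hash of the row's contents when the comment was created
    body: str
    outdated: bool = False


def mark_outdated(comments, new_diff_rows):
    """Flag comments whose row appears in the new diff with different contents.

    new_diff_rows maps row key -> row contents for rows in the updated diff.
    """
    for comment in comments:
        row = new_diff_rows.get(comment.row_key)
        if row is None:
            continue  # the push did not touch this row in the diff
        if hash_row(row) != comment.row_hash:
            comment.outdated = True
    return comments
```

Storing a hash rather than a copy of the row keeps each comment record small while still detecting any change to the row it was left on.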
Let's take a look at how these features come together in a real-life data review. I've imported data from the Metro bike share website to a DoltHub database, Los-Angeles-Bike-Share-Trip, and submitted a pull request. To ensure data accuracy, I've requested a review from my colleague Taylor.
Taylor reviews the changes in the data diff and spots an inconsistency: some of the duration times and end times appear incorrect. She opens the comment form on the cell and leaves some feedback.
The diff comments also show up on the pull request page:
I get notified that she left some comments, reply on the pull request page, and make the adjustments to address the issues.
As I push the revised changes, I mark the conversation as resolved to ensure that feedback is acknowledged and addressed.
We're actively working on further improvements to our diff comments feature. Here's a sneak peek at what's coming:
Automatic marking of conversations as outdated when the cell with comments is updated.
Row context for comments on deleted rows. For now, a comment on a deleted row won't display its row context, since we currently only show the 'after' side of a change. We're working on adding the 'before' data to provide better context around all types of row changes.
Accurate navigation to specific rows within the diff.
Ability to add a comment on an entire row or multiple rows.
This new feature empowers you to have more focused discussions, making your feedback more valuable and your collaboration on DoltHub even smoother. We hope this feature enhances your pull request experience, and we're eager to hear your thoughts. If you have any feedback, reach out on Discord or file an issue on GitHub.
Stay tuned for more exciting updates!