In our previous blog post we examined some freely available licensing tools for open data from Creative Commons. To briefly recap, a license specifies the terms under which copyrightable material is made available for public access, which is sharply distinct from placing that material in the public domain. However, for content to be licensable it has to be copyrightable. In this post we explore what U.S. law has to say about what constitutes copyrightable material. The goal of the post is to give you a feel for the standard that is applied to adjudicate whether material you might share on DoltHub can be considered copyrightable.
Before getting into a brief discussion of the test for whether material is copyrightable, it is important to emphasize that this post does not constitute legal advice. If you are considering sharing data that is potentially sensitive, you should consult your own legal counsel. The U.S. Copyright Office has published a report specifically on best practices for database copyright. It's a great resource for digging deeper into the issue.
Data licensing and source code licensing are quite different. This is because the United States does not provide any copyright protection for facts, which are generally what a database contains. This is laid out on the U.S. Copyright Office's homepage:
Copyright does not protect facts, ideas, systems, or methods of operation, although it may protect the way these things are expressed.
However, copyright does protect "original works of authorship", again quoting the U.S. Copyright Office, even explicitly calling out software as an example:
Copyright, a form of intellectual property law, protects original works of authorship including literary, dramatic, musical, and artistic works, such as poetry, novels, movies, songs, computer software, and architecture.
Thus, in order for your database to be subject to copyright law, and therefore licensable, it needs to be considered an "original work of authorship." By way of example, in a database containing original poetry, both the poetry and the decisions about how to organize the database could be considered copyrightable. On the other hand, in a database containing facts about the temperature at a given time and place, copyright can only apply to the selection and arrangement of the data. It is important to recognize that just because a database contains some copyrightable content, it does not mean that all the content is copyrightable.
A prominent Supreme Court case is highly illustrative. Rural Telephone Company of Kansas claimed that Feist Publications, a publisher of telephone directories, infringed on its copyright by harvesting the listings from its telephone directory, without obtaining a license, in order to use them for commercial purposes. The relevant case is Feist Pubs., Inc. v. Rural Tel. Svc. Co., Inc., 499 U.S. 340 (1991), and you can find the decision here.
This example is illustrative for us because Rural Telephone Company of Kansas was making data publicly accessible, and a third party was harvesting that data for commercial ends. If anything, the case has only become more salient, given that the act of "harvesting" no longer requires copying out a phone book but can be done electronically. Consider also that the same advances in computer technology that have reduced the cost of harvesting data have also led to an explosion in the amount of publicly accessible data.
Writing for the majority, Justice Sandra Day O'Connor makes explicit the distinction highlighted by the U.S. Copyright Office:
This case concerns the interaction of two well-established propositions. The first is that facts are not copyrightable; the other, that compilations of facts generally are.
Later she makes absolutely clear that "factual compilations", of which databases filled with data are certainly an example, can be considered "original works of authorship":
Factual compilations, on the other hand, may possess the requisite originality. The compilation author typically chooses which facts to include, in what order to place them, and how to arrange the collected data so that they may be used effectively by readers. These choices as to selection and arrangement, so long as they are made independently by the compiler and entail a minimal degree of creativity, are sufficiently original that Congress may protect such compilations through the copyright laws.
Importantly, she highlights the aspects of building a factual compilation, or more colloquially a database, that are capable of giving it the "requisite originality": "selection and arrangement."
Thus, the question of whether your data is copyrightable misconstrues how copyright law works in this domain. The question of interest is whether the "selection and arrangement" choices that a dataset represents "entail a minimal degree of creativity." O'Connor makes clear this is not a hard threshold to clear:
To be sure, the requisite level of creativity is extremely low; even a slight amount will suffice.
As far as this pertains to your data as it exists in a Dolt repository, this "selection and arrangement" constitutes decisions such as:
- What facts to include in the dataset?
- What schema choices to make?
- How to model certain facts that might not be directly represented?
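To make these three decisions concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for a Dolt repository. Everything in it is hypothetical and chosen only for illustration: the sample readings, the `daily_summary` table, and its column names are assumptions, not anything taken from DoltHub. It shows where selection, schema, and modeling choices enter when compiling raw temperature facts; it says nothing about whether any particular set of choices would clear the legal threshold.

```python
import sqlite3

# Hypothetical raw facts: (station, ISO timestamp, temperature in Celsius).
RAW_READINGS = [
    ("station_a", "2020-01-01T06:00", -2.0),
    ("station_a", "2020-01-01T14:00", 4.0),
    ("station_b", "2020-01-01T06:00", 1.0),
    ("station_b", "2020-01-01T14:00", 7.0),
]


def build_daily_summary(readings):
    """Compile raw readings into one row per station per day."""
    conn = sqlite3.connect(":memory:")
    # Schema choice: key rows by (station, day) rather than by reading.
    conn.execute(
        """
        CREATE TABLE daily_summary (
            station TEXT NOT NULL,
            day     TEXT NOT NULL,
            min_c   REAL NOT NULL,
            max_c   REAL NOT NULL,
            range_c REAL NOT NULL,
            PRIMARY KEY (station, day)
        )
        """
    )
    # Group temperatures by station and calendar day.
    temps_by_key = {}
    for station, timestamp, temp_c in readings:
        day = timestamp[:10]  # "YYYY-MM-DD" prefix of the ISO timestamp
        temps_by_key.setdefault((station, day), []).append(temp_c)
    for (station, day), temps in temps_by_key.items():
        low, high = min(temps), max(temps)
        # Selection choice: keep only the daily extremes.
        # Modeling choice: derive range_c, which no single reading records.
        conn.execute(
            "INSERT INTO daily_summary VALUES (?, ?, ?, ?, ?)",
            (station, day, low, high, high - low),
        )
    conn.commit()
    return conn


conn = build_daily_summary(RAW_READINGS)
rows = conn.execute(
    "SELECT station, day, min_c, max_c, range_c "
    "FROM daily_summary ORDER BY station, day"
).fetchall()
for row in rows:
    print(row)
```

A different compiler could start from the same readings and keep every observation, key the table by timestamp, or store temperatures in Fahrenheit. The underlying facts would be identical, but the selection and arrangement would differ.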
In summary, while it's impossible for us to offer an opinion on whether any dataset on DoltHub is copyrightable, we can look back and see what standard the U.S. courts have established for answering that question. If you decide that your Dolt repository is copyrightable material and attach a license to it, we will cover considerations around enforcement in a follow-up post.