Before we can discuss anything relating to collecting data or publishing code, we first need to talk about licensing. If you do not understand how to license your work, or how to obtain and use rights to others' work, you will be severely limited in how much you can make public or even store for secondary analyses. As such, this section provides an overview of how to license your work and how to manage the licenses of the work you wish to conduct.
As this somewhat involves the legal system, I must provide a disclaimer:
The information provided in this blog post does not, and is not intended to, constitute legal advice; instead, all information, content, and materials available in this blog post are for general informational purposes only. Information in this blog post may not constitute the most up-to-date legal or other information. This blog post contains links to other third-party resources. Such links are only for the convenience of the readers.
Readers of this blog post should contact their attorney to obtain advice with respect to any particular legal matter. No readers of this blog post should act or refrain from acting on the basis of information on this site without first seeking legal advice from counsel in the relevant jurisdiction. Only your individual attorney can provide assurances that the information contained herein – and your interpretation of it – is applicable or appropriate to your particular situation. Use of, and access to, this blog post or any of the links or resources contained within the blog post do not create an attorney-client relationship between the readers and authors of this blog post or contributors.
The views expressed at, or through, this blog post are those of the individual authors writing in their individual capacities only. All liability with respect to actions taken or not taken based on the contents of this blog post is hereby expressly disclaimed. The content of this blog post is provided "as is"; no representations are made that the content is error-free.
What is Copyright?
Every person, at least in the United States, is afforded certain rights over the creative works they produce. The right we will focus on today is known as copyright: the exclusive right, typically for a limited number of years, to publish, reproduce, adapt, or distribute a creative work.
Now, what counts as a creative work depends on the laws of wherever you are. In general, however, a creative work is some expression of an idea. Examples include an article you have written, source code you have developed, an infographic about a study, or the selection and arrangement of data in a dataset. By contrast, something like an algorithm or a theory cannot be copyrighted, as it is an idea rather than an expression. For example, I can copyright my implementation of adding two numbers together in a programming language as part of a library, but I can't copyright the concept of addition itself.
By default, copyright law gives the author all the rights to their creative work. As such, if an author wants to let others do anything with that work, they need to grant those rights through some sort of license or agreement.
What is a License?
A license is a piece of text that states which rights are granted to a user of a creative work and under what conditions. It also typically contains information about the liability or warranty of the provided creative work.
There are numerous types of licenses depending on what kind of creative work you are trying to distribute: software, datasets, written content, etc. Additionally, licenses offer different levels of openness. In general, an open license doesn't place any restrictions on usage that cannot reasonably be met by any user.
The Open Licenses
There are numerous open licenses out there, but in this post we are only going to talk about four types. If you want to see how existing licenses are classified, I would recommend looking at the SPDX License List and checking whether each license is considered FSF Free/Libre or OSI Approved.
FSF Free/Libre and OSI Approved are two different standards for open and free licenses. Other standards may have differing opinions, so make sure you read the license yourself before using it.
First, there is the Public Domain: anyone can use, modify, or distribute the creative work without restriction or condition. Examples include the Creative Commons Zero v1.0 Universal Public Domain Dedication (CC0-1.0) and the Unlicense (Unlicense).
Next are the Permissive licenses. These licenses impose only minor conditions on usage, modification, or distribution. In most cases, the only conditions are attribution and including a copy of the license itself. Examples include the MIT License (MIT), the Apache License Version 2.0 (Apache-2.0), and the Berkeley Software Distribution 3-Clause "New" or "Revised" License (BSD-3-Clause).
Then there are the Copyleft licenses. These licenses typically contain all the clauses of the permissive licenses, with the additional requirement that any modifications or redistributions must also be free and open. Examples include the GNU General Public License v3.0 only (GPL-3.0-only) and the Design Science License.
There is also a less restrictive form of copyleft license, aptly known as Weak Copyleft. These licenses have the same requirements as copyleft licenses, but allow works that merely consume the licensed work to be non-free or non-open. For example, if a platform uses a weak copyleft software library, the platform itself doesn't need to make its source free and open. Examples include the GNU Lesser General Public License v3.0 only (LGPL-3.0-only) and the Mozilla Public License 2.0 (MPL-2.0).
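To make the four categories more concrete, the licenses named above can be recorded in a small lookup table keyed by their SPDX identifiers. This is a minimal Python sketch only: the `Openness` enum, the `LICENSE_CATEGORIES` table, and the `classify` function are hypothetical names for illustration, and the table covers just the licenses mentioned in this section, not the full SPDX License List. Identifiers not in the table return `None` rather than being guessed.

```python
from enum import Enum
from typing import Optional


class Openness(Enum):
    """The open-license categories described in this post (illustrative)."""
    PUBLIC_DOMAIN = "public domain"
    PERMISSIVE = "permissive"
    COPYLEFT = "copyleft"
    WEAK_COPYLEFT = "weak copyleft"


# Hypothetical mapping from SPDX identifiers to categories.
# Only the licenses named in this post are included.
LICENSE_CATEGORIES: dict[str, Openness] = {
    "CC0-1.0": Openness.PUBLIC_DOMAIN,
    "Unlicense": Openness.PUBLIC_DOMAIN,
    "MIT": Openness.PERMISSIVE,
    "Apache-2.0": Openness.PERMISSIVE,
    "BSD-3-Clause": Openness.PERMISSIVE,
    "GPL-3.0-only": Openness.COPYLEFT,
    "LGPL-3.0-only": Openness.WEAK_COPYLEFT,
    "MPL-2.0": Openness.WEAK_COPYLEFT,
}


def classify(spdx_id: str) -> Optional[Openness]:
    """Return the category for a known SPDX identifier, or None if it is not in the table."""
    return LICENSE_CATEGORIES.get(spdx_id)


if __name__ == "__main__":
    for spdx_id in ("MIT", "MPL-2.0", "SomeUnknownLicense"):
        category = classify(spdx_id)
        print(spdx_id, "->", category.value if category else "unknown")
```

A table like this is only a starting point for tracking which licenses a project depends on; the license text itself remains the authority on what it permits.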
The Non-Open Licenses
Of course, there are a number of non-open licenses as well, which restrict how users can use the work.
The most common of these are Non-Commercial licenses, which prohibit using the work for commercial purposes. Examples include the Java Research License Version 1.5 and Creative Commons Attribution-NonCommercial 4.0 International (CC-BY-NC-4.0).
Many programs are distributed under a commercial or proprietary license. These licenses typically restrict usage or distribution in some way, which usually makes them non-open.
A work that isn't licensed at all is considered All Rights Reserved (ARR). This is the default when no license is specified, and it grants no rights whatsoever to the user.
Finally, there are Trade Secrets, which essentially means that the work is not made public at all.
Contributing to a Licensed Work
Contributions from others to your licensed work add an additional layer of complexity. A contributor still owns all the rights to their contribution, even when it is made to someone else's project. As such, they have the right to revoke access to their contribution or change its terms of usage.
To remedy this, contributors are often asked to sign a Contributor License Agreement, or CLA: an agreement setting out the terms under which a contributor's work is added to another creative work. In most cases, it states that the owner of the creative work has an irrevocable right to use, modify, or distribute the contribution under the project's license.
Of course, this means that changing the project's license at any point in the future requires the agreement of all contributors. If you wish to take copyright ownership of contributions instead, you can use a Copyright Transfer Agreement, or CTA. This is particularly useful for unlicensed works, as it shields contributors from liability and avoids complicated procedures later on.
Agreements
Not all agreements can be boiled down into a single general-purpose license. Some organizations, typically those providing data, require you to sign an agreement containing the terms and conditions for how you may use and distribute the data. In these cases, you should always read the agreement in full before signing it. Additionally, you should always have a lawyer review its contents.
When it comes to making the work more open, you must make sure that you are able to distribute either the work itself or at least some recoverable metadata, and that researchers outside the original group are allowed to use it. The rights granted by such agreements typically don't extend beyond the original group conducting the research, so secondary data analyses or reproductions are often difficult or even impossible. If you are unable to distribute the work, you should make sure there is some line of communication between the author and any users so that they may request the data. Of course, if the organization does not want to share the data, there is nothing you can do. However, this is unlikely unless the information is highly sensitive and cannot be safely anonymized.
Some Additional Thoughts
Exceptions to Copyright Law
Creative works are not always protected by copyright law; there are a number of exceptions. Fair Use, or Fair Dealing, is one of the most common, allowing someone to use a copyrighted work without a license in certain circumstances. For example, using an unlicensed work for non-profit, educational purposes is more likely to be considered fair use.
My work on reproducing research projects relies on this aspect of the Fair Use doctrine to provide the additional files or patches needed to reproduce studies that lack suitable licensing.