When we talk about R packages, there are a ton of additional nuances that make setup more complex than it is for Python. The most pressing matter is that, unlike Python, R does not have built-in tooling for managing a project's packages. Now, of course, there are packages which perform that role really well, which we will be talking about in this blog post. However, that only scratches the surface of the issues which are immediately visible to the user, not accounting for problems that stem from the host machine and not R itself.
So, I will be splitting this into two parts. The first part will talk about R packages and how to make them reproducible, similar to what we did for Python in the last post. The second part will review all the additional nuances that are required to get an R package to install and run properly, even if you have all of the metadata necessary to do so.
Once again, everything mentioned below is covered in greater detail on the packrat and renv sites, including many useful features that I will not be discussing below. I suggest taking a gander even if you plan to read this whole blog post, fluff included.
`packrat`: Tales of the 2010s
`packrat` is an R package manager created by Posit Software, initially released in 2013 and added to CRAN as of version 0.4.1-1. Since late 2019 to early 2020, `packrat` has been deprecated in favor of `renv`. So, why am I mentioning this in a post about reproducibility? Well, numerous R scripts written before 2019 use `packrat`. Additionally, RStudio, a popular R integrated development environment (IDE), has internals and integrations built around `packrat` that are likely still in use today.
As such, we will review how to load R code which uses `packrat`, not how to create a project with it. In addition, we'll talk about how to migrate from `packrat` to `renv` (it's only one command).
Loading a `packrat` project
Luckily, loading a `packrat` project is really simple: all you need to do is launch R from the project directory. For those unfamiliar, the project directory contains a `.Rprofile` script, which R runs on startup. From there, `packrat` will attempt to install itself if it isn't already present and install any packages that are needed. Then, to verify all of the packages have been downloaded and installed:
packrat::status()
If it outputs 'Up to date.', all the packages have been successfully attached and installed.
In RStudio, `packrat` projects can be imported like any other project by going to `File` -> `New Project...` -> `Existing Directory` and navigating to the project's directory.
If the project is bundled
In the case where the `packrat` project is bundled, you will need to install `packrat` yourself. That is done using:
install.packages("packrat")
From there, you can unbundle the project with a function of the same name:
packrat::unbundle(<bundle_loc>, <output_dir>, restore = TRUE)
`<bundle_loc>` should point to the location of the bundle and `<output_dir>` should point to where the bundle should be extracted to. Setting `restore` to `TRUE` uses `packrat` to install and load any packages that are needed.
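As a rough sketch, the install and unbundle steps above can be wrapped in one small helper. The name `unbundle_project` and its arguments are made up for illustration; only `install.packages()` and `packrat::unbundle()` come from the steps above.

```r
# Hypothetical helper: unbundle a packrat project if the bundle exists.
# Returns TRUE when the bundle was unpacked and FALSE when it was not found.
unbundle_project <- function(bundle_loc, output_dir) {
  if (!file.exists(bundle_loc)) {
    message("No bundle found at ", bundle_loc)
    return(FALSE)
  }
  # Install packrat only when it is missing (this needs a CRAN mirror to be set).
  if (!requireNamespace("packrat", quietly = TRUE)) {
    install.packages("packrat")
  }
  # restore = TRUE asks packrat to install the project's packages afterwards.
  packrat::unbundle(bundle = bundle_loc, where = output_dir, restore = TRUE)
  TRUE
}
```

You would call it with your own paths, for example `unbundle_project("~/Downloads/my-project.tar.gz", "~/projects")`, where both paths are placeholders.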
If `packrat` has been turned off
If `packrat` has been turned off on your machine, you simply need to turn it back on via:
packrat::on()
This typically only happens when the user has explicitly run the `packrat::off()` function.
Early changes in packrat
As a quick side note, between versions 0.1.0 and 0.2.0, `packrat`'s storage format changed. The function to update the format is `packrat::migrate()`; however, the likelihood of encountering these old versions is slim to none.
`renv`: The Future is Now
Since 2019, `renv`, also developed by Posit Software, has taken the place of `packrat` for package management. `renv`'s goal is 'to be a robust, stable replacement for the packrat package, with fewer surprises and better default behaviors.'
Migrating from `packrat` to `renv`
To finish up the previous section, you can migrate from `packrat` to `renv` using one function in your project directory:
renv::migrate()
Just make sure you have `renv` installed on your machine:
install.packages("renv")
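Putting the two commands in the right order (install first, then migrate), a minimal sketch could look like the following. The helper name `migrate_to_renv` is my own invention, and the check for a `packrat` folder is an assumption about how you might want to guard the call.

```r
# Hypothetical helper: migrate a packrat project at `project_dir` to renv.
# Returns FALSE without touching anything if no packrat folder is present.
migrate_to_renv <- function(project_dir = ".") {
  if (!dir.exists(file.path(project_dir, "packrat"))) {
    message("No packrat directory in ", project_dir, "; nothing to migrate.")
    return(FALSE)
  }
  # Install renv only when it is missing (this needs a CRAN mirror to be set).
  if (!requireNamespace("renv", quietly = TRUE)) {
    install.packages("renv")
  }
  renv::migrate(project = project_dir)
  TRUE
}
```

Running `migrate_to_renv()` from the project directory is then equivalent to the two commands above, with a little extra safety.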
Creating a new Environment
Once you have `renv` installed on your machine, you can add it to a new or existing project by opening an R terminal in your project directory and running:
renv::init()
This will create the files needed to properly log your dependencies, including the version of R and each package's name, version, source, repository, and requirements, along with the files needed to bootstrap `renv` in a new location. Additionally, if you have git installed, it will generate a `.gitignore` so that only the files necessary for management are committed, without bloating the repository.
From there, you can continue working on your project and add or remove any R packages that are needed. Every so often, you should run:
renv::snapshot()
This saves the current state of the project library to a lockfile containing your dependencies. Once you have completed the project, you can simply run `renv::snapshot()` once more and then upload the codebase to the desired location.
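For the 'every so often' step, here is one way it could be sketched. The helper name `snapshot_project` is hypothetical. It assumes the lockfile is the `renv.lock` file that `renv::init()` writes by default, and it calls `renv::status()` first so you can see what is about to change.

```r
# Hypothetical helper: review and record the state of an renv project.
snapshot_project <- function(project_dir = ".") {
  # By default renv::init() writes renv.lock, so a missing file suggests
  # that init() has not been run in this directory yet.
  if (!file.exists(file.path(project_dir, "renv.lock"))) {
    message("No renv.lock in ", project_dir, "; run renv::init() first.")
    return(FALSE)
  }
  # Report differences between the project library and the lockfile.
  renv::status(project = project_dir)
  # Write the current state to the lockfile without asking for confirmation.
  renv::snapshot(project = project_dir, prompt = FALSE)
  TRUE
}
```

One thing to keep in mind: with its default settings, `renv::snapshot()` records the packages your project code actually uses, so a package you installed but never referenced may not show up in the lockfile.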
Loading an existing Environment
To load an existing environment, you simply need to follow the same steps as for `packrat`: open an R terminal in the project directory. This will install the `renv` version the project uses and its packages, typically after you respond to a prompt with `Y`.
If `renv` isn't activated
This is typically something the user has turned off, so it can be re-enabled using:
renv::activate()
If you initially said no or `renv` asks you to run a command
In these cases, you can use one of two methods. First, you can simply run:
renv::restore()
The other method would have you run `renv::init()` followed by a `1` in the selection menu.
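If nobody is around to answer prompts (a script or a CI job, say), the first method can be run non-interactively. This is only a sketch: the helper name `restore_project` is made up, and it assumes the project's lockfile is named `renv.lock`.

```r
# Hypothetical helper: reinstall the packages recorded in the lockfile
# without waiting for a Y/n answer.
restore_project <- function(project_dir = ".") {
  lockfile <- file.path(project_dir, "renv.lock")
  if (!file.exists(lockfile)) {
    message("Nothing to restore: ", lockfile, " does not exist.")
    return(FALSE)
  }
  # prompt = FALSE skips the confirmation that restore() normally asks for.
  renv::restore(project = project_dir, prompt = FALSE)
  TRUE
}
```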
Congratulations! Your R code is now 'theoretically' reproducible. Of course, there are a number of nuances that need to be addressed which I'll review next time. Additionally, there are so many useful and cool features that `renv` has that I didn't cover above, so I'll probably come back and make a separate post for that at some point.
Some Additional Thoughts
Package Sources
Packages do not need to be on CRAN to be installed by `renv`. As of this post, `renv` also supports Bioconductor, GitHub, GitLab, and Bitbucket. The source is typically inferred from the `DESCRIPTION` file associated with a package. If you want to read more on this, you can check out the renv wiki.
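To get a feel for what 'inferred from the DESCRIPTION file' means, here is a rough base R approximation. It is not `renv`'s actual logic, and the helper name `package_source` is made up. It relies on the `RemoteType`, `biocViews`, and `Repository` fields that installers commonly write into an installed package's `DESCRIPTION`.

```r
# Hypothetical helper: guess where an installed package came from by
# looking at a few fields in its DESCRIPTION file.
package_source <- function(pkg) {
  # system.file() returns "" when the package is not installed.
  if (!nzchar(system.file(package = pkg))) {
    return(NA_character_)
  }
  desc <- utils::packageDescription(pkg)
  # Packages installed from GitHub, GitLab, etc. usually carry RemoteType.
  if (!is.null(desc[["RemoteType"]])) return(desc[["RemoteType"]])
  # Bioconductor packages declare a biocViews field.
  if (!is.null(desc[["biocViews"]])) return("Bioconductor")
  # Packages installed from a repository such as CRAN record its name.
  if (!is.null(desc[["Repository"]])) return(desc[["Repository"]])
  "unknown"
}

# Try it on a package that ships with every R installation.
package_source("stats")
```

Packages that come bundled with R itself generally have none of these fields, so do not be surprised if they fall through to the last case.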
Removed Packages
As with Python, packages can be removed from the repository (here, CRAN), however rare that may be. Once again, I would suggest using a container like Docker to store the development environment to mitigate these concerns.
Lack of documentation on older versions of packages
When I was initially researching `packrat` to get a better sense of its history, I was shocked to see almost no documentation related to the original versions that weren't present on CRAN. The only reason I was able to find one of the blog posts below was by downloading the earliest version of `packrat` on CRAN, finding one of the original authors (Kevin Ushey), and then finding the original RStudio blog on WordPress to track down the release notification. Just from a maintenance perspective, I would suggest that you simply update your work every so often to avoid these extra burdens. This is especially the case when old programs can no longer run on newer machines.