An Introduction to Docker for R Users

What is Docker?

Docker is “a computer program that performs operating-system-level virtualization, also known as ‘containerization’”, according to Wikipedia. Like the first line of many Wikipedia articles about tech, this sentence is obscure to anyone not already familiar with the content of the article.

So, to put it more simply, Docker is a program that lets you manipulate (launch and stop) multiple operating systems (called containers) on your machine (which will be called the host). Just imagine having 10 Raspberry Pis with different flavors of Linux, each focused on doing one simple thing, that you can turn on and off whenever you need to; but all of this happens on your computer.

Why Docker & R?

Docker is designed to enclose environments inside an image / a container. This allows you, for example, to have a Linux machine on a MacBook, or a machine with R 3.3 when your main computer has R 3.5. It also means that you can use an older version of a package for a specific task, while still keeping the package on your machine up to date.

This way, you can “solve” dependency issues: if you are afraid dependencies will break your analysis when packages are updated, build a container that will always have the software versions you desire, be it Linux, R, or any package.

Docker images vs Docker containers

On your machine, you’re going to need two things: images and containers. Images are the definition of the OS, while containers are the actual running instances of the images. You’ll need to install an image just once, while containers are launched whenever you need an instance. And of course, multiple containers of the same image can be run at the same time.

To compare with R, this is the same principle as installing vs loading a

package: a package is to be downloaded once, while it has to be launched

every time you need it. And a package can be launched in several R

sessions at the same time easily.

Dockerfile

A Docker image is built from a Dockerfile. This file is the configuration file, and describes several things: which previous Docker image you are building this one from, how to configure the OS, and what happens when you run the container. In a sense, it’s a little bit like the DESCRIPTION + NAMESPACE files of an R package, which list the dependencies of your package, give meta information, and state which functions and data are to be available to the users library()ing the package.

So, let’s build a very basic Dockerfile for R, focused on reproducibility. The idea is this: today I have an analysis that works (for example contained in a .R file), and I want to be sure this analysis will always work in the future, regardless of any update to the packages used.

So first, create a folder for your analysis, and a Dockerfile:

mkdir ~/mydocker

cd ~/mydocker

touch Dockerfile

FROM

Every Dockerfile starts with a FROM, which describes what image we

are building our image from. There are a lot of official images, and you

can also build from a local one.

This FROM is, in a way, describing the dependency of your image; just as in R, when building a package, you always rely on another package (even if it is only the {base} package).

If you’re going for an R-based image, Dirk Eddelbuettel & Carl Boettiger maintain rocker, a collection of Docker images for R you can use. We’ll use the rocker/r-base image in this blog post.

FROM rocker/r-base

RUN

Once we’ve got that, we’ll add some RUN statements: these are commands which mimic command line commands. Remember what we want: an image that will, ad vitam aeternam, run an analysis as if it were still today. So what we’ll do is use the {checkpoint} package.

The command to make R execute something, from the terminal, is R -e "my code". Let’s add a {checkpoint} installation.

FROM rocker/r-base

RUN R -e "install.packages('checkpoint')"

We need a /root/.checkpoint folder to use {checkpoint}, so let’s create that one with mkdir (make directory).

FROM rocker/r-base

RUN R -e "install.packages('checkpoint')"

RUN mkdir /root/.checkpoint

COPY

Now, I need to get the script for my analysis from my machine (host) to the container. For that, we’ll need to use COPY localfile pathinthecontainer. I’ll first create a folder to receive everything, with mkdir. Note that here, myscript.R has to be in the same folder as the Dockerfile on your computer.

Let’s say this is the content of myscript.R:

library(checkpoint)

checkpoint("2019-01-06")

library(tidystringdist)

df <- tidy_comb_all(iris, Species)

p <- tidy_stringdist(df)

write.csv(p, "p.csv")

Here, the {tidystringdist} that will be installed in the container will be the one available on the checkpoint date (2019-01-06, today as I write this), even if I build this image in one year, or two, or four.

FROM rocker/r-base

RUN R -e "install.packages('checkpoint')"

RUN mkdir /root/.checkpoint

RUN mkdir /home/analysis

COPY myscript.R /home/analysis/myscript.R

CMD

CMD is the command to be run every time you launch a container from the image. What we want is for myscript.R to be sourced.

FROM rocker/r-base

RUN R -e "install.packages('checkpoint')"

RUN mkdir /root/.checkpoint

RUN mkdir /home/analysis

COPY myscript.R /home/analysis/myscript.R

CMD R -e "source('/home/analysis/myscript.R')"
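The pieces so far can be gathered into a single shell sketch that writes both files in one go. It assumes a POSIX shell; the project_dir variable and the PROJECT_DIR override are illustrative names, not part of the walkthrough, and the sketch defaults to a throwaway temporary folder so that it never overwrites an existing ~/mydocker. The RUN mkdir /root/.checkpoint line is kept because {checkpoint} needs that folder.

```shell
# Hypothetical helper variable: where to write the project.
# Set PROJECT_DIR=~/mydocker to reuse the folder created earlier.
project_dir="${PROJECT_DIR:-$(mktemp -d)}"
mkdir -p "$project_dir"

# The analysis script, as shown in the COPY section.
# The quoted 'EOF' keeps the shell from expanding anything inside.
cat > "$project_dir/myscript.R" <<'EOF'
library(checkpoint)
checkpoint("2019-01-06")
library(tidystringdist)
df <- tidy_comb_all(iris, Species)
p <- tidy_stringdist(df)
write.csv(p, "p.csv")
EOF

# The Dockerfile as built up so far.
cat > "$project_dir/Dockerfile" <<'EOF'
FROM rocker/r-base
RUN R -e "install.packages('checkpoint')"
RUN mkdir /root/.checkpoint
RUN mkdir /home/analysis
COPY myscript.R /home/analysis/myscript.R
CMD R -e "source('/home/analysis/myscript.R')"
EOF

echo "Project written to $project_dir"
```

Both files land side by side, which is what COPY expects: it looks for myscript.R in the folder the Dockerfile is built from.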

Build, and run

Build

Now, go and build your image. From your terminal, in the directory where

the Dockerfile is located, run:

docker build -t analysis .

-t name sets the name of the image (here analysis), and . tells Docker to build from the Dockerfile in the current working directory.

Run

Then, just launch with:

docker run analysis

And your analysis will be run 🎉!

Export container content

One thing to do now: you want to access what is created by your analysis (here p.csv) outside your container, i.e. on the host. Because yes, as of now, everything that happens in the container stays in the container. So what we need is to make the Docker container share a folder with the host. For this, we’ll use what is called a Volume, which is (roughly speaking) a way to tell the Docker container to use a folder from the host as a folder inside the container.

That way, everything the container creates in that folder will persist after the container is turned off. To do this, we’ll use the -v flag when running the container, with /path/from/host:/path/in/container. Also, create a folder to receive the results in both the container (in the Dockerfile, which means rebuilding the image with docker build) and on the host:

FROM rocker/r-base

RUN R -e "install.packages('checkpoint')"

RUN mkdir /root/.checkpoint

RUN mkdir /home/analysis && mkdir /home/results

COPY myscript.R /home/analysis/myscript.R

CMD cd /home/analysis && R -e "source('myscript.R')" && mv /home/analysis/p.csv /home/results/p.csv

mkdir ~/mydocker/results

docker run -v ~/mydocker/results:/home/results analysis

Wait for the computation to be done, and…

ls ~/mydocker/results

p.csv

🤘
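The host side of the -v flag is safest as an absolute path, because Docker can interpret a bare name as a named volume rather than as a folder. Here is a minimal sketch, assuming the image is called analysis as above. The results_dir and results_abs variables and the RESULTS_DIR override are illustrative names, and --rm (a standard docker run flag) simply removes the stopped container afterwards.

```shell
# Hypothetical variable; defaults to the folder used in the article.
results_dir="${RESULTS_DIR:-$HOME/mydocker/results}"
mkdir -p "$results_dir"

# Resolve to an absolute path before handing it to -v.
results_abs="$(cd "$results_dir" && pwd)"

if command -v docker >/dev/null 2>&1; then
  # --rm deletes the container once the analysis has finished;
  # the results survive because they live in the mounted host folder.
  docker run --rm -v "$results_abs:/home/results" analysis \
    || echo "docker run failed (has the analysis image been built?)"
else
  echo "docker not found; would mount $results_abs at /home/results"
fi
```

The guard around docker is only there so the snippet degrades gracefully on a machine without Docker; on the host used in this post, the docker run line is the only part that matters.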

What to do next?

So now, every time you launch a container from this Docker image, the analysis will be performed and you’ll get the result back, with no dependency problems: the packages will always be installed as of the date you chose. This can take a little while, though, as the packages are installed each time you run the container. But this is a basic introduction to Docker, R and reproducibility, so the goal was more to get beginners on board with Docker 🙂

Other things you can do would be:

Use {packrat}, and get the library bundle into the container.

Use remotes::install_version() if you want your analysis to be based on package versions instead of a time-based installation.

FROM rocker/r-base

RUN R -e "install.packages('remotes'); remotes::install_version('tidystringdist', '0.1.2')"

...

Use the Volume trick to bring data into your container, so that any data will be analysed in the very same environment.

Source: https://www.r-bloggers.com/an-introduction-to-docker-for-r-users/
