Create Multiple Lines with Labels in ggplot2
How to graph multiple lines and assign custom legend labels using the ggplot2 package for R
While working on something else, I came across the problem of graphing multiple lines on the same set of axes using ggplot2. I did not find it nearly as straightforward as I expected (largely due to my limited experience with R in general and ggplot2 in particular), so this serves as an archive of how to do this!
Creating and Labelling a Single Line
Creating and labelling a single line is relatively straightforward, although it took me a while to get used to the ggplot2 way of doing things. Most of my graphing experience is with tikz (LaTeX) and matplotlib (Python), so wrapping my head around mappings and different layers took a while, and to be honest, I am still not particularly comfortable with it. Nonetheless, here is how we can graph a single line and add custom labels to the legend.
# Load the library and set the options
library(ggplot2)
options(repr.plot.width=12, repr.plot.height=8)
times <- seq(0, 10, by=0.5)
x1 <- rnorm(n=length(times), mean=times, sd=0.5) # line centred around y = x but with some error
df <- data.frame(time=times, x1=x1)
head(df)
| | time | x1 |
|---|---|---|
| | <dbl> | <dbl> |
| 1 | 0.0 | 0.3129191 |
| 2 | 0.5 | 1.4673592 |
| 3 | 1.0 | 0.1460783 |
| 4 | 1.5 | 1.0497238 |
| 5 | 2.0 | 2.8634889 |
| 6 | 2.5 | 2.1927241 |
ggplot(df, aes(time, x1, colour="first")) + # without `colour` there would be no legend entry
# In the above line, the actual value for 'colour' doesn't matter
geom_line() + # graph a line
labs(colour="Variables") + # The title of the legend
scale_colour_discrete(labels="The First Variable") + # give a custom label to the single line
# Without the above line the label would be 'first', the string passed to `colour`
ylab("x1") +
xlab("Time") +
# the below two lines set and centre the title of the entire graph
ggtitle("A Variable") +
theme(plot.title=element_text(hjust=0.5))
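Whether a legend appears at all comes down to whether `colour` is mapped inside `aes()` or set outside it. The following is a minimal sketch of that difference, assuming data of the same shape as `df`; the names `demo_df`, `mapped`, `unmapped`, `p_mapped`, and `p_fixed` are made up for this illustration.

```r
library(ggplot2)

# Hypothetical demo data with the same shape as `df`
demo_times <- seq(0, 10, by = 0.5)
demo_df <- data.frame(
  time = demo_times,
  x1 = rnorm(length(demo_times), mean = demo_times, sd = 0.5)
)

# Colour *mapped* inside aes(): "first" is treated as a data value,
# so ggplot2 builds a colour scale for it and draws a legend
mapped <- aes(time, x1, colour = "first")
p_mapped <- ggplot(demo_df, mapped) +
  geom_line()

# Colour *set* outside aes(): the line is simply drawn in blue,
# no colour scale is involved, and no legend is drawn
unmapped <- aes(time, x1)
p_fixed <- ggplot(demo_df, unmapped) +
  geom_line(colour = "blue")

# The mapping records which aesthetics are tied to data
names(mapped)
names(unmapped)
```

Only the first mapping lists `colour`, which is why only `p_mapped` has a legend entry to relabel.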

Multiple Lines
Graphing a single line and including a custom label in the legend wasn’t too bad. With multiple lines, however, there are at least two ways to go about it. I will first describe how to add the lines manually, and then how to reshape the dataframe so that ggplot2 does it for us.
Method 1 - Manually Adding the Lines
In this method, I’ll add the lines from the single, ‘wide’ dataframe using multiple calls to geom_line, each being manually assigned to the column of the dataframe from which it is getting the data.
# Adding a second set of data to the frame
df$x2 <- rnorm(length(times), mean=times * 1.1, sd=0.5)
head(df)
| | time | x1 | x2 |
|---|---|---|---|
| | <dbl> | <dbl> | <dbl> |
| 1 | 0.0 | 0.3129191 | 0.4240040 |
| 2 | 0.5 | 1.4673592 | 0.6065533 |
| 3 | 1.0 | 0.1460783 | 0.7018993 |
| 4 | 1.5 | 1.0497238 | 1.1500628 |
| 5 | 2.0 | 2.8634889 | 2.2110318 |
| 6 | 2.5 | 2.1927241 | 2.6622088 |
ggplot(df, aes(x=time)) +
# Manually adding the first line
geom_line(aes(y=x1, colour="first")) +
# Manually adding the second line
geom_line(aes(y=x2, colour="second")) +
labs(colour="Variable") +
# Setting the custom labels
# Without this, the labels would be 'first' and 'second', the strings passed to `colour`
scale_colour_discrete(labels=c("Variable 1", "Variable 2")) +
ggtitle("Some Variables") +
theme(plot.title = element_text(hjust=0.5)) +
xlab("Time") +
ylab("Value")
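One caveat with this approach: `labels` is matched to the legend keys by position, and the keys of a discrete scale built from strings are sorted alphabetically by default. If the strings passed to `colour` were renamed, the custom labels could end up attached to the wrong line. One way to guard against this is to pass `breaks` alongside `labels`, so that each key is paired with its label explicitly. This is only a sketch, assuming the keys are still "first" and "second"; the names `demo_df`, `legend_keys`, and `legend_labels` are invented for the example.

```r
library(ggplot2)

# Hypothetical demo data with the same shape as `df`
demo_times <- seq(0, 10, by = 0.5)
demo_df <- data.frame(
  time = demo_times,
  x1 = rnorm(length(demo_times), mean = demo_times, sd = 0.5),
  x2 = rnorm(length(demo_times), mean = demo_times * 1.1, sd = 0.5)
)

# Each key (the string used in aes(colour = ...)) and its label share a position
legend_keys <- c("first", "second")
legend_labels <- c("Variable 1", "Variable 2")

p <- ggplot(demo_df, aes(x = time)) +
  geom_line(aes(y = x1, colour = "first")) +
  geom_line(aes(y = x2, colour = "second")) +
  # `breaks` fixes which keys appear and in what order;
  # `labels` is then matched to `breaks` position by position
  scale_colour_discrete(breaks = legend_keys, labels = legend_labels) +
  labs(colour = "Variable", x = "Time", y = "Value")
```

Swapping the order of both vectors together reorders the legend without mislabelling either line.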

Method 2 - “Melting” the DataFrame
This method, which seems to be more idiomatic, involves transforming the dataframe from ‘wide’, where each line’s values sit in their own column, to ‘tall’, where all of the values share a single column and a second column acts as a key telling ggplot2 which line each data point belongs to.
melted_df <- cbind(df['time'], stack(df[c('x1', 'x2')]))
head(melted_df)
| | time | values | ind |
|---|---|---|---|
| | <dbl> | <dbl> | <fct> |
| 1 | 0.0 | 0.3129191 | x1 |
| 2 | 0.5 | 1.4673592 | x1 |
| 3 | 1.0 | 0.1460783 | x1 |
| 4 | 1.5 | 1.0497238 | x1 |
| 5 | 2.0 | 2.8634889 | x1 |
| 6 | 2.5 | 2.1927241 | x1 |
# renaming the columns: 'values' -> 'value' and 'ind' -> 'variable'
colnames(melted_df)[colnames(melted_df) == 'values'] <- "value"
colnames(melted_df)[colnames(melted_df) == 'ind'] <- "variable"
# This time the colour is set automatically by the `variable` column
ggplot(melted_df, aes(time, value, colour=variable)) +
geom_line() +
labs(colour="Variable") +
scale_colour_discrete(labels=c("First Variable", "Second Variable")) +
ggtitle("Some Variables") +
theme(plot.title = element_text(hjust=0.5)) +
xlab("Time") +
ylab("Value")
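The same wide-to-tall reshaping can also be done in a single step with `pivot_longer()` from the tidyr package, which names the key and value columns directly and so avoids the renaming. This swaps in a different tool from the base R `stack()` call used here and assumes tidyr is installed; `wide_df` and `long_df` are hypothetical names for this sketch.

```r
library(tidyr)

# Hypothetical wide data with the same shape as `df`
demo_times <- seq(0, 10, by = 0.5)
wide_df <- data.frame(
  time = demo_times,
  x1 = rnorm(length(demo_times), mean = demo_times, sd = 0.5),
  x2 = rnorm(length(demo_times), mean = demo_times * 1.1, sd = 0.5)
)

# Gather the x1 and x2 columns into key/value pairs:
# `variable` holds the old column name, `value` holds the measurement
long_df <- pivot_longer(
  wide_df,
  cols = c(x1, x2),
  names_to = "variable",
  values_to = "value"
)

head(long_df)
```

The result has the columns `time`, `variable`, and `value`, so it should work with the same `ggplot()` call as `melted_df`. Two differences are worth knowing about: the rows are interleaved rather than stacked one block after the other, and `variable` is a character column rather than a factor. Neither affects how the lines are drawn.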

Conclusion
And there we have it! Several ways to graph and label different lines on the same set of axes (and, using analogous methods, other graph types). As is so often the case, looking back at this I can hardly believe that it caused me so much trouble. Perhaps the best lesson to draw from this is that reading the documentation is key, as most of the trouble I had came from not understanding the basics of how ggplot2 works.