Created
February 6, 2019 14:00
Plot a phylogeny with clade labels and colored edges programmatically
---
title: "Phylogeny with traits in branches and clade labels"
author: "Matthias Grenié"
date: \today
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
We are going to use the package `ggtree` to show trait values and clade labels on phylogenetic trees.
```{r needed_packages}
library("ggtree")
data("geospiza_raw", package = "phylobase")
```
We use a phylogenetic tree and trait data for Darwin's finches, stored in the `geospiza_raw` dataset of the `phylobase` package:
```{r overview_geospiza}
geospiza_raw$tree
head(geospiza_raw$data)
```
We have five traits per species. The goal is to produce a fan-shaped phylogenetic tree with clade labels around it.

We can first visualize a simple tree:
```{r simple_visualization}
ggtree(geospiza_raw$tree, layout = "fan") +
  geom_tiplab()
```
Now we can color the edges of the tree by a trait, here wing length (`wingL`):
```{r colored_by_traits}
base_tree = ggtree(geospiza_raw$tree, layout = "fan") +
  geom_tiplab()

annotation_df = as.data.frame(geospiza_raw$data)
annotation_df$taxa = rownames(geospiza_raw$data)

# First column has to contain same names as tip labels,
# so reverse the column order to move `taxa` to the front
annotation_df = annotation_df[, ncol(annotation_df):1]

annotated_tree = base_tree %<+% annotation_df

colored_tree = annotated_tree + aes(color = wingL)

colored_tree
```
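Reversing the columns works here because `taxa` was appended last, but it also reverses the order of the trait columns. As a minimal sketch, assuming a toy trait table (`toy_traits` and its values are made up for illustration), the label column can be moved to the front by name while the other columns keep their order:

```{r taxa_first_sketch}
# Toy stand-in for the trait table (values are made up for illustration)
toy_traits = data.frame(wingL = c(4.2, 4.0), tarsusL = c(2.9, 2.8))
rownames(toy_traits) = c("fortis", "scandens")
toy_traits$taxa = rownames(toy_traits)

# Put `taxa` first and keep the other columns in their original order
toy_traits = toy_traits[, c("taxa", setdiff(names(toy_traits), "taxa"))]

names(toy_traits)
```

The same indexing expression should apply unchanged to `annotation_df`.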
We can label the clade that descends from a given node and groups several species:
```{r single_clade_label}
colored_tree +
  geom_cladelabel(20, "First clade", offset = 0.1, barsize = 1.5, angle = "auto")
```
If we have information on several clades, we can generate as many clade labels as needed and add them all to the tree:
```{r several_clade_labels}
# Describing the clades of species
clade_names = data.frame(
  node = c(20, 25),  # Node id in the phylogenetic tree
  clade = c("First clade", "Second clade"))

# All clade labels generated from each row of the data frame of names
all_clade_labs = apply(clade_names, 1, function(row) {
  # apply() turns each row into character strings, so convert the node back
  geom_cladelabel(as.integer(row[["node"]]), row[["clade"]], offset = 0.1,
                  barsize = 1.5, angle = "auto")
})

# Plot all clade labels
colored_tree +
  all_clade_labs
```
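One detail of the row-wise approach is that `apply()` first converts the data frame to a matrix, so every value in a row reaches the function as a character string. The sketch below, in base R only, compares this with `Map()`, which walks over the columns in parallel and keeps each column's type. The `describe` helper is hypothetical and only stands in for a call such as `geom_cladelabel()`:

```{r typed_rows_sketch}
clade_names = data.frame(
  node = c(5, 20),
  clade = c("First clade", "Second clade"),
  stringsAsFactors = FALSE)

# Hypothetical stand-in for geom_cladelabel(): reports the type of `node`
describe = function(node, clade) class(node)

# Row-wise with apply(): the node id arrives as a character string
row_types = apply(clade_names, 1, function(row) {
  describe(row[["node"]], row[["clade"]])
})

# Column-wise with Map(): the node id keeps its numeric type
col_types = unlist(Map(describe, clade_names$node, clade_names$clade))

row_types
col_types
```

Replacing `describe` with a function that calls `geom_cladelabel(node, clade, ...)` would give a list of layers that can be added to the tree in the same way.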
To alternate the color of the clade bars, add a column of alternating colors to the data frame:
```{r color_df}
# Describing the clades of species
clade_names_color = data.frame(
  node = c(20, 25),  # Node id in the phylogenetic tree
  clade = c("First clade", "Second clade"),
  color = c("#000000", "#AAAAAA"))

# All clade labels generated from each row of the data frame of names
all_clade_labs_color = apply(clade_names_color, 1, function(row) {
  # apply() turns each row into character strings, so convert the node back
  geom_cladelabel(as.integer(row[["node"]]), row[["clade"]],
                  color = row[["color"]], offset = 0.1, barsize = 1.5,
                  angle = "auto")
})

# Plot all clade labels
colored_tree +
  all_clade_labs_color
```
If the tree is big, you can programmatically determine the node at which to put each label, using the most recent common ancestor (MRCA) of the species in the clade:
```{r label_mrca}
library("dplyr")

# Data frame with each clade labeled
species_clade = data.frame(
  species = c("fuliginosa", "fortis", "magnirostris", "conirostris",
              "scandens", "difficilis", "psittacula", "parvulus", "pauper",
              "pallida"),
  clade_name = c(rep("Clade 1", 6), rep("Clade 2", 4)),
  stringsAsFactors = FALSE
)

# Retrieve Most Recent Common Ancestor for each clade to get node number
species_mrca = species_clade %>%
  group_by(clade_name) %>%
  summarise(mrca = ggtree::MRCA(colored_tree, species))

# Add alternating color column
# (`length.out`, not `size`, is the rep() argument that sets the result length)
species_mrca$color = rep(c("#000000", "#AAAAAA"),
                         length.out = nrow(species_mrca))

# Then annotate the tree
colored_tree +
  apply(species_mrca, 1, function(row) {
    # apply() turns each row into character strings, so convert the node back
    geom_cladelabel(as.integer(row[["mrca"]]), row[["clade_name"]],
                    color = row[["color"]], offset = 0.1, barsize = 1.5,
                    angle = "auto")
  })
```
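With only two clades the alternating color column is trivial, but the same idea extends to any number of clades. A minimal sketch in base R, assuming a hypothetical count of five clades (`n_clades` is made up for illustration):

```{r alternating_colors_sketch}
bar_colors = c("#000000", "#AAAAAA")
n_clades = 5  # hypothetical number of clades

# `length.out` recycles the two colors until there is one per clade
alternating = rep(bar_colors, length.out = n_clades)

alternating
```

Since the count is odd, the first and last clades get the same color. On a fan layout they may end up next to each other, so a third color could be worth adding in that case.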