Here, we show how to construct the phyloseq object based on the taxonomic table, the ASV table, and the metadata. We use the 88 soil dataset (Lauber et al. 2009) obtained from the gglasso(Schaipp, Vlasovets, and Müller 2021) tutorial. The raw data can be downloaded from the GitHub repositories. According the to tutorial the data is already processed in the following way:
Filter for OTUs with minimum abundance of \(100\) and
A detailed tutorial on constructing a phyloseq object can be found on the official phyloseq website.
We will need 2 packages for analysis:
phyloseq for the data structure and
tidyverse for data wranging.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.1 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.1 ✔ tibble 3.2.1
✔ lubridate 1.9.2 ✔ tidyr 1.3.0
✔ purrr 1.0.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(phyloseq)
Load the data
Read the different data tables.
# The OTU table contains rownames in the column Xotu_raw <-read.csv("./data/soil_raw/soil_116.csv", header =TRUE, row.names ="X")tax_raw <-read.table("./data/soil_raw/88soils_taxonomy.txt", header =TRUE, sep ="\t")# The ph data also contains rownames in X.SampleIDph <-read.csv("./data/soil_raw/ph.csv", header =TRUE, row.names ="X.SampleID")
The taxonomic information is for the unfiltered OTUs; therefore, we need to filter this file as well. We also need to divide the data frame into seven taxonomic levels instead of one string for phyloseq.
Lauber, Christian L, Micah Hamady, Rob Knight, and Noah Fierer. 2009. “Pyrosequencing-Based Assessment of Soil pH as a Predictor of Soil Bacterial Community Structure at the Continental Scale.”Applied and Environmental Microbiology 75 (15): 5111–20.
Schaipp, Fabian, Oleg Vlasovets, and Christian L. Müller. 2021. “GGLasso - a Python Package for General Graphical Lasso Computation.”Journal of Open Source Software 6 (68): 3865. https://doi.org/10.21105/joss.03865.
Source Code
---title: "Practical applications of learned concepts in R"format: html: code-fold: false code-tools: truebibliography: references.bibeditor: visual---## Construct a phyloseq ObjectHere, we show how to construct the phyloseq object based on the taxonomic table, the ASV table, and the metadata. We use the 88 soil dataset [@lauber2009pyrosequencing] obtained from the [gglasso](https://gglasso.readthedocs.io/en/latest/auto_examples/plot_soil_example.html#sphx-glr-auto-examples-plot-soil-example-py)[@Schaipp2021] tutorial. The raw data can be downloaded from the GitHub repositories. According the to tutorial the data is already processed in the following way:1. Filter for OTUs with minimum abundance of $100$ and2. add pseudo-count of $1$.Data sources:- Taxonomy: https://github.com/Vlasovets/GGLasso/blob/cfbf01535c88bbcd3ba60f24b5b867472d549f89/data/soil/original/88soils_taxonomy.txt- OTU table: https://github.com/Vlasovets/GGLasso/blob/master/data/soil/processed/soil_116.csv- Metadata: https://github.com/Vlasovets/GGLasso/blob/master/data/soil/processed/ph.csvA detailed tutorial on constructing a phyloseq object can be found on the [official phyloseq website](https://joey711.github.io/phyloseq/import-data.html).We will need 2 packages for analysis:1. `phyloseq` for the data structure and2. `tidyverse` for data wranging.```{r}#| label: load-packageslibrary(tidyverse)library(phyloseq)```### Load the dataRead the different data tables.```{r}#| label: load-data#| warning: false#| echo: true# The OTU table contains rownames in the column Xotu_raw <-read.csv("./data/soil_raw/soil_116.csv", header =TRUE, row.names ="X")tax_raw <-read.table("./data/soil_raw/88soils_taxonomy.txt", header =TRUE, sep ="\t")# The ph data also contains rownames in X.SampleIDph <-read.csv("./data/soil_raw/ph.csv", header =TRUE, row.names ="X.SampleID")```Print the data.```{r}#| label: look-at-data-otu#| warning: false#| echo: truehead(otu_raw[, 1:6])```The OTU data looks good, but we must fix the column names since they start with `X`.```{r}#| label: look-at-data-tax#| warning: false#| echo: truehead(tax_raw)dim(tax_raw)```The taxonomic information is for the unfiltered OTUs; therefore, we need to filter this file as well. We also need to divide the data frame into seven taxonomic levels instead of one string for phyloseq.```{r}#| label: look-at-data-ph#| warning: false#| echo: truehead(ph)```This data frame looks fine.### Transform the dataRemove the X in the column name so they match with the taxonomy data.```{r}#| label: replace-columnname-otucolnames(otu_raw) <-str_remove_all(colnames(otu_raw), pattern ="X")```Filter the taxonomic information and split the data into the different taxonomic ranks.```{r}#| label: filter-taxtax_raw <- tax_raw %>%filter(Feature.ID %in%colnames(otu_raw))# Sanity check: do the dimensions match? 116 Taxonomic assignments# and 116 OTUs.dim(tax_raw)dim(otu_raw)``````{r}#| label: split-taxon-columntaxonomic_ranks <-c("Kingdom", "Phylum", "Class", "Order","Family", "Genus", "Species")tax_raw <- tax_raw %>%separate_wider_delim(., cols ="Taxon",delim =";", names = taxonomic_ranks)# Phyloseq needs the OTU ids as rownamestax_raw <- tax_raw %>%column_to_rownames("Feature.ID")```### Construct the phylo objectWe need the otu table and taxonomic table in matrix form to construct the phylo object. Also, we need to remove the pseudo-count of 1```{r}#| label: transform-to-matrix# remove pseudo-count of 1otu_raw <- otu_raw -1# transform to matrixotu <-as.matrix(otu_raw)tax <-as.matrix(tax_raw)```Construct the objects for the final phyloseq object```{r}#| label: phylo-step-1# We have taxa in the columns so we need to specifiy thisotu <-otu_table(otu, taxa_are_rows =FALSE)tax <-tax_table(tax)sample_df <-sample_data(ph)```Combine everything in the phylo object.```{r}#| label: final-phylo-objectsoil <-phyloseq(otu, tax, sample_df)```Finally, save the data.```{r}#| label: savesaveRDS(soil, file ="./data/soil_processed/soil.RDS")```## Computational environment```{r}#| label: comp-envirsessionInfo()```