---
title: "Downloading from KEGG"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Downloading from KEGG}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```r
library(volcalc)
library(dplyr) # for left_join(), select(), arrange(), and slice_head()
#> 
#> Attaching package: 'dplyr'
#> The following object is masked from 'package:ChemmineR':
#> 
#>     groups
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
```

The `volcalc` package can download .mol files directly from KEGG, given either compound IDs or pathway IDs.
First, choose a directory to download the files to.
This vignette uses a temporary directory, but in practice you should choose a location within your project, because the contents of `tempdir()` are removed when the R session ends.

```r
dl_path <- tempdir()
```

### Single compound usage

You can search KEGG for compounds to find their KEGG compound IDs, which start with a "C".
Let's download .mol files for two compounds, jasmonic acid and methyl jasmonate, with KEGG IDs [C08491](https://www.kegg.jp/entry/C08491) and [C11512](https://www.kegg.jp/entry/C11512), respectively, using the `volcalc` function `get_mol_kegg()`.

```r
mols <- get_mol_kegg(compound_ids = c("C08491", "C11512"), dir = dl_path)
mols
#> # A tibble: 2 × 2
#>   compound_id mol_path
#>   
#> 1 C08491      /var/folders/wr/by_lst2d2fngf67mknmgf4340000gn/T/RtmpKwerlt/C08491.mol
#> 2 C11512      /var/folders/wr/by_lst2d2fngf67mknmgf4340000gn/T/RtmpKwerlt/C11512.mol
```

The data frame returned by `get_mol_kegg()` stores the path of each downloaded file in the `mol_path` column, so the paths can be passed directly to the `volcalc` function `calc_vol()`.
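Before handing the paths to `calc_vol()`, it can be worth confirming that every download actually produced a file. The base-R sketch below does this; `check_mol_paths()` is a hypothetical helper, not part of `volcalc`, and the demo writes a placeholder file (not a real .mol file) to a temporary directory instead of contacting KEGG.

```r
# Hypothetical helper (not part of volcalc): return the mol_path values that
# do not point to an existing, non-empty file.
check_mol_paths <- function(mol_path) {
  sizes <- file.size(mol_path)        # NA for files that do not exist
  ok <- !is.na(sizes) & sizes > 0
  mol_path[!ok]
}

# Demo with placeholder files in a temporary directory (no KEGG access)
demo_dir <- file.path(tempdir(), "mol_check_demo")
dir.create(demo_dir, showWarnings = FALSE)
good <- file.path(demo_dir, "C08491.mol")
writeLines("placeholder", good)       # stands in for a downloaded file
missing <- file.path(demo_dir, "C11512.mol")
unlink(missing)                       # make sure this one does not exist

bad <- check_mol_paths(c(good, missing))
bad
```

In the workflow above, `check_mol_paths(mols$mol_path)` should return `character(0)` when every file was downloaded.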
```r
rvi <- calc_vol(mols$mol_path)
rvi
#> # A tibble: 2 × 5
#>   mol_path                                                               formula  name              rvi category
#>   
#> 1 /var/folders/wr/by_lst2d2fngf67mknmgf4340000gn/T/RtmpKwerlt/C08491.mol C12H18O3 (-)-Jasmonic acid 1.84 moderate
#> 2 /var/folders/wr/by_lst2d2fngf67mknmgf4340000gn/T/RtmpKwerlt/C11512.mol C13H20O3 Methyl jasmonate  3.81 high
```

Because `calc_vol()` also returns the file paths in `mol_path`, the two data frames can easily be joined on that column.

```r
left_join(mols, rvi, by = join_by(mol_path)) %>%
  select(-mol_path)
#> # A tibble: 2 × 5
#>   compound_id formula  name              rvi category
#>   
#> 1 C08491      C12H18O3 (-)-Jasmonic acid 1.84 moderate
#> 2 C11512      C13H20O3 Methyl jasmonate  3.81 high
```

### Pathway usage

The `compound_ids` argument downloads one or more individual compounds, but you can also download all compounds associated with a KEGG pathway using `pathway_ids`.
Let's download the entire alpha-linolenic acid metabolism pathway ([map00592](https://www.kegg.jp/entry/map00592)), which includes the two compounds above.

```r
alam_pathway <- get_mol_kegg(pathway_ids = "map00592", dir = dl_path)
head(alam_pathway)
#> # A tibble: 6 × 3
#>   pathway_id compound_id mol_path
#>   
#> 1 map00592   C00157      /var/folders/wr/by_lst2d2fngf67mknmgf4340000gn/T/RtmpKwerlt/map00592/C00157.mol
#> 2 map00592   C01226      /var/folders/wr/by_lst2d2fngf67mknmgf4340000gn/T/RtmpKwerlt/map00592/C01226.mol
#> 3 map00592   C04672      /var/folders/wr/by_lst2d2fngf67mknmgf4340000gn/T/RtmpKwerlt/map00592/C04672.mol
#> 4 map00592   C04780      /var/folders/wr/by_lst2d2fngf67mknmgf4340000gn/T/RtmpKwerlt/map00592/C04780.mol
#> 5 map00592   C04785      /var/folders/wr/by_lst2d2fngf67mknmgf4340000gn/T/RtmpKwerlt/map00592/C04785.mol
#> 6 map00592   C06427      /var/folders/wr/by_lst2d2fngf67mknmgf4340000gn/T/RtmpKwerlt/map00592/C06427.mol
dim(alam_pathway)
#> [1] 44  3
```

Notice that the result includes a `pathway_id` column alongside `compound_id`, and that the .mol files are saved in a subdirectory named after the pathway.
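As the paths above show, pathway downloads are laid out as `<dir>/<pathway_id>/<compound_id>.mol`, so the IDs can be recovered from `mol_path` alone, for example when reading previously downloaded files back from disk in a later session. This base-R sketch assumes that layout holds; `ids_from_mol_path()` is a hypothetical helper, not part of `volcalc`, and the example paths are shortened, illustrative stand-ins for the ones printed above.

```r
# Hypothetical helper (not part of volcalc): split paths of the form
# <dir>/<pathway_id>/<compound_id>.mol into their ID components.
ids_from_mol_path <- function(mol_path) {
  data.frame(
    pathway_id  = basename(dirname(mol_path)),          # parent folder name
    compound_id = sub("\\.mol$", "", basename(mol_path)), # file name minus .mol
    mol_path    = mol_path,
    stringsAsFactors = FALSE
  )
}

# Shortened, illustrative paths mirroring the layout shown above
paths <- c("/tmp/RtmpXXXX/map00592/C00157.mol",
           "/tmp/RtmpXXXX/map00592/C01226.mol")
ids <- ids_from_mol_path(paths)
ids
```

Files downloaded with `compound_ids` are written directly into `dir` without a pathway subdirectory, so for those the `pathway_id` column would simply contain the name of the download directory.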
As before, we pass the `mol_path` column to `calc_vol()`, join the result back to `alam_pathway`, and do some basic data wrangling to find the 10 most volatile compounds in the pathway.

```r
rvi_path <- calc_vol(alam_pathway$mol_path)
#> Warning in FUN(X[[i]], ...): Possible OpenBabel errors detected and only NAs returned.
#> Run with `validate = FALSE` to ignore this.
```

The warning indicates that possible OpenBabel errors were detected for at least one file and that only `NA`s were returned for it; as the message notes, running with `validate = FALSE` skips this check.

```r
left_join(alam_pathway, rvi_path, by = join_by(mol_path)) %>%
  select(-mol_path) %>%
  # arrange from most to least volatile
  arrange(desc(rvi)) %>%
  # take just the top 10
  slice_head(n = 10)
#> # A tibble: 10 × 6
#>    pathway_id compound_id formula  name                        rvi category
#>    
#>  1 map00592   C16310      C6H10O   3-Hexenal                  7.32 high
#>  2 map00592   C19757      C8H14O2  (3Z)-Hex-3-en-1-yl acetate 6.75 high
#>  3 map00592   C08492      C6H12O   3-Hexenol                  6.45 high
#>  4 map00592   C16323      C9H14O   3,6-Nonadienal             6.05 high
#>  5 map00592   C11512      C13H20O3 Methyl jasmonate           3.81 high
#>  6 map00592   C16318      C13H20O3 (+)-7-Isomethyljasmonate   3.81 high
#>  7 map00592   C16322      C9H16O3  9-Oxononanoic acid         2.77 high
#>  8 map00592   C16343      C17H28O  Heptadecatrienal           2.69 high
#>  9 map00592   C08491      C12H18O3 (-)-Jasmonic acid          1.84 moderate
#> 10 map00592   C16317      C12H18O3 (+)-7-Isojasmonic acid     1.84 moderate
```
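For readers not using dplyr, the ranking step can be sketched in base R. The rows below are a small subset copied from the output above, plus one hypothetical row with a missing `rvi` standing in for a compound where `calc_vol()` returned `NA`s. Like `arrange()`, base R's `order()` places `NA` values last by default, even when sorting in decreasing order, so such compounds do not crowd out real values at the top of the list.

```r
# Small subset of the joined results shown above, plus one hypothetical
# row with a missing rvi (as when calc_vol() returns NAs for a file)
res <- data.frame(
  compound_id = c("C08491", "C16310", "hypothetical", "C11512", "C08492"),
  rvi         = c(1.84, 7.32, NA, 3.81, 6.45),
  category    = c("moderate", "high", NA, "high", "high"),
  stringsAsFactors = FALSE
)

# Base-R counterpart of arrange(desc(rvi)) %>% slice_head(n = 3):
# order() puts NA last by default, even with decreasing = TRUE
top <- head(res[order(res$rvi, decreasing = TRUE), ], n = 3)
top

# Number of compounds in each volatility category (NAs are excluded)
table(res$category)
```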