---
title: "Downloading from KEGG"
output: rmarkdown::html_vignette
vignette: >
%\VignetteIndexEntry{Downloading from KEGG}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```r
library(volcalc)
library(dplyr) #for left_join()
#>
#> Attaching package: 'dplyr'
#> The following object is masked from 'package:ChemmineR':
#>
#> groups
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
```
The `volcalc` package can be used to download .mol files directly from KEGG given either compound IDs or pathway IDs.
First, choose a directory to download files to. For this vignette, we will use a temporary directory, but you should choose somewhere in your project.
```r
dl_path <- tempdir()
```
### Single compound usage
You can search KEGG for compunds at to find their KEGG IDs starting with a "C".
Let's download .mol files for two compounds, jasmonic acid and methyl jasmonate, with KEGG IDs [C08491](https://www.kegg.jp/entry/C08491) and [C11512](https://www.kegg.jp/entry/C11512), respectively, using the `volcalc` function `get_mol_kegg()`.
```r
mols <- get_mol_kegg(compound_ids = c("C08491", "C11512"), dir = dl_path)
mols
#> # A tibble: 2 × 2
#> compound_id mol_path
#>
#> 1 C08491 /var/folders/wr/by_lst2d2fngf67mknmgf4340000gn/T/RtmpKwerlt/C08491.mol
#> 2 C11512 /var/folders/wr/by_lst2d2fngf67mknmgf4340000gn/T/RtmpKwerlt/C11512.mol
```
The data frame returned by `get_mol_kegg()` contains the paths the files were downloaded to in `mol_path`, making for convenient passage on to the `volcalc` function `calc_vol()`.
```r
rvi <- calc_vol(mols$mol_path)
rvi
#> # A tibble: 2 × 5
#> mol_path formula name rvi category
#>
#> 1 /var/folders/wr/by_lst2d2fngf67mknmgf4340000gn/T/RtmpKwerlt/C08491.mol C12H18O3 (-)-Jasmonic acid 1.84 moderate
#> 2 /var/folders/wr/by_lst2d2fngf67mknmgf4340000gn/T/RtmpKwerlt/C11512.mol C13H20O3 Methyl jasmonate 3.81 high
```
`calc_vol()` also returns the file paths, so these two data frames can be easily joined.
```r
left_join(mols, rvi, by = join_by(mol_path)) %>%
select(-mol_path)
#> # A tibble: 2 × 5
#> compound_id formula name rvi category
#>
#> 1 C08491 C12H18O3 (-)-Jasmonic acid 1.84 moderate
#> 2 C11512 C13H20O3 Methyl jasmonate 3.81 high
```
### Pathway usage
We can download single or multiple compounds with `compound_ids`, but we can also download all compounds associated with a KEGG pathway with `pathway_ids`. Let's download the entire alpha-linolenic acid metabolism pathway ([map00592](https://www.kegg.jp/entry/map00592)) that the above two compounds are part of.
```r
alam_pathway <- get_mol_kegg(pathway_ids = "map00592", dir = dl_path)
head(alam_pathway)
#> # A tibble: 6 × 3
#> pathway_id compound_id mol_path
#>
#> 1 map00592 C00157 /var/folders/wr/by_lst2d2fngf67mknmgf4340000gn/T/RtmpKwerlt/map00592/C00157.mol
#> 2 map00592 C01226 /var/folders/wr/by_lst2d2fngf67mknmgf4340000gn/T/RtmpKwerlt/map00592/C01226.mol
#> 3 map00592 C04672 /var/folders/wr/by_lst2d2fngf67mknmgf4340000gn/T/RtmpKwerlt/map00592/C04672.mol
#> 4 map00592 C04780 /var/folders/wr/by_lst2d2fngf67mknmgf4340000gn/T/RtmpKwerlt/map00592/C04780.mol
#> 5 map00592 C04785 /var/folders/wr/by_lst2d2fngf67mknmgf4340000gn/T/RtmpKwerlt/map00592/C04785.mol
#> 6 map00592 C06427 /var/folders/wr/by_lst2d2fngf67mknmgf4340000gn/T/RtmpKwerlt/map00592/C06427.mol
dim(alam_pathway)
#> [1] 44 3
```
Notice that this returns pathway IDs and compound IDs. We can do the same as above and pass the `mol_path` column to `calc_vol()` and then join the resulting data frame and do some basic data wrangling to find the top 10 most volatile compounds in that pathway.
```r
rvi_path <- calc_vol(alam_pathway$mol_path)
#> Warning in FUN(X[[i]], ...): Possible OpenBabel errors detected and only NAs returned.
#> Run with `validate = FALSE` to ignore this.
```
```r
left_join(alam_pathway, rvi_path, by = join_by(mol_path)) %>%
select(-mol_path) %>%
#arrange from most to least volatile
arrange(desc(rvi)) %>%
#take just the top 10
slice_head(n = 10)
#> # A tibble: 10 × 6
#> pathway_id compound_id formula name rvi category
#>
#> 1 map00592 C16310 C6H10O 3-Hexenal 7.32 high
#> 2 map00592 C19757 C8H14O2 (3Z)-Hex-3-en-1-yl acetate 6.75 high
#> 3 map00592 C08492 C6H12O 3-Hexenol 6.45 high
#> 4 map00592 C16323 C9H14O 3,6-Nonadienal 6.05 high
#> 5 map00592 C11512 C13H20O3 Methyl jasmonate 3.81 high
#> 6 map00592 C16318 C13H20O3 (+)-7-Isomethyljasmonate 3.81 high
#> 7 map00592 C16322 C9H16O3 9-Oxononanoic acid 2.77 high
#> 8 map00592 C16343 C17H28O Heptadecatrienal 2.69 high
#> 9 map00592 C08491 C12H18O3 (-)-Jasmonic acid 1.84 moderate
#> 10 map00592 C16317 C12H18O3 (+)-7-Isojasmonic acid 1.84 moderate
```