The primary function in volcalc is calc_vol(). It accepts either a path to a .mol file or a SMILES string. A few example .mol files are included with the volcalc installation, and mol_example() returns their file paths.
# Using the built-in example .mol files
mol_paths <- mol_example()
mol_paths
#> [1] "/tmp/Rtmp7Kwjk8/Rinst13be5e9f4054/volcalc/extdata/C00031.mol"
#> [2] "/tmp/Rtmp7Kwjk8/Rinst13be5e9f4054/volcalc/extdata/C00157.mol"
#> [3] "/tmp/Rtmp7Kwjk8/Rinst13be5e9f4054/volcalc/extdata/C08491.mol"
#> [4] "/tmp/Rtmp7Kwjk8/Rinst13be5e9f4054/volcalc/extdata/C16181.mol"
#> [5] "/tmp/Rtmp7Kwjk8/Rinst13be5e9f4054/volcalc/extdata/C16286.mol"
#> [6] "/tmp/Rtmp7Kwjk8/Rinst13be5e9f4054/volcalc/extdata/C16521.mol"
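The example files appear to be named after KEGG compound IDs (an inference from the file names, not something stated above), so a short identifier can be recovered from each path. A minimal base-R sketch, using two of the printed paths as literal strings so that it runs without volcalc installed:

```r
# Two of the example paths printed above, copied as plain strings
paths <- c(
  "/tmp/Rtmp7Kwjk8/Rinst13be5e9f4054/volcalc/extdata/C00031.mol",
  "/tmp/Rtmp7Kwjk8/Rinst13be5e9f4054/volcalc/extdata/C00157.mol"
)

# Drop the directory, then the .mol extension
ids <- tools::file_path_sans_ext(basename(paths))
ids
```

These short IDs are easier to read in a results table than the full temporary-directory paths.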
By default, the output of calc_vol() includes a relative volatility index, rvi, which is equivalent to $\log_{10} C^*$ (Meredith et al., 2023). It also includes a volatility category, which by default is assigned using the RVI cutoffs for clean air.
calc_vol(mol_paths)
#> Warning in FUN(X[[i]], ...): Possible OpenBabel errors detected and only NAs returned.
#> Run with `validate = FALSE` to ignore this.
#> # A tibble: 6 × 5
#> mol_path formula name rvi category
#> <chr> <chr> <chr> <dbl> <fct>
#> 1 /tmp/Rtmp7Kwjk8/Rinst13be5e9f4054/volcalc/extdat… C6H12O6 D-Gl… -2.81 non-vol…
#> 2 /tmp/Rtmp7Kwjk8/Rinst13be5e9f4054/volcalc/extdat… <NA> Phos… NA <NA>
#> 3 /tmp/Rtmp7Kwjk8/Rinst13be5e9f4054/volcalc/extdat… C12H18… (-)-… 1.84 moderate
#> 4 /tmp/Rtmp7Kwjk8/Rinst13be5e9f4054/volcalc/extdat… C6H7Cl… beta… 6.98 high
#> 5 /tmp/Rtmp7Kwjk8/Rinst13be5e9f4054/volcalc/extdat… C12H22O Geos… 4.16 high
#> 6 /tmp/Rtmp7Kwjk8/Rinst13be5e9f4054/volcalc/extdat… C5H8 Isop… 8.84 high
Specifying the environment argument changes only the category column: each environment uses different RVI cutoffs for non-volatile, low, moderate, and high volatility, while rvi itself is unchanged. The environment options and their category cutoffs are listed in the calc_vol() documentation and are discussed in more detail in Meredith et al. (2023) and Donahue et al. (2006).
calc_vol(mol_paths, environment = "soil")
#> Warning in FUN(X[[i]], ...): Possible OpenBabel errors detected and only NAs returned.
#> Run with `validate = FALSE` to ignore this.
#> # A tibble: 6 × 5
#> mol_path formula name rvi category
#> <chr> <chr> <chr> <dbl> <fct>
#> 1 /tmp/Rtmp7Kwjk8/Rinst13be5e9f4054/volcalc/extdat… C6H12O6 D-Gl… -2.81 non-vol…
#> 2 /tmp/Rtmp7Kwjk8/Rinst13be5e9f4054/volcalc/extdat… <NA> Phos… NA <NA>
#> 3 /tmp/Rtmp7Kwjk8/Rinst13be5e9f4054/volcalc/extdat… C12H18… (-)-… 1.84 non-vol…
#> 4 /tmp/Rtmp7Kwjk8/Rinst13be5e9f4054/volcalc/extdat… C6H7Cl… beta… 6.98 moderate
#> 5 /tmp/Rtmp7Kwjk8/Rinst13be5e9f4054/volcalc/extdat… C12H22O Geos… 4.16 low
#> 6 /tmp/Rtmp7Kwjk8/Rinst13be5e9f4054/volcalc/extdat… C5H8 Isop… 8.84 high
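To illustrate how one RVI value can land in different categories, here is a minimal base-R sketch. The cutoffs are assumptions chosen only because they reproduce the two outputs above; they are not quoted from the calc_vol() documentation, which remains the authoritative source. categorize_rvi() is a hypothetical helper, and its handling of values that fall exactly on a cutoff is arbitrary.

```r
# Hypothetical helper: bin RVI values into four ordered categories.
# `cutoffs` holds the three boundaries between the categories.
categorize_rvi <- function(rvi, cutoffs) {
  cut(
    rvi,
    breaks = c(-Inf, cutoffs, Inf),
    labels = c("non-volatile", "low", "moderate", "high")
  )
}

# RVI values printed above (the NA row is omitted)
rvi <- c(-2.81, 1.84, 6.98, 4.16, 8.84)

# Assumed cutoffs, chosen only to reproduce the categories shown above
clean_cutoffs <- c(-2, 0, 2)
soil_cutoffs <- c(4, 6, 8)

data.frame(
  rvi = rvi,
  clean = categorize_rvi(rvi, clean_cutoffs),
  soil = categorize_rvi(rvi, soil_cutoffs)
)
```

Under these assumed cutoffs, an RVI of 4.16 is "high" for clean air but "low" for soil, which matches the geosmin row in both outputs.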
By default, calc_vol() uses a modified version of the SIMPOL.1 method, which is a group contribution method. Setting return_fx_groups = TRUE makes calc_vol() also return the counts of functional groups and other molecular properties, which is useful for validation. See ?get_fx_groups() for more information about these additional columns.
calc_vol(mol_paths, return_fx_groups = TRUE)
#> Warning in FUN(X[[i]], ...): Possible OpenBabel errors detected and only NAs returned.
#> Run with `validate = FALSE` to ignore this.
#> # A tibble: 6 × 53
#> mol_path formula name rvi category exact_mass carbons carbons_asa
#> <chr> <chr> <chr> <dbl> <fct> <dbl> <int> <int>
#> 1 /tmp/Rtmp7Kwjk8/R… C6H12O6 D-Gl… -2.81 non-vol… 180. 6 0
#> 2 /tmp/Rtmp7Kwjk8/R… <NA> Phos… NA <NA> NA 0 0
#> 3 /tmp/Rtmp7Kwjk8/R… C12H18… (-)-… 1.84 moderate 210. 12 0
#> 4 /tmp/Rtmp7Kwjk8/R… C6H7Cl… beta… 6.98 high 270. 6 0
#> 5 /tmp/Rtmp7Kwjk8/R… C12H22O Geos… 4.16 high 182. 12 0
#> 6 /tmp/Rtmp7Kwjk8/R… C5H8 Isop… 8.84 high 68.1 5 0
#> # ℹ 45 more variables: rings_aromatic <int>, rings_total <int>,
#> # rings_aliphatic <int>, carbon_dbl_bonds_aliphatic <int>,
#> # CCCO_aliphatic_ring <int>, hydroxyl_total <int>, hydroxyl_aromatic <int>,
#> # hydroxyl_aliphatic <int>, aldehydes <int>, ketones <int>,
#> # carbox_acids <int>, ester <int>, ether_total <int>, ether_alkyl <int>,
#> # ether_alicyclic <int>, ether_aromatic <int>, nitrate <int>, nitro <int>,
#> # amine_primary <int>, amine_secondary <int>, amine_tertiary <int>, …
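As a rough sketch of what "group contribution" means here, the general form of the published SIMPOL.1 method can be written as a sum of per-group terms. The symbols $\nu_{k,i}$ and $b_k(T)$ are introduced only for this illustration, and the modified version used by volcalc may differ in which groups it counts and which coefficients it uses.

```latex
\log_{10} P^{\circ}_{L,i}(T) = \sum_{k} \nu_{k,i} \, b_k(T)
```

Here $\nu_{k,i}$ is the number of times group $k$ occurs in compound $i$ (the kind of count returned with return_fx_groups = TRUE), and $b_k(T)$ is the contribution of group $k$ at temperature $T$. The $k = 0$ term is a constant with $\nu_{0,i} = 1$.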
The SIMPOL.1 method calculates $\log_{10} P^{\circ}_{L,i}(T)$, which calc_vol() uses to calculate RVI as $\log_{10}(PM/RT)$, where $P$ is the estimated vapor pressure of the compound, $M$ is the molecular weight of the compound, $R$ is the universal gas constant, and $T$ is temperature (293.14 K, approximately 20 °C). To see these intermediate calculations, use return_calc_steps = TRUE.
calc_vol(mol_paths, return_calc_steps = TRUE)
#> Warning in FUN(X[[i]], ...): Possible OpenBabel errors detected and only NAs returned.
#> Run with `validate = FALSE` to ignore this.
#> # A tibble: 6 × 8
#> mol_path formula name rvi category molecular_weight log_alpha log10_P
#> <chr> <chr> <chr> <dbl> <fct> <dbl> <dbl> <dbl>
#> 1 /tmp/Rtmp7Kwj… C6H12O6 D-Gl… -2.81 non-vol… 180. 9.87 -12.7
#> 2 /tmp/Rtmp7Kwj… <NA> Phos… NA <NA> NA NA 1.79
#> 3 /tmp/Rtmp7Kwj… C12H18… (-)-… 1.84 moderate 210. 9.94 -8.10
#> 4 /tmp/Rtmp7Kwj… C6H7Cl… beta… 6.98 high 272. 10.1 -3.08
#> 5 /tmp/Rtmp7Kwj… C12H22O Geos… 4.16 high 182. 9.88 -5.72
#> 6 /tmp/Rtmp7Kwj… C5H8 Isop… 8.84 high 68.1 9.45 -0.61
Here, log_alpha is $\log_{10}(M/RT)$, so rvi is the sum of log_alpha and log10_P.
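The arithmetic can be checked by hand for the isoprene row. This is a sketch under stated assumptions: the text above does not give units, so the gas constant value and the conversion of molecular weight from grams to micrograms are assumptions, chosen because they reproduce the printed log_alpha column. The inputs are the rounded values from the table, so other rows may differ in the last digit.

```r
# Assumed units (not stated in the text above):
#   R in m^3 atm / (mol K), M converted from g/mol to micrograms/mol
R_gas <- 8.21e-5
temp_K <- 293.14

# Isoprene row, using the rounded values printed above
mol_weight <- 68.1
log10_P <- -0.61

# log_alpha = log10(M / RT), with the assumed unit conversion on M
log_alpha <- log10((1e6 * mol_weight) / (R_gas * temp_K))

# rvi = log10(PM / RT) = log10(P) + log10(M / RT)
rvi <- log_alpha + log10_P

round(log_alpha, 2)  # 9.45
round(rvi, 2)        # 8.84
```

Both values agree with the log_alpha and rvi columns for isoprene in the output above.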
All of this can also be done using SMILES strings rather than .mol files by setting from = "smiles". Note that the backslash (\) is a valid SMILES character, but in an R string it begins an escape sequence, so it must be “escaped” as \\.
## This will error even though the SMILES is correct
# calc_vol("CC/C=C\C[C@@H]1[C@H](CCC1=O)CC(=O)O", from = "smiles")
# To solve this, escape \C as \\C
calc_vol("CC/C=C\\C[C@@H]1[C@H](CCC1=O)CC(=O)O", from = "smiles")
#> # A tibble: 1 × 5
#> smiles formula name rvi category
#> <chr> <chr> <chr> <dbl> <fct>
#> 1 "CC/C=C\\C[C@@H]1[C@H](CCC1=O)CC(=O)O" C12H18O3 <NA> 1.84 moderate
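On R 4.0.0 or later, raw string syntax is another way to avoid doubling backslashes. The sketch below only compares the two spellings in base R; it does not call calc_vol(), and the variable names are illustrative.

```r
# Regular string: the backslash must be doubled
smiles_escaped <- "CC/C=C\\C[C@@H]1[C@H](CCC1=O)CC(=O)O"

# Raw string (R >= 4.0.0): everything between r"( and )" is taken literally
smiles_raw <- r"(CC/C=C\C[C@@H]1[C@H](CCC1=O)CC(=O)O)"

# Both spellings produce the same character string
identical(smiles_escaped, smiles_raw)

# writeLines() shows the string as stored, with a single backslash
writeLines(smiles_raw)
```

Either form can then be passed as the SMILES argument. The raw form is convenient when pasting SMILES strings from another source.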
Occasionally, a .mol file will result in an error message bubbling up from the OpenBabel command line utility. This happens, for example, when there is an ‘R’ group somewhere in the molecule, as is the case with the KEGG entry for phosphatidylcholine (C00157).
c00157 <- mol_paths[2] # path to the C00157.mol example file
calc_vol(c00157)
#> ==============================
#> *** Open Babel Warning in InChI code
#> Phosphatidylcholine :Unknown element(s): *
#> ==============================
#> *** Open Babel Error in InChI code
#> InChI generation failed
#> # A tibble: 1 × 5
#> mol_path formula name rvi category
#> <chr> <chr> <chr> <dbl> <fct>
#> 1 /Users/ericscott/Documents/GitHub/volcalc/inst/extdata/C00157.mol NA Phosphatidylcholine NA NA
#> Warning message:
#> In FUN(X[[i]], ...) :
#> Possible OpenBabel errors detected and only NAs returned.
#> Run with `validate = FALSE` to ignore this.
Without validation, calc_vol() returns incorrect values of rvi and category for this compound.
calc_vol(c00157, validate = FALSE)
#> ==============================
#> *** Open Babel Warning in InChI code
#> Phosphatidylcholine :Unknown element(s): *
#> ==============================
#> *** Open Babel Error in InChI code
#> InChI generation failed
#> # A tibble: 1 × 5
#> mol_path formula name rvi category
#> <chr> <chr> <chr> <dbl> <fct>
#> 1 /Users/ericscott/Documents/GitHub/volcalc/inst/extdata/C00157.mol C10H18NO8P Phosphatidylcholi… 2.89 high
Phosphatidylcholine is a large phospholipid and is not highly volatile, contrary to what these results suggest.
Unfortunately, it is nearly impossible to detect these parsing errors from OpenBabel directly in R. When validate = TRUE (the default), calc_vol() looks for “symptoms” of OpenBabel errors and, if it finds any, returns NAs for all values. Specifically, validation assumes that InChI generation fails whenever there are OpenBabel parsing issues. Because InChI generation is not available in the Windows version of OpenBabel installed with ChemmineOB, this volcalc feature is only available on macOS and Linux; setting validate = TRUE on Windows has no effect.
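A script that runs on several platforms could flag when this check cannot run. A minimal base-R sketch: validation_supported() is a hypothetical helper, not part of volcalc, and it assumes that every non-Windows platform has InChI generation available.

```r
# Hypothetical helper: TRUE on macOS and Linux, FALSE on Windows.
# .Platform$OS.type is either "unix" or "windows".
validation_supported <- function() {
  .Platform$OS.type != "windows"
}

if (!validation_supported()) {
  warning("OpenBabel validation has no effect on Windows; check rvi values manually.")
}
```

On Windows, results for molecules with unusual structures, such as ‘R’ groups, deserve a manual check.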
Donahue, N.M., Robinson, A.L., Stanier, C.O., Pandis, S.N., 2006. Coupled Partitioning, Dilution, and Chemical Aging of Semivolatile Organics. Environ. Sci. Technol. 40, 2635–2643. DOI: 10.1021/es052297c
Meredith, L., Ledford, S., Riemer, K., Geffre, P., Graves, K., Honeker, L., LeBauer, D., Tfaily, M., Krechmer, J., 2023. Automating methods for estimating metabolite volatility. Frontiers in Microbiology. DOI: 10.3389/fmicb.2023.1267234