Introduction to volcalc

library(volcalc)

The primary function in volcalc is calc_vol(). It accepts either paths to .mol files or SMILES strings. A few example .mol files are included with the volcalc installation; mol_example() returns their file paths.

Basic usage with .mol files

# Using built-in example .mol files
mol_paths <- mol_example()
mol_paths
#> [1] "/tmp/RtmpASHVvG/Rinst14b44644d793/volcalc/extdata/C00031.mol"
#> [2] "/tmp/RtmpASHVvG/Rinst14b44644d793/volcalc/extdata/C00157.mol"
#> [3] "/tmp/RtmpASHVvG/Rinst14b44644d793/volcalc/extdata/C08491.mol"
#> [4] "/tmp/RtmpASHVvG/Rinst14b44644d793/volcalc/extdata/C16181.mol"
#> [5] "/tmp/RtmpASHVvG/Rinst14b44644d793/volcalc/extdata/C16286.mol"
#> [6] "/tmp/RtmpASHVvG/Rinst14b44644d793/volcalc/extdata/C16521.mol"

The default output of calc_vol() includes a relative volatility index, rvi, which is equivalent to $\log_{10} C^*$ (Meredith et al., 2023). It also includes an RVI category for the default environment, clean air.

calc_vol(mol_paths)
#> Warning in FUN(X[[i]], ...): Possible OpenBabel errors detected and only NAs returned.
#> Run with `validate = FALSE` to ignore this.
#> # A tibble: 6 × 5
#>   mol_path                                          formula name    rvi category
#>   <chr>                                             <chr>   <chr> <dbl> <fct>   
#> 1 /tmp/RtmpASHVvG/Rinst14b44644d793/volcalc/extdat… C6H12O6 D-Gl… -2.81 non-vol…
#> 2 /tmp/RtmpASHVvG/Rinst14b44644d793/volcalc/extdat… <NA>    Phos… NA    <NA>    
#> 3 /tmp/RtmpASHVvG/Rinst14b44644d793/volcalc/extdat… C12H18… (-)-…  1.84 moderate
#> 4 /tmp/RtmpASHVvG/Rinst14b44644d793/volcalc/extdat… C6H7Cl… beta…  6.98 high    
#> 5 /tmp/RtmpASHVvG/Rinst14b44644d793/volcalc/extdat… C12H22O Geos…  4.16 high    
#> 6 /tmp/RtmpASHVvG/Rinst14b44644d793/volcalc/extdat… C5H8    Isop…  8.84 high

Specify environment

The environment argument changes only the RVI category, not the RVI itself: each environment uses different RVI cutoffs for non-volatile, low, moderate, and high volatility. Environment options and their category cutoffs are listed in the calc_vol() documentation and are discussed in more detail in Meredith et al. (2023) and Donahue et al. (2006).

calc_vol(mol_paths, environment = "soil")
#> Warning in FUN(X[[i]], ...): Possible OpenBabel errors detected and only NAs returned.
#> Run with `validate = FALSE` to ignore this.
#> # A tibble: 6 × 5
#>   mol_path                                          formula name    rvi category
#>   <chr>                                             <chr>   <chr> <dbl> <fct>   
#> 1 /tmp/RtmpASHVvG/Rinst14b44644d793/volcalc/extdat… C6H12O6 D-Gl… -2.81 non-vol…
#> 2 /tmp/RtmpASHVvG/Rinst14b44644d793/volcalc/extdat… <NA>    Phos… NA    <NA>    
#> 3 /tmp/RtmpASHVvG/Rinst14b44644d793/volcalc/extdat… C12H18… (-)-…  1.84 non-vol…
#> 4 /tmp/RtmpASHVvG/Rinst14b44644d793/volcalc/extdat… C6H7Cl… beta…  6.98 moderate
#> 5 /tmp/RtmpASHVvG/Rinst14b44644d793/volcalc/extdat… C12H22O Geos…  4.16 low     
#> 6 /tmp/RtmpASHVvG/Rinst14b44644d793/volcalc/extdat… C5H8    Isop…  8.84 high
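
The way cutoffs map an RVI to a category can be sketched in base R with cut(). The helper rvi_category() is hypothetical and is not part of volcalc. The cutoff values, the option name "clean_atmos", and the choice of right-closed intervals are assumptions chosen to be consistent with the two outputs above; the calc_vol() documentation remains the authoritative source.

```r
# Hypothetical helper: bin RVI values into volatility categories.
# Cutoffs are ASSUMED (they reproduce the categories shown above) and are
# not taken from the volcalc source; see ?calc_vol for the real values.
rvi_category <- function(rvi, environment = c("clean_atmos", "soil")) {
  environment <- match.arg(environment)
  cutoffs <- switch(environment,
    clean_atmos = c(-2, 0, 2),
    soil        = c(4, 6, 8)
  )
  # cut() uses right-closed intervals by default, e.g. (0, 2]
  cut(
    rvi,
    breaks = c(-Inf, cutoffs, Inf),
    labels = c("non-volatile", "low", "moderate", "high")
  )
}

# RVI values copied from the example output (the NA row is omitted)
rvi <- c(-2.81, 1.84, 6.98, 4.16, 8.84)
data.frame(
  rvi         = rvi,
  clean_atmos = rvi_category(rvi, "clean_atmos"),
  soil        = rvi_category(rvi, "soil")
)
```

The rvi column is identical in both cases; only the bin edges move, which is why geosmin (4.16) is "high" in clean air but "low" in soil.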

Return intermediate steps

By default, calc_vol() uses a modified version of SIMPOL.1, a group contribution method. To have calc_vol() return the counts of functional groups and other molecular properties, which is useful for validation, set return_fx_groups = TRUE. See ?get_fx_groups for more information about these additional columns.

calc_vol(mol_paths, return_fx_groups = TRUE)
#> Warning in FUN(X[[i]], ...): Possible OpenBabel errors detected and only NAs returned.
#> Run with `validate = FALSE` to ignore this.
#> # A tibble: 6 × 53
#>   mol_path           formula name    rvi category exact_mass carbons carbons_asa
#>   <chr>              <chr>   <chr> <dbl> <fct>         <dbl>   <int>       <int>
#> 1 /tmp/RtmpASHVvG/R… C6H12O6 D-Gl… -2.81 non-vol…      180.        6           0
#> 2 /tmp/RtmpASHVvG/R… <NA>    Phos… NA    <NA>           NA         0           0
#> 3 /tmp/RtmpASHVvG/R… C12H18… (-)-…  1.84 moderate      210.       12           0
#> 4 /tmp/RtmpASHVvG/R… C6H7Cl… beta…  6.98 high          270.        6           0
#> 5 /tmp/RtmpASHVvG/R… C12H22O Geos…  4.16 high          182.       12           0
#> 6 /tmp/RtmpASHVvG/R… C5H8    Isop…  8.84 high           68.1       5           0
#> # ℹ 45 more variables: rings_aromatic <int>, rings_total <int>,
#> #   rings_aliphatic <int>, carbon_dbl_bonds_aliphatic <int>,
#> #   CCCO_aliphatic_ring <int>, hydroxyl_total <int>, hydroxyl_aromatic <int>,
#> #   hydroxyl_aliphatic <int>, aldehydes <int>, ketones <int>,
#> #   carbox_acids <int>, ester <int>, ether_total <int>, ether_alkyl <int>,
#> #   ether_alicyclic <int>, ether_aromatic <int>, nitrate <int>, nitro <int>,
#> #   amine_primary <int>, amine_secondary <int>, amine_tertiary <int>, …

The SIMPOL.1 method calculates $\log_{10} P_{L,i}(T)$, the base-10 logarithm of the liquid vapor pressure of compound $i$ at temperature $T$. calc_vol() uses this value to calculate RVI as $\log_{10}(PM/RT)$, where $P$ is the estimated vapor pressure of the compound, $M$ is its molecular weight, $R$ is the universal gas constant, and $T$ is temperature (293.14 K, approximately 20 °C). To see these intermediate calculations, use return_calc_steps = TRUE.

calc_vol(mol_paths, return_calc_steps = TRUE)
#> Warning in FUN(X[[i]], ...): Possible OpenBabel errors detected and only NAs returned.
#> Run with `validate = FALSE` to ignore this.
#> # A tibble: 6 × 8
#>   mol_path       formula name    rvi category molecular_weight log_alpha log10_P
#>   <chr>          <chr>   <chr> <dbl> <fct>               <dbl>     <dbl>   <dbl>
#> 1 /tmp/RtmpASHV… C6H12O6 D-Gl… -2.81 non-vol…            180.       9.87  -12.7 
#> 2 /tmp/RtmpASHV… <NA>    Phos… NA    <NA>                 NA       NA       1.79
#> 3 /tmp/RtmpASHV… C12H18… (-)-…  1.84 moderate            210.       9.94   -8.10
#> 4 /tmp/RtmpASHV… C6H7Cl… beta…  6.98 high                272.      10.1    -3.08
#> 5 /tmp/RtmpASHV… C12H22O Geos…  4.16 high                182.       9.88   -5.72
#> 6 /tmp/RtmpASHV… C5H8    Isop…  8.84 high                 68.1      9.45   -0.61

In this output, log_alpha is $\log_{10}(M/RT)$ and log10_P is the estimated $\log_{10} P$, so rvi = log_alpha + log10_P.
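
As a rough check on the table above, log_alpha can be recomputed by hand. This sketch assumes that P is in atmospheres, that R = 8.2057e-5 m^3 atm mol^-1 K^-1, and that a factor of 1e6 converts grams to micrograms; these units are assumptions and are not taken from the volcalc source. The helper names log_alpha() and rvi_from_log10_p() are hypothetical.

```r
# ASSUMED units: P in atm, R in m^3 atm mol^-1 K^-1, result in ug m^-3.
# These are illustrative assumptions, not volcalc's documented internals.
log_alpha <- function(mol_weight, temp_k = 293.14) {
  gas_constant <- 8.2057e-5
  log10(1e6 * mol_weight / (gas_constant * temp_k))
}

# RVI is the sum of the two logarithms: log10(M/RT) + log10(P)
rvi_from_log10_p <- function(log10_p, mol_weight, temp_k = 293.14) {
  log_alpha(mol_weight, temp_k) + log10_p
}

# D-glucose, M = 180.16 g/mol
round(log_alpha(180.16), 2)
#> [1] 9.87

# Isoprene, using the rounded values printed in the table
round(rvi_from_log10_p(log10_p = -0.61, mol_weight = 68.1), 2)
#> [1] 8.84
```

Because the table prints rounded values, sums computed from it can differ from the rvi column in the last digit.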

Use with SMILES

All of this can be done using SMILES strings rather than .mol files by setting from = "smiles". The backslash, \, is a valid SMILES character, but R treats it as the start of an escape sequence inside a string literal, so it must be “escaped” as \\.

## This will error even though the SMILES is correct
# calc_vol("CC/C=C\C[C@@H]1[C@H](CCC1=O)CC(=O)O", from = "smiles")

# To solve this, escape \C as \\C
calc_vol("CC/C=C\\C[C@@H]1[C@H](CCC1=O)CC(=O)O", from = "smiles")
#> # A tibble: 1 × 5
#>   smiles                                 formula  name    rvi category
#>   <chr>                                  <chr>    <chr> <dbl> <fct>   
#> 1 "CC/C=C\\C[C@@H]1[C@H](CCC1=O)CC(=O)O" C12H18O3 <NA>   1.84 moderate
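
To confirm that the escaped form holds the intended SMILES, it can help to inspect the string R actually stores. This sketch uses only base R; the raw string syntax r"(...)" requires R 4.0.0 or later.

```r
# In source code, "\\" is a single backslash character in the stored string
escaped <- "CC/C=C\\C[C@@H]1[C@H](CCC1=O)CC(=O)O"
nchar("\\")
#> [1] 1

# writeLines() shows the stored characters; print() would re-escape them
writeLines(escaped)
#> CC/C=C\C[C@@H]1[C@H](CCC1=O)CC(=O)O

# A raw string (R >= 4.0.0) lets you paste the SMILES unmodified
raw_smiles <- r"(CC/C=C\C[C@@H]1[C@H](CCC1=O)CC(=O)O)"
identical(raw_smiles, escaped)
#> [1] TRUE
```

This is also why the smiles column in the tibble above shows \\: the tibble prints the string in its escaped form, but the stored value contains a single backslash.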

Validation

Occasionally, a .mol file will cause an error message to bubble up from the OpenBabel command line utility. This happens, for example, when the molecule contains an ‘R’ group, as is the case for phosphatidylcholine on KEGG.

# phosphatidylcholine .mol file from KEGG
c00157 <- mol_example()[2]
calc_vol(c00157)
#> ==============================
#> *** Open Babel Warning  in InChI code
#>   Phosphatidylcholine :Unknown element(s): *
#> ==============================
#> *** Open Babel Error  in InChI code
#>   InChI generation failed
#> # A tibble: 1 × 5
#>   mol_path                                                          formula name                  rvi category
#>   <chr>                                                             <chr>   <chr>               <dbl> <fct>   
#> 1 /Users/ericscott/Documents/GitHub/volcalc/inst/extdata/C00157.mol NA      Phosphatidylcholine    NA NA      
#> Warning message:
#> In FUN(X[[i]], ...) :
#>   Possible OpenBabel errors detected and only NAs returned.
#> Run with `validate = FALSE` to ignore this.

Without validation, calc_vol() returns incorrect values of rvi and category for this compound.

calc_vol(c00157, validate = FALSE)
#> ==============================
#> *** Open Babel Warning  in InChI code
#>   Phosphatidylcholine :Unknown element(s): *
#> ==============================
#> *** Open Babel Error  in InChI code
#>   InChI generation failed
#> # A tibble: 1 × 5
#>   mol_path                                                          formula    name                 rvi category
#>   <chr>                                                             <chr>      <chr>              <dbl> <fct>   
#> 1 /Users/ericscott/Documents/GitHub/volcalc/inst/extdata/C00157.mol C10H18NO8P Phosphatidylcholi…  2.89 high     

Phosphatidylcholine is a large phospholipid and is not highly volatile, contrary to what these results suggest.

Details

Unfortunately, it is nearly impossible to detect these OpenBabel parsing errors directly in R. When validate = TRUE (the default), calc_vol() looks for “symptoms” of OpenBabel errors and, if it finds any, returns NA for all values. Specifically, validation assumes that InChI generation fails whenever OpenBabel has trouble parsing a molecule. Because InChI generation is not available in the Windows version of OpenBabel installed with ChemmineOB, this volcalc feature is only available on macOS and Linux. Setting validate = TRUE on Windows has no effect.
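
Because validation has no effect on Windows, a script that relies on it may want to check the platform first. The following is a minimal sketch in base R; the helper validation_supported() is hypothetical and is not part of volcalc.

```r
# Hypothetical helper, not a volcalc function.
# .Platform$OS.type is "windows" on Windows and "unix" on macOS and Linux.
validation_supported <- function(os_type = .Platform$OS.type) {
  !identical(os_type, "windows")
}

if (!validation_supported()) {
  warning(
    "OpenBabel validation is unavailable on Windows; ",
    "inspect results for molecules with 'R' groups manually."
  )
}
```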

References

Donahue, N.M., Robinson, A.L., Stanier, C.O., Pandis, S.N., 2006. Coupled Partitioning, Dilution, and Chemical Aging of Semivolatile Organics. Environ. Sci. Technol. 40, 2635–2643. DOI: 10.1021/es052297c

Meredith, L., Ledford, S., Riemer, K., Geffre, P., Graves, K., Honeker, L., LeBauer, D., Tfaily, M., Krechmer, J., 2023. Automating methods for estimating metabolite volatility. Frontiers in Microbiology. DOI: 10.3389/fmicb.2023.1267234