--- title: "Getting Started" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{test} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ``` r library(fluxnet) #> ! Use of data downloaded by fluxnet requires you abide by FLUXNET data policies: <]8;;https://fluxnet.org/data/data-policy/https://fluxnet.org/data/data-policy/]8;;> #> ℹ Citations for individual sites' datasets are returned by `]8;;x-r-help:fluxnet::flux_listallfluxnet::flux_listall]8;;()` library(dplyr) ``` ## Discovering what is available for download `flux_listall()` is an R wrapper around a command line program, [`fluxnet-shuttle`](https://github.com/fluxnet/shuttle), which requires Python. The first time you run `flux_listall()`, a Python virtual environment will be created and `fluxnet-shuttle` will be installed into it. If you don't have an appropriate version of Python installed, you may be prompted with tips on how to install it. ``` r list <- flux_listall() ``` ``` #> virtualenv: fluxnet #> File list is expired, downloading the latest version ``` By default, the results of `flux_listall()` are saved in a user cache, so when you run it again it'll pull the results from there unless they are older than `cach_age`. To ignore the cache, use `use_cache = FALSE`. To *invalidate* the cache (and replace it with an updated one), use `cache_age = -Inf`. ``` r # Don't use cached results: list <- flux_listall(use_cache = FALSE) # Invalidate and replace cached results: list <- flux_listall(cache_age = -Inf) ``` The list returned by `flux_listall()` contains metadata on the available sites including, importantly, citations for site-level data attribution which is required by FLUXNET. ``` r colnames(list) #> [1] "data_hub" "site_id" "site_name" "location_lat" "location_long" "igbp" "network" #> [8] "team_member_name" "team_member_role" "team_member_email" "first_year" "last_year" "download_link" "fluxnet_product_name" #> [15] "product_citation" "product_id" "oneflux_code_version" "product_source_network" list %>% select(site_id, product_citation) #> # A tibble: 301 × 2 #> site_id product_citation #> #> 1 AR-CCg Gabriela Posse (2025), AmeriFlux FLUXNET-1F AR-CCg Carlos Casares grassland, Ver. v1.3_r1, AmeriFlux AMP, (Dataset). https://doi.org/10.17190/AMF/2469434 #> 2 AR-TF1 Lars Kutzbach (2025), AmeriFlux FLUXNET-1F AR-TF1 Rio Moat bog, Ver. v1.3_r1, AmeriFlux AMP, (Dataset). https://doi.org/10.17190/AMF/1818370 #> 3 AR-TF2 Lars Kutzbach (2025), AmeriFlux FLUXNET-1F AR-TF2 Rio Pipo bog, Ver. v1.3_r1, AmeriFlux AMP, (Dataset). https://doi.org/10.17190/AMF/2571120 #> 4 BR-CST Antonio Antonino (2025), AmeriFlux FLUXNET-1F BR-CST Caatinga Serra Talhada, Ver. v1.3_r1, AmeriFlux AMP, (Dataset). https://doi.org/10.17190/AMF/1902820 #> 5 CA-ARB Aaron Todd, Elyn Humphreys (2025), AmeriFlux FLUXNET-1F CA-ARB Attawapiskat River Bog, Ver. v1.3_r1, AmeriFlux AMP, (Dataset). https://doi.org/10.17190/AMF/1902821 #> 6 CA-Ca1 T. Andrew Black (2025), AmeriFlux FLUXNET-1F CA-Ca1 British Columbia - 1949 Douglas-fir stand, Ver. v1.3_r1, AmeriFlux AMP, (Dataset). https://doi.org/10.17190/AMF/2007163 #> 7 CA-Ca2 T. Andrew Black (2025), AmeriFlux FLUXNET-1F CA-Ca2 British Columbia - Clearcut Douglas-fir stand (harvested winter 1999/2000), Ver. v1.3_r1, AmeriFlux AMP, (Dataset). https://doi.org/10… #> 8 CA-DB2 Sara Knox (2025), AmeriFlux FLUXNET-1F CA-DB2 Delta Burns Bog 2, Ver. v1.3_r1, AmeriFlux AMP, (Dataset). https://doi.org/10.17190/AMF/1881564 #> 9 CA-DBB Andreas Christen, Sara Knox (2025), AmeriFlux FLUXNET-1F CA-DBB Delta Burns Bog, Ver. v1.3_r1, AmeriFlux AMP, (Dataset). https://doi.org/10.17190/AMF/1881565 #> 10 CA-DSM Sara Knox (2025), AmeriFlux FLUXNET-1F CA-DSM Delta Salt Marsh, Ver. v1.3_r1, AmeriFlux AMP, (Dataset). https://doi.org/10.17190/AMF/2571137 #> # ℹ 291 more rows ``` ## Downloading data There are a few paths to downloading FLUXNET data. If you just want to download everything available, simply run `flux_download()`. You can download just specific sites with the `site_ids` argument, or you can filter the results of `flux_listall()` and pass those in. ``` r # Download everything available. flux_download() # Download just certain sites flux_download(site_ids = c("AR-CCg", "AR-TF1", "BR-CST")) # Filter list and download list_wet <- list %>% filter(igbp == "WET") flux_download(file_list_df = list_wet) ``` ``` #> Downloading data from all available sites. #> [working] (0 + 0) -> 4 -> 1 | ■■■■■■■ 20% #> [working] (0 + 0) -> 3 -> 2 | ■■■■■■■■■■■■■ 40% #> [working] (0 + 0) -> 2 -> 3 | ■■■■■■■■■■■■■■■■■■■ 60% #> [working] (0 + 0) -> 1 -> 4 | ■■■■■■■■■■■■■■■■■■■■■■■■■ 80% #> ``` ## Extracting data from .zip files `flux_extract()` allows you to unzip only desired files from all or some of the downloaded site .zip files. ``` r # Extract everything (not recommended!) flux_extract() # Extract just annual and monthly data flux_extract(resolutions = c("y", "m")) # Extract hourly data for just certain sites flux_extract(site_ids = c("AR-CCg", "AR-TF1"), resolutions = "h") # Don't extract BIF and BIFVARINFO CSVs flux_extract(extract_varinfo = FALSE) ``` ## Discovering what data you have extracted `flux_discover_files()` is used to create a "manifest" of the data available to read in. You *must* create this manifest (and optionally filter it) to pass into `flux_read()`. ``` r manifest <- flux_discover_files() manifest ``` ``` #> MM / ERA5 → 5 sites, 221 site-years across 5 files #> MM / FLUXMET → 5 sites, 18 site-years across 5 files #> YY / ERA5 → 5 sites, 221 site-years across 5 files #> YY / FLUXMET → 5 sites, 18 site-years across 5 files #> # A tibble: 35 × 23 #> path product_source_network site_id dataset time_resolution first_year last_year oneflux_code_version release_version download_time data_hub site_name location_lat location_long igbp #> #> 1 ….3_r1.csv AMF CA-MA1 BIF 2009 2011 v1.3 r1 2026-03-17 14:47:12 AmeriFl… Manitoba… 50.2 -97.9 CRO #> 2 ….3_r1.csv AMF CA-MA1 BIFVAR… MM 2009 2011 v1.3 r1 2026-03-17 14:47:12 AmeriFl… Manitoba… 50.2 -97.9 CRO #> 3 ….3_r1.csv AMF CA-MA1 BIFVAR… YY 2009 2011 v1.3 r1 2026-03-17 14:47:12 AmeriFl… Manitoba… 50.2 -97.9 CRO #> 4 ….3_r1.csv AMF CA-MA1 ERA5 MM 1981 2024 v1.3 r1 2026-03-17 14:47:12 AmeriFl… Manitoba… 50.2 -97.9 CRO #> 5 ….3_r1.csv AMF CA-MA1 ERA5 YY 1981 2024 v1.3 r1 2026-03-17 14:47:12 AmeriFl… Manitoba… 50.2 -97.9 CRO #> 6 ….3_r1.csv AMF CA-MA1 FLUXMET MM 2009 2011 v1.3 r1 2026-03-17 14:47:12 AmeriFl… Manitoba… 50.2 -97.9 CRO #> 7 ….3_r1.csv AMF CA-MA1 FLUXMET YY 2009 2011 v1.3 r1 2026-03-17 14:47:12 AmeriFl… Manitoba… 50.2 -97.9 CRO #> 8 ….3_r1.csv ICOS DE-Amv BIF 2023 2024 v1.3 r1 2026-03-17 14:47:12 ICOS Amtsvenn 52.2 6.96 WET #> 9 ….3_r1.csv ICOS DE-Amv BIFVAR… MM 2023 2024 v1.3 r1 2026-03-17 14:47:12 ICOS Amtsvenn 52.2 6.96 WET #> 10 ….3_r1.csv ICOS DE-Amv BIFVAR… YY 2023 2024 v1.3 r1 2026-03-17 14:47:12 ICOS Amtsvenn 52.2 6.96 WET #> # ℹ 25 more rows #> # ℹ 8 more variables: network , team_member_name , team_member_role , team_member_email , download_link , fluxnet_product_name , product_citation , #> # product_id ``` You can visualize the sites you have data for with `flux_map_sites()`. ``` r flux_map_sites(manifest) ``` ![Map of downloaded sites](figure/fig-map-sites-1.png) ## Reading in data You can read in data by passing a manifest to `flux_read()`. You can read in data for just select sites with the `site_ids` argument, but for more complex filtering you can simply subset the manifest object first. ``` r # Read all available annual data annual <- flux_read(manifest, resolution = "y") #> Reading 10 files. ``` ``` r ## NOT RUN ## # Read in hourly data from specific sites hourly <- flux_read( manifest, resolution = "h", site_ids = c("AR-CCg", "AR-TF1") ) # Read in only ERA5 data annual_era5 <- flux_read(manifest, resolution = "y", datasets = "ERA5") # Filter manifest to just sites with "WET" for IGBP n_wet_manifest <- left_join(manifest, list, by = join_by(site_id)) %>% filter(igbp == "WET", location_lat > 0) n_wet_monthly <- flux_read(n_wet_manifest, resolution = "m") ``` ## QC flagging Fluxnet data can be gapfilled to varying degrees. `flux_qc()` allows you to flag overly-gapfilled data using your choice of variables and gapfilling threshold(s). ### Example 1: Flag rows where `NEE_VUT_REF` is more than 30% gapfilled ``` r flagged_nee <- flux_qc(annual, qc_vars = "NEE_VUT_REF", max_gapfilled = 0.3) flagged_nee %>% filter(qc_flagged) %>% select(qc_flagged, p_gapfilled, NEE_VUT_REF_QC, everything()) #> # A tibble: 0 × 337 #> # ℹ 337 variables: qc_flagged , p_gapfilled , NEE_VUT_REF_QC , site_id , dataset , YEAR , TA_ERA , TA_ERA_NIGHT , TA_ERA_NIGHT_SD , TA_ERA_DAY , #> # TA_ERA_DAY_SD , SW_IN_ERA , LW_IN_ERA , VPD_ERA , PA_ERA , P_ERA , WS_ERA , TA_F_MDS , TA_F_MDS_QC , TA_F_MDS_NIGHT , TA_F_MDS_NIGHT_SD , #> # TA_F_MDS_NIGHT_QC , TA_F_MDS_DAY , TA_F_MDS_DAY_SD , TA_F_MDS_DAY_QC , TA_F , TA_F_QC , TA_F_NIGHT , TA_F_NIGHT_SD , TA_F_NIGHT_QC , TA_F_DAY , #> # TA_F_DAY_SD , TA_F_DAY_QC , SW_IN_F_MDS , SW_IN_F_MDS_QC , SW_IN_F , SW_IN_F_QC , LW_IN_F_MDS , LW_IN_F_MDS_QC , LW_IN_F , LW_IN_F_QC , #> # LW_IN_JSB , LW_IN_JSB_QC , LW_IN_JSB_ERA , LW_IN_JSB_F , LW_IN_JSB_F_QC , VPD_F_MDS , VPD_F_MDS_QC , VPD_F , VPD_F_QC , PA_F , PA_F_QC , #> # P_F , P_F_QC , WS_F , WS_F_QC , USTAR , USTAR_QC , PPFD_IN , PPFD_IN_QC , PPFD_OUT , PPFD_OUT_QC , CO2_F_MDS , CO2_F_MDS_QC , #> # TS_F_MDS_1 , TS_F_MDS_2 , TS_F_MDS_1_QC , TS_F_MDS_2_QC , SWC_F_MDS_1 , SWC_F_MDS_1_QC , G_F_MDS , G_F_MDS_QC , LE_F_MDS , LE_F_MDS_QC , … ``` Notice that in this case `p_gapfilled` is just 1 - `NEE_VUT_REF_QC`. ### Example 2: Flag rows where NEE_VUT_REF *or* TA_F are more than 30% gapfilled ``` r flagged_2 <- flux_qc( annual, qc_vars = c("NEE_VUT_REF", "TA_F"), max_gapfilled = 0.3 ) flagged_2 %>% filter(qc_flagged) %>% select(qc_flagged, p_gapfilled, NEE_VUT_REF_QC, TA_F_QC, everything()) #> # A tibble: 6 × 337 #> qc_flagged p_gapfilled NEE_VUT_REF_QC TA_F_QC site_id dataset YEAR TA_ERA TA_ERA_NIGHT TA_ERA_NIGHT_SD TA_ERA_DAY TA_ERA_DAY_SD SW_IN_ERA LW_IN_ERA VPD_ERA PA_ERA P_ERA WS_ERA TA_F_MDS #> #> 1 TRUE 0.386 NA 0.614 CA-MA1 FLUXMET 2009 1.43 0.429 2.42 2.20 2.46 170. 276. 2.78 98.6 421. 3.23 NA #> 2 TRUE 0.817 NA 0.183 DK-Gds FLUXMET 2020 9.30 8.09 1.47 10.1 1.91 110. 312. 2.51 99.8 1002. 3.80 NA #> 3 TRUE 0.597 NA 0.403 JP-Hc3 FLUXMET 2005 15.6 15.1 1.57 16.0 1.55 171. 330. 4.28 101. 3387. 1.41 16.0 #> 4 TRUE 0.627 NA 0.373 JP-Hc3 FLUXMET 2006 16.0 15.6 1.37 16.3 1.47 159. 336. 4.08 101 4988. 1.30 23.7 #> 5 TRUE 0.486 NA 0.514 RU-Fy4 FLUXMET 2015 5.83 4.61 1.94 6.60 2.45 110. 311. 2.52 97.4 366. 2.25 NA #> 6 TRUE 0.392 0.700 0.608 RU-Fy4 FLUXMET 2019 6.00 4.75 1.92 6.73 2.36 107. 314. 2.41 97.3 447. 2.27 7.34 #> # ℹ 318 more variables: TA_F_MDS_QC , TA_F_MDS_NIGHT , TA_F_MDS_NIGHT_SD , TA_F_MDS_NIGHT_QC , TA_F_MDS_DAY , TA_F_MDS_DAY_SD , TA_F_MDS_DAY_QC , TA_F , #> # TA_F_NIGHT , TA_F_NIGHT_SD , TA_F_NIGHT_QC , TA_F_DAY , TA_F_DAY_SD , TA_F_DAY_QC , SW_IN_F_MDS , SW_IN_F_MDS_QC , SW_IN_F , SW_IN_F_QC , #> # LW_IN_F_MDS , LW_IN_F_MDS_QC , LW_IN_F , LW_IN_F_QC , LW_IN_JSB , LW_IN_JSB_QC , LW_IN_JSB_ERA , LW_IN_JSB_F , LW_IN_JSB_F_QC , VPD_F_MDS , #> # VPD_F_MDS_QC , VPD_F , VPD_F_QC , PA_F , PA_F_QC , P_F , P_F_QC , WS_F , WS_F_QC , USTAR , USTAR_QC , PPFD_IN , PPFD_IN_QC , #> # PPFD_OUT , PPFD_OUT_QC , CO2_F_MDS , CO2_F_MDS_QC , TS_F_MDS_1 , TS_F_MDS_2 , TS_F_MDS_1_QC , TS_F_MDS_2_QC , SWC_F_MDS_1 , SWC_F_MDS_1_QC , #> # G_F_MDS , G_F_MDS_QC , LE_F_MDS , LE_F_MDS_QC , LE_CORR , LE_RANDUNC , H_F_MDS , H_F_MDS_QC , H_CORR , H_RANDUNC , EBC_CF_N , #> # EBC_CF_METHOD , NIGHT_RANDUNC_N , DAY_RANDUNC_N , NEE_CUT_REF , NEE_VUT_REF , NEE_CUT_REF_QC , NEE_CUT_REF_RANDUNC , NEE_VUT_REF_RANDUNC , … ``` Now `p_gapfilled` is 1 - the smaller of `NEE_VUT_REF_QC` and `TA_F_QC`. ### Example 3: Flag rows where NEE_VUT_REF is more than 30% gapfilled *and* TA_F is more than 10% gapfilled ``` r flagged_3 <- flux_qc( annual, qc_vars = c("NEE_VUT_REF", "TA_F"), max_gapfilled = c(0.3, 0.1), operator = "all" ) flagged_3 %>% filter(qc_flagged) %>% select(qc_flagged, p_gapfilled, NEE_VUT_REF_QC, TA_F_QC, everything()) #> # A tibble: 0 × 337 #> # ℹ 337 variables: qc_flagged , p_gapfilled , NEE_VUT_REF_QC , TA_F_QC , site_id , dataset , YEAR , TA_ERA , TA_ERA_NIGHT , TA_ERA_NIGHT_SD , #> # TA_ERA_DAY , TA_ERA_DAY_SD , SW_IN_ERA , LW_IN_ERA , VPD_ERA , PA_ERA , P_ERA , WS_ERA , TA_F_MDS , TA_F_MDS_QC , TA_F_MDS_NIGHT , #> # TA_F_MDS_NIGHT_SD , TA_F_MDS_NIGHT_QC , TA_F_MDS_DAY , TA_F_MDS_DAY_SD , TA_F_MDS_DAY_QC , TA_F , TA_F_NIGHT , TA_F_NIGHT_SD , TA_F_NIGHT_QC , #> # TA_F_DAY , TA_F_DAY_SD , TA_F_DAY_QC , SW_IN_F_MDS , SW_IN_F_MDS_QC , SW_IN_F , SW_IN_F_QC , LW_IN_F_MDS , LW_IN_F_MDS_QC , LW_IN_F , #> # LW_IN_F_QC , LW_IN_JSB , LW_IN_JSB_QC , LW_IN_JSB_ERA , LW_IN_JSB_F , LW_IN_JSB_F_QC , VPD_F_MDS , VPD_F_MDS_QC , VPD_F , VPD_F_QC , #> # PA_F , PA_F_QC , P_F , P_F_QC , WS_F , WS_F_QC , USTAR , USTAR_QC , PPFD_IN , PPFD_IN_QC , PPFD_OUT , PPFD_OUT_QC , CO2_F_MDS , #> # CO2_F_MDS_QC , TS_F_MDS_1 , TS_F_MDS_2 , TS_F_MDS_1_QC , TS_F_MDS_2_QC , SWC_F_MDS_1 , SWC_F_MDS_1_QC , G_F_MDS , G_F_MDS_QC , LE_F_MDS , … ``` Because we've set `operator = "all"`, `p_gapfilled` is now 1 - the *larger* of `NEE_VUT_REF_QC` and `TA_F_QC` and all of the values are greater than the smaller of the `max_gapfilled` values.