| Title: | Download, Read, and Visualize FLUXNET Data |
|---|---|
| Description: | Utility functions to help download, read in, and work with data from FLUXNET. |
| Authors: | Eric R. Scott [aut, cre] (ORCID: <https://orcid.org/0000-0002-7430-7879>), David J.P. Moore [aut] (ORCID: <https://orcid.org/0000-0002-6462-3288>), Arizona Board of Regents on behalf of The University of Arizona [cph] (ROR: <https://ror.org/0054f1w39>) |
| Maintainer: | Eric R. Scott <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.3.2.9000 |
| Built: | 2026-06-02 23:46:46 UTC |
| Source: | https://github.com/EcosystemEcologyLab/fluxnet-package |
flux_download()
A helper to generate a list of AmeriFlux user info to pass along to
flux_download(). Because these elements are by default pulled from
environment variables, it is recommended that you set them in a project-level
.Renviron file like AMERIFLUX_USER_NAME=myusername, etc.
flux_amf_credentials(
  user_name = Sys.getenv("AMERIFLUX_USER_NAME", unset = NA_character_),
  user_email = Sys.getenv("AMERIFLUX_USER_EMAIL", unset = NA_character_),
  intended_use = Sys.getenv("AMERIFLUX_INTENDED_USE", unset = 6),
  description = Sys.getenv("AMERIFLUX_DESCRIPTION", unset = NA_character_)
)
| Argument | Description |
|---|---|
| user_name | Your AmeriFlux username. |
| user_email | The email address associated with your AmeriFlux profile. |
| intended_use | An integer 1–6 as follows: |
| description | An optional description of the project. |
A list with elements user_name, user_email, intended_use, and
description.
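As a rough illustration of the environment-variable lookup described above, the following sketch builds the same four-element list by hand with base R. It does not call the package; the values passed to Sys.setenv() are placeholders, and converting intended_use to an integer is an assumption made here for illustration only.

```r
# Placeholder values, set for the current session only; a project-level
# .Renviron file would make them persist across sessions.
Sys.setenv(
  AMERIFLUX_USER_NAME = "myusername",
  AMERIFLUX_USER_EMAIL = "me@example.org"
)
# Make sure the optional variables are unset so the fallbacks are used.
Sys.unsetenv(c("AMERIFLUX_INTENDED_USE", "AMERIFLUX_DESCRIPTION"))

# Hand-built stand-in for the list described above (not the package function).
creds <- list(
  user_name = Sys.getenv("AMERIFLUX_USER_NAME", unset = NA_character_),
  user_email = Sys.getenv("AMERIFLUX_USER_EMAIL", unset = NA_character_),
  # Sys.getenv() returns character; the integer conversion is an assumption.
  intended_use = as.integer(Sys.getenv("AMERIFLUX_INTENDED_USE", unset = "6")),
  description = Sys.getenv("AMERIFLUX_DESCRIPTION", unset = NA_character_)
)
str(creds)
```

Variables that are not set fall back to the unset value, so a missing description shows up as NA rather than an empty string.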
Reads in BADM data from "BIF" csv files, subsets to a single VARIABLE_GROUP
and returns a wide data frame.
flux_badm(
  manifest,
  variable_group,
  site_ids = NULL,
  networks = c("AMF", "CNF", "EUF", "FLX", "ICOS", "JPF", "KOF", "SAEON", "TERN")
)
| Argument | Description |
|---|---|
| manifest | A manifest data frame produced by |
| variable_group | A single |
| site_ids | A vector of site IDs to filter the manifest by. If |
| networks | A character vector indicating which networks to extract files from. Multiple values may be provided. Defaults to all networks. |
A tibble.
## Not run:
manifest <- flux_discover_files()
flux_badm(manifest, "SOIL_CHEM")
flux_badm(manifest, "LAI")
## End(Not run)
Create a "manifest" of downloaded and unzipped FLUXNET data files
flux_discover_files(data_dir = "fluxnet/unzipped", ...)
| Argument | Description |
|---|---|
| data_dir | The directory to look for FLUXNET CSV files in, typically the same as the |
| ... | Arguments passed to |
Prints a summary of the discovered data and invisibly returns a data frame
with file paths and metadata extracted from file names and merged in from
flux_listall().
## Not run:
# Download data
flux_download(site_ids = c("AU-Boy", "BR-CST"))
# Extract annual and monthly data
flux_extract(resolutions = c("y", "m"))
# Create a manifest of extracted files
manifest <- flux_discover_files()
## End(Not run)
Downloads zip files (one per site) for available FLUXNET sites.
flux_download(
  file_list_df = NULL,
  site_ids = NULL,
  download_dir = "fluxnet",
  overwrite = FALSE,
  user_info = list(ameriflux = flux_amf_credentials()),
  ...
)
| Argument | Description |
|---|---|
| file_list_df | When |
| site_ids | If |
| download_dir | The directory to download zip files to. |
| overwrite | Logical; overwrite already downloaded .zip files? If |
| user_info | An optional list with data-hub-specific user information. Only AmeriFlux uses this currently. By default, these are retrieved from environment variables by |
| ... | Arguments passed to |
Invisibly returns a tibble with the download URL, path on disk, HTTP status code, and whether or not the download was successful.
## Not run:
# Download data for all available sites
flux_download()
# Download data for just select site IDs
flux_download(site_ids = c("UK-GaB", "CA-Ca2"))
# Download specific sites filtered by site metadata (IGBP and data hub for example)
available_sites <- flux_listall()
to_get <- available_sites[available_sites$igbp == "CRO" & available_sites$data_hub == "AmeriFlux", ]
flux_download(file_list_df = to_get)
# Get a fresh list of available data and overwrite any existing downloads
flux_download(use_cache = FALSE, overwrite = TRUE)
## End(Not run)
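The metadata filter in the example above can be tried without downloading anything. The sketch below uses a small made-up data frame in place of the result of flux_listall(); only the column names igbp and data_hub come from the example, while the site IDs and values are invented for illustration.

```r
# Made-up stand-in for the data frame returned by flux_listall()
available_sites <- data.frame(
  site_id = c("XX-Aa1", "XX-Bb2", "XX-Cc3", "XX-Dd4"),
  igbp = c("CRO", "CRO", "WET", NA),
  data_hub = c("AmeriFlux", "ICOS", "AmeriFlux", "AmeriFlux")
)

# Logical subsetting returns an all-NA row whenever the condition is NA, so
# wrap the condition in which() to drop sites with missing metadata.
keep <- which(
  available_sites$igbp == "CRO" & available_sites$data_hub == "AmeriFlux"
)
to_get <- available_sites[keep, ]
to_get$site_id
```

Whether the real site list contains missing values in these columns is not stated here; which() is simply a defensive habit when subsetting with base R.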
Extracts data from zip files downloaded by flux_download() with options to
extract only subsets of the files they contain.
flux_extract(
  zip_dir = "fluxnet",
  output_dir = fs::path(zip_dir, "unzipped"),
  site_ids = NULL,
  networks = c("AMF", "CNF", "EUF", "FLX", "ICOS", "JPF", "KOF", "SAEON", "TERN"),
  resolutions = c("y", "m", "w", "d", "h"),
  extract_varinfo = TRUE,
  extract_txt = FALSE,
  overwrite = FALSE
)
| Argument | Description |
|---|---|
| zip_dir | The directory with the zip files |
| output_dir | The directory to unzip files to. Within this directory, data files will be nested by site. |
| site_ids | A character vector of site IDs (e.g. |
| networks | A character vector indicating which networks to extract files from. Multiple values may be provided. Defaults to all networks. |
| resolutions | A character vector indicating which time resolutions to extract. Options are yearly ( |
| extract_varinfo | Logical; extract the BIF and BIFVARINFO files containing variable information in the BADM Interchange Format? Defaults to |
| extract_txt | Logical; extract the README.txt and DATA_POLICY_LICENSE_AND_INSTRUCTIONS.txt files? Defaults to |
| overwrite | Logical; should existing extracted files be overwritten ( |
Invisibly returns a data frame of file paths and, if unzipping generated any errors, the associated error messages.
Uses reticulate to install the
fluxnet-shuttle Python library and
command line interface into a virtual environment. Required for
flux_listall() and flux_download().
flux_install_shuttle(
  venv = Sys.getenv("FLUXNET_VENV", unset = "fluxnet"),
  shuttle_version = Sys.getenv("FLUXNET_SHUTTLE_VERSION", unset = "main"),
  from = c("github", "pypi"),
  reinitialize = FALSE
)
| Argument | Description |
|---|---|
| venv | A name to use for creating a virtual environment. Defaults to |
| shuttle_version | A version tag (e.g. |
| from | Where to install from? Currently only |
| reinitialize | Logical; if |
The path to the fluxnet-shuttle CLI executable, silently.
This will be run automatically the first time you run flux_listall()
or flux_download(), so it is not necessary to run this function
separately first. Big thanks to Andrew Heiss for helping me figure this
all out!
## Not run:
# Standard install of development version
flux_install_shuttle()
# Specify a version
flux_install_shuttle(shuttle_version = "0.3.5")
# When run a second time, even after restarting the R session, it skips
# installation as long as the `venv` exists unless `reinitialize = TRUE`
flux_install_shuttle(shuttle_version = "0.3.6")
# If you want to update the version, set `reinitialize = TRUE`
flux_install_shuttle(shuttle_version = "0.3.6", reinitialize = TRUE)
## End(Not run)
This provides a wrapper around the
fluxnet-shuttle command-line utility's
listall command, which downloads a CSV snapshot listing the available .zip
files. By default, the downloaded CSV is stored in
rappdirs::user_cache_dir("fluxnet"). If there is already a FLUXNET
snapshot CSV file downloaded and it is more recent than cache_age, it will
be read in instead of downloading a new snapshot.
flux_listall(
  cache_dir = rappdirs::user_cache_dir("fluxnet"),
  cache_age = as.difftime(1, units = "days"),
  clean_cache = 10L,
  log_file = NULL,
  echo_cmd = FALSE,
  ...
)
| Argument | Description |
|---|---|
| cache_dir | The directory to store the list of available FLUXNET data in. |
| cache_age | A |
| clean_cache | A number of files |
| log_file | An optional file path (e.g. |
| echo_cmd | Set to |
| ... | Arguments passed to |
A data frame of stations with available data and their metadata.
To force the fluxnet R package to re-install the fluxnet-shuttle
utility, remove the Python virtualenv it is installed in by running
reticulate::virtualenv_remove("fluxnet"). Then, when you run
flux_listall() next, the virtualenv will be re-created and
fluxnet-shuttle will be re-installed.
## Not run:
fluxnet_files <- flux_listall()
# Invalidate cache and update it
fluxnet_files <- flux_listall(cache_age = -Inf)
## End(Not run)
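The cache_age comparison described for flux_listall() can be sketched in base R. The helper below, cache_is_fresh(), is hypothetical and is not how the package is known to implement its check; it only shows why a file younger than cache_age counts as fresh and why cache_age = -Inf always forces a refresh.

```r
# Hypothetical helper, not part of the package: is a cached file recent enough?
cache_is_fresh <- function(path,
                           cache_age = as.difftime(1, units = "days"),
                           now = Sys.time()) {
  if (!file.exists(path)) {
    return(FALSE)
  }
  # Treat bare numbers as days (an assumption of this sketch)
  if (!inherits(cache_age, "difftime")) {
    cache_age <- as.difftime(cache_age, units = "days")
  }
  # Compare the file's age with the allowed age, both in seconds
  file_age <- difftime(now, file.mtime(path), units = "secs")
  as.numeric(file_age) < as.numeric(cache_age, units = "secs")
}

snapshot <- tempfile(fileext = ".csv")
file.create(snapshot)
cache_is_fresh(snapshot)                    # just created, so within one day
cache_is_fresh(snapshot, cache_age = -Inf)  # no file age is below -Inf
```

Passing now as an argument keeps the helper easy to test with a fixed clock.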
Plot locations of sites in a file manifest on a world map.
flux_map_sites(
  manifest,
  color_var = c("data_hub", "igbp", "network", "first_year", "last_year")
)
| Argument | Description |
|---|---|
| manifest | A data manifest created by |
| color_var | A variable to use to color-code points. |
A ggplot2 object
## Not run:
manifest <- flux_discover_files()
flux_map_sites(manifest)
## End(Not run)
Flags rows of data based on variables with associated _QC columns.
flux_qc(data, qc_vars, max_gapfilled = 0.5, operator = c("any", "all"))
| Argument | Description |
|---|---|
| data | A data frame created by |
| qc_vars | A character vector of column names with associated |
| max_gapfilled | Numeric between 0 and 1; cutoff for the |
| operator | How to flag data when multiple |
A tibble with the added columns p_gapfilled and qc_flagged. If
operator = "any", qc_flagged = TRUE indicates that at least one of the
supplied QC variables was more gapfilled than max_gapfilled and
p_gapfilled will be the maximum proportion gapfilled across the QC vars
for each row. If operator = "all", then qc_flagged = TRUE indicates
that all of the supplied QC variables were more gapfilled than the
thresholds supplied and p_gapfilled will be the minimum proportion
gapfilled across all QC variables for each row.
## Not run:
# Flag rows where NEE_VUT_REF is more than 50% gapfilled
manifest <- flux_discover_files()
annual <- flux_read(manifest, resolution = "y")
annual_flagged <- flux_qc(
  annual,
  qc_vars = "NEE_VUT_REF",
  max_gapfilled = 0.5
)
# Use multiple variables each with a different threshold for QC
annual_flagged2 <- flux_qc(
  annual,
  qc_vars = c("NEE_VUT_REF", "TA_F"),
  max_gapfilled = c(0.4, 0.6)
)
# Same as above, but require *both* variables to be above their thresholds
# to consider that row a problem
annual_flagged2 <- flux_qc(
  annual,
  qc_vars = c("NEE_VUT_REF", "TA_F"),
  max_gapfilled = c(0.4, 0.6),
  operator = "all"
)
## End(Not run)
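The "any" and "all" rules described for the return value can be illustrated with a small base R sketch. The function sketch_flag() is hypothetical and starts from per-variable proportions of gapfilled data that are assumed to be already computed; how the package derives those proportions from the _QC columns is not shown here, and the input values are made up.

```r
# Hypothetical re-creation of the flagging rule, not the package's code.
# p: numeric matrix of proportion gapfilled, one column per QC variable.
sketch_flag <- function(p, max_gapfilled = 0.5, operator = c("any", "all")) {
  operator <- match.arg(operator)
  # Recycle a single threshold across variables, or use one per column
  thresholds <- rep_len(max_gapfilled, ncol(p))
  # TRUE where a variable is more gapfilled than its own threshold
  exceeds <- sweep(p, 2, thresholds, FUN = ">")
  if (operator == "any") {
    data.frame(
      p_gapfilled = apply(p, 1, max),
      qc_flagged = apply(exceeds, 1, any)
    )
  } else {
    data.frame(
      p_gapfilled = apply(p, 1, min),
      qc_flagged = apply(exceeds, 1, all)
    )
  }
}

# Made-up proportions for three rows and two variables
p <- cbind(NEE_VUT_REF = c(0.1, 0.5, 0.9), TA_F = c(0.2, 0.7, 0.3))
flag_any <- sketch_flag(p, max_gapfilled = c(0.4, 0.6), operator = "any")
flag_all <- sketch_flag(p, max_gapfilled = c(0.4, 0.6), operator = "all")
```

With these inputs the third row is flagged under "any" (only NEE_VUT_REF exceeds its threshold) but not under "all", which is the practical difference between the two operators.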
Reads and minimally cleans FLUXNET data found by flux_discover_files().
flux_read(
  manifest,
  resolution = c("y", "m", "w", "d", "h"),
  datasets = c("ERA5", "FLUXMET"),
  networks = c("AMF", "CNF", "EUF", "FLX", "ICOS", "JPF", "KOF", "SAEON", "TERN"),
  site_ids = NULL
)
| Argument | Description |
|---|---|
| manifest | A manifest data frame produced by |
| resolution | The time resolution to read in. Must be one of |
| datasets | Character vector of one or both of |
| networks | A character vector indicating which networks to extract files from. Multiple values may be provided. Defaults to all networks. |
| site_ids | A vector of site IDs to filter the manifest by. If |
## Not run:
manifest <- flux_discover_files()
daily <- flux_read(manifest, resolution = "d")
annual <- flux_read(manifest, resolution = "y")
# Filter manifest by metadata first
metadata <- flux_listall()
library(dplyr)
manifest_enriched <- left_join(manifest, metadata, by = join_by(site_id))
manifest_WET <- manifest_enriched %>% filter(igbp == "WET")
annual_wet <- flux_read(manifest_WET, resolution = "y")
## End(Not run)
Extracts and tidies variable information from "BIFVARINFO" csv files.
flux_varinfo(
  manifest,
  resolution = c("y", "m", "w", "d", "h"),
  networks = c("AMF", "CNF", "EUF", "FLX", "ICOS", "JPF", "KOF", "SAEON", "TERN"),
  site_ids = NULL
)
| Argument | Description |
|---|---|
| manifest | A manifest data frame produced by |
| resolution | The time resolution to read in. Must be one of |
| networks | A character vector indicating which networks to extract files from. Multiple values may be provided. Defaults to all networks. |
| site_ids | A vector of site IDs to filter the manifest by. If |
A tibble
This only returns variable info (VARIABLE_GROUP == GRP_VAR_INFO) from
the "BIFVARINFO" files as much of the other metadata they contain can be
found in the results of flux_listall().
HEIGHT is returned as a character value because some heights are reported
as ranges and cannot be parsed as a single numeric value.
## Not run:
manifest <- flux_discover_files()
flux_varinfo(manifest)
## End(Not run)
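Because HEIGHT comes back as character, numeric work on it needs an explicit conversion. The sketch below is one possible approach using base R only; the values are made up, and the range notation shown ("1-3") is an assumption, since the actual format of ranges in the files is not described here.

```r
# Made-up HEIGHT values; the range notation is only an illustration
height <- c("2.5", "10", "1-3")

# Entries that are not a single number become NA (coercion warning suppressed)
height_num <- suppressWarnings(as.numeric(height))

# Keep the original text alongside the parsed value so ranges are not lost
data.frame(height, height_num)
```

Keeping both columns makes it easy to find the rows that need manual handling, for example with is.na(height_num).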