Package 'fluxnet' reference manual

Title:	Download, Read, and Visualize FLUXNET Data
Description:	Utility functions to help download, read in, and work with data from FLUXNET.
Authors:	Eric R. Scott [aut, cre] (ORCID: <https://orcid.org/0000-0002-7430-7879>), David J.P. Moore [aut] (ORCID: <https://orcid.org/0000-0002-6462-3288>), Arizona Board of Regents on behalf of The University of Arizona [cph] (ROR: <https://ror.org/0054f1w39>)
Maintainer:	Eric R. Scott <[email protected]>
License:	MIT + file LICENSE
Version:	0.3.2.9000
Built:	2026-06-02 23:46:46 UTC
Source:	https://github.com/EcosystemEcologyLab/fluxnet-package

Generate list of user info to pass to `flux_download()`

Description

A helper to generate a list of AmeriFlux user info to pass along to flux_download(). Because these elements are pulled from environment variables by default, it is recommended that you set them in a project-level .Renviron file, for example AMERIFLUX_USER_NAME=myusername.

Usage

flux_amf_credentials(
  user_name = Sys.getenv("AMERIFLUX_USER_NAME", unset = NA_character_),
  user_email = Sys.getenv("AMERIFLUX_USER_EMAIL", unset = NA_character_),
  intended_use = Sys.getenv("AMERIFLUX_INTENDED_USE", unset = 6),
  description = Sys.getenv("AMERIFLUX_DESCRIPTION", unset = NA_character_)
)

Arguments

user_name

Your AmeriFlux username.

user_email

The email address associated with your AmeriFlux profile.

intended_use

An integer 1–6 as follows:

1 = Synthesis / network synthesis analysis
2 = Land model/Earth system model
3 = Remote sensing research
4 = Other research
5 = Education (Teacher or Student)
6 = Other

description

An optional description of the project.

Value

A list with elements user_name, user_email, intended_use, and description.
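The environment-variable defaults shown in Usage can be tried in isolation with base R. The sketch below sets one variable for the current session with Sys.setenv() (a stand-in for a project-level .Renviron file; the value is a placeholder) and reads it back the same way the defaults do. It does not call flux_amf_credentials() itself.

```r
# Placeholder value; in practice this would live in a project-level
# .Renviron file as AMERIFLUX_USER_NAME=myusername
Sys.setenv(AMERIFLUX_USER_NAME = "myusername")

# Make sure the optional description is unset for this demonstration
Sys.unsetenv("AMERIFLUX_DESCRIPTION")

# Read the values back the way the defaults in Usage do
user_name <- Sys.getenv("AMERIFLUX_USER_NAME", unset = NA_character_)
description <- Sys.getenv("AMERIFLUX_DESCRIPTION", unset = NA_character_)

user_name
is.na(description)
```

A variable that is not set falls back to the unset value, so description is NA here.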

Read and tidy BADM subsets

Description

Reads in BADM data from "BIF" csv files, subsets to a single VARIABLE_GROUP and returns a wide data frame.

Usage

flux_badm(
  manifest,
  variable_group,
  site_ids = NULL,
  networks = c("AMF", "CNF", "EUF", "FLX", "ICOS", "JPF", "KOF", "SAEON", "TERN")
)

Arguments

manifest

A manifest data frame produced by flux_discover_files().

variable_group

A single VARIABLE_GROUP without a GRP_ prefix. Options can be viewed at https://ameriflux.lbl.gov/data/badm/badm-standards/

site_ids

A vector of site IDs to filter the manifest by. If NULL (the default), the manifest isn't filtered by site ID.

networks

A character vector indicating which networks to extract files from. Multiple values may be provided. Defaults to all networks.

Value

A tibble.
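To illustrate what "subsets to a single VARIABLE_GROUP and returns a wide data frame" means, here is a minimal base R sketch on made-up data. It assumes a long BIF-style layout with columns SITE_ID, GROUP_ID, VARIABLE_GROUP, VARIABLE, and DATAVALUE; the site ID and values are invented, and this is not necessarily how flux_badm() is implemented.

```r
# Toy long-format data in an assumed BIF-style layout (all values invented)
bif <- data.frame(
  SITE_ID = "XX-Abc",
  GROUP_ID = c(1, 1, 2, 2),
  VARIABLE_GROUP = "GRP_LAI",
  VARIABLE = c("LAI_TOT", "LAI_DATE", "LAI_TOT", "LAI_DATE"),
  DATAVALUE = c("3.1", "20200701", "2.4", "20210701"),
  stringsAsFactors = FALSE
)

# Subset to one variable group, then spread VARIABLE into columns
lai <- bif[
  bif$VARIABLE_GROUP == "GRP_LAI",
  c("SITE_ID", "GROUP_ID", "VARIABLE", "DATAVALUE")
]
wide <- reshape(
  lai,
  idvar = c("SITE_ID", "GROUP_ID"),
  timevar = "VARIABLE",
  direction = "wide"
)

# reshape() prefixes the new columns with "DATAVALUE."; drop that prefix
names(wide) <- sub("^DATAVALUE\\.", "", names(wide))
wide
```

Each GROUP_ID becomes one row, with one column per VARIABLE in the group.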

Examples

## Not run: 
manifest <- flux_discover_files()
flux_badm(manifest, "SOIL_CHEM")
flux_badm(manifest, "LAI")

## End(Not run)

Create a "manifest" of downloaded and unzipped FLUXNET data files

Description

Create a "manifest" of downloaded and unzipped FLUXNET data files

Usage

flux_discover_files(data_dir = "fluxnet/unzipped", ...)

Arguments

data_dir

The directory to look for FLUXNET CSV files in, typically the same as the output_dir used for flux_extract().

...

Arguments passed to flux_listall().

Value

Prints a summary of discovered available data and returns (invisibly) a dataframe with file paths and metadata extracted from file names and merged in from flux_listall().

Examples

## Not run: 
# Download data
flux_download(site_ids = c("AU-Boy", "BR-CST"))

# Extract annual and monthly data
flux_extract(resolutions = c("y", "m"))

# Create a manifest of extracted files
manifest <- flux_discover_files()

## End(Not run)

Download FLUXNET zip files

Description

Downloads zip files (one per site) for available FLUXNET sites.

Usage

flux_download(
  file_list_df = NULL,
  site_ids = NULL,
  download_dir = "fluxnet",
  overwrite = FALSE,
  user_info = list(ameriflux = flux_amf_credentials()),
  ...
)

Arguments

file_list_df

When NULL (default), flux_listall() is used to determine the sites with data available to download (specific sites can be selected with site_ids). If file_list_df is supplied, it should be a data frame generated by flux_listall(), but potentially filtered to exclude some rows. This provides an alternative way of downloading only specific sites. See the examples for a possible use case. If file_list_df is not NULL, cache_dir, use_cache, and cache_age will be ignored but site_ids will still be used.

site_ids

If NULL (default) all available sites will be downloaded. Alternatively, supply a character vector of site IDs. For example, c("UK-GaB", "CA-Ca2").

download_dir

The directory to download zip files to.

overwrite

Logical; overwrite already downloaded .zip files? If FALSE it will skip downloading existing files, unless they are invalid .zip files (e.g. due to partial download or corruption).

user_info

An optional list with data-hub-specific user information. Only AmeriFlux uses this currently. By default, these are retrieved from environment variables by flux_amf_credentials().

...

Arguments passed to flux_listall().

Value

Invisibly returns a tibble with the download URL, path on disk, HTTP status code, and whether or not the download was successful.

Examples

## Not run: 
# Download data for all available sites
flux_download()

# Download data for just select site IDs
flux_download(site_ids = c("UK-GaB", "CA-Ca2"))

# Download specific sites filtered by site metadata (IGBP class and data hub, for example)
available_sites <- flux_listall()
to_get <-
  available_sites[available_sites$igbp == "CRO" & available_sites$data_hub == "AmeriFlux", ]
flux_download(file_list_df = to_get)

# Get a fresh list of available data and overwrite any existing downloads
flux_download(cache_age = -Inf, overwrite = TRUE)

## End(Not run)


Extract FLUXNET data from downloaded zip files

Description

Extracts data from zip files downloaded by flux_download() with options to extract only subsets of the files they contain.

Usage

flux_extract(
  zip_dir = "fluxnet",
  output_dir = fs::path(zip_dir, "unzipped"),
  site_ids = NULL,
  networks = c("AMF", "CNF", "EUF", "FLX", "ICOS", "JPF", "KOF", "SAEON", "TERN"),
  resolutions = c("y", "m", "w", "d", "h"),
  extract_varinfo = TRUE,
  extract_txt = FALSE,
  overwrite = FALSE
)

Arguments

zip_dir

The directory containing the zip files.

output_dir

The directory to unzip files to. Within this directory, data files will be nested by site.

site_ids

A character vector of site IDs (e.g. c("AR-TF2", "CA-Ca2")) can be supplied to only unzip data for certain sites. If NULL (default), all zip files found in zip_dir will be unzipped.

networks

A character vector indicating which networks to extract files from. Multiple values may be provided. Defaults to all networks.

resolutions

A character vector indicating which time resolutions to extract. Options are yearly ("y"), monthly ("m"), weekly ("w"), daily ("d"), and hourly/half-hourly ("h"). Multiple options may be passed; all of them are extracted by default.

extract_varinfo

Logical; extract the BIF and BIFVARINFO files containing variable information in the BADM Interchange Format? Defaults to TRUE.

extract_txt

Logical; extract the README.txt and DATA_POLICY_LICENSE_AND_INSTRUCTIONS.txt files? Defaults to FALSE.

overwrite

Logical; should existing extracted files be overwritten (TRUE) or ignored (FALSE)?

Value

Invisibly returns a data frame of file paths and any error messages generated by unzipping.

Install the fluxnet-shuttle CLI

Description

Uses reticulate to install the fluxnet-shuttle Python library and command line interface into a virtual environment. Required for flux_listall() and flux_download().

Usage

flux_install_shuttle(
  venv = Sys.getenv("FLUXNET_VENV", unset = "fluxnet"),
  shuttle_version = Sys.getenv("FLUXNET_SHUTTLE_VERSION", unset = "main"),
  from = c("github", "pypi"),
  reinitialize = FALSE
)

Arguments

venv

A name to use for creating a virtual environment. Defaults to "fluxnet", but we recommend using a project-specific virtual environment. You can set this in a project-level .Renviron file as FLUXNET_VENV=myproject and it will be pulled from there.

shuttle_version

A version tag (e.g. "0.3.7") to install. Defaults to the GitHub development version (for now). Can also be set as an environment variable in .Renviron, e.g. FLUXNET_SHUTTLE_VERSION=0.3.6.

from

Where to install from? Currently only "github" is available, but eventually "pypi" will be an option to install from PyPI.

reinitialize

Logical; if TRUE, the virtual environment specified in venv will be removed and re-initialized. Useful if you'd like to update the version of fluxnet-shuttle.

Value

The path to the fluxnet-shuttle CLI executable, silently.

Note

This will be run automatically the first time you run flux_listall() or flux_download(), so it is not necessary to run this function separately first. Big thanks to Andrew Heiss for helping me figure this all out!

Examples

## Not run: 

# Standard install of development version
flux_install_shuttle()

# Specify a version
flux_install_shuttle(shuttle_version = "0.3.5")

# When run a second time, even after restarting the R session, it skips
# installation as long as the `venv` exists unless `reinitialize = TRUE`
flux_install_shuttle(shuttle_version = "0.3.6")

# If you want to update the version, set `reinitialize = TRUE`
flux_install_shuttle(shuttle_version = "0.3.6", reinitialize = TRUE)

## End(Not run)


List available FLUXNET zip files for download

Description

This provides a wrapper around the fluxnet-shuttle command-line utility's listall command, which downloads a data frame of available .zip files. By default, the downloaded CSV is stored in rappdirs::user_cache_dir("fluxnet"). If there is already a FLUXNET snapshot CSV file downloaded and it is more recent than cache_age, it will be read in instead of downloading a new snapshot.

Usage

flux_listall(
  cache_dir = rappdirs::user_cache_dir("fluxnet"),
  cache_age = as.difftime(1, units = "days"),
  clean_cache = 10L,
  log_file = NULL,
  echo_cmd = FALSE,
  ...
)

Arguments

cache_dir

The directory to store the list of available FLUXNET data in.

cache_age

A difftime object of length 1. If there are no cached snapshots more recent than cache_age, a new one will be downloaded and stored. You can force the cache to be invalidated with cache_age = -Inf.

clean_cache

The number of files (at least 1) to keep in cache_dir. Defaults to 10, which keeps only the 10 most recent snapshots.

log_file

An optional file path (e.g. "log.txt") to direct the fluxnet-shuttle log to. Useful for debugging.

echo_cmd

Set to TRUE to print the shell command in the console. Passed to processx::run().

...

Arguments passed to flux_install_shuttle().

Value

A data frame of stations with available data and their metadata.
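The cache_age rule described above (reuse a snapshot only if it is more recent than cache_age) can be sketched in base R. The helper name cache_is_fresh() and the "*.csv" file pattern are assumptions made for illustration; they are not part of the package.

```r
# Hypothetical helper: TRUE if the newest CSV in `cache_dir` is younger than
# `cache_age`. The "*.csv" pattern is an assumption, not the package's scheme.
cache_is_fresh <- function(cache_dir,
                           cache_age = as.difftime(1, units = "days")) {
  files <- list.files(cache_dir, pattern = "\\.csv$", full.names = TRUE)
  if (length(files) == 0) {
    return(FALSE)
  }
  newest <- max(file.mtime(files))
  age <- difftime(Sys.time(), newest, units = "secs")
  # With cache_age = -Inf no age can be smaller, so the cache is never fresh
  isTRUE(as.vector(age < cache_age))
}

# Try it on a throwaway directory holding one just-written snapshot
demo_dir <- tempfile("fluxnet-cache-")
dir.create(demo_dir)
snapshot <- file.path(demo_dir, "snapshot.csv")
writeLines("site_id", snapshot)

cache_is_fresh(demo_dir)                    # default one-day window
cache_is_fresh(demo_dir, cache_age = -Inf)  # forced invalidation
```

Comparing against -Inf always fails, which is why cache_age = -Inf forces a new download.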

Note

To force the fluxnet R package to re-install the fluxnet-shuttle utility, remove the Python virtualenv it is installed in by running reticulate::virtualenv_remove("fluxnet"). Then, when you run flux_listall() next, the virtualenv will be re-created and fluxnet-shuttle will be re-installed.

Examples

## Not run: 
fluxnet_files <- flux_listall()

# Invalidate cache and update it
fluxnet_files <- flux_listall(cache_age = -Inf)

## End(Not run)

Map sites with downloaded data

Description

Plot locations of sites in a file manifest on a world map.

Usage

flux_map_sites(
  manifest,
  color_var = c("data_hub", "igbp", "network", "first_year", "last_year")
)

Arguments

manifest

A data manifest created by flux_discover_files().

color_var

A variable to use to color-code points.

Value

A ggplot2 object.

Examples

## Not run: 
manifest <- flux_discover_files()
flux_map_sites(manifest)

## End(Not run)


Flag overly-gapfilled data

Description

Flags rows of data based on variables with associated _QC columns.

Usage

flux_qc(data, qc_vars, max_gapfilled = 0.5, operator = c("any", "all"))

Arguments

data

A data frame created by flux_read().

qc_vars

A character vector of column names with associated *_QC columns to use for flagging.

max_gapfilled

Numeric between 0 and 1; cutoff for the qc_flagged flag to be TRUE. Can be length 1 or the same length as qc_vars to supply a different threshold for each variable.

operator

How to flag data when multiple qc_vars are supplied. If "any", the row will be flagged if any of the QC vars indicate gap-filling above their max_gapfilled threshold. If "all", the row will be flagged only if all of the QC vars are above their max_gapfilled threshold.

Value

A tibble with the added columns p_gapfilled and qc_flagged. If operator = "any", qc_flagged = TRUE indicates that at least one of the supplied QC variables was more gapfilled than max_gapfilled, and p_gapfilled will be the maximum proportion gapfilled across the QC vars for each row. If operator = "all", qc_flagged = TRUE indicates that all of the supplied QC variables were more gapfilled than the thresholds supplied, and p_gapfilled will be the minimum proportion gapfilled across all QC variables for each row.
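The "any" and "all" rules can be sketched in base R on made-up numbers. The sketch starts from per-row proportions gapfilled for each variable (how those are derived from the *_QC columns is not shown and not assumed here) and is an illustration of the rule, not the package's implementation.

```r
# Toy per-row proportions gapfilled, one column per QC variable (made up)
p <- data.frame(
  NEE_VUT_REF = c(0.1, 0.5, 0.9),
  TA_F = c(0.7, 0.2, 0.8)
)
max_gapfilled <- c(0.4, 0.6)  # one threshold per column, in column order

# TRUE where a variable exceeds its own threshold
exceeds <- sweep(as.matrix(p), 2, max_gapfilled, FUN = ">")

# operator = "any": flag if at least one variable exceeds; report the maximum
flag_any <- unname(apply(exceeds, 1, any))
p_any <- unname(apply(as.matrix(p), 1, max))

# operator = "all": flag only if every variable exceeds; report the minimum
flag_all <- unname(apply(exceeds, 1, all))
p_all <- unname(apply(as.matrix(p), 1, min))

data.frame(p, flag_any, p_any, flag_all, p_all)
```

In this toy data every row is flagged under "any", but only the third row is flagged under "all", because it is the only one where both variables exceed their thresholds.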

Examples

## Not run: 

# Flag rows where NEE_VUT_REF is more than 50% gapfilled
manifest <- flux_discover_files()
annual <- flux_read(manifest, resolution = "y")
annual_flagged <- flux_qc(
  annual,
  qc_vars = "NEE_VUT_REF",
  max_gapfilled = 0.5
)

# Use multiple variables each with a different threshold for QC
annual_flagged2 <- flux_qc(
  annual,
  qc_vars = c("NEE_VUT_REF", "TA_F"),
  max_gapfilled = c(0.4, 0.6)
)

# Same as above, but require *both* variables to be above their thresholds
# to consider that row a problem
annual_flagged2 <- flux_qc(
  annual,
  qc_vars = c("NEE_VUT_REF", "TA_F"),
  max_gapfilled = c(0.4, 0.6),
  operator = "all"
)


## End(Not run)


Read in FLUXNET data

Description

Reads and minimally cleans FLUXNET data found by flux_discover_files().

Usage

flux_read(
  manifest,
  resolution = c("y", "m", "w", "d", "h"),
  datasets = c("ERA5", "FLUXMET"),
  networks = c("AMF", "CNF", "EUF", "FLX", "ICOS", "JPF", "KOF", "SAEON", "TERN"),
  site_ids = NULL
)

Arguments

manifest

A manifest data frame produced by flux_discover_files().

resolution

The time resolution to read in. Must be one of "y" (annual), "m" (monthly), "w" (weekly), "d" (daily), or "h" (hourly/half-hourly).

datasets

Character vector of one or both of "FLUXMET" or "ERA5". Defaults to both.

networks

A character vector indicating which networks to extract files from. Multiple values may be provided. Defaults to all networks.

site_ids

A vector of site IDs to filter the manifest by. If NULL (the default), the manifest isn't filtered by site ID.

Examples

## Not run: 
manifest <- flux_discover_files()
daily <- flux_read(manifest, resolution = "d")
annual <- flux_read(manifest, resolution = "y")

# Filter manifest by metadata first
metadata <- flux_listall()

library(dplyr)
manifest_enriched <- left_join(manifest, metadata, by = join_by(site_id))
manifest_WET <- manifest_enriched %>% filter(igbp == "WET")
annual_wet <- flux_read(manifest_WET, resolution = "y")


## End(Not run)



Read variable info from "BIFVARINFO" files

Description

Extracts and tidies variable information from "BIFVARINFO" csv files.

Usage

flux_varinfo(
  manifest,
  resolution = c("y", "m", "w", "d", "h"),
  networks = c("AMF", "CNF", "EUF", "FLX", "ICOS", "JPF", "KOF", "SAEON", "TERN"),
  site_ids = NULL
)

Arguments

manifest

A manifest data frame produced by flux_discover_files().

resolution

The time resolution to read in. Must be one of "y" (annual), "m" (monthly), "w" (weekly), "d" (daily), or "h" (hourly/half-hourly).

networks

A character vector indicating which networks to extract files from. Multiple values may be provided. Defaults to all networks.

site_ids

A vector of site IDs to filter the manifest by. If NULL (the default), the manifest isn't filtered by site ID.

Value

A tibble.

Note

This only returns variable info (VARIABLE_GROUP == GRP_VAR_INFO) from the "BIFVARINFO" files as much of the other metadata they contain can be found in the results of flux_listall().

HEIGHT is returned as a character value because some heights are reported as ranges and cannot be parsed as a single numeric value.
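A short base R sketch shows why a range cannot be parsed as a single number. The values and the "low-high" range format are invented for illustration; actual HEIGHT entries may be written differently.

```r
# Made-up HEIGHT values; "1-3" stands in for a reported range
height <- c("2.5", "10", "1-3")

# Direct conversion turns the range into NA (and would normally warn)
parsed <- suppressWarnings(as.numeric(height))
parsed

# If needed, a range can be split into its bounds (assumes "low-high")
range_parts <- as.numeric(strsplit(height[3], "-", fixed = TRUE)[[1]])
range_parts
```

Keeping the column as character leaves it to the user to decide how ranges should be handled.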

Examples

## Not run: 
manifest <- flux_discover_files()
flux_varinfo(manifest)

## End(Not run)

