---
title: "Introduction to zentracloud"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to zentracloud}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

## Purpose

The package is designed to act as a direct access point to the ZENTRA Cloud API. With a valid token, data for a chosen period can be loaded directly into R. The data is also saved to a cache, so that repeated queries for the same time period run much faster.

**IMPORTANT** The package is currently not well suited to downloading large amounts of data. The ZENTRA Cloud API is limited to 2000 readings/minute, so requests made via the package functions are throttled. For long time periods we therefore recommend continuing to use the ZENTRA Cloud interface.

## Usage

### Installation

```{r install, eval=FALSE}
# install from GitLab
url = "https://gitlab.com/meter-group-inc/pubpackages/zentracloud"
remotes::install_git(url = url)
```

```{r load}
# load package
library(zentracloud)
```

There might be some start-up messages, which are explained later on.

### Token

A valid token for the API is a prerequisite for using the package. Tokens can be generated on the ZENTRA Cloud web interface. There, go to the menu item API in the sidebar. If a valid token exists, it will show up and can be copied. If not, there is an option to add a new key, which will generate a token.

![](imgs/token.png){width="100%"}

For use in the functions, the token has to be set as an option for the duration of the R session. To reload the setting for every session, the option can be written, for example, into the .Rprofile. To set the token for the session, use the function `setZentracloudOptions()`. Its arguments are the `token`, the corresponding `domain`, and any of the three other options that can be set for cache management (details below).
The domain has to be set so that the package knows which server the API should query. The available options are listed on the help page of `setZentracloudOptions()`. If you are unsure which domain your token is valid for, check the URL of your ZENTRA Cloud web interface. If the URL starts with `zentracloud.com` use `default`, if it starts with `aroya.zentracloud.com` use `aroya`, and so on.

```{r token, eval = FALSE}
# set token as option
setZentracloudOptions(token = "<your_token>", domain = "<corresponding_domain>")

# set token in .Rprofile
# open profile
usethis::edit_r_profile()
# add token and domain
options("ZENTRACLOUD_TOKEN" = "<your_token>")
options("ZENTRACLOUD_DOMAIN" = "<corresponding_domain>")
```

### Cache

The other options that can be set for this package all concern the cache. The most important one sets the cache directory; the others set the allowed maximum size and file age. Default values are set for these when the package is loaded, unless the options were already defined. The defaults are:

- ZENTRACLOUD_CACHE_MAX_SIZE: 500 kB
- ZENTRACLOUD_CACHE_MAX_AGE: 7 days

For the directory, the default depends on the operating system. For instance, the default for Linux is:

- ZENTRACLOUD_CACHE_DIR: `~/.cache/R/zentracloud`

The path is determined using this function:

```{r cache_path, eval = FALSE}
tools::R_user_dir("zentracloud", which = "cache")
```

As with the token, these options can be changed using `setZentracloudOptions()`, or set more permanently in the .Rprofile.

To see all currently set options, use `getZentracloudOptions()`.

```{r get_options, eval = FALSE}
getZentracloudOptions()
#>
#> ZENTRACLOUD_CACHE_DIR     : /home//.cache/R/zentracloud
#> ZENTRACLOUD_CACHE_MAX_AGE : 7
#> ZENTRACLOUD_CACHE_MAX_SIZE: 500
#> ZENTRACLOUD_DOMAIN        : zentracloud.com
#> ZENTRACLOUD_TOKEN         : <-- hidden -->
```

If the cache directory is not empty when the package is loaded, some checks run automatically:

- If files that are older than the maximum allowed age are found, they are deleted.
If this is the case, a message reports how many files were deleted.
- If the size of the cache directory still exceeds the maximum allowed size afterwards, a warning is printed. It is then up to the user to delete or move further files.

To manually clear the cache of files older than a certain age, use the function `clearCache()`. If the argument `cache_dir` is not provided, the function reads the directory from the options. Any path can be set, as long as the cached files follow the same structure that is created automatically when running `getReadings()`, which is described later. The argument `file_age` takes an integer, so the value must be written in integer notation (e.g. `5L`). Again, if it is not provided, the default value stored in the options is used.

```{r cache, eval = FALSE}
clearCache(file_age = 5L)
```

To load everything that is currently in your cache, use `readCache()`. This returns a nested list with the data sorted by device and sensor. If the argument `cache_dir` is not provided, the function uses the cache directory set in the options.

```{r readCache, eval = FALSE}
cached_data = readCache()
```

### Data

To access the API and request the data, use the function `getReadings()`. Some notes on the arguments of the function:

- The device serial number, as well as the start and end datetime of the period of interest, must be provided.
- Start and end time need to be provided in the format *"YYYY-MM-DD hh:mm:ss"* and have to be given in the *logger time zone*!
- If `force_api = TRUE`, the cache is bypassed and the query goes straight to the API. The results are still written to the cache.
- If `ignore_cache = TRUE`, the function internally uses a temporary directory as cache during processing. No data is written to the cache directory set in the options. Be aware, though, that no data is read from the cache either, so the run time may increase.

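
The datetime format required for the start and end time can be produced with base R. The following is only a minimal sketch: the time zone `America/Los_Angeles` and the UTC timestamps are assumptions chosen for illustration, not values taken from the package or from a real logger. It converts two UTC instants into strings in the assumed logger time zone.

```{r format-times}
# Assumption for illustration only: the logger runs in this time zone
logger_tz = "America/Los_Angeles"

# Hypothetical period of interest, known as UTC instants
period_start = as.POSIXct("2022-06-01 07:00:00", tz = "UTC")
period_end = as.POSIXct("2022-06-15 06:59:00", tz = "UTC")

# format() with tz = ... renders the same instant as local wall-clock time
# in the "YYYY-MM-DD hh:mm:ss" layout
start_time = format(period_start, "%Y-%m-%d %H:%M:%S", tz = logger_tz)
end_time = format(period_end, "%Y-%m-%d %H:%M:%S", tz = logger_tz)

start_time
end_time
```

The resulting strings can then be passed as `start_time` and `end_time`.
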

When the function runs, it first checks whether the queried data (or parts of it) is already in the cache. If so, it loads the data from there; if not, it accesses the ZENTRA Cloud API and requests the data. At most 2000 entries can be downloaded at once. For periods longer than around 20 days (at a measurement interval of 15 minutes), the response is therefore paginated, meaning that the data has to be downloaded in chunks. Between chunks, a waiting time of 60 seconds has to be observed, so requesting larger amounts of data takes a while.

The chunks are written to the cache separately to avoid memory shortages. The data is written as *.parquet* files, a highly efficient format with regard to both storage space and reading and writing speed. (More info on the format can be found [in a short blog post](https://www.r-bloggers.com/2021/09/understanding-the-parquet-file-format/) and [on the arrow github page](https://github.com/apache/arrow)).\
Within the cache, a directory is created for the device you queried, inside which the data is written partitioned by sensor, year and month. For the example query below, this creates the following directory tree:

![](imgs/dir_tree.png)

This is the structure that `clearCache()` needs to work reliably.

```{r readings, eval=FALSE}
setZentracloudOptions(
  token = Sys.getenv("ZENTRACLOUD_TOKEN")
  , domain = "default"
)

zentra_data = getReadings(
  device_sn = "06-01185"
  , start_time = "2022-06-01 00:00:00"
  , end_time = "2022-06-14 23:59:00"
  , force_api = FALSE
  , ignore_cache = FALSE
)
```

```{r show-readings}
str(zentra_data, max.level = 3, give.attr = FALSE)
```

The data returned in `zentra_data` is a list with entries for all the sensors connected to the queried device. Each entry in turn contains a `data.frame` with columns for the date and time specifications and for each variable measured, as well as the corresponding error flags and descriptions.
Attached as attributes to the value columns are the corresponding unit and measurement precision. To access the attributes, do the following:

```{r attributes}
# gives back the names of the list items, i.e. the sensor names
attributes(zentra_data)

# querying the attributes of single columns gives back the measurement unit
# and precision
vars = c("saturation_extract_ec.value", "soil_temperature.value", "water_content.value")
vars_attr = sapply(
  vars
  , \(v) {attributes(zentra_data$`T12-0000248_port5`[[v]])}
  , simplify = FALSE
)

str(vars_attr)
```

> **NOTE:**
>
> Variable names in the returned `zentra_data` are taken from the API response.
> This should ensure compatibility with data downloaded via other methods
> (e.g. as *csv* from ZENTRA Cloud).
> We suggest making the names syntactically valid before continuing data analysis
> (e.g. with `base::make.names()` or `janitor::clean_names()`).

### Settings

For internal use within `getReadings()`, the device settings are queried. This is necessary to deal with the timestamps accurately. The function can also be called on its own. As it accesses the API as well, a token and a device serial number are necessary. Information such as measurement intervals, location and time settings can be read from the settings. The returned object is a nested list.

```{r settings, eval=FALSE}
set = queryDeviceSettings(
  device_sn = "06-01185"
)

# this function allows a quick view into the data structure of the list:
listviewer::jsonedit(set)
```

![](imgs/README-settings-new.png){width="100%"}
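
Returning to the note on variable names in the Data section: the following minimal sketch shows the kind of name cleaning suggested there, using only base R. The `sensor_df` object and its column names are made up for illustration and are not taken from an actual API response.

```{r clean-names}
# Toy stand-in for one sensor's data.frame; the column names are invented
# to mimic names that are not syntactically valid in R
sensor_df = data.frame(
  "Water Content.value" = c(0.21, 0.22)
  , "Soil Temperature.value" = c(18.4, 18.6)
  , check.names = FALSE
)

# make.names() replaces invalid characters with "." and, with unique = TRUE,
# makes duplicated names distinct
names(sensor_df) = make.names(names(sensor_df), unique = TRUE)

names(sensor_df)
```

Only the names change here; the columns themselves are left untouched. The same step could be applied to every sensor entry of a returned list with `lapply()`.
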