Title: | Access Data from the NASS 'Quick Stats' API |
---|---|
Description: | Interface to access data via the United States Department of Agriculture's National Agricultural Statistics Service (NASS) 'Quick Stats' web API <https://quickstats.nass.usda.gov/api/>. Convenience functions facilitate building queries based on available parameters and valid parameter values. This product uses the NASS API but is not endorsed or certified by NASS. |
Authors: | Nicholas Potter [aut, cre], Robert Dinterman [ctb], Jonathan Adams [ctb], Joseph Stachelek [ctb], Julia Piaskowski [ctb], Branden Collingsworth [ctb], Adam Sparks [rev], Neal Richardson [ctb, rev] |
Maintainer: | Nicholas Potter <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.6.3 |
Built: | 2024-12-28 04:39:46 UTC |
Source: | https://github.com/ropensci/rnassqs |
The primary function in the rnassqs package, nassqs(), makes an HTTP GET
request to the USDA-NASS Quick Stats API and returns the data parsed as a
data.frame, plain text, or list. Various other functions use nassqs() to make
specific queries. For a data request, the Quick Stats API returns JSON that,
when parsed to a data.frame, contains 39 columns and a varying number of rows
depending on the query. Unfortunately, there is no way to restrict the number
of columns.
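Because the columns cannot be restricted in the request itself, one option is to drop unneeded columns after the data arrive. The following is a minimal sketch using a made-up data frame in place of a real query result; the column names shown (including "Value") are assumptions about what a parsed result contains.

```r
# Mock of a parsed Quick Stats result with a few of its many columns.
# The values are made up for illustration only.
mock <- data.frame(
  state_alpha = c("VA", "VA"),
  county_name = c("ACCOMACK", "ALBEMARLE"),
  year = c(2012L, 2012L),
  Value = c("101.5", "(D)"),
  load_time = c("2013-01-01 00:00:00", "2013-01-01 00:00:00"),
  stringsAsFactors = FALSE
)

# Keep only the columns of interest; intersect() guards against
# names that are absent from a particular result.
keep <- c("state_alpha", "county_name", "year", "Value")
slim <- mock[, intersect(keep, names(mock)), drop = FALSE]
```

The same subsetting step could be applied to the data.frame returned by nassqs().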
nassqs(
  ...,
  agg_level_desc = NULL, asd_code = NULL, asd_desc = NULL,
  begin_code = NULL, class_desc = NULL, commodity_desc = NULL,
  congr_district_code = NULL, country_code = NULL, country_name = NULL,
  county_ansi = NULL, county_code = NULL, county_name = NULL,
  domaincat_desc = NULL, domain_desc = NULL, end_code = NULL,
  freq_desc = NULL, group_desc = NULL, load_time = NULL,
  location_desc = NULL, prodn_practice_desc = NULL,
  reference_period_desc = NULL, region_desc = NULL, sector_desc = NULL,
  short_desc = NULL, source_desc = NULL, state_alpha = NULL,
  state_ansi = NULL, state_fips_code = NULL, state_name = NULL,
  statisticcat_desc = NULL, unit_desc = NULL, util_practice_desc = NULL,
  watershed_code = NULL, watershed_desc = NULL, week_ending = NULL,
  year = NULL, zip_5 = NULL,
  as_numeric = TRUE, progress_bar = TRUE, format = "csv", as = "data.frame"
)
... |
either a named list of parameters or a series of additional parameters that may include operations, e.g. year__GE = 2020. |
agg_level_desc |
Geographic level ("AGRICULTURAL DISTRICT", "COUNTY", "INTERNATIONAL", "NATIONAL", "REGION : MULTI-STATE", "REGION : SUB-STATE", "STATE", "WATERSHED", or "ZIP CODE"). |
asd_code |
Agriculture statistical district code. |
asd_desc |
Agriculture statistical district name / description. |
begin_code |
Week number indicating when the data series begins. |
class_desc |
Commodity class. |
commodity_desc |
Commodity, the primary subject of interest (e.g., "CORN", "CATTLE", "LABOR", "TRACTORS", "OPERATORS"). |
congr_district_code |
Congressional District codes. |
country_code |
Country code. |
country_name |
Country name. |
county_ansi |
County ANSI code. |
county_code |
County FIPS code. |
county_name |
County name. |
domaincat_desc |
Domain category within a domain (e.g., under domain_desc = "SALES", domain categories include $1,000 TO $9,999, $10,000 TO $19,999, etc.). |
domain_desc |
Domain, a characteristic of operations that produce a particular commodity (e.g., "ECONOMIC CLASS", "AREA OPERATED", "NAICS CLASSIFICATION", "SALES"). For chemical usage data, the domain describes the type of chemical applied to the commodity. Data with domain_desc = "TOTAL" have no further breakouts; i.e., the data value pertains completely to the short_desc. |
end_code |
Week number indicating when the data series ends. |
freq_desc |
Time period type covered by the data ("ANNUAL", "SEASON", "MONTHLY", "WEEKLY", "POINT IN TIME"). "MONTHLY" often covers more than one month. "POINT IN TIME" is for a particular day. |
group_desc |
Commodity group within a sector (e.g., under sector_desc = "CROPS", the groups are "FIELD CROPS", "FRUIT & TREE NUTS", "HORTICULTURE", and "VEGETABLES"). |
load_time |
Date and time of the data load, e.g. "2015-02-17 16:05:20". |
location_desc |
Location code, e.g. 5-digit FIPS code for counties. |
prodn_practice_desc |
Production practice (e.g. "UNDER PROTECTION", "OWNED, RIGHTS, LEASED", "ORGANIC, TRANSITIONING", "HIRED MANAGER"). |
reference_period_desc |
Reference period of the data (e.g. "JUN", "MID SEP", "WEEK #32"). |
region_desc |
Region name (e.g. "TEXAS", "WA & OR", "WEST COAST", "UMATILLA"). |
sector_desc |
Sector, the five high-level, broad categories useful for narrowing down choices ("ANIMALS & PRODUCTS", "CROPS", "DEMOGRAPHICS", "ECONOMICS", or "ENVIRONMENTAL"). |
short_desc |
A concatenation of six columns: |
source_desc |
Source of data ("CENSUS" or "SURVEY"). Census program includes the Census of Ag as well as follow up projects. Survey program includes national, state, and county surveys. |
state_alpha |
2-character state abbreviation, e.g. "NM". |
state_ansi |
State ANSI code. |
state_fips_code |
State FIPS code. |
state_name |
Full name of the state, e.g. "ALABAMA". |
statisticcat_desc |
Statistical category of the data (e.g., "AREA HARVESTED", "PRICE RECEIVED", "INVENTORY", "SALES"). |
unit_desc |
The units of the data (e.g. "TONS / ACRE", "TREES", "OPERATIONS", "NUMBER", "LB / ACRE", "BU / PLANTED ACRE"). |
util_practice_desc |
Utilization practice (e.g. "WIND", "SUGAR", "SILAGE", "ONCE REFINED", "FEED", "ANIMAL FEED"). |
watershed_code |
Watershed code as 8-digit HUC (e.g. "13020100"). |
watershed_desc |
Watershed/HUC name (e.g. "UPPER COLORADO"). |
week_ending |
Date of ending week (e.g. "1975-11-22"). |
year |
Year of the data. Conditional values are possible by appending an operation to the parameter name, e.g. "year__GE = 2020" returns all records with year >= 2020. |
zip_5 |
5-digit zip code. |
as_numeric |
Whether to convert data to numeric format. Conversion will replace missing notation such as "(D)" or "(Z)" with NA, but removes the need to convert to numeric format after querying. |
progress_bar |
Whether or not to display the progress bar. |
format |
The format to return the query in. Only useful if as = "text". |
as |
whether to return a data.frame, list, or text string. |
nassqs() accepts all parameters that are accepted by the USDA-NASS Quick
Stats API. These parameters are listed in nassqs_params() and are used to form
the data query.
Parameters can be modified by operations, which are appended to the parameter name. For example, "year__GE = 2020" will fetch data for 2020 and later. Operations can take the following forms:
__LE: less than or equal (<=)
__LT: less than (<)
__GT: greater than (>)
__GE: greater than or equal (>=)
__LIKE: like
__NOT_LIKE: not like
__NE: not equal
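As a small illustration of how operations ride along in the parameter name, the sketch below builds a parameter list and splits each name back into its base parameter and operation. The helper split_param() and the "EQ" label for plain equality are hypothetical and not part of rnassqs; the specific parameter values are only examples.

```r
# Operations are part of the parameter name, separated by a double underscore.
params <- list(
  commodity_desc = "CORN",
  state_alpha = "VA",
  year__GE = 2015,
  county_name__NOT_LIKE = "OTHER"
)

# Hypothetical helper: split a name into base parameter and operation.
# "EQ" is a made-up label for parameters without an operation.
split_param <- function(nm) {
  parts <- strsplit(nm, "__", fixed = TRUE)[[1]]
  c(param = parts[1], op = if (length(parts) > 1) parts[2] else "EQ")
}

# One row per parameter, with columns "param" and "op".
parsed <- t(vapply(names(params), split_param, character(2)))
```

A list like params could then be passed to nassqs() as in the examples for that function.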
a data frame, list, or text string of requested data.
nassqs_GET(), nassqs_parse(), nassqs_yields(), nassqs_acres()
## Not run:
# Get corn yields in Virginia in 2012
params <- list(commodity_desc = "CORN",
               year = 2012,
               agg_level_desc = "COUNTY",
               state_alpha = "VA",
               statisticcat_desc = "YIELD")
yields <- nassqs(params)
head(yields)
## End(Not run)
Get NASS Area given a set of parameters.
nassqs_acres( ..., area = c("AREA", "AREA PLANTED", "AREA BEARING", "AREA BEARING & NON-BEARING", "AREA GROWN", "AREA HARVESTED", "AREA IRRIGATED", "AREA NON-BEARING", "AREA PLANTED", "AREA PLANTED, NET") )
... |
either a named list of parameters or a series of parameters to form the query |
area |
the type of area to return. Default is all types. |
a data.frame of acres data
## Not run:
# Get area bearing for apples in Washington, 2012.
params <- list(
  commodity_desc = "APPLES",
  year = "2012",
  state_name = "WASHINGTON",
  agg_level_desc = "STATE"
)
area <- nassqs_acres(params, area = "AREA BEARING")
head(area)
## End(Not run)
If the API key is provided, sets the NASSQS_TOKEN environment variable. You can set your API key in several ways:
nassqs_auth(key)
key |
the API key (obtained from https://quickstats.nass.usda.gov/api/) |
directly or as a variable from your R program: nassqs_auth(key = "<your api key>")
by setting NASSQS_TOKEN in your R environment file (you'll never have to enter it again).
by entering it into the console when asked (it will be stored for the rest of the session).
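The options above all end with the key stored in the NASSQS_TOKEN environment variable. A minimal base-R sketch of the same idea follows; has_nass_key() is a hypothetical helper, not an rnassqs function, and "<your api key>" is a placeholder.

```r
# Set the variable for the current session only (placeholder value).
Sys.setenv(NASSQS_TOKEN = "<your api key>")

# To persist the key across sessions, a line like the following could be
# added to the R environment file (typically ~/.Renviron):
# NASSQS_TOKEN=<your api key>

# Hypothetical helper: report whether a non-empty key is available.
has_nass_key <- function() nzchar(Sys.getenv("NASSQS_TOKEN"))
has_nass_key()
```

Checking for the key up front avoids the interactive console prompt in scripts that run unattended.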
# Set the API key
nassqs_auth(key = "<your api key>")
Sys.getenv("NASSQS_TOKEN")
This wrapper allows specifying a list of counties by FIPS code. It iterates over each state in the list of FIPS, downloading for each separately and then concatenating.
nassqs_byfips(fips, ...)
fips |
a list of 5-digit fips codes |
... |
either a named list of parameters or a series of parameters to form the query |
a data.frame of data for each fips code
## Not run:
nassqs_byfips(
  fips = c("19001", "17005", "17001"),
  commodity_desc = "CORN",
  year = 2019,
  statisticcat_desc = "YIELD")
## End(Not run)
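The per-state iteration described for nassqs_byfips() can be pictured with a short base-R sketch. It only shows how 5-digit FIPS codes might be grouped by state before querying; the package's actual internals may differ, and the queries object is a hypothetical illustration.

```r
fips <- c("19001", "17005", "17001")

# The first two digits identify the state, the last three the county.
by_state <- split(substr(fips, 3, 5), substr(fips, 1, 2))

# One parameter list per state (a sketch, not the package's implementation).
queries <- lapply(names(by_state), function(st) {
  list(state_fips_code = st, county_code = by_state[[st]])
})
```

Each element of queries could then be combined with the remaining parameters, fetched separately, and the results row-bound together.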
Check that the response is valid, i.e. that it doesn't exceed 50,000 records and that all the parameter values are valid. This is used to ensure that the query is valid before querying to reduce wait times before receiving an error.
nassqs_check(response)
response |
a |
nothing if check is passed, or an informative error if not passed.
Deprecated. Use nassqs_params() instead.
nassqs_fields(...)
... |
a parameter, series of parameters, or a list of parameters that you would like a description of. If missing, a list of all available parameters is returned. |
This is the workhorse of the package that provides the core request
functionality to the NASS 'Quick Stats' API:
https://quickstats.nass.usda.gov/api/.
In most cases nassqs() or other high-level functions should be used.
nassqs_GET() uses httr::GET() to make an HTTP GET request, which returns a
response object that must then be parsed to a data.frame, list, or other R
object. Higher-level functions do that parsing automatically. However, if you
need access to the response object directly, nassqs_GET() provides that.
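To make the request more concrete, the sketch below assembles the kind of URL such a GET request targets, using base R only and without contacting the API. The URL layout, the build_url() helper, and the "DEMO_KEY" placeholder are assumptions for illustration; nassqs_GET() itself may construct the request differently.

```r
# Assumed base URL; the api_path values mirror the api_path argument.
base_url <- "https://quickstats.nass.usda.gov/api"

# Hypothetical helper: build a query URL from scalar parameter values.
build_url <- function(params, api_path = "api_GET",
                      key = "DEMO_KEY", format = "json") {
  query <- c(list(key = key), params, list(format = format))
  # Percent-encode each value so spaces and symbols are URL-safe.
  values <- vapply(
    query,
    function(v) utils::URLencode(as.character(v), reserved = TRUE),
    character(1)
  )
  pairs <- paste0(names(query), "=", values)
  paste0(base_url, "/", api_path, "/?", paste(pairs, collapse = "&"))
}

url <- build_url(list(commodity_desc = "CORN", year__GE = 2012))
```

This sketch handles one value per parameter only; it is meant to show the shape of a request, not to replace nassqs_GET().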
nassqs_GET( ..., api_path = c("api_GET", "get_param_values", "get_counts"), progress_bar = TRUE, format = c("csv", "json", "xml") )
... |
either a named list of parameters or a series of parameters to use in the query |
api_path |
the API path that determines the type of request being made. |
progress_bar |
whether to display the progress bar or not. |
format |
The format to return the query in. Only useful if as = "text". |
an httr::GET() response object
## Not run:
# Yields for corn in 2012 in Washington
params <- list(commodity_desc = "CORN",
               year = 2012,
               agg_level_desc = "STATE",
               state_alpha = "WA",
               statisticcat_desc = "YIELD")

# Returns a response object that must be parsed either manually or
# by using nassqs_parse()
response <- nassqs_GET(params)
yields <- nassqs_parse(response)
head(yields)

# Get the number of records that would be returned for a given request
# Equivalent to 'nassqs_record_count(params)'
response <- nassqs_GET(params, api_path = "get_counts")
records <- nassqs_parse(response)
records

# Get the list of allowable values for the parameter 'statisticcat_desc'
# Equivalent to 'nassqs_param_values("statisticcat_desc")'
req <- nassqs_GET(list(param = "statisticcat_desc"),
                  api_path = "get_param_values")
statisticcat_desc_values <- nassqs_parse(req, as = "list")
head(statisticcat_desc_values)
## End(Not run)
Returns a list of all possible values for a given parameter. Including additional parameters restricts the list to values that occur in data meeting those additional restrictions. However, this is only possible by requesting the entire dataset and then filtering for unique values, so it is recommended to make the query as small as possible when including additional parameters.
nassqs_param_values(param, ...)
param |
the name of a NASS quickstats parameter |
... |
additional parameters for which to filter the valid responses. |
a list containing all valid values for that parameter
## Not run:
# See all values available for a single field (here source_desc). Values may
# not be available in the context of other parameters you set; for example,
# a given state may not have any 'YIELD' in blueberries if they don't grow
# blueberries in that state.
# Requires an API key:
nassqs_param_values("source_desc")

# Valid values for a parameter given a specific set of additional
# parameters
nassqs_param_values("commodity_desc",
                    state_fips_code = "53",
                    county_code = "077",
                    year = 2017,
                    group_desc = "EXPENSES")
## End(Not run)
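A list of valid values is mostly useful for catching a bad parameter value before sending a query. The sketch below shows one way to do that check offline; check_value() is a hypothetical helper, and valid_sources is a hard-coded stand-in for what nassqs_param_values("source_desc") might return.

```r
# Stand-in for the values returned by the API for source_desc.
valid_sources <- c("CENSUS", "SURVEY")

# Hypothetical helper: stop with a readable message on an invalid value.
check_value <- function(value, valid) {
  if (!value %in% valid) {
    stop("'", value, "' is not one of: ",
         paste(valid, collapse = ", "), call. = FALSE)
  }
  invisible(TRUE)
}

check_value("SURVEY", valid_sources)
```

In practice the vector of valid values would come from the API rather than being typed in by hand.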
Contains a simple hard-coded list of all available parameters. If no parameter name is provided, returns a list of all parameters. More information can be found in the API documentation on parameters found at https://quickstats.nass.usda.gov/api/#param_define.
nassqs_params(...)
... |
a parameter, series of parameters, or a list of parameters that you would like a description of. If missing, a list of all available parameters is returned. |
a list of all available parameters or a description of a subset
# Get a list of all available parameters
nassqs_params()

# Get information about specific parameters
nassqs_params("source_desc", "group_desc")
Parses a response object from nassqs_GET(). Returns a data frame, list, or
text string. If a data.frame, all columns except year are strings because the
'Quick Stats' data returns suppressed data as '(D)', '(Z)', or other character
indicators which mean different things. Converting the value to numeric
results in NA, which loses that information.
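One way to convert values to numbers without losing the suppression codes is to copy the codes into their own column first. This is a minimal sketch on made-up strings; the thousands separators and the exact set of codes are assumptions about how unconverted values look.

```r
# Made-up value strings as they might appear before numeric conversion.
value_chr <- c("1,234", "98.6", "(D)", "(Z)")

# Keep any parenthesised code, such as "(D)", in a separate flag column.
trimmed <- trimws(value_chr)
flag <- ifelse(grepl("^\\(.*\\)$", trimmed), trimmed, NA_character_)

# Strip thousands separators, then convert; codes become NA.
value_num <- suppressWarnings(
  as.numeric(gsub(",", "", trimmed, fixed = TRUE))
)

result <- data.frame(value_chr, value_num, flag, stringsAsFactors = FALSE)
```

With this layout, rows where value_num is NA can still be distinguished by the code kept in flag.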
nassqs_parse(req, as_numeric = TRUE, as = c("data.frame", "list", "text"), ...)
req |
the GET response from nassqs_GET() |
as_numeric |
whether to convert values to numeric format. |
as |
whether to return a data.frame, list, or text string |
... |
additional parameters passed to |
a data frame, list, or text string of the content from the response.
## Not run:
# Set parameters and make the request
params <- list(commodity_desc = "CORN",
               year = 2012,
               agg_level_desc = "STATE",
               state_alpha = "WA",
               statisticcat_desc = "YIELD")
response <- nassqs_GET(params)

# Parse the response to a data frame
corn <- nassqs_parse(response, as = "data.frame")
head(corn)

# Parse the response into a raw character string.
corn_text <- nassqs_parse(response, as = "text")
head(corn_text)

# Get a list of parameter values and parse as a list
response <- nassqs_GET(list(param = "statisticcat_desc"),
                       api_path = "get_param_values")
statisticcat_desc_values <- nassqs_parse(response, as = "list")
head(statisticcat_desc_values)
## End(Not run)
Returns the number of records that fit a set of parameters. Useful if your current parameter set returns more than the 50,000 record limit.
nassqs_record_count(...)
... |
either a named list of parameters or a series of parameters to form the query |
integer that is the number of records that are returned from the API in response to the query
## Not run:
# Check the number of records returned for corn in 2005, Washington state
params <- list(
  commodity_desc = "CORN",
  year = "2005",
  agg_level_desc = "STATE",
  state_name = "WASHINGTON"
)
records <- nassqs_record_count(params)
records # returns 17
## End(Not run)
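When a parameter set would exceed the 50,000 record limit, one common workaround is to split the request into smaller pieces, for example one per year. The sketch below only builds the per-year parameter lists and makes no API calls; the chosen years and parameters are illustrative, and whether a single year stays under the limit depends on the query.

```r
base_params <- list(commodity_desc = "CORN", agg_level_desc = "COUNTY")
years <- 2015:2017

# One parameter list per year, each a copy of base_params plus a year.
per_year <- lapply(years, function(y) {
  utils::modifyList(base_params, list(year = y))
})

# With an API key and network access, each piece could then be counted,
# fetched, and combined, for example:
# counts <- vapply(per_year, rnassqs::nassqs_record_count, numeric(1))
# pieces <- lapply(per_year, rnassqs::nassqs)
# all_data <- do.call(rbind, pieces)
```

Counting each piece first makes it possible to split further (for instance by state) before any large download is attempted.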
Returns yields for the specified parameters. This function is intended to simplify common requests.
nassqs_yields(...)
... |
either a named list of parameters or a series of parameters to form the query |
a data.frame of yields data
## Not run:
# Get state-level yields for wheat in Washington, 2012
params <- list(
  commodity_desc = "WHEAT",
  year = "2012",
  agg_level_desc = "STATE",
  state_alpha = "WA")
yields <- nassqs_yields(params)
head(yields)
## End(Not run)
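A convenience function of this kind can be thought of as filling in one parameter on the caller's behalf. The sketch below illustrates that pattern with a hypothetical helper, with_statistic(); it is not the package's implementation, and it assumes that a yields query corresponds to statisticcat_desc = "YIELD" as in the nassqs() example.

```r
# Hypothetical helper: return a copy of params with the statistic category set.
with_statistic <- function(params, statistic) {
  utils::modifyList(params, list(statisticcat_desc = statistic))
}

params <- list(commodity_desc = "WHEAT", year = "2012", state_alpha = "WA")
yield_params <- with_statistic(params, "YIELD")
```

The resulting list could be passed to nassqs() directly, which is roughly the shortcut a yields wrapper offers.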