Title: | An R Client to the 'PatentsView' API |
---|---|
Description: | Provides functions to simplify the 'PatentsView' API (<https://patentsview.org/apis/purpose>) query language, send GET and POST requests to the API's twenty-seven endpoints, and parse the data that comes back. |
Authors: | Christopher Baker [aut, cre], Russ Allen [aut] |
Maintainer: | Christopher Baker |
License: | MIT + file LICENSE |
Version: | 0.3.0 |
Built: | 2024-12-28 22:18:54 UTC |
Source: | https://github.com/ropensci/patentsview |
This function casts the data fields returned by search_pv so that they have their most appropriate data types (e.g., date, numeric, etc.).
cast_pv_data(data)
data | The data returned by search_pv. |
The same type of object that you passed into cast_pv_data.
## Not run:
fields <- c("patent_date", "patent_title", "patent_year")
res <- search_pv(query = "{\"patent_id\":\"5116621\"}", fields = fields)
cast_pv_data(data = res$data)
## End(Not run)
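As a rough illustration of what casting means here, the base-R sketch below converts an all-character toy data frame column by column. The field-to-type mapping is hard-coded and hypothetical, and cast_column() is an illustrative helper, not the package's implementation; cast_pv_data() works out the types itself.

```r
# Toy data standing in for res$data from search_pv(); the values are made up,
# and every field starts out as character, as JSON-parsed values can.
raw <- data.frame(
  patent_date  = c("2001-01-02", "2005-06-07"),
  patent_title = c("Title A", "Title B"),
  patent_year  = c("2001", "2005"),
  stringsAsFactors = FALSE
)

# Assumed type for each field (hypothetical, hard-coded for this sketch).
field_types <- c(patent_date = "date", patent_title = "string",
                 patent_year = "integer")

# Convert one column according to its declared type.
cast_column <- function(x, type) {
  switch(type,
    date    = as.Date(x),
    integer = as.integer(x),
    number  = as.numeric(x),
    x  # strings and unrecognised types are returned unchanged
  )
}

casted <- raw
for (nm in names(field_types)) {
  casted[[nm]] <- cast_column(raw[[nm]], field_types[[nm]])
}
str(casted)
```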
A data frame containing the names of retrievable fields for each of the endpoints. You can find this data on the API's online documentation for each endpoint as well (e.g., the patents endpoint field list table).
fieldsdf
A data frame with the following columns:
The endpoint that this field record is for
The complete name of the field, including the parent group if applicable
The field's input data type
The group the field belongs to
The field name without the parent group structure
This function reminds the user what the possible PatentsView API endpoints are.
get_endpoints()
A character vector with the names of each endpoint.
This function returns a vector of fields that you can retrieve from a given API endpoint (i.e., the fields you can pass to the fields argument in search_pv). You can limit these fields to only cover certain entity group(s) as well (which is recommended, given the large number of possible fields for each endpoint).
get_fields(endpoint, groups = NULL)
endpoint | The API endpoint whose field list you want to get. See |
groups | A character vector giving the group(s) whose fields you want returned. A value of |
A character vector with field names.
# Get all assignee-level fields for the patent endpoint:
fields <- get_fields(endpoint = "patent", groups = "assignees")

# ...Then pass to search_pv:
## Not run:
search_pv(
  query = '{"_gte":{"patent_date":"2007-01-04"}}',
  fields = fields
)
## End(Not run)

# Get all patent and assignee-level fields for the patent endpoint:
fields <- get_fields(endpoint = "patent", groups = c("assignees", "patents"))

## Not run:
# ...Then pass to search_pv:
search_pv(
  query = '{"_gte":{"patent_date":"2007-01-04"}}',
  fields = fields
)
## End(Not run)
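Conceptually, get_fields() selects field names from a table like fieldsdf by endpoint and, optionally, by group. The sketch below does that with base R on a toy table. The column names endpoint, field and group, the rows, and the helper fields_for() are assumptions made for illustration, not the real table contents or the package's code.

```r
# Toy stand-in for fieldsdf; column names and rows are assumptions.
toy_fieldsdf <- data.frame(
  endpoint = c("patent", "patent", "patent", "inventor"),
  field    = c("patent_id", "patent_title", "assignees.assignee_id",
               "inventor_id"),
  group    = c("patents", "patents", "assignees", "inventors"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return one endpoint's field names, optionally
# restricted to the requested group(s); groups = NULL keeps every group.
fields_for <- function(fieldsdf, endpoint, groups = NULL) {
  keep <- fieldsdf$endpoint == endpoint
  if (!is.null(groups)) keep <- keep & fieldsdf$group %in% groups
  fieldsdf$field[keep]
}

fields_for(toy_fieldsdf, "patent", groups = "assignees")
#> [1] "assignees.assignee_id"
```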
This function suggests a value that you could use for the pk argument in unnest_pv_data, based on the endpoint you searched. It will return a potential unique identifier for a given entity (i.e., a given endpoint). For example, it will return "patent_id" when endpoint = "patent".
get_ok_pk(endpoint)
endpoint | The endpoint for which you would like to know a potential primary key. |
The name of a primary key (pk) that you could pass to unnest_pv_data.
get_ok_pk(endpoint = "inventor")
get_ok_pk(endpoint = "cpc_subclass")
get_ok_pk("publication/rel_app_text")
A list of functions that make it easy to write PatentsView queries. See the details section below for a list of the 14 functions, as well as the writing queries vignette for further details.
qry_funs
An object of class list of length 14.
1. Comparison operator functions

There are 6 comparison operator functions that work with fields of type integer, float, date, or string:

- eq: Equal to
- neq: Not equal to
- gt: Greater than
- gte: Greater than or equal to
- lt: Less than
- lte: Less than or equal to

There are 2 comparison operator functions that only work with fields of type string:

- begins: The string begins with the value string
- contains: The string contains the value string

There are 3 comparison operator functions that only work with fields of type fulltext:

- text_all: The text contains all the words in the value string
- text_any: The text contains any of the words in the value string
- text_phrase: The text contains the exact phrase of the value string

2. Array functions

There are 2 array functions:

- and: All members of the array must be true
- or: At least one member of the array must be true

3. Negation function

There is 1 negation function:

- not: The comparison is not true
An object of class pv_query. This is basically just a simple list with a print method attached to it.
qry_funs$eq(patent_date = "2001-01-01")
qry_funs$not(qry_funs$eq(patent_date = "2001-01-01"))
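These functions build PatentsView's JSON query language for you; the search_pv examples show qry_funs$gt(patent_year = 2010) standing in for the raw string '{"_gt":{"patent_year":2010}}'. The base-R sketch below writes a few of these JSON shapes by hand to show what the operators correspond to. The helper names are hypothetical, they handle only one field with one scalar value and do no escaping, so they are not a replacement for qry_funs.

```r
# Hypothetical helper: render one comparison, e.g. {"_gt":{"patent_year":2010}}.
comparison_json <- function(op, field, value) {
  rendered <- if (is.character(value)) sprintf('"%s"', value) else as.character(value)
  sprintf('{"_%s":{"%s":%s}}', op, field, rendered)
}

# _and / _or take an array of member queries; _not wraps a single query.
array_json <- function(op, ...) {
  sprintf('{"_%s":[%s]}', op, paste(c(...), collapse = ","))
}
not_json <- function(query) sprintf('{"_not":%s}', query)

cat(comparison_json("gt", "patent_year", 2010), sep = "\n")
#> {"_gt":{"patent_year":2010}}
cat(not_json(comparison_json("eq", "patent_date", "2001-01-01")), sep = "\n")
#> {"_not":{"_eq":{"patent_date":"2001-01-01"}}}
```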
Some of the endpoints now return HATEOAS-style links to get more data. For example, the inventors endpoint may return a link such as: "https://search.patentsview.org/api/v1/inventor/252373/"
retrieve_linked_data(url, api_key = Sys.getenv("PATENTSVIEW_API_KEY"), ...)
url | The link that was returned by the API on a previous call. |
api_key | API key. See here for info on creating a key. |
... |
A list with the following three elements:
A list with one element - a named data frame containing the data returned by the server. Each row in the data frame corresponds to a single value for the primary entity. For example, if you search the assignees endpoint, then the data frame will be on the assignee-level, where each row corresponds to a single assignee. Fields that are not on the assignee-level would be returned in list columns.
Entity counts across all pages of output (not just the page returned to you).
Details of the HTTP request that was sent to the server.
When you set all_pages = TRUE, you will only get a sample request. In other words, you will not be given multiple requests for the multiple calls that were made to the server (one for each page of results).
## Not run:
retrieve_linked_data(
  "https://search.patentsview.org/api/v1/cpc_subgroup/G01S7:4811/"
)
## End(Not run)
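A HATEOAS link carries the endpoint and the entity's identifier in its URL path. The sketch below splits such a link apart with base R, which can be handy for logging or grouping links before passing them to retrieve_linked_data(). parse_pv_link() is a hypothetical helper, and it assumes every link starts with https://search.patentsview.org/api/v1/ and ends in "<id>/", as the two example links in this section do.

```r
# Hypothetical helper: split a PatentsView link into endpoint path and id.
# Assumes the https://search.patentsview.org/api/v1/ prefix and a trailing "/".
parse_pv_link <- function(url) {
  path  <- sub("^https://search\\.patentsview\\.org/api/v1/", "", url)
  parts <- strsplit(sub("/$", "", path), "/", fixed = TRUE)[[1]]
  list(endpoint = paste(head(parts, -1), collapse = "/"),  # may contain "/"
       id       = tail(parts, 1))                          # last path segment
}

parse_pv_link("https://search.patentsview.org/api/v1/inventor/252373/")
parse_pv_link("https://search.patentsview.org/api/v1/cpc_subgroup/G01S7:4811/")
```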
This function makes an HTTP request to the PatentsView API for data matching the user's query.
search_pv(
  query,
  fields = NULL,
  endpoint = "patent",
  subent_cnts = FALSE,
  mtchd_subent_only = lifecycle::deprecated(),
  page = 1,
  per_page = 1000,
  all_pages = FALSE,
  sort = NULL,
  method = "GET",
  error_browser = NULL,
  api_key = Sys.getenv("PATENTSVIEW_API_KEY"),
  ...
)
query | The query that the API will use to filter records. |
fields | A character vector of the fields that you want returned to you. A value of |
endpoint | The web service resource you wish to search. Use |
subent_cnts | Non-matched subentities will always be returned under the new version of the API. |
mtchd_subent_only | |
page | The page number of the results that should be returned. |
per_page | The number of records that should be returned per page. This value can be as high as 1,000 (e.g., |
all_pages | Do you want to download all possible pages of output? If |
sort | A named character vector where the name indicates the field to sort by and the value indicates the direction of sorting (direction should be either "asc" or "desc"). For example, |
method | The HTTP method that you want to use to send the request. Possible values are "GET" and "POST". Use the POST method when your query is very long (say, over 2,000 characters). |
error_browser | |
api_key | API key. See here for info on creating a key. |
... |
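The advice for the method argument can be turned into a tiny rule. The sketch below uses a hypothetical helper, choose_method(), that picks "POST" once the query string passes roughly 2,000 characters; the threshold is the rule of thumb quoted above, not a documented hard limit of the API.

```r
# Hypothetical helper (not part of the package): choose the HTTP method
# from the length of the JSON query string.
choose_method <- function(query, limit = 2000) {
  if (nchar(query) > limit) "POST" else "GET"
}

short_query <- '{"_gt":{"patent_year":2010}}'
# An artificially long query: 100 OR-ed comparisons (about 2,900 characters).
long_query <- paste0('{"_or":[',
  paste(rep('{"_eq":{"patent_year":2010}}', 100), collapse = ","), "]}")

choose_method(short_query)  # "GET"
choose_method(long_query)   # "POST"
```

The result could then be supplied as search_pv(query = long_query, method = choose_method(long_query)).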
A list with the following three elements:
A list with one element - a named data frame containing the data returned by the server. Each row in the data frame corresponds to a single value for the primary entity. For example, if you search the assignees endpoint, then the data frame will be on the assignee-level, where each row corresponds to a single assignee. Fields that are not on the assignee-level would be returned in list columns.
Entity counts across all pages of output (not just the page returned to you).
Details of the HTTP request that was sent to the server.
When you set all_pages = TRUE, you will only get a sample request. In other words, you will not be given multiple requests for the multiple calls that were made to the server (one for each page of results).
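Because all_pages = TRUE makes one call per page of results, the number of requests follows from the total record count and per_page. A minimal sketch, assuming every call returns a full page of per_page records; n_page_requests() is hypothetical and only does the arithmetic.

```r
# Hypothetical helper: page requests needed to fetch every record.
n_page_requests <- function(total_hits, per_page = 1000) {
  stopifnot(per_page >= 1, per_page <= 1000)  # documented per_page ceiling
  max(1, ceiling(total_hits / per_page))      # an empty result still costs one call
}

n_page_requests(2500)                  # 3
n_page_requests(2500, per_page = 100)  # 25
```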
## Not run:
search_pv(query = '{"_gt":{"patent_year":2010}}')

search_pv(
  query = qry_funs$gt(patent_year = 2010),
  fields = get_fields("patent", c("patents", "assignees"))
)

search_pv(
  query = qry_funs$gt(patent_year = 2010),
  method = "POST",
  fields = "patent_id",
  sort = c("patent_id" = "asc")
)

search_pv(
  query = qry_funs$eq(inventor_name_last = "Crew"),
  endpoint = "inventor",
  all_pages = TRUE
)

search_pv(
  query = qry_funs$contains(assignee_individual_name_last = "Smith"),
  endpoint = "assignee"
)

search_pv(
  query = qry_funs$contains(inventors_at_grant.name_last = "Smith"),
  endpoint = "patent",
  config = httr::timeout(40)
)
## End(Not run)
This function converts a single data frame that has subentity-level list columns in it into multiple data frames, one for each entity/subentity. The multiple data frames can be merged together using the primary key variable specified by the user (see the relational data chapter in "R for Data Science" for an in-depth introduction to joining tabular data).
unnest_pv_data(data, pk = get_ok_pk(names(data)))
data | The data returned by search_pv. |
pk | The column/field name that will link the data frames together. This should be the unique identifier for the primary entity. For example, if you used the patents endpoint in your call to search_pv, you would specify pk = "patent_id". |
A list with multiple data frames, one for each entity/subentity. Each data frame will have the pk column in it, so you can link the tables together as needed.
## Not run:
fields <- c(
  "patent_id", "patent_title",
  "inventors.inventor_city", "inventors.inventor_country"
)
res <- search_pv(query = '{"_gte":{"patent_year":2015}}', fields = fields)
unnest_pv_data(data = res$data, pk = "patent_id")
## End(Not run)
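To make the list-column layout concrete, here is a base-R sketch of the idea behind unnest_pv_data(): every list column becomes its own table, and the primary key is copied onto each child row so the tables can be joined back together. The toy data, the helper unnest_list_cols() and the "entity" table name are hypothetical; this is not the package's implementation.

```r
# Toy search result (made-up values): one row per patent, with each
# patent's inventors stored in a list column of data frames.
res_data <- data.frame(
  patent_id    = c("1", "2"),
  patent_title = c("Title A", "Title B"),
  stringsAsFactors = FALSE
)
res_data$inventors <- list(
  data.frame(inventor_city = c("Austin", "Boston"), stringsAsFactors = FALSE),
  data.frame(inventor_city = "Chicago", stringsAsFactors = FALSE)
)

# Hypothetical helper: move each list column into its own data frame,
# copying the parent's primary key onto every child row.
unnest_list_cols <- function(df, pk) {
  is_list <- vapply(df, is.list, logical(1))
  out <- list(entity = df[!is_list])  # entity-level (atomic) columns
  for (col in names(df)[is_list]) {
    pieces <- Map(function(key, sub) {
      if (is.null(sub) || NROW(sub) == 0) return(NULL)  # no child rows
      sub[[pk]] <- rep(key, NROW(sub))
      sub[c(pk, setdiff(names(sub), pk))]               # key column first
    }, df[[pk]], df[[col]])
    child <- do.call(rbind, pieces)
    if (!is.null(child)) rownames(child) <- NULL
    out[[col]] <- child
  }
  out
}

tabs <- unnest_list_cols(res_data, pk = "patent_id")
# Join the child table back to the patent-level table on the key.
merge(tabs$entity, tabs$inventors, by = "patent_id")
```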
This function evaluates whatever code you pass to it in the environment of the qry_funs list. This allows you to cut down on typing when writing your queries. If you want to cut down on typing even more, you can try assigning the qry_funs list into your global environment with: list2env(qry_funs, envir = globalenv()).
with_qfuns(code, envir = parent.frame())
code | Code to evaluate. See example. |
envir | Where should R look for objects present in code that are not found in qry_funs? |
The result of code (i.e., your query).
# Without with_qfuns, we have to do:
qry_funs$and(
  qry_funs$gte(patent_date = "2007-01-01"),
  qry_funs$text_phrase(patent_abstract = c("computer program")),
  qry_funs$or(
    qry_funs$eq(inventor_last_name = "ihaka"),
    qry_funs$eq(inventor_first_name = "chris")
  )
)

# ...With it, this becomes:
with_qfuns(
  and(
    gte(patent_date = "2007-01-01"),
    text_phrase(patent_abstract = c("computer program")),
    or(
      eq(inventor_last_name = "ihaka"),
      eq(inventor_first_name = "chris")
    )
  )
)
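The mechanism described above, evaluating code "in the environment of" a list, can be reproduced in a few lines of base R: eval() accepts a list as its envir and falls back to enclos for anything the list does not define. The sketch below uses a hypothetical toy list and helper (toy_funs, with_funs()) to show the idea; it is not the package's source.

```r
# Toy stand-in for qry_funs (hypothetical): each function returns a nested
# list shaped like a PatentsView query.
toy_funs <- list(
  eq  = function(...) list(`_eq` = list(...)),
  gte = function(...) list(`_gte` = list(...)),
  and = function(...) list(`_and` = list(...))
)

# Evaluate the unevaluated `code` with the list's elements visible as
# variables; names not found in the list are looked up in `envir`.
with_funs <- function(code, funs, envir = parent.frame()) {
  eval(substitute(code), envir = funs, enclos = envir)
}

min_year <- 2007  # an ordinary variable, found through `envir`
q <- with_funs(
  and(gte(patent_date = "2007-01-01"), eq(patent_year = min_year)),
  toy_funs
)
str(q)
```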