Title: | 'NoSQL' Database Connector |
---|---|
Description: | Simplified JSON document database access and manipulation, providing a common API across supported 'NoSQL' databases 'Elasticsearch', 'CouchDB', 'MongoDB' as well as 'SQLite/JSON1', 'PostgreSQL', and 'DuckDB'. |
Authors: | Ralf Herold [aut, cre] , Scott Chamberlain [aut] , Rich FitzJohn [aut], Jeroen Ooms [aut] |
Maintainer: | Ralf Herold <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.11.0 |
Built: | 2024-11-10 18:17:20 UTC |
Source: | https://github.com/ropensci/nodbi |
Data set 'contacts'
contacts
contacts
A JSON string with ragged, nested contact details
Data set 'diamonds'
A data frame with 53940 rows and 10 variables:
price price in US dollars (326-18,823 USD)
carat weight of the diamond (0.2-5.01)
cut quality of the cut (Fair, Good, Very Good, Premium, Ideal)
color diamond colour, from J (worst) to D (best)
clarity a measurement of how clear the diamond is (I1 (worst), SI1, SI2, VS1, VS2, VVS1, VVS2, IF (best))
x length in mm (0-10.74)
y width in mm (0-58.9)
z depth in mm (0-31.8)
depth total depth percentage = z / mean(x, y) = 2 * z / (x + y) (43-79)
table width of top of diamond relative to widest point (43-95)
from ggplot2
A message is emitted if the container key
already exists.
docdb_create(src, key, value, ...)
docdb_create(src, key, value, ...)
src |
Source object, result of call to any of functions
|
key |
(character) The name of the container in the
database backend (corresponds to |
value |
The data to be created in the database: a single data.frame, a JSON string, a list, or a file name or URL that points to NDJSON documents |
... |
Passed to functions |
An error is raised for document(s) in value
when their
_id
already exist(s) in the collection key
;
use docdb_update()
to update such document(s).
(integer) Number of successfully created documents
If value
is a data.frame that has a column _id
,
or is a JSON string having a key _id
at root level,
or is a list having an item _id
at its top level,
this will be used as _id
's and primary index
in the database. If there are no such _id
's in value
,
row names (if any exist) of value
will be used as _id
's,
otherwise random _id
's will be created (using
uuid::UUIDgenerate()
with use.time = TRUE
for
SQLite and PostgreSQL, or using DuckDB's built-in uuid()
).
## Not run: src <- src_sqlite() docdb_create(src, key = "diamonds_small", value = as.data.frame(diamonds[1:3000L, ]) ) head(docdb_get(src, "diamonds_small")) docdb_create(src, key = "contacts", value = contacts) docdb_get(src, "contacts")[["friends"]] ## End(Not run)
## Not run: src <- src_sqlite() docdb_create(src, key = "diamonds_small", value = as.data.frame(diamonds[1:3000L, ]) ) head(docdb_get(src, "diamonds_small")) docdb_create(src, key = "contacts", value = contacts) docdb_get(src, "contacts")[["friends"]] ## End(Not run)
Delete documents or container
docdb_delete(src, key, ...)
docdb_delete(src, key, ...)
src |
Source object, result of call to any of functions
|
key |
(character) The name of the container in the
database backend (corresponds to |
... |
Optionally, specify |
(logical) Success of operation. Typically TRUE
if
document(s) or collection existed, and FALSE
if document(s)
did not exist, or collection did not exist, or delete was not successful.
## Not run: src <- src_sqlite() docdb_create(src, "iris", iris) docdb_delete(src, "iris", query = '{"Species": {"$regex": "a$"}}') docdb_delete(src, "iris") ## End(Not run)
## Not run: src <- src_sqlite() docdb_create(src, "iris", iris) docdb_delete(src, "iris", query = '{"Species": {"$regex": "a$"}}') docdb_delete(src, "iris") ## End(Not run)
Check if container exists in database
docdb_exists(src, key, ...)
docdb_exists(src, key, ...)
src |
Source object, result of call to any of functions
|
key |
(character) The name of the container in the
database backend (corresponds to |
... |
Passed to functions |
(logical) TRUE
or FALSE
to indicate
existence of container key
in database.
Note this does not indicate if the container
holds any documents.
## Not run: src <- src_sqlite() docdb_exists(src, "nonexistingcontainer") docdb_create(src, "mtcars", mtcars) docdb_exists(src, "mtcars") ## End(Not run)
## Not run: src <- src_sqlite() docdb_exists(src, "nonexistingcontainer") docdb_create(src, "mtcars", mtcars) docdb_exists(src, "mtcars") ## End(Not run)
Get all documents from container in database
docdb_get(src, key, limit = NULL, ...)
docdb_get(src, key, limit = NULL, ...)
src |
Source object, result of call to any of functions
|
key |
(character) The name of the container in the
database backend (corresponds to |
limit |
(integer) Maximum number of documents
to be returned. If |
... |
Passed on to functions:
|
Data frame, one document per row
## Not run: src <- src_sqlite() docdb_create(src, "mtcars", mtcars) docdb_get(src, "mtcars", limit = 10L) ## End(Not run)
## Not run: src <- src_sqlite() docdb_create(src, "mtcars", mtcars) docdb_get(src, "mtcars", limit = 10L) ## End(Not run)
List containers in database
docdb_list(src, ...)
docdb_list(src, ...)
src |
Source object, result of call to any of functions
|
... |
Passed to function |
Vector of names of containers that can be
used as parameter key
with other functions such as
docdb_create()
.
## Not run: src <- src_sqlite() docdb_create(src, "iris", iris) docdb_list(src) ## End(Not run)
## Not run: src <- src_sqlite() docdb_create(src, "iris", iris) docdb_list(src) ## End(Not run)
Complements the databases' native query and filtering functions
by using jqr::jqr()
.
If query = "{}"
and neither fields
nor listfields
is specified, runs docdb_get()
.
docdb_query(src, key, query, ...)
docdb_query(src, key, query, ...)
src |
Source object, result of call to any of functions
|
key |
(character) The name of the container in the
database backend (corresponds to |
query |
(character) A JSON query string, see examples.
Can use comparisons / tests ( |
... |
Optional parameters:
|
Data frame with requested documents, one per row,
may have nested lists in columns;
NULL
if no documents could be found.
If listfields
is specified: vector of all field names
in dot path notation.
A dot in query
or fields
is interpreted as a dot path,
pointing to a field nested within another, e.g. friends.id
in
the example.
## Not run: src <- src_sqlite() docdb_create(src, "myKey", mtcars) docdb_create(src, "myKey", contacts) docdb_create(src, "myKey", mapdata) docdb_query(src, "myKey", query = '{"mpg":21}') docdb_query(src, "myKey", query = '{"mpg":21, "gear": {"$lte": 4}}') docdb_query(src, "myKey", query = '{"mpg":21}', fields = '{"_id":0, "mpg":1, "cyl":1}') docdb_query(src, "myKey", query = '{"_id": {"$regex": "^.+0.*$"}}', fields = '{"gear": 1}') docdb_query(src, "myKey", query = '{"$and": [{"mpg": {"$lte": 18}}, {"gear": {"$gt": 3}}]}') docdb_query(src, "myKey", query = '{}', fields = '{"_id":0, "mpg":1, "cyl":1}') docdb_query(src, "myKey", query = '{"$and": [{"age": {"$gt": 21}}, {"friends.name": {"$regex": "^B[a-z]{3,9}.*"}}]}') docdb_query(src, "myKey", query = '{"$or": [{"rows.elements.status": "OK"}, {"$and": [ {"_id": "5cd6785325ce3a94dfc54096"}, {"friends.name": {"$regex": "^B[a-z]{3,90}.*"}}]}]}') docdb_query(src, "myKey", query = '{"$and": [{"_id": "5cd6785325ce3a94dfc54096"}, {"friends.name": {"$regex": "^B[a-z]{3,90}.*"}}]}') docdb_query(src, "myKey", query = '{"origin_addresses": {"$in": ["Santa Barbara, CA, USA", "New York, NY, USA"]}}', fields = '{"age": 1, "friends.id": 1, "_id": 0, "rows.elements.status": 1}') docdb_query(src, "myKey", query = '{"rows.elements.status": "OK"}', listfields = TRUE) ## End(Not run)
## Not run: src <- src_sqlite() docdb_create(src, "myKey", mtcars) docdb_create(src, "myKey", contacts) docdb_create(src, "myKey", mapdata) docdb_query(src, "myKey", query = '{"mpg":21}') docdb_query(src, "myKey", query = '{"mpg":21, "gear": {"$lte": 4}}') docdb_query(src, "myKey", query = '{"mpg":21}', fields = '{"_id":0, "mpg":1, "cyl":1}') docdb_query(src, "myKey", query = '{"_id": {"$regex": "^.+0.*$"}}', fields = '{"gear": 1}') docdb_query(src, "myKey", query = '{"$and": [{"mpg": {"$lte": 18}}, {"gear": {"$gt": 3}}]}') docdb_query(src, "myKey", query = '{}', fields = '{"_id":0, "mpg":1, "cyl":1}') docdb_query(src, "myKey", query = '{"$and": [{"age": {"$gt": 21}}, {"friends.name": {"$regex": "^B[a-z]{3,9}.*"}}]}') docdb_query(src, "myKey", query = '{"$or": [{"rows.elements.status": "OK"}, {"$and": [ {"_id": "5cd6785325ce3a94dfc54096"}, {"friends.name": {"$regex": "^B[a-z]{3,90}.*"}}]}]}') docdb_query(src, "myKey", query = '{"$and": [{"_id": "5cd6785325ce3a94dfc54096"}, {"friends.name": {"$regex": "^B[a-z]{3,90}.*"}}]}') docdb_query(src, "myKey", query = '{"origin_addresses": {"$in": ["Santa Barbara, CA, USA", "New York, NY, USA"]}}', fields = '{"age": 1, "friends.id": 1, "_id": 0, "rows.elements.status": 1}') docdb_query(src, "myKey", query = '{"rows.elements.status": "OK"}', listfields = TRUE) ## End(Not run)
Documents are updated by patching their JSON with
value
. Documents are identified by a query
or by _id
's in value
, where the latter takes
precedence. value
can have multiple documents (with
_id
's), which then are iteratively updated.
docdb_update(src, key, value, query, ...)
docdb_update(src, key, value, query, ...)
src |
Source object, result of call to any of functions
|
key |
(character) The name of the container in the
database backend (corresponds to |
value |
The data to be created in the database: a single data.frame, a JSON string, a list, or a file name or URL that points to NDJSON documents |
query |
(character) A JSON query string, see examples.
Can use comparisons / tests ( |
... |
Passed on to functions |
Uses native functions in MongoDB (mongolite::mongo()
$update()),
SQLite (jsonb_update()
), DuckDB (jsonb_merge_patch()
),
Elasticsearch (elastic::docs_bulk_update()
);
a plpgsql
function added when calling src_postgres()
,
and a jqr::jqr()
programme for CouchDB.
(integer) Number of successfully updated documents
## Not run: src <- src_sqlite() docdb_create(src, "mtcars", mtcars) docdb_update(src, "mtcars", value = mtcars[3, 4:5], query = '{"gear": 3}') docdb_update(src, "mtcars", value = '{"carb":999}', query = '{"gear": 5}') docdb_update(src, "mtcars", value = '{"_id":"Fiat 128", "carb":888}', query = '{}') docdb_get(src, "mtcars") ## End(Not run)
## Not run: src <- src_sqlite() docdb_create(src, "mtcars", mtcars) docdb_update(src, "mtcars", value = mtcars[3, 4:5], query = '{"gear": 3}') docdb_update(src, "mtcars", value = '{"carb":999}', query = '{"gear": 5}') docdb_update(src, "mtcars", value = '{"_id":"Fiat 128", "carb":888}', query = '{}') docdb_get(src, "mtcars") ## End(Not run)
Data set 'mapdata'
mapdata
mapdata
A JSON string with ragged, nested travel details
There is a src_*()
function to setup a connection to each
of the database backends. The backends may have specific parameters
in the respective function src_*()
, but all other nodbi
functions
are independent of the backend (e.g., see docdb_query()
).
MongoDB - src_mongo()
SQLite - src_sqlite()
Elasticsearch - src_elastic()
CouchDB - src_couchdb()
PostgreSQL - src_postgres()
DuckDB - src_duckdb()
Documentation details for each database:
MongoDB - https://docs.mongodb.com/
SQLite/JSON1 - https://www.sqlite.org/json1.html
Elasticsearch - https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html
CouchDB - http://docs.couchdb.org/
PostgreSQL - https://www.postgresql.org/docs/current/functions-json.html
Documentation of R packages used by nodbi
for the databases:
mongolite - https://CRAN.R-project.org/package=mongolite
RPostgres - https://CRAN.R-project.org/package=RPostgres
Setup a CouchDB database connection
src_couchdb( host = "127.0.0.1", port = 5984, path = NULL, transport = "http", user = NULL, pwd = NULL, headers = NULL )
src_couchdb( host = "127.0.0.1", port = 5984, path = NULL, transport = "http", user = NULL, pwd = NULL, headers = NULL )
host |
(character) host value, default: 127.0.0.1 |
port |
(integer/numeric) Port. Remember that if you don't want a port set, set this parameter to NULL. Default: 5984 |
path |
(character) context path that is appended to the end of the url, e.g., bar in http://foo.com/bar. Default: NULL, ignored |
transport |
(character) http or https. Default: http |
user |
(character) Username, if any |
pwd |
(character) Password, if any |
headers |
(list) list of named headers |
Uses sofa as backend. nodbi creates or uses
a CouchDB database with JSON documents. If documents do not have
root-level _id
's, UUID's are created as _id
's. Function
docdb_update()
uses jqr::jqr()
to implement patching JSON.
For a benchmark, see https://github.com/ropensci/nodbi#benchmark.
A nodbi
source object
## Not run: con <- src_couchdb() print(con) ## End(Not run)
## Not run: con <- src_couchdb() print(con) ## End(Not run)
Setup a DuckDB database connection
src_duckdb(drv = duckdb::duckdb(), dbdir = attr(drv, "dbdir"), ...)
src_duckdb(drv = duckdb::duckdb(), dbdir = attr(drv, "dbdir"), ...)
drv |
Object returned by |
dbdir |
Location for database files. Should be a path to an existing
directory in the file system. With the default (or |
... |
Additional named parameters passed on to |
Uses duckdb::duckdb()
as backend. nodbi creates or
uses a DuckDB table, with columns _id
and json
created and used
by package nodbi
, applying SQL functions as per
https://duckdb.org/docs/extensions/json to the json
column.
Each row in the table represents a JSON
document.
Any root-level _id
is extracted from the document(s) and used for
column _id
, otherwise a UUID is created as _id
.
The table is indexed on _id
.
For a benchmark, see https://github.com/ropensci/nodbi#benchmark.
A nodbi
source object
## Not run: con <- src_duckdb() print(con) ## End(Not run)
## Not run: con <- src_duckdb() print(con) ## End(Not run)
Setup an Elasticsearch database connection
src_elastic( host = "127.0.0.1", port = 9200, path = NULL, transport_schema = "http", user = NULL, pwd = NULL, force = FALSE, ... )
src_elastic( host = "127.0.0.1", port = 9200, path = NULL, transport_schema = "http", user = NULL, pwd = NULL, force = FALSE, ... )
host |
(character) the base url, defaults to localhost (http://127.0.0.1) |
port |
(character) port to connect to, defaults to 9200 (optional) |
path |
(character) context path that is appended to the end of the
url. Default: |
transport_schema |
(character) http or https. Default: http |
user |
(character) User name, if required for the connection. You can specify, but ignored for now. |
pwd |
(character) Password, if required for the connection. You can specify, but ignored for now. |
force |
(logical) Force re-load of connection details |
... |
Further args passed on to |
Uses elastic as backend. nodbi creates or uses
an Elasticsearch index, in which nodbi
creates JSON documents.
Any root-level _id
is extracted from the document(s) and used as
document ID _id
, otherwise a UUID is created as document ID _id
.
Only lowercase is accepted for container names (in parameter key
).
Opensearch can equally be used.
For a benchmark, see https://github.com/ropensci/nodbi#benchmark
A nodbi
source object
## Not run: con <- src_elastic() print(con) ## End(Not run)
## Not run: con <- src_elastic() print(con) ## End(Not run)
Setup a MongoDB database connection
src_mongo(collection = "test", db = "test", url = "mongodb://localhost", ...)
src_mongo(collection = "test", db = "test", url = "mongodb://localhost", ...)
collection |
(character) Name of collection |
db |
(character) Name of database |
url |
(character) Address of the MongoDB server in Mongo connection
string URI format, see to |
... |
Additional named parameters passed on to |
Uses monoglite as backend. nodbi creates or uses
a MongoDB collection, in which nodbi
creates JSON documents.
If documents do not have root-level _id
's, UUID's are created as _id
's.
MongoDB but none of the other databases require to specify the container
already in the src_mongo()
function.
For a benchmark, see https://github.com/ropensci/nodbi#benchmark
A nodbi
source object
## Not run: con <- src_mongo() print(con) ## End(Not run)
## Not run: con <- src_mongo() print(con) ## End(Not run)
Setup a PostgreSQL database connection
src_postgres(dbname = "test", host = "localhost", port = 5432L, ...)
src_postgres(dbname = "test", host = "localhost", port = 5432L, ...)
dbname |
(character) name of database, has to exist to open a connection |
host |
(character) host of the database,
see |
port |
(integer) port of the database,
see |
... |
additional named parameters passed
on to |
Uses RPostgres as backend. nodbi creates or uses
a PostgreSQL table, with columns _id
and json
created and used
by package nodbi
, applying SQL functions as per
https://www.postgresql.org/docs/current/functions-json.html
to the json
column.
Each row in the table represents a JSON
document.
Any root-level _id
is extracted from the document(s) and used
for column _id
, otherwise a UUID is created as _id
.
The table is indexed on _id
. A custom plpgsql
function
jsonb_merge_patch()
is used for docdb_update()
.
The order of variables in data frames returned by docdb_get()
and docdb_query()
can differ from their order the input to
docdb_create()
.
For a benchmark, see https://github.com/ropensci/nodbi#benchmark
A nodbi
source object
## Not run: con <- src_postgres() print(con) ## End(Not run)
## Not run: con <- src_postgres() print(con) ## End(Not run)
Setup a RSQLite database connection
src_sqlite(dbname = ":memory:", ...)
src_sqlite(dbname = ":memory:", ...)
dbname |
(character) name of database file,
defaults to ":memory:" for an in-memory database,
see |
... |
additional named parameters passed
on to |
Uses RSQLite as backend. nodbi creates or uses
an SQLite table, with columns _id
and json
created and used by
package nodbi
, applying SQL functions as per
https://www.sqlite.org/json1.html to the json
column.
Each row in the table represents a JSON
document.
Any root-level _id
is extracted from the document(s) and used for
column _id
, otherwise a UUID is created as _id
.
The table is indexed on _id
.
For a benchmark, see https://github.com/ropensci/nodbi#benchmark
A nodbi
source object
## Not run: con <- src_sqlite() print(con) ## End(Not run)
## Not run: con <- src_sqlite() print(con) ## End(Not run)