Title: | Chemical Information from the Web |
---|---|
Description: | Chemical information from around the web. This package interacts with a suite of web services for chemical information. Sources include: Alan Wood's Compendium of Pesticide Common Names, Chemical Identifier Resolver, ChEBI, Chemical Translation Service, ChemSpider, ETOX, Flavornet, NIST Chemistry WebBook, OPSIN, PubChem, SRS, Wikidata. |
Authors: | Eduard Szöcs [aut], Robert Allaway [ctb], Daniel Muench [ctb], Johannes Ranke [ctb], Andreas Scharmüller [ctb], Eric R Scott [ctb], Jan Stanstrup [ctb], João Vitor F Cavalcante [ctb], Gordon Getzinger [ctb], Ethan Bass [ctb], Tamás Stirling [ctb, cre] |
Maintainer: | Tamás Stirling <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.3.0 |
Built: | 2024-11-01 05:10:53 UTC |
Source: | https://github.com/ropensci/webchem |
This function attempts to format numeric (or character) vectors as character vectors of CAS numbers. If they cannot be converted to CAS format or don't pass is.cas, NA is returned.
as.cas(x, verbose = getOption("verbose"))
x |
numeric vector, or character vector of CAS numbers missing the hyphens |
verbose |
logical; should a verbose output be printed on the console? |
character vector of valid CAS numbers
x = c(58082, 123456, "hexenol")
as.cas(x)
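The formatting and validation step described above can be sketched in base R. This is only an illustration, not the package's implementation: `format_cas_sketch` is a hypothetical name, and the sketch assumes the usual CAS layout (two to seven digits, two digits, one check digit) with the check digit computed as the weighted digit sum modulo 10.

```r
# Hypothetical helper, not part of webchem: format digit strings as CAS numbers
format_cas_sketch <- function(x) {
  x <- as.character(x)
  vapply(x, function(s) {
    # Only plain digit strings of 5 to 10 digits can be CAS numbers
    if (is.na(s) || !grepl("^[0-9]{5,10}$", s)) return(NA_character_)
    digits <- as.integer(strsplit(s, "")[[1]])
    n <- length(digits)
    body <- digits[-n]
    # Weight the digits 1, 2, 3, ... from right to left, excluding the check digit
    checksum <- sum(rev(body) * seq_along(body)) %% 10
    if (checksum != digits[n]) return(NA_character_)
    # Insert the hyphens: <prefix>-<two digits>-<check digit>
    paste(substr(s, 1, n - 3), substr(s, n - 2, n - 1), substr(s, n, n), sep = "-")
  }, character(1), USE.NAMES = FALSE)
}

format_cas_sketch(c(58082, 123456, "hexenol"))
# "58-08-2" NA NA
```

Here 58082 passes the check (8*1 + 0*2 + 8*3 + 5*4 = 52, last digit 2), while 123456 fails it and "hexenol" is not a digit string.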
Query the BCPC Compendium of Pesticide Common Names (https://pesticidecompendium.bcpc.org), formerly known as Alan Wood's Compendium of Pesticide Common Names.
bcpc_query( query, from = c("name", "cas"), verbose = getOption("verbose"), type, ... )
query |
character; search string |
from |
character; type of input ('cas' or 'name') |
verbose |
logical; print message during processing to console? |
type |
deprecated |
... |
additional arguments to internal utility functions |
A list of eleven entries: common-name, status, preferred IUPAC Name, IUPAC Name, cas, formula, activity, subactivity, inchikey, inchi and source url.
For from = 'cas', only the first matched link is returned. Please respect the Copyright, Terms and Conditions: https://pesticidecompendium.bcpc.org/legal.html
Eduard Szöcs, Tamás Stirling, Eric R. Scott, Andreas Scharmüller, Ralf B. Schäfer (2020). webchem: An R Package to Retrieve Chemical Information from the Web. Journal of Statistical Software, 93(13). doi:10.18637/jss.v093.i13.
## Not run:
bcpc_query('Fluazinam', from = 'name')
out <- bcpc_query(c('Fluazinam', 'Diclofop'), from = 'name')
out
# extract subactivity from object
sapply(out, function(y) y$subactivity[1])
# use CAS-numbers
bcpc_query("79622-59-6", from = 'cas')
## End(Not run)
Returns a list of complete ChEBI entities. ChEBI data are parsed as data.frames ("properties", "chebiid_snd", "synonyms", "iupacnames", "formulae", "regnumbers", "citations", "dblinks", "parents", "children", "comments", "origins") or as a list ("chem_structure") within that list. The SOAP protocol is used (https://www.ebi.ac.uk/chebi/webServices.do).
chebi_comp_entity(chebiid, verbose = getOption("verbose"), ...)
chebiid |
character; search term (i.e. chebiid). |
verbose |
logical; should a verbose output be printed on the console? |
... |
optional arguments |
returns a list of data.frames or lists containing a complete ChEBI entity
Hastings J, Owen G, Dekker A, Ennis M, Kale N, Muthukrishnan V, Turner S, Swainston N, Mendes P, Steinbeck C. (2016). ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Res.
Hastings, J., de Matos, P., Dekker, A., Ennis, M., Harsha, B., Kale, N., Muthukrishnan, V., Owen, G., Turner, S., Williams, M., and Steinbeck, C. (2013) The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013. Nucleic Acids Res.
de Matos, P., Alcantara, R., Dekker, A., Ennis, M., Hastings, J., Haug, K., Spiteri, I., Turner, S., and Steinbeck, C. (2010) Chemical entities of biological interest: an update. Nucleic Acids Res.
Degtyarenko, K., Hastings, J., de Matos, P., and Ennis, M. (2009). ChEBI: an open bioinformatics and cheminformatics resource. Current protocols in bioinformatics / editorial board, Andreas D. Baxevanis et al., Chapter 14.
Degtyarenko, K., de Matos, P., Ennis, M., Hastings, J., Zbinden, M., McNaught, A., Alcántara, R., Darsow, M., Guedj, M. and Ashburner, M. (2008) ChEBI: a database and ontology for chemical entities of biological interest. Nucleic Acids Res. 36, D344–D350.
Eduard Szöcs, Tamás Stirling, Eric R. Scott, Andreas Scharmüller, Ralf B. Schäfer (2020). webchem: An R Package to Retrieve Chemical Information from the Web. Journal of Statistical Software, 93(13). doi:10.18637/jss.v093.i13.
## Not run:
# might fail if API is not available
chebi_comp_entity('CHEBI:27744')
# multiple inputs
comp <- c('CHEBI:27744', 'CHEBI:27744')
chebi_comp_entity(comp)
## End(Not run)
Retrieve all available classes within the Anatomical Therapeutic Chemical (ATC) classification system.
chembl_atc_classes(verbose = getOption("verbose"), test_service_down = FALSE)
verbose |
logical; should a verbose output be printed on the console? |
test_service_down |
logical; this argument is only used for testing. |
Gaulton, A., Bellis, L. J., Bento, A. P., Chambers, J., Davies, M., Hersey, A., ... & Overington, J. P. (2012). ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic acids research, 40(D1), D1100-D1107.
## Not run:
# Might fail if API is not available
atc <- chembl_atc_classes()
## End(Not run)
Retrieve ChEMBL data using a vector of ChEMBL IDs.
chembl_query( query, resource = "molecule", cache_file = NULL, verbose = getOption("verbose"), test_service_down = FALSE )
query |
character; a vector of ChEMBL IDs. |
resource |
character; the ChEMBL resource to query. Use chembl_resources() to see all available resources. |
cache_file |
character; the name of the cache file without the file
extension. If |
verbose |
logical; should a verbose output be printed on the console? |
test_service_down |
logical; this argument is only used for testing. |
Each entry in ChEMBL has a unique ID. Data in ChEMBL is organized in databases called resources. An entry may or may not have a record in a particular resource. An entry may have a record in more than one resource, e.g. a compound may be present in both the "molecule" and the "drug" resource. This function queries a vector of ChEMBL IDs from a specific ChEMBL resource.
If you are unsure which ChEMBL resource contains your ChEMBL ID, use this function with the "chembl_id_lookup" resource to find the appropriate resource for a ChEMBL ID. Note that "chembl_id_lookup" is not a separate function but a resource used by chembl_query.
If cache_file is not NULL, the function creates a cache directory in the working directory and a cache file in the cache directory. This file is used in subsequent calls of the function. The function first tries to retrieve query results from the cache file and only accesses the webservice if the ChEMBL ID cannot be found in the cache file. The cache file is extended as new ChEMBL IDs are queried during the session.
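The caching behaviour described above follows a read-through pattern, which can be sketched in base R. This is an illustration under stated assumptions, not the package's code: `cached_lookup_sketch` and `fake_fetch` are hypothetical names, the RDS file format is an assumption, and the fetch function is a stand-in that never contacts the web service.

```r
# Hypothetical read-through cache; not webchem's implementation
cached_lookup_sketch <- function(ids, fetch, cache_path) {
  # Load the existing cache if there is one, otherwise start empty
  cache <- if (file.exists(cache_path)) readRDS(cache_path) else list()
  for (id in ids) {
    # Only call the (slow) fetch function for IDs missing from the cache
    if (is.null(cache[[id]])) {
      cache[[id]] <- fetch(id)
    }
  }
  # Persist the extended cache for subsequent calls
  saveRDS(cache, cache_path)
  cache[ids]
}

# Stand-in for a web service call; counts how often it is used
calls <- 0
fake_fetch <- function(id) {
  calls <<- calls + 1
  list(id = id)
}

path <- file.path(tempdir(), "chembl_sketch_cache.rds")
unlink(path)
first <- cached_lookup_sketch(c("CHEMBL25", "CHEMBL1082"), fake_fetch, path)
second <- cached_lookup_sketch(c("CHEMBL25", "CHEMBL771355"), fake_fetch, path)
calls
# 3
```

The second call fetches only "CHEMBL771355", because "CHEMBL25" is already in the cache file.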
The function returns a list of lists, where each element of the list contains a list of respective query results. Results are simplified, if possible.
Links to the webservice documentation:
Gaulton, A., Bellis, L. J., Bento, A. P., Chambers, J., Davies, M., Hersey, A., ... & Overington, J. P. (2012). ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic acids research, 40(D1), D1100-D1107.
## Not run:
# Might fail if API is not available
# Search molecules
chembl_query("CHEMBL1082", resource = "molecule")
chembl_query(c("CHEMBL25", "CHEMBL1082"), resource = "molecule")
# Look up ChEMBL IDs in ChEMBL "resources", returns one resource per query.
chembl_query("CHEMBL771355", "chembl_id_lookup")
# Search assays
chembl_query("CHEMBL771355", resource = "assay")
## End(Not run)
Data in ChEMBL is organized in databases called resources. This function lists available ChEMBL resources.
chembl_resources()
The list was compiled manually using the following url: https://chembl.gitbook.io/chembl-interface-documentation/web-services/chembl-data-web-services
Gaulton, A., Bellis, L. J., Bento, A. P., Chambers, J., Davies, M., Hersey, A., ... & Overington, J. P. (2012). ChEMBL: a large-scale bioactivity database for drug discovery. Nucleic acids research, 40(D1), D1100-D1107.
An interface to the Chemical Identifier Resolver (CIR) (https://cactus.nci.nih.gov/chemical/structure_documentation).
cir_img( query, dir, format = c("png", "gif"), width = 500, height = 500, linewidth = 2, symbolfontsize = 16, bgcolor = NULL, antialiasing = TRUE, atomcolor = NULL, bondcolor = NULL, csymbol = c("special", "all"), hsymbol = c("special", "all"), hcolor = NULL, header = NULL, footer = NULL, frame = NULL, verbose = getOption("verbose"), ... )
query |
character; Search term. Can be any common chemical identifier (e.g. CAS, INCHI(KEY), SMILES etc.) |
dir |
character; Directory to save the image. |
format |
character; Output format of the image. Can be one of "png", "gif". |
width |
integer; Width of the image. |
height |
integer; Height of the image. |
linewidth |
integer; Width of lines. |
symbolfontsize |
integer; Fontsize of atoms in the image. |
bgcolor |
character; E.g. transparent, white, %23AADDEE |
antialiasing |
logical; Should antialiasing be used? |
atomcolor |
character; Color of the atoms in the image. |
bondcolor |
character; Color of the atom bond lines. |
csymbol |
character; Can be one of "special" (default - i.e. only hydrogen atoms in functional groups or defining stereochemistry) or "all". |
hsymbol |
character; Can be one of "special" (default - i.e. none are shown) or "all" (all are printed). |
hcolor |
character; Color of the hydrogen atoms. |
header |
character; Should a header text be added to the image? Can be any string. |
footer |
character; Should a footer text be added to the image? Can be any string. |
frame |
integer; Should a frame be plotted? Can be one of NULL (default) or 1. |
verbose |
logical; Should a verbose output be printed on the console? |
... |
currently not used. |
CIR can resolve the following identifiers: Chemical Names, IUPAC names, CAS Numbers, SMILES strings, IUPAC InChI/InChIKeys, NCI/CADD Identifiers, CACTVS HASHISY, NSC number, PubChem SID, ZINC Code, ChemSpider ID, ChemNavigator SID, eMolecule VID.
For an image with transparent background use ‘transparent’ as color name and switch off antialiasing (i.e. antialiasing = 0).
image written to disk
You can only make 1 request per second (this is a hard-coded feature).
cir relies on the great CIR web service created by the CADD Group at NCI/NIH!
https://cactus.nci.nih.gov/chemical/structure_documentation
https://cactus.nci.nih.gov/blog/?cat=10
https://cactus.nci.nih.gov/blog/?p=1386
https://cactus.nci.nih.gov/blog/?p=1456
## Not run:
# might fail if API is not available
cir_img("CCO", dir = tempdir()) # SMILES
# multiple query strings and different formats
query = c("Glyphosate", "Isoproturon", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N")
cir_img(query, dir = tempdir(), bgcolor = "transparent", antialiasing = 0)
# all parameters
query = "Triclosan"
cir_img(query, dir = tempdir(), format = "png", width = 600, height = 600,
        linewidth = 5, symbolfontsize = 30, bgcolor = "red",
        antialiasing = FALSE, atomcolor = "green", bondcolor = "yellow",
        csymbol = "all", hsymbol = "all", hcolor = "purple",
        header = "My funky chemical structure..",
        footer = "..is just so awesome!", frame = 1,
        verbose = getOption("verbose"))
## End(Not run)
An interface to the Chemical Identifier Resolver (CIR) (https://cactus.nci.nih.gov/chemical/structure_documentation).
cir_query( identifier, representation = "smiles", resolver = NULL, match = c("all", "first", "ask", "na"), verbose = getOption("verbose"), choices = NULL, ... )
identifier |
character; chemical identifier. |
representation |
character; what representation of the identifier should be returned. See details for possible representations. |
resolver |
character; what resolver should be used? If NULL (default) the identifier type is detected and the different resolvers are used in turn. See details for possible resolvers. |
match |
character; How should multiple hits be handled? |
verbose |
logical; should a verbose output be printed on the console? |
choices |
deprecated. Use the |
... |
currently not used. |
CIR can resolve the following identifiers: Chemical Names, IUPAC names, CAS Numbers, SMILES strings, IUPAC InChI/InChIKeys, NCI/CADD Identifiers, CACTVS HASHISY, NSC number, PubChem SID, ZINC Code, ChemSpider ID, ChemNavigator SID, eMolecule VID.
cir_query() can handle only a part of all possible conversions of CIR. Possible representations are:
'smiles' (SMILES strings),
'names' (Names),
'cas' (CAS numbers),
'stdinchikey' (Standard InChIKey),
'stdinchi' (Standard InChI),
'ficts' (FICTS Identifier),
'ficus' (FICuS Identifier),
'uuuuu' (uuuuu Identifier),
'mw' (Molecular weight),
'monoisotopic_mass' (Monoisotopic Mass),
'formula' (Chemical Formula),
'chemspider_id' (ChemSpider ID),
'pubchem_sid' (PubChem SID),
'chemnavigator_sid' (ChemNavigator SID),
'h_bond_donor_count' (Number of Hydrogen Bond Donors),
'h_bond_acceptor_count' (Number of Hydrogen Bond Acceptors),
'h_bond_center_count' (Number of Hydrogen Bond Centers),
'rule_of_5_violation_count' (Number of Rule of 5 Violations),
'rotor_count' (Number of Freely Rotatable Bonds),
'effective_rotor_count' (Number of Effectively Rotatable Bonds),
'ring_count' (Number of Rings),
'ringsys_count' (Number of Ring Systems),
'xlogp2' (octanol-water partition coefficient),
'aromatic' (is the compound aromatic),
'macrocyclic' (is the compound macrocyclic),
'heteroatom_count' (heteroatom count),
'hydrogen_atom_count' (H atom count),
'heavy_atom_count' (Heavy atom count),
'deprotonable_group_count' (Number of deprotonable groups),
'protonable_group_count' (Number of protonable groups).
CIR first tries to determine the identifier type submitted and then uses 'resolvers' to look up the data. If no resolver is supplied, CIR tries different resolvers in turn until a hit is found. E.g. for names, CIR first tries to look up in OPSIN and, if this fails, in the local name index of CIR. However, you can also specify which resolvers to use (e.g. if you know your identifier type). Possible resolvers are:
'name_by_cir' (Lookup in name index of CIR),
'name_by_opsin' (Lookup in OPSIN),
'name_by_chemspider' (Lookup in ChemSpider, https://cactus.nci.nih.gov/blog/?p=1386),
'smiles' (Lookup SMILES),
'stdinchikey', 'stdinchi' (InChI),
'cas_number' (CAS Number),
'name_pattern' (Google-like pattern search, https://cactus.nci.nih.gov/blog/?p=1456).
Note that the pattern search can be combined with other resolvers, e.g. resolver = 'name_by_chemspider,name_pattern'.
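To make the roles of identifier, representation and resolver concrete, the following base R sketch builds a CIR request URL. It assumes the URL scheme given in the CIR documentation (structure/identifier/representation, with an optional resolver query parameter); `cir_url_sketch` is a hypothetical helper, and this is not necessarily how cir_query() builds its requests.

```r
# Hypothetical helper, not part of webchem: build a CIR request URL
cir_url_sketch <- function(identifier, representation = "smiles", resolver = NULL) {
  base <- "https://cactus.nci.nih.gov/chemical/structure"
  # Percent-encode the identifier so spaces and slashes survive in the path
  id <- utils::URLencode(identifier, reserved = TRUE)
  url <- paste(base, id, representation, sep = "/")
  if (!is.null(resolver)) {
    # Several resolvers can be given as one comma-separated string
    url <- paste0(url, "?resolver=", resolver)
  }
  url
}

cir_url_sketch("Triclosan", "cas")
cir_url_sketch("3380-34-5", "cas", resolver = "cas_number")
```

The sketch only constructs strings and sends no requests. Any real client should still respect the limit of one request per second.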
A tibble with a 'query' column and a column for the requested representation.
You can only make 1 request per second (this is a hard-coded feature).
cir relies on the great CIR web service created by the CADD Group at NCI/NIH!
https://cactus.nci.nih.gov/chemical/structure_documentation
https://cactus.nci.nih.gov/blog/?cat=10
https://cactus.nci.nih.gov/blog/?p=1386
https://cactus.nci.nih.gov/blog/?p=1456
## Not run:
# might fail if API is not available
cir_query("Triclosan", "cas")
cir_query("3380-34-5", "cas", match = "first")
cir_query("3380-34-5", "cas", resolver = "cas_number")
cir_query("3380-34-5", "smiles")
cir_query("Triclosan", "mw")
# multiple inputs
comp <- c("Triclosan", "Aspirin")
cir_query(comp, "cas", match = "first")
## End(Not run)
Look for and retrieve ChemSpider API key stored in .Renviron or .Rprofile.
cs_check_key()
To use any of the functions in webchem that access the ChemSpider database, you'll need to obtain an API key. Register at https://developer.rsc.org/ for an API key. Please respect the Terms & Conditions: https://developer.rsc.org/terms.
You can store your API key as CHEMSPIDER_KEY = <your key> in .Renviron or as options(chemspider_key = <your key>) in .Rprofile. This will allow you to use ChemSpider without adding your API key at the beginning of each session, and will also allow you to share your analysis without sharing your API key. Keeping your API key hidden is good practice.
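The two storage locations named above can be checked with base R alone. The sketch below is a hypothetical illustration, not the code of cs_check_key(): the function name `find_key_sketch` and the order in which the two locations are checked are assumptions.

```r
# Hypothetical sketch, not webchem's code: look for a key in the two documented places
find_key_sketch <- function() {
  # Environment variable, e.g. set in .Renviron
  key <- Sys.getenv("CHEMSPIDER_KEY", unset = "")
  if (nzchar(key)) return(key)
  # R option, e.g. set in .Rprofile
  key <- getOption("chemspider_key")
  if (!is.null(key)) return(key)
  stop("no ChemSpider API key found in CHEMSPIDER_KEY or options(chemspider_key)")
}

# Demonstration with a placeholder value, never a real key
Sys.setenv(CHEMSPIDER_KEY = "placeholder-key")
find_key_sketch()
Sys.unsetenv("CHEMSPIDER_KEY")
```

Because the key lives outside the script, the analysis code can be shared without exposing it.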
an API key
## Not run:
cs_check_key()
## End(Not run)
Submit a ChemSpider ID (CSID) and the fields you are interested in, and retrieve the record details for your query.
cs_compinfo(csid, fields, verbose = getOption("verbose"), apikey = NULL)
csid |
numeric; can be obtained using |
fields |
character; see details. |
verbose |
logical; should a verbose output be printed on the console? |
apikey |
character; your API key. If NULL (default),
|
Valid values for fields are "SMILES", "Formula", "InChI", "InChIKey", "StdInChI", "StdInChIKey", "AverageMass", "MolecularWeight", "MonoisotopicMass", "NominalMass", "CommonName", "ReferenceCount", "DataSourceCount", "PubMedCount", "RSCCount", "Mol2D", "Mol3D". You can specify any number of fields.
Returns a data frame.
An API key is needed. Register at https://developer.rsc.org/ for an API key. Please respect the Terms & Conditions. The Terms & Conditions can be found at https://developer.rsc.org/terms.
https://developer.rsc.org/docs/compounds-v1-trial/1/overview
## Not run:
cs_compinfo(171, c("SMILES", "CommonName"))
cs_compinfo(171:182, "SMILES")
## End(Not run)
For some ChemSpider API requests, you can also specify various control options. This function is used to set these control options.
cs_control( datasources = vector(), order_by = "default", order_direction = "default", include_all = FALSE, complexity = "any", isotopic = "any" )
datasources |
character; specifies the databases to query. Use
|
order_by |
character; specifies the sort order for the results.
Valid values are |
order_direction |
character; specifies the sort order for the results.
Valid values are |
include_all |
logical; see details. |
complexity |
character; see details.
Valid values are |
isotopic |
character; see details.
Valid values are |
The only function that currently uses datasources is get_csid(), and only when you query a CSID from a formula. This parameter is disregarded in all other queries.
Setting include_all to TRUE will consider records which contain all of the filter criteria specified in the request. Setting it to FALSE will consider records which contain any of the filter criteria.
A compound with a complexity of "multiple" has more than one disconnected system in it or a metal atom or ion.
Returns a list of specified control options.
This is a full list of all API control options. However, not all of these options are used in all functions. Each API uses a subset of these controls. The controls that are available for a given function are indicated within the documentation of the function.
https://developer.rsc.org/docs/compounds-v1-trial/1/overview
cs_control()
cs_control(order_direction = "descending")
Submit one or more identifiers (CSID, SMILES, InChI, InChIKey or Mol) and return one or more identifiers in another format (CSID, SMILES, InChI, InChIKey or Mol).
cs_convert(query, from, to, verbose = getOption("verbose"), apikey = NULL)
query |
character; query ID. |
from |
character; type of query ID. |
to |
character; type to convert to. |
verbose |
logical; should a verbose output be printed on the console? |
apikey |
character; your API key. If NULL (default),
|
Not all conversions are supported. Allowed conversions:
CSID <-> InChI
CSID <-> InChIKey
CSID <-> SMILES
CSID -> Mol file
InChI <-> InChIKey
InChI <-> SMILES
InChI -> Mol file
InChIKey <-> Mol file
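The list above can be written as a small lookup so that an unsupported conversion is caught before any request is sent. This base R sketch is hypothetical and not part of the package: the name `is_allowed_sketch` is invented, and the lower-case type labels ("csid", "inchi", "inchikey", "smiles", "mol") are assumed to match the values used in the examples.

```r
# Allowed conversions copied from the list above; "<->" is written out in both directions
allowed_sketch <- c(
  "csid>inchi", "inchi>csid",
  "csid>inchikey", "inchikey>csid",
  "csid>smiles", "smiles>csid",
  "csid>mol",
  "inchi>inchikey", "inchikey>inchi",
  "inchi>smiles", "smiles>inchi",
  "inchi>mol",
  "inchikey>mol", "mol>inchikey"
)

# Hypothetical helper: TRUE if the from/to pair appears in the list
is_allowed_sketch <- function(from, to) {
  paste(tolower(from), tolower(to), sep = ">") %in% allowed_sketch
}

is_allowed_sketch("inchikey", "csid")
# TRUE
is_allowed_sketch("smiles", "inchikey")
# FALSE
```

A conversion that is not listed, such as SMILES to InChIKey, could still be reached in two steps, for example SMILES to InChI and then InChI to InChIKey.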
Returns a vector containing the converted identifier(s).
An API key is needed. Register at https://developer.rsc.org/ for an API key. Please respect the Terms & Conditions. The Terms & Conditions can be found at https://developer.rsc.org/terms.
https://developer.rsc.org/docs/compounds-v1-trial/1/overview
Eduard Szöcs, Tamás Stirling, Eric R. Scott, Andreas Scharmüller, Ralf B. Schäfer (2020). webchem: An R Package to Retrieve Chemical Information from the Web. Journal of Statistical Software, 93(13). doi:10.18637/jss.v093.i13.
## Not run:
cs_convert("BQJCRHHNABKAKU-KBQPJGBKSA-N", from = "inchikey", to = "csid")
cs_convert("BQJCRHHNABKAKU-KBQPJGBKSA-N", from = "inchikey", to = "inchi")
cs_convert("BQJCRHHNABKAKU-KBQPJGBKSA-N", from = "inchikey", to = "mol")
cs_convert(160, from = "csid", to = "smiles")
## End(Not run)
The function returns a vector of available data sources used by ChemSpider. Some ChemSpider functions allow you to restrict which sources are used to lookup the requested query. Restricting the sources makes these queries faster.
cs_datasources(apikey = NULL, verbose = getOption("verbose"))
apikey |
character; your API key. If NULL (default),
|
verbose |
logical; should a verbose output be printed on the console? |
Returns a character vector.
An API key is needed. Register at https://developer.rsc.org/ for an API key. Please respect the Terms & Conditions. The Terms & Conditions can be found at https://developer.rsc.org/terms.
https://developer.rsc.org/docs/compounds-v1-trial/1/overview
## Not run:
cs_datasources()
## End(Not run)
Get extended info from ChemSpider, see https://www.chemspider.com/
cs_extcompinfo(csid, token, verbose = getOption("verbose"), ...)
csid |
character, ChemSpider ID. |
token |
character; security token. |
verbose |
logical; should a verbose output be printed on the console? |
... |
currently not used. |
a data.frame with entries: 'csid', 'mf' (molecular formula), 'smiles', 'inchi' (non-standard), 'inchikey' (non-standard), 'average_mass', 'mw' (Molecular weight), 'monoiso_mass' (MonoisotopicMass), 'nominal_mass', 'alogp', 'xlogp', 'common_name' and 'source_url'
A security token is needed. Please register at RSC https://www.rsc.org/rsc-id/register for a security token. Please respect the Terms & conditions https://www.rsc.org/help-legal/legal/terms-conditions/.
Use cs_compinfo to retrieve the standard InChIKey.
get_csid to retrieve ChemSpider IDs, cs_compinfo for extended compound information.
## Not run:
token <- "<redacted>"
csid <- get_csid("Triclosan")
cs_extcompinfo(csid, token)
csids <- get_csid(c('Aspirin', 'Triclosan'))
cs_compinfo(csids)
## End(Not run)
Retrieve images of substances from ChemSpider and export them in PNG format.
cs_img( csid, dir, overwrite = TRUE, apikey = NULL, verbose = getOption("verbose") )
csid |
numeric; the ChemSpider ID (CSID) of the substance. This will also be the name of the image file. |
dir |
character; the download directory. |
overwrite |
logical; should existing files in the directory with the same name be overwritten? |
apikey |
character; your API key. If NULL (default),
|
verbose |
logical; should a verbose output be printed on the console? |
An API key is needed. Register at https://developer.rsc.org/ for an API key. Please respect the Terms & Conditions. The Terms & Conditions can be found at https://developer.rsc.org/terms.
https://developer.rsc.org/docs/compounds-v1-trial/1/overview
## Not run:
cs_img(c(582, 682), dir = tempdir())
## End(Not run)
Get record details from CTS, see http://cts.fiehnlab.ucdavis.edu/
cts_compinfo( query, from = "inchikey", verbose = getOption("verbose"), inchikey )
query |
character; InChIkey. |
from |
character; currently only accepts "inchikey". |
verbose |
logical; should a verbose output be printed on the console? |
inchikey |
deprecated |
a list of lists (one for each supplied inchikey), each a list of 7: inchikey, inchicode, molweight, exactmass, formula, synonyms and externalIds
Wohlgemuth, G., P. K. Haldiya, E. Willighagen, T. Kind, and O. Fiehn (2010). The Chemical Translation Service – a Web-Based Tool to Improve Standardization of Metabolomic Reports. Bioinformatics 26(20): 2647–2648.
## Not run:
# might fail if API is not available
out <- cts_compinfo("XEFQLINVKFYRCS-UHFFFAOYSA-N") # = Triclosan
str(out)
out[[1]][1:5]
### multiple inputs
inchikeys <- c("XEFQLINVKFYRCS-UHFFFAOYSA-N", "BSYNRYMUTXBXSQ-UHFFFAOYSA-N")
out2 <- cts_compinfo(inchikeys)
str(out2) # a list of two
# extract molecular weight
sapply(out2, function(y) y$molweight)
## End(Not run)
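The list-of-lists return value of cts_compinfo() can be explored offline with a mock object of the same shape. In the sketch below, `mock_out` and all of its values are invented placeholders rather than real CTS data, and only three of the seven entries are included. It uses vapply() in place of sapply() so that the extracted type is checked.

```r
# Mock result shaped like the documented return value; all values are invented
mock_out <- list(
  "KEY-A" = list(inchikey = "KEY-A", molweight = 100.5, synonyms = c("a1", "a2")),
  "KEY-B" = list(inchikey = "KEY-B", molweight = 200.25, synonyms = character(0))
)

# Extract one numeric field per compound; vapply fails loudly on a wrong type
mw <- vapply(mock_out, function(y) y$molweight, numeric(1))
mw

# Count synonyms per compound
n_syn <- vapply(mock_out, function(y) length(y$synonyms), integer(1))
n_syn
```

The same pattern should carry over to the real result, provided every element actually contains the requested entry.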
Convert IDs using the Chemical Translation Service (CTS), see http://cts.fiehnlab.ucdavis.edu/
cts_convert( query, from, to, match = c("all", "first", "ask", "na"), verbose = getOption("verbose"), choices = NULL, ... )
query |
character; query ID. |
from |
character; type of query ID, e.g. |
to |
character; type to convert to. |
match |
character; How should multiple hits be handled? |
verbose |
logical; should a verbose output be printed on the console? |
choices |
deprecated. Use the |
... |
currently not used. |
See also http://cts.fiehnlab.ucdavis.edu/ for possible values of from and to.
a list of character vectors or, if choices is used, a single named vector.
Wohlgemuth, G., P. K. Haldiya, E. Willighagen, T. Kind, and O. Fiehn (2010). The Chemical Translation Service – a Web-Based Tool to Improve Standardization of Metabolomic Reports. Bioinformatics 26(20): 2647–2648.
cts_from for possible values in the 'from' argument and cts_to for possible values in the 'to' argument.
## Not run:
# might fail if API is not available
cts_convert("XEFQLINVKFYRCS-UHFFFAOYSA-N", "inchikey", "Chemical Name")
### multiple inputs
keys <- c("XEFQLINVKFYRCS-UHFFFAOYSA-N", "VLKZOEOYAKHREP-UHFFFAOYSA-N")
cts_convert(keys, "inchikey", "cas")
## End(Not run)
Return a list of all possible ids that can be used in the 'from' argument
cts_from(verbose = getOption("verbose"))
verbose |
logical; should a verbose output be printed on the console? |
See also http://cts.fiehnlab.ucdavis.edu/services
a character vector.
Wohlgemuth, G., P. K. Haldiya, E. Willighagen, T. Kind, and O. Fiehn (2010). The Chemical Translation Service – a Web-Based Tool to Improve Standardization of Metabolomic Reports. Bioinformatics 26(20): 2647–2648.
## Not run:
cts_from()
## End(Not run)
Return a list of all possible ids that can be used in the 'to' argument
cts_to(verbose = getOption("verbose"))
verbose |
logical; should a verbose output be printed on the console? |
See also http://cts.fiehnlab.ucdavis.edu/services
a character vector.
Wohlgemuth, G., P. K. Haldiya, E. Willighagen, T. Kind, and O. Fiehn (2010). The Chemical Translation Service – a Web-Based Tool to Improve Standardization of Metabolomic Reports. Bioinformatics 26(20): 2647–2648.
## Not run:
cts_to()
## End(Not run)
Query ETOX: Information System Ecotoxicology and Environmental Quality Targets https://webetox.uba.de/webETOX/index.do for basic information
etox_basic(id, verbose = getOption("verbose"))
id |
character; ETOX ID |
verbose |
logical; print message during processing to console? |
a list of lists, each with five entries: cas (the CAS numbers), ec (the EC number), gsbl (the GSBL number), a data.frame of synonyms and the source url.
Before using this function, please read the disclaimer https://webetox.uba.de/webETOX/disclaimer.do.
Eduard Szöcs, Tamás Stirling, Eric R. Scott, Andreas Scharmüller, Ralf B. Schäfer (2020). webchem: An R Package to Retrieve Chemical Information from the Web. Journal of Statistical Software, 93(13). doi:10.18637/jss.v093.i13.
get_etoxid to retrieve ETOX IDs, etox_basic for basic information, etox_targets for quality targets and etox_tests for test results.
## Not run:
id <- get_etoxid('Triclosan', match = 'best')
etox_basic(id$etoxid)

# Retrieve data for multiple inputs
ids <- c("20179", "9051")
out <- etox_basic(ids)
out

# extract cas numbers
sapply(out, function(y) y$cas)
## End(Not run)
Query ETOX: Information System Ecotoxicology and Environmental Quality Targets https://webetox.uba.de/webETOX/index.do for quality targets
etox_targets(id, verbose = getOption("verbose"))
id |
character; ETOX ID |
verbose |
logical; print message during processing to console? |
A list of lists of two: res, a data.frame with quality targets from the ETOX database, and source_url.
Before using this function, please read the disclaimer https://webetox.uba.de/webETOX/disclaimer.do.
Eduard Szöcs, Tamás Stirling, Eric R. Scott, Andreas Scharmüller, Ralf B. Schäfer (2020). webchem: An R Package to Retrieve Chemical Information from the Web. Journal of Statistical Software, 93(13). doi:10.18637/jss.v093.i13.
get_etoxid to retrieve ETOX IDs, etox_basic for basic information, etox_targets for quality targets and etox_tests for test results.
## Not run:
id <- get_etoxid('Triclosan', match = 'best')
out <- etox_targets(id$etoxid)
# each element of the returned list holds the data.frame res
out[[1]]$res[ , c('Substance', 'CAS_NO', 'Country_or_Region', 'Designation',
  'Value_Target_LR', 'Unit')]
etox_targets(c("20179", "9051"))
## End(Not run)
Query ETOX: Information System Ecotoxicology and Environmental Quality Targets https://webetox.uba.de/webETOX/index.do for tests
etox_tests(id, verbose = getOption("verbose"))
id |
character; ETOX ID |
verbose |
logical; print message during processing to console? |
A list of lists of two: A data.frame with test results from the ETOX database and the source_url.
Before using this function, please read the disclaimer https://webetox.uba.de/webETOX/disclaimer.do.
get_etoxid to retrieve ETOX IDs, etox_basic for basic information, etox_targets for quality targets and etox_tests for test results.
## Not run:
id <- get_etoxid('Triclosan', match = 'best')
out <- etox_tests(id$etoxid)
out[ , c('Organism', 'Effect', 'Duration', 'Time_Unit', 'Endpoint', 'Value', 'Unit')]
etox_tests(c("20179", "9051"))
## End(Not run)
Extract parts from webchem objects
cas(x, ...) inchikey(x, ...) smiles(x, ...)
x |
object |
... |
currently not used. |
a vector.
Eduard Szöcs, Tamás Stirling, Eric R. Scott, Andreas Scharmüller, Ralf B. Schäfer (2020). webchem: An R Package to Retrieve Chemical Information from the Web. Journal of Statistical Software, 93(13). doi:10.18637/jss.v093.i13.
Checks if entries are found in (most) data sources included in webchem
find_db( query, from, sources = c("etox", "pc", "chebi", "cs", "bcpc", "fn", "srs"), plot = FALSE )
query |
character; the search term |
from |
character; the format or type of query. Commonly accepted values are "name", "cas", "inchi", and "inchikey" |
sources |
character; which data sources to check. Data sources are identified by the prefix associated with webchem functions that query those databases. If not specified, all data sources listed will be checked. |
plot |
logical; plot a graphical representation of results. |
a tibble of logical values where TRUE indicates that a data source contains a record for the query
## Not run:
find_db("hexane", from = "name")
## End(Not run)
Retrieve flavor percepts from http://www.flavornet.org. Flavornet is a database of 738 compounds with odors perceptible to humans detected using gas chromatography olfactometry (GCO).
fn_percept(query, from = "cas", verbose = getOption("verbose"), CAS, ...)
query |
character; CAS number to search by. See |
from |
character; currently only CAS numbers are accepted. |
verbose |
logical; should a verbose output be printed on the console? |
CAS |
deprecated |
... |
currently unused |
A named character vector containing flavor percepts or NA's in the case of CAS numbers that are not found
## Not run:
# might fail if website is not available
fn_percept("123-32-0")

CASs <- c("75-07-0", "64-17-5", "109-66-0", "78-94-4", "78-93-3")
fn_percept(CASs)
## End(Not run)
Returns a data.frame with a ChEBI entity ID (chebiid), a ChEBI entity name (chebiasciiname), a search score (searchscore) and stars (stars) using the SOAP protocol: https://www.ebi.ac.uk/chebi/webServices.do
get_chebiid( query, from = c("all", "chebi id", "chebi name", "definition", "name", "iupac name", "citations", "registry numbers", "manual xrefs", "automatic xrefs", "formula", "mass", "monoisotopic mass", "charge", "inchi", "inchikey", "smiles", "species"), match = c("all", "best", "first", "ask", "na"), max_res = 200, stars = c("all", "two only", "three only"), verbose = getOption("verbose"), ... )
query |
character; search term. |
from |
character; type of input. |
match |
character; How should multiple hits be handled?, |
max_res |
integer; maximum number of results to be retrieved from the web service |
stars |
character; "three only" restricts results to those manually annotated by the ChEBI team. |
verbose |
logical; should a verbose output be printed on the console? |
... |
currently unused |
returns a list of data.frames containing a chebiid, a chebiasciiname, a searchscore and stars if matches were found. If not, data.frame(NA) is returned
Hastings J, Owen G, Dekker A, Ennis M, Kale N, Muthukrishnan V, Turner S, Swainston N, Mendes P, Steinbeck C. (2016). ChEBI in 2016: Improved services and an expanding collection of metabolites. Nucleic Acids Res.
Hastings, J., de Matos, P., Dekker, A., Ennis, M., Harsha, B., Kale, N., Muthukrishnan, V., Owen, G., Turner, S., Williams, M., and Steinbeck, C. (2013) The ChEBI reference database and ontology for biologically relevant chemistry: enhancements for 2013. Nucleic Acids Res.
de Matos, P., Alcantara, R., Dekker, A., Ennis, M., Hastings, J., Haug, K., Spiteri, I., Turner, S., and Steinbeck, C. (2010) Chemical entities of biological interest: an update. Nucleic Acids Res.
Degtyarenko, K., Hastings, J., de Matos, P., and Ennis, M. (2009). ChEBI: an open bioinformatics and cheminformatics resource. Current protocols in bioinformatics / editorial board, Andreas D. Baxevanis et al., Chapter 14.
Degtyarenko, K., de Matos, P., Ennis, M., Hastings, J., Zbinden, M., McNaught, A., Alcántara, R., Darsow, M., Guedj, M. and Ashburner, M. (2008) ChEBI: a database and ontology for chemical entities of biological interest. Nucleic Acids Res. 36, D344–D350.
Eduard Szöcs, Tamás Stirling, Eric R. Scott, Andreas Scharmüller, Ralf B. Schäfer (2020). webchem: An R Package to Retrieve Chemical Information from the Web. Journal of Statistical Software, 93(13). doi:10.18637/jss.v093.i13.
## Not run:
# might fail if API is not available
get_chebiid('Glyphosate')
get_chebiid('BPGDAMSIGCZZLK-UHFFFAOYSA-N')

# multiple inputs
comp <- c('Iron', 'Aspirin', 'BPGDAMSIGCZZLK-UHFFFAOYSA-N')
get_chebiid(comp)
## End(Not run)
Retrieve compound IDs (CIDs) from PubChem.
get_cid( query, from = "name", domain = c("compound", "substance", "assay"), match = c("all", "first", "ask", "na"), verbose = getOption("verbose"), arg = NULL, first = NULL, ... )
query |
character; search term, one or more compounds. |
from |
character; type of input. See details for more information. |
domain |
character; query domain, can be one of |
match |
character; How should multiple hits be handled?, |
verbose |
logical; should a verbose output be printed on the console? |
arg |
character; optional arguments like "name_type=word" to match individual words. |
first |
deprecated. Use 'match' instead. |
... |
currently unused. |
Valid values for the from argument depend on the domain:
compound: "name", "smiles", "inchi", "inchikey", "formula", "sdf", "cas" (an alias for "xref/RN"), <xref>, <structure search>, <fast search>.
substance: "name", "sid", <xref>, "sourceid/<source id>" or "sourceall".
assay: "aid", <assay target>.
<structure search> is assembled as "{substructure | superstructure | similarity | identity}/{smiles | inchi | sdf | cid}", e.g. from = "substructure/smiles".
<xref> is assembled as "xref/{RegistryID | RN | PubMedID | MMDBID | ProteinGI | NucleotideGI | TaxonomyID | MIMID | GeneID | ProbeID | PatentID}", e.g. from = "xref/RN" will query by CAS RN.
<fast search> is either fastformula or it is assembled as "{fastidentity | fastsimilarity_2d | fastsimilarity_3d | fastsubstructure | fastsuperstructure}/{smiles | smarts | inchi | sdf | cid}", e.g. from = "fastidentity/smiles".
<source id> is any valid PubChem Data Source ID. When from = "sourceid/<source id>", the query is the ID of the substance in the depositor's database.
If from = "sourceall" the query is one or more valid PubChem depositor names. Depositor names are not case sensitive.
Depositor names and Data Source IDs can be found at https://pubchem.ncbi.nlm.nih.gov/sources/.
<assay target> is assembled as "target/{gi | proteinname | geneid | genesymbol | accession}", e.g. from = "target/geneid" will query by GeneID.
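The composite values of from described above are plain strings joined with a slash. The sketch below shows one way to assemble and validate them before calling get_cid. The helpers structure_search() and xref() are hypothetical names introduced for illustration only; they are not part of webchem.

```r
# Hypothetical helper (not part of webchem): assemble a <structure search>
# value for the `from` argument from its two components.
structure_search <- function(operation, input) {
  # match.arg() stops with an error if the value is not one of the choices
  operation <- match.arg(
    operation,
    c("substructure", "superstructure", "similarity", "identity")
  )
  input <- match.arg(input, c("smiles", "inchi", "sdf", "cid"))
  paste(operation, input, sep = "/")
}

# Hypothetical helper: assemble an <xref> value such as "xref/RN".
xref <- function(type) {
  paste("xref", type, sep = "/")
}

structure_search("similarity", "cid")
xref("RN")
```

The resulting strings have the same form as the values used in the examples of this help page, e.g. from = "similarity/cid".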
a tibble.
Please respect the Terms and Conditions of the National Library of Medicine, https://www.nlm.nih.gov/databases/download.html, the data usage policies of the National Center for Biotechnology Information, https://www.ncbi.nlm.nih.gov/home/about/policies/, https://pubchem.ncbi.nlm.nih.gov/docs/programmatic-access, and the data usage policies of the individual data sources https://pubchem.ncbi.nlm.nih.gov/sources/.
Wang, Y., J. Xiao, T. O. Suzek, et al. 2009. PubChem: A Public Information System for Analyzing Bioactivities of Small Molecules. Nucleic Acids Research 37: 623–633.
Kim, Sunghwan, Paul A. Thiessen, Evan E. Bolton, et al. 2016. PubChem Substance and Compound Databases. Nucleic Acids Research 44(D1): D1202–D1213.
Kim, S., Thiessen, P. A., Bolton, E. E., & Bryant, S. H. (2015). PUG-SOAP and PUG-REST: web services for programmatic access to chemical information in PubChem. Nucleic acids research, gkv396.
Eduard Szöcs, Tamás Stirling, Eric R. Scott, Andreas Scharmüller, Ralf B. Schäfer (2020). webchem: An R Package to Retrieve Chemical Information from the Web. Journal of Statistical Software, 93(13). doi:10.18637/jss.v093.i13.
## Not run:
# might fail if API is not available
get_cid("Triclosan")
get_cid("Triclosan", arg = "name_type=word")
# from SMILES
get_cid("CCCC", from = "smiles")
# from InChI
get_cid("InChI=1S/CH5N/c1-2/h2H2,1H3", from = "inchi")
# from InChIKey
get_cid("BPGDAMSIGCZZLK-UHFFFAOYSA-N", from = "inchikey")
# from formula
get_cid("C26H52NO6P", from = "formula")
# from CAS RN
get_cid("56-40-6", from = "xref/rn")
# similarity
get_cid(5564, from = "similarity/cid")
get_cid("CCO", from = "similarity/smiles")
# from SID
get_cid("126534046", from = "sid", domain = "substance")
# sourceid
get_cid("VCC957895", from = "sourceid/23706", domain = "substance")
# sourceall
get_cid("Optopharma Ltd", from = "sourceall", domain = "substance")
# from AID (CIDs of substances tested in the assay)
get_cid(170004, from = "aid", domain = "assay")
# from GeneID (CIDs of substances tested on the gene)
get_cid(25086, from = "target/geneid", domain = "assay")
# multiple inputs
get_cid(c("Triclosan", "Aspirin"))
## End(Not run)
Query one or more compounds by name, formula, SMILES, InChI or InChIKey and return a vector of ChemSpider IDs.
get_csid( query, from = c("name", "formula", "inchi", "inchikey", "smiles"), match = c("all", "first", "ask", "na"), verbose = getOption("verbose"), apikey = NULL, ... )
query |
character; search term. |
from |
character; the type of the identifier to convert from. Valid
values are |
match |
character; How should multiple hits be handled? "all" returns all matches, "first" returns only the first match, "ask" enters an interactive mode and the user is asked for input, "na" returns NA if multiple hits are found. |
verbose |
logical; should a verbose output be printed on the console? |
apikey |
character; your API key. If NULL (default),
|
... |
further arguments passed to |
Queries by SMILES, InChI or InChIKey do not use cs_control options. Queries by name use order_by and order_direction. Queries by formula also use datasources. See cs_control() for a full list of valid values for these control options.
formula can be expressed with and without LaTeX syntax.
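To illustrate how the two spellings of a formula relate, the sketch below converts the LaTeX-style form used in the examples ("C_{2}H_{6}O") to the plain form ("C2H6O"). The function strip_latex_subscripts() is a hypothetical helper written for this illustration, not a webchem function, and it only handles numeric subscripts written as _{n}.

```r
# Hypothetical helper (not part of webchem): turn LaTeX-style subscripts
# such as "_{2}" into plain digits.
strip_latex_subscripts <- function(formula) {
  gsub("_\\{([0-9]+)\\}", "\\1", formula, perl = TRUE)
}

strip_latex_subscripts("C_{2}H_{6}O")
strip_latex_subscripts("C2H6O")  # plain formulas pass through unchanged
```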
Returns a tibble.
An API key is needed. Register at https://developer.rsc.org/ for an API key. Please respect the Terms & conditions: https://developer.rsc.org/terms.
https://developer.rsc.org/docs/compounds-v1-trial/1/overview
Eduard Szöcs, Tamás Stirling, Eric R. Scott, Andreas Scharmüller, Ralf B. Schäfer (2020). webchem: An R Package to Retrieve Chemical Information from the Web. Journal of Statistical Software, 93(13). doi:10.18637/jss.v093.i13.
## Not run:
get_csid("triclosan")
get_csid(c("carbamazepine", "naproxene","oxygen"))
get_csid("C2H6O", from = "formula")
get_csid("C_{2}H_{6}O", from = "formula")
get_csid("CC(O)=O", from = "smiles")
get_csid("InChI=1S/C2H4O2/c1-2(3)4/h1H3,(H,3,4)", from = "inchi")
get_csid("QTBSBXVTEAMEQO-UHFFFAOYAR", from = "inchikey")
## End(Not run)
Query ETOX: Information System Ecotoxicology and Environmental Quality Targets https://webetox.uba.de/webETOX/index.do for their substance ID
get_etoxid( query, from = c("name", "cas", "ec", "gsbl", "rtecs"), match = c("all", "best", "first", "ask", "na"), verbose = getOption("verbose") )
query |
character; the search term |
from |
character; Type of input, can be one of "name" (chemical name), "cas" (CAS Number), "ec" (European Community number for regulatory purposes), "gsbl" (Identifier used by https://www.chemikalieninfo.de/) and "rtecs" (Identifier used by the Registry of Toxic Effects of Chemical Substances database). |
match |
character; How should multiple hits be handled? "all" returns all matched IDs, "first" only the first match, "best" the best matching (by name) ID, "ask" is an interactive mode and the user is asked for input, "na" returns |
verbose |
logical; print message during processing to console? |
a tibble with 3 columns: the query, the match, and the etoxID
Before using this function, please read the disclaimer https://webetox.uba.de/webETOX/disclaimer.do.
Eduard Szöcs, Tamás Stirling, Eric R. Scott, Andreas Scharmüller, Ralf B. Schäfer (2020). webchem: An R Package to Retrieve Chemical Information from the Web. Journal of Statistical Software, 93(13). doi:10.18637/jss.v093.i13.
etox_basic for basic information, etox_targets for quality targets and etox_tests for test results.
## Not run:
# might fail if API is not available
get_etoxid("Triclosan")

# multiple inputs
comps <- c("Triclosan", "Glyphosate")
get_etoxid(comps)
get_etoxid(comps, match = "all")
get_etoxid("34123-59-6", from = "cas") # Isoproturon
get_etoxid("133483", from = "gsbl") # 3-Butin-1-ol
get_etoxid("203-157-5", from = "ec") # Paracetamol
## End(Not run)
Search www.wikidata.org for wikidata item identifiers. Note that this search is currently not limited to chemical substances, so be sure to check your results.
get_wdid( query, match = c("best", "first", "all", "ask", "na"), verbose = getOption("verbose"), language = "en" )
query |
character; the search term |
match |
character; How should multiple hits be handled? 'all' returns all matched IDs, 'first' only the first match, 'best' the best matching (by name) ID, 'ask' is an interactive mode and the user is asked for input, 'na' returns NA if multiple hits are found. |
verbose |
logical; print message during processing to console? |
language |
character; the language to search in |
if match = 'all' a list with ids, otherwise a dataframe with 4 columns: id, matched text, string distance to match and the queried string
Only matches in labels are returned.
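One way to picture match = 'best' together with the "string distance to match" column is to rank candidate labels by their edit distance to the query. The sketch below does this with utils::adist (generalized Levenshtein distance) on made-up labels. Whether get_wdid uses this exact distance measure is an assumption, and the candidate labels are invented for illustration.

```r
# Made-up candidate labels, as they might come back from a label search.
candidates <- c("DDT", "DDT (band)", "dichlorodiphenyltrichloroethane")
query <- "DDT"

# Edit distance between the query and each candidate label.
d <- drop(utils::adist(query, candidates))

# The candidate with the smallest distance is treated as the best match.
best <- candidates[which.min(d)]
data.frame(match = candidates, distance = d, query = query)
best
```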
## Not run:
get_wdid('Triclosan', language = 'de')
get_wdid('DDT')
get_wdid('DDT', match = 'all')

# multiple inputs
comps <- c('Triclosan', 'Glyphosate')
get_wdid(comps)
## End(Not run)
This function checks if a string is a valid CAS registry number. A valid CAS number 1) is separated by two hyphens into three parts; 2) has a first part of two to seven digits; 3) has a second part of two digits; 4) has a third part of one digit (the check digit); and 5) has a check digit that matches the checksum. The checksum is found by multiplying the last digit (excluding the check digit) by 1, the second-last digit by 2, the third-last digit by 3, and so on. The sum of these products modulo 10 is the checksum.
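The checksum rule can be worked through in a few lines of R. The sketch below is an illustrative re-implementation of the five criteria, not the code used by is.cas; the function name cas_checksum_ok() is hypothetical. For "64-17-5" the digits before the check digit are 6, 4, 1, 7, so the sum is 7*1 + 1*2 + 4*3 + 6*4 = 45, and 45 modulo 10 gives the check digit 5.

```r
# Hypothetical helper (not part of webchem): check format and checksum of
# a single CAS registry number.
cas_checksum_ok <- function(x) {
  # 2 to 7 digits, hyphen, 2 digits, hyphen, 1 check digit
  if (!grepl("^[0-9]{2,7}-[0-9]{2}-[0-9]$", x, perl = TRUE)) {
    return(FALSE)
  }
  digits <- as.integer(strsplit(gsub("-", "", x, fixed = TRUE), "")[[1]])
  check <- digits[length(digits)]
  # remaining digits, last one first, weighted 1, 2, 3, ...
  body <- rev(digits[-length(digits)])
  sum(body * seq_along(body)) %% 10 == check
}

cas_checksum_ok("64-17-5")  # ethanol
cas_checksum_ok("64-17-6")  # same digits, wrong check digit
cas_checksum_ok("64175")    # hyphens missing
```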
is.cas(x, verbose = getOption("verbose"))
x |
character; input CAS |
verbose |
logical; print messages during processing to console? |
a logical
This function can only handle one CAS string
Eduard Szöcs, Tamás Stirling, Eric R. Scott, Andreas Scharmüller, Ralf B. Schäfer (2020). webchem: An R Package to Retrieve Chemical Information from the Web. Journal of Statistical Software, 93(13). doi:10.18637/jss.v093.i13.
is.cas('64-17-5')
is.cas('64175')
is.cas('4-17-5')
is.cas('64-177-6')
is.cas('64-17-55')
is.cas('64-17-6')
This function checks if a string is a valid InChIKey. An InChIKey must fulfill the following criteria: 1) consist of 27 characters; 2) be all uppercase letters (no numbers); 3) contain two hyphens, at positions 15 and 26; 4) the 24th character (flag character) must be 'S' (standard InChI) or 'N' (non-standard); 5) the 25th character (version character) must be 'A' (currently).
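The five format criteria can be expressed as a single regular expression: 14 uppercase letters, a hyphen at position 15, 8 uppercase letters, the flag character at position 24, the version character at position 25, a hyphen at position 26 and one final uppercase letter. The sketch below is an illustration of the criteria under that reading, not the implementation used by is.inchikey; inchikey_format_ok() is a hypothetical name.

```r
# Hypothetical helper (not part of webchem): format-only InChIKey check.
inchikey_format_ok <- function(x) {
  # perl = TRUE so that [A-Z] means ASCII uppercase regardless of locale
  grepl("^[A-Z]{14}-[A-Z]{8}[SN]A-[A-Z]$", x, perl = TRUE)
}

inchikey_format_ok("BQJCRHHNABKAKU-KBQPJGBKSA-N")  # well formed
inchikey_format_ok("BQJCRHHNABKAKU-KBQPJGBKXA-N")  # invalid flag character
inchikey_format_ok("BQJCRHHNABKAKU-KBQPJGBKSB-N")  # invalid version character
```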
is.inchikey( x, type = c("format", "chemspider"), verbose = getOption("verbose") )
x |
character; input InChIKey |
type |
character; How should the input be checked? Either by format (see above) ('format') or by ChemSpider ('chemspider'). |
verbose |
logical; print messages during processing to console? |
a logical
This function can handle only one inchikey string.
Heller, Stephen R., et al. "InChI, the IUPAC International Chemical Identifier." Journal of Cheminformatics 7.1 (2015): 23.
Eduard Szöcs, Tamás Stirling, Eric R. Scott, Andreas Scharmüller, Ralf B. Schäfer (2020). webchem: An R Package to Retrieve Chemical Information from the Web. Journal of Statistical Software, 93(13). doi:10.18637/jss.v093.i13.
is.inchikey('BQJCRHHNABKAKU-KBQPJGBKSA-N')
is.inchikey('BQJCRHHNABKAKU-KBQPJGBKSA')
is.inchikey('BQJCRHHNABKAKU-KBQPJGBKSA-5')
is.inchikey('BQJCRHHNABKAKU-KBQPJGBKSA-n')
is.inchikey('BQJCRHHNABKAKU/KBQPJGBKSA/N')
is.inchikey('BQJCRHHNABKAKU-KBQPJGBKXA-N')
is.inchikey('BQJCRHHNABKAKU-KBQPJGBKSB-N')
Check if input is a valid inchikey using ChemSpider API
is.inchikey_cs(x, verbose = getOption("verbose"))
x |
character; input string |
verbose |
logical; print messages during processing to console? |
a logical
is.inchikey for a pure-R implementation.
## Not run:
# might fail if API is not available
is.inchikey_cs('BQJCRHHNABKAKU-KBQPJGBKSA-N')
is.inchikey_cs('BQJCRHHNABKAKU-KBQPJGBKSA')
is.inchikey_cs('BQJCRHHNABKAKU-KBQPJGBKSA-5')
is.inchikey_cs('BQJCRHHNABKAKU-KBQPJGBKSA-n')
is.inchikey_cs('BQJCRHHNABKAKU/KBQPJGBKSA/N')
is.inchikey_cs('BQJCRHHNABKAKU-KBQPJGBKXA-N')
is.inchikey_cs('BQJCRHHNABKAKU-KBQPJGBKSB-N')
## End(Not run)
An InChIKey must fulfill the following criteria: 1) consist of 27 characters; 2) be all uppercase letters (no numbers); 3) contain two hyphens, at positions 15 and 26; 4) the 24th character (flag character) must be 'S' (standard InChI) or 'N' (non-standard); 5) the 25th character (version character) must be 'A' (currently).
is.inchikey_format(x, verbose = getOption("verbose"))
x |
character; input string |
verbose |
logical; print messages during processing to console? |
a logical
is.inchikey for a pure-R implementation.
## Not run:
# might fail if API is not available
is.inchikey_format('BQJCRHHNABKAKU-KBQPJGBKSA-N')
is.inchikey_format('BQJCRHHNABKAKU-KBQPJGBKSA')
is.inchikey_format('BQJCRHHNABKAKU-KBQPJGBKSA-5')
is.inchikey_format('BQJCRHHNABKAKU-KBQPJGBKSA-n')
is.inchikey_format('BQJCRHHNABKAKU/KBQPJGBKSA/N')
is.inchikey_format('BQJCRHHNABKAKU-KBQPJGBKXA-N')
is.inchikey_format('BQJCRHHNABKAKU-KBQPJGBKSB-N')
## End(Not run)
This function checks if a string is a valid SMILES by checking if (R)CDK can parse it. If it cannot be parsed by rcdk FALSE is returned, else TRUE.
is.smiles(x, verbose = getOption("verbose"))
x |
character; input SMILES. |
verbose |
logical; print messages during processing to console? |
a logical
This function can handle only one SMILES string.
Egon Willighagen (2015). How to test SMILES strings in Supplementary Information. https://chem-bla-ics.blogspot.nl/2015/10/how-to-test-smiles-strings-in.html
## Not run:
# might fail if rcdk is not working properly
is.smiles('Clc(c(Cl)c(Cl)c1C(=O)O)c(Cl)c1Cl')
is.smiles('Clc(c(Cl)c(Cl)c1C(=O)O)c(Cl)c1ClJ')
## End(Not run)
This dataset comprises environmental monitoring data of organic plant protection products in the year 2013 in the river Jagst, Germany. The data is publicly available and can be retrieved from the LUBW Landesanstalt für Umwelt, Messungen und Naturschutz Baden-Württemberg. It has been preprocessed and comprises measurements of 34 substances on 13 sampling occasions. Substances without detects have been removed. Values are given in ug/L.
jagst
A data frame with 442 rows and 4 variables:
sampling data
substance names
concentration in ug/L
qualifier, indicating values < LOQ
https://udo.lubw.baden-wuerttemberg.de/public/pages/home/index.xhtml
This dataset comprises acute ecotoxicity data of 124 insecticides. The data is publicly available and can be retrieved from the EPA ECOTOX database (https://cfpub.epa.gov/ecotox/). It comprises acute toxicity data (D. magna, 48h, Laboratory) and has been preprocessed (non-insecticides removed, multiple values aggregated, only numeric data kept, etc.).
lc50
A data frame with 124 rows and 2 variables:
CAS registry number
LC50 value
This function scrapes NIST for literature retention indices given a query or vector of queries as input. The query can be a CAS number, IUPAC name, or International Chemical Identifier (inchikey), according to the value of the from argument. Retention indices are stored in tables by type, polarity and temperature program (temp_prog). The function can take multiple arguments for these parameters and will return any retention indices matching the specified criteria in a single table.
If a non-CAS query is provided, the function will try to resolve the query by searching the NIST WebBook for a corresponding CAS number. If from == "name", phonetic spellings of Greek stereo-descriptors (e.g. "alpha", "beta", "gamma") will be automatically converted to the corresponding letters to match the form used by NIST. If a CAS number is found, it will be returned in a tibble with the corresponding information from the NIST retention index database.
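The conversion of phonetic Greek descriptors can be pictured as a simple substitution. The sketch below is a minimal illustration covering only the three descriptors named above; the helper greek_to_letters() is hypothetical, and the substitution rules nist_ri actually applies are not documented here.

```r
# Hypothetical helper (not part of webchem): replace spelled-out Greek
# descriptors with the corresponding Greek letters.
greek_to_letters <- function(x) {
  map <- c(alpha = "\u03b1", beta = "\u03b2", gamma = "\u03b3")
  for (word in names(map)) {
    x <- gsub(word, map[[word]], x, fixed = TRUE)
  }
  x
}

greek_to_letters("alpha-pinene")
greek_to_letters("gamma-terpinene")
```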
nist_ri( query, from = c("cas", "inchi", "inchikey", "name"), type = c("kovats", "linear", "alkane", "lee"), polarity = c("polar", "non-polar"), temp_prog = c("isothermal", "ramp", "custom"), cas = NULL, verbose = getOption("verbose") )
query |
character; the search term |
from |
character; type of search term. can be one of |
type |
Retention index type: |
polarity |
Column polarity: |
temp_prog |
Temperature program: |
cas |
deprecated. Use |
verbose |
logical; should a verbose output be printed on the console? |
The types of retention indices included in NIST include Kovats ("kovats"), Van den Dool and Kratz ("linear"), normal alkane ("alkane"), and Lee ("lee"). Details about how these are calculated are available on the NIST website: https://webbook.nist.gov/chemistry/gc-ri/
returns a tibble of literature RIs with the following columns:
query is the query provided to the NIST server
cas is the CAS number or unique record identifier used by NIST
RI is the retention index
type is the type of RI (e.g. "kovats", "linear", "alkane", or "lee")
polarity is the polarity of the column (either "polar" or "non-polar")
temp_prog is the type of temperature program (e.g. "isothermal", "ramp", or "custom")
column is the column type, e.g. "capillary"
phase is the stationary phase (column phase)
length is the column length in meters
gas is the carrier gas used
substrate
diameter is the column diameter in mm
thickness is the phase thickness in µm
program: various columns depending on the value of temp_prog
reference is where this retention index was published
comment: I believe this denotes the database these data were aggregated from
Copyright for NIST Standard Reference Data is governed by the Standard Reference Data Act, https://www.nist.gov/srd/public-law.
NIST Mass Spectrometry Data Center, William E. Wallace, director, "Retention Indices" in NIST Chemistry WebBook, NIST Standard Reference Database Number 69, Eds. P.J. Linstrom and W.G. Mallard, National Institute of Standards and Technology, Gaithersburg MD, 20899, doi:10.18434/T4D303.
## Not run:
myRIs <- nist_ri(
  c("78-70-6", "13474-59-4"),
  from = "cas",
  type = c("linear", "kovats"),
  polarity = "non-polar",
  temp_prog = "ramp"
)
myRIs
## End(Not run)
Query the OPSIN (Open Parser for Systematic IUPAC nomenclature) web service https://opsin.ch.cam.ac.uk/instructions.html.
opsin_query(query, verbose = getOption("verbose"), ...)
query |
character; chemical name that should be queried. |
verbose |
logical; should a verbose output be printed on the console? |
... |
currently not used. |
a tibble with seven columns: "query", "inchi", "stdinchi", "stdinchikey", "smiles", "message", and "status"
Lowe, D. M., Corbett, P. T., Murray-Rust, P., & Glen, R. C. (2011). Chemical Name to Structure: OPSIN, an Open Source Solution. Journal of Chemical Information and Modeling, 51(3), 739–753. doi:10.1021/ci100384d
## Not run:
opsin_query('Cyclopropane')
opsin_query(c('Cyclopropane', 'Octane'))
opsin_query(c('Cyclopropane', 'Octane', 'xxxxx'))
## End(Not run)
Parse a Molfile (as returned by ChemSpider) into an R object.
parse_mol(string)
string |
molfile as one string |
A list of four entries: header (eh), counts line (cl), atom block (ab) and bond block (bb).
counts line: a = number of atoms, b = number of bonds, l = number of atom lists, f = obsolete, c = chiral flag (0 = not chiral, 1 = chiral), s = number of stext entries, x, r, p, i = obsolete, m = 999, v = version
atom block: x, y, z = atom coordinates, a = mass difference, c = charge, s = stereo parity, h = hydrogen count + 1, b = stereo care box, v = valence, h = H0 designator, r, i = not used, m = atom-atom mapping number, n = inversion/retention flag, e = exact change flag
bond block: 1 = first atom, 2 = second atom, t = bond type, s = stereo type, x = not used, r = bond topology, c = reacting center status.
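The fields a, b, l, f, c, s, x, r, p, i, m and v listed above sit on a single fixed-width line of a V2000 Molfile: eleven 3-character fields followed by a 6-character version string. The sketch below shows how such a line could be split in base R. It is an illustration based on the general Molfile layout, not webchem's own implementation, and `parse_counts_line` is a hypothetical helper name.

```r
# Hypothetical helper: split a V2000 counts line into its fixed-width fields.
parse_counts_line <- function(cl) {
  starts <- seq(1, 31, by = 3)                 # eleven 3-character fields
  fields <- substring(cl, starts, starts + 2)
  counts <- as.integer(trimws(fields))
  names(counts) <- c("a", "b", "l", "f", "c", "s", "x", "r", "p", "i", "m")
  list(counts = counts, version = trimws(substring(cl, 34, 39)))
}

# Build an example counts line programmatically: 3 atoms, 2 bonds, V2000.
cl <- paste0(
  paste(sprintf("%3d", c(3, 2, 0, 0, 0, 0, 0, 0, 0, 0)), collapse = ""),
  "999", " V2000"
)

res <- parse_counts_line(cl)
res$counts[c("a", "b")]
res$version
```

Building the example line with `sprintf()` avoids miscounting spaces by hand, which is the most common source of errors in fixed-width formats.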
Grabner, M., Varmuza, K., & Dehmer, M. (2012). RMol: a toolset for transforming SD/Molfile structure information into R objects. Source Code for Biology and Medicine, 7, 12. doi:10.1186/1751-0473-7-12
Retrieve compound information from a PubChem CID, see https://pubchem.ncbi.nlm.nih.gov/
pc_prop(cid, properties = NULL, verbose = getOption("verbose"), ...)
cid |
numeric; a vector of PubChem IDs (CIDs). The input vector will be coerced to a vector of positive integers. The function will return a row of NAs for elements that cannot be coerced to positive integers. |
properties |
character; a vector of properties to retrieve, e.g. c("MolecularFormula", "MolecularWeight"). If NULL (default) all available properties are retrieved. See https://pubchem.ncbi.nlm.nih.gov/docs/pug-rest for a list of all available properties. |
verbose |
logical; should a verbose output be printed to the console? |
... |
currently not used. |
a tibble; each row is a queried CID, each column is a requested property.
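The coercion rule for `cid` can be pictured with a small base R sketch. This is an illustration of the idea only, not webchem's code: `as_positive_int` is a hypothetical helper, and treating non-integer numbers such as 1.5 as invalid is an assumption made here, since the text does not say how they are handled.

```r
# Hypothetical helper, not part of webchem: coerce input to positive
# integers and mark everything that cannot be coerced as NA.
as_positive_int <- function(x) {
  num <- suppressWarnings(as.numeric(x))           # non-numeric text becomes NA
  ok <- !is.na(num) & num > 0 & num == floor(num)  # positive whole numbers only
  out <- rep(NA_integer_, length(x))
  out[ok] <- as.integer(num[ok])
  out
}

cids <- as_positive_int(c("5564", "2244", "aspirin", "-3", "1.5"))
cids
```

Elements that end up as NA here correspond to the inputs for which the function is documented to return a row of NAs.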
Please respect the Terms and Conditions of the National Library of Medicine, https://www.nlm.nih.gov/databases/download.html, the data usage policies of the National Center for Biotechnology Information, https://www.ncbi.nlm.nih.gov/home/about/policies/, https://pubchem.ncbi.nlm.nih.gov/docs/programmatic-access, and the data usage policies of the individual data sources, https://pubchem.ncbi.nlm.nih.gov/sources/.
Wang, Y., J. Xiao, T. O. Suzek, et al. 2009 PubChem: A Public Information System for Analyzing Bioactivities of Small Molecules. Nucleic Acids Research 37: 623–633.
Kim, Sunghwan, Paul A. Thiessen, Evan E. Bolton, et al. 2016 PubChem Substance and Compound Databases. Nucleic Acids Research 44(D1): D1202–D1213.
Kim, S., Thiessen, P. A., Bolton, E. E., & Bryant, S. H. (2015). PUG-SOAP and PUG-REST: web services for programmatic access to chemical information in PubChem. Nucleic acids research, gkv396.
Eduard Szöcs, Tamás Stirling, Eric R. Scott, Andreas Scharmüller, Ralf B. Schäfer (2020). webchem: An R Package to Retrieve Chemical Information from the Web. Journal of Statistical Software, 93(13). doi:10.18637/jss.v093.i13.
## Not run:
# might fail if API is not available
pc_prop(5564)

###
# multiple CIDS
comp <- c("Triclosan", "Aspirin")
cids <- get_cid(comp)
pc_prop(cids$cid, properties = c("MolecularFormula", "MolecularWeight",
  "CanonicalSMILES"))
## End(Not run)
When you search for an entity at https://pubchem.ncbi.nlm.nih.gov/, e.g. a compound or a substance, and select the record you are interested in, you will be forwarded to a PubChem content page. When you look at a PubChem content page, you can see that chemical information is organised into sections, subsections, etc. The chemical data live at the lowest levels of these sections. Use this function to retrieve the lowest level information from PubChem content pages.
pc_sect( id, section, domain = c("compound", "substance", "assay", "gene", "protein", "patent"), verbose = getOption("verbose") )
id |
numeric or character; a vector of PubChem identifiers to search for. |
section |
character; the section of the content page to be imported. |
domain |
character; the query domain. Can be one of "compound", "substance", "assay", "gene", "protein", "patent". |
verbose |
logical; should a verbose output be printed on the console? |
section is not case sensitive, but it is sensitive to typing errors and requires the full name of the section as it is printed on the content page. The PubChem Table of Contents Tree can also be found at https://pubchem.ncbi.nlm.nih.gov/classification/#hid=72.
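The matching rule can be illustrated with a short base R sketch: the comparison ignores case but needs the complete heading. The helper `match_section` and the short `headings` vector are hypothetical and only illustrate the rule; they do not reproduce how webchem or PubChem perform the lookup.

```r
# Hypothetical helper: case-insensitive, full-name match of a section heading.
match_section <- function(section, headings) {
  headings[tolower(headings) == tolower(section)]
}

# Illustrative subset of headings, not the full PubChem tree.
headings <- c("Density", "Dissociation Constants", "Boiling Point")

match_section("dissociation constants", headings)  # matches despite the case
match_section("dissociation", headings)            # partial name, no match
```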
Returns a tibble of query results. In the returned tibble, SourceName is the name of the depositor, and SourceID is the ID of the search term within the depositor's database. You can browse https://pubchem.ncbi.nlm.nih.gov/sources/ for more information about the depositors.
Please respect the Terms and Conditions of the National Library of Medicine, https://www.nlm.nih.gov/databases/download.html, the data usage policies of the National Center for Biotechnology Information, https://www.ncbi.nlm.nih.gov/home/about/policies/, https://pubchem.ncbi.nlm.nih.gov/docs/programmatic-access, and the data usage policies of the individual data sources, https://pubchem.ncbi.nlm.nih.gov/sources/.
Kim, S., Thiessen, P.A., Cheng, T. et al. PUG-View: programmatic access to chemical annotations integrated in PubChem. J Cheminform 11, 56 (2019). doi:10.1186/s13321-019-0375-2.
# might fail if API is not available
## Not run:
pc_sect(176, "Dissociation Constants")
pc_sect(c(176, 311), "density")
pc_sect(2231, "depositor-supplied synonyms", "substance")
pc_sect(780286, "modify date", "assay")
pc_sect(9023, "Ensembl ID", "gene")
pc_sect("1ZHY_A", "Sequence", "protein")
## End(Not run)
Search synonyms using PUG-REST, see https://pubchem.ncbi.nlm.nih.gov/.
pc_synonyms( query, from = c("name", "cid", "sid", "aid", "smiles", "inchi", "inchikey"), match = c("all", "first", "ask", "na"), verbose = getOption("verbose"), arg = NULL, choices = NULL, ... )
query |
character; search term. |
from |
character; type of input, can be one of "name" (default), "cid", "sid", "aid", "smiles", "inchi", "inchikey" |
match |
character; How should multiple hits be handled? |
verbose |
logical; should a verbose output be printed on the console? |
arg |
character; optional arguments like "name_type=word" to match individual words. |
choices |
deprecated. Use the |
... |
currently unused |
a named list.
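The `match` argument decides what happens when a query yields several hits. The sketch below illustrates the non-interactive options in base R. The semantics are assumptions, since they are not spelled out here: "all" keeps every hit, "first" keeps the first one, and "na" returns NA when there is more than one hit. The interactive "ask" option is left out, and `resolve_hits` is a hypothetical helper, not a webchem function.

```r
# Hypothetical helper illustrating the non-interactive match options.
# Assumed semantics: "all" keeps every hit, "first" keeps the first hit,
# "na" returns NA when more than one hit is found.
resolve_hits <- function(hits, match = c("all", "first", "na")) {
  match <- match.arg(match)
  if (length(hits) <= 1) return(hits)   # nothing to resolve
  switch(match,
    all   = hits,
    first = hits[1],
    na    = NA_character_
  )
}

hits <- c("aspirin", "acetylsalicylic acid", "2-acetoxybenzoic acid")
resolve_hits(hits, "first")
resolve_hits(hits, "na")
```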
Please respect the Terms and Conditions of the National Library of Medicine, https://www.nlm.nih.gov/databases/download.html, the data usage policies of the National Center for Biotechnology Information, https://www.ncbi.nlm.nih.gov/home/about/policies/, https://pubchem.ncbi.nlm.nih.gov/docs/programmatic-access, and the data usage policies of the individual data sources, https://pubchem.ncbi.nlm.nih.gov/sources/.
Wang, Y., J. Xiao, T. O. Suzek, et al. 2009 PubChem: A Public Information System for Analyzing Bioactivities of Small Molecules. Nucleic Acids Research 37: 623–633.
Kim, Sunghwan, Paul A. Thiessen, Evan E. Bolton, et al. 2016 PubChem Substance and Compound Databases. Nucleic Acids Research 44(D1): D1202–D1213.
Kim, S., Thiessen, P. A., Bolton, E. E., & Bryant, S. H. (2015). PUG-SOAP and PUG-REST: web services for programmatic access to chemical information in PubChem. Nucleic acids research, gkv396.
## Not run:
pc_synonyms("Aspirin")
pc_synonyms(c("Aspirin", "Triclosan"))
pc_synonyms(5564, from = "cid")
pc_synonyms(c("Aspirin", "Triclosan"), match = "ask")
## End(Not run)
Ping an API used in webchem to see if it's working.
ping_service( service = c("bcpc", "chebi", "chembl", "cs", "cs_web", "cir", "cts", "etox", "fn", "nist", "opsin", "pc", "srs", "wd"), apikey = NULL )
service |
character; the same abbreviations used as prefixes in
|
apikey |
character; API key for services that require API keys |
A logical, TRUE if the service is available or FALSE if it isn't
## Not run:
ping_service("chembl")
## End(Not run)
Get record details from SRS, see https://cdxnodengn.epa.gov/cdx-srs-rest/
srs_query( query, from = c("itn", "cas", "epaid", "tsn", "name"), verbose = getOption("verbose"), ... )
query |
character; query ID. |
from |
character; type of query ID, e.g. "itn", "cas", "epaid", "tsn", "name". |
verbose |
logical; should a verbose output be printed on the console? |
... |
not currently used. |
A list of lists, one for each supplied query. Each inner list has 22 entries: subsKey, internalTrackingNumber, systematicName, epaIdentificationNumber, currentCasNumber, currentTaxonomicSerialNumber, epaName, substanceType, categoryClass, kingdomCode, iupacName, pubChemId, molecularWeight, molecularFormula, inchiNotation, smilesNotation, classifications, characteristics, synonyms, casNumbers, taxonomicSerialNumbers, relationships
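A nested list like this is often easier to work with once a few scalar entries are pulled into a data frame. The sketch below shows one way to do that in base R. The object `srs_res` is a hand-made stand-in with three of the 22 entries, not actual API output. Naming the outer list by query and storing `molecularWeight` as a number are both assumptions.

```r
# Hand-made stand-in for a srs_query() result: one inner list per query,
# showing three of the 22 entries. Not actual API output.
srs_res <- list(
  "50-00-0" = list(currentCasNumber = "50-00-0",
                   molecularFormula = "CH2O",
                   molecularWeight  = 30.03),
  "67-64-1" = list(currentCasNumber = "67-64-1",
                   molecularFormula = "C3H6O",
                   molecularWeight  = 58.08)
)

# Pick the scalar fields of interest and bind one row per query.
fields <- c("currentCasNumber", "molecularFormula", "molecularWeight")
rows <- lapply(srs_res, function(rec) {
  as.data.frame(rec[fields], stringsAsFactors = FALSE)
})
srs_df <- do.call(rbind, rows)
srs_df
```

Entries that are themselves lists or tables, such as synonyms, would need separate handling, as would queries that returned no record.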
## Not run:
# might fail if API is not available
srs_query(query = '50-00-0', from = 'cas')

### multiple inputs
casrn <- c('50-00-0', '67-64-1')
srs_query(query = casrn, from = 'cas')
## End(Not run)
Retrieve identifiers from Wikidata
wd_ident(id, verbose = getOption("verbose"))
id |
character; identifier, as returned by |
verbose |
logical; print message during processing to console? |
A data.frame of identifiers. Currently these are 'smiles', 'cas', 'cid', 'einecs', 'csid', 'inchi', 'inchikey', 'drugbank', 'zvg', 'chebi', 'chembl', 'unii', 'lipidmaps', 'swisslipids' and source_url.
Only matches in labels are returned. If more than one unique hit is found, only the first is returned.
Willighagen, E., 2015. Getting CAS registry numbers out of WikiData. The Winnower. doi:10.15200/winn.142867.72538
Mitraka, Elvira, Andra Waagmeester, Sebastian Burgstaller-Muehlbacher, et al. 2015 Wikidata: A Platform for Data Integration and Dissemination for the Life Sciences and beyond. bioRxiv: 031971.
## Not run:
id <- c("Q408646", "Q18216")
wd_ident(id)
## End(Not run)
Chemical information from around the web. This package interacts with a suite of web APIs for chemical information.
These functions are defunct and no longer available.
ppdb_query() ppdb_parse() ppdb() cir() pp_query() cs_prop() ci_query() pan_query()
These functions are provided for compatibility with older versions of the webchem package. They may eventually be completely removed.
cid_compinfo(...) aw_query(...)
... |
Parameters to be passed to the modern version of the function |
Deprecated functions are:
pc_prop |
was formerly cid_compinfo |
bcpc_query |
was formerly aw_query |
Supply a query of any type (e.g. SMILES, CAS, name, InChI, etc.) along with any webchem function that has query and from arguments. If the function doesn't accept the type of query you've supplied, this will try to automatically translate it using CTS and run the query.
with_cts(query, from, .f, .verbose = getOption("verbose"), ...)
query |
character; the search term |
from |
character; the format or type of query. Commonly accepted values are "name", "cas", "inchi", and "inchikey" |
.f |
character; the (quoted) name of a webchem function |
.verbose |
logical; print a message when translating query? |
... |
other arguments passed to the function specified with |
returns results from .f
During the translation step, only the first hit from CTS is used. Therefore, using this function to translate on the fly is not foolproof and care should be taken to verify the results.
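The translate-then-dispatch behaviour described above can be sketched in base R without any web access. Everything in this example is hypothetical: `with_translation`, `lookup` and `mock_translate` are made-up names, the translator returns placeholder strings instead of calling CTS, and reading the accepted types from the target function's `from` default is only one plausible way to decide whether translation is needed.

```r
# Hypothetical sketch of the dispatch logic; `translate` stands in for a
# CTS lookup and `f` for any function with `query` and `from` arguments.
with_translation <- function(query, from, f, translate) {
  accepted <- eval(formals(f)$from)       # types listed in f's signature
  if (!from %in% accepted) {
    to <- accepted[1]
    query <- translate(query, from = from, to = to)[1]  # first hit only
    from <- to
  }
  f(query, from = from)
}

# Mock target function that only understands names and CAS numbers.
lookup <- function(query, from = c("name", "cas")) {
  paste0(match.arg(from), ":", query)
}

# Mock translator returning two placeholder hits.
mock_translate <- function(query, from, to) {
  c("name-hit-1", "name-hit-2")
}

with_translation("XDDAORKBJWWYJS-UHFFFAOYSA-N", "inchikey", lookup, mock_translate)
with_translation("50-00-0", "cas", lookup, mock_translate)
```

Only the first placeholder hit reaches `lookup`, which mirrors the caveat above: if the first hit is the wrong compound, the downstream result is silently wrong too.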
## Not run:
with_cts("XDDAORKBJWWYJS-UHFFFAOYSA-N", from = "inchikey", .f = "get_etoxid")
## End(Not run)
Some webchem functions return character strings that contain a chemical structure in Mol format. This function exports a character string as a .mol file so it can be imported with other chemistry software.
write_mol(x, file = "")
x |
a character string of a chemical structure in mol format. |
file |
a character vector of file names |
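The underlying idea, writing a character string to a .mol file, can be sketched in base R without calling any web service. The string below is placeholder text rather than a real structure, and the snippet illustrates the round trip only; it is not the implementation of write_mol.

```r
# Placeholder text standing in for a Mol-format string; not a real structure.
mol_string <- "example\n  placeholder\n\nM  END"

# Write the string to a temporary .mol file, one line per element.
path <- tempfile(fileext = ".mol")
writeLines(strsplit(mol_string, "\n", fixed = TRUE)[[1]], con = path)

# Read the file back to confirm the round trip.
mol_lines <- readLines(path)
mol_lines
```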
## Not run:
# export Mol file
csid <- get_csid("bergapten")
mol3d <- cs_compinfo(csid$csid, field = "Mol3D")
write_mol(mol3d$mol3D, file = mol3d$id)

# export multiple Mol files
csids <- get_csid(c("bergapten", "xanthotoxin"))
mol3ds <- cs_compinfo(csids$csid, field = "Mol3D")
mapply(function(x, y) write_mol(x, y), x = mol3ds$mol3D, y = mol3ds$id)
## End(Not run)