Title: | R Interface to the Europe PubMed Central RESTful Web Service |
---|---|
Description: | An R Client for the Europe PubMed Central RESTful Web Service (see <https://europepmc.org/RestfulWebService> for more information). It gives access to both metadata on life science literature and open access full texts. Europe PMC indexes all PubMed content and other literature sources including Agricola, a bibliographic database of citations to the agricultural literature, or Biological Patents. In addition to bibliographic metadata, the client allows users to fetch citations and reference lists. Links between life-science literature and other EBI databases, including ENA, PDB or ChEMBL are also accessible. No registration or API key is required. See the vignettes for usage examples. |
Authors: | Najko Jahn [aut, cre, cph], Maëlle Salmon [ctb] |
Maintainer: | Najko Jahn <[email protected]> |
License: | GPL-3 |
Version: | 0.4.3 |
Built: | 2024-12-23 05:01:15 UTC |
Source: | https://github.com/ropensci/europepmc |
Retrieve text-mined annotations contained in abstracts and open access full-text articles.
epmc_annotations_by_id(ids = NULL)
ids |
character vector with publication identifiers following the structure "source:ext_id", e.g. '"MED:28585529"' |
returns text-mined annotations in a tidy format with the following variables
Publication data source
Article Identifier
PMCID that locates full-text in Pubmed Central
Text snippet found before the annotation
Annotated entity
Text snippet found after the annotation
Targeted entity
Uniform link dictionary entry for targeted entity
URL to full-text occurrence of the annotation
Type of annotation like Chemicals
Article section mentioning the annotation like Methods
Annotation data provider
Sub-data provider
## Not run:
epmc_annotations_by_id("MED:28585529")
# multiple ids
epmc_annotations_by_id(c("MED:28585529", "PMC:PMC1664601"))

## End(Not run)
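The "source:ext_id" structure expected by the ids argument can be assembled from separate vectors. The following is a minimal sketch in base R; make_annotation_ids() is a hypothetical helper introduced here for illustration and is not part of europepmc.

```r
# Hypothetical helper (not part of europepmc): combine data sources and
# external identifiers into "source:ext_id" strings.
make_annotation_ids <- function(source, ext_id) {
  stopifnot(length(source) == length(ext_id))
  # Upper-case the source code so that "med" becomes "MED"
  paste(toupper(source), ext_id, sep = ":")
}

ids <- make_annotation_ids(c("med", "pmc"), c("28585529", "PMC1664601"))
ids
# The vector could then be passed on, e.g.
# epmc_annotations_by_id(ids = ids)
```

Keeping the request itself commented out means the sketch runs without network access.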
Finds works that cite a given publication.
epmc_citations(ext_id = NULL, data_src = "med", limit = 100, verbose = TRUE)
ext_id |
character, publication identifier |
data_src |
character, data source, by default Pubmed/MedLine index will be searched. The following three letter codes represent the sources Europe PubMed Central supports:
|
limit |
integer, number of results. By default, this function returns 100 records. |
verbose |
logical, print some information on what is going on. |
Metadata of citing documents as data.frame
## Not run:
epmc_citations("PMC3166943", data_src = "pmc")
epmc_citations("9338777")

## End(Not run)
This function returns EBI database entities referenced in a publication from Europe PMC RESTful Web Service.
epmc_db(ext_id = NULL, data_src = "med", db = NULL, limit = 100, verbose = TRUE)
ext_id |
character, publication identifier |
data_src |
character, data source, by default Pubmed/MedLine index will be searched. The following three letter codes represent the sources Europe PubMed Central supports:
|
db |
character, specify database:
|
limit |
integer, number of results. By default, this function returns 100 records. |
verbose |
logical, print some information on what is going on. |
Cross-references as data.frame
## Not run:
epmc_db("12368864", db = "uniprot", limit = 150)
epmc_db("25249410", db = "embl")
epmc_db("14756321", db = "uniprot")
epmc_db("11805837", db = "pride")

## End(Not run)
This function returns the number of EBI database links associated with a publication.
epmc_db_count(ext_id = NULL, data_src = "med")
ext_id |
character, publication identifier |
data_src |
character, data source, by default Pubmed/MedLine index will be searched. |
Europe PMC supports cross-references between literature and the following databases:
Array Express, a database of functional genomics experiments
a database and ontology of chemical entities of biological interest
a database of bioactive drug-like small molecules
now ENA, provides a comprehensive record of the world's nucleotide sequencing information
provides a freely available, open source database system and analysis tools for molecular interaction data
provides functional analysis of proteins by classifying them into families and predicting domains and important sites
a comprehensive and authoritative compendium of human genes and genetic phenotypes
European resource for the collection, organisation and dissemination of data on biological macromolecular structures
comprehensive and freely accessible resource of protein sequence and functional information
PRIDE Archive - proteomics data repository
data.frame with counts for each database
## Not run:
epmc_db_count(ext_id = "10779411")
epmc_db_count(ext_id = "PMC3245140", data_src = "PMC")

## End(Not run)
This function returns parsed metadata for a given publication ID including abstract, full text links, author details including ORCID and affiliation, MeSH terms, chemicals, grants.
epmc_details(ext_id = NULL, data_src = "med")
ext_id |
character, publication identifier |
data_src |
character, data source, by default Pubmed/MedLine index will be searched. Other sources Europe PubMed Central supports are:
|
list of data frames
## Not run:
epmc_details(ext_id = "26980001")
epmc_details(ext_id = "24270414")

# PMC record
epmc_details(ext_id = "PMC4747116", data_src = "pmc")

# Other sources:
# Agricola
epmc_details("IND43783977", data_src = "agr")
# Biological Patents
epmc_details("EP2412369", data_src = "pat")
# Chinese Biological Abstracts
epmc_details("583843", data_src = "cba")
# CiteXplore
epmc_details("C6802", data_src = "ctx")
# NHS Evidence
epmc_details("338638", data_src = "hir")
# Theses
epmc_details("409323", data_src = "eth")
# Preprint
epmc_details("PPR158112", data_src = "ppr")

## End(Not run)
This function loads full texts into R. Full texts are in XML format and are only provided for the Open Access subset of Europe PMC.
epmc_ftxt(ext_id = NULL)
ext_id |
character, PMCID. All full text publications have external IDs starting with 'PMC', e.g. "PMC3257301". |
xml_document
## Not run:
epmc_ftxt("PMC3257301")
epmc_ftxt("PMC3639880")

## End(Not run)
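Because epmc_ftxt() expects a PMCID, it can help to check identifiers before requesting full texts. The sketch below assumes that a PMCID consists of "PMC" followed by digits, as in the examples above; is_pmcid() is a hypothetical helper and not part of europepmc.

```r
# Hypothetical helper (not part of europepmc): TRUE for strings that look
# like a PMCID, assuming the pattern "PMC" followed by one or more digits.
is_pmcid <- function(x) {
  grepl("^PMC[0-9]+$", x)
}

candidates <- c("PMC3257301", "3257301", "NBK32884")
valid <- is_pmcid(candidates)
candidates[valid]
```

Only the identifiers that pass the check would then be handed to epmc_ftxt().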
Use this function to retrieve book XML formatted full text for the Open Access subset of the Europe PMC bookshelf.
epmc_ftxt_book(ext_id = NULL)
ext_id |
character, publication identifier. All book full texts are accessible either by the PMID or the 'NBK' book number. |
xml_document
## Not run:
epmc_ftxt_book("NBK32884")

## End(Not run)
Search over Europe PMC and retrieve the number of results found
epmc_hits(query = NULL, ...)
query |
query in the Europe PMC syntax |
... |
add query parameters from 'epmc_search()', e.g. synonym=true |
## Not run:
epmc_hits('abstract:"burkholderia pseudomallei"')
epmc_hits('AUTHORID:"0000-0002-7635-3473"')

## End(Not run)
Get the yearly number of hits for a query and the total yearly number of hits for a given period
epmc_hits_trend(query, synonym = TRUE, data_src = "med", period = 1975:2016)
query |
query in the Europe PMC syntax |
synonym |
logical, synonym search. If TRUE, synonym terms from MeSH terminology and the UniProt synonym list are queried, too. Enabled by default. |
data_src |
character, data source, by default Pubmed/MedLine index (
|
period |
a vector of years (numeric) over which to perform the search |
A similar function was used in https://masalmon.eu/2017/05/14/evergreenreviewgraph/, where it was advised not to plot the raw number of hits over time for a query, but to normalize it by the total number of hits.
a data.frame (dplyr tbl_df) with year, total number of hits (all_hits) and number of hits for the query (query_hits)
## Not run:
# aspirin as query
epmc_hits_trend('aspirin', period = 2006:2016, synonym = FALSE)
# link to cran packages in reference lists
epmc_hits_trend('REF:"cran.r-project.org*"', period = 2006:2016, synonym = FALSE)
# more complex with publication type review
epmc_hits_trend('(REF:"cran.r-project.org*") AND (PUB_TYPE:"Review" OR PUB_TYPE:"review-article")',
  period = 2006:2016, synonym = FALSE)

## End(Not run)
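The normalization mentioned in the details can be sketched in base R on a data frame with the columns year, all_hits and query_hits. The numbers below are made up purely for illustration and are not real Europe PMC counts; the share column is a name chosen here, not something the package returns.

```r
# Toy data in the shape returned by epmc_hits_trend(); the values are
# invented for illustration and are NOT real Europe PMC counts.
trend <- data.frame(
  year = 2014:2016,
  all_hits = c(1000, 1200, 1500),
  query_hits = c(10, 18, 30)
)

# Share of all records in a year that match the query
trend$share <- trend$query_hits / trend$all_hits
trend
```

Plotting share against year, rather than query_hits, accounts for the overall growth of the index over the period.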
With the External Link services, Europe PMC allows third parties to publish links from Europe PMC to other webpages or tools. Current External Link providers, which can be selected through Europe PMC's advanced search, include Wikipedia, Dryad Digital Repository or other open services. For more information, see https://europepmc.org/labslink.
epmc_lablinks(ext_id = NULL, data_src = "med", lab_id = NULL, limit = 100, verbose = TRUE)
ext_id |
publication identifier |
data_src |
data source, by default Pubmed/MedLine index will be searched. The following three letter codes represent the sources Europe PubMed Central supports:
|
lab_id |
character vector, identifiers of the external link service. Use Europe PMC's advanced search form to find ids. |
limit |
Number of records to be returned. By default, this function returns 100 records. |
verbose |
print information about what's going on |
Links found as nested data_frame
## Not run:
# Fetch links
epmc_lablinks("24007304")

# Link to Altmetric (lab_id = "1562")
epmc_lablinks("25389392", lab_id = "1562")

# Links to Wikipedia
epmc_lablinks("24007304", lab_id = "1507")

# Link to full text copy archived through the institutional repo of
# Bielefeld University
epmc_lablinks("12736239", lab_id = "1056")

## End(Not run)
With the External Link services, Europe PMC allows third parties to publish links from Europe PMC to other webpages or tools. Current External Link providers, which can be selected through Europe PMC's advanced search, include Wikipedia, Dryad Digital Repository or the institutional repo of Bielefeld University. For more information, see https://europepmc.org/labslink.
epmc_lablinks_count(ext_id = NULL, data_src = "med")
ext_id |
publication identifier |
data_src |
data source, by default Pubmed/MedLine index will be searched. The following three letter codes represent the sources Europe PubMed Central supports:
|
data.frame with counts for each database
## Not run:
epmc_lablinks_count("24023770")
epmc_lablinks_count("PMC3986813", data_src = "pmc")

## End(Not run)
This function returns the number of results found for your query and breaks it down by the publication types, data sources, and subsets Europe PMC provides.
epmc_profile(query = NULL, synonym = TRUE)
query |
character, search query. For more information on how to build a search query, see https://europepmc.org/Help |
synonym |
logical, synonym search. If TRUE, synonym terms from MeSH terminology and the UniProt synonym list are queried, too. Enabled by default. |
## Not run:
epmc_profile('malaria')
# use field search, e.g. query the materials and methods section for
# mentions of "ropensci"
epmc_profile('(METHODS:"ropensci")')

## End(Not run)
This function retrieves all the works listed in the bibliography of a given article.
epmc_refs(ext_id = NULL, data_src = "med", limit = 100, verbose = TRUE)
ext_id |
character, publication identifier |
data_src |
character, data source, by default Pubmed/MedLine index will be searched. The following three letter codes represent the sources Europe PubMed Central supports:
|
limit |
integer, number of results. By default, this function returns 100 records. |
verbose |
logical, print some information on what is going on. |
returns reference section as tibble
## Not run:
epmc_refs("PMC3166943", data_src = "pmc")
epmc_refs("25378340")
epmc_refs("21753913")

## End(Not run)
This is the main function to search Europe PMC RESTful Web Service (https://europepmc.org/RestfulWebService). It fully supports the comprehensive Europe PMC query language. Simply copy & paste your query terms to R. To get familiar with the Europe PMC query syntax, check the Advanced Search Query Builder https://europepmc.org/advancesearch.
epmc_search(query = NULL, output = "parsed", synonym = TRUE, verbose = TRUE, limit = 100, sort = NULL)
query |
character, search query. For more information on how to build a search query, see https://europepmc.org/Help |
output |
character, what kind of output should be returned. One of 'parsed', 'id_list' or 'raw'. By default, parsed key metadata will be returned as data.frame. 'id_list' returns a list of IDs and sources. Use 'raw' to get full metadata as list. Please be aware that these lists can become very large. |
synonym |
logical, synonym search. If TRUE, synonym terms from MeSH terminology and the UniProt synonym list are queried, too. In order to replicate results from the website, with the Rest API you need to turn synonyms ON! |
verbose |
logical, print progress bar. Activated by default. |
limit |
integer, limit the number of records you wish to retrieve. By default, 100 are returned. |
sort |
character, relevance ranking is used by default. Use
|
tibble
## Not run:
# Search articles for 'Gabi-Kat'
my.data <- epmc_search(query = 'Gabi-Kat')

# Get article metadata by DOI
my.data <- epmc_search(query = 'DOI:10.1007/bf00197367')

# Get article metadata by PubMed ID (PMID)
my.data <- epmc_search(query = 'EXT_ID:22246381')

# Get only PLOS Genetics articles with EMBL database references
my.data <- epmc_search(query = 'ISSN:1553-7404 HAS_EMBL:y')

# Limit search to 250 PLOS Genetics articles
my.data <- epmc_search(query = 'ISSN:1553-7404', limit = 250)

# exclude MeSH synonyms in search
my.data <- epmc_search(query = 'aspirin', synonym = FALSE)

# get 100 most cited articles from PLOS ONE published in 2014
epmc_search(query = '(ISSN:1932-6203) AND FIRST_PDATE:2014', sort = 'cited')

# print number of records found
attr(my.data, "hit_count")

# change output

## End(Not run)
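Field queries like the ones in the examples can also be composed programmatically before being passed to epmc_search(). The following is a rough sketch in base R; build_query() is a hypothetical helper, not part of europepmc, and it only covers the simple case of joining field:value pairs with AND.

```r
# Hypothetical helper (not part of europepmc): turn named arguments into
# "FIELD:value" terms and join them with AND.
build_query <- function(...) {
  fields <- list(...)
  stopifnot(length(fields) > 0, !is.null(names(fields)))
  parts <- sprintf("%s:%s", names(fields), unlist(fields))
  paste(parts, collapse = " AND ")
}

q <- build_query(ISSN = "1932-6203", FIRST_PDATE = "2014")
q
# The string could then be used as a query, e.g.
# epmc_search(query = q, sort = "cited")
```

Values that contain spaces would need quoting inside the query string, which this sketch does not handle.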
In general, use epmc_search instead. It calls this function, calling all pages within the defined limit.
epmc_search_(query = NULL, limit = 100, output = "parsed", page_token = NULL, ...)
query |
character, search query. For more information on how to build a search query, see https://europepmc.org/Help |
limit |
integer, limit the number of records you wish to retrieve. By default, 100 are returned. |
output |
character, what kind of output should be returned. One of 'parsed', 'id_list' or 'raw'. By default, parsed key metadata will be returned as data.frame. 'id_list' returns a list of IDs and sources. Use 'raw' to get full metadata as list. Please be aware that these lists can become very large. |
page_token |
cursor marking the page |
... |
further params from |
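The idea of requesting page after page until the limit is reached can be illustrated with a mock. This is only a sketch of the general pattern, not the package's implementation: fetch_page() and fetch_all() are hypothetical, the page token is a plain integer offset chosen for simplicity, and no request is sent to Europe PMC.

```r
# Hypothetical stand-in for a single-page request. It serves up to 25 fake
# ids per page from a pool of 60 and returns the token of the next page,
# or NULL when the pool is exhausted.
fetch_page <- function(page_token = 1L, page_size = 25L) {
  pool <- sprintf("ID%03d", 1:60)
  last <- min(page_token + page_size - 1L, length(pool))
  next_token <- if (last < length(pool)) last + 1L else NULL
  list(records = pool[page_token:last], next_token = next_token)
}

# Keep requesting pages until the limit is reached or no page is left
fetch_all <- function(limit = 100L) {
  out <- character(0)
  token <- 1L
  while (!is.null(token) && length(out) < limit) {
    page <- fetch_page(token)
    out <- c(out, page$records)
    token <- page$next_token
  }
  head(out, limit)
}

length(fetch_all(limit = 40))
length(fetch_all(limit = 100))
```

With a limit of 40 the loop stops after two pages and trims the result; with a limit of 100 it stops when the mock pool of 60 records runs out.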
Look up DOIs indexed in Europe PMC and get metadata back.
epmc_search_by_doi(doi = NULL, output = "parsed")
doi |
character vector containing DOI names. |
output |
character, what kind of output should be returned. One of 'parsed', 'id_list' or 'raw'. By default, parsed key metadata will be returned as data.frame. 'id_list' returns a list of IDs and sources. Use 'raw' to get full metadata as list. Please be aware that these lists can become very large. |
## Not run:
# single DOI name
epmc_search_by_doi(doi = "10.1161/strokeaha.117.018077")
# multiple DOI names in a vector
my_dois <- c(
  "10.1159/000479962",
  "10.1002/sctm.17-0081",
  "10.1161/strokeaha.117.018077",
  "10.1007/s12017-017-8447-9")
epmc_search_by_doi(doi = my_dois)
# full metadata
epmc_search_by_doi(doi = my_dois, output = "raw")

## End(Not run)
Please use epmc_search_by_doi instead. It calls this method, returning open access status information from all your requests.
epmc_search_by_doi_(doi, .pb = NULL, output = NULL)
doi |
character vector containing DOI names. |
.pb |
progress bar object |
output |
character, what kind of output should be returned. One of 'parsed', 'id_list' or 'raw'. By default, parsed key metadata will be returned as data.frame. 'id_list' returns a list of IDs and sources. Use 'raw' to get full metadata as list. Please be aware that these lists can become very large. |
## Not run:
epmc_search_by_doi_("10.1159/000479962")

## End(Not run)
What is europepmc?
europepmc facilitates access to the Europe PMC RESTful Web Service. Europe PMC covers life science literature and gives access to open access full texts. Coverage is not restricted to Europe: articles and abstracts are indexed from all over the world. Europe PMC ingests all PubMed content and extends its index with other sources, including Agricola, a bibliographic database of citations to the agricultural literature, or Biological Patents.
Besides searching abstracts and full text, europepmc can be used to retrieve reference sections and citations, text-mined terms or cross-links to other databases hosted by the European Bioinformatics Institute (EBI).
For more information about Europe PMC, see their current paper: Ferguson, C., Araújo, D., Faulk, L., Gou, Y., Hamelers, A., Huang, Z., Ide-Smith, M., Levchenko, M., Marinos, N., Nambiar, R., Nassar, M., Parkin, M., Pi, X., Rahman, F., Rogers, F., Roochun, Y., Saha, S., Selim, M., Shafique, Z., … McEntyre, J. (2020). Europe PMC in 2020. Nucleic Acids Research, 49(D1), D1507–D1514. doi:10.1093/nar/gkaa994.
Maintainer: Najko Jahn [email protected] [copyright holder]
Other contributors:
Maëlle Salmon [contributor]
Useful links:
Report bugs at https://github.com/ropensci/europepmc/issues