Package 'medrxivr'

Title: Access and Search MedRxiv and BioRxiv Preprint Data
Description: Preprints, preliminary versions of research articles that have yet to undergo peer review, are an increasingly important source of health-related bibliographic content. The two preprint repositories most relevant to health-related sciences are medRxiv <https://www.medrxiv.org/> and bioRxiv <https://www.biorxiv.org/>, both of which are operated by the Cold Spring Harbor Laboratory. 'medrxivr' provides programmatic access to the 'Cold Spring Harbor Laboratory (CSHL)' API <https://api.biorxiv.org/>, allowing users to download medRxiv and bioRxiv preprint metadata (e.g. title, abstract, publication date, author list) into R. 'medrxivr' also provides functions to search the downloaded preprint records using regular expressions and Boolean logic, as well as helper functions that export search results to a .bib file for import into a reference manager and download the full-text PDFs of preprints matching the search criteria.
Authors: Yaoxiang Li [aut, cre], Luke McGuinness [aut], Lena Schmidt [aut], Tuija Sonkkila [rev], Najko Jahn [rev]
Maintainer: Yaoxiang Li <[email protected]>
License: GPL-2
Version: 0.1.1
Built: 2025-01-04 06:12:47 UTC
Source: https://github.com/ropensci/medrxivr

Help Index


Access medRxiv/bioRxiv data via the Cold Spring Harbor Laboratory API

Description

Provides programmatic access to all preprints available through the Cold Spring Harbor Laboratory API, which serves both the medRxiv and bioRxiv preprint repositories.

Usage

mx_api_content(
  from_date = "2013-01-01",
  to_date = as.character(Sys.Date()),
  clean = TRUE,
  server = "medrxiv",
  include_info = FALSE
)

Arguments

from_date

Earliest date of interest, written as "YYYY-MM-DD". Defaults to 1st Jan 2013 ("2013-01-01"), ~6 months prior to earliest preprint registration date.

to_date

Latest date of interest, written as "YYYY-MM-DD". Defaults to current date.

clean

Logical, defaulting to TRUE, indicating whether to clean the data returned by the API. If TRUE, variables containing absolute paths to the preprint's web page ("link_page") and PDF ("link_pdf") are generated from the "server", "DOI", and "version" variables returned by the API. The "title", "abstract" and "authors" variables are converted to title case. Finally, the "type" and "server" variables are dropped.

server

The server to query: "medrxiv" (default) or "biorxiv".

include_info

Logical, indicating whether to include variables containing information returned by the API (e.g. API status, cursor number, total count of papers, etc). Default is FALSE.

Value

Dataframe with 1 record per row

See Also

Other data-source: mx_api_doi(), mx_snapshot()

Examples

if (interactive()) {
  mx_data <- mx_api_content(
    from_date = "2020-01-01",
    to_date = "2020-01-07"
  )
}
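
The clean argument above describes rebuilding "link_page" and "link_pdf" from the server, DOI, and version fields. The following is a minimal sketch of that idea in base R. The helper name build_links_sketch() and the URL layout are assumptions made for illustration; they are not taken from the package source and may differ from what mx_api_content() actually produces.

```r
# Hypothetical helper (not part of medrxivr). The URL pattern below is an
# assumption used only to illustrate the reconstruction step.
build_links_sketch <- function(server, doi, version) {
  base <- paste0("https://www.", server, ".org/content/", doi, "v", version)
  data.frame(
    link_page = base,
    link_pdf = paste0(base, ".full.pdf"),
    stringsAsFactors = FALSE
  )
}

# The DOI is the one used in the mx_api_doi() example in this manual.
links <- build_links_sketch("medrxiv", "10.1101/2020.02.25.20021568", 1)
links$link_page
links$link_pdf
```

Because paste0() is vectorised, the same helper works on whole columns of a dataframe as well as on single values.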

Access data on a single medRxiv/bioRxiv record via the Cold Spring Harbor Laboratory API

Description

Provides programmatic access to data on a single preprint identified by a unique Digital Object Identifier (DOI).

Usage

mx_api_doi(doi, server = "medrxiv", clean = TRUE)

Arguments

doi

Digital object identifier of the preprint you wish to retrieve data on.

server

The server to query: "medrxiv" (default) or "biorxiv".

clean

Logical, defaulting to TRUE, indicating whether to clean the data returned by the API. If TRUE, variables containing absolute paths to the preprint's web page ("link_page") and PDF ("link_pdf") are generated from the "server", "DOI", and "version" variables returned by the API. The "title", "abstract" and "authors" variables are converted to title case. Finally, the "type" and "server" variables are dropped.

Value

Dataframe containing details on the preprint identified by the DOI.

See Also

Other data-source: mx_api_content(), mx_snapshot()

Examples

if (interactive()) {
  mx_data <- mx_api_doi("10.1101/2020.02.25.20021568")
}

Search term wrapper that allows for different capitalization of term

Description

Inspired by the varying capitalization of "NCOV" during the coronavirus pandemic (e.g. ncov, nCoV, NCOV, nCOV), this function builds a pattern that matches every combination of lower- and upper-case letters in your search term.

Usage

mx_caps(x)

Arguments

x

Search term to be formatted

Value

The input string is returned, but with each non-space character repeated in lower- and upper-case, and enclosed in square brackets. For example, mx_caps("ncov") returns "[Nn][Cc][Oo][Vv]".

See Also

Other helper: mx_crosscheck(), mx_download(), mx_export()

Examples

query <- c("coronavirus", mx_caps("ncov"))

mx_search(mx_snapshot("6c4056d2cccd6031d92ee4269b1785c6ec4d555b"), query)
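
To make the transformation concrete, here is a minimal base R sketch of how such a pattern could be built. The function name caps_sketch() is hypothetical and this is not the package's implementation; it only reproduces the behaviour described under Value.

```r
# Hypothetical re-implementation for illustration only.
caps_sketch <- function(x) {
  # Split the term into single characters
  chars <- strsplit(x, "")[[1]]
  # Wrap every non-space character as "[Xx]"; leave spaces untouched
  expanded <- ifelse(
    chars == " ",
    chars,
    paste0("[", toupper(chars), tolower(chars), "]")
  )
  paste(expanded, collapse = "")
}

pattern <- caps_sketch("ncov")
pattern
# "[Nn][Cc][Oo][Vv]"

# The pattern matches any capitalization of the term
grepl(pattern, c("ncov", "nCoV", "NCOV", "influenza"))
# TRUE TRUE TRUE FALSE
```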

Check how up-to-date the maintained medRxiv snapshot is

Description

Provides information on how up-to-date the maintained medRxiv snapshot provided by 'mx_snapshot()' is by checking whether there have been any records added to, or updated in, the medRxiv repository since the last snapshot was taken.

Usage

mx_crosscheck()

See Also

Other helper: mx_caps(), mx_download(), mx_export()

Examples

mx_crosscheck()

Download PDFs of preprints returned by a search

Description

Download PDFs of all the papers in your search results.

Usage

mx_download(
  mx_results,
  directory,
  create = TRUE,
  name = c("ID", "DOI"),
  print_update = 10
)

Arguments

mx_results

Vector containing the links to the medRxiv PDFs

directory

The location you want to download the PDFs to

create

TRUE or FALSE. If TRUE, creates the directory if it doesn't exist

name

How to name the downloaded PDF. By default, both the ID number of the record and the DOI are used.

print_update

How frequently to print an update

See Also

Other helper: mx_caps(), mx_crosscheck(), mx_export()

Examples

mx_results <- mx_search(mx_snapshot(), query = "10.1101/2020.02.25.20021568")
mx_download(mx_results, directory = tempdir())

Export references for preprints returned by a search to a .bib file

Description

Export references for preprints returned by a search to a .bib file.

Usage

mx_export(data, file = "medrxiv_export.bib")

Arguments

data

Dataframe returned by mx_search() or mx_api_*() functions

file

File location to save to. Must have the .bib file extension.

Value

Exports a formatted .bib file, for import into a reference manager.

See Also

Other helper: mx_caps(), mx_crosscheck(), mx_download()

Examples

mx_results <- mx_search(mx_snapshot(), query = "brain")
mx_export(mx_results, tempfile(fileext = ".bib"))
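
For readers unfamiliar with the .bib format, the sketch below shows one way search results could be turned into BibTeX entries using base R. Everything in it is illustrative: the helper bib_entry_sketch(), the toy records, the column names (title, authors, date, doi), the "; " author separator, and the choice of entry fields are all assumptions, and the entries written by mx_export() may look different.

```r
# Toy records with made-up values; column names are assumed for illustration.
records <- data.frame(
  title = c("A Toy Preprint About Brains", "Another Toy Preprint"),
  authors = c("Smith, J.; Jones, A.", "Lee, K."),
  date = as.Date(c("2020-03-01", "2021-06-15")),
  doi = c("10.1101/0000.00.00.00000001", "10.1101/0000.00.00.00000002"),
  stringsAsFactors = FALSE
)

# Hypothetical formatter: one BibTeX entry per row of the dataframe.
bib_entry_sketch <- function(df) {
  key <- paste0("preprint", seq_len(nrow(df)))
  sprintf(
    "@article{%s,\n  title = {%s},\n  author = {%s},\n  year = {%s},\n  doi = {%s}\n}",
    key,
    df$title,
    # BibTeX separates authors with " and "
    gsub("; ", " and ", df$authors, fixed = TRUE),
    format(df$date, "%Y"),
    df$doi
  )
}

entries <- bib_entry_sketch(records)
out_file <- tempfile(fileext = ".bib")
writeLines(entries, out_file)
cat(entries[1], "\n")
```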

Search and print output for individual search items

Description

Search and print output for individual search items

Usage

mx_reporter(mx_data, num_results, query, fields, deduplicate, NOT)

Arguments

mx_data

The mx_dataset filtered for the date limits

num_results

The number of results returned by the overall search

query

Character string, vector or list

fields

Fields of the database to search - default is Title, Abstract, Authors, Category, and DOI.

deduplicate

Logical. Only return the most recent version of a record. Default is TRUE.

NOT

Vector of regular expressions to exclude from the search. Default is "".

See Also

Other main: mx_search(), print_full_results(), run_search()
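
As a rough illustration of reporting hits for individual search items, the sketch below counts how many records match each term of a query across a set of fields. The helper report_terms_sketch(), the toy data, and the column names are hypothetical; the real function's output format and handling of deduplicate and NOT are not reproduced here.

```r
# Toy dataset with made-up records; column names are assumed for illustration.
toy <- data.frame(
  title = c("Dementia and lipids", "Lipids in mice", "Dementia in mice"),
  abstract = c("We study statins.", "A mouse model of dementia.", "No relevant data."),
  stringsAsFactors = FALSE
)

# Hypothetical helper: number of records matching each individual term.
report_terms_sketch <- function(data, query, fields = c("title", "abstract")) {
  # Flatten a string, vector, or list query into individual terms
  terms <- unlist(query)
  # Combine the searched fields into one string per record
  text <- do.call(paste, data[fields])
  hits <- vapply(terms, function(term) sum(grepl(term, text)), integer(1))
  data.frame(term = terms, hits = unname(hits), stringsAsFactors = FALSE)
}

report <- report_terms_sketch(toy, list("[Dd]ementia", c("[Ll]ipid", "statin")))
print(report)
```

Each term is treated as a regular expression, so character classes such as "[Dd]" can cover differences in capitalization.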


Access a static snapshot of the medRxiv repository

Description

[Available for medRxiv only] This function allows users to import a maintained static snapshot of the medRxiv repository, instead of downloading a copy from the API, which can become unavailable during peak usage times. The function dynamically retrieves multiple snapshot parts from the specified repository and combines them into a single dataframe.

Usage

mx_snapshot(commit = "main")

Arguments

commit

Commit hash or branch name for the snapshot, taken from https://github.com/yaoxiangli/medrxivr-data. Allows for reproducible searching by specifying the exact snapshot used to perform the searches. Defaults to "main", which will return the most recent snapshot from the main branch.

Value

A formatted dataframe containing the combined data from the snapshot parts, with reconstructed 'link_page' and 'link_pdf' columns.

See Also

Other data-source: mx_api_content(), mx_api_doi()

Examples

mx_data <- mx_snapshot()
mx_data_specific <- mx_snapshot(commit = "specific_commit_hash")
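
The description above mentions that several snapshot parts are retrieved and combined into a single dataframe. The sketch below illustrates only the combining step, using local temporary files in place of the remote repository. The CSV format, the file names, and the helper combine_parts_sketch() are assumptions for illustration, not details confirmed by the package.

```r
# Write two toy "parts" to a temporary directory to stand in for remote files.
part_dir <- tempfile("snapshot_parts")
dir.create(part_dir)
part1 <- data.frame(doi = c("10.1101/a", "10.1101/b"), version = c(1, 2))
part2 <- data.frame(doi = "10.1101/c", version = 1)
write.csv(part1, file.path(part_dir, "part_1.csv"), row.names = FALSE)
write.csv(part2, file.path(part_dir, "part_2.csv"), row.names = FALSE)

# Hypothetical helper: read every part and stack the rows.
combine_parts_sketch <- function(paths) {
  parts <- lapply(paths, read.csv, stringsAsFactors = FALSE)
  do.call(rbind, parts)
}

part_files <- sort(list.files(part_dir, pattern = "\\.csv$", full.names = TRUE))
snapshot <- combine_parts_sketch(part_files)
nrow(snapshot)
```

Passing a commit hash rather than "main" to mx_snapshot() pins the exact set of parts, which is what makes a search reproducible.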