Package 'rsnps'

Title: Get 'SNP' ('Single-Nucleotide' 'Polymorphism') Data on the Web
Description: A programmatic interface to various 'SNP' 'datasets' on the web: 'OpenSNP' (<https://opensnp.org>) and 'NCBI's' 'dbSNP' database (<https://www.ncbi.nlm.nih.gov/projects/SNP/>). Functions are included for searching 'NCBI' 'dbSNP'. For 'OpenSNP', functions are included for getting 'SNPs', and data for 'genotypes', 'phenotypes', annotations, and bulk downloads of data by user.
Authors: Julia Gustavsen [aut, cre], Sina Rüeger [aut], Scott Chamberlain [aut], Kevin Ushey [aut], Hao Zhu [aut]
Maintainer: Julia Gustavsen <[email protected]>
License: MIT + file LICENSE
Version: 0.6.1
Built: 2024-12-27 03:27:31 UTC
Source: https://github.com/ropensci/rsnps

Get SNP (Single-Nucleotide Polymorphism) Data on the Web

Description

This package gives you access to data from OpenSNP (https://opensnp.org) via their API (https://opensnp.org/faq#api) and NCBI's dbSNP SNP database (https://www.ncbi.nlm.nih.gov/snp).

NCBI Authentication

This applies to the function ncbi_snp_query():

You can optionally use an API key. If you do, you are allowed higher rate limits (more requests per time period).

To get an API key from NCBI, log in and create a key via your account settings at https://www.ncbi.nlm.nih.gov/account/settings/

Note: NCBI login is via a third-party account (e.g. Google, ORCID). If you already have an NCBI account, you can link it to a third-party login and then retire your old NCBI login, if you haven't already; otherwise just create a new account.

Once you are logged on to your NCBI account settings (https://www.ncbi.nlm.nih.gov/account/settings/) you can go to the section "API Key Management"

Here you can select "Create an API Key", which will give you up to 10 requests per second instead of the 3 per second allowed without an API key.

After generating your key, set an environment variable as ENTREZ_KEY in .Renviron. This .Renviron file can be edited using usethis::edit_r_environ() or by locating and creating/editing this file yourself.

ENTREZ_KEY='youractualkeynotthisstring'

Once the API key is added to your .Renviron file, restart R for this to take effect.

You can optionally pass in your API key to the key parameter in NCBI functions in this package. However, it's much better from a security perspective to set an environment variable.
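
As a minimal sketch of the setup above, the following checks whether ENTREZ_KEY is visible to the current session and derives a pause between requests from the rate limits quoted above (10 per second with a key, 3 without). The helper names has_entrez_key() and min_request_interval() are hypothetical and are not part of rsnps.

```r
# Hypothetical helpers for illustration; not part of rsnps.

# TRUE if a non-empty ENTREZ_KEY is set in the environment.
has_entrez_key <- function() {
  nzchar(Sys.getenv("ENTREZ_KEY", unset = ""))
}

# Minimum pause (in seconds) between requests, based on the limits
# quoted above: 10 requests/second with a key, 3 without.
min_request_interval <- function() {
  if (has_entrez_key()) 1 / 10 else 1 / 3
}

has_entrez_key()
min_request_interval()
```

If has_entrez_key() returns FALSE after you have edited .Renviron, the most likely cause is that R has not been restarted yet.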

Author(s)

Scott Chamberlain [email protected]

Kevin Ushey [email protected]

Hao Zhu [email protected]

Sina Rüeger [email protected]

Julia Gustavsen [email protected]


Get openSNP genotype data for all users at a particular SNP.

Description

Get openSNP genotype data for all users at a particular SNP.

Usage

allgensnp(snp = NA, usersubset = FALSE, ...)

Arguments

snp

(character) A SNP name

usersubset

Get a subset of users, integer numbers, e.g. 1-8 (default: none)

...

Curl options passed on to crul::HttpClient

Value

data.frame of genotypes for all users at a certain SNP

See Also

Other opensnp-fxns: allphenotypes(), annotations(), download_users(), fetch_genotypes(), genotypes(), phenotypes_byid(), phenotypes(), users()

Examples

## Not run: 
x <- allgensnp(snp = "rs7412")
head(x)

## End(Not run)

Get all openSNP phenotypes, their variations, and how many users have data available for a given phenotype.

Description

Either return data.frame with all results, or output a list, then call the characteristic by id (parameter = "id") or name (parameter = "characteristic").

Usage

allphenotypes(df = FALSE, ...)

Arguments

df

Return a data.frame of all data. The column known_variations can take multiple values, so the other columns id, characteristic, and number_of_users are replicated in the data.frame. Default: FALSE

...

Curl options passed on to crul::HttpClient

Value

data.frame of results, or list if df=FALSE

See Also

Other opensnp-fxns: allgensnp(), annotations(), download_users(), fetch_genotypes(), genotypes(), phenotypes_byid(), phenotypes(), users()

Examples

## Not run: 
# Get all data
allphenotypes(df = TRUE)

# Output a list, then call the characteristic of interest by 'id' or
# 'characteristic'
datalist <- allphenotypes()
names(datalist) # get list of all characteristics you can call
datalist[["ADHD"]] # get data.frame for 'ADHD'
datalist[c("mouth size", "SAT Writing")] # get data.frames for 'mouth size' and 'SAT Writing'

## End(Not run)
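
The replication described for df = TRUE can be pictured with a small base R sketch. It assumes one phenotype record with several known variations; the values are mock data, not real openSNP output, and the code does not show how allphenotypes() is implemented.

```r
# One phenotype record with three known variations (mock values).
pheno <- list(
  id = 1L,
  characteristic = "Eye color",
  number_of_users = 3L,
  known_variations = c("Brown", "Blue", "Green")
)

# data.frame() recycles the length-1 fields, so id, characteristic and
# number_of_users are repeated once per known variation.
long <- data.frame(
  id = pheno$id,
  characteristic = pheno$characteristic,
  number_of_users = pheno$number_of_users,
  known_variations = pheno$known_variations,
  stringsAsFactors = FALSE
)
long
```

The result has one row per known variation, which is why the other three columns repeat.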

Get openSNP annotations for a SNP.

Description

Get annotations for a SNP from one or more of the sources available through openSNP (PLOS, Mendeley, SNPedia), or the metadata for the response.

Usage

annotations(
  snp = NA,
  output = c("all", "plos", "mendeley", "snpedia", "metadata"),
  ...
)

Arguments

snp

SNP name.

output

Name the source or sources you want annotations from (options are: 'all', 'plos', 'mendeley', 'snpedia', 'metadata'). 'metadata' gives the metadata for the response.

...

Curl options passed on to crul::HttpClient

Value

data.frame of results

See Also

Other opensnp-fxns: allgensnp(), allphenotypes(), download_users(), fetch_genotypes(), genotypes(), phenotypes_byid(), phenotypes(), users()

Examples

## Not run: 
## get just the metadata
annotations(snp = "rs7903146", output = "metadata")

## just from plos
annotations(snp = "rs7903146", output = "plos")

## just from snpedia
annotations(snp = "rs7903146", output = "snpedia")

## get all annotations
annotations(snp = "rs7903146", output = "all")

## End(Not run)

Download openSNP user files.

Description

Download openSNP user files.

Usage

download_users(name = NULL, id = NULL, dir = "~/", ...)

Arguments

name

User name

id

User id

dir

Directory to save file to

...

Curl options passed on to crul::HttpClient

Value

File downloaded to directory you specify (or default), nothing returned in R.

See Also

Other opensnp-fxns: allgensnp(), allphenotypes(), annotations(), fetch_genotypes(), genotypes(), phenotypes_byid(), phenotypes(), users()

Examples

## Not run: 
# Download a single user file, by id
download_users(id = 14)

# Download a single user file, by user name
download_users(name = "kevinmcc")

# Download many user files
lapply(c(14, 22), function(x) download_users(id = x))
read_users(id = 14, nrows = 5)

## End(Not run)

Download openSNP genotype data for a user

Description

Download openSNP genotype data for a user

Usage

fetch_genotypes(url, rows = 100, filepath = NULL, quiet = TRUE, ...)

Arguments

url

(character) URL for the download. See example below of function use.

rows

(integer) Number of rows to read in. Useful for getting a glimpse of the data. Negative and other invalid values are ignored, giving back all data. Default: 100

filepath

(character) If none is given the file is saved to a temporary file, which will be lost after your session is closed. Save to a file if you want to access it later.

quiet

(logical) Should download progress be suppressed? Default: TRUE

...

Further args passed on to download.file()

Details

Beware, asking for all rows means that you get the entire file, which can be large (e.g., 15MB) and so take a while to download, depending on your connection speed. Therefore, rows is set to 100 by default to protect the user.

Internally, we use download.file() to download each file, then read.table() to read the file to a data.frame.
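
The effect of the rows argument can be sketched with read.table() alone. The example below writes a small mock file in place of a real download (all values are made up) and assumes the four-column layout listed under Value; read_some() is a hypothetical helper, not part of rsnps.

```r
# Mock file standing in for a downloaded genotype file (values are made up).
tmp <- tempfile(fileext = ".txt")
writeLines(c(
  "# rsid\tchromosome\tposition\tgenotype",
  "rs0000001\t1\t100\tAA",
  "rs0000002\t1\t200\tCT",
  "rs0000003\t2\t300\tGG"
), tmp)

# Hypothetical helper: read at most `rows` rows from the file.
# Lines starting with "#" are skipped by read.table() by default.
read_some <- function(path, rows) {
  read.table(
    path,
    nrows = rows,
    col.names = c("rsid", "chromosome", "position", "genotype"),
    colClasses = c("character", "integer", "integer", "character")
  )
}

first_two <- read_some(tmp, rows = 2)   # a glimpse of the data
everything <- read_some(tmp, rows = -1) # negative values are ignored
nrow(first_two)
nrow(everything)
```

A negative value for nrows makes read.table() read the whole file, which matches the behaviour described for rows above.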

Value

data.frame for a single user, with four columns:

  • rsid (character)

  • chromosome (integer)

  • position (integer)

  • genotype (character)

See Also

Other opensnp-fxns: allgensnp(), allphenotypes(), annotations(), download_users(), genotypes(), phenotypes_byid(), phenotypes(), users()

Examples

## Not run: 
# get a data.frame of the users data
data <- users(df = TRUE)
head(data[[1]]) # users with links to genome data
mydata <- fetch_genotypes(
  url = data[[1]][1, "genotypes.download_url"],
  filepath = "~/myfile.txt"
)

# see some data right away
mydata

# Or read in data later separately
read.table("~/myfile.txt", nrows = 10)

## End(Not run)

Get openSNP genotype data for one or multiple users.

Description

Get openSNP genotype data for one or multiple users.

Usage

genotypes(snp = NA, userid = NA, df = FALSE, ...)

Arguments

snp

SNP name.

userid

ID of openSNP user.

df

Return data.frame (TRUE) or not (FALSE). Default: FALSE

...

Curl options passed on to crul::HttpClient

Value

List (or data.frame) of genotypes for specified user(s) at a certain SNP.

See Also

Other opensnp-fxns: allgensnp(), allphenotypes(), annotations(), download_users(), fetch_genotypes(), phenotypes_byid(), phenotypes(), users()

Examples

## Not run: 
genotypes(snp = "rs9939609", userid = 1)
genotypes("rs9939609", userid = "1,6,8", df = TRUE)
genotypes("rs9939609", userid = "1-2", df = FALSE)

## End(Not run)
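
The examples above pass userid as a single ID, a comma-separated list ("1,6,8"), or a range ("1-2"). The sketch below shows which IDs each form stands for. expand_userids() is a hypothetical helper written for illustration; it is not part of rsnps and does not describe how the package handles the argument internally.

```r
# Hypothetical helper: expand "1,6,8" or "1-8" style strings to integer IDs.
expand_userids <- function(x) {
  parts <- trimws(strsplit(as.character(x), ",", fixed = TRUE)[[1]])
  ids <- lapply(parts, function(p) {
    if (grepl("-", p, fixed = TRUE)) {
      # A range such as "1-8": expand to every ID between the bounds.
      bounds <- as.integer(strsplit(p, "-", fixed = TRUE)[[1]])
      seq(bounds[1], bounds[2])
    } else {
      as.integer(p)
    }
  })
  unlist(ids)
}

expand_userids(1)
expand_userids("1,6,8")
expand_userids("1-8")
```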

Internal function to get the frequency of the variants from different studies.

Description

Internal function to get the frequency of the variants from different studies.

Usage

get_frequency(Class, primary_info)

Arguments

Class

What kind of variant is the rsid. Accepted options are "snv", "snp" and "delins".

primary_info

refsnp entry read in JSON format


Internal function to get gene names.

Description

If multiple gene names are encountered they are collapsed with a "/".

Usage

get_gene_names(primary_info)

Arguments

primary_info

refsnp entry read in JSON format


Internal function to get the position, alleles, assembly, hgvs notation

Description

Internal function to get the position, alleles, assembly, hgvs notation

Usage

get_placements(primary_info)

Arguments

primary_info

refsnp entry read in JSON format


Query NCBI's refSNP for information on a set of SNPs via the API

Description

This function queries NCBI's refSNP for information on the vector of SNPs submitted, based on the latest dbSNP build and the latest reference genome.

Usage

ncbi_snp_query(snps)

Arguments

snps

(character) A vector of SNPs (rs numbers).

Details

This function currently pulls data for Assembly 38. In particular, if you think the BP position is wrong, you may be expecting the position from a different assembly.

Note that you are limited to a maximum of one query per second, and concurrent queries are not allowed. To set curl options when querying for the SNPs, use httr::set_config() or httr::with_config().

Value

A dataframe with columns:

  • query: The rs ID that was queried.

  • chromosome: The chromosome that the marker lies on.

  • bp: The chromosomal position, in base pairs, of the marker, as aligned with the current genome used by dbSNP. We add 1 to the base pair position in the bp column of the output data.frame to agree with what the dbSNP website has.

  • rsid: Reference SNP cluster ID. If the rs ID queried has been merged, the up-to-date name of the ID is returned here, and a warning is issued.

  • class: The rsid's 'class'. See https://www.ncbi.nlm.nih.gov/projects/SNP/snp_legend.cgi?legend=snpClass for more details.

  • gene: If the rsid lies within a gene (either within the exon or introns of a gene), the name of that gene is returned here; otherwise, NA. Note that the gene may not be returned if the rsid lies too far upstream or downstream of the particular gene of interest.

  • alleles: The alleles associated with the SNP if it is a SNV; otherwise, if it is an INDEL, microsatellite, or other kind of polymorphism the relevant information will be available here.

  • minor: The allele for which the MAF is computed, given it is an SNV; otherwise, NA.

  • maf: The minor allele frequency of the SNP, given it is an SNV. This is drawn from the current global reference population used by NCBI (GnomAD).

  • ancestral_allele: allele as described in the current assembly

  • variation_allele: difference to the current assembly

  • seqname: Chromosome RefSeq reference.

  • hgvs: Full hgvs notation for the variant.

  • assembly: Which assembly was used for the annotations.

  • ref_seq: Sequence in the reference assembly.

  • maf_population: A dataframe of all minor allele frequencies reported, with columns study, reference allele, alternative allele (minor) and minor allele frequency.
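
The "add 1" adjustment noted for the bp column can be sketched as follows. This assumes that the position comes from a 0-based SPDI string (sequence:position:deletion:insertion), as in the SPDI model cited in the References. The string below is made up for illustration and is not real API output.

```r
# Hypothetical SPDI string; the values are made up for illustration.
spdi <- "NC_000001.11:99:C:T"

parts <- strsplit(spdi, ":", fixed = TRUE)[[1]]
seqname <- parts[1]
zero_based <- as.integer(parts[2])

# Shift the 0-based position by one to get the 1-based base pair
# position, as shown on the dbSNP website.
bp <- zero_based + 1L
bp
```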

References

https://www.ncbi.nlm.nih.gov/projects/SNP/

https://pubmed.ncbi.nlm.nih.gov/31738401/ SPDI model

Examples

## Not run: 
## an example with both merged SNPs, non-SNV SNPs, regular SNPs,
## SNPs not found, microsatellite
SNPs <- c("rs332", "rs420358", "rs1837253", "rs1209415715", "rs111068718")
ncbi_snp_query(SNPs)
# ncbi_snp_query("123456") ##invalid: must prefix with 'rs'
ncbi_snp_query("rs420358")
ncbi_snp_query("rs332") # warning that it's merged into another, try that
ncbi_snp_query("rs121909001")
ncbi_snp_query("rs1837253")
ncbi_snp_query("rs1209415715")
ncbi_snp_query("rs111068718")
ncbi_snp_query(snps = "rs9970807")

ncbi_snp_query("rs121909001")
ncbi_snp_query("rs121909001", verbose = TRUE)

## End(Not run)
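
As the commented-out call above notes, identifiers must carry the 'rs' prefix. A quick check before querying can be sketched as below. is_valid_rsid() is a hypothetical helper, not part of rsnps, and the pattern is an assumption about what a well-formed rs number looks like.

```r
# Hypothetical helper: TRUE for strings of the form "rs" followed by digits.
is_valid_rsid <- function(x) {
  grepl("^rs[0-9]+$", x)
}

ids <- c("rs332", "123456", "rs1837253")
is_valid_rsid(ids)

# Keep only the well-formed identifiers before calling ncbi_snp_query().
ids[is_valid_rsid(ids)]
```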

Get openSNP phenotype data for one or multiple users.

Description

Get openSNP phenotype data for one or multiple users.

Usage

phenotypes(userid = NA, df = FALSE, ...)

Arguments

userid

ID of openSNP user.

df

Return data.frame (TRUE) or not (FALSE). Default: FALSE

...

Curl options passed on to crul::HttpClient

Value

List of phenotypes for specified user(s).

See Also

Other opensnp-fxns: allgensnp(), allphenotypes(), annotations(), download_users(), fetch_genotypes(), genotypes(), phenotypes_byid(), users()

Examples

## Not run: 
phenotypes(userid = 1)
phenotypes(userid = "1,6,8", df = TRUE)
phenotypes(userid = "1-8", df = TRUE)

# coerce to data.frame
library(plyr)
df <- ldply(phenotypes(userid = "1-8", df = TRUE))
head(df)
tail(df)

# pass on curl options
phenotypes(1, verbose = TRUE)

## End(Not run)

Get all openSNP known variations and all users sharing that phenotype for one phenotype(-ID).

Description

Get all openSNP known variations and all users sharing that phenotype for one phenotype(-ID).

Usage

phenotypes_byid(
  phenotypeid = NA,
  return_ = c("description", "knownvars", "users"),
  ...
)

Arguments

phenotypeid

ID of openSNP phenotype.

return_

What to return: one of "description", "knownvars", or "users".

...

Curl options passed on to crul::HttpClient

Value

List of description of phenotype, list of known variants, or data.frame of variants for each user with that phenotype.

See Also

Other opensnp-fxns: allgensnp(), allphenotypes(), annotations(), download_users(), fetch_genotypes(), genotypes(), phenotypes(), users()

Examples

## Not run: 
phenotypes_byid(phenotypeid = 12, return_ = "desc")
phenotypes_byid(phenotypeid = 12, return_ = "knownvars")
phenotypes_byid(phenotypeid = 12, return_ = "users")

# pass on curl options
phenotypes_byid(phenotypeid = 12, return_ = "desc", verbose = TRUE)

## End(Not run)
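
The examples above pass return_ = "desc" although the listed choice is "description". Base R's match.arg() resolves abbreviations in this way. The sketch below shows that behaviour on its own, on the assumption that the argument is matched in this style. pick_return() is a hypothetical stand-in, not the rsnps implementation.

```r
# Hypothetical stand-in showing partial matching with match.arg().
pick_return <- function(return_ = c("description", "knownvars", "users")) {
  match.arg(return_)
}

pick_return("desc")  # abbreviation resolves to the full choice
pick_return("users")
pick_return()        # no value given: the first choice is used
```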

Read in openSNP user files from local storage.

Description

Beware, these tables can be large. Check your RAM before executing, or read in only a subset of the data. This function reads in the whole kit and caboodle.

Usage

read_users(name = NULL, id = NULL, path = NULL, ...)

Arguments

name

User name

id

User id

path

Path to file to read from.

...

Parameters passed on to read.table()

Details

If you specify a name or id, this function reads environment variables written in the function download_users, and then searches against those variables for the path to the file saved. Alternatively, you can supply the path.
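
The lookup described above can be pictured with base R environment variables. The sketch below records a file path under a variable derived from the user id and reads it back later. The DEMO_USER_ prefix and both helper names are hypothetical, and the actual variable names used by download_users() may differ.

```r
# Hypothetical helpers; the variable naming scheme is an assumption.

# Record where the file for a given user id was saved.
remember_path <- function(id, path) {
  var <- paste0("DEMO_USER_", id)
  do.call(Sys.setenv, stats::setNames(list(path), var))
  invisible(var)
}

# Look the path up again; fail clearly if nothing was recorded.
lookup_path <- function(id) {
  path <- Sys.getenv(paste0("DEMO_USER_", id), unset = NA_character_)
  if (is.na(path)) stop("no file recorded for user id ", id)
  path
}

remember_path(14, file.path(tempdir(), "user14.txt"))
lookup_path(14)
```

Environment variables set this way last only for the current session, so supplying path directly is the fallback in a fresh session.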

Value

A data.frame.

Examples

## Not run: 
# dat <- read_users(name = "kevinmcc")
# head(dat)
# dat <- read_users(id = 285)

## End(Not run)

For use with usethis::use_release_issue()

Description

For use with usethis::use_release_issue()

Usage

release_bullets()

Defunct functions in rsnps

Description

  • LDSearch(): Function name changed to ld_search

  • ld_search(): The Broad Institute took the service down, see https://www.broadinstitute.org/snap/snap

  • NCBI_snp_query(): Function name changed to ncbi_snp_query

  • NCBI_snp_query2(): Function name changed to ncbi_snp_query

  • ncbi_snp_summary(): Function name changed to ncbi_snp_query

  • ncbi_snp_query2(): Function name changed to ncbi_snp_query


Get openSNP users.

Description

Get openSNP users.

Usage

users(df = FALSE, ...)

Arguments

df

Return data.frame (TRUE) or not (FALSE). Default: FALSE

...

Curl options passed on to crul::HttpClient

Value

List of openSNP users, their ID numbers, and links to their genotype data if available.

See Also

Other opensnp-fxns: allgensnp(), allphenotypes(), annotations(), download_users(), fetch_genotypes(), genotypes(), phenotypes_byid(), phenotypes()

Examples

## Not run: 
# just the list
data <- users(df = FALSE)
data

# get a data.frame of the users data
data <- users(df = TRUE)
data[[1]] # users with links to genome data
data[[2]] # users without links to genome data

## End(Not run)