Title: | author name disambiguation, author georeferencing, and mapping of coauthorship networks with 'Web of Science' data |
---|---|
Description: | Tools to parse and organize reference records downloaded from the 'Web of Science' citation database into an R-friendly format, disambiguate the names of authors, geocode their locations, and generate/visualize coauthorship networks. This package has been peer-reviewed by rOpenSci (v. 1.0). |
Authors: | Auriel M.V. Fournier [aut], Matthew E. Boone [aut], Forrest R. Stevens [aut], Emilio Bruna [aut, cre], Bianca Kramer [rev] (Kramer reviewed the package (v 1.0) for rOpenSci, see <https://github.com/ropensci/software-review/issues/256>), Najko Jahn [rev] (Jahn reviewed the package (v1.0) for rOpenSci, see <https://github.com/ropensci/software-review/issues/256>) |
Maintainer: | Emilio Bruna <[email protected]> |
License: | GPL-3 |
Version: | 1.0 |
Built: | 2025-01-12 05:38:14 UTC |
Source: | https://github.com/ropensci/refsplitr |
references_read
authors_clean
This function takes the output from references_read and cleans the author information.
authors_clean(references)
references |
output from |
Information on addresses, emails, ORCIDs, etc. is matched.
It then attempts to group entries for the same author into likely author groups based on common full names, addresses, emails, ORCIDs, etc.
For records that are not matched this way, a Jaro-Winkler similarity metric is calculated for all possible matching author names.
This metric scores two names by the number of characters they share, how far apart the shared characters sit, and the length of their common prefix.
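To make the similarity metric concrete, here is a minimal base-R sketch of the textbook Jaro-Winkler calculation. It is an illustration only and is not taken from refsplitr's source; the function names `jaro()` and `jaro_winkler()` are hypothetical, and the prefix weight `p = 0.1` with a prefix cap of four characters are the conventional defaults rather than values confirmed for this package.

```r
# Jaro similarity between two strings (base R only; illustrative sketch).
jaro <- function(a, b) {
  s1 <- strsplit(a, "")[[1]]
  s2 <- strsplit(b, "")[[1]]
  n1 <- length(s1)
  n2 <- length(s2)
  if (n1 == 0 && n2 == 0) return(1)
  if (n1 == 0 || n2 == 0) return(0)

  # Characters only count as matching if they sit within this window.
  window <- max(floor(max(n1, n2) / 2) - 1, 0)
  m1 <- rep(FALSE, n1)
  m2 <- rep(FALSE, n2)
  matches <- 0
  for (i in seq_len(n1)) {
    lo <- max(1, i - window)
    hi <- min(n2, i + window)
    if (lo > hi) next
    for (j in lo:hi) {
      if (!m2[j] && s1[i] == s2[j]) {
        m1[i] <- TRUE
        m2[j] <- TRUE
        matches <- matches + 1
        break
      }
    }
  }
  if (matches == 0) return(0)

  # Matched characters that appear in a different order, counted in pairs.
  transpositions <- sum(s1[m1] != s2[m2]) / 2
  (matches / n1 + matches / n2 + (matches - transpositions) / matches) / 3
}

# Jaro-Winkler: boost the Jaro score for a shared prefix (up to 4 characters).
jaro_winkler <- function(a, b, p = 0.1) {
  sim <- jaro(a, b)
  s1 <- strsplit(a, "")[[1]]
  s2 <- strsplit(b, "")[[1]]
  prefix <- 0
  for (i in seq_len(min(length(s1), length(s2), 4))) {
    if (s1[i] == s2[i]) prefix <- prefix + 1 else break
  }
  sim + prefix * p * (1 - sim)
}

score <- jaro_winkler("MARTHA", "MARHTA")
```

Scores run from 0 (nothing in common) to 1 (identical strings), so two spellings of one author's name that differ by a swapped pair of letters score close to 1.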
## Load the refsplitr sample dataset "BITR"
data(BITR)
BITR_clean <- authors_clean(BITR)

## The output of authors_clean is a list with two elements,
## which can be assigned to dataframes.
BITR_review_df <- BITR_clean$review
BITR_prelim_df <- BITR_clean$prelim

## Users can save these dataframes outside of R as .csv files.
## The "review_df.csv" is then used to review the groupID or authorID
## assignments and make any necessary corrections.
## The function "authors_refine" is used to load and merge the changes
## into R and create a dataframe used for analyses.
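The save, review, and reload cycle described in the example comments can be sketched with base R alone. The data frame below is a toy stand-in for the `review` element; apart from groupID and authorID, its columns and the file names are assumptions made for illustration.

```r
# Toy stand-in for the "review" element returned by authors_clean();
# the author_name column is a hypothetical placeholder.
review_df <- data.frame(
  authorID = c(1L, 2L, 3L),
  groupID = c(1L, 1L, 3L),
  author_name = c("Smith, J", "Smith, John", "Jones, A"),
  stringsAsFactors = FALSE
)

# Save outside of R for manual review (a temporary folder is used here).
review_path <- file.path(tempdir(), "review_df.csv")
write.csv(review_df, review_path, row.names = FALSE)

# ... edit groupID / authorID by hand, then save under a new file name ...
corrected_path <- file.path(tempdir(), "review_df_corrected.csv")
file.copy(review_path, corrected_path, overwrite = TRUE)

# Reload the corrected file; this object would then be supplied as the
# `review` argument of authors_refine().
review_corrected <- read.csv(corrected_path, stringsAsFactors = FALSE)
```

Writing with `row.names = FALSE` keeps the file free of an extra index column, so the reloaded table has the same columns as the one that was saved.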
authors_georef
This function takes the final author list from authors_refine and calculates the latitude and longitude of the addresses.
It does this by feeding the addresses into the Data Science Toolkit.
To maximize effectiveness and mitigate errors in parsing addresses,
the geocoding is run multiple times with the address assembled in different ways,
in the hope that the Google georeferencing API can recognize one of them:
1st. University, city, zipcode, country
2nd. City, zipcode, country
3rd. City, country
4th. University, country
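The four address forms listed above can be sketched as a small helper that builds the candidate strings in order of decreasing detail. This is an illustrative sketch, not refsplitr's code: `build_address_candidates()` and its argument names are hypothetical, and the sample values are made up for the demonstration.

```r
# Hypothetical helper: assemble the four candidate address strings,
# skipping any component that is missing or empty.
build_address_candidates <- function(university, city, postal_code, country) {
  join <- function(...) {
    parts <- c(...)
    parts <- parts[!is.na(parts) & nzchar(parts)]
    paste(parts, collapse = ", ")
  }
  c(
    join(university, city, postal_code, country),  # 1st
    join(city, postal_code, country),              # 2nd
    join(city, country),                           # 3rd
    join(university, country)                      # 4th
  )
}

candidates <- build_address_candidates(
  university = "Example University",
  city = "Gainesville",
  postal_code = "32611",
  country = "USA"
)
```

A geocoder would be tried on each string in turn, stopping at the first one that returns coordinates.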
authors_georef(data, address_column = "address")
data |
dataframe from |
address_column |
name of column in quotes where the addresses are |
The output is a list with three data.frames:
addresses is a data frame with all information from authors_refine plus new location columns and calculated latitudes and longitudes.
missing addresses is a data frame with all addresses that could not be geocoded.
The third is a data frame like addresses, except that the rows with missing addresses are removed.
## Not run:
BITR_georef_df <- authors_georef(BITR_refined, address_column = 'address')
## End(Not run)
authors_refine
This function takes the author list output after it has been
checked for incorrect author matches. It contains a
similarity score cutoff like read_authors; here, however, the cutoff serves to further
constrain the list. New values ARE NOT created; instead, the function filters by the
sim_score column in the output file.
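As a rough illustration of filtering on an existing sim_score column, the sketch below flags which rows of a toy table clear a cutoff. It rests on assumptions: the helper `passes_cutoff()` is hypothetical, the table is invented, and treating rows with no score (NA) as matched by other evidence is a guess about behaviour rather than something the documentation states.

```r
# Toy review table; authorID and groupID values are invented.
review <- data.frame(
  authorID = 1:4,
  groupID = c(1L, 1L, 3L, 3L),
  sim_score = c(NA, 0.95, NA, 0.82)
)

# Hypothetical helper: TRUE where a match clears the cutoff.
# Assumption: rows without a score (NA) are left untouched.
passes_cutoff <- function(sim_score, cutoff) {
  is.na(sim_score) | sim_score >= cutoff
}

# No scores are recomputed; the existing column is only compared
# against the chosen threshold.
review$keep_match <- passes_cutoff(review$sim_score, cutoff = 0.90)
```

With a cutoff of 0.90, the match scored 0.82 is the only one flagged for rejection; raising the cutoff rejects more of the similarity-based matches.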
authors_refine(review, prelim, sim_score = NULL, confidence = NULL)
review |
the |
prelim |
the |
sim_score |
similarity score cut off point. Number from 0-1. |
confidence |
confidence score cut off point. Number from 0 - 10. |
## First gather the authors data.frame from authors_clean
data(BITR)
BITR_authors <- authors_clean(BITR)
BITR_review_df <- BITR_authors$review
BITR_prelim_df <- BITR_authors$prelim

## If accepting the preliminary disambiguation
## from authors_clean() without review:
refine_df <- authors_refine(BITR_review_df, BITR_prelim_df,
                            sim_score = 0.90, confidence = 5)

## Note that 'sim_score' and 'confidence' are optional arguments and are
## only required if changing the default values.
refine_df <- authors_refine(BITR_review_df, BITR_prelim_df)

## If changes were made to groupID or authorID in the "_review.csv" file:
## then incorporate those changes in a text editor, save the corrections as
## a new file name, load in to R and run `authors_refine()` with the
## new corrections as the review argument.
A dataset containing 10 articles taken from the journal Biotropica.
This dataset represents the typical formatted output from references_read()
in the refsplitr package. It serves as a testbed for commonly miscategorized names.
BITR
A data frame with 10 rows and 32 variables:
the original filename the text was created from
the unique identifier given to each reference article by references_read()
Abstract
Full Names
Abbreviated names
Addresses
emails
Web of Science ID
OrcID
Reprint Address
Title
Web of Knowledge Unique ID
The remaining 20 variables are Web of Science field codes; they are described on the Web of Knowledge website: https://images.webofknowledge.com/images/help/WOS/hs_wos_fieldtags.html
A dataset containing 41 authors taken from the journal Biotropica.
This dataset represents the typical formatted output
from authors_georef()
in the refsplitr package. It serves as a useful testing data set for
spatial functions and
BITR_geocode
A data frame with 41 rows and 15 variables:
ID field populated in authors_clean
also can be considered institution for non-universities
character, international postcode
country name
numeric, latitude populated from authors_georef
numeric, longitude populated from authors_georef
ID field for the name group the author is identified with by authors_clean()
numeric, order of author from journal article
address of references pulled from the original raw WOS file
department which is nested within university
reprint address, pulled from the original raw WOS file
ResearcherID number, an identifier given by Web of Science only; less common than OrcID
OrcID, unique identifier for researcher given by https://orcid.org
unique identifier to each article, given by WOS
unique identifier for each article, given by references_read()
countries
a character vector of country names
This function plots an addresses data.frame object by country name.
plot_addresses_country(data, mapRegion = "world")
data |
address element from the output from the |
mapRegion |
what portion of the world map to show. possible values
include |
## Using the output of authors_georef (e.g., BITR_geocode)
data(BITR_geocode)

## Plots the whole world
plot_addresses_country(BITR_geocode)

## Just select North America
plot_addresses_country(BITR_geocode, mapRegion = 'North America')
This function plots an addresses data.frame object by point overlaid on the countries of the world.
plot_addresses_points(data, mapCountry = NULL)
data |
the |
mapCountry |
What country to map. Possible values
include |
## Using the output of authors_georef (e.g., BITR_geocode)
data(BITR_geocode)

## Plots the whole world
plot_addresses_points(BITR_geocode)

## mapCountry names can be queried using:
data(countries)

## Plot only Brazil
plot_addresses_points(BITR_geocode, mapCountry = 'Brazil')
This function takes an addresses data.frame, links it to an authors_references dataset and plots a network diagram generated for individual points of co-authorship.
plot_net_address(data, mapRegion = "world", lineResolution = 10, lineAlpha = 0.5)
data |
the |
mapRegion |
what portion of the world map to show. possible values
include |
lineResolution |
the resolution of the lines drawn; higher numbers make smoother curves. Default is 10. |
lineAlpha |
transparency of the lines, passed to ggplot2's alpha value. Number between 0 and 1. |
## Using the output of authors_georef (e.g., BITR_geocode)
data(BITR_geocode)

## Plots the whole world
output <- plot_net_address(BITR_geocode)

## Just select North America
output <- plot_net_address(BITR_geocode, mapRegion = 'North America')

## Change the transparency of lines by modifying the lineAlpha parameter
output <- plot_net_address(BITR_geocode, lineAlpha = 0.2)

## Change the curvature of lines by modifying the lineResolution parameter
output <- plot_net_address(BITR_geocode, lineResolution = 30)

## With all arguments:
output <- plot_net_address(BITR_geocode, mapRegion = 'North America',
                           lineAlpha = 0.2, lineResolution = 30)
Creates a network diagram of coauthors' countries linked by reference. This function takes an addresses data.frame, links it to an authors_references dataset and plots a network diagram generated for co-authorship.
plot_net_coauthor(data)
data |
the |
## Using the output of authors_georef (e.g., BITR_geocode)
data(BITR_geocode)
plot_net_coauthor(BITR_geocode)
This function takes an addresses data.frame, links it to an authors_references dataset and plots a network diagram generated for countries of co-authorship.
plot_net_country(data, lineResolution = 10, mapRegion = "world", lineAlpha = 0.5)
data |
the |
lineResolution |
the resolution of the lines drawn; higher numbers make smoother curves. Default is 10. |
mapRegion |
what portion of the world map to show. possible values
include |
lineAlpha |
transparency of the lines, passed to ggplot2's alpha value. Number between 0 and 1. |
## Using the output of authors_georef (e.g., BITR_geocode)
data(BITR_geocode)

## Plots the whole world
output <- plot_net_country(BITR_geocode)

## Mapping only North America
output <- plot_net_country(BITR_geocode, mapRegion = 'North America')

## Change the transparency of lines by modifying the lineAlpha parameter
output <- plot_net_country(BITR_geocode, lineAlpha = 0.2)

## Change the curvature of lines by modifying the lineResolution parameter
output <- plot_net_country(BITR_geocode, lineResolution = 30)

## With all arguments:
output <- plot_net_country(BITR_geocode, mapRegion = 'North America',
                           lineAlpha = 0.2, lineResolution = 30)
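To show what "countries of co-authorship" means as a network, the sketch below derives weighted country-to-country links from a toy address table, counting how many references connect each pair. It is a base-R illustration, not the package's implementation: `coauthor_country_edges()` is a hypothetical helper, and the column names `refID` and `country` are assumptions about the input.

```r
# Hypothetical helper: one edge per pair of distinct countries that
# appear on the same reference, weighted by the number of such references.
coauthor_country_edges <- function(df) {
  by_ref <- split(df$country, df$refID)
  pairs <- lapply(by_ref, function(cn) {
    cn <- sort(unique(cn))
    if (length(cn) < 2) return(NULL)  # single-country papers add no edge
    t(combn(cn, 2))
  })
  pairs <- do.call(rbind, pairs)
  if (is.null(pairs)) {
    return(data.frame(from = character(0), to = character(0),
                      weight = integer(0)))
  }
  edges <- data.frame(from = pairs[, 1], to = pairs[, 2], weight = 1L,
                      stringsAsFactors = FALSE)
  aggregate(weight ~ from + to, data = edges, FUN = sum)
}

# Invented example: three references with authors in one to three countries.
toy <- data.frame(
  refID = c(1, 1, 2, 2, 2, 3),
  country = c("Brazil", "USA", "Brazil", "USA", "UK", "Brazil"),
  stringsAsFactors = FALSE
)
edges <- coauthor_country_edges(toy)
```

Each row of `edges` corresponds to one line that a country-level network map would draw, and the weight could drive line width or opacity.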
references_read
This function reads Thomson Reuters Web of Knowledge
and ISI format reference data files into an R-friendly data format. The resulting dataframe
is the argument for the refsplitr function authors_clean().
references_read(data = ".", dir = FALSE, include_all = FALSE)
data |
the location of the file or files to be imported. This can be either the absolute or relative name of the file (for a single file) or folder (for multiple files stored in the same folder; used in conjunction with 'dir = TRUE'). If left blank it is assumed the location is the working directory. |
dir |
if FALSE it is assumed a single file is to be imported. Set to TRUE if importing multiple files (the path to the folder in which files are stored is set with 'data='; all files in the folder will be imported). Defaults to FALSE. |
include_all |
if FALSE only a subset of commonly used fields from references records are imported.
If TRUE then all fields from the reference records are imported. Defaults to FALSE.
The additional data fields included if |
## If a single file is being imported from a folder called "data" located in an RStudio Project:
## imported_refs <- references_read(data = './data/refs.txt', dir = FALSE, include_all = FALSE)

## If multiple files are being imported from a folder named "heliconia" nested within a folder
## called "data" located in an RStudio Project:
## heliconia_refs <- references_read(data = './data/heliconia', dir = TRUE, include_all = FALSE)

## To load the Web of Science records used in the examples in the documentation
BITR_data_example <- system.file('extdata', 'BITR_test.txt', package = 'refsplitr')
BITR <- references_read(BITR_data_example)
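The interplay between the data and dir arguments can be pictured with a small base-R sketch that works out which files a given combination points at. `resolve_reference_files()` is a hypothetical helper written for illustration; it is not part of refsplitr and may differ from how references_read() locates files internally.

```r
# Hypothetical helper: list the file(s) implied by a data/dir combination.
resolve_reference_files <- function(data = ".", dir = FALSE) {
  if (dir) {
    # Folder mode: every regular file in the folder is a candidate.
    files <- list.files(data, full.names = TRUE)
    files <- files[!file.info(files)$isdir]
  } else {
    # Single-file mode: data is taken to be the path of one file.
    files <- data
  }
  if (length(files) == 0 || !all(file.exists(files))) {
    stop("No reference files found at: ", data)
  }
  files
}

# Demonstration with placeholder files in a temporary folder.
demo_dir <- file.path(tempdir(), "heliconia_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines("placeholder", file.path(demo_dir, "refs1.txt"))
writeLines("placeholder", file.path(demo_dir, "refs2.txt"))

found <- resolve_reference_files(demo_dir, dir = TRUE)
single <- resolve_reference_files(found[1], dir = FALSE)
```

Checking that the paths exist before parsing gives a clearer error than a failed read when dir is set incorrectly for the path supplied.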