Title: | Checks for Exclusion Criteria in Online Data |
---|---|
Description: | Data that are collected through online sources such as Mechanical Turk may require excluding rows because of IP address duplication, geolocation, or completion duration. This package facilitates exclusion of these data for Qualtrics datasets. |
Authors: | Jeffrey R. Stevens [aut, cre, cph] , Joseph O'Brien [rev] , Julia Silge [rev] |
Maintainer: | Jeffrey R. Stevens |
License: | GPL (>= 3) |
Version: | 0.5.1 |
Built: | 2024-10-28 05:51:39 UTC |
Source: | https://github.com/ropensci/excluder |
The check_duplicates()
function subsets rows of data, retaining rows
that have the same IP address and/or same latitude and longitude. The
function is written to work with data from
Qualtrics surveys.
check_duplicates( x, id_col = "ResponseId", ip_col = "IPAddress", location_col = c("LocationLatitude", "LocationLongitude"), rename = TRUE, dupl_ip = TRUE, dupl_location = TRUE, include_na = FALSE, keep = FALSE, quiet = FALSE, print = TRUE )
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
id_col |
Column name for unique row ID (e.g., participant). |
ip_col |
Column name for IP addresses. |
location_col |
Two element vector specifying columns for latitude and longitude (in that order). |
rename |
Logical indicating whether to rename columns (using |
dupl_ip |
Logical indicating whether to check IP addresses. |
dupl_location |
Logical indicating whether to check latitude and longitude. |
include_na |
Logical indicating whether to include rows with NAs for IP address and location as potentially excluded rows. |
keep |
Logical indicating whether to keep or remove exclusion column. |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
To record this information in your Qualtrics survey, you must ensure that Anonymize responses is disabled.
Default column names are set based on output from qualtRics::fetch_survey().
By default, IP address and location are both checked, but they can be checked separately with the dupl_ip and dupl_location arguments.
The function outputs to console separate messages about the number of rows with duplicate IP addresses and rows with duplicate locations. These counts are computed independently, so rows may be counted for both types of duplicates.
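The independent counting described above can be illustrated with a small base-R sketch. It is not excluder's implementation: the helper flag_dupes() and the toy data are hypothetical, and a duplicate location is assumed to mean an identical latitude/longitude pair. Rows with missing values are set aside, as with include_na = FALSE.

```r
# Toy responses; IDs, IPs, and coordinates are made up for illustration.
responses <- data.frame(
  ResponseId = c("R1", "R2", "R3", "R4", "R5"),
  IPAddress = c("1.2.3.4", "1.2.3.4", "5.6.7.8", "9.9.9.9", NA),
  LocationLatitude = c(40.1, 35.0, 35.0, 41.2, 38.0),
  LocationLongitude = c(-96.7, -90.1, -90.1, -95.0, -97.0)
)

# Hypothetical helper: flag every member of a duplicated group (not only
# the later copies) by checking duplication from both directions.
# Missing keys are never flagged, mirroring include_na = FALSE.
flag_dupes <- function(key) {
  dup <- duplicated(key) | duplicated(key, fromLast = TRUE)
  dup & !is.na(key)
}

ip_dupe <- flag_dupes(responses$IPAddress)

# Build one key per latitude/longitude pair; NA if either is missing.
loc_key <- ifelse(
  is.na(responses$LocationLatitude) | is.na(responses$LocationLongitude),
  NA,
  paste(responses$LocationLatitude, responses$LocationLongitude)
)
loc_dupe <- flag_dupes(loc_key)

# The two counts are computed independently, so one row can add to both.
message(sum(ip_dupe), " rows with duplicate IP addresses")
message(sum(loc_dupe), " rows with duplicate locations")

dupes <- responses[ip_dupe | loc_dupe, ]
```

Here R2 shares an IP address with R1 and a location with R3, so it contributes to both counts.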
An object of the same type as x that includes the rows with duplicate IP addresses and/or locations. This includes a column called dupe_count that returns the number of duplicates.
For a function that marks these rows, use mark_duplicates().
For a function that excludes these rows, use exclude_duplicates().
Other duplicates functions: exclude_duplicates(), mark_duplicates()
Other check functions: check_duration(), check_ip(), check_location(), check_preview(), check_progress(), check_resolution()
# Check for duplicate IP addresses and locations
data(qualtrics_text)
check_duplicates(qualtrics_text)
# Check only for duplicate IP addresses
qualtrics_text %>%
  check_duplicates(dupl_location = FALSE)
# Do not print rows to console
qualtrics_text %>%
  check_duplicates(print = FALSE)
# Do not print message to console
qualtrics_text %>%
  check_duplicates(quiet = TRUE)
The check_duration()
function subsets rows of data, retaining rows
that have durations that are too fast or too slow.
The function is written to work with data from
Qualtrics surveys.
check_duration( x, min_duration = 10, max_duration = NULL, id_col = "ResponseId", duration_col = "Duration (in seconds)", rename = TRUE, keep = FALSE, quiet = FALSE, print = TRUE )
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
min_duration |
Duration threshold in seconds; rows faster than this are flagged as too fast. |
max_duration |
Duration threshold in seconds; rows slower than this are flagged as too slow. |
id_col |
Column name for unique row ID (e.g., participant). |
duration_col |
Column name for durations. |
rename |
Logical indicating whether to rename columns (using |
keep |
Logical indicating whether to keep or remove exclusion column. |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
Default column names are set based on output from qualtRics::fetch_survey().
By default, only the minimum duration is checked: rows faster than 10 seconds are flagged, and no maximum is applied (max_duration = NULL). Use the min_duration and max_duration arguments to set the thresholds for rows that are too fast and too slow. The function outputs to console separate messages about the number of rows that are too fast or too slow.
This function returns the fast and slow rows.
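The two thresholds can be pictured as a pair of independent comparisons. This base-R sketch is hypothetical (flag_duration() is not an excluder function) and assumes strict inequalities, so a duration exactly equal to a threshold is not flagged; a NULL threshold skips that side of the check.

```r
# Hypothetical helper: returns separate too-fast and too-slow flags.
# A NULL threshold disables that side of the check.
flag_duration <- function(duration, min_duration = 10, max_duration = NULL) {
  n <- length(duration)
  too_fast <- if (is.null(min_duration)) rep(FALSE, n) else duration < min_duration
  too_slow <- if (is.null(max_duration)) rep(FALSE, n) else duration > max_duration
  # Missing durations are never flagged in this sketch.
  list(too_fast = too_fast & !is.na(duration),
       too_slow = too_slow & !is.na(duration))
}

durations <- c(5, 120, 900, 60, 8)  # seconds, made-up values
flags <- flag_duration(durations, min_duration = 10, max_duration = 800)
message(sum(flags$too_fast), " rows too fast; ", sum(flags$too_slow), " rows too slow")
flagged_rows <- which(flags$too_fast | flags$too_slow)
```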
An object of the same type as x that includes the rows with fast and/or slow durations.
For a function that marks these rows, use mark_duration().
For a function that excludes these rows, use exclude_duration().
Other duration functions: exclude_duration(), mark_duration()
Other check functions: check_duplicates(), check_ip(), check_location(), check_preview(), check_progress(), check_resolution()
# Check for durations faster than 100 seconds
data(qualtrics_text)
check_duration(qualtrics_text, min_duration = 100)
# Remove preview data first
qualtrics_text %>%
  exclude_preview() %>%
  check_duration(min_duration = 100)
# Check for durations slower than 800 seconds (the default min_duration = 10 still applies)
qualtrics_text %>%
  exclude_preview() %>%
  check_duration(max_duration = 800)
# Do not print rows to console
qualtrics_text %>%
  exclude_preview() %>%
  check_duration(min_duration = 100, print = FALSE)
# Do not print message to console
qualtrics_text %>%
  exclude_preview() %>%
  check_duration(min_duration = 100, quiet = TRUE)
The check_ip()
function subsets rows of data, retaining rows
that have IP addresses from outside the specified country.
The function is written to work with data from
Qualtrics surveys.
check_ip( x, id_col = "ResponseId", ip_col = "IPAddress", rename = TRUE, country = "US", include_na = FALSE, keep = FALSE, quiet = FALSE, print = TRUE )
x |
Data frame or tibble (preferably imported from Qualtrics using {qualtRics}). |
id_col |
Column name for unique row ID (e.g., participant). |
ip_col |
Column name for IP addresses. |
rename |
Logical indicating whether to rename columns (using |
country |
Two-letter (ISO 3166-1 alpha-2) code of the country to check (default is "US"). |
include_na |
Logical indicating whether to include rows with NA in IP address column in the output list of potentially excluded data. |
keep |
Logical indicating whether to keep or remove exclusion column. |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
To record this information in your Qualtrics survey, you must ensure that Anonymize responses is disabled.
Default column names are set based on output from qualtRics::fetch_survey().
The function uses ipaddress::country_networks() to assign IP addresses to specific countries using ISO 3166-1 alpha-2 country codes.
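Assigning an address to a country amounts to testing whether it falls inside one of that country's network blocks. The base-R sketch below shows that membership test for IPv4 CIDR blocks only; the helpers and the two-row networks table are hypothetical (the blocks are documentation-reserved ranges with made-up country labels), and real data would come from a source such as ipaddress::country_networks().

```r
# Convert dotted-quad IPv4 strings to numbers; doubles avoid 32-bit overflow.
ipv4_to_num <- function(ip) {
  parts <- strsplit(ip, ".", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * 256^(3:0)), numeric(1))
}

# TRUE where an address lies in one CIDR block such as "192.0.2.0/24":
# both values must agree once the host bits are divided away.
in_cidr <- function(ip, cidr) {
  pieces <- strsplit(cidr, "/", fixed = TRUE)[[1]]
  block_size <- 2^(32 - as.integer(pieces[2]))
  floor(ipv4_to_num(ip) / block_size) == floor(ipv4_to_num(pieces[1]) / block_size)
}

# TRUE where an address lies in any of several blocks.
in_any <- function(ip, cidrs) {
  Reduce(`|`, lapply(cidrs, function(block) in_cidr(ip, block)))
}

# Hypothetical lookup table with made-up country assignments.
networks <- data.frame(
  country = c("US", "DE"),
  network = c("198.51.100.0/24", "203.0.113.0/25")
)

ips <- c("198.51.100.7", "203.0.113.200", "203.0.113.5")
us_blocks <- networks$network[networks$country == "US"]
outside <- ips[!in_any(ips, us_blocks)]
```

IPv6 addresses would need a different representation; this sketch leaves them out.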
The function outputs to console a message about the number of rows with IP addresses outside of the specified country. If there are NAs for IP addresses (likely due to including preview data; see check_preview()), it will print a message alerting to the number of rows with NAs.
An object of the same type as x that includes the rows with IP addresses outside of the specified country.
For a function that marks these rows, use mark_ip().
For a function that excludes these rows, use exclude_ip().
This function requires internet connectivity because it uses the ipaddress::country_networks() function, which pulls daily updated data from https://www.iwik.org/ipcountry/. The data are downloaded only once per session; the results are cached and reused for the rest of the session.
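Downloading once and reusing the result for the rest of the session is a memoization pattern. The sketch below shows one common way to get that behavior in R, with an environment as the cache; it is not how ipaddress implements its cache, and cached_networks() and fake_fetch() are hypothetical stand-ins for the real download.

```r
# Session-level cache: an environment that lives as long as the R session.
.network_cache <- new.env(parent = emptyenv())

# Hypothetical wrapper: `fetch` stands in for the expensive download
# (for example, a call to ipaddress::country_networks()).
cached_networks <- function(country, fetch) {
  if (!exists(country, envir = .network_cache, inherits = FALSE)) {
    assign(country, fetch(country), envir = .network_cache)
  }
  get(country, envir = .network_cache, inherits = FALSE)
}

# Stand-in fetcher that counts how many times it actually runs.
calls <- 0
fake_fetch <- function(country) {
  calls <<- calls + 1
  paste0(country, "-networks")
}

cached_networks("US", fake_fetch)
cached_networks("US", fake_fetch)  # served from the cache; fake_fetch is not called again
```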
Other ip functions: exclude_ip(), mark_ip()
Other check functions: check_duplicates(), check_duration(), check_location(), check_preview(), check_progress(), check_resolution()
# Check for IP addresses outside of the US
data(qualtrics_text)
check_ip(qualtrics_text)
# Remove preview data first
qualtrics_text %>%
  exclude_preview() %>%
  check_ip()
# Check for IP addresses outside of Germany
qualtrics_text %>%
  exclude_preview() %>%
  check_ip(country = "DE")
# Do not print rows to console
qualtrics_text %>%
  exclude_preview() %>%
  check_ip(print = FALSE)
# Do not print message to console
qualtrics_text %>%
  exclude_preview() %>%
  check_ip(quiet = TRUE)
The check_location()
function subsets rows of data, retaining rows
that have locations outside of the US.
The function is written to work with data from
Qualtrics surveys.
check_location( x, id_col = "ResponseId", location_col = c("LocationLatitude", "LocationLongitude"), rename = TRUE, include_na = FALSE, keep = FALSE, quiet = FALSE, print = TRUE )
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
id_col |
Column name for unique row ID (e.g., participant). |
location_col |
Two element vector specifying columns for latitude and longitude (in that order). |
rename |
Logical indicating whether to rename columns (using |
include_na |
Logical indicating whether to include rows with NA in latitude and longitude columns in the output list of potentially excluded data. |
keep |
Logical indicating whether to keep or remove exclusion column. |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
To record this information in your Qualtrics survey, you must ensure that Anonymize responses is disabled.
Default column names are set based on output from qualtRics::fetch_survey().
The function only works for the United States. It uses maps::map.where() to determine whether latitude and longitude are inside the US.
The function outputs to console a message about the number of rows with locations outside of the US.
The output is a data frame of the rows that are located outside of the US and (if include_na == TRUE) rows with no location information.
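How include_na changes the flagged set can be sketched in base R. This assumes, as the include_na argument description states, that include_na = TRUE adds rows with missing coordinates to the potentially excluded rows. The outside_us vector is a made-up stand-in for the result of a geographic lookup such as maps::map.where(), and flag_location() is hypothetical.

```r
# Made-up coordinates; R2 has no location information.
locations <- data.frame(
  ResponseId = c("R1", "R2", "R3", "R4"),
  LocationLatitude = c(40.8, NA, 51.5, 39.0),
  LocationLongitude = c(-96.7, NA, -0.1, -77.0)
)
# Stand-in for a geographic lookup: TRUE = outside the US, NA = unknown.
outside_us <- c(FALSE, NA, TRUE, FALSE)

missing_location <- is.na(locations$LocationLatitude) |
  is.na(locations$LocationLongitude)

# Hypothetical helper: rows with known coordinates outside the US are always
# flagged; rows with missing coordinates only when include_na = TRUE.
flag_location <- function(outside, missing, include_na = FALSE) {
  flagged <- !missing & outside %in% TRUE
  if (include_na) flagged <- flagged | missing
  flagged
}

locations$ResponseId[flag_location(outside_us, missing_location)]
locations$ResponseId[flag_location(outside_us, missing_location, include_na = TRUE)]
```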
For a function that marks these rows, use mark_location().
For a function that excludes these rows, use exclude_location().
Other location functions: exclude_location(), mark_location()
Other check functions: check_duplicates(), check_duration(), check_ip(), check_preview(), check_progress(), check_resolution()
# Check for locations outside of the US
data(qualtrics_text)
check_location(qualtrics_text)
# Remove preview data first
qualtrics_text %>%
  exclude_preview() %>%
  check_location()
# Do not print rows to console
qualtrics_text %>%
  exclude_preview() %>%
  check_location(print = FALSE)
# Do not print message to console
qualtrics_text %>%
  exclude_preview() %>%
  check_location(quiet = TRUE)
The check_preview()
function subsets rows of data, retaining rows
that are survey previews.
The function is written to work with data from
Qualtrics surveys.
check_preview( x, id_col = "ResponseId", preview_col = "Status", rename = TRUE, keep = FALSE, quiet = FALSE, print = TRUE )
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
id_col |
Column name for unique row ID (e.g., participant). |
preview_col |
Column name for survey preview. |
rename |
Logical indicating whether to rename columns (using |
keep |
Logical indicating whether to keep or remove exclusion column. |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
Default column names are set based on output from qualtRics::fetch_survey().
The preview column in Qualtrics can be a numeric or character vector
depending on whether it is exported as choice text or numeric values.
This function works for both.
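Handling both export types comes down to choosing the comparison by column type. A minimal base-R sketch, assuming previews are labelled "Survey Preview" in choice-text exports and coded as 1 in numeric exports; is_preview() is hypothetical, not an excluder function.

```r
# Hypothetical helper: pick the comparison based on the column's type.
# %in% returns FALSE (not NA) for missing values.
is_preview <- function(status) {
  if (is.numeric(status)) status %in% 1 else status %in% "Survey Preview"
}

status_text <- c("IP Address", "Survey Preview", "IP Address")
status_numeric <- c(0, 1, 0)

is_preview(status_text)
is_preview(status_numeric)
```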
The function outputs to console a message about the number of rows that are survey previews.
The output is a data frame of the rows
that are survey previews.
For a function that marks these rows, use mark_preview().
For a function that excludes these rows, use exclude_preview().
Other preview functions: exclude_preview(), mark_preview()
Other check functions: check_duplicates(), check_duration(), check_ip(), check_location(), check_progress(), check_resolution()
# Check for survey previews
data(qualtrics_text)
check_preview(qualtrics_text)
# Works for Qualtrics data exported as numeric values, too
qualtrics_numeric %>%
  check_preview()
# Do not print rows to console
qualtrics_text %>%
  check_preview(print = FALSE)
# Do not print message to console
qualtrics_text %>%
  check_preview(quiet = TRUE)
The check_progress()
function subsets rows of data, retaining rows
that have incomplete progress.
The function is written to work with data from
Qualtrics surveys.
check_progress( x, min_progress = 100, id_col = "ResponseId", finished_col = "Finished", progress_col = "Progress", rename = TRUE, keep = FALSE, quiet = FALSE, print = TRUE )
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
min_progress |
Minimum percentage of survey progress considered acceptable to include (default 100). |
id_col |
Column name for unique row ID (e.g., participant). |
finished_col |
Column name for whether survey was completed. |
progress_col |
Column name for percentage of survey completed. |
rename |
Logical indicating whether to rename columns (using |
keep |
Logical indicating whether to keep or remove exclusion column. |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
Default column names are set based on output from qualtRics::fetch_survey().
The default requires 100% completion, but lower levels of completion may be acceptable and can be allowed by specifying the min_progress argument.
The finished column in Qualtrics can be a numeric or character vector
depending on whether it is exported as choice text or numeric values.
This function works for both.
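A base-R sketch of the two ideas in this passage: normalizing a Finished column that may arrive as text or numbers, and comparing Progress against min_progress. The rule that a row is incomplete when Progress is below min_progress is an assumption made for illustration, and as_finished() and flag_incomplete() are hypothetical helpers, not excluder functions.

```r
# Hypothetical helper: normalize Finished from logical, "True"/"False" text,
# or 1/0 numbers to a logical vector.
as_finished <- function(finished) {
  if (is.numeric(finished)) finished != 0 else as.logical(finished)
}

# Assumed rule for this sketch: incomplete means Progress below min_progress.
flag_incomplete <- function(progress, min_progress = 100) {
  progress < min_progress
}

progress <- c(100, 98, 45, 100)
finished_text <- c("True", "False", "False", "True")
finished_num <- c(1, 0, 0, 1)

as_finished(finished_text)
flag_incomplete(progress)                     # default: anything under 100%
flag_incomplete(progress, min_progress = 98)  # 98% now counts as acceptable
```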
The function outputs to console a message about the number of rows that have incomplete progress.
The output is a data frame of the rows
that have incomplete progress.
For a function that marks these rows, use mark_progress().
For a function that excludes these rows, use exclude_progress().
Other progress functions: exclude_progress(), mark_progress()
Other check functions: check_duplicates(), check_duration(), check_ip(), check_location(), check_preview(), check_resolution()
# Check for rows with incomplete progress
data(qualtrics_text)
check_progress(qualtrics_text)
# Remove preview data first
qualtrics_text %>%
  exclude_preview() %>%
  check_progress()
# Include a lower acceptable completion percentage
qualtrics_numeric %>%
  exclude_preview() %>%
  check_progress(min_progress = 98)
# Do not print rows to console
qualtrics_text %>%
  exclude_preview() %>%
  check_progress(print = FALSE)
# Do not print message to console
qualtrics_text %>%
  exclude_preview() %>%
  check_progress(quiet = TRUE)
The check_resolution()
function subsets rows of data, retaining rows
that have unacceptable screen resolution. This can be used, for example, to
identify data collected via phones when desktop monitors are required.
The function is written to work with data from
Qualtrics surveys.
check_resolution( x, res_min = 1000, width_min = 0, height_min = 0, id_col = "ResponseId", res_col = "Resolution", rename = TRUE, keep = FALSE, quiet = FALSE, print = TRUE )
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
res_min |
Minimum acceptable screen resolution (width and height). |
width_min |
Minimum acceptable screen width. |
height_min |
Minimum acceptable screen height. |
id_col |
Column name for unique row ID (e.g., participant). |
res_col |
Column name for screen resolution (in the format widthxheight, e.g., 1920x1080). |
rename |
Logical indicating whether to rename columns (using |
keep |
Logical indicating whether to keep or remove exclusion column. |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
To record this information in your Qualtrics survey, you must insert a meta info question.
Default column names are set based on output from qualtRics::fetch_survey().
The function outputs to console a message about the number of rows with unacceptable screen resolution.
The output is a data frame of the rows that have unacceptable screen
resolutions. This includes new columns for resolution width and height.
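Splitting the resolution string into width and height columns can be sketched in base R as follows. parse_resolution() is hypothetical, and the sketch applies only width and height minimums (with illustrative values, not the package defaults); how res_min combines the two dimensions is not reproduced here.

```r
# Hypothetical helper: split "widthxheight" strings (e.g. "1920x1080")
# into numeric width and height columns.
parse_resolution <- function(res) {
  parts <- strsplit(res, "x", fixed = TRUE)
  data.frame(
    width = vapply(parts, function(p) as.numeric(p[1]), numeric(1)),
    height = vapply(parts, function(p) as.numeric(p[2]), numeric(1))
  )
}

res <- parse_resolution(c("1920x1080", "390x844", "1366x768"))

# Illustrative thresholds: flag screens narrower than 1000 pixels.
width_min <- 1000
height_min <- 0
too_small <- res$width < width_min | res$height < height_min
res[too_small, ]
```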
For a function that marks these rows, use mark_resolution().
For a function that excludes these rows, use exclude_resolution().
Other resolution functions: exclude_resolution(), mark_resolution()
Other check functions: check_duplicates(), check_duration(), check_ip(), check_location(), check_preview(), check_progress()
# Check for unacceptable screen resolutions
data(qualtrics_text)
check_resolution(qualtrics_text)
# Remove preview data first
qualtrics_text %>%
  exclude_preview() %>%
  check_resolution()
# Do not print rows to console
qualtrics_text %>%
  exclude_preview() %>%
  check_resolution(print = FALSE)
# Do not print message to console
qualtrics_text %>%
  exclude_preview() %>%
  check_resolution(quiet = TRUE)
The deidentify()
function selects out columns from
Qualtrics surveys that may include identifiable
information such as IP address, location, or computer characteristics.
deidentify(x, strict = TRUE)
x |
Data frame (downloaded from Qualtrics). |
strict |
Logical indicating whether to use strict or non-strict level of deidentification. Strict removes computer information columns in addition to IP address and location. |
The function offers two levels of deidentification. The default strict level removes columns associated with IP address and location and computer information (browser type and version, operating system, and screen resolution). The non-strict level removes only columns associated with IP address and location.
Typically, deidentification should be used at the end of a processing pipeline so that these columns can be used to exclude rows.
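The two levels can be pictured as dropping one or two groups of columns. In this base-R sketch the column groups are hypothetical (the actual computer-information column names produced by Qualtrics and qualtRics may differ), and drop_identifiers() is not the package's deidentify().

```r
# Hypothetical column groups; real meta info column names may differ.
ip_location_cols <- c("IPAddress", "LocationLatitude", "LocationLongitude")
computer_cols <- c("Browser", "Version", "Operating System", "Resolution")

# Strict drops both groups; non-strict drops only IP address and location.
drop_identifiers <- function(x, strict = TRUE) {
  cols <- if (strict) c(ip_location_cols, computer_cols) else ip_location_cols
  x[, setdiff(names(x), cols), drop = FALSE]
}

survey <- data.frame(
  ResponseId = "R1", IPAddress = "198.51.100.7",
  LocationLatitude = 40.8, LocationLongitude = -96.7,
  Resolution = "1920x1080", Q1 = 3
)
names(drop_identifiers(survey))                  # strict: Resolution is removed too
names(drop_identifiers(survey, strict = FALSE))  # non-strict: Resolution is kept
```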
An object of the same type as x
that excludes Qualtrics columns with
identifiable information.
names(qualtrics_numeric)
# Remove IP address, location, and computer information columns
deid <- deidentify(qualtrics_numeric)
names(deid)
# Remove only IP address and location columns
deid2 <- deidentify(qualtrics_numeric, strict = FALSE)
names(deid2)
The exclude_duplicates()
function removes
rows of data that have the same IP address and/or same latitude and
longitude. The function is written to work with data from
Qualtrics surveys.
exclude_duplicates( x, id_col = "ResponseId", ip_col = "IPAddress", location_col = c("LocationLatitude", "LocationLongitude"), rename = TRUE, dupl_ip = TRUE, dupl_location = TRUE, include_na = FALSE, quiet = TRUE, print = TRUE, silent = FALSE )
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
id_col |
Column name for unique row ID (e.g., participant). |
ip_col |
Column name for IP addresses. |
location_col |
Two element vector specifying columns for latitude and longitude (in that order). |
rename |
Logical indicating whether to rename columns (using |
dupl_ip |
Logical indicating whether to check IP addresses. |
dupl_location |
Logical indicating whether to check latitude and longitude. |
include_na |
Logical indicating whether to include rows with NAs for IP address and location as potentially excluded rows. |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
silent |
Logical indicating whether to print message to console. Note that this argument controls the exclude message, not the check message. |
To record this information in your Qualtrics survey, you must ensure that Anonymize responses is disabled.
Default column names are set based on output from qualtRics::fetch_survey().
By default, IP address and location are both checked, but they can be checked separately with the dupl_ip and dupl_location arguments.
The function outputs to console separate messages about the number of rows with duplicate IP addresses and rows with duplicate locations. These counts are computed independently, so rows may be counted for both types of duplicates.
An object of the same type as x that excludes rows with duplicate IP addresses and/or locations.
For a function that just checks for and returns duplicate rows, use check_duplicates(). For a function that marks these rows, use mark_duplicates().
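The check, mark, and exclude variants differ mainly in what happens to the flagged rows. A minimal base-R sketch of that relationship, assuming flagged rows are identified by ResponseId; the flagged IDs are made up, and the column name exclusion_flag is hypothetical, not necessarily what mark_duplicates() adds.

```r
# Made-up responses plus the IDs a check step might return.
responses <- data.frame(ResponseId = c("R1", "R2", "R3"), Q1 = c(4, 2, 5))
flagged_ids <- c("R2")

# "Check" style: return only the flagged rows.
checked <- responses[responses$ResponseId %in% flagged_ids, , drop = FALSE]

# "Mark" style: keep every row and add a logical flag column.
marked <- responses
marked$exclusion_flag <- marked$ResponseId %in% flagged_ids

# "Exclude" style: drop the flagged rows.
kept <- responses[!responses$ResponseId %in% flagged_ids, , drop = FALSE]
```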
Other duplicates functions: check_duplicates(), mark_duplicates()
Other exclude functions: exclude_duration(), exclude_ip(), exclude_location(), exclude_preview(), exclude_progress(), exclude_resolution()
# Exclude duplicate IP addresses and locations
data(qualtrics_text)
df <- exclude_duplicates(qualtrics_text)
# Remove preview data first
df <- qualtrics_text %>%
  exclude_preview() %>%
  exclude_duplicates()
# Exclude only duplicate IP addresses
df <- qualtrics_text %>%
  exclude_preview() %>%
  exclude_duplicates(dupl_location = FALSE)
The exclude_duration()
function removes
rows of data that have durations that are too fast or too slow.
The function is written to work with data from
Qualtrics surveys.
exclude_duration( x, min_duration = 10, max_duration = NULL, id_col = "ResponseId", duration_col = "Duration (in seconds)", rename = TRUE, quiet = TRUE, print = TRUE, silent = FALSE )
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
min_duration |
Duration threshold in seconds; rows faster than this are flagged as too fast. |
max_duration |
Duration threshold in seconds; rows slower than this are flagged as too slow. |
id_col |
Column name for unique row ID (e.g., participant). |
duration_col |
Column name for durations. |
rename |
Logical indicating whether to rename columns (using |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
silent |
Logical indicating whether to print message to console. Note that this argument controls the exclude message, not the check message. |
Default column names are set based on output from qualtRics::fetch_survey().
By default, only the minimum duration is checked: rows faster than 10 seconds are flagged, and no maximum is applied (max_duration = NULL). Use the min_duration and max_duration arguments to set the thresholds for rows that are too fast and too slow. The function outputs to console separate messages about the number of rows that are too fast or too slow.
This function removes the fast and slow rows.
An object of the same type as x that excludes rows with fast and/or slow durations.
For a function that checks for these rows, use check_duration().
For a function that marks these rows, use mark_duration().
Other duration functions: check_duration(), mark_duration()
Other exclude functions: exclude_duplicates(), exclude_ip(), exclude_location(), exclude_preview(), exclude_progress(), exclude_resolution()
# Exclude durations faster than 100 seconds
data(qualtrics_text)
df <- exclude_duration(qualtrics_text, min_duration = 100)
# Remove preview data first
df <- qualtrics_text %>%
  exclude_preview() %>%
  exclude_duration()
# Exclude durations slower than 800 seconds (the default min_duration = 10 still applies)
df <- qualtrics_text %>%
  exclude_preview() %>%
  exclude_duration(max_duration = 800)
The exclude_ip()
function removes rows of data that have
IP addresses from outside the specified country.
The function is written to work with data from
Qualtrics surveys.
exclude_ip( x, id_col = "ResponseId", ip_col = "IPAddress", rename = TRUE, country = "US", include_na = FALSE, quiet = TRUE, print = TRUE, silent = FALSE )
x |
Data frame or tibble (preferably imported from Qualtrics using {qualtRics}). |
id_col |
Column name for unique row ID (e.g., participant). |
ip_col |
Column name for IP addresses. |
rename |
Logical indicating whether to rename columns (using |
country |
Two-letter (ISO 3166-1 alpha-2) code of the country to check (default is "US"). |
include_na |
Logical indicating whether to include rows with NA in IP address column in the output list of potentially excluded data. |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
silent |
Logical indicating whether to print message to console. Note that this argument controls the exclude message, not the check message. |
To record this information in your Qualtrics survey, you must ensure that Anonymize responses is disabled.
Default column names are set based on output from qualtRics::fetch_survey().
The function uses ipaddress::country_networks() to assign IP addresses to specific countries using ISO 3166-1 alpha-2 country codes.
The function outputs to console a message about the number of rows with IP addresses outside of the specified country. If there are NAs for IP addresses (likely due to including preview data; see check_preview()), it will print a message alerting to the number of rows with NAs.
An object of the same type as x that excludes rows with IP addresses outside of the specified country.
For a function that checks these rows, use check_ip().
For a function that marks these rows, use mark_ip().
This function requires internet connectivity because it uses the ipaddress::country_networks() function, which pulls daily updated data from http://www.iwik.org/ipcountry/. The data are downloaded only once per session; the results are cached and reused for the rest of the session.
Other ip functions: check_ip(), mark_ip()
Other exclude functions: exclude_duplicates(), exclude_duration(), exclude_location(), exclude_preview(), exclude_progress(), exclude_resolution()
# Exclude IP addresses outside of the US
data(qualtrics_text)
df <- exclude_ip(qualtrics_text)
# Remove preview data first
df <- qualtrics_text %>%
  exclude_preview() %>%
  exclude_ip()
# Exclude IP addresses outside of Germany
df <- qualtrics_text %>%
  exclude_preview() %>%
  exclude_ip(country = "DE")
The exclude_location()
function removes
rows that have locations outside of the US.
The function is written to work with data from
Qualtrics surveys.
exclude_location( x, id_col = "ResponseId", location_col = c("LocationLatitude", "LocationLongitude"), rename = TRUE, include_na = FALSE, quiet = TRUE, print = TRUE, silent = FALSE )
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
id_col |
Column name for unique row ID (e.g., participant). |
location_col |
Two-element vector specifying columns for latitude and longitude (in that order). |
rename |
Logical indicating whether to rename columns (using |
include_na |
Logical indicating whether to include rows with NA in latitude and longitude columns in the output list of potentially excluded data. |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
silent |
Logical indicating whether to print message to console. Note that this argument controls the exclude message, not the check message. |
To record this information in your Qualtrics survey, you must ensure that Anonymize responses is disabled.
Default column names are based on the output of qualtRics::fetch_survey().
The function only works for the United States.
It uses maps::map.where() to determine if latitude and longitude
are inside the US.
The function outputs to console a message about the number of rows with locations outside of the US.
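Deciding whether a coordinate lies inside a region is a point-in-polygon problem. maps::map.where() answers it against real map boundaries; the base R sketch below instead uses an even-odd ray-casting test against a hypothetical rectangle that only roughly covers the contiguous US, so it illustrates the idea rather than reproducing the package's results. The names point_in_polygon() and flag_outside_region() are illustrative, and rows with a missing latitude or longitude return NA.

```r
# Hypothetical sketch: even-odd ray casting, not maps::map.where().
point_in_polygon <- function(x, y, poly_x, poly_y) {
  inside <- FALSE
  j <- length(poly_x)
  for (i in seq_along(poly_x)) {
    # Does a horizontal ray from (x, y) cross the edge between vertices j and i?
    crosses <- ((poly_y[i] > y) != (poly_y[j] > y)) &&
      (x < (poly_x[j] - poly_x[i]) * (y - poly_y[i]) /
         (poly_y[j] - poly_y[i]) + poly_x[i])
    if (crosses) inside <- !inside
    j <- i
  }
  inside
}

# TRUE = outside the polygon, FALSE = inside, NA = missing coordinates.
flag_outside_region <- function(lat, lon, poly_lon, poly_lat) {
  mapply(function(la, lo) {
    if (is.na(la) || is.na(lo)) return(NA)
    !point_in_polygon(lo, la, poly_lon, poly_lat)
  }, lat, lon)
}

# Crude rectangle around the contiguous US (hypothetical, not real borders).
box_lon <- c(-125, -66, -66, -125)
box_lat <- c(24, 24, 50, 50)

flags <- flag_outside_region(c(40.7, 48.1, NA), c(-74.0, 11.6, -90),
                             box_lon, box_lat)
```

Longitude plays the role of x and latitude the role of y, matching the (latitude, longitude) column order only after swapping them in the call.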
An object of the same type as x
that excludes rows
that are located outside of the US and (if include_na == FALSE) rows with
no location information.
For a function that checks for these rows, use check_location().
For a function that marks these rows, use mark_location().
Other location functions: check_location(), mark_location()
Other exclude functions: exclude_duplicates(), exclude_duration(), exclude_ip(), exclude_preview(), exclude_progress(), exclude_resolution()
library(excluder)
# Exclude locations outside of the US
data(qualtrics_text)
df <- exclude_location(qualtrics_text)
# Remove preview data first
df <- qualtrics_text %>%
  exclude_preview() %>%
  exclude_location()
The exclude_preview()
function removes
rows that are survey previews.
The function is written to work with data from
Qualtrics surveys.
exclude_preview( x, id_col = "ResponseId", preview_col = "Status", rename = TRUE, quiet = TRUE, print = TRUE, silent = FALSE )
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
id_col |
Column name for unique row ID (e.g., participant). |
preview_col |
Column name for survey preview. |
rename |
Logical indicating whether to rename columns (using |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
silent |
Logical indicating whether to print message to console. Note that this argument controls the exclude message, not the check message. |
Default column names are based on the output of qualtRics::fetch_survey().
The preview column in Qualtrics can be a numeric or character vector
depending on whether it is exported as choice text or numeric values.
This function works for both.
The function outputs to console a message about the number of rows that are survey previews.
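Handling both export formats can be sketched in a few lines of base R. This is a hypothetical illustration, not the package's code; it relies only on the encodings listed in the dataset documentation (numeric exports use 1 for previews and 0 for real responses, choice-text exports use "Survey Preview" versus "IP Address"), and is_preview() is an illustrative name.

```r
# Hypothetical sketch, not the package's code: flag survey previews in the
# Status column whether it was exported as numeric values or as choice text.
is_preview <- function(status) {
  if (is.numeric(status)) {
    status == 1                                # numeric export: 1 = preview
  } else {
    as.character(status) == "Survey Preview"   # choice-text export
  }
}

numeric_status <- c(0, 1, 0)
text_status <- c("IP Address", "Survey Preview", "IP Address")

# Excluding previews keeps the rows where is_preview() is FALSE.
df <- data.frame(id = 1:3, Status = text_status, stringsAsFactors = FALSE)
kept <- df[!is_preview(df$Status), ]
```

Branching on the column type once keeps the comparison vectorized over all rows.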
An object of the same type as x
that excludes rows
that are survey previews.
For a function that checks for these rows, use check_preview().
For a function that marks these rows, use mark_preview().
Other preview functions: check_preview(), mark_preview()
Other exclude functions: exclude_duplicates(), exclude_duration(), exclude_ip(), exclude_location(), exclude_progress(), exclude_resolution()
library(excluder)
# Exclude survey previews
data(qualtrics_text)
df <- exclude_preview(qualtrics_text)
# Works for Qualtrics data exported as numeric values, too
df <- qualtrics_numeric %>%
  exclude_preview()
# Do not print rows to console
df <- qualtrics_text %>%
  exclude_preview(print = FALSE)
The exclude_progress()
function removes
rows that have incomplete progress.
The function is written to work with data from
Qualtrics surveys.
exclude_progress( x, min_progress = 100, id_col = "ResponseId", finished_col = "Finished", progress_col = "Progress", rename = TRUE, quiet = TRUE, print = TRUE, silent = FALSE )
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
min_progress |
Amount of progress considered acceptable to include. |
id_col |
Column name for unique row ID (e.g., participant). |
finished_col |
Column name for whether survey was completed. |
progress_col |
Column name for percentage of survey completed. |
rename |
Logical indicating whether to rename columns (using |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
silent |
Logical indicating whether to print message to console. Note that this argument controls the exclude message, not the check message. |
Default column names are based on the output of qualtRics::fetch_survey().
The default requires 100% completion, but lower levels of completion
may be acceptable and can be allowed by specifying the min_progress
argument.
The finished column in Qualtrics can be a numeric or character vector
depending on whether it is exported as choice text or numeric values.
This function works for both.
The function outputs to console a message about the number of rows that have incomplete progress.
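The progress threshold and the two Finished encodings can be sketched in base R. This is a hypothetical illustration, not the package's code: meets_progress() and as_finished() are illustrative names, and treating rows with missing Progress as unacceptable is an assumption of the sketch.

```r
# Hypothetical sketch, not the package's code.
# A row is acceptable when Progress is at least min_progress;
# rows with missing Progress are treated as unacceptable (an assumption).
meets_progress <- function(progress, min_progress = 100) {
  !is.na(progress) & progress >= min_progress
}

# Normalize a Finished column exported as 1/0, TRUE/FALSE, or text.
as_finished <- function(finished) {
  if (is.numeric(finished)) return(finished == 1)
  if (is.logical(finished)) return(finished)
  toupper(as.character(finished)) %in% c("TRUE", "1")
}

progress <- c(100, 98, 50, NA)
meets_progress(progress)                     # default: 100% required
meets_progress(progress, min_progress = 98)  # relaxed threshold
```

Lowering min_progress only widens the set of accepted rows; it never removes a row that passed at a stricter threshold.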
An object of the same type as x
that excludes rows
that have incomplete progress.
For a function that checks for these rows, use check_progress().
For a function that marks these rows, use mark_progress().
Other progress functions: check_progress(), mark_progress()
Other exclude functions: exclude_duplicates(), exclude_duration(), exclude_ip(), exclude_location(), exclude_preview(), exclude_resolution()
library(excluder)
# Exclude rows with incomplete progress
data(qualtrics_text)
df <- exclude_progress(qualtrics_text)
# Remove preview data first
df <- qualtrics_text %>%
  exclude_preview() %>%
  exclude_progress()
# Include a lower acceptable completion percentage
df <- qualtrics_numeric %>%
  exclude_preview() %>%
  exclude_progress(min_progress = 98)
# Do not print rows to console
df <- qualtrics_text %>%
  exclude_preview() %>%
  exclude_progress(print = FALSE)
The exclude_resolution()
function removes
rows that have unacceptable screen resolution.
The function is written to work with data from
Qualtrics surveys.
exclude_resolution( x, res_min = 1000, width_min = 0, height_min = 0, id_col = "ResponseId", res_col = "Resolution", rename = TRUE, quiet = TRUE, print = TRUE, silent = FALSE )
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
res_min |
Minimum acceptable screen resolution (width and height). |
width_min |
Minimum acceptable screen width. |
height_min |
Minimum acceptable screen height. |
id_col |
Column name for unique row ID (e.g., participant). |
res_col |
Column name for screen resolution (in format widthxheight). |
rename |
Logical indicating whether to rename columns (using |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
silent |
Logical indicating whether to print message to console. Note that this argument controls the exclude message, not the check message. |
To record this information in your Qualtrics survey, you must insert a meta info question.
Default column names are based on the output of qualtRics::fetch_survey().
The function outputs to console a message about the number of rows with unacceptable screen resolution.
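Because the resolution column stores strings in the form widthxheight, each value has to be split before it can be compared with a minimum. The base R sketch below is a hypothetical illustration, not the package's code: parse_resolution() and flag_small_screen() are illustrative names, and the sketch leaves out res_min, applying only separate width and height minimums.

```r
# Hypothetical sketch, not the package's code: split "widthxheight" strings
# and flag screens below separate width and height minimums.
parse_resolution <- function(res) {
  parts <- strsplit(as.character(res), "x", fixed = TRUE)
  dims <- vapply(parts, function(p) {
    if (length(p) != 2) return(c(NA_real_, NA_real_))
    suppressWarnings(as.numeric(p))
  }, numeric(2))
  data.frame(width = dims[1, ], height = dims[2, ])
}

flag_small_screen <- function(res, width_min = 0, height_min = 0) {
  d <- parse_resolution(res)
  d$width < width_min | d$height < height_min  # TRUE = unacceptable
}

res <- c("1920x1080", "800x600", "1366x768")
flag_small_screen(res, width_min = 1000)
```

Malformed or missing strings parse to NA dimensions and therefore produce NA flags rather than silently passing.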
An object of the same type as x
that excludes rows
that have unacceptable screen resolutions.
For a function that checks for these rows, use check_resolution().
For a function that marks these rows, use mark_resolution().
Other resolution functions: check_resolution(), mark_resolution()
Other exclude functions: exclude_duplicates(), exclude_duration(), exclude_ip(), exclude_location(), exclude_preview(), exclude_progress()
library(excluder)
# Exclude low screen resolutions
data(qualtrics_text)
df <- exclude_resolution(qualtrics_text)
# Remove preview data first
df <- qualtrics_text %>%
  exclude_preview() %>%
  exclude_resolution()
The mark_duplicates()
function creates a column labeling
rows of data that have the same IP address and/or same latitude and
longitude. The function is written to work with data from
Qualtrics surveys.
mark_duplicates( x, id_col = "ResponseId", ip_col = "IPAddress", location_col = c("LocationLatitude", "LocationLongitude"), rename = TRUE, dupl_ip = TRUE, dupl_location = TRUE, include_na = FALSE, quiet = FALSE, print = TRUE )
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
id_col |
Column name for unique row ID (e.g., participant). |
ip_col |
Column name for IP addresses. |
location_col |
Two-element vector specifying columns for latitude and longitude (in that order). |
rename |
Logical indicating whether to rename columns (using |
dupl_ip |
Logical indicating whether to check IP addresses. |
dupl_location |
Logical indicating whether to check latitude and longitude. |
include_na |
Logical indicating whether to include rows with NAs for IP address and location as potentially excluded rows. |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
To record this information in your Qualtrics survey, you must ensure that Anonymize responses is disabled.
Default column names are based on the output of qualtRics::fetch_survey().
By default, IP address and location are both checked, but they can be
checked separately with the dupl_ip and dupl_location arguments.
The function outputs to console separate messages about the number of rows with duplicate IP addresses and rows with duplicate locations. These counts are computed independently, so rows may be counted for both types of duplicates.
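Independent counting of the two duplicate types can be sketched in base R. This is a hypothetical illustration, not the package's code: flag_group_dupes() and find_dupes() are illustrative names, the sketch flags every member of a duplicated group (including the first occurrence), and it never flags rows with missing values, whereas the package lets include_na control that behavior.

```r
# Hypothetical sketch, not the package's code.
# Flag every member of a duplicated group, including the first occurrence.
flag_group_dupes <- function(key) {
  duplicated(key) | duplicated(key, fromLast = TRUE)
}

# IP and location duplicates are computed independently, so one row can be
# flagged by both. Rows with missing values are never flagged here.
find_dupes <- function(ip, lat, lon) {
  has_ip <- !is.na(ip)
  has_loc <- !is.na(lat) & !is.na(lon)
  data.frame(
    ip_dupe = has_ip & flag_group_dupes(ip),
    location_dupe = has_loc & flag_group_dupes(paste(lat, lon, sep = ","))
  )
}

d <- find_dupes(
  ip  = c("192.0.2.1", "192.0.2.1", "192.0.2.2", NA, NA),
  lat = c(40, 41, 41, 35, NA),
  lon = c(-74, -75, -75, -80, NA)
)
colSums(d)  # row 2 counts toward both totals
```

Latitude and longitude are pasted into one key so that a location only matches when both coordinates match.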
An object of the same type as x
that includes a column marking rows
with duplicate IP addresses and/or locations.
For a function that just checks for and returns duplicate rows,
use check_duplicates(). For a function that excludes these rows,
use exclude_duplicates().
Other duplicates functions: check_duplicates(), exclude_duplicates()
Other mark functions: mark_duration(), mark_ip(), mark_location(), mark_preview(), mark_progress(), mark_resolution()
library(excluder)
# Mark duplicate IP addresses and locations
data(qualtrics_text)
df <- mark_duplicates(qualtrics_text)
# Remove preview data first
df <- qualtrics_text %>%
  exclude_preview() %>%
  mark_duplicates()
# Mark only for duplicate locations
df <- qualtrics_text %>%
  exclude_preview() %>%
  mark_duplicates(dupl_ip = FALSE)
The mark_duration()
function creates a column labeling
rows with fast and/or slow duration.
The function is written to work with data from
Qualtrics surveys.
mark_duration( x, min_duration = 10, max_duration = NULL, id_col = "ResponseId", duration_col = "Duration (in seconds)", rename = TRUE, quiet = FALSE, print = TRUE )
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
min_duration |
Minimum duration that is too fast in seconds. |
max_duration |
Maximum duration that is too slow in seconds. |
id_col |
Column name for unique row ID (e.g., participant). |
duration_col |
Column name for durations. |
rename |
Logical indicating whether to rename columns (using |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
Default column names are based on the output of qualtRics::fetch_survey().
By default, a minimum duration of 10 seconds is checked, but either
minima or maxima can be checked with the min_duration and
max_duration arguments. The function outputs to console separate
messages about the number of rows that are too fast or too slow.
This function marks both the fast and the slow rows.
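Labeling rows against optional minimum and maximum durations can be sketched in base R. This is a hypothetical illustration, not the package's code: classify_duration() is an illustrative name, and the sketch assumes "too fast" means strictly below min_duration and "too slow" means strictly above max_duration, with NULL disabling a bound.

```r
# Hypothetical sketch, not the package's code: label each duration.
# Assumes "too fast" means strictly below min_duration and "too slow"
# means strictly above max_duration; NULL disables a bound.
classify_duration <- function(duration, min_duration = 10, max_duration = NULL) {
  status <- rep("ok", length(duration))
  if (!is.null(min_duration)) status[which(duration < min_duration)] <- "too fast"
  if (!is.null(max_duration)) status[which(duration > max_duration)] <- "too slow"
  status[is.na(duration)] <- NA  # missing durations stay unlabeled
  status
}

durations <- c(5, 50, 900, NA)
classify_duration(durations)                      # only the 10-second minimum
classify_duration(durations, max_duration = 800)  # minimum and maximum
```

Using which() drops NA comparisons, so missing durations never receive a label by accident.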
An object of the same type as x
that includes a column marking rows
with fast and slow duration.
For a function that checks for these rows, use check_duration().
For a function that excludes these rows, use exclude_duration().
Other duration functions: check_duration(), exclude_duration()
Other mark functions: mark_duplicates(), mark_ip(), mark_location(), mark_preview(), mark_progress(), mark_resolution()
library(excluder)
# Mark durations faster than 100 seconds
data(qualtrics_text)
df <- mark_duration(qualtrics_text, min_duration = 100)
# Remove preview data first
df <- qualtrics_text %>%
  exclude_preview() %>%
  mark_duration()
# Also mark durations slower than 800 seconds
df <- qualtrics_text %>%
  exclude_preview() %>%
  mark_duration(max_duration = 800)
The mark_ip()
function creates a column labeling
rows of data that have IP addresses from outside the specified country.
The function is written to work with data from
Qualtrics surveys.
mark_ip( x, id_col = "ResponseId", ip_col = "IPAddress", rename = TRUE, country = "US", include_na = FALSE, quiet = FALSE, print = TRUE )
x |
Data frame or tibble (preferably imported from Qualtrics using {qualtRics}). |
id_col |
Column name for unique row ID (e.g., participant). |
ip_col |
Column name for IP addresses. |
rename |
Logical indicating whether to rename columns (using |
country |
Two-letter abbreviation of country to check (default is "US"). |
include_na |
Logical indicating whether to include rows with NA in IP address column in the output list of potentially excluded data. |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
To record this information in your Qualtrics survey, you must ensure that Anonymize responses is disabled.
Default column names are based on the output of qualtRics::fetch_survey().
The function uses ipaddress::country_networks()
to assign IP addresses to
specific countries using
ISO 3166-1 alpha-2 country codes.
The function outputs to console a message about the number of rows
with IP addresses outside of the specified country. If there are NAs for IP
addresses (likely due to including preview data; see check_preview()), it
prints a message reporting the number of rows with NAs.
An object of the same type as x
that includes a column marking rows
with IP addresses outside of the specified country.
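The difference between marking and excluding can be summarized in a base R sketch: marking keeps every row and records the decision in a column, while excluding drops the flagged rows. This is a hypothetical illustration, not the package's code; mark_rows(), exclude_marked(), and the column name "exclusion_flag" are illustrative, and keeping rows with an NA flag is an assumption of the sketch.

```r
# Hypothetical sketch of the mark/exclude relationship, not the package's code.
# The column name "exclusion_flag" is illustrative only.
mark_rows <- function(x, flag, col_name = "exclusion_flag") {
  x[[col_name]] <- flag  # keep every row, record the decision
  x
}

exclude_marked <- function(x, col_name = "exclusion_flag") {
  # Drop flagged rows; rows with NA flags are kept here (an assumption).
  keep <- is.na(x[[col_name]]) | !x[[col_name]]
  x[keep, , drop = FALSE]
}

df <- data.frame(id = 1:4)
marked <- mark_rows(df, c(FALSE, TRUE, NA, FALSE))
kept <- exclude_marked(marked)
```

Marking first makes it possible to inspect or combine several criteria before deciding which rows to drop.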
For a function that checks these rows, use check_ip().
For a function that excludes these rows, use exclude_ip().
This function requires internet connectivity as it uses the
ipaddress::country_networks()
function, which pulls daily updated data
from https://www.iwik.org/ipcountry/. It only updates the data once
per session, as it caches the results for future work during the session.
Other ip functions: check_ip(), exclude_ip()
Other mark functions: mark_duplicates(), mark_duration(), mark_location(), mark_preview(), mark_progress(), mark_resolution()
library(excluder)
# Mark IP addresses outside of the US
data(qualtrics_text)
df <- mark_ip(qualtrics_text)
# Remove preview data first
df <- qualtrics_text %>%
  exclude_preview() %>%
  mark_ip()
# Mark IP addresses outside of Germany
df <- qualtrics_text %>%
  exclude_preview() %>%
  mark_ip(country = "DE")
The mark_location()
function creates a column labeling
rows that have locations outside of the US.
The function is written to work with data from
Qualtrics surveys.
mark_location( x, id_col = "ResponseId", location_col = c("LocationLatitude", "LocationLongitude"), rename = TRUE, include_na = FALSE, quiet = FALSE, print = TRUE )
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
id_col |
Column name for unique row ID (e.g., participant). |
location_col |
Two-element vector specifying columns for latitude and longitude (in that order). |
rename |
Logical indicating whether to rename columns (using |
include_na |
Logical indicating whether to include rows with NA in latitude and longitude columns in the output list of potentially excluded data. |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
To record this information in your Qualtrics survey, you must ensure that Anonymize responses is disabled.
Default column names are based on the output of qualtRics::fetch_survey().
The function only works for the United States.
It uses maps::map.where() to determine if latitude and longitude
are inside the US.
The function outputs to console a message about the number of rows with locations outside of the US.
An object of the same type as x
that includes a column marking rows
that are located outside of the US and (if include_na == FALSE) rows with
no location information.
For a function that checks for these rows, use check_location().
For a function that excludes these rows, use exclude_location().
Other location functions: check_location(), exclude_location()
Other mark functions: mark_duplicates(), mark_duration(), mark_ip(), mark_preview(), mark_progress(), mark_resolution()
library(excluder)
# Mark locations outside of the US
data(qualtrics_text)
df <- mark_location(qualtrics_text)
# Remove preview data first
df <- qualtrics_text %>%
  exclude_preview() %>%
  mark_location()
The mark_preview()
function creates a column labeling
rows that are survey previews.
The function is written to work with data from
Qualtrics surveys.
mark_preview( x, id_col = "ResponseId", preview_col = "Status", rename = TRUE, quiet = FALSE, print = TRUE )
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
id_col |
Column name for unique row ID (e.g., participant). |
preview_col |
Column name for survey preview. |
rename |
Logical indicating whether to rename columns (using |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
Default column names are based on the output of qualtRics::fetch_survey().
The preview column in Qualtrics can be a numeric or character vector
depending on whether it is exported as choice text or numeric values.
This function works for both.
The function outputs to console a message about the number of rows that are survey previews.
An object of the same type as x
that includes a column marking rows
that are survey previews.
For a function that checks for these rows, use check_preview().
For a function that excludes these rows, use exclude_preview().
Other preview functions: check_preview(), exclude_preview()
Other mark functions: mark_duplicates(), mark_duration(), mark_ip(), mark_location(), mark_progress(), mark_resolution()
library(excluder)
# Mark survey previews
data(qualtrics_text)
df <- mark_preview(qualtrics_text)
# Works for Qualtrics data exported as numeric values, too
df <- qualtrics_numeric %>%
  mark_preview()
The mark_progress()
function creates a column labeling
rows that have incomplete progress.
The function is written to work with data from
Qualtrics surveys.
mark_progress( x, min_progress = 100, id_col = "ResponseId", finished_col = "Finished", progress_col = "Progress", rename = TRUE, quiet = FALSE, print = TRUE )
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
min_progress |
Amount of progress considered acceptable to include. |
id_col |
Column name for unique row ID (e.g., participant). |
finished_col |
Column name for whether survey was completed. |
progress_col |
Column name for percentage of survey completed. |
rename |
Logical indicating whether to rename columns (using |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
Default column names are based on the output of qualtRics::fetch_survey().
The default requires 100% completion, but lower levels of completion
may be acceptable and can be allowed by specifying the min_progress
argument.
The finished column in Qualtrics can be a numeric or character vector
depending on whether it is exported as choice text or numeric values.
This function works for both.
The function outputs to console a message about the number of rows that have incomplete progress.
An object of the same type as x
that includes a column marking rows
that have incomplete progress.
For a function that checks for these rows, use check_progress().
For a function that excludes these rows, use exclude_progress().
Other progress functions: check_progress(), exclude_progress()
Other mark functions: mark_duplicates(), mark_duration(), mark_ip(), mark_location(), mark_preview(), mark_resolution()
library(excluder)
# Mark rows with incomplete progress
data(qualtrics_text)
df <- mark_progress(qualtrics_text)
# Remove preview data first
df <- qualtrics_text %>%
  exclude_preview() %>%
  mark_progress()
# Include a lower acceptable completion percentage
df <- qualtrics_numeric %>%
  exclude_preview() %>%
  mark_progress(min_progress = 98)
The mark_resolution()
function creates a column labeling
rows that have unacceptable screen resolution.
The function is written to work with data from
Qualtrics surveys.
mark_resolution( x, res_min = 1000, width_min = 0, height_min = 0, id_col = "ResponseId", res_col = "Resolution", rename = TRUE, quiet = FALSE, print = TRUE )
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
res_min |
Minimum acceptable screen resolution (width and height). |
width_min |
Minimum acceptable screen width. |
height_min |
Minimum acceptable screen height. |
id_col |
Column name for unique row ID (e.g., participant). |
res_col |
Column name for screen resolution (in format widthxheight). |
rename |
Logical indicating whether to rename columns (using |
quiet |
Logical indicating whether to print message to console. |
print |
Logical indicating whether to print returned tibble to console. |
To record this information in your Qualtrics survey, you must insert a meta info question.
Default column names are based on the output of qualtRics::fetch_survey().
The function outputs to console a message about the number of rows with unacceptable screen resolution.
An object of the same type as x
that includes a column marking rows
that have unacceptable screen resolutions.
For a function that checks for these rows, use check_resolution().
For a function that excludes these rows, use exclude_resolution().
Other resolution functions: check_resolution(), exclude_resolution()
Other mark functions: mark_duplicates(), mark_duration(), mark_ip(), mark_location(), mark_preview(), mark_progress()
library(excluder)
# Mark low screen resolutions
data(qualtrics_text)
df <- mark_resolution(qualtrics_text)
# Remove preview data first
df <- qualtrics_text %>%
  exclude_preview() %>%
  mark_resolution()
qualtRics::fetch_survey() from simulated Qualtrics study
A dataset containing the metadata from a standard Qualtrics survey with
browser metadata collected and exported with "Use numeric values". The data
were imported using qualtRics::fetch_survey().
These data were randomly generated using iptools::ip_random() and
rgeolocate::ip2location() functions.
qualtrics_fetch
A data frame with 100 rows and 17 variables:
date and time data collection started, in ISO 8601 format
date and time data collection ended, in ISO 8601 format
numeric flag for preview (1) vs. implemented survey (0) entries
participant IP address (truncated for anonymity)
percentage of survey completed
duration of time required to complete survey, in seconds
numeric flag for whether survey was completed (1) or progress was < 100 (0)
date and time survey was recorded, in ISO 8601 format
random ID for participants
latitude geolocated from IP address
longitude geolocated from IP address
language set in Qualtrics
user web browser type
user web browser version
user operating system
user screen resolution
response to question about whether the user liked the survey (1 = Yes, 0 = No)
Other data: qualtrics_fetch2, qualtrics_numeric, qualtrics_raw, qualtrics_text
qualtRics::fetch_survey() from simulated Qualtrics study but with labels included as column names
A dataset containing the metadata from a standard Qualtrics survey with
browser metadata collected and exported with "Use numeric values". The data
were imported using qualtRics::fetch_survey(), and then the secondary labels
were assigned as column names with sjlabelled::get_label().
These data were randomly generated using iptools::ip_random() and
rgeolocate::ip2location() functions.
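Assigning labels as column names can be sketched in base R, assuming each column stores its label in a "label" attribute. This is a hypothetical illustration, not sjlabelled's code or the exact procedure used to build qualtrics_fetch2; labels_as_names() is an illustrative name, and columns without a label keep their original name.

```r
# Hypothetical sketch, not sjlabelled's code: use each column's "label"
# attribute as its name, falling back to the existing name when absent.
labels_as_names <- function(df) {
  new_names <- vapply(names(df), function(nm) {
    lab <- attr(df[[nm]], "label", exact = TRUE)
    if (is.null(lab)) nm else as.character(lab)[1]
  }, character(1), USE.NAMES = FALSE)
  names(df) <- new_names
  df
}

df <- data.frame(Q1 = c(1, 0), Progress = c(100, 50))
attr(df$Q1, "label") <- "Did you like the survey?"  # illustrative label
renamed <- labels_as_names(df)
```

Label-based names can contain spaces and punctuation, so they usually need backticks when referenced in code.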
qualtrics_fetch2
A data frame with 100 rows and 17 variables:
date and time data collection started, in ISO 8601 format
date and time data collection ended, in ISO 8601 format
numeric flag for preview (1) vs. implemented survey (0) entries
participant IP address (truncated for anonymity)
percentage of survey completed
duration of time required to complete survey, in seconds
numeric flag for whether survey was completed (1) or progress was < 100 (0)
date and time survey was recorded, in ISO 8601 format
random ID for participants
latitude geolocated from IP address
longitude geolocated from IP address
language set in Qualtrics
user web browser type
user web browser version
user operating system
user screen resolution
response to question about whether the user liked the survey (1 = Yes, 0 = No)
Other data: qualtrics_fetch, qualtrics_numeric, qualtrics_raw, qualtrics_text
A dataset containing the metadata from a standard Qualtrics survey with browser metadata collected and exported with "Use numeric values". These data were randomly generated using iptools::ip_random() and rgeolocate::ip2location() functions.
qualtrics_numeric
A data frame with 100 rows and 16 variables:
date and time data collection started, in ISO 8601 format
date and time data collection ended, in ISO 8601 format
numeric flag for preview (1) vs. implemented survey (0) entries
participant IP address (truncated for anonymity)
percentage of survey completed
duration of time required to complete survey, in seconds
numeric flag for whether survey was completed (1) or progress was < 100 (0)
date and time survey was recorded, in ISO 8601 format
random ID for participants
latitude geolocated from IP address
longitude geolocated from IP address
language set in Qualtrics
user web browser type
user web browser version
user operating system
user screen resolution
Other data: qualtrics_fetch2, qualtrics_fetch, qualtrics_raw, qualtrics_text
A dataset containing the metadata from a standard Qualtrics survey with browser metadata collected and exported with "Use choice text". These data were randomly generated using iptools::ip_random() and rgeolocate::ip2location() functions. This dataset includes the two header rows with column information that are exported by Qualtrics.
qualtrics_raw
A data frame with 102 rows and 16 variables:
StartDate: date and time data collection started, in ISO 8601 format
EndDate: date and time data collection ended, in ISO 8601 format
Status: flag for preview (Survey Preview) vs. implemented survey (IP Address) entries
IPAddress: participant IP address (truncated for anonymity)
Progress: percentage of survey completed
Duration (in seconds): time required to complete the survey, in seconds
Finished: logical for whether the survey was completed (TRUE) or progress was < 100 (FALSE)
RecordedDate: date and time the survey was recorded, in ISO 8601 format
ResponseId: random ID for participants
LocationLatitude: latitude geolocated from IP address
LocationLongitude: longitude geolocated from IP address
UserLanguage: language set in Qualtrics
Browser: user web browser type
Version: user web browser version
Operating System: user operating system
Resolution: user screen resolution
Other data: qualtrics_fetch2, qualtrics_fetch, qualtrics_numeric, qualtrics_text
A dataset containing the metadata from a standard Qualtrics survey with browser metadata collected and exported with "Use choice text". These data were randomly generated using iptools::ip_random() and rgeolocate::ip2location() functions.
qualtrics_text
A data frame with 100 rows and 16 variables:
StartDate: date and time data collection started, in ISO 8601 format
EndDate: date and time data collection ended, in ISO 8601 format
Status: flag for preview (Survey Preview) vs. implemented survey (IP Address) entries
IPAddress: participant IP address (truncated for anonymity)
Progress: percentage of survey completed
Duration (in seconds): time required to complete the survey, in seconds
Finished: logical for whether the survey was completed (TRUE) or progress was < 100 (FALSE)
RecordedDate: date and time the survey was recorded, in ISO 8601 format
ResponseId: random ID for participants
LocationLatitude: latitude geolocated from IP address
LocationLongitude: longitude geolocated from IP address
UserLanguage: language set in Qualtrics
Browser: user web browser type
Version: user web browser version
Operating System: user operating system
Resolution: user screen resolution
Other data: qualtrics_fetch2, qualtrics_fetch, qualtrics_numeric, qualtrics_raw
The remove_label_rows() function filters out the initial label rows from datasets downloaded from Qualtrics surveys.
remove_label_rows(x, convert = TRUE, rename = FALSE)
x |
Data frame (downloaded from Qualtrics). |
convert |
Logical indicating whether to convert/coerce date, logical and numeric columns from the metadata. |
rename |
Logical indicating whether to rename columns based on the first row of data. |
The function (1) checks whether the dataset uses Qualtrics column names, (2) checks whether the label rows are already used as column names, (3) removes the label rows if present, and (4) if convert = TRUE, converts date, logical, and numeric metadata columns to the proper data types. Datasets imported using qualtRics::fetch_survey() should not need this function.
The convert argument only converts the StartDate, EndDate, RecordedDate, Progress, Finished, Duration (in seconds), LocationLatitude, and LocationLongitude columns. To convert other data columns, see dplyr::mutate().
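Because the label rows hold text, every column of a raw export is read as character until those rows are dropped. The base R sketch below illustrates steps (3) and (4) on a toy raw export. It assumes the label rows are exactly the first two rows and converts only three metadata columns; drop_label_rows_sketch() and the second-row placeholder strings are hypothetical, and this is not how remove_label_rows() is implemented (the package function also checks column names and handles the other metadata columns).

```r
# Toy raw export: rows 1-2 hold question labels and placeholder import IDs
raw <- data.frame(
  StartDate = c("Start Date", '{"ImportId":"startDate"}',
                "2020-12-11 12:00:00", "2020-12-11 12:05:00"),
  Progress = c("Progress", '{"ImportId":"progress"}', "100", "45"),
  Finished = c("Finished", '{"ImportId":"finished"}', "True", "False"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not the package implementation)
drop_label_rows_sketch <- function(x, convert = TRUE) {
  x <- x[-(1:2), , drop = FALSE]  # assumption: exactly two label rows
  rownames(x) <- NULL
  if (convert) {
    x$StartDate <- as.POSIXct(x$StartDate, tz = "UTC")
    x$Progress <- as.numeric(x$Progress)
    x$Finished <- as.logical(x$Finished)  # "True"/"False" -> TRUE/FALSE
  }
  x
}

clean <- drop_label_rows_sketch(raw)
str(clean)
```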
An object of the same type as x that excludes the Qualtrics label rows and, when convert = TRUE, has its date, logical, and numeric metadata columns converted to the correct data class.
# Remove label rows
data(qualtrics_raw)
df <- remove_label_rows(qualtrics_raw)
The rename_columns() function renames the metadata columns to match standard Qualtrics names.
rename_columns(x, alert = TRUE)
x |
Data frame (preferably imported from Qualtrics using {qualtRics}). |
alert |
Logical indicating whether to alert the user when the columns do not match the secondary labels and therefore cannot be renamed. |
When importing Qualtrics data using qualtRics::fetch_survey(), labels entered in Qualtrics questions are saved as 'subtitles' for column names. Using sjlabelled::get_label() can make these secondary labels the primary column names. However, this results in a different set of names for the metadata columns than the names used in all of the mark_*(), check_*(), and exclude_*() functions. This function renames these columns to match the standard Qualtrics names.
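The renaming amounts to a lookup from label-style names to standard names. The base R sketch below illustrates the idea under stated assumptions: the lookup vector label_to_standard and the helper rename_sketch() are hypothetical, and the label-style names shown (e.g., "Start Date", "IP Address") are examples rather than the full set that rename_columns() handles.

```r
# Hypothetical lookup: label-style names (left) -> standard Qualtrics names (right)
label_to_standard <- c(
  "Start Date" = "StartDate",
  "IP Address" = "IPAddress",
  "Response ID" = "ResponseId",
  "Location Latitude" = "LocationLatitude",
  "Location Longitude" = "LocationLongitude"
)

# Hypothetical helper: rename matching columns, optionally alert when none match
rename_sketch <- function(x, lookup, alert = TRUE) {
  hits <- names(x) %in% names(lookup)
  if (!any(hits) && alert) {
    message("No column names match the lookup; nothing was renamed.")
  }
  names(x)[hits] <- unname(lookup[names(x)[hits]])
  x
}

df <- data.frame(`Start Date` = "2020-12-11", `IP Address` = "192.0.2.1",
                 Q1 = 5, check.names = FALSE)
renamed <- rename_sketch(df, label_to_standard)
names(renamed)
```

Columns without a match (here Q1) keep their names, mirroring the idea that only metadata columns are renamed.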
An object of the same type as x that has column names that match standard Qualtrics names.
Other column name functions: use_labels()
# Rename columns
data(qualtrics_fetch)
qualtrics_renamed <- qualtrics_fetch %>%
  rename_columns()
names(qualtrics_fetch)
names(qualtrics_renamed)

# Alerts when columns cannot be renamed
data(qualtrics_numeric)
rename_columns(qualtrics_numeric)

# Turn off alert
rename_columns(qualtrics_numeric, alert = FALSE)
Each of the mark_*() functions appends a new column to the data. The unite_exclusions() function unites all of those columns into a single column that can be used to filter any or all exclusions downstream. For rows with multiple exclusions, the exclusion types are concatenated using separator (a comma by default).
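Conceptually, uniting means pasting together the non-empty entries of each row's exclusion columns. This base R sketch assumes that an unflagged row holds an empty string (or NA) in each exclusion column; the column contents, the output column name exclusions, and the helper unite_sketch() are hypothetical and do not describe how unite_exclusions() is implemented.

```r
# Toy data: hypothetical exclusion columns, "" meaning "not flagged"
df <- data.frame(
  id = c("R_1", "R_2", "R_3"),
  exclusion_duplicates = c("", "duplicates", "duplicates"),
  exclusion_ip = c("", "", "ip"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: join flagged values per row, optionally drop source columns
unite_sketch <- function(x, cols, separator = ",", remove = TRUE) {
  united <- apply(x[, cols, drop = FALSE], 1, function(row) {
    flagged <- row[!is.na(row) & row != ""]
    paste(flagged, collapse = separator)  # "" when nothing is flagged
  })
  if (remove) x <- x[, setdiff(names(x), cols), drop = FALSE]
  x$exclusions <- unname(united)
  x
}

out <- unite_sketch(df, c("exclusion_duplicates", "exclusion_ip"))
out$exclusions
```

Rows can then be filtered on the single united column, for example with grepl("ip", out$exclusions).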
unite_exclusions( x, exclusion_types = c("duplicates", "duration", "ip", "location", "preview", "progress", "resolution"), separator = ",", remove = TRUE )
x |
Data frame or tibble (preferably exported from Qualtrics). |
exclusion_types |
Vector of types of exclusions to unite. |
separator |
Character string specifying the character(s) used to separate multiple exclusion types. |
remove |
Logical specifying whether to remove the united columns (default = TRUE) or leave them in the data frame (FALSE). |
An object of the same type as x that includes all of the same rows but with a single exclusion column replacing all of the specified exclusion_* columns (when remove = TRUE).
# Unite all exclusion types
df <- qualtrics_text %>%
  mark_duplicates() %>%
  mark_duration(min_duration = 100) %>%
  mark_ip() %>%
  mark_location() %>%
  mark_preview() %>%
  mark_progress() %>%
  mark_resolution()
df2 <- df %>%
  unite_exclusions()

# Unite subset of exclusion types
df2 <- df %>%
  unite_exclusions(exclusion_types = c("duplicates", "duration", "ip"))
The use_labels() function renames the columns using the labels generated in Qualtrics. Data must be imported using qualtRics::fetch_survey().
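Assuming each labelled column carries a "label" attribute (the attribute that sjlabelled::get_label() reads), the renaming can be sketched in base R as below. The helper use_labels_sketch() and the toy columns are hypothetical; this illustrates the idea rather than the package's implementation.

```r
# Toy data: column names are import IDs; one column carries a "label" attribute
df <- data.frame(QID1 = c(25, 31), QID2 = c("a", "b"), stringsAsFactors = FALSE)
attr(df$QID1, "label") <- "Age"

# Hypothetical helper: replace a column name with its "label" attribute, if any
use_labels_sketch <- function(x) {
  labs <- vapply(x, function(col) {
    lab <- attr(col, "label", exact = TRUE)
    if (is.null(lab)) NA_character_ else as.character(lab)[1]
  }, character(1))
  keep <- !is.na(labs) & labs != ""
  names(x)[keep] <- labs[keep]
  x
}

names(use_labels_sketch(df))
```

Columns without a label (here QID2) keep their original names.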
use_labels(x)
x |
Data frame imported using qualtRics::fetch_survey(). |
An object of the same type as x that has column names using the labels generated in Qualtrics.
Other column name functions: rename_columns()
# Rename columns
data(qualtrics_fetch)
qualtrics_renamed <- qualtrics_fetch %>%
  use_labels()
names(qualtrics_fetch)
names(qualtrics_renamed)