--- title: "Getting Started with osfr" date: "2022-09-25" output: rmarkdown::html_vignette: toc: TRUE vignette: > %\VignetteIndexEntry{Getting Started with osfr} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- This vignette provides a quick tour of the *osfr* package. ```r library(osfr) ``` ## What is OSF? [OSF][osf] is a free and open source web application that provides a space for researchers to collaboratively store, manage, and share their research materials (e.g. data, code, protocols). Most work on OSF is organized around ***projects***, which include a cloud-based storage bucket where files can be stored and organized into directories. Note there is no storage limit on the size of projects but individual files must be < 5Gb. Projects can be kept private, shared with a specific group of collaborators, or made publicly available with citable DOIs so you can get credit for their work. If you'd like to learn more about OSF the Center for Open Science has published an excellent series of [guides](https://help.osf.io/) to help get you started. We'll provide links to specific guides throughout this vignette. Here are a few relevant topics: * [Creating projects and components][osf-create-projects] * [Managing projects][osf-manage-projects] * [Collaborating on projects][osf-collaborating] ## Accessing OSF projects Let's check out an example project containing materials for an analysis of the 2012 American National Election Survey (ANES). You can access the OSF project in your browser by navigating to its URL: . Let's load this project into R with `osfr::osf_retrieve_node()`: ```r anes_project <- osf_retrieve_node("https://osf.io/jgyxm") anes_project #> # A tibble: 1 × 3 #> name id meta #> #> 1 Political identification and gender jgyxm ``` This returns an `osf_tbl` object, which is the `data.frame`-like class *osfr* uses to represent items retrieved from OSF. You can now use `anes_project` to perform a variety of project related tasks by passing it to different osfr functions. ### Downloading files Let's list all of the files that have been uploaded to the project: ```r anes_files <- osf_ls_files(anes_project) anes_files #> # A tibble: 5 × 3 #> name id meta #> #> 1 cleaning.R 5e20d22bedceab002d82e0f1 #> 2 Questionnaire.docx 5e20d22bedceab002b82dc3f #> 3 raw_data.csv 5e20d22c675e0e00096b4de8 #> 4 Data Dictionary.docx 5e20d22c675e0e000e6b4b18 #> 5 analyses.R 5e20d22c675e0e000a6b4bd3 ``` This returns another `osf_tbl` but this one contains 5 rows; one for each of the project *files* stored on OSF. A nice feature of OSF is it provides rendered views for a wide variety of file formats, so it's not necessary to actually download and open a file if you just want to quickly examine it. Let's open the Word Document containing the project's data dictionary by extracting the relevant row from `anes_tbl` and passing it to `osf_open()`: ```r osf_open(anes_files[4, ]) ``` Because `osf_tbl`s are just specialized `data.frame`s, we could also `subset()` or `dplyr::filter()` to achieve the same result. *__Note:__ If an `osf_tbl` with multiple entities is passed to an non-vectorized osfr function like `osf_open()`, the default behavior is to use the entity in the first row and warn that all other entities are ignored.* We can also download local copies of these files by passing `anes_files` to `osf_download()`. ```r osf_download(anes_files) #> # A tibble: 5 × 4 #> name id local_path meta #> #> 1 cleaning.R 5e20d22bedceab002d82e0f1 ./cleaning.R #> 2 Questionnaire.docx 5e20d22bedceab002b82dc3f ./Questionnaire.do… #> 3 raw_data.csv 5e20d22c675e0e00096b4de8 ./raw_data.csv #> 4 Data Dictionary.docx 5e20d22c675e0e000e6b4b18 ./Data Dictionary.… #> 5 analyses.R 5e20d22c675e0e000a6b4bd3 ./analyses.R ``` We'll use these files in the next section for creating a new project. ### Pipes As you've likely noticed, `osf_tbl` objects are central to osfr's functionality. Indeed, nearly all of its functions both expect an `osf_tbl` as input and return an `osf_tbl` as output. As such, osfr functions can be chained together using the [pipe operator][magrittr] (`%>%`), allowing for the creation of pipelines to automate OSF-based tasks. Here is a short example that consolidates all of the steps we've performed so far: ```r osf_retrieve_node("jgyxm") %>% osf_ls_files() %>% osf_download() ``` ## Project management Now let's see how to use osfr to create and manage your own projects. The goal for this section is to create your own version of the *Political Identification and Gender* project but with a better organizational structure. To follow along with this section you'll need to authenticate osfr using a personal access token (PAT). See the `?osf_auth()` function documentation or the `auth` vignette for more information. ### Creating a project First you will need to create a new private project on OSF to store all the files related to the project. Here, we're giving the new project a title (required) and description (optional). ```r my_project <- osf_create_project( title = "Political Identification and Gender: Re-examined", description = "A re-analysis of the original study's results." ) my_project #> # A tibble: 1 × 3 #> name id meta #> #> 1 Political Identification and Gender: Re-examined f7bgz ``` The GUID for this new project is `f7bgz`, but yours will be something different. You can check out the project on OSF by opening it's URL (`https://www.osf.io/`), or, more conveniently: `osf_open(my_project)`. ### Adding structure with components A key organizational feature of OSF is the ability to augment a project's structure with sub-projects, which are referred to as *components* on OSF. Like top-level projects, every component is assigned a unique URL and contains its own cloud-based storage bucket. They can also have different privacy settings from the parent project. We are going to create two nested *components*, one for the raw data and one for the analysis scripts. ```r data_comp <- osf_create_component(my_project, title = "Raw Data") script_comp <- osf_create_component(my_project, title = "Analysis Scripts") # verify the components were created # osf_open(my_project) ``` If you refresh the OSF project in your browser the *Components* widget should now contain two entries for each of our newly created components. ### Uploading files Now that our project components are in place we can start to populate them with files. Let's start with the csv file containing our raw data. ```r data_file <- osf_upload(my_project, path = "raw_data.csv") data_file #> # A tibble: 1 × 3 #> name id meta #> #> 1 raw_data.csv 63309f3e18f4581162429679 ``` Oh no! Instead of uploading `raw_data.csv` to the *Raw Data* component, we uploaded it to the parent project instead. Fear not. We can easily fix this contrived mistake by simply moving the file to its intended location. ```r data_file <- osf_mv(data_file, to = data_comp) ``` Crisis averted. Now if you open *Raw Data* on OSF (`osf_open(data_comp)`), it should contain the csv file. Our next step is to upload the R scripts into the *Analysis Scripts* component. Rather than upload each file individually, we'll take advantage of `osf_upload()`'s ability to handle multiple files/directories and use `list.files()` to identify all `.R` files in the working directory: ```r r_files <- osf_upload(script_comp, path = list.files(pattern = ".R$")) r_files #> # A tibble: 3 × 3 #> name id meta #> #> 1 analyses.R 63309f47408a27127e7637de #> 2 cleaning.R 63309f4a555fe211977a9017 #> 3 precompile.R 63309f4c18f4581167428d70 ``` ### Putting it all together Finally, let's repeat the process for the 2 `.docx` file containing the survey and accompanying data dictionary. This time we'll use a more succinct approach that leverages pipes to create and populate the component in one block: ```r my_project %>% osf_create_component("Research Materials") %>% osf_upload(path = list.files(pattern = "\\.docx$")) #> # A tibble: 2 × 3 #> name id meta #> #> 1 Data Dictionary.docx 63309f526c2401128550a2bc #> 2 Questionnaire.docx 63309f5418f4581150428ef4 ``` We can verify the project is now structured the way we wanted by listing the components we have under the main project. ```r osf_ls_nodes(my_project) #> # A tibble: 3 × 3 #> name id meta #> #> 1 Research Materials dg79a #> 2 Analysis Scripts fquzh #> 3 Raw Data 6urqv ``` which gives us an `osf_tbl` with one row for each of the project's components. ### Updating files OSF provides automatic and unlimited file versioning. Let's see how this works with osfr. Make a small edit to your local copy of `cleaning.R` and save. Now, if we attempt to upload this new version to the *Analysis Scripts* component, osfr will throw a conflict error: ```r osf_upload(script_comp, path = "cleaning.R") ``` ``` Error: Can't upload file 'cleaning.R'. * A file with the same name already exists at the destination. * Use the `conflicts` argument to avoid this error in the future. ``` As the error indicates, we need to use the `conflicts` argument to instruct `osf_upload()` how to handle the conflict. In this case, we want to overwrite the original copy with our new version: ```r osf_upload(script_comp, path = "cleaning.R", conflicts = "overwrite") ``` Learn more about file versioning on OSF [here][osf-versioning]. ### Sharing Remember, new OSF projects are *always* private by default. You can change this by opening the project's settings page on OSF and making it public. See the following guides for more information about OSF permissions and how to optionally generate a DOI so other can cite your project. * [Control your privacy settings][osf-privacy] * [Sharing, linking, and forking projects][osf-sharing] * [Generate DOIs][osf-doi] ## A few details about files on OSF On OSF, files can exist in projects, components, and/or directories. Files can be stored on *OSF's Storage* or in another service that is connected to an OSF project (e.g. GitHub, Dropbox, or Google Drive). However, `osfr` currently only supports interacting with files on OSF Storage. We can download files from any public or private node that we have access to and can identify files to download in two different ways: 1. If we know where the file is located, but don't remember its GUID, you can use the `osf_ls_files` function to filter by filename within a specified node and then pipe the results to `osf_download()`. ```r anes_project %>% osf_ls_files(pattern = ) %>% osf_download(conflicts = "overwrite") ``` 2. For a public file that was referenced in a published article, you may already have the GUID, and so can retrieve the file directly before downloading it. For example, let's download Daniel Laken's helpful spreadsheet for calculating effect sizes (available from ). ```r osf_retrieve_file("vbdah") %>% osf_download(excel_file) ``` ## Additional resources For more information on OSF and `osfr` check out: * [OSF][osf] * [OSF API Documentation][osf-api] * [OSF Support](https://osf.io/support/) * [osfr GitHub Repository](https://github.com/ropensci/osfr) [osf]: https://osf.io [cos]: https://www.cos.io [osf-api]: https://developer.osf.io [magrittr]: https://magrittr.tidyverse.org [tibble]: https://tibble.tidyverse.org [osf-create-projects]: https://help.osf.io/article/383-creating-a-project [osf-manage-projects]: https://help.osf.io/article/384-managing-projects [osf-privacy]: https://help.osf.io/article/285-control-your-privacy-settings [osf-collaborating]: https://help.osf.io/article/385-collaborating-on-projects [osf-doi]: https://help.osf.io/article/220-create-dois [osf-versioning]: https://help.osf.io/article/282-file-revisions-and-version-control [osf-sharing]: https://help.osf.io/article/388-sharing-linking-and-forking-projects