--- title: "`lingtypology`: Glottolog functions" author: "George Moroz" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{`lingtypology`: Glottolog functions} %\VignetteEncoding{UTF-8} %\VignetteEngine{knitr::rmarkdown} editor_options: chunk_output_type: console --- ```{r, include=FALSE} library(lingtypology) ``` This package is based on the [Glottolog database](https://glottolog.org/), so `lingtypology` has several functions for accessing data from that database. ### 1. Command name's syntax Most of the functions in `lingtypology` have the same syntax: **what you need.what you have**. Most of them are based on _language name_. * **aff.lang()** — get affiliation by language * **area.lang()** — get macro area by language * **country.lang()** — get country by language * **iso.lang()** — get ISO 639-3 code by language * **gltc.lang()** — get glottocode (identifier for a language in the Glottolog database) code by language * **lat.lang()** — get latitude by language * **long.lang()** — get longitude by language * **level.lang()** — get level by language * **subc.lang()** — get subclassification in the Newick tree format by language Some of them help to define a vector of languages. * **lang.aff()** — get language by affiliation * **lang.iso()** — get language by ISO 639-3 code * **lang.gltc()** — get language by glottocode Additionally there are some functions to convert glottocodes to ISO 639-3 codes and vice versa: * **gltc.iso()** — get glottocode by ISO 639-3 code * **iso.gltc()** — get ISO 639-3 code by glottocode The most important functionality of `lingtypology` is the ability to create interactive maps based on features and sets of languages (see the third section): * **map.feature()** [Glottolog database (v. 4.1)](https://glottolog.org/) provides `lingtypology` with language names, ISO codes, glottocodes, affiliation, macro area, coordinates, and much information. This set of functions doesn't have a goal to cover all possible combinations of functions. Check out additional information that is preserved in the version of the Glottolog database used in `lingtypology`: ```{r} names(glottolog) ``` Using R functions for data manipulation you can create your own database for your purpose. ### 2. Using base functions All functions introduced in the previous section are regular functions, so they can take the following objects as input: * a regular string ```{r} iso.lang("West Circassian") lang.iso("ady") lang.aff("Abkhaz-Adyge") ``` I would like to point out that you can create strings in R using single or double quotes. Since inserting single quotes in a string created with single quotes causes an error in R, I use double quotes in my tutorial. You can use single quotes, but be careful and remember that `'Ma'ya'` is an incorrect string in R. * a vector of strings ```{r} area.lang(c("Kabardian", "Aduge")) lang <- c("Kabardian", "Russian") aff.lang(lang) ``` * other functions. For example, let's try to get a vector of ISO codes for the Circassian languages ```{r} iso.lang(lang.aff("Abkhaz-Adyge")) ``` If you are new to R, it is important to mention that you can create a table with languages, features and other parametres with any spreadsheet software you used to work. Then you can import the created file to R using standard tools. ### 3. Spell Checker: look carefully at warnings! All functions which take a vector of languages are enriched with a kind of a spell checker. If a language from a query is absent in the database, functions return a warning message containing a set of candidates with the minimal Levenshtein distance to the language from the query. ```{r} aff.lang("Kabardian") ``` ### 4. `subc.lang()` function The `subc.lang()` function returns language subclassification in the Newick tree format. ```{r} subc.lang("Lechitic") ``` This format is hard to interpret by itself, but there are some tools in R that make it possible to visualise those subclassifications: ```{r} library(ape) plot(read.tree(text = subc.lang("Lechitic"))) ``` It is possible to specify colors of tips in case you want to emphasize some nodes: ```{r} plot(read.tree(text = subc.lang("Lechitic")), tip.color = c("red", "black", "black", "black")) ``` As you can see nodes are counted from bottom to top. For more sophisticated tree visualization you can look into [`ggtree` package](https://bioconductor.org/packages/release/bioc/html/ggtree.html). There are several linguistic packages that provide some functionality for creating glottolog trees: * [`glottoTrees`](https://github.com/erichround/glottoTrees) package by Erich Round * [`lingtypr`](https://gitlab.com/laurabecker/lingtypr) package by Laura Becker