--- title: "Google Cloud Translation API" author: "Mark Edmondson" date: "`r Sys.Date()`" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Google Cloud Translation API} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- The Google Cloud Translation API provides a simple programmatic interface for translating an arbitrary string into any supported language. Translation API is highly responsive, so websites and applications can integrate with Translation API for fast, dynamic translation of source text from the source language to a target language (e.g. French to English). Read more [on the Google Cloud Translation Website](https://cloud.google.com/translate/) You can detect the language via `gl_translate_detect`, or translate and detect language via `gl_translate` ### Language Translation Translate text via `gl_translate`. Note this is a lot more refined than the free version on Google's translation website. ```r library(googleLanguageR) text <- "to administer medicince to animals is frequently a very difficult matter, and yet sometimes it's necessary to do so" ## translate British into Danish gl_translate(text, target = "da")$translatedText ``` You can choose the target language via the argument `target`. The function will automatically detect the language if you do not define an argument `source`. This function which will also detect the langauge. As it costs the same as `gl_translate_detect`, its usually cheaper to detect and translate in one step. You can pass a vector of text which will first be attempted to translate in one API call - if that fails due to being greater than the API limits, it will attempt again but vectorising the API calls. This will result in more calls and be slower, but cost the same as you are charged per character translated, not per API call. #### HTML support You can also supply web HTML and select the `format='html'` which will handle HTML tags to give you a cleaner translation. Consider removing anything not needed to be translated first, such as JavaScript and CSS scripts using the tools of `rvest` - an example is shown below: ```r # translate webpages library(rvest) library(googleLanguageR) my_url <- "http://www.dr.dk/nyheder/indland/greenpeace-facebook-og-google-boer-foelge-apples-groenne-planer" ## in this case the content to translate is in css select .wcms-article-content read_html(my_url) %>% # read html html_node(css = ".wcms-article-content") %>% # select article content html_text %>% # extract text gl_translate(format = "html") %>% # translate with html flag dplyr::select(translatedText) # show translatedText column of output tibble ``` ### Language Detection This function only detects the language: ```r ## which language is this? gl_translate_detect("katten sidder på måtten") ``` The more text it has, the better. And it helps if its not Danish... It may be better to use [`cld2`](https://github.com/ropensci/cld2) to translate offline first, to avoid charges if the translation is unnecessary (e.g. already in English). You could then verify online for more uncertain cases. ```r cld2::detect_language("katten sidder på måtten") ``` ### Translation API limits The API limits in three ways: characters per day, characters per 100 seconds, and API requests per 100 seconds. All can be set in the API manager in Google Cloud console: `https://console.developers.google.com/apis/api/translate.googleapis.com/quotas` The library will limit the API calls for the characters and API requests per 100 seconds. The API will automatically retry if you are making requests too quickly, and also pause to make sure you only send `100000` characters per 100 seconds.