Customise your query

Geocoding is surprisingly hard. Address formats and spellings differ in and between countries; administrative areas on different levels intersect; names, numbers, and boundaries change over time — you name it. The OpenCage API, therefore, supports about a dozen parameters to customise queries. This vignette explains how to use the query parameters with {opencage} to get better geocoding results.

Multiple results

Forward geocoding typically returns multiple results because many places have the same or similar names.

By default, oc_forward_df() only returns one result: the one defined as the best result by the OpenCage API. To receive more results, modify the limit argument, which specifies the maximum number of results that should be returned. Integer values between 1 and 100 are allowed.

oc_forward_df("Berlin")
#> # A tibble: 1 × 4
#>   placename oc_lat oc_lng oc_formatted   
#>   <chr>      <dbl>  <dbl> <chr>          
#> 1 Berlin      52.5   13.4 Berlin, Germany
oc_forward_df("Berlin", limit = 5)
#> # A tibble: 5 × 4
#>   placename oc_lat oc_lng oc_formatted                                               
#>   <chr>      <dbl>  <dbl> <chr>                                                      
#> 1 Berlin      52.5   13.4 Berlin, Germany                                            
#> 2 Berlin      44.5  -71.2 Berlin, NH 03570, United States of America                 
#> 3 Berlin      52.5   13.4 Berlin Ostbahnhof, Mitteltunnel, 10243 Berlin, Germany     
#> 4 Berlin      39.8  -89.9 Berlin, Sangamon County, Illinois, United States of America
#> 5 Berlin      41.6  -72.7 Berlin, Connecticut, United States of America

Reverse geocoding returns at most one result. Therefore, oc_reverse_df() does not support the limit argument.

OpenCage may sometimes have more than one record of one place. Duplicated records are not returned by default. If you set the no_dedupe argument to TRUE, you will receive duplicated results when available.

oc_forward_df("Berlin", limit = 5, no_dedupe = TRUE)
#> # A tibble: 5 × 4
#>   placename oc_lat oc_lng oc_formatted                                               
#>   <chr>      <dbl>  <dbl> <chr>                                                      
#> 1 Berlin      52.5   13.4 Berlin, Germany                                            
#> 2 Berlin      44.5  -71.2 Berlin, NH 03570, United States of America                 
#> 3 Berlin      52.5   13.4 Berlin Ostbahnhof, Mitteltunnel, 10243 Berlin, Germany     
#> 4 Berlin      39.8  -89.9 Berlin, Sangamon County, Illinois, United States of America
#> 5 Berlin      41.6  -72.7 Berlin, Connecticut, United States of America

Better targeted results

As you can see, place names are often ambiguous. Happily, the OpenCage API has tools to deal with this problem. The countrycode, bounds, and proximity arguments can make the query more precise. min_confidence lets you limit the results to those with at least a specified confidence score (a high confidence score does not necessarily mark the “best” or most “relevant” result, though). These parameters are only relevant and available for forward geocoding.

countrycode

The countrycode parameter restricts the results to the given country. The country code is a two-letter code as defined by the ISO 3166-1 Alpha 2 standard, e.g. “AR” for Argentina, “FR” for France, and “NZ” for New Zealand.

oc_forward_df(placename = "Paris", countrycode = "US", limit = 5)
#> # A tibble: 5 × 4
#>   placename oc_lat oc_lng oc_formatted                              
#>   <chr>      <dbl>  <dbl> <chr>                                     
#> 1 Paris       33.7  -95.6 Paris, Texas, United States of America    
#> 2 Paris       38.2  -84.3 Paris, KY 40361, United States of America 
#> 3 Paris       36.3  -88.3 Paris, Tennessee, United States of America
#> 4 Paris       39.6  -87.7 Paris, IL 61944, United States of America 
#> 5 Paris       44.3  -70.5 Paris, 04281, United States of America

Multiple countrycodes per placename must be wrapped in a list. Here is an example with places called “Paris” in Italy and Portugal.

oc_forward_df(placename = "Paris", countrycode = list(c("IT", "PT")), limit = 5)
#> # A tibble: 5 × 4
#>   placename oc_lat oc_lng oc_formatted                                   
#>   <chr>      <dbl>  <dbl> <chr>                                          
#> 1 Paris       44.6   7.28 Brossasco, Cuneo, Italy                        
#> 2 Paris       46.5  10.4  23030 Valfurva SO, Italy                       
#> 3 Paris       37.4  -8.79 8670-320 São Teotónio, Portugal                
#> 4 Paris       43.5  12.1  Paris, 52035 Monterchi AR, Italy               
#> 5 Paris       43.8  11.3  Paris, Via dei Banchi, 50123 Florence FI, Italy
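
One way to see why the list is needed, assuming {opencage} matches each element of countrycode to one placename (which is what the vectorised interface suggests): a bare vector of two codes has length two, whereas the same vector wrapped in a list has length one. The following base R sketch makes no API call, and the variable names are purely illustrative.

```r
# A bare character vector has two elements, which reads as
# "one country code for each of two placenames".
codes_vector <- c("IT", "PT")

# Wrapped in a list, both codes form a single element that
# belongs to one placename.
codes_list <- list(c("IT", "PT"))

length(codes_vector) # 2
length(codes_list)   # 1

# Two placenames, the first with two candidate countries,
# the second with one.
codes_per_place <- list(c("IT", "PT"), "US")
lengths(codes_per_place) # 2 1
```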

Despite the name, country codes also exist for territories that are not independent states, e.g. Gibraltar (“GI”), Greenland (“GL”), Guadeloupe (“GP”), or Guam (“GU”). You can look up specific country codes with the {ISOcodes} or {countrycode} packages or on the ISO or Wikipedia webpages. In fact, you can also look up country codes via OpenCage. If you were interested in the country code of Curaçao, for example, you could run:

oc_forward_df("Curaçao", no_annotations = FALSE)["oc_iso_3166_1_alpha_2"]
#> # A tibble: 1 × 1
#>   oc_iso_3166_1_alpha_2
#>   <chr>                
#> 1 CW
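
If country codes come from user input or a messy data source, it can help to check their shape before sending a query. The sketch below uses a hypothetical helper, looks_like_alpha2(), which is not part of {opencage}. It only tests whether a string consists of exactly two ASCII letters; it does not verify that the code is actually assigned in ISO 3166-1.

```r
# Hypothetical helper: TRUE for strings shaped like an ISO 3166-1
# alpha-2 code (two ASCII letters, upper or lower case).
# This is a shape check only, not a lookup against the standard.
looks_like_alpha2 <- function(x) {
  grepl("^[A-Za-z]{2}$", x)
}

candidates <- c("AR", "fr", "NZL", "N1", "")
looks_like_alpha2(candidates)
# TRUE TRUE FALSE FALSE FALSE
```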

bounds

The bounds parameter restricts the possible results to a defined bounding box. A bounding box is a named numeric vector with four coordinates specifying its south-west and north-east corners: (xmin, ymin, xmax, ymax). The bounds parameter can most easily be specified with the oc_bbox() helper. For example, bounds = oc_bbox(-0.56, 51.28, 0.27, 51.68). OpenCage provides a ‘bounds-finder’ to interactively determine bounds values.

Below is an example of the use of bounds where the bounding box specifies the South American continent.

oc_forward_df(placename = "Paris", bounds = oc_bbox(-97, -56, -32, 12), limit = 5)
#> # A tibble: 5 × 4
#>   placename oc_lat oc_lng oc_formatted                                                  
#>   <chr>      <dbl>  <dbl> <chr>                                                         
#> 1 Paris       8.05  -80.6 Paris, Distrito Parita, Panama                                
#> 2 Paris      -6.71  -69.9 Eirunepé, Região Geográfica Intermediária de Tefé, Brazil     
#> 3 Paris      -3.99  -79.2 110105, Loja, Ecuador                                         
#> 4 Paris     -13.5   -62.5 Canton Motegua, Municipio Baures, Provincia de Iténez, Bolivia
#> 5 Paris     -23.5   -47.5 Paris, Jardim Santa Fé, Sorocaba - SP, Brazil
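
To double-check that results fall inside a box, or to post-filter results you already have, a simple point-in-box test is enough. The sketch below is base R only: in_bbox() is a hypothetical helper, and the named vector is built by hand with the (xmin, ymin, xmax, ymax) layout described above rather than with oc_bbox(). It assumes the box does not cross the antimeridian.

```r
# Hypothetical helper: is each point inside the box (xmin, ymin, xmax, ymax)?
# Assumes xmin <= xmax, i.e. the box does not cross the antimeridian.
in_bbox <- function(lng, lat, bbox) {
  lng >= bbox[["xmin"]] & lng <= bbox[["xmax"]] &
    lat >= bbox[["ymin"]] & lat <= bbox[["ymax"]]
}

# Same corner values as in the query above.
south_america <- c(xmin = -97, ymin = -56, xmax = -32, ymax = 12)

# Rounded coordinates of Paris (Panama), Paris (Sorocaba, Brazil),
# and Paris (France).
lng <- c(-80.6, -47.5, 2.32)
lat <- c(8.05, -23.5, 48.9)

in_bbox(lng, lat, south_america)
# TRUE TRUE FALSE
```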

Again, you can also use {opencage} to determine a bounding box for subsequent queries. If you wanted to see how many Plaça d’Espanya there are on the Balearic Islands, for example, you could find the appropriate bounding box and then search for the squares:

hi <- oc_forward_df(placename = "Balearic Islands", no_annotations = FALSE)

hi_bbox <-
  oc_bbox(
    hi$oc_southwest_lng,
    hi$oc_southwest_lat,
    hi$oc_northeast_lng,
    hi$oc_northeast_lat
  )

oc_forward_df(placename = "Plaça d'Espanya", bounds = hi_bbox, limit = 20)
#> # A tibble: 16 × 4
#>    placename       oc_lat oc_lng oc_formatted                                                   
#>    <chr>            <dbl>  <dbl> <chr>                                                          
#>  1 Plaça d'Espanya   39.6   2.65 Plaça d'Espanya, Carrer d'Eusebi Estada, 07005 Palma, Spain    
#>  2 Plaça d'Espanya   39.6   2.65 Plaça d'Espanya, Canavall, Palma, Balearic Islands, Spain      
#>  3 Plaça d'Espanya   39.0   1.53 Plaça d'Espanya, Santa Eulària des Riu, Balearic Islands, Spain
#>  4 Plaça d'Espanya   39.0   1.30 Plaça d'Espanya, 07820 Sant Antoni de Portmany, Spain          
#>  5 Plaça d'Espanya   39.9   4.27 Plaça d'Espanya, Maó, Spain                                    
#>  6 Plaça d'Espanya   39.5   3.15 Plaça d'Espanya, 07200 Felanich, Spain                         
#>  7 Plaça d'Espanya   39.0   1.53 Plaça d'Espanya, 07840 Santa Eulària des Riu, Spain            
#>  8 Plaça d'Espanya   39.6   2.65 Plaça d'Espanya, 07002 Palma, Spain                            
#>  9 Plaça d'Espanya   39.9   4.27 Plaça d'Espanya, 07701 Maó, Spain                              
#> 10 Plaça d'Espanya   38.9   1.44 Plaça d'Espanya, Ibiza, Spain                                  
#> 11 Plaça d'Espanya   39.8   2.72 Plaça d'Espanya, Sóller, Spain                                 
#> 12 Plaça d'Espanya   39.7   2.91 Plaça d'Espanya, Inca, Spain                                   
#> 13 Plaça d'Espanya   39.6   2.42 plaça d'Espanya, Andratx, Spain                                
#> 14 Plaça d'Espanya   39.6   2.75 Plaça d'Espanya, Marratxí, Spain                               
#> 15 Plaça d'Espanya   39.6   2.90 Plaça d'Espanya, 07140 Sencelles, Spain                        
#> 16 Plaça d'Espanya   39.8   2.74 Plaça d'Espanya, Fornalutx, Spain

Note that OpenCage does not support point-of-interest or feature search, like “show me all bus stops in this area”. If you are more interested in these kinds of features, you might want to take a look at the {osmdata} package.

proximity

The proximity parameter provides OpenCage with a hint to bias results in favour of those closer to the specified location. It is just one of many factors used for ranking results, however, and (some) results may be far away from the location or point passed to the proximity parameter. A point is a named numeric vector of a latitude and longitude coordinate pair in decimal format. The proximity parameter can most easily be specified with the oc_points() helper, for example proximity = oc_points(38.0, -84.5), if you happen to already know the coordinates. If not, you can also look them up with {opencage}, of course:

lx <- oc_forward_df("Lexington, Kentucky")

lx_point <- oc_points(lx$oc_lat, lx$oc_lng)

oc_forward_df(placename = "Paris", proximity = lx_point, limit = 5)
#> # A tibble: 5 × 4
#>   placename oc_lat oc_lng oc_formatted                                              
#>   <chr>      <dbl>  <dbl> <chr>                                                     
#> 1 Paris       38.2 -84.3  Paris, KY 40361, United States of America                 
#> 2 Paris       48.9   2.32 Paris, Ile-de-France, France                              
#> 3 Paris       39.6 -87.7  Paris, IL 61944, United States of America                 
#> 4 Paris       38.8 -85.6  Paris, Jennings County, IN 47230, United States of America
#> 5 Paris       33.7 -95.6  Paris, Texas, United States of America

Note that the French capital is listed before other places in the US, which are closer to the point provided. This illustrates how proximity is only one of many factors influencing the ranking of results.
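
To check this, you can compute the great-circle distance from each result to the proximity point. The sketch below uses the rounded coordinates printed above and a hypothetical haversine_km() helper written in base R; it is not part of {opencage}, and the mean Earth radius of 6371 km is an assumption of the spherical model.

```r
# Great-circle distance in km between points given in decimal degrees
# (haversine formula on a sphere with an assumed mean radius of 6371 km).
haversine_km <- function(lat1, lng1, lat2, lng2, radius = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * radius * asin(sqrt(a))
}

# Results in the order returned by the API (rounded coordinates).
results <- data.frame(
  place = c("Paris, KY", "Paris, France", "Paris, IL", "Paris, IN", "Paris, Texas"),
  oc_lat = c(38.2, 48.9, 39.6, 38.8, 33.7),
  oc_lng = c(-84.3, 2.32, -87.7, -85.6, -95.6)
)

# Distance from the proximity point near Lexington, Kentucky (38.0, -84.5).
results$dist_km <- haversine_km(38.0, -84.5, results$oc_lat, results$oc_lng)

# Re-rank purely by distance.
results[order(results$dist_km), c("place", "dist_km")]
```

Sorted by distance alone, the French capital comes last, although the API ranked it second.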

Confidence

min_confidence — an integer value between 0 and 10 — indicates the precision of the returned result as defined by its geographical extent, i.e. by the extent of the result’s bounding box. When you specify min_confidence, only results with at least the requested confidence will be returned. Thus, in the following example, the French capital is too large to be returned.

oc_forward_df(placename = "Paris", min_confidence = 7, limit = 5)
#> # A tibble: 5 × 4
#>   placename oc_lat oc_lng oc_formatted                                            
#>   <chr>      <dbl>  <dbl> <chr>                                                   
#> 1 Paris       38.2  -84.3 Paris, KY 40361, United States of America               
#> 2 Paris       36.3  -88.3 Paris, Tennessee, United States of America              
#> 3 Paris       39.6  -87.7 Paris, IL 61944, United States of America               
#> 4 Paris       35.3  -93.7 Paris, Logan County, AR 72855, United States of America 
#> 5 Paris       43.0  -75.3 Paris, Oneida County, New York, United States of America

Note that confidence is not used for the ranking of results. It does not tell you which result is more “correct” or “relevant”, nor what type of thing the result is, but rather how small a result is, geographically speaking. See the API documentation for details.

Retrieve more information from the API

Besides parameters to target your search better, OpenCage offers parameters to receive more or specific types of information from the API.

language

If you would like to get your results in a specific language, you can pass an IETF BCP 47 language tag, such as “tr” for Turkish or “pt-BR” for Brazilian Portuguese, to the language parameter. OpenCage will attempt to return results in that language.

oc_forward_df(placename = "Munich", language = "tr")
#> # A tibble: 1 × 4
#>   placename oc_lat oc_lng oc_formatted           
#>   <chr>      <dbl>  <dbl> <chr>                  
#> 1 Munich      48.1   11.6 Münih, Bavyera, Almanya

Alternatively, you can specify the “native” tag, in which case OpenCage will attempt to return the response in the “official” language(s) of the location. Keep in mind, however, that some countries have more than one official language or that the official language may not be the one actually used day-to-day.

oc_forward_df(placename = "Munich", language = "native")
#> # A tibble: 1 × 4
#>   placename oc_lat oc_lng oc_formatted                
#>   <chr>      <dbl>  <dbl> <chr>                       
#> 1 Munich      48.1   11.6 München, Bayern, Deutschland

If the language parameter is set to NULL (which is the default), the tag is not recognized, or OpenCage does not have a record in that language, the results will be returned in English.

oc_forward_df(placename = "München")
#> # A tibble: 1 × 4
#>   placename oc_lat oc_lng oc_formatted            
#>   <chr>      <dbl>  <dbl> <chr>                   
#> 1 München     48.1   11.6 Munich, Bavaria, Germany

To find the correct language tag for your desired language, you can search for the language on the BCP47 language subtag lookup, for example. Note, however, that there are some language tags in use on OpenStreetMap, one of OpenCage’s main sources, that do not conform with the IETF BCP 47 standard. For example, OSM uses zh_pinyin instead of zh-Latn-pinyin for Hanyu Pinyin. It might, therefore, be helpful to consult the details page of the target country on openstreetmap.org to see which language tags are actually used. In any case, neither the OpenCage API nor the functions in this package will validate the language tags you provide.

For further details, see OpenCage’s API documentation.
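
Since language tags are not validated anywhere, a rough client-side check can catch obvious typos before a query is sent. The following base R sketch uses a hypothetical helper, looks_like_bcp47(), which is not part of {opencage}. It only tests the general shape of a tag (a two- or three-letter language subtag followed by hyphen-separated subtags) and treats “native” as a special case; it does not consult the IANA subtag registry, so it is an approximation rather than a full BCP 47 validator.

```r
# Hypothetical helper: TRUE if a tag is "native" or has the rough shape of
# a BCP 47 tag. Shape check only; subtags are not looked up in any registry.
looks_like_bcp47 <- function(tag) {
  tag == "native" | grepl("^[A-Za-z]{2,3}(-[A-Za-z0-9]{1,8})*$", tag)
}

tags <- c("tr", "pt-BR", "native", "zh-Latn-pinyin", "zh_pinyin")
looks_like_bcp47(tags)
# TRUE TRUE TRUE TRUE FALSE
```

A FALSE here is a prompt for a second look rather than a verdict: as the zh_pinyin example shows, a non-conforming tag may be exactly the one that OpenStreetMap uses.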

Annotations

OpenCage supplies additional information about the result location in what it calls annotations. Annotations include, among a variety of other types of information, country information, time of sunset and sunrise, UN M49 codes or the location in different geocoding formats, like Maidenhead, Mercator projection (EPSG:3857), geohash or what3words. Some annotations, like the Irish Transverse Mercator (ITM, EPSG:2157) or the Federal Information Processing Standards (FIPS) code will only be shown when appropriate.

Whether the annotations are shown is controlled by the no_annotations argument. It is TRUE by default, which means that the output will not contain annotations. (Yes, inverted argument names are confusing, but we just follow OpenCage’s lead here.) When you set no_annotations to FALSE, all columns are returned (i.e. output is implicitly set to "all"). This leads to a result with a lot of columns.

oc_forward_df("Dublin", no_annotations = FALSE)
#> # A tibble: 1 × 70
#>   placen…¹ oc_lat oc_lng oc_co…² oc_fo…³ oc_mgrs oc_ma…⁴ oc_ca…⁵ oc_flag oc_ge…⁶ oc_qi…⁷ oc_wi…⁸ oc_dm…⁹ oc_dm…˟ oc_it…˟ oc_it…˟
#>   <chr>     <dbl>  <dbl>   <int> <chr>   <chr>   <chr>     <int> <chr>   <chr>     <dbl> <chr>   <chr>   <chr>   <chr>   <chr>  
#> 1 Dublin     53.3  -6.26       5 Dublin… 29UPV8… IO63ui…     353 "\U000… gc7x98…    114. Q1761   53° 20… 6° 15'… 715826… 734697…
#> # … with 54 more variables: oc_mercator_x <dbl>, oc_mercator_y <dbl>, oc_osm_edit_url <chr>, oc_osm_note_url <chr>,
#> #   oc_osm_url <chr>, oc_un_m49_statistical_groupings <list>, oc_un_m49_regions_europe <chr>, oc_un_m49_regions_ie <chr>,
#> #   oc_un_m49_regions_northern_europe <chr>, oc_un_m49_regions_world <chr>, oc_currency_alternate_symbols <list>,
#> #   oc_currency_decimal_mark <chr>, oc_currency_html_entity <chr>, oc_currency_iso_code <chr>, oc_currency_iso_numeric <chr>,
#> #   oc_currency_name <chr>, oc_currency_smallest_denomination <int>, oc_currency_subunit <chr>,
#> #   oc_currency_subunit_to_unit <int>, oc_currency_symbol <chr>, oc_currency_symbol_first <int>,
#> #   oc_currency_thousands_separator <chr>, oc_roadinfo_drive_on <chr>, oc_roadinfo_speed_in <chr>, …

roadinfo

roadinfo indicates whether the geocoder should attempt to match the nearest road (rather than an address) and provide additional road and driving information. It is FALSE by default, which means OpenCage will not attempt to match the nearest road. Some road and driving information is nevertheless provided as part of the annotations (see above), even when roadinfo is set to FALSE.

oc_forward_df(placename = c("Europa Advance Rd", "Bovoni Rd"), roadinfo = TRUE)
#> # A tibble: 2 × 30
#>   placen…¹ oc_lat oc_lng oc_co…² oc_fo…³ oc_ro…⁴ oc_ro…⁵ oc_ro…⁶ oc_ro…⁷ oc_ro…⁸ oc_ro…⁹ oc_no…˟ oc_no…˟ oc_so…˟ oc_so…˟ oc_is…˟
#>   <chr>     <dbl>  <dbl>   <int> <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>     <dbl>   <dbl>   <dbl>   <dbl> <chr>  
#> 1 Europa …   36.1  -5.34       9 Europa… right   yes     Europa… second… km/h    asphalt    36.1   -5.34    36.1   -5.35 GI     
#> 2 Bovoni …   18.3 -64.9        8 Bovoni… left    <NA>    Bovoni… primary mph     <NA>       18.3  -64.9     18.3  -64.9  VI     
#> # … with 14 more variables: oc_iso_3166_1_alpha_3 <chr>, oc_category <chr>, oc_type <chr>, oc_city <chr>, oc_continent <chr>,
#> #   oc_country <chr>, oc_country_code <chr>, oc_postcode <chr>, oc_road <chr>, oc_road_type <chr>, oc_iso_3166_2 <list>,
#> #   oc_county <chr>, oc_state <chr>, oc_state_code <chr>, and abbreviated variable names ¹​placename, ²​oc_confidence,
#> #   ³​oc_formatted, ⁴​oc_roadinfo_drive_on, ⁵​oc_roadinfo_oneway, ⁶​oc_roadinfo_road, ⁷​oc_roadinfo_road_type,
#> #   ⁸​oc_roadinfo_speed_in, ⁹​oc_roadinfo_surface, ˟​oc_northeast_lat, ˟​oc_northeast_lng, ˟​oc_southwest_lat, ˟​oc_southwest_lng,
#> #   ˟​oc_iso_3166_1_alpha_2

A blog post provides more details.

Abbreviated addresses

The geocoding functions also have an abbrv parameter, which is FALSE by default. When it is TRUE, the addresses in the formatted field of the results are abbreviated (e.g. “Main St.” instead of “Main Street”).

oc_forward_df("Wall Street")
#> # A tibble: 1 × 4
#>   placename   oc_lat oc_lng oc_formatted                                             
#>   <chr>        <dbl>  <dbl> <chr>                                                    
#> 1 Wall Street   40.7  -74.0 Wall Street, New York, NY 10005, United States of America
oc_forward_df("Wall Street", abbrv = TRUE)
#> # A tibble: 1 × 4
#>   placename   oc_lat oc_lng oc_formatted                    
#>   <chr>        <dbl>  <dbl> <chr>                           
#> 1 Wall Street   40.7  -74.0 Wall St, New York, NY 10005, USA

See this blog post for more information.

address_only

When address_only is set to TRUE (FALSE by default), OpenCage will attempt to exclude names of points of interest from the formatted field of the results. In the following example, the POI “Hôtel de ville de Nantes” (town hall of Nantes) is removed from the oc_formatted column with address_only = TRUE.

oc_reverse_df(47.21864, -1.55413)
#> # A tibble: 1 × 3
#>   latitude longitude oc_formatted                                                             
#>      <dbl>     <dbl> <chr>                                                                    
#> 1     47.2     -1.55 Hôtel de ville de Nantes, Place de l'Hôtel de Ville, 44000 Nantes, France
oc_reverse_df(47.21864, -1.55413, address_only = TRUE)
#> # A tibble: 1 × 3
#>   latitude longitude oc_formatted                                   
#>      <dbl>     <dbl> <chr>                                          
#> 1     47.2     -1.55 Place de l'Hôtel de Ville, 44000 Nantes, France

Vectorised arguments

All of the function arguments mentioned above are vectorised, so you can send queries like this:

oc_forward_df(
  placename = c("New York", "Rio", "Tokyo"),
  language = c("es", "de", "fr")
)
#> # A tibble: 3 × 4
#>   placename oc_lat oc_lng oc_formatted                                                     
#>   <chr>      <dbl>  <dbl> <chr>                                                            
#> 1 New York    40.7  -74.0 Nueva York, Estados Unidos de América                            
#> 2 Rio        -22.9  -43.2 Rio de Janeiro, Região Metropolitana do Rio de Janeiro, Brasilien
#> 3 Tokyo       35.7  140.  Tokyo, Japon

Or geocode place names with country codes in a data frame:

for_df <-
  data.frame(
    location = c("Golden Gate Bridge", "Buckingham Palace", "Eiffel Tower"),
    ccode = c("at", "cg", "be")
  )

oc_forward_df(for_df, placename = location, countrycode = ccode)
#> # A tibble: 3 × 5
#>   location           ccode oc_lat oc_lng oc_formatted                                                                           
#>   <chr>              <chr>  <dbl>  <dbl> <chr>                                                                                  
#> 1 Golden Gate Bridge at     47.6   15.8  Wiesenbauer, Martin's Golden Gate Bridge, 8684 Gemeinde Spital am Semmering, Austria   
#> 2 Buckingham Palace  cg     -4.80  11.8  Buckingham Palace, Boulevard du Général Charles de Gaulle, Pointe-Noire, Congo-Brazzav…
#> 3 Eiffel Tower       be     50.9    4.34 Eiffel Tower, Avenue de Bouchout - Boechoutlaan, 1020 Brussels, Belgium

This also works with oc_reverse_df(), of course.

rev_df <-
  data.frame(
    lat = c(51.952659, 41.401372),
    lon = c(7.632473, 2.128685)
  )

oc_reverse_df(rev_df, lat, lon, language = "native")
#> # A tibble: 2 × 3
#>     lat   lon oc_formatted                                        
#>   <dbl> <dbl> <chr>                                               
#> 1  52.0  7.63 Friedrich-Ebert-Straße 7, 48153 Münster, Deutschland
#> 2  41.4  2.13 Carrer de Calatrava, 68, 08017 Barcelona, España

Further information

For further information about the output and query parameters, see the OpenCage API docs and the OpenCage FAQ. When building queries, OpenCage’s best practices can be very useful, as well as their guide to geocoding accuracy.