Title: | Datasets for Historians |
Description: | These sample data sets are intended for historians learning R. They include population, institutional, religious, military, and prosopographical data suitable for mapping, quantitative analysis, and network analysis. |
Authors: | Lincoln Mullen [aut, cre] |
Maintainer: | Lincoln Mullen <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.3.0 |
Built: | 2025-02-12 03:09:55 UTC |
Source: | https://github.com/ropensci/historydata |
Dates when Roman Catholic dioceses and archdioceses in the United States were founded or made metropolitan sees, with geocoded locations. The sources cited indicate that none of these sees have been discontinued.
A data frame with 425 observations of 6 variables.
: name, and thus location, of the diocese.
, date
: the date when the diocese was first founded or made a metropolitan see. Encoded as a date object.
: the rite overseen by the diocese. Regions with ordinary
jurisdiction but not episcopal character are not included.
, long
: latitude and longitude coordinates for the
headquarters city of the diocese.
This data is compiled from several sources:
Joseph Bernard Code, Dictionary of the American Hierarchy (1789-1964) (New York: Joseph F. Wagner, 1964), 425-26.
For the United States since 1963, Canada, and Mexico: https://www.catholic-hierarchy.org/
These are wholesale market prices in the city of Dijon in Burgundy in central France from 1568 to 1630. They include the major cereal grains, different qualities of wine, dried legumes, oils used for cooking, seeds, and candle wax made from beef tallow. Prices were generally recorded by the city council at the same time of year, on the first market day after the feast of St. Martin (November 11). All prices are in sous tournois (20 sous = 1 livre tournois).
dijon_prices dijon_prices_wide
dijon_prices is a data frame with 1,110 observations of 6 variables. dijon_prices_wide is a data frame with 19 observations of 65 variables. dijon_prices_wide contains the data as it was transcribed; that data has been converted to a tidy format in dijon_prices.
An object of class tbl_df (inherits from tbl, data.frame) with 19 rows and 65 columns.
: the commodity for sale
: the
amount of the commodity for that price
: the price in sous tournois.
citation, citation_date
: citations and dates for documents in the Archives municipales de Dijon.
Mack Holt, George Mason University
All citations are to the Archives municipales de Dijon. See the columns citation and citation_date in dijon_prices for the documents from which each price was gathered.
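Because the Dijon prices are recorded in sous tournois, it can help to convert them to livres tournois using the stated rate of 20 sous to 1 livre. The following is a minimal sketch; the function names `sous_to_livres` and `split_livres_sous` are hypothetical helpers, not part of the package.

```r
# Hypothetical helpers; the only fact taken from the documentation
# is the conversion rate: 20 sous = 1 livre tournois.

# Convert a price in sous to a (possibly fractional) number of livres.
sous_to_livres <- function(sous) {
  sous / 20
}

# Split a price in sous into whole livres and leftover sous.
split_livres_sous <- function(sous) {
  list(livres = sous %/% 20, sous = sous %% 20)
}

sous_to_livres(50)     # 2.5
split_livres_sous(50)  # 2 livres and 10 sous
```

Both helpers are vectorized, so they could be applied to a whole column of prices at once.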
This dataset contains information about the founding of colleges established before 1848 in the United States of America.
A data frame with 65 observations of 6 variables.
: The name of the college or university.
: The name under which the institution was
founded, if different.
, state
: The location of the institution.
: The year that the institution was founded.
: The sponsoring religious denomination, or
if not founded by a denomination.
George Oberle, George Mason University
Daniel Walker Howe
This data was transcribed by George Oberle from the table "Some American Institutions of Higher Education Founded Before 1848," in Daniel Walker Howe, What Hath God Wrought: The Transformation of America, 1815-1848 (New York: Oxford University Press, 2007), 460-461.
head(early_colleges)
if (require(ggplot2)) {
  # geom_histogram() bins the founding years; geom_bar() does not take a binwidth
  ggplot(early_colleges, aes(x = established)) +
    geom_histogram(binwidth = 5) +
    ggtitle("Founding Dates of Early American Colleges")
}
This package provides sample datasets of interest to historians, analogous to the datasets package for R generally and the HistData package for datasets from the history of statistics.
Citations to the sources for datasets are provided in the documentation for each dataset.
This dataset contains information about the appointments and careers of all federal judges in United States history since 1789. It includes "judges presidentially appointed during good behavior who have served since 1789 on the U.S. District Courts, the U.S. Courts of Appeals, the Supreme Court of the United States, the former U.S. Circuit Courts, and the federal judiciary's courts of special jurisdiction." Some unnecessary information from the source has been excluded.
Two data frames, judges_people and judges_appointments.
The data frame judges_people contains information about the judges, such as names and vital information. The data frame judges_appointments contains information about their appointments,
such as the name of the court, nominating president, and the dates of
This data is taken from the Biographical Directory of Federal Judges, 1789-present.
data(judges_people)
data(judges_appointments)
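Since the judges data is split across two data frames, most analyses will need to join them. The sketch below uses toy stand-ins rather than the real data: the key column `judge_id`, the other column names, and every value are hypothetical placeholders, so check the actual column names with `names(judges_people)` before adapting it.

```r
# Toy stand-ins for judges_people and judges_appointments.
# All column names and values here are invented for illustration.
people <- data.frame(
  judge_id  = c(1, 2),
  name_last = c("Placeholder A", "Placeholder B"),
  stringsAsFactors = FALSE
)
appointments <- data.frame(
  judge_id   = c(1, 1, 2),
  court_name = c("Court X", "Court Y", "Court X"),
  stringsAsFactors = FALSE
)

# One row per appointment, with the judge's details attached.
joined <- merge(appointments, people, by = "judge_id")
nrow(joined)
```

A judge with several appointments appears once per appointment in the joined result.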
This dataset contains transcriptions of the membership figures from the annual meeting minutes of the Methodist Episcopal Church in the United States of America.
A data frame with 20241 observations of 11 variables.
Lincoln Mullen, George Mason University
See the url field in each row for a link to the page in the Hathi Trust from which that row was transcribed.
This dataset contains historic consumer price index (CPI) data including estimates before the modern U.S. CPI, retrieved from the Federal Reserve Bank of Minneapolis.
A data frame with 225 observations of 3 variables.
These details are taken from https://www.minneapolisfed.org/about-us/monetary-policy/inflation-calculator/consumer-price-index-1800- and edited.
Official U.S. data go back to 1913 for a consumer price index (CPI) comparable to what the U.S. Bureau of Labor Statistics (BLS) still calculates today. The table below reflects the following historical series as initially compiled by the BLS for the Handbook of Labor Statistics, with modern CPI data from 1913 to the present day:
1800 to 1851 - Index of Prices Paid by Vermont Farmers for Family Living
1851 to 1890 - Consumer Price Index by Ethel D. Hoover
1890 to 1912 - Cost of Living Index by Albert Rees
1913 to 1977 - Consumer Price Index (CPI)
1978 to present - Consumer Price Index for all Urban Consumers (CPI-U)
The dataset uses 1967 as the base year (1967 = 100). Data before 1913 should be considered estimates. To find out how much a price in Year 1 would be in Year 2 dollars:
Year 2 Price = Year 1 Price x (Year 2 CPI/Year 1 CPI)
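The conversion formula can be sketched as a small R function. The function name `adjust_price` and the index values in the usage lines are assumptions made for illustration; they are not taken from the dataset.

```r
# Year 2 Price = Year 1 Price x (Year 2 CPI / Year 1 CPI)
# `adjust_price` is a hypothetical helper, not part of the package.
adjust_price <- function(price, cpi_from, cpi_to) {
  price * (cpi_to / cpi_from)
}

# Made-up index values: if the CPI doubles from 50 to 100,
# a price of 10 becomes 20.
adjust_price(10, cpi_from = 50, cpi_to = 100)

# The function is vectorized over prices.
adjust_price(c(1, 2, 3), cpi_from = 50, cpi_to = 100)
```

In practice the two CPI values would be looked up in the dataset for the years of interest.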
: date of CPI, or estimate of CPI.
: average annual CPI, or estimate of average annual CPI.
: annual percentage change of the CPI.
This data is compiled from the Federal Reserve Bank of Minneapolis:
This dataset contains transcriptions of records of missions held by the Paulist Fathers, a Roman Catholic missionary order, in the nineteenth-century United States. This dataset contains only the most interesting data recorded in the full Paulist mission chronicles. The founding members of the Paulist Fathers began as members of the Redemptorist order. Their chronicles include both the missions held as Redemptorists and as Paulists. This transcription only includes missions up to the year 1893; the Paulist Chronicles contain more records than are transcribed here, and the Paulists continued holding missions for many more years than are recorded in the Chronicles.
A data frame with 841 observations of 17 variables.
: The number assigned to the mission in the
Paulist Mission Chronicles.
: The name of the church or cathedral at which the
mission was held.
, state
: The location of the mission.
, end_date
: The start and end dates of the
mission as date objects.
: The year of the mission.
: The decade of the mission (useful for faceting).
, duration_years
: The duration of the
mission in days (as an integer) and in weeks (as an ordered factor).
: The number of confessions heard by the Paulists
at the mission, which is a rough proxy for the number of people who
attended the mission.
: The total number of converts made during the mission
and people left under instruction for conversion after the mission was
: Whether the mission was held under the Redemptorist or
Paulist order.
, long
: The latitude and longitude of the city where
the mission was held.
, page
: The location of the mission record in the
Paulist mission chronicles.
Lincoln Mullen, George Mason University
The Paulist missions are recorded in Chronicle of the Missions Given by the Congregation of Missionary Priests of St. Paul the Apostle, six manuscript volumes, Office of Paulist History and Archives, North American Paulist Center, Washington, DC. Data transcribed by Lincoln Mullen.
This dataset contains transcriptions of annual denominational records for Presbyterians in the United States. This data was compiled by Herman Carl Weber for the Presbyterian Church U.S.A. in 1927. For an explanation of the variables, see the book from which the records were transcribed.
A data frame with 133 observations of 37 variables.
Weber includes two sections, one of raw data compiled from the Minutes of the General Assembly, which apparently includes the foreign membership of the Presbyterian churches. The second section contains the data that Weber used for his visualizations, which excludes the foreign membership and makes various calculations on the data.
Not all of the data in Weber is reproduced here. The data comes from part 1, with the tables on financial information excluded.
One field varies in title, from members received by exam to members received by confession, but has been presumed to refer to the same process.
Lincoln Mullen, George Mason University
Weber, Herman C. Presbyterian Statistics Through One Hundred Years, 1826–1926: Tabulated, Visualized, and Interpreted. Philadelphia: The General Council, Presbyterian Church in the U.S.A., 1927. https://catalog.hathitrust.org/Record/007109885.
Naval encounters during the Quasi-War between France and the United States of America.
A data frame with 198 observations of 16 variables.
This data was gathered by Abby Mullen.
This dataset contains estimates of the population of American Jews.
A data frame with 92 observations of 3 variables.
: date of estimate.
: the type of estimate. population_low and population_high are the lower and upper bounds on the population of American Jews; percentage_low and percentage_high are the lower and upper bounds on the percentage of Jews among the United States population.
: the value of the estimate.
This data is taken from the appendix in Jonathan D. Sarna, American Judaism: A History (New Haven: Yale University Press, 2004), 375-376.
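Because each row holds one estimate of one type, comparing lower and upper bounds requires reshaping the data from long to wide. The sketch below uses base R's `reshape()` on a toy data frame; the column names `year`, `type`, and `value` are assumptions, and the numbers are invented placeholders, not figures from Sarna.

```r
# Toy long-format data; column names and values are invented.
estimates <- data.frame(
  year  = c(1, 1, 2, 2),
  type  = c("population_low", "population_high",
            "population_low", "population_high"),
  value = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# One row per year, one column per estimate type.
wide <- reshape(estimates, idvar = "year", timevar = "type",
                direction = "wide")

# reshape() prefixes the new columns with "value."; strip that prefix.
names(wide) <- sub("^value\\.", "", names(wide))

# Width of the range between the low and high estimates.
wide$population_high - wide$population_low
```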
Spousal and parent/child relationships among selected members of the Tudor dynasty, suitable for network analysis.
A data frame with 35 observations of 3 variables.
, person_2
: The two people in the relationship.
: The type of relationship.
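An edge list like this one can be turned into an adjacency matrix for network analysis using base R alone. This is a minimal sketch on toy data: the column names `person_1` and `relationship` are assumed (only `person_2` is visible above), and the people and relationship labels are placeholders rather than rows from the dataset.

```r
# Toy edge list; names and relationship labels are invented.
edges <- data.frame(
  person_1     = c("A", "A", "A"),
  person_2     = c("B", "C", "D"),
  relationship = c("spouse", "parent", "parent"),
  stringsAsFactors = FALSE
)

# Every person who appears on either side of a relationship.
people <- sort(unique(c(edges$person_1, edges$person_2)))

# Square matrix of zeros with people as row and column names.
adj <- matrix(0L, nrow = length(people), ncol = length(people),
              dimnames = list(people, people))

# Mark each relationship in both directions (undirected network).
adj[cbind(edges$person_1, edges$person_2)] <- 1L
adj[cbind(edges$person_2, edges$person_1)] <- 1L

# Number of relationships per person.
degree <- rowSums(adj)
degree
```

Treating the ties as undirected is a simplification; a parent/child tie could instead be stored in one direction only.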
A dataset of cities mentioned in the U.S. Census and other data sources. In general, cities are included once they reach a population of 2,500. Data is recorded every ten years. See the dataset's GitHub page for details on how the data was gathered.
An object of class tbl_df (inherits from tbl, data.frame) with 62224 rows and 17 columns.
Erik Steiner and Jason Heppler, "United States Historical City Populations, 1790-2010," Spatial History Project, Center for Spatial and Textual Analysis, Stanford University, https://github.com/cestastanford/historical-us-city-populations/
Active Duty US Military Personnel by Regional Area and by Country: Worldwide Manpower Distribution by Geographical Area
A data frame with 50 observations of 27 variables.
This data subset was extracted and transcribed in February 2018 from 48 individual DoD military troop strength yearly reports, which varied in composition over time according to requirements mandated by Congress. The compiled spreadsheet covers the period 1950-1999, less the years 1951-1952, which are not available. It includes total military personnel, the numbers in the U.S. and overseas, the totals by military Service, and the figures for some twenty countries in which the United States had a significant military presence for much or all of the period. The original DoD reports are a mix of .pdf and .xls formats downloaded in one .zip file from the section “Historical Publications” at the bottom of the DMDC reports page. The country column headings in this spreadsheet use the ISO 3166 Alpha-2 country codes from the ISO Online Browsing Platform (OBP) at https://www.iso.org/obp/ui/#search.
Clarke Bursley put this data together.
Defense Manpower Data Center, “Active Duty Military Personnel by Regional Area and by Country: Worldwide Manpower Distribution by Geographical Area (M05), Historical Reports - Military Only - 1950, 1953 - 1999,” DoD Personnel, Workforce Reports & Publications, www.dmdc.osd.mil/appj/dwp/dwp_reports.jsp
Population figures for the entire United States of America from the decennial census.
A data frame with 23 observations of 2 variables.
: date of the census.
: population of the United States.
This dataset has been gathered by the NHGIS. Minnesota Population Center, National Historical Geographic Information System: Version 2.0 (Minneapolis: University of Minnesota, 2011).
head(us_national_population)
if (require(ggplot2)) {
  ggplot(us_national_population, aes(x = year, y = population)) +
    geom_line() +
    ggtitle("Population of the United States, 1790-2010")
}
Population figures for US states and territories from the decennial census.
A data frame with 983 observations of 4 variables.
: date of the census.
: name of the state or territory.
: population of the state or territory.
: a unique identifier for joining NHGIS data to spatial
This dataset has been gathered by the NHGIS. Minnesota Population Center, National Historical Geographic Information System: Version 2.0 (Minneapolis: University of Minnesota, 2011).