Author: John Lauermann, School of Information, Pratt Institute
This repository contains materials for a workshop with the Data Storytelling Lab at Pratt Institute. The workshop covers how to query data from the Census API and then visualize and analyze it using R tools.
In this workshop, you’ll learn how to visualize and analyze data from the US Census. Census data products are the primary historical record of American society, available as open data for every community in the United States. Their long temporal record and broad geographic scope create opportunities for data storytelling at scale: Census data let you move beyond small-scale case studies to examine national patterns and historical trends.
To work with this kind of big data, we’ll use open source statistical tools in R to analyze and visualize geographic patterns.
By the end of the workshop, you should be able to:
- Query American Community Survey data from the Census API.
- Visualize patterns in the data using choropleth maps and scatterplots.
- Analyze whether those patterns are statistically significant using correlation and simple linear regression.
Initial setup steps:
- Sign up for a Census API key at this link
- Install R and RStudio on your computer with this link
- Download the code version of this workflow: `visualize-social-patterns-census-api.rmd` (go to the page and click the download button at the top right).
- Open the `.rmd` notebook in RStudio (from the top-left menu bar: File -> Open File).
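The package and API key steps can also be scripted. The sketch below is a minimal example, not code from the workshop notebook: it assumes the three packages named in the workflow (`tidycensus`, `dplyr`, `ggplot2`) are the ones you need, and `missing_packages()` and `setup_workshop()` are hypothetical helper names. It relies on `tidycensus::census_api_key()` with `install = TRUE`, which saves the key to your `.Renviron` file so later R sessions can find it.

```r
# Hypothetical helper: return the entries of `pkgs` that are not installed.
missing_packages <- function(pkgs) {
  pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
}

# Hypothetical helper: install any missing workshop packages, then store the
# Census API key. install = TRUE writes the key to ~/.Renviron;
# overwrite = TRUE replaces a key that was saved earlier.
setup_workshop <- function(key,
                           pkgs = c("tidycensus", "dplyr", "ggplot2"),
                           overwrite = FALSE) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) install.packages(to_install)
  tidycensus::census_api_key(key, overwrite = overwrite, install = TRUE)
  readRenviron("~/.Renviron")  # reload so the key is usable in this session
  invisible(to_install)
}

# Run once, replacing the placeholder with your own key:
# setup_workshop("YOUR_CENSUS_API_KEY")
```

Checking for missing packages first means re-running the setup does not reinstall packages you already have.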
You can replicate the workflow step by step using the code chunks in the notebook. To run a chunk, either click the green ‘Run Current Chunk’ button or put your cursor in the chunk and press Ctrl + Shift + Enter (on a PC) or Cmd + Shift + Enter (on a Mac).
To see the full scope of this workshop, check out the markdown version of the notebook: `visualize-social-patterns-census-api.md`.
The workflow covers these steps:
- Install and load the libraries we’ll need for the workflow.
- Explore Census data and define a query to pull two variables from the API for further analysis. For this we’ll use `tidycensus` to query the Census API and `dplyr` to clean up the data table.
- Visualize the geographic distribution of each variable using choropleth maps built with `ggplot2`.
- Visualize the relationship between those two variables using a scatterplot built with `ggplot2`.
- Test the correlation between those variables using `cor.test()`.
- Test whether one variable predicts the other using simple linear regression with the `lm()` function.
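The workflow steps above can be sketched in R as a few small functions. This is a minimal sketch under stated assumptions, not the notebook’s actual code: the helper names are hypothetical, and the geography (counties in one state), the 2022 5-year ACS, and the two example variables (`B19013_001`, median household income, and `B25064_001`, median gross rent) are illustrative choices rather than the workshop’s. Packages are called with `::` so the functions can be defined before the libraries are attached.

```r
# Query step: pull two ACS variables for every county in one state.
# Example IDs (assumptions): B19013_001 = median household income,
# B25064_001 = median gross rent. output = "wide" yields columns
# incomeE/incomeM and rentE/rentM; geometry = TRUE adds county boundaries.
query_acs_pair <- function(state, year = 2022,
                           vars = c(income = "B19013_001", rent = "B25064_001")) {
  tidycensus::get_acs(
    geography = "county",
    variables = vars,
    state = state,
    year = year,
    survey = "acs5",
    output = "wide",
    geometry = TRUE
  )
}

# Cleaning step: keep the estimate columns (assumes the default variable
# names above) and drop counties with a missing estimate.
clean_acs_pair <- function(acs) {
  acs |>
    dplyr::select(GEOID, NAME, incomeE, rentE) |>
    dplyr::filter(!is.na(incomeE), !is.na(rentE))
}

# Map step: choropleth of one column; needs the geometry from the query step.
map_variable <- function(acs, column, title) {
  ggplot2::ggplot(acs) +
    ggplot2::geom_sf(ggplot2::aes(fill = .data[[column]]), color = NA) +
    ggplot2::scale_fill_viridis_c() +
    ggplot2::labs(title = title, fill = NULL)
}

# Scatterplot step: one point per county plus a least-squares line.
scatter_pair <- function(df, x, y) {
  ggplot2::ggplot(df, ggplot2::aes(x = .data[[x]], y = .data[[y]])) +
    ggplot2::geom_point(alpha = 0.6) +
    ggplot2::geom_smooth(method = "lm", formula = y ~ x)
}

# Statistics step: Pearson correlation test and simple linear regression
# of y on x. Rows with missing values are dropped by both functions.
test_pair <- function(df, x, y) {
  df <- as.data.frame(df)  # treat an sf object as a plain data frame
  list(
    correlation = stats::cor.test(df[[x]], df[[y]]),
    regression  = stats::lm(stats::reformulate(x, response = y), data = df)
  )
}
```

A typical session might run `acs <- clean_acs_pair(query_acs_pair("NY"))`, then `map_variable(acs, "incomeE", "Median household income")`, `scatter_pair(acs, "incomeE", "rentE")`, and `test_pair(acs, "incomeE", "rentE")`. The correlation result reports the estimate and its p-value; calling `summary()` on the regression result shows the slope, its significance, and R².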
- Walker, Kyle (2023) Analyzing US Census Data: Methods, Maps, and Models in R, CRC Press (open access at https://walker-data.com/census-r/)
- Wickham, Hadley (2025) ggplot2: Elegant Graphics for Data Analysis, Springer (open access at https://ggplot2-book.org/preface-3e.html)
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
