-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathREADME.Rmd
138 lines (102 loc) · 5.09 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
---
output: github_document
editor_options:
chunk_output_type: console
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "README-"
)
```
[![Travis-CI Build Status](https://travis-ci.org/jjchern/zcta.svg?branch=master)](https://travis-ci.org/jjchern/zcta)
# About
This R data package intends to store 2010 Census ZIP Code Tabulation Area (ZCTA) Relationship files. So far it includes:
- official US Census "2010 ZCTA to County Relationship File" (`zcta_county_rel_10.rda`)
- official US Census "2010 ZCTA to Tract Relationship File" (`zcta_tract_rel_10.rda`)
- ZIP Code to ZCTA crosswalk table developed by John Snow, Inc. (`zipzcta.rda`)
# Motivation
Linking the USPS's ZIP codes to US counties is tedious:
- ZIP codes do not resemble spatial entities; they are created for delivering mails. ZIP codes also change over time.
- To get a truly spatial representations of ZIP codes, the US Census Bureau develops the concept of ZIP Code tabulation areas (ZCTAs), which approximates ZIP codes. But
- Census does not release an official crosswalk between ZIP codes and ZCTAs.
- Census does release relationship files between ZCTAs and counties, but at least 25% of the ZCTAs cannot be uniquely linked to counties.
A proposed solution: ZIP codes -> ZCTAs -> counties. This package contains data for connecting these links using official US Census relationship files and ZIP-to-ZCTA crosswalk files created by John Snow, Inc.
# Useful Links
- [Census File Data Source](https://www.census.gov/geographies/reference-files/time-series/geo/relationship-files.2010.html#par_textimage_674173622)
- [Explanation of the 2010 ZCTA to County Relationship File](http://www2.census.gov/geo/pdfs/maps-data/data/rel/explanation_zcta_county_rel_10.pdf)
- [Census ZIP Code Tabulation Areas page](https://www.census.gov/programs-surveys/geography/guidance/geo-areas/zctas.html)
- [2010 ZIP Code Tabulation Area (ZCTA) Relationship File Layouts and Contents](https://www.census.gov/programs-surveys/geography/technical-documentation/records-layout/2010-zcta-record-layout.html)
- [UDS Mapper](https://udsmapper.org/) hosts the ZCTA-ZIP crosswalk file, which should be able to link ZIP codes to 2010 Census ZCTAs. Newer crosswalks are also available from UDS Mapper.
- [The ZIP Code Resources Page](http://mcdc.missouri.edu/allabout/zipcodes.html)
- [All About ZIP Codes: 2010 Supplement](http://mcdc.missouri.edu/allabout/zipcodes_2010supplement.shtml)
# Installation
```r
# install.packages("devtools")
devtools::install_github("jjchern/zcta")
```
# Usage
```{r, message=FALSE, warning=FALSE}
# devtools::install_github("jjchern/gaze")
library(tidyverse)
# ZCTA to counties
zcta::zcta_county_rel_10
# ZIP codes to ZCTAs
zcta::zipzcta
# Show variable labels, and whether value label exists for certain variables
# devtools::install_github("larmarange/labelled")
labelled::var_label(zcta::zcta_county_rel_10)
# Total number of zcta records
nrow(zcta::zcta_county_rel_10)
# Number of distinct zcta
zcta::zcta_county_rel_10 %>% distinct(zcta5) %>% nrow()
# In most instances the ZCTA code is the same as the ZIP Code for an area
# But some zctas fall in more than one county
# For example, there're 7060 zctas fall in 2 counties
zcta::zcta_county_rel_10 %>%
group_by(zcta5) %>%
summarise(`Num of counties` = n()) %>%
group_by(`Num of counties`) %>%
summarise(`Num of zctas` = n())
# To get an one-to-one relationship between zcta and county, assign county to
# a zcta if the zcta has the most population. For Example:
# Before: zcta 601 fall in county 72001 and 72141
zcta::zcta_county_rel_10 %>%
select(zcta5, state, county, geoid, poppt, zpoppct)
# After: relate zcta 601 only to county 72001 as it accounts for 99.43% of the population
one_to_one_pop <- zcta::zcta_county_rel_10 %>%
select(zcta5, state, county, geoid, poppt, zpoppct) %>%
group_by(zcta5) %>%
slice(which.max(zpoppct)) %>%
ungroup()
one_to_one_pop
# Or assign county to a zcta if the zcta accounts for most of the area.
one_to_one_area <- zcta::zcta_county_rel_10 %>%
select(zcta5, state, county, geoid, poppt, zpoppct, zareapct) %>%
group_by(zcta5) %>%
slice(which.max(zareapct)) %>%
ungroup()
one_to_one_area
# Using either of these ZCTA-to-county tables, you can go from ZIP codes to ZCTAs to county
zipcounty <- zcta::zipzcta %>%
left_join(one_to_one_area, by = c("zcta" = "zcta5")) %>%
select(zip, zcta, state = state.x, countygeoid = geoid) %>%
arrange(zip)
zipcounty
# Merge the two 1 to 1 relationship datasets and identify zctas that have different county match
one_to_one_pop %>%
left_join(one_to_one_area, by = "zcta5") %>%
select(zcta5,
county.x, geoid.x, zpoppct.x,
county.y, geoid.y, zpoppct.y) %>%
filter(geoid.x != geoid.y)
# Get county names for the 1 to 1 relationship dataset
# Also keep just states and DC
one_to_one_pop %>%
mutate(geoid = as.integer(geoid)) %>%
left_join(gaze::county10, by = "geoid") %>%
select(zcta5, state, usps, county, geoid, name) %>%
filter(state <= 56)
```