David A. Swanson, University of California Riverside, Riverside, CA and The Center for Studies in Demography and Ecology, University of Washington, Seattle, WA (email: firstname.lastname@example.org)
T.M. Bryan, Bryan Demographic Research, Richmond, VA (email: email@example.com)
Richard Sewell, Alaska Department of Transportation and Public Facilities, Anchorage, AK (email: firstname.lastname@example.org)
The Census Bureau plans to introduce a new Disclosure Avoidance System known as Differential Privacy (DP) for its 2020 census data products. DP provides a formal probabilistic quantification of how much privacy is afforded by a query. Ruggles et al. argue that DP goes far beyond precedent and exceeds what is necessary to keep data safe under census law. They contend further that, because DP focuses on concealing individual characteristics instead of respondent identities, it is a blunt and inefficient instrument for disclosure control. As they point out, the core metric of DP does not measure the risk of identity disclosure, so it cannot assess disclosure risk as defined under census law, making it untenable for optimizing the privacy/usability trade-off. Our purpose in this paper is to assess the errors introduced by DP on census block data in Alaska in the form of four case studies.
Covering 570,641 square miles of land, Alaska is the largest state. With the 2010 census showing only 710,231 people, however, it is also the least densely populated of the 50 states, at 1.24 people per square mile. The 2010 census (see below) organized the state into 45,292 census blocks, of which only 12,870 had one or more people, leaving 32,422 without any population. Across the 12,870 census blocks with at least one person, there was an average of 55.2 persons per block. Because there are so few people on average per block, these summary statistics make Alaska one of the states in which one would expect a high level of error to be introduced in order to avoid disclosure of personal information at the block level. This is a point to which we return in the final section.
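The summary statistics above follow directly from the four counts quoted in the text. A minimal arithmetic check, using only those figures (the variable names are ours):

```python
# Figures quoted in the text (2010 census, Alaska).
land_area_sq_mi = 570_641
population = 710_231
total_blocks = 45_292
populated_blocks = 12_870

density = population / land_area_sq_mi                    # people per square mile
empty_blocks = total_blocks - populated_blocks            # blocks with no population
mean_per_populated_block = population / populated_blocks  # persons per populated block

print(f"density: {density:.2f}")                                            # 1.24
print(f"empty blocks: {empty_blocks:,}")                                    # 32,422
print(f"mean persons per populated block: {mean_per_populated_block:.1f}")  # 55.2
```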
The application of DP is a brand new approach for the Census Bureau and is different from all prior Census Bureau initiatives in regard to disclosure avoidance. As a component of the DP initiative, the Census Bureau has released a series of “demonstration products” that allow outside analysts and stakeholders to determine for their purposes the impact DP would have on Census data. These demonstration products generally contain:
- the most common, basic demographic and housing variables
- different levels of geography
- data as they were originally reported in the Summary Files (SF) in 2010, which reported actual census data with small privacy protection modifications as noted supra page
- trial data as they have been adjusted (perturbed) by DP
Here, we examine the errors introduced by DP on 2010 Census SF block data for Alaska in the form of four case studies. We employ the “demonstration product” for census blocks in Alaska labeled as 2020527, which we downloaded from the Minnesota Population Center’s NHGIS site. In this analysis, we utilize all of the 45,292 census blocks in Alaska found in this file.
In the analyses for case studies 1 through 3, we employed the cross-tabulation routine found in Release 12 of the NCSS Statistical System. For case study 4, we sorted the blocks in descending order by the 2010 census total population, then used the logical “IF” function to examine differences between the 2010 census count and the DP count (match = 0; non-match = 1), and summed the number of non-matches.
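The sort, flag, and sum procedure described for case study 4 can be sketched in a few lines of Python. This is an illustration only, not the spreadsheet used in the analysis: the records are made up, and the field names `pop_sf` and `pop_dp` are hypothetical stand-ins for the 2010 census and DP columns of the demonstration file.

```python
# Toy records; field names are hypothetical, not the NHGIS column names.
blocks = [
    {"block_id": "A", "pop_sf": 120, "pop_dp": 117},
    {"block_id": "B", "pop_sf": 15, "pop_dp": 15},
    {"block_id": "C", "pop_sf": 0, "pop_dp": 2},
    {"block_id": "D", "pop_sf": 48, "pop_dp": 51},
    {"block_id": "E", "pop_sf": 3, "pop_dp": 0},
]

def count_mismatches(records, sf_field, dp_field):
    """Sort by the 2010 count (descending), flag each populated block
    0 (match) or 1 (non-match), and sum the flags."""
    ordered = sorted(records, key=lambda r: r[sf_field], reverse=True)
    # Keep only blocks with one or more persons in the 2010 count.
    populated = [r for r in ordered if r[sf_field] >= 1]
    flags = [0 if r[sf_field] == r[dp_field] else 1 for r in populated]
    return sum(flags), len(populated)

mismatches, populated = count_mismatches(blocks, "pop_sf", "pop_dp")
print(mismatches, populated)  # 3 4
```

In this toy input, blocks A, D, and E differ between the two columns, block B matches, and block C is excluded because its 2010 count is zero.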
Case 1: Children without Adults: How Did Differential Privacy turn three blocks into 765?
The 2010 census reported that there were three blocks in which 1 or more children (under age 18) were listed, but no adults (18 years and over). Of these three blocks, the first had one child, the second, five children, and the third had 15 children. It is likely that the last block includes a facility where children reside in the presence of adults who themselves live elsewhere.
Out of 45,292 blocks, it is entirely plausible that there are three in which a total of 21 children reside without adults. However, DP produced 765 such blocks, in which 3,381 children reside without adults, a highly implausible number.
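The selection behind Case 1 amounts to a simple filter: keep blocks whose voting-age count is zero while the total count is one or more. The sketch below assumes children are derived as total persons minus persons aged 18 and over; the field names are hypothetical, the first three rows reproduce the three 2010 census blocks described above, and the remaining rows are made up.

```python
blocks = [
    {"block_id": "A", "total": 1, "adults": 0},    # one child, no adults
    {"block_id": "B", "total": 5, "adults": 0},    # five children, no adults
    {"block_id": "C", "total": 15, "adults": 0},   # fifteen children, no adults
    {"block_id": "D", "total": 40, "adults": 29},  # hypothetical ordinary block
    {"block_id": "E", "total": 0, "adults": 0},    # hypothetical unpopulated block
]

def children_without_adults(records, total_field, adult_field):
    """Return (number of blocks, number of children) for blocks that have
    one or more children and no adults."""
    n_blocks = 0
    n_children = 0
    for r in records:
        children = r[total_field] - r[adult_field]
        if r[adult_field] == 0 and children >= 1:
            n_blocks += 1
            n_children += children
    return n_blocks, n_children

print(children_without_adults(blocks, "total", "adults"))  # (3, 21)
```

Running the same function once on the 2010 census columns and once on the DP columns would give the two pairs of figures compared in this case study.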
Case 2: Differential Privacy turned 1,252 blocks with one or more people of voting age into blocks with zero people of voting age
- In comparing the voting age populations reported by the 2010 census and the DP file, it was found that there are 1,252 blocks in which DP reported zero people of voting age while the 2010 census reported one or more persons of voting age in these same blocks.
Case 3: Differential Privacy turned 830 blocks with zero persons of voting age into blocks with one or more persons of voting age
- At the same time, DP turned 830 blocks in which the 2010 census reported zero persons of voting age into blocks with one or more persons of voting age.
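Cases 2 and 3 are the two off-diagonal cells of a two-by-two cross-tabulation of "zero" versus "one or more" persons of voting age in the 2010 census and DP columns. A minimal sketch of that tabulation follows; it is not the NCSS routine, the field names are hypothetical, and the records are made up.

```python
from collections import Counter

# Toy records; "vap" stands for voting-age population (18 years and over).
blocks = [
    {"block_id": "A", "vap_sf": 4, "vap_dp": 0},
    {"block_id": "B", "vap_sf": 0, "vap_dp": 3},
    {"block_id": "C", "vap_sf": 0, "vap_dp": 0},
    {"block_id": "D", "vap_sf": 10, "vap_dp": 9},
    {"block_id": "E", "vap_sf": 2, "vap_dp": 0},
]

def zero_nonzero_crosstab(records, sf_field, dp_field):
    """Count blocks in each (2010 census state, DP state) cell."""
    table = Counter()
    for r in records:
        sf_state = "zero" if r[sf_field] == 0 else "one_or_more"
        dp_state = "zero" if r[dp_field] == 0 else "one_or_more"
        table[(sf_state, dp_state)] += 1
    return table

table = zero_nonzero_crosstab(blocks, "vap_sf", "vap_dp")
print(table[("one_or_more", "zero")])  # Case 2 type: 2 blocks in this toy data
print(table[("zero", "one_or_more")])  # Case 3 type: 1 block in this toy data
```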
Case 4: Of 12,870 blocks in which the 2010 census shows one or more persons, 12,366 of them (96%) show a different number of persons when DP is applied.
- Of these same 12,870 blocks, 12,009 of them (93%) show a different number of persons of voting age population (18 years and over) when DP is applied.
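The percentages in Case 4, and the implied numbers of blocks left unchanged, can be checked from the counts quoted above (variable names are ours):

```python
populated_blocks = 12_870
changed_total_pop = 12_366    # blocks whose total count differs under DP
changed_voting_age = 12_009   # blocks whose voting-age count differs under DP

share_total = changed_total_pop / populated_blocks
share_voting_age = changed_voting_age / populated_blocks

print(f"total population changed: {share_total:.0%}")      # 96%
print(f"voting-age count changed: {share_voting_age:.0%}")  # 93%
print(populated_blocks - changed_total_pop)    # 504 blocks with an unchanged total
print(populated_blocks - changed_voting_age)   # 861 blocks with an unchanged voting-age count
```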
Discussion and Conclusion
As far as we can tell from the information available from the Minnesota Population Center, Alaska was not subject to higher levels of DP Disclosure Avoidance than the other states in the “Demonstration Product” file (2020527) we have analyzed. Instead, it appears that DP levels were uniform across all states. Given the low numbers of people found statewide in the 2010 census and its low number of 2010 census blocks, Alaska would appear to be a candidate for a higher level of DP Disclosure Avoidance than many other states. This makes our findings all the more worrying because they show high levels of error at the census block level even at what might be described as a low level of DP Disclosure Avoidance. Finding that DP produced 765 census blocks in which 3,381 children reside without adults at this level of DP Disclosure Avoidance is troubling, as is, among our other findings, that of the 12,870 blocks in which the 2010 census shows one or more persons, 12,366 of them (96%) show a different number of persons when DP is applied.
If DP is implemented at the avoidance level found in the “Demonstration Product” file (2020527) for census blocks in Alaska we examined in this study, it will affect almost all of the state’s users of small area census data, from legislatures relying on the data to design Congressional Districts to comply with the law, to demographics vendors who supply clients with zip code level characteristics so businesses can make better decisions. Other end users, such as health district administrators who need the data to track health issues such as COVID-19, and businesses that use small area data such as zip codes, blocks, and block groups to improve marketing, stand to be dramatically impacted. Many government agencies also depend on accurate small area census data to make programs run efficiently and effectively, and the biggest impact of DP will be in small areas. When small areas are aggregated into higher levels of geography by users, the errors introduced by DP tend to even out. However, when small areas themselves are the units of analysis, these users and their clients will be forced to deal with erroneous data if DP is implemented.
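The tendency of errors to even out under aggregation can be illustrated with a toy simulation. This is a sketch under strong assumptions and is not the Census Bureau's algorithm: it uses 1,000 hypothetical blocks of 50 persons each and perturbs every block with independent, zero-mean integer noise drawn uniformly from -5 to 5. It then compares the average relative error per block with the relative error of the summed total.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Assumption: hypothetical blocks and a hypothetical noise distribution.
# This only illustrates error cancellation under summation.
true_counts = [50] * 1000
noisy_counts = [c + rng.randint(-5, 5) for c in true_counts]

# Relative error of each individual block, then the average across blocks.
block_rel_errors = [abs(n - t) / t for n, t in zip(noisy_counts, true_counts)]
mean_block_rel_error = sum(block_rel_errors) / len(block_rel_errors)

# Relative error of the total obtained by summing all blocks.
total_true = sum(true_counts)
total_noisy = sum(noisy_counts)
aggregate_rel_error = abs(total_noisy - total_true) / total_true

print(f"mean block-level relative error: {mean_block_rel_error:.3%}")
print(f"relative error of the 1,000-block total: {aggregate_rel_error:.3%}")
```

Because positive and negative perturbations largely cancel in the sum, the error of the total is small relative to the typical error of a single block, which is why users working with blocks themselves bear most of the impact.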
As evidence that this problem is not unique to Alaska, note that on March 11th, 2021 the state of Alabama filed a civil suit in the United States District Court for the Middle District of Alabama, Eastern Division, aimed at enjoining the Census Bureau from deploying Differential Privacy on 2020 census products (Civil Action No. 3:21-CV-211).
The declaration filed with this suit as Exhibit 6 lists not only errors similar to what we have found in Alaska, but, in addition, many more, affecting not just census blocks, but also other levels of geography, including census tracts, school districts, legislative districts, and counties.
Because the errors we found in Alaska, not to mention the even more extensive errors found in Alabama, are likely to be found in other states, perhaps at even higher levels, we conclude that errors of the type and at the level found in the demonstration product file we examined are likely to render the nation’s block level data essentially unusable.
We are grateful to the Minnesota Population Center for assembling and making available the DP demonstration product file we use here. We also are grateful for advice and comments from Jan Vink and Bill O’Hare and editing by Emily Merchant.