Linnentown - 1940 Census Analysis

This work has been prepared by Hope Bleasdale, a PhD student at the University of Liverpool studying Data Analytics and Society. The project came about through a fellowship between the University of Liverpool and the University of Georgia, under the mentorship of Assistant Professor Jerry Shannon and his Community GIS class. Hope originally prepared this analysis in response to questions about the 1940 census dataset formulated by the Community GIS class. Subsequent discussions with the Community GIS class and Dr Shannon led to amendments to the analysis, which are presented in this notebook.

This notebook presents an analysis of the 1940 Census for block 29-11 in Athens, Georgia. This block contains the Linnentown neighbourhood, an African American neighbourhood that was removed as part of an ‘urban renewal’ project in the mid-1960s to make way for student housing for the University of Georgia. As with other urban renewal schemes across America, the removal of Linnentown had devastating economic, social and political effects on the black community. Although the Linnentown neighbourhood no longer exists as it once did, the individuals who made up the community are still affected by its demolition.

The overarching aim of the Community GIS class’s work under Dr Shannon has been to shine a light on the history of Linnentown, and recover the story of this neighbourhood. This has included talking to those who have lived in Linnentown, or who are connected to the Linnentown community; uncovering and digitising the removal records of Linnentown; rebuilding maps, images and narratives of the neighbourhood; gathering relocation records for those who were forced out of Linnentown; and digitising and analysing census data from the area to gain more in-depth understandings of the people who resided there.

Unfortunately, the individual-level records of the 1960 Census have not yet been released (US census records are only made public 72 years after each census). These would have been closer in time to the demolition of Linnentown in the mid-1960s, and so likely more useful for understanding the neighbourhood's history. However, the 1940 census data are available, so they should still provide some interesting insights into the neighbourhood at that time.

This analysis has been undertaken in R, a programming language for statistical computing, and all of the code used is provided in this notebook so that the analysis is fully reproducible. The 1940 Census data (which is held in hand-written form) was retrieved from the Ancestry database (https://guides.libs.uga.edu/az.php) and digitised and collated by the Community GIS class.

Setting up for analysis

To set up the R notebook, relevant libraries are loaded (they must be installed first and then loaded from the library).

# Install the tidyverse if it is not already installed (require() also loads it when it is available)
if (!require("tidyverse")) install.packages("tidyverse")

## Loading required package: tidyverse

## -- Attaching packages -------------------------------------------------------------------------------------------- tidyverse 1.3.0 --

## v ggplot2 3.2.1     v purrr   0.3.3
## v tibble  2.1.3     v dplyr   0.8.4
## v tidyr   1.0.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0

## -- Conflicts ----------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(tidyverse)

The next step is to read in the census data as a csv file.

# Read in the census 1940 data 
data<-read_csv("census_1940_full_geo.csv")

## Parsed with column specification:
## cols(
##   .default = col_character(),
##   line = col_double(),
##   page = col_double(),
##   home_value = col_number(),
##   age = col_double(),
##   birth_year = col_double(),
##   hours_week = col_double(),
##   unemploy_dur = col_double(),
##   weeks_work_39 = col_double(),
##   f_age_marriage = col_double(),
##   child_num = col_double(),
##   X = col_double(),
##   Y = col_double()
## )

## See spec(...) for full column specifications.

To have a look at the data, we can call names(), which lists all of the column names in the 1940 census data so we can see what information it holds.

# Have a look at the column names
names(data)

##  [1] "line"              "page"              "entered_by"       
##  [4] "house_num"         "street"            "street1"          
##  [7] "address"           "address1"          "block"            
## [10] "Linnentown"        "own_Rent"          "home_value"       
## [13] "first_me"          "last_name"         "given_name"       
## [16] "relation_head"     "gender"            "race"             
## [19] "age"               "birth_year"        "marital_status"   
## [22] "school_attend"     "highest_grade"     "birthplace"       
## [25] "citizen"           "rescity_35"        "rescounty_35"     
## [28] "resstate_35"       "farm"              "employ_pay"       
## [31] "public_emerg"      "seek_work"         "employ_hist"      
## [34] "employ_detail"     "hours_week"        "unemploy_dur"     
## [37] "occupation"        "industry"          "work_class"       
## [40] "employ_code"       "weeks_work_39"     "income"           
## [43] "income_other"      "father_birth"      "mother_birth"     
## [46] "language"          "veteran"           "vet_father"       
## [49] "military"          "occupy_usual"      "industry_usual"   
## [52] "work_class_usual"  "employ_cide_usual" "female_marriage"  
## [55] "f_age_marriage"    "child_num"         "X"                
## [58] "Y"

Linnentown context

First we will look at the context of Linnentown in relation to the wider census block covered in the data.

# Group the data by whether residents live in Linnentown or not, and count how many residents are in each category.

data %>% 
  group_by(Linnentown) %>%
  count()

## # A tibble: 2 x 2
## # Groups:   Linnentown [2]
##   Linnentown     n
##   <chr>      <int>
## 1 No           581
## 2 Yes          250

There are 250 residents within Linnentown and 581 non-Linnentown residents in this census data block.

# Use distinct to find unique addresses to count how many households are in Linnentown and not within Linnentown

data %>% 
  group_by(Linnentown) %>%
  distinct(address) %>%
  count()

## # A tibble: 2 x 2
## # Groups:   Linnentown [2]
##   Linnentown     n
##   <chr>      <int>
## 1 No           142
## 2 Yes           49

In this data, there are 49 households in Linnentown and 142 non-Linnentown households. Using these numbers, we can work out the average household size for Linnentown and non-Linnentown residents in the 1940 census data.

# Non-Linnentown residents (number of residents divided by number of households)
581/142

## [1] 4.091549

# Linnentown residents (number of residents divided by number of households)
250/49

## [1] 5.102041

The average household size in Linnentown was about 5.1 persons, compared with about 4.1 persons outside Linnentown.
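The same calculation can be done in one step from the resident-level data rather than by typing the counts in by hand. The sketch below is written in base R so that it runs without the tidyverse. It uses a small hypothetical data frame, `residents`, invented purely for illustration, and the helper name `mean_household_size` is also hypothetical. Like the notebook, it assumes that each distinct address is one household.

```r
# Hypothetical toy data: one row per resident, as in the census file.
# Addresses and neighbourhood flags are invented for illustration only.
residents <- data.frame(
  address    = c("A", "A", "A", "B", "B", "C", "C", "C", "C", "D", "D"),
  Linnentown = c("Yes", "Yes", "Yes", "Yes", "Yes",
                 "No", "No", "No", "No", "No", "No"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: mean household size = residents / distinct addresses,
# computed separately for each value of Linnentown
mean_household_size <- function(df) {
  n_residents  <- tapply(df$address, df$Linnentown, length)
  n_households <- tapply(df$address, df$Linnentown,
                         function(a) length(unique(a)))
  n_residents / n_households
}

sizes <- mean_household_size(residents)
print(round(sizes, 2))
```

Applied to the census file, `mean_household_size(data)` should reproduce the figures above (about 5.1 and 4.1), provided no `address` or `Linnentown` values are missing.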

Age of residents

# Make the 'age' variable numeric, keep only ages above 0, and change the levels in the Linnentown variable (so 'Yes' comes first as it reads better)
data_Linnentown <- data %>%
  mutate(age=as.numeric(age)) %>%
  filter(age>0) %>%
  dplyr::mutate(Linnentown = factor(Linnentown,
                                    levels=c("Yes","No")))

# Plot a density estimate of the age distributions of Linnentown and non-Linnentown residents
ggplot(data_Linnentown, aes(age, fill=Linnentown, colour=Linnentown)) +
  geom_density(alpha=0.25) +
  labs(x="Age", y="Density",
       title = "Age of residents") +
  scale_fill_manual(values =c("Orange", "Purple")) +
  scale_colour_manual(values =c("Orange", "Purple"))

As shown by the density plot, Linnentown residents tended to be younger than non-Linnentown residents, with much of the Linnentown distribution falling below roughly age 27 or 28. There is also a slight rise in the Linnentown density around the mid-50s; we discussed in class that this may be due to multi-generational families living in Linnentown.
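Density curves can be hard to read precisely, so a numeric summary may complement the plot. The sketch below uses hypothetical ages (not census values) and computes, for each group, the median age and the share of residents under 28, the threshold suggested by the plot. It uses base R's `tapply()`. With the real data, the `age` and `Linnentown` columns of `data_Linnentown` could be passed in the same way.

```r
# Hypothetical toy ages (invented, not census values), one row per resident
ages <- data.frame(
  age        = c(2, 8, 15, 22, 26, 54, 30, 35, 41, 47, 60, 19),
  Linnentown = c(rep("Yes", 6), rep("No", 6)),
  stringsAsFactors = FALSE
)

# Median age per neighbourhood
median_age <- tapply(ages$age, ages$Linnentown, median)

# Share of residents under 28 per neighbourhood (mean of a logical = proportion)
share_under_28 <- tapply(ages$age < 28, ages$Linnentown, mean)

print(median_age)
print(round(share_under_28, 2))
```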

How does income compare for residents in Linnentown and those nearby?

Median household income varies by neighbourhood (within-Linnentown or not within-Linnentown) and by race.

These next few chunks of code work out the median household income of Linnentown and Non-Linnentown residents.

# Find out median household income for Linnentown residents
data %>% 
  mutate(income=as.numeric(income)) %>%
  filter(income>0, Linnentown=="Yes") %>%
  group_by(address) %>%
  summarise(hhinc=sum(income)) %>%
  ungroup() %>%
  summarise(medianinc=median(hhinc))

## Warning: NAs introduced by coercion

## # A tibble: 1 x 1
##   medianinc
##       <dbl>
## 1      804.

# Find out median household income for Non-Linnentown residents
data %>% 
  mutate(income=as.numeric(income)) %>%
  filter(income>0, Linnentown=="No") %>%
  group_by(address) %>%
  summarise(hhinc=sum(income)) %>%
  ungroup() %>%
  summarise(medianinc=median(hhinc))

## Warning: NAs introduced by coercion

## # A tibble: 1 x 1
##   medianinc
##       <dbl>
## 1      1336

# Find out median household income for black Non-Linnentown residents
data %>% 
  mutate(income=as.numeric(income)) %>%
  filter(income>0, Linnentown=="No", race=="Negro") %>%
  group_by(address) %>%
  summarise(hhinc=sum(income)) %>%
  ungroup() %>%
  summarise(medianinc=median(hhinc))

## Warning: NAs introduced by coercion

## # A tibble: 1 x 1
##   medianinc
##       <dbl>
## 1      1033

Median annual household income was $804 for Linnentown residents and $1,336 for non-Linnentown residents. The non-Linnentown figure can be further separated by the race of the residents: for black non-Linnentown residents, median annual household income was $1,033.
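The three chunks above repeat the same pipeline with different filters. One way to avoid the repetition is a small helper function. The sketch below is a hypothetical base-R version (`median_household_income` is not part of the original notebook) that follows the same steps: coerce income to numeric, keep positive incomes, sum them per address, then take the median across households. The toy data are invented.

```r
# Hypothetical helper mirroring the notebook's pipeline:
# numeric income -> keep income > 0 -> sum per address -> median
median_household_income <- function(df, keep = rep(TRUE, nrow(df))) {
  income <- suppressWarnings(as.numeric(df$income))  # non-numeric -> NA
  ok <- (keep & income > 0) %in% TRUE                # NA counts as "drop"
  hh_income <- tapply(income[ok], df$address[ok], sum)
  as.numeric(median(hh_income))
}

# Toy data (invented values), including one non-numeric income entry
toy <- data.frame(
  address    = c("A", "A", "B", "C", "C", "D"),
  Linnentown = c("Yes", "Yes", "Yes", "No", "No", "No"),
  income     = c("300", "500", "650", "1000", "x", "1400"),
  stringsAsFactors = FALSE
)

inc_yes <- median_household_income(toy, toy$Linnentown == "Yes")
inc_no  <- median_household_income(toy, toy$Linnentown == "No")
print(c(Linnentown = inc_yes, not_Linnentown = inc_no))
```

For the census data, a call such as `median_household_income(data, data$Linnentown == "No" & data$race == "Negro")` would mirror the third chunk above.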

This can be plotted on a bar chart.

# First set up the variables that will be plotted
# Make the income variable numeric, and sum the income of each household (grouped by address, Linnentown and race)
data_inc_nona <- data %>%
  mutate(income=as.numeric(income)) %>%
  filter(income>0) %>%
  group_by(address, Linnentown, race) %>%
  summarise(househinc=sum(income)) %>%
  ungroup()

## Warning: NAs introduced by coercion

# Use the above variable to work out the median household income by Linnentown and by race. (Change the levels for Linnentown and for race just so they read better)
data_hhincome <- data_inc_nona %>%
  dplyr::mutate(Linnentown = factor(Linnentown,
                                    levels=c("Yes","No"))) %>%
  dplyr::mutate(race = factor(race,
                                    levels=c("White","Negro"))) %>%
  group_by(Linnentown, race) %>%
  summarise(medianinc=median(househinc))

# Display bar plot
ggplot(data_hhincome, aes(x=Linnentown,y=medianinc, fill=race)) +
  geom_bar(stat="identity",position="dodge", width=0.8) +
  labs(x="Within Linnentown?", y="Median household income",
  title = "Median household income of those in Linnentown and residents nearby") +
  scale_fill_manual(values=c("#fc9272","#7fcdbb"))

This bar plot of median household income shows that income was lower in Linnentown than in nearby areas, but only slightly lower than that of other black residents nearby. In discussion with the class, it was suggested that the lower income may also be influenced by the younger age distribution of Linnentown.

This can also be visualised as a box plot, which makes any outliers in the data easier to see.

# Alter the factor levels for Linnentown and race again so they read more clearly
data_inc_nona$Linnentown <- factor(data_inc_nona$Linnentown, levels=c("Yes", "No")) 
data_inc_nona$race <- factor(data_inc_nona$race, levels=c("White", "Negro")) 

# Set up and display the box plot
ggplot(data_inc_nona, aes(x= Linnentown, y=househinc,color=race)) +
  geom_boxplot() +
  labs(title="Household income of those in Linnentown and residents nearby",
       x="Within Linnentown?", y="Household income")

This box plot shows that household incomes overlap considerably across groups. For black households, both within and outside Linnentown, household incomes span much the same range, and the highest values in Linnentown are above those for black households outside Linnentown. However, the median household income of black households outside Linnentown is still higher than that of households within Linnentown. Household incomes for white households are generally higher, but their range still overlaps with those of both groups of black households, meaning that some households earned similar incomes across races and across neighbourhoods.

The white households show some outliers with very high household incomes. The top outlier is the household at 390 Lumpkin Street, which appears to be a rooming house of seven individuals who all report an income; three of them work at UGA, two as teachers and one as a librarian. It is therefore plausible that this rooming house, with seven working residents, would have an exceptionally high household income. The second outlier is the household at 598 Milledge Avenue, a three-person family household with a single income from a UGA professor. Again, it is likely that this professor earned considerably more than other residents in the area.

Nonetheless, it is important to note that household income still covers much the same range across groups, apart from the non-white, non-Linnentown households, although this is probably influenced by the small sample size.
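Rather than reading outliers off the plot, they can be listed directly. The sketch below uses base R's `boxplot.stats()`, which flags values more than 1.5 times the hinge spread beyond the hinges, applied separately within each group. The data and group labels are hypothetical. Note that `ggplot2::geom_boxplot()` computes its hinges with `quantile()` rather than `fivenum()`, so borderline points may be classified slightly differently from the plot.

```r
# Hypothetical household incomes with a plotting group (invented values)
hh <- data.frame(
  address   = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
  group     = c(rep("White / not Linnentown", 7), rep("Black / Linnentown", 3)),
  househinc = c(900, 1000, 1100, 1200, 1300, 1250, 6000, 500, 700, 800),
  stringsAsFactors = FALSE
)

# TRUE for values that boxplot.stats() reports as outliers
is_outlier <- function(x) x %in% boxplot.stats(x)$out

# Flag outliers within each group separately, as the box plot does
hh$outlier <- FALSE
for (g in unique(hh$group)) {
  rows <- hh$group %in% g
  hh$outlier[rows] <- is_outlier(hh$househinc[rows])
}

print(hh[hh$outlier, c("address", "group", "househinc")])
```

With the notebook's `data_inc_nona`, a group column could be built with `paste(data_inc_nona$Linnentown, data_inc_nona$race)` to match the plot's boxes, and the flagged addresses then checked against the census records.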

How does the educational level of residents in Linnentown compare to nearby residents?

# First, recode the education categories as a factor so they display in order; otherwise the plot is very unclear and confusing
data_education <- data %>%
  dplyr::mutate(highest_grade = factor(highest_grade,
                                    levels=c("0",
                                             "Elementary_1",
                                             "Elementary_2",
                                             "Elementary_3",
                                             "Elementary_4",
                                             "Elementary_5",
                                             "Elementary_6",
                                             "Elementary_7",
                                             "Elementary_8",
                                             "Highschool_1",
                                             "Highschool_2",
                                             "Highschool_3",
                                             "Highschool_4",
                                             "College_1",
                                             "College_2",
                                             "College_3",
                                             "College_4",
                                             "NA"))) %>%
  dplyr::mutate(Linnentown = factor(Linnentown,
                                    levels=c("Yes","No")))

Below is a preliminary plot of the highest grade achieved by residents within and outside Linnentown. It shows much lower levels of education in Linnentown. However, after a discussion with the Community GIS class, it was suggested that this may simply reflect Linnentown's younger population: many residents may not yet have been old enough to have reached certain grades.

# plot the bar chart
ggplot(data_education)+
  geom_bar(aes(x=highest_grade, fill=Linnentown), position=position_dodge(preserve = "single")) +
  theme(axis.text.x = element_text(angle=45, hjust = 1)) +
  labs(x="Highest grade", y="Count",
       title = "Highest educational grade achieved by residents") +
  scale_fill_manual(values=c("#dd1c77","#fdae6b"))

Therefore, an age floor of 25 has been applied. This shows how education levels vary by neighbourhood among individuals over the age of 25, an age by which it is possible to have completed education all the way up to College 4.

# Filter for individuals above age 25
age_data_education <- data_education %>% 
  filter(age>25)

# plot the bar chart for those aged over 25
ggplot(age_data_education)+
  geom_bar(aes(x=highest_grade, fill=Linnentown), position=position_dodge(preserve = "single")) +
  theme(axis.text.x = element_text(angle=45, hjust = 1)) +
  labs(x="Highest grade", y="Count",
       title = "Highest educational grade achieved by residents over 25") +
  scale_fill_manual(values=c("#dd1c77","#fdae6b"))

This bar chart of highest grade by neighbourhood for individuals over the age of 25 still shows much lower levels of education among Linnentown residents. There is the same sharp drop-off after elementary school, with few Linnentown residents reporting high school or college education. However, Linnentown has fewer than half as many residents as the rest of this census block (250 compared with 581), and the age filter is likely to remove a large proportion of Linnentown residents because of its younger population, so it is hard to compare educational levels between the neighbourhoods using raw counts.
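One way to make the comparison less sensitive to the different group sizes is to compare proportions rather than counts, that is, the share of each neighbourhood's over-25 residents at each grade. The sketch below does this in base R with `table()` and `prop.table()` on hypothetical toy records. With the real data, `age_data_education$highest_grade` and `age_data_education$Linnentown` could be used in their place. This adjusts for the difference in group size, but not for the small number of Linnentown residents over 25.

```r
# Hypothetical toy records for residents over 25 (invented values)
edu <- data.frame(
  highest_grade = c("Elementary_4", "Elementary_6", "Elementary_6", "Highschool_4",
                    "Elementary_8", "Highschool_4", "College_4", "Highschool_2"),
  Linnentown    = c("Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"),
  stringsAsFactors = FALSE
)

# Cross-tabulate grade by neighbourhood, then convert counts to shares
# within each neighbourhood (margin = 2: each column sums to 1)
counts <- table(edu$highest_grade, edu$Linnentown)
shares <- prop.table(counts, margin = 2)

print(round(shares, 2))
```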

How do hours worked and hourly wages compare for Linnentown and non-Linnentown residents?

# Work out mean weekly hours worked for Linnentown and nearby residents

data_hours_nona <- data %>%
  mutate(hours_week=as.numeric(hours_week)) %>%
  filter(hours_week>0) %>%
  group_by(Linnentown) %>%
  summarise(mean_hw=mean(hours_week))

# Display the result
data_hours_nona

## # A tibble: 2 x 2
##   Linnentown mean_hw
##   <chr>        <dbl>
## 1 No            46.6
## 2 Yes           49.8

Average hours worked were very similar for Linnentown and nearby residents: Linnentown residents worked a mean of 49.8 hours per week, compared with 46.6 hours per week for non-Linnentown residents.

# Approximate the hourly wage: (income / weeks worked in 1939) / hours worked per week

data_hourlywage <-data %>%
  mutate(income=as.numeric(income), hours_week=as.numeric(hours_week), weeks_work_39=as.numeric(weeks_work_39)) %>%
  filter(income>0, hours_week>0, weeks_work_39>0) %>%
  mutate(hourlywage = (income / weeks_work_39) / hours_week)

## Warning: NAs introduced by coercion

# Compare hourly wage for Linnentown and nearby
data_hourlywage %>%
  group_by(Linnentown) %>%
  summarise(medianHW=median(hourlywage))

## # A tibble: 2 x 2
##   Linnentown medianHW
##   <chr>         <dbl>
## 1 No            0.371
## 2 Yes           0.133

Hourly wages, however, are very different for Linnentown and non-Linnentown residents. The median hourly wage is used here because a few outliers distort the mean.

Median hourly wage for Linnentown residents was $0.133, which an inflation calculator puts at about $2.40 today.

Median hourly wage for non-Linnentown residents was $0.371, which on the same inflation calculator would be roughly $6.70 today. Again, the non-Linnentown category can be further separated by race, which reveals a large disparity: white residents earned considerably more per hour of work.
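The race split can be computed directly by grouping on both variables. The sketch below applies the notebook's hourly wage formula to invented toy records and then uses base R's `aggregate()` with a formula to take the median hourly wage for each neighbourhood and race combination. With the real data, `data_hourlywage` already contains the `hourlywage`, `Linnentown` and `race` columns.

```r
# Hypothetical worker records (invented values)
workers <- data.frame(
  Linnentown    = c("Yes", "Yes", "No", "No", "No", "No"),
  race          = c("Negro", "Negro", "Negro", "Negro", "White", "White"),
  income        = c(300, 400, 500, 600, 1500, 1800),
  weeks_work_39 = c(50, 40, 50, 50, 50, 50),
  hours_week    = c(50, 50, 50, 40, 40, 45),
  stringsAsFactors = FALSE
)

# Approximate hourly wage, using the same formula as the notebook
workers$hourlywage <- (workers$income / workers$weeks_work_39) / workers$hours_week

# Median hourly wage for each neighbourhood x race combination present
wage_by_group <- aggregate(hourlywage ~ Linnentown + race,
                           data = workers, FUN = median)
print(wage_by_group)
```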

# Re-factor the Linnentown and Race variables so they read more clearly
data_hourlywage$Linnentown <- factor(data_hourlywage$Linnentown, levels=c("Yes", "No")) 
data_hourlywage$race <- factor(data_hourlywage$race, levels=c("White", "Negro")) 

# plot the hourly wage as a box plot 
ggplot(data_hourlywage, aes(x=Linnentown, y=hourlywage,color=race)) +
  geom_boxplot() +
  labs(title="Hourly wage for Linnentown and residents nearby",
       x="Within Linnentown?", y="Hourly wage")

The hourly wages are displayed in the box plot above. However, the top two outliers are probably misleading, as both represent individuals who worked only a few weeks in 1939, which inflates the calculated hourly wage. The top outlier represents William McComba, a ‘dresting mechanic’ who worked only 5 weeks in 1939 and had an income of $1,500. The second outlier is a similar case: a saleslady who worked 10 weeks in 1939 and had an income of $750. The third outlier is a professor, which explains their high hourly wage.
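A simple sensitivity check is to see how the summary statistics change when records with very few weeks worked are set aside. In the sketch below, the records and the 13-week cut-off (roughly a quarter of the year) are hypothetical choices for illustration, not part of the original analysis. The last two records mimic the pattern described above: a sizeable 1939 income earned in only a few weeks.

```r
# Hypothetical records (invented values); the last two have few weeks worked
wages <- data.frame(
  income        = c(400, 600, 900, 1500, 750),
  weeks_work_39 = c(50, 52, 48, 5, 10),
  hours_week    = c(50, 48, 44, 40, 40)
)
wages$hourlywage <- (wages$income / wages$weeks_work_39) / wages$hours_week

# Hypothetical cut-off: fewer than 13 weeks is treated as too little work
# for a reliable hourly figure
min_weeks <- 13
reliable  <- wages$weeks_work_39 >= min_weeks

# Records that would be set aside
print(wages[!reliable, ])

# Mean and median hourly wage with and without those records
comparison <- c(
  mean_all        = mean(wages$hourlywage),
  mean_reliable   = mean(wages$hourlywage[reliable]),
  median_all      = median(wages$hourlywage),
  median_reliable = median(wages$hourlywage[reliable])
)
print(round(comparison, 3))
```

In this toy example the mean falls far more than the median when the two low-weeks records are dropped, which is consistent with the notebook's choice of the median.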

What was the pattern of home ownership inside and outside the neighbourhood?

# Set up home ownership with one row per household (address):
# drop missing (NA) own/rent values, then keep one row per address
# so that geom_bar() counts households rather than residents
data_homeownership <- data %>%
  filter(!is.na(own_Rent)) %>%
  distinct(address, .keep_all = TRUE) %>%
  mutate(Linnentown = factor(Linnentown, levels=c("Yes", "No")))

# Plot this as a bar graph 
ggplot(data_homeownership)+
  geom_bar(aes(x=own_Rent, fill=Linnentown), position = "dodge") +
  labs(x="Home-ownership", y="Number of households",
       title = "Home-ownership rates in Linnentown and nearby") +
  scale_fill_manual(values=c("#7fcdbb","#fc9272"))

Home-ownership rates are shown in the bar graph above. Levels of home ownership appear surprisingly low. As a class, we discussed that this may reflect the small number of households (only 49 in Linnentown), or that home-ownership rates likely increased between 1940 and 1960.
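Because Linnentown has far fewer households than the rest of the block, comparing the share of households that own their homes may be clearer than comparing raw counts. The sketch below computes that share per neighbourhood with base R's `tapply()`. The household records are invented, and the `"Owned"`/`"Rented"` codes are assumptions that may not match the values recorded in the census file's `own_Rent` column.

```r
# Hypothetical household-level records (invented values); the "Owned" and
# "Rented" codes are assumed, not taken from the census file
households <- data.frame(
  address    = c("A", "B", "C", "D", "E", "F", "G"),
  Linnentown = c("Yes", "Yes", "Yes", "No", "No", "No", "No"),
  own_Rent   = c("Owned", "Rented", "Rented", "Owned", "Owned", "Rented", NA),
  stringsAsFactors = FALSE
)

# Drop households with a missing own/rent status
valid <- households[!is.na(households$own_Rent), ]

# Share of households that own their home, per neighbourhood
ownership_rate <- tapply(valid$own_Rent == "Owned", valid$Linnentown, mean)
print(round(ownership_rate, 2))
```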

Hope Bleasdale

27/04/2020