How to build a Social Fragmentation Index using census variables


The Social Fragmentation Index (SFI) was developed by Peter Congdon. It uses census variables to capture, for small areas across the country, aspects of the local population that may reflect a greater collective risk of social fragmentation or lack of social cohesion. The census variables are proxies for these risk factors rather than ‘direct’ indicators. The index focusses on risk arising from potentially high levels of isolation and residential instability among members of the population.

The earliest use of this indicator was with 1991 census data, in an investigation of its relationship with suicide risk (Congdon, 1996). It has also been used in a Scottish context (1991 census) to understand how it co-varies with deprivation and urbanicity, and how it is associated with psychosis admissions (Allardyce et al., 2005). The SFI shows some association with socioeconomic deprivation but also varies independently of it. After controlling for other socio-economic risk factors, it has been shown to be associated with various health outcomes, including use of care for mental illness (Curtis et al., 2006). The index is built from four census variables:

  • Number of unmarried persons
  • Single-person households
  • Number of privately rented households
  • Mobility in the previous year

The aim of this work was to generate SFI estimates for datazones in Scotland. We also aimed to explore the association with socio-economic deprivation.



The 2011 census data was downloaded from the Scotland Census website, at 2001 datazone geography. Three of the indicators reside in the ‘Bulk tables’ files:

  • Number of unmarried persons - KS103SC.csv
  • Single-person households - KS104SC.csv
  • Number of privately rented households - KS402SC.csv

The final table can be downloaded from the ‘Additional and Commissioned Tables’ section:

  • Mobility in the previous year - CT_0210_2011.csv

Data Manipulation

Each CSV file was read into R and a series of generic cleaning procedures was applied (some extra formatting was required for the mobility in the previous year indicator):

  • Removed the Scotland row
  • Renamed the datazone column to “Datazone2001”
  • Removed commas and hyphens from variables representing the number of people in the datazone
  • Converted these variables to numeric
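
The cleaning steps above could be sketched in base R roughly as follows. This is a minimal illustration, not the original script: the toy data frame, its column names and the helper `clean_census()` are all assumptions. Note that in this sketch a cell containing only a hyphen becomes `NA`; whether such cells should instead be treated as zero depends on how the source tables use the hyphen and should be checked.

```r
# Hypothetical raw extract standing in for one census CSV.
# The column names are assumptions, not the real file headers.
raw <- data.frame(
  X = c("Scotland", "S01000001", "S01000002"),
  All.people = c("5,295,403", "1,024", "-"),
  Married = c("2,000,000", "410", "-"),
  stringsAsFactors = FALSE
)

# Hypothetical helper applying the four generic cleaning steps.
clean_census <- function(df, id_col = 1) {
  # Rename the datazone column
  names(df)[id_col] <- "Datazone2001"
  # Remove the Scotland total row
  df <- df[df$Datazone2001 != "Scotland", , drop = FALSE]
  # Strip commas and hyphens, then convert the count columns to numeric
  count_cols <- setdiff(names(df), "Datazone2001")
  df[count_cols] <- lapply(df[count_cols], function(x) {
    as.numeric(gsub("[,-]", "", x))
  })
  rownames(df) <- NULL
  df
}

cleaned <- clean_census(raw)
str(cleaned)
```

Wrapping the steps in one function means the same treatment can be applied to each of the four files, with any table-specific formatting handled separately.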

Each indicator was derived in the following way:

  • For the unmarried indicator, we subtracted the number of people recorded as married from all people aged 16 and over.
  • For the living alone indicator, we added together the number of people in the ‘not living in a couple’ categories: Single (never married or never registered a same-sex civil partnership); Married or in a registered same-sex civil partnership; Separated (but still legally married or still legally in a same-sex civil partnership); Divorced or formerly in a same-sex civil partnership which is now legally dissolved; and Widowed or surviving partner from a same-sex civil partnership.
  • For the private renting indicator, we added together the number of people in the rented categories, which included Council (Local authority), Other social rented, Private landlord or letting agency and Other.
  • For the moved indicator, we subtracted the number of people who lived at the same address one year ago from all people in the datazone.
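
As a rough sketch of these derivations in R, the example below uses made-up counts for two datazones. All column names are hypothetical, and the subtractions are read as "total minus the complementary group", so that each result is a non-negative count.

```r
# Hypothetical cleaned counts for two datazones; all column names are assumed.
dz <- data.frame(
  Datazone2001 = c("S01000001", "S01000002"),
  all_16_plus  = c(800, 650),
  married      = c(410, 300),
  all_people   = c(1000, 820),
  same_address = c(900, 700)
)

# Unmarried: everyone aged 16 and over who is not recorded as married
dz$unmarried <- dz$all_16_plus - dz$married

# Moved: everyone who did not live at the same address one year ago
dz$moved <- dz$all_people - dz$same_address

# Summed indicators: add the relevant category columns row by row
# (shown here for the rented categories, again with made-up numbers)
tenure <- data.frame(
  council          = c(100, 50),
  other_social     = c(20, 10),
  private_landlord = c(60, 90),
  other            = c(5, 3)
)
dz$rented <- rowSums(tenure)

dz[, c("Datazone2001", "unmarried", "moved", "rented")]
```

The living alone count can be built in the same way as `rented`, by passing the ‘not living in a couple’ category columns to `rowSums()`.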

The number of people who corresponded to each indicator was then divided by the population denominator:

  • For the unmarried indicator this was: All people aged 16 and over
  • For the living alone indicator this was: All people aged 16 and over in households
  • For the private rented indicator this was: All households
  • For the moved indicator this was: All people

The resulting percentage for each indicator in each datazone was then standardised as a z-score (the value minus the mean across datazones, divided by the standard deviation). Finally, the SFI was created by summing the four indicator z-scores and standardising the sum once more.
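
The percentage, z-score and summation steps might look something like this in R. The numbers are invented and the column names and the `zscore()` helper are assumptions. The sketch also assumes the sample standard deviation (R's `sd()`, with an n - 1 denominator), which may differ from the original calculation.

```r
# Hypothetical counts and denominators for five datazones (made-up numbers).
dz <- data.frame(
  unmarried = c(390, 350, 500, 200, 420), all_16_plus    = c(800, 650, 900, 700, 750),
  alone     = c(300, 280, 450, 150, 330), all_16_plus_hh = c(790, 640, 880, 690, 740),
  rented    = c(185, 153, 300, 60, 210),  all_households = c(450, 380, 520, 400, 430),
  moved     = c(100, 120, 200, 40, 110),  all_people     = c(1000, 820, 1100, 900, 950)
)

# Hypothetical helper: standardise a vector to mean 0 and standard deviation 1
zscore <- function(x) (x - mean(x)) / sd(x)

# Percentage of each datazone meeting each indicator condition
pct <- data.frame(
  unmarried = 100 * dz$unmarried / dz$all_16_plus,
  alone     = 100 * dz$alone / dz$all_16_plus_hh,
  rented    = 100 * dz$rented / dz$all_households,
  moved     = 100 * dz$moved / dz$all_people
)

# Standardise each indicator, sum the z-scores, then standardise the sum
z   <- as.data.frame(lapply(pct, zscore))
sfi <- zscore(rowSums(z))
```

Because each indicator is standardised before summing, all four contribute on the same scale regardless of how common each condition is. The final standardisation gives the SFI a mean of 0 and a standard deviation of 1 across datazones.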

Is there a relationship with deprivation?

  • A scatterplot was created using SFI and SIMD score (a higher score means more deprived), with colour representing the local authority and marker size representing the population density.
  • Double-clicking a local authority name on the right-hand side will subset the graph to just that authority's data points.

  • Single-clicking another local authority name will add its data points to the graph.

  • The graphs show that there is no association between SFI and deprivation. From previous calculations of the SFI using different criteria, we know that this association is sensitive to the way that the renting indicator is calculated (i.e. including people who are privately renting, as in this definition of the SFI, results in only a weak association with deprivation).

Unpicking the SFI indicators

  • The correlation between the indicators was calculated using Pearson’s correlation coefficients. We found high correlation between the rented and moved indicators (r=0.77), and between the unmarried and living alone indicators (r=0.95).

  • Variation in the indicator z-scores was then calculated as the standard deviation of the four indicator z-scores within each datazone, and the results plotted against the SFI below.
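
Both checks can be sketched in a few lines of R. The z-scores below are invented for illustration and the object names are assumptions. The example uses `method = "pearson"`; if a rank-based coefficient was intended, `method = "spearman"` is the corresponding option in `cor()`.

```r
# Hypothetical indicator z-scores for five datazones (made-up values).
z <- data.frame(
  unmarried = c(-1.2,  0.3, 1.1, -0.4, 0.2),
  alone     = c(-1.0,  0.1, 1.3, -0.6, 0.2),
  rented    = c( 0.5, -0.9, 1.2, -1.1, 0.3),
  moved     = c( 0.7, -0.6, 1.0, -1.3, 0.2)
)

# Pairwise correlation matrix between the four indicators
cor_mat <- cor(z, method = "pearson")
round(cor_mat, 2)

# Spread of the four z-scores within each datazone (one value per row)
z_spread <- apply(z, 1, sd)
```

A datazone with a small `z_spread` has four indicators that broadly agree, while a large value means its SFI is driven by one or two indicators rather than all four.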