Exploring the World Bank's Gini Index Data With R
Written by Peter Rosenmai on 17 Dec 2013. Last revised 18 Dec 2013.
Let's have a look at the Gini index data available from the World Bank through R's WDI package.
For those who haven't met it before, the Gini index is an elegantly constructed measure of, typically, income inequality. A Gini index of 0 represents a perfectly equal economy; a Gini index of 100 represents a perfectly unequal economy. (To find out more about the Gini index, have a look at my Gini index calculator.)
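To make the 0-to-100 scale concrete, here is a minimal sketch of the textbook discrete Gini formula applied to a vector of incomes. GiniIndex() is a hypothetical helper of my own naming, not part of the WDI package, and the World Bank's published figures are estimated from survey data rather than computed this simply.

```r
# Textbook discrete Gini index on a 0-100 scale.
# Hypothetical helper for illustration; not part of the WDI package.
GiniIndex <- function(incomes){
  x <- sort(incomes[!is.na(incomes)])  # ascending order, NAs dropped
  n <- length(x)
  if (n == 0 || sum(x) == 0) return(NA_real_)
  100 * (2 * sum(seq_len(n) * x) / (n * sum(x)) - (n + 1) / n)
}

GiniIndex(c(10, 10, 10, 10))  # everyone earns the same: 0
GiniIndex(c(0, 0, 0, 40))     # one person earns everything: 75
```

With only four people, the one-takes-all case gives 75 rather than 100: for a finite sample this formula tops out at 100 * (n - 1) / n, which approaches 100 as the population grows.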
Let's search for the Gini index within the World Bank's datasets:
require(WDI)
WDIsearch('gini')
If you run the above code, you'll see that SI.POV.GINI is the stat we need. Let's take a peek at the values it has taken in post-apartheid South Africa:
> df.wb <- WDI(indicator="SI.POV.GINI", country="ZA", start=1994, end=2013)
> df.wb[order(df.wb$year),]
   iso2c      country SI.POV.GINI year
20    ZA South Africa          NA 1994
19    ZA South Africa       56.59 1995
18    ZA South Africa          NA 1996
17    ZA South Africa          NA 1997
16    ZA South Africa          NA 1998
15    ZA South Africa          NA 1999
14    ZA South Africa       57.77 2000
13    ZA South Africa          NA 2001
12    ZA South Africa          NA 2002
11    ZA South Africa          NA 2003
10    ZA South Africa          NA 2004
9     ZA South Africa          NA 2005
8     ZA South Africa       67.40 2006
7     ZA South Africa          NA 2007
6     ZA South Africa          NA 2008
5     ZA South Africa       63.14 2009
4     ZA South Africa          NA 2010
3     ZA South Africa          NA 2011
2     ZA South Africa          NA 2012
1     ZA South Africa          NA 2013
Those are grim numbers. We have a figure for South Africa for 2009, so let's compare that against the Gini index for other countries for that same year:
> df.wb <- WDI(indicator="SI.POV.GINI", country="all", start=2009, end=2009)
> df.wb <- df.wb[!is.na(df.wb$"SI.POV.GINI"),]
> df.wb[order(df.wb$"SI.POV.GINI", decreasing=TRUE),][1:10,]
    iso2c      country SI.POV.GINI year
212    ZA South Africa       63.14 2009
121    HN     Honduras       56.95 2009
80     CO     Colombia       56.67 2009
65     BR       Brazil       54.69 2009
78     CL        Chile       52.06 2009
186    PA       Panama       52.03 2009
188    PY     Paraguay       51.04 2009
84     CR   Costa Rica       50.73 2009
95     EC      Ecuador       49.43 2009
189    PE         Peru       49.05 2009
So the Gini index for South Africa appears to be worse even than for Latin America's titans of income inequality. But how many countries are in our dataset?
> nrow(df.wb)
[1] 42
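Only 42! The problem is that such statistics aren't collected for every country every year. So we need to do some interpolation and extrapolation to expand our dataset. I've written a simple function, LinearlyInterpolateFlatExtrapolateWBData(), to do that. It relies on a helper, LinearlyInterpolateFlatExtrapolate(), which interpolates linearly between a country's observed values and carries the first and last observed values outwards as flat lines: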
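# Fill the NA gaps in v: linear interpolation between observed values,
# flat extrapolation beyond the first and last observed values.
LinearlyInterpolateFlatExtrapolate <- function(v, max.extrapolate=NA){
  n <- length(v)
  indexes.non.na <- which(!is.na(v))
  n.not.na <- length(indexes.non.na)
  if (n.not.na == 0) return(v)  # Nothing to interpolate from
  x <- 1:n
  # rule=2 repeats the nearest observed value outside the observed range;
  # a single observation can only be carried as a constant
  v <- approx(x=x, y=v, xout=x, rule=2, method=ifelse(n.not.na == 1, "constant", "linear"))$y
  if (!is.na(max.extrapolate)){
    # Blank out anything more than max.extrapolate steps beyond the observed range
    non.na.range.min <- max(1, (min(indexes.non.na) - max.extrapolate))
    non.na.range.max <- min(n, (max(indexes.non.na) + max.extrapolate))
    v[setdiff(1:n, non.na.range.min:non.na.range.max)] <- NA
  }
  return(v)
}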
# Fetch a World Bank indicator and fill the gaps in each country's series.
LinearlyInterpolateFlatExtrapolateWBData <- function(country="all", indicator="NY.GNS.ICTR.GN.ZS", start=2000, end=NA, extra=FALSE, max.extrapolate=NA){
  require(WDI)
  # Request every available year so that values outside [start, end] can inform the fill
  df.wb <- WDI(country=country, indicator=indicator, start=1000, end=3000, extra=extra, cache=NULL)
  df.wb <- df.wb[order(df.wb$country, df.wb$year),]
  # Flag supplied values before filling; look the column up by name, not position
  df.wb$source <- ifelse(is.na(df.wb[[indicator]]), "Interpolated/Extrapolated", "Supplied")
  for (this.country in unique(df.wb$country)){
    rows <- df.wb$country == this.country
    df.wb[rows, indicator] <- LinearlyInterpolateFlatExtrapolate(df.wb[rows, indicator], max.extrapolate)
  }
  if (is.na(end)) end <- 3000
  df.wb <- df.wb[start <= df.wb$year & df.wb$year <= end,]
  return(df.wb)
}
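To see what the gap filling does to a single series, here is a small standalone sketch using South Africa's 1994 to 2002 values from the first table. It calls approx() directly rather than the functions above, and the variable names (known, filled, keep) are mine, chosen for illustration.

```r
# South Africa, 1994-2002: observed in 1995 and 2000 only
v <- c(NA, 56.59, NA, NA, NA, NA, 57.77, NA, NA)
x <- seq_along(v)
known <- which(!is.na(v))

# Linear interpolation between the known points; rule = 2 repeats the
# nearest known value outside their range (flat extrapolation)
filled <- approx(x = x[known], y = v[known], xout = x, rule = 2)$y
round(filled, 3)
# 56.590 56.590 56.826 57.062 57.298 57.534 57.770 57.770 57.770

# Cap the extrapolation at one step beyond the first and last known points
max.extrapolate <- 1
keep <- max(1, min(known) - max.extrapolate):min(length(v), max(known) + max.extrapolate)
filled[-keep] <- NA
round(filled, 3)
# 56.590 56.590 56.826 57.062 57.298 57.534 57.770 57.770 NA
```

The 1996 to 1999 values climb in equal steps of 0.236 between the two observations, 1994 and 2001 copy their nearest observation, and 2002 is blanked because it lies two steps past the last observed year.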
Using the above code, let's look at 2009 again, flat-line extrapolating out at most five years:
> df.wb <- LinearlyInterpolateFlatExtrapolateWBData(indicator="SI.POV.GINI", start=2009, end=2009, max.extrapolate=5)
> df.wb <- df.wb[!is.na(df.wb$"SI.POV.GINI"),]
> df.wb[order(df.wb$"SI.POV.GINI", decreasing=TRUE),][1:10,]
      iso2c                  country SI.POV.GINI year                    source
10967    SC               Seychelles      65.770 2009 Interpolated/Extrapolated
4325     KM                  Comoros      64.300 2009 Interpolated/Extrapolated
9293     NA                  Namibia      63.900 2009 Interpolated/Extrapolated
11399    ZA             South Africa      63.140 2009                  Supplied
6485     HN                 Honduras      56.950 2009                  Supplied
13505    ZM                   Zambia      56.775 2009 Interpolated/Extrapolated
4271     CO                 Colombia      56.670 2009                  Supplied
4001     CF Central African Republic      56.300 2009 Interpolated/Extrapolated
3299     BO                  Bolivia      56.290 2009 Interpolated/Extrapolated
6215     GT                Guatemala      55.890 2009 Interpolated/Extrapolated
Ah. I was wondering where Namibia and Zambia had got to.
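Most of that top ten is filled in rather than measured, so it's worth tallying the source column before reading too much into the ranking. As a sketch, the data frame below simply retypes the country and source columns of the ten rows printed above (top10 is a name I've made up for the illustration); on the real data you would call table() on df.wb$source directly.

```r
# Country and source columns retyped from the top-ten output above
top10 <- data.frame(
  country = c("Seychelles", "Comoros", "Namibia", "South Africa", "Honduras",
              "Zambia", "Colombia", "Central African Republic", "Bolivia", "Guatemala"),
  source = c("Interpolated/Extrapolated", "Interpolated/Extrapolated",
             "Interpolated/Extrapolated", "Supplied", "Supplied",
             "Interpolated/Extrapolated", "Supplied", "Interpolated/Extrapolated",
             "Interpolated/Extrapolated", "Interpolated/Extrapolated"),
  stringsAsFactors = FALSE)

# How many of the ten figures were actually supplied for 2009?
source.counts <- table(top10$source)
source.counts
```

Seven of the ten figures are interpolated or extrapolated and only three were supplied for 2009.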
And how many countries do we now have in our dataset?
> nrow(df.wb)
[1] 108
Much better. So let's have a look at the Gini index over time for the BRICS economies:
require(ggplot2)
countries <- c("Brazil", "China", "India", "Russian Federation", "South Africa")
df.wb <- LinearlyInterpolateFlatExtrapolateWBData(country="all", indicator="SI.POV.GINI", start=1980, end=2013, max.extrapolate=0)
df.wb <- df.wb[df.wb$country %in% countries,]
ggplot(data=df.wb, aes(x=year, y=SI.POV.GINI, group=country, colour=country)) +
  theme_bw() +
  geom_line(size=2) +
  ggtitle("Gini Index for the BRICS Economies") +
  xlab("Year") +
  ylab("Gini Index") +
  labs(colour="")
And let's map the Gini index for 2012, again extrapolating out at most five years:
MapWBData <- function(indicator, year, max.extrapolate=NA){
  require(rworldmap)
  df.wb <- LinearlyInterpolateFlatExtrapolateWBData(indicator=indicator, start=year, end=year, max.extrapolate=max.extrapolate)
  sPDF <- joinCountryData2Map(df.wb, joinCode="ISO2", nameJoinColumn="iso2c")
  map.title <- paste(indicator, "in", year, "(Flat-line extrapolating")
  map.title <- paste(map.title, ifelse(is.na(max.extrapolate), "from most recent)", paste("at most", max.extrapolate, "years)")))
  mapCountryData(sPDF, nameColumnToPlot=indicator, colourPalette="heat", missingCountryCol="grey", numCats=100, mapTitle=map.title)
}
MapWBData(indicator="SI.POV.GINI", year=2012, max.extrapolate=5)
Thanks to Andy South's Beautiful world maps in R with rworldmap for introducing me to this dataset and the rworldmap package.