
Data Visualization course - Week 4 - Creating graphs for your data

The Week 4 assignment asks for univariate and bivariate graphs of the variables in the study, along with a summary of each graph.


Program

LIBNAME mydata "/courses/d1406ae5ba27fe300" access=readonly;

DATA new; set mydata.gapminder;

/*
Formatting for income, armed forces and oil
*/

format oilcategory $35.;  
format incomecategory $20.;
format armedforcescategory $20.;

/*
Oil categorisation
*/

if   oilperperson = .  then oilcategory='Missing';
else if  oilperperson lt 2  then  oilcategory= 'Less than 2 tonnes per year';
else if  oilperperson lt 4  then  oilcategory= '2 to 4 tonnes per year';
else if  oilperperson lt 6  then  oilcategory= '4 to 6 tonnes per year';
else if  oilperperson lt 8  then  oilcategory= '6 to 8 tonnes per year';
else if  oilperperson ge 8  then  oilcategory= '8 tonnes per year or more';  /* ge, so a value of exactly 8 is not left uncategorised */

/*
Income per person categorisation
*/

if   incomeperperson = .   then  incomecategory='Missing'; 
else if incomeperperson lt 5000  then  incomecategory = 'Less than $5000';
else if incomeperperson lt 10000  then  incomecategory = '$5000 to $10000';
else if incomeperperson lt 15000  then  incomecategory = '$10000 to $15000';
else if incomeperperson lt 20000  then  incomecategory = '$15000 to $20000';
else if incomeperperson lt 30000  then  incomecategory = '$20000 to $30000';
else if incomeperperson ge 30000  then  incomecategory = '$30000 and higher';  /* ge, to match the label and include exactly 30000 */

/*
Armed forces categorisation
*/

if   armedforcesrate= .   then  armedforcescategory='Missing'; 
else if armedforcesrate lt 1  then  armedforcescategory = 'Less than 1%';
else if armedforcesrate lt 2  then  armedforcescategory = '1% to 2%';
else if armedforcesrate lt 4  then  armedforcescategory = '2% to 4%';
else if armedforcesrate lt 6  then  armedforcescategory = '4% to 6%';
else if armedforcesrate lt 8  then  armedforcescategory = '6% to 8%';
else if armedforcesrate ge 8  then  armedforcescategory = '8% or greater';  /* ge, so a value of exactly 8 is not left uncategorised */

/*
Democracy score categorisation
*/

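length politycategory $7;  /* declare as character before first use */
if   polityscore  = .  then  politycategory='Missing';
else politycategory  =  strip(put(polityscore, best7.));  /* explicit numeric-to-character conversion */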

/*
Insert meaningful labels for the variables
*/

label  country    =  "country"
  oilperperson  =  "Oil per person"
  incomeperperson  =  "Income per person ($)(based on 2010 dollar exchange rate)"
  incomecategory  =  "Income category"
  polityscore   =  "Democracy score"
  armedforcesrate  =  "Armed forces personnel (% of total labor force)"
  armedforcescategory =  "Armed forces category"
  oilcategory   =  "Oil category"
  politycategory  =   "Democracy Score category";
run;

/*
Univariate report of income per person
*/

PROC UNIVARIATE data=new;
  VAR incomeperperson;
run;

/*
Quantitative univariate plot of
income per person for all countries
*/

proc sgplot data=new;
histogram incomeperperson /showbins binstart=0 binwidth=1000;
yaxis grid values=(0 to 20 by 1) label='Percent of countries';
title 'Quantitative univariate plot of income per person for all countries';
run;


/*
Categorical variable plot of 
income per person for all countries.
*/

proc sgplot data=new;
vbar incomecategory/stat=pct barwidth=0.5;
yaxis label='Percent of countries';
title 'Categorical univariate plot of income category for all countries';
run;

/*
Univariate report of armed forces rate
*/
title 'Univariate procedure report for armedforcesrate variable';
PROC UNIVARIATE data=new;
  VAR armedforcesrate;
run;

/*
Quantitative univariate plot of
armed forces for all countries
*/
proc sgplot data=new;
histogram armedforcesrate /showbins binstart=0 binwidth=0.2;
yaxis grid values=(0 to 15 by 1) label='Percent of countries';
title 'Quantitative univariate plot of armed forces rate';
run;

/*
Categorical variable plot of 
armed forces category for all countries
*/
proc sgplot data=new;
vbar armedforcescategory/ stat=pct barwidth=0.5;
xaxis label='Armed forces categories';
yaxis grid values=(0 to 0.50 by 0.02) label='Percent of countries';
title 'Categorical univariate plot of armedforcescategory';
run;

/*
Scatter plot of response variable (Democracy score) versus explanatory variable (incomeperperson)

This plot does not show any clear relationship between income and democracy. The paper by
Daron Acemoglu et al. likewise finds no causal effect of income on democracy.
*/
proc sgplot data=new;
scatter x=incomeperperson y=polityscore;
xaxis grid;
yaxis grid;
title 'Scatter plot of response variable (Democracy score) versus explanatory variable (incomeperperson)';
run;

/*
Scatter plot of response variable (oilperperson) versus explanatory variable (incomeperperson)

The data shows a linear relationship between oil consumption
and incomeperperson, which makes sense: as a country's income increases, one would expect
its oil consumption to increase with the higher standard of living.
*/

proc sgplot data=new;
scatter x=incomeperperson y=oilperperson;
xaxis grid;
yaxis grid;
title 'Scatter plot of response variable (oilperperson) versus explanatory variable (incomeperperson)';
run;


Univariate procedure of incomeperperson





The UNIVARIATE Procedure
Variable: incomeperperson (Income per person ($)(based on 2010 dollar exchange rate))

Moments
N                  190           Sum Weights        190
Mean               8740.96608    Sum Observations   1660783.55
Std Deviation      14262.8091    Variance           203427723
Skewness           3.25047792    Kurtosis           14.6656757
Uncorrected SS     5.29647E10    Corrected SS       3.84478E10
Coeff Variation    163.171999    Std Error Mean     1034.73292

Basic Statistical Measures
Location                 Variability
Mean      8740.966       Std Deviation          14263
Median    2553.496       Variance               203427723
Mode      .              Range                  105044
                         Interquartile Range    8681

Tests for Location: Mu0=0
Test           Statistic        p Value
Student's t    t    8.447558    Pr > |t|     <.0001
Sign           M    95          Pr >= |M|    <.0001
Signed Rank    S    9072.5      Pr >= |S|    <.0001

Quantiles (Definition 5)
Level         Quantile
100% Max      105147.438
99%           81647.100
95%           33945.314
90%           26901.858
75% Q3        9425.326
50% Median    2553.496
25% Q1        744.239
10%           337.318
5%            242.678
1%            115.306
0% Min        103.776

Extreme Observations
Lowest             Highest
Value      Obs     Value       Obs
103.776    42      39972.4     145
115.306    30      52301.6     112
131.796    59      62682.1     21
155.033    108     81647.1     110
161.317    80      105147.4    128

Missing Values
Missing Value    Count    Percent of All Obs    Percent of Missing Obs
.                23       10.80                 100.00

Summary

1. The mean is greater than the median, which suggests that the data is not symmetric. The distribution is right skewed: a few large values pull the mean upwards. This makes sense, as income distributions are usually right skewed, with a few countries earning high incomes.

2. The standard deviation (14263) seems low, as it does not seem to capture the extremes. Under a normal distribution, around 68% of the data should fall within one standard deviation of the mean. In an earlier post I created a frequency distribution of income per person for all countries. The categories 'Less than $5000', '$5000 to $10000', '$10000 to $15000' and '$15000 to $20000' all lie below mean + sd = 8741 + 14263 = 23004, and because mean - sd is negative, no country falls below the lower bound. Together these categories cover 54% + 13% + 4.7% + 6.5% = 78.2% of the countries. This is more than the 68% that a normal distribution would place within one standard deviation, which suggests that the data is not normally distributed and that a large share of the countries sits in the lower income groups.
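The share of countries within one standard deviation of the mean can also be computed directly rather than read off the category table. The following is a minimal sketch, assuming the data set new created by the program above; the names avg_income, sd_income and share_within_1sd are made up for illustration.

```sas
/* Sketch: share of non-missing countries whose income per person lies
   within one standard deviation of the mean.
   Assumes the data set NEW from the program above. */
proc sql;
  select mean(abs(n.incomeperperson - s.avg_income) le s.sd_income)
           as share_within_1sd format=percent8.1
  from new as n,
       /* one-row inline view holding the mean and standard deviation */
       (select mean(incomeperperson) as avg_income,
               std(incomeperperson)  as sd_income
        from new) as s
  where n.incomeperperson is not missing;
quit;
```

The comparison inside mean() evaluates to 1 or 0 for each country, so the average of those values is the proportion within the band. A result well above 68% would point in the same direction as the category-based estimate.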



Quantitative univariate plot of income per person

Summary
The graph clearly shows the right-skewed distribution of income: a few countries have a very large income per person. The data point at the extreme right of the graph represents 0.5% of the 190 countries, i.e. 0.005 x 190 = 0.95, or approximately 1 country, with an income above $100000. The histogram is plotted with a bin width of $1000.

Categorical univariate plot of income category


Summary
The graph shows the income category percentage distribution for all countries. More than 50% of the countries have an income level less than $5000.

Univariate procedure of armed forces rate
Univariate procedure report for armedforcesrate variable
The UNIVARIATE Procedure
Variable: armedforcesrate (Armed forces personnel (% of total labor force))

Moments
N                  164           Sum Weights        164
Mean               1.44401628    Sum Observations   236.81867
Std Deviation      1.70900751    Variance           2.92070667
Skewness           2.80701143    Kurtosis           9.43821248
Uncorrected SS     818.045203    Corrected SS       476.075188
Coeff Variation    118.350986    Std Error Mean     0.13345107

Basic Statistical Measures
Location                 Variability
Mean      1.444016       Std Deviation          1.70901
Median    0.930638       Variance               2.92071
Mode      .              Range                  10.63852
                         Interquartile Range    1.13473

Tests for Location: Mu0=0
Test           Statistic        p Value
Student's t    t    10.82057    Pr > |t|     <.0001
Sign           M    81.5        Pr >= |M|    <.0001
Signed Rank    S    6683        Pr >= |S|    <.0001

Quantiles (Definition 5)
Level         Quantile
100% Max      10.638521
99%           9.820127
95%           5.406536
90%           3.290807
75% Q3        1.613217
50% Median    0.930638
25% Q1        0.478489
10%           0.234286
5%            0.134730
1%            0.066100
0% Min        0.000000

Extreme Observations
Lowest              Highest
Value       Obs     Value       Obs
0.000000    82      5.95585     187
0.066100    86      6.39494     174
0.102269    132     7.73791     90
0.105115    150     9.82013     59
0.114593    116     10.63852    100

Missing Values
Missing Value    Count    Percent of All Obs    Percent of Missing Obs
.                49       23.00                 100.00

Summary

1. The median is less than the mean, suggesting that the armed forces rate is right skewed.

2. With a standard deviation of 1.7 around a mean of 1.4, a normal distribution would place about 68% of the data between 1.4 - 1.7 = -0.3 and 1.4 + 1.7 = 3.1. The frequency distribution of the armed forces data shows that the categories 'Less than 1%', '1% to 2%' and '2% to 4%' already cover 69% of the data, all of it in the upper direction. This suggests that the data is not normally distributed and is strongly right skewed.
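The normality question can also be checked more formally. The sketch below is not part of the original program; it assumes the data set new and uses the NORMAL option of PROC UNIVARIATE to request tests for normality, together with a histogram with a fitted normal curve and a normal Q-Q plot.

```sas
/* Sketch: formal and graphical checks of normality for the armed forces rate.
   Assumes the data set NEW from the program above. */
proc univariate data=new normal;
  var armedforcesrate;
  /* histogram with a normal density overlaid, parameters estimated from the data */
  histogram armedforcesrate / normal;
  /* Q-Q plot against a normal distribution with estimated mean and sd */
  qqplot armedforcesrate / normal(mu=est sigma=est);
run;
```

Small p-values in the tests for normality, or points bending away from the reference line in the Q-Q plot, would be consistent with the right skewness described above.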



Quantitative plot of armed forces rate



Summary

This plot visually supports the claim of right skewness in the armed forces data. The bin width is set at 0.2. Each bar should be read as the percentage of countries whose armed forces rate falls in that bin.

Categorical plot of armedforces category


Summary

This plot shows the univariate categorical distribution of the armed forces rate across all countries. For example, it can be read as: 20% of countries have an armed forces rate in the range of 1% to 2%.

Scatter plot of democracy score versus incomeperperson


Summary

A scatter plot of the response variable (democracy score) versus the explanatory variable (incomeperperson) shows no clear relationship between the two variables. Restricting income to below $40000 does not reveal one either. The claim that a relationship exists between income and democracy has also been challenged in the paper by Daron Acemoglu et al.
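The income-restricted plot is not part of the program listing. A minimal sketch of how it could be produced, assuming the same data set new and taking the $40000 cut-off from the text:

```sas
/* Sketch: democracy score versus income per person, restricted to
   countries with income below $40000.
   Assumes the data set NEW from the program above. */
proc sgplot data=new;
  /* the second condition drops missing incomes, which SAS would otherwise
     treat as smaller than any number */
  where incomeperperson lt 40000 and incomeperperson is not missing;
  scatter x=incomeperperson y=polityscore;
  xaxis grid;
  yaxis grid;
  title 'Democracy score versus income per person (income below $40000)';
run;
title;  /* clear the title so it does not carry over to later output */
```

The WHERE statement filters the rows only for this procedure, so the data set new itself is left unchanged.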


Scatter plot of oilperperson versus incomeperperson 





Summary

1. This plot shows a positive relationship between the response variable (oilperperson) and the explanatory variable (incomeperperson). Intuitively this makes sense, as higher income usually results in a higher standard of living, which in turn leads to higher oil consumption.

2. The country with the largest oil consumption, which also sits in the upper bracket of the income scale, is Singapore. An interesting question is why Singapore consumes so much oil compared with countries in the same income bracket. Does it have anything to do with its large armed forces rate, or is some other variable influencing oil consumption? A nice topic for regression analysis.
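As a possible first step towards that analysis, the sketch below labels each point with its country name, so that outliers such as Singapore can be picked out on the plot, and overlays a least-squares line. It is an illustration only, assuming the data set new and the variable country from the program above.

```sas
/* Sketch: labelled scatter plot with a fitted regression line.
   Assumes the data set NEW from the program above. */
proc sgplot data=new;
  /* datalabel= prints the country name next to each marker */
  scatter x=incomeperperson y=oilperperson / datalabel=country;
  /* least-squares fit; nomarkers avoids drawing the points a second time */
  reg x=incomeperperson y=oilperperson / nomarkers;
  xaxis grid;
  yaxis grid;
  title 'Oil per person versus income per person with fitted line';
run;
title;  /* clear the title so it does not carry over to later output */
```

Countries lying far above the fitted line consume more oil than their income alone would suggest, which makes them natural candidates for a closer look with additional explanatory variables.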
