
Data Visualization Course Week 2 - Running your first program



The Week 2 assignment asks for a program that produces frequency distributions for 3 data variables, together with a description of the output: the values each variable takes, how often it takes them, and how much data is missing.



The program creates 3 frequency distributions and 1 cross-tabulation.

Program

LIBNAME mydata "/courses/d1406ae5ba27fe300 " access=readonly;

DATA new; set mydata.gapminder;
/*
Insert meaningful labels for the variables
*/
label  country="country"
  oilperperson="Oil per person"
  incomeperperson="Income per person ($) (based on 2010 dollar exchange rate)"
  incomecategory="Income category"
  polityscore="Democracy score"
  armedforcesrate="Armed forces personnel";
run;

/*
Define formats that group the variables into categories for the frequency distributions
*/

proc format;

 value incomecategory
  low-<5000 = 'Less than 5000'
  5000-<10000 = '5000 to 10000'
  10000-<15000 = '10000 to 15000'
  15000-<20000 = '15000 to 20000'
  20000-<30000 = '20000 to 30000'
  30000-high = '30000 and higher'
  .='Missing'; 

 value oilcategory
  low-<2 = 'Less than 2 tonnes per year'
  2-<4 = 'Between 2 and 4 tonnes per year'
  4-<6 = 'Between 4 and 6 tonnes per year'
  6-<8 = 'Between 6 and 8 tonnes per year'
  8-high = 'Greater than 8 tonnes per year'
  .='Missing';   
  
 value politycategory
  .='Missing';   

run;

/*Do a frequency distribution*/

Title 'All country income distribution categorised';
PROC FREQ Data=new; 
 TABLES incomeperperson / missing;
 format incomeperperson incomecategory.;
run;

Title 'All country Oil consumption per person categorised';
PROC FREQ Data=new; 
 TABLES oilperperson / missing;
 format oilperperson oilcategory.;
run;

Title 'All country democracy score categorised';
PROC FREQ Data=new order=freq; 
 TABLES polityscore / missing; 
 format polityscore politycategory.;
run;

/*
Cross table frequency distribution.
This distribution shows the count of countries 
cross tabulated by democracy score and income per person.
*/

Title 'All country democracy score by income per person categorised';
PROC FREQ Data=new; 
 TABLES polityscore * incomeperperson / nopercent nocum nocol norow missing; 
 format polityscore politycategory.;
 format incomeperperson incomecategory.; 
run;


Income per person frequency distribution
  


Summary: This distribution shows the income per person distribution across all countries in the gapminder data set.

Note that each country's income per person is based on the dollar exchange rate as of 2010. This is stated in the World Bank data source used for the gapminder data set.

Notable points are
  • There are 23 missing observations, i.e. 23 countries in the gapminder dataset have no income per person value.
  • 115 countries (54%) have an income per person below $5000.
  • 16 countries (7.5%) have an income per person of $30000 or more.
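
The missing-value counts can be cross-checked without any formats. The step below is a minimal sketch, assuming the dataset new created by the program above is still available in the session; it only reports the number of non-missing (N) and missing (NMISS) values per variable.

```sas
/* Count non-missing and missing values for the three variables */
proc means data=new n nmiss;
  var incomeperperson oilperperson polityscore;
run;
```

If the frequency table above is right, NMISS for incomeperperson should match the 23 missing observations reported there.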


Oil consumption per person 



Summary: This distribution shows the oil consumption per person distribution across all countries in the gapminder data set. 

Note that 1 tonne = 1000 kilograms.

Notable points are
  • Oil consumption data is missing for 150 countries.
  • 51 countries (24%) consume less than 2 tonnes per person per year.
  • 1 country (0.5%) consumes 8 tonnes or more per person per year. That country turns out to be Singapore, which is interesting.
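
The frequency table only gives counts, so naming the country in the top category needs a separate listing. A minimal sketch, again assuming the dataset new from the program above; the cut-off of 8 mirrors the lower bound of the last oilcategory range.

```sas
/* List the countries that fall in the top oil consumption category */
proc print data=new noobs label;
  where oilperperson >= 8;   /* missing values compare lower than any number, so they are excluded */
  var country oilperperson;
run;
```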


Democracy score distribution


Summary: This frequency distribution is sorted in descending order of frequency count, because I was interested in which democracy score is shared by the largest number of countries.

Notable points are
  • There are 52 countries (24.4%) with missing data.
  • 33 countries (15%) have the highest democracy score of 10.
  • Only 2 countries have the lowest democracy score of -10. These are 2 Middle East countries, Qatar and Saudi Arabia. This lends some support to the claim made by Michael L. Ross in his paper "Does Oil Hinder Democracy?".
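
One way to see which countries sit at the bottom of the scale, and to look at their oil and income figures side by side, is a filtered listing. This is a sketch under the same assumption that the dataset new from the program above exists; the choice of extra columns is mine, not part of the assignment.

```sas
/* List the countries with the lowest democracy score, with oil and income for context */
proc print data=new noobs label;
  where polityscore = -10;
  var country polityscore oilperperson incomeperperson;
run;
```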


Democracy score versus income per person


Summary: This distribution shows the cross tabulated frequency count of democracy score versus income per person. 

Notable points are
  • 17 countries are missing both the democracy score and the income per person in the gapminder dataset.
  • The interesting data points are at the extreme ends of the diagonal: a country with high income but the lowest democracy score, and a country with low income but the highest democracy score. These are Saudi Arabia and Mongolia respectively.
  • Many countries have a high democracy score (between 5 and 10) but a low income per person, which suggests that some factor other than income may be driving their high democracy scores. The relationship between income and democracy has been studied by Daron Acemoglu et al. in their paper "Income and Democracy".
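
The last point is easier to read off a smaller table in which the democracy score is collapsed into two groups. The sketch below assumes the dataset new and the incomecategory format from the program above are still available in the session; the format name politygroup and the cut-off at 5 are hypothetical choices made for illustration.

```sas
/* Hypothetical format: split the democracy score at 5 */
proc format;
  value politygroup
    low-<5 = 'Below 5'
    5-high = '5 to 10'
    . = 'Missing';
run;

title 'Democracy score group by income category';
proc freq data=new;
  /* Counts only, keeping the missing categories as rows and columns */
  tables polityscore * incomeperperson / nopercent nocol norow missing;
  format polityscore politygroup. incomeperperson incomecategory.;
run;
```

The '5 to 10' row then shows directly how many high-scoring countries fall in the 'Less than 5000' income column.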
