
Data Visualization Course Week 3 - Making Data Management Decisions


The Week 3 assignment asks us to make and implement data management decisions for the selected variables: coding in valid data, coding out missing data, creating secondary variables, and binning or grouping variables. It also asks for frequency distributions of the chosen variables.


The program below produces three frequency distributions and one cross-tabulation of three variables.

Program
LIBNAME mydata "/courses/d1406ae5ba27fe300 " access=readonly;

DATA new; set mydata.gapminder;

/*
Formatting for income, armed forces and oil
*/

format oilcategory $35.;  
format incomecategory $20.;
format armedforcescategory $20.;

/*
Oil categorisation
*/

if    oilperperson = .   then oilcategory='Missing';
else if  oilperperson lt 2   then  oilcategory= 'Less than 2 tonnes per year';
else if  oilperperson lt 4   then  oilcategory= '2 to 4 tonnes per year';
else if  oilperperson lt 6   then  oilcategory= '4 to 6 tonnes per year';
else if  oilperperson lt 8   then  oilcategory= '6 to 8 tonnes per year';
/* ge rather than gt, so a value of exactly 8 is not left uncategorised */
else if  oilperperson ge 8   then  oilcategory= 'Greater than 8 tonnes per year';

/*
Income per person categorisation
*/

if   incomeperperson = .   then  incomecategory='Missing'; 
else if incomeperperson lt 5000  then  incomecategory = 'Less than $5000';
else if incomeperperson lt 10000  then  incomecategory = '$5000 to $10000';
else if incomeperperson lt 15000  then  incomecategory = '$10000 to $15000';
else if incomeperperson lt 20000  then  incomecategory = '$15000 to $20000';
else if incomeperperson lt 30000  then  incomecategory = '$20000 to $30000';
/* ge rather than gt, so a value of exactly 30000 is not left uncategorised */
else if incomeperperson ge 30000  then  incomecategory = '$30000 and higher';

/*
Armed forces categorisation
*/

if   armedforcesrate= .   then  armedforcescategory='Missing'; 
else if armedforcesrate lt 1  then  armedforcescategory = 'Less than 1%';
else if armedforcesrate lt 2  then  armedforcescategory = '1% to 2%';
else if armedforcesrate lt 4  then  armedforcescategory = '2% to 4%';
else if armedforcesrate lt 6  then  armedforcescategory = '4% to 6%';
else if armedforcesrate lt 8  then  armedforcescategory = '6% to 8%';
/* ge rather than gt, so a value of exactly 8 is not left uncategorised */
else if armedforcesrate ge 8  then  armedforcescategory = 'greater than 8%';

/*
Democracy score categorisation
*/

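if   polityscore  = .  then  politycategory='Missing';
/* explicit numeric-to-character conversion instead of relying on the implicit one */
else politycategory  =     strip(put(polityscore, 3.));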

/*
Attach meaningful labels to the variables
*/

label  country    =  "country"
  oilperperson  =  "Oil per person"
  incomeperperson  =  "Income per person ($)(based on 2010 dollar exchange rate)"
  incomecategory  =  "Income category"
  polityscore  =  "Democracy score"
  armedforcesrate  =  "Armed forces personnel (% of total labor force)"
  armedforcescategory =  "Armed forces category"
  oilcategory  =  "Oil category"
  politycategory  =   "Democracy Score category";
run;

/*Do a frequency distribution*/

Title 'All country income distribution categorised';
PROC FREQ Data=new order=freq; 
 TABLES incomecategory;
run;

Title 'All country armed forces distribution categorised';
PROC FREQ Data=new order=freq; 
 TABLES armedforcescategory;
run;

Title 'All country Oil consumption per person categorised';
PROC FREQ Data=new order=freq; 
 TABLES oilcategory;
run;

/*
Using PROC TABULATE to practice multi-dimensional distributions.
This draws a grid with oil category and democracy score on the rows versus
armed forces category on the columns.

Using misstext='no data' to print the text 'no data' in cells that have no observations
*/

Title 'Oil and democracy versus armed forces';
proc tabulate Data=new;
 class oilcategory politycategory armedforcescategory;
 table (oilcategory)*(politycategory),
 armedforcescategory / misstext='no data';

run; 

title;
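
The binning in the DATA step above could also be expressed with a user-defined format, which leaves the numeric variable untouched and moves the category boundaries into one place. The following is only a minimal sketch of that alternative, not part of the assignment program: the data set `demo`, its values and the format name `incomefmt` are hypothetical and exist purely for illustration.

```sas
/* Hypothetical sample values, for illustration only (not gapminder data) */
data demo;
  input incomeperperson;
  datalines;
.
1200
5000
12500
30000
45000
;
run;

/* Each range includes its lower bound and excludes its upper bound (-<).
   LOW does not cover numeric missing values, so '.' is mapped separately. */
proc format;
  value incomefmt
    .               = 'Missing'
    low   -<  5000  = 'Less than $5000'
    5000  -< 10000  = '$5000 to $10000'
    10000 -< 15000  = '$10000 to $15000'
    15000 -< 20000  = '$15000 to $20000'
    20000 -< 30000  = '$20000 to $30000'
    30000 -  high   = '$30000 and higher';
run;

/* PROC FREQ groups by the formatted value; MISSING keeps the missing level in the table */
proc freq data=demo order=freq;
  tables incomeperperson / missing;
  format incomeperperson incomefmt.;
run;
```

With this approach a boundary value such as exactly 30000 always falls into a category, because the last range runs from 30000 up to HIGH.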

Income per person frequency distribution



Summary: This distribution shows income per person across all countries in the gapminder data set, arranged in descending order of frequency.

Note that each country's income per person is based on the dollar exchange rate as of 2010. This is stated in the World Bank data source used in the gapminder data set.

Noticeable points are
  • There are 23 missing observations, meaning that 23 countries in the gapminder dataset have no income per person data.
  • 115 countries (54%) have income below $5000.
  • There are 16 countries (7.5%) with income above $30000.

All country armed forces distribution categorised


Summary: This distribution shows the frequency of the world's armed forces rate categories, arranged in descending order of frequency.

Noticeable points are
  • Missing data account for 23% of the dataset. 49 countries in the gapminder dataset have no data on the share of their labor force serving in the military.
  • 89 countries have an armed forces rate of less than 1%.
  • 2 countries have more than 8% of their labor force serving as armed forces personnel. The 2 countries happen to be North Korea and Eritrea.

North Korea was no surprise, but Eritrea definitely was. An article from "The Economist" explains the large share of Eritrea's population in the military. The article reports that in 1995 the president introduced compulsory conscription, for an indefinite period, for individuals below the age of 50. After 1998 the law allowed an individual to be released from active service, but only at the discretion of the commander, which usually took years. This explains Eritrea's place in the highest armed forces category in the gapminder dataset. http://www.economist.com/blogs/baobab/2014/03/national-service-eritrea
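
The countries behind a category count can be looked up by filtering on the original numeric variable. This is a minimal sketch that assumes the data set `new` created by the program above is still available in the session; the title text is my own.

```sas
/* Assumes the data set NEW built by the program above */
title 'Countries with an armed forces rate above 8%';
proc print data=new noobs label;
  /* Missing values compare lower than any number, so they are excluded here */
  where armedforcesrate gt 8;
  var country armedforcesrate armedforcescategory;
run;
title;
```

The same pattern, with a different WHERE condition, can be used for any of the other categories.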

Oil consumption per person

Summary: This distribution shows oil consumption per person across all countries in the gapminder data set, ordered by decreasing frequency.

Note that 1 tonne = 1000 kilograms.

Noticeable points are
  • There are 150 countries where the oil consumption data is missing.
  • 51 countries (24%) consume less than 2 tonnes per person per year.
  • 1 country (0.5%) consumes more than 8 tonnes per person per year. This country happens to be Singapore, which is interesting.

Oil consumption and democracy score versus armed forces

Summary: This distribution is an attempt to practice using the SAS procedure PROC TABULATE. I try to draw some meaning out of the data by combining the frequency distributions of 3 variables in one table. Cells with no observations for the cross-tabulation are marked with the text 'no data'.
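
A wide PROC TABULATE grid can be hard to read, so it may help to list only the combinations that actually occur and then print the countries in one cell of interest. The sketch below assumes the data set `new` from the program above; the particular cell chosen in the WHERE statement is just an example, and the category strings must match those assigned in the DATA step exactly.

```sas
/* Assumes the data set NEW built by the program above */

/* One row per combination that occurs in the data, with its count */
proc freq data=new;
  tables oilcategory*politycategory*armedforcescategory / list nocum;
run;

/* Print the countries that fall into one specific cell of the grid.
   The numeric polityscore is used in the filter to avoid depending on
   how the character politycategory values are aligned. */
proc print data=new noobs;
  where oilcategory = '6 to 8 tonnes per year'
    and polityscore = -7
    and armedforcescategory = '1% to 2%';
  var country oilperperson polityscore armedforcesrate;
run;
```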
Noticeable points are
  • 13 out of the 213 countries
      · consume less than 2 tonnes of oil per person per year
      · have less than 1% of the labor force serving as armed forces personnel
      · have a democracy score of 10
    These countries appear to be freer and safer, and to consume less oil, than other countries.
  • 1 country out of the 213
      · has an oil consumption of 6 to 8 tonnes per person per year
      · scores -7 on the democracy score, which probably signifies less freedom for its people
      · has 1% to 2% of the labor force registered as armed forces personnel
    The country happens to be Saudi Arabia. It would be interesting to understand why this country has such a low armed forces participation.
  • 1 country out of the 213
      · consumes more than 8 tonnes of oil per person per year
      · has a large share of the labor force in the military (6% to 8%)
      · has a democracy score of -2
    This country happens to be Singapore. Could it be that Singaporeans, like Eritreans, are compelled by their government to serve, or are there other reasons?

Apparently Singapore maintains large armed forces despite its small size because it is a small, rich country located in a region with unpredictable neighbours. Singaporeans are predominantly ethnically Chinese, and the main religious traditions practised are Taoism, Confucianism and Buddhism. Its largest and most populous neighbours, Malaysia and Indonesia, are both largely Islamic in religious practice, and it is believed that a large army would allow Singapore to swiftly neutralise any threat from them. In the Singapore government's view, a long battle would devastate Singapore financially, as it is highly dependent on outside trade.

A link on the Singapore Armed Forces website shows its mission statement.
