
Data Analysis tools - Week 1 - Running an analysis of variance

The week 1 assignment asks us to run an analysis of variance and then, where the original test is significant and the categorical explanatory variable has more than two levels (groups), to analyze and interpret the post hoc paired comparisons.

I analysed the gapminder dataset, looking at the relationship between income per person, grouped into categories (incomecategory), as the explanatory variable and oil consumption per person (oilperperson) as the response variable.

Program

LIBNAME mydata "/courses/d1406ae5ba27fe300 " access=readonly;

DATA new; set mydata.gapminder;

/*
Formatting for income, armed forces and oil
*/

format oilcategory $35.;  
format incomecategory $20.;
format armedforcescategory $20.;

/*
Oil categorisation
*/

/* Missing numeric values compare lower than any number in SAS, so handle them first */
if oilperperson = .  then  oilcategory= ' ';
else if oilperperson lt 2  then  oilcategory= 'Less than 2 tonnes per year';
else if oilperperson lt 4  then  oilcategory= '2 to 4 tonnes per year';
else if oilperperson lt 6  then  oilcategory= '4 to 6 tonnes per year';
else if oilperperson lt 8  then  oilcategory= '6 to 8 tonnes per year';
/* ge rather than gt, so that a value of exactly 8 is not left uncategorised */
else if oilperperson ge 8  then  oilcategory= 'Greater than 8 tonnes per year';

/*
Income per person categorisation
*/

/* Missing income must stay blank; otherwise it would fall into 'Less than $5000' */
if incomeperperson = .  then  incomecategory = ' ';
else if incomeperperson lt 5000  then  incomecategory = 'Less than $5000';
else if incomeperperson lt 10000  then  incomecategory = '$5000 to $10000';
else if incomeperperson lt 15000  then  incomecategory = '$10000 to $15000';
else if incomeperperson lt 20000  then  incomecategory = '$15000 to $20000';
else if incomeperperson lt 30000  then  incomecategory = '$20000 to $30000';
/* ge rather than gt, so that exactly 30000 is categorised */
else if incomeperperson ge 30000  then  incomecategory = '$30000 and higher';

/*
Armed forces categorisation
*/

/* Missing rates must stay blank; otherwise they would fall into 'Less than 1%' */
if armedforcesrate = .  then  armedforcescategory = ' ';
else if armedforcesrate lt 1  then  armedforcescategory = 'Less than 1%';
else if armedforcesrate lt 2  then  armedforcescategory = '1% to 2%';
else if armedforcesrate lt 4  then  armedforcescategory = '2% to 4%';
else if armedforcesrate lt 6  then  armedforcescategory = '4% to 6%';
else if armedforcesrate lt 8  then  armedforcescategory = '6% to 8%';
/* ge rather than gt, so that exactly 8 is categorised */
else if armedforcesrate ge 8  then  armedforcescategory = 'greater than 8%';

/*
Insert meaningful labels for the variables
*/

label  country   =  "country"
 oilperperson  =  "Oil per person"
 incomeperperson  =  "Income per person ($)(based on 2010 dollar exchange rate)"
 incomecategory  =  "Income category"
 polityscore  =  "Democracy score"
 armedforcesrate  =  "Armed forces personnel (% of total labor force)"
 armedforcescategory =  "Armed forces category"
 oilcategory  =  "Oil category";

run;

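proc sort data=new; by country;
run;

/* One-way ANOVA: oil consumption per person by income category,
   followed by Duncan's multiple range test as the post hoc comparison */
proc anova data=new; class incomecategory;
model oilperperson=incomecategory;
means incomecategory/duncan;

run;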
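
Before reading the ANOVA output, it can help to check how many countries fall into each income category. The following is a minimal sketch using PROC FREQ; it is not part of the original program and assumes the data set new and the variables created above.

```sas
/* Sketch (not in the original program): count the countries in each
   income category, restricted to countries that have an oil consumption
   value, since only those observations can enter the ANOVA. */
proc freq data=new;
  where not missing(oilperperson);
  /* MISSING also lists countries whose income category is blank */
  tables incomecategory / missing;
run;
```

Very small groups in this table are a reason to read the post hoc comparisons with some caution.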

ANOVA Analysis




Interpretation

The mean oil consumption generally increases with income per person across the income categories. This suggests a positive relationship between the two variables.
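
The group means behind this observation can also be listed on their own. A minimal sketch, assuming the data set new and the variable names from the program above (this step is not part of the original program):

```sas
/* Sketch: number of countries, mean and standard deviation of oil
   consumption per person within each income category. */
proc means data=new n mean std maxdec=2;
  class incomecategory;
  var oilperperson;
run;
```

Because incomecategory is a character variable, the categories are listed in alphabetical order of their labels rather than in income order, so the rows need to be read against the labels.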

From the output I can see that the p-value is below the 5% significance level, which means that the test is statistically significant. I therefore reject the null hypothesis that mean oil consumption is the same across the income categories, that is, that there is no relationship between income and oil consumption. This rejection also makes intuitive sense, as higher income could plausibly explain higher oil consumption.

From the Duncan post hoc test it seems that the mean oil consumption of the highest income category, "$30000 and higher", is significantly different from the mean oil consumption of the lower income categories "Less than $5000", "$5000 to $10000" and "$10000 to $20000", because their groupings carry distinct letters. In the Duncan output, means that share a letter are not significantly different from each other.

In my view, though, each income category would ideally have had its own group. The details of how the Duncan test works need to be thoroughly understood before reading more into these groupings.
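
One way to get a second view of the groupings is to request Tukey's test alongside Duncan's in the same MEANS statement. This is only a sketch, not part of the original analysis; it assumes the data set new and the variables from the program above.

```sas
/* Sketch: the same one-way ANOVA, requesting both Duncan's multiple
   range test and Tukey's studentized range test for comparison. */
proc anova data=new;
  class incomecategory;
  model oilperperson = incomecategory;
  means incomecategory / duncan tukey;
run;
quit;
```

Tukey's test controls the experiment-wise error rate, so it tends to be more conservative than Duncan's. Pairs of categories that differ under both tests can be reported with more confidence.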
