
Data Analysis tools - Week 2 - Chi Square Test of Independence

The Week 2 assignment asks us to perform a Chi Square test of independence on two categorical variables. Because the explanatory variable has more than two categories, it then asks for pairwise post hoc tests of independence and an interpretation of the results.

Program


LIBNAME mydata "/courses/d1406ae5ba27fe300 " access=readonly;

DATA new; set mydata.gapminder;

/*
Set the lengths of the new character variables so that their values are not truncated
*/
length oilcategory $35 incomecategory $20;

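/*
Oil categorisation
(a missing numeric value sorts below every number in SAS, so test for it first)
*/
if missing(oilperperson) then oilcategory = ' ';
else if oilperperson le 1 then oilcategory = '<= 1 ton per year';
else oilcategory = '> 1 ton per year';

/*
Income per person categorisation
(the final plain ELSE puts an income of exactly 30000 in the top category)
*/
if missing(incomeperperson) then incomecategory = ' ';
else if incomeperperson le 15000 then incomecategory = '<= $15000';
else if incomeperperson lt 30000 then incomecategory = '$15000 to $30000';
else incomecategory = '$30000 and higher';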

/*
Insert meaningful labels for the variables
*/
label  country   =  "Country"
 oilperperson  =  "Oil per person"
 incomeperperson  =  "Income per person ($)(based on 2010 dollar exchange rate)"
 polityscore  =  "Democracy score"
 armedforcesrate  =  "Armed forces personnel (% of total labor force)"
 incomecategory  =  "Income category"
 oilcategory  =  "Oil category";

run;

/* Overall Chi Square test of independence */
proc sort data=new; by country; run;
PROC FREQ data=new; TABLES oilcategory*incomecategory/CHISQ;
RUN;

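/* Post hoc comparison 1: <= $15000 vs $15000 to $30000 */
DATA COMPARISON1; SET NEW;
IF incomecategory='<= $15000' OR incomecategory='$15000 to $30000';
RUN;
PROC SORT DATA=COMPARISON1; BY country; RUN;
PROC FREQ DATA=COMPARISON1; TABLES oilcategory*incomecategory/CHISQ;
RUN;

/* Post hoc comparison 2: <= $15000 vs $30000 and higher */
DATA COMPARISON2; SET NEW;
IF incomecategory='<= $15000' OR incomecategory='$30000 and higher';
RUN;
PROC SORT DATA=COMPARISON2; BY country; RUN;
PROC FREQ DATA=COMPARISON2; TABLES oilcategory*incomecategory/CHISQ;
RUN;

/* Post hoc comparison 3: $15000 to $30000 vs $30000 and higher */
DATA COMPARISON3; SET NEW;
IF incomecategory='$15000 to $30000' OR incomecategory='$30000 and higher';
RUN;
PROC SORT DATA=COMPARISON3; BY country; RUN;
PROC FREQ DATA=COMPARISON3; TABLES oilcategory*incomecategory/CHISQ;
RUN;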
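In SAS a missing numeric value compares as smaller than any number, so a condition like oilperperson le 1 is true for a country with no oil data unless missing values are handled first; likewise, cut-offs written as lt 30000 and gt 30000 leave a value of exactly 30000 unassigned. As a cross-check of the binning, here is a minimal Python sketch (not part of the SAS assignment) that applies the same cut-offs while keeping missing values missing and placing exactly $30000 in the top bin. It assumes missing values arrive as None or NaN; the function names are hypothetical.

```python
import math
from typing import Optional


def _is_missing(value: Optional[float]) -> bool:
    """Treat None and NaN as missing, like a SAS '.' value."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def oil_category(oil_per_person: Optional[float]) -> Optional[str]:
    """Two oil bins (tonnes per person per year); missing stays missing."""
    if _is_missing(oil_per_person):
        return None
    if oil_per_person <= 1:
        return "<= 1 ton per year"
    return "> 1 ton per year"


def income_category(income_per_person: Optional[float]) -> Optional[str]:
    """Three income bins; the plain final branch puts exactly 30000 in the top bin."""
    if _is_missing(income_per_person):
        return None
    if income_per_person <= 15000:
        return "<= $15000"
    if income_per_person < 30000:
        return "$15000 to $30000"
    return "$30000 and higher"


# Usage: a country with no oil data is left out instead of landing in the low bin.
print(oil_category(None), oil_category(0.4), oil_category(2.5))
print(income_category(15000), income_category(29999.5), income_category(30000))
```

Returning None for missing input mirrors how PROC FREQ drops blank category values from the table by default.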


Chi Square Analysis


Interpretation

The test checks whether the explanatory variable incomecategory and the response variable oilcategory are independent. Here I want to see if there is any relationship between income per person and oil consumption. I would expect a positive relationship: the higher the income, the higher the oil consumption. The ANOVA test conducted in the previous week suggested some degree of relationship. For the post hoc tests, the Bonferroni-adjusted significance level is 0.05 divided by the number of pairwise comparisons; with 3 income categories there are 3 comparisons, so p must be below 0.05/3 ≈ 0.017.
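The Bonferroni threshold depends on the number of pairwise comparisons, k(k-1)/2 for k categories, which happens to equal 3 when k = 3. A small Python sketch of the arithmetic, using itertools.combinations from the standard library; the helper names are hypothetical. The pairs come out in the same order as the three COMPARISON data sets in the program.

```python
from itertools import combinations

# Income categories in the order used by the SAS program.
INCOME_CATEGORIES = ["<= $15000", "$15000 to $30000", "$30000 and higher"]


def pairwise_comparisons(categories):
    """Every unordered pair of categories: k * (k - 1) / 2 pairs for k categories."""
    return list(combinations(categories, 2))


def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison level that keeps the family-wise error rate at or below alpha."""
    return alpha / n_comparisons


pairs = pairwise_comparisons(INCOME_CATEGORIES)
threshold = bonferroni_threshold(0.05, len(pairs))

for first, second in pairs:
    print(f"{first} vs {second}")
print(f"{len(pairs)} comparisons, adjusted threshold = {threshold:.4f}")
# prints: 3 comparisons, adjusted threshold = 0.0167
```

With four income categories the same code would give 6 comparisons and a threshold of about 0.0083.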

In the overall test the p value is below 0.017, and therefore also below the usual 0.05 level, which suggests that oil consumption category depends on income category. SAS does, however, print a warning that some cells have an expected count below 5, meaning the Chi Square test may not be valid for this dataset. The Chi Square statistic compares observed counts with the counts expected if the two variables were independent, and its p value is only reliable when those expected counts are not too small; here some combinations of income and oil category contain so few countries that the warning is triggered.
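To make the warning concrete: the expected count for a cell is (row total × column total) / n, the statistic is the sum of (observed - expected)² / expected over all cells, and the degrees of freedom are (rows - 1)(columns - 1), so the 2 × 3 oil-by-income table has 2. The Python sketch below (standard library only) computes these pieces together with the share of cells whose expected count is below 5, which is what the SAS warning reports. The tail probability uses a closed-form recurrence for integer degrees of freedom instead of a statistics library, and the counts in the usage lines are made up, not the gapminder results.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1.

    With y = x / 2 and Q the regularized upper incomplete gamma function:
    Q(1/2, y) = erfc(sqrt(y)), Q(1, y) = exp(-y), and
    Q(s + 1, y) = Q(s, y) + y**s * exp(-y) / Gamma(s + 1).
    """
    y = x / 2.0
    if df % 2 == 1:
        s, q = 0.5, math.erfc(math.sqrt(y))
    else:
        s, q = 1.0, math.exp(-y)
    while s < df / 2.0:
        q += y ** s * math.exp(-y) / math.gamma(s + 1)
        s += 1.0
    return q


def chi_square_independence(observed):
    """Pearson chi-square test of independence for an r x c table of counts.

    Returns (statistic, df, p_value, expected_counts, share_of_cells_below_5).
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    cells = [e for row in expected for e in row]
    share_below_5 = sum(e < 5 for e in cells) / len(cells)
    return statistic, df, chi2_sf(statistic, df), expected, share_below_5


# Hypothetical 2 x 3 table (rows: oil category, columns: income category).
# These counts are made up for illustration; they are not the gapminder results.
table = [[30, 8, 2],
         [5, 3, 6]]
stat, df, p, expected, share = chi_square_independence(table)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p:.3g}, cells with expected < 5: {share:.0%}")
```

In SAS itself, adding the EXPECTED option to the TABLES statement (TABLES oilcategory*incomecategory/CHISQ EXPECTED;) prints the expected count in every cell, which shows exactly which cells trigger the warning.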

Chi Square post hoc analysis


Interpretation

Here we see the warning that some expected counts are lower than 5, which suggests that the Chi Square test may not be a valid test for this comparison. However, the p value is < 0.017, which suggests a relationship between oil consumption and income when comparing the income categories <= $15000 and $15000 to $30000.

Interpretation

The warning appears again. The p value of < 0.017 suggests a dependence between oil consumption and the income categories <= $15000 and $30000 and higher.



Interpretation

Interestingly, this test of the last two categories does not produce a warning. The p value of 0.72 is much greater than 0.017, so I fail to reject the null hypothesis: there is no evidence of a relationship between oil consumption and the income categories $15000 to $30000 and $30000 and higher.
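Each COMPARISON data step keeps two income categories and reruns the test on a 2 × 2 table. For a 2 × 2 table [[a, b], [c, d]] the Pearson statistic has the shortcut n(ad - bc)² / ((a + b)(c + d)(a + c)(b + d)), and with 1 degree of freedom its p value is erfc(√(χ²/2)). Below is a minimal Python sketch of the whole post hoc loop under those formulas, assuming the counts are already tabulated as a 2 × k table with no empty row or column in any pair; the function names and counts are hypothetical. It computes the uncorrected Pearson statistic, which corresponds to the "Chi-Square" row of the PROC FREQ output rather than the continuity-adjusted one.

```python
import math
from itertools import combinations


def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the 2 x 2 table [[a, b], [c, d]].

    Assumes no row or column total is zero.
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


def post_hoc(table, column_names, alpha=0.05):
    """Test every pair of columns of a 2 x k table at a Bonferroni-adjusted level."""
    pairs = list(combinations(range(len(column_names)), 2))
    threshold = alpha / len(pairs)
    results = []
    for i, j in pairs:
        # Keep only columns i and j, as each COMPARISON data step does.
        statistic, p = chi2_2x2(table[0][i], table[0][j], table[1][i], table[1][j])
        results.append((column_names[i], column_names[j], statistic, p, p < threshold))
    return threshold, results


income = ["<= $15000", "$15000 to $30000", "$30000 and higher"]
# Hypothetical counts (rows: <= 1 ton, > 1 ton); not the gapminder results.
table = [[30, 8, 2],
         [5, 3, 6]]
threshold, results = post_hoc(table, income)
for first, second, stat, p, significant in results:
    verdict = "reject independence" if significant else "fail to reject independence"
    print(f"{first} vs {second}: chi2 = {stat:.2f}, p = {p:.3g} -> {verdict}")
```

The shortcut gives the same statistic as the general r × c formula for a 2 × 2 table; it simply avoids building the expected counts explicitly.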
