
Regression modelling in practice - Week 1 - Writing About Your Data

Week 1 asks for a description of the data management steps taken for the selected dataset, covering

1) the sample, 2) the data collection procedure, and 3) a measures section describing the variables and how they have been managed to address the research question.

The sample

The sample used for the study is the gapminder dataset, which holds data on 213 countries. Looking at the sample by incomeperperson, a substantial portion of the countries have an income below $10,000 (N=143; 66%) and a small percentage have an income above $30,000 (N=16; 7.5%). The oilperperson variable is missing for a substantial portion of the countries (N=150; 70%). Of the countries that do have oil data, most consume less than 2 tonnes per person per year (N=51; 24% of the full sample). The frequency distribution of the polityscore variable shows that nearly half of the countries are highly democratic, i.e. have a score above 5 (N=97; 45%). The armedforcesrate variable is missing for 49 countries (23%), and few countries have more than 4% of their population registered in the military (N=13; 6%).
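The counts and percentages above come from frequency tables in which the percentage is taken over all countries, including those with missing data. A minimal sketch of that calculation is shown here, assuming missing values are represented as None. The helper names share and missing_share and the toy numbers are made up for illustration and are not gapminder values.

```python
def share(values, predicate):
    """Count the non-missing values that satisfy predicate.

    Returns (count, percent), where percent is relative to ALL rows,
    missing ones included, matching how the sample is described above.
    """
    total = len(values)
    count = sum(1 for v in values if v is not None and predicate(v))
    return count, round(100.0 * count / total, 1)


def missing_share(values):
    """Return (count, percent of all rows) for the missing (None) values."""
    total = len(values)
    count = sum(1 for v in values if v is None)
    return count, round(100.0 * count / total, 1)


# Toy income values for five hypothetical countries (NOT gapminder data).
toy_income = [500.0, 2500.0, 12000.0, 35000.0, None]

low_income = share(toy_income, lambda v: v < 10000)
high_income = share(toy_income, lambda v: v > 30000)
missing_income = missing_share(toy_income)

print("below $10,000:", low_income)
print("above $30,000:", high_income)
print("missing:", missing_income)
```

Keeping the denominator fixed at the full sample size makes the percentages for different bins and for the missing group directly comparable.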

The data collection procedure

The gapminder dataset is an observational dataset. Data for each country was recorded without any particular hypothesis driving the collection process. The data has been brought together in one place from several sources, such as the World Bank, defence databases and other economic databases.

The purpose of the study is to research several relationships, expressed as the following hypotheses:

1. Poorer countries are less democratic.
2. Highly urbanised and high oil consuming countries tend to be more democratic.
3. Some countries due to historical reasons will have a higher democracy score with low income levels and vice versa.
4. High oil consuming countries will highly correlate with high incomes and high democracy scores.
5. A high armed forces rate will correlate negatively with democracy scores.
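Several of these hypotheses are statements about correlation between two variables. One way such a relationship could be checked is with a Pearson correlation coefficient computed over the countries that have both values present. The sketch below is an illustration only: the function name pearson and the toy numbers are assumptions, not part of the course material and not gapminder values.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation over pairs where both values are present (not None)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / sqrt(var_x * var_y)


# Toy values for five hypothetical countries (NOT gapminder data).
toy_armed_forces_rate = [0.5, 1.0, None, 2.0, 4.0]
toy_polity_score = [9.0, 8.0, 5.0, 2.0, -6.0]

r = pearson(toy_armed_forces_rate, toy_polity_score)
print("correlation:", round(r, 3))
```

A value of r close to -1 in real data would be consistent with hypothesis 5, while a value near 0 would not support it. The third toy country is dropped because its armed forces value is missing.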

The variables selected for the study are from the dataset attached to the course (gapminder.csv):
country, oilperperson, incomeperperson, polityscore, armedforcesrate
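Selecting these variables amounts to reading gapminder.csv, keeping only the five columns and converting the numeric ones from text. The sketch below shows one way to do this with the standard library. It assumes the file has a header row with these column names and that missing values appear as blank cells. The helper names to_number and load_selected, and the toy CSV text with its made-up country names, are hypothetical.

```python
import csv
import io

SELECTED = ["country", "oilperperson", "incomeperperson",
            "polityscore", "armedforcesrate"]


def to_number(text):
    """Convert a CSV cell to float; blank or non-numeric cells become None."""
    text = (text or "").strip()
    if not text:
        return None
    try:
        return float(text)
    except ValueError:
        return None


def load_selected(handle):
    """Read CSV rows from an open file handle, keeping only the selected columns."""
    rows = []
    for record in csv.DictReader(handle):
        row = {"country": record["country"]}
        for name in SELECTED[1:]:
            row[name] = to_number(record.get(name))
        rows.append(row)
    return rows


# Toy CSV text standing in for gapminder.csv (made-up rows, NOT real data).
toy_csv = io.StringIO(
    "country,incomeperperson,oilperperson,polityscore,armedforcesrate,urbanrate\n"
    "Aland,1200.5, ,7,1.1,40\n"
    "Borduria,,1.5,-3,,55\n"
)

rows = load_selected(toy_csv)
for row in rows:
    print(row)
```

With the real file, the same function could be called on a handle opened with open("gapminder.csv", newline=""). Columns that are not in the selected list, such as urbanrate in the toy text, are dropped.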

The measures

The gapminder dataset is fixed at a particular point in time, 2010. The measures used to answer the research questions, with their sources, are:

1. incomeperperson = Income per person ($) (based on the 2010 dollar exchange rate)
source = World Bank

2. polityscore = Democracy score ranging from -10 to 10 (the higher the score, the freer the country)
source = Polity IV project

3. armedforcesrate = Armed forces personnel as a percentage of the population
source = International Institute for Strategic Studies, The Military Balance

4. oilperperson = Oil consumption per person per year, in tonnes
source = BP (British Petroleum)
