Week 1 asks for a description of the data management steps taken for the selected dataset, covering
1) the sample, 2) the data collection procedure, and 3) a measures section describing the variables and how they have been managed to address the research question.
The sample
The sample used for the study is the gapminder dataset, which contains data on 213 countries. Based on incomeperperson, a substantial portion of the countries have income below $10000 (N=143; 66%) and a small percentage have income above $30000 (N=16; 7.5%). The oilperperson variable is missing for a substantial portion of the countries (N=150; 70%). Among the countries with data, most (N=51; 24% of the sample) consume less than 2 tonnes per person per year. The frequency distribution of the polityscore variable shows that a large share of the countries are highly democratic, i.e. score > 5 (N=97; 45%). The armedforcesrate variable is missing for 23% of the countries (N=49), and few countries have more than 4% of the population registered in the military (N=13; 6%).
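The counts and percentages above come from frequency distributions over banded variables. A minimal sketch of how such a table could be built is shown here, using only the Python standard library. The helper names (to_float, income_band, frequency_table) and the toy values are assumptions made for illustration; they are not the code or the data behind the figures reported in this post.

```python
from collections import Counter


def to_float(raw):
    """Convert a raw CSV cell to float; blank or non-numeric cells become None."""
    try:
        return float(raw)
    except (TypeError, ValueError):
        return None


def income_band(value):
    """Assign an incomeperperson value to one of the bands used in the text."""
    if value is None:
        return "missing"
    if value < 10000:
        return "below $10000"
    if value <= 30000:
        return "$10000 to $30000"
    return "above $30000"


def frequency_table(raw_values, band_func):
    """Return {band: (count, percent of all rows)}, counting missing rows too."""
    bands = [band_func(to_float(v)) for v in raw_values]
    counts = Counter(bands)
    total = len(bands)
    return {band: (n, round(100 * n / total, 1)) for band, n in counts.items()}


# Hypothetical toy values for illustration only, NOT real gapminder figures.
toy_income = ["850.5", "", "12500", "45000.2", " ", "3200", "9999"]
table = frequency_table(toy_income, income_band)
for band, (n, pct) in table.items():
    print(f"{band}: N={n}; {pct}%")
```

Percentages here are taken over all rows, including missing ones, which matches the way the sample figures are reported relative to the full set of countries.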
The data collection procedure
The gapminder dataset is an observational dataset. Data for each country was recorded without any particular hypothesis driving the collection process. The data has been compiled into one place from several sources, such as the World Bank, defence databases and other economic databases.
The purpose of the study is to examine several relationships, framed as the following hypotheses:
1. Poorer countries are less democratic.
2. Highly urbanised, high oil-consuming countries tend to be more democratic.
3. For historical reasons, some countries will have a high democracy score despite low income levels, and vice versa.
4. High oil consumption will correlate strongly with high incomes and high democracy scores.
5. High armed forces personnel rates will correlate negatively with democracy scores.
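Several of these hypotheses are stated in terms of correlation. As a rough illustration of what checking hypothesis 5 could look like, the sketch below computes a Pearson correlation over the pairs where both values are present. The function name pearson_r and the toy values are assumptions for illustration; this is not an analysis of the gapminder data and no result for the real dataset is implied.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation over pairs where both values are present (not None)."""
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two complete pairs")
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical toy values for illustration only, NOT real gapminder figures.
toy_armedforcesrate = [0.5, 1.0, None, 2.0, 4.5, 6.0]
toy_polityscore = [9, 8, 10, None, -3, -7]
r = pearson_r(toy_armedforcesrate, toy_polityscore)
print(f"r = {r:.2f} over complete pairs")
```

Dropping incomplete pairs matters for this dataset because several variables have a large share of missing values, so each correlation would be computed on a different subset of countries.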
The variables selected for the study are from the dataset attached to the course (gapminder.csv):
country, oilperperson, incomeperperson, polityscore, armedforcesrate
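A minimal sketch of pulling just these columns out of the CSV is given below, using only the Python standard library. It assumes that missing cells are blank and that the file has a header row with these column names; the function name load_selected, the extra urbanrate column and the toy rows are hypothetical stand-ins, not real gapminder values.

```python
import csv
import io

SELECTED = ["country", "oilperperson", "incomeperperson", "polityscore", "armedforcesrate"]
NUMERIC = SELECTED[1:]


def load_selected(handle):
    """Read CSV rows, keep only the selected columns, and convert numeric cells."""
    rows = []
    for record in csv.DictReader(handle):
        row = {"country": record["country"]}
        for name in NUMERIC:
            cell = (record.get(name) or "").strip()
            try:
                row[name] = float(cell)
            except ValueError:
                row[name] = None  # blank or non-numeric cell treated as missing
        rows.append(row)
    return rows


# Hypothetical stand-in for gapminder.csv; the values are made up for illustration.
toy_csv = io.StringIO(
    "country,incomeperperson,oilperperson,polityscore,armedforcesrate,urbanrate\n"
    "Country A,1200.5,,4,0.9,35.1\n"
    "Country B,31000,1.8,-7,4.6,70.2\n"
    "Country C,,0.4,10,,55.0\n"
)
rows = load_selected(toy_csv)
for row in rows:
    print(row)

# With the real file, the same function could be used as:
#     with open("gapminder.csv", newline="") as f:
#         rows = load_selected(f)
```

Converting blanks to None at load time keeps missing values distinct from zero, which matters for variables such as oilperperson where most countries have no data.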
The measures
The gapminder dataset is fixed at a single point in time, 2010. The measures used to answer the research questions, with their sources, are:
1. incomeperperson = Income per person ($) (based on 2010 dollar exchange rate)
source=World Bank
2. polityscore = Democracy score ranging from -10 to 10 (the higher the score, the freer the country)
source=Polity IV project
3. armedforcesrate = Armed forces personnel as a percentage of the population
source=International Institute for Strategic Studies, The Military Balance.
4. oilperperson = Oil consumption per person per year in tonnes.
source=BP (British Petroleum)