OVERVIEW OF THE ASSIGNMENT
This assignment will test your ability to collect, summarise and present data using Microsoft Excel and/or other approved tools. It will also test your ability to interpret the output produced by these tools in order to solve business problems.
You will need to use the dataset allocated to you, collect data of your own, and produce numerical and graphical summaries. You must submit an Excel file that follows the requirements explained below.
TASK DESCRIPTION
There are two datasets involved in this assignment: Dataset 1 and Dataset 2, detailed below.
Dataset 1: This dataset will be sent to your KOI email address by the end of Week 3. Please email the lecturer if you have not received it. The dataset is a subset of the Google Play Store Apps dataset that includes only apps with more than 1 million installs. The original dataset can be obtained from https://www.kaggle.com/gauthamp10/google-playstore-apps. Note that you must use the dataset sent to your email, not the original dataset.
Dataset 2: You will need to collect a dataset to answer your own research question. The details are given in Section 6 below. You will need to collect data from at least 30 international students. The dataset should contain two categorical variables that together answer your research question (see Section 6).
Both datasets should be saved in an Excel file (see Submission Requirement on the next page). All data processing should be performed in Excel or other approved tools (the approved tools will be announced during tutorials). Failure to use Excel or approved tools will result in a mark deduction.
Your tasks are described below.
Section 1: Description of the Datasets
Dataset 1: Give a short but clear description (2-3 sentences) of this dataset (e.g. what the dataset is about, where it comes from, whether it is primary or secondary data).
Dataset 2:
State your research question.
Give a short but clear description (2-3 sentences) of this dataset (e.g. what the dataset is about, how you collected the data, whether it is primary or secondary data, whether it is biased).
Section 2: Do you believe that 99.8% of Google Play Apps (with more than 1 million installs) are free?
Using Dataset 1, provide the frequency and the proportion (either as a decimal or a percentage) for each category for the variable Free. You also need to provide a graphical display that easily shows the proportion of each category. Finally, write a comment about your findings and answer the question.
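As an illustration only, the frequency and proportion calculation could be sketched in Python as below. The values in free_column are made up for the example and are not taken from Dataset 1, and Python may not be an approved tool for this assignment; in Excel the same counts could be obtained with a function such as COUNTIF or with a PivotTable.

```python
from collections import Counter

# Hypothetical values standing in for the "Free" column of Dataset 1.
free_column = [True, True, False, True, True, True, True, False, True, True]


def frequency_table(values):
    """Return {category: (count, proportion)} for a list of categorical values."""
    counts = Counter(values)
    total = len(values)
    return {category: (count, count / total) for category, count in counts.items()}


table = frequency_table(free_column)
for category, (count, proportion) in table.items():
    print(f"Free={category}: count={count}, proportion={proportion:.1%}")
```

The proportions are the quantities a pie chart or a 100% stacked bar chart would display, and the observed proportion of free apps is the figure to compare against the claimed 99.8%.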
Section 3: Is the average rating of Google Play Apps (with more than 1 million installs) less than 4.1?
Using Dataset 1, describe the Rating distribution of Google Play Store apps. You need to provide a numerical summary (sample size, mean, standard deviation and median) as well as a graphical display that shows the outliers, if any. Finally, write a comment about your findings and answer the question.
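The numerical summary requested here could be sketched as follows. This is a minimal example on made-up ratings, not on Dataset 1. It assumes the common boxplot convention of flagging values more than 1.5 times the interquartile range beyond the quartiles, and it uses the default "exclusive" quartile method of Python's statistics.quantiles; other tools may compute quartiles slightly differently.

```python
import statistics

# Hypothetical ratings standing in for the "Rating" column of Dataset 1.
ratings = [4.5, 4.2, 4.0, 3.9, 4.3, 4.1, 4.4, 2.0, 4.2, 4.0]


def summarise(values):
    """Sample size, mean, sample standard deviation, median and outliers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, "exclusive" method
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample (n - 1) standard deviation
        "median": statistics.median(values),
        "outliers": [v for v in values if v < lower_fence or v > upper_fence],
    }


summary = summarise(ratings)
for name, value in summary.items():
    print(name, value)
```

A box-and-whisker plot shows the same outliers graphically, and the sample mean is the value to compare against 4.1.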
Section 4: Is there a difference in the rating of Google Play Store Apps of different categories?
Using Dataset 1, first filter the variable Category to include only Entertainment, Tools, and Shopping. Then provide the numerical summary for the Rating grouped by the three categories. You also need to provide a graphical display that shows any outliers. Finally, write a comment about your findings and answer the question.
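The filter-then-group step might look like the sketch below. The rows are invented for illustration and the variable names (rows, KEEP, grouped_summary) are assumptions of this example, not part of the assignment; in Excel a filter followed by a PivotTable would serve the same purpose.

```python
import statistics
from collections import defaultdict

# Hypothetical (Category, Rating) pairs standing in for rows of Dataset 1.
rows = [
    ("Entertainment", 4.1), ("Tools", 4.3), ("Shopping", 4.4), ("Games", 4.6),
    ("Tools", 4.1), ("Entertainment", 3.9), ("Shopping", 4.2), ("Tools", 4.5),
    ("Entertainment", 4.3), ("Shopping", 4.6),
]

KEEP = {"Entertainment", "Tools", "Shopping"}


def grouped_summary(pairs, keep):
    """Filter to the wanted categories, then summarise Rating per category."""
    groups = defaultdict(list)
    for category, rating in pairs:
        if category in keep:  # drop every other category
            groups[category].append(rating)
    return {
        category: {
            "n": len(values),
            "mean": statistics.mean(values),
            "stdev": statistics.stdev(values),
            "median": statistics.median(values),
        }
        for category, values in groups.items()
    }


by_category = grouped_summary(rows, KEEP)
for category, stats in sorted(by_category.items()):
    print(category, stats)
```

Side-by-side box plots, one per category, are a natural graphical companion to this table because they show the outliers of each group on a common scale.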
Section 5: Is there a relationship between the rating of a paid app and its price?
First, filter Dataset 1 to include only paid apps, then describe the relationship between the Rating of the apps and the Price. You need to provide both a numerical summary and a graphical display. Finally, write a comment about your findings and answer the question.
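One common numerical summary for two numerical variables is the Pearson correlation coefficient; the brief does not name a specific measure, so its use here is an assumption. The sketch below filters made-up records to paid apps and computes the coefficient by hand. Excel's CORREL function should give the same quantity.

```python
import math

# Hypothetical records standing in for rows of Dataset 1.
apps = [
    {"Free": True, "Price": 0.00, "Rating": 4.2},
    {"Free": False, "Price": 0.99, "Rating": 4.5},
    {"Free": False, "Price": 2.99, "Rating": 4.1},
    {"Free": True, "Price": 0.00, "Rating": 3.9},
    {"Free": False, "Price": 4.99, "Rating": 4.4},
    {"Free": False, "Price": 1.99, "Rating": 4.0},
    {"Free": False, "Price": 9.99, "Rating": 4.6},
]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


paid = [app for app in apps if not app["Free"]]  # keep paid apps only
prices = [app["Price"] for app in paid]
ratings = [app["Rating"] for app in paid]
r = pearson_r(prices, ratings)
print(f"paid apps: {len(paid)}, correlation between Price and Rating: {r:.3f}")
```

A scatter plot of Rating against Price is the matching graphical display; a coefficient near 0 suggests little linear relationship, while values near 1 or -1 suggest a strong one.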
Section 6: Exploration of Two Categorical Variables
Suggest a research question on the topic of mobile phones or mobile apps that involves only two categorical variables and concerns international students. You need to design a survey that consists of only two questions, resulting in two categorical variables.
After collecting the data and storing it in Dataset 2, describe the relationship between the two variables. You need to provide both a numerical summary and a graphical display. Finally, write a comment about your findings and answer the research question.
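For two categorical variables, a contingency table (cross-tabulation) with row proportions is a typical numerical summary. The sketch below uses a hypothetical survey with two invented questions (phone operating system and most-used app type) and only eight made-up responses to keep it short; the actual Dataset 2 needs at least 30 respondents and your own questions.

```python
from collections import Counter

# Hypothetical survey responses: (phone operating system, most-used app type).
responses = [
    ("Android", "Social media"), ("iOS", "Social media"), ("Android", "Games"),
    ("iOS", "Social media"), ("Android", "Games"), ("iOS", "Games"),
    ("Android", "Social media"), ("iOS", "Social media"),
]


def crosstab(pairs):
    """Count how often each combination of the two variables occurs."""
    counts = Counter(pairs)
    row_labels = sorted({row for row, _ in pairs})
    col_labels = sorted({col for _, col in pairs})
    return {r: {c: counts[(r, c)] for c in col_labels} for r in row_labels}


def row_proportions(table):
    """Convert each row of counts into proportions of that row's total."""
    result = {}
    for row_label, row in table.items():
        total = sum(row.values())
        result[row_label] = {col: count / total for col, count in row.items()}
    return result


table = crosstab(responses)
proportions = row_proportions(table)
for row_label in table:
    print(row_label, table[row_label], proportions[row_label])
```

The row proportions are what a clustered or 100% stacked bar chart would display; clearly different proportions across rows suggest that the two variables are related.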