QBUS2820 Assignment 1: Predicting Restaurant Revenue

1) Background

You are a data analyst for a national quick service restaurant (QSR) chain. Leadership is looking for reliable daily revenue forecasts for each outlet to support staffing, inventory, and promotional decisions. Your task is to build a regression model that predicts Revenue for each outlet-day, using the features provided below.

2) Data Provided

You will be given three CSV files (two to students, one used only by the marker):

Training.csv: labeled training data containing the target column Revenue.

Test_noLabel.csv: the same columns without Revenue; generate predictions for these rows.

Test.csv: the same rows as Test_noLabel.csv but with Revenue; provided only to the marker for evaluation.
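
The file layout described above can be checked right after loading. The following is a minimal sketch, not part of the official template: the helper name `load_and_check` is hypothetical, and the two inline CSV strings are made-up stand-ins with only a few of the columns so that the example runs without the real files.

```python
import io

import pandas as pd

# Tiny inline stand-ins for Training.csv and Test_noLabel.csv (values are made up).
TRAIN_CSV = """OutletID,Date,Promo,Rain_mm,Revenue
1,2023-01-02,1,0.0,12.5
2,2023-01-02,0,3.2,9.8
"""
TEST_CSV = """OutletID,Date,Promo,Rain_mm
3,2023-01-03,0,1.1
"""


def load_and_check(train_source, test_source, target="Revenue"):
    """Read both files and confirm they differ only by the target column."""
    train = pd.read_csv(train_source, parse_dates=["Date"])
    test = pd.read_csv(test_source, parse_dates=["Date"])
    if target not in train.columns:
        raise ValueError(f"{target} is missing from the training data")
    if set(test.columns) != set(train.columns) - {target}:
        raise ValueError("test columns do not match the training features")
    return train, test


train, test = load_and_check(io.StringIO(TRAIN_CSV), io.StringIO(TEST_CSV))
print(train.shape, test.shape)
```

With the real data, the same function would be called with the file names "Training.csv" and "Test_noLabel.csv" instead of the in-memory strings.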

3) Variables

Target Variable

Revenue: Continuous target; daily outlet revenue, measured in thousands of dollars.

Features

Variable            Type             Description

OutletID            Identifier       Unique outlet identifier (not a feature unless engineered).
Date                Date             Calendar date of observation.
Month               Integer (1–12)   Month number.
Weekday             Binary           1 if Monday–Friday; 0 if Saturday–Sunday.
Downtown            Binary           1 if outlet is located in the CBD; else 0.
Mall                Binary           1 if outlet is inside/adjacent to a mall, airport or campus; else 0.
HighIncomeArea      Binary           1 if catchment area is high income; else 0.
OfficesNearby       Numeric          Nearby office density (roughly “hundreds of offices” scale).
CompetitorsNearby   Numeric          Number of nearby competing QSRs (~5 km radius).
Promo               Binary           1 if a promotion was active that day; else 0.
EventNearby         Binary           1 if a local event occurred near the outlet that day; else 0.
Rain_mm             Numeric          Daily rainfall in millimetres (mm).
LagHigh             Binary           1 if the previous month’s mean revenue for this outlet exceeded the global median; else 0.

Important: The dataset contains realistic complications: a small number of outliers (in some explanatory features as well as the response Revenue) and missing values. Your preprocessing should handle these appropriately (e.g. imputation, robust choices, and sensible feature engineering). The dataset has been ‘anonymized’ or obfuscated. This process may modify the realism or the strength of the variable relationships. Therefore, prioritize statistical principles over domain knowledge when making modelling decisions (e.g. do not assume a variable is influential or not influential based on just ‘common sense’, but pay attention to variables that may make the modelling spurious).
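
One possible way to handle the missing values and outliers mentioned above is median imputation followed by clipping to quantile bounds. This is only a sketch under stated assumptions: the data values are made up, the 1%/99% thresholds are illustrative rather than recommended, and other robust choices may suit the real data better.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up values for two numeric features from the table; NaN marks a missing entry.
X_train = pd.DataFrame({
    "Rain_mm": [0.0, 2.0, np.nan, 4.0, 250.0],
    "OfficesNearby": [1.0, 2.0, 3.0, np.nan, 4.0],
})

# Count missing values per column before deciding how to treat them.
print(X_train.isna().sum())

# Median imputation: the median is less sensitive to outliers than the mean.
imputer = SimpleImputer(strategy="median")
X_imputed = pd.DataFrame(imputer.fit_transform(X_train), columns=X_train.columns)

# Clip each column to quantile bounds learned on the training data only
# (illustrative thresholds; the same bounds would be reused on the test data).
lower = X_imputed.quantile(0.01)
upper = X_imputed.quantile(0.99)
X_clipped = X_imputed.clip(lower=lower, upper=upper, axis=1)
print(X_clipped)
```

Learning the imputation values and clipping bounds on the training data, and then reusing them on the test data, keeps the test set from influencing preprocessing.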

4) Your Tasks

1. Exploratory Data Analysis (EDA): Explore distributions, identify outliers, check missingness patterns, and examine relationships with the target. Be careful about making modelling decisions at the exploratory stage; your goal should be to maximize predictive accuracy.

2. Modeling: Fit appropriate regression models to predict Revenue. You may compare multiple methods covered in class (e.g., linear/regularized models, KNN, variable transformations). Use only methods covered in class. Keep model selection consistent (e.g., compare on the same metric and scale).

3. Prediction file: Generate predictions for Test_noLabel.csv from your chosen model and save them as SID_Assignment1_prediction.csv with a single column named Revenue.

4. Reproducibility: Ensure your notebook runs end-to-end without errors when starting from a clean Python session, and that the last cell prints the test MSE when Test.csv is present. Assume the training data is in the same folder as the notebook. The notebook should always produce the same results (remember to set the random seed). Running time should not be too long: aim for a maximum of 10 minutes (we will be flexible due to hardware variability, just make sure it does not take much more than that). This may force you to make decisions based on theoretical tradeoffs to speed up the process.
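
The modeling and reproducibility tasks above can be illustrated with a small comparison loop. This is a hedged sketch, not a prescribed workflow: the data is synthetic, the seed value and hyperparameters are arbitrary, and whether each model was covered in class must be checked against the unit content. Every candidate is scored with the same folds and the same metric (MSE), and the seed is fixed so reruns give identical numbers.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SEED = 2820  # arbitrary fixed seed so that results are reproducible

# Synthetic stand-in for the prepared feature matrix and target.
rng = np.random.default_rng(SEED)
X = rng.normal(size=(200, 5))
y = X @ np.array([1.5, -2.0, 0.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=200)

# Scaling lives inside each pipeline so it is refit on every training fold.
candidates = {
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    "ridge": make_pipeline(StandardScaler(), Ridge(alpha=1.0)),
    "knn": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
}

# One shared splitter: all models see exactly the same folds.
cv = KFold(n_splits=5, shuffle=True, random_state=SEED)


def cv_mse(model):
    """Mean cross-validated MSE (sklearn returns the negated value)."""
    scores = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    return -scores.mean()


results = {name: cv_mse(model) for name, model in candidates.items()}
best_name = min(results, key=results.get)
for name, mse in results.items():
    print(f"{name}: {mse:.4f}")
print("selected:", best_name)
```

Fewer folds or a smaller hyperparameter grid reduce running time at the cost of a noisier error estimate, which is one of the tradeoffs the running-time limit asks you to weigh.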

5) Evaluation Metric

We will use Mean Squared Error (MSE) on the hidden test set (Test.csv) to evaluate the performance of your selected model. Your submitted notebook will be executed with Test.csv in the same folder to compute and print this MSE.
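
As a reminder of what this metric computes, MSE is the average of the squared differences between the true and predicted Revenue values. The short example below uses made-up numbers to show that a manual calculation agrees with `mean_squared_error`. Because Revenue is measured in thousands of dollars, the MSE is in squared thousands of dollars.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up true and predicted revenues (thousands of dollars).
y_true = np.array([10.0, 12.0, 8.0])
y_pred = np.array([11.0, 12.0, 6.0])

# Errors are -1, 0 and 2, so the squared errors are 1, 0 and 4.
manual_mse = np.mean((y_true - y_pred) ** 2)
sklearn_mse = mean_squared_error(y_true, y_pred)

print(f"{manual_mse:.4f}")  # → 1.6667
```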

6) What to Submit

SID_Assignment1_document.pdf: a clear report (≤ 15 pages, font size 12) describing EDA, modeling, selection rationale, and conclusions. Report numerical results to four decimal places. Focus on the important aspects, documenting the reasoning for the decisions and the tradeoffs expected. Tradeoffs: every decision has pros and cons, and you should state them (e.g. using holdout validation with a certain validation size trades off accuracy of the error estimate against fidelity to the original training set size). Descriptions should be detailed enough for data analysts in your field to understand the process and the decisions made along the way.

SID_Assignment1_implementation.ipynb: your Python notebook that produces the results of the report document. Please make sure the notebook executes cleanly end-to-end (restart and run all).

SID_Assignment1_prediction.csv: a single column named Revenue with predictions for Test_noLabel.csv, created from your selected model.
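
The prediction file format described above can be produced in a few lines. This is a minimal sketch: the helper name `save_predictions` is hypothetical, the predicted values are made up, and the demo writes to a temporary folder rather than the notebook folder. In a real submission, SID in the file name would be replaced by your student ID.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def save_predictions(y_pred, path):
    """Write predictions as a CSV with a single column named Revenue."""
    out = pd.DataFrame({"Revenue": np.asarray(y_pred, dtype=float)})
    out.to_csv(path, index=False)  # index=False avoids an extra index column
    return out


# Demo with made-up predictions, written to a temporary folder.
demo_path = os.path.join(tempfile.mkdtemp(), "SID_Assignment1_prediction.csv")
save_predictions([12.5, 9.8, 15.1], demo_path)

# Read the file back to confirm its shape and header.
check = pd.read_csv(demo_path)
print(list(check.columns), len(check))
```

Reading the file back is a cheap check that the row count matches Test_noLabel.csv and that no index column slipped in.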

Last cell template (the marker will run this):

import pandas as pd
from sklearn.metrics import mean_squared_error

QSR_test = pd.read_csv("Test.csv")  # provided by the marker
y_true = QSR_test["Revenue"].values

# YOUR CODE: load your trained pipeline/model here and predict:
X_hidden = QSR_test.drop(columns=["Revenue"])

# replace below with how you call the model; it should reproduce your submitted csv
y_pred = my_model.predict(X_hidden)

# code below should run as is
test_error = mean_squared_error(y_true, y_pred)
print(test_error)
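
In the template above, `my_model.predict(X_hidden)` receives the raw test columns, including OutletID and Date. One way to make that call work unchanged is to wrap all preprocessing and the regressor in a single scikit-learn pipeline. The sketch below is only an illustration under assumptions: the training frame is synthetic, only a subset of the columns is used, and Ridge with `alpha=1.0` is an arbitrary placeholder rather than a suggested model.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for Training.csv (made-up values, subset of the columns).
rng = np.random.default_rng(0)
n = 60
train = pd.DataFrame({
    "OutletID": rng.integers(1, 6, size=n),
    "Date": ["2023-01-01"] * n,
    "Rain_mm": rng.gamma(2.0, 2.0, size=n),
    "OfficesNearby": rng.uniform(0, 5, size=n),
    "Promo": rng.integers(0, 2, size=n).astype(float),
})
train["Revenue"] = (
    10 + 2 * train["Promo"] + 0.8 * train["OfficesNearby"] + rng.normal(0, 0.5, n)
)
train.loc[[3, 17], "Rain_mm"] = np.nan  # inject a few missing values

numeric = ["Rain_mm", "OfficesNearby"]
binary = ["Promo"]

# Columns not listed (OutletID, Date) are dropped inside the pipeline,
# so the raw test frame can be passed to predict() as is.
preprocess = ColumnTransformer(
    transformers=[
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median")),
            ("scale", StandardScaler()),
        ]), numeric),
        ("bin", SimpleImputer(strategy="most_frequent"), binary),
    ],
    remainder="drop",
)
my_model = Pipeline([("prep", preprocess), ("reg", Ridge(alpha=1.0))])
my_model.fit(train.drop(columns=["Revenue"]), train["Revenue"])

# Mimic the marker's call on a frame that still contains OutletID and Date.
X_hidden = train.drop(columns=["Revenue"]).head(5)
y_pred = my_model.predict(X_hidden)
print(y_pred.round(4))
```

Keeping imputation and scaling inside the fitted pipeline means the same training-derived statistics are applied to Test.csv, which helps the last cell reproduce the submitted CSV.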

7) Additional info on formatting

The report should be well-structured and easy to read; include clear figures/tables.

Maximum 15 pages (including appendices). Make sure figures and tables are readable, have captions, and are referenced in the text. Do not rely on appendices for completeness; all main points should be in the body of the report. The report should not be the notebook converted to PDF; it should be a Word, LaTeX or similar document containing the essential information.

8) Marking Criteria

This assignment is worth 25 marks in total, with 14 marks allocated to the content of the document.pdf and 11 marks to the Python implementation.

The marking breakdown is as follows:

Prediction accuracy: Your test error will be compared against a baseline model developed by the teaching team.

The marker first runs SID_Assignment1_implementation.ipynb.

If the file runs smoothly and produces a test MSE, up to 11 marks will be awarded based on prediction accuracy relative to the baseline model (outperforming the baseline model earns the full 11 marks; otherwise marks are deducted proportionally).

If the marker cannot run SID_Assignment1_implementation.ipynb or if no test MSE is produced, partial marks (maximum 3, meaning a potential loss of 8 marks) may be awarded based on the appropriateness of the file.

Report described in SID_Assignment1_document.pdf: Up to 14 marks are allocated based on:

The appropriateness of the chosen prediction method.

The detail, discussion, and explanation of your data analysis procedure.

See the Marking Criteria for more details.

CSV File Submission: Up to 2 marks will be deducted if you fail to upload the CSV.

9) Final Notes

If you believe there are errors in the assignment, please contact the teaching team as soon as possible. We encourage you to read the instructions carefully and seek clarification early if you are unsure about any requirements.
