Theme by the Executable Book Project

Probability Assignment

To get full credit in this assignment, you need to use only the numpy or jax libraries and include an adequate explanation of the code in either markdown cells or code comments. Where you need to type equations, type them in LaTeX math notation.

PS: Please note that we have run the questions through ChatGPT, and you will be referred to the Dean if we find that a robot answered your questions.

Question 1a (10 points)

In a private subreddit, people are posting their opinions on the CEO of the company you work for. Let's assume that the employees who post are randomly logging in to that subreddit and that each post indicates whether or not the employee approves of the job the CEO is doing. Let $X_i$ be the binary random variable where $X_i = 1$ indicates approval. You can assume that each $X_i$ is distributed according to a Bernoulli distribution with parameter $p = 1/2$.

Your job is to sample $n = 50$ posts and estimate the approval rate of the CEO by considering the statistics of $Y = X_1 + X_2 + \dots + X_n$. What is the probability that exactly 25 of the employees approve of the CEO?
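Since $Y$ is a sum of $n$ independent Bernoulli($p$) variables, it follows a Binomial($n, p$) distribution, so $P(Y = 25) = \binom{50}{25} (1/2)^{50}$. A minimal sketch, assuming numpy plus the standard-library `math` module, that evaluates this exactly and checks it by simulation; the seed and `num_trials` are arbitrary choices, not part of the assignment.

```python
import math
import numpy as np

n, p = 50, 0.5

def binom_pmf(k, n, p):
    """Binomial pmf: P(Y = k) = C(n, k) p^k (1 - p)^(n - k)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

exact = binom_pmf(25, n, p)

# Monte Carlo check: simulate many batches of n Bernoulli(p) posts and sum each batch
rng = np.random.default_rng(seed=0)
num_trials = 200_000
posts = rng.random((num_trials, n)) < p   # True means "approves"
y = posts.sum(axis=1)                     # Y = X_1 + ... + X_n for each trial
estimate = np.mean(y == 25)

print(f"exact P(Y=25)     = {exact:.4f}")
print(f"simulated P(Y=25) = {estimate:.4f}")
```

The simulated frequency should agree with the exact value to about two decimal places at this number of trials.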

Question 1b (10 points)

Following your findings in Q1a, read about the Central Limit Theorem and recognize that

$$Z = \frac{Y - np}{\sqrt{np(1-p)}}$$

is approximately normally distributed with mean 0 and variance 1.

Using the Gaussian approximation, can you find the probability that 25 employees approve of the CEO?

Type the answer here using LaTeX syntax, or handwrite the answer, upload the picture to the same folder, and display it in a new markdown cell with the markdown syntax ![title](image_name.png)
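One way to apply the Gaussian approximation to the discrete event $\{Y = 25\}$ is the continuity correction, which treats it as $24.5 < Y < 25.5$ and standardizes both ends with $np = 25$ and $\sqrt{np(1-p)} = \sqrt{12.5}$. The continuity correction is our choice here, not something the assignment prescribes. A minimal sketch using only the standard-library `math` module, with $\Phi$ written via the error function:

```python
import math

n, p, k = 50, 0.5, 25
mu = n * p                          # mean of Y
sigma = math.sqrt(n * p * (1 - p))  # standard deviation of Y

def std_normal_cdf(z):
    """Standard normal CDF Phi(z), expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Continuity correction: approximate P(Y = 25) by P(24.5 < Y < 25.5)
z_lo = (k - 0.5 - mu) / sigma
z_hi = (k + 0.5 - mu) / sigma
gaussian_approx = std_normal_cdf(z_hi) - std_normal_cdf(z_lo)

print(f"Gaussian approximation of P(Y=25) = {gaussian_approx:.4f}")
```

Here $z_{hi} = -z_{lo} = 0.5/\sqrt{12.5}$, so the result equals $\operatorname{erf}(0.1) \approx 0.1125$, close to the exact binomial value of about 0.1123.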

Question 2 (20 points)

A sequential experiment involves repeatedly drawing a ball from one of two urns, noting the number on the ball, and replacing the ball in the urn. Urn 0 contains one ball with the number 0 and two balls with the number 1. Urn 1 contains five balls with the number 0 and one ball with the number 1.

The urn from which the first ball is drawn is selected by flipping a fair coin: urn 0 is used if the outcome is H and urn 1 if the outcome is T. The urn used in each subsequent draw corresponds to the number on the ball drawn in the previous draw.

What is the probability that the sequence of numbers on the drawn balls is 0011?

Type the answer here using LaTeX syntax, or handwrite the answer, upload the picture to the same folder, and display it in a new markdown cell with the markdown syntax ![title](image_name.png)
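Because the next urn is determined by the previous ball, the draws form a Markov chain: the first draw averages over the coin flip, and every later factor is a transition probability $P(\text{ball} \mid \text{urn} = \text{previous ball})$. A minimal sketch, assuming Python's `fractions` module for exact arithmetic and numpy for a simulation check (seed and `num_trials` are arbitrary):

```python
from fractions import Fraction
import numpy as np

# P(ball number | urn): urn 0 holds {0, 1, 1}, urn 1 holds {0, 0, 0, 0, 0, 1}
P_BALL = {
    0: {0: Fraction(1, 3), 1: Fraction(2, 3)},
    1: {0: Fraction(5, 6), 1: Fraction(1, 6)},
}
P_FIRST_URN = {0: Fraction(1, 2), 1: Fraction(1, 2)}  # fair coin: H -> urn 0, T -> urn 1

def sequence_probability(seq):
    """Exact probability of observing the ball numbers in `seq`."""
    # First draw: total probability over the coin flip that picks the urn
    prob = sum(P_FIRST_URN[u] * P_BALL[u][seq[0]] for u in (0, 1))
    # Later draws: the urn is the number on the previous ball
    for prev, cur in zip(seq, seq[1:]):
        prob *= P_BALL[prev][cur]
    return prob

exact = sequence_probability([0, 0, 1, 1])
print(exact, float(exact))  # prints 7/324 followed by its decimal value

# Monte Carlo check of the same experiment
rng = np.random.default_rng(seed=0)
p_one = np.array([2 / 3, 1 / 6])            # P(ball = 1 | urn 0), P(ball = 1 | urn 1)
num_trials = 200_000
urn = rng.integers(0, 2, size=num_trials)   # coin flip chooses the first urn
draws = np.empty((num_trials, 4), dtype=int)
for t in range(4):
    draws[:, t] = (rng.random(num_trials) < p_one[urn]).astype(int)
    urn = draws[:, t]                       # next urn = number on this ball
estimate = np.mean(np.all(draws == [0, 0, 1, 1], axis=1))
print(f"simulated = {estimate:.4f}")
```

The exact value factors as $\frac{7}{12} \cdot \frac{1}{3} \cdot \frac{2}{3} \cdot \frac{1}{6} = \frac{7}{324}$, where $\frac{7}{12} = \frac{1}{2}\cdot\frac{1}{3} + \frac{1}{2}\cdot\frac{5}{6}$ is the probability that the first ball is 0.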

Question 3 (20 points)

Referring to Example 6.6 of the Math for ML book, simulate and plot the bivariate normal distribution with the parameters shown there, using the Cholesky factorization for the simulation.

# Type the Python code here and ensure you save the notebook with the results of the code execution.
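The Cholesky approach factors $\Sigma = LL^\top$ with $L$ lower triangular; if $z \sim \mathcal{N}(0, I)$, then $x = \mu + Lz$ has mean $\mu$ and covariance $LL^\top = \Sigma$. A minimal numpy sketch follows. The values of `mu` and `Sigma` are hypothetical placeholders, not the parameters of Example 6.6, and the helper `plot_samples` assumes matplotlib is installed.

```python
import numpy as np

# Hypothetical placeholder parameters: substitute the mean and covariance from Example 6.6
mu = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.8],
                  [0.8, 2.0]])

def sample_bivariate_normal(mu, Sigma, num_samples, rng):
    """Draw samples x = mu + L z, where Sigma = L L^T and z ~ N(0, I)."""
    L = np.linalg.cholesky(Sigma)                         # lower-triangular factor
    z = rng.standard_normal((num_samples, mu.shape[0]))   # i.i.d. standard normals
    return mu + z @ L.T                                   # each row is one sample

rng = np.random.default_rng(seed=0)
samples = sample_bivariate_normal(mu, Sigma, 10_000, rng)

# Sanity check: empirical moments should be close to mu and Sigma
print("sample mean:", samples.mean(axis=0))
print("sample covariance:\n", np.cov(samples, rowvar=False))

def plot_samples(samples, mu):
    """Scatter plot of the samples with the mean marked (requires matplotlib)."""
    import matplotlib.pyplot as plt
    fig, ax = plt.subplots(figsize=(5, 5))
    ax.scatter(samples[:, 0], samples[:, 1], s=2, alpha=0.3)
    ax.plot(mu[0], mu[1], "r+", markersize=12, label="mean")
    ax.set_xlabel("$x_1$")
    ax.set_ylabel("$x_2$")
    ax.set_title("Bivariate normal samples via Cholesky")
    ax.legend()
    plt.show()
```

After substituting the Example 6.6 parameters, calling `plot_samples(samples, mu)` produces the scatter plot; the row form `z @ L.T` is simply the transpose of $Lz$ applied to every sample at once.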

Question 4 (20 points)

Go through the provided links on the Poisson and exponential distributions, as the Math for ML textbook on the course site does not cover these important distributions in enough depth.

Watch this video, in which the author explains how to simulate a Poisson distribution from scratch: https://www.youtube.com/watch?v=Asto3RS46ks

Using the Kaggle API, download this dataset and plot the histogram of the number of cyclists that cross the Brooklyn Bridge per day.
Simulate the number of cyclists that cross the Brooklyn Bridge per day using the Poisson distribution. Ensure that the simulated counts have a distribution similar to that of the observed counts.

# Type the Python code here and ensure you save the notebook with the results of the code execution.
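One from-scratch way to draw a Poisson($\lambda$) count (not necessarily the method used in the video) is to count how many arrivals of a rate-$\lambda$ process, with exponential gaps of mean $1/\lambda$, fall inside one unit of time. The rate can be fitted as the sample mean of the observed daily counts, which is the maximum-likelihood estimate for a Poisson model. The dataset link is not reproduced here, so the Kaggle download and CSV loading are omitted; `observed` below is a synthetic numpy stand-in, not the Brooklyn Bridge data.

```python
import numpy as np

def poisson_from_exponentials(lam, size, rng):
    """Sample `size` Poisson(lam) counts by counting exponential inter-arrival
    times (mean 1/lam) whose cumulative sum stays within one unit of time."""
    # Draw far more gaps than the count can plausibly reach (about 10 std devs above lam)
    max_events = int(lam + 10 * np.sqrt(lam) + 10)
    gaps = rng.exponential(scale=1.0 / lam, size=(size, max_events))
    arrival_times = np.cumsum(gaps, axis=1)
    return (arrival_times <= 1.0).sum(axis=1)

def simulate_like(observed, rng):
    """Fit lambda as the sample mean, then simulate the same number of days."""
    lam_hat = observed.mean()
    return poisson_from_exponentials(lam_hat, observed.size, rng)

rng = np.random.default_rng(seed=0)
# Synthetic stand-in (NOT the Kaggle data): replace with the per-day cyclist counts
observed = rng.poisson(lam=50.0, size=365)
simulated = simulate_like(observed, rng)

# A Poisson variable has equal mean and variance, so compare both summaries
for name, x in [("observed", observed), ("simulated", simulated)]:
    print(f"{name:>9}: mean={x.mean():.1f}  var={x.var():.1f}")
```

With the real counts loaded, overlaying `plt.hist(observed, bins=30, alpha=0.5)` and `plt.hist(simulated, bins=30, alpha=0.5)` gives the distribution-wise comparison. If the observed variance is much larger than the observed mean, a single-rate Poisson model will not match the histogram well.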

Question 5 (20 points)

You are asked to stress test a cloud API endpoint and are told that the API exposes a database server that can be abstracted as an M/M/1 queue. Go through this introductory page to understand the queuing domain and the M/M/1 notation. Also go through the elements of the M/M/1 queue here. Make sure you click on the links and learn about the random process called the Poisson process.

Your task is to simulate the behavior of the queue and plot the number of requests waiting in the queue as a function of time. You are given three arrival rates for the API requests, $\lambda = [1, 3, 4]$, and the service time of each request is an exponential random variable with rate $\mu = 4$.

# Type the Python code here and ensure you save the notebook with the results of the code execution.
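Because both the inter-arrival and service times are exponential (memoryless), an M/M/1 queue can be simulated event by event: with $n$ requests in the system, the next event occurs after an Exponential($\lambda + \mu\,\mathbb{1}[n>0]$) time and is an arrival with probability $\lambda / (\lambda + \mu\,\mathbb{1}[n>0])$, otherwise a departure. A minimal numpy sketch; the horizon `t_end` and the seed are arbitrary choices. Note that for $\lambda = 4 = \mu$ the utilization $\rho = \lambda/\mu = 1$, so that queue has no steady state and tends to wander upward.

```python
import numpy as np

def simulate_mm1(lam, mu, t_end, rng):
    """Event-driven M/M/1 simulation (single server, FIFO).

    Returns event times and the number of requests in the system
    (waiting + in service) just after each event.
    """
    t, n = 0.0, 0
    times, counts = [0.0], [0]
    while True:
        # Competing exponential clocks: arrival (rate lam) and, if busy, departure (rate mu)
        total_rate = lam + (mu if n > 0 else 0.0)
        t += rng.exponential(1.0 / total_rate)
        if t > t_end:
            break
        if rng.random() < lam / total_rate:
            n += 1  # arrival
        else:
            n -= 1  # departure (only possible when n > 0)
        times.append(t)
        counts.append(n)
    return np.array(times), np.array(counts)

mu, t_end = 4.0, 200.0
rng = np.random.default_rng(seed=0)
results = {}
for lam in [1.0, 3.0, 4.0]:
    times, in_system = simulate_mm1(lam, mu, t_end, rng)
    waiting = np.maximum(in_system - 1, 0)  # queued requests, excluding the one in service
    results[lam] = (times, waiting)
    print(f"lambda={lam}: rho={lam / mu:.2f}, max waiting={waiting.max()}")
```

Each path is piecewise constant, so a step plot such as `ax.step(times, waiting, where="post")` with matplotlib shows the queue length over time. For $\rho < 1$, the time-averaged number in the system should approach the textbook value $\rho / (1 - \rho)$ over long horizons.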

Data Mining
