Course Details Datasets

Course Outline

Course Logistics

Instructor: Patrick Boily, uOttawa, Data Action Lab & Idlewyld Analytics
Communications with the instructor must be conducted through Slack. The Slack invite link is available in the “Announcements” section on Brightspace.

Schedule: Sep 08 – Dec 08
    Mondays, 08:30-10:00, MNT 103
    Wednesdays, 13:00-14:30, MNT 204
    No class on Oct 11, Oct 25, and Oct 27
    The session on Dec 08 takes place from 08:30-10:00, MNT 103

Zoom link
    Meeting ID: 784 099 4865
    Passcode: 7RXJ9J

Project Deadlines: 24-Sep, 08-Oct, 29-Oct, 19-Nov, 10-Dec, 17-Dec
    Upload completed projects as PDFs to Brightspace
    12 pages maximum, no exceptions

Detailed Schedule:

Project Datasets:
    polls_us_election_2016
    ab_data | mimic3d
    BASA_AUC_2028_912 | dat_F_sub | data_P_sub | years20262030
    flights1_2019_1

Sample Reports:
    Classification Project, [data]
    Short write-up [B; although I like how it is written in terms of the “Multiple I’s”, there are too many missing technical details, it is too short, it includes no code, and it doesn’t really do what was required in the project]
    Long write-up [A-; I like the “Multiple I’s” approach on this one, but it is a tad too long and doesn’t entirely do what was asked for in the project statement]

0. Data Analysis Universals (06-Sep to 17-Sep)
Video Lectures: (3:59:29)
    0.1 Data Fundamentals | Part 1 (0:39:56)
    0.1 Data Fundamentals | Part 2 (0:39:08)
    0.2 Basics of R | Part 1 (0:25:50)
    0.2 Basics of R | Part 2 (0:25:01)
    0.3 Tidyverse for Data Wrangling (0:37:44)
    0.4 Data Processing | Part 1 (0:34:23)
    0.4 Data Processing | Part 2 (0:37:27)

Slide Decks:
    Data Fundamentals
    Introduction to Programming
    Programming in R (and Python)
    Data Processing

Supplementary Materials:
    The Fundamentals of Data Insight (Report)
    Basics of R for Data Analysis (Report)
    The Essentials of Data Preparation (Report)
    R Basics (Notebook)
    More Data Stuff in R (Notebook)
    Data Wrangling and the Tidyverse (Notebook)
    Data Cleaning and Preparation in R (Notebook)

1. Data Visualization (18-Sep to 08-Oct)
Video Lectures: (4:28:04)
    1.1 Data Exploration & 1.2 Pre-Analysis Visualization (1:16:00)
    1.3 Post-Analysis Visualization & 1.4 Visualization Catalogue (1:13:35)
    1.5 Grammar of Graphics & 1.6 Introduction to Dashboards (1:11:55)
    1.7 Graphics with ggplot2 | Part 1 (0:19:32)
    1.7 Graphics with ggplot2 | Part 2 (0:27:02)

Slide Deck: Data Exploration and Data Visualization

Course Notes: A Primer of Data Visualization

Project 1 (due date: 08-Oct)

Supplementary Materials:
    DAL Podcast Episodes: 17 episodes (9:07:38)
    A ggplot2 Primer (Report)
    Simple Data Visualization in R (Notebook)
    Data Visualization with ggplot2 (Notebook)
    More Data Visualization Stuff in R (Notebook)

2. Bayesian Data Analysis (09-Oct to 29-Oct)

Video Lectures: (4:06:59)
    2.1 Plausible Reasoning & 2.2 The Rules of Probability & 2.3 Bayes’ Theorem (1:13:48)
    2.3 Bayes’ Theorem (cont.) & 2.4 Examples & 2.5 Prior Distributions (1:07:30)
    2.5 Prior Distributions (cont.) (0:42:10)
    2.6 Naïve Bayes Classification & 2.7 MCMC and Numerical Methods (1:03:30)

Slide Deck: A Cursory Glance at Bayesian Analysis

Course Notes: A Soft Introduction to Bayesian Data Analysis

Project 2 (due date: 29-Oct)

Supplementary Materials:
    R Code Archive (from Kruschke’s Doing Bayesian Data Analysis)
    Tutorial – Coin (Excel)
    Tutorial – Dollar Bills (Notebook)
    Tutorial – Planes (Excel)
    Tutorial – Salaries (Excel)

3. Queueing Systems (30-Oct to 19-Nov)
Video Lectures: (2:39:29)
    3.1 Introduction & 3.2 Terminology & 3.3 Notation (0:46:05)
    3.4 Little’s Queueing Formula & 3.5 The M/M/c Queueing Model (0:46:16)
    3.6 Example: Canadian Airports (1:07:08)

Slide Decks:
    Basics of Queueing Theory
    CATSA and Queueing Systems

Course Notes: The Essentials of Queueing Systems Methods

Project 3 (due date: 19-Nov)

4. Anomaly Detection and Outlier Analysis (20-Nov to 10-Dec)
Video Lectures: (5:51:03)
    4.1 Basic Notions and Overview (0:42:36)
    4.1 Basic Notions and Overview (cont.) (0:36:32)
    4.2 Quantitative Methods (0:52:31)
    4.2 Quantitative Methods (cont.) (0:34:35)
    4.3 Qualitative Methods (0:20:46)
    4.4 Anomalies in High-Dimensional Datasets (0:29:49)
    4.4 Anomalies in High-Dimensional Datasets (cont.) & 4.5 Outlier Ensembles (0:31:32)
    4.6 Anomalies in Text Datasets (0:29:16)

Slide Deck: Anomaly Detection and Outlier Analysis

Course Notes: Anomaly Detection and Outlier Analysis

Project 4 (due date: 10-Dec)

Supplementary Materials: R Code Archive for the Slides (not commented)

Supplementary Material
Slide Deck: Feature Selection and Dimension Reduction

Video Lectures:
    Part 1 (1:13:30)
    Part 2 (0:44:01)
    Part 3 (0:30:15)
    Part 4 (0:28:45)

Course Notes: Feature Selection and Data Reduction (with Examples)

Supplementary Materials:
    R Code Archive (not commented)
    Python Notebooks Archive (install Anaconda to view)

Supplementary Datasets:
    Canada2011
    Canada2011_CMA
    Algae_Blooms
    2016collisionsfinal.csv
    HR_2016_Census_simple
    GlobalCitiesPBI
    ab_data
    mimic3d
    DraftData
    BASA_AUC_2028_912
    dat_F_sub
    data_P_sub
    years20262030
    Distracted Driving Fatalities
    flights1_2019_1
    Flights Read Me