Course Outline

Course Logistics

Instructor: Patrick Boily, uOttawa
Communication with the instructor must be conducted through Slack. The Slack invite link is available in the “Announcements” section on Brightspace.

Schedule: Sep 06 – Dec 06
    Mondays, 08:30-09:50, STE J0106
    Wednesdays, 13:00-14:30, STE J0106

Zoom link (if needed)

Project Deadlines: 22-Sep, 06-Oct, 27-Oct, 17-Nov, 08-Dec, 15-Dec
    Upload completed projects as PDFs to Slack
    10 pages maximum for reports, no exceptions

Course Notes:
    Data Understanding, Data Analysis, and Data Science (DUDADS)
    The Practice of Data Visualization (PDV)

Detailed Schedule:
Schedule

Project Datasets:
    polls_us_election_2016
    ab_data | mimic3d
    BASA_AUC_2028_912 | dat_F_sub | data_P_sub | years20262030
    flights1_2019_1

Sample Reports: (may not apply to course)
    Classification Project, [data]
    Short write-up [B; although I like how it is written in terms of the “Multiple I’s”, there are too many missing technical details, it is too short, it includes no code, and it doesn’t really do what was required in the project]
    Long write-up [A-; I like the “Multiple I’s” approach on this one, but it is a tad too long and doesn’t entirely do what was asked for in the project statement]

0. Storytelling with Data (06-Sep to 15-Sep)

Video Lectures: (3:59:29)
    0.1 Data Fundamentals | Part 1 (0:39:56)
    0.1 Data Fundamentals | Part 2 (0:39:08)
    0.2 Basics of R | Part 1 (0:25:50)
    0.2 Basics of R | Part 2 (0:25:01)
    0.3 Tidyverse for Data Wrangling (0:37:44)
    0.4 Data Processing | Part 1 (0:34:23)
    0.4 Data Processing | Part 2 (0:37:27)

Slide Decks:
    Data Fundamentals
    Introduction to Programming
    Programming in R (and Python)
    Data Processing
    Storytelling with Data

Course Notes Chapters:
    DUDADS, ch. 1: Programming Primer
    DUDADS, ch. 13: Non-Technical Aspects of Data Work
    DUDADS, ch. 14: Data Science Basics
    DUDADS, ch. 15: Data Preparation
    PDV, ch. 7: Stories and Storytelling
    PDV, ch. 8: Effective Storytelling Visuals

1. Data Visualization (18-Sep to 06-Oct)

Video Lectures: (4:28:04)
    1.1 Data Exploration & 1.2 Pre-Analysis Visualization (1:16:00)
    1.3 Post-Analysis Visualization & 1.4 Visualization Catalogue (1:13:35)
    1.5 Grammar of Graphics & 1.6 Introduction to Dashboards (1:11:55)
    1.7 Graphics with ggplot2 | Part 1 (0:19:32)
    1.7 Graphics with ggplot2 | Part 2 (0:27:02)

Slide Deck: Data Exploration and Data Visualization

Course Notes Chapters:
    PDV, ch. 1: A Data Visualization Primer
    PDV, ch. 2: Data Visualization and Exploration
    PDV, ch. 4: The Mechanics of Visual Perception
    PDV, ch. 5: Visual Design and Data Charts
    PDV, ch. 6: Universal Design and Accessibility
    PDV, ch. 9: Visualization Toolbox
    PDV, ch. 10: Visualization Software
    PDV, ch. 12: ggplot2 Visualizations in R

Project 1 (due date: 06-Oct)

Supplementary Materials:
    DAL Podcast Episodes: 17 episodes (9:07:38)

2. Bayesian Data Analysis (09-Oct to 27-Oct)

Video Lectures: (4:06:59)
    2.1 Plausible Reasoning & 2.2 The Rules of Probability & 2.3 Bayes’ Theorem (1:13:48)
    2.3 Bayes’ Theorem (cont.) & 2.4 Examples & 2.5 Prior Distributions (1:07:30)
    2.5 Prior Distributions (cont.) (0:42:10)
    2.6 Naïve Bayes Classification & 2.7 MCMC and Numerical Methods (1:03:30)

Slide Deck: A Cursory Glance at Bayesian Analysis

Course Notes Chapter:
    DUDADS, ch. 26: Bayesian Data Analysis

Project 2 (due date: 27-Oct)

Supplementary Materials:
    R Code Archive (from Kruschke’s Doing Bayesian Data Analysis)
    Tutorial – Coin (Excel)
    Tutorial – Dollar Bills (Notebook)
    Tutorial – Planes (Excel)
    Tutorial – Salaries (Excel)

3. Queueing Systems (30-Oct to 17-Nov)

Video Lectures: (2:39:29)
    3.1 Introduction & 3.2 Terminology & 3.3 Notation (0:46:05)
    3.4 Little’s Queueing Formula & 3.5 The M/M/c Queueing Model (0:46:16)
    3.6 Example: Canadian Airports (1:07:08)

Slide Decks:
    Basics of Queueing Theory
    CATSA and Queueing Systems

Course Notes Chapter:
    DUDADS, ch. 25: Queueing Models

Project 3 (due date: 17-Nov)

4. Anomaly Detection and Outlier Analysis (20-Nov to 08-Dec)

Video Lectures: (5:51:03)
    4.1 Basic Notions and Overview (00:42:36)
    4.1 Basic Notions and Overview (cont.) (00:36:32)
    4.2 Quantitative Methods (00:52:31)
    4.2 Quantitative Methods (cont.) (00:34:35)
    4.3 Qualitative Methods (00:20:46)
    4.4 Anomalies in High-Dimensional Datasets (00:29:49)
    4.4 Anomalies in High-Dimensional Datasets (cont.) & 4.5 Outlier Ensembles (00:31:32)
    4.6 Anomalies in Text Datasets (00:29:16)

Slide Deck: Anomaly Detection and Outlier Analysis

Course Notes Chapter:
    DUDADS, ch. 27: Anomaly Detection and Outlier Analysis

Project 4 (due date: 08-Dec)

Supplementary Material

Slide Deck: Feature Selection and Dimension Reduction (DUDADS)

Video Lectures:
    Part 1 (01:13:30)
    Part 2 (00:44:01)
    Part 3 (00:30:15)
    Part 4 (00:28:45)

Course Notes Chapters:
    DUDADS, ch. 19: Introduction to Machine Learning
    DUDADS, ch. 20: Regression and Value Estimation
    DUDADS, ch. 21: Classification and Supervised Learning
    DUDADS, ch. 22: Clustering
    DUDADS, ch. 23: Feature Selection and Dimension Reduction

Supplementary Datasets:
    Canada2011
    Canada2011_CMA
    Algae_Blooms
    2016collisionsfinal.csv
    HR_2016_Census_simple
    GlobalCitiesPBI
    ab_data
    mimic3d
    DraftData
    BASA_AUC_2028_912
    dat_F_sub
    data_P_sub
    years20262030
    Distracted Driving Fatalities
    flights1_2019_1
    Flights Read Me