
Course Outline

Course Logistics

Instructor: Patrick Boily, uOttawa
All communication with the instructor must go through Slack. The Slack invite link is available in the “Announcements” section on Brightspace.

Schedule: Sep 07 – Dec 07
    Mondays, 08:30-10:00, MRT 221
    Wednesdays, 13:00-14:30, MRT 221
    No class on Oct 24 and Oct 26

Zoom link

Project Deadlines: 23-Sep, 07-Oct, 28-Oct, 18-Nov, 09-Dec, 16-Dec
    Upload completed projects as PDFs to Slack
    12 pages maximum, no exceptions

Course Notes: (you will need a Gmail address to request access to the course notes on RStudio Connect)
    Data Understanding, Data Analysis, and Data Science (DUDADS)
    The Practice of Data Visualization (PDV)

Detailed Schedule:
Schedule

Project Datasets:
    polls_us_election_2016
    ab_data | mimic3d
    BASA_AUC_2028_912 | dat_F_sub | data_P_sub | years20262030
    flights1_2019_1

Sample Reports:
    Classification Project, [data]
    Short write-up [B; although I like how it is written in terms of the “Multiple I’s”, there are too many missing technical details, it is too short, it includes no code, and it doesn’t really do what was required in the project]
    Long write-up [A-; I like the “Multiple I’s” approach on this one, but it is a tad too long and doesn’t entirely do what was asked for in the project statement]

0. Data Analysis Universals (05-Sep to 16-Sep)

Video Lectures: (3:59:29)
    0.1 Data Fundamentals | Part 1 (0:39:56)
    0.1 Data Fundamentals | Part 2 (0:39:08)
    0.2 Basics of R | Part 1 (0:25:50)
    0.2 Basics of R | Part 2 (0:25:01)
    0.3 Tidyverse for Data Wrangling (0:37:44)
    0.4 Data Processing | Part 1 (0:34:23)
    0.4 Data Processing | Part 2 (0:37:27)

Slide Decks:
    Data Fundamentals
    Introduction to Programming
    Programming in R (and Python)
    Data Processing
    Storytelling with Data

Course Notes Chapters:
    Programming Primer (DUDADS)
    Non-Technical Aspects of Data Work (DUDADS)
    Data Science Basics (DUDADS)
    Data Preparation (DUDADS)
    Visualization and Storytelling (PDV)

1. Data Visualization (17-Sep to 07-Oct)

Video Lectures: (4:28:04)
    1.1 Data Exploration & 1.2 Pre-Analysis Visualization (1:16:00)
    1.3 Post-Analysis Visualization & 1.4 Visualization Catalogue (1:13:35)
    1.5 Grammar of Graphics & 1.6 Introduction to Dashboards (1:11:55)
    1.7 Graphics with ggplot2 | Part 1 (0:19:32)
    1.7 Graphics with ggplot2 | Part 2 (0:27:02)

Slide Deck: Data Exploration and Data Visualization

Course Notes Chapters:
    Data Visualization and Data Exploration (DUDADS)
    Overview (PDV)
    Basics of Data Visualization (PDV)
    Essentials of Visual Design (PDV)
    Practical Aspects of Data Visualization (PDV)

Project 1 (due date: 07-Oct)

Supplementary Materials:
    DAL Podcast Episodes: 17 episodes (9:07:38)

2. Bayesian Data Analysis (08-Oct to 28-Oct)

Video Lectures: (4:06:59)
    2.1 Plausible Reasoning & 2.2 The Rules of Probability & 2.3 Bayes’ Theorem (1:13:48)
    2.3 Bayes’ Theorem (cont.) & 2.4 Examples & 2.5 Prior Distributions (1:07:30)
    2.5 Prior Distributions (cont.) (0:42:10)
    2.6 Naïve Bayes Classification & 2.7 MCMC and Numerical Methods (1:03:30)

Slide Deck: A Cursory Glance at Bayesian Analysis

Course Notes Chapter:
    Bayesian Data Analysis (DUDADS)

Project 2 (due date: 28-Oct)

Supplementary Materials:
    R Code Archive (from Kruschke’s Doing Bayesian Data Analysis)
    Tutorial – Coin (Excel)
    Tutorial – Dollar Bills (Notebook)
    Tutorial – Planes (Excel)
    Tutorial – Salaries (Excel)

3. Queueing Systems (29-Oct to 18-Nov)

Video Lectures: (2:39:29)
    3.1 Introduction & 3.2 Terminology & 3.3 Notation (0:46:05)
    3.4 Little’s Queueing Formula & 3.5 The M/M/c Queueing Model (0:46:16)
    3.6 Example: Canadian Airports (1:07:08)

Slide Decks:
    Basics of Queueing Theory
    CATSA and Queueing Systems

Course Notes Chapter:
    Queueing Systems (DUDADS)

Project 3 (due date: 18-Nov)

4. Anomaly Detection and Outlier Analysis (19-Nov to 09-Dec)

Video Lectures: (5:51:03)
    4.1 Basic Notions and Overview (00:42:36)
    4.1 Basic Notions and Overview (cont.) (00:36:32)
    4.2 Quantitative Methods (00:52:31)
    4.2 Quantitative Methods (cont.) (00:34:35)
    4.3 Qualitative Methods (00:20:46)
    4.4 Anomalies in High-Dimensional Datasets (00:29:49)
    4.4 Anomalies in High-Dimensional Datasets (cont.) & 4.5 Outlier Ensembles (00:31:32)
    4.6 Anomalies in Text Datasets (00:29:16)

Slide Deck: Anomaly Detection and Outlier Analysis

Course Notes Chapter:
    Anomaly Detection and Outlier Analysis (DUDADS)

Project 4 (due date: 09-Dec)

Supplementary Material

Slide Deck: Feature Selection and Dimension Reduction (DUDADS)

Video Lectures:
    Part 1 (01:13:30)
    Part 2 (00:44:01)
    Part 3 (00:30:15)
    Part 4 (00:28:45)

Course Notes Chapter:
    Feature Selection and Data Reduction

Supplementary Datasets:
    Canada2011
    Canada2011_CMA
    Algae_Blooms
    2016collisionsfinal.csv
    HR_2016_Census_simple
    GlobalCitiesPBI
    ab_data
    mimic3d
    DraftData
    BASA_AUC_2028_912
    dat_F_sub
    data_P_sub
    years20262030
    Distracted Driving Fatalities
    flights1_2019_1
    Flights Read Me