Course Syllabus

{Updated 3/24/2020}

 

Class format: A series of guest lectures (14 total), Homework, Final Project/Paper

Textbook: None;  topic readings assigned for each lecture

Bulletin Description:  Introduction to the practical application of data science, machine learning, and artificial intelligence.  The course reviews the Python tools necessary for applying data science and provides a detailed review of data infrastructure and database construction for data science.  A series of detailed industry case studies from experts in the field of data science will be presented.

Rationale: Data science curricula generally maintain a strong focus on machine learning and, more broadly, AI fundamentals.  A gap exists between academia and the practical application of data science methods in industry.  Beyond machine learning and AI in general, a strong basis in Python and a nuanced understanding of the challenges of building appropriate data pipelines and database architectures are foundational requirements for building successful data science programs.  This course aims to fill this gap.

Full Description: This seminar-based course will consist of case studies in the field of data science.  The course is primarily aimed at engineering students with an interest in data science.  The goal of this course is to provide an initial look at a broad range of ‘real world’ data science applications.  Each lecture will provide a background review of current data science methods, followed by an applied example based on current projects at Rho AI.  The case studies will generally focus on engineering problems across the waste, water, and energy industries.  In addition to AI and machine learning, this class will review data infrastructure and, more generally, “DevOps” as a critical component of the successful deployment of AI-based solutions.

Grading and requirements: Overall grades are calculated as follows: 20% class participation, 30% homework, 50% final project.  Attendance is mandatory, as class participation is a critical element of the curriculum (1 letter grade penalty per unexcused absence).  Students will be expected to arrive for each lecture on time and participate in the discussion.  Readings will mostly consist of relevant industry and research papers relating to practical data science applications for mechanical systems.  Readings will be assigned prior to the week’s lecture, and students will be expected to come to class prepared to discuss assigned readings in the context of that day’s lecture material.  Homework will be assigned throughout the semester, and the penalty for late homework will be 10% of the total grade for that assignment per day late.  Each student will be required to complete a final project that is based on the class material, but that takes the class material beyond its introductory stage and reports on a recent application of data science for industrial and/or mechanical system processes.  Projects can be completed individually or in groups of 2-3, and an idea must be formulated early in the semester in coordination with the instructor(s).

Syllabus (scroll to end for guest lecturer bios)

Lecture #1 (1/22/20) -  Introduction to data science, machine learning, and Artificial Intelligence (Dr. Erik Allen)

  • Lecture Goal: Understand what machine learning, data science, and artificial intelligence are, how they relate to topics you might be familiar with, and how they are being applied.
    • What is machine learning?
    • How can you understand it based on current coursework?
    • Techniques / overview
    • Why do we care / why is it hot
  • Required reading: selected overviews of key introductory articles and chapters on the role of AI and machine learning as applied to business today.
  • Homework: run a complete machine learning example focused on the Boston Housing data set. Preliminary analytical approaches will be provided, and students will be encouraged to expand upon these tools to increase the accuracy of the prediction method.

 

Lecture #2 (1/29/20) - Python 101 (Dr. Amber Gold)

  • Lecture Goal: Review the foundations of Python programming within the context of data science
  • Required reading and other prep:
  • Lecture Flow:
    • Why use Python?
    • Different ways to interact with Python
    • Write simple scripts using common data science tools
  • Homework: TBD

 

Lecture #3 (2/5/20) - Databases, databases, and databases (Gilman Callsen)

  • Lecture Goal: provide an appreciation for the wide array of available database "makes and models", along with why their relative strengths/weaknesses are critical to understand in the world of data science.
  • Required reading: selected overviews of key database classes (specifically, the difference between relational and non-relational) and commonly used databases in each category (e.g. MySQL/PostgreSQL, Redis, Elasticsearch, ArangoDB)
  • Lecture Flow:
    • High level overview of relational databases.
      • Describe the core concepts of tables, normalization, etc.
      • Describe the SQL query language
      • Show different examples of relational database options such as MySQL, PostgreSQL, etc.
    • Overview of non-relational databases
      • Describe core concepts, which are far more varied than relational
      • Show examples of different query languages for databases
      • Show two examples of non-relational databases (e.g. Elasticsearch and ArangoDB)
    • Real-world example/demonstration
      • Pit Rho using Postgres for core relational models and Redis for high speed caching during real-time events
    • Interactive example of the steps involved in the database selection process
      • Provide a use case example
      • Walk through selection process with students, showing techniques for weighing pros/cons of different options.
  • Homework: design a system with the most efficient use of databases based on a provided scenario.
  • References / reading materials:

 

Lecture #4 (2/12/20) - Data Pipelines for ML (Alejandro Mesa)

  • Lecture Goal: provide an understanding of how to collect, move, and analyze data for ML.
  • Lecture flow:
    • Overview of methodologies for collecting data (open data vs. proprietary data)
    • Overview of data ingestion pipelines:
      • Ingesting small but specialized data
      • Ingesting big data
      • Ingesting time-series data
      • Real-time vs batch processing
    • Overview of tools for data analysis and manipulation:
      • Jupyter notebooks
      • Data scaling and transformation
      • Plotting for visual analysis
  • Homework:
    • Pick a real-world problem and design a data ingestion pipeline for it

 

Lecture #5 (2/19/20) - Introduction to Neural Networks and Deep Learning (Dr. Vickram Premakumar)

Class Description: 

  • Lecture goal: Introduce the neural network, survey the landscape of specialized architectures and their respective fields of applicability, and open the discussion of ‘deepness’ and why these algorithms have been so successful.
  • Students will be able to (SWBAT): Describe a deep neural network as a series of concepts, highlighting the hierarchical nature of the algorithm.
  • Lecture flow:
    • Expose students’ perception of neural networks/deep learning
    • Early applications and successes of deep learning
    • Fundamentals towards a basic neural network
      • Nonlinearity, logistic regressions, and layering
    • Neural network
      • Stitching together logistic units
      • The costs of model complexity & backpropagation through calculus
    • Why are deep learning models so useful?
      • Review of some theories
      • Leveraging Hierarchy
    • Modern deep learning architectures & applications:
      • How can we ensure translational invariance? Convolutional NN 
      • How can we encode temporal dependencies? Recurrent NN
      • How can we create, rather than predict? Generative Adversarial Networks

Lecture #6 (2/26/20) - Reinforcement learning (Dr. Kevin Lyons)

  • Lecture goal: An overview of the reinforcement learning problem and applications in engineering. This includes tie-ins with techniques from supervised learning, and recent successes with deep neural networks as environment models, value function approximators, and policy estimators.
  • Lecture flow:
    • Brief intro to Markov decision processes (MDPs)
    • Definition of reward, state, actions, value, and policy
    • Modeling the environment
      • Examples
    • Modeling the value function
      • Deep Q learning successes (e.g. Atari games)
    • Policy gradient methods
      • Deep learning policy gradient methods
      • A3C algorithm and AlphaGo
    • Review some specific engineering applications
  • (Possible HW):
    • Use OpenAI Gym and an ML library of choice to get a Q-learning agent working on an Atari game
  • Reference:  Reinforcement Learning: An Introduction (Sutton and Barto)

 

Lecture #7 (3/4/20) - Intersections of data science and software engineering (Gilman Callsen)

  • Lecture Goal: provide a guide for successful collaboration between data scientists and software engineers.
  • Required reading: selected overviews of different workflow and project management methodologies (e.g. waterfall vs agile vs purely exploratory vs ...) to highlight how there are multiple methods available for successfully planning and managing projects - and each has its own timelines, expectations, and paths to the end goal.
  • Lecture Flow:
    • High level overview of what industry expects from “data scientists” vs “software engineers”
    • Examples of why it’s important for data scientists and software engineers to work together and find a common language (e.g. some fun examples of where things went wrong and why)
    • Timeline expectations - examples of typical development cycles between the two competencies (specifically highlighting how expectation setting is important and describing why timeline expectations are different)
    • Natural divisions of labor and how to speak each other’s language
    • Recipes for success - ensuring ML work can make it to a production application and ‘play nice’ in demanding environments (e.g. memory, processor, disk, network concerns)
    • Applied example of weekly cross-domain coordination on the Pit Rho product, covering not only the roles of software developers and data scientists, but also those of analysts and domain experts

 

Lecture #8 (3/11/20) - From Research to Reality (Dr. Jason Vandeventer)

Lecture Goal:  To provide insight into the research world, start-up life, and industry with regard to data science.

Lecture Flow:  Case Studies in Academia, Start-Up, and Industry.

  • Academia:  A deep dive into academic research problems and their solutions
  • Start-Up: Understanding the start-up lifestyle and responsibilities
  • Industry:  The challenges of applying data science in industry

 

Lecture #9 (4/1/20) - Machine learning for chemistry and materials science 1: an overview (Dr. Austin Sendek)

  • Lecture goal: provide an overview of ML applications in materials and chemicals design and discovery, including discussion of successful and unsuccessful efforts
  • Required reading:
  • Lecture flow:
    • Overview of computational materials science techniques and how they have evolved in the last few decades: going from solving the H atom to modern-day Kohn-Sham DFT
    • Frame the problem of materials/chemicals discovery: when intuition is poor, humans resort to guessing. When screening materials there is a speed vs accuracy trade-off - for certain problems you need speed and can sacrifice accuracy (especially when the material space is large)
    • Discuss the ML approach and how it differs from or resembles existing approaches
    • Provide examples, discuss companies doing this
    • Begin discussion of technical issues
  • Homework (one assignment for both lectures): ~1000 word paper discussing how ML could apply to your individual engineering field of interest. Think beyond “how do we answer questions” and delve into “why do we want to answer this question”? This is the domain where ML shines. Highlight cases where the conventional wisdom/approaches have been unsuccessful and cite work when possible. This may help you see your field in a new light.

 

Lecture #10 (4/8/20) - Machine learning for materials science 2: deep dive in batteries (Dr. Austin Sendek)

  • Lecture goal: provide a real life example of how ML can accelerate materials development by discussing Austin’s work in batteries
  • Required reading:
  • Lecture flow:
    • Continue discussion of technical issues around ML: this is a small-data problem, so we must take great care to avoid overfitting
    • Begin with a dive into battery science
    • Discuss Austin’s battery paper to highlight the technical issues around ML for materials
    • End with class discussion of other directions in hard science where this approach may be valuable
  • Homework: Continue writing assignment from previous week

 

Lecture #11 (4/15/20) - Transfer learning and AutoML (Dr. Ekin Cubuk)

  • Lecture goal: Provide an introduction to the importance of transfer learning in industry, science, and engineering, with a focus on applications in image recognition, self-driving cars, and physics models.
  • Lecture flow:
    • Description of transfer learning, why it's useful, how it relates to human reasoning (reading)
      • Successes in image recognition (reading)
      • Example applications in self-driving cars (segmentation) and engineering problems with small datasets (car/airplane model prediction)
      • Example applications in physics and materials science
    • General description of AutoML (meta-learning)
      • Why is it needed, what are recent developments
      • The CNN/RNN loop
      • Applications to hyperparameter tuning, data processing pipelines, and state-of-the-art results in a variety of applications (readings 1 and 2)

 

Lecture #12 (4/22/20) - Applied example, Pit Rho (Andrew Maness)

  • Lecture Goal: Review how data science and machine learning are used for in-race strategy, with an application to NASCAR.
  • Required reading: introductory presentation
  • Lecture Flow:
    • High level overview of NASCAR and NASCAR strategy
    • Overview of how technology drives decision-making in NASCAR
    • The case for a real-time strategy tool
    • Key strategy considerations
    • How it plays out in a race
    • Lessons to be more broadly applied
  • Homework: Continue the homework from week one.

 

Lecture #13 (4/29/20) - [topic TBD] (Dr. Erik Allen)

 

Instructor and Guest Lecturer Bios

Dr. Josh Browne (Course Instructor)

Josh Browne is a founder at Rho AI, a data science start-up. Rho AI has its roots in sports analytics, and has expanded to applying artificial intelligence (AI) to solve impact problems in the waste, water, and energy industries. One of Rho AI’s current projects, “Partner AI,” aims to disrupt the historically inefficient partnering and investment models for bringing clean tech to market. Using natural language processing and deep learning techniques, the Partner AI program will build and maintain a vast real-time network of organizations, people, technologies, and transactions with the goal of rapidly and efficiently connecting technology developers with influential partners.

Dr. Browne received his PhD (2015) in Earth & Environmental Engineering from Columbia University. Dr. Browne’s research, in collaboration with MIT and funded by the U.S. DOE’s ARPA-E program, focused on the development and commercialization of a novel technology aimed at reducing greenhouse gas emissions in the oil & gas extraction process. Prior to his PhD, Dr. Browne spent two decades as a mechanical engineer in professional motorsports. Dr. Browne held a range of engineering roles, culminating in a role as Crew Chief in NASCAR’s top series.

Dr. Browne received a BS in Mechanical Engineering & Engineering Public Policy from Carnegie Mellon University (1993). Dr. Browne also currently teaches the capstone senior mechanical engineering design class, a graduate course in vehicle dynamics, and is the faculty advisor to the University’s Formula SAE team.

 

Dr. Erik Allen (Guest Lecturer)

Dr. Erik Allen is the Chief Scientific Officer for Rho AI, a Data Analytics consulting company serving business partners across a range of industries. He has developed advanced machine learning models for industries as varied as motorsports, baseball, solar energy, grain processing and logistics, real estate, telecom, and oil and gas, among others. He leads a multidisciplinary team of data scientists and programmers to rapidly prototype and develop machine learning models to solve important business problems, including real-time strategy, asset allocation, process optimization, and investment screening.

 

Dr. Ekin Cubuk (Guest Lecturer)

Ekin "Dogus" Cubuk is a research scientist at Google Brain, working on deep learning as well as its applications to the physical sciences. He holds a B.S. in engineering and a B.A. in physics from Swarthmore College. He studied machine learning applications to atomistic systems at Harvard University, where he received his M.A. in physics and PhD in applied physics. Before his current position at Google, he spent a year at Stanford University as a postdoctoral fellow developing machine learning tools for materials design.

 

Gilman Callsen (Guest Lecturer)

Gilman Callsen is an entrepreneur with a penchant for technology startups. He is currently a founder and Chief Technology Officer at Rho AI, a data science start-up. Before Rho AI he was a co-founder of MC10, an electronic materials company turning traditional rigid electronics into flexible and stretchable systems. Prior to MC10, Gilman founded Chromic Decor, a startup focused on energy efficiency based on electrochromic polymer technology from the University of Connecticut. Gilman holds a BA in Psychology from Yale University.

 

Dr. Amber Gold (Guest Lecturer)

Dr. Amber Gold is a Data Scientist at Rho AI, who assists in the development of machine learning models to support innovative software. Dr. Gold graduated from the University of Southern California with a Ph.D. in biomedical engineering, with an emphasis on neural engineering, and a Master’s degree in electrical engineering, specializing in signal processing. Her doctoral research focused on exploring the role of risk in human motor control and behavior. Her research culminated in implementing a visually guided risk-aware reaching robot controlled by a neural network that exemplified many of the results she found in the human studies. Prior to Rho AI, Dr. Gold worked as a Human Factors consultant for litigation and large-scale user studies.

 

Dr. Kyle Jensen (Guest Lecturer) *tentative*

Kyle Jensen is Associate Dean and the Shanna and Eric Bass Director of Entrepreneurial Programs at the Yale School of Management. Kyle is also an entrepreneur, developer, and scientist. Before joining the Yale SOM faculty, he co-founded Agrivida, a venture-backed biotechnology company; PriorSmart, a patent analytics provider (acquired by RPX); and Rho AI, a software development company focused on data science. Kyle worked previously at the non-profit PIPRA, helping universities in developing economies establish technology licensing offices. In addition to teaching, Kyle works with numerous Yale start-ups outside the classroom. His research interests include entrepreneurship, intellectual property, and innovation.

 

Dr. Kevin Lyons (Guest Lecturer)

Kevin Lyons is a machine learning engineer at Rho AI, where he implements algorithms for a wide variety of problems in industry.  He has also worked as a physics consultant on large engineering projects involving ultra-precise optical measurements and radioisotope production.  He received his Ph.D. in Physics from the University of Rochester, and a B.S. in Physics and a B.S. in Astronomy from Stony Brook University.

 

Andrew Maness (Guest Lecturer)

Andrew is a data analyst at Rho AI and is the program manager for “Pit Rho,” Rho AI’s real-time motorsports strategy tool. Previously, he worked as an economist at the Federal Reserve Bank, where he developed financial stress-tests within the Dodd-Frank Act framework. Andrew also co-founded Racingnomics.com, the leading public source for business and economic analysis in the NASCAR industry. Andrew holds BS degrees in Mathematics and Statistics from Kansas State University, as well as MS degrees in Economics and Finance from Wichita State University.

 

Alejandro Mesa (Guest Lecturer)

Alejandro Mesa is the Lead Software Architect at Rho AI, where he leads all aspects of software development across the company. Previously, he founded AxemWorx, a web development company. He also worked at Prometheus Research, where he developed the RexMart platform as a way to analyze medical records. Before that he worked at United Technologies Corporation, where he held multiple positions as Senior Engineer, Project Manager, and Leadership Associate. Alejandro holds a BS degree in Computer Science and Engineering from the University of Connecticut, as well as an MS degree in Management and Technological Innovation from Rensselaer Polytechnic Institute, and an MS degree in Computer Science from Columbia University.

 

Dr. Joel Moxley (Guest Lecturer)

Dr. Moxley is Co-Founder of Foro Energy and Rho AI. He is a Founding Investor and Board Member of Rubicon Global, Zero Mass Water, Pie Insurance, and Fervo Energy, and he is an angel investor in over 30 early stage technology companies including Biota Technology. He actively invests from a small institutional fund into pre-seed, seed, and Series A financing rounds. He is also a member of the Investment Team at Breakthrough Energy Ventures.

Joel is a Precourt Energy Scholar and Adjunct Professor at Stanford University. Joel received his B.S.E. in Chemical Engineering from Princeton University, and his Ph.D. in Chemical Engineering from the Massachusetts Institute of Technology.  https://energy.stanford.edu/people/joel-moxley-0

 

Dr. Vickram Premakumar (Guest Lecturer)

Vickram Premakumar is a Data Scientist at Rho AI creating novel applications of Natural Language Processing in business. He completed a B.S. in Mathematics and a B.A. in Physics at the University of Chicago and a PhD in Theoretical Physics at UW Madison. His main interests are the interface of deep learning and statistical mechanics and the unique challenges presented by raw language data.

 

Dr. Austin Sendek (Guest Lecturer)

Dr. Austin Sendek is an Entrepreneur-in-Residence at Rho AI, where he is founding AIONICS, a Stanford spinout company providing a platform for machine learning-enabled battery design. He received his Ph.D. from the Department of Applied Physics at Stanford in 2018 under Prof. Evan Reed. At Stanford, he was the president of the Stanford Energy Club, a member of the 2016 cohort of the Rising Environmental Leaders Program, and a Distinguished Student Lecturer with the Global Climate and Energy Project. Due to his work with AIONICS he was recently named to Forbes Magazine’s 30 Under 30 in Energy for 2019. He is a native Californian and holds a B.S. in Applied Physics with highest honors from UC Davis.

 

Dr. Jason Vandeventer (Guest Lecturer)

Jason Vandeventer is a Computer Vision and Deep Learning Engineer at Rho AI, who works on a variety of projects, such as bringing real-time visual intelligence to edge-platforms distributed around the world. He received his Ph.D. from Cardiff University, where he developed the world's first 4-Dimensional database of human (facial) dyadic interactions. He is a founding member of Soul Machines (featured in 'The Age of A.I.' series: https://youtu.be/UwsrzCVZAb8), which is a start-up that has created AI/EI digital humans with simulated brains and nervous systems. These digital humans are used in a variety of applications, such as helping those with disabilities and providing better customer services.
