From Quantwiki



Machine Learning for Trading 2014

keywords: algorithmic, algorithms, trading, machine learning, ML, finance, equities, markets, quantitative finance, georgia tech, gt, Tucker Balch

Important Announcements


Course Overview

This course introduces students to the real-world challenges of implementing machine-learning-based trading strategies, including the algorithmic steps from information gathering to market orders. The focus is on how to apply probabilistic machine learning approaches to trading decisions. We consider statistical approaches such as linear regression, KNN, and regression trees, and how to apply them to actual stock trading situations.

Who the Course is For

The course is open to and intended for graduate and upper level undergraduate students in Computing, ISYE, Math & Management.

Prerequisites: Students should have strong coding skills and some familiarity with equity markets. No finance or machine learning experience is assumed. Here's a short test to check whether you have strong programming skills: quiz. If you don't do well on that quiz, you should either drop the course or plan to devote extra time to it.

Note that this course serves CS majors with machine learning experience, as well as students in other majors such as ISYE, MGMT, or MATH who have different backgrounds. All types of students are welcome! The ML topics might be "review" for CS students, while the finance parts will be review for finance students. However, even if you have experience in these topics, you will find that we consider them in a different way than you might have seen before, in particular with an eye towards implementation for trading.

Student Responsibilities

  • Read the emails sent to the course email list. Check at least daily.
  • Participate in class and via the Piazza site.
  • Don't plagiarize.

Course Logistics

  • Instructor: Associate Professor Tucker Balch
    • Office hours: Tu/Th 4:30-5:30 (after class) or by appointment
    • firstname at
    • phone 678-523-8685
  • TAs: Brian Hrolenok, Alex Moreno, Jayita Bhattacharya
  • Course Website: (this webpage)
  • Required textbook: What Hedge Funds Really Do
  • Some parts of the course will be based on readings from "Machine Learning: A Probabilistic Perspective" by Kevin Murphy.
  • Prerequisites: Machine learning and portfolio management experience is not assumed; the course is designed to provide students with the necessary background they will need on these topics. Programming will be in the Python language. Students are expected to be strong programmers (or willing to invest significant effort in learning to program in Python).
  • Computing_environment_for_ML4Trading
  • GT T-Square site for the class: [TBD].
  • discussion forum on
  • Resources

Goals For the Course

By the end of this course, students should be able to:

  • Understand data structures used for algorithmic trading.
  • Know how to construct software to access live equity data, assess it, and make trading decisions.
  • Understand 3 popular machine learning algorithms and how to apply them to trading problems.
  • Understand how to assess a machine learning algorithm's performance for time series data (stock price data).
  • Know how and why data mining (machine learning) techniques fail.
  • Construct a stock trading software system that uses current daily data.

Some limitations/constraints:

  • We use daily data. This is not an HFT course, but many of the concepts here are relevant.
  • We don't interact (trade) directly with the market, but we will generate equity allocations that you could trade if you wanted to.


Grading

  • Projects: 70%
  • Midterm Exam: 10%
  • Homework and pop quizzes: 20%

Late Policy

  • -5% per day late


In most cases I expect that all code you submit was written by you. I will present some libraries in class that you are allowed to use (such as pandas and numpy). Otherwise, all source code, images, and write-ups you provide should have been created by you alone.

What is allowed:

  • Meeting with other students to discuss implementations. You should talk about solutions at the pseudocode level.
  • Sharing snippets of code to solve specific (small) problems such as examples of how to address sections of arrays in Python. In this case the shared code should not be more than 5 lines.
  • Searching the web for other solution outlines that you may draw on (but not copy directly). If you are inspired by a solution on the web, you MUST cite that code with comments in your code.

What is not allowed:

  • Copying sections of code longer than 5 lines. Note that merely changing variable names does not suffice.
  • Copying code from the web.
  • Use of ideas from the web that are not cited in comments.

Week 1

Tuesday 19 August 2014
Course Overview, Project 1A Assignment

Thursday 21 August 2014
Overview of Project 1
N files, leave one out
The instructor will outline one solution in class; you are not required to follow it
Bag of words model
Hashing trick
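
The bag of words model and the hashing trick listed above can be sketched together. This is a minimal illustration, not the solution outlined in class: the function names, the tokenizer, and the choice of MD5 as a stable hash are all assumptions made for the example.

```python
import hashlib
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (hypothetical tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def hashed_bag_of_words(text, n_buckets=16):
    """Return a fixed-length count vector in which each word is hashed to a bucket.

    Word order is discarded (bag of words), and no vocabulary is stored
    (hashing trick). Different words may collide in the same bucket.
    """
    counts = [0] * n_buckets
    for word in tokenize(text):
        # MD5 is used only because it gives the same value on every run,
        # unlike Python's built-in hash() for strings.
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        counts[int(digest, 16) % n_buckets] += 1
    return counts


vec = hashed_bag_of_words("buy low sell high buy", n_buckets=8)
```

The vector length depends only on n_buckets, so documents of any size map to the same feature space; the price is that colliding words become indistinguishable.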

Week 2

Tuesday 26 August 2014
Project 1A due tonight, questions?
Project 1B definition
tf(t,d) definition
idf(t,D) definition
Project 1C definition
Overall: how the instructor would solve the problem

Thursday 28 August 2014
How to solve Project 1B
Words-by-documents matrices: word_count, tf, tfidf
Words array: words, idf
How to get words array?
Review of solutions to Project 1A
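
The tf(t,d) and idf(t,D) quantities and the words-by-documents matrices listed above might be built as follows. The exact definitions used in the project are not given on this page, so this sketch assumes one common variant: tf is the count of a term in a document divided by the document length, and idf is log(N / number of documents containing the term). The function name build_tfidf is hypothetical, and documents are assumed to be non-empty token lists.

```python
import math
from collections import Counter


def build_tfidf(docs):
    """docs: list of non-empty token lists.

    Returns words, idf (aligned with words), and three matrices with one
    row per word and one column per document: word_count, tf, tfidf.
    """
    # The words array is the sorted set of all tokens seen in any document.
    words = sorted({w for doc in docs for w in doc})
    n_docs = len(docs)
    # Number of documents in which each word appears at least once.
    doc_freq = Counter(w for doc in docs for w in set(doc))
    idf = [math.log(n_docs / doc_freq[w]) for w in words]

    counts = [Counter(doc) for doc in docs]
    word_count = [[c[w] for c in counts] for w in words]
    tf = [[c[w] / len(doc) for c, doc in zip(counts, docs)] for w in words]
    tfidf = [[value * idf[i] for value in row] for i, row in enumerate(tf)]
    return words, idf, word_count, tf, tfidf


docs = [["buy", "stock", "buy"], ["sell", "stock"]]
words, idf, word_count, tf, tfidf = build_tfidf(docs)
```

Under this variant a word that appears in every document (here "stock") gets an idf of zero, so it contributes nothing to the tfidf matrix.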

Week 3

Monday 1 September 2014: GT Holiday

Tuesday 2 September 2014

Thursday 4 September 2014

Week 4

Tuesday 9 September 2014

Thursday 11 September 2014

Friday 12 September 2014: Coursera course starts

Week 5

Tuesday 16 September 2014

Thursday 18 September 2014

Week 6

Tuesday 23 September 2014

Thursday 25 September 2014

Week 7

Tuesday 30 September 2014: Possible Mid-Term Date

Thursday 2 October 2014: Possible Mid-Term Date

Week 8

Tuesday 7 October 2014

Thursday 9 October 2014

Friday 10 October 2014: Drop Day

Week 9

Tuesday 14 October 2014: Fall Break

Thursday 16 October 2014

Week 10

Tuesday 21 October 2014

Thursday 23 October 2014

Week 11

Tuesday 28 October 2014

Thursday 30 October 2014

Week 12

Tuesday 4 November 2014

Thursday 6 November 2014

Week 13

Tuesday 11 November 2014

Thursday 13 November 2014

Week 14

Tuesday 18 November 2014

Thursday 20 November 2014

Week 15

Tuesday 25 November 2014: No Class

Thursday 27 November 2014: Thanksgiving Break

Week 16 (Dead Week)

Tuesday 2 December 2014

Thursday 4 December 2014

Week 17 (Finals Week)

Tuesday 9 December 2014

Thursday 11 December 2014


DATE | TOPICS | NOTES

In preparation


DATE | TOPICS | NOTES
XXXXXX XXX XX Video: [1]

Cramer's Guide to the Stock Market

XXXXXX XXX XX Efficient Markets Hypothesis (3 versions)

Technical analysis
Fundamental analysis
A bit about data

Paper of interest: Markets are efficient iff P=NP


Instance-based learning
Andrew Moore's slides [2]
Linear regression
KNN introduction


Formulation of problem in context of time series


Recap of approach to KNN (making problem fit KNN)
Assessing a learning algorithm.
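
One way to make a time series problem fit KNN, as the topics above describe, is to turn the price series into (recent returns, next return) pairs and then average the targets of the nearest neighbors. The sketch below is an assumption about the formulation, not the one presented in class; make_windows, knn_predict, and the lookback parameter are hypothetical names.

```python
import math


def make_windows(prices, lookback=2):
    """Turn a price series into (past returns, next return) training pairs."""
    returns = [b / a - 1.0 for a, b in zip(prices, prices[1:])]
    n_pairs = len(returns) - lookback
    xs = [returns[i:i + lookback] for i in range(n_pairs)]
    ys = [returns[i + lookback] for i in range(n_pairs)]
    return xs, ys


def knn_predict(train_x, train_y, query, k=3):
    """Predict by averaging the targets of the k training points nearest to query."""
    # Euclidean distance from every training point to the query.
    by_distance = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )
    nearest = by_distance[:k]
    return sum(y for _, y in nearest) / len(nearest)


xs, ys = make_windows([1.0, 2.0, 4.0, 8.0], lookback=2)
prediction = knn_predict([[0.0], [1.0], [10.0]], [0.0, 2.0, 100.0], [0.4], k=2)
```

KNN stores the training data rather than fitting parameters, so the cost is paid at query time; with daily data that is usually acceptable.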


DATE | TOPICS | NOTES

Simplified to learn simpler "straight" data
Assessing a learning algorithm.

XXXXXX XXX XX Decision Trees

How to use one if you have one
How to learn one
Seminal paper on trees: Quinlan Paper
Cross validation
Roll forward cross validation
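
Roll forward cross validation keeps every training window strictly earlier in time than its test window, so the learner is never assessed on data that precedes what it was trained on. The page does not specify the exact scheme, so the sliding-window variant below, along with the name roll_forward_splits and its parameters, is an assumption.

```python
def roll_forward_splits(n_samples, train_size, test_size):
    """Yield (train_indices, test_indices) pairs with training before testing.

    The window slides forward by test_size each step, so test windows
    do not overlap one another.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train_end = start + train_size
        train = list(range(start, train_end))
        test = list(range(train_end, train_end + test_size))
        yield train, test
        start += test_size


splits = list(roll_forward_splits(10, train_size=4, test_size=2))
```

Ordinary k-fold cross validation shuffles or interleaves samples, which lets future prices leak into training; rolling forward avoids that at the cost of fewer usable splits.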

XXXXXX XXX XX Random Trees

Decision Trees Recap
Data Structure for Decision Trees
Ensemble learners wikipedia
Bagging wikipedia
Boosting wikipedia
Bagging and boosting trees Dietterich
Perfect Random Tree Ensembles (PERT) Cutler
Use technical indicator to generate trades
Random forests (Project 3)
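
Bagging, one of the ensemble methods listed above, trains each learner on a bootstrap sample (rows drawn with replacement) and averages the predictions. The sketch below shows only that mechanism. The base learner is a toy stand-in that predicts the mean target, used here in place of a tree, and all function names are hypothetical.

```python
import random


def bootstrap_sample(xs, ys, rng):
    """Draw len(xs) rows with replacement from the training data."""
    idx = [rng.randrange(len(xs)) for _ in xs]
    return [xs[i] for i in idx], [ys[i] for i in idx]


def fit_mean_learner(xs, ys):
    """Toy base learner (stand-in for a tree): always predicts the mean of ys."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_bagged(xs, ys, fit, n_bags=10, seed=0):
    """Fit n_bags learners on bootstrap samples and average their predictions."""
    rng = random.Random(seed)
    models = [fit(*bootstrap_sample(xs, ys, rng)) for _ in range(n_bags)]
    return lambda x: sum(m(x) for m in models) / len(models)


model = fit_bagged([[1], [2], [3]], [1.0, 2.0, 6.0], fit_mean_learner)
prediction = model([2])
```

Any function with the same (xs, ys) to predictor shape could be passed as fit, which is how a tree learner would slot in.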

XXXXXX XXX XX Random Forests

Definition of RF Project
Tricks with leaves
Regression Leaves

XXXXXX XXX XX Yes we will have class on this day

Integrating your learner with a trader Project

XXXXXX XXX XX Thanksgiving (no class)
XXXXXX XXX XX More on integrating learner with trader
XXXXXX XXX XX Recap of Course Objectives

Class Survey Review
Student Review
Feature selection


DATE | TOPICS | NOTES
XXXXXX XXX XX Random Forest Review and ML on Stock Data
XXXXXX XXX XX Project Random Forests Due
XXXXXX XXX XX Final Exam Week

No Class

XXXXXX XXX XX Final Exam Week

No Class

Combine ML & Trading Due

Slide Decks

Old Homework & Projects From 2013

Old Homework & Projects From 2012

Other Resources

Projects from Past Versions of the Course

These are from last year. We will be changing them for this year.
