2012 GT Course

Machine Learning for Trading 2012

keywords: algorithmic, algorithms, trading, machine learning, ML, finance, equities, markets, quantitative finance, georgia tech, gt, Tucker Balch

Important Announcements

New:

Course Overview

This course introduces students to the real-world challenges of implementing machine-learning-based trading strategies, including the algorithmic steps from information gathering to market orders. The focus is on how to apply probabilistic machine learning approaches to trading decisions. We consider statistical approaches such as linear regression, KNN, and regression trees, and how to apply them to actual stock trading situations.

Who the Course is For

The course is open to and intended for graduate and upper level undergraduate students in Computing, ISYE, Math & Management.

Prerequisites: Students should have strong coding skills and some familiarity with equity markets. No finance or machine learning experience is assumed. Here's a short test to check if you have strong programming skills: quiz. If you don't do well on that quiz, you should either drop the course or plan to devote extra time to it.

Note that this course serves CS majors with machine learning experience, as well as students in other majors such as ISYE, MGMT, or MATH who have different backgrounds. All types of students are welcome! The ML topics might be "review" for CS students, while the finance parts will be review for finance students. However, even if you have experience in these topics, you will find that we consider them in a different way than you might have seen before, in particular with an eye towards implementation for trading.

Student Responsibilities

  • Read the emails sent to the course email list. Check at least daily.
  • Participate in class and via the piazza site.
  • Don't plagiarize.

Course Logistics

  • Instructor: Associate Professor Tucker Balch
    • Office hours: Tu/Th 4:30-5:30 (after class) or by appointment
    • firstname at cc.gatech.edu
    • phone 678-523-8685
  • TA Fang Li
    • Office Hours: Tuesdays 1PM - 3PM
  • TA Jia Zhao
    • Office Hours: Thursdays 10AM - 12 Noon
    • CCB 308
  • TA Sourabh Bajaj
  • Course Website: http://wiki.quantsoftware.org/wiki/ML4Trading. (this webpage)
  • Some parts of the course will be based on readings from this book: Active Portfolio Management by Grinold & Kahn. You will have to complete the readings, but you don't have to buy the book: if you are GT faculty or a GT student, you may be able to read it online [here] at no cost.
  • Prerequisites: Machine learning and portfolio management experience is not assumed; the course is designed to provide students with the necessary background they will need on these topics. Programming will be in the Python language. Students are expected to be strong programmers (or willing to invest significant effort in learning to program in Python).
  • Computing_environment_for_ML4Trading
  • GT T-Square site for the class: t-square is here.
  • discussion forum on piazza.com.
  • Resources

Goals For the Course

By the end of this course, students should be able to:

  • Understand data structures used for algorithmic trading.
  • Know how to construct software to access live equity data, assess it, and make trading decisions.
  • Understand 3 popular machine learning algorithms and how to apply them to trading problems.
  • Understand how to assess a machine learning algorithm's performance for time series data (stock price data).
  • Know how and why data mining (machine learning) techniques fail.
  • Construct a stock trading software system that uses current daily data.

Some limitations/constraints:

  • We use daily data. This is not an HFT course, but many of the concepts here are relevant.
  • We don't interact (trade) directly with the market, but we will generate equity allocations that you could trade if you wanted to.
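
The daily-data constraint suggests one common layout for price data, sketched here with pandas (one of the libraries named in the Plagiarism section): a table indexed by trading date with one column per symbol. The sketch also shows the "fill forward and fill back" step listed in the October schedule. The symbols AAA and BBB, the prices, and the helper name fill_missing_prices are made up for illustration; this is not the course's own code.

```python
import numpy as np
import pandas as pd


def fill_missing_prices(prices: pd.DataFrame) -> pd.DataFrame:
    """Fill gaps forward first, then backward for any gaps left at the start."""
    return prices.ffill().bfill()


# Hypothetical daily closing prices (assumption: invented symbols and values).
dates = pd.date_range("2012-08-20", periods=5, freq="B")  # business days
prices = pd.DataFrame(
    {
        "AAA": [np.nan, 10.0, np.nan, 10.4, 10.5],
        "BBB": [20.0, np.nan, np.nan, 20.6, np.nan],
    },
    index=dates,
)

filled = fill_missing_prices(prices)

# Daily returns computed on the filled prices; the first row has no prior day.
daily_returns = filled.pct_change().iloc[1:]
```

Filling forward first means a gap is patched with the last known price rather than a future one; the backward fill only touches gaps at the very start of a series.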

Grading

  • Projects: 60%
  • Midterm Exam: 10%
  • Homework: 30%

Plagiarism

In most cases I expect that all code you submit was written by you. I will present some libraries in class that you are allowed to use (such as pandas and numpy). Otherwise, all source code, images and write-ups you provide should have been created by you alone.

What is allowed:

  • Meeting with other students to discuss implementations. You should talk about solutions at the pseudo code level.
  • Sharing snippets of code to solve specific (small) problems such as examples of how to address sections of arrays in Python. In this case the shared code should not be more than 5 lines.
  • Searching the web for other solution outlines that you may draw on (but not copy directly). If you are inspired by a solution on the web, you MUST cite that code with comments in your code.

What is not allowed:

  • Copying sections of code longer than 5 lines. Note that merely changing variable names does not suffice.
  • Copying code from the web.
  • Use of ideas from the web that are not cited in comments.

August

DATE ------------------------ TOPICS-------------------------------------------------------------- NOTES---------------------------------------
Tuesday August 21 Class overview
Thursday August 23 The $1T Bet Movie
Tuesday August 28 Mechanics of Stock Prices: the order book

What is HFT?
Tour of a hold-overnight hedge fund

Thursday August 30 Approach: I assume you want to run a hedge fund

Why run a hedge fund?
How to assess a fund?

September

DATE ------------------------ TOPICS-------------------------------------------------------------- NOTES---------------------------------------
Tuesday September 4 Company Value
Thursday September 6 Video: [1]

Cramer's Guide to the Stock Market

Tuesday September 11 Capital Asset Pricing Model

Read: Grinold & Kahn Chapter 2

Thursday September 13 Recap of CAPM

Definition of Beta and example with scatter plots
Discussion of scatter plots (fatness)
Mechanics of shorting
Magic of long/short investing: removing market risk
Relationship of Long/Short to CAPM

Tuesday September 18 Biased coin betting (HW 2 overview)

Workshop on setting up computing environment

Thursday September 20 Find correlated returns (HW 3 overview)

Workshop on reading real stock data (tutorial 1)
Balch away

Tuesday September 25 HW 3 overview

Project 1 overview

Thursday September 27 Big picture for next group of homeworks:

Goal: Software to develop and test your strategies
Hypothesis testing: Does event X affect price?
(Event Studies)
What is the shape of the return curve?
Generate the trades
Test with market simulator
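
The September 13 topics (beta from scatter plots, and long/short investing to remove market risk) can be illustrated with a small numpy sketch. It assumes beta is estimated as the slope of a least-squares line through the scatter of stock returns against market returns. The return series are synthetic, and the function name estimate_beta is hypothetical, not taken from the course materials.

```python
import numpy as np


def estimate_beta(stock_returns, market_returns):
    """Return (beta, alpha): slope and intercept of stock vs. market returns."""
    stock = np.asarray(stock_returns, dtype=float)
    market = np.asarray(market_returns, dtype=float)
    # Degree-1 polyfit returns [slope, intercept].
    beta, alpha = np.polyfit(market, stock, 1)
    return beta, alpha


# Synthetic daily returns (assumption: invented data with a true beta of 1.5).
rng = np.random.default_rng(0)
market = rng.normal(0.0, 0.01, size=250)
stock = 1.5 * market + rng.normal(0.0, 0.002, size=250)

beta, alpha = estimate_beta(stock, market)

# Long the stock and short beta units of the market: the remaining return
# series is, by construction of the fit, uncorrelated with the market.
hedged = stock - beta * market
```

In this toy setting the "fatness" of the scatter plot corresponds to the size of the noise term: the larger the noise, the weaker the correlation with the market for the same beta.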

October

DATE ------------------------ TOPICS-------------------------------------------------------------- NOTES---------------------------------------
Tuesday October 2 Efficient Markets Hypothesis (3 versions)

Technical analysis
Fundamental analysis
A bit about data
Create & plot your own technical indicator
(HW 5 overview)

HW 3, Find correlated stocks Due

HW 4, Optional, Due
Paper of interest: Markets are efficient iff P=NP

Thursday October 4 About Data

Adjusting for Dividends and Splits
Missing Data
Fill forward and fill back
The fundamental law of active portfolio management

Tuesday October 9 Recap of Fundamental Law

Review Project 1
Integrate your technical indicator
with Event Studies (QSTK_Tutorial_9)
(HW 6 overview)

HW 5, Create a Technical Indicator Due
Thursday October 11 Introduction to ML

Regression
Classification
Exam 1 preview
Data we will use for ML

Friday October 12 Drop Day
Tuesday October 16 No Class (Fall Break)

Project 1, Market Simulator Due
Thursday October 18 Exam Prep

Instance-based learning
Andrew Moore's slides [2]
Linear regression
KNN introduction


Tuesday October 23 Exam 1 (30 minutes)

KNN continued
Compare LR and KNN for time series prediction
(Project 2)


Thursday October 25 Recap of KNN

Formulation of problem in context of time series


Tuesday October 30 Recap of HW 6

Recap of approach to KNN (making problem fit KNN)
Overfitting.
Assessing a learning algorithm.

HW 6, Event Study Due
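
The late-October sessions on making a time series problem fit KNN can be pictured with the following sketch. It assumes one simple formulation: each training row holds a few consecutive past values, and the target is the value that follows. The prediction is the mean target of the k nearest rows. The helper names, the sine-wave data, and the parameter choices are all assumptions for illustration, not the Project 2 specification.

```python
import numpy as np


def make_lagged_dataset(series, n_lags):
    """Build (X, y): each X row is n_lags consecutive values, y is the next value."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y


def knn_predict(X_train, y_train, X_query, k=3):
    """Predict each query as the mean target of its k nearest training rows."""
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    X_query = np.atleast_2d(np.asarray(X_query, dtype=float))
    # Euclidean distances between every query row and every training row.
    dists = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dists, axis=1)[:, :k]
    return y_train[nearest].mean(axis=1)


# Synthetic periodic series (assumption: a sine wave, not stock data).
t = np.arange(200)
series = np.sin(2 * np.pi * t / 25)

X, y = make_lagged_dataset(series, n_lags=5)
split = 150  # train on the earlier rows, test on the later ones
pred = knn_predict(X[:split], y[:split], X[split:], k=3)
rmse = np.sqrt(np.mean((pred - y[split:]) ** 2))
```

A clean periodic signal is an easy case because the test windows repeat patterns already seen in training; real price data is far noisier, which is where the overfitting and assessment topics come in.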

November

DATE ------------------------ TOPICS-------------------------------------------------------------- NOTES---------------------------------------
Thursday November 1 Recap of HW 6 (re-doing it!)

Project 2 announced and defined
Simplified to learn simpler "straight" data
Assessing a learning algorithm.

Tuesday November 6 Decision Trees

How to use one if you have one
How to learn one
Seminal paper on trees: Quinlan Paper
Overfitting.
Cross validation
Roll forward cross validation

HW 6, Event Study Due
Thursday November 8 Random Trees

Decision Trees Recap
Data Structure for Decision Trees
Ensemble learners wikipedia
Bagging wikipedia
Boosting wikipedia
Bagging and boosting trees Dietterich
Perfect Ensembles of Random Trees Cutler
Use technical indicator to generate trades
Random forests (Project 3)

Tuesday November 13 Random Forests

Definition of Project 3
Tricks with leaves
NaNs
Regression Leaves

Project 2, KNN due
Thursday November 15 News & Prices (Prof Jonathan Clarke guest talk)
Tuesday November 20 Yes, we will have class on this day

Integrating your learner with a trader (Project 4)

Homework 7 due
Thursday November 22 Thanksgiving (no class)
Tuesday November 27 More on integrating learner with trader (Project 4)
Thursday November 29 Recap of Course Objectives

Class Survey Review
Student Review
Homework 8 (last one) Discussion
Feature selection
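
The "roll forward cross validation" topic from November 6 can be sketched as a split generator in which every training sample comes earlier in time than every test sample. This is an expanding-window variant, which is only one of several possible designs. The function name and parameters are hypothetical.

```python
import numpy as np


def roll_forward_splits(n_samples, n_folds, min_train):
    """Yield (train_idx, test_idx) pairs where training always precedes testing."""
    fold_size = (n_samples - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough samples for the requested number of folds")
    for fold in range(n_folds):
        train_end = min_train + fold * fold_size
        test_end = train_end + fold_size
        # The training window grows each fold; the test window rolls forward.
        yield np.arange(0, train_end), np.arange(train_end, test_end)


# Example: 100 time-ordered samples, 4 folds, at least 20 training samples.
splits = list(roll_forward_splits(n_samples=100, n_folds=4, min_train=20))
```

Unlike ordinary shuffled cross validation, no fold here trains on data from after its test period, so the learner is never evaluated with knowledge of the future.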

December

DATE ------------------------ TOPICS-------------------------------------------------------------- NOTES---------------------------------------
Tuesday December 4 Random Forest Review and ML on Stock Data
Thursday December 6 Guest Lecture: Prof Jonathan Clarke

Project 3, Random Forests Due
Tuesday December 11 Final Exam Week

No Class

Thursday December 13 Final Exam Week

No Class

Project 4, Combine ML & Trading Due

Slide Decks

Homework & Projects

Other Resources

Projects from Past Versions of the Course

These are from last year. We will be changing them for this year.