ML4Trading
Machine Learning for Trading 2014
Important Announcements
New:
- Please sign up for this course on Piazza.
- Project 1A description 2014Fall7646_Project_1A due Tuesday August 26 at 11:55PM
- Project 1B description 2014Fall7646_Project_1B due Thursday September 4 at 11:55PM
- Project 1C description 2014Fall7646_Project_1C due Tuesday September 16 at 11:55PM
- fin_project1 CompInvestI_Homework_1 due Sunday October 5 at 11:55PM
- Mid-term study guide 2014MidTermGuide
- fin_project2 CompInvestI_Homework_2 due Sunday October 12 at 11:55PM
- all fin_projs here: Computational_Investing_I
- fin_project3 CompInvesti_Homework_3 due Thursday October 23 at 11:55PM
- fin_project4 CompInvesti_Homework_4 due Sunday Nov 2 at 11:55PM
- Project 2 description 2014Fall7646_Project_2 due Tuesday Nov 18 at 11:55PM
- Project 3 description 2014Fall7646_Project_3 due Tuesday Dec 2 at 11:55PM
- Project 4 description 2014Fall7646_Project_4 due Friday Dec 12 at 11:55PM
Course Overview
This course introduces students to the real-world challenges of implementing machine-learning-based trading strategies, including the algorithmic steps from information gathering to market orders. The focus is on how to apply probabilistic machine learning approaches to trading decisions. We consider statistical approaches such as linear regression, KNN, and regression trees, and how to apply them to actual stock trading situations.
- For more detail on topics to be covered see ML4TradingOverview
Who the Course is For
The course is open to and intended for graduate and upper level undergraduate students in Computing, ISYE, Math & Management.
Prerequisites: Students should have strong coding skills and some familiarity with equity markets. No finance or machine learning experience is assumed. Here's a short test to check if you have strong programming skills: quiz. If you don't do well on that quiz, you should either drop the course, or be sure to plan so that you can devote extra time to the course.
Note that this course serves CS major students with machine learning experience, as well as students in other majors such as ISYE, MGMT, or MATH who have different experiences. All types of students are welcome! The ML topics might be "review" for CS students, while finance parts will be review for finance students. However, even if you have experience in these topics, you will find that we consider them in a different way than you might have seen before, in particular with an eye towards implementation for trading.
Student Responsibilities
- Read the emails sent to the course email list. Check at least daily.
- Participate in class and via the piazza site.
- Don't plagiarize.
Course Logistics
- Instructor: Associate Professor Tucker Balch
- Office hours: Tu/Th 4:30-5:30 (after class) or by appointment
- firstname at cc.gatech.edu
- phone 678-523-8685
- TAs: Brian Hrolenok, Alex Moreno, Jayita Bhattacharya
- Course Website: http://wiki.quantsoftware.org/wiki/ML4Trading (this webpage)
- Required textbook: What Hedge Funds Really Do
- Some parts of the course will be based on readings from "Machine Learning: A Probabilistic Perspective" by Kevin Murphy.
- Prerequisites: Machine learning and portfolio management experience is not assumed; the course is designed to provide students with the necessary background they will need on these topics. Programming will be in the Python language. Students are expected to be strong programmers (or willing to invest significant effort in learning to program in Python).
- Computing_environment_for_ML4Trading
- GT T-Square site for the class: [TBD].
- discussion forum on piazza.com.
- Resources
Goals For the Course
By the end of this course, students should be able to:
- Understand data structures used for algorithmic trading.
- Know how to construct software to access live equity data, assess it, and make trading decisions.
- Understand 3 popular machine learning algorithms and how to apply them to trading problems.
- Understand how to assess a machine learning algorithm's performance for time series data (stock price data).
- Know how and why data mining (machine learning) techniques fail.
- Construct a stock trading software system that uses current daily data.
Some limitations/constraints:
- We use daily data. This is not an HFT course, but many of the concepts here are relevant.
- We don't interact (trade) directly with the market, but we will generate equity allocations that you could trade if you wanted to.
Grading
- 70.0%: Projects
  - 15.0%: Text processing: ML Project 1
  - 10.0%: KNN Learner: ML Project 2
  - 15.0%: Random Forest Learner: ML Project 3
  - 15.0%: Price Prediction: ML Project 4
  - 7.5%: fin_proj: Portfolio Optimization Project
  - 7.5%: fin_proj: Market Simulator Project
- 10.0%: Midterm Exam
- 20.0%: Homework and attendance
  - 5.0%: Attendance, taken at random times. One score dropped.
  - 15.0%: Homework: The remaining finance projects, equally weighted
Late Policy
- -5% per day late
Plagiarism
I expect that, in most cases, all code you submit was written by you. I will present some libraries in class that you are allowed to use (such as pandas and numpy). Otherwise, all source code, images, and write-ups you provide should have been created by you alone.
What is allowed:
- Meeting with other students to discuss implementations. You should talk about solutions at the pseudo code level.
- Sharing snippets of code to solve specific (small) problems such as examples of how to address sections of arrays in Python. In this case the shared code should not be more than 5 lines.
- Searching the web for other solution outlines that you may draw on (but not copy directly). If you are inspired by a solution on the web, you MUST cite that code with comments in your code.
What is not allowed:
- Copying sections of code longer than 5 lines. Note that merely changing variable names does not suffice.
- Copying code from the web.
- Use of ideas from the web that are not cited in comments.
Week 1
Tuesday 19 Aug 2014
Course Overview, Project 1 A Assignment
Thursday 21 Aug 2014
Overview of Project 1
N files, leave one out
Instructor will outline one solution in class: Not required that you follow it
Bag of words model
Hashing trick
tf-idf
Required Readings
http://en.wikipedia.org/wiki/Bag-of-words_model
http://en.wikipedia.org/wiki/Tf–idf
http://en.wikipedia.org/wiki/Vector_space_model
http://en.wikipedia.org/wiki/Hashing_trick
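As a rough illustration of the bag-of-words model and the hashing trick covered in these readings, here is a minimal Python sketch. The function names, the tokenizer, the bucket count, and the choice of `zlib.crc32` as the hash are all assumptions made for illustration; they are not the solution outlined in class.

```python
import re
import zlib
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def bag_of_words(text):
    """Map each distinct word to its count; word order is discarded."""
    return Counter(tokenize(text))

def hashed_features(text, n_buckets=16):
    """Hashing trick: bucket index = hash(word) mod n_buckets, so no vocabulary is stored."""
    vec = [0] * n_buckets
    for word in tokenize(text):
        # crc32 is stable across runs, unlike Python's built-in hash() for strings.
        vec[zlib.crc32(word.encode("utf-8")) % n_buckets] += 1
    return vec

doc = "the market went up and the market went down"
bow = bag_of_words(doc)        # word -> count
hashed = hashed_features(doc)  # fixed-length count vector
```

With a small number of buckets, different words can collide in the same slot; that is the price paid for a fixed-length vector that needs no stored word list.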
Week 2
Tuesday 26 August 2014
Project 1A due tonight, questions?
Project 1B definition
tf(t,d) definition
idf(t,D) definition
Project 1C definition
Overall: How would instructor solve the problem
Thursday 28 August 2014
How to solve Project 1B
Words by documents size matrix: word_count, tf, tfidf
Words array: words, idf
How to get words array?
Review of solutions to Project 1A
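The words-by-documents matrices and the words array listed above can be sketched as follows. The exact tf and idf definitions used in the project are not reproduced on this page, so this sketch assumes one common convention: tf(t,d) is the count of t in d divided by the length of d, and idf(t,D) is the natural log of the number of documents over the number of documents containing t. The toy documents are made up.

```python
import math

docs = [
    "stocks rally as earnings beat",
    "bonds fall as stocks rally",
    "earnings season starts",
]
tokens = [d.split() for d in docs]

# Words array: the sorted vocabulary across all documents.
words = sorted({w for doc in tokens for w in doc})

# Words-by-documents matrix of raw counts.
word_count = [[doc.count(w) for doc in tokens] for w in words]

# tf(t, d): count of t in d divided by the number of words in d (assumed convention).
tf = [[word_count[i][j] / len(tokens[j]) for j in range(len(docs))]
      for i in range(len(words))]

# idf(t, D): log(number of documents / number of documents containing t).
idf = [math.log(len(docs) / sum(1 for c in row if c > 0)) for row in word_count]

# tf-idf: each word's tf row scaled by that word's idf.
tfidf = [[tf[i][j] * idf[i] for j in range(len(docs))] for i in range(len(words))]
```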
Week 3
Monday 1 September 2014: GT Holiday
Tuesday 2 September 2014
Review of reference solution to Project 1B
Intro to ML: Parametric, Non-Parametric
How to use ML for trading
Selecting X and Y
Relative change, not absolute price
Market relative change is important
How long/short works: Sort SP500 and long 250, short other 250
Thursday 4 September 2014
How supervised training works for time series data
Roll forward cross validation
Example ML trading strategies? (maybe)
KNN intro from Murphy book.
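One way to picture roll forward cross validation is as a sequence of index splits in which every test block comes strictly after its training data, so the learner never sees the future. The sketch below is an assumption about the general idea, not the exact procedure used in class; `roll_forward_splits` and its parameters are hypothetical names.

```python
def roll_forward_splits(n_samples, n_folds, min_train):
    """Yield (train_indices, test_indices); each test block follows its training data in time."""
    test_size = (n_samples - min_train) // n_folds
    for fold in range(n_folds):
        train_end = min_train + fold * test_size
        test_end = train_end + test_size
        # Train on everything up to train_end, test on the block right after it.
        yield list(range(0, train_end)), list(range(train_end, test_end))

splits = list(roll_forward_splits(n_samples=10, n_folds=3, min_train=4))
for train_idx, test_idx in splits:
    print("train", train_idx, "test", test_idx)
```

Here the training window grows with each fold; a fixed-length sliding window is an equally plausible variant.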
Required Readings
Murphy: Chapter 1
Week 4
Tuesday 9 September 2014
Final assignment/description of Project 1C
Definition of the regression problem
Show example with data
What's the API for a regression learner?
- learner = KNNLearner( k = 3)
- learner.addEvidence( Xtrain, Ytrain)
- Yquery = learner.query(Xtest)
How do we compare the quality of two regression learners?
- Error, RMS
- Correlation with ground truth
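The learner API and the two quality measures above can be sketched together. This is a minimal illustration that assumes Euclidean distance and a plain mean of the k nearest Y values; the helper names `rms_error` and `correlation` and the toy data are hypothetical, and the project may specify details differently.

```python
import numpy as np

class KNNLearner:
    """Minimal k-nearest-neighbor regressor (Euclidean distance, unweighted mean)."""

    def __init__(self, k=3):
        self.k = k
        self.X = None
        self.Y = None

    def addEvidence(self, Xtrain, Ytrain):
        """Store the training data; KNN does no fitting up front."""
        self.X = np.asarray(Xtrain, dtype=float)
        self.Y = np.asarray(Ytrain, dtype=float)

    def query(self, Xtest):
        """Predict each query point as the mean Y of its k nearest training points."""
        Xtest = np.asarray(Xtest, dtype=float)
        # Pairwise distances, shape (n_queries, n_train).
        dists = np.sqrt(((Xtest[:, None, :] - self.X[None, :, :]) ** 2).sum(axis=2))
        nearest = np.argsort(dists, axis=1)[:, :self.k]
        return self.Y[nearest].mean(axis=1)

def rms_error(Ytrue, Ypred):
    """Root-mean-square error between predictions and ground truth."""
    return float(np.sqrt(np.mean((np.asarray(Ytrue) - np.asarray(Ypred)) ** 2)))

def correlation(Ytrue, Ypred):
    """Pearson correlation between predictions and ground truth."""
    return float(np.corrcoef(Ytrue, Ypred)[0, 1])

# Toy data (made up): Y = 2 * x on a 1-D grid.
Xtrain = np.arange(10, dtype=float).reshape(-1, 1)
Ytrain = 2.0 * Xtrain[:, 0]
Xtest = np.array([[2.0], [5.0], [7.0]])
Ytest = 2.0 * Xtest[:, 0]

learner = KNNLearner(k=3)
learner.addEvidence(Xtrain, Ytrain)
Yquery = learner.query(Xtest)
print(rms_error(Ytest, Yquery), correlation(Ytest, Yquery))
```

Lower RMS error and higher correlation both indicate a better learner, but they can disagree: a learner whose predictions are systematically scaled or offset can correlate well and still have a large RMS error.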
Thursday 11 September 2014
Book
Any questions about Project 1C (due next Tuesday)?
Review how to measure similarity of d1 & d2 via cosine
Overview of next few weeks:
- Implement and test KNN learner
- Install QSTK
How to implement KNN
Pop quiz
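For the cosine measure of similarity between documents d1 and d2 reviewed above, a minimal sketch on raw word counts might look like this. Whitespace tokenization and raw counts (rather than tf-idf weights) are simplifying assumptions.

```python
import math
from collections import Counter

def cosine_similarity(d1, d2):
    """Cosine of the angle between the word-count vectors of two documents."""
    v1, v2 = Counter(d1.lower().split()), Counter(d2.lower().split())
    # Only words present in both documents contribute to the dot product.
    dot = sum(v1[w] * v2[w] for w in v1.keys() & v2.keys())
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm1 * norm2)

same = cosine_similarity("buy low sell high", "sell high buy low")
disjoint = cosine_similarity("buy low", "sell high")
partial = cosine_similarity("buy low", "buy high")
```

Because word order is ignored, the first pair scores as identical, the pair with no shared words scores zero, and the pair sharing one of two words falls in between.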
Friday 12 September 2014: Coursera course starts
Required Readings
Andrew Moore's slides on KNN
Week 5
Tuesday 16 September 2014
Review: Parameterized polynomial model
degree of polynomial + 1 = degrees of freedom
In sample error versus degrees of freedom
Out of sample error versus degrees of freedom
Definition of overfitting
Module 11: Overview of Fin part
Module 21: So you want to be a fund manager?
Module 22: Common metrics to assess fund performance
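The overfitting review earlier in this session (in-sample versus out-of-sample error as degrees of freedom grow) can be explored with a small experiment. The target function, noise level, sample sizes, and random seed below are arbitrary assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    """Underlying function the noisy samples are drawn from (made up)."""
    return np.sin(3.0 * x)

x_train = np.linspace(-1.0, 1.0, 15)
x_test = np.linspace(-0.95, 0.95, 15)
y_train = true_f(x_train) + rng.normal(0.0, 0.2, x_train.size)
y_test = true_f(x_test) + rng.normal(0.0, 0.2, x_test.size)

def rms(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

in_sample, out_of_sample = {}, {}
for degree in (1, 3, 7):
    # A degree-d polynomial has d + 1 free coefficients.
    coeffs = np.polyfit(x_train, y_train, degree)
    in_sample[degree] = rms(y_train, np.polyval(coeffs, x_train))
    out_of_sample[degree] = rms(y_test, np.polyval(coeffs, x_test))
    print(degree, round(in_sample[degree], 3), round(out_of_sample[degree], 3))
```

In-sample error can only go down as the degree increases, because each higher-degree model contains the lower-degree ones. Overfitting shows up when out-of-sample error stops improving, or gets worse, while in-sample error keeps falling.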
Thursday 18 September 2014
Module 23: Common metrics, pt 2
Module 24: Excel demo of metrics
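The specific metrics covered in these modules are not listed on this page. As an assumption, the sketch below shows three that are commonly used to assess fund performance: daily returns, cumulative return, and an annualized Sharpe ratio. It uses 252 trading days per year, a zero risk-free rate, and the population standard deviation, all of which are assumed conventions; the prices are made up.

```python
import math
import statistics

def daily_returns(prices):
    """Fractional change from each day's price to the next."""
    return [p1 / p0 - 1.0 for p0, p1 in zip(prices, prices[1:])]

def cumulative_return(prices):
    """Total fractional gain from the first price to the last."""
    return prices[-1] / prices[0] - 1.0

def sharpe_ratio(returns, trading_days=252, daily_risk_free=0.0):
    """Annualized mean excess daily return divided by its standard deviation."""
    excess = [r - daily_risk_free for r in returns]
    return math.sqrt(trading_days) * statistics.mean(excess) / statistics.pstdev(excess)

prices = [100.0, 110.0, 99.0, 108.9]
rets = daily_returns(prices)
cum = cumulative_return(prices)
sharpe = sharpe_ratio(rets)
```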
Required Readings
Murphy 1.4.7 (overfitting)
Chapters 1, 2, 3, 4 in course textbook
Optional Readings
Slide decks
Week 6
Tuesday 23 September 2014
QSTK Installation Workshop
Thursday 25 September 2014
Module 31: Market Mechanics
Module 32: Order Book, Short Selling
Module 33: How Hedge Funds Exploit Market Mechanics
Required Readings
QSToolKit_Installation_Guide
Chapters 1, 2, 3, 4 in course textbook
Optional Readings
Slide decks
Week 7
Tuesday 30 September 2014
Mid-term review
fin_proj1 review
Thursday 2 October 2014
Mid-term exam
Week 8
Tuesday 7 October 2014
Module 34: The computing inside a hedge fund
Module 91: Where does info come from? Arbitrage
Module 92: 3 Versions of EMH
Module 93: Event Studies
fin_proj2
Thursday 9 October 2014
3 ways to assess company value: Book, Market Cap, Intrinsic
Module 41: Intrinsic Value
Module 43: Fundamental analysis of company value
Friday 10 October 2014: Drop Day
Required Readings
Chapters 8, 10, 11 in course textbook
Optional Readings
Slide decks
Week 9
Tuesday 14 October 2014: Fall Break
Thursday 16 October 2014
Module 121: Review Events, and Survivor Bias
How to tell how much money we'd make?
A backtester!
fin_project3: due in one week
Module 43: Fundamental analysis of company value
Back to: 3 ways to assess company value: Book, Market Cap, Intrinsic
Review Intrinsic Value ($1/year for 1M years)
Module 41: Intrinsic Value
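The "$1/year for 1M years" example can be worked through with a discounted cash flow calculation. The 5% discount rate is an assumption picked for illustration; the function names are hypothetical.

```python
def present_value(cash_flow, discount_rate, years):
    """Sum of cash_flow / (1 + r)**t for t = 1..years, via the geometric-series closed form."""
    growth = 1.0 + discount_rate
    return cash_flow * (1.0 - growth ** (-years)) / discount_rate

def perpetuity_value(cash_flow, discount_rate):
    """Limit of present_value as years goes to infinity."""
    return cash_flow / discount_rate

pv_long = present_value(1.0, 0.05, 1_000_000)  # $1/year for one million years
pv_limit = perpetuity_value(1.0, 0.05)         # $1/year forever
print(pv_long, pv_limit)
```

Under this assumed rate, a million years of payments is worth essentially the same as payments forever, because payments far in the future are discounted to almost nothing.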
Required Readings
http://en.wikipedia.org/wiki/Discounted_cash_flow
Optional Readings
Slide decks
Hints on how to build marketsim: media:marketsim-guidelines.pdf
Week 10
Tuesday 21 October 2014
Module 42: How news affects prices
Module 71: Capital Asset Pricing Model
Module 72: What is Beta
Module 73: How Hedge Funds use CAPM
Module 122: Actual vs Adjusted Prices (Dividends & Splits)
Module 123: Data Scrubbing (Checking for Sanity)
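For the CAPM and beta modules above, beta is commonly estimated as the slope of a least-squares line fitting a stock's daily returns against the market's daily returns, with alpha as the intercept. The sketch below assumes that definition; the return series are made up so that the answer is known in advance.

```python
def beta_and_alpha(stock_returns, market_returns):
    """Least-squares slope (beta) and intercept (alpha) of stock vs. market daily returns."""
    n = len(market_returns)
    mean_s = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((s - mean_s) * (m - mean_m)
              for s, m in zip(stock_returns, market_returns)) / n
    var_m = sum((m - mean_m) ** 2 for m in market_returns) / n
    beta = cov / var_m
    alpha = mean_s - beta * mean_m
    return beta, alpha

# Made-up data: the stock moves 1.5x the market plus 0.1% per day.
market = [0.01, -0.02, 0.015, 0.005, -0.01]
stock = [1.5 * m + 0.001 for m in market]
beta, alpha = beta_and_alpha(stock, market)
print(beta, alpha)
```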
Thursday 23 October 2014
Module 112: The Inputs and Outputs of a Portfolio Optimizer
Module 113: The Importance of Correlation and Covariance (in daily returns)
Module 114: The Efficient Frontier
Module 115: How Optimizers Work (In general, not just for portfolios)
fin_proj4 introduction
Required Readings
Chapters 7, 12, 13, 14 in the course textbook
Optional Readings
Slide decks
Week 11
Tuesday 28 October 2014
Module 151: Coin Flipping
Module 152: Fundamental Law Part 1
Module 153: Fundamental Law Part 2
Module 141: CAPM recap, overview for portfolios
Module 142: Example use of CAPM for long/short bet removing market risk
Thursday 30 October 2014
Module 191: Example Information Feeds
Module 192: Intro to Technical Analysis
Module 193: Some Example Technical Indicators
Module 194: Bollinger Bands
introduction to fin_proj5
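Bollinger Bands, listed in Module 194 above, are typically drawn as a rolling mean plus and minus some number of rolling standard deviations. The 20-day window and 2-standard-deviation width below are conventional defaults, assumed here rather than taken from the course material, and the population standard deviation is likewise an assumption.

```python
import statistics

def bollinger_bands(prices, window=20, num_std=2.0):
    """Return (middle, upper, lower) lists; entries are None until a full window exists."""
    middle, upper, lower = [], [], []
    for i in range(len(prices)):
        if i + 1 < window:
            # Not enough history yet for a full window.
            middle.append(None)
            upper.append(None)
            lower.append(None)
            continue
        chunk = prices[i + 1 - window:i + 1]
        mean = statistics.mean(chunk)
        std = statistics.pstdev(chunk)
        middle.append(mean)
        upper.append(mean + num_std * std)
        lower.append(mean - num_std * std)
    return middle, upper, lower

prices = [10.0, 11.0, 12.0, 13.0, 14.0]
middle, upper, lower = bollinger_bands(prices, window=3)
```

A price crossing outside the bands is often read as a candidate trading signal, since it marks a move that is large relative to recent volatility.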
Required Readings
Chapter 9 in the course textbook
Optional Readings
Slide decks
Week 12
Tuesday 4 November 2014
Thursday 6 November 2014
Week 13
Tuesday 11 November 2014
ML Project 2: KNN defined and released
How to build a decision tree, according to Quinlan
Thursday 13 November 2014
Recap how to build a decision tree
Ensemble learning == Decision Forest
Recommended data structure for decision trees
Bagging
Boosting
Random trees (Cutler/PERT) version
Tricks with leaves
NaNs
Regression Leaves
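Bagging, from the ensemble topics above, can be sketched as a wrapper that trains several copies of a base learner on bootstrap samples and averages their predictions. To keep the example short, the base learner here is a hypothetical one-dimensional nearest-neighbor learner rather than a decision tree; `BagLearner`, `OneNNLearner`, and their parameters are illustrative names, not the project's required interface.

```python
import random

class OneNNLearner:
    """Tiny base learner: predict the Y of the single closest training X (1-D inputs)."""

    def addEvidence(self, X, Y):
        self.X, self.Y = list(X), list(Y)

    def query(self, Xq):
        return [self.Y[min(range(len(self.X)), key=lambda i: abs(self.X[i] - x))]
                for x in Xq]

class BagLearner:
    """Train each copy on a bootstrap sample; average the copies' predictions."""

    def __init__(self, make_learner, bags=10, seed=0):
        self.learners = [make_learner() for _ in range(bags)]
        self.rng = random.Random(seed)

    def addEvidence(self, X, Y):
        n = len(X)
        for learner in self.learners:
            # Bootstrap: draw n rows with replacement from the training set.
            idx = [self.rng.randrange(n) for _ in range(n)]
            learner.addEvidence([X[i] for i in idx], [Y[i] for i in idx])

    def query(self, Xq):
        preds = [learner.query(Xq) for learner in self.learners]
        # Average across learners for each query point.
        return [sum(col) / len(col) for col in zip(*preds)]

X = [float(i) for i in range(20)]
Y = [2.0 * x for x in X]
bag = BagLearner(OneNNLearner, bags=25, seed=1)
bag.addEvidence(X, Y)
Yq = bag.query([3.0, 10.5])
```

A random forest follows the same pattern with randomized trees as the base learners.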
Required Readings
Seminal paper on trees: Quinlan Paper
Perfect Ensembles of Random Trees Cutler
Optional Readings
Ensemble learners wikipedia
Bagging wikipedia
Boosting wikipedia
Bagging and boosting trees Dietterich
Section 16.2 of Murphy
Week 14
Tuesday 18 November 2014
Guest Speaker: Prof Soohun Kim
media:Bayesian_Portfolio_Analysis_Practice.zip
Random Forest Assignment
Thursday 20 November 2014
Converting spatial learners to timeseries learners.
Overview of last assignment
- Sine wave part
- Real stock part
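One plausible way to convert a spatial learner into a time series learner is to turn the series into rows of recent values (X) paired with a value some steps in the future (Y), and then train any regression learner on those rows. The lag count, the forecast horizon, and the sine wave parameters below are assumptions for illustration; the assignment may define its features differently.

```python
import math

def make_supervised(series, n_lags=3, horizon=1):
    """Build (X, Y): each X row holds the last n_lags values, Y is the value horizon steps later."""
    X, Y = [], []
    for t in range(n_lags - 1, len(series) - horizon):
        X.append(series[t - n_lags + 1:t + 1])  # values at times t-n_lags+1 .. t
        Y.append(series[t + horizon])           # value to predict
    return X, Y

# Sine wave test series (made-up frequency and length).
series = [math.sin(0.1 * t) for t in range(100)]
X, Y = make_supervised(series, n_lags=3, horizon=5)
```

Rows built this way are ordered in time, so training and test sets should be split by time as well, with the test rows coming after the training rows.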
Week 15
Tuesday 25 November 2014: No Class
Thursday 27 November 2014: Thanksgiving Break
Optional Readings
Slide decks
Week 16 (Dead Week)
Tuesday 2 December 2014
Tips and tricks on how to make the sine wave learner work
Thursday 4 December 2014
Week 17 (Finals Week)
Tuesday 9 December 2014
Thursday 11 December 2014
Old Homeworks & Projects
2013
- 2013Fall7646_Project_1: Build a KNN learner (due Friday November 15 at 11:55PM)
- 2013Fall7646_Project_2: Build a RandomForest learner (due Tuesday December 3 at 11:55PM)
- 2013Fall7646_Project_3: Build a Stock price forecaster (due Friday December 13 at 11:55PM)
2012
- 2012Fall7646 Homework 1: Create a portfolio of 4 equities, and assess it in excel or libre office. Due 11:55PM, September 6.
- 2012Fall7646 Homework 2: Write a Python program to investigate different one day trading methods. Due 11:55PM, September 26.
- 2012Fall7646 Homework 3: Write a Python program to assess real stock data. Due 11:55PM, October 2.
- 2012Fall7646 Homework 4: Optional homework: Assess correlation and Beta. Due 11:55PM, October 2.
- 2012Fall7646_Project_1: Project 1: Build a market simulator. Due 11:55PM, October 16.
- 2012Fall7646 Homework 5: Create an indicator and plot it, Due 11:55, October 9.
- 2012Fall7646 Homework 6: Event Studies Due 11:55, October 30.
- 2012Fall7646_Homework_7: Event Studies generate trades which you feed into market simulator
- 2012Fall7646_Project_2: Build a KNN learner
- 2012Fall7646_Project_3: Build a Random Forest Learner
- 2012Fall7646_Homework_8: Use a learner to forecast returns
2011
- 2011Fall7646 Homework 1: Write a Python program to print the first 100 prime numbers, Due at 11:55, Thursday September 1.
- 2011Fall7646 Homework 2: Get your account set up at gekko.cc.gatech.edu, Due at 11:55, Tuesday September 13.
- 2011Fall7646 Homework 3: Generate some of your own plots, Due at 11:55, Thursday September 15.
- 2011Fall7646 Homework 4: Compute and plot Bollinger bands. Due at 11:55, Monday October 3.
- 2011Fall7646 Project 1: Event studies. Due at 11:55, Tuesday October 25.
- 2011Fall7646 Project 2: KNN Learner. Due at 11:55, Thursday November 3.
- 2011Fall7646 Homework 5: Compare two implementations of KNN. Due at 11:55, Tuesday November 15.
- 2011Fall7646 Homework 6: Classify News Articles. Due at 11:55, Tuesday November 22.
- 2011Fall7646 Project 3: Find the best indicators. Due at 11:55, Wednesday, December 7.
- 2011Fall7646 Project 4: Classify news articles. Due at 11:55, Wednesday December 14.
Other Resources
- Resources
- python tutorial for timeseries
- wikipedia page on KNN
- Andrew Moore's slides on KNN
- Cutler paper on random forests
- original paper on how to build a single decision tree
- Cramer's Guide to the Stock Market
- Markets are efficient iff P=NP
- Andrew Moore's slides
- 2010_GT_Course
- 2011_GT_Course
- 2012_GT_Course
- 2013_GT_Course