2010 GT Course
Machine Learning for Trading
keywords: algorithmic, algorithms, trading, machine learning, ML, finance, equities, markets, quantitative finance, georgia tech, gt, Tucker Balch
- The course for Fall 2011 is listed as CS 7646 and MGT 8803 TSL.
- QSL students can enroll in either, but there are 30 seats reserved in the MGT 8803 section in early registration.
- The QSTK installation guide has been updated: QSToolKit_Installation_Guide.
- Setting up your computing environment for ML4Trading
This course introduces students to the real-world challenges of implementing machine-learning-based trading strategies, including the algorithmic steps from information gathering to market orders. The focus is on how to apply probabilistic machine learning approaches to trading decisions. We consider statistical approaches such as linear regression, KNN, and regression trees, and how to apply them to actual stock trading situations. The course is open to and intended for graduate and upper-level undergraduate students in Computing, ISYE, Math & Management.
Prerequisites: Students should have strong coding skills and some familiarity with equity markets. No finance or machine learning experience is assumed. Here's a short test to check if you have "strong programming skills": . If you don't do well on that test, you should either drop the course, or be sure to plan so that you can devote extra time to the course.
Note that this course serves CS majors with machine learning experience as well as students in other majors, such as ISYE, MGMT, or MATH, who have different backgrounds. All types of students are welcome! The ML topics might be "review" for CS students, while the finance parts will be review for finance students. However, even if you have experience in these topics, you will find that we consider them in a different way than you might have seen before, in particular with an eye towards implementation for trading.
- For more detail on topics to be covered see ML4TradingOverview
- Instructor: Associate Professor Tucker Balch
- Office hours: Tu/Th 4:30-5:30 (after class) or by appointment
- firstname at cc.gatech.edu
- phone 678-523-8685
- TA Vishal Shekhar
- Office hours: TBD
- phone TBD
- Time/Location: Tuesday & Thursday 3:05 to 4:30
- Course Website: http://wiki.quantsoftware.org/wiki/ML4Trading (this webpage)
- Some parts of the course will be based on readings from this book: Active Portfolio Management by Grinold & Kahn. You will have to complete the readings, but you don't have to buy the book: if you are a GT faculty member or student, you may be able to read it online [here] at no cost.
- Prerequisites: Machine learning and portfolio management experience is not assumed; the course is designed to provide students with the necessary background they will need on these topics. Programming will be in the Python language. Students are expected to be strong programmers (or willing to invest significant effort in learning to program in Python).
- GT T-Square site for the class
- discussion forum on piazza.com.
Goals For the Course
- Understand 3 machine learning algorithms and how to apply them to trading problems.
- Understand how to assess a machine learning algorithm's performance for time series data (stock price data).
- Know how to construct software to access live equity data, assess it, and make trading decisions.
- Know how and why data mining (machine learning) techniques fail.
- Construct a stock trading software system that uses current daily data.
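One common way to assess a learner on time series data is roll-forward evaluation: train on an earlier window, test on the window that immediately follows, then slide both windows forward so the learner is never tested on data older than its training data. The sketch below is only an illustration of that idea. The function name `roll_forward_splits` and the fixed window sizes are assumptions made for this example, not part of the course materials.

```python
def roll_forward_splits(n_samples, train_size, test_size):
    """Yield (train_indices, test_indices) pairs that move forward in time.

    Hypothetical helper: each training window is followed directly by
    its test window, and both slide forward by test_size each step.
    """
    start = 0
    while start + train_size + test_size <= n_samples:
        train_end = start + train_size
        train_idx = list(range(start, train_end))
        test_idx = list(range(train_end, train_end + test_size))
        yield train_idx, test_idx
        start += test_size


# Example: 10 trading days, train on 4 days, test on the next 2.
for train_idx, test_idx in roll_forward_splits(10, 4, 2):
    print("train:", train_idx, "test:", test_idx)
```

Because every test index is later than every training index in the same split, the evaluation never lets the learner peek at future prices.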
Grading
- Projects: 65%
- Mid-term exam: 10%
- Homeworks: 10%
- Class participation: 5%
- Final exam: 10%
In most cases I expect that all code you submit was written by you. I will present some libraries in class that you are allowed to use (such as pandas and numpy). Otherwise, all source code, images, and write-ups you provide should have been created by you alone.
What is allowed:
- Meeting with other students to discuss implementations. You should talk about solutions at the pseudocode level.
- Sharing snippets of code to solve specific (small) problems, such as examples of how to address sections of arrays in Python. In this case the shared code should be no more than 5 lines.
- Searching the web for other solution outlines that you may draw on (but not copy directly). If you are inspired by a solution on the web, you MUST cite that code with comments in your code.
What is not allowed:
- Copying sections of code longer than 5 lines. Note that merely changing variable names does not suffice.
- Copying code from the web.
- Use of ideas from the web that are not cited in comments.
Projects & Homework
2010Fall8803 Project 1: Write a Python program to print the first 100 prime numbers. Due at midnight, Thursday Sept 2.
2010Fall8803 Project 2: Connect to our compute server and start processing stock data.
2010Fall8803 Project 3: Mini-project: compute an indicator.
2010Fall8803 Project 4: Implement an Event Profiler.
2010Fall8803 Project 5: Implement KNN in Python.
2010Fall8803 Project 6: Implement a Random Forest learner.
- python tutorial for timeseries
- wikipedia page on KNN
- Andrew Moore's slides on KNN
- Cutler paper on random forests
- original paper on how to build a single decision tree
- See also Resources on this wiki.
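To make the KNN idea in the resources above concrete, here is a minimal sketch of KNN used for regression with NumPy. It is an illustration only, not the reference solution for any project. The function name `knn_predict`, the choice of Euclidean distance, and averaging the neighbors' values are all assumptions made for this example.

```python
import numpy as np


def knn_predict(train_x, train_y, query_x, k=3):
    """Predict each query row as the mean y of its k nearest training rows.

    Hypothetical helper: uses Euclidean distance and an unweighted mean.
    """
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y, dtype=float)
    query_x = np.asarray(query_x, dtype=float)

    # Pairwise Euclidean distances, shape (n_query, n_train).
    diffs = query_x[:, np.newaxis, :] - train_x[np.newaxis, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))

    # Indices of the k closest training rows for each query row.
    nearest = np.argsort(dists, axis=1)[:, :k]

    # Average the target values of those neighbors.
    return train_y[nearest].mean(axis=1)


# Toy data: one feature per row, target equal to the feature.
train_x = [[0.0], [1.0], [2.0], [10.0]]
train_y = [0.0, 1.0, 2.0, 10.0]
print(knn_predict(train_x, train_y, [[1.1]], k=3))
```

In a trading setting the rows of `train_x` would typically be indicator values on past days and `train_y` a later return, but that mapping is left open here.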
|Day||Date||Lecture Topics||Projects Due/Additional Info|
|Tuesday||August 23||Class overview|
|Thursday||August 25||lecture 1: Infrastructure of a quant shop. Policy learning, prediction learning.|
|Tuesday||August 31||lecture 2: Why does information impact stock price? What is a company worth?|
|Thursday||September 2||lecture 3: How markets work, details about data, details about dividends||2010Fall8803 Project 1 is due at midnight.|
|Monday||September 6||-----||Holiday: Labor Day|
|Tuesday||September 7||lecture 4: Introduction to data and initial processing of data for analysis Part 1|
|Thursday||September 9||lecture 5: Introduction to data and initial processing of data for analysis Part 2 (with spreadsheet example)|
|Tuesday||September 14||lecture 6: Getting your computing environment set up|
|Thursday||September 16||In class example of time series processing|
|Tuesday||September 21||Guest speaker: Prof Jonathan Clarke||2010Fall8803_Project_2 is due at midnight.|
|Thursday||October 7||John Alberg, Partner, Euclidean Technologies|
|Tuesday||October 12||Demo of pandas.|
|Thursday||October 14||The CAPM|
|Tuesday||October 19||-----||Fall Recess|
|Thursday||October 21||The CAPM and Beta|
|Tuesday||November 2||In depth introduction of KNN|
|Tuesday||November 9||Learner evaluation: training and testing data sets. Roll-forward cross validation|
|Thursday||November 11||How to implement a Random Forest learner||KNN Project released 2010Fall8803 Project 5|
|Tuesday||November 16||Review KNN project, address questions about KNN, overview of random forest learner project.|
|Thursday||November 18||Final description of random forest learner project||KNN Project Due at 11:55 PM 2010Fall8803 Project 5|
|Thursday||November 25||-----||Holiday: Thanksgiving|
|Friday||November 26||-----||Holiday: Thanksgiving|
|Thursday||December 2||Prof Balch on travel|
|Tuesday||December 14||Final Exam Week|
|Thursday||December 16||Final Exam Week|
|Monday||December 20||Grades Due|
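The CAPM and Beta lectures in the schedule above concern a stock's beta, which is commonly estimated as the covariance of the stock's returns with the market's returns divided by the variance of the market's returns. The sketch below shows that calculation on daily prices. The helper names `daily_returns` and `capm_beta`, and the use of simple daily returns, are assumptions made for illustration and are not taken from the course materials.

```python
import numpy as np


def daily_returns(prices):
    """Simple daily returns: price[t] / price[t-1] - 1."""
    prices = np.asarray(prices, dtype=float)
    return prices[1:] / prices[:-1] - 1.0


def capm_beta(stock_prices, market_prices):
    """Estimate beta as cov(stock, market) / var(market) on daily returns."""
    stock_ret = daily_returns(stock_prices)
    market_ret = daily_returns(market_prices)
    # np.cov returns the 2x2 sample covariance matrix of the two series.
    cov = np.cov(stock_ret, market_ret)
    return cov[0, 1] / cov[1, 1]


# Made-up prices for illustration only.
market = [100.0, 101.0, 99.0, 102.0, 102.5]
stock = [50.0, 51.0, 49.0, 52.0, 52.5]
print(capm_beta(stock, market))
```

A beta above 1 suggests the stock has tended to move more than the market over the sample, and a beta below 1 suggests it has moved less.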