ML4Trading
Machine Learning for Trading 2012
keywords: algorithmic, algorithms, trading, machine learning, ML, finance, equities, markets, quantitative finance, georgia tech, gt, Tucker Balch
Important Announcements
New:
- Project 3 is posted. 2012Fall7646_Project_3
- Homework 7 is posted. 2012Fall7646_Homework_7
- Homework 8 is posted. 2012Fall7646_Homework_8
- Required readings:
Course Overview
This course introduces students to the real-world challenges of implementing machine-learning-based trading strategies, including the algorithmic steps from information gathering to market orders. The focus is on how to apply probabilistic machine learning approaches to trading decisions. We consider statistical approaches such as linear regression, KNN, and regression trees, and how to apply them to actual stock trading situations.
- For more detail on topics to be covered see ML4TradingOverview
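As a rough illustration of one of the approaches named above, here is a minimal sketch of KNN regression in plain Python. It is not the course's reference implementation: the function name `knn_predict`, the two-number feature vectors, and the return values are all hypothetical and chosen only for illustration.

```python
import math


def knn_predict(train_x, train_y, query, k=3):
    """Predict a value for `query` as the mean target of its k nearest training points."""
    if len(train_x) != len(train_y):
        raise ValueError("train_x and train_y must have the same length")
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Euclidean distance from the query to every training point.
    distances = [(math.dist(x, query), y) for x, y in zip(train_x, train_y)]
    # Sort by distance only, then keep the k closest neighbours.
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    # For regression, the prediction is the average target of the neighbours.
    return sum(y for _, y in nearest) / k


# Hypothetical training data: each x is a pair of made-up indicator values,
# each y is the (made-up) return that followed.
train_x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
train_y = [0.01, 0.02, 0.03, -0.10]

prediction = knn_predict(train_x, train_y, (0.1, 0.1), k=3)
print("predicted return:", prediction)
```

The query point sits close to the first three training points, so the distant fourth point (with its very different return) does not influence the prediction when `k=3`.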
Who the Course is For
The course is open to and intended for graduate and upper-level undergraduate students in Computing, ISYE, Math & Management.
Prerequisites: Students should have strong coding skills and some familiarity with equity markets. No finance or machine learning experience is assumed. Here's a short test to check if you have strong programming skills: quiz. If you don't do well on that quiz, you should either drop the course or plan to devote extra time to it.
Note that this course serves CS major students with machine learning experience, as well as students in other majors such as ISYE, MGMT, or MATH who have different experiences. All types of students are welcome! The ML topics might be "review" for CS students, while finance parts will be review for finance students. However, even if you have experience in these topics, you will find that we consider them in a different way than you might have seen before, in particular with an eye towards implementation for trading.
Student Responsibilities
- Read the emails sent to the course email list. Check at least daily.
- Participate in class and via the piazza site.
- Don't plagiarize.
Course Logistics
- Instructor: Associate Professor Tucker Balch
- Office hours: Tu/Th 4:30-5:30 (after class) or by appointment
- firstname at cc.gatech.edu
- phone 678-523-8685
- TA Fang Li
- Office Hours: Tuesdays 1PM - 3PM
- TA Jia Zhao
- Office Hours: Thursdays 10AM - 12 Noon
- CCB 308
- TA Sourabh Bajaj
- Course Website: http://wiki.quantsoftware.org/wiki/ML4Trading. (this webpage)
- Some parts of the course will be based on readings from this book: Active Portfolio Management by Grinold & Kahn. You will have to complete the readings, but you don't have to buy the book: if you are a GT faculty member or student, you may be able to read the book online [here] at no cost.
- Prerequisites: Machine learning and portfolio management experience is not assumed; the course is designed to provide students with the necessary background they will need on these topics. Programming will be in the Python language. Students are expected to be strong programmers (or willing to invest significant effort in learning to program in Python).
- Computing_environment_for_ML4Trading
- GT T-Square site for the class: t-square is here.
- discussion forum on piazza.com.
- Resources
Goals For the Course
By the end of this course, students should be able to:
- Understand data structures used for algorithmic trading.
- Know how to construct software to access live equity data, assess it, and make trading decisions.
- Understand 3 popular machine learning algorithms and how to apply them to trading problems.
- Understand how to assess a machine learning algorithm's performance for time series data (stock price data).
- Know how and why data mining (machine learning) techniques fail.
- Construct a stock trading software system that uses current daily data.
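One point behind the goal of assessing a learner on time series data is that the data should be split chronologically rather than shuffled, so the model is never tested on days that precede its training data. The sketch below shows that idea with a simple "tomorrow equals today" baseline scored by RMSE. The function names, the price list, and the choice of RMSE are illustrative assumptions, not the course's prescribed method.

```python
import math


def chronological_split(series, train_fraction=0.6):
    """Split a time-ordered list into (train, test) without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


def rmse(predicted, actual):
    """Root mean squared error between two equal-length, non-empty lists."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical daily closing prices, oldest first.
prices = [10.0, 10.5, 10.2, 10.8, 11.0, 10.9, 11.3, 11.1, 11.6, 11.4]
train, test = chronological_split(prices, train_fraction=0.5)

# Baseline forecast: each day's price is predicted to equal the previous day's.
# Every prediction uses only prices observed before the day being predicted.
baseline = [train[-1]] + test[:-1]
print("baseline RMSE:", rmse(baseline, test))
```

Any learner evaluated on the same test period can be compared against this baseline; a model that cannot beat it has not learned anything useful about the series.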
Some limitations/constraints:
- We use daily data. This is not an HFT course, but many of the concepts here are relevant.
- We don't interact (trade) directly with the market, but we will generate equity allocations that you could trade if you wanted to.
Grading
- Projects: 60%
- Midterm Exam: 10%
- Homework: 30%
Plagiarism
In most cases I expect that all code you submit was written by you. I will present some libraries in class that you are allowed to use (such as pandas and numpy). Otherwise, all source code, images and write ups you provide should have been created by you alone.
What is allowed:
- Meeting with other students to discuss implementations. You should talk about solutions at the pseudo code level.
- Sharing snippets of code to solve specific (small) problems such as examples of how to address sections of arrays in Python. In this case the shared code should not be more than 5 lines.
- Searching the web for other solution outlines that you may draw on (but not copy directly). If you are inspired by a solution on the web, you MUST cite that code with comments in your code.
What is not allowed:
- Copying sections of code longer than 5 lines. Note that merely changing variable names does not suffice.
- Copying code from the web.
- Use of ideas from the web that are not cited in comments.
August
| DAY | DATE | TOPICS | NOTES |
| Tuesday | August 21 | Class overview | |
| Thursday | August 23 | The $1T Bet Movie | |
| Tuesday | August 28 | Mechanics of Stock Prices: the order book; What is HFT? | |
| Thursday | August 30 | Approach: I assume you want to run a hedge fund; Why run a hedge fund? | |
September
| DAY | DATE | TOPICS | NOTES |
| Tuesday | September 4 | Company Value | |
| Thursday | September 6 | Video: Cramer's Guide to the Stock Market | |
| Tuesday | September 11 | Capital Asset Pricing Model; Read: Grinold & Kahn Chapter 2 | |
| Thursday | September 13 | Recap of CAPM; Definition of Beta and example with scatter plots | |
| Tuesday | September 18 | Biased coin betting (HW 2 overview); Workshop on setting up computing environment | |
| Thursday | September 20 | Find correlated returns (HW 3 overview); Workshop on reading real stock data (tutorial 1) | |
| Tuesday | September 25 | HW 3 overview; Project 1 overview | |
| Thursday | September 27 | Big picture for next group of homeworks: Goal: Software to develop and test your strategies | |
October
| DAY | DATE | TOPICS | NOTES |
| Tuesday | October 2 | Efficient Markets Hypothesis (3 versions); Technical analysis | HW 3, Find correlated stocks, Due; HW 4, Optional, Due |
| Thursday | October 4 | About Data; Adjusting for Dividends and Splits | |
| Tuesday | October 9 | Recap of Fundamental Law; Review Project 1 | HW 5, Create a Technical Indicator, Due |
| Thursday | October 11 | Introduction to ML Regression | |
| Friday | October 12 | Drop Day | |
| Tuesday | October 16 | No Class (Fall Break) | Project 1, Market Simulator, Due |
| Thursday | October 18 | Exam Prep; Instance-based learning | |
| Tuesday | October 23 | Exam 1 (30 minutes); KNN continued | |
| Thursday | October 25 | Recap of KNN; Formulation of problem in context of time series | |
| Tuesday | October 30 | Recap of HW 6; Recap of approach to KNN (making problem fit KNN) | HW 6, Event Study, Due |
November
| DAY | DATE | TOPICS | NOTES |
| Thursday | November 1 | Recap of HW 6 (re-doing it!); Project 2 announced and defined | |
| Tuesday | November 6 | Decision Trees; How to use one if you have one | HW 6, Event Study, Due |
| Thursday | November 8 | Random Trees; Decision Trees Recap | |
| Tuesday | November 13 | Random Forests; Definition of Project 3 | Project 2, KNN, Due |
| Thursday | November 15 | News & Prices (Prof Jonathan Clarke guest talk) | |
| Tuesday | November 20 | Yes, we will have class on this day; Integrating your learner with a trader (Project 4) | Homework 7 Due |
| Thursday | November 22 | Thanksgiving (no class) | |
| Tuesday | November 27 | More on integrating learner with trader (Project 4) | |
| Thursday | November 29 | Recap of Course Objectives Class Survey Review | |
December
| DAY | DATE | TOPICS | NOTES |
| Tuesday | December 4 | Random Forest Review and ML on Stock Data | |
| Thursday | December 6 | Guest Lecture: Prof Jonathan Clarke | Project 3, Random Forests, Due |
| Tuesday | December 11 | Final Exam Week; No Class | |
| Thursday | December 13 | Final Exam Week; No Class | Project 4, Combine ML & Trading, Due |
Slide Decks
Homework & Projects
- 2012Fall7646 Homework 1: Create a portfolio of 4 equities, and assess it in Excel or LibreOffice. Due 11:55PM, September 6.
- 2012Fall7646 Homework 2: Write a Python program to investigate different one day trading methods. Due 11:55PM, September 26.
- 2012Fall7646 Homework 3: Write a Python program to assess real stock data. Due 11:55PM, October 2.
- 2012Fall7646 Homework 4: Optional homework: Assess correlation and Beta. Due 11:55PM, October 2.
- 2012Fall7646_Project_1: Project 1: Build a market simulator. Due 11:55PM, October 16.
- 2012Fall7646 Homework 5: Create an indicator and plot it. Due 11:55, October 9.
- 2012Fall7646 Homework 6: Event Studies. Due 11:55, October 30.
- 2012Fall7646_Homework_7: Event studies generate trades, which you feed into the market simulator
- 2012Fall7646_Project_2: Build a KNN learner
- 2012Fall7646_Project_3: Build a Random Forest Learner
- 2012Fall7646_Homework_8: Use a learner to forecast returns
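Several of the assignments above work with returns, correlation, and Beta. As a hedged sketch of the arithmetic involved (not the homework solution), the code below computes daily returns from prices, then correlation and Beta using the common definition of Beta as the covariance of stock and market returns divided by the variance of market returns. All function names and numbers are made up for illustration.

```python
import math


def daily_returns(prices):
    """Fractional change from each day to the next: p[t] / p[t-1] - 1."""
    return [today / yesterday - 1.0 for yesterday, today in zip(prices, prices[1:])]


def covariance(xs, ys):
    """Sample covariance of two equal-length lists (at least 2 points)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Pearson correlation coefficient."""
    denom = math.sqrt(covariance(xs, xs) * covariance(ys, ys))
    if denom == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance(xs, ys) / denom


def beta(stock_returns, market_returns):
    """Beta = cov(stock, market) / var(market)."""
    market_variance = covariance(market_returns, market_returns)
    if market_variance == 0.0:
        raise ValueError("beta is undefined when market returns are constant")
    return covariance(stock_returns, market_returns) / market_variance


# Hypothetical market returns, and a stock that moves exactly twice as much.
market = [0.01, -0.02, 0.03, 0.00]
stock = [2.0 * r for r in market]

print("beta:", beta(stock, market))
print("correlation:", correlation(stock, market))
```

In this toy case the stock is a scaled copy of the market, so Beta comes out at about 2 and the correlation at about 1; real return series will be much noisier.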
Other Resources
- python tutorial for timeseries
- wikipedia page on KNN
- Andrew Moore's slides on KNN
- Cutler paper on random forests
- original paper on how to build a single decision tree
- See also Resources on this wiki.
- 2010_GT_Course
- 2011_GT_Course
Projects from Past Versions of the Course
These are from last year. We will be changing them for this year.
- 2011Fall7646 Homework 1: Write a Python program to print the first 100 prime numbers, Due at 11:55, Thursday September 1.
- Reading assignment: Chapters 1 & 2 of Grinold & Kahn textbook. Due before Thursday class, September 1.
- 2011Fall7646 Homework 2: Get your account set up at gekko.cc.gatech.edu, Due at 11:55, Tuesday September 13.
- 2011Fall7646 Homework 3: Generate some of your own plots, Due at 11:55, Thursday September 15.
- 2011Fall7646 Homework 4: Compute and plot Bollinger bands. Due at 11:55, Monday October 3.
- 2011Fall7646 Project 1: Event studies. Due at 11:55, Tuesday October 25.
- 2011Fall7646 Project 2: KNN Learner. Due at 11:55, Thursday November 3.
- 2011Fall7646 Homework 5: Compare two implementations of KNN. Due at 11:55, Tuesday November 15.
- 2011Fall7646 Homework 6: Classify News Articles. Due at 11:55, Tuesday November 22.
- 2011Fall7646 Project 3: Find the best indicators. Due at 11:55, Wednesday, December 7.
- 2011Fall7646 Project 4: Classify news articles. Due at 11:55, Wednesday December 14.
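One of the assignments listed above asks for Bollinger bands. As a rough sketch only (the assignment's exact specification is not given here), the bands are commonly defined as a rolling mean plus and minus a multiple of the rolling standard deviation. The window length of 20, the multiplier of 2, and the use of the population standard deviation are conventional defaults assumed for this example, and `bollinger_bands` is a hypothetical name.

```python
import statistics


def bollinger_bands(prices, window=20, num_std=2.0):
    """Return a list of (lower, middle, upper) tuples, one per full window.

    The first tuple corresponds to the window ending at index window - 1.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    bands = []
    for end in range(window, len(prices) + 1):
        recent = prices[end - window:end]
        middle = statistics.fmean(recent)  # rolling mean
        spread = num_std * statistics.pstdev(recent)  # population std (assumed)
        bands.append((middle - spread, middle, middle + spread))
    return bands


# Hypothetical prices with a short window so the output is easy to inspect.
prices = [1.0, 2.0, 3.0, 4.0, 5.0]
for lower, middle, upper in bollinger_bands(prices, window=3):
    print(f"{lower:.3f} {middle:.3f} {upper:.3f}")
```

A price that moves outside the upper or lower band is unusually far from its recent average, which is what makes the bands usable as a simple technical indicator.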