2010 GT Course

Machine Learning for Trading

keywords: algorithmic, algorithms, trading, machine learning, ML, finance, equities, markets, quantitative finance, georgia tech, gt, Tucker Balch

Important Announcements

  • The course for Fall 2011 is listed as CS 7646 and MGT 8803 TSL.
  • QSL students can enroll in either, but there are 30 seats reserved in the MGT 8803 section in early registration.

Course Overview

This course introduces students to the real-world challenges of implementing machine-learning-based trading strategies, including the algorithmic steps from information gathering to market orders. The focus is on how to apply probabilistic machine learning approaches to trading decisions. We consider statistical approaches such as linear regression, KNN, and regression trees, and how to apply them to actual stock trading situations. The course is open to and intended for graduate and upper-level undergraduate students in Computing, ISYE, Math, and Management.
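As a rough illustration of one of the approaches named above, here is a minimal sketch of KNN used for regression: the prediction for a query point is the mean target value of its k nearest training points. The function name `knn_predict`, the toy data, and the choice of Euclidean distance are assumptions made for this example, not the course's reference implementation.

```python
import math

def knn_predict(train_x, train_y, query, k=3):
    """Predict a value for `query` as the mean target of its k nearest training rows."""
    # Rank training rows by Euclidean distance to the query point.
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

# Hypothetical toy data: one feature per row, with a numeric target per row.
train_x = [[0.0], [1.0], [2.0], [3.0], [10.0]]
train_y = [0.0, 1.0, 2.0, 3.0, 10.0]

prediction = knn_predict(train_x, train_y, [1.1], k=3)
print(prediction)
```

In a trading setting the features might be indicator values and the target a future return, but those choices are left open here.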

Prerequisites: Students should have strong coding skills and some familiarity with equity markets. No finance or machine learning experience is assumed. Here's a short test to check whether you have "strong programming skills": [1]. If you don't do well on that test, you should either drop the course or plan to devote extra time to it.

Note that this course serves CS major students with machine learning experience, as well as students in other majors such as ISYE, MGMT, or MATH who have different experiences. All types of students are welcome! The ML topics might be "review" for CS students, while finance parts will be review for finance students. However, even if you have experience in these topics, you will find that we consider them in a different way than you might have seen before, in particular with an eye towards implementation for trading.

Course Logistics

  • Instructor: Associate Professor Tucker Balch
    • Office hours: Tu/Th 4:30-5:30 (after class) or by appointment
    • firstname at cc.gatech.edu
    • phone 678-523-8685
  • TA Vishal Shekhar
    • Office hours: TBD
    • mailvishalshekhar-at-gmail-dot-com
    • phone TBD
  • Time/Location: Tuesday & Thursday 3:05 to 4:30, map: [2]
  • Course Website: http://wiki.quantsoftware.org/wiki/ML4Trading (this webpage)
  • Some parts of the course will be based on readings from this book: Active Portfolio Management by Grinold & Kahn. You will have to complete the readings, but you don't have to buy the book: if you are GT faculty or a GT student, you may be able to read it online [here] at no cost.
  • Prerequisites: Machine learning and portfolio management experience is not assumed; the course is designed to provide students with the necessary background they will need on these topics. Programming will be in the Python language. Students are expected to be strong programmers (or willing to invest significant effort in learning to program in Python).
  • GT T-Square site for the class
  • discussion forum on piazza.com.
  • Resources

Goals For the Course

  • Understand 3 machine learning algorithms and how to apply them to trading problems.
  • Understand how to assess a machine learning algorithm's performance for time series data (stock price data).
  • Know how to construct software to access live equity data, assess it, and make trading decisions.
  • Know how and why data mining (machine learning) techniques fail.
  • Construct a stock trading software system that uses current daily data.
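For the goal about assessing a learner on time series data, one common idea is to train only on earlier data and test on later data, then roll the window forward. The sketch below generates such index splits. The function name `roll_forward_splits`, its parameters, and the fixed-size sliding window are assumptions for illustration; an expanding training window is an equally reasonable variant.

```python
def roll_forward_splits(n_samples, train_size, test_size):
    """Yield (train_indices, test_indices) pairs that move forward in time."""
    start = 0
    while start + train_size + test_size <= n_samples:
        train = list(range(start, start + train_size))
        test = list(range(start + train_size, start + train_size + test_size))
        yield train, test
        # Advance by one test block so test periods never overlap.
        start += test_size

splits = list(roll_forward_splits(n_samples=10, train_size=4, test_size=2))
for train, test in splits:
    print("train", train, "test", test)
```

Every test index comes after every training index in the same split, so the learner is never evaluated on data that precedes what it was trained on.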

Grading

  • Projects: 65%
  • Mid-term exam: 10%
  • Homeworks: 10%
  • Class participation: 5%
  • Final exam: 10%

Plagiarism

In most cases I expect that all code you submit was written by you. I will present some libraries in class that you are allowed to use (such as pandas and numpy). Otherwise, all source code, images, and write-ups you provide should have been created by you alone.

What is allowed:

  • Meeting with other students to discuss implementations. You should talk about solutions at the pseudo code level.
  • Sharing snippets of code to solve specific (small) problems, such as examples of how to address sections of arrays in Python. In this case the shared code should be no more than 5 lines.
  • Searching the web for other solution outlines that you may draw on (but not copy directly). If you are inspired by a solution on the web, you MUST cite that code with comments in your code.
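To make the "sections of arrays" item concrete, here is the kind of small snippet it has in mind. This is only an illustrative sketch on a plain Python list with made-up values; the variable names are hypothetical.

```python
prices = [10.0, 10.5, 10.2, 10.8, 11.0, 10.9]  # hypothetical daily closes

first_three = prices[:3]   # elements at positions 0, 1, 2
last_two = prices[-2:]     # final two elements
middle = prices[1:4]       # positions 1, 2, 3 (the end index is exclusive)
every_other = prices[::2]  # positions 0, 2, 4
```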

What is not allowed:

  • Copying sections of code longer than 5 lines. Note that merely changing variable names does not suffice.
  • Copying code from the web.
  • Use of ideas from the web that are not cited in comments.

Projects & Homework

2010Fall8803 Project 1: Write a Python program to print the first 100 prime numbers. Due at midnight, Thursday Sept 2.

2010Fall8803 Project 2: Connect to our compute server and start processing stock data.

2010Fall8803 Project 3: Mini-project: compute an indicator.

2010Fall8803 Project 4: Implement an Event Profiler.

2010Fall8803 Project 5: Implement KNN in Python.

2010Fall8803 Project 6: Implement a Random Forest learner.

final exam

final exam key

Other Resources

Syllabus

Day Date Lecture Topics Projects Due/Additional Info
August
Tuesday August 23 Class overview
Thursday August 25 lecture 1: Infrastructure of a quant shop. Policy learning, prediction learning. See also: Work_Flow_Guide

Tuesday August 31 lecture 2: Why does information impact stock price? What is a company worth?

September
Thursday September 2 lecture 3: How markets work, details about data, details about dividends. 2010Fall8803_Project_1 is due at midnight.

Monday September 6 ----- Holiday: Labor Day
Tuesday September 7 lecture 4: Introduction to data and initial processing of data for analysis, Part 1

Thursday September 9 lecture 5: Introduction to data and initial processing of data for analysis, Part 2 (with spreadsheet example)

Tuesday September 14 lecture 6: Getting your computing environment set up, and processing data in Python. See these tutorials: Python_tutorial_for_timeseries and Computing environment for ML4Trading.

Thursday September 16 In class example of time series processing
Tuesday September 21 Guest speaker: Prof Jonathan Clarke, "Can analysts surprise the market? Evidence from intraday jumps". 2010Fall8803_Project_2 is due at midnight.
Thursday September 23
Tuesday September 28
Thursday September 30
October
Tuesday October 5
Thursday October 7 John Alberg, Partner, Euclidean Technologies (Euclidean)

Tuesday October 12 Demo of pandas (video demo)

Thursday October 14 The CAPM
Tuesday October 19 ----- Fall Recess
Thursday October 21 The CAPM and Beta
Tuesday October 26
Thursday October 28
November
Tuesday November 2 In depth introduction of KNN. See: wikipedia page on KNN; Andrew Moore's slides on KNN

Thursday November 4
Tuesday November 9 Learner evaluation: Training and Testing data sets. Roll forward cross validation

Thursday November 11 How to implement a Random Forest learner. Cutler paper on random forests. KNN Project released: 2010Fall8803 Project 5
Tuesday November 16 Review KNN project, address questions about KNN, overview of random forest learner project.

Thursday November 18 Final description of random forest learner project. KNN Project due at 11:55 PM: 2010Fall8803 Project 5
Tuesday November 23
Thursday November 25 ----- Holiday: Thanksgiving
Friday November 26 ----- Holiday: Thanksgiving
Tuesday November 30
December
Thursday December 2 Prof Balch on travel
Tuesday December 7
Thursday December 9
Tuesday December 14 Final Exam Week
Thursday December 16 Final Exam Week
Monday December 20 Grades Due
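
The October sessions on the CAPM and beta can be made concrete with a small calculation. Under the CAPM, a stock's beta is the covariance of its returns with the market's returns divided by the variance of the market's returns. The sketch below computes that ratio from two return series. The function name `beta` and the sample numbers are made up for illustration and are not taken from the lectures.

```python
def beta(stock_returns, market_returns):
    """Estimate beta as cov(stock, market) / var(market) using sample statistics."""
    n = len(stock_returns)
    if n != len(market_returns) or n < 2:
        raise ValueError("need two return series of equal length, at least 2 points")
    mean_s = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((s - mean_s) * (m - mean_m)
              for s, m in zip(stock_returns, market_returns)) / (n - 1)
    var_m = sum((m - mean_m) ** 2 for m in market_returns) / (n - 1)
    return cov / var_m

# Hypothetical daily returns: the stock moves twice as much as the market.
market = [0.01, -0.02, 0.03, 0.00]
stock = [2 * m + 0.001 for m in market]
print(beta(stock, market))
```

A constant offset in the stock's returns (the 0.001 here) does not change the estimate, because both covariance and variance are computed around the means.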