2011Fall7646 Project 3
You are to implement a KNN learner that can accept a variable number of features. You should then use that learner to evaluate the effectiveness of each feature. Use the kdtknn learner (for speed).
The deliverables for this project are the code that implements the above learner and evaluator (evaluate.py) and the output text from your program (output.txt), which concludes with a listing of the "best" set of features.
You can find a program that provides a framework for this in QSTK/Examples/Features/featuretest.py
Use the following 9 features. They are coded in the file QSTK/qstkfeat/features.py
0. 10 day moving average: featMA(dfPrice, lLookback=10, bRel=True)
1. 20 day moving average: featMA(dfPrice, lLookback=20)
2. 10 day RSI: featRSI(dfPrice, lLookback=10)
3. 20 day RSI: featRSI(dfPrice, lLookback=20)
4. draw down: featDrawDown(dfPrice)
5. run up: featRunUp(dfPrice)
6. 10 day volume: featVolumeDelta(dfVolume, lLookback= 10)
7. 20 day volume: featVolumeDelta(dfVolume, lLookback = 20)
8. Aroon: featAroon(dfPrice, bDown=False)
You will also have to populate your matrix with:
9. 5 day return: classFutRet(dfPrice, lLookforward = 5). (not really a feature, this is the value to be trained/learned).
You should create a list of the current constituents of the DOW 30 (this is available online). Save those symbols in a file of your choosing and use it as input to your program. Due to difficulties with the data, we will not be using the SP500 for this project. You should use Jan 1 2007 to Dec 31 2009 for training. You should use Jan 1 2010 to June 30 2010 for testing.
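As a starting point, the symbol file and the two date ranges could be set up along these lines. This is only a sketch: the one-symbol-per-line file layout, the '#' comment convention, and the names parse_symbols, read_symbols and the four date constants are assumptions made for illustration, and the training end year is taken to be 2009.

```python
import datetime as dt

# Date ranges from the assignment text (training end year assumed to be 2009).
TRAIN_START = dt.datetime(2007, 1, 1)
TRAIN_END = dt.datetime(2009, 12, 31)
TEST_START = dt.datetime(2010, 1, 1)
TEST_END = dt.datetime(2010, 6, 30)


def parse_symbols(lines):
    """Return stripped, non-empty lines, skipping lines that start with '#'.

    Assumes one ticker symbol per line (a hypothetical file layout).
    """
    stripped = (line.strip() for line in lines)
    return [line for line in stripped if line and not line.startswith("#")]


def read_symbols(path):
    """Read a symbol file from disk and parse it with parse_symbols."""
    with open(path) as handle:
        return parse_symbols(handle)
```

The resulting symbol list and date constants would then be handed to whatever data-access code your program uses.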
Create a learner that can accept a list of 1 to 10 features (see list of features above). Train your system to predict 5 day returns. You are to find the set of features that maximizes the correlation coefficient between actual 5 day returns and predicted 5 day returns for the testing set. Use K=5.
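To make the scoring step concrete, here is a minimal sketch of how one feature set might be scored. It uses a brute-force NumPy KNN as a stand-in for kdtknn, and it assumes that a prediction is the mean of the K nearest neighbors' targets; both are assumptions, as are the names knn_predict and score_feature_set. The data matrices are assumed to hold one column per feature, with the 5 day return in the column named by the last entry of the feature list.

```python
import numpy as np


def knn_predict(train_x, train_y, query_x, k=5):
    """Predict each query row as the mean target of its k nearest training rows.

    Brute-force stand-in for kdtknn; mean aggregation is an assumption.
    """
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y, dtype=float)
    query_x = np.asarray(query_x, dtype=float)
    # Pairwise squared Euclidean distances, shape (n_query, n_train).
    diffs = query_x[:, np.newaxis, :] - train_x[np.newaxis, :, :]
    dists = (diffs ** 2).sum(axis=2)
    # Indices of the k smallest distances in each row.
    nearest = np.argsort(dists, axis=1)[:, :k]
    return train_y[nearest].mean(axis=1)


def score_feature_set(train, test, columns, k=5):
    """Correlation between actual and predicted targets for one feature set.

    `columns` lists feature indices followed by the target index as its
    last entry, e.g. [0, 9].
    """
    feature_cols, target_col = list(columns[:-1]), columns[-1]
    predicted = knn_predict(train[:, feature_cols], train[:, target_col],
                            test[:, feature_cols], k=k)
    # Off-diagonal entry of the 2x2 correlation matrix.
    return np.corrcoef(test[:, target_col], predicted)[0, 1]
```

The pairwise distance array grows with the product of the training and testing set sizes, so this version is only suitable for small experiments; the real runs should use kdtknn as instructed.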
Find the best set greedily: first find the single best feature, then iteratively add, one at a time, whichever remaining feature most improves the correlation coefficient.
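The greedy search can be sketched as follows. The function name forward_select and the `score` callable are hypothetical; `score` is assumed to take a feature list ending in the target index (e.g. [8, 6, 9]) and return a correlation coefficient. Stopping as soon as no remaining feature improves the best correlation is also an assumption about the intended stopping rule.

```python
def forward_select(candidates, target, score):
    """Greedy forward selection over feature indices.

    candidates: iterable of feature indices, e.g. range(9)
    target:     index of the value being learned, e.g. 9
    score:      hypothetical callable mapping a feature list to a
                correlation coefficient
    Returns (best_feature_list, best_corr).
    """
    remaining = list(candidates)
    chosen = []
    best_corr = float("-inf")
    while remaining:
        # Score every one-feature extension of the current set.
        trials = [(score(chosen + [f, target]), f) for f in remaining]
        round_corr, round_feature = max(trials, key=lambda t: t[0])
        # Assumed stopping rule: quit when nothing improves the best so far.
        if round_corr <= best_corr:
            break
        best_corr = round_corr
        chosen.append(round_feature)
        remaining.remove(round_feature)
    return chosen + [target], best_corr
```

With 9 candidate features this scores at most 9 + 8 + ... + 1 = 45 feature sets, far fewer than the 511 non-empty subsets an exhaustive search would need.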
Each time your program tests a set of features, it should print out a line of text similar to the following:
testing feature set [0, 9] corr coef = 0.0005
testing feature set [1, 9] corr coef = 0.0008
...
testing feature set [8, 6, 5, 9] corr coef = 0.0012
...
best feature set is [8, 6, 5, 1, 9] corr coef = 0.0013
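One way to produce lines in this shape is sketched below. The helper names format_trial and format_best are hypothetical, and rounding to four decimal places is an assumption based on the sample output.

```python
def format_trial(feature_set, corr):
    """Build the line reported for each feature set that is tested."""
    return "testing feature set %s corr coef = %.4f" % (list(feature_set), corr)


def format_best(feature_set, corr):
    """Build the final line reporting the best feature set found."""
    return "best feature set is %s corr coef = %.4f" % (list(feature_set), corr)


print(format_trial([0, 9], 0.0005))
print(format_best([8, 6, 5, 1, 9], 0.0013))
```

If the program prints to standard output, redirecting it on the command line (python evaluate.py > output.txt) is one simple way to capture output.txt.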
Software to write
- Create a python program evaluate.py that finds the best set of features (as outlined above).
Experiments to run
- Run the experiment over the symbols given above, using the dates given above.
Submit files (attachments) via t-square
- Your code in evaluate.py
- Output of your code in output.txt
- Disclose and cite any code or ideas you drew from others.
How to submit
Go to the t-square site for the class, then click on the "assignments" tab. Click on "add attachment" to add your 2 files. Once you are sure you've added the files, click "submit."