2011Fall7646 Homework 5

From Quantwiki

Overview

This assignment illustrates one of the tradeoffs between different learning methods. You will compare your own implementation of KNN with kdtknn, a KNN implementation built on a k-d tree. The key comparison is runtime: how long each implementation takes to train and to query.

To Do

  • Create a program called compare.py that does the following:
    • Reads in "data-ripple-prob.csv"
    • Uses the first 60% of the data for training and the remaining 40% for testing
    • For each K from 1 to 30:
      • Times how long it takes to train YOUR implementation of KNN
      • Times how long it takes to train kdtknn's implementation of KNN
      • Times how long it takes to query YOUR KNN
      • Times how long it takes to query kdtknn's KNN
  • Report each timing per instance (total elapsed time divided by the number of instances trained on or queried)
  • Create 4 plots: mytraining.pdf, myquery.pdf, kdtknntraining.pdf, kdtknnquery.pdf
  • Create a report, report.pdf, in which you:
    • Explain the shape of each plot (why it goes up, goes down, or stays flat, etc.)
    • Compare and explain the training results
      • Why are the shapes of the plots different?
    • Compare and explain the query results
      • Why are the shapes of the plots different?
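
The split and per-instance timing steps listed above can be sketched as follows. This is only an illustration under stated assumptions, not a reference solution: the names split_train_test, seconds_per_instance, time_learner, and StandInKNN are hypothetical, the train/query methods are an assumed interface (kdtknn's actual method names may differ, so adapt the calls), and each row is assumed to hold two features followed by a target value.

```python
import time


def split_train_test(rows, train_percent=60):
    """Return (train, test): the first train_percent% of rows, then the rest."""
    cut = len(rows) * train_percent // 100  # integer math avoids float rounding
    return rows[:cut], rows[cut:]


def seconds_per_instance(func, n_instances):
    """Run func() once; return elapsed wall-clock seconds divided by n_instances."""
    start = time.perf_counter()
    func()
    elapsed = time.perf_counter() - start
    return elapsed / n_instances


class StandInKNN:
    """Hypothetical brute-force learner; swap in your own KNN or kdtknn."""

    def __init__(self, k):
        self.k = k
        self.data = []

    def train(self, rows):
        # Brute-force KNN only stores the training rows.
        self.data = list(rows)

    def query(self, points):
        # Assumed row layout: (x1, x2, y). Predict the mean y of the k nearest rows.
        predictions = []
        for px, py in points:
            nearest = sorted(
                self.data,
                key=lambda r: (r[0] - px) ** 2 + (r[1] - py) ** 2,
            )[: self.k]
            predictions.append(sum(r[2] for r in nearest) / len(nearest))
        return predictions


def time_learner(make_learner, train_rows, test_rows, ks=range(1, 31)):
    """Return per-instance training and query times, one entry per K."""
    test_points = [(r[0], r[1]) for r in test_rows]
    train_times, query_times = [], []
    for k in ks:
        learner = make_learner(k)
        train_times.append(
            seconds_per_instance(lambda: learner.train(train_rows), len(train_rows))
        )
        query_times.append(
            seconds_per_instance(lambda: learner.query(test_points), len(test_points))
        )
    return train_times, query_times
```

Calling time_learner once with your learner and once with a wrapper around kdtknn would give the four series needed for the four plots. A single timed run can be noisy, so repeating each measurement and averaging may give smoother curves.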

Deliverables

Submit 6 files as attachments via t-square:

  • compare.py
  • report.pdf
  • The 4 plots

How to submit

Go to the t-square site for the class, then click on the "assignments" tab. Click on "add attachment" to add your files. Once you are sure you've added the files, click "submit."