2014Fall7646 Project 1A

Important Updates

Project 1 Overview

The objective of this project is for you to develop a system that can classify documents as "good" or "bad" regarding a stock. You will train the system with example good and bad documents. After training, you will test your system on additional documents it has not seen before to assess its accuracy. We will complete this project in a number of short sub-projects.

Part A: Convert words to numbers and back

You are to implement two programs in Python: tonumber.py and toword.py

tonumber.py should have the following functionality:

  • Objective: convert English words to numbers.
  • Convert all text to lower case.
  • Support words up to 12 characters in length.
  • Support English text (i.e., no need to support non-ASCII characters).
  • Skip punctuation and other non-letter symbols (e.g., "I'm" becomes "im").
  • Process one word at a time, converting each word to a whole number.
  • Command line usage: python tonumber.py < example.txt > numbers.txt
  • Result of executing this command is the file numbers.txt that contains one number per line.
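
As an illustration of the requirements above, here is a minimal sketch of one possible encoding. It is not the required method: the base-27 scheme (a=1 through z=26), the names word_to_number, text_to_numbers and main, the choice to truncate words longer than 12 letters, and the choice to drop digits along with punctuation are all assumptions made for this example.

```python
import sys

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET) + 1  # 27; digit 0 is left unused so "a" and "aa" differ
MAX_LEN = 12              # word-length limit from the requirements


def word_to_number(word):
    """Encode one word as a base-27 integer (assumed scheme: a=1 ... z=26)."""
    # Lower-case the word and keep only the letters a-z.
    letters = [c for c in word.lower() if c in ALPHABET]
    number = 0
    # Assumption: words longer than MAX_LEN letters are truncated.
    for c in letters[:MAX_LEN]:
        number = number * BASE + ALPHABET.index(c) + 1
    return number


def text_to_numbers(text):
    """Split text on whitespace and encode each token that contains letters."""
    numbers = (word_to_number(token) for token in text.split())
    # A token with no letters (for example "--") encodes to 0 and is skipped.
    return [n for n in numbers if n > 0]


def main(stream=sys.stdin):
    """Print one number per line for the text read from stream."""
    for number in text_to_numbers(stream.read()):
        print(number)
```

Calling main() at the end of tonumber.py would give the command-line behavior shown above. With this scheme, a 12-letter word encodes to a value below 27^12, which fits in a 64-bit integer.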

toword.py should have the following functionality:

  • Read in a file like numbers.txt and output words.txt -- a file with one word per line.
  • Command line usage: python toword.py < numbers.txt > words.txt
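
The decoding step can be sketched in the same spirit. This example assumes the numbers were produced by a hypothetical base-27 encoding in which a=1 through z=26 and the most significant digit is the first letter; the names number_to_word, numbers_to_words and main are illustrative. Your own toword.py must invert whatever encoding your tonumber.py uses.

```python
import sys

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
BASE = len(ALPHABET) + 1  # 27; digit 0 never appears in a valid encoding


def number_to_word(number):
    """Decode a base-27 integer (assumed scheme: a=1 ... z=26) into a word."""
    letters = []
    while number > 0:
        # Peel off the least significant base-27 digit, i.e. the last letter.
        number, digit = divmod(number, BASE)
        if digit == 0:
            raise ValueError("not a valid encoding under the assumed scheme")
        letters.append(ALPHABET[digit - 1])
    # Letters were collected last-to-first, so reverse them.
    return "".join(reversed(letters))


def numbers_to_words(text):
    """Decode whitespace-separated integers, one word per number."""
    return [number_to_word(int(token)) for token in text.split()]


def main(stream=sys.stdin):
    """Print one word per line for the numbers read from stream."""
    for word in numbers_to_words(stream.read()):
        print(word)
```

Calling main() at the end of toword.py would give the command-line behavior shown above. Because punctuation and case are discarded during encoding, the round trip returns the cleaned, lower-case words rather than the original text.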

Here's what an example.txt file might look like:

DISCLAIMER PERTAINING TO INVEST-ADVICE: Please note, JoJo'Ba is a 
"technology company."

The output to numbers.txt might look like this:

345343454
56454532
45654
45686967855
456234
34534
345655
45
1
97897877
78345232

Note that the above list of numbers is NOT intended to be correct. The numbers your code calculates will depend on the method you use. Finally, the output from toword.py might look like this:

disclaimer
pertaining
to
investadvice
please
note
jojoba
is
a
technology
company

Deliverables

Submit files as individual attachments via t-square. Do not bundle your files into a zip archive.

Run your code on this text file: Media:Words-to-numbers.txt. You cannot/should not submit your report until you have this file.

  • Your code in tonumber.py, toword.py
  • A SINGLE report as a PDF file, report.pdf, including:
    • The name of the course, the project name, and the student name.
    • One or two paragraphs explaining how you solved the problem in the assignment.
    • The (numbers) output of tonumber.py and the (words) output of toword.py.

How to submit

Go to the t-square site for the class, then click on the "assignments" tab. Click on "add attachment" to add your N files. Once you are sure you've added the files, click "submit."

Part A Extra Credit

  • +2% bonus if either program submission operates correctly and is written in 10 or fewer lines of code.

Part A Rubric

Start with 100. Points off as follows:

  • Report missing -20
  • tonumber.py missing -100
  • toword.py missing -100
  • Report submitted is not in PDF -10
  • Files were not submitted individually, but all together as a zip file -10
  • Report submitted could not have been created using submitted code -100
  • Incorrect result in final output -10 for each class of failure