2014Fall7646 Project 1A
Project 1 Overview
The objective of this project is for you to develop a system that can classify documents as "good" or "bad" regarding a stock. You will train the system with example good and bad documents. After the training you will then test your system with additional documents it has not seen before to assess how accurate it is. We will complete this project in a number of short sub-projects.
Part A: Convert words to numbers and back
You are to implement two programs in Python: tonumber.py and toword.py
tonumber.py should have the following functionality:
- Objective: convert English words to numbers.
- Convert all text to lower case.
- Support words up to 12 characters in length.
- Support English text (i.e., no need to support non-ASCII characters).
- Ignore / skip punctuation and other non-character symbols (e.g., "I'm" becomes "im").
- Process one word at a time, converting each word to a whole number.
- Command line usage: python tonumber.py < example.txt > numbers.txt
- Result of executing this command is the file numbers.txt that contains one number per line.
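As one possible approach (not a required method), the tonumber.py steps listed above can be sketched with a base-27 encoding in which "a" maps to 1 and "z" maps to 26. The names word_to_number, text_to_numbers, and main are hypothetical, and dropping digits along with punctuation is an assumption about what "non-character symbols" means. Because 27^12 - 1 is smaller than 2^63, any 12-letter word would also fit in a signed 64-bit integer, although Python integers have no fixed size.

```python
import re
import sys


def word_to_number(word):
    """Encode a lowercase a-z word as a base-27 integer (a=1, ..., z=26)."""
    number = 0
    for ch in word:
        number = number * 27 + (ord(ch) - ord("a") + 1)
    return number


def text_to_numbers(text):
    """Lowercase the text, split on whitespace, and encode each cleaned word."""
    numbers = []
    for token in text.lower().split():
        # Assumption: everything except a-z is dropped, so "i'm" becomes "im".
        word = re.sub(r"[^a-z]", "", token)
        if word:  # skip tokens that contained no letters at all
            numbers.append(word_to_number(word))
    return numbers


def main():
    """Read text from standard input and print one number per line."""
    for number in text_to_numbers(sys.stdin.read()):
        print(number)
```

Calling main() at the bottom of the file would support the command line usage given above. Starting the letter values at 1 rather than 0 keeps words such as "a" and "aa" distinct, since no letter acts as a leading zero.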
toword.py should have the following functionality:
- Read in a file like numbers.txt and output words.txt -- a file with one word per line.
- Command line usage: python toword.py < numbers.txt > words.txt
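A matching sketch for toword.py, assuming the numbers were produced by a base-27 encoding with "a" as 1 and "z" as 26 (an illustrative choice, not a required one). The names number_to_word, numbers_to_words, and main are hypothetical. Decoding peels off base-27 digits with repeated division and reverses them to restore the letter order.

```python
import sys


def number_to_word(number):
    """Decode a base-27 integer (a=1, ..., z=26) back into its word."""
    letters = []
    while number > 0:
        number, digit = divmod(number, 27)
        # Assumption: valid input never yields digit 0, since letters start at 1.
        letters.append(chr(digit + ord("a") - 1))
    return "".join(reversed(letters))


def numbers_to_words(text):
    """Decode every whitespace-separated number in the text."""
    return [number_to_word(int(token)) for token in text.split()]


def main():
    """Read numbers from standard input and print one word per line."""
    for word in numbers_to_words(sys.stdin.read()):
        print(word)
```

Calling main() at the bottom of the file would support the command line usage given above. Note that the recovered words are the cleaned, lower-case forms, so the original capitalization and punctuation are not restored.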
Here's what an example.txt file might look like:
DISCLAIMER PERTAINING TO INVEST-ADVICE: Please note, JoJo'Ba is a "technology company."
The output to numbers.txt might look like this:
345343454
56454532
45654
45686967855
456234
34534
345655
45
1
97897877
78345232
Note that the above list of numbers is NOT intended to be correct. The numbers your code calculates will depend on the method you use. Finally, the output from toword.py might look like this:
disclaimer
pertaining
to
investadvice
please
note
jojoba
is
a
technology
company
Submit files as individual attachments via t-square. Do not combine your files into a zip archive.
Run your code on this text file: Media:Words-to-numbers.txt. Do not submit your report until you have this file.
Submit the following:
- Your code in tonumber.py and toword.py.
- A SINGLE report in PDF format, report.pdf, that includes:
  - The name of the course, the project name, and the student name.
  - One or two paragraphs explaining how you solved the problem in the assignment.
  - The (numbers) output of tonumber.py and the (words) output of toword.py.
How to submit
Go to the t-square site for the class, then click on the "assignments" tab. Click on "add attachment" to add your N files. Once you are sure you've added the files, click "submit."
Part A Extra Credit
- +2% bonus if either program submission operates correctly and is written in 10 or fewer lines of code.
Part A Rubric
Start with 100. Points off as follows:
- Report missing: -20
- tonumber.py missing: -100
- toword.py missing: -100
- Report submitted is not in PDF: -10
- Files submitted together as a zip file rather than individually: -10
- Report submitted could not have been created using submitted code: -100
- Incorrect result in final output: -10 for each class of failure