Getting Compustat Data

Downloading Data - Step By Step

Log into WRDS

Click COMPUSTAT

Click North America

Click Fundamentals Quarterly


Fill out the form accordingly:

Step 1:

Select maximum date range possible.

Step 2:

Select “CUSIP”
Select “Search Entire Database”

Step 3:

Under Identifying information select “Ticker Symbol”
Under Identifying Information, cont. click “check all.” (Some of these fields are not used, but this is the current practice.)
Under Company Descriptor click “check all.”
Under Quarterly Data Items click “check all.”
Under Year-To-Date Data Items click “check all.”
Leave Supplemental Data Items blank - we get this data elsewhere.

Step 4:

Under Data Format select .csv
Under Compression Type select your preference
Under Date Format select MMDDYY10


After downloading, unzip the file to <QSDATA>/Raw/Compustat/Compustat.csv
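If you chose zip compression, the extraction can also be scripted. The following is a minimal sketch, not part of the toolkit: the function name `extract_compustat` is hypothetical, the archive is assumed to be a zip file containing exactly one CSV, and `qsdata_dir` stands for whatever directory <QSDATA> refers to on your machine.

```python
import os
import shutil
import zipfile


def extract_compustat(archive_path, qsdata_dir):
    """Copy the single CSV inside a zip archive to Raw/Compustat/Compustat.csv."""
    target_dir = os.path.join(qsdata_dir, "Raw", "Compustat")
    os.makedirs(target_dir, exist_ok=True)
    target_path = os.path.join(target_dir, "Compustat.csv")

    with zipfile.ZipFile(archive_path) as archive:
        # The downloaded file usually has a generated name, so look it up
        # by extension instead of assuming what it is called.
        csv_members = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if len(csv_members) != 1:
            raise ValueError(
                "expected exactly one CSV in the archive, found %d" % len(csv_members)
            )
        with archive.open(csv_members[0]) as src, open(target_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return target_path
```

Renaming the file on extraction matters because the conversion script expects the name Compustat.csv.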

Converting the data

If you followed the steps above, your data file should already be in exactly the format needed, and you can simply run:

source ./config.sh
compustat_csv_to_pkl.py

This will convert all of the data to pkl files and store them in the <QSDATA>/Processed/Compustat/ directory. Note that this script matches each Compustat symbol against the existing Norgate symbols and places its data in the corresponding directory: NYSE, NASDAQ, or AMEX. If a symbol does not match any of them, its data is discarded.
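The matching step can be pictured roughly as follows. This is a minimal sketch, not the code in compustat_csv_to_pkl.py: the function name `assign_exchange`, the dictionary of symbol sets, and the NYSE-first lookup order are all assumptions made for illustration.

```python
def assign_exchange(symbol, exchange_symbols):
    """Return the exchange whose symbol set contains `symbol`, or None."""
    # Hypothetical lookup order; the real script may resolve duplicates differently.
    for exchange in ("NYSE", "NASDAQ", "AMEX"):
        if symbol in exchange_symbols.get(exchange, set()):
            return exchange
    return None  # the caller would drop data for unmatched symbols


# Made-up symbol lists standing in for the reference symbols.
known = {"NYSE": {"IBM", "GE"}, "NASDAQ": {"AAPL"}, "AMEX": {"XYZ"}}
print(assign_exchange("AAPL", known))  # NASDAQ
print(assign_exchange("ZZZZ", known))  # None
```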

If you do not have access to Norgate data, you can change ‘Norgate’ to ‘Yahoo’ in your script to compare the Compustat symbols to Yahoo symbols instead.

Non-standard data

If you have not followed the directions above and have a different Compustat data file, there are other steps you need to take to successfully import the data.

The script contains a function called _analyze, which scans the csv file to determine which columns can be imported as float values, i.e. which columns hold numeric rather than string data. Running this function prints two sets: the “good” labels and the “bad” labels.
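To illustrate the kind of check involved, here is a minimal sketch of a good/bad column split. It is not the actual _analyze implementation: the function name `analyze_columns` is hypothetical, the sample rows are made up, and treating blank cells as missing values (rather than as strings) is an assumption.

```python
import csv
import io


def analyze_columns(csv_file):
    """Split CSV column labels into float-convertible ("good") and other ("bad")."""
    reader = csv.reader(csv_file)
    labels = next(reader)  # first row holds the column labels
    bad = set()
    for row in reader:
        for label, value in zip(labels, row):
            if label in bad or value.strip() == "":
                continue  # assumption: blank cells count as missing, not as strings
            try:
                float(value)
            except ValueError:
                bad.add(label)  # one non-numeric value marks the whole column
    good = set(labels) - bad
    return good, bad


# Made-up sample data for illustration only.
sample = io.StringIO(
    "tic,datadate,atq,saleq\n"
    "IBM,03/31/2010,100.5,22.9\n"
    "AAPL,03/31/2010,,13.5\n"
)
good, bad = analyze_columns(sample)
print(sorted(good))  # ['atq', 'saleq']
print(sorted(bad))   # ['datadate', 'tic']
```

Note that date columns land in the bad set, since a value such as 03/31/2010 cannot be parsed as a float.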

Copy the set of bad labels into the variable lsBadLabels in compustat_csv_to_pkl.py, and the set of good labels into the variable COMPUSTAT in DataAccess.py. You can then follow the conversion steps above with your non-standard data file.

./get_yahoo_data.sh

Accessing the data

The data can be accessed with the DataAccess object like all other data sources. Check QSTK Tutorial 6 for more information.
