Connecting With Data Sources

Overview

QSTK comes with some example historical data that you can work with to get started. For instructions on how to install that data, see QSToolKit_Installation_Guide (scroll down to Install Example Data).

If you want to start working with "live" data, you'll need to make arrangements with a data provider. However, Yahoo Finance data is pretty good, and QSTK comes with a super easy way to use it.

In general, the sorts of data you'd want to work with come in CSV format, and we follow the same format as Yahoo data. QSTK reads data directly from the CSV files and creates cache files based on what it reads.
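To make the CSV format concrete, here is a minimal sketch of reading a Yahoo-style file with Python's standard library. It is not QSTK's own reader. The column names follow Yahoo's historical-price export as we understand it (Date, Open, High, Low, Close, Volume, Adj Close); the sample rows and the function name read_yahoo_csv are made up for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample rows in the assumed Yahoo column layout.
SAMPLE = """Date,Open,High,Low,Close,Volume,Adj Close
2010-01-05,10.2,10.6,10.1,10.5,1200,10.5
2010-01-04,10.0,10.4,9.9,10.2,1000,10.2
"""

def read_yahoo_csv(handle):
    """Return a list of (date, {column: float}) tuples, oldest first."""
    rows = []
    for record in csv.DictReader(handle):
        # Pull the date out, then convert every remaining column to float.
        day = datetime.strptime(record.pop("Date"), "%Y-%m-%d").date()
        rows.append((day, {name: float(value) for name, value in record.items()}))
    # Files may arrive newest first, so sort explicitly by date.
    rows.sort(key=lambda row: row[0])
    return rows

prices = read_yahoo_csv(io.StringIO(SAMPLE))
```

To read a real file, pass an open file handle instead of the io.StringIO wrapper.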

Yahoo

Yahoo Finance gives us NASDAQ, NYSE and AMEX stock data: about 5,000 stocks, compared to 35,000+ from Norgate. But it's free. All you need to do is run a script and find something interesting to do while we set things up for you. For (not too many) details see Getting Yahoo Data.

If you're a student in the class and want to use some equity not provided in the example data, just download the CSV from Yahoo Finance and put it into the directory QSData/Yahoo/. It should then work automatically.
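If you would rather script that copy step, a small helper along these lines would do it. This is only a sketch: the function name install_yahoo_csv is hypothetical, and naming the file SYMBOL.csv is our assumption about what QSTK expects, so check it against the files already in your QSData/Yahoo/ directory.

```python
import shutil
from pathlib import Path

def install_yahoo_csv(downloaded_csv, qsdata_root, symbol):
    """Copy a downloaded CSV to <qsdata_root>/Yahoo/<SYMBOL>.csv and return the new path."""
    target_dir = Path(qsdata_root) / "Yahoo"
    target_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: one file per symbol, upper-case symbol as the file name.
    target = target_dir / f"{symbol.upper()}.csv"
    shutil.copyfile(downloaded_csv, target)
    return target
```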

Keep in mind that Yahoo data is subject to survivorship bias: it generally covers only companies that are still listed. If you want data free of survivorship bias, we recommend Norgate (below).

Norgate

We use and recommend Norgate Investor Services for historical price and volume data. But there are two small problems with their data: 1) their downloading client only runs on Windows (and we like to run on Ubuntu Linux), and 2) their data arrives in a crufty old MetaStock format. Here's our workflow to address this:

  • We configure a Windows machine for receiving data only (no QSTK stuff on it). You can read about our production Windows setup for receiving data.
  • On the Windows machine we schedule a daily process that:
    • Runs Norgate's client to download the data from Norgate after market close, then
    • Runs Norgate's conversion utility to convert from MetaStock to CSV, then
    • Runs rsync to push the CSV data to an Ubuntu box, into the QSData/Raw/Norgate directory.
  • On the Ubuntu box we have a cron job that:
    • Calls csvconverter to convert the CSV files to Python pkl files in the QSData/Processed/Norgate directory.
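The last step, CSV to pkl, can be pictured with the rough sketch below. This is not the actual csvconverter; it only illustrates the idea of walking a raw directory and writing one pickle per CSV into a mirrored processed directory. The function name convert_csv_tree and the choice to pickle plain lists of rows are our assumptions.

```python
import csv
import pickle
from pathlib import Path

def convert_csv_tree(raw_dir, processed_dir):
    """Pickle every *.csv under raw_dir into a mirrored path under processed_dir."""
    raw_dir, processed_dir = Path(raw_dir), Path(processed_dir)
    written = []
    for csv_path in sorted(raw_dir.rglob("*.csv")):
        with csv_path.open(newline="") as handle:
            rows = list(csv.reader(handle))  # header row plus data rows, as strings
        # Keep the same sub-directory layout, swapping the extension to .pkl.
        out_path = (processed_dir / csv_path.relative_to(raw_dir)).with_suffix(".pkl")
        out_path.parent.mkdir(parents=True, exist_ok=True)
        with out_path.open("wb") as handle:
            pickle.dump(rows, handle)
        written.append(out_path)
    return written
```

A cron job could call a function like this once the rsync push has finished, for example with QSData/Raw/Norgate and QSData/Processed/Norgate as the two arguments.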

COMPUSTAT via WRDS

To do more detailed analysis of stocks we need to go beyond simple price data. Another common, publicly available source of data is annual and quarterly financial reports. These reports contain accounting information regarding sales, revenue, assets, liabilities, etc. This data is made public by the SEC at http://www.sec.gov/edgar.shtml. However, in general, the less the data costs, the less formatting and error checking has been performed on it.

Fortunately, Georgia Tech has access to Compustat data, which compiles historical and present financial reports into an easy-to-access database. This data consists of several hundred accounting items, although many of them are esoteric and rarely filled in. Information on how to format and import this data into QSTK can be found at Getting Compustat Data.

Additional Data Sources

If you want to add additional data sources, the first step is to figure out how to get them into a time series CSV format. From there, look closely at the DataAccess class and modify it to add a different source. Specifically, look for "Norgate" and add new code for your source there. If you're having trouble, please email us (tucker@cc.gatech.edu) and we'll try to help you out.
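As a starting point for that first step, here is a minimal sketch that writes in-memory records out as a time series CSV. It does not touch the DataAccess class. The column list mirrors the Yahoo layout as we understand it, and the function name write_time_series_csv, the sample records, and the newest-first row order are all assumptions made for illustration.

```python
import csv
import io

# Assumed Yahoo-style column layout.
YAHOO_COLUMNS = ["Date", "Open", "High", "Low", "Close", "Volume", "Adj Close"]

def write_time_series_csv(records, handle):
    """Write dict records as Yahoo-style CSV rows, newest date first."""
    writer = csv.DictWriter(handle, fieldnames=YAHOO_COLUMNS)
    writer.writeheader()
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    for record in sorted(records, key=lambda r: r["Date"], reverse=True):
        writer.writerow(record)

# Hypothetical records from some other data source.
records = [
    {"Date": "2010-01-04", "Open": 10.0, "High": 10.4, "Low": 9.9,
     "Close": 10.2, "Volume": 1000, "Adj Close": 10.2},
    {"Date": "2010-01-05", "Open": 10.2, "High": 10.6, "Low": 10.1,
     "Close": 10.5, "Volume": 1200, "Adj Close": 10.5},
]
buffer = io.StringIO()
write_time_series_csv(records, buffer)
```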

Data Directory Tree

  • QSData (root)
    • Yahoo (contains csv files read from Yahoo without any changes) (used in class)
    • Raw
      • Norgate
        • Stocks
          • US (everything below in metastock format)
            • AMEX/Indices/NASDAQ/NYSE/NYSE Arca/OTC/Delisted Securities
          • World Indices
          • CSV (everything below in CSV)
            • names.txt (names corresponding to symbols below)
            • US
              • AMEX/Indices/NASDAQ/NYSE/NYSE Arca/OTC/Delisted Securities
            • World Indices
      • Compustat (everything below in CSV)
        • US
          • NYSE/NASDAQ/AMEX
    • Processed (everything under here is in .pkl format)
      • Norgate
        • Stocks
          • US
            • AMEX/Indices/NASDAQ/NYSE/NYSE Arca/OTC/Delisted Securities
      • Compustat
        • US
          • NYSE/NASDAQ/AMEX
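
When scripting against this tree, it can help to build paths in one place rather than by hand. The helper below is a sketch only: the function name norgate_processed_dir is hypothetical, and the folder names are simply copied from the tree above.

```python
from pathlib import Path

# Folder names under Processed/Norgate/Stocks/US, taken from the tree above.
NORGATE_US_FOLDERS = (
    "AMEX", "Indices", "NASDAQ", "NYSE", "NYSE Arca", "OTC", "Delisted Securities",
)

def norgate_processed_dir(qsdata_root, folder):
    """Return the directory holding processed (.pkl) Norgate data for one US folder."""
    if folder not in NORGATE_US_FOLDERS:
        raise ValueError(f"unknown Norgate folder: {folder!r}")
    return Path(qsdata_root) / "Processed" / "Norgate" / "Stocks" / "US" / folder
```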