

QuantViz is a visualization module for the Quant software toolkit (QSTK). It plots features for a chosen list of symbols. All the features are calculated in advance and stored; the visualizer then reads the stored data and plots it. It is located in the Tools/Visualizer directory of QSTK. Some features of the visualizer are:

1) Save specific plots and create movies over time.

2) Manipulate axis limits and slice data according to your requirement.

3) Visualize any data, provided it is stored in the format the visualizer expects.

Using the Visualizer on QSTK

Using the visualizer on QSTK is a two-step process: first create the data you want to visualize, then run the visualizer.

In the file GenerateData.py, located in the visualizer directory, edit the parameters in the main. Four parameters need to be set to create a data set.

1) Start date - The date from which you want to start reading the data.

2) End date - The last date of your data analysis.

3) Datadirectory - A tag associated with the list of symbols, used to identify this data set when running the visualizer.

4) Symbols - The list of symbols you would like to analyze. Please make sure that 'SPY' is one of the symbols, as it is used in many features that are calculated relative to the market.
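As a rough illustration of the four settings above, the sketch below sets them and checks them before any data is generated. The variable names, the dates, the symbols and the check_parameters helper are all hypothetical; the actual names used in the main of GenerateData.py may differ.

```python
import datetime as dt

# Hypothetical names and example values; GenerateData.py may use different ones.
start_date = dt.datetime(2008, 1, 1)     # 1) start date
end_date = dt.datetime(2009, 12, 31)     # 2) end date
data_directory = "MyPortfolio"           # 3) tag identifying this data set
symbols = ["AAPL", "GOOG", "IBM"]        # 4) symbols to analyze


def check_parameters(start, end, tag, syms):
    """Validate the four settings and return a symbol list that includes 'SPY'."""
    if start >= end:
        raise ValueError("start date must come before end date")
    if not tag:
        raise ValueError("data directory tag must not be empty")
    syms = list(syms)
    if "SPY" not in syms:
        # Features computed relative to the market need SPY to be present.
        syms.append("SPY")
    return syms


symbols = check_parameters(start_date, end_date, data_directory, symbols)
print(symbols)
```

Appending 'SPY' automatically is only one possible design choice; raising an error instead would make the omission more visible.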

Once your data set is created you just need to run the Visualizer.py file and select the data you just created or any previously created dataset.

Using the Visualizer on data from a CSV file

CsvData.py converts data stored in a CSV file into the input format of the visualizer. To convert a CSV file, change the inputpath and datadirectory elements in the main of CsvData.py. The datadirectory is the name of the output directory you want to create. An example file is stored at /Data/Raw/Test.csv; it is processed and the result is stored in the directory /Data/TestCSV/.

The raw csv file format should be :

Timestamps Tag Feat1 Feat2 Feat3 Feat4 Feat5
(yyyy-mm-dd) Tag1 Value Value Value Value Value
(yyyy-mm-dd) Tag2 Value Value Value Value

Each timestamp is repeated across consecutive rows, once for every tag. The code can handle missing values as well as a tag that is not present at a given timestamp.
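The sketch below shows one way such rows could be turned into a 3D array, with NaN marking both missing values and tags absent at a timestamp. It is not the code in CsvData.py: the function name csv_to_cube, the comma delimiter, the header row and the sample data are assumptions made for illustration.

```python
import csv
import io

import numpy as np


def csv_to_cube(text):
    """Parse 'timestamp, tag, feat1, feat2, ...' rows into a 3D array.

    Returns (timestamps, tags, features, cube), where cube is indexed as
    cube[timestamp, tag, feature]; cells with no value stay NaN.
    """
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    features = header[2:]
    rows = [r for r in reader if r]          # skip blank lines

    timestamps = sorted({r[0] for r in rows})
    tags = sorted({r[1] for r in rows})
    t_index = {t: i for i, t in enumerate(timestamps)}
    s_index = {s: i for i, s in enumerate(tags)}

    cube = np.full((len(timestamps), len(tags), len(features)), np.nan)
    for r in rows:
        # Short rows and empty cells simply leave NaN in place.
        for k, value in enumerate(r[2:2 + len(features)]):
            value = value.strip()
            if value:
                cube[t_index[r[0]], s_index[r[1]], k] = float(value)
    return timestamps, tags, features, cube


# Made-up sample: tag B lacks Feat2 on the first date and is absent on the second.
sample = """Timestamps,Tag,Feat1,Feat2
2010-01-01,A,1.0,2.0
2010-01-01,B,3.0,
2010-01-02,A,5.0,6.0
"""

timestamps, tags, features, cube = csv_to_cube(sample)
print(cube.shape)
```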

Using the Visualizer on other Data

The basic requirement of QuantViz is that all the data is in the format it expects, so the main challenge is converting externally collected data into that format. The visualizer reads its data from a set of four files.

1) Timestamps.txt - The list of all the timestamps at which the readings were taken. The dates are stored in ISO format, year-month-day (yyyy-mm-dd).

2) Symbols.txt - The list of all ID tags associated with the observations, such as station names for weather readings or tickers for equity data.

3) Features.txt - The list of all the feature tags, that is, the names of the features you have observed or calculated.

4) ALLDATA.pkl - A pickle dump of a 3D array that holds all the observed readings. The array has timestamps as rows, tags as columns and features along the z-axis, so every slice along the z-axis is a single feature.

After that, store the four files in a single directory and select that location when using the visualizer.
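A minimal sketch of writing the four files is shown below. The write_dataset helper and the sample values are hypothetical, and two details are assumptions rather than documented behaviour: that the text files hold one entry per line, and that ALLDATA.pkl is a direct pickle of a NumPy array. FormatData.py remains the reference for the exact layout.

```python
import os
import pickle
import tempfile

import numpy as np


def write_dataset(directory, timestamps, symbols, features, data):
    """Write the four visualizer files into `directory` (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    expected = (len(timestamps), len(symbols), len(features))
    if data.shape != expected:
        raise ValueError("data shape %s does not match %s" % (data.shape, expected))

    os.makedirs(directory, exist_ok=True)
    text_files = (
        ("Timestamps.txt", timestamps),   # ISO dates, yyyy-mm-dd
        ("Symbols.txt", symbols),         # ID tags
        ("Features.txt", features),       # feature names
    )
    for name, items in text_files:
        # Assumption: one entry per line.
        with open(os.path.join(directory, name), "w") as handle:
            handle.write("\n".join(items) + "\n")

    with open(os.path.join(directory, "ALLDATA.pkl"), "wb") as handle:
        pickle.dump(data, handle)


# Made-up example: 2 timestamps x 2 tags x 3 features.
out_dir = os.path.join(tempfile.mkdtemp(), "Example")
data = np.arange(12, dtype=float).reshape(2, 2, 3)
write_dataset(out_dir, ["2010-01-01", "2010-01-02"], ["A", "B"],
              ["f1", "f2", "f3"], data)

# Each z-axis slice is one feature: timestamps as rows, tags as columns.
first_feature = data[:, :, 0]
print(first_feature.shape)
```

Checking the array shape against the three lists before writing catches the most likely mistake, an array whose axes are in a different order from the one described above.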

The FormatData.py file gives an example of converting raw data to the format of the visualizer. The example data is stored in the directory Data/Norway/Raw; it is processed and the result is stored in the Data/Norway directory. The example is weather data from various stations in Norway, with yearly readings taken over a period of more than 200 years.


A snapshot of the Visualizer