Category Archives: books

Book reviews

Pandas data anaysis for the SQL Guy

Here’s some beginning examples for using the excellent python data analysis libary pandas.

Coming from database background, I found I spent a fair amount of time looking up how to do things that would be second nature in SQL.  The intent of this blog is to shortcut this learning curve.

Setup

Easiest way to get running with pandas (as of 08/13) is from the Anaconda download at https://store.continuum.io/.  This installs python and all the requisite libraries.

I have been using the ipython command shell to use pandas interactively, and it works well.   Start from command line ipython –pylab.

Save work using %save command (this means you don’t have to start from scratch each session):

    %save <filename> <startline>-<endline>

Run saved file as

    %run <filename>

Beyond the online documentation, Python for Data Analysis (http://www.amazon.com/books/dp/1449319793) is very good.

Data for examples:

Sample data:  EOD stock data from EODData.com.

Data staged in nasdqa_daily.csv file that looks like this:

AACC,2012-01-02,3.91,3.91,3.91,3.91,0
AAME,2012-01-02,1.97,1.97,1.97,1.97,0
AAON,2012-01-02,20.49,20.49,20.49,20.49,0
AAPL,2012-01-02,405,405,405,405,0
AAWW,2012-01-02,38.43,38.43,38.43,38.43,0

Imported into a DB (Sqlite), the stocks table looks like

CREATE TABLE nasdaq_daily (
  ticker text,
  date text,
  open real,
  high real,
  low real,
  close real,
  volume real
)

The second table contains twitter data  that looks like this:

1,goog,2013-06-20,488
2,vod,2013-06-30,1

Imported into a DB, the table looks like this:

CREATE TABLE `twitter_data` (
  id INTEGER primary key,
  ticker CHAR(5) NOT NULL,
  date DATETIME NOT NULL,
  count INTEGER
)

——————————————————————————————–

# set up the environment and import data

from pandas import DataFrame, Series
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Import csv data
stocks = pd.read_csv(“nasdaq_daily.csv”, names=[‘ticker’, ‘date’, ‘open’, ‘high’, ‘low’, ‘close’, ‘volume’])

# Import database data
import sqlite3
import pandas.io.sql as sql
con = sqlite3.connect(‘stockdata.db’)
tweets = sql.read_frame(‘select * from twitter_data’, con)

#start of data manipulation

# select * from stocks where ticker = ‘GOOG’
goog = stocks[stocks.ticker == ‘goog’]

# select * from stocks where ticker in (‘GOOG’, ‘VOD’, ‘AAPL’, ‘CSCO’, ‘AMGN’, ‘QCOM’, ‘MSFT’, ‘AMZN’, ‘CMCSA’, ‘INTC’ )
mystocks = stocks[stocks.ticker.isin( [‘GOOG’, ‘VOD’, ‘AAPL’, ‘CSCO’, ‘AMGN’, ‘QCOM’, ‘MSFT’, ‘AMZN’, ‘CMCSA’, ‘INTC’ ] ) ];

# update stocks set ticker = lower(ticker)
mystocks[‘ticker’] = mystocks[‘ticker’].map(lambda x: x.lower())

# select * from stocks s  join twitter_data t on  s.ticker = t.ticker and s.date = t.date
merged = pd.merge(tweets, mystocks)  # assumes same named columns in two tables are joined — can also specify

#select date, count, close from <previous join>
merged = merged[[‘date’, ‘count’, ‘close’]]

#select ticker, avg(close), min(close), max(close)
#from stocks
#group by ticker
stocks[[‘ticker’, ‘close’]].groupby(‘ticker’).agg([np.mean, np.min, np.max])

 

 

Close Encounters on the Appalachian Trail, by Bill Walker

Just finished reading Skywalker, Close encounters on the Appalachian Trail, by Bill Walker.

Good light read, and one that potentially saved me lots of money.  For as long as I can remember, I’ve had a through hike of the Appalachian or the Pacific Crest trail as one of those these I’d like to do before I die.

Having read this, it seems to be one of those many trips best enjoyed vicariously.  Way too many descriptions of nights spent on cold mountains, soaked to the bones and fighting off hypothermia.  Way too many weeks preoccupied with how to deal with bears should one attack (one close encounter).  When a night’s sleep in the bathrooms at Clingman’s Dome in the Smokies is the subject of joyful anticipation, I gain some appreciation for my wind-blocking walls, my sturdy roof, and my comfy bed.

Enjoyed the book, and recommend it to anyone who thinks s/he might have the hiking bug.  At the very least, it is a good reality check.

I’ll be sticking to day hikes (well, maybe a section hike someday).