Node.js programming is different

So you are developing a web app using some Javascript technology on the front end, and you need to build some APIs to serve data from a data store.  You have no shortage of choices.

You can use PHP, Java, Ruby, Python, Scala, or Go.  With PHP you can write bare-metal PHP, or you can use a full-service framework or a microframework.  With any of the others you have your choice of frameworks, ranging from basic (e.g., Servlets) to all-inclusive (J2EE, Rails, Django, etc.).  And so on.

Your Javascript front end will require some basic APIs to retrieve back-end data: a REST API where something like /mysite/api/users retrieves all users, and /mysite/api/users/2 retrieves user #2. A fairly standardized sort of microframework has emerged to serve up these APIs.  Ruby/Sinatra is the father of a family of ‘microframework’ clones: Python/Flask, Java/Spark, PHP/Slim.  Each is a small framework that provides a simple way to parse URLs, to trigger code in response to those URLs, and to insert ‘middleware’ filters to intercept and process requests.

Node/Express looks very similar, but once the initial familiarity wears off, it presents some challenges that require considerably different solutions from any of these similar tools.  In particular, Node’s primary technical claim to fame is that it handles I/O, including database calls, asynchronously.  This requires significant changes to how programming is done.
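On the surface, the Express version of such routes looks just as simple as its cousins.  Here is a minimal sketch (the paths and the canned data are illustrative, not part of the example that follows):

var express = require('express');
var app = express();

// GET /mysite/api/users -- return all users (placeholder data)
app.get('/mysite/api/users', function (req, res) {
    res.json([{ id: 1, name: 'Alice' }, { id: 2, name: 'Bob' }]);
});

// GET /mysite/api/users/2 -- return one user, by id
app.get('/mysite/api/users/:id', function (req, res) {
    res.json({ id: req.params.id });
});

app.listen(3000);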

In this article I will demonstrate some of these changes by writing an API to back a JQGrid data grid.

JQGrid is one of the better, if not the best, of the many browser-based data grids available on the internets.  Its demo page, http://www.trirand.com/blog/jqgrid/jqgrid.html, gives many examples of its capabilities, along with some PHP examples of the back-end code used to provide data to the grids.

JQGrid is useful as a demo of how node.js programming differs from the other frameworks because displaying a grid requires two queries: one to count the grid rows and one to actually return the data.  Only after both queries return can the JSON used to populate the grid be built.  In this blog we will demonstrate how to do that.

Out of the box, JQGrid expects JSON with the following layout. Notice that calculating the ‘total’ attribute requires knowing the total number of rows in the table.

{
    "total": "xxx",
    "page": "yyy",
    "records": "zzz",
    "rows" : [
        {"id" :"1", "cell" :["cell11", "cell12", "cell13"]},
        {"id" :"2", "cell":["cell21", "cell22", "cell23"]},
        ...
    ]
}
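The arithmetic for ‘total’ (the page count) is simple; given the row count and the rows-per-page request parameter, for example:

let records = 95;   // total rows in the dataset
let rows = 10;      // rows per page (JQGrid's 'rows' request parameter)
let total = Math.ceil(records / rows);   // 95 rows at 10 per page -> 10 pages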

In the second example provided on the JQGrid demo site, Loading Data/JSON Data, the back end code looks like this:

<?php
...
$page = $_GET['page']; // get the requested page
$limit = $_GET['rows']; // get how many rows we want to have into the grid
$sidx = $_GET['sidx']; // get index row - i.e. user click to sort
$sord = $_GET['sord']; // get the direction
if (!$sidx) $sidx = 1;
// connect to the database
$db = mysql_connect($dbhost, $dbuser, $dbpassword)
or die("Connection Error: " . mysql_error());

mysql_select_db($database) or die("Error connecting to db.");
$result = mysql_query("SELECT COUNT(*) AS count FROM invheader a, clients b WHERE a.client_id=b.client_id");
$row = mysql_fetch_array($result, MYSQL_ASSOC);
$count = $row['count'];

if ($count > 0) {
    $total_pages = ceil($count / $limit);
} else {
    $total_pages = 0;
}
if ($page > $total_pages) $page = $total_pages;
$start = $limit * $page - $limit; // do not put $limit*($page - 1)
$SQL = "SELECT a.id, a.invdate, b.name, a.amount,a.tax,a.total,a.note FROM invheader a, clients b WHERE a.client_id=b.client_id ORDER BY $sidx $sord LIMIT $start , $limit";
$result = mysql_query($SQL) or die("Couldn't execute query." . mysql_error());

$responce = new stdClass();
$responce->page = $page;
$responce->total = $total_pages;
$responce->records = $count;
$i = 0;
while ($row = mysql_fetch_array($result, MYSQL_ASSOC)) {
    $responce->rows[$i]['id'] = $row['id'];
    $responce->rows[$i]['cell'] = array($row['id'], $row['invdate'], $row['name'], $row['amount'], $row['tax'], $row['total'], $row['note']);
    $i++;
}
echo json_encode($responce);

In short, here is what the code does:

  1. Extracts the current page, rows per page, index column, and sort order from the request
  2. Runs a SQL query to get the total number of rows in the dataset
  3. Runs another SQL query to get the actual data
  4. Creates the default JSON string that JQGrid expects

This code is straightforward, and can be very easily translated into any language that can handle a web request. Translations into a Java servlet, or into any of the microframeworks above, are very similar, and differ only in language syntax.

A simple translation will not work in node.  The key difference is how database calls are made. In the example above, the calls are all synchronous, so we can safely write code like:

  1. Call SQL statement 1
  2. Call SQL statement 2
  3. Process statement results

Because node is asynchronous, a straight translation won’t work.  We have to do something to make sure that the statements have completed before we can process their results.
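For example, a naive transliteration (sketched here with the same db helper used in the code below) would try to use the result before it exists:

// BROKEN: a line-by-line translation of the synchronous PHP version
let count;
db.get().query('SELECT count(*) as count FROM Users', (err, result) => {
    count = result[0].count;   // runs later, when the query completes
});
console.log(count);            // undefined -- the callback has not fired yet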

At a high level, each query needs to be passed the function to run when it completes: the callback function. Node starts the query asynchronously, goes on to do other things while the query runs, and executes the callback when the query completes. This requires a different way of programming.

Here are four ways we can program node.js to generate the JQGrid JSON:

  1. Force the steps to run serially, pass each step’s data to the next step
  2. Use simple callbacks to run the queries concurrently and process results when they are finished
  3. Use the async library
  4. Use promises (my favorite)

Approach 1: Serial execution through callbacks

One approach, the one that behaves most similarly to the PHP version, is to have each callback invoke the next callback when it finishes, passing the result sets through the chain until they are ready to be processed.  Here’s what this looks like:

/* GET users listing. Run queries serially */
router.get('/cpsSerial', function (req, res, next) {
    let page = req.query.page ? req.query.page : 1;
    let rows = req.query.rows ? req.query.rows : 10;
    let sidx = req.query.sidx ? req.query.sidx : 'id';
    let sord = req.query.sord ? req.query.sord : 'asc';
    let count = 0;

    let first = req.query.first ? req.query.first : '%';
    let last = req.query.last ? req.query.last : '%';

    function countQuery(err, first, last, page, rows) {
        if (err) return dataQuery(err);
        db.get().query('SELECT count(*) as count ' +
            '           FROM Users' +
            '           WHERE first like ? and last like ?',
            [first, last],
            (err, result) => {
                if (err) return dataQuery(err);
                dataQuery(err, first, last, result, page, rows);
            });
    }

    function dataQuery(err, first, last, countData, page, rows) {
        if (err) return returnData(err);
        db.get().query('SELECT * ' +
            '           FROM Users' +
            '           WHERE first like ? and last like ?',
            [first, last],
            (err, result) => {
                if (err) return returnData(err);
                returnData(err, countData, result, page, rows);
            });
    }

    function returnData(err, countData, recordData, page, rows) {
        if (err) {
            res.send(err);
            return;
        }

        console.log(countData);
        console.log(recordData);
        console.log(`page: ${page}`)
        console.log(`rows: ${rows}`)
        let count = countData[0].count;
        console.log(`count: ${count}`)
        let retval = {};
        retval.page = page;
        retval.total = Math.ceil(count / rows);
        retval.records = count;
        retval.rows = recordData;
        res.send(JSON.stringify(retval));
    }


    countQuery(null, first, last, page, rows);
})

The mainline starts by calling countQuery, which queries for the count and passes the result to dataQuery.  dataQuery queries the actual data and passes it, along with the count result, to returnData, which builds the JSON and returns it to the caller.

This approach works, but it is not a good use of node.js.  Besides all the hardcoding needed to control the execution sequence, it forces an ordering on the queries that we don’t need: we should be able to run the count and data queries in parallel, and process the results only when both are available.

There are at least three ways to accomplish this:

  1. Code it by hand in Javascript
  2. Use the async library
  3. Use promises

Approach 2: Parallel execution using Javascript and callbacks

One way is to use plain callbacks.  It works something like this:

  1. Create a queue of the statements to be run
  2. Run the statements
  3. Have each statement’s callback remove its statement from the queue and check whether the queue is empty
  4. If the queue is empty, process the statement results

Beyond the concept of callbacks (which is central to node), this approach is initially attractive because it doesn’t require any dependencies, but it can easily explode in complexity as callbacks beget callbacks beget callbacks.  This has its own name: “callback hell”.

This approach also presents challenges for error handling: because the asynchronous calls have their own call stacks, throwing an error doesn’t work.  Errors have to be passed along the chain until they reach the final callback function, which essentially does the work of a catch block.
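A minimal, self-contained sketch of that convention (step and finalCallback are illustrative names, not part of the grid code):

// node convention: an error travels as the first argument of every callback
function step(err, data, next) {
    if (err) return next(err);   // forward the error instead of throwing it
    next(null, data * 2);        // otherwise do the work and pass the result on
}

function finalCallback(err, result) {
    if (err) return console.error('failed:', err);   // plays the role of a catch block
    console.log('result:', result);
}

step(null, 21, finalCallback);                  // result: 42
step(new Error('boom'), null, finalCallback);   // failed: Error: boom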

Another challenge in this particular example is that we have to provide some scope where the results of the different queries can be stored and aggregated.  Since an asynchronous call does not return a value to the calling function, we need somewhere to hold each query’s results.  The most obvious place would be a global variable, but that has problems of its own.

In the code below, we use Javascript closures to store the values.  Each query is handed a new callback function, generated by the makeCallback() function inside fork().  Because of Javascript’s closures, the all_results variable defined in fork() is accessible to the generated callbacks, which use it to store the intermediate query results.

/* GET users listing. Runs the queries in parallel using callbacks and closures */
router.get('/cpsParallel', function (req, res, next) {
    let page = req.query.page ? req.query.page : 1;
    let rows = req.query.rows ? req.query.rows : 10;
    let sidx = req.query.sidx ? req.query.sidx : 'id';
    let sord = req.query.sord ? req.query.sord : 'asc';
    let count = 0;

    let first = req.query.first ? req.query.first : '%';
    let last = req.query.last ? req.query.last : '%';

    // thanks to slebetman from http://stackoverflow.com/questions/4631774/coordinating-parallel-execution-in-node-js
    function fork(async_calls, shared_callback) {
        var counter = async_calls.length;
        var all_results = [];

        function makeCallback(index) {
            return function () {
                counter--;
                var results = [];

                for (var i = 1; i < arguments.length - 1; i++) {
                    console.log(`i: ${i}.  arguments: ${JSON.stringify(arguments[i])}`)
                    results.push(arguments[i]);
                }

                all_results[index] = results;
                if (counter == 0) {
                    shared_callback(all_results);
                }
            }
        }

        for (var i = 0; i < async_calls.length; i++) {
            async_calls[i](makeCallback(i));
        }
    }

    function query1(callback) {
        db.get().query('SELECT count(*) as count ' +
            '           FROM Users' +
            '           WHERE first like ? and last like ?',
            [first, last],
            callback);   // the generated callback receives (err, rows, fields)
    }

    function query2(callback) {
        db.get().query('SELECT * ' +
            '           FROM Users' +
            '           WHERE first like ? and last like ?',
            [first, last],
            callback);
    }

    function returnJSON(result) {
        console.log(`result: ${JSON.stringify(result)}`);
        let count = result[0][0][0].count;
        let data = result[1][0];
        let retval = {};
        retval.page = page;
        retval.total = Math.ceil(count / rows);
        retval.records = count;
        retval.rows = data;
        res.send(JSON.stringify(retval));
    }

    // kick it all off: run both queries in parallel, then build the response
    fork([query1, query2], returnJSON);
})

For someone coming to Javascript, this use of closures is novel and perhaps a little difficult to follow.  And the whole bookkeeping process of tracking which functions have completed is something that could be, and has been, encapsulated.  We will see one approach to that next.

Approach 3: Using the Async module

The Async module was written to resolve many of the challenges of asynchronous node.js programming.  We can use it to simplify the prior example like so:

/* GET users listing. Uses the async module */
router.get('/async', function (req, res, next) {

    let page = req.query.page ? req.query.page : 1;
    let rows = req.query.rows ? req.query.rows : 10;
    let sidx = req.query.sidx ? req.query.sidx : 'id';
    let sord = req.query.sord ? req.query.sord : 'asc';
    let count = 0;

    let first = req.query.first ? req.query.first : '%';
    let last = req.query.last ? req.query.last : '%';

    async.parallel([

            function query1(callback) {
                db.get().query('SELECT count(*) as count ' +
                    '           FROM Users' +
                    '           WHERE first like ? and last like ?',
                    [first, last],
                    callback);   // async collects each task's (rows, fields) for us
            },

            function query2(callback) {
                db.get().query('SELECT * ' +
                    '           FROM Users' +
                    '           WHERE first like ? and last like ?',
                    [first, last],
                    callback);
            }
        ],

        function returnJSON(err, result) {
            if (err) {
                res.send(err);
                return;
            }
            console.log(`result: ${JSON.stringify(result)}`);
            let count = result[0][0][0].count;
            let data = result[1][0];
            let retval = {};
            retval.page = page;
            retval.total = Math.ceil(count / rows);
            retval.records = count;
            retval.rows = data;
            res.send(JSON.stringify(retval));
        });

})

Approach 4: Using Promises

The async library works well, and resolves many of the issues, but there is another approach as well.  We can keep promises.

Promises are essentially placeholders for results.
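A toy example of the idea (setTimeout stands in for any asynchronous operation):

// a promise is a placeholder for a value that will arrive later
let answer = new Promise((resolve, reject) => {
    setTimeout(() => resolve(42), 100);   // the async work eventually keeps the promise
});

answer.then(value => console.log(value)); // prints 42 once the promise is kept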

So, in our example, we want to say:

  1. We promise we will get a result for statement 1
  2. We promise we will get a result for statement 2
  3. When both of these promises are kept, we will process the statements

The actual code cleanly reflects this approach:
/* GET users listing. Uses promises */
router.get('/', function (req, res, next) {
    let page = req.query.page ? req.query.page : 1;
    let rows = req.query.rows ? req.query.rows : 10;
    let sidx = req.query.sidx ? req.query.sidx : 'id';
    let sord = req.query.sord ? req.query.sord : 'asc';
    let count = 0;

    let first = req.query.first ? req.query.first : '%';
    let last = req.query.last ? req.query.last : '%';

    // get total number of rows in dataset
    let prm1 = new Promise((resolve, reject) => {
        db.get().query('SELECT count(*) as count ' +
            '           FROM Users' +
            '           WHERE first like ? and last like ?',
            [first, last],
            (err, result) => {
                if (err) return reject(err);
                resolve(result);
            })
    })

    // get user data
    let prm2 = new Promise((resolve, reject) => {
        db.get().query('SELECT * ' +
            '           FROM Users' +
            '           WHERE first like ? and last like ?',
            [first, last],
            (err, result) => {
                if (err) return reject(err);
                resolve(result);
            })
    })

    Promise.all([prm1, prm2])
        .then((values) => {
            let count = values[0][0].count;
            let retval = {};
            retval.page = page;
            retval.total = Math.ceil(count / rows);
            retval.records = count;
            retval.rows = values[1];
            res.send(JSON.stringify(retval));
        })
        .catch(err => {
            console.log(`Stuff broke ${err}`);
            res.status(500).send(err);
        })

})

Some useful references:

Node.js Design Patterns, Second Edition, by Mario Casciaro

Is the top 1% the right group to be discussing when it comes to wage inequality?

Much of the media discussion of increasing wage inequality in the US is framed as a discussion of how the top 1% is growing richer relative to the rest of the population.

The World Top Incomes Database, hosted by the Paris School of Economics, provides a breakdown of yearly incomes for many major countries since the early 1900s.

Using this data for the US, here is a breakdown of average (mean) income vs. the top 10% and the top 1% for most of the last 100 years:

[chart: mean income vs. top 10% and top 1%, US, over the last century]

So it would appear that the top 1% has been doing very well for the last few decades.

But is it all of the top 1%?  To check this, let’s break down the top 1%:

[chart: breakdown of income growth within segments of the top 1%]

We see that the blue and purple groups, those not in the top 0.1%, are getting crumbs.  The slice between the top 0.1% and the top 0.01% is getting some, but it is the top hundredth of a percent that has gotten the lion’s share of any income increases in the last few decades.

It is good to be rich.

Pandas data analysis for the SQL Guy

Here are some introductory examples for using the excellent python data analysis library pandas.

Coming from a database background, I found I spent a fair amount of time looking up how to do things that would be second nature in SQL.  The intent of this blog is to shortcut that learning curve.

Setup

The easiest way to get running with pandas (as of 08/13) is the Anaconda download at https://store.continuum.io/.  This installs python and all the requisite libraries.

I have been using the ipython command shell to work with pandas interactively, and it works well.  Start it from the command line with: ipython --pylab

Save your work using the %save command (this means you don’t have to start from scratch each session):

    %save <filename> <startline>-<endline>

Run a saved file as:

    %run <filename>

Beyond the online documentation, Python for Data Analysis (http://www.amazon.com/books/dp/1449319793) is very good.

Data for examples:

Sample data:  EOD stock data from EODData.com.

Data is staged in a nasdaq_daily.csv file that looks like this:

AACC,2012-01-02,3.91,3.91,3.91,3.91,0
AAME,2012-01-02,1.97,1.97,1.97,1.97,0
AAON,2012-01-02,20.49,20.49,20.49,20.49,0
AAPL,2012-01-02,405,405,405,405,0
AAWW,2012-01-02,38.43,38.43,38.43,38.43,0

Imported into a DB (Sqlite), the stocks table looks like this:

CREATE TABLE nasdaq_daily (
  ticker text,
  date text,
  open real,
  high real,
  low real,
  close real,
  volume real
)

The second table contains twitter data that looks like this:

1,goog,2013-06-20,488
2,vod,2013-06-30,1

Imported into a DB, the table looks like this:

CREATE TABLE `twitter_data` (
  id INTEGER primary key,
  ticker CHAR(5) NOT NULL,
  date DATETIME NOT NULL,
  count INTEGER
)


# set up the environment and import data

from pandas import DataFrame, Series
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Import csv data
stocks = pd.read_csv("nasdaq_daily.csv", names=['ticker', 'date', 'open', 'high', 'low', 'close', 'volume'])

# Import database data
import sqlite3
import pandas.io.sql as sql
con = sqlite3.connect('stockdata.db')
tweets = sql.read_frame('select * from twitter_data', con)

#start of data manipulation

# select * from stocks where ticker = 'GOOG'
goog = stocks[stocks.ticker == 'GOOG']

# select * from stocks where ticker in ('GOOG', 'VOD', 'AAPL', 'CSCO', 'AMGN', 'QCOM', 'MSFT', 'AMZN', 'CMCSA', 'INTC')
mystocks = stocks[stocks.ticker.isin(['GOOG', 'VOD', 'AAPL', 'CSCO', 'AMGN', 'QCOM', 'MSFT', 'AMZN', 'CMCSA', 'INTC'])]

# update mystocks set ticker = lower(ticker) (to match the lowercase tickers in twitter_data)
mystocks['ticker'] = mystocks['ticker'].map(lambda x: x.lower())

# select * from stocks s  join twitter_data t on  s.ticker = t.ticker and s.date = t.date
merged = pd.merge(tweets, mystocks)  # same-named columns in the two tables are joined; join columns can also be specified

# select date, count, close from <previous join>
merged = merged[['date', 'count', 'close']]

# select ticker, avg(close), min(close), max(close)
# from stocks
# group by ticker
stocks[['ticker', 'close']].groupby('ticker').agg([np.mean, np.min, np.max])

Close Encounters on the Appalachian Trail, by Bill Walker

Just finished reading Skywalker: Close Encounters on the Appalachian Trail, by Bill Walker.

Good light read, and one that potentially saved me lots of money.  For as long as I can remember, I’ve had a thru-hike of the Appalachian Trail or the Pacific Crest Trail on the list of things I’d like to do before I die.

Having read this, it seems to be one of those many trips best enjoyed vicariously.  Way too many descriptions of nights spent on cold mountains, soaked to the bone and fighting off hypothermia.  Way too many weeks preoccupied with how to deal with bears should one attack (there was one close encounter).  When a night’s sleep in the bathrooms at Clingman’s Dome in the Smokies is the subject of joyful anticipation, I gain some appreciation for my wind-blocking walls, my sturdy roof, and my comfy bed.

Enjoyed the book, and recommend it to anyone who thinks s/he might have the hiking bug.  At the very least, it is a good reality check.

I’ll be sticking to day hikes (well, maybe a section hike someday).