Deploy your H2O model without hassle in plain SQL

3 min readMar 6, 2019

I consistently feel (warranted) pressure to improve my model with more elaborate feature engineering only to later facepalm when I need to put that model into production. The path forwards to deploy a complex model is clear, just make an API! This post represents the other end of the spectrum. Let’s do as much in the database as possible.

tl;dr

You can still leverage all the power of Python/R to build and validate a generalized linear model. These functions will allow you to quickly deploy your model in a database as raw SQL code.

Who this is for

I have written this especially for

Data Science practitioners
who are asked to deliver interpretable linear models quickly
whose data starts in a SQL database
whose predictions end up in a SQL database
who want to keep is simple

Functions

In this post we’ll use the R version.

But you could just as easily use the equivalent Python.

Walkthrough

In this post I’ll build a purposefully simple model for all-cause readmissions based on a demo sample of the MIMIC-III critical care database. This is a relational database that keeps track of all of a patient’s medical information (labs, diagnoses, vitals, etc) during a single hospital stay.

I purposefully did all the feature engineering / denormalization in SQL so that my deployment script would be this simple.

In another post I’ll speak about how I got to this view from the MIMIC source tables. The important thing to know here is that the rows represent a cross-section of individuals across time. This is a really simple model, and so each of the columns represent a binary indicator (encoded as 1=present, 0=absent).

Now that we’ve inspected the data, let’s bring it into a dataframe.

With our data in hand we can build a simple logistic regression model with H2O.

Then translate it into SQL.

The power here is that we can cut R (or Python) out of the loop entirely. All that we need to do is put this into an appropriate SQL structure, like a view.

This function is only useful if it faithfully returns the same predictions. Hooray for the 1:1 line!

Make your life simple

Some of my most successful deployments have followed this pattern. Without any complications from R or Python code to mess anything up, these views silently did their job and never failed.

Feel free to reach out and connect if this is helpful to you!

Deploy your H2O model without hassle in plain SQL

tl;dr

Who this is for

Functions

Make your life simple

Written by Nathan Sutton