A template for your ML take home

Dockerized with Jupyter, Postgres, FastAPI, and Streamlit

Nathan Sutton
Aug 12, 2022

One common interview practice for machine learning roles is to send the candidate a small take home problem. It is usually framed like this:

Here are some data and instructions

Build a model for us using a Jupyter notebook

Send us the notebook and/or some slides and we’ll talk about it

Years ago, when we gave this problem to a candidate, he went further and actually deployed a small model inside a web application. We all really appreciated his drive to take a vision end to end, and this template repository is meant to give a similar flavor. It will allow you to focus on the problem at hand with less boilerplate code and a much better deliverable!

With this repository you can be sure that your work can be reproduced by the hiring committee, because it’s dockerized! In my experience it pays dividends to actually walk through a notebook instead of just looking at the output.

With this repository you can take your model and deploy it as a REST API. This shows you can think beyond your interactive work in notebooks.

With this repository your users can interact with your model in a web application (albeit a crude one). This pays dividends, as the hiring committee can inspect your model’s outputs and see its latency.

Services

This repository exposes four components that are useful in a take home assignment.

  • A container running Jupyter notebooks with common machine learning libraries (port 8888). Any notebooks persist in a mounted volume (./volumes/notebooks).
  • A container running Postgres in case a relational database is useful (port 5432). Any transformations persist between container runs in a mounted volume (./volumes/postgres).
  • A container running FastAPI to serve predictions from a scikit-learn model (port 8080).
  • A container running Streamlit so a user can request predictions from the scikit-learn model based on form inputs (port 8501).
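The wiring between these services lives in a docker-compose.yml along these lines. This is only a sketch: the service names, image tags, build contexts, and container-side paths are my assumptions, not the repository's actual file.

```yaml
version: "3.8"
services:
  jupyter:
    image: jupyter/scipy-notebook        # assumed image
    ports: ["8888:8888"]
    volumes: ["./volumes/notebooks:/home/jovyan/work"]
  postgres:
    image: postgres:14
    ports: ["5432:5432"]
    volumes: ["./volumes/postgres:/var/lib/postgresql/data"]
    environment:
      POSTGRES_PASSWORD: example         # placeholder credential
  api:
    build: ./api                         # assumed build context
    ports: ["8080:8080"]
  ui:
    build: ./ui                          # assumed build context
    ports: ["8501:8501"]
    depends_on: [api]
```

The mounted volumes are what make the work portable: your notebooks and database state survive container restarts and travel with the repository.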

Usage

This template is completely dockerized, so starting all of the components takes a single command.

docker-compose up 

Data

This template includes an example dataset that I created by iteratively querying the New Jersey State Health Assessment Data; these data are also available as an extract on Kaggle. Full caveats: there is virtually no signal in these data for predicting premature birth outcomes. The dataset is merely meant to illustrate a problem.

REST API Endpoint

The model is available as a REST API endpoint on port 8080. It accepts JSON data that looks like one row of the dataframe the model was trained on.

curl --request POST http://127.0.0.1:8080/predict \
-H 'Content-Type: application/json' \
-d '{"age_group": "Under 15 yrs","reported_race_ethnicity": "White, non-Hispanic", "previous_births": "None","tobacco_use_during_pregnancy": "Yes","adequate_prenatal_care": "Inadequate"}'

Streamlit User Interface

A small web application collects the features that drive your model, then returns a prediction from the REST API.

Sample User Interface — all images by author.

Let’s Connect!

If you’d benefit from this work, clone the repository and chug away at your problem! If this is all a little daunting and you’d like help breaking into ML roles in industry, reach out to me or other mentors @ SharpestMinds.
