How to set up VS Code for Data Science

Feedback is welcomed and expected! :)

Why?

I recently started a Data Science course, where we learnt to use Amazon SageMaker Studio Lab (ASL) to create and run our DS projects. ASL is a free Machine Learning (ML) development environment that provides a web-based virtual interface for all Data Science and Machine Learning steps. It's really easy to set up and use, but I found one drawback for my use case. I wanted a web interface for my Data Science apps, so I needed to test them locally and then deploy them to Heroku. Unfortunately, ASL doesn't support a browser in its virtual environment, so I decided to set things up locally :)

What?

In this blog I'll walk you through the steps to set up a dev environment for Data Science and Machine Learning locally. This is ideal for learning and for quickly prototyping ideas and applications, but not for training production models, as that might require a lot of processing power. We are going to use Docker and Visual Studio Code to set up the environment. We will also set up a few VS Code extensions along the way.

How?

Step 1: Install and set up Docker

Install Docker by following the steps for your platform here

Step 2: Install and set up Visual Studio Code

Install Visual Studio Code by following the steps for your platform here

Step 3: Set up workspace

In your workspace create a directory called Data Science. This will be the root directory for all Data Science related projects and applications.

Step 4: Set up Remote Containers

Open the newly created Data Science directory in VS Code. Press Ctrl + Shift + P on Windows or Cmd + Shift + P on Mac to open the Command Palette in VS Code.

Step 4.1: Add Development Container

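In the Command Palette, search for Remote-Containers and you should see a list of Remote-Containers commands. Select Remote-Containers: Add Development Container Configuration Files. If no such commands show up, the Remote - Containers extension (named Dev Containers in newer VS Code releases, where the commands carry that prefix instead) probably needs to be installed first.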

Step 4.2: Select Anaconda (Python 3)

You should see the list of Dev Containers. These are various Docker dev environments that VS Code offers out of the box. From the list, select Anaconda (Python 3).

Step 4.3: Select Node version

For the Node version, you can select lts or none depending on your use case.

Step 4.4: Skip additional features

Do not select any additional features; just click OK. You should now see a new folder called .devcontainer in the directory. The two files to look into are:

  • devcontainer.json: This file contains all the VS Code related options, such as settings and extensions, that we want when we run VS Code from the Docker container.

  • Dockerfile: This file defines how the container is built and installs all the required dependencies.
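
If you want to double check that both files were generated, a small script along these lines can do it. This is only a minimal sketch: the function name check_devcontainer is my own, and it assumes the folder layout described above (a .devcontainer folder containing devcontainer.json and Dockerfile).

```python
from pathlib import Path

# File names taken from the step above; the helper name is hypothetical.
EXPECTED_FILES = ("devcontainer.json", "Dockerfile")


def check_devcontainer(project_root):
    """Return {file name: True/False} for the expected files in .devcontainer."""
    config_dir = Path(project_root) / ".devcontainer"
    return {name: (config_dir / name).is_file() for name in EXPECTED_FILES}


if __name__ == "__main__":
    # Run from the Data Science root directory.
    for name, found in check_devcontainer(".").items():
        print(f"{name}: {'found' if found else 'missing'}")
```

Run it from the root directory; any file reported as missing suggests the configuration step did not complete.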

Step 5: Build and open Docker container.

Next, we need to build the container and open the folder in it. To do that, open the Command Palette by pressing Ctrl + Shift + P on Windows or Cmd + Shift + P on Mac. Search for Remote-Containers and run Remote-Containers: Rebuild and Reopen in Container. This might take a few minutes depending on your internet connection and machine, but luckily we only need to do it once.

Step 5.1: Verify Docker container.

Once the container is built, VS Code automatically maps the local directory to the workspace directory in the container and reloads the IDE. You should see Dev Container: Anaconda (Python 3) in the lower left corner of VS Code. This means your folder is now opened inside the container.
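
Besides the status bar label, you can also ask Python whether it seems to be running in a container. The sketch below is a best-effort guess rather than a guarantee. It rests on two assumptions: that the VS Code container extension sets a REMOTE_CONTAINERS environment variable, and that Docker creates a /.dockerenv file at the container root. The function name is my own.

```python
import os
from pathlib import Path


def looks_like_dev_container(environ=None, marker="/.dockerenv"):
    """Best-effort guess; both signals are heuristics, not guarantees."""
    environ = os.environ if environ is None else environ
    # Assumption: the VS Code container extension sets REMOTE_CONTAINERS=true.
    in_vscode_container = environ.get("REMOTE_CONTAINERS", "").lower() == "true"
    # Assumption: Docker places a /.dockerenv marker file in the container.
    in_docker = Path(marker).exists()
    return in_vscode_container or in_docker


print("Inside a container:", looks_like_dev_container())
```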

Step 5.2: Verify Installations

Confirm the installations by opening a terminal in VS Code from the Command Palette.

  • Confirm the Python version by running which python and python --version.

  • Confirm Conda is installed by running conda --version.

  • Check the installed extensions by pressing Ctrl + Shift + X on Windows or Cmd + Shift + X on Mac.
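
The Python and Conda checks above can also be collected in one place. Here is a minimal sketch using only the standard library; the function name environment_report and the dictionary keys are my own choices, not part of any tool.

```python
import platform
import shutil
import sys


def environment_report():
    """Collect roughly what `which python`, `python --version` and `which conda` show."""
    return {
        "python_version": platform.python_version(),
        "python_path": sys.executable,
        # shutil.which returns None when conda is not on PATH.
        "conda_path": shutil.which("conda"),
    }


for key, value in environment_report().items():
    print(f"{key}: {value}")
```

If conda_path prints None, Conda is not on the PATH of the interpreter you ran this with.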

Step 6: Create Jupyter Notebook

Let's test the setup by creating a Jupyter Notebook. To do that:

  • Create a new project directory called hello_world

  • Open the Command Palette in VS Code and run the following command to create a new Jupyter Notebook

Create: New Jupyter Notebook
  • Save the notebook as hello_world.ipynb and select the hello_world directory as the destination.

  • From the VS Code Command Palette, run the following command to select the Conda interpreter for the notebook.
Jupyter: Select Interpreter to Start Jupyter Server

  • Edit hello_world.ipynb and insert the following commands in the first cell to install the requirements. Installation should take a few minutes depending on internet speed and machine. Once the packages are installed, you can comment these lines out so they don't run again.
! conda install -c plotly plotly_express  -y
! conda install pandas -y
! conda install numpy -y
  • In the next cell, import the libraries
# import the python libraries

import numpy as np
import pandas as pd
import plotly.graph_objs as go
import plotly.express as px
  • Finally, let's create a bar chart for a quick test
# define some variables
x_values=['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
y_values=[15, 12, 8, 20, 19]

# create the data trace
trace = go.Bar(x=x_values, y=y_values)

# combine into a figure
fig = go.Figure([trace])
fig
  • If the bar chart renders below the cell, the setup is working as expected
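
Since plotly.express is imported above but not used in the test, here is a sketch of the same chart built with it instead of a graph_objs trace. The column names day and count and both helper names are my own choices. The plotly import is kept inside the function so the data check also runs where plotly is not installed.

```python
def build_chart_data(x_values, y_values):
    """Pair the labels with their values, refusing mismatched lengths."""
    if len(x_values) != len(y_values):
        raise ValueError("x_values and y_values must have the same length")
    # "day" and "count" are illustrative column names.
    return {"day": list(x_values), "count": list(y_values)}


def make_bar_figure(data):
    """Build the bar chart with plotly.express from a dict of columns."""
    import plotly.express as px  # imported lazily, only needed for plotting

    return px.bar(data, x="day", y="count")


data = build_chart_data(
    ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday"],
    [15, 12, 8, 20, 19],
)
# In a notebook cell, end the cell with: make_bar_figure(data)
```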

Step 7: Heroku Setup

Finally, let's install the Heroku CLI to create and deploy Heroku apps. Run the following command in the VS Code terminal to install it:

curl https://cli-assets.heroku.com/install.sh | sh

Log in to the CLI by running:

heroku login -i

If you have 2FA turned on, use the API Key from Heroku > Account Settings > API Key on the web portal as the password.

Step 7.1: Verify Heroku CLI

Verify the Heroku CLI installation by running the following command in the VS Code terminal:

heroku

We should see the Heroku version and supported commands.

Step 7.2: Create and deploy a test app

  • Now create a Python web application using Flask to quickly test our setup.
  • Open a terminal in VS Code, change into the hello_world directory, and run the following command to create a virtual environment
python -m venv env
  • You should now see a new env directory inside hello_world

  • Run the following command to activate the environment
source env/bin/activate
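
To confirm the activation worked, you can ask the interpreter itself. A minimal sketch, assuming the environment was created with python -m venv as above. The comparison of sys.prefix and sys.base_prefix is the standard way to detect a venv; the function name is my own.

```python
import sys


def in_virtualenv():
    """True when the running interpreter belongs to a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


print("Virtual environment active:", in_virtualenv())
print("Interpreter prefix:", sys.prefix)
```

After activation, running this with python should report True and a prefix ending in env.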

  • Run the following command to install the Flask library
pip install Flask

  • Run the following command to create requirements.txt
pip freeze > requirements.txt

  • Create an app.py file in the hello_world directory and add the following Python code to create a Hello World app.
from flask import Flask
app = Flask(__name__)

@app.route('/')
def hello_world():
    return 'Hello, World!'

if __name__ == "__main__":
    app.run()
  • Run the following command to start the server
python app.py

  • On a successful run, VS Code will automatically forward the port to our host machine.

  • Click Open in Browser or go to http://127.0.0.1:5000/ on your host machine to check our new Hello World web app locally.
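
If you would rather check the app from a second terminal than from the browser, a small standard library script can request the page. This is a sketch under the assumption that the server from the previous step is still running on port 5000; both function names are my own. Proxy settings are skipped on purpose, because the target is a local address.

```python
from urllib.request import ProxyHandler, build_opener

# Talk to the local server directly, ignoring any proxy environment variables.
_opener = build_opener(ProxyHandler({}))


def fetch_status_and_body(url, timeout=5):
    """Return (HTTP status, response text) for a GET request."""
    with _opener.open(url, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")


def check_app(url="http://127.0.0.1:5000/"):
    """Describe the result instead of raising when the server is down."""
    try:
        status, body = fetch_status_and_body(url)
    except OSError as exc:  # URLError and HTTPError are subclasses of OSError
        return f"not reachable: {exc}"
    return f"{status}: {body}"
```

Calling print(check_app()) while the Flask server is running should show the status code followed by the greeting returned by hello_world.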

  • To run the app on Heroku, we'll need to install the gunicorn web server by running the following command
pip install gunicorn
  • Make sure to update requirements.txt by running
pip freeze > requirements.txt
  • Create a Procfile to specify the commands executed by the Heroku app on startup. More info can be found here. Copy and paste the following contents into the Procfile
web: gunicorn app:app
  • Create runtime.txt to define the runtime environment for the Heroku app. Put the Docker container's Python version (as reported by python --version) in it
python-3.9.12
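
To avoid typing the version by hand, both files can be generated from the running interpreter. This is only a sketch: the helper names are my own, the Procfile line is the one shown above, and it assumes you run it with the same Python the app should use on Heroku.

```python
import sys
from pathlib import Path


def runtime_line():
    """Format the running interpreter's version in the runtime.txt style above."""
    version = sys.version_info
    return f"python-{version.major}.{version.minor}.{version.micro}"


def write_heroku_files(app_dir, procfile_line="web: gunicorn app:app"):
    """Write Procfile and runtime.txt into app_dir and return the runtime line."""
    app_dir = Path(app_dir)
    (app_dir / "Procfile").write_text(procfile_line + "\n")
    (app_dir / "runtime.txt").write_text(runtime_line() + "\n")
    return runtime_line()


if __name__ == "__main__":
    # Run from inside the hello_world directory.
    print("Wrote Procfile and runtime.txt with", write_heroku_files("."))
```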
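  • Initialize git in this directory by running the following commands. You may want to add env/ to a .gitignore file first so that the virtual environment is not committed.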
git init
git add .
git commit -m "My first commit"
  • Create a new Heroku app by running the following command
heroku create

  • This will not only create a new Heroku app but also add a new remote named heroku to our git repository. Run the following command to confirm that:
git remote -v

  • Finally, run the following command to deploy the app to Heroku (if your local branch is named main rather than master, push main instead)
git push heroku master

  • The Heroku CLI will build and deploy the app, which can then be accessed at the URL published in the console logs

  • We can also tail the logs by running
heroku logs --tail
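
If the push or the app startup fails, one thing worth ruling out is a missing file or package from the steps above. The sketch below checks for the four files we created and for Flask and gunicorn in requirements.txt. The function name and the message wording are my own, and it assumes the name==version lines that pip freeze writes.

```python
from pathlib import Path

# Files and packages taken from the steps above.
REQUIRED_FILES = ("app.py", "requirements.txt", "Procfile", "runtime.txt")
REQUIRED_PACKAGES = ("flask", "gunicorn")


def find_deploy_problems(app_dir):
    """Return a list of human-readable problems; an empty list means none found."""
    app_dir = Path(app_dir)
    problems = [
        f"missing file: {name}"
        for name in REQUIRED_FILES
        if not (app_dir / name).is_file()
    ]
    requirements = app_dir / "requirements.txt"
    if requirements.is_file():
        # pip freeze writes lines such as "Flask==<version>"; keep the names only.
        pinned = {
            line.split("==")[0].strip().lower()
            for line in requirements.read_text().splitlines()
            if line.strip() and not line.lstrip().startswith("#")
        }
        problems += [
            f"not in requirements.txt: {package}"
            for package in REQUIRED_PACKAGES
            if package not in pinned
        ]
    return problems


if __name__ == "__main__":
    # Run from inside the hello_world directory.
    for problem in find_deploy_problems(".") or ["no problems found"]:
        print(problem)
```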

The End

So that was it, that's how I've set up my local environment using Docker and VS Code. Please feel free to comment with any suggestions, improvements or issues.

Happy Coding!

Did you find this article valuable?

Support Gaurang Dave by becoming a sponsor. Any amount is appreciated!