Abstract your machine

26 May 2015

Since working on Snapsat, my development workflow has changed considerably. Previously I did the majority of my work locally: downloading the relevant data to my machine, processing it, and then uploading it elsewhere to be presented. It’s a common pattern, and while it works, I’m convinced there’s a better way, which is what I’m going to show you today.

We’ll cover:

Assumptions I’m making about you:

Creating a machine

First, let’s install the AWS command line interface.

pip install awscli

The aws API is dense — it’s worth taking the time to configure command autocompletion. Fight the urge to create an instance using the AWS web interface. Over the long run, having the ability to interact with the AWS services via the command line will save you time.
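As a minimal sketch of the autocompletion setup, assuming you use bash and that pip placed the aws_completer script on your PATH next to aws, something like the following could go in your ~/.bashrc. Other shells need a different setup.

```shell
# Enable tab completion for the aws command in bash.
# Assumption: aws_completer was installed alongside aws by pip.
if [ -n "${BASH_VERSION:-}" ] && command -v aws_completer >/dev/null 2>&1; then
    complete -C aws_completer aws
fi
```

After opening a new shell, typing aws ec2 desc and pressing Tab should offer the matching subcommands.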

Next, we’re going to make sure that aws has the credentials necessary to interact with your AWS account.

aws configure

If you don’t have them already, you’ll need to go ahead and generate some access credentials. You’ll be prompted for an access_key_id and a secret_access_key, followed by a default region and output format.
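Before going further, it can help to confirm that the credentials actually work. Here’s a small sketch, assuming a reasonably recent awscli that includes the sts get-caller-identity command. The function name check_credentials is my own, not part of the AWS tooling.

```shell
# Report which AWS account the configured credentials belong to.
# check_credentials is a hypothetical helper name used for illustration.
check_credentials() {
    if account=$(aws sts get-caller-identity --query 'Account' --output text); then
        echo "Credentials OK for account $account"
    else
        echo "Could not authenticate; try re-running 'aws configure'" >&2
        return 1
    fi
}

# Usage (not run automatically):
#   check_credentials
```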

Before we begin accessing our instance using ssh, we’re going to open up port 22 so that we can actually connect to it. You’ll need to create a security-group to do that.
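Here’s a rough sketch of how that might look. It assumes a group named ssh and a key pair named personal-power, matching the names used in the run-instances command in this post. The function names are hypothetical helpers. Opening port 22 to 0.0.0.0/0 is the simplest option but also the least restrictive, so pass your own address range if you can.

```shell
# Create a security group and allow inbound SSH (TCP port 22).
# $1 = group name (default: ssh), $2 = allowed CIDR range
create_ssh_group() {
    group_name="${1:-ssh}"
    cidr="${2:-0.0.0.0/0}"   # assumption: open to any address; narrow this if you can

    aws ec2 create-security-group \
        --group-name "$group_name" \
        --description "Allow inbound SSH" &&
    aws ec2 authorize-security-group-ingress \
        --group-name "$group_name" \
        --protocol tcp \
        --port 22 \
        --cidr "$cidr"
}

# Create a key pair and save the private key with owner-only read permissions.
# $1 = key pair name, $2 = file to write the private key to
create_key_pair() {
    aws ec2 create-key-pair \
        --key-name "$1" \
        --query 'KeyMaterial' \
        --output text > "$2" &&
    chmod 400 "$2"
}

# Usage (not run automatically):
#   create_ssh_group ssh
#   create_key_pair personal-power ~/.ssh/personal-power.pem
```

If you already have a key pair registered with AWS, you can skip create_key_pair and reuse it.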

Assuming your credentials are correct, you should now be able to spin up an Ubuntu 14.10 EC2 instance. Notice that we pass in a key-name and a security-group. The key-name must refer to a key pair that already exists in your account, or the command will fail. Without a key pair and the security-group, you won’t be able to log in via ssh.

aws ec2 run-instances \
    --image-id ami-c5ccfcf5 \
    --count 1 \
    --instance-type t2.small \
    --key-name personal-power \
    --security-groups ssh \
    --query 'Instances[0].InstanceId'


The --query flag will list the instance-id of our machine. Using that, we can identify the IP address we’ll use to log in.

aws ec2 describe-instances \
    --instance-ids i-5ce9f995 \
    --profile default \
    --query 'Reservations[0].Instances[0].PublicIpAddress'


Let’s log in.

ssh -i /Users/j/.ssh/personal-power.pem ubuntu@
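If you’d rather not copy the IP address by hand, the lookup and the login can be chained together. This is only a sketch: instance_ip and ssh_to_instance are hypothetical helper names. Note that the instance-running waiter only reports the instance state, so sshd may still need a few more seconds before it accepts connections.

```shell
# Look up the public IP address of an instance by its ID.
instance_ip() {
    aws ec2 describe-instances \
        --instance-ids "$1" \
        --query 'Reservations[0].Instances[0].PublicIpAddress' \
        --output text
}

# Wait until the instance is running, then open an ssh session to it.
# $1 = instance ID, $2 = path to the private key file
ssh_to_instance() {
    aws ec2 wait instance-running --instance-ids "$1" &&
    ssh -i "$2" "ubuntu@$(instance_ip "$1")"
}

# Usage (not run automatically):
#   ssh_to_instance i-5ce9f995 ~/.ssh/personal-power.pem
```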

Configuring our machine

Alright, we’re in, but we’re still running a vanilla Ubuntu installation that’s missing most of the libraries we’re going to need to work with Landsat data. We’ll use Docker to remedy that.

Install the Docker dependencies like so:

sudo apt-get update
sudo apt-get upgrade
wget -qO- https://get.docker.com/ | sh
sudo docker run hello-world

Prefacing all of our Docker commands with sudo is poor form, so we’re going to add our user to the ‘docker’ group, which will allow us to run it without requiring root permissions. The change only takes effect in a new session, so log out of the instance and then log back in.

sudo usermod -aG docker ubuntu
ssh -i /Users/j/.ssh/personal-power.pem ubuntu@
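Once you’re back in, a quick check can confirm that the group change took effect. This is a small sketch; in_group is a hypothetical helper name, and it only inspects the groups of the current session.

```shell
# Succeed if the current session's user belongs to the named group.
# $1 = group name (default: docker)
in_group() {
    id -nG | tr ' ' '\n' | grep -qxF "${1:-docker}"
}

# Usage (not run automatically):
#   if in_group docker; then
#       docker run hello-world
#   else
#       echo "Not in the docker group yet; log out and back in" >&2
#   fi
```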

Now it’s time to get fancy. If we use Docker to pull down docker-landsat-util, we’ll be getting landsat-util and all of its dependencies.

docker run jacquestardie/docker-landsat-util

That command is the equivalent of running landsat without any parameters. Let’s do something a little more useful.

docker run jacquestardie/docker-landsat-util download LC80110282014262LGN00 -p

That should create something like:

Better, but our data is still stuck inside of a Docker container on a remote EC2 instance. Making landsat composites without any means of actually being able to access them isn’t very useful. Luckily, landsat-util provides a means of uploading our images to S3. Let’s try that out.

docker run jacquestardie/docker-landsat-util download LC80110282014262LGN00 -p \
      --upload \
      --key <YOUR AWS ACCESS KEY> \
      --secret <YOUR AWS SECRET KEY> \
      --bucket <YOUR BUCKET> \
      --region s3-us-west-2.amazonaws.com

Sweet! Let’s recap what just happened:

  1. You spun up an EC2 instance and applied the appropriate permissions to it.
  2. You used Docker to install landsat-util and its requisite environment.
  3. You used landsat-util to create a Landsat composite and upload it to S3.
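To confirm that the upload in step 3 landed where you expect, you can list the bucket and pull the files down from any machine with aws configured. This is a sketch under one assumption: I don’t rely on any particular key layout inside the bucket, so it lists and syncs the whole thing. The helper names list_uploads and fetch_uploads are hypothetical.

```shell
# List every object in the bucket.
# $1 = bucket name
list_uploads() {
    aws s3 ls "s3://$1/" --recursive
}

# Copy the bucket's contents into a local directory.
# $1 = bucket name, $2 = local destination directory
fetch_uploads() {
    aws s3 sync "s3://$1/" "$2"
}

# Usage (not run automatically):
#   list_uploads my-bucket
#   fetch_uploads my-bucket ./landsat
```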

Why this is an improvement

Admittedly, that’s a lot of work to run landsat-util. You could certainly get up and running faster working locally. However, by using EC2, you get to make use of a significantly faster network, and you’ve isolated (and protected) the means by which you can create future Landsat scenes. While working locally might get you the first 10 scenes faster, over the long term you’ll save a considerable amount of time using EC2. Furthermore, as your workflow or application grows in complexity, being able to use Docker to keep things organized will help keep you sane.

In the next few weeks, I’ll be covering some more advanced use cases. If you have any questions, don’t hesitate to get in touch!