Since I started working on Snapsat, my development workflow has changed considerably. Previously I did the majority of my work locally: downloading the relevant data to my machine, processing it, and then uploading it elsewhere to be presented. It’s a common pattern, and while it works, I’m convinced there’s a better way, which is what I’m going to show you today.
What we’re going to cover:
- Efficiently creating remote machines with AWS
- Running landsat-util via Docker
- Why this is an improvement over your current workflow
Assumptions I’m making about you:
- You’re using OSX or Linux and are comfortable working in a terminal.
- You have a basic familiarity with AWS. If you don’t, start here.
Creating a machine
First, let’s install the AWS command line interface.
pip install awscli
The aws command line interface is dense, so it’s worth taking the time to configure command autocompletion. Fight the urge to create an instance using the AWS web interface: over the long run, being able to interact with AWS services from the command line will save you time.
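In bash, completion comes from the shell’s complete builtin pointed at the aws_completer program that pip installs alongside awscli. A minimal sketch, assuming bash (zsh needs extra setup); putting these lines in ~/.bashrc keeps completion on in new sessions:

```shell
# Bash: turn on tab completion for the aws command.
# aws_completer is installed alongside awscli; its location varies between
# systems, so look it up instead of hard-coding a path.
if command -v aws_completer >/dev/null 2>&1; then
  complete -C "$(command -v aws_completer)" aws
fi
```

Afterwards, typing aws ec2 desc and pressing Tab should offer the matching describe-* subcommands.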
Next, we’re going to make sure that aws has the credentials necessary to interact with your AWS account. If you don’t have them already, you’ll need to generate some access credentials first. Running aws configure will ask you to pass in an access_key_id and a secret_access_key, along with a default region and output format.
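Once aws configure has saved your credentials (under ~/.aws/), it can be worth checking that the CLI can actually authenticate before you launch anything. A minimal POSIX sh sketch: check_credentials is a hypothetical helper name, and aws ec2 describe-regions is used only because it is a cheap, read-only call:

```shell
# check_credentials is a hypothetical helper name; describe-regions is just a
# cheap, read-only call that fails if the CLI can't authenticate.
check_credentials() {
  if aws ec2 describe-regions --output text >/dev/null 2>&1; then
    echo "credentials OK"
  else
    echo "credentials missing or invalid; run: aws configure" >&2
    return 1
  fi
}
```

Run aws configure, then check_credentials; an error message means the keys or the default region still need fixing.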
Before we can reach our instance over ssh, we need to open up port 22 so that we can actually connect to it. You’ll need to create a security group to do that. Mine is named ssh and allows inbound TCP traffic on port 22.
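The security group can be created from the command line as well. A sketch in POSIX sh, assuming the default VPC (where groups can be referred to by name); create_ssh_group is a hypothetical helper, while create-security-group and authorize-security-group-ingress are real aws ec2 subcommands:

```shell
# create_ssh_group is a hypothetical helper; both aws ec2 subcommands are real.
# Assumes the default VPC, where security groups can be referenced by name.
# The default CIDR, 0.0.0.0/0, opens port 22 to every address on the internet;
# passing your own address (for example 203.0.113.4/32) is safer.
create_ssh_group() {
  group_name=${1:-ssh}
  cidr=${2:-0.0.0.0/0}
  aws ec2 create-security-group \
    --group-name "$group_name" \
    --description "Allow inbound SSH" || return 1
  aws ec2 authorize-security-group-ingress \
    --group-name "$group_name" \
    --protocol tcp \
    --port 22 \
    --cidr "$cidr"
}
```

Calling create_ssh_group with no arguments creates a group named ssh that is open to any address; create_ssh_group ssh 203.0.113.4/32 (with your own IP substituted) restricts port 22 to a single machine.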
Assuming your credentials are correct, you should now be able to spin up an Ubuntu 14.10 EC2 instance. Notice that we’ve passed in a key-name and a security-group. Without the key-name, the instance has no key pair and you won’t be able to authenticate; without the security-group, you won’t be able to log in via ssh.
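The launch command below expects a key pair called personal-power to exist already. If you don’t have one, a sketch like the following can create it and store the private key at ~/.ssh/personal-power.pem, the path the later ssh commands use. create_key_pair is a hypothetical helper; aws ec2 create-key-pair with --query KeyMaterial --output text is the usual way to get the raw key out:

```shell
# create_key_pair is a hypothetical helper; aws ec2 create-key-pair is real.
# Saves the private key as ~/.ssh/<name>.pem.
create_key_pair() {
  key_name=$1
  key_file="$HOME/.ssh/$key_name.pem"
  # Never overwrite an existing private key.
  if [ -e "$key_file" ]; then
    echo "$key_file already exists" >&2
    return 1
  fi
  mkdir -p "$HOME/.ssh"
  # umask 077 inside a subshell: the file is owner-only from the moment it exists.
  if ! (umask 077 && aws ec2 create-key-pair \
      --key-name "$key_name" \
      --query KeyMaterial \
      --output text > "$key_file"); then
    rm -f "$key_file"   # don't leave an empty key file behind
    return 1
  fi
  # ssh refuses private keys whose permissions are too open.
  chmod 400 "$key_file"
}
```

Run create_key_pair personal-power once; later calls refuse to overwrite the existing file.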
aws ec2 run-instances \
    --image-id ami-c5ccfcf5 \
    --count 1 \
    --instance-type t2.small \
    --key-name personal-power \
    --security-groups ssh \
    --query 'Instances[0].InstanceId'
"i-5ce9f995"
The --query flag trims the response down to the instance-id of our machine. Using that, we can look up the IP address we’ll use to log in.
aws ec2 describe-instances \
    --instance-ids i-5ce9f995 \
    --profile default \
    --query 'Reservations[0].Instances[0].PublicIpAddress'
"220.127.116.11"
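A freshly launched instance may still be pending when you first ask for its address. A small POSIX sh sketch that waits for the running state before querying; instance_ip is a hypothetical helper, while aws ec2 wait instance-running is a real waiter that polls until the state changes or gives up with a non-zero exit:

```shell
# instance_ip is a hypothetical helper; both aws ec2 subcommands are real.
instance_ip() {
  instance_id=$1
  # Polls until the instance is "running"; exits non-zero if AWS gives up.
  aws ec2 wait instance-running --instance-ids "$instance_id" || return 1
  # Explicit [0] indexes plus text output print the bare IP, without quotes.
  aws ec2 describe-instances \
    --instance-ids "$instance_id" \
    --query 'Reservations[0].Instances[0].PublicIpAddress' \
    --output text
}
```

With that in place, logging in can be a one-liner: ssh -i /Users/j/.ssh/personal-power.pem ubuntu@"$(instance_ip i-5ce9f995)"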
Let’s log in.
ssh -i /Users/j/.ssh/personal-power.pem ubuntu@220.127.116.11
Configuring our machine
Alright, we’re in, but we’re still running a vanilla Ubuntu installation that’s missing most of the libraries we’re going to need to work with Landsat data. We’ll use Docker to remedy that.
Install Docker like so:
sudo apt-get update
sudo apt-get upgrade
wget -qO- https://get.docker.com/ | sh
sudo docker run hello-world
Prefacing all of our Docker commands with sudo is poor form, so we’re going to add our user to the ‘docker’ group, which will allow us to run Docker without root permissions. The change only takes effect after logging out and back in.
sudo usermod -aG docker ubuntu
exit
ssh -i /Users/j/.ssh/personal-power.pem ubuntu@220.127.116.11
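Once you’re back in, it’s worth confirming the group change took effect before dropping sudo. A minimal POSIX sh sketch; in_group is a hypothetical helper built on id -nG, which lists the current user’s groups. (newgrp docker is another option: it starts a new shell with the group applied, without a full logout.)

```shell
# in_group is a hypothetical helper: succeeds if the current user is in group $1.
in_group() {
  # id -nG prints the user's group names separated by spaces; put one per
  # line and look for an exact, fixed-string match.
  id -nG | tr ' ' '\n' | grep -qxF "$1"
}

if in_group docker; then
  echo "docker group active: sudo is no longer needed"
else
  echo "not in the docker group yet: log out and back in" >&2
fi
```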
Back on the instance, we can now run landsat-util from its Docker image without sudo:
docker run jacquestardie/docker-landsat-util
That command is the equivalent of running landsat without any parameters. Let’s do something a little more useful.
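The download command that follows identifies the scene by its Landsat 8 scene ID, LC80110282014262LGN00. IDs of this style pack fixed-width fields into 21 characters: L, a sensor letter, the satellite number, the WRS-2 path and row, the acquisition year and day of year, a ground station code and an archive version. A POSIX sh sketch that pulls them apart; parse_scene_id and chars are hypothetical helpers:

```shell
# chars and parse_scene_id are hypothetical helpers.
# chars STRING RANGE prints the characters of STRING selected by a cut-style
# RANGE such as 4-6 (1-indexed, inclusive).
chars() {
  printf '%s\n' "$1" | cut -c "$2"
}

# Sets shell variables from a 21-character ID laid out as LXSPPPRRRYYYYDDDGSIVV.
parse_scene_id() {
  scene_id=$1
  if [ "${#scene_id}" -ne 21 ]; then
    echo "expected a 21-character scene ID, got: $scene_id" >&2
    return 1
  fi
  sensor=$(chars "$scene_id" 2)        # C = combined OLI/TIRS
  satellite=$(chars "$scene_id" 3)     # 8 = Landsat 8
  wrs_path=$(chars "$scene_id" 4-6)    # WRS-2 path
  wrs_row=$(chars "$scene_id" 7-9)     # WRS-2 row
  year=$(chars "$scene_id" 10-13)      # acquisition year
  doy=$(chars "$scene_id" 14-16)       # acquisition day of year
  station=$(chars "$scene_id" 17-19)   # ground station identifier
  version=$(chars "$scene_id" 20-21)   # archive version number
}

parse_scene_id LC80110282014262LGN00 &&
  echo "path=$wrs_path row=$wrs_row year=$year day_of_year=$doy"
# prints: path=011 row=028 year=2014 day_of_year=262
```

For this scene that means path 11, row 28, acquired on day 262 of 2014 (19 September), which is useful when you go looking for other scenes covering the same area.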
docker run jacquestardie/docker-landsat-util download LC80110282014262LGN00 -p
That should download the scene and create a processed composite image.
Better, but our data is still stuck inside a Docker container on a remote EC2 instance. Making Landsat composites without any means of actually accessing them isn’t very useful. Luckily, landsat-util provides a means of uploading our images to S3. Let’s try that out.
docker run jacquestardie/docker-landsat-util download LC80110282014262LGN00 -p \
    --upload \
    --key <YOUR AWS ACCESS KEY> \
    --secret <YOUR AWS SECRET KEY> \
    --bucket <YOUR BUCKET> \
    --region s3-us-west-2.amazonaws.com
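Pasting keys straight into the command works, but it’s easy to leak them into shell history or a shared snippet. One alternative is to keep them in environment variables (for example set from a file only you can read) and wrap the call. A sketch; upload_scene and LANDSAT_BUCKET are hypothetical names, and the landsat-util flags are the ones used in the command above. The keys still show up in the process list while docker runs, so this is about convenience and history hygiene, not secrecy:

```shell
# upload_scene and LANDSAT_BUCKET are hypothetical names; the landsat-util
# flags are the same ones used in the upload command.
upload_scene() {
  # Stop with an error if any required variable is unset or empty.
  : "${AWS_ACCESS_KEY_ID:?set AWS_ACCESS_KEY_ID}"
  : "${AWS_SECRET_ACCESS_KEY:?set AWS_SECRET_ACCESS_KEY}"
  : "${LANDSAT_BUCKET:?set LANDSAT_BUCKET}"
  docker run jacquestardie/docker-landsat-util download "$1" -p \
    --upload \
    --key "$AWS_ACCESS_KEY_ID" \
    --secret "$AWS_SECRET_ACCESS_KEY" \
    --bucket "$LANDSAT_BUCKET" \
    --region s3-us-west-2.amazonaws.com
}
```

To confirm the upload, aws s3 ls s3://<YOUR BUCKET>/ from any machine with the CLI configured should show what landsat-util put there.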
Sweet! Let’s recap what just happened:
- You spun up an EC2 instance and applied the appropriate permissions to it.
- You used Docker to install landsat-util and its requisite environment.
- You used landsat-util to create a Landsat composite and upload it to S3.
Why this is an improvement
Admittedly, that’s a lot of work just to run landsat-util. You could certainly get up and running faster working locally. However, by using EC2 you get a significantly faster network connection than most local setups, and you’ve isolated (and protected) the means by which you create future Landsat scenes. While working locally might get you the first 10 scenes faster, over the long term you’ll save a considerable amount of time using EC2. Furthermore, as your workflow or application grows in complexity, using Docker to keep things organized will help keep you sane.
In the next few weeks, I’ll be covering some more advanced use cases. If you have any questions, don’t hesitate to get in touch!
— @jqtrde 26 May 2015