Tuesday, August 16, 2011

pip - python package management

If you’ve ever developed a Python application before, you may be familiar with the litany of packages you need to get everything running. You might start off your web app with Django, then add in Jinja2 to get some template speedups. From there, you might want some additional speed, so you install packages like Cython, MarkupSafe, and simplejson for their C extensions. After that, you realize you might want to dabble with PyMongo as well. You go back to hacking for a while until you start thinking to yourself... how am I ever going to manage all these third-party packages? Luckily for us, things aren’t as dire as they seem. Python package management has gotten a lot better over the last few years, thanks in part to PyPI, Python’s central package repository, and a tool called “pip”.


In essence, pip is a Python package installer.

Things pip can do
  • Manage Python packages much like apt-get manages system packages
  • Automatically build C-extensions as required
  • Automatically upgrade/downgrade packages based on your specs

Here’s a quick cheatsheet of the commands we most commonly use.

Enumerating installed packages

$ pip freeze

Uninstalling packages

$ sudo pip uninstall Django

Installation - By name and version

$ sudo pip install Django==1.2.5 pycrypto==2.0.1 simplejson==2.1.5

Installation - From a requirements file

A requirements file is a plain text file that looks exactly like the following. This makes it trivial to version control which Python packages/versions we have running in our production systems.

Django==1.2.5
pycrypto==2.0.1
simplejson==2.1.5

$ sudo pip install -r my-requirements.txt
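
Since the requirements file is what ends up in version control, it can be worth checking that every entry really is pinned to an exact version before committing it. A minimal sketch, assuming the three example packages above and a scratch directory created with mktemp (the variable names are made up for illustration):

```shell
# Write the example requirements file from this post into a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/my-requirements.txt" <<'EOF'
Django==1.2.5
pycrypto==2.0.1
simplejson==2.1.5
EOF

# Drop blank lines and comments, then count the lines without an exact "==" pin.
# grep -c exits non-zero when the count is 0, hence the "|| true".
unpinned=$(grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$workdir/my-requirements.txt" \
    | grep -vc '==' || true)

echo "unpinned requirements: $unpinned"
```

A count above zero would mean some package can drift to a different version between installs.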

We found "pip" to be VERY fast at enumerating already installed packages. To see this for yourself, re-run the pip install -r command above! We currently have about 35 packages in production and "pip" can enumerate all of these in about half a second.

Installation - Using a private PyPI server

All commands thus far have managed to magically discover these packages/versions and install them on our behalf. Behind the scenes, "pip" installs from PyPI, a public HTTP server that holds meta-information about all these packages. You can think of PyPI as the closest thing Python has to a central package repository. Most of the time, we can rely on PyPI being up. To isolate yourself from the occasional PyPI hiccup, though, you can set up your own private PyPI server. To install packages using your own private server, run

$ sudo pip install [-r my-requirements.txt] [package==version] --index-url http://your-pypi-server:8001/simple
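
If typing --index-url on every install gets tedious, pip can also pick the option up from an environment variable named after it. A minimal sketch, assuming a reasonably recent pip and the same placeholder host and port as the command above:

```shell
# Placeholder server address, copied from the example command in this post.
export PIP_INDEX_URL="http://your-pypi-server:8001/simple"

# With the variable exported, a plain install should consult the private index:
#   pip install -r my-requirements.txt
echo "pip will use index: $PIP_INDEX_URL"
```

Note that sudo usually resets the environment, so the variable may need to be passed through explicitly when installing system-wide.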

Setting up your own private PyPI server has some benefits. You can
  • Isolate yourself from PyPI going down
  • Download all packages over a local network connection (Speedy!)
  • Manage custom-modified packages and have them install as part of the standard requirements process - for example, if you needed to hack Tornado

Setting up a private PyPI server is a bit beyond the scope of this article, but here’s an article that definitely helped us on our way



Thanks to “pip” we’ve managed to tame our package installation process. Hopefully, after reading this article you will have too :)

Thursday, August 11, 2011

Hardly working hard

I am lazy. I don’t remember when it began, but to the great frustration of my mother, it has yet to end. Luckily though, there are many tools available for the lazy software engineer, some of which I’d like to share today in this post.

Too lazy to type?

I like saving keystrokes. Why type when you can...not type? There are a couple of bash tricks I learned from Carlos and Jesse that I really like:

  1. cd -: cd to the last visited directory. Everybody knows about cd ~, but this is at least as useful. Instead of taking you home, cd - takes you to the last directory you visited.

Example use case:

cd /path/to/logs

tail log1 | cut -f 3 | sort

tail log2 | cut -f 5 | sort

cd -

vim

2. ctrl + r: Reverse search bash history. Here at Adku, we run A LOT of map reduces. Most of the time, we’re calling the same few commands. ctrl + r lets us run, forget, and then run again later.

Example use case:

fab some_map_reduce_job:arg1='ahhhh',arg2='wahhhhh'

cd ../

touch 'asdf'

ctrl + r

fab

fab some_map_reduce_job:arg1='ahhhh',arg2='wahhhhh'

3. !-n:x-y: Recall arguments x through y of the nth most recent command (!n, without the minus sign, refers to history entry number n instead). This is another trick that is a great help when running map reduces with complicated argument lists. It is a lot more complicated than the other commands mentioned here, though, and worth reading about separately. True to the theme of this post, I redirect you to http://www.catonmat.net/blog/the-definitive-guide-to-bash-command-line-history/ for a great resource on this, and will stick to giving a few simple examples.

Example use case(s):

>echo "foobar"

foobar

>echo "moocow"

moocow

>!-2

foobar

>!-2

moocow

--------

cat foo > impossible_to_remember_filename_akj3kj2437fvaj

nano !:3

--------

port install haskell-platform

##oh no, permission failure##

sudo !!

---------

>echo "hi" && echo "why"

hi

why

>!!:0-1

hi

>!-2:3-$

why
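
The cd - trick from item 1 is easy to try in isolation. A small sketch, assuming two scratch directories made with mktemp stand in for a project directory and a log directory (the variable names are illustrative only):

```shell
# Two scratch directories stand in for a project directory and a log directory.
project_dir=$(mktemp -d)
log_dir=$(mktemp -d)

cd "$project_dir"
cd "$log_dir"              # poke around in the logs...
echo "previous: $OLDPWD"   # the shell remembers where we came from

cd - > /dev/null           # jump back; cd - also prints the directory it lands in
returned_to=$(pwd)
echo "now in: $returned_to"
```

Running cd - a second time would flip back to the log directory, since the shell simply swaps the current and previous directories.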

Too lazy to go to work?

Being too lazy to go to work is, of course, never a problem for me. But theoretically, I could see the following scenario occur.

1. While at work, start an SSH tunnel between your work computer and a gateway server. (For details on how we do this, check out: http://blog.adku.com/2011/06/working-remotely.html)

2. Go home, relax.

3. Start an SSH session to your work server.

4. Start tmux.

5. Never go to work again.

With the exception of 4 (and 5?), this is probably a familiar process. But here’s the beauty of four:

connection closed by remote host.

Never again.

To quote the tmux man page:

Each session is persistent and will survive accidental disconnection (such as ssh(1) connection timeout) or intentional detaching (with the `C-b d' key strokes).

Aside from never losing your work due to connection problems, tmux also makes it easy to start a job at work, go home, and continue monitoring that job remotely, or vice versa.

For example:

I start a tmux session at work and start tailing a random log.

To remember this session, I name it by typing

ctrl + b : rename-session "foo"

Then I go home.

At home, I ssh into my work computer, reattach to the session, and everything just works!
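
A minimal sketch of that round trip, assuming the session is named foo as above. The two function names are hypothetical wrappers made up for illustration; new-session -s and attach-session -t are standard tmux subcommands.

```shell
# Hypothetical helper: create a named session at work so it can be found later.
start_work_session() {
    tmux new-session -s "${1:-foo}"
}

# Hypothetical helper: reattach to the named session, e.g. after ssh-ing in from home.
resume_work_session() {
    tmux attach-session -t "${1:-foo}"
}
```

At work you would call start_work_session, detach with ctrl + b d, and later call resume_work_session from the ssh session at home.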

In addition to creating and naming sessions, tmux makes it easy to create and name windows within sessions. I often find myself running many jobs of the same type, which I like to organize into named windows within a single session.

The windows I have open are called “mo”, “meeny”, and “eeny”. In each window I am running a different map reduce job. By naming the windows, I’ll remember exactly what I was running in each one wherever I go.

Basic window usage:

create

ctrl + b c

delete

ctrl + b & (ctrl + b x closes only the current pane)

rename

ctrl + b ,

navigate to previous window

ctrl + b p

navigate to next window

ctrl + b n

navigate to window number 0-9

ctrl + b 0-9

Too lazy to...continue?