Tuesday, December 20, 2011

How we use 20% time at Adku



Experiments in the social graph
Here at Adku, we love data, and we are constantly looking for ways to leverage new sources of information to make our algorithms better. While our main line of research focuses on the relevance of signals such as demographics and click stream information, we’ve also recently been enamored with the possibilities of the social graph. A few of our engineers used 20% time to put together some hacks in this direction, and we would like to invite you to try them out.
1. Likelygifts (http://www.likelygifts.com)
Now that you’ve tried them out...
Let us explain why we built these tools in the first place.
1. Likelygifts
In spite of what privacy “features”, Facebook, Google+, and Twitter roll out, your actions on a social network are inherently public. They can be culled and managed, but a like is a publicly stated preference, and a wall post is a publicly stated fact. For real privacy, you can use the phone, or meet in person. The broadcast nature of social networks is something that our culture has not fully mastered. Hence much of the rise of social networks has been accompanied by frauds, cons, and Farmville. While social networking has exploded, useful derivatives of social networking have not.   
We built Likelygifts in hopes of changing that. If you’re like us, you’ve just started panicking about all the gifts you have to send out by the end of the month. And who knows what third cousin Bob wants? Or the college buddy who has way too many toys? Maybe we can jog your memory.
Likelygifts doesn’t post on your wall, solicit your friends, or spy on you when you’re offline. We ask for the minimum amount of information to give you accurate recommendations, and that’s all we do.

( a competitor's facebook permissions request vs. ours!)
2. Mildred
We built Mildred as a visualization tool for ourselves. After looking through rows and rows of aggregate statistics about our friends’ likes, we thought maybe there was a better way to see things.
But then we liked Mildred enough to show our friends, and they liked it enough that now we want to show you.
Here are some screenshots:

This is Carlos’s taste graph. Out of all of his friends, he alone likes ‘The Hills’. His other favorite ‘Lost’ is shared by a large number of his friends.
`
David likes ‘The Beatles’, ‘Pink Floyd’, …, and is solitary in his love for Klute.
The most interesting aspect of Mildred for us came from comparing the taste graphs for different people. We found that we could tell who the graph belonged to without even looking at the names. And though everyone seems to like The Beatles & Coldplay, the less well known tastes proved very useful for distinguishing people. Hence one of the upcoming 20% time projects: topic clustering on facebook likes. David is working on that right now, and I am sure something exciting will come of it soon. In the mean time, we'd love to hear suggestions for ways to play with this data.

Thursday, November 3, 2011

Scope and Variable Binding in Python

1. Python only has two notions of scope -- global and local. This means that if a variable is declared in a function, it is bound to the function. Otherwise it is bound to the global state.

You can inspect everything that is bound globally by calling globals(), and everything that is bound locally by calling locals()

2. In the following post I will investigate variable binding and scope in python via an interactive python session. That is, by example. If you want to follow along, check out the documentation for dir() and dis.

Local modifications stay local

# define a global variable a
In [1]: a = 5
# modify a locally
In [2]: def foo():
...: a = 3
...:
In [3]: foo()
# did global a change?
In [4]: a
Out[4]: 5
# nope.

Global variables can be accessed but not modified

In [10]: a = 5

In [11]: def foo():
....: print a
....:

In [12]: foo()
5

In order to modify a function in global scope, you must declare it to be global first. The relevant difference is explained by the following byte code snippets. In the case where a is a local variable, STORE_FAST is called, whereas in the case where a is a declared global, STORE_GLOBAL is called.

Compare:

In [21]: def foo():
....: a = 3
....:

In [22]: dis.dis(foo.func_code)
2 0 LOAD_CONST 1 (3)
3 STORE_FAST 0 (a)
6 LOAD_CONST 0 (None)
9 RETURN_VALUE

And:

In [24]: def foo():
....: global a
....: a = 3
....:
# modification is successful
In [25]: foo()
In [26]: print a
3
In [27]: dis.dis(foo.func_code)
3 0 LOAD_CONST 1 (3)
3 STORE_GLOBAL 0 (a)
6 LOAD_CONST 0 (None)
9 RETURN_VALUE

Expressions are always in global scope.

As a review, expressions are constructs like if/while/for/try/except. The point below is made with if statements, but would apply equally to any of the others.

# b is first defined inside a trivially true if statement
In [6]: if True:
...: b = 21
...:
# b is still defined globally
In [7]: b
Out[7]: 21

# it does matter that the if statement is evaluated
In [10]: if False:
....: b = 1337
....:
In [11]: b

Out[11]: 21

This contrasts with other languages like C, where variables have block scope.

#include

int main()
{
if(1)
{
int a = 6;
printf("%d\n", a);
}
printf("%d\n", a);
return 0;

}

for example, generates a compile time error: ‘a’ undeclared at the second printf statement.

Nested Function Scope

# let’s see what happens here
In [18]: def foo():
....: a = 5
....: def goo():
....: print a
....: goo()
In [19]: foo()
Out[19]: 5

Inner functions have access to the outer functions’ local variables. This is the basis of functional closures. Similar behavior occurs for lambdas.

In [23]: def foo():
....: a = 5
....: b = (lambda : a)()
....: print b
....:

The fact that functions and lambdas have access to external scope is useful for closures, but also opens up this common bug*:

# define a sequence of functions
In [40]: fs = [(lambda n: i + n) for i in xrange(10)]

# e.g.
In [41]: fs
Out[41]:
[<function __main__.<lambda>>,
<function __main__.<lambda>>,
<function __main__.<lambda>>,
<function __main__.<lambda>>,
<function __main__.<lambda>>,
<function __main__.<lambda>>,
<function __main__.<lambda>>,
<function __main__.<lambda>>,
<function __main__.<lambda>>,
<function __main__.<lambda>>]

# now apply one of the functions in the sequence
In [43]: fs[0](3)

Out[43]: 12 # wait...wtf?

Let’s see exactly what’s happening here, just for fun:

In [49]: [dis.dis(i.func_code) for i in fs]
1 0 LOAD_GLOBAL 0 (i)
3 LOAD_FAST 0 (n)
6 BINARY_ADD
7 RETURN_VALUE
1 0 LOAD_GLOBAL 0 (i)
3 LOAD_FAST 0 (n)
6 BINARY_ADD
7 RETURN_VALUE
.
.
.

<snipped>

The lambda loads i from globals, because i cannot be found within its local scope. But during the construction of the function sequence, i changes. By the time any of the functions are called, i has been incremented to 9, and so fs[0][3] doesn’t add 0 to 3 but rather 9 to 3. This is reflected in the byte code by the LOAD_GLOBAL instruction.

To avoid problems like this one, remember that python will always look for unknown function variables in global (or higher level function) scope.

Class vs Instance vs method scope

Scope behaves in the same way for functions as for classes and objects. In fact, functions are objects. If you have some experience with python, you probably already know how to define class variables and instance variables. Class variables are declared under the class declaration, and outside of any functions. Instance variables are defined attached to self, as follows:

class Foo(object):
# class variable
a_class_var = 'yo yo'
def __init__(self):
# instance variable
self.an_instance_var = 'whattttt'

Brief Exercise for the reader: input the above into ipython, and compare dir(Foo)with dir(Foo())

Consider the following class

class Foo(object):
# class variable
x = 'yo yo'

bar = Foo()

What is bar.x?

What is Foo.x?

Solution: Both are ‘yo yo’

How about this class?

class Foo(object):
x = 'yo yo'
def __init__(self):
# print self.x
self.x = 'whattttt'
# print self.x

bar = Foo()

Now,

What is bar.x?

What is Foo.x?

In this case, bar.x= ‘whattttt’ and Foo.x = ‘yo yo’. If you uncomment the print statements though, you’ll notice that initially self.x = ‘yo yo’. Huh?

Conceptually, we can think of Foo as a template for all of it’s instances. The class variable x, declared as ‘yo yo’ gets copied to the instance bar on creation. self.x in the initializer is not referring to the class’s x, but rather to the instance’s x. We never tamper with the class’s x at all. If you do want to change Foo’s x, you would set self.__class__.blah.

One important thing to note is that creating foo from Foo does a shallow copy of Foo’s internals. So

class Foo2(object):
hi = []
def __init__(self):
self.hi.append('1')

bar2 = Foo2()
print bar2.hi
bar3 = Foo2()
print bar3.hi
print Foo2.hi

produces the output:
[
1]
[1, 1]
[1, 1]

The class Foo2’s copy of hi was modified by its instances because on instance creation, a reference to hi was copied, but not hi itself. You may have seen this error disguised as:

In [1]: def foo(blah = []):
...: blah.append(1)
...: return blah
...:
In [2]: foo()
Out[2]: [1]
In [3]: foo()
Out[3]: [1, 1]
In [4]: foo()
Out[4]: [1, 1, 1]

End

Hopefully these examples have given you some intuition about scope in python. For more detailed explanations of the examples in this post, check out the Python Language Reference. Or try out your own explorations in ipython.

*http://math.andrej.com/2009/04/09/pythons-lambda-is-broken/ inspired by this post. Even though the title of that article seems strongly worded. It appears that Guido agrees: http://www.artima.com/weblogs/viewpost.jsp?thread=98196