Motivation

In this posting I want to explain a fundamental concept shared by many popular programming languages, Python among them. Understanding this concept is key to working with objects, names and variables successfully. Those who don’t understand it often run into difficulties at some point, and many questions asked on support channels reveal that not everybody knows about it. After reading this posting you will (hopefully) understand why things behave the way they do, so let’s dive in.

The Problem (one of them)

Consider the following code:

    >>> class Foo(object):
    ...     def __init__(self):
    ...         self.my_attribute = "some string literal"
    ... 
    >>> foo = Foo()
    >>> bar = foo

This first defines a class Foo whose __init__ method sets an attribute, self.my_attribute. (You can tell that this is an instance attribute because the assignment takes place in the __init__ method. We will cover that in a later blog posting.) The attribute is assigned the string literal “some string literal”, so every instance of class Foo is initialized with an attribute my_attribute holding that value. After the definition of the class, we create an instance of it (by calling what Java folks would refer to as the constructor; take note of the parentheses), which we call foo. The next line is the most interesting one: one might read it as “bar equals foo”, but that is not as precise as it should be. Let’s go on and see what happens.

    >>> foo.my_attribute
    'some string literal'
    >>> bar.my_attribute
    'some string literal'

This is easy. We just get the value stored in my_attribute. No surprises here. The next step, however, may be confusing to some.

    >>> bar.my_attribute = "it's a trap!"
    >>> foo.my_attribute
    "it's a trap!"

We assigned another string literal “it’s a trap!” to bar.my_attribute and for some reason (which I will explain shortly) foo.my_attribute changed to the very same string as well!

The Home of the Objects

When you instantiate an object from a class, there is no doubt that you need memory in which to store your object (even if you didn’t specify any attributes on that class/object). Upon creation of an object, that object is stored somewhere. (How exactly this happens depends on the language’s implementation and is of no interest to us.) Consider this example:

    >>> Foo()
    <__main__.Foo object at 0xb7e791ac>

If you read the above carefully, you will have noticed that this, again, creates an object of class Foo (indicated by the parentheses). The instance of Foo we just created lives somewhere (albeit, in CPython, only for a limited time, as we shall soon see). Now, if we wanted to alter that object’s my_attribute, how would we do it? The answer is: we can’t (I am simplifying here). “Why can’t we?” I hear you say. The explanation is that we did not bind the object to a name. Binding an object to a name usually means selecting a unique identifier (like the very unique word “foo”), creating an instance of a class as we did above, and putting the = operator between the two. Like this:

    >>> foo2 = Foo()

We have now bound a (new!) instance of class Foo to the name (or reference) foo2. (Check your language’s specification to see what is allowed as a valid identifier.) We can now do:

    >>> foo2
    <__main__.Foo object at 0xb7825e0c>

As you can see, the object bound to foo2 differs from the object we created above (the one that we didn’t bind to a name), because it actually is a new, different object. (The hexadecimal value 0xb7825e0c tells us where this object is stored, but it does not matter much. You will see other values on your machine.)
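
If you would rather not compare hexadecimal numbers by eye, here is a minimal sketch using the built-in id() function, which returns an integer that is unique among all objects alive at the same time. The names first and second are mine, chosen only for illustration. That the number matches the address in the repr is a CPython detail and not something to rely on.

```python
class Foo(object):
    def __init__(self):
        self.my_attribute = "some string literal"

# Two separate calls to Foo() create two separate objects.
first = Foo()
second = Foo()

# id() is unique among objects that are alive at the same time.
print(id(first) != id(second))  # True
print(first is second)          # False: the names refer to different objects

# CPython detail: the default repr shows the same number in hexadecimal.
print(hex(id(first)), repr(first))
```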

Let me visualize this (without the objects we created when we began):

[Figure: The Home of the Objects]

As you can see, the object we created last is bound to the name foo2. The other object is not bound to any name and thus not reachable for us.

You can now understand what happened when we initially did this:

    >>> class Foo(object):
    ...     def __init__(self):
    ...         self.my_attribute = "some string literal"
    ... 
    >>> foo = Foo()
    >>> bar = foo

On the very last line, we did not create an instance of class Foo, but instead just created another reference for that object. The very same object is now reachable with either name, foo and bar. Hence, if we change foo’s attributes we will alter bar’s as well and vice versa.
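
To see the difference between a second name and a second object side by side, here is a small sketch. The name baz is my addition for illustration; it is bound to a fresh instance, so it should not be affected by what we do through bar.

```python
class Foo(object):
    def __init__(self):
        self.my_attribute = "some string literal"

foo = Foo()
bar = foo    # a second name for the SAME object
baz = Foo()  # a NEW object that merely starts out with an equal attribute

bar.my_attribute = "it's a trap!"

print(foo.my_attribute)  # it's a trap!  (foo and bar share one object)
print(baz.my_attribute)  # some string literal  (baz is a different object)
```

Only calling Foo() creates a new instance; a plain assignment from one name to another never does.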

If you want to check whether two names are referring to the same object, you can use the is operator in Python like this:

    >>> foo is bar
    True
    >>> bar is foo
    True

Here you are checking whether both names refer to the identical object. Don’t confuse the is operator with the == operator (in Python), which checks for equality but not identity! (This is different in other languages like Java, where == applied to objects compares identity!)
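
The difference between equality and identity is easiest to see with two lists that have the same contents. This is just a sketch; the names first_list, second_list and alias are made up for the example.

```python
first_list = [1, 2, 3]
second_list = [1, 2, 3]  # equal contents, but a separate list object
alias = first_list       # another name for the first list

print(first_list == second_list)  # True: same contents
print(first_list is second_list)  # False: two different objects
print(first_list is alias)        # True: one object, two names

# Changing the object through one name is visible through the other.
alias.append(4)
print(first_list)   # [1, 2, 3, 4]
print(second_list)  # [1, 2, 3]
```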

Doing

    foo = 5

binds the object 5, represented by the integer literal 5, to foo. (Yes, even ints are objects in Python.) For performance reasons, CPython does not create a new object every time a small integer is used, but rather reuses the same object for the same value. With this assignment, you have lost the (potentially last) reference to the object to which the name foo was bound just a minute ago.
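
Here is a short sketch of what rebinding does and does not do. It assumes the class Foo from the beginning of this posting; the names x and y are only there to illustrate the integer reuse, which is a CPython implementation detail you should not build on.

```python
class Foo(object):
    def __init__(self):
        self.my_attribute = "some string literal"

foo = Foo()
bar = foo  # two names, one object

foo = 5    # rebinds the name foo; the Foo instance itself is untouched

print(foo)               # 5
print(bar.my_attribute)  # some string literal  (still reachable via bar)

# CPython reuses one object for equal small integers.
x = 5
y = 5
print(x is y)  # True on CPython; other implementations may differ
```

Had bar not existed, the assignment foo = 5 would have removed the last reference to the instance.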

If you want to delete your reference, you can just do:

    del foo

This does not necessarily delete your object! It only deletes the name. If, how and when the object itself is deleted depends entirely on the implementation of Python you use. (This is said to be an implementation detail.)
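
One way to watch this happen is a weak reference from the standard library’s weakref module, which lets us look at an object without keeping it alive. This is a sketch, not a statement about how every Python implementation behaves: the names watcher and object_is_gone are mine, and the gc.collect() call is there because implementations other than CPython may not reclaim the object right away.

```python
import gc
import weakref

class Foo(object):
    def __init__(self):
        self.my_attribute = "some string literal"

foo = Foo()
bar = foo
watcher = weakref.ref(foo)  # weak reference: does not keep the object alive

del foo                     # removes only the name foo
print(bar.my_attribute)     # some string literal  (the object is still there)
print(watcher() is bar)     # True

del bar                     # the last ordinary reference is gone
gc.collect()                # ask the implementation to collect garbage now
object_is_gone = watcher() is None
print(object_is_gone)       # True on CPython
```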

Accessing a name after it has been deleted results in the following traceback:

    >>> foo
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    NameError: name 'foo' is not defined

Objects do not know what names they are bound to, if any. They actually don’t even have a name. The names we defined are just references to the objects.

You can reach objects not only by name. Consider the following example:

    >>> a = [Foo(), Foo(), Foo()]

Here we create a list (an object itself, by the way), create three instances of Foo and put them into the list. Note that we did not bind any of the objects to a name, except for the list, which is bound to “a”. The three instances are reachable via that name. They are said to be contained by the list and thus still reachable.

    >>> a
    [<__main__.Foo object at 0xb7cf9f4c>, <__main__.Foo object at 0xb7cf9e2c>, <__main__.Foo object at 0xb7cf9eec>]
    >>> a[0]
    <__main__.Foo object at 0xb7cf9f4c>
    >>> a[1]
    <__main__.Foo object at 0xb7cf9e2c>
    >>> a[2]
    <__main__.Foo object at 0xb7cf9eec>
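
The same rules apply to objects you reach through a container. In this sketch (the name first is made up for the example) we change one instance via the list and then bind an additional name to it.

```python
class Foo(object):
    def __init__(self):
        self.my_attribute = "some string literal"

a = [Foo(), Foo(), Foo()]

# Reach an instance through the list and change it.
a[0].my_attribute = "changed through the list"

# Bind a name to the first element; no new object is created.
first = a[0]
print(first is a[0])       # True
print(first.my_attribute)  # changed through the list
print(a[1].my_attribute)   # some string literal  (a different instance)
```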

Names have only limited visibility. The underlying concept is called the namespace. (I won’t go into more detail here. I mention it just so that you have something to search for.)
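
As a tiny taste of what limited visibility means, here is a sketch with a hypothetical function make_greeting. The name greeting exists only inside the function, while the object it was bound to can still be handed out and bound to another name.

```python
def make_greeting():
    greeting = "hello"  # this name lives in the function's local namespace
    return greeting     # the object is returned, the name is not

result = make_greeting()
print(result)  # hello

# Outside the function, the name greeting does not exist.
name_is_visible = True
try:
    greeting
except NameError:
    name_is_visible = False
print(name_is_visible)  # False
```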

Conclusion

By now you should have understood the basic principle of object creation and name binding in languages like Python. If there are any questions remaining, feel free to drop a comment to this posting. If someone comes up to you and says “Take a look at my shiny Python variable!”, always keep in mind that it is actually a name, bound to an object.

2 comments Nov 22, 2008 1:17:00 PM Coding, Python

Comment by Ciarán, Nov 29, 2008 12:12:00 PM

Hi,

Thanks for posting this. It’s really helped my understanding of Python. What you say seems to make sense to me. Do you plan to complement this post with a follow-up? I’d be interested in reading more.

Comment by dennda, Nov 29, 2008 3:19:00 PM

Hi Ciarán,

thanks for your comment!

I’ve got a few things in mind I want to blog about, one of which may be seen as a follow-up to this posting. It’d be helpful to know what precisely you are interested in.

Regards, Christopher