I had a programming interview recently, a phone-screen in which we used a collaborative text editor.

I was asked to implement a certain API, and chose to do so in Python. Abstracting away the problem statement, let’s say I needed a class whose instances stored some data and some other_data.

I took a deep breath and started typing. After a few lines, I had something like this:

class Service(object):
    data = []

    def __init__(self, other_data):
        self.other_data = other_data

My interviewer stopped me:

  • Interviewer: “That line: data = []. I don’t think that’s valid Python?”
  • Me: “I’m pretty sure it is. It’s just setting a default value for the instance attribute.”
  • Interviewer: “When does that code get executed?”
  • Me: “I’m not really sure. I’ll just fix it up to avoid confusion.”

For reference, and to give you an idea of what I was going for, here’s how I amended the code:

class Service(object):

    def __init__(self, other_data):
        self.data = []
        self.other_data = other_data

As it turns out, we were both wrong. The real answer lay in understanding the distinction between class and instance attributes.

Python class attributes vs. Python instance attributes

Note: if you have an expert handle on class attributes, you can skip ahead to use cases.

Class Attributes

My interviewer was wrong in that the above code is syntactically valid.

I too was wrong in that it isn’t setting a “default value” for the instance attribute. Instead, it’s defining data as a class attribute with value [].

In my experience, class attributes are a topic that many people know something about, but few understand completely.

What’s the difference?

A class attribute is an attribute of the class (circular, I know), rather than an attribute of an instance of a class.

Let’s use an example to illustrate the difference. Here, class_var is a class attribute, and i_var is an instance attribute:

class MyClass(object):
    class_var = 1

    def __init__(self, i_var):
        self.i_var = i_var

Note that all instances of the class have access to class_var, and that it can also be accessed as an attribute of the class itself:

foo = MyClass(2)
bar = MyClass(3)

foo.class_var, foo.i_var
## 1, 2
bar.class_var, bar.i_var
## 1, 3
MyClass.class_var ## <— This is key
## 1

For Java or C++ programmers, the class attribute is similar—but not identical—to the static member. We’ll see how they differ below.

Class vs. instance namespaces

To understand what’s happening here, let’s talk briefly about Python namespaces.

A namespace is a mapping from names to objects, with the property that there is no relation between names in different namespaces. Namespaces are usually implemented as Python dictionaries, although this is abstracted away.

Depending on the context, you may need to access a namespace using dot syntax (e.g., object.name_from_objects_namespace) or as a local variable (e.g., object_from_namespace). As a concrete example:

class MyClass(object):
    ## No need for dot syntax
    class_var = 1

    def __init__(self, i_var):
        self.i_var = i_var

## Need dot syntax as we've left scope of class namespace
MyClass.class_var
## 1

Python classes and instances of classes each have their own distinct namespaces represented by pre-defined attributes MyClass.__dict__ and instance_of_MyClass.__dict__, respectively.
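As a quick check of this claim, the sketch below inspects both namespaces with the built-in vars(), which returns an object's __dict__. It reuses the MyClass example from above; the variable names instance_ns and class_ns are only for illustration.

```python
class MyClass(object):
    class_var = 1

    def __init__(self, i_var):
        self.i_var = i_var


foo = MyClass(2)

# The instance namespace holds only what was assigned through self.
instance_ns = vars(foo)            # the same object as foo.__dict__
print(instance_ns)                 # {'i_var': 2}

# The class namespace holds class_var (plus __init__ and some bookkeeping entries).
class_ns = vars(MyClass)           # a read-only view of MyClass.__dict__
print('class_var' in class_ns)     # True
print('class_var' in instance_ns)  # False
print('i_var' in class_ns)         # False
```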

When you try to access an attribute from an instance of a class, Python first looks at the instance namespace. If it finds the attribute, it returns the associated value. If not, it then looks in the class namespace and returns the attribute if it’s present, raising an AttributeError otherwise. For example:

foo = MyClass(2)

## Finds i_var in foo's instance namespace
foo.i_var
## 2

## Doesn't find class_var in instance namespace…
## So looks in class namespace (MyClass.__dict__)
foo.class_var
## 1

The instance namespace takes precedence over the class namespace: if there is an attribute with the same name in both, the instance namespace will be checked first and its value returned. Here’s a simplified version of the code (source) for attribute lookup:

def instlookup(inst, name):
    ## simplified algorithm...
    if name in inst.__dict__:
        return inst.__dict__[name]
    else:
        return inst.__class__.__dict__[name]
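The function above checks only the instance and its immediate class. As a slightly fuller sketch, still a simplification and not CPython's actual implementation, the hypothetical helper lookup below also walks the base classes in method resolution order (type(inst).__mro__). It deliberately ignores descriptors, __slots__ and __getattr__. The Base class is likewise made up for the example.

```python
def lookup(inst, name):
    """Simplified attribute lookup: instance namespace first, then each class in the MRO.

    Hypothetical helper for illustration only; real lookup also handles
    descriptors, __slots__ and __getattr__, all of which are ignored here.
    """
    if name in inst.__dict__:
        return inst.__dict__[name]
    for klass in type(inst).__mro__:
        if name in klass.__dict__:
            return klass.__dict__[name]
    raise AttributeError(name)


class Base(object):
    inherited = 'from Base'


class MyClass(Base):
    class_var = 1

    def __init__(self, i_var):
        self.i_var = i_var


foo = MyClass(2)
print(lookup(foo, 'i_var'))      # 2, found in foo.__dict__
print(lookup(foo, 'class_var'))  # 1, found in MyClass.__dict__
print(lookup(foo, 'inherited'))  # from Base, found further along the MRO

foo.class_var = 99               # now shadowed by an instance attribute
print(lookup(foo, 'class_var'))  # 99
```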

And, in visual form:

[Figure: attribute lookup in visual form]

Handling assignment

With this in mind, we can make sense of how class attributes handle assignment:

  • If a class attribute is set by accessing the class, it will override the value for all instances that have not set their own instance attribute of the same name. For example:

    foo = MyClass(2)
    foo.class_var
    ## 1
    MyClass.class_var = 2
    foo.class_var
    ## 2

    At the namespace level… we’re setting MyClass.__dict__['class_var'] = 2. (Note: this isn’t the exact code (which would be setattr(MyClass, 'class_var', 2)) as __dict__ returns a dictproxy, an immutable wrapper that prevents direct assignment, but it helps for demonstration’s sake). Then, when we access foo.class_var, class_var has a new value in the class namespace and thus 2 is returned.

  • If a class variable is set by accessing an instance, it will override the value only for that instance. This essentially shadows the class variable with an instance variable of the same name that is available, intuitively, only for that instance. For example:

    foo = MyClass(2)
    foo.class_var
    ## 1
    foo.class_var = 2
    foo.class_var
    ## 2
    MyClass.class_var
    ## 1

    At the namespace level… we’re adding the class_var attribute to foo.__dict__, so when we look up foo.class_var, we return 2. Meanwhile, other instances of MyClass will not have class_var in their instance namespaces, so they continue to find class_var in MyClass.__dict__ and thus return 1.
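One consequence worth spelling out: because an instance-level assignment only adds an entry to that instance's namespace, deleting the entry makes the class value visible again. A minimal sketch, reusing MyClass from above:

```python
class MyClass(object):
    class_var = 1

    def __init__(self, i_var):
        self.i_var = i_var


foo = MyClass(2)
bar = MyClass(3)

foo.class_var = 2              # adds 'class_var' to foo.__dict__ only
print(foo.class_var)           # 2 (instance namespace wins)
print(bar.class_var)           # 1 (still read from MyClass.__dict__)

MyClass.class_var = 5          # rebinding on the class...
print(bar.class_var)           # 5 (...is seen by instances without their own copy)
print(foo.class_var)           # 2 (foo's own attribute still shadows it)

del foo.class_var              # removes the entry from foo.__dict__
print(foo.class_var)           # 5 (lookup falls back to the class again)
```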


Quiz question: What if your class attribute has a mutable type? You can mutate the class attribute by accessing it through a particular instance and, in turn, end up changing the object that all instances share (as pointed out by Timothy Wiseman).

This is best demonstrated by example. Let’s go back to the Service I defined earlier and see how my use of a class variable could have led to problems down the road.

class Service(object):
    data = []

    def __init__(self, other_data):
        self.other_data = other_data

My goal was to have the empty list ([]) as the default value for data, and for each instance of Service to have its own data that would be altered over time on an instance-by-instance basis. But in this case, we get the following behavior (recall that Service takes some argument other_data, which is arbitrary in this example):

s1 = Service(['a', 'b'])
s2 = Service(['c', 'd'])

s1.data.append(1)

s1.data
## [1]
s2.data
## [1]

s2.data.append(2)

s1.data
## [1, 2]
s2.data
## [1, 2]

This is no good—altering the class variable via one instance alters it for all the others!

At the namespace level… all instances of Service are accessing and modifying the same list in Service.__dict__ without making their own data attributes in their instance namespaces.
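A quick way to confirm this is to compare object identities. The sketch below uses the same Service class; the is operator checks whether two expressions refer to the very same object.

```python
class Service(object):
    data = []

    def __init__(self, other_data):
        self.other_data = other_data


s1 = Service(['a', 'b'])
s2 = Service(['c', 'd'])

s1.data.append(1)                # mutates the list stored on the class

print(s1.data is s2.data)        # True: both lookups fall through to the class
print(s1.data is Service.data)   # True
print('data' in vars(s1))        # False: no instance attribute was ever created
print(Service.data)              # [1]
```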

We could get around this using assignment; that is, instead of exploiting the list’s mutability, we could assign our Service objects to have their own lists, as follows:

s1 = Service(['a', 'b'])
s2 = Service(['c', 'd'])

s1.data = [1]
s2.data = [2]

s1.data
## [1]
s2.data
## [2]

In this case, we’re adding s1.__dict__['data'] = [1], so the original Service.__dict__['data'] remains unchanged.

Unfortunately, this requires that Service users have intimate knowledge of its variables, and is certainly prone to mistakes. In a sense, we’d be addressing the symptoms rather than the cause. We’d prefer something that was correct by construction.

My personal solution: if you’re just using a class variable to assign a default value to a would-be instance variable, don’t use mutable values. In this case, every instance of Service was going to override Service.data with its own instance attribute eventually, so using an empty list as the default led to a tiny bug that was easily overlooked. Instead of the above, we could’ve either:

  1. Stuck to instance attributes entirely, as demonstrated in the introduction.
  2. Avoided using the empty list (a mutable value) as our “default”:

    class Service(object):
        data = None
        def __init__(self, other_data):
            self.other_data = other_data

    Of course, we’d have to handle the None case appropriately, but that’s a small price to pay.
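Handling the None case might look like the following sketch. The add_data method is a hypothetical addition (the Service class above has no such method); it creates a fresh list on the instance the first time data is needed.

```python
class Service(object):
    data = None  # immutable placeholder shared by all instances

    def __init__(self, other_data):
        self.other_data = other_data

    def add_data(self, item):
        # Hypothetical helper: create a per-instance list on first use.
        if self.data is None:
            self.data = []       # assignment goes to the instance namespace
        self.data.append(item)


s1 = Service(['a', 'b'])
s2 = Service(['c', 'd'])

s1.add_data(1)
s2.add_data(2)

print(s1.data)        # [1]
print(s2.data)        # [2]
print(Service.data)   # None: the class-level default is untouched
```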

So when would you use them?

Class attributes are tricky, but let’s look at a few cases when they would come in handy:

  1. Storing constants. As class attributes can be accessed as attributes of the class itself, it’s often nice to use them for storing class-wide, class-specific constants. For example:

    class Circle(object):
        pi = 3.14159

        def __init__(self, radius):
            self.radius = radius

        def area(self):
            return Circle.pi * self.radius * self.radius

    Circle.pi
    ## 3.14159
    c = Circle(10)
    c.pi
    ## 3.14159
    c.area()
    ## 314.159

  2. Defining default values. As a trivial example, we might create a bounded list (i.e., a list that can only hold a certain number of elements or fewer) and choose to have a default cap of 10 items:

    class MyClass(object):
        limit = 10

        def __init__(self):
            self.data = []

        def item(self, i):
            return self.data[i]

        def add(self, e):
            if len(self.data) >= self.limit:
                raise Exception("Too many elements")
            self.data.append(e)

    MyClass.limit
    ## 10

    We could then create instances with their own specific limits, too, by assigning to the instance’s limit attribute.

    foo = MyClass()
    foo.limit = 50
    ## foo can now hold 50 elements—other instances can hold 10

    This only makes sense if you want your typical instance of MyClass to hold just 10 elements or fewer; if you’re giving all of your instances different limits, then limit should be an instance variable. (Remember, though: take care when using mutable values as your defaults.)

  3. Tracking all data across all instances of a given class. This is sort of specific, but I could see a scenario in which you might want to access a piece of data related to every existing instance of a given class.

    To make the scenario more concrete, let’s say we have a Person class, and every person has a name. We want to keep track of all the names that have been used. One approach might be to iterate over the garbage collector’s list of objects, but it’s simpler to use class variables.

    Note that, in this case, all_names will only be accessed as a class variable, so making it mutable is acceptable.

    class Person(object):
        all_names = []

        def __init__(self, name):
            self.name = name
            Person.all_names.append(name)

    joe = Person('Joe')
    bob = Person('Bob')
    print Person.all_names
    ## ['Joe', 'Bob']

    We could even use this design pattern to track all existing instances of a given class, rather than just some associated data.

    class Person(object):
        all_people = []

        def __init__(self, name):
            self.name = name
            Person.all_people.append(self)

    joe = Person('Joe')
    bob = Person('Bob')
    print Person.all_people
    ## [<__main__.Person object at 0x10e428c50>, <__main__.Person object at 0x10e428c90>]
  4. Performance (sort of… see below).
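One caveat on use case 3: a plain list holds strong references, so every Person ever created stays alive for as long as the class does. If that is not what you want, one alternative is the standard library's weakref.WeakSet, which drops entries once nothing else refers to them. This is a substitution, not part of the example above. A sketch:

```python
import gc
import weakref


class Person(object):
    # WeakSet entries disappear once the instance is no longer referenced elsewhere.
    all_people = weakref.WeakSet()

    def __init__(self, name):
        self.name = name
        Person.all_people.add(self)


joe = Person('Joe')
bob = Person('Bob')
print(sorted(p.name for p in Person.all_people))  # ['Bob', 'Joe']

del joe
gc.collect()  # CPython frees joe right away; other interpreters may need a collection
print(sorted(p.name for p in Person.all_people))  # ['Bob']
```

The trade-off is that the set is unordered and only tells you about instances that are still alive, which may or may not match what you are tracking.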


Note: If you’re worrying about performance at this level, you might not want to be using Python in the first place, as the differences will be on the order of tenths of a millisecond. Still, it’s fun to poke around a bit, and it helps for illustration’s sake.

Recall that a class’s namespace is created and filled in at the time of the class’s definition. That means that we do just one assignment—ever—for a given class variable, while instance variables must be assigned every time a new instance is created. Let’s take an example.

def called_class():
    print "Class assignment"
    return 2

class Bar(object):
    y = called_class()

    def __init__(self, x):
        self.x = x

## "Class assignment"

def called_instance():
    print "Instance assignment"
    return 2

class Foo(object):
    def __init__(self, x):
        self.y = called_instance()
        self.x = x

Bar(1)
Bar(2)
Foo(1)
## "Instance assignment"
Foo(2)
## "Instance assignment"

We assign to Bar.y just once, but instance_of_Foo.y on every call to __init__.

As further evidence, let’s use the Python disassembler:

import dis

class Bar(object):
    y = 2

    def __init__(self, x):
        self.x = x

class Foo(object):
    def __init__(self, x):
        self.y = 2
        self.x = x

dis.dis(Bar)
##  Disassembly of __init__:
##  7           0 LOAD_FAST                1 (x)
##              3 LOAD_FAST                0 (self)
##              6 STORE_ATTR               0 (x)
##              9 LOAD_CONST               0 (None)
##             12 RETURN_VALUE

dis.dis(Foo)
## Disassembly of __init__:
## 11           0 LOAD_CONST               1 (2)
##              3 LOAD_FAST                0 (self)
##              6 STORE_ATTR               0 (y)

## 12           9 LOAD_FAST                1 (x)
##             12 LOAD_FAST                0 (self)
##             15 STORE_ATTR               1 (x)
##             18 LOAD_CONST               0 (None)
##             21 RETURN_VALUE

When we look at the byte code, it’s again obvious that Foo.__init__ has to do two assignments, while Bar.__init__ does just one.

In practice, what does this gain really look like? I’ll be the first to admit that timing tests are highly dependent on often uncontrollable factors and the differences between them are often hard to explain accurately.

However, I think these small snippets (run with the Python timeit module) help to illustrate the differences between class and instance variables, so I’ve included them anyway.

Note: I’m on a MacBook Pro with OS X 10.8.5 and Python 2.7.2.


10000000 calls to `Bar(2)`: 4.940s
10000000 calls to `Foo(2)`: 6.043s
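The exact benchmark code is not shown here, so the following is only a guess at its shape: a sketch using timeit.timeit with a much smaller iteration count. The value of N and the names bar_init and foo_init are assumptions. Absolute numbers will vary by machine and Python version, so no expected output is given.

```python
import timeit


class Bar(object):
    y = 2

    def __init__(self, x):
        self.x = x


class Foo(object):
    def __init__(self, x):
        self.y = 2
        self.x = x


N = 100000  # assumed iteration count; the figures above used 10000000

# Passing callables avoids importing Bar and Foo inside a setup string,
# at the cost of one extra function call per iteration.
bar_init = timeit.timeit(lambda: Bar(2), number=N)
foo_init = timeit.timeit(lambda: Foo(2), number=N)

print("%d calls to Bar(2): %.3fs" % (N, bar_init))
print("%d calls to Foo(2): %.3fs" % (N, foo_init))
```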

The initializations of Bar are faster by over a second (about 18%), so the difference looks like more than measurement noise, although a single timing run is not a rigorous statistical test.

So why is this the case? One speculative explanation: we do two assignments in Foo.__init__, but just one in Bar.__init__.


10000000 calls to `Bar(2).y = 15`: 6.232s
10000000 calls to `Foo(2).y = 15`: 6.855s
10000000 `Bar` assignments: 6.232s - 4.940s = 1.292s
10000000 `Foo` assignments: 6.855s - 6.043s = 0.812s

Note: There’s no way to re-run your setup code on each trial with timeit, so we have to reinitialize our variable on each trial. The second pair of times is the first pair with the previously measured initialization times deducted.

From the above, it looks like Foo takes only about 63% as long as Bar to handle assignments (0.812s versus 1.292s).

Why is this the case? One speculative explanation: when we assign to Bar(2).y, the instance namespace (Bar(2).__dict__) does not contain y yet, because until then y lived only in the class namespace (Bar.__dict__['y']), so a brand-new key has to be added. When we assign to Foo(2).y, we merely overwrite a key that __init__ already put in Foo(2).__dict__. Note that assignment through an instance never falls back to the class namespace the way lookup does; barring data descriptors such as properties, it always writes to the instance.
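Timing aside, the effect of each assignment on the namespaces can be checked directly. In this sketch (Bar and Foo as defined for the disassembly above), both assignments end up in the instance's __dict__, and the class attribute Bar.y is never modified.

```python
class Bar(object):
    y = 2

    def __init__(self, x):
        self.x = x


class Foo(object):
    def __init__(self, x):
        self.y = 2
        self.x = x


b = Bar(2)
f = Foo(2)
print(sorted(vars(b)))   # ['x']: y lives only on the class so far
print(sorted(vars(f)))   # ['x', 'y']: __init__ already created y

b.y = 15                 # adds a new key to b.__dict__
f.y = 15                 # overwrites the existing key in f.__dict__

print(sorted(vars(b)))   # ['x', 'y']
print(Bar.y)             # 2: the class attribute is unchanged
```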

In summary, though these performance gains won’t matter in reality, these tests are interesting at the conceptual level. If anything, I hope these differences help illustrate the mechanical distinctions between class and instance variables.

In Conclusion

Class attributes seem to be underused in Python; a lot of programmers have different impressions of how they work and why they might be helpful.

My take: class variables have their place within the school of good code. When used with care, they can simplify things and improve readability. But when carelessly thrown into a given class, they’re sure to trip you up.

Appendix: Private instance variables

One thing I wanted to include but didn’t have a natural entrance point…

Python doesn’t have private variables so-to-speak, but another interesting relationship between class and instance naming comes with name mangling.

The Python style guide (PEP 8) suggests a single leading underscore for non-public attributes, and two leading underscores (‘__’) when you want to trigger name mangling. The double underscore is not only a sign to others that your variable is meant to be treated privately, but also a way to prevent access to it, of sorts. Here’s what I mean:

class Bar(object):
    def __init__(self):
        self.__zap = 1

a = Bar()
a.__zap
## Traceback (most recent call last):
##   File "<stdin>", line 1, in <module>
## AttributeError: 'Bar' object has no attribute '__zap'

## Hmm. So what's in the namespace?
a.__dict__
## {'_Bar__zap': 1}
a._Bar__zap
## 1

Look at that: the instance attribute __zap is automatically prefixed with the class name to yield _Bar__zap.

While still settable and gettable using a._Bar__zap, this name mangling is a means of creating a ‘private’ variable as it prevents you and others from accessing it by accident or through ignorance.

Edit: as Pedro Werneck kindly pointed out, this behavior is largely intended to help out with subclassing. In the PEP 8 style guide, they see it as serving two purposes: (1) preventing subclasses from accessing certain attributes, and (2) preventing namespace clashes in these subclasses. While useful, variable mangling shouldn’t be seen as an invitation to write code with an assumed public-private distinction, such as is present in Java.
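To illustrate the subclassing point, here is a small sketch with two hypothetical classes, Base and Derived, that both use an attribute called __zap. Because each name is mangled with the class in which it is written, the two attributes do not collide.

```python
class Base(object):
    def __init__(self):
        self.__zap = 'base'       # stored as _Base__zap

    def base_zap(self):
        return self.__zap         # compiled as self._Base__zap


class Derived(Base):
    def __init__(self):
        Base.__init__(self)
        self.__zap = 'derived'    # stored as _Derived__zap, no clash

    def derived_zap(self):
        return self.__zap         # compiled as self._Derived__zap


d = Derived()
print(d.base_zap())       # base
print(d.derived_zap())    # derived
print(sorted(vars(d)))    # ['_Base__zap', '_Derived__zap']
```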

Nice article. However, keep in mind that the name mangling with the double underscore isn't a way to prevent access to the variable, but to avoid name clashing when using inheritance. People coming from another language where the public/private distinction are more prevalent might believe it's a good practice to do that for all their "private" attributes. Not at all.
Richard Cochrane
Been using Python for years but this still taught me something new. Well written Charles.
The second example you give for reasons to use class attributes has a slight problem. Because you are directly referring to the class attribute in the add function, rather than the instance's attribute, then simply changing an instance's value for the class attribute will have no effect on the add function. That is, in order for the effect you desire, you need to change "MyClass.limit" to "self.limit" in the add function.
In the 2nd example you set a default value for the "data" variable in the __init__ method. That's a much better solution for the initial problem than using a class variable. So I'd say reason 2 and 4 are not good reasons to use it, and the 1st and 3rd reasons are what you would use static variables for. Why not reduce all this article to "use python's class variables like you'd use static variables in other languages", i.e. a namespaced/glorified global variable
Agreed. Plus: if you do fix it the way Brandon says, you still have a problem: update MyClass.limit and, suddenly, all your existing instances without explicit limit will have their behavior modified. There are (few) cases to make for that, but this limit-list is not one of them.
Thanks! That compliment means a lot--much appreciated.
Great catch. I just made the fix.
Tom Leo
Great read and great examples! The issue you ran into with mutability of class variables can also be an issue when giving functions default values. The parameters of your functions should never have a default mutable value i.e. def Bar(baz=[]).
Clark Bedsole
Quibble: In the title of this article, "overly thorough" should be hyphenated.
Graham Swan
I'm quite frankly amazed you were able to write this much on class variables! I consider myself intimately acquainted.
Aloulou Amine
Well explained !
Thank you for this article. I learned quite a bit from it. In the example in Appendix the following line: ## AttributeError: 'Bar' object has no attribute '__baz' Instead of __baz it should say __zap.
Kihwang Lee
Thank you very much for kind and comprehensive description!
Igor b
Very imformative. Thanks a lot!
Would this if len(self.data) >= self.limit: be if len(self.data) >= MyClass.limit: ?
Steven Erdmanczyk
I used Python for my MS thesis while I was still a Python newb. I was trying to use a class to store sensed nodes, but was baffled when modifying one node object was modifying others. In haste I abandoned the class approach and used dictionaries. This article has finally given me clarity.
zapp franka
Good tutorial!
Just one additional remark regarding "Recall that a class’s namespace is created and filled in at the time of the class’s definition.". One should be aware that, because of this, value assigned to class or instance variable (in __init__) using the same function call might be different. If, for example, function returns current time stamp, in the case of using class variable, the function would be evaluated at the time of class definition, while in the case of using instance variable, it would be evaluated at the time of creating the class instance. This matters big time when using functions for initialization that are dependent on parameters that could change.
Yakiv Kramarenko
I did not undestand the "Handling assignment" part. For me it seems to be wrong. Here is what I have:

Python 2.7.6 (default, Sep 9 2014, 15:04:36)
>>> class A(object):
...     cv = 0
...
>>> a1 = A()
>>> a2 = A()
>>> a1.cv, a2.cv, A.cv
(0, 0, 0)
>>> a1.cv = 1
>>> a1.cv, a2.cv, A.cv
(1, 0, 0)
>>> a2.cv = 2
>>> a1.cv, a2.cv, A.cv
(1, 2, 0)
>>> A.cv = 3
>>> a1.cv, a2.cv, A.cv
(1, 2, 3)

This explicitly tells that the "If a class attribute is set by accessing the class, it will override the value for all instances" excerpt form your "Handling Assignment" section is wrong. The mutation section seems to tell the truth:)...

>>> class B(object):
...     cv = []
...
>>> b1 = B()
>>> b2 = B()
>>> b1.cv, b2.cv, B.cv
([], [], [])
>>> b1.cv.append(1)
>>> b1.cv, b2.cv, B.cv
([1], [1], [1])
>>> b2.cv.append(2)
>>> b1.cv, b2.cv, B.cv
([1, 2], [1, 2], [1, 2])
>>> B.cv.append(3)
>>> b1.cv, b2.cv, B.cv
([1, 2, 3], [1, 2, 3], [1, 2, 3])

For me the "A" behavior makes no sense... I can't understand the logic of such python behavior... I can't understand what kind of this logic may be that it leads to so 'not relevant' behaviors for "immutable" and "mutable" cases.. This makes me think of "no any sense of using class variables" as they may be prone to mistakes... I hope that's me who does not see the light in this tunnel...
Yakiv Kramarenko
:) I have got the explanation... Here it is:

>>> A.cv = 0
>>> a1, a2 = A(), A()
>>> A.cv, a1.cv, a2.cv
(0, 0, 0)
>>> A.cv = 1
>>> A.cv, a1.cv, a2.cv
(1, 1, 1)
>>> a1.cv = 2  # Here the new instance attribute is created for a1,
               # and so it will hide the class attribute with the same name,
               # once getting the value from instance namespace
>>> A.cv, a1.cv, a2.cv
(1, 2, 1)
>>> A.cv = 3
>>> A.cv, a1.cv, a2.cv
(3, 2, 3)

Here is question asked and answered: http://stackoverflow.com/questions/28918920/why-assignment-of-the-class-attributes-in-python-behaves-like-assignment-of-inst/28919070#28919070
Jelle Smet
Here you can see i_var is an attribute of the instance whilst class_var is an attribute of the class:

>>> first = MyClass("one")
>>> second = MyClass("one")
>>> id(first.i_var)
12370264
>>> id(second.i_var)
12370240
>>> id(first.class_var)
140178214034336
>>> id(second.class_var)
140178214034336
Nirvik Ghosh
Thats one great article .. worth the read .. awesome stuff ..
Mong Him Ng
great article!
the word you were looking for is "mutate", not "mutilate", nor "manipulate" (though everyone got the gist)
Thanks, very useful!
Afzal Shahul Hameed
Great read! I can foresee me using class variables efficiently going forward. Thanks!
Jeffrey James
Just came across this and spent a good hour with it. Really appreciate the clarity and organization.
Alicja Kozikowska
Very interesting article. It helped me to organize and complete my knowledge on the topic, which I knew in bits and pieces. However, there are some things which I would like to clarify. In reason 3 for using class variables: "Note that, in this case, names will only be accessed as a class variable, so the mutable default is acceptable." It is acceptable that the class variable is mutable, because in this case it is not a default at all. Actually, using class variables as defaults is not a good idea anyway. As on of the commenters (Pedro) pointed out and I agree with him, it is much better to set them in the __init__ method.
Alicja Kozikowska
I agree with you, but instead of saying "use python's class variables like you'd use static variables in other languages" (because what if somebody has no or little experience with other languages), I would say "use Python's class variables if you need some data to be shared by the entire class and for a good reason".
A class attribute is an attribute of the class (circular, I know) what do you mean by saying circular??
He means that defining a "class attribute" as a "attribute class" is the same, and therefore is "circular"
Wow. Very comprehensive. Thanks!
This is great! Thank you for the article.
