A Guide to Performance Testing and Optimization With Python and Django

View all articles

Donald Knuth said that “premature optimization is the root of all evil.” But there comes a time, usually in mature projects with high loads, when there is an inevitable need to optimize. In this article, I would like to talk about five common methods to optimize your web project’s code. I will use Django, but the principles should be similar for other frameworks and languages. In this article, I will use these methods to reduce the response time of a query from 77 to 3.7 seconds.

Guide to Performance Optimization and Performance Testing With Python and Django

The example code is adapted from a real project that I have worked with and is demonstrative of performance optimization techniques. In case you would like to follow along and see the results yourself, you can grab the code at its initial state on GitHub and make the corresponding changes while following along the article. I’ll be using Python 2, since some third-party packages are not yet available for Python 3.

Introducing our application

Our web project simply tracks real estate offers per country. Therefore, there are only two models:

# houses/models.py
from utils.hash import Hasher


class HashableModel(models.Model):
    """Provide a hash property for models."""
    class Meta:
        abstract = True

    @property
    def hash(self):
        return Hasher.from_model(self)


class Country(HashableModel):
    """Represent a country in which the house is positioned."""
    name = models.CharField(max_length=30)

    def __unicode__(self):
        return self.name


class House(HashableModel):
    """Represent a house with its characteristics."""
    # Relations
    country = models.ForeignKey(Country, related_name='houses')

    # Attributes
    address = models.CharField(max_length=255)
    sq_meters = models.PositiveIntegerField()
    kitchen_sq_meters = models.PositiveSmallIntegerField()
    nr_bedrooms = models.PositiveSmallIntegerField()
    nr_bathrooms = models.PositiveSmallIntegerField()
    nr_floors = models.PositiveSmallIntegerField(default=1)
    year_built = models.PositiveIntegerField(null=True, blank=True)
    house_color_outside = models.CharField(max_length=20)
    distance_to_nearest_kindergarten = models.PositiveIntegerField(null=True, blank=True)
    distance_to_nearest_school = models.PositiveIntegerField(null=True, blank=True)
    distance_to_nearest_hospital = models.PositiveIntegerField(null=True, blank=True)
    has_cellar = models.BooleanField(default=False)
    has_pool = models.BooleanField(default=False)
    has_garage = models.BooleanField(default=False)
    price = models.PositiveIntegerField()

    def __unicode__(self):
        return '{} {}'.format(self.country, self.address)

The abstract HashableModel provides any model that inherits from it a hash property that contains the instance’s primary key and the content type of the model. This hides sensitive data, like instance IDs, by substituting them with a hash. It may also be useful in cases when your project has multiple models and you need a centralized place that unhashes and decides what to do with different model instances of different classes. Note that for our small project, hashing is not really needed, as we can deal without it, but it will help demonstrate some optimization techniques, so I will keep it there.

Here is the Hasher class:

# utils/hash.py
import basehash


class Hasher(object):
    @classmethod
    def from_model(cls, obj, klass=None):
        if obj.pk is None:
            return None
        return cls.make_hash(obj.pk, klass if klass is not None else obj)

    @classmethod
    def make_hash(cls, object_pk, klass):
        base36 = basehash.base36()
        content_type = ContentType.objects.get_for_model(klass, for_concrete_model=False)
        return base36.hash('%(contenttype_pk)03d%(object_pk)06d' % {
            'contenttype_pk': content_type.pk,
            'object_pk': object_pk
        })

    @classmethod
    def parse_hash(cls, obj_hash):
        base36 = basehash.base36()
        unhashed = '%09d' % base36.unhash(obj_hash)
        contenttype_pk = int(unhashed[:-6])
        object_pk = int(unhashed[-6:])
        return contenttype_pk, object_pk

    @classmethod
    def to_object_pk(cls, obj_hash):    
        return cls.parse_hash(obj_hash)[1]

As we would like to serve this data through an API endpoint, we install Django REST Framework and define the following serializers and view:

# houses/serializers.py
class HouseSerializer(serializers.ModelSerializer):
    """Serialize a `houses.House` instance."""

    id = serializers.ReadOnlyField(source="hash")
    country = serializers.ReadOnlyField(source="country.hash")

    class Meta:
        model = House
        fields = (
            'id',
            'address',
            'country',
            'sq_meters',
            'price'
        )
# houses/views.py
class HouseListAPIView(ListAPIView):
    model = House
    serializer_class = HouseSerializer
    country = None

    def get_queryset(self):
        country = get_object_or_404(Country, pk=self.country)
        queryset = self.model.objects.filter(country=country)
        return queryset

    def list(self, request, *args, **kwargs):
        # Skipping validation code for brevity
        country = self.request.GET.get("country")
        self.country = Hasher.to_object_pk(country)
        queryset = self.get_queryset()

        serializer = self.serializer_class(queryset, many=True)

        return Response(serializer.data)

Now, we populate our database with some data (a total of 100,000 house instances generated using factory-boy: 50,000 for one country, 40,000 for another, and 10,000 for a third country) and are ready to test the performance of our app.

Performance Optimization Is All about Measuring

There are several things we can measure in a project:

  • Execution time
  • Number of lines of code
  • Number of function calls
  • Allocated memory
  • Etc.

But not all of them are relevant in measuring how good our project performs. Generally speaking, there are two main metrics that are the most important: how long something executes and how much memory it needs.

In a web project, response time (the time required for the server to receive a request generated by some user’s action, process it and send back the result) is usually the most important metric, as it doesn’t let users get bored while waiting for a response and switch to another tab in their browser.

In programming, analyzing project performance is called profiling. In order to profile the performance of our API endpoint, we will use the Silk package. After installing it and making our /api/v1/houses/?country=5T22RI call (the hash that corresponds to the country with 50,000 house entries), we get this:

200 GET
/api/v1/houses/

77292ms overall
15854ms on queries
50004 queries

The overall response time is 77 seconds, out of which 16 seconds are spent on queries in the database, where there have been a total of 50,000 queries made. With such huge numbers, there is plenty room for improvement, so let’s start.

1. Optimizing Database Queries

One of the most frequent tips on performance optimization is to make sure your database queries are optimized. This case is no exception. Moreover, we can do several things about our queries to optimize the response time.

1.1 Provide all data at once

Taking a closer look at what those 50,000 queries are, you can see that these are all redundant queries to the houses_country table:

200 GET
/api/v1/houses/

77292ms overall
15854ms on queries
50004 queries

At Tables Joins Execution Time (ms)
+0:01:15.874374 "houses_country" 0 0.176
+0:01:15.873304 "houses_country" 0 0.218
+0:01:15.872225 "houses_country" 0 0.218
+0:01:15.871155 "houses_country" 0 0.198
+0:01:15.870099 "houses_country" 0 0.173
+0:01:15.869050 "houses_country" 0 0.197
+0:01:15.867877 "houses_country" 0 0.221
+0:01:15.866807 "houses_country" 0 0.203
+0:01:15.865646 "houses_country" 0 0.211
+0:01:15.864562 "houses_country" 0 0.209
+0:01:15.863511 "houses_country" 0 0.181
+0:01:15.862435 "houses_country" 0 0.228
+0:01:15.861413 "houses_country" 0 0.174

The source of this problem is the fact that, in Django, query sets are lazy. This means that a query set is not evaluated and it does not hit the database until you actually need to get the data. At the same time, it gets only the data you told it to, making subsequent requests if any additional data is needed.

That’s exactly what happened in our case. When getting the query set through House.objects.filter(country=country), Django gets a list of all houses in the given country. However, when serializing a house instance, HouseSerializer requires the country instance of the house for computing the serializer’s country field. As the country data isn’t present in the query set, django makes an additional request to get that data. And it does so for every house in the query set—that’s 50,000 times in all.

The solution is very simple, though. In order to extract all required data for serialization, you can use the select_related() method on the query set. Thus, our get_queryset will look like:

def get_queryset(self):
    country = get_object_or_404(Country, pk=self.country)
    queryset = self.model.objects.filter(country=country).select_related('country')
    return queryset

Let’s see how this impacted the performance:

200 GET
/api/v1/houses/

35979ms overall
102ms on queries
4 queries

The overall response time dropped to 36 seconds and the time spent in the database is ~100ms with only 4 queries! That’s great news, but we can do more.

1.2 Provide only the relevant data

By default, Django extracts all fields from the database. However, when you have huge tables with many columns and rows, it makes sense to tell Django what specific fields to extract, so that it won’t spend time to get information that won’t be used at all. In our case, we need only five fields for serialization, but we have 17 fields. It makes sense to specify exactly what fields to extract from the database, so that we further cut down the response time.

Django has the defer() and the only() query set methods for doing exactly this. The first one specifies what fields not to load and the second one specifies what fields to load only.

def get_queryset(self):
    country = get_object_or_404(Country, pk=self.country)
    queryset = self.model.objects.filter(country=country)\
        .select_related('country')\
        .only('id', 'address', 'country', 'sq_meters', 'price')
    return queryset

This cut the time spent on queries in half, which is good, but 50ms is not that much. The overall time also slightly dropped, but there is more space to cut it.

200 GET
/api/v1/houses/

33111ms overall
52ms on queries
4 queries

2. Optimizing Your Code

You can’t infinitely optimize the database queries, and our last result just showed that. Even if we hypothetically decrease the time spent on queries to 0, we would still face the reality of waiting for half a minute to get the response. It’s time to switch to another level of optimization: business logic.

2.1 Simplify your code

Sometimes, third-party packages come with a lot of overhead for simple tasks. One such example is our task to return serialized house instances.

Django REST Framework is great, with lots of useful features out of the box. However, our main goal right now is to cut down the response time, so it is a great candidate for optimization, especially that the serialized objects are quite simple.

Let’s write a custom serializer for this purpose. To keep it simple, we will have a single static method that does the job. In reality, you might want to have the same class and method signatures to be able to use serializers interchangeably:

# houses/serializers.py
class HousePlainSerializer(object):
    """
    Serializes a House queryset consisting of dicts with
    the following keys: 'id', 'address', 'country',
    'sq_meters', 'price'.
    """

    @staticmethod
    def serialize_data(queryset):
        """
        Return a list of hashed objects from the given queryset.
        """
        return [
            {
                'id': Hasher.from_pk_and_class(entry['id'], House),
                'address': entry['address'],
                'country': Hasher.from_pk_and_class(entry['country'], Country),
                'sq_meters': entry['sq_meters'],
                'price': entry['price']
            } for entry in queryset
        ]


# houses/views.py
class HouseListAPIView(ListAPIView):
    model = House
    serializer_class = HouseSerializer
    plain_serializer_class = HousePlainSerializer  # <-- added custom serializer
    country = None

    def get_queryset(self):
        country = get_object_or_404(Country, pk=self.country)
        queryset = self.model.objects.filter(country=country)
        return queryset

    def list(self, request, *args, **kwargs):
        # Skipping validation code for brevity
        country = self.request.GET.get("country")
        self.country = Hasher.to_object_pk(country)
        queryset = self.get_queryset()

        data = self.plain_serializer_class.serialize_data(queryset)  # <-- serialize

        return Response(data)

200 GET
/api/v1/houses/

17312ms overall
38ms on queries
4 queries

This looks better now. The response time was almost halved due to the fact that we didn’t employ DRF serializers code.

Another measurable result—the number of total function calls made during the request/response cycle—dropped from 15,859,427 calls (from the request made in section 1.2 above) to 9,257,469 calls. This means that around 1/3 of all function calls were made by Django REST Framework.

2.2 Update/substitute third-party packages

The optimization techniques described above are the most common, the ones you can do without thorough analysis and thought. However, 17 seconds still feels pretty long; in order to reduce this number, we will need to dive deeper into our code and analyze what happens under the hood. In other words, we will need to profile our code.

You can do the profiling yourself, using the built-in Python profiler, or you can use some third party packages for it (that are using the built-in Python profiler). As we already use silk, it can profile the code and generate a binary profile file, that we can further visualize. There are several visualization packages that transform a binary profile into some insightful visualizations. I will be using the snakeviz package.

Here is the visualization of the binary profile of the last request from above, hooked at the view’s dispatch method:

Image of the view's dispatch method

From top to bottom is the call stack, displaying the file name, method/function name with its line number and the corresponding cumulative time spent in that method. Now it is easier to see that a lion’s share of time is dedicated to computing the hash (the __init__.py and primes.py rectangles of violet color).

Currently, this is the main performance bottleneck in our code, but at the same time it is not really our code—it’s a third-party package.

In such situations, there are a limited number of things we can do:

  • Check for a new version of the package (that hopefully has a better performance).
  • Find another package that performs better on tasks we need.
  • Write our own implementation that will beat the performance of the package we currently use.

Luckily for me, there is a newer version of the basehash package that is responsible for hashing. The code uses v.2.1.0, but there is a v.3.0.4. Such situations, when you are able to update to a newer version of a package, are more probable when you are working on an existing project.

When checking release notes for v.3, there is this specific sentence that sounds very promising:

“A massive overhaul was done with the primality algorithms. Including (sic) support for gmpy2 if it available (sic) on the system for that much more of an increase.”

Let’s find this out!

pip install -U basehash gmpy2

200 GET
/api/v1/houses/

7738ms overall
59ms on queries
4 queries

We cut down the response time from 17 to under 8 seconds. Great result, but there is one more thing we should look at.

2.3 Refactor your own code

So far, we have improved our queries, substituted third-party complex and generic code with our own very specific functions, and updated third-party packages, but we left our existing code untouched. But sometimes a small refactoring of existing code can bring impressive results. But for this we need again to analyze the profiling results.

Image of profiling results

Taking a closer look, you can see that hashing is still a problem (not surprisingly, it is the only thing we do with our data), although we did improve in that direction. However, the greenish rectangle that says that __init__.py consumes 2.14 seconds bothers me, along with the grayish __init__.py:54(hash) that goes right after it. This means that some initialization takes a long time.

Let’s take a look at the source code of the basehash package.

# basehash/__init__.py

# Initialization of `base36` class initializes the parent, `base` class.
class base36(base):
    def __init__(self, length=HASH_LENGTH, generator=GENERATOR):
        super(base36, self).__init__(BASE36, length, generator)


class base(object):
    def __init__(self, alphabet, length=HASH_LENGTH, generator=GENERATOR):
        if len(set(alphabet)) != len(alphabet):
            raise ValueError('Supplied alphabet cannot contain duplicates.')

        self.alphabet = tuple(alphabet)
        self.base = len(alphabet)
        self.length = length
        self.generator = generator
        self.maximum = self.base ** self.length - 1
        self.prime = next_prime(int((self.maximum + 1) * self.generator))  # `next_prime` call on each initialized instance

As you can see, the initialization of a base instance requires a call of the next_prime function; that is quite heavy as we can see in the bottom left rectangles of the visualization above.

Let’s take a look at my Hash class again:

class Hasher(object):
    @classmethod
    def from_model(cls, obj, klass=None):
        if obj.pk is None:
            return None
        return cls.make_hash(obj.pk, klass if klass is not None else obj)

    @classmethod
    def make_hash(cls, object_pk, klass):
        base36 = basehash.base36()  # <-- initializing on each method call
        content_type = ContentType.objects.get_for_model(klass, for_concrete_model=False)
        return base36.hash('%(contenttype_pk)03d%(object_pk)06d' % {
            'contenttype_pk': content_type.pk,
            'object_pk': object_pk
        })

    @classmethod
    def parse_hash(cls, obj_hash):
        base36 = basehash.base36()  # <-- initializing on each method call
        unhashed = '%09d' % base36.unhash(obj_hash)
        contenttype_pk = int(unhashed[:-6])
        object_pk = int(unhashed[-6:])
        return contenttype_pk, object_pk

    @classmethod
    def to_object_pk(cls, obj_hash):    
        return cls.parse_hash(obj_hash)[1]

As you can see, I have labeled two methods that are initializing a base36 instance on each method call, which is not really required.

As hashing is a deterministic procedure, meaning that for a given input value it must always generate the same hash value, we can make it a class attribute without fearing that it will break something. Let’s check out how it will perform:

class Hasher(object):
    base36 = basehash.base36()  # <-- initialize hasher only once

    @classmethod
    def from_model(cls, obj, klass=None):
        if obj.pk is None:
            return None
        return cls.make_hash(obj.pk, klass if klass is not None else obj)

    @classmethod
    def make_hash(cls, object_pk, klass):
        content_type = ContentType.objects.get_for_model(klass, for_concrete_model=False)
        return cls.base36.hash('%(contenttype_pk)03d%(object_pk)06d' % {
            'contenttype_pk': content_type.pk,
            'object_pk': object_pk
        })

    @classmethod
    def parse_hash(cls, obj_hash):
        unhashed = '%09d' % cls.base36.unhash(obj_hash)
        contenttype_pk = int(unhashed[:-6])
        object_pk = int(unhashed[-6:])
        return contenttype_pk, object_pk

    @classmethod
    def to_object_pk(cls, obj_hash):    
        return cls.parse_hash(obj_hash)[1]

200 GET
/api/v1/houses/

3766ms overall
38ms on queries
4 queries

The final result is under four seconds, which is much smaller than what we started with. Further optimization of response time can be achieved using caching, but I won’t tackle it in this article.

Conclusion

Performance optimization is a process of analysis and discovery. There are no hard rules that apply to all cases, as each project has its own flow and bottlenecks. However, the first thing you should do is profile your code. And if in such a short example I could reduce the response time from 77 seconds to 3.7 seconds, huge projects have even more optimization potential.

If you’re interested in reading more Django related articles, check out Top 10 Mistakes that Django Developers Make by fellow Toptal Django Developer Alexandr Shurigin.

About the author

Iulian Gulea, Latvia
member since July 5, 2016
Iulian has over seven years of experience developing web projects of all sizes, from simple web sites to complex SaaS products. He has worked individually and in teams; in big companies and in startups. This diversity of experience has broadened his technical skillset, and his time as a corporate trainer has helped to develop strong communication and people skills. Iulian is also passionate about using Data Science to improve his projects. [click to continue...]
Hiring? Meet the Top 10 Freelance Python Developers for Hire in November 2017

Comments

Nikhil Kumar
Very informative article ! Thanks lulian. Do you also run a blog or some place where i can follow for more such insights ?
Iulian Gulea
Many thanks, Nikhil! I'm glad you found it useful. I have a blog at http://blog.heroesofprogramming.com/, although I haven't written there for some time. I will write more articles here on Toptal though, so stay tuned ;)
Dima
Nice article! I have some feedback, though. Instead of constantly computing hash at runtime, we could create a column which would contain the computed hash. Besides, replacing battle-tested DRF serializers with a handcrafted one might improve performance but could potentially introduce bugs and security issues. Finally, how about putting a caching middleware behind all this logic? It makes more sense to compute the results once and then server static data each time.
Iulian Gulea
Thanks Dima! I agree with your comment and that caching is an option as well and generally speaking, caching is a topic in itself. The goal of my article was to show the detailed path of this process, to demonstrate, based on a simple example, the steps and process of approaching performance optimization, as in big projects you will have to step through all of them at some point.
comments powered by Disqus
Subscribe
The #1 Blog for Engineers
Get the latest content first.
No spam. Just great engineering posts.
The #1 Blog for Engineers
Get the latest content first.
Thank you for subscribing!
Check your inbox to confirm subscription. You'll start receiving posts after you confirm.
Trending articles
Relevant Technologies
About the author
Iulian Gulea
Python 3 Developer
Iulian has over seven years of experience developing web projects of all sizes, from simple web sites to complex SaaS products. He has worked individually and in teams; in big companies and in startups. This diversity of experience has broadened his technical skillset, and his time as a corporate trainer has helped to develop strong communication and people skills. Iulian is also passionate about using Data Science to improve his projects.