A Guide to Performance Testing and Optimization With Python and Django
Donald Knuth said that “premature optimization is the root of all evil.” But there comes a time, usually in mature projects with high loads, when the need to optimize presents itself. In this article, Toptal Freelance Software Engineer Iulian Gulea talks about five common methods to optimize a web project’s code using principles that can be used in Django, as well as in other frameworks and languages. Using these principles, he demonstrates how to reduce the response time of a query from 77 to 3.7 seconds.
Iulian is a software engineer with experience developing web projects of all sizes, from simple sites to complex SaaS products. Previously a full-stack developer for PepsiCo, Iulian has worked both individually and in teams for companies large and small.
Donald Knuth said that “premature optimization is the root of all evil.” But there comes a time, usually in mature projects with high loads, when the need to optimize becomes inevitable. In this article, I would like to talk about five common methods for optimizing your web project’s code. I will use Django, but the principles are similar for other frameworks and languages. Using these methods, I will reduce the response time of a query from 77 to 3.7 seconds.
The example code is adapted from a real project that I have worked on and is representative of common performance optimization techniques. If you would like to follow along and see the results yourself, you can grab the code in its initial state on GitHub and make the corresponding changes as you read. I’ll be using Python 2, since some third-party packages are not yet available for Python 3.
Introducing Our Application
Our web project simply tracks real estate offers per country, so there are only two models:
# houses/models.py
from django.db import models

from utils.hash import Hasher


class HashableModel(models.Model):
    """Provide a hash property for models."""

    class Meta:
        abstract = True

    @property
    def hash(self):
        return Hasher.from_model(self)


class Country(HashableModel):
    """Represent a country in which the house is positioned."""

    name = models.CharField(max_length=30)

    def __unicode__(self):
        return self.name


class House(HashableModel):
    """Represent a house with its characteristics."""

    # Relations
    country = models.ForeignKey(Country, related_name='houses')

    # Attributes
    address = models.CharField(max_length=255)
    sq_meters = models.PositiveIntegerField()
    kitchen_sq_meters = models.PositiveSmallIntegerField()
    nr_bedrooms = models.PositiveSmallIntegerField()
    nr_bathrooms = models.PositiveSmallIntegerField()
    nr_floors = models.PositiveSmallIntegerField(default=1)
    year_built = models.PositiveIntegerField(null=True, blank=True)
    house_color_outside = models.CharField(max_length=20)
    distance_to_nearest_kindergarten = models.PositiveIntegerField(null=True, blank=True)
    distance_to_nearest_school = models.PositiveIntegerField(null=True, blank=True)
    distance_to_nearest_hospital = models.PositiveIntegerField(null=True, blank=True)
    has_cellar = models.BooleanField(default=False)
    has_pool = models.BooleanField(default=False)
    has_garage = models.BooleanField(default=False)
    price = models.PositiveIntegerField()

    def __unicode__(self):
        return '{} {}'.format(self.country, self.address)
The abstract HashableModel provides a hash property to any model that inherits from it; the hash encodes the instance’s primary key together with the content type of the model. This hides sensitive data, like instance IDs, by substituting them with a hash. It may also be useful when your project has multiple models and you need a centralized place that unhashes values and decides what to do with instances of different classes. Note that for our small project, hashing is not really needed, as we could do without it, but it will help demonstrate some optimization techniques, so I will keep it.
Here is the Hasher class:
# utils/hash.py
import basehash

from django.contrib.contenttypes.models import ContentType


class Hasher(object):

    @classmethod
    def from_model(cls, obj, klass=None):
        if obj.pk is None:
            return None
        return cls.make_hash(obj.pk, klass if klass is not None else obj)

    @classmethod
    def make_hash(cls, object_pk, klass):
        base36 = basehash.base36()
        content_type = ContentType.objects.get_for_model(klass, for_concrete_model=False)
        return base36.hash('%(contenttype_pk)03d%(object_pk)06d' % {
            'contenttype_pk': content_type.pk,
            'object_pk': object_pk
        })

    @classmethod
    def parse_hash(cls, obj_hash):
        base36 = basehash.base36()
        unhashed = '%09d' % base36.unhash(obj_hash)
        contenttype_pk = int(unhashed[:-6])
        object_pk = int(unhashed[-6:])
        return contenttype_pk, object_pk

    @classmethod
    def to_object_pk(cls, obj_hash):
        return cls.parse_hash(obj_hash)[1]
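To make the mechanics concrete, here is what a round trip might look like in a Django shell (the hash string below is illustrative, not a real output):

# Django shell session (illustrative)
from houses.models import Country
from utils.hash import Hasher

country = Country.objects.get(name='Moldova')  # assumes such a country exists
obj_hash = country.hash            # e.g. '5T22RI' (illustrative value)
Hasher.to_object_pk(obj_hash)      # -> country.pk
Hasher.parse_hash(obj_hash)        # -> (contenttype_pk, country.pk)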
As we would like to serve this data through an API endpoint, we install Django REST Framework and define the following serializers and view:
# houses/serializers.py
from rest_framework import serializers

from houses.models import House


class HouseSerializer(serializers.ModelSerializer):
    """Serialize a `houses.House` instance."""

    id = serializers.ReadOnlyField(source="hash")
    country = serializers.ReadOnlyField(source="country.hash")

    class Meta:
        model = House
        fields = (
            'id',
            'address',
            'country',
            'sq_meters',
            'price'
        )
# houses/views.py
from django.shortcuts import get_object_or_404

from rest_framework.generics import ListAPIView
from rest_framework.response import Response

from houses.models import Country, House
from houses.serializers import HouseSerializer
from utils.hash import Hasher


class HouseListAPIView(ListAPIView):
    model = House
    serializer_class = HouseSerializer
    country = None

    def get_queryset(self):
        country = get_object_or_404(Country, pk=self.country)
        queryset = self.model.objects.filter(country=country)
        return queryset

    def list(self, request, *args, **kwargs):
        # Skipping validation code for brevity
        country = self.request.GET.get("country")
        self.country = Hasher.to_object_pk(country)
        queryset = self.get_queryset()
        serializer = self.serializer_class(queryset, many=True)
        return Response(serializer.data)
Now, we populate our database with some data (a total of 100,000 house instances generated using factory-boy: 50,000 for one country, 40,000 for another, and 10,000 for a third) and are ready to test the performance of our app.
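The actual factories live in the repository; here is a minimal sketch of what a house factory might look like (the class names and field values below are illustrative, not the project’s exact code):

# houses/factories.py (illustrative sketch)
import factory
from factory import fuzzy

from houses.models import Country, House


class CountryFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = Country

    name = factory.Sequence(lambda n: 'Country %d' % n)


class HouseFactory(factory.django.DjangoModelFactory):
    class Meta:
        model = House

    country = factory.SubFactory(CountryFactory)
    address = factory.Sequence(lambda n: '%d Main Street' % n)
    sq_meters = fuzzy.FuzzyInteger(30, 500)
    kitchen_sq_meters = fuzzy.FuzzyInteger(5, 30)
    nr_bedrooms = fuzzy.FuzzyInteger(1, 5)
    nr_bathrooms = fuzzy.FuzzyInteger(1, 3)
    house_color_outside = fuzzy.FuzzyChoice(['white', 'gray', 'blue'])
    price = fuzzy.FuzzyInteger(50000, 1000000)

Generating 50,000 houses for one country is then a one-liner: HouseFactory.create_batch(50000, country=some_country).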
Performance Optimization Is All About Measuring
There are several things we can measure in a project:
- Execution time
- Number of lines of code
- Number of function calls
- Allocated memory
- Etc.
But not all of them are relevant for measuring how well our project performs. Generally speaking, there are two main metrics that matter most: how long something takes to execute and how much memory it needs.
In a web project, response time (the time it takes the server to receive a request generated by a user’s action, process it, and send back the result) is usually the most important metric, because a slow response gives users time to get bored waiting and switch to another tab in their browser.
In programming, analyzing project performance is called profiling. In order to profile the performance of our API endpoint, we will use the Silk package. After installing it and making our /api/v1/houses/?country=5T22RI call (the hash corresponds to the country with 50,000 house entries), we get this:
200 GET
/api/v1/houses/
77292ms overall
15854ms on queries
50004 queries
The overall response time is 77 seconds, out of which 16 seconds are spent on database queries, with a total of 50,000 queries made. With such huge numbers, there is plenty of room for improvement, so let’s start.
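If you want to reproduce these measurements yourself, Silk is wired into a Django project roughly as follows (a minimal setup sketch based on Silk’s documented configuration):

# settings.py
MIDDLEWARE = [  # MIDDLEWARE_CLASSES on Django < 1.10
    # ... the rest of your middleware ...
    'silk.middleware.SilkyMiddleware',
]

INSTALLED_APPS = (
    # ... the rest of your apps ...
    'silk',
)

# urls.py
from django.conf.urls import url, include

urlpatterns += [url(r'^silk/', include('silk.urls', namespace='silk'))]

After that, every request shows up at /silk/ with its overall time, the time spent on queries, and the queries themselves.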
1. Optimizing Database Queries
One of the most frequent tips on performance optimization is to make sure your database queries are optimized. This case is no exception. In fact, there are several things we can do to our queries to improve the response time.
1.1 Provide all data at once
Taking a closer look at what those 50,000 queries are, you can see that they are all redundant queries to the houses_country table:
At | Tables | Joins | Execution Time (ms) |
---|---|---|---|
+0:01:15.874374 | "houses_country" | 0 | 0.176 |
+0:01:15.873304 | "houses_country" | 0 | 0.218 |
+0:01:15.872225 | "houses_country" | 0 | 0.218 |
+0:01:15.871155 | "houses_country" | 0 | 0.198 |
+0:01:15.870099 | "houses_country" | 0 | 0.173 |
+0:01:15.869050 | "houses_country" | 0 | 0.197 |
+0:01:15.867877 | "houses_country" | 0 | 0.221 |
+0:01:15.866807 | "houses_country" | 0 | 0.203 |
+0:01:15.865646 | "houses_country" | 0 | 0.211 |
+0:01:15.864562 | "houses_country" | 0 | 0.209 |
+0:01:15.863511 | "houses_country" | 0 | 0.181 |
+0:01:15.862435 | "houses_country" | 0 | 0.228 |
+0:01:15.861413 | "houses_country" | 0 | 0.174 |
The source of this problem is the fact that, in Django, query sets are lazy. This means that a query set is not evaluated and does not hit the database until you actually need the data. At the same time, it gets only the data you explicitly asked for, making additional requests if more data is needed.
That’s exactly what happened in our case. When getting the query set through House.objects.filter(country=country), Django fetches a list of all houses in the given country. However, when serializing a house instance, HouseSerializer requires the country instance of the house to compute the serializer’s country field. As the country data isn’t present in the query set, Django makes an additional request to get it. And it does so for every house in the query set: that’s 50,000 times in all.
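In other words, the serialization loop behaves like this simplified illustration of the classic N+1 query problem (not the actual DRF internals):

houses = House.objects.filter(country=country)  # lazy: no query yet

for house in houses:  # iterating runs one query to fetch all houses
    # `house.country` was not fetched together with the houses, so each
    # access here triggers a separate query to the houses_country table
    country_hash = house.country.hash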
The solution is very simple, though. In order to extract all the required data for serialization, you can use the select_related() method on the query set. Thus, our get_queryset will look like this:
def get_queryset(self):
    country = get_object_or_404(Country, pk=self.country)
    queryset = self.model.objects.filter(country=country).select_related('country')
    return queryset
Let’s see how this impacted the performance:
200 GET
/api/v1/houses/
35979ms overall
102ms on queries
4 queries
The overall response time dropped to 36 seconds, and the time spent in the database is ~100ms with only 4 queries! That’s great news, but we can do more.
1.2 Provide only the relevant data
By default, Django extracts all fields from the database. However, when you have huge tables with many columns and rows, it makes sense to tell Django which specific fields to extract, so that it won’t spend time fetching information that won’t be used at all. In our case, we need only five fields for serialization, but the model has 17 fields. It makes sense to specify exactly which fields to extract from the database, so that we can further cut down the response time.
Django has the defer() and only() query set methods for doing exactly this. The first specifies which fields not to load, and the second specifies which fields to load exclusively.
def get_queryset(self):
    country = get_object_or_404(Country, pk=self.country)
    queryset = self.model.objects.filter(country=country)\
        .select_related('country')\
        .only('id', 'address', 'country', 'sq_meters', 'price')
    return queryset
This cut the time spent on queries in half, which is good, but 50ms is not that much in absolute terms. The overall time also dropped slightly, but there is more room to cut it.
200 GET
/api/v1/houses/
33111ms overall
52ms on queries
4 queries
2. Optimizing Your Code
You can’t optimize database queries indefinitely, and our last result just showed that. Even if we hypothetically reduced the time spent on queries to zero, we would still face the reality of waiting half a minute for the response. It’s time to switch to another level of optimization: business logic.
2.1 Simplify your code
Sometimes, third-party packages come with a lot of overhead for simple tasks. One such example is our task to return serialized house instances.
Django REST Framework is great, with lots of useful features out of the box. However, our main goal right now is to cut down the response time, so it is a great candidate for optimization, especially since the objects we serialize are quite simple.
Let’s write a custom serializer for this purpose. To keep it simple, we will have a single static method that does the job. In reality, you might want to keep the same class and method signatures so the serializers can be used interchangeably:
# houses/serializers.py
class HousePlainSerializer(object):
    """
    Serializes a House queryset consisting of dicts with
    the following keys: 'id', 'address', 'country',
    'sq_meters', 'price'.
    """

    @staticmethod
    def serialize_data(queryset):
        """
        Return a list of hashed objects from the given queryset.
        """
        return [
            {
                'id': Hasher.from_pk_and_class(entry['id'], House),
                'address': entry['address'],
                'country': Hasher.from_pk_and_class(entry['country'], Country),
                'sq_meters': entry['sq_meters'],
                'price': entry['price']
            } for entry in queryset
        ]
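Note that Hasher.from_pk_and_class does not appear in the Hasher class shown earlier; it is presumably a small convenience method added for this step, something along these lines:

# utils/hash.py (assumed addition to the Hasher class)
@classmethod
def from_pk_and_class(cls, pk, klass):
    """Hash a raw primary key for the given model class."""
    return cls.make_hash(pk, klass)

Unlike from_model, it works with raw primary keys from .values() dicts instead of model instances.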
# houses/views.py
class HouseListAPIView(ListAPIView):
    model = House
    serializer_class = HouseSerializer
    plain_serializer_class = HousePlainSerializer  # <-- added custom serializer
    country = None

    def get_queryset(self):
        country = get_object_or_404(Country, pk=self.country)
        queryset = self.model.objects.filter(country=country)
        # .values() yields plain dicts with exactly the keys
        # that HousePlainSerializer expects
        return queryset.values('id', 'address', 'country', 'sq_meters', 'price')

    def list(self, request, *args, **kwargs):
        # Skipping validation code for brevity
        country = self.request.GET.get("country")
        self.country = Hasher.to_object_pk(country)
        queryset = self.get_queryset()
        data = self.plain_serializer_class.serialize_data(queryset)  # <-- serialize
        return Response(data)
200 GET
/api/v1/houses/
17312ms overall
38ms on queries
4 queries
This looks better now. The response time was almost halved because we bypassed the DRF serialization code.
Another measurable result: the number of total function calls made during the request/response cycle dropped from 15,859,427 (for the request made in section 1.2 above) to 9,257,469. This means that around a third of all function calls were made by Django REST Framework.
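Silk reports this number when its profiler is enabled; if you want to obtain such a count yourself, Python’s built-in cProfile can provide it (a sketch using Django’s test client and the request from our example):

import cProfile
import pstats

from django.test import Client

profiler = cProfile.Profile()
profiler.enable()
Client().get('/api/v1/houses/?country=5T22RI')
profiler.disable()

stats = pstats.Stats(profiler)
print(stats.total_calls)  # total number of function calls made during the request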
2.2 Update/substitute third-party packages
The optimization techniques described so far are the most common, the kind you can apply without thorough analysis and thought. However, 17 seconds still feels pretty long; in order to reduce this number, we will need to dive deeper into our code and analyze what happens under the hood. In other words, we will need to profile our code.
You can do the profiling yourself using the built-in Python profiler, or you can use some third-party packages for it (which wrap the built-in Python profiler). As we already use Silk, it can profile the code and generate a binary profile file that we can then visualize. There are several visualization packages that transform a binary profile into insightful visualizations. I will be using the snakeviz package.
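To get the binary profiles, Silk’s profiler needs to be switched on in the settings; snakeviz can then render a saved profile in the browser (the file path below is illustrative):

# settings.py
SILKY_PYTHON_PROFILER = True         # profile each request with cProfile
SILKY_PYTHON_PROFILER_BINARY = True  # also store a binary .prof file per request

pip install snakeviz
snakeviz /path/to/silk/profiles/some_request.prof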
Here is the visualization of the binary profile of the last request from above, hooked at the view’s dispatch method:
From top to bottom is the call stack, displaying the file name, the method/function name with its line number, and the corresponding cumulative time spent in that method. Now it is easier to see that the lion’s share of time is dedicated to computing the hash (the __init__.py and primes.py rectangles in violet).
Currently, this is the main performance bottleneck in our code, but at the same time it is not really our code—it’s a third-party package.
In such situations, there are a limited number of things we can do:
- Check for a new version of the package (that hopefully has a better performance).
- Find another package that performs better on tasks we need.
- Write our own implementation that will beat the performance of the package we currently use.
Luckily for me, there is a newer version of the basehash package, which is responsible for hashing. The code uses v2.1.0, but there is a v3.0.4 available. Such situations, where you are able to update a package to a newer version, are more likely when you are working on an existing project.
Checking the release notes for v3, there is one specific sentence that sounds very promising:
A massive overhaul was done with the primality algorithms. Including (sic) support for gmpy2 if it available (sic) on the system for that much more of an increase.
Let’s find out!
pip install -U basehash gmpy2
200 GET
/api/v1/houses/
7738ms overall
59ms on queries
4 queries
We cut the response time down from 17 to under 8 seconds. A great result, but there is one more thing we should look at.
2.3 Refactor your own code
So far, we have improved our queries, substituted complex and generic third-party code with our own very specific functions, and updated third-party packages, but we have left our own existing code untouched. Sometimes, though, a small refactoring of existing code can bring impressive results. But for this we again need to analyze the profiling results.
Taking a closer look, you can see that hashing is still a problem (not surprisingly, it is the only thing we do with our data), although we did improve in that direction. However, the greenish rectangle saying that __init__.py consumes 2.14 seconds bothers me, along with the grayish __init__.py:54(hash) that goes right after it. This means that some initialization takes a long time.
Let’s take a look at the source code of the basehash package:
# basehash/__init__.py

# Initialization of the `base36` class initializes the parent `base` class.
class base36(base):
    def __init__(self, length=HASH_LENGTH, generator=GENERATOR):
        super(base36, self).__init__(BASE36, length, generator)


class base(object):
    def __init__(self, alphabet, length=HASH_LENGTH, generator=GENERATOR):
        if len(set(alphabet)) != len(alphabet):
            raise ValueError('Supplied alphabet cannot contain duplicates.')

        self.alphabet = tuple(alphabet)
        self.base = len(alphabet)
        self.length = length
        self.generator = generator
        self.maximum = self.base ** self.length - 1
        self.prime = next_prime(int((self.maximum + 1) * self.generator))  # <-- `next_prime` call on each initialized instance
As you can see, the initialization of a base instance requires a call to the next_prime function, which is quite heavy, as we can see in the bottom-left rectangles of the visualization above.
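You can confirm the cost of this per-instance initialization with a quick micro-benchmark (absolute numbers will vary from machine to machine):

import timeit

# 100 fresh instantiations, each paying the `next_prime` cost...
print(timeit.timeit('basehash.base36()', setup='import basehash', number=100))
# ...versus 100 hash calls on a single, already-initialized instance
print(timeit.timeit('b36.hash(1)', setup='import basehash; b36 = basehash.base36()', number=100))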
Let’s take a look at my Hasher class again:
class Hasher(object):

    @classmethod
    def from_model(cls, obj, klass=None):
        if obj.pk is None:
            return None
        return cls.make_hash(obj.pk, klass if klass is not None else obj)

    @classmethod
    def make_hash(cls, object_pk, klass):
        base36 = basehash.base36()  # <-- initializing on each method call
        content_type = ContentType.objects.get_for_model(klass, for_concrete_model=False)
        return base36.hash('%(contenttype_pk)03d%(object_pk)06d' % {
            'contenttype_pk': content_type.pk,
            'object_pk': object_pk
        })

    @classmethod
    def parse_hash(cls, obj_hash):
        base36 = basehash.base36()  # <-- initializing on each method call
        unhashed = '%09d' % base36.unhash(obj_hash)
        contenttype_pk = int(unhashed[:-6])
        object_pk = int(unhashed[-6:])
        return contenttype_pk, object_pk

    @classmethod
    def to_object_pk(cls, obj_hash):
        return cls.parse_hash(obj_hash)[1]
As you can see, I have labeled two methods that initialize a base36 instance on each method call, which is not really required. Since hashing is a deterministic procedure, meaning that for a given input value it must always generate the same hash value, we can make the base36 instance a class attribute without fear of breaking anything. Let’s check out how it will perform:
class Hasher(object):
    base36 = basehash.base36()  # <-- initialize the hasher only once

    @classmethod
    def from_model(cls, obj, klass=None):
        if obj.pk is None:
            return None
        return cls.make_hash(obj.pk, klass if klass is not None else obj)

    @classmethod
    def make_hash(cls, object_pk, klass):
        content_type = ContentType.objects.get_for_model(klass, for_concrete_model=False)
        return cls.base36.hash('%(contenttype_pk)03d%(object_pk)06d' % {
            'contenttype_pk': content_type.pk,
            'object_pk': object_pk
        })

    @classmethod
    def parse_hash(cls, obj_hash):
        unhashed = '%09d' % cls.base36.unhash(obj_hash)
        contenttype_pk = int(unhashed[:-6])
        object_pk = int(unhashed[-6:])
        return contenttype_pk, object_pk

    @classmethod
    def to_object_pk(cls, obj_hash):
        return cls.parse_hash(obj_hash)[1]
200 GET
/api/v1/houses/
3766ms overall
38ms on queries
4 queries
The final result is under 4 seconds, which is much smaller than what we started with. Further reduction in response time could be achieved with caching, but I won’t tackle that in this article.
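As a quick taste of what that might look like, Django’s per-view cache could serve repeated identical requests without recomputing anything (a minimal sketch; the URL pattern and timeout are illustrative):

# urls.py (illustrative sketch)
from django.conf.urls import url
from django.views.decorators.cache import cache_page

from houses.views import HouseListAPIView

urlpatterns = [
    # Cache the rendered response for 15 minutes per unique URL
    url(r'^api/v1/houses/$', cache_page(60 * 15)(HouseListAPIView.as_view())),
]

Since cache_page keys on the full URL, each country’s query string would get its own cache entry.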
Conclusion
Performance optimization is a process of analysis and discovery. There are no hard rules that apply to all cases, as each project has its own flow and bottlenecks. However, the first thing you should do is profile your code. And if in such a short example I could reduce the response time from 77 seconds to 3.7 seconds, huge projects have even more optimization potential.