Python is amazing.

Surprisingly, that’s a fairly ambiguous statement. What do I mean by ‘Python’? Do I mean Python the abstract interface? Do I mean CPython, the common Python implementation (and not to be confused with the similarly named Cython)? Or do I mean something else entirely? Maybe I’m obliquely referring to Jython, or IronPython, or PyPy. Or maybe I’ve really gone off the deep end and I’m talking about RPython or RubyPython (which are very, very different things).

While the technologies mentioned above have similar names and are often mentioned together, some of them serve completely different purposes (or, at least, operate in completely different ways).

Throughout my time working with Python, I’ve run across tons of these .*ython tools. But not until recently did I take the time to understand what they are, how they work, and why they’re necessary (in their own ways).

In this post, I’ll start from scratch and move through the various Python implementations, concluding with a thorough introduction to PyPy, which I believe is the future of the language.

It all starts with an understanding of what ‘Python’ actually is.

If you have a good understanding of machine code, virtual machines, and the like, feel free to skip ahead.

“Is Python interpreted or compiled?”

This is a common point of confusion for Python beginners.

The first thing to realize is that ‘Python’ is an interface. There’s a specification of what Python should do and how it should behave (as with any interface). And there are multiple implementations (as with any interface).

The second thing to realize is that ‘interpreted’ and ‘compiled’ are properties of an implementation, not an interface.

So the question itself isn’t really well-formed.


That said, for the most common implementation (CPython: written in C, often referred to as simply ‘Python’, and surely what you’re using if you have no idea what I’m talking about), the answer is: interpreted, with some compilation. CPython compiles* Python source code to bytecode, and then interprets this bytecode, executing it as it goes.

* Note: this isn’t ‘compilation’ in the traditional sense of the word. Typically, we’d say that ‘compilation’ is taking a high-level language and converting it to machine code. But it is a ‘compilation’ of sorts.

Let’s look at that answer more closely, as it will help us understand some of the concepts that come up later in the post.

Bytecode vs. Machine Code

It’s very important to understand the difference between bytecode and machine (or native) code, perhaps best illustrated by example:

  • C compiles to machine code, which is then run directly on your processor. Each instruction instructs your CPU to move stuff around.
  • Java compiles to bytecode, which is then run on the Java Virtual Machine (JVM), an abstraction of a computer that executes programs. Each instruction is then handled by the JVM, which interacts with your computer.

In very brief terms: machine code is much faster, but bytecode is more portable and secure.

Machine code looks different depending on your machine, but bytecode looks the same on all machines. One might say that machine code is optimized to your setup.

Returning to CPython, the toolchain process is as follows:

  1. CPython compiles your Python source code into bytecode.
  2. That bytecode is then executed on the CPython Virtual Machine.
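
Beginners often assume Python is compiled because of .pyc files. There’s some truth to that: the .pyc file is the compiled bytecode, which is then interpreted. So if you’ve run your Python code before and have the .pyc file handy, it will start up faster the second time, as it doesn’t have to recompile the source code into bytecode. (The bytecode itself doesn’t execute any faster, and CPython only caches .pyc files for imported modules, not for the script you run directly.)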
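To see CPython’s two steps for yourself, the built-in compile() function and the standard-library dis module expose the bytecode directly. This is a small sketch: the exact instructions printed by dis.dis() differ between CPython versions, so don’t read too much into the specific opcodes.

```python
import dis

source = "total = sum(n * n for n in range(4))"

# Step 1: compile the source text into a code object, which holds the bytecode.
code = compile(source, "<example>", "exec")
print(type(code).__name__)   # code
print(code.co_code[:8])      # raw bytecode bytes; exact values vary by CPython version

# A human-readable listing of the instructions the CPython VM will execute.
dis.dis(code)

# Step 2: hand the code object to the VM, which interprets the bytecode.
namespace = {}
exec(code, namespace)
print(namespace["total"])    # 14
```

Running python -m py_compile yourfile.py performs only step 1 and writes the resulting .pyc file to disk.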

Alternative VMs: Jython, IronPython, and More

As I mentioned earlier, Python has several implementations, the most common of which is CPython. This is a Python implementation written in C and considered the ‘default’ implementation.

But what about the alternatives? One of the more prominent is Jython, a Python implementation written in Java that utilizes the JVM. While CPython produces bytecode to run on the CPython VM, Jython produces Java bytecode to run on the JVM (this is the same stuff that’s produced when you compile a Java program).

“Why would you ever use an alternative implementation?”, you might ask. Well, for one, these different implementations play nicely with different technology stacks.

CPython makes it very easy to write C extensions for your Python code because, in the end, your code is executed by an interpreter that is itself written in C. Jython, on the other hand, makes it very easy to work with other Java programs: you can import any Java class with no additional effort, summoning up and utilizing your Java classes from within your Jython programs. (Aside: if you haven’t thought about it closely, this is actually nuts. We’re at the point where you can mix and match different languages and compile them all down to the same substance. (As mentioned by Rostin, programs that mix Fortran and C code have been around for a while. So, of course, this isn’t necessarily new. But it’s still cool.))

As an example, this is valid Jython code:

[Java HotSpot(TM) 64-Bit Server VM (Apple Inc.)] on java1.6.0_51
>>> from java.util import HashSet
>>> s = HashSet(5)
>>> s.add("Foo")
>>> s.add("Bar")
>>> s
[Foo, Bar]

IronPython is another popular Python implementation, written entirely in C# and targeting the .NET stack. In particular, it runs on what you might call the .NET Virtual Machine, Microsoft’s Common Language Runtime (CLR), comparable to the JVM.

You might say that Jython : Java :: IronPython : C#. Each runs on the same VM as its host language, you can import C# classes from your IronPython code and Java classes from your Jython code, etc.

It’s totally possible to survive without ever touching a non-CPython Python implementation. But there are advantages to be had from switching, most of which are dependent on your technology stack. Using a lot of JVM-based languages? Jython might be for you. All about the .NET stack? Maybe you should try IronPython (and maybe you already have).
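If you do end up supporting more than one implementation, the standard library can tell you which one your code is running on. Below is a minimal sketch; describe_runtime() and on_jvm() are hypothetical helper names, and the on_jvm() check assumes Jython’s convention of reporting a sys.platform string that starts with "java" (as in the java1.6.0_51 banner above).

```python
import platform
import sys


def describe_runtime() -> str:
    # Returns e.g. 'CPython', 'PyPy', 'Jython', or 'IronPython', plus the language version.
    return f"{platform.python_implementation()} {platform.python_version()}"


def on_jvm() -> bool:
    # Assumption: Jython reports sys.platform values such as 'java1.6.0_51'.
    return sys.platform.startswith("java")


print(describe_runtime())
if on_jvm():
    # This branch only runs under Jython, where Java packages are importable.
    from java.util import HashSet
```

Guarding Java-only imports this way keeps a single module importable on CPython as well.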

By the way: while this wouldn’t be a reason to use a different implementation, note that these implementations do actually differ in behavior beyond how they treat your Python source code. However, these differences are typically minor, and they dissolve or emerge over time as these implementations are under active development. For example, IronPython uses Unicode strings by default; CPython 2.x, however, treats plain string literals as byte strings and uses ASCII for implicit conversions to and from Unicode (failing with a UnicodeEncodeError or UnicodeDecodeError for non-ASCII characters), whereas CPython 3.x uses Unicode strings by default.
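To make the ASCII failure mode concrete, here is a small Python 3 sketch: the string itself is Unicode, and only an explicit encode to ASCII fails. The helper name first_non_ascii() is hypothetical.

```python
def first_non_ascii(s: str):
    """Return the first character the ASCII codec can't encode, or None."""
    try:
        s.encode("ascii")
    except UnicodeEncodeError as exc:
        return s[exc.start]  # exc.start is the index of the first offending character
    return None


text = "naïve café"               # a Python 3 str literal is Unicode by default
utf8 = text.encode("utf-8")       # explicit encoding to bytes works for any character
print(len(text), len(utf8))       # 10 12 (ï and é each take two bytes in UTF-8)
print(first_non_ascii(text))      # ï
```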

Just-in-Time Compilation: PyPy, and the Future

So we have a Python implementation written in C, one in Java, and one in C#. The next logical step: a Python implementation written in… Python. (The educated reader will note that this is slightly misleading.)

Here’s where things might get confusing. First, let’s discuss just-in-time (JIT) compilation.

JIT: The Why and How

Recall that native machine code is much faster than bytecode. Well, what if we could compile some of our bytecode and then run it as native code? We’d have to pay some price to compile the bytecode (i.e., time), but if the end result was faster, that’d be great! This is the motivation of JIT compilation, a hybrid technique that mixes the benefits of interpreters and compilers. In basic terms, JIT wants to utilize compilation to speed up an interpreted system.

For example, a common approach taken by JITs:

  1. Identify bytecode that is executed frequently.
  2. Compile it down to native machine code.
  3. Cache the result.
  4. Whenever the same bytecode is set to be run, instead grab the pre-compiled machine code and reap the benefits (i.e., speed boosts).
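The four steps above can be modeled in a few dozen lines of pure Python. This is a toy under loudly stated assumptions: the stack-machine “bytecode”, its opcodes, and the ToyJIT class are all hypothetical, and a Python lambda built with eval() stands in for native machine code. A real JIT such as PyPy’s emits actual machine instructions.

```python
from collections import Counter
from typing import Callable, Dict, Tuple

# A toy "bytecode": (opcode, argument) pairs for a tiny stack machine (hypothetical).
Program = Tuple[Tuple[str, int], ...]


def interpret(program: Program, x: int) -> int:
    """Slow path: dispatch on every instruction, every time."""
    stack = []
    for op, arg in program:
        if op == "LOAD_X":
            stack.append(x)
        elif op == "CONST":
            stack.append(arg)
        elif op in ("ADD", "MUL"):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == "ADD" else a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()


def compile_program(program: Program) -> Callable[[int], int]:
    """Translate the whole program once into a single Python expression."""
    stack = []
    for op, arg in program:
        if op == "LOAD_X":
            stack.append("x")
        elif op == "CONST":
            stack.append(repr(arg))
        elif op in ("ADD", "MUL"):
            b, a = stack.pop(), stack.pop()
            stack.append(f"({a} {'+' if op == 'ADD' else '*'} {b})")
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return eval(f"lambda x: {stack.pop()}")  # stands in for emitting machine code


class ToyJIT:
    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.counts: Counter = Counter()
        self.cache: Dict[Program, Callable[[int], int]] = {}

    def run(self, program: Program, x: int) -> int:
        compiled = self.cache.get(program)
        if compiled is not None:
            return compiled(x)                     # 4. reuse the cached result
        self.counts[program] += 1                  # 1. find frequently executed code
        if self.counts[program] >= self.threshold:
            self.cache[program] = compile_program(program)  # 2-3. compile and cache
        return interpret(program, x)


# 2 * x + 1, written as stack-machine bytecode
double_plus_one = (("LOAD_X", 0), ("CONST", 2), ("MUL", 0), ("CONST", 1), ("ADD", 0))
jit = ToyJIT(threshold=3)
print([jit.run(double_plus_one, i) for i in range(5)])  # [1, 3, 5, 7, 9]
```

The first three calls go through the interpreter; from the fourth call on, the cached “compiled” version runs instead. The threshold is the price/benefit knob mentioned above: compiling too eagerly wastes time on code that never runs again.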

This is what PyPy is all about: bringing JIT to Python (see the Appendix for previous efforts). There are, of course, other goals: PyPy aims to be cross-platform, memory-light, and Stackless-compatible. But JIT is really its selling point. Averaged over a suite of benchmarks, it’s said to improve performance over CPython by a factor of 6.27. For a breakdown, see this chart from the PyPy Speed Center:

PyPy is Hard to Understand

PyPy has huge potential, and at this point it’s highly compatible with CPython (so it can run Flask, Django, etc.).

But there’s a lot of confusion around PyPy (see, for example, this nonsensical proposal to create a PyPyPy…). In my opinion, that’s primarily because PyPy is actually two things:

  1. A Python interpreter written in RPython (not Python (I lied before)). RPython is a restricted subset of Python that can be statically typed. In full Python, it’s “mostly impossible” to reason rigorously about types. Why is it so hard? Well, consider the fact that:

       import random
       x = random.choice([1, "foo"])

     would be valid Python code (credit to Ademan). What is the type of x? How can we reason about the types of variables when the types aren’t even strictly enforced? With RPython, you sacrifice some flexibility, but in exchange it becomes much, much easier to reason about memory management and whatnot, which allows for optimizations.

  2. A compiler that compiles RPython code for various targets and adds in JIT. The default platform is C, i.e., an RPython-to-C compiler, but you could also target the JVM and others.
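Returning to the random.choice example from point 1: the sketch below runs that same line many times and records which types come out. A static analyzer could at best conclude “int or str” for x, and an operation like x + 1 would then only fail at runtime. RPython forbids mixing unrelated types like this in one variable, which is what lets its toolchain settle on a single concrete type per variable. The function names pick() and add_one() are illustrative only.

```python
import random


def pick(rng: random.Random):
    # The same line of code every time, but the result's type is decided at runtime.
    return rng.choice([1, "foo"])


rng = random.Random(0)  # seeded only so the sketch is repeatable
observed = {type(pick(rng)).__name__ for _ in range(100)}
print(sorted(observed))


def add_one(x):
    return x + 1  # fine when x is an int, TypeError when x is "foo"
```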

Solely for clarity, I’ll refer to these as PyPy (1) and PyPy (2).

Why would you need these two things, and why under the same roof? Think of it this way: PyPy (1) is an interpreter written in RPython. So it takes in the user’s Python code and compiles it down to bytecode. But the interpreter itself (written in RPython) must be interpreted by another Python implementation in order to run, right?

Well, we could just use CPython to run the interpreter. But that wouldn’t be very fast.

Instead, the idea is that we use PyPy (2) (referred to as the RPython Toolchain) to compile PyPy’s interpreter down to code for another platform (e.g., C, JVM, or CLI) to run on our machine, adding in JIT as well. It’s magical: PyPy dynamically adds JIT to an interpreter, generating its own compiler! (Again, this is nuts: we’re compiling an interpreter, adding in another separate, standalone compiler.)

In the end, the result is a standalone executable that interprets Python source code and exploits JIT optimizations. Which is just what we wanted! It’s a mouthful, but maybe this diagram will help:

To reiterate, the real beauty of PyPy is that we could write ourselves a bunch of different Python interpreters in RPython without worrying about JIT (barring a few hints). PyPy would then implement JIT for us using the RPython Toolchain/PyPy (2).
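To make “barring a few hints” concrete, here is a minimal sketch modeled loosely on PyPy’s tutorial for writing interpreters. The hint API (JitDriver with greens and reds, and jit_merge_point) comes from rpython.rlib.jit; everything else (the accumulator machine, its opcodes, and the no-op fallback class) is hypothetical, added so the sketch also runs on plain CPython. Actually translating it into a JIT-enabled binary would additionally need an RPython entry point and the RPython toolchain, neither of which is shown.

```python
try:
    # Only importable when the RPython toolchain (from the PyPy source tree) is on the path.
    from rpython.rlib.jit import JitDriver
except ImportError:
    class JitDriver(object):
        """Hypothetical no-op stand-in so this sketch runs on plain CPython."""

        def __init__(self, greens, reds):
            self.greens = greens
            self.reds = reds

        def jit_merge_point(self, **live_vars):
            # The real hint marks the top of the interpreter's dispatch loop.
            pass


# Hypothetical opcodes for a tiny accumulator machine.
ADD = 0          # acc += n
DEC = 1          # n -= 1
LOOP_IF_POS = 2  # jump back to instruction 0 while n > 0

# Greens identify *where* we are in the user's program; reds are the changing state.
jitdriver = JitDriver(greens=["pc", "program"], reds=["n", "acc"])


def mainloop(program, n):
    pc = 0
    acc = 0
    while pc < len(program):
        # The only JIT-specific line in the whole interpreter loop.
        jitdriver.jit_merge_point(pc=pc, program=program, n=n, acc=acc)
        op = program[pc]
        if op == ADD:
            acc += n
            pc += 1
        elif op == DEC:
            n -= 1
            pc += 1
        elif op == LOOP_IF_POS:
            pc = 0 if n > 0 else pc + 1
        else:
            raise ValueError("unknown opcode")
    return acc


print(mainloop([ADD, DEC, LOOP_IF_POS], 4))  # 10 (4 + 3 + 2 + 1)
```

Apart from the jitdriver line, this is an ordinary interpreter loop; deciding what to trace and compile is left to the RPython toolchain.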

In fact, if we get even more abstract, you could theoretically write an interpreter for any language, feed it to PyPy, and get a JIT for that language. This is because PyPy focuses on optimizing the actual interpreter, rather than the details of the language it’s interpreting.


As a brief digression, I’d like to mention that the JIT itself is absolutely fascinating. It uses a technique called tracing, which executes as follows:

  1. Run the interpreter and interpret everything (adding in no JIT).
  2. Do some light profiling of the interpreted code.
  3. Identify operations you’ve performed before.
  4. Compile these bits of code down to machine code.
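Here is a deliberately simplified, hypothetical sketch of the tracing idea. It is not PyPy’s actual implementation. It counts backward jumps to spot a hot loop, records the operations executed during one iteration while interpreting, and turns that linear trace into straight-line code with a guard that exits once the loop condition stops holding. Python source built with exec() stands in for machine code, and the opcodes are made up for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical opcodes: ADD does acc += n, DEC does n -= 1,
# LOOP jumps back to instruction 0 while n > 0.
ADD, DEC, LOOP = "ADD", "DEC", "LOOP"


def compile_trace(trace: List[str]) -> Callable[[int, int], Tuple[int, int]]:
    """Turn a recorded trace into a straight-line Python loop."""
    body = {ADD: "acc += n", DEC: "n -= 1"}
    lines = ["def loop(n, acc):", "    while True:"]
    lines += ["        " + body[op] for op in trace]
    # Guard: the trace assumed the backward jump is taken; bail out once it isn't.
    lines += ["        if not n > 0:", "            return n, acc"]
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["loop"]


def run(program: List[str], n: int, hot_threshold: int = 2) -> int:
    pc, acc = 0, 0
    back_edges = 0
    tracing = False
    trace: List[str] = []
    while pc < len(program):
        op = program[pc]
        if tracing and op != LOOP:
            trace.append(op)  # record operations while interpreting them
        if op == ADD:
            acc += n
            pc += 1
        elif op == DEC:
            n -= 1
            pc += 1
        elif op == LOOP:
            if n > 0:
                back_edges += 1
                if tracing:
                    # One full iteration recorded: compile it and finish the loop there.
                    n, acc = compile_trace(trace)(n, acc)
                    tracing = False
                    pc += 1  # the guard failed, so the loop has exited
                    continue
                if back_edges == hot_threshold:
                    tracing = True  # the loop is hot: trace the next iteration
                pc = 0
            else:
                pc += 1
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return acc


print(run([ADD, DEC, LOOP], 10))  # 55
```

The guard is what keeps a trace safe to reuse: the compiled code only covers the path that was actually observed, and it hands control back as soon as execution would leave that path.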

For more, this paper is highly accessible and very interesting.

To wrap up: we use PyPy’s RPython-to-C (or other target platform) compiler to compile PyPy’s RPython-implemented interpreter.

Wrapping Up

Why is this so great? Why is this crazy idea worth pursuing? I think Alex Gaynor put it well on his blog: “[PyPy is the future] because [it] offers better speed, more flexibility, and is a better platform for Python’s growth.”

In short:

  • It’s fast because it compiles source code to native code (using JIT).
  • It’s flexible because it adds the JIT to your interpreter with very little additional work.
  • It’s flexible (again) because you can write your interpreters in RPython, which is easier to extend than, say, C (in fact, it’s so easy that there’s a tutorial for writing your own interpreters).

Appendix: Other Names You May Have Heard

  • Python 3000 (Py3k): an alternative name for Python 3.0, a major, backwards-incompatible Python release that hit the stage in 2008. The Py3k team predicted that it would take about five years for this new version to be fully adopted. And while most (warning: anecdotal claim) Python developers continue to use Python 2.x, people are increasingly conscious of Py3k.

  • Cython: a superset of Python that includes bindings to call C functions.
    • Goal: allow you to write C extensions for your Python code.
    • Also lets you add static typing to your existing Python code, allowing it to be compiled and reach C-like performance.
    • This is similar to PyPy, but not the same. In this case, you’re enforcing typing in the user’s code before passing it to a compiler. With PyPy, you write plain old Python, and the compiler handles any optimizations.

  • Numba: a “just-in-time specializing compiler” that adds JIT to annotated Python code. In the most basic terms, you give it some hints, and it speeds up portions of your code. Numba comes as part of the Anaconda distribution, a set of packages for data analysis and management.

  • IPython: very different from anything else discussed here. It’s an interactive computing environment for Python, with support for GUI toolkits, a browser-based notebook, and more.

  • Psyco: a Python extension module, and one of the early Python JIT efforts. However, it’s since been marked as “unmaintained and dead”. In fact, the lead developer of Psyco, Armin Rigo, now works on PyPy.

Language Bindings

  • RubyPython: a bridge between the Ruby and Python VMs. Allows you to embed Python code into your Ruby code. You define where the Python starts and stops, and RubyPython marshals the data between the VMs.

  • PyObjC: language bindings between Python and Objective-C, acting as a bridge between them. Practically, that means you can utilize Objective-C libraries (including everything you need to create OS X applications) from your Python code, and Python modules from your Objective-C code. In this case, it’s convenient that CPython is written in C, which is a subset of Objective-C.

  • PyQt: while PyObjC gives you bindings for the OS X GUI components, PyQt does the same for the Qt application framework, letting you create rich graphical interfaces, access SQL databases, etc. Another tool aimed at bringing Python’s simplicity to other frameworks.

JavaScript Frameworks

  • pyjs (Pyjamas): a framework for creating web and desktop applications in Python. Includes a Python-to-JavaScript compiler, a widget set, and some more tools.

  • Brython: a Python VM written in JavaScript to allow for Py3k code to be executed in the browser.

