Published in homelane-tech

How Python Became 60% Faster in Version 3.11!

Author: Anurag Bhakuni, SDE-1

Python is a high-level language compared to C or C++, so a lot of abstraction happens in the background at runtime to make the code executable by the machine. This is one of the reasons it is considerably slower than C, C++, or Java: Python code is interpreted at runtime, unlike those lower-level compiled languages.

The interpreter compiles the source code into bytecode at runtime, which the Python virtual machine then executes instruction by instruction to produce the final result.

This execution model differs greatly from the one C/C++ uses, where the source code is converted into machine code ahead of time, during compilation. That allows the CPU to execute it directly at runtime.
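We can make the bytecode step visible with the standard-library `dis` module, which disassembles what CPython compiles a function into (the exact opcodes vary between Python versions):

```python
import dis

def add(a, b):
    return a + b

# Print the bytecode CPython compiled this function into; the
# virtual machine executes these instructions one at a time.
dis.dis(add)
```

Running this on 3.10 shows a `BINARY_ADD` instruction, while 3.11 emits the specializable `BINARY_OP` instead, which is part of how the new interpreter speeds up hot code.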

We can write Python code without declaring the types of variables, whereas in a statically typed language like Java we must declare the type of each variable up front.

This makes development faster and easier for developers, but it comes at a performance cost, since in Python type checking also happens at runtime.
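A quick illustration of what "type checking at runtime" means: the type belongs to the object, not the variable, so a name can be rebound freely and every operation must inspect its operands' types as it executes.

```python
# A name can be rebound to objects of different types; the type
# lives on the object and is checked at runtime.
x = 42           # x refers to an int
x = "hello"      # now to a str
x = [1, 2, 3]    # now to a list

# Even `+` dispatches on the runtime types of its operands:
print(1 + 2)       # integer addition
print("a" + "b")   # string concatenation
```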

Runtime type checking and dynamic compilation of the source code make it incredibly difficult to write a JIT compiler for Python the way Java has one, because we don't know what types of arguments will be passed to a block of code or a function. Essentially, this makes optimization difficult.

Within a single instance of the Python interpreter, only one thread can execute Python bytecode at a time, because of the Global Interpreter Lock (GIL). We can still create multiple threads using Python's threading module, but CPU-bound threads end up contending for the GIL instead of running in parallel.
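A minimal sketch of the GIL's effect on CPU-bound code: two threads doing pure-Python work typically finish in about the same wall-clock time as running the same work sequentially, because only one of them holds the GIL at any moment (exact timings will vary by machine).

```python
import threading
import time

def count_down(n):
    # Pure-Python CPU-bound loop: holds the GIL while it runs.
    while n > 0:
        n -= 1

N = 2_000_000

start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

start = time.perf_counter()
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

# Because of the GIL, the threaded version is typically no faster
# than the sequential one for CPU-bound work.
print(f"sequential: {sequential:.3f}s  threaded: {threaded:.3f}s")
```

For I/O-bound work (network calls, file reads) threads do help, because the GIL is released while waiting on I/O.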

How a small team at Microsoft is driving to make core Python faster

Guido van Rossum, the creator of Python, joined Microsoft in 2020, where he chose to work on making core Python faster with a small team of six. The idea was to improve performance in the upcoming releases so that Python can serve its growing use cases while remaining widely accepted by the community.

They formulated a five-stage plan: over the next five releases, they aim to make core Python 5x faster. Python 3.11 has already made Python about 50% faster on certain benchmarks.

To follow their progress, visit their open-source repository, where they share the roadmap and ideas for the next releases, as discussed by Guido van Rossum at PyCon 2021.

The team also wants to find a solution for platforms like iOS and Apple mobile devices, where runtime code generation is not allowed. And while Python is widely used for backend development, it has no foothold on the frontend comparable to JavaScript, the front-runner for frontend development.

At PyCon 2022, Peter Wang, a founder of Anaconda, announced the release of PyScript, which lets developers interleave Python in HTML, much like PHP, and call certain JavaScript libraries depending on the requirements of the frontend web app. It is built on top of Pyodide, a third-party open-source project that brings Python to the browser.

An insight into changes planned for core python runtime and execution in v3.11

Static typing: Type annotations do not by themselves speed up execution (CPython ignores them at runtime), but they make code easier for tools to analyze and for developers to reason about. With Python 3.11, they have released the Self type, which lets you annotate a method as returning an instance of its own class. This helps type checkers produce useful and predictable results for a function.

Caching in the Python interpreter: Before 3.11, Python allocated memory chunks up front for every frame in the linked call stack.

(Image Source: Europython Conference)

In Python 3.11, memory allocation is done lazily, which reduces memory usage. For common cases it is faster, and memory is given to a particular object only on a need basis. All of this is done by LRU caches behind the scenes.
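These interpreter-level caches are not directly visible to user code, but the same memoization idea is available at the application level through `functools.lru_cache`, shown here purely as an analogy for how caching trades a little memory for repeated-work savings:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def fib(n: int) -> int:
    # Each result is memoized, so every n is computed only once
    # instead of exponentially many times.
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(35))           # fast thanks to the cache
print(fib.cache_info())  # hit/miss statistics for the cache
```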

(Image Source: Europython Conference)

Better exception handling: In Python 3.10, try/except handling happened in the bytecode itself. Entering a try block pushed data onto the internal stack telling the program where to look for the exception handler. This resulted in more memory allocation, since every try/except block had its own piece of memory for storing that data.

In Python 3.11, a separate lookup table is created instead; it is consulted only when an exception is actually raised, and tells the program where the handler for that try/except block lives. As a result, every frame object on Python's internal stack now consumes 160 bytes of memory, compared to 240 bytes in the earlier version.
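This is why 3.11's exceptions are often described as "zero-cost": a try block that raises nothing adds almost no overhead. A hedged micro-benchmark with `timeit` (absolute numbers depend on the machine and version; the point is only that the two functions run at similar speed on 3.11):

```python
import timeit

def with_try():
    try:
        return 1 + 1
    except ValueError:
        return 0

def without_try():
    return 1 + 1

# On 3.11+, entering the try block costs essentially nothing, because
# handler addresses live in a side table consulted only when an
# exception actually occurs.
t_try = timeit.timeit(with_try, number=200_000)
t_plain = timeit.timeit(without_try, number=200_000)
print(f"with try: {t_try:.4f}s  without: {t_plain:.4f}s")
```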

Similar improvements were made across the core Python codebase, resulting in better performance and speed in v3.11. Essentially, major changes to the data structures and the internal memory-allocation algorithms were the key to this release.

What do the numbers say?

I ran two different Docker containers, one with v3.10 installed and the other with v3.11. Then I compared the results for certain benchmarks using the pyperformance library in the two containers. Here are the results:
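For a rough, repeatable sanity check (not a substitute for pyperformance), a tiny stand-in benchmark like the following can be saved as a script and run inside both containers; the printed version tag makes the two runs easy to compare:

```python
# bench.py -- a toy stand-in benchmark, not pyperformance itself.
# Run the same file under 3.10 and 3.11 and compare the timings.
import sys
import timeit

def work(n: int = 100_000) -> float:
    # Pure-Python arithmetic loop: the kind of code the 3.11
    # interpreter changes speed up most.
    total = 0.0
    for i in range(1, n):
        total += i ** 0.5 / i
    return total

elapsed = timeit.timeit(work, number=3)
print(f"Python {sys.version.split()[0]}: {elapsed:.3f}s")
```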

Python 3.11 is faster by 60–65% in some of these benchmarks.

All these new developments have Python developers and the wider programming community genuinely excited about Python's future, and we will only see Python used in more areas to develop new and innovative solutions to real-world challenges.
