51 Essential Python Interview Questions
Toptal sourced essential questions that the best Python developers can answer. Driven from our community, we encourage experts to submit questions and offer feedback.
Vikas is a Python Architect with over 15 years of experience designing scalable, secure, high-performance applications. Skilled in architectural planning, DevOps practices, CI/CD pipelines, and containerized environments, Vikas leads the strategic execution of end-to-end development solutions across complex projects. His strong leadership and clear stakeholder communication skills bring a practical understanding to how experienced Python developers think and perform under real-world demands.
These Python questions cover the fundamentals, core language behavior, concurrency models, performance optimization, and practical coding patterns. They are suited for applicants at all experience levels. Whether you’re screening applicants or preparing for your own Python interview, these questions will assess readiness through core Python knowledge and problem-solving skills.
Python’s built-in types fall into several categories.
- Numeric: int, float, complex, bool
- Sequence: str, list, tuple, range
- Mapping: dict
- Set: set, frozenset
- Other: bytes, bytearray, NoneType
The most important practical distinction is mutability.
- Mutable objects can be changed after creation (list, dict, set, bytearray).
- Immutable objects cannot change after creation (int, float, str, tuple, frozenset, bytes).
Mutable example:
lst = [1, 2, 3]
lst.append(4)
This modifies the original list in place.
Immutable example:
s = "hello"
s = s + " world"
This creates a new string. It does not modify the original string.
Practical consequence (mutable default argument bug):
def append_to(val, lst=[]):
    lst.append(val)
    return lst
The same default list is reused across calls, which can cause unexpected behavior.
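To make the shared default visible, the short sketch below repeats the function and calls it twice without passing a list. The variable names first and second are only illustrative.

```python
def append_to(val, lst=[]):
    # The default list is created once, when the def statement runs.
    lst.append(val)
    return lst

first = append_to(1)
second = append_to(2)

print(first)            # [1, 2]
print(second)           # [1, 2]
print(first is second)  # True: both names refer to the same default list
```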
Hashability determines whether an object can be used as a dictionary key or set member.
Built-in immutable types are hashable, including str, int, tuple (if all elements are hashable), and frozenset. Built-in mutable containers like list, dict, and set are not hashable:
d = {}
d[(1, 2)] = "tuple key works" # ok — tuple is hashable
d[[1, 2]] = "list key" # TypeError — list is not hashable
valid_set = {1, "hello", (1, 2)} # ok
invalid_set = {[1, 2]} # TypeError
Choosing between similar types:
- Use tuple over list when the collection should not change. It signals intent and is slightly faster.
- Use frozenset over set when you need an immutable set or want to use it as a dict key.
- Use dict over a list of tuples for any key-value data: O(1) lookup versus O(n).
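As a rough illustration of these choices, the sketch below uses a frozenset as a dictionary key and converts a list of pairs into a dict for direct lookup. The permissions table and its values are hypothetical and exist only for this example.

```python
# Hypothetical permission table keyed by combinations of roles.
permissions = {
    frozenset({"admin", "editor"}): "full access",
    frozenset({"viewer"}): "read only",
}

# Element order does not matter: equal frozensets hash to the same key.
lookup = permissions[frozenset({"editor", "admin"})]
print(lookup)  # full access

# A list of (key, value) tuples needs a linear scan; a dict does not.
pairs = [("color", "blue"), ("size", "medium")]
as_dict = dict(pairs)
print(as_dict["size"])  # medium

# A plain set is mutable and therefore cannot be used as a key.
try:
    bad = {{"admin"}: "x"}
except TypeError:
    set_key_failed = True
print(set_key_failed)  # True
```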
*args collects any number of positional arguments into a tuple.
**kwargs collects any number of keyword arguments into a dictionary.
Both are conventions. The names args and kwargs are not special.
The * and ** operators are what matter.
Example:
def show(*args, **kwargs):
    print(args)
    print(kwargs)
show(1, 2, 3, name="Alice", age=30)
# (1, 2, 3)
# {'name': 'Alice', 'age': 30}
Inside the function, *args produces a tuple and **kwargs produces a dictionary.
Correct parameter ordering:
The correct parameter ordering when mixing all types is:
- Regular positional parameters first,
- Then *args,
- Then keyword-only parameters,
- Then **kwargs.
def func(pos1, pos2, *args, keyword_only, **kwargs):
    print(pos1, pos2)    # regular positional
    print(args)          # extra positionals as tuple
    print(keyword_only)  # must be passed by name
    print(kwargs)        # extra keyword args as dict
func(1, 2, 3, 4, keyword_only="required", extra="yes")
# 1 2
# (3, 4)
# required
# {'extra': 'yes'}
Forwarding arguments:
A common real-world use is writing wrapper functions that forward arguments without knowing what they are.
They’re useful in decorators, middleware, and delegation patterns:
import functools
def log_call(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print(f"Calling {func.__name__}")
        return func(*args, **kwargs)  # forward everything unchanged
    return wrapper
@log_call
def create_user(name: str, role: str = "viewer"):
    return {"name": name, "role": role}
Unpacking at call sites:
The unpacking operators * and ** also work at call sites, expanding a list or dict into arguments:
def greet(first, last, greeting="Hello"):
    print(f"{greeting}, {first} {last}!")
args = ["Alice", "Smith"]
kwargs = {"greeting": "Hi"}
greet(*args, **kwargs)
# Hi, Alice Smith!
Merging dictionaries with **:
** unpacking can also be used to merge dictionaries. Python 3.9+ also supports |.
defaults = {"color": "blue", "size": "medium"}
overrides = {"size": "large", "weight": "heavy"}
merged = {**defaults, **overrides}
print(merged)
# {'color': 'blue', 'size': 'large', 'weight': 'heavy'}
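For comparison, here is a minimal sketch of the same merge using the | and |= operators, assuming Python 3.9 or later. The name settings is introduced only for this example.

```python
defaults = {"color": "blue", "size": "medium"}
overrides = {"size": "large", "weight": "heavy"}

# | builds a new dict; on duplicate keys the right-hand operand wins.
merged = defaults | overrides
print(merged)  # {'color': 'blue', 'size': 'large', 'weight': 'heavy'}

# |= updates an existing dict in place, like dict.update().
settings = dict(defaults)
settings |= overrides
print(settings == merged)  # True
print(defaults)            # {'color': 'blue', 'size': 'medium'} (unchanged)
```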
Positional-only and keyword-only parameters:
Python 3.8 introduced positional-only parameters using “/” in the signature (PEP 570), which completes the full parameter specification syntax:
def func(pos_only1, pos_only2, /, normal, *, kw_only, **kwargs):
    pass
In this signature:
- pos_only1 and pos_only2 are positional-only and cannot be passed by name
- normal can be positional or keyword
- kw_only is keyword-only and cannot be positional
- kwargs catches remaining keyword arguments
Many Python built-ins use parts of this syntax. For example, len(obj, /) accepts its argument only by position, while sep and end in print() are keyword-only.
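The sketch below exercises a function with the full signature shown earlier to illustrate which call styles are accepted, assuming Python 3.8 or later. The function returns its arguments instead of doing nothing so the result can be inspected, and the errors list is only a helper for this demonstration.

```python
def func(pos_only1, pos_only2, /, normal, *, kw_only, **kwargs):
    return pos_only1, pos_only2, normal, kw_only, kwargs

# Valid: positional-only by position, kw_only by name, extras into kwargs.
ok = func(1, 2, 3, kw_only=4, extra=5)
print(ok)  # (1, 2, 3, 4, {'extra': 5})

errors = []

# Invalid: positional-only parameters cannot be supplied by name.
try:
    func(pos_only1=1, pos_only2=2, normal=3, kw_only=4)
except TypeError as exc:
    errors.append(str(exc))

# Invalid: kw_only cannot be supplied by position.
try:
    func(1, 2, 3, 4)
except TypeError as exc:
    errors.append(str(exc))

print(len(errors))  # 2
```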
A lambda function is an anonymous function defined with the lambda keyword. It can take any number of arguments, but its body must be a single expression. It does not allow statements such as assignments, and it has no explicit return keyword; the value of the expression is returned automatically.
Lambda syntax:
square = lambda x: x * x
Equivalent named function:
def square(x):
    return x * x
print(square(5)) # 25 — identical behavior
Where lambda is the right tool:
Inline sorting keys:
employees = [
{"name": "Alice", "salary": 95000},
{"name": "Bob", "salary": 72000},
{"name": "Carol", "salary": 110000},
]
Sort by salary: Lambda avoids defining a one-line named function.
sorted_employees = sorted(employees, key=lambda e: e["salary"])
Multi-key sort: Sort by department, then salary.
sorted_employees = sorted(
    employees,
    # "" keeps records without a department comparable with the rest
    key=lambda e: (e.get("department", ""), e["salary"])
)
Short callbacks in functional operations:
numbers = [1, 2, 3, 4, 5, 6, 7, 8]
evens = list(filter(lambda x: x % 2 == 0, numbers)) # [2, 4, 6, 8]
doubled = list(map(lambda x: x * 2, numbers)) # [2, 4, 6, 8, 10, 12, 14, 16]
List comprehensions are often more readable than map() or filter() with lambdas.
evens = [x for x in numbers if x % 2 == 0] # clearer
doubled = [x * 2 for x in numbers] # clearer
Event handlers and UI callbacks:
import tkinter as tk
root = tk.Tk()  # parent window for the button
button = tk.Button(
    root,
    text="Click me",
    command=lambda: print("Button clicked!")
)
Why experienced developers prefer named functions:
- Lambda functions have no useful name in stack traces. Errors often appear as <lambda>, which can make debugging harder.
- They are harder to test in isolation because there is no function name to reference directly.
- They cannot contain statements or docstrings, which limits readability and tooling.
- PEP 8 discourages assigning a lambda to a variable. If you are naming it, use def.
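The difference in stack traces can be checked directly. In the sketch below, halve and halve_named are illustrative names, and the lambda is assigned to a variable only so that it can be inspected.

```python
import traceback

halve = lambda x: x / 2  # assigned only for inspection; PEP 8 would use def

def halve_named(x):
    return x / 2

print(halve.__name__)        # <lambda>
print(halve_named.__name__)  # halve_named

# Trigger an error inside the lambda and capture the traceback text.
try:
    halve("ten")
except TypeError:
    trace = traceback.format_exc()

print("<lambda>" in trace)  # True: the frame is reported without a real name
```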
When to use a named function instead:
Bad example: Lambda is hard to read and difficult to test or reuse.
process = lambda x: x["price"] * (1 - x["discount"]) if x["discount"] else x["price"]
Good example: A named function is more readable, testable, and reusable.
def calculate_final_price(item: dict) -> float:
    """Apply discount if present, otherwise return base price."""
    if item["discount"]:
        return item["price"] * (1 - item["discount"])
    return item["price"]
The Pythonic rule: Use lambda only when the function is truly throwaway (passed inline to another function and never referenced again). If you find yourself assigning a lambda to a variable or writing a complex expression inside one, reach for def instead.
Why this question matters:
This is a Python interview question that tests whether a candidate understands not just the syntax, but the tradeoffs between concise functional code and maintainable production code. Hiring managers should look for candidates who can identify when lambda is appropriate for short inline behavior and when a named function is the better choice for readability, testing, and debugging. Applicants should clearly explain lambda’s limitations, explain why experienced developers often prefer def, and demonstrate that they understand that working code is not always the same as good code.
A global variable is a name defined at the module level, outside any function or class. It is accessible from anywhere in the module and, when imported, from other modules too.
Module-level global example:
config = {"debug": False, "max_retries": 3}
request_count = 0
def process_request():
    global request_count
    request_count += 1
    if config["debug"]:
        print(f"Request #{request_count}")
The global keyword declares that the function intends to rebind the module-level variable. Without it, request_count += 1 would raise UnboundLocalError, because the assignment makes request_count local to the function and it has no local value yet.
Read-only globals like config are relatively safe. Constants and configuration dictionaries are legitimate uses. Mutable globals that are written to from multiple functions are where problems begin.
Hidden-state example:
total = 0
def add(value):
    global total
    total += value
To test add(), you must remember to reset total first.
In concurrent code, two threads calling add() simultaneously can create a race condition because read-modify-write is not atomic.
Risks of mutable global state:
- Hidden dependencies: Callers cannot tell from the function signature that it modifies global state.
- Testing difficulty: Tests must manage and reset global state between runs.
- Concurrency bugs: Multiple threads reading and writing the same global require explicit locking.
- Unpredictable behavior: Execution order determines the global’s value, making bugs hard to reproduce.
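When shared mutable state cannot be avoided, the concurrency risk above is usually handled with an explicit lock. The following is a minimal sketch of that idea; SafeCounter and worker are hypothetical names, not part of any library.

```python
import threading

class SafeCounter:
    """Counter whose read-modify-write step is protected by a lock."""

    def __init__(self):
        self.total = 0
        self._lock = threading.Lock()

    def add(self, value):
        # Only one thread at a time may execute the update.
        with self._lock:
            self.total += value

counter = SafeCounter()

def worker():
    for _ in range(10_000):
        counter.add(1)

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.total)  # 40000
```

Keeping the lock next to the state it protects also removes the hidden dependency, since callers work with an object instead of a module-level name.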
Better alternatives:
Passing state explicitly as arguments and returning new values is easier to test and reason about.
def add(total: int, value: int) -> int:
    return total + value
Using a class makes the state explicit and contained.
class Counter:
    def __init__(self):
        self.total = 0
    def add(self, value: int):
        self.total += value
Module-level constants are an acceptable form of global state when the values genuinely never change.
MAX_RETRIES = 3
DEFAULT_TIMEOUT = 30.0
Practical rule: If a global variable is modified by more than one function, it should usually be refactored into a class or passed explicitly. If it is truly constant, name it in uppercase and never assign to it after definition.
Why this question matters:
This is a global state interview question for a Python developer that reveals whether a candidate understands hidden dependencies, testability, and concurrency risks in real codebases. Hiring managers should listen for candidates who go beyond defining global and can explain why mutable module-level state causes brittle designs and hard-to-reproduce bugs. Applicants should cover how the global keyword changes scope behavior, why mutable globals are risky, and what cleaner alternatives they would use instead.
A namespace is a mapping from names to objects. It acts like a dictionary that associates variable names with the values they point to. Python uses namespaces to ensure names in different parts of a program do not collide.
Namespace and scope are related but distinct. A namespace is the data structure storing name-to-object mappings. A scope is the region of code where a namespace is directly accessible without a prefix.
Every scope has a namespace, but not every namespace has a scope. Class namespaces exist, but they are not part of the LEGB chain, so you cannot access class attributes by bare name inside methods.
Python’s main namespace types:
- Built-in namespace: Created when the interpreter starts and holds names like len, print, and Exception.
- Global namespace: Created when a module is imported and usually lives until the program ends.
- Local namespace: Created each time a function is called and destroyed when it returns.
- Enclosing namespace: Exists for nested functions and lives as long as the outer function’s closure.
Example: locals, global, and enclosing namespaces
x = "global"
def outer():
    y = "enclosing"
    def inner():
        z = "local"
        print(locals())   # {'z': 'local'}
        print(globals())  # all module-level names including x
    inner()
outer()
In this example, locals() shows the current local namespace, while globals() shows the module-level namespace.
Using globals() and locals():
globals()["new_var"] = 42
print(new_var) # 42
def func():
    x = 10
    locals()["x"] = 99
    print(x) # still 10
- globals() is live, so modifying it changes the global namespace.
- locals() should be treated as a snapshot. Changing it does not reliably modify actual local variables.
Class namespaces and LEGB:
Class namespaces are not part of the LEGB lookup chain, which surprises many developers.
class MyClass:
    class_var = "hello"
    def method(self):
        print(class_var)       # NameError: class namespace skipped in LEGB
        print(self.class_var)  # OK: explicit access via self
Inside a method, class attributes must be accessed explicitly through self or the class name.
__dict__ and attribute lookup:
Modules, classes, and most instances expose their namespace through __dict__ (instances of classes that define __slots__ are an exception).
Attribute lookup on obj.name is essentially a dictionary lookup in obj.__dict__, then the class dictionary, then parent class dictionaries in MRO order.
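The lookup order can be observed by inspecting each __dict__ in turn. Base and Derived are placeholder classes used only for this sketch.

```python
class Base:
    greeting = "hello from Base"

class Derived(Base):
    pass

obj = Derived()

# The name lives only in Base.__dict__, so lookup walks up the MRO.
print("greeting" in obj.__dict__)      # False
print("greeting" in Derived.__dict__)  # False
print("greeting" in Base.__dict__)     # True
print(obj.greeting)                    # hello from Base

# Assigning through the instance writes to the instance dictionary only.
obj.greeting = "hello from instance"
print(obj.__dict__)    # {'greeting': 'hello from instance'}
print(Base.greeting)   # hello from Base

print([cls.__name__ for cls in Derived.__mro__])  # ['Derived', 'Base', 'object']
```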
Practical rule: Avoid from module import * because it dumps names into the current namespace and increases the risk of silent collisions.
Use __all__ in your modules to explicitly control what gets exported.
Python resolves variable names by searching four scopes in a fixed order known as the LEGB rule: Local, Enclosing, Global, and Built-in. The first match found wins.
LEGB order:
- Local: innermost scope inside the current function
- Enclosing: outer function scope for nested functions
- Global: module-level names defined at the top level of the file
- Built-in: outermost scope, including names like len, print, range, and Exception
Example: Local, enclosing, and global scopes
x = "global"
def outer():
    x = "enclosing"
    def inner():
        x = "local"
        print(x)  # "local": Local wins
    inner()
    print(x)  # "enclosing": Enclosing wins here
outer()
print(x)  # "global": Global scope
Python searches scopes in LEGB order and uses the first matching name it finds.
What happens when a name is not found:
def func():
    print(undefined_name) # NameError: name 'undefined_name' is not defined
If Python cannot find a name in any LEGB scope, it raises NameError.
The global keyword:
The global keyword is used when a function needs to modify a module-level variable.
count = 0
def increment():
    global count
    count += 1
increment()
increment()
print(count) # 2
Without global, assigning to the name would make Python treat it as a local variable.
def increment_broken():
    count += 1 # UnboundLocalError
Python sees the assignment and treats count as local, but it has no local value yet.
The nonlocal keyword:
The nonlocal keyword is used to modify a variable from an enclosing function scope.
def make_counter():
    count = 0
    def increment():
        nonlocal count
        count += 1
        return count
    return increment
counter = make_counter()
print(counter()) # 1
print(counter()) # 2
print(counter()) # 3
nonlocal is essential for closures that need to maintain and modify state across calls.
A common gotcha: UnboundLocalError
x = 10
def func():
    print(x) # UnboundLocalError
    x = 20
func()
Even though x exists globally, Python sees x = 20 and marks x as local for the entire function.
Python determines scope at compile time, not at runtime. If a name is assigned anywhere in a function, Python treats it as local throughout that function unless global or nonlocal is used.
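One way to see that this decision is made at compile time is to inspect a function’s code object before it is ever called. The sketch below relies on the CPython attributes co_varnames and co_names; reads_global and assigns_local are illustrative names.

```python
x = 10

def reads_global():
    return x      # no assignment: x is looked up globally at call time

def assigns_local():
    x = 20        # assignment anywhere in the body makes x local
    return x

# co_varnames lists local variable names recorded by the compiler.
print(reads_global.__code__.co_varnames)   # ()
print(assigns_local.__code__.co_varnames)  # ('x',)

# co_names lists global and attribute names the bytecode refers to.
print(reads_global.__code__.co_names)      # ('x',)
```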
The Pythonic alternative to global:
Passing and returning state explicitly is usually easier to test and reason about.
def increment(count: int) -> int:
    return count + 1
count = 0
count = increment(count)
count = increment(count)
print(count) # 2
Explicit argument passing and return values are usually preferable to global because they make dependencies visible, functions easier to test, and behavior more predictable.
Why this question matters:
This is a scope and name resolution Python programming question that tests whether a candidate truly understands how Python looks up names rather than just memorizing the LEGB acronym. Hiring managers should look for candidates who can explain local, enclosing, global, and built-in scopes, plus the role of global and nonlocal in modifying scope behavior. Applicants should be sure to mention lookup order, common gotchas like UnboundLocalError, and what strong scope awareness looks like in everyday function design.
PYTHONPATH is an environment variable containing a list of directories, separated by colons on Linux/macOS and by semicolons on Windows, that Python inserts near the front of sys.path when the interpreter starts. Any module or package in those directories becomes importable without installation.
Inspecting sys.path:
import sys
print(sys.path)
# ['', '/usr/lib/python3.11', ...]
# PYTHONPATH directories appear near the front
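The effect of PYTHONPATH can be demonstrated without touching the current process by starting a child interpreter with the variable set. This is only a sketch: the module name hypothetical_mod and its VALUE constant are invented for the demonstration and written to a temporary directory.

```python
import os
import subprocess
import sys
import tempfile

with tempfile.TemporaryDirectory() as module_dir:
    # Hypothetical module that exists only in the temporary directory.
    with open(os.path.join(module_dir, "hypothetical_mod.py"), "w") as f:
        f.write("VALUE = 42\n")

    # Start a child interpreter with PYTHONPATH pointing at that directory.
    env = {**os.environ, "PYTHONPATH": module_dir}
    result = subprocess.run(
        [sys.executable, "-c",
         "import hypothetical_mod; print(hypothetical_mod.VALUE)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )

imported_value = result.stdout.strip()
print(imported_value)  # 42

# os.pathsep is the separator used between PYTHONPATH entries on this platform.
print(repr(os.pathsep))
```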
Modifying the import path at runtime:
import sys
sys.path.append("/path/to/my/modules")
import my_module
In practice, directly manipulating PYTHONPATH or sys.path is usually a code smell. It often means the project structure or packaging is not set up correctly.
A cleaner alternative is installing the project properly with pip install -e ., which registers the package in the environment without path hacks.
Virtual environments:
Virtual environments are the foundation of Python environment management. They create an isolated Python installation per project and prevent dependency conflicts between projects.
Create and activate a virtual environment:
python -m venv .venv
source .venv/bin/activate # Linux/macOS
.venv\Scripts\activate # Windows
Install dependencies and install the current project in editable mode:
pip install -r requirements.txt
pip install -e .
Project metadata and dependencies:
Modern Python projects typically define metadata and dependencies in pyproject.toml rather than setup.py.
[project]
name = "myproject"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
"requests>=2.31",
"pydantic>=2.0",
]
[project.optional-dependencies]
dev = ["pytest", "mypy", "ruff"]
Environment variables and secrets:
Use a .env file with python-dotenv for local environment variables.
Never hardcode credentials.
from dotenv import load_dotenv
import os
load_dotenv()
db_url = os.getenv("DATABASE_URL")
api_key = os.getenv("API_KEY")
Modern tooling: uv
Modern teams increasingly use uv, a fast Rust-based package manager that replaces pip and venv with a single tool. It offers faster dependency resolution and built-in lockfile support through uv.lock.
Practical rule: Use virtual environments, install the project properly, and define dependencies in pyproject.toml. Reserve PYTHONPATH for edge cases, not normal project setup.
A docstring is a string literal placed as the first statement in a module, class, or function body. Python stores it as the object’s __doc__ attribute and makes it accessible through help().
One-line docstring example:
def calculate_discount(price: float, pct: float) -> float:
    """Apply a percentage discount to a price."""
    return price * (1 - pct / 100)
print(calculate_discount.__doc__)
# "Apply a percentage discount to a price."
help(calculate_discount)
# prints formatted docstring in the REPL
One-line docstrings suit simple functions.
Multi-line docstring example:
Multi-line docstrings are useful when the function has parameters, return values, exceptions, or non-obvious behavior worth documenting.
def fetch_user(user_id: int, include_deleted: bool = False) -> dict | None:
    """
    Fetch a user record from the database by ID.

    Args:
        user_id: The unique identifier of the user.
        include_deleted: If True, also searches soft-deleted records.

    Returns:
        A dict containing user data, or None if not found.

    Raises:
        ValueError: If user_id is negative.
        DatabaseError: If the connection fails.

    Example:
        >>> fetch_user(42)
        {'id': 42, 'name': 'Alice', 'email': 'alice@example.com'}
    """
    if user_id < 0:
        raise ValueError(f"user_id must be non-negative, got {user_id}")
    # db is assumed to be a database client defined elsewhere in the project
    return db.find(user_id, include_deleted=include_deleted)
Main docstring conventions:
- Google style: Uses labeled sections like Args:, Returns:, and Raises:. This is often the most readable style in source code.
- NumPy style: Uses underlined section headers and is common in scientific Python libraries.
- reStructuredText style: Uses fields like :param: and :returns: and is the default style for Sphinx.
Google style is usually the easiest to read directly in code. NumPy style is common in scientific libraries. reStructuredText is preferred when generating HTML documentation with Sphinx.
Docstrings and doctest:
Docstrings that include interactive >>> examples, such as the Example: block above, can be run as tests with doctest.
import doctest
doctest.testmod() # runs all >>> examples in the module's docstrings
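As a self-contained illustration, the sketch below runs the >>> example of a single function through doctest’s parser and runner rather than testmod(), so it does not depend on how the file is executed. The add function is a stand-in written for this example.

```python
import doctest

def add(a, b):
    """Return the sum of a and b.

    Example:
        >>> add(2, 3)
        5
    """
    return a + b

# Build a DocTest from the docstring and run it with the function in scope.
parser = doctest.DocTestParser()
test = parser.get_doctest(add.__doc__, {"add": add}, "add", None, 0)
runner = doctest.DocTestRunner(verbose=False)
results = runner.run(test)

print(results.failed, results.attempted)  # 0 1
```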
Why docstrings matter:
Tools like pydoc, sphinx-autodoc, and IDEs extract docstrings automatically to generate API documentation. Well-written docstrings improve the usability of a codebase.
Type annotations and docstrings are complementary. Annotations specify types, while docstrings explain intent, behavior, and edge cases that types alone cannot express.
class Parent(object):
    x = 1
class Child1(Parent):
    pass
class Child2(Parent):
    pass
print(Parent.x, Child1.x, Child2.x)
Child1.x = 2
print(Parent.x, Child1.x, Child2.x)
Parent.x = 3
print(Parent.x, Child1.x, Child2.x)
Output:
1 1 1
1 2 1
3 2 3
The third line of output surprises many developers. Changing Parent.x after Child1.x has been explicitly set produces different results for Child1 and Child2 because Child1 now has its own class attribute, while Child2 still inherits from Parent.
Line-by-line explanation:
- Line 1 (1 1 1): x = 1 is defined on Parent. Neither Child1 nor Child2 defines its own x, so attribute lookup finds Parent.x for all three.
- Line 2 (1 2 1): Child1.x = 2 creates a new class attribute directly on Child1. Child1 now has its own x that shadows Parent.x. Parent and Child2 still use Parent.x = 1.
- Line 3 (3 2 3): Parent.x = 3 updates the attribute on Parent. Child2 still has no x of its own, so it now resolves to Parent.x = 3. Child1 still resolves to its own x = 2.
Underlying mechanism:
Python class variables are stored in each class’s __dict__. Attribute lookup follows the MRO (Method Resolution Order): Python checks the instance dictionary first, then the class dictionary, then parent class dictionaries in order.
Setting Child1.x = 2 writes directly to Child1.__dict__, which breaks the lookup chain to Parent for Child1 only.
print(Parent.__dict__) # {'x': 3, ...}
print(Child1.__dict__) # {'x': 2, ...}
print(Child2.__dict__) # no 'x' of its own
Practical implication:
Class-level shared state can be dangerous. Changes to a parent class attribute propagate to child classes that have not overridden it, which can create subtle bugs in larger hierarchies. Prefer instance attributes set in __init__ when the state should belong to each object rather than the class.
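The difference between class-level and per-instance state is easiest to see with a mutable attribute. SharedTags and OwnTags below are hypothetical classes written for this sketch.

```python
class SharedTags:
    tags = []  # one list stored on the class, shared by every instance

    def add(self, tag):
        self.tags.append(tag)  # mutates the class-level list

class OwnTags:
    def __init__(self):
        self.tags = []  # a fresh list for each instance

    def add(self, tag):
        self.tags.append(tag)

a, b = SharedTags(), SharedTags()
a.add("x")
print(b.tags)  # ['x']: b sees the change made through a

c, d = OwnTags(), OwnTags()
c.add("x")
print(d.tags)  # []: d has its own list
```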
lst = ['a', 'b', 'c', 'd', 'e']
print(lst[10:])
Output:
[]
No IndexError is raised.
Why slicing behaves differently from direct index access:
Accessing a single element beyond the list length raises IndexError immediately.
lst[10] # IndexError: list index out of range
lst[10:] # [] — no error
Python’s slice implementation clamps the start and stop indices to the valid range of the list before performing the slice. If the start index exceeds the list length, the clamped range is empty, and an empty list is returned.
This is by design. Slices represent ranges, and an out-of-range start simply means an empty range.
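The clamping can be inspected with the built-in slice.indices() method, which reports the concrete start, stop, and step a slice would use for a sequence of a given length.

```python
lst = ['a', 'b', 'c', 'd', 'e']

# Start beyond the end: clamped to len(lst), giving an empty range.
print(slice(10, None).indices(len(lst)))  # (5, 5, 1)

# Stop beyond the end: clamped to len(lst).
print(slice(2, 100).indices(len(lst)))    # (2, 5, 1)

# Negative start far below zero: clamped to 0.
print(slice(-100, 3).indices(len(lst)))   # (0, 3, 1)

print(lst[2:100])   # ['c', 'd', 'e']
print(lst[-100:3])  # ['a', 'b', 'c']
```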
Why this is dangerous:
The silent [] return can hide bugs in production code.
def get_page(items, page, page_size=10):
    start = page * page_size
    return items[start:]  # returns [] silently for any out-of-range page
results = get_page(data, 999)
process(results)  # processes empty list: no error, wrong behavior
This bug produces no exception. It silently returns no data, which can propagate through a system before anyone notices.
How to guard against it:
def get_page(items, page, page_size=10):
    start = page * page_size
    if start >= len(items):
        total_pages = -(-len(items) // page_size)  # ceiling division
        raise IndexError(
            f"Page {page} out of range: only {total_pages} pages available"
        )
    return items[start:start + page_size]
Practical rule:
Whenever a slice start index is computed dynamically rather than hardcoded, validate it against len() first. Do not rely on the silent [] behavior as a signal that something went wrong.
def multipliers():
    return [lambda x: i * x for i in range(4)]
print([m(2) for m in multipliers()])
Output:
[6, 6, 6, 6]
Most people expect [0, 2, 4, 6]. The actual output is different because Python closures use late binding. A variable referenced inside a closure is looked up when the inner function is called, not when it is defined.
By the time any of the returned lambdas are called, the for loop has completed and i is left at its final value of 3. Each lambda, therefore, multiplies its argument by 3, giving 3 × 2 = 6 for all four calls.
This behavior is not specific to lambda; the same result occurs with functions defined using def.
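To confirm that the effect comes from closures and not from lambda itself, the sketch below rewrites the same logic with def. The names multipliers_def and multiply are illustrative.

```python
def multipliers_def():
    funcs = []
    for i in range(4):
        def multiply(x):
            return i * x  # i is looked up when multiply is called
        funcs.append(multiply)
    return funcs  # by now the loop has finished and i == 3

results = [m(2) for m in multipliers_def()]
print(results)  # [6, 6, 6, 6]
```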
Fix 1: Default argument binding
def multipliers():
    return [lambda x, i=i: i * x for i in range(4)]
Capturing i as a default argument forces evaluation at definition time, binding the current value of i to each lambda immediately.
Fix 2: Generator instead of list
def multipliers():
    for i in range(4):
        yield lambda x: i * x
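Each lambda is yielded while i still holds its current value, so calling it right away gives the expected result. However, this still uses late binding: if the lambdas are collected first, for example with list(multipliers()), and called afterwards, every one of them sees the final value of i again. Fix 1 is safer.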
Fix 3: functools.partial
from functools import partial
from operator import mul
def multipliers():
    return [partial(mul, i) for i in range(4)]
partial binds i immediately at creation time, which makes this the most explicit and readable solution.
All three fixes produce the expected output:
[0, 2, 4, 6]
def extendList(val, lst=[]):
    lst.append(val)
    return lst
list1 = extendList(10)
list2 = extendList(123, [])
list3 = extendList('a')
print(list1)
print(list2)
print(list3)
Output:
[10, 'a']
[123]
[10, 'a']
Many people expect list1 to be [10] and list3 to be ['a']. The actual output is different because default argument expressions in Python are evaluated once when the function is defined, not each time it is called.
The empty list [] is created once and then reused across every call that does not pass an explicit list argument. That means list1 and list3 both operate on the same shared list object, while list2 uses a separate list passed explicitly.
This is one of Python’s most common gotchas, and it affects any mutable default argument such as lists, dictionaries, and sets.
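The stored default can be observed directly through the function’s __defaults__ attribute. The sketch below repeats the function so that it runs on its own; before and after are illustrative names.

```python
def extendList(val, lst=[]):
    lst.append(val)
    return lst

# The default list lives on the function object itself.
before = list(extendList.__defaults__[0])
print(before)  # []

extendList(10)
extendList('a')

after = extendList.__defaults__[0]
print(after)   # [10, 'a']: the same list has been mutated by both calls
```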
How to fix it:
Use None as the default sentinel and create the list inside the function.
def extendList(val, lst=None):
    if lst is None:
        lst = []
    lst.append(val)
    return lst
With this fix, the output becomes:
[10]
[123]
['a']
Practical rule:
Do not use a mutable object as a default argument. Use None as a sentinel and create the mutable object inside the function body instead.
Why this question matters:
This Python coding interview question looks at function behavior to determine if a candidate understands how default arguments are evaluated and why mutable defaults can cause subtle bugs. Hiring managers should ask this of junior to mid-level candidates and expect to hear that default values are evaluated once at function definition time, not on each call, and how this leads to shared state across invocations. Applicants should clearly describe this high-frequency issue, explain why it occurs, and demonstrate the correct pattern by using None as a placeholder value and creating the list inside the function to avoid unintended side effects.
lst = [[]] * 5
print(lst) # line 2
lst[0].append(10)
print(lst) # line 4
lst[1].append(20)
print(lst) # line 6
lst.append(30)
print(lst) # line 8
Output:
[[], [], [], [], []]
[[10], [10], [10], [10], [10]]
[[10, 20], [10, 20], [10, 20], [10, 20], [10, 20]]
[[10, 20], [10, 20], [10, 20], [10, 20], [10, 20], 30]
Most people expect lst[0].append(10) to affect only the first inner list. It affects all five because [[]] * 5 does not create five distinct inner lists. It creates one inner list and five references to the same object.
This is equivalent to:
inner = []
lst = [inner, inner, inner, inner, inner]
All five elements refer to the same list object in memory.
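Object identity makes the sharing explicit. The sketch below compares the multiplied list with one built from separate inner lists; shared and independent are illustrative names.

```python
shared = [[]] * 5
independent = [[] for _ in range(5)]

# Every element of the multiplied list is the very same object.
print(all(item is shared[0] for item in shared))  # True
print(len({id(item) for item in shared}))         # 1

# Building each inner list separately yields five distinct objects.
print(len({id(item) for item in independent}))    # 5
```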
Line-by-line explanation:
- Line 2: lst contains five references to the same empty inner list, so the output is [[], [], [], [], []].
- Line 4: lst[0].append(10) mutates the single shared inner list. Because every element points to that same object, the output becomes [[10], [10], [10], [10], [10]].
- Line 6: lst[1].append(20) mutates the same shared list again, so the output becomes [[10, 20], [10, 20], [10, 20], [10, 20], [10, 20]].
- Line 8: lst.append(30) is different. It adds a new element to the outer list rather than modifying the shared inner list. The output becomes [[10, 20], [10, 20], [10, 20], [10, 20], [10, 20], 30].
How to initialize independent inner lists correctly:
Use a list comprehension so that a new inner list is created on each iteration.
lst = [[] for _ in range(5)]
Now mutating one inner list affects only that element:
lst[0].append(10)
print(lst) # [[10], [], [], [], []]
A list comprehension evaluates [] five separate times, creating five distinct objects. The same trap applies to other mutable objects:
Wrong: Five references to the same dictionary.
rows = [{}] * 5
Correct: Five independent dictionaries.
rows = [{} for _ in range(5)]
Practical rule:
Never use * N to initialize a list of mutable objects. Use a list comprehension instead.
In Python 3, the output is:
5 / 2 = 2.5
5.0 / 2 = 2.5
5 // 2 = 2
5.0 // 2.0 = 2.0
/ (true division):
For int and float operands, / always returns a float result, even when both operands are integers.
For example, 5 / 2 returns 2.5, not 2. This is the default behavior in Python 3.
// (floor division):
// always rounds the result down toward negative infinity, but the return type follows the operands. If both operands are integers, it returns an int. If either operand is a float, it returns a float.
5 // 2 # 2 (int)
5.0 // 2.0 # 2.0 (float)
-7 // 2 # -4 (floors toward negative infinity, not toward zero)
Why // is not the same as integer truncation:
// is floor division, not integer division in the “truncate toward zero” sense.
For negative numbers, // floors toward negative infinity rather than truncating toward zero:
-7 // 2 # -4 (floor toward negative infinity)
int(-7 / 2) # -3 (truncate toward 0)
Use math.trunc() or int() if you specifically need truncation toward zero.
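The sketch below puts the rounding behaviors side by side for a negative dividend and shows how divmod() keeps the quotient and remainder consistent with floor division.

```python
import math

print(-7 // 2)              # -4  floor division
print(math.floor(-7 / 2))   # -4  same direction as //
print(math.trunc(-7 / 2))   # -3  truncation toward zero
print(int(-7 / 2))          # -3  int() also truncates

# divmod pairs // with %, so quotient * divisor + remainder == dividend.
q, r = divmod(-7, 2)
print(q, r)        # -4 1
print(q * 2 + r)   # -7
```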
Example:
def div1(x, y):
    print(f"{x}/{y} = {x/y}")
def div2(x, y):
    print(f"{x}//{y} = {x//y}")
div1(5, 2)
div1(5.0, 2)
div2(5, 2)
div2(5.0, 2.0)
Why Python 2 behaved differently:
In Python 2, / performed integer division when both operands were integers, so 5 / 2 returned 2. This caused frequent bugs when the operand types were not immediately obvious.
Python 3 changed / to always perform true division, removing that ambiguity. The old behavior can be recovered with // when floor division is what you want.
Practical rule:
If you maintain code that runs on both Python 2 and Python 3, add from __future__ import division at the top of Python 2 files to get Python 3-style division behavior.
Why this question matters:
This is an operator semantics Python question that tests whether a candidate understands numeric behavior precisely, especially around floor division and negative numbers. Hiring managers should listen for candidates who know that // is floor division, not truncation toward zero, and can explain how Python 3 differs from Python 2. Applicants should cover return types, negative-number behavior, and why this distinction matters in real code instead of answering only with examples.
class DefaultDict(dict):
    def __missing__(self, key):
        return []
d = DefaultDict()
d['florp'] = 127
The code runs without error, but it does not behave as most people would expect.
What __missing__ does:
__missing__ is called by dict.__getitem__() when a key is not found in the dictionary. It is not called when setting a value with d[key] = value.
So d['florp'] = 127 simply sets a key normally. __missing__ is never involved.
The current implementation only returns a value when accessing a missing key, but it does not store it.
Example:
d = DefaultDict()
print(d) # {}
print(d['foo']) # [] — __missing__ called, returns []
print(d) # {} — nothing was stored
How to fix it:
Store the default value inside __missing__ before returning it.
class DefaultDict(dict):
    def __missing__(self, key):
        self[key] = []
        return self[key]
Now the value is created and saved when a missing key is accessed.
d = DefaultDict()
d['foo'].append(1)
print(d) # {'foo': [1]}
Production alternative: collections.defaultdict
In production code, collections.defaultdict is usually the better choice.
from collections import defaultdict
d = defaultdict(list)
d['foo'].append(1)
print(d) # defaultdict(<class 'list'>, {'foo': [1]})
defaultdict handles this pattern correctly without subclassing. Its default factory can be list, int, set, or any zero-argument callable.
When a custom __missing__ is useful:
A custom __missing__ implementation is useful when the default value depends on the key itself.
class KeyBasedDefault(dict):
    def __missing__(self, key):
        self[key] = key.upper()
        return self[key]
d = KeyBasedDefault()
print(d['hello']) # 'HELLO'
It is also useful when missing-key access needs custom side effects beyond simply providing a default value.
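As an illustration of the side-effect case, here is a minimal sketch of a dict subclass that records which missing keys were requested. The class name TrackingDict and its missed attribute are hypothetical and exist only for this example.

```python
class TrackingDict(dict):
    """Hypothetical dict that remembers which missing keys were looked up."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.missed = []  # keys that triggered __missing__

    def __missing__(self, key):
        self.missed.append(key)  # side effect: remember the miss
        return None              # fall back to None without storing it


settings = TrackingDict(debug=True)
print(settings["debug"])        # True (key exists, __missing__ is not called)
print(settings["timeout"])      # None (__missing__ is called)
print(settings.get("retries"))  # None (dict.get() never calls __missing__)
print(settings.missed)          # ['timeout']
```

Note that only square-bracket lookup triggers __missing__; get() and the in operator bypass it.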
Practical rule:
Reach for collections.defaultdict first. Only implement __missing__ when the default value must be computed from the key or when subclass-specific behavior is required.
Why this question matters:
This is a dictionary internals Python interview question that reveals whether a candidate understands how missing-key lookup works under the hood and when custom behavior is worth implementing. Hiring managers should look for candidates who know that __missing__ is tied to key access, not assignment, and can distinguish between a custom dict subclass and collections.defaultdict. Applicants should explain what the current implementation does not do, how to fix it, and when a custom __missing__ method is justified over the standard library.
A shallow copy creates a new container object but populates it with references to the same inner objects as the original.
A deep copy creates a new container and recursively copies all inner objects as well, producing a fully independent result.
When it matters:
The distinction matters when the data structure contains mutable nested objects, such as lists of lists, dictionaries of dictionaries, or objects with mutable attributes.
If you shallow-copy a list of lists and then modify an inner list, both the original and the copy see the change because they share the same inner list reference.
For flat structures of immutable objects, such as a list of integers or strings, shallow and deep copy are often functionally identical because immutable objects cannot change.
Example:
import copy
original = [[1, 2], [3, 4]]
shallow = copy.copy(original)
deep = copy.deepcopy(original)
original[0].append(99)
print(shallow[0]) # [1, 2, 99] — shared reference, sees the change
print(deep[0]) # [1, 2] — independent copy, unaffected
Performance tradeoffs:
- Shallow copy: Usually O(n) where n is the number of top-level elements. It is generally fast and predictable.
- Deep copy: Recursive and can be significantly slower on large or deeply nested structures. It also handles circular references by tracking already-copied objects, which adds overhead.
- Large nested objects: For large objects like pandas DataFrames or complex class instances, deepcopy can be surprisingly expensive. Always profile before using it in a hot path.
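The circular-reference handling mentioned above can be observed with a small self-referential structure. This is a sketch with an illustrative node dictionary, not a benchmark.

```python
import copy

node = {"name": "root", "children": []}
node["children"].append(node)  # the structure now refers to itself

# deepcopy terminates because it keeps a memo of objects it has already copied
clone = copy.deepcopy(node)

print(clone is node)                   # False
print(clone["children"][0] is clone)   # True: the cycle is reproduced inside the copy
print(clone["children"][0] is node)    # False: nothing points back to the original
```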
Practical rule:
Default to shallow copy and only use deep copy when you have confirmed that inner objects are mutable and must be independent.
Write a single list comprehension that returns only even numbers located at even indices in a list.
lst = [1, 3, 5, 8, 10, 13, 18, 36, 78]
result = [x for x in lst[::2] if x % 2 == 0]
print(result) # [10, 18, 78]
How it works:
- lst[::2] uses a step of 2, selecting only elements at even indices.
- if x % 2 == 0 filters that subset to keep only even values.
- The conditions are applied in sequence: first select even-indexed elements, then filter for even numbers.
Alternative using enumerate:
This version is more explicit and can be easier to extend when the index logic becomes more complex.
result = [x for i, x in enumerate(lst) if i % 2 == 0 and x % 2 == 0]
enumerate() yields index-value pairs, allowing both conditions to be checked directly in one expression.
Python’s slice syntax is sequence[start:stop:step].
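- start is inclusive
- stop is exclusive
- step controls direction and stride
All three parts are optional. If omitted, the defaults are roughly:
- start = 0
- stop = len(sequence)
- step = 1
With a negative step the defaults flip: the slice starts at the last element and runs past the beginning.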
Basic slicing examples:
lst = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
lst[2:7] # [2, 3, 4, 5, 6]
lst[::2] # [0, 2, 4, 6, 8]
lst[1::2] # [1, 3, 5, 7, 9]
lst[::-1] # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
lst[7:2:-1] # [7, 6, 5, 4, 3]
Negative indices:
Negative indices count from the end, so -1 is the last element and -2 is the second to last.
lst[-3:] # [7, 8, 9]
lst[:-3] # [0, 1, 2, 3, 4, 5, 6]
lst[-5:-2] # [5, 6, 7]
Boundary behavior:
Slicing does not raise IndexError when indices are out of range. Python silently clamps slice bounds to the valid range.
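A short sketch of the clamping behavior, contrasted with plain indexing, which does raise. The list contents are arbitrary.

```python
lst = [0, 1, 2, 3, 4]

tail = lst[2:100]    # stop is clamped to len(lst)
empty = lst[100:]    # start beyond the end gives an empty list
head = lst[-100:2]   # start before the beginning is clamped to 0

print(tail)   # [2, 3, 4]
print(empty)  # []
print(head)   # [0, 1]

# Indexing a single out-of-range position is different: it raises.
try:
    lst[100]
except IndexError as exc:
    error_message = str(exc)
    print(f"IndexError: {error_message}")  # IndexError: list index out of range
```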
Slice assignment:
Slices can also be assigned to for in-place modification of mutable sequences.
lst = [1, 2, 3, 4, 5]
lst[1:4] = [20, 30]
print(lst) # [1, 20, 30, 5]
Reusable slice objects:
The slice object lets you name and reuse slice definitions.
HEADER = slice(0, 4)
PAYLOAD = slice(4, -2)
CHECKSUM = slice(-2, None)
packet = b"\x01\x02\x03\x04DATADATA\xff\xfe"
print(packet[HEADER]) # header bytes
print(packet[PAYLOAD]) # payload bytes
NumPy slicing:
In NumPy, slicing extends to multiple dimensions and often returns views rather than copies. Modifying a slice can modify the original array.
import numpy as np
matrix = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(matrix[:2, 1:]) # first two rows, columns 1 onward
# [[2 3]
# [5 6]]
Advanced use cases:
- Taking every nth element
- Reversing sequences
- Extracting headers, payloads, or trailing sections
- Replacing subranges in lists
- Working with rows and columns in NumPy arrays
Practical rule:
Use slicing when you need concise range-based access or transformation, but remember that out-of-range slice bounds are clamped silently instead of raising an error, and NumPy slices may return views instead of copies.
What are the different ways to reverse a list in Python and when should you use each?
Python offers three main ways to reverse a list, each with different behavior around mutation and memory use.
1. Slicing with [::-1]
This creates a new reversed list and leaves the original unchanged.
lst = [1, 2, 3, 4, 5]
reversed_lst = lst[::-1]
print(reversed_lst) # [5, 4, 3, 2, 1]
print(lst) # [1, 2, 3, 4, 5]
2. list.reverse()
This reverses the list in place and returns None.
lst = [1, 2, 3, 4, 5]
lst.reverse()
print(lst) # [5, 4, 3, 2, 1]
Use reverse() only when mutating the original list is intentional.
3. reversed()
This returns a lazy iterator and does not copy the list.
lst = [1, 2, 3, 4, 5]
for item in reversed(lst):
print(item)
Convert it to a list if you need a concrete reversed copy.
reversed_lst = list(reversed(lst))
When to use each approach:
- Use lst[::-1] when you want a reversed copy and want to keep the original unchanged.
- Use reverse() when you want to mutate the original list in place.
- Use reversed() when you only need to iterate in reverse and want to avoid making a copy.
For large lists, reversed() is usually the most memory-efficient because it returns an iterator rather than allocating a new list.
Reversing strings:
Strings are immutable, so reverse() does not exist for them. Use slicing instead.
s = "hello"
print(s[::-1]) # "olleh"
Reversing words in a sentence:
sentence = "the quick brown fox"
print(" ".join(reversed(sentence.split())))
This prints “fox brown quick the”.
Other sequences and custom objects:
reversed() works on any object that implements __reversed__ or supports both __len__ and __getitem__.
Custom classes can support reversed() by implementing __reversed__.
class Countdown:
    def __init__(self, start):
        self.start = start
    def __reversed__(self):
        return iter(range(1, self.start + 1))
print(list(reversed(Countdown(5)))) # [1, 2, 3, 4, 5]
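The other route, the sequence protocol, needs no __reversed__ method at all. Below is a minimal sketch; the Playlist class is hypothetical and defines only __len__ and __getitem__.

```python
class Playlist:
    """Hypothetical sequence-like class with no __reversed__ method."""

    def __init__(self, tracks):
        self._tracks = list(tracks)

    def __len__(self):
        return len(self._tracks)

    def __getitem__(self, index):
        return self._tracks[index]


playlist = Playlist(["intro", "verse", "outro"])

# reversed() falls back to indexing from len - 1 down to 0
backwards = list(reversed(playlist))
print(backwards)  # ['outro', 'verse', 'intro']
```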
Practical rule:
Use slicing when you need a reversed copy, reverse() when you want in-place mutation, and reversed() when you only need reverse iteration without allocating a new list.
Flattening depends on the depth of nesting. For a single level of nesting, such as a list of lists, several concise approaches work.
For one level of nesting:
nested = [[1, 2], [3, 4], [5, 6]]
List comprehension:
flat = [x for sublist in nested for x in sublist]
itertools.chain.from_iterable():
import itertools
flat = list(itertools.chain.from_iterable(nested))
sum() trick:
flat = sum(nested, [])
itertools.chain.from_iterable() is the preferred approach for large lists because it is lazy and avoids building intermediate lists. The sum() trick creates a new list on every addition, which makes it quadratic in the total number of elements, so it is usually best avoided.
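The laziness of chain.from_iterable() can be seen by consuming only part of its output. This sketch uses itertools.islice to take the first three items; the variable names are illustrative.

```python
import itertools

nested = [[1, 2], [3, 4], [5, 6]]

lazy = itertools.chain.from_iterable(nested)  # no flat list is built yet

first_three = list(itertools.islice(lazy, 3))
remaining = list(lazy)  # continues where islice stopped

print(first_three)  # [1, 2, 3]
print(remaining)    # [4, 5, 6]
```

Because the chain object is an iterator, it is exhausted after one pass; wrap it in list() when the flattened result is needed more than once.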
For arbitrarily deep nesting, a recursive generator handles any structure:
from collections.abc import Iterable
def flatten(items):
    for item in items:
        if isinstance(item, Iterable) and not isinstance(item, (str, bytes)):
            yield from flatten(item)
        else:
            yield item
deep = [1, [2, [3, [4, [5]]]], 6]
print(list(flatten(deep))) # [1, 2, 3, 4, 5, 6]
Strings and bytes are iterable, but they should not usually be flattened. Without that exclusion, a string would be broken into individual characters.
mixed = ["hello", [1, 2], ["world"]]
print(list(flatten(mixed))) # ["hello", 1, 2, "world"]
For NumPy arrays, use ravel() or flatten():
import numpy as np
matrix = np.array([[1, 2, 3], [4, 5, 6]])
print(matrix.ravel()) # [1 2 3 4 5 6]
print(matrix.flatten()) # [1 2 3 4 5 6]
For NumPy arrays, ravel() returns a view where possible, while flatten() always returns a copy.
Practical rule:
Use a list comprehension or itertools.chain.from_iterable() for one-level flattening. Use a recursive generator for arbitrary nesting. For NumPy arrays, prefer ravel() or flatten().
A dictionary comprehension builds a dictionary from an iterable in a single expression using {key: value for item in iterable} syntax. It is the dictionary equivalent of a list comprehension and is often more concise and readable than building the same dictionary with a loop.
Basic example:
squares = {n: n**2 for n in range(6)}
# {0: 0, 1: 1, 2: 4, 3: 9, 4: 16, 5: 25}
Transforming an existing dictionary:
prices = {"apple": 1.50, "banana": 0.75, "cherry": 3.00}
discounted = {item: price * 0.9 for item, price in prices.items()}
Filtering items:
expensive = {item: price for item, price in prices.items() if price > 1.00}
# {"apple": 1.50, "cherry": 3.00}
Inverting a dictionary:
This only works cleanly when the values are unique and hashable.
inverted = {v: k for k, v in prices.items()}
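When values are not unique, inversion silently keeps only the last key seen for each value. The sketch below shows the collision and one common workaround, grouping keys into lists; the stock dictionary is made up for illustration.

```python
stock = {"apple": "fruit", "banana": "fruit", "carrot": "vegetable"}

# Duplicate values collide: the later key overwrites the earlier one.
inverted = {category: item for item, category in stock.items()}
print(inverted)  # {'fruit': 'banana', 'vegetable': 'carrot'}

# Grouping keeps every original key.
grouped = {}
for item, category in stock.items():
    grouped.setdefault(category, []).append(item)
print(grouped)  # {'fruit': ['apple', 'banana'], 'vegetable': ['carrot']}
```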
Building a dictionary from two lists:
keys = ["name", "age", "city"]
values = ["Alice", 30, "London"]
record = {k: v for k, v in zip(keys, values)}
# {"name": "Alice", "age": 30, "city": "London"}
Using enumerate():
Dictionary comprehensions also work with enumerate() when you need index-based keys.
fruits = ["apple", "banana", "cherry"]
indexed = {i: fruit for i, fruit in enumerate(fruits)}
# {0: "apple", 1: "banana", 2: "cherry"}
When a comprehension becomes too complex:
The same readability rule that applies to list comprehensions applies here. If the expression requires nested comprehensions or complex conditionals, a loop is usually clearer.
result = {}
for item in inventory:
    if item["in_stock"] and item["price"] > 0:
        category = item["category"]
        result[item["id"]] = {
            "name": item["name"],
            "price": apply_tax(item["price"], category),
        }
Practical rule:
Use a dictionary comprehension when the transformation is direct and easy to read. If you cannot read it aloud in one breath and understand it immediately, write a loop instead.
Python offers several ways to merge dictionaries, each with different behavior around mutation and key conflict resolution. In all cases, when the same key exists in both dictionaries, the rightmost value wins.
a = {"x": 1, "y": 2}
b = {"y": 99, "z": 3}
Python 3.9+ merge operator |:
This creates a new dictionary and does not mutate either original.
merged = a | b
print(merged) # {"x": 1, "y": 99, "z": 3}
Python 3.9+ update operator |=:
This mutates the left-hand dictionary in place.
a |= b
print(a) # {"x": 1, "y": 99, "z": 3}
Dictionary unpacking with **:
This works in Python 3.5 and later and creates a new dictionary.
merged = {**a, **b}
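dict() constructor with unpacking:
merged = dict(a, **b)
This form requires every key in b to be a string. Note that dict(**a, **b) raises TypeError when the two dictionaries share a key, so it is not a general-purpose merge.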
The | operator is often the clearest modern approach because it is readable, explicit, and non-mutating. Use |= when you specifically want an in-place update.
Merging more than two dictionaries:
The | operator chains cleanly.
defaults = {"color": "blue", "size": "medium", "weight": "light"}
config = {"size": "large"}
overrides = {"color": "red"}
final = defaults | config | overrides
# {"color": "red", "size": "large", "weight": "light"}
Nested dictionaries:
The standard merge approaches only merge at the top level. They do not recursively merge nested dictionaries.
For deep merging, use a recursive function.
def deep_merge(base: dict, override: dict) -> dict:
    result = base.copy()
    for key, value in override.items():
        if key in result and isinstance(result[key], dict) and isinstance(value, dict):
            result[key] = deep_merge(result[key], value)
        else:
            result[key] = value
    return result
base = {"db": {"host": "localhost", "port": 5432}, "debug": False}
override = {"db": {"port": 5433, "name": "prod"}}
print(deep_merge(base, override))
# {"db": {"host": "localhost", "port": 5433, "name": "prod"}, "debug": False}
A shallow merge would replace the entire nested db dictionary and lose host. A deep merge preserves keys from the base dictionary that are not overridden.
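For comparison, this is what a shallow merge does to the same kind of data. The sketch uses dictionary unpacking so that it also runs on versions before Python 3.9.

```python
base = {"db": {"host": "localhost", "port": 5432}, "debug": False}
override = {"db": {"port": 5433, "name": "prod"}}

# Top-level merge: the whole "db" value from override replaces the one in base.
shallow = {**base, **override}

print(shallow)
# {'db': {'port': 5433, 'name': 'prod'}, 'debug': False}
print("host" in shallow["db"])  # False
```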
Practical rule:
Use | for a clean non-mutating merge, |= or .update() for in-place updates, and a recursive merge function when nested dictionaries must be merged rather than replaced.
What is a Python generator and how does it differ from a list comprehension in terms of memory?
A generator is a function that uses yield instead of return. Calling it does not execute the body immediately. It returns a generator object, and execution resumes each time next() is called or the generator is iterated.
The function’s local state, including variables, loop counters, and execution position, is preserved between yields.
Memory comparison:
- A list comprehension executes completely and builds the entire list in memory before the first element is consumed.
- A generator produces one item at a time and discards it after use, so memory usage stays much lower regardless of dataset size.
Generator example:
def squares(n):
    for i in range(n):
        yield i * i
Only one yielded value needs to exist at a time.
gen = squares(10_000_000)
print(next(gen)) # 0
print(next(gen)) # 1
List comprehension example:
A list comprehension builds all values immediately.
lst = [i * i for i in range(10_000_000)]
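The memory gap can be approximated with sys.getsizeof. This is a rough sketch: it uses a generator expression, which behaves like a generator function for this purpose, and getsizeof reports only the container itself (for the list, its array of references), not the integers it refers to.

```python
import sys

n = 100_000

as_list = [i * i for i in range(n)]  # every value exists up front
as_gen = (i * i for i in range(n))   # values are produced on demand

list_size = sys.getsizeof(as_list)
gen_size = sys.getsizeof(as_gen)

print(list_size > gen_size)         # True
print(sum(as_gen) == sum(as_list))  # True: same values, computed lazily
```

The generator's size stays the same no matter how large n becomes, while the list grows linearly.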
Constraints generators introduce:
- Exhaustion: A generator can only be iterated once. After the last yield, it raises StopIteration and cannot be reused without recreating it.
- No random access: Unlike a list, you cannot index into a generator. gen[5] raises TypeError.
- No length until consumed: len(gen) raises TypeError because the total count is not known in the same way as a list.
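Each of these constraints can be reproduced in a few lines. The sketch reuses a small squares generator; the errors list is only there to collect which operations failed.

```python
def squares(n):
    for i in range(n):
        yield i * i


gen = squares(3)
first_pass = list(gen)
second_pass = list(gen)  # the generator is already exhausted
print(first_pass)   # [0, 1, 4]
print(second_pass)  # []

errors = []
fresh = squares(3)
for description, operation in (("index", lambda g: g[1]), ("len", len)):
    try:
        operation(fresh)
    except TypeError:
        errors.append(description)
print(errors)  # ['index', 'len']
```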
When to use each:
Use a generator when processing large datasets, streaming data, or building pipelines where only one item needs to be in memory at a time. Use a list when you need to iterate multiple times, index by position, check length, or pass the full collection to a function that expects a sequence.
Practical rule:
Use generators for lazy, memory-efficient iteration. Use list comprehensions when you need an eagerly built sequence with full list behavior.
An iterable is any object that can return an iterator. It implements __iter__(), which returns an iterator object. Lists, tuples, strings, dictionaries, and files are all iterables.
An iterator is an object that produces values one at a time. It implements both __iter__() and __next__(). Calling __next__() returns the next value or raises StopIteration when the iterator is exhausted.
The distinction matters because you can iterate over a list multiple times, but an iterator itself is usually single-use.
Iterable versus iterator:
numbers = [1, 2, 3] # iterable — not an iterator
it = iter(numbers) # iterator — produced from the iterable
print(next(it)) # 1
print(next(it)) # 2
print(next(it)) # 3
print(next(it)) # raises StopIteration
Calling iter() on an iterable usually produces a fresh iterator. An iterator itself cannot be reset after exhaustion.
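A brief sketch of that behavior with a plain list; the variable names are illustrative.

```python
numbers = [1, 2, 3]

first = iter(numbers)
second = iter(numbers)

print(first is second)       # False: each call gives an independent iterator
print(iter(first) is first)  # True: an iterator returns itself from iter()

next(first)                  # advance only the first iterator
rest_of_first = list(first)
all_of_second = list(second)
print(rest_of_first)   # [2, 3]
print(all_of_second)   # [1, 2, 3]
print(list(first))     # []: exhausted, and it cannot be rewound
```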
How a for loop works:
A for loop is essentially syntactic sugar for this pattern:
it = iter(numbers)
while True:
    try:
        value = next(it)
    except StopIteration:
        break
    # the loop body runs here with value
Implementing a custom iterator:
class Countdown:
    def __init__(self, start):
        self.current = start
    def __iter__(self):
        return self
    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value
for n in Countdown(5):
    print(n) # 5, 4, 3, 2, 1
Countdown is both iterable and iterator. Calling iter() on it returns itself.
cd = Countdown(3)
print(next(cd)) # 3
print(next(cd)) # 2
Why this design has a limitation:
A class-based iterator like Countdown cannot be iterated twice, because __iter__() returns the same exhausted object instead of a fresh iterator.
To support multiple passes, separate the iterable from the iterator.
Iterable and iterator as separate objects:
class CountdownIterable:
    def __init__(self, start):
        self.start = start
    def __iter__(self):
        return CountdownIterator(self.start)
class CountdownIterator:
    def __init__(self, current):
        self.current = current
    def __iter__(self):
        return self
    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value
cd = CountdownIterable(3)
print(list(cd)) # [3, 2, 1]
print(list(cd)) # [3, 2, 1]
This works because __iter__() creates a fresh iterator each time.
How generators relate:
Generators are iterators. A generator function automatically implements the full iterator protocol for you. Python creates __iter__() and __next__() automatically and preserves execution state between yield calls.
Generator version:
def countdown(start):
    while start > 0:
        yield start
        start -= 1
for n in countdown(5):
    print(n) # 5, 4, 3, 2, 1
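The claim that generators are iterators can be verified with the abstract base classes in collections.abc. This is a small sketch that redefines countdown so it runs on its own.

```python
from collections.abc import Iterable, Iterator


def countdown(start):
    while start > 0:
        yield start
        start -= 1


gen = countdown(3)

print(isinstance(gen, Iterator))        # True: it has __iter__ and __next__
print(iter(gen) is gen)                 # True: like any iterator, it returns itself
print(isinstance([3, 2, 1], Iterator))  # False: a list is iterable only
print(isinstance([3, 2, 1], Iterable))  # True

values = [next(gen), next(gen), next(gen)]
print(values)  # [3, 2, 1]
```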
Practical rule:
Use iterables when you need something that can produce a fresh iterator on demand. Use iterators when consuming values one at a time. Prefer generators over class-based iterators for most use cases, because they are shorter, clearer, and handle state preservation automatically. Use a class-based iterator when you need extra methods on the iterator object or more explicit control over the iteration logic.
Why this question matters:
This is an iteration protocol Python interview question that shows whether a candidate understands how Python iteration actually works beneath for loops. Hiring managers should look for candidates who can distinguish an iterable from an iterator, explain iter() and next(), and describe exhaustion and single-use behavior correctly. Applicants should be sure to mention the iterator protocol, why some objects can be iterated multiple times while others cannot, and how generators fit naturally into the same model.
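A module is any single .py file. A package is a directory of modules; a regular package contains an __init__.py file, which tells Python to treat that directory as a package. Since Python 3.3, a directory without __init__.py can still be imported as a namespace package, but regular packages remain the common choice.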
Importing a package executes its __init__.py file, which is commonly used to expose a clean public API by importing selected names from submodules.
Example package structure:
mypackage/
    __init__.py
    utils.py
    models/
        __init__.py
        user.py
In this structure:
- utils.py is a module
- models is a nested package
- user.py is a module inside that package
Absolute imports:
Absolute imports use the full path from the project root.
from mypackage.utils import helper
from mypackage.models.user import User
Relative imports:
Relative imports are based on the current module’s location.
from . import utils
from .models import user
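Python searches for modules in the directories listed in sys.path, which typically includes the directory of the running script (or the current directory in an interactive session), entries from PYTHONPATH, the standard library, and installed site-packages.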
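To see these pieces working together, the sketch below builds a throwaway package in a temporary directory, adds that directory to sys.path, and imports it. The package name demo_pkg and its helper function are hypothetical; a real project would keep its packages on disk rather than generate them.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    pkg = Path(root) / "demo_pkg"  # hypothetical package name
    pkg.mkdir()
    (pkg / "utils.py").write_text("def helper():\n    return 'helped'\n")
    # __init__.py runs on import and re-exports helper as the public API
    (pkg / "__init__.py").write_text(
        "from .utils import helper\n__all__ = ['helper']\n"
    )

    sys.path.insert(0, root)  # make the temporary directory importable
    try:
        importlib.invalidate_caches()
        demo_pkg = importlib.import_module("demo_pkg")
        result = demo_pkg.helper()  # available without naming the submodule
        print(result)  # helped
    finally:
        # undo the changes so the example leaves no trace
        sys.path.remove(root)
        sys.modules.pop("demo_pkg", None)
        sys.modules.pop("demo_pkg.utils", None)
```

The relative import inside __init__.py is what lets callers write demo_pkg.helper() instead of reaching into demo_pkg.utils.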
Why packages are useful:
Packages help organize related modules, avoid name collisions, and make large codebases easier to maintain.
They also improve readability by grouping related functionality into a clear structure.
Absolute versus relative imports:
- Absolute imports: Usually clearer and less ambiguous. They work well in larger codebases and make it obvious where an import comes from.
- Relative imports: Useful inside a package when you want imports to remain local to that package.
Absolute imports are usually preferred because they are easier to read and continue to work regardless of where the importing module lives in the project.
Relative imports are most useful when you want imports inside a package to remain valid even if the package is renamed or moved as a unit.
Practical rule:
Use modules for individual files, packages for organizing related modules, absolute imports by default, and relative imports mainly for internal package structure.
A decorator is a callable that takes a function as input, wraps it with additional behavior, and returns a new callable.
The @decorator syntax is syntactic sugar. It is exactly equivalent to reassigning the function name to the result of calling the decorator.
@my_decorator
def greet():
    pass
# Exactly equivalent to:
def greet():
    pass
greet = my_decorator(greet)
How a decorator works:
def my_decorator(func):
    def wrapper(*args, **kwargs):
        print("Before the function runs")
        result = func(*args, **kwargs)
        print("After the function runs")
        return result
    return wrapper
@my_decorator
def greet(name):
    """Says hello to someone."""
    print(f"Hello, {name}!")
greet("Alice")
# Before the function runs
# Hello, Alice!
# After the function runs
Why functools.wraps matters:
Without functools.wraps, the wrapper function replaces the original, and the original's metadata (name, docstring, and so on) is no longer visible on the decorated name.
Without functools.wraps:
def my_decorator(func):
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs)
    return wrapper
@my_decorator
def greet(name):
    """Says hello to someone."""
    pass
print(greet.__name__) # 'wrapper'
print(greet.__doc__) # None
print(greet.__module__) # module where wrapper was defined, not necessarily greet's module
This degrades documentation tools, stack traces, logging, and any code that inspects function metadata, for example Sphinx autodoc output or test names reported by pytest.
The fix: functools.wraps
import functools
def my_decorator(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        print("Before")
        result = func(*args, **kwargs)
        print("After")
        return result
    return wrapper
@my_decorator
def greet(name):
    """Says hello to someone."""
    pass
print(greet.__name__) # 'greet'
print(greet.__doc__) # 'Says hello to someone.'
print(greet.__wrapped__) # access to the original function
functools.wraps preserves important metadata such as __name__, __doc__, __module__, __qualname__, and __annotations__, and it updates the wrapper's __dict__.
Stacking multiple decorators:
@decorator_a
@decorator_b
def my_func():
    pass
# Equivalent to:
my_func = decorator_a(decorator_b(my_func))
decorator_b is applied first, then decorator_a wraps the result.
Order matters. Decorators are applied bottom-up, but their effects execute top-down at call time.
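The two orders can be traced with a decorator that records both when it wraps a function and when the wrapper runs. The tag factory and the calls list are hypothetical helpers written for this sketch.

```python
import functools

calls = []


def tag(label):
    """Hypothetical decorator factory that logs wrapping and calling."""
    def decorator(func):
        calls.append(f"apply {label}")      # runs once, at definition time
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            calls.append(f"enter {label}")  # runs on every call
            result = func(*args, **kwargs)
            calls.append(f"exit {label}")
            return result
        return wrapper
    return decorator


@tag("a")
@tag("b")
def my_func():
    calls.append("body")


my_func()
print(calls)
# ['apply b', 'apply a', 'enter a', 'enter b', 'body', 'exit b', 'exit a']
```

The innermost decorator is applied first, but the outermost wrapper is the first to run when the function is called.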
Practical rule:
Use decorators when you want to add reusable behavior around a function, such as logging, timing, caching, or access control. Use functools.wraps in almost all custom decorators so the wrapped function keeps its original metadata.
Why this question matters:
This is a decorator-focused Python programming question that tests whether a candidate understands both syntax and the mechanics behind higher-order functions. Hiring managers should listen for candidates who can translate @decorator into explicit reassignment, explain wrapping behavior clearly, and understand why functools.wraps matters for metadata preservation. Applicants should cover how decorators work, what problem wraps solves, and why decorator order matters when multiple decorators are stacked.
What are Python context managers and when would you use contextlib.contextmanager over a class-based implementation?
A context manager is any object that implements the methods __enter__ and __exit__. The with statement calls __enter__ when execution enters the block and guarantees that __exit__ is called on exit, whether the block completes normally, raises an exception, or hits a return. This makes context managers the standard pattern for managing resources such as file handles, database connections, locks, and temporary state changes.
The __exit__ method receives three arguments: the exception type, value, and traceback. Returning a truthy value suppresses the exception, while returning None or False allows it to propagate.
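The effect of the return value can be shown with a small context manager that suppresses one exception type and lets everything else through. The Suppress class here is a hypothetical illustration; the standard library already offers contextlib.suppress for real use.

```python
class Suppress:
    """Hypothetical context manager that swallows one exception type."""

    def __init__(self, exc_type):
        self.exc_type = exc_type
        self.caught = None

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        if exc_type is not None and issubclass(exc_type, self.exc_type):
            self.caught = exc_val
            return True   # truthy: the exception is suppressed
        return False      # falsy: any other exception propagates


with Suppress(KeyError) as guard:
    {}["missing"]
print(repr(guard.caught))  # KeyError('missing')

propagated = False
try:
    with Suppress(KeyError):
        1 / 0
except ZeroDivisionError:
    propagated = True
print(propagated)  # True
```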
Class-based implementation:
class ManagedConnection:
    def __init__(self, url):
        self.url = url
        self.conn = None
    def __enter__(self):
        self.conn = connect(self.url)  # connect() stands in for a real client call
        return self.conn
    def __exit__(self, exc_type, exc_val, exc_tb):
        self.conn.close()
        return False  # do not suppress exceptions
Returning False from __exit__ means exceptions are not suppressed and will continue to propagate normally.
Generator-based implementation using @contextlib.contextmanager:
from contextlib import contextmanager
import time
@contextmanager
def timer(label):
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.3f}s")
with timer("database query"):
    results = db.execute(query)
Decision framework of when to use each:
- Simple setup and teardown, one-off use: @contextmanager
- Logic needs to be subclassed or extended: Class-based
- Context manager needs to maintain state across multiple with uses: Class-based
- Need fine-grained exception handling logic in __exit__: Class-based
- Readability and brevity matter most: @contextmanager
- Reusable across a codebase, shared as a utility: Use either approach, class-based is more explicit
The generator approach is idiomatic for simple cases and reads naturally: setup occurs before yield, and teardown runs after. The class-based approach is better when the context manager is stateful, requires inheritance, or has complex exception-handling logic that would be awkward to express around a single yield.
Why this question matters:
This is a resource management Python interview question that tests whether a candidate understands how to acquire and release resources safely in real applications. Context managers have been introduced to handle this process automatically, even under error conditions. Hiring managers should look for candidates who are able to explain how __enter__ and __exit__ work with the with statement, and why this pattern prevents subtle bugs around file handling, database connections, and locks. Applicants should clearly distinguish between class-based context managers and contextlib.contextmanager, and explain when each approach is more appropriate based on complexity, reuse, and readability.
What are the most common real-world use cases for Python decorators, beyond the textbook timer?
Decorators are the idiomatic Python tool for cross-cutting concerns, meaning behavior that needs to be applied consistently across many functions without cluttering their core logic. The most common production use cases include logging, access control, and caching.
Logging and auditing:
import functools
import logging
import time
logger = logging.getLogger(__name__)
def log_call(func):
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        logger.info(f"Calling {func.__name__} with args={args} kwargs={kwargs}")
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
            elapsed = time.perf_counter() - start
            logger.info(f"{func.__name__} completed in {elapsed:.3f}s")
            return result
        except Exception as e:
            logger.error(f"{func.__name__} raised {type(e).__name__}: {e}")
            raise
    return wrapper
@log_call
def process_order(order_id: int, amount: float):
    """Process a customer order."""
    return {"order_id": order_id, "status": "processed"}
Without a decorator, every function would need identical try/except and logging boilerplate duplicated across hundreds of functions, making it extremely difficult to update consistently.
Access control and authentication:
import functools
from typing import Callable
def require_role(role: str):
    def decorator(func: Callable):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            current_user = get_current_user()
            if current_user is None:
                raise PermissionError("Authentication required")
            if role not in current_user.roles:
                raise PermissionError(
                    f"Role '{role}' required, user has {current_user.roles}"
                )
            return func(*args, **kwargs)
        return wrapper
    return decorator
@require_role("admin")
def delete_user(user_id: int):
    """Delete a user (admin only)."""
    pass
@require_role("editor")
def publish_article(article_id: int):
    """Publish an article (editor only)."""
    pass
In a real application, current_user would typically come from a request or session context, rather than being defined directly in the function.
Similar patterns appear in Flask-Login (@login_required) and Django (@permission_required), while FastAPI reaches the same goal through dependency injection with Depends rather than a decorator. The role check is defined once and applied declaratively. Adding the same logic inline to every protected function would be verbose and more error-prone.
Caching computed results:
import functools
import time
def timed_cache(seconds: int):
    """Cache function results for `seconds` seconds, then recompute."""
    def decorator(func):
        cache = {}
        @functools.wraps(func)
        def wrapper(*args):
            now = time.monotonic()
            if args in cache:
                result, timestamp = cache[args]
                if now - timestamp < seconds:
                    return result
            result = func(*args)
            cache[args] = (result, now)
            return result
        return wrapper
    return decorator
@timed_cache(seconds=60)
def fetch_exchange_rate(currency: str) -> float:
    """Fetch live exchange rate (cached for 60 seconds)."""
    return call_exchange_rate_api(currency)
The caching example assumes the function makes an external API call, which can be slow and expensive. This is why it's a good candidate for caching.
Note: Python’s built-in functools.lru_cache and functools.cache handle simpler cases of indefinite caching. This custom version adds TTL (time-to-live) which the built-ins do not support.
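For the simpler case without expiry, a minimal sketch of functools.lru_cache looks like this. The slow_square function and the computed list are illustrative stand-ins for an expensive call.

```python
import functools

computed = []  # records which arguments actually reached the function body


@functools.lru_cache(maxsize=128)
def slow_square(n):
    computed.append(n)
    return n * n


slow_square(4)
slow_square(4)  # served from the cache, the body does not run again
slow_square(5)

info = slow_square.cache_info()
print(computed)                # [4, 5]
print(info.hits, info.misses)  # 1 2
```

Entries stay cached until they are evicted by maxsize or cleared with slow_square.cache_clear(), which is why a time-based policy needs custom code.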
Why this question matters:
This Python interview question focuses on design and abstraction, revealing whether a candidate understands how to apply behavior consistently across a codebase without duplicating logic. Hiring managers can identify strong candidates by listening to how they describe real-world uses for decorators such as logging, authentication, caching, and rate limiting, rather than theoretical definitions. Applicants should demonstrate that they understand decorators as a practical tool for separating core logic from cross-cutting concerns, and be able to explain why this improves maintainability and consistency in production systems.
Python supports single, multiple, multilevel, and hierarchical inheritance.
Single inheritance is the simplest, where one class inherits from one parent.
Multiple inheritance is the most powerful and complex, as a class inherits from more than one parent at the same time.
Hierarchical inheritance is when multiple classes inherit from the same parent.
Multilevel inheritance creates a chain where A inherits from B which inherits from C.
Single inheritance:
class Animal:
    def speak(self):
        return "..."
class Dog(Animal):
    def speak(self):
        return "Woof"
Multiple inheritance:
class Flyable:
    def move(self):
        return "flying"
class Swimmable:
    def move(self):
        return "swimming"
class Duck(Flyable, Swimmable):
    pass
d = Duck()
print(d.move())  # flying
This example returns “flying” because Flyable is listed first, so Python checks it first when resolving the move() method.
The Method Resolution Order (MRO) defines the sequence Python follows when looking up a method or attribute across a class hierarchy. It matters most in multiple inheritance, where the same method name may exist in more than one parent class.
Python uses the C3 linearization algorithm to compute the MRO. In practice, this means a class always appears before its parents, and the order of parents in the class definition is preserved.
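One consequence of these two rules is that some base orderings are rejected outright. The sketch below uses hypothetical classes Base, Child, Broken, and Fine to show a class definition that cannot satisfy both rules at once.

```python
class Base:
    pass


class Child(Base):
    pass


try:
    # Base is listed before its own subclass, so "a class precedes its
    # parents" and "parent order is preserved" cannot both hold.
    class Broken(Base, Child):
        pass
except TypeError as exc:
    mro_error = str(exc)
else:
    mro_error = None


class Fine(Child, Base):
    pass


fine_mro = [cls.__name__ for cls in Fine.__mro__]
print(mro_error)
print(fine_mro)
```

Reversing the bases so the more specific class comes first gives C3 a consistent order to work with.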
You can inspect the MRO directly by using __mro__.
print(Duck.__mro__)
Or you can use a simpler format that is easier to read (both are technically correct):
print([cls.__name__ for cls in Duck.__mro__])
The diamond problem occurs when two parent classes share a common ancestor and a child class inherits from both. Without a defined resolution order, it would be unclear which inherited method to call.
class A:
    def greet(self):
        return "Hello from A"
class B(A):
    def greet(self):
        return "Hello from B"
class C(A):
    def greet(self):
        return "Hello from C"
class D(B, C):
    pass
d = D()
print(d.greet())  # Hello from B
print(D.__mro__)
Python’s C3 algorithm resolves the diamond issue cleanly. In the example above, the call returns “Hello from B” because Python follows the MRO for D, searching B first, then C, then A. Class A is only visited once regardless of how many inheritance paths lead to it.
The super() function works with the MRO to call the next class in the resolution chain, not necessarily the direct parent class.
class B(A):
    def greet(self):
        return super().greet() + " via B"
class C(A):
    def greet(self):
        return super().greet() + " via C"
class D(B, C):
    pass
print(D().greet())  # Hello from A via C via B
The result shows that each class contributes in MRO order, producing “Hello from A via C via B”. For this hierarchy the MRO is D, B, C, then A.
In this example super() inside B calls C, not A, because super() follows the MRO chain. This cooperative use of super() is why multiple inheritance in Python works cleanly when all classes in the hierarchy use super() consistently. Mixing classes that use super() with those that call parent methods directly can break the chain.
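To illustrate how a direct parent call breaks the chain, the sketch below reuses the same class names but has B call A.greet() directly. This is a deliberately broken variant for illustration, not a recommended pattern.

```python
class A:
    def greet(self):
        return "Hello from A"


class B(A):
    def greet(self):
        # Direct call: jumps straight to A and ignores the MRO.
        return A.greet(self) + " via B"


class C(A):
    def greet(self):
        return super().greet() + " via C"


class D(B, C):
    pass


result = D().greet()
print(result)
```

C never runs, even though it sits between B and A in the MRO of D.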
What are Python descriptors and how do built-ins like property and classmethod use them?
A descriptor is any object that defines one or more of __get__, __set__, or __delete__. When Python looks up an attribute on an instance, it checks if the class, or a parent class, defines a descriptor for that attribute name. If it does, Python calls the descriptor's protocol methods instead of returning the attribute directly.
How built-ins use descriptors:
- property is a descriptor that wraps getter, setter, and deleter functions. Accessing an attribute like obj.name calls the descriptor's __get__, which calls the defined getter.
- classmethod is a descriptor whose __get__ binds the function to the class, so the class (not the instance) is always passed as the first argument.
- staticmethod is a descriptor whose __get__ returns the original function without binding it to an instance or class.
Types of descriptors:
- Data descriptors define __set__ or __delete__, usually alongside __get__. They take priority over the instance's __dict__.
- Non-data descriptors define only __get__. The instance __dict__ takes priority over them.
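The difference in priority can be observed directly. This is a minimal sketch with hypothetical classes NonData, Data, and Demo. It writes into the instance __dict__ by hand to set up a name clash with each descriptor.

```python
class NonData:
    """Defines only __get__, so it is a non-data descriptor."""

    def __get__(self, obj, objtype=None):
        return "from descriptor"


class Data:
    """Defines __get__ and __set__, so it is a data descriptor."""

    def __get__(self, obj, objtype=None):
        return "from descriptor"

    def __set__(self, obj, value):
        raise AttributeError("read-only")


class Demo:
    non_data = NonData()
    data = Data()


demo = Demo()
# Bypass normal attribute assignment to create clashing instance entries.
demo.__dict__["non_data"] = "from instance"
demo.__dict__["data"] = "from instance"

print(demo.non_data)  # the instance __dict__ wins over a non-data descriptor
print(demo.data)      # a data descriptor wins over the instance __dict__
```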
Custom descriptor example:
class Positive:
    def __set_name__(self, owner, name):
        self.name = name
    def __get__(self, obj, objtype=None):
        if obj is None:
            return self  # accessed on the class rather than an instance
        return getattr(obj, f"_{self.name}", None)
    def __set__(self, obj, value):
        if value < 0:
            raise ValueError(f"{self.name} must not be negative")
        setattr(obj, f"_{self.name}", value)
class Product:
    price = Positive()
p = Product()
p.price = 10  # ok
p.price = -1  # ValueError
Tradeoffs:
- Custom descriptors add indirection to every attribute access. In hot paths this has a measurable performance cost.
- Debugging can be more difficult since attribute access is no longer transparent.
- For most use cases, property is sufficient. A custom descriptor is typically only needed when the same validation or access logic must be reused across multiple classes.
The Global Interpreter Lock (GIL) is a mutex in CPython that allows only one thread to execute Python bytecode at a time, even on multi-core machines. It exists to protect Python's internal memory management, specifically reference counting, from race conditions.
Impact by task type:
- CPU-bound tasks (such as image processing, number crunching, cryptography): Threads do not provide real parallelism. They constantly compete for the GIL, so adding threads can reduce performance due to contention overhead.
- I/O-bound tasks (network requests, disk reads, database queries): Threads work well. The GIL is released while a thread waits for I/O, allowing other threads to run in the meantime.
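The I/O-bound case can be sketched with a thread pool. Here fake_io is a hypothetical stand-in for a network call, and time.sleep is used because it releases the GIL while waiting, much like real blocking I/O.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io(delay: float) -> float:
    """Hypothetical stand-in for a network call; sleeping releases the GIL."""
    time.sleep(delay)
    return delay


start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(fake_io, [0.2] * 4))
elapsed = time.perf_counter() - start

# The four waits overlap, so the total is close to one wait, not four.
print(f"4 waits of 0.2s finished in {elapsed:.2f}s")
```

Replacing the sleep with a pure-Python computation would be expected to remove most of this benefit, since the threads would then compete for the GIL.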
Workarounds:
- multiprocessing: Each process gets its own interpreter and GIL, enabling true parallelism. The tradeoff is higher memory usage and startup overhead, with communication handled through inter-process communication (IPC).
- asyncio: Provides single-threaded concurrency for I/O-bound work, with no GIL contention.
- C extensions (NumPy, Cython): These can release the GIL during heavy computation, allowing real parallel execution within extension code.
- Python 3.13 free-threaded mode (PEP 703): This is an experimental, opt-in build of CPython that removes the GIL, enabling true multithreaded parallelism in pure Python.
Why this question matters:
This Python question on the concurrency model tests performance thinking and explores whether a candidate understands how the limitations of CPython can impact performance. Hiring managers should note whether candidates can explain what the GIL is, why it exists, and how its impact differs between CPU-bound and I/O-bound workloads. A strong applicant will also be able to explain practical workarounds such as using multiprocessing, asyncio, or native extensions, and demonstrate their ability to make informed decisions based on workload type.
Python manages memory automatically using a private heap, which is a dedicated region of memory controlled entirely by the Python interpreter and not directly accessible to the programmer. All Python objects and data structures are stored in this heap.
Python's memory architecture has four layers:
Layer 0 Operating System: Provides raw memory through system calls.
Layer 1 Raw memory allocator: Python's internal allocator wraps malloc() and free() from the C standard library. It's used for large allocations and internal interpreter structures.
Layer 2 pymalloc: Python's custom allocator, optimized for small objects of up to 512 bytes, which covers the majority of Python objects. It manages memory in three tiers.
- Arenas (256 KB chunks in older CPython releases; newer 64-bit builds use larger arenas): large blocks requested from the OS
- Pools (4 KB each in older releases): subdivisions of arenas, where each pool holds objects of a single fixed-size class
- Blocks: individual object slots within a pool
Layer 3 Object-specific allocators: Built-in types like int, list, and dict have their own allocators optimized for their usage patterns.
This tiered structure reduces memory fragmentation and avoids the overhead of frequent malloc() and free() calls.
Example of integer caching:
a = 256
b = 256
print(a is b)
This returns True because small integers (-5 to 256) are cached and reused by Python, so both variables reference the same object.
a = 257
b = 257
print(a is b)
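In the interactive interpreter, where each line is compiled separately, this typically returns False because larger integers are not cached, so new objects are allocated for each assignment. When the same lines run as a script, CPython may reuse a single constant and print True, which is another reason not to rely on is for numbers.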
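Because literal constants can be merged by the compiler, a more reliable way to observe the cache is to build the integers at runtime. The sketch below does this with int() on strings. The identity results are a CPython implementation detail, so only the equality result is guaranteed by the language.

```python
# Build the values at runtime so the compiler cannot merge literal constants.
small_a = int("256")
small_b = int("256")
large_a = int("257")
large_b = int("257")

print(small_a is small_b)  # identity inside the cached range (implementation detail)
print(large_a is large_b)  # identity outside the cached range (implementation detail)
print(large_a == large_b)  # value equality, which is always reliable
```

On current CPython builds the first line is typically True and the second False, but neither is guaranteed. Only the == comparison is safe to depend on.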
How this differs from C:
In C, the developer must call malloc() and free() manually. Forgetting to free memory can cause leaks, while freeing memory incorrectly can cause crashes. Python's memory manager handles all of this automatically. The tradeoff is that the developer has less control over how and when memory is released.
Practical implications:
Python's memory usage can appear higher than expected because pymalloc may retain freed memory in its internal pools instead of returning it immediately to the operating system.
The integer cache means that identity comparisons with is on small integers can produce misleading results. Always use == for value comparison.
For memory-intensive workloads, tools like tracemalloc and memory_profiler can help inspect memory usage in the heap.
Python uses two complementary mechanisms to reclaim memory: reference counting as the primary mechanism and a cyclic garbage collector as a secondary mechanism.
Mechanism 1: Reference counting
Every Python object maintains an internal counter tracking how many references point to it. When this count reaches zero the object is immediately deallocated and its memory is returned to the allocator.
import sys
a = []
print(sys.getrefcount(a))
This typically prints 2 because one reference comes from a and another is temporarily created for the getrefcount() argument. Exact counts can vary between CPython versions.
b = a
print(sys.getrefcount(a))
This prints 3 because the object is now referenced by a, b, and the temporary reference created by getrefcount().
del b
print(sys.getrefcount(a))
This returns to 2 after deleting b, leaving only a and the temporary reference.
Reference counting is immediate and deterministic. Objects are freed the moment they are no longer referenced with no pause required.
Reference counting limitation: It cannot solve circular references
import gc
class Node:
    def __init__(self):
        self.ref = None
a = Node()
b = Node()
a.ref = b
b.ref = a
del a
del b
Here, a and b reference each other, forming a circular reference. Even after all external references are deleted, each object's reference count stays at 1, so it never reaches 0. Reference counting alone cannot free these objects, which would leak memory without a second mechanism.
Mechanism 2: Cyclic garbage collector
Python's gc module implements a generational cyclic garbage collector that periodically detects unreachable reference cycles and frees the objects in them.
Objects that survive a collection are promoted to the next generation. This is based on the generational hypothesis, which assumes that most objects are short-lived. As a result, Python collects younger generations frequently and older generations rarely, to keep overhead low.
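The hand-off between the two mechanisms can be observed with a weak reference. This is a minimal sketch that assumes CPython. The automatic collector is switched off for the demo so that only the explicit gc.collect() call can reclaim the cycle, and probe is a hypothetical name.

```python
import gc
import weakref


class Node:
    def __init__(self):
        self.ref = None


gc.disable()  # keep the automatic collector from running mid-demo

a = Node()
b = Node()
a.ref = b
b.ref = a
probe = weakref.ref(a)  # a weak reference does not keep the object alive

del a
del b
alive_before = probe() is not None  # the cycle keeps both objects alive

gc.collect()  # the cyclic collector finds and frees the cycle
alive_after = probe() is not None

gc.enable()
print(alive_before, alive_after)
```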
Interacting with the garbage collector programmatically:
import gc
print(gc.isenabled())
gc.collect()
gc.collect(generation=0)
print(gc.get_threshold())
gc.set_threshold(1000, 15, 15)
print(gc.garbage)
gc.disable()
- gc.isenabled() checks whether automatic garbage collection is enabled. It returns True by default.
- gc.collect() triggers a manual garbage collection across all generations.
- gc.collect(generation=0) collects generation 0 only.
- gc.get_threshold() returns the collection thresholds, which are object-count limits that trigger collection. The defaults are typically 700, 10, 10.
- gc.set_threshold(1000, 15, 15) changes those thresholds, which can help tune performance-sensitive applications.
- gc.garbage lists objects the garbage collector found unreachable but could not free.
- gc.disable() disables the cyclic collector (reference counting continues to work), which is useful in batch scripts with no circular references.
Performance implications:
- Reference counting adds a small overhead to every assignment and deletion, because incrementing and decrementing counters is not free.
- The cyclic collector can introduce occasional pauses, usually milliseconds, but it may be noticeable in latency-sensitive applications.
- In long-running servers, disabling the cyclic collector and manually controlling gc.collect() can reduce unpredictable latency spikes.
- With CPython's reference counting, objects are usually freed immediately when they go out of scope, unlike JVM or .NET GC which may hold objects longer and delay collection.
Practical tip for detecting memory leaks:
import tracemalloc
tracemalloc.start()
# ... run the code you want to analyze here ...
snapshot = tracemalloc.take_snapshot()
top_stats = snapshot.statistics("lineno")
for stat in top_stats[:5]:
    print(stat)
- tracemalloc.start() begins tracking memory allocations. Run the code you want to analyze while tracking is active.
- take_snapshot() captures the current memory state.
- statistics("lineno") groups memory usage by line of code.
- The loop over top_stats[:5] prints the five lines that allocated the most memory.
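A single snapshot shows where memory is held. To look for growth, two snapshots can be compared. The sketch below uses Snapshot.compare_to from the standard library. The retained list is a hypothetical stand-in for data that a leak would keep alive.

```python
import tracemalloc

tracemalloc.start()
before = tracemalloc.take_snapshot()

# Hypothetical "leak": roughly 1 MB of data that stays referenced.
retained = [bytes(1000) for _ in range(1000)]

after = tracemalloc.take_snapshot()
tracemalloc.stop()

# Differences are sorted with the largest change in allocated size first.
diffs = after.compare_to(before, "lineno")
for diff in diffs[:3]:
    print(diff)
```

Taking the snapshots a few minutes apart in a long-running process and reading the largest positive size differences is a common starting point for locating a leak.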
How does Python’s asyncio event loop work and what is the difference between concurrency and parallelism?
A coroutine is a function defined with async def that can pause its own execution at an await point and yield control back to the event loop without blocking the thread. It suspends itself until the awaited operation is ready.
The event loop is a scheduler that runs on a single thread. It maintains a queue of ready-to-run coroutines. When a coroutine reaches an await, the loop switches to the next ready coroutine. When the I/O completes, detected through operating system mechanisms such as select, epoll, or kqueue, the paused coroutine is placed back in the queue and resumes execution from where it left off.
Concurrency vs parallelism:
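- asyncio provides concurrency, meaning multiple tasks can be in progress at once, but it doesn't provide parallelism since only one coroutine runs at any given moment.
- Parallelism, which provides true simultaneous execution across multiple CPU cores, requires multiprocessing or a process pool used through loop.run_in_executor(). A thread pool passed to run_in_executor() keeps blocking calls off the event loop, but the GIL still prevents Python bytecode from running in parallel.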
Where asyncio falls short:
- CPU-bound work gets no benefit because a coroutine performing heavy computation will block the entire event loop. Nothing can run while it holds the thread.
- Simple synchronous scripts gain little from asyncio, and converting them only adds boilerplate.
- Mixing synchronous and asynchronous code requires care, because calling a blocking function inside a coroutine will stall the whole loop.
import asyncio
async def fetch(name, delay):
    await asyncio.sleep(delay)
    print(f"{name} done")
async def main():
    await asyncio.gather(
        fetch("A", 2),
        fetch("B", 1),
    )
asyncio.run(main())
The call to asyncio.sleep() pauses the coroutine without blocking the event loop, allowing other coroutines to run. Task B completes first because it has a shorter delay, even though both tasks begin at the same time.
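When a blocking call cannot be avoided inside async code, one common option is to push it onto a worker thread so the loop stays responsive. This sketch assumes Python 3.9 or later for asyncio.to_thread. The names blocking_read and ticker are hypothetical.

```python
import asyncio
import time


def blocking_read() -> str:
    """Hypothetical blocking call, such as a legacy synchronous client."""
    time.sleep(0.2)
    return "done"


async def ticker(ticks: list) -> None:
    # Keeps running while the blocking call executes in a worker thread.
    for i in range(3):
        ticks.append(i)
        await asyncio.sleep(0.05)


async def main():
    ticks = []
    result, _ = await asyncio.gather(
        asyncio.to_thread(blocking_read),  # runs off the event loop thread
        ticker(ticks),
    )
    return result, ticks


result, ticks = asyncio.run(main())
print(result, ticks)
```

Calling blocking_read() directly inside a coroutine would instead freeze the ticker until the sleep finished.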
Why this question matters:
This is an asynchronous Python programming question that tests back-end skills, and whether a candidate understands how modern high-concurrency systems are built. Hiring managers should watch for candidates who explain how the asyncio event loop schedules coroutines, what await actually does, and how non-blocking I/O enables multiple tasks to progress concurrently. Applicants must clearly distinguish between concurrency and parallelism, and demonstrate that they understand both the strengths and limitations of async code in real applications.
Use multithreading for I/O-bound tasks such as network requests, database queries, and file reads, where threads spend most of their time waiting. While Python's Global Interpreter Lock (GIL) prevents true parallel execution of Python bytecode across threads, it's released during I/O operations, allowing threads to overlap their waiting time effectively.
Use multiprocessing for CPU-bound tasks like image processing, numerical computation, and data transformation, where true parallelism across CPU cores is required. Each process has its own Python interpreter and its own GIL, so they can truly run in parallel.
Practical tradeoffs:
- Threads are lightweight, share memory, and are easier to communicate across, but they require careful handling of shared state to avoid race conditions.
- Processes have higher startup overhead and communicate through inter-process communication (IPC), such as pipes or queues, but they're fully isolated and better suited for CPU-heavy work.
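The shared-state caveat for threads can be sketched with a counter protected by a lock. The names counter and add_many are hypothetical. The lock makes each read-modify-write step atomic with respect to the other threads.

```python
import threading

counter = 0
lock = threading.Lock()


def add_many(n: int) -> None:
    global counter
    for _ in range(n):
        # Without the lock, two threads could read the same old value
        # and one of the increments would be lost.
        with lock:
            counter += 1


threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```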
Alternative approach:
A third option is asyncio, which provides I/O-bound concurrency on a single thread without the overhead of creating and switching between threads. It avoids GIL contention and is often the preferred approach for high-concurrency I/O workloads in modern Python.
Why this question matters:
This is a concurrency decision-making interview question for Python developers that evaluates whether the candidate knows the best execution model to choose for a given problem. Hiring managers should look for candidates who can clearly distinguish between I/O-bound and CPU-bound tasks and explain why threads or processes are more appropriate for each case. Experienced applicants will also possess a clear understanding of the tradeoffs here, such as memory usage, inter-process communication, and synchronization complexity instead of treating the approaches as interchangeable.
Find the smallest positive integer that cannot be represented as a sum of any subset of a given list.
There are two main ways to find the smallest positive integer that can't be represented as the sum of any subset.
The naive approach is to generate every possible subset, compute each sum, and then find the smallest integer that doesn't appear in that set of sums. This works, but becomes impractical for large inputs, as the number of subsets grows exponentially.
The naive approach: O(2^n). It generates all subsets, so it is correct but unusable for large inputs.
import itertools
def least_non_representable_naive(nums):
    reachable = {0}
    for r in range(1, len(nums) + 1):
        for subset in itertools.combinations(nums, r):
            reachable.add(sum(subset))
    result = 1
    while result in reachable:
        result += 1
    return result
The greedy approach: O(n log n). A more efficient solution sorts the list and then makes a single pass, so sorting dominates the running time.
def least_non_representable(nums: list[int]) -> int:
    """
    Sort the list, then track `reach`, the largest value such that every
    integer from 1 to `reach` is representable. For each x in sorted order:
    - if x <= reach + 1, it extends the representable range to reach + x
    - if x > reach + 1, there is a gap, so reach + 1 is unreachable
    """
    reach = 0
    for x in sorted(nums):  # sorted() leaves the caller's list unchanged
        if x > reach + 1:
            break
        reach += x
    return reach + 1
print(least_non_representable([1, 2, 5, 7]))     # 4
print(least_non_representable([1, 2, 2, 5, 7]))  # 18
print(least_non_representable([1, 2, 3, 4]))     # 11
print(least_non_representable([]))               # 1
print(least_non_representable([2, 3, 4]))        # 1
If the list does not contain 1, the answer is immediately 1, because no subset of positive integers can sum to it.
Why the greedy approach works:
After sorting, maintain the invariant that reach represents the maximum integer that can be formed using the elements seen so far. If the next element x satisfies x <= reach + 1, it fills in or extends the representable range so that every integer from 1 to reach + x can now be formed. If x > reach + 1, there is a gap between reach and x, which means reach + 1 is unreachable. Since all remaining values are at least as large as x, none of them can fill that gap.
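One way to build confidence in this argument is to cross-check the greedy result against brute force on small random inputs. The sketch below re-declares both approaches under the hypothetical names brute_force and greedy so that it runs on its own, and it assumes positive integers.

```python
import itertools
import random


def brute_force(nums):
    """Enumerate every subset sum, then scan upward for the first gap."""
    reachable = {
        sum(combo)
        for r in range(len(nums) + 1)
        for combo in itertools.combinations(nums, r)
    }
    result = 1
    while result in reachable:
        result += 1
    return result


def greedy(nums):
    """Sorted single pass that maintains the `reach` invariant."""
    reach = 0
    for x in sorted(nums):
        if x > reach + 1:
            break
        reach += x
    return reach + 1


random.seed(0)  # fixed seed so the check is repeatable
mismatches = 0
for _ in range(200):
    nums = [random.randint(1, 8) for _ in range(random.randint(0, 6))]
    if brute_force(nums) != greedy(nums):
        mismatches += 1

print(mismatches)
```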
A simple way to implement a state machine for task lifecycle validation in Python is to define the allowed task states and then map which transitions are valid from each of them.
import enum
from collections.abc import Iterable
class TaskStatus(enum.Enum):
    CREATED = "CREATED"
    IN_PROGRESS = "IN_PROGRESS"
    DONE = "DONE"
    CANCELED = "CANCELED"
VALID_TRANSITIONS = {
    TaskStatus.CREATED: {TaskStatus.IN_PROGRESS, TaskStatus.CANCELED},
    TaskStatus.IN_PROGRESS: {TaskStatus.DONE, TaskStatus.CANCELED},
    TaskStatus.DONE: set(),
    TaskStatus.CANCELED: set(),
}
def is_valid_sequence(sequence: Iterable[TaskStatus]) -> bool:
    try:
        status, *rest = sequence
    except ValueError:
        return False
    if status is not TaskStatus.CREATED:
        return False
    for next_status in rest:
        if next_status not in VALID_TRANSITIONS[status]:
            return False
        status = next_status
    return True
- If the sequence is empty, the function returns False because there is no valid starting state.
- A valid task lifecycle must always begin with TaskStatus.CREATED.
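The rules above can be exercised with a few sample sequences. To stay self-contained, this sketch re-declares the states and uses a condensed variant of the validator that compares adjacent pairs with zip. It is an illustrative alternative, not the implementation shown earlier.

```python
import enum


class TaskStatus(enum.Enum):
    CREATED = "CREATED"
    IN_PROGRESS = "IN_PROGRESS"
    DONE = "DONE"
    CANCELED = "CANCELED"


VALID_TRANSITIONS = {
    TaskStatus.CREATED: {TaskStatus.IN_PROGRESS, TaskStatus.CANCELED},
    TaskStatus.IN_PROGRESS: {TaskStatus.DONE, TaskStatus.CANCELED},
    TaskStatus.DONE: set(),
    TaskStatus.CANCELED: set(),
}


def is_valid_sequence(sequence) -> bool:
    states = list(sequence)
    if not states or states[0] is not TaskStatus.CREATED:
        return False
    # Every adjacent (current, next) pair must be an allowed transition.
    return all(nxt in VALID_TRANSITIONS[cur] for cur, nxt in zip(states, states[1:]))


S = TaskStatus
happy_path = is_valid_sequence([S.CREATED, S.IN_PROGRESS, S.DONE])
skipped_step = is_valid_sequence([S.CREATED, S.DONE])
after_terminal = is_valid_sequence([S.CREATED, S.CANCELED, S.IN_PROGRESS])
empty = is_valid_sequence([])
print(happy_path, skipped_step, after_terminal, empty)
```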
A simple interval approach will fall short. A naive implementation that enforces a fixed minimum interval between calls, for example sleeping 0.5 seconds between each call for a two calls per second limit, does not actually allow two calls within the same second. Instead, it forces exactly one call every 0.5 seconds regardless of the actual call pattern. It also blocks all callers, including those arriving well within the allowed rate.
Token bucket implementation:
import time
import threading
import functools
from datetime import datetime
def rate_limit(calls_per_second: float, burst: int | None = None):
    """
    Token bucket rate limiter.
    Tokens accumulate at `calls_per_second` rate up to `burst` maximum.
    Each call consumes one token. If the bucket is empty, the caller blocks
    until a token is available.
    burst: maximum tokens that can accumulate (default: calls_per_second).
    Allows short bursts above the steady-state rate.
    """
    capacity = float(burst or calls_per_second)
    lock = threading.Lock()
    state = {"tokens": capacity, "last_refill": time.monotonic()}
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            while True:
                with lock:
                    # Refill based on the time elapsed since the last update.
                    now = time.monotonic()
                    elapsed = now - state["last_refill"]
                    state["tokens"] = min(capacity, state["tokens"] + elapsed * calls_per_second)
                    state["last_refill"] = now
                    if state["tokens"] >= 1:
                        state["tokens"] -= 1
                        break
                    # Not enough tokens: compute how long until one refills.
                    wait = (1 - state["tokens"]) / calls_per_second
                # Sleep outside the lock so other threads are not blocked.
                time.sleep(wait)
            return func(*args, **kwargs)
        return wrapper
    return decorator
@rate_limit(calls_per_second=2, burst=5)
def call_external_api(endpoint: str) -> str:
    return f"Response from {endpoint}"
for i in range(8):
    result = call_external_api(f"/items/{i}")
    print(f"{datetime.now().strftime('%H:%M:%S.%f')[:-3]} → {result}")
The limiter refills tokens based on how much time has passed since the last update. If no token is available, the caller waits only until one token has refilled. With burst=5, the first five calls can run immediately because the bucket starts full. After the initial burst is used, later calls are spaced according to the configured rate, represented above as one token every 0.5 seconds.
How the token bucket works:
- The bucket holds up to burst tokens, refilling at calls_per_second tokens per second.
- Each call consumes one token. If a token is available, the call proceeds immediately.
- If the bucket is empty, the caller waits only until one token refills, rather than waiting for a full fixed interval every time.
- This allows short bursts of traffic while still enforcing the steady-state rate over time.
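The refill arithmetic is easier to test when the clock is passed in rather than read from time.monotonic(). The sketch below is a hypothetical, non-blocking TokenBucket class written for illustration. It returns False instead of sleeping when no token is available.

```python
class TokenBucket:
    """Non-blocking token bucket; the caller supplies the current time."""

    def __init__(self, rate: float, capacity: float, now: float = 0.0):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum tokens the bucket can hold
        self.tokens = float(capacity)  # the bucket starts full
        self.last = now

    def try_acquire(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate=2, capacity=5)
burst = [bucket.try_acquire(0.0) for _ in range(6)]  # six calls at t=0
later = bucket.try_acquire(0.5)     # half a second later, one token has refilled
too_soon = bucket.try_acquire(0.6)  # only 0.2 tokens have accumulated since
print(burst, later, too_soon)
```

Injecting the clock keeps the test deterministic and avoids real sleeps.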
Async function to unit test:
import aiohttp
async def logs(cont, name):
    conn = aiohttp.UnixConnector(path="/var/run/docker.sock")
    async with aiohttp.ClientSession(connector=conn) as session:
        async with session.get(
            f"http://xx/containers/{cont}/logs?follow=1&stdout=1"
        ) as resp:
            async for line in resp.content:
                print(name, line)
The main challenge in testing this async function is that logs depends on two external resources that aren't available in a test environment: a Unix socket at /var/run/docker.sock and a live Docker daemon.
The solution is to mock both aiohttp.UnixConnector and aiohttp.ClientSession so the function never touches real I/O.
Test implementation using pytest-asyncio:
import pytest
from unittest.mock import AsyncMock, MagicMock, patch
# Assumes logs has been imported from the module under test.
@pytest.mark.asyncio
async def test_logs_prints_each_line():
    fake_lines = [b"line one\n", b"line two\n"]
    # resp.content must support async for; __aiter__.return_value accepts a plain iterable.
    mock_content = MagicMock()
    mock_content.__aiter__.return_value = fake_lines
    mock_resp = AsyncMock()
    mock_resp.__aenter__.return_value.content = mock_content
    mock_session = AsyncMock()
    # session.get() is a plain call that returns an async context manager,
    # so it is a MagicMock rather than an AsyncMock (which would return a coroutine).
    mock_session.__aenter__.return_value.get = MagicMock(return_value=mock_resp)
    with patch("aiohttp.UnixConnector"), \
         patch("aiohttp.ClientSession", return_value=mock_session), \
         patch("builtins.print") as mock_print:
        await logs("abc123", "my_container")
    assert mock_print.call_count == 2
    mock_print.assert_any_call("my_container", b"line one\n")
    mock_print.assert_any_call("my_container", b"line two\n")
This test uses two fake log lines to simulate streamed output from Docker. The test mocks the async iterator so the function can loop over log lines without using a real network stream.
The response object is mocked as an async context manager so it behaves like the real aiohttp response inside an async with block. The client session is also mocked as an async context manager so the test can replace the real aiohttp.ClientSession.
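The two mocking techniques described here can be tried without aiohttp or pytest. In this standard-library sketch, read_lines is a hypothetical coroutine with the same async with plus async for shape, and the mocks are built with unittest.mock only.

```python
import asyncio
from unittest.mock import MagicMock


async def read_lines(response_cm):
    """Hypothetical consumer: enter a context manager, then stream its content."""
    async with response_cm as resp:
        return [line async for line in resp.content]


# Async iterator: __aiter__.return_value accepts any plain iterable.
content = MagicMock()
content.__aiter__.return_value = [b"line one\n", b"line two\n"]

# Async context manager: MagicMock supports __aenter__ and __aexit__ (Python 3.8+).
response_cm = MagicMock()
response_cm.__aenter__.return_value.content = content

lines = asyncio.run(read_lines(response_cm))
print(lines)
```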
Why async code is harder to test than sync code:
- Async functions must be awaited. They can't be called directly in a regular test function.
- Mocking async context managers (async with) and async iterators (async for) requires AsyncMock and __aiter__ setup, which has no sync equivalent.
- The event loop must be managed. Each test needs its own loop to avoid state leaking between tests.
How pytest-asyncio addresses this:
@pytest.mark.asyncio runs the test coroutine inside a managed event loop automatically.
Each test gets a fresh event loop by default preventing shared state leaks between tests.
It works seamlessly with unittest.mock.AsyncMock, introduced in Python 3.8.
Optional pytest-asyncio configuration in pyproject.toml:
[tool.pytest.ini_options]
asyncio_mode = "auto"
This setting allows async test functions to run automatically without adding the marker to each one individually. In pytest.ini, the equivalent is asyncio_mode = auto under the [pytest] section.
Additional tests:
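Early termination: If the function is extended with a stop flag (for example keep_running, which the version shown does not have), set it to False mid-stream and assert the loop exits.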
Connection errors: Mock session.get to raise aiohttp.ClientConnectionError and confirm it propagates correctly.
Empty stream: Mock content with no lines and confirm that print is never called.
Why this question matters:
This is a Python technical interview question about testing and asynchronous programming that is valuable for all experience levels. It tests multiple real-world skills at once by evaluating whether a candidate can write reliable tests for code that depends on external systems and async execution. Hiring managers should expect junior candidates to understand testing basics beyond simple functions, and mid-level candidates to discuss mocking network interactions, handling async context managers, and show a strong understanding of async testing concepts. Strong senior applicants will answer from a design and architecture standpoint, demonstrating that they can isolate behavior, avoid real I/O in tests, validate outcomes in a controlled and repeatable way, and structure code to improve testability.
To fetch multiple URLs concurrently in Python using aiohttp, a common approach is to create one async function that fetches a single URL and another that schedules all of those fetches together with asyncio.gather().
import asyncio
import aiohttp
async def fetch_one(
    session: aiohttp.ClientSession,
    url: str,
    semaphore: asyncio.Semaphore,
) -> dict:
    async with semaphore:
        try:
            timeout = aiohttp.ClientTimeout(total=10)
            async with session.get(url, timeout=timeout) as resp:
                resp.raise_for_status()
                return {
                    "url": url,
                    "status": resp.status,
                    "body": await resp.text(),
                }
        except aiohttp.ClientResponseError as e:
            return {"url": url, "status": e.status, "error": str(e)}
        except aiohttp.ClientConnectionError as e:
            return {"url": url, "status": None, "error": f"Connection error: {e}"}
        except asyncio.TimeoutError:
            return {"url": url, "status": None, "error": "Timed out"}
async def fetch_all(urls: list[str], max_concurrent: int = 10) -> list[dict]:
    semaphore = asyncio.Semaphore(max_concurrent)
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_one(session, url, semaphore) for url in urls]
        return await asyncio.gather(*tasks)
urls = [
"https://httpbin.org/get",
"https://httpbin.org/status/404",
"https://httpbin.org/delay/2",
"https://does-not-exist.example.com",
]
results = asyncio.run(fetch_all(urls, max_concurrent=5))
for r in results:
    if "error" in r:
        # status is None for connection errors and timeouts, so show a placeholder
        print(f"FAIL [{r['status'] or '---'}] {r['url']} - {r['error']}")
    else:
        print(f"OK [{r['status']}] {r['url']}")
In this example fetch_one() handles a single request, while fetch_all() creates the tasks and runs them concurrently. The default behavior of asyncio.gather() is used here because errors are handled inside fetch_one(), so exceptions are not expected to propagate to gather(). A semaphore is used to limit how many requests can be active at the same time.
Why the semaphore matters:
Without a concurrency limit, fetching 1000 URLs simultaneously would open 1000 connections at once. This can exhaust local file descriptors, overwhelm the target server, and increase the risk of triggering rate limiting or IP bans. Using asyncio.Semaphore(N) caps active requests at N while still allowing all tasks to be scheduled concurrently.
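The capping behavior can be checked without any network access. In this sketch, worker is a hypothetical task that records how many tasks are inside the semaphore at once. The short asyncio.sleep stands in for a request.

```python
import asyncio


async def worker(semaphore: asyncio.Semaphore, state: dict) -> None:
    async with semaphore:
        state["active"] += 1
        state["peak"] = max(state["peak"], state["active"])
        await asyncio.sleep(0.01)  # stand-in for a network request
        state["active"] -= 1


async def main(n_tasks: int = 20, limit: int = 3) -> int:
    semaphore = asyncio.Semaphore(limit)
    state = {"active": 0, "peak": 0}
    # All tasks are scheduled at once; the semaphore gates how many proceed.
    await asyncio.gather(*(worker(semaphore, state) for _ in range(n_tasks)))
    return state["peak"]


peak = asyncio.run(main())
print(peak)
```

All 20 tasks complete, but the recorded peak never exceeds the limit of 3.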
Production considerations:
- Add retry logic per URL for transient errors such as 5xx responses or timeouts.
- Use aiohttp.TCPConnector(limit=N) as an alternative to a semaphore for connection pool limiting.
- Stream large response bodies using resp.content.read(chunk_size) instead of await resp.text() to avoid memory spikes.
A thread-safe singleton in Python can be implemented by protecting instance creation with a lock.
The naive implementation fails because the race condition occurs in the gap between the if check and the assignment. If two threads both read _instance is None as True before either has finished creating the instance, both proceed to create a new object. The result is two separate instances, breaking the singleton guarantee entirely. This is a classic check-then-act race condition.
Naive implementation:
class Singleton:
    _instance = None
    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance
A thread-safe solution uses double-checked locking:
import threading
class Singleton:
    _instance = None
    _lock = threading.Lock()
    def __new__(cls, *args, **kwargs):
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = super().__new__(cls)
        return cls._instance
s1 = Singleton()
s2 = Singleton()
assert s1 is s2
The outer check avoids acquiring the lock on every call. Once the instance exists, all threads can return it with no synchronization overhead. The inner check handles the case where two threads both pass the outer check before either acquires the lock, so only the first thread through will find _instance is None still True.
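A quick way to exercise the locked version is to request the instance from many threads at once and count the distinct objects returned. The sketch below repeats the class so it runs on its own. The helper grab and the list instances are hypothetical names.

```python
import threading


class Singleton:
    _instance = None
    _lock = threading.Lock()

    def __new__(cls, *args, **kwargs):
        if cls._instance is None:          # fast path, no lock needed
            with cls._lock:
                if cls._instance is None:  # re-check while holding the lock
                    cls._instance = super().__new__(cls)
        return cls._instance


instances = []


def grab() -> None:
    instances.append(Singleton())  # list.append is safe to call from several threads


threads = [threading.Thread(target=grab) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

distinct = {id(obj) for obj in instances}
print(len(instances), len(distinct))
```

A passing run does not prove the absence of races, but a failure here would be a clear sign that the locking is wrong.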
This alternative example assumes the singleton instance is defined at module level, for example in a file like mysingleton.py where the instance is created once the module is first imported.
class _Singleton:
    def __init__(self):
        self.value = 42
instance = _Singleton()
Python's import system holds a module-level lock during import, so module-level objects inherently behave as thread-safe singletons. This is often the simplest and most Pythonic approach when you control the module structure.
Reasons to avoid the singleton pattern:
- It makes unit testing harder because global state is difficult to mock or reset between tests.
- It introduces hidden dependencies because callers reach for the singleton without explicitly declaring that they need it.
- Dependency injection is often a cleaner alternative in larger codebases.
A retry decorator with exponential backoff can be implemented by wrapping a function in retry logic that catches specific exceptions, and progressively increases the wait time between retries after a failure.
Here, jitter is set to up to 10 percent of the current delay.
import time
import random
import functools
def retry(max_attempts=3, base_delay=1.0, max_delay=30.0, exceptions=(Exception,)):
    """
    Retries the decorated function up to max_attempts times.
    Delay doubles each attempt (exponential backoff) with random
    jitter added to prevent thundering herd under concurrent load.
    Delay is capped at max_delay to avoid unbounded waits.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions as e:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    delay = min(base_delay * (2 ** (attempt - 1)), max_delay)
                    jitter = random.uniform(0, delay * 0.1)
                    wait = delay + jitter
                    print(f"Attempt {attempt} failed: {e}. Retrying in {wait:.2f}s...")
                    time.sleep(wait)
        return wrapper
    return decorator
@retry(max_attempts=4, base_delay=0.5, max_delay=10.0, exceptions=(ConnectionError, TimeoutError))
def fetch_data(url):
    # Simulated flaky call: fails about 70 percent of the time.
    if random.random() < 0.7:
        raise ConnectionError("Network unreachable")
    return f"Data from {url}"
Backoff sequence:
The waits below are the base delays for the example above. Actual waits are slightly longer because jitter adds up to 10 percent of the current delay.
Attempt 1 fails: Wait 0.5s
Attempt 2 fails: Wait 1.0s
Attempt 3 fails: Wait 2.0s
Attempt 4 fails: Raises ConnectionError
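The sequence above can be reproduced with a few lines of arithmetic. This is a minimal sketch that ignores jitter; backoff_delays is a hypothetical helper that applies the same formula as the decorator.

```python
def backoff_delays(max_attempts: int, base_delay: float, max_delay: float) -> list[float]:
    """Return the un-jittered wait before each retry (hypothetical helper)."""
    # No wait follows the final attempt, so there are max_attempts - 1 delays.
    return [
        min(base_delay * (2 ** (attempt - 1)), max_delay)
        for attempt in range(1, max_attempts)
    ]


print(backoff_delays(max_attempts=4, base_delay=0.5, max_delay=10.0))
# [0.5, 1.0, 2.0]
print(backoff_delays(max_attempts=8, base_delay=0.5, max_delay=10.0))
# [0.5, 1.0, 2.0, 4.0, 8.0, 10.0, 10.0]
```

The second call shows the cap taking effect: once doubling would exceed max_delay, every later wait stays at 10 seconds.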
Production considerations:
Jitter: Without jitter, all clients that hit an error simultaneously will retry at exactly the same intervals, creating a thundering herd that overwhelms the recovering service. Adding a small random offset spreads out the retry load.
Max delay cap: Without a cap, the delay keeps growing. A max_delay parameter ensures callers are never blocked for an unreasonable amount of time.
When not to use retry logic:
Non-idempotent operations: Retrying a payment charge or database insert without idempotency guarantees can cause duplicate actions.
Client errors (such as 4xx responses): Retrying a 400 Bad Request or 404 Not Found will not succeed. Retry only on transient server or network errors.
Tight loops: Blindly retrying in a high-throughput system can amplify load on an already struggling downstream service. In these cases, consider a circuit breaker pattern instead.
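To make the circuit breaker idea concrete, here is a deliberately small sketch. It is not a production implementation: the class names, the thresholds, and the injectable clock parameter are assumptions made for illustration, and it is not thread-safe.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock        # injectable so tests can control time
        self.failures = 0
        self.opened_at = None     # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                raise CircuitOpenError("circuit open; call rejected")
            # Timeout elapsed: fall through and allow one trial call (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()   # open (or re-open) the circuit
            raise
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

While the circuit is open, calls fail immediately without touching the downstream service, which gives that service time to recover instead of receiving a stream of retries.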
To process a large CSV file without loading it all into memory, read it incrementally and process rows in small chunks. This keeps memory usage stable regardless of file size.
import csv
from pathlib import Path

def streaming_column_stats(
    filepath: str | Path,
    column: str,
    chunk_size: int = 1000,
    skip_errors: bool = False,
):
    """
    Streams a CSV file in chunks of `chunk_size` rows.
    Memory usage stays constant regardless of file size.
    Returns total, count, average, and skipped row count.
    skip_errors: if True, malformed rows are skipped and counted.
                 if False, malformed rows raise ValueError immediately.
    """
    total = 0.0
    count = 0
    skipped = 0
    chunk = []
    with open(filepath, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if column not in (reader.fieldnames or []):
            raise ValueError(f"Column '{column}' not found. Available: {reader.fieldnames}")
        for row in reader:
            # DictReader fills missing trailing fields with None, so guard before strip().
            raw = (row.get(column) or "").strip()
            try:
                chunk.append(float(raw))
            except ValueError:
                if skip_errors:
                    skipped += 1
                    continue
                raise ValueError(f"Non-numeric value '{raw}' in column '{column}'")
            if len(chunk) == chunk_size:
                total += sum(chunk)
                count += len(chunk)
                chunk = []
    if chunk:
        total += sum(chunk)
        count += len(chunk)
    return {
        "total": total,
        "count": count,
        "average": total / count if count else 0.0,
        "skipped": skipped,
    }

stats = streaming_column_stats(
    "sales_data.csv",
    column="revenue",
    chunk_size=500,
    skip_errors=True,
)
print(f"Total: {stats['total']:>12,.2f}")
print(f"Count: {stats['count']:>12,}")
print(f"Average: {stats['average']:>12,.2f}")
print(f"Skipped: {stats['skipped']:>12,} malformed rows")
After each chunk is processed, it is discarded instead of accumulated, which keeps memory usage stable. After the loop finishes, any remaining partial chunk is processed so the final rows are included.
pandas equivalent for teams already using it:
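import pandas as pd

total, count = 0.0, 0
for chunk in pd.read_csv("sales_data.csv", chunksize=500):
    # Coerce malformed values to NaN so they are excluded from both the sum and the count.
    values = pd.to_numeric(chunk["revenue"], errors="coerce")
    total += values.sum()
    count += values.count()

average = total / count if count else 0.0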
When to use pandas chunking instead:
pandas pd.read_csv(chunksize=N) provides the same memory benefit with vectorized operations on each chunk. This is often significantly faster than pure Python for numeric work at scale.
Performance optimization should always start with measurement. Do not guess where the bottleneck is.
Python’s standard library provides two profiling tools: cProfile for function-level profiling and timeit for microbenchmarking individual expressions.
import cProfile
import pstats
profiler = cProfile.Profile()
profiler.enable()
run_my_application()
profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats("cumulative")
stats.print_stats(20)
import timeit
timeit.timeit("[x**2 for x in range(1000)]", number=10000)
Sorting by cumulative time highlights the functions that spend the most total time, including time spent in functions they call. print_stats(20) displays the top 20 functions with the highest measured cost.
You can also profile from the terminal for a quick pass. Run cProfile as a module against your script, sorting results by cumulative time (for example, python -m cProfile -s cumulative my_script.py, where my_script.py stands in for your own entry point), and review the highest-ranked entries at the start of the output. timeit is useful for benchmarking small, focused expressions or code snippets.
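For a self-contained illustration of both tools, the sketch below profiles a small function, captures the report as text, and times the same function with timeit. The slow_function workload is made up for this example.

```python
import cProfile
import io
import pstats
import timeit


def slow_function(n: int) -> int:
    """Illustrative workload (hypothetical): sum of squares in a Python loop."""
    total = 0
    for i in range(n):
        total += i * i
    return total


# Profile one call and capture the report as text instead of printing to stdout.
profiler = cProfile.Profile()
profiler.enable()
slow_function(200_000)
profiler.disable()

buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)

# timeit also accepts a callable; repeat() returns one total time per run.
timings = timeit.repeat(lambda: slow_function(1_000), number=100, repeat=3)
print(f"best of 3: {min(timings):.4f}s per 100 calls")
```

Taking the minimum of several repeats is a common way to reduce noise from other processes on the machine.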
For production applications, py-spy is a sampling profiler that attaches to a running process without modifying the code, which is useful if the application can't be restarted. memory_profiler reports memory usage line by line, although it requires decorating the code or running it under the profiler rather than attaching to a live process.
Common Python performance bottlenecks and fixes:
Slow loops over large datasets: These can be replaced with fast vectorized NumPy or pandas operations, which execute in C rather than Python bytecode.
# Python-level loop
result = [x * 2 for x in large_list]

# Vectorized with NumPy
import numpy as np
result = np.array(large_list) * 2
Repeated string concatenation: Strings are immutable, so += in a loop can create a new string object on every iteration, which becomes expensive when there are many parts. Use join instead.
# Slow: builds a new string on each iteration
result = ""
for part in parts:
    result += part

# Faster: a single join
result = "".join(parts)
Redundant database or API calls inside loops: This can create an N+1 query problem. Batch the calls outside of the loop:
# N+1 pattern: one query per user
for user_id in user_ids:
    user = db.get_user(user_id)

# Batched: one query for all users
users = db.get_users_batch(user_ids)
Missing caching for expensive repeated computations: This can often be addressed by using functools.lru_cache for pure functions or a TTL cache for time-sensitive data. With lru_cache, each unique input is computed once and then reused from the cache on later calls.
from functools import lru_cache

@lru_cache(maxsize=256)
def expensive_computation(n: int) -> int:
    return sum(range(n))
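To check that caching is actually taking effect, lru_cache exposes cache_info() and cache_clear() on the decorated function. The short sketch below repeats the function so that it runs on its own.

```python
from functools import lru_cache


@lru_cache(maxsize=256)
def expensive_computation(n: int) -> int:
    return sum(range(n))


expensive_computation(1_000)   # computed (cache miss)
expensive_computation(1_000)   # served from the cache (hit)
expensive_computation(2_000)   # computed (cache miss)

info = expensive_computation.cache_info()
print(info.hits, info.misses, info.currsize)   # 1 2 2

# Clearing the cache is useful between tests or when underlying data changes.
expensive_computation.cache_clear()
```

Arguments must be hashable for lru_cache to work, so passing a list or dict raises TypeError.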
Note: For CPU-bound bottlenecks that cannot be vectorized, consider multiprocessing to parallelize across multiple CPU cores, or move the hottest path into compiled code, for example by writing it in Cython or by calling an existing C library through ctypes.
Why this question matters:
This is a senior-level interview question for Python developers about performance and debugging. It tests whether a candidate approaches optimization methodically. Hiring managers should focus on candidates who discuss the importance of starting with measurement tools such as cProfile or timeit to identify real bottlenecks before making any changes. Applicants should also demonstrate their familiarity with common performance issues like inefficient loops, redundant I/O, and lack of caching, then explain their practical strategies for improving performance without sacrificing code clarity.
Python's multiprocessing module bypasses the GIL by creating separate processes, each with its own interpreter and memory space. This enables true CPU parallelism, unlike threads, which do not run Python bytecode in parallel across CPU cores.
Process gives direct control over individual processes. Pool manages a pool of worker processes and is a better tool for distributing work across many items:
from multiprocessing import Process, Pool
import os

def worker(name):
    print(f"{name} running in PID {os.getpid()}")

def square(n):
    return n * n

if __name__ == "__main__":
    # Process: direct control over a single child process
    p = Process(target=worker, args=("task_1",))
    p.start()
    p.join()

    # Pool: distribute work across a fixed set of workers
    with Pool(processes=4) as pool:
        results = pool.map(square, range(10))
    print(results)
- join() waits for the process to finish before the main program continues.
- pool.map() blocks until all results are ready and returns them in order.
- pool.imap() returns a lazy iterator, which is more memory-efficient for large inputs.
- pool.starmap() is similar to map(), but it unpacks argument tuples for each call.
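The difference between starmap() and imap() is easiest to see side by side. The following is a small sketch with made-up worker functions (power and square); the pool code sits under a main guard so the module can be imported safely by child processes.

```python
from multiprocessing import Pool


def power(base: int, exponent: int) -> int:
    return base ** exponent


def square(n: int) -> int:
    return n * n


if __name__ == "__main__":
    with Pool(processes=2) as pool:
        # starmap unpacks each tuple into positional arguments: power(2, 3), ...
        print(pool.starmap(power, [(2, 3), (3, 2), (10, 0)]))

        # imap yields results lazily and in input order; chunksize batches
        # the work to reduce inter-process communication overhead.
        for value in pool.imap(square, range(5), chunksize=2):
            print(value)
```

Both worker functions are defined at module level, which keeps them picklable.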
Processes do not share memory, so communication happens through Queue or Pipe:
from multiprocessing import Process, Queue

def producer(q):
    for i in range(5):
        q.put(i)
    q.put(None)  # sentinel: no more items

def consumer(q):
    while True:
        item = q.get()
        if item is None:
            break
        print(f"Got {item}")

if __name__ == "__main__":
    q = Queue()
    procs = [Process(target=producer, args=(q,)), Process(target=consumer, args=(q,))]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
Above, None is used as a sentinel value to signal that the producer has finished sending data.
For shared state, Value or Array from multiprocessing can be used, but shared state should be minimized because it requires locks and reintroduces synchronization complexity.
Common pitfalls include:
- Functions passed to Pool must be picklable, so lambdas and closures will not work.
- On Windows and other platforms that use the spawn start method, multiprocessing code must be placed inside if __name__ == "__main__" to prevent recursive process spawning.
- Process startup overhead is significant, so multiprocessing is most useful for CPU-bound tasks that run long enough to offset the cost.
- For I/O-bound work, asyncio or threading are usually better choices.
A pytest fixture is a function decorated with @pytest.fixture that provides setup data or resources to tests. A test requests a fixture by naming it as a parameter, and pytest injects it automatically. Teardown is handled by using yield inside the fixture, keeping setup and teardown defined in one place.
import pytest

@pytest.fixture(scope="module")
def db_connection():
    conn = create_connection()
    yield conn
    conn.close()

def test_query(db_connection):
    result = db_connection.execute("SELECT 1")
    assert result is not None
Execution pauses at yield, the test runs using the yielded value, and teardown resumes once the fixture's scope ends (here, after the last test in the module).
Fixture scope controls how often the fixture is created. function (default) creates a fresh fixture per test, class shares it across a test class, module shares it across a file, and session shares it across the entire test run. Choosing the right scope helps avoid expensive repeated setup work.
@pytest.mark.parametrize runs the same test with multiple input sets, eliminating copy-paste test functions:
@pytest.mark.parametrize("value, expected", [
    (2, 4),
    (3, 9),
    (-4, 16),
    (0, 0),
])
def test_square(value, expected):
    assert value ** 2 == expected
This produces four separate test cases, each with its own pass/fail result and label in the output.
Fixtures and parametrize combine naturally. The fixture provides shared setup while parametrize varies the inputs:
@pytest.fixture
def user_service(db_connection):
    return UserService(db_connection)

@pytest.mark.parametrize("role, can_delete", [
    ("admin", True),
    ("editor", False),
    ("viewer", False),
])
def test_delete_permission(user_service, role, can_delete):
    user = User(role=role)
    assert user_service.can_delete(user) == can_delete
Compared to unittest, pytest fixtures are more flexible. They don’t require class inheritance, scoping is explicit, and teardown is defined alongside setup rather than in a separate method.
Routing in Flask maps incoming HTTP requests to Python functions using decorators such as @app.route(). When a request matches a defined route, Flask calls the associated function and returns its response.
from flask import Flask, request, jsonify, abort

app = Flask(__name__)
# db is assumed to be an application-specific data access object.

@app.route("/")
def index():
    return "Hello, World!"

@app.route("/users/<int:user_id>")
def get_user(user_id):
    user = db.find(user_id)
    if user is None:
        abort(404)
    return jsonify(user)

@app.route("/users", methods=["GET", "POST"])
def users():
    if request.method == "GET":
        role = request.args.get("role")
        # Compare explicitly: type=bool would treat any non-empty string,
        # including "false", as True.
        active = request.args.get("active", "true").lower() == "true"
        return jsonify(db.find_all(role=role, active=active))
    if request.method == "POST":
        data = request.get_json()
        if not data or "name" not in data:
            abort(400, description="name is required")
        user = db.create(data)
        return jsonify(user), 201

@app.errorhandler(404)
def not_found(e):
    return jsonify({"error": str(e)}), 404
Route parameters such as <int:user_id> are captured as function arguments.
Query parameters are read from the URL, as in the example /users?role=admin&active=true.
POST requests are where data is typically read from a JSON request body using request.get_json().
Returning status code 201 indicates that a new resource was created.
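One detail worth knowing about query parameters such as active=true is that their values always arrive as strings, and Python's bool() treats any non-empty string, including "false", as True. A minimal sketch of explicit parsing follows; parse_bool is a hypothetical helper, not part of Flask.

```python
from typing import Optional


def parse_bool(value: Optional[str], default: bool = True) -> bool:
    """Interpret common truthy/falsy query-string spellings (hypothetical helper)."""
    if value is None:
        return default
    normalized = value.strip().lower()
    if normalized in {"1", "true", "yes", "on"}:
        return True
    if normalized in {"0", "false", "no", "off"}:
        return False
    raise ValueError(f"Cannot interpret {value!r} as a boolean")


print(bool("false"))         # True: any non-empty string is truthy
print(parse_bool("false"))   # False
print(parse_bool("TRUE"))    # True
print(parse_bool(None))      # True (falls back to the default)
```

In a view function, a helper like this could be applied to the raw string returned by request.args.get("active"), with a ValueError translated into a 400 response.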
URL converters like <int:user_id> automatically cast and validate the parameter. If the value cannot be converted, Flask returns a 404 response. Built-in converters include string (default), int, float, path, and uuid.
For larger applications, Flask uses Blueprints to split routes across multiple files without circular imports:
from flask import Blueprint

users_bp = Blueprint("users", __name__, url_prefix="/users")

@users_bp.route("/<int:user_id>")
def get_user(user_id):
    pass

# In the application module:
app.register_blueprint(users_bp)
Flask is a micro-framework. It handles routing and request and response processing, but leaves database, authentication, and validation to extensions or the developer. For APIs that require automatic validation and documentation, FastAPI is now a common choice for new projects.
How do you properly evaluate a machine learning model in Python using train/test splits and cross-validation?
A train/test split separates the data into two sets. The model learns from the training set and is evaluated on a test set that it has not seen before. This measures how well the model generalizes to new data instead of simply memorizing the training examples.
In this example, test_size=0.2 means that 80 percent of the data is used for training and 20 percent for testing. random_state=42 makes the split reproducible, so the same train and test sets are created each time the code runs.
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
import numpy as np
X, y = load_data()
X_train, X_test, y_train, y_test = train_test_split(
X, y,
test_size=0.2,
random_state=42,
stratify=y,
)
model = RandomForestClassifier()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print(accuracy_score(y_test, predictions))
stratify=y is important for imbalanced datasets. Without it, a random split might place most of the minority class in the training set and leave little to none of it in the test set.
A single train/test split has a weakness: the result depends heavily on which samples end up in each set. A favorable or unfavorable split can produce a misleadingly high or low score. K-fold cross-validation addresses this by dividing the data into k roughly equal folds, training k times with a different fold held out as the test set each time, and then averaging the scores.
from sklearn.model_selection import cross_val_score, StratifiedKFold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
print(scores)
print(scores.mean())
print(scores.std())
In the example above, the mean score provides a more reliable estimate of model performance than a single train/test split. A low standard deviation across folds suggests the model performs consistently across different splits. A high standard deviation across folds signals that the model is sensitive to the specific data it trains on, which can be a sign of instability or insufficient data.
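To see what the folds look like mechanically, the sketch below partitions sample indices by hand using only the standard library. It is an illustration of the idea, not how scikit-learn implements it, and k_fold_indices is a hypothetical helper.

```python
import random


def k_fold_indices(n_samples: int, k: int, seed: int = 42):
    """Yield (train_indices, test_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)   # shuffle once, reproducibly
    # Distribute any remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train, test in folds:
    print(len(train), len(test))
# 6 4
# 7 3
# 7 3
```

Every sample lands in exactly one test fold, so each observation is used for evaluation once and for training k - 1 times.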
Never use the test set for any decision during model development, including hyperparameter tuning, feature selection, or architecture choices. When you repeatedly look at test set performance and adjust the model, you are effectively training on that test data. For hyperparameter tuning, use a three-way split with training, validation, and test sets, or use nested cross-validation.
from sklearn.model_selection import GridSearchCV
param_grid = {"n_estimators": [50, 100, 200], "max_depth": [3, 5, None]}
grid_search = GridSearchCV(model, param_grid, cv=5, scoring="accuracy")
grid_search.fit(X_train, y_train)
print(grid_search.best_params_)
print(grid_search.score(X_test, y_test))
In the example above, GridSearchCV performs its own internal cross-validation to compare hyperparameter combinations. This allows hyperparameter tuning to happen without using the final test set.
The standard Python scraping stack uses requests to fetch pages and BeautifulSoup to parse HTML. Together they handle most static websites cleanly.
import requests
from bs4 import BeautifulSoup

def scrape_articles(url: str) -> list[dict]:
    headers = {"User-Agent": "Mozilla/5.0"}
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    articles = []
    for card in soup.select("div.article-card"):
        title = card.select_one("h2.title")
        link = card.select_one("a")
        date = card.select_one("span.date")
        articles.append({
            "title": title.get_text(strip=True) if title else None,
            "url": link["href"] if link else None,
            "date": date.get_text(strip=True) if date else None,
        })
    return articles
As seen in the example above, a User-Agent header is often added so the request looks more like a normal browser request. raise_for_status() raises an exception if the server returns an HTTP error response such as 4xx or 5xx.
Always guard against missing elements with checks such as title.get_text(strip=True) if title else None. Page structure can change without notice, and calling get_text() on None raises AttributeError and crashes the scraper.
For pagination, follow the next page link until it no longer exists.
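from urllib.parse import urljoin

def scrape_all_pages(base_url: str) -> list[dict]:
    results = []
    url = base_url
    while url:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, "html.parser")
        results.extend(parse_page(soup))  # parse_page: per-page extraction helper
        next_btn = soup.select_one("a.next-page")
        # Resolve relative links such as "?page=2" against the current URL.
        url = urljoin(url, next_btn["href"]) if next_btn else None
    return results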
For JavaScript-rendered content that requests cannot see, use Playwright or Selenium, which control a real browser.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    browser = p.chromium.launch()
    page = browser.new_page()
    page.goto("https://example.com")
    page.wait_for_selector("div.content")
    html = page.content()
    browser.close()

soup = BeautifulSoup(html, "html.parser")
Ethical and legal considerations matter. Always check robots.txt before scraping, respect Crawl-delay directives, and never scrape personal data without consent. Many sites explicitly prohibit scraping in their terms of service. Add delays between requests to avoid overloading servers, and use official APIs when available.
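Checking robots.txt can be automated with the standard library's urllib.robotparser. The sketch below parses a made-up robots.txt from a string so that it runs without network access; the bot name is also hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content (made up for illustration).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 5
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

user_agent = "my-scraper"  # hypothetical bot name
print(parser.can_fetch(user_agent, "https://example.com/articles/1"))  # True
print(parser.can_fetch(user_agent, "https://example.com/private/x"))   # False
print(parser.crawl_delay(user_agent))                                   # 5
```

Against a live site, set_url() followed by read() fetches the real file, and the crawl delay can be passed to time.sleep() between requests.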
To inspect and list the contents of a Python module at runtime, you can start with dir() to see all available names, and then use inspect for more detailed information.
Basic approach using dir():
import some_module
print(dir(some_module))
dir() returns a sorted list of all names in the module's namespace, including functions, classes, variables, and imported names. It's useful as a quick way to explore what a module exposes, but it gives no information about what each name represents.
Filtering for functions using inspect:
This retrieves all functions defined in the module along with their signatures.
import inspect
import some_module

functions = inspect.getmembers(some_module, predicate=inspect.isfunction)
for name, func in functions:
    print(name, inspect.signature(func))
inspect.getmembers() returns (name, value) pairs filtered by the predicate. inspect.isfunction returns True for Python functions (those created with def or lambda), including functions imported into the module from elsewhere. It returns False for built-in functions and classes.
Listing only functions defined in the module, excluding imports:
import inspect
import some_module

own_functions = [
    name for name, obj in inspect.getmembers(some_module, inspect.isfunction)
    if obj.__module__ == some_module.__name__
]
print(own_functions)
Checking obj.__module__ against the module's own name filters out functions that were imported from elsewhere.
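The effect of the __module__ filter can be demonstrated without a real file by building a throwaway module in memory. The module name demo_mod and its contents are made up for this sketch.

```python
import inspect
import types

# Source for a small in-memory module: one imported function, one defined locally.
source = '''
from textwrap import dedent   # imported function, defined elsewhere

def greet(name: str, punctuation: str = "!") -> str:
    """Return a greeting."""
    return f"Hello, {name}{punctuation}"
'''
demo_mod = types.ModuleType("demo_mod")
exec(source, demo_mod.__dict__)

all_functions = [name for name, _ in inspect.getmembers(demo_mod, inspect.isfunction)]
own_functions = [
    name for name, obj in inspect.getmembers(demo_mod, inspect.isfunction)
    if obj.__module__ == demo_mod.__name__
]

print(all_functions)   # ['dedent', 'greet']
print(own_functions)   # ['greet']
print(inspect.signature(demo_mod.greet))
```

Without the filter, the imported dedent appears alongside greet because inspect.isfunction accepts any Python function bound in the namespace.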
What inspect provides beyond dir():
import inspect
import some_module

print(inspect.getsource(some_module.some_function))
print(inspect.getdoc(some_module.some_function))
sig = inspect.signature(some_module.some_function)
for param_name, param in sig.parameters.items():
    print(param_name, param.annotation, param.default)
Practical use cases for runtime introspection:
- Plugin systems: Discover and register handler functions dynamically without a hardcoded registry.
- Auto-generated CLI tools: Frameworks like fire and click use introspection to build command-line interfaces from function signatures.
- Test discovery: pytest uses introspection to find test functions without requiring explicit registration.
- Documentation generation: Tools such as sphinx use inspect.getsource and inspect.getdoc to extract docstrings automatically.
- Debugging and REPL exploration: Quickly understand what an unfamiliar module exposes.
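As one possible shape for the plugin-system use case, the sketch below discovers handler functions by naming convention instead of listing them in a hardcoded registry. The Handlers class, the handle_ prefix, and discover_handlers are all assumptions made for illustration.

```python
import inspect


class Handlers:
    """Hypothetical namespace whose handle_* functions act as plugins."""

    @staticmethod
    def handle_ping(payload: str) -> str:
        return "pong"

    @staticmethod
    def handle_echo(payload: str) -> str:
        return payload

    @staticmethod
    def helper(payload: str) -> str:   # not a handler: wrong prefix
        return payload.upper()


def discover_handlers(namespace, prefix: str = "handle_") -> dict:
    """Map command names to functions whose names start with the prefix."""
    return {
        name[len(prefix):]: func
        for name, func in inspect.getmembers(namespace, inspect.isfunction)
        if name.startswith(prefix)
    }


registry = discover_handlers(Handlers)
print(sorted(registry))             # ['echo', 'ping']
print(registry["echo"]("hello"))    # hello
```

The same function works on a module object, so adding a new handle_* function is enough to register it.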
What are the most important pandas DataFrame operations every Python developer should know?
pandas DataFrame is the core data structure for tabular data in Python. In practice, the most important operations are selection, filtering, grouping, merging, and handling missing values.
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "name": ["Alice", "Bob", "Carol", "Dave"],
    "dept": ["Eng", "Eng", "HR", "HR"],
    "salary": [95000, 82000, 74000, None],
})

# Selection
df["salary"]
df[["name", "salary"]]
df.loc[0]
df.iloc[0]

# Filtering
df[df["salary"] > 80000]
df[(df["dept"] == "Eng") & (df["salary"] > 80000)]

# Missing values
df["salary"].isna()
df["salary"].fillna(df["salary"].mean())
df.dropna(subset=["salary"])

# Grouping and aggregation
df.groupby("dept")["salary"].mean()
df.groupby("dept").agg(
    avg_salary=("salary", "mean"),
    headcount=("name", "count"),
)

# Merging
other = pd.DataFrame({"name": ["Alice", "Bob"], "level": ["senior", "mid"]})
merged = df.merge(other, on="name", how="left")
In the example above, selecting a single column returns a Series, and selecting multiple columns returns another DataFrame. df.loc[0] selects a row by label, while df.iloc[0] selects a row by integer position. isna() returns a Boolean mask showing which values are missing. fillna() fills the missing salary values with the mean salary, and dropna() removes rows where salary is missing. Both return new objects rather than modifying df in place. Merging in pandas is similar to an SQL join.
The most common performance pitfalls include using iterrows() to loop over rows, which is extremely slow and can almost always be replaced with vectorized operations or apply().
Assigning through chained indexing, such as df["col"][0] = value, can trigger a SettingWithCopyWarning and may not modify the original data, so using df.loc[0, "col"] = value is safer.
For large DataFrames, use df.dtypes to confirm column types, because columns are often stored in wider types than necessary. For example, an integer column with missing values is converted to float64. For files that don't fit in memory, use pd.read_csv(chunksize=N).
