Understanding Python modules

In a growing Python project, there comes a point where keeping all code in a single file becomes impractical. Fortunately, Python modules and packages offer ways to structure your code into easily maintainable chunks and enable code reuse across projects.


Scripts, Modules and Packages

Especially when coming from other programming languages, Python's module system may appear a bit confusing at first. This is partly because it consists of three components that play different (sometimes overlapping) roles:

  • Scripts are simple files ending in .py that are supposed to be executed directly
  • Modules are files with the .py extension that are supposed to be imported by other Python files
  • Packages are directories containing at least one file named __init__.py and any number of module files with the .py extension

Importing modules

To use a module, you first need to import it. This can happen in a few different ways. Assume we have the following file as the module we want to import:

lib.py

def sample():
  print("Hello world!")

def test():
  print("Test!")

In the same directory, create a second file called main.py. In this file, we have a few options to import lib.py:


Option 1: importing the entire module

The most basic way to import a module is the import keyword:

import lib

lib.sample()

The name of the module is simply its filename without the .py extension. This way of importing will import everything from lib.py and make it accessible under the lib. prefix, so the function sample() in lib.py can be executed as lib.sample() from main.py.


Option 2: Importing only specific symbols

A symbol is anything in a Python file that has a name, like variables, classes and functions. To import only specific symbols, use the from module import symbol syntax:

from lib import test

test()

Note that this type of import does not prefix the imported function, but instead imports it into our current namespace. Since we only imported test(), the sample() function remains unavailable in main.py.

You can import multiple symbols with this syntax as well:

from lib import test, sample

test()
sample()

Option 3: Importing all symbols from a module

This import is essentially the same as the previous one, except that it blindly imports all symbols from the module without needing to specify each by name.

from lib import *

test()
sample()

This option is somewhat frowned upon, because the import statement does not show what gets imported. Future readers get no hint where a symbol came from: in our example, we have access to the test() and sample() functions, but nothing in main.py says they belong to lib. With a single import statement this is still manageable, but as soon as you add another, it quickly becomes chaotic.

Name collisions and aliases

One tricky issue to keep in mind when importing with the from keyword is name collisions. When you import a symbol whose name is already defined in the current namespace, the existing one is quietly replaced by the new one. Consider this example:

lib1.py

def test():
  print("Test from lib1")

lib2.py

def test():
  print("Test from lib2")

main.py

from lib1 import test
from lib2 import test

test()

If you run main.py, it will only print

Test from lib2

While it may look obvious from the current example, it can easily be overlooked when changing import syntax:

main.py

from lib1 import test
from lib2 import *

test()

This version of main.py is equivalent to the one above and suffers from the same name collision issue, but it is much harder to spot if you don't know the full contents of lib2.py.

In order to deal with this issue, it is common practice to avoid the from module import * syntax entirely. But sometimes you may still need 2 different functions with the same name from 2 different modules. In those cases, you can assign one (or both) an alias (aka "nickname") using the as keyword:

from lib1 import test as lib1_test
from lib2 import test

lib1_test()
test()

The test() function from lib1.py is now locally renamed to lib1_test(), so when we import the test() function from lib2.py, it does not replace it anymore and we have access to both functions in main.py.
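The same collision exists in the standard library, which makes for an example that runs without any extra files. The sketch below assumes Python 3.8 or newer (for shlex.join); the alias names path_join, shell_join and osp are arbitrary choices made for illustration. It also shows that the as keyword works for whole modules, not only for individual symbols.

```python
import os.path
import shlex

# Two unrelated standard-library functions share the name "join".
from os.path import join as path_join  # joins path components
from shlex import join as shell_join   # joins arguments into a shell command line

# A whole module can be given an alias the same way.
import os.path as osp

print(path_join("logs", "app.txt"))         # separator depends on the platform
print(shell_join(["echo", "hello world"]))  # echo 'hello world'
print(osp.basename("/tmp/report.csv"))      # report.csv
```

Without the aliases, the second from import would have silently replaced the first join, exactly like test() in the example above.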

Importing modules from subdirectories

Python modules can be imported from any number of nested subdirectories by replacing the slashes (or backslashes on Windows) in the path with dots:

lib/util/tools.py

def test():
  print("Test!")

Importing the module with prefix:

main.py

import lib.util.tools

lib.util.tools.test()

Note that the entire path became the prefix for the imported module.


Importing only the module:

main.py

from lib.util import tools

tools.test()

Importing specific symbols:

main.py

from lib.util.tools import test

test()
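To try all three variants without creating files by hand, the following sketch builds the directory structure in a temporary directory and imports from it. The top-level name demo_lib is a stand-in for lib chosen for this example, and adding the temporary directory to sys.path is only there to imitate what Python does automatically for the directory containing main.py.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build demo_lib/util/tools.py inside a throwaway directory.
root = Path(tempfile.mkdtemp())
module_file = root / "demo_lib" / "util" / "tools.py"
module_file.parent.mkdir(parents=True)
module_file.write_text('def test():\n    return "Test!"\n')

# Make the directory importable, as the script's own directory would be.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_lib.util.tools                # full dotted prefix
from demo_lib.util import tools           # module only
from demo_lib.util.tools import test      # single symbol

# All three spellings lead to the same function object.
print(demo_lib.util.tools.test is tools.test is test)  # True
print(test())  # Test!
```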

Executable modules

The introduction above mentioned that the different components of the Python module ecosystem may sometimes overlap in functionality. One such overlap is the executable module, combining a script and a module in a single file.

An executable module can either be imported like any regular module or executed directly. A well-known example of this can be found in the http.server module from Python's standard library:

When imported, it exposes all the components required to build your own HTTP server in a Python script.
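As a minimal sketch of the imported use, the snippet below defines a request handler and a helper that creates a server. HelloHandler, build_server and the port 8000 are hypothetical names and values picked for this example; only BaseHTTPRequestHandler and HTTPServer come from http.server itself.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that answers every GET request with plain text."""

    def do_GET(self):
        body = b"Hello from a custom server\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def build_server(port=8000):
    """Create (but do not start) a server bound to localhost."""
    return HTTPServer(("127.0.0.1", port), HelloHandler)


# To actually serve requests, call: build_server().serve_forever()
```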


But when executed directly, it starts an HTTP server that serves files from a named directory:

python -m http.server --directory .

This behaviour is achieved through the magic variable __name__, which contains the name of the current module. If the module is executed directly, this will always be "__main__"; otherwise it will be the module's own import name, such as "http.server".

Using this feature, modules can include code that will only be executed when it is run directly, but not when imported by a script:

if __name__ == "__main__":
  print("I am run directly!")

def test():
  print("Test")

This sample module will print "I am run directly!" if executed directly. If imported, it will not print that, but still expose the test() function.
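A common convention is to move the script logic into a function and call it from the guard, which keeps it reusable for code that imports the module. The sketch below follows that convention; the function name main() and its return value are choices made for this example. It also prints __name__ for an imported standard-library module, which reports the module's own name even when it is bound to an alias.

```python
import json
import json as j


def main():
    """Entry point that only runs when the file is executed directly."""
    print("I am run directly!")
    return 0


# An imported module reports its own name, even when bound to an alias.
print(json.__name__)  # json
print(j.__name__)     # json

if __name__ == "__main__":
    main()
```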

Packages

Packages are used to group modules together. A package is a directory containing at least a file named __init__.py. It may contain more module files, but only this one is required.

Modules inside a package behave largely like modules in directories, except that the __init__.py file is implicitly executed the first time the package or any module inside it is imported. It is only executed once, but guaranteed to be executed before the import finishes.

mypkg/__init__.py

print("__init__.py ran")

mypkg/lib1.py

print("lib1.py ran")

mypkg/lib2.py

print("lib2.py ran")

main.py

import mypkg.lib1
import mypkg.lib2

Running main.py produces this output:

__init__.py ran
lib1.py ran
lib2.py ran

Despite not being mentioned anywhere, __init__.py executed before the modules we actually imported. It did so only once, meaning importing mypkg.lib1 triggered it, but when we imported mypkg.lib2 it did not execute as it had already done so.

The __init__.py file is a great way to initialize a package or run setup logic required by modules within it, such as creating directories or connecting to a database.
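The run-once behaviour comes from Python's module cache, sys.modules: once a package or module has been loaded, later imports reuse the cached object instead of executing the file again. The sketch below recreates the example in a temporary directory, using the hypothetical package name demo_pkg, and captures the output so the order can be inspected.

```python
import contextlib
import importlib
import io
import sys
import tempfile
from pathlib import Path

# Create a package with two modules in a throwaway directory.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg"
pkg.mkdir()
(pkg / "__init__.py").write_text('print("__init__.py ran")\n')
(pkg / "lib1.py").write_text('print("lib1.py ran")\n')
(pkg / "lib2.py").write_text('print("lib2.py ran")\n')

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Capture everything printed while the imports run.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    import demo_pkg.lib1
    import demo_pkg.lib2
    import demo_pkg.lib1  # already cached in sys.modules, runs nothing

lines = captured.getvalue().splitlines()
print(lines)  # ['__init__.py ran', 'lib1.py ran', 'lib2.py ran']
print("demo_pkg" in sys.modules)  # True
```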

Importing entire packages

When coming from other languages, one might assume that importing an entire package directory will first run the __init__.py file, then import all .py files within that directory. You may be surprised that this is not how Python packages behave by default. A package can opt into loading its modules, though, which creates a lot of confusion for new Python developers.

Assume you have the following setup:

mypkg/__init__.py

n = 10

mypkg/lib1.py

def test():
  print("Test")

main.py

import mypkg

print(mypkg.n)
mypkg.lib1.test()

What seems okay at first glance results in an error when running main.py:

10
Traceback (most recent call last):
 File "/tmp/a/main.py", line 4, in <module>
   mypkg.lib1.test()
   ^^^^^^^^^^
AttributeError: module 'mypkg' has no attribute 'lib1'

While the first line print(mypkg.n) worked, calling mypkg.lib1.test() did not. But why? Because Python packages, by default, only guarantee to execute __init__.py; what else happens is up to the code inside that file. This is why we could access the variable n, which is defined in that file, but nothing from the other file in the package, mypkg/lib1.py.

The creator of the package can choose to import other files during init, but must do so explicitly:

mypkg/__init__.py

from . import lib1

main.py

import mypkg

mypkg.lib1.test()

This time, the code runs fine and prints the expected "Test" message to the console.
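If you do not control the package's __init__.py, the importing side can load the submodule itself: importing a submodule by its dotted name also attaches it to the parent package as an attribute. The sketch below demonstrates this with a hypothetical package named demo_pkg2 created in a temporary directory.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Package whose __init__.py does not import its submodule.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_pkg2"
pkg.mkdir()
(pkg / "__init__.py").write_text("n = 10\n")
(pkg / "lib1.py").write_text('def test():\n    return "Test"\n')

sys.path.insert(0, str(root))
importlib.invalidate_caches()

import demo_pkg2

before = hasattr(demo_pkg2, "lib1")  # submodule not loaded yet

import demo_pkg2.lib1                # loads it and attaches it to the package

after = hasattr(demo_pkg2, "lib1")
print(before, after)          # False True
print(demo_pkg2.lib1.test())  # Test
```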

An alternative way to import the package is to use the less clear from package import * syntax, importing all symbols without the mypkg. prefix. Assuming __init__.py both defines n and imports lib1:

main.py

from mypkg import *

print(n)
lib1.test()

There is a mechanism to control what gets imported by this syntax from within __init__.py: the __all__ magic variable. It is a list containing the names of the symbols that this import syntax will import:

mypkg/__init__.py

n = 10
i = 3
__all__ = ["n"]

main.py

from mypkg import *

print(n)
print(i)

The first print() call will correctly output 10, but the second one will raise a NameError because it cannot find a variable named i. Even though __init__.py contains a variable with that name, the __all__ variable only declares n for exporting, so main.py cannot access the variable i.

Note that the __all__ variable only affects the from package import * syntax; the import package syntax ignores it entirely.
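Both behaviours can be checked in a single file. The sketch below builds two hypothetical modules, demo_all and demo_plain, in memory and registers them in sys.modules instead of writing files to disk; that shortcut is purely for the sake of a self-contained example. The second module shows what happens when __all__ is absent: the star import then takes every name that does not start with an underscore.

```python
import sys
import types

# Module with __all__: only "n" is exported to star imports.
demo = types.ModuleType("demo_all")
exec('n = 10\ni = 3\n__all__ = ["n"]\n', demo.__dict__)
sys.modules["demo_all"] = demo

star_namespace = {}
exec("from demo_all import *", star_namespace)
print("n" in star_namespace, "i" in star_namespace)  # True False

# A regular import is unaffected: every attribute stays reachable.
import demo_all
print(demo_all.i)  # 3

# Module without __all__: names starting with an underscore are skipped.
plain = types.ModuleType("demo_plain")
exec("visible = 1\n_hidden = 2\n", plain.__dict__)
sys.modules["demo_plain"] = plain

plain_namespace = {}
exec("from demo_plain import *", plain_namespace)
print("visible" in plain_namespace, "_hidden" in plain_namespace)  # True False
```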

Subpackages and relative imports

As you may have guessed, there is nothing stopping you from nesting packages into one another. Doing so is perfectly fine, and importing a package located inside another package will first run the parent's __init__.py file, then the one of the imported package. There is no limit on how deep you can nest subpackages, but in reality this seldom exceeds two directory levels.

Doing so creates a unique edge case, where a module may want to import a submodule of its own parent package. Imagine you have a package lib containing these subpackages:

lib/database/session.py is a module handling sessions (connections) to some database

lib/tools/log.py is a module handling log output. It may send logs elsewhere for processing or add context to messages

You may want to import log.py for logging output inside session.py, but they aren't in the same package. They do, however, share a parent package. This is where relative imports come into play: a relative import is an import statement that refers to a module by its location relative to the current module's package. Each leading dot moves one level up: a single dot is the current package, two dots its parent. Note that this is only allowed in imported modules; scripts (or modules being run directly) cannot use this feature:

lib/tools/log.py

def write(msg):
   print(f"[LOG] {msg}")

lib/database/session.py

from ..tools import log

def start():
   log.write("db connection started")

main.py

from lib.database import session

session.start()

The output is, as expected, the result of log.write():

[LOG] db connection started
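The restriction on scripts can be observed directly. The sketch below recreates the layout in a temporary directory under the hypothetical top-level name demo_app, imports session.py as part of its package, and then runs the same file as a standalone script. It uses runpy.run_path as an approximation of running python session.py from the command line, and the functions return their message instead of printing it so the result is easy to inspect.

```python
import importlib
import runpy
import sys
import tempfile
from pathlib import Path

# Recreate the package layout in a throwaway directory.
root = Path(tempfile.mkdtemp())
files = {
    "demo_app/__init__.py": "",
    "demo_app/tools/__init__.py": "",
    "demo_app/database/__init__.py": "",
    "demo_app/tools/log.py": 'def write(msg):\n    return f"[LOG] {msg}"\n',
    "demo_app/database/session.py": (
        "from ..tools import log\n\n"
        'def start():\n    return log.write("db connection started")\n'
    ),
}
for relative_path, source in files.items():
    target = root / relative_path
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(source)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Imported as part of the package, the relative import resolves.
from demo_app.database import session
print(session.start())  # [LOG] db connection started

# Run as a standalone script, the file has no parent package for ".." to refer to.
script_error = None
try:
    runpy.run_path(str(root / "demo_app/database/session.py"), run_name="__main__")
except ImportError as error:
    script_error = str(error)
print(script_error)
```

The second part fails with an ImportError because a file executed on its own does not know which package it belongs to, so there is nothing for the leading dots to be relative to.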



Understanding the Python module ecosystem is a vital part of writing effective Python code, and using it correctly leads to a clearer project structure and an easily maintainable codebase. While splitting an application into reusable chunks is useful, remember to find the right balance between reusability and verbosity: keep logic that belongs together grouped in packages, and refrain from moving code into separate packages just because you can, without any direct need for it.
