Organizing Larger Programs
In this chapter we’ll be covering more of Python’s techniques for organizing programs. Specifically we’ll be looking at Python’s concept of packages and how these can be used to add structure to your program as it grows beyond simple modules.
Packages
As you’ll recall, Python’s basic tool for organizing code is the module. A module typically corresponds to a single source file, and you load modules into programs by using the `import` keyword. When you import a module, it is represented by an object of type `module`, and you can interact with it like any other object.
A package in Python is just a special type of module. The defining characteristic of a package is that it can contain other modules, including other packages. So packages are a way to define hierarchies of modules in Python. This allows you to group modules with similar functionality together in ways that express their cohesiveness.
An example of a package: urllib
Many parts of Python’s standard library are implemented as packages. To see an example, open your REPL and import `urllib` and `urllib.request`:

```python
>>> import urllib
>>> import urllib.request
```

Now if you check the types of both of these modules, you’ll see that they’re both of type `module`:

```python
>>> type(urllib)
<class 'module'>
>>> type(urllib.request)
<class 'module'>
```
The important point here is that `urllib.request` is nested inside `urllib`. In this case, `urllib` is a package and `request` is a normal module.
The __path__ attribute of packages
If you closely inspect each of these objects, you’ll notice an important difference. The `urllib` package has a `__path__` member that `urllib.request` does not have:

```python
>>> urllib.__path__
['./urllib']
>>> urllib.request.__path__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute '__path__'
```

This attribute is a list of filesystem paths indicating where `urllib` searches to find nested modules. This hints at the nature of the distinction between packages and modules: packages are generally represented by directories in the filesystem, while modules are represented by single files.
Note that in Python 3 versions prior to 3.3, `__path__` was just a single string, not a list. In this book we’re focusing on Python 3.5+, but for most purposes the difference is not important.
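Since only packages carry a `__path__` attribute, its presence is a quick way to check whether a given module object is a package. A small sketch:

```python
import urllib
import urllib.request

def is_package(module):
    """Return True if a module object is a package (i.e. has __path__)."""
    return hasattr(module, '__path__')

print(is_package(urllib))          # True
print(is_package(urllib.request))  # False
```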
Locating modules
Before we get into the details of packages, it’s important to understand how Python locates modules. Generally speaking, when you ask Python to `import` a module, Python looks on your filesystem for the corresponding Python source file and loads that code. But how does Python know where to look? The answer is that Python checks the `path` attribute of the standard `sys` module, commonly referred to as `sys.path`.

The `sys.path` object is a list of directories. When you ask Python to import a module, it starts with the first directory in `sys.path` and checks for an appropriate file. If no match is found in the first directory, it checks subsequent entries, in order, until a match is found or Python runs out of entries in `sys.path`, in which case an `ImportError` is raised.
sys.path
Let’s explore `sys.path` from the REPL. Run Python from the command line with no arguments and enter the following statements:

```python
>>> import sys
>>> sys.path
['', '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/rope_py3k-0.9.4-py3.5.egg',
 '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/decorator-3.4.0-py3.5.egg',
 '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/Baker-1.3-py3.5.egg',
 ... many more entries ...
 '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/plat-darwin',
 '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/lib-dynload',
 '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages']
```
As you see, your `sys.path` can be quite large. Its precise entries depend on a number of factors, including how many third-party packages you’ve installed and how you’ve installed them. For our purposes, a few of these entries are of particular importance. First, let’s look at the very first entry:

```python
>>> sys.path[0]
''
```

Remember that `sys.path` is just a normal list, so we can examine its contents with indexing and slicing. We see here that the first entry is the empty string. This happens when you start the Python interpreter with no arguments, and it instructs Python to search for modules first in the current directory.
Let’s also look at the tail of `sys.path`:

```python
>>> sys.path[-5:]
['/Library/Frameworks/Python.framework/Versions/3.5/lib/python33.zip',
 '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5',
 '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/plat-darwin',
 '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/lib-dynload',
 '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages']
```

These entries comprise Python’s standard library and the `site-packages` directory where you can install third-party modules.
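If you want to see where this search would land without actually importing anything, the standard `importlib.util.find_spec` function performs the same scan of `sys.path` and reports the result. A quick sketch:

```python
import importlib.util

# find_spec searches sys.path but does not execute the module
spec = importlib.util.find_spec('json')
print(spec.origin)   # the json module's source file in the standard library

# A name found nowhere on sys.path yields None instead of raising
print(importlib.util.find_spec('definitely_not_a_module'))
```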
sys.path in action
To really get a feel for `sys.path`, let’s create a Python source file in a directory that Python would not normally search:

```shell
$ mkdir not_searched
```

In that directory create a file called `path_test.py` with the following function definition:

```python
# not_searched/path_test.py

def found():
    return "Python found me!"
```
Now, start your REPL from the directory containing the `not_searched` directory and try to import `path_test`:

```python
>>> import path_test
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'path_test'
```

The `path_test` module (which, remember, is embodied in `not_searched/path_test.py`) is not found because `path_test.py` is not in a directory contained in `sys.path`. To make `path_test` importable we need to put the directory `not_searched` into `sys.path`.
Since `sys.path` is just a normal list, we can add our new entry using the `append()` method. Start a new REPL and enter this:

```python
>>> import sys
>>> sys.path.append('not_searched')
```

Now when we try to import `path_test` in the same REPL session, we see that it works:

```python
>>> import path_test
>>> path_test.found()
'Python found me!'
```
Knowing how to manually manipulate `sys.path` can be useful, and sometimes it’s the best way to make code available to Python. There is another way to add entries to `sys.path`, though, that doesn’t require direct manipulation of the list.
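One detail to keep in mind is that the search is first-match-wins, so where you add an entry matters. `append()` gives your directory the lowest priority; to let it shadow every other entry, insert it at the front instead. A sketch:

```python
import sys

# Lowest priority: only consulted if no earlier entry has a match
sys.path.append('not_searched')

# Highest priority: consulted before everything else, including
# the current-directory entry that started at index 0
sys.path.insert(0, 'not_searched')
```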
The PYTHONPATH environment variable
The `PYTHONPATH` environment variable is a list of paths that are added to `sys.path` when Python starts.

The format of `PYTHONPATH` is the same as `PATH` on your platform. On Windows, `PYTHONPATH` is a semicolon-separated list of directories:

```
c:\some\path;c:\another\path;d:\yet\another
```

On Linux and OS X it’s a colon-separated list of directories:

```
/some/path:/another/path:/yet/another
```
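If you ever need to build such a value from Python itself, the platform-specific separator is available as `os.pathsep`, so the same code works on both platforms. A sketch:

```python
import os

directories = ['/some/path', '/another/path', '/yet/another']

# os.pathsep is ':' on Linux/OS X and ';' on Windows
pythonpath = os.pathsep.join(directories)
print(pythonpath)
```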
To see how `PYTHONPATH` works, let’s add `not_searched` to it before starting Python again. On Windows, use the `set` command:

```shell
> set PYTHONPATH=not_searched
```

On Linux or OS X the syntax will depend on your shell, but for bash-like shells you can use `export`:

```shell
$ export PYTHONPATH=not_searched
```
Now start a new REPL and check that `not_searched` is indeed in `sys.path`:

```python
>>> [path for path in sys.path if 'not_searched' in path]
['/home/python_journeyman/not_searched']
```

And of course we can now import `path_test` without manually editing `sys.path`:

```python
>>> import path_test
>>> path_test.found()
'Python found me!'
```
There are more details to `sys.path` and `PYTHONPATH`, but this is most of what you need to know. For more information, consult the Python documentation on the module search path.
Implementing packages
We’ve seen that packages are modules that can contain other modules. But how are packages implemented? To create a normal module, you simply create a Python source file in a directory contained in `sys.path`. The process for creating packages is not much different.

To create a package, first you create the package’s root directory. This root directory needs to be in some directory on `sys.path`; remember, this is how Python finds modules and packages for importing. Then, in that root directory, you create a file called `__init__.py`. This file, which we’ll often call the package init file, is what makes the package a module. `__init__.py` can be (and often is) empty; its presence alone suffices to establish the package.
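Those two steps, creating a directory on `sys.path` and dropping an empty `__init__.py` into it, can even be scripted. A minimal sketch using `pathlib` (the package name `demo_pkg` and the scratch directory are just for illustration):

```python
import sys
import tempfile
from pathlib import Path

# A scratch directory that we add to sys.path
root = Path(tempfile.mkdtemp())
sys.path.append(str(root))

# Step 1: the package's root directory
pkg = root / 'demo_pkg'
pkg.mkdir()

# Step 2: an empty package init file
(pkg / '__init__.py').touch()

import demo_pkg
print(demo_pkg.__file__)   # ends with demo_pkg/__init__.py
```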
In Namespace packages we’ll look at a somewhat more general form of packages which can span multiple directory trees. In that section we’ll see that, since PEP 420 was introduced in Python 3.3, `__init__.py` files are not technically required for packages any more. So why do we talk about them as if they are required? For one thing, they are still required for earlier versions of Python. In fact, many people writing code for 3.3+ aren’t aware that `__init__.py` files are optional. As a result, you’ll still find them in the vast majority of packages, so it’s good to be familiar with them. Furthermore, they provide a powerful tool for package initialization, so it’s important to understand how they can be used.

Perhaps most importantly, though, we recommend that you include `__init__.py` files when possible because “explicit is better than implicit”. The existence of a package initialization file is an unambiguous signal that you intend for a directory to be a package, and it’s something that many Python developers instinctively look for.
A first package: reader
As with many things in Python, an example is much more instructive than words. Go to your command prompt and create a new directory called `reader`:

```shell
$ mkdir reader
```

Add an empty `__init__.py` to this directory. On Linux or OS X you can use the `touch` command:

```shell
$ touch reader/__init__.py
```

On Windows you can use `type`:

```shell
> type NUL > reader/__init__.py
```

Now if you start your REPL you’ll see that you can import `reader`:

```python
>>> import reader
>>> type(reader)
<class 'module'>
```
The role of __init__.py
We can now start to examine the role that `__init__.py` plays in the functioning of a package. Check the `__file__` attribute of the `reader` package:

```python
>>> reader.__file__
'./reader/__init__.py'
```

We saw that `reader` is a module, even though on our filesystem the name “reader” refers to a directory. Furthermore, the source file that is executed when `reader` is imported is the package init file in the `reader` directory, a.k.a. `reader/__init__.py`. In other words, and to reiterate a critical point, a package is nothing more than a directory containing a file named `__init__.py`.
To see that the `__init__.py` is actually executed like any other module when you import `reader`, let’s add a small bit of code to it:

```python
# reader/__init__.py

print('reader is being imported!')
```

Restart your REPL, import `reader`, and you’ll see our little message printed out:

```python
>>> import reader
reader is being imported!
```
Adding functionality to the package
Now that we’ve created a basic package, let’s add some useful content to it. The goal of our package will be to create a class that can read data from three different file formats: uncompressed text files, text files compressed with gzip, and text files compressed with bz2. We’ll call this class `MultiReader` since it can read multiple formats.

We’ll start by defining `MultiReader`. The initial version of the class will only know how to read uncompressed text files; we’ll add support for gzip and bz2 later. Create the file `reader/multireader.py` with these contents:
```python
# reader/multireader.py

class MultiReader:
    def __init__(self, filename):
        self.filename = filename
        self.f = open(filename, 'rt')

    def close(self):
        self.f.close()

    def read(self):
        return self.f.read()
```
Start a new REPL and import your new module to try out the `MultiReader` class. Let’s use it to read the contents of `reader/__init__.py` itself:

```python
>>> import reader.multireader
>>> r = reader.multireader.MultiReader('reader/__init__.py')
>>> r.read()
"# reader/__init__.py\n"
>>> r.close()
```

In a somewhat “meta” turn, our package is reading some of its own source!
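One caveat with `MultiReader` as written is that callers have to remember to call `close()`. A possible refinement, not part of the package we’re building but sketched here for illustration, is to support the context-manager protocol so a `with` block closes the file automatically:

```python
# A sketch: MultiReader extended with __enter__/__exit__
class MultiReader:
    def __init__(self, filename):
        self.filename = filename
        self.f = open(filename, 'rt')

    def close(self):
        self.f.close()

    def read(self):
        return self.f.read()

    def __enter__(self):
        # Returning self lets "with MultiReader(...) as r" bind the reader
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always close the file, whether or not the block raised
        self.close()
        return False
```

With this in place, `with MultiReader('reader/__init__.py') as r: print(r.read())` needs no explicit `close()` call.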
Subpackages
To demonstrate how packages provide high levels of structure to your Python code, let’s add more layers of packages to the `reader` hierarchy. We’re going to add a subpackage to `reader` called `compressed` which will contain the code for working with compressed files. First, let’s create the new directory and its associated `__init__.py`. On Linux or OS X use this:

```shell
$ mkdir reader/compressed
$ touch reader/compressed/__init__.py
```

On Windows use these commands:

```shell
> mkdir reader\compressed
> type NUL > reader\compressed\__init__.py
```

If you restart your REPL you’ll see that you can import `reader.compressed`:

```python
>>> import reader.compressed
>>> reader.compressed.__file__
'reader/compressed/__init__.py'
```
gzip support
Next we’ll create the file `reader/compressed/gzipped.py` which will contain some code for working with the gzip compression format:

```python
# reader/compressed/gzipped.py

import gzip
import sys

opener = gzip.open

if __name__ == '__main__':
    f = gzip.open(sys.argv[1], mode='wt')
    f.write(' '.join(sys.argv[2:]))
    f.close()
```
As you can see, there’s not much to this code: it simply defines the name `opener`, which is just an alias for `gzip.open()`. This function behaves much like the normal `open()` in that it returns a file-like object which can be read from. The main difference, of course, is that `gzip.open()` decompresses the contents of the file during reading, while `open()` does not.
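You can see that symmetry in a quick round trip, writing compressed text with `gzip.open()` and reading it back. A sketch using a scratch file:

```python
import gzip
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'example.gz')

# Writing in 'wt' mode compresses the text transparently
with gzip.open(path, mode='wt') as f:
    f.write('data compressed with gzip')

# Reading in 'rt' mode decompresses it again
with gzip.open(path, mode='rt') as f:
    print(f.read())  # data compressed with gzip
```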
Note the idiomatic “main block” here. It uses `gzip` to create a new compressed file and write data to it. We’ll use that later to create some test files. For more information on `__name__` and `__main__`, see chapter 3 of The Python Apprentice.
bz2 support
Similarly, let’s create another file for handling bz2 compression called `reader/compressed/bzipped.py`:

```python
# reader/compressed/bzipped.py

import bz2
import sys

opener = bz2.open

if __name__ == '__main__':
    f = bz2.open(sys.argv[1], mode='wt')
    f.write(' '.join(sys.argv[2:]))
    f.close()
```

At this point you should have a directory structure that looks like this:

```
reader
├── __init__.py
├── multireader.py
└── compressed
    ├── __init__.py
    ├── bzipped.py
    └── gzipped.py
```
If you start a new REPL, you’ll see that we can import all of our modules just as you might expect:

```python
>>> import reader
>>> import reader.multireader
>>> import reader.compressed
>>> import reader.compressed.gzipped
>>> import reader.compressed.bzipped
```
A full program
Let’s glue all of this together into a more useful sort of program. We’ll update `MultiReader` so that it can read from gzip files, bz2 files, and normal text files. It will determine which format to use based on file extensions. Change your `MultiReader` class in `reader/multireader.py` to use the compression handlers when necessary:
```python
 1  # reader/multireader.py
 2
 3  import os
 4
 5  from reader.compressed import bzipped, gzipped
 6
 7  # This maps file extensions to corresponding open methods.
 8  extension_map = {
 9      '.bz2': bzipped.opener,
10      '.gz': gzipped.opener,
11  }
12
13
14  class MultiReader:
15      def __init__(self, filename):
16          extension = os.path.splitext(filename)[1]
17          opener = extension_map.get(extension, open)
18          self.f = opener(filename, 'rt')
19
20      def close(self):
21          self.f.close()
22
23      def read(self):
24          return self.f.read()
```
The most interesting part of this change is line 5, where we import the `bzipped` and `gzipped` modules from the `compressed` subpackage. This demonstrates the fundamental organizing power of packages: related functionality can be grouped under a common name for easier identification.
With these changes, `MultiReader` will now check the extension of the filename it’s handed. If that extension is a key in `extension_map`, then a specialized file-opener will be used: in this case, either `reader.compressed.gzipped.opener` or `reader.compressed.bzipped.opener`. Otherwise, the standard `open()` will be used.
To test this out, let’s first create some compressed files using the utility code we built into our compression modules. Execute the modules directly from the command line:

```shell
$ python3 -m reader.compressed.bzipped test.bz2 data compressed with bz2
$ python3 -m reader.compressed.gzipped test.gz data compressed with gzip
$ ls
reader    test.bz2    test.gz
```

Feel free to verify for yourself that the contents of `test.bz2` and `test.gz` are actually compressed (or at least that they’re not plain text!).
Start a new REPL and let’s take our code for a spin:

```python
>>> from reader.multireader import MultiReader
>>> r = MultiReader('test.bz2')
>>> r.read()
'data compressed with bz2'
>>> r.close()
>>> r = MultiReader('test.gz')
>>> r.read()
'data compressed with gzip'
>>> r.close()
>>> r = MultiReader('reader/__init__.py')
>>> r.read()
. . . the contents of reader/__init__.py . . .
>>> r.close()
```

If you’ve put all the right code in all the right places, you should see that your `MultiReader` can indeed decompress gzip and bz2 files when it sees their associated file extensions.
Package review
We’ve covered a lot of information in this chapter already, so let’s review:

- Packages are modules which can contain other modules.
- Packages are generally implemented as directories containing a special `__init__.py` file.
- The `__init__.py` file is executed when the package is imported.
- Packages can contain subpackages which are themselves implemented as directories containing `__init__.py` files.
- The `module` objects for packages have a `__path__` attribute.
Relative imports
In this book we’ve seen a number of uses of the `import` keyword, and if you’ve done any amount of Python programming then you should be familiar with it. All of the uses we’ve seen so far are what are called absolute imports, wherein you specify all of the ancestor modules of any module you want to import. For example, in order to import `reader.compressed.bzipped` in the previous section you had to mention both `reader` and `compressed` in the `import` statement:

```python
# Both of these absolute imports mention both `reader` and `compressed`
import reader.compressed.bzipped
from reader.compressed import bzipped
```
There is an alternative form of imports called relative imports that lets you use shortened paths to modules and packages. Relative imports look like this:

```python
from ..module_name import name
```

The obvious difference between this form of `import` and the absolute imports we’ve seen so far is the dots before `module_name`. In short, each dot stands for an ancestor package of the module that is doing the import, starting with the package containing the module and moving towards the package root. Instead of specifying imports with absolute paths from the root of the package tree, you can specify them relative to the importing module, hence relative imports.
Note that you can only use relative imports with the `from <module> import <names>` form of `import`. Trying to do something like `import .module` is a syntax error.
Critically, relative imports can only be used within the current top-level package, never for importing modules outside of that package. So in our previous example the `reader` package could use a relative import for `gzipped`, but it needs to use absolute imports for anything outside of the top-level `reader` package.
Let’s illustrate this with some simple examples. Suppose we have this package structure:

```
graphics/
    __init__.py
    primitives/
        __init__.py
        line.py
    shapes/
        __init__.py
        triangle.py
    scenes/
        __init__.py
```

Further suppose that the `line.py` module includes the following definition:

```python
# graphics/primitives/line.py

def render(x, y):
    "draw a line from x to y"
    # . . .
```
The `graphics.shapes.triangle` module is responsible for rendering (you guessed it!) triangles, and to do this it needs to use `graphics.primitives.line.render()` to draw lines. One way `triangle.py` could import this function is with an absolute import:

```python
# graphics/shapes/triangle.py

from graphics.primitives.line import render
```
Alternatively you could use a relative import:

```python
# graphics/shapes/triangle.py

from ..primitives.line import render
```

The leading `..` in the relative form means “the parent of the package containing this module”, or, in other words, the `graphics` package.
This table summarizes how the dots are interpreted when used to make relative imports from `graphics.shapes.triangle`:

| relative prefix in `triangle.py` | refers to |
|---|---|
| `.` | `graphics.shapes` |
| `..` | `graphics` |
| `..primitives` | `graphics.primitives` |
| `..primitives.line` | `graphics.primitives.line` |
Bare dots in the from clause
In relative imports it’s also legal for the `from` section to consist purely of dots. In this case the dots are still interpreted in exactly the same way as before.

Going back to our example, suppose that the `graphics.scenes` package is used to build complex, multi-shape scenes. To do this it needs to use the contents of `graphics.shapes`, so `graphics/scenes/__init__.py` needs to import the `graphics.shapes` package. It could do this in a number of ways using absolute imports:

```python
# graphics/scenes/__init__.py

# All of these import the same module in different ways
import graphics.shapes
import graphics.shapes as shapes
from graphics import shapes
```
Alternatively, `graphics.scenes` could use relative imports to get at the same module:

```python
# graphics/scenes/__init__.py

from .. import shapes
```

Here the `..` means “the parent of the current package”, just as with the first form of relative imports.
It’s easy to see how relative imports can be useful for reducing typing in deeply nested package structures. They also promote certain forms of modifiability since they allow you, in principle, to rename top-level and sub-packages in some cases. On the whole, though, the general consensus seems to be that relative imports are best avoided in most cases.
__all__
Another topic we want to look at is the optional `__all__` attribute of modules. `__all__` lets you control which attributes are imported when someone uses the `from module import *` syntax. If `__all__` is not specified, then `from x import *` imports all public names from the imported module.
The `__all__` module attribute must be a list of strings, and each string indicates a name which will be imported when the `*` syntax is used. For example, we can see what `from reader.compressed import *` does. First, let’s add some code to `reader/compressed/__init__.py`:

```python
# reader/compressed/__init__.py

from reader.compressed.bzipped import opener as bz2_opener
from reader.compressed.gzipped import opener as gzip_opener
```
Next we’ll start a REPL and display all the names currently in scope:

```python
>>> locals()
{'__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__name__': '__main__',
 '__builtins__': <module 'builtins' (built-in)>, '__package__': None, '__doc__': None}
```

Now we’ll import all public names from `compressed`:

```python
>>> from reader.compressed import *
>>> locals()
{'bz2_opener': <function open at 0x1018300e0>,
 'gzip_opener': <function open at 0x101a36320>,
 'gzipped': <module 'reader.compressed.gzipped' from './reader/compressed/gzipped.py'>,
 'bzipped': <module 'reader.compressed.bzipped' from './reader/compressed/bzipped.py'>,
 '__package__': None, '__name__': '__main__',
 '__builtins__': <module 'builtins' (built-in)>,
 '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__doc__': None}
>>> bzipped
<module 'reader.compressed.bzipped' from './reader/compressed/bzipped.py'>
>>> gzipped
<module 'reader.compressed.gzipped' from './reader/compressed/gzipped.py'>
```
What we see is that `from reader.compressed import *` imported the `bzipped` and `gzipped` submodules of the `compressed` package directly into our local namespace. We prefer that `import *` only imports the different “opener” functions from each of these modules, so let’s update `reader/compressed/__init__.py` to do that:

```python
# reader/compressed/__init__.py

from reader.compressed.bzipped import opener as bz2_opener
from reader.compressed.gzipped import opener as gzip_opener

__all__ = ['bz2_opener', 'gzip_opener']
```
Now if we use `import *` on `reader.compressed`, we only import the “opener” functions, not their modules as well:

```python
>>> locals()
{'__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>,
 '__doc__': None, '__builtins__': <module 'builtins' (built-in)>, '__name__': '__main__'}
>>> from reader.compressed import *
>>> locals()
{'__name__': '__main__', '__doc__': None, '__package__': None,
 '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None,
 '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>,
 'bz2_opener': <function open at 0x10efb9378>,
 'gzip_opener': <function open at 0x10f06b620>}
>>> bz2_opener
<function open at 0x1022300e0>
>>> gzip_opener
<function open at 0x10230a320>
```
The `__all__` module attribute can be a useful tool for limiting which names are exposed by your modules. We still don’t recommend that you use the `import *` syntax outside of convenience in the REPL, but it’s good to know about `__all__` since you’re likely to see it in the wild.
Namespace packages
Earlier we said that packages are implemented as directories containing an `__init__.py` file. This is true for most cases, but there are situations where you want to be able to split a package across multiple directories. This is useful, for example, when a logical package needs to be delivered in multiple parts, as happens in some of the larger Python projects.
Several approaches to addressing this need have been implemented, but it was in PEP 420 in 2012 that an official solution was built into the Python language. This solution is known as namespace packages. A namespace package is a package which is spread over several directories, with each directory tree contributing to a single logical package from the programmer’s point of view.
Namespace packages are different from normal packages in that they don’t have `__init__.py` files. This is important because it means that namespace packages can’t have package-level initialization code; nothing will be executed by the package when it’s imported. The reason for this limitation is primarily that it avoids complex questions of initialization order when multiple directories contribute to a package.
But if namespace packages don’t have `__init__.py` files, how does Python find them during import? The answer is that Python follows a relatively simple algorithm to detect namespace packages. When asked to import the name “foo”, Python scans each of the entries in `sys.path` in order. If in any of these directories it finds a directory named “foo” containing `__init__.py`, then a normal package is imported. If it doesn’t find any normal packages but it does find `foo.py` or any other file that can act as a module, then this module is imported instead. Otherwise, the import mechanism keeps track of any directories it finds which are named “foo”. If no normal packages or modules are found which satisfy the import, then all of the matching directory names act as parts of a namespace package.
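The steps above can be sketched in plain Python. This is a deliberate simplification of the real import machinery (it ignores compiled extensions, zip archives, and import hooks, and `classify_import` is just an illustrative name), but it captures the precedence rules:

```python
import os
import sys

def classify_import(name, search_path=None):
    """Simplified sketch of how 'import name' resolves against sys.path."""
    portions = []
    for entry in (search_path if search_path is not None else sys.path):
        candidate = os.path.join(entry, name)
        # A directory with __init__.py wins immediately: regular package
        if os.path.isfile(os.path.join(candidate, '__init__.py')):
            return 'package', candidate
        # A plain source file also ends the scan: regular module
        if os.path.isfile(candidate + '.py'):
            return 'module', candidate + '.py'
        # A bare directory is only remembered, and the scan continues
        if os.path.isdir(candidate):
            portions.append(candidate)
    if portions:
        # No regular package or module anywhere: namespace package
        return 'namespace package', portions
    raise ImportError('No module named {!r}'.format(name))
```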
An example of a namespace package
As a simple example, let’s see how we might turn the `graphics` package into a namespace package. Instead of putting all of the code under a single directory, we would have two independent parts rooted at `path1` and `path2` like this:

```
path1
└── graphics
    ├── primitives
    │   ├── __init__.py
    │   └── line.py
    └── shapes
        ├── __init__.py
        └── triangle.py
path2
└── graphics
    └── scenes
        └── __init__.py
```
This separates the `scenes` package from the rest of the `graphics` package.
Now to import `graphics` you need to make sure that both `path1` and `path2` are in your `sys.path`. We can do that in a REPL like this:

```python
>>> import sys
>>> sys.path.extend(['./path1', './path2'])
>>> import graphics
>>> graphics.__path__
_NamespacePath(['./path1/graphics', './path2/graphics'])
>>> import graphics.primitives
>>> import graphics.scenes
>>> graphics.primitives.__path__
['./path1/graphics/primitives']
>>> graphics.scenes.__path__
['./path2/graphics/scenes']
```
We put `path1` and `path2` at the end of `sys.path`. When we import `graphics` we see that its `__path__` includes portions from both `path1` and `path2`. And when we import `primitives` and `scenes`, we see that they are indeed coming from their respective directories.
There are more details to namespace packages, but this addresses most of the important details that you’ll need to know. In fact, for the most part it’s not likely that you’ll need to develop your own namespace packages at all. If you do want to learn more about them, though, you can start by reading PEP 420.
Executable directories
Packages are often developed because they implement some program that you want to execute. There are a number of ways to construct such programs, but one of the simplest is through the use of executable directories. Executable directories let you specify a main entry point which is run when the directory is executed by Python.
What do we mean when we say that Python “executes a directory”? We mean passing a directory name to Python on the command line like this:
```
$ mkdir text_stats
$ python3 text_stats
```
Normally this doesn’t work and Python will complain saying that it can’t find a __main__ module:

```
$ python3 text_stats
/usr/local/bin/python3: can't find '__main__' module in 'text_stats'
```
However, as that error message suggests, you can put a special module named __main__.py in the directory and Python will execute it. This module can execute whatever code it wants, meaning that it can call into modules you’ve created to provide, for example, a user interface to your modules.
To illustrate this, let’s add a __main__.py to our text_stats directory. This program will count (crudely!) the number of words and characters passed in on the command line:
```python
# text_stats/__main__.py

import sys

segments = sys.argv[1:]
full_text = ' '.join(segments)
output = '# words: {}, # chars: {}'.format(
    len(full_text.split()),
    sum(len(w) for w in full_text.split()))
print(output)
```
Now if we pass this text_stats directory to Python we will see our __main__.py executed:

```
$ python3 text_stats I’m seated in an office, surrounded by heads and bodies.
# words: 10, # chars: 47
```
This is interesting, but used in this way __main__.py is not much more than a curiosity. As we’ll soon see, however, this idea of an “executable directory” can be used to better organize code that might otherwise sprawl inside a single file.
__main__.py and sys.path
When Python executes a __main__.py, it first adds the directory containing __main__.py to sys.path. This way __main__.py can easily import any other modules with which it shares a directory.
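You can verify this behavior with a small experiment; the directory and module names (demo_app, helper) are invented for this sketch:

```python
import os
import subprocess
import sys
import tempfile

# Build a throwaway executable directory whose __main__.py imports
# a sibling module with a plain import statement.
app = os.path.join(tempfile.mkdtemp(), "demo_app")
os.makedirs(app)
with open(os.path.join(app, "helper.py"), "w") as f:
    f.write("MESSAGE = 'imported from sibling'\n")
with open(os.path.join(app, "__main__.py"), "w") as f:
    f.write("import helper\nprint(helper.MESSAGE)\n")

# Executing the directory: the sibling import succeeds because the
# directory itself was placed on sys.path.
result = subprocess.run([sys.executable, app],
                        capture_output=True, text=True)
print(result.stdout.strip())
```

The plain `import helper` works only because of this sys.path addition; run the same __main__.py from another working directory as an ordinary script and the import would fail.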
If you think of the directory containing __main__.py as a program, then you can see how this change to sys.path allows you to organize your code in better ways. You can use separate modules for the logically distinct parts of your program. In the case of our text_stats example it makes sense to move the actual counting logic into a separate module, so let’s put that code into a module called counter:
```python
# text_stats/counter.py

def count(text):
    words = text.split()
    return (len(words), sum(len(w) for w in words))
```
We’ll then update our __main__.py to use counter:
```python
# text_stats/__main__.py

import sys

import counter

segments = sys.argv[1:]
full_text = ' '.join(segments)
output = '# words: {}, # chars: {}'.format(
    *counter.count(full_text))
print(output)
```
Now the logic for counting words and letters is cleanly separated from the UI logic in our little program. If we run our directory again we see that it still works:

```
$ python3 text_stats It is possible I already had some presentiment of my future.
# words: 11, # chars: 50
```
Zipping up executable directories
We can take the executable directory idea one step further by zipping the directory. Python knows how to read zip-files and treat them like directories, meaning that we can create executable zip-files just like we created executable directories.
Create a zip-file from your text_stats directory:

```
$ cd text_stats
$ zip -r ../text_stats.zip *
$ cd ..
```
The zip-file should contain the contents of your executable directory but not the executable directory itself. The zip-file takes the place of the directory.
Now we can tell Python to execute the zip-file rather than the directory:

```
$ python3 text_stats.zip Sing, goddess, the anger of Peleus’ son Achilleus
# words: 8, # chars: 42
```
Combining Python’s support for __main__.py with its ability to execute zip-files gives us a convenient way to distribute code in some cases. If you develop a program consisting of a directory containing some modules and a __main__.py, you can zip up the contents of the directory and share it with others, and they’ll be able to run it without installing any packages into their Python installation.
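The same recipe can be scripted with the standard zipfile module. Here is a minimal sketch, with invented file names, that builds and runs an executable zip-file:

```python
import os
import subprocess
import sys
import tempfile
import zipfile

# Build a minimal executable zip-file in a throwaway location.
work = tempfile.mkdtemp()
main_py = os.path.join(work, "__main__.py")
with open(main_py, "w") as f:
    f.write("print('hello from the zip')\n")

archive = os.path.join(work, "app.zip")
with zipfile.ZipFile(archive, "w") as zf:
    # __main__.py must sit at the root of the archive, mirroring the
    # "contents, not the directory itself" rule from the text.
    zf.write(main_py, arcname="__main__.py")

# Python executes the zip-file just like an executable directory.
result = subprocess.run([sys.executable, archive],
                        capture_output=True, text=True)
print(result.stdout.strip())
```

The standard library’s zipapp module automates this same packaging step, including the option of a leading shebang line.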
Of course, sometimes you really do need to distribute proper packages rather than more ad hoc collections of modules, so we’ll look at the role of __main__.py in packages next.
Executable packages
In the previous section we saw how to use __main__.py to make a directory directly executable. You can use a similar technique to create executable packages. If you put __main__.py in a package directory, then Python will execute it when you run the package with python3 -m package_name.
To demonstrate this, let’s convert our text_stats program into a package. Create an empty __init__.py in the text_stats directory. Then edit text_stats/__main__.py to import counter with a relative import:
```python
# text_stats/__main__.py

import sys

from .counter import count

segments = sys.argv[1:]
full_text = ' '.join(segments)
output = '# words: {}, # chars: {}'.format(
    *count(full_text))
print(output)
```
After these changes we see that we can no longer execute the directory with python3 text_stats like before:

```
$ python3 text_stats horror vacui
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "text_stats/__main__.py", line 5, in <module>
    from .counter import count
ImportError: attempted relative import with no known parent package
```
Python complains that __main__.py can’t import counter with a relative import. This seems to be at odds with our design: __main__.py and counter.py are clearly in the same package!
This fails because of what we learned earlier about executable directories. When we ask Python to execute the text_stats directory it first adds the text_stats directory to sys.path. It then executes __main__.py. The crucial detail is that sys.path contains text_stats itself, not the directory containing text_stats. As a result, Python doesn’t recognize text_stats as a package at all; we haven’t told it the right place to look.
In order to execute our package we need to tell Python to treat text_stats as a module 7 with the -m command line flag:
```
$ python3 -m text_stats fiat lux
# words: 2, # chars: 7
```
Now Python looks for the module text_stats.__main__ — that is, our text_stats/__main__.py — and executes it while treating text_stats as a package. As a result, __main__.py is able to use a relative import to pull in counter.
The difference between __init__.py and __main__.py
As you’ll recall, Python executes __init__.py the first time a package is imported. So you may be wondering why we need __main__.py at all. After all, can’t we just execute the same code in __init__.py as we do in __main__.py?
The short answer is “no”. You can, of course, put whatever code you want in __init__.py, but Python will not execute a package unless it contains __main__.py. To see this, first move text_stats/__main__.py out of text_stats and try running the package again:
```
$ python3 -m text_stats sleeping giant
/usr/local/bin/python3.6: No module named text_stats.__main__; 'text_stats' is a package and cannot be directly executed
```
Now move __main__.py back into place and edit text_stats/__init__.py to print a little message:
```python
# text_stats/__init__.py

print("executing {}.__init__".format(__name__))
```
Execute the package again:
```
$ python3 -m text_stats sleeping giant
executing text_stats.__init__
# words: 2, # chars: 13
```
We can see that our package’s __init__.py is indeed executed when it’s imported, but, again, Python will not let us execute a package unless it contains __main__.py.
Recommended layout
As we close out this chapter, let’s look at how you can best structure your projects. There are no hard-and-fast rules about how you lay out your code, but some options are generally better than others. What we’ll present here is a good, general-purpose structure that will work for almost any project you might work on.
Here’s the basic project layout:
```
project_name/
    setup.py
    project_name/
        __init__.py
        more_source.py
        subpackage1/
            __init__.py
    test/
        test_code.py
```
At the very top level you have a directory with the project’s name. This directory is not a package but is a directory containing both your package as well as supporting files like your setup.py, license details, and your test directory.
The next directory down is your actual package directory. This has the same name as your top-level directory. Again, there’s no rule that says that this must be so, but this is a common pattern and makes it easy to recognize where you are when navigating your project. Your package contains all of the production code including any subpackages.
Separation between package and test
The test directory contains all of your tests. This may be as simple as a few Python files or as complex as multiple suites of unit, integration, and end-to-end tests. We recommend keeping your tests outside of your package for a number of reasons.
Generally speaking, test and production code serve very different purposes and shouldn’t be coupled unnecessarily. Since you usually don’t need your tests to be installed along with your package, this keeps packaging tools from bundling them together. Also, more exotically, this arrangement ensures that certain tools won’t accidentally try to treat your tests as production code. 8
As with all things, this test directory arrangement may not suit your needs. Certainly you will find examples of packages that include their tests as subpackages. 9 If you find that you need to include all or some of your tests in a subpackage, you absolutely should.
A pragmatic starting point
There’s not much more to it than that. This is a very simple structure, but it works well for most needs. It serves as a fine starting point for more complex project structures, and this is the structure we typically use when starting new projects.
Modules are singletons
The singleton pattern is one of the most widely-known patterns in software development, in large part because it’s very simple and, in some ways, provides a superior option to the dreaded global variable. The intention of the singleton pattern is to limit the number of instances of a type to one — for example, a single registry of items that is easily accessible from throughout the application. 10
If you find that you need a singleton in Python, one simple and effective way to implement it is as a module-level attribute. Since modules are only ever executed once, this guarantees that your singleton will only be initialized once. And since modules are initialized in a well-defined user-controlled order, you can have strong guarantees about when your singleton will be initialized. This forms a strong basis for implementing practical singletons.
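The “executed only once” guarantee is easy to verify: Python caches every imported module in sys.modules, so repeated imports hand back the very same object. A quick check using the standard library’s json module:

```python
import importlib
import sys

# Importing a module a second time does not re-execute it; the
# cached object from sys.modules is returned instead.
first = importlib.import_module("json")
second = importlib.import_module("json")

print(first is second)
print(first is sys.modules["json"])
```

Both checks print True: there is exactly one json module object per interpreter, which is precisely the property the registry example below relies on.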
Consider a simple singleton registry, implemented in registry.py, where callers can leave their names:
```python
# registry.py

_registry = []

def register(name):
    _registry.append(name)

def registered_names():
    return iter(_registry)
```
Callers would use it like this:

```python
import registry

registry.register('my name')

for name in registry.registered_names():
    print(name)
```
The first time registry is imported, the _registry list is initialized. Then, every call to register and registered_names will access that list with complete assurance that it has been properly initialized.
You will recall that the leading underscore in _registry is a Python idiom indicating that _registry is an implementation detail that should not be accessed directly.
This simple pattern leverages Python’s robust module semantics and is a useful way to implement singletons in a safe, reliable way.
Summary
Packages are an important concept in Python, and in this chapter we’ve covered most of the major topics related to implementing and working with them. Let’s review the topics we looked at:
- Packages:
  - Packages are a special type of module.
  - Unlike normal modules, packages can contain other modules, including other packages.
  - Package hierarchies are a powerful way to organize related code.
  - Packages have a __path__ member which is a sequence specifying the directories from which a package is loaded.
  - A simple standard project structure includes a location for non-Python files, the project’s package, and a dedicated test directory.
- sys.path:
  - sys.path is a list of directories where Python searches for modules.
  - sys.path is a normal list and can be modified and queried like any other list.
  - If you start Python with no arguments, an empty string is put at the front of sys.path. This instructs Python to import modules from the current directory.
  - Appending directories to sys.path at runtime allows modules to be imported from those directories.
- PYTHONPATH:
  - PYTHONPATH is an environment variable containing a list of directories.
  - The format of PYTHONPATH is the same as for PATH on your system: a semicolon-separated list on Windows and a colon-separated list on Linux or Mac OS X.
  - The contents of PYTHONPATH are added as entries to sys.path.
- __init__.py:
  - Normal packages are implemented by putting a file named __init__.py into a directory.
  - The __init__.py file for a package is executed when the package is imported.
  - __init__.py files can hoist attributes from submodules into higher namespaces for convenience.
- Relative imports:
  - Relative imports allow you to import modules within a package without specifying the full module path.
  - Relative imports must use the from module import name form of import.
  - The “from” portion of a relative import starts with at least one dot.
  - Each dot in a relative import represents a containing package.
  - The first dot in a relative import means “the package containing this module.”
  - Relative imports can be useful for reducing typing.
  - Relative imports can improve modifiability in some cases.
  - In general, it’s best to avoid relative imports because they can make code harder to understand.
- Namespace packages:
  - A namespace package is a package split across several directories.
  - Namespace packages are described in PEP 420.
  - Namespace packages don’t use __init__.py files.
  - Namespace packages are created when one or more directories in the Python path match an import request and no normal packages or modules match the request.
  - Each directory that contributes to a namespace package is listed in the package’s __path__ attribute.
- Executable directories:
  - Executable directories are created by putting a __main__.py file in a directory.
  - You execute a directory with Python by passing it to the Python executable on the command line.
  - When __main__.py is executed its __name__ attribute is set to __main__.
  - When __main__.py is executed, its parent directory is automatically added to sys.path.
  - The if __name__ == '__main__': construct is redundant in a __main__.py file.
  - Executable directories can be compressed into zip-files which can be executed as well.
  - Executable directories and zip-files are convenient ways to distribute Python programs.
- Modules:
  - Modules can be executed by passing them to Python with the -m argument.
  - The __all__ attribute of a module is a list of strings specifying the names to export when from module import * is used.
  - Module-level attributes provide a good mechanism for implementing singletons.
  - Modules have well-defined initialization semantics.
- Miscellaneous:
  - The standard gzip module allows you to work with files compressed using GZIP.
  - The standard bz2 module allows you to work with files compressed using BZ2.