Organizing Larger Programs
In this chapter we’ll be covering more of Python’s techniques for organizing programs. Specifically we’ll be looking at Python’s concept of packages and how these can be used to add structure to your program as it grows beyond simple modules.
Packages
As you’ll recall, Python’s basic tool for organizing code is the module. A module typically corresponds to a single source file, and you load modules into programs by using the `import` keyword. When you import a module, it is represented by an object of type `module`, and you can interact with it like any other object.
A package in Python is just a special type of module. The defining characteristic of a package is that it can contain other modules, including other packages. So packages are a way to define hierarchies of modules in Python. This allows you to group modules with similar functionality together in ways that express their cohesiveness.
An example of a package: urllib
Many parts of Python’s standard library are implemented as packages. To see an example, open your REPL and import `urllib` and `urllib.request`:

```python
>>> import urllib
>>> import urllib.request
```

Now if you check the types of both of these modules, you’ll see that they’re both of type `module`:

```python
>>> type(urllib)
<class 'module'>
>>> type(urllib.request)
<class 'module'>
```
The important point here is that `urllib.request` is nested inside `urllib`. In this case, `urllib` is a package and `request` is a normal module.
The __path__ attribute of packages
If you closely inspect each of these objects, you’ll notice an important difference. The `urllib` package has a `__path__` member that `urllib.request` does not have:

```python
>>> urllib.__path__
['./urllib']
>>> urllib.request.__path__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute '__path__'
```

This attribute is a list of filesystem paths indicating where `urllib` searches to find nested modules. This hints at the nature of the distinction between packages and modules: packages are generally represented by directories in the filesystem, while modules are represented by single files.
Note that in Python 3 versions prior to 3.3, `__path__` was just a single string, not a list. In this book we’re focusing on Python 3.5+, but for most purposes the difference is not important.
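Since only packages carry a `__path__` attribute, its presence is a quick way to check whether a given module object is a package. A small sketch:

```python
import urllib
import urllib.request

def is_package(module):
    """Return True if a module object is a package (i.e. has __path__)."""
    return hasattr(module, '__path__')

print(is_package(urllib))          # True
print(is_package(urllib.request))  # False
```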
Locating modules
Before we get into the details of packages, it’s important to understand how Python locates modules. Generally speaking, when you ask Python to `import` a module, Python looks on your filesystem for the corresponding Python source file and loads that code. But how does Python know where to look? The answer is that Python checks the `path` attribute of the standard `sys` module, commonly referred to as `sys.path`.

The `sys.path` object is a list of directories. When you ask Python to import a module, it starts with the first directory in `sys.path` and checks for an appropriate file. If no match is found in the first directory, it checks subsequent entries, in order, until a match is found or Python runs out of entries in `sys.path`, in which case an `ImportError` is raised.
sys.path
Let’s explore `sys.path` from the REPL. Run Python from the command line with no arguments and enter the following statements:

```python
>>> import sys
>>> sys.path
['', '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/rope_py3k-0.9.4-py3.5.egg',
 '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/decorator-3.4.0-py3.5.egg',
 '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/Baker-1.3-py3.5.egg',
 ... many more entries ...
 '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/plat-darwin',
 '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/lib-dynload',
 '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages']
```
As you see, your `sys.path` can be quite large. Its precise entries depend on a number of factors, including how many third-party packages you’ve installed and how you’ve installed them. For our purposes, a few of these entries are of particular importance. First, let’s look at the very first entry:

```python
>>> sys.path[0]
''
```

Remember that `sys.path` is just a normal list, so we can examine its contents with indexing and slicing. We see here that the first entry is the empty string. This happens when you start the Python interpreter with no arguments, and it instructs Python to search for modules first in the current directory.
Let’s also look at the tail of `sys.path`:

```python
>>> sys.path[-5:]
['/Library/Frameworks/Python.framework/Versions/3.5/lib/python33.zip',
 '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5',
 '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/plat-darwin',
 '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/lib-dynload',
 '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages']
```

These entries comprise Python’s standard library and the `site-packages` directory where you can install third-party modules.
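If you want to see where this search would land without actually importing anything, the standard `importlib.util.find_spec` function performs the same scan of `sys.path` and reports the result. A quick sketch:

```python
import importlib.util

# find_spec searches sys.path but does not execute the module
spec = importlib.util.find_spec('json')
print(spec.origin)   # the json module's source file in the standard library

# A name found nowhere on sys.path yields None instead of raising
print(importlib.util.find_spec('definitely_not_a_module'))
```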
sys.path in action
To really get a feel for `sys.path`, let’s create a Python source file in a directory that Python would not normally search:

```shell
$ mkdir not_searched
```

In that directory create a file called `path_test.py` with the following function definition:

```python
# not_searched/path_test.py

def found():
    return "Python found me!"
```
Now, start your REPL from the directory containing the `not_searched` directory and try to import `path_test`:

```python
>>> import path_test
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named 'path_test'
```

The `path_test` module (which, remember, is embodied in `not_searched/path_test.py`) is not found because `path_test.py` is not in a directory contained in `sys.path`. To make `path_test` importable we need to put the directory `not_searched` into `sys.path`.
Since `sys.path` is just a normal list, we can add our new entry using the `append()` method. Start a new REPL and enter this:

```python
>>> import sys
>>> sys.path.append('not_searched')
```

Now when we try to import `path_test` in the same REPL session, we see that it works:

```python
>>> import path_test
>>> path_test.found()
'Python found me!'
```
Knowing how to manually manipulate `sys.path` can be useful, and sometimes it’s the best way to make code available to Python. There is another way to add entries to `sys.path`, though, that doesn’t require direct manipulation of the list.
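One detail to keep in mind is that the search is first-match-wins, so where you add an entry matters. `append()` gives your directory the lowest priority; to let it shadow every other entry, insert it at the front instead. A sketch:

```python
import sys

# Lowest priority: only consulted if no earlier entry has a match
sys.path.append('not_searched')

# Highest priority: consulted before everything else, including
# the current-directory entry that started at index 0
sys.path.insert(0, 'not_searched')
```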
The PYTHONPATH environment variable
The `PYTHONPATH` environment variable is a list of paths that are added to `sys.path` when Python starts.

The format of `PYTHONPATH` is the same as `PATH` on your platform. On Windows, `PYTHONPATH` is a semicolon-separated list of directories:

```
c:\some\path;c:\another\path;d:\yet\another
```

On Linux and OS X it’s a colon-separated list of directories:

```
/some/path:/another/path:/yet/another
```
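If you ever need to build such a value from Python itself, the platform-specific separator is available as `os.pathsep`, so the same code works on both platforms. A sketch:

```python
import os

directories = ['/some/path', '/another/path', '/yet/another']

# os.pathsep is ':' on Linux/OS X and ';' on Windows
pythonpath = os.pathsep.join(directories)
print(pythonpath)
```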
To see how `PYTHONPATH` works, let’s add `not_searched` to it before starting Python again. On Windows, use the `set` command:

```shell
> set PYTHONPATH=not_searched
```

On Linux or OS X the syntax will depend on your shell, but for bash-like shells you can use `export`:

```shell
$ export PYTHONPATH=not_searched
```
Now start a new REPL and check that `not_searched` is indeed in `sys.path`:

```python
>>> [path for path in sys.path if 'not_searched' in path]
['/home/python_journeyman/not_searched']
```

And of course we can now import `path_test` without manually editing `sys.path`:

```python
>>> import path_test
>>> path_test.found()
'Python found me!'
```
There are more details to `sys.path` and `PYTHONPATH`, but this is most of what you need to know. For more information, consult the Python documentation on the module search path.
Implementing packages
We’ve seen that packages are modules that can contain other modules. But how are packages implemented? To create a normal module, you simply create a Python source file in a directory contained in `sys.path`. The process for creating packages is not much different.

To create a package, first you create the package’s root directory. This root directory needs to be in some directory on `sys.path`; remember, this is how Python finds modules and packages for importing. Then, in that root directory, you create a file called `__init__.py`. This file, which we’ll often call the package init file, is what makes the package a module. `__init__.py` can be (and often is) empty; its presence alone suffices to establish the package.
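Those two steps, creating a directory on `sys.path` and dropping an empty `__init__.py` into it, can even be scripted. A minimal sketch using `pathlib` (the package name `demo_pkg` and the scratch directory are just for illustration):

```python
import sys
import tempfile
from pathlib import Path

# A scratch directory that we add to sys.path
root = Path(tempfile.mkdtemp())
sys.path.append(str(root))

# Step 1: the package's root directory
pkg = root / 'demo_pkg'
pkg.mkdir()

# Step 2: an empty package init file
(pkg / '__init__.py').touch()

import demo_pkg
print(demo_pkg.__file__)   # ends with demo_pkg/__init__.py
```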
In Namespace packages we’ll look at a somewhat more general form of packages which can span multiple directory trees. In that section we’ll see that, since PEP 420 was introduced in Python 3.3, `__init__.py` files are not technically required for packages any more. So why do we talk about them as if they are required? For one thing, they are still required for earlier versions of Python. In fact, many people writing code for 3.3+ aren’t aware that `__init__.py` files are optional. As a result, you’ll still find them in the vast majority of packages, so it’s good to be familiar with them. Furthermore, they provide a powerful tool for package initialization, so it’s important to understand how they can be used.

Perhaps most importantly, though, we recommend that you include `__init__.py` files when possible because “explicit is better than implicit”. The existence of a package initialization file is an unambiguous signal that you intend for a directory to be a package, and it’s something that many Python developers instinctively look for.
A first package: reader
As with many things in Python, an example is much more instructive than words. Go to your command prompt and create a new directory called `reader`:

```shell
$ mkdir reader
```

Add an empty `__init__.py` to this directory. On Linux or OS X you can use the `touch` command:

```shell
$ touch reader/__init__.py
```

On Windows you can use `type`:

```shell
> type NUL > reader/__init__.py
```

Now if you start your REPL you’ll see that you can import `reader`:

```python
>>> import reader
>>> type(reader)
<class 'module'>
```
The role of __init__.py
We can now start to examine the role that `__init__.py` plays in the functioning of a package. Check the `__file__` attribute of the `reader` package:

```python
>>> reader.__file__
'./reader/__init__.py'
```

We saw that `reader` is a module, even though on our filesystem the name “reader” refers to a directory. Furthermore, the source file that is executed when `reader` is imported is the package init file in the `reader` directory, a.k.a. `reader/__init__.py`. In other words, and to reiterate a critical point, a package is nothing more than a directory containing a file named `__init__.py`.
To see that the `__init__.py` is actually executed like any other module when you import `reader`, let’s add a small bit of code to it:

```python
# reader/__init__.py

print('reader is being imported!')
```

Restart your REPL, import `reader`, and you’ll see our little message printed out:

```python
>>> import reader
reader is being imported!
```
Adding functionality to the package
Now that we’ve created a basic package, let’s add some useful content to it. The goal of our package will be to create a class that can read data from three different file formats: uncompressed text files, text files compressed with gzip, and text files compressed with bz2. We’ll call this class `MultiReader` since it can read multiple formats.

We’ll start by defining `MultiReader`. The initial version of the class will only know how to read uncompressed text files; we’ll add support for gzip and bz2 later. Create the file `reader/multireader.py` with these contents:
```python
# reader/multireader.py

class MultiReader:
    def __init__(self, filename):
        self.filename = filename
        self.f = open(filename, 'rt')

    def close(self):
        self.f.close()

    def read(self):
        return self.f.read()
```
Start a new REPL and import your new module to try out the `MultiReader` class. Let’s use it to read the contents of `reader/__init__.py` itself:

```python
>>> import reader.multireader
>>> r = reader.multireader.MultiReader('reader/__init__.py')
>>> r.read()
"# reader/__init__.py\n"
>>> r.close()
```

In a somewhat “meta” turn, our package is reading some of its own source!
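One caveat with `MultiReader` as written is that callers have to remember to call `close()`. A possible refinement, not part of the package we’re building but sketched here for illustration, is to support the context-manager protocol so a `with` block closes the file automatically:

```python
# A sketch: MultiReader extended with __enter__/__exit__
class MultiReader:
    def __init__(self, filename):
        self.filename = filename
        self.f = open(filename, 'rt')

    def close(self):
        self.f.close()

    def read(self):
        return self.f.read()

    def __enter__(self):
        # Returning self lets "with MultiReader(...) as r" bind the reader
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always close the file, whether or not the block raised
        self.close()
        return False
```

With this in place, `with MultiReader('reader/__init__.py') as r: print(r.read())` needs no explicit `close()` call.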
Subpackages
To demonstrate how packages provide high levels of structure to your Python code, let’s add more layers of packages to the `reader` hierarchy. We’re going to add a subpackage to `reader` called `compressed` which will contain the code for working with compressed files. First, let’s create the new directory and its associated `__init__.py`. On Linux or OS X use this:

```shell
$ mkdir reader/compressed
$ touch reader/compressed/__init__.py
```

On Windows use these commands:

```shell
> mkdir reader\compressed
> type NUL > reader\compressed\__init__.py
```

If you restart your REPL you’ll see that you can import `reader.compressed`:

```python
>>> import reader.compressed
>>> reader.compressed.__file__
'reader/compressed/__init__.py'
```
gzip support
Next we’ll create the file `reader/compressed/gzipped.py` which will contain some code for working with the gzip compression format:

```python
# reader/compressed/gzipped.py

import gzip
import sys

opener = gzip.open

if __name__ == '__main__':
    f = gzip.open(sys.argv[1], mode='wt')
    f.write(' '.join(sys.argv[2:]))
    f.close()
```
As you can see, there’s not much to this code: it simply defines the name `opener`, which is just an alias for `gzip.open()`. This function behaves much like the normal `open()` in that it returns a file-like object which can be read from. The main difference, of course, is that `gzip.open()` decompresses the contents of the file during reading, while `open()` does not.
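You can see that symmetry in a quick round trip, writing compressed text with `gzip.open()` and reading it back. A sketch using a scratch file:

```python
import gzip
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'example.gz')

# Writing in 'wt' mode compresses the text transparently
with gzip.open(path, mode='wt') as f:
    f.write('data compressed with gzip')

# Reading in 'rt' mode decompresses it again
with gzip.open(path, mode='rt') as f:
    print(f.read())  # data compressed with gzip
```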
Note the idiomatic “main block” here. It uses `gzip` to create a new compressed file and write data to it. We’ll use that later to create some test files. For more information on `__name__` and `__main__`, see chapter 3 of The Python Apprentice.
bz2 support
Similarly, let’s create another file for handling bz2 compression called `reader/compressed/bzipped.py`:

```python
# reader/compressed/bzipped.py

import bz2
import sys

opener = bz2.open

if __name__ == '__main__':
    f = bz2.open(sys.argv[1], mode='wt')
    f.write(' '.join(sys.argv[2:]))
    f.close()
```

At this point you should have a directory structure that looks like this:

```
reader
├── __init__.py
├── multireader.py
└── compressed
    ├── __init__.py
    ├── bzipped.py
    └── gzipped.py
```
If you start a new REPL, you’ll see that we can import all of our modules just as you might expect:

```python
>>> import reader
>>> import reader.multireader
>>> import reader.compressed
>>> import reader.compressed.gzipped
>>> import reader.compressed.bzipped
```
A full program
Let’s glue all of this together into a more useful sort of program. We’ll update `MultiReader` so that it can read from gzip files, bz2 files, and normal text files. It will determine which format to use based on file extensions. Change your `MultiReader` class in `reader/multireader.py` to use the compression handlers when necessary:
```python
 1  # reader/multireader.py
 2
 3  import os
 4
 5  from reader.compressed import bzipped, gzipped
 6
 7  # This maps file extensions to corresponding open methods.
 8  extension_map = {
 9      '.bz2': bzipped.opener,
10      '.gz': gzipped.opener,
11  }
12
13
14  class MultiReader:
15      def __init__(self, filename):
16          extension = os.path.splitext(filename)[1]
17          opener = extension_map.get(extension, open)
18          self.f = opener(filename, 'rt')
19
20      def close(self):
21          self.f.close()
22
23      def read(self):
24          return self.f.read()
```
The most interesting part of this change is line 5, where we import the `bzipped` and `gzipped` modules from the `compressed` subpackage. This demonstrates the fundamental organizing power of packages: related functionality can be grouped under a common name for easier identification.
With these changes, `MultiReader` will now check the extension of the filename it’s handed. If that extension is a key in `extension_map`, then a specialized file-opener will be used: in this case, either `reader.compressed.gzipped.opener` or `reader.compressed.bzipped.opener`. Otherwise, the standard `open()` will be used.
To test this out, let’s first create some compressed files using the utility code we built into our compression modules. Execute the modules directly from the command line:

```shell
$ python3 -m reader.compressed.bzipped test.bz2 data compressed with bz2
$ python3 -m reader.compressed.gzipped test.gz data compressed with gzip
$ ls
reader    test.bz2    test.gz
```

Feel free to verify for yourself that the contents of `test.bz2` and `test.gz` are actually compressed (or at least that they’re not plain text!).
Start a new REPL and let’s take our code for a spin:

```python
>>> from reader.multireader import MultiReader
>>> r = MultiReader('test.bz2')
>>> r.read()
'data compressed with bz2'
>>> r.close()
>>> r = MultiReader('test.gz')
>>> r.read()
'data compressed with gzip'
>>> r.close()
>>> r = MultiReader('reader/__init__.py')
>>> r.read()
. . . the contents of reader/__init__.py . . .
>>> r.close()
```

If you’ve put all the right code in all the right places, you should see that your `MultiReader` can indeed decompress gzip and bz2 files when it sees their associated file extensions.
Package review
We’ve covered a lot of information in this chapter already, so let’s review:

- Packages are modules which can contain other modules.
- Packages are generally implemented as directories containing a special `__init__.py` file.
- The `__init__.py` file is executed when the package is imported.
- Packages can contain subpackages which are themselves implemented as directories containing `__init__.py` files.
- The `module` objects for packages have a `__path__` attribute.
Relative imports
In this book we’ve seen a number of uses of the `import` keyword, and if you’ve done any amount of Python programming then you should be familiar with it. All of the uses we’ve seen so far are what are called absolute imports, wherein you specify all of the ancestor modules of any module you want to import. For example, in order to import `reader.compressed.bzipped` in the previous section you had to mention both `reader` and `compressed` in the `import` statement:

```python
# Both of these absolute imports mention both `reader` and `compressed`
import reader.compressed.bzipped
from reader.compressed import bzipped
```
There is an alternative form of imports called relative imports that lets you use shortened paths to modules and packages. Relative imports look like this:

```python
from ..module_name import name
```

The obvious difference between this form of `import` and the absolute imports we’ve seen so far is the dots before `module_name`. In short, each dot stands for an ancestor package of the module that is doing the import, starting with the package containing the module and moving towards the package root. Instead of specifying imports with absolute paths from the root of the package tree, you can specify them relative to the importing module, hence relative imports.
Note that you can only use relative imports with the `from <module> import <names>` form of `import`. Trying to do something like `import .module` is a syntax error.
Critically, relative imports can only be used within the current top-level package, never for importing modules outside of that package. So in our previous example the `reader` package could use a relative import for `gzipped`, but it needs to use absolute imports for anything outside of the top-level `reader` package.
Let’s illustrate this with some simple examples. Suppose we have this package structure:

```
graphics/
    __init__.py
    primitives/
        __init__.py
        line.py
    shapes/
        __init__.py
        triangle.py
    scenes/
        __init__.py
```

Further suppose that the `line.py` module includes the following definition:

```python
# graphics/primitives/line.py

def render(x, y):
    "draw a line from x to y"
    # . . .
```
The `graphics.shapes.triangle` module is responsible for rendering (you guessed it!) triangles, and to do this it needs to use `graphics.primitives.line.render()` to draw lines. One way `triangle.py` could import this function is with an absolute import:

```python
# graphics/shapes/triangle.py

from graphics.primitives.line import render
```
Alternatively you could use a relative import:

```python
# graphics/shapes/triangle.py

from ..primitives.line import render
```

The leading `..` in the relative form means “the parent of the package containing this module”, or, in other words, the `graphics` package.
This table summarizes how the dots are interpreted when used to make relative imports from `graphics.shapes.triangle`:

| relative prefix in `triangle.py` | refers to |
|---|---|
| `.` | `graphics.shapes` |
| `..` | `graphics` |
| `..primitives` | `graphics.primitives` |
| `..primitives.line` | `graphics.primitives.line` |
Bare dots in the from clause
In relative imports it’s also legal for the `from` section to consist purely of dots. In this case the dots are still interpreted in exactly the same way as before.

Going back to our example, suppose that the `graphics.scenes` package is used to build complex, multi-shape scenes. To do this it needs to use the contents of `graphics.shapes`, so `graphics/scenes/__init__.py` needs to import the `graphics.shapes` package. It could do this in a number of ways using absolute imports:

```python
# graphics/scenes/__init__.py

# All of these import the same module in different ways
import graphics.shapes
import graphics.shapes as shapes
from graphics import shapes
```
Alternatively, `graphics.scenes` could use relative imports to get at the same module:

```python
# graphics/scenes/__init__.py

from .. import shapes
```

Here the `..` means “the parent of the current package”, just as with the first form of relative imports.
It’s easy to see how relative imports can be useful for reducing typing in deeply nested package structures. They also promote certain forms of modifiability since they allow you, in principle, to rename top-level and sub-packages in some cases. On the whole, though, the general consensus seems to be that relative imports are best avoided in most cases.
__all__
Another topic we want to look at is the optional `__all__` attribute of modules. `__all__` lets you control which attributes are imported when someone uses the `from module import *` syntax. If `__all__` is not specified, then `from x import *` imports all public names from the imported module.
The `__all__` module attribute must be a list of strings, and each string indicates a name which will be imported when the `*` syntax is used. For example, we can see what `from reader.compressed import *` does. First, let’s add some code to `reader/compressed/__init__.py`:

```python
# reader/compressed/__init__.py

from reader.compressed.bzipped import opener as bz2_opener
from reader.compressed.gzipped import opener as gzip_opener
```
Next we’ll start a REPL and display all the names currently in scope:

```python
>>> locals()
{'__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__name__': '__main__',
 '__builtins__': <module 'builtins' (built-in)>, '__package__': None, '__doc__': None}
```

Now we’ll import all public names from `compressed`:

```python
>>> from reader.compressed import *
>>> locals()
{'bz2_opener': <function open at 0x1018300e0>,
 'gzip_opener': <function open at 0x101a36320>,
 'gzipped': <module 'reader.compressed.gzipped' from './reader/compressed/gzipped.py'>,
 'bzipped': <module 'reader.compressed.bzipped' from './reader/compressed/bzipped.py'>,
 '__package__': None, '__name__': '__main__',
 '__builtins__': <module 'builtins' (built-in)>,
 '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__doc__': None}
>>> bzipped
<module 'reader.compressed.bzipped' from './reader/compressed/bzipped.py'>
>>> gzipped
<module 'reader.compressed.gzipped' from './reader/compressed/gzipped.py'>
```
What we see is that `from reader.compressed import *` imported the `bzipped` and `gzipped` submodules of the `compressed` package directly into our local namespace. We prefer that `import *` only imports the different “opener” functions from each of these modules, so let’s update `reader/compressed/__init__.py` to do that:

```python
# reader/compressed/__init__.py

from reader.compressed.bzipped import opener as bz2_opener
from reader.compressed.gzipped import opener as gzip_opener

__all__ = ['bz2_opener', 'gzip_opener']
```
Now if we use `import *` on `reader.compressed`, we only import the “opener” functions, not their modules as well:

```python
>>> locals()
{'__package__': None, '__loader__': <class '_frozen_importlib.BuiltinImporter'>,
 '__doc__': None, '__builtins__': <module 'builtins' (built-in)>, '__name__': '__main__'}
>>> from reader.compressed import *
>>> locals()
{'__name__': '__main__', '__doc__': None, '__package__': None,
 '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': None,
 '__annotations__': {}, '__builtins__': <module 'builtins' (built-in)>,
 'bz2_opener': <function open at 0x10efb9378>,
 'gzip_opener': <function open at 0x10f06b620>}
>>> bz2_opener
<function open at 0x1022300e0>
>>> gzip_opener
<function open at 0x10230a320>
```
The `__all__` module attribute can be a useful tool for limiting which names are exposed by your modules. We still don’t recommend that you use the `import *` syntax outside of convenience in the REPL, but it’s good to know about `__all__` since you’re likely to see it in the wild.
Namespace packages
Earlier we said that packages are implemented as directories containing an `__init__.py` file. This is true for most cases, but there are situations where you want to be able to split a package across multiple directories. This is useful, for example, when a logical package needs to be delivered in multiple parts, as happens in some of the larger Python projects.
Several approaches to addressing this need have been implemented, but it was in PEP 420 in 2012 that an official solution was built into the Python language. This solution is known as namespace packages. A namespace package is a package which is spread over several directories, with each directory tree contributing to a single logical package from the programmer’s point of view.
Namespace packages are different from normal packages in that they don’t have `__init__.py` files. This is important because it means that namespace packages can’t have package-level initialization code; nothing will be executed by the package when it’s imported. The reason for this limitation is primarily that it avoids complex questions of initialization order when multiple directories contribute to a package.
But if namespace packages don’t have `__init__.py` files, how does Python find them during import? The answer is that Python follows a relatively simple algorithm to detect namespace packages. When asked to import the name “foo”, Python scans each of the entries in `sys.path` in order. If in any of these directories it finds a directory named “foo” containing `__init__.py`, then a normal package is imported. If it doesn’t find any normal packages but it does find `foo.py` or any other file that can act as a module, then this module is imported instead. Otherwise, the import mechanism keeps track of any directories it finds which are named “foo”. If no normal packages or modules are found which satisfy the import, then all of the matching directory names act as parts of a namespace package.
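The steps above can be sketched in plain Python. This is a deliberate simplification of the real import machinery (it ignores compiled extensions, zip archives, and import hooks, and `classify_import` is just an illustrative name), but it captures the precedence rules:

```python
import os
import sys

def classify_import(name, search_path=None):
    """Simplified sketch of how 'import name' resolves against sys.path."""
    portions = []
    for entry in (search_path if search_path is not None else sys.path):
        candidate = os.path.join(entry, name)
        # A directory with __init__.py wins immediately: regular package
        if os.path.isfile(os.path.join(candidate, '__init__.py')):
            return 'package', candidate
        # A plain source file also ends the scan: regular module
        if os.path.isfile(candidate + '.py'):
            return 'module', candidate + '.py'
        # A bare directory is only remembered, and the scan continues
        if os.path.isdir(candidate):
            portions.append(candidate)
    if portions:
        # No regular package or module anywhere: namespace package
        return 'namespace package', portions
    raise ImportError('No module named {!r}'.format(name))
```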
An example of a namespace package
As a simple example, let’s see how we might turn the `graphics` package into a namespace package. Instead of putting all of the code under a single directory, we would have two independent parts rooted at `path1` and `path2` like this:

```
path1
└── graphics
    ├── primitives
    │   ├── __init__.py
    │   └── line.py
    └── shapes
        ├── __init__.py
        └── triangle.py
path2
└── graphics
    └── scenes
        └── __init__.py
```
This separates the `scenes` package from the rest of the `graphics` package.
Now to import `graphics` you need to make sure that both `path1` and `path2` are in your `sys.path`. We can do that in a REPL like this:

```python
>>> import sys
>>> sys.path.extend(['./path1', './path2'])
>>> import graphics
>>> graphics.__path__
_NamespacePath(['./path1/graphics', './path2/graphics'])
>>> import graphics.primitives
>>> import graphics.scenes
>>> graphics.primitives.__path__
['./path1/graphics/primitives']
>>> graphics.scenes.__path__
['./path2/graphics/scenes']
```
We put `path1` and `path2` at the end of `sys.path`. When we import `graphics` we see that its `__path__` includes portions from both `path1` and `path2`. And when we import `primitives` and `scenes`, we see that they are indeed coming from their respective directories.
There are more details to namespace packages, but this addresses most of the important details that you’ll need to know. In fact, for the most part it’s not likely that you’ll need to develop your own namespace packages at all. If you do want to learn more about them, though, you can start by reading PEP 420.
Executable directories
Packages are often developed because they implement some program that you want to execute. There are a number of ways to construct such programs, but one of the simplest is through the use of executable directories. Executable directories let you specify a main entry point which is run when the directory is executed by Python.
What do we mean when we say that Python “executes a directory”? We mean passing a directory name to Python on the command line like this:
```
$ mkdir text_stats
$ python3 text_stats
```
Normally this doesn’t work and Python will complain saying that it can’t find a __main__ module:

```
$ python3 text_stats
/usr/local/bin/python3: can't find '__main__' module in 'text_stats'
```
However, as that error message suggests, you can put a special module named __main__.py in the directory and Python will execute it. This module can execute whatever code it wants, meaning that it can call into modules you’ve created to provide, for example, a user interface to your modules.
To illustrate this, let’s add a __main__.py to our text_stats directory. This program will count (crudely!) the number of words and characters passed in on the command line:
```python
# text_stats/__main__.py

import sys

segments = sys.argv[1:]
full_text = ' '.join(segments)
output = '# words: {}, # chars: {}'.format(
    len(full_text.split()),
    sum(len(w) for w in full_text.split()))
print(output)
```
Now if we pass this text_stats directory to Python we will see our __main__.py executed:

```
$ python3 text_stats I’m seated in an office, surrounded by heads and bodies.
# words: 10, # chars: 47
```
This is interesting, but used in this way __main__.py is not much more than a curiosity. As we’ll soon see, however, this idea of an “executable directory” can be used to better organize code that might otherwise sprawl inside a single file.
__main__.py and sys.path
When Python executes a __main__.py, it first adds the directory containing __main__.py to sys.path. This way __main__.py can easily import any other modules with which it shares a directory.
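You can verify this behavior with a small experiment; the directory and module names (demo_app, helper) are invented for this sketch:

```python
import os
import subprocess
import sys
import tempfile

# Build a throwaway executable directory whose __main__.py imports
# a sibling module with a plain import statement.
app = os.path.join(tempfile.mkdtemp(), "demo_app")
os.makedirs(app)
with open(os.path.join(app, "helper.py"), "w") as f:
    f.write("MESSAGE = 'imported from sibling'\n")
with open(os.path.join(app, "__main__.py"), "w") as f:
    f.write("import helper\nprint(helper.MESSAGE)\n")

# Executing the directory: the sibling import succeeds because the
# directory itself was placed on sys.path.
result = subprocess.run([sys.executable, app],
                        capture_output=True, text=True)
print(result.stdout.strip())
```

The plain `import helper` works only because of this sys.path addition; run the same __main__.py from another working directory as an ordinary script and the import would fail.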
If you think of the directory containing __main__.py as a program, then you can see how this change to sys.path allows you to organize your code in better ways. You can use separate modules for the logically distinct parts of your program. In the case of our text_stats example it makes sense to move the actual counting logic into a separate module, so let’s put that code into a module called counter:
```python
# text_stats/counter.py

def count(text):
    words = text.split()
    return (len(words), sum(len(w) for w in words))
```
We’ll then update our __main__.py to use counter:
```python
# text_stats/__main__.py

import sys

import counter

segments = sys.argv[1:]
full_text = ' '.join(segments)
output = '# words: {}, # chars: {}'.format(
    *counter.count(full_text))
print(output)
```
Now the logic for counting words and letters is cleanly separated from the UI logic in our little program. If we run our directory again we see that it still works:

```
$ python3 text_stats It is possible I already had some presentiment of my future.
# words: 11, # chars: 50
```
Zipping up executable directories
We can take the executable directory idea one step further by zipping the directory. Python knows how to read zip-files and treat them like directories, meaning that we can create executable zip-files just like we created executable directories.
Create a zip-file from your text_stats directory:

```
$ cd text_stats
$ zip -r ../text_stats.zip *
$ cd ..
```
The zip-file should contain the contents of your executable directory but not the executable directory itself. The zip-file takes the place of the directory.
Now we can tell Python to execute the zip-file rather than the directory:

```
$ python3 text_stats.zip Sing, goddess, the anger of Peleus’ son Achilleus
# words: 8, # chars: 42
```
Combining Python’s support for __main__.py with its ability to execute zip-files gives us a convenient way to distribute code in some cases. If you develop a program consisting of a directory containing some modules and a __main__.py, you can zip up the contents of the directory and share it with others, and they’ll be able to run it without installing any packages into their Python installation.
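The same recipe can be scripted with the standard zipfile module. Here is a minimal sketch, with invented file names, that builds and runs an executable zip-file:

```python
import os
import subprocess
import sys
import tempfile
import zipfile

# Build a minimal executable zip-file in a throwaway location.
work = tempfile.mkdtemp()
main_py = os.path.join(work, "__main__.py")
with open(main_py, "w") as f:
    f.write("print('hello from the zip')\n")

archive = os.path.join(work, "app.zip")
with zipfile.ZipFile(archive, "w") as zf:
    # __main__.py must sit at the root of the archive, mirroring the
    # "contents, not the directory itself" rule from the text.
    zf.write(main_py, arcname="__main__.py")

# Python executes the zip-file just like an executable directory.
result = subprocess.run([sys.executable, archive],
                        capture_output=True, text=True)
print(result.stdout.strip())
```

The standard library’s zipapp module automates this same packaging step, including the option of a leading shebang line.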
Of course, sometimes you really do need to distribute proper packages rather than more ad hoc collections of modules, so we’ll look at the role of __main__.py in packages next.
Executable packages
In the previous section we saw how to use __main__.py to make a directory directly executable. You can use a similar technique to create executable packages. If you put __main__.py in a package directory, then Python will execute it when you run the package with python3 -m package_name.
To demonstrate this, let’s convert our text_stats program into a package. Create an empty __init__.py in the text_stats directory. Then edit text_stats/__main__.py to import counter with a relative import:
```python
# text_stats/__main__.py

import sys

from .counter import count

segments = sys.argv[1:]
full_text = ' '.join(segments)
output = '# words: {}, # chars: {}'.format(
    *count(full_text))
print(output)
```
After these changes we see that we can no longer execute the directory with python3 text_stats like before:

```
$ python3 text_stats horror vacui
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "text_stats/__main__.py", line 5, in <module>
    from .counter import count
ImportError: attempted relative import with no known parent package
```
Python complains that __main__.py can’t import counter with a relative import. This seems to be at odds with our design: __main__.py and counter.py are clearly in the same package!
This fails because of what we learned earlier about executable directories. When we ask Python to execute the text_stats directory it first adds the text_stats directory to sys.path. It then executes __main__.py. The crucial detail is that sys.path contains text_stats itself, not the directory containing text_stats. As a result, Python doesn’t recognize text_stats as a package at all; we haven’t told it the right place to look.
In order to execute our package we need to tell Python to treat text_stats as a module 7 with the -m command line flag:
```
$ python3 -m text_stats fiat lux
# words: 2, # chars: 7
```
Now Python looks for the module text_stats.__main__ — that is, our text_stats/__main__.py — and executes it while treating text_stats as a package. As a result, __main__.py is able to use a relative import to pull in counter.
The difference between __init__.py and __main__.py
As you’ll recall, Python executes __init__.py the first time a package is imported. So you may be wondering why we need __main__.py at all. After all, can’t we just execute the same code in __init__.py as we do in __main__.py?
The short answer is “no”. You can, of course, put whatever code you want in __init__.py, but Python will not execute a package unless it contains __main__.py. To see this, first move text_stats/__main__.py out of text_stats and try running the package again:
```
$ python3 -m text_stats sleeping giant
/usr/local/bin/python3.6: No module named text_stats.__main__; 'text_stats' is a package and cannot be directly executed
```
Now move __main__.py back into place and edit text_stats/__init__.py to print a little message:
```python
# text_stats/__init__.py

print("executing {}.__init__".format(__name__))
```
Execute the package again:
```
$ python3 -m text_stats sleeping giant
executing text_stats.__init__
# words: 2, # chars: 13
```
We can see that our package’s __init__.py is indeed executed when it’s imported, but, again, Python will not let us execute a package unless it contains __main__.py.
Recommended layout
As we close out this chapter, let’s look at how you can best structure your projects. There are no hard-and-fast rules about how you lay out your code, but some options are generally better than others. What we’ll present here is a good, general-purpose structure that will work for almost any project you might work on.
Here’s the basic project layout:
```
project_name/
    setup.py
    project_name/
        __init__.py
        more_source.py
        subpackage1/
            __init__.py
    test/
        test_code.py
```
At the very top level you have a directory with the project’s name. This directory is not a package but is a directory containing both your package as well as supporting files like your setup.py, license details, and your test directory.
The next directory down is your actual package directory. This has the same name as your top-level directory. Again, there’s no rule that says that this must be so, but this is a common pattern and makes it easy to recognize where you are when navigating your project. Your package contains all of the production code including any subpackages.
Separation between package and test
The test directory contains all of your tests. This may be as simple as a few Python files or as complex as multiple suites of unit, integration, and end-to-end tests. We recommend keeping your tests outside of your package for a number of reasons.
Generally speaking, test and production code serve very different purposes and shouldn’t be coupled unnecessarily. Since you usually don’t need your tests to be installed along with your package, this keeps packaging tools from bundling them together. Also, more exotically, this arrangement ensures that certain tools won’t accidentally try to treat your tests as production code. 8
As with all things, this test directory arrangement may not suit your needs. Certainly you will find examples of packages that include their tests as subpackages. 9 If you find that you need to include all or some of your tests in a subpackage, you absolutely should.
A pragmatic starting point
There’s not much more to it than that. This is a very simple structure, but it works well for most needs. It serves as a fine starting point for more complex project structures, and this is the structure we typically use when starting new projects.
Modules are singletons
The singleton pattern is one of the most widely-known patterns in software development, in large part because it’s very simple and, in some ways, provides a superior option to the dreaded global variable. The intention of the singleton pattern is to limit the number of instances of a type to one — for example, a single registry of items that is easily accessible from throughout the application. 10
If you find that you need a singleton in Python, one simple and effective way to implement it is as a module-level attribute. Since modules are only ever executed once, this guarantees that your singleton will only be initialized once. And since modules are initialized in a well-defined user-controlled order, you can have strong guarantees about when your singleton will be initialized. This forms a strong basis for implementing practical singletons.
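The “executed only once” guarantee is easy to verify: Python caches every imported module in sys.modules, so repeated imports hand back the very same object. A quick check using the standard library’s json module:

```python
import importlib
import sys

# Importing a module a second time does not re-execute it; the
# cached object from sys.modules is returned instead.
first = importlib.import_module("json")
second = importlib.import_module("json")

print(first is second)
print(first is sys.modules["json"])
```

Both checks print True: there is exactly one json module object per interpreter, which is precisely the property the registry example below relies on.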
Consider a simple singleton registry, implemented in registry.py, where callers can leave their names:
```python
# registry.py

_registry = []

def register(name):
    _registry.append(name)

def registered_names():
    return iter(_registry)
```
Callers would use it like this:

```python
import registry

registry.register('my name')

for name in registry.registered_names():
    print(name)
```
The first time registry is imported, the _registry list is initialized. Then, every call to register and registered_names will access that list with complete assurance that it has been properly initialized.
You will recall that the leading underscore in _registry is a Python idiom indicating that _registry is an implementation detail that should not be accessed directly.
This simple pattern leverages Python’s robust module semantics and is a useful way to implement singletons in a safe, reliable way.
Summary
Packages are an important concept in Python, and in this chapter we’ve covered most of the major topics related to implementing and working with them. Let’s review the topics we looked at:
- Packages:
  - Packages are a special type of module.
  - Unlike normal modules, packages can contain other modules, including other packages.
  - Package hierarchies are a powerful way to organize related code.
  - Packages have a __path__ member which is a sequence specifying the directories from which a package is loaded.
  - A simple standard project structure includes a location for non-Python files, the project’s package, and a dedicated test directory.
- sys.path:
  - sys.path is a list of directories where Python searches for modules.
  - sys.path is a normal list and can be modified and queried like any other list.
  - If you start Python with no arguments, an empty string is put at the front of sys.path. This instructs Python to import modules from the current directory.
  - Appending directories to sys.path at runtime allows modules to be imported from those directories.
- PYTHONPATH:
  - PYTHONPATH is an environment variable containing a list of directories.
  - The format of PYTHONPATH is the same as for PATH on your system: a semicolon-separated list on Windows and a colon-separated list on Linux or Mac OS X.
  - The contents of PYTHONPATH are added as entries to sys.path.
- __init__.py:
  - Normal packages are implemented by putting a file named __init__.py into a directory.
  - The __init__.py file for a package is executed when the package is imported.
  - __init__.py files can hoist attributes from submodules into higher namespaces for convenience.
- Relative imports:
  - Relative imports allow you to import modules within a package without specifying the full module path.
  - Relative imports must use the from module import name form of import.
  - The “from” portion of a relative import starts with at least one dot.
  - Each dot in a relative import represents a containing package.
  - The first dot in a relative import means “the package containing this module.”
  - Relative imports can be useful for reducing typing.
  - Relative imports can improve modifiability in some cases.
  - In general, it’s best to avoid relative imports because they can make code harder to understand.
- Namespace packages:
  - A namespace package is a package split across several directories.
  - Namespace packages are described in PEP 420.
  - Namespace packages don’t use __init__.py files.
  - Namespace packages are created when one or more directories in the Python path match an import request and no normal packages or modules match the request.
  - Each directory that contributes to a namespace package is listed in the package’s __path__ attribute.
- Executable directories:
  - Executable directories are created by putting a __main__.py file in a directory.
  - You execute a directory with Python by passing it to the Python executable on the command line.
  - When __main__.py is executed its __name__ attribute is set to __main__.
  - When __main__.py is executed, its parent directory is automatically added to sys.path.
  - The if __name__ == '__main__': construct is redundant in a __main__.py file.
  - Executable directories can be compressed into zip-files which can be executed as well.
  - Executable directories and zip-files are convenient ways to distribute Python programs.
- Modules:
  - Modules can be executed by passing them to Python with the -m argument.
  - The __all__ attribute of a module is a list of strings specifying the names to export when from module import * is used.
  - Module-level attributes provide a good mechanism for implementing singletons.
  - Modules have well-defined initialization semantics.
- Miscellaneous:
  - The standard gzip module allows you to work with files compressed using GZIP.
  - The standard bz2 module allows you to work with files compressed using BZ2.