Python for Java People

Editor’s Note: Being in a Java channel, most of us know the language very well and have been in its ecosystem for at least a couple of years. This gives us routine and expertise but it also induces a certain amount of tunnel vision. In the Outside-In Java series, non-Javaists give us their perspective on our ecosystem.

Syntax
Dynamic typing
The dynamic typing philosophy
A hybrid paradigm
Sequences
Functions
Objects and the dynamic runtime
Objects
Classes
Wrapping up

Philosophically, Python is almost a polar opposite to Java. It forgoes static types and rigid structure in favor of a loose sandbox, within which you’re free to do basically whatever you want. Perhaps Python is about what you can do, whereas Java is about what you may do.

And yet both languages still share a great deal of inspiration tracing back to C. They’re both imperative languages with blocks, loops, functions, assignment, and infix math. Both make heavy use of classes, objects, inheritance, and polymorphism. Both feature exceptions fairly prominently. Both handle memory management automatically. They even both compile to bytecode that runs on a VM, though Python compiles transparently for you. Python even took a few cues from Java — the standard library’s logging and unittest modules are inspired by log4j and JUnit, respectively.

Given that overlap, I think Java developers ought to feel reasonably at home with Python. And so I come to you bearing some gentle Python propaganda. If you’ll give me a chance, I can show you what makes Python different from Java, and why I find those differences appealing. At the very least, you might find some interesting ideas to take back to the Java ecosystem.

(If you want a Python tutorial, the Python documentation has a good one. Also, this is from a Python 3 perspective! Python 2 is still fairly common in the wild, and it has a few syntactic differences.)

Syntax

Let’s get this out of the way first. Here’s hello world:

print("Hello, world!")

Hm, well, that’s not very enlightening. Okay, here’s a function to find the ten most common words in a file. I’m cheating a little by using the standard library’s Counter type, but it’s just so good.

from collections import Counter

def count_words(path):
    words = Counter()
    with open(path) as f:
        for line in f:
            for word in line.strip().split():
                words[word] += 1

    for word, count in words.most_common(10):
        print(f"{word} x{count}")

Python is delimited by whitespace. People frequently have strong opinions about this. I even thought it was heretical when I first saw it. Now, a decade or so later, it seems so natural that I have a hard time going back to braces. If you’re put off by this, I doubt I can convince you otherwise, but I urge you to overlook it at least for a little while; it really doesn’t cause any serious problems in practice, and it eliminates a decent bit of noise. Plus, Python developers never have to argue about where a { should go.

Beyond that aesthetic difference, most of this ought to look familiar. We’ve got some numbers, some assignment, and some method calls. The import statement works a little differently, but it has the same general meaning of “make this thing available”. Python’s for loop is very similar to Java’s for-each loop, only with a bit less punctuation. The function itself is delimited with def instead of a type, but it works how you’d expect: it can be called with arguments and then return a value (though this one doesn’t).

Only two things are really unusual here. One is the with block, quite similar to Java 7’s “try-with-resources” — it guarantees the file will be closed at the end of the block, even if an exception is raised within it. The other is the f"..." syntax, a fairly new feature that allows interpolating expressions directly into strings.
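As a rough sketch of what those two features buy you, the snippet below writes to a scratch file, raises partway through the with block, and then checks that the file was closed anyway. The temporary file, the exception, and the variable names are all made up for illustration.

```python
import os
import tempfile

# Create a throwaway file so the example is self-contained.
fd, path = tempfile.mkstemp(text=True)
os.close(fd)

try:
    with open(path, "w") as f:
        f.write("hello\n")
        raise RuntimeError("something went wrong mid-block")
except RuntimeError:
    pass

# The file object still exists, but the with block has already closed it.
closed_after_block = f.closed

# f-strings evaluate arbitrary expressions inside the braces.
name = "world"
greeting = f"Hello, {name}! 2 + 2 = {2 + 2}"

os.remove(path)
```

Here closed_after_block ends up True, which is roughly the guarantee a Java try-with-resources gives you.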

And that’s it! You’ve already read some Python. At the very least, this isn’t a language from a totally different planet.

Dynamic typing

It’s probably obvious from that example, but Python code doesn’t have a lot of types sprinkled around. Not on variable declarations, not on argument or return types, not on the layout of an object. Anything can be any type at any time. I haven’t shown a class definition yet, so here’s a trivial one.

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def magnitude(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5

point = Point(3, 4)
print(point.x)  # 3
print(point.magnitude())  # 5.0

Even the x and y aren’t declared as attributes; they only exist because the constructor created them. Nothing forced me to pass in integers. I could’ve passed in floats, or perhaps Decimals or Fractions.

If you’ve only used static languages, this might sound like chaos. Types are warm and cozy and comforting. They guarantee… well, perhaps not that the code actually works (though some would disagree), but something. How can you rely on code when you don’t even know that anything’s the correct type?

But wait — Java has no such guarantee either! After all, any object might be null, right? That’s virtually never an object of the correct type.

You might think of dynamic typing as a complete surrender to the null problem. If we have to deal with it anyway, we might as well embrace it and make it work for us — by deferring everything to run time. Type errors become normal logic errors, and you deal with them the same way.

(For the opposite approach, see Rust, which has no null value — or exceptions. I’d still rather write Python, but I appreciate that Rust’s type system isn’t always quietly lying to me.)

In my magnitude method, it doesn’t matter whether self.x is an int or a float or any kind of number at all. It only needs to support the ** operator and return something that supports the + operator. (Python supports operator overloading, so this could be potentially anything.) The same applies to normal method calls: any type is acceptable, as long as it works in practice.
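To make that concrete, here is a small sketch that repeats the Point class from above and feeds it a few different numeric types. The specific values are arbitrary; the last case shows that a type which doesn’t support the operators fails at run time rather than at compile time.

```python
from fractions import Fraction

class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def magnitude(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5

# The same method works for ints, floats, and Fractions.
int_result = Point(3, 4).magnitude()
float_result = Point(1.5, 2.0).magnitude()
fraction_result = Point(Fraction(3, 2), Fraction(2)).magnitude()

# Strings don't support **, so this fails when the method runs.
try:
    Point("3", "4").magnitude()
except TypeError as exc:
    failure = type(exc).__name__
```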

That means Python has no need for generics; everything already works generically. No need for interfaces; everything is already polymorphic with everything. No downcasts, no upcasts, no escape hatches in the type system. No running into APIs requiring a List when they could work just as well with any Iterable.

A number of common patterns become much easier. You can create wrapper objects and proxies without needing to change consuming code. You can use composition instead of inheritance to extend a third-party type — without needing to do anything special to preserve polymorphism. A flexible API doesn’t require duplicating every class as an interface; everything already acts as an implicit interface.

The dynamic typing philosophy

With static typing, whoever writes some code gets to choose the types, and the compiler checks that they’ll work. With dynamic typing, whoever uses some code gets to choose the types, and the runtime will give it a try. Here’s that opposing philosophy in action: the type system focuses on what you can do, not what you may do.

Using dynamic typing this way is sometimes called “duck typing”, in the sense that “if it walks like a duck and it quacks like a duck, it’s a duck.” The idea is that if all you want is something that quacks, then instead of statically enforcing that your code must receive a duck, you take whatever you’re given and ask it to quack. If it does, that’s all you cared about anyway, so it’s just as good as a duck. (If it can’t, you’ll get an AttributeError, but that’s not very punchy.)
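A minimal sketch of the idea, using three invented classes that share no base class or interface:

```python
class Duck:
    def quack(self):
        return "Quack!"

class Robot:
    def quack(self):
        return "Beep. Quack. Beep."

class Rock:
    pass

def make_it_quack(thing):
    # No type check: just ask the object to quack and see what happens.
    return thing.quack()

results = [make_it_quack(Duck()), make_it_quack(Robot())]

try:
    make_it_quack(Rock())
except AttributeError as exc:
    error = str(exc)
```

Duck and Robot are interchangeable as far as make_it_quack is concerned; Rock only fails at the moment the missing method is looked up.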

Do note, too, that Python is still strongly typed. The term is a little fuzzy, but it generally means that values preserve their types at run time. The typical example is that Python won’t let you add a string to a number, whereas a weakly-typed language like JavaScript would silently convert one type to the other, using precedence rules that may not match your expectations.

Unlike a lot of dynamic languages, Python errs on the side of catching mistakes early — at run time, anyway. For example, reading from a variable that doesn’t yet exist will raise an exception, as will reading a nonexistent key from a dict (like a Map). In JavaScript, Lua, and similar languages, you’d silently get a null value in both cases. (Even Java returns null for missing Map keys!) If you want to fall back to a default, dicts have methods for expressing that more explicitly.
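The behaviors described in the last two paragraphs can be sketched in a few lines. The dict contents and variable names here are just placeholders.

```python
# Strong typing: mixing str and int is an error, not a silent conversion.
try:
    "1" + 1
except TypeError as exc:
    type_error = exc

# Explicit conversion is available when you really mean it.
total = int("1") + 1

ages = {"alice": 30}

# A missing key raises rather than quietly handing back None.
try:
    ages["bob"]
except KeyError as exc:
    missing = exc.args[0]

# Falling back to a default has to be spelled out.
bob_age = ages.get("bob", 0)

# Reading a name that was never assigned fails too.
try:
    undefined_name
except NameError:
    name_error_raised = True
```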

There’s definitely a tradeoff here, and whether it’s worth it will differ by project and by person. For me, at least, it’s easier to settle on a firm design for a system after I see it in action, but a statically typed language expects a design upfront. Static typing makes it harder to try out a lot of different ideas, harder to play.

You do have fewer static guarantees, but in my experience, most type errors are caught right away… because the first thing I do after writing some code is try to run it! Any others should be caught by your tests — which you should be writing in any language, and which Python makes relatively easy.

A hybrid paradigm

Both Python and Java are imperative and object-oriented: they work by executing instructions, and they model everything as objects.

In recent releases, Java has been adding some functional features, to much hurrah, I assume. Python also has its fair share of functional features, but… the approach is somewhat different. It offers a few token builtins like map and reduce, but it’s not really designed around the idea of chaining lots of small functions together.

Instead, Python mixes in… something else. I don’t know of any common name for the approaches Python takes. I suppose it split the idea of “chaining functions” into two: working with sequences, and making functions themselves more powerful.

Sequences

Sequences and iteration play a significant role in Python. Sequences are arguably the most fundamental data structure, so tools for working with them are very nice to have. I interpret this as Python’s alternative to functional programming: instead of making it easier to combine a lot of small functions and then apply them to sequences, Python makes it easier to manipulate sequences with imperative code in the first place.

Way back at the beginning, I casually dropped in this line:

    for word, count in words.most_common(10):

A for loop is familiar enough, but this code iterates over two variables at a time. What’s actually going on is that each element in the list most_common returns is a tuple, a group of values distinguished by order. Tuples can be unpacked by assigning them to a tuple of variable names, which is what’s really happening here. Tuples are commonly used to return multiple values in Python, but they’re occasionally useful in ad-hoc structures as well. In Java, you’d need an entire class and a couple lines of assigning stuff around.

Anything that can be iterated over can also be unpacked. Unpacking supports arbitrary nesting, so a, (b, c) = ... does what it looks like. For sequences of unknown length, a *leftovers element can appear anywhere and will soak up as many elements as necessary. Perhaps you really like LISP?

values = [5, 7, 9]
head, *tail = values
print(head)  # 5
print(tail)  # [7, 9]
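Nested unpacking and a starred name in the middle work the same way. A short sketch with made-up data:

```python
# Nested unpacking mirrors the shape of the data.
record = ("Ada", (1815, 1852))
name, (born, died) = record

# The starred name can sit anywhere and always collects a list.
first, *middle, last = [1, 2, 3, 4, 5]

# Unpacking works on any iterable, including strings.
a, b, c = "xyz"
```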

Python also has syntax for creating lists out of simple expressions — so-called “list comprehensions” — which are much more common than functional approaches like map. Similar syntax exists for creating dicts and sets. Entire loops can be reduced to a single expression that emphasizes what you’re actually interested in.

values = [3, 4, 5]
values2 = [val * 2 for val in values if val != 4]
print(values2)  # [6, 10]
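The dict and set forms look much the same, and dropping the brackets inside a function call gives a lazy “generator expression”. A quick sketch with an arbitrary word list:

```python
words = ["apple", "bob", "cat", "apple"]

# Dict comprehension: key and value separated by a colon.
lengths = {word: len(word) for word in words}

# Set comprehension: duplicates collapse automatically.
unique_lengths = {len(word) for word in words}

# Generator expression: no intermediate list is built.
total = sum(len(word) for word in words)
```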

The standard library also contains a number of interesting iterables, combinators, and recipes in the itertools module.
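As a small taste, here are a few itertools functions I’m confident exist in the standard library; the inputs are arbitrary.

```python
from itertools import chain, count, groupby, islice

# count() is an infinite iterator; islice() lazily takes a finite piece.
first_odds = list(islice(count(1, 2), 3))

# chain() glues several iterables together end to end.
combined = list(chain([1, 2], (3, 4)))

# groupby() groups consecutive items, so the input is sorted first.
fruits = sorted(["banana", "apple", "avocado"])
by_letter = {
    letter: list(group)
    for letter, group in groupby(fruits, key=lambda w: w[0])
}
```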

Finally, Python has generators for producing lazy sequences with imperative code. A function containing the yield keyword, when called, doesn’t execute immediately; instead it returns a generator object. When the generator is iterated over, the function runs until it encounters a yield, at which point it pauses; the yielded value becomes the next iterated value.

def odd_numbers():
    n = 1
    while True:
        yield n
        n += 2

for x in odd_numbers():
    print(x)
    if x > 4:
        break
# 1
# 3
# 5

Because generators run lazily, they can produce infinite sequences or be interrupted midway. They can yield a lot of large objects without consuming gobs of memory by having them all live at once. They also work as a general alternative to the “chained” style of functional programming. Instead of combining maps and filters, you can write familiar imperative code.

# This is the pathlib.Path API from the standard library
def iter_child_filenames(dirpath):
    for child in dirpath.iterdir():
        if child.is_file():
            yield child.name

To express a completely arbitrary lazy iterator in Java, you’d need to write an Iterator that manually tracks its state. For all but the simplest cases, that can get pretty hairy. Python has an iteration interface as well, so you can still use this approach, but generators are so easy to use that most custom iteration is written with them.

And because generators can pause themselves, they’re useful in a few other contexts. By advancing the generator manually (instead of merely iterating it all at once with a for loop), it’s possible to run a function partway, have it stop at a certain point, and run other code before resuming the function. Python leveraged this to add support for asynchronous I/O (non-blocking networking without threads) purely as a library, though now it has dedicated async and await syntax.
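Here is a toy sketch of advancing a generator by hand with the built-in next(). It is only meant to show the pausing behavior, not how any real async library is implemented; the worker function and the event log are invented for the example.

```python
events = []

def worker():
    events.append("worker: step 1")
    yield  # pause here and hand control back to the caller
    events.append("worker: step 2")
    yield
    events.append("worker: done")

task = worker()          # nothing has run yet
next(task)               # run up to the first yield
events.append("main: doing something else")
next(task)               # resume until the second yield
events.append("main: doing something else again")
next(task, None)         # run to the end; the default avoids StopIteration
```

The log ends up interleaved, with the worker and the main code taking turns on a single thread.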

Functions

At a glance, Python functions are pretty familiar. You can call them with arguments. The passing style is exactly the same as in Java — Python has neither references nor implicit copying. Python even has “docstrings”, similar to Javadoc comments, but built into the syntax and readable at run time.

def foo(a, b, c):
    """Print out the arguments.  Not a very useful function, really."""
    print("I got", a, b, c)

foo(1, 2, 3)  # I got 1 2 3

Java has variadic functions with args... syntax; Python has much the same using *args. (The *leftovers syntax for unpacking was inspired by the function syntax.) But Python has a few more tricks up its sleeve. Any argument can have a default value, making it optional. Any argument can also be given by name, as in Point(x=3, y=4). The *args syntax can be used when calling any function, to pass a sequence as though it were individual arguments, and there’s an equivalent **kwargs that accepts or passes named arguments as a dict. An argument can be made “keyword-only”, so it must be passed by name, which is very nice for optional bools.
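All of those features can appear in one signature. The describe function below is purely hypothetical and exists only to show each kind of argument.

```python
def describe(name, greeting="Hello", *extras, shout=False, **options):
    # name: required; greeting: optional with a default;
    # extras: any further positional arguments, as a tuple;
    # shout: keyword-only, because it comes after *extras;
    # options: any further named arguments, as a dict.
    text = f"{greeting}, {name}"
    if extras:
        text += " " + " ".join(extras)
    if shout:
        text = text.upper()
    return text, options

r1 = describe("Ada")
r2 = describe("Ada", greeting="Hi")
r3 = describe("Ada", "Hi", "and", "welcome", shout=True)

# * and ** also work at the call site, spreading a list and a dict.
args = ["Ada", "Hey"]
kwargs = {"shout": True, "color": "red"}
r4 = describe(*args, **kwargs)
```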

Python does not have function overloading, of course, but most of what you’d use it for can be replaced by duck typing and optional arguments.

The stage is now set for one of Python’s most powerful features. In much the same way as dynamic typing lets you transparently replace an object by a wrapper or proxy, *args and **kwargs allow any function to be transparently wrapped.

def log_calls(old_function):
    def new_function(*args, **kwargs):
        print("i'm being called!", args, kwargs)
        return old_function(*args, **kwargs)

    return new_function

@log_calls
def foo(a, b, c=3):
    print(f"a = {a}, b = {b}, c = {c}")

foo(1, b=2)
# i'm being called! (1,) {'b': 2}
# a = 1, b = 2, c = 3

That’s a bit dense, sorry. Don’t worry too much about exactly how it works; the gist is that foo gets replaced by a new_function, which forwards all its arguments along to foo. Neither foo nor the caller needs to know that anything is any different.

I cannot overstate how powerful this is. It can be used for logging, debugging, managing resources, caching, access control, validation, and more. It works very nicely in tandem with the other metaprogramming features, and in a similar vein, it lets you factor out structure rather than just code.
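Caching is a nice illustration. The memoize decorator below is a deliberately naive sketch (positional arguments only, unbounded cache); the standard library’s functools.lru_cache is the thing to reach for in real code. functools.wraps copies the wrapped function’s name and docstring onto the wrapper.

```python
import functools

def memoize(func):
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        # Only compute the result the first time these arguments are seen.
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper

calls = []

@memoize
def square(n):
    """Return n squared."""
    calls.append(n)  # record every real invocation
    return n * n

results = [square(4), square(4), square(5)]
```

The second square(4) never reaches the original function, yet callers can’t tell the difference.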


Objects and the dynamic runtime

A dynamic runtime is a runtime — the stuff behind the scenes that powers core parts of the language — that can be played with at run time. Languages like C or C++ very much do not have dynamic runtimes; the structure of the source code is “baked” into the compiled output, and there’s no sensible way to change its behavior later on. Java, on the other hand, does have a dynamic runtime! It even comes with a whole package devoted to reflection.

Python has reflection too, of course. There are a number of simple functions built right in for inspecting or modifying objects’ attributes on the fly, which is incredibly useful for debugging and the occasional shenanigans.
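A brief sketch of those built-ins, applied to a throwaway Point class:

```python
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(3, 4)

# vars() exposes the instance's attribute dict; copy it for a snapshot.
snapshot = dict(vars(p))

# Attributes can be read, written, and tested by name.
x_value = getattr(p, "x")
setattr(p, "z", 5)
has_z = hasattr(p, "z")
w_value = getattr(p, "w", None)  # default instead of AttributeError

class_name = type(p).__name__
```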

But Python takes this a little bit further. Since everything is done at run time anyway, Python exposes a number of extension points for customizing its semantics. You can’t change the syntax, so code will still look like Python, but you can often factor out structure — something that’s very difficult to do in a more rigid language.

For an extreme example, have a look at pytest, which does very clever things with Python’s assert statement. Normally, writing assert x == 1 would simply raise an AssertionError when false, leaving you with little context for what went wrong. That’s why Python’s built-in unittest module, like JUnit and many other testing facilities, provides a pile of specific utility methods like assertEqual. Unfortunately, these make tests somewhat wordier and harder to read at a glance. But with pytest, assert x == 1 is fine. If it fails, pytest will tell you what x is… or where two lists diverge, or what elements are different between two sets, or what have you. All of this happens automatically, based on the comparison being done and the types of the operands.

How does pytest work? You really don’t want to know. And you don’t have to know to write tests with pytest — and have a blast doing it.

That’s the real advantage of a dynamic runtime. You, personally, may not make use of these features. But you can reap great benefits from libraries that use them without caring about how they work. Even Python itself implements a number of extra features using its own extension points — no changes required to the syntax or interpreter.

Objects

My favorite simple example is attribute access. In Java, a Point class might opt for getX() and setX() methods instead of a plain x attribute. The reasoning is that if you ever need to change how x is read or written, you can do so without breaking the interface. In Python, you don’t need to worry about that upfront, because you can intercept attribute access if necessary.

class Point:
    def __init__(self, x, y):
        self._x = x
        self._y = y

    @property
    def x(self):
        return self._x

    # ... same for y ...

point = Point(3, 4)
print(point.x)  # 3

The funny @property syntax is a decorator, which looks like a Java annotation, but can more directly modify a function or class.

Reading point.x now calls a function and evaluates to its return value. This is completely transparent to calling code — and indistinguishable from any other attribute read — but the object can intervene and handle it however it likes. Unlike Java, attribute access is part of a class’s API and freely customizable. (Note that this example also makes x read-only, because I didn’t specify how to write to it! The syntax for a writable property is a little funny-looking, and how it works doesn’t matter here. But you could trivially, say, enforce that only odd numbers can be assigned to point.x.)
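For the curious, here is roughly what that odd-numbers rule might look like. The validation rule and error message are invented; the @x.setter form is the standard way to make a property writable.

```python
class Point:
    def __init__(self, x, y):
        self.x = x  # goes through the setter below
        self._y = y

    @property
    def x(self):
        return self._x

    @x.setter
    def x(self, value):
        if value % 2 == 0:
            raise ValueError("x must be odd")
        self._x = value

point = Point(3, 4)
point.x = 7  # looks like plain assignment, but runs the setter

try:
    point.x = 8
except ValueError as exc:
    message = str(exc)
```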

Similar features exist in static languages like C#, so perhaps this isn’t so impressive. The really interesting part about Python is that property isn’t special at all. It’s a normal built-in type, one that could be written in less than a screenful of pure Python. It works because a Python class can customize its own attribute access, both generally and per-attribute. Wrappers and proxies and composition are easy to implement: you can forward all method calls along to the underlying object without having to know what methods it has.

The same hooks property uses could be used for a lazy-loading attribute or an attribute that automatically holds a weak reference — completely transparent to calling code, and all from pure Python.
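Here is one way a lazy-loading attribute might be written with those hooks (the descriptor protocol). The lazy class and Report example are my own sketch; the standard library ships a polished version of the same idea as functools.cached_property.

```python
class lazy:
    def __init__(self, func):
        self.func = func

    def __set_name__(self, owner, name):
        # Called automatically when the class body is processed.
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        value = self.func(instance)
        # Cache on the instance; later reads find this entry first
        # and never reach the descriptor again.
        instance.__dict__[self.name] = value
        return value

loads = []

class Report:
    @lazy
    def data(self):
        loads.append("loaded")  # stands in for an expensive operation
        return [1, 2, 3]

report = Report()
first = report.data
second = report.data
```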

You’ve probably noticed by now that my code has no public or private modifiers, and indeed, Python has no such concepts. By convention, a single leading underscore is used to mean “private-ish” — or perhaps more accurately, “not intended as part of a stable public API”. But this has no semantic meaning, and Python itself doesn’t stop anyone from inspecting or changing such an attribute (or calling it, if it’s a method). No final or static or const, either.

This is that same philosophy at work: core Python isn’t usually in the business of preventing you from doing anything. And when you need it, it’s very useful. I’ve patched around bugs in third-party libraries by calling or overriding or even outright redefining private methods at startup time. It saves me from having to create a whole local fork of the project, and once the bug is fixed upstream, I simply delete my patch code.

In a similar vein, you can easily write tests for code that depends on external state — say, the current time. If refactoring is impractical, you could replace time.time() with a dummy function for the duration of the test. Library functions are just attributes of modules (like Java packages), and Python modules are objects like anything else, so they can be inspected and modified in the same ways.
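The standard library’s unittest.mock module packages this up so the replacement is undone automatically. A minimal sketch, with a made-up function under test:

```python
import time
from unittest import mock

def make_timestamp_label():
    # Hypothetical code under test that depends on the current time.
    return f"created at {int(time.time())}"

# Inside the with block, time.time is swapped for a stub.
with mock.patch("time.time", return_value=1_000_000.0):
    label = make_timestamp_label()

# Outside the block, the real function is back.
real_time_restored = time.time() > 1_000_000.0
```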

Classes

A Java class is backed by a Class object, but the two aren’t quite interchangeable. For a class Foo, the class object is Foo.class. I don’t think Foo can be used usefully on its own, because it names a type, and Java makes some subtle distinctions between types and values.

In Python, a class is an object, an instance of type (which is itself an object, and thus an instance of itself, which is fun to think about). Classes can thus be treated like any other value: passed as arguments, stored in larger structures, inspected, or manipulated. The ability to make dicts whose keys are classes is especially useful at times. And because classes are instantiated simply by calling them (Python has no new keyword), they can be interchanged with simple functions in many cases. Some common patterns like factories are so simple that they almost vanish.
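A short sketch of classes used as ordinary values. The formatters table and describe function are invented for the example.

```python
# Classes as dict keys: dispatch on the exact type of a value.
formatters = {
    int: lambda v: f"integer {v}",
    str: lambda v: f"string {v!r}",
    list: lambda v: f"list of {len(v)} items",
}

def describe(value):
    return formatters[type(value)](value)

descriptions = [describe(3), describe("hi"), describe([1, 2])]

# Classes are callable, so they can sit in a list next to functions.
converters = [int, float, str]
converted = [convert("42") for convert in converters]

# Classes are instances of type, and so is type itself.
class_checks = (isinstance(int, type), isinstance(type, type))
```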

# Wait, is Vehicle a class or a factory function?  Who cares!
# It could even be changed from one to the other without breaking this code.
car = Vehicle(wheels=4, doors=4)

Several times now, I’ve put functions and even regular code at top-level, outside of any class. That’s allowed, but the implications are a little subtle. In Python, even the class and def statements are regular code that execute at run time. A Python file executes from the top down, and class and def aren’t special in that regard. They’re just special syntax for creating certain kinds of objects: classes and functions.

Here’s the really cool part. Because classes are objects, and their type is type, you can subclass type and change how it works. Then you can make classes that are instances of your subclass.
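As a minimal sketch of such a subclass (usually called a metaclass), here is one that records every class created with it. The Registry and plugin names are hypothetical.

```python
class Registry(type):
    registered = {}

    def __init__(cls, name, bases, namespace):
        # Runs once for every class statement that uses this metaclass.
        super().__init__(name, bases, namespace)
        Registry.registered[name] = cls

class Plugin(metaclass=Registry):
    pass

# Subclasses inherit the metaclass, so they register themselves too.
class CsvPlugin(Plugin):
    pass
```

Nothing in CsvPlugin mentions the registry, yet defining it is enough to add it.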

That’s a little weird to wrap your head around at first. But again, you don’t need to know how it works to benefit from it. For example, Python has no enum block, but it does have an enum module:

from enum import Enum

class Animal(Enum):
    cat = 0
    dog = 1
    mouse = 2
    snake = 3

print(Animal.cat)           # Animal.cat
print(repr(Animal.cat))     # <Animal.cat: 0>
print(Animal.cat.value)     # 0
print(Animal(2))            # Animal.mouse
print(Animal['dog'])        # Animal.dog

The class statement creates an object, which means it calls a constructor somewhere, and that constructor can be overridden to change how the class is built. Here, Enum creates a fixed set of instances rather than class attributes. All of it is implemented with plain Python code and normal Python syntax.

Entire libraries have been built on these ideas. Do you hate the tedium of typing self.foo = foo for every attribute in constructors? And then defining equality and hashing and cloning and a dev-readable representation, all by hand? Java would need compiler support, which may be coming with Project Amber. Python is flexible enough that the community solved this problem with the attrs library.

import attr

@attr.s
class Point:
    x = attr.ib()
    y = attr.ib()

p = Point(3, 4)
q = Point(x=3, y=4)
p == q  # True, which it wouldn't have been before!
print(p)  # Point(x=3, y=4)

Or take SQLAlchemy, a featureful database library for Python. It includes an ORM inspired by Java’s Hibernate, but instead of declaring a table’s schema in a configuration file or via somewhat wordy annotations, you can write it directly and compactly as a class:

# Simplified sketch: a real SQLAlchemy model subclasses a declarative base and sets __tablename__.
class Order(Table):
    id = Column(Integer, primary_key=True)
    order_number = Column(Integer, index=True)
    status = Column(Enum('pending', 'complete'), default='pending')
    ...

This is the same basic idea as Enum, but SQLAlchemy also uses the same hooks as property so you can modify column values naturally.

order.order_number = 5
session.commit()

Finally, classes themselves can be created at run time. It’s a little more niche, but thriftpy creates a whole module full of classes based on a Thrift definition file. In Java, you’d need code generation, which adds a whole new compilation step that can get out of sync.
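The underlying mechanism is the three-argument form of type(), which builds a class from a name, a tuple of bases, and a dict of attributes. The sketch below is not how thriftpy works internally; make_record_class and its fields are invented to show the idea.

```python
def make_record_class(name, field_names):
    def __init__(self, **values):
        for field in field_names:
            setattr(self, field, values.get(field))

    def __repr__(self):
        parts = ", ".join(f"{f}={getattr(self, f)!r}" for f in field_names)
        return f"{name}({parts})"

    # type(name, bases, namespace) creates a brand new class object.
    return type(name, (object,), {"__init__": __init__, "__repr__": __repr__})

# The field list could just as well come from a file read at startup.
User = make_record_class("User", ["id", "email"])
user = User(id=1, email="a@example.com")
text = repr(user)
```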

All of these examples rely on Python’s existing syntax but breathe new meaning into it. None of them do anything you couldn’t do in Java or any other language, but they cut down on structural repetition — which makes code easier to write, easier to read, and less bug-prone.

Wrapping up

Python has a lot of the same basic concepts as Java, but takes them in a very different direction and adds some entirely new ideas. Where Java focuses on stability and reliability, Python focuses on expressiveness and flexibility. It’s an entirely different way to think about imperative programming.

I doubt Python will replace Java for you in the spaces where Java excels. Python probably won’t win any speed contests, for instance (but see PyPy, a JITted Python). Java has native support for threads, whereas the Python community largely shuns them. Very large complex software with a lot of dusty corners may prefer the sanity checking that static typing provides (but see mypy, a static type checker for Python).

But perhaps Python will shine in spaces where Java doesn’t. Plenty of software doesn’t need to be particularly fast or parallel, and then other concerns float to the surface. I find it very quick and easy to get a project started in Python. With no separate compilation step, the write/run loop is much quicker. The code is shorter, which usually means it’s easier to understand. Trying out different architectural approaches feels cheaper. And sometimes it’s fun to just try out stupid ideas, like implementing goto with a library.

I hope you’ll give Python a try. I have a lot of fun with it, and I think you will too. Just try not to treat it as Java with all the types hidden from you.

Worst case, there’s always Pyjnius, which lets you do this.

from jnius import autoclass

System = autoclass('java.lang.System')
System.out.println('Hello, world!')