Maybe I Was Wrong about Java - Part 2

Editor’s Note: Being in a Java channel, most of us know the language very well and have been in its ecosystem for at least a couple of years. This gives us routine and expertise, but it also induces a certain amount of tunnel vision. In a new series, Outside-In Java, non-Javaists will give us their perspective on our ecosystem.

I don’t deal with Java much, so I’m investigating how true all my preconceived notions about it are. Last time, I mostly explored user-facing concerns, like speed and size. Results: inconclusive.

But modern Java is increasingly used in places where it’s invisible to users: In datacenters, on phones, on my toaster. So perhaps the really interesting questions are about how Java looks to developers.

Java Is Insecure

Java’s infamy as a walking security issue dates back to the ancient days when Java applets were a common thing, you trusted that the JVM could effectively sandbox them, and occasionally it couldn’t.

Maybe trying to whittle down an entire general-purpose language and its massive standard library to be safe enough to run from the web was a bad idea, but it’s a moot point now. I very rarely see Java applets any more. I don’t think I even have the NPAPI plugin installed. Firefox doesn’t let Java applets run automatically, and is dropping support for them entirely in March; Chrome dropped support last year.

Granted, this was probably in part because Java applets had become more of an attack surface than a useful platform; a Cisco report from 2014 prominently claims that 91% of web exploits were aimed at Java. I think that same year, my then-employer was warning everyone to manually disable Java in their browsers if they didn’t specifically need it. If that’s the only exposure you get to Java, well, it’s not going to leave a great impression.

Hey, hang on. This is supposed to be about the developer perspective. So what about the runtime itself, independent of applet concerns? “Secure” is difficult to quantify, but as a very rough approximation, I can look for the number of CVEs issued this year.

PHP: 107
Oracle Java: 37
Node: 9
CPython: 6
Perl: 5
Ruby: 1

Er, whoops, that caught me off guard. I honestly expected to be pleasantly surprised and clearly proven wrong here, but that list makes Java sound somewhat worse than I thought. I hope there’s a great explanation for this, but I don’t have one.

Java Is Enterprisey

Ah, another word that everyone uses (myself included) but that doesn’t mean anything. It conjures a very specific image, but a very fuzzy definition. I have a few guesses as to what it might mean.

Java Is Abstracted Into The Stratosphere

The abstractosphere, if you will. The realm of the infamous AbstractSingletonProxyFactoryBean.

I’m actually a little confused about this one. Turning to elasticsearch again, I stumbled upon this class, WhitespaceTokenizerFactory. Its entire source code is:

public class WhitespaceTokenizerFactory extends AbstractTokenizerFactory {

    public WhitespaceTokenizerFactory(
            IndexSettings indexSettings,
            Environment environment,
            String name,
            Settings settings) {
        super(indexSettings, name, settings);
    }

    @Override
    public Tokenizer create() {
        return new WhitespaceTokenizer();
    }
}

Okay, sure. You want to be able to create an arbitrary tokenizer from some external state, but you don’t want the tokenizers themselves to depend on the external state. Makes sense.

Still, this code looks pretty silly, especially if you haven’t seen the other classes that do more elaborate things. The same words are repeated three times; a 38-line file has only two lines of actual code. It’s easy to look at this and think Java code goes to ridiculous extremes with its indirection. At worst, I might do this in Python:

@builder_for(WhitespaceTokenizer)
def build(cls, index_settings, env, name, settings):
    return cls()

@builder_for(SomeOtherTokenizer)
def build(cls, index_settings, env, name, settings):
    return cls(index_settings.very_important_setting)

# etc.

I’m handwaving how this would actually work, but there’s not much to it. It might even be possible in Java, come to think of it, but probably not pretty or idiomatic. Alternatively, Python code might just have the build on the tokenizer classes themselves. One nice thing about dynamic typing is that code can use a type without depending on it. The tokenizer class can work with IndexSettings and Environment objects without having to import the types or even know they exist. It’s a little iffy, but in a case like this where everything’s internal, it could make sense.
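For what it's worth, the handwaved `builder_for` decorator could be little more than a dictionary lookup. The following is only a sketch under invented assumptions: `_BUILDERS`, `build_tokenizer`, and the stand-in tokenizer and settings classes are all hypothetical and have nothing to do with the actual elasticsearch code.

```python
# Hypothetical registry mapping a tokenizer class to the function that builds it.
_BUILDERS = {}


def builder_for(cls):
    """Register the decorated function as the builder for ``cls``."""
    def decorator(func):
        _BUILDERS[cls] = func
        return func
    return decorator


def build_tokenizer(cls, index_settings, env, name, settings):
    """Look up the registered builder for ``cls`` and call it."""
    return _BUILDERS[cls](cls, index_settings, env, name, settings)


# Stand-in classes, invented purely for this sketch.
class WhitespaceTokenizer:
    pass


class SomeOtherTokenizer:
    def __init__(self, important):
        self.important = important


class IndexSettings:
    very_important_setting = 42


@builder_for(WhitespaceTokenizer)
def build_whitespace(cls, index_settings, env, name, settings):
    # This tokenizer needs none of the external state.
    return cls()


@builder_for(SomeOtherTokenizer)
def build_other(cls, index_settings, env, name, settings):
    # This one pulls a single value out of the external state.
    return cls(index_settings.very_important_setting)
```

A caller would then write something like `build_tokenizer(WhitespaceTokenizer, IndexSettings(), None, "ws", {})` and never touch a factory class; the tokenizers themselves stay ignorant of the settings objects.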

But given that Java’s type system is what it is, I can understand why you’d end up with the Java code above. What confuses me is this.

Why don’t I see the same thing in other languages?

I found this collection of tiny factory classes after about a minute of randomly clicking around in the most starred Java project on GitHub. I’m completely unsurprised by it. Yet I can’t recall seeing anything similar in other explicit, statically-typed languages. Where are the tiny factory classes in C++? The most starred C++ project is Electron, and searching for “factory” only finds me code like this, which has a lot more going on. The most starred Objective-C project is AFNetworking, which contains “factory” once — in a changelog. The most starred Swift project is Alamofire, which somehow doesn’t contain the word “factory” anywhere!

So while I can accept that layers of indirection and tiny classes are useful for getting along with a C++-style type system, I don’t understand why I see them so much more often in Java than even in, well, C++.

Is this a cultural difference? Are C++ developers happy to have a tangled web of interconnected dependencies? Do these tiny classes exist in C++, but live all together in a single file where they’re much easier to ignore?

Java definitely seems to live in the abstractosphere, but I can’t figure out why it’s so different from similar languages.

Java is Tediously Verbose

“Enterprise” makes me think of repetitive bureaucracy sucking the joy out of everything.

Accessors Everywhere

And Java makes me think of accessors. Same idea, really.

private int foo;

public int getFoo() {
    return this.foo;
}

public void setFoo(int foo) {
    this.foo = foo;
}

Look at all this code eating up precious vertical space to do absolutely nothing. I could’ve just said public int foo; and been done with it.

There are three kinds of programmers in the world, distinguished by how they reacted to that last paragraph. Some nodded their heads, and they are probably Python programmers. Some balked that this violates encapsulation, and will balk again when I say that I don’t care about encapsulation. Finally, some rolled their eyes and pointed out that a public attribute is frozen into the API and can never be changed without breaking existing code.

Ah, those latter folks might have a point. The trouble is that Java doesn’t support properties. “Property” is a horrible generic name for a language feature that’s become popular only somewhat recently, but if you’re not familiar, I mean this magical thing you can do in Python. If you have a foo attribute that external code is free to modify, and later you decide that it should only ever be set to an odd number, you can do that without breaking your API:

class Bar:
    def __init__(self):
        # Leading underscore is convention for "you break it, you bought it"
        self._foo = 3

    @property
    def foo(self):
        return self._foo

    @foo.setter
    def foo(self, foo):
        if foo % 2 == 0:
            raise ValueError("foo must be odd")
        self._foo = foo

bar = Bar()
bar.foo = 8  # ValueError: foo must be odd

@property is an artifact of great power that transparently intercepts attempts to read or write an attribute. Other code can still work with obj.foo as expected and never know the difference. Even @property itself can be expressed in plain Python code, and there are some interesting variants: Lazy-loading attributes, attributes that transparently act as weak references, etc.
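To illustrate the lazy-loading variant, here is a minimal sketch of a descriptor written in plain Python. The names `lazy_attribute` and `Report` are made up for the example; this is one possible approach, not a standard library feature.

```python
class lazy_attribute:
    """Non-data descriptor that computes a value on first access, then caches it."""

    def __init__(self, func):
        self.func = func
        self.name = func.__name__

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class itself rather than on an instance.
            return self
        value = self.func(obj)
        # Store the result in the instance dict. Because this descriptor
        # defines no __set__, the instance attribute shadows it from now on.
        obj.__dict__[self.name] = value
        return value


class Report:
    def __init__(self):
        self.loads = 0

    @lazy_attribute
    def data(self):
        # Pretend this is expensive; count how often it actually runs.
        self.loads += 1
        return [1, 2, 3]
```

Reading `report.data` twice runs the function body only once, and calling code still sees an ordinary attribute.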

I know Python, Swift, and a number of .NET languages (C#, F#, VB, Boo, …) support properties. JavaScript is specced as supporting them by now, though I’m not sure how much code relies on them in the wild. Ruby has them, with slightly different semantics. Lua and PHP can fake them. Perl has a thing but you probably shouldn’t use it. The JVM itself must be able to support them, since Jython and JRuby exist. So why not Java the language?

It seems odd to me that Java hasn’t picked up on this feature that would cut out a lot of repetition. It was apparently proposed for Java 7, but I can’t find an explanation of why it didn’t make the cut, and now it seems to be very much not a priority.

But Wait, There’s More

I’m showing my Python colors here, but while I’m at it: Another neat trick is that classes are easy to manipulate at module load time. A class definition is just code that creates a class when executed. So Python has some interesting shenanigans like the attrs module, which allows doing this:

import attr

@attr.s
class Point:
    x = attr.ib(default=0)
    y = attr.ib(default=0)

With the attributes declared like so, you get for free: A constructor that takes arguments in order or by name; a reasonable repr (like toString, but explicitly for debugging only); hashability; comparison operators; and opt-in immutability. No code generation, just some quick manipulation of the class as it’s defined at runtime.
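The general trick of rewriting a class as it is defined can be sketched with a toy class decorator. This is emphatically not how attrs is implemented; `simple_attrs` is a hypothetical name, and it only handles plain class-level defaults, but it shows that a constructor, a repr, and equality can be bolted on without any code generation.

```python
def simple_attrs(cls):
    """Toy decorator: builds __init__, __repr__ and __eq__ from class-level defaults."""
    # Treat every public, non-callable class attribute as a field with a default.
    fields = [
        name for name, value in vars(cls).items()
        if not name.startswith("_") and not callable(value)
    ]
    defaults = {name: getattr(cls, name) for name in fields}

    def __init__(self, *args, **kwargs):
        if len(args) > len(fields):
            raise TypeError("too many positional arguments")
        for key in kwargs:
            if key not in defaults:
                raise TypeError(f"unexpected argument {key!r}")
        values = dict(defaults)
        values.update(zip(fields, args))  # positional arguments, in order
        values.update(kwargs)             # then arguments given by name
        for name, value in values.items():
            setattr(self, name, value)

    def __repr__(self):
        body = ", ".join(f"{name}={getattr(self, name)!r}" for name in fields)
        return f"{cls.__name__}({body})"

    def __eq__(self, other):
        if other.__class__ is not self.__class__:
            return NotImplemented
        return all(getattr(self, n) == getattr(other, n) for n in fields)

    # Attach the generated methods to the class after it has been created.
    cls.__init__ = __init__
    cls.__repr__ = __repr__
    cls.__eq__ = __eq__
    return cls


@simple_attrs
class Point:
    x = 0
    y = 0
```

With this in place, `Point(1, y=2)` works as a constructor call and the resulting object prints with its field values, all from two lines of declaration.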

Obviously that exact approach wouldn’t fly in Java, but it could be simulated. I know Java IDEs are almost infamous for the amount of code generation they can do, so I’m a little surprised Java itself hasn’t adopted a way to generate or rewrite code at compile time.

Don’t Not Repeat Yourself

Along similar lines, this seems to be a common affliction:

ComicallyLongStrawmanTypeName value = new ComicallyLongStrawmanTypeName();

A little type inference would go a long way here. Java did get some type inference recently, but only for generics:

List<ComicallyLongStrawmanTypeName> value = new ArrayList<>();

Definitely an improvement, but I’m a little puzzled as to why this feature stopped here. In fact that’s backwards from how I’d expect type inference to work — usually the first step is to infer the type of the variable from the type of the expression, not the other way around. I have seen some speculation about more traditional type inference in Java 10, and it’ll be a decent improvement if that pans out.

ComicallyLongStrawmanTypeName

The ComicallyLongStrawmanTypeName itself is also worth mention. Java is infamous for its very long type names. I never even thought about this until right now, but it’s almost certainly caused by… the design of the package system!

Package names tend to start with at least a two-part domain and a project name, like org.mozilla.rhino, which already makes them a bit of a mouthful. The trouble is, package and class names can’t be aliased. Packages aren’t hierarchical, either, so you can’t import a whole “branch” of a package. If you have a package containing a class, you have exactly two ways of referring to it: As org.mozilla.rhino.ClassName, or as ClassName after an import. That’s it.

The result is that class names must be somewhat qualified to avoid name conflicts! If you name a class List, you’re going to annoy anyone who wants to use your class in the same file as the standard library List. So you end up with a package com.bar.foo containing FooBarList.

This seems slightly opposed to the point of packaging. In Python I can alias a package, or import the parent package, or alias a class name. I can fill a file with classes with very generic names, then import that file with a short alias and use its contents as pkg.List. It’s great and seriously cuts down on repeated noise. But in Java you have to name classes as though they were part of a single global namespace, because they might be imported alongside any other class.
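As a small sketch of those aliasing options, using only standard library modules (the local `Counter` class and the alias names `col` and `ODict` are invented for the example):

```python
# Alias a module: generic names stay qualified, but the qualifier is short.
import collections as col

# Alias a single class on import to sidestep a clash with another name.
from collections import OrderedDict as ODict


class Counter:
    """A local class whose generic name would otherwise collide with collections.Counter."""

    def __init__(self):
        self.count = 0


local = Counter()            # the class defined in this file
stdlib = col.Counter("aab")  # the standard library one, via the module alias
ordered = ODict(a=1)         # the aliased class
```

Both `Counter` classes coexist in one file without either needing a longer name.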

(Incidentally, this kind of subtle global namespacing also makes me wary of Java-style interfaces. The method names in an interface effectively share a single global namespace with all method names, in all Java code, everywhere — because the entire point of interfaces is that a class can implement any arbitrary set of them. So you can’t use nice short names here, either, or you run the risk of a naming conflict. This isn’t a knock against Java, though — the same problem exists in many languages, including Python, though the lack of explicit interfaces makes it less of a concern. Rust largely avoids it by making interface implementations distinct, rather than part of a single class body. I believe the idea came from the ML family.)

Something that stands out to me is the one place Java does let you be a bit terse: implicit this. I don’t much like implicit this. It makes scanning a file very difficult, because when I see this:

foo = 3;

I can’t tell at a glance whether that’s a local variable or an attribute without checking the rest of the method. (Yes, yes, IDEs, but a running theme here is that I think a language should be usable without one.) I’m sure plenty of people disagree with me here, and that’s fine; I’m just saying that implicit this doesn’t really save much space or typing, so it strikes me as an odd “optimization” to have when other language features are responsible for much more verbosity.

I’m a little surprised; I thought Java’s wordiness would be more a cultural phenomenon than a property of the language, but it seems Java itself inadvertently promotes loquacious code. Some recent language changes do look promising, and I hope to see more work on this in the future.

A Brief Tangent

At this point, I’m starting to realize that much of Java’s verbosity is really about API stability:

public vs private declares what you’re willing to support as part of a stable interface.
Accessors future-proof the interface and give you somewhere to add more logic later.
Static types make the interface as conservative as possible.
Interfaces and other forms of indirection minimize how much functionality your code relies on.
Inheritance is also part of every class’s API, which is where protected and final class come in.
Factories future-proof against later changing the return type, which you can’t do with constructors.
Importing is really rigid because… er… well, can’t explain that one.

But cast in a stability light, a few things about Java strike me as unusual.

If the guarantee everyone actually wants is a stable API, why do the core language and tooling not have some way to… enforce a stable API? Instead we have a box of indirect tools like public and private, and we have to consciously remember the list of allowed changes that won’t break API compatibility. If we accidentally make a breaking change, well, we’d better hope that we have a test relying on that API, or we’ll never notice. Why can’t a computer check this guarantee for us? Why isn’t this built into any major language, when so many of them make a point of offering this handful of primitives?

Also, Java has no way to indicate what parts of your API are intended for public consumption. A method can be private or public; a class can be private or public; but I don’t know of a way to say whether an entire file (or directory) is part of the public API or merely an internal utility. “Package-private” visibility is a thing, but since packages aren’t hierarchical, it only goes so far unless you want to cram your entire project into one package. (The only language I know of that handles this nicely is, again, Rust.) I’d guess the workaround would be to keep a separate set of public wrapper classes, but if your public interface ends up completely separate, what’s the point of all the stability keywords sprinkled throughout your internal code? Maybe Java 9’s modules will improve on this.

Come to think of it, I wonder: Do many Java developers forgo the stability stuff for internal code? Using private inside your own codebase doesn’t cost you anything, I suppose, but using public inside your own codebase doesn’t either. I’ve heard many times that Java IDEs have refactoring abilities bordering on wizardry, so surely an attribute could be automatically changed to use accessors if necessary. Does anyone take advantage of this to avoid the future-proofing boilerplate?

I suspect the answer is “no”, because accessors are treated as best practice — or even inherently virtuous, part of the very fabric of OO design. If so, that’s a fascinating and stark contrast with Python, where accessors are generally regarded as superfluous cruft — because the language provides a way to change plain attributes without breaking the API.

Along similar lines, Python has no concept of “privacy” at all. By convention, a method or attribute name that starts with a single underscore is “private”, but that only means “don’t blame me if this changes later” and makes virtually no difference to the core language. It’s very useful when a third-party library almost does what I want, but doesn’t have a hook somewhere I need it. I can just subclass something and wrap a “private” method, if I’m willing to accept a little brittleness. I might accidentally break some object’s guarantees along the way, but I know it’s my own dumb fault. It helps that the original source code is generally available.
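A rough sketch of that kind of wrapping follows. `Downloader` stands in for some third-party class and is entirely invented, as is `LoggingDownloader`; the point is only that the leading underscore does not stop a subclass from hooking in.

```python
class Downloader:
    """Stand-in for a third-party class; every name here is hypothetical."""

    def fetch(self, url):
        return self._request(url).upper()

    def _request(self, url):
        # "Private" by convention only; nothing prevents a subclass overriding it.
        return f"response from {url}"


class LoggingDownloader(Downloader):
    """Adds the hook the library never offered by wrapping the private method."""

    def __init__(self):
        self.seen = []

    def _request(self, url):
        # Record the call, then defer to the original implementation.
        self.seen.append(url)
        return super()._request(url)
```

If a later release of the library renames or removes `_request`, the subclass breaks, which is exactly the brittleness being accepted.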

Yet I’ve seen people insist that Python doesn’t have “real” or “full” OO support, simply because it lacks these Java features. As though “good OO design/support” were equivalent to “whatever Java does”. That is some mighty impressive branding power.

It’s counter-intuitive that design principles would be so radically opposed between two languages, and it suggests that perhaps those principles aren’t as principled as we think. Encapsulation is supposed to be good because it hides the representation of an object, but sometimes an object is its representation, and Java has no satisfying way to express that. A hostname isn’t part of what a URL does; it’s part of what a URL is. Expressing “this URL’s hostname” in the form of a question or command (i.e., a method call) feels unnatural and awkward to me. But in Java, you have no choice, and the result is that I’ve encountered Java developers who treat encapsulation as an absolute good — as though OO principles were defined in terms of Java’s own limitations.

I’m not trying to rag on Java or Java developers here. I’m sure I’ve picked up some far more exotic principles from writing so much Python. (“Overloading the division operator to join path segments is great design!”) But I’m fascinated by the way our own cultural context informs the way we perceive the world… and the way we decide the world ought to be.

Right, where was I?


Java Is Extremely Conservative

Ah, now we might be getting somewhere.

After all, the language has changed comparatively little over the years. It picked up generics and annotations and some other niceties in 2004, and later lambda expressions in 2014, but nothing else quite so fundamental. In contrast, even C++ has seen a flurry of significant changes in recent years.

I like new and exciting developments in languages, but I understand the appeal of the slow and steady approach. C has only had three and a half releases in its lifetime: C89, C99, C11, and maybe C95. The language is small and stable enough that bleeding-edge compilers are still expected to handle code from 1990. Interfaces are simple enough that compiled code my age has a decent chance of linking with new libraries. That’s an impressive feat.

The cost is that C doesn’t do much heavy lifting for you, so a lot of work is reinvented and a lot of ground is retrodden. C also rarely removes or even deprecates features; known minefields in C are marked with an ad-hoc collection of compiler warnings.

Python, on the other hand, has had four releases in the past five years alone. Obsolete features or libraries are occasionally deprecated and removed several versions later, so Python code may need some light ongoing maintenance if it wants to work against future Python releases. Python also has no stable compiled format to insulate a deployed library from language changes. The language itself is specified well enough that several alternative implementations exist, but they tend to lag behind by several releases.

Java appears to be aiming for an interesting compromise. The core language grows relatively slowly, which avoids the need to remove mistakes later on. The standard library is fairly extensive, yet if this deprecations list is any indication, old APIs are rarely outright removed — I see features here that were deprecated in Java 1.3, sixteen years ago.

This suggests a strong focus on stability and compatibility. New features are added very carefully, especially to the core language. To compensate, the ecosystem has built a great deal of tooling on top of Java; third-party code can experiment more freely, and conservative developers are free to simply not use such tools. I notice that Java annotations were originally a javadoc hack, so it seems the core language adopts popular ideas from the ecosystem, which is good.

The downside is that Java has somewhat of a reputation for code generation, layers of configuration/reflection to do things the core language makes difficult, and lots of XML for some reason.

On the other hand, keeping the language’s growth steady has probably made it easier to experiment with JVMs, and that in turn has opened the door to several new languages. Clojure, Scala, and Groovy might not exist without a solid VM and a vast ecosystem of usable libraries at their disposal.

Perhaps “Java is conservative” is too simple. More accurately, it seems that Java has rationed out its liberalness very carefully. Conservatism is really about risk aversion, after all, and that’s definitely a trait associated with the enterprise world.

Enlightenment

Along this winding road, I’ve come to realize something else. Many of my specific kneejerk dislikes of the core language are really C++’s fault.

Getters, setters, and no properties… just like C++.

Access control… just like C++. Perhaps slightly more justified in C++, where meddling in an object’s internals could cause memory corruption.

One class per file… just like (well, a lot of) C++.

Braces, semicolons, and dromedaryCase… just like C++. This is a whole separate can of worms, but I think braces are noise.

Repeating the class name for constructors… just like C++. It probably would’ve been okay to write public new() or something.

Ability to use types without explicitly importing them… just like C++. Okay, yes, Java is much better about this, but skimming for imports is still not enough to figure out a file’s dependencies.

A null value… just like C++. This is especially aggravating since it makes the entire type system meaningless. A variable of type T might contain a value of type T… or it might contain null, which is not a value of type T. Every single type annotation is a lie. Even an Optional<T> might be null!

And Finally

Java has a few warts. The language design imposes some tedious development overhead; security issues in the platform itself are perhaps a bit too common; and it’s slow to change, for better or worse. I’m still not sure why a couple of those programs from the previous post ate so much memory, either.

Was I wrong about Java? Kinda. It’s not as bad as I thought, but it’s not orders of magnitude better, either. Its original design philosophy was probably “C++ without so much razor wire”, and by that metric, it’s definitely a success. I can tell it’s not for me, but it exists as an alternative to something that’s also not for me. And hey, I found more things to blame on C++, which always makes me happy.

The world has changed a lot since Java first came out. Now we have an entire competing ecosystem in .NET, JITs like LuaJIT and PyPy popping up, and even desktop software written in JavaScript and deployed with its own little web browser. Java might just be responsible for this widespread acceptance of VMs and JIT, and I’m certainly grateful for that. (Well… jury’s still out on Electron.)

If you found any of this interesting, and you haven’t done it before, maybe give Python or Ruby a whirl to see where I’m coming from. Both languages have thriving communities full of very clever people, and both have very different perspectives on what it means to be object-oriented. Python looks a little more Java-like on the surface, though beneath that lurks a very flexible prototypical object model; Ruby takes Smalltalk’s message-passing approach and adds a light dash of Perl. Try both, even; the worst that can happen is that you appreciate Java all the more. (I definitely know Python better — maybe I’ll write about it from a Java or C++ perspective sometime.)

If I have any parting advice for the Java community, it’s this: Please go back in time and base Java on Pascal instead.
