How to Properly Organize Files in Your Codebase & Avoid Mayhem

    Lucero del Alba

    The main library, data, UI, docs and wiki, tests, legacy and third-party components … How do we keep track and maintain order within all of this? Organizing the files in your codebase can become a daunting task.

    Relax — we’ve got this! In this article, we’ll review the most common systems for both small and large projects, with some easy-to-follow best practices.

    Why Bother?

    As with pretty much all of the tasks related to project management (documentation, software commits, deployment), you’ll benefit from taking a conscious, programmatic approach. Not only will it reduce problems now, but it will also save you and your team quality time in the future when you need to quickly access and review things.

    You can surely recall function names off the top of your head for whatever it is that you’re coding right now, quickly find a file you need to edit, and sharply tell what works from what doesn’t (or so you think). But could you say the same about that project you were working on last year?

    Let’s admit it: software projects can go through spans of inactivity that last for months, and even years. A simple README file could do a lot for your colleagues or your future self. But let’s think about the other ways you could structure your project, and establish some basic rules to name files, address project documentation, and to some degree organize an effective workflow that would stand the test of time.

    Making Sense of Things

    We’ll establish a “baseline” for organizing files in a project — a logic that will serve us for a number of situations within the scope of software development.

    As with our rules for committing changes to your codebase the right way, none of this is carved in stone, and for what it’s worth, you and your team might come up with different guidelines. In any case, consistency is the name of the game. Be sure you understand (and discuss or dispute) what the rules are, and follow them once you’ve reached a consensus.

    The Mandatory Set

    This is a reference list of files that nearly every software project should have:

    • README: this is what GitHub renders for you right under the sourcetree, and it can go a long way to explaining what the project is about, how files are organized, and where to find further information.
    • CHANGELOG: to list what’s new, modified or discontinued on every version or revision — normally in a reverse chronological order for convenience (last changes first).
    • COPYING/LICENSE: a file containing the full text of the license covering the software, including some additional copyright information, if necessary (such as third-party licenses).
    • .gitignore: assuming you use Git (you most probably do), this will also be a must to tell what files not to sync with the repository. (See Jump Start Git’s primer on .gitignore and the documentation for more info, and have a look at a collection of useful .gitignore templates for some ideas.)
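
    As a rough illustration, the mandatory set above can be checked with a few lines of Python. This is only a sketch: the function name `missing_mandatory_files` is hypothetical, and matching names case-insensitively while ignoring extensions (so that `readme.txt` counts as a README) is an assumption, not a rule from this article.

```python
from pathlib import Path

# Accepted base names for each mandatory file, compared case-insensitively
# and ignoring extensions such as .md or .txt (an assumption of this sketch).
MANDATORY = {
    "README": {"readme"},
    "CHANGELOG": {"changelog"},
    "LICENSE": {"license", "copying"},
    ".gitignore": {".gitignore"},
}


def missing_mandatory_files(root):
    """Return the mandatory entries that have no matching file directly in root."""
    present = set()
    for entry in Path(root).iterdir():
        if entry.is_file():
            name = entry.name.lower()
            present.add(name)                # exact name, e.g. ".gitignore"
            present.add(name.split(".")[0])  # base name, e.g. "readme" from "readme.txt"
    return [label for label, names in MANDATORY.items() if not names & present]


if __name__ == "__main__":
    for label in missing_mandatory_files("."):
        print(f"missing: {label}")
```

    Run from the project root, the script prints one line per missing file, which makes it easy to wire into a pre-release checklist.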

    Supporting Actors

    Some additional files you might also consider including, depending on the project:

    • AUTHORS: credits to those participating in writing the code.
    • BUGS: known issues and instructions on reporting newly found bugs.
    • CONTRIBUTING/HACKING: guide for prospective contributors, especially useful for open-source projects.
    • FAQ: you already know what that is. ;)
    • INSTALL: instructions on how to compile or install the software on different systems.
    • NEWS: similar to the CHANGELOG file, but intended for end users, not developers.
    • THANKS: acknowledgments.
    • TODO/ROADMAP: a listing for planned upcoming features.
    • VERSION/RELEASE: a one-liner describing the current version number or release name.

    Folders for Components or Subsystems

    Often we’ll come across a set of functionalities that can be grouped into a single concept.

    Some examples could be:

    • internationalization (i18n) and localization (l10n)
    • authentication modules
    • third-party add-ons
    • general purpose tools and cron jobs
    • user interface (UI) and graphical user interface (GUI)

    All these can be organized into a single “component” or “subsystem” directory — but don’t go crazy!

    We want to limit the creation of directories to keep things manageable, both on the root directory (where the main components will be located) and recursively (inside each directory). Otherwise, we might end up spending a lot of time routinely browsing files in carefully — and excessively — organized directories.
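
    To keep an eye on how far the tree has grown, a small script can report the deepest nesting level and the total number of directories. The sketch below is one possible way to do it; the name `directory_stats` is hypothetical, and skipping hidden directories such as `.git` is an assumption made here so that tooling folders don’t skew the numbers.

```python
import os


def directory_stats(root):
    """Return (deepest nesting level, number of directories) below root."""
    max_depth = 0
    dir_count = 0
    root = os.path.abspath(root)
    for current, dirs, _files in os.walk(root):
        # Prune hidden directories in place so os.walk does not descend into them.
        dirs[:] = [d for d in dirs if not d.startswith(".")]
        dir_count += len(dirs)
        rel = os.path.relpath(current, root)
        depth = 0 if rel == "." else rel.count(os.sep) + 1
        max_depth = max(max_depth, depth)
    return max_depth, dir_count


if __name__ == "__main__":
    depth, count = directory_stats(".")
    print(f"deepest level: {depth}, directories: {count}")
```

    What counts as too deep is for you and your team to decide; the script only gives you a number to discuss.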

    Leave that Out of the Sourcetree, Please

    As much as we want the project to be neat and organized, there are certain kinds of files we want to leave entirely out of it.

    Data. You might be tempted to have a data/ directory in your sourcetree for CSV files and such, especially if they take up just a few kilobytes. But how about if they take megabytes or even gigabytes (which isn’t unusual these days)? Do you really want to commit that to your codebase as if it were code? No.

    Binary files. You don’t want renders of videos or compiled executable files next to source code. These aren’t development files, and they simply don’t belong here. As with data files, they can also end up using a lot of space.
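
    One way to spot data or binary files that have slipped into the sourcetree is to look for anything unusually large. The following is a minimal sketch, assuming a 1 MB threshold (an arbitrary value chosen for illustration) and a hypothetical function name, `oversized_files`.

```python
from pathlib import Path


def oversized_files(root, limit_bytes=1_000_000):
    """Return (relative path, size) pairs for files larger than limit_bytes."""
    root = Path(root)
    found = []
    for path in root.rglob("*"):
        # Ignore Git's own object store; it is not part of the working tree.
        if ".git" in path.relative_to(root).parts:
            continue
        if path.is_file() and path.stat().st_size > limit_bytes:
            found.append((path.relative_to(root).as_posix(), path.stat().st_size))
    # Largest files first.
    return sorted(found, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, size in oversized_files("."):
        print(f"{size:>12} bytes  {name}")
```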

    Settings. This is another big NO. You shouldn’t put credentials, passwords, or even security tokens in your codebase. We can’t cover the ways around this here, but if you’re a Python developer, consider using Python Decouple.
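
    To give a feel for the general idea of keeping settings outside the code, here’s a standard-library sketch that reads KEY=VALUE pairs from a local file that is never committed. This is not Python Decouple’s API; the function name `load_settings`, the file format, and the rule that real environment variables override the file are all assumptions made for illustration.

```python
import os


def load_settings(env_path=".env"):
    """Read KEY=VALUE pairs from env_path; real environment variables win."""
    settings = {}
    if os.path.exists(env_path):
        with open(env_path, encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                # Skip blank lines, comments, and lines without a key/value pair.
                if not line or line.startswith("#") or "=" not in line:
                    continue
                key, value = line.split("=", 1)
                settings[key.strip()] = value.strip().strip("'\"")
    # Values set in the process environment override the file.
    for key in settings:
        if key in os.environ:
            settings[key] = os.environ[key]
    return settings


if __name__ == "__main__":
    # Print only the setting names, never the secret values.
    print(sorted(load_settings()))
```

    The settings file itself stays on each developer’s machine or server and is listed in .gitignore, so the secrets never reach the repository.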

    Case 1: Web App

    Let’s consider a web application — software that runs on a web server and that you can access through the browser, either on your desktop computer or mobile device. And let’s say this is a web app that offers a membership to access a premium service of sorts — maybe exclusive reports, or travel tips, or a library of videos.

    File Structure

    ├── .elasticbeanstalk
    ├── .env
    ├── billing
    ├── changelog.txt
    ├── locale
    │   ├── en
    │   └── zh_Hans
    ├── members
    ├── readme.txt
    ├── static
    │   ├── fonts
    │   ├── images
    │   ├── javascript
    │   └── styles
    ├── templates
    │   ├── admin
    │   └── frontend
    ├── todo.txt
    └── tools


    This is a basic structure for a web app with support for two languages, English and simplified Chinese for mainland China (the locale directory). There are also two main components, billing and members.

    If you’re a tiny bit familiar with website development, the contents of the static and templates folders might look familiar to you. Perhaps the only unusual elements might be .elasticbeanstalk, which stores deployment files for Amazon Web Services (AWS), and .env, which only locally stores settings for the project, such as database credentials. The rest, such as README and TODO, we’ve already discussed.

    The tools directory is an interesting one. Here we can store scripts that, for example, prune the database, or check the status of a payment, or render static files to a cache — essentially, anything that isn’t the app itself but helps to make it function properly.

    Regarding naming, it doesn’t make much of a difference if we name the images directory images/ or img/, or the styles directory styles/ or css/, or the javascript/ directory js/. The main thing is that the structuring is logical, and we always follow something of a convention, either long descriptive names, or short ones.
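
    A quick way to check that a project sticks to one convention is to look for long and short spellings living side by side. The sketch below only knows the three pairs mentioned above, and the function name `naming_style` is hypothetical.

```python
from pathlib import Path

# Long and short spellings of the directory names discussed above.
ALIASES = [("images", "img"), ("styles", "css"), ("javascript", "js")]


def naming_style(root):
    """Return 'long', 'short', 'mixed', or 'none' for directory names under root."""
    names = {p.name for p in Path(root).rglob("*") if p.is_dir()}
    uses_long = any(long_name in names for long_name, _ in ALIASES)
    uses_short = any(short_name in names for _, short_name in ALIASES)
    if uses_long and uses_short:
        return "mixed"
    if uses_long:
        return "long"
    if uses_short:
        return "short"
    return "none"


if __name__ == "__main__":
    print(naming_style("."))
```

    A result of "mixed" is a hint that two conventions have crept into the same project and that it may be time to settle on one.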

    Case 2: Desktop App

    Now let’s consider an application that you can download and install on your computer. And let’s say the app takes some input, such as CSV files, and presents a series of reports afterward.

    In this example, we’ll let the sourcetree grow a little larger.

    File Structure

    ├── .gitignore
    ├── data
    ├── doc
    ├── legacy
    │   ├── dashboard
    │   ├── img
    │   └── system
    ├── LICENSE
    ├── README
    ├── tests
    ├── thirdparty
    ├── tools
    │   ├── data_integration
    │   └── data_scraping
    ├── ui
    │   ├── charts
    │   ├── css
    │   ├── csv
    │   ├── dashboard
    │   ├── img
    │   │   └── icons
    │   ├── js
    │   ├── reports
    │   └── summaries
    ├── VERSION
    └── wiki


    The ui/ folder is, essentially, the core of the app. The names of the subfolders are pretty much self-descriptive (another good practice). And unlike our web app example, here we’ve opted for shortened names (such as js instead of javascript). Once again, what really matters is that we’re consistent within the project.

    Earlier, I suggested leaving data files out of the sourcetree, and yet there’s a data/ folder in there. How come? Think of this tree as a developer’s box that needs data in order to properly test the app. But that data is still kept out of the repository synchronization, following the rules set in the .gitignore file.

    The legacy/ folder is for a part of the app that’s being discontinued but still provides some functionality that might come in handy until it’s fully refactored into the new system. So it provides a good way of separating old from current code.

    Also new here are tests/, which provides a place to do quality assurance with unit tests, and thirdparty/, a place to store external libraries that the software needs.

    Notice there are doc/ and wiki/ folders, which might look like duplication. However, it’s also perfectly possible — and even reasonable — to have a documentation folder intended for the end-user, and a wiki for the development team.

    Wrap Up

    A good message is worth repeating: be organized, even when working individually. Hopefully, this article has given you some ideas that you can start implementing in your workflow right away to prevent a mess as the number of files in your app increases.

    As mentioned, the guidelines might change here and there, as (almost) every project is different, and so are teams. Ideally, you or your team will get to decide how you structure the project — adding a little document describing the reasoning for this structure — and you’ll then stay consistent with those rules from now on.

    And remember that, with most of the guidelines here, it isn’t all that important if you choose dashes or underscores to name files (to choose one topic among many). Consistency is key.
