Document Your JSON API Schema with PRMD

By Glenn Goodrich

On a recent project, my team agreed that we needed some way to validate and document the JSON coming in and out of our API. We had already settled on using JSON Schema as the means to describe the API, so determining how to validate (and document) JSON Schema was next on the to-do list. I did tons of research into how to best accomplish such noble tasks. OK, OK, “tons of research” really means “look for gems that do this already”. I also consulted my past experience with things like Grape and Swagger, but we aren’t using Grape here and I couldn’t find anything that would allow me to easily incorporate Swagger. Even if I had, that’s only docs, without any validation.

The long and short of it (that’s “old person talk” for TL;DR) is, if you want to document your API, validate requests and responses, and use JSON Schema, it’s a bit of an effort. There are approaches, such as iodocs, but I didn’t want to have to install Redis, use Node, etc. just to get docs. Also, there are plenty of tools out there to generate docs once you have the schema, but they don’t give much help in creating the schema. I quickly learned that, no matter which tool or direction I chose, it was going to be a lot of manual effort to get this done. I am hoping this article gets anyone else with the same goal further up the path.

Earlier this year, I took a long, hard look at pliny as a possible platform for our APIs. Pliny comes from the good folks at Heroku, who know a little bit about APIs. It is Sinatra-based and comes with some opinionated (but very good) helpers for things like logging, request tracking, versioning, etc. I highly encourage you to check it out if you’re writing APIs in Ruby. I did, and I ended up answering our JSON Schema needs as a result.

Pliny utilizes two other gems from GitHub user interagent: prmd and committee. Together, these two gems tag-team your JSON Schema/API needs. prmd focuses on JSON Schema creation and API doc generation, and it’s the focus of this post. Committee is a collection of methods and Rack middleware to validate requests and responses against your schema. A subsequent post will focus on Committee.

JSON Schema and JSON Hyper Schema

JSON Schema is one of (very many) attempts to create a way to define the structure of JSON data. The goal is to be able to document and verify a JSON provider/repository just like you would a database schema. There are specifications (currently on Draft version 4), but I prefer this online book by Michael Droettboom of the Space Telescope Science Institute (GREAT name) as a starting point.

JSON Hyper Schema turns your Schema into “Hypertext”. In other words, JSON Hyper Schema describes the endpoints for your application, including what it will accept and provide. The Space Telescope book does not cover JSON Hyper Schema, so I suggest you scan the spec and read this post.

Finally, the prmd gem provides this markdown file that is extremely useful when defining your schema and what prmd expects.

I am not going to walk through the specification, so I encourage you to read that book. I will presume you understand the basics of JSON Schema and Hyper Schema for the rest of this post.

PRMD

PRMD’s tag line is “JSON Schema tools and doc generation for HTTP APIs”. In other words, prmd allows you to generate JSON Schema, which then must be manually changed/tweaked to match your API. Once the schema is fully defined, prmd provides tasks to verify the schema is JSON Schema-compliant, as well as generate docs for that API.

The end game of using prmd is to have a defined JSON Schema for your API, along with supporting documentation. The documentation is generated in Markdown format, which is nice.

prmd provides an executable (prmd) with the following commands:

init: Scaffold resource schemata
combine: Combine schemata and metadata into single schema
verify: Verify a schema
doc: Generate documentation from a schema
render: Render views from schema

We will use each of these, either directly or via a Rake task, except render, which I have not had a use for.

Example

In this post, our application provides account creation (registration) and authentication. We’ll define the resource (account) and endpoints (links). I am using Rails here, but Pliny is Sinatra-based, so you should be able to easily use the concepts covered here with any Ruby web framework. The schema will also expose endpoints for sessions and password reset.

The Rails application uses rails-api, starting with Ruby 2.2.2 and Rails 4.2.3. I have a repo that I am using, so you can check it out to see how it’s set up.

PRMD Setup

prmd needs a directory to store the schema and its supporting files. Make a schema/schemata directory:

mkdir -p schema/schemata

The top level schema files will live in schema and the individual schemata files (such as for account) will live in schema/schemata. This is simply our convention, so you can do what you like.

The “top level” files I previously mentioned are files that describe the metadata for our overall API, unrelated to a specific resource. Here’s ours, saved as schema/meta.json:

{
 "description": "Account API",
 "id": "account-api",
 "links": [{
   "href": "https://accounts.ourapi.com",
   "rel": "self"
 }],
 "title": "Accounts"
}

This is taken from the prmd README and satisfies the metadata requirements of JSON Schema.

I feel I should mention that you can use JSON or YAML to define your API. I started with YAML and didn’t like it as much. I prefer doing this in JSON. One added bonus of doing this in JSON is that you can drop things in various online JSON and JSON Schema linters and catch typos or orphaned brackets.
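
If you would rather not paste your schema into a website, Ruby’s standard json library catches the same kind of syntax slip locally. This is only a minimal sketch and not part of prmd; the lint_json helper name is made up for the example.

```ruby
require "json"

# Hypothetical helper (not part of prmd): returns nil when the text is
# valid JSON, or the parser's error message when it is not.
def lint_json(text)
  JSON.parse(text)
  nil
rescue JSON::ParserError => e
  e.message
end

valid   = '{ "id": "account-api", "title": "Accounts" }'
invalid = '{ "id": "account-api", "title" "Accounts" }' # missing colon

puts lint_json(valid).inspect # nil means the document parsed cleanly
puts lint_json(invalid).nil?  # false, because the missing colon was caught
```

Pointing the same helper at File.read("schema/meta.json") would check the metadata file before prmd ever sees it.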

Account Schema

Time to generate the schema scaffold:

prmd init account > schema/schemata/account.json

This creates a basic JSON Schema for our Account resource. Unfortunately, it doesn’t pull in anything from our model, if we had one (and, it probably shouldn’t), so the JSON schema has to be manually tweaked to match the API. Open up that file and quickly read through it. I’ll dissect it, section by section, so we’re on the same page.

General Resource Information

{
  "$schema": "http://json-schema.org/draft-04/hyper-schema",
  "title": "Account",
  "description": "The Account resource for the API",
  "stability": "prototype",
  "strictProperties": true,
  "type": [
    "object"
  ],
  ....

As you can see, the $schema is draft 4 of JSON Hyper Schema. There are a couple of “FIXME”s that need to be addressed in title and description, which I have fixed above. stability specifies the stability (duh) of the resource and is one of prototype, development, or production. strictProperties indicates that this object (the resource) ONLY has the properties defined in this object. There is an additionalProperties property that is mutually exclusive with strictProperties.
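
To make the “ONLY has the properties defined” idea concrete, here is a rough Ruby sketch that reads the description above literally. It is not how prmd or any validator implements strictProperties; the undefined_properties helper and the sample data are made up, and it assumes the resource schema carries a standard "properties" section.

```ruby
# Hypothetical helper: lists keys in `instance` that the schema's
# "properties" section does not define.
def undefined_properties(schema, instance)
  instance.keys - schema.fetch("properties", {}).keys
end

schema = {
  "strictProperties" => true,
  "properties" => { "id" => {}, "email" => {} }
}

ok    = { "id" => "1", "email" => "me@example.com" }
extra = { "id" => "1", "email" => "me@example.com", "nickname" => "gg" }

p undefined_properties(schema, ok)    # empty array: nothing beyond the schema
p undefined_properties(schema, extra) # flags the undefined "nickname" key
```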

Definitions

The definitions are reference properties that will be used throughout the schema, so it’s not necessary to constantly redefine id or email in the context of the schema definition. This is how you DRY things up in a schema.

The generated definitions have examples, such as name and id. I have changed the snippet below to match our resource by replacing name with email and adding password:

"definitions": {
    "id": {
      "description": "unique identifier of account",
      "readOnly": true,
      "format": "uuid",
      "type": [
        "string"
      ]
    },
    "email": {
      "description": "unique email of account",
      "readOnly": true,
      "type": [
        "string"
      ]
    },
    "password": {
      "description": "account password",
      "readOnly": true,
      "type": [
        "string"
      ]
    },
    "identity": {
      "anyOf": [
        {
          "$ref": "/schemata/account#/definitions/id"
        },
        {
          "$ref": "/schemata/account#/definitions/email"
        }
      ]
    },
    "created_at": {
      "description": "when account was created",
      "format": "date-time",
      "type": [
        "string"
      ]
    },
    "updated_at": {
      "description": "when account was updated",
      "format": "date-time",
      "type": [
        "string"
      ]
    }
  }

Some important takeaways from the definitions are:

  • Each definition has a type that specifies its data type. Acceptable values are: "array", "boolean", "integer", "number", "null", "object", "string". format is what you think it is, and possible values are: "date-time", "email", "hostname", "ipv4", "ipv6", "uri". prmd also supplies a custom uuid format, which you can (should) use for IDs.
  • Speaking of identifying, the identity property is how the resource identifies specific instances. In this case, we can use id or email. The $ref property is a reference to the property in definitions. So, "$ref": "/schemata/account#/definitions/email" pulls in our email definition (technically called “dereferencing”).

The remaining properties should be self-explanatory.
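
If “dereferencing” sounds abstract, the idea fits in a few lines of Ruby. This is a simplified sketch, not prmd’s resolver: the dereference helper and the schemata hash keyed by path are assumptions for illustration, and it only handles the pointer style used in this post.

```ruby
# Hypothetical helper: resolves a "$ref" such as
# "/schemata/account#/definitions/email" against a hash of schemata
# keyed by their path.
def dereference(schemata, ref)
  path, pointer = ref.split("#", 2)
  keys = pointer.split("/").reject(&:empty?)
  # Walk down the schema one key at a time, raising if a key is missing.
  keys.reduce(schemata.fetch(path)) { |node, key| node.fetch(key) }
end

schemata = {
  "/schemata/account" => {
    "definitions" => {
      "email" => { "description" => "unique email of account", "type" => ["string"] }
    }
  }
}

p dereference(schemata, "/schemata/account#/definitions/email")
```

Using fetch instead of [] means a typo in a $ref raises a KeyError right away rather than quietly returning nil.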

Links

The links section is part of the JSON Hyper Schema spec and explains the endpoints supported by the schema:

"links": [
  {
    "description": "Create a new account.",
    "href": "/accounts",
    "method": "POST",
    "rel": "create",
    "schema": {
      "properties": {
      },
      "type": [
        "object"
      ]
    },
    "title": "Create"
  },
  {
    "description": "Delete an existing account.",
    "href": "/accounts/{(%2Fschemata%2Faccount%23%2Fdefinitions%2Fidentity)}",
    "method": "DELETE",
    "rel": "destroy",
    "title": "Delete"
  },
  {
    "description": "Info for existing account.",
    "href": "/accounts/{(%2Fschemata%2Faccount%23%2Fdefinitions%2Fidentity)}",
    "method": "GET",
    "rel": "self",
    "title": "Info"
  },
  {
    "description": "List existing accounts.",
    "href": "/accounts",
    "method": "GET",
    "rel": "instances",
    "title": "List"
  },
  {
    "description": "Update an existing account.",
    "href": "/accounts/{(%2Fschemata%2Faccount%23%2Fdefinitions%2Fidentity)}",
    "method": "PATCH",
    "rel": "update",
    "schema": {
      "properties": {
      },
      "type": [
        "object"
      ]
    },
    "title": "Update"
  }
],

A link can include the href, method, schema, and targetSchema. The latter two properties are what the API will accept and provide, respectively. rel is the relationship of the link to the resource, and should be one of create, destroy, self, instances, or update.
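
Since links is just an array of hashes, it is easy to get a quick overview of the endpoints from Ruby. A small sketch, where the summarize_links helper is made up for this post and the data is a trimmed copy of the generated links:

```ruby
require "json"

links_json = <<~LINKS
  [
    { "href": "/accounts", "method": "POST", "rel": "create", "title": "Create" },
    { "href": "/accounts", "method": "GET", "rel": "instances", "title": "List" }
  ]
LINKS

# Hypothetical helper: one summary line per link, e.g. method, href, rel.
def summarize_links(links)
  links.map { |link| format("%-6s %s (%s)", link["method"], link["href"], link["rel"]) }
end

puts summarize_links(JSON.parse(links_json))
```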

I am sure you see some of that JSON vomit in the href for particular links. I am speaking of things like:

"href": "/accounts/{(%2Fschemata%2Faccount%23%2Fdefinitions%2Fidentity)}",

Basically, that is really /accounts/{identity}. Remember when we defined the identity in the definitions section? That long, weird string is just the URL-encoded form of /schemata/account#/definitions/identity.
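
You don’t have to take my word for it. Ruby’s standard uri library will decode the string; the decode_href_ref helper below is just a name I made up for this sketch.

```ruby
require "uri"

# Hypothetical helper: pulls the percent-encoded reference out from
# between "{(" and ")}" in an href and decodes it.
def decode_href_ref(href)
  encoded = href[/\{\((.+?)\)\}/, 1]
  URI.decode_www_form_component(encoded)
end

href = "/accounts/{(%2Fschemata%2Faccount%23%2Fdefinitions%2Fidentity)}"
puts decode_href_ref(href) # prints /schemata/account#/definitions/identity
```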

You can see that prmd basically generated “typical” RESTful links, building out a CRUD-like API. In some cases, that’s a good start. In this case, we need to make some changes. Since this is for registration and authentication:

  • Let’s get rid of “List existing accounts”, which sounds like a security issue anyway.
  • Add links for the session and password flows.
  • Signing in will be a POST to /accounts/session that accepts a remember_me parameter and returns a token.
  • The password endpoints will require a reset token.
  • I am going to get pretty detailed with how I define what the links accept and return.

As an example, here is the new account creation/registration schema:

{
  "description": "Create a new account.",
  "href": "/account",
  "method": "POST",
  "rel": "create",
  "schema": {
    "properties": {
      "account" : {
        "type" : "object",
        "properties": {
          "email" : { "$ref": "/schemata/account#/definitions/email" },
          "password": {
            "type": "string",
            "description": "The password"
          },
          "remember_me": {
            "type": "boolean",
            "description": "True/false - generate refresh token (optional)"
          }
        },
        "required" : [ "email", "password" ]
      }
    },
    "type": [ "object" ]
  },
  "title": "Create",
  "targetSchema": {
    "type": "object",
    "properties":  {
      "token" : { "$ref": "/schemata/account#/definitions/token" }
    }
  }
}

So, what changed?

  • I expanded the schema/properties to take in an account that has an email, a password, and an optional remember_me property. The email is a reference to the matching item in the definitions, while password and remember_me are defined locally.
  • Notice the required parameters that are nested in account. This is to allow you to define required and optional parameters at any level.
  • I added the targetSchema property, defining what will be returned by the call. In this case, it’s a token definition that I added, which looks like:

    "token": {
      "type": "string",
      "description": "The token",
      "example" : "eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzUxMiJ9.eyJkYXRhIjp7ImlkIjoiMTE0MzYiLCJ0eXBlIjoiYWNjb3VudHMiLCJhdHRyaWJ1dGVzIjp7ImVtYWlsIjoiZ2xlbm4uZ29vZHJpY2hAZ21haWwuY29tIn19LCJzdWIiOiJhY2NvdW50IiwiZXhwIjoxNDM3MjM0OTM0LCJpc3MiOiJVbmlxdWUgVVNBIiwiaWF0IjoxNDM3MTQ4NTM0LCJqdGkiOiI3ZmJiYTgzOS1kMGRiLTQwODItOTBmZC1kNmMwM2YwN2NmMWMifQ.SuAAhWPz_7VfJ2iyQpPEHjAnj_aZ-0-gI4uptFucWWflQnrYJl3Z17vAjypiQB_6io85Nuw7VK0Kz2_VHc7VHZwAjxMpzSvigzpUS4HHjSsDil8iYocVEFlnJWERooCOCjSB9R150Pje1DKB8fNeePUGbkCDH6QSk2BsBzT07yT-7zrTJ7kRlsJ-3Kw2GDnvSbb_k2ecX_rkeMeaMj3FmF3PDBNlkM"
    },
    

Whoa! That’s ugly. Yes, it is. However, the example will be used when we generate the markdown documentation, so I’ll take ugly now to get that later.

In a production app, there are many changes like this to get the schema well-defined. It is a manual, tedious process that no one likes to do. However, the payoff of having docs and being able to verify JSON schema in tests and on the production site is worth it. At least, we think it is. Also, our APIs are focused (think microservices) so the scope of each JSON Schema effort is smaller than, say, a large, monolithic API.
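
As a taste of what that verification looks like, here is a deliberately tiny Ruby sketch that checks a payload against the required list from the registration schema. It is not how committee or any real validator works, it only looks at one level, and the missing_required helper is made up for this post.

```ruby
# Hypothetical helper: returns the names listed under "required" that
# are absent from the payload. Checks a single level only.
def missing_required(schema, payload)
  schema.fetch("required", []) - payload.keys
end

# Trimmed copy of the nested "account" schema from the create link.
account_schema = {
  "type" => "object",
  "properties" => { "email" => {}, "password" => {}, "remember_me" => {} },
  "required" => ["email", "password"]
}

p missing_required(account_schema, { "email" => "me@example.com" })
p missing_required(account_schema, { "email" => "me@example.com", "password" => "s3cret" })
```

The first call reports the missing password, and the second comes back empty because remember_me is optional.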

Doc Generation

Part of the reward that comes from painstakingly defining the JSON Schema is “easy” API documentation. And yes, I realize there are other ways to do this (RAML, Apiary, etc.), so if you have a good way to do it, I won’t talk you out of it.

While prmd does offer an executable, I like having Rake tasks for combining the schema files and generating the docs. The README explains how to create the Rake tasks. In short, I created a lib/tasks/schema.rake with the following:

require "prmd/rake_tasks/combine"
require "prmd/rake_tasks/verify"
require "prmd/rake_tasks/doc"

namespace :schema do
  Prmd::RakeTasks::Combine.new do |t|
    t.options[:meta] = "schema/meta.json"
    # use meta.yml if you prefer YAML format
    t.paths << "schema/schemata"
    t.output_file = "schema/authentication-api.json"
  end

  Prmd::RakeTasks::Verify.new do |t|
    t.files << "schema/authentication-api.json"
  end

  Prmd::RakeTasks::Doc.new do |t|
    t.files = { "schema/authentication-api.json" => "schema/authentication-api.md" }
  end
end

# Defined outside the namespace so that `rake schema` runs all three steps
task schema: ["schema:combine", "schema:verify", "schema:doc"]

Notice that I have changed some paths and file names from the examples in the README to match this project. Now, I can run

rake schema:combine
rake schema:verify
rake schema:doc

or

rake schema

If you have JSON errors, the combine will fail. Here, I deleted a : in the doc and got:

unable to parse schema/schemata/account.json (#<JSON::ParserError: 795: unexpected token at '{
"$schema": "http://json-schema.org/draft-04/hyper-schema",
  "title": "Authentication API - Account",
  "description": "The Account Schema",
  "stability": "prototype",
  "strictProperties": true,
  "type": [
    "object"
    ],
  "definitions": {
    "id": {
      "description": "unique identifier of account",
      "readOnly": true,
  ...
  Somes files have failed to parse. If you wish to continue without them,please enable faulty_load using --faulty-load

So, you can force it to load, but I don’t know why you would.

If you have JSON Schema errors, then the verify task will fail. Here, I misspelled the password type as “sting”:

schema/authentication-api.json: #/definitions/account/links/1/schema/properties/account/properties/password/type: failed schema #/properties/type: No subschema in "anyOf" matched.
schema/authentication-api.json: #/definitions/account/links/0/schema/properties/account/properties/password/type: failed schema #/properties/type: No subschema in "anyOf" matched.
schema/authentication-api.json: #/definitions/account/links/0/schema/properties/account/properties/password/type: failed schema #/properties/type: No subschema in "anyOf" matched.
schema/authentication-api.json: #/definitions/account/links/0/schema/properties/account/properties/password: failed schema #/properties/properties/additionalProperties: Not all subschemas of "allOf" matched.

Presuming the combine works and the verify doesn’t find anything, the doc task will create a schema/authentication-api.md file.

The full schema docs can be found here.

I can add these Markdown docs to the GitHub repo or add a route that uses something like Redcarpet to create HTML. The important point is the docs are available. If your team uses GitHub, they’re easy to share, too.

But Wait! There’s More!

I know what you’re thinking. “Why did this guy go through all this for some OK markdown documentation? Hasn’t he heard of Swagger?” I have heard of it, and it won’t do what I want. I don’t think. I have mentioned using the schema in specs/tests to validate JSON schema, as well as using middleware to validate requests based on what the schema will accept. This is beyond what prmd offers. However, the folks behind prmd wrote the committee gem for just that purpose. And that will be the subject of my next post. You can use the time between now and then to get your schema in order. ;)

  • Arnold Schrijver

    Nice article! You could go one step further and use a machine-readable API, by using JSON-LD Hydra (and create a Hypermedia API) :)

    • ggsp

      Interesting. I hadn’t heard of JSON-LD…seems like something we should look into. Thanks!

  • ggsp

    Thanks Scott. I don’t have a ton of experience with Swagger, so any issues I have with it are due to that or maybe tripping over stuff in Rails/Ruby. I couldn’t find a good gem and we went to prmd rather than try to write one.
