Silicon Zucchini

• about 2,700 words

Describing a static site generator using JSON Schema to validate input data, generate template examples, and render editor interfaces.

Today, I’d like to pre­sent some ideas about how to gen­er­ate (sta­tic) web sites aided by data schemas[1]. This post will first talk a bit about the back­ground of Con­tent Man­age­ment Sys­tems and Sta­tic Site Gen­er­a­tors, then describe my moti­va­tion for devel­op­ing a new Sta­tic Site Gen­er­a­tor and con­clude with what I want to achieve with this pro­ject.

Con­tent Man­age­ment Sys­tems and Sta­tic Site Gen­er­a­tors

First, let’s talk a bit about Con­tent Man­age­ment Sys­tems (CMS) and Sta­tic Site Gen­er­a­tors (SSG). While the for­mer can be con­fig­ured to allow non-tech­ni­cal peo­ple to eas­ily update the con­tents of a web page, the lat­ter are mostly writ­ten to be used by other devel­op­ers[2].

Let’s get something out of the way first, though. There is one feature that a CMS gives you that is basically the opposite of what SSGs can offer: dynamic content (of course). You should really use a CMS if you want a site where visitors can

  1. add com­ments (with­out using exter­nal ser­vices),
  2. cre­ate accounts to access restricted con­tent,
  3. or search and fil­ter your site’s large data­base quickly (i.e., on the server and with­out pre-ren­der­ing every pos­si­ble view).

If you want any of that, SSGs won’t be an option for you. Often, though, people don’t need these features and still use a CMS for convenience.

Dynamic But Fixed: CMS

Let’s think about how the data of a web site is struc­tured. Most CMSs use rela­tional data­bases in which they save a list of page con­tents mapped to pages.

An easy example is the “page” feature of WordPress. The pages you add are basically just records with a title (in plain text), a content field (in HTML) and a URL (relative to the base, e.g. /about/). There may also be additional fields for categories and authors (and probably some data about embedded media files), but they are optional. This schema can be written like this:

{title: "string", content: "string", url: "string"}

TYPO3 (a larger CMS) is built around the idea that a page contains various content elements (e.g., rich text with headline, text and images, custom forms). You can imagine it like this:

{
  title: "string",
  url: "string",
  content: [{
    headline: "string",
    text: "string",
    pictures: [{url: "string"}]
  }]
}

Adjusting these schemas to a user’s liking is one of the most time-consuming tasks when configuring a CMS. Even then, it can be difficult to anticipate all possible use cases. Sometimes, it’s easier to have a convention (“always start pages with a big image”) instead of enforcing this in the CMS.

Do What You Want: SSG

In contrast to these strict database schemas, popular Static Site Generators like Jekyll or Metalsmith don’t require you to define any such structure. Jekyll, for instance, treats every file with a “YAML front matter”[3] as something it should transform in one way or another.

A file with front mat­ter might look like this:

---
title: Lorem ipsum
author: Pascal
---
Here is the actual content of this article.

This front mat­ter may con­tain arbi­trary struc­tures. While your "posts" may always have a title field (the con­tent after the front mat­ter is rep­re­sented as a content field), some of them could also include banner_image, author or even layout fields. These fields can be accessed in tem­plates just like title and content and be used to cus­tomize the appear­ance of a page ad hoc.
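To make this concrete, here is a minimal JavaScript sketch of how a file like the one above could be turned into a data object with a `content` field. It is not Jekyll’s implementation: the hypothetical `parseFrontMatter` only understands flat `key: value` lines, where a real generator would hand the block to a full YAML parser.

```javascript
// Hypothetical sketch: split a file into front matter fields and a content field.
// Only flat "key: value" pairs are supported; a real SSG would use a YAML parser.
function parseFrontMatter(text) {
  const lines = text.split(/\r?\n/);
  // Without an opening "---" line there is no front matter at all.
  if (lines[0] !== "---") {
    return { content: text };
  }
  const end = lines.indexOf("---", 1);
  if (end === -1) {
    throw new Error("Front matter is not closed by a second '---' line");
  }
  const data = {};
  for (const line of lines.slice(1, end)) {
    if (line.trim() === "") continue;
    const colon = line.indexOf(":");
    if (colon === -1) {
      throw new Error(`Cannot parse front matter line: ${line}`);
    }
    data[line.slice(0, colon).trim()] = line.slice(colon + 1).trim();
  }
  // Everything after the closing "---" becomes the content field.
  data.content = lines.slice(end + 1).join("\n");
  return data;
}

const post = parseFrontMatter(
  "---\ntitle: Lorem ipsum\nauthor: Pascal\n---\nHere is the actual content of this article.\n"
);
console.log(post.title, "/", post.author);
```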

The schema-less nature of SSGs can lead to some problems, though. One thing I always get wrong is field names. Since there is no form to input new data, I edit plain text files and think I know what the fields are called. But defining a bannerImage (instead of banner_image) will not do what I intended, and it will not show any errors either.

Using Schemas for Sta­tic Analy­sis

By writ­ing down the fields used in tem­plates you get the basic struc­ture of your data. Just add a few anno­ta­tions and you have a schema. (Just ask a few ques­tions: “Is it text or a num­ber? Is this field required? How long should this be at most?”)
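A minimal sketch of such a schema, assuming JSON Schema vocabulary for a post and a hand-rolled `validate` function (hypothetical, supporting only string types, `required`, `maxLength` and `additionalProperties`). Setting `additionalProperties: false` is what turns the misspelled bannerImage from a silent no-op into an error; a real project would use a complete JSON Schema validator instead.

```javascript
// Illustrative schema for a post, written in JSON Schema vocabulary.
const postSchema = {
  type: "object",
  properties: {
    title: { type: "string", maxLength: 80 },
    content: { type: "string" },
    banner_image: { type: "string" },
  },
  required: ["title", "content"],
  // Reject fields the templates do not know about, e.g. typos.
  additionalProperties: false,
};

// Hypothetical mini-validator covering only a tiny subset of JSON Schema.
function validate(schema, data) {
  const errors = [];
  for (const field of schema.required || []) {
    if (!(field in data)) errors.push(`missing required field "${field}"`);
  }
  for (const [field, value] of Object.entries(data)) {
    const rule = schema.properties[field];
    if (!rule) {
      if (schema.additionalProperties === false) {
        errors.push(`unknown field "${field}"`);
      }
      continue;
    }
    if (rule.type === "string" && typeof value !== "string") {
      errors.push(`field "${field}" should be a string`);
    } else if (rule.maxLength !== undefined && value.length > rule.maxLength) {
      errors.push(`field "${field}" is longer than ${rule.maxLength} characters`);
    }
  }
  return errors;
}

console.log(validate(postSchema, {
  title: "Lorem ipsum",
  content: "Here is the actual content.",
  bannerImage: "header.jpg", // typo: should be banner_image
}));
// [ 'unknown field "bannerImage"' ]
```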

I’d be skeptical if a SSG used schemas just to check the spelling of field names, though! Since defining and writing schemas takes time, it had better be worth the effort. And there are a lot of other useful features one can get from schemas.

One possibility is to use schemas to confirm that the structure of the input data fits the expectations of a template. Let’s say I have an article schema that requires me to set the fields title, summary, content, and published_at. Given some input data for which this schema is valid, I can safely render this data with any template that expects data matching this same schema[4].
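One way such a static check could work, sketched under assumptions: templates use a made-up `{{name}}` placeholder syntax, the hypothetical `templateFields` collects the fields a template renders unconditionally, and `checkTemplate` reports each one the schema does not require. This also catches the case from footnote 4, an optional `author` field being rendered unconditionally.

```javascript
// Article schema from the text; "author" is defined but optional.
const articleSchema = {
  type: "object",
  properties: {
    title: { type: "string" },
    summary: { type: "string" },
    content: { type: "string" },
    published_at: { type: "string" },
    author: { type: "string" },
  },
  required: ["title", "summary", "content", "published_at"],
};

// Collect the names used in {{name}} placeholders (made-up syntax).
function templateFields(template) {
  const names = new Set();
  for (const match of template.matchAll(/\{\{\s*(\w+)\s*\}\}/g)) {
    names.add(match[1]);
  }
  return [...names];
}

// Every field a template renders unconditionally must be required by the schema.
function checkTemplate(template, schema) {
  const required = new Set(schema.required || []);
  return templateFields(template)
    .filter((name) => !required.has(name))
    .map((name) =>
      name in schema.properties
        ? `"${name}" is optional in the schema but always rendered`
        : `"${name}" is not defined in the schema at all`
    );
}

const articleTemplate = "<h1>{{ title }}</h1><p>by {{ author }}</p>{{ content }}";
console.log(checkTemplate(articleTemplate, articleSchema));
```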

By the way: When I say “schema” in this context, I don’t necessarily mean “database schema”, but rather a construct like JSON Schema, which can also describe nested array types and use references to other schemas. Ideally, this makes it possible to reuse (parts of) a schema across many templates and data types.
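As an illustration, the TYPO3-like page from above can be written as a JSON Schema in which both a (hypothetical) `teaser_picture` field and the pictures inside content elements share one `picture` definition via `$ref`. The `resolveRefs` helper is only a sketch: it inlines local `#/definitions/...` references and handles neither remote references, JSON Pointer escaping, nor recursive schemas, all of which real JSON Schema libraries take care of.

```javascript
const pageSchema = {
  definitions: {
    // Defined once, referenced wherever a picture is needed.
    picture: {
      type: "object",
      properties: { url: { type: "string" } },
      required: ["url"],
    },
  },
  type: "object",
  properties: {
    title: { type: "string" },
    url: { type: "string" },
    teaser_picture: { $ref: "#/definitions/picture" }, // hypothetical field
    content: {
      type: "array",
      items: {
        type: "object",
        properties: {
          headline: { type: "string" },
          text: { type: "string" },
          pictures: { type: "array", items: { $ref: "#/definitions/picture" } },
        },
      },
    },
  },
};

// Replace local {"$ref": "#/definitions/..."} nodes with the referenced schema.
// Sketch only: no remote refs, no pointer escaping, no cycle detection.
function resolveRefs(node, root = node) {
  if (Array.isArray(node)) return node.map((item) => resolveRefs(item, root));
  if (node === null || typeof node !== "object") return node;
  if (typeof node.$ref === "string" && node.$ref.startsWith("#/")) {
    const target = node.$ref
      .slice(2)
      .split("/")
      .reduce((obj, key) => obj[key], root);
    return resolveRefs(target, root);
  }
  const result = {};
  for (const [key, value] of Object.entries(node)) {
    result[key] = resolveRefs(value, root);
  }
  return result;
}

const resolved = resolveRefs(pageSchema);
console.log(resolved.properties.content.items.properties.pictures.items.required);
```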

Using Schemas for Cre­at­ing Data

So far, we’ve only seen how we can use schemas to val­i­date input data. But what if we could also use the same schemas to eas­ily cre­ate new data?

Get a Style Guide for Free

With a tool like Faker you can easily generate fake values of various types, from classic Lorem ipsum (of random length) to complete addresses in France. The next step is to not just create one value, but one for each field in a schema. Luckily, this is exactly what JSON Schema Faker does.
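JSON Schema Faker does this properly. To show just the underlying idea, here is a tiny generator (not JSON Schema Faker’s API) that walks a schema and produces a random value for every property. The word list and the handful of supported keywords (`type`, `properties`, `items`, `maxLength`, `minimum`, `maximum`) are assumptions of this sketch.

```javascript
// Hypothetical word list standing in for a real fake-data library.
const WORDS = ["lorem", "ipsum", "dolor", "sit", "amet", "consectetur", "adipiscing"];

// Random integer in [min, max], both inclusive.
function randomInt(min, max) {
  return min + Math.floor(Math.random() * (max - min + 1));
}

// Walk the schema and produce a random value that matches it.
function fake(schema) {
  switch (schema.type) {
    case "object": {
      const result = {};
      for (const [key, sub] of Object.entries(schema.properties || {})) {
        result[key] = fake(sub);
      }
      return result;
    }
    case "array":
      return Array.from({ length: randomInt(1, 3) }, () => fake(schema.items));
    case "integer":
      return randomInt(schema.minimum ?? 0, schema.maximum ?? 100);
    case "boolean":
      return Math.random() < 0.5;
    case "string": {
      const words = Array.from(
        { length: randomInt(1, 8) },
        () => WORDS[randomInt(0, WORDS.length - 1)]
      );
      // Respect maxLength by cutting the generated text if needed.
      return words.join(" ").slice(0, schema.maxLength ?? Infinity);
    }
    default:
      throw new Error(`Cannot fake values of type ${schema.type}`);
  }
}

const teaserSchema = {
  type: "object",
  properties: {
    headline: { type: "string", maxLength: 20 },
    rating: { type: "integer", minimum: 1, maximum: 5 },
    tags: { type: "array", items: { type: "string" } },
  },
};
console.log(fake(teaserSchema));
```

Rendering every component with such generated data, over and over, is what would turn the templates into the interactive style guide described next.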

Remember: Template inputs are defined as schemas, too, and now we have an easy way to generate data for them. So let’s take our templates, partials, and components and render them with random data!

To me, this sounds like an ideal (and inter­ac­tive) style guide.

Edit Real Data

Still, it gets bet­ter. What good are your tem­plates and style guides when you don’t have any real data to use them for?

The same schemas that define require­ments for input data can (to a point) also be used to gen­er­ate forms to enter just such data. So, while part of the charm of using a SSG is the abil­ity to write every­thing as plain files, you don’t have to give up some niceties of the form-based approach just yet.
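A minimal sketch of the form side, assuming flat object schemas: the hypothetical `renderForm` turns each property into an input element, marks required fields, copies `maxLength` into a `maxlength` attribute and escapes existing values. The `x-widget` keyword used to request a textarea is made up for this example. JSON Editor and Alpaca.js handle far more, including nested objects and arrays.

```javascript
// Escape text for use inside HTML attribute values and element content.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical form generator for flat object schemas.
function renderForm(schema, data = {}) {
  const required = new Set(schema.required || []);
  const fields = Object.entries(schema.properties).map(([name, prop]) => {
    const attrs = [`name="${escapeHtml(name)}"`];
    if (required.has(name)) attrs.push("required");
    const value = data[name];
    let input;
    if (prop.type === "boolean") {
      input = `<input type="checkbox" ${attrs.join(" ")}${value ? " checked" : ""}>`;
    } else if (prop.type === "integer" || prop.type === "number") {
      input = `<input type="number" ${attrs.join(" ")} value="${escapeHtml(value ?? "")}">`;
    } else if (prop["x-widget"] === "textarea") {
      input = `<textarea ${attrs.join(" ")}>${escapeHtml(value ?? "")}</textarea>`;
    } else {
      if (prop.maxLength !== undefined) attrs.push(`maxlength="${prop.maxLength}"`);
      input = `<input type="text" ${attrs.join(" ")} value="${escapeHtml(value ?? "")}">`;
    }
    return `<label>${escapeHtml(prop.title || name)} ${input}</label>`;
  });
  return `<form method="post">\n${fields.join("\n")}\n</form>`;
}

const pageFormSchema = {
  type: "object",
  properties: {
    title: { type: "string", maxLength: 80 },
    content: { type: "string", "x-widget": "textarea" },
    is_draft: { type: "boolean" },
  },
  required: ["title"],
};
console.log(renderForm(pageFormSchema, { title: 'About "us"', is_draft: true }));
```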

A small server that ren­ders a page with JSON Edi­tor or Alpaca.js and that allows you to sub­mit changes might be enough to get started. (It’ll be harder to add image uploads, redi­rects, and cross-ref­er­ences between pages.)

Sil­i­con Zuc­chini

I have attempted to implement everything described so far in a project code-named[5] Silicon Zucchini. It is written in JavaScript and uses a stream-oriented approach for compiling files[6].

Right now, this is only a prototype. You can find the code on GitHub.

Data and template schema validation work. Some nice features like mapping data files to URIs and support for various input types (Markdown, JSON, CSON, YAML) are also available.
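Mapping data files to URIs could follow a convention like the one sketched below. The convention is an assumption, not necessarily Silicon Zucchini’s: content lives in a `content/` directory, the file extension is dropped, `index` files stand for their directory, and every URI ends with a slash, so `content/about.md` becomes `/about/`.

```javascript
const path = require("path");

// Hypothetical convention: content/<dir>/<name>.<ext>  ->  /<dir>/<name>/
function fileToUri(filePath, contentDir = "content") {
  // Work with forward slashes regardless of the operating system.
  const normalized = filePath.split(path.sep).join("/");
  const relative = path.posix.relative(contentDir, normalized);
  if (relative === ".." || relative.startsWith("../")) {
    throw new Error(`${filePath} is outside of ${contentDir}/`);
  }
  const parsed = path.posix.parse(relative);
  const segments = parsed.dir === "" ? [] : parsed.dir.split("/");
  // "index" files represent their directory itself.
  if (parsed.name !== "index") segments.push(parsed.name);
  return segments.length === 0 ? "/" : `/${segments.join("/")}/`;
}

for (const file of [
  "content/index.md",
  "content/about.md",
  "content/blog/2016/hello.yaml",
  "content/blog/index.json",
]) {
  console.log(file, "->", fileToUri(file));
}
```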

Addi­tional Fea­tures

Aside from the schema-based features described above, Silicon Zucchini should offer the following things.

Easy Ver­sion Con­trol

Put all your files (Sil­i­con Zuc­chini’s input data) in a Git repos­i­tory and you’ll be able to repro­duce every ver­sion of your page that ever existed. My goal is to cre­ate a default set of build steps that every Sil­i­con Zuc­chini pro­ject can use.

While future ver­sions may add more opti­miza­tions, build­ing the same data and tem­plates with a fixed ver­sion of Sil­i­con Zuc­chini should always gen­er­ate the same out­put — the build should be deter­min­is­tic.
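One cheap way to test that property, sketched with a hypothetical `build` step: process inputs in a fixed (sorted) order and hash all output paths and contents with Node’s built-in `crypto` module. Two builds of the same input should produce the same digest, even when the files were read in a different order.

```javascript
const crypto = require("crypto");

// Hypothetical build step: turns {path: source} into {outputPath: html}.
function build(inputs) {
  const output = {};
  // Sorting makes the result independent of the order files were read in.
  for (const file of Object.keys(inputs).sort()) {
    output[file.replace(/\.md$/, ".html")] = `<main>${inputs[file]}</main>`;
  }
  return output;
}

// Hash paths and contents of all output files in a stable order.
// The "\0" separators keep "ab" + "c" distinct from "a" + "bc".
function digest(output) {
  const hash = crypto.createHash("sha256");
  for (const file of Object.keys(output).sort()) {
    hash.update(file);
    hash.update("\0");
    hash.update(output[file]);
    hash.update("\0");
  }
  return hash.digest("hex");
}

const first = digest(build({ "a.md": "A", "b.md": "B" }));
const second = digest(build({ "b.md": "B", "a.md": "A" }));
console.log(first === second ? "deterministic" : "output differs");
```

Running such a comparison in CI, for example by building the same commit twice, would flag any step that sneaks in timestamps or random values.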

Has­sle-free Orga­ni­za­tion Of Data, Schemas And Tem­plates

There are three ingredients to each Silicon Zucchini site: your content, schemas describing the structure of this content, and templates to generate a website from this. Silicon Zucchini will make good suggestions about where to put these files.

The “tem­plate files” are at their core just sim­ple HTML files con­tain­ing a few place­hold­ers that, when ren­dered, return com­plete HTML pages. They can use cus­tom com­po­nents (“par­tial” tem­plates with their own input schemas) and require exter­nal files. Sil­i­con Zuc­chini will track which files are required by the tem­plates you use and auto­mat­i­cally include those in its build process.
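To illustrate both parts, here is a sketch under assumptions: a made-up `{{name}}` placeholder syntax for escaped values, and dependency tracking that simply collects local paths from `src` and `href` attributes. Silicon Zucchini’s actual template language and tracking may well differ; `renderTemplate` and `requiredFiles` are hypothetical names.

```javascript
// Escape a value before inserting it into HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Replace {{name}} placeholders with escaped values from `data`.
function renderTemplate(template, data) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (_, name) => {
    if (!(name in data)) throw new Error(`Template uses missing field "${name}"`);
    return escapeHtml(data[name]);
  });
}

// Collect local files referenced via src="..." or href="..." so the build can include them.
function requiredFiles(template) {
  const files = new Set();
  for (const [, url] of template.matchAll(/\b(?:src|href)="([^"]+)"/g)) {
    // Skip absolute URLs, protocol-relative URLs, fragments and placeholders.
    if (!/^([a-z]+:|\/\/|#|\{\{)/i.test(url)) files.add(url);
  }
  return [...files].sort();
}

const layout = `<!doctype html>
<link rel="stylesheet" href="styles/main.css">
<script src="https://example.com/analytics.js"></script>
<h1>{{ title }}</h1>
<img src="images/logo.svg" alt="">
<a href="#top">Top</a>`;

console.log(renderTemplate(layout, { title: "Fish & Chips" }).includes("Fish &amp; Chips"));
console.log(requiredFiles(layout));
```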

Com­po­nents should be reusable between sites as they very specif­i­cally define what they need. It will be a chal­lenge to com­bine this goal with some­thing like global set­tings for sig­nif­i­cant col­ors and font styles, though. I expect this to evolve over time.

Future versions might also allow you to define (or to automatically determine) multiple entry points so that a build might result in multiple JavaScript/CSS bundles which are only included where needed. (No need to load the code to render a 3D WebGL map on the start page!)

Full-page Opti­miza­tions

Since the build process sees every page in full, you can, for example, include the complete CSS of Twitter Bootstrap and then run uncss as part of your build process to collect just the parts you actually use. Going one step further, you could use penthouse to automatically extract the “critical path CSS” for every page you have, not just the start page.

Speak­ing of sta­tic files, in your build process you can deter­mine every file you require, from CSS, JavaScript and font files used in your tem­plates to the images ref­er­enced in your con­tent. Then, you can eas­ily opti­mize them, add hashes to the file­names, and let browsers cache them indef­i­nitely.
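Adding a content hash to a file name could look like this sketch using Node’s built-in `crypto` module; the `name.hash.ext` pattern and the eight-character hash length are arbitrary choices. Because the name changes whenever the content changes, the file can be served with a far-future cache header.

```javascript
const crypto = require("crypto");
const path = require("path");

// assets/style.css + contents -> assets/style.<first 8 hex chars of SHA-256>.css
function hashedFileName(fileName, contents) {
  const hash = crypto.createHash("sha256").update(contents).digest("hex").slice(0, 8);
  const { dir, name, ext } = path.posix.parse(fileName);
  return path.posix.join(dir, `${name}.${hash}${ext}`);
}

const css = Buffer.from("body { color: #333; }");
console.log(hashedFileName("assets/style.css", css));
```

The build would then keep a mapping from `assets/style.css` to the hashed name and rewrite references in the rendered pages accordingly.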

Alternatively, you could also embed some files in other resources. E.g., put small images as data URIs in your stylesheets (using base64) or inline SVG images directly in your HTML files.
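Building such a data URI takes little more than Node’s `Buffer`. The sketch assumes the MIME type can be looked up from the file extension (in a tiny, incomplete table) and uses a hypothetical 4 KB limit to decide which files count as small.

```javascript
// Tiny, incomplete extension -> MIME type table for this sketch.
const MIME_TYPES = { ".png": "image/png", ".gif": "image/gif", ".svg": "image/svg+xml" };

// Hypothetical threshold: only inline files up to this many bytes.
const MAX_INLINE_BYTES = 4096;

function toDataUri(fileName, contents) {
  const bytes = Buffer.from(contents);
  const ext = fileName.slice(fileName.lastIndexOf(".")).toLowerCase();
  const mime = MIME_TYPES[ext];
  // Unknown types and large files stay separate files (signalled by null).
  if (!mime || bytes.length > MAX_INLINE_BYTES) return null;
  return `data:${mime};base64,${bytes.toString("base64")}`;
}

const svg = '<svg xmlns="http://www.w3.org/2000/svg"/>';
console.log(`.icon { background-image: url("${toDataUri("icon.svg", svg)}"); }`);
```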

You Can Eas­ily Batch Changes

While this is possible with a CMS, it is trivial with a SSG. You can just edit your content, run the build process and preview your complete site locally. When all changes are done and you like the result, you can update your live site.

The same build process could also run on a server. Aside from the live site (the “production environment”), you could add “preview” instances (virtual hosts). One could even have a public server running the “backend” described above (the forms generated from the schemas and the ability to save files) and regenerate the preview site whenever a file changes.

Great Error Report­ing

Well, I guess this goes hand-in-hand with using data schemas. There are a lot of things that might go wrong when com­pil­ing your input data into a full-fledged web­site, and it will be impor­tant to tell users exactly where the error comes from and how to fix it.
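As a sketch of what telling users exactly where an error comes from could mean: if each validation error carries a path to the offending value, the message can name the file and the field to fix. `findErrors` and `formatError` are hypothetical helpers that support only a few JSON Schema keywords (`type`, `required`, `properties`, `items`).

```javascript
// JSON-Schema-style type name of a JavaScript value.
function typeOf(value) {
  if (Array.isArray(value)) return "array";
  if (value === null) return "null";
  return typeof value; // "object", "string", "number", "boolean", ...
}

// Hypothetical recursive check that records where each problem was found.
function findErrors(schema, value, path = []) {
  const errors = [];
  const actual = typeOf(value);
  const typeMatches =
    !schema.type ||
    schema.type === actual ||
    (schema.type === "integer" && Number.isInteger(value));
  if (!typeMatches) {
    errors.push({ path, message: `expected ${schema.type}, got ${actual}` });
    return errors;
  }
  if (actual === "object") {
    for (const field of schema.required || []) {
      if (!(field in value)) errors.push({ path, message: `missing required field "${field}"` });
    }
    for (const [key, sub] of Object.entries(schema.properties || {})) {
      if (key in value) errors.push(...findErrors(sub, value[key], [...path, key]));
    }
  } else if (actual === "array" && schema.items) {
    value.forEach((item, index) => {
      errors.push(...findErrors(schema.items, item, [...path, index]));
    });
  }
  return errors;
}

// Turn an error into a message that names the file and the exact location.
function formatError(file, { path, message }) {
  const location = path.length ? path.join(".") : "(top level)";
  return `${file}: ${location}: ${message}`;
}

const schema = {
  type: "object",
  required: ["title"],
  properties: {
    title: { type: "string" },
    content: {
      type: "array",
      items: { type: "object", required: ["text"], properties: { text: { type: "string" } } },
    },
  },
};
const data = { content: [{ text: "ok" }, { text: 42 }] };
for (const error of findErrors(schema, data)) {
  console.log(formatError("content/about.yaml", error));
}
```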

Some com­mon cases might be:

Appen­dix A: Exist­ing Schemas

The main task that Sil­i­con Zuc­chini adds to your exist­ing devel­op­ment work­flow is the cre­ation of schemas. I could argue that each web site already has an implicit data schema and that ide­ally the pos­si­ble val­ues of inputs were already a con­cern when cre­at­ing the site’s design and infor­ma­tion struc­ture. It is still more work to explic­itly write that schema down (in a for­mat that you may need to learn first[7]).

An easy way to reduce some of that time would be to have a col­lec­tion of already well defined schemas for sev­eral data types, e.g. sim­ple blog arti­cles, images (with file type and dimen­sions), etc. – sim­i­lar to what schema.org does for RDF Schemas[8]. The only col­lec­tion of JSON schemas I could find so far was “ans-schema” by The Wash­ing­ton Post. (There is also schema­s­tore.org, but it seems to be focused on con­fig­u­ra­tion files in JSON.)

I’ll make sure that any Silicon Zucchini “starter kit” I publish will include several useful default schemas for you to extend and base your own schemas on. E.g., a schema for simple pages with title, content, and slug (URL) and another schema for news/blog entries which also includes publication_date and is_draft.

Appen­dix B: Prior Art

I’ve been look­ing for sim­i­lar pro­jects every now and then. (Mostly with the hope that some­one with more time than me had already imple­mented some­thing I could use!) Here is what I have found so far:


  1. I’ve previously hinted at some of them in the context of some experiments.

  2. That’s not universally true, of course. Every CMS can theoretically also be used as a SSG. And nothing prevents someone from writing a nice edit interface for a SSG.

  3. Text files that start with a block of YAML code (between two lines of 3 dashes ---) have the data defined in the YAML asso­ci­ated with them. Have a look at the Jekyll doc­u­men­ta­tion.

  4. Another pos­si­ble step of sta­tic analy­sis: A tem­plate that requires data in the article schema described above may not uncon­di­tion­ally ren­der the author field since that field is not required and may be empty.

  5. Gen­er­ated with my trusty code name gen­er­a­tor.

  6. I might talk about tech­ni­cal details in a later post. I’ll also prob­a­bly rewrite the entire thing before it becomes pro­duc­tion-ready.

  7. Sil­i­con Zuc­chini could also val­i­date that the schemas you have writ­ten are in fact valid schemas: The JSON Schema spec con­tains a JSON Schema for val­i­dat­ing JSON Schemas.

  8. Speak­ing of schema.org: It may be use­ful to add some anno­ta­tions used by JSON-LD to a JSON Schema to gen­er­ate alter­na­tive rep­re­sen­ta­tions of the data. This could also guide the devel­op­ment of tem­plates so they include the cor­rect Micro­data markup.