Today, I’d like to present some ideas about how to generate (static) web sites aided by data schemas[1]. This post will first talk a bit about the background of Content Management Systems and Static Site Generators, then describe my motivation for developing a new Static Site Generator and conclude with what I want to achieve with this project.
Content Management Systems and Static Site Generators
First, let’s talk a bit about Content Management Systems (CMS) and Static Site Generators (SSG). While the former can be configured to allow non-technical people to easily update the contents of a web page, the latter are mostly written to be used by other developers[2].
Let’s get something out of the way first, though. There is one feature that a CMS gives you that is basically the opposite of what SSGs can offer: dynamic content (of course). You should really use a CMS if you want a site where visitors can
- add comments (without using external services),
- create accounts to access restricted content,
- or search and filter your site’s large database quickly (i.e., on the server and without pre-rendering every possible view).
If you want any of that, SSGs won’t be an option for you. A lot of times, though, people won’t need these features and still use a CMS for convenience.
Dynamic But Fixed: CMS
Let’s think about how the data of a web site is structured. Most CMSs use relational databases in which they save a list of page contents mapped to pages.
An easy example is the “page” feature of WordPress. The pages you add are basically just records with a title (in plain text), a content field (in HTML) and a URL (relative to the base, e.g. /about/). There may also be additional fields for categories and authors (and probably some data about embedded media files), but they are optional. This schema can be written like this:
{title: "string", content: "string", url: "string"}
TYPO3 (a larger CMS) is based on the construct that a page contains various content elements (e.g., rich text with headline, text and images, custom forms). You can imagine it like this:
{
  title: "string",
  url: "string",
  content: [{
    headline: "string",
    text: "string",
    pictures: [{url: "string"}]
  }]
}
Adjusting these schemas to a user’s liking is one of the most time-consuming tasks when configuring a CMS. Even then, it can be difficult to anticipate all possible use cases. Sometimes, it’s easier to have a convention (“always start pages with a big image”) instead of enforcing this in the CMS.
Do What You Want: SSG
In contrast to these strict database schemas, popular Static Site Generators like Jekyll or Metalsmith don’t require you to define any such structure. Jekyll, for instance, treats every file with a “YAML front matter”[3] as something it should transform in one way or the other.
A file with front matter might look like this:
---
title: Lorem ipsum
author: Pascal
---
Here is the actual content of this article.
This front matter may contain arbitrary structures. While your "posts" may always have a title field (the content after the front matter is represented as a content field), some of them could also include banner_image, author or even layout fields. These fields can be accessed in templates just like title and content and be used to customize the appearance of a page ad hoc.
The schema-less nature of SSGs can lead to some problems, though. One of the things I always get wrong is field names. Since there is no form to input new data, I edit plain text files and think I know what the fields are called. But defining a bannerImage (instead of banner_image) will not result in what I intended. It will not show any errors, either.
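A schema makes this kind of mistake detectable. The following is only a minimal sketch: it assumes a schema shaped like a JSON Schema object with a properties map, and both the helper name findUnknownFields and the rule for guessing the intended field are hypothetical, not taken from Jekyll or any other tool.

```javascript
// Reduce a field name to lowercase letters and digits so that
// "bannerImage" and "banner_image" compare as equal.
function normalize(name) {
  return name.toLowerCase().replace(/[^a-z0-9]/g, "");
}

// Hypothetical helper: list front matter fields the schema does not know
// and suggest a known field if only case or separators differ.
function findUnknownFields(frontMatter, schema) {
  const known = Object.keys(schema.properties);
  const problems = [];
  for (const field of Object.keys(frontMatter)) {
    if (known.includes(field)) continue;
    const suggestion = known.find((k) => normalize(k) === normalize(field)) || null;
    problems.push({ field, suggestion });
  }
  return problems;
}

const postSchema = {
  properties: {
    title: { type: "string" },
    author: { type: "string" },
    banner_image: { type: "string" },
  },
};

const problems = findUnknownFields(
  { title: "Lorem ipsum", bannerImage: "header.jpg" },
  postSchema
);
console.log(problems);
```

A build step could turn each entry into a warning along the lines of “unknown field bannerImage, did you mean banner_image?”.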
Using Schemas for Static Analysis
By writing down the fields used in templates you get the basic structure of your data. Just add a few annotations and you have a schema. (Just ask a few questions: “Is it text or a number? Is this field required? How long should this be at most?”)
I’d be skeptical if a SSG was using schemas just to check the spelling of field names, though! Since defining and writing schemas takes time, it had better be worth the effort. And there are a lot of other useful features one can get from schemas.
One possibility is to use schemas to confirm that the structure of the input data fits the expectations of a template. Let’s say I have an article schema that requires me to set the fields title, summary, content, and published_at. Given input data that is valid against this schema, I can safely render it with any template whose input is described by the same schema[4].
By the way: When I say “schema” in this context, I don’t necessarily mean “database schema”, but rather a construct like JSON Schema which can also describe nested array types and use references to other schemas. Ideally, this can be a way to re-use (parts of) a schema in many templates and different data types.
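As an illustration, here is how the article schema from above might look in JSON Schema notation, together with a deliberately tiny checker. The checker only understands required and string-typed properties, so it is a stand-in for a real JSON Schema validator rather than a replacement. The date-time format annotation, the optional author field's type, and the name checkData are assumptions.

```javascript
// Sketch of the article schema (required fields taken from the text above).
const articleSchema = {
  type: "object",
  required: ["title", "summary", "content", "published_at"],
  properties: {
    title: { type: "string" },
    summary: { type: "string" },
    content: { type: "string" },
    published_at: { type: "string", format: "date-time" }, // assumed format
    author: { type: "string" }, // optional field
  },
};

// Minimal stand-in for a validator: checks required fields and string types only.
function checkData(data, schema) {
  const errors = [];
  for (const field of schema.required || []) {
    if (!(field in data)) errors.push(`missing required field "${field}"`);
  }
  for (const [field, rule] of Object.entries(schema.properties || {})) {
    if (field in data && rule.type === "string" && typeof data[field] !== "string") {
      errors.push(`field "${field}" should be a string`);
    }
  }
  return errors;
}

const errors = checkData(
  { title: "Hello", content: "Some text", published_at: 2016 },
  articleSchema
);
console.log(errors);
```

The same check can run twice: once against each data file, and once against the data a template is about to receive.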
Using Schemas for Creating Data
So far, we’ve only seen how we can use schemas to validate input data. But what if we could also use the same schemas to easily create new data?
Get a Style Guide for Free
With a tool like Faker you can easily generate fake values of various types, from classic Lorem ipsum (of random length) to complete addresses in France. The next step is to not just create one value, but one for each field in a schema. Luckily, this is exactly what JSON Schema Faker does.
Remember: Template inputs are defined as schemas, too, and now we have an easy way to generate data for them. So, let’s take our templates, partials, components and render them with random data!
To me, this sounds like an ideal (and interactive) style guide.
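To make the idea concrete, the sketch below walks a schema and produces one placeholder value per field. It does not use JSON Schema Faker's API. It is a simplified, deterministic stand-in that covers only strings, numbers, booleans, arrays and objects, and sampleFor is a hypothetical name.

```javascript
// Produce a placeholder value for a (simplified) JSON Schema node.
function sampleFor(schema) {
  switch (schema.type) {
    case "string":
      return "Lorem ipsum";
    case "number":
    case "integer":
      return 42;
    case "boolean":
      return true;
    case "array":
      // One example item is enough to exercise a template's loop.
      return [sampleFor(schema.items)];
    case "object": {
      const result = {};
      for (const [name, sub] of Object.entries(schema.properties || {})) {
        result[name] = sampleFor(sub);
      }
      return result;
    }
    default:
      return null; // unsupported keyword or missing type
  }
}

// The page-with-content-elements structure from earlier, written as a schema.
const pageSchema = {
  type: "object",
  properties: {
    title: { type: "string" },
    url: { type: "string" },
    content: {
      type: "array",
      items: {
        type: "object",
        properties: {
          headline: { type: "string" },
          text: { type: "string" },
        },
      },
    },
  },
};

const sample = sampleFor(pageSchema);
console.log(JSON.stringify(sample, null, 2));
```

Feeding the result into every template and component yields a page of rendered examples, which is essentially the style guide described above.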
Edit Real Data
Still, it gets better. What good are your templates and style guides when you don’t have any real data to use them for?
The same schemas that define requirements for input data can (to a point) also be used to generate forms to enter just such data. So, while part of the charm of using a SSG is the ability to write everything as plain files, you don’t have to give up some niceties of the form-based approach just yet.
A small server that renders a page with JSON Editor or Alpaca.js and that allows you to submit changes might be enough to get started. (It’ll be harder to add image uploads, redirects, and cross-references between pages.)
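Independent of JSON Editor or Alpaca.js, the core of such a form generator is a mapping from schema properties to input elements. This sketch handles flat schemas with string and boolean fields only, and renderForm and escapeHtml are illustrative names, not part of either library.

```javascript
// Escape text so it can be placed safely inside HTML attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Hypothetical helper: one <input> per schema property (flat schemas only).
function renderForm(schema, data = {}) {
  const required = schema.required || [];
  const fields = Object.entries(schema.properties).map(([name, rule]) => {
    const safeName = escapeHtml(name);
    if (rule.type === "boolean") {
      const checked = data[name] ? " checked" : "";
      return `<label><input type="checkbox" name="${safeName}"${checked}> ${safeName}</label>`;
    }
    const value = escapeHtml(data[name] === undefined ? "" : data[name]);
    const req = required.includes(name) ? " required" : "";
    return `<label>${safeName} <input type="text" name="${safeName}" value="${value}"${req}></label>`;
  });
  return `<form method="post">\n${fields.join("\n")}\n<button type="submit">Save</button>\n</form>`;
}

const entrySchema = {
  required: ["title"],
  properties: {
    title: { type: "string" },
    is_draft: { type: "boolean" },
  },
};

const html = renderForm(entrySchema, { title: 'A "quoted" title', is_draft: true });
console.log(html);
```

Because the form is derived from the schema, required fields and types stay in sync with what the templates expect.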
Silicon Zucchini
Everything I’ve described so far I attempted to implement in a project under the code name[5] Silicon Zucchini. It is written in JavaScript and uses a stream-oriented approach for compiling files[6].
Right now, this is only a prototype. You can find the code on GitHub.
Data and template schema validation work. Some nice features like mapping data files to URIs and support for various input types (Markdown, JSON, CSON, YAML) are also available.
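To give an idea of what “mapping data files to URIs” can mean, here is one possible convention. This is an assumption made for illustration and not necessarily the rule the prototype implements: strip the content directory and the file extension, and treat index files as the directory itself. The function name fileToUri and the directory name content are hypothetical.

```javascript
// Hypothetical mapping from a data file path to the URI it is published under.
function fileToUri(path, contentDir = "content") {
  const prefix = contentDir + "/";
  const relative = path.startsWith(prefix) ? path.slice(prefix.length) : path;
  // Drop the extension of any supported input type.
  const withoutExt = relative.replace(/\.(md|json|cson|ya?ml)$/i, "");
  const segments = withoutExt.split("/").filter((s) => s !== "");
  // An "index" file represents its parent directory.
  if (segments[segments.length - 1] === "index") segments.pop();
  return segments.length === 0 ? "/" : "/" + segments.join("/") + "/";
}

console.log(fileToUri("content/index.md"));
console.log(fileToUri("content/about.md"));
console.log(fileToUri("content/blog/hello-world.yaml"));
```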
Additional Features
Aside from the schema-based features described above, Silicon Zucchini should offer the following things.
Easy Version Control
Put all your files (Silicon Zucchini’s input data) in a Git repository and you’ll be able to reproduce every version of your page that ever existed. My goal is to create a default set of build steps that every Silicon Zucchini project can use.
While future versions may add more optimizations, building the same data and templates with a fixed version of Silicon Zucchini should always generate the same output — the build should be deterministic.
Hassle-free Organization Of Data, Schemas And Templates
There are three ingredients to each Silicon Zucchini site: your content, schemas describing the structure of this content, and templates to generate a website from this. Silicon Zucchini will make good suggestions where to put these files.
The “template files” are at their core just simple HTML files containing a few placeholders that, when rendered, return complete HTML pages. They can use custom components (“partial” templates with their own input schemas) and require external files. Silicon Zucchini will track which files are required by the templates you use and automatically include those in its build process.
Components should be reusable between sites as they very specifically define what they need. It will be a challenge to combine this goal with something like global settings for significant colors and font styles, though. I expect this to evolve over time.
Future versions might also allow you to define (or to automatically determine) multiple entry points so that a build might result in multiple JavaScript/CSS bundles which are only included where needed. (No need to load the code to render a 3D WebGL map on the start page!)
Full-page Optimizations
For example, feel free to include the complete CSS of Twitter Bootstrap and then run uncss as part of your build process to collect just the parts you actually use. Going one step further, you could use penthouse to automatically extract the "critical path CSS" for every page you have – not just the start page.
Speaking of static files, in your build process you can determine every file you require, from CSS, JavaScript and font files used in your templates to the images referenced in your content. Then, you can easily optimize them, add hashes to the filenames, and let browsers cache them indefinitely.
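Adding a content hash to a filename can be sketched as follows. I use a small FNV-1a hash here only to keep the example free of dependencies. A real build would more likely use a cryptographic hash such as SHA-256. Both fnv1a and hashedName are hypothetical helpers.

```javascript
// 32-bit FNV-1a over the string's UTF-16 code units, returned as 8 hex digits.
// Stand-in for a proper content hash.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Insert the hash of the content before the file extension.
function hashedName(filename, content) {
  const hash = fnv1a(content);
  const dot = filename.lastIndexOf(".");
  const slash = filename.lastIndexOf("/");
  if (dot <= slash) return `${filename}.${hash}`; // no extension in the last segment
  return `${filename.slice(0, dot)}.${hash}${filename.slice(dot)}`;
}

console.log(hashedName("css/site.css", "body { margin: 0; }"));
```

Since the name changes whenever the content changes, the file can be served with a far-future cache lifetime, and every reference to it is rewritten during the build.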
Alternatively, you could also embed some files in other resources. E.g., put small images as data URIs in your stylesheets (using base64) or include SVG images directly into your HTML files.
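Embedding a small image as a data URI takes little more than a base64 encoding step. The sketch below assumes Node.js (for the global Buffer) and an SVG input, and svgToDataUri is a hypothetical helper name.

```javascript
// Encode an SVG string as a base64 data URI that can be used in url(...) or src="...".
function svgToDataUri(svg) {
  return "data:image/svg+xml;base64," + Buffer.from(svg, "utf8").toString("base64");
}

const icon = '<svg xmlns="http://www.w3.org/2000/svg" width="8" height="8"><circle cx="4" cy="4" r="4"/></svg>';
const uri = svgToDataUri(icon);
console.log(`.bullet { background-image: url("${uri}"); }`);
```

Base64 makes the payload roughly a third larger than the original file, so this only pays off for small images.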
You Can Easily Batch Changes
While possible with a CMS, this is trivial with a SSG. You can just edit your content, run the build process and preview your complete site locally. When all changes are done and you like the result, you can update your live site.
The same build process could also run on a server. Aside from the live site (the “production environment”), you could add “preview” instances (virtual hosts). One could even have a public server running the "backend" described above (the forms generated from the schemas and the ability to save files) and regenerate the preview site when a file was changed.
Great Error Reporting
Well, I guess this goes hand-in-hand with using data schemas. There are a lot of things that might go wrong when compiling your input data into a full-fledged website, and it will be important to tell users exactly where the error comes from and how to fix it.
Some common cases might be:
- A data file is not valid for the given schema. Tell the user which field was missing/wrong in which input file and which schema file was used. (The challenge: Schemas can contain references to other schemas.)
- A template could not be rendered, because it was given invalid data. These errors should be as clear as the ones given for invalid data files.
- A template could not be rendered because its code is not valid (either a parse error or a JavaScript error when executing the template). This should output something like a stack trace with file names and line numbers.
- A static file was referenced but does not exist. Since we know each output file, we can validate all relative links in our files (as one of the last steps in the build process). This should show where a file was requested and under which absolute path we tried to find it.
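The last case can be sketched as a pure function over the set of generated files. The input shape (a map from each output path to the links found in that file) and the name findBrokenLinks are assumptions for illustration. Relative links are resolved with the standard URL class against a dummy host.

```javascript
// Report every link that does not point to a generated file.
function findBrokenLinks(outputFiles) {
  const existing = new Set(Object.keys(outputFiles));
  const broken = [];
  for (const [page, links] of Object.entries(outputFiles)) {
    for (const link of links) {
      // Resolve the relative link against the path of the page that contains it.
      const resolved = new URL(link, "http://localhost" + page).pathname;
      if (!existing.has(resolved)) {
        broken.push({ requestedIn: page, link, resolved });
      }
    }
  }
  return broken;
}

const site = {
  "/index.html": ["css/site.css", "about/index.html"],
  "/about/index.html": ["../css/site.css", "team.jpg"],
  "/css/site.css": [],
};

for (const b of findBrokenLinks(site)) {
  console.log(`${b.requestedIn}: "${b.link}" not found (looked for ${b.resolved})`);
}
```

Each entry carries both the file that requested the link and the absolute path that was tried, which is the information the error message needs.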
Appendix A: Existing Schemas
The main task that Silicon Zucchini adds to your existing development workflow is the creation of schemas. I could argue that each web site already has an implicit data schema and that ideally the possible values of inputs were already a concern when creating the site’s design and information structure. It is still more work to explicitly write that schema down (in a format that you may need to learn first[7]).
An easy way to reduce some of that time would be to have a collection of already well defined schemas for several data types, e.g. simple blog articles, images (with file type and dimensions), etc. – similar to what schema.org does for RDF Schemas[8]. The only collection of JSON schemas I could find so far was “ans-schema” by The Washington Post. (There is also schemastore.org, but it seems to be focused on configuration files in JSON.)
I’ll make sure that any Silicon Zucchini “starter kit” I publish will include several useful default schemas for you to extend and base your own schemas on. E.g., a schema for simple pages with title, content, and slug (URL) and another schema for news/blog entries which also includes publication_date and is_draft.
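Such default schemas could look roughly like this. The field names come from the paragraph above. The file names, the choice of which fields are required, and the date format are my assumptions, and the entry schema reuses the page schema through JSON Schema's allOf and $ref keywords.

```javascript
// Hypothetical starter-kit schemas, keyed by an assumed file name.
const starterSchemas = {
  "page.json": {
    type: "object",
    required: ["title", "content", "slug"], // assumed
    properties: {
      title: { type: "string" },
      content: { type: "string" },
      slug: { type: "string" }, // used to build the URL
    },
  },
  "entry.json": {
    // Everything a page has, plus publication metadata.
    allOf: [{ $ref: "page.json" }],
    required: ["publication_date"], // assumed
    properties: {
      publication_date: { type: "string", format: "date" }, // assumed format
      is_draft: { type: "boolean", default: false },
    },
  },
};

console.log(Object.keys(starterSchemas));
```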
Appendix B: Prior Art
I’ve been looking for similar projects every now and then. (Mostly with the hope that someone with more time than me had already implemented something I could use!) Here is what I have found so far:
- The "Perfect CMS" which Hay Kranen describes also uses JSON Schema for entering and validating data. (Not really “prior” art – I’ve been thinking about this for at least a year and implemented the first prototype of Silicon Zucchini in March.)
- prismic.io (a “CMS as a Service”) allows you to create document masks (see their documentation) to define custom data types. The format is something like JSON Schema and is used to create the forms a content editor sees.
[1] I’ve hinted at some of them before in the context of some experiments.
[2] That’s not universally true, of course. Every CMS can theoretically also be used as a SSG. And nothing prevents someone from writing a nice edit interface for a SSG.
[3] Text files that start with a block of YAML code (between two lines of three dashes, ---) have the data defined in the YAML associated with them. Have a look at the Jekyll documentation.
[4] Another possible step of static analysis: A template that requires data in the article schema described above may not unconditionally render the author field since that field is not required and may be empty.
[5] Generated with my trusty code name generator.
[6] I might talk about technical details in a later post. I’ll also probably rewrite the entire thing before it becomes production-ready.
[7] Silicon Zucchini could also validate that the schemas you have written are in fact valid schemas: The JSON Schema spec contains a JSON Schema for validating JSON Schemas.
[8] Speaking of schema.org: It may be useful to add some annotations used by JSON-LD to a JSON Schema to generate alternative representations of the data. This could also guide the development of templates so they include the correct Microdata markup.