Ewen Cheslack-Postava me@ewencp.org @ewencp

September 16 2013
Tags: email | web development | programming | tools

Recently I’ve been adding some features to StraightUp that require email. And of course we want those emails to look attractive, so we need to send multi-part text + HTML messages. This doesn’t sound so bad until you look at the sorry state of HTML support in email clients — expect to code like it’s 1999.

It actually isn’t that bad if you keep your emails simple, and there are quite a few posts out there covering the nitty-gritty of writing HTML emails (and a lot more that are useless, superficial overviews). But when I went to start implementing this, I found none of them actually explain how to implement and test your emails. They are just standard lists of what you can’t do. Even the very nice boilerplate templates, such as HTML Email Boilerplate and Emailology, don’t explain how you should implement a pipeline on top of their tools and recommendations.

If you’re sending one-off marketing emails, it might be fine to manually edit and test each email, writing all your styling inline. If you’re doing anything more complex, generating dynamic emails, or repeatedly creating lots of emails, you don’t actually want to write your email templates with a bunch of inline styles. This is one of the reasons services like MailChimp are handy. They do a bunch of this for you automatically and give you good starting templates. But if their service doesn’t fit your needs, what do you do?

To make your life easier, you need a few tools and a basic pipeline in place to let you test and iterate quickly. This is the basic pipeline I came up with:

  1. Start with a base HTML email template. Customize it to work with your template engine. Keep CSS styles in <style> tags.
  2. Run your template engine to generate each email, resulting in a raw email that will likely look horrible in a lot of clients since they’ll strip the styling.
  3. Run it through a CSS inliner. There are already a bunch of libraries out there to move your CSS styles into style attributes on the tags that reference them. For some reason, those other articles don’t let you know that these tools exist, let alone point you to them.
  4. Have an automatic mechanism for getting generated test emails out of your system and viewable in a browser. This can be as simple as logging them to files and having your web browser serve them up when you’re in testing mode.
  5. Run a script to filter the output. Ideally this would be identical to the filter executed by a web-based email client. This step could be done before or after you serve the file to the browser.

Not particularly complex, but I figure that documenting this will hopefully save others the extra couple of days of research on best practices and tools to make it all work in a larger application. And especially for the last step, there don’t seem to be many good tools to do this.

If you’ve already got the gist from this list, just jump to the section on CSS inliners for pointers to a few good ones, and to the last section for my half-hearted attempt at a bookmarklet that tries to give you a semi-realistic in-browser preview of what your email will end up looking like.

For me, this entire process is tied to our Django-based web application. Naturally, this means we selected Python libraries, chose Django-specific solutions where they made life easier, and already had infrastructure for templating and serving web pages. However, I’ll try to describe most things generically (hopefully not even assuming you’re working on a web app) so the general tools/libraries will still be Google-able, then suggest specific solutions for Python/Django apps.

Steps 1 & 2: Base Templates

Don’t roll your own. Use templates like HTML Email Boilerplate, Emailology, or MailChimp’s Blueprints.1 They resolve so many issues with random email clients, including ones that I can barely believe people still use, that it would be crazy not to. They’ll also help you understand why you need all these settings. But remember to remove all the comments. You’ll avoid spam issues and use significantly less bandwidth.

Once you select an HTML template, you need to translate it into your system’s templating language (sorry, template is overloaded here…). I’m not going to suggest any since that’s a huge topic, especially if you aren’t already using one. We took one of MailChimp’s templates and converted it to use Django template tags rather than MailChimp’s merge tags. We also gave it sections so most of our emails could be written by extending the template to fill in just a small number of sections (Django’s version of template inheritance). Essentially all we do for a new email is specify the title text and body HTML.
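To make the output of steps 1 and 2 concrete, here is a minimal sketch using only the Python standard library. It is not our Django setup: `string.Template` stands in for a real template engine (and does no escaping), and `BASE_HTML`, `build_email`, and the addresses are all made-up names for illustration.

```python
from email.message import EmailMessage
from string import Template

# Hypothetical base layout. The two $slots play the role of the sections
# that each individual email fills in; styles stay in a <style> tag for now.
BASE_HTML = Template(
    "<html><head><style>h1 { color: #336699; }</style></head>"
    "<body><h1>$title</h1><div>$body_html</div></body></html>"
)


def build_email(title, body_html, body_text, sender, recipient):
    """Return a multipart/alternative message with text and HTML parts."""
    msg = EmailMessage()
    msg["Subject"] = title
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body_text)  # becomes the text/plain part
    html = BASE_HTML.substitute(title=title, body_html=body_html)
    msg.add_alternative(html, subtype="html")  # adds the text/html part
    return msg


msg = build_email(
    title="Welcome",
    body_html="<p>Thanks for signing up.</p>",
    body_text="Thanks for signing up.",
    sender="noreply@example.com",
    recipient="user@example.com",
)
print(msg.as_string())
```

The printed result is the "raw email" from step 2: structurally fine, but its styling still lives in a `<style>` tag that many clients will strip.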

Step 3: Inlining

Email clients filter the HTML you feed them, often very aggressively. From what I’ve read and seen so far, GMail seems to be the most aggressive. And they have good reason. HTML emails need to be filtered to make sure they can’t do anything evil — run malicious scripts, modify or overlay elements in the UI, and probably other evil things I haven’t thought of. It’s especially important for web-based clients since they don’t have a good mechanism for isolating themselves from the email being displayed.

But it’s a pain if you’re the one authoring the HTML. You need to know the subset of functionality you’re allowed to use. The biggest issue, and the reason for this post, is that you need to provide all your styling inline as style attributes because <style> tags are removed entirely. This is mentioned with all the templates. However, they all put the style information in a style tag.

Eventually I figured out why: no sane person does the styling inline because it’s too repetitive and becomes unwieldy to modify, which is exactly the reason stylesheets were created in the first place. But they never mention that the smart way of handling this is to use their template, then run an inliner that pulls the style for each tag and puts it in the style attribute of the tag, allowing you to completely remove the <style> tag.2 If nothing else, they really should put a warning in giant red letters somewhere near the beginning of their documentation saying that you ultimately need to inline the styles and that the templates are provided this way for ease of reading and editing.

Frustratingly, if you search for tools to do this and include ‘email’ as a search term, you’ll probably turn up a lot of forms that take a complete email and process it for you. That’s not very useful if you’re generating dynamic emails, especially if the dynamic parts contain elements that need styling.

But there are plenty of good libraries for doing this:

  • Generally, search for ‘CSS inliner’ and your language.
  • For Ruby: Premailer seems to be the right choice (code here, and available as a gem).
  • For Python: There’s a premailer port. inlinestyler also seems to be a good choice.

We’re using Django and its template language, so we went with an inliner that integrated easily and had a template tag that took care of everything for us: Django Inlinecss (which was forked from roverdotcom’s version).
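If it helps to see what an inliner actually does, here is a deliberately tiny sketch in standard-library Python. It is a toy, not a substitute for the libraries above: it only understands bare tag selectors and `.class` selectors, ignores specificity, CSS comments, and media queries, and drops HTML comments. `ToyInliner`, `parse_rules`, and `inline_css` are names invented for this example.

```python
import re
from html import escape
from html.parser import HTMLParser

RULE_RE = re.compile(r"([^{}]+)\{([^{}]*)\}")


def parse_rules(css):
    """Split a stylesheet into (selector, declarations) pairs."""
    rules = []
    for selectors, body in RULE_RE.findall(css):
        decls = "; ".join(d.strip() for d in body.split(";") if d.strip())
        for selector in selectors.split(","):
            rules.append((selector.strip(), decls))
    return rules


class ToyInliner(HTMLParser):
    """Re-emit HTML with matching rules copied into style attributes."""

    def __init__(self, rules):
        super().__init__(convert_charrefs=False)
        self.rules, self.out, self.in_style = rules, [], False

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self.in_style = True  # drop the <style> block itself
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        styles = [decls for sel, decls in self.rules
                  if sel == tag or (sel.startswith(".") and sel[1:] in classes)]
        if attrs.get("style"):
            styles.append(attrs["style"])  # existing inline styles go last
        if styles:
            attrs["style"] = "; ".join(styles)
        rendered = "".join(
            ' {}="{}"'.format(k, escape(v or "")) for k, v in attrs.items())
        self.out.append("<{}{}>".format(tag, rendered))

    def handle_startendtag(self, tag, attrs):
        self.handle_starttag(tag, attrs)  # e.g. <br/>: no closing tag emitted

    def handle_endtag(self, tag):
        if tag == "style":
            self.in_style = False
        else:
            self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        if not self.in_style:
            self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&{};".format(name))

    def handle_charref(self, name):
        self.out.append("&#{};".format(name))

    def handle_decl(self, decl):
        self.out.append("<!{}>".format(decl))


def inline_css(html):
    """Copy <style> rules onto matching tags and remove the <style> block."""
    css = "".join(
        re.findall(r"<style[^>]*>(.*?)</style>", html, flags=re.S | re.I))
    inliner = ToyInliner(parse_rules(css))
    inliner.feed(html)
    inliner.close()
    return "".join(inliner.out)


result = inline_css(
    "<html><head><style>h1 { color: #336699; } .note { font-size: 12px }"
    "</style></head><body><h1 class=\"note\">Hi &amp; bye</h1>"
    "<p style=\"margin: 0\">x</p></body></html>"
)
print(result)
```

A real inliner handles the full selector syntax and cascade rules, which is exactly why it is worth using a library rather than something like this.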

Step 4: Getting the Emails

Of course you’ll want to check the results before putting the new email into production. You might think that just emailing yourself is good enough. It might be, and that’s how I started. But it quickly becomes annoying: run the test code, wait for the email to arrive, then dig through GMail’s version in the DOM to figure out what’s going wrong and what’s been filtered. If you can get your emails right on the first try, then you can skip these last two steps. If you can’t, you probably want to read on…

So what’s a better way to test? Since web-based email clients have the most aggressive filters, previewing the email in the browser is probably the best idea. How you get them to the browser will vary depending on your setup. I think the easiest thing to do is log each email to a separate file and either make them accessible as a local file or via a web server (e.g. if you develop on a headless VM).

We already have a full web stack on our development VMs since we need it to test our app, so we just modified that setup. First we changed the email backend. Previously we were using the console, which was sufficient for text-only emails. A two-line config change converted us to the filesystem backend, which just spits each email out to a separate file. We also configured our development machines to serve the files at a special URL to make them easily browsable. Of course our unit tests can also access the emails so we can write proper tests of any email functionality we create.
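For reference, switching Django to its file-based email backend looks roughly like this in a development settings file. The backend path and the `EMAIL_FILE_PATH` setting come from Django's email documentation; the directory shown is just a placeholder, not the one we use.

```python
# settings.py (development only)
# Write each outgoing email to a file instead of sending it.
EMAIL_BACKEND = "django.core.mail.backends.filebased.EmailBackend"
EMAIL_FILE_PATH = "/tmp/app-messages"  # placeholder; any writable directory
```

Serving that directory at a development-only URL is then ordinary static-file configuration for whatever web server you run.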

As a side note, I considered a different option: an additional, development-only middleware layer that, when a special query parameter was included, would emit the email back to the browser instead of the normal response. This works great for some other debugging tools, like adding CPU profiling for a request. But it broke down quickly for me. The first thing I wanted to use it for generated multiple emails, and it wasn’t obvious how to handle that well. Additionally, we’re going to generate emails in other processes (in a task queue), where we can’t hijack the response since there isn’t a response. The file-based approach handles all these cases and maintains a bit of history, which can be nice for debugging. It does, however, mean a bit more work in finding the email when you’re doing manual testing.

Step 5: Realistic Previews

Finally, even if you output only the HTML (versus the whole email, which will obviously not render as a regular HTML page), viewing it in a fully functional browser tests very little: the browser applies none of the filtering an email client would.

There are commercial products you could feed the output to, and these could be very helpful if you need to truly test the emails across a lot of email clients. We’re not particularly worried about ancient clients, so we mostly just want a sanity check (at least for now).

So what we really need is something that transforms the HTML in the same way that GMail or any other web-based email client would. After quite a bit of Googling, I didn’t turn anything up that performs this specific transformation. There are a lot of HTML sanitizers, designed to filter user-submitted HTML that will be included on a site. However, most of these seem to be even more restrictive than email clients (e.g. completely stripping heading tags). Others are huge chunks of code with dependencies that looked like they were going to be tough to build on (e.g. JsHtmlSanitizer from google-caja and htmlsanitizer.js from Google’s Closure library) and may not provide an easy way to customize their rules to match the filtering performed by email clients. I wanted something that would work in the browser (no additional backend support required) and wouldn’t take me too long to build. So I decided to hack something up that would do it for me well enough for my testing.

This Gist is the result. The approach is pretty simple. First, it tries to check if we’re actually looking at a raw, multi-part email and extract just the HTML part, overwriting the document if it finds it. This lets you use the raw email output dumped in the earlier step (which also makes it easy to inspect the plain-text version of the email).
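The Gist does this extraction in the browser. If you would rather pull the HTML part out on the server side (say, in a unit test), the same idea can be sketched with Python's standard `email` package. `extract_html` and the sample message are hypothetical, not code from the Gist.

```python
from email import message_from_string, policy


def extract_html(raw_email):
    """Return the text/html body of a raw email, or None if there is none."""
    msg = message_from_string(raw_email, policy=policy.default)
    part = msg.get_body(preferencelist=("html",))
    return part.get_content() if part is not None else None


# A small hand-written multipart message standing in for a logged email file.
RAW = """\
Subject: Welcome
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="BOUNDARY"

--BOUNDARY
Content-Type: text/plain; charset="utf-8"

Hello in plain text.
--BOUNDARY
Content-Type: text/html; charset="utf-8"

<html><body><h1>Hello</h1></body></html>
--BOUNDARY--
"""

html = extract_html(RAW)
print(html)
```

Passing `("plain",)` as the preference list instead gives you the plain-text version for inspection.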

Next, it tries to filter the page so it matches what would be displayed in an email client. It does so by inspecting and modifying the DOM, a simpler approach than requiring a full HTML parser. The filtering is by no means complete, but it takes care of a few key things. First, it removes the head, style, and link tags that email clients always filter out. Then, it tries to remove attributes it can’t trust. Creating an exhaustive list of good/bad attributes (which vary from tag to tag) would take too long, so I used a combined blacklist/whitelist approach. By default, all attributes are permitted, and we explicitly specify the tags where we want to apply filtering. These filters, however, are whitelists: for tags that have filters, every attribute not on the list is removed. There are two exceptions that apply to every tag: some attributes we always permit, e.g. style, and some we always filter, e.g. id and class. The final step should be to filter CSS rules in the style attributes. I haven’t implemented this step yet because we haven’t needed it: we started from a good template and avoided adding extra styles that might require this filtering.
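The bookmarklet works on the live DOM in JavaScript, but the rule structure is easy to show in a few lines of Python. This is a sketch of the same combined blacklist/whitelist idea, not a port of the Gist: the per-tag lists in `TAG_WHITELISTS` are invented for the example, and `EmailPreviewFilter` is a hypothetical name.

```python
from html import escape
from html.parser import HTMLParser

SKIP_CONTAINERS = {"head", "style"}  # removed together with their contents
SKIP_VOID = {"link"}                 # removed; has no closing tag
ALWAYS_ALLOWED = {"style"}           # kept on every tag
ALWAYS_REMOVED = {"id", "class"}     # stripped from every tag
# Hypothetical per-tag whitelists; unlisted tags keep their other attributes.
TAG_WHITELISTS = {
    "a": {"href", "title"},
    "img": {"src", "alt", "width", "height"},
}


def allowed(tag, name):
    """Apply the global rules first, then the tag's whitelist if it has one."""
    if name in ALWAYS_ALLOWED:
        return True
    if name in ALWAYS_REMOVED:
        return False
    whitelist = TAG_WHITELISTS.get(tag)
    return whitelist is None or name in whitelist


class EmailPreviewFilter(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out, self.skip_depth = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTAINERS:
            self.skip_depth += 1
        if self.skip_depth or tag in SKIP_VOID:
            return
        kept = "".join(' {}="{}"'.format(n, escape(v or ""))
                       for n, v in attrs if allowed(tag, n))
        self.out.append("<{}{}>".format(tag, kept))

    def handle_startendtag(self, tag, attrs):
        if tag not in SKIP_CONTAINERS:
            self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag in SKIP_CONTAINERS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth:
            self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append("&{};".format(name))

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append("&#{};".format(name))


def filter_for_preview(html):
    parser = EmailPreviewFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


filtered = filter_for_preview(
    "<html><head><title>t</title><style>p { color: red }</style></head>"
    "<body><p id=\"intro\" class=\"lead\" style=\"color: red\">Hi</p>"
    "<a href=\"https://example.com\" onclick=\"evil()\" id=\"x\">link</a>"
    "<link rel=\"stylesheet\" href=\"x.css\"></body></html>"
)
print(filtered)
```

Like the bookmarklet, this leaves the contents of style attributes untouched; filtering individual CSS properties would be a further step.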

If you want to use the script, try it as a bookmarklet:

Preview HTML Email

You can click it here to test it on this page (which isn’t supposed to be HTML email compatible, and so will look horrible after you do so). Note that the bookmarklet uses the code directly from the Gist, so if you want to use it regularly you should self-host it.

The small set of rules I implemented is enough for testing our emails based on the template we used. But ideally the CSS style filtering would be based on a complete reference table like this. I don’t know that there’s a similar reference for tags and attributes, but I’m sure the small set of rules currently provided could be upgraded (at worst with a set of valid tags provided by the HTML specs, which would at least be a better baseline). If anyone wants to contribute changes, I’d be happy to update the script. Alternatively, I’d love to see an HTML email-specific sanitizer (or set of sanitizer rules for the configurable implementations).

Conclusion

HTML emails don’t have to be particularly hard. But if you’re just getting started with them and trying to get a minimal set of functionality in place, it can be frustrating to find so little guidance about the actual practice of generating them. Hopefully this guide improves that situation, and I hope others will build on and extend the simple previewing bookmarklet. With the right guide and tools, I think we can get people up and running in an hour or two rather than the couple of days it took me to sort out the details.

Thanks

Thanks to Faolan Cheslack-Postava for comments and editing.

  1. Kudos to all of these folks for extremely valuable projects, and to MailChimp specifically for trying to improve the quality of emails even if it’s potentially at odds with their business.

  2. One thing I’ve yet to figure out is why the email clients don’t do this for you. They’re already dealing with potentially malicious input and filtering it. They have to parse and modify the HTML. Why not add that extra step of CSS inlining before the filtering? It would make email authors’ lives more sane and reduce the size of emails, and it seems like it would add little overhead to the steps they must already take. In fact, it could make some of them more efficient by removing the original CSS rule before it’s scattered into 72 different tags.