The Many Jobs of JS Build Tools

For new JS developers

swyx 2020-01-06

Essay status: mostly baked, sat on this for about 3 weeks and got some amount of peer review

One of my regrets in my recent SE Daily interview was my rather poor, panicked description of “Why Webpack” and “Why Gatsby”. Jeff, the host, doesn’t do much frontend coverage, whereas I have lived and breathed this stuff for the past 2+ years.

It felt rather like that phenomenon of how “Fish have no word for Water” - Water just is. Having to justify the existence of Water from first principles called on explanation muscles I rarely use. And then when I checked Webpack’s Why Webpack docs, I felt it focused mostly on modules and didn’t spend enough time on the other important jobs that build tools provide for us.

I’d like another shot at explaining this.

Build Tools are Optional
Only Scripts
Task Runners
Multiple Targets, Multiple Assets: the rise of Bundlers
Multiple Targets, Multiple Languages: the rise of Transpilers and Compile-to-JS
Production Optimizations
Developer Experience
Metaframeworks - reducing Many Jobs to One
Conclusion - The Pandora’s Box
Related Reads
Thanks
Addendum on Hot Reloading

Build Tools are Optional

The first thing to acknowledge is that build tools (which imply having a “build step” before deploying code, instead of directly being able to go from source code to deploy) aren’t technically required.

JavaScript is meant to be an interpreted language - write some in a <script> tag or in browser console, and it runs. No mandatory compile step unlike some other languages. So, then, to put a build step - that kinda looks like a compile step - in front of it evidently takes something away from that benefit.

Some people I greatly admire, like Luke and Brian, write their apps entirely without a build step. The universal result of doing this is lightning fast deploy times.

Only Scripts

I wasn’t around for much of this history, but as best as I can tell, build tools came to JavaScript primarily because we wanted a sane way to reuse code.

For almost 20 years since creation in 1995, reusable JavaScript code looked something like this:

<script src="https://code.jquery.com/jquery-3.3.1.slim.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.6/umd/popper.min.js"></script>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.2.1/js/bootstrap.min.js"></script>
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
<script src="magnific-popup/jquery.magnific-popup.js"></script>

Scripts had no way to import from one another, so in order to make a script that depended on another script, you had to throw its exports on the global window object:

window.MyScript = function () { /* etc */ }

and then the subsequent script would access that magic global on the window:

var MyScriptCore = window.MyScript
console.log(MyScriptCore())

This is all kinds of bad - causing a namespace pollution problem, and requiring scripts to be loaded in the exact right order or face inscrutable undefined is not a function errors. What we really wanted was modules, rather than scripts.
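To make the load-order problem concrete, here is a minimal sketch that models each script as a function run against a stand-in for window. The names fakeWindow, libraryScript and appScript are hypothetical and exist only for illustration; in a browser these would be two separate script tags.

```javascript
// A stand-in for the browser's global `window` object so this runs anywhere.
const fakeWindow = {};

// Each "script" is modeled as a function that runs when its <script> tag loads.
function libraryScript(window) {
  // The library keeps its internals in function scope and exposes one global.
  const secret = 42;
  window.MyScript = function () {
    return secret;
  };
}

function appScript(window) {
  // The app assumes the library script has already run.
  const MyScriptCore = window.MyScript;
  return MyScriptCore();
}

// Wrong order: the app script runs before the library script.
let wrongOrderError = null;
try {
  appScript(fakeWindow);
} catch (err) {
  wrongOrderError = err; // TypeError: MyScriptCore is not a function
}

// Right order: library first, then app.
libraryScript(fakeWindow);
const result = appScript(fakeWindow);

console.log(wrongOrderError instanceof TypeError, result); // true 42
```

Nothing in either script declares the dependency, so the only thing enforcing the order is the position of the tags in the HTML.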

I will not dwell further on the importance of modules - refer to Why Webpack and Why Would I Use a Webpack? if you need more convincing.

Task Runners

Relying on script CDNs introduced security and latency costs, so the alternative was to download the code and glue it together in the right way to create one (or more) custom JS bundles entirely within your control and hosted on your infrastructure. To gloss over a huge amount of development in the space (that I don’t know much about), “task runners” arose to automate this process for you - Grunt, Gulp, Broccoli.

But something else also happened at the same time - server-side JavaScript became a thing.

Multiple Targets, Multiple Assets: the rise of Bundlers

Since its introduction in 2009, Node paved the way for server-side JavaScript, with npm becoming the de facto package manager first for Node packages and eventually for all JavaScript. Developers naturally wanted to reuse code across Node and the browser, and this caused a rather unfortunate rift in the community between different module specifications: CommonJS (from the Node camp) and AMD (from the browser people).

The job of task runners quickly expanded to help developers build to these different build targets, which is where “task runners” eventually gave way to “bundlers” like Browserify and Webpack.

Important caveats: Fred Schott notes that this account may overattribute causality - but it isn’t disputed that Browserify and Webpack were necessary enablers of “universal” or “isomorphic” JavaScript. Also, the chronology overlaps a lot - Mike Sherov notes that bundlers like RequireJS and Closure Compiler predated some task runners like Grunt and Gulp.

We realized that the act of concatenating scripts requires building a dependency graph and managing any namespace conflicts via static analysis. Where “task runners” mostly stopped at concatenating files per your explicit instructions, bundlers can use ASTs to statically analyze imports, ensure everything is initialized in the right order, and produce a minimal number of bundles for optimized loading.
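As a rough illustration of what “building a dependency graph” means, here is a toy sketch. It assumes a hypothetical in-memory set of files and uses a regular expression where a real bundler would parse an AST, so treat it as a picture of the idea rather than how any particular bundler works.

```javascript
// Hypothetical module sources keyed by file name.
const files = {
  "app.js": "import './menu.js'; import './util.js'; console.log('app');",
  "menu.js": "import './util.js'; console.log('menu');",
  "util.js": "console.log('util');",
};

// Toy import scanner: a regular expression instead of a real AST parser.
function findImports(source) {
  const pattern = /import\s+['"]\.\/([^'"]+)['"]/g;
  const found = [];
  let match;
  while ((match = pattern.exec(source)) !== null) {
    found.push(match[1]);
  }
  return found;
}

// Depth-first walk from the entry file. A file is emitted only after all of
// its dependencies, so the result is a safe initialization order.
function bundleOrder(entry, sources) {
  const order = [];
  const state = new Map(); // file -> "visiting" | "done"

  function visit(file) {
    if (state.get(file) === "done") return;
    if (state.get(file) === "visiting") {
      // Real bundlers cope with cycles; this sketch simply refuses them.
      throw new Error(`Circular import involving ${file}`);
    }
    if (!(file in sources)) throw new Error(`Missing module: ${file}`);
    state.set(file, "visiting");
    for (const dep of findImports(sources[file])) visit(dep);
    state.set(file, "done");
    order.push(file);
  }

  visit(entry);
  return order;
}

const order = bundleOrder("app.js", files);
console.log(order); // [ 'util.js', 'menu.js', 'app.js' ]
```

Note that util.js appears once even though two files import it, and that nobody had to list the files in order by hand.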

To facilitate static analysis of dependency graphs, Webpack helped popularize the import syntax, which was standardized as ESModules (here’s Lin Clark on ESModules) and has begun healing the divide between the various module specifications.

Incidentally, the job of task runners was also chipped away as package managers, primarily npm but also bower, built simple and effective scripting into their CLIs. The similarity and overlapping responsibilities caused much confusion but hopefully I have given a good accounting here.

Today, Webpack is in wide use for apps and metaframeworks (tools that bundle other tools to lessen the pain of configuration) like create-react-app, Next.js and Gatsby. Rollup is in second place and is widely used for building libraries as opposed to apps, partially for historical reasons but its hard reliance on ES Modules lets it build smaller bundles at the cost of some incompatibility. Parcel is the newest kid on the block (launched Dec 2017), aiming for a batteries-included approach with parallelizable, origin-agnostic graph crawling.

Writing web apps often involves coordinating non-JS assets, like images and css files. It’s often better to co-locate related assets with their associated code, than to have to manage them in separate folders. Webpack made it a norm to simply import Image from './image.png' and import 'style.css' inside of a JavaScript file to include these for bundling. Yes, you’re “importing” non-JS assets into a JavaScript file, and this is Webpack magic made possible by its loaders.
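For a sense of what that loader setup looks like, here is a minimal sketch of a Webpack config. It assumes the style-loader, css-loader and file-loader packages are installed, and the entry path is hypothetical. The loadersFor helper is not part of Webpack; it is only there to show how an imported path gets matched against each rule’s test pattern (simplified to the first matching rule).

```javascript
// Would normally live in webpack.config.js and be exported with
// `module.exports = webpackConfig`.
const webpackConfig = {
  entry: "./src/index.js", // hypothetical entry point
  module: {
    rules: [
      // CSS imports: css-loader resolves the file, style-loader injects it.
      { test: /\.css$/, use: ["style-loader", "css-loader"] },
      // Image imports: file-loader copies the file and returns its URL.
      { test: /\.(png|jpe?g|gif|svg)$/, use: ["file-loader"] },
    ],
  },
};

// Illustration only: find the loaders whose `test` matches an imported path.
function loadersFor(path) {
  const rule = webpackConfig.module.rules.find((r) => r.test.test(path));
  return rule ? rule.use : [];
}

console.log(loadersFor("./image.png")); // [ 'file-loader' ]
console.log(loadersFor("style.css")); // [ 'style-loader', 'css-loader' ]
```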

A final, notable outcome of JavaScript being run both on the server and client is the rise of server-rendered pages that “hydrate” into clientside apps - variously known as “Isomorphic apps” or, simply, “SSR”. Next.js and other Metaframeworks take advantage of this for speed, SEO, and a unified JavaScript framework based approach to writing sites and apps (and all the shades of gray between the two).

Multiple Targets, Multiple Languages: the rise of Transpilers and Compile-to-JS

The other thing about clientside JavaScript is the incremental rollout of new language features. ES6 was released in 2015, a -huge- update that comprised over a decade of work and argument on the language. It made JavaScript a lot more tolerable to write serious apps in.

People wanted to use this syntax right away (or even before release), but they were handcuffed by the fact that they would have to wait for the various JavaScript engines to implement it, and then for the individual browser updates to roll out to billions of users. That would take forever.

So Traceur and eventually Babel arose to let people write modern JavaScript and then “transpile” (compile from JavaScript to JavaScript) that code to the lowest common denominator understood by all the engines the developer chooses to target. Similarly, for new APIs (as opposed to new syntax, which Babel handles), you need to polyfill them with tools like core-js.
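To illustrate the difference between the two jobs, here is a small hand-written sketch. The “legacy” function is roughly the kind of ES5 a transpiler might emit for the modern one; it is not actual Babel output. The polyfill is a deliberately simplified stand-in, not what core-js ships.

```javascript
// What the developer writes (ES2015 syntax: const, arrow function,
// default parameter, template literal).
const greetModern = (name = "world") => `Hello, ${name}!`;

// Roughly what a transpiler might emit for ES5 engines (hand-written here).
var greetLegacy = function (name) {
  if (name === undefined) {
    name = "world";
  }
  return "Hello, " + name + "!";
};

// A polyfill is different: it adds a missing API at runtime rather than
// rewriting syntax. Simplified on purpose; it ignores NaN and fromIndex.
if (!Array.prototype.includes) {
  Array.prototype.includes = function (item) {
    return this.indexOf(item) !== -1;
  };
}

console.log(greetModern(), greetLegacy()); // Hello, world! Hello, world!
console.log(greetModern("swyx") === greetLegacy("swyx")); // true
console.log(["a", "b"].includes("b")); // true
```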

You technically could run these transpilations on their own, but more often than not you were including them somewhere in the plugin chain of your bundler for the creation of your final output JavaScript.

One issue that divided the community for a long time was whether you should transpile only your code or whether you should transpile all code including your code’s dependencies and so on. With the former, you risked your app ugly-crashing if the dependencies weren’t transpiled to your target browsers (hurting developer experience). The inverse was also a problem - if your dependencies preemptively transpiled to browsers you didn’t really care about, their code was also a lot bigger than you actually needed (hurting user experience).

With the latter, you took your fate into your own hands - at the cost of much longer build times. IMO the consensus has recently shifted to the latter, but with passionate disagreement - more reading here.

A final important note on JS compatibility - with the advent of evergreen browsers and the undying nature of legacy browsers, it is now common to use the module/nomodule pattern and build twice (one bundle for each group of browser targets, to ensure optimal size for evergreen browsers -and- maximum compatibility for legacy browsers).

Other non-standardized additions to JavaScript gained popularity as well. React added JSX as a way of keeping the HTML-like authoring experience of writing components inside JavaScript files. Ractive, Vue and Svelte adopted Single File Components that lean on bundlers to compile to JS, CSS, and sometimes HTML. Babel-Macros put metaprogramming and compile-time execution inside of Babel.

By the way - If you’re gonna be writing make-believe JavaScript and compiling to JavaScript anyway, why keep writing JavaScript? CoffeeScript led the way to adding a Ruby-like developer experience, eventually supplanted by ES6. TypeScript and Flow added static type annotations and inference atop existing JavaScript syntax. ReasonML, Purescript, Elm and ClojureScript abandoned that familiarity in favor of stronger typing and more functional paradigms. Of these Compile-to-JS breeds, TypeScript currently polls highest at ~60% adoption in recent surveys.

Again - you can compile-to-JS all of these independently, but more likely than not you’re also going to want to tie this work to a step in your overall bundler workflow.

Production Optimizations

The other job of task runners and eventually bundler plugins, apart from all this module, target, and language related work, is the set of production optimizations that usually get bolted on to the build pipeline (not core to bundling, but usually coupled with it). Here is a nonexhaustive list of optimizations in rough order of importance:

Reduce JS bundle size: Strip out comments and minify JavaScript variable names. Google Closure Compiler deserves special mention as best-in-class here although it is not native to the Webpack ecosystem. You can also do the same for CSS - in fact, utility-first CSS frameworks like Tailwind rely on build tools stripping out unused classes.
Code Splitting: Making it trivial to reduce initial bundle size, outputting multiple JavaScript chunks that are only loaded on demand
Image and Font Optimization - these can be very large raw files that you can downsample to just the quality/glyphs you actually use. However, if you stick a bunch of image processing into your build process, especially without caching, you slow it down. Gatsby-Transformer-Sharp does a lot of pre-emptive image processing that you may wish to avoid - caching can help. I favor one-time work but the best approach is probably to use an Image CDN like Cloudinary or Netlify Large Media.
Prerendering or Server Side Rendering - so that first-loads of HTML show content instead of having to download and parse JS to be rendered clientside.
Creating HTML files that load your generated assets optimally - so you don’t have to write them by hand
Inlining Critical CSS - so pages show up with important styles already loaded
Injecting environment variables - so you can vary Constants used in your code based on production/staging/other environments
PWA creation - plugins for an offline-first/cached speed (1, 2)
Content Hashing - for cachebusting - less relevant with some modern CDN configurations, but still can be relevant for browser caching scenarios
Tree Shaking and Dead Code Elimination: static analysis for stripping out unused code also for the purpose of reducing JS bundle size - but overrated in terms of actual impact
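
As one small example from the list above, content hashing can be sketched in a few lines. This assumes an FNV-1a-style hash over character codes purely to keep the sketch dependency-free; real build tools typically use a proper digest of the file’s bytes. The hashedFileName helper is hypothetical.

```javascript
// FNV-1a-style hash over UTF-16 code units. Non-cryptographic, illustration only.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply, keep unsigned 32 bits
  }
  return hash.toString(16).padStart(8, "0");
}

// Insert the content hash between the base name and the extension.
function hashedFileName(fileName, contents) {
  const dot = fileName.lastIndexOf(".");
  const base = dot === -1 ? fileName : fileName.slice(0, dot);
  const ext = dot === -1 ? "" : fileName.slice(dot);
  return `${base}.${fnv1a(contents)}${ext}`;
}

const v1 = hashedFileName("main.js", "console.log('v1')");
const v1Again = hashedFileName("main.js", "console.log('v1')");
const v2 = hashedFileName("main.js", "console.log('v2')");

// Same contents give the same name (cache hit); changed contents give a new
// name (cache bust).
console.log(v1 === v1Again, v1 !== v2); // true true
```

Because the file name changes only when the contents do, the file can be cached aggressively without users getting stuck on a stale bundle.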

As you can see, enough of these matter to modern web apps that build tools become critically important for most teams and framework communities.

Developer Experience

Secondary to Production Optimizations (which impact end users), modern bundlers also offer niceties for Developer Experience (which end users don’t see):

Automatic Dependency Collection - Old school Task Runners and even Google Closure Compiler require you to manually declare all dependencies upfront. Bundlers like Webpack automatically build and infer your dependency graph based on what is imported and exported.
Dev Server - for Single Page Apps and JAMstack apps, there is no running Node server to run locally, so the bundler boots one up for local testing.
Hot Module Replacement - reduces the feedback loop of code changes by swapping out modules while an app is running, without a full reload. Reliant on having a running Dev Server. A higher level of hot reloading preserves the state of components while changing them, not just modules - this is an active area of development in React and a first class citizen in Elm. (See Addendum for more details)
Source Maps - production code isn’t human readable after all the production optimizations - bundlers can map source to production code, and browsers can request and display errors pointing to the original source if given a sourcemap. Useful both in production and in development.
Bundle Analyzer/Visualizer plugin - the modern app uses too many modules to manually account for. Visualization and analysis can help identify the biggest points to work on first. Formidable Labs makes a nice Dashboard for this.

Metaframeworks - reducing Many Jobs to One

Specifically for frontend developers using frameworks, all of the above can be a lot to set up just to say “Hello World” with best practices. It is natural to want to subsume these build tools under a single abstraction to form a new starting point. Dan Abramov, with inspiration from Ember CLI, led the way on this in React with create-react-app (this was the subject of my second ever talk).

In particular, Server-side Rendering is a pain point and of critical importance for performance while using a front-end framework, leading to Gatsby and Next.js (React), Gridsome and Nuxt.js (Vue), and Scully (Angular).

Even stepping outside the frontend web developer role, there is a case for abstracting over build tools, as with TSDX which helps write TypeScript libraries.

Conclusion - The Pandora’s Box

A small and passionate part of the webdev community is working very hard to make build tools more optional than they are today. Given that modules are the primary/original reason for the rise of build tools, and have since been standardized in ECMAScript, each runtime is working on making them usable without a build tool. ES Modules arrived in browsers in 2017, and Node.js unflagged them in Nov 2019 (putting them on track for widespread use in Node 14 in 2020). Pika and Rollup are banking especially hard on the universal, ESM-enabled future.

However, legacy browsers still exist, and legacy, battletested code is still in wide use. I’m also not sure how transitive dependencies are handled (the import map proposal may help! but… how would you build your import maps?). But the overriding issue for the “no-build-tools” future is that the bar has been raised so much higher than just “we want modules in JavaScript”, as the rest of this blogpost has painted. We still want asset management, static types, prerendering, image optimization, and whatever else we take for granted in modern web apps. So despite the progress in ESM-everywhere, I don’t see a clear path toward it materially impacting JavaScript in the near term.

Here’s V8’s advice:

With modules, it becomes possible to develop websites without using bundlers such as webpack, Rollup, or Parcel. It’s fine to use native JS modules directly in the following scenarios:

during local development

in production for small web apps with less than 100 modules in total and with a relatively shallow dependency tree (i.e. a maximum depth less than 5)

Addy Osmani put it like this:

Imo module bundlers will be necessary for prod builds for a good while yet. In a few years, ES modules perf + modulepreload + H2 Push + Cache Digests might give us a compelling story, but it’s a long road ahead. Modules for dev/authoring format may take off in short term.

Build Tools in JavaScript are a Pandora’s Box. We opened them, and from what little I can see, they are here to stay. Hopefully this has been a good intro to what jobs they perform in JavaScript.

Thanks

Thanks to Sean Larkin, Mark Erikson, Robin Wieruch and Joe Previte for reviewing early drafts of this!

Addendum on Hot Reloading

I don’t have much experience in the details of HMR. These are notes from Mark Erikson:

Module reloading: recompile, push new code to client, allow app to do something
“Plain” client reloading: just reimport and use affected modules. For React, this is mostly just reimport and re-render.
“Clever” reloading: attempting to preserve state in the component tree, but this requires much more complex work. React-Hot-Loader works by using Babel to insert module.hot.accept() and “proxy components” around every component it can identify, and moving the state up into the proxy component, but this is fragile. That’s why the new “Fast Refresh” capability is actually half-built into React itself, with bundler-specific use of the APIs.
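
The first two levels in these notes can be pictured with a toy model. This is an in-memory sketch, not Webpack’s implementation: modules, accept and pushUpdate are hypothetical names, and pushUpdate stands in for the dev server pushing recompiled code to the client.

```javascript
// Toy model of "plain" hot reloading.
const modules = new Map(); // module id -> current exports
const acceptHandlers = new Map(); // module id -> callback registered by the app

// The app declares that it knows how to handle a new version of a module.
function accept(id, onUpdate) {
  acceptHandlers.set(id, onUpdate);
}

// Stand-in for the dev server delivering recompiled code.
function pushUpdate(id, newExports) {
  modules.set(id, newExports);
  const handler = acceptHandlers.get(id);
  if (!handler) return "full-reload"; // nobody accepted, so fall back
  handler(); // let the app do something with the new code
  return "hot-updated";
}

// "App" code: render whatever the current Greeting module exports.
modules.set("./Greeting", { text: () => "Hello" });
let screen = "";
function render() {
  screen = modules.get("./Greeting").text();
}
render();
accept("./Greeting", render); // on update: re-import and re-render

const outcome = pushUpdate("./Greeting", { text: () => "Hello, HMR" });
console.log(outcome, screen); // hot-updated Hello, HMR
```

Any state held inside the old module is simply gone after the swap, which is the gap the “clever” approaches try to close.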

His full explanation of Webpack vs React’s Hot reloading is on his blog.
