The Many Jobs of JS Build Tools
For new JS developers
A discussion of why JS developers use build tools like Webpack and what we do with them, for new JS developers.
Essay status: mostly baked, sat on this for about 3 weeks and got some amount of peer review
One of my regrets in my recent SE Daily interview was my rather poor, panicked description of "Why Webpack" and "Why Gatsby". Jeff, the host, doesn't do much frontend coverage, whereas I have lived and breathed this stuff for the past 2+ years.
It felt rather like that phenomenon of how "Fish have no word for Water" - Water just is. Having to justify the existence of Water from first principles called on explanation muscles I rarely use. And then when I checked Webpack's Why Webpack docs, I felt it focused mostly on modules and didn't spend enough time on the other important jobs that build tools provide for us.
I'd like another shot at explaining this.
- Build Tools are Optional
- Only Scripts
- Task Runners
- Multiple Targets, Multiple Assets: the rise of Bundlers
- Multiple Targets, Multiple Languages: the rise of Transpilers and Compile-to-JS
- Production Optimizations
- Developer Experience
- Metaframeworks - reducing Many Jobs to One
- Conclusion - The Pandora's Box
- Related Reads
- Addendum on Hot Reloading
The first thing to acknowledge is that build tools (which imply having a "build step" before deploying code, instead of directly being able to go from source code to deploy) aren't technically required.
You can put JS in a <script> tag or in the browser console, and it runs. There is no mandatory compile step, unlike some other languages. So putting a build step - one that kinda looks like a compile step - in front of it evidently takes something away from that benefit.
<script src="https://code.jquery.com/jquery-3.3.1.slim.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/popper.js/1.14.6/umd/popper.min.js"></script>
<script src="https://stackpath.bootstrapcdn.com/bootstrap/4.2.1/js/bootstrap.min.js"></script>
<script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>
<script src="magnific-popup/jquery.magnific-popup.js"></script>
Scripts had no built-in way to import from each other, so in order to make a script that depended on another script, you had to throw the shared code on the global:
window.MyScript = /* etc */
and then the subsequent script would access that magic global on the window:
var MyScriptCore = window.MyScript
console.log(MyScriptCore())
This is all kinds of bad - causing a namespace pollution problem, and requiring scripts to be loaded in the exact right order or face inscrutable "undefined is not a function" errors. What we really wanted was modules, rather than scripts.
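To make the load-order problem concrete, here is a minimal sketch. It assumes a plain object standing in for the browser's window, and models each script file as a function; coreScript, appScript and runInOrder are hypothetical names for illustration only.

```javascript
// Each "script" is modeled as a function that reads and writes a shared global.
// `win` stands in for the browser's window object (an assumption so this also runs in Node).
function coreScript(win) {
  // The library publishes itself on the global.
  win.MyScript = function () {
    return 'core ready';
  };
}

function appScript(win) {
  // The app reaches for the magic global and hopes it is already there.
  var MyScriptCore = win.MyScript;
  return MyScriptCore();
}

// Runs the scripts one after another against a fresh global, like <script> tags in order.
function runInOrder(scripts) {
  const win = {};
  let result;
  for (const script of scripts) {
    result = script(win);
  }
  return result;
}

console.log(runInOrder([coreScript, appScript])); // prints "core ready"

try {
  runInOrder([appScript, coreScript]); // same scripts, wrong order
} catch (err) {
  console.log(err instanceof TypeError); // prints true
}
```

Nothing in either script declares that one depends on the other, so the only thing keeping this working is the order of the tags.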
Relying on script CDNs introduced security and latency costs, so the alternative was to download the code and glue it together in the right way to create one (or more) custom JS bundles entirely within your control and hosted on your infrastructure. To gloss over a huge amount of development in the space (that I don't know much about), "task runners" arose to automate this process for you - Grunt, Gulp, Broccoli.
We realized that the act of concatenating scripts requires building a dependency graph and managing any namespace conflicts via static analysis. Where "task runners" mostly stopped at concatenating files per your explicit instructions, bundlers can use ASTs to statically analyze imports, ensure everything is initialized in the right order, and produce a minimal number of bundles for optimized loading.
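As a rough illustration of "ensure everything is initialized in the right order", here is a minimal sketch of a depth-first ordering over a dependency graph. The graph is written out by hand and the file names are hypothetical; a real bundler derives the graph from the AST and has to cope with circular imports, which this sketch simply rejects.

```javascript
// Hypothetical module graph: each key lists the modules it imports.
const graph = {
  'app.js': ['ui.js', 'api.js'],
  'ui.js': ['utils.js'],
  'api.js': ['utils.js'],
  'utils.js': [],
};

// Returns module ids so that every module appears after all of its dependencies.
function bundleOrder(graph, entry) {
  const order = [];
  const state = new Map(); // id -> 'visiting' | 'done'

  function visit(id) {
    if (state.get(id) === 'done') return;
    if (state.get(id) === 'visiting') {
      throw new Error('Circular dependency at ' + id);
    }
    if (!Object.prototype.hasOwnProperty.call(graph, id)) {
      throw new Error('Unknown module: ' + id);
    }
    state.set(id, 'visiting');
    for (const dep of graph[id]) {
      visit(dep);
    }
    state.set(id, 'done');
    order.push(id); // pushed only after all dependencies have been pushed
  }

  visit(entry);
  return order;
}

console.log(bundleOrder(graph, 'app.js'));
// prints [ 'utils.js', 'ui.js', 'api.js', 'app.js' ]
```

Note that utils.js shows up once even though two modules import it, which is the difference from naive concatenation.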
To facilitate static analysis of dependency graphs, bundlers like Webpack embraced the
import syntax, which is standardized as ES Modules (here's Lin Clark on ESModules) and has begun healing the divide between the various module specifications.
Incidentally, the job of task runners was also chipped away as package managers, primarily npm but also bower, built simple and effective scripting into their CLIs. The similarity and overlapping responsibilities caused much confusion but hopefully I have given a good accounting here.
Today, Webpack is in wide use for apps and metaframeworks (tools that bundle other tools to lessen the pain of configuration) like
create-react-app, Next.js and Gatsby. Rollup is in second place and is widely used for building libraries as opposed to apps, partly for historical reasons and partly because its hard reliance on ES Modules lets it build smaller bundles at the cost of some incompatibility. Parcel is the newest kid on the block (launched Dec 2017), aiming for a batteries-included approach with parallelizable, origin-agnostic graph crawling.
Writing web apps often involves coordinating non-JS assets, like images and css files. It's often better to co-locate related assets with their associated code than to have to manage them in separate folders. Webpack made it a norm to simply
import Image from './image.png' right next to the code that uses it.
One issue that divided the community for a long time was whether you should transpile only your code or whether you should transpile all code including your code's dependencies and so on. With the former, you risked your app ugly-crashing if the dependencies weren't transpiled to your target browsers (hurting developer experience). The inverse was also a problem - if your dependencies preemptively transpiled to browsers you didn't really care about, their code was also a lot bigger than you actually needed (hurting user experience).
With the latter, you took your fate in your own hands - at the cost of much longer build times. IMO the consensus has recently shifted to the latter, but with passionate disagreement - more reading here.
A final important note on JS compatibility: with the advent of evergreen browsers and the undying nature of legacy browsers, it is now common to use the module/nomodule pattern and build twice (one bundle for each group of browser targets, to ensure optimal size for evergreen browsers -and- maximum compatibility for legacy browsers).
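For the module/nomodule pattern, here is a minimal sketch of the markup a build might emit for the two bundles. The helper name scriptTagsFor and the bundle file names are hypothetical; the type="module" and nomodule attributes are the standard ones the pattern relies on.

```javascript
// Builds the pair of tags for a "build twice" setup.
// Browsers that understand ES modules load the modern bundle and skip `nomodule` scripts;
// older browsers ignore type="module" and load the legacy bundle instead.
function scriptTagsFor(bundles) {
  return [
    '<script type="module" src="' + bundles.modern + '"></script>',
    '<script nomodule src="' + bundles.legacy + '"></script>',
  ].join('\n');
}

// Hypothetical output file names from the two builds.
const tags = scriptTagsFor({
  modern: '/static/app.modern.js',
  legacy: '/static/app.legacy.js',
});

console.log(tags);
```

Each browser ends up downloading only one of the two bundles, which is what makes building twice worth the extra build time.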
Again - you can compile-to-JS all of these independently, but more likely than not you're also going to want to tie this work to a step in your overall bundler workflow.
The other job of task runners, and eventually of bundler plugins, apart from all this module, target, and language related work, is the set of production optimizations that usually get attached to the build pipeline (not core to bundling, but usually coupled with it). Here is a nonexhaustive list of optimizations in rough order of importance:
- Image and Font Optimization - these can be very large raw files that you can downsample to just the quality/glyphs you actually use. However, if you stick a bunch of image processing into your build process, especially without caching, you slow it down. Gatsby-Transformer-Sharp does a lot of pre-emptive image processing that you may wish to avoid - caching can help. I favor one-time work but the best approach is probably to use an Image CDN like Cloudinary or Netlify Large Media.
- Prerendering or Server Side Rendering - so that first-loads of HTML show content instead of having to download and parse JS to be rendered clientside.
- Creating HTML files that load your generated assets optimally: so you don't have to
- Inlining Critical CSS - so pages show up with important styles already loaded
- Injecting environment variables - so you can vary Constants used in your code based on production/staging/other environments
- PWA creation - plugins for offline-first behavior and cached speed
- Content Hashing - for cachebusting - less relevant with some modern CDN configurations, but still can be relevant for browser caching scenarios
- Tree Shaking and Dead Code Elimination: static analysis for stripping out unused code also for the purpose of reducing JS bundle size - but overrated in terms of actual impact
As you can see, enough of these matter to modern web apps that build tools become critically important for most teams and framework communities.
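To pick one item from the list, content hashing can be sketched in a few lines. This is a minimal sketch using Node's built-in crypto and path modules; the helper name hashedFileName, the 8-character hash length and the choice of SHA-256 are my assumptions, not what any particular bundler does.

```javascript
const crypto = require('crypto');
const path = require('path');

// Derives an output file name from the file's contents, e.g. main.js -> main.<hash>.js.
// Takes a bare file name; any directory part is dropped in this sketch.
function hashedFileName(fileName, contents) {
  const hash = crypto
    .createHash('sha256')
    .update(contents)
    .digest('hex')
    .slice(0, 8); // short prefix, assumed to be enough for cachebusting
  const parsed = path.parse(fileName);
  return parsed.name + '.' + hash + parsed.ext;
}

const v1 = hashedFileName('main.js', 'console.log("v1")');
const v1Again = hashedFileName('main.js', 'console.log("v1")');
const v2 = hashedFileName('main.js', 'console.log("v2")');

console.log(v1 === v1Again); // prints true: same contents, same name, cache stays valid
console.log(v1 === v2); // prints false: new contents, new name, cache is busted
```

Because the name changes only when the contents change, the file can be cached for a very long time without users getting stale code.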
Secondary to Production Optimizations (which impact end users), modern bundlers also offer niceties for Developer Experience (which end users don't see):
- Automatic Dependency Collection - Old school Task Runners and even Google Closure Compiler require you to manually declare all dependencies upfront. Bundlers like Webpack automatically build and infer your dependency graph based on what is imported and exported.
- Dev Server - Single Page Apps and JAMstack apps have no Node server of their own to run locally, so the bundler boots one up for local testing.
- Hot Module Replacement - reduces the feedback loop of code changes by swapping out modules while an app is running, without a full reload. Reliant on having a running Dev Server. A higher level of hot reloading preserves the state of components while changing them, not just modules - this is an active area of development in React and a first class citizen in Elm. (See Addendum for more details)
- Source Maps - production code isn't human readable after all the production optimizations - bundlers can map source to production code, and browsers can request and display errors pointing to the original source if given a sourcemap. Useful both in production and in development.
- Bundle Analyzer/Visualizer plugin - the modern app uses too many modules to manually account for. Visualization and analysis can help identify the biggest points to work on first. Formidable Labs makes a nice Dashboard for this.
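Automatic Dependency Collection is the item in this list that is easiest to show in code. Here is a deliberately naive sketch that pulls import specifiers out of source text with a regular expression. Real bundlers parse the code into an AST instead, so treat findImports (a hypothetical helper) as an illustration of the idea only; it will be fooled by comments, strings, and dynamic imports.

```javascript
// Naive scan for static import statements; returns the imported specifiers in order.
function findImports(source) {
  const pattern = /import\s+(?:[\w*\s{},]+\s+from\s+)?['"]([^'"]+)['"]/g;
  return Array.from(source.matchAll(pattern), function (match) {
    return match[1];
  });
}

// Hypothetical source file, kept as a string so this sketch runs as a plain script.
const source = [
  "import React from 'react';",
  'import { a, b } from "./utils.js";',
  "import './styles.css';",
].join('\n');

console.log(findImports(source));
// prints [ 'react', './utils.js', './styles.css' ]
```

Run this over every file reachable from the entry point and you have the dependency graph that task runners made you write out by hand.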
Specifically for frontend developers using frameworks, all of the above can be a lot to set up just to say "Hello World" with best practices. It is natural to want to subsume these build tools under a single abstraction to form a new starting point. Dan Abramov, with inspiration from Ember CLI, led the way on this in React with
create-react-app (this was the subject of my second ever talk).
In particular, Server-side Rendering is a pain point and of critical importance for performance while using a front-end framework, leading to Gatsby and Next.js (React), Gridsome and Nuxt.js (Vue), and Scully (Angular).
Even stepping outside the frontend web developer role, there is a case for abstracting over build tools, as with TSDX which helps write TypeScript libraries.
A small and passionate part of the webdev community is working very hard to make build tools more optional than they are today. Given that modules are the primary/original reason for the rise of build tools, and have since been standardized in ECMAScript, each runtime is working on making them usable without a build tool. ES Modules arrived in browsers in 2017, and Node.js unflagged them in Nov 2019 (putting them on track for widespread use in Node 14 in 2020). Pika and Rollup are banking especially hard on the universal, ESM-enabled future.
Here's V8's advice:
With modules, it becomes possible to develop websites without using bundlers such as webpack, Rollup, or Parcel. It’s fine to use native JS modules directly in the following scenarios:
- during local development
- in production for small web apps with less than 100 modules in total and with a relatively shallow dependency tree (i.e. a maximum depth less than 5)
Imo module bundlers will be necessary for prod builds for a good while yet. In a few years, ES modules perf + modulepreload + H2 Push + Cache Digests might give us a compelling story, but it's a long road ahead. Modules for dev/authoring format may take off in short term.
Other reads I recommend on this topic:
- Blogpost: Why would I use a Webpack?
- Webpack and Rollup: the same but different
- Comparing bundlers: Webpack, Rollup & Parcel
- ESModules: A Cartoon Deep Dive
- ESModules Support vs Build Tools
- Kyle Simpson on the divergence between Human-written JS and Machine-run JS
- Sean Larkin's Webpack Academy Workshop starts with 1 hour on Why Webpack - view on Frontend Masters
I don't have much experience in the details of HMR. These are notes from Mark Erikson:
- Module reloading: recompile, push new code to client, allow app to do something
- "Plain" client reloading: just reimport and use affected modules. For React, this is mostly just reimport and re-render.
- "Clever" reloading: attempting to preserve state in the component tree, but this requires much more complex work. React-Hot-Loader works by using Babel to insert
module.hot.accept() and "proxy components" around every component it can identify, and moving the state up into the proxy component, but this is fragile. That's why the new "Fast Refresh" capability is actually half-built into React itself, with bundler-specific use of the APIs.
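To make the "plain" client reloading case above a bit more tangible, here is a toy model of it. To be clear about assumptions: createHotRegistry and everything on it are hypothetical names of mine, and this is not how Webpack's HMR runtime is implemented. It only shows the shape of "swap the module, let the app re-import and re-render".

```javascript
// Toy model only: a registry of modules plus callbacks to run when one is swapped.
function createHotRegistry() {
  const modules = new Map(); // module id -> current exports
  const handlers = new Map(); // module id -> callbacks to run after a swap

  return {
    define: function (id, exports) {
      modules.set(id, exports);
    },
    get: function (id) {
      return modules.get(id);
    },
    accept: function (id, callback) {
      if (!handlers.has(id)) handlers.set(id, []);
      handlers.get(id).push(callback);
    },
    replace: function (id, newExports) {
      modules.set(id, newExports);
      const callbacks = handlers.get(id) || [];
      callbacks.forEach(function (callback) {
        callback(newExports);
      });
      // false means nobody accepted the update, so a full reload would be needed
      return callbacks.length > 0;
    },
  };
}

const hot = createHotRegistry();
const appState = { count: 3 }; // state lives outside the module being swapped

hot.define('./view', { render: function (state) { return 'Count: ' + state.count; } });
let render = hot.get('./view').render;

// The app opts in: when './view' changes, re-import it and use the new render.
hot.accept('./view', function (next) {
  render = next.render;
});

const before = render(appState);
hot.replace('./view', { render: function (state) { return 'Total: ' + state.count; } });
const after = render(appState);

console.log(before); // prints "Count: 3"
console.log(after); // prints "Total: 3"
```

The state survives here only because it was kept outside the swapped module. Preserving state that lives inside components is the "clever" part, and that is where the extra complexity comes from.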