RFC: Changing how we handle Webpack Building

Webpack Build

This RFC aims to address issues that developers have raised with the current webpack workflow. A number of solutions have been proposed and issues raised. To my understanding, the following is a set of proposals for how we can modify the process to reduce time spent debugging issues and increase UI developer efficiency in the fast-changing webpack world. The goal of this RFC is to first lay out the changes that have been discussed, collect data and feedback from the resulting conversations, converge on an agreed-upon set of solutions, and then implement those solutions.

Caveat: It is highly likely I have missed some nuances, proposals, and issues that developers have raised regarding webpack. Please throw them into the discussion and I will keep the RFC up to date as changes come in.

Node Module Packaging Proposals

Option 1: Node modules tarball as a Foreman source

This option would create a new node_modules tarball every time a new Foreman source is created (e.g. as part of test_develop or a specific release). This node_modules tarball would be treated similarly to the Foreman source itself and included in the RPM source. There are 1-2 node modules that have binary components and will still have to be packaged independently. Webpack-using plugins will need to provide their own node_modules tarball alongside their gem source.

Pros

  • Drops nearly all NPM packages
  • The Foreman source will always contain the latest and greatest node_modules per package.json

Cons

  • The node_modules tar is ~50MB
  • Storing the node_modules tar will require enough space on:
    • Koji
    • Jenkins
  • Each SRPM will be ~50-60MB

Option 2: Node modules archive RPM

This option would extract node_modules into their own RPM that serves as an archive, updated strategically based upon changes to package.json. This would drop the packaging of node modules to 1-3 packages (accounting for those with binaries) and could be updated by sourcing directly from npm install. Plugins would either package the few additional modules they require or create their own node_modules RPM.

Pros

  • Drops nearly all individual NPM packaging
  • Allows for node_modules to be in lockstep with core

Cons

  • Storage is a concern, though it depends on how often the node_modules RPM is updated
  • Requires a rebuild process whenever package.json is updated
  • Plugins still need a viable solution

Webpack Compilation Proposal

Today, the Foreman RPM runs webpack compilation based upon the available sources and provides a macro for plugins. This makes it hard to replicate locally what the RPM build process is doing when tracking down issues. The proposal is to move webpack compilation out of the RPM/Deb steps and into a tool, such as a script or Docker container, that can easily replicate the compilation environment for developers.

This change would mean that we compile webpack outside of the RPM and Deb build environments, commit the compiled artifacts to the Foreman and plugin sources, and include them in the RPM. This does not cancel out the need for the packaging change proposals: if we compile webpack outside of the RPM/Deb environment, we still need to at a minimum provide the set of node modules used to compile as sources (e.g. by including them in the Foreman SRPM). However, if compilation occurs outside of RPM/Deb environments, we would avoid having to package the binary node modules (e.g. node-sass).
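
As a purely illustrative sketch (assuming a Node script and the existing rake task; none of this is an agreed design), such a helper could be as small as:

#!/usr/bin/env node
// build_webpack.js - hypothetical helper, meant to be run inside a container
// or VM with pinned Node/npm versions so that every developer and builder
// compiles the bundles in the same environment.
const { execSync } = require('child_process');

// Fetch the node modules; with a committed lockfile this step would be
// reproducible (see the lockfile discussion below).
execSync('npm install', { stdio: 'inherit' });

// Run the same compilation a dev setup runs via rake webpack:compile.
execSync('bundle exec rake webpack:compile', { stdio: 'inherit' });

The compiled artifacts under public/webpack would then be committed or shipped as sources to the RPM/Deb builds.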


Another con for (against?) option 1:

Our current package.json does not completely pin versions, and we don’t have a package-lock.json. This means that if we don’t package our node modules, we end up with a potentially (!) different set of modules on every RPM build (or whenever we run npm i and tar the result). This will lead to a different vendor.js each time, and require us to rebuild webpack-using plugins more often. We don’t hit this problem that often today, as the list of modules does not change often.
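
To illustrate with a hedged example (these ranges are made up, not taken from Foreman’s package.json): semver ranges like the following let npm resolve a different version on each install, whereas a committed package-lock.json records the exact tree that was resolved:

{
  "dependencies": {
    "lodash": "^4.17.0",
    "react": "~16.2.0"
  }
}

With ^4.17.0 any 4.x release at or above 4.17.0 may be picked up, and with ~16.2.0 any 16.2.x patch release, so two builds weeks apart can produce a different vendor.js from identical source.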

Also, may I ask what issues have been raised? I am not following the core webpack stuff too closely, but am curious from the packaging side.

I think just packing in node_modules will lead to problems on non-x86_64 platforms because of some node packages bringing native bindings (e.g. node-sass).

Would you mind expanding on the non-x86_64 platforms we have? I think most folks may not realize that we build for them.

For core, only arm64/aarch64 is left at the moment, but I see this as a more general problem, as there might be more platforms in the future which we might want to build packages for (e.g. ppc64). Also, armhf and i386 were only removed from the DEB builds because the whole JS build process got out of hand CPU- and (recently) RAM-wise over time.

If webpack compilation were done outside of the build environment, this might no longer be a factor and would avoid the architecture issue. These additional architectures are Debian-only, right? Do we have insight into the benefit/effort of supporting these arches relative to their usage?

The effort is minimal (if nodesource repos are available) - and probably so is the usage. I don’t think my arguments are strong enough here, but if we switch the build process to be executed regularly on RH7/x86_64/NodeJS6 (EPEL) only, we simply won’t see all the issues that occur in other environments and thus lose the ability to just add OSes/archs like we can now.

If we change the webpack compilation to run separately from the packaging process, wouldn’t that mean we won’t have any more issues regarding other OSes/archs? The final artifacts, the webpacked JS bundles, don’t really care what OS they are served from - they are executed by the browser, not the server. We could just reuse the same artifacts for all packages, as well as to test locally for issues.

There are several issues that led to this proposal:

  1. A lot of effort is wasted packaging NPM modules as RPMs that are only used to make the modules available for the RPM build process. There are often issues caused by mismatched versions/dependencies, bundled vs. unbundled modules, and more. Effectively, we are trying to solve in RPM the dependency issues that npm already solves better (such as allowing multiple versions of the same module as a dependency for other modules), for little benefit (as the only user of these RPMs is the koji builder)
  2. UI development is slowed down as any change affecting the node modules requires matching changes in packaging, changes that are sometimes found to cause unexpected breakages that are only discovered in nightlies.
  3. Different distros/architectures have different resulting js bundles due to different node and npm versions being used by the builders.
  4. There is currently no way for developers to generate a “production-like” build locally, short of downloading nightly packages - so debugging any issue found only in the builds becomes an extremely difficult and slow process.
  5. The problem you mentioned, that repeated builds may lead to different module versions being pulled in, is already present on Debian, where npm i is executed as part of the build process. I would argue that the rpm packages don’t change very often because of the pain of changing them and the fear of causing something to break, not because we don’t want to change them. We have multiple modules that are already out of date which we don’t touch just so the build doesn’t break.
  6. Plugins regularly break due to changes in core modules, requiring constant rebuilds every time something changes in core - again, with no way of testing whether a change in core will require a rebuild until something breaks. Adding the hashes to the modules is an improvement, but it still requires rebuilds once we realize something changed, sometimes needlessly (for example, if the content of a module changed but none of the APIs it exposes that are being used did). The whole plugin workflow most likely needs to change somehow to allow changes in core without causing all plugins to break (perhaps by importing the foreman node module instead?).

If this works with RPMs and DEBs, and in a way that people still have something like foreman-assets for their custom plugins, then yes - at first sight. However, the build process will end up tied to that exact environment, and if it stopped working on different environments (different Node versions, OSes, architectures…) we just wouldn’t notice.

Why does it matter if the webpack build doesn’t work on other environments? As long as we have a way of producing reliable and reproducible bundles, we only need it to work on the environment we use to generate them. The resulting bundles are architecture- and distro-agnostic.
The package builds for all environments can just rely on that output being available for them, and the same files will be available for anyone using custom plugins. If someone wants to recreate the bundles from scratch, they can run the webpack process on their own on whatever distro/arch they want, but we don’t have to support that or promise that the result will be compatible with all plugins.

If you never need anything like node-sass after that build process, yes.

Yes, that’s what I’m saying: In reality, it will probably stop working soon, because we don’t see the problems with these envs anymore.

We never need any of the node modules after the build process. We currently package them as RPMs for the sole purpose of making them available to the builders in the RPM case, and install them with npm in the Debian case. None of the modules are shipped to our users anyway - just the resulting webpack bundles, which are always completely distro-agnostic (they are simple JS files that are served as-is to the browser).

Is this something that should matter to us? I think it is much better to provide a way of reliably generating the same JS files regardless of where they are built - which is why I tend to think that using a container to create them might be the best way to go.


I like option 2 more than option 1, because it decouples updates of Foreman itself from the npm dependencies. With option 1, every nightly (or every z-stream build) would actually produce a potentially unique new version of the bundle, which is something I’m not sure we need (and in z-streams we actually want to avoid it, IMO). Such an update would also potentially break plugins that were built against the previous vendor.

It looks like option 2 gives more control over what actually gets updated. One con of option 2 versus option 1 is that with option 1 the SRPM provides a more reproducible build, as more sources are bundled there, while with option 2 it depends on which assets rpm is currently available in the build root (unless we actually versioned the builds and used strict dependencies).

Regarding the storage needs, we could still ship the package.json with the foreman rpm, to list which npm packages were used; this could later be used for producing an assets rpm with the same versions (in case that assets rpm was already removed).

I think it’s still quite a corner case, as we usually need to re-build just the latest versions of nightly and z-streams, so many of the older assets rpms could be cleaned up quite soon without anyone missing them. Basically, in nightlies, keeping just the few last assets rpms would be good. With z-streams, we could actually keep them, because I truly don’t expect we would need to do many npm updates within z-streams.

For the plugins, I wonder if the best path wouldn’t be to actually add the dependencies that plugins need to the core assets rpm. It would help unify the npm package versions being used across the ecosystem.

Otherwise, option 1 sounds more suitable for plugins (I mean using option 2 for core and option 1 for the plugins).

I’m also in favor of option 2 for the core.

I think that ideally plugins would import all the common dependencies from a foreman npm package (it doesn’t exist yet, but we need to create it anyway to fix issues with tests). I believe that then there won’t be many purely plugin-specific dependencies left. Even though option 1 might be simpler for plugins, I’d probably vote for using the same approach that we take with core, just for the sake of unification.

If it is outside of the rpm/deb environment, could we get rid of packaging the node modules by storing package-lock.json in the git repo too? That should give us a reproducible way of installing the same modules.

As I have thought about this and tried to understand it, especially compared to how the asset pipeline has worked (and not really caused us trouble), I wrote down my assumptions about how things work today for core webpack and plugins. I’d ask @ui_ux developers to read through, correct statements, and add more details where appropriate, given that part of this solution requires a potential change to how we handle plugin assets.

Webpack compile on core creates two JS files:

  • bundle.js --> contains all of the Foreman application code and no third-party dependencies
  • vendor.js --> contains all of Foreman’s third-party dependency code

Change events:

  • vendor.js should only change when third-party dependencies update
  • bundle.js should only change when Foreman code changes (see the config sketch after this list)
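
A hedged sketch of how such a split is typically wired up with webpack 3’s CommonsChunkPlugin (the entry paths and vendor list are illustrative; Foreman’s actual config lives in config/webpack.config.js and pulls its vendor entry from webpack.vendor, as the diff later in this thread shows):

var webpack = require('webpack');
var path = require('path');

module.exports = {
  entry: {
    // Foreman application code (path is illustrative)
    bundle: './webpack/assets/javascripts/bundle.js',
    // Third-party dependencies only (list is illustrative)
    vendor: ['react', 'react-dom', 'lodash'],
  },
  output: {
    // Produces bundle-<hash>.js and vendor-<hash>.js, as listed further down
    filename: '[name]-[chunkhash].js',
    path: path.join(__dirname, 'public', 'webpack'),
  },
  plugins: [
    // Pull the vendor entry into its own chunk; minChunks: Infinity keeps
    // application modules from being hoisted into vendor.js.
    new webpack.optimize.CommonsChunkPlugin({
      name: 'vendor',
      minChunks: Infinity,
    }),
  ],
};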

For plugins:

  • plugins require one or more dependencies in vendor.js
  • plugins may use APIs defined by Foreman code, which lives in bundle.js
  • sometimes plugins need to generate their own vendor.js for third-party dependencies only they depend on

Compilation:

  • webpack compile can change references to vendor libraries
  • webpack compile can change references to APIs plugins rely on in bundle.js
  • webpack compile eliminates redundancy by magically detecting what’s in vendor or bundle and keeping it out of the plugin bundle

Requirements

  • Foreman needs to be able to generate vendor and bundle for core code
  • plugins need to generate vendor for additional dependencies
  • plugins need to generate a bundle for just their code

As I understand things, generating a plugin’s bundle is not as simple as taking all the JS, concatenating it, and minifying it, due to the import statements within the code that attempt to ensure dependencies are present, combined with the per-compilation generation of code references that minification produces. The plugin compilation step therefore requires analyzing core’s vendor.js and bundle.js to figure out what is already present there and keep it out of the plugin’s bundle. As I understand it, this is one of the major issues, given that all references to something like lodash need to be the same. Any change to vendor or bundle therefore “breaks” the generated “API” webpack creates, and plugins must be recompiled.


Regarding the webpack process, using webpack-bundle-analyzer can be really helpful for visualizing the bundles webpack is actually creating.

(Screenshot of the analyzer’s interactive bundle map omitted.)

The output is an interactive map; you can run it locally with the following change. Starting the webpack server will make the analyzer available on port 8888:

diff --git a/config/webpack.config.js b/config/webpack.config.js
index 52e3b88..49ccc54 100644
--- a/config/webpack.config.js
+++ b/config/webpack.config.js
@@ -11,6 +11,7 @@ var LodashModuleReplacementPlugin = require('lodash-webpack-plugin');
 var pluginUtils = require('../script/plugin_webpack_directories');
 var vendorEntry = require('./webpack.vendor');
 var SimpleNamedModulesPlugin = require('../webpack/simple_named_modules');
+var BundleAnalyzerPlugin = require('webpack-bundle-analyzer').BundleAnalyzerPlugin;
 
 module.exports = env => {
   // must match config.webpack.dev_server.port
@@ -156,6 +157,8 @@ module.exports = env => {
     minChunks: Infinity,
   }))
 
+  config.plugins.push(new BundleAnalyzerPlugin({ analyzerHost: '0.0.0.0' }));
+
   if (production) {
     config.plugins.push(
       new webpack.NoEmitOnErrorsPlugin(),

On a dev server with katello and foreman-remote-execution checked out, these are the js files rake webpack:compile generates

public/webpack/katello-5355e26c3c64f9ae449b.js
public/webpack/bundle-64a5567e880bad13b0f0.js
public/webpack/foreman_remote_execution-169b6eac99449ad52178.js
public/webpack/vendor-595096fd722e17348663.js

It looks like there is a bundle.js-type file for each plugin. You can see from the bundle analyzer that it’s mostly source code; there is a small amount of duplication, but nothing major (afaict). It also doesn’t look like more than one vendor.js file is created. I’m pretty sure webpack has magic to prevent duplicating third-party libraries in vendor.js.

Another aspect I found is that foremanReact is “aliased” by webpack, which allows us to use it in Katello and other plugins to import functionality, much the same as we would any other package.
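
A minimal sketch of what that alias looks like in a webpack config (the directory and the example import are assumptions for illustration, not copied from Foreman’s config):

var path = require('path');

module.exports = {
  // ...rest of the webpack config...
  resolve: {
    alias: {
      // Map the name 'foremanReact' onto Foreman's React sources so plugins
      // can import from it like any other package (directory is assumed).
      foremanReact: path.join(__dirname, 'webpack/assets/javascripts/react_app'),
    },
  },
};

// A plugin can then write, e.g.:
//   import componentRegistry from 'foremanReact/components/componentRegistry';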

I don’t know if foremanReact is included in vendor.js or whether there is something webpack uses to reference the right bundle, but I can look into it more.

My understanding of the overall webpack process is we end up with the following bundles (ignoring gzipped and map files)

bundle.js # foreman source code
foo-plugin.js # plugin source code, there will be a file for each plugin
vendor.js # third-party libraries from 'package.json' only.

It’s still not clear to me what happens to packages that are only in plugins, but not in Foreman core. It looks like they end up under node_modules in the respective plugin’s bundle.

I’m learning this myself, so if any of my assumptions are incorrect, please jump in and correct me.

Hope this helps!

For running the bundle analyzer locally, I forgot the most important step - you have to install it :smile:
npm install --save-dev webpack-bundle-analyzer

Then you can make the changes in the diff and restart your server to see the analyzed bundle on port 8888

One thing to note is that the structure of the node_modules directory is affected by multiple things:

  • Which distro npm install is executed on (not sure if this only affects the modules with native extensions or all of them).
  • Which version of npm and node is used for the npm install. This we could possibly control using nvm, which afaik works similarly to rvm for Ruby.
  • Which modules are included in package.json. npm has a method of de-duping common dependencies when possible, but this de-duping depends on the dependency tree. Currently, when building plugin bundles, I think we add their dependencies to the package.json, which can cause a different folder structure to be created due to the different dependency tree.
  • New releases of modules that change their own dependencies. We don’t pin our modules to specific versions in most cases, so a new release of one of our dependencies may change the directory structure.

When developers try to build the bundles locally, they will not get the same structure as the one that is generated at build time, which makes it difficult to debug some issues that are only present in production builds.

Webpack currently uses the path under node_modules to identify each module (https://github.com/theforeman/foreman/pull/5734). What this means is that if the directory structure changes between building core and building a plugin, the plugin’s bundle may not work correctly with core’s vendor.js. To make sure this doesn’t happen we started using hashes to confirm the plugin expects the correct vendor.js from core, but this means every single change to the file requires rebuilding all plugins to match it - even though in many cases that isn’t actually necessary.
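
To make that failure mode concrete, here is a simplified, runnable sketch: it reduces the id scheme to “path relative to the project root” (the real SimpleNamedModulesPlugin is more involved), but shows why a re-shuffled node_modules tree breaks the contract between core and plugin bundles:

var path = require('path');

// Named-modules approach: a module's id is derived from where it sits under
// node_modules.
function moduleId(projectRoot, absoluteModulePath) {
  return './' + path.relative(projectRoot, absoluteModulePath);
}

// Layout npm produced when core was built - this is the id vendor.js registered:
console.log(moduleId('/src/foreman',
  '/src/foreman/node_modules/lodash/lodash.js'));
// -> ./node_modules/lodash/lodash.js

// Layout npm produced later when a plugin was built - lodash got nested under
// another dependency, so the plugin bundle now asks for an id that core's
// vendor.js never registered, and the lookup fails at runtime:
console.log(moduleId('/src/foreman',
  '/src/foreman/node_modules/some-dep/node_modules/lodash/lodash.js'));
// -> ./node_modules/some-dep/node_modules/lodash/lodash.js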


Why not simply run npm run analyze? :slight_smile: - it comes out of the box.
