RFC: Changing how we handle Webpack Building

Why does it matter if the webpack build doesn’t work on other environments? As long as we have a way of producing reliable and reproducible bundles, we only need it to work on the environment we use to generate them. The resulting bundles are architecture- and distro-agnostic.
The package builds for all environments can just rely on that output being available for them, and the same files will be available for anyone using custom plugins. If someone wants to recreate the bundles from scratch they can run the webpack process on their own on whatever distro/arch they want, but we don’t have to support that or promise that the result would be compatible with all plugins.

If you never need anything like node-sass after that build process, yes.

Yes, that’s what I’m saying: In reality, it will probably stop working soon, because we don’t see the problems with these envs anymore.

We never need any of the node modules after the build process. We currently package them as RPMs for the sole purpose of making them available to the builders in the case of RPMs, and install them with npm in the case of debians. None of the modules are shipped to our users anyway, just the resulting webpack bundles, which are always completely distro-agnostic (they are simple js files that are served as-is to the browser).

Is this something that should matter to us? I think it is much better to provide a way of reliably generating the same js files regardless of where they were built - which is why I tend to think using a container to create them might be the best way to go.

I like option 2 more than option 1, because it decouples the updates of Foreman itself from the npm dependencies. With option 1, every nightly (or every z-stream build) would actually produce a new, potentially unique version of the bundle, which is something I’m not sure we need (and in z-stream, we actually want to avoid it IMO). Also, this update would potentially cause breakage of plugins that were built against this vendor.

It looks like option 2 gives more control over what actually gets in. One con of option 2 compared to option 1 is that with option 1, the srpm provides a more reproducible build, as more sources are bundled there, while with option 2 it depends on which assets rpm is currently available in the build root (unless we actually versioned the builds and used strict dependencies).

Regarding the storage need, we could still store the package.json with the foreman rpm, to list what npm packages were used, which could be used later for producing an assets rpm with the same version (in case the assets rpm was already removed).

I think it’s still quite a corner case, as we usually need to re-build just the latest versions of nightly and z-streams, so many of the older assets rpms could be cleaned up quite soon without anyone missing them. Basically, in nightlies, keeping just the last few assets rpms would be good. With z-streams, we could actually keep them, because I truly don’t expect we would need to do many npm updates within z-streams.

For the plugins, I wonder if the best path wouldn’t be to actually add the dependencies that plugins need to the core assets rpm. It would help unify the npm package versions being used across the ecosystem.

Otherwise, option 1 sounds more suitable for plugins (I mean using option 2 for core and option 1 for the plugins).

I’m also in favor of option 2 for the core.

I think that ideally plugins import all the common dependencies from an npm package foreman (it doesn’t exist yet but we need to create it anyway to fix issues with tests). I believe that then there won’t be many purely plugin dependencies left. Even though option 1 for plugins might be simpler, I’d probably vote for using the same approach that we take with core. Just for the sake of unification.

If it is outside of the rpm/deb environment, could we get rid of packaging the node modules by storing package-lock.json in the git repo too? That should give us a reproducible way of installing the same modules.
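
For illustration, a minimal sketch of that workflow (assuming npm 5.7+, which added npm ci):

npm install   # resolves dependencies and writes package-lock.json; commit it to git
npm ci        # on any other machine: installs exactly the tree recorded in the lockfile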

As I have thought about this and tried to understand it, especially compared to how the asset pipeline has worked (and not really caused us trouble), I wrote down my assumptions about how things work today for core webpack and plugins. I’d ask @ui_ux developers to read through, correct statements, and add more details where appropriate, given that part of this solution requires a potential change to how we handle plugin assets.

Webpack compile on core creates two JS files:

  • bundle.js --> contains all of the Foreman application code and 0 third party dependencies
  • vendor.js --> contains all of the Foreman third party dependency code

Change events:

  • vendor.js should only change when third party dependencies update
  • bundle.js should only change when Foreman code changes
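
For readers unfamiliar with how this split is configured, here is a minimal webpack 3 sketch of the two-chunk setup (a sketch only - the entry paths and vendor list are illustrative, not Foreman’s actual config, which lives in config/webpack.config.js):

var path = require('path');
var webpack = require('webpack');

module.exports = {
  entry: {
    // application code; the entry path here is illustrative
    bundle: './webpack/assets/javascripts/bundle.js',
    // third-party dependencies that should land in vendor.js
    vendor: ['react', 'lodash'],
  },
  output: {
    path: path.join(__dirname, 'public/webpack'),
    filename: '[name].js',
  },
  plugins: [
    // minChunks: Infinity keeps application modules out of vendor.js, so each
    // file only changes for the reasons listed above
    new webpack.optimize.CommonsChunkPlugin({ name: 'vendor', minChunks: Infinity }),
  ],
};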

For plugins:

  • plugins have requires on 1 or more dependencies in vendor.js
  • plugins may use APIs defined by Foreman code which lives in bundle.js
  • sometimes plugins need to generate their own vendor.js for third party dependencies only they depend on

Compilation:

  • webpack compile can change references to vendor libraries
  • webpack compile can change references to APIs plugins rely on in bundle.js
  • webpack compile eliminates redundancy by magically detecting what’s in vendor or bundle and keeping it out of the plugin bundle

Requirements:

  • Foreman needs to be able to generate vendor and bundle for core code
  • plugins need to generate vendor for additional dependencies
  • plugins need to generate a bundle for just their code

As I understand things, generating a plugin’s bundle is not as simple as taking all the JS, concatenating it together and minifying it: the import statements within the code attempt to ensure that dependencies are present, and code references are generated per compilation due to how minification works. The plugin compilation step therefore requires analyzing core’s vendor.js and bundle.js to figure out what is present there, to prevent putting it into the plugin’s bundle. As I think I understand things, this is one of the major issues, given all references to something like lodash need to be the same. Any changes to vendor or bundle therefore “break” the generated “API” webpack creates, and plugins must be recompiled.
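
To make that coupling concrete, here is a minimal sketch of path-based module ids, using webpack 3’s built-in NamedModulesPlugin as a stand-in for the SimpleNamedModulesPlugin referenced in the diff below (the entry path is illustrative):

var path = require('path');
var webpack = require('webpack');

module.exports = {
  entry: './webpack/assets/javascripts/bundle.js', // illustrative entry
  output: {
    path: path.join(__dirname, 'public/webpack'),
    filename: '[name].js',
  },
  plugins: [
    // module ids become paths like './node_modules/lodash/map.js' instead of
    // opaque numbers; a plugin bundle calls into vendor.js with exactly these
    // ids, so any change to core's node_modules layout breaks the plugin
    // bundle until it is recompiled
    new webpack.NamedModulesPlugin(),
  ],
};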

Regarding the webpack process, using webpack-bundle-analyzer can be really helpful for visualizing the bundles webpack is actually creating.

Here is a screenshot of its output:

It is an interactive map; you can run it locally with the following change. Starting the webpack server will make the analyzer run on port 8888:

diff --git a/config/webpack.config.js b/config/webpack.config.js
index 52e3b88..49ccc54 100644
--- a/config/webpack.config.js
+++ b/config/webpack.config.js
@@ -11,6 +11,7 @@ var LodashModuleReplacementPlugin = require('lodash-webpack-plugin');
 var pluginUtils = require('../script/plugin_webpack_directories');
 var vendorEntry = require('./webpack.vendor');
 var SimpleNamedModulesPlugin = require('../webpack/simple_named_modules');
+var BundleAnalyzerPlugin = require('webpack-bundle-analyzer').BundleAnalyzerPlugin;
 
 module.exports = env => {
   // must match config.webpack.dev_server.port
@@ -156,6 +157,8 @@ module.exports = env => {
     minChunks: Infinity,
   }))
 
+  config.plugins.push(new BundleAnalyzerPlugin({ analyzerHost: '0.0.0.0' }));
+
   if (production) {
     config.plugins.push(
       new webpack.NoEmitOnErrorsPlugin(),

On a dev server with katello and foreman-remote-execution checked out, these are the js files rake webpack:compile generates:

public/webpack/katello-5355e26c3c64f9ae449b.js
public/webpack/bundle-64a5567e880bad13b0f0.js
public/webpack/foreman_remote_execution-169b6eac99449ad52178.js
public/webpack/vendor-595096fd722e17348663.js

It looks like there is a bundle.js-type file for each plugin. You can see from the bundle analyzer that it’s mostly source code; there is a small amount of duplication, but nothing major (afaict). It also doesn’t look like more than one vendor.js file is created. I’m pretty sure webpack has magic to prevent duplicating third-party libraries in vendor.js.

Another aspect I found is that foremanReact is “aliased” by webpack, which allows us to use it in Katello and other plugins to import functionality, much the same as we would any other package.

I don’t know if foremanReact is included in vendor.js or there is something that webpack uses to reference the right bundle, but I can look more.
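
For context, an alias in webpack is just a resolver mapping. A minimal sketch of what the foremanReact alias could look like (the exact path is my assumption; the real mapping lives in Foreman’s webpack config):

var path = require('path');

module.exports = {
  resolve: {
    alias: {
      // lets plugins write imports from 'foremanReact/...' and have them
      // resolve into core's source tree instead of an npm package
      foremanReact: path.join(__dirname, '../webpack/assets/javascripts/react_app'),
    },
  },
};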

My understanding of the overall webpack process is we end up with the following bundles (ignoring gzipped and map files)

bundle.js # foreman source code
foo-plugin.js # plugin source code, there will be a file for each plugin
vendor.js # third-party libraries from 'package.json' only.

It’s still not clear to me what happens to packages that are only in plugins but not in foreman core. It looks like they end up under node_modules paths in the respective plugin’s bundle.

I’m learning this myself, so if any of my assumptions are incorrect, please jump in and correct me.

Hope this helps!

For running the bundle analyzer locally, I forgot the most important step, you have to install it :smile:
npm install --save-dev webpack-bundle-analyzer

Then you can make the changes in the diff and restart your server to see the analyzed bundle on port 8888

One thing to note is that the structure of the node_modules directory is affected by multiple things:

  • Which distro npm install is executed on (not sure if this only affects the modules with native extensions or all of them).
  • Which version of npm and node is used for the npm install. This we could possibly control using nvm, which afaik works similarly to rvm for ruby.
  • What modules are included in package.json. npm has a method of de-duping common dependencies when possible, but this de-duping is dependent on the dependency tree. Currently, when building plugin bundles, I think we add their dependencies to the package.json, which can cause a different folder structure to be created due to the different dependency tree.
  • New modules being released that change their dependencies. We don’t pin our modules to specific versions in most cases, so if one of our dependencies releases a new version, that may change the directory structure (see the illustration after this list).
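
To illustrate the de-duping point (the package names here are made up), the same declared dependency can land in either of these layouts depending on the rest of the tree:

node_modules/
  lodash/                  <- hoisted to the top level; webpack id "lodash/map.js"
  some-plugin-dep/
    node_modules/
      lodash/              <- nested copy after a version conflict; webpack id
                              "some-plugin-dep/node_modules/lodash/map.js"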

When developers try to build the bundles locally, they will not get the same structure as the one that is generated at build time, which makes it difficult to debug some issues that are only present in production builds.

Webpack currently uses the path under node_modules to identify each module (https://github.com/theforeman/foreman/pull/5734). What this means is that if the directory structure changes between building core and building a plugin, the plugin’s bundle may not work correctly with core’s vendor.js. To make sure this doesn’t happen we started using hashes to confirm the plugin expects the correct vendor.js from core, but this leads to every single change in the file requiring a rebuild of all plugins to match it - even though in many cases that isn’t required.

why not simply run npm run analyze? :slight_smile: - it comes out of the box

In general though, the relative path shouldn’t change if it’s a top-level module? e.g. lodash/map.js will always be at lodash/map.js unless the lodash project itself changes the publish structure?

That is correct, but if the dependency tree changes that may cause a module that was a sub-dependency to become a top level dependency or vice versa due to de-duping. Also, the exact de-duping may vary between node/npm versions, so a module that is top level on one system may be a sub-module on another.

This idea is based on my limited knowledge, so I’m hoping that folks can steer this in the right direction. Could we move to a model where plugins assume that a “module” exists named foremanVendor, and all requires for third-party modules are requested through the foremanVendor module?

If something like foremanVendor.lodash was a constant I could rely on as a plugin, then I wouldn’t have to care about the relative paths or dependencies shifting inside vendor.js (I think). This seems sorta like what an alias in webpack is designed for. Similar to how all of the Foreman React code is referenceable using the foremanReact alias.
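
To make the proposal concrete, a hypothetical sketch (foremanVendor does not exist today, and all names here are illustrative):

// in core: a foremanVendor module re-exporting the third-party libraries
// plugins are expected to use
export { default as lodash } from 'lodash';
export { default as React } from 'react';

// in a plugin: import through the stable alias instead of letting webpack
// resolve node_modules paths at plugin build time
import { lodash } from 'foremanVendor';

export const names = collection => lodash.map(collection, 'name');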

Excuse my asking of obvious questions, but what problem does the node modules directory structure changing cause? It seems like if you had a package-lock.json or a way to reference package versions you could recreate the bundle.

What is the idea behind this? To ensure our specified packages are always in the same hierarchy in the node_modules folder?

I think alias is more for code that you have locally that you want to reference in imports, but don’t want to package and ship a new npm package for. I would think putting third-party code in an alias is an anti-pattern, but this being the wild west of javascript, I could be wrong :man_shrugging:

As I mentioned, there are several different causes leading to different structures - changes in module versions (which package lock may solve) is just one of them. I’m not sure that running with the same package lock would yield the same structure, e.g. on different npm versions. The problem it causes is in the identifiers assigned to modules during the webpack compilation - which are their paths relative to the node_modules folder.

Ok, gotcha, I’m trying to wrap my mind around doing this in a packaging context so thanks for your patience :slight_smile:

If I understand correctly, the idea is if we are doing this new way of packaging, we want to just be able to rebuild vendor.js and keep the foreman/plugin bundles the same. So if you recompile vendor.js because a package was added and the node_modules directory structure changes, you break all the plugin’s bundles since the references are no longer valid.

That is correct right now, since the hash of vendor.js will change when anything in it changes. If we drop the hash, this will be semi-correct - sometimes it would still work (i.e. when the path doesn’t change) and other times it will break without us knowing, which is why the hash was introduced. The goal is indeed to allow changes in core node modules without requiring all plugins to be rebuilt.

One part I’m still confused on is how the directory structure of node_modules affects the vendor.js bundle. I can walk through a scenario to illustrate my confusion.

  • webpack creates bundled files for foreman, katello, and third-party code. I’m assuming we dropped the hashes here, so let’s call them bundle.js, katello.js, and vendor.js
  • some-package is added to foreman’s package.json
  • npm i is run and the node_modules structure is updated to include some-package, but maybe there are some byproducts of folders being rearranged. The node_modules directory structure is different now.
  • I create a new vendor.js using webpack, but I keep the old katello.js and bundle.js. These are the files being served.

Wouldn’t the existing bundle.js and katello.js still be able to reference vendor.js correctly? As long as no packages were removed, it should have the same packages in it with some-package added. I guess you would need to ensure the existing packages’ versions are correct or compatible with the old bundles, but that seems like it could be done by pinning the versions.

What I am wondering is how node_modules directory structure actually affects the bundles? It seems like webpack will ensure the packages in node_modules are available as modules in vendor.js that can be imported elsewhere. This would mean the directory structure of node_modules is inconsequential.

According to the docs, webpack uses a manifest to find where modules are located. If this is manifest.json in our case, the contents shouldn’t change if the hashes are removed.
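
For reference, the “manifest” in the webpack docs is the runtime bookkeeping webpack injects into a chunk; in webpack 3 it can be split out so vendor.js itself stays stable, roughly like this (a sketch, not Foreman’s actual config):

var webpack = require('webpack');

module.exports = {
  // ...entry and output as in the earlier sketches...
  plugins: [
    new webpack.optimize.CommonsChunkPlugin({ name: 'vendor', minChunks: Infinity }),
    // with multiple CommonsChunkPlugin instances, the last one (whose name is
    // not an entry point) receives the runtime ("manifest"), keeping it out
    // of vendor.js
    new webpack.optimize.CommonsChunkPlugin({ name: 'manifest' }),
  ],
};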

I could be missing something obvious, just trying to get a better idea of the issue, thanks for the explanations so far.

ok, I think I (partially) answered my question right after sending it (as is always the case).

I looked into the bundled files and see numerous references to third-party paths inside node_modules, e.g. node_modules/jquery-flot/jquery.flot.time.js, so the concerns make more sense now.

I’m still not totally clear on how this works client-side: what is node_modules referencing when the bundles get to the browser?

I looked into this more and had some off-thread discussions to clear up my confusion. Also, @tbrisker’s medium article helped explain a lot of the background of our webpack usage and the current issue we are trying to solve. (thanks for writing that!)

I came up with a potential solution to making new vendor.js bundles compatible with other existing bundles if they don’t have breaking changes.

I created a plugin (basing it off @tbrisker’s existing one) that creates a hash based on plugin name, author, and major and minor versions. This is used as the module identifier. The patch (or z) version is excluded from the hash, so updating z versions won’t create new hashes. Only major or minor breaking changes will update the module identifiers.
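
A rough sketch of the idea, not the actual plugin (which lives in the example repo linked below): nearestPackage is a simplified helper I made up for illustration, and the real plugin may hash different fields.

var crypto = require('crypto');
var fs = require('fs');
var path = require('path');

// walk up from a module file to its nearest package.json (simplified)
function nearestPackage(resource) {
  var dir = path.dirname(resource);
  while (!fs.existsSync(path.join(dir, 'package.json'))) {
    var parent = path.dirname(dir);
    if (parent === dir) return null;
    dir = parent;
  }
  var json = JSON.parse(fs.readFileSync(path.join(dir, 'package.json'), 'utf8'));
  return { dir: dir, name: json.name, author: json.author, version: json.version };
}

function MajorMinorModuleIdsPlugin() {}

MajorMinorModuleIdsPlugin.prototype.apply = function(compiler) {
  // webpack 3 plugin API, as used by the built-in HashedModuleIdsPlugin
  compiler.plugin('compilation', function(compilation) {
    compilation.plugin('before-module-ids', function(modules) {
      modules.forEach(function(module) {
        if (module.id !== null || !module.resource) return;
        var pkg = nearestPackage(module.resource);
        if (!pkg) return;
        // drop the patch (z) version so z-stream updates keep the same ids
        var majorMinor = pkg.version.split('.').slice(0, 2).join('.');
        var file = path.relative(pkg.dir, module.resource);
        module.id = crypto.createHash('md5')
          .update([pkg.name, String(pkg.author), majorMinor, file].join('|'))
          .digest('hex')
          .slice(0, 8);
      });
    });
  });
};

module.exports = MajorMinorModuleIdsPlugin;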

This way, if you have a bundle.js that was built with this plugin, it will keep the same references to the vendor.js file as long as packages didn’t change major (x) or minor (y) versions. So you could do the following:

  • build vendor.js files with updated z-versions and use old bundle.js files with them
  • build new bundle.js files and use old vendor.js files with them

I created an example repo with a webpack config that resembles ours. The plugin can be found in this file, and I included steps to test the concept in the README.

I also found webpack has a way to create a records.json file, which maps the identifiers to modules, and can be used to persist this across builds. I enabled this in the repo and use it in the instructions. This might be something we find useful.
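
For reference, the relevant option is a one-liner (a sketch; the example repo wires this up):

var path = require('path');

module.exports = {
  // ...rest of the config...
  // webpack writes module and chunk ids here after a build and reuses them on
  // subsequent builds, so ids persist even when the module graph changes slightly
  recordsPath: path.join(__dirname, 'records.json'),
};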

I’m not sure if this will fit all of our criteria and I haven’t tested it with actual foreman builds, but hopefully it’s a step in the right direction.
