Proposal: Delete old manuals from our site

Hello,

Generating our site is super slow (a minute on my beefy PC). It has gotten to the point where you need to raise the kernel limit for inotify watchers:

FATAL: Listen error: unable to monitor directories for changes.
Visit https://github.com/guard/listen/wiki/Increasing-the-amount-of-inotify-watchers for info on how to fix this.
rake aborted!
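
To check whether a checkout is likely to hit this error, something like the following could help. It is only a rough sketch: it assumes Linux, and it assumes Listen needs roughly one inotify watch per directory. The function names are made up for illustration.

```python
import os
from pathlib import Path

# Linux-only kernel setting that the Listen error message refers to.
INOTIFY_LIMIT_FILE = Path("/proc/sys/fs/inotify/max_user_watches")


def count_directories(root):
    """Count root and every directory below it (symlinks are not followed)."""
    return sum(1 for _ in os.walk(root))


def read_watch_limit(limit_file=INOTIFY_LIMIT_FILE):
    """Return the current inotify watch limit, or None if it cannot be read."""
    try:
        return int(Path(limit_file).read_text().strip())
    except (OSError, ValueError):
        return None


if __name__ == "__main__":
    needed = count_directories(".")
    limit = read_watch_limit()
    print(f"directories to watch: {needed}, inotify limit: {limit}")
    if limit is not None and needed > limit:
        print("The limit is lower than the number of directories.")
```

Run it from the root of the site checkout. Other programs use inotify watches too, so staying under the limit is no guarantee.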

While I was working on a PR that blacklists old manuals from search engines (https://github.com/theforeman/theforeman.org/pull/1256) I figured out that it would be beneficial to simply delete old manuals (core + plugins) from the site completely or replace them with some stub or redirect.

Opinions?

Kind of funny, but this would help us with Katello 3.1 vs. 3.10 confusion in the docs.

As a policy, we support the last two releases. I think we should go ahead and make a similar policy for the docs as well. Maybe the last 5 or 10 releases? Katello rarely gets bug reports from 10 releases back, but 6 isn’t uncommon (e.g. we still get some 3.4 bug reports).

I also wonder if we’ve done any research on making Jekyll faster (I don’t see a -j flag or anything).

I agree on that…docs should only mention versions in active support and additionally maybe 2-3 versions back.

Disappearing links are always annoying and bad for search engines. Luckily there’s a solution: redirects. We probably need to implement this in Apache since it’s a static site.

This would work: https://help.github.com/articles/redirects-on-github-pages/

So let’s just agree on the last N releases? I like 5; that’s fair.

Except the manual isn’t really a GitHub Pages site but a Jekyll site we deploy on our own server (not that that prevents us from doing redirects).

That was my thought too, but it’s client-side redirection, with both the <meta> and the JS redirect methods. I don’t know whether search engines respect this and treat the new page as the natural target for all the incoming links.
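
For reference, a client-side redirect stub of the `<meta>` kind could be generated roughly like this. This is a minimal sketch, not what the GitHub Pages tooling actually emits: `redirect_stub` and the example URL are hypothetical, and the canonical link is an extra hint whose effect on search engines is exactly the open question here.

```python
from html import escape


def redirect_stub(target_url):
    """Build a small HTML page that sends the browser to target_url."""
    url = escape(target_url, quote=True)  # safe inside attribute values
    return (
        "<!DOCTYPE html>\n"
        '<html lang="en">\n'
        "<head>\n"
        '  <meta charset="utf-8">\n'
        f'  <link rel="canonical" href="{url}">\n'
        f'  <meta http-equiv="refresh" content="0; url={url}">\n'
        "  <title>Page moved</title>\n"
        "</head>\n"
        "<body>\n"
        f'  <p>This page has moved to <a href="{url}">{url}</a>.</p>\n'
        "</body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    # Hypothetical target: the stable manual.
    print(redirect_stub("/manuals/1.20/index.html"))
```

Every removed page would need its own stub file, so this does not reduce the number of files the way a server-side redirect would.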

Ok, thanks. I hadn’t realized we deploy our site ourselves :slight_smile:

Cool, are you guys fine with keeping the last 5 versions of the plugin/core manuals? The 4 non-stable ones would include a NOINDEX meta tag for Google, so at least search engines quickly learn to offer the stable one.

One more thing: do you prefer committing .htaccess directly to the git codebase? I do. I kind of assume we are running Apache httpd, but I could be wrong again.

We are running Apache. https://github.com/theforeman/foreman-infra/blob/master/puppet/modules/web/templates/web.conf.erb already includes some rewrite rules.

Thanks.

So I would like to come to an agreement first, because rebasing those PRs with deleted stuff won’t be easy, I guess. I am going to do a PR with all manuals deleted except the last 5 versions for both core and plugins. All old versions (except the stable one) will have a “noindex” meta tag to prevent search engines from indexing them. And for all deleted pages I will create redirects in the Apache configuration.
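
As a rough illustration of that plan, the redirect lines and the meta tag could be produced like this. It is a sketch under assumptions: the `/manuals/<version>/` URL layout, the list of removed versions and the function name are all illustrative, and the real rules would have to fit whatever web.conf.erb already does. `RedirectMatch` is the mod_alias directive.

```python
import re

# Tag for the pages of old versions that stay online.
NOINDEX_TAG = '<meta name="robots" content="noindex">'


def redirect_rules(removed_versions, stable_version, base="/manuals"):
    """Return one Apache RedirectMatch line per removed manual version."""
    target = f"{base}/{stable_version}/index.html"
    rules = []
    for version in removed_versions:
        # Escape the dot so "1.1" does not also match "1x1".
        pattern = f"^{re.escape(base)}/{re.escape(version)}(/.*)?$"
        rules.append(f"RedirectMatch 301 {pattern} {target}")
    return rules


if __name__ == "__main__":
    # Hypothetical list of deleted versions.
    for rule in redirect_rules(["1.9", "1.10"], "1.20"):
        print(rule)
```

Anchoring the pattern with `(/.*)?$` keeps a rule for 1.1 from swallowing requests for 1.10 or 1.15.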

@tbrisker @Gwmngilfen

I’d rather not see that content deleted entirely. We do offer the packages right back to 1.0, and if a user needs to do an upgrade from an ancient version, they’ll want to see the upgrade notes / known issues for each version as they follow the upgrade path.

That said, I do agree the search index issue is annoying. Could we move older manuals to a PDF download? Then the people who really need them can still get them, but it’s de facto not indexable :slight_smile:

+1 on the rest of this, especially redirects in Apache (awkward, but what can you do?). I just don’t want to make it harder for users of old versions to upgrade - otherwise they won’t upgrade :stuck_out_tongue:

Ok, how about moving them out to some /legacy/ directory? No markdown, just pure HTML if that’s possible. But the number of files in a git checkout won’t go down. Maybe tarballs would do it!

Edit: I like tarballs on our downloads.tf.org site with a link from our docs.

PDFs would be a good idea I think

Unfortunately, our manuals are not in an easy format to generate PDFs from. I mean, if there is a volunteer let’s do it. Here are my next steps, speak up now or never:

  • Gonna create HTML tarballs of all old manuals until version 1.14
  • Versions 1.14-1.19 and the nightly directory stay untouched, but I will prevent search engines from indexing them
  • The stable version (1.20) stays as-is.
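
The split above could be computed along these lines. This is a sketch only: it assumes 1.14 itself stays online (as the second bullet says), the function names are invented, and the version list in the example is a sample rather than the full set of manuals. Comparing versions numerically also avoids the 3.1 vs. 3.10 kind of mix-up.

```python
import re

NUMERIC_VERSION = re.compile(r"\d+(\.\d+)*")


def version_key(version):
    """Turn '1.10' into (1, 10) so it sorts after '1.9'."""
    return tuple(int(part) for part in version.split("."))


def plan_manuals(versions, stable, cutoff):
    """Split manual versions into archive / noindex / stable buckets."""
    numeric = sorted(
        (v for v in versions if NUMERIC_VERSION.fullmatch(v)), key=version_key
    )
    named = sorted(v for v in versions if not NUMERIC_VERSION.fullmatch(v))
    plan = {"archive": [], "noindex": [], "stable": []}
    for v in numeric:
        if v == stable:
            plan["stable"].append(v)
        elif version_key(v) < version_key(cutoff):
            plan["archive"].append(v)  # older than the cutoff: tarball
        else:
            plan["noindex"].append(v)  # stays online, hidden from search
    plan["noindex"].extend(named)  # e.g. "nightly" stays online as well
    return plan


if __name__ == "__main__":
    sample = ["1.2", "1.9", "1.10", "1.14", "1.19", "1.20", "nightly"]
    print(plan_manuals(sample, stable="1.20", cutoff="1.14"))
```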

Here is the PR; I’d appreciate help with the review.

Actually, I think I found an easy way to generate PDFs: the browser’s Print to PDF for a page works fine. So I will be providing PDFs then.
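
If doing this by hand for every version gets tedious, the same thing can probably be scripted with a headless browser. This is a different route from the manual Print to PDF described above, sketched under assumptions: the browser binary name varies by system, and the URL and output file name are examples only.

```python
import subprocess


def print_to_pdf_command(url, output_pdf, browser="chromium"):
    """Build a headless Chromium/Chrome command that prints url to a PDF."""
    return [browser, "--headless", f"--print-to-pdf={output_pdf}", url]


if __name__ == "__main__":
    # Hypothetical manual URL and file name.
    cmd = print_to_pdf_command(
        "https://theforeman.org/manuals/1.13/index.html", "foreman-1.13.pdf"
    )
    print(" ".join(cmd))
    # Uncomment to actually run it (requires the browser to be installed):
    # subprocess.run(cmd, check=True)
```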

One question though: where to put those PDF files? Do we do regular backups of downloads.tf.org? Can I simply create a folder there, @ekohl or @Gwmngilfen?

We don’t at the moment, but we could do so, the backup system is all in Puppet anyway.

I’ll review the PDFs once you have them, they need to be of a high quality to replace the website manuals. I’m still not convinced that shaving a few 10s of seconds off the compile time of the site is a good excuse for deleting user-facing content, so they’d better be good :wink:

I created them using the print feature of a web browser; our formatting switches over to “printer friendly” and it looks just fine. See for yourself by doing a print preview in your browser.

I expect to cut the generation time in half or something like that. We need to start some day; the status quo is not a solution. If you have some numbers showing that we still have users on some particular release, let’s say 1.10+, I have no problem moving the cutoff around.

Correct me if my maths is wrong, but half of one minute is 30s, which is indeed “tens of seconds”.

I cannot prove usage numbers, as the RSS widget that we use to get an idea of version stats only went live in 1.17. However, I don’t think it matters - you’re suggesting using compile time (something that only really affects core devs) as a reason to delete content, which has no dependence on compile time. To put that another way - from the user perspective, they lose something (content) for no gain (because they aren’t compiling the site).

I’ll note that I don’t see issues with inotify, and compile time for jekyll serve is currently ~56s. I think that’s fine. If the build were broken it’d be different, but it isn’t.

I realise I previously backed some of this in my earlier post, but the more I think about it, the more this feels like convenience for the few over hardship for the many. I will still back this, but only once I’m satisfied that the end user can still get at this content easily. Ideally that means the switcher in the manual page should offer the PDFs when selecting an old version - I don’t want users having to hunt for them, or come into the IRC channel to ask where they are. It should be obvious from the manual page where to get them. I’ll go add that comment to the PR now :stuck_out_tongue: