Errata not calculated/GenerateApplicability fails trying to calculate deleted hosts/Katello 3.17

Problem:
I have a Foreman/Katello cluster containing 23 smart proxies managing 10700 servers. New virtual servers are provisioned daily and added fo Foreman as well as old ones deleted. I’ve noticed that on Content Host page the Installable updates column, there are all 0’s (errata and upgradeable packages)for a lot of servers, like 90% of them. I’ve first noticed this in Katello 3.16 (this was not an issue in Katello 3.15), however I couldn’t find any logs to investigate. Now I got some logs and theories and I need your thoughts.

Since this was an issue that Content host page was missing I went and started Bulk Generate Applicability for hosts manually by running:

# foreman-rake console
> irb(main):001:0> User.current = User.anonymous_admin
> irb(main):002:0> hosts = Host.all
> irb(main):003:0> ForemanTasks.sync_task(::Actions::Katello::Host::GenerateApplicability, hosts)

This started many Generate Applicability tasks, each task for a group of 50 hosts. Most of them ended up successfully.
However, 14 tasks failed. As far as I’ve checked, it was trying to calculate applicability for servers that were deleted. The list of servers contains ID’s of servers no longer exist. In example below, this is true for most of the servers ID’s on that list.

When server is deleted, internal provisioning system sends api call to Foreman to delete it from it’s inventory as well, so that Foreman doesn’t pile up no longer existing servers in its inventory.

Expected outcome:

Not to include deleted servers in generate applicability tasks and all tasks to be successful.

Foreman and Proxy versions:
Foreman 2.2
Proxy 2.1
Foreman and Proxy plugin versions:
Katello 3.17
Distribution and version:
CentOS 7.9
Other relevant data:

Action:
Actions::Katello::Applicability::Hosts::BulkGenerate
Input:
{"host_ids"=>
  [1,
   2,
   3,
   4,
   5,
   96,
   267,
   756,
   872,
   971,
   995,
   1003,
   1004,
   1005,
   1006,
   1007,
   1009,
   1011,
   1012,
   1013,
   1014,
   1015,
   1016,
   1017,
   1018,
   1019,
   1020,
   1021,
   1022,
   1023,
   1024,
   1025,
   1026,
   1027,
   1028,
   1029,
   1030,
   1031,
   1032,
   1033,
   1034,
   1035,
   1036,
   1037,
   1038,
   1039,
   1040,
   1041,
   1042,
   1043],
 "current_request_id"=>nil,
 "current_timezone"=>"UTC",
 "current_user_id"=>1,
 "current_organization_id"=>nil,
 "current_location_id"=>nil}
Output:
{}
Exception:
NoMethodError: undefined method `calculate_and_import_applicability' for nil:NilClass
Backtrace:
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.17.0/app/lib/actions/katello/applicability/hosts/bulk_generate.rb:13:in `block in run'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.17.0/app/lib/actions/katello/applicability/hosts/bulk_generate.rb:11:in `each'
/opt/theforeman/tfm/root/usr/share/gems/gems/katello-3.17.0/app/lib/actions/katello/applicability/hosts/bulk_generate.rb:11:in `run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/action.rb:571:in `block (3 levels) in execute_run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/middleware/stack.rb:27:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/middleware.rb:19:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-3.0.1/app/lib/actions/middleware/rails_executor_wrap.rb:14:in `block in run'
/opt/theforeman/tfm/root/usr/share/gems/gems/activesupport-6.0.3.1/lib/active_support/execution_wrapper.rb:88:in `wrap'
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-3.0.1/app/lib/actions/middleware/rails_executor_wrap.rb:13:in `run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/middleware/stack.rb:23:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/middleware/stack.rb:27:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/middleware.rb:19:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/action/progress.rb:31:in `with_progress_calculation'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/action/progress.rb:17:in `run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/middleware/stack.rb:23:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/middleware/stack.rb:27:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/middleware.rb:19:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-3.0.1/app/lib/actions/middleware/keep_current_request_id.rb:15:in `block in run'
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-3.0.1/app/lib/actions/middleware/keep_current_request_id.rb:49:in `restore_current_request_id'
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-3.0.1/app/lib/actions/middleware/keep_current_request_id.rb:15:in `run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/middleware/stack.rb:23:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/middleware/stack.rb:27:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/middleware.rb:19:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-3.0.1/app/lib/actions/middleware/keep_current_timezone.rb:15:in `block in run'
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-3.0.1/app/lib/actions/middleware/keep_current_timezone.rb:44:in `restore_curent_timezone'
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-3.0.1/app/lib/actions/middleware/keep_current_timezone.rb:15:in `run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/middleware/stack.rb:23:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/middleware/stack.rb:27:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/middleware.rb:19:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-3.0.1/app/lib/actions/middleware/keep_current_user.rb:15:in `block in run'
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-3.0.1/app/lib/actions/middleware/keep_current_user.rb:44:in `restore_curent_user'
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-3.0.1/app/lib/actions/middleware/keep_current_user.rb:15:in `run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/middleware/stack.rb:23:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/middleware/stack.rb:27:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/middleware.rb:19:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-3.0.1/app/lib/actions/middleware/keep_current_taxonomies.rb:15:in `block in run'
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-3.0.1/app/lib/actions/middleware/keep_current_taxonomies.rb:45:in `restore_current_taxonomies'
/opt/theforeman/tfm/root/usr/share/gems/gems/foreman-tasks-3.0.1/app/lib/actions/middleware/keep_current_taxonomies.rb:15:in `run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/middleware/stack.rb:23:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/middleware/stack.rb:27:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/middleware.rb:19:in `pass'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/middleware.rb:32:in `run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/middleware/stack.rb:23:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/middleware/world.rb:31:in `execute'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/action.rb:570:in `block (2 levels) in execute_run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/action.rb:569:in `catch'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/action.rb:569:in `block in execute_run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/action.rb:472:in `block in with_error_handling'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/action.rb:472:in `catch'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/action.rb:472:in `with_error_handling'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/action.rb:564:in `execute_run'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/action.rb:285:in `execute'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/execution_plan/steps/abstract_flow_step.rb:18:in `block (2 levels) in execute'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/execution_plan/steps/abstract.rb:167:in `with_meta_calculation'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/execution_plan/steps/abstract_flow_step.rb:17:in `block in execute'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/execution_plan/steps/abstract_flow_step.rb:32:in `open_action'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/execution_plan/steps/abstract_flow_step.rb:16:in `execute'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/director.rb:68:in `execute'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/executors/sidekiq/worker_jobs.rb:11:in `block (2 levels) in perform'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/executors.rb:18:in `run_user_code'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/executors/sidekiq/worker_jobs.rb:9:in `block in perform'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/executors/sidekiq/worker_jobs.rb:25:in `with_telemetry'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/executors/sidekiq/worker_jobs.rb:8:in `perform'
/opt/theforeman/tfm/root/usr/share/gems/gems/dynflow-1.4.7/lib/dynflow/executors/sidekiq/serialization.rb:27:in `perform'
/opt/theforeman/tfm/root/usr/share/gems/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:192:in `execute_job'
/opt/theforeman/tfm/root/usr/share/gems/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:165:in `block (2 levels) in process'
/opt/theforeman/tfm/root/usr/share/gems/gems/sidekiq-5.2.7/lib/sidekiq/middleware/chain.rb:128:in `block in invoke'
/opt/theforeman/tfm/root/usr/share/gems/gems/sidekiq-5.2.7/lib/sidekiq/middleware/chain.rb:133:in `invoke'
/opt/theforeman/tfm/root/usr/share/gems/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:164:in `block in process'
/opt/theforeman/tfm/root/usr/share/gems/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:137:in `block (6 levels) in dispatch'
/opt/theforeman/tfm/root/usr/share/gems/gems/sidekiq-5.2.7/lib/sidekiq/job_retry.rb:109:in `local'
/opt/theforeman/tfm/root/usr/share/gems/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:136:in `block (5 levels) in dispatch'
/opt/theforeman/tfm/root/usr/share/gems/gems/sidekiq-5.2.7/lib/sidekiq.rb:37:in `block in <module:Sidekiq>'
/opt/theforeman/tfm/root/usr/share/gems/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:132:in `block (4 levels) in dispatch'
/opt/theforeman/tfm/root/usr/share/gems/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:250:in `stats'
/opt/theforeman/tfm/root/usr/share/gems/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:127:in `block (3 levels) in dispatch'
/opt/theforeman/tfm/root/usr/share/gems/gems/sidekiq-5.2.7/lib/sidekiq/job_logger.rb:8:in `call'
/opt/theforeman/tfm/root/usr/share/gems/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:126:in `block (2 levels) in dispatch'
/opt/theforeman/tfm/root/usr/share/gems/gems/sidekiq-5.2.7/lib/sidekiq/job_retry.rb:74:in `global'
/opt/theforeman/tfm/root/usr/share/gems/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:125:in `block in dispatch'
/opt/theforeman/tfm/root/usr/share/gems/gems/sidekiq-5.2.7/lib/sidekiq/logging.rb:48:in `with_context'
/opt/theforeman/tfm/root/usr/share/gems/gems/sidekiq-5.2.7/lib/sidekiq/logging.rb:42:in `with_job_hash_context'
/opt/theforeman/tfm/root/usr/share/gems/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:124:in `dispatch'
/opt/theforeman/tfm/root/usr/share/gems/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:163:in `process'
/opt/theforeman/tfm/root/usr/share/gems/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:83:in `process_one'
/opt/theforeman/tfm/root/usr/share/gems/gems/sidekiq-5.2.7/lib/sidekiq/processor.rb:71:in `run'
/opt/theforeman/tfm/root/usr/share/gems/gems/sidekiq-5.2.7/lib/sidekiq/util.rb:16:in `watchdog'
/opt/theforeman/tfm/root/usr/share/gems/gems/sidekiq-5.2.7/lib/sidekiq/util.rb:25:in `block in safe_thread'
/opt/theforeman/tfm/root/usr/share/gems/gems/logging-2.2.2/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context'

@iballou the issue we talked before is still present. I got the logs now. It seems it is including deleted hosts for some reason Bulk Generate Applicably tasks and that is messing it up.

Hey @matemikulic,

This looks promising, it should be easy to fix too. I’ll take a stab at it and share a patch if you’re interested.

Absolutely. :slight_smile:
Thank you.

Try this out: https://github.com/Katello/katello/pull/9041.patch

Let me know if you have any issues with it.

Hey, I haven’t had time to apply it sooner. I’ve done it today. I’ve forced applicability calculation (as in my first post) and all tasks were successful(last time 14 failed).

Let’s leave it like this for at least 48 hours. As previously when “Bulk generate applicability for hosts” was running and it failed all subsequent automatic “Bulk generate applicability for hosts” would run successfully, without errors, but their running time would be 0 seconds(I assume this was the cause). Meanwhile Content hosts page(Installable updates section) would show all 0’s for more and more hosts.

So this definitely fixes fails when forcing applicability calculation. If all hosts still retain Installable updates data after 48 hours we can call it fixed for also slowly disappearing Content hosts/Installable updates all 0’s issue. :slight_smile:

Thank you very much for this.

It seems this patch took care of the one part of the problem, other part still remains.

This morning bulk generate applicability tasks again began lasting 0-1 seconds and 90% of Content hosts/Installable updates column shows all 0’s again.

And no errors were logged. I think this is when this starts happening.

And this corresponds with time when repos are synced and new packages and errata are introduced.

If I don’t manually force recalculation via console, all tasks from now on will run with success but 0-1 seconds. Looking into tasks it seems it stops running for groups of 50 hosts, but starts doing it one by one host.
The last task that lasted 7 minutes for many hosts:

Id:abd18ea8-a639-47e9-872b-a3d9701fee51
Label:Actions::Katello::Applicability::Hosts::BulkGenerate
Duration:6 minutes
Raw input:
{
  "host_ids": [
    9563,
    6740,
    1450,
    5637,
    9521,
    7311,
    2637,
    2997,
    8422,
    11316,
    9812,
    5264,
    5000,
    8934,
    8355,
    5944,
    7261,
    3565,
    9397,
    3510,
    3022,
    9137,
    1014,
    11494,
    11366,
    6982,
    6571,
    8524,
    7004,
    9098,
    10282,
    6107,
    4035,
    9180,
    3858,
    7227,
    10896,
    5642,
    2850
  ],
  "current_request_id": null,
  "current_timezone": "UTC",
  "current_user_id": 1,
  "current_organization_id": null,
  "current_location_id": null
}
Raw output:
{}
External Id:885f1514-b4de-4aea-bf53-377790368267

After sync plan ran, all bulk generate applicability tasks run per one host:

Id:2471d2e8-7d15-4fdd-95bf-549acb8fee71
Label:Actions::Katello::Applicability::Hosts::BulkGenerate
Duration:0 seconds
Raw input:
{
  "host_ids": [
    8297
  ],
  "current_request_id": null,
  "current_timezone": "UTC",
  "current_user_id": 1,
  "current_organization_id": null,
  "current_location_id": null
}
Raw output:
{}
External Id:8d48d340-5b95-4abb-8e04-95b5595261a7

And when these bulk generate tasks run for one host, that host has data on Content hosts/Installable updates. That task is truly successful.
Looking at all of this the problem seems to be that “Bulk Generate Applicability” task is not doing the “bulk” part.
As sync plan syncs new packages and errata, the old one expires and is no longer relevant, so it resets to all 0’s, and “Bulk generate applicability” doesn’t get around 10 000+ hosts in 24 hours doing it one by one(instead of bulk), before new sync is triggered and previous calculations expire.

To confirm this theory, I’ve switched sync plan to weekly. Next sync will be Monday morning. I’ll check over the weekend if Content hosts/Installable updates is populated. If they are over the weekend, and on Monday after sync they aren’t, theory is confirmed.

This one-by-one applicability calculating that you’re seeing is really strange. Looking over the code, I don’t see immediately what would cause it. There might be a race condition, just a hunch.

In the Dynflow console of your sync task that triggers the applicability regeneration, how many Actions::Katello::Applicability::Repository::Regenerate tasks do you see?

Just to be sure, ::Setting::Content.find_by(name: "applicability_batch_size").value in the foreman-console is 50?

One way to test the repository-based host applicability regeneration is by running the following in the console:

ForemanTasks.sync_task(Actions::Katello::Applicability::Repository::Regenerate, :repo_id => <id of some repo in your sync plan>, :contents_changed => true)

I’ll be curious to see if your theory is true and if the one-by-one applicability generation continues. If this is consistently reproducible for you, then there a better chance we’ll be able to see your bug as well.

Theory confirmed. I’ve checked Sunday evening and Installable updates data was accounted for. This morning after sync plan ran, it was reset to all 0’s.
This is what is in dynflow console. It is running only one task for Actions::Katello::Applicability::Repository::Regenerate

Value is 50.

# foreman-rake console
Loading production environment (Rails 6.0.3.1)
irb(main):001:0> ::Setting::Content.find_by(name: "applicability_batch_size").value
=> 50
irb(main):002:0> 

I’ve tested repository regeneration.

irb(main):002:0> ForemanTasks.sync_task(Actions::Katello::Applicability::Repository::Regenerate, :repo_id => 5, :contents_changed => true)
=> #<ForemanTasks::Task::DynflowTask id: "40bab928-a81f-423a-a357-5ff1ef03bd9d", type: "ForemanTasks::Task::DynflowTask", label: "Actions::Katello::Applicability::Repository::Regen...", started_at: "2020-11-23 12:05:23", ended_at: "2020-11-23 12:05:39", state: "stopped", result: "success", external_id: "4ae0fd73-3658-4efc-9bf4-3f191290ebf4", parent_task_id: nil, start_at: "2020-11-23 12:05:23", start_before: nil, action: "Generate repository applicability", user_id: 2, state_updated_at: "2020-11-23 12:05:39">

production.log for that task

# grep 40bab928-a81f-423a-a357-5ff1ef03bd9d /var/log/foreman/production.log
2020-11-23T12:05:23 [I|bac|] Task {label: Actions::Katello::Applicability::Repository::Regenerate, id: 40bab928-a81f-423a-a357-5ff1ef03bd9d, execution_plan_id: 4ae0fd73-3658-4efc-9bf4-3f191290ebf4} state changed: planning 
2020-11-23T12:05:23 [I|bac|] Task {label: Actions::Katello::Applicability::Repository::Regenerate, id: 40bab928-a81f-423a-a357-5ff1ef03bd9d, execution_plan_id: 4ae0fd73-3658-4efc-9bf4-3f191290ebf4} state changed: planned 
2020-11-23T12:05:23 [I|bac|] Task {label: Actions::Katello::Applicability::Repository::Regenerate, id: 40bab928-a81f-423a-a357-5ff1ef03bd9d, execution_plan_id: 4ae0fd73-3658-4efc-9bf4-3f191290ebf4} state changed: running 
2020-11-23T12:05:39 [I|bac|] Task {label: Actions::Katello::Applicability::Repository::Regenerate, id: 40bab928-a81f-423a-a357-5ff1ef03bd9d, execution_plan_id: 4ae0fd73-3658-4efc-9bf4-3f191290ebf4} state changed: stopped  result: success
2020-11-23T12:05:39 [I|bac|] Task {label: Actions::Katello::Applicability::Repository::Regenerate, id: 40bab928-a81f-423a-a357-5ff1ef03bd9d, execution_plan_id: 4ae0fd73-3658-4efc-9bf4-3f191290ebf4} state changed: stopped  result: success

It was sucessful.
I agree, if code is fine it could be only race condition, considering when I start it manually from foreman console it takes hosts in batches of 50, while if it does it by itself it runs it one by one.
Not sure what I can check further.