Trace_status = reboot_needed not working after upgrade to 4.12

I’ve just noticed that trace_status = reboot_needed is no longer working as expected.

In the Content Hosts UI > search bar, trace_status = reboot_needed
does not show any hosts,

but I have hosts which need a restart.

When I do

trace_helper ~ reboot

it seems to find all the hosts which need a reboot.

Is this expected, or a bug?
What’s the correct search string for a job run when I’m scheduling a reboot for hosts with a dynamic query?

Installed plugins (Name, Version, Author, Description):

foreman-tasks 9.1.1 (Ivan Nečas): The goal of this plugin is to unify the way of showing task statuses across the Foreman instance. It defines Task model for keeping the information about the tasks and Lock for assigning the tasks to resources. The locking allows dealing with preventing multiple colliding tasks to be run on the same resource. It also optionally provides Dynflow infrastructure for using it for managing the tasks.
foreman_acd 0.9.4 (ATIX AG): Foreman plugin to provide application centric deployment and self service portal
foreman_ansible 13.0.3 (Daniel Lobato Garcia): Ansible integration with Foreman
foreman_fog_proxmox 0.15.0 (Tristan Robert and The Foreman Team): Foreman plugin adds Proxmox VE compute resource using fog-proxmox. It is compatible with Foreman 1.22+
foreman_remote_execution 12.0.5 (Foreman Remote Execution team): A plugin bringing remote execution to the Foreman, completing the config management functionality with remote management functionality.
foreman_templates 9.4.0 (Greg Sutcliffe): Engine to synchronise provisioning templates from GitHub
katello 4.12.0 (N/A): Katello adds Content and Subscription Management to Foreman. For this it relies on Candlepin and Pulp.

It sounds like the host’s trace status is not up to date. In the web UI, what do you see for the host’s Tracer status? Does it match what you see on the host itself?

Let me try to show you

Content Host Page > Traces > Reboot needed

Content Host > Search Query

Searching for trace_status = reboot_needed is not working as expected.

The other two hosts need a reboot too, so that’s correct.

If you click Overview, then on the Host status card click ‘Manage all statuses’, what do you see there? My question was whether this status matches what you are seeing elsewhere or not. (My theory is that it does not.)


I’m also curious, what version of katello_host_tools is installed on the host? We recently had a big change to the way traces are handled there.

Hey,

All statuses are fine; the trace status shows ‘reboot needed’.

On the Content Hosts page, the search string is not working.

The other search string is working.

katello-host-tools:
katello-host-tools-4.2.3-5.el9.noarch
katello-host-tools-tracer-4.2.3-5.el9.noarch

Thanks, this is helpful! This tells me that the status itself is working correctly (and also katello-host-tools is working); it is only the search that is not working.

Opened Bug #37354: Trace_status = reboot_needed not working after upgrade to 4.12 - Katello - Foreman

I have just upgraded to 4.12 on Friday and noticed this too. The search for trace_status doesn’t work; neither reboot nor process restart is found. Any chance to get this backported to 4.12 soon?

Looking at the fix Fixes #37354 - Reload host_traces when computing trace status by jeremylenz · Pull Request #11032 · Katello/katello · GitHub, I am wondering whether that change isn’t putting a lot of load on the database? It looks to me as if it will load everything again each time I do a search. I would rather wonder what this “for some reason” actually is.

It’s not targeted for 4.12, but since it’s merged now it shouldn’t be a problem to get it in for 4.13 GA.

IMO to_status is the right place to put load on the database, considering how host statuses are supposed to work: the to_status value is computed and then saved (cached) in the DB for searching etc. The reason the host details page was showing the correct status is that it relies on to_status rather than the stored status value, which, one could argue, is what’s actually incorrect. In any case, we’ll keep an eye on it, and please report if you notice any slowness.
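
To illustrate the pattern, here is a minimal sketch, not the actual Katello class; the class name and the 0/2 numeric mapping are assumptions for illustration:

# A minimal sketch of the HostStatus caching pattern described above.
# to_status computes a fresh value from the trace data; refresh caches it in
# the host_status table, which is the only thing the search reads.
class SketchTraceStatus < ::HostStatus::Status
  REBOOT_NEEDED = 2 # assumed numeric mapping, for illustration only
  UPDATED = 0

  def to_status(_options = {})
    # assume app_type 'static' marks reboot-needed traces
    host.host_traces.where(app_type: 'static').exists? ? REBOOT_NEEDED : UPDATED
  end

  def refresh
    self.status = to_status # cache the computed value for searching
  end
end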

As to what changed, I’m still not sure. Maybe Rails itself changed the rules about when it decides to hit the database for Active Record queries…? Or, the Occam’s razor hypothesis would be that Katello code changed something about when we compute the status, but I couldn’t find anything. If you have any other theories, let me know!

Thanks for the explanation. I don’t think I really understand the logic there, but I suppose I don’t have to. I thought that the trace status in the database table katello_host_tracers would be accessed; that table contains all traces. So you are saying that this information is cached somewhere else in the database again for access via trace_status?

It is. The problem (I think) is that the value saved in the DB is a stale value.

Looking at it some more, I think the problem may be related to this commit. There we added host_traces to the host’s included associations:

def included_associations(include = [])
  [:host_traces] + super
end

This means that the previous flow

  1. host is loaded in Active Record via any call to a host object
  2. host trace status is computed, triggering a call to to_status and thus host.host_traces
  3. host.host_traces are loaded via a SQL query to the database

now looks like this:

  1. host is loaded in Active Record via any call to a host object. host.host_traces is also loaded via SQL query at the same time, and saved in memory
  2. host trace status is computed, triggering a call to to_status and thus host.host_traces.
  3. But this time, no SQL query is executed, because host_traces is already in memory.

Thus, that commit altered the moment at which the host’s traces are queried, and introduced the possibility that to_status is computed based on stale data.
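
For illustration, the gist of the fix (a sketch, not the literal PR #11032 diff) is to force a fresh query before computing the status:

# Sketch of the idea behind the fix: reload the association so a previously
# eager-loaded (possibly stale) in-memory copy is discarded and a fresh SQL
# query is executed.
def to_status(_options = {})
  host.host_traces.reload # drop the cached copy, hit the DB again
  # ... compute the status from the freshly loaded traces ...
end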

But then where else is the status stored?

For instance, if I search the database like this:

foreman=# select a.*,b.name from katello_host_tracers a left join hosts b on (a.host_id = b.id) where app_type = 'static'; 
   id   | host_id | application |                helper                 | app_type |            name            
--------+---------+-------------+---------------------------------------+----------+----------------------------
 187095 |     372 | systemd     | You will have to reboot your computer | static   | host1.example.com

I currently see 117 rows listed, i.e. 117 hosts which need to be rebooted.

If I search hosts in the Foreman GUI with trace_status != updated, I currently get 3 hosts listed.

I have restarted all Foreman services (foreman-maintain service restart) and it’s still the same, so it’s not just in memory. I am really puzzled about which SQL query it runs to retrieve the host traces. It doesn’t seem to access katello_host_tracers, because that table does contain the traces for each host.

And it doesn’t really look like stale data, because most of the data is missing altogether. Basically, trace_status doesn’t know about the trace information for those 114 missing hosts.

Right. Any search for trace_status is only querying the host_status table, not the katello_host_tracers table. And computing the status based on stale data causes the host_status table to have incorrect values.

Thanks. Now I’ve got it. The status column in the host_status table contains the values 0, 1, or 2 for each row with type Katello::TraceStatus. In that table, only those 3 hosts have a value other than 0; all others have 0. So the status value in host_status does not match the content of katello_host_tracers.
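
For anyone who wants to verify this on their own instance, here is a rough consistency check from a foreman-rake console. It is a sketch: the app_type = 'static' test and the 2/0 mapping follow the observations above; anything beyond that is an assumption.

# Hosts in the result have a cached trace status that no longer matches the
# raw trace data in katello_host_tracers.
stale = ::Host::Managed.find_each.select do |host|
  cached = host.host_statuses.find_by(type: 'Katello::TraceStatus')&.status
  needs_reboot = host.host_traces.where(app_type: 'static').exists?
  cached != (needs_reboot ? 2 : 0)
end
puts stale.map(&:name)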

It doesn’t look like a particularly good idea to store an additional status value in host_status while the real data is available in katello_host_tracers, for exactly the reason behind this issue: the one value can stop matching the other table. But then, I guess there must be a good reason for doing so.

Either way, now I know how it works and was able to quickly fix the status in host_status for our hosts. Now it looks good. I hope it stays this way with this patch.
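
For reference, one way to recompute the cached values from a foreman-rake console. This is a hedged sketch, not an officially supported procedure; it assumes Foreman’s refresh_statuses helper accepts a list of status classes, with Katello::TraceStatus matching the type value seen in host_status.

# Recompute and save the cached trace status row for every host.
::Host::Managed.find_each do |host|
  host.refresh_statuses([Katello::TraceStatus])
end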
