Troubleshooting Foreman Discovery kexec

Hello,

I’m using Foreman 1.24.2 / Katello 3.14 / FDI image 3.5.7, with the fdi.script option instead of the “typical” discovery process.
In a nutshell what the script does is:

  • remove all network configuration
  • create a dummy file /etc/NetworkManager/system-connections/primary (to allow foreman-proxy to start)
  • configure a bonding interface called primary (so that the generate-proxy-cert script runs successfully)
  • launch discovery_register

I have discovery rules enabled and the host appears in the discovered hosts list with all the expected facts.

My issue is that when I click on autoprovision in the discovered hosts interface, the discovery process stops and the server reboots. Check screenshots below.

But if I manually run the full kexec command that is supposed to be executed, the installation proceeds and the server gets installed successfully.

Does anyone have any idea how to troubleshoot this? How can I follow what is triggered when I click on autoprovision?

Thanks,
JS

I replaced /usr/sbin/shutdown to prevent the host from rebooting. In fact, when I click on autoprovision the host receives “Started PUT /power/reboot”; it doesn’t even attempt to run kexec.

If I don’t use fdi.script and just go with the typical discovery process, clicking on autoprovision works as expected.

I finally figured it out.

When using fdi.script the TUI is not launched, but the TUI sets some important facts that are required for the discovery process to complete correctly:

def new_custom_facts mac
  ip_cidr, gw, dns = detect_ipv4_credentials('primary')
  ip = ip_cidr.split('/').first
  # IPAddr#inspect prints "network/netmask"; keep the part after the slash
  mask = IPAddr.new(ip_cidr).inspect.scan(/\d+.\d+.\d+.\d+\/\d+.\d+.\d+.\d+/).first.split('/').last
  {
    'discovery_kexec' => command("kexec --version"),
    'discovery_ip_cidr' => ip_cidr,
    'discovery_ip' => ip,
    'discovery_netmask' => mask,
    'discovery_gateway' => gw,
    'discovery_dns' => dns,
    'discovery_bootif' => mac,
  }
end
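The netmask line in that snippet is terse, so here is a small standalone sketch of what it computes. The helper names `netmask_from_cidr` and `netmask_from_prefix` are made up for illustration and are not part of FDI; the address is a documentation example. The first helper reads the mask out of `IPAddr#inspect` as the snippet does, and the second is an alternative I would assume gives the same result by masking an all-ones address.

```ruby
require 'ipaddr'

# Hypothetical helper: same idea as the TUI snippet. IPAddr#inspect prints
# the address as "network/netmask", so the netmask is the part after the slash.
def netmask_from_cidr(ip_cidr)
  IPAddr.new(ip_cidr).inspect[%r{\d+\.\d+\.\d+\.\d+/(\d+\.\d+\.\d+\.\d+)}, 1]
end

# Hypothetical alternative: mask an all-ones address with the prefix length.
def netmask_from_prefix(prefix_length)
  IPAddr.new('255.255.255.255').mask(prefix_length).to_s
end

ip_cidr = '192.0.2.10/24'                  # example value, not from the thread
ip      = ip_cidr.split('/').first         # "192.0.2.10"
prefix  = ip_cidr.split('/').last.to_i     # 24

puts ip
puts netmask_from_cidr(ip_cidr)            # 255.255.255.0
puts netmask_from_prefix(prefix)           # 255.255.255.0
```

This is the value that ends up in discovery_netmask when the address comes in as CIDR.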

In my case I was already setting all except for discovery_kexec, which I assumed was not needed, but it is.

Now the installation proceeds as expected.
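For anyone else building the facts from fdi.script, a quick sanity check along these lines would have caught my mistake. It is only a sketch: the fact names are copied from the TUI snippet, the helper `missing_discovery_facts` is hypothetical, and I am assuming that every fact in that list matters, which I have only verified for discovery_kexec.

```ruby
# Fact names copied from the TUI snippet quoted in this thread.
TUI_DISCOVERY_FACTS = %w[
  discovery_kexec
  discovery_ip_cidr
  discovery_ip
  discovery_netmask
  discovery_gateway
  discovery_dns
  discovery_bootif
].freeze

# Hypothetical helper: returns the names of facts that are absent or blank.
def missing_discovery_facts(facts)
  TUI_DISCOVERY_FACTS.select { |name| facts[name].to_s.strip.empty? }
end

# Example values only; this mirrors my situation (everything but discovery_kexec).
facts = {
  'discovery_ip_cidr' => '192.0.2.10/24',
  'discovery_ip'      => '192.0.2.10',
  'discovery_netmask' => '255.255.255.0',
  'discovery_gateway' => '192.0.2.1',
  'discovery_dns'     => '192.0.2.53',
  'discovery_bootif'  => '52:54:00:12:34:56'
}

puts missing_discovery_facts(facts).inspect
```

With the example hash above it reports discovery_kexec as the only missing fact.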

Oh that was quick, looks like you know a lot about FDI now! Yeah, it’s a sub-ideal implementation. These should probably have been custom facts available to both the service and the TUI. Feel free to send a patch!

I would like to get rid of smart proxy and use SSH for all the communication, this way you could actually use arbitrary shell scripts to do anything you want during the discovery stage. On our TODO.

Yeah, I’ve been playing with it for some time now but only discovered fdi.script recently. It might be sub-ideal, but it gives us a lot more flexibility to get creative and work around some of the out-of-the-box limitations, and it’s not so complex that customization becomes cumbersome.

I’m still struggling with full auto provisioning not working (discovery_auto yes). I still have to click on autoprovision, but I haven’t lost too much time on it yet. I’m guessing some condition is not being met, which keeps it from being triggered automatically. Any leads on that?


You must enable autoprovisioning in settings but other than that this is more of a feature in Foreman core rather than anything on FDI. Once you have that enabled you should be good to go.

It is enabled, but it’s not triggered automatically. I would assume that if clicking on autoprovision works, all the conditions are met and auto provision should work too, but it doesn’t.
In fact this worked when I was using Satellite 6.6, but since I moved to Foreman it hasn’t worked.

We haven’t touched this in a while. Can you enable debug and investigate the HTTP transaction? It writes a lot of debugging info like “Processing rule XYZ”. What do you have for the rule search? Are facts uploading correctly?

I have several smart proxies. Should this be enabled both on the Foreman server and on the smart proxy that is serving as the deployment server?

Facts appear correctly in the discovered hosts list. My search rule is based on some custom facts (fdi.pxfactname/fdi.pxfactvalue) injected when calling the FDI image kernel.
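To make the custom fact part concrete, this is roughly how I think of the name/value pairs on the kernel command line. It is a sketch, not FDI code: `parse_fdi_custom_facts` is a hypothetical helper, the fact names and values are invented, and I am assuming the pairs are numbered (fdi.pxfactname1 with fdi.pxfactvalue1, and so on).

```ruby
# Hypothetical helper: collect fdi.pxfactnameN / fdi.pxfactvalueN pairs from a
# kernel command line into a { name => value } hash.
# Assumption: pairs are matched by their numeric suffix N.
def parse_fdi_custom_facts(cmdline)
  names  = {}
  values = {}

  cmdline.split.each do |arg|
    key, value = arg.split('=', 2)
    case key
    when /\Afdi\.pxfactname(\d+)\z/  then names[$1]  = value
    when /\Afdi\.pxfactvalue(\d+)\z/ then values[$1] = value
    end
  end

  # Keep only the names that have a value with the same index.
  names.each_with_object({}) do |(index, name), facts|
    facts[name] = values[index] if values.key?(index)
  end
end

# Invented example arguments, not my real boot line.
cmdline = 'fdi.pxfactname1=deploy_role fdi.pxfactvalue1=database ' \
          'fdi.pxfactname2=deploy_site fdi.pxfactvalue2=dc1'

puts parse_fdi_custom_facts(cmdline).inspect
```

The resulting names are what I reference in the discovery rule search.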

It turns out it does eventually get triggered; I just didn’t wait long enough. I expected it to trigger within a few seconds, but it takes 15 minutes, apparently when the facts get reloaded.

I’m getting:

2020-05-22T19:29:32 [D|app|93e0b94a] Finding auto discovery rule for host XXXX (151)
2020-05-22T19:29:32 [D|app|93e0b94a] Body: {"id":151,"name":"XXXX","last_compile":null,"last_report":"2020-05-22T16:29:31.054Z","updated_at":"2020-05-22T16:29:31.951Z","created_at":"2020-05-22T16:29:31.013Z",
"root_pass":null,"architecture_id":null,"operatingsystem_id":null,"environment_id":null,"ptable_id":null,"medium_id":null,"build":false,"comment":null,"disk":null,"installed_at":null,"model_id":2,"hostgroup_id":
null,"owner_id":1,"owner_type":"User","enabled":true,"puppet_ca_proxy_id":null,"managed":false,"use_image":null,"image_file":null,"uuid":null,"compute_resource_id":null,"puppet_proxy_id":null,"certname":null,"im
age_id":null,"organization_id":3,"location_id":30,"otp":null,"realm_id":null,"compute_profile_id":null,"provision_method":null,"grub_pass":"","global_status":0,"lookup_value_matcher":null,"pxe_loader":null,"init
iated_at":null,"build_errors":null,"discovery_rule_id":null,"monitoring_proxy_id":null,"openscap_proxy_id":null}
2020-05-22T19:29:32 [I|app|93e0b94a] Completed 201 Created in 1060ms (Views: 1.2ms | ActiveRecord: 332.3ms)

Then further down:

2020-05-22T19:35:07 [I|app|00a95aa9] Started GET "/discovered_hosts/XXXX" for XXXXXXXXX at 2020-05-22 19:35:07 +0300
2020-05-22T19:35:07 [I|app|00a95aa9] Processing by DiscoveredHostsController#show as HTML
2020-05-22T19:35:07 [I|app|00a95aa9]   Parameters: {"id"=>"XXXX"}
2020-05-22T19:35:07 [D|tax|00a95aa9] Current location set to none
2020-05-22T19:35:07 [D|tax|00a95aa9] Current organization set to none
2020-05-22T19:35:07 [I|app|00a95aa9]   Rendering /opt/theforeman/tfm/root/usr/share/gems/gems/foreman_discovery-16.0.1/app/views/discovered_hosts/show.html.erb within layouts/application
2020-05-22T19:35:08 [I|app|00a95aa9]   Rendered /opt/theforeman/tfm/root/usr/share/gems/gems/foreman_discovery-16.0.1/app/views/discovered_hosts/_discovered_host_modal.html.erb (8.0ms)
2020-05-22T19:35:08 [I|app|00a95aa9]   Rendered /opt/theforeman/tfm/root/usr/share/gems/gems/foreman_discovery-16.0.1/app/views/discovered_hosts/show.html.erb within layouts/application (18.5ms)
2020-05-22T19:35:08 [W|app|00a95aa9] unable to detect breadcrumb title name in for discovered_hosts, defaulting to name
2020-05-22T19:35:08 [D|app|00a95aa9] <NameError> Could not find resource class for resource discovered_host
    /usr/share/foreman/app/controllers/concerns/find_common.rb:31:in `resource_class'
    /usr/share/foreman/app/services/breadcrumbs_options.rb:23:in `resource_class'
    /usr/share/foreman/app/services/breadcrumbs_options.rb:68:in `model_name_field'
    /usr/share/foreman/app/services/breadcrumbs_options.rb:81:in `resource'
    /usr/share/foreman/app/services/breadcrumbs_options.rb:16:in `bar_props'
    /usr/share/foreman/app/helpers/layout_helper.rb:83:in `mount_breadcrumbs'
    /usr/share/foreman/app/views/layouts/_application_content.html.erb:15:in `_81b16ab1ca83e548f24e04619beb72a8'
    .
    .
    .
    /usr/share/passenger/phusion_passenger/request_handler/thread_handler.rb:109:in `main_loop'
    /usr/share/passenger/phusion_passenger/request_handler.rb:455:in `block (3 levels) in start_threads'
    /opt/theforeman/tfm/root/usr/share/gems/gems/logging-2.2.2/lib/logging/diagnostic_context.rb:474:in `block in create_with_logging_context'
2020-05-22T19:35:08 [I|app|00a95aa9]   Rendered layouts/_application_content.html.erb (2.3ms)
2020-05-22T19:35:08 [I|app|00a95aa9]   Rendering layouts/base.html.erb
2020-05-22T19:35:08 [I|app|00a95aa9]   Rendered layouts/base.html.erb (32.5ms)
2020-05-22T19:35:08 [I|app|00a95aa9] Completed 200 OK in 147ms (Views: 55.0ms | ActiveRecord: 6.7ms)

Then finally:

2020-05-22T19:44:37 [I|app|fb6ba3fa] Started POST "/api/v2/discovered_hosts/facts" for XXXXXXX at 2020-05-22 19:44:37 +0300
2020-05-22T19:44:37 [I|app|fb6ba3fa] Processing by Api::V2::DiscoveredHostsController#facts as JSON
2020-05-22T19:44:37 [I|app|fb6ba3fa]   Parameters: {"facts"=>"[FILTERED]", "apiv"=>"v2", "discovered_host"=>{"facts"=>"[FILTERED]"}}
2020-05-22T19:44:37 [I|app|fb6ba3fa] Import facts for 'XXXX' completed. Added: 10, Updated: 2, Deleted 0 facts
2020-05-22T19:44:37 [D|tax|fb6ba3fa] Current location set to XXXXXXXXX
2020-05-22T19:44:37 [D|tax|fb6ba3fa] Current organization set to XXXXX
.
.
.
2020-05-22T19:44:37 [I|app|fb6ba3fa] Detected IPv4 subnet: XXXXXX with taxonomy ["XXXXXX"]/["XXXXXXXXXX"]
2020-05-22T19:44:37 [I|app|fb6ba3fa] Assigned location: XXXXXXXXX
2020-05-22T19:44:37 [I|app|fb6ba3fa] Assigned organization: XXXXXXX
2020-05-22T19:44:37 [D|not|fb6ba3fa] Notification event: UINotifications::NewHost - checking for notifications
2020-05-22T19:44:37 [D|app|fb6ba3fa] Finding auto discovery rule for host XXXX (151)
2020-05-22T19:44:37 [D|app|fb6ba3fa] Found rule XXXXXXXXXX (7) [6/0]
2020-05-22T19:44:37 [I|app|fb6ba3fa] Match found for host XXXX (151) rule XXXXXXXX (7)
2020-05-22T19:44:37 [D|app|fb6ba3fa] Auto-provisioning via rule XXXXXXXXXX hostgroup XXXXXXXXXXXXX subnet 
2020-05-22T19:44:37 [W|app|fb6ba3fa] Could not find a provider for XXXX. Providers returned {"Katello::ManagedContentMediumProvider"=>["Kickstart repository was not set for host 'XXXX'", "Content source was not set for host 'XXXX'"], "MediumProviders::Default"=>["Operating system was not set for host 'XXXX'", " medium was not set for host 'XXXX'", "Invalid medium '' for ''", "Invalid architecture '' for 
''"]}

I’m curious about this message:

Could not find a provider for XXXX. Providers returned {"Katello::ManagedContentMediumProvider"=>["Kickstart repository was not set for host 'XXXX'", "Content source was not set for host 'XXXX'"], "MediumProviders::Default"=>["Operating system was not set for host 'XXXX'", " medium was not set for host 'XXXX'", "Invalid medium '' for ''", "Invalid architecture '' for ''"]}

Because all of those parameters are configured in the hostgroup either by inheritance or directly.
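Since several requests interleave in the log, I find it easier to follow one transaction by the short id in the tag after the timestamp (for example fb6ba3fa in the lines above). A minimal sketch of that filter follows; `lines_for_request` is a hypothetical helper, and the log path in the comment is an assumption about a default install.

```ruby
# Tag written after the timestamp, e.g. "[D|app|93e0b94a]":
# level, logger name, short request id.
LOG_TAG = /\[(?<level>[A-Z])\|(?<logger>\w+)\|(?<request_id>\h+)\]/

# Hypothetical helper: keep only the lines tagged with the given request id.
# Untagged continuation lines (backtraces, wrapped JSON) are dropped.
def lines_for_request(lines, request_id)
  lines.select do |line|
    tag = LOG_TAG.match(line)
    tag && tag[:request_id] == request_id
  end
end

sample = [
  '2020-05-22T19:44:37 [D|app|fb6ba3fa] Finding auto discovery rule for host XXXX (151)',
  '2020-05-22T19:35:07 [I|app|00a95aa9] Processing by DiscoveredHostsController#show as HTML',
  '2020-05-22T19:44:37 [I|app|fb6ba3fa] Match found for host XXXX (151) rule XXXXXXXX (7)'
]

puts lines_for_request(sample, 'fb6ba3fa')

# Against a real log (path assumed, adjust to your install):
#   puts lines_for_request(File.foreach('/var/log/foreman/production.log'), 'fb6ba3fa')
```

Filtering on fb6ba3fa leaves just the rule matching and the provider warning, which is the part I’m asking about.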

That fact upload issue sounds like a bug, can you confirm those facts are missing from the initial upload? I wonder why it would be missing.

Are you able to test this hostgroup using non-discovery workflow, meaning by creating New Host via UI and API? This error means there is a misconfiguration for your Installation Media or Content Source (if you have Katello plugin).

Do you use nested hostgroups in this particular case?

The facts appear in the initial upload, at least visibly I don’t see any difference after fact reload.

I’m doing bare-metal deployment and don’t know what you mean by a non-discovery workflow. Do you mean rebuilding the OS once the host already has an OS deployed? If so, I never tried to get that to work; just clicking build doesn’t work.

I use nested groups yes. But even if I set all hostgroup settings without using inheritance the outcome is the same.

Create a host, assign it a hostgroup, put a provisioning MAC address in, Submit. Boot up the host. It should do unattended provisioning successfully.

Only after this works, do discovery on top of that.

I don’t have dhcp on my networks nor do I have a dedicated provisioning network where I could set that up. My deployments are all based on customized ipxe boot images with static network configurations on them from where I launch FDI with all required parameters and custom facts.

Oh I see, I am too narrow-minded when it comes to this.

Anyway, what I recommend still applies, just create the host in Foreman and obviously don’t boot anything up. But you should see errors when trying to save it - resolve them first.

I create the host, select the host group, everything gets prefilled, I configure the network interface and am able to save without any errors.

Hmmm this error has been reported and a colleague suggested some workaround in https://bugzilla.redhat.com/show_bug.cgi?id=1785226#c8 but I haven’t seen that myself. Tell me how to reproduce, exact steps. Then I can take a look.

Thanks, Lukáš, for your support on this. I haven’t had time to dig further into it. Given that the only thing I have to do is click on auto-provision, it’s not yet a priority for me to have fully automated provisioning working.
I have other issues now related to UEFI+kexec+framebuffer, for which I’ll open another topic. Once I get that working, I’ll get back to this.

Thanks!