Virt-who service won't start, times out

So, the RPM seems to work in the sense it installed a virt-who binary, and after fixing the config file rhsm_encrypted_password, seems to login to Foreman ok, but after the initial run, I’m not seeing any ESXi hosts being populated in my “All Hosts” view now. (Using “Any Location” view).

I do see the host information running the exe in “oneshot” mode. The log also says it found the hosts and would be reporting it to Foreman.

Edit: also, to be clear, I reverted back my Foreman VM snapshot to pre-virt-who configuration, its just a bare (working) Foreman 1.22 with no other virt-who stuff. This is a totally clean install attempt with no cruft laying around.

Sep 16 12:37:24 my_foreman systemd: Started Daemon for reporting virtual guest IDs to subscription-manager.
Sep 16 12:37:32 my_foreman /usr/bin/virt-who: 2019-09-16 12:37:32,374 [virtwho.main INFO] MainProcess(16714):Thread-2 @virt.py:_send_data:969 - Report for config "virt-who-config-1" gathered, placing in datastore
Sep 16 12:37:33 my_foreman /usr/bin/virt-who: 2019-09-16 12:37:32,993 [virtwho.destination_-7244243523893140581 INFO] MainProcess(16714):Thread-3 @virt.py:_send_data:639 - Hosts-to-guests mapping for config "virt-who-config-1": 41 hypervisors and 1104 guests found
Sep 16 12:37:33 my_foreman /usr/bin/virt-who: 2019-09-16 12:37:32,994 [virtwho.destination_-7244243523893140581 INFO] MainProcess(16714):Thread-3 @virt.py:_send_data:670 - Sending updated Host-to-guest mapping to "MY_ORG" including 41 hypervisors and 1104 guests
Sep 16 12:37:33 my_foreman /usr/bin/virt-who: 2019-09-16 12:37:32,996 [virtwho.destination_-7244243523893140581 INFO] MainProcess(16714):Thread-3 @subscriptionmanager.py:_connect:155 - X-Correlation-ID: 1535e5eace2540f395843b867148b81d
Sep 16 12:42:09 my_foreman /usr/bin/virt-who: 2019-09-16 12:42:09,696 [virtwho.main INFO] MainProcess(16714):Thread-2 @virt.py:_send_data:969 - Report for config "virt-who-config-1" gathered, placing in datastore
Sep 16 12:42:09 my_foreman /usr/bin/virt-who: 2019-09-16 12:42:09,918 [virtwho.main INFO] MainProcess(16714):Thread-2 @virt.py:_send_data:969 - Report for config "virt-who-config-1" gathered, placing in datastore

The scan interval is set for two hours, I’ll check back after that just to see, but so far, nothing showing up in “All Hosts” with “Any Location” view, and no errors to report.

Can you check under Monitor > tasks? You should see a hypervisor checkin task. I’m curious if ended with a warning or threw an error?

You might also tail /var/log/foreman/production.log and /var/log/candlepin/candlepin.log while you restart virt-who to see if anything interesting is printed there.

Same problem as before, it started a task (well, there are four of them stacked up now), its been running for an hour, no Running Steps, no errors, no locks, and the only output from the task at all is the RAW input:

Id: dc7dbccb-ba43-49b8-be0d-0d3d85ea8d19
Label: Actions::Katello::Host::Hypervisors
Duration: about 1 hour
Raw input:
{"services_checked"=>["candlepin", "candlepin_auth"],
 "hypervisors"=>Step(3).output[:hypervisors],
 "current_request_id"=>nil,
 "current_timezone"=>"UTC",
 "current_user_id"=>nil,
 "current_organization_id"=>nil,
 "current_location_id"=>nil}
Raw output:
{}
External Id: f5b7edee-79bc-4817-b5e9-f02e4162b617

I do know for a fact that this worked with 0.24.2 (I think, its been a couple weeks now) and the task finished in less than a minute or two and all my ESXi hosts were visible under “All Hosts”.

Ok, I think it might be configuration related.

I did the following:

  1. Killed the stuck tasks (Foreman.blah("").destroy)
  2. foreman-rake katello:reimport
  3. Downloaded github virt-who 0.24.2 binary
  4. executed (without changing anything else) “./virt-who -d -o -c /etc/virt-who.d/virt-who-config-1.conf” and my task is again stuck with no running steps, no errors, no locks, nothing.

The dynflow console table yields:

Id: 91fc0843-a688-4eb1-bd02-51a333d5599d

Label: Actions::Katello::Host::Hypervisors

Status: running

Result: pending

Started at: 2019-09-16 17:39:36 UTC

Ended at:

Plan
Run
Finalize
Execution History
3: Actions::Candlepin::AsyncHypervisors (pending)
5: Actions::Katello::Host::HypervisorsUpdate (pending)
sequence		concurrence
Back

It never moves on from here.

The only thing I can think of is the virt-who passwords, but I configured my virt-who-config-1.conf with the following:

[virt-who-config-1]
type=esx
hypervisor_id=hostname
owner=MY_ORG
env=Library
server=IP_ADDRESS
username=MY_USER
encrypted_password=MY_PASSWORD_ENCRYPTED_WITH_V-W-Password
rhsm_hostname=FQND_TO_FOREMAN
rhsm_username=virt_who_reporter_1
rhsm_encrypted_password=84884b922649290f5b60c8394a032cb6
rhsm_prefix=/rhsm

Now, the rhsm_encrypted_password is the LAST hash in /var/lib/virt-who/key. There were no errors when it connected to Foreman, and the task started, so I assumed this was the correct hash to use. If I am mistaken, or there is another thing I need to change, please inform. Thanks.

PS: It appears this newer version ignores the “env=” line, but 0.24.2 requires it, so I left it in there. The log specifically says 0.24.7 ignores that unknown configuration option, so it appears harmless to leave it.

This is log output from virt-who-0.24.2 using the 0.24.7 configs, unchanged.

last bit of virt-who output (esxi info removed):

2019-09-16 14:20:20,607 [rhsm.connection DEBUG] MainProcess(40127):Thread-3 @connection.py:_request:545 - Making request: POST /rhsm/hypervisors/MY_ORG?reporter_id=MY_FOREMAN_FQDN-43860f2762b4429fa7b7ae8d0fa2b2b6&cloaked=False&env=Library
2019-09-16 14:20:21,250 [rhsm.connection INFO] MainProcess(40127):Thread-3 @connection.py:_request:588 - Response: status=200, request="POST /rhsm/hypervisors/MY_ORG?reporter_id=MY_FOREMAN_FQDN-43860f2762b4429fa7b7ae8d0fa2b2b6&cloaked=False&env=Library"
2019-09-16 14:20:21,328 [virtwho.destination_-1593291127803739355 DEBUG] MainProcess(40127):Thread-3 @virt.py:run:426 - Thread 'destination_-1593291127803739355' terminated
2019-09-16 14:20:21,799 [virtwho.main DEBUG] MainProcess(40127):MainThread @__main__.py:main:25 - virt-who terminated
2019-09-16 14:20:21,799 [virtwho.main DEBUG] MainProcess(40127):MainThread @executor.py:terminate:225 - virt-who is shutting down

production.log:

2019-09-16T14:20:19 [I|app|60f3ba45] Started GET "/rhsm/status/" for 10.34.193.114 at 2019-09-16 14:20:19 -0400
2019-09-16T14:20:19 [I|app|60f3ba45] Processing by Katello::Api::Rhsm::CandlepinProxiesController#server_status as JSON
2019-09-16T14:20:19 [I|app|60f3ba45] Completed 200 OK in 91ms (Views: 1.1ms | ActiveRecord: 25.3ms)
2019-09-16T14:20:20 [I|app|ca111187] Started GET "/rhsm/consumers/81d2227b-8732-4c9f-b1be-4b4ce6990e91" for 10.34.193.114 at 2019-09-16 14:20:20 -0400
2019-09-16T14:20:20 [I|app|ca111187] Processing by Katello::Api::Rhsm::CandlepinProxiesController#consumer_show as JSON
2019-09-16T14:20:20 [I|app|ca111187]   Parameters: {"id"=>"81d2227b-8732-4c9f-b1be-4b4ce6990e91"}
2019-09-16T14:20:20 [I|app|ca111187] Current user set to 81d2227b-8732-4c9f-b1be-4b4ce6990e91 (regular)
2019-09-16T14:20:20 [I|app|ca111187] Completed 200 OK in 96ms (Views: 13.1ms | ActiveRecord: 8.6ms)
2019-09-16T14:20:20 [I|app|5ae8ef26] Started GET "/rhsm/consumers/81d2227b-8732-4c9f-b1be-4b4ce6990e91/owner" for 10.34.193.114 at 2019-09-16 14:20:20 -0400
2019-09-16T14:20:20 [I|app|5ae8ef26] Processing by Katello::Api::Rhsm::CandlepinProxiesController#get as JSON
2019-09-16T14:20:20 [I|app|5ae8ef26]   Parameters: {"id"=>"81d2227b-8732-4c9f-b1be-4b4ce6990e91"}
2019-09-16T14:20:20 [I|app|5ae8ef26] Current user set to 81d2227b-8732-4c9f-b1be-4b4ce6990e91 (regular)
2019-09-16T14:20:20 [I|app|5ae8ef26] Completed 200 OK in 66ms (Views: 0.3ms | ActiveRecord: 4.7ms)
2019-09-16T14:20:20 [I|app|30e5ebdc] Started GET "/rhsm/status" for 10.34.193.114 at 2019-09-16 14:20:20 -0400
2019-09-16T14:20:20 [I|app|30e5ebdc] Processing by Katello::Api::Rhsm::CandlepinProxiesController#server_status as JSON
2019-09-16T14:20:20 [I|app|30e5ebdc] Completed 200 OK in 43ms (Views: 1.0ms | ActiveRecord: 2.7ms)
2019-09-16T14:20:20 [I|app|54875d08] Started POST "/rhsm/hypervisors/MY_ORG?reporter_id=[FILTERED]&cloaked=False&env=Library" for 10.34.193.114 at 2019-09-16 14:20:20 -0400
2019-09-16T14:20:20 [I|app|54875d08] Processing by Katello::Api::Rhsm::CandlepinProxiesController#async_hypervisors_update as JSON
2019-09-16T14:20:20 [I|app|54875d08]   Parameters: {"reporter_id"=>"[FILTERED]", "cloaked"=>"False", "env"=>"Library", "owner"=>"MY_ORG"}
2019-09-16T14:20:20 [I|app|54875d08] Current user set to 81d2227b-8732-4c9f-b1be-4b4ce6990e91 (regular)
2019-09-16T14:20:20 [I|bac|54875d08] Task {label: Actions::Katello::Host::Hypervisors, id: 6f63ddcc-deb8-4dc1-bf6f-450da561684d, execution_plan_id: 4c165bac-9af6-4f60-aa23-7de0b3cb33fa} state changed: planning
2019-09-16T14:20:20 [I|app|] Current user set to 81d2227b-8732-4c9f-b1be-4b4ce6990e91 (regular)
2019-09-16T14:20:21 [I|app|] Current user set to 81d2227b-8732-4c9f-b1be-4b4ce6990e91 (regular)
2019-09-16T14:20:21 [I|bac|] Task {label: Actions::Katello::Host::Hypervisors, id: 6f63ddcc-deb8-4dc1-bf6f-450da561684d, execution_plan_id: 4c165bac-9af6-4f60-aa23-7de0b3cb33fa} state changed: planned
2019-09-16T14:20:21 [I|app|] Current user set to 81d2227b-8732-4c9f-b1be-4b4ce6990e91 (regular)
2019-09-16T14:20:21 [I|app|] Current user set to 81d2227b-8732-4c9f-b1be-4b4ce6990e91 (regular)
2019-09-16T14:20:21 [I|app|] Current user set to 81d2227b-8732-4c9f-b1be-4b4ce6990e91 (regular)
2019-09-16T14:20:21 [I|app|] Current user set to 81d2227b-8732-4c9f-b1be-4b4ce6990e91 (regular)
2019-09-16T14:20:21 [I|app|] Completed 200 OK in 499ms (Views: 0.5ms | ActiveRecord: 38.9ms)

candlepin.log:

2019-09-16 14:20:00,032 [thread=QuartzScheduler_Worker-10] [job=SweepBarJob-6ce72469-4876-4df0-9c4c-f23143636366, org=, csid=] INFO  org.candlepin.pinsetter.tasks.KingpinJob - Starting job: org.candlepin.pinsetter.tasks.SweepBarJob
2019-09-16 14:20:00,036 [thread=QuartzScheduler_Worker-10] [job=SweepBarJob-6ce72469-4876-4df0-9c4c-f23143636366, org=, csid=] INFO  org.candlepin.pinsetter.tasks.KingpinJob - Job completed: time=4
2019-09-16 14:20:19,982 [thread=http-bio-8443-exec-5] [req=d763111e-3566-4f24-be16-fab5fa3da220, org=, csid=] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=GET, uri=/candlepin/status
2019-09-16 14:20:19,988 [thread=http-bio-8443-exec-5] [req=d763111e-3566-4f24-be16-fab5fa3da220, org=, csid=] INFO  org.candlepin.common.filter.LoggingFilter - Response: status=200, content-type="application/json", time=6
2019-09-16 14:20:20,113 [thread=http-bio-8443-exec-10] [req=a838c228-0628-4644-8cf8-8d639a1563c9, org=, csid=ca111187-8f13-49a5-9fc2-70f6004cc396] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=GET, uri=/candlepin/consumers/81d2227b-8732-4c9f-b1be-4b4ce6990e91
2019-09-16 14:20:20,147 [thread=http-bio-8443-exec-10] [req=a838c228-0628-4644-8cf8-8d639a1563c9, org=MY_ORG, csid=ca111187-8f13-49a5-9fc2-70f6004cc396] INFO  org.candlepin.common.filter.LoggingFilter - Response: status=200, content-type="application/json", time=34
2019-09-16 14:20:20,302 [thread=http-bio-8443-exec-7] [req=904f07b2-0d1c-4930-964f-c1f9d24b46a6, org=, csid=] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=GET, uri=/candlepin/consumers/81d2227b-8732-4c9f-b1be-4b4ce6990e91/owner
2019-09-16 14:20:20,322 [thread=http-bio-8443-exec-7] [req=904f07b2-0d1c-4930-964f-c1f9d24b46a6, org=MY_ORG, csid=] INFO  org.candlepin.common.filter.LoggingFilter - Response: status=200, content-type="application/json", time=20
2019-09-16 14:20:20,445 [thread=http-bio-8443-exec-1] [req=42e49dea-1fc7-4af5-a204-045b8bdc8fe6, org=, csid=] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=GET, uri=/candlepin/status
2019-09-16 14:20:20,448 [thread=http-bio-8443-exec-1] [req=42e49dea-1fc7-4af5-a204-045b8bdc8fe6, org=, csid=] INFO  org.candlepin.common.filter.LoggingFilter - Response: status=200, content-type="application/json", time=4
2019-09-16 14:20:20,810 [thread=http-bio-8443-exec-4] [req=60b3d519-335b-4c89-8833-4582892157f9, org=, csid=54875d08-c0b7-49fc-8b76-54106e11e715] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=POST, uri=/candlepin/hypervisors/MY_ORG
2019-09-16 14:20:20,840 [thread=http-bio-8443-exec-4] [req=60b3d519-335b-4c89-8833-4582892157f9, org=MY_ORG, csid=54875d08-c0b7-49fc-8b76-54106e11e715] INFO  org.candlepin.resource.HypervisorResource - Hypervisor update by principal: {"type":"trusteduser","name":"foreman_admin"}
2019-09-16 14:20:20,889 [thread=http-bio-8443-exec-4] [req=60b3d519-335b-4c89-8833-4582892157f9, org=MY_ORG, csid=54875d08-c0b7-49fc-8b76-54106e11e715] INFO  org.candlepin.common.filter.LoggingFilter - Response: status=202, content-type="application/json", time=83
2019-09-16 14:20:20,936 [thread=QuartzScheduler_Worker-1] [job=hypervisor_update_23727021-4db2-4cf9-aa17-870b1fd30f57, org=MY_ORG, csid=] INFO  org.candlepin.pinsetter.tasks.KingpinJob - Starting job: org.candlepin.pinsetter.tasks.HypervisorUpdateJob$$EnhancerByGuice$$663b758d
2019-09-16 14:20:21,044 [thread=http-bio-8443-exec-3] [req=90a93f3e-f304-436a-997a-8034401b4346, org=, csid=] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=GET, uri=/candlepin/status
2019-09-16 14:20:21,048 [thread=http-bio-8443-exec-3] [req=90a93f3e-f304-436a-997a-8034401b4346, org=, csid=] INFO  org.candlepin.common.filter.LoggingFilter - Response: status=200, content-type="application/json", time=4
2019-09-16 14:20:21,079 [thread=http-bio-8443-exec-6] [req=7eeceac1-7782-4872-bc7b-3dbabfacca07, org=, csid=] INFO  org.candlepin.common.filter.LoggingFilter - Request: verb=GET, uri=/candlepin/status
2019-09-16 14:20:21,082 [thread=http-bio-8443-exec-6] [req=7eeceac1-7782-4872-bc7b-3dbabfacca07, org=, csid=] INFO  org.candlepin.common.filter.LoggingFilter - Response: status=200, content-type="application/json", time=4
2019-09-16 14:20:21,595 [thread=QuartzScheduler_Worker-1] [job=hypervisor_update_23727021-4db2-4cf9-aa17-870b1fd30f57, org=MY_ORG, csid=] INFO  org.candlepin.pinsetter.tasks.HypervisorUpdateJob - Summary for report from null by principal {"type":"trusteduser","name":"foreman_admin"}
 Created: 0, Updated: 2, Unchanged: 39, Failed: 0
2019-09-16 14:20:21,607 [thread=QuartzScheduler_Worker-1] [job=hypervisor_update_23727021-4db2-4cf9-aa17-870b1fd30f57, org=MY_ORG, csid=] INFO  org.candlepin.pinsetter.tasks.KingpinJob - Job completed: time=671

That’s everything I got, if you want more, tell me where.

If I’m reading that candlepin log correctly, it seems like it completed, yet the task never moves on. It’s still stuck there today. 0% complete, no running steps, no errors, no output.

Would you be able to get the task output for that async hypervisor task:

Go to monitor > tasks.
Look for the ‘Hypervisors’ task. Click on it
Click on ‘dynflow console’
click on the AsyncHypervisors Step
Grab the “Output” from there.

Here’s an example of what mine looks like (its complete though):

Also, i’d recommend against doing this. You can clear tasks with the foreman-tasks task cleaner:

doing it just via .destroy() will not fully delete the task.

Here is my Dynflow output:

Thanks for this link, I’ll try getting this working later.

Google recommended me the .destroy method.

Sorry for the out of order replies:

From Dynflow, here is the text on the “Execution History” :


event	time	world
start execution	2019-09-16 18:20:21 UTC	5c62e6bf-8733-4508-a8fb-9ff6e39dd505 {"hostname"=>"MY_FOREMAN_FQDN", "pid"=>3079, "queues"=>{"default"=>{"pool_size"=>5}, "remote_execution"=>{"pool_size"=>5}}, "last_seen"=>"2019-09-17 11:56:31.367", "delayed_executor"=>true}

Are other things in foreman-tasks working fine? Things like repository syncing or content view publish?

This seems to be the only stuck task. Repo sync’s succeeded after this task was started. Other tasks are finishing with state: stopped, Result: success. No Hypervisor task ever moves.

Can you elaborate on this please? After installing the RPM, this file exists (I don’t recall creating it when I was installing from source) with two hashes in it. I figured out the second hash is the rhsm password, but what is the first hash? Your reply seems to infer that the first hash is the “password”, but password for what? The only other password in the /etc/virt-who.d/virt-who-config-1.conf is the password for the user for vSphere and that first hash does not work for that. What is that first hash in /var/lib/virt-who/key for?

Please detail the command used to clean these Hypervisor tasks. Oddly enough, I already have this plugin installed in my Foreman.

I attempted to do:

foreman-rake foreman_tasks:cleanup TASK_SEARCH="Actions::Candlepin::AsyncHypervisors" STATE="running" AFTER=1s NOOP=true
foreman-rake foreman_tasks:cleanup TASK_SEARCH="Actions::Katello::Host::HypervisorsUpdate" STATE="running" AFTER=1s NOOP=true

and neither command reported any tasks that would be cleaned, even though there are running tasks with state=running and older than 1s. I’d like to use the correct process for this task cleanup, but following the documentation has so far yielded no results.

nvm on the foreman_tasks:cleanup command, I had the syntax all wrong. Some trial/error yielded:

foreman-rake foreman_tasks:cleanup TASK_SEARCH="action=Hypervisors" STATES=running AFTER=2h NOOP=true

to at least acknowledge there was a task that needed cleaning.

This is a master password encrypted with user provided pasphrase I believe. It’s auto generated with first virt-who-password run I think. I think that’s then used for encrypting/decrypting any other password.

Ok, that makes some sense. I’m disappointed it won’t help me solve my issue though.

I’d be more than happy to publish my complete installation steps. I took meticulous notes cuz I’m that kinda guy. Maybe it helps someone reproduce? I really, really, really, need this virt-who business to work and I feel like I’m right on the verge (had it working on old version actually, but now, doesn’t work again).

So, a complete service restart seems to have let new Hypervisor tasks actually run.

Solution:

foreman-maintain service stop && foreman-maintain service start

I’m seeing ESXi hosts populating in Hosts->All Hosts now as expected.

/var/lib/virt-who/key hashes explained.

Correct foreman-rake commands for killing a stuck Hypervisor task seems to be close to :

foreman-rake foreman_tasks:cleanup TASK_SEARCH="action=Hypervisors" STATES=running AFTER=1s

(tweak time as needed, this will kill all Hypervisors tasks in state running older than 1s, which is probably all of them, so don’t just c/p)

Also side note: they moved the virt-who-0.24.7 to main from cr repo very very recently.