HELP! Unable to rebuild hosts from PXE in production environment

Hi all,
I have a Foreman installed on production environment version 1.8.2 and for
some reason, with no current change in system, i am start getting this
CRITICAL error:

Started GET
"/unattended/provision?token=9fbfb26e-5c78-426d-8cad-8d685886c520" for
172.29.52.125 at 2015-06-25 07:27:51 -0500
2015-06-25 07:27:51 [I] Processing by UnattendedController#provision as /
2015-06-25 07:27:51 [I] Parameters:
{"token"=>"9fbfb26e-5c78-426d-8cad-8d685886c520"}
2015-06-25 07:27:51 [I] Found civwdd202.lab.primarydata.com
2015-06-25 07:27:51 [I] Remove puppet certificate for
civwdd202.lab.primarydata.com
2015-06-25 07:27:53 [I]

Started GET
"/api/unattended/provision?token=9fbfb26e-5c78-426d-8cad-8d685886c520" for
10.100.8.19 at 2015-06-25 07:27:53 -0500
2015-06-25 07:27:53 [F]
ActionController::RoutingError (No route matches [GET]
"/api/unattended/provision"):

/usr/lib/ruby/gems/1.8/gems/passenger-4.0.18/lib/phusion_passenger/rack/thread_handler_extension.rb:77:in
`process_request'

/usr/lib/ruby/gems/1.8/gems/passenger-4.0.18/lib/phusion_passenger/request_handler/thread_handler.rb:140:in
`accept_and_process_next_request'

/usr/lib/ruby/gems/1.8/gems/passenger-4.0.18/lib/phusion_passenger/request_handler/thread_handler.rb:108:in
`main_loop'

/usr/lib/ruby/gems/1.8/gems/passenger-4.0.18/lib/phusion_passenger/request_handler.rb:441:in
`block (3 levels) in start_threads'

2015-06-25 07:28:00 [I]

Started GET "/api/unattended/provision" for 10.100.8.19 at 2015-06-25
07:28:00 -0500
2015-06-25 07:28:00 [F]
ActionController::RoutingError (No route matches [GET]
"/api/unattended/provision"):

/usr/lib/ruby/gems/1.8/gems/passenger-4.0.18/lib/phusion_passenger/rack/thread_handler_extension.rb:77:in
`process_request'

/usr/lib/ruby/gems/1.8/gems/passenger-4.0.18/lib/phusion_passenger/request_handler/thread_handler.rb:140:in
`accept_and_process_next_request'

/usr/lib/ruby/gems/1.8/gems/passenger-4.0.18/lib/phusion_passenger/request_handler/thread_handler.rb:108:in
`main_loop'

/usr/lib/ruby/gems/1.8/gems/passenger-4.0.18/lib/phusion_passenger/request_handler.rb:441:in
`block (3 levels) in start_threads'

In the actual boot console i see the error "Failed to fetch kickstart from
http://…"

Can someone please help me understand what is happening and how to solve
it?

Thanks

<https://lh3.googleusercontent.com/-TeWmDT5H44k/VYwGvjXu7wI/AAAAAAAAEJI/QCHGlwcXuP4/s1600/Screen%2BShot%2B2015-06-25%2Bat%2B4.46.32%2BPM.png>

··· On Thursday, June 25, 2015 at 3:37:55 PM UTC+3, Avi Tal wrote: > > *Hi all,* > *I have a Foreman installed on production environment version 1.8.2 and > for some reason, with no current change in system, i am start getting this > CRITICAL error:* > > Started GET > "/unattended/provision?token=9fbfb26e-5c78-426d-8cad-8d685886c520" for > 172.29.52.125 at 2015-06-25 07:27:51 -0500 > 2015-06-25 07:27:51 [I] Processing by UnattendedController#provision as */* > 2015-06-25 07:27:51 [I] Parameters: > {"token"=>"9fbfb26e-5c78-426d-8cad-8d685886c520"} > 2015-06-25 07:27:51 [I] Found civwdd202.lab.primarydata.com > 2015-06-25 07:27:51 [I] Remove puppet certificate for > civwdd202.lab.primarydata.com > 2015-06-25 07:27:53 [I] > > Started GET > "/api/unattended/provision?token=9fbfb26e-5c78-426d-8cad-8d685886c520" for > 10.100.8.19 at 2015-06-25 07:27:53 -0500 > 2015-06-25 07:27:53 [F] > ActionController::RoutingError (No route matches [GET] > "/api/unattended/provision"): > > /usr/lib/ruby/gems/1.8/gems/passenger-4.0.18/lib/phusion_passenger/rack/thread_handler_extension.rb:77:in > `process_request' > > /usr/lib/ruby/gems/1.8/gems/passenger-4.0.18/lib/phusion_passenger/request_handler/thread_handler.rb:140:in > `accept_and_process_next_request' > > /usr/lib/ruby/gems/1.8/gems/passenger-4.0.18/lib/phusion_passenger/request_handler/thread_handler.rb:108:in > `main_loop' > > /usr/lib/ruby/gems/1.8/gems/passenger-4.0.18/lib/phusion_passenger/request_handler.rb:441:in > `block (3 levels) in start_threads' > > > 2015-06-25 07:28:00 [I] > > Started GET "/api/unattended/provision" for 10.100.8.19 at 2015-06-25 > 07:28:00 -0500 > 2015-06-25 07:28:00 [F] > ActionController::RoutingError (No route matches [GET] > "/api/unattended/provision"): > > /usr/lib/ruby/gems/1.8/gems/passenger-4.0.18/lib/phusion_passenger/rack/thread_handler_extension.rb:77:in > `process_request' > > /usr/lib/ruby/gems/1.8/gems/passenger-4.0.18/lib/phusion_passenger/request_handler/thread_handler.rb:140:in > `accept_and_process_next_request' > > /usr/lib/ruby/gems/1.8/gems/passenger-4.0.18/lib/phusion_passenger/request_handler/thread_handler.rb:108:in > `main_loop' > > /usr/lib/ruby/gems/1.8/gems/passenger-4.0.18/lib/phusion_passenger/request_handler.rb:441:in > `block (3 levels) in start_threads' > > > > *In the actual boot console i see the error "Failed to fetch kickstart > from http://..."* > *Can someone please help me understand what is happening and how to solve > it?* > > *Thanks* >

> I have a Foreman installed on production environment version 1.8.2 and
> for some reason, with no current change in system, i am start getting
> this CRITICAL error:

>
> Started GET
> "/unattended/provision?token=9fbfb26e-5c78-426d-8cad-8d685886c520" for
> 172.29.52.125 at 2015-06-25 07:27:51 -0500
> 2015-06-25 07:27:51 [I] Processing by UnattendedController#provision as /
> 2015-06-25 07:27:51 [I] Parameters:
> {"token"=>"9fbfb26e-5c78-426d-8cad-8d685886c520"}
> 2015-06-25 07:27:51 [I] Found civwdd202.lab.primarydata.com
> 2015-06-25 07:27:51 [I] Remove puppet certificate for
> civwdd202.lab.primarydata.com
> 2015-06-25 07:27:53 [I]

Strange that there's no further messages in relation to this request, as
it appears to end in a 500 error. Does the preview under Host > your
host > Templates (left hand side) > provision > Review work?

> Started GET
> "/api/unattended/provision?token=9fbfb26e-5c78-426d-8cad-8d685886c520"
> for 10.100.8.19 at 2015-06-25 07:27:53 -0500
> 2015-06-25 07:27:53 [F]
> ActionController::RoutingError (No route matches [GET]
> "/api/unattended/provision"):

This is very strange, hosts shouldn't be using /api for their templates.
What is the IP address 10.100.8.19? It's different to the first
request. Do you have anything custom, like foreman_hooks?

··· On 25/06/15 13:37, Avi Tal wrote:


Dominic Cleal
Red Hat Engineering

Hey,

> "/api/unattended/provision?token=9fbfb26e-5c78-426d-8cad-8d685886c520" for

this is wrong URL, remove "/api" part. Did you modify the provisioning
template or foreman_url setting?

··· -- Later, Lukas #lzap Zapletal

more info

tail /var/log/httpd/foreman_access.log

172.29.52.125 - - [26/Jun/2015:00:00:10 -0500] "GET
/unattended/provision?token=06f83798-6b33-4ad9-98e4-c7c2d65149e0 HTTP/1.1"
500 728 "-" "curl/7.29.0"
172.29.52.125 - - [26/Jun/2015:00:01:12 -0500] "GET
/unattended/provision?token=06f83798-6b33-4ad9-98e4-c7c2d65149e0 HTTP/1.1"
500 728 "-" "curl/7.29.0"
172.29.52.125 - - [26/Jun/2015:00:02:14 -0500] "GET
/unattended/provision?token=06f83798-6b33-4ad9-98e4-c7c2d65149e0 HTTP/1.1"
500 728 "-" "curl/7.29.0"
172.29.52.125 - - [26/Jun/2015:00:03:18 -0500] "GET
/unattended/provision?token=06f83798-6b33-4ad9-98e4-c7c2d65149e0 HTTP/1.1"
500 728 "-" "curl/7.29.0"
172.29.52.125 - - [26/Jun/2015:00:12:32 -0500] "GET
/unattended/provision?token=9b2b9f95-0974-45a1-96ee-c8466cfe6bbc HTTP/1.1"
500 728 "-" "curl/7.29.0"
172.29.52.125 - - [26/Jun/2015:00:13:34 -0500] "GET
/unattended/provision?token=9b2b9f95-0974-45a1-96ee-c8466cfe6bbc HTTP/1.1"
500 728 "-" "curl/7.29.0"
172.29.52.125 - - [26/Jun/2015:00:14:36 -0500] "GET
/unattended/provision?token=9b2b9f95-0974-45a1-96ee-c8466cfe6bbc HTTP/1.1"
500 728 "-" "curl/7.29.0"
172.29.52.125 - - [26/Jun/2015:00:15:40 -0500] "GET
/unattended/provision?token=9b2b9f95-0974-45a1-96ee-c8466cfe6bbc HTTP/1.1"
500 728 "-" "curl/7.29.0"

··· On Thursday, June 25, 2015 at 4:49:04 PM UTC+3, Avi Tal wrote: > > > > > > On Thursday, June 25, 2015 at 3:37:55 PM UTC+3, Avi Tal wrote: >> >> *Hi all,* >> *I have a Foreman installed on production environment version 1.8.2 and >> for some reason, with no current change in system, i am start getting this >> CRITICAL error:* >> >> Started GET >> "/unattended/provision?token=9fbfb26e-5c78-426d-8cad-8d685886c520" for >> 172.29.52.125 at 2015-06-25 07:27:51 -0500 >> 2015-06-25 07:27:51 [I] Processing by UnattendedController#provision as >> */* >> 2015-06-25 07:27:51 [I] Parameters: >> {"token"=>"9fbfb26e-5c78-426d-8cad-8d685886c520"} >> 2015-06-25 07:27:51 [I] Found civwdd202.lab.primarydata.com >> 2015-06-25 07:27:51 [I] Remove puppet certificate for >> civwdd202.lab.primarydata.com >> 2015-06-25 07:27:53 [I] >> >> Started GET >> "/api/unattended/provision?token=9fbfb26e-5c78-426d-8cad-8d685886c520" for >> 10.100.8.19 at 2015-06-25 07:27:53 -0500 >> 2015-06-25 07:27:53 [F] >> ActionController::RoutingError (No route matches [GET] >> "/api/unattended/provision"): >> >> /usr/lib/ruby/gems/1.8/gems/passenger-4.0.18/lib/phusion_passenger/rack/thread_handler_extension.rb:77:in >> `process_request' >> >> /usr/lib/ruby/gems/1.8/gems/passenger-4.0.18/lib/phusion_passenger/request_handler/thread_handler.rb:140:in >> `accept_and_process_next_request' >> >> /usr/lib/ruby/gems/1.8/gems/passenger-4.0.18/lib/phusion_passenger/request_handler/thread_handler.rb:108:in >> `main_loop' >> >> /usr/lib/ruby/gems/1.8/gems/passenger-4.0.18/lib/phusion_passenger/request_handler.rb:441:in >> `block (3 levels) in start_threads' >> >> >> 2015-06-25 07:28:00 [I] >> >> Started GET "/api/unattended/provision" for 10.100.8.19 at 2015-06-25 >> 07:28:00 -0500 >> 2015-06-25 07:28:00 [F] >> ActionController::RoutingError (No route matches [GET] >> "/api/unattended/provision"): >> >> /usr/lib/ruby/gems/1.8/gems/passenger-4.0.18/lib/phusion_passenger/rack/thread_handler_extension.rb:77:in >> `process_request' >> >> /usr/lib/ruby/gems/1.8/gems/passenger-4.0.18/lib/phusion_passenger/request_handler/thread_handler.rb:140:in >> `accept_and_process_next_request' >> >> /usr/lib/ruby/gems/1.8/gems/passenger-4.0.18/lib/phusion_passenger/request_handler/thread_handler.rb:108:in >> `main_loop' >> >> /usr/lib/ruby/gems/1.8/gems/passenger-4.0.18/lib/phusion_passenger/request_handler.rb:441:in >> `block (3 levels) in start_threads' >> >> >> >> *In the actual boot console i see the error "Failed to fetch kickstart >> from http://..."* >> *Can someone please help me understand what is happening and how to solve >> it?* >> >> *Thanks* >> >

I haven't install foreman_hooks and never touch foreman_url settings.
Dominic, provision review is working for this specific host.

BTW, the same problem appears for several hosts.

Guys, i must have a workaround ASAP. What can i do to debug it? Can you
recommend a fast workaround?

I'll appreciate every suggestion.

Thanks

··· On Friday, June 26, 2015 at 1:39:58 PM UTC+3, Lukas Zapletal wrote: > > Hey, > > > "/api/unattended/provision?token=9fbfb26e-5c78-426d-8cad-8d685886c520" > for > > this is wrong URL, remove "/api" part. Did you modify the provisioning > template or foreman_url setting? > > -- > Later, > Lukas #lzap Zapletal >

Do you guys think that the following log is related?
BTW, civwdd202.lab.primarydata.com host is the one i have trouble with

Started POST "/api/reports" for 172.29.100.100 at 2015-06-27 02:12:21 -0500
2015-06-27 02:12:21 [I] Processing by Api::V2::ReportsController#create as
JSON
2015-06-27 02:12:21 [I] Parameters: {"report"=>"[FILTERED]", "apiv"=>"v2"}
2015-06-27 02:12:21 [I] processing report for civwds005.lab.primarydata.com
2015-06-27 02:12:21 [I] Imported report for civwds005.lab.primarydata.com
in 0.02 seconds
2015-06-27 02:12:21 [I] Completed 201 Created in 34ms (Views: 2.3ms |
ActiveRecord: 0.0ms)
2015-06-27 02:12:30 [W] Operation FAILED: ERF12-7740
[ProxyAPI::ProxyException]: Unable to delete PuppetCA certificate for
civwdd202.lab.primarydata.com ([RestClient::RequestTimeout]: Request
Timeout) for proxy https://foreman1-vw.lab.primarydata.com:8443/puppet/ca
2015-06-27 02:12:30 [I] Completed 500 Internal Server Error in 60073ms
2015-06-27 02:12:30 [F]
ArgumentError (There was no default layout for UnattendedController in
#<ActionView::PathSet:0x0000000acf95e0
@paths=[/usr/share/foreman/app/views,
/opt/rh/ruby193/root/usr/share/gems/gems/foreman_setup-2.1.1/app/views,
/opt/rh/ruby193/root/usr/share/gems/gems/foreman_bootdisk-5.0.0/app/views,
/opt/rh/ruby193/root/usr/share/gems/gems/apipie-rails-0.2.6/app/views]>):
app/controllers/application_controller.rb:299:in generic_exception&#39; lib/middleware/catch_json_parse_errors.rb:9:incall'

··· On Saturday, June 27, 2015 at 9:45:13 AM UTC+3, Avi Tal wrote: > > I haven't install foreman_hooks and never touch foreman_url settings. > Dominic, provision review is working for this specific host. > > BTW, the same problem appears for several hosts. > > Guys, i must have a workaround ASAP. What can i do to debug it? Can you > recommend a fast workaround? > > I'll appreciate every suggestion. > > Thanks > > > On Friday, June 26, 2015 at 1:39:58 PM UTC+3, Lukas Zapletal wrote: >> >> Hey, >> >> > "/api/unattended/provision?token=9fbfb26e-5c78-426d-8cad-8d685886c520" >> for >> >> this is wrong URL, remove "/api" part. Did you modify the provisioning >> template or foreman_url setting? >> >> -- >> Later, >> Lukas #lzap Zapletal >> >

I think i got it even though i have no idea how it appears!

For some reason civwdd202… certificate is no longer listed under smart
proxy certificates
Again, i have done anything that should cause removal of its certificates.

How can i add back the hosts's certificate?

E, [2015-06-27T02:06:00.835720 #24712] ERROR – : Attempt to remove
nonexistant client autosign for civwdd202.lab.primarydata.com
172.29.100.100 - - [27/Jun/2015 02:06:00] "DELETE
/autosign/civwdd202.lab.primarydata.com HTTP/1.1" 404 79 0.0016

··· On Saturday, June 27, 2015 at 10:14:57 AM UTC+3, Avi Tal wrote: > > *Do you guys think that the following log is related?* > *BTW, civwdd202.lab.primarydata.com > host is the one i have trouble with* > > > Started POST "/api/reports" for 172.29.100.100 at 2015-06-27 02:12:21 -0500 > 2015-06-27 02:12:21 [I] Processing by Api::V2::ReportsController#create as > JSON > 2015-06-27 02:12:21 [I] Parameters: {"report"=>"[FILTERED]", > "apiv"=>"v2"} > 2015-06-27 02:12:21 [I] processing report for > civwds005.lab.primarydata.com > 2015-06-27 02:12:21 [I] Imported report for civwds005.lab.primarydata.com > in 0.02 seconds > 2015-06-27 02:12:21 [I] Completed 201 Created in 34ms (Views: 2.3ms | > ActiveRecord: 0.0ms) > 2015-06-27 02:12:30 [W] Operation FAILED: ERF12-7740 > [ProxyAPI::ProxyException]: Unable to delete PuppetCA certificate for > civwdd202.lab.primarydata.com ([RestClient::RequestTimeout]: Request > Timeout) for proxy https://foreman1-vw.lab.primarydata.com:8443/puppet/ca > 2015-06-27 02:12:30 [I] Completed 500 Internal Server Error in 60073ms > 2015-06-27 02:12:30 [F] > ArgumentError (There was no default layout for UnattendedController in > # @paths=[/usr/share/foreman/app/views, > /opt/rh/ruby193/root/usr/share/gems/gems/foreman_setup-2.1.1/app/views, > /opt/rh/ruby193/root/usr/share/gems/gems/foreman_bootdisk-5.0.0/app/views, > /opt/rh/ruby193/root/usr/share/gems/gems/apipie-rails-0.2.6/app/views]>): > app/controllers/application_controller.rb:299:in `generic_exception' > lib/middleware/catch_json_parse_errors.rb:9:in `call' > > > > > On Saturday, June 27, 2015 at 9:45:13 AM UTC+3, Avi Tal wrote: >> >> I haven't install foreman_hooks and never touch foreman_url settings. >> Dominic, provision review is working for this specific host. >> >> BTW, the same problem appears for several hosts. >> >> Guys, i must have a workaround ASAP. What can i do to debug it? Can you >> recommend a fast workaround? >> >> I'll appreciate every suggestion. >> >> Thanks >> >> >> On Friday, June 26, 2015 at 1:39:58 PM UTC+3, Lukas Zapletal wrote: >>> >>> Hey, >>> >>> > "/api/unattended/provision?token=9fbfb26e-5c78-426d-8cad-8d685886c520" >>> for >>> >>> this is wrong URL, remove "/api" part. Did you modify the provisioning >>> template or foreman_url setting? >>> >>> -- >>> Later, >>> Lukas #lzap Zapletal >>> >>

> Do you guys think that the following log is related?
> BTW, civwdd202.lab.primarydata.com host is the one i have trouble with
>
>
> Started POST "/api/reports" for 172.29.100.100 at 2015-06-27 02:12:21 -0500
> 2015-06-27 02:12:21 [I] Processing by Api::V2::ReportsController#create
> as JSON
> 2015-06-27 02:12:21 [I] Parameters: {"report"=>"[FILTERED]", "apiv"=>"v2"}
> 2015-06-27 02:12:21 [I] processing report for civwds005.lab.primarydata.com
> 2015-06-27 02:12:21 [I] Imported report for
> civwds005.lab.primarydata.com in 0.02 seconds
> 2015-06-27 02:12:21 [I] Completed 201 Created in 34ms (Views: 2.3ms |
> ActiveRecord: 0.0ms)
> 2015-06-27 02:12:30 [W] Operation FAILED: ERF12-7740
> [ProxyAPI::ProxyException]: Unable to delete PuppetCA certificate for
> civwdd202.lab.primarydata.com ([RestClient::RequestTimeout]: Request
> Timeout) for proxy https://foreman1-vw.lab.primarydata.com:8443/puppet/ca
> 2015-06-27 02:12:30 [I] Completed 500 Internal Server Error in 60073ms

This suggests something's wrong on your PuppetCA proxy. I'd suspect
that sudoers is misconfigured, perhaps it's prompting foreman-proxy for
a password.

Double check that you have the correct sudoers entries (both) for the
version of Puppet installed:
http://theforeman.org/manuals/1.8/index.html#4.3.8PuppetCA

> I have just witnessed a removal of another host from the certificates list without adding back
>
> Started GET "/unattended/provision?token=48a4c1f2-a997-4d04-a118-8c23e300d526" for 172.29.52.126 at 2015-06-27 03:26:05 -0500
> …skipping…
> 2015-06-27 03:26:05 [I] Found civwdd203.lab.primarydata.com
> 2015-06-27 03:26:05 [I] Remove puppet certificate for civwdd203.lab.primarydata.com
> 2015-06-27 03:26:13 [I]
>
> What is happening? What do i miss here?

The host is being rebuilt, so its certificate has been removed.
Accessing /unattended/provision is it requesting a kickstart/preseed
script. If it's being rebuilt, the certificates will get replaced.

If it's not being rebuilt, then I'd check what 172.29.52.126 is, and why
it's requesting a kickstart!

··· On 27/06/15 08:14, Avi Tal wrote:


Dominic Cleal
Red Hat Engineering

I have just witnessed a removal of another host from the certificates list
without adding back

Started GET
"/unattended/provision?token=48a4c1f2-a997-4d04-a118-8c23e300d526" for
172.29.52.126 at 2015-06-27 03:26:05 -0500
…skipping…
2015-06-27 03:26:05 [I] Found civwdd203.lab.primarydata.com
2015-06-27 03:26:05 [I] Remove puppet certificate for
civwdd203.lab.primarydata.com
2015-06-27 03:26:13 [I]

What is happening? What do i miss here?
:frowning:

··· On Saturday, June 27, 2015 at 11:21:42 AM UTC+3, Avi Tal wrote: > > I think i got it even though i have no idea how it appears! > > For some reason civwdd202... certificate is no longer listed under smart > proxy certificates > Again, i have done anything that should cause removal of its certificates. > > How can i add back the hosts's certificate? > > E, [2015-06-27T02:06:00.835720 #24712] ERROR -- : Attempt to remove > nonexistant client autosign for civwdd202.lab.primarydata.com > 172.29.100.100 - - [27/Jun/2015 02:06:00] "DELETE /autosign/ > civwdd202.lab.primarydata.com HTTP/1.1" 404 79 0.0016 > > On Saturday, June 27, 2015 at 10:14:57 AM UTC+3, Avi Tal wrote: >> >> *Do you guys think that the following log is related?* >> *BTW, civwdd202.lab.primarydata.com >> host is the one i have trouble with* >> >> >> Started POST "/api/reports" for 172.29.100.100 at 2015-06-27 02:12:21 >> -0500 >> 2015-06-27 02:12:21 [I] Processing by Api::V2::ReportsController#create >> as JSON >> 2015-06-27 02:12:21 [I] Parameters: {"report"=>"[FILTERED]", >> "apiv"=>"v2"} >> 2015-06-27 02:12:21 [I] processing report for >> civwds005.lab.primarydata.com >> 2015-06-27 02:12:21 [I] Imported report for civwds005.lab.primarydata.com >> in 0.02 seconds >> 2015-06-27 02:12:21 [I] Completed 201 Created in 34ms (Views: 2.3ms | >> ActiveRecord: 0.0ms) >> 2015-06-27 02:12:30 [W] Operation FAILED: ERF12-7740 >> [ProxyAPI::ProxyException]: Unable to delete PuppetCA certificate for >> civwdd202.lab.primarydata.com ([RestClient::RequestTimeout]: Request >> Timeout) for proxy https://foreman1-vw.lab.primarydata.com:8443/puppet/ca >> 2015-06-27 02:12:30 [I] Completed 500 Internal Server Error in 60073ms >> 2015-06-27 02:12:30 [F] >> ArgumentError (There was no default layout for UnattendedController in >> #> @paths=[/usr/share/foreman/app/views, >> /opt/rh/ruby193/root/usr/share/gems/gems/foreman_setup-2.1.1/app/views, >> /opt/rh/ruby193/root/usr/share/gems/gems/foreman_bootdisk-5.0.0/app/views, >> /opt/rh/ruby193/root/usr/share/gems/gems/apipie-rails-0.2.6/app/views]>): >> app/controllers/application_controller.rb:299:in `generic_exception' >> lib/middleware/catch_json_parse_errors.rb:9:in `call' >> >> >> >> >> On Saturday, June 27, 2015 at 9:45:13 AM UTC+3, Avi Tal wrote: >>> >>> I haven't install foreman_hooks and never touch foreman_url settings. >>> Dominic, provision review is working for this specific host. >>> >>> BTW, the same problem appears for several hosts. >>> >>> Guys, i must have a workaround ASAP. What can i do to debug it? Can you >>> recommend a fast workaround? >>> >>> I'll appreciate every suggestion. >>> >>> Thanks >>> >>> >>> On Friday, June 26, 2015 at 1:39:58 PM UTC+3, Lukas Zapletal wrote: >>>> >>>> Hey, >>>> >>>> > >>>> "/api/unattended/provision?token=9fbfb26e-5c78-426d-8cad-8d685886c520" for >>>> >>>> this is wrong URL, remove "/api" part. Did you modify the provisioning >>>> template or foreman_url setting? >>>> >>>> -- >>>> Later, >>>> Lukas #lzap Zapletal >>>> >>>

10x, so i have reconfigured my sudoers and waiting for reproduce.
Will update if it does

··· On Monday, June 29, 2015 at 11:59:53 AM UTC+3, Dominic Cleal wrote: > > On 27/06/15 08:14, Avi Tal wrote: > > *Do you guys think that the following log is related?* > > *BTW, civwdd202.lab.primarydata.com host is the one i have trouble > with* > > > > > > Started POST "/api/reports" for 172.29.100.100 at 2015-06-27 02:12:21 > -0500 > > 2015-06-27 02:12:21 [I] Processing by Api::V2::ReportsController#create > > as JSON > > 2015-06-27 02:12:21 [I] Parameters: {"report"=>"[FILTERED]", > "apiv"=>"v2"} > > 2015-06-27 02:12:21 [I] processing report for > civwds005.lab.primarydata.com > > 2015-06-27 02:12:21 [I] Imported report for > > civwds005.lab.primarydata.com in 0.02 seconds > > 2015-06-27 02:12:21 [I] Completed 201 Created in 34ms (Views: 2.3ms | > > ActiveRecord: 0.0ms) > > 2015-06-27 02:12:30 [W] Operation FAILED: ERF12-7740 > > [ProxyAPI::ProxyException]: Unable to delete PuppetCA certificate for > > civwdd202.lab.primarydata.com ([RestClient::RequestTimeout]: Request > > Timeout) for proxy > https://foreman1-vw.lab.primarydata.com:8443/puppet/ca > > 2015-06-27 02:12:30 [I] Completed 500 Internal Server Error in 60073ms > > This suggests something's wrong on your PuppetCA proxy. I'd suspect > that sudoers is misconfigured, perhaps it's prompting foreman-proxy for > a password. > > Double check that you have the correct sudoers entries (both) for the > version of Puppet installed: > http://theforeman.org/manuals/1.8/index.html#4.3.8PuppetCA > > > I have just witnessed a removal of another host from the certificates > list without adding back > > > > Started GET > "/unattended/provision?token=48a4c1f2-a997-4d04-a118-8c23e300d526" for > 172.29.52.126 at 2015-06-27 03:26:05 -0500 > > ...skipping... > > 2015-06-27 03:26:05 [I] Found civwdd203.lab.primarydata.com > > 2015-06-27 03:26:05 [I] Remove puppet certificate for > civwdd203.lab.primarydata.com > > 2015-06-27 03:26:13 [I] > > > > What is happening? What do i miss here? > > The host is being rebuilt, so its certificate has been removed. > Accessing /unattended/provision is it requesting a kickstart/preseed > script. If it's being rebuilt, the certificates will get replaced. > > If it's not being rebuilt, then I'd check what 172.29.52.126 is, and why > it's requesting a kickstart! > > -- > Dominic Cleal > Red Hat Engineering >

Well, sadly it keeps reproducing randomly.
It can simply work perfect for existing hosts/new created hosts but then
one of the hosts suddenly will loose his certificate and provisioning will
fail again.
The only solution again is to run "puppet cert --generate … "
I couldn't find any pattern for the reproduction of this failure.

I'll appreciate any help guys.

Here are some details:
$ cat /etc/redhat-release
CentOS release 6.5 (Final)

$ rpm -qa |grep foreman*
foreman-compute-1.8.2-1.el6.noarch
foreman-ec2-1.8.2-1.el6.noarch
rubygem-foreman_api-0.1.11-2.el6.noarch
foreman-selinux-1.8.2-1.el6.noarch
foreman-postgresql-1.8.2-1.el6.noarch
ruby193-rubygem-foreman_setup-2.1.1-1.el6.noarch
rubygem-hammer_cli_foreman-0.2.0-1.el6.noarch
ruby193-rubygem-foreman_bootdisk-5.0.0-1.el6.noarch
foreman-debug-1.8.2-1.el6.noarch
foreman-installer-1.8.2-1.el6.noarch
foreman-proxy-1.8.2-2.el6.noarch
ruby193-rubygem-foreman_templates-1.5.0-1.el6.noarch
foreman-1.8.2-1.el6.noarch
foreman-vmware-1.8.2-1.el6.noarch
foreman-release-1.8.2-1.el6.noarch
ruby193-rubygem-foreman_reserve-0.1.8.3-2.el6.noarch

$ rpm -qa |grep puppet*
puppet-server-3.7.4-1.el6.noarch
puppetlabs-release-6-11.noarch
puppet-3.7.4-1.el6.noarch

$ cat /etc/sudoers.d/foreman-proxy
*foreman-proxy ALL = NOPASSWD: /usr/bin/puppet cert **
*foreman-proxy ALL = NOPASSWD: /usr/bin/puppet kick **
Defaults:foreman-proxy !requiretty

$ ll /var/lib/puppet/ssl/ca/
total 3480
-rw-r–r--. 1 puppet puppet 3283836 Jul 6 06:41 ca_crl.pem
-rw-r–r--. 1 puppet puppet 1915 Jun 18 2014 ca_crt.pem
-rw-r-----. 1 puppet puppet 3243 Jun 18 2014 ca_key.pem
-rw-r–r--. 1 puppet puppet 775 Jun 18 2014 ca_pub.pem
-rw-r–r--. 1 puppet puppet 244537 Jul 6 04:51 inventory.txt
drwxr-x—. 2 puppet puppet 4096 Jun 18 2014 private
drwxr-xr-x. 2 puppet puppet 4096 Jul 6 04:51 requests
-rw-r–r-- 1 puppet puppet 4 Jul 6 04:51 serial
drwxr-xr-x. 2 puppet puppet 4096 Jul 6 06:38 signed

$ ll /etc/puppet/
total 64
-rw-r–r-- 1 root root 4178 Jun 27 02:32 auth.conf
-rw-rw-r–. 1 puppet puppet 255 Jul 6 06:38 autosign.conf
drwxr-xr-x. 6 root root 4096 Mar 27 16:11 environments
-rw-r–r-- 1 root root 1462 Jan 26 17:46 fileserver.conf
-rw-r–r-- 1 root root 422 Jun 27 02:59 foreman.yaml
drwxr-xr-x. 2 root root 4096 Mar 21 16:32 manifests
drwxr-xr-x. 2 root root 4096 Jan 26 17:48 modules
-rwxr-xr-x 1 root root 9850 Jun 27 02:54 node.rb
-rw-r–r-- 1 root root 2835 Jun 27 15:31 puppet.conf
drwxr-xr-x. 4 puppet root 4096 Mar 17 01:20 rack

$ cat /etc/foreman-proxy/settings.d/puppetca.yml

··· *---* *# Can be true, false, or http/https to enable just one of the protocols* *:enabled: https* *:ssldir: /var/lib/puppet/ssl* *:puppetdir: /etc/puppet* *:puppetca_use_sudo: true* *:sudo_command: /usr/bin/sudo*

On Monday, June 29, 2015 at 12:05:07 PM UTC+3, Avi Tal wrote:

10x, so i have reconfigured my sudoers and waiting for reproduce.
Will update if it does

On Monday, June 29, 2015 at 11:59:53 AM UTC+3, Dominic Cleal wrote:

On 27/06/15 08:14, Avi Tal wrote:

Do you guys think that the following log is related?
BTW, civwdd202.lab.primarydata.com host is the one i have trouble
with

Started POST “/api/reports” for 172.29.100.100 at 2015-06-27 02:12:21
-0500
2015-06-27 02:12:21 [I] Processing by Api::V2::ReportsController#create
as JSON
2015-06-27 02:12:21 [I] Parameters: {“report”=>"[FILTERED]",
“apiv”=>“v2”}
2015-06-27 02:12:21 [I] processing report for
civwds005.lab.primarydata.com
2015-06-27 02:12:21 [I] Imported report for
civwds005.lab.primarydata.com in 0.02 seconds
2015-06-27 02:12:21 [I] Completed 201 Created in 34ms (Views: 2.3ms |
ActiveRecord: 0.0ms)
2015-06-27 02:12:30 [W] Operation FAILED: ERF12-7740
[ProxyAPI::ProxyException]: Unable to delete PuppetCA certificate for
civwdd202.lab.primarydata.com ([RestClient::RequestTimeout]: Request
Timeout) for proxy
https://foreman1-vw.lab.primarydata.com:8443/puppet/ca
2015-06-27 02:12:30 [I] Completed 500 Internal Server Error in 60073ms

This suggests something’s wrong on your PuppetCA proxy. I’d suspect
that sudoers is misconfigured, perhaps it’s prompting foreman-proxy for
a password.

Double check that you have the correct sudoers entries (both) for the
version of Puppet installed:
Foreman :: Manual

I have just witnessed a removal of another host from the certificates
list without adding back

Started GET
"/unattended/provision?token=48a4c1f2-a997-4d04-a118-8c23e300d526" for
172.29.52.126 at 2015-06-27 03:26:05 -0500
…skipping…
2015-06-27 03:26:05 [I] Found civwdd203.lab.primarydata.com
2015-06-27 03:26:05 [I] Remove puppet certificate for
civwdd203.lab.primarydata.com
2015-06-27 03:26:13 [I]

What is happening? What do i miss here?

The host is being rebuilt, so its certificate has been removed.
Accessing /unattended/provision is it requesting a kickstart/preseed
script. If it’s being rebuilt, the certificates will get replaced.

If it’s not being rebuilt, then I’d check what 172.29.52.126 is, and why
it’s requesting a kickstart!


Dominic Cleal
Red Hat Engineering

Anyways, as i suspected, "puppet cert --generate
civwdd203.lab.primarydata.com
" fixes the problem.

But the main problem stays the same, why was it removed and how do i debug
and fix it?

··· On Saturday, June 27, 2015 at 11:28:53 AM UTC+3, Avi Tal wrote: > > I have just witnessed a removal of another host from the certificates list > without adding back > > Started GET > "/unattended/provision?token=48a4c1f2-a997-4d04-a118-8c23e300d526" for > 172.29.52.126 at 2015-06-27 03:26:05 -0500 > ...skipping... > 2015-06-27 03:26:05 [I] Found civwdd203.lab.primarydata.com > 2015-06-27 03:26:05 [I] Remove puppet certificate for > civwdd203.lab.primarydata.com > 2015-06-27 03:26:13 [I] > > What is happening? What do i miss here? > :( > > On Saturday, June 27, 2015 at 11:21:42 AM UTC+3, Avi Tal wrote: >> >> I think i got it even though i have no idea how it appears! >> >> For some reason civwdd202... certificate is no longer listed under smart >> proxy certificates >> Again, i have done anything that should cause removal of its certificates. >> >> How can i add back the hosts's certificate? >> >> E, [2015-06-27T02:06:00.835720 #24712] ERROR -- : Attempt to remove >> nonexistant client autosign for civwdd202.lab.primarydata.com >> 172.29.100.100 - - [27/Jun/2015 02:06:00] "DELETE /autosign/ >> civwdd202.lab.primarydata.com HTTP/1.1" 404 79 0.0016 >> >> On Saturday, June 27, 2015 at 10:14:57 AM UTC+3, Avi Tal wrote: >>> >>> *Do you guys think that the following log is related?* >>> *BTW, civwdd202.lab.primarydata.com >>> host is the one i have trouble with* >>> >>> >>> Started POST "/api/reports" for 172.29.100.100 at 2015-06-27 02:12:21 >>> -0500 >>> 2015-06-27 02:12:21 [I] Processing by Api::V2::ReportsController#create >>> as JSON >>> 2015-06-27 02:12:21 [I] Parameters: {"report"=>"[FILTERED]", >>> "apiv"=>"v2"} >>> 2015-06-27 02:12:21 [I] processing report for >>> civwds005.lab.primarydata.com >>> 2015-06-27 02:12:21 [I] Imported report for >>> civwds005.lab.primarydata.com in 0.02 seconds >>> 2015-06-27 02:12:21 [I] Completed 201 Created in 34ms (Views: 2.3ms | >>> ActiveRecord: 0.0ms) >>> 2015-06-27 02:12:30 [W] Operation FAILED: ERF12-7740 >>> [ProxyAPI::ProxyException]: Unable to delete PuppetCA certificate for >>> civwdd202.lab.primarydata.com ([RestClient::RequestTimeout]: Request >>> Timeout) for proxy >>> https://foreman1-vw.lab.primarydata.com:8443/puppet/ca >>> 2015-06-27 02:12:30 [I] Completed 500 Internal Server Error in 60073ms >>> 2015-06-27 02:12:30 [F] >>> ArgumentError (There was no default layout for UnattendedController in >>> #>> @paths=[/usr/share/foreman/app/views, >>> /opt/rh/ruby193/root/usr/share/gems/gems/foreman_setup-2.1.1/app/views, >>> /opt/rh/ruby193/root/usr/share/gems/gems/foreman_bootdisk-5.0.0/app/views, >>> /opt/rh/ruby193/root/usr/share/gems/gems/apipie-rails-0.2.6/app/views]>): >>> app/controllers/application_controller.rb:299:in `generic_exception' >>> lib/middleware/catch_json_parse_errors.rb:9:in `call' >>> >>> >>> >>> >>> On Saturday, June 27, 2015 at 9:45:13 AM UTC+3, Avi Tal wrote: >>>> >>>> I haven't install foreman_hooks and never touch foreman_url settings. >>>> Dominic, provision review is working for this specific host. >>>> >>>> BTW, the same problem appears for several hosts. >>>> >>>> Guys, i must have a workaround ASAP. What can i do to debug it? Can you >>>> recommend a fast workaround? >>>> >>>> I'll appreciate every suggestion. >>>> >>>> Thanks >>>> >>>> >>>> On Friday, June 26, 2015 at 1:39:58 PM UTC+3, Lukas Zapletal wrote: >>>>> >>>>> Hey, >>>>> >>>>> > >>>>> "/api/unattended/provision?token=9fbfb26e-5c78-426d-8cad-8d685886c520" for >>>>> >>>>> this is wrong URL, remove "/api" part. Did you modify the provisioning >>>>> template or foreman_url setting? >>>>> >>>>> -- >>>>> Later, >>>>> Lukas #lzap Zapletal >>>>> >>>>

I've just witnessed the reason, I think :frowning:
As much as I understood, the right flow in build mode is that cert is
automatically removed from "/var/lib/puppet/ssl/ca/signed/" while being
added to "/etc/puppet/autosign.conf".

The problem is that the host was stuck in reboot for some reason (not
related to our problem) and now after long timeout, restart the host will
automatically enter to "dracut:/#"
The situation is that the hostname still exists in autosign.conf but not in
"/var/lib/puppet/ssl/ca/signed/".
Cancel rebuild through Foreman UI will remove the hostname from
autosign.conf but will never add the certificate back to
"/var/lib/puppet/ssl/ca/signed/" and there for i will never be able to
rebuild it again until running the "puppet cert --generate …" command.

WTF?

··· On Monday, July 6, 2015 at 2:51:26 PM UTC+3, Avi Tal wrote: > > Well, sadly it keeps reproducing randomly. > It can simply work perfect for existing hosts/new created hosts but then > one of the hosts suddenly will loose his certificate and provisioning will > fail again. > The only solution again is to run "puppet cert --generate ... " > I couldn't find any pattern for the reproduction of this failure. > > I'll appreciate any help guys. > > Here are some details: > *$ cat /etc/redhat-release* > *CentOS release 6.5 (Final)* > > *$ rpm -qa |grep foreman** > *foreman-compute-1.8.2-1.el6.noarch* > *foreman-ec2-1.8.2-1.el6.noarch* > *rubygem-foreman_api-0.1.11-2.el6.noarch* > *foreman-selinux-1.8.2-1.el6.noarch* > *foreman-postgresql-1.8.2-1.el6.noarch* > *ruby193-rubygem-foreman_setup-2.1.1-1.el6.noarch* > *rubygem-hammer_cli_foreman-0.2.0-1.el6.noarch* > *ruby193-rubygem-foreman_bootdisk-5.0.0-1.el6.noarch* > *foreman-debug-1.8.2-1.el6.noarch* > *foreman-installer-1.8.2-1.el6.noarch* > *foreman-proxy-1.8.2-2.el6.noarch* > *ruby193-rubygem-foreman_templates-1.5.0-1.el6.noarch* > *foreman-1.8.2-1.el6.noarch* > *foreman-vmware-1.8.2-1.el6.noarch* > *foreman-release-1.8.2-1.el6.noarch* > *ruby193-rubygem-foreman_reserve-0.1.8.3-2.el6.noarch* > > *$ rpm -qa |grep puppet** > *puppet-server-3.7.4-1.el6.noarch* > *puppetlabs-release-6-11.noarch* > *puppet-3.7.4-1.el6.noarch* > > > *$ cat /etc/sudoers.d/foreman-proxy* > *foreman-proxy ALL = NOPASSWD: /usr/bin/puppet cert ** > *foreman-proxy ALL = NOPASSWD: /usr/bin/puppet kick ** > *Defaults:foreman-proxy !requiretty* > > > *$ ll /var/lib/puppet/ssl/ca/* > *total 3480* > *-rw-r--r--. 1 puppet puppet 3283836 Jul 6 06:41 ca_crl.pem* > *-rw-r--r--. 1 puppet puppet 1915 Jun 18 2014 ca_crt.pem* > *-rw-r-----. 1 puppet puppet 3243 Jun 18 2014 ca_key.pem* > *-rw-r--r--. 1 puppet puppet 775 Jun 18 2014 ca_pub.pem* > *-rw-r--r--. 1 puppet puppet 244537 Jul 6 04:51 inventory.txt* > *drwxr-x---. 2 puppet puppet 4096 Jun 18 2014 private* > *drwxr-xr-x. 2 puppet puppet 4096 Jul 6 04:51 requests* > *-rw-r--r-- 1 puppet puppet 4 Jul 6 04:51 serial* > *drwxr-xr-x. 2 puppet puppet 4096 Jul 6 06:38 signed* > > > > *$ ll /etc/puppet/* > *total 64* > *-rw-r--r-- 1 root root 4178 Jun 27 02:32 auth.conf* > *-rw-rw-r--. 1 puppet puppet 255 Jul 6 06:38 autosign.conf* > *drwxr-xr-x. 6 root root 4096 Mar 27 16:11 environments* > *-rw-r--r-- 1 root root 1462 Jan 26 17:46 fileserver.conf* > *-rw-r--r-- 1 root root 422 Jun 27 02:59 foreman.yaml* > *drwxr-xr-x. 2 root root 4096 Mar 21 16:32 manifests* > *drwxr-xr-x. 2 root root 4096 Jan 26 17:48 modules* > *-rwxr-xr-x 1 root root 9850 Jun 27 02:54 node.rb* > *-rw-r--r-- 1 root root 2835 Jun 27 15:31 puppet.conf* > *drwxr-xr-x. 4 puppet root 4096 Mar 17 01:20 rack* > > > *$ cat /etc/foreman-proxy/settings.d/puppetca.yml* > *---* > *# Can be true, false, or http/https to enable just one of the protocols* > *:enabled: https* > *:ssldir: /var/lib/puppet/ssl* > *:puppetdir: /etc/puppet* > *:puppetca_use_sudo: true* > *:sudo_command: /usr/bin/sudo* > > > > On Monday, June 29, 2015 at 12:05:07 PM UTC+3, Avi Tal wrote: >> >> 10x, so i have reconfigured my sudoers and waiting for reproduce. >> Will update if it does >> >> On Monday, June 29, 2015 at 11:59:53 AM UTC+3, Dominic Cleal wrote: >>> >>> On 27/06/15 08:14, Avi Tal wrote: >>> > *Do you guys think that the following log is related?* >>> > *BTW, civwdd202.lab.primarydata.com host is the one i have trouble >>> with* >>> > >>> > >>> > Started POST "/api/reports" for 172.29.100.100 at 2015-06-27 02:12:21 >>> -0500 >>> > 2015-06-27 02:12:21 [I] Processing by >>> Api::V2::ReportsController#create >>> > as JSON >>> > 2015-06-27 02:12:21 [I] Parameters: {"report"=>"[FILTERED]", >>> "apiv"=>"v2"} >>> > 2015-06-27 02:12:21 [I] processing report for >>> civwds005.lab.primarydata.com >>> > 2015-06-27 02:12:21 [I] Imported report for >>> > civwds005.lab.primarydata.com in 0.02 seconds >>> > 2015-06-27 02:12:21 [I] Completed 201 Created in 34ms (Views: 2.3ms | >>> > ActiveRecord: 0.0ms) >>> > 2015-06-27 02:12:30 [W] Operation FAILED: ERF12-7740 >>> > [ProxyAPI::ProxyException]: Unable to delete PuppetCA certificate for >>> > civwdd202.lab.primarydata.com ([RestClient::RequestTimeout]: Request >>> > Timeout) for proxy >>> https://foreman1-vw.lab.primarydata.com:8443/puppet/ca >>> > 2015-06-27 02:12:30 [I] Completed 500 Internal Server Error in 60073ms >>> >>> This suggests something's wrong on your PuppetCA proxy. I'd suspect >>> that sudoers is misconfigured, perhaps it's prompting foreman-proxy for >>> a password. >>> >>> Double check that you have the correct sudoers entries (both) for the >>> version of Puppet installed: >>> http://theforeman.org/manuals/1.8/index.html#4.3.8PuppetCA >>> >>> > I have just witnessed a removal of another host from the certificates >>> list without adding back >>> > >>> > Started GET >>> "/unattended/provision?token=48a4c1f2-a997-4d04-a118-8c23e300d526" for >>> 172.29.52.126 at 2015-06-27 03:26:05 -0500 >>> > ...skipping... >>> > 2015-06-27 03:26:05 [I] Found civwdd203.lab.primarydata.com >>> > 2015-06-27 03:26:05 [I] Remove puppet certificate for >>> civwdd203.lab.primarydata.com >>> > 2015-06-27 03:26:13 [I] >>> > >>> > What is happening? What do i miss here? >>> >>> The host is being rebuilt, so its certificate has been removed. >>> Accessing /unattended/provision is it requesting a kickstart/preseed >>> script. If it's being rebuilt, the certificates will get replaced. >>> >>> If it's not being rebuilt, then I'd check what 172.29.52.126 is, and why >>> it's requesting a kickstart! >>> >>> -- >>> Dominic Cleal >>> Red Hat Engineering >>> >>

> I've just witnessed the reason, I think :frowning:
> As much as I understood, the right flow in build mode is that cert is
> automatically removed from "/var/lib/puppet/ssl/ca/signed/" while being
> added to "/etc/puppet/autosign.conf".
>
> The problem is that the host was stuck in reboot for some reason (not
> related to our problem) and now after long timeout, restart the host will
> automatically enter to "dracut:/#"
> The situation is that the hostname still exists in autosign.conf but not
> in "/var/lib/puppet/ssl/ca/signed/".
> Cancel rebuild through Foreman UI will remove the hostname from
> autosign.conf but will never add the certificate back to
> "/var/lib/puppet/ssl/ca/signed/" and there for i will never be able to
> rebuild it again until running the "puppet cert --generate …" command.
>
> WTF?
>

once the host is provisioning again, it requests a new certificate, so
clearly you did not get into that stage (as you got to the dracut prompt).
figure out why you went into prompt to start with.

Ohad

··· On Mon, Jul 6, 2015 at 4:54 PM, Avi Tal wrote:

On Monday, July 6, 2015 at 2:51:26 PM UTC+3, Avi Tal wrote:

Well, sadly it keeps reproducing randomly.
It can simply work perfect for existing hosts/new created hosts but then
one of the hosts suddenly will loose his certificate and provisioning will
fail again.
The only solution again is to run "puppet cert --generate … "
I couldn’t find any pattern for the reproduction of this failure.

I’ll appreciate any help guys.

Here are some details:
$ cat /etc/redhat-release
CentOS release 6.5 (Final)

$ rpm -qa |grep foreman*
foreman-compute-1.8.2-1.el6.noarch
foreman-ec2-1.8.2-1.el6.noarch
rubygem-foreman_api-0.1.11-2.el6.noarch
foreman-selinux-1.8.2-1.el6.noarch
foreman-postgresql-1.8.2-1.el6.noarch
ruby193-rubygem-foreman_setup-2.1.1-1.el6.noarch
rubygem-hammer_cli_foreman-0.2.0-1.el6.noarch
ruby193-rubygem-foreman_bootdisk-5.0.0-1.el6.noarch
foreman-debug-1.8.2-1.el6.noarch
foreman-installer-1.8.2-1.el6.noarch
foreman-proxy-1.8.2-2.el6.noarch
ruby193-rubygem-foreman_templates-1.5.0-1.el6.noarch
foreman-1.8.2-1.el6.noarch
foreman-vmware-1.8.2-1.el6.noarch
foreman-release-1.8.2-1.el6.noarch
ruby193-rubygem-foreman_reserve-0.1.8.3-2.el6.noarch

$ rpm -qa |grep puppet*
puppet-server-3.7.4-1.el6.noarch
puppetlabs-release-6-11.noarch
puppet-3.7.4-1.el6.noarch

$ cat /etc/sudoers.d/foreman-proxy
*foreman-proxy ALL = NOPASSWD: /usr/bin/puppet cert **
*foreman-proxy ALL = NOPASSWD: /usr/bin/puppet kick **
Defaults:foreman-proxy !requiretty

$ ll /var/lib/puppet/ssl/ca/
total 3480
-rw-r–r--. 1 puppet puppet 3283836 Jul 6 06:41 ca_crl.pem
-rw-r–r--. 1 puppet puppet 1915 Jun 18 2014 ca_crt.pem
-rw-r-----. 1 puppet puppet 3243 Jun 18 2014 ca_key.pem
-rw-r–r--. 1 puppet puppet 775 Jun 18 2014 ca_pub.pem
-rw-r–r--. 1 puppet puppet 244537 Jul 6 04:51 inventory.txt
drwxr-x—. 2 puppet puppet 4096 Jun 18 2014 private
drwxr-xr-x. 2 puppet puppet 4096 Jul 6 04:51 requests
-rw-r–r-- 1 puppet puppet 4 Jul 6 04:51 serial
drwxr-xr-x. 2 puppet puppet 4096 Jul 6 06:38 signed

$ ll /etc/puppet/
total 64
-rw-r–r-- 1 root root 4178 Jun 27 02:32 auth.conf
-rw-rw-r–. 1 puppet puppet 255 Jul 6 06:38 autosign.conf
drwxr-xr-x. 6 root root 4096 Mar 27 16:11 environments
-rw-r–r-- 1 root root 1462 Jan 26 17:46 fileserver.conf
-rw-r–r-- 1 root root 422 Jun 27 02:59 foreman.yaml
drwxr-xr-x. 2 root root 4096 Mar 21 16:32 manifests
drwxr-xr-x. 2 root root 4096 Jan 26 17:48 modules
-rwxr-xr-x 1 root root 9850 Jun 27 02:54 node.rb
-rw-r–r-- 1 root root 2835 Jun 27 15:31 puppet.conf
drwxr-xr-x. 4 puppet root 4096 Mar 17 01:20 rack

$ cat /etc/foreman-proxy/settings.d/puppetca.yml

# Can be true, false, or http/https to enable just one of the protocols
:enabled: https
:ssldir: /var/lib/puppet/ssl
:puppetdir: /etc/puppet
:puppetca_use_sudo: true
:sudo_command: /usr/bin/sudo

On Monday, June 29, 2015 at 12:05:07 PM UTC+3, Avi Tal wrote:

10x, so i have reconfigured my sudoers and waiting for reproduce.
Will update if it does

On Monday, June 29, 2015 at 11:59:53 AM UTC+3, Dominic Cleal wrote:

On 27/06/15 08:14, Avi Tal wrote:

Do you guys think that the following log is related?
BTW, civwdd202.lab.primarydata.com host is the one i have trouble
with

Started POST “/api/reports” for 172.29.100.100 at 2015-06-27 02:12:21
-0500
2015-06-27 02:12:21 [I] Processing by
Api::V2::ReportsController#create
as JSON
2015-06-27 02:12:21 [I] Parameters: {“report”=>"[FILTERED]",
“apiv”=>“v2”}
2015-06-27 02:12:21 [I] processing report for
civwds005.lab.primarydata.com
2015-06-27 02:12:21 [I] Imported report for
civwds005.lab.primarydata.com in 0.02 seconds
2015-06-27 02:12:21 [I] Completed 201 Created in 34ms (Views: 2.3ms |
ActiveRecord: 0.0ms)
2015-06-27 02:12:30 [W] Operation FAILED: ERF12-7740
[ProxyAPI::ProxyException]: Unable to delete PuppetCA certificate for
civwdd202.lab.primarydata.com ([RestClient::RequestTimeout]: Request
Timeout) for proxy
https://foreman1-vw.lab.primarydata.com:8443/puppet/ca
2015-06-27 02:12:30 [I] Completed 500 Internal Server Error in
60073ms

This suggests something’s wrong on your PuppetCA proxy. I’d suspect
that sudoers is misconfigured, perhaps it’s prompting foreman-proxy for
a password.

Double check that you have the correct sudoers entries (both) for the
version of Puppet installed:
Foreman :: Manual

I have just witnessed a removal of another host from the certificates
list without adding back

Started GET
"/unattended/provision?token=48a4c1f2-a997-4d04-a118-8c23e300d526" for
172.29.52.126 at 2015-06-27 03:26:05 -0500
…skipping…
2015-06-27 03:26:05 [I] Found civwdd203.lab.primarydata.com
2015-06-27 03:26:05 [I] Remove puppet certificate for
civwdd203.lab.primarydata.com
2015-06-27 03:26:13 [I]

What is happening? What do i miss here?

The host is being rebuilt, so its certificate has been removed.
Accessing /unattended/provision is it requesting a kickstart/preseed
script. If it’s being rebuilt, the certificates will get replaced.

If it’s not being rebuilt, then I’d check what 172.29.52.126 is, and
why
it’s requesting a kickstart!


Dominic Cleal
Red Hat Engineering


You received this message because you are subscribed to the Google Groups
"Foreman users" group.
To unsubscribe from this group and stop receiving emails from it, send an
email to foreman-users+unsubscribe@googlegroups.com.
To post to this group, send email to foreman-users@googlegroups.com.
Visit this group at http://groups.google.com/group/foreman-users.
For more options, visit https://groups.google.com/d/optout.

Indeed, and the reason i am entering this prompt is due to "*Failed to
fetch kickstart from http://…" *meaning i haven't had certificate.
This is exactly what i am trying to understand.

What drives me crazy is that it usually works and then it suddenly appears
in one hosts provisioning.

··· On Monday, July 6, 2015 at 5:00:49 PM UTC+3, ohad wrote: > > > > On Mon, Jul 6, 2015 at 4:54 PM, Avi Tal <avi...@gmail.com > > wrote: > >> I've just witnessed the reason, I think :( >> As much as I understood, the right flow in build mode is that cert is >> automatically removed from "/var/lib/puppet/ssl/ca/signed/" while being >> added to "/etc/puppet/autosign.conf". >> >> The problem is that the host was stuck in reboot for some reason (not >> related to our problem) and now after long timeout, restart the host will >> automatically enter to "*dracut:/#*" >> The situation is that the hostname still exists in autosign.conf but not >> in "/var/lib/puppet/ssl/ca/signed/". >> Cancel rebuild through Foreman UI will remove the hostname from >> autosign.conf but will never add the certificate back to >> "/var/lib/puppet/ssl/ca/signed/" and there for i will never be able to >> rebuild it again until running the "puppet cert --generate ..." command. >> >> WTF? >> > > once the host is provisioning again, it requests a new certificate, so > clearly you did not get into that stage (as you got to the dracut prompt). > figure out why you went into prompt to start with. > > Ohad > >> >> >> >> >> >> On Monday, July 6, 2015 at 2:51:26 PM UTC+3, Avi Tal wrote: >>> >>> Well, sadly it keeps reproducing randomly. >>> It can simply work perfect for existing hosts/new created hosts but then >>> one of the hosts suddenly will loose his certificate and provisioning will >>> fail again. >>> The only solution again is to run "puppet cert --generate ... " >>> I couldn't find any pattern for the reproduction of this failure. >>> >>> I'll appreciate any help guys. >>> >>> Here are some details: >>> *$ cat /etc/redhat-release* >>> *CentOS release 6.5 (Final)* >>> >>> *$ rpm -qa |grep foreman** >>> *foreman-compute-1.8.2-1.el6.noarch* >>> *foreman-ec2-1.8.2-1.el6.noarch* >>> *rubygem-foreman_api-0.1.11-2.el6.noarch* >>> *foreman-selinux-1.8.2-1.el6.noarch* >>> *foreman-postgresql-1.8.2-1.el6.noarch* >>> *ruby193-rubygem-foreman_setup-2.1.1-1.el6.noarch* >>> *rubygem-hammer_cli_foreman-0.2.0-1.el6.noarch* >>> *ruby193-rubygem-foreman_bootdisk-5.0.0-1.el6.noarch* >>> *foreman-debug-1.8.2-1.el6.noarch* >>> *foreman-installer-1.8.2-1.el6.noarch* >>> *foreman-proxy-1.8.2-2.el6.noarch* >>> *ruby193-rubygem-foreman_templates-1.5.0-1.el6.noarch* >>> *foreman-1.8.2-1.el6.noarch* >>> *foreman-vmware-1.8.2-1.el6.noarch* >>> *foreman-release-1.8.2-1.el6.noarch* >>> *ruby193-rubygem-foreman_reserve-0.1.8.3-2.el6.noarch* >>> >>> *$ rpm -qa |grep puppet** >>> *puppet-server-3.7.4-1.el6.noarch* >>> *puppetlabs-release-6-11.noarch* >>> *puppet-3.7.4-1.el6.noarch* >>> >>> >>> *$ cat /etc/sudoers.d/foreman-proxy* >>> *foreman-proxy ALL = NOPASSWD: /usr/bin/puppet cert ** >>> *foreman-proxy ALL = NOPASSWD: /usr/bin/puppet kick ** >>> *Defaults:foreman-proxy !requiretty* >>> >>> >>> *$ ll /var/lib/puppet/ssl/ca/* >>> *total 3480* >>> *-rw-r--r--. 1 puppet puppet 3283836 Jul 6 06:41 ca_crl.pem* >>> *-rw-r--r--. 1 puppet puppet 1915 Jun 18 2014 ca_crt.pem* >>> *-rw-r-----. 1 puppet puppet 3243 Jun 18 2014 ca_key.pem* >>> *-rw-r--r--. 1 puppet puppet 775 Jun 18 2014 ca_pub.pem* >>> *-rw-r--r--. 1 puppet puppet 244537 Jul 6 04:51 inventory.txt* >>> *drwxr-x---. 2 puppet puppet 4096 Jun 18 2014 private* >>> *drwxr-xr-x. 2 puppet puppet 4096 Jul 6 04:51 requests* >>> *-rw-r--r-- 1 puppet puppet 4 Jul 6 04:51 serial* >>> *drwxr-xr-x. 2 puppet puppet 4096 Jul 6 06:38 signed* >>> >>> >>> >>> *$ ll /etc/puppet/* >>> *total 64* >>> *-rw-r--r-- 1 root root 4178 Jun 27 02:32 auth.conf* >>> *-rw-rw-r--. 1 puppet puppet 255 Jul 6 06:38 autosign.conf* >>> *drwxr-xr-x. 6 root root 4096 Mar 27 16:11 environments* >>> *-rw-r--r-- 1 root root 1462 Jan 26 17:46 fileserver.conf* >>> *-rw-r--r-- 1 root root 422 Jun 27 02:59 foreman.yaml* >>> *drwxr-xr-x. 2 root root 4096 Mar 21 16:32 manifests* >>> *drwxr-xr-x. 2 root root 4096 Jan 26 17:48 modules* >>> *-rwxr-xr-x 1 root root 9850 Jun 27 02:54 node.rb* >>> *-rw-r--r-- 1 root root 2835 Jun 27 15:31 puppet.conf* >>> *drwxr-xr-x. 4 puppet root 4096 Mar 17 01:20 rack* >>> >>> >>> *$ cat /etc/foreman-proxy/settings.d/puppetca.yml* >>> *---* >>> *# Can be true, false, or http/https to enable just one of the protocols* >>> *:enabled: https* >>> *:ssldir: /var/lib/puppet/ssl* >>> *:puppetdir: /etc/puppet* >>> *:puppetca_use_sudo: true* >>> *:sudo_command: /usr/bin/sudo* >>> >>> >>> >>> On Monday, June 29, 2015 at 12:05:07 PM UTC+3, Avi Tal wrote: >>>> >>>> 10x, so i have reconfigured my sudoers and waiting for reproduce. >>>> Will update if it does >>>> >>>> On Monday, June 29, 2015 at 11:59:53 AM UTC+3, Dominic Cleal wrote: >>>>> >>>>> On 27/06/15 08:14, Avi Tal wrote: >>>>> > *Do you guys think that the following log is related?* >>>>> > *BTW, civwdd202.lab.primarydata.com host is the one i have trouble >>>>> with* >>>>> > >>>>> > >>>>> > Started POST "/api/reports" for 172.29.100.100 at 2015-06-27 >>>>> 02:12:21 -0500 >>>>> > 2015-06-27 02:12:21 [I] Processing by >>>>> Api::V2::ReportsController#create >>>>> > as JSON >>>>> > 2015-06-27 02:12:21 [I] Parameters: {"report"=>"[FILTERED]", >>>>> "apiv"=>"v2"} >>>>> > 2015-06-27 02:12:21 [I] processing report for >>>>> civwds005.lab.primarydata.com >>>>> > 2015-06-27 02:12:21 [I] Imported report for >>>>> > civwds005.lab.primarydata.com in 0.02 seconds >>>>> > 2015-06-27 02:12:21 [I] Completed 201 Created in 34ms (Views: 2.3ms >>>>> > >>>>> > ActiveRecord: 0.0ms) >>>>> > 2015-06-27 02:12:30 [W] Operation FAILED: ERF12-7740 >>>>> > [ProxyAPI::ProxyException]: Unable to delete PuppetCA certificate >>>>> for >>>>> > civwdd202.lab.primarydata.com ([RestClient::RequestTimeout]: >>>>> Request >>>>> > Timeout) for proxy >>>>> https://foreman1-vw.lab.primarydata.com:8443/puppet/ca >>>>> > 2015-06-27 02:12:30 [I] Completed 500 Internal Server Error in >>>>> 60073ms >>>>> >>>>> This suggests something's wrong on your PuppetCA proxy. I'd suspect >>>>> that sudoers is misconfigured, perhaps it's prompting foreman-proxy >>>>> for >>>>> a password. >>>>> >>>>> Double check that you have the correct sudoers entries (both) for the >>>>> version of Puppet installed: >>>>> http://theforeman.org/manuals/1.8/index.html#4.3.8PuppetCA >>>>> >>>>> > I have just witnessed a removal of another host from the >>>>> certificates list without adding back >>>>> > >>>>> > Started GET >>>>> "/unattended/provision?token=48a4c1f2-a997-4d04-a118-8c23e300d526" for >>>>> 172.29.52.126 at 2015-06-27 03:26:05 -0500 >>>>> > ...skipping... >>>>> > 2015-06-27 03:26:05 [I] Found civwdd203.lab.primarydata.com >>>>> > 2015-06-27 03:26:05 [I] Remove puppet certificate for >>>>> civwdd203.lab.primarydata.com >>>>> > 2015-06-27 03:26:13 [I] >>>>> > >>>>> > What is happening? What do i miss here? >>>>> >>>>> The host is being rebuilt, so its certificate has been removed. >>>>> Accessing /unattended/provision is it requesting a kickstart/preseed >>>>> script. If it's being rebuilt, the certificates will get replaced. >>>>> >>>>> If it's not being rebuilt, then I'd check what 172.29.52.126 is, and >>>>> why >>>>> it's requesting a kickstart! >>>>> >>>>> -- >>>>> Dominic Cleal >>>>> Red Hat Engineering >>>>> >>>> -- >> You received this message because you are subscribed to the Google Groups >> "Foreman users" group. >> To unsubscribe from this group and stop receiving emails from it, send an >> email to foreman-user...@googlegroups.com . >> To post to this group, send email to forema...@googlegroups.com >> . >> Visit this group at http://groups.google.com/group/foreman-users. >> For more options, visit https://groups.google.com/d/optout. >> > >

going over the logs will probably provide the missing information…

··· On Mon, Jul 6, 2015 at 5:18 PM, Avi Tal wrote:

What drives me crazy is that it usually works and then it suddenly appears
in one hosts provisioning.

Any idea?

Started GET
"/api/hostgroups?search=title+%3D+datastore+and+location_id+%3D+1" for
192.168.136.32 at 2015-07-07 11:29:58 -0500
2015-07-07 11:29:58 [I] Processing by Api::V2::HostgroupsController#index
as JSON
2015-07-07 11:29:58 [I] Parameters: {"search"=>"title = datastore and
location_id = 1", "apiv"=>"v2", "hostgroup"=>{}}
2015-07-07 11:30:02 [I] Authorized user prov(Provision Automation)
2015-07-07 11:30:02 [I] Rendered api/v2/hostgroups/index.json.rabl within
api/v2/layouts/index_layout (32.3ms)
2015-07-07 11:30:02 [I] Completed 200 OK in 4419ms (Views: 33.5ms |
ActiveRecord: 20.2ms)
2015-07-07 11:30:02 [W] Operation FAILED: ERF12-7740
[ProxyAPI::ProxyException]: Unable to delete PuppetCA certificate for
civwdd200.lab.primarydata.com ([RestClient::RequestTimeout]: Request
Timeout) for proxy https://foreman1-vw.lab.primarydata.com:8443/puppet/ca

2015-07-07 11:30:02 [I] Completed 500 Internal Server Error in 60070ms
2015-07-07 11:30:02 [F]
ArgumentError (There was no default layout for UnattendedController in
#<ActionView::PathSet:0x0000000a403e18
@paths=[/usr/share/foreman/app/views,
/opt/rh/ruby193/root/usr/share/gems/gems/foreman_setup-2.1.1/app/views,
/opt/rh/ruby193/root/usr/share/gems/gems/foreman_bootdisk-5.0.0/app/views,
/opt/rh/ruby193/root/usr/share/gems/gems/apipie-rails-0.2.6/app/views]>):
app/controllers/application_controller.rb:299:in generic_exception&#39; lib/middleware/catch_json_parse_errors.rb:9:incall'

··· On Monday, July 6, 2015 at 5:24:15 PM UTC+3, ohad wrote: > > > On Mon, Jul 6, 2015 at 5:18 PM, Avi Tal <avi...@gmail.com > > wrote: > >> What drives me crazy is that it usually works and then it suddenly >> appears in one hosts provisioning. > > > > going over the logs will probably provide the missing information... >

Do you think that due to the fact that it happens randomly and not
persistently, it is somehow related to the number of concurrent connections
i am running when requesting multiple provision operations?

Can i somehow raise the number of workers of smart proxy?

Thanks

··· On Tuesday, July 7, 2015 at 7:31:34 PM UTC+3, Avi Tal wrote: > > Any idea? > > > Started GET > "/api/hostgroups?search=title+%3D+datastore+and+location_id+%3D+1" for > 192.168.136.32 at 2015-07-07 11:29:58 -0500 > 2015-07-07 11:29:58 [I] Processing by Api::V2::HostgroupsController#index > as JSON > 2015-07-07 11:29:58 [I] Parameters: {"search"=>"title = datastore and > location_id = 1", "apiv"=>"v2", "hostgroup"=>{}} > 2015-07-07 11:30:02 [I] Authorized user prov(Provision Automation) > 2015-07-07 11:30:02 [I] Rendered api/v2/hostgroups/index.json.rabl > within api/v2/layouts/index_layout (32.3ms) > 2015-07-07 11:30:02 [I] Completed 200 OK in 4419ms (Views: 33.5ms | > ActiveRecord: 20.2ms) > *2015-07-07 11:30:02 [W] Operation FAILED: ERF12-7740 > [ProxyAPI::ProxyException]: Unable to delete PuppetCA certificate for > civwdd200.lab.primarydata.com > ([RestClient::RequestTimeout]: Request Timeout) for proxy > https://foreman1-vw.lab.primarydata.com:8443/puppet/ca > * > 2015-07-07 11:30:02 [I] Completed 500 Internal Server Error in 60070ms > 2015-07-07 11:30:02 [F] > ArgumentError (There was no default layout for UnattendedController in > # @paths=[/usr/share/foreman/app/views, > /opt/rh/ruby193/root/usr/share/gems/gems/foreman_setup-2.1.1/app/views, > /opt/rh/ruby193/root/usr/share/gems/gems/foreman_bootdisk-5.0.0/app/views, > /opt/rh/ruby193/root/usr/share/gems/gems/apipie-rails-0.2.6/app/views]>): > app/controllers/application_controller.rb:299:in `generic_exception' > lib/middleware/catch_json_parse_errors.rb:9:in `call' > > > > On Monday, July 6, 2015 at 5:24:15 PM UTC+3, ohad wrote: >> >> >> On Mon, Jul 6, 2015 at 5:18 PM, Avi Tal wrote: >> >>> What drives me crazy is that it usually works and then it suddenly >>> appears in one hosts provisioning. >> >> >> >> going over the logs will probably provide the missing information... >> >

or even raise the timeout?

··· On Wednesday, July 8, 2015 at 2:30:15 PM UTC+3, Avi Tal wrote: > > Do you think that due to the fact that it happens randomly and not > persistently, it is somehow related to the number of concurrent connections > i am running when requesting multiple provision operations? > > Can i somehow raise the number of workers of smart proxy? > > Thanks > > On Tuesday, July 7, 2015 at 7:31:34 PM UTC+3, Avi Tal wrote: >> >> Any idea? >> >> >> Started GET >> "/api/hostgroups?search=title+%3D+datastore+and+location_id+%3D+1" for >> 192.168.136.32 at 2015-07-07 11:29:58 -0500 >> 2015-07-07 11:29:58 [I] Processing by Api::V2::HostgroupsController#index >> as JSON >> 2015-07-07 11:29:58 [I] Parameters: {"search"=>"title = datastore and >> location_id = 1", "apiv"=>"v2", "hostgroup"=>{}} >> 2015-07-07 11:30:02 [I] Authorized user prov(Provision Automation) >> 2015-07-07 11:30:02 [I] Rendered api/v2/hostgroups/index.json.rabl >> within api/v2/layouts/index_layout (32.3ms) >> 2015-07-07 11:30:02 [I] Completed 200 OK in 4419ms (Views: 33.5ms | >> ActiveRecord: 20.2ms) >> *2015-07-07 11:30:02 [W] Operation FAILED: ERF12-7740 >> [ProxyAPI::ProxyException]: Unable to delete PuppetCA certificate for >> civwdd200.lab.primarydata.com >> ([RestClient::RequestTimeout]: Request Timeout) for proxy >> https://foreman1-vw.lab.primarydata.com:8443/puppet/ca >> * >> 2015-07-07 11:30:02 [I] Completed 500 Internal Server Error in 60070ms >> 2015-07-07 11:30:02 [F] >> ArgumentError (There was no default layout for UnattendedController in >> #> @paths=[/usr/share/foreman/app/views, >> /opt/rh/ruby193/root/usr/share/gems/gems/foreman_setup-2.1.1/app/views, >> /opt/rh/ruby193/root/usr/share/gems/gems/foreman_bootdisk-5.0.0/app/views, >> /opt/rh/ruby193/root/usr/share/gems/gems/apipie-rails-0.2.6/app/views]>): >> app/controllers/application_controller.rb:299:in `generic_exception' >> lib/middleware/catch_json_parse_errors.rb:9:in `call' >> >> >> >> On Monday, July 6, 2015 at 5:24:15 PM UTC+3, ohad wrote: >>> >>> >>> On Mon, Jul 6, 2015 at 5:18 PM, Avi Tal wrote: >>> >>>> What drives me crazy is that it usually works and then it suddenly >>>> appears in one hosts provisioning. >>> >>> >>> >>> going over the logs will probably provide the missing information... >>> >>