Ideas for http(s) proxy on smart-proxy

Hey,

there were some ideas about incorporating an http(s) proxy into our
smart-proxy node. This would be useful when a datacenter has no direct
connection to the internet or to the network with the managing Foreman
instance. I investigated this a bit.

In the text below, when I refer to "proxy" I mean an http(s) proxy; when I
write "smart proxy" I mean, you guessed it, the foreman proxy.

So the idea would be to install and configure an http proxy, let's say
Squid, which is a small but powerful enough http(s) proxy. My understanding
of the whole idea is really an http proxy for a managed LAN (not a reverse
proxy for a managed LAN). Drawn as something like:

MANAGED HOST --> PROXY --> FOREMAN

Three main areas would need some changes:

A) Unattended installations. Anaconda, the Red Hat and Fedora installer,
does support http proxy setting in kickstart and I assume the same for
other Linux distributions.

The user would need to select, for a host, an additional smart proxy with
the "http proxy" feature turned on. This would be used in the provision template.
Therefore minimum Foreman UI impact.
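
To make (A) a bit more concrete, here is a minimal sketch of the template side. It assumes the kickstart `url` command with its `--proxy=` option; the helper name `kickstart_url_line` and the example hostnames are made up for illustration and are not existing Foreman code.

```ruby
require 'erb'

# Hypothetical helper: renders the kickstart "url" line.
# http_proxy stands in for whatever value Foreman would expose to the
# template; nil means no http proxy was selected for the host.
def kickstart_url_line(media_url, http_proxy = nil)
  template = ERB.new('url --url=<%= media_url %><% if http_proxy %> --proxy=<%= http_proxy %><% end %>')
  template.result(binding)
end

puts kickstart_url_line('http://mirror.example.com/fedora/20/os',
                        'http://proxy.example.com:3128')
puts kickstart_url_line('http://mirror.example.com/fedora/20/os')
```

When no proxy is set the line is rendered unchanged, so existing templates would keep working for hosts without the feature.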

B) Discovery. This one is slightly more difficult. Since we are booting
new bare metal, the server has no context. But since we are usually
booting from a TFTP server (from a smart proxy) and the configuration
is deployed by foreman, we should be able to give the context by
providing a kernel command line option specifying the proxy to use. And
we can do this in the template - we just need to create a
variable/function that returns proxy hostname which the template is
being rendered on.

Additional change needs to be done in the discover_host.rb script - to
make use of the kernel option setting proxy for HTTP ruby client.
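
A rough sketch of what that change could look like on the discovery image side. The kernel option name `foreman.proxy` is invented for this example (the real option name is still to be decided), and the hostnames are placeholders. In the image the command line would be read from /proc/cmdline; here it is a plain string so the example runs anywhere.

```ruby
require 'net/http'
require 'uri'

# Extract a proxy URL from a kernel command line string.
# "foreman.proxy" is a made-up option name used only for this sketch.
def proxy_from_cmdline(cmdline, key = 'foreman.proxy')
  token = cmdline.split.find { |t| t.start_with?("#{key}=") }
  return nil unless token
  URI.parse(token.split('=', 2).last)
end

# Build an HTTP client for the Foreman URL, optionally going through the proxy.
# Passing nil as the proxy address disables proxying for this client.
def http_client(foreman_url, proxy = nil)
  target = URI.parse(foreman_url)
  Net::HTTP.new(target.host, target.port, proxy&.host, proxy&.port)
end

cmdline = 'initrd=initrd.img foreman.url=http://foreman.example.com ' \
          'foreman.proxy=http://proxy.example.com:3128'
proxy = proxy_from_cmdline(cmdline)
client = http_client('http://foreman.example.com', proxy)
puts client.proxy_address
```

No request is sent here; the client object is only constructed, which is enough to show where the kernel option would plug in.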

C) Subscription-manager. Assuming that rhsm does support HTTP CONNECT
method for TCP tunneling of HTTPS protocol (Squid does support that), we
can configure rhsm using Foreman templates in a similar way. The only thing
that needs to be done is a variable/method returning the http proxy that
has been set for a host in the rhsm template snippet.
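
As an illustration, the snippet could render the proxy settings that rhsm reads from the [server] section of rhsm.conf (`proxy_hostname` and `proxy_port`). The method name `rhsm_proxy_config` and the proxy URL are assumptions for this sketch only; how the value reaches the template is exactly the variable/method that still needs to be written.

```ruby
require 'erb'
require 'uri'

# Fragment of rhsm.conf carrying the proxy settings.
RHSM_PROXY_SNIPPET = <<~TEMPLATE
  [server]
  proxy_hostname = <%= proxy.host %>
  proxy_port = <%= proxy.port %>
TEMPLATE

# Hypothetical helper: proxy_url stands in for the value the new
# template variable/method would return for the host.
def rhsm_proxy_config(proxy_url)
  proxy = URI.parse(proxy_url)
  ERB.new(RHSM_PROXY_SNIPPET).result(binding)
end

puts rhsm_proxy_config('http://proxy.example.com:3128')
```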

I assume that the proxy should be only used for subscription actions,
the content is synced on the smart proxy anyway so there is no added
value of proxying the content. On the other hand, content proxying is
possible and might be useful for large setups (one proxy with pulp,
other proxies - sublans - only proxy). But since the content is
available mainly under https, it cannot be cached anyway (there are some
nasty tricks but let's skip that for now).

Are there any other services that could take advantage of proxy
installed on a smart proxy?

I can only think of VNC/Spice access if we move Compute Resource
management to smart proxies (currently Foreman connects directly).

Opinions?

Later,

Lukas “lzap” Zapletal
irc: lzap #theforeman

> A) Unattended installations. Anaconda, the Red Hat and Fedora installer,
> does support http proxy setting in kickstart and I assume the same for
> other Linux distributions.

I think assuming that could cause us problems. Relying on it could
shut us out of some OSes if they don't support it (or there are bugs
in their implementation of it).

> User would need to specify additional smart proxy on a host with "http
> proxy" feature turned on. This would be used in the provision template.
> Therefore minimum Foreman UI impact.

Bear in mind we already have pull requests open for handling templates
on the smart-proxy, which correctly handle re-writing the URLs in the
template, so the client talks directly to the smart-proxy, and the
smart-proxy separately talks to Foreman. In that scenario, we no
longer need an HTTP proxy (and thus don't need to assume the OS
installer supports it). There is still the open question of how to
have the smart-proxy listen on http and https, but I'm sure we can
solve that.

Of course, I might be biased since I wrote the template-proxying PR in
the first place :)

> B) Discovery. This one is slightly more difficult. Since we are booting
> new bare metal, the server has no context. But since we are usually
> booting from a TFTP server (from a smart proxy) and the configuration
> is deployed by foreman, we should be able to give the context by
> providing a kernel command line option specifying the proxy to use. And
> we can do this in the template - we just need to create a
> variable/function that returns proxy hostname which the template is
> being rendered on.
>
> Additional change needs to be done in the discover_host.rb script - to
> make use of the kernel option setting proxy for HTTP ruby client.

Something like this was implemented in the PR for (A) (see [1]). It
already handles PXE templates (necessary for (A) to work), so that's a
zero-effort win if we solve (A) first.

I'm also thinking we can do this entirely in the smart-proxy, if we
wish, since all the code (foreman, smart-proxy, and discovery image)
is under our control. Foreman is already contacting the smart-proxy in
order to issue the reboot command (i.e it's not a direct foreman->host
connection) so we could extend this. It's not really any different to
how we handle BMC communication today.

I think an HTTP proxy might be useful for the Discovered Host ->
Foreman communication (e.g when registering itself), but again, that
could easily be a RESTful route on the smart-proxy - even easier
if/when we finally add plugin support to the smart-proxy.

> C) Subscription-manager. Assuming that rhsm does support HTTP CONNECT
> method for TCP tunneling of HTTPS protocol (Squid does support that), we
> can configure rhsm using Foreman templates similar way. The only thing
> that needs to be done is a variable/method returning the http proxy that
> has been set for a host in the rhsm template snippet.
>
> I assume that the proxy should be only used for subscription actions,
> the content is synced on the smart proxy anyway so there is no added
> value of proxying the content. On the other hand, content proxying is
> possible and might be useful for large setups (one proxy with pulp,
> other proxies - sublans - only proxy). But since the content is
> available mainly under https, it cannot be cached anyway (there are some
> nasty tricks but let's skip that for now).

This is the only case I can see where using an HTTP proxy could save us a
lot of work (because handling https is a pain). For A) we have a
solution 95% coded, and for B) we'd have to do the same amount of work
(roughly) for using an HTTP proxy or extending the smart-proxy.

> Are there any other services that could take advantage of proxy
> installed on a smart proxy?
>
> I can only think of VNC/Spice access if we move Compute Resource
> management to smart proxies (currently foreman connects directly).

I'd consider that fairly low-priority.

Overall I'm not against using Squid (or another HTTP proxy) but my concerns are:

  • Is there really only one (or maybe 1.5) use case for it?
  • Keeping the smart-proxy lightweight (mine runs on a Raspberry Pi,
    and I know it can be run on an OpenWRT router with some effort) - this
    would need to be optional. If it's a lot of work to correctly make it
    optional, perhaps we should just handle it in the smart-proxy?

Personally I'd like to see us finish plugin support for the
smart-proxy (Sam? :P). That would make it a lot easier to play with
this stuff and see what works.

Greg

[1] https://github.com/theforeman/foreman/pull/751/files#diff-6632a0077c474a44423343eb597f9d72R24

··· On 13 February 2014 14:05, Lukas Zapletal wrote:

> there were some ideas about incorporating http(s) proxy into our
> smart-proxy node. This would be useful if datacenters have no direct
> connection to internet or a network with managing foreman instance.
> I investigated this a bit.
>
> In the text below, when I refer to "proxy" I mean http(s) proxy, when I
> write "smart proxy" I mean, you guessed it, foreman proxy.
>
> So the idea would be to install and configure a http proxy, let's say
> Squid which is small but powerful enough http(s) proxy. My understanding
> of the whole idea is really http proxy for a managed LAN (not reverse
> proxy for a managed LAN). Drawn as something like:
>
> MANAGED HOST --> PROXY --> FOREMAN
>
> Three main areas would need some changes:
>
> A) Unattended installations. Anaconda, the Red Hat and Fedora installer,
> does support http proxy setting in kickstart and I assume the same for
> other Linux distributions.
>
> User would need to specify additional smart proxy on a host with "http
> proxy" feature turned on. This would be used in the provision template.
> Therefore minimum Foreman UI impact.

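First of all, let's take a look at the available work to proxy:

* https://github.com/theforeman/smart-proxy/pull/31 [manage squid]
* https://github.com/theforeman/smart-proxy/pull/100 [built in proxy]
* Rewrite rule in apache
  http://www.slideshare.net/inovex/deploying-foreman-in-enterprise-environments slide 13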

My personal preference is the Apache solution since we already run
apache for puppet on our smartproxies, but I get that maybe not everyone
does this so maybe we should support multiple options? A provision_proxy
feature with multiple implementations comes to mind. Then
foreman_url('provision') should take that into account and the client
never needs to know anything about the proxy.
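
To illustrate the idea (this is a toy stand-in, not Foreman's actual `foreman_url` helper): the URL handed to the client is built from the proxy's base URL when the host has one, and from Foreman's own URL otherwise. The `Host` struct, its `template_proxy_url` attribute and the example URLs are all invented for this sketch.

```ruby
require 'uri'

# Toy host record; :template_proxy_url is a hypothetical attribute that
# would come from the smart proxy with the provision_proxy feature.
Host = Struct.new(:name, :template_proxy_url)

FOREMAN_BASE = 'https://foreman.example.com'

# Simplified stand-in for foreman_url: picks the base URL per host so the
# client never needs to know whether a proxy sits in between.
def foreman_url(action, host)
  base = URI.parse(host.template_proxy_url || FOREMAN_BASE)
  base.path = "/unattended/#{action}"
  base.to_s
end

direct  = Host.new('web01.example.com', nil)
proxied = Host.new('web02.example.com', 'http://proxy.example.com:8000')

puts foreman_url('provision', direct)
puts foreman_url('provision', proxied)
```

Whichever implementation sits behind the feature (Squid, Apache rewrite, built-in), the template itself would not change.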

> B) Discovery. This one is slightly more difficult. Since we are booting
> new bare metal, the server has no context. But since we are usually
> booting from a TFTP server (from a smart proxy) and the configuration
> is deployed by foreman, we should be able to give the context by
> providing a kernel command line option specifying the proxy to use. And
> we can do this in the template - we just need to create a
> variable/function that returns proxy hostname which the template is
> being rendered on.
>
> Additional change needs to be done in the discover_host.rb script - to
> make use of the kernel option setting proxy for HTTP ruby client.

Is this foreman_url('provision') again?

> C) Subscription-manager. Assuming that rhsm does support HTTP CONNECT
> method for TCP tunneling of HTTPS protocol (Squid does support that), we
> can configure rhsm using Foreman templates similar way. The only thing
> that needs to be done is a variable/method returning the http proxy that
> has been set for a host in the rhsm template snippet.
>
> I assume that the proxy should be only used for subscription actions,
> the content is synced on the smart proxy anyway so there is no added
> value of proxying the content. On the other hand, content proxying is
> possible and might be useful for large setups (one proxy with pulp,
> other proxies - sublans - only proxy). But since the content is
> available mainly under https, it cannot be cached anyway (there are some
> nasty tricks but let's skip that for now).
>
> Are there any other services that could take advantage of proxy
> installed on a smart proxy?

Not sure if subscription-manager covers puppet ca, but that could
benefit from it as well. There's a nice apache example in
http://docs.puppetlabs.com/guides/scaling_multiple_masters.html#option-2-proxy-certificate-traffic.
A puppet_ca_proxy feature comes to mind.

··· On Thu, Feb 13, 2014 at 03:05:39PM +0100, Lukas Zapletal wrote:

> I can only think of VNC/Spice access if we move Compute Resource
> management to smart proxies (currently foreman connects directly).

Hey Greg,

> > A) Unattended installations. Anaconda, the Red Hat and Fedora installer,
> > does support http proxy setting in kickstart and I assume the same for
> > other Linux distributions.
>
> I think assuming that could cause us problems. Relying on it could
> shut us out of some OSes if they don't support it (or there are bugs
> in their implementation of it).

I understand, proxy would be totally optional.

> > User would need to specify additional smart proxy on a host with "http
> > proxy" feature turned on. This would be used in the provision template.
> > Therefore minimum Foreman UI impact.
>
> Bear in mind we already have pull requests open for handling templates
> on the smart-proxy, which correctly handles re-writing the URLs in the
> template, so the client is talking direct to the smart-proxy, and the
> smart-proxy separately talks to Foreman. In that scenario, we no
> longer need an HTTP proxy (and thus don't need to assume the OS
> installer supports it). There is still the open question of how to
> have the smart-proxy listen on http and https, but I'm sure we can
> solve that.

I know, but the concept of our smart proxy is a simple process in one
thread. Once we start putting things like proxying stuff there, we would
need to change from Webrick to something else (which is not a bad thing
generally).

I like the idea of rendering templates on the proxy side, on the other
hand if the user forbids internet access (or access to foreman) there
must be some reason and proxy is needed anyway. At least for rhsm. Then
this could be win-win.

> > B) Discovery. This one is slightly more difficult. Since we are booting
> > new bare metal, the server has no context. But since we are usually
> > booting from a TFTP server (from a smart proxy) and the configuration
> > is deployed by foreman, we should be able to give the context by
> > providing a kernel command line option specifying the proxy to use. And
> > we can do this in the template - we just need to create a
> > variable/function that returns proxy hostname which the template is
> > being rendered on.
> >
> > Additional change needs to be done in the discover_host.rb script - to
> > make use of the kernel option setting proxy for HTTP ruby client.
>
> Something like this was implemented in the PR for (A) (see [1]). It
> already handles PXE templates (necessary for (A) to work), so that's a
> zero-effort win if we solve (A) first.
>
> I'm also thinking we can do this entirely in the smart-proxy, if we
> wish, since all the code (foreman, smart-proxy, and discovery image)
> is under our control. Foreman is already contacting the smart-proxy in
> order to issue the reboot command (i.e it's not a direct foreman->host
> connection) so we could extend this. It's not really any different to
> how we handle BMC communication today.

Yeah, this is definitely an option and I do not lean towards any
solution yet. I am collecting input. I'd also like to hear the opinion of
the Katello guys because they have experience with Candlepin proxying.
Any bugs on that page?

> I think an HTTP proxy might be useful for the Discovered Host ->
> Foreman communication (e.g when registering itself), but again, that
> could easily be a RESTful route on the smart-proxy - even easier
> if/when we finally add plugin support to the smart-proxy.

Yeah, again, if our smart-proxy scales well, then this is definitely a
way to go. Keep in mind the proxy currently runs on all Ruby platforms and
versions, including Windows. We want to keep that, so this might tie
our hands a bit when it comes to scalability, threading, moving away
from Webrick etc.

> > C) Subscription-manager. Assuming that rhsm does support HTTP CONNECT
> > method for TCP tunneling of HTTPS protocol (Squid does support that), we
> > can configure rhsm using Foreman templates similar way. The only thing
> > that needs to be done is a variable/method returning the http proxy that
> > has been set for a host in the rhsm template snippet.
> >
> > I assume that the proxy should be only used for subscription actions,
> > the content is synced on the smart proxy anyway so there is no added
> > value of proxying the content. On the other hand, content proxying is
> > possible and might be useful for large setups (one proxy with pulp,
> > other proxies - sublans - only proxy). But since the content is
> > available mainly under https, it cannot be cached anyway (there are some
> > nasty tricks but let's skip that for now).
>
> This is only case I can see where using an HTTP proxy could save us a
> lot of work (because handling https is a pain). For A) we have a
> solution 95% coded, and for B) we'd have to do the same amount of work
> (roughly) for using an HTTP proxy or extending the smart-proxy.

On the other hand, the code is already in Katello, we might move it from
there to smart proxy. Why would we proxy via two endpoints when we can
send directly to Candlepin? Then no Squid would be required at all.

> Overall I'm not against using Squid (or another HTTP proxy) but my concerns are:
>
> * Is there really only one (or maybe 1.5) use case for it?

That's why I ask here. Please community users, chip in. Would you like
this feature? What are your current pain points?

> * Keeping the smart-proxy lightweight (mine runs on a Raspberry Pi,
> and I know it can be run on an OpenWRT router with some effort) - this
> would need to be optional. If it's a lot of work to correctly make it
> optional, perhaps we should just handle it in the smart-proxy?

This would be optional, turned off by default I guess.

> Personally I'd like to see us finish plugin support for the
> smart-proxy (Sam? :P). That would make it a lot easier to play with
> this stuff and see what works.

I am fine with any way, but what I like about the squid approach is we
just need to deploy it (new puppet module in the installer) and then to
make several tiny changes here and there in Foreman. No changes on the
smart proxy side. That looks attractive too.

Thanks for your valid points.

Later,

Lukas “lzap” Zapletal
irc: lzap #theforeman

I think either is fine. Given the thundering herd problem as we get to
higher densities… I think something like squid or apache would be better.

– bk

··· On 02/13/2014 10:17 AM, Ewoud Kohl van Wijngaarden wrote:

> On Thu, Feb 13, 2014 at 03:05:39PM +0100, Lukas Zapletal wrote:
>> there were some ideas about incorporating http(s) proxy into our
>> smart-proxy node. This would be useful if datacenters have no direct
>> connection to internet or a network with managing foreman instance.
>> I investigated this a bit.
>>
>> In the text below, when I refer to "proxy" I mean http(s) proxy, when I
>> write "smart proxy" I mean, you guessed it, foreman proxy.
>>
>> So the idea would be to install and configure a http proxy, let's say
>> Squid which is small but powerful enough http(s) proxy. My understanding
>> of the whole idea is really http proxy for a managed LAN (not reverse
>> proxy for a managed LAN). Drawn as something like:
>>
>> MANAGED HOST --> PROXY --> FOREMAN
>>
>> Three main areas would need some changes:
>>
>> A) Unattended installations. Anaconda, the Red Hat and Fedora installer,
>> does support http proxy setting in kickstart and I assume the same for
>> other Linux distributions.
>>
>> User would need to specify additional smart proxy on a host with "http
>> proxy" feature turned on. This would be used in the provision template.
>> Therefore minimum Foreman UI impact.
>
> First of all, let's take a look at the available work to proxy:
>
> * https://github.com/theforeman/smart-proxy/pull/31 [manage squid]
> * https://github.com/theforeman/smart-proxy/pull/100 [built in proxy]
> * Rewrite rule in apache
>   http://www.slideshare.net/inovex/deploying-foreman-in-enterprise-environments slide 13
>
> My personal preference is the Apache solution since we already run
> apache for puppet on our smartproxies, but I get that maybe not everyone
> does this so maybe we should support multiple options? A provision_proxy
> feature with multiple implementations comes to mind. Then
> foreman_url('provision') should take that into account and the client
> never needs to know anything about the proxy.

> Hey Greg,
>
>>> A) Unattended installations. Anaconda, the Red Hat and Fedora installer,
>>> does support http proxy setting in kickstart and I assume the same for
>>> other Linux distributions.
>>
>> I think assuming that could cause us problems. Relying on it could
>> shut us out of some OSes if they don't support it (or there are bugs
>> in their implementation of it).
>
> I understand, proxy would be totally optional.
>
>>> User would need to specify additional smart proxy on a host with "http
>>> proxy" feature turned on. This would be used in the provision template.
>>> Therefore minimum Foreman UI impact.
>>
>> Bear in mind we already have pull requests open for handling templates
>> on the smart-proxy, which correctly handles re-writing the URLs in the
>> template, so the client is talking direct to the smart-proxy, and the
>> smart-proxy separately talks to Foreman. In that scenario, we no
>> longer need an HTTP proxy (and thus don't need to assume the OS
>> installer supports it). There is still the open question of how to
>> have the smart-proxy listen on http and https, but I'm sure we can
>> solve that.
>
> I know, but the concept of our smart proxy is a simple process in one
> thread. Once we start putting things like proxying stuff there, we would
> need to change from Webrick to something else (which is not a bad thing
> generally).

You probably know by now that I'm a huge fan of moving away from a
single-threaded proxy, but I don't think we specifically need to do it
to fetch templates through the proxy. The operation happens rarely
enough and is fast enough to render that it wouldn't significantly slow
down operations.

Another thing Greg and I have thrown around in the past is the potential
to pre-render the output of templates and store them on the proxy. Then
they could be invalidated when the template which was used to render
that output changes. The only major problem there is that we'd likely
have to introduce a new feature into the proxy called 'Templates' or
something like that so we know which hosts need to have new output
written to the proxy's FS. Sorry if this is a little tangential to the
issues at hand, but I figured I would publicly document it since most of
the conversations about it have happened IRL.
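
For the record, a minimal sketch of the invalidation idea, under some loud assumptions: the cache lives in memory rather than on the proxy's FS, the class name `RenderedTemplateCache` is invented, and "the template changed" is detected by comparing a SHA-256 digest of the template source.

```ruby
require 'digest'

# Keeps one rendered output per host and re-renders only when the
# template source it was rendered from has changed.
class RenderedTemplateCache
  def initialize
    @entries = {}
  end

  # Returns the cached output when the template source is unchanged;
  # otherwise calls the block to re-render and stores the new result.
  def fetch(host, template_source)
    digest = Digest::SHA256.hexdigest(template_source)
    entry = @entries[host]
    return entry[:output] if entry && entry[:digest] == digest

    output = yield
    @entries[host] = { digest: digest, output: output }
    output
  end
end

cache = RenderedTemplateCache.new
template = 'install <%= host %>'
puts cache.fetch('web01', template) { 'install web01' }
# Second call with the same template source does not invoke the block.
puts cache.fetch('web01', template) { raise 'should not re-render' }
```

A real version would also need to invalidate on host parameter changes, which this sketch ignores.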

-S

··· On 02/13/2014 10:26 AM, Lukas Zapletal wrote:

> I like the idea of rendering templates on the proxy side, on the other
> hand if the user forbids internet access (or access to foreman) there
> must be some reason and proxy is needed anyway. At least for rhsm. Then
> this could be win-win.
>
>>> B) Discovery. This one is slightly more difficult. Since we are booting
>>> new bare metal, the server has no context. But since we are usually
>>> booting from a TFTP server (from a smart proxy) and the configuration
>>> is deployed by foreman, we should be able to give the context by
>>> providing a kernel command line option specifying the proxy to use. And
>>> we can do this in the template - we just need to create a
>>> variable/function that returns proxy hostname which the template is
>>> being rendered on.
>>>
>>> Additional change needs to be done in the discover_host.rb script - to
>>> make use of the kernel option setting proxy for HTTP ruby client.
>>
>> Something like this was implemented in the PR for (A) (see [1]). It
>> already handles PXE templates (necessary for (A) to work), so that's a
>> zero-effort win if we solve (A) first.
>>
>> I'm also thinking we can do this entirely in the smart-proxy, if we
>> wish, since all the code (foreman, smart-proxy, and discovery image)
>> is under our control. Foreman is already contacting the smart-proxy in
>> order to issue the reboot command (i.e. it's not a direct foreman->host
>> connection) so we could extend this. It's not really any different to
>> how we handle BMC communication today.
>
> Yeah, this is definitely an option and I do not lean towards any
> solution yet. Collecting some input. I'd also like to hear opinion from
> Katello guys because they have experience with Candlepin proxying.
> Any bugs on that page?
>
>> I think an HTTP proxy might be useful for the Discovered Host ->
>> Foreman communication (e.g. when registering itself), but again, that
>> could easily be a RESTful route on the smart-proxy - even easier
>> if/when we finally add plugin support to the smart-proxy.
>
> Yeah, again if our smart-proxy scales well, then this is definitely a
> way. Keep in mind proxy currently does run on all ruby platforms and
> versions, including Windows. We want to keep that, so this might tie
> our hands a bit when speaking about scalability, threading, moving away
> from webrick etc.
>
>>> C) Subscription-manager. Assuming that rhsm does support HTTP CONNECT
>>> method for TCP tunneling of HTTPS protocol (Squid does support that), we
>>> can configure rhsm using Foreman templates similar way. The only thing
>>> that needs to be done is a variable/method returning the http proxy that
>>> has been set for a host in the rhsm template snippet.
>>>
>>> I assume that the proxy should be only used for subscription actions,
>>> the content is synced on the smart proxy anyway so there is no added
>>> value of proxying the content. On the other hand, content proxying is
>>> possible and might be useful for large setups (one proxy with pulp,
>>> other proxies - sublans - only proxy). But since the content is
>>> available mainly under https, it cannot be cached anyway (there are some
>>> nasty tricks but let's skip that for now).
>>
>> This is the only case I can see where using an HTTP proxy could save us a
>> lot of work (because handling https is a pain). For A) we have a
>> solution 95% coded, and for B) we'd have to do the same amount of work
>> (roughly) for using an HTTP proxy or extending the smart-proxy.
>
> On the other hand, the code is already in Katello, we might move it from
> there to smart proxy. Why would we proxy via two endpoints when we can
> send directly to Candlepin? Then no Squid would be required at all.
>
>> Overall I'm not against using Squid (or another HTTP proxy) but my concerns are:
>>
>> * Is there really only one (or maybe 1.5) use case for it?
>
> That's why I ask here. Please community users, chip in. Would you like
> this feature? What are your current pain points?
>
>> * Keeping the smart-proxy lightweight (mine runs on a Raspberry Pi,
>> and I know it can be run on an OpenWRT router with some effort) - this
>> would need to be optional. If it's a lot of work to correctly make it
>> optional, perhaps we should just handle it in the smart-proxy?
>
> This would be optional, turned off by default I guess.
>
>> Personally I'd like to see us finish plugin support for the
>> smart-proxy (Sam? :P). That would make it a lot easier to play with
>> this stuff and see what works.
>
> I am fine with any way, but what I like about the squid approach is we
> just need to deploy it (new puppet module in the installer) and then to
> make several tiny changes here and there in Foreman. No changes on the
> smart proxy side. That looks attractive too.
>
> Thanks for your valid points.

What are the expectations of simultaneous connections? Lukas' examples
above basically all boil down to template rendering, which is only going to
happen once per build. Any given piece of hardware has an effectively
random boot time (in seconds) so even if you simultaneously reboot a
whole rack, the proxy would not receive 42 simultaneous requests
for templates. The template retrieval should be pretty quick, and
fairly infrequent, so I expect it to scale fairly well, even in the
current architecture.
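
A back-of-the-envelope way to check this, as a sketch only: given request start times and a fixed time to serve one template, count the largest number of requests that overlap. The 60 second spread and the 0.2 second render time below are guesses for illustration, not measurements.

```ruby
# Peak number of overlapping requests, given request start times (seconds)
# and a fixed service time per request.
def peak_concurrency(start_times, service_time)
  events = start_times.flat_map { |t| [[t, 1], [t + service_time, -1]] }
  # Sort by time; at equal times, finished requests (-1) are counted
  # before newly arriving ones (+1).
  events.sort_by! { |time, delta| [time, delta] }

  current = 0
  peak = 0
  events.each do |_time, delta|
    current += delta
    peak = current if current > peak
  end
  peak
end

# A rack of 42 hosts whose template requests are spread over an assumed
# 60 seconds, each assumed to take 0.2 seconds to serve.
rng = Random.new(42)
starts = Array.new(42) { rng.rand(60.0) }
puts peak_concurrency(starts, 0.2)
```

If the real spread turned out to be much narrower, or rendering much slower, the same function would show how quickly the overlap grows.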

Greg

··· On 13 February 2014 15:22, Bryan Kearney wrote:

> I think either is fine. Given the thundering herd problem as we get to
> higher densities... I think something like squid or apache would be better.

That is all true. The problem with the current proxy architecture is that
the single thread is not dedicated to unattended requests; it handles all
the requests. Foreman creates hosts through it and will send mcollective
stuff. These are the major concerns.

Again, nothing we cannot solve. I guess there are pure Ruby web servers
that handle at least some load (on newer Ruby versions I guess).

··· On Thu, Feb 13, 2014 at 03:58:02PM +0000, Greg Sutcliffe wrote:

> What are the expectations of simultaneous connections? Lukas' examples
> above basically all boil down to template rendering, which is only going to
> happen once per build. Any given piece of hardware has an effectively
> random boot time (in seconds) so even if you simultaneously reboot a
> whole rack, then the proxy would not receive 42 simultaneous requests
> for templates. The template retrieval should be pretty quick, and
> fairly infrequent, so I expect it to scale fairly well, even in the
> current architecture.


Later,

Lukas “lzap” Zapletal
irc: lzap #theforeman