RFC: Add puma as the default smart-proxy server


#1

Hello,

we are using an ancient webrick version for smart-proxy, and due to Ruby 2.0 and the long-term plans to SCL smart-proxy, we are stuck on it. Webrick does not appear to close inactive connections, which makes it an easy DoS target, and its performance is also not great. More on this topic at:

https://projects.theforeman.org/issues/24634

It looks like Smart Proxy boots fine on Puma, which is a small, fast, scalable, up-to-date Rack-compatible HTTP(S) server with zero Ruby dependencies and only a few basic C dependencies (like OpenSSL), so packaging should be easy.

I propose adding support for Puma in the Smart Proxy launcher while keeping webrick among its dependencies, and making Puma opt-in for a few releases to leave a good fallback option. Although Puma offers both threaded and forking architectures, we need to start with just a single worker, because the current Ruby code is thread-safe but does not take into account that there can be multiple process instances (e.g. inotify threads etc.). Support for multiple workers is out of scope for this effort.
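To make the opt-in idea concrete, the switch in the launcher could look roughly like the sketch below. This is purely hypothetical: the `:http_server_type` setting name, the module layout and the handler options are my assumptions, not the actual smart-proxy code.

```ruby
# Hypothetical sketch of an opt-in server switch in the launcher.
# The setting name :http_server_type and this module layout are made up;
# the real smart-proxy launcher is organized differently.
module LauncherSketch
  SUPPORTED = %w[webrick puma].freeze

  # Default to webrick so existing deployments are untouched unless
  # they explicitly opt in to puma.
  def self.server_type(settings)
    type = (settings[:http_server_type] || 'webrick').to_s
    raise ArgumentError, "unsupported http_server_type: #{type}" unless SUPPORTED.include?(type)
    type
  end

  def self.run(app, settings)
    case server_type(settings)
    when 'puma'
      require 'rack/handler/puma'
      # Single worker only (no cluster mode); concurrency comes from
      # Puma's thread pool.
      Rack::Handler::Puma.run(app, Host: settings[:bind_host], Port: settings[:http_port])
    else
      require 'rack/handler/webrick'
      Rack::Handler::WEBrick.run(app, Host: settings[:bind_host], Port: settings[:http_port])
    end
  end
end
```

With a switch like this, setting the option to `puma` enables the new server, and leaving it out keeps today's webrick behavior, which is exactly the fallback property we want during the opt-in releases.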

Things to test:

  • Start, stop, restart
  • Logging, log file rotation
  • Signals
  • Log buffer
  • Inotify leases monitor
  • Plugin authors (announcement, ask for testing)

I’ve tested that Puma closes inactive connections correctly, which helps mitigate slow DoS attacks.

A quick and dirty test via the Apache “ab” tool, 5000 requests with a concurrency of 8. Webrick:

[lzap@box ~]$ ab -n 5000 -c 8 http://127.0.0.1:8448/features
Document Path:          /features
Document Length:        128 bytes

Concurrency Level:      8
Time taken for tests:   11.567 seconds
Complete requests:      5000
Failed requests:        0
Total transferred:      1640000 bytes
HTML transferred:       640000 bytes
Requests per second:    432.26 [#/sec] (mean)
Time per request:       18.507 [ms] (mean)
Time per request:       2.313 [ms] (mean, across all concurrent requests)
Transfer rate:          138.46 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     2   18   8.5     17      69
Waiting:        1   13   7.2     12      53
Total:          2   18   8.5     17      69

Percentage of the requests served within a certain time (ms)
  50%     17
  66%     21
  75%     23
  80%     25
  90%     30
  95%     34
  98%     40
  99%     44
 100%     69 (longest request)

Puma:

[lzap@box ~]$ ab -n 5000 -c 8 http://127.0.0.1:9292/features
Document Path:          /features
Document Length:        128 bytes

Concurrency Level:      8
Time taken for tests:   3.713 seconds
Complete requests:      5000
Failed requests:        0
Total transferred:      1165000 bytes
HTML transferred:       640000 bytes
Requests per second:    1346.57 [#/sec] (mean)
Time per request:       5.941 [ms] (mean)
Time per request:       0.743 [ms] (mean, across all concurrent requests)
Transfer rate:          306.40 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:     1    6   3.4      5      30
Waiting:        0    3   1.1      2      10
Total:          1    6   3.4      5      30

Percentage of the requests served within a certain time (ms)
  50%      5
  66%      6
  75%      7
  80%      8
  90%     10
  95%     12
  98%     15
  99%     18
 100%     30 (longest request)

Puma appears to be 3.1x faster at handling the /features API call than Webrick. This is a naive test that aims to measure the performance of the web server stack; in real-world requests most of the time is spent waiting for (network) IO, but it gives a picture of the efficiency of the middleware.


#2

Puma supports cluster mode, do you think we could consider using that as well?
Maybe an additional approach for scale would be to start each proxy service as its own Puma process with a simple LB in front (e.g. haproxy or even apache)?


#3

As I stated in my OP, I intentionally took cluster mode off the table for now, as I smell some issues with it (multiple processes calling nsupdate/omshell should be fine, but I can’t tell for all plugins). But yes, long term this will prepare smart proxy to become a full-blown multithreaded, multiple-worker server with thread pools.

One important thing I forgot:

  • Test on Windows (Puma itself should work, but our modules and launcher code need verification)

#4

No, this would require changes like https://github.com/theforeman/foreman/pull/4561 first for all the reasons we explored during those discussions.


#5

Would this change require SCLing the smart proxy, or can we switch to Puma without making that packaging change? We already package Puma as an RPM within the SCL, so modifying it for system Ruby as well shouldn’t be too difficult if it supports Ruby 2.0. If this requires the SCL, then we should discuss that and a timeline to get both done.

BTW, I am all for this, I just jumped right into the logistics out of excitement :slight_smile:


#6

I’m all for this as well. Let’s start without cluster support so the task doesn’t get too big.

Can we extend the smart-proxy module registration API with a statement so that plugins can declare that they support cluster mode, and activate it if all enabled plugins have support? That way a smart-proxy with just the httpboot and templates modules enabled could make use of the performance boost cluster mode will probably bring.


#7

Let me separate my answer into two stages, please:

  1. Webrick issues
  2. Puma and such moves.

Webrick

Webrick is the default web server that comes with Ruby. It is multi-threaded and has some cool things to offer (it uses Rack as its main layer, it is pluggable and pure Ruby, it ships as a gem these days, and it’s even easy to build a simple HTTP server by mounting a servlet over it, to name a few).

It uses Ruby’s TCP (HTTP) stack, and 100% of its code is pure Ruby as well.

Many of its features are configurable, and we can also add our own servlet and server types.

We can configure it to fit our needs, and also change settings if needed to close connections instead of keeping them alive (keep-alive, usually done with TCP keep-alive, is on by default).

Today Webrick is a gem rather than part of Ruby’s stdlib, ever since Ruby moved to that delivery model; this way Webrick can follow a faster release cycle than Ruby itself, like the rest of the libraries that now arrive as gems.

Puma vs Webrick

Puma is a great web server. I really think we should use it if it solves issues for us.
The thing is, I’m not sure how much of the effort is Webrick itself vs. configuring Webrick to our needs.

Furthermore, comparing the old Ruby 2.0.0 Webrick to the current version is a bit of an issue in my opinion;
a lot of bugs were fixed, while the basics remain the same.

Many of the same issues that we would need to solve with Webrick are handled by default in Puma, and we can also write many kinds of Ruby scripts to handle configuration if needed, because that’s what the Puma configuration is.

Bottom line

While I do think we should configure webrick to our needs, we also need to look at the cost-effectiveness of moving to Puma and which option takes more effort for us.

I’m not sure that moving to another server just because ours is not configured to our needs is the right way, but Puma is faster and has a smaller footprint than Webrick, so we need to weigh the cost-effectiveness before jumping in and just moving.


#8

I’d also say, SCL first, tackle all the rest potentially later.

Moving to puma shouldn’t be a big thing then either, or do we have some special Rack magic in place that’s expected not to work with Puma?


#9

I wouldn’t mind SCL-izing the proxy. We can also revisit the whole x_core situation in various gems after that.


#10

Puma runs on Ruby 2.0+, therefore no extra packaging is needed; that’s actually what I considered the primary goal of this RFC. However, I haven’t tested Puma on Ruby 2.0 from RHEL yet. It should work according to the docs.


#11

An awesome idea that is reasonable to develop; however, we must not rush into enabling most plugins - we need to test them carefully. A simple code review alone is usually not enough.

But this goes hand-in-hand with the containerization effort: once we have a set of modules which work in cluster mode, we can think about breaking them into microservices. In the smart-proxy context this makes sense, as the modules are very isolated with a well-defined interface and API. This is out of scope for the initial phase, though.


#12

The idea behind this RFC is to solve the slow DoS attack without actually doing all the SCL packaging work.

By the way, Puma is 3.5x faster in that naive /features GET test than the current latest and greatest Webrick; I was not comparing to a three-year-old version, of course.

That’s not the case: the Webrick version we use does not work as expected. Upgrading it means SCLing the whole smart proxy, and that’s not an easy job. The discussion is about SCL vs. a move to Puma.

Not much, both are Rack servers. Note that smart-proxy has its own launcher code; we will not be using Puma’s configuration parser, the goal is to continue using our settings mechanism.


#13

Two things. First, what is “SCL” (sorry :wink: )?
Second, the current issue is that there is by default a keep-alive on the response that keeps the connection open, and the DoS is happening because of it. The good news is that it is configurable.

But do we care about such speeds? To my knowledge we are not a web service that needs to handle many connections per second, so does it matter, or is it just nice to have?

I know that; that’s why I was talking about the effort.
Have a look at places such as launcher.rb#L151: it uses the actual HTTPServer class of Webrick to mount Rack.

I’m not sure that you can do that with Puma, at least not as simply as that - hence my talk about effort.

Please note that I’m not against the move; I just feel the conversation should have started a bit differently, and that’s why I added all the extra information.


#14

Hehe, sure. SCL stands for Software Collections, a way of packaging multiple versions of software on Fedora and RHEL. For example, you can install PostgreSQL 9.x on RHEL 7 (which had the 8.x version at its release date years ago). We were using some SCLs, like Ruby on Rails and Ruby itself; now we only use Ruby, and we build our own SCL.

The thing is - libraries and programs do not “see each other”: an SCL rubygem cannot be used outside of the SCL context, and vice versa. Thus, in order to upgrade the Proxy to Ruby 2.4 (or whatever our current stable production Ruby is), you need to move everything into the SCL.

Are you sure about this? Can you show me a configuration change on a production system (Ruby 2.0) that does that, so I can verify it using telnet? I am afraid keep-alive and idle session timeout are two different things; we suffer from the latter. You can have a server which accepts keep-alive clients but still is not vulnerable to slow DoS attacks because it closes inactive or very slow connections - that is in fact what most modern, up-to-date web servers like apache2, nginx and similar do. Simply try to connect to an apache2 instance and keep entering headers very slowly for 30 seconds; you will be disconnected (this is also known as a slow headers attack).
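A slow-headers check like the telnet experiment described above can also be scripted. The sketch below is illustrative only (the method name and its defaults are made up): it trickles one header line per `delay` seconds and reports whether the server hung up on us before the request finished.

```ruby
require 'socket'

# Illustrative slow-headers probe (names and defaults are made up).
# Sends one header line every `delay` seconds; a server with an idle /
# slow-client timeout (apache2, nginx, puma) should drop the connection,
# while a vulnerable server keeps the socket open the whole time.
def slow_header_probe(host, port, lines: 10, delay: 1)
  socket = TCPSocket.new(host, port)
  socket.write("GET /features HTTP/1.1\r\nHost: #{host}\r\n")
  lines.times do |i|
    sleep delay
    socket.write("X-Slow-#{i}: padding\r\n")
  end
  socket.write("\r\n")
  # nil means the server closed the connection without answering us.
  !socket.read(1).nil?
rescue Errno::EPIPE, Errno::ECONNRESET, IOError
  false # the server dropped us mid-request: not vulnerable
ensure
  socket&.close
end
```

`slow_header_probe('127.0.0.1', 8448)` returning `false` means the server dropped the slow client; `true` means it patiently waited for the whole dribbled-out request, which is the vulnerable behavior.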

Nice to have - I thought this was clear from my OP. The goal is to mitigate the slow DoS attack while avoiding SCLing the proxy; all the rest is bonus.

Sure, but if you scroll down, we only do one thing with this object - call the start method in a separate thread, and that’s all. I assume there is a similar class in Puma, but I am just guessing, to be honest.

That is fine! You provide fair arguments. If this turns out to be too much of a technical challenge, or we find a nice way of fixing this, let’s stay with webrick - sure. It’s not like I am keen on dropping webrick - doing migrations for nothing is the worst thing to do!

Let me wrap up with:

Let’s only consider changing the stack to Puma if it is reasonably easy, fulfills all requirements (incl. Windows support) and is stable in production.


#15

It should be something like this (see https://github.com/puma/puma#binding-tcp--sockets for the different bind options):

::Rack::Handler::Puma.run(app, binds: ['tcp://localhost:9292'])
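Building on that, the proxy’s existing SSL settings could be mapped onto Puma’s bind URI syntax (an ssl:// bind carries the key and cert as query parameters, per the README linked above). A hypothetical helper - the setting names like :ssl_private_key are made up, not the real smart-proxy setting names:

```ruby
# Hypothetical mapping from smart-proxy style settings to a Puma bind
# string; the setting names (:ssl_private_key etc.) are assumptions.
def puma_bind(settings)
  if settings[:ssl_private_key] && settings[:ssl_certificate]
    "ssl://#{settings[:bind_host]}:#{settings[:https_port]}" \
      "?key=#{settings[:ssl_private_key]}&cert=#{settings[:ssl_certificate]}"
  else
    "tcp://#{settings[:bind_host]}:#{settings[:http_port]}"
  end
end

# e.g. ::Rack::Handler::Puma.run(app, binds: [puma_bind(settings)])
```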

#16

Even though I don’t see the connection-stacking issue in my environment, slow-header tests definitely show a service impact for SmartProxy that may explain some instability of the latter in our production (all of a sudden I see that no responses are coming back from foreman-proxy even though the process is running - I think I posted a question about that in a support forum at some point).

Also, a 3x performance increase is a definite and big “yes” factor, so I’d love to run Puma under our production load and see how things go. Are there any (even preliminary) docs describing the process of migrating SmartProxy to Puma?

Thanks!


#17

You are reading the document right now :slight_smile:


#18

Well, yes, but I meant a doc that may have actual steps, like those you had to perform to test smart-proxy under Puma vs Webrick, so that people unfamiliar with Puma like myself could try it before the deployment procedure is fully automated (by the installer or whatever). Again, if there is such a doc already.

Thanks!


#19

The code is located here: https://github.com/ruby/webrick/blob/master/lib/webrick/httpresponse.rb#L108

I’ll try to figure out how to change it; I have no experience tweaking webrick directly.


#20

Then turn it off and try with telnet; I am pretty sure this does not affect slow DoS attacks at all.