Highly Available Smart Proxies (part 2)

sean797 · February 13, 2018, 11:34am

And the load-balancer approach explicitly doesn’t try to solve this:

Sure, but the other approach does allow it.

the fact that you need to define the group twice actually suggests that it’s not the right approach to solve this case.

Maybe, but there are so many different scenarios with groups and multihoming Smart Proxies that we cannot possibly make every one of them neat and tidy. I want to initially enable all of them (or as many as we can) but only support a subset. We don’t know how these features will evolve over time or how people might use them, so I would like to explicitly allow for experimentation with these features.

STI approach

I see SmartProxyGroup/Route/Loadbalancer/Whatever as a completely different object from a Smart Proxy, with a different purpose and different attributes. The only attributes they would have in common are an alternative Hostname/Route and a Name, right? If we use STI, I think we may regret it as the two objects evolve over time. They are different objects, unlike Managed Hosts & Discovered Hosts: one is a way of grouping Smart Proxies and the other is the Smart Proxies themselves. Very different objects with different purposes.

Can you explain this a little more? If we had a SmartProxyGroup/Route object that Hosts, Subnets & Domains are associated to, I don’t understand how that would make your “brain boil”. You define a set of SmartProxyGroups/Routes up front, then associate them when creating Hosts, Subnets or Domains.

Anyway, I explained one initial concern, but like @lzap I’m interested in hearing a fuller explanation and will reserve judgment until then

Gwmngilfen · February 13, 2018, 10:29pm

Is there value in doing a Deep Dive video chat on this, as a sort of panel discussion? It might help to figure out where the common ground is, at least

James_Shewey · February 14, 2018, 5:47pm

I’m not sure which load balancer solution you are planning to use or want to support, but I can tell you that F5 load balancers have something called iApp templates, which can be written to solve this particular problem of having a large number of virtual servers to manage (DevCentral registration required, but free). F5 uses them to resolve this issue for things like load balancing Exchange and Lync / Skype for Business, which also need a large number of virtual servers and need iRules for managing issues with statefulness.

The lab license for a VM of their load balancer is about $100. You can also use iRules LX to do any crazy thing you can manage to code in node.js.

Marek_Hulan · February 14, 2018, 6:36pm

Just to avoid a few rounds of replies, @iNecas, I recommend you read my proposal at https://github.com/theforeman/foreman/pull/4561#issuecomment-348196075 (see the “Load balancing” header in that comment) and then the reactions to it. The idea was the same, but I didn’t want to introduce STI; a flag on the smart proxy seemed enough. Now I think everyone agrees on the need for a load-balancer object, and the only question is whether we need it as a new entity or whether it can be a special case of a proxy. The biggest benefit I see in reusing the proxy object is that we don’t have to change any API (v2 and the internal objects API). But after the recent discussion I don’t have an extremely strong opinion on it, as it might be a bit more confusing for users unless we somehow clearly say “this smart proxy is in fact a load balancer or alias” in the UI.

iNecas · February 16, 2018, 8:24am

With this, for me it is the clearest solution (taking all of the other proposals into consideration) that fits the rest of the system.

sean797 · February 19, 2018, 3:07pm

@iNecas Can you explain the STI approach further please? (please see @lzap’s and my latest replies). As I explained above, I don’t see how using Single Table Inheritance is of any real benefit considering:

They would share only the Name attribute, and maybe an alternative hostname attribute.
As a user, if the UI/API parameter says “puppet_proxy”, I expect to put a Smart Proxy in, not a “Smart Proxy Group” type object. I think a change like that requires API changes so users know that something has changed; attempting to trick users into not noticing could lead to frustrated users, especially when they are debugging and don’t realize something has changed.

iNecas · February 22, 2018, 7:47pm

First of all, sorry for the late reply; my inbox kind of exploded earlier this week with the number of unread e-mails, including notifications from this thread.

With the virtual proxies, it actually looks to me like we could

It depends on which angle you look at it from… From the host’s perspective, they are very similar and serve a very similar purpose. Using STI doesn’t mean that everything needs to be in this one model, and we can use composition there (where the load balancer proxy would internally use a proxy group, or whatever we would need to provide the best model).

I’m not a big fan of STI either, and there are places where it causes trouble (cough…taxono…cough…mies). And even for discovered hosts, I’m not that sure. However, there are other places, such as compute resources, where IMO its usage is justified, even though the compute resources differ quite a lot.

The biggest struggle I had was when I started thinking about hammer/API commands. Currently, when I register a smart proxy, I run hammer proxy list, and from there I see all the ids I can use when assigning them to hosts. With the original approach, I would need to go to hammer smart-proxy-group list, where by default there would be just the single-proxy groups. As others mentioned in the original PR, the case where users would actually use a group is quite an edge case compared to the 1-proxy mapping.

Another assumption of the original proposal (correct me if I’m wrong) was that all the features of the proxies in the group should be load-balanceable (I take that from the fact that you don’t mention features assigned to the groups). This is quite a limitation from my point of view. For example, I don’t think it would be that rare to want a load-balanced group just for, let’s say, remote execution, where one proxy has the rex and puppetca features and an additional proxy has rex only. I would like to use those as one group for rex, but the puppetca feature prevents doing that.

This led us to introducing the group + feature concept, which led to an enormous number of groups, which would be super-confusing from the hammer point of view (suddenly, when I register one proxy with 5 features, I would have 5 different ids to assign to the host, depending on the specific role the proxy would have).

No, they would share:

name
url
features
taxonomies
status
assignments to hosts

I’m not sure how many people would be happy to see the host API change, especially when most users will not care about the groups but do care about their current scripts working across releases. For me, this is actually a reason against going the proxy group/route path.

Details on virtual proxy proposal

However, I don’t want to just talk about the downsides of the original proposal, which would be pretty odd if there weren’t another alternative. So let’s spend some time on this.

LoadBalanced Proxy

When defining it, the user:

1. selects the features to be included for load balancing
2. selects Real proxies that:
   a. have all the features that were selected in (1)
   b. were not already assigned to any other LoadBalanced proxy for any given feature
3. fills in the url if any feature requires it (e.g. the rex feature itself would not require any url for the load-balanced proxy)

As for orchestration, I can even imagine the LoadBalanced proxy having the orchestration methods (such as create_tftp_file), where it would decide how to handle the particular case (calling all proxies, or just one of them…).
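To make the LoadBalanced definition and the orchestration idea above a bit more concrete, here is a minimal Python sketch. It is not Foreman code (Foreman is a Rails app): `RealProxy`, `LoadBalancedProxy.define` and the `calls` log are hypothetical names, rule (b) is implemented under one reading of it (a proxy may not be load-balanced for the same feature twice), and the fan-out in `create_tftp_file` is just one possible policy.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RealProxy:
    """A registered Smart Proxy (hypothetical model)."""
    name: str
    url: str
    features: set[str]
    calls: list[str] = field(default_factory=list)

    def create_tftp_file(self, host: str) -> None:
        # Stand-in for the HTTP request Foreman would send to self.url.
        self.calls.append(f"tftp:{host}")


@dataclass
class LoadBalancedProxy:
    """Hypothetical virtual proxy that load-balances selected features."""
    name: str
    features: set[str]          # step 1: features to load-balance
    members: list[RealProxy]    # step 2: the Real proxies behind it
    url: Optional[str] = None   # step 3: only if a feature needs one

    @classmethod
    def define(cls, name: str, features: set[str], members: list[RealProxy],
               existing: list[LoadBalancedProxy],
               url: Optional[str] = None) -> LoadBalancedProxy:
        for proxy in members:
            # Rule (a): every member must have all selected features.
            missing = features - proxy.features
            if missing:
                raise ValueError(f"{proxy.name} lacks {sorted(missing)}")
            # Rule (b): the member must not already be load-balanced
            # for one of these features by another LoadBalanced proxy.
            for other in existing:
                overlap = features & other.features
                if overlap and any(m is proxy for m in other.members):
                    raise ValueError(
                        f"{proxy.name} already balanced for {sorted(overlap)}")
        return cls(name, set(features), list(members), url)

    def create_tftp_file(self, host: str) -> None:
        # One possible policy: write the TFTP file on every member.
        for proxy in self.members:
            proxy.create_tftp_file(host)
```

With the rex example from earlier in the thread, a proxy with rex and puppetca plus a proxy with rex only can form a rex-only LoadBalanced proxy, while a puppetca one over the same pair is rejected by rule (a).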

Alias Proxy

When defining it, the user:

1. selects the features to be exposed behind the alias
2. selects a proxy (Real or LoadBalanced) that has all of those features; multiple aliases can be defined for any proxy
3. fills in the alias url

For orchestration, the Alias proxy could delegate all methods to the real proxy.
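A minimal Python sketch of the alias idea, assuming an alias only needs its own name, url and exposed features and forwards everything else to its target. `RealProxy`, `AliasProxy` and the `status` method are hypothetical stand-ins, not Foreman’s models; `__getattr__` is just one simple way to express “delegate all methods”.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class RealProxy:
    """Hypothetical stand-in for a registered Smart Proxy."""
    name: str
    url: str
    features: set[str]

    def status(self) -> str:
        # Stand-in for any API call Foreman makes to the proxy.
        return f"{self.name} ok"


class AliasProxy:
    """Hypothetical alias: another client-facing URL for an existing proxy."""

    def __init__(self, name: str, url: str, features: set[str], target) -> None:
        # The target (Real or LoadBalanced) must have every exposed feature.
        missing = set(features) - target.features
        if missing:
            raise ValueError(f"{target.name} lacks {sorted(missing)}")
        self.name = name
        self.url = url                # the alias url that clients use
        self.features = set(features)
        self.target = target

    def __getattr__(self, attr: str):
        # Called only for attributes the alias does not define itself,
        # so every other method is delegated to the target proxy.
        if attr == "target":
            raise AttributeError(attr)
        return getattr(self.target, attr)
```

Because an alias holds nothing but its URL and a reference, defining several aliases for the same proxy (one per client network, for the multi-homing case) costs nothing extra.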

This is also where this approach differs from Marek’s in the original PR, as that proposal required an additional attribute at the host level.

iNecas · February 22, 2018, 7:47pm

Let me summarize the differences between the approaches in a pros/cons table. I’m open to updating this table based on arguments (perhaps we can discuss this in the deep dive that Greg suggested?)

	Groups/Route	Virtual Proxy
Supports load balancing
Supports multi homing
Does it require syncing members between groups
Does it keep API backward-compatibility
Allows selective load-balancing (per feature)
Doesn’t need single-item groups
Doesn’t need STI
Was it proposed by a cool developer
Does the proposing developer have capacity to work on the feature?

sean797 · February 23, 2018, 9:50am

As I understand it, STI is designed for models that share the same, or mostly the same, attributes, relations and methods. I don’t see how Smart Proxies would share any real attributes, relations and methods with Smart Proxy Groups (let’s just call them that for now).

Attributes
Smart Proxies:

name
url (Foreman uses this for the API, and clients derive the hostname from it)

Smart Proxies Groups:

name
alternative hostname (clients use this)

So the only columns they would share would be name and potentially alternative hostname. That would also mean there would be 2 ways for a user to define an alternative hostname, and a user would have to select a Smart Proxy and choose to use the alternative hostname on the host form. I don’t like the idea of potentially selecting 2 things per feature.

Relations
Mostly similar or the same. Similar because features are shared, but for one the association is direct and for the other it is indirect, via the Smart Proxy(ies).

Similar ones would probably have to be renamed

Methods
Some overlap

We can share methods via other ways.

In the original proposal there is no need to select a feature per Smart Proxy Group, you just define a group between 2 proxies with the same features. If a user doesn’t want to use HA DNS for example then they wouldn’t select that group, they would select one of the other “default” groups containing a single Smart Proxy.

You are wrong. For features that we can’t or don’t want to make HA, we can disallow selecting a Smart Proxy Group with more than one Smart Proxy in it, or adding more Smart Proxies to a group that a host uses for that feature.
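The two restrictions described above could be enforced roughly like this. This is a hypothetical Python sketch, not Foreman code: `SmartProxyGroup`, `assign`, `add_proxy` and the `HA_CAPABLE` set are illustrative names, and the contents of `HA_CAPABLE` are a placeholder rather than a statement about which features can actually be made HA. The group itself carries no feature list, matching the original proposal.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Placeholder only: which features may use a multi-proxy group is exactly
# what would need deciding; "dns" is here because HA DNS is the example.
HA_CAPABLE = {"dns"}


@dataclass
class SmartProxyGroup:
    """Hypothetical group; note it has no feature list of its own."""
    name: str
    proxies: list[str] = field(default_factory=list)
    # feature -> names of hosts that use this group for that feature
    used_for: dict[str, set[str]] = field(default_factory=dict)

    def assign(self, host: str, feature: str) -> None:
        # Disallow selecting a multi-proxy group for a non-HA feature.
        if len(self.proxies) > 1 and feature not in HA_CAPABLE:
            raise ValueError(f"{feature} cannot use multi-proxy group {self.name}")
        self.used_for.setdefault(feature, set()).add(host)

    def add_proxy(self, proxy: str) -> None:
        # Disallow growing a group that hosts already use for a non-HA feature.
        blocked = sorted(f for f, hosts in self.used_for.items()
                         if hosts and f not in HA_CAPABLE)
        if blocked:
            raise ValueError(f"{self.name} is used for {blocked}")
        self.proxies.append(proxy)
```

A single-proxy “default” group can be used for any feature, and only stops accepting new members once a host relies on it for a non-HA feature.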

We can provide a deprecation for the life of APIv2. Ugly example: foreman/app/controllers/api/v2/hostgroups_controller.rb at commit 23c105a9635b65bb2090f58b26e5430d810e77e8 in sean797/foreman on GitHub
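As a rough illustration of what such a deprecation could do, here is a hypothetical Python sketch (the real shim would live in the Rails API controllers). The new parameter names and the `default_group_for` lookup are assumptions; the point is that an old proxy id is translated into the id of that proxy’s single-proxy default group, with a deprecation warning.

```python
import warnings

# Illustrative old -> new names; the final name (group vs. route) was
# still open at this point in the thread.
RENAMED = {
    "puppet_proxy_id": "puppet_proxy_group_id",
    "puppet_ca_proxy_id": "puppet_ca_proxy_group_id",
}


def translate_params(params: dict, default_group_for: dict) -> dict:
    """Map deprecated APIv2 proxy params onto group params.

    default_group_for maps a Smart Proxy id to the id of the
    single-proxy "default" group that contains just that proxy.
    """
    out = dict(params)
    for old, new in RENAMED.items():
        if old not in out:
            continue
        warnings.warn(f"{old} is deprecated, use {new}",
                      DeprecationWarning, stacklevel=2)
        proxy_id = out.pop(old)
        # If the client already sent the new param, it takes precedence.
        if new not in out:
            out[new] = default_group_for[proxy_id]
    return out
```

Existing scripts keep working and get a warning, while new clients can pass group ids directly.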

I think the API change is good for 3 main reasons:

advertise that something has changed
passing my Smart Proxy Group in my puppet_proxy_id API param doesn’t feel right.
not keeping old APIs around allows us to be free of our technical debt and make improvements

I would expect more users to take advantage of this, let me explain. Right now we have 3 reasons to add another Smart Proxy:

capacity
network security zone
more client networks (think interface or NAT) AKA multi-homing

The feature removes #3 from that list and enables a better way of doing #1.
I think the name of the Smart Proxy Group object is important so users see and realize it as a positive change. IMO Smart Proxy Group wouldn’t give positive vibes to most users; hostname or route probably would.

Happy to do a deep dive if people want to, though I think a written discussion allows for a wider audience and lets people read, think, then make a decision and reply. IMO deep dives are good for explaining proposals (and I think we both mostly understand each other now), not for making decisions.

I’ve added some entries to your table and reworded some items:

	Groups/Route	Virtual Proxy
Supports load balancing
Supports multi homing
Does it keep plugin API backward-compatible
Does it keep user API backward-compatible
Allows selective load-balancing (per feature)
Doesn’t need single-item groups
Doesn’t need STI and complex models
Was it proposed by a cool developer
Does said cool developer have capacity to work on the feature?
Doesn’t require a large number of group/proxy objects

Having said all of the above, I feel like a lot of those discussion points are moot; it’s a situation where one thing is marginally better with one method and another thing is marginally better with the other.
I think we should make the decision based on the user API; it’s the thing that will impact people the most.

Reasons NOT to change the API:

no impact on users who don’t care (as explained above, the size of this user base is up for debate)

Reasons to change the API:

advertise that something has changed (who reads the docs?)
passing my SmartProxyGroup in my puppet_proxy_id API param doesn’t feel right.
less technical debt - if we were to start Foreman today, I’m fairly sure we would have a Routing/Grouping object like this.

So I ask, are you okay with changing the API provided the following?

We provide deprecations for the entire life of APIv2
We call it Route (or something like that), as you say having a SmartProxyGroup of 1 could feel unnecessary to a user.

lzap · February 23, 2018, 12:19pm

Our API has versioning and we have already discussed moving towards v3, but we are not using this mechanism and we should improve on that. Also, we are close to the 2.0 release, and the API change could be an additional reason for those who need reasons to change the major version.

iNecas · February 23, 2018, 1:26pm

Could you expand on this? I don’t see where the large number of objects is in the virtual proxy case.

I’m quite ok with the rest of what you wrote. I guess we need more developers now to chime in and express their preference.

iNecas · February 23, 2018, 1:28pm

And by developers, I also meant upstream users and PMs (who represent the downstream users).

sean797 · February 23, 2018, 1:37pm

Right, I understand now. I thought the Virtual Proxies were per feature, but they are not.

Please disregard that row.

Marek_Hulan · February 23, 2018, 4:00pm

If we are changing the API for assigning proxies, I think we should change it to something better than “proxy group”. I like @iNecas’s idea of a proxy group being just another kind of proxy. I think it does not matter if STI is used only for a blank object hierarchy and all the logic is added via composition. But I see a huge advantage in keeping the “proxy” API for users who have scripted against or built a new UI on top of our API. I’ve heard about such projects, and not only at this cfgmgmtcamp.

If the majority thinks it’s worth changing the API (-1 from me) and we decide to change it (even if that means v3), maybe we should rename the proxy to something more meaningful: a new object called e.g. Worker. Proxy group and Smart proxy would just be two implementations of Foreman’s Worker. Actually, Worker would work quite well with the Foreman name, as a foreman usually has multiple workers who work on his/her orders.

sean797 · February 24, 2018, 5:49am

I strongly believe this API needs to change, it isn’t fit for purpose anymore.
I’ve spent almost a year writing this code and convincing people, only for people to agree in Ghent and change their mind a couple of weeks later.
Taking on technical debt and being unable to innovate is the biggest thing that will drive new & existing users away in equal measure. That’s exactly what the DevOps movement is all about for business.

I’m not saying we should not care about the past and blindly change things; I’m saying everything we change should be our best effort to meet users’ needs for today and tomorrow. We should provide migrations and deprecations and implement things exactly how we would if we were starting the project today.

Some bigger examples of this:
Pulp (who are creating Pulp 3)
CFEngine (who didn’t change; now look at Puppet’s & CFEngine’s user bases)
Netflix (they destroyed a whole industry)

Yes - you could argue that the STI approach does meet users’ needs, and it probably does, but it’s so complex and hacky that it will be hard, if not impossible, to innovate afterwards. Nor am I saying my approach is definitely the correct one; I am simply saying that the API needs to change. It needs to allow for client routing that is separate from how Foreman reaches the Smart Proxies, and for grouping of Smart Proxies. This is exactly how you configure systems outside of Foreman, using hostnames, DNS & config files, and Foreman should be no different. The fact that it is different is actually confusing to new users.

Taking on so much technical debt and being unable to innovate will slowly kill our upstream user base and downstream customer base. People don’t have any loyalty; when something is better somewhere else they will move, and our past should not stop us changing things.

Marek_Hulan · February 25, 2018, 7:42am

My understanding in Ghent was that we would change the API to start using “route_id”, which to me was an acceptable change, as it is understandable for non-HA users. Now, IIUC, we have ended up with “proxy_group_id”, which does not sound like a good approach. Not because of the change itself, but because in API/hammer it won’t be easy to make it clear for users who are not interested in HA. I see 2 options here: change the API but make it friendly for non-HA users, or don’t change it and introduce STI. I call it STI, but it doesn’t have to use Rails STI.

sean797 · February 26, 2018, 10:20pm

So I’m going to summarize what I think we (@iNecas, @Marek_Hulan & I) have agreed on, in an effort to get more feedback (+1 or -1, and why?) from other @Developers. Obviously, guys, correct me if anything I say is wrong.

We are going to create a new Route object that can be assigned to one or many Smart Proxies and holds a Hostname attribute that clients use to connect to the Smart Proxy(ies). This Route object will also be assigned to Hosts, Subnets and Domains in place of the current Smart Proxy associations. So puppet_proxy_id will become puppet_route_id.

The way Foreman will communicate to Smart Proxies is unchanged.
Where a Route has more than one Smart Proxy, Foreman will orchestrate on all of them, provided that the feature supports multiple nodes.
We think Route is the best name for this. Kubernetes (OpenShift) calls it a Route, but there it isn’t used for DHCP, and for DHCP we don’t feel Route quite makes sense. Other suggestions are strongly encouraged, as long as it isn’t SmartProxyGroup.
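A minimal Python sketch of the model summarized above, assuming hypothetical `SmartProxy`, `Route` and `Host` classes and a placeholder `MULTI_NODE` set for features whose orchestration supports multiple nodes (which features belong there is not decided here). Foreman still reaches each proxy through its own `url`, clients only ever see the Route’s `hostname`, and orchestration fans out to every capable proxy in the Route.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Placeholder: features whose orchestration can run on several nodes.
# Which features qualify is not decided here.
MULTI_NODE = {"dns"}


@dataclass
class SmartProxy:
    name: str
    url: str                  # unchanged: how Foreman reaches this proxy
    features: set[str]
    log: list[str] = field(default_factory=list)

    def call(self, action: str) -> None:
        # Stand-in for an HTTP request from Foreman to self.url.
        self.log.append(action)


@dataclass
class Route:
    name: str
    hostname: str             # what clients are configured to connect to
    proxies: list[SmartProxy]


@dataclass
class Host:
    name: str
    puppet_route: Route       # replaces the puppet_proxy association

    def puppet_server(self) -> str:
        # Clients get the Route's hostname, never a proxy's own url.
        return self.puppet_route.hostname


def orchestrate(route: Route, feature: str, action: str) -> list[str]:
    """Run action on every proxy in the Route that has the feature."""
    capable = [p for p in route.proxies if feature in p.features]
    if not capable:
        raise ValueError(f"route {route.name} has no {feature} proxy")
    if len(capable) > 1 and feature not in MULTI_NODE:
        raise ValueError(f"{feature} does not support multiple nodes")
    for proxy in capable:
        proxy.call(action)
    return [p.name for p in capable]
```

In the common case a Route wraps exactly one Smart Proxy and behaves like today’s direct association; only Routes with several members trigger the multi-node check.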

lzap · February 28, 2018, 4:09pm

Thanks for the wrap-up. Names, hmmmm.

“Associate DHCP proxy route with the subnet…”

“Go to domains, open the domain, select appropriate DNS proxy route…”

What’s wrong with proxy group again? Other ideas:

pool
set
cluster
party
gang

sean797 · March 5, 2018, 2:07pm

Thanks for the suggestions @lzap, but I think they have the same problem as group.

Group implies “more than 1”, but this object isn’t just for routing attributes for “more than 1” Smart Proxy; it’s for every Smart Proxy, whether it is part of a group or not.

I think the name Route shows how and what this object is used for in both scenarios (1 or many Smart Proxies), but as you say, Route isn’t really correct for DHCP.

Gwmngilfen · March 6, 2018, 2:17am

+1 for the implementation. Bikeshedding on the name, I do agree Route feels weird. Set or Pool would feel more accurate to me, and I’ll note that my expectation of such objects is that it has “1 or more” members, not “more than 1”. We don’t enforce Hostgroups having at least two members, after all

Also kudos to the excellent discussion here. You’re all fantastic