RFC: Moving off of Rackspace Infrastructure

As Rackspace has ended its support for open source projects, we would like to move everything off of Rackspace and onto new providers. See below for a detailed layout of the current Rackspace infrastructure.

If you have questions or concerns, please raise them here.

Proposal

There are three main components to the proposal:

  • Moving DNS to a centrally managed provider
  • Keeping secure infrastructure on a single provider
  • Maintaining the current level of Jenkins nodes

DNS

Gandi provides open source support and centralized group management. Accounts will be set up there and DNS transferred from Ohad to Gandi. Anyone on the infrastructure team who wants access may create an account and be added to the group.

Jenkins Nodes

The non-security-focused Jenkins nodes (slave01 and slave05 on Rackspace) will be shut down and replaced by AWS Jenkins nodes that are copies of the two AWS nodes we have today.

Main Infrastructure

The main, security-focused infrastructure is:

  • Foreman/Puppet master
  • Jenkins master
  • Webserver
  • Debian Jenkins node
  • slave02 that handles specific jobs that push into our infrastructure

These servers will be moved alongside existing infrastructure running at the Oregon State University Open Source Lab (OSUOSL) on their OpenStack infrastructure. This will require a capacity increase on their footprint, which they are happy to provide.

Action Items

  • Email Ohad about DNS access [Evgeni]

    • Account on Gandi created by Evgeni
    • Create an account if you want access
    • Blocked on:
      • Waiting on transfer which has been initiated
    • Next steps
      • Flip DNS over from GoDaddy to Gandi
  • OSUOSL team access [ewoud]

  • Move Jenkins nodes

  • Migrate Jenkins master

  • Migrate Webserver

  • Migrate Foreman/Puppet

DONE

  • OSUOSL [Ewoud]
    • Email them and ask for more resources [DONE]
    • Waiting on ticket resolution [DONE]
  • Archive stats box [Evgeni]
  • Create new Jenkins nodes in AWS [ehelms]
    • Patrick to spin up 2 nodes
    • Eric to configure the nodes
    • New AWS slaves online, slave01.rackspace and slave05.rackspace shut down and deleted

Current Rackspace Infrastructure

A big “thank you” for all the work done so far!

Agree, thank you!

Thanks to the whole infra team, and to everyone beyond it who is involved.

I updated the action items today and built out action items for transferring the Jenkins master and the final Jenkins nodes.

As part of these updates, we are moving to the term “node” for our Jenkins workers. I have updated the OSUOSL and AWS nodes to the following format:

node0X.jenkins.<provider>.theforeman.org

node0X.jenkins.aws.theforeman.org
node0X.jenkins.osuosl.theforeman.org
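For anyone scripting against the new names, here is a minimal sketch of how they could be parsed. It assumes a two-digit index and a lowercase provider label, as in the AWS and OSUOSL examples above; the function name `parse_node_name` is made up for illustration and is not part of our tooling.

```python
import re

# Assumed pattern, inferred from the two examples above:
# node<two digits>.jenkins.<provider>.theforeman.org
NODE_RE = re.compile(
    r"^node(?P<index>\d{2})\.jenkins\.(?P<provider>[a-z0-9]+)\.theforeman\.org$"
)


def parse_node_name(fqdn):
    """Return (provider, index) for a name in the new format, or None otherwise."""
    match = NODE_RE.match(fqdn)
    if match is None:
        return None
    return match.group("provider"), int(match.group("index"))


# A name in the new format parses; an old-style name does not.
print(parse_node_name("node03.jenkins.osuosl.theforeman.org"))
print(parse_node_name("slave02.rackspace"))
```

Names that do not follow the format simply return None, so the same check can be used to list nodes that still need renaming.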

I am less sure about how to handle the Scaleway and Netways ones properly. The debian01 and slave02 nodes left on Rackspace will be updated as part of their move to OSUOSL.

I’ve spun up a node05 at OSUOSL that will replace slave02.rackspace once we have figured out secrets handling and labeled the nodes properly.

Next up is creating a deb-node01.jenkins.osuosl.theforeman.org to replace debian01 on Rackspace. I looked into this within OSUOSL, but I could not find an Ubuntu 18.04 image to match the current debian01. @mmoll, what approach should we take here?

@ehelms Debian 10.3 should also be fine.

Last night theforeman.org was transferred from GoDaddy to Gandi. It looks like various DNS recursors have cached the old NS records (TTL is 1 day in the .org zone) and GoDaddy has stopped responding to DNS requests. This is causing some instability. I expect this to stabilize during the day as TTLs expire.
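As a rough sketch of the timing: a recursor that cached the old NS records just before the switch can keep serving them for at most one full TTL. The switch time below is a placeholder, not the actual transfer time, and the function name is hypothetical. Some recursors cap or ignore TTLs, so treat the result as an estimate.

```python
from datetime import datetime, timedelta, timezone

ORG_NS_TTL_SECONDS = 86400  # 1 day, the TTL mentioned above


def latest_stale_cache_expiry(switch_time, ttl_seconds=ORG_NS_TTL_SECONDS):
    """Latest moment a recursor could still hold NS records cached just before the switch."""
    return switch_time + timedelta(seconds=ttl_seconds)


# Placeholder switch time, for illustration only.
switch = datetime(2020, 1, 1, 22, 0, tzinfo=timezone.utc)
print(latest_stale_cache_expiry(switch).isoformat())
```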

I also took the opportunity to create DNS records for all hosts we manage.

Looks like the Katello VCR tests are now failing with:

An HTTP request has been made that VCR does not know how to handle:
  POST https://node03.jenkins.osuosl.theforeman.org/pulp/api/v3/distributions/container/container/

That sounds like a broken test. You should never use real domains or the actual FQDN in tests, especially with VCR.
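To illustrate the idea (this is a language-neutral sketch in Python, not VCR's API and not the Katello test code): if recorded and replayed requests are compared after swapping the host for a fixed placeholder, the result no longer depends on which Jenkins node runs the test. The placeholder host and both function names are assumptions made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical placeholder; example.com is reserved for documentation use.
PLACEHOLDER_HOST = "pulp.example.com"


def normalize_host(url, placeholder=PLACEHOLDER_HOST):
    """Replace the host part of a URL with a fixed placeholder.

    The whole network location is replaced, so any port or credentials
    in the original URL are dropped as well.
    """
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme, placeholder, parts.path, parts.query, parts.fragment)
    )


def same_request(method_a, url_a, method_b, url_b):
    """Compare two requests by method and URL, ignoring the host."""
    return (
        method_a.upper() == method_b.upper()
        and normalize_host(url_a) == normalize_host(url_b)
    )


recorded = "https://pulp.example.com/pulp/api/v3/distributions/container/container/"
live = "https://node03.jenkins.osuosl.theforeman.org/pulp/api/v3/distributions/container/container/"
print(same_request("POST", recorded, "POST", live))
```

The simpler fix is usually on the test side: configure a fixed hostname in the test setup so the machine's FQDN never reaches the recorded requests in the first place.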