Last week Vitaly and I migrated BotBot.me to new servers and also launched a redesign of the user account section. You can now support us by becoming a subscriber for $3/month and even log personal channels for $2/month. If you are curious check it out here.
In this post I'll be sharing tactics we used to migrate the service with minimal service interruption.
BotBot.me receives less traffic than most of our customers' sites, but it is made up of multiple services and collects hundreds of messages per minute, which leads to some interesting challenges. The main services that make up BotBot.me are:
- Go IRC client
- Python plugins
- Django web interface
A quick read of the architecture docs will help you understand how all these services plug together. We use SaltStack to configure our servers. I am not going to present the detailed configuration of each service but instead explain the strategy we followed for the migration.
We use Nginx for TLS termination, serving static assets, and finally reverse proxying to the Django app.
In order to securely connect our legacy system to the new server, we used autossh to create SSH tunnels between the two. Here is an example for the Redis tunnel:
    # /etc/init/autossh_redis.conf
    # autossh startup Script
    description "autossh daemon startup"
    start on net-device-up IFACE=eth0
    stop on runlevel [01S6]
    respawn
    respawn limit 5 60 # respawn max 5 times in 60 seconds
    env AUTOSSH_PIDFILE=/var/run/autossh_redis.pid
    env AUTOSSH_POLL=60
    env AUTOSSH_FIRST_POLL=30
    env AUTOSSH_GATETIME=0
    env AUTOSSH_DEBUG=1
    exec autossh -2 -M 20000 -C -N autossh@legacy_server -L 6379:localhost:6379 -i /root/autossh_id_rsa
The Django web app is being served with uWSGI. Here is its autossh config:
    # /etc/init/autossh_uwsgi.conf
    # autossh startup Script
    description "autossh daemon startup"
    start on net-device-up IFACE=eth0
    stop on runlevel [01S6]
    respawn
    env AUTOSSH_PIDFILE=/var/run/autossh_uwsgi.pid
    env AUTOSSH_POLL=60
    env AUTOSSH_FIRST_POLL=30
    env AUTOSSH_GATETIME=0
    env AUTOSSH_DEBUG=1
    exec autossh -2 -M 30000 -C -N autossh@legacy_server -L 8080:localhost:8080 -i /root/autossh_id_rsa
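Before pointing anything at the tunnels, it can help to confirm that the forwarded ports actually accept connections. The following is a minimal Python sketch of such a check, not something we ran as part of the migration. The helper names `port_is_open` and `check_tunnels` are hypothetical; the port numbers are the ones forwarded by the two autossh jobs above. Note that a successful connection only shows that the local end of the tunnel is listening, not that the service on the far side is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Local ports forwarded by the autossh jobs (Redis and uWSGI).
TUNNELED_PORTS = {"redis": 6379, "uwsgi": 8080}


def check_tunnels(host: str = "localhost", ports=None) -> dict:
    """Map each tunnel name to whether its local port accepts connections."""
    ports = TUNNELED_PORTS if ports is None else ports
    return {name: port_is_open(host, port) for name, port in ports.items()}


if __name__ == "__main__":
    for name, is_up in check_tunnels().items():
        print(f"{name}: {'up' if is_up else 'DOWN'}")
```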
Then we configured Nginx on new_server to run in read-only mode (i.e., only accept GET requests) and pointed it to the SSH-tunneled uWSGI instances running on legacy_server.
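We enforced read-only mode in Nginx, but the rule itself is simple enough to illustrate in Python. The sketch below is a hypothetical WSGI middleware (`ReadOnlyMiddleware` is not part of BotBot.me) that applies the same idea at the application layer. Allowing HEAD alongside GET and answering with a 405 status are assumptions made for the example.

```python
# Assumption: HEAD is treated as read-only too; the post only mentions GET.
READ_ONLY_METHODS = ("GET", "HEAD")


class ReadOnlyMiddleware:
    """WSGI middleware that rejects any request whose method could write."""

    def __init__(self, app, allowed_methods=READ_ONLY_METHODS):
        self.app = app
        self.allowed_methods = tuple(allowed_methods)

    def __call__(self, environ, start_response):
        method = environ.get("REQUEST_METHOD", "GET").upper()
        if method not in self.allowed_methods:
            body = b"Site is temporarily read-only.\n"
            start_response(
                "405 Method Not Allowed",
                [
                    ("Content-Type", "text/plain"),
                    ("Content-Length", str(len(body))),
                    ("Allow", ", ".join(self.allowed_methods)),
                ],
            )
            return [body]
        # Read-only request: hand it to the wrapped application unchanged.
        return self.app(environ, start_response)
```

Wrapping an existing WSGI callable is one line, for example `application = ReadOnlyMiddleware(application)`, and removing the wrapper restores normal behavior once the migration is done.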
We lowered the DNS TTL to the minimum our provider allows (5 minutes) a few days before D-day to ensure changes would propagate as quickly as possible. When we switched the DNS to point to new_server, traffic started to flow to it without engaging the full stack (Redis and uWSGI were still tunneled to legacy_server). This introduced a little extra latency, but it was barely noticeable since the two servers were not too far apart geographically. Using the new Nginx instance to switch traffic between the legacy and new servers gave us full control over exactly which servers users would hit, rather than waiting on DNS changes to propagate.
When we made the switch, we needed to ensure that no processes were writing to our legacy database. For BotBot.me we stopped logging into PostgreSQL, but continued to collect the logs, temporarily storing them in our message bus, Redis. Due to the way BotBot.me is architected, the web UI was still functional in read-only mode and people continued to receive real-time updates in their browser. With database writes on hold, we could safely dump the legacy database and restore it on the other side of the fence.
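The idea of holding writes while still collecting messages can be sketched in a few lines of Python. This is an illustration only, not BotBot.me's actual plugin code: `BufferedLogWriter` is a hypothetical class, its `queue` stands in for the Redis list, and its `store` stands in for the PostgreSQL table. Both are plain in-memory objects here.

```python
from collections import deque


class BufferedLogWriter:
    """Collect messages in a queue and only persist them when writes are allowed."""

    def __init__(self):
        self.queue = deque()        # stand-in for the Redis message bus
        self.store = []             # stand-in for the PostgreSQL log table
        self.writes_paused = False  # flipped to True for the database dump

    def receive(self, message):
        """Always enqueue; persist right away unless writes are paused."""
        self.queue.append(message)
        if not self.writes_paused:
            self.drain()

    def drain(self):
        """Move every queued message into the store, oldest first."""
        count = 0
        while self.queue:
            self.store.append(self.queue.popleft())
            count += 1
        return count


writer = BufferedLogWriter()
writer.receive("message logged normally")
writer.writes_paused = True              # database dump starts
writer.receive("message held in the queue")
writer.writes_paused = False             # database restored on the new side
writer.drain()                           # backlog is written in arrival order
```

Because messages are always enqueued first, nothing received during the pause is lost as long as the queue itself stays up.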
Finalizing the switch over
With the data moved over to the new infrastructure, we configured Nginx to send traffic to the uWSGI instance on new_server and started up the plugins to drain the logs accumulated on the legacy_server Redis instance. Once that process completed, we simultaneously:
- shut down the IRC client on legacy_server
- started the IRC client on new_server
- killed the Redis tunnel on new_server
- started the local Redis instance on new_server
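The important precondition for the steps above is that the backlog is fully drained before anything is switched. A rough Python sketch of that gate follows. The function names and the `get_backlog` callable are hypothetical (in practice it might return the length of the Redis queue), and the steps run one after another here rather than truly simultaneously, which is a simplification.

```python
import time


def wait_until_drained(get_backlog, poll_interval=1.0, timeout=300.0, sleep=time.sleep):
    """Poll get_backlog() until it returns 0. Return False if the timeout expires first."""
    deadline = time.monotonic() + timeout
    while True:
        if get_backlog() == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_interval)


def cut_over(get_backlog, steps, **wait_kwargs):
    """Run each (name, action) step in order, but only once the backlog is empty.

    Returns the names of the steps that ran; an empty list means the backlog
    never drained and nothing was touched.
    """
    if not wait_until_drained(get_backlog, **wait_kwargs):
        return []
    completed = []
    for name, action in steps:
        action()
        completed.append(name)
    return completed
```

Returning the list of completed step names makes it easy to see how far the switch got if one of the actions raises.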
Et voilà, we moved a live site with minimal service interruption (only a short period of read-only mode). Due to IRC network flood control, we lost a minute or two of IRC messages as the bots JOINed over a hundred channels, but this was an acceptable loss for us. If you haven't already, check out BotBot.me and let us know what you think.