We’ve received a number of inquiries lately asking about scaling and redundancy (load-balancing and high-availability).  The WholesaleBackup Server Manual covers these topics, but we felt a high-level overview would be a good topic for a blog entry.

A decent quad-core server should be able to handle several hundred backup clients.  Disk I/O performance is the single most important factor in choosing hardware; in particular, the ability of the server’s storage devices (both the RAID controller card and the physical disk drives) to handle multiple simultaneous readers and writers is critical.  We suggest you think ahead when purchasing and configuring hardware.  For example, RAID10 typically rebuilds multi-terabyte arrays much more quickly than RAID5 or RAID6, so in the event of a disk failure (which is inevitable) you’ll be much happier with RAID10.  In addition, SAS drives typically handle multiple simultaneous readers better than SATA drives, and SCSI and eSATA interfaces are much faster than USB.

If your server’s disk controller card and chassis support it, you can add disk drives and make them available in Linux at any time.  If you run out of local storage capacity, it is straightforward to set up a Linux NAS or SAN and remote-mount that network storage on your backup server via NFS (the file systems need to be Linux-formatted).
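As a rough sketch of the NFS approach, the hostnames and paths below are purely illustrative — substitute your own:

```shell
# On the NAS (here called "nas01"), export the volume in /etc/exports:
#   /export/backupvol  backupserver(rw,sync,no_subtree_check)
# then reload the export table:
exportfs -ra

# On the backup server, mount the volume:
mkdir -p /mnt/backupvol
mount -t nfs nas01:/export/backupvol /mnt/backupvol

# And add an /etc/fstab entry so the mount survives reboots:
#   nas01:/export/backupvol  /mnt/backupvol  nfs  rw,hard,intr  0  0
```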

Our backup server will automatically back up its key meta-data to the /backup/ directory, which you can then back up or sync to another device.  We also provide a script that automatically replicates this data to an Amazon EC2 Cloud server (the script will start the server, replicate the data, and then automatically stop the server to save you money).  It is straightforward to modify this script to replicate to a server of your choosing.
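The start/replicate/stop cycle can be sketched roughly as follows — this is not our actual script, just a simplified illustration using the AWS command-line tools, with a placeholder instance ID and hostname:

```shell
#!/bin/sh
# Simplified sketch: replicate /backup/ to a remote server that is powered
# on only for the duration of the sync.  Values below are placeholders.
INSTANCE_ID="i-0123456789abcdef0"   # hypothetical EC2 instance ID
REMOTE_HOST="replica.example.com"   # hypothetical replica hostname

# Start the replica instance and wait until it is running.
aws ec2 start-instances --instance-ids "$INSTANCE_ID"
aws ec2 wait instance-running --instance-ids "$INSTANCE_ID"

# Replicate the backup server's meta-data (rsync transfers only changes).
rsync -az --delete /backup/ "root@$REMOTE_HOST:/backup/"

# Stop the instance again so you only pay for the time the sync took.
aws ec2 stop-instances --instance-ids "$INSTANCE_ID"
```

Replicating to a server of your own is just a matter of dropping the start/stop calls and pointing the rsync at your host.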

The WholesaleBackup Server is architected for scaling and redundancy and can be thought of as having four major components:

  • A MySQL database containing meta-data about every account’s backups and restores.  This is not mission-critical data for a customer’s ability to back up and restore; rather, it is created to help you support your clients and to let them access their backup logs from a web portal.  This database is typically configured to run on one of your backup servers, though it can be run on a separate machine.  If you wish, you can configure replication of your MySQL database to an additional server using standard MySQL and Linux replication methods (there is nothing custom about our implementation that would preclude this).   In general your clients will be able to back up and restore even if this database is offline, although we do support some load-balancing schemes that require the database to be online to perform optimally.
  • A web portal consisting of PHP and HTML pages that connect to the MySQL database.  This portal is created specifically for you to support your clients and for them to be able to access their backup logs when away from the software client.  Typically this web infrastructure runs on the same server as the MySQL database, though that need not be the case.   Multiple servers can host these pages if you wish to implement high availability for your support web portal (though this is not required for your clients’ backups and restores).
  • The storage device(s) that hold the actual customer data, logs, and meta-data.  These devices can be local to the backup server(s) or remotely attached via NFS.  If you are using multiple storage devices, our support web portal allows you to move customers between them.
  • The actual backup server(s) running WholesaleBackup Server software.
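Because the MySQL replication mentioned above uses stock MySQL mechanisms, a minimal setup looks like classic binary-log replication.  As an illustration only (the server IDs and log names are arbitrary):

```ini
# my.cnf on the primary server:
[mysqld]
server-id = 1
log-bin   = mysql-bin

# my.cnf on the replica (in its own file, shown here for comparison):
# [mysqld]
# server-id = 2
# relay-log = mysql-relay-bin
```

On the replica you would then point replication at the primary with MySQL’s standard `CHANGE MASTER TO ...` statement (supplying a replication user and the primary’s current binary log coordinates) and issue `START SLAVE;` — see the MySQL replication documentation for the full procedure.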

A typical scenario, then, is for a new WholesaleBackup Server customer to set up a single backup server with all four components on it, and to be sure that the /backup/ directory is safely backed up (or synced) to a separate device.   As you add more users you may add more storage (either local or remote), and when the server’s processing ability (RAM and CPU) is maxed out after several hundred clients, you add a second server, then a third, and so on.
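One simple way to keep /backup/ synced to a separate device is a nightly rsync job.  The path and schedule below are illustrative — the target could be a second local disk or an NFS mount:

```shell
# Illustrative crontab entry: mirror /backup/ to a second device at 2am
# daily.  "/mnt/backupcopy" is a hypothetical mount point.
0 2 * * * rsync -a --delete /backup/ /mnt/backupcopy/backup/
```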

So the question then is, how are multiple backup servers supported?  Conceptually you have a few choices:

  1. Set up separate backup server implementations (such as backup1.mydomain.com, backup2.mydomain.com, etc.) and create unique software client installers for each, where you manage which server each client backs up to.  This scenario requires the minimum amount of configuration effort on your part, but there will be a separate support web portal and database for each backup server.  This is a non-load-balanced implementation.
  2. Each backup server supports backups for certain client accounts, specifically those whose underlying storage device is mounted on that server.  When a client makes a request to WholesaleBackup to connect to your infrastructure, our load-balancer makes a secure HTTP call to one of your servers to request the network destination for that specific user, which is then returned to the client so it can execute its backup/restore operation against the server that has access to its data.  This approach is most commonly used in Cloud implementations (which often have limitations on the size and number of volumes you can cross-mount, and which suffer from the Linux kernel bug described in #3 below) or when you want to minimize cross-mounting ‘slow’ network storage volumes or devices that aren’t fast enough to handle many simultaneous readers and writers.
  3. Each backup server can provide backup and restore capability to every one of your clients, so each server must have all your storage devices mounted (either locally or remotely via NFS) so that every backup account’s data is visible to each backup server.  This approach is most commonly used in implementations on your own hardware where you have extremely fast SAN or NAS devices with cross-mounted network storage volumes, and is often the best if you are striving for ISO or Six Sigma compliance. In this scenario you need to direct each client to a backup server via one of the following:
    1. A simple DNS round-robin scheme or a dedicated load-balancing device (such as a box running pfSense).
    2. Sophisticated load-balancing mechanisms that look at the load of each backup server to determine where to direct a backup client (this is sometimes scripted on BSD machines running pf, or done with a commercial load-balancing device such as those made by F5 or another vendor).
    3. Please note that there is a bug in the Linux kernel, which has not been fixed as of January 2012, in which NFS buffers overrun and cause server crashes.  We have seen this bug manifest itself on Amazon EC2 instances in high-traffic backup situations where NFS mounts are being accessed from multiple backup servers simultaneously.  If you think you may scale to this level and you are using EC2 servers, we suggest you use scenario #2 above instead of this method of load-balancing.
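As a minimal sketch of the DNS round-robin option, round-robin is nothing more than multiple A records for the same name.  The domain and addresses below are placeholders:

```
; Hypothetical BIND zone file fragment.  Most resolvers rotate through
; the A records, spreading clients across the three backup servers.
backup.mydomain.com.   300  IN  A  203.0.113.11
backup.mydomain.com.   300  IN  A  203.0.113.12
backup.mydomain.com.   300  IN  A  203.0.113.13
```

Note that round-robin DNS spreads connections but is unaware of server load or health, which is why the more sophisticated mechanisms in option 2 exist.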

In each of the above scenarios, the backup servers will need to be configured appropriately: for Linux account replication if required, for access to your MySQL backup server, potentially for accepting load-balancing calls from your load-balancer or the one WholesaleBackup provides, and so on.  Given the intricacies of configuring and testing enterprise-level load-balancing and high-availability schemes with multiple backup servers, you’ll need to enter into a professional services engagement with a trained WholesaleBackup engineer to configure and support load-balanced configurations #2 and #3 above.

To get started, we suggest you set up a single backup server, knowing that our solution will scale as your business scales.  When you’ve hit several hundred users and exhausted the resources of your first server, it’s time to add a second server and make a decision between options #2 and #3 above (you can change between them, so no decision is permanent).

The following graph shows four under-utilized backup servers (e03 – e06) plus a MySQL/web server (e01) in production in the Amazon EC2 Cloud, using load-balancing scheme #2 described above.