Speakers: Sanjay Narang, Luca Bandinelli
The requirement is to build an Internet-facing site that is highly customized and needs to be always on, meaning minimal downtime and minimal data loss. The first question is at which level high availability is needed. The design also has to cover natural disasters, which implies data centers in different locations. Azure offers 99.95% connectivity uptime. If you want more, the solution has to be designed accordingly.
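To put the 99.95% figure in perspective, here is a minimal sketch of the arithmetic. It assumes a 365-day year and, for the two-site case, that the sites fail independently and that failover is instantaneous. Neither assumption comes from the session, so the combined figure is an idealized upper bound rather than a promise.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, leap years ignored


def yearly_downtime_minutes(availability: float) -> float:
    """Expected downtime per year, in minutes, for an availability in [0, 1]."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def combined_availability(*availabilities: float) -> float:
    """Availability of redundant sites, ASSUMING independent failures.

    The service is down only when every site is down at the same time,
    so the unavailabilities are multiplied together.
    """
    unavailable = 1.0
    for a in availabilities:
        unavailable *= (1.0 - a)
    return 1.0 - unavailable


single_site = 0.9995
print(round(yearly_downtime_minutes(single_site), 1))  # 262.8

two_sites = combined_availability(single_site, single_site)
print(f"{two_sites:.8f}")
```

A single site at 99.95% can therefore be unavailable for roughly 263 minutes (about 4.4 hours) per year, which is why a second farm in another region is considered.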
Office 365 is the place to go for collaboration, but not for Internet-facing scenarios. Azure is therefore the better option, and it can scale on demand. Running SharePoint on top of Azure is a Microsoft-supported solution. Specific features, such as blob storage and fast cross-datacenter transfer, will be very useful.
The solution is based on two farms in two different Windows Azure regions, using custom log shipping jobs for data synchronization between them (and not SQL AlwaysOn). Traffic Manager is also used.
The content databases and the Managed Metadata Service (MMS) database will be synchronized. Search will have two search services, one for production and one for DR.
Virtual networks are a challenge, as they are restricted to a single datacenter. An Active Directory domain also cannot span multiple datacenters. Each farm will therefore be in a different domain, which prevents the use of SQL AlwaysOn between the farms. A domain trust also has to be set up.
The primary farm in a Windows Azure region will have an affinity group, in which a virtual network is defined. Different cloud services are defined to contain the virtual machines. Each of these tiers needs to be always available, which is achieved with availability sets. For front-end servers, the Windows Azure Load Balancer can be used. For SQL Server, an AlwaysOn Availability Group is set up, with an Availability Group Listener. This implies that all the clients must be in a different cloud service. For custom log backups, blob storage is used.
The DR farm is similar to the primary farm. The custom log shipping job takes the backups from blob storage. The content DBs and the MMS DB are read-only and are not part of the AlwaysOn AG. Search is created separately and crawls the read-only content DBs; the crawls must be scheduled outside of the restore window.
Custom log shipping is required on both farms. The backup and restore commands use a URL for the storage. The challenge of having two farms in different AD domains is that the accounts differ from one farm to the other, so a plain backup/restore will not work. The accounts required by the DR farm must be added to the databases. Once that is done, the databases have to be backed up and restored on the primary farm, so that they also contain the accounts of the DR farm.
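As an illustration of backup and restore commands that use a URL, the sketch below builds T-SQL statements with SQL Server's BACKUP ... TO URL and RESTORE ... FROM URL syntax. It is not the speakers' actual job. The database name, storage URL, credential name and standby file are hypothetical, and the use of STANDBY on restore is an assumption made here because the session describes the DR databases as read-only.

```python
def quote_literal(value: str) -> str:
    """Return a T-SQL Unicode string literal, doubling embedded single quotes."""
    return "N'" + value.replace("'", "''") + "'"


def quote_name(name: str) -> str:
    """Return a bracket-quoted T-SQL identifier, doubling embedded ']'."""
    return "[" + name.replace("]", "]]") + "]"


def backup_log_sql(database: str, blob_url: str, credential: str) -> str:
    """Statement for the primary farm: write a log backup to blob storage."""
    return (
        f"BACKUP LOG {quote_name(database)} "
        f"TO URL = {quote_literal(blob_url)} "
        f"WITH CREDENTIAL = {quote_literal(credential)};"
    )


def restore_log_sql(database: str, blob_url: str, credential: str,
                    standby_file: str) -> str:
    """Statement for the DR farm: restore a log backup from blob storage.

    STANDBY keeps the database readable between restores (an assumption,
    chosen to match the read-only DR databases described above).
    """
    return (
        f"RESTORE LOG {quote_name(database)} "
        f"FROM URL = {quote_literal(blob_url)} "
        f"WITH CREDENTIAL = {quote_literal(credential)}, "
        f"STANDBY = {quote_literal(standby_file)};"
    )


# Hypothetical values, for illustration only.
url = "https://examplestorage.blob.core.windows.net/logs/WSS_Content_0001.trn"
print(backup_log_sql("WSS_Content", url, "LogShippingCredential"))
print(restore_log_sql("WSS_Content", url, "LogShippingCredential",
                      "D:\\Standby\\WSS_Content.tuf"))
```

The helper functions only produce text; executing the statements, naming the backup files and tracking which logs have already been restored would be the responsibility of the custom job.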
Log shipping can't be used for search. Having separate search services makes it possible to keep the SLAs and does not require copying the indexes. However, this setup makes search analytics unusable at the global level.
The main component enabling failover is Azure Traffic Manager (TM). Requests are always directed to the primary endpoint while it is available. A custom job polls TM to check whether the target endpoint has changed. When the primary farm goes down, TM detects it and redirects requests to the DR farm, which is read-only. The custom job detects the change as well and pauses the restore job to enable read-write access. TM takes 90 seconds to detect that a farm is not available. Once TM has switched to the DR farm, we need to prevent it from switching back to the original primary farm when that farm comes back online, as it is no longer the primary.
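The failover sequence described above can be sketched as a small polling job. This is only an outline under stated assumptions: the class name and all four callables are hypothetical stand-ins for whatever the real job uses to query Traffic Manager, stop the restore job, bring the databases online for writes, and keep TM from routing back to the old primary.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FailoverMonitor:
    """Hypothetical polling job that runs on the DR farm."""
    primary_endpoint: str
    # All callables below are assumptions injected by the caller.
    resolve_active_endpoint: Callable[[], str]   # endpoint TM currently targets
    pause_restore_job: Callable[[], None]        # stop applying shipped logs
    make_databases_read_write: Callable[[], None]
    disable_primary_endpoint: Callable[[], None]  # prevent automatic failback
    failed_over: bool = False

    def poll_once(self) -> bool:
        """Check TM once; return True if this call performed the failover."""
        if self.failed_over:
            return False  # already promoted, nothing left to do
        if self.resolve_active_endpoint() == self.primary_endpoint:
            return False  # primary farm is still serving traffic
        # TM now targets the DR farm: promote it, in this order.
        self.pause_restore_job()
        self.make_databases_read_write()
        self.disable_primary_endpoint()
        self.failed_over = True
        return True


# Usage with in-memory fakes instead of real Azure or SQL Server calls.
state = {"active": "primary.example.net"}
actions = []
monitor = FailoverMonitor(
    primary_endpoint="primary.example.net",
    resolve_active_endpoint=lambda: state["active"],
    pause_restore_job=lambda: actions.append("pause restore job"),
    make_databases_read_write=lambda: actions.append("enable read-write"),
    disable_primary_endpoint=lambda: actions.append("disable primary in TM"),
)
monitor.poll_once()                    # primary still active, no action
state["active"] = "dr.example.net"     # simulate TM switching to DR
monitor.poll_once()                    # performs the three failover steps
print(actions)
```

Injecting the callables keeps the decision logic testable without any cloud resources. How often to poll is left to the caller; it adds to the 90 seconds TM itself needs to detect the outage.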
The issue now is that once the switch to the DR farm is permanent, there is no DR farm anymore. One has to be rebuilt, in the same way as the original DR farm. During patching, the DR farm can be used temporarily, but think about the SLA, as the DR farm will be read-only. Consider also using the Content Delivery Network to cache pages and other content.