Workflow Manager 1.0 Refresh Disaster Recovery (further) Explained

With the release of SharePoint 2013, Microsoft introduced a new platform for workflows called Workflow Manager (WFM). As of this writing the current version is 1.0 Cumulative Update 3. Unfortunately, disaster recovery (DR) for this product is not as straightforward as just setting up database replication.

Following is a list of resources I’ve used to implement disaster recovery:

I found that each of the above references holds vital clues to making DR for WFM work, but none of them covered the details I was stumbling over. There are two basic areas where I needed to do additional research:

    • Certificates (which ones to use where and how to restore effectively)
    • Changing service accounts and admin groups upon a failover

As pointed out, there are plenty of TechNet articles and blogs that cover WFM Disaster Recovery (DR), so I am not going into detail on the individual steps. Instead, I decided to document my discoveries in the hope that others can benefit from my experience.

So, at a high level, the basic operation is as follows. I’ll have sections below describing each of the areas where I had concerns:

    • Install production WFM and configure
    • Configure your backup/replication strategy for the WF/SB databases
    • Install WFM in DR
    • Execute the failover process
    • Re-connect SharePoint 2013
    • (Optional) Changing RunAsAccount and AdminGroup

Install Production WFM and Configure

Certificates – AutoGenerate or custom Certs?

Installing WFM 1.0 CU3 is fairly well documented in several places, but the one piece that I feel needs to be called out is certificate configuration. You can auto-generate your certificates (self-signed), use your own domain certificates, or use certs acquired from a 3rd party certificate authority. Some businesses have no restrictions against self-signed certs, but the choice affects how you restore service in the DR environment. As noted in Spencer’s blog, there are a total of six or seven possible certificates. Auto-generating your WFM certificates will dictate your restoration process in a failover scenario. One reason for this is that the WorkflowOutbound certificate is created with private keys, but they are non-exportable.
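
Before settling on a certificate strategy, it can help to see which certificates the production farm references and whether their private keys can be exported. The following is a minimal sketch, meant to be run in the Workflow Manager PowerShell console on a production node. It assumes the certificates live in the LocalMachine Personal store and use legacy CSP keys; CNG-based keys do not expose CspKeyContainerInfo this way, so treat an empty Exportable column as "unknown" rather than "no".

```powershell
# Show farm-level settings, including the certificates each farm references.
Get-SBFarm | Format-List *
Get-WFFarm | Format-List *

# List certificates in the LocalMachine Personal store and report whether each
# private key is marked exportable.
# Assumption: keys are CSP-based; for other key types Exportable stays $null.
Get-ChildItem Cert:\LocalMachine\My | ForEach-Object {
    $exportable = $null
    if ($_.HasPrivateKey) {
        try { $exportable = $_.PrivateKey.CspKeyContainerInfo.Exportable } catch { }
    }
    [pscustomobject]@{
        Subject    = $_.Subject
        Thumbprint = $_.Thumbprint
        Exportable = $exportable
    }
} | Format-Table -AutoSize
```

Certificates that show as non-exportable here are the ones you should expect to regenerate, rather than copy, in DR.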

Configure Your Backup/Replication Strategy for the WF/SB Databases

The key to disaster recovery with WFM (as with many products) is the data store. In this case we are referring to the SQL Server databases. Again, this information is in the related links and there are two things to keep in mind:

  1. You can use pretty much any replication method (backup/restore, mirroring, log shipping) except for SQL Server 2012 AlwaysOn, which is unsupported at this time. It is also crucially important to back up the WF/SB databases as close in time as possible to the content databases in order to preserve WF instance integrity.

UPDATE: With the release of Workflow Manager CU 4, SQL AlwaysOn is now supported and should be considered as the High Availability/Disaster Recovery solution. You can find information on CU4 here. And you can find installation information here.

  2. You do not need to back up the management databases, WFManagementDb and SBManagementDb, as they will be re-created during the recovery process.
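
For the simple backup/restore approach, the points above could be sketched roughly as follows. This is an illustration only: it assumes the SQLPS module is available, and the backup share and the single message container are hypothetical (prod_SQL is the placeholder instance name used later in this post). Schedule it alongside your content database backups so the two sets stay close in time.

```powershell
# Assumption: the SQLPS module (or the newer SqlServer module) is installed here.
Import-Module SQLPS -DisableNameChecking

$sqlInstance = 'prod_SQL'            # placeholder production SQL instance
$backupDir   = '\\backupshare\wfm'   # hypothetical backup location
$stamp       = Get-Date -Format 'yyyyMMdd-HHmmss'

# Only the databases that must be restored in DR. The management databases are
# left out because they are re-created during recovery. Add further message
# containers (SBMessageContainer02, ...) if your farm has them.
$databases = 'WFResourceManagementDb', 'WFInstanceManagementDb',
             'SBGatewayDatabase', 'SBMessageContainer01'

foreach ($db in $databases) {
    Backup-SqlDatabase -ServerInstance $sqlInstance -Database $db `
        -BackupFile (Join-Path $backupDir "$db-$stamp.bak")
}
```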

Install WFM in DR

Depending on whether you want a cold or warm standby WFM farm, you will either have already installed the servers or will perform this as part of your recovery process. NOTE: WFM does *not* support a hot standby configuration. There are a couple of keys to your DR installation:

  • You will install the bits on the DR app servers, but you will *not* configure the product at this time.
  • If you are choosing to do a warm standby, then you may also import the necessary certificates ahead of time.
    • If you are using:
      • Auto-generated certificates, then export the Service Bus certificates from Prod and import them into DR; the Workflow Manager certificates can simply be auto-generated again in DR (remember that you cannot export the WF certificates because their private keys are marked as non-exportable)
      • Custom domain certificates, then you will export/import all of them from Prod to DR
  • The Service Bus root certificate should be imported into the LocalMachine Trusted Root Certification Authorities store (Cert:\LocalMachine\Root).
  • The other Service Bus certs should be imported into the LocalMachine Personal store (Cert:\LocalMachine\My).
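
The certificate imports on a DR node might look like the sketch below. The file names and folder are hypothetical, and it assumes the root certificate was exported as a .cer file and the remaining Service Bus certificates as password-protected .pfx files; adjust to however you exported them from Prod. Import-Certificate and Import-PfxCertificate ship with the PKI module on Windows Server 2012 and later.

```powershell
# Hypothetical file names; the .pfx files are assumed to carry private keys.
$pfxPassword = Read-Host -Prompt 'PFX password' -AsSecureString

# Service Bus root certificate into Trusted Root Certification Authorities.
Import-Certificate -FilePath 'C:\certs\sb-root.cer' `
    -CertStoreLocation Cert:\LocalMachine\Root

# Remaining Service Bus certificates into the Personal store.
foreach ($file in 'C:\certs\sb-farm.pfx', 'C:\certs\sb-encryption.pfx') {
    Import-PfxCertificate -FilePath $file `
        -CertStoreLocation Cert:\LocalMachine\My -Password $pfxPassword
}
```

Repeat this on every DR app server that will host Service Bus.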

Executing the Failover Process

In the event of a disaster (or just a need to fail over), the following process is required.

  1. Restore the 4+ SQL databases (WFResourceManagementDb, WFInstanceManagementDb, SBGatewayDatabase, SBMessageContainer01 through n) from prod_SQL to dr_SQL.
  2. Assuming the steps above have been followed to install WFM in DR, you now use PowerShell to restore the SB farm. If you are doing a true ‘cold standby’, you first need to install (but not configure) the SB/WF bits from Web Platform Installer.
  3. Restore the SBFarm, SBGateway, and MessageContainer databases and settings (do this on only one WFM node)
      • The SBManagementDB will be created in DR during this ‘restore’ process
      • The RunAsAccount *must* be the same as the credentials used in production
  4. Again using PowerShell, run Add-SBHost on each node of the farm.
  5. If you used auto-generated certificates for the WFFarm in prod, then restoring the WFFarm will auto-generate new ones. However, this also means that you may need to restore the PrimarySymmetricKey to the new SBNamespace.
  6. At this point, restore the WFFarm using PowerShell (do this on only one WFM node)
  7. Run Add-WFHost on each node of the farm.
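
Put together, the PowerShell portion of the failover could look roughly like the sketch below. Every value is a placeholder (account, thumbprints, namespace name, database names, sync time), and it shows the auto-generated variant for the WF certificates. Treat it as an outline of the order of operations and check each cmdlet's help for the parameters that apply to your farm.

```powershell
# Placeholders throughout; replace with values from your own environment.
$sql      = 'dr_SQL'
$runAs    = 'CONTOSO\svc-wfm'   # hypothetical; must match the production RunAsAccount
$runAsPwd = Read-Host -Prompt 'RunAs password' -AsSecureString
$certKey  = Read-Host -Prompt 'Certificate auto-generation key' -AsSecureString
$syncTime = Get-Date            # assumption: see Restore-WFFarm help for the right value

function New-ConnString([string]$Database) {
    "Data Source=$sql;Initial Catalog=$Database;Integrated Security=True"
}
$sbFarmDb = New-ConnString 'SBManagementDb'
$gwDb     = New-ConnString 'SBGatewayDatabase'
$mcDb     = New-ConnString 'SBMessageContainer01'
$wfFarmDb = New-ConnString 'WFManagementDb'
$wfInstDb = New-ConnString 'WFInstanceManagementDb'
$wfResDb  = New-ConnString 'WFResourceManagementDb'

# One node only: restore the Service Bus farm, gateway, and message container.
Restore-SBFarm -RunAsAccount $runAs -SBFarmDBConnectionString $sbFarmDb `
    -GatewayDBConnectionString $gwDb `
    -FarmCertificateThumbprint '<farm cert thumbprint>' `
    -EncryptionCertificateThumbprint '<encryption cert thumbprint>'
Restore-SBGateway -GatewayDBConnectionString $gwDb -SBFarmDBConnectionString $sbFarmDb
Restore-SBMessageContainer -Id 1 -SBFarmDBConnectionString $sbFarmDb `
    -ContainerDBConnectionString $mcDb

# Every node: join the host to the restored Service Bus farm.
Add-SBHost -SBFarmDBConnectionString $sbFarmDb -RunAsPassword $runAsPwd `
    -EnableFirewallRules $true

# Only if needed: re-apply the production symmetric key to the namespace.
# Set-SBNamespace -Name 'WorkflowDefaultNamespace' -PrimarySymmetricKey '<key from prod>'

# One node only: restore the Workflow farm (auto-generated certificates shown;
# with custom certs, pass the thumbprint parameters instead of the key).
Restore-WFFarm -WFFarmDBConnectionString $wfFarmDb `
    -InstanceDBConnectionString $wfInstDb -ResourceDBConnectionString $wfResDb `
    -RunAsAccount $runAs -SBNamespaceName 'WorkflowDefaultNamespace' `
    -InstanceStateSyncTime $syncTime -CertificateAutoGenerationKey $certKey

# Every node: join the host to the restored Workflow farm.
$sbClientConfig = Get-SBClientConfiguration -Namespaces 'WorkflowDefaultNamespace'
Add-WFHost -WFFarmDBConnectionString $wfFarmDb -RunAsPassword $runAsPwd `
    -SBClientConfiguration $sbClientConfig -EnableFirewallRules $true `
    -CertificateAutoGenerationKey $certKey
```

The "one node only" and "every node" comments matter: the Restore-* cmdlets rebuild the farm definition once, while Add-SBHost and Add-WFHost must be repeated on each server.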

At this point, the new WF Farm should be in a working state. You can test this by navigating to the endpoint in a browser and you should receive output similar to the image below:

[Image: Workflow Manager endpoint output shown in a browser]
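
If you prefer to check from PowerShell rather than a browser, something along these lines may help. The host name is hypothetical and 12290 is the default WFM HTTPS port; the web request will fail if the calling machine does not trust the WF SSL certificate.

```powershell
# Service status on every host in each farm.
Get-SBFarmStatus
Get-WFFarmStatus

# Assumption: hypothetical DR host name, default HTTPS port.
$response = Invoke-WebRequest -Uri 'https://wfm-dr.contoso.com:12290' -UseDefaultCredentials
$response.StatusCode
```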

Re-connect SharePoint 2013

If WF certificates were re-generated in DR, then you will need to recreate the SharePoint Trusted Root Authority. Export the WF SSL certificate and add it to the SharePoint farm using New-SPTrustedRootAuthority.

Create a new registration to the Workflow farm using Register-SPWorkflowService.
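
As a rough sketch of those two steps, run from the SharePoint Management Shell on a SharePoint server. The certificate path, trust name, and URLs are hypothetical; the .cer file is assumed to be the WF SSL certificate exported from the DR farm.

```powershell
Add-PSSnapin Microsoft.SharePoint.PowerShell -ErrorAction SilentlyContinue

# Trust the (re-generated) Workflow Manager SSL certificate.
$cert = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2 'C:\certs\wf-ssl.cer'
New-SPTrustedRootAuthority -Name 'Workflow Manager DR' -Certificate $cert

# Point SharePoint at the DR workflow farm; -Force overwrites the old registration.
Register-SPWorkflowService -SPSite 'https://sharepoint.contoso.com' `
    -WorkflowHostUri 'https://wfm-dr.contoso.com:12290' -Force
```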

SharePoint caches security trusts, so to see the change sooner you will likely need to run the timer job “Refresh Trusted Security Token Services Metadata feed.” with the following PowerShell:

Start-SPTimerJob -Identity 'RefreshMetadataFeed'

(Optional) Changing RunAsAccount and AdminGroup

Summary

The process above should work in most (if not all) scenarios, but I welcome any comments if you encounter problems or challenges. I’ve spent many hours on this, off and on, over the past 6 months and it’s very possible that I’ve missed something.

I’ll add the last section about changing service accounts once I have the complete set of steps for WF accounts. Service Bus added PowerShell cmdlets, which makes this easier, but Workflow Manager has not as of yet.

UPDATE: With the release of Workflow Manager CU 4, one can now change the credentials for the Workflow Manager Service with the Set-WFCredentials PowerShell cmdlet. You can find information on CU4 here. And you can find installation information here.


11 thoughts on “Workflow Manager 1.0 Refresh Disaster Recovery (further) Explained”

    • Hi Sarath! From my recollection these steps should work for either cold or warm standby… the basic difference would be whether you have the WFM farm pre-installed (warm) or not (cold).. Hope this helps!

      • Hi Brian,

        In my case, I’m using SQL 2014 AlwaysOn for async replication of content DBs between primary and DR SQL nodes. WFM is installed along with SharePoint on an application server. So when I pre-install the WFM farm in the DR site (warm), do I need to configure WFM with new databases and then, during failover, execute the restore commands for SB and WF?

      • Hi Sarath,

        You should be careful with this scenario as the currently released version of WFM still is not supported with SQL Availability Groups, to my knowledge. In this case you must rely on the simple backup/restore of the databases until something changes in that support policy.

        Brian

  1. Hi Brian,

    Yes, that is correct. I have a small confusion in this method, do we need to configure WFM or just install bits and certificates on DR app servers and leave it like you mentioned in the article ?
    Are we really configuring the WFM with new databases in warm standby method or restoring first the 4 databases and then restoring SB & WF farm using powershell?

    • For a warm standby you’d be installing WFM and the certificates, then during a failover event you would configure WFM using PowerShell, pointing at the restored databases. Hope that helps clear it up!

      • Thanks Brian! Just to be doubly sure: when you say configure using PowerShell, I think that is the same as what is described under “Process to run the restore commands” in https://technet.microsoft.com/en-us/library/jj730570.aspx. Is that correct?

        Also, If our SharePoint sites are accessed from external network, do we need an external certificate as WF, Service Bus and Outbound signing certificate? Or domain CA issued certificate can be used without any certificate error?

      • Great question, Sarath… so the issue here is that your workflow manager servers will never be contacted directly by the client. The communication is strictly between SharePoint and WFM; so you can use domain issued certificates if you so choose.

        About the restore commands, yes… that is exactly what I mean…

        Best of luck!

  2. Hello Brian,

    I’m starting the DR installation part for WFM and another question came to my mind. We discussed WFM farm will be restored using powershell during failover pointing to the restored databases.

    When we do failover activities multiple times (as part of DR testing), do we need to restore WFM farm using powershell everytime or just restoring the databases will be enough in this case?

    • Hey Sarath! When you run the Restore-* cmdlets for WFM it will make entries into the databases regarding the DR servers. I would be more comfortable running the entire process for each test. However I haven’t tried running the cmdlets on databases where the restore has already been run previously. 🙂
