Thursday, February 16, 2017

System Center Data Protection Manager 2016, Best Design Practices

The purpose of this blog article is to inspire and inform you, as a reader, about what you should consider before you start implementing System Center Data Protection Manager 2016 as the restore feature within your datacenter.
Throughout the years, I have seen a lot of different designs that haven't played out well. Hopefully readers of this article will avoid the pitfalls that lead either to reinstalling System Center Data Protection Manager or to realizing that you are not able to perform the restore scenario you first had in mind.
Having sorted out those questions, I can assure you that the road to a successful DPM implementation based on the BaaS concept will be much closer at hand.


The start: building the restore scenario

The most important part of backup has always been, and will always be, one thing: restore. This should be your major and primary focus in any design, regardless of technology.
A proper recovery plan will give you an optimal backup design and strategy that meets the company's recovery needs. However, this is exactly what the majority of companies miss. Building the restore scenario first results in a more optimal design when it comes to, for example, the number of DPM servers needed, the timing of recovery point creation, and archiving/long-term protection. The best way to get started with this strategy is to identify two things:

  • Infrastructural Services
  • Business Services

Your Infrastructural Services are, for example, your Active Directory, while your Business Services could be your CRM solution, which consumes the Infrastructural Services that build up the Business Service. The initial step in building a true restore scenario is to break down all the Infrastructural Services that cooperate to deliver the Business Service. Which Windows Servers are involved? Does the Business Service use SQL Server? Is there an IIS that the successful delivery of the Business Service also depends on from an end-user perspective? And so on.
A detailed breakdown of a Business Service will also help you identify and understand how you could consume Azure services like Azure Site Recovery, and other IaaS, PaaS, or SaaS services, in the most optimal way for the restore process of the Business Service.


Virtual vs. physical DPM servers

This is a very common discussion, even today, where people tend to keep their DPM servers physical instead of virtualizing them. The most important part of providing restore concepts for a company is being able to easily scale and provide more resources to the BaaS (Backup-as-a-Service) offering you deliver for your company, regardless of the company's size.
This is a simple task when you adopt a virtual concept for your DPM servers, where you can provide a deployment plan for new DPM servers that will be associated with your BaaS.
The most important takeaway: if you haven't virtualized your DPM servers yet and are thinking of deploying DPM 2016, Microsoft highly recommends that you virtualize them using Hyper-V to gain the ability to scale.


DPM disk setup

To get the most out of your DPM servers, both from a performance perspective and from a disaster recovery perspective (restoring the DPM server itself), you should use the following disk setup.

  • OS disk            (your %systemdrive%)
  • Program disk       (where you install all software)
  • DPMDB disk         (a dedicated disk for the DPMDB)
  • Azure Scratch disk (a disk dedicated to the MARS agent's scratch folder)
  • Recovery disk      (a disk dedicated to the prestaging procedure for the MARS agent)

All disks should be in the VHDX format, and the dedicated DPMDB disk should be fixed in size for performance reasons; all other disks can be dynamic.
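To make this concrete, here is a minimal provisioning sketch in Python that calls the Hyper-V New-VHD cmdlet. It assumes the disks are created on a Hyper-V host with the Hyper-V PowerShell module available; the paths and sizes are hypothetical placeholders, not sizing recommendations:

  # Sketch: create the five VHDX disks for a DPM VM via the Hyper-V New-VHD cmdlet.
  import subprocess

  def new_vhdx(path: str, size: str, fixed: bool = False) -> None:
      """Create a VHDX; fixed allocation for the DPMDB disk, dynamic otherwise."""
      alloc = "-Fixed" if fixed else "-Dynamic"
      cmd = f"New-VHD -Path '{path}' -SizeBytes {size} {alloc}"
      subprocess.run(["powershell.exe", "-NoProfile", "-Command", cmd], check=True)

  # The DPMDB disk is fixed in size for performance; the rest can stay dynamic.
  new_vhdx(r"D:\VHDs\DPM01-DPMDB.vhdx", "100GB", fixed=True)
  for name, size in [("OS", "80GB"), ("Program", "60GB"),
                     ("AzureScratch", "200GB"), ("Recovery", "200GB")]:
      new_vhdx(rf"D:\VHDs\DPM01-{name}.vhdx", size)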


SQL Server installation

There are some key points when it comes to delivering an optimal SQL Server installation for System Center Data Protection Manager 2016. Having dedicated service accounts for your DPM server is a must, as is using the correct collation for the SQL instance hosting the DPMDB. The only collation you should use is SQL_Latin1_General_CP1_CI_AS.
All other collations are unsupported, so keep in mind to use the right one, or you will end up reinstalling your DPM servers from the beginning.
Also, remember to set the amount of memory your SQL Server is allowed to consume. This should be set according to the amount of RAM the server has; spare at least 4-6 GB of RAM for the operating system. The SQL memory configuration is defined on the SQL instance that hosts the DPMDB.
A poor or incorrect setup or configuration will give you a negative impression of DPM. Keep in mind that SQL Server is the engine behind it all; if it is configured poorly, the engine will perform badly, simple as that.
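As a hedged illustration, the following Python sketch verifies the collation and caps the SQL Server memory. It assumes the pyodbc package and a trusted connection with sysadmin rights; the instance name DPMSQL01\DPMINSTANCE and the 26624 MB cap (a 32 GB host minus roughly 6 GB for the OS) are hypothetical examples:

  # Pre-flight sketch for the SQL instance hosting the DPMDB.
  import pyodbc

  conn = pyodbc.connect(
      "DRIVER={ODBC Driver 17 for SQL Server};"
      r"SERVER=DPMSQL01\DPMINSTANCE;Trusted_Connection=yes;",
      autocommit=True,  # sp_configure/RECONFIGURE must run outside a transaction
  )
  cur = conn.cursor()

  # DPM supports exactly one collation on the instance hosting the DPMDB.
  collation = cur.execute("SELECT SERVERPROPERTY('Collation')").fetchone()[0]
  assert collation == "SQL_Latin1_General_CP1_CI_AS", f"Unsupported collation: {collation}"

  # Leave 4-6 GB of RAM to the OS: on a 32 GB server, cap SQL at 26624 MB.
  cur.execute("EXEC sp_configure 'show advanced options', 1; RECONFIGURE;")
  cur.execute("EXEC sp_configure 'max server memory (MB)', 26624; RECONFIGURE;")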


Antivirus exclusions for DPM and the MARS agent

The most common performance challenge with DPM is real-time protection in the antivirus software not being set up correctly. If you don't configure exceptions for the relevant folders and processes, you may end up with corrupted data.
For System Center Data Protection Manager, you should exclude the following folders, which reside within the DPM installation folder:
  • XSD
  • Temp\MTA
  • Bin

The following processes should be excluded from real-time scanning:
  • CSC.EXE
  • DPMRA.EXE

For DPM servers that have the MARS agent installed and push data to Azure, it is important to exclude the following folders:
  • Microsoft Azure Recovery Services Agent\Bin
  • The Scratch folder

The following processes must be excluded from real-time scanning:
  • CBEngine.exe
  • CSC.EXE (applies to both DPM and the MARS agent)


Without the correct exclusions, you will end up with your local antivirus scanning your DPM disk pool or the scratch area, for example.
However, there is one more thing to keep in mind: when your antivirus software does find a threat, you shouldn't quarantine it (which is the most common policy); you should have it deleted by default.
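As an example, here is a Python sketch that registers these exclusions with the built-in Windows Defender via Add-MpPreference; other antivirus products have their own mechanisms, and the install paths below are assumed defaults that may differ in your environment:

  # Sketch: register the DPM and MARS exclusions with Windows Defender.
  import subprocess

  DPM_ROOT = r"C:\Program Files\Microsoft System Center 2016\DPM\DPM"
  MARS_ROOT = r"C:\Program Files\Microsoft Azure Recovery Services Agent"

  folders = [rf"{DPM_ROOT}\XSD", rf"{DPM_ROOT}\Temp\MTA", rf"{DPM_ROOT}\bin",
             rf"{MARS_ROOT}\bin", rf"{MARS_ROOT}\Scratch"]
  processes = ["csc.exe", "dpmra.exe", "cbengine.exe"]

  path_args = ",".join(f"'{p}'" for p in folders)
  proc_args = ",".join(f"'{p}'" for p in processes)
  cmd = f"Add-MpPreference -ExclusionPath {path_args} -ExclusionProcess {proc_args}"
  subprocess.run(["powershell.exe", "-NoProfile", "-Command", cmd], check=True)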


Protection Group Designs

This is one of my favorite topics from the field. I have seen a lot of interesting Protection Group designs throughout the years, and here are some thoughts that have played out very well for a large number of companies.
The first thing to keep in mind is the Protection Group name. The best starting point is to build your Protection Group designs according to your Business Services' RTO, RLO, and SLA (if any). The Recovery Time Objective (RTO) defines how fast you should be able to be back on track with your Business Service. System Center Data Protection Manager can synchronize your data changes as often as every fifteen minutes for workloads like SQL Server, Exchange, and File. For other workloads, like SharePoint and Hyper-V, DPM can create recovery points every thirty minutes. So, having a clear understanding of your Business Services and your Infrastructural Services is crucial, since the Protection Group is where you set the actual backup strategy or plan, which should correspond to your restore plans.
Regarding the naming of Protection Groups, there are a few tips that will hopefully inspire some DPM administrators. As a starting point, consider the following naming convention for your Protection Groups:
  • Workload + "number of recovery points per day or week" + time + sync info + Azure + time

An example for a Protection Group having this naming convention would be:
  • File (1RP/d 01:00PM | 6h Sync | Azure 01:00 AM)

In some cases, you could also include the retention range for your on-premises disk and for Azure to make it even clearer. An example would be:
  • File (1RP/d 01:00PM 30 days|No Sync|Azure 01:00AM 180 days)

In this latter example, you get a clear picture of what kind of members should be associated with this Protection Group, but most importantly of your restore capabilities. Let's break it down, shall we? The members of this Protection Group belong to the File workload. Every day a recovery point is made at 01:00 PM, with a retention policy stating that the data should be available for 30 days on-premises, meaning in the DPM disk pool. There is no extra synchronization for the members of this Protection Group, and all protected data is sent to Azure at 01:00 AM, where it is kept for 180 days back in time.
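To show how mechanical the convention is, here is a tiny, purely illustrative Python helper that assembles such a name; it mirrors the convention above and is not any DPM API:

  # Illustrative only: build a Protection Group name from the convention above.
  def pg_name(workload, rp_per_day, rp_time, disk_days, sync, azure_time, azure_days):
      return (f"{workload} ({rp_per_day}RP/d {rp_time} {disk_days} days"
              f"|{sync}|Azure {azure_time} {azure_days} days)")

  print(pg_name("File", 1, "01:00PM", 30, "No Sync", "01:00AM", 180))
  # -> File (1RP/d 01:00PM 30 days|No Sync|Azure 01:00AM 180 days)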


Replace your tapes, use Azure since it really works…

Many companies have now started their journey toward using Azure instead of tapes for their long-term retention. In the many business cases I have put together, Azure is by far the most optimal and cost-effective solution for long-term retention of workloads protected by System Center Data Protection Manager.
In more than 50% of those cases, using Azure cut the cost in half, even for companies that use TaaS (Tape-as-a-Service) solutions or a VTL.

One important fact to point out is the possibility of restoring data from a Recovery Services Vault that is shared between DPM servers. Let's say you have two or more DPM servers associated with the same Recovery Services Vault. If one DPM server fails, you are still able to restore the data that it pushed to Azure by supplying the Azure passphrase generated during the MARS setup. This is done via the Add External DPM feature in the Recovery pane in DPM. This means one simple thing: as long as you have your protected data in Azure, you will always be able to restore it.
