Monthly Archives: October 2011

Exchange 2010: Implementing a dedicated backup network for a Database Availability Group…

Occasionally, customers that deploy highly available Mailbox servers use a dedicated or secondary network for backup operations. The way you do this in Exchange 2010 has changed, and when you implement such a configuration it is important not to use the methods or instructions designed for previous versions of Exchange.

 

In Exchange 2003 and Exchange 2007, we leveraged the Windows Cluster service and used the cluster resource model. To implement a dedicated backup network, you would create a virtual IP address resource in the Exchange resource group corresponding to the network on which you wanted to perform backups. The network associated with the IP address would then be configured as follows:

· Allow cluster network communication on this network and Allow clients to connect through this network enabled (Windows 2008 and Windows 2008 R2)

· All Communications (Windows 2003 and Windows 2003 R2).

On the backup server, a hosts file would be used to resolve the name of the Exchange Virtual Server (Exchange 2003) or Clustered Mailbox Server (Exchange 2007) to this dedicated IP address. Because each database was associated with a single server name, this allowed the backup server to connect to the Exchange server name on the private network and leverage the backup agent installed on the node hosting those resources.

 

In Exchange 2010, several changes prevent this type of implementation from functioning. Exchange 2010 still leverages the Windows Cluster service for some of its high availability functionality, but it no longer uses the cluster resource model. Exchange 2010 includes a form of application-level high availability known as the Database Availability Group (DAG). When an administrator creates a DAG, they must provide a name and one or more IP addresses. The DAG name and the list of IP addresses are used when forming the DAG’s underlying cluster: the DAG name becomes the Cluster Name Object (CNO), and the IP addresses are associated with the cluster network name. All IP addresses assigned to a DAG must be on the MAPI network. This network is automatically configured with both “Allow cluster network communication on this network” and “Allow clients to connect through this network” enabled.

 

Exchange 2010 implements tightly coupled network integration with the Cluster service. This integration manages the settings of all networks found on the DAG members, for example replication and backup networks. By default, Exchange 2010 sets these networks to “Allow cluster network communication on this network” but does not set “Allow clients to connect through this network”. When “Allow clients to connect through this network” is not enabled, virtual IP addresses cannot be bound to the network.

 

There is no straightforward way to create a dedicated backup network in Exchange 2010 using the legacy Exchange implementation. There are no application and service groups in which to create the IP address resources, ancillary networks do not support the settings necessary for a virtual IP address, and the integrated Exchange cmdlets do not allow you to assign an IP address to the DAG on a backup network.

 

Despite these challenges, some administrators have been semi-successful at implementing a variation of the solution (semi-successful as in, it works for a little while, but not for the long term). For example, administrators will find a way to change the network roles to allow virtual IPs to be created and will then update the Cluster core resources with the virtual IPs. This works, but only until Exchange reconfigures the network settings automatically, or until the administrator runs one of the integrated DAG cmdlets which removes the additional IP addresses being created.

 

So if that is the case, how does one implement a dedicated backup network, and what are some backup vendors doing? Unlike previous versions of Exchange, where databases were associated with a server name, Exchange 2010 does not associate databases with a single server name per se. Instead, copies of a database are associated with a server. These database copies can exist across multiple members of the DAG, with one member having an “active” database and one or more other members having “passive” databases. In order for backup vendors to determine database locations and status, some make a call to the DAG name to essentially perform a topology discovery. Remember that the DAG name is the name of the cluster, should have valid IP addresses for all subnets on which DAG members exist, and should be dynamically updating in DNS so that the DAG name always resolves correctly. Once this topology discovery is completed, the backup server initiates individual backup sessions to the DAG members themselves (via the DAG member names).

 

When implementing a dedicated backup network for Exchange 2010, do not modify the cluster core resources. The backup servers should be allowed to do topology discovery over the MAPI network. This requires that the backup server has an interface that can query DNS, determine the IP address currently resolving to the cluster name, and establish a connection to that address.

 

Next, modify the hosts file on each backup server. In this configuration, each DAG member has a backup network interface with an IP address locally assigned to that interface. The hosts file entries on the backup server are used to resolve DAG member names.

 

How does this accomplish the task? Let’s take a look.

 

Here is an example 3-member DAG. The DAG has two networks, a MAPI network and a Backup network. Each member has an IP address statically assigned to each NIC.

 

image

 

When the backup software attempts to perform a backup operation, it queries DNS and determines that the IP address associated with the DAG is 10.0.0.100. The backup software connects to the DAG and performs a topology discovery.

 

image

 

The backup software determines that a database that needs to be backed up resides on NodeA. Normally the backup software would access the backup agent by querying DNS for the IP address associated with NodeA. If this happens, the backup would occur over the MAPI network and not the Backup network. To prevent this from happening, the administrator edited the hosts file of the backup server and added an entry for NodeA.company.com with an IP address of 10.1.1.1. The backup server uses this information and establishes a connection to the backup software agent running on NodeA via the Backup network.
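As a rough illustration of the hosts file step, the following Python sketch generates the entries for the backup server. Only the NodeA.company.com to 10.1.1.1 mapping comes from the example above; the NodeB and NodeC names and addresses, the function name, and the choice to add the short name as an alias are assumptions made for illustration.

```python
# Hypothetical backup-network addresses for the DAG members.
# Only NodeA.company.com -> 10.1.1.1 comes from the example in this post;
# NodeB and NodeC are assumed values for illustration.
BACKUP_ADDRESSES = {
    "NodeA.company.com": "10.1.1.1",
    "NodeB.company.com": "10.1.1.2",
    "NodeC.company.com": "10.1.1.3",
}


def hosts_entries(backup_addresses):
    """Return one hosts-file line per DAG member: IP, FQDN, short name."""
    lines = []
    for fqdn, ip in backup_addresses.items():
        # Adding the short name as an alias is an assumption; some backup
        # agents may connect by short name rather than by FQDN.
        short_name = fqdn.split(".", 1)[0]
        lines.append(f"{ip}\t{fqdn}\t{short_name}")
    return lines


if __name__ == "__main__":
    # The output is intended for the hosts file on the backup server.
    print("\n".join(hosts_entries(BACKUP_ADDRESSES)))
```

Note that the DAG name itself is deliberately left out: it should continue to resolve through DNS to an address on the MAPI network so that topology discovery keeps working.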

 

image

 

In some scenarios, customers want to stop cluster heartbeat and continuous replication from using the Backup network. To do this, you can use the Set-DatabaseAvailabilityGroupNetwork cmdlet with the -IgnoreNetwork:$true and -ReplicationEnabled:$false parameters for the Backup network.

 

When implemented as described above, this configuration enables most backup applications to operate using a dedicated network without changing DAG settings that might be automatically disabled by the system.

 

================================

10/25/2011

Corrected location of host file entry change.

================================

 

================================

12/18/2011

This weekend I reviewed a configuration that was using Backup Exec to protect their Exchange 2010 deployment.  It was pointed out to me that on the properties of the job, on the network and security selection, there is an option to select which network / subnet could be utilized for backup operations.  Some may find this helpful when configuring their jobs.  I would defer to Symantec on specifics of establishing this configuration.

================================

Exchange 2010: Get-DatabaseAvailabilityGroup does not return all attributes of a DAG

When administrators run Get-DatabaseAvailabilityGroup, they notice that certain fields within the output are not populated. In some scenarios this leads administrators to attempt to change settings of the DAG or to believe that there is an operational issue with the DAG.

 

[PS] C:\Windows\system32>Get-DatabaseAvailabilityGroup DAG | fl

RunspaceId                             : 37b4f2b4-06b8-4e87-9252-968452ab3a28
Name                                   : DAG
Servers                                : {DAG-4, DAG-3, DAG-2, DAG-1}
WitnessServer                          : mbx-1.domain.com
WitnessDirectory                       : c:\DAG-FSW
AlternateWitnessServer                 : mbx-2.domain.com
AlternateWitnessDirectory              : c:\DAG-FSW
NetworkCompression                     : Enabled
NetworkEncryption                      : Enabled
DatacenterActivationMode               : DagOnly
StoppedMailboxServers                  : {}
StartedMailboxServers                  : {DAG-1.domain.com, DAG-2.domain.com, DAG-3.domain.com, DAG-4.domain.com}
DatabaseAvailabilityGroupIpv4Addresses : {10.0.0.24}
DatabaseAvailabilityGroupIpAddresses   : {10.0.0.24}
AllowCrossSiteRpcClientAccess          : False
OperationalServers                     :
PrimaryActiveManager                   :
ServersInMaintenance                   :
ThirdPartyReplication                  : Disabled
ReplicationPort                        : 0
NetworkNames                           : {}
WitnessShareInUse                      :
AdminDisplayName                       :
ExchangeVersion                        : 0.10 (14.0.100.0)
DistinguishedName                      : CN=DAG,CN=Database Availability Groups,CN=Exchange Administrative Group (FYDIB
                                         OHF23SPDLT),CN=Administrative Groups,CN=Organization,CN=Microsoft Exchange
                                         ,CN=Services,CN=Configuration,DC=Domain,DC=com
Identity                               : DAG
Guid                                   : 72c87136-6721-46e6-ac43-2ad5f6bd66d2
ObjectCategory                         : domain.com/Configuration/Schema/ms-Exch-MDB-Availability-Group
ObjectClass                            : {top, msExchMDBAvailabilityGroup}
WhenChanged                            : 10/13/2011 12:29:44 PM
WhenCreated                            : 9/19/2009 6:16:52 PM
WhenChangedUTC                         : 10/13/2011 4:29:44 PM
WhenCreatedUTC                         : 9/19/2009 10:16:52 PM
OrganizationId                         :
OriginatingServer                      : DC-2.domain.com
IsValid                                : True

Several fields in the output above (OperationalServers, PrimaryActiveManager, ServersInMaintenance, ReplicationPort, NetworkNames, and WitnessShareInUse) are empty or show default values because they cannot simply be read from Active Directory. These fields require calls to cluster services and / or replication services running on each member of the DAG. Depending on the number of members, network conditions, geographical locations of members, and so on, querying for these as a routine part of running the command could require additional time for the command to complete. Administrators that want this information can add the -Status switch to Get-DatabaseAvailabilityGroup, and the fields will be returned as part of the output. Here is an example:

 

[PS] C:\Windows\system32>Get-DatabaseAvailabilityGroup DAG -Status | fl

RunspaceId                             : 37b4f2b4-06b8-4e87-9252-968452ab3a28
Name                                   : DAG
Servers                                : {DAG-4, DAG-3, DAG-2, DAG-1}
WitnessServer                          : mbx-1.domain.com
WitnessDirectory                       : c:\DAG-FSW
AlternateWitnessServer                 : mbx-2.domain.com
AlternateWitnessDirectory              : c:\DAG-FSW
NetworkCompression                     : Enabled
NetworkEncryption                      : Enabled
DatacenterActivationMode               : DagOnly
StoppedMailboxServers                  : {}
StartedMailboxServers                  : {DAG-1.domain.com, DAG-2.domain.com, DAG-3.home.e-mcmichae
                                         l.com, DAG-4.domain.com}
DatabaseAvailabilityGroupIpv4Addresses : {10.0.0.24}
DatabaseAvailabilityGroupIpAddresses   : {10.0.0.24}
AllowCrossSiteRpcClientAccess          : False
OperationalServers                     : {DAG-1, DAG-2, DAG-3, DAG-4}
PrimaryActiveManager                   : DAG-2
ServersInMaintenance                   : {DAG-4}
ThirdPartyReplication                  : Disabled
ReplicationPort                        : 64327
NetworkNames                           : {DAG-iSCSI, DAG-MAPI, DAG-REPL-A, DAG-REPL-B}
WitnessShareInUse                      : Primary
AdminDisplayName                       :
ExchangeVersion                        : 0.10 (14.0.100.0)
DistinguishedName                      : CN=DAG,CN=Database Availability Groups,CN=Exchange Administrative Group (FYDIB
                                         OHF23SPDLT),CN=Administrative Groups,CN=Organization,CN=Microsoft Exchange
                                         ,CN=Services,CN=Configuration,DC=domain,DC=com
Identity                               : DAG
Guid                                   : 72c87136-6721-46e6-ac43-2ad5f6bd66d2
ObjectCategory                         : domain.com/Configuration/Schema/ms-Exch-MDB-Availability-Group
ObjectClass                            : {top, msExchMDBAvailabilityGroup}
WhenChanged                            : 10/13/2011 12:29:44 PM
WhenCreated                            : 9/19/2009 6:16:52 PM
WhenChangedUTC                         : 10/13/2011 4:29:44 PM
WhenCreatedUTC                         : 9/19/2009 10:16:52 PM
OrganizationId                         :
OriginatingServer                      : DC-2.domain.com
IsValid                                : True

In general, the command without the -Status switch returns the necessary day-to-day information regarding the configuration of the DAG.

Error -567 (JET_errDbTimeTooNew)

I recently worked with a customer who presented an interesting issue. The customer had installed an Exchange 2007 Cluster Continuous Replication solution where nodes leveraged independent direct attached storage. On the active node of the cluster, the customer began to experience storage issues that led to physical corruption of the Exchange database files. In this case the corruption noted was a -1018. In almost all cases a -1018 is indicative of a storage-level error that occurs at a layer lower than the operating system.

 

In most cases customers would choose to switch the solution over to the passive copies. After all, the passive copies are completely independent and physical corruption does not replicate through the log file stream. In this instance that is what the customer chose to do. Although the move appeared successful (in terms of resources moving from NodeA to NodeB), several databases did not mount. Subsequent attempts to manually mount the databases also failed. When reviewing the event logs, the following events were noted on the passive node (now attempting to become active) for the databases that failed to mount:

 

Time:     9/22/2011 7:19:59 PM
ID:       516
Level:    Error
Source:   ESE
Machine:  Node.company.com

Message:  Microsoft.Exchange.Cluster.ReplayService (6032) Recovery E03 SGSGCDBG001SGSGCDBG001-SG21: Database m:SG21MDBSGSGCDBG001-SG21-JE-T2.edb: Page 118956 (0x0001d0ac) failed verification due to a timestamp mismatch.  The expected timestamp was 0x39cd983 but the actual timestamp on the page was 0x39cdd83.  Recovery/restore will fail with error -567.  If this condition persists then please restore the database from a previous backup. This problem is likely due to faulty hardware "losing" one or more flushes on this page sometime in the past. Please contact your hardware vendor for further assistance diagnosing the problem.

Time:     9/22/2011 7:19:28 PM
ID:       2095
Level:    Error
Source:   MSExchangeRepl
Machine:  Node.company.com

Message:  Log file X:SG38LogsE250002D510.log in SGSGCDBG001SGSGCDBG001-SG38 could not be replayed. Re-seeding the passive node is now required. Use the Update-StorageGroupCopy cmdlet in the Exchange Management Shell to perform a re-seed operation.

Time:     9/22/2011 7:19:28 PM
ID:       2097
Level:    Error
Source:   MSExchangeRepl
Machine:  Node.company.com
Message:  The Microsoft Exchange Replication Service encountered an unexpected Extensible Storage Engine (ESE) exception in storage group ‘SGSGCDBG001SGSGCDBG001-SG38’. The ESE exception is dbtime on page in advance of the dbtimeBefore in record (-567) ().

Looking up error -567 with the ERR tool returns:

 

# for decimal -567 / hex 0xfffffdc9
  JET_errDbTimeTooNew                                            esent98.h
# /* dbtime on page in advance of the dbtimeBefore in record
# */
# 1 matches found for "-567"
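The decimal and hex values in the ERR output are the same 32-bit number viewed as signed and unsigned. A small Python sketch of that conversion (the helper names are made up for illustration and are not part of the ERR tool):

```python
def to_hex32(error_code):
    """Format a signed 32-bit error code as unsigned 32-bit hex."""
    return f"0x{error_code & 0xFFFFFFFF:08x}"


def from_hex32(text):
    """Interpret a 32-bit hex string as a signed (two's complement) integer."""
    value = int(text, 16)
    return value - 0x100000000 if value >= 0x80000000 else value


print(to_hex32(-567))            # 0xfffffdc9
print(from_hex32("0xfffffdc9"))  # -567
```

This can be handy when an event log or tool reports the error in one form and the documentation lists it in the other.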

 

DBTime errors are also usually attributed to failing storage. In this case, though, something did not seem right: would it be possible for two independent storage devices to experience failures almost simultaneously? In all honesty, it is not outside the realm of possibility. However, unlike the first set of storage, which was clearly displaying hardware faults, the second storage device displayed no faults at all. There was no way to attribute this particular error to a local storage failure.

 

Is it possible that a storage failure on the previously active node caused corruption in a passive database copy?  In this case the answer is…yes…

 

First and foremost, one must draw a distinction between the types of corruption we can have within an Exchange database: physical corruption and logical corruption. Logical corruption is the type of corruption that replicates through the log stream and may subsequently cause corruption within a passive database copy. DBTime errors are considered logical corruption. Therefore, this logical corruption is introduced into the log file stream and subsequently replicated over to the passive node. DBTime corruption halts logging recovery of the database instance, which prevents the copy from being a healthy replica (when not being activated) or prevents the database from mounting (when being activated).

 

So how is it possible that this type of logical corruption occurred because of a storage failure on the source copy?  Let’s investigate…

 

DBTime is a counter within an Exchange database that is incremented for each page level change that occurs – think of DBTime as the odometer in your car.  If one determines the highest DBTime, and locates the page within the database that matches that DBTime, they have identified the last page changed within the database.  When a page change occurs and that change is recorded to the log file we record three pieces of important information (although there is much more recorded).  This information includes:

 

1)  The page number that is being changed.

2)  The current DBTime on that page.

3)  The new DBTime for the page.

 

When logging recovery occurs the following rules are applied where DBTime is concerned:

 

1)  Does the page in the log record exist in the database?

2)  Does the previous DBTime recorded in the log record for the page match the current DBTime stamped on the page?

3)  Is the new DBTime recorded in the log record for the page greater than the current DBTime stamped on the page?

 

When all page modifications have been applied to a database in the correct order, the three rules are always met. In this case, JET_errDbTimeTooNew occurs when check number 2 fails. Simply put, when the previous DBTime recorded in the log record is compared to the current DBTime stamped on the same page within the database, the database has a newer value. If the change were allowed to be committed, data loss would occur, so logging recovery is halted.
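The three checks can be sketched in Python as follows. This is a simplified model of the rules as described in this post, not the actual ESE implementation; the `LogRecord` type, the `check_redo` function, the status strings other than JET_errDbTimeTooNew, and the page number and DBTime values in the usage lines are all assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    page: int           # page number being changed
    dbtime_before: int  # DBTime on the page when the change was logged
    dbtime_after: int   # new DBTime for the page


def check_redo(pages, record):
    """Apply the three DBTime rules; pages maps page number -> DBTime on page."""
    # Rule 1: the page in the log record must exist in the database.
    if record.page not in pages:
        return "page missing"
    on_page = pages[record.page]
    # Rule 2: the previous DBTime in the record must match the page.
    if on_page > record.dbtime_before:
        return "JET_errDbTimeTooNew"  # the page is newer than the record expects
    if on_page != record.dbtime_before:
        return "dbtime mismatch"      # other mismatches are not covered here
    # Rule 3: the new DBTime must be greater than the DBTime on the page.
    if record.dbtime_after <= on_page:
        return "new dbtime not greater"
    return "ok"


# Hypothetical page 7 with made-up DBTime values.
print(check_redo({7: 100}, LogRecord(7, 100, 120)))  # ok
print(check_redo({7: 150}, LogRecord(7, 100, 120)))  # JET_errDbTimeTooNew
```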

 

Still – how did this happen because of storage issues on the previously active node?  Let’s explore that further…

 

So let’s assume we can isolate all changes to a single page.  In this case we will say page 45.  Page 45 has a current DBTime of 400:

 

image

 

At this time we make a change to page 45.  The log record records page 45, current DBTime 400, and new DBTime of 410.  When the change is committed to the database the DBTime on the page is updated to the new DBTime.

 

image

 

Another page change to page 45 occurs.  The log record records page 45, current DBTime 410, and new DBTime of 500.  When the change is committed to the database the DBTime on the page is updated to the new DBTime.

 

image

 

Now this is where the storage issue is introduced. For this second change, Exchange was informed by the storage subsystem that the write to the database was successful, and Exchange then purged the change from the database cache. Unfortunately the write was “lost”, so the DBTime on the page on disk was never updated and still reads 410. Now a new change to page 45 is issued. The log record records page 45, current DBTime 410, and new DBTime 600. (See the issue? The current DBTime on the page should have been 500, but because of the lost flush it is 410.)
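A toy Python model of the active node shows how the lost flush ends up in the log stream. It assumes, as a simplification of the description above, that each change reads the page's DBTime back from disk because the cache entry was purged; the `ActiveDatabase` class and its `lose_flush` flag are invented for this sketch.

```python
class ActiveDatabase:
    """Toy model of the active copy: on-disk page DBTimes plus a log stream."""

    def __init__(self, page, dbtime):
        self.disk = {page: dbtime}  # DBTime actually stamped on disk
        self.log = []               # (page, dbtime_before, dbtime_after)

    def change(self, page, new_dbtime, lose_flush=False):
        # The "current DBTime" in the log record is whatever the page shows
        # when it is read back from disk.
        before = self.disk[page]
        self.log.append((page, before, new_dbtime))
        # Storage reports success either way; a lost flush never hits disk.
        if not lose_flush:
            self.disk[page] = new_dbtime


db = ActiveDatabase(page=45, dbtime=400)
db.change(45, 410)                   # first change, written correctly
db.change(45, 500, lose_flush=True)  # second change, write is lost
db.change(45, 600)                   # third change sees the stale 410
print(db.log)  # [(45, 400, 410), (45, 410, 500), (45, 410, 600)]
```

The active copy never notices anything wrong: from its point of view every log record matched the page it read.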

 

image

 

At this time the administrator moves the clustered mailbox server to the passive node.  The log file referenced in this example now needs to be replayed into the database.  Remember the rules above for log replay in terms of DB Time.  So in our example the first log record is encountered:

 

Does page 45 exist = Yes.

Is the current dbTime on the page 400 = Yes.

Is the new dbTime greater than current dbTime on the page = Yes.

Transaction allowed to proceed = Yes.

 

image

 

The second log record is now evaluated. 

 

Does page 45 exist = Yes.

Is the current dbTime on the page 410 = Yes.

Is the new dbTime greater than current dbTime on the page = Yes.

Transaction allowed to proceed = Yes.

 

image

 

Note that the dbTime was correctly updated to 500.  This was the previous write that was lost to the original database (and therefore the dbTime was never updated in the original database).

 

The third log record is now evaluated.

 

Does page 45 exist = Yes.

Is the current dbTime on the page 410 = NO.

Is the new dbTime greater than current dbTime on the page = Yes.

Transaction allowed to proceed = NO (JET_errDbTimeTooNew).

image

 

As you can see, the dbTime on the page, which is currently 500, is greater (or newer) than the previous dbTime recorded for the page in the log record from the previously active database (410). Therefore the commit is blocked and logging recovery fails. In our case this prevented the CMS from coming online on the second node.
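The replay walkthrough can be reproduced with a short Python sketch. It uses the three log records from the page 45 example and a passive copy whose page starts at DBTime 400; the `replay` function and the `DbTimeTooNew` exception are illustrative names, and the model is a simplification of logging recovery rather than how ESE is implemented.

```python
# Log records from the example: (page, dbtime_before, dbtime_after).
LOG_RECORDS = [(45, 400, 410), (45, 410, 500), (45, 410, 600)]


class DbTimeTooNew(Exception):
    """Stand-in for JET_errDbTimeTooNew (-567)."""


def replay(pages, records):
    """Replay log records into pages (page number -> DBTime), in order."""
    for page, before, after in records:
        if page not in pages:
            raise KeyError(f"page {page} does not exist")
        on_page = pages[page]
        if on_page > before:
            raise DbTimeTooNew(
                f"page {page}: dbtime on page {on_page} is ahead of "
                f"dbtimeBefore {before} (-567)"
            )
        if on_page != before or after <= on_page:
            raise ValueError(f"page {page}: unexpected dbtime {on_page}")
        pages[page] = after  # change applied, page stamped with new DBTime
    return pages


passive = {45: 400}
try:
    replay(passive, LOG_RECORDS)
except DbTimeTooNew as err:
    print(err)
print(passive)  # {45: 500}
```

The passive copy stops at DBTime 500, which is the correct state after the second change, and refuses the third record because that record was written against a page that had silently lost an update.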

 

To rectify this situation, we would normally have moved the CMS back to the original node and then reseeded these database copies (therefore bypassing the need to replay the log file in question). In this case a restoration was necessary, since both database copies were essentially unusable due to the mix of corruption present. Unfortunately, this type of corruption is only ever detected during logging recovery; there is no proactive way to detect this form of logical corruption.