Monthly Archives: May 2010

Network port design and Exchange 2010 Database Availability Groups

(An Exchange 2007 version of this article can be found here:  http://blogs.technet.com/b/timmcmic/archive/2009/04/27/network-port-design-and-exchange-2007-clusters.aspx)

A question that has come up is how many network ports DAG members should have and how those ports should be used.

I generally see three different hardware configurations:

  • Two network ports.
    • Usually two onboard <or> 1 onboard / 1 add-on.
  • Three network ports.
    • Usually 2 onboard / 1 add-on.
  • Four network ports.
    • Usually 2 onboard / 2 add-on.

Some hardware now offers four-port cards.  The information contained here can be expanded to include additional hardware / port configurations as they become available.

You’ll note that there is no configuration with a single network port.  I personally do not recommend having only a single network port, even though this is now a supported implementation.  (Note:  multiple VLANs on a single port do not count as two network interfaces.)

Network Teaming

In the recommendations I’ll outline next you will see references to the use of network teaming.  It’s important to note that Microsoft does not support network teaming itself, as this is a technology designed and supported by the hardware vendor.  The recommendation is instead a recognition that, in the absence of any other way to provide multiple client facing ports for Exchange, network teaming does have a valid place in the overall high availability design.

When using network teaming, only the client facing network should be a teamed adapter, and the team should always be created for NETWORK FAULT TOLERANCE.  Do not, for an Exchange instance, use any type of load balancing between ports.

For non-client facing networks it is not necessary to implement a network team (these would typically be your “heartbeat” networks).  Windows clustering has the ability to balance and use all interfaces on the cluster designated for cluster use without the need to establish teaming for cluster / heartbeat communications.

From a support perspective any customer that establishes a teamed interface for the client side network should recognize that they may be asked to dissolve the team to support troubleshooting efforts.

MAPI Networks

For Exchange 2010 DAG MAPI networks I recommend using a network fault tolerant team consisting of two ports.  More ports may be utilized if they are available.

Replication Networks

After a team has been utilized for the MAPI network, the remaining network interfaces can be divided into replication networks.  I do not recommend that any form of network teaming be utilized on replication networks.  Teaming on replication networks, although supported, is redundant:  both the replication service and cluster service have the ability to switch between these additional networks as necessary.  Each additional network must be on its own subnet, and the subnets of the different networks may not overlap on the host.
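The non-overlapping subnet requirement can be checked on paper before any adapters are configured.  The following is a minimal Python sketch, not an Exchange or cluster tool; the function name, network names, and address ranges are hypothetical examples chosen only for illustration.

```python
import ipaddress
from itertools import combinations


def find_overlapping_subnets(networks):
    """Return pairs of network names whose subnets overlap.

    `networks` maps a descriptive name to a CIDR string.
    """
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in networks.items()}
    return [
        (a, b)
        for a, b in combinations(sorted(parsed), 2)
        if parsed[a].overlaps(parsed[b])
    ]


if __name__ == "__main__":
    # Hypothetical plan: one teamed MAPI network and two replication networks.
    planned = {
        "MAPI": "192.168.0.0/24",
        "Replication-1": "10.0.1.0/24",
        "Replication-2": "10.0.1.128/25",  # deliberately overlaps Replication-1
    }
    for a, b in find_overlapping_subnets(planned):
        print(f"{a} overlaps {b}")
```

An empty result means each planned network sits on its own subnet; any pair that is reported would need to be re-addressed.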

Cluster Networks

There is no reason to establish dedicated cluster heartbeat networks with Exchange 2010 DAG members, as cluster can utilize all configured interfaces between hosts for heartbeat exchange.

==============================

Updated – 6/2/10 – It is supported to use teaming on non-client facing networks although in theory this is redundant as both the replication service and cluster service have the ability to utilize multiple secondary interfaces.

==============================

Upgrading service packs on a single node Exchange 2007 cluster.

There are some installations that utilize a single node cluster hosting a clustered mailbox server for Exchange 2007.  It may become necessary to upgrade the solution to a different service pack.  This process can introduce some issues, as the upgrade processes assume that there is a second node in the cluster that can own both the cluster core resources and Exchange resources.

I wanted to outline a process that can be utilized to upgrade service packs on a single node cluster.

In order to begin the upgrade process, the cluster and Exchange services must be stopped.  This can be performed on the Exchange server by:

1)  Open an instance of Exchange Management Shell.

2)  Issue a Stop-ClusteredMailboxServer -Identity <CMSName> -StopReason "Upgrade"

3)  Stop the cluster core resources group / cluster group.  Issue a cluster.exe group "cluster group" /offline.

4)  Exit the Exchange Management Shell

At this point the Exchange services will be stopped as well as the cluster core resources.  We should be able to begin the upgrade of the Exchange binaries at this time.  Use the following process to begin the Exchange upgrade.

The upgrade of the mailbox role binaries is completed by running this command from the service pack media:

setup.com /mode:upgrade

When the mailbox role upgrade has completed you can install any pending roll up updates.

The next step of the upgrade is to issue an upgrade to the clustered mailbox server.  To upgrade the clustered mailbox server:

1)  Start the cluster core resources by issuing the following command:  cluster.exe group "cluster group" /online

2)  At this point the cluster core resources are online and cluster services should be started with the Exchange resources in an offline state (the same state prior to stopping the cluster services).

3)  Upgrade the clustered mailbox server by running the command:  setup.com /upgradeCMS.  When the command has completed the Exchange resources should be online on the node where the upgrade was performed.

These instructions were tested using Exchange 2007 SP1 on a Windows 2008 single node Cluster Continuous Replication (CCR) installation.

Exchange 2010: Cluster core resources, the replication service, and active manager…

Every Exchange 2010 server has a process internal to the replication service known as Active Manager.  The Active Manager is responsible for all database mount, dismount, and move operations that occur in Exchange 2010.

When a server is a standalone server, Active Manager is configured as a Standalone Active Manager. 

When a server is a member of a Database Availability Group (DAG), Active Manager is either configured as:

  • PAM – Primary Active Manager
  • SAM – Secondary Active Manager

The Active Manager status in a DAG is determined by the node that owns the cluster core resources.  If a node owns the cluster core resources group, this node is then known as the Primary Active Manager (PAM).  All other nodes successfully participating in the cluster and not owning the cluster core resources are Secondary Active Managers.

Let’s take a look at an example database availability group.

DAGName:  DAG

DagMembers:  DAG-1,DAG-2,DAG-3,DAG-4

Running Get-DatabaseAvailabilityGroup -Identity DAG -Status | fl name,primaryActiveManager, you can determine which machine currently owns the cluster core resources and is acting as the PAM.

Get-DatabaseAvailabilityGroup -Identity DAG -Status | fl name,primaryactivemanager

Name                 : DAG
PrimaryActiveManager : DAG-3

Using cluster.exe we can also confirm the owner of the cluster core resources group

cluster.exe DAG.domain.com group

Group                Node            Status
-------------------- --------------- ------
cluster group        DAG-3           Online

Using the cluster command line, the cluster core resources can be moved to another DAG member and the PAM will subsequently change.

cluster.exe DAG.domain.com group "cluster group" /moveto:DAG-4

Moving resource group 'cluster group'...

Group                Node            Status
-------------------- --------------- ------
cluster group        DAG-4           Online

Get-DatabaseAvailabilityGroup -Identity DAG -Status | fl name,primaryactivemanager

Name                 : DAG
PrimaryActiveManager : DAG-4
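If this comparison needs to be repeated often, the two outputs above can be cross-checked programmatically.  The following is a minimal Python sketch that only parses text in the layout shown in this post; the function names are hypothetical, and the sample strings are copied from the output above rather than captured from a live system.

```python
def parse_format_list(text):
    """Parse 'Name : Value' lines, as printed by `fl`, into a dict."""
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            result[key.strip()] = value.strip()
    return result


def cluster_group_owner(text, group="cluster group"):
    """Return the node that owns `group` in `cluster.exe <name> group` output."""
    for line in text.splitlines():
        if line.lower().startswith(group.lower()):
            # Remaining columns are: owning node, status.
            fields = line[len(group):].split()
            if len(fields) >= 2:
                return fields[0]
    return None


if __name__ == "__main__":
    fl_output = "Name                 : DAG\nPrimaryActiveManager : DAG-4"
    cluster_output = (
        "Group                Node            Status\n"
        "-------------------- --------------- ------\n"
        "cluster group        DAG-4           Online"
    )
    pam = parse_format_list(fl_output)["PrimaryActiveManager"]
    owner = cluster_group_owner(cluster_output)
    print(f"PAM={pam} owner={owner} match={pam == owner}")
```

A mismatch between the two values would be worth investigating, since the PAM is expected to be the node that owns the cluster core resources group.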

Remember that Active Manager runs inside the Microsoft Exchange Replication service which is installed on every Exchange 2010 Mailbox Role Server.  This is important – if the replication service on a DAG member is not started, but that DAG member owns the cluster core resources, database mount / dismount / move functionality will not function.

Here is an example…

Currently the cluster core resources are owned on the node DAG-4 which is successfully participating in the cluster DAG.  Using the services control panel the Microsoft Exchange Replication service on the server DAG-4 was stopped.  We can confirm using the commands above that DAG-4 is still seen as the PAM.

Get-DatabaseAvailabilityGroup -Identity DAG -Status | fl name,primaryactivemanager

Name                 : DAG
PrimaryActiveManager : DAG-4

cluster dag.domain.com group
Listing status for all available resource groups:

Group                Node            Status
-------------------- --------------- ------
Cluster Group        DAG-4           Online
Available Storage    DAG-1           Offline

Using test-replicationHealth and test-serviceHealth we can see that the replication service on node DAG-4 is unavailable.

Server          Check                      Result     Error
------          -----                      ------     -----
DAG-4           ClusterService             Passed
DAG-4           ReplayService              *FAILED*   The Microsoft Exchange Replication service is not running on s...
DAG-4           DagMembersUp               Passed

Role                    : Mailbox Server Role
RequiredServicesRunning : False
ServicesRunning         : {IISAdmin, MSExchangeADTopology, MSExchangeIS, MSExchangeMailboxAssistants, MSExchangeMailSubmission, MSExchangeRPC, MSExchangeSA, MSExchangeSearch, MSExchangeServiceHost, MSExchangeThrottling, MSExchangeTransportLogSearch, W3Svc, WinRM}
ServicesNotRunning      : {MSExchangeRepl}

At this time a dismount operation on a database was issued using the Dismount-Database command.  An error is immediately returned:

Dismount-Database DAG-DB0

Confirm
Are you sure you want to perform this action?
Dismounting database "DAG-DB0". This may result in reduced availability for mailboxes in the database.
[Y] Yes  [A] Yes to All  [N] No  [L] No to All  [?] Help (default is "Y"): y

Couldn’t dismount the database that you specified. Specified database: DAG-DB0; Error code: An Active Manager operation
failed. Error: The Microsoft Exchange Replication service may not be running on server DAG-4.domain.com. Specific RPC error message: Error 0x6d9 (There are no more endpoints available from the endpoint mapper) from cli_MountDatabase.
    + CategoryInfo          : InvalidOperation: (DAG-DB0:ADObjectId) [Dismount-Database], InvalidOperationException
    + FullyQualifiedErrorId : D64CA7E2,Microsoft.Exchange.Management.SystemConfigurationTasks.DismountDatabase

 

This error occurs because the server that is designated as the Primary Active Manager does not have its replication service running (and therefore the Active Manager is not running).  Stopping the replication service does not automatically arbitrate Active Manager functions to another DAG member.

To fix this error:

  • Start the replication service on the machine that is designated as the Primary Active Manager (preferred).
  • Move the cluster core resources to another DAG member, promoting that server to the Primary Active Manager (least preferred, since it does not address why the replication service is stopped on a running DAG member).

It is important that the replication service be monitored on all DAG members to ensure it remains functional.

*Updated – 5/30/2010 – Corrected the cmdlet for testing services:  test-serviceHealth instead of test-serverHealth.

*Updated – 6/22/2011 – Corrected table formatting of output.

Exchange 2010 – Stopping the Information Store Service does not failover database copies.

In Exchange 2010 we achieve high availability of mailbox databases by utilizing a Database Availability Group (DAG).  When a DAG is utilized, and mailbox database copies are created on DAG members, they are either MOUNTED (active) or have a copy status – for example HEALTHY (passive).

If an administrator or another process gracefully stops the Information Store Service on a DAG node, any database copies that were mounted on that server enter a DISMOUNTED state.  These copies do not fail over to another node, as the shutdown was graceful and not the result of an error condition that would trigger a failover event.  Had the process crashed or otherwise become unavailable, this would have been detected as an error and the database instances would have failed over.

Let’s take a look at this.

In this environment I have a four mailbox server DAG.  Database DAG-DB0 is replicated to all members of the DAG.  Currently DAG-DB0 is mounted on mailbox server DAG-1.

Get-MailboxDatabaseCopyStatus *\DAG-1

Name                                          Status
----                                          ------
DAG-DB0\DAG-1                                 Mounted
DAG-1-DB0\DAG-1                               Mounted
DAG-DB1\DAG-1                                 Healthy

On DAG-1 I issued a net stop msexchangeis.  This gracefully stopped the Information Store Service.

Log Name:      System
Source:        Service Control Manager
Date:          3/21/2010 11:27:19 AM
Event ID:      7036
Task Category: None
Level:         Information
Keywords:      Classic
User:          N/A
Computer:      DAG-1.domain.com
Description:
The Microsoft Exchange Information Store service entered the stopped state.

Reviewing the mailbox database copy status for DAG-1, the following is noted.  (Get-MailboxDatabaseCopyStatus *\DAG-1)

Name                                          Status
----                                          ------
DAG-DB0\DAG-1                                 Dismounted
DAG-1-DB0\DAG-1                               Dismounted
DAG-DB1\DAG-1                                 Healthy

In response to the Information Store gracefully shutting down, the mailbox database copies that were mounted on that host are now dismounted.

Subsequently I issued the command net start msexchangeis to start the Information Store Service. 

Log Name:      System
Source:        Service Control Manager
Date:          3/21/2010 11:34:18 AM
Event ID:      7036
Task Category: None
Level:         Information
Keywords:      Classic
User:          N/A
Computer:      DAG-1.domain.com
Description:
The Microsoft Exchange Information Store service entered the running state.

After restarting the service, the databases that were previously dismounted are now mounted.  (Get-MailboxDatabaseCopyStatus *\DAG-1)

Name                                          Status
----                                          ------
DAG-DB0\DAG-1                                 Mounted
DAG-1-DB0\DAG-1                               Mounted
DAG-DB1\DAG-1                                 Healthy

Exchange 2007 – Using VSS to perform an online offline database seed.

When using continuous replication in Exchange 2007, an operation that sometimes needs to be performed is a database seed.  This operation is typically performed as part of enabling replication, and infrequently it is performed as part of the process for recovering from divergence.

Seeding is most often performed with the Update-StorageGroupCopy cmdlet.  During seeding, an ESE streaming backup is performed on the source database.  This API is used to copy the database from the source to the target.  There are times when this process fails or, for various reasons, cannot be utilized.  In those cases an alternate way to seed the database is needed.

One method is to perform a manual offline seeding. In this operation, the source database is dismounted, verified to be in a clean shutdown state, and then manually copied offline to the target. This can obviously be inconvenient, since the source database has to be down while the copy procedure is being performed.

Another method is to use a VSS backup of the database to seed the database copy.  You can use VSS to back up the database, and VSS to restore the database.  (Sorry – if you are using the online streaming backup API for your databases you will not be able to use these instructions).

When using an Exchange-aware VSS application, there are typically four destinations for a restore (note, your backup software may not enable all the options):

  1. Original storage group
  2. Alternate storage group
  3. Recovery storage group
  4. File system

To use the VSS backup and restore method, you would choose to restore to the file system.

The following steps outline a high level process on how to utilize a VSS backup and restore to file system to complete an online offline database seed operation.

====================================

The first step is to enable replication for the storage group.  In CCR this is handled for you automatically every time you create a database. When using SCR or LCR, this is accomplished using the Enable-StorageGroupCopy cmdlet. It is important to ensure that neither circular logging nor backups truncate any of the log files necessary to complete this process.

(SCR)
Enable-StorageGroupCopy -Identity <ServerName\StorageGroupName> -StandbyMachine <SCRTargetName> -SeedingPostponed

(LCR)
Enable-DatabaseCopy -Identity <ServerName\DatabaseName> -CopyEdbFilePath "path\database.edb"

(LCR)
Enable-StorageGroupCopy -Identity <ServerName\StorageGroupName> -CopyLogFolderPath <path> -CopySystemFolderPath <path> -SeedingPostponed

For more information on enabling SCR, please see my blog post at http://blogs.technet.com/timmcmic/archive/2009/01/22/inconsistent-results-when-enabling-standby-continuous-replication-scr-in-exchange-2007-sp1.aspx

If you have already enabled continuous replication for the storage group, proceed to the second step.

====================================

The second step is to ensure that the storage group copy is in a suspended state.  Storage group copies can be suspended either in bulk or one at a time.  The following are example commands:

(All Storage Groups)
Get-StorageGroup -Server <SourceServerName> | Suspend-StorageGroupCopy -StandbyMachine <TargetMachineName>

(Single Storage Group)
Suspend-StorageGroupCopy -Identity <ServerName\StorageGroupName> -StandbyMachine <TargetMachineName>

It is important that in the SCR environment these commands are run on both the source and target servers.  All servers should indicate a suspended status, reflecting that both Active Directory replication and the Microsoft Exchange Replication service configuration updates occurred successfully.

====================================

The third step is to note the important paths that are necessary to complete the rest of these steps. Specifically, we are interested in the storage group log file path, the system folder path and copy system folder path, and the log file prefix.  For the mailbox database we are interested in the database file path and copy database file paths.

To get all paths for all storage groups on the source, use the following command:

Get-StorageGroup -Server <ServerName> | fl Name,LogFolderPath,SystemFolderPath,CopyLogFolderPath,CopySystemFolderPath,LogFilePrefix

This will give you a formatted list of storage group names, log paths, and system paths.

To get the paths for all mailbox databases, use the following command:

Get-MailboxDatabase -Server <ServerName> | fl Name,EdbFilePath,CopyEdbFilePath

This will give you a formatted list of mailbox database names and mailbox database paths.

Here is an example of the output you can expect to see (copy path attributes will only be populated if you are utilizing LCR):

Name            : Mailbox Database LCR
EdbFilePath     : d:\SG1\DB1.edb
CopyEdbFilePath : d:\SG1-LCR\DB1.edb

Name            : Mailbox Database CCR or SCR
EdbFilePath     : d:\SG2\DB2.edb
CopyEdbFilePath :

Name                 : Storage Group LCR
LogFolderPath        : d:\SG1
SystemFolderPath     : d:\SG1
CopyLogFolderPath    : d:\SG1-LCR
CopySystemFolderPath : d:\SG1-LCR
LogFilePrefix        : E00

Name                 : Storage Group CCR or SCR
LogFolderPath        : d:\SG2
SystemFolderPath     : d:\SG2
CopyLogFolderPath    :
CopySystemFolderPath :
LogFilePrefix        : E01

====================================

The fourth step is to verify that the source log file sequence is in order.  If the source log file sequence has been manually manipulated, or if any log file gaps are present, the seed operation will fail.  This step ensures that log files are in sequence on the source machine.

To ensure that the log sequence on the source machine is in the correct order, perform the following operations:

1. Open a command prompt and navigate to the log directory of the storage group.  This path can be found from the output gathered in step 3 above.

2. Run the following eseutil command:

eseutil /ml <LogFilePrefix>

The log file prefix can be found from the output gathered in step 3.

When you run this command it will scan every log file found in the source directory.  If any gaps or errors are identified, you cannot continue with these steps.  If the command completes and errors on the last log file in the series this is expected, as the Exx.log is currently open for writing and cannot be scanned.  The following is sample output that you should receive for a storage group that is online.

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
Version 08.02
Copyright (C) Microsoft Corporation. All Rights Reserved.
Initiating FILE DUMP mode...

Verifying log files...
     Base name: e00

      Log file: d:\SG1\E0000001353.log - OK
      Log file: d:\SG1\E0000001354.log - OK
      Log file: d:\SG1\E0000001355.log - OK
      Log file: d:\SG1\E0000001356.log - OK
      Log file: d:\SG1\E0000001357.log - OK
      Log file: d:\SG1\E0000001358.log - OK
      Log file: d:\SG1\E0000001359.log - OK
      Log file: d:\SG1\E000000135A.log - OK
      Log file: d:\SG1\E000000135B.log - OK
      Log file: d:\SG1\E000000135C.log - OK
      Log file: d:\SG1\E000000135D.log - OK
      Log file: d:\SG1\E000000135E.log - OK
      Log file: d:\SG1\E000000135F.log - OK
      Log file: d:\SG1\E0000001360.log - OK
      Log file: d:\SG1\E0000001361.log - OK
      Log file: d:\SG1\E0000001362.log - OK
      Log file: d:\SG1\E0000001363.log - OK
      Log file: d:\SG1\E0000001364.log - OK
      Log file: d:\SG1\E0000001365.log - OK
      Log file: d:\SG1\E0000001366.log - OK
      Log file: d:\SG1\E0000001367.log - OK
      Log file: d:\SG1\E0000001368.log - OK
      Log file: d:\SG1\E0000001369.log - OK
      Log file: d:\SG1\E00.log
                ERROR: Cannot open log file (d:\SG1\E00.log). Error -1032.

Operation terminated with error -1032 (JET_errFileAccessDenied, Cannot access file, the file is locked or in use) after 368.625 seconds.
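As a quick pre-check before running eseutil, the file names alone can be scanned for missing generations.  The following is a minimal Python sketch under the assumption that log names are the prefix followed by an 8-digit hexadecimal generation number, as in the sample output above; the function name is hypothetical.  It looks only at names and does not replace eseutil /ml, which also inspects the contents of each log.

```python
import re


def find_log_gaps(file_names, prefix="E00"):
    """Return missing generation numbers between the lowest and highest
    numbered log files that use `prefix`.

    The open log (for example E00.log) carries no generation number in its
    name and is skipped.
    """
    pattern = re.compile(
        rf"^{re.escape(prefix)}([0-9A-Fa-f]{{8}})\.log$", re.IGNORECASE
    )
    generations = sorted(
        int(m.group(1), 16)
        for m in (pattern.match(name) for name in file_names)
        if m
    )
    if not generations:
        return []
    present = set(generations)
    return [g for g in range(generations[0], generations[-1] + 1) if g not in present]


if __name__ == "__main__":
    # Hypothetical directory listing with one generation removed.
    names = ["E0000001353.log", "E0000001354.log", "E0000001356.log", "E00.log"]
    for gap in find_log_gaps(names):
        print(f"Missing log file: E00{gap:08X}.log")
```

In practice the list of names could come from os.listdir on the log folder path gathered in step 3.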

====================================

The fifth step is to perform a VSS backup of the database. Please consult with your backup vendor to ensure that a successful FULL backup is performed.  Please also make sure that a consistency check of the backup is performed.

====================================

The sixth step is to restore the VSS backup.  When you perform the restore, you should select the option to restore to file system.  This may require that you restore to the file system of an Exchange server, so it may be necessary to ensure that sufficient free space exists on a volume on the Exchange server where the restore will be performed.

For SCR and CCR, restore to the original server.  In our example we will say that we are restoring to the original server at x:\Restore.

For LCR we can restore to the CopyEdbFilePath.  In our example you would restore to d:\SG1-LCR.  This will prevent us from having to run a copy operation at a later time.

If multiple databases are being restored, I recommend that databases be restored individually.

At this point we now have the EDB file on the file system, and we will use it for the seeding operation.

====================================

The seventh step is to ensure that the target paths are ready to have the database moved in place.  The paths referenced in these steps can be obtained from the output gathered in step 3.

For SCR – ensure that the logFolderPath, systemFolderPath, and edbFilePath are empty on the SCR target.

For CCR – ensure that the logFolderPath, systemFolderPath, and edbFilePath are empty on the passive node.

For LCR – ensure that the copyLogFolderPath, copySystemFolderPath, and copyEdbFilePath are empty.

At this point the destination paths are empty and ready for the database to be moved.

We now need to create the directory structure where logs, system, and database files will be copied.

For SCR and CCR – create the log, system, and database folder.  In our example logs, system, and database files are located at d:\SG1.  Therefore on the SCR target or CCR passive node I would create the directory structure d:\SG1.

For LCR – we would create the copy folder.  In our example we have copies placed in d:\SG1-LCR.  Therefore on the server I would create the directory structure d:\SG1-LCR.

If you are using nested folders you need to create the entire directory structure.

====================================

The eighth step is to move the restored database to the target directory.  This can be accomplished in a few different ways, but I will make a recommendation below.

For CCR and SCR:

From the source server map a drive to the drive$ share of the target.  For example, I would map the drive Y: to \\SCRTarget\d$ using our example.

Open a command prompt and navigate to the restore directory.  In our example this is X:\Restore

Use Eseutil to copy the database from the source directory to the target directory.  The sample command using our example is:

eseutil /y SG1-DB1.edb /d y:\SG1-DB1.edb

Here is the expected output from this command:

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
Version 08.02
Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating COPY FILE mode...
     Source File: SG1-DB1.edb
Destination File: y:\SG1-DB1.edb

                      Copy Progress (% complete)

          0    10   20   30   40   50   60   70   80   90  100
          |----|----|----|----|----|----|----|----|----|----|
          ...................................................

Operation completed successfully in 13.281 seconds.

At this point the copy has been seeded on the target server.

For LCR, this step is not necessary as the restoration to file system was performed to the LCR location.

Information on the usage of Eseutil can be found here.  http://technet.microsoft.com/en-us/library/aa998249(EXCHG.80).aspx

====================================

The ninth step is to verify the health of the copied database.  We need to ensure that the database was not corrupted as a part of the copy process.

For SCR and CCR:

Log on locally to the SCR target or CCR passive node, open a command prompt, and navigate to the database directory.  In our example this would be d:\SG1.

Use Eseutil /k to perform a checksum of the database:

eseutil /k SG1-DB1.edb

The following output will be observed when the command completes:

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
Version 08.02
Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating CHECKSUM mode...
        Database: SG1-DB1.edb
  Temp. Database: TEMPCHKSUM3888.EDB

File: SG1-DB1.edb

                     Checksum Status (% complete)

          0    10   20   30   40   50   60   70   80   90  100
          |----|----|----|----|----|----|----|----|----|----|
          ...................................................

514 pages seen
0 bad checksums

0 correctable checksums
129 uninitialized pages
0 wrong page numbers
0x4676 highest dbtime (pgno 0x86)
65 reads performed
4 MB read
1 seconds taken
4 MB/second
2755 milliseconds used
42 milliseconds per read
78 milliseconds for the slowest read
15 milliseconds for the fastest read

Operation completed successfully in 0.140 seconds.

We are interested in ensuring that there are 0 bad checksums (the "0 bad checksums" line in the output above).

For LCR, this command should be run locally on the machine.

Open a command prompt and navigate to the copy database directory.  In our example, this would be d:\SG1-LCR.

Use Eseutil /k to perform a checksum of the database:

eseutil /k SG1-DB1.edb

The following output will be observed when the command completes:

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server
Version 08.02
Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating CHECKSUM mode...
        Database: SG1-DB1.edb
  Temp. Database: TEMPCHKSUM3888.EDB

File: SG1-DB1.edb

                     Checksum Status (% complete)

          0    10   20   30   40   50   60   70   80   90  100
          |----|----|----|----|----|----|----|----|----|----|
          ...................................................

514 pages seen
0 bad checksums

0 correctable checksums
129 uninitialized pages
0 wrong page numbers
0x4676 highest dbtime (pgno 0x86)
65 reads performed
4 MB read
1 seconds taken
4 MB/second
2755 milliseconds used
42 milliseconds per read
78 milliseconds for the slowest read
15 milliseconds for the fastest read

Operation completed successfully in 0.140 seconds.

We are interested in ensuring that there are 0 bad checksums (the "0 bad checksums" line in the output above).
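When several databases are seeded, the checksum result can be pulled out of saved eseutil /k output instead of being read by eye.  The following is a minimal Python sketch that assumes the output has been captured to a string in the layout shown above; the function name is hypothetical.

```python
import re


def bad_checksum_count(eseutil_output):
    """Return the number on the 'bad checksums' line, or None if it is absent."""
    match = re.search(
        r"^\s*(\d+)\s+bad checksums\s*$", eseutil_output, re.MULTILINE
    )
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Excerpt of the summary lines from the sample output above.
    sample = "514 pages seen\n0 bad checksums\n\n0 correctable checksums\n"
    count = bad_checksum_count(sample)
    if count is None:
        print("No checksum summary found; treat the copy as unverified.")
    elif count == 0:
        print("Database copy passed the checksum.")
    else:
        print(f"{count} bad checksums; do not resume replication with this copy.")
```

Returning None when the line is missing keeps a truncated or failed run from being mistaken for a clean result.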

====================================

The last step in the process is to resume the storage group copy:

(SCR):  Get-StorageGroup -Server <SourceServerName> | Resume-StorageGroupCopy -StandbyMachine <SCRTargetName>

(CCR / LCR):  Get-StorageGroup -Server <SourceServerName> | Resume-StorageGroupCopy

(Note:  These commands resume storage group copy for all storage groups.  If you have a storage group that has copy suspended for another reason, it may be necessary to resume single storage groups instead.)

When replication has resumed successfully, you can note the following events in the application log indicating that replication began copying log files.

Event Type:    Information
Event Source: MSExchangeRepl
Event Category:   Action
Event ID:    2084
Date:        3/16/2010
Time:        10:12:50 AM
User:        N/A
Computer:    SERVER
Description: Replication for storage group SERVER\Storage Group SCR or CCR has been resumed.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Event Type:    Information
Event Source:    MSExchangeRepl
Event Category:   Service
Event ID:    2114
Date:        3/16/2010
Time:        10:13:19 AM
User:        N/A
Computer:    SERVER
Description: The replication instance for storage group SERVER\Storage Group SCR or CCR has started copying transaction log files. The first log file successfully copied was generation 31201.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

====================================

The following are links to references from this post.

· Enable-StorageGroupCopy (http://technet.microsoft.com/en-us/library/aa996389(EXCHG.80).aspx)

· Enable-DatabaseCopy (http://technet.microsoft.com/en-us/library/aa996389(EXCHG.80).aspx)

· Suspend-StorageGroupCopy (http://technet.microsoft.com/en-us/library/aa998182(EXCHG.80).aspx)

· Get-StorageGroup (http://technet.microsoft.com/en-us/library/aa998331(EXCHG.80).aspx)

· Get-MailboxDatabase (http://technet.microsoft.com/en-us/library/bb124924(EXCHG.80).aspx)

· ESEUTIL (http://technet.microsoft.com/en-us/library/aa998249(EXCHG.80).aspx)

· Resume-StorageGroupCopy (http://technet.microsoft.com/en-us/library/bb124529(EXCHG.80).aspx)

Using a Windows 2008 or Windows 2008 R2 File Server Cluster to host the File Share Witness for an Exchange 2010 Database Availability Group

Recently I have been asked how a file server / file share cluster can be utilized to host the file share witness directory for an Exchange 2010 Database Availability Group.  I think it is important to note that this is not a recommended deployment. 

With Exchange 2010, changing the witness directory is straightforward:  once you recognize that the server hosting the witness will be unavailable for an extended period of time, run the Set-DatabaseAvailabilityGroup command with the -WitnessServer parameter, specifying a remaining Exchange server <or> another server where the Exchange Trusted Subsystem group has been granted local administrator rights.

Due to the permissions model necessary for Exchange 2010, a Windows 2003 File Share cluster should not be utilized for this function.  Utilizing a Windows 2003 File Share cluster causes incompatibilities with the Set-DatabaseAvailabilityGroup cmdlet.

Node Configuration

Using Server Manager, the Exchange Trusted Subsystem group was added to the local Administrators group of each node in the cluster.

Cluster Configuration

The cluster services were established utilizing the appropriate cluster quorum model and the desired number of nodes.

An empty service and application was created.  Inside the empty service and application a client access point was created.  An appropriate name and IP address were specified.  (For this example the name is FileServer1).

image

The shared storage that will host the file share witness was already available in the Available Storage group.  The desired disks were added to the service or application group.

image

At this time it’s important to note the drive letter associated with the disk where you want the witness directory hosted.  (For this example the drive letter is E).

DAG Configuration

In our example the name of our dag is DAG.  The domain where the DAG is created is exchange.msft.

By default, when only a witness server is specified, the DAG witness directory is c:\DAGFileShareWitness\DAG-FQDN.  (In our example, c:\DAGFileShareWitness\DAG.exchange.msft.)

This is where the configuration issue arises.  The C drive is not valid in the service or application that we created in the cluster.  Only the E drive is a valid drive.  Therefore, we will have to utilize the E drive in the witness path.  This will require us to set both the WitnessServer and WitnessDirectory attributes.

To configure the DAG to utilize the FileServer1 name and a directory on the E drive which is valid in the group, run the following command:

Set-DatabaseAvailabilityGroup -Identity DAG -WitnessServer FileServer1 -WitnessDirectory E:\DAGFileShareWitness\DAG.exchange.msft
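As a rough illustration of how the witness path in the command above is put together, here is a small Python sketch.  The helper names (witness_directory, set_dag_command) are hypothetical and only build text; nothing here talks to Exchange.  It assumes the folder name DAGFileShareWitness used in this example and the usual drive:\folder\DAG-FQDN layout.

```python
def witness_directory(drive_letter: str, dag_fqdn: str) -> str:
    """Build a witness directory path on the given drive for a DAG FQDN."""
    # Accept "E", "e:" or "E:\" and normalize to a single letter.
    letter = drive_letter.strip().rstrip(":\\")
    if len(letter) != 1 or not letter.isalpha():
        raise ValueError(f"not a drive letter: {drive_letter!r}")
    # Folder name taken from the example in this post (an assumption elsewhere).
    return f"{letter.upper()}:\\DAGFileShareWitness\\{dag_fqdn}"


def set_dag_command(dag_name: str, witness_server: str, directory: str) -> str:
    """Compose the Set-DatabaseAvailabilityGroup command line as text only."""
    return (
        f"Set-DatabaseAvailabilityGroup -Identity {dag_name} "
        f"-WitnessServer {witness_server} -WitnessDirectory {directory}"
    )


path = witness_directory("E", "DAG.exchange.msft")
print(set_dag_command("DAG", "FileServer1", path))
```

The point of the sketch is simply that the drive letter must be one that is valid inside the clustered service or application, which is why the drive is a required input rather than defaulting to C.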

At this time when you review the service or application holding the name and disk resource, you will see an additional file server resource created.  The share is now available on the file server cluster.

image

Cluster Core Resources fail to come online on some Exchange 2010 Database Availability Group (DAG) nodes.

Although Exchange 2010 no longer deploys a cluster resource model we still use Windows Failover Clustering service for certain functions.

When a Windows 2008 / 2008 R2 cluster is created, the cluster core resources are grouped together in the ‘Cluster Group’.  The Cluster Group is a hidden group that contains the following resources:

  • Cluster Name:  This is the cluster name object (CNO).  Exchange 2010 uses the name of the DAG to create this resource.  The name of the DAG is always the name of the cluster and the CNO.
  • Cluster IPv4 Addresses:  These are the IPv4 addresses that are associated with the DAG.  If the members of the DAG span multiple subnets, there will be multiple IPv4 resources.
  • File Share Witness:  This is the quorum resource that is created using the witness server and witness directory settings of the DAG.  This resource should only be present when there is an even number of DAG members.

You can see the cluster core resources in failover cluster manager by selecting the cluster name in the upper left hand pane.  In the center pane, expand the cluster core resources section.

image

The cluster core resource group can also be seen using cluster.exe (or in Windows 2008 R2 cluster powershell extensions).

Windows 2008 / Windows 2008 R2:  Cluster.exe DAG.company.com group

cluster.exe dag.company.com group
Listing status for all available resource groups:

Group                Node            Status
-------------------- --------------- ------
Cluster Group        DAG-1           Online
Available Storage    DAG-1           Offline

Windows 2008 R2:  Get-ClusterGroup -Cluster DAG.company.com

PS C:\Users\Administrator> Get-ClusterGroup -Cluster DAG.company.com

Name                   OwnerNode        State
----                   ---------        -----
Cluster Group          dag-1            Online
Available Storage      dag-1            Offline

From an Exchange 2010 perspective you do not really need to manage the cluster core resources.  As members join and depart the cluster this resource group will be automatically moved to a remaining member.  Each member of the DAG should have the ability to arbitrate and fully bring online the cluster core resources.

When a cluster is created in Windows 2008 or Windows 2008 R2, the cluster service enumerates all network ports found on the nodes.  These network ports are then combined into cluster networks.  You can view the cluster networks in failover cluster manager by expanding the cluster name and expanding networks.

image

You can also view the cluster networks using cluster.exe or powershell.

Windows 2008 / Windows 2008 R2:  cluster.exe dag.company.com network

cluster.exe dag.company.com network
Listing status for all available networks:

Network                                  Status
---------------------------------------- -----------
Cluster Network 2                        Up
Cluster Network 4                        Up
Cluster Network 1                        Up

Windows 2008 R2:  Get-ClusterNetwork -Cluster DAG.company.com

Get-ClusterNetwork -Cluster DAG.company.com

Name                                State
----                                -----
Cluster Network 1                   Up
Cluster Network 2                   Up
Cluster Network 4                   Up

A cluster network has three settings:

  • Do not allow cluster network communications on this network
  • Allow cluster network communications on this network
    • Allow clients to connect through this network

You can see these settings in failover cluster manager by getting the properties of a cluster network.

image

You can also view the network role either by using cluster.exe or powershell.

Windows 2008 / Windows 2008 R2:  cluster.exe dag.company.com network "Cluster Network 1" /prop

cluster dag.company.com network "Cluster Network 1" /prop

Listing properties for 'Cluster Network 1':

T  Network              Name                           Value
-- -------------------- ------------------------------ -----------
SR Cluster Network 1    Name                           Cluster Network 1
MR Cluster Network 1    IPv6Addresses
MR Cluster Network 1    IPv6PrefixLengths
MR Cluster Network 1    IPv4Addresses                  10.0.0.0
MR Cluster Network 1    IPv4PrefixLengths              24
SR Cluster Network 1    Address                        10.0.0.0
SR Cluster Network 1    AddressMask                    255.255.255.0
S  Cluster Network 1    Description
D  Cluster Network 1    Role                           3 (0x3)
D  Cluster Network 1    Metric                         1200 (0x4b0)
D  Cluster Network 1    AutoMetric                     1 (0x1)

Windows 2008 R2:  Get-ClusterNetwork -Cluster DAG.company.com | fl name,role

Get-ClusterNetwork -Cluster DAG-1.company.com | fl name,role

Name : Cluster Network 1
Role : 3

Name : Cluster Network 2
Role : 1

Name : Cluster Network 4
Role : 1
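If you capture the list output above to text, a few lines of Python can turn it into a name-to-role mapping.  This is only a sketch: parse_name_role is a hypothetical helper, and the sample text is copied from the output shown above.

```python
import re
from typing import Dict

SAMPLE = """\
Name : Cluster Network 1
Role : 3

Name : Cluster Network 2
Role : 1

Name : Cluster Network 4
Role : 1
"""


def parse_name_role(text: str) -> Dict[str, int]:
    """Map each cluster network name to its numeric role."""
    roles: Dict[str, int] = {}
    current = None
    for line in text.splitlines():
        match = re.match(r"\s*(Name|Role)\s*:\s*(.*\S)\s*$", line)
        if not match:
            continue  # blank separator lines
        key, value = match.groups()
        if key == "Name":
            current = value
        elif current is not None:
            roles[current] = int(value)
    return roles


roles = parse_name_role(SAMPLE)
# Networks whose role value is 3 are the ones that allow client connections.
client_networks = [name for name, role in roles.items() if role == 3]
print(client_networks)
```

Keep in mind that a role of 1 is not an error in itself; it only matters for a network that has an IPv4 resource associated with it.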

The role of the networks can also be viewed in the registry of each node.  This information is located at:  HKEY_LOCAL_MACHINE\Cluster\Networks.  Each cluster network is represented by a subkey which is the GUID of the network.  Expanding the GUID, you will see values including Name and Role.

[HKEY_LOCAL_MACHINE\Cluster\Networks\2cd2b920-0a2a-4851-bb24-de02d4a70b7e]
@="class mscs::TmNetworkInfo"
"Id"="2cd2b920-0a2a-4851-bb24-de02d4a70b7e"
"Name"="Cluster Network 2"
"Signature"="NETW"
"Description"=""
"Role"=dword:00000001
"Priority"=dword:ffffffff
"Transport"="TCP/IP"
"Ignore"=dword:00000000
"Address"="192.168.0.0"
"AddressMask"="255.255.255.0"
"IPv6Address"=""
"State"=dword:00000003
"Metric"=dword:0000044c
"AutoMetric"=dword:00000001
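For anyone who exports a network key like the one above to text, the following Python sketch shows one way to read the values back.  The parse_reg_values helper is hypothetical, the sample is a shortened copy of the export above, and it assumes the usual export form where dword values are written in hexadecimal.

```python
import re

SAMPLE_KEY = r'''
[HKEY_LOCAL_MACHINE\Cluster\Networks\2cd2b920-0a2a-4851-bb24-de02d4a70b7e]
"Name"="Cluster Network 2"
"Role"=dword:00000001
"Metric"=dword:0000044c
'''


def parse_reg_values(text):
    """Return the named values of one exported key as a dict."""
    values = {}
    for line in text.splitlines():
        match = re.match(r'^"([^"]+)"=(.*)$', line.strip())
        if not match:
            continue  # key header, default value, or blank line
        name, raw = match.groups()
        if raw.startswith("dword:"):
            # dword values are hexadecimal in the export
            values[name] = int(raw[len("dword:"):], 16)
        else:
            values[name] = raw.strip('"')
    return values


network = parse_reg_values(SAMPLE_KEY)
print(network["Name"], network["Role"], network["Metric"])
```

Note that the Metric of 0000044c is hexadecimal and reads back as 1100 in decimal.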

The role value can contain three different values depending on the cluster network settings.  The values are:

  • 0:  Do not allow cluster network communications on this network
  • 1:  Allow cluster network communications on this network
  • 3:  Allow cluster network communications on this network and allow clients to connect through this network

In order for an IPv4 resource to be brought online it must be associated with a network that  is configured to “Allow cluster network communications on this network” and to “Allow clients to connect through this network”.  If for any reason the “Allow clients to connect through this network” option is not enabled, the IPv4 resource associated with that network will not be able to be brought online.
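The rule above can be written down as a tiny lookup.  This Python sketch uses hypothetical names (ROLE_SETTINGS, describe_role, ipv4_resource_can_come_online) and covers only the three role values discussed in this post.

```python
# Role values as described in this post.
ROLE_SETTINGS = {
    0: "Do not allow cluster network communications on this network",
    1: "Allow cluster network communications on this network",
    3: "Allow cluster network communications and allow clients to connect",
}


def describe_role(role: int) -> str:
    """Translate a numeric role into the matching setting text."""
    try:
        return ROLE_SETTINGS[role]
    except KeyError:
        raise ValueError(f"unexpected role value: {role}") from None


def ipv4_resource_can_come_online(role: int) -> bool:
    """An IPv4 resource needs both cluster and client access (role 3)."""
    return role == 3


for value in (0, 1, 3):
    print(value, describe_role(value), ipv4_resource_can_come_online(value))
```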

On an Exchange 2010 DAG member, when attempting to move the cluster core resources to another DAG member the resources may fail to come online.  Specifically the IPv4 resource fails to come online which results in the network name resource failing to come online (due to dependency).

If using Failover Cluster Manager and attempting to bring online the IPv4 resource in the cluster core resources group, the following pop up error is displayed:

image

A review of the system log shows event 1223:

Log Name:      System

Source:        Microsoft-Windows-FailoverClustering

Date:          5/10/2010 1:14:42 PM

Event ID:      1223

Task Category: IP Address Resource

Level:         Error

Keywords:     

User:          SYSTEM

Computer:     dagNode.company.com

Description:

Cluster IP address resource ‘IPv4 Static Address 2 (Cluster Group)’ cannot be brought online because the cluster network ‘Cluster Network 2’ is not configured to allow client access. Please use the Failover Cluster Manager snap-in to check the configured properties of the cluster network.

This Event 1223, described above, indicates that the effective setting for Cluster Network 2 is “Allow cluster network communications on this network” but does not have “Allow clients to connect through this network” set.  However, when reviewing the settings in failover cluster manager for Cluster Network 2 you might see that both “Allow cluster network communications on this network” and “allow clients to connect through this network” are enabled. 

The Microsoft Exchange Replication Service helps maintain the cluster network configuration.  There is an issue in the current Replication Service where the settings are not actually changed.  This causes a difference between the effective setting inside the cluster and the setting displayed in the Failover Cluster Management tools.

Workaround:

A quick and easy workaround for this issue is to simply reset the state of the network.  There are multiple ways to accomplish this and I will outline each below.  Step zero before proceeding with any other steps is to note the cluster network that is displayed in the above event since that is the network that will need to be reset (in this example Cluster Network 2). 
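Step zero can be scripted if you already have the event description as text.  The Python sketch below is a hypothetical helper, not part of any Microsoft tool; it assumes the wording of event 1223 shown earlier and accepts either straight or typographic quotes around the network name.

```python
import re

EVENT_TEXT = (
    "Cluster IP address resource 'IPv4 Static Address 2 (Cluster Group)' "
    "cannot be brought online because the cluster network "
    "'Cluster Network 2' is not configured to allow client access."
)

# Matches the network name in the event 1223 description.
PATTERN = re.compile(
    r"cluster network [‘'](.+?)[’'] is not configured to allow client access"
)


def network_from_event(description: str):
    """Return the cluster network named in the event, or None."""
    match = PATTERN.search(description)
    return match.group(1) if match else None


print(network_from_event(EVENT_TEXT))
```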

Windows 2008 / Windows 2008 R2 – Using Failover Cluster Management Tool

The network state can be reset using Failover Cluster Manager:

  • Launch Failover Cluster Management
  • Expand the cluster networks.

image

  • Get the properties of the cluster network in question.
  • Uncheck the box to “Allow clients to connect through this network”.

image

  • Press <apply> – you will be prompted with the following – select OK.

image

  • Press <OK> to exit the properties pane.
  • The network is disabled for “Allow clients to connect through this network”. 

Next we need to enable the network for “Allow clients to connect through this network”.

  • Get the properties of the cluster network.
  • Check the box to “Allow clients to connect through this network”.

image

  • Press <apply> – you will be prompted with the following – select OK.

image

  • Press <OK> to exit the properties pane.

The network has been reset and cluster core resources should successfully arbitrate to any DAG member with a network port in this network.

Windows 2008 / Windows 2008 R2:  Using cluster.exe

  • Launch a command prompt with administrative privileges.
  • Run the following command:

cluster.exe dag.company.com network "Cluster Network 2" /prop role=1

  • The network is disabled for “Allow clients to connect through this network”. 

Next, we need to enable the network for “Allow clients to connect through this network”.

  • Run the following command:

cluster.exe dag.company.com network "Cluster Network 2" /prop role=3

  • The network is enabled for “Allow clients to connect through this network”.

The network has now been reset and cluster core resources should successfully arbitrate to any DAG member with a network port in this network.
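If you have several DAGs to touch, the two cluster.exe commands above can be generated rather than typed.  This is a minimal Python sketch; reset_commands is a hypothetical helper that only builds the command text, in the same order as the steps above (role 1 first, then role 3), and does not run anything.

```python
def reset_commands(cluster: str, network: str):
    """Return the two cluster.exe commands that reset a network's role."""
    if '"' in network:
        raise ValueError("network name must not contain a double quote")
    # Straight double quotes so the command prompt treats the name as one argument.
    base = f'cluster.exe {cluster} network "{network}" /prop'
    return [f"{base} role=1", f"{base} role=3"]


for command in reset_commands("dag.company.com", "Cluster Network 2"):
    print(command)
```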

Windows 2008 R2:  Using powershell

  • Launch powershell with administrative privileges.
  • Run the following command:

Get-ClusterNetwork -Cluster DAG.company.com -Name "Cluster Network 2" | % {$_.role=1}

  • The network is disabled for “Allow clients to connect through this network”. 

Next, enable the network for “Allow clients to connect through this network”.

  • Run the following command:

Get-ClusterNetwork -Cluster DAG.company.com -Name "Cluster Network 2" | % {$_.role=3}

  • The network is enabled for “Allow clients to connect through this network”. 

The network has now been reset and cluster core resources should successfully arbitrate to any DAG member with a network port in this network.

 

LONG TERM FIX

This issue will be fixed in Exchange 2010 Service Pack 1.  The issue will not be fixed in Exchange 2010 RTM.

==========================================

Updated – 6/2/2010

Updated to list Exchange 2010 SP1 confirmed to contain fix. 

==========================================