Monthly Archives: December 2008

Exchange Replication Service (Exchange 2007 SP1) and Windows 2008 Clusters

If you are reading this blog post, I also suggest you read the following white paper:  "White Paper:  Continuous Replication Deep Dive", which can be found at http://technet.microsoft.com/en-us/library/cc535020.aspx.  This white paper does a fantastic job of covering the operations of the replication service.

 

Windows 2008 and Windows 2008 clustering introduce changes to the operating system that affect the operation of the Exchange Replication Service.  Specifically, there are two items of interest:  the ability to programmatically create a share scoped to a specific host name, and the integration of file sharing operations with the cluster service when shared storage is used.  (These same concepts also affect Exchange Offline Address List generation; see Dave Goldman’s blog for more info on these topics – http://blogs.msdn.com/dgoldman/archive/2008/12/11/fix-for-oab-generation-failing-on-ccr-and-scc-clusters.aspx.)

 

Let’s take a look at how this affects Exchange 2007 SP1 Cluster Continuous Replication (CCR) clusters.

 

With Exchange 2007 SP1 CCR clusters there are no shared storage devices.  The file shares that the replication service needs in order to replicate log files are created on the local host.

 

In Windows 2003 clustering, when a file share was created on a local node it was accessible by any name that resolved to that node.  (If the name did not exist as either the node name or a virtual network name in the cluster, it would be necessary to disable strict name checking in order for the host to accept the request – http://support.microsoft.com/kb/281308 – this could happen when using a hosts file or a CNAME DNS record to map a different name to the host.)  For example, if I create a share on the host named Files, I could access the share at \\NodeName\Files, or at \\CMSName\Files if the CMS (clustered mailbox server) was active on that node.  In addition, if I used enable-continuousreplicationhostname to create replication networks to replicate logs on an alternate network interface, I could also access the share at \\ContinuousReplicationHostName\Files.

 

In the example below, on Windows 2003, I created a share called Files on the root of C and placed two text files in the share.  There is a cluster network name that runs on the node where I created the share.  From another machine in the environment, you can see that the share can be accessed at both \\NodeName and \\CMSName.

 

image 

image

 

In Windows 2008, if I create the same share on the local node, it is only accessible at \\NodeName\Files.  Attempting to access the share at \\CMSName\Files displays an error.  (In this example the node name is 2008-Node1 and the CMSName is 2008-MBX3.)

 

image

 

image

 

On Windows 2008, in Server Manager -> Roles -> File Services -> Share and Storage Management, the Files share appears in the list of available shares.

image

 

If you get the properties of the share, you can see that it’s specifically scoped to the Node Name (review the share path).

image

 

In a default configuration for CCR, scoped shares do not have any impact on how the replication service replicates log files.  The main impact comes when using continuous replication host names.  In order for this to function, the replication service has to specifically create the share and have it scoped at both the NodeName and CMSName.

 

When using continuous replication hostnames, if you review the Share and Storage Management console, you will see two (or more depending on how many continuous replication host names exist) instances of the shares for each storage group. 

 

image

 

Each instance of the share uses the same share name and the same physical path on storage.  The difference is in the properties of each of these shares.  When reviewing the properties, you will see one share specifically scoped to the node name, the other share specifically scoped to the continuous replication host name.  (In this example 2008-MBX3-ReplC is the continuous replication host name.)

 

image
*Example of share scoped to continuous replication host name.

 

image

*Example of share scoped to node name.

 

By creating scoped shares at both endpoints, the replication service is able to access logs using both the NodeName and ContinuousReplicationHostName.
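
The difference between unscoped (Windows 2003) and scoped (Windows 2008) share lookup can be summarized with a small model.  The following Python sketch is illustrative only:  the ShareTable class and its methods are hypothetical and do not correspond to any Windows or Exchange API.  It assumes a Windows 2003 share is reachable by any name that resolves to the node, and a Windows 2008 share is reachable only by the names it was scoped to.  The server names are the ones used in the examples above.

```python
class ShareTable:
    """Toy model of share lookup on one node (hypothetical, not a Windows API)."""

    def __init__(self, scoped):
        # scoped=False models Windows 2003: the server name in the path is ignored.
        # scoped=True models Windows 2008: the server name must match a scope.
        self.scoped = scoped
        self.shares = set()  # (scope name, share name) pairs, lowercased

    def add(self, share, scopes):
        # Register one instance of the share per scope name.
        for scope in scopes:
            self.shares.add((scope.lower(), share.lower()))

    def can_open(self, unc_path):
        # Split \\server\share into its two components.
        server, share = unc_path.lstrip("\\").split("\\")[:2]
        if self.scoped:
            return (server.lower(), share.lower()) in self.shares
        return any(name == share.lower() for _, name in self.shares)


# Windows 2008 node with a share created locally: scoped to the node name only.
local_only = ShareTable(scoped=True)
local_only.add("Files", ["2008-Node1"])
print(local_only.can_open(r"\\2008-Node1\Files"))       # True
print(local_only.can_open(r"\\2008-MBX3\Files"))        # False

# Same share created once per name, as with continuous replication host names.
replicated = ShareTable(scoped=True)
replicated.add("Files", ["2008-Node1", "2008-MBX3-ReplC"])
print(replicated.can_open(r"\\2008-MBX3-ReplC\Files"))  # True
```

The second table mirrors what Share and Storage Management shows when continuous replication host names are in use:  the same share name and path appear once per scope name.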

 

Let’s take a look at how this affects Exchange 2007 SP1 Single Copy Clusters (SCC).

 

With Exchange 2007 SP1 SCC all databases and log files reside on a shared storage device.  This requires that the shares necessary for replication be created against folders that exist on shared volumes.  In this instance, there is no local replication activity or replication between nodes of the cluster.  The replication service is only used against a single copy cluster when the SCC cluster is acting as a standby continuous replication source.

 

With Windows 2003 the replication service would create the shares necessary for SCR replication to occur.  These shares, by default, were available at both \\NodeName and \\CMSName.  There was no tight integration between the sharing functions of the operating system and cluster where shared storage was concerned.  (Remember that an SCR target replicates log files from an SCC SCR source at \\CMSName\StorageGroupGuid$.)

 

In Windows 2008, the cluster service is more aware of shares created on shared storage.  By default, when a share is created on a shared disk, the cluster service will automatically intercept that share and scope it only to the virtual name associated with the client access point that owns the disk.  This happens when manually creating a share through the operating system or programmatically creating a share (regardless of the endpoint passed into the sharing function).  Let’s take a look at an example.

 

In this example I created an empty service or application.  In the empty service or application, I created a new client access point.  I created a new shared disk on my SAN, added the disk to available storage, and then moved it to the client access point.  In my case the disk is the H volume (Cluster Disk 9).

 

image

 

On the node that owns the Empty-Group with the client access point and disk, through the operating system, I created a folder on the H drive and shared it.  After completing the sharing wizard, you can see the share is immediately scoped to the name used in the client access point.  In this example my client access point name is EMPTY-GROUP, so the share is available at \\Empty-Group\Files.

 

image

 

When reviewing Share and Storage Management, you will see the share.  Its properties also reference the share created to the client access point name.

image

 

image

 

In addition to the above information, in the cluster administrator, in the service or application that I created, a new FileServer resource is created.  The file server resource is dependent on the physical disk that the share resides on and on the client access point name.  The share that was created is also viewable in cluster administrator.

 

image

 

How is this handled programmatically?

 

A common method to create shares was the SHARE_INFO_502 structure.  (http://msdn.microsoft.com/en-us/library/bb525410(VS.85).aspx)  When this structure is used on a Windows 2008 cluster, the share is automatically created against the node name (as long as the shared folder does not reside on a shared disk).  If the folder that is being shared resides on a shared disk, cluster automatically intercepts the sharing request and scopes the share to the client access point that owns the shared disk resource.

 

A new sharing method was introduced with Windows 2008:  the SHARE_INFO_503 structure.  (http://msdn.microsoft.com/en-us/library/cc462916(VS.85).aspx)  This structure allows the programmer to specify the server name as part of the sharing call.  Here is an excerpt from the MSDN page.

shi503_servername

A pointer to a string that specifies the DNS or NetBIOS name of the remote server on which the shared resource resides. A value of "*" indicates no configured server name.

When using this sharing structure, the programmer can specify to create the share against the node name, the CMS name (in the Exchange case), or both.  The only exception is when the folder to be shared resides on a shared disk.  In that case cluster will intercept the sharing call and allow the share to be scoped only to the client access point that owns the physical disk resource.
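
As a rough illustration of what a level 503 sharing call looks like, here is a minimal Python sketch using ctypes.  It is not the code the replication service uses.  The structure layout follows the MSDN SHARE_INFO_503 page, and NetShareAdd is the Win32 function that accepts it; the helper names build_share_info and add_scoped_share, and the C:\Files path, are assumptions made for this example.  The call itself only works on Windows and was not tested against a cluster.

```python
import ctypes
import sys


class SHARE_INFO_503(ctypes.Structure):
    # Field order follows the MSDN definition of SHARE_INFO_503.
    _fields_ = [
        ("shi503_netname", ctypes.c_wchar_p),
        ("shi503_type", ctypes.c_uint32),
        ("shi503_remark", ctypes.c_wchar_p),
        ("shi503_permissions", ctypes.c_uint32),
        ("shi503_max_uses", ctypes.c_uint32),
        ("shi503_current_uses", ctypes.c_uint32),
        ("shi503_path", ctypes.c_wchar_p),
        ("shi503_passwd", ctypes.c_wchar_p),
        ("shi503_servername", ctypes.c_wchar_p),
        ("shi503_reserved", ctypes.c_uint32),
        ("shi503_security_descriptor", ctypes.c_void_p),
    ]


STYPE_DISKTREE = 0  # share type: disk drive


def build_share_info(netname, path, servername):
    """Fill in a SHARE_INFO_503 scoped to one server name (hypothetical helper)."""
    return SHARE_INFO_503(
        shi503_netname=netname,
        shi503_type=STYPE_DISKTREE,
        shi503_max_uses=0xFFFFFFFF,    # (DWORD)-1 means unlimited connections
        shi503_path=path,
        shi503_servername=servername,  # the name the share is scoped to
    )


def add_scoped_share(info):
    """Call NetShareAdd at information level 503 (Windows only)."""
    if sys.platform != "win32":
        raise OSError("NetShareAdd is only available on Windows")
    parm_err = ctypes.c_uint32(0)
    status = ctypes.windll.netapi32.NetShareAdd(
        None, 503, ctypes.byref(info), ctypes.byref(parm_err))
    return status, parm_err.value


# One structure per scope name: the node name and a replication host name.
# The share name and path are assumptions for illustration.
by_node = build_share_info("Files", r"C:\Files", "2008-Node1")
by_repl = build_share_info("Files", r"C:\Files", "2008-MBX3-ReplC")
print(by_node.shi503_servername, by_repl.shi503_servername)
```

Building one structure per name matches the behavior described earlier, where the same share name and path appear once for each name the share is scoped to.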

 

*Note:  When creating shares on a shared storage device in Windows 2008 you should install KB 955733.

 

Exchange 2007 SP1 RU5 – Error regarding replication between computers in different domains when using standby continuous replication (SCR).

Exchange 2007 SP1 RU5 introduces an error into the enable-storagegroupcopy -standbymachine cmdlet when attempting to enable SCR on a storage group.

The error is only present when there is a parent domain -> child domain Active Directory domain structure.  The issue does not occur if the Active Directory domain / forest is flat.

When attempting to use the enable-storagegroupcopy -standbymachine <scrTarget> cmdlet, the following error is returned:

Enable-StorageGroupCopy:  Standby continuous replication is not supported between computers in different Active Directory domains.  The target node is in domain <child.parent.com>, which is different from the source domain <parent.com>.

This error is correct if the machines are actually in different domains, as SCR targets must be in the same domain as their sources.  In this case the error is being thrown incorrectly when the machines are actually members of the same domain.

To correct this issue:

1)  Verify that all machines are actually members of the same domain.

 

2)  From the machine where the cmdlet is being run, uninstall RU5.  (Note:  If any of the machines are involved in replication and RU5 is removed, it needs to be reinstalled after replication is established.  All machines involved in replication should run the same rollup update.)

 

or

 

3)  From a machine with just the management tools installed, with any RU prior to RU5, run the enable-storagegroupcopy -standbymachine cmdlet.  (This would be preferred as it does not involve uninstalling and re-installing any already applied RUs.)  If any of the servers involved are cluster servers, the command must be run from a management tools workstation that is the same version.  For example, if Windows 2008 is the operating system hosting Exchange clustering, then the tools workstation must also run the Exchange 2007 SP1 management tools and the Windows 2008 failover cluster manager.

 

The issue is scheduled to be corrected in Exchange 2007 SP1 RU8.

 

***UPDATE:  As of today this has been rescheduled for Exchange 2007 SP1 RU7***

Windows 2008 / Exchange 2007 SP1 – ESE 522 errors on CCR passive or SCR target machine.

Some users are experiencing errors on CCR (cluster continuous replication) passive nodes or SCR (standby continuous replication) target machines.

Here is an example error:

Event ID     : 522
Raw Event ID : 522
Record Nr.   : 3965
Category     : General
Source       : ESE
Type         : Error
Generated    : 6/4/2008 4:48:58 PM
Written      : 6/4/2008 4:48:58 PM
Machine      : server.domain.com
Message      : Microsoft.Exchange.Cluster.ReplayService (7012) Log Verifier e0a 31573001: An attempt to open the device name "\sourceshare$" containing "\sourceshare$" failed with system error 5 (0x00000005): "Access is denied. ".  The operation will fail with error -1032 (0xfffffbf8).

For more information, click http://www.microsoft.com/contentredirect.asp.

The error -1032 (0xfffffbf8) translates to JET_errFileAccessDenied.
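
The event shows the same error code in two forms:  the signed decimal value and its unsigned 32-bit hexadecimal representation.  A minimal Python sketch of the conversion (the function names are just for this example):

```python
def to_signed32(value):
    """Interpret an unsigned 32-bit value as a signed two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def to_unsigned32(value):
    """Return the unsigned 32-bit representation of a signed integer."""
    return value & 0xFFFFFFFF


print(to_signed32(0xFFFFFBF8))    # -1032
print(hex(to_unsigned32(-1032)))  # 0xfffffbf8
print(to_signed32(0x00000005))    # 5
```

The same conversion is handy when searching for an ESE error that one source lists in decimal and another in hex.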

The errors most commonly occur when:

1)  Replication is paused and resumed automatically to accommodate a backup operation.

2)  Replication is paused and resumed using suspend-storagegroupcopy and resume-storagegroupcopy (or through the GUI equivalents in the Exchange Management Console).

3)  The replication service is restarted or the entire node / target is restarted.

4)  Databases are mounted and dismounted using dismount-mailboxdatabase and mount-mailboxdatabase (or through the GUI equivalents in Exchange Management Console).

5)  The CMS (clustered mailbox server) is stopped and restarted using stop-clusteredmailboxserver and start-clusteredmailboxserver (or through the GUI equivalents in Exchange Management Console).

If you manually review the log file directories on the passive or target machines they are relatively in sync with the source machine. 

Likewise, when using any replication health check including get-storagegroupcopystatus, get-storagegroupcopystatus -standbymachine, and test-replicationhealth no errors are displayed.

Here is example output from get-storagegroupcopystatus on a two-node Exchange 2007 SP1 / Windows 2008 CCR cluster.

 

Name            SummaryCopyStatus  CopyQueueLength  ReplayQueueLength  LastInspectedLogTime
----            -----------------  ---------------  -----------------  --------------------
2008-MBX3-SG1   Healthy            0                0                  12/21/200…
2008-MBX3-SG2   Healthy            0                0                  12/21/200…

 

Here is example output from test-replicationhealth.

                         

Server      Check                  Result  Error
------      -----                  ------  -----
2008-NODE1  PassiveNodeUp          Passed
2008-NODE1  ClusterNetwork         Passed
2008-NODE1  QuorumGroup            Passed
2008-NODE1  FileShareQuorum        Passed
2008-NODE1  CmsGroup               Passed
2008-NODE1  NodePaused             Passed
2008-NODE1  DnsRegistrationStatus  Passed
2008-NODE1  ReplayService          Passed
2008-NODE1  DBMountedFailover      Passed
2008-NODE1  SGCopySuspended        Passed
2008-NODE1  SGCopyFailed           Passed
2008-NODE1  SGInitializing         Passed
2008-NODE1  SGCopyQueueLength      Passed
2008-NODE1  SGReplayQueueLength    Passed

 

The error is caused by an invalid function call from the replication service to the operating system.  This causes the operating system to respond with access denied, and the replication service to respond by logging the ESE 522 event.

This issue is scheduled to be corrected in Exchange 2007 SP1 RU7.  There is currently no incremental update for the issue.

The issue can be considered benign if:

1)  Visual inspection of the log file directories shows that replication is occurring.

2)  Get-storagegroupcopystatus or get-storagegroupcopystatus -standbymachine shows all storage groups in healthy state (some storage groups may be in an initializing state, in which case logs will need to be generated to determine status).

3)  Test-replicationhealth when run on the passive node or target server shows all tests passed.
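
Checks 2 and 3 can be scripted against saved output.  The following Python sketch is one possible approach, not an official tool.  It assumes the output of the two cmdlets has been captured as plain text in the same column layout as the samples above, with the header and separator lines removed; the function names are hypothetical.  In the Exchange Management Shell itself it would usually be simpler to filter the objects the cmdlets return rather than parse text.

```python
def failed_checks(health_lines):
    """Return the names of test-replicationhealth checks that did not pass."""
    failed = []
    for line in health_lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or malformed rows
        check, result = parts[1], parts[2]
        if result != "Passed":
            failed.append(check)
    return failed


def unhealthy_groups(status_lines):
    """Return the storage groups whose SummaryCopyStatus is not Healthy."""
    unhealthy = []
    for line in status_lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed rows
        name, status = parts[0], parts[1]
        if status != "Healthy":
            unhealthy.append(name)
    return unhealthy


# Rows taken from the sample output earlier in this post.
health = [
    "2008-NODE1 PassiveNodeUp Passed",
    "2008-NODE1 SGCopyFailed Passed",
]
status = [
    "2008-MBX3-SG1 Healthy 0 0",
    "2008-MBX3-SG2 Healthy 0 0",
]
print(failed_checks(health))     # []
print(unhealthy_groups(status))  # []
```

A storage group in an initializing state would be reported by unhealthy_groups; as noted in item 2, logs need to be generated before its status can be determined.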