Monthly Archives: July 2010

Continuous Replication Hostnames fail to create or function correctly with Exchange 2007 SP3 Cluster Continuous Replication (CCR) on Windows 2008 R2

Exchange 2007 SP3 adds support for running on Windows 2008 R2 servers.

In Exchange 2007 Cluster Continuous Replication (CCR) installations, all log shipping activity by default occurs over the “public” cluster interface.  When administrators desire to have log shipping activities occur over a “private” network or desire to implement multiple replication paths between nodes, continuous replication hostnames can be utilized.

More information on Exchange 2007 CCR clusters and continuous replication hostnames can be found at http://technet.microsoft.com/en-us/library/bb124521(EXCHG.80).aspx.

Prior to implementing a continuous replication host name, the Get-ClusteredMailboxServerStatus cmdlet can be utilized to see which names are currently servicing replication.  Here is sample output from a cluster not configured to utilize continuous replication hostnames.

Identity                        : MBX-3
ClusteredMailboxServerName      : MBX-3.domain.com
State                           : Online
OperationalMachines             : {NODE-1 <Active>, Node-2 <Quorum Owner>}
FailedResources                 : {}
OperationalReplicationHostNames : {node-1, node-2}
FailedReplicationHostNames      : {}
InUseReplicationHostNames       : {node-1, node-2}
IsValid                         : True
ObjectState                     : Unchanged

After establishing the prerequisites necessary to utilize continuous replication hostnames, hostname creation is performed using the Enable-ContinuousReplicationHostName shell command (http://technet.microsoft.com/en-us/library/bb690985(EXCHG.80).aspx).

When attempting to enable a replication hostname on a Windows 2008 R2 cluster, the following error may be displayed in the management shell.

[PS] C:\>Enable-ContinuousReplicationHostName -TargetMachine Node-1 -HostName Node-1-Repl-A -IPv4Address 10.0.1.3

Confirm
Are you sure you want to perform this action?

Enabling continuous replication host name "Node-1-Repl-A".
[Y] Yes  [A] Yes to All  [N] No  [L] No to All  [S] Suspend  [?] Help
(default is "Y"):a
Enable-ContinuousReplicationHostName : Enable-ContinuousReplicationHostNameNetw
ork configuration could not be completed.
At line:1 char:37
+ Enable-ContinuousReplicationHostName <<<<  -TargetMachine Node-1 -HostName Node-1-Repl-A -IPv4Address 10.0.1.3
    + CategoryInfo          : InvalidOperation: (:) [Enable-ContinuousReplicat
   ionHostName], NetworkConfigException
    + FullyQualifiedErrorId : C3F1320,Microsoft.Exchange.Management.SystemConf
   igurationTasks.EnableContinuousReplicationHostName

When reviewing Failover Cluster Manager, the replication host name group containing the correct network name and IPv4 address appears to have been created successfully.


Although the continuous replication hostname group was created, reviewing Get-ClusteredMailboxServerStatus indicates the name is not being utilized by the replication service on the cluster.

Identity                        : MBX-3
ClusteredMailboxServerName      : MBX-3.domain.com
State                           : Online
OperationalMachines             : {NODE-1 <Active>, Node-2 <Quorum Owner>}
FailedResources                 : {}
OperationalReplicationHostNames : {node-1, node-2}
FailedReplicationHostNames      : {}
InUseReplicationHostNames       : {node-1, node-2}

IsValid                         : True
ObjectState                     : Unchanged

When the replication service first starts, or when the configuration time expires, the replication service enumerates all network names on the cluster to determine which are valid endpoints for log shipping.  This is initially based on two cluster private properties stamped on each name, MSExchange_NetName and MSExchange_UseNetworkForLogCopying.  Each of these should have a value of 1 on a network name utilized as a continuous replication host name.

Listing private properties for 'Network Name (Node-1-Repl-A)':

T  Resource             Name                           Value
-- -------------------- ------------------------------ -----------------------
BR Network Name (Node-1-Repl-A) ResourceData                   01 00 00 00 ... (260 bytes)
DR Network Name (Node-1-Repl-A) StatusNetBIOS                  0 (0x0)
DR Network Name (Node-1-Repl-A) StatusDNS                      0 (0x0)
DR Network Name (Node-1-Repl-A) StatusKerberos                 0 (0x0)
SR Network Name (Node-1-Repl-A) CreatingDC                     \\DC-1.domain.com
FTR Network Name (Node-1-Repl-A) LastDNSUpdateTime              7/11/2010 2:26:26 PM
SR Network Name (Node-1-Repl-A) ObjectGUID                     5adc38b3281a004788f2a3e27ae7a0ce
S  Network Name (Node-1-Repl-A) Name                           NODE-1-REPL-A
S  Network Name (Node-1-Repl-A) DnsName                        Node-1-Repl-A
D  Network Name (Node-1-Repl-A) RemapPipeNames                 0 (0x0)
D  Network Name (Node-1-Repl-A) HostRecordTTL                  1200 (0x4b0)
D  Network Name (Node-1-Repl-A) RegisterAllProvidersIP         0 (0x0)
D  Network Name (Node-1-Repl-A) PublishPTRRecords              0 (0x0)
D  Network Name (Node-1-Repl-A) TimerCallbackAdditionalThreshold 5 (0x5)
D  Network Name (Node-1-Repl-A) MSExchange_NetName             1 (0x1)
D  Network Name (Node-1-Repl-A) RequireDNS                     1 (0x1)
D  Network Name (Node-1-Repl-A) MSExchange_UseNetworkForLogCopying 1 (0x1)

On the surface it would appear that nothing prevents this name from operating correctly as a continuous replication host name.  Internal tracing determined that the replication service also implements another check on a network name resource to ensure that it can be utilized for replication: whether Kerberos is enabled for the network name.  The replication service performs this check by reviewing a private property of the network name resource, requirekerberos, and ensuring it has a value of 1.
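The checks described so far can be modelled in a few lines. This is only a toy sketch of the logic as this post describes it, not the replication service's actual code; the function name is made up for illustration, and the property values are the ones shown in the listing for this network name.

```python
# Toy model of the eligibility checks described in this post.
# NOT the replication service's real implementation.
REQUIRED_PRIVATE_PROPERTIES = (
    "MSExchange_NetName",
    "MSExchange_UseNetworkForLogCopying",
    "requirekerberos",
)


def missing_replication_properties(private_props):
    """Return the required properties that are absent or not set to 1."""
    return [name for name in REQUIRED_PRIVATE_PROPERTIES
            if private_props.get(name) != 1]


# Relevant values as listed for 'Network Name (Node-1-Repl-A)'.
node_1_repl_a = {
    "MSExchange_NetName": 1,
    "RequireDNS": 1,
    "MSExchange_UseNetworkForLogCopying": 1,
}

print(missing_replication_properties(node_1_repl_a))  # ['requirekerberos']
```

With requirekerberos absent, the name fails the third check even though both MSExchange properties are set.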

In Windows 2003, network name resources could be enabled for Kerberos at the administrator's discretion.  In Windows 2008 and Windows 2008 R2 all network names must be Kerberos enabled.  In Windows 2008, requireKerberos is a valid private property and can be programmatically set.  In Windows 2008 R2 the requireKerberos property has been deprecated and can no longer be programmatically set.  Without the requireKerberos property in Windows 2008 R2, the Enable-ContinuousReplicationHostName cmdlet fails with the previously documented error.

To work around this issue and allow the replication host names created with the enable-continuousreplicationhostname command to function the following steps can be performed:

  • Using the Exchange Management Shell, invoke the Enable-ContinuousReplicationHostName command.  Allow the command to create the resource group, network name, and IPv4 resource.
  • Verify with Failover Cluster Manager that the resource group, network name, and IPv4 resource were created and are online.
  • Manually set requirekerberos utilizing either cluster.exe or the Failover Cluster PowerShell extensions (preferred).
    • Cluster.exe
      • Set the requirekerberos key.
        • Cluster.exe <clusterFQDN> res "<Network Name>" /priv requirekerberos=1:DWORD
        • Example:  cluster.exe cluster-1.domain.com res "Network Name (Node-1-Repl-A)" /priv requirekerberos=1:DWORD
        • Note that requirekerberos is all lowercase.
      • Take the continuous replication hostname group offline and bring it back online.
        • Cluster.exe <clusterFQDN> group <Group> /offline
        • Example:  cluster.exe cluster.domain.com group "Node-1-Repl-A_group" /offline
        • Cluster.exe <clusterFQDN> group <Group> /online
        • Example:  cluster.exe cluster.domain.com group "Node-1-Repl-A_group" /online
      • Restart the replication service.
        • net stop msexchangerepl
        • net start msexchangerepl
    • PowerShell
      • Import the failover cluster PowerShell extensions.
        • Import-Module FailoverClusters
      • Set the requirekerberos key.
        • Get-ClusterResource <Network Name> | Set-ClusterParameter -Create requirekerberos 1
        • Example:  Get-ClusterResource "Network Name (Node-1-Repl-A)" | Set-ClusterParameter -Create requirekerberos 1
        • Note that requirekerberos is all lowercase.
      • Take the continuous replication hostname group offline and bring it back online.
        • Stop-ClusterGroup -Cluster <ClusterFQDN> -Name <Group>
        • Example:  Stop-ClusterGroup -Cluster Cluster.domain.com -Name Node-1-Repl-A_group
        • Start-ClusterGroup -Cluster <ClusterFQDN> -Name <Group>
        • Example:  Start-ClusterGroup -Cluster Cluster.domain.com -Name Node-1-Repl-A_group
      • Restart the replication service.
        • Stop-Service msexchangerepl
        • Start-Service msexchangerepl
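If the workaround has to be repeated for several replication host names, it can help to generate the cluster.exe command lines rather than type them. The sketch below only assembles and prints the commands from the steps above and runs nothing; the function name is hypothetical, and the cluster, network name, and group values are the example values from this post.

```python
import subprocess


def workaround_commands(cluster_fqdn, network_name, group):
    """Return the cluster.exe / net commands from the steps above, in order."""
    return [
        ["cluster.exe", cluster_fqdn, "res", network_name,
         "/priv", "requirekerberos=1:DWORD"],
        ["cluster.exe", cluster_fqdn, "group", group, "/offline"],
        ["cluster.exe", cluster_fqdn, "group", group, "/online"],
        ["net", "stop", "msexchangerepl"],
        ["net", "start", "msexchangerepl"],
    ]


commands = workaround_commands("cluster.domain.com",
                               "Network Name (Node-1-Repl-A)",
                               "Node-1-Repl-A_group")
for argv in commands:
    # list2cmdline adds the double quotes that arguments with spaces need.
    print(subprocess.list2cmdline(argv))
```

Keeping each command as an argument list avoids quoting mistakes around names that contain spaces, such as the network name resource.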

At this point you can utilize either cluster.exe or PowerShell to verify that the requirekerberos key has been created with a value of 1.

Cluster.exe <clusterFQDN> res <Network Name> /priv  ->  Cluster.exe cluster.domain.com res "Network Name (Node-1-Repl-A)" /priv

Listing private properties for 'Network Name (Node-1-Repl-A)':

T  Resource             Name                           Value
-- -------------------- ------------------------------ -----------------------
BR Network Name (Node-1-Repl-A) ResourceData                   01 00 00 00 ... (260 bytes)
DR Network Name (Node-1-Repl-A) StatusNetBIOS                  0 (0x0)
DR Network Name (Node-1-Repl-A) StatusDNS                      0 (0x0)
DR Network Name (Node-1-Repl-A) StatusKerberos                 0 (0x0)
SR Network Name (Node-1-Repl-A) CreatingDC                     \\DC-1.domain.com
FTR Network Name (Node-1-Repl-A) LastDNSUpdateTime              7/11/2010 2:26:26 PM
SR Network Name (Node-1-Repl-A) ObjectGUID                     5adc38b3281a004788f2a3e27ae7a0ce
S  Network Name (Node-1-Repl-A) Name                           NODE-1-REPL-A
S  Network Name (Node-1-Repl-A) DnsName                        Node-1-Repl-A
D  Network Name (Node-1-Repl-A) RemapPipeNames                 0 (0x0)
D  Network Name (Node-1-Repl-A) HostRecordTTL                  1200 (0x4b0)
D  Network Name (Node-1-Repl-A) RegisterAllProvidersIP         0 (0x0)
D  Network Name (Node-1-Repl-A) PublishPTRRecords              0 (0x0)
D  Network Name (Node-1-Repl-A) TimerCallbackAdditionalThreshold 5 (0x5)
D  Network Name (Node-1-Repl-A) MSExchange_NetName             1 (0x1)
D  Network Name (Node-1-Repl-A) RequireDNS                     1 (0x1)
D  Network Name (Node-1-Repl-A) MSExchange_UseNetworkForLogCopying 1 (0x1)
D  Network Name (Node-1-Repl-A) requirekerberos                1 (0x1)
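For scripted verification, the cluster.exe listing can be parsed rather than read by eye. The following is a rough sketch that assumes the output looks like the listing in this post (resource name, then property name, then value on each line); the helper names are made up, and the parsing is deliberately simple.

```python
def parse_private_properties(output, resource):
    """Map property name -> raw value text for lines that mention `resource`."""
    props = {}
    for line in output.splitlines():
        _, found, rest = line.partition(resource)
        if not found:
            continue
        parts = rest.split(None, 1)  # property name, then the value text
        if len(parts) == 2:
            props[parts[0]] = parts[1].strip()
    return props


def kerberos_required(props):
    """True when requirekerberos is present and its decimal value is 1."""
    return props.get("requirekerberos", "").split()[:1] == ["1"]


sample = """\
D  Network Name (Node-1-Repl-A) MSExchange_NetName             1 (0x1)
D  Network Name (Node-1-Repl-A) RequireDNS                     1 (0x1)
D  Network Name (Node-1-Repl-A) requirekerberos                1 (0x1)
"""

props = parse_private_properties(sample, "Network Name (Node-1-Repl-A)")
print(props["requirekerberos"])   # 1 (0x1)
print(kerberos_required(props))   # True
```

Because the property name is matched exactly, a differently cased key such as RequireKerberos would not be reported as set, which matches the note above about the key being all lowercase.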

Get-ClusterResource <NAME> | Get-ClusterParameter

Object              Name                Value               Type
------              ----                -----               ----
Network Name (No... Name                NODE-1-REPL-A       String
Network Name (No... DnsName             Node-1-Repl-A       String
Network Name (No... RemapPipeNames      0                   UInt32
Network Name (No... HostRecordTTL       1200                UInt32
Network Name (No... RegisterAllProvi... 0                   UInt32
Network Name (No... PublishPTRRecords   0                   UInt32
Network Name (No... TimerCallbackAdd... 5                   UInt32
Network Name (No... MSExchange_NetName  1                   UInt32
Network Name (No... RequireDNS          1                   UInt32
Network Name (No... MSExchange_UseNe... 1                   UInt32
Network Name (No... requirekerberos     1                   UInt32
Network Name (No... ResourceData        {1, 0, 0, 0, 118... ByteArray
Network Name (No... StatusNetBIOS       0                   UInt32
Network Name (No... StatusDNS           0                   UInt32
Network Name (No... StatusKerberos      0                   UInt32
Network Name (No... CreatingDC          \\DC-1.domain...    String
Network Name (No... LastDNSUpdateTime   7/11/2010 9:26:2... DateTime
Network Name (No... ObjectGUID          5adc38b3281a0047... String
            

By restarting the replication service after setting this key, the replication service's configuration is immediately updated.  At this point the replication service should detect and begin to utilize the replication hostnames created.  This can be verified using the Get-ClusteredMailboxServerStatus cmdlet.

Identity                        : MBX-3
ClusteredMailboxServerName      : MBX-3.exchange.msft
State                           : Online
OperationalMachines             : {NODE-1 <Active>, Node-2 <Quorum Owner>}
FailedResources                 : {}
OperationalReplicationHostNames : {node-1-repl-a, node-1, node-2}
FailedReplicationHostNames      : {}
InUseReplicationHostNames       : {node-1-repl-a, node-2}

IsValid                         : True
ObjectState                     : Unchanged

At this time we are investigating a fix that does not require a workaround.  As changes occur I will update this blog.

Exchange 2010 – File Share Witness oddities…

In Exchange 2010, when a Database Availability Group (DAG) is utilized and there is an even number of DAG members, the underlying cluster is implemented utilizing the quorum type Node and File Share Majority.  The settings utilized for the File Share Witness are defined on the DAG when the logical DAG object is created and are either set by the administrator or automatically defined.

To verify the quorum type you can use either cluster.exe or cluster powershell extensions (Preferred)

Cluster.exe <cluster> /quorum  (Windows 2008 & Windows 2008 R2)

Cluster.exe cluster.domain.com /quorum

Witness Resource Name Path                                          Type
--------------------- --------------------------------------------- --------
File Share Witness (\\HT-1.DOMAIN.COM\DAG.DOMAIN.COM)               Majority

Get-Cluster <cluster> | Get-ClusterQuorum | FL (Windows 2008 R2 Only)

Cluster        : DAG
QuorumResource : File Share Witness (\\HT-1.DOMAIN.COM\DAG.DOMAIN.COM)
QuorumType     : NodeAndFileShareMajority

In Failover Cluster Manager, the resources can be viewed by looking at the Cluster Core Resources.


It may become necessary to change the server hosting the file share witness.  In Exchange 2010 this is not done utilizing Failover Cluster Manager, but rather utilizing the Set-DatabaseAvailabilityGroup cmdlet.  It is after the witness server is successfully updated that the oddity occurs.  Here’s an example:

Currently the DAG utilizes the witness server HT-1.  Using the Set-DatabaseAvailabilityGroup command the witness server is changed to HT-2 (Set-DatabaseAvailabilityGroup -Identity <DAG> -WitnessServer HT-2).  The command returns without error.  When running the previous cluster commands the following output is noted:

Cluster.exe cluster.domain.com /quorum (Windows 2008 and Windows 2008 R2)

Witness Resource Name Path                                          Type
--------------------- --------------------------------------------- --------
File Share Witness (\\HT-1.DOMAIN.COM\DAG.DOMAIN.COM)               Majority

Get-Cluster <cluster> | Get-ClusterQuorum | FL (Windows 2008 R2 Only)

Cluster        : DAG
QuorumResource : File Share Witness (\\HT-1.DOMAIN.COM\DAG.DOMAIN.COM)
QuorumType     : NodeAndFileShareMajority

Also in Failover Cluster Manager the following is noted in the cluster core resources group.

image

After looking at this output the administrator could be led to believe that the witness server did not successfully update.  After all, both cluster.exe and PowerShell still show File Share Witness (\\HT-1.DOMAIN.COM\DAG.DOMAIN.COM).  It is only in Failover Cluster Manager, if the window is fully expanded, that you can see both (\\HT-1.DOMAIN.COM\DAG.DOMAIN.COM) and (\\HT-2.DOMAIN.COM\DAG.DOMAIN.COM).  This in turn leads administrators to believe that two file share witness servers are currently in use.

Thankfully both of these perceived conditions are false.  The command succeeded in changing the witness server, and only one file share witness is in use.

Each cluster resource has a display name and a set of public and private properties.  Unfortunately when using set-databaseavailabilitygroup to change the witness server, the File Share Witness resource private property for where the witness is stored is updated but the public property display name, which contains the previous witness server, is not.  Let’s take a look at this further.

Using cluster.exe or powershell I can review the private properties of the File Share Witness resource.  (Command output truncated to show relevant values only.)

Cluster.exe <cluster> res <resource> /priv <or> /prop (Windows 2008 & Windows 2008 R2)

Cluster.exe cluster.domain.com res "File Share Witness (\\HT-1.domain.com\DAG.domain.com)" /prop

Listing properties for 'File Share Witness (\\HT-1.domain.COM\DAG.domain.COM)':

T  Resource             Name                           Value
-- -------------------- ------------------------------ -----------------------
SR File Share Witness (\\HT-1.domain.COM\DAG.domain.COM) Name                           File Share Witness (\\HT-1.domain.COM\DAG.domain.COM)

Cluster.exe cluster.domain.com res "File Share Witness (\\HT-1.domain.com\DAG.domain.com)" /priv

Listing private properties for 'File Share Witness (\\HT-1.domain.COM\DAG.domain.COM)':

T  Resource             Name                           Value
-- -------------------- ------------------------------ -----------------------
S  File Share Witness (\\HT-1.domain.COM\DAG.domain.COM) SharePath                      \\HT-1.domain.com\DAG.domain.com

Get-ClusterResource -Cluster <cluster> -Name <ResourceName> | fl (Windows 2008 R2 Only - Public Properties)

Name         : File Share Witness (\\HT-1.domain.COM\DAG.domain.COM)
State        : Online
OwnerGroup   : Cluster Group
ResourceType : File Share Witness

Get-ClusterResource -Cluster <cluster> -Name <ResourceName> | Get-ClusterParameter | fl (Windows 2008 R2 Only - Private Properties)

Name          : SharePath
IsReadOnly    : False
ParameterType : String
Value         : \\HT-1.domain.com\DAG.domain.com

At this point Set-DatabaseAvailabilityGroup is issued to change the witness server.  After the command completes successfully, the previous commands are run again.  (Command output truncated to show relevant values only.)

Cluster.exe cluster.domain.com res "File Share Witness (\\HT-1.domain.com\DAG.domain.com)" /prop

Listing properties for 'File Share Witness (\\HT-1.domain.COM\DAG.domain.COM)':

T  Resource             Name                           Value
-- -------------------- ------------------------------ -----------------------
SR File Share Witness (\\HT-1.domain.COM\DAG.domain.COM) Name                           File Share Witness (\\HT-1.domain.COM\DAG.domain.COM)

Cluster.exe cluster.domain.com res "File Share Witness (\\HT-1.domain.com\DAG.domain.com)" /priv

Listing private properties for 'File Share Witness (\\HT-1.domain.COM\DAG.domain.COM)':

T  Resource             Name                           Value
-- -------------------- ------------------------------ -----------------------
S  File Share Witness (\\HT-1.domain.COM\DAG.domain.COM) SharePath                      \\HT-2.domain.com\DAG.domain.com

(Note:  The SharePath in the previous output reflects the new witness server as expected)

Get-ClusterResource -Cluster <cluster> -Name <ResourceName> | fl (Windows 2008 R2 Only - Public Properties)

Name         : File Share Witness (\\HT-1.domain.COM\DAG.domain.COM)
State        : Online
OwnerGroup   : Cluster Group
ResourceType : File Share Witness

Get-ClusterResource -Cluster <cluster> -Name <ResourceName> | Get-ClusterParameter | fl (Windows 2008 R2 Only - Private Properties)

Name          : SharePath
IsReadOnly    : False
ParameterType : String
Value         : \\HT-2.domain.com\DAG.domain.com

(Note:  The SharePath in the previous output reflects the new witness server as expected)

As you can see, the Set-DatabaseAvailabilityGroup command did complete its task successfully by updating the SharePath attribute of the quorum resource to utilize the correct witness server.
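A quick way to spot this condition is to compare the path embedded in the resource's display name with its SharePath value. The sketch below assumes the display name follows the "File Share Witness (\\server\share)" pattern seen in the output in this post; the function name is hypothetical.

```python
import re


def witness_name_is_stale(resource_name, share_path):
    """True when the UNC path in the display name differs from SharePath."""
    match = re.fullmatch(r"File Share Witness \((.+)\)", resource_name)
    if match is None:
        raise ValueError("unexpected resource name: %r" % resource_name)
    # Host and share names are compared without regard to case.
    return match.group(1).lower() != share_path.lower()


name = r"File Share Witness (\\HT-1.domain.COM\DAG.domain.COM)"

# Before the change: name and SharePath agree.
print(witness_name_is_stale(name, r"\\HT-1.domain.com\DAG.domain.com"))  # False
# After the change: SharePath points at HT-2, the name still says HT-1.
print(witness_name_is_stale(name, r"\\HT-2.domain.com\DAG.domain.com"))  # True
```

A True result here only means the display name is out of date; SharePath is the value that tells you which witness server is in use.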

Mount point design and MSSearch

The use of mount points for Exchange is becoming more commonplace in many installations.  Some customers feel the best implementation of mount points consists of a small root disk with mount points created from folders on that disk.

For example, I may have a Drive L: that is 10 megs and I may create 4 folders on this drive (Database1 / Database2 / Database3 / Database4).  I will then create mount points utilizing the folders created from the L drive.

There are certain processes in Exchange that check for free drive space prior to performing certain operations.  Unfortunately these processes are not necessarily mount point aware, so they end up querying the free space of the lettered volume rather than the mount point.  One of these processes is MSSearch.

MSSearch by default creates a catalog data folder co-located with each EDB file.  In our example above the catalog data folder and the EDB file would be in L:\Database1 (where Database1 is the mount point).  In this case the L drive has 10 megs of free space but the Database1 mount point has 1.5 terabytes of free space.  When MSSearch attempts to create the initial catalog, the operation fails because the free space reported by disk L is not sufficient (even though there is plenty of space where the actual catalog is stored).
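The difference between the two kinds of check can be illustrated with a short sketch. It is not how MSSearch performs its check; it only shows that the answer depends on which path is queried. The helper name is made up, and the expectation that a path inside a mount point reports the mounted volume's free space on Windows is an assumption worth verifying on your own system.

```python
import shutil


def has_room(path, required_bytes):
    """Check free space on the volume that actually holds `path`."""
    return shutil.disk_usage(path).free >= required_bytes


# Runnable anywhere: query the volume holding the current directory.
print(shutil.disk_usage(".").free)
print(has_room(".", 0))  # True

# On the server from the example, the two checks would be (hypothetical paths):
#   has_room("L:\\", needed)            # drive-letter check, sees the 10 MB root
#   has_room("L:\\Database1", needed)   # path-based check, sees the mounted volume
```

A process that asks about the drive letter sees only the small root disk, which is the behavior described above.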

Here is an example of some events you may see when this occurs.

Log Name:      Application
Source:        MSExchange Search Indexer
Date:          6/14/2010 12:11:20 PM
Event ID:      104
Task Category: General
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      server.company.com
Description:
Exchange Search Indexer failed to enable the Mailbox Database DATABASE(GUID = 58c0ed8a-dbfc-4d55-b265-8a80f1dc477b) after 1 tries. The last failure was: System.ComponentModel.Win32Exception: Unable to SetProperty FTE_PluginList on catalog ExSearch-58c0ed8a-dbfc-4d55-b265-8a80f1dc477b-26fc1c62-d3e8-4711-b3c9-3bb0b32aec0a. Error = -2147215320
   at Microsoft.Exchange.Msfte.CFTEAdmin.SetProperty(CatalogState catalogInfo, PropertyScope propertyScope, String propertyName, Object propertyValue, Boolean throwOnFailure)
   at Microsoft.Exchange.Msfte.CFTEAdmin.CreateCatalog(CatalogState catalogInfo)
   at Microsoft.Exchange.Search.Globals.CreateCatalog(CatalogState state, String reason)
   at Microsoft.Exchange.Search.Globals.RecreateCatalogAndPropertyStore(CatalogState catalogInfo, String reason)
   at Microsoft.Exchange.Search.CatalogState.CreateNew(String reason)
   at Microsoft.Exchange.Search.CatalogState.Reset(String reason)
   at Microsoft.Exchange.Search.CatalogState.HandleMountCatalogException(Exception exception)
   at Microsoft.Exchange.Search.Globals.CheckAndInitializeCatalog(CatalogState catalogInfo)
   at Microsoft.Exchange.Search.Driver.ProcessNewCatalogInternal(CatalogState catalog, List`1 mdbsToCrawl, Int32& numberOfDisabledMDBs). It will retry after 10 minutes.

 

Log Name:      Application
Source:        ExchangeStoreDB
Date:          6/14/2010 12:12:51 PM
Event ID:      222
Task Category: Database recovery
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      server.company.com
Description:
At ‘6/14/2010 11:12:50 AM’ the Microsoft Exchange Information Store Database ‘DATABASE’ copy on this server experienced a corrupted search catalog. The error returned by failover was "There is only one copy of this mailbox database (DATABASE). Automatic recovery is not available.". Consult the event log on the server for other "ExchangeStoreDb" and "MSExchange Search Indexer" events for more specific information about the failures.

The important information is actually contained in the first event: the error code -2147215320.  This error code translates to CI_E_CONFIG_DISK_FULL.
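The event log prints the code as a signed 32-bit integer, while error references normally list such codes as unsigned hexadecimal HRESULT values. A small conversion makes it easier to look up; this sketch only does the arithmetic and does not consult any error table.

```python
def to_hresult(signed_code):
    """Reinterpret a signed 32-bit error code as an unsigned HRESULT."""
    return signed_code & 0xFFFFFFFF


hr = to_hresult(-2147215320)
print(hex(hr))                            # 0x80041828
print("failure:", bool(hr >> 31))         # failure: True
print("facility:", (hr >> 16) & 0x1FFF)   # facility: 4
print("code:", hex(hr & 0xFFFF))          # code: 0x1828
```

Searching for the hexadecimal form is usually more productive than searching for the negative decimal number.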

To resolve this issue you can:

  • Increase the space allotted to the root disk hosting the mount point.
  • Change from utilizing mount points to drive letters.

Once this is done restarting the MSSearch services may be necessary so that initial catalog creation can occur. 

MSExchangeRepl 2147 / MSExchangeRepl 2104 / MSExchangeRepl 2127 occurring on Windows 2008 or Windows 2008 R2 with Exchange 2007 Cluster Continuous Replication (CCR)

When Exchange 2007 CCR is installed on Windows 2008 or Windows 2008 R2 the following error may be noted in the application log of the passive node:

Log Name: Application
Source: MSExchangeRepl
Event ID: 2104
Task Category: Service
Level: Error
Keywords: Classic
User: N/A
Computer: MACHINE
Description:
Log file action LogCopy failed for storage group EXCLUST01SG2. Reason:
CreateFile(\\Server\StorageGroupGUID$\LogFile.log) = 2

If the CCR cluster is not utilizing continuous replication host names the following event series may also be noted:

Event ID : 2147
Raw Event ID : 2147
Source : MSExchangeRepl
Type : Error
Machine : SERVER
Message : There was a problem with ‘ActiveNode’, which is an alternate name for ‘ActiveNode’. The list of aliases is now ‘ActiveNode’, and the alias ‘was’ removed from the list. The specific problem is ‘CreateFile(\\ActiveNode\StorageGroupGuid$\LogFile.log) = 2’.

ID:       2127
Level:    Information
Provider: MSExchangeRepl
Machine:  SERVER
Message:  The system has detected a change in the available replication networks.  The system is now using network ‘ActiveNode’ instead of network ‘ActiveNode’ for log copying from node ActiveNode.

In this situation, if the solution is aggressively monitored, you may note that replication is temporarily failed and then resumes automatically as healthy.  This occurs because replication pauses temporarily when the error condition is detected, while the replication service attempts to find other replication paths, and then automatically re-attempts the same copy operation.

If the CCR cluster is utilizing continuous replication host names the following event series may also be noted:

Event ID : 2147
Raw Event ID : 2147
Source : MSExchangeRepl
Type : Error
Machine : SERVER
Message : There was a problem with ‘ReplicationHostName’, which is an alternate name for ‘ActiveNode’. The list of aliases is now ‘ActiveNode’, and the alias ‘was’ removed from the list. The specific problem is ‘CreateFile(\\ReplicationHostName\StorageGroupGUID$\LogFile.log) = 2’.

ID:       2127
Level:    Information
Provider: MSExchangeRepl
Machine:  SERVER
Message:  The system has detected a change in the available replication networks.  The system is now using network ‘ActiveNode’ instead of network ‘ReplicationHostName’ for log copying from node ActiveNode.

Error 2 is ERROR_FILE_NOT_FOUND

In this situation the error is detected on the replication host name.  The replication service will temporarily pause replication while other network paths are enumerated.  If other continuous replication host names are in use, the replication service will select an alternate replication host name and automatically resume log copying.  If the only valid path is the “public” path, the replication service will begin copying log files over the “public” network.  Eventually this error occurs on the public network, forcing network re-enumeration to occur and replication to automatically switch back to the replication network.  If the solution is aggressively monitored, the replication status may be failed during this switch but will automatically resume healthy.

In almost all instances these errors are considered benign to the operation of the Exchange Server.

The replication service is extremely aggressive in its attempts to copy log files.  The replication service is always aware of the next log file in the series that requires copying to the passive node.  As part of normal processes the replication service may query multiple times for the presence of this file and make copy attempts.  These attempts may result in the replication service querying for a  log file that is not fully available.  Under Windows 2003 this was not necessarily an issue.  Windows 2008 introduces a component into SMBv2 that may cause this to be a problem.

SMBv2 introduces status caching into the LanManWorkstation service.  When an application requests information from a file share, the workstation service caches the response from the server hosting the share.  Subsequent requests for the same information are returned from cache rather than re-contacting the server hosting the share.  Eventually this cache will expire (in our case it expires by the time replication is failed / resumed or a switch between replication host names occurs).  The replication service has received feedback that the log file in question should now be available for copy, attempts to copy it, and receives an older cached status that the file is not ready (even though the file does exist on the source at the time the attempt is made).  In turn the replication service detects this as an error condition and takes action.
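The effect of a cached "not found" answer can be reproduced with a small simulation. This is a toy model of the behavior described above, not the workstation service's actual caching logic; the class and function names are made up, and the 5 second lifetime is an arbitrary value chosen for the demonstration.

```python
class StatusCache:
    """Remembers the last answer per path for `lifetime` seconds (0 = no caching)."""

    def __init__(self, lifetime, server_lookup):
        self.lifetime = lifetime
        self.server_lookup = server_lookup
        self.entries = {}  # path -> (answer, time the answer was stored)

    def exists(self, path, now):
        cached = self.entries.get(path)
        if cached is not None and now - cached[1] < self.lifetime:
            return cached[0]              # answered from cache, server not asked
        answer = self.server_lookup(path)
        self.entries[path] = (answer, now)
        return answer


def probe_then_copy(lifetime):
    files_on_server = set()
    cache = StatusCache(lifetime, lambda path: path in files_on_server)
    cache.exists("LogFile.log", now=0)          # early probe: file not there yet
    files_on_server.add("LogFile.log")          # the active node closes the log
    return cache.exists("LogFile.log", now=1)   # copy attempt one second later


print(probe_then_copy(lifetime=5))  # False: stale "not found" served from cache
print(probe_then_copy(lifetime=0))  # True: the server is asked every time
```

With a lifetime of zero every request goes back to the server, so the second probe sees the file.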

From a Windows 2008 / Windows 2008 R2 perspective this is by design.

To correct these errors on an Exchange 2007 / Windows 2008 or Exchange 2007 / Windows 2008 R2 implementation, the following registry values should be set to zero (0) and the nodes rebooted:

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Lanmanworkstation\Parameters

FileInfoCacheLifetime [DWORD]

FileNotFoundCacheLifetime [DWORD]

DirectoryCacheLifetime [DWORD]

If the DWORDs are not present they may need to be created.  The recommended value is HEX / DEC 0.
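To apply the same values on several nodes, one option is to generate a .reg file and import it on each node. The sketch below only builds the file text in the standard .reg export format; the function name is hypothetical, the key is the Lanmanworkstation Parameters key written with its backslashes, and the result should be reviewed before it is imported anywhere.

```python
KEY = (r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services"
       r"\Lanmanworkstation\Parameters")
VALUES = ("FileInfoCacheLifetime",
          "FileNotFoundCacheLifetime",
          "DirectoryCacheLifetime")


def build_reg_file(key=KEY, names=VALUES, data=0):
    """Return .reg file text that sets each named DWORD value to `data`."""
    lines = ["Windows Registry Editor Version 5.00", "", "[%s]" % key]
    lines += ['"%s"=dword:%08x' % (name, data) for name in names]
    return "\r\n".join(lines) + "\r\n"


print(build_reg_file())
```

Importing a .reg file creates the values if they do not already exist, and the nodes still need the reboot mentioned above for the change to take effect.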

More information on these keys can be found here: http://technet.microsoft.com/en-us/library/ff686200(WS.10).aspx  (Note that the registry path in the article is missing the SERVICES key; the correct path is the one given in this post.)