Have a DAG? Thinking about having a DAG? I recommend you check out this awesome blog on designing a highly available database copy layout…
This is a great blog post on Windows 2008 and Windows 2008 R2 Cluster Logs.
TIMMCMIC
If an Exchange 2010 RTM server <or> an Exchange 2010 SP1 Beta server has been upgraded to Exchange 2010 SP1 RTM, administrators may experience an error when attempting to use the Remove-MailboxDatabaseCopy <or> Add-MailboxDatabaseCopy cmdlets.
When running remove-mailboxdatabasecopy the following error is noted:
Remove-MailboxDatabaseCopy DAG-DB0\DAG-2 –Verbose
WARNING: An unexpected error has occurred and a Watson dump is being generated: Registry key has subkeys and recursive removes are not supported by this method.
Registry key has subkeys and recursive removes are not supported by this method.
+ CategoryInfo : NotSpecified: (:) [Remove-MailboxDatabaseCopy], InvalidOperationException
+ FullyQualifiedErrorId : System.InvalidOperationException,Microsoft.Exchange.Management.SystemConfigurationTasks.
RemoveMailboxDatabaseCopy
Although the error is reported, the remove was successful in updating the database object in Active Directory to show that the server no longer hosts a copy of the database. You can verify the copy was successfully removed by reviewing the Servers property with the Get-MailboxDatabase –Identity <NAME> | fl name,servers command.
Here is sample output (note DAG-2 is missing):
[PS] D:\>Get-MailboxDatabase DAG-DB0 | fl name,servers
Name : DAG-DB0
Servers : {DAG-1, DAG-3, DAG-4}
If an administrator attempts to add a database copy to a DAG member, the same error may also be returned.
Add-MailboxDatabaseCopy DAG-DB0 -MailboxServer DAG-2
WARNING: An unexpected error has occurred and a Watson dump is being generated: Registry key has subkeys and recursive
removes are not supported by this method.
Registry key has subkeys and recursive removes are not supported by this method.
+ CategoryInfo : NotSpecified: (:) [Add-MailboxDatabaseCopy], InvalidOperationException
+ FullyQualifiedErrorId : System.InvalidOperationException,Microsoft.Exchange.Management.SystemConfigurationTasks.
AddMailboxDatabaseCopy
Unlike Remove-MailboxDatabaseCopy, this command is not successful in adding the copy <or> updating Active Directory to show the copy was added.
To work around this issue the administrator should:
1) Identify the GUID of the database that is being added.
2) On the server specified in the add command, using the database GUID identified, remove the following registry key:
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\ExchangeServer\v14\Replay\State\{DB-GUID}\DumpsterInfo
To identify the mailbox database GUID, use the following command:
[PS] D:\>Get-MailboxDatabase DAG-DB0 | fl name,GUID
Name : DAG-DB0
Guid : 8d3a9778-851c-40a4-91af-65a2c487b4cc
The GUID identified in this case is 8d3a9778-851c-40a4-91af-65a2c487b4cc. With this information you can now export and delete the DumpsterInfo key on the server where you are attempting to add the mailbox database copy.
Once the registry key is removed the add-mailboxdatabasecopy command will complete successfully and the database copy will be added.
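The key path from step 2 can be assembled from the GUID that Get-MailboxDatabase returns. The Python helper below is only an illustrative sketch: the name `dumpster_info_key` is hypothetical, the path layout is taken from the workaround above, and it only builds the string. Exporting and deleting the key is still done with your usual registry tooling.

```python
import uuid

# Base of the per-database replay state keys, as given in the workaround above.
REPLAY_STATE_BASE = r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\ExchangeServer\v14\Replay\State"


def dumpster_info_key(database_guid: str) -> str:
    """Return the DumpsterInfo key path for a database GUID (hypothetical helper)."""
    # uuid.UUID raises ValueError on malformed input, so a typo in the GUID
    # is caught before anyone exports or deletes the wrong key.
    guid = str(uuid.UUID(database_guid))
    return REPLAY_STATE_BASE + "\\{" + guid + "}\\DumpsterInfo"


if __name__ == "__main__":
    # GUID from the sample Get-MailboxDatabase output above.
    print(dumpster_info_key("8d3a9778-851c-40a4-91af-65a2c487b4cc"))
```

Validating the GUID first is a deliberate choice: the key is deleted by hand, so it is worth failing early on a mistyped value.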
A question that has come up a few times recently is why does the date modified timestamp on my Exchange databases not change (even though the database is mounted and functioning). Specifically some administrators have been looking at this as an indicator of health on a passive database copy – which it is not.
The date modified timestamp will generally get updated on an Exchange database when one of two things happen:
1) The EDB file size is extended in order to accommodate data that does not fit into whitespace that currently exists in the database.
2) The database is dismounted and all open handles to the file are released.
Note that the modified time is not subject to change if the contents of the file are changed – for example if whitespace is utilized within the database for the storage of new messages etc the date modified will not change.
To show this I used my lab to generate some examples. Here is a screen shot of a database that was mounted last on 8/3/2010. The database screen shot was taken 8/8/2010 before 8:29 am edt.
Using the Exchange Management Console, I dismounted the database at 8:29 am edt on 8/8/2010.
You will note that the date modified changed to the time and date the dismount occurred. I then used the Exchange Management Console to re-mount the database.
After remounting the database I noted that the time remained the same as in the previous screen shot. I then took some test mailboxes with content, and moved them into the mailbox store. You will note in this screen shot that both the size and date modified changed – in this case the database file was extended on the partition so the change was expected.
It is normal for an Exchange database to not show an updated date modified, and this field should not be used to judge the health or utilization of an Exchange database.
In Exchange 2007 there are two clustered installation models. Some customers elect to utilize a clustered installation model based on shared storage – this is a single copy cluster installation. In order to achieve site resiliency or provide for disaster recovery, some customers will implement a SAN based data replication solution.
Recently I encountered a customer that was utilizing SAN based data replication and the single copy cluster installation model to provide their site resilient solution. The installation encompassed a source cluster with a single copy cluster configuration and a target cluster with a single copy cluster configuration. Each clustered mailbox server was established utilizing a different name, for example Exchange-Main and Exchange-DR. The physical disk resources that were assigned to each CMS instance represented the LUNs that were replicated between SANs. When it was necessary to activate the solution, databases would be marked as “Allow this database to be overwritten by a restore” and then mounted. Mailboxes would be moved utilizing move-mailbox –configurationOnly to restore client access to the replicated databases.
This presented an interesting challenge for this customer when it came to deploying service packs. When the same physical disk resources are utilized between clusters, only one set of the physical disk resources can be brought online. This is because one SAN has a Read / Write setting and the other SAN has a Read Only setting. Essentially an online attempt of the database instances of the CMS Exchange-DR would fail because their dependent physical disks could not be brought online (because they were read only).
When an /upgradeCMS is performed after upgrading the binaries on a clustered node, the resources are initially in an offline state. As a completion of the upgradeCMS the setup process initiates an online to the cluster mailbox server group. Should any resources fail to come online this is considered a failure of the upgrade. The administrator performing the upgrade is notified that a failure occurred and the upgrade setup watermark persists in the registry. Therefore it is necessary that the /upgradeCMS be allowed to complete. In this case database instances could not be brought online because their associated storage could not be brought online due to the storage being Read Only.
In order to complete the upgrade process the following steps were utilized (utilizing my sample clustered mailbox server names).
At this point both Exchange-Main and Exchange-DR are online. This means that the databases that were previously replicated to Exchange-DR are no longer equal to the databases that exist on Exchange-Main. As a post upgrade step we need to do the following:
In this installation it was necessary to temporarily break and re-establish replication in order to complete the /upgradeCMS process.
Exchange 2007 SP3 adds the support for utilizing Windows 2008 R2 servers.
In Exchange 2007 Cluster Continuous Replication (CCR) installations, all log shipping activity by default occurs over the “public” cluster interface. When administrators desire to have log shipping activities occur over a “private” network or desire to implement multiple replication paths between nodes, continuous replication hostnames can be utilized.
More information on Exchange 2007 CCR clusters and continuous replication hostnames can be found at http://technet.microsoft.com/en-us/library/bb124521(EXCHG.80).aspx.
Prior to implementing a continuous replication host name, the Get-ClusteredMailboxServerStatus cmdlet can be used to see which names are currently servicing replication. Here is sample output from a cluster not configured to utilize continuous replication host names.
Identity : MBX-3
ClusteredMailboxServerName : MBX-3.domain.com
State : Online
OperationalMachines : {NODE-1 <Active>, Node-2 <Quorum Owner>}
FailedResources : {}
OperationalReplicationHostNames : {node-1, node-2}
FailedReplicationHostNames : {}
InUseReplicationHostNames : {node-1, node-2}
IsValid : True
ObjectState : Unchanged
After establishing the pre-requisites necessary to utilize continuous replication hostnames, the hostnames creation is performed using the enable-continuousreplicationhostname shell command. (http://technet.microsoft.com/en-us/library/bb690985(EXCHG.80).aspx)
When attempting to enable a replication hostname on a Windows 2008 R2 cluster, the following error may be displayed in the management shell.
[PS] C:\>Enable-ContinuousReplicationHostName -TargetMachine Node-1 -HostName Node-1-Repl-A -IPv4Address 10.0.1.3
Confirm
Are you sure you want to perform this action?
Enabling continuous replication host name "Node-1-Repl-A".
[Y] Yes [A] Yes to All [N] No [L] No to All [S] Suspend [?] Help
(default is "Y"):a
Enable-ContinuousReplicationHostName : Enable-ContinuousReplicationHostNameNetw
ork configuration could not be completed.
At line:1 char:37
+ Enable-ContinuousReplicationHostName <<<< -TargetMachine Node-1 -HostName Node-1-Repl-A -IPv4Address 10.0.1.3
+ CategoryInfo : InvalidOperation: (:) [Enable-ContinuousReplicat
ionHostName], NetworkConfigException
+ FullyQualifiedErrorId : C3F1320,Microsoft.Exchange.Management.SystemConf
igurationTasks.EnableContinuousReplicationHostName
When reviewing Failover Cluster Manager, the replication host name group containing the correct network name and IPv4 address appears to have been created successfully.
Although the continuous replication host name group was created, reviewing Get-ClusteredMailboxServerStatus indicates the name is not being utilized by the replication service on the cluster.
Identity : MBX-3
ClusteredMailboxServerName : MBX-3.domain.com
State : Online
OperationalMachines : {NODE-1 <Active>, Node-2 <Quorum Owner>}
FailedResources : {}
OperationalReplicationHostNames : {node-1, node-2}
FailedReplicationHostNames : {}
InUseReplicationHostNames : {node-1, node-2}
IsValid : True
ObjectState : Unchanged
When the replication service first starts up <or> the configuration time expires the replication service enumerates all network names on the cluster to determine which are valid endpoints for log shipping. This is initially based on two cluster private properties stamped on each name, MSExchange_NetName and MSExchange_UseNetworkForLogCopying. Each of these should have a value of 1 on a network name utilized as a continuous replication host name.
Listing private properties for ‘Network Name (Node-1-Repl-A)’:
T Resource Name Value
— ——————– —————————— ———————–
BR Network Name (Node-1-Repl-A) ResourceData 01 00 00 00 … (260 bytes)
DR Network Name (Node-1-Repl-A) StatusNetBIOS 0 (0x0)
DR Network Name (Node-1-Repl-A) StatusDNS 0 (0x0)
DR Network Name (Node-1-Repl-A) StatusKerberos 0 (0x0)
SR Network Name (Node-1-Repl-A) CreatingDC \\DC-1.domain.com
FTR Network Name (Node-1-Repl-A) LastDNSUpdateTime 7/11/2010 2:26:26 PM
SR Network Name (Node-1-Repl-A) ObjectGUID 5adc38b3281a004788f2a3e27ae7a0ce
S Network Name (Node-1-Repl-A) Name NODE-1-REPL-A
S Network Name (Node-1-Repl-A) DnsName Node-1-Repl-A
D Network Name (Node-1-Repl-A) RemapPipeNames 0 (0x0)
D Network Name (Node-1-Repl-A) HostRecordTTL 1200 (0x4b0)
D Network Name (Node-1-Repl-A) RegisterAllProvidersIP 0 (0x0)
D Network Name (Node-1-Repl-A) PublishPTRRecords 0 (0x0)
D Network Name (Node-1-Repl-A) TimerCallbackAdditionalThreshold 5 (0x5)
D Network Name (Node-1-Repl-A) MSExchange_NetName 1 (0x1)
D Network Name (Node-1-Repl-A) RequireDNS 1 (0x1)
D Network Name (Node-1-Repl-A) MSExchange_UseNetworkForLogCopying 1 (0x1)
On the surface it would appear that there is nothing preventing this name from operating correctly as a continuous replication host name. After performing some internal tracing it was determined that the replication service is also implementing another check on a network name resource to ensure that it can be satisfactorily utilized for replication – is Kerberos enabled for the network name. The replication service performs this check by reviewing a private property of a network name resource – requirekerberos and ensuring it has a value of 1.
In Windows 2003, network name resources could be enabled for Kerberos at the administrator's discretion. In Windows 2008 and Windows 2008 R2, all network names must be Kerberos enabled. In Windows 2008, requireKerberos is a valid private property and can be programmatically set. In Windows 2008 R2, the requireKerberos property has been deprecated and can no longer be programmatically set. Without the requireKerberos property in Windows 2008 R2, the Enable-ContinuousReplicationHostName cmdlet fails with the previously documented error.
To work around this issue and allow the replication host names created with the enable-continuousreplicationhostname command to function the following steps can be performed:
At this time you can utilize either cluster.exe or PowerShell to verify that the requirekerberos property has been created with a value of 1.
Cluster.exe <clusterFQDN> res <Network Name> /priv –> Cluster.exe cluster.domain.com res “Network Name (Node-1-Repl-A)” /priv
Listing private properties for ‘Network Name (Node-1-Repl-A)’:
T Resource Name Value
— ——————– —————————— ———————–
BR Network Name (Node-1-Repl-A) ResourceData 01 00 00 00 … (260 bytes)
DR Network Name (Node-1-Repl-A) StatusNetBIOS 0 (0x0)
DR Network Name (Node-1-Repl-A) StatusDNS 0 (0x0)
DR Network Name (Node-1-Repl-A) StatusKerberos 0 (0x0)
SR Network Name (Node-1-Repl-A) CreatingDC \\DC-1.domain.com
FTR Network Name (Node-1-Repl-A) LastDNSUpdateTime 7/11/2010 2:26:26 PM
SR Network Name (Node-1-Repl-A) ObjectGUID 5adc38b3281a004788f2a3e27ae7a0ce
S Network Name (Node-1-Repl-A) Name NODE-1-REPL-A
S Network Name (Node-1-Repl-A) DnsName Node-1-Repl-A
D Network Name (Node-1-Repl-A) RemapPipeNames 0 (0x0)
D Network Name (Node-1-Repl-A) HostRecordTTL 1200 (0x4b0)
D Network Name (Node-1-Repl-A) RegisterAllProvidersIP 0 (0x0)
D Network Name (Node-1-Repl-A) PublishPTRRecords 0 (0x0)
D Network Name (Node-1-Repl-A) TimerCallbackAdditionalThreshold 5 (0x5)
D Network Name (Node-1-Repl-A) MSExchange_NetName 1 (0x1)
D Network Name (Node-1-Repl-A) RequireDNS 1 (0x1)
D Network Name (Node-1-Repl-A) MSExchange_UseNetworkForLogCopying 1 (0x1)
D Network Name (Node-1-Repl-A) requirekerberos 1 (0x1)
Get-ClusterResource <NAME> | Get-ClusterParameter
Object Name Value Type
—— —- —– —-
Network Name (No… Name NODE-1-REPL-A String
Network Name (No… DnsName Node-1-Repl-A String
Network Name (No… RemapPipeNames 0 UInt32
Network Name (No… HostRecordTTL 1200 UInt32
Network Name (No… RegisterAllProvi… 0 UInt32
Network Name (No… PublishPTRRecords 0 UInt32
Network Name (No… TimerCallbackAdd… 5 UInt32
Network Name (No… MSExchange_NetName 1 UInt32
Network Name (No… RequireDNS 1 UInt32
Network Name (No… MSExchange_UseNe… 1 UInt32
Network Name (No… requirekerberos 1 UInt32
Network Name (No… ResourceData {1, 0, 0, 0, 118… ByteArray
Network Name (No… StatusNetBIOS 0 UInt32
Network Name (No… StatusDNS 0 UInt32
Network Name (No… StatusKerberos 0 UInt32
Network Name (No… CreatingDC \DC-1.domain…… String
Network Name (No… LastDNSUpdateTime 7/11/2010 9:26:2… DateTime
Network Name (No… ObjectGUID 5adc38b3281a0047… String
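If you capture the cluster.exe /priv output to a text file, the three properties the replication service looks at can be checked in one pass. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes the line layout shown in the cluster.exe listing above (type flag, resource name, property name, value).

```python
# Properties that, per the discussion above, must be 1 on a network name
# used as a continuous replication host name.
REQUIRED = ("MSExchange_NetName", "MSExchange_UseNetworkForLogCopying", "requirekerberos")


def parse_private_properties(output: str, resource: str) -> dict:
    """Map lower-cased property names to raw value text for one resource."""
    props = {}
    for line in output.splitlines():
        if resource not in line:
            continue  # header, separator, or a different resource
        # Everything after the resource name is "<property> <value...>".
        rest = line.split(resource, 1)[1].split(None, 1)
        if len(rest) < 2:
            continue  # e.g. the "Listing private properties for ..." line
        name, value = rest
        props[name.lower()] = value.strip()
    return props


def missing_replication_flags(output: str, resource: str) -> list:
    """Return the required properties that are absent or not set to 1."""
    props = parse_private_properties(output, resource)
    missing = []
    for name in REQUIRED:
        value = props.get(name.lower(), "")
        first_token = value.split()[0] if value else ""
        if first_token != "1":
            missing.append(name)
    return missing
```

An empty list means all three properties are present with a value of 1. A result of `['requirekerberos']` matches the state described before the workaround is applied.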
By restarting the replication service after setting this property, the replication service's configuration is immediately updated. At this time the replication service should detect and begin to utilize the replication host names created. This can be verified using the Get-ClusteredMailboxServerStatus cmdlet.
Identity : MBX-3
ClusteredMailboxServerName : MBX-3.exchange.msft
State : Online
OperationalMachines : {NODE-1 <Active>, Node-2 <Quorum Owner>}
FailedResources : {}
OperationalReplicationHostNames : {node-1-repl-a, node-1, node-2}
FailedReplicationHostNames : {}
InUseReplicationHostNames : {node-1-repl-a, node-2}
IsValid : True
ObjectState : Unchanged
At this time we are investigating a fix that does not require a workaround. As changes occur I will update this blog.
In Exchange 2010, when a Database Availability Group (DAG) is utilized and there is an even number of DAG members, the underlying cluster is implemented utilizing the quorum type Node and File Share Majority. The settings utilized for the File Share Witness are defined on the DAG when the logical DAG object is created and are either set by the administrator or automatically defined.
To verify the quorum type you can use either cluster.exe or cluster powershell extensions (Preferred)
Cluster.exe <cluster> /quorum (Windows 2008 & Windows 2008 R2)
Cluster.exe cluster.domain.com /quorum
Witness Resource Name Path Type
--------------------- --------------------------------------------- --------
File Share Witness (\\HT-1.DOMAIN.COM\DAG.DOMAIN.COM) Majority
Get-Cluster <cluster> | Get-ClusterQuorum | FL (Windows 2008 R2 Only)
Cluster : DAG
QuorumResource : File Share Witness (\\HT-1.DOMAIN.COM\DAG.DOMAIN.COM)
QuorumType : NodeAndFileShareMajority
In Failover Cluster Manager, the resources can be viewed by looking at the Cluster Core Resources.
It may become necessary to change the server hosting the file share witness. In Exchange 2010 this is not done utilizing Failover Cluster Manager, but rather utilizing the set-databaseavailabilitygroup commandlet. It is after the witness server is successfully updated that the oddity occurs. Here’s an example:
Currently the DAG utilizes the witness server HT-1. Using the Set-DatabaseAvailabilityGroup command, the witness server is changed to HT-2 (Set-DatabaseAvailabilityGroup <DAG> –WitnessServer HT-2). The command returns without error. When running the previous cluster commands the following output is noted:
Cluster.exe cluster.domain.com /quorum (Windows 2008 and Windows 2008 R2)
Witness Resource Name Path Type
--------------------- --------------------------------------------- --------
File Share Witness (\\HT-1.DOMAIN.COM\DAG.DOMAIN.COM) Majority
Get-Cluster <cluster> | Get-ClusterQuorum | FL (Windows 2008 R2 Only)
Cluster : DAG
QuorumResource : File Share Witness (\\HT-1.DOMAIN.COM\DAG.DOMAIN.COM)
QuorumType : NodeAndFileShareMajority
Also in Failover Cluster Manager the following is noted in the cluster core resources group.
After looking at this output the administrator could be led to believe that the witness server did not successfully update. After all, both cluster.exe and PowerShell show the File Share Witness (\\HT-1.DOMAIN.COM\DAG.DOMAIN.COM). It is only in Failover Cluster Manager, if the window is fully expanded, that you can see both (\\HT-1.DOMAIN.COM\DAG.DOMAIN.COM) and (\\HT-2.DOMAIN.COM\DAG.DOMAIN.COM). This leads administrators to believe that two file share witness servers are currently in use.
Thankfully both of these perceived conditions are false. The command was both successful in changing the witness server and only one file share witness is in use.
Each cluster resource has a display name and a set of public and private properties. Unfortunately when using set-databaseavailabilitygroup to change the witness server, the File Share Witness resource private property for where the witness is stored is updated but the public property display name, which contains the previous witness server, is not. Let’s take a look at this further.
Using cluster.exe or powershell I can review the private properties of the File Share Witness resource. (Command output truncated to show relevant values only.)
Cluster.exe <cluster> res <resource> /priv <or> /prop (Windows 2008 & Windows 2008 R2)
Cluster.exe cluster.domain.com res "File Share Witness (\\HT-1.domain.com\DAG.domain.com)" /prop
Listing properties for 'File Share Witness (\\HT-1.domain.COM\DAG.domain.COM)':
T Resource Name Value
-- -------------------- ------------------------------ -----------------------
SR File Share Witness (\\HT-1.domain.COM\DAG.domain.COM) Name File Share Witness (\\HT-1.domain.COM\DAG.domain.COM)
Cluster.exe cluster.domain.com res "File Share Witness (\\HT-1.domain.com\DAG.domain.com)" /priv
Listing private properties for 'File Share Witness (\\HT-1.domain.COM\DAG.domain.COM)':
T Resource Name Value
-- -------------------- ------------------------------ -----------------------
S File Share Witness (\\HT-1.domain.COM\DAG.domain.COM) SharePath \\HT-1.domain.com\DAG.domain.com
Get-ClusterResource –Cluster <cluster> –Name <ResourceName> | fl (Windows 2008 R2 Only – Public Properties)
Name : File Share Witness (\\HT-1.domain.COM\DAG.domain.COM)
State : Online
OwnerGroup : Cluster Group
ResourceType : File Share Witness
Get-ClusterResource –Cluster <cluster> –Name <ResourceName> | Get-ClusterParameter | fl (Windows 2008 R2 Only – Private Properties)
Name : SharePath
IsReadOnly : False
ParameterType : String
Value : \\HT-1.domain.com\DAG.domain.com
At this time Set-DatabaseAvailabilityGroup is issued to change the witness server. After the command completes successfully, the previous commands are run again. (Command output truncated to show relevant values only.)
Cluster.exe cluster.domain.com res "File Share Witness (\\HT-1.domain.com\DAG.domain.com)" /prop
Listing properties for 'File Share Witness (\\HT-1.domain.COM\DAG.domain.COM)':
T Resource Name Value
-- -------------------- ------------------------------ -----------------------
SR File Share Witness (\\HT-1.domain.COM\DAG.domain.COM) Name File Share Witness (\\HT-1.domain.COM\DAG.domain.COM)
Cluster.exe cluster.domain.com res "File Share Witness (\\HT-1.domain.com\DAG.domain.com)" /priv
Listing private properties for 'File Share Witness (\\HT-1.domain.COM\DAG.domain.COM)':
T Resource Name Value
-- -------------------- ------------------------------ -----------------------
S File Share Witness (\\HT-1.domain.COM\DAG.domain.COM) SharePath \\HT-2.domain.com\DAG.domain.com
(Note: The SharePath in the previous output reflects the new witness server as expected)
Get-ClusterResource –Cluster <cluster> –Name <ResourceName> | fl (Windows 2008 R2 Only – Public Properties)
Name : File Share Witness (\\HT-1.domain.COM\DAG.domain.COM)
State : Online
OwnerGroup : Cluster Group
ResourceType : File Share Witness
Get-ClusterResource –Cluster <cluster> –Name <ResourceName> | Get-ClusterParameter | fl (Windows 2008 R2 Only – Private Properties)
Name : SharePath
IsReadOnly : False
ParameterType : String
Value : \\HT-2.domain.com\DAG.domain.com
(Note: The SharePath in the previous output reflects the new witness server as expected)
As you can see, the Set-DatabaseAvailabilityGroup command did complete its task successfully by updating the SharePath property of the quorum resource to utilize the correct witness server.
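The comparison above (display name versus SharePath) is easy to script once you have both strings in hand. The Python sketch below is a hypothetical helper, not part of any Exchange or cluster tooling. It assumes the display name embeds a UNC path in parentheses, as in the output shown, and that paths compare case-insensitively.

```python
import re
from typing import Optional


def path_in_display_name(display_name: str) -> Optional[str]:
    """Extract the UNC path embedded in a name like 'File Share Witness (\\\\srv\\share)'."""
    match = re.search(r"\((\\\\[^)]+)\)", display_name)
    return match.group(1) if match else None


def display_name_is_stale(display_name: str, share_path: str) -> bool:
    """True when the display name still points at a different share than SharePath."""
    embedded = path_in_display_name(display_name)
    if embedded is None:
        return False  # nothing to compare against
    return embedded.lower() != share_path.lower()


if __name__ == "__main__":
    name = r"File Share Witness (\\HT-1.domain.COM\DAG.domain.COM)"
    # SharePath after the witness server was changed to HT-2.
    print(display_name_is_stale(name, r"\\HT-2.domain.com\DAG.domain.com"))
```

A `True` result here only means the display name is out of date. The SharePath private property remains the value that tells you which witness is actually in use.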
The use of mount points for Exchange is becoming more commonplace in many installations. Some customers feel the best implementation of mount points consists of a small root disk with mount points created from folders on that disk.
For example, I may have a Drive L: that is 10 megs and I may create 4 folders on this drive (Database1 / Database2 / Database3 / Database4). I will then create mount points utilizing the folders created from the L drive.
There are certain processes in Exchange that often check for free drive space prior to performing certain operations. Unfortunately these processes are not necessarily mount point aware, so they end up querying the free drive space of the lettered volume rather than the mount point. One of these processes is MSSearch.
MSSearch by default creates a catalog data folder co-located with each EDB file. In our example above the catalog data folder and the EDB file would be in L:\Database1 (where Database1 is the mount point). In this case the L drive has 10 megs of free space but the Database1 mount point has 1.5 terabytes of free space. When MSSearch attempts to create the initial catalog, the operation fails because the drive space reported by disk L is not sufficient (even though there is plenty of space where the actual catalog is stored).
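To see the difference between the two queries for yourself, you can compare the free space reported for the drive root with the free space reported for the folder that hosts the database. The Python sketch below is illustrative only. The function names are hypothetical, and it assumes that `shutil.disk_usage` reports the volume mounted at the folder when given a mount point path, which is worth confirming in a lab before relying on it.

```python
import shutil


def free_bytes(path: str) -> int:
    """Free space, in bytes, reported for the volume that hosts `path`."""
    return shutil.disk_usage(path).free


def has_room(path: str, required_bytes: int) -> bool:
    """True when the volume hosting `path` reports at least `required_bytes` free."""
    return free_bytes(path) >= required_bytes


if __name__ == "__main__":
    # On the server in the example you would compare the drive root with the
    # mount point folder, e.g. "L:\\" versus "L:\\Database1". The current
    # directory is used here only so the sketch runs anywhere.
    print(free_bytes("."))
```

A check that only ever asks about the drive root sees the small root disk, which is the behavior described above.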
Here is an example of some events you may see when this occurs.
Log Name: Application
Source: MSExchange Search Indexer
Date: 6/14/2010 12:11:20 PM
Event ID: 104
Task Category: General
Level: Error
Keywords: Classic
User: N/A
Computer: server.company.com
Description:
Exchange Search Indexer failed to enable the Mailbox Database DATABASE(GUID = 58c0ed8a-dbfc-4d55-b265-8a80f1dc477b) after 1 tries. The last failure was: System.ComponentModel.Win32Exception: Unable to SetProperty FTE_PluginList on catalog ExSearch-58c0ed8a-dbfc-4d55-b265-8a80f1dc477b-26fc1c62-d3e8-4711-b3c9-3bb0b32aec0a. Error = -2147215320
at Microsoft.Exchange.Msfte.CFTEAdmin.SetProperty(CatalogState catalogInfo, PropertyScope propertyScope, String propertyName, Object propertyValue, Boolean throwOnFailure)
at Microsoft.Exchange.Msfte.CFTEAdmin.CreateCatalog(CatalogState catalogInfo)
at Microsoft.Exchange.Search.Globals.CreateCatalog(CatalogState state, String reason)
at Microsoft.Exchange.Search.Globals.RecreateCatalogAndPropertyStore(CatalogState catalogInfo, String reason)
at Microsoft.Exchange.Search.CatalogState.CreateNew(String reason)
at Microsoft.Exchange.Search.CatalogState.Reset(String reason)
at Microsoft.Exchange.Search.CatalogState.HandleMountCatalogException(Exception exception)
at Microsoft.Exchange.Search.Globals.CheckAndInitializeCatalog(CatalogState catalogInfo)
at Microsoft.Exchange.Search.Driver.ProcessNewCatalogInternal(CatalogState catalog, List`1 mdbsToCrawl, Int32& numberOfDisabledMDBs). It will retry after 10 minutes.
Log Name: Application
Source: ExchangeStoreDB
Date: 6/14/2010 12:12:51 PM
Event ID: 222
Task Category: Database recovery
Level: Error
Keywords: Classic
User: N/A
Computer: server.company.com
Description:
At ‘6/14/2010 11:12:50 AM’ the Microsoft Exchange Information Store Database ‘DATABASE’ copy on this server experienced a corrupted search catalog. The error returned by failover was "There is only one copy of this mailbox database (DATABASE). Automatic recovery is not available.". Consult the event log on the server for other "ExchangeStoreDb" and "MSExchange Search Indexer" events for more specific information about the failures.
The important information is actually contained in the first event – the error code –2147215320. This error code translates to CI_E_CONFIG_DISK_FULL.
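Event text often prints error codes as signed 32-bit integers, while reference material usually lists them in hexadecimal. The short Python sketch below shows the conversion and splits the value into the standard HRESULT fields. The function names are hypothetical, and the code makes no claim about which symbolic name a value maps to.

```python
def to_unsigned_hresult(code: int) -> int:
    """Reinterpret a signed 32-bit error code as an unsigned 32-bit value."""
    return code & 0xFFFFFFFF


def describe_hresult(code: int) -> dict:
    """Break an HRESULT into its hex form, severity bit, facility and code fields."""
    value = to_unsigned_hresult(code)
    return {
        "hex": f"0x{value:08X}",
        "failure": bool(value >> 31),        # top bit set means failure
        "facility": (value >> 16) & 0x7FF,   # 11-bit facility field
        "code": value & 0xFFFF,              # low 16 bits
    }


if __name__ == "__main__":
    print(describe_hresult(-2147215320))
```

For the error in the first event this gives 0x80041828, a failure code in facility 4 (FACILITY_ITF), which is the form you would search for.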
To resolve this issue you can:
Once this is done restarting the MSSearch services may be necessary so that initial catalog creation can occur.
When Exchange 2007 CCR is installed on Windows 2008 or Windows 2008 R2 the following error may be noted in the application log of the passive node:
Log Name: Application
Source: MSExchangeRepl
Event ID: 2104
Task Category: Service
Level: Error
Keywords: Classic
User: N/A
Computer: MACHINE
Description:
Log file action LogCopy failed for storage group EXCLUST01\SG2. Reason:
CreateFile(\\Server\StorageGroupGUID$\LogFile.log) = 2
If the CCR cluster is not utilizing continuous replication host names the following event series may also be noted:
Event ID : 2147
Raw Event ID : 2147
Source : MSExchangeRepl
Type : Error
Machine : SERVER
Message : There was a problem with ‘ActiveNode’, which is an alternate name for ‘ActiveNode’. The list of aliases is now ‘ActiveNode’, and the alias ‘was’ removed from the list. The specific problem is ‘CreateFile(\\ActiveNode\StorageGroupGuid$\LogFile.log) = 2’.
ID: 2127
Level: Information
Provider: MSExchangeRepl
Machine: SERVER
Message: The system has detected a change in the available replication networks. The system is now using network ‘ActiveNode’ instead of network ‘ActiveNode’ for log copying from node ActiveNode.
In this situation, if the solution is aggressively monitored, you may note that replication temporarily fails and then automatically resumes as healthy. This occurs due to a temporary pause in replication when the error condition is detected, while the replication service attempts to find other replication paths, and then automatically re-attempts the same copy operation.
If the CCR cluster is utilizing continuous replication host names the following event series may also be noted:
Event ID : 2147
Raw Event ID : 2147
Source : MSExchangeRepl
Type : Error
Machine : SERVER
Message : There was a problem with ‘ReplicationHostName’, which is an alternate name for ‘ActiveNode’. The list of aliases is now ‘ActiveNode’, and the alias ‘was’ removed from the list. The specific problem is ‘CreateFile(\\ReplicationHostName\StorageGroupGUID$\LogFile.log) = 2’.
ID: 2127
Level: Information
Provider: MSExchangeRepl
Machine: SERVER
Message: The system has detected a change in the available replication networks. The system is now using network ‘ActiveNode’ instead of network ‘ReplicationHostName’ for log copying from node ActiveNode.
Error 2 is ERROR_FILE_NOT_FOUND
In this situation the error is detected on the replication host name. The replication service will temporarily pause replication while other network paths are enumerated. If other continuous replication host names are in use, the replication service will select an alternate replication host name and automatically resume log copying. If the only valid path is the “public” path, the replication service will begin copying log files over the “public” network. Eventually this error occurs on the public network, forcing network re-enumeration to occur and replication to automatically switch back to the replication network. If the solution is aggressively monitored, the replication status may be failed during this switch but will automatically resume healthy.
In almost all instances these errors are considered benign to the operation of the Exchange Server.
The replication service is extremely aggressive in its attempts to copy log files. The replication service is always aware of the next log file in the series that requires copying to the passive node. As part of normal processes the replication service may query multiple times for the presence of this file and make copy attempts. These attempts may result in the replication service querying for a log file that is not fully available. Under Windows 2003 this was not necessarily an issue. Windows 2008 introduces a component into SMBv2 that may cause this to be a problem.
SMBv2 introduces status caching into the LanManWorkstation service. When an application requests information from a file share, the workstation service caches the response from the server hosting the share. Subsequent requests for the same information are returned from cache rather than re-contacting the server hosting the share. Eventually this cache will expire (in our case it expires by the time replication is failed / resumed <or> a switch between replication host names occurs). The replication service has received feedback that the log file in question should now be available for copy, attempts to copy it, and receives an older cached status that the file is not ready (even though the file does exist on the source at the time the attempt is made). In turn the replication service detects this as an error condition and takes action.
From a Windows 2008 / Windows 2008 R2 perspective this is by design.
To correct these errors on an Exchange 2007 / Windows 2008 <or> Exchange 2007 / Windows 2008 R2 implementation, the following registry keys should be set to a zero (0) value and the nodes rebooted:
HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Lanmanworkstation\Parameters
FileInfoCacheLifetime [DWORD]
FileNotFoundCacheLifetime [DWORD]
DirectoryCacheLifetime [DWORD]
If the DWORDs are not present they may need to be created. The recommended value is HEX / DEC 0.
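If you prefer to stage the change as a .reg file that can be reviewed before it is imported on each node, the text can be generated as shown below. This Python sketch is illustrative only: the function name is hypothetical, it produces the file content as a string, and saving and importing it (and the reboot) are left to you.

```python
# Key and value names from the list above.
KEY = r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\Lanmanworkstation\Parameters"
VALUES = ("FileInfoCacheLifetime", "FileNotFoundCacheLifetime", "DirectoryCacheLifetime")


def build_reg_file(key: str = KEY, names=VALUES, data: int = 0) -> str:
    """Return .reg file text that sets each named DWORD value to `data`."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    # DWORD data is written as eight hexadecimal digits.
    lines += [f'"{name}"=dword:{data:08x}' for name in names]
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(build_reg_file())
```

Generating the text rather than writing to the registry directly keeps the change reviewable and makes it easy to apply the same file to both nodes.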
More information on these keys can be found here: http://technet.microsoft.com/en-us/library/ff686200(WS.10).aspx (Note that registry path in the article is missing the SERVICES hive – correct path in blog post).
(An Exchange 2007 version of this article can be found here: http://blogs.technet.com/b/timmcmic/archive/2009/04/27/network-port-design-and-exchange-2007-clusters.aspx)
A question that has come up is how many network ports should I have in my DAG members and how should I use them.
I generally see three different hardware configurations:
In some hardware there are now 4 port cards. The information contained here can be expanded to include additional hardware / port configurations as they become available.
You’ll note that there is no configuration with a single network port – I personally do not recommend having only a single network port even though this is now a supported implementation. (Note: VLANS to a single port are not two network interfaces).
Network Teaming
In the recommendations I’ll outline next you will see references to the use of network teaming. It’s important to note that Microsoft does not support network teaming, as this is a hardware vendor supported and designed technology. There is, though, a recognition that in the absence of any way to provide multiple client facing ports for Exchange, network teaming does have a valid place in the overall high availability design.
When using network teaming, only the client facing network should be a teamed adapter, and the team should at all times be created for NETWORK FAULT TOLERANCE. Do not, for an Exchange instance, use any type of load balancing between ports.
For non-client facing networks it is not necessary to implement a network team (these would typically be your “heartbeat” networks). Windows clustering has the ability to balance and use all interfaces on the cluster designated for cluster use without the need to establish teaming for cluster / heartbeat communications.
From a support perspective any customer that establishes a teamed interface for the client side network should recognize that they may be asked to dissolve the team to support troubleshooting efforts.
MAPI Networks
For Exchange 2010 DAG MAPI networks I recommend using a network fault tolerant team consisting of two ports. More ports may be utilized if they are available.
Replication Networks
After a team has been utilized for the MAPI network, the remaining network interfaces can be divided into replication networks. I do not recommend that any form of network teaming be utilized on replication networks. Utilization of teaming on replication networks, although supported, is redundant. Both the replication service and cluster service have the ability to switch between these additional networks as necessary. All additional networks must be on their own subnet, and subnets between networks may not overlap on the host.
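The no-overlap rule is easy to check on paper before any interface is configured. The Python sketch below uses the standard `ipaddress` module; the function name, the network names and the address ranges in the usage example are all hypothetical.

```python
import ipaddress
from itertools import combinations


def overlapping_subnets(networks: dict) -> list:
    """Return pairs of network names whose subnets overlap.

    `networks` maps a label (e.g. "MAPI") to a subnet in CIDR notation.
    """
    parsed = {name: ipaddress.ip_network(cidr, strict=False)
              for name, cidr in networks.items()}
    return [(a, b)
            for (a, net_a), (b, net_b) in combinations(parsed.items(), 2)
            if net_a.overlaps(net_b)]


if __name__ == "__main__":
    # Hypothetical plan: Repl-B was carved out of Repl-A by mistake.
    plan = {
        "MAPI": "192.168.1.0/24",
        "Repl-A": "10.0.1.0/24",
        "Repl-B": "10.0.1.128/25",
    }
    print(overlapping_subnets(plan))
```

An empty list means each network sits on its own subnet. Any pair that is returned needs to be re-addressed before the interfaces are used for replication.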
Cluster Networks
There is no reason to establish dedicated cluster heartbeat networks with Exchange 2010 DAG members, as the cluster can utilize all configured interfaces between hosts for heartbeat exchange.
==============================
Updated – 6/2/10 – It is supported to use teaming on non-client facing networks although in theory this is redundant as both the replication service and cluster service have the ability to utilize multiple secondary interfaces.
==============================