Exchange 2013: Health Manager service may not reliably start after server boot.

Office365: POP and IMAP clients receive OWA links for calendar invitations

As you may know, Office 365 supports a number of client protocols, including POP and IMAP, and a variety of POP and IMAP clients. By default, POP and IMAP clients are configured to use Outlook Web App (OWA) for handling calendar invitations. When one of these clients receives a meeting request, the body of the invitation contains a link. Clicking the link opens the user’s mailbox in OWA so that they can accept or decline the request.

If you are using a POP or IMAP client that is capable of handling iCal messages, you may want to change this configuration so that meeting requests are delivered in iCal format and can be handled directly in the client.

You can configure – on a per-mailbox basis – how POP and IMAP clients receive calendar appointments. Get-CASMailbox can be used to view your current settings.

Get-CASMailbox -Identity <NAME> | fl Name,*pop*,*imap*

Name                                    : administrator
ExternalPopSettings                     :
InternalPopSettings                     :
PopEnabled                              : True
PopUseProtocolDefaults                  : True
PopMessagesRetrievalMimeFormat          : BestBodyFormat
PopEnableExactRFC822Size                : False
PopSuppressReadReceipt                  : False
PopForceICalForCalendarRetrievalOption : False
ExternalImapSettings                    :
InternalImapSettings                    :
ImapEnabled                             : True
ImapUseProtocolDefaults                 : True
ImapMessagesRetrievalMimeFormat         : BestBodyFormat
ImapEnableExactRFC822Size               : False
ImapSuppressReadReceipt                 : False
ImapForceICalForCalendarRetrievalOption : False

In the above output, you can see the two attributes that must be set to True to enable iCal support: PopForceICalForCalendarRetrievalOption and ImapForceICalForCalendarRetrievalOption. Each one specifies that the client should be provided calendar appointments in iCal format.

Also note the PopUseProtocolDefaults and ImapUseProtocolDefaults attributes. These must be set to False in order for any changes to the *ForceICalForCalendarRetrievalOption attributes to take effect. You can use Set-CASMailbox to change these settings:

Set-CASMailbox -Identity <NAME> -PopUseProtocolDefaults:$FALSE -ImapUseProtocolDefaults:$FALSE -PopForceICalForCalendarRetrievalOption:$TRUE -ImapForceICalForCalendarRetrievalOption:$TRUE

Name                                    : administrator
ExternalPopSettings                     :
InternalPopSettings                     :
PopEnabled                              : True
PopUseProtocolDefaults                  : False
PopMessagesRetrievalMimeFormat          : BestBodyFormat
PopEnableExactRFC822Size                : False
PopSuppressReadReceipt                  : False
PopForceICalForCalendarRetrievalOption : True
ExternalImapSettings                    :
InternalImapSettings                    :
ImapEnabled                             : True
ImapUseProtocolDefaults                 : False
ImapMessagesRetrievalMimeFormat         : BestBodyFormat
ImapEnableExactRFC822Size               : False
ImapSuppressReadReceipt                 : False
ImapForceICalForCalendarRetrievalOption : True

The settings described here are per-mailbox settings. There is no global setting to change the default for the entire tenant. There is also no method to adjust the settings per-client.
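Because the setting is per-mailbox, applying it broadly means looping over mailboxes. The following is a minimal sketch, assuming an Exchange Online remote PowerShell session is already connected and that every mailbox in the tenant should receive iCal-formatted invitations; in practice you would likely narrow the Get-CASMailbox query first.

```powershell
# Sketch only: assumes a connected Exchange Online PowerShell session.
# Pipes every mailbox to Set-CASMailbox and enables iCal delivery for POP and IMAP.
# Narrow the Get-CASMailbox query (for example with -Filter) before using this widely.
Get-CASMailbox -ResultSize Unlimited |
    Set-CASMailbox -PopUseProtocolDefaults $false `
                   -ImapUseProtocolDefaults $false `
                   -PopForceICalForCalendarRetrievalOption $true `
                   -ImapForceICalForCalendarRetrievalOption $true

# Verify the result for a single mailbox (<NAME> is a placeholder).
Get-CASMailbox -Identity <NAME> | Format-List Name,*UseProtocolDefaults,*ForceICal*
```

Mailboxes created after the loop runs are not covered, so the command would need to be repeated for them.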

Exchange 2013 Cumulative Update 2 – Where are my databases?

In Exchange Server 2013 Cumulative Update 2 (CU2) we introduced support for a maximum of 100 mounted mailbox databases per server (active or passive) when an Enterprise Edition license is applied. This was an increase from Exchange Server 2013 RTM and Exchange Server 2013 Cumulative Update 1 (CU1).

It has been recently discovered that when updating to CU2 from Exchange Server 2013 RTM or Exchange Server 2013 CU1, the database limit is not increased to 100. This results in the inability to mount more than 50 databases on a server running the Enterprise Edition of Exchange Server 2013. When installing Exchange Server 2013 CU2 directly, the database limit is applied correctly.

The Core Problem

In this example, I deployed a new install of Exchange Server 2013 CU1 on Windows Server 2012 (although the operating system doesn’t matter here). After Setup completed, I performed an LDP dump of the information store object and noted a database limit of 5.

Expanding base ‘CN=InformationStore,CN=TEST-MBX-0,CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=EXCHANGE,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=exchange,DC=msft’…
Getting 1 entries:
Dn: CN=InformationStore,CN=TEST-MBX-0,CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=EXCHANGE,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=exchange,DC=msft
adminDisplayName: InformationStore;
cn: InformationStore;
distinguishedName: CN=InformationStore,CN=TEST-MBX-0,CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=EXCHANGE,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=exchange,DC=msft;
dSCorePropagationData: 0x0 = ( );
instanceType: 0x4 = ( WRITE );
msExchESEParamCircularLog: 0;
msExchESEParamCommitDefault: 0;
msExchESEParamDbExtensionSize: 256;
msExchESEParamEnableIndexChecking: TRUE;
msExchESEParamEnableOnlineDefrag: TRUE;
msExchESEParamLogFileSize: 5120;
msExchESEParamPageFragment: 8;
msExchESEParamPageTempDBMin: 0;
msExchESEParamZeroDatabaseDuringBackup: 0;
msExchMaxRestoreStorageGroups: 1;
msExchMaxStorageGroups: 5;
msExchMaxStoresPerGroup: 5;
msExchMaxStoresTotal: 5;
name: InformationStore;
objectCategory: CN=ms-Exch-Information-Store,CN=Schema,CN=Configuration,DC=exchange,DC=msft;
objectClass (3): top; container; msExchInformationStore;
objectGUID: 52c2fc98-b8b4-4d33-b35b-ca00cb3fcfff;
showInAdvancedViewOnly: TRUE;
uSNChanged: 24300;
uSNCreated: 24300;
whenChanged: 10/27/2013 3:58:35 PM Pacific Daylight Time;
whenCreated: 10/27/2013 3:58:35 PM Pacific Daylight Time;

———–

This is normal and expected because all unlicensed installations (e.g., Trial) are treated as Standard Edition, and Standard Edition can have a maximum of 5 mounted databases.

Next, I entered an Enterprise Edition license key. This updated msExchMaxStoresTotal to 50, which is the expected value for the Enterprise Edition of Exchange Server 2013 RTM and Exchange Server 2013 CU1.

Expanding base ‘CN=InformationStore,CN=TEST-MBX-0,CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=EXCHANGE,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=exchange,DC=msft’…
Getting 1 entries:
Dn: CN=InformationStore,CN=TEST-MBX-0,CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=EXCHANGE,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=exchange,DC=msft
adminDisplayName: InformationStore;
cn: InformationStore;
distinguishedName: CN=InformationStore,CN=TEST-MBX-0,CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=EXCHANGE,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=exchange,DC=msft;
dSCorePropagationData: 0x0 = ( );
instanceType: 0x4 = ( WRITE );
msExchESEParamCircularLog: 0;
msExchESEParamCommitDefault: 0;
msExchESEParamDbExtensionSize: 256;
msExchESEParamEnableIndexChecking: TRUE;
msExchESEParamEnableOnlineDefrag: TRUE;
msExchESEParamLogFileSize: 5120;
msExchESEParamPageFragment: 8;
msExchESEParamPageTempDBMin: 0;
msExchESEParamZeroDatabaseDuringBackup: 0;
msExchMaxRestoreStorageGroups: 1;
msExchMaxStorageGroups: 100;
msExchMaxStoresPerGroup: 5;
msExchMaxStoresTotal: 50;
msExchMinAdminVersion: -2147453113;
msExchVersion: 4535486012416;
name: InformationStore;
objectCategory: CN=ms-Exch-Information-Store,CN=Schema,CN=Configuration,DC=exchange,DC=msft;
objectClass (3): top; container; msExchInformationStore;
objectGUID: 52c2fc98-b8b4-4d33-b35b-ca00cb3fcfff;
showInAdvancedViewOnly: TRUE;
uSNChanged: 27284;
uSNCreated: 24300;
whenChanged: 10/28/2013 4:44:30 AM Pacific Daylight Time;
whenCreated: 10/27/2013 3:58:35 PM Pacific Daylight Time;

———–

Next, I upgraded the server to Exchange Server 2013 CU2. Once Setup completed, I reviewed the value of msExchMaxStoresTotal, and found that it did not get updated as expected.

Expanding base ‘CN=InformationStore,CN=TEST-MBX-0,CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=EXCHANGE,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=exchange,DC=msft’…
Getting 1 entries:
Dn: CN=InformationStore,CN=TEST-MBX-0,CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=EXCHANGE,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=exchange,DC=msft
adminDisplayName: InformationStore;
cn: InformationStore;
distinguishedName: CN=InformationStore,CN=TEST-MBX-0,CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=EXCHANGE,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=exchange,DC=msft;
dSCorePropagationData: 0x0 = ( );
instanceType: 0x4 = ( WRITE );
msExchESEParamCircularLog: 0;
msExchESEParamCommitDefault: 0;
msExchESEParamDbExtensionSize: 256;
msExchESEParamEnableIndexChecking: TRUE;
msExchESEParamEnableOnlineDefrag: TRUE;
msExchESEParamLogFileSize: 5120;
msExchESEParamPageFragment: 8;
msExchESEParamPageTempDBMin: 0;
msExchESEParamZeroDatabaseDuringBackup: 0;
msExchMaxRestoreStorageGroups: 1;
msExchMaxStorageGroups: 100;
msExchMaxStoresPerGroup: 5;
msExchMaxStoresTotal: 50;
msExchMinAdminVersion: -2147453113;
msExchVersion: 4535486012416;
name: InformationStore;
objectCategory: CN=ms-Exch-Information-Store,CN=Schema,CN=Configuration,DC=exchange,DC=msft;
objectClass (3): top; container; msExchInformationStore;
objectGUID: 52c2fc98-b8b4-4d33-b35b-ca00cb3fcfff;
showInAdvancedViewOnly: TRUE;
uSNChanged: 27284;
uSNCreated: 24300;
whenChanged: 10/28/2013 4:44:30 AM Pacific Daylight Time;
whenCreated: 10/27/2013 3:58:35 PM Pacific Daylight Time;

———–
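If you only need the one attribute rather than a full LDP dump, it can be read directly with ADSI from PowerShell. This is a sketch that reuses the distinguished name from the LDP output above; the server name, organization name, and domain components are specific to this lab and must be adjusted for your environment.

```powershell
# Sketch only: the DN below is the lab DN from the LDP output in this post.
$dn = 'CN=InformationStore,CN=TEST-MBX-0,CN=Servers,CN=Exchange Administrative Group (FYDIBOHF23SPDLT),CN=Administrative Groups,CN=EXCHANGE,CN=Microsoft Exchange,CN=Services,CN=Configuration,DC=exchange,DC=msft'

# Bind to the information store object and read the mounted database limit.
$store = [ADSI]"LDAP://$dn"
$store.Properties['msExchMaxStoresTotal'].Value
```

On a server affected by this condition, the value would still be 50 after the upgrade to CU2.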

In this state, when the administrator tries to mount more than 50 databases (any combination of active or passive) on this server, they will receive an error message:

[PS] C:\>Mount-Database DB51
Couldn’t mount the database that you specified. Specified database: DB51; Error code: An Active Manager operation failed. Error: The database action failed. Error: Operation failed with message: MapiExceptionTooManyMountedDatabases: Unable to mount database. (hr=0x8004060e, ec=-2147219954)

Diagnostic context:
    Lid: 65256
    Lid: 10722   StoreEc: 0x8004060E
    Lid: 1494    ---- Remote Context Beg ----
    Lid: 37952   dwParam: 0x144489A
    Lid: 39576   StoreEc: 0x977
    Lid: 35200   dwParam: 0x39F8
    Lid: 58864   StoreEc: 0x8004060E
    Lid: 43248   StoreEc: 0x8004060E
    Lid: 48432   StoreEc: 0x8004060E
    Lid: 54336   dwParam: 0x14448C8
    Lid: 1750    ---- Remote Context End ----
    Lid: 1047    StoreEc: 0x8004060E [Database: DB51, Server: TEST-MBX-0.exchange.msft].
    + CategoryInfo          : InvalidOperation: (DB51:ADObjectId) [Mount-Database], InvalidOperationException
    + FullyQualifiedErrorId : [Server=TEST-MBX-0,RequestId=797928f4-fe83-4e1f-8545-e01a72aaf79d,TimeStamp=10/28/2013 7
   :53:32 PM] 5FE3BF1B,Microsoft.Exchange.Management.SystemConfigurationTasks.MountDatabase
    + PSComputerName        : test-mbx-0.exchange.msft

In addition, three events will be logged to the Application event log:

Log Name:      Application
Source:        MSExchangeIS
Date:          10/28/2013 1:00:27 PM
Event ID:      40003
Task Category: High Availability
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      Test-MBX-0.exchange.msft
Description:
Exceeded the max number of 50 databases on this server.

Log Name:      Application
Source:        MSExchangeRepl
Date:          10/28/2013 1:00:27 PM
Event ID:      3154
Task Category: Service
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      Test-MBX-0.exchange.msft
Description:
Active Manager failed to mount database DB51 on server Test-MBX-0.exchange.msft. Error: An Active Manager operation failed. Error: The database action failed. Error: Operation failed with message: MapiExceptionTooManyMountedDatabases: Unable to mount database. (hr=0x8004060e, ec=-2147219954)
Diagnostic context:
    Lid: 65256
    Lid: 10722   StoreEc: 0x8004060E
    Lid: 1494    ---- Remote Context Beg ----
    Lid: 37952   dwParam: 0x14A9D25
    Lid: 39576   StoreEc: 0x977
    Lid: 58864   StoreEc: 0x8004060E
    Lid: 43248   StoreEc: 0x8004060E
    Lid: 48432   StoreEc: 0x8004060E
    Lid: 54336   dwParam: 0x14A9D54
    Lid: 1750    ---- Remote Context End ----
    Lid: 1047    StoreEc: 0x8004060E

Log Name:      Application
Source:        ExchangeStoreDB
Date:          10/28/2013 1:00:27 PM
Event ID:      226
Task Category: Database recovery
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      Test-MBX-0.exchange.msft
Description:
At ’10/28/2013 1:00:27 PM’ the copy of database ‘DB51’ on this server didn’t mount because the number of mailbox database copies on this server exceeds the supported limit. The error returned by failover was "There is only one copy of this mailbox database (DB51). Automatic recovery is not available.".

Similar errors and events will occur when attempting to activate a passive database copy on a server with 50 mounted databases or when adding a passive database copy to a server that already has 50 database copies assigned.

How to Correct this Condition

Fortunately, fixing this problem is pretty easy. You simply re-enter your Enterprise Edition product key on the server, and restart the Microsoft Exchange Information Store service.
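From the Exchange Management Shell, the two steps could look like the sketch below. The server name is the lab server used in this post and the product key is a placeholder. Restarting the Information Store service dismounts the databases on that server, so plan for it accordingly.

```powershell
# Sketch only: TEST-MBX-0 is the lab server from this post; the key is a placeholder.
Set-ExchangeServer -Identity TEST-MBX-0 -ProductKey 'XXXXX-XXXXX-XXXXX-XXXXX-XXXXX'

# Run on the affected server: restart the Information Store so the new limit is read.
Restart-Service -Name MSExchangeIS

# Confirm the edition that the server now reports.
Get-ExchangeServer -Identity TEST-MBX-0 | Format-List Name,Edition
```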

Moving Forward

For the immediate future, we expect this problem to exist when upgrading from Exchange Server 2013 CU1 to any future CU. However, once the server is re-licensed on CU2 or newer, the condition will remain resolved.

Exchange 2010: Remove-DatabaseAvailabilityGroupServer -ConfigurationOnly does not evict the member from the cluster.

Administrators may encounter conditions where DAG members cannot be gracefully removed from a database availability group. For example, a member server may have encountered an unrecoverable failure or the server may need to be removed from the DAG in order to perform a server recovery.

To account for these and similar conditions, the Remove-DatabaseAvailabilityGroupServer -ConfigurationOnly command exists. This command simply removes the member from the database availability group’s Active Directory object.

Here is an example…

The membership of the DAG can be verified using Get-DatabaseAvailabilityGroup -Status | fl Name,Servers,OperationalServers. In this example, the only operational server is MBX-1, since it is the only server currently running in the DAG.

[PS] C:\>Get-DatabaseAvailabilityGroup DAG -Status | fl Name,Servers,OperationalServers

Name : DAG
Servers : {MBX-1, MBX-2}
OperationalServers : {MBX-1}

Using the Remove-DatabaseAvailabilityGroupServer -ConfigurationOnly command, a DAG member can be removed.

[PS] C:\>Remove-DatabaseAvailabilityGroupServer -Identity DAG -MailboxServer MBX-2 -ConfigurationOnly

Confirm
Are you sure you want to perform this action?
Removing Mailbox server "MBX-2" from database availability group "DAG".
[Y] Yes [A] Yes to All [N] No [L] No to All [?] Help (default is "Y"): a

The results of the command can be verified using Get-DatabaseAvailabilityGroup -Status | fl Name,Servers,OperationalServers:

[PS] C:\>Get-DatabaseAvailabilityGroup DAG -Status | fl Name,Servers,OperationalServers

Name : DAG
Servers : {MBX-1}
OperationalServers : {MBX-1}

When a server is removed from the DAG in this manner, it is not evicted from the corresponding cluster. You can verify cluster membership using the built-in cluster commands. Here is an example from this test:

( Windows 2008 / Windows 2008 R2 )

[PS] C:\>cluster.exe node
Listing status for all available nodes:

Node           Node ID Status
-------------- ------- ---------------------
MBX-1                1 Up
MBX-2                2 Down

( Windows 2008 R2 )

[PS] C:\>Import-Module FailoverClusters

[PS] C:\>Get-ClusterNode

Name                                State
----                                -----
mbx-1                                  Up
mbx-2                                Down
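To spot a leftover node without comparing the two lists by eye, the DAG membership and the cluster membership can be compared in one step. This is a rough sketch, assuming it is run in the Exchange Management Shell on a surviving DAG member running Windows 2008 R2, that the DAG is named DAG as in this example, and that each entry in the Servers property exposes a Name property.

```powershell
# Sketch only: run on a surviving DAG member (Windows 2008 R2 or later).
Import-Module FailoverClusters

# Names of the servers that Active Directory still lists as DAG members.
$dagServers = (Get-DatabaseAvailabilityGroup -Identity DAG).Servers |
    ForEach-Object { $_.Name }

# Cluster nodes that are no longer DAG members, i.e. candidates for eviction.
Get-ClusterNode | Where-Object { $dagServers -notcontains $_.Name }
```

The -notcontains comparison is case-insensitive, so the lowercase node names reported by the cluster still match the server names stored in Active Directory.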

In general this issue surfaces when administrators complete a server rebuild operation and note that the rebuilt node cannot be added back to the cluster because it already exists in the cluster. Here is an example:

[PS] C:\>Add-DatabaseAvailabilityGroupServer -Identity DAG -MailboxServer MBX-2

WARNING: The operation wasn’t successful because an error was encountered. You may find more details in log file
"C:\ExchangeSetupLogs\DagTasks\dagtask_2012-06-24_14-51-47.841_add-databaseavailabiltygroupserver.log".
A server-side database availability group administrative operation failed. Error: The operation failed. CreateCluster errors may result from incorrectly configured static addresses. Error: An error occurred while attempting a cluster operation. Error: Node mbx-2 is already joined to a cluster. [Server: MBX-1.domain.com]
+ CategoryInfo : InvalidArgument: (:) [Add-DatabaseAvailabilityGroupServer], DagTaskOperationFailedException
+ FullyQualifiedErrorId : D05F37CD,Microsoft.Exchange.Management.SystemConfigurationTasks.AddDatabaseAvailabilityGroupServer

When using Remove-DatabaseAvailabilityGroupServer -ConfigurationOnly, administrators must also evict the node from the cluster themselves. This can be accomplished through either of two methods:

( Windows 2008 / Windows 2008 R2 )

Administrators may utilize Failover Cluster Manager. After connecting to the cluster servicing the database availability group, expand the Nodes container. Then right-click the node that was removed -> select More Actions -> Evict.

( Windows 2008 R2 )

[PS] C:\>Import-Module FailoverClusters

[PS] C:\>Remove-ClusterNode MBX-2

Remove-ClusterNode
Are you sure you want to evict node mbx-2?
[Y] Yes [N] No [S] Suspend [?] Help (default is "Y"): y
Remove-ClusterNode : The cluster node ‘MBX-2’ was evicted from the cluster, but was not fully cleaned up. Please see the Failover Clustering application event log on node MBX-2 for more information.
    The RPC server is unavailable
At line:1 char:19
+ Remove-ClusterNode <<<< MBX-2
    + CategoryInfo          : NotSpecified: (:) [Remove-ClusterNode], ClusterCmdletException
    + FullyQualifiedErrorId : Remove-ClusterNode,Microsoft.FailoverClusters.PowerShell.RemoveClusterNodeCommand

(Note: The RPC error is expected, as the command attempts to clean up the local cluster configuration on the node but the node is not accessible.)

After cleaning up the cluster configuration, the administrator can run Set-DatabaseAvailabilityGroup -Identity <DAGNAME> to ensure the appropriate cluster configuration is utilized.

RPC Filtering and Exchange 2010 Database Availability Groups

Recently I’ve had the opportunity to work with customers who were having issues seeding databases using Update-MailboxDatabaseCopy in Exchange 2010. When attempting to perform an update, the following sample error was returned:

A source-side operation failed. Error An error occurred while performing the seed operation. Error: Failed to open a log truncation context to source server ‘SOURCE-SERVER’. Hresult: 0xc7ff07d7. Error: Failed to open a log truncation context because the Microsoft Exchange Information Store service is not running.. [Database: MailboxDatabase2, Server: TARGET-SERVER]

*Note that the HResult may be different in the error even though the root of the issue is the same.

In each instance, the server we were trying to run the update for was located across a WAN link or separated by firewall devices.

In the cases I worked, we found that the devices providing the WAN connectivity were performing RPC packet inspection. For example, Threat Management Gateway has an RPC inspection agent and Cisco devices have a setting to enable DCERPC filtering. It would appear that certain RPCs that originate from Windows 2008 and Windows 2008 R2 do not conform to the format that these filters expect. When a non-conforming packet is identified, it is dropped.

We have also observed RPC filtering cause the following issues:

Continuous replication circular logging fails to trigger log truncation across nodes.
Log truncation does not occur in a DAG when a backup is successful on a member that has traffic between nodes subject to RPC filtering.
Backup header information for databases does not update on active database copies when a backup is successful on a member that has traffic between nodes subject to RPC filtering.

To correct the issue, RPC filtering had to be disabled on both the source and target devices providing the WAN connectivity between sites.

Exchange 2013: OutlookMailboxDeepTestProbe fails with Access Denied

I recently had an opportunity to work with a customer who was experiencing unexpected database failovers between nodes. When looking at the ProbeResult log within the ActiveMonitoring crimson event channel, the following error event was noted (full event text at end of post):

Contained within the event is the following:

<Error>Error 0x5 (Access is denied) from ClientAsyncCallState.CheckCompletion: RpcAsyncCompleteCall
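To locate these events without browsing Event Viewer, the channel can be queried from PowerShell on the affected Exchange 2013 server. This is a sketch only: the channel name and probe name are taken from the event shown at the end of this post, while the number of events scanned (5000) is an arbitrary assumption.

```powershell
# Sketch only: scans the most recent probe results for failed OutlookMailboxDeepTestProbe runs.
Get-WinEvent -LogName 'Microsoft-Exchange-ActiveMonitoring/ProbeResult' -MaxEvents 5000 |
    Where-Object { $_.Level -eq 2 -and $_.Message -like '*OutlookMailboxDeepTestProbe*' } |
    Select-Object -First 10 TimeCreated, Id, Message
```

Level 2 corresponds to Error. The full error text, including the Access is denied string, is in the event XML, which can be retrieved from each result with its ToXml() method.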

In the process of troubleshooting, we determined that the LMCompatibilityLevel was set to 1. It should be noted that the default for Windows 2003 is an LMCompatibilityLevel of 2. When the LMCompatibilityLevel is set to 1, the authentication mechanism that the health service utilizes for these probes fails. Due to the number of logon failures against the same database, the health service attempts to fail the database over to another server to correct the condition. This does not correct the condition, since the issue is domain-level authentication and the LMCompatibilityLevel is consistent across all nodes.

This issue also impacted the OutlookSelfTestProbe and may cause the RPC Client Access service to terminate and restart. Setting the LMCompatibilityLevel to 1 forces the enablement of LanManHash. This causes the winhttpAutoLogon setting to change to high, and no default credentials are sent with requests.

Information on LMCompatibilityLevel can be found here: http://technet.microsoft.com/en-us/library/cc960646.aspx

To correct this condition, the LMCompatibilityLevel in the registry of the Exchange servers and all domain controllers was changed to 2, and the servers were rebooted.
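The current value can be checked on each server before and after the change. The sketch below assumes the value is stored under the standard Lsa registry key as LmCompatibilityLevel. If the setting is delivered through Group Policy, it needs to be changed in the policy instead, because a local registry edit would be overwritten.

```powershell
# Sketch only: run locally on each Exchange server and domain controller.
$lsa = 'HKLM:\SYSTEM\CurrentControlSet\Control\Lsa'

# Read the current value; nothing is returned if the value has never been set.
(Get-ItemProperty -Path $lsa -Name LmCompatibilityLevel -ErrorAction SilentlyContinue).LmCompatibilityLevel

# Set the level to 2, matching the fix described above; a reboot is still required.
Set-ItemProperty -Path $lsa -Name LmCompatibilityLevel -Value 2 -Type DWord
```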

================================================================================

Log Name:      Microsoft-Exchange-ActiveMonitoring/ProbeResult
Source:        Microsoft-Exchange-ActiveMonitoring
Date:          7/8/2013 12:08:31 PM
Event ID:      2
Task Category: Probe result
Level:         Error
Keywords:
User:          SYSTEM
Computer:      SERVER.domain.com
Description:
Probe result (Name=OutlookMailboxDeepTestProbe/DATABASE)
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
<System>
    <Provider Name="Microsoft-Exchange-ActiveMonitoring" Guid="{ECD64F52-A3BC-47B8-B681-A11B7A1C8770}" />
    <EventID>2</EventID>
    <Version>0</Version>
    <Level>2</Level>
    <Task>2</Task>
    <Opcode>0</Opcode>
    <Keywords>0x800000000000000</Keywords>
    <TimeCreated SystemTime="2013-07-08T19:08:31.121798100Z" />
    <EventRecordID>3319929</EventRecordID>
    <Correlation />
    <Execution ProcessID="49060" ThreadID="43444" />
    <Channel>Microsoft-Exchange-ActiveMonitoring/ProbeResult</Channel>
    <Computer>SERVER.domain.com</Computer>
    <Security UserID="S-1-5-18" />
</System>
<UserData>
    <EventXML xmlns:auto-ns2="http://schemas.microsoft.com/win/2004/08/events" xmlns="myNs">
      <ResultId>3320940</ResultId>
      <ServiceName>Outlook.Protocol</ServiceName>
      <IsNotified>0</IsNotified>
      <ResultName>OutlookMailboxDeepTestProbe/DATABASE</ResultName>
      <WorkItemId>251</WorkItemId>
      <DeploymentId>0</DeploymentId>
      <MachineName>SERVER</MachineName>
      <Error>Error 0x5 (Access is denied) from ClientAsyncCallState.CheckCompletion: RpcAsyncCompleteCall
EEInfo: ComputerName: n/a
EEInfo: ProcessID: 49060
EEInfo: Generation Time: 2013-07-08 19:08:31.121
EEInfo: Generating component: 2
EEInfo: Status: 0x00000005
EEInfo: Detection location: 1710
EEInfo: Flags: 0
EEInfo: NumberOfParameters: 1
EEInfo: prm[0]: Long val: 0 (0x00000000)

EEInfo: ComputerName: n/a
EEInfo: ProcessID: 49060
EEInfo: Generation Time: 2013-07-08 19:08:31.121
EEInfo: Generating component: 14
EEInfo: Status: 0x00000005
EEInfo: Detection location: 1398
EEInfo: Flags: 0
EEInfo: NumberOfParameters: 2
EEInfo: prm[0]: Long val: 4 (0x00000004)
EEInfo: prm[1]: Long val: -1073606612 (0xC002102C)

EEInfo: ComputerName: n/a
EEInfo: ProcessID: 49060
EEInfo: Generation Time: 2013-07-08 19:08:31.121
EEInfo: Generating component: 13
EEInfo: Status: 0xC002102C
EEInfo: Detection location: 1401
EEInfo: Flags: 0
EEInfo: NumberOfParameters: 0

EEInfo: ComputerName: n/a
EEInfo: ProcessID: 49060
EEInfo: Generation Time: 2013-07-08 19:08:31.121
EEInfo: Generating component: 13
EEInfo: Status: 0x00000191
EEInfo: Detection location: 1417
EEInfo: Flags: 0
EEInfo: NumberOfParameters: 1
EEInfo: prm[0]: Unicode string: Unauthorized

EEInfo: ComputerName: n/a
EEInfo: ProcessID: 49060
EEInfo: Generation Time: 2013-07-08 19:08:31.106
EEInfo: Generating component: 13
EEInfo: Status: 0x00000000
EEInfo: Detection location: 3041
EEInfo: Flags: 0
EEInfo: NumberOfParameters: 0
</Error>
<Exception>Microsoft.Exchange.Rpc.RpcException: Error 0x5 (Access is denied) from ClientAsyncCallState.CheckCompletion: RpcAsyncCompleteCall
EEInfo: ComputerName: n/a
EEInfo: ProcessID: 49060
EEInfo: Generation Time: 2013-07-08 19:08:31.121
EEInfo: Generating component: 2
EEInfo: Status: 0x00000005
EEInfo: Detection location: 1710
EEInfo: Flags: 0
EEInfo: NumberOfParameters: 1
EEInfo: prm[0]: Long val: 0 (0x00000000)

   at Microsoft.Exchange.Rpc.ClientAsyncCallState.CheckCompletion()
   at Microsoft.Exchange.Rpc.ExchangeClient.ClientAsyncCallState_Connect.End(IntPtr& contextHandle, TimeSpan& pollsMax, Int32& retryCount, TimeSpan& retryDelay, String& dnPrefix, String& displayName, Int16[]& DATABASEVersion, ArraySegment`1& segmentExtendedAuxOut)
   at Microsoft.Exchange.Rpc.ExchangeClient.ExchangeAsyncRpcClient.EndConnect(ICancelableAsyncResult result, IntPtr& contextHandle, TimeSpan& pollsMax, Int32& retryCount, TimeSpan& retryDelay, String& dnPrefix, String& displayName, Int16[]& DATABASEVersion, ArraySegment`1& segmentExtendedAuxOut)
   at Microsoft.Exchange.RpcClientAccess.Monitoring.EmsmdbClient.ConnectCallContext.OnEnd(ICancelableAsyncResult asyncResult)
   at Microsoft.Exchange.RpcClientAccess.Monitoring.ClientCancelableCallContext`1.<InternalEnd>b__3(ICancelableAsyncResult r)
   at Microsoft.Exchange.RpcClientAccess.Monitoring.ClientCancelableCallContext`1.DeferExceptions[TArgIn](Action`1 guardedAction, TArgIn arg)</Exception>
      <RetryCount>0</RetryCount>
      <StateAttribute1>[null]</StateAttribute1>
      <StateAttribute2>SERVER.domain.com</StateAttribute2>
      <StateAttribute3>SERVER.domain.com</StateAttribute3>
      <StateAttribute4>HealthMailboxed1e4e23d3ba446aaccec3ae4e13c600</StateAttribute4>
      <StateAttribute5>{AC73C7EA-935A-4EBA-8B31-E9ECA8430D2A}</StateAttribute5>
      <StateAttribute6>0</StateAttribute6>
      <StateAttribute7>0</StateAttribute7>
      <StateAttribute8>0</StateAttribute8>
      <StateAttribute9>0</StateAttribute9>
      <StateAttribute10>0</StateAttribute10>
      <StateAttribute11>EMSMDB.Connect()</StateAttribute11>
      <StateAttribute12>[21] -EMSMDB.Connect(); </StateAttribute12>
      <StateAttribute13>[null]</StateAttribute13>
      <StateAttribute14>[null]</StateAttribute14>
      <StateAttribute15>[null]</StateAttribute15>
      <StateAttribute16>0</StateAttribute16>
      <StateAttribute17>0</StateAttribute17>
      <StateAttribute18>0</StateAttribute18>
      <StateAttribute19>0</StateAttribute19>
      <StateAttribute20>0</StateAttribute20>
      <StateAttribute21>[null]</StateAttribute21>
      <StateAttribute22>[null]</StateAttribute22>
      <StateAttribute23>[null]</StateAttribute23>
      <StateAttribute24>[null]</StateAttribute24>
      <StateAttribute25>[null]</StateAttribute25>
      <ResultType>4</ResultType>
      <ExecutionId>53740913</ExecutionId>
      <ExecutionStartTime>2013-07-08T19:08:31.1061979Z</ExecutionStartTime>
      <ExecutionEndTime>2013-07-08T19:08:31.1217981Z</ExecutionEndTime>
      <PoisonedCount>0</PoisonedCount>
      <ExtensionXml>[null]</ExtensionXml>
      <SampleValue>21.6446</SampleValue>
      <ExecutionContext>    Mailbox logon verification
        EMSMDB.Connect()
        Task produced output:
        – TaskStarted = 7/8/2013 12:08:31 PM
        – TaskFinished = 7/8/2013 12:08:31 PM
        – Exception = Microsoft.Exchange.Rpc.RpcException: Error 0x5 (Access is denied) from ClientAsyncCallState.CheckCompletion: RpcAsyncCompleteCall
EEInfo: ComputerName: n/a
EEInfo: ProcessID: 49060
EEInfo: Generation Time: 2013-07-08 19:08:31.121
EEInfo: Generating component: 2
EEInfo: Status: 0x00000005
EEInfo: Detection location: 1710
EEInfo: Flags: 0
EEInfo: NumberOfParameters: 1
EEInfo: prm[0]: Long val: 0 (0x00000000)

EEInfo: ComputerName: n/a
EEInfo: ProcessID: 49060
</ExecutionContext>
      <FailureContext>
      </FailureContext>
      <FailureCategory>-1</FailureCategory>
      <Version>65536</Version>
    </EventXML>
</UserData>
</Event>

================================================================================

Part 9: Datacenter Activation Coordination: An error caused a change in the current set of domain controllers

As a part of a datacenter switchover process, administrators run Stop-DatabaseAvailabilityGroup to stop DAG members in the failed datacenter. This cmdlet is responsible for updating the stoppedMailboxServers attribute of the DAG object within Active Directory.

When this command is run and multiple AD sites are involved, the command attempts to force AD replication between the sites so that all AD sites are aware of the stopped mailbox servers. This allows us to bypass issues that can arise when non-default replication times are used on AD site replication connections.

In many cases, not only have the Exchange servers in the primary datacenter failed, but so have the supporting domain controllers for that AD site. There may also be scenarios where the remote site where the command is being executed has no network connectivity to domain controllers in the primary site. When this occurs, Stop-DatabaseAvailabilityGroup fails with the following error:

[PS] C:\>Stop-DatabaseAvailabilityGroup -ActiveDirectorySite Exchange-B -ConfigurationOnly:$TRUE -Identity DAG

Confirm
Are you sure you want to perform this action?
Stopping Mailbox servers for Active Directory site "Exchange-B" in database availability group "DAG".
[Y] Yes [A] Yes to All [N] No [L] No to All [?] Help (default is "Y"): a
WARNING: Active Directory couldn’t be updated in Exchange-B site(s) affected by the change to ‘DAG’. It won’t be
completely usable until after Active Directory replication occurs.
An error caused a change in the current set of domain controllers.
+ CategoryInfo : NotSpecified: (0:Int32) [], ADServerSettingsChangedException
+ FullyQualifiedErrorId : 3647E7F3

If this error occurs as a result of issuing Stop-DatabaseAvailabilityGroup when known connectivity issues exist to domain controllers in the site hosting the Exchange servers that are being stopped, the error can be safely ignored. When domain controllers come back up in the primary datacenter, normal Active Directory replication will populate this attribute on those domain controllers. Other safeguards in the product also mean that the solution does not depend on this attribute having been updated on those domain controllers.
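To confirm that the change was recorded despite the error, the started and stopped server lists can be read back from a domain controller that is still reachable. This is a sketch only: the domain controller name dc-b1.exchange.msft is hypothetical, and DAG is the group name used in this example.

```powershell
# Sketch only: dc-b1.exchange.msft is a hypothetical, reachable domain controller.
Get-DatabaseAvailabilityGroup -Identity DAG -DomainController dc-b1.exchange.msft |
    Format-List Name,StartedMailboxServers,StoppedMailboxServers
```

The servers named in the Stop-DatabaseAvailabilityGroup command should appear under StoppedMailboxServers on that domain controller.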

========================================================

Datacenter Activation Coordination Series:

Part 1: My databases do not mount automatically after I enabled Datacenter Activation Coordination (https://aka.ms/F6k65e)
Part 2: Datacenter Activation Coordination and the File Share Witness (https://aka.ms/Wsesft)
Part 3: Datacenter Activation Coordination and the Single Node Cluster (https://aka.ms/N3ktdy)
Part 4: Datacenter Activation Coordination and the Prevention of Split Brain (https://aka.ms/C13ptq)
Part 5: Datacenter Activation Coordination: How do I Force Automount Concensus? (https://aka.ms/T5sgqa)
Part 6: Datacenter Activation Coordination: Who has a say? (https://aka.ms/W51h6n)
Part 7: Datacenter Activation Coordination: When to run start-databaseavailabilitygroup to bring members back into the DAG after a datacenter switchover. (https://aka.ms/Oieqqp)
Part 8: Datacenter Activation Coordination: Stop! In the Name of DAG… (https://aka.ms/Uzogbq)
Part 9: Datacenter Activation Coordination: An error cause a change in the current set of domain controllers (https://aka.ms/Qlt035)

========================================================

Part 8: Datacenter Activation Coordination: Stop! In the Name of DAG…


Sometimes, even when following a specific process, it takes only one mistake to send the entire process off course. Recently I’ve worked with several customers who found themselves unable to complete their datacenter switchover steps. Let’s explore several examples of what happened…

====================================================================================

In the first example, we have a four-member database availability group (DAG). Two members are deployed in the primary datacenter along with the witness server, and the other two members are installed in a remote datacenter with an alternate witness server. Each datacenter is an Active Directory site with a defined subnet. In this example, AD site Exchange-A is the primary datacenter and AD site Exchange-B is the remote datacenter. Here is an example network diagram:

In preparation for testing the witness server, MBX-1, MBX-2, and the router are powered down. This leaves MBX-3 and MBX-4 in a lost quorum state in the remote datacenter. The administrator starts the datacenter switchover process with Stop-DatabaseAvailabilityGroup, as shown in this example:

Stop-DatabaseAvailabilityGroup -Identity DAG -ActiveDirectorySite Exchange-B -ConfigurationOnly:$TRUE -Confirm:$FALSE

WARNING: Active Directory couldn’t be updated in Exchange-A site(s) affected by the change to ‘DAG’. It won’t be completely usable until after Active Directory replication occurs.
An error caused a change in the current set of domain controllers.
+ CategoryInfo : NotSpecified: (0:Int32) [], ADServerSettingsChangedException
+ FullyQualifiedErrorId : 372697AD

Next, the cluster service is stopped on MBX-3 and MBX-4.

Stop-service clussvc

To complete the switchover, Restore-DatabaseAvailabilityGroup is used.

Restore-DatabaseAvailabilityGroup -Identity DAG -ActiveDirectorySite:Exchange-B

WARNING: The operation wasn’t successful because an error was encountered. You may find more details in log file
"C:\ExchangeSetupLogs\DagTasks\dagtask_2012-08-12_14-07-52.764_restore-databaseavailabilitygroup.log".
Unable to get the status of the cluster service on server ‘MBX-2’. Error: ‘Cannot open Service Control Manager on computer ‘MBX-2’. This operation might require other privileges.’
+ CategoryInfo : InvalidArgument: (:) [Restore-DatabaseAvailabilityGroup], FailedToGetServiceStatusForNodeException
+ FullyQualifiedErrorId : A9B129A5,Microsoft.Exchange.Management.SystemConfigurationTasks.RestoreDatabaseAvailabilityGroup

The command returns an error indicating that it cannot contact server MBX-2 in order to determine the status of the Cluster service. Why is the task attempting to contact a server in the primary site that is down? Using Get-DatabaseAvailabilityGroup to review the properties of the DAG shows us why:

Get-DatabaseAvailabilityGroup -Identity DAG | fl name,servers,stoppedmailboxservers,startedmailboxservers

Name                                   : DAG
Servers                                : {MBX-2, MBX-3, MBX-4, MBX-1}
StoppedMailboxServers                  : {MBX-4.exchange.msft, MBX-3.exchange.msft}
StartedMailboxServers                  : {MBX-1.exchange.msft, MBX-2.exchange.msft}

We can examine StoppedMailboxServers and note that MBX-3 and MBX-4 are on the stopped list when they should be on the started servers list. This happened because in this instance the administrator stopped the wrong Active Directory site. When using Stop-DatabaseAvailabilityGroup, the administrator should have specified site Exchange-A but accidentally specified Exchange-B. This means the restore task is attempting to force the Cluster service on either MBX-1 or MBX-2 online and subsequently evict MBX-3 and MBX-4 from the cluster.
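One way to catch this mistake before running Restore-DatabaseAvailabilityGroup is to compare the stopped list against the servers that are supposed to survive. The following is only a sketch: Get-WronglyStoppedServer is a hypothetical helper, not an Exchange cmdlet, and it assumes the stopped list has already been converted to plain strings.

```powershell
# Hypothetical helper (not part of Exchange): returns the servers that are
# meant to stay online but currently appear on the stopped list.
function Get-WronglyStoppedServer {
    param(
        [string[]]$StoppedMailboxServers,   # names or FQDNs from the DAG's stopped list
        [string[]]$SurvivingServers         # servers you intend to keep running
    )
    # Compare on the short host name, because the stopped list holds FQDNs.
    $stoppedShort = @($StoppedMailboxServers | ForEach-Object { ($_ -split '\.')[0] })
    @($SurvivingServers | Where-Object { $stoppedShort -contains ($_ -split '\.')[0] })
}

# Values copied from the output above. In a live shell they might come from
# (Get-DatabaseAvailabilityGroup -Identity DAG).StoppedMailboxServers (assumption).
$stopped = 'MBX-4.exchange.msft', 'MBX-3.exchange.msft'
Get-WronglyStoppedServer -StoppedMailboxServers $stopped -SurvivingServers 'MBX-3', 'MBX-4'
```

Any server name returned by the helper is a server that the restore task would try to evict, so an empty result is what you want to see before continuing.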

If this mistake is made, how do you fix it? The first step is to correct the stopped and started servers lists. To do this, first stop the correct set of servers.

Stop-DatabaseAvailabilityGroup -Identity DAG -ActiveDirectorySite Exchange-A -ConfigurationOnly:$TRUE -Confirm:$FALSE

WARNING: Active Directory couldn’t be updated in Exchange-A site(s) affected by the change to ‘DAG’. It won’t be completely usable until after Active Directory replication occurs.
An error caused a change in the current set of domain controllers.
+ CategoryInfo : NotSpecified: (0:Int32) [], ADServerSettingsChangedException
+ FullyQualifiedErrorId : 372697AD

Next, use Get-DatabaseAvailabilityGroup to confirm that all four servers in the DAG now appear on the StoppedMailboxServers list.

Get-DatabaseAvailabilityGroup -Identity DAG | fl name,servers,stoppedmailboxservers,startedmailboxservers

Name                                   : DAG
Servers                                : {MBX-2, MBX-3, MBX-4, MBX-1}
StoppedMailboxServers                  : {MBX-2.exchange.msft, MBX-1.exchange.msft, MBX-4.exchange.msft, MBX-3.exchange.msft}
StartedMailboxServers                  : {}

The second step requires starting the servers in the remote datacenter. Start-DatabaseAvailabilityGroup can be used to do this, as shown in the following example:

Start-DatabaseAvailabilityGroup -Identity DAG -ActiveDirectorySite Exchange-B

WARNING: Active Directory couldn’t be updated in Exchange-A site(s) affected by the change to ‘DAG’. It won’t be completely usable until after Active Directory replication occurs.
An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API ‘OpenByNames(‘MBX-3.exchange.msft’, ‘MBX-4.exchange.msft’) failed for each server. Specific exceptions: ‘An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API ‘"OpenCluster(MBX-3.exchange.msft) failed with 0x6d9. Error: There are no more endpoints available from the endpoint mapper"’ failed..’, ‘An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API ‘"OpenCluster(MBX-4.exchange.msft) failed with 0x6d9. Error: There are no more endpoints available from the endpoint mapper"’ failed..’.’ failed..
+ CategoryInfo : NotSpecified: (0:Int32) [Start-DatabaseAvailabilityGroup], AmClusterApiException
+ FullyQualifiedErrorId : BA1A902A,Microsoft.Exchange.Management.SystemConfigurationTasks.StartDatabaseAvailabilityGroup

An error caused a change in the current set of domain controllers.

+ CategoryInfo : NotSpecified: (0:Int32) [], ADServerSettingsChangedException
+ FullyQualifiedErrorId : 372697AD

The failures that are displayed are expected. The Cluster service on the nodes is not in a started state at this time. Using Get-DatabaseAvailabilityGroup, we note that the servers listed are correct for both the StartedMailboxServers and StoppedMailboxServers lists.

Get-DatabaseAvailabilityGroup -Identity DAG | fl name,servers,stoppedmailboxservers,startedmailboxservers

Name                                   : DAG
Servers                                : {MBX-2, MBX-3, MBX-4, MBX-1}
StoppedMailboxServers                  : {MBX-2.exchange.msft, MBX-1.exchange.msft}
StartedMailboxServers                  : {MBX-3.exchange.msft, MBX-4.exchange.msft}

The third step is to ensure the Cluster service is stopped on each node, which can be accomplished by using Stop-Service.

Stop-Service ClusSvc

The last step is to use Restore-DatabaseAvailabilityGroup. This cmdlet will complete the datacenter switchover process by forcing the Cluster service to start and by evicting the nodes on the StoppedMailboxServers list.

Restore-DatabaseAvailabilityGroup -Identity DAG -ActiveDirectorySite Exchange-B -Confirm:$FALSE

WARNING: The Exchange Trusted Subsystem is not a member of the local Administrators group on specified witness server dc-2.exchange.msft.

This completes the datacenter switchover for the database availability group. The procedure can now continue with database activation and changes required for client access.

====================================================================================

In the second example we have a four-member DAG. Two members are in the primary datacenter with the witness server, and two members are in a remote datacenter with an alternate witness server configured. Both datacenters are in the same Active Directory site. Here is an example network diagram:

In preparation for testing the witness server, MBX-1, MBX-2, and the router are powered down. This leaves MBX-3 and MBX-4 in a lost quorum state in the remote datacenter. So the administrator starts the datacenter switchover process by issuing Stop-DatabaseAvailabilityGroup, as illustrated in the following example:

Stop-DatabaseAvailabilityGroup -Identity DAG -ActiveDirectorySite Exchange -ConfigurationOnly:$TRUE -Confirm:$FALSE
WARNING: Active Directory couldn’t be updated in Exchange site(s) affected by the change to ‘DAG’. It won’t be completely usable until after Active Directory replication occurs.

Next, the Cluster service is stopped on MBX-3 and MBX-4.

Stop-service ClusSvc

To complete the activation, Restore-DatabaseAvailabilityGroup is issued.

Restore-DatabaseAvailabilityGroup -Identity DAG -ActiveDirectorySite Exchange -Confirm:$FALSE

WARNING: The operation wasn’t successful because an error was encountered. You may find more details in log file "C:\ExchangeSetupLogs\DagTasks\dagtask_2012-08-12_16-57-27.326_restore-databaseavailabilitygroup.log". Unable to form quorum for database availability group ‘DAG’. Please try the operation again, or run the Restore-DatabaseAvailabilityGroup cmdlet and specify the site with servers known to be running.
+ CategoryInfo : InvalidArgument: (:) [Restore-DatabaseAvailabilityGroup], DagTaskQuorumNotAchievedException
+ FullyQualifiedErrorId : C7FE0CB9,Microsoft.Exchange.Management.SystemConfigurationTasks.RestoreDatabaseAvailabilityGroup

The command returns an error indicating that a quorum cannot be formed because no servers are known to be running. Why has this occurred? Using Get-DatabaseAvailabilityGroup we can review the properties of the DAG:

Get-DatabaseAvailabilityGroup -Identity DAG | fl name,servers,stoppedmailboxservers,startedmailboxservers

Name : DAG
Servers : {MBX-2, MBX-3, MBX-4, MBX-1}
StoppedMailboxServers : {MBX-1.exchange.msft, MBX-4.exchange.msft, MBX-3.exchange.msft, MBX-2.exchange.msft}
StartedMailboxServers : {}

Specifically we are interested in StoppedMailboxServers. In this example, all four DAG members appear in the StoppedMailboxServers list. Why is that? In our scenario, all Exchange servers are in the same Active Directory site. The administrator issued the Stop-DatabaseAvailabilityGroup command with the ActiveDirectorySite parameter when the MailboxServer parameter should have been used instead. The MailboxServer parameter was needed so that the administrator could stop individual servers instead of all of the servers in the same site.
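For comparison, here is a sketch of what the stop step might have looked like with the MailboxServer parameter in this single-site scenario. It uses the server names from this example and assumes MBX-1 and MBX-2 are the failed primary datacenter members:

```powershell
# Run from the Exchange Management Shell in the surviving datacenter.
# Stop only the failed primary datacenter members, one server at a time.
# ConfigurationOnly is used because the servers being stopped are not reachable.
Stop-DatabaseAvailabilityGroup -Identity DAG -MailboxServer MBX-1 -ConfigurationOnly:$TRUE -Confirm:$FALSE
Stop-DatabaseAvailabilityGroup -Identity DAG -MailboxServer MBX-2 -ConfigurationOnly:$TRUE -Confirm:$FALSE

# Confirm that only MBX-1 and MBX-2 appear on the stopped list.
Get-DatabaseAvailabilityGroup -Identity DAG | fl name,servers,stoppedmailboxservers,startedmailboxservers
```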

If this mistake is made, you can recover from it fairly easily. The first step is to fix the started and stopped mailbox server lists. You can use Start-DatabaseAvailabilityGroup to correct this.

Start-DatabaseAvailabilityGroup -Identity DAG -MailboxServer MBX-3

An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API ‘OpenByNames(‘MBX-3.exchange.msft’) failed for each server. Specific exceptions: ‘An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API ‘"OpenCluster(MBX-3.exchange.msft) failed with 0x6d9. Error: There are no more endpoints available from the endpoint mapper"’ failed..’.’ failed..
+ CategoryInfo : NotSpecified: (0:Int32) [Start-DatabaseAvailabilityGroup], AmClusterApiException
+ FullyQualifiedErrorId : CE668F87,Microsoft.Exchange.Management.SystemConfigurationTasks.StartDatabaseAvailabilityGroup

Start-DatabaseAvailabilityGroup -Identity DAG -MailboxServer MBX-4

An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API ‘OpenByNames(‘MBX-3.exchange.msft’, ‘MBX-4.exchange.msft’) failed for each server. Specific exceptions: ‘An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API ‘"OpenCluster(MBX-3.exchange.msft) failed with 0x6d9. Error: There are no more endpoints available from the endpoint mapper"’ failed..’,’An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API ‘"OpenCluster(MBX-4.exchange.msft) failed with 0x6d9. Error: There are no more endpoints available from the endpoint mapper"’ failed..’.’ failed..
+ CategoryInfo : NotSpecified: (0:Int32) [Start-DatabaseAvailabilityGroup], AmClusterApiException
+ FullyQualifiedErrorId : BB89A63D,Microsoft.Exchange.Management.SystemConfigurationTasks.StartDatabaseAvailabilityGroup

The failures that are displayed are expected. The Cluster service on the DAG members is not started at this time. We can use Get-DatabaseAvailabilityGroup to verify that the StartedMailboxServers and StoppedMailboxServers lists are correct.

Get-DatabaseAvailabilityGroup -Identity DAG | fl name,servers,stoppedmailboxservers,startedmailboxservers

Name : DAG
Servers : {MBX-2, MBX-3, MBX-4, MBX-1}
StoppedMailboxServers : {MBX-1.exchange.msft, MBX-2.exchange.msft}
StartedMailboxServers : {MBX-4.exchange.msft, MBX-3.exchange.msft}

The second step is to ensure that the Cluster service is stopped on MBX-3 and MBX-4.

Stop-Service ClusSvc

The last step is to run the Restore-DatabaseAvailabilityGroup command. This will complete the datacenter switchover process by forcing the Cluster service to start and by evicting the nodes on the stopped servers list.

Restore-DatabaseAvailabilityGroup -Identity DAG -ActiveDirectorySite Exchange -Confirm:$FALSE

WARNING: The Exchange Trusted Subsystem is not a member of the local Administrators group on specified witness server dc-2.exchange.msft.

This completes the datacenter switchover for the database availability group. The procedure can now continue with database activation and changes required for client access.

====================================================================================

In the last example we have a four-member DAG. Two members are installed in a primary datacenter with the witness server, and two members are installed in a remote datacenter with an alternate witness server configured. Both datacenters are in the same Active Directory site. The same situation described in this example can occur when multiple Active Directory sites are used, but in my experience, this problem most commonly occurs with just a single Active Directory site. Here is an example network diagram:

In preparation for testing the witness server, MBX-1, MBX-2, and the router are powered down. This leaves MBX-3 and MBX-4 in a lost quorum state in the remote datacenter. So, the administrator starts the datacenter switchover process with Stop-DatabaseAvailabilityGroup:

Stop-DatabaseAvailabilityGroup -Identity DAG -ActiveDirectorySite Exchange -Confirm:$FALSE

Next, the cluster service is stopped on MBX-3 and MBX-4.

Stop-service ClusSvc

Finally, Restore-DatabaseAvailabilityGroup is issued.

Restore-DatabaseAvailabilityGroup -Identity DAG -ActiveDirectorySite Exchange -Confirm:$FALSE

As with the previous examples, the problem occurs because the administrator issued the Stop-DatabaseAvailabilityGroup command and all servers were added to the stopped servers list. This is verified with Get-DatabaseAvailabilityGroup.

Get-DatabaseAvailabilityGroup -Identity DAG | fl name,servers,stoppedmailboxservers,startedmailboxservers

Name : DAG
Servers : {MBX-2, MBX-3, MBX-4, MBX-1}
StoppedMailboxServers : {MBX-1.exchange.msft, MBX-4.exchange.msft, MBX-3.exchange.msft, MBX-2.exchange.msft}
StartedMailboxServers : {}

The extent of the issue is realized when we attempt to correct the started and stopped mailbox server lists and proceed with the switchover process. As with the previous examples, we use Start-DatabaseAvailabilityGroup with the MailboxServer parameter to start the individual servers in the remote datacenter.

Start-DatabaseAvailabilityGroup -Identity DAG -MailboxServer MBX-3

Start-DatabaseAvailabilityGroup -Identity DAG -MailboxServer MBX-4

An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API ‘OpenByNames(‘MBX-3.exchange.msft’, ‘MBX-4.exchange.msft’) failed for each server. Specific exceptions: ‘An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API ‘"OpenCluster(MBX-3.exchange.msft) failed with 0x6d9. Error: There are no more endpoints available from the endpoint mapper"’ failed..’,’An Active Manager operation failed. Error An error occurred while attempting a cluster operation. Error: Cluster API ‘"OpenCluster(MBX-4.exchange.msft) failed with 0x6d9. Error: There are no more endpoints available from the endpoint mapper"’ failed..’.’ failed..
+ CategoryInfo : NotSpecified: (0:Int32) [Start-DatabaseAvailabilityGroup], AmClusterApiException
+ FullyQualifiedErrorId : BB89A63D,Microsoft.Exchange.Management.SystemConfigurationTasks.StartDatabaseAvailabilityGroup

The failures that are displayed are expected because the Cluster service on the DAG members is not in a started state. Using Get-DatabaseAvailabilityGroup, we note that the servers are correct on both the StartedMailboxServers and StoppedMailboxServers lists.

Get-DatabaseAvailabilityGroup -Identity DAG | fl name,servers,stoppedmailboxservers,startedmailboxservers

Name : DAG
Servers : {MBX-2, MBX-3, MBX-4, MBX-1}
StoppedMailboxServers : {MBX-1.exchange.msft, MBX-2.exchange.msft}
StartedMailboxServers : {MBX-4.exchange.msft, MBX-3.exchange.msft}

Using Stop-Service, we stop the Cluster service on each server.

Stop-Service ClusSvc

The last step should be to run Restore-DatabaseAvailabilityGroup to force quorum on the remaining servers and evict the servers on the stopped servers list.

Restore-DatabaseAvailabilityGroup -Identity DAG -ActiveDirectorySite Exchange -Confirm:$FALSE

WARNING: The operation wasn’t successful because an error was encountered. You may find more details in log file "C:\ExchangeSetupLogs\DagTasks\dagtask_2012-08-12_17-55-16.974_restore-databaseavailabilitygroup.log". Couldn’t start the Cluster service on ‘MBX-3’. Service state: Stopped. Try forcing the cluster to start without quorum by running "net start clussvc /fq" from a command prompt on that node.
+ CategoryInfo : InvalidArgument: (:) [Restore-DatabaseAvailabilityGroup], FailedToStartClusSvcException
+ FullyQualifiedErrorId : 6CD04940,Microsoft.Exchange.Management.SystemConfigurationTasks.RestoreDatabaseAvailabilityGroup

As shown above, Restore-DatabaseAvailabilityGroup failed because it could not start the Cluster service on MBX-3 using force quorum. The error suggests that the administrator manually start the service with the /fq (force quorum) switch.

net start clussvc /fq

System error 1058 has occurred. The service cannot be started, either because it is disabled or because it has no enabled devices associated with it.

After attempting to manually start the Cluster service with the force quorum switch, the above error is displayed. System error 1058 indicates that the service cannot be started because it is disabled.

When reviewing Service Control Manager, we note that the Cluster service on the remaining members is set to Disabled.
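The same information can be read from PowerShell instead of the Services snap-in. This is a minimal sketch that assumes WMI is reachable on MBX-3 and MBX-4 from the machine where it runs:

```powershell
# Report the Cluster service start mode and state on the surviving members.
# Win32_Service exposes StartMode (Auto, Manual, Disabled) and State.
Get-WmiObject -Class Win32_Service -ComputerName 'MBX-3', 'MBX-4' -Filter "Name='ClusSvc'" |
    Select-Object SystemName, Name, StartMode, State
```

A StartMode of Disabled on these servers matches the condition described above.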

When reviewing the system event log, we see the following event at around the time Stop-DatabaseAvailabilityGroup was issued.

Log Name:      System
Source:        Service Control Manager
Date:          8/12/2012 10:30:07 AM
Event ID:      7040
Task Category: None
Level:         Information
Keywords:      Classic
User:          SYSTEM
Computer:      MBX-3.exchange.msft
Description:
The start type of the Cluster Service service was changed from auto start to disabled.

This is where the extent of the mistake is exposed. Stop-DatabaseAvailabilityGroup was not only run against servers that should not have been stopped, but it was also run without the ConfigurationOnly parameter. When the cmdlet is run without the ConfigurationOnly parameter, any servers that are being stopped that are accessible will have their Cluster service forcibly cleaned up. This in turn prevents Restore-DatabaseAvailabilityGroup from being successful.

In order to overcome this situation the administrator must re-establish the Cluster and then proceed with database activation. The first step is to ensure that the Cluster service is completely cleaned up from the DAG members in the remote datacenter.

Windows 2008:

Cluster Node /force

Attempting to clean up node ” …
Clean up successfully completed.

Windows 2008 R2:

Import-Module FailoverClusters

Clear-ClusterNode -Force -Verbose -Confirm:$FALSE

VERBOSE: Performing operation "Clear-ClusterNode" on Target "MBX-3".
VERBOSE: Clearing cluster node MBX-3.

The second step is to use Active Directory Users and Computers to locate the DAG’s cluster name object (CNO). Right-click the CNO and select Reset Account, and then right-click the CNO and select Disable Account. Allow sufficient time for the disabled account to replicate throughout Active Directory.

The third step is to manually create the cluster. There are three methods to manually create the cluster.

Windows 2008 and Windows 2008 R2 utilizing Failover Cluster Manager:

Launch Failover Cluster Manager.

In the upper right corner select “Create a cluster…”

In the “Before you begin” dialog, select Next.

On the “Selected Server” dialog enter the server names of all servers in the remote datacenter. In our example, we will add MBX-3 and MBX-4. Select the Add button after each server name. Select Next when completed.

On the “Validation Warning” dialog, select No. Select Next when completed.

On the “Access Point for Administering the Cluster” in the “Cluster Name:” field, enter the name of the DAG. In our example we will use DAG (creative eh?). In the networks dialog enter the IP address assigned to the DAG in the remote datacenter (if you are not sure you can use Get-DatabaseAvailabilityGroup | fl name,databaseavailabilitygroupipaddresses to list the IP addresses assigned to the DAG). Select Next when complete.

On the “Confirmation” select Next.

At this time the Cluster service should be configured on both servers. On the “Summary” select Finish.

The last step is to use the Exchange Management Shell and run the following command:

Set-DatabaseAvailabilityGroup -Identity DAG

Running this command without specifying any values ensures that the DAG settings from Active Directory are applied to the new cluster.

Windows 2008 Command Line:

Cluster.exe DAGNAME /create /nodes:"NODE1 NODE2 NODE3" /ipaddress:"IP/Subnet"

C:\>cluster.exe DAG /create /nodes:"MBX-3 MBX-4" /ipAddress:"192.168.1.20/24"
4% Initializing Cluster DAG.
9% Validating cluster state on node MBX-3.
13% Searching the domain for computer object DAG
18% Verifying computer object DAG in the domain
22% Configuring computer object DAG as cluster name object
27% Validating installation of the Microsoft Failover Cluster Virtual Adapter on node MBX-3.
31% Validating installation of the Cluster Disk Driver on node MBX-3.
36% Configuring Cluster Service on node MBX-3.
40% Validating installation of the Microsoft Failover Cluster Virtual Adapter on node MBX-4.
45% Validating installation of the Cluster Disk Driver on node MBX-4.
50% Configuring Cluster Service on node MBX-4.
54% Starting Cluster Service on node MBX-3.
54% Starting Cluster Service on node MBX-4.
59% Forming cluster DAG.
63% Adding cluster common properties to DAG.
68% Creating resource types on cluster DAG.
72% Creating group ‘Cluster Group’.
72% Creating group ‘Available Storage’.
77% Creating IP Address resource ‘Cluster IP Address’.
81% Creating Network Name resource ‘DAG’.
86% Searching the domain for computer object DAG
90% Verifying computer object DAG in the domain
95% Configuring computer object DAG as cluster name object

100% Bringing resource group ‘Cluster Group’ online.

Windows 2008 R2 PowerShell:

Import-Module FailoverClusters

New-Cluster -Name DAGNAME -Node NODE1,NODE2,NODE3 -StaticAddress IPAddress -NoStorage

[PS] C:\>Import-Module FailoverClusters

[PS] C:\>New-Cluster -Name DAG -Node MBX-3,MBX-4 -StaticAddress 192.168.1.20 -NoStorage
Report file location: C:\Windows\cluster\Reports\Create Cluster Wizard DAG on 2013.08.12 At 12.09.55.mht

Name
----
DAG

At this time, the started and stopped mailbox server lists are accurate, and the Cluster service for the DAG has been re-established. To ensure the configuration is correct the administrator can run Set-DatabaseAvailabilityGroup. This will ensure that the DAG configuration in Active Directory matches the cluster configuration.
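As a quick sanity check on Windows 2008 R2, the node membership of the re-created cluster can be listed with the FailoverClusters module. This sketch assumes the cluster name DAG used in this example:

```powershell
Import-Module FailoverClusters

# List the nodes of the re-created cluster and their current state.
# In this example only MBX-3 and MBX-4 should be listed.
Get-ClusterNode -Cluster DAG | Format-Table Name, State -AutoSize
```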

This completes the datacenter switchover for the database availability group. The procedure can now continue with database activation and changes required for client access.

====================================================================================

This blog post covers three common scenarios I see where administrators make mistakes when using Stop-DatabaseAvailabilityGroup. When used incorrectly, the cmdlet can have unintended results and the steps outlined here can be used to work around them.

========================================================


Part 7: Datacenter Activation Coordination: When to run start-databaseavailabilitygroup to bring members back into the DAG after a datacenter switchover…


When running Restore-DatabaseAvailabilityGroup as part of the datacenter switchover process, servers in the secondary datacenter are forced online from a quorum and cluster perspective, and servers in the primary datacenter are evicted from the DAG’s cluster. When nodes in the primary datacenter come back online and network connectivity is restored, these restored nodes are not aware that any changes to cluster membership have occurred. The cluster services on the nodes in the primary datacenter will attempt to join/form a cluster with the nodes running in the secondary datacenter. When this occurs, the nodes in the secondary datacenter inform the nodes in the primary datacenter that they were evicted.

After a datacenter switchover has occurred, unless the original datacenter is gone or otherwise unrecoverable, eventually services in the primary datacenter will be restored. When services are restored, including full network connectivity, database availability group (DAG) administrators can begin the switchback process by using the Start-DatabaseAvailabilityGroup cmdlet.

Before performing a switchback, you can perform the following tasks to verify that it is safe to run Start-DatabaseAvailabilityGroup for servers in the primary datacenter.

The first task is to ensure that the following events are present in the system log of the servers on the StoppedMailboxServers list:

Log Name:      System
Source:        Service Control Manager
Date:          5/27/2012 1:13:35 PM
Event ID:      7040
Task Category: None
Level:         Information
Keywords:      Classic
User:          SYSTEM
Computer:      MBX-1.exchange.msft
Description:
The start type of the Cluster Service service was changed from auto start to disabled.

Log Name:      System
Source:        Microsoft-Windows-FailoverClustering
Date:          5/27/2012 1:13:35 PM
Event ID:      4621
Task Category: Cluster Evict/Destroy Cleanup
Level:         Information
Keywords:
User:          SYSTEM
Computer:      MBX-1.exchange.msft
Description:
This node was successfully removed from the cluster.

Log Name:      System
Source:        Service Control Manager
Date:          5/27/2012 1:13:35 PM
Event ID:      7036
Task Category: None
Level:         Information
Keywords:      Classic
User:          N/A
Computer:      MBX-1.exchange.msft
Description:
The Cluster Service service entered the stopped state.

In this example, MBX-1 was informed of the eviction, had its cluster services cleaned up, and had its Cluster service startup type set to Disabled. The second task is to verify that the Cluster service startup type is set to Disabled. You can use the Services snap-in to verify this.
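The event check can also be scripted rather than done by browsing Event Viewer. This is a minimal sketch, assuming the System log on MBX-1 can be queried remotely from where the command runs:

```powershell
# Look for the eviction-related events in the System log of a stopped server.
# 7040 = start type changed, 4621 = node removed from the cluster,
# 7036 = service entered a new state.
Get-WinEvent -ComputerName MBX-1 -FilterHashtable @{ LogName = 'System'; Id = 7040, 4621, 7036 } |
    Where-Object { $_.Message -match 'Cluster' } |
    Select-Object TimeCreated, Id, ProviderName, Message
```

The Where-Object filter is there because events 7040 and 7036 are logged for every service, not only the Cluster service.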

The third and last task is to verify that the cluster registry has been successfully cleaned up. This is an important step because any remnants of the cluster registry can lead the server to believe it is actually still in a cluster even though it has been evicted. You can use Registry Editor to navigate to HKEY_LOCAL_MACHINE (HKLM). If there is a hive called Cluster under the root of HKLM, then the cleanup did not complete successfully.
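On a node where PowerShell is available, the same registry check can be done without opening Registry Editor. A minimal sketch, run locally on the evicted node:

```powershell
# Test for a Cluster hive directly under HKEY_LOCAL_MACHINE.
if (Test-Path -Path 'HKLM:\Cluster') {
    'Cluster hive still present under HKLM: cleanup did not complete.'
} else {
    'No Cluster hive under HKLM: cleanup completed.'
}
```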

Here is an example of a node where a successful cleanup was performed:

Here is an example of a node where the Cluster service has not been successfully cleaned up:

Anytime part of the cleanup process fails it typically means that Start-DatabaseAvailabilityGroup will also fail. If any of these three tasks show that cleanup did not complete successfully, it’s relatively easy to fix these issues. Administrators can force the cleanup to occur by running a cluster command.

Windows 2008:

Cluster node /force

Windows 2008 R2 / Windows 2012:

Import-Module FailoverClusters

Clear-ClusterNode <NODENAME> -Force

Some administrators proactively include this as a step in their datacenter switchover documentation when bringing resources back to the primary datacenter. This is not a bad idea. Proactively running this command, even on a node that was cleaned up successfully, has no ill effects and eliminates the need to perform the three tasks listed above.

Therefore, I recommend administrators either incorporate the three tasks or proactively run the cleanup command as a part of their datacenter switchover procedures.

========================================================


Exchange 2010: Page Zeroing and VSS Based Backups


In Exchange Server 2010 Service Pack 1, page zeroing is enabled by default and there is no method to disable it. Page zeroing is a process that takes pages that exist in whitespace within the database and marks them with a pattern of zeros making them forensically unrecoverable. The page zeroing process runs as part of the background maintenance process.

Recently we have investigated cases where page zeroing has led to larger than expected VSS backup data sets. Specifically these are for VSS-based products that perform a delta backup from a previous snapshot versus transferring a full data set with each backup. These might include but are not limited to:

System Center Data Protection Manager
Windows Server Backup (when leveraging non-network storage)
Hardware-based VSS providers

In all cases investigated there was an event that increased the amount of whitespace within the database. For example, multiple mailboxes were moved to another database leaving a more noticeable amount of whitespace within a given database.
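To see how much whitespace a database currently holds, the database status can be queried from the Exchange Management Shell. This is a minimal sketch that assumes it is run in an Exchange 2010 Management Shell session with rights to read database status. AvailableNewMailboxSpace reports the whitespace available for new mailbox data.

```powershell
# -Status is required; without it the size properties are not populated
Get-MailboxDatabase -Status |
    Format-Table Name, DatabaseSize, AvailableNewMailboxSpace -AutoSize
```

A database whose AvailableNewMailboxSpace jumped after a batch of mailbox moves is a likely candidate for the behavior described here.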

Let’s look at an example:

An anchor backup is taken of a 500 GB database with 10 GB of associated log files, which results in a transfer of 510 GB to the backup medium. Over the course of the day, 6 GB of changes occur within the database (5 GB of actual user changes and 1 GB of page zeroing changes), along with 10 GB of associated logs. This yields a delta transfer of 16 GB to the backup medium. Over time, with standard usage patterns, delta transfers stay at around 16 GB.

At some point the administrator migrates a group of users out of the database, and this accounts for 30 GB of change. The resulting transfer to the backup medium is now 30 GB plus log files. This increase is expected, because an event caused an increase in database activity. What is not expected is that from this point forward the delta transfers continue to be greater than 30 GB. In many cases this exceeds the expected snapshot size and the storage allocated on the backup medium server.

In these instances, the page zeroing process repeatedly zeros pages that are already zeroed, which causes a daily delta change rate that corresponds to the amount of whitespace in the database.
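The arithmetic in the example can be laid out as a short calculation. The figures are the ones from the example above. The last line is an illustrative assumption, not a measured value: it supposes that the full 30 GB of whitespace is re-zeroed every day on top of the usual 5 GB of user changes.

```powershell
# All values in GB, taken from the example
$databaseSize    = 500
$dailyLogs       = 10
$userChanges     = 5
$zeroingChanges  = 1
$migratedOutData = 30

# Anchor backup: full database plus logs
$anchorTransfer = $databaseSize + $dailyLogs

# Typical day: user changes, normal page zeroing, and logs
$typicalDelta = $userChanges + $zeroingChanges + $dailyLogs

# Assumption: after the migration, all of the new whitespace is re-zeroed daily,
# so database page changes alone (before logs) stay above 30 GB
$ongoingPageChanges = $migratedOutData + $userChanges

"Anchor transfer:        $anchorTransfer GB"
"Typical daily delta:    $typicalDelta GB"
"Ongoing page changes:   $ongoingPageChanges GB (plus logs)"
```

Under that assumption, snapshot storage sized for the typical 16 GB delta would be exceeded every day until the whitespace is reused or removed.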

The following actions can be taken to correct this condition:

1) Adjust snapshot storage to accommodate the larger delta data sets.

2) Migrate all mailboxes out of the mailbox database and remove the mailbox database.

3) Allow the whitespace to be recycled as additional mailboxes are added to the database.

4) Offline defragment the database.

Exchange 2010 Service Pack 3 Rollup Update 1 has a code change that corrects the page zeroing behavior.
