Monthly Archives: February 2009

Restore-StorageGroupCopy -StandbyMachine requires use of -Force in order to complete successfully when activating a Standby Continuous Replication target in Exchange 2007 SP1.

When you attempt to run Restore-StorageGroupCopy on an SCR target, you may receive the following error (note this is with verbose output):

[PS] C:\Windows\System32>Restore-StorageGroupCopy MBX-4\MBX-4-SG1 -StandbyMachine MBX-3 -Verbose > output.txt
VERBOSE: Restore-StorageGroupCopy : Beginning processing.
VERBOSE: Restore-StorageGroupCopy : Searching objects "MBX-4\MBX-4-SG1" of type "StorageGroup" under the root "$null".
VERBOSE: Restore-StorageGroupCopy : Previous operation run on domain controller ‘DC-3.domain.com’.
VERBOSE: Restore-StorageGroupCopy : Processing object "MBX-4\MBX-4-SG1".
VERBOSE: Restoring Storage Group Copy "MBX-4\MBX-4-SG1".
WARNING: Failed to copy remaining log files (through Exx.log) during Restore-StorageGroupCopy operation for storage group copy (MBX-4-SG1). Failure reason: Unable to move a log file from the inspector directory to the log directory for storage group 9b5be25a-1c2b-48a0-9087-2819d2887001|Standby. Old path: f:\MBX-4\MBX-4-SG1-Logs\E00.log. New path: f:\MBX-4\MBX-4-SG1-Logs\IgnoredLogs\E00OutofDate\2009-02-17T14-42-53E00.log.
.
Restore-StorageGroupCopy : Failed to copy the last log files for storage group ‘MBX-4-SG1’. Use the Force option if you want to restore the storage group despite the data loss.
At line:1 char:25
+ Restore-StorageGroupCopy  <<<< MBX-4\MBX-4-SG1 -StandbyMachine MBX-3 -Verbose
VERBOSE: Restore-StorageGroupCopy : Ending processing.

When reviewing the application log of the server, you will note the following events:

Log Name:      Application
Source:        MSExchangeRepl
Date:          2/17/2009 9:42:53 AM
Event ID:      2013
Task Category: Service
Level:         Information
Keywords:      Classic
User:          N/A
Computer:      MBX-3.domain.com
Description:
The replication instance for storage group MBX-4\MBX-4-SG1 found an invalid copy of log file f:\MBX-4\MBX-4-SG1-Logs\inspector\E00.log. The log file has been moved to f:\MBX-4\MBX-4-SG1-Logs\IgnoredLogs\InspectionFailed\2009-02-17T14-42-53E00.log. Reason: The log file has failed inspection.  It is inappropriate for replay..

Log Name:      Application
Source:        MSExchangeRepl
Date:          2/17/2009 9:42:53 AM
Event ID:      2089
Task Category: Action
Level:         Information
Keywords:      Classic
User:          N/A
Computer:      MBX-3.domain.com
Description:
The Restore-StorageGroupCopy operation on MBX-4\MBX-4-SG1 failed to complete with error code Microsoft.Exchange.Management.Tasks.RestoreSCRFailedToCopyLastLog: Failed to copy the last log files for storage group ‘MBX-4-SG1’. Use the Force option if you want to restore the storage group despite the data loss..

In the output of the management shell command, the user is advised to use the -Force option to complete this operation.  There are certain circumstances where this is necessary, but this is not one of them.  It should not be necessary here because the source is fully available, so all logs not yet replicated should be available for copy when running Restore-StorageGroupCopy.

The issue here arises from a bug.  We will attempt to copy the last log up to 6 times.  In this case, regardless of whether an attempt succeeds or fails, we continue to copy the log until we reach the maximum number of attempts and then display a failure.  In reality, the log copy completed successfully.  Let’s take a look at that.
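The behavior described above can be pictured with a small Python sketch. This is only an illustration of the symptom, not the Exchange source code: `copy_last_log`, `MAX_ATTEMPTS`, `restore_buggy` and `restore_expected` are all hypothetical names made up for this example.

```python
MAX_ATTEMPTS = 6  # "up to 6 times", per the description above


def restore_buggy(copy_last_log):
    """Keeps retrying even after a successful copy, then reports a failure."""
    attempts = 0
    for _ in range(MAX_ATTEMPTS):
        attempts += 1
        copy_last_log()  # the result is ignored, which is the symptom described
    return {"attempts": attempts, "reported": "failed"}


def restore_expected(copy_last_log):
    """Stops at the first successful copy and reports success."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        if copy_last_log():
            return {"attempts": attempt, "reported": "succeeded"}
    return {"attempts": MAX_ATTEMPTS, "reported": "failed"}


def always_succeeds():
    # Stand-in for a copy that works on the first try (source fully available).
    return True


print(restore_buggy(always_succeeds))
print(restore_expected(always_succeeds))
```

With a copy that always succeeds, the first function still makes six attempts and reports a failure, which would be consistent with several identical copies of the log ending up in the IgnoredLogs folder.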

If you review the log file directory on the SCR target, you will see that the Enn.log file (E00.log in this example) is actually present in the log file folder.

Using eseutil /ml <log>, we can dump the header information for the log file on the SCR TARGET log directory.

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server

Version 08.01

Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating FILE DUMP mode…

      Base name: e00
      Log file: e00.log
      lGeneration: 2959 (0xB8F)
      Checkpoint: NOT AVAILABLE
      creation time: 02/17/2009 09:26:41
      prev gen time: 02/17/2009 09:26:40
      Format LGVersion: (7.3704.12)
      Engine LGVersion: (7.3704.12)
     
      Signature: Create time:01/08/2009 10:39:04 Rand:47957569 Computer:
      Env SystemPath: e:\MBX-4\MBX-4-SG1-System
      Env LogFilePath: f:\MBX-4\MBX-4-SG1-Logs
      Env Log Sec size: 512
      Env (CircLog,Session,Opentbl,VerPage,Cursors,LogBufs,LogFile,Buffers)
          (    off,    552,  27600,  15960,  27600,   2048,   2048,2000000000)
      Using Reserved Log File: false
      Circular Logging Flag (current file): off
      Circular Logging Flag (past files): off

      Last Lgpos: (0xb8f,8,30)

Integrity check passed for log file: e00.log

Operation completed successfully in 0.63 seconds.

The most important item in this output to us is the signature. 

Using eseutil /ml <log>, we can dump the header information for the log file on the SCR SOURCE log directory.

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server

Version 08.01

Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating FILE DUMP mode…

      Base name: e00
      Log file: e00.log
      lGeneration: 2959 (0xB8F)
      Checkpoint: NOT AVAILABLE
      creation time: 02/17/2009 09:26:41
      prev gen time: 02/17/2009 09:26:40
      Format LGVersion: (7.3704.12)
      Engine LGVersion: (7.3704.12)
     
      Signature: Create time:01/08/2009 10:39:04 Rand:47957569 Computer:
      Env SystemPath: e:\MBX-4\MBX-4-SG1-System
      Env LogFilePath: f:\MBX-4\MBX-4-SG1-Logs
      Env Log Sec size: 512
      Env (CircLog,Session,Opentbl,VerPage,Cursors,LogBufs,LogFile,Buffers)
          (    off,    552,  27600,  15960,  27600,   2048,   2048,2000000000)
      Using Reserved Log File: false
      Circular Logging Flag (current file): off
      Circular Logging Flag (past files): off

      Last Lgpos: (0xb8f,8,30)

Integrity check passed for log file: e00.log

Operation completed successfully in 0.63 seconds.

The signature is what we can use to make the most direct comparison.  Based on this information, we can confirm that the log file that was copied and placed in the SCR TARGET log directory matches the log from the SCR SOURCE log directory.  The command was actually successful.
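If you want to script this comparison rather than read the dumps by eye, the following Python sketch pulls the generation and signature out of saved `eseutil /ml` output and compares them. The function names are my own, and the regular expressions assume the header layout shown in the dumps above; treat it as a starting point rather than a supported tool.

```python
import re


def parse_log_header(dump):
    """Extract the generation and signature from saved `eseutil /ml` output."""
    fields = {}
    gen = re.search(r"lGeneration:\s*(\d+)", dump)
    sig = re.search(r"Signature:\s*Create time:(\S+ \S+)\s+Rand:(\d+)", dump)
    if gen:
        fields["generation"] = int(gen.group(1))
    if sig:
        fields["signature"] = (sig.group(1), int(sig.group(2)))
    return fields


def same_log(dump_a, dump_b):
    """True only when both dumps parsed fully and their fields are identical."""
    a, b = parse_log_header(dump_a), parse_log_header(dump_b)
    return {"generation", "signature"} <= a.keys() and a == b


# Excerpts of the target and source dumps shown above.
TARGET_DUMP = """
      lGeneration: 2959 (0xB8F)
      Signature: Create time:01/08/2009 10:39:04 Rand:47957569 Computer:
"""
SOURCE_DUMP = TARGET_DUMP

print(same_log(TARGET_DUMP, SOURCE_DUMP))
```

In practice you would redirect the output of each eseutil run to a text file and read those files in place of the two excerpt strings.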

As I indicated above, we will actually attempt to copy the log file 6 times (max retries).  If you look at the management shell text, you will see that the log was moved to “f:\MBX-4\MBX-4-SG1-Logs\IgnoredLogs\E00OutofDate\2009-02-17T14-42-53E00.log”, which in this case is the E00OutofDate directory inside IgnoredLogs.

If you review the folder, you will see that the time stamps of the logs are all the same and correspond to the attempt to run Restore-StorageGroupCopy -StandbyMachine.  If you run eseutil /ml against one of these logs, you can again compare the signatures and verify that these logs match both the log successfully copied to the SCR TARGET directory and the log in the SCR SOURCE directory.

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server

Version 08.01

Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating FILE DUMP mode…

      Base name: 200
      Log file: 2009-02-17T14-41-58E00.log
      lGeneration: 2959 (0xB8F)
      Checkpoint: NOT AVAILABLE
      creation time: 02/17/2009 09:26:41
      prev gen time: 02/17/2009 09:26:40
      Format LGVersion: (7.3704.12)
      Engine LGVersion: (7.3704.12)
     
      Signature: Create time:01/08/2009 10:39:04 Rand:47957569 Computer:
      Env SystemPath: e:\MBX-4\MBX-4-SG1-System
      Env LogFilePath: f:\MBX-4\MBX-4-SG1-Logs
      Env Log Sec size: 512
      Env (CircLog,Session,Opentbl,VerPage,Cursors,LogBufs,LogFile,Buffers)
          (    off,    552,  27600,  15960,  27600,   2048,   2048,2000000000)
      Using Reserved Log File: false
      Circular Logging Flag (current file): off
      Circular Logging Flag (past files): off

      Last Lgpos: (0xb8f,8,30)

Integrity check passed for log file: e00.log

Operation completed successfully in 0.63 seconds.

Because of this issue, repeated attempts to run Restore-StorageGroupCopy -StandbyMachine will return the same error.  There are a few things that can be done to correct this issue:

  • Use the information in this blog to manually verify that the log copy was successful.  If it was, run Restore-StorageGroupCopy -StandbyMachine <name> -Force.
  • Contact Microsoft Customer Support Services.  Interim updates (IUs) are currently available for both Exchange 2007 SP1 RU5 and Exchange 2007 SP1 RU6.

This issue is corrected in Exchange 2007 SP1 RU7.

This issue affects only Exchange 2007 SP1 Standby Continuous Replication, regardless of the type of source being used.

(Note:  If you elect to receive an interim update, the IU only needs to be applied on the SCR target machine.)

Permissions recommended for the CNO (Cluster Name Object) in Windows 2008 for Exchange 2007 SP1 setup operations.

In Windows 2003, when the cluster attempted to create or modify Kerberos-enabled machine accounts, it did so by leveraging the rights assigned to the cluster service account.  The Windows 2003 cluster service logged on with this domain account at service startup.

In Windows 2008, when the cluster attempts to create or modify Kerberos-enabled machine accounts, it does so by leveraging the machine account associated with the name of the cluster (this is the Cluster Name Object, or CNO).  The Windows 2008 cluster service now starts under “Local System”.

When the CNO does not have rights to join machine accounts to the domain, or to modify existing machine accounts, Exchange setup will fail after programmatically creating the network name resource and attempting to bring it online.

This situation most commonly occurs when running:

1)  Setup.com /newCMS /cmsName:<NAME> /cmsIPv4Address:<IP>

2)  Setup.com /recoverCMS /cmsName:<NAME> /cmsIPv4Address:<IP>

3)  Enable-ContinuousReplicationHostName

The following errors may be noted during setup where the network name failed to come online due to this issue:

"Cluster Common Failure Exception: Failed to bring cluster resource Network name (<NAME>) in cluster group <NAME> online.The group or resource is not in the correct state to perform the requested operation. (Exception from HRESULT:0x8007139f)"

Error 0x8007139f translates to:

ERROR_INVALID_STATE
# The group or resource is not in the correct state to
# perform the requested operation.
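If you want to see how that translation falls out of the number itself, here is a short Python sketch that splits an HRESULT into its standard fields (severity bit, facility, code). The function name is mine; the bit layout is the standard HRESULT layout, where facility 7 means the code is a Win32 error.

```python
def decode_hresult(hresult):
    """Split a 32-bit HRESULT into its severity bit, facility and code."""
    return {
        "failure": bool((hresult >> 31) & 1),   # top bit set means failure
        "facility": (hresult >> 16) & 0x7FF,    # 11-bit facility field
        "code": hresult & 0xFFFF,               # low 16 bits: the error code
    }


print(decode_hresult(0x8007139F))
```

For 0x8007139F this yields facility 7 (Win32) and code 5023, which is ERROR_INVALID_STATE.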

In the application and system logs, the following events may be noted:

Log Name: Application
Source: MSExchangeRepl
Date: 10/24/2008 2:17:15 PM
Event ID: 107
Task Category: Action
Level: Error
Keywords: Classic
User: N/A
Computer: <NAME>.domain.com
Description:
The New-ClusteredMailboxServer operation failed for server <NAME>

Log Name: Application
Source: MSExchangeSetup
Date: 10/24/2008 2:17:15 PM
Event ID: 1002
Task Category: Microsoft Exchange Setup
Level: Error
Keywords: Classic
User: N/A
Computer: <NAME>.domain.com
Description:
Exchange Server component Clustered Mailbox Server failed.
Error: Error:
Cluster Common Failure Exception: Failed to bring cluster resource Network Name (<NAME>) in cluster group <NAME> online. The event log may contain more details. Cluster Common Failure Exception: The group or resource is not in the correct state to perform the requested operation. (Exception from HRESULT: 0x8007139F)

Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 10/24/2008 2:17:13 PM
Event ID: 1194
Task Category: Network Name Resource
Level: Error
Keywords:
User: SYSTEM
Computer: <NAME>.domain.com
Description:
Cluster network name resource ‘Network Name (<NAME>)’ failed to create its associated computer object in domain ‘domain.com’ for the following reason: Unable to create computer account. The text for the associated error code is: Access is denied.

To correct this situation, I recommend the following steps when creating Windows 2008 clusters.  (These steps assume the cluster service on the nodes has not already been configured.)

  • Using Active Directory Users and Computers showing advanced features:
    • In the appropriate container create a new machine account to correspond to the name of the cluster – this will be the cluster name object or CNO.
    • In the appropriate container create a new machine account for the Exchange name – this will be the CMS or clustered mailbox server name.
  • Once the machine accounts are created, the necessary permissions should be updated:
    • Get the properties of the CMS computer account.
    • Select the security tab.
    • Select add.
      • Select the object types button – change the scope to just computer accounts.
      • In the search field, type the name of the CNO machine account and press check names.
      • Press OK once the machine account is found.
    • In the group or user names box, find the machine account just added.
      • Assign the FULL CONTROL right to this machine account.
  • Complete the process by disabling both the CNO account and the CMS account.
  • Allow time for AD replication.

If the cluster services have already been configured, you can skip creating and disabling the CNO account, since this account should already exist in Active Directory.

When these steps are completed you should be able to establish the cluster services and begin the Exchange installation.

If you are using Standby Continuous Replication (SCR) and the target is a single node cluster you will follow the same instructions with the exception of:

  • Create two CNO accounts, one for each cluster.
  • Add both CNO accounts with full control to the same CMS account.
  • Disable all accounts created.

Updating permissions for the additional CNO ensures that the standby cluster CNO has the appropriate rights when running setup.com /recoverCMS.

If you are using continuous replication hostnames with cluster continuous replication clusters you will follow the same process outlined above to pre-stage your machine accounts associated with the replication names and add the CNO account with full control.  The only CNO account that requires permissions is that of the cluster hosting the replication host names – SCR target cluster CNOs do not require permissions to these names.

By pre-staging machine accounts and establishing the appropriate security contexts, you can help prevent errors during Exchange setup and cmdlet operations.

Errors with Test-ReplicationHealth when using a multi-subnet Windows 2008 Cluster.

Windows 2008 supports clustered nodes that are installed in different network subnets.  In Exchange 2007 SP1, this configuration is used for cluster continuous replication (CCR) clusters.

When troubleshooting and monitoring clusters, customers will often use the Test-ReplicationHealth cmdlet to determine the status of replication between two clustered nodes in a cluster continuous replication solution.  When the cluster itself is multi-subnet, the following warning is returned during testing:

Server       Check                   Result    Error
------       -----                   ------    -----
2008-NODE6   ClusterNetwork          WARNING   Warnings:
    Network 'Cluster Network 3' used for client connectivity is up but node '2008-Node5' does not have a Network Interface Card configured on it. Check that a NIC is configured for this network and is enabled.
    Network 'Cluster Network 1' used for client connectivity is up but node '2008-Node6' does not have a Network Interface Card configured on it. Check that a NIC is configured for this network and is enabled.
2008-NODE6   QuorumGroup             Passed
2008-NODE6   FileShareQuorum         Passed
2008-NODE6   CmsGroup                Passed
2008-NODE6   NodePaused              Passed
2008-NODE6   DnsRegistrationStatus   Passed
2008-NODE6   ReplayService           Passed
2008-NODE6   DBMountedFailover       Passed
2008-NODE6   SGCopySuspended         Passed
2008-NODE6   SGCopyFailed            Passed
2008-NODE6   SGInitializing          Passed
2008-NODE6   SGCopyQueueLength       Passed
2008-NODE6   SGReplayQueueLength     Passed

 

The issue is a bug in the way that Test-ReplicationHealth handles cluster networks.  In cluster administrator, under networks, you will see a “network” enumerated for each subnet.  Each of these networks shows all of the cluster interfaces that reside on that network.  In a single subnet cluster, these networks would generally have two interfaces, one for each node.  In a multi-subnet cluster, each network has only a single interface, belonging to the one node that resides on that subnet.  The warning is raised for the node that has no interface on a given network, even though this is expected in a multi-subnet cluster.
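To make the difference concrete, here is a small Python sketch. It is my guess at the shape of the check, inferred only from the warning text, and not the actual Test-ReplicationHealth logic; `naive_check` and `subnet_aware_check` are hypothetical names, and the second function is just one possible alternative, not a shipped fix.

```python
def naive_check(networks, nodes):
    """Warn whenever any node lacks an interface on a client network."""
    warnings = []
    for name, members in networks.items():
        for node in nodes:
            if node not in members:
                warnings.append((name, node))
    return warnings


def subnet_aware_check(networks, nodes):
    """Warn only for nodes with no interface on any client network."""
    covered = set().union(*networks.values())
    return [node for node in nodes if node not in covered]


nodes = ["2008-Node5", "2008-Node6"]
# One interface per network, as in the multi-subnet cluster from the warning.
multi_subnet = {
    "Cluster Network 1": {"2008-Node5"},
    "Cluster Network 3": {"2008-Node6"},
}

print(naive_check(multi_subnet, nodes))
print(subnet_aware_check(multi_subnet, nodes))
```

The first function reports one missing interface per network, matching the two warnings in the output, while the second reports nothing because every node is reachable on some client network.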

To view a network and its interfaces, select networks from the left hand pane:

 

image

 

Here is an example of a multi-subnet cluster (showing a single interface per network):

 

image

 

Here is an example of a single subnet cluster (showing multiple interfaces per network):

 

image

 

At this time this issue is not scheduled to be corrected in any Exchange 2007 release.

 

*Updates:

10/19/2009 – Updated to reflect fix release.

Recommendations for enabling a two node Standby Continuous Replication target based on a Single Copy Cluster (Exchange 2007 SP1)

In this blog post I will assume there exists a source cluster that consists of a two node Exchange 2007 SP1 Single Copy Cluster (SCC) hosted on either Windows 2003 or Windows 2008. 

Standby Continuous Replication (SCR) was designed, in this type of deployment, to have a target that is a single node cluster.  Recently I’ve received several requests asking how this could be extended to a two node cluster functioning as the SCR target.

Single copy clusters make having a two node target more complicated because of having to deal with the shared storage.  It is a requirement of SCR that the same drive letters / paths used for databases, logs, and system files on the source also exist on the target.  Also, we must take into consideration the fact that the storage necessary for replication can only be owned by a single node (shared nothing cluster model), and therefore only one node of the target cluster can be subscribed as the SCR target.

If you desire to have a two node SCR target, consider making the following configuration changes to assist in ensuring that the physical disk resources are owned on the correct node.

Windows cluster allows administrators to specify, on the properties of clustered groups, a list of preferred owners.  The preferred owners list on an Exchange cluster is generally cosmetic.  When the preferred owners list is combined with a failback policy, the settings become more than cosmetic.  A preferred owners list allows the administrator to establish the list of nodes, in order, that they prefer the group be hosted on when nodes are available.  When combined with a failback policy, the preferred owners list tells the cluster where and when to move the group automatically when specific nodes are available.  The preferred owners list and failback policy are evaluated any time cluster membership changes, for example, when rebooting a node that is a member of the cluster.  Let’s look at a few examples of this as it applies to our SCR target.

Example #1:

I have a two node SCR target with a group configured to hold my physical disk resources.  I have set a preferred owners list of NodeA then NodeB and a failback policy of immediate.  The group is currently owned on NodeA.  At patch management time I apply the necessary hotfixes to NodeB and reboot the server.  When NodeB has successfully rejoined the cluster, I then apply the patches to NodeA and reboot.  The disk group automatically moves from NodeA to NodeB.  When NodeA successfully rejoins the cluster, the disk group automatically moves back to NodeA.  Replication can now successfully resume since the underlying storage necessary for replication is present on NodeA, and NodeA is subscribed as the SCR target.  In this instance cluster membership changed during the reboot causing the cluster to evaluate the preferred owners list and failback policy and take actions as defined.

Example #2:

I have a two node SCR target with a group configured to hold my physical disk resources.  I have set a preferred owners list of NodeA then NodeB and a failback policy of immediate.  The group is currently owned on NodeA.  NodeA experiences a blue screen condition due to a faulty storage driver.  The disk group automatically moves from NodeA to NodeB.  When NodeA automatically reboots and successfully rejoins the cluster, the disk group automatically moves back to NodeA.  Replication can now successfully resume since the underlying storage necessary for replication is present on NodeA, and NodeA is subscribed as the SCR target.  In this instance cluster membership changed during the reboot causing the cluster to evaluate the preferred owners list and failback policy and take actions as defined.

Example #3:

I have a two node SCR target with a group configured to hold my physical disk resources.  I have set a preferred owners list of NodeA then NodeB and a failback policy of immediate.  At patch management time I apply the necessary hotfixes to NodeB and reboot the server.  When NodeB has successfully rejoined the cluster, I launch failover cluster management and manually move the disk group from NodeA to NodeB.  I then apply the patches to NodeA and reboot the server.  When NodeA successfully rejoins the cluster, the disk group automatically moves back to NodeA.  Replication can now successfully resume since the underlying storage necessary for replication is present on NodeA, and NodeA is subscribed as the SCR target.  In this instance cluster membership changed during the reboot causing the cluster to evaluate the preferred owners list and failback policy and take actions as defined.

Example #4:

I have a two node SCR target with a group configured to hold my physical disk resources.  I have set a preferred owners list of NodeA then NodeB and a failback policy of immediate.  An administrator, using failover cluster management, moves the disk group from NodeA to NodeB.  The group is not moved back.  Replication will enter a failed state for all instances since the storage necessary for replication to function is no longer present on the node subscribed to SCR.  Alerting informs the administrator there is an issue.  It is determined that the disk group is owned on the wrong node, and it is manually moved back to NodeA.  Soon after, replication successfully resumes since the underlying storage necessary for replication is present on NodeA, and NodeA is subscribed as the SCR target.  In this instance cluster membership did NOT change, so the preferred owners list and failback policy were not applied.
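The four examples can be condensed into a toy model. The Python sketch below is a deliberate simplification of what is described above, assuming a single group and immediate failback, and is not how the cluster service is implemented; the class and method names are my own.

```python
from dataclasses import dataclass, field


@dataclass
class DiskGroup:
    preferred_owners: list          # ordered, most preferred node first
    owner: str                      # node currently hosting the group
    failback_immediate: bool = True


@dataclass
class Cluster:
    group: DiskGroup
    up_nodes: set = field(default_factory=set)

    def _membership_changed(self):
        # Evaluated only when a node joins or leaves the cluster.
        candidates = [n for n in self.group.preferred_owners if n in self.up_nodes]
        if not candidates:
            return  # no preferred owner is available
        owner_down = self.group.owner not in self.up_nodes
        if owner_down or self.group.failback_immediate:
            self.group.owner = candidates[0]

    def node_down(self, node):
        self.up_nodes.discard(node)
        self._membership_changed()

    def node_up(self, node):
        self.up_nodes.add(node)
        self._membership_changed()

    def manual_move(self, node):
        # Administrator move: membership is unchanged, so nothing is re-evaluated.
        if node in self.up_nodes:
            self.group.owner = node


cluster = Cluster(DiskGroup(["NodeA", "NodeB"], owner="NodeA"), {"NodeA", "NodeB"})
cluster.node_down("NodeA")    # Example #2: NodeA blue screens, group moves to NodeB
cluster.node_up("NodeA")      # NodeA rejoins, group fails back to NodeA
cluster.manual_move("NodeB")  # Example #4: group stays on NodeB until moved back
print(cluster.group.owner)
```

The point the model makes is the same one Example #4 makes: a manual move does not change cluster membership, so nothing triggers the failback evaluation.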

Establishing the disk group, Preferred Owner, and Failback Policy in Windows 2003

Use the following steps to establish the disk group, preferred owners list, and a failback policy in Windows 2003.

  • Launch cluster administrator and connect to the SCR target cluster.
  • Under Groups in the left hand pane, create a new cluster group.
    • Name the group as appropriate.
    • By default, disks found on a shared bus have physical disk resources created for them in default groups (e.g. Group0, Group1, Group2).  Move the physical disk resources from the default groups into the new group.
    • If mount points are being used, physical disk resources must be created manually.  Please ensure that all mount point disks are created and made dependent on the lettered volume hosting them (refer to http://support.microsoft.com/default.aspx?scid=kb;[LN];280297).
  • Right click on the new group – select properties.
  • On the general tab is the preferred owners list.  Using the modify button, add preferred owners to the list and adjust the order as necessary.  (The server listed first in order will be the node most preferred to own the group).
  • After changing preferred owners (if necessary), select the failback tab.
    • Select the radio button allow failback.
    • Select the radio button immediately.

The configuration of preferred owners and a failback policy can also be performed from the command line.

To set the list of preferred owners and configure failback:

  • cluster.exe <ClusterFQDN> group <GroupName> /setOwners:<FirstNode>,<SecondNode>
  • cluster.exe <ClusterFQDN> group <GroupName> /prop AutoFailbackType=1

Examples of these commands:

  • cluster.exe 2003-Cluster3.exchange.msft group SCRTargetDisks /setOwners:2003-Node1,2003-Node2
  • cluster.exe 2003-Cluster3.exchange.msft group SCRTargetDisks /prop AutoFailbackType=1

Establishing the disk group, Preferred Owner, and Failback Policy in Windows 2008

  • Launch failover cluster management and connect to the SCR target cluster.
  • In the left hand pane, under the cluster name, right click on Services and Applications, then select More Actions -> Create Empty Service or Application.
  • Under services and applications you will now see a group named "New service or application".
  • Right click on "New service or application", select rename, and assign an appropriate name.
  • Right click on the group, select add storage.  From here, choose the storage that should be added to this group.
    • Note:  By default Windows 2008 adds both lettered volumes and mounted volumes (mount points) to the available storage group at cluster creation.  Mounted volumes must manually be made dependent on their lettered physical disk.  Please update dependencies if necessary.
    • If the desired storage does not appear in the storage picker, ensure that it has been added to the available storage group.  Only storage that has first been added to the available storage group is allowed to be added to services and applications.
  • Once storage has been added, right click on the group and select properties.
  • On the general tab, select the checkbox next to each node of the cluster.  Use the up / down buttons to establish the preferred order.  Machines appearing first in the list will have preference over other nodes.
  • Select the Failover tab.
    • Under the failback portion, select the radio button next to "Allow Failback".
    • Under "Allow Failback" select the radio button next to "Immediately".
  • Apply the changes to complete the configuration.

The configuration of preferred owners and a failback policy can also be performed from the command line.

To set the list of preferred owners and configure failback:

  • cluster.exe <ClusterFQDN> group <GroupName> /setOwners:<FirstNode>,<SecondNode>
  • cluster.exe <ClusterFQDN> group <GroupName> /prop AutoFailbackType=1

Examples of these commands:

  • cluster.exe 2008-Cluster3.exchange.msft group SCRTargetDisks /setOwners:2008-Node1,2008-Node2
  • cluster.exe 2008-Cluster3.exchange.msft group SCRTargetDisks /prop AutoFailbackType=1
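If you maintain several SCR target clusters, it can help to generate these two commands rather than type them. The Python sketch below only builds the argument lists shown above; the function name is hypothetical, and it does not run cluster.exe or verify that the cluster, group or nodes exist.

```python
def failback_commands(cluster_fqdn, group, nodes):
    """Build the two cluster.exe command lines for a preferred owners list
    (most preferred node first) and an immediate failback policy."""
    owners = ",".join(nodes)
    return [
        ["cluster.exe", cluster_fqdn, "group", group, f"/setOwners:{owners}"],
        ["cluster.exe", cluster_fqdn, "group", group, "/prop", "AutoFailbackType=1"],
    ]


commands = failback_commands(
    "2008-Cluster3.exchange.msft", "SCRTargetDisks", ["2008-Node1", "2008-Node2"]
)
for command in commands:
    print(" ".join(command))
```

The printed lines are the same as the example commands above; review them before running them on a cluster node.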

Consider reviewing the following references for more information.

http://support.microsoft.com/kb/197047

http://support.microsoft.com/kb/299631

http://support.microsoft.com/kb/823955