Category Archives: Uncategorized

KB978790 – Update to Windows 2008 to change the failure behavior of the File Share Witness Quorum resource.

In a previous blog post I outlined a failure behavior of the file share witness resource on Windows 2008 clusters.

http://blogs.technet.com/timmcmic/archive/2009/01/22/exchange-2007-sp1-ccr-windows-2008-clusters-file-share-witness-fsw-failures.aspx

Ultimately there could exist a condition where the cluster enters a lost quorum state due to the File Share Witness resource being in a FAILED status even though the witness directory was fully available.

In Windows 2008 R2 a design change was made.  When the File Share Witness host server becomes unavailable, the File Share Witness resource will still fail in cluster and cause the Cluster Core Resources to move between nodes.  In this case assuming the File Share Witness host server is still not available, the resource remains in a failed state.  If it becomes necessary to utilize the File Share Witness to maintain Quorum, and the witness resource is in a failed state, cluster will attempt to online the witness resource.  If the online is successful the witness share is alive and accessible – quorum is maintained.  If the online is not successful, the witness share is not alive and accessible – a lost quorum condition is encountered.

A hotfix has been released (KB978790) that back ports this change from Windows 2008 R2 to Windows 2008.

You can download this hotfix by navigating to http://support.microsoft.com and requesting the knowledge base article 978790.  There is a link to request and download this fix in the upper left hand corner.  (Note:  When downloading the fix the package is marked for the product Windows Vista, but this is actually for Windows 2008).

This hotfix is applicable to Exchange 2007 and Exchange 2010 when utilized with Windows 2008.

Do I need to move my cluster core resources?

Recently one of our Exchange MVPs posted a question to our support engineers…Is it necessary to move the cluster group / cluster core resources in Exchange 2007 / Exchange 2010?

As a reminder…the cluster core resources are generally the cluster name and cluster IP address.  This is the management name specified when the cluster is created.

In Windows 2003 this would be known as the “Cluster Group” and would contain the cluster name, cluster IP address, and majority node set or quorum disk resource.

In Windows 2008 and Windows 2008 R2 this would be known as the “Cluster Core Resources” (the actual group in cluster though is still called the “Cluster Group”) and generally contains the cluster name, cluster IP address, and file share witness or quorum disk resource. 

In Windows 2003 the cluster group is visible and manageable in cluster administrator.  In Windows 2008 you can view the cluster core resources, but management requires using the cluster.exe command line (or powershell in Windows 2008 R2).

So…enough background and back to the question…

From the platforms perspective it should not be necessary to maintain either of these group or move the cluster core resources when rebooting nodes etc.  When the resources are owned on a node where the cluster service is stopping, the resources will automatically be arbitrated to another node in the cluster. 

I personally though do not prefer to take this approach.  I prefer to know that the resources have been moved off the node that I’m managing and have been successfully arbitrated to another node in the cluster.

Therefore, I recommend the following…

Prior to managing a node verify whether or not the node owns the cluster core resources.

Run the following command – cluster.exe <FQDNCluster> group

C:>cluster dag.exchange.msft group
Listing status for all available resource groups:

Group                Node            Status
——————– ————— ——
Cluster Group        DAG-3           Online
Available Storage    DAG-1           Offline

As you can see from the output the “Cluster Group” currently resides on DAG-3 and is online.

If DAG-3 is the node to be rebooted, the cluster group can be moved to another node.

Run the following command – cluster.exe <FQDN> group “Cluster Group” /moveto:<NODE>

Here is an example of where the resources did not arbitrate successfully.  You can see that the group successfully moved to the specified node, but the status is partially online.

C:>cluster.exe dag.exchange.msft group "Cluster Group" /moveto:DAG-1

Moving resource group ‘Cluster Group’…

Group                Node            Status
——————– ————— ——
Cluster Group        DAG-1           Partially Online

Any other status but online should be investigated and corrected prior to rebooting or managing the cluster services on the nodes.

Here is an example of a successful move.

C:>cluster.exe dag.Exchange.msft group "Cluster Group" /moveto:DAG-1

Moving resource group ‘Cluster Group’…

Group                Node            Status
——————– ————— ——
Cluster Group        DAG-1           Online

If the group successfully moved, the node information should display the name of the node where the group was moved and the status should show online.

Exchange 2010 – Pre-staging the Cluster Name Object (CNO) to support a Database Availability Group (DAG)

In Exchange 2010 when using the Database Availability Group (DAG) we leverage the cluster services in Windows 2008 and Windows 2008 R2. 

When utilizing the cluster services in Windows 2008 and Windows 2008 R2 the cluster core resources – cluster name is a Kerberos enabled name.  This requires that a machine account be created within the directory for association with this cluster name resource.  This is known as the CNO or cluster name object. 

With previous versions of Exchange this Kerberos enabled machine account was created in the domain leveraging the credentials / rights assigned to the user logged into the machine using the failover cluster management tools to establish the clustered services.  This would require either the user’s account have permissions to create and enable machine accounts within the domain or the machine account to be pre-staged with the appropriate rights assigned.

In Exchange 2010 the establishment of the cluster and the cluster name object is performed by using either the Exchange Management Console or Exchange Management Shell.  Specifically the cluster creation occurs when running the command add-databaseavailabilitygroupserver when adding the first server to the database availability group.  Since we are leveraging the Exchange management tools to establish the cluster service, this also means that we are utilizing RBAC.  In this case when the first server is added to the DAG, the remote powershell contacts the replication service of the first node to be added.  In turn, the replication service of that local machine installs the failover cluster management service and begins cluster creation utilizing the credentials assigned to LOCAL SYSTEM (since the replication service starts on the machine in the LOCAL SYSTEM security context).  Therefore, the individual user rights to create machine accounts in the domain are not leveraged when establishing the clustered services for the database availability group, but rather that of the replication service.

In environments where computer account creation is restricted, it may become necessary to pre-stage the CNO for the clustered services and assign the appropriate rights.  There are two methods which work to establish this security context:

1)  Assign the machine account of the first node added to the DAG with full control of the pre-staged object.

2)  Assign the Exchange Trusted Subsystem universal security group with full control of the pre-staged object.

The first method works by ensuring that the LOCAL SYSTEM security context will be able to manage the pre-staged computer account fully.  The second method works because the Exchange Trusted Subsystem group contains the machine accounts of all Exchange servers in the domain.  If utilizing method one you need to ensure that the first member added to the DAG is the machine granted permissions.  If the second method is utilized any DAG member can be added as all members have rights to manage the account.

Here is an example of how to pre-stage the machine account utilizing the local system rights of the first DAG member.

1)  Select the appropriate container in Active Directory where you wish the account to be created.  Right click, select NEW –> COMPUTER.

 

image

 

2)  In the Computer Name field, type the name that was assigned to the database availability group.

 

image

 

3)  After the account is created, right click on the account and select properties.  Select the security tab.  (Note:  You may have to enable advanced features in Active Directory Users and Computers).

 

image

 

4)  Select the ADD button to present the add dialog.

image

 

5)  Select the Object Types button.  De-select all options and select the COMPUTERS option.  Press OK.

 

image

 

6)   In the Enter the object name to select field, type the name of the first DAG member.  Select Check Name to ensure the name resolves to a valid account in the directory.  Select OK to add the account.

 

image

 

7)   Select the machine account in the Group or User Names field, assign the machine account full control.  Press OK.

 

image

 

8)  Locate the account in the container where it was created.  Right click on the account and select disable account.

 

image

 

9)  When presented with the disable dialog, select YES.

 

image

 

10)  After the object has been successfully disabled,  press the OK button on the success confirmation.

image

 

11)  At this point allow time for active directory replication to occur.  Once replication has completed, the first database availability group member can be added and the cluster services established.

 

======================================

Update 3/14/2010

======================================

It was recently asked how can I programmatically achieve establishing these permissions?  Thanks to Jeff Kizner and Robert Gillies for providing these instructions.

cd ad:

$comp = Get-ADComputer DAG01

$sid = (Get-ADGroup "Exchange Trusted Subsystem").sid

$rights = [System.DirectoryServices.ActiveDirectoryRights]::GenericAll

$perm = [System.Security.AccessControl.AccessControlType]::Allow

$acl = get-acl $comp

$ace = new-object System.DirectoryServices.ActiveDirectoryAccessRule $sid, $rights, $perm

$acl.AddAccessRule($ace)

set-acl -AclObject $acl $comp

======================================

Windows 2003 / Exchange 2007 SP1 – Alternate Cluster Setup

The official TechNet documentation for establishing a Windows 2003 / Exchange 2007 SP1 cluster can be found at: 

What I want to do in this blog post is outline an alternate method for installing the cluster different from the instructions referenced above for Windows 2003 clustered nodes.

The first recommendation that I make when establishing the cluster service is to not use the GUI mode setup included with Exchange 2007.  My main rational for this advice is that when GUI mode setup fails you generally have to revert to command line setup to correct conditions and repeat the setup operations.  With this in mind it’s much easier to start with the command line setup and continue through.

In each of the links above you will see that the basic structure of a command line setup for the active roles installation is:

setup.com /mode:install /roles:mailbox /newCMS /cmsName /cmsIPAddress [/css] [/cmsDataPath]

The passive role installation is:

setup.com /mode:install /roles:mailbox

The way to think about the active role command is the combination of the passive role (setup.com /mode:install /roles:mailbox) with the establishment of the clustered services (setup.com /newCMS /cmsName /cmsIPAddress [/css] [/cmsDataPath]).

These commands do not have to be run together.  With this in mind my second recommendation is to establish the passive role first on each clustered node.  I recommend this for a few reasons:

  • By separating the commands we can identify failures and address them easier.
  • This method uses a foundational approach whereby the nodes are established and the cluster services verified, the mailbox roles established and verified, and only after that the clustered instance established and verified.
  • When the commands are combined, and the failure occurs in the /newCMS portion of the command, further setup attempts using the same command error indicating the mailbox roles is currently installed and must be uninstalled.  Although the error is true, uninstalling the mailbox role and rerunning the same command without correcting the conditions that failed the cluster portion of setup will generally result is hitting the same error again (and thus the loop continues).
  • In a shared storage environment the restart of the clustered services during mailbox role installation causes storage to move between nodes.  Setup attempts to move this storage back, but is sometimes not successful in doing so.  This causes the /newCMS portion of the setup command to fail, and thus the active role command to fail.  Separating the commands allows us to establish storage and arbitrate it to the “active” node for setup operations after the mailbox role has been installed.
  • A setup code error exists where when an existing CMS exists in a cluster, and the passive node installation is run, the passive nodes machine account is promoted to full control permissions of the CMS.  This results in errors in the management console and management shell that an account was found with full administrator rights that is not an Exchange administrator.  (Note:  The error is truly benign, I just point it out because it is annoying.)

Now I will outline for you the steps that I use when establishing a Windows 2003 / Exchange 2007 SP1 Cluster.  Please refer to these additional blog posts for some other helpful information regarding these installations:

In all instructions below the assumption is that the cluster service and appropriate quorum type have been established and fully configured.  The instructions also outline only the establishment of the Exchange resources, the TechNet documentation referenced above should be used to establish the specific cluster service requirements and other installation / management tasks for an Exchange 2007 SP1 clustered installation on Windows 2003.

 

Windows 2003 / Exchange 2007 SP1 Cluster Continuous Replication (CCR)

  • On each node, run setup.com /mode:install /roles:mailbox.
    • This will establish the “passive” installation.
    • A reboot will be required post installation.
  • Install the appropriate Exchange 2007 SP1 rollup update for your organization.
  • On the “active” node, run setup.com /newCMS /cmsName:<NAME> /cmsIPAddress:<IPAddress>
    • This will establish the “active” installation.

At this point the replication service will begin copying log files from the new databases created and bring both sides into synchronization.  This completes the setup of a Windows 2003 / Exchange 2007 SP1 CCR cluster.

 

Windows 2003 / Exchange 2007 SP1 Single Copy Cluster (SCC)

  • After establishing the clustered services any shared storage must be prepared for the Exchange installation.  Consider the following regarding shared storage:
    • Any lettered physical volume found during cluster creation will have a physical disk resource created for it and each of these resources placed into temporary groups.
    • When using cluster administrator, these groups will look like Group0 / Group1 / Group2 / etc each holding a single physical disk resource.
    • Any mountpoints established on the system are not automatically assigned physical disk resources in the cluster.
  • To prepare the shared storage:
    • Create a new group within cluster administrator, name the group with a temporary name.  (Do not use the Exchange CMS name for this group).
    • Move all physical disk resources from their temporary group (GroupX) to the new group created.
    • If mountpoints are in use, create new physical disk resources in the temporary group you created for each mountpoint that exists on the node.
    • Mountpoint physical disk resources should be configured with a dependency of the lettered physical disk resources hosting the mountpoint.
    • Delete the temporary groups (GroupX) that should now be empty.
  • On each node, run setup.com /mode:install /roles:mailbox
    • This will establish the “passive” installation.
    • A reboot will be required post installation.
  • Install the appropriate Exchange 2007 SP1 rollup update for your organization.
  • Using cluster administrator, move the group holding the physical disk resources to the “active” node.
  • On the “active” node, run setup.com /newCMS /cmsName:<NAME> /cmsIPAddress:<IPAddress> /css /cmsDataPath:<Path to folder on shared disk>
    • This will establish the “active” installation storing the initial database and log files on the shared disk path specified in the command.
  • Address the remaining shared storage requirements:
    • Physical disk resources must be moved from the temporary group you created to the Exchange CMS group created during setup.
    • As outlined in TechNet documentation, the dependencies for the database instances must be configured so that each database is dependant on the physical disks where the files reside.

 

By using these steps you should hopefully be able to simply your Exchange 2007 SP1 clustered installation on Windows 2003 and more easily address errors that may arise during the clustered setup.

Windows 2008 – Exchange 2007 SP1 – Alternate Cluster Setup

The official TechNet documentation for establishing a Windows 2008 / Exchange 2007 SP1 cluster can be found at: 

What I want to do in this blog post is outline an alternate method for installing the cluster different from the instructions referenced above for Windows 2003 clustered nodes.

The first recommendation that I make when establishing the cluster service is to not use the GUI mode setup included with Exchange 2007.  My main rational for this advice is that when GUI mode setup fails you generally have to revert to command line setup to correct conditions and repeat the setup operations.  With this in mind it’s much easier to start with the command line setup and continue through.

In each of the links above you will see that the basic structure of a command line setup for the active roles installation is:

setup.com /mode:install /roles:mailbox /newCMS /cmsName /cmsIPAddress [/css] [/cmsDataPath]

The passive role installation is:

setup.com /mode:install /roles:mailbox

The way to think about the active role command is the combination of the passive role (setup.com /mode:install /roles:mailbox) with the establishment of the clustered services (setup.com /newCMS /cmsName /cmsIPAddress [/css] [/cmsDataPath]).

These commands do not have to be run together.  With this in mind my second recommendation is to establish the passive role first on each clustered node.  I recommend this for a few reasons:

  • By separating the commands we can identify failures and address them easier.
  • This method uses a foundational approach whereby the nodes are established and the cluster services verified, the mailbox roles established and verified, and only after that the clustered instance established and verified.
  • When the commands are combined, and the failure occurs in the /newCMS portion of the command, further setup attempts using the same command error indicating the mailbox roles is currently installed and must be uninstalled.  Although the error is true, uninstalling the mailbox role and rerunning the same command without correcting the conditions that failed the cluster portion of setup will generally result is hitting the same error again (and thus the loop continues).
  • In a shared storage environment the restart of the clustered services during mailbox role installation causes storage to move between nodes.  Setup attempts to move this storage back, but is sometimes not successful in doing so.  This causes the /newCMS portion of the setup command to fail, and thus the active role command to fail.  Separating the commands allows us to establish storage and arbitrate it to the “active” node for setup operations after the mailbox role has been installed.
  • A setup code error exists where when an existing CMS exists in a cluster, and the passive node installation is run, the passive nodes machine account is promoted to full control permissions of the CMS.  This results in errors in the management console and management shell that an account was found with full administrator rights that is not an Exchange administrator.  (Note:  The error is truly benign, I just point it out because it is annoying.)

Now I will outline for you the steps that I use when establishing a Windows 2008 / Exchange 2007 SP1 Cluster.  Please refer to these additional blog posts for some other helpful information regarding these installations:

In all instructions below the assumption is that the cluster service and appropriate quorum type have been established and fully configured.  The instructions also outline only the establishment of the Exchange resources, the TechNet documentation referenced above should be used to establish the specific cluster service requirements and other installation / management tasks for an Exchange 2007 SP1 clustered installation on Windows 2008.

 

Windows 2008 / Exchange 2007 SP1 Cluster Continuous Replication (CCR)

  • On each node, run setup.com /mode:install /roles:mailbox.
    • This will establish the “passive” installation.
    • A reboot will be required post installation.
    • The installation command when run on a machine where the clustered service is configured and running simply installs the Exchange binaries to the local machine.
  • Install the appropriate Exchange 2007 SP1 rollup update for your organization.
  • On the “active” node, run setup.com /newCMS /cmsName:<NAME> /cmsIPv4Addresses:<IPAddress>,<IPAddress>
    • This will establish the “active” installation.

At this point the replication service will begin copying log files from the new databases created and bring both sides into synchronization.  This completes the setup of a Windows 2008 / Exchange 2007 SP1 CCR cluster.

 

Windows 2008 / Exchange 2007 SP1 Single Copy Cluster (SCC)

  • After establishing the clustered services any shared storage must be prepared for the Exchange installation.  Consider the following regarding shared storage:
    • In Windows 2008 shared storage found at the time of cluster configuration is now placed into the “Available Storage” group.
    • The “Available Storage” group is actually a hidden group inside of cluster.  (You can see this group by running cluster <clusterFQDN> group from a command line.)
    • Each lettered physical drive and mountpoint found during cluster configuration will be represented in the available storage group by a physical disk resource.
    • Mountpoint physical disk resources are not automatically made dependant on the lettered physical disk resource hosting them.
  • On each node, run setup.com /mode:install /roles:mailbox
    • This will establish the “passive” installation.
    • A reboot will be required post installation.
  • Install the appropriate Exchange 2007 SP1 rollup update for your organization.
  • Using the command line, move the available storage group to the node where you will run the “active” installation:
    • Open a command prompt.
    • Run cluster.exe <clusterFQDN> group “Available Storage” /moveto:<NODE>
    • For example, cluster.exe cluster-1.exchange.msft group “Available Storage” /moveto:node1.exchange.msft
  • On the “active” node, run:
    • setup.com /newCMS /cmsName:<NAME> /cmsIPv4Addresses:<IPAddress>,<IPAddress> /css /cmsDataPath:<Path to folder on shared disk>
    • This will establish the “active” installation storing the initial database and log files on the shared disk path specified in the command.
  • Address the remaining shared storage requirements:
    • Physical disk resources must be moved from the available storage group to the Exchange CMS group created during setup.
    • This can be accomplished by right clicking on the CMS group in cluster administrator and selecting add storage.
    • After the storage is moved, all mountpoint resources should be configured to have a dependency on the lettered physical disk hosting the mountpoint.
    • As outlined in TechNet documentation, the dependencies for the database instances must be configured so that each database is dependant on the physical disks where the files reside.

 

By using these steps you should hopefully be able to simply your Exchange 2007 SP1 clustered installation on Windows 2008 and more easily address errors that may arise during the clustered setup.

An interesting issue with file level antivirus…

In order for Exchange to function correctly administrators are generally advised to make two important configuration changes to their file level antivirus scanners:

  • Exclude directories that contain Exchange binaries and Exchange data.
  • Add process exclusions for the Exchange processes (for example – store.exe etc).

When antivirus exclusions are not properly set servers may experience performance issues or database availability issues.

Today I worked an interesting issue with a customer.  A common issue when antivirus settings are not set correctly is that the ENNtmp.log file in the Exchange database log directory is deleted or quarantined during the rename procedure.  Remember that while a database is mounted any current transactions are being written to the ENN.log file.  When the ENN.log file is full, the lock on it is released and the file is renamed to it’s full name (for example ENN00001af.log).  While writing to the ENN.log, the next log file in the series is being built as the ENNtmp.log.  When the ENN.log is full, and that rename is occurring another rename operation is occurring – renaming the ENNtmp.log to ENN.log.  As locks are released on these files to facilitate renaming etc – file level antivirus when mis-configured can interject itself and operate against these files.  In this case the file was deleted as a virus and the database forcefully dismounted.

In our case events were clearly logged by the antivirus scanner indicating an action was performed on the log file in question.  This naturally lead us to reviewing the file level antivirus exclusions.

The structure of the server had all log directories stored on L: and all databases stored on M:.  When the file system exclusions were reviewed, you could clearly see that exclusions for L and M existed and were set to include sub-directories.

Upon further investigation it was determined that the log file directories though were actually mountpoints.  For example, L:SG1 where SG1 is a mounted volume as a mountpoint. 

From here it was determined that the antivirus scanner was actually looking at things at the volume / disk level.  Therefore, L: exclusions did not apply since SG1 was a mounted physical disk, and not a folder on the file system.

Given the above example:

A folder L:Folder would be excluded since it’s a folder existing on the L file system.

A mountpoint L:MountPoint would not be excluded since L is a physical disk and MountPoint is a physical disk – and no exclusion existed specifically for the mounted physical disk.

A folder L:FolderMountpoint exists.  In this case the L:Folder would be excluded, but the L:FolderMountpoint would not be excluded since the Mountpoint is a mounted physical disk.

The moral of the story…please make sure you are following up with your antivirus vendor on the appropriate method to apply exclusions when files exist on both lettered physical disks and mounted physical disks as mountpoints.

=========================

Update – 2/16/2015

A customer shared this with me today as a vendor example.

https://kc.mcafee.com/corporate/index?page=content&id=KB70886

 

=========================

Considerations when using mountpoints with Exchange 2007 clusters.

It is not uncommon in many installations today, especially clustered installations, to see the use of mountpoints.

When using mountpoints with clustering it is a requirement that the physical disk resources created for the mountpoints be dependant on the lettered physical volume that hosts them. 

The use of mountpoints also requires that the lettered physical volume hosting them be available at all times.  Exchange uses the full path – for example X:MountPointdata…

I wanted to highlight a design consideration that I encourage you to consider when utilizing mountpoints.

I see two common designs where mountpoints are utilized.

  • A single lettered volume and all mountpoints created off that volume.

image

  • Two lettered volumes – database mountpoints created off one lettered volume with log mountpoints created off the second lettered volume.

image

These two designs do give logical ( and possibly physical depending on storage design )separation to the storage for databases and the storage for log files.  It also limits the amount of lettered physical volumes that are required to support the desired solution. 

The issue it introduces though is a single point of failure.

In the first example in order for the mountpoints to function you have to have the X drive always available.  If the X drive is lost, all mountpoints associated with the X drive are lost, and the database instances dependant on those mountpoints are lost.

In the second example you have the same issue as in the first example where a single point of failure exists.  A loss of either the X or Y disk causes mounpoints associated on those disks to be unavailable.  Since an Exchange database instance requires dependencies on both the mountpoint hosting the database, and the mountpoint hosting the logs, the database instance becomes unavailable.

You can increase the availability of the overall solution by spreading out the mountpoints over a series of lettered volumes.  For example, let’s take a solution there there will be 25 storage groups.  You could in this instance create 5 lettered volumes.  Off of each of these 5 lettered volumes you would create 10 mountpoints.  Each of the mountpoint pairs would serve as the database and log drives for a single database instance.  That would mean that the loss of a single lettered volume would only affect users hosted in 5 of the databases in the solution, verses all 25 databases being affected when using the design outlined in either of the previous examples. 

Here is a small picture example of this concept.

image

Another consideration is that when a root volume hosting mount points is lost and subsequently replaced, the administrator must manually recreate the necessary folder structure and map the mount points back.  By spreading this out, you can hopefully eliminate the amount of work that would be necessary to restore the solution.

By reducing the single points of failure you can increase the overall availability of the solution.

‘ClusteredMailboxServer’ is partially configured – ExBPA.

When running ExBPA against an Exchange 2007 cluster, the following error text may be noted:

===========================================================

Error: ‘ClusteredMailboxServer’ is partially configured

Server: <SERVER>

‘ClusteredMailboxServer’ is partially configured on server <SERVER>.contoso.com. Either setup is configuring it right now or setup failed while configuring it. Please rerun setup if it isn’t currently running.

===========================================================

The link text from ExBPA advises users on how to correct this issue:

===========================================================

Possible CMS Setup Failure  Send Feedback

Microsoft Exchange Server Analyzer – Articles > Exchange Cluster

Topic Last Modified: 2008-04-01

The Microsoft® Exchange Server Analyzer Tool reads the following registry branch for the presence of the Action or Watermark registry values:

HKEY_LOCAL_MACHINESoftwareMicrosoftExchangev8.0ClusteredMailboxServer

If the Exchange Server Analyzer finds that either the Action or Watermark registry values are present, an error is displayed.

The presence of the Action or Watermark registry values on this registry branch indicates that either there was a previously unsuccessful installation of the Cluster Mailbox Server (CMS) to the cluster node, or that the CMS installation was in progress when the Exchange Server Analyzer was doing its analysis.

If the CMS installation failed on this node but was successful on other nodes of the cluster, it could appear that the cluster network is functioning normally when that is not the case.

To address this error, determine whether CMS installation is still in progress on the affected node and, if not, rerun setup to complete the failed CMS installation.

===========================================================

One of the ExBPA rules checks the registry to find out the install state of roles on the server.  Exchange 2007 uses watermark keys to track setup progress.  If a failure is encountered during setup, when setup is rerun the watermark is consulted and setup resumed at or near the point of failure.  When setup completes successfully the watermark value is cleared.

The following is a sample export of a clustered server mailbox role watermark.

===========================================================

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINESOFTWAREMicrosoftExchangev8.0ClusteredMailboxServer]
"Action"="Install"
"Watermark"=dword:7448a073

===========================================================

It is not uncommon to see this rule triggered against clustered nodes – especially if there at one point was a setup failure.  Sometimes during installation when a failure is encountered and the clustered mailbox server is partially configured the resources created will be moved between nodes.  Setup is then run on the other node in hopes that it will complete successfully.  In some cases it will complete successfully and in other cases it will fail. 

The watermark is not checkpointed / replicated between the registry of the nodes.  In the event that setup fails each node will have a watermark indicating the point in the installation where the failure occurred.  Should one of the nodes complete setup successfully, one node will have it’s watermark cleared while the other retains the watermark. 

The presence of this key triggers the ExBPA rule.

In order to tell what actions need to be taken we must first determine if installation completed successfully.  For this we can turn to the Exchange setup logs that are created on each node.  We will review the logs and see if it can be determined if setup completed successfully.

The setup logs are located at c:ExchangeSetupLogs.

We are looking for the file ExchangeSetup.log.  This file can be opened in a text viewer.

If the following text can be found in the setup log it should be safe to assume that setup completed successfully for that CMS on at least one of the nodes.

The start of the clustered mailbox server installation is indicated in the logs by the following series of text (or similar depending on single copy cluster or cluster continuous replication installation):

===========================================================

<DATE> [0] Setup will run the task ‘new-ClusteredMailboxServer’
<DATE> [1] Setup launched task ‘new-ClusteredMailboxServer -DomainController DC’ -PublicFolderDatabase $false -updatesdir ‘Y:x64Updates’ -Name <CMSNAME> -IPAddress <IPAddress> -IPv4Addresses $null -IPv4Networks $null -IPv6Networks $null -SharedStorage $false -DataPath <DATAPATH>

===========================================================

If you continue to scroll through the log and reach the following text the setup command completed successfully.

===========================================================

<DATE> [0] The Microsoft Exchange Server setup operation completed successfully.
<DATE> [0] End of Setup

===========================================================

Once you have verified that the command was run and completed successfully, the watermark can be exported and safely removed using registry editor.

If for any reason you cannot verify that setup finished for this CMS instance on at least one of the nodes, the only safe procedure for removing the watermark is to allow setup to complete successfully.  (This needs to occur even if the resources are already online and even if they are servicing users).

If you have the setup log this can be an easy task.  The parameters that were used for setup are also logged in this log.  The following line will tell you what parameters were used when establishing the clustered mailbox server installation. 

<DATE> [0] ExSetup was started with the following command: ‘/NoLogo /newCMS /cmsName:2003-MBX3 /cmsIPAddress:192.168.0.14 /sourcedir:Y:x64 /FromSetup’.

In this case you can see that the CMSName parameter used is 2003-MBX3, the CMSIpaddress is 192.168.0.4.  No additional switches (like CSS and CMSDataPath) indicate this was a CCR installation.

If you do not have the setup log you will most likely need to use cluster administrator to get the necessary information.  Using cluster administrator locate the CMS Network Name resource and CMS IPv4 Resource.  Make note of the values of each of these resources.

Once you have the values necessary to complete the command the CMS must be moved to the node where the watermark exists.  If the CMS is not currently on that node, issue a move-clusteredmailboxserver using the Exchange Management Shell. 

Once the CMS has been moved to the node with the watermark, run setup using the command line setup with  appropriate values.  Using my example above:

setup.com /newCMS /cmsName:2003-MBX3 /cmsIPAddress:192.168.0.4

Setup will consult the watermark on the machine and when it completes successfully clear the watermark value.  This will ensure that the CMS is now fully configured to support Exchange.

Log files required to be copied after an update-storagegroupcopy / re-seed has been successfully completed.

There may arise times where it is necessary to completely reseed an LCR, CCR, or SCR database copy from the source database.  In order to reseed the copy we use the update-storagegroupcopy commandlet.

When the update-storagegroupcopy is run, the database is pulled using the ESE backup online streaming API from the source machine to the target machine.  If the database is successfully copied without error the replication instance is automatically resumed.  No log files are pulled or copied as a part of the update-storagegroupcopy process.  It is not until after the update-storagegroupcopy process is completed, and replication is resumed, that the header of the database is reviewed and the replication service determines which logs are necessary to be copied.

In this blog post I want to highlight how the replication service makes the decision on which log files need to be copied post a re-seed of the database.  I will use examples from cluster.replay tracing (which can only be done with consultation with product support services).

*Databases copied offline between servers (clean shutdown).

When databases are copied offline between nodes this is a manual seeding operation.  By default a database that is offline and copied between nodes is in a clean shutdown state. 

Here is a sample header dump.

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server

Version 08.01

Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating FILE DUMP mode…
         Database: 2003-MBX3-SG1-DB1.edb

        File Type: Database
   Format ulMagic: 0x89abcdef
   Engine ulMagic: 0x89abcdef
Format ulVersion: 0x620,12
Engine ulVersion: 0x620,12
Created ulVersion: 0x620,12
     DB Signature: Create time:08/09/2009 14:10:12 Rand:7948610 Computer:
         cbDbPage: 8192
           dbtime: 20053 (0x4e55)
            State: Clean Shutdown
     Log Required: 0-0 (0x0-0x0)
    Log Committed: 0-0 (0x0-0x0)
   Streaming File: No
         Shadowed: Yes
       Last Objid: 133
     Scrub Dbtime: 0 (0x0)
       Scrub Date: 00/00/1900 00:00:00
     Repair Count: 0
      Repair Date: 00/00/1900 00:00:00
Old Repair Count: 0
  Last Consistent: (0x9,A,1C1)  08/09/2009 14:12:18
      Last Attach: (0x8,9,86)  08/09/2009 14:12:15
      Last Detach: (0x9,A,1C1)  08/09/2009 14:12:18
             Dbid: 1
    Log Signature: Create time:08/09/2009 14:10:08 Rand:7930576 Computer:
       OS Version: (5.2.3790 SP 2)

Previous Full Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Previous Incremental Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Previous Copy Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Previous Differential Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Current Full Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Current Shadow copy backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

     cpgUpgrade55Format: 0
    cpgUpgradeFreePages: 0
cpgUpgradeSpaceMapPages: 0

       ECC Fix Success Count: none
   Old ECC Fix Success Count: none
         ECC Fix Error Count: none
     Old ECC Fix Error Count: none
    Bad Checksum Error Count: none
Old bad Checksum Error Count: none

Operation completed successfully in 0.63 seconds.

 

When replication is resumed, the header of the database is consulted.  Here is an example trace tag from non-customer viewable tracing.

2826 74006100440074 2256 Cluster.Replay FileChecker RunChecks is successful. FileState is: LowestGenerationPresent: 0
HighestGenerationPresent: 0
LowestGenerationRequired: 0
HighestGenerationRequired: 0
LastGenerationBackedUp: 0
CheckpointGeneration: 0
LogfileSignature:
LatestFullBackupTime:
LatestIncrementalBackupTime:
LatestDifferentialBackupTime:
LatestCopyBackupTime:
SnapshotBackup:
SnapshotLatestFullBackup:
SnapshotLatestIncrementalBackup:
SnapshotLatestDifferentialBackup:
SnapshotLatestCopyBackup:
ConsistentDatabase: True
2827 74006100440074 2256 Cluster.Replay ReplicaInstance SetReplayState(): LowestGenerationPresent: 0
HighestGenerationPresent: 0
LowestGenerationRequired: 0
HighestGenerationRequired: 0
LastGenerationBackedUp: 0
CheckpointGeneration: 0
LogfileSignature:
LatestFullBackupTime:
LatestIncrementalBackupTime:
LatestDifferentialBackupTime:
LatestCopyBackupTime:
SnapshotBackup:
SnapshotLatestFullBackup:
SnapshotLatestIncrementalBackup:
SnapshotLatestDifferentialBackup:
SnapshotLatestCopyBackup:
ConsistentDatabase: True

You can see here that the replication service, after reading the status of the database, has detected that a clean shutdown database was found.  (ConsisntentDatabase: True).

Since no backup has been performed on the database, and the log directory on the target was empty prior to resuming the storage group copy, the replication service determines that no minimum log file was necessary.  Log copy will start from the first log available on the source server and continue to the highest generation on the source server.  As long as the generation is contiguous replication will proceed and remain healthy post the manual database seed.  (It would be best practice with an offline reseed to clear the log directory on the target prior to resuming the database copy).

2871 030F3F44 2256 Cluster.Replay ReplicaInstance No logfiles present, no backup information, no required generation
2872 030F3F44 2256 Cluster.Replay ReplicaInstance Log copying will start from generation 0

The replication service begins the replication instance and queries the source directory, in this case it was determined that log file 5 was the first available, and thus the replication service starts by copying this log file.

2903 0385717B 2256 Cluster.Replay NetPath ClusterPathManager.GetPath() returns \2003-mbx3-replc3d0099f3-ff35-46ea-8a2f-39eb50923209$
2904 007D2DBB 2256 Cluster.Replay LogCopy First generation for \2003-mbx3-replc3d0099f3-ff35-46ea-8a2f-39eb50923209$ is 00000005
2905 0385717B 2256 Cluster.Replay NetPath ClusterPathManager.GetPath() returns \2003-mbx3-replc3d0099f3-ff35-46ea-8a2f-39eb50923209$
2906 007D2DBB 2256 Cluster.Replay PFD PFD CRS 18907 First generation for \2003-mbx3-replc3d0099f3-ff35-46ea-8a2f-39eb50923209$ is 00000005
2907 0385717B 2256 Cluster.Replay NetPath ClusterPathManager.GetPath() returns \2003-mbx3-replc3d0099f3-ff35-46ea-8a2f-39eb50923209$
2908 007D2DBB 2256 Cluster.Replay ShipLog LogCopy: Trying to find file \2003-mbx3-replc3d0099f3-ff35-46ea-8a2f-39eb50923209$E0000000005.log
2909 007D2DBB 2256 Cluster.Replay ShipLog LogCopy: Found file E0000000005.log
2910 007D2DBB 2256 Cluster.Replay PFD PFD CRS 18395 LogCopy: Found file E0000000005.log

This is also confirmed by the event in the application log indicating that log copy began by successfully copying log generation 5 (0x5).

Event Type:    Information
Event Source:    MSExchangeRepl
Event Category:    Service
Event ID:    2114
Date:        8/9/2009
Time:        2:15:22 PM
User:        N/A
Computer:    2003-NODE2
Description:
The replication instance for storage group 2003-MBX32003-MBX3-SG1 has started copying transaction log files. The first log file successfully copied was generation 5.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

At minimum we must have all logs from the Last Consistent log generation forward in order to maintain replication.  This makes sense, if I did not have all logs from the last consistent log (where the database was shutdown) forward, how could I bring the passive copy up to current point in time?

As long as the generation is contiguous and logs are present on the source from last consistent to current log replication will proceed and remain healthy post the manual database seed.

 

*Database seeded using Update-StorageGroupCopy where no full or incremental backup was performed.

In this example we have a database on a source server that has neither had a full or incremental backup performed on it.  The storage group replication between nodes was suspended using suspend-storagegroupcopy.  Then the update-storagegroupcopy command was used to stream the database to the target server (the –manualResume switch was also used so I could generate the header dumps).  Below is a sample header dump of a database post an update-storagegroupcopy.

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server

Version 08.01

Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating FILE DUMP mode…
         Database: 2003-MBX3-SG1-DB1.edb

        File Type: Database
   Format ulMagic: 0x89abcdef
   Engine ulMagic: 0x89abcdef
Format ulVersion: 0x620,12
Engine ulVersion: 0x620,12
Created ulVersion: 0x620,12
     DB Signature: Create time:08/09/2009 14:10:12 Rand:7948610 Computer:
         cbDbPage: 8192
           dbtime: 20053 (0x4e55)
            State: Dirty Shutdown
     Log Required: 11-11 (0xb-0xb)
    Log Committed: 0-14 (0x0-0xe)
   Streaming File: No
         Shadowed: Yes
       Last Objid: 133
     Scrub Dbtime: 0 (0x0)
       Scrub Date: 00/00/1900 00:00:00
     Repair Count: 0
      Repair Date: 00/00/1900 00:00:00
Old Repair Count: 0
  Last Consistent: (0x9,A,1C1)  08/09/2009 14:12:18
      Last Attach: (0xB,9,86)  08/09/2009 14:28:20
      Last Detach: (0x0,0,0)  00/00/1900 00:00:00
             Dbid: 1
    Log Signature: Create time:08/09/2009 14:10:08 Rand:7930576 Computer:
       OS Version: (5.2.3790 SP 2)

Previous Full Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Previous Incremental Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Previous Copy Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Previous Differential Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Current Full Backup:
        Log Gen: 11-14 (0xb-0xe)
           Mark: (0xE,188,167)
           Mark: 08/09/2009 14:29:54

Current Shadow copy backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

     cpgUpgrade55Format: 0
    cpgUpgradeFreePages: 0
cpgUpgradeSpaceMapPages: 0

       ECC Fix Success Count: none
   Old ECC Fix Success Count: none
         ECC Fix Error Count: none
     Old ECC Fix Error Count: none
    Bad Checksum Error Count: none
Old bad Checksum Error Count: none

Operation completed successfully in 0.31 seconds.

In this header dump you will notice that the database is in dirty shutdown.  This is expected of a database that has come from an online seeding operation.  You will also note that the Current Full Backup header section of the database is populated.  The low log value here is 11 (0xb) and the high log value is 14 (0xe).

After the header dump was generated I resumed storage group copy (normally after a successful update-storagegroupcopy this is done for you automatically).  When replication is resumed the header of the database is consulted.  Here is a sample output from non-customer viewable tracing.

5107 61007400610044 2256 Cluster.Replay FileChecker RunChecks is successful. FileState is: LowestGenerationPresent: 0
HighestGenerationPresent: 0
LowestGenerationRequired: 11
HighestGenerationRequired: 11
LastGenerationBackedUp: 0
CheckpointGeneration: 0
LogfileSignature: MJET_SIGNATURE(Random = 7930576,CreationTime = 8/9/2009 2:10:08 PM)
LatestFullBackupTime:
LatestIncrementalBackupTime:
LatestDifferentialBackupTime:
LatestCopyBackupTime:
SnapshotBackup:
SnapshotLatestFullBackup:
SnapshotLatestIncrementalBackup:
SnapshotLatestDifferentialBackup:
SnapshotLatestCopyBackup:
ConsistentDatabase: False
5108 61007400610044 2256 Cluster.Replay ReplicaInstance SetReplayState(): LowestGenerationPresent: 0
HighestGenerationPresent: 0
LowestGenerationRequired: 11
HighestGenerationRequired: 11
LastGenerationBackedUp: 0
CheckpointGeneration: 0
LogfileSignature: MJET_SIGNATURE(Random = 7930576,CreationTime = 8/9/2009 2:10:08 PM)
LatestFullBackupTime:
LatestIncrementalBackupTime:
LatestDifferentialBackupTime:
LatestCopyBackupTime:
SnapshotBackup:
SnapshotLatestFullBackup:
SnapshotLatestIncrementalBackup:
SnapshotLatestDifferentialBackup:
SnapshotLatestCopyBackup:
ConsistentDatabase: False

You will note from this output that the HighestGenerationRequired and LowestGenerationRequired is 11 (0xb).  This is based on the current full backup information in the header of the database.  The lowest log recorded in current full backup represents the lowest log necessary to complete the source database at the time the update-storagegroupcopy was run.

You will note that the events in the application log indicate log copy started with logs 11 (0xb).

Event Type:    Information
Event Source:    MSExchangeRepl
Event Category:    Service
Event ID:    2114
Date:        8/9/2009
Time:        2:36:00 PM
User:        N/A
Computer:    2003-NODE2
Description:
The replication instance for storage group 2003-MBX32003-MBX3-SG1 has started copying transaction log files. The first log file successfully copied was generation 11.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Post an update-storagegroupcopy replication will remain healthy pending that all logs are present and contiguous on the source server from the time the update-storagegroupcopy was initiated until it completed successfully.

 

*Database seeded using Update-StorageGroupCopy where a full backup was performed on the source database.

In this example we have a database on a source server that has had a full backup performed on it (in this case an ESE online streaming backup).  The storage group replication between nodes was suspended using suspend-storagegroupcopy.  Then the update-storagegroupcopy command was used to stream the database to the target server (the –manualResume switch was also used so I could generate the header dumps).  Below is a sample header dump of a database post an update-storagegroupcopy.

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server

Version 08.01

Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating FILE DUMP mode…
         Database: 2003-MBX3-SG1-DB1.edb

        File Type: Database
   Format ulMagic: 0x89abcdef
   Engine ulMagic: 0x89abcdef
Format ulVersion: 0x620,12
Engine ulVersion: 0x620,12
Created ulVersion: 0x620,12
     DB Signature: Create time:08/09/2009 14:10:12 Rand:7948610 Computer:
         cbDbPage: 8192
           dbtime: 22631 (0x5867)
            State: Dirty Shutdown
     Log Required: 31-31 (0x1f-0x1f)
    Log Committed: 0-32 (0x0-0x20)
   Streaming File: No
         Shadowed: Yes
       Last Objid: 134
     Scrub Dbtime: 0 (0x0)
       Scrub Date: 00/00/1900 00:00:00
     Repair Count: 0
      Repair Date: 00/00/1900 00:00:00
Old Repair Count: 0
  Last Consistent: (0x1D,A,1C1)  08/09/2009 14:48:13
      Last Attach: (0x1F,9,86)  08/09/2009 14:48:15
      Last Detach: (0x0,0,0)  00/00/1900 00:00:00
             Dbid: 1
    Log Signature: Create time:08/09/2009 14:10:08 Rand:7930576 Computer:
       OS Version: (5.2.3790 SP 2)

Previous Full Backup:
        Log Gen: 20-21 (0x14-0x15)
           Mark: (0x15,D,195)
           Mark: 08/09/2009 14:45:43

Previous Incremental Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Previous Copy Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Previous Differential Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Current Full Backup:
        Log Gen: 31-32 (0x1f-0x20)
           Mark: (0x20,E,185)
           Mark: 08/09/2009 14:49:06

Current Shadow copy backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

     cpgUpgrade55Format: 0
    cpgUpgradeFreePages: 0
cpgUpgradeSpaceMapPages: 0

       ECC Fix Success Count: none
   Old ECC Fix Success Count: none
         ECC Fix Error Count: none
     Old ECC Fix Error Count: none
    Bad Checksum Error Count: none
Old bad Checksum Error Count: none

Operation completed successfully in 0.46 seconds.

In this header dump you will note that Previous Full Backup is populated.  The low log generation is 20 (0x14) and the high log generation is 21 (0x15).

After the header dump was generated I resumed storage group copy (normally after a successful update-storagegroupcopy this is done for you automatically).  When replication is resumed the header of the database is consulted.  Here is a sample output from non-customer viewable tracing.

6593 61007400610044 2472 Cluster.Replay FileChecker RunChecks is successful. FileState is: LowestGenerationPresent: 0
HighestGenerationPresent: 0
LowestGenerationRequired: 31
HighestGenerationRequired: 31
LastGenerationBackedUp: 21
CheckpointGeneration: 0
LogfileSignature: MJET_SIGNATURE(Random = 7930576,CreationTime = 8/9/2009 2:10:08 PM)
LatestFullBackupTime: 8/9/2009 2:45:43 PM
LatestIncrementalBackupTime:
LatestDifferentialBackupTime:
LatestCopyBackupTime:
SnapshotBackup: False
SnapshotLatestFullBackup: False
SnapshotLatestIncrementalBackup:
SnapshotLatestDifferentialBackup:
SnapshotLatestCopyBackup:
ConsistentDatabase: False
6594 61007400610044 2472 Cluster.Replay ReplicaInstance SetReplayState(): LowestGenerationPresent: 0
HighestGenerationPresent: 0
LowestGenerationRequired: 31
HighestGenerationRequired: 31
LastGenerationBackedUp: 21
CheckpointGeneration: 0
LogfileSignature: MJET_SIGNATURE(Random = 7930576,CreationTime = 8/9/2009 2:10:08 PM)
LatestFullBackupTime: 8/9/2009 2:45:43 PM
LatestIncrementalBackupTime:
LatestDifferentialBackupTime:
LatestCopyBackupTime:
SnapshotBackup: False
SnapshotLatestFullBackup: False
SnapshotLatestIncrementalBackup:
SnapshotLatestDifferentialBackup:
SnapshotLatestCopyBackup:
ConsistentDatabase: False

In this output you will note that LastGenerationBackedUp is 21 (0x14).  This corresponds to the high log generation as stamped in previous full backup.  You’ll also note that the LowestGenerationRequired and HighestGenerationRequired is 31 which corresponds to the low log value stamped in current full backup. 

In this case log file copy will start at generation 21 (0x14).  Events in the application log correspond with this:

Event Type:    Information
Event Source:    MSExchangeRepl
Event Category:    Service
Event ID:    2114
Date:        8/9/2009
Time:        2:51:20 PM
User:        N/A
Computer:    2003-NODE2
Description:
The replication instance for storage group 2003-MBX32003-MBX3-SG1 has started copying transaction log files. The first log file successfully copied was generation 21.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

The difference between this example and previous examples is that a full backup was performed.  The decision to start copy at log 21 (0x14), which is based on previous full backup, makes sense if you think about the replication service.  Remember that a database can be backed up either from the active or passive nodes.  If I did not base my log file copy on previous full backup that means that I would not have all the logs on my passive copy since the last full backup.  This would essentially prevent me from, at a later point in time, performing an incremental backup.  (Remember an incremental backup requires all log files from the previous full backup be present).

When a database has had a full backup on it replication will remain healthy as long as all logs are contiguous on the source from the high log generation as stamped in previous full backup to the current log.

 

*Database seeded using Update-StorageGroupCopy where a full and incremental backup was performed on the source database.

In this example we have a database on a source server that has had a full and incremental backup performed on it (in this case an ESE online streaming backup).  The storage group replication between nodes was suspended using suspend-storagegroupcopy.  Then the update-storagegroupcopy command was used to stream the database to the target server (the –manualResume switch was also used so I could generate the header dumps).  Below is a sample header dump of a database post an update-storagegroupcopy.

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server

Version 08.01

Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating FILE DUMP mode…
         Database: 2003-MBX3-SG1-DB1.edb

        File Type: Database
   Format ulMagic: 0x89abcdef
   Engine ulMagic: 0x89abcdef
Format ulVersion: 0x620,12
Engine ulVersion: 0x620,12
Created ulVersion: 0x620,12
     DB Signature: Create time:08/09/2009 14:10:12 Rand:7948610 Computer:
         cbDbPage: 8192
           dbtime: 22745 (0x58d9)
            State: Dirty Shutdown
     Log Required: 50-50 (0x32-0x32)
    Log Committed: 0-51 (0x0-0x33)
   Streaming File: No
         Shadowed: Yes
       Last Objid: 134
     Scrub Dbtime: 0 (0x0)
       Scrub Date: 00/00/1900 00:00:00
     Repair Count: 0
      Repair Date: 00/00/1900 00:00:00
Old Repair Count: 0
  Last Consistent: (0x30,A,1C1)  08/09/2009 14:59:22
      Last Attach: (0x32,9,86)  08/09/2009 14:59:24
      Last Detach: (0x0,0,0)  00/00/1900 00:00:00
             Dbid: 1
    Log Signature: Create time:08/09/2009 14:10:08 Rand:7930576 Computer:
       OS Version: (5.2.3790 SP 2)

Previous Full Backup:
        Log Gen: 20-21 (0x14-0x15)
           Mark: (0x15,D,195)
           Mark: 08/09/2009 14:45:43

Previous Incremental Backup:
        Log Gen: 5-34 (0x5-0x22)
           Mark: (0x23,8,16)
           Mark: 08/09/2009 14:59:00

Previous Copy Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Previous Differential Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Current Full Backup:
        Log Gen: 50-51 (0x32-0x33)
           Mark: (0x33,F,29)
           Mark: 08/09/2009 15:00:05

Current Shadow copy backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

     cpgUpgrade55Format: 0
    cpgUpgradeFreePages: 0
cpgUpgradeSpaceMapPages: 0

       ECC Fix Success Count: none
   Old ECC Fix Success Count: none
         ECC Fix Error Count: none
     Old ECC Fix Error Count: none
    Bad Checksum Error Count: none
Old bad Checksum Error Count: none

Operation completed successfully in 0.78 seconds.

In this header dump you will note that Previous Incremental Backup is populated.  The low log generation is 5 (0x5) and the high log generation is 34 (0x22).

After the header dump was generated I resumed storage group copy (normally after a successful update-storagegroupcopy this is done for you automatically).  When replication is resumed the header of the database is consulted.  Here is a sample output from non-customer viewable tracing.

8933 61007400610044 2472 Cluster.Replay ReplicaInstance SetReplayState(): LowestGenerationPresent: 0
HighestGenerationPresent: 0
LowestGenerationRequired: 50
HighestGenerationRequired: 50
LastGenerationBackedUp: 34
CheckpointGeneration: 0
LogfileSignature: MJET_SIGNATURE(Random = 7930576,CreationTime = 8/9/2009 2:10:08 PM)
LatestFullBackupTime: 8/9/2009 2:45:43 PM
LatestIncrementalBackupTime: 8/9/2009 2:59:00 PM
LatestDifferentialBackupTime:
LatestCopyBackupTime:
SnapshotBackup: False
SnapshotLatestFullBackup: False
SnapshotLatestIncrementalBackup: False
SnapshotLatestDifferentialBackup:
SnapshotLatestCopyBackup:
ConsistentDatabase: False
8934 020C9AB5 2472 Cluster.Replay State CopyGenerationNumber is changing to 0 on replica 3d0099f3-ff35-46ea-8a2f-39eb50923209

In this output you will note that LastGenerationBackedUp is 34 (0x14).  This corresponds to the high log generation as stamped in previous incremental backup.  You’ll also note that the LowestGenerationRequired and HighestGenerationRequired is 50 which corresponds to the low log value stamped in current full backup. 

In this case log file copy will start at generation 34 (0x22).  Events in the application log correspond with this:

Event Type:    Information
Event Source:    MSExchangeRepl
Event Category:    Service
Event ID:    2114
Date:        8/9/2009
Time:        2:51:20 PM
User:        N/A
Computer:    2003-NODE2
Description:
The replication instance for storage group 2003-MBX32003-MBX3-SG1 has started copying transaction log files. The first log file successfully copied was generation 34.

The difference between this example and previous examples is that a full and incremental backup was performed.  The decision to start copy at log 34 (0x22), which is based on previous incremental backup, makes sense if you think about the replication service.  Remember that a database can be backed up either from the active or passive nodes.  If I did not base my log file copy on previous incremental backup that means that I would not have all the logs on my passive copy since the last incremental backup.  This would essentially prevent me from, at a later point in time, performing an incremental backup of the passive copy.  (Remember and incremental backup requires all log files from the previous incremental backup be present).

When a database has had a incremental backup on it replication will remain healthy as long as all logs are contiguous on the source from the high log generation as stamped in previous incremental backup to the current log.

 

*So what does an example look like where the necessary logs are not present.

In this example I have a database that has had a full and incremental backup performed on it.  I have suspended the storage group copy between nodes and forced log generation to occur.  I then went into the source log directory, and removed two logs from the end of the log stream. 

This is not an uncommon example.  While storage group copy is failed or suspended logs will continue to generate on the source server.  All full and incremental backups of the source will continue to be successful, but logs will not purge.  Depending on the size of your log file drive, and the amount of time that copy is suspended or failed, your log drive may begin to fill up.  This may lead to administrators manually purging the log file series.

Here is an eseutil /ml output of the source log directory showing the gap.

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server

Version 08.01

Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating FILE DUMP mode…

Verifying log files…
     Base name: e00

      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000005.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000006.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000007.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000008.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000009.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000000A.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000000B.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000000C.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000000D.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000000E.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000000F.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000010.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000011.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000012.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000013.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000014.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000015.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000016.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000017.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000018.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000019.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000001A.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000001B.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000001C.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000001D.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000001E.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000001F.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000020.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000021.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000022.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000023.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000024.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000025.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000026.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000027.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000028.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000029.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000002A.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000002B.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000002C.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000002D.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000002E.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000002F.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000030.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000031.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000032.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000033.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000034.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000035.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000036.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000037.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000038.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000039.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000003A.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000003B.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000003C.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000003D.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000003E.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000003F.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000040.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000041.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000042.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000043.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000044.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000045.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000046.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000047.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000048.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000049.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000004A.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000004B.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000004C.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000004D.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000004E.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000004F.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000050.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000051.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000052.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000053.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000054.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000055.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000056.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000057.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000058.log – OK
      Missing log files: e00{00000059 – 0000005A}.log
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000005B.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000005C.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000005D.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE00.log – OK

Operation terminated with error -528 (JET_errMissingLogFile, Current log file missing) after 5.203 seconds.

On the passive node I have now issues my update-storagegroupcopy.  Prior to resuming storage group copy I dumped the header of the database and here is the output.

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server

Version 08.01

Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating FILE DUMP mode…
         Database: 2003-MBX3-SG1-DB1.edb

        File Type: Database
   Format ulMagic: 0x89abcdef
   Engine ulMagic: 0x89abcdef
Format ulVersion: 0x620,12
Engine ulVersion: 0x620,12
Created ulVersion: 0x620,12
     DB Signature: Create time:08/09/2009 14:10:12 Rand:7948610 Computer:
         cbDbPage: 8192
           dbtime: 23064 (0x5a18)
            State: Dirty Shutdown
     Log Required: 94-94 (0x5e-0x5e)
    Log Committed: 0-95 (0x0-0x5f)
   Streaming File: No
         Shadowed: Yes
       Last Objid: 134
     Scrub Dbtime: 0 (0x0)
       Scrub Date: 00/00/1900 00:00:00
     Repair Count: 0
      Repair Date: 00/00/1900 00:00:00
Old Repair Count: 0
  Last Consistent: (0x5C,13,AC)  08/09/2009 15:20:34
      Last Attach: (0x5E,9,86)  08/09/2009 15:21:58
      Last Detach: (0x0,0,0)  00/00/1900 00:00:00
             Dbid: 1
    Log Signature: Create time:08/09/2009 14:10:08 Rand:7930576 Computer:
       OS Version: (5.2.3790 SP 2)

Previous Full Backup:
        Log Gen: 70-71 (0x46-0x47)
           Mark: (0x47,E,4F)
           Mark: 08/09/2009 15:06:34

Previous Incremental Backup:
        Log Gen: 5-75 (0x5-0x4b)
           Mark: (0x4C,8,16)
           Mark: 08/09/2009 15:17:03

Previous Copy Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Previous Differential Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Current Full Backup:
        Log Gen: 94-95 (0x5e-0x5f)
           Mark: (0x5F,C,198)
           Mark: 08/09/2009 15:22:10

Current Shadow copy backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

     cpgUpgrade55Format: 0
    cpgUpgradeFreePages: 0
cpgUpgradeSpaceMapPages: 0

       ECC Fix Success Count: none
   Old ECC Fix Success Count: none
         ECC Fix Error Count: none
     Old ECC Fix Error Count: none
    Bad Checksum Error Count: none
Old bad Checksum Error Count: none

Operation completed successfully in 0.62 seconds.

Based on the header of the database I can see that incremental backup is populated.  Knowing our rules of replication – a database with an incremental backup will require all logs from the highest log generation, in this case 75 (0x4b) to current point in time be present on the source, contiguous, and able to be copied to the target. 

From our ML output you can see that I removed log 0x59 and 0x5a (decimal 89 and 90). 

I resumed storage group copy using resume-storagegroupcopy. 

The following event was logged indicating that copy started at log 75 (0x4b).

Event Type:    Information
Event Source:    MSExchangeRepl
Event Category:    Service
Event ID:    2114
Date:        8/9/2009
Time:        3:26:06 PM
User:        N/A
Computer:    2003-NODE2
Description:
The replication instance for storage group 2003-MBX32003-MBX3-SG1 has started copying transaction log files. The first log file successfully copied was generation 75.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

As logs copied between the nodes, I noticed that using get-storagegroupcopystatus the storage group in question was in a failed state.

Name                      SummaryCopySt CopyQueueLeng ReplayQueueL LastInspecte
                          atus          th            ength        dLogTime   
—-                      ————- ————- ———— ————
2003-MBX3-SG1             Failed        0             0                       
2003-MBX3-SG2             Suspended     0             0            8/9/2009 …

By reviewing the application log I noticed that our failure was due to the inability to copy the 0x59 (89) log file from the source.  This makes sense since I knowingly deleted it…and is expected since I know that all logs from the high log generation stamped in previous incremental backup to current time must be present on the source, contiguous, and able to get copied to the target.

Here is the sample error text:

Event Type:    Error
Event Source:    MSExchangeRepl
Event Category:    Service
Event ID:    2059
Date:        8/9/2009
Time:        3:27:07 PM
User:        N/A
Computer:    2003-NODE2
Description:
The log file \2003-mbx3-replc3d0099f3-ff35-46ea-8a2f-39eb50923209$E0000000059.log for 2003-MBX32003-MBX3-SG1 is missing on the production copy. Continuous replication for this storage group is blocked. If you removed the log file, please replace it. If the log is lost, the passive copy will need to be reseeded using the Update-StorageGroupCopy cmdlet in the Exchange Management Shell.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

What is confusing about this event is that the administrator is advised to run update-storagegroupcopy…and that’s what we just ran that generated the error.  Based solely on the event, and without knowledge of what logs are required by the replication service depending on the state of the database, one could end up in an endless loop of update-storagegroupcopy and log file copy failures.

Now…how can this condition be corrected.  The condition can be corrected by running a new full backup on the database prior to running the update-storagegroup copy.  The full backup will reset the previous incremental information, and stamp values in the previous full backup section based on the logs that are not missing.

 

*So what are the log file copy rules:

If a database is offline (clean shutdown) then all logs from last consistent value to current point must be present on the source, be contiguous, and able to be copied to the target.

If a database is online but has never had a full or incremental backup then all the logs form the anchor log at the time the update-storagegroupcopy was initiated to the current log must exist on the source, must be contiguous, and must be able to be copied to the target.

If a database is online, and has had a full backup performed on it, then all logs from the high log generation, as stamped in previous full backup, must be present on the source, must be contiguous, and must be able to be copied to the target.

If a database is online and has had a full and incremental backup performed on it then all logs from the high log generation as stamped in previous incremental backup must be present on the source, must be contiguous, and must be able to be copied to the target.