Monthly Archives: September 2009

An interesting issue with file level antivirus…

In order for Exchange to function correctly administrators are generally advised to make two important configuration changes to their file level antivirus scanners:

  • Exclude directories that contain Exchange binaries and Exchange data.
  • Add process exclusions for the Exchange processes (for example – store.exe etc).

When antivirus exclusions are not properly set servers may experience performance issues or database availability issues.

Today I worked an interesting issue with a customer.  A common issue when antivirus settings are not set correctly is that the ENNtmp.log file in the Exchange database log directory is deleted or quarantined during the rename procedure.  Remember that while a database is mounted any current transactions are being written to the ENN.log file.  When the ENN.log file is full, the lock on it is released and the file is renamed to it’s full name (for example ENN00001af.log).  While writing to the ENN.log, the next log file in the series is being built as the ENNtmp.log.  When the ENN.log is full, and that rename is occurring another rename operation is occurring – renaming the ENNtmp.log to ENN.log.  As locks are released on these files to facilitate renaming etc – file level antivirus when mis-configured can interject itself and operate against these files.  In this case the file was deleted as a virus and the database forcefully dismounted.

In our case events were clearly logged by the antivirus scanner indicating an action was performed on the log file in question.  This naturally lead us to reviewing the file level antivirus exclusions.

The structure of the server had all log directories stored on L: and all databases stored on M:.  When the file system exclusions were reviewed, you could clearly see that exclusions for L and M existed and were set to include sub-directories.

Upon further investigation it was determined that the log file directories though were actually mountpoints.  For example, L:SG1 where SG1 is a mounted volume as a mountpoint. 

From here it was determined that the antivirus scanner was actually looking at things at the volume / disk level.  Therefore, L: exclusions did not apply since SG1 was a mounted physical disk, and not a folder on the file system.

Given the above example:

A folder L:Folder would be excluded since it’s a folder existing on the L file system.

A mountpoint L:MountPoint would not be excluded since L is a physical disk and MountPoint is a physical disk – and no exclusion existed specifically for the mounted physical disk.

A folder L:FolderMountpoint exists.  In this case the L:Folder would be excluded, but the L:FolderMountpoint would not be excluded since the Mountpoint is a mounted physical disk.

The moral of the story…please make sure you are following up with your antivirus vendor on the appropriate method to apply exclusions when files exist on both lettered physical disks and mounted physical disks as mountpoints.

=========================

Update – 2/16/2015

A customer shared this with me today as a vendor example.

https://kc.mcafee.com/corporate/index?page=content&id=KB70886

 

=========================

Considerations when using mountpoints with Exchange 2007 clusters.

It is not uncommon in many installations today, especially clustered installations, to see the use of mountpoints.

When using mountpoints with clustering it is a requirement that the physical disk resources created for the mountpoints be dependant on the lettered physical volume that hosts them. 

The use of mountpoints also requires that the lettered physical volume hosting them be available at all times.  Exchange uses the full path – for example X:MountPointdata…

I wanted to highlight a design consideration that I encourage you to consider when utilizing mountpoints.

I see two common designs where mountpoints are utilized.

  • A single lettered volume and all mountpoints created off that volume.

image

  • Two lettered volumes – database mountpoints created off one lettered volume with log mountpoints created off the second lettered volume.

image

These two designs do give logical ( and possibly physical depending on storage design )separation to the storage for databases and the storage for log files.  It also limits the amount of lettered physical volumes that are required to support the desired solution. 

The issue it introduces though is a single point of failure.

In the first example in order for the mountpoints to function you have to have the X drive always available.  If the X drive is lost, all mountpoints associated with the X drive are lost, and the database instances dependant on those mountpoints are lost.

In the second example you have the same issue as in the first example where a single point of failure exists.  A loss of either the X or Y disk causes mounpoints associated on those disks to be unavailable.  Since an Exchange database instance requires dependencies on both the mountpoint hosting the database, and the mountpoint hosting the logs, the database instance becomes unavailable.

You can increase the availability of the overall solution by spreading out the mountpoints over a series of lettered volumes.  For example, let’s take a solution there there will be 25 storage groups.  You could in this instance create 5 lettered volumes.  Off of each of these 5 lettered volumes you would create 10 mountpoints.  Each of the mountpoint pairs would serve as the database and log drives for a single database instance.  That would mean that the loss of a single lettered volume would only affect users hosted in 5 of the databases in the solution, verses all 25 databases being affected when using the design outlined in either of the previous examples. 

Here is a small picture example of this concept.

image

Another consideration is that when a root volume hosting mount points is lost and subsequently replaced, the administrator must manually recreate the necessary folder structure and map the mount points back.  By spreading this out, you can hopefully eliminate the amount of work that would be necessary to restore the solution.

By reducing the single points of failure you can increase the overall availability of the solution.

‘ClusteredMailboxServer’ is partially configured – ExBPA.

When running ExBPA against an Exchange 2007 cluster, the following error text may be noted:

===========================================================

Error: ‘ClusteredMailboxServer’ is partially configured

Server: <SERVER>

‘ClusteredMailboxServer’ is partially configured on server <SERVER>.contoso.com. Either setup is configuring it right now or setup failed while configuring it. Please rerun setup if it isn’t currently running.

===========================================================

The link text from ExBPA advises users on how to correct this issue:

===========================================================

Possible CMS Setup Failure  Send Feedback

Microsoft Exchange Server Analyzer – Articles > Exchange Cluster

Topic Last Modified: 2008-04-01

The Microsoft® Exchange Server Analyzer Tool reads the following registry branch for the presence of the Action or Watermark registry values:

HKEY_LOCAL_MACHINESoftwareMicrosoftExchangev8.0ClusteredMailboxServer

If the Exchange Server Analyzer finds that either the Action or Watermark registry values are present, an error is displayed.

The presence of the Action or Watermark registry values on this registry branch indicates that either there was a previously unsuccessful installation of the Cluster Mailbox Server (CMS) to the cluster node, or that the CMS installation was in progress when the Exchange Server Analyzer was doing its analysis.

If the CMS installation failed on this node but was successful on other nodes of the cluster, it could appear that the cluster network is functioning normally when that is not the case.

To address this error, determine whether CMS installation is still in progress on the affected node and, if not, rerun setup to complete the failed CMS installation.

===========================================================

One of the ExBPA rules checks the registry to find out the install state of roles on the server.  Exchange 2007 uses watermark keys to track setup progress.  If a failure is encountered during setup, when setup is rerun the watermark is consulted and setup resumed at or near the point of failure.  When setup completes successfully the watermark value is cleared.

The following is a sample export of a clustered server mailbox role watermark.

===========================================================

Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINESOFTWAREMicrosoftExchangev8.0ClusteredMailboxServer]
"Action"="Install"
"Watermark"=dword:7448a073

===========================================================

It is not uncommon to see this rule triggered against clustered nodes – especially if there at one point was a setup failure.  Sometimes during installation when a failure is encountered and the clustered mailbox server is partially configured the resources created will be moved between nodes.  Setup is then run on the other node in hopes that it will complete successfully.  In some cases it will complete successfully and in other cases it will fail. 

The watermark is not checkpointed / replicated between the registry of the nodes.  In the event that setup fails each node will have a watermark indicating the point in the installation where the failure occurred.  Should one of the nodes complete setup successfully, one node will have it’s watermark cleared while the other retains the watermark. 

The presence of this key triggers the ExBPA rule.

In order to tell what actions need to be taken we must first determine if installation completed successfully.  For this we can turn to the Exchange setup logs that are created on each node.  We will review the logs and see if it can be determined if setup completed successfully.

The setup logs are located at c:ExchangeSetupLogs.

We are looking for the file ExchangeSetup.log.  This file can be opened in a text viewer.

If the following text can be found in the setup log it should be safe to assume that setup completed successfully for that CMS on at least one of the nodes.

The start of the clustered mailbox server installation is indicated in the logs by the following series of text (or similar depending on single copy cluster or cluster continuous replication installation):

===========================================================

<DATE> [0] Setup will run the task ‘new-ClusteredMailboxServer’
<DATE> [1] Setup launched task ‘new-ClusteredMailboxServer -DomainController DC’ -PublicFolderDatabase $false -updatesdir ‘Y:x64Updates’ -Name <CMSNAME> -IPAddress <IPAddress> -IPv4Addresses $null -IPv4Networks $null -IPv6Networks $null -SharedStorage $false -DataPath <DATAPATH>

===========================================================

If you continue to scroll through the log and reach the following text the setup command completed successfully.

===========================================================

<DATE> [0] The Microsoft Exchange Server setup operation completed successfully.
<DATE> [0] End of Setup

===========================================================

Once you have verified that the command was run and completed successfully, the watermark can be exported and safely removed using registry editor.

If for any reason you cannot verify that setup finished for this CMS instance on at least one of the nodes, the only safe procedure for removing the watermark is to allow setup to complete successfully.  (This needs to occur even if the resources are already online and even if they are servicing users).

If you have the setup log this can be an easy task.  The parameters that were used for setup are also logged in this log.  The following line will tell you what parameters were used when establishing the clustered mailbox server installation. 

<DATE> [0] ExSetup was started with the following command: ‘/NoLogo /newCMS /cmsName:2003-MBX3 /cmsIPAddress:192.168.0.14 /sourcedir:Y:x64 /FromSetup’.

In this case you can see that the CMSName parameter used is 2003-MBX3, the CMSIpaddress is 192.168.0.4.  No additional switches (like CSS and CMSDataPath) indicate this was a CCR installation.

If you do not have the setup log you will most likely need to use cluster administrator to get the necessary information.  Using cluster administrator locate the CMS Network Name resource and CMS IPv4 Resource.  Make note of the values of each of these resources.

Once you have the values necessary to complete the command the CMS must be moved to the node where the watermark exists.  If the CMS is not currently on that node, issue a move-clusteredmailboxserver using the Exchange Management Shell. 

Once the CMS has been moved to the node with the watermark, run setup using the command line setup with  appropriate values.  Using my example above:

setup.com /newCMS /cmsName:2003-MBX3 /cmsIPAddress:192.168.0.4

Setup will consult the watermark on the machine and when it completes successfully clear the watermark value.  This will ensure that the CMS is now fully configured to support Exchange.

Log files required to be copied after an update-storagegroupcopy / re-seed has been successfully completed.

There may arise times where it is necessary to completely reseed an LCR, CCR, or SCR database copy from the source database.  In order to reseed the copy we use the update-storagegroupcopy commandlet.

When the update-storagegroupcopy is run, the database is pulled using the ESE backup online streaming API from the source machine to the target machine.  If the database is successfully copied without error the replication instance is automatically resumed.  No log files are pulled or copied as a part of the update-storagegroupcopy process.  It is not until after the update-storagegroupcopy process is completed, and replication is resumed, that the header of the database is reviewed and the replication service determines which logs are necessary to be copied.

In this blog post I want to highlight how the replication service makes the decision on which log files need to be copied post a re-seed of the database.  I will use examples from cluster.replay tracing (which can only be done with consultation with product support services).

*Databases copied offline between servers (clean shutdown).

When databases are copied offline between nodes this is a manual seeding operation.  By default a database that is offline and copied between nodes is in a clean shutdown state. 

Here is a sample header dump.

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server

Version 08.01

Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating FILE DUMP mode…
         Database: 2003-MBX3-SG1-DB1.edb

        File Type: Database
   Format ulMagic: 0x89abcdef
   Engine ulMagic: 0x89abcdef
Format ulVersion: 0x620,12
Engine ulVersion: 0x620,12
Created ulVersion: 0x620,12
     DB Signature: Create time:08/09/2009 14:10:12 Rand:7948610 Computer:
         cbDbPage: 8192
           dbtime: 20053 (0x4e55)
            State: Clean Shutdown
     Log Required: 0-0 (0x0-0x0)
    Log Committed: 0-0 (0x0-0x0)
   Streaming File: No
         Shadowed: Yes
       Last Objid: 133
     Scrub Dbtime: 0 (0x0)
       Scrub Date: 00/00/1900 00:00:00
     Repair Count: 0
      Repair Date: 00/00/1900 00:00:00
Old Repair Count: 0
  Last Consistent: (0x9,A,1C1)  08/09/2009 14:12:18
      Last Attach: (0x8,9,86)  08/09/2009 14:12:15
      Last Detach: (0x9,A,1C1)  08/09/2009 14:12:18
             Dbid: 1
    Log Signature: Create time:08/09/2009 14:10:08 Rand:7930576 Computer:
       OS Version: (5.2.3790 SP 2)

Previous Full Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Previous Incremental Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Previous Copy Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Previous Differential Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Current Full Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Current Shadow copy backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

     cpgUpgrade55Format: 0
    cpgUpgradeFreePages: 0
cpgUpgradeSpaceMapPages: 0

       ECC Fix Success Count: none
   Old ECC Fix Success Count: none
         ECC Fix Error Count: none
     Old ECC Fix Error Count: none
    Bad Checksum Error Count: none
Old bad Checksum Error Count: none

Operation completed successfully in 0.63 seconds.

 

When replication is resumed, the header of the database is consulted.  Here is an example trace tag from non-customer viewable tracing.

2826 74006100440074 2256 Cluster.Replay FileChecker RunChecks is successful. FileState is: LowestGenerationPresent: 0
HighestGenerationPresent: 0
LowestGenerationRequired: 0
HighestGenerationRequired: 0
LastGenerationBackedUp: 0
CheckpointGeneration: 0
LogfileSignature:
LatestFullBackupTime:
LatestIncrementalBackupTime:
LatestDifferentialBackupTime:
LatestCopyBackupTime:
SnapshotBackup:
SnapshotLatestFullBackup:
SnapshotLatestIncrementalBackup:
SnapshotLatestDifferentialBackup:
SnapshotLatestCopyBackup:
ConsistentDatabase: True
2827 74006100440074 2256 Cluster.Replay ReplicaInstance SetReplayState(): LowestGenerationPresent: 0
HighestGenerationPresent: 0
LowestGenerationRequired: 0
HighestGenerationRequired: 0
LastGenerationBackedUp: 0
CheckpointGeneration: 0
LogfileSignature:
LatestFullBackupTime:
LatestIncrementalBackupTime:
LatestDifferentialBackupTime:
LatestCopyBackupTime:
SnapshotBackup:
SnapshotLatestFullBackup:
SnapshotLatestIncrementalBackup:
SnapshotLatestDifferentialBackup:
SnapshotLatestCopyBackup:
ConsistentDatabase: True

You can see here that the replication service, after reading the status of the database, has detected that a clean shutdown database was found.  (ConsisntentDatabase: True).

Since no backup has been performed on the database, and the log directory on the target was empty prior to resuming the storage group copy, the replication service determines that no minimum log file was necessary.  Log copy will start from the first log available on the source server and continue to the highest generation on the source server.  As long as the generation is contiguous replication will proceed and remain healthy post the manual database seed.  (It would be best practice with an offline reseed to clear the log directory on the target prior to resuming the database copy).

2871 030F3F44 2256 Cluster.Replay ReplicaInstance No logfiles present, no backup information, no required generation
2872 030F3F44 2256 Cluster.Replay ReplicaInstance Log copying will start from generation 0

The replication service begins the replication instance and queries the source directory, in this case it was determined that log file 5 was the first available, and thus the replication service starts by copying this log file.

2903 0385717B 2256 Cluster.Replay NetPath ClusterPathManager.GetPath() returns \2003-mbx3-replc3d0099f3-ff35-46ea-8a2f-39eb50923209$
2904 007D2DBB 2256 Cluster.Replay LogCopy First generation for \2003-mbx3-replc3d0099f3-ff35-46ea-8a2f-39eb50923209$ is 00000005
2905 0385717B 2256 Cluster.Replay NetPath ClusterPathManager.GetPath() returns \2003-mbx3-replc3d0099f3-ff35-46ea-8a2f-39eb50923209$
2906 007D2DBB 2256 Cluster.Replay PFD PFD CRS 18907 First generation for \2003-mbx3-replc3d0099f3-ff35-46ea-8a2f-39eb50923209$ is 00000005
2907 0385717B 2256 Cluster.Replay NetPath ClusterPathManager.GetPath() returns \2003-mbx3-replc3d0099f3-ff35-46ea-8a2f-39eb50923209$
2908 007D2DBB 2256 Cluster.Replay ShipLog LogCopy: Trying to find file \2003-mbx3-replc3d0099f3-ff35-46ea-8a2f-39eb50923209$E0000000005.log
2909 007D2DBB 2256 Cluster.Replay ShipLog LogCopy: Found file E0000000005.log
2910 007D2DBB 2256 Cluster.Replay PFD PFD CRS 18395 LogCopy: Found file E0000000005.log

This is also confirmed by the event in the application log indicating that log copy began by successfully copying log generation 5 (0x5).

Event Type:    Information
Event Source:    MSExchangeRepl
Event Category:    Service
Event ID:    2114
Date:        8/9/2009
Time:        2:15:22 PM
User:        N/A
Computer:    2003-NODE2
Description:
The replication instance for storage group 2003-MBX32003-MBX3-SG1 has started copying transaction log files. The first log file successfully copied was generation 5.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

At minimum we must have all logs from the Last Consistent log generation forward in order to maintain replication.  This makes sense, if I did not have all logs from the last consistent log (where the database was shutdown) forward, how could I bring the passive copy up to current point in time?

As long as the generation is contiguous and logs are present on the source from last consistent to current log replication will proceed and remain healthy post the manual database seed.

 

*Database seeded using Update-StorageGroupCopy where no full or incremental backup was performed.

In this example we have a database on a source server that has neither had a full or incremental backup performed on it.  The storage group replication between nodes was suspended using suspend-storagegroupcopy.  Then the update-storagegroupcopy command was used to stream the database to the target server (the –manualResume switch was also used so I could generate the header dumps).  Below is a sample header dump of a database post an update-storagegroupcopy.

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server

Version 08.01

Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating FILE DUMP mode…
         Database: 2003-MBX3-SG1-DB1.edb

        File Type: Database
   Format ulMagic: 0x89abcdef
   Engine ulMagic: 0x89abcdef
Format ulVersion: 0x620,12
Engine ulVersion: 0x620,12
Created ulVersion: 0x620,12
     DB Signature: Create time:08/09/2009 14:10:12 Rand:7948610 Computer:
         cbDbPage: 8192
           dbtime: 20053 (0x4e55)
            State: Dirty Shutdown
     Log Required: 11-11 (0xb-0xb)
    Log Committed: 0-14 (0x0-0xe)
   Streaming File: No
         Shadowed: Yes
       Last Objid: 133
     Scrub Dbtime: 0 (0x0)
       Scrub Date: 00/00/1900 00:00:00
     Repair Count: 0
      Repair Date: 00/00/1900 00:00:00
Old Repair Count: 0
  Last Consistent: (0x9,A,1C1)  08/09/2009 14:12:18
      Last Attach: (0xB,9,86)  08/09/2009 14:28:20
      Last Detach: (0x0,0,0)  00/00/1900 00:00:00
             Dbid: 1
    Log Signature: Create time:08/09/2009 14:10:08 Rand:7930576 Computer:
       OS Version: (5.2.3790 SP 2)

Previous Full Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Previous Incremental Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Previous Copy Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Previous Differential Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Current Full Backup:
        Log Gen: 11-14 (0xb-0xe)
           Mark: (0xE,188,167)
           Mark: 08/09/2009 14:29:54

Current Shadow copy backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

     cpgUpgrade55Format: 0
    cpgUpgradeFreePages: 0
cpgUpgradeSpaceMapPages: 0

       ECC Fix Success Count: none
   Old ECC Fix Success Count: none
         ECC Fix Error Count: none
     Old ECC Fix Error Count: none
    Bad Checksum Error Count: none
Old bad Checksum Error Count: none

Operation completed successfully in 0.31 seconds.

In this header dump you will notice that the database is in dirty shutdown.  This is expected of a database that has come from an online seeding operation.  You will also note that the Current Full Backup header section of the database is populated.  The low log value here is 11 (0xb) and the high log value is 14 (0xe).

After the header dump was generated I resumed storage group copy (normally after a successful update-storagegroupcopy this is done for you automatically).  When replication is resumed the header of the database is consulted.  Here is a sample output from non-customer viewable tracing.

5107 61007400610044 2256 Cluster.Replay FileChecker RunChecks is successful. FileState is: LowestGenerationPresent: 0
HighestGenerationPresent: 0
LowestGenerationRequired: 11
HighestGenerationRequired: 11
LastGenerationBackedUp: 0
CheckpointGeneration: 0
LogfileSignature: MJET_SIGNATURE(Random = 7930576,CreationTime = 8/9/2009 2:10:08 PM)
LatestFullBackupTime:
LatestIncrementalBackupTime:
LatestDifferentialBackupTime:
LatestCopyBackupTime:
SnapshotBackup:
SnapshotLatestFullBackup:
SnapshotLatestIncrementalBackup:
SnapshotLatestDifferentialBackup:
SnapshotLatestCopyBackup:
ConsistentDatabase: False
5108 61007400610044 2256 Cluster.Replay ReplicaInstance SetReplayState(): LowestGenerationPresent: 0
HighestGenerationPresent: 0
LowestGenerationRequired: 11
HighestGenerationRequired: 11
LastGenerationBackedUp: 0
CheckpointGeneration: 0
LogfileSignature: MJET_SIGNATURE(Random = 7930576,CreationTime = 8/9/2009 2:10:08 PM)
LatestFullBackupTime:
LatestIncrementalBackupTime:
LatestDifferentialBackupTime:
LatestCopyBackupTime:
SnapshotBackup:
SnapshotLatestFullBackup:
SnapshotLatestIncrementalBackup:
SnapshotLatestDifferentialBackup:
SnapshotLatestCopyBackup:
ConsistentDatabase: False

You will note from this output that the HighestGenerationRequired and LowestGenerationRequired is 11 (0xb).  This is based on the current full backup information in the header of the database.  The lowest log recorded in current full backup represents the lowest log necessary to complete the source database at the time the update-storagegroupcopy was run.

You will note that the events in the application log indicate log copy started with logs 11 (0xb).

Event Type:    Information
Event Source:    MSExchangeRepl
Event Category:    Service
Event ID:    2114
Date:        8/9/2009
Time:        2:36:00 PM
User:        N/A
Computer:    2003-NODE2
Description:
The replication instance for storage group 2003-MBX32003-MBX3-SG1 has started copying transaction log files. The first log file successfully copied was generation 11.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

Post an update-storagegroupcopy replication will remain healthy pending that all logs are present and contiguous on the source server from the time the update-storagegroupcopy was initiated until it completed successfully.

 

*Database seeded using Update-StorageGroupCopy where a full backup was performed on the source database.

In this example we have a database on a source server that has had a full backup performed on it (in this case an ESE online streaming backup).  The storage group replication between nodes was suspended using suspend-storagegroupcopy.  Then the update-storagegroupcopy command was used to stream the database to the target server (the –manualResume switch was also used so I could generate the header dumps).  Below is a sample header dump of a database post an update-storagegroupcopy.

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server

Version 08.01

Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating FILE DUMP mode…
         Database: 2003-MBX3-SG1-DB1.edb

        File Type: Database
   Format ulMagic: 0x89abcdef
   Engine ulMagic: 0x89abcdef
Format ulVersion: 0x620,12
Engine ulVersion: 0x620,12
Created ulVersion: 0x620,12
     DB Signature: Create time:08/09/2009 14:10:12 Rand:7948610 Computer:
         cbDbPage: 8192
           dbtime: 22631 (0x5867)
            State: Dirty Shutdown
     Log Required: 31-31 (0x1f-0x1f)
    Log Committed: 0-32 (0x0-0x20)
   Streaming File: No
         Shadowed: Yes
       Last Objid: 134
     Scrub Dbtime: 0 (0x0)
       Scrub Date: 00/00/1900 00:00:00
     Repair Count: 0
      Repair Date: 00/00/1900 00:00:00
Old Repair Count: 0
  Last Consistent: (0x1D,A,1C1)  08/09/2009 14:48:13
      Last Attach: (0x1F,9,86)  08/09/2009 14:48:15
      Last Detach: (0x0,0,0)  00/00/1900 00:00:00
             Dbid: 1
    Log Signature: Create time:08/09/2009 14:10:08 Rand:7930576 Computer:
       OS Version: (5.2.3790 SP 2)

Previous Full Backup:
        Log Gen: 20-21 (0x14-0x15)
           Mark: (0x15,D,195)
           Mark: 08/09/2009 14:45:43

Previous Incremental Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Previous Copy Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Previous Differential Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Current Full Backup:
        Log Gen: 31-32 (0x1f-0x20)
           Mark: (0x20,E,185)
           Mark: 08/09/2009 14:49:06

Current Shadow copy backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

     cpgUpgrade55Format: 0
    cpgUpgradeFreePages: 0
cpgUpgradeSpaceMapPages: 0

       ECC Fix Success Count: none
   Old ECC Fix Success Count: none
         ECC Fix Error Count: none
     Old ECC Fix Error Count: none
    Bad Checksum Error Count: none
Old bad Checksum Error Count: none

Operation completed successfully in 0.46 seconds.

In this header dump you will note that Previous Full Backup is populated.  The low log generation is 20 (0x14) and the high log generation is 21 (0x15).

After the header dump was generated I resumed storage group copy (normally after a successful update-storagegroupcopy this is done for you automatically).  When replication is resumed the header of the database is consulted.  Here is a sample output from non-customer viewable tracing.

6593 61007400610044 2472 Cluster.Replay FileChecker RunChecks is successful. FileState is: LowestGenerationPresent: 0
HighestGenerationPresent: 0
LowestGenerationRequired: 31
HighestGenerationRequired: 31
LastGenerationBackedUp: 21
CheckpointGeneration: 0
LogfileSignature: MJET_SIGNATURE(Random = 7930576,CreationTime = 8/9/2009 2:10:08 PM)
LatestFullBackupTime: 8/9/2009 2:45:43 PM
LatestIncrementalBackupTime:
LatestDifferentialBackupTime:
LatestCopyBackupTime:
SnapshotBackup: False
SnapshotLatestFullBackup: False
SnapshotLatestIncrementalBackup:
SnapshotLatestDifferentialBackup:
SnapshotLatestCopyBackup:
ConsistentDatabase: False
6594 61007400610044 2472 Cluster.Replay ReplicaInstance SetReplayState(): LowestGenerationPresent: 0
HighestGenerationPresent: 0
LowestGenerationRequired: 31
HighestGenerationRequired: 31
LastGenerationBackedUp: 21
CheckpointGeneration: 0
LogfileSignature: MJET_SIGNATURE(Random = 7930576,CreationTime = 8/9/2009 2:10:08 PM)
LatestFullBackupTime: 8/9/2009 2:45:43 PM
LatestIncrementalBackupTime:
LatestDifferentialBackupTime:
LatestCopyBackupTime:
SnapshotBackup: False
SnapshotLatestFullBackup: False
SnapshotLatestIncrementalBackup:
SnapshotLatestDifferentialBackup:
SnapshotLatestCopyBackup:
ConsistentDatabase: False

In this output you will note that LastGenerationBackedUp is 21 (0x14).  This corresponds to the high log generation as stamped in previous full backup.  You’ll also note that the LowestGenerationRequired and HighestGenerationRequired is 31 which corresponds to the low log value stamped in current full backup. 

In this case log file copy will start at generation 21 (0x14).  Events in the application log correspond with this:

Event Type:    Information
Event Source:    MSExchangeRepl
Event Category:    Service
Event ID:    2114
Date:        8/9/2009
Time:        2:51:20 PM
User:        N/A
Computer:    2003-NODE2
Description:
The replication instance for storage group 2003-MBX32003-MBX3-SG1 has started copying transaction log files. The first log file successfully copied was generation 21.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

The difference between this example and previous examples is that a full backup was performed.  The decision to start copy at log 21 (0x14), which is based on previous full backup, makes sense if you think about the replication service.  Remember that a database can be backed up either from the active or passive nodes.  If I did not base my log file copy on previous full backup that means that I would not have all the logs on my passive copy since the last full backup.  This would essentially prevent me from, at a later point in time, performing an incremental backup.  (Remember an incremental backup requires all log files from the previous full backup be present).

When a database has had a full backup on it replication will remain healthy as long as all logs are contiguous on the source from the high log generation as stamped in previous full backup to the current log.

 

*Database seeded using Update-StorageGroupCopy where a full and incremental backup was performed on the source database.

In this example we have a database on a source server that has had a full and incremental backup performed on it (in this case an ESE online streaming backup).  The storage group replication between nodes was suspended using suspend-storagegroupcopy.  Then the update-storagegroupcopy command was used to stream the database to the target server (the –manualResume switch was also used so I could generate the header dumps).  Below is a sample header dump of a database post an update-storagegroupcopy.

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server

Version 08.01

Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating FILE DUMP mode…
         Database: 2003-MBX3-SG1-DB1.edb

        File Type: Database
   Format ulMagic: 0x89abcdef
   Engine ulMagic: 0x89abcdef
Format ulVersion: 0x620,12
Engine ulVersion: 0x620,12
Created ulVersion: 0x620,12
     DB Signature: Create time:08/09/2009 14:10:12 Rand:7948610 Computer:
         cbDbPage: 8192
           dbtime: 22745 (0x58d9)
            State: Dirty Shutdown
     Log Required: 50-50 (0x32-0x32)
    Log Committed: 0-51 (0x0-0x33)
   Streaming File: No
         Shadowed: Yes
       Last Objid: 134
     Scrub Dbtime: 0 (0x0)
       Scrub Date: 00/00/1900 00:00:00
     Repair Count: 0
      Repair Date: 00/00/1900 00:00:00
Old Repair Count: 0
  Last Consistent: (0x30,A,1C1)  08/09/2009 14:59:22
      Last Attach: (0x32,9,86)  08/09/2009 14:59:24
      Last Detach: (0x0,0,0)  00/00/1900 00:00:00
             Dbid: 1
    Log Signature: Create time:08/09/2009 14:10:08 Rand:7930576 Computer:
       OS Version: (5.2.3790 SP 2)

Previous Full Backup:
        Log Gen: 20-21 (0x14-0x15)
           Mark: (0x15,D,195)
           Mark: 08/09/2009 14:45:43

Previous Incremental Backup:
        Log Gen: 5-34 (0x5-0x22)
           Mark: (0x23,8,16)
           Mark: 08/09/2009 14:59:00

Previous Copy Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Previous Differential Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Current Full Backup:
        Log Gen: 50-51 (0x32-0x33)
           Mark: (0x33,F,29)
           Mark: 08/09/2009 15:00:05

Current Shadow copy backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

     cpgUpgrade55Format: 0
    cpgUpgradeFreePages: 0
cpgUpgradeSpaceMapPages: 0

       ECC Fix Success Count: none
   Old ECC Fix Success Count: none
         ECC Fix Error Count: none
     Old ECC Fix Error Count: none
    Bad Checksum Error Count: none
Old bad Checksum Error Count: none

Operation completed successfully in 0.78 seconds.

In this header dump you will note that Previous Incremental Backup is populated.  The low log generation is 5 (0x5) and the high log generation is 34 (0x22).

After the header dump was generated I resumed storage group copy (normally after a successful update-storagegroupcopy this is done for you automatically).  When replication is resumed the header of the database is consulted.  Here is a sample output from non-customer viewable tracing.

8933 61007400610044 2472 Cluster.Replay ReplicaInstance SetReplayState(): LowestGenerationPresent: 0
HighestGenerationPresent: 0
LowestGenerationRequired: 50
HighestGenerationRequired: 50
LastGenerationBackedUp: 34
CheckpointGeneration: 0
LogfileSignature: MJET_SIGNATURE(Random = 7930576,CreationTime = 8/9/2009 2:10:08 PM)
LatestFullBackupTime: 8/9/2009 2:45:43 PM
LatestIncrementalBackupTime: 8/9/2009 2:59:00 PM
LatestDifferentialBackupTime:
LatestCopyBackupTime:
SnapshotBackup: False
SnapshotLatestFullBackup: False
SnapshotLatestIncrementalBackup: False
SnapshotLatestDifferentialBackup:
SnapshotLatestCopyBackup:
ConsistentDatabase: False
8934 020C9AB5 2472 Cluster.Replay State CopyGenerationNumber is changing to 0 on replica 3d0099f3-ff35-46ea-8a2f-39eb50923209

In this output you will note that LastGenerationBackedUp is 34 (0x14).  This corresponds to the high log generation as stamped in previous incremental backup.  You’ll also note that the LowestGenerationRequired and HighestGenerationRequired is 50 which corresponds to the low log value stamped in current full backup. 

In this case log file copy will start at generation 34 (0x22).  Events in the application log correspond with this:

Event Type:    Information
Event Source:    MSExchangeRepl
Event Category:    Service
Event ID:    2114
Date:        8/9/2009
Time:        2:51:20 PM
User:        N/A
Computer:    2003-NODE2
Description:
The replication instance for storage group 2003-MBX32003-MBX3-SG1 has started copying transaction log files. The first log file successfully copied was generation 34.

The difference between this example and previous examples is that a full and incremental backup was performed.  The decision to start copy at log 34 (0x22), which is based on previous incremental backup, makes sense if you think about the replication service.  Remember that a database can be backed up either from the active or passive nodes.  If I did not base my log file copy on previous incremental backup that means that I would not have all the logs on my passive copy since the last incremental backup.  This would essentially prevent me from, at a later point in time, performing an incremental backup of the passive copy.  (Remember and incremental backup requires all log files from the previous incremental backup be present).

When a database has had a incremental backup on it replication will remain healthy as long as all logs are contiguous on the source from the high log generation as stamped in previous incremental backup to the current log.

 

*So what does an example look like where the necessary logs are not present.

In this example I have a database that has had a full and incremental backup performed on it.  I have suspended the storage group copy between nodes and forced log generation to occur.  I then went into the source log directory, and removed two logs from the end of the log stream. 

This is not an uncommon example.  While storage group copy is failed or suspended logs will continue to generate on the source server.  All full and incremental backups of the source will continue to be successful, but logs will not purge.  Depending on the size of your log file drive, and the amount of time that copy is suspended or failed, your log drive may begin to fill up.  This may lead to administrators manually purging the log file series.

Here is an eseutil /ml output of the source log directory showing the gap.

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server

Version 08.01

Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating FILE DUMP mode…

Verifying log files…
     Base name: e00

      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000005.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000006.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000007.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000008.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000009.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000000A.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000000B.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000000C.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000000D.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000000E.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000000F.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000010.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000011.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000012.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000013.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000014.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000015.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000016.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000017.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000018.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000019.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000001A.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000001B.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000001C.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000001D.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000001E.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000001F.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000020.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000021.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000022.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000023.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000024.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000025.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000026.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000027.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000028.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000029.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000002A.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000002B.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000002C.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000002D.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000002E.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000002F.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000030.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000031.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000032.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000033.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000034.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000035.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000036.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000037.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000038.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000039.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000003A.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000003B.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000003C.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000003D.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000003E.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000003F.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000040.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000041.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000042.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000043.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000044.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000045.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000046.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000047.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000048.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000049.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000004A.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000004B.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000004C.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000004D.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000004E.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000004F.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000050.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000051.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000052.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000053.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000054.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000055.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000056.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000057.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE0000000058.log – OK
      Missing log files: e00{00000059 – 0000005A}.log
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000005B.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000005C.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE000000005D.log – OK
      Log file: D:2003-MBX32003-MBX3-SG1-LogsE00.log – OK

Operation terminated with error -528 (JET_errMissingLogFile, Current log file missing) after 5.203 seconds.

On the passive node I have now issues my update-storagegroupcopy.  Prior to resuming storage group copy I dumped the header of the database and here is the output.

Extensible Storage Engine Utilities for Microsoft(R) Exchange Server

Version 08.01

Copyright (C) Microsoft Corporation. All Rights Reserved.

Initiating FILE DUMP mode…
         Database: 2003-MBX3-SG1-DB1.edb

        File Type: Database
   Format ulMagic: 0x89abcdef
   Engine ulMagic: 0x89abcdef
Format ulVersion: 0x620,12
Engine ulVersion: 0x620,12
Created ulVersion: 0x620,12
     DB Signature: Create time:08/09/2009 14:10:12 Rand:7948610 Computer:
         cbDbPage: 8192
           dbtime: 23064 (0x5a18)
            State: Dirty Shutdown
     Log Required: 94-94 (0x5e-0x5e)
    Log Committed: 0-95 (0x0-0x5f)
   Streaming File: No
         Shadowed: Yes
       Last Objid: 134
     Scrub Dbtime: 0 (0x0)
       Scrub Date: 00/00/1900 00:00:00
     Repair Count: 0
      Repair Date: 00/00/1900 00:00:00
Old Repair Count: 0
  Last Consistent: (0x5C,13,AC)  08/09/2009 15:20:34
      Last Attach: (0x5E,9,86)  08/09/2009 15:21:58
      Last Detach: (0x0,0,0)  00/00/1900 00:00:00
             Dbid: 1
    Log Signature: Create time:08/09/2009 14:10:08 Rand:7930576 Computer:
       OS Version: (5.2.3790 SP 2)

Previous Full Backup:
        Log Gen: 70-71 (0x46-0x47)
           Mark: (0x47,E,4F)
           Mark: 08/09/2009 15:06:34

Previous Incremental Backup:
        Log Gen: 5-75 (0x5-0x4b)
           Mark: (0x4C,8,16)
           Mark: 08/09/2009 15:17:03

Previous Copy Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Previous Differential Backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

Current Full Backup:
        Log Gen: 94-95 (0x5e-0x5f)
           Mark: (0x5F,C,198)
           Mark: 08/09/2009 15:22:10

Current Shadow copy backup:
        Log Gen: 0-0 (0x0-0x0)
           Mark: (0x0,0,0)
           Mark: 00/00/1900 00:00:00

     cpgUpgrade55Format: 0
    cpgUpgradeFreePages: 0
cpgUpgradeSpaceMapPages: 0

       ECC Fix Success Count: none
   Old ECC Fix Success Count: none
         ECC Fix Error Count: none
     Old ECC Fix Error Count: none
    Bad Checksum Error Count: none
Old bad Checksum Error Count: none

Operation completed successfully in 0.62 seconds.

Based on the header of the database I can see that incremental backup is populated.  Knowing our rules of replication – a database with an incremental backup will require all logs from the highest log generation, in this case 75 (0x4b) to current point in time be present on the source, contiguous, and able to be copied to the target. 

From our ML output you can see that I removed log 0x59 and 0x5a (decimal 89 and 90). 

I resumed storage group copy using resume-storagegroupcopy. 

The following event was logged indicating that copy started at log 75 (0x4b).

Event Type:    Information
Event Source:    MSExchangeRepl
Event Category:    Service
Event ID:    2114
Date:        8/9/2009
Time:        3:26:06 PM
User:        N/A
Computer:    2003-NODE2
Description:
The replication instance for storage group 2003-MBX32003-MBX3-SG1 has started copying transaction log files. The first log file successfully copied was generation 75.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

As logs copied between the nodes, I noticed that using get-storagegroupcopystatus the storage group in question was in a failed state.

Name                      SummaryCopySt CopyQueueLeng ReplayQueueL LastInspecte
                          atus          th            ength        dLogTime   
—-                      ————- ————- ———— ————
2003-MBX3-SG1             Failed        0             0                       
2003-MBX3-SG2             Suspended     0             0            8/9/2009 …

By reviewing the application log I noticed that our failure was due to the inability to copy the 0x59 (89) log file from the source.  This makes sense since I knowingly deleted it…and is expected since I know that all logs from the high log generation stamped in previous incremental backup to current time must be present on the source, contiguous, and able to get copied to the target.

Here is the sample error text:

Event Type:    Error
Event Source:    MSExchangeRepl
Event Category:    Service
Event ID:    2059
Date:        8/9/2009
Time:        3:27:07 PM
User:        N/A
Computer:    2003-NODE2
Description:
The log file \2003-mbx3-replc3d0099f3-ff35-46ea-8a2f-39eb50923209$E0000000059.log for 2003-MBX32003-MBX3-SG1 is missing on the production copy. Continuous replication for this storage group is blocked. If you removed the log file, please replace it. If the log is lost, the passive copy will need to be reseeded using the Update-StorageGroupCopy cmdlet in the Exchange Management Shell.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.

What is confusing about this event is that the administrator is advised to run update-storagegroupcopy…and that’s what we just ran that generated the error.  Based solely on the event, and without knowledge of what logs are required by the replication service depending on the state of the database, one could end up in an endless loop of update-storagegroupcopy and log file copy failures.

Now…how can this condition be corrected.  The condition can be corrected by running a new full backup on the database prior to running the update-storagegroup copy.  The full backup will reset the previous incremental information, and stamp values in the previous full backup section based on the logs that are not missing.

 

*So what are the log file copy rules:

If a database is offline (clean shutdown) then all logs from last consistent value to current point must be present on the source, be contiguous, and able to be copied to the target.

If a database is online but has never had a full or incremental backup then all the logs form the anchor log at the time the update-storagegroupcopy was initiated to the current log must exist on the source, must be contiguous, and must be able to be copied to the target.

If a database is online, and has had a full backup performed on it, then all logs from the high log generation, as stamped in previous full backup, must be present on the source, must be contiguous, and must be able to be copied to the target.

If a database is online and has had a full and incremental backup performed on it then all logs from the high log generation as stamped in previous incremental backup must be present on the source, must be contiguous, and must be able to be copied to the target.