Monthly Archives: June 2009

Setting up a Windows 2008 cluster where nodes reside in a disjoint DNS namespace…

When attempting to establish the cluster services on nodes that use a disjoint DNS namespace, you may encounter the following errors:

Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: Date_Time
Event ID: 1127
Task Category: None
Level: Error
Keywords:
User: SYSTEM
Computer: ComputerName
Description:
Cluster Network interface InterfaceName for cluster node NodeName on network NetworkName failed. Run the Validate a Configuration wizard to check your network configuration.

Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: Date_Time
Event ID: 1207
Task Category: Network Name Resource
Level: Error
Keywords:
User: SYSTEM
Computer: Computer-name.domain.com
Description:
Cluster network name resource ‘Cluster Name’ cannot be brought online. The computer object associated with the resource could not be updated in domain ‘disjoined.domain.com’ for the following reason: Unable to update password for computer account.
The text for the associated error code is: The password does not meet the password policy requirements. Check the minimum password length, password complexity and password history requirements.
The cluster identity ‘Cluster-name$’ may lack permissions required to update the object. Please work with your domain administrator to ensure that the cluster identity can update computer objects in the domain.
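For context, a disjoint namespace is one in which a computer's primary DNS suffix does not match the DNS name of the Active Directory domain it belongs to. Event 1207 above shows exactly that: the node is Computer-name.domain.com, while the domain is disjoined.domain.com. The sketch below is an illustrative Python check of that string comparison only. The function names are hypothetical, and it does not query DNS, Active Directory, or the cluster.

```python
def dns_suffix(fqdn: str) -> str:
    """Return the DNS suffix of an FQDN (everything after the host label)."""
    host = fqdn.rstrip(".").lower()
    _, _, suffix = host.partition(".")
    return suffix


def is_disjoint(computer_fqdn: str, ad_domain: str) -> bool:
    """Hypothetical helper: True when the computer's DNS suffix differs
    from the DNS name of its Active Directory domain."""
    return dns_suffix(computer_fqdn) != ad_domain.rstrip(".").lower()


if __name__ == "__main__":
    # Names taken from event 1207 above.
    print(is_disjoint("Computer-name.domain.com", "disjoined.domain.com"))  # True
```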

If you see errors similar to these, check out the following two links, which may apply.

http://technet.microsoft.com/en-us/library/cc755926(WS.10).aspx

http://support.microsoft.com/kb/952247/en-us

Exchange 2007 SP1 CCR / LCR / SCR – Transaction Log Roll

Recently there have been some questions about transaction log rolling and continuous replication.  Often these questions concern storage group copy status showing an initializing state (http://blogs.technet.com/timmcmic/archive/2009/01/26/get-storagegroupcopystatus-initializing.aspx).

Under normal circumstances, the only time a log rolls is when it has reached a log full condition.  If the server is being used, this is not a problem, as logs roll naturally while the server processes activity.

There are times, though, when the server is relatively idle.  The current log generation then does not receive enough transaction activity to cause it to roll over.  This is where “transaction log roll” is important.  If the current log file (ENN.log) contains a durable (or hard) commit, and that log is not filled within a period of time, it will be rolled over and shipped to the other side.  (This is not an immediate process.  If we rolled a log over every time there was a durable (hard) commit, we would generate a ton of logs.)  The article referenced above gives examples of how to calculate the time at which a log would roll over should it contain a durable (hard) commit.  It also contains the following text highlighting this behavior:

“The log roll mechanism does not generate transaction logs in the absence of user or other database activity. In fact, log roll is designed to occur only when there is a partially filled log.”

This information is important to us for several reasons.

The first is a common question: if logs roll, why do my storage groups stay initializing for hours at a time?  The answer is that the current log does not contain a durable commit.  If you restart the replication service, or manually suspend and resume a replication instance, the first replication state you will encounter is initializing.  We remain in initializing until a log is generated, copied, inspected, and put out for replay with divergence information determined.  If no durable (hard) commit exists in the source log stream, the logs may not roll over until there is a durable (hard) commit or user activity.  This means replication can stay in an initializing state for a while.  If this is a test environment, my suggestion is to simply send mail, dismount the source databases, etc.  In production, I’ve seen people script email to test mailboxes on a schedule, with a test mailbox located in each database.  This causes a durable commit, which eventually results in the log rolling over and being shipped to the other side.

The second reason is that log roll can cause churn in the log file stream that does not appear normal.  If you reference the link above, you can see that an idle storage group could generate up to 960 log files a day.  This is especially true if the storage group contains some type of system mailbox (which Exchange accesses, causing a durable commit) or test mailboxes that a user is accessing.  In either scenario, there may not be enough load to make the log roll naturally, so Exchange rolls the log for you after a certain time.  This causes some concern, especially when looking at the log file drive on a test server and wondering why so many logs were generated.  In other words, there wasn’t enough traffic to generate 960 MB of logs, which is probably correct.  But there was enough traffic to put a durable commit into each of those 960 logs, so we rolled and shipped them before they were full in an attempt to keep both sides up to date.
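The arithmetic behind that concern is simple.  Exchange 2007 transaction log files are a fixed 1 MB each, so 960 rolled logs occupy about 960 MB on disk however little each one actually holds.  Also, 960 rolls a day works out to one roll every 90 seconds on average.  A small Python sketch (the function name is hypothetical):

```python
SECONDS_PER_DAY = 24 * 60 * 60
LOG_FILE_MB = 1  # each Exchange 2007 transaction log file is 1 MB


def daily_log_churn(logs_per_day: int) -> tuple[float, int]:
    """Return (average seconds between log rolls, MB of log files per day)."""
    return SECONDS_PER_DAY / logs_per_day, logs_per_day * LOG_FILE_MB


interval_s, disk_mb = daily_log_churn(960)
print(f"one roll every {interval_s:.0f} s, about {disk_mb} MB of logs per day")
```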

The third reason I pointed this out is that there seems to be confusion about when log roll should occur.  Some people believe log roll should occur no matter what.  As indicated above, it should only occur if the log contains a durable (hard) commit.

There are other operations besides user activity or a durable (hard) commit which will cause the current transaction log to roll:

  • An attachment record is created in a log when a database is mounted.
  • A VSS backup occurs of the active node.
  • A VSS backup occurs of the passive node.
  • An online streaming backup occurs of the active node.
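To pull these rules together, here is a toy Python model of when the current log (ENN.log) rolls.  It is an illustration under stated assumptions, not how ESE is implemented.  The class, the event names, and the `roll_interval_s` parameter are hypothetical, and the real roll interval should be calculated as described in the article referenced above.

```python
from dataclasses import dataclass
from typing import Optional

LOG_FILE_BYTES = 1024 * 1024  # Exchange 2007 log files are 1 MB

# Hypothetical names for the non-user operations listed above.
OTHER_ROLL_TRIGGERS = {
    "database_mount",           # attachment record written at mount
    "vss_backup_active",
    "vss_backup_passive",
    "streaming_backup_active",
}


@dataclass
class CurrentLog:
    bytes_used: int               # how much of ENN.log has been written
    has_durable_commit: bool      # does it contain a durable (hard) commit?
    seconds_unfilled: float       # how long it has sat without filling up


def should_roll(log: CurrentLog, roll_interval_s: float,
                event: Optional[str] = None) -> bool:
    """Toy model: decide whether the current log should be rolled now."""
    if log.bytes_used >= LOG_FILE_BYTES:
        return True   # normal case: the log is full
    if event in OTHER_ROLL_TRIGGERS:
        return True   # a mount or backup forces a roll
    # Log roll proper: a partially filled log with a durable commit that has
    # waited at least the roll interval.  No durable commit means no roll.
    return log.has_durable_commit and log.seconds_unfilled >= roll_interval_s
```

The key point is the last line: without a durable commit, time alone never triggers a roll.  That is why an idle storage group can sit in an initializing state.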

I hope everyone finds this information helpful.

When to use restore-storagegroupcopy with the -force switch and standby continuous replication (SCR)

Recently there was a lively internal debate regarding how to use restore-storagegroupcopy and the -force switch.

The documentation regarding the restore-storagegroupcopy command can be found at http://technet.microsoft.com/en-us/library/aa996024.aspx.

According to the TechNet documentation:

“The Force parameter can be used when the task is run programmatically and prompting for administrative input is inappropriate. If Force is not provided in the cmdlet, administrative input is prompted. If Force is provided in the cmdlet, but the value is omitted, its default value is $true. When the Restore-StorageGroupCopy cmdlet is run to make an SCR target viable for mounting, the Force parameter must be included when the SCR source is not available.”

You’ll notice in this text that -force is required for standby continuous replication when the SCR source is not available.

So the first question is what constitutes the source being unavailable.  In the most general terms, the source is unavailable when the shares where the log files reside cannot be reached.  In that case restore-storagegroupcopy cannot copy the remaining logs between machines.

For Windows 2003-based sources and Windows 2008 non-shared storage clusters, the shares generally become unavailable when the entire machine is offline.  For Windows 2008 shared storage clusters, the shares may be unavailable because their corresponding file server resources are offline in the clustered mailbox server group.  For example, stop-clusteredmailboxserver was issued, taking the entire CMS offline, including the file server resources.  Of course, there are other reasons the shares may be unavailable, such as network or hardware issues.

The reason I point this out is that if the source is available and -force is used, we will not copy the delta logs over to the SCR target.  We simply mark the databases mountable.  As a result, the database mount fails, indicating that log files necessary for recovery are not present.  Manual recovery using eseutil /r /a would have to be performed in order for the databases to mount.

The second question is: how can I make sure this does not happen to me?  The answer is simple.  If you run restore-storagegroupcopy without -force, we will attempt to copy the delta logs.  Should the source be unavailable, the copy fails with a meaningful message indicating that the delta logs cannot be copied and -force is necessary.  After receiving this error, you can repeat the restore-storagegroupcopy, this time specifying -force.  Because the source is unavailable, the logs will not be copied, but the databases will be marked mountable.

Rule of Thumb:  First try restore-storagegroupcopy, and only run restore-storagegroupcopy -force if the error text of the command tells you to do so.
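The rule of thumb can be expressed as a small Python model of the behavior described above.  Nothing here calls Exchange.  `restore_storage_group_copy`, `ForceRequiredError`, and `RestoreResult` are hypothetical stand-ins for the cmdlet, its error, and its effect.  They are used only to show why trying without -force first is the safe order.

```python
from dataclasses import dataclass


class ForceRequiredError(Exception):
    """Stands in for the cmdlet error that tells you to use -Force."""


@dataclass
class RestoreResult:
    delta_logs_copied: bool
    marked_mountable: bool


def restore_storage_group_copy(source_available: bool,
                               force: bool = False) -> RestoreResult:
    """Hypothetical model of Restore-StorageGroupCopy as described in this post."""
    if force:
        # -Force never checks the source, so no log copy is attempted.
        return RestoreResult(delta_logs_copied=False, marked_mountable=True)
    if not source_available:
        raise ForceRequiredError("Delta logs cannot be copied; use -Force.")
    return RestoreResult(delta_logs_copied=True, marked_mountable=True)


def activate(source_available: bool) -> RestoreResult:
    """Rule of thumb: try without -Force; retry with it only when told to."""
    try:
        return restore_storage_group_copy(source_available)
    except ForceRequiredError:
        return restore_storage_group_copy(source_available, force=True)
```

With the source available, `activate` copies the delta logs.  Only when the source is down does it fall back to -force, accepting possible data loss.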

===========================================================

Example of successful activation using restore-storagegroupcopy where the shares are available (no -force used).

===========================================================

Environment:  Source cluster / target standalone.

The source clustered mailbox server was stopped using stop-clusteredmailboxserver.

An eseutil /ml was run against the source log directory; the end of the output can be seen here.  You will see that the log stream is complete through E01.log.

      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000070.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000071.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000072.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000073.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000074.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000075.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000076.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E01.log - OK

No damaged log files were found.

Operation completed successfully in 14.921 seconds.

Prior to running restore-storagegroupcopy, an eseutil /ml was run against the logs on the SCR target.  You will note that the same logs are present with the exception of E01.log.  (This is expected.  Even when the source CMS is shut down gracefully, the last log in the series is not copied to the SCR target.)

      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000070.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000071.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000072.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000073.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000074.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000075.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000076.log - OK

No damaged log files were found.

Operation completed successfully in 7.63 seconds.

At this point the shares on the source are available and the mailbox stores are dismounted.  A restore-storagegroupcopy -standbymachine <machine> was run and completed without error.  The following events were noted in the application log.

Log Name:      Application
Source:        MSExchangeRepl
Date:          4/30/2009 8:20:16 AM
Event ID:      2114
Task Category: Service
Level:         Information
Keywords:      Classic
User:          N/A
Computer:      MBX-3.exchange.msft
Description:
The replication instance for storage group MBX-2\MBX-2-SG2 has started copying transaction log files. The first log file successfully copied was generation 119.

Log Name:      Application
Source:        MSExchangeRepl
Date:          4/30/2009 8:20:16 AM
Event ID:      2085
Task Category: Action
Level:         Information
Keywords:      Classic
User:          N/A
Computer:      MBX-3.exchange.msft
Description:
The Restore-StorageGroupCopy operation on MBX-2\MBX-2-SG2 was successful. All logs were successfully copied.
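The generation in event 2114 can be cross-checked against the eseutil listings.  Numbered log files are named with the base name (E01) followed by the generation as eight hex digits.  The current log, E01.log, becomes the next generation when it is rolled.  The last numbered log, E0100000076.log, is generation 0x76 = 118, which is consistent with the E01.log copied during the restore being generation 119.  Below is a minimal Python sketch of the conversion.  The function name is hypothetical, and the regular expression assumes the three-character base name plus eight hex digits seen in these listings.

```python
import ntpath
import re

# Base name (e.g. E01) followed by an 8-digit hex generation number.
NUMBERED_LOG = re.compile(r"^(E[0-9A-F]{2})([0-9A-F]{8})\.log$", re.IGNORECASE)


def log_generation(path: str) -> int:
    """Return the generation number encoded in a numbered log file name."""
    name = ntpath.basename(path)  # handles Windows paths on any OS
    match = NUMBERED_LOG.match(name)
    if match is None:
        raise ValueError(f"not a numbered log file: {name}")
    return int(match.group(2), 16)


last = log_generation(r"F:\MBX-2\MBX-2-SG2-Logs\E0100000076.log")
print(last, last + 1)  # 118 119
```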

I followed up with an eseutil /ml of the target log directory.  You will note that after restore-storagegroupcopy -standbymachine <machine>, E01.log is now present.  It was copied as part of the restore process.

      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000071.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000072.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000073.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000074.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000075.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000076.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E01.log - OK

No damaged log files were found.

Operation completed successfully in 7.250 seconds.

The last operation was to mount the databases.  This time the databases mounted successfully; eseutil /r /a was not required.

Log Name:      Application
Source:        MSExchangeIS Mailbox Store
Date:          4/30/2009 8:25:06 AM
Event ID:      9523
Task Category: General
Level:         Information
Keywords:      Classic
User:          N/A
Computer:      MBX-3.exchange.msft
Description:
The Microsoft Exchange Database "MBX-3-SG2\MBX-3-SG2-DB1" has been started.

Database File: G:\MBX-2\MBX-2-SG2-Database\MBX-2-SG2-DB1.edb
Transaction Logfiles: F:\MBX-2\MBX-2-SG2-Logs
Base Name (logfile prefix): E01
System Path: E:\MBX-2\MBX-2-SG2-System

===========================================================

===========================================================

Example of successful activation using restore-storagegroupcopy where the shares are not available (-force used).

===========================================================

Environment:  Source cluster / target standalone.

The clustered nodes comprising the source solution were shut down, making them completely unavailable.

After issuing stop-clusteredmailboxserver, and prior to shutting the nodes down, an eseutil /ml was run against the log directory.  You will see that the log stream is complete through E01.log.

Verifying log files…
     Base name: e01

      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000092.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000093.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000094.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E01.log - OK

No damaged log files were found.

Operation completed successfully in 0.78 seconds

Prior to running the restore-storagegroupcopy, an eseutil /ml was run against the logs on the SCR target.  You will note that the E01.log is not present.

Verifying log files…
     Base name: e01

      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000092.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000093.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000094.log - OK

No damaged log files were found.

Operation completed successfully in 0.64 seconds.
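Comparing the two listings by eye works for three logs.  For longer runs, a short Python sketch can diff the eseutil /ml output from the source and the SCR target and report which files the target lacks.  It assumes each line has the "Log file: <path> - OK" shape shown here and that paths contain no spaces.  The function names are hypothetical.

```python
import ntpath
import re

# Matches lines such as "Log file: F:\...\E0100000092.log - OK".
LOG_LINE = re.compile(r"Log file:\s*(\S+)\s*[-\u2013]\s*OK")


def listed_logs(eseutil_output: str) -> set[str]:
    """Return the log file names that eseutil /ml reported as OK."""
    return {ntpath.basename(path) for path in LOG_LINE.findall(eseutil_output)}


def missing_on_target(source_output: str, target_output: str) -> list[str]:
    """Log files present on the source but absent from the SCR target."""
    return sorted(listed_logs(source_output) - listed_logs(target_output))


source = r"""
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000094.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E01.log - OK
"""
target = r"""
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000094.log - OK
"""
print(missing_on_target(source, target))  # ['E01.log']
```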

At this point, restore-storagegroupcopy -standbymachine <MACHINE> was issued.  The following error was returned, as expected, since the source is no longer available.

[PS] G:\>Restore-StorageGroupCopy -Identity MBX-2\MBX-2-SG2 -StandbyMachine MBX-3
Restore-StorageGroupCopy : Restore failed to verify if the database on ‘MBX-2’ is mounted. Verify that the database is dismounted and then use the -Force parameter to restore the storage group copy.
At line:1 char:25
+ Restore-StorageGroupCopy  <<<< -Identity MBX-2\MBX-2-SG2 -StandbyMachine MBX-3

After receiving the error stating that -force was necessary, the command was re-run as restore-storagegroupcopy -standbymachine <MACHINE> -force.  The following information was presented in the Exchange Management Shell window:

[PS] G:\>Restore-StorageGroupCopy -Identity MBX-2\MBX-2-SG2 -StandbyMachine MBX-3 -force
WARNING: Performing a Restore-StorageGroupCopy operation on storage group
‘MBX-2-SG2’ with the Force option. Data loss is expected for this storage group.

The following events were noted in the application log:

Log Name:      Application
Source:        MSExchangeRepl
Date:          5/3/2009 10:37:39 AM
Event ID:      2139
Task Category: Action
Level:         Information
Keywords:      Classic
User:          N/A
Computer:      MBX-3.exchange.msft
Description:
The forced Restore-StorageGroupCopy operation on MBX-2\MBX-2-SG2 was successful. However, there may be some data loss.

After the command completed successfully, an eseutil /ml was performed against the log stream.  You will note that E01.log is not present in the target log directory.  The remaining logs could not be copied because the SCR source was unavailable.

Verifying log files…
     Base name: e01

      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000092.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000093.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000094.log - OK

No damaged log files were found.

Operation completed successfully in 0.64 seconds.

At this time the database was successfully mounted as indicated by the following event in the application log.

Log Name:      Application
Source:        MSExchangeIS Mailbox Store
Date:          5/3/2009 10:44:06 AM
Event ID:      9523
Task Category: General
Level:         Information
Keywords:      Classic
User:          N/A
Computer:      MBX-3.exchange.msft
Description:
The Microsoft Exchange Database "MBX-3-SG2\MBX-3-SG2-DB1" has been started.

Database File: G:\MBX-2\MBX-2-SG2-Database\MBX-2-SG2-DB1.edb
Transaction Logfiles: F:\MBX-2\MBX-2-SG2-Logs
Base Name (logfile prefix): E01
System Path: E:\MBX-2\MBX-2-SG2-System

===========================================================

===========================================================

Example of successful activation using restore-storagegroupcopy where the shares are available (-force used).

===========================================================

Environment:  Source cluster / target standalone.

The source clustered mailbox server was stopped using stop-clusteredmailboxserver.

An eseutil /ml was run against the source log directory; the end of the output can be seen here.  You will see that the log stream is complete through E01.log.

      Log file: F:\MBX-2\MBX-2-SG2-Logs\E010000007A.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E010000007B.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E010000007C.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E010000007D.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E010000007E.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E01.log - OK

No damaged log files were found.

Operation completed successfully in 16.219 seconds.

Prior to running restore-storagegroupcopy, an eseutil /ml was run against the logs on the SCR target.  You will note that the same logs are present with the exception of E01.log.  (This is expected.  Even when the source CMS is shut down gracefully, the last log in the series is not copied to the SCR target.)

      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000079.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E010000007A.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E010000007B.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E010000007C.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E010000007D.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E010000007E.log - OK

No damaged log files were found.

Operation completed successfully in 0.359 seconds.

At this point, restore-storagegroupcopy was run with -force.  Please note:  The source shares are available, so -force is NOT NECESSARY.  Here is the Exchange Management Shell output.

[PS] C:\Windows\System32>Restore-StorageGroupCopy -Identity MBX-2\MBX-2-SG2 -StandbyMachine MBX-3 -force

WARNING: Performing a Restore-StorageGroupCopy operation on storage group
‘MBX-2-SG2’ with the Force option. Data loss is expected for this storage
group.

The command completed successfully as indicated by returning to the Exchange Management Shell prompt without error.  The following event was noted in the application log.

Log Name:      Application
Source:        MSExchangeRepl
Date:          5/1/2009 8:29:41 AM
Event ID:      2139
Task Category: Action
Level:         Information
Keywords:      Classic
User:          N/A
Computer:      MBX-3.exchange.msft
Description:
The forced Restore-StorageGroupCopy operation on MBX-2\MBX-2-SG2 was successful. However, there may be some data loss.

As a follow-up, eseutil /ml was run against the logs on the SCR target machine.  You will note that E01.log was not copied, even though restore-storagegroupcopy -force completed successfully.

      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000077.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000078.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E0100000079.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E010000007A.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E010000007B.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E010000007C.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E010000007D.log - OK
      Log file: F:\MBX-2\MBX-2-SG2-Logs\E010000007E.log - OK

No damaged log files were found.

Operation completed successfully in 0.187 seconds.

At this time a database mount attempt was performed, and failed with the following events noted in the application log.

Log Name:      Application
Source:        MSExchangeIS
Date:          5/1/2009 8:32:13 AM
Event ID:      9518
Task Category: General
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      MBX-3.exchange.msft
Description:
Error Current log file missing starting Storage Group /DC=com/DC=domain/DC=domain/CN=Configuration/CN=Services/CN=Microsoft Exchange/CN=Organization/CN=Administrative Groups/CN=Exchange Administrative Group (FYDIBOHF23SPDLT)/CN=Servers/CN=MBX-3/CN=InformationStore/CN=MBX-3-SG2 on the Microsoft Exchange Information Store.
Storage Group – Initialization of Jet failed.

Log Name:      Application
Source:        ESE
Date:          5/1/2009 8:32:13 AM
Event ID:      455
Task Category: Logging/Recovery
Level:         Error
Keywords:      Classic
User:          N/A
Computer:      MBX-3.exchange.msft
Description:
MSExchangeIS (2984) MBX-3-SG2: Error -1811 (0xfffff8ed) occurred while opening logfile f:\MBX-2\MBX-2-SG2-Logs\E01.log.

The -1811 error translates to:

# for decimal -1811 / hex 0xfffff8ed
  JET_errFileNotFound
# /* File not found */
  JET_errFileNotFound
# /* File not found */
  JET_errFileNotFound
# /* File not found */
# 3 matches found for "-1811"
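Event 455 reports the same error both as -1811 and as 0xfffff8ed.  They are one 32-bit value read as signed (two's complement) and as unsigned, which a quick Python sketch confirms.  The helper names are hypothetical.

```python
def to_unsigned_hex(code: int) -> str:
    """Render a (negative) JET error code as unsigned 32-bit hex."""
    return f"0x{code & 0xFFFFFFFF:08x}"


def to_signed(hex_text: str) -> int:
    """Interpret a 32-bit hex value as a signed (two's complement) integer."""
    value = int(hex_text, 16)
    return value - (1 << 32) if value & 0x80000000 else value


print(to_unsigned_hex(-1811))     # 0xfffff8ed
print(to_signed("0xfffff8ed"))    # -1811
```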

In this case -force was used improperly, so the remaining logs were not copied to the SCR target.  The databases could be mounted after manually recovering them with eseutil /r /a, or after manually copying the missing logs to the SCR target.

This behavior is BY DESIGN.  The -force switch does not check whether the SCR source is available; therefore, no attempt is made to copy log files.

===========================================================

Evicting an Exchange 2007 clustered node.

There may be times, in either Windows 2003 or Windows 2008, when it becomes necessary to evict a clustered node that has Exchange 2007 installed on it.

Under normal circumstances, evicting a clustered node is a benign procedure.  When the node has Exchange 2007 installed on it, however, special precautions must be taken.

When Exchange 2007 is installed on a clustered node, a special DLL (exres.dll) is registered with the cluster service.  This DLL contains the cluster extensions that define the System Attendant, Information Store, and Database Instance clustered resources.  You can see the resource definitions in the cluster registry hive (HKLM -> System -> Cluster -> ResourceTypes).

If you select one of the Exchange resource types, you will see that the DLL that defines it (DLLName) is exres.dll.

The resource types that are registered in a cluster are local to each node.  When a node is evicted from a cluster, the local configuration is destroyed.  If the node is joined back to an existing cluster, the Exchange resource types are no longer registered.  This will effectively prevent this node from participating in the cluster.

In terms of Exchange, there is no manual way to re-register the cluster extensions.  Exchange 2007 does not have a reinstall procedure.  If you attempt to rerun setup for the passive mailbox role, an error is generated indicating that the role is already installed (because technically it is).  In some cases you can uninstall the mailbox role successfully.  Where the uninstall is not successful, however, there are no manual removal steps that can be used.  The worst-case scenario is that the entire operating system must be rebuilt in order to install Exchange again.

To avoid this, use the following steps to remove Exchange before evicting a clustered node:

1)  Run setup.com /mode:uninstall /roles:mt,mb

(Note:  MT is necessary to remove the management tools.  By default, any role install also includes the management tools, but any uninstall applies only to the roles specified.  To perform a complete removal, you must specify both the mailbox role and the management tools role.)

2)  Evict the node from the cluster.

3)  Re-join the node to the cluster.

4)  Run setup.com /mode:install /roles:mailbox to re-establish the passive node mailbox role installation.

Testing SCR in a production environment.

Occasionally I have an opportunity to work with our Exchange MVPs.  Neil Hobson, one of our Exchange MVPs, has recently started a series on how to test SCR in a production environment. 

If you have the time I would suggest checking it out!

Part 1:  http://www.msexchange.org/articles_tutorials/exchange-server-2007/high-availability-recovery/testing-scr-production-environment-part1.html

Part 2:  http://www.msexchange.org/articles_tutorials/exchange-server-2007/high-availability-recovery/testing-scr-production-environment-part2.html

Look for parts 3 and 4 in the upcoming weeks.

Tim