Microsoft Exchange Server 2007 SP1 CCR Testing

I wanted to document the procedures for testing CCR failover scenarios. I wanted the procedure list to be short, easy, and quick to execute, so that it shows right away whether the two CCR nodes are operating properly. The procedures, along with the Exchange Management Shell cmdlets they use, are presented here. They work on both Windows Server 2003 and Windows Server 2008.

Assumptions

  • The two CCR nodes are called CCRNode1 and CCRNode2.
  • The clustered mailbox server (CMS) is called CMS.

CCR Testing

The active node has an E00.log file. The passive node has only an E00.chk file and no E00.log.
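A quick way to check this from one shell is sketched below. The log folder is a hypothetical path (the post does not name one), and the check goes through the administrative share, so it only works while the node is reachable on the network.

```powershell
# Hypothetical log folder for one storage group; substitute your own path.
$logPath = 'D$\Exchange\SG1\Logs'

foreach ($node in 'CCRNODE1', 'CCRNODE2') {
    # Test-Path returns $true when the file exists on that node.
    $hasLog = Test-Path "\\$node\$logPath\E00.log"
    $hasChk = Test-Path "\\$node\$logPath\E00.chk"
    "{0}: E00.log present = {1}, E00.chk present = {2}" -f $node, $hasLog, $hasChk
}
```

On a healthy pair, only the active node should report E00.log as present.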

Public NIC Failure Test

This test simulates the failure of the public NIC on the active node.

  1. Connect to active node – CCRNODE1.
  2. Note presence of E00.log file on CCRNODE1. Passive node does not have E00.log file.
  3. Disable public NIC.
  4. Failover Cluster Manager shows the resources going offline and moving to CCRNODE2.
  5. Execute Get-StorageGroupCopyStatus on the machine with the disabled NIC to see errors about not finding an Active Directory domain controller. No client traffic now occurs over this public network, so no users can access this node.
  6. Outlook clients can connect to mailboxes on the CMS, which is now running on CCRNODE2.
  7. Send inbound Internet mail to a test mailbox on the CMS.
  8. Enable public NIC on CCRNODE1.
  9. Issue Get-ClusteredMailboxServerStatus on CCRNODE1. Note that it indicates CCRNODE2 is active. Issue the same cmdlet from CCRNODE2 to verify that it also reports CCRNODE2 as the active node.
  10. Connect to CCRNODE2 to verify the presence of E00.log.
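The NIC and status steps above might be scripted roughly as follows. This is only a sketch: the netsh lines assume Windows Server 2008 and a network connection named "Public" (an assumed name), and the property list shown by Format-List is just a convenient subset.

```powershell
# On CCRNODE1, Windows Server 2008 only; 'Public' is an assumed connection name.
netsh interface set interface name="Public" admin=disabled

# On CCRNODE2, once the move completes: shows the CMS state and which node is active.
Get-ClusteredMailboxServerStatus -Identity CMS | Format-List State, OperationalMachines, FailedResources

# On CCRNODE1, re-enable the NIC when the failover checks are done.
netsh interface set interface name="Public" admin=enabled
```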

Note – the state of the databases may show as Initializing. This state tells the administrator that:

  • The replication service has not received a notification to copy a log.
  • The replication service has not yet copied a log.
  • The replication service has not yet inspected a log.
  • The replication service has not placed a log out for replay and determined divergence information.

To return replication to a healthy state, I usually do one of two things:

  1. Dismount and re-mount all databases. This should cause a log roll to occur and the replication service to replicate a log.
  2. Create test mailboxes in each store and send mail to them. Mail flow will create log files. If the mail flow is not heavy enough to roll the logs, an automatic log roll will occur later even though the log file is not full.
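The first option can be sketched in the shell as below. It covers mailbox databases only (a public folder database would need Get-PublicFolderDatabase instead), and the columns shown are just a convenient selection.

```powershell
# Dismount, then re-mount, every mailbox database on the CMS.
# Users lose access to each database while it is dismounted.
Get-MailboxDatabase -Server CMS | Dismount-Database -Confirm:$false
Get-MailboxDatabase -Server CMS | Mount-Database

# Check whether the copies have left the Initializing state.
Get-StorageGroupCopyStatus -Server CMS | Format-Table Name, SummaryCopyStatus, CopyQueueLength, ReplayQueueLength
```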
  11. Move the CMS back with Move-ClusteredMailboxServer -Identity "CMS" -TargetMachine CCRNODE1 -MoveComment "Move CMS back to CCRNODE1 after public NIC failure test"
  12. Verify the presence of E00.log on CCRNODE1, and that E00.log is not present on CCRNODE2 (now the passive node).
  13. Perform these three standard tests to validate that all clustering and replication is healthy:
    1. Test-ReplicationHealth
    2. Get-ClusteredMailboxServerStatus -Identity CMS
    3. Get-StorageGroupCopyStatus -Server CMS
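Since the same three checks recur after every test, it may be convenient to wrap them in a small function. The sketch below is one way to do that: Show-CcrHealth is a hypothetical helper name, not an Exchange cmdlet, and the selected properties are only a suggestion. It is meant to be run on one of the CCR nodes.

```powershell
# Hypothetical helper (not an Exchange cmdlet) that runs the three standard checks in one pass.
function Show-CcrHealth([string]$Cms = 'CMS') {
    # Cluster and replication checks for the local node.
    Test-ReplicationHealth | Format-Table Server, Check, Result, Error -AutoSize

    # CMS state and which node currently owns it.
    Get-ClusteredMailboxServerStatus -Identity $Cms | Format-List State, OperationalMachines, FailedResources

    # Per-storage-group copy status and queue lengths.
    Get-StorageGroupCopyStatus -Server $Cms | Format-Table Name, SummaryCopyStatus, CopyQueueLength, ReplayQueueLength -AutoSize
}

Show-CcrHealth -Cms CMS
```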

End of Public NIC Test

Private NIC Failure Test

This test simulates the failure of the private NIC on the active node.

  1. Verify that CCRNODE1 is the active node.
  2. Disable the private NIC on the active node.
  3. Issue the following standard test cmdlets and note the results:
    1. Get-ClusteredMailboxServerStatus -Identity CMS
    2. Get-StorageGroupCopyStatus -Server CMS
    3. Test-ReplicationHealth

Note that the first two cmdlets show normal online and healthy status. Test-ReplicationHealth shows an error for the ClusterNetwork check and an error on the private internal network, as expected.

  4. Re-enable the private NIC on the active node.
  5. Perform these three standard tests to validate that all clustering and replication is healthy:
    1. Test-ReplicationHealth
    2. Get-ClusteredMailboxServerStatus -Identity CMS
    3. Get-StorageGroupCopyStatus -Server CMS
  6. If the storage group does not come online, run Stop-ClusteredMailboxServer and then Start-ClusteredMailboxServer.
  7. Check Failover Cluster Manager for resources that may need to be brought online manually.
  8. Perform the three standard test cmdlets once again to verify correct function.
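Two parts of this test lend themselves to a short sketch: picking out the failed replication health checks while the private NIC is down, and restarting the CMS if a storage group does not come back online. The filter assumes that a successful check renders its Result as "Passed", and the -StopReason text is free-form.

```powershell
# While the private NIC is disabled: list only the checks that did not pass.
# Assumption: a successful check renders its Result as "Passed".
Test-ReplicationHealth | Where-Object { "$($_.Result)" -ne 'Passed' } | Format-List Server, Check, Result, Error

# If a storage group does not come online after the NIC is re-enabled, restart the CMS.
Stop-ClusteredMailboxServer -Identity CMS -StopReason "Recover after private NIC test" -Confirm:$false
Start-ClusteredMailboxServer -Identity CMS
```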

End of Private NIC Test

Database Failure Test

This test simulates the failure of one of the mailbox databases on the active node.

  1. Verify CCRNODE1 is active.
  2. Dismount the Store1.edb database on the active node to simulate a database failure.
  3. The passive node CCRNODE2 will not provide service for this failed database. Recovery is a manual process in which the administrator restores the database on the original active node. To simulate a real database failure, delete (or move) the SG1 log and database files.
  4. A reseed is required. Suspend the copy process by executing Suspend-StorageGroupCopy -Identity "CMS\sg1" from the active node.
  5. Mount the database for SG1. Note the message regarding the creation of a new empty database.
  6. Execute Update-StorageGroupCopy -Identity "CMS\sg1" -DeleteExistingFiles from the passive node. Note the warning message about possible reseeding of the checkpoint file and choose to proceed. Note the second warning message about possible reseeding of the database and choose to proceed.
  7. Perform these three standard tests to validate that all clustering and replication is healthy:
    1. Test-ReplicationHealth
    2. Get-ClusteredMailboxServerStatus -Identity CMS
    3. Get-StorageGroupCopyStatus -Server CMS
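The cmdlet sequence for this test might look like the sketch below. The storage group identity "CMS\sg1" comes from the steps above, while "Store1" as the database name is an assumption based on the Store1.edb file name. The comments note which node each command is meant for.

```powershell
# Active node: stop log shipping for the storage group before repairing it.
Suspend-StorageGroupCopy -Identity "CMS\sg1" -SuspendComment "Database failure test"

# Active node: with the files gone, mounting prompts about creating a new, empty database.
# 'Store1' is an assumed database name.
Mount-Database -Identity "CMS\sg1\Store1"

# Passive node: remove the old copy's files and reseed the copy.
Update-StorageGroupCopy -Identity "CMS\sg1" -DeleteExistingFiles

# Either node: confirm the copy status for the storage group.
Get-StorageGroupCopyStatus -Identity "CMS\sg1" | Format-List Name, SummaryCopyStatus, CopyQueueLength, ReplayQueueLength
```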


End of Database Failure Test

Failed Information Store Service Test

This test simulates the failure of the Information Store service on the active node.

  1. On CCRNODE1 (the active node), set the startup type of the Information Store service to Disabled, and then stop the service. If the service were simply stopped manually, it would be restarted automatically, no change to the service or nodes would occur, and service would resume normally. Setting the startup type to Disabled prevents the automatic restart.
  2. Fail over to CCRNODE2 to fully recover from the failed service, using Move-ClusteredMailboxServer -Identity CMS -TargetMachine CCRNODE2 -MoveComment "Comment here"
  3. Correct the failed service on CCRNODE1 and restart it.
  4. Perform these three standard tests to validate that all clustering and replication is healthy:
    1. Test-ReplicationHealth
    2. Get-ClusteredMailboxServerStatus -Identity CMS
    3. Get-StorageGroupCopyStatus -Server CMS

 

  5. Fail back to CCRNODE1, if desired, with Move-ClusteredMailboxServer -Identity CMS -TargetMachine CCRNODE1 -MoveComment "Comment here".
  6. Perform these three standard tests to validate that all clustering and replication is healthy:
    1. Test-ReplicationHealth
    2. Get-ClusteredMailboxServerStatus -Identity CMS
    3. Get-StorageGroupCopyStatus -Server CMS
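A rough sketch of the service failure and the move is shown below. MSExchangeIS is the service name of the Information Store. The Manual startup type restored at the end is an assumption; use whatever startup type the service had before the test.

```powershell
# On CCRNODE1 (active node): prevent the automatic restart, then stop the service.
Set-Service -Name MSExchangeIS -StartupType Disabled
Stop-Service -Name MSExchangeIS -Force

# Move the CMS to the other node.
Move-ClusteredMailboxServer -Identity CMS -TargetMachine CCRNODE2 -MoveComment "Information Store service failure test"

# Later, on CCRNODE1: restore the startup type the service had before the test.
# 'Manual' is an assumption, not a value taken from the post.
Set-Service -Name MSExchangeIS -StartupType Manual
```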

End of Failed Information Store Service Test

Active Node Failure Test

This test simulates the failure of the active node.

  1. Take CCRNODE1 active node offline completely.
  2. Note that the passive node CCRNODE2 brings most or all resources for the CMS online. If a storage group does not come online, try performing a Restore-StorageGroupCopy for that storage group. (The public folder database will not come online; see step 4 below.)
  3. If the previous cmdlet fails, execute Stop-ClusteredMailboxServer and then Start-ClusteredMailboxServer.
  4. Verify in Failover Cluster Manager that the resources are online, and manually attempt to bring online any that are down. The public folder storage group will never mount until the original node is brought back online. For more information on public folders and CCR, see http://technet.microsoft.com/en-us/library/bb123996.aspx.
  5. To determine if databases have come online and have been mounted, execute Get-MailboxDatabase -status | ft name,storagegroup,mounted. This EMS cmdlet will return the database status much faster than using EMC.
  6. When the failed node comes online, check the following and make necessary corrections:
    1. Bring online the public folder resource/storage group.
    2. For any other database resources stuck in the Initializing state, run Stop-ClusteredMailboxServer and then Start-ClusteredMailboxServer to bring them online.
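The checks in this test can be sketched as follows. The storage group identity "CMS\sg1" and the database name "Store1" are example values, not names this test prescribes.

```powershell
# List databases on the CMS that are not mounted.
Get-MailboxDatabase -Server CMS -Status | Where-Object { -not $_.Mounted } | Format-Table Name, StorageGroup, Mounted

# If a storage group did not come online on CCRNODE2, make its copy mountable, then mount it.
# 'CMS\sg1' and 'Store1' are example names.
Restore-StorageGroupCopy -Identity "CMS\sg1"
Mount-Database -Identity "CMS\sg1\Store1"
```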

End of Active Node Failure Test

     

Mark Myers

Senior Consultant


Copyright © MMCUG - Midwest Messaging and Collaboration User Group 2008