Windows NLB and Application Level health check

Recently I came across a scenario where the requirement was an Active/Passive Windows NLB cluster. If the active node experiences an issue, all application-related services must be stopped on that node and started on the passive node. To achieve this failover we need a health check. Windows NLB handles network-level failures, such as the active node losing network connectivity, being shut down, or crashing. In all these cases Windows NLB starts sending connections to the passive node. However, what if an application service crashes, hangs, or stops servicing connections? Does Windows NLB offer any option to initiate failover based on an application-level health check?

Windows NLB is a great feature for network load balancing, but it has not evolved much since the Windows 2000 days and it does not address these questions directly. However, Microsoft does provide some sample script templates for monitoring application-level health, which can be enhanced as required.

I also came across another blog by DAVID TOSOFF, in which he provides a brilliant script that can be configured for application-level health checks and customized to work with any application behind NLB. What's more, it can also be configured to run as a service.

However, all I needed was a script that checks whether certain services are running on the active node. If any of those services is not running, it should stop all the services on the active node, make it passive, start the services on the passive node, and make that node active. After gathering pieces of code from different blogs and forums, I came up with the PowerShell script below to accomplish this task.

In addition to the service health check and failover, the script also provides a basic health check and recovery for the NLB cluster itself.

########################### NLB Health Check ##############################
#Define Nodes
$node1 = "Node1.Contoso.Com"
$node2 = "Node2.Contoso.Com"
#get NLB status on NLB Nodes
$Node1status = Get-WmiObject -Class MicrosoftNLB_Node -computername $node1 -namespace root\MicrosoftNLB |  where {$_.ComputerName -eq $node1} | Select-Object __Server, statuscode
$Node2status = Get-WmiObject -Class MicrosoftNLB_Node -computername $node2 -namespace root\MicrosoftNLB |  where {$_.ComputerName -eq $node2} | Select-Object __Server, statuscode

Function HealthCheck ([String]$Active, [String]$Passive)
{
#Create an array of all services running
$GetService = get-service -ComputerName $Active
#Write-Host "Checking Service on $Active " -ForegroundColor Green
#Create a subset of the previous array for services you want to monitor
$ServiceArray = "Service1","Service2","Service3","Service4";
#Find any monitored service that is hung or stopped
foreach ($Service in $GetService)
{
    foreach ($srv in $ServiceArray)
    {
        if ($Service.name -eq $srv)
        {
            #check if a service is hung
            if ($Service.status -eq "StopPending")
            {
            #email to notify if a service is hung
            #Send-Mailmessage -to admin@domain.com -Subject "$srv is hung on $Active" -from admin@domain.com -Body "The $srv service was found hung" -SmtpServer smtp.domain.com
            #look up the service's process on the active node (not the local machine) and terminate it there
            $servicePID = (gwmi win32_service -ComputerName $Active -Filter "Name='$srv'").ProcessId
            (gwmi win32_process -ComputerName $Active -Filter "ProcessId=$servicePID").Terminate()
            }
            # check if a service is stopped
            elseif ($Service.status -eq "Stopped")
            {
            #email to notify if a service is down
            #Send-Mailmessage -to admin@domain.com -Subject "$srv is stopped on $Active" -from admin@domain.com -Body "The $srv service was found stopped" -SmtpServer smtp.domain.com
            #Write-Host "$srv is stopped on $Active" -ForegroundColor Red
            if ( Test-Connection -ComputerName $Passive -Count 1 -ErrorAction SilentlyContinue )
                {
                Write-Host "$Passive is up" -ForegroundColor Magenta
                #Call Cleanup Function for Active to Passive
                Cleanup $Active $Passive
                }
            else
                {
                Write-Host "$Passive is down" -ForegroundColor Red
                #no node to fail over to, so restart the service in place
                Start-Service -InputObject (get-Service -ComputerName $Active -Name $srv)
                }
            }
        }
    }
}
} # End of Function

Function Cleanup ([String]$Stop, [String]$Start)
{
    Import-Module NetworkLoadBalancingClusters
    #Services to move from the failed node to the node taking over
    $services = "Service1","Service2","Service3","Service4"
    #Invoke-Command -ComputerName $Stop -ScriptBlock { cd C:\users\Administrator.Contoso\Desktop; .\Services_stop.cmd}
    #Stop Services on Failed Node
    foreach ($service in $services)
    {
        (gwmi win32_service -computername $Stop -filter "name='$service'").stopservice()
    }
    #Stop Failed NLB Node
    Stop-NlbClusterNode -HostName $Stop -Drain -Timeout 10
    #Invoke-Command -ComputerName $Start -ScriptBlock { cd C:\users\Administrator.Contoso\Desktop; .\Services_start.cmd}
    #Start Services on the node taking over
    foreach ($service in $services)
    {
        (gwmi win32_service -computername $Start -filter "name='$service'").startservice()
    }
    #Start Passive NLB Node
    Start-NlbClusterNode -HostName $Start
    #Other Win32_Service calls kept for reference:
        #$result = (gwmi win32_service -computername $computer -filter "name='$service'").startservice()
        #$result = (gwmi win32_service -computername $computer -filter "name='$service'").stopservice()
        #$result = (gwmi win32_service -computername $computer -filter "name='$service'").ChangeStartMode("Disabled")
        #$result = (gwmi win32_service -computername $computer -filter "name='$service'").ChangeStartMode("Automatic")
} #End of Function
IF ($node1status.statuscode -eq "1008" -or $node1status.statuscode -eq "1007")
{
    write-host "NLB Status of $node1 is: Converged"  -ForegroundColor Green
    HealthCheck $node1 $node2
}
else
{
    write-host "NLB Status of $node1 is: Error"  -ForegroundColor Red
    IF ($node2status.statuscode -eq "1008" -or $node2status.statuscode -eq "1007")
    {
    write-host "NLB Status of $node2 is: Converged"  -ForegroundColor Green
    #Write-Host "Passing HealthCheck with $node2, $node1" -ForegroundColor Green
    HealthCheck $node2 $node1
    }
    else
    {
    write-host "NLB Status of $node2 is: Error"  -ForegroundColor Red
        if ( Test-Connection -ComputerName $Node1 -Count 1 -ErrorAction SilentlyContinue )
                {
                Write-Host "$Node1 is up" -ForegroundColor Magenta
                #Restart NLB on Node1, then move the services from Node2 to Node1
                Start-NlbCluster -HostName $node1
                Cleanup $node2 $node1
                start-sleep -seconds 30
                }
            else
                {
                Write-Host "$Node1 is down" -ForegroundColor Red
                if ( Test-Connection -ComputerName $Node2 -Count 1 -ErrorAction SilentlyContinue )
                    {
                    Write-Host "$Node2 is up" -ForegroundColor Magenta
                    #Restart NLB on Node2, then move the services from Node1 to Node2
                    Start-NlbCluster -HostName $node2
                    Cleanup $node1 $node2
                    start-sleep -seconds 30
                    }
                else
                    {
                    Write-Host "$Node2 is down" -ForegroundColor Red
                    Write-Host "All nodes in NLB are DOWN !!!!!!!" -ForegroundColor Red
                    }
                }
    }
}

########################### NLB Health Check ##############################
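
As a reading aid, the decision the script makes for each monitored service can be condensed into a small function. This is only a sketch: Get-FailoverAction and the action names it returns are made up for illustration and do not appear in the script above.

```powershell
# Hypothetical summary of the per-service decision in HealthCheck.
# The function name and the returned action names are illustrative only.
function Get-FailoverAction {
    param(
        [string]$ServiceStatus,    # e.g. "Running", "Stopped", "StopPending"
        [bool]$PassiveReachable    # did the passive node answer a ping?
    )
    switch ($ServiceStatus) {
        # A service stuck in StopPending is treated as hung: kill its process
        "StopPending" { return "KillProcess" }
        "Stopped" {
            # Fail over only if there is a node to fail over to
            if ($PassiveReachable) { return "Failover" }
            return "RestartService"
        }
        # Any other state (Running, StartPending, ...) needs no action
        default { return "None" }
    }
}
```

For example, Get-FailoverAction -ServiceStatus "Stopped" -PassiveReachable $true corresponds to the branch that calls Cleanup.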

Note: I am no expert when it comes to scripting, and this is not a perfect script, but it works. There is a lot of room for improvement, and if you have any suggestions please help me make it better :)
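
One possible cleanup, offered as a sketch rather than a tested change: the check for status codes 1007 and 1008 is written out twice in the script, so it could live in a small helper. The name Test-NlbConverged is hypothetical, and the only codes it knows about are the two the script already treats as healthy.

```powershell
# Hypothetical helper; not part of the original script.
function Test-NlbConverged {
    param([int]$StatusCode)
    # 1007 and 1008 are the two values the script accepts as "Converged".
    # A missing status (for example, an unreachable node) arrives as 0.
    return ((1007, 1008) -contains $StatusCode)
}
```

With it, the top-level test would read IF (Test-NlbConverged $node1status.statuscode) instead of the two -eq comparisons.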

The next step is to run this script continuously, like a service, to monitor the health of NLB. There are many ways to do it. Schtasks is a great utility for scheduling your custom scripts, or you can install the script as a service with Instsrv.exe and Srvany.exe, which are part of the Windows Server 2003 Resource Kit Tools.

However, I found Task Scheduler a better fit for my scenario. There is a nice post on the TechNet blog that explains the details. If you want the quick version, the command below is the only one you need.

schtasks /create /tn HealthCheck /tr "powershell -NoLogo -WindowStyle hidden -file NLB_Health_Check.ps1" /sc minute /mo 1 /ru System

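The task runs under the System account, so give -file the full path to NLB_Health_Check.ps1 unless the script sits in the task's working directory. Also, don't forget to change the PowerShell execution policy to Unrestricted before you schedule the script. To do that, run PowerShell as Administrator and run the command: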

Set-ExecutionPolicy Unrestricted

With PowerShell you can easily extend Windows NLB to host your standard applications with high availability. Have fun 🙂
