Challenges and Best Practices for Failing Over Services

Does anyone know of any established best practices for running Windows services (in my case, developed in .NET) such that they will (automatically) fail over correctly to another server, for high availability purposes?

The main ways I can see this being done are either starting up the secondary server when required (in which case there needs to be something monitoring the other server), or having both services running together (in which case they need to synchronize their work so they don't try to do the same things).

Is there a pattern or model for this sort of problem? I know the exact situation will make a big difference, but it does seem like a fairly common issue.

Thanks

John

Best answer

Here's what has worked for me.

From an infrastructure standpoint, you will need two clustered Windows servers. (Two standard Windows Server boxes will do; the clustering piece can be installed and configured afterwards, and most sysadmins should know how to do this.) Next, install your service on both nodes of the cluster, leave it stopped on both, and set its startup type to Manual. Then add a clustered resource for your service in the Windows Cluster Administrator; it will start and stop your service on whichever node is active. Let the Windows cluster manage when your service is running and on which node. This is the easy part of clustering your service.

From the service standpoint, you will want to design your service to be as stateless as possible. This is kind of lame advice, but it really depends on what your service is doing. In the design, just assume that at some point during the code's lifetime it will stop at the worst possible time. How will the service on node2 know to pick up where node1 left off? That's the hard part that you need to design for. Depending on what your service is doing, you can record the last completed task in a db table or shared data file. You could also have it start from the beginning and check whether each task has already been completed before acting on it.
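As a rough illustration of the "record what has been completed" idea, here is a minimal Python sketch (the same shape carries over to a .NET service). It assumes a shared store that both nodes can reach; SQLite stands in for that store here, and the `completed` table, `init_store`, `run_tasks`, and the `handle` callback are all hypothetical names, not part of any clustering API.

```python
import sqlite3


def init_store(conn):
    """Create the table that records finished tasks (hypothetical schema)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS completed (task_id TEXT PRIMARY KEY)"
    )
    conn.commit()


def run_tasks(conn, tasks, handle):
    """Process tasks in order, skipping any already recorded as completed.

    `handle` does the real work for one task. If the node dies inside
    `handle`, the task is never recorded, so the next node repeats it.
    """
    for task_id in tasks:
        row = conn.execute(
            "SELECT 1 FROM completed WHERE task_id = ?", (task_id,)
        ).fetchone()
        if row is not None:
            continue  # a previous run (possibly on another node) finished this
        handle(task_id)
        # Record completion only after the work succeeded.
        conn.execute("INSERT INTO completed (task_id) VALUES (?)", (task_id,))
        conn.commit()
```

Because a node can also die after `handle` returns but before the completion row is committed, the work itself should be safe to repeat (idempotent), or the work and the completion record should be written in the same transaction.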

Again, it is really going to depend on what the service needs to accomplish. Hope this helps.

Other answers

Having both running all the time is probably the simplest solution, but you need to ensure that each server never goes above 50% load; otherwise, when one fails, the other inherits its work, becomes overloaded, and may fail too.
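The 50% figure is the two-server case of a simple headroom calculation. The sketch below assumes identical servers and that a failed server's load is spread evenly over the survivors; `max_safe_utilization` is a hypothetical helper name.

```python
def max_safe_utilization(nodes, failures_tolerated=1):
    """Highest per-node load fraction that survives the given number of failures.

    Assumes identical nodes and even redistribution of the failed nodes' load.
    """
    if nodes < 1 or not 0 <= failures_tolerated < nodes:
        raise ValueError("need 0 <= failures_tolerated < nodes")
    # The surviving nodes must be able to absorb the whole workload.
    return (nodes - failures_tolerated) / nodes
```

With two servers this gives 0.5, matching the rule above; with four servers and one tolerated failure it gives 0.75.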

To synchronize, use a transactional database. Trying to write your own synchronization will usually result in bugs.
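One common way to let the database do the synchronizing is to have each instance claim a task with a conditional update, so that only one instance can win. This is a minimal sketch under assumptions: SQLite stands in for the shared transactional database, and the `tasks` table, `init_queue`, and `claim_next` are hypothetical names.

```python
import sqlite3


def init_queue(conn, task_ids):
    """Create a hypothetical task table and load it with pending tasks."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        "id TEXT PRIMARY KEY, status TEXT NOT NULL, owner TEXT)"
    )
    conn.executemany(
        "INSERT INTO tasks (id, status) VALUES (?, 'pending')",
        [(task_id,) for task_id in task_ids],
    )
    conn.commit()


def claim_next(conn, owner):
    """Claim one pending task for `owner`; return its id, or None if none remain."""
    while True:
        row = conn.execute(
            "SELECT id FROM tasks WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The WHERE clause re-checks the status, so if another instance
        # claimed the task in the meantime this update changes zero rows.
        cur = conn.execute(
            "UPDATE tasks SET status = 'claimed', owner = ? "
            "WHERE id = ? AND status = 'pending'",
            (owner, row[0]),
        )
        conn.commit()
        if cur.rowcount == 1:
            return row[0]
        # Lost the race for this task; look for another one.
```

Each instance calls `claim_next` with its own name and works only on the task it gets back, so the two services never process the same task at the same time.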

If you can have both services working, that is better. You need to make sure they are stateless or know how to handle state, and let the database sync between them. To avoid a single point of failure, you push the problem to the DB, where you can have a two-node active-active cluster and let the DB manufacturer handle the sync issues.

I believe the best way to deal with failover is at the network level wherever possible. Virtual IPs fronting load-balanced or primary/failover environments are a good way to avoid having to write code for failover scenarios.

In cases where you must handle failover in code:

  1. Test connection/service call
  2. If test fails, send alerts
  3. Fail over to next "registered" service endpoint
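The three steps above can be sketched in Python roughly as follows. Everything here is illustrative: `call_with_failover`, `probe`, `call`, and `alert` are hypothetical names, and the caller supplies the real health check, service call, and alerting mechanism.

```python
def call_with_failover(endpoints, probe, call, alert):
    """Try each registered endpoint in order and use the first healthy one.

    probe(endpoint) -> bool   tests the connection or service call
    call(endpoint)            performs the real request
    alert(message)            notifies operators about a failed endpoint
    """
    for endpoint in endpoints:
        try:
            healthy = probe(endpoint)
        except OSError:
            # Connection errors during the test count as a failed check.
            healthy = False
        if not healthy:
            alert(f"{endpoint} failed its health check")
            continue  # fail over to the next registered endpoint
        return call(endpoint)
    raise ConnectionError("no registered endpoint is available")
```

The list of endpoints acts as the registry, so its order decides which server is preferred. The sketch only fails over on the health check; a failure inside `call` itself is left to propagate so the caller can decide whether the request is safe to retry.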

There are two basic approaches.

  1. Clients are aware of the different endpoint addresses and switch as needed, or as directed by another service or configuration mechanism. (As an example, the StockTrader demo application does this.)

  2. The clients are not aware, and you use a standard network load balancing (NLB) approach, which can also provide failover. F5 is one product; there are many others. It is basically like a NAT for services: all requests go through your NLB, which sends them on to a server and forwards the response back to the caller. These products monitor the services and only use the ones that are up. You can also often customize them with rules that assign new requests to servers based on server workloads. Windows Server has this functionality built in to some extent.

Either way you do it, it is much much easier if your service calls are "stateless".




