English 中文(简体)
What is the best way to search multiple sources simultaneously?
原标题:

I m writing a phonebook search, that will query multiple remote sources but I m wondering how it s best to approach this task.

The easiest way to do this is to take the query, start a thread per remote source query (limiting max results to say 10), waiting for the results from all threads and aggregating the list into a total of 10 entries and returning them.

BUT...which of the remote source is more important if all sources return at least 10 results, so then I would have to do a search on the search results. While this would yield accurate information it seems inefficient and unlikely to scale up well.

Is there a solution commercial or open source that I could use and extend, or is there a clever algorithm I can use that I ve missed?

Thanks

最佳回答

John, I believe what you want is federated search. I suggest you check out Solr as a framework for this. I agree with Nick that you will have to evaluate the relative quality of the different sources yourself, and build a merge function. Solr has some infrastructure for this, as this email thread shows.

问题回答

To be honest I haven t seen a ready solution, but this is why we programmers exist: to create a solution if one is not readily availble :-)

The way I would do it is similar to what you describe: using threads - if this is a web application then ajax is your friend for speed and usability, for a desktop app gui representation is not even an issue.

It sounds like you can t determine or guess upfront which source is the best in terms of reliability, speed & number of results. So you need to setup you program so that it determines best results on the fly. Let s say you have 10 data sources, and therfore 10 threads. When you fire up your threads - wait for the first one to return with results > 0. This is going to be you "master" result. As other threads return you can compare them to your "master" result and add new results. There is really no way to avoid this if you want to provide unique results. You can start displaying results as soon as you have your first thread. You don t have to update your screen right away with all the new results as they come in but if takes some time user may become agitated. You can just have some sort of indicator that shows that more results are available, if you have more than 10 for instance.

If you only have a few sources, like 10, and you limit the number of results per source you are waiting for, to like 10, it really shouldn t take that much time to sort through them in any programming language. Also make sure you can recover if your remote sources are not available. If let s say, you are waiting for all 10 sources to come back to display data - you may be in for a long wait, if one of the sources is down.

The other approach is to f00l user. Sort of like airfare search sites do - where they make you want a few seconds while they collect and sort results. I really like Kayak.com s implementation - as it make me feel like it s doing something unlike some other sites.

Hope that helps.





相关问题
What to look for in performance analyzer in VS 2008

What to look for in performance analyzer in VS 2008 I am using VS Team system and got the performance wizard and reports going. What benchmarks/process do I use? There is a lot of stuff in the ...

SQL Table Size And Query Performance

We have a number of items coming in from a web service; each item containing an unknown number of properties. We are storing them in a database with the following Schema. Items - ItemID - ...

How to speed up Visual Studio 2008? Add more resources?

I m using Visual Studio 2008 (with the latest service pack) I also have ReSharper 4.5 installed. ReSharper Code analysis/ scan is turned off. OS: Windows 7 Enterprise Edition It takes me a long time ...

Manually implementing high performance algorithms in .NET

As a learning experience I recently tried implementing Quicksort with 3 way partitioning in C#. Apart from needing to add an extra range check on the left/right variables before the recursive call, ...

How do I profile `paster serve` s startup time?

Python s paster serve app.ini is taking longer than I would like to be ready for the first request. I know how to profile requests with middleware, but how do I profile the initialization time? I ...

热门标签