English 中文(简体)
Expert System (?) algorithm
原标题:

I have an algorithmic problem which can be reduced to this task:

Suppose we have a list of n diseases and m symptoms.
For each disease d and symptom s, we have one of three options:

  • the symptom is positively correlated with the disease: s => d
  • the symptom is negatively correlated with the disease: s => ~d
  • the symptom is uncorrelated with the disease

The goal of the algorithm is to create a list of yes/no questions regarding symptoms (or even better - a binary tree of questions), which can deduce the exact disease according to the symptoms.

Any references to specific algorithms, relevant software tools and even domain-specific jargon would be very appreciated.

最佳回答

You case use a decision tree : http://en.wikipedia.org/wiki/Decision_tree_learning

Basically finding the optimum tree (ie which will minimize the average number of questions asked before you can identify the desease) is NP-hard.

You can can use a greedy algorithm and then try to optimise it (if you need it).

At each step you want to reduce at much as possible the number of deceases that are still "possible".

You are at the top of your tree and so you have three possible options for a given s, count the number of diseases in each option : pc nc uc.

On one side you have pc on the other nc and the uc you can t say anything (you can look at two level of your tree to have some info but for now we don t do that).

So worst case scenario, you have pc / nc + uc or pc + uc / nc, choose the s that minimize worst case (ie : lot of one side, only a few on the other).

You need to minimize abs( pc - (nc + uc)) + abs ( (pc+uc) - nc).

You now have your s for your first question and you can iteratively build your tree.

问题回答

Is your domain really this binary or in fact, would you more likely want to use the correlation coefficient for each symptom/disease pair as a numeric value? This would allow strong correlations to influence the result more than weak correlations.

If so then you might want to look at Support Vector Machines that analyze data and recognize patterns.

The problem is very similar to the bacteria / antibiotic question of Mycin, a forerunner to more generalized rule-based expert systems technology of the 1980s. There were other medical diagnostic programs developed with the resulting tools.

http://en.wikipedia.org/wiki/Mycin

If you just need a reference, take a look at the Russel & Norvig book. I don t have my copy at hand right now, but I can update this answer with some chapter suggestions tomorrow.

If each disease only has a few symptoms, then you can use graphical models to model the probabilities.

http://en.wikipedia.org/wiki/Graphical_model

http://www.cs.ubc.ca/~murphyk/Software/bnsoft.html

But I don t know if you can use a graphical model to create a tree of questions.





相关问题
How to add/merge several Big O s into one

If I have an algorithm which is comprised of (let s say) three sub-algorithms, all with different O() characteristics, e.g.: algorithm A: O(n) algorithm B: O(log(n)) algorithm C: O(n log(n)) How do ...

Grokking Timsort

There s a (relatively) new sort on the block called Timsort. It s been used as Python s list.sort, and is now going to be the new Array.sort in Java 7. There s some documentation and a tiny Wikipedia ...

Manually implementing high performance algorithms in .NET

As a learning experience I recently tried implementing Quicksort with 3 way partitioning in C#. Apart from needing to add an extra range check on the left/right variables before the recursive call, ...

Print possible strings created from a Number

Given a 10 digit Telephone Number, we have to print all possible strings created from that. The mapping of the numbers is the one as exactly on a phone s keypad. i.e. for 1,0-> No Letter for 2->...

Enumerating All Minimal Directed Cycles Of A Directed Graph

I have a directed graph and my problem is to enumerate all the minimal (cycles that cannot be constructed as the union of other cycles) directed cycles of this graph. This is different from what the ...

Quick padding of a string in Delphi

I was trying to speed up a certain routine in an application, and my profiler, AQTime, identified one method in particular as a bottleneck. The method has been with us for years, and is part of a "...

热门标签