Calculation of DNA sequences

Could you tell me how I can calculate the distance between DNA sequences in Java using the Levenshtein algorithm?

Answers

Since you did not tag it as homework, I see no need to write this yourself. Apache Commons Lang's StringUtils has it (getLevenshteinDistance).

Here is the algorithm from the Wikipedia page on Levenshtein distances:

 int LevenshteinDistance(char s[1..m], char t[1..n])
 {
   // d is a table with m+1 rows and n+1 columns
   declare int d[0..m, 0..n]

   for i from 0 to m
     d[i, 0] := i // deletion
   for j from 0 to n
     d[0, j] := j // insertion

   for j from 1 to n
   {
     for i from 1 to m
     {
       if s[i] = t[j] then 
         d[i, j] := d[i-1, j-1]
       else
         d[i, j] := minimum
                    (
                      d[i-1, j] + 1,  // deletion
                      d[i, j-1] + 1,  // insertion
                      d[i-1, j-1] + 1 // substitution
                    )
     }
   }

   return d[m, n]
 }

(I'm sure you can turn that into Java with a little work.)

Pass in your two DNA sequences as s and t, and it will return the distance as an int.

I believe this is what you're after. You can remove the System.out.print and System.out.println statements if you like. Note that if you leave them in, the first row and column of the table are omitted from what is printed.

Verified against the results on the Wikipedia page.

public int getLevenshteinDistance(String a, String b)
{
    char[] s = a.toCharArray();
    char[] t = b.toCharArray();
    System.out.println(a + " - " + b);
    int m = s.length;
    int n = t.length;
    // d is a table with m+1 rows and n+1 columns
    int[][] d = new int[m + 1][n + 1];

    int i;
    int j;
    for(i = 0; i < (m + 1); i++)
    {
        d[i][0] = i; //deletion
    }

    for(j = 0; j < (n + 1); j++)
    {
        d[0][j] = j; //insertion
    }

    for (j = 1; j < (n + 1); j++)
    {
        for (i = 1; i < (m + 1); i++)
        {
            if (s[i-1] == t[j-1])
            {
                d[i][j] = d[i-1][j-1];
            }
            else
            {
                d[i][j] = Math.min((d[i-1][j] + 1), //deletion
                        (Math.min((d[i][j-1] + 1), //insertion
                        (d[i-1][j-1] + 1)))); //substitution
            }
            System.out.print(" [" + d[i][j] + "]");
        }
        System.out.println("");
    }

    return d[m][n];
}

To test:

    String a = "Saturday";
    String b = "Sunday";
    int d = getLevenshteinDistance(a, b);
    System.out.println(d);
    a = "kitten";
    b = "sitting";
    d = getLevenshteinDistance(a, b);
    System.out.println(d);

The Wikipedia article on Levenshtein distance contains the algorithm and an explanation of the resulting matrix. Simply implement the algorithm as a method and return the last (bottom-right) element of the matrix.

Copy and paste the function from the Levenshtein Distance Algorithm and use it like so:

 String a = "AAAAAAAAAAAAAAAAAA";
 String b = "AAAAAAAAACTAAAAAAA";

 int d = getLevenshteinDistance(a,b);
 System.out.println(d);
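
DNA sequences can be long, and the full table needs (m+1) x (n+1) ints. Since each row of the table depends only on the row before it, two rows are enough if you only need the distance. Here is a minimal sketch of that idea; the class name TwoRowLevenshtein and the method name distance are made up for this example and do not come from any library.

```java
/** Levenshtein distance that keeps two rows of the table instead of all of it. */
final class TwoRowLevenshtein {

    static int distance(String a, String b) {
        int m = a.length();
        int n = b.length();

        // prev holds row i-1 of the table, curr holds row i.
        int[] prev = new int[n + 1];
        int[] curr = new int[n + 1];

        // Row 0: building the first j characters of b from nothing takes j insertions.
        for (int j = 0; j <= n; j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= m; i++) {
            curr[0] = i; // removing the first i characters of a takes i deletions
            for (int j = 1; j <= n; j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(prev[j] + 1,       // deletion
                                 curr[j - 1] + 1),  // insertion
                        prev[j - 1] + cost);        // match or substitution
            }
            // The row just filled becomes the previous row for the next i.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        // After the final swap, prev is the last row that was computed.
        return prev[n];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting"));
        System.out.println(distance("Saturday", "Sunday"));
    }
}
```

Memory drops from O(m*n) to O(n), while the running time stays O(m*n). The trade-off is that you can no longer print or inspect the whole matrix.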

If you are just interested in calculating the variation between two DNA sequences, you should use the Damerau–Levenshtein distance, not the regular Levenshtein distance. It additionally counts the transposition of two adjacent characters as a single edit.

The Wikipedia entry contains some sample code, which you should be able to map to Java.
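
As a rough idea of what that mapping might look like, here is a sketch of the restricted variant, usually called the optimal string alignment (OSA) distance. It is not the unrestricted Damerau–Levenshtein distance: it never edits the same substring twice, so the two can differ on some inputs. The class and method names are invented for this example.

```java
/** Optimal string alignment distance: Levenshtein plus adjacent transpositions. */
final class OsaDistance {

    static int distance(String a, String b) {
        int m = a.length();
        int n = b.length();
        // d[i][j] is the distance between the first i chars of a and the first j chars of b.
        int[][] d = new int[m + 1][n + 1];

        for (int i = 0; i <= m; i++) {
            d[i][0] = i; // deletions
        }
        for (int j = 0; j <= n; j++) {
            d[0][j] = j; // insertions
        }

        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                int cost = (a.charAt(i - 1) == b.charAt(j - 1)) ? 0 : 1;
                d[i][j] = Math.min(
                        Math.min(d[i - 1][j] + 1,      // deletion
                                 d[i][j - 1] + 1),     // insertion
                        d[i - 1][j - 1] + cost);       // match or substitution

                // Adjacent transposition: the last two characters are swapped.
                if (i > 1 && j > 1
                        && a.charAt(i - 1) == b.charAt(j - 2)
                        && a.charAt(i - 2) == b.charAt(j - 1)) {
                    d[i][j] = Math.min(d[i][j], d[i - 2][j - 2] + 1);
                }
            }
        }
        return d[m][n];
    }

    public static void main(String[] args) {
        // C and G are swapped: one transposition here, two substitutions for plain Levenshtein.
        System.out.println(distance("ACGT", "AGCT"));
    }
}
```

The only difference from the plain Levenshtein loop is the extra transposition check, so it can be dropped into the same place in your code.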




