English 中文(简体)
在一个具有1个特性的容忍度的范围内进行配对
原标题:Match sub-string within a string with tolerance of 1 character mismatch
  • 时间:2010-07-24 17:52:08
  •  标签:
  • c
  • string

我正在接受亚马孙关于职业资格的约谈,我从这个令人感兴趣的问题来到这里,我没有能够说明如何做。 我自两天以来一直在思考这一问题。 要么我走了一条路,要么是真心实意的书写功能。

www.un.org/Depts/DGACM/index_spanish.htm 问题如下:

Write a function in C that can find if a string is a sub-string of another. Note that a mismatch of one character should be ignored.

A mismatch can be an extra character: ’dog’ matches ‘xxxdoogyyyy’  
A mismatch can be a missing character: ’dog’ matches ‘xxxdgyyyy’ 
A mismatch can be a different character: ’dog’ matches ‘xxxdigyyyy’

这里提到的回报价值是没有的,因此,我假定这一功能的签名可以这样:

char * MatchWithTolerance(const char * str, const char * substr);

如果与特定规则相匹配,则把点人送回在地狱中搭配的距离。 遣返无效。

<<>Bonus

If someone can also figure out a generic way of making the tolerance to n instead of 1, then that would be just brilliant. In that case the signature would be:

char * MatchWithTolerance(const char * str, const char * substr, unsigned int tolerance = 1);
最佳回答

似乎在工作上,请允许我知道,你是否发现任何错误,我试图纠正这些错误:

int findHelper(const char *str, const char *substr, int mustMatch = 0)
{
    if ( *substr ==    )
        return 1;

    if ( *str ==    )
        return 0;

    if ( *str == *substr )
        return findHelper(str + 1, substr + 1, mustMatch);
    else
    {
        if ( mustMatch )
            return 0;

        if ( *(str + 1) == *substr )
            return findHelper(str + 1, substr, 1);
        else if ( *str == *(substr + 1) )
            return findHelper(str, substr + 1, 1);
        else if ( *(str + 1) == *(substr + 1) )
            return findHelper(str + 1, substr + 1, 1);
        else if ( *(substr + 1) ==    )
            return 1;
        else
            return 0;
    }
}

int find(const char *str, const char *substr)
{
    int ok = 0;
    while ( *str !=    )
        ok |= findHelper(str++, substr, 0);

    return ok;
}


int main()
{
    printf("%d
", find("xxxdoogyyyy", "dog"));
    printf("%d
", find("xxxdgyyyy", "dog"));
    printf("%d
", find("xxxdigyyyy", "dog"));
}

基本上,我确保只有一种特性可以有所不同,并履行这种职能,为每一重的沙斯特人服务。

问题回答

This is related to a classical problem of IT, referred to as Levenshtein distance. See Wikibooks for a bunch of implementations in different languages.

这与先前的解决办法稍有不同,但我受到问题的影响,希望给它一枪。 如果希望的话,显然最优化,我只想找到解决办法。

char *match(char *str, char *substr, int tolerance)
{
  if (! *substr) return str;
  if (! *str) return NULL;

  while (*str)
  {
    char *str_p;
    char *substr_p;
    char *matches_missing;
    char *matches_mismatched;

    str_p = str;
    substr_p = substr;

    while (*str_p && *substr_p && *str_p == *substr_p)
    {
      str_p++;
      substr_p++;
    }
    if (! *substr_p) return str;
    if (! tolerance)
    {
      str++;
      continue;
    }

    if (strlen(substr_p) <= tolerance) return str;

    /* missed due to a missing letter */
    matches_missing = match(str_p, substr_p + 1, tolerance - 1);
    if (matches_missing == str_p) return str;

    /* missed due to a mismatch of letters */
    matches_mismatched = match(str_p + 1, substr_p + 1, tolerance - 1);
    if (matches_mismatched == str_p + 1) return str;

    str++;
  }
  return NULL;
}

问题是否要有效地做到这一点?

宽松的解决办法是,从左边到右边,在目前次位,如果只有一种特性在比较中有所不同时,则转折。

Let n = size of str Let m = size of substr

There are O(n) substrings in str, and the matching step takes time O(m). Ergo, the naive solution runs in time O(n*m)

With arbitary no. of tolerance levels.

我可以考虑的所有试办案例。 Loosely based on ́ / ́s Solutions.

#include<stdio.h>
#include<string.h>

report (int x, char* str, char* sstr, int[] t) {
    if ( x )
        printf( "%s is a substring of %s for a tolerance[%d]
",sstr,str[i],t[i] );
    else
        printf ( "%s is NOT a substring of %s for a tolerance[%d]
",sstr,str[i],t[i] );
}

int find_with_tolerance (char *str, char *sstr, int tol) {

    if ( (*sstr) ==    ) //end of substring, and match
        return 1;

    if ( (*str) ==    ) //end of string
        if ( tol >= strlen(sstr) ) //but tol saves the day
            return 1;
        else    //there s nothing even the poor tol can do
            return 0;

    if ( *sstr == *str ) { //current char match, smooth
        return find_with_tolerance ( str+1, sstr+1, tol );
    } else {
        if ( tol <= 0 ) //that s it. no more patience
            return 0;
        for(int i=1; i<=tol; i++) {
            if ( *(str+i) == *sstr ) //insertioan of a foreign character
                return find_with_tolerance ( str+i+1, sstr+1, tol-i );
            if ( *str == *(sstr+i) ) //deal with dletion
                return find_with_tolerance ( str+1, sstr+i+1, tol-i );
            if ( *(str+i) == *(sstr+i)  ) //deal with riplacement
                return find_with_tolerance ( str+i+1, sstr+i+1, tol-i );
            if ( *(sstr+i) ==    ) //substr ends, thanks to tol & this loop
                return 1;
        }
        return 0; //when all fails
    }
}

int find (char *str, char *sstr, int tol ) {
    int w = 0;
    while (*str!=  )
        w |= find_with_tolerance ( str++, sstr, tol );
    return (w) ? 1 : 0;
}

int main() {
    const int n=3; //no of test cases
    char *sstr = "dog"; //the substr
    char *str[n] = { "doox", //those cases
                    "xxxxxd",
                    "xxdogxx" };
    int t[] = {1,1,0}; //tolerance levels for those cases
    for(int i = 0; i < n; i++) {
        report( find ( *(str+i), sstr, t[i] ), *(str+i), sstr, t[i] );
    }
    return 0;
}




相关问题
Simple JAVA: Password Verifier problem

I have a simple problem that says: A password for xyz corporation is supposed to be 6 characters long and made up of a combination of letters and digits. Write a program fragment to read in a string ...

Case insensitive comparison of strings in shell script

The == operator is used to compare two strings in shell script. However, I want to compare two strings ignoring case, how can it be done? Is there any standard command for this?

Trying to split by two delimiters and it doesn t work - C

I wrote below code to readin line by line from stdin ex. city=Boston;city=New York;city=Chicago and then split each line by ; delimiter and print each record. Then in yet another loop I try to ...

String initialization with pair of iterators

I m trying to initialize string with iterators and something like this works: ifstream fin("tmp.txt"); istream_iterator<char> in_i(fin), eos; //here eos is 1 over the end string s(in_i, ...

break a string in parts

I have a string "pc1|pc2|pc3|" I want to get each word on different line like: pc1 pc2 pc3 I need to do this in C#... any suggestions??

Quick padding of a string in Delphi

I was trying to speed up a certain routine in an application, and my profiler, AQTime, identified one method in particular as a bottleneck. The method has been with us for years, and is part of a "...

热门标签