English 中文(简体)
在另一个较大的数组中查找一个数组
原标题:Find an array inside another larger array
  • 时间:2010-10-15 07:12:53
  •  标签:
  • java
  • arrays

我最近被要求为一份工作编写3个测试程序。它们将只使用核心Java API和我选择的任何测试框架编写。应在适当的情况下进行单元测试。

虽然我根本没有收到任何反馈,但我想他们不喜欢我的解决方案(否则我会从他们那里得到消息),所以我决定在这里展示我的程序,并询问这个实施是否可以被认为是好的,如果不能,为什么?

为了避免混淆,我现在只问第一个。

Implement a function that finds an array in another larger array. It should accept two arrays as parameters and it will return the index of the first array where the second array first occurs in full. Eg, findArray([2,3,7,1,20], [7,1]) should return 2.

我没有试图找到任何现有的解决方案,而是想自己做。

Possible reasons: 1. Should be static. 2. Should use line comments instead of block ones. 3. Didn t check for null values first (I know, just spotted too late). 4. ?

UPDATE:
Quite a few reasons have been presented, and it s very difficult for me to choose one answer as many answers have a good solution. As @adietrich mentioned, I tend to believe they wanted me to demonstrate knowledge of core API (they even asked to write a function, not to write an algorithm).

I believe the best way to secure the job was to provide as many solutions as possible, including: 1. Implementation using Collections.indexOfSubList() method to show that I know core collections API. 2. Implement using brute-force approach, but provide a more elegant solution. 3. Implement using a search algorithm, for example Boyer-Moore. 4. Implement using combination of System.arraycopy() and Arrays.equal(). However not the best solution in terms of performance, it would show my knowledge of standard array routines.

Thank you all for your answers!
END OF UPDATE.

以下是我写的内容:

实际程序:

package com.example.common.utils;

/**
 * This class contains functions for array manipulations.
 * 
 * @author Roman
 *
 */
public class ArrayUtils {

    /**
     * Finds a sub array in a large array
     * 
     * @param largeArray
     * @param subArray
     * @return index of sub array
     */
    public int findArray(int[] largeArray, int[] subArray) {

        /* If any of the arrays is empty then not found */
        if (largeArray.length == 0 || subArray.length == 0) {
            return -1;
        }

        /* If subarray is larger than large array then not found */
        if (subArray.length > largeArray.length) {
            return -1;
        }

        for (int i = 0; i < largeArray.length; i++) {
            /* Check if the next element of large array is the same as the first element of subarray */
            if (largeArray[i] == subArray[0]) {

                boolean subArrayFound = true;
                for (int j = 0; j < subArray.length; j++) {
                    /* If outside of large array or elements not equal then leave the loop */
                    if (largeArray.length <= i+j || subArray[j] != largeArray[i+j]) {
                        subArrayFound = false;
                        break;
                    }
                }

                /* Sub array found - return its index */
                if (subArrayFound) {
                    return i;
                }

            }
        }

        /* Return default value */
        return -1;
    }

}

测试代码:

package com.example.common.utils;

import com.example.common.utils.ArrayUtils;

import junit.framework.TestCase;

public class ArrayUtilsTest extends TestCase {

    private ArrayUtils arrayUtils = new ArrayUtils();

    public void testFindArrayDoesntExist() {

        int[] largeArray = {1,2,3,4,5,6,7};
        int[] subArray = {8,9,10};

        int expected = -1;
        int actual = arrayUtils.findArray(largeArray, subArray);

        assertEquals(expected, actual);
    }

    public void testFindArrayExistSimple() {

        int[] largeArray = {1,2,3,4,5,6,7};
        int[] subArray = {3,4,5};

        int expected = 2;
        int actual = arrayUtils.findArray(largeArray, subArray);

        assertEquals(expected, actual);
    }

    public void testFindArrayExistFirstPosition() {

        int[] largeArray = {1,2,3,4,5,6,7};
        int[] subArray = {1,2,3};

        int expected = 0;
        int actual = arrayUtils.findArray(largeArray, subArray);

        assertEquals(expected, actual);
    }

    public void testFindArrayExistLastPosition() {

        int[] largeArray = {1,2,3,4,5,6,7};
        int[] subArray = {5,6,7};

        int expected = 4;
        int actual = arrayUtils.findArray(largeArray, subArray);

        assertEquals(expected, actual);
    }

    public void testFindArrayDoesntExistPartiallyEqual() {

        int[] largeArray = {1,2,3,4,5,6,7};
        int[] subArray = {6,7,8};

        int expected = -1;
        int actual = arrayUtils.findArray(largeArray, subArray);

        assertEquals(expected, actual);
    }

    public void testFindArrayExistPartiallyEqual() {

        int[] largeArray = {1,2,3,1,2,3,4,5,6,7};
        int[] subArray = {1,2,3,4};

        int expected = 3;
        int actual = arrayUtils.findArray(largeArray, subArray);

        assertEquals(expected, actual);
    }

    public void testFindArraySubArrayEmpty() {

        int[] largeArray = {1,2,3,4,5,6,7};
        int[] subArray = {};

        int expected = -1;
        int actual = arrayUtils.findArray(largeArray, subArray);

        assertEquals(expected, actual);
    }

    public void testFindArraySubArrayLargerThanArray() {

        int[] largeArray = {1,2,3,4,5,6,7};
        int[] subArray = {4,5,6,7,8,9,10,11};

        int expected = -1;
        int actual = arrayUtils.findArray(largeArray, subArray);

        assertEquals(expected, actual);
    }

    public void testFindArrayExistsVeryComplex() {

        int[] largeArray = {1234, 56, -345, 789, 23456, 6745};
        int[] subArray = {56, -345, 789};

        int expected = 1;
        int actual = arrayUtils.findArray(largeArray, subArray);

        assertEquals(expected, actual);
    }

}
最佳回答

“只使用核心Java API”的要求也可能意味着他们想看看您是否会重新发明轮子。因此,除了您自己的实现之外,为了安全起见,您还可以提供一线解决方案:

public static int findArray(Integer[] array, Integer[] subArray)
{
    return Collections.indexOfSubList(Arrays.asList(array), Arrays.asList(subArray));
}

指出给定的示例包含无效的数组文字可能是个好主意,也可能不是个好主意。

问题回答
Clean and improved code 

public static int findArrayIndex(int[] subArray, int[] parentArray) {
    if(subArray.length==0){
        return -1;
    }
    int sL = subArray.length;
    int l = parentArray.length - subArray.length;
    int k = 0;
    for (int i = 0; i < l; i++) {
        if (parentArray[i] == subArray[k]) {
            for (int j = 0; j < subArray.length; j++) {
                if (parentArray[i + j] == subArray[j]) {
                    sL--;
                    if (sL == 0) {
                        return i;
                    }

                }

            }
        }

    }
    return -1;
}

要在较大的整数数组中查找整数数组,可以使用与在较大字符串中查找子字符串相同的算法。为此,有许多已知的算法(请参阅Wikipedia)。尤其是Boyer-Moore字符串搜索对大数组来说是有效的。你试图实现的算法不是很有效(维基百科称之为天真的实现)。

对于您的问题:

  1. Yes, such a method should be static
  2. Don t care, that s a question of taste
  3. The null check can be included, or you should state in the JavaDoc that null values are not allowed, or JavaDoc should state that when either parameter is null a NullPointerException will be thrown.

下面是一种使用KMP模式匹配算法的方法。此解决方案采用O(n+m)。其中n=大数组的长度m=子数组的长度[/code>。有关详细信息,请查看:

https://en.wikipedia.org/wiki/KMP_algorithm

冲击力取O(n*m)。我刚刚检查了Collections.indexOfSubList方法也是O(n*m)

public static int subStringIndex(int[] largeArray, int[] subArray) {
    if (largeArray.length == 0 || subArray.length == 0){
      throw new IllegalArgumentException();
}
    if (subArray.length > largeArray.length){
      throw new IllegalArgumentException();
}

    int[] prefixArr = getPrefixArr(subArray);
    int indexToReturn = -1;

    for (int m = 0, s = 0; m < largeArray.length; m++) {
      if (subArray[s] == largeArray[m]) {
        s++;
      } else {
        if (s != 0) {
          s = prefixArr[s - 1];
          m--;
        }
      }
      if (s == subArray.length) {
        indexToReturn = m - subArray.length + 1;
        break;
      }
    }

    return indexToReturn;
  }

  private static int[] getPrefixArr(int[] subArray) {
    int[] prefixArr = new int[subArray.length];
    prefixArr[0] = 0;

    for (int i = 1, j = 0; i < prefixArr.length; i++) {
      while (subArray[i] != subArray[j]) {
        if (j == 0) {
          break;
        }
        j = prefixArr[j - 1];
      }

      if (subArray[i] == subArray[j]) {
        prefixArr[i] = j + 1;
        j++;
      } else {
        prefixArr[i] = j;
      }

    }
    return prefixArr;
  }

好吧,在我的脑海里:

  1. 是的,应该是静态的。

  2. 一家抱怨这件事的公司不值得为之工作。

  3. 是的,但你会怎么做?回来或者抛出异常?它将以现有的方式抛出一个异常。

  4. 我认为主要的问题是您的代码不是很优雅。内部循环中的检查过多。冗余检查过多。

只是生的,从我的头顶:

public int findArray(int[] largeArray, int[] subArray) {

    int subArrayLength = subArray.length;

    if (subArrayLength == 0) {
        return -1;
    }

    int limit = largeArray.length - subArrayLength;

    int i=0;

    for (int i = 0; i <= limit; i++) {
        boolean subArrayFound = true;

        for (int j = 0; j < subArrayLength; j++) {
            if (subArray[j] != largeArray[i+j]) {
                subArrayFound = false;
                break;
            }

        /* Sub array found - return its index */
        if (subArrayFound) {
            return i;
        }
    }

    /* Return default value */
    return -1;
}

可以对第一个元素进行检查,这样就不会为数组中的每个元素设置布尔值和for循环。然后你会看到

public int findArray(int[] largeArray, int[] subArray) {

    int subArrayLength = subArray.length;

    if (subArrayLength == 0) {
        return -1;
    }

    int limit = largeArray.length - subArrayLength;

    for (int i = 0; i <= limit; i++) {
        if (subArray[0] == largeArray[i]) {
            boolean subArrayFound = true;

            for (int j = 1; j < subArrayLength; j++) {
                if (subArray[j] != largeArray[i+j]) {
                    subArrayFound = false;
                    break;
                }

            /* Sub array found - return its index */
            if (subArrayFound) {
                return i;
            }
        }
    }

    /* Return default value */
    return -1;
}

之前发布的经过一点优化的代码:

public int findArray(byte[] largeArray, byte[] subArray) {
    if (subArray.length == 0) {
        return -1;
    }
    int limit = largeArray.length - subArray.length;
    next:
    for (int i = 0; i <= limit; i++) {
        for (int j = 0; j < subArray.length; j++) {
            if (subArray[j] != largeArray[i+j]) {
                continue next;
            }
        }
        /* Sub array found - return its index */
        return i;
    }
    /* Return default value */
    return -1;
}
int findSubArr(int[] arr,int[] subarr)
{
    int lim=arr.length-subarr.length;

    for(int i=0;i<=lim;i++)
    {
        int[] tmpArr=Arrays.copyOfRange(arr,i,i+subarr.length);
        if(Arrays.equals(tmpArr,subarr))
            return i;   //returns starting index of sub array
    }
    return -1;//return -1 on finding no sub-array   
}

更新:

通过重用相同的int数组实例:

int findSubArr(int[] arr,int[] subarr)
{
    int lim=arr.length-subarr.length;
    int[] tmpArr=new int[subarr.length];
    for(int i=0;i<=lim;i++)
    {
        System.arraycopy(arr,i,tmpArr,0,subarr.length);
        if(Arrays.equals(tmpArr,subarr))
          return i; //returns starting index of sub array
    }
    return -1;//return -1 on finding no sub-array   

}

我建议进行以下改进:

  • make the function static so that you can avoid creating an instance
  • the outer loop condition could be i <= largeArray.length-subArray.length, to avoid a test inside the loop
  • remove the test (largeArray[i] == subArray[0]) that is redundant

这里是字符串中的#indexOf:

/**
 * Code shared by String and StringBuffer to do searches. The
 * source is the character array being searched, and the target
 * is the string being searched for.
 *
 * @param   source       the characters being searched.
 * @param   sourceOffset offset of the source string.
 * @param   sourceCount  count of the source string.
 * @param   target       the characters being searched for.
 * @param   targetOffset offset of the target string.
 * @param   targetCount  count of the target string.
 * @param   fromIndex    the index to begin searching from.
 */
static int indexOf(char[] source, int sourceOffset, int sourceCount,
        char[] target, int targetOffset, int targetCount,
        int fromIndex) {
    if (fromIndex >= sourceCount) {
        return (targetCount == 0 ? sourceCount : -1);
    }
    if (fromIndex < 0) {
        fromIndex = 0;
    }
    if (targetCount == 0) {
        return fromIndex;
    }

    char first = target[targetOffset];
    int max = sourceOffset + (sourceCount - targetCount);

    for (int i = sourceOffset + fromIndex; i <= max; i++) {
        /* Look for first character. */
        if (source[i] != first) {
            while (++i <= max && source[i] != first);
        }

        /* Found first character, now look at the rest of v2 */
        if (i <= max) {
            int j = i + 1;
            int end = j + targetCount - 1;
            for (int k = targetOffset + 1; j < end && source[j]
                    == target[k]; j++, k++);

            if (j == end) {
                /* Found whole string. */
                return i - sourceOffset;
            }
        }
    }
    return -1;
}

首先是您可能的原因:

  1. Yes. And the class final with a private constructor.
  2. Shouldn t use this kind of comments at all. The code should be self-explanatory.
  3. You re basically implicitly checking for null by accessing the length field which will throw a NullPointerException. Only in the case of a largeArray.length == 0 and a subArray == null will this slip through.

更多潜在原因:

  • The class doesn t contain any function for array manipulations, opposed to what the documentation says.
  • The documentation for the method is very sparse. It should state when and which exceptions are thrown (e.g. NullPointerException) and which return value to expect if the second array isn t found or if it is empty.
  • The code is more complex than needed.
    1. Why is the equality of the first elements so important that it gets its own check?
    2. In the first loop, it is assumed that the second array will be found, which is unintentional.
    3. Unneeded variable and jump (boolean and break), further reducing legibility.
    4. largeArray.length <= i+j is not easy to grasp. Should be checked before the loop, improving the performance along the way.
    5. I d swap the operands of subArray[j] != largeArray[i+j]. Seems more natural to me.
    6. All in all too long.
  • The test code is lacking more edge cases (null arrays, first array empty, both arrays empty, first array contained in second array, second array contained multiple times etc.).
  • Why is the last test case named testFindArrayExistsVeryComplex?

练习缺少的是数组参数的组件类型的规范,以及方法的签名。无论组件类型是基元类型还是引用类型,都会产生巨大的差异adietrich的解决方案假设一个引用类型(因此可以作为进一步的改进进行泛化),我的解决方案假设一个基元类型(int)。

下面是我的镜头,专注于代码/忽略文档和测试:

public final class ArrayUtils {
    // main method

    public static int indexOf(int[] haystack, int[] needle) {
        return indexOf(haystack, needle, 0);
    }

    // helper methods

    private static int indexOf(int[] haystack, int[] needle, int fromIndex) {
        for (int i = fromIndex; i < haystack.length - needle.length; i++) {
            if (containsAt(haystack, needle, i)) {
                return i;
            }
        }
        return -1;
    }

    private static boolean containsAt(int[] haystack, int[] needle, int offset) {
        for (int i = 0; i < needle.length; i++) {
            if (haystack[i + offset] != needle[i]) {
                return false;
            }
        }
        return true;
    }

    // prevent initialization

    private ArrayUtils() {}
}
    byte[] arr1 = {1, 2, 3, 4, 5, 6, 7, 7, 8, 9, 1, 3, 4, 56, 6, 7};
    byte[] arr2 = {9, 1, 3};

    boolean i = IsContainsSubArray(arr1, arr2);

 public static boolean IsContainsSubArray(byte[] Large_Array, byte[] Sub_Array){
    try {
        int Large_Array_size, Sub_Array_size, k = 0;

        Large_Array_size = Large_Array.length;
        Sub_Array_size = Sub_Array.length;

        if (Sub_Array_size > Large_Array_size) {
            return false;
        }
        for (int i = 0; i < Large_Array_size; i++) {
            if (Large_Array[i] == Sub_Array[k]) {
                k++;
            } else {
                k = 0;
            }
            if (k == Sub_Array_size) {
                return true;
            }
        }
    } catch (Exception e) {
    }
    return false;
}

来自Guava的代码:

import javax.annotation.Nullable;

/**
 * Ensures that an object reference passed as a parameter to the calling method is not null.
 *
 * @param reference an object reference
 * @param errorMessage the exception message to use if the check fails; will be converted to a
 *     string using {@link String#valueOf(Object)}
 * @return the non-null reference that was validated
 * @throws NullPointerException if {@code reference} is null
 */
public static <T> T checkNotNull(T reference, @Nullable Object errorMessage) {
    if (reference == null) {
        throw new NullPointerException(String.valueOf(errorMessage));
    }
    return reference;
}


/**
 * Returns the start position of the first occurrence of the specified {@code
 * target} within {@code array}, or {@code -1} if there is no such occurrence.
 *
 * <p>More formally, returns the lowest index {@code i} such that {@code
 * java.util.Arrays.copyOfRange(array, i, i + target.length)} contains exactly
 * the same elements as {@code target}.
 *
 * @param array the array to search for the sequence {@code target}
 * @param target the array to search for as a sub-sequence of {@code array}
 */
public static int indexOf(int[] array, int[] target) {
    checkNotNull(array, "array");
    checkNotNull(target, "target");
    if (target.length == 0) {
        return 0;
    }

    outer:
    for (int i = 0; i < array.length - target.length + 1; i++) {
        for (int j = 0; j < target.length; j++) {
            if (array[i + j] != target[j]) {
                continue outer;
            }
        }
        return i;
    }
    return -1;
}

我想通过三种方式做到这一点:

  1. 不使用导入,即使用纯Java语句。

  2. 使用JAVA核心API-在某种程度上或很大程度上。

  3. Using string pattern search algorithms like KMP etc. (Probably the most optimized one.)

1、2和3都显示在上面的答案中。以下是我这边的方法2:

public static void findArray(int[] array, int[] subArray) {

        if (subArray.length > array.length) {
            return;
        }

        if (array == null || subArray == null) {
            return;
        }

        if (array.length == 0 || subArray.length == 0) {
            return;
        }

        //Solution 1
        List<Integer> master = Arrays.stream(array).boxed().collect(Collectors.toList());
        List<Integer> pattern = IntStream.of(subArray).boxed().collect(Collectors.toList());

        System.out.println(Collections.indexOfSubList(master, pattern));

        //Solution2
        for (int i = 0; i <= array.length - subArray.length; i++) {
            String s = Arrays.toString(Arrays.copyOfRange(array, i, i + subArray.length));

            if (s.equals(Arrays.toString(subArray))) {
                System.out.println("Found at:" + i);
                return;
            }
        }
        System.out.println("Not found.");
    }

使用java 8和lambda表达式:

String[] smallArray = {"1","2","3"};
final String[] bigArray = {"0","1","2","3","4"};
boolean result = Arrays.stream(smallArray).allMatch(s -> Arrays.stream(bigArray).anyMatch(b -> b.equals(s)));

PS:使用finalString[]bigArray来封闭lambda表达式的空间是很重要的。

仅供参考:如果目标只是搜索数组y是否是数组x的子集,我们可以使用以下方法:

val x = Array(1,2,3,4,5)
val y = Array(3,4,5)
val z = Array(3,4,8)
x.containsSlice(y) // true
x.containsSlice(z) // false




相关问题
Spring Properties File

Hi have this j2ee web application developed using spring framework. I have a problem with rendering mnessages in nihongo characters from the properties file. I tried converting the file to ascii using ...

Logging a global ID in multiple components

I have a system which contains multiple applications connected together using JMS and Spring Integration. Messages get sent along a chain of applications. [App A] -> [App B] -> [App C] We set a ...

Java Library Size

If I m given two Java Libraries in Jar format, 1 having no bells and whistles, and the other having lots of them that will mostly go unused.... my question is: How will the larger, mostly unused ...

How to get the Array Class for a given Class in Java?

I have a Class variable that holds a certain type and I need to get a variable that holds the corresponding array class. The best I could come up with is this: Class arrayOfFooClass = java.lang....

SQLite , Derby vs file system

I m working on a Java desktop application that reads and writes from/to different files. I think a better solution would be to replace the file system by a SQLite database. How hard is it to migrate ...