Question

I ran into a common interview question:

Assume we have an int array int[] A, and we want to find the first duplicate entry.

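Almost everyone can think of using a HashSet and adding to it while iterating. This gives O(n) time and O(n) space. After that, I was asked to solve it without other data structures. I said the dumbest idea is to compare every pair of entries, in O(n²) time. Then I was asked to improve on the O(n²) time.
To improve it, I thought of using a fixed-size array (assuming the maximum value is n), boolean[] b = new boolean[n]; however, I was not allowed to use this approach.
Then I thought of using an int variable and bit manipulation: if the maximum value is smaller than 32, then for a value n we can shift 1 left by n bits and OR it into a checker, then AND the checker with the mask of the next entry in the array to check whether the result is non-zero, e.g.: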
```
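int c = A[i];
// Parentheses are required: in Java, > binds tighter than &.
if ((check & (1 << c)) != 0) return false; // bit already set: c is a duplicate
check |= 1 << c;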
```

However, this was not allowed either.
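For reference, the bit-checker idea above can be written out as a complete method. This is only a sketch, assuming every value lies in the range 0..31 so that a single int can serve as the checker; the names `BitChecker` and `firstDuplicate` are made up for illustration.

```java
class BitChecker {
    /**
     * Returns the first value that is seen a second time, or -1 if all
     * values are distinct. Assumption: every value is in the range 0..31.
     */
    static int firstDuplicate(int[] a) {
        int checker = 0; // bit c is set once the value c has been seen
        for (int c : a) {
            if (c < 0 || c > 31) {
                throw new IllegalArgumentException("value out of range 0..31: " + c);
            }
            int mask = 1 << c;
            if ((checker & mask) != 0) {
                return c; // bit already set: c is a duplicate
            }
            checker |= mask;
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(firstDuplicate(new int[] {3, 7, 1, 7, 3})); // prints 7
    }
}
```

The comparison uses `!= 0` rather than `> 0` because the mask for the value 31 is negative.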

Then I was given a hint: I could use the array itself as a hashset/hashtable, and "linear hashing"?

Any help? Thanks.

Answer 1

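Linear hashing, as defined by Wikipedia (http://en.wikipedia.org/wiki/Linear_hashing), has the advantage that resizing happens gradually, because buckets are split one by one in round-robin fashion, which retains constant amortized time complexity for insertion with resizing. Their idea must therefore be to iterate over the array, reusing the elements already iterated over as storage for linear hashing.

While I am far from an expert on linear hashing, I don't see any way to fit the hash table into the array. Granted, to store n elements with linear hashing you might get by with n buckets. However, since the number of elements in a bucket is unbounded, you would need something like a linked list to implement each bucket, which costs an additional O(n) memory for pointers.

As such, this algorithm does not yield a better asymptotic space complexity than an ordinary HashSet. It does reduce memory consumption by a constant factor, though.

Its time complexity is on par with the ordinary HashSet.

Edit: It seems to me that this answer is being ignored (no votes, no comments). Is it not useful? Please comment so I know what to improve.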

Answer 2

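I have this idea: as you progress through the array, you sort the part you have already visited. By using binary search you improve the time; extra space is 0. The sort itself is... insertion sort. You basically run the sort as normal, but when you search for the place to insert the new member and you hit the number itself, you shout "bingo". This is an improvement over zero space + O(n²) time.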
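The idea above might look roughly like the following. It is a sketch rather than a definitive implementation: the names `SortedPrefix` and `firstDuplicateIndex` are invented here, and the method reorders the input array as a side effect. Note that binary search brings the number of comparisons down to O(n log n), but shifting elements to make room for each insertion can still move O(n²) elements in the worst case.

```java
class SortedPrefix {
    /**
     * Returns the index of the first element that equals an earlier element,
     * or -1 if there is none. The array is sorted in place as a side effect.
     */
    static int firstDuplicateIndex(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int x = a[i];
            // Binary search for x in the sorted prefix a[0..i).
            int lo = 0, hi = i;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (a[mid] == x) {
                    return i; // "bingo": x is already in the visited part
                } else if (a[mid] < x) {
                    lo = mid + 1;
                } else {
                    hi = mid;
                }
            }
            // Shift the larger elements one slot right and insert x at lo.
            System.arraycopy(a, lo, a, lo + 1, i - lo);
            a[lo] = x;
        }
        return -1;
    }
}
```

For the array {1, 2, 5, 7, 5, 3, 10, 2} this returns index 4, the position of the second 5.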

Answer 3

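I would ask the interviewer(s) why they don't want you to use "other data structures" when there is clearly a built-in structure designed for this purpose: HashSet.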

1. It is O(n). You probably won't do much better than this using other methods, unless you do something really clever and get it down to O(log n).
2. This is Java, not C. There are readily available data structures to do this, painlessly, with almost no additional effort on the programmer's part.
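To make the point concrete, here is a minimal sketch of the HashSet approach using a forward scan. The class and method names are hypothetical; the only library behaviour relied on is that `Set.add` returns false when the element is already present.

```java
import java.util.HashSet;
import java.util.Set;

class HashSetScan {
    /** Returns the index of the first element that repeats an earlier one, or -1. */
    static int firstDuplicateIndex(int[] a) {
        Set<Integer> seen = new HashSet<>();
        for (int i = 0; i < a.length; i++) {
            // Set.add returns false when the element was already in the set.
            if (!seen.add(a[i])) {
                return i;
            }
        }
        return -1;
    }
}
```

This runs in O(n) expected time with O(n) extra space, which is exactly the trade-off the interviewer ruled out.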

From the Java documentation on the Collections Framework (http://docs.oracle.com/javase/6/docs/technotes/guides/countings/index.html):

The collections framework is a unified architecture for representing and manipulating collections, allowing them to be manipulated independently of the details of their representation. It reduces programming effort while increasing performance. It allows for interoperability among unrelated APIs, reduces effort in designing and learning new APIs, and fosters software reuse.

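Addendum

Most of the comments below take the position that this is merely an exercise to determine the programmer's skills.

This "interview" is for a Java programming position. Java is an object-oriented language that can perform tasks like this without the need to design a process from scratch (as in C and various other low-level languages). In addition, Java is not the best choice when space complexity is a concern. That said, read entry one in my list above again.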

Answer 4

Well, you gave the answer yourself: linear hashing does exist. It has complexity O(1)/O(1) according to http://cgi.di.uoa.gr/~ad/MDE515/e_ds_linearhashing.pdf, so you'd take out elements from the array one after the other while using the first few as memory for the hash map.
But really, it's a data structure that you implement yourself.

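Either that, or the interviewer doesn't really understand that a data structure is still a data structure even if you implement it yourself.

Mainly, this is a question where you either know the answer or you don't; there is no way to come up with it during an interview. I hope you don't end up working for them.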

Answer 5

This does not use linear hashing, but it works faster than O(N²):

1. Choose some small number C and use a brute-force algorithm to find the first duplicate among the first C elements of the array. Clear the first C elements if nothing has been found yet.
2. Perform the remaining steps with the first N elements empty. Initially, N=C. After each iteration, N is doubled.
3. Sequentially add numbers from indexes N+1 .. 3*N/2 to the hash table in the first N array elements. Use open addressing. After all N/2 elements are moved, the hash load factor should be 1/2. Clear the space occupied by the N/2 elements we just moved. For the next N/4 elements, search each of them in the hash table(s) constructed so far, then hash them to the space, which is always twice as large as the number of elements. Continue this until N-C array elements are hashed. Search the remaining C elements in the hash tables and compare them to each other.
4. Now we have N array elements without duplicates, occupying 2*N space. Rehash them in place.
5. Sequentially search all other elements of the array in this hash table. Then clear these 2*N elements, set N=2*N, and continue with step 3.

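Steps 3..5 may be simplified. Just hash elements N+1 .. 3*N/2 and search all other elements of the array in this hash table. Then do the same for elements 3*N/2+1 .. 2*N. This is twice as slow as the original algorithm, but still O(N log N) on average.

Another alternative is to use the first N empty elements to construct a binary search tree for elements N+1 .. 3*N/2 and search all other elements of the array in this tree. Then do the same for elements 3*N/2+1 .. 2*N. (This works only if the array is small enough that its elements can be indexed by integer values.)

The algorithm described above is probabilistic and works in O(N log N) time on average. Its worst-case complexity is O(N²). The alternative with a binary search tree may have O(N log² N) worst-case complexity if the tree is self-balancing. But this is complicated. The task can be done in O(N log² N) worst-case time with a simpler algorithm.

This algorithm iterates sequentially through the array and maintains the following invariant: the largest possible sub-array whose size is a power of two and which fits to the left of the current position starts at index 0 and is sorted; the next such sub-array follows it and is also sorted; and so on. In other words, the binary representation of the current index describes which sorted sub-arrays precede it. For example, for index 87 (1010111) we have a single element at index 86, a sorted pair at index 84, a sorted sub-array of 4 elements at index 80, a sorted sub-array of 16 elements at index 64, and a sorted sub-array of 64 elements at the beginning of the array.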

1. Iterate through the array.
2. Search for the current element in all preceding sub-arrays using binary search.
3. Sort the current element together with those preceding sub-arrays that correspond to trailing "ones" in the binary representation of the current index. For example, for index 87 (1010111), we need to sort the current element together with 3 sub-arrays (1+1+2+4=8 elements). This step adds the current element to the sub-arrays while keeping the algorithm's invariant.
4. Continue with the next iteration of step 1.
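The steps above can be sketched in Java as follows. This is an illustration under a few assumptions, not the answer's own code: the names `SortedBlocks` and `firstDuplicateIndex` are invented, the "sort together" step simply calls `Arrays.sort` on the affected range instead of doing a dedicated in-place merge, and the input array is reordered as a side effect.

```java
import java.util.Arrays;

class SortedBlocks {
    /**
     * Returns the index of the first element that equals an earlier element,
     * or -1 if there is none. The array is reordered as a side effect.
     */
    static int firstDuplicateIndex(int[] a) {
        for (int i = 0; i < a.length; i++) {
            int x = a[i];
            // Binary-search x in every sorted block before index i.
            // Each 1-bit of i, from high to low, stands for one block of that size.
            int start = 0;
            for (int size = Integer.highestOneBit(i); size > 0; size >>= 1) {
                if ((i & size) != 0) {
                    if (Arrays.binarySearch(a, start, start + size, x) >= 0) {
                        return i;
                    }
                    start += size;
                }
            }
            // Sort a[i] together with the blocks that match the trailing 1-bits
            // of i. With t trailing ones this covers 1 + (2^t - 1) = 2^t elements.
            int trailingOnes = Integer.numberOfTrailingZeros(~i);
            int merged = 1 << trailingOnes;
            Arrays.sort(a, i + 1 - merged, i + 1);
        }
        return -1;
    }
}
```

For index 87 the code computes three trailing ones and sorts the 8 elements at indexes 80..87, matching the example in step 3.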

Answer 6

Pseudocode:

res = -1;
startArray = [...];
sortedArray = mergeSort(copy of startArray);
for i = 1 to n
     x = binary_search(sortedArray, startArray[i]); // array, element; returns the index found
     // a neighbour that does not exist (x-1 < 1 or x+1 > n) counts as "not equal"
     if ((sortedArray[x] == sortedArray[x-1]) || (sortedArray[x] == sortedArray[x+1]))
           res = i;
           break;
if (res != -1)
     print("First duplicate is ", startArray[res]);
else
     print("There are no duplicates");

Merge sort worst case: O(n log n)

Binary search worst case: O(log n)

n binary searches worst case: O(n log n)

Total: O(n log n)
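A Java rendering of the pseudocode might look like this. It is a sketch with some substitutions: `Arrays.sort` stands in for `mergeSort`, `Arrays.binarySearch` stands in for the binary search, and the names `SortAndSearch` and `firstDuplicateIndex` are invented. Note that it needs O(n) extra space for the sorted copy.

```java
import java.util.Arrays;

class SortAndSearch {
    /** Returns the index of the first element that occurs more than once, or -1. */
    static int firstDuplicateIndex(int[] startArray) {
        int[] sortedArray = startArray.clone(); // sorted copy: O(n) extra space
        Arrays.sort(sortedArray);
        for (int i = 0; i < startArray.length; i++) {
            // The element is always present, so the returned index is >= 0.
            int x = Arrays.binarySearch(sortedArray, startArray[i]);
            boolean sameLeft = x > 0 && sortedArray[x - 1] == sortedArray[x];
            boolean sameRight = x < sortedArray.length - 1
                    && sortedArray[x + 1] == sortedArray[x];
            if (sameLeft || sameRight) {
                return i; // an equal neighbour in sorted order means a duplicate
            }
        }
        return -1;
    }
}
```

For {1, 2, 5, 7, 5, 3, 10, 2} this returns index 1, because 2 is the first element that has a duplicate somewhere in the array.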

Answer 7

I was given the additional restriction of no additional memory, only registers. This is what I came up with:

int i, j = arr.length; // declared outside the loops so they can be inspected afterwards
outer:
for (i = 0; i < arr.length - 1; i++)
    for (j = i + 1; j < arr.length; j++)
        if (arr[i] == arr[j])
            break outer;

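If i and j are both < arr.length after the loops, they are the indexes of the first duplicated value and of its match.

It does at most about n(n-1)/2 comparisons, which is a bit better than a full n² because j never covers the whole length of arr, although it is still O(n²) asymptotically.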

Answer 8

Here is an algorithm with O(n) average time:

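import java.util.HashSet;
import java.util.Set;

public class ArrayUtils {
    public static int firstRepeatingElement(int[] elements) {
        int index = -1;
        Set<Integer> set = new HashSet<Integer>();

        // Scan right to left: the last hit recorded is the leftmost element
        // that occurs again later in the array.
        for (int i = elements.length - 1; i >= 0; i--) {
            if (set.contains(elements[i])) {
                index = i;
            }
            set.add(elements[i]);
        }
        if (index != -1) {
            return elements[index];
        }
        throw new IllegalArgumentException("No repeating elements found");
    }
}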

Here are the test cases:

@Test
public void firstRepeatingElementTest() {
    int [] elements = {1,2,5,7,5,3,10,2};
    int element = ArrayUtils.firstRepeatingElement(elements);
    assertThat(element, is(2));
}

@Test(expected=IllegalArgumentException.class)
public void firstRepeatingElementTestWithException() {
    int [] elements = {1,2,5,7,3,10};
    // No value repeats, so the call itself must throw; nothing after it would run.
    ArrayUtils.firstRepeatingElement(elements);
}

Answer 9

I believe this is the "linear hashing" solution your interviewers were looking for. We first need to assume two additional constraints:

length of A is >= max value of A
All values of A are positive

With these additional constraints we can solve the problem using less time and space.

OK, let's get to the code:

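int findFirstDuplicateEntry(int[] A) {
    for (int i = 0; i < A.length; i++) {
        // A[i] may already have been negated as a marker, so use its absolute value.
        int index = Math.abs(A[i]) - 1;
        if (A[index] < 0) {
            return Math.abs(A[i]); // slot already marked: this value was seen before
        }
        A[index] = -A[index]; // mark this value as seen
    }
    return -1; // no duplicate found
}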

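What I am doing here is using the array itself to store some additional information. As I iterate over the array, each time I come across a value I use that value as an index. At that index I check the stored value. If it is negative, I know I have been here before (because of the all-positives constraint), so I have found my first duplicate and can exit. Otherwise, I negate the value at that index.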
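One side effect of this trick is that the input array is left with negated entries. The variant below is a sketch of how that could be handled, not part of the original solution: it first checks the two constraints, then applies the same sign marking, and finally restores the array. The names `SignMarking` and `firstDuplicate` are invented for illustration.

```java
class SignMarking {
    /**
     * Returns the first value whose second occurrence is reached, or -1.
     * Assumption: every value v satisfies 1 <= v <= a.length.
     * The array content is restored before returning.
     */
    static int firstDuplicate(int[] a) {
        // Validate the constraints up front, before anything is modified.
        for (int v : a) {
            if (v < 1 || v > a.length) {
                throw new IllegalArgumentException("value out of range 1..length: " + v);
            }
        }
        int result = -1;
        for (int i = 0; i < a.length && result == -1; i++) {
            int slot = Math.abs(a[i]) - 1; // a[i] may carry a marker sign
            if (a[slot] < 0) {
                result = slot + 1; // slot already marked: value seen before
            } else {
                a[slot] = -a[slot]; // mark the value as seen
            }
        }
        // Undo the markers so the caller gets the array back unchanged.
        for (int i = 0; i < a.length; i++) {
            a[i] = Math.abs(a[i]);
        }
        return result;
    }
}
```

The extra passes keep the running time at O(n) and the extra space at O(1).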
