Java - multithreaded code does not run faster on more cores

I was running some multithreaded code on a 4-core machine in the hope that it would be faster than on a single-core machine. Here's the idea: I have a fixed number of threads (in my case one thread per core). Every thread executes a Runnable of the form:

private static int[] data; // data shared across all threads


public void run() {

    int i = 0;

    while (i++ < 5000) {

        // do some work
        for (int j = 0; j < 10000 / numberOfThreads; j++) {
            // each thread performs calculations and reads from and
            // writes to a different part of the data array
        }

        // wait for the other threads
        barrier.await();
    }
}

On a quad-core machine, this code performs worse with 4 threads than it does with 1 thread. Even with the CyclicBarrier's overhead, I would have thought that the code should run at least 2 times faster. Why does it run slower?

EDIT: Here's a busy-wait implementation I tried. Unfortunately, it makes the program run slower on more cores (also being discussed in a separate question here):

public void run() {

    // do work

    synchronized (this) {

        if (atomicInt.decrementAndGet() == 0) {

            atomicInt.set(numberOfOperations);

            for (int i = 0; i < threads.length; i++)
                threads[i].interrupt();
        }
    }

    while (!Thread.interrupted()) {}
}
Best answer

Adding more threads is not guaranteed to improve performance. Possible causes of decreased performance include:

  • Coarse-grained locking may overly serialize execution - that is, a lock may result in only one thread running at a time. You get all the overhead of multiple threads but none of the benefits. Try to reduce how long locks are held.
  • The same applies to overly frequent barriers and other synchronization structures. If the inner j loop completes quickly, you might spend most of your time in the barrier. Try to do more work between synchronization points.
  • If your code runs too quickly, there may be no time to migrate threads to other CPU cores. This usually isn't a problem unless you create a lot of very short-lived threads. Using thread pools, or simply giving each thread more work, can help. If your threads run for more than a second or so each, this is unlikely to be a problem.
  • If your threads are working on a lot of shared read/write data, cache line bouncing may decrease performance. That said, although this often results in performance degradation, this alone is unlikely to result in performance worse than the single-threaded case. Try to make sure the data that each thread writes is separated from other threads' data by the size of a cache line (usually around 64 bytes). In particular, don't have output arrays laid out like [thread A, B, C, D, A, B, C, D ...].

Since you haven't shown your code, I can't go into more detail here.
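
The layout advice in the last bullet can be sketched as follows. This is only an illustration, not the asker's code: the class ContiguousPartition and its methods blockFor and fill are hypothetical names. Each thread gets one contiguous block of the shared array instead of interleaved slots, so threads write to the same cache line only near block boundaries.

```java
import java.util.Arrays;

// Hypothetical helper (not from the original post): gives every thread one
// contiguous block of the shared array instead of interleaved elements.
class ContiguousPartition {

    // Returns {startInclusive, endExclusive} of the block owned by threadIndex.
    static int[] blockFor(int threadIndex, int numberOfThreads, int length) {
        int base = length / numberOfThreads;
        int extra = length % numberOfThreads; // the first `extra` blocks get one more element
        int start = threadIndex * base + Math.min(threadIndex, extra);
        int end = start + base + (threadIndex < extra ? 1 : 0);
        return new int[] { start, end };
    }

    // Each thread writes its own index into every element of its block.
    static int[] fill(int length, int numberOfThreads) {
        int[] data = new int[length];
        Thread[] threads = new Thread[numberOfThreads];
        for (int t = 0; t < numberOfThreads; t++) {
            final int id = t;
            final int[] block = blockFor(id, numberOfThreads, length);
            threads[t] = new Thread(() -> {
                for (int i = block[0]; i < block[1]; i++) {
                    data[i] = id;
                }
            });
            threads[t].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join(); // join() makes the workers' writes visible to the caller
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return data;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fill(10, 4)));
    }
}
```

With an array this small the blocks still share cache lines, so the example only shows the indexing scheme. The benefit would be expected to appear once each block spans many cache lines.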

Answers

You're sleeping nanoseconds instead of milliseconds.

I changed from

Thread.sleep(0, 100000 / numberOfThreads); // sleep 0.025 ms for 4 threads

to

Thread.sleep(100000 / numberOfThreads);

and got a speed-up proportional to the number of threads started, just as expected.


I invented a CPU-intensive "countPrimes". Full test code available here.

I get the following speed-up on my quad-core machine:

4 threads: 1625
1 thread: 3747

(The CPU-load monitor indeed shows that 4 cores are busy in the former case, and that 1 core is busy in the latter case.)

Conclusion: The portion of work each thread does between synchronizations is too small. The synchronization takes far more time than the actual CPU-intensive computation.

(Also, if you have memory-intensive code, such as tons of array accesses in the threads, the CPU won't be the bottleneck anyway, and you won't see any speed-up by splitting it across multiple CPUs.)
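
As a rough illustration of "more work between synchronization points", here is a minimal sketch of a CPU-bound prime count that is split into one large chunk per thread and synchronized only once, when the partial results are collected. It is not the test code linked above: the class ParallelPrimeCount and its methods are hypothetical names chosen for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example class: one big chunk of work per thread, one sync at the end.
class ParallelPrimeCount {

    static boolean isPrime(int n) {
        if (n < 2) return false;
        for (int d = 2; (long) d * d <= n; d++) {
            if (n % d == 0) return false;
        }
        return true;
    }

    // Counts the primes in [from, to).
    static int countPrimes(int from, int to) {
        int count = 0;
        for (int n = from; n < to; n++) {
            if (isPrime(n)) count++;
        }
        return count;
    }

    // Splits [0, limit) into numberOfThreads chunks and sums the partial counts.
    static int countPrimesParallel(int limit, int numberOfThreads) {
        ExecutorService pool = Executors.newFixedThreadPool(numberOfThreads);
        try {
            int chunk = (limit + numberOfThreads - 1) / numberOfThreads; // ceiling division
            List<Future<Integer>> parts = new ArrayList<>();
            for (int t = 0; t < numberOfThreads; t++) {
                final int from = Math.min(limit, t * chunk);
                final int to = Math.min(limit, from + chunk);
                Callable<Integer> task = () -> countPrimes(from, to);
                parts.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> part : parts) {
                total += part.get(); // the only point where threads are waited on
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(countPrimesParallel(100, 4)); // 25 primes below 100
    }
}
```

Equal-sized ranges are not perfectly balanced here, because larger numbers take longer to test. A real benchmark would probably want smaller chunks handed out dynamically.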

The code inside the Runnable does not actually do anything.
In your specific example of 4 threads, each thread will sleep for 2.5 seconds and wait for the others via the barrier.
So all that is happening is that each thread gets on the processor to increment i and then blocks for the sleep, leaving the processor available.
I do not see why the scheduler would allocate each thread to a separate core, since the threads mostly wait.
It is fair and reasonable to expect it to just use the same core and switch among threads.
UPDATE
Just saw that you updated the post, saying that some work is happening in the loop. What that work is, though, you do not say.

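Synchronizing across cores is much slower than synchronizing on a single core.

This is because on a single-core machine the JVM does not flush the cache (a very slow operation) during each sync.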

http://mailinator.blogspot.com/2008/03/how-fast-is-java-volaless-or-atomic-or.html

Here is an untested SpinBarrier, but it should work.

Check whether it improves your case. Since you run the code in a loop, extra synchronization only hurts performance if it leaves cores idle. By the way, I still believe you have a bug in the calculation, or a memory-intensive operation. Can you tell us which CPU and OS you use?

Edit: I had left the text out.

import java.util.concurrent.atomic.AtomicInteger;

public class SpinBarrier {
    final int permits;
    final AtomicInteger count;
    final AtomicInteger version;
    public SpinBarrier(int count){ 
        this.count = new AtomicInteger(count);
        this.permits= count;
        this.version = new AtomicInteger();
    }

    public void await() {
        for (int c = count.decrementAndGet(), v = this.version.get(); c != 0 && v == version.get(); c = count.get()) {
            spinWait();
        }
        if (count.compareAndSet(0, permits)) { // only one succeeds here, the rest will lose the CAS
            this.version.incrementAndGet();
        }
    }

    protected void spinWait() {
    }
}
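
One thing to watch for in a barrier of this shape is a thread that leaves await() and re-enters it before the version counter has been bumped. It could then read the old version and be released early in the next round. A minimal sketch of a variant that avoids this is shown below. It is an assumption-laden illustration rather than the answerer's code: GenerationSpinBarrier and selfCheck are hypothetical names, and Thread.onSpinWait() requires Java 9 or later. Each thread reads the generation before announcing its arrival, and only the last arrival resets the counter and then advances the generation.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical variant of a spinning barrier; only the last arrival resets it.
class GenerationSpinBarrier {
    private final int parties;
    private final AtomicInteger remaining;
    private final AtomicInteger generation = new AtomicInteger();

    GenerationSpinBarrier(int parties) {
        this.parties = parties;
        this.remaining = new AtomicInteger(parties);
    }

    void await() {
        final int arrivedIn = generation.get(); // read before announcing arrival
        if (remaining.decrementAndGet() == 0) {
            remaining.set(parties);             // reset the counter first ...
            generation.incrementAndGet();       // ... then release the waiters
        } else {
            while (generation.get() == arrivedIn) {
                Thread.onSpinWait();            // Java 9+ spin hint
            }
        }
    }

    // Runs `threads` workers through `rounds` barrier rounds and reports
    // whether any worker got past the barrier before all had arrived.
    static boolean selfCheck(int threads, int rounds) {
        GenerationSpinBarrier barrier = new GenerationSpinBarrier(threads);
        AtomicInteger arrivals = new AtomicInteger();
        AtomicInteger violations = new AtomicInteger();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int r = 1; r <= rounds; r++) {
                    arrivals.incrementAndGet();
                    barrier.await();
                    // After round r, every worker must have arrived r times.
                    if (arrivals.get() < r * threads) {
                        violations.incrementAndGet();
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return violations.get() == 0;
    }

    public static void main(String[] args) {
        System.out.println(selfCheck(4, 1000));
    }
}
```

Like any pure spin barrier, this burns CPU while waiting, so it is only likely to pay off when there are at least as many free cores as threads and the waits are short.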



