English 中文(简体)
How many instructions to access pointer in C?
原标题:

I am trying to figure out how many clock cycles or total instructions it takes to access a pointer in C. I dont think I know how to figure out for example, p->x = d->a + f->b

i would assume two loads per pointer, just guessing that there would be a load for the pointer, and a load for the value. So in this operations, the pointer resolution would be a much larger factor than the actual addition, as far as trying to speed this code up, right?

This may depend on the compiler and architecture implemented, but am I on the right track?

I have seen some code where each value used in say, 3 additions, came from a

 f2->sum = p1->p2->p3->x + p1->p2->p3->a + p1->p2->p3->m

type of structure, and I am trying to define how bad this is

最佳回答

This depends on the architecture at hand.

Some architectures can reference/dereference memory for an instruction without first loading it into a register, others don t. Some architectures don t have the notion of instructions that compute the offsets for you to dereference and will make you load the memory address, add your offset to it, and then allow you to dereference the memory location. I m sure there are more variances chip-to-chip.

Once you get past these, each instruction takes varying amount of time depending on the architecture as well. To be honest though, it s an overhead that is very, very minimal.

For your immediate question of dereferencing a chain of items, the slowness will come in the fact that there is likely a poor locality of reference the farther you go in a dereferencing chain. This means more cache misses, which means more hits to main memory (or disk!) to get the data. Main memory is very slow compared to the CPU.

问题回答

Some IDEs like VisualStudio allow you to view the assembly generated along with the source code.

How to view the assembly behind the code using Visual C++?

Then you can see for your exact architecture and implementation what it looks like.

If you are using GDB (linux, mac) use disassemble

(gdb) disas 0x32c4 0x32e4
Dump of assembler code from 0x32c4 to 0x32e4:
0x32c4 <main+204>:      addil 0,dp
0x32c8 <main+208>:      ldw 0x22c(sr0,r1),r26
0x32cc <main+212>:      ldil 0x3000,r31
0x32d0 <main+216>:      ble 0x3f8(sr4,r31)
0x32d4 <main+220>:      ldo 0(r31),rp
0x32d8 <main+224>:      addil -0x800,dp
0x32dc <main+228>:      ldo 0x588(r1),r26
0x32e0 <main+232>:      ldil 0x3000,r31
End of assembler dump.

Depends what you are doing, a trivial pointer dereference y = *z; where

int x = 1;
int* z = &x;
int y;

might assemble to something like this on the x86:

mov eax, [z]
mov eax, [eax]
mov [y], eax

and y = x would still take a memory dereference:

mov eax, [x]
mov [y], eax

Mov instructions to memory take about 2-4 cycles IIRC.

Although, if you are loading memory from completely random locations, you will be causing a lot of page faults, resulting in hundreds of clock cycles being wasted.

Where it can, the compiler will remove that overhead for you by keeping repeatedly-used base locations in a register (eg. p1->p2->p3 in your example).

However, sometimes the compiler can t determine which pointers might alias other pointers used within your function - which means that it has to fall back to a very conservative position, and reload values from pointers frequently.

This is where C99 s restrict keyword can help. It lets you inform the compiler when certain pointers are never aliased by other pointers in the scope of the function, which consquently can improve the optimisation.


For example, take this function:

struct xyz {
    int val1;
    int val2;
    int val3;
};

struct abc {
    struct xyz *p2;
};

int foo(struct abc *p1)
{
    int sum;

    sum = p1->p2->val1 + p1->p2->val2 + p1->p2->val3;

    return sum;
}

Under gcc 4.3.2 with optimisation level -O1, it compiles to this x86 code:

foo:
    pushl   %ebp
    movl    %esp, %ebp
    movl    8(%ebp), %eax
    movl    (%eax), %edx
    movl    4(%edx), %eax
    addl    (%edx), %eax
    addl    8(%edx), %eax
    popl    %ebp
    ret

As you can see, it only deferences p1 once - it keeps the value of p1->p2 in the %edx register and uses it three times to fetch the three values from that structure.





相关问题
Fastest method for running a binary search on a file in C?

For example, let s say I want to find a particular word or number in a file. The contents are in sorted order (obviously). Since I want to run a binary search on the file, it seems like a real waste ...

Print possible strings created from a Number

Given a 10 digit Telephone Number, we have to print all possible strings created from that. The mapping of the numbers is the one as exactly on a phone s keypad. i.e. for 1,0-> No Letter for 2->...

Tips for debugging a made-for-linux application on windows?

I m trying to find the source of a bug I have found in an open-source application. I have managed to get a build up and running on my Windows machine, but I m having trouble finding the spot in the ...

Trying to split by two delimiters and it doesn t work - C

I wrote below code to readin line by line from stdin ex. city=Boston;city=New York;city=Chicago and then split each line by ; delimiter and print each record. Then in yet another loop I try to ...

Good, free, easy-to-use C graphics libraries? [closed]

I was wondering if there were any good free graphics libraries for C that are easy to use? It s for plotting 2d and 3d graphs and then saving to a file. It s on a Linux system and there s no gnuplot ...

Encoding, decoding an integer to a char array

Please note that this is not homework and i did search before starting this new thread. I got Store an int in a char array? I was looking for an answer but didn t get any satisfactory answer in the ...

热门标签