Load half word and load byte in a single cycle datapath

There was a problem that asked how to implement load byte in a single-cycle datapath without changing the data memory, and the solution was as follows.

alt text http://img214.imageshack.us/img214/7107/99897101.jpg

This is actually quite a realistic question; most memory systems are entirely word-based, and individual bytes are typically only dealt with inside the processor. When you see a “bus error” on many computers, this often means that the processor tried to access a memory address that was not properly word-aligned, and the memory system raised an exception.

Anyway, because byte addresses might not be a multiple of 4, we cannot pass them to memory directly. However, we can still get at any byte, because every byte can be found within some word, and all word addresses are multiples of 4. So the first thing we do is to make sure we get the right word. If we take the high 30 bits of the address (i.e., ALUresult[31-2]) and combine them with two 0 bits at the low end (this is what the “left shift 2” unit is really doing), we have the byte address of the word that contains the desired byte. This is just the byte’s own address, rounded down to a multiple of 4. This change means that lw will now also round addresses down to multiples of 4, but that’s OK since non-aligned addresses wouldn’t work for lw anyway with this memory unit.

OK, now we get the data word back from memory. How do we get the byte we want out of it? Well, note that the byte’s byte-offset within the word is just given by the low-order 2 bits of the byte’s address. So, we simply use those 2 bits to select the appropriate byte out of the word using a mux. Note the use of big-endian byte numbering, as is appropriate for MIPS.

Next, we have to zero-extend the byte to 32 bits (i.e., just combine it with 24 zeros at its high end), because the problem specifies to do so. Actually, this was a slight mistake in the question: in reality, the lbu instruction zero-extends the byte, but lb sign-extends it. Oh, well.

Finally, we have to extend the MemtoReg-controlled mux to accept one new input: the zero-extended byte for the lb case. The MemtoReg control signal must be widened to 2 bits. The original 0 and 1 cases change to 00 and 01, respectively, and we add a new case 10 which is only used in the case of lb.
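The steps in the explanation above can be sketched as a small software model. This is only an illustration under assumptions: the function names, the list-of-words memory, and the sample data are hypothetical and do not come from the original figure.

```python
# Hypothetical model of the byte-load path; all names are illustrative.

def read_word(memory, byte_address):
    """Word-only memory: accepts only addresses that are multiples of 4."""
    assert byte_address % 4 == 0, "memory only accepts word-aligned addresses"
    return memory[byte_address // 4]

def load_byte_unsigned(memory, alu_result):
    word_address = alu_result & ~0x3   # clear the low 2 bits (round down to a multiple of 4)
    byte_offset = alu_result & 0x3     # low 2 bits select the byte within the word
    word = read_word(memory, word_address)
    shift = 8 * (3 - byte_offset)      # big-endian: offset 0 is the most significant byte
    return (word >> shift) & 0xFF      # keeping only 8 bits is the zero extension

def sign_extend_byte(byte):
    """What a real lb would do instead of zero extension (32-bit result)."""
    return byte | 0xFFFFFF00 if byte & 0x80 else byte

def writeback_mux(mem_to_reg, alu_result, mem_word, zext_byte):
    """2-bit MemtoReg: 0b00 = ALU result, 0b01 = memory word, 0b10 = extended byte."""
    return {0b00: alu_result, 0b01: mem_word, 0b10: zext_byte}[mem_to_reg]

memory = [0x11223344, 0xAABBCCDD]      # two words at byte addresses 0 and 4
print(hex(load_byte_unsigned(memory, 5)))   # byte at address 5 -> 0xbb
```

The memory model deliberately rejects unaligned addresses, which mirrors the constraint that the data memory itself is not modified; all byte handling happens after the word has been read.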

I don't quite understand how this works even after reading the explanation, especially the claim that left-shifting the ALU result by 2 gives the byte address. How is this possible? If I wanted to load a half word, would I shift left by one to get the address of the half word? And what would be a better way to implement load byte and load half word if we could modify the data memory? (The question above imposes the constraint that we can't modify the data memory.)

Best answer

The original author seems simply to be adding a byte multiplexer to the 32-bit data read from the memory. The memory supports a full 32-bit, naturally aligned load (the lw instruction), and the additional byte multiplexer and zero extension allow for load byte instructions as well (the lbu instruction).

The left shift of the ALU result yields a word address, NOT a byte address, and accounts for the implicit right shift by two in the signal routing (only the upper 30 bits of the ALU result are routed to the shifter). The end result is simply that the lower two bits of the ALU result are masked (zeroed) before the address is sent to the memory. The two LSBs of the ALU value are fed downstream of the memory to the byte multiplexer, allowing the word memory to read arbitrary bytes.
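A minimal sketch of that address split, assuming a 32-bit non-negative ALU result; `split_address` is a hypothetical helper name used only for illustration. It shows that dropping the two low bits and then shifting left by two is the same as masking them to zero.

```python
def split_address(alu_result):
    word_index = alu_result >> 2       # ALUresult[31:2], the high 30 bits
    word_address = word_index << 2     # "left shift 2": append two 0 bits
    byte_select = alu_result & 0x3     # ALUresult[1:0], routed to the byte mux
    return word_address, byte_select

for addr in (0x1000, 0x1001, 0x1002, 0x1003):
    word_address, byte_select = split_address(addr)
    assert word_address == addr & ~0x3  # identical to masking the low two bits
    print(hex(addr), hex(word_address), byte_select)
```

All four addresses map to the same word address, 0x1000; only the byte select differs.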

There is no direct support in the logic shown for loading half-words (16 bits), just bytes and full 32-bit words. You could, however, easily modify the byte addressing logic to support half-words instead of bytes (or even both) using a similar approach.
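One way such a half-word extension might look, as a sketch only: the memory still receives the word-aligned address, and bit 1 of the ALU result picks the upper or lower half of the returned word. The function name is hypothetical, and big-endian numbering and a half-word-aligned address are assumed.

```python
def select_half_unsigned(word, alu_result):
    # Assumption: half-word loads must be 2-byte aligned (bit 0 is 0).
    assert alu_result % 2 == 0, "half-word address must be a multiple of 2"
    half_select = (alu_result >> 1) & 0x1   # bit 1 picks the half within the word
    shift = 16 * (1 - half_select)          # big-endian: half 0 is the upper 16 bits
    return (word >> shift) & 0xFFFF         # keeping only 16 bits is the zero extension

word = 0xAABBCCDD                           # word read from the aligned address
print(hex(select_half_unsigned(word, 0x1000)))   # upper half -> 0xaabb
print(hex(select_half_unsigned(word, 0x1002)))   # lower half -> 0xccdd
```

Note that no "shift left by one" is involved: the address sent to memory is still rounded down to a multiple of 4, and the selection happens on the data coming back.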
