Homework 2.

Due: Wednesday, October 20th, 2004

Submission: By plain text e-mail to lboloni@cpe.ucf.edu. In the subject write: EEL 5708, HW 2

Do not include other correspondence with the homework. Send it in a separate e-mail with an appropriate subject line, e.g.: question about problem 4.

1. Consider the following assembly program written for a RISC-style 32-bit machine:

           ld r20, 16000        # load immediate
           ld r19, 20000
           ld r18, 40000
    start: add r3, r20, r19
           ld r1, 0(r3)         # load into r1 from memory
           add r4, r20, r18
           ld r2, 0(r4)
           mult r2, r1, r2
           st 0(r4), r2         # store r2 into memory
           ld r2, 4
           sub r20, r20, r2
           bnez r20, start      # jump to start if r20 is not equal to zero

a) Write a C, C++ or Java program which corresponds to this assembly-level program.

b) Identify the RAW, WAR and WAW conflicts which can appear in the program.

c) Assume that the program is run on a machine which has a 5-stage pipeline. Rewrite the program by inserting "nop" operations where there will be stalls.

d) Would a processor with Tomasulo-style register renaming improve the execution time of this program? What about speculation? Superscalar execution? Which one would perform better, the original version or the unrolled one (see part e)? Explain.

e) Unroll the loop in the program 4 times in software (assembly). Transform the program by register renaming such that the number of stalls is minimized.

2. Suppose that a processor with a load/store architecture executes at a clock rate of 1 GHz, with an ideal CPI of 1.2. Profiling shows that, on average, 15% of the instructions are load or store instructions. The processor accesses the memory through separate data and instruction caches. On average, 2% of the instructions produce an instruction cache miss, while 12% of the data accesses are cache misses. The penalty of a miss is 140 cycles. Cache hits do not produce any penalty.

a) What is the real CPI of the architecture?

b) What is the average memory access time (AMAT) of the architecture?
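
Note on problem 2: the usual way of accounting for memory stall cycles with split instruction and data caches can be sketched as below. This is a minimal sketch, not a solution. The function and parameter names are hypothetical, the hit time is an assumed parameter that the problem does not state, and the numbers in the example are illustrative and deliberately differ from those of the problem.

```python
def effective_cpi(ideal_cpi, mem_fraction, inst_miss_rate, data_miss_rate, miss_penalty):
    """CPI including memory stall cycles, for split instruction/data caches."""
    # Every instruction is fetched once, so instruction misses count per instruction.
    inst_stalls = inst_miss_rate * miss_penalty
    # Only loads and stores access the data cache.
    data_stalls = mem_fraction * data_miss_rate * miss_penalty
    return ideal_cpi + inst_stalls + data_stalls


def amat_cycles(hit_time, mem_fraction, inst_miss_rate, data_miss_rate, miss_penalty):
    """Average memory access time in cycles over instruction and data accesses."""
    # Per instruction: one fetch plus mem_fraction data accesses.
    accesses = 1.0 + mem_fraction
    misses = inst_miss_rate + mem_fraction * data_miss_rate
    # hit_time is an assumption; it is not given in the problem statement.
    return hit_time + (misses / accesses) * miss_penalty


# Illustrative values only (not the values from problem 2).
example_cpi = effective_cpi(1.0, 0.25, 0.01, 0.04, 100)
example_amat = amat_cycles(1.0, 0.25, 0.01, 0.04, 100)
```

Multiplying an AMAT expressed in cycles by the clock period converts it to time; at a 1 GHz clock one cycle is 1 ns.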