Homework 2.

Due: Wednesday, October 20th, 2004
Submission: By plain text e-mail to lboloni@cpe.ucf.edu.
            In the subject write: EEL 5708, HW 2
            
            Do not include other correspondence with the homework. Send it in
            a separate e-mail with an appropriate subject line, e.g.
            "question about problem 1".


1. Consider the following assembly program written for a RISC-style 32-bit
machine:

   ld  r20, 16000     # load immediate
   ld  r19, 20000  
   ld  r18, 40000    
start:
   add r3, r20, r19
   ld  r1, 0(r3)      # load into r1 from memory
   add r4, r20, r18
   ld  r2, 0(r4)
   mult r2, r1, r2
   st  0(r4), r2      # store r2 into memory
   ld  r2, 4
   sub r20, r20, r2   
   bnez r20, start    # jump to start if r20 is not equal to zero
   
  a) Write a C, C++ or Java program which corresponds to this assembly level 
  program.
  
  b) Identify the RAW, WAR and WAW conflicts which can appear in the
  program.
  
  c) Assume that the program is run on a machine which has a 5-stage pipeline.
  Rewrite the program by inserting "nop" operations where there will be stalls.
  
  d) Would a processor with Tomasulo-style register renaming improve the
  execution time of this program? What about speculation? Superscalar
  execution? Which one would perform better, the original version or the
  unrolled one from part e)? Explain.
  
  e) Unroll the loop in the program 4 times in software (assembly). 
  Transform the program by register renaming such that the number of stalls 
  is minimized.  
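
  Hint for part b): the three conflict types can be checked mechanically by
  comparing the destination and source registers of every pair of
  instructions. The sketch below is only an illustration under stated
  assumptions: the function name find_hazards, the (dest, sources) tuple
  encoding and the four-instruction example are hypothetical and are not the
  homework program. It compares every pair of instructions and does not
  account for intervening redefinitions or for pipeline timing, so it lists
  potential conflicts, not actual stalls.

```python
def find_hazards(instrs):
    """List potential RAW/WAR/WAW conflicts between instruction pairs.

    instrs: list of (dest, sources) tuples in program order, where dest is
    a register name (or None for instructions that write no register) and
    sources is a tuple of register names that are read.
    Returns a list of (kind, earlier_index, later_index, register).
    """
    hazards = []
    for i, (dest_i, srcs_i) in enumerate(instrs):
        for j in range(i + 1, len(instrs)):
            dest_j, srcs_j = instrs[j]
            # RAW: the later instruction reads what the earlier one writes.
            if dest_i is not None and dest_i in srcs_j:
                hazards.append(("RAW", i, j, dest_i))
            # WAR: the later instruction writes what the earlier one reads.
            if dest_j is not None and dest_j in srcs_i:
                hazards.append(("WAR", i, j, dest_j))
            # WAW: both instructions write the same register.
            if dest_i is not None and dest_i == dest_j:
                hazards.append(("WAW", i, j, dest_i))
    return hazards


# Hypothetical example sequence (not the homework program):
#   0: add r1, r2, r3
#   1: sub r4, r1, r5
#   2: add r2, r6, r7
#   3: add r4, r8, r9
example = [
    ("r1", ("r2", "r3")),
    ("r4", ("r1", "r5")),
    ("r2", ("r6", "r7")),
    ("r4", ("r8", "r9")),
]

for kind, first, second, reg in find_hazards(example):
    print(f"{kind} on {reg}: instruction {first} -> instruction {second}")
```

  For the example sequence this reports one conflict of each type. Stores and
  branches write no register, so they would be encoded with dest set to None.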
  
2. Suppose that a processor with a load/store architecture executes at a clock
rate of 1 GHz, with an ideal CPI of 1.2. Profiling shows that, on average, 15%
of the instructions are load or store instructions. The processor accesses
memory through separate data and instruction caches. On average, 2% of the
instructions produce an instruction cache miss, while 12% of the data accesses
are cache misses. The penalty of a miss is 140 cycles. Cache hits do not
produce any penalty.

(a)	What is the real CPI of the architecture?
(b)	What is the average memory access time (AMAT) of the architecture?
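
Hint for problem 2: the usual approach is to add the memory stall cycles per
instruction to the ideal CPI, and to compute AMAT as hit time plus miss rate
times miss penalty. The sketch below is a minimal illustration under stated
assumptions: the function names are hypothetical, the numbers in the example
call are made up and are not the values of this problem, and the default hit
time of one cycle is an assumption that the problem statement does not give.

```python
def effective_cpi(ideal_cpi, mem_frac, i_miss_rate, d_miss_rate, penalty):
    """Ideal CPI plus memory stall cycles per instruction.

    Every instruction is fetched once (instruction cache), and a fraction
    mem_frac of the instructions also makes one data access (data cache).
    """
    instr_stalls = i_miss_rate * penalty
    data_stalls = mem_frac * d_miss_rate * penalty
    return ideal_cpi + instr_stalls + data_stalls


def amat_cycles(mem_frac, i_miss_rate, d_miss_rate, penalty, hit_time=1.0):
    """Average memory access time in cycles over both caches combined.

    hit_time is an assumed parameter (default: one cycle).
    """
    accesses_per_instr = 1.0 + mem_frac            # one fetch + data accesses
    misses_per_instr = i_miss_rate + mem_frac * d_miss_rate
    overall_miss_rate = misses_per_instr / accesses_per_instr
    return hit_time + overall_miss_rate * penalty


# Hypothetical numbers, chosen only for illustration:
# ideal CPI 1.0, 20% loads/stores, 1% instruction misses,
# 5% data misses, 100-cycle miss penalty.
cpi = effective_cpi(1.0, 0.20, 0.01, 0.05, 100)
amat = amat_cycles(0.20, 0.01, 0.05, 100)
print(f"effective CPI = {cpi:.2f}, AMAT = {amat:.2f} cycles")
```

To convert AMAT from cycles to time, multiply by the clock period (the
reciprocal of the clock rate).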