School of Computer Science

CDA 4150: Computer Architecture (Fall 2005)

Instructor: Prof. Euripides Montagne

Lecture Meetings: TuTh 1:30 p.m. to 2:45 p.m. (ENG2 105)

Office hours: MW 3:00 p.m. to 5:00 p.m. and TuTh 3:15 p.m. to 4:15 p.m. (CSB 239)

eurip@cs.ucf.edu Tel: 407-823-2684


TA: HONGLIANG GAO
Office Hours: Wednesday, 2:30 p.m. - 4:30 p.m. (CSB 107)
hgao@cs.ucf.edu Tel: 407-823-3228


Announcements

  • Dec 12 Grades of Project1, Project2, and Project3 posted.
  • Nov 30 Grades of Project1 and Project2 posted.
  • Nov 29 Cache implementation project posted. Due 11:59 p.m. on Dec. 1st. You may do either the previously posted Project 3 or this one.
    • Submission: Please compress your project, including a README file, and send it to droygao@gmail.com.
    • Some hints from CDA4150 2004:
      • 10/30/04 - These words of Lab 3 debugging advice come from your classmate Carlos Sierra:

        While simv is running, Ctrl-C interrupts the simulator. Type help or "?" for a list of commands. Use the print or trace commands to debug. Print lets you print any signal value (the syntax requires module_name.module_name.module_name.signal, just like virsims). Trace gives a list of lines where the code is looping; read those lines and follow the loop in your code. Also add $display statements for suspicious signals in your code.


        10/28/04 - Lab 3 Hints:
        I suggest (though it is not required) implementing the I$ accesses first and not worrying about the D$ at all until the I$ works. The boot code has about 20 instructions before the first load or store (a store), so you will have ample opportunity to test hits and misses in the I$ before you need to worry about the data cache. Once the CPU is properly missing, fetching 8 instructions, hitting on the next 7 instructions, missing again, and so on, you will have completed a significant portion of the lab. The data cache code is strikingly similar to the instruction cache code, with the added complication of writebacks to main memory.
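A minimal Python sketch of the miss/hit pattern described above. It assumes a direct-mapped I$ with 8-word (32-byte) lines. The number of lines (NUM_LINES) and the helper names are hypothetical. This is a behavioral model for checking what you expect to see in simulation, not the lab's Verilog.

```python
# Behavioral sketch: direct-mapped I$ with 8-word (32-byte) lines.
# NUM_LINES is a hypothetical cache size chosen only for illustration.
WORDS_PER_LINE = 8
BYTES_PER_LINE = WORDS_PER_LINE * 4
NUM_LINES = 16


def split_addr(addr):
    """Split a byte address into (tag, index, byte offset within the line)."""
    offset = addr % BYTES_PER_LINE
    index = (addr // BYTES_PER_LINE) % NUM_LINES
    tag = addr // (BYTES_PER_LINE * NUM_LINES)
    return tag, index, offset


def simulate_fetches(addrs):
    """Return 'H' or 'M' per fetch; a miss fills the whole 8-word line."""
    valid = [False] * NUM_LINES  # Valid bits start cleared, as in the lab
    tags = [0] * NUM_LINES
    result = []
    for a in addrs:
        tag, index, _ = split_addr(a)
        if valid[index] and tags[index] == tag:
            result.append("H")
        else:
            result.append("M")
            valid[index] = True  # line is now filled from memory
            tags[index] = tag
    return result
```

Fetching 16 sequential instructions from address 0 gives one miss followed by 7 hits per line: `"".join(simulate_fetches(range(0, 64, 4)))` is `"MHHHHHHHMHHHHHHH"`.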


        You should read the caches combinationally, and you can assume that, like other combinational logic in our designs, the access happens in "zero time". For example, the I$ is addressed by Iaddr. Whenever Iaddr changes, your code may re-read the tag array of the I$ and assign the result to a variable called iTag. This happens with zero delay in our behavioral simulator. As a result, the I$ access can happen after the negedge of the clock, and Iin can still be flopped with the result (on a hit) at the next posedge.


        You should not use any delays in your code other than the standard `TICK we have been using with flops. Aside from having no physical meaning, such delays would break your design if I changed the number of ticks in a clock cycle. You do not wait for a certain amount of time by adding arbitrary delays to your logic. Instead, you build a finite state machine (FSM) that drives the appropriate outputs when it sees the appropriate inputs. Your submitted design should work if I change MEM_DELAY or the number of Verilog ticks in the CLK cycle time. It will work automatically as long as you never use any delays in your logic aside from `TICK for the flop delay.
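The FSM approach can be sketched behaviorally in Python. The state names (IDLE, FILL, RETRY), the one-cycle retry, and the mem_done handshake are assumptions for illustration, not signals from mips.v. The controller advances only when memory signals completion, so the same next-state logic works for any memory latency.

```python
# Behavioral sketch: a cache-miss controller written as a finite state machine.
# State names and the mem_done handshake are hypothetical, not mips.v signals.
IDLE, FILL, RETRY = "IDLE", "FILL", "RETRY"


def next_state(state, miss, mem_done):
    """Next-state logic, evaluated once per clock edge; no delays anywhere."""
    if state == IDLE:
        return FILL if miss else IDLE
    if state == FILL:
        # Stay in FILL for as long as memory needs; there is no fixed wait time.
        return RETRY if mem_done else FILL
    if state == RETRY:
        # Re-read the cache; the freshly filled line now hits.
        return IDLE
    raise ValueError(f"unknown state {state}")


def cycles_to_service_miss(mem_delay):
    """Clock the FSM against a memory that signals done after mem_delay FILL cycles."""
    state, cycles, fill_cycles = IDLE, 0, 0
    while True:
        miss = state == IDLE and cycles == 0  # a single miss at cycle 0
        mem_done = state == FILL and fill_cycles == mem_delay
        if state == FILL:
            fill_cycles += 1
        state = next_state(state, miss, mem_done)
        cycles += 1
        if state == IDLE:
            return cycles
```

In this model, cycles_to_service_miss(d) returns d + 3 for every d >= 0. The miss latency follows the memory delay without any change to next_state, which is the property the hint asks for.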


        As some of you have noticed, the Valid bits are initialized to 0 but the Dirty bits are not. Consequently, on the first miss to a line, your miss logic might find the existing line to be Dirty but not Valid and mistakenly try to write the line back to main memory. Be sure that your cache controller only writes lines back that are Dirty and Valid.


        Be sure that you are not combinationally reading the cache all the time. You don't want to read during a miss (the cache has only one port and part of the miss time you will be writing to it). So you should read the tag and state from the cache combinationally only if you are not stalled or if it is the retry cycle.
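The two controller conditions above reduce to simple Boolean expressions: write back only lines that are both Valid and Dirty, and read the tag and state only when not stalled or on the retry cycle. The following is a minimal Python sketch. The function names are hypothetical. In Verilog, each condition could be written as a single continuous assignment.

```python
def must_write_back(valid, dirty):
    """Write the victim line back only if it is Valid AND Dirty.

    Dirty bits are not initialized, so an invalid line may show a stale
    Dirty bit; checking Valid as well prevents a spurious writeback.
    """
    return valid and dirty


def cache_read_enable(stalled, retry_cycle):
    """Read tag/state combinationally only when not stalled, or on the retry cycle.

    During a miss the single-ported cache is busy being written by the fill.
    """
    return (not stalled) or retry_cycle
```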


        mips.v expects that Addr is driven during the entire fill cycle for I$ and D$ fills.

  • Nov 22 Slides on matrix multiplication using systolic arrays posted.
  • Nov 19 Project3 posted. Due Dec. 1st.
  • Nov 16 Homework posted. Due Nov. 22nd, 11:59 p.m.
  • Nov 15 The deadline of project2 is extended to 11:59 p.m. on Nov 16th. Submissions after the deadline but before 11:59 p.m. on Nov 17th receive a 50% late penalty. No submissions after 11:59 p.m. on Nov 17th will be accepted.
  • Nov 11 Because the monroe server may be down Friday and Saturday for installation of a replacement power supply, the deadline of project2 is extended to 11:59 p.m. on Nov 15th. Submissions after the deadline but before 11:59 p.m. on Nov 16th receive a 50% late penalty. No submissions after 11:59 p.m. on Nov 16th will be accepted.
  • Nov 9 If you got a "Missing README file" error, please change the access permissions of your project with these commands:
    • cd ~/
    • chmod -R 755 cda4150
    • Then submit your project
    • After submission, use "chmod -R 700 cda4150" to remove read permission for everyone except you. Otherwise, others can read your project and copy it!
  • Nov 7 Submission of Project2
    • First, the deadline has been postponed to 11:59 p.m. on Nov 12th. Submissions after the deadline but before 11:59 p.m. on Nov 13th receive a 50% late penalty. No submissions after 11:59 p.m. on Nov 13th will be accepted.
    • Please run "make clobber" first. Then run "~hgao/cda4150/submit4150 lab2" to submit your project. If you have any problem submitting it, please compress it and send the compressed file to "droygao@gmail.com" before the deadline.
  • Nov 6 IMPORTANT INFORMATION for Project2
    • Please copy the new mips.v from ~hgao/cda4150 to your lab2 directory. Depending on subtle timing in your Verilog code, there was still a race condition that could generate spurious writes to the memory system. I changed the memory system to examine the write signals only halfway through the cycle, giving all signals time to settle to their stable values. If you thought you had everything working but your simulations were looping forever, please grab this file and give it a try. It will likely fix your problems.
    • Hints
      • The signal inDelaySlot should be entirely removed from the design. The delay slot in lab2 falls out of the pipeline structure, and unlike lab1 we do not need to do anything special to create one. In addition the signals savedBranchPC and instIsBranch should also be removed.
      • savedBranchPC, inDelaySlot, and instIsBranch all go the way of the State always block in the pipelined processor. Bye Bye.
      • If you get an "Illegal Instruction" error, your PC fetch logic is most likely messed up. Your execution actually starts in boot.o (you can disassemble it) and then jumps to your main program. If you do not see the first instructions in boot.o as the instructions you are executing, there are problems with your fetch logic. Check to make sure PC is negedge flopped. Be sure that PC1, PC2, PC3, and PC4 are posedge flopped. Also be sure that PC1, PC2, IR1, IR2, and PC do not change when decodeStall is high.
      • Make sure PC chooses 1 of 4 possible values based on PCsel.
      • Make sure you are fetching from the PC. Iaddr is the instruction address sent to the memory system. Make sure you are incrementing PC as well.
      • Branches and jumps are resolved in EX. Therefore PC calculations involving branches or jumps should use the PC of the instruction in the EX stage (PC2) in their calculations.
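The PC hints above can be sketched as a next-PC function. Several details are assumptions made only for illustration: the PCsel encoding (0 = PC + 4, 1 = branch, 2 = J/JAL, 3 = JR/JALR), the argument names, and the premise that PC2 holds the address of the branch or jump itself. The target formulas are the standard MIPS ones. A branch offset is relative to the delay-slot address (PC2 + 4), and a jump keeps the top 4 bits of that address.

```python
MASK32 = 0xFFFFFFFF


def sign_extend16(imm16):
    """Sign-extend a 16-bit immediate to a Python int."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def next_pc(pcsel, pc, pc2, imm16=0, jump_index=0, rs_value=0):
    """Pick the next fetch address; the pcsel encoding here is hypothetical."""
    if pcsel == 0:  # sequential fetch from the current PC
        return (pc + 4) & MASK32
    if pcsel == 1:  # taken branch, resolved in EX: use PC2, not PC
        return (pc2 + 4 + (sign_extend16(imm16) << 2)) & MASK32
    if pcsel == 2:  # J/JAL: top 4 bits of the delay-slot address + 26-bit index
        return ((pc2 + 4) & 0xF0000000) | ((jump_index & 0x3FFFFFF) << 2)
    if pcsel == 3:  # JR/JALR: target comes from a register
        return rs_value & MASK32
    raise ValueError(f"pcsel must be 0..3, got {pcsel}")
```

The key point from the hints is in the branch and jump cases. Their targets are computed from PC2, the PC of the instruction in EX, not from the fetch PC, which has already moved ahead.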
  • Nov 2 Please check these slides from Prof. Heinrich for the block diagram mentioned in project2.
  • Nov 2 The submission command was wrong in the project2 pdf file. Please check the latest file when you submit.
  • Oct 22 --- Project2 posted. Due 11/8/05 11:59pm.
  • Oct 18 Slides of "I/O performance" posted.
  • Oct 17 The deadline of Project1 is postponed to 23:59:00 on Oct 21st.
    • Please add "~heinrich/myusr/local/bin" in your PATH of .cshrc. Edit your .cshrc accordingly, "source .cshrc" and try it again.
  • Oct 17 Here are some references from Prof. Heinrich's material of CDA4150 Spring04:
  • Oct 16 Project1 Submission
    Please follow these instructions to submit your project1:
    • First cd into your lab1 directory (cd ~yourid/cda4150/lab1).
    • Edit your README file as described in the project file.
      Here is an example:
      GROUP: hgao1 hgao2
      HONGLIANG GAO1
      HONGLIANG GAO2
      Lab1
    • Then run "make clobber"
    • At last run "~hgao/cda4150/submit4150 lab1"
    • The due time is 23:59:00 on Oct 18th. You can resubmit your project up to 5 times.
    • If you have any problem submitting it, please compress your files with "tar -czf yourid.tgz *", then email yourid.tgz to droygao@gmail.com with your id as the subject.
  • Oct 12

    Quote from Prof. Heinrich's material:
    Project 1 Hints -- How do I write an assembly language program to test my processor (versus a C program)? If you like, you can write assembly directly, but for this lab you will need to update a few files. To write assembly language programs for Lab 1, follow these steps:
    cp ~hgao/cda4150/{Makefile,sample.s} ~/cda4150/lab1/test
    That's it. To see a sample assembly file, look at the sample.s file you just copied into your lab1/test directory. To assemble and run it, in the test/ directory type make sample. Then you can run test/sample through vcs. At this point you can create whatever .s files you want, using sample.s as a template and assemble and run them at will.

    What does the following Verilog line do?

    loadData = (Din >> ((~Daddr & 32'h3) << 3)) & 32'hff;

    Follow the logic through and think about what LBU needs to do given the data address Daddr. Din is the aligned word containing the byte described by Daddr. This is a big-endian machine, so byte offset 0 is the most significant byte of Din. The line above inverts the bottom two bits of Daddr (the byte offset within the word) and shifts Din right by that amount times 8, that is, by 0, 1, 2, or 3 bytes. Inverting gives offset 0 the largest shift (24 bits) and offset 3 no shift. After the shift, the byte you want sits in the low-order 8 bits, so ANDing the result with 0xff leaves 24 leading zeros and the correct byte (based on Daddr) of Din in the low-order 8 bits. This is what LBU does! Note how LB is slightly different. LHU and LH will also be similar... but different.
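A Python model of the loadData expression can help you check expected register values by hand. The lbu function mirrors the Verilog line above exactly. The lb, lhu, and lh functions are sketches of the standard MIPS semantics, not code from the lab. LB and LH sign-extend, and LH/LHU assume a halfword-aligned Daddr. All four functions return 32-bit unsigned patterns.

```python
def lbu(din, daddr):
    """LBU: same arithmetic as the Verilog loadData line (big endian)."""
    return (din >> ((~daddr & 0x3) << 3)) & 0xFF


def lb(din, daddr):
    """LB: like LBU, but copies bit 7 into the upper 24 bits."""
    b = lbu(din, daddr)
    return (b | 0xFFFFFF00) if b & 0x80 else b


def lhu(din, daddr):
    """LHU: halfword at offset 0 is the upper 16 bits; daddr[0] assumed 0."""
    return (din >> ((~daddr & 0x2) << 3)) & 0xFFFF


def lh(din, daddr):
    """LH: like LHU, but copies bit 15 into the upper 16 bits."""
    h = lhu(din, daddr)
    return (h | 0xFFFF0000) if h & 0x8000 else h
```

For Din = 0x11223344, lbu returns 0x11, 0x22, 0x33, and 0x44 for byte offsets 0 through 3. Offset 0 is the most significant byte, which is why the offset is inverted before shifting.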

    If for whatever reason you decide to edit the comparisons in qc.v be cautioned that you should NOT use > or >= or <= comparisons. This instantiates quite a bit of hardware (think of what that function would do). In qc.v you should use tests of equality (or inequality) and checks of the sign bit only. I will discuss in class why the branch comparison logic is especially latency-sensitive.
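Equality and sign-bit tests are enough because every MIPS conditional branch either compares two registers for equality (BEQ, BNE) or compares one register against zero (BLTZ, BGEZ, BLEZ, BGTZ). Below is a Python sketch of those conditions. It is an illustration, not the contents of qc.v. No magnitude comparator appears anywhere.

```python
SIGN_BIT = 0x80000000


def branch_taken(op, rs, rt=0):
    """Branch condition using only (in)equality and the sign bit of rs."""
    rs &= 0xFFFFFFFF
    rt &= 0xFFFFFFFF
    negative = (rs & SIGN_BIT) != 0  # sign bit of a two's-complement word
    zero = rs == 0                   # 32-bit equality test against zero
    if op == "BEQ":
        return rs == rt
    if op == "BNE":
        return rs != rt
    if op == "BLTZ":
        return negative
    if op == "BGEZ":
        return not negative
    if op == "BLEZ":
        return negative or zero
    if op == "BGTZ":
        return not negative and not zero
    raise ValueError(f"unsupported branch {op}")
```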

    Here is an updated link to details on the origins of Big Endian and Little Endian from Danny Cohen's famous article "On Holy Wars and a Plea for Peace".

  • Oct 10 --- Sample .cshrc and .login files from David M. Lyle. Based on the project's pdf file, make the following changes to these two files:
    change .cshrc to add "~heinrich/myusr/local/bin" and "/mcad/synopsys/vcs6.0/bin" in PATH
    change a line in .login (then you don't need to source .cshrc anymore):
    In the test block for SunOS_ver == 5, change this:
    setenv PATH .:/usr/local/bin:/usr/openwin/bin:/usr/dt/bin:/usr/local/hosts:/usr/ucb:/usr/bin:/bin:/usr/ccs/bin:/usr/etc
    to this:
    setenv PATH .:/usr/local/bin:/usr/openwin/bin:/usr/dt/bin:/usr/local/hosts:/usr/ucb:/usr/bin:/bin:/usr/ccs/bin:/usr/etc:$PATH
    Thanks David!
  • Oct 08 --- Tip for a common problem for Project1
  • Sep 28 --- Instructions on Cygwin and virsims posted. virsims is a tool to show waveforms and help you debug your verilog code.
  • Sep 28 --- Project1 posted. Due 10/18/05 11:59pm.
  • Sep 28 --- Solution of Project0 posted.
  • Sep 28 --- Slides of Lecture 4 - "Vector Processing CRAY like machines" posted.
  • Sep 26 --- Notes on Computer Organization and Architecture by Dr. Barry Wilkinson (read Lecture 6) posted.
  • Sep 25 --- Tip for a common problem for Project0
  • Sep 21 --- Project0 posted.
  • Sep 20 --- TA's office hours changed.
  • Sep 19 --- Midterm exam is on Oct. 4th.
  • Sep 19 --- Slides of Bus Architectures posted.
  • Aug 25 --- Syllabus and handout of interrupt handling are posted.

Syllabus

Handouts:

  1. Interrupt handling
  2. Flynn's Taxonomy
  3. Bus Architectures
  4. verilog_view, verilog_print
  5. Computer Organization and Architecture by Dr. Barry Wilkinson (read Lecture 6)
  6. Lecture 4 - "Vector Processing CRAY like machines"
  7. I/O performance

Projects Page