6.893 Advanced VLSI Computer Architecture
Readings and Other Resources

These readings have been designed to provide background for research in computer architecture and to provide controversial material to encourage discussion. The schedule lists the assigned papers using the BibTeX paper reference keys from the list below. The papers will be made available either online or in NE43-620 at least one week prior to the class meeting.

The usual class structure will begin with a 10-minute summary of the previous meeting's discussions, then two student-led discussion sessions of approximately 30 minutes each, and finally a 10-minute introduction to the material to be discussed in the next meeting. Some classes will replace one or both reading discussions with additional lecture material.

Each student will lead a discussion session based on assigned readings and should prepare a 10-minute summary (5 slides) that summarizes and critiques the material, leaving 20 minutes for class discussion. The discussion leader is encouraged to research the work beyond the assigned paper and to include this extra detail in the presentation. Either a PowerPoint or PDF version of the slides must be sent in to be added to the class web site. All students are expected to have read all the assigned papers before each class session. Each student should prepare their own comments and questions on each paper and be ready to discuss these in class.


Assigned Readings


September 14

Alpha 21264 Case Study Part 1: Microarchitecture & Performance

@Article{kessler:micro:99,
  author =       {R. E. Kessler},
  title =        {The {Alpha} 21264 Microprocessor},
  journal =      {IEEE Micro},
  year =         1999,
  volume =       19,
  number =       2,
  month =        {March/April},
  pages =        {24--36}
}

Ken Conley, slides PPT, PDF

@InProceedings{cvetanovic:isca:2000,
  author =       {Z. Cvetanovic and R. E. Kessler},
  title =        {Performance analysis of the {Alpha} 21264-based
                  {Compaq} {ES40} system},
  booktitle =    {ISCA 2000},
  pages =        {192--202},
  year =         2000,
  address =      {Vancouver, Canada},
  month =        {June},
  URL = {http://www.acm.org/pubs/articles/proceedings/isca/339647/p192-cvetanovic/p192-cvetanovic.pdf}
}

Mark Hampton, slides PPT, PDF

September 19

Alpha 21264 Case Study Part 2: VLSI Implementation

@Article{gronowski:ijssc:1998,
  author =       {P. E. Gronowski and W. J. Bowhill and R. P. Preston
                 and M. K. Gowan and R. L. Allmon},
  title =        {High-Performance Microprocessor Design},
  journal =      {IEEE Journal of Solid-State Circuits},
  year =         1998,
  volume =       33,
  number =       5,
  month =        {May},
  pages =        {676--686}
}

Albert Ma, slides PPT, PDF

@Article{farrel:jssc:1998,
  author =       {J. A. Farrell and T. C. Fischer},
  title =        {Issue logic for a {600-MHz} out-of-order execution
                  microprocessor},
  journal =      {IEEE Journal of Solid-State Circuits},
  year =         1998,
  volume =       33,
  number =       5,
  pages =        {707--712},
  month =        {May}
}

September 21

Limits of Conventional Microarchitecture Scaling

@InProceedings{palacharla:isca:1997,
  author =       {S. Palacharla and N. Jouppi and J. E. Smith},
  title =        {Complexity-Effective Superscalar Processors},
  booktitle =    {Proceedings 24th International Symposium on Computer
                  Architecture},
  year =         1997,
  month =        {June},
  pages =        {206--218},
  URL = {http://www.acm.org/pubs/articles/proceedings/isca/264107/p206-palacharla/p206-palacharla.pdf},
  URL2 = {ftp://ftp.cs.wisc.edu/sohi/papers/1997/isca.complexity.ps.gz}
}

Second URL for above might be easier to print.

@InProceedings{agarwal:isca:2000,
  author =       {V. Agarwal and M. S. Hrishikesh and S. W. Keckler
                  and D. Burger},
  title =        {Clock rate versus {IPC}: the end of the road for
                  conventional microarchitectures},
  booktitle =    {Proceedings 27th International Symposium on Computer
                  Architecture},
  year =         {2000},
  month =        {June},
  address =      {Vancouver, Canada},
  pages =        {248--259},
  URL = {http://www.acm.org/pubs/articles/proceedings/isca/339647/p248-agarwal/p248-agarwal.pdf}
}

Ed Olson, slides PPT, PDF

September 28

Low-Power Design

@Article{chandrakasan:jssc:1992,
  author =       {A. P. Chandrakasan and S. Sheng and R. W. Brodersen},
  title =        {Low-Power {CMOS} Digital Design},
  journal =      {IEEE Journal of Solid-State Circuits},
  year =         1992,
  volume =       27,
  number =       4,
  month =        {April},
  pages =        {473--484}
}
@Article{burd:jvsp:1996,
  author =       {T. D. Burd and R. W. Brodersen},
  title =        {Processor Design for Portable Systems},
  journal =      {Journal of VLSI Signal Processing},
  year =         1996,
  volume =       13,
  number =       {2/3},
  month =        {August/September},
  pages =        {203--222},
  URL = {http://infopad.eecs.berkeley.edu/infopad-ftp/papers/1996/Processor_Design_for_Portable_Systems/}
}

Krste's lecture notes PPT, PDF

October 5

Vectors Part 1: Cray-1, NEC SX-4/5

@Article{russel:cacm:1978,
  author =       "R. M. Russell",
  title =        "The {CRAY-1} computer system",
  journal =      "Communications of the ACM",
  year =         1978,
  volume =       21,
  number =       1,
  pages =        "63--72",
  month =        "January"
}
@Article{kolodzey:ieeetchmt:1981,
  author =       {J. S. Kolodzey},
  title =        {Cray-1 computer technology},
  journal =      {IEEE Transactions on Components, Hybrids, and
                  Manufacturing Technology},
  year =         1981,
  pages =        {181--186},
  month =        {June}
}
@InProceedings{hammond:sc:1996,
  author =       {S. W. Hammond and R. D. Loft},
  title =        {Architecture and Application: The Performance of
                  {NEC SX-4} on the {NCAR} Benchmark Suite},
  booktitle =    {Proceedings Supercomputing '96},
  year =         1996,
  URL =      {http://www.scd.ucar.edu/css/sc96/sc96.html},
  PDF = {http://www.ai.mit.edu/projects/aries/papers/vector/hammond.pdf}
}
@Misc{necsx5:1998,
  author =       {{HNSX Supercomputers Inc, NEC Corporation}},
  title =        {{SX-5} series architecture},
  URL = {http://www.hstc.necsyl.com/},
  year =         1998,
  month =        {June}
}

October 12

Vectors Part 2: Data Parallel Programming

@InProceedings{zagha:sc:1991,
  author =       {M. Zagha and G. E. Blelloch},
  title =        {Radix sort for vector multiprocessors},
  booktitle =    {Proceedings Supercomputing '91},
  year =         1991,
  month =        {November},
  pages =        {712--721},
  URL = {http://www.cs.cmu.edu/afs/cs.cmu.edu/project/scandal/public/papers/cray-sort-supercomputing91.ps.gz}
}
@Unpublished{martin:ucb:1996,
  author =       {R. Martin},
  title =        {A vectorized hash-join},
  note =         {IRAM course project report, University of California
                  at Berkeley},
  year =         1996,
  month =        {May},
  URL = {vectorhashjoin.pdf}
}
@Article{appel:js:1989,
  author =       {A. W. Appel and A. Bendiksen},
  title =        {Vectorized Garbage Collection},
  journal =      {Journal of Supercomputing},
  year =         1989,
  volume =       3,
  pages =        {151--160},
  URL = {http://www.cs.princeton.edu/faculty/appel/papers/169.ps}
}
@Misc{asanovic:unpub:1998,
  author =       {K. Asanovi\'{c}},
  title =        {Vectorizing {SPECint95}},
  howpublished = {Unpublished manuscript extracted from PhD Thesis},
  month =        {March},
  year =         1998,
  URL = {vspecint.pdf}
}

October 17

VLIW Part 1: ELI-512 and Multiflow Trace

@InProceedings{fisher:isca:1983,
  author =       "J. A. Fisher",
  title =        "Very Long Instruction Word Architectures and the ELI-512",
  pages =        "140--150",
  booktitle =    "Proc. 10th ISCA",
  year =         1983,
  publisher =    "Computer Society Press"
}
@InCollection{fisher:isca25:1998,
  author =       {J. A. Fisher},
  title =        {Retrospective on ``Very Long Instruction Word
                  Architectures and the ELI-512''},
  booktitle =    {25 Years of the International Symposium on Computer
                  Architecture},
  pages =        {34--36},
  publisher =    {ACM Press},
  year =         1998,
  editor =       {G. Sohi}
}
@InProceedings{fisher:scc:1984,
  author =       {J. A. Fisher and J. R. Ellis and J. C. Ruttenberg
                  and A. Nicolau},
  title =        {Parallel Processing: A Smart Compiler and a Dumb Machine},
  booktitle =    {Proceedings of the ACM Symposium on Compiler
                  Construction},
  year =         1984,
  pages =        {37--47}
}
@InProceedings{colwell:sc:1990,
  author =       {R. P. Colwell and W. E. Hall and C. S. Joshi and
                  D. B. Papworth and P. K. Rodman and J. E. Tornes},
  title =        {Architecture and implementation of a {VLIW} supercomputer},
  booktitle =    {Proceedings of Supercomputing '90},
  pages =        {910--919},
  year =         1990,
  address =      {New York, NY USA},
  month =        {November},
  URL = {http://www.acm.org/pubs/articles/proceedings/supercomputing/110382/p910-colwell/p910-colwell.pdf}
}

October 24

VLIW Part 2: Cydra-5 and Compilation Techniques

@Article{rau:computer:1989,
  author =       "B. R. Rau and D. W. L. Yen and W. Yen
                  and R. A. Towle",
  title =        "{The Cydra 5 Departmental Supercomputer: Design
                  Philosophies, Decisions, and Trade-offs}",
  journal =      "IEEE Computer",
  year =         1989,
  volume =       22,
  number =       1,
  pages =        "12--35",
  month =        "January"
}
@Article{chang:ieeetc:1995,
  author =       "P. P. Chang and N. J. Warter and S.
                  A. Mahlke and W. Y. Chen and W.-m. Hwu",
  title =        "Three Architectural Models for Compiler-Controlled
                  Speculative Execution",
  journal =      "IEEE Transactions on Computers",
  year =         1995,
  volume =       44,
  number =       4,
  pages =        "481--494",
  month =        "April"
}
@InProceedings{mahlke:asplos5:1992,
  author = "S. A. Mahlke and W. Y. Chen and W.-m. W. Hwu and
        B. R. Rau and M. S. Schlansker",
  title = "{Sentinel Scheduling for VLIW and Superscalar Processors}",
  booktitle = "Proceedings of the Fifth International Conference on
        Architectural Support for Programming Languages and Operating Systems",
  address = "Boston, Massachusetts",
  month = "October",
  year = 1992,
  pages = "238--247",
  URL = {http://www.acm.org/pubs/articles/proceedings/asplos/143365/p238-mahlke/p238-mahlke.pdf}
}

October 26

VLIW Part 3: Compilation Techniques and IA-64

@InProceedings{mahlke:isca:1995,
  author =       {S. A. Mahlke and R. E. Hank and J. E. McCormick and
                  D. I. August and W.-m. Hwu},
  title =        {A Comparison of Full and Partial Predicated
                  Execution Support for ILP Processors},
  booktitle =    {Proceedings 22nd International Symposium on Computer
                  Architecture},
  year =         1995,
  month =        {June},
  pages =        {138--149},
  URL = {http://www.acm.org/pubs/articles/proceedings/isca/223982/p138-mahlke/p138-mahlke.pdf}
}
@InProceedings{august:isca:1999,
  author =       {D. I. August and J. W. Sias and J.-M. Puiatti and
                  S. A. Mahlke and D. A. Connors and K. M. Crozier and
                  W. W. Hwu},
  title =        {The program decision logic approach to predicated
                  execution},
  booktitle =    {Proceedings of the 26th Annual International
                  Symposium on Computer Architecture},
  year =         1999,
  month =        {May},
  address =      {Atlanta, GA},
  pages =        {208--219},
  URL = {http://www.acm.org/pubs/articles/proceedings/isca/300979/p208-august/p208-august.pdf}
}
@Article{huck:micro:2000,
  author =       {J. Huck and D. Morris and J. Ross and A. Knies and
                  H. Mulder and R. Zahir},
  title =        {Introducing the {IA-64} Architecture},
  journal =      {IEEE Micro},
  year =         2000,
  volume =       20,
  number =       5,
  pages =        {12--23},
  month =        {September/October}
}
@Article{sharangpani:micro:2000,
  author =       {H. Sharangpani and K. Arora},
  title =        {Itanium Processor Microarchitecture},
  journal =      {IEEE Micro},
  year =         2000,
  volume =       20,
  number =       5,
  pages =        {24--43},
  month =        {September/October}
}

October 31

Multithreading: Tera MTA

@InProceedings{alverson:sc:1990,
    author = "R. Alverson and D. Callahan and D. Cummings and
                  B. Koblenz and A. Porterfield and B. Smith",
    title = "The {Tera} Computer System",
    booktitle = "Proceedings of the 1990 International Conference on
                  Supercomputing",
        Address = "Amsterdam, Netherlands",
    pages = "1--6",
    month = "September",
    year = 1990,
    URL = {http://www.acm.org/pubs/articles/proceedings/supercomputing/77726/p1-alverson/p1-alverson.pdf}
}
@InProceedings{alverson:ics:1997,
  author =       {G. Alverson and P. Briggs and S. Coatney and S. Kahan
                  and R. Korry},
  title =        {Tera Hardware-Software Cooperation},
  booktitle =    {Supercomputing '97},
  year =         1997,
  address =      {San Jose, CA},
  month =        {November},
  URL = {http://www.ee.lsu.edu/tca/papers/tera97.pdf}
}
@InProceedings{snavely:sc:1998,
  author =       {A. Snavely and L. Carter and J. Boisseau and
                  A. Majumdar and K. S. Gatlin and N. Mitchell and
                  J. Feo and B. Koblenz},
  title =        {Multi-processor Performance on the {Tera} {MTA}},
  booktitle =    {Supercomputing '98},
  year =         1998,
  address =      {Orlando, Florida},
  month =        {November},
  URL = {http://www.supercomp.org/sc98/TechPapers/sc98_FullAbstracts/Snavely973/}
}
@InProceedings{brunett:sc:1998,
  author =       {S. M. Brunett and J. Thornley and M. Ellenbecker},
  title =        {An Initial Evaluation of the {Tera} Multithreaded
                  Architecture and Programming System Using the {C3I}
                  Parallel Benchmark Suite},
  booktitle =    {Supercomputing '98},
  year =         1998,
  address =      {Orlando, Florida},
  month =        {November},
  URL = {http://www.supercomp.org/sc98/TechPapers/sc98_FullAbstracts/Brunett1063/Index.htm}
}

November 7

Occam and the Transputer

@Article{walker:byte:1985,
  author =       "P. Walker",
  title =        "The {Transputer}: a building block for parallel
                 processing",
  journal =      "BYTE Magazine",
  volume =       "10",
  number =       "5",
  pages =        "219--235",
  month =        "May",
  year =         "1985"
}

November 16

Berkeley IRAM Project

@Article{kozyrakis:computer:1997,
  author =       {C. Kozyrakis and S. Perissakis and
                  D. Patterson and T. Anderson and K.
                  Asanovi\'c and N. Cardwell and R. Fromm and
                  J. Golbus and B. Gribstad and K.
                  Keeton and R. Thomas and N. Treuhaft and K. Yelick},
  title =        "{Scalable Processors in the Billion-Transistor Era: IRAM}",
  journal =      {IEEE Computer},
  year =         1997,
  volume =       30,
  number =       9,
  month =        {September},
  pages =        {75--78},
  URL = {http://iram.cs.berkeley.edu/papers/IRAM.computer.pdf}
}
@MastersThesis{kozyrakis:ms:1999,
  author =       {C. Kozyrakis},
  title =        {A Media-Enhanced Vector Architecture for Embedded
                  Memory Systems},
  school =  {University of California, Berkeley},
  year =         1999,
  month =        {July},
  note =         {Also available as technical report UCB//CSD-99-1059},
  URL = {http://www.cs.berkeley.edu/~kozyraki/papers/csd-99-1059.pdf}
}

November 28

Reconfigurable Computing: GARP and Stanford Smart Memories

@InProceedings{hauser:fccm:1997,
  author =       {J. R. Hauser and J. Wawrzynek},
  title =        {Garp: {A} {MIPS} processor with a reconfigurable
                  coprocessor},
  booktitle =    {Proceedings FCCM},
  year =         1997,
  month =        {April},
  pages =        {24--33},
  URL = {http://brass.cs.berkeley.edu/documents/GarpProcessor.ps}
}
@Article{callahan:computer:2000,
  author =       {T. J. Callahan and J. R. Hauser and J. Wawrzynek},
  title =        {The {Garp} architecture and {C} compiler},
  journal =      {IEEE Computer},
  year =         2000,
  volume =       33,
  number =       4,
  month =        {April},
  URL = {http://www.computer.org/computer/articles/April/coverfeature400_1.htm}
}
@InProceedings{mai:isca:2000,
  author =       {K. Mai and T. Paaske and N. Jayasena and R. Ho and
                  W. Dally and M. Horowitz},
  title =        {{Smart Memories}: {A} Modular Reconfigurable
                  Architecture},
  booktitle =    {Proc. ISCA 27},
  year =         2000,
  address =      {Vancouver, BC, Canada},
  month =        {June},
  pages =        {161--171},
  URL = {http://mos.stanford.edu/papers/km_isca_00.pdf}
}

November 30

MIT RAW Architecture and Compiler

@Article{waingold:computer:1997,
  author =       {E. Waingold and M. Taylor and D. Srikrishna and
                  V. Sarkar and W. Lee and V. Lee and J. Kim and
                  M. Frank and P. Finch and R. Barua and J. Babb and
                  S. Amarasinghe and A. Agarwal},
  title =        "{Baring it all to software: Raw machines}",
  journal =      {IEEE Computer},
  year =         1997,
  volume =       30,
  number =       9,
  month =        {September},
  pages =        {86--93},
  URL = {../raw/documents/Waingold:Computer:1997.pdf}
}
@InProceedings{lee:asplos:1998,
  author =       {W. Lee and R. Barua and M. Frank and D. Srikrishna
                  and J. Babb and V. Sarkar and S. Amarasinghe},
  title =        {Space-Time Scheduling of Instruction-Level
                  Parallelism on a {Raw} Machine},
  booktitle =    {Proc ASPLOS-VIII},
  year =         1998,
  address =      {San Jose, CA},
  month =        {October},
  URL = {../raw/documents/Lee:ASPLOS:1998.pdf}
}
@InProceedings{barua:isca:1999,
  author =       {R. Barua and W. Lee and S. Amarasinghe and A. Agarwal},
  title =        {Maps: {A} Compiler-Managed Memory System for {Raw} Machines},
  booktitle =    {Proc. ISCA-26},
  year =         1999,
  address =      {Atlanta, GA},
  month =        {May},
  URL = {../raw/documents/Barua:ISCA:1999.pdf}
}

Supplemental Readings and Resources



Benchmarks


VLSI Scaling

International Technology Roadmap for Semiconductors
@Article{borkar:micro:1999,
  author =       {S. Borkar},
  title =        {Design Challenges of Technology Scaling},
  journal =      {IEEE Micro},
  year =         1999,
  volume =       19,
  number =       4,
  month =        {July/August},
  pages =        {23--29},
  URL = {http://dlib.computer.org/mi/books/mi1999/pdf/m4023.pdf}
}
@Article{thompson:itj:1998,
  author =       {Scott Thompson and Paul Packan and Mark Bohr},
  title =        {MOS Scaling: Transistor Challenges for the 21st Century},
  journal =      {Intel Technology Journal},
  year =         1998,
  month =        {3rd Quarter},
  URL = {http://developer.intel.com/technology/itj/q31998/pdf/trans.pdf}
}

Future Processor Predictions

ISSCC-2000 Panel Slides "Where will processor performance come from in the next ten years?"
ISCA-2000 Panel Slides "Slow Wires, Hot Chips, and Leaky Transistors: New Challenges in the New Millennium"

VLSI Design

@InProceedings{sutherland:arvlsi:1991,
  author =       {I. E. Sutherland and R. F. Sproull},
  title =        {Logical {Effort}: {Designing} for speed on
                  the back of an envelope},
  booktitle =    {Advanced Research in VLSI},
  pages =        {1--16},
  year =         1991,
  address =      {Santa Cruz},
  URL = {http://bwrc.eecs.berkeley.edu/Classes/icdesign/ee241_s00/PAPERS/archive/SutherlandSproull91.pdf}
}
@Book{sutherland:mkp:1999,
  author =       {I. E. Sutherland and R. F. Sproull and D. Harris},
  title =        {Logical Effort: Designing Fast CMOS Circuits},
  publisher =    {Morgan Kaufmann Publishers},
  year =         1999,
  edition =      {1st},
  note =         {ISBN: 1558605576}
}

Mark Horowitz's DAC 2000 presentation. (PowerPoint slides)
(On-line video)
(PDF slides)


Alpha 21264

@Article{dec21264:uPR:1996,
  author =       {Linley Gwennap},
  title =        {Digital 21264 Sets New Standard},
  journal =      {Microprocessor Report},
  year =         1996,
  volume =       10,
  number =       14,
  month =        {October},
  pages =        {11--16}
}
@Article{clouser:jssc:1999,
  author =       {J. Clouser and M. Matson and R. Badeau and R. Dupcak
                  and S. Samudrala and R. Allmon and N. Fairbanks},
  title =        {A {600-MHz} superscalar floating-point processor},
  journal =      {IEEE Journal of Solid-State Circuits},
  year =         1999,
  volume =       34,
  number =       7,
  pages =        {1026--1029},
  month =        {July}
}
@Article{bailey:jssc:98,
  author =       {Daniel W. Bailey and Bradley J. Benschneider},
  title =        {Clocking design and analysis for a {600-MHz}
                  {Alpha} microprocessor},
  journal =      {IEEE Journal of Solid-State Circuits},
  year =         1998,
  volume =       33,
  number =       11,
  month =        {November},
  pages =        {1627--1633}
}

Low-Power Computing

@InProceedings{martin:islped:1996,
  author =       {Thomas L. Martin and Daniel P. Siewiorek},
  title =        {A power metric for mobile systems},
  booktitle =    {Proceedings ISLPED},
  year =         1996,
  address =      {Monterey, CA},
  pages =        {37--42}
}

Vector Machines

Vectors part 1: Cray-1, NEC SX-5

@InProceedings{wasserman:ics:1996,
  author =       {Harvey J. Wasserman},
  title =        {Benchmark Tests on the Digital Equipment Corporation
                  Alpha AXP 21164-Based AlphaServer 8400, Including a
                  Comparison on Optimized Vector and Scalar Processing},
  booktitle =    {Proceedings of the 1996 International Conference on
                  Supercomputing},
  year =         1996,
  organization = {ACM},
  month =        {May},
  pages =        {333--340}
}

VLIW Machines

@Article{colwell:ieeetc:1988,
  author =       {R. P. Colwell and R. P. Nix and J. J. O'Donnell and
                  D. B. Papworth and P. K. Rodman},
  title =        {A VLIW Architecture for a Trace Scheduling Compiler},
  journal =      {IEEE Transactions on Computers},
  year =         1988,
  volume =       37,
  number =       8,
  pages =        {967--979},
  month =        {August},
  annote =       {Also in Proceedings of the Second International
                  Conference on Architectural Support for Programming
                  Languages and Operating Systems, 1987, pp. 180--192}
}

UltraSPARC-III

@Article{Ultrasparc3uPR97,
  author =       {Peter Song},
  title =        {Ultrasparc-3 aims at {MP} servers},
  journal =      {Microprocessor Report},
  year =         1997,
  volume =       11,
  number =       14,
  month =        {October},
  pages =        {29--34},
  URL = {http://www.mdronline.com/EDN/articles/111407.pdf}
}

MIPS R10000

@Article{yeager:micro:1996,
  author =       {K. C. Yeager},
  title =        {The {MIPS} {R10000} superscalar microprocessor},
  journal =      {IEEE Micro},
  year =         1996,
  volume =       16,
  number =       2,
  month =        {April},
  pages =        {28--40}
}
R10000 Die Photo

Transputer


Krste Asanovic (krste@mit.edu)