Digital Design and Computer Architecture 2nd Edition

This book is suitable for a rapid-paced, single-semester introduction to digital design and computer architecture or for a two-quarter or two-semester sequence giving more time to digest the material and experiment in the lab. The course can be taught without prerequisites. The material is usually taught at the sophomore- or junior-year level, but may also be accessible to bright freshmen.

David Money Harris, Sarah L. Harris

721 Pages

5198 Reads



PDF Format

23.4 MB

Other Books

Download PDF format

  • David Money Harris, Sarah L. Harris   
  • 721 Pages   
  • 19 May 2017
  • Page - 1

    read more..

  • Page - 2

    In Praise of Digital Design and Computer Architecture Harris and Harris have taken the popular pedagogy from Computer Organization and Design to the next level of refinement, showing in detail how to build a MIPS microprocessor in both SystemVerilog and VHDL. With the exciting opportunity that students have to run large digital designs on modern FGPAs, the approach the authors take read more..

  • Page - 3

    Harris and Harris have created the first book that successfully combines digital system design with computer architecture. Digital Design and Computer Architecture is a much-welcomed text that extensively explores digital systems designs and explains the MIPS architecture in fantastic detail. I highly recommend this book. James E. Stine, Jr., Oklahoma State University Digital Design and read more..

  • Page - 4

    Digital Design and Computer Architecture Second Edition read more..

  • Page - 5

    About the Authors David Money Harris is a professor of engineering at Harvey Mudd College. He received his Ph.D. in electrical engineering from Stanford University and his M.Eng. in electrical engineering and computer science from MIT. Before attending Stanford, he worked at Intel as a logic and circuit designer on the Itanium and Pentium II processors. Since then, he has consulted read more..

  • Page - 6

    Digital Design and Computer Architecture Second Edition David Money Harris Sarah L. Harris AMSTERDAM • BOSTON • HEIDELBERG • LONDON NEW YORK • OXFORD • PARIS • SAN DIEGO SAN FRANCISCO • SINGAPORE • SYDNEY • TOKYO Morgan Kaufmann is an imprint of Elsevier read more..

  • Page - 7

    Acquiring Editor: Todd Green Development Editor: Nathaniel McFadden Project Manager: Danielle S. Miller Designer: Dennis Schaefer Morgan Kaufmann is an imprint of Elsevier 225 Wyman Street, Waltham, MA 02451, USA © 2013 Elsevier, Inc. All rights reserved. No part of this publication may be reproduced or transmitted in any form or by any means, electronic or mechanical, including read more..

  • Page - 8

    To my family, Jennifer, Abraham, Samuel, and Benjamin – DMH To Ivan and Ocaan, who defy logic – SLH read more..

  • Page - 9

    This page intentionally left blank read more..

  • Page - 10

    Contents Preface . .. .. ... .. .. ... .. .. ... .. .. ... .. .. ... .. .. ... .. .. ... .. xix Features . ...... ...... ...... ..... ...... ...... ...... ...... .. xx Online Supplements . ...... ...... ...... ...... ...... ..... ... xxi How to Use the Software Tools in a Course ..... ...... ..... .. xxii Labs ... ...... ...... ...... ...... ...... ...... ...... read more..

  • Page - 11

    1.7 CMOS Transistors . . ...... ...... ...... ...... ..... ...... ... 26 1.7.1 Semiconductors . ...... ...... ...... ...... ...... ..... 27 1.7.2 Diodes ... ...... ...... ...... ...... ...... ...... ..... 27 1.7.3 Capacitors ..... ...... ...... ...... ...... ...... ..... 28 1.7.4 nMOS and pMOS Transistors ..... ...... ...... ..... 28 1.7.5 CMOS NOT Gate .... ...... ...... ...... read more..

  • Page - 12

    2.10 Summary ..... ...... ...... ..... ...... ...... ...... ...... .. 95 Exercises ..... ...... ...... ...... ...... ...... ...... ..... .... 97 Interview Questions . ...... ...... ...... ...... ...... ..... .. 106 Chapter 3 Sequential Logic Design ... .. .. ... .. .. ... .. .. ... .. 109 3.1 Introduction... ...... ...... ..... ...... ...... ...... ...... . 109 3.2 Latches and read more..

  • Page - 13

    4.2 Combinational Logic ...... ...... ...... ...... ..... ...... .. 177 4.2.1 Bitwise Operators ...... ...... ...... ..... ...... .. 177 4.2.2 Comments and White Space ... ...... ..... ...... .. 180 4.2.3 Reduction Operators .... ...... ...... ..... ...... .. 180 4.2.4 Conditional Assignment . ...... ...... ..... ...... .. 181 4.2.5 Internal Variables ...... ...... ...... ..... ...... .. read more..

  • Page - 14

    5.2.7 Division .... ...... ..... ...... ...... ...... ...... . 253 5.2.8 Further Reading ... ..... ...... ...... ...... ...... . 254 5.3 Number Systems ..... ...... ..... ...... ...... ...... ...... . 255 5.3.1 Fixed-Point Number Systems .. ...... ...... ...... . 255 5.3.2 Floating-Point Number Systems ...... ...... ...... . 256 5.4 Sequential Building Blocks . . ..... ...... ...... read more..

  • Page - 15

    6.5 Addressing Modes . . ...... ...... ...... ...... ..... ...... .. 333 6.6 Lights, Camera, Action: Compiling, Assembling, and Loading. . . 336 6.6.1 The Memory Map ...... ...... ...... ..... ...... .. 336 6.6.2 Translating and Starting a Program .. ..... ...... .. 337 6.7 Odds and Ends ..... ...... ...... ...... ...... ..... ...... .. 342 6.7.1 Pseudoinstructions ...... ...... read more..

  • Page - 16

    7.6 HDL Representation . ...... ..... ...... ...... ...... ...... . 429 7.6.1 Single-Cycle Processor .. ...... ...... ...... ...... . 430 7.6.2 Generic Building Blocks . ...... ...... ...... ...... . 434 7.6.3 Testbench ... ...... ..... ...... ...... ...... ...... . 437 7.7 Exceptions .... ...... ...... ..... ...... ...... ...... ...... . 440 7.8 Advanced Microarchitecture ..... ...... read more..

  • Page - 17

    8.6.5 Interrupts ..... ...... ...... ...... ..... ...... ..... 529 8.6.6 Analog I/O ... ...... ...... ...... ..... ...... ..... 531 8.6.7 Other Microcontroller Peripherals . ..... ...... ..... 537 8.7 PC I/O Systems ..... ...... ...... ...... ...... ..... ...... .. 558 8.7.1 USB .... ...... ...... ...... ...... ..... ...... ..... 559 8.7.2 PCI and PCI Express ... ...... ...... ..... read more..

  • Page - 18

    Appendix B MIPS Instructions . .. .. ... .. .. ... .. .. ... .. .. ... .. 619 Appendix C C Programming ... .. .. ... .. .. ... .. .. ... .. .. ... .. 623 C.1 Introduction... ...... ...... ..... ...... ...... ...... ...... . 623 C.2 Welcome to C . ...... ...... ..... ...... ...... ...... ...... . 625 C.2.1 C Program Dissection ... ...... ...... ...... ...... . read more..

  • Page - 19

    This page intentionally left blank read more..

  • Page - 20

    Preface Why publish yet another book on digital design and computer architecture? There are dozens of good books in print on digital design. There are also several good books about computer architecture, especially the classic texts of Patterson and Hennessy. This book is unique in its treatment in that it presents digital logic design from the perspective of computer architecture, read more..

  • Page - 21

    FEATURES This book offers a number of special features. Side-by-Side Coverage of SystemVerilog and VHDL Hardware description languages (HDLs) are at the center of modern digi- tal design practices. Unfortunately, designers are evenly split between the two dominant languages, SystemVerilog and VHDL. This book intro- duces HDLs in Chapter 4 as soon as combinational and sequential logic read more..

  • Page - 22

    End-of-Chapter Exercises and Interview Questions The best way to learn digital design is to do it. Each chapter ends with numerous exercises to practice the material. The exercises are followed by a set of interview questions that our industrial colleagues have asked students applying for work in the field. These questions provide a helpful glimpse into the types of problems job read more..

  • Page - 23

    HOW TO USE THE SOFTWARE TOOLS IN A COURSE Altera Quartus II Quartus II Web Edition is a free version of the professional-strength Quartus™ II FPGA design tools. It allows students to enter their digital designs in schematic or using either the SystemVerilog or VHDL hard- ware description language (HDL). After entering the design, students can simulate their circuits using read more..

  • Page - 24

    LABS The companion site includes links to a series of labs that cover topics from digital design through computer architecture. The labs teach stu- dents how to use the Quartus II tools to enter, simulate, synthesize, and implement their designs. The labs also include topics on C and assembly language programming using the Microchip MPLAB IDE. After synthesis, students can implement read more..

  • Page - 25

    work of Chris Parks, Carl Pearson, and Johnathan Chai who tested code and developed content for the second edition. Numerous reviewers substantially improved the book. They include John Barr, Jack V. Briner, Andrew C. Brown, Carl Baumgaertner, A. Utku Diril, Jim Frenzel, Jaeha Kim, Phillip King, James Pinter-Lucke, Amir Roth, Z. Jerry Shi, James E. Stine, Luke Teyssier, Peiyi Zhao, read more..

  • Page - 26

    This page intentionally left blank read more..

  • Page - 27

    read more..

  • Page - 28

    1 From Zero to One 1.1 THE GAME PLAN Microprocessors have revolutionized our world during the past three dec- ades. A laptop computer today has far more capability than a room-sized mainframe of yesteryear. A luxury automobile contains about 50 micro- processors. Advances in microprocessors have made cell phones and the Internet possible, have vastly improved medicine, and have read more..

  • Page - 29

    your head all at once. One of the major themes weaved through this book is how to manage complexity. 1.2 THE ART OF MANAGING COMPLEXITY One of the characteristics that separates an engineer or computer scientist from a layperson is a systematic approach to managing complexity. Mod- ern digital systems are built from millions or billions of transistors. No human being could read more..

  • Page - 30

    instructions and registers (memory for temporarily storing variables) that the programmer is allowed to use. Microarchitecture involves combining logic elements to execute the instructions defined by the architecture. A particular architecture can be implemented by one of many different microarchitectures with different price/performance/power trade-offs. For example, the Intel Core i7, the Intel read more..

  • Page - 31

    ourselves to digital circuits, we can easily combine components into sophisticated systems that ultimately outperform those built from analog components in many applications. For example, digital televisions, com- pact disks (CDs), and cell phones are replacing their analog predecessors. 1.2.3 The Three-Y ’s In addition to abstraction and discipline, designers use the three “-y’s” to read more..

  • Page - 32

    1.3 THE DIGITAL ABSTRACTION Most physical variables are continuous. For example, the voltage on a wire, the frequency of an oscillation, or the position of a mass are all con- tinuous quantities. Digital systems, on the other hand, represent informa- tion with discrete-valued variables —that is, variables with a finite number of distinct values. An early digital system using read more..

  • Page - 33

    of the Analytical Engine, in which each row processes one digit. Babbage chose 25 rows of gears, so the machine has 25-digit precision. Unlike Babbage ’s machine, most electronic computers use a binary (two-valued) representation in which a high voltage indicates a '1' and a low voltage indicates a '0', because it is easier to distinguish between two voltages than ten. The read more..

  • Page - 34

    A computer programmer can work without needing to know the intimate details of the computer hardware. On the other hand, understanding the details of the hardware allows the programmer to optimize the software better for that specific computer. An individual bit doesn ’t carry much information. In the next section, we examine how groups of bits can be used to represent numbers. read more..

  • Page - 35

    column weights (again from right to left) are 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384, 32768, 65536, and so on. If you work with binary numbers often, you ’ll save time if you remember these powers of two up to 2 16. An N-bit binary number represents one of 2 N possibilities: 0, 1, 2, 3, …, 2 N − 1. Table 1.1 shows 1, 2, 3, read more..

  • Page - 36

    Example 1.2 DECIMAL TO BINARY CONVERSION Convert the decimal number 8410 to binary. Solution: Determine whether each column of the binary result has a 1 or a 0. We can do this starting at either the left or the right column. Working from the left, start with the largest power of 2 less than or equal to the number (in this case, 64). 84 ≥ 64, so there is a 1 in read more..

  • Page - 37

    Example 1.4 BINARY TO HEXADECIMAL CONVERSION Convert the binary number 11110102 to hexadecimal. Solution: Again, conversion is easy. Start reading from the right. The four least significant bits are 10102 = A16. The next bits are 1112 = 716. Hence 11110102 = 7A16. Table 1.2 Hexadecimal number system Hexadecimal Digit Decimal Equivalent Binary Equivalent 0 0 0000 1 1 0001 2 2 0010 3 3 0011 read more..

  • Page - 38

    Example 1.5 DECIMAL TO HEXADECIMAL AND BINARY CONVERSION Convert the decimal number 33310 to hexadecimal and binary. Solution: Like decimal to binary conversion, decimal to hexadecimal conversion can be done from the left or the right. Working from the left, start with the largest power of 16 less than or equal to the number (in this case, 256). 256 goes into 333 once, so read more..

  • Page - 39

    By handy coincidence, 2 10 = 1024 ≈ 10 3. Hence, the term kilo (Greek for thousand) indicates 2 10. For example, 2 10 bytes is one kilobyte (1 KB). Similarly, mega (million) indicates 2 20 ≈ 10 6,and giga (billion) indicates 2 30 ≈ 10 9. If you know 2 10 ≈ 1 thousand, 2 20 ≈ 1 million, 2 30 ≈ 1 billion, and remember the powers of two up to 2 9, it is read more..

  • Page - 40

    the sum 1 + 1 = 210 = 102 cannot fit in a single binary digit. So we record the 1 ’s digit (0) and carry the 2 ’s digit (1) of the result to the next column. In the second column, the sum is 1 + 1 + 1 = 310 = 112. Again, we record the 1 ’s digit (1) and carry the 2 ’s digit (1) to the next column. For obvious reasons, the bit that is carried read more..

  • Page - 41

    bit as the sign and the remaining N −1 bits as the magnitude (absolute value). A sign bit of 0 indicates positive and a sign bit of 1 indicates negative. Example 1.9 SIGN/MAGNITUDE NUMBERS Write 5 and −5 as 4-bit sign/magnitude numbers Solution: Both numbers have a magnitude of 510 = 1012. Thus, 510 = 01012 and −510 = 11012. Unfortunately, ordinary binary addition does not read more..

  • Page - 42

    Solution: Start with + 210 = 00102. To get −210, invert the bits and add 1. Inverting 00102 produces 11012. 11012 + 1 = 11102.So −210 is 11102. Example 1.11 VALUE OF NEGATIVE TWO ’S COMPLEMENT NUMBERS Find the decimal value of the two ’s complement number 10012. Solution: 10012 has a leading 1, so it must be negative. To find its magnitude, invert the bits and add 1. read more..

  • Page - 43

    Like unsigned numbers, N-bit two ’s complement numbers represent one of 2 N possible values. However the values are split between positive and negative numbers. For example, a 4-bit unsigned number represents 16 values: 0 to 15. A 4-bit two ’s complement number also represents 16 values: −8 to 7. In general, the range of an N-bit two ’s complement num- ber spans [ read more..

  • Page - 44

    complement), and then adding. Unless stated otherwise, assume that all signed binary numbers use two ’s complement representation. Figure 1.11 shows a number line indicating the values of 4-bit num- bers in each system. Unsigned numbers span the range [0, 15] in regular binary order. Two ’s complement numbers span the range [ −8, 7]. The nonnegative numbers [0, 7] share the read more..

  • Page - 45

    the left (or top) and outputs on the right (or bottom). Digital designers typically use letters near the beginning of the alphabet for gate inputs and the letter Y for the gate output. The relationship between the inputs and the output can be described with a truth table or a Boolean equation. A truth table lists inputs on the left and the corresponding output on the read more..

  • Page - 46

    1.5.4 OR Gate The OR gate shown in Figure 1.15 produces a TRUE output, Y, if either A or B (or both) are TRUE. The Boolean equation for an OR gate is writ- ten as Y = A + B or Y = A ∪ B. The ∪ symbol is pronounced union and is preferred by logicians. Digital designers normally use the + notation, Y = A + B is pronounced “Y equals A or B. ” 1.5.5 read more..

  • Page - 47

    produces a TRUE output when all N inputs are TRUE. An N-input OR gate produces a TRUE output when at least one input is TRUE. Example 1.16 THREE-INPUT NOR GATE Figure 1.19 shows the symbol and Boolean equation for a three-input NOR gate. Complete the truth table. Solution: Figure 1.20 shows the truth table. The output is TRUE only if none of the inputs are TRUE. Example read more..

  • Page - 48

    connected to the input of the receiver. The driver produces a LOW (0) out- put in the range of 0 to VOL or a HIGH (1) output in the range of VOH to VDD · If the receiver gets an input in the range of 0 to VIL,it will consider the input to be LOW. If the receiver gets an input in the range of VIH to VDD, it will consider the input to be HIGH. If, read more..

  • Page - 49

    Solution: The inverter noise margins are: NML = VIL − VOL = (1.35 V − 0.33 V) = 1.02 V, NMH = VOH − VIH = (3.84 V − 3.15 V) = 0.69 V. The circuit can tolerate 1 V of noise when the output is LOW (NML = 1.02 V) but not when the output is HIGH (NMH = 0.69 V). For example, suppose the driver, I1, outputs its worst- case HIGH value, VO1 = VOH = 3.84 V. read more..

  • Page - 50

    from analog to digital, increasing design productivity by hiding needless detail. The choice of VDD and logic levels is arbitrary, but all gates that com- municate must have compatible logic levels. Therefore, gates are grouped into logic families such that all gates in a logic family obey the static dis- cipline when used with other gates in the family. Logic gates in the read more..

  • Page - 51

    Example 1.19 LOGIC FAMILY COMPATIBILITY Which of the logic families in actionGoTo:50,Table actionGoTo:50,1.4 can communicate with each other reliably? Solution: actionGoTo:51,Table actionGoTo:51,1.5 lists which logic families have compatible logic levels. Note that a 5 V logic family such as TTL or CMOS may produce an output voltage as HIGH as 5 V. If this 5 V signal drives the input read more..

  • Page - 52

    digital systems. In this section, we will peer beneath the digital abstraction to see how logic gates are built from MOSFETs. 1.7.1 Semiconductors MOS transistors are built from silicon, the predominant atom in rock and sand. Silicon (Si) is a group IV atom, so it has four electrons in its valence shell and forms bonds with four adjacent atoms, resulting in a crystalline read more..

  • Page - 53

    flows through the diode from the anode to the cathode. But when the anode voltage is lower than the voltage on the cathode, the diode is reverse biased, and no current flows. The diode symbol intuitively shows that current only flows in one direction. 1.7.3 Capacitors A capacitor consists of two conductors separated by an insulator. When a voltage V is applied to one of read more..

  • Page - 54

    There are two flavors of MOSFETs: nMOS and pMOS (pronounced “n-moss” and “p-moss”). actionGoTo:54,Figure actionGoTo:54,1.29 shows cross-sections of each type, made by sawing through a wafer and looking at it from the side. The n-type transistors, called nMOS, have regions of n-type dopants adjacent to the gate called the source and the drain and are built on a p-type semi- read more..

  • Page - 55

    pMOS transistors work in just the opposite fashion, as might be guessed from the bubble on their symbol shown in actionGoTo:55,Figure actionGoTo:55,1.31. The substrate is tied to VDD. When the gate is also at VDD, the pMOS transistor is OFF. When the gate is at GND, the channel inverts to p-type and the pMOS transistor is ON. Unfortunately, MOSFETs are not perfect switches. read more..

  • Page - 56

    transistors are just the opposite: ON when the gate is 0 and OFF when the gate is 1. 1.7.5 CMOS NOT Gate actionGoTo:56,Figure actionGoTo:56,1.32 shows a schematic of a NOT gate built with CMOS transis- tors. The triangle indicates GND, and the flat bar indicates VDD; these labels will be omitted from future schematics. The nMOS transistor, N1, is connected between GND and the read more..

  • Page - 57

    good at passing 1 ’s, so a pull-up network of pMOS transistors is placed between the output and VDD to pull the output up to 1. The networks may consist of transistors in series or in parallel. When transistors are in parallel, the network is ON if either transistor is ON. When transistors are in series, the network is ON only if both transistors are ON. The slash read more..

  • Page - 58

    Solution: It is impossible to build an AND gate with a single CMOS gate. However, building NAND and NOT gates is easy. Thus, the best way to build an AND gate using CMOS transistors is to use a NAND followed by a NOT, as shown in actionGoTo:58,Figure actionGoTo:58,1.37. 1.7.7 Transmission Gates At times, designers find it convenient to use an ideal switch that can pass both read more..

  • Page - 59

    Pseudo-nMOS gates got their name from the 1970 ’s, when manufactur- ing processes only had nMOS transistors. A weak nMOS transistor was used to pull the output HIGH because pMOS transistors were not available. 1.8 POWER CONSUMPTION * Power consumption is the amount of energy used per unit time. Power consumption is of great importance in digital systems. The battery life of read more..

  • Page - 60

    Solution: The static power is Pstatic = (0.040 A)(1.2 V) = 48 mW. (a) If the phone is not being used, this is the only power consumption, so the battery life is (6 Whr)/ (0.048 W) = 125 hours (about 5 days). (b) If the phone is being used, the dynamic power is Pdynamic = (0.5)(10− 8 F)(1.2 V) 2(3 × 10 8 Hz) = 2.16 W. Together with the static and broadcast read more..

  • Page - 61

    build hardware rather than software. Most digital systems today are designed with HDLs. SystemVerilog and VHDL are the two prevalent lan- guages, and they are covered side-by-side in this book. actionGoTo:264,Chapter actionGoTo:264,5 studies other combinational and sequential building blocks such as adders, multi- pliers, and memories. actionGoTo:320,Chapter actionGoTo:320,6 shifts to computer read more..

  • Page - 62

    Exercises Exercise 1.1 Explain in one paragraph at least three levels of abstraction that are used by (a) biologists studying the operation of cells. (b) chemists studying the composition of matter. Exercise 1.2 Explain in one paragraph how the techniques of hierarchy, modularity, and regularity may be used by (a) automobile designers. (b) businesses to manage their operations. Exercise read more..

  • Page - 63

    Exercise 1.10 What is the largest 32-bit binary number that can be represented with (a) unsigned numbers? (b) two ’s complement numbers? (c) sign/magnitude numbers? Exercise 1.11 What is the smallest (most negative) 16-bit binary number that can be represented with (a) unsigned numbers? (b) two ’s complement numbers? (c) sign/magnitude numbers? Exercise 1.12 What is the smallest (most read more..

  • Page - 64

    Exercise 1.17 Convert the following hexadecimal numbers to decimal. Show your work. (a) A516 (b) 3B16 (c) FFFF16 (d) D000000016 Exercise 1.18 Convert the following hexadecimal numbers to decimal. Show your work. (a) 4E16 (b) 7C16 (c) ED3A16 (d) 403FB00116 Exercise 1.19 Repeat Exercise 1.17, but convert to unsigned binary. Exercise 1.20 Repeat Exercise 1.18, but convert to unsigned binary. read more..

  • Page - 65

    Exercise 1.25 Convert the following decimal numbers to unsigned binary numbers. (a) 4210 (b) 6310 (c) 22910 (d) 84510 Exercise 1.26 Convert the following decimal numbers to unsigned binary numbers. (a) 1410 (b) 5210 (c) 33910 (d) 71110 Exercise 1.27 Repeat Exercise 1.25, but convert to hexadecimal. Exercise 1.28 Repeat Exercise 1.26, but convert to hexadecimal. Exercise 1.29 Convert the read more..

  • Page - 66

    Exercise 1.31 Repeat Exercise 1.29, but convert to 8-bit sign/magnitude numbers. Exercise 1.32 Repeat Exercise 1.30, but convert to 8-bit sign/magnitude numbers. Exercise 1.33 Convert the following 4-bit two ’s complement numbers to 8-bit two ’s complement numbers. (a) 01012 (b) 10102 Exercise 1.34 Convert the following 4-bit two ’s complement numbers to 8-bit two ’s complement numbers. read more..

  • Page - 67

    Exercise 1.41 How many 5-bit two ’s complement numbers are greater than 0? How many are less than 0? How would your answers differ for sign/magnitude numbers? Exercise 1.42 How many 7-bit two ’s complement numbers are greater than 0? How many are less than 0? How would your answers differ for sign/magnitude numbers? Exercise 1.43 How many bytes are in a 32-bit word? How read more..

  • Page - 68

    Exercise 1.53 Perform the following additions of unsigned binary numbers. Indicate whether or not the sum overflows an 8-bit result. (a) 100110012 + 010001002 (b) 110100102 + 101101102 Exercise 1.54 Repeat Exercise 1.52, assuming that the binary numbers are in two ’s complement form. Exercise 1.55 Repeat Exercise 1.53, assuming that the binary numbers are in two ’s complement form. read more..

  • Page - 69

    Exercise 1.59 Perform the following additions of unsigned hexadecimal numbers. Indicate whether or not the sum overflows an 8-bit (two hex digit) result. (a) 2216 + 816 (b) 7316 + 2C16 (c) 7F16 + 7F16 (d) C216 + A416 Exercise 1.60 Convert the following decimal numbers to 5-bit two ’s complement binary numbers and subtract them. Indicate whether or not the difference overflows a read more..

  • Page - 70

    Exercise 1.64 In a binary coded decimal (BCD) system, 4 bits are used to represent a decimal digit from 0 to 9. For example, 3710 is written as 00110111BCD. (a) Write 28910 in BCD (b) Convert 100101010001BCD to decimal (c) Convert 01101001BCD to binary (d) Explain why BCD might be a useful way to represent numbers Exercise 1.65 Answer the following questions related to BCD read more..

  • Page - 71

    Exercise 1.70 Repeat Exercise 1.69 but convert from an arbitrary base b1 to another base b2, as specified by the user. Support bases up to 16, using the letters of the alphabet for digits greater than 9. The user should enter b1,b2, and then the number to convert in base b1. The program should print the equivalent number in base b2. Exercise 1.71 Draw the symbol, Boolean read more..

  • Page - 72

    Exercise 1.76 There are 16 different truth tables for Boolean functions of two variables. List each truth table. Give each one a short descriptive name (such as OR, NAND, and so on). Exercise 1.77 How many different truth tables exist for Boolean functions of N variables? Exercise 1.78 Is it possible to assign logic levels so that a device with the transfer characteristics shown read more..

  • Page - 73

    Exercise 1.80 Is it possible to assign logic levels so that a device with the transfer characteristics shown in actionGoTo:73,Figure actionGoTo:73,1.46 would serve as a buffer? If so, what are the input and output low and high levels (VIL,VOL,VIH, and VOH) and noise margins (NML and NMH)? If not, explain why not. Exercise 1.81 Ben Bitdiddle has invented a circuit with the read more..

  • Page - 74

    (a) What kind of logic gate did he find? (b) What are the approximate high and low logic levels? Exercise 1.83 Repeat Exercise 1.82 for actionGoTo:74,Figure actionGoTo:74,1.49. Exercise 1.84 Sketch a transistor-level circuit for the following CMOS gates. Use a minimum number of transistors. (a) four-input NAND gate (b) three-input OR-AND-INVERT gate (see Exercise 1.75) (c) three-input AND-OR read more..

  • Page - 75

    Exercise 1.85 Sketch a transistor-level circuit for the following CMOS gates. Use a minimum number of transistors. (a) three-input NOR gate (b) three-input AND gate (c) two-input OR gate Exercise 1.86 A minority gate produces a TRUE output if and only if fewer than half of its inputs are TRUE. Otherwise it produces a FALSE output. Sketch a transistor-level circuit for a three-input read more..

  • Page - 76

    Exercise 1.90 Resistor-Transistor Logic (RTL) uses nMOS transistors to pull the gate output LOW and a weak resistor to pull the output HIGH when none of the paths to ground are active. A NOT gate built using RTL is shown in actionGoTo:76,Figure actionGoTo:76,1.52. Sketch a three-input RTL NOR gate. Use a minimum number of transistors. A Y weak Figure 1.52 RTL NOT gate Exercises read more..

  • Page - 77

    Interview Questions These questions have been asked at interviews for digital design jobs. Question 1.1 Sketch a transistor-level circuit for a CMOS four-input NOR gate. Question 1.2 The king receives 64 gold coins in taxes but has reason to believe that one is counterfeit. He summons you to identify the fake coin. You have a balance that can hold coins on each side. How many read more..

  • Page - 78

    This page intentionally left blank read more..

  • Page - 79

    read more..

  • Page - 80

    2 Combinational Logic Design 2.1 INTRODUCTION In digital electronics, a circuit is a network that processes discrete-valued variables. A circuit can be viewed as a black box, shown in actionGoTo:80,Figure actionGoTo:80,2.1, with ▶ one or more discrete-valued input terminals ▶ one or more discrete-valued output terminals ▶ a functional specification describing the relationship between inputs read more..

  • Page - 81

    illustrates a circuit with three elements, E1, E2, and E3, and six nodes. Nodes A, B, and C are inputs. Y and Z are outputs. n1 is an internal node between E1 and E3. Digital circuits are classified as combinational or sequential. Acom- binational circuit ’s outputs depend only on the current values of the inputs; in other words, it combines the current input values to read more..

  • Page - 82

    A circuit is combinational if it consists of interconnected circuit elements such that ▶ Every circuit element is itself combinational. ▶ Every node of the circuit is either designated as an input to the circuit or connects to exactly one output terminal of a circuit element. ▶ The circuit contains no cyclic paths: every path through the circuit visits each circuit node at read more..

  • Page - 83

    smaller circuit elements is an application of hierarchy. The rules of com- binational composition are an application of discipline. The functional specification of a combinational circuit is usually expressed as a truth table or a Boolean equation. In the next sections, we describe how to derive a Boolean equation from any truth table and how to use Boolean algebra and Karnaugh read more..

  • Page - 84

    ABY 00 01 10 11 0 1 0 1 minterm AB AB AB AB minterm name m 0 m 1 m 2 m 3 Figure 2.9 Truth table with multiple TRUE minterms Canonical form is just a fancy word for standard form. You can use the term to impress your friends and scare your enemies. numbered starting with 0; the top row corresponds to minterm 0, m0,the next row to minterm 1, m1, and so read more..

  • Page - 85

    Unfortunately, sum-of-products form does not necessarily generate the simplest equation. In actionGoTo:85,Section actionGoTo:85,2.3 we show how to write the same function using fewer terms. 2.2.3 Product-of-Sums Form An alternative way of expressing Boolean functions is the product- of-sums canonical form. Each row of a truth table corresponds to a max- term that is FALSE for that row. read more..

  • Page - 86

    These theorems have great practical significance, because they teach us how to simplify logic to produce smaller and less costly circuits. Axioms and theorems of Boolean algebra obey the principle of duality. If the symbols 0 and 1 and the operators • (AND) and + (OR) are inter- changed, the statement will still be correct. We use the prime symbol ( ′) to denote the dual read more..

  • Page - 87

    as shown in actionGoTo:87,Figure actionGoTo:87,2.15, if one input of an AND gate is 0, we can replace the AND gate with a wire that is tied LOW (to 0). Likewise, if one input of an OR gate is 1, we can replace the OR gate with a wire that is tied HIGH (to 1). Idempotency, T3, says that a variable AND itself is equal to just itself. Likewise, a variable OR read more..

  • Page - 88

    De Morgan ’s Theorem, T12, is a particularly powerful tool in digital design. The theorem explains that the complement of the product of all the terms is equal to the sum of the complement of each term. Likewise, the complement of the sum of all the terms is equal to the product of the complement of each term. According to De Morgan ’s theorem, a NAND gate is read more..

  • Page - 89

    ABY 00 01 10 11 0 0 1 1 Y 1 1 0 0 Figure 2.20 Truth table showing Y and Y¯¯ ABY 00 01 10 11 0 0 1 1 Y 1 1 0 0 minterm AB AB AB AB Figure 2.21 Truth table showing minterms for Y¯¯ and flips the body of the gate from AND to OR or vice versa. For example, the NAND gate in actionGoTo:88,Figure actionGoTo:88,2.19 consists of an AND body with a bubble on the output. Pushing read more..

  • Page - 90

    2.3.5 Simplifying Equations The theorems of Boolean algebra help us simplify Boolean equations. For example, consider the sum-of-products expression from the truth table of actionGoTo:84,Figure actionGoTo:84,2.9: Y = A B + AB : By Theorem T10, the equation simplifies to Y = B : This may have been obvious looking at the truth table. In general, multiple steps may be necessary to read more..

  • Page - 91

    Although it is a bit counterintuitive, expanding an implicant (for example, turning AB into ABC + ABC) is sometimes useful in minimizing equations. By doing this, you can repeat one of the expanded minterms to be combined (shared) with another minterm. You may have noticed that completely simplifying a Boolean equa- tion with the theorems of Boolean algebra can take some trial read more..

  • Page - 92

    By drawing schematics in a consistent fashion, we make them easier to read and debug. We will generally obey the following guidelines: ▶ Inputs are on the left (or top) side of a schematic. ▶ Outputs are on the right (or bottom) side of a schematic. ▶ Whenever possible, gates should flow from left to right. ▶ Straight wires are better to use than wires with multiple read more..

  • Page - 93

    Y A C B Figure 2.26 Schematic using fewer gates AND with inverted inputs. actionGoTo:93,Figure actionGoTo:93,2.26 shows a schematic using this optimization to eliminate the inverter on C. Recall that by De Morgan ’s theorem the AND with inverted inputs is equivalent to a NOR gate. Depending on the implementation technology, it may be cheaper to use the fewest gates or to use read more..

  • Page - 94

    X is an overloaded symbol that means “don’tcare” in truth tables and “contention” in logic simulation (see actionGoTo:98,Section actionGoTo:98,2.6.1). Think about the context so you don ’tmix up the meanings. Some authors useDor ?instead for “don’t care ” to avoid this ambiguity. Notice that if A3 is asserted in the priority circuit, the outputs don ’t care what the other read more..

  • Page - 95

    gates. These multilevel combinational circuits may use less hardware than their two-level counterparts. Bubble pushing is especially helpful in ana- lyzing and designing multilevel circuits. 2.5.1 Hardware Reduction Some logic functions require an enormous amount of hardware when built using two-level logic. A notable example is the XOR function of multiple variables. For example, consider read more..

  • Page - 96

    Selecting the best multilevel implementation of a specific logic function is not a simple process. Moreover, “best” has many meanings: fewest gates, fastest, shortest design time, least cost, least power consumption. In actionGoTo:264,Chapter actionGoTo:264,5, you will see that the “best” circuit in one technology is not necessarily the best in another. For example, we have been read more..

  • Page - 97

    actionGoTo:97,Figure actionGoTo:97,2.34(a). Working to the left, the rightmost gate has an input bubble that cancels with the output bubble of the middle NAND gate, so no change is necessary, as shown in actionGoTo:97,Figure actionGoTo:97,2.34(b). The middle gate has no input bubble, so we transform the leftmost gate to have no output bubble, as shown in actionGoTo:97,Figure read more..

  • Page - 98

    Example 2.8 BUBBLE PUSHING FOR CMOS LOGIC Most designers think in terms of AND and OR gates, but suppose you would like to implement the circuit in actionGoTo:98,Figure actionGoTo:98,2.36 in CMOS logic, which favors NAND and NOR gates. Use bubble pushing to convert the circuit to NANDs, NORs, and inverters. Solution: A brute force solution is to just replace each AND gate with read more..

  • Page - 99

    an error and must be avoided. The actual voltage on a node with conten- tion may be somewhere between 0 and VDD, depending on the relative strengths of the gates driving HIGH and LOW. It is often, but not always, in the forbidden zone. Contention also can cause large amounts of power to flow between the fighting gates, resulting in the circuit getting hot and possibly read more..

  • Page - 100

    the buffer is enabled. We show that the signal is active low by putting a bubble on its input wire. We often indicate an active low input by draw- ing a bar over its name, E, or appending the letters “b” or “bar” after its name, Eb or Ebar. Tristate buffers are commonly used on busses that connect multiple chips. For example, a microprocessor, a video controller, read more..

  • Page - 101

    with up to four variables. More important, they give insight into manipu- lating Boolean equations. Recall that logic minimization involves combining terms. Two terms containing an implicant P and the true and complementary forms of some variable A are combined to eliminate A : PA + PA = P : Karnaugh maps make these combinable terms easy to see by putting them next to each read more..

  • Page - 102

    As before, we can use Boolean algebra to minimize equations in sum-of- products form. Y = A B C + A BC = A B ðC + C Þ = A B (2.7) K-maps help us do this simplification graphically by circling 1 ’sin adjacent squares, as shown in actionGoTo:102,Figure actionGoTo:102,2.44. For each circle, we write the cor- responding implicant. Remember from actionGoTo:83,Section actionGoTo:83,2.2 read more..

  • Page - 103

    For example, in the K-map of actionGoTo:102,Figure actionGoTo:102,2.44, A B C and A BC are impli- cants, but not prime implicants. Only A B is a prime implicant in that K-map. Rules for finding a minimized equation from a K-map are as follows: ▶ Use the fewest circles necessary to cover all the 1 ’s. ▶ All the squares in each circle must contain 1 ’s. ▶ Each circle read more..

  • Page - 104

    00 01 Y 11 10 AB 1 1 0 0 1 0 1 1 0 1 C AC Y = AC + B B Figure 2.46 Solution for Example 2.9 Example 2.10 SEVEN-SEGMENT DISPLAY DECODER A seven-segment display decoder takes a 4-bit data input D3:0 and produces seven outputs to control light-emitting diodes to display a digit from 0 to 9. The seven outputs are often called segments a through g,or Sa –Sg, as defined read more..

  • Page - 105

    0123456789 Figure 2.48 Seven-segment display digits Table 2.6 Seven-segment display decoder truth table D3:0 Sa Sb Sc Sd Se Sf Sg 0000 1111110 0001 0110000 0010 1101101 0011 1111001 0100 0110011 0101 1011011 0110 1011111 0111 1110000 1000 1111111 1001 1110011 others 0000000 01 11 1 0 0 1 0 0 1 1 01 1 1 1 1 0 0 0 0 11 10 D3:2 00 00 10 01 11 1 1 1 0 0 0 1 1 01 1 1 1 0 0 0 0 0 11 10 00 00 10 D1:0 read more..

  • Page - 106

    2.7.3 Don ’t Cares Recall that “don’t care ” entries for truth table inputs were introduced in actionGoTo:91,Section actionGoTo:91,2.4 to reduce the number of rows in the table when some vari- ables do not affect the output. They are indicated by the symbol X, which means that the entry can be either 0 or 1. Don ’t cares also appear in truth table outputs where the read more..

  • Page - 107

    happen. Such outputs can be treated as either 0 ’sor 1 ’s at the designer ’s discretion. In a K-map, X ’s allow for even more logic minimization. They can be circled if they help cover the 1 ’s with fewer or larger circles, but they do not have to be circled if they are not helpful. Example 2.11 SEVEN-SEGMENT DISPLAY DECODER WITH DON ’T CARES Repeat Example 2.10 read more..

  • Page - 108

    human with a bit of experience can find a good solution by inspection. Neither of the authors has ever used a Karnaugh map in real life to solve a practical problem. But the insight gained from the principles underlying Karnaugh maps is valuable. And Karnaugh maps often appear at job interviews! 2.8 COMBINATIONAL BUILDING BLOCKS Combinational logic is often grouped into larger read more..

  • Page - 109

    with a Karnaugh map or read off by inspection (Y is 1 if S = 0 AND D0 is 1OR if S = 1AND D1 is 1). Alternatively, multiplexers can be built from tristate buffers as shown in actionGoTo:109,Figure actionGoTo:109,2.56. The tristate enables are arranged such that, at all times, exactly one tristate buffer is active. When S = 0, tristate T0 is enabled, allowing D0 to flow read more..

  • Page - 110

    AND gate. The inputs, A and B, serve as select lines. The multiplexer data inputs are connected to 0 or 1 according to the corresponding row of the truth table. In general, a 2 N-input multiplexer can be programmed to per- form any N-input logic function by applying 0 ’s and 1 ’s to the appropri- ate data inputs. Indeed, by changing the data inputs, the multiplexer read more..

  • Page - 111

    Example 2.13 LOGIC WITH MULTIPLEXERS, REPRISED Alyssa turns on her circuit one more time before the final presentation and blows up the 8:1 multiplexer. (She accidently powered it with 20 V instead of 5 V after not sleeping all night.) She begs her friends for spare parts and they give her a 4:1 multiplexer and an inverter. Can she build her circuit with only these parts? read more..

  • Page - 112

    Example 2.14 DECODER IMPLEMENTATION Implement a 2:4 decoder with AND, OR, and NOT gates. Solution: actionGoTo:112,Figure actionGoTo:112,2.64 shows an implementation for the 2:4 decoder using four AND gates. Each gate depends on either the true or the complementary form of each input. In general, an N:2 N decoder can be constructed from 2 N N-input AND gates that accept the various read more..

  • Page - 113

    When using decoders to build logic, it is easiest to express functions as a truth table or in canonical sum-of-products form. An N-input function with M 1 ’s in the truth table can be built with an N:2 N decoder and an M-input OR gate attached to all of the minterms containing 1 ’sin the truth table. This concept will be applied to the building of Read Only read more..

  • Page - 114

    actionGoTo:114,Figure actionGoTo:114,2.67 illustrates a buffer ’s propagation delay and contamina- tion delay in blue and gray, respectively. The figure shows that A is initi- ally either HIGH or LOW and changes to the other state at a particular time; we are interested only in the fact that it changes, not what value it has. In response, Y changes some time later. The arcs read more..

  • Page - 115

    therefore the slowest, path, because the input travels through three gates to the output. This path is critical because it limits the speed at which the circuit operates. The short path through the circuit, shown in gray, is from input D to output Y. This is the shortest, and therefore the fastest, path through the circuit, because the input travels through only a single read more..

  • Page - 116

    Solution: Ben begins by finding the critical path and the shortest path through the circuit. The critical path, highlighted in blue in actionGoTo:116,Figure actionGoTo:116,2.71, is from input A or B through three gates to the output Y. Hence, tpd is three times the propagation delay of a single gate, or 300 ps. The shortest path, shown in gray in actionGoTo:116,Figure read more..

  • Page - 117

    actionGoTo:118,Figure actionGoTo:118,2.74 shows the hierarchical implementation of the 4:1 multiplexer using two stages of 2:1 multiplexers. The critical path is from any of the D inputs to the output. This circuit is data critical, because the critical path is from the data input to the output: tpd = tpd_dy. If data inputs arrive well before the control inputs, we would prefer read more..

  • Page - 118

    As B transitions from 1 to 0, n2 (on the short path) falls before n1 (on the critical path) can rise. Until n1 rises, the two inputs to the OR gate are 0, and the output Y drops to 0. When n1 eventually rises, Y returns to 1. As shown in the timing diagram of actionGoTo:119,Figure actionGoTo:119,2.76, Y starts at 1 and ends at 1 but momentarily glitches to 0. read more..

  • Page - 119

    As long as we wait for the propagation delay to elapse before we depend on the output, glitches are not a problem, because the output eventually settles to the right answer. If we choose to, we can avoid this glitch by adding another gate to the implementation. This is easiest to understand in terms of the K-map. actionGoTo:119,Figure actionGoTo:119,2.77 shows how an input read more..

  • Page - 120

    actionGoTo:120,Figure actionGoTo:120,2.79 shows the glitch-proof circuit. The added AND gate is highlighted in blue. Now a transition on B when A = 0and C = 1 does not cause a glitch on the output, because the blue AND gate outputs 1 throughout the transition. In general, a glitch can occur when a change in a single variable crosses the boundary between two prime implicants read more..

  • Page - 121

    The function of a combinational circuit can be given by a truth table or a Boolean equation. The Boolean equation for any truth table can be obtained systematically using sum-of-products or product-of-sums form. In sum-of-products form, the function is written as the sum (OR) of one or more implicants. Implicants are the product (AND) of literals. Literals are the true or read more..

  • Page - 122

    Exercises Exercise 2.1 Write a Boolean equation in sum-of-products canonical form for each of the truth tables in actionGoTo:122,Figure actionGoTo:122,2.80. Exercise 2.2 Write a Boolean equation in sum-of-products canonical form for each of the truth tables in actionGoTo:122,Figure actionGoTo:122,2.81. BCY 00 01 10 11 1 0 1 0 A 0 0 0 0 00 01 10 11 1 1 1 1 1 1 0 1 BCY 00 01 10 11 1 0 0 0 A read more..

  • Page - 123

    Exercise 2.3 Write a Boolean equation in product-of-sums canonical form for the truth tables in actionGoTo:122,Figure actionGoTo:122,2.80. Exercise 2.4 Write a Boolean equation in product-of-sums canonical form for the truth tables in actionGoTo:122,Figure actionGoTo:122,2.81. Exercise 2.5 Minimize each of the Boolean equations from Exercise 2.1. Exercise 2.6 Minimize each of the Boolean equations read more..

  • Page - 124

    Exercise 2.15 Sketch a reasonably simple combinational circuit implementing each of the functions from Exercise 2.13. Exercise 2.16 Sketch a reasonably simple combinational circuit implementing each of the functions from Exercise 2.14. Exercise 2.17 Simplify each of the following Boolean equations. Sketch a reasonably simple combinational circuit implementing the simplified equation. (a) Y = BC + read more..

  • Page - 125

    Exercise 2.23 Prove De Morgan ’s Theorem (T12) for three variables, B2, B1, B0, using perfect induction. Exercise 2.24 Write Boolean equations for the circuit in actionGoTo:125,Figure actionGoTo:125,2.82. You need not minimize the equations. Exercise 2.25 Minimize the Boolean equations from Exercise 2.24 and sketch an improved circuit with the same function. Exercise 2.26 Using De Morgan read more..

  • Page - 126

    Exercise 2.27 Repeat Exercise 2.26 for the circuit in actionGoTo:126,Figure actionGoTo:126,2.84. Exercise 2.28 Find a minimal Boolean equation for the function in actionGoTo:126,Figure actionGoTo:126,2.85. Remember to take advantage of the don ’t care entries. Exercise 2.29 Sketch a circuit for the function from Exercise 2.28. Exercise 2.30 Does your circuit from Exercise 2.29 have any read more..

  • Page - 127

    Exercise 2.32 Sketch a circuit for the function from Exercise 2.31. Exercise 2.33 Ben Bitdiddle will enjoy his picnic on sunny days that have no ants. He will also enjoy his picnic any day he sees a hummingbird, as well as on days where there are ants and ladybugs. Write a Boolean equation for his enjoyment (E) in terms of sun (S), ants (A), hummingbirds (H), and read more..

  • Page - 128

    the inputs are TRUE. Design an eight-input priority encoder with inputs A7:0 and outputs Y2.0 and NONE. For example, if the input is 00100000, the output Y should be 101 and NONE should be 0. Give a simplified Boolean equation for each output, and sketch a schematic. Exercise 2.37 Design a modified priority encoder (see Exercise 2.36) that receives an 8-bit input, A7:0, and read more..

  • Page - 129

    Exercise 2.41 Implement the function from actionGoTo:122,Figure actionGoTo:122,2.80(b) using (a) an 8:1 multiplexer (b) a 4:1 multiplexer and one inverter (c) a 2:1 multiplexer and two other logic gates Exercise 2.42 Implement the function from Exercise 2.17(a) using (a) an 8:1 multiplexer (b) a 4:1 multiplexer and no other gates (c) a 2:1 multiplexer, one OR gate, and an inverter read more..

  • Page - 130

    Exercise 2.46 Redesign the circuit from Exercise 2.35 to be as fast as possible. Use only the gates from actionGoTo:129,Table actionGoTo:129,2.8. Sketch the new circuit and indicate the critical path. What are its propagation delay and contamination delay? Exercise 2.47 Redesign the priority encoder from Exercise 2.36 to be as fast as possible. You may use any of the gates from read more..

  • Page - 131

    Interview Questions The following exercises present questions that have been asked at interviews for digital design jobs. Question 2.1 Sketch a schematic for the two-input XOR function using only NAND gates. How few can you use? Question 2.2 Design a circuit that will tell whether a given month has 31 days in it. The month is specified by a 4-bit input Α3:0. For example, if read more..

  • Page - 132

    This page intentionally left blank read more..

  • Page - 133

    read more..

  • Page - 134

    3 Sequential Logic Design 3.1 INTRODUCTION In the last chapter, we showed how to analyze and design combinational logic. The output of combinational logic depends only on current input values. Given a specification in the form of a truth table or Boolean equa- tion, we can create an optimized circuit to meet the specification. In this chapter, we will analyze and design read more..

  • Page - 135

    Analyzing this circuit is different from analyzing a combinational circuit because it is cyclic: Q depends on Q , and Q depends on Q. Consider the two cases, Q is 0 or Q is 1. Working through the con- sequences of each case, we have: ▶ Case I: Q = 0 As shown in actionGoTo:135,Figure actionGoTo:135,3.2(a), I2 receives a FALSE input, Q, so it produces a TRUE output on read more..

  • Page - 136

    When power is first applied to a sequential circuit, the initial state is unknown and usually unpredictable. It may differ each time the circuit is turned on. Although the cross-coupled inverters can store a bit of information, they are not practical because the user has no inputs to control the state. However, other bistable elements, such as latches and flip-flops, provide read more..

  • Page - 137

    ▶ Case IVa: Q = 0 Because S and Q are FALSE, N2 produces a TRUE output on Q , as shown in actionGoTo:137,Figure actionGoTo:137,3.4(a). Now N1 receives one TRUE input, Q , so its output, Q, is FALSE, just as we had assumed. ▶ Case IVb: Q = 1 Because Q is TRUE, N2 produces a FALSE output on Q ,as shown in actionGoTo:137,Figure actionGoTo:137,3.4(b). Now N1 receives two read more..

  • Page - 138

    accounted for by the single state variable Q. No matter what pattern of setting and resetting occurred in the past, all that is needed to predict the future behavior of the SR latch is whether it was most recently set or reset. 3.2.2 D Latch The SR latch is awkward because it behaves strangely when both S and R are simultaneously asserted. Moreover, the S and R inputs read more..

  • Page - 139

    3.2.3 D FIip-Flop A D flip-flop can be built from two back-to-back D latches controlled by complementary clocks, as shown in actionGoTo:139,Figure actionGoTo:139,3.8(a). The first latch, L1, is called the master. The second latch, L2, is called the slave. The node between them is named N1. A symbol for the D flip-flop is given in actionGoTo:139,Figure actionGoTo:139,3.8(b). When the read more..

  • Page - 140

    shows the schematic and symbol for a four-bit register with inputs D3:0 and outputs Q3:0. D3:0 and Q3:0 are both 4-bit busses. 3.2.5 Enabled Flip-Flop An enabled flip-flop adds another input called EN or ENABLE to deter- mine whether data is loaded on the clock edge. When EN is TRUE, the enabled flip-flop behaves like an ordinary D flip-flop. When EN is FALSE, the enabled read more..

  • Page - 141

    FALSE, the CLK input is also FALSE and the flip-flop retains its old value. Notice that EN must not change while CLK = 1, lest the flip-flop see a clock glitch (switch at an incorrect time). Generally, performing logic on the clock is a bad idea. Clock gating delays the clock and can cause timing errors, as we will see in actionGoTo:173,Section actionGoTo:173,3.5.3, so do read more..

  • Page - 142

    actionGoTo:58,Section actionGoTo:58,1.7.7 that a transmission gate is an efficient way to build a CMOS switch, so we might expect that we could take advantage of transmission gates to reduce the transistor count. A compact D latch can be constructed from a single transmission gate, as shown in actionGoTo:142,Figure actionGoTo:142,3.12(a). When CLK = 1 and CLK = 0, the trans- mission read more..

  • Page - 143

    3.2.8 Putting It All Together Latches and flip-flops are the fundamental building blocks of sequential circuits. Remember that a D latch is level-sensitive, whereas a D flip-flop is edge-triggered. The D latch is transparent when CLK = 1, allowing the input D to flow through to the output Q. The D flip-flop copies D to Q on the rising edge of CLK. At all other times, read more..

  • Page - 144

    3.3 SYNCHRONOUS LOGIC DESIGN In general, sequential circuits include all circuits that are not combinational — that is, those whose output cannot be determined simply by looking at the current inputs. Some sequential circuits are just plain kooky. This section begins by examining some of those curious circuits. It then introduces the notion of synchronous sequential circuits and the read more..

  • Page - 145

    output, Q, given the two inputs, D and CLK, and the old state of the latch, Qprev. Based on this truth table, he has derived Boolean equations. He obtains Qprev by feeding back the output, Q. His design is shown in actionGoTo:145,Figure actionGoTo:145,3.18. Does his latch work correctly, independent of the delays of each gate? Solution: actionGoTo:145,Figure actionGoTo:145,3.19 shows read more..

  • Page - 146

    collection of combinational logic and registers. The registers contain the state of the system, which changes only at the clock edge, so we say the state is synchronized to the clock. If the clock is sufficiently slow, so that the inputs to all registers settle before the next clock edge, all races are eliminated. Adopting this discipline of always using registers in the read more..

  • Page - 147

    Example 3.5 SYNCHRONOUS SEQUENTIAL CIRCUITS Which of the circuits in actionGoTo:147,Figure actionGoTo:147,3.21 are synchronous sequential circuits? Solution: Circuit (a) is combinational, not sequential, because it has no registers. (b) is a simple sequential circuit with no feedback. (c) is neither a combinational circuit nor a synchronous sequential circuit, because it has a latch that is read more..

  • Page - 148

    Of course, asynchronous circuits are occasionally necessary when communicating between systems with different clocks or when receiving inputs at arbitrary times, just as analog circuits are necessary when com- municating with the real world of continuous voltages. Furthermore, research in asynchronous circuits continues to generate interesting insights, some of which can improve synchronous read more..

  • Page - 149

    looking where they are going. Football players are hustling between the athletic fields and the dining hall on Bravado Boulevard. They are tossing the ball back and forth and aren ’t looking where they are going either. Several serious injuries have already occurred at the intersection of these two roads, and the Dean of Students asks Ben Bitdiddle to install a traffic light read more..

  • Page - 150

    traffic is present on Academic Ave., the lights do not change. When there is no longer traffic on Academic Ave., the light on Academic Ave. becomes yellow for 5 seconds before it turns red and Bravado Blvd. ’s light turns green. Similarly, the Bravado Blvd. light remains green as long as traffic is present on the boulevard, then turns yellow and eventually red. In a read more..

  • Page - 151

    Ben updates the state transition table to use these binary encodings, as shown in actionGoTo:151,Table actionGoTo:151,3.4. The revised state transition table is a truth table specifying the next state logic. It defines next state, S ′, as a function of the current state, S, and the inputs. From this table, it is straightforward to read off the Boolean equa- tions for the next read more..

  • Page - 152

    S ′ 1 = S1 ⊕ S0 S ′ 0 = S1S0TA + S1S0TB (3.2) Similarly, Ben writes an output table (actionGoTo:152,Table actionGoTo:152,3.5) indicating, for each state, what the output should be in that state. Again, it is straightforward to read off and simplify the Boolean equations for the outputs. For exam- ple, observe that LA1 is TRUE only on the rows where S1 is TRUE. LA1 = S1 read more..

  • Page - 153

    (a) S1 S0 S'1 S'0 CLK state register Reset r S1 S0 S'1 S'0 CLK next state logic state register Reset TA TB inputs (b) S1 S0 r S1 S0 S'1 S'0 CLK next state logic output logic state register Reset LA1 LB1 LB0 LA0 TA TB inputs outputs (c) S1 S0 r Figure 3.26 State machine circuit for traffic light controller CLK Reset TA TB S'1:0 S1:0 LA1:0 LB1:0 Cycle 1 Cycle 2 Cycle 3 Cycle 4 Cycle 5 Cycle 6 read more..

  • Page - 154

    immediately resets to S0, indicating that asynchronously resettable flip- flops are being used. In state S0, light LA is green and light LB is red. In this example, traffic arrives immediately on Academic Ave. There- fore, the controller remains in state S0, keeping LA green even though traffic arrives on Bravado Blvd. and starts waiting. After 15 seconds, the traffic on Academic read more..

  • Page - 155

    Solution: actionGoTo:155,Tables actionGoTo:155,3.6 actionGoTo:155,and actionGoTo:155,3.7 show the abstract state transition and output tables before encoding. actionGoTo:156,Table actionGoTo:156,3.8 compares binary and one-hot encodings for the three states. The binary encoding uses two bits of state. Using this encoding, the state transi- tion table is shown in actionGoTo:156,Table actionGoTo:156,3.9. Note read more..

  • Page - 156

    Table 3.8 One-hot and binary encodings for divide-by-3 counter State One-Hot Encoding Binary Encoding S2 S1 S0 S1 S0 S0 0 0 1 0 0 S1 0 1 0 0 1 S2 1 0 0 1 0 Table 3.9 State transition table with binary encoding Current State Next State S1 S0 S ′ 1 S ′ 0 00 0 1 01 1 0 10 0 0 Table 3.10 State transition table with one-hot encoding Current State Next State S2 S1 S0 S ′ 2 S ′ 1 read more..

  • Page - 157

    3.4.3 Moore and Mealy Machines So far, we have shown examples of Moore machines, in which the output depends only on the state of the system. Hence, in state transition diagrams for Moore machines, the outputs are labeled in the circles. Recall that Mealy machines are much like Moore machines, but the outputs can depend on inputs as well as the current state. Hence, in read more..

  • Page - 158

    actionGoTo:159,Table actionGoTo:159,3.15 shows the combined state transition and output table for the Mealy machine. The Mealy machine requires only one bit of state. Consider using a bin- ary state encoding: S0 = 0 and S1 = 1. actionGoTo:159,Table actionGoTo:159,3.16 rewrites the state transition and output table with these encodings. From these tables, we find the next state and read more..

  • Page - 159

    3.4.4 Factoring State Machines Designing complex FSMs is often easier if they can be broken down into multiple interacting simpler state machines such that the output of some machines is the input of others. This application of hierarchy and modularity is called factoring of state machines. Table 3.13 Moore state transition table with state encodings Current State Input Next State S1 read more..

  • Page - 160

    Example 3.8 UNFACTORED AND FACTORED STATE MACHINES Modify the traffic light controller from actionGoTo:148,Section actionGoTo:148,3.4.1 to have a parade mode, which keeps the Bravado Boulevard light green while spectators and the band march to football games in scattered groups. The controller receives two more inputs: P and R. Asserting P for at least one cycle enters parade mode. read more..

  • Page - 161

    Controller FSM T A (a) T B L A L B (b) Mode FSM Lights FSM P M Controller FSM L A L B T A T B R P R Figure 3.33 (a) single and (b) factored designs for modified traffic light controller FSM S0 L A: green L B: red L A: green L B: red L A: yellow L B: red L A: yellow L B: red L A: red L B: green L A: red L B: green L A: red L B: yellow L A: red L B: yellow S1 S3 S2 TA TA read more..

  • Page - 162

    3.4.5 Deriving an FSM from a Schematic Deriving the state transition diagram from a schematic follows nearly the reverse process of FSM design. This process can be necessary, for exam- ple, when taking on an incompletely documented project or reverse engi- neering somebody else ’s system. ▶ Examine circuit, stating inputs, outputs, and state bits. ▶ Write next state and output read more..

  • Page - 163

    machine because the output depends only on the state bits. From the circuit, she writes down the next state and output equations directly: S ′ 1 = S0A1A0 S ′ 0 = S1 S0A1A0 Unlock = S1 (3.12) Next, she writes down the next state and output tables from the equations, as shown in actionGoTo:163,Tables actionGoTo:163,3.17 actionGoTo:163,and actionGoTo:163,3.18, first placing 1 ’s in read more..

  • Page - 164

    state is always S1:0 = 00, independent of the inputs, so don ’t cares are inserted for the inputs. The reduced tables are shown in actionGoTo:164,Tables actionGoTo:164,3.19 actionGoTo:164,and actionGoTo:164,3.20. She assigns names to each state bit combination: S0 is S1:0 = 00, S1 is S1:0 = 01, and S2 is S1:0 = 10. actionGoTo:164,Tables actionGoTo:164,3.21 actionGoTo:164,and read more..

  • Page - 165

    Alyssa writes down the state transition diagram shown in actionGoTo:165,Figure actionGoTo:165,3.36 using actionGoTo:164,Tables actionGoTo:164,3.21 actionGoTo:164,and actionGoTo:164,3.22. By inspection, she can see that the finite state machine unlocks the door only after detecting an input value, A1:0, of three followed by an input value of one. The door is then locked again. Alyssa tries this read more..

  • Page - 166

    We will repeatedly use FSMs to design complex digital systems throughout this book. 3.5 TIMING OF SEQUENTIAL LOGIC Recall that a flip-flop copies the input D to the output Q on the rising edge of the clock. This process is called sampling D on the clock edge. If D is stable at either 0 or 1 when the clock rises, this behavior is clearly defined. But what happens if read more..

  • Page - 167

    3.5.1 The Dynamic Discipline So far, we have focused on the functional specification of sequential circuits. Recall that a synchronous sequential circuit, such as a flip-flop or FSM, also has a timing specification, as illustrated in actionGoTo:167,Figure actionGoTo:167,3.37. Whenthe clockrises, the output (or outputs) may start to change after the clock-to-Q contamina- tion delay, tccq, and read more..

  • Page - 168

    changes and settles to the final value within a propagation delay after its input settles. The gray arrows represent the contamination delay through R1 and the combinational logic, and the blue arrows represent the propa- gation delay through R1 and the combinational logic. We analyze the timing constraints with respect to the setup and hold time of the second register, R2. Setup read more..

  • Page - 169

    computation in the combinational logic, tpd. However, the sequencing overhead of the flip-flop cuts into this time. actionGoTo:168,Equation actionGoTo:168,3.14 is called the setup time constraint or max-delay constraint, because it depends on the setup time and limits the maximum delay through combinational logic. If the propagation delay through the combinational logic is too great, D2 may read more..

  • Page - 170

    In such a case, tcd = 0 because there is no combinational logic between flip-flops. Substituting into actionGoTo:169,Equation actionGoTo:169,3.16 yields the requirement that thold ≤ tccq (3.17) In other words, a reliable flip-flop must have a hold time shorter than its contamination delay. Often, flip-flops are designed with thold = 0, so that actionGoTo:170,Equation actionGoTo:170,3.17 is read more..

  • Page - 171

    contamination delay of 25 ps. Help Ben determine the maximum clock frequency and whether any hold time violations could occur. This process is called timing analysis. Solution: actionGoTo:171,Figure actionGoTo:171,3.43(a) shows waveforms illustrating when the signals might change. The inputs, A to D, are registered, so they only change shortly after CLK rises. The critical path occurs when read more..

  • Page - 172

    60 ps, meaning that X ′ must remain stable for 60 ps after the rising edge of CLK for the flip-flop to reliably sample its value. In this case, X ′ = 0 at the first rising edge of CLK, so we want the flip-flop to capture X = 0. Because X ′ did not hold stable long enough, the actual value of X is unpredictable. The circuit has a hold time violation and read more..

  • Page - 173

    However, some high-performance microprocessors, including the Pentium 4, use an element called a pulsed latch in place of a flip-flop. The pulsed latch behaves like a flip-flop but has a short clock-to-Q delay and a long hold time. In general, adding buffers can usually, but not always, solve hold time problems without slowing the critical path. 3.5.3 Clock Skew* In the previous read more..

  • Page - 174

    The data propagates through the register and combinational logic and must setup before R2 samples it. Hence, we conclude that Tc ≥ tpcq + tpd + tsetup + tskew (3.19) tpd ≤ Tc −ðtpcq + tsetup + tskew Þ (3.20) Next, consider the hold time constraint shown in actionGoTo:175,Figure actionGoTo:175,3.49. In the worst case, R1 receives an early skewed clock, CLK1, and R2 receives a read more..

  • Page - 175

    serious hold time failures, designers must not permit too much clock skew. Sometimes flip-flops are intentionally designed to be particularly slow (i.e., large tccq), to prevent hold time problems even when the clock skew is substantial. Example 3.12 TIMING ANALYSIS WITH CLOCK SKEW Revisit actionGoTo:170,Example actionGoTo:170,3.10 and assume that the system has 50 ps of clock skew. read more..

  • Page - 176

    The short path increases to 80 ps. This is still less than thold + tskew = 110 ps, so the circuit still violates its hold time constraint. To fix the problem, even more buffers could be inserted. Buffers would need to be added on the critical path as well, reducing the clock frequency. Alternatively, a better flip-flop with a shorter hold time might be used. 3.5.4 read more..

  • Page - 177

    shown that the probability that the resolution time, tres, exceeds some arbitrary time, t, decreases exponentially with t: P ðtres > t Þ = T0 Tc e− t τ (3.24) where Tc is the clock period, and T0 and τ are characteristic of the flip- flop. The equation is valid only for t substantially longer than tpcq. Intuitively, T0/Tc describes the probability that the input changes read more..

  • Page - 178

    period is long enough, D2 will, with high probability, resolve to a valid logic level before the end of the period. F2 then samples D2, which is now stable, producing a good output Q. We say that a synchronizer fails if Q, the output of the synchronizer, becomes metastable. This may happen if D2 has not resolved to a valid level by the time it must setup at F2 read more..

  • Page - 179

    that waits for one clock cycle provides a safe MTBF. In exceptionally high-speed systems, waiting for more cycles may be necessary. Example 3.14 SYNCHRONIZER FOR FSM INPUT The traffic light controller FSM from actionGoTo:148,Section actionGoTo:148,3.4.1 receives asynchronous inputs from the traffic sensors. Suppose that a synchronizer is used to guarantee stable inputs to the controller. read more..

  • Page - 180

    to a valid logic level after a time t. The remainder of this section analyzes a simple model of a bistable device to estimate this probability. A bistable device uses storage with positive feedback. actionGoTo:180,Figure actionGoTo:180,3.55(a) shows this feedback implemented with a pair of inverters; this circuit ’s behavior is representative of most bistable elements. A pair of read more..

  • Page - 181

    rate dvin(t)/dt = i(t)/C. Putting these facts together, we find the governing equation for the output voltage. dvout ðtÞ dt = ðG−1Þ RC vout ðtÞ− VDD 2 ! (3.31) This is a linear first-order differential equation. Solving it with the initial condition vout(0) = VDD/2 + ΔV gives vout ðtÞ = VDD 2 + ΔVe ðG−1Þt RC (3.32) actionGoTo:181,Figure actionGoTo:181,3.56 plots trajectories read more..

  • Page - 182

    between 0 and VDD. The probability that the output has not resolved to a legal value after time tres depends on the probability that the initial offset is sufficiently small. Specifically, the initial offset on vout must be less than ΔVres, so the initial offset on vin must be less than ΔVres/G. Then the prob- ability that the bistable device samples the input at a time read more..

  • Page - 183

    same time. With temporal parallelism, a task is broken into stages, like an assembly line. Multiple tasks can be spread across the stages. Although each task must pass through all stages, a different task will be in each stage at any given time so multiple tasks can overlap. Temporal parallelism is commonly called pipelining. Spatial parallelism is sometimes just called read more..

  • Page - 184

    token in one pipeline stage from catching up with and corrupting the token in the next stage. actionGoTo:184,Figure actionGoTo:184,3.58 shows an example of a circuit with no pipelining. It con- tains four blocks of logic between the registers. The critical path passes through blocks 2, 3, and 4. Assume that the register has a clock-to-Q pro- pagation delay of 0.3 ns and a read more..

  • Page - 185

    is two clock cycles, or 11 ns. The throughput is 1/5.5 ns = 182 MHz. This example shows that, in a real circuit, pipelining with two stages almost doubles the throughput and slightly increases the latency. In comparison, ideal pipelining would exactly double the throughput at no penalty in latency. The discrepancy comes about because the circuit cannot be divided into two read more..

  • Page - 186

    3.7 SUMMARY This chapter has described the analysis and design of sequential logic. In contrast to combinational logic, whose outputs depend only on the cur- rent inputs, sequential logic outputs depend on both current and prior inputs. In other words, sequential logic remembers information about prior inputs. This memory is called the state of the logic. Sequential circuits can be read more..

  • Page - 187

    Exercises Exercise 3.1 Given the input waveforms shown in actionGoTo:187,Figure actionGoTo:187,3.61, sketch the output, Q, of an SR latch. Exercise 3.2 Given the input waveforms shown in actionGoTo:187,Figure actionGoTo:187,3.62, sketch the output, Q, of an SR latch. Exercise 3.3 Given the input waveforms shown in actionGoTo:187,Figure actionGoTo:187,3.63, sketch the output, Q, of a D latch. read more..

  • Page - 188

    Exercise 3.5 Given the input waveforms shown in actionGoTo:187,Figure actionGoTo:187,3.63, sketch the output, Q, of a D flip-flop. Exercise 3.6 Given the input waveforms shown in actionGoTo:187,Figure actionGoTo:187,3.64, sketch the output, Q, of a D flip-flop. Exercise 3.7 Is the circuit in actionGoTo:188,Figure actionGoTo:188,3.65 combinational logic or sequential logic? Explain in a simple read more..

  • Page - 189

    Exercise 3.11 The circuit in actionGoTo:189,Figure actionGoTo:189,3.67 is called a Muller C-element. Explain in a simple fashion what the relationship is between the inputs and output. Exercise 3.12 Design an asynchronously resettable D latch using logic gates. Exercise 3.13 Design an asynchronously resettable D flip-flop using logic gates. Exercise 3.14 Design a synchronously settable D read more..

  • Page - 190

    Exercise 3.20 You are designing an FSM to keep track of the mood of four students working in the digital design lab. Each student ’s mood is either HAPPY (the circuit works), SAD (the circuit blew up), BUSY (working on the circuit), CLUELESS (confused about the circuit), or ASLEEP (face down on the circuit board). How many states does the FSM have? What is the minimum read more..

  • Page - 191

    light controller from actionGoTo:148,Section actionGoTo:148,3.4.1 so that both lights are red for 5 seconds before either light turns green again. Sketch your improved Moore machine state transition diagram, state encodings, state transition table, output table, next state and output equations, and your FSM schematic. Exercise 3.25 Alyssa P. Hacker ’s snail from actionGoTo:157,Section read more..

  • Page - 192

    repeats. For example, a watch uses a modulo 60 counter for the minutes and seconds that counts from 0 to 59.) When reset, the output should be 000. On each clock edge, the output should advance to the next Gray code. After reaching 100, it should repeat with 000. Exercise 3.28 Extend your modulo 8 Gray code counter from actionGoTo:191,Exercise actionGoTo:191,3.27 to be an read more..

  • Page - 193

    Exercise 3.32 Repeat actionGoTo:192,Exercise actionGoTo:192,3.31 for the FSM shown in actionGoTo:193,Figure actionGoTo:193,3.73. Recall that the s and r register inputs indicate set and reset, respectively. Exercise 3.33 Ben Bitdiddle has designed the circuit in actionGoTo:193,Figure actionGoTo:193,3.74 to compute a registered four-input XOR function. Each two-input XOR gate has a propagation delay read more..

  • Page - 194

    Exercise 3.34 You are designing an adder for the blindingly fast 2-bit RePentium Processor. The adder is built from two full adders such that the carry out of the first adder is the carry in to the second adder, as shown in actionGoTo:193,Figure actionGoTo:193,3.75. Your adder has input and output registers and must complete the addition in one clock cycle. Each full adder read more..

  • Page - 195

    Exercise 3.37 You would like to build a synchronizer that can receive asynchronous inputs with an MTBF of 50 years. Your system is running at 1 GHz, and you use sampling flip-flops with τ = 100 ps, T0 = 110 ps, and tsetup = 70 ps. The synchronizer receives a new asynchronous input on average 0.5 times per second (i.e., once every 2 seconds). What is the required read more..

  • Page - 196

    Interview Questions The following exercises present questions that have been asked at interviews for digital design jobs. Question 3.1 Draw a state machine that can detect when it has received the serial input sequence 01010. Question 3.2 Design a serial (one bit at a time) two ’s complementer FSM with two inputs, Start and A, and one output, Q. A binary number of arbitrary read more..

  • Page - 197

    read more..

  • Page - 198

    4 Hardware Description Languages 4.1 INTRODUCTION Thus far, we have focused on designing combinational and sequential digital circuits at the schematic level. The process of finding an efficient set of logic gates to perform a given function is labor intensive and error prone, requiring manual simplification of truth tables or Boolean equa- tions and manual translation of finite state read more..

  • Page - 199

    behavioral and structural. Behavioral models describe what a module does. Structural models describe how a module is built from simpler pieces; it is an application of hierarchy. The SystemVerilog and VHDL code in actionGoTo:199,HDL actionGoTo:199,Example actionGoTo:199,4.1 illustrate behavioral descriptions of a module that computes the Boolean function from actionGoTo:90,Example actionGoTo:90,2.6, read more..

  • Page - 200

    read more..

  • Page - 201

    read more..

  • Page - 202

    read more..

  • Page - 203

    read more..

  • Page - 204

    read more..

  • Page - 205

    read more..

  • Page - 206

    read more..

  • Page - 207

    read more..

  • Page - 208

    read more..

  • Page - 209

    read more..

  • Page - 210

    order is defined by the language. actionGoTo:210,HDL actionGoTo:210,Example actionGoTo:210,4.8 specifies operator prece- dence from highest to lowest for each language. The tables include arithmetic, shift, and comparison operators that will be defined in actionGoTo:264,Chapter actionGoTo:264,5. 4.2.7 Numbers Numbers can be specified in binary, octal, decimal, or hexadecimal (bases 2, 8, 10, and read more..

  • Page - 211

    4.2.8 Z ’s and X ’s HDLs use z to indicate a floating value, z is particularly useful for describ- ing a tristate buffer, whose output floats when the enable is 0. Recall from actionGoTo:99,Section actionGoTo:99,2.6.2 that a bus can be driven by several tristate buffers, exactly one of which should be enabled. actionGoTo:212,HDL actionGoTo:212,Example actionGoTo:212,4.10 shows the read more..

  • Page - 212

    At the start of simulation, state nodes such as flip-flop outputs are initia- lized to an unknown state (x in SystemVerilog and u in VHDL). This is helpful to trackerrorscausedbyforgettingtoreset a flip-flop before its output is used. If a gate receives a floating input, it may produce an x output when it can ’t determine the correct output value. Similarly, if it receives an read more..

  • Page - 213

    shows how SystemVerilog and VHDL combine these different signal values in logic gates. Seeing x or u values in simulation is almost always an indication of a bug or bad coding practice. In the synthesized circuit, this corresponds to a floating gate input, uninitialized state, or contention. The x or u may be interpreted ran- domly by the circuit as 0 or 1, leading to read more..

  • Page - 214

    understand cause and effect (deducing the source of a bad output is tricky if all signals change simultaneously in the simulation results). These delays are ignored during synthesis; the delay of a gate produced by the synthesizer depends on its tpd and tcd specifications, not on numbers in HDL code. actionGoTo:214,HDL actionGoTo:214,Example actionGoTo:214,4.13 adds delays to the read more..

  • Page - 215

    4.3 STRUCTURAL MODELING The previous section discussed behavioral modeling, describing a module in terms of the relationships between inputs and outputs. This section examines structural modeling, describing a module in terms of how it is composed of simpler modules. For example, actionGoTo:215,HDL actionGoTo:215,Example actionGoTo:215,4.14 shows how to assemble a 4:1 multi- plexer from three read more..

  • Page - 216

    an instance. Multiple instances of the same module are distinguished by distinct names, in this case lowmux , highmux ,and finalmux . Thisisan example of regularity, in which the 2:1 multiplexer is reused many times. actionGoTo:216,HDL actionGoTo:216,Example actionGoTo:216,4.15 uses structural modeling to construct a 2:1 multi- plexer from a pair of tristate buffers. Building logic out of read more..

  • Page - 217

    actionGoTo:217,HDL actionGoTo:217,Example actionGoTo:217,4.16 shows how modules can access part of a bus. An 8-bit wide 2:1 multiplexer is built using two of the 4-bit 2:1 multiplexers already defined, operating on the low and high nibbles of the byte. In general, complex systems are designed hierarchically. The overall system is described structurally by instantiating its major components. read more..

  • Page - 218

    4.4 SEQUENTIAL LOGIC HDL synthesizers recognize certain idioms and turn them into specific sequential circuits. Other coding styles may simulate correctly but synthe- size into circuits with blatant or subtle errors. This section presents the proper idioms to describe registers and latches. 4.4.1 Registers The vast majority of modern commercial systems are built with registers using positive read more..

  • Page - 219

    4.4.2 Resettable Registers When simulation begins or power is first applied to a circuit, the output of a flop or register is unknown. This is indicated with x in SystemVerilog and u in VHDL. Generally, it is good practice to use resettable registers so that on powerup you can put your system in a known state. The reset may be either asynchronous or synchronous. Recall that read more..

  • Page - 220

    the next rising edge of the clock. actionGoTo:220,HDL actionGoTo:220,Example actionGoTo:220,4.18 demonstrates the idioms for flip-flops with asynchronous and synchronous resets. Note that distinguishing synchronous and asynchronous reset in a schematic can be difficult. The schematic produced by Synplify Premier places asynchronous reset at the bottom of a flip-flop and synchronous reset on the read more..

  • Page - 221

    4.4.3 Enabled Registers Enabled registers respond to the clock only when the enable is asserted. actionGoTo:221,HDL actionGoTo:221,Example actionGoTo:221,4.19 shows an asynchronously resettable enabled register that retains its old value if both reset and en are FALSE. R q[3:0] d[3:0] reset clk [3:0] Q[3:0] [3:0] D[3:0] (a) q[3:0] d[3:0] reset clk [3:0] Q[3:0] [3:0] D[3:0] R (b) Figure 4.15 flopr read more..

  • Page - 222

    4.4.4 Multiple Registers A single always/process statement can be used to describe multiple pieces of hardware. For example, consider the synchronizer from actionGoTo:177,Section actionGoTo:177,3.5.5 made of two back-to-back flip-flops, as shown in actionGoTo:222,Figure actionGoTo:222,4.17. actionGoTo:222,HDL actionGoTo:222,Example actionGoTo:222,4.20 describes the synchronizer. On the rising edge of clk , d read more..

  • Page - 223

    4.4.5 Latches Recall from actionGoTo:138,Section actionGoTo:138,3.2.2 that a D latch is transparent when the clock is HIGH, allowing data to flow from input to output. The latch becomes opaque when the clock is LOW, retaining its old state. actionGoTo:223,HDL actionGoTo:223,Example actionGoTo:223,4.21 shows the idiom for a D latch. Not all synthesis tools support latches well. Unless you read more..

  • Page - 224

    statements are used to describe sequential circuits, because they remember the old state when no new state is prescribed. However, always/process statements can also be used to describe combinational logic behaviorally if the sensitivity list is written to respond to changes in all of the inputs and the body prescribes the output value for every possible input combination. read more..

  • Page - 225

    SystemVerilog In a SystemVerilog always statement, = indicates a blocking assignment and < = indicates a nonblocking assignment (also called a concurrent assignment). Do not confuse either type with continuous assignment using the assign statement. assign statements must be used out- side always statements and are also evaluated concurrently. VHDL In a VHDL process statement, : = indicates a read more..

  • Page - 226

    4.5.1 Case Statements A better application of using the always/process statement for combina- tional logic is a seven-segment display decoder that takes advantage of the case statement that must appear inside an always/process statement. As you might have noticed in the seven-segment display decoder of actionGoTo:104,Example actionGoTo:104,2.10, the design process for large blocks of read more..

  • Page - 227

    possible input combinations are defined; otherwise it implies sequential logic, because the output will keep its old value in the undefined cases. Synplify Premier synthesizes the seven-segment display decoder into a read-only memory (ROM) containing the 7 outputs for each of the 16 possible inputs. ROMs are discussed further in Section 5.5.6. If the default or others clause were left read more..

  • Page - 228

    y41 y34 y35 y36 y37 y38 y39 y40 y[7:0] a[2:0] [2:0] [0] [1] [2] [0] [1] [2] [2] [0] [1] [0] [2] [1] [1] [2] [0] [1] [0] [2] [0] [1] [2] [0] [1] [2] Figure 4.21 decoder3_8 synthesized circuit 4.5 More Combinational Logic 203 read more..

  • Page - 229

    HDL Example 4.26 PRIORITY CIRCUIT SystemVerilog module priorityckt(input logic [3:0] a, output logic [3:0] y); always_comb if (a[3]) y < = 4'b1000; else if (a[2]) y < = 4'b0100; else if (a[1]) y < = 4'b0010; else if (a[0]) y < = 4'b0001; else y < = 4'b0000; endmodule In SystemVerilog, if statements must appear inside of always statements. VHDL library IEEE; use read more..

  • Page - 230

    are handled, the statement implies combinational logic; otherwise, it pro- duces sequential logic (like the latch in actionGoTo:230,Section actionGoTo:230,4.4.5). actionGoTo:229,HDL actionGoTo:229,Example actionGoTo:229,4.26 uses if statements to describe a priority circuit, defined in actionGoTo:91,Section actionGoTo:91,2.4. Recall that an N-input priority circuit sets the out- put TRUE that corresponds to read more..

  • Page - 231

    y23[0] y24[0] y25 y[3:0] a[3:0] [3:0] [3] [2] [2] [1] [0] [3] [1] [2] [3] [0] [1] [2] [3] Figure 4.23 priority_casez synthesized circuit BLOCKING AND NONBLOCKING ASSIGNMENT GUIDELINES SystemVerilog 1. Use always_ff @(posedge clk) and nonblocking assign- ments to model synchronous sequential logic. always_ff @(posedge clk) begin n1 < = d; // nonblocking q< = n1; // nonblocking end 2. Use continuous read more..

  • Page - 232

    Combinational Logic* The full adder from actionGoTo:225,HDL actionGoTo:225,Example actionGoTo:225,4.23 is correctly modeled using block- ing assignments. This section explores how it operates and how it would differ if nonblocking assignments had been used. Imagine that a , b ,and cin are all initially 0. p , g , s , and cout are thus 0 as well. At some time, a changes to 1, read more..

  • Page - 233

    Observe that s is computed concurrently with p and hence uses the old value of p , not the new value. Therefore, s remains 0 rather than becoming 1. However, p does change from 0 to 1. This change triggers the always/process statement to evaluate a second time, as follows: p ← 1 ⊕ 0 = 1 g ← 1 ∙ 0 = 0 s ← 1 ⊕ 0 = 1 cout ← 0 + 1 ∙ 0 = 0 This read more..

  • Page - 234

    Because n1 is invisible to the outside world and does not influence the behavior of q , the synthesizer optimizes it away entirely, as shown in actionGoTo:234,Figure actionGoTo:234,4.24. The moral of this illustration is to exclusively use nonblocking assign- ment in always/process statements when modeling sequential logic. With sufficient cleverness, such as reversing the orders of the read more..

  • Page - 235

    actionGoTo:235,HDL actionGoTo:235,Example actionGoTo:235,4.30 describes the divide-by-3 FSM from actionGoTo:154,Section actionGoTo:154,3.4.2. It provides an asynchronous reset to initialize the FSM. The state register uses the ordinary idiom for flip-flops. The next state and output logic blocks are combinational. The Synplify Premier synthesis tool just produces a block diagram and state transition read more..

  • Page - 236

    gates or the inputs and outputs on the arcs and states. Therefore, be care- ful that you have specified the FSM correctly in your HDL code. The state transition diagram in actionGoTo:236,Figure actionGoTo:236,4.25 for the divide-by-3 FSM is analogous to the diagram in actionGoTo:155,Figure actionGoTo:155,3.28(b). The double circle indicates that S0 is the reset state. Gate-level read more..

  • Page - 237

    HDL Example 4.31 PATTERN RECOGNIZER MOORE FSM SystemVerilog module patternMoore(input logic clk, input logic reset, input logic a, output logic y); typedef enum logic [1:0] {S0, S1, S2} statetype; statetype state, nextstate; // state register always_ff @(posedge clk, posedge reset) if (reset) state < = S0; else state < = nextstate; // next state logic always_comb case (state) S0: if (a) read more..

  • Page - 238

    4.7 DATA TYPES* This section explains some subtleties about SystemVerilog and VHDL types in more depth. HDL Example 4.32 PATTERN RECOGNIZER MEALY FSM SystemVerilog module patternMealy(input logic clk, input logic reset, input logic a, output logic y); typedef enum logic {S0, S1} statetype; statetype state, nextstate; // state register always_ff @(posedge clk, posedge reset) if (reset) state < read more..

  • Page - 239

    4.7.1 SystemVerilog Prior to SystemVerilog, Verilog primarily used two types: reg and wire . Despite its name, a reg signal might or might not be associated with a regis- ter. This was a great source of confusion for those learning the language. SystemVerilog introduced the logic type to eliminate the confusion; hence, this book emphasizes the logic type. This section explains the read more..

  • Page - 240

    4.7.2 VHDL Unlike SystemVerilog, VHDL enforces a strict data typing system that can protect the user from some errors but that is also clumsy at times. Despite its fundamental importance, the STD_LOGIC type is not built into VHDL. Instead, it is part of the IEEE.STD_LOGIC_1164 library. Thus, every file must contain the library statements shown in the previous examples. Moreover, read more..

  • Page - 241

    because (state = S2) returns a BOOLEAN result, which cannot be directly assigned to the STD_LOGIC signal y . Although we do not declare any signals to be BOOLEAN , they are automa- tically implied by comparisons and used by conditional statements. Similarly, VHDL has an INTEGER type that represents both positive and negative integers. Signals of type INTEGER span at least the values read more..

  • Page - 242

    ports. VHDL 2008 eliminates this restriction by allowing out ports to be readable, but this change is not supported by the Synplify CAD tool at the time of this writing. entity and23 is port(a, b, c: in STD_LOGIC; v: buffer STD_LOGIC; w: out STD_LOGIC); end; Most operations such as addition, subtraction, and Boolean logic are identical whether a number is signed or unsigned. read more..

  • Page - 243

    HDL Example 4.34 PARAMETERIZED N-BIT 2:1 MULTIPLEXERS SystemVerilog module mux2 #(parameter width = 8) (input logic [width –1:0] d0, d1, input logic s, output logic [width –1:0] y); assign y = s ? d1 : d0; endmodule SystemVerilog allows a #(parameter . . . ) statement before the inputs and outputs to define parameters. The parameter statement includes a default value (8) of the read more..

  • Page - 244

    actionGoTo:244,HDL actionGoTo:244,Example actionGoTo:244,4.35 shows a decoder, which is an even better applica- tion of parameterized modules. A large N:2 N decoder is cumbersome to specify with case statements, but easy using parameterized code that sim- ply sets the appropriate output bit to 1. Specifically, the decoder uses block- ing assignments to set all the bits to 0, then read more..

  • Page - 245

    cascade of two-input AND gates. Of course, a reduction operator would be cleaner and simpler for this application, but the example illustrates the general principle of hardware generators. Use generate statements with caution; it is easy to produce a large amount of hardware unintentionally! 4.9 TESTBENCHES A testbench is an HDL module that is used to test another module, called the read more..

  • Page - 246

    actionGoTo:246,HDL actionGoTo:246,Example actionGoTo:246,4.37 demonstrates a simple testbench. It instantiates the DUT, then applies the inputs. Blocking assignments and delays are used to apply the inputs in the appropriate order. The user must view the results of the simulation and verify by inspection that the correct out- puts are produced. Testbenches are simulated the same as other read more..

  • Page - 247

    is to place the test vectors in a separate file. The testbench simply reads the test vectors from the file, applies the input test vector to the DUT, waits, checks that the output values from the DUT match the output vec- tor, and repeats until reaching the end of the test vectors file. actionGoTo:248,HDL actionGoTo:248,Example actionGoTo:248,4.39 demonstrates such a testbench. read more..

  • Page - 248

    sequential DUT. is a text file containing the inputs and expected output written in binary: 000_1 001_0 010_0 011_0 100_1 101_1 110_0 111_0 HDL Example 4.39 TESTBENCH WITH TEST VECTOR FILE SystemVerilog module testbench3(); logic clk, reset; logic a, b, c, y, yexpected; logic [31:0] vectornum, errors; logic [3:0] testvectors[10000:0]; // instantiate device under test sillyfunction dut(a, b, read more..

  • Page - 249

    New inputs are applied on the rising edge of the clock, and the output is checked on the falling edge of the clock. Errors are reported as they occur. At the end of the simulation, the testbench prints the total number of test vectors applied and the number of errors detected. The testbench in actionGoTo:248,HDL actionGoTo:248,Example actionGoTo:248,4.39 is overkill for such a read more..

  • Page - 250

    system that might be impossible to measure on a physical piece of hard- ware. Logic synthesis converts the HDL code into digital logic circuits. The most important thing to remember when you are writing HDL code is that you are describing real hardware, not writing a computer pro- gram. The most common beginner ’s mistake is to write HDL code with- out thinking about the read more..

  • Page - 251

    Exercises The following exercises may be done using your favorite HDL. If you have a simulator available, test your design. Print the waveforms and explain how they prove that it works. If you have a synthesizer available, synthesize your code. Print the generated circuit diagram, and explain why it matches your expectations. Exercise 4.1 Sketch a schematic of the circuit described read more..

  • Page - 252

    Introduce an error in the test vector file and show that the testbench reports a mismatch. Exercise 4.5 Write an HDL module called minority . It receives three inputs, a , b , and c . It produces one output, y , that is TRUE if at least two of the inputs are FALSE. Exercise 4.6 Write an HDL module for a hexadecimal seven-segment display decoder. The decoder should handle read more..

  • Page - 253

    Exercise 4.20 Write an HDL module that implements the priority encoder from actionGoTo:127,Exercise actionGoTo:127,2.36. Exercise 4.21 Write an HDL module that implements the modified priority encoder from actionGoTo:128,Exercise actionGoTo:128,2.37. Exercise 4.22 Write an HDL module that implements the binary-to-thermometer code converter from actionGoTo:128,Exercise actionGoTo:128,2.38. Exercise 4.23 Write an read more..

  • Page - 254

    Exercise 4.25 Sketch the state transition diagram for the FSM described by the following HDL code. An FSM of this nature is used in a branch predictor on some microprocessors. SystemVerilog module fsm1(input logic clk, reset, input logic taken, back, output logic predicttaken); logic [4:0] state, nextstate; parameter S0 = 5'b00001; parameter SI = 5'b00010; parameter S2 = 5'b00100; parameter S3 read more..

  • Page - 255

    Exercise 4.26 Write an HDL module for an SR latch. Exercise 4.27 Write an HDL module for a JK flip-flop. The flip-flop has inputs, clk, J, and K, and output Q. On the rising edge of the clock, Q keeps its old value if J = K = 0. It sets Q to 1 if J = 1, resets Q to 0 if K = 1, and inverts Q if J = K = 1. Exercise 4.28 Write an HDL module for the read more..

  • Page - 256

    Exercise 4.41 Write an HDL module for the serial two ’s complementer from Question 3.2. Exercise 4.42 Write an HDL module for the circuit in actionGoTo:192,Exercise actionGoTo:192,3.31. Exercise 4.43 Write an HDL module for the circuit in actionGoTo:193,Exercise actionGoTo:193,3.32. Exercise 4.44 Write an HDL module for the circuit in actionGoTo:193,Exercise actionGoTo:193,3.33. Exercise 4.45 Write read more..

  • Page - 257

    Exercise 4.50 The following SystemVerilog modules show errors that the authors have seen students make in the laboratory. Explain the error in each module and show how to fix it. (a) module latch(input logic clk, input logic [3:0] d, output reg [3:0] q); always @(clk) if (clk) q < = d; endmodule (b) module gates(input logic [3:0] a, b, output logic [3:0] y1, y2, y3, y4, y5); read more..

  • Page - 258

    always_comb // output logic (combinational) if (state == 0) out1 = 1; else out2 = 1; endmodule (f) module priority(input logic [3:0] a, output logic [3:0] y); always_comb if (a[3]) y = 4'b1000; else if (a[2]) y = 4'b0100; else if (a[1]) y = 4'b0010; else if (a[0]) y = 4'b0001; endmodule (g) module divideby3FSM(input logic clk, input logic reset, output logic out); logic [1:0] state, read more..

  • Page - 259

    always_ff @(posedge clk, posedge reset) if (reset) q < = 0; else q < = d; always @(set) if (set) q < = 1; endmodule (j) module and3(input logic a, b, c, output logic y); logic tmp; always @(a, b, c) begin tmp < = a& b; y< = tmp & c; end endmodule VHDL Exercises The following exercises are specific to VHDL. Exercise 4.51 In VHDL, why is it necessary to write read more..

  • Page - 260

    (c) architecture synth of flop is begin process(clk) if rising_edge(clk) then q< = d; end; (d) architecture synth of priority is begin process(all) begin if a(3) then y < = "1000"; elsif a(2) then y < = "0100"; elsif a(1) then y < = "0010"; elsif a(0) then y < = "0001"; end if; end process; end; (e) architecture synth of divideby3FSM is read more..

  • Page - 261

    (g) architecture asynchronous of floprs is begin process(clk, reset) begin if reset then q< = '0'; elsif rising_edge(clk) then q< = d; end if; end process; process(set) begin if set then q< = '1'; end if; end process; end; 236 CHAPTER FOUR Hardware Description Languages read more..

  • Page - 262

    Interview Questions The following exercises present questions that have been asked at interviews for digital design jobs. Question 4.1 Write a line of HDL code that gates a 32-bit bus called data with another signal called sel to produce a 32-bit result .If sel is TRUE, result = data . Otherwise, result should be all 0 ’s. Question 4.2 Explain the difference between blocking and read more..

  • Page - 263

    read more..

  • Page - 264

    5 Digital Building Blocks 5.1 INTRODUCTION Up to this point, we have examined the design of combinational and sequential circuits using Boolean equations, schematics, and HDLs. This chapter introduces more elaborate combinational and sequential building blocks used in digital systems. These blocks include arithmetic circuits, counters, shift registers, memory arrays, and logic arrays. These read more..

  • Page - 265

    Half Adder We begin by building a 1-bit half adder. As shown in actionGoTo:265,Figure actionGoTo:265,5.1, the half adder has two inputs, A and B, and two outputs, S and Cout. S is the sum of A and B.If A and B are both 1, S is 2, which cannot be represented with a single binary digit. Instead, it is indicated with a carry out Cout in the next column. The half read more..

  • Page - 266

    grows directly with the number of bits, as given in actionGoTo:266,Equation actionGoTo:266,5.1, where tFA is the delay of a full adder. tripple = NtFA (5.1) Carry-Lookahead Adder The fundamental reason that large ripple-carry adders are slow is that the carry signals must propagate through every bit in the adder. A carry- lookahead adder (CLA) is another type of carry propagate adder read more..

  • Page - 267

    actionGoTo:267,Figure actionGoTo:267,5.6(a) shows a 32-bit carry-lookahead adder composed of eight 4-bit blocks. Each block contains a 4-bit ripple-carry adder and some lookahead logic to compute the carry out of the block given the carry in, as shown in actionGoTo:267,Figure actionGoTo:267,5.6(b). The AND and OR gates needed to compute the single-bit generate and propagate signals, Gi and read more..

  • Page - 268

    where tpg is the delay of the individual generate/propagate gates (a single AND or OR gate) to generate Pi and Gi, tpg_block is the delay to find the generate/propagate signals Pi:j and Gi:j for a k-bit block, and tAND_OR is the delay from Cin to Cout through the final AND/OR logic of the k-bit CLA block. For N > 16, the carry-lookahead adder is generally much faster read more..

  • Page - 269

    actionGoTo:269,Figure actionGoTo:269,5.7 shows an N = 16-bit prefix adder. The adder begins with a precomputation to form Pi and Gi for each column from Ai and Bi using AND and OR gates. It then uses log2N = 4 levels of black cells to form the prefixes of Gi:j and Pi:j. A black cell takes inputs from the upper part of a block spanning bits i:k and from the lower read more..

  • Page - 270

    lower parts propagate the carry. Finally, the prefix adder computes the sums using actionGoTo:268,Equation actionGoTo:268,5.8. In summary, the prefix adder achieves a delay that grows logarithmi- cally rather than linearly with the number of columns in the adder. This speedup is significant, especially for adders with 32 or more bits, but it comes at the expense of more hardware than read more..

  • Page - 271

    5.2.2 Subtraction Recall from actionGoTo:40,Section actionGoTo:40,1.4.6 that adders can add positive and negative num- bers using two ’s complement number representation. Subtraction is almost as easy: flip the sign of the second number, then add. Flipping the sign of a two ’s complement number is done by inverting the bits and adding 1. To compute Y = A − B, first create the read more..

  • Page - 272

    An equality comparator produces a single output indicating whether A is equal to B (A == B). A magnitude comparator produces one or more out- puts indicating the relative values of A and B. The equality comparator is the simpler piece of hardware. actionGoTo:272,Figure actionGoTo:272,5.11 shows the symbol and implementation of a 4-bit equality comparator. It first checks to determine read more..

  • Page - 273

    5.2.4 ALU An Arithmetic/Logical Unit (ALU) combines a variety of mathematical and logical operations into a single unit. For example, a typical ALU might perform addition, subtraction, magnitude comparison, AND, and OR operations. The ALU forms the heart of most computer systems. actionGoTo:273,Figure actionGoTo:273,5.14 shows the symbol for an N-bit ALU with N-bit inputs and outputs. The read more..

  • Page - 274

    function to perform. Control signals will generally be shown in blue to distinguish them from the data. actionGoTo:274,Table actionGoTo:274,5.1 lists typical functions that the ALU can perform. The SLT function is used for magnitude comparison and will be discussed later in this section. actionGoTo:274,Figure actionGoTo:274,5.15 shows an implementation of the ALU. The ALU contains an N-bit read more..

  • Page - 275

    More specifically, the arithmetic and logical blocks in the ALU oper- ate on A and BB. BB is either B or B, depending on F2.If F1:0 = 00, the output multiplexer chooses A AND BB.If F1:0 = 01, the ALU computes A OR BB.If F1:0 = 10, the ALU performs addition or subtraction. Note that F2 is also the carry in to the adder. Also remember that B + 1 = −B in two read more..

  • Page - 276

    (see Sections 5.2.6 and 5.2.7). Arithmetic shift left (ASL) is the same as logical shift left (LSL). Ex: 11001 ASR 2 = 11110; 11001 ASL 2 = 00100 ▶ Rotator —rotates number in circle such that empty spots are filled with bits shifted off the other end. Ex: 11001 ROR 2 = 01110; 11001 ROL 2 = 00111 An N-bit shifter can be built from NN:1 multiplexers. The input is read more..

  • Page - 277

    An arithmetic right shift is a special case of division. An arithmetic right shift by N bits divides the number by 2 N. For example, 111002 >>> 2 = 111112 is equivalent to −410/22 = −110. 5.2.6 Multiplication* Multiplication of unsigned binary numbers is similar to decimal multipli- cation but involves only 1 ’s and 0 ’s. actionGoTo:277,Figure actionGoTo:277,5.17 compares read more..

  • Page - 278

    are N partial products and N − 1 stages of 1-bit adders. For example, for a 4 × 4 multiplier, the partial product of the first row is B0 AND (A3, A2, A1, A0). This partial product is added to the shifted second partial product, B1 AND (A3, A2, A1, A0). Subsequent rows of AND gates and adders form and add the remaining partial products. The HDL for a multiplier read more..

  • Page - 279

    the quotient bit Qi is 0 and the difference is discarded. Otherwise, Qi is 1, and the partial remainder is updated to be the difference. In any event, the partial remainder is then doubled (left-shifted by one column), the next most significant bit of A becomes the least significant bit of R, and the process repeats. The result satisfies A B =Q + R B. actionGoTo:279,Figure read more..

  • Page - 280

    (a) 01101100 (b) 0110.1100 (c) 22 + 21 + 2–1 + 2–2 = 6.75 Figure 5.21 Fixed-point notation of 6.75 with four integer bits and four fraction bits (a) 0010.0110 (b) 1010.0110 (c) 1101.1010 Figure 5.22 Fixed-point representation of −2.375: (a) absolute value, (b) sign and magnitude, (c) two ’s complement Fixed-point number systems are commonly used for banking read more..

  • Page - 281

    5.3.2 Floating-Point Number Systems* Floating-point numbers are analogous to scientific notation. They circum- vent the limitation of having a constant number of integer and fractional bits, allowing the representation of very large and very small numbers. Like scientific notation, floating-point numbers have a sign, mantissa (M), base (B), and exponent (E), as shown in actionGoTo:281,Figure read more..

  • Page - 282

    We make one final modification to the exponent field. The exponent needs to represent both positive and negative exponents. To do so, float- ing-point uses a biased exponent, which is the original exponent plus a constant bias. 32-bit floating-point uses a bias of 127. For example, for the exponent 7, the biased exponent is 7 + 127 = 134 = 100001102. For the exponent −4, read more..

  • Page - 283

    defines 64-bit double-precision numbers (also called doubles) that provide greater precision and greater range. actionGoTo:283,Table actionGoTo:283,5.3 shows the number of bits used for the fields in each format. Excluding the special cases mentioned earlier, normal single-precision numbers span a range of ±1.175494 × 10− 38 to ±3.402824 × 10 38. They have a precision of about seven read more..

  • Page - 284

    actionGoTo:284,Figure actionGoTo:284,5.29 shows the floating-point addition of 7.875 (1.11111 × 2 2) and 0.1875 (1.1 × 2− 3). The result is 8.0625 (1.0000001 × 2 3). After the fraction and exponent bits are extracted and the implicit leading 1 is prepended in steps 1 and 2, the exponents are compared by subtracting the smaller exponent from the larger exponent. The result is read more..

  • Page - 285

    5.4 SEQUENTIAL BUILDING BLOCKS This section examines sequential building blocks, including counters and shift registers. 5.4.1 Counters An N-bit binary counter, shown in actionGoTo:285,Figure actionGoTo:285,5.30, is a sequential arithmetic circuit with clock and reset inputs and an N-bit output Q. Reset initializes the output to 0. The counter then advances through all 2 N possible out- puts read more..

  • Page - 286

    5.4.2 Shift Registers A shift register has a clock, a serial input Sin, a serial output Sout,and N parallel outputs QN−1:0, as shown in actionGoTo:286,Figure actionGoTo:286,5.33. On each rising edge of the clock, a new bit is shifted in from Sin and all the subsequent contents are shifted forward. The last bit in the shift register is available at Sout. Shift registers can read more..

  • Page - 287

    To solve this problem, designers like to be able to directly observe and control all the state of the machine. This is done by adding a test mode in which the contents of all flip-flops can be read out or loaded with desired values. Most systems have too many flip-flops to dedicate individual pins to read and write each flip-flop. Instead, all the flip-flops in the read more..

  • Page - 288

    contents out and shift in new contents using Sin and Sout. The load multi- plexer is usually integrated into the flip-flop to produce a scannable flip-flop. actionGoTo:288,Figure actionGoTo:288,5.37 shows the schematic and symbol for a scannable flip-flop and illustrates how the flops are cascaded to build an N-bit scan- nable register. For example, the 32-bit counter could be tested read more..

  • Page - 289

    specified by an Address. The value read or written is called Data.An array with N-bit addresses and M-bit data has 2 N rows and M columns. Each row of data is called a word. Thus, the array contains 2 N M-bit words. actionGoTo:289,Figure actionGoTo:289,5.39 shows a memory array with two address bits and three data bits. The two address bits specify one of the four rows read more..

  • Page - 290

    Memory Ports All memories have one or more ports. Each port gives read and/or write access to one memory address. The previous examples were all single- ported memories. Multiported memories can access several addresses simultaneously. actionGoTo:290,Figure actionGoTo:290,5.43 shows a three-ported memory with two read ports and one write port. Port 1 reads the data from address A1 onto read more..

  • Page - 291

    the tape). ROM is called read only memory because, historically, it could only be read but not written. These names are confusing, because ROMs are randomly accessed too. Worse yet, most modern ROMs can be writ- ten as well as read! The important distinction to remember is that RAMs are volatile and ROMs are nonvolatile. The two major types of RAMs are dynamic RAM (DRAM) read more..

  • Page - 292

    immediately at its output. But flip-flops take at least 20 transistors to build. Generally, the more transistors a device has, the more area, power, and cost it requires. DRAM latency is longer than that of SRAM because its bitline is not actively driven by a transistor. DRAM must wait for charge to move (relatively) slowly from the capacitor to the bitline. DRAM also read more..

  • Page - 293

    file has two read ports (A1/RD1 and A2/RD2) and one write port (A3/ WD3). The 5-bit addresses, A1, A2,and A3, can each access all 2 5 = 32 registers. So, two registers can be read and one register written simultaneously. 5.5.6 Read Only Memory Read only memory (ROM) stores a bit as the presence or absence of a transistor. actionGoTo:293,Figure actionGoTo:293,5.48 shows a simple read more..

  • Page - 294

    transistor in every bit cell but provides a way to connect or disconnect the transistor to ground. actionGoTo:294,Figure actionGoTo:294,5.51 shows the bit cell for a fuse-programmable ROM.The user programs the ROM by applying a high voltage to selectively blow fuses. If the fuse is present, the transistor is connected to GND and the cell holds a 0. If the fuse is destroyed, the read more..

  • Page - 295

    amounts of data in portable battery-powered systems such as cameras and music players. In summary, modern ROMs are not really read only; they can be programmed (written) as well. The difference between RAM and ROM is that ROMs take a longer time to write but are nonvolatile. 5.5.7 Logic Using Memory Arrays Although they are used primarily for data storage, memory arrays can read more..

  • Page - 296

    edge of the clock if the write enable we is asserted. Reads occur imme- diately. When power is first applied, the contents of the RAM are unpredictable. actionGoTo:297,HDL actionGoTo:297,Example actionGoTo:297,5.8 describes a 4-word × 3-bit ROM. The contents of the ROM are specified in the HDL case statement. A ROM as small as this one may be synthesized into logic gates rather read more..

  • Page - 297

    5.6 LOGIC ARRAYS Like memory, gates can be organized into regular arrays. If the connec- tions are made programmable, these logic arrays can be configured to perform any function without the user having to connect wires in specific ways. The regular structure simplifies design. Logic arrays are mass pro- duced in large quantities, so they are inexpensive. Software tools allow users read more..

  • Page - 298

    and complementary form) drive an AND array, which produces implicants, which in turn are ORed together to form the outputs. An M × N × P-bit PLA has M inputs, N implicants, and P outputs. actionGoTo:298,Figure actionGoTo:298,5.55 shows the dot notation for a 3 × 3 × 2-bit PLA perform- ing the functions X = A BC + ABC and Y = AB. Each row in the AND array forms an read more..

  • Page - 299

    depend on all 2 M minterms, a PLA is likely to be smaller than a ROM. For example, an 8-word × 2-bit ROM is required to perform the same functions performed by the 3 × 3 × 2-bit PLA shown in actionGoTo:298,Figures actionGoTo:298,5.55 actionGoTo:298,and actionGoTo:298,5.56. Simple programmable logic devices (SPLDs) are souped-up PLAs that add registers and various other features to read more..

  • Page - 300

    Two of the leading FPGA manufacturers are Altera Corp. and Xilinx, Inc. actionGoTo:301,Figure actionGoTo:301,5.58 shows a single LE from Altera ’s Cyclone IV FPGA introduced in 2009. The key elements of the LE are a 4-input lookup table (LUT) and a 1-bit register. The LE also contains con- figurable multiplexers to route signals through the LE. The FPGA is con- figured by read more..

  • Page - 301

    the carry chain hardware, other multiplexers for routing, and flip-flop enable and reset. Altera groups 16 LEs together to create a logic array block (LAB) and provides local connections between LEs within the LAB. In summary, the Cyclone IV LE can perform one combinational and/ or registered function which can involve up to four variables. Other brands of FPGAs are organized read more..

  • Page - 302

    Example 5.6 FUNCTIONS BUILT USING LEs Explain how to configure one or more Cyclone IV LEs to perform the following functions: (a) X = A BC + ABC and Y = AB (b) Y = JKLMPQR; (c) a divide-by-3 counter with binary state encoding (see actionGoTo:156,Figure actionGoTo:156,3.29(a)). You may show interconnec- tion between LEs as needed. Solution: (a) Configure two LEs. One LUT computes X read more..

  • Page - 303

    using the multiplexer on data 3 and routing channels between LEs, as indicated by the dashed blue lines. In general, another LE might be necessary to compute the output Y. However, in this case Y = S0,so Y can come from LE 1. Hence, the entire FSM fits in two LEs. In general, an FSM requires at least one LE for each bit of state, and it may require more LEs read more..

  • Page - 304

    Example 5.7 LE DELAY Alyssa P. Hacker is building a finite state machine that must run at 200 MHz. She uses a Cyclone IV GX FPGA with the following specifications: tLE = 381 ps per LE, tsetup = 76 ps, and tpcq = 199 ps for all flip-flops. The wiring delay between LEs is 246 ps. Assume the hold time for the flip-flops is 0. What is the maximum number of LEs her read more..

  • Page - 305

    output HIGH for each wordline without a pull-down transistor. For exam- ple, when AB = 11, the 11 wordline is HIGH and transistors on X and Z turn on and pull those outputs LOW. The Y output has no transistor con- necting to the 11 wordline, so Y is pulled HIGH by the weak pull-up. PLAs can also be built using pseudo-nMOS circuits, as shown in actionGoTo:305,Figure read more..

  • Page - 306

    ripples through each of the full adders. Faster CPAs can be constructed using lookahead or prefix techniques. A subtractor negates the second input and adds it to the first. A mag- nitude comparator subtracts one number from another and determines the relative value based on the sign of the result. A multiplier forms par- tial products using AND gates, then sums these bits read more..

  • Page - 307

    Exercises Exercise 5.1 What is the delay for the following types of 64-bit adders? Assume that each two-input gate delay is 150 ps and that a full adder delay is 450 ps. (a) a ripple-carry adder (b) a carry-lookahead adder with 4-bit blocks (c) a prefix adder Exercise 5.2 Design two adders: a 64-bit ripple-carry adder and a 64-bit carry- lookahead adder with 4-bit blocks. Use read more..

  • Page - 308

    Exercise 5.8 Design the following comparators for 32-bit numbers. Sketch the schematics. (a) not equal (b) greater than (c) less than or equal to Exercise 5.9 Design the 32-bit ALU shown in actionGoTo:274,Figure actionGoTo:274,5.15 using your favorite HDL. You can make the top-level module either behavioral or structural. Exercise 5.10 Add an Overflow output to the 32-bit ALU from read more..

  • Page - 309

    values of B, C, and k, the funnel shifter can perform any type of shift or rotate. Explain what these values should be in terms of A, shamt, and N for (a) logical right shift of A by shamt (b) arithmetic right shift of A by shamt (c) left shift of A by shamt (d) right rotate of A by shamt (e) left rotate of A by shamt Exercise 5.18 Find the critical path read more..

  • Page - 310

    Exercise 5.23 Compute 111001.0002/001100.0002 in binary using the standard division algorithm from elementary school. Show your work. Exercise 5.24 What is the range of numbers that can be represented by the following number systems? (a) 24-bit unsigned fixed-point numbers with 12 integer bits and 12 fraction bits (b) 24-bit sign and magnitude fixed-point numbers with 12 integer bits and read more..

  • Page - 311

    Exercise 5.31 Convert the following two ’s complement binary fixed-point numbers to base 10. The implied binary point is explicitly shown to aid in your interpretation. (a) 0101.1000 (b) 1111.1111 (c) 1000.0000 Exercise 5.32 Repeat Exercise 5.31 for the following two ’s complement binary fixed-point numbers. (a) 011101.10101 (b) 100110.11010 (c) 101000.00100 Exercise 5.33 When adding two read more..

  • Page - 312

    (b) How many additional numbers could be represented if ±∞ and NaN were not represented? (c) Explain why ±∞ and NaN are given special representations. Exercise 5.38 Consider the following decimal numbers: 245 and 0.0625. (a) Write the two numbers using single-precision floating-point notation. Give your answers in hexadecimal. (b) Perform a magnitude comparison of the two 32-bit read more..

  • Page - 313

    (c) What is the delay of your 32-bit prefix adder from part (a)? Assume that each two-input gate delay is 100 ps. (d) Design a pipelined version of the 32-bit prefix adder. Sketch the schematic of your design. How fast can your pipelined prefix adder run? You may assume a sequencing overhead (tpcq + tsetup) of 80 ps. Make the design run as fast as possible. (e) Design read more..

  • Page - 314

    Exercise 5.48 The English language has a good deal of redundancy that allows us to reconstruct garbled transmissions. Binary data can also be transmitted in redundant form to allow error correction. For example, the number 0 could be coded as 00000 and the number 1 could be coded as 11111. The value could then be sent over a noisy channel that might flip up to two of read more..

  • Page - 315

    Exercise 5.51 Implement the following functions using a single 16 × 3 ROM. Use dot notation to indicate the ROM contents. (a) X = AB + BCD + A B (b) Y = AB + BD (c) Z = A + B + C + D Exercise 5.52 Implement the functions from Exercise 5.51 using a 4 × 8 × 3 PLA. You may use dot notation. Exercise 5.53 Specify the size of a ROM that you could use to program read more..

  • Page - 316

    Exercise 5.55 How many Cyclone IV FPGA LEs are required to perform each of the following functions? Show how to configure one or more LEs to perform the function. You should be able to do this by inspection, without performing logic synthesis. (a) the combinational function from Exercise 2.13(c) (b) the combinational function from Exercise 2.17(c) (c) the two-output function from read more..

  • Page - 317

    Exercise 5.58 Repeat Exercise 5.57 for the FSM of actionGoTo:160,Figure actionGoTo:160,3.31(b). Exercise 5.59 You would like to use an FPGA to implement an M &M sorter with a color sensor and motors to put red candy in one jar and green candy in another. The design is to be implemented as an FSM using a Cyclone IV FPGA. According to the data sheet, the FPGA has timing read more..

  • Page - 318

    Interview Questions The following exercises present questions that have been asked at interviews for digital design jobs. Question 5.1 What is the largest possible result of multiplying two unsigned N-bit numbers? Question 5.2 Binary coded decimal (BCD) representation uses four bits to encode each decimal digit. For example, 4210 is represented as 01000010BCD · Explain in words why read more..

  • Page - 319

    read more..

  • Page - 320

    6 Architecture 6.1 INTRODUCTION The previous chapters introduced digital design principles and building blocks. In this chapter, we jump up a few levels of abstraction to define the architecture of a computer. The architecture is the programmer ’s view of a computer. It is defined by the instruction set (language) and operand locations (registers and memory). Many different read more..

  • Page - 321

    A computer architecture does not define the underlying hardware implementation. Often, many different hardware implementations of a single architecture exist. For example, Intel and Advanced Micro Devices (AMD) both sell various microprocessors belonging to the same x86 architecture. They all can run the same programs, but they use different underlying hardware and therefore offer trade-offs read more..

  • Page - 322

    assembly language. Note that statements in a C program end with a semicolon. The first part of the assembly instruction, add , is called the mnemonic and indicates what operation to perform. The operation is performed on b and c , the source operands, and the result is written to a , the destination operand. actionGoTo:322,Code actionGoTo:322,Example actionGoTo:322,6.2 shows that read more..

  • Page - 323

    to perform more complex operations is an example of the second design principle of computer architecture: Design Principle 2: Make the common case fast. The MIPS instruction set makes the common case fast by including only simple, commonly used instructions. The number of instructions is kept small so that the hardware required to decode the instruction and its operands can be read more..

  • Page - 324

    architectures specify a small number of registers that hold commonly used operands. The MIPS architecture uses 32 registers, called the register set or register file. The fewer the registers, the faster they can be accessed. This leads to the third design principle: Design Principle 3: Smaller is faster. Looking up information from a small number of relevant books on your desk is read more..

  • Page - 325

    Example 6.1 TRANSLATING HIGH-LEVEL CODE TO ASSEMBLY LANGUAGE Translate the following high-level code into assembly language. Assume variables a –c are held in registers $s0 –$s2 and f –j are in $s3 –$s7. a = b – c; f = (g + h) − (i + j); Solution: The program uses four assembly language instructions. # MIPS assembly code # $s0 = a, $s1 = b, $s2 = c, $s3 = f, $s4 = read more..

  • Page - 326

    Memory If registers were the only storage space for operands, we would be confined to simple programs with no more than 32 variables. However, data can also be stored in memory. When compared to the register file, memory has many data locations, but accessing it takes a longer amount of time. Whereas the register file is small and fast, memory is large and slow. For this read more..

  • Page - 327

    address ($0 + 1 ) = 1 . After the load word instruction (lw) is executed, $s3 holds the value 0xF2F1AC07, which is the data value stored at memory address 1 in actionGoTo:326,Figure actionGoTo:326,6.1. Similarly, MIPS uses the store word instruction, sw , to write a data word from a register into memory. actionGoTo:327,Code actionGoTo:327,Example actionGoTo:327,6.7 writes the contents of read more..

  • Page - 328

    significant) end. In little-endian machines, bytes are numbered starting with 0 at the little (least significant) end. Word addresses are the same in both for- mats and refer to the same four bytes. Only the addresses of bytes within a word differ. Example 6.2 BIG- AND LITTLE-ENDIAN MEMORY Suppose that $s0 initially contains 0x23456789. After the following program is run on a read more..

  • Page - 329

    sharing data between big-endian and little-endian computers. In examples in this text, we will use little-endian format whenever byte ordering matters. In the MIPS architecture, word addresses for lw and sw must be word aligned. That is, the address must be divisible by 4. Thus, the instruction lw $s0, 7($0) is an illegal instruction. Some architectures, such as x86, allow read more..

  • Page - 330

    6.3 MACHINE LANGUAGE Assembly language is convenient for humans to read. However, digital circuits understand only 1 ’s and 0 ’s. Therefore, a program written in assembly language is translated from mnemonics to a representation using only 1 ’s and 0 ’s called machine language. MIPS uses 32-bit instructions. Again, simplicity favors regularity, and the most regular choice is to read more..

  • Page - 331

    register. The fields contain the register numbers that were given in actionGoTo:325,Table actionGoTo:325,6.1. For example, $s0 is register 16. The fifth field, shamt , is used only in shift operations. In those instructions, the binary value stored in the 5-bit shamt field indicates the amount to shift. For all other R-type instructions, shamt is 0. actionGoTo:331,Figure actionGoTo:331,6.6 read more..

  • Page - 332

    6.3.2 l-Type Instructions The name I-type is short for immediate-type. I-type instructions use two register operands and one immediate operand. actionGoTo:332,Figure actionGoTo:332,6.8 shows the I-type machine instruction format. The 32-bit instruction has four fields: op , rs , rt , and imm . The first three fields, op , rs , and rt , are like those of R-type instructions. The imm read more..

  • Page - 333

    I-type instructions have a 16-bit immediate field, but the immediates are used in 32-bit operations. For example, lw adds a 16-bit offset to a 32-bit base register. What should go in the upper half of the 32 bits? For positive immediates, the upper half should be all 0 ’s, but for negative immediates, the upper half should be all 1 ’s. Recall from actionGoTo:40,Section read more..

  • Page - 334

    Example 6.5 TRANSLATING MACHINE LANGUAGE TO ASSEMBLY LANGUAGE Translate the following machine language code into assembly language. 0x2237FFF1 0x02F34022 Solution: First, we represent each instruction in binary and look at the six most sig- nificant bits to find the opcode for each instruction, as shown in actionGoTo:334,Figure actionGoTo:334,6.12. The opcode determines how to interpret the rest read more..

  • Page - 335

    To run or execute the stored program, the processor fetches the instructions from memory sequentially. The fetched instructions are then decoded and executed by the digital hardware. The address of the current instruction is kept in a 32-bit register called the program counter (PC). The PC is separate from the 32 registers shown previously in actionGoTo:325,Table actionGoTo:325,6.1. To read more..

  • Page - 336

    Logical Instructions MIPS logical operations include and , or , xor , and nor . These R-type instructions operate bit-by-bit on two source registers and write the result to the destination register. actionGoTo:336,Figure actionGoTo:336,6.14 shows examples of these opera- tions on the two source values 0xFFFF0000 and 0x46A1F0B7. The figure shows the values stored in the destination register, read more..

  • Page - 337

    register and immediate and the value of the destination register rt after the instruction executes. Because these instructions operate on a 32-bit value from a register and a 16-bit immediate, they first zero-extend the immediate to 32 bits. Shift Instructions Shift instructions shift the value in a register left or right by up to 31 bits. Shift operations multiply or divide by read more..

  • Page - 338

    field is ignored and should be all 0 ’s. actionGoTo:338,Figure actionGoTo:338,6.19 shows register values for each type of variable-shift instruction. Generating Constants The addi instruction is helpful for assigning 16-bit constants, as shown in actionGoTo:338,Code actionGoTo:338,Example actionGoTo:338,6.10. To assign 32-bit constants, use a load upper immediate instruction (lui) followed by an or read more..

  • Page - 339

    Multiplication and Division Instructions* Multiplication and division are somewhat different from other arithmetic operations. Multiplying two 32-bit numbers produces a 64-bit product. Dividing two 32-bit numbers produces a 32-bit quotient and a 32-bit remainder. The MIPS architecture has two special-purpose registers, hi and lo , which are used to hold the results of multiplication and read more..

  • Page - 340

    labels are translated into instruction addresses (see actionGoTo:358,Section actionGoTo:358,6.5). MIPS assembly labels are followed by a colon (:) and cannot use reserved words, such as instruction mnemonics. Most programmers indent their instructions but not the labels, to help make labels stand out. actionGoTo:340,Code actionGoTo:340,Example actionGoTo:340,6.13 shows an example using the branch read more..

  • Page - 341

    actionGoTo:341,Code actionGoTo:341,Example actionGoTo:341,6.15 shows the use of the jump register instruction (jr). Instruction addresses are given to the left of each instruction. jr $s0 jumps to the address held in $s0 , 0x00002010. 6.4.3 Conditional Statements if , if/else ,and switch/case statements are conditional statements com- monly used in high-level languages. They each conditionally read more..

  • Page - 342

    The assembly code for the if statement tests the opposite condition of the one in the high-level code. In actionGoTo:341,Code actionGoTo:341,Example actionGoTo:341,6.16, the high-level code tests for i == j , and the assembly code tests for i! = j . The bne instruction branches (skips the if block) when i! = j . Otherwise, i == j , the branch is not taken, and the if block read more..

  • Page - 343

    While Loops while loops repeatedly execute a block of code until a condition is not met. The while loop in actionGoTo:343,Code actionGoTo:343,Example actionGoTo:343,6.19 determines the value of x such that 2 x = 128. It executes seven times, until pow = 128. Like if/else statements, the assembly code for while loops tests the opposite condition of the one given in the high-level code. read more..

  • Page - 344

    In actionGoTo:343,Code actionGoTo:343,Example actionGoTo:343,6.19, the while loop compares pow to 128 and exits the loop if it is equal. Otherwise it doubles pow (using a left shift), increments x , and jumps back to the start of the while loop. For Loops for loops, like while loops, repeatedly execute a block of code until a condition is not met. However, for loops add support for read more..

  • Page - 345

    Example 6.6 LOOPS USING slt The following high-level code adds the powers of 2 from 1 to 100. Translate it into assembly language. // high-level code int sum = 0; for (i = 1; i < 101; i = i* 2) sum = sum + i; Solution: The assembly language code uses the set less than (slt) instruction to perform the less than comparison in the for loop. # MIPS assembly code # $s0 read more..

  • Page - 346

    into $s0 . Recall that the load upper immediate (lui) and or immediate (ori) instructions can be used to load a 32-bit constant into a register. actionGoTo:346,Code actionGoTo:346,Example actionGoTo:346,6.21 also illustrates why lw takes a base address and an offset. The base address points to the start of the array. The offset can be used to access subsequent elements of the read more..

  • Page - 347

    Shifting left by 2 is a convenient way to multiply by 4 in MIPS assembly language. This example readily extends to an array of any size. Bytes and Characters Numbers in the range [ –128, 127] can be stored in a single byte rather than an entire word. Because there are much fewer than 256 characters on an English language keyboard, English characters are often represented read more..

  • Page - 348

    MIPS provides load byte and store byte instructions to manipulate bytes or characters of data: load byte unsigned (lbu), load byte (lb), and store byte (sb). All three are illustrated in actionGoTo:348,Figure actionGoTo:348,6.22. Table 6.2 ASCII encodings # Char # Char # Char # Char # Char # Char 20 space 30 0 40 @ 50 P 60 ` 70 p 21 ! 31 1 41 A 51 Q 61 a 71 q 22 " 32 2 42 B 52 R read more..

  • Page - 349

    Word Address 1522FFF4 1522FFF0 Data 48 65 6C 6C 6F 21 00 Little-Endian Memory Byte 3 Byte 0 Figure 6.23 The string “Hello!” stored in memory Load byte unsigned (lbu) zero-extends the byte, and load byte (lb)sign- extends the byte to fill the entire 32-bit register. Store byte (sb)stores the least significant byte of the 32-bit register into the specified byte address in memory. read more..

  • Page - 350

    6.4.6 Function Calls High-level languages often use functions (also called procedures) to reuse frequently accessed code and to make a program more modular and readable. Functions have inputs, called arguments, and an output, called the return value. Functions should calculate the return value and cause no other unintended side effects. When one function calls another, the calling read more..

  • Page - 351

    Jump and link (jal) and jump register (jr $ra ) are the two essential instructions needed for a function call. jal performs two operations: it stores the address of the next instruction (the instruction after jal )in the return address register ($ra), and it jumps to the target instruction. In actionGoTo:350,Code actionGoTo:350,Example actionGoTo:350,6.23, the main function calls the read more..

  • Page - 352

    The Stack The stack is memory that is used to save local variables within a function. The stack expands (uses more memory) as the processor needs more scratch space and contracts (uses less memory) when the processor no longer needs the variables stored there. Before explaining how functions use the stack to store temporary variables, we explain how the stack works. The stack read more..

  • Page - 353

    actionGoTo:353,Code actionGoTo:353,Example actionGoTo:353,6.25 showsanimprovedversion of diffofsums that saves and restores $t0 , $t1 ,and $s0 . The new lines are indicated in blue. actionGoTo:353,Figure actionGoTo:353,6.25 shows the stack before, during, and after a call to the diffofsums function from actionGoTo:353,Code actionGoTo:353,Example actionGoTo:353,6.25. diffofsums makes room for three words on read more..

  • Page - 354

    Preserved Registers actionGoTo:353,Code actionGoTo:353,Example actionGoTo:353,6.25 assumes that temporary registers $t0 and $t1 must be saved and restored. If the calling function does not use those registers, the effort to save and restore them is wasted. To avoid this waste, MIPS divides registers into preserved and nonpreserved categories. The preserved regis- ters include $s0 –$s7 (hence read more..

  • Page - 355

    Hence, they must be saved by the caller if the caller depends on any of its own arguments after a called function returns. $v0 –$v1 certainly should not be preserved, because the callee returns its result in these registers. The stack above the stack pointer is automatically preserved as long as the callee does not write to memory addresses above $sp . In this way, it read more..

  • Page - 356

    instruction (mul $v0, $a0, $v0 ) multiplies $a0 and $v0 and places the result in $v0 . actionGoTo:356,Figure actionGoTo:356,6.26 shows the stack when executing factorial(3) .Weassume that $sp initially points to 0xFC, as shown in actionGoTo:356,Figure actionGoTo:356,6.26(a). The function creates a two-word stack frame to hold $a0 and $ra . On the first invocation, factorial saves $a0 (holding n read more..

  • Page - 357

    factorial returns the value 1 in $v0 and deallocates the stack frame before returning to the second invocation. The second invocation restores n to 2, restores $ra to 0xBC (it happened to already have this value), deallocates the stack frame, and returns $v0 = 2 × 1 = 2 to the first invocation. The first invocation restores n to 3, restores $ra to the return address of the read more..

  • Page - 358

    arguments, it finds them in the caller ’s stack frame. Accessing additional input arguments is the one exception in which a function can access stack data not in its own stack frame. 6.5 ADDRESSING MODES MIPS uses five addressing modes: register-only, immediate, base, PC-relative, and pseudo-direct. The first three modes (register-only, immediate, and base addressing) define modes of read more..

  • Page - 359

    Example 6.8 CALCULATING THE IMMEDIATE FIELD FOR PC-RELATIVE ADDRESSING Calculate the immediate field and show the machine code for the branch not equal (bne) instruction in the following program. # MIPS assembly code 0x40 loop: add $t1, $a0, $s0 0x44 lb $t1, 0($t1) 0x48 add $t2, $a1, $s0 0x4C sb $t1, 0($t2) 0x50 addi $s0, $s0, 1 0x54 bne $t1, $0, loop 0x58 lw $s0, 0($sp) Solution: read more..

  • Page - 360

    32-bit jump target address (JTA) to indicate the instruction address to execute next. Unfortunately, the J-type instruction encoding does not have enough bits to specify a full 32-bit JTA. Six bits of the instruction are used for the opcode , so only 26 bits are left to encode the JTA. Fortunately, the two least significant bits, JTA1:0, should always be 0, because instructions read more..

  • Page - 361

    6.6 LIGHTS, CAMERA, ACTION: COMPILING, ASSEMBLING, AND LOADING Up until now, we have shown how to translate short high-level code snip- pets into assembly and machine code. This section describes how to com- pile and assemble a complete high-level program and how to load the program into memory for execution. We begin by introducing the MIPS memory map, which defines where code, read more..

  • Page - 362

    are defined at start-up, before the program begins executing. These vari- ables are declared outside the main function in a C program and can be accessed by any function. The global data segment is large enough to store 64 KB of global variables. Global variables are accessed using the global pointer ($gp), which is initialized to 0x100080000. Unlike the stack pointer ($sp), read more..

  • Page - 363

    loads the program into memory and starts execution. The remainder of this section walks through these steps for a simple program. Step 1: Compilation A compiler translates high-level code into assembly language. actionGoTo:363,Code actionGoTo:363,Exam- actionGoTo:363,ple actionGoTo:363,6.30 shows a simple high-level program with three global variables and Assembly Code High-Level Code Compiler Object read more..

  • Page - 364

    two functions, along with the assembly code produced by a typical compi- ler. The .data and .text keywords are assembler directives that indicate where the text and data segments begin. Labels are used for global vari- ables f , g ,and y . Their storage location will be determined by the assem- bler; for now, they are left as symbols in the code. Step 2: Assembling The read more..

  • Page - 365

    Step 3: Linking Most large programs contain more than one file. If the programmer changes only one of the files, it would be wasteful to recompile and reas- semble the other files. In particular, programs often call functions in library files; these library files almost never change. If a file of high-level code is not changed, the associated object file need not be updated. read more..

  • Page - 366

    store instruction, sw $a0, 0x8000($gp) , stores the value 2 to the global variable f , which is located at memory address 0x10000000. Remember that the offset, 0x8000, is a 16-bit signed number that is sign-extended and added to the base address, $gp .So, $gp + 0x8000 = 0x10008000 + 0xFFFF8000 = 0x10000000, the memory address of variable f . Step 4: Loading The operating system read more..

  • Page - 367

    6.7 ODDS AND ENDS* This section covers a few optional topics that do not fit naturally elsewhere in the chapter. These topics include pseudoinstructions, excep- tions, signed and unsigned arithmetic instructions, and floating-point instructions. 6.7.1 Pseudoinstructions If an instruction is not available in the MIPS instruction set, it is probably because the same operation can be performed read more..

  • Page - 368

    how the assembler uses $at in converting a pseudoinstruction to real MIPS instructions. We leave it as actionGoTo:391,Exercises actionGoTo:391,6.38 actionGoTo:391,and actionGoTo:391,6.39 to implement other pseudo- instructions such as rotate left (rol) and rotate right (ror). 6.7.2 Exceptions An exception is like an unscheduled function call that jumps to a new address. Exceptions may be caused read more..

  • Page - 369

    exception takes place. The processor returns to the address in EPC after handling the exception. This is analogous to using $ra to store the old value of the PC during a jal instruction. The EPC and Cause registers are not part of the MIPS register file. The mfc0 (move from coprocessor 0) instruction copies these and other special-purpose registers into one of the general purpose read more..

  • Page - 370

    example, adding the following two huge positive numbers gives a negative result: 0x7FFFFFFF + 0x7FFFFFFF = 0xFFFFFFFE = −2. Similarly, add- ing two huge negative numbers gives a positive result, 0x80000001 + 0x80000001 = 0x00000002. This is called arithmetic overflow. The C language ignores arithmetic overflows, but other languages, such as Fortran, require that the program be notified. read more..

  • Page - 371

    6.7.4 Floating-Point Instructions The MIPS architecture defines an optional floating-point coprocessor, known as coprocessor 1. In early MIPS implementations, the floating- point coprocessor was a separate chip that users could purchase if they needed fast floating-point math. In most recent MIPS implementations, the floating-point coprocessor is built in alongside the main processor. MIPS read more..

  • Page - 372

    Floating-point branches have two parts. First, a compare instruction is used to set or clear the floating-point condition flag (fpcond). Then, a conditional branch checks the value of the flag. The compare instruc- tions include equality (c.seq.s/c.seq.d), less than (, and less than or equal to (c.le.s/c.le.d). The conditional branch instructions are bc1f and bc1t that branch read more..

  • Page - 373

    This section introduces the x86 architecture. The goal is not to make you into an x86 assembly language programmer, but rather to illustrate some of the similarities and differences between x86 and MIPS. We think it is interesting to see how x86 works. However, none of the material in this section is needed to understand the rest of the book. Major differ- ences between x86 read more..

  • Page - 374

    MIPS instructions generally specify three operands: two sources and one destination. x86 instructions specify only two operands. The first is a source. The second is both a source and the destination. Hence, x86 instructions always overwrite one of their sources with the result. actionGoTo:374,Table actionGoTo:374,6.10 lists the combinations of operand locations in x86. All combinations are read more..

  • Page - 375

    6.8.3 Status Flags x86, like many CISC architectures, uses status flags (also called condi- tion codes) to make decisions about branches and to keep track of car- ries and arithmetic overflow. x86 uses a 32-bit register, called EFLAGS , that stores the status flags. Some of the bits of the EFLAGS register are given in actionGoTo:375,Table actionGoTo:375,6.13. Other bits are used by read more..

  • Page - 376

    Table 6.14 Selected x86 instructions Instruction Meaning Function ADD/SUB add/subtract D= D+S /D =D − S ADDC add with carry D= D+S +CF INC/DEC increment/decrement D= D+1 /D =D − 1 CMP compare Set flags based on D − S NEG negate D= −D AND/OR/XOR logical AND/OR/XOR D= DopS NOT logical NOT D= D _ IMUL/MUL signed/unsigned multiply EDX:EAX = EAX × D IDIV/DIV signed/unsigned divide EDX:EAX/D EAX read more..

  • Page - 377

    zero flag (ZF)is1. JNZ jumps if the zero flag is 0. The jumps usually fol- low an instruction, such as the compare instruction (CMP), that sets the flags. actionGoTo:377,Table actionGoTo:377,6.15 lists some of the conditional jumps and how they depend on the flags set by a prior compare operation. 6.8.5 x86 Instruction Encoding The x86 instruction encodings are truly messy, a read more..

  • Page - 378

    instruction can be preceded by up to four optional byte-long prefixes that modify its behavior. The ModR/M byte uses the 2-bit Mod and 3-bit R/M field to specify the addressing mode for one of the operands. The operand can come from one of the eight registers, or from one of 24 memory addressing modes. Due to artifacts in the encodings, the ESP and EBP registers are not read more..

  • Page - 379

    backward compatibility with 8086 programs, defaulting the opcode to 16-bit operands. It is set to 1 for programs to default to 32-bit oper- ands. Moreover, the programmer can specify prefixes to change the form for a particular instruction. If the prefix 0x66 appears before the opcode , the alternative size operand is used (16 bits in 32-bit mode, or 32 bits in 16-bit mode). read more..

  • Page - 380

    have shorter programs, because a complex instruction is equivalent to a series of simple MIPS instructions and because the instructions are encoded to minimize memory use. However, the x86 architecture is a hodgepodge of features accumulated over the years, some of which are no longer useful but must be kept for compatibility with old pro- grams. It has too few registers, and read more..

  • Page - 381

    In the first part of this book, we learned about the circuit and logic levels of abstraction. In this chapter, we jumped up to the architecture level. In the next chapter, we study microarchitecture, the arrangement of digital building blocks that implement a processor architecture. Micro- architecture is the link between hardware and software engineering. And, we believe it is one read more..

  • Page - 382

    Exercises Exercise 6.1 Give three examples from the MIPS architecture of each of the architecture design principles: (1) simplicity favors regularity; (2) make the common case fast; (3) smaller is faster; and (4) good design demands good compromises. Explain how each of your examples exhibits the design principle. Exercise 6.2 The MIPS architecture has a register set that consists of read more..

  • Page - 383

    Exercise 6.8 Show how the strings in actionGoTo:382,Exercise actionGoTo:382,6.6 are stored in a byte-addressable memory on (a) a big-endian machine and (b) a little-endian machine starting at memory address 0x1000100C. Use a memory diagram similar to actionGoTo:328,Figure actionGoTo:328,6.4. Clearly indicate the memory address of each byte on each machine. Exercise 6.9 Repeat read more..

  • Page - 384

    Exercise 6.15 Repeat actionGoTo:383,Exercise actionGoTo:383,6.14 for the following machine code. $a0 and $a1 are the inputs. $a0 contains a 32-bit number and $a1 is the address of a 32-element array of characters (char). 0x00400000 0x2008001F 0x00400004 0x01044806 0x00400008 0x31290001 0x0040000C 0x0009482A 0x00400010 0xA0A90000 0x00400014 0x20A50001 0x00400018 0x2108FFFF 0x0040001C 0x0501FFF9 0x00400020 0x03E00008 read more..

  • Page - 385

    // C code void strcpy(char dst[], char src[]) { int i = 0; do { dst[i] = src[i]; } while (src[i++]); } (a) Implement the strcpy function in MIPS assembly code. Use $s0 for i . (b) Draw a picture of the stack before, during, and after the strcpy function call. Assume $sp = 0x7FFFFF00 just before strcpy is called. Exercise 6.20 Convert the high-level function from read more..

  • Page - 386

    (a) What is fib(n) for n = 0 and n = –1? (b) Write a function called fib in a high-level language that returns the Fibonacci number for any nonnegative value of n. Hint: You probably will want to use a loop. Clearly comment your code. (c) Convert the high-level function of part (b) into MIPS assembly code. Add comments after every line of code that explain clearly read more..

  • Page - 387

    Ben then translates the two functions into assembly language as follows. He also writes a function, test , that calls the function f(5, 3) . # MIPS assembly code # f: $a0 = a, $a1 = b, $s0 = j; f2: $a0 = x, $s0 = k 0x00400000 test: addi $a0, $0, 5 # $a0 = 5(a = 5) 0x00400004 addi $a1, $0, 3 # $a1 = 3(b = 3) 0x00400008 jal f # call f(5, 3) 0x0040000C loop: j read more..

  • Page - 388

    (b) Suppose Ben deletes the instructions at addresses 0x0040001C and 0x00400044 that save and restore $ra . Will the program (1) enter an infinite loop but not crash; (2) crash (cause the stack to grow beyond the dynamic data segment or the PC to jump to a location outside the program); (3) produce an incorrect value in $v0 when the program returns to loop (if so, what read more..

  • Page - 389

    Exercise 6.26 Consider the following MIPS assembly language snippet. The numbers to the left of each instruction indicate the instruction address. 0x00400028 add $a0, $a1, $0 0x0040002C jal f2 0x00400030 f1: jr $ra 0x00400034 f2: sw $s0, 0($s2) 0x00400038 bne $a0, $0, else 0x0040003C jf1 0x00400040 else: addi $a0, $a0, −1 0x00400044 jf2 (a) Translate the instruction sequence into machine code. read more..

  • Page - 390

    Exercise 6.28 Consider the following high-level function. // C code int f(int n, int k) { int b; b = k + 2; if (n == 0) b = 10; else b = b + (n * n) + f(n − 1, k + 1); return b * k; } (a) Translate the high-level function f into MIPS assembly language. Pay particular attention to properly saving and restoring registers across function calls and using the MIPS read more..

  • Page - 391

    Exercise 6.31 Explain why it is advantageous to have a large address field, addr , in the machine format for the jump instructions, j and jal . Exercise 6.32 Write assembly code that jumps to an instruction 64 Minstructions from the first instruction. Recall that 1 Minstruction = 2 20 instructions = 1,048,576 instructions. Assume that your code begins at address 0x00400000. Use a read more..

  • Page - 392

    lw $ra, 0($sp) addi $sp, $sp, 4 jr $ra diff: sub $v0, $a0, $a1 jr $ra (a) First show the instruction address next to each assembly instruction. (b) Draw the symbol table showing the labels and their addresses. (c) Convert all instructions into machine code. (d) How big (how many bytes) are the data and text segments? (e) Sketch a memory map showing where data and read more..

  • Page - 393

    Exercise 6.39 Repeat actionGoTo:391,Exercise actionGoTo:391,6.38 for the following pseudoinstructions. (a) beq $t1, imm31:0,L (b) ble $t3, $t5, L (c) bgt $t3, $t5, L (d) bge $t3, $t5, L 368 CHAPTER SIX Architecture read more..

  • Page - 394

    Interview Questions The following exercises present questions that have been asked at interviews for digital design jobs (but are usually open to any assembly language). Question 6.1 Write MIPS assembly code for swapping the contents of two registers, $t0 and $t1 . You may not use any other registers. Question 6.2 Suppose you are given an array of both positive and negative read more..

  • Page - 395

    read more..

  • Page - 396

    7 Microarchitecture With contributions from Matthew Watkins 7.1 INTRODUCTION In this chapter, you will learn how to piece together a MIPS microproces- sor. Indeed, you will puzzle out three different versions, each with differ- ent trade-offs between performance, cost, and complexity. To the uninitiated, building a microprocessor may seem like black magic. But it is actually relatively read more..

  • Page - 397

    of the program counter and the 32 registers. Any MIPS microarchitecture must contain all of this state. Based on the current architectural state, the processor executes a particular instruction with a particular set of data to produce a new architectural state. Some microarchitectures contain addi- tional nonarchitectural state to either simplify the logic or improve perfor- mance; we read more..

  • Page - 398

    control signals, such as the register file write enable. We will use this con- vention throughout the chapter to avoid cluttering diagrams with bus widths. Also, state elements usually have a reset input to put them into a known state at start-up. Again, to save clutter, this reset is not shown. The program counter is an ordinary 32-bit register. Its output, PC, points to read more..

  • Page - 399

    is built of clocked state elements and combinational logic, so it too is a synchronous sequential circuit. Indeed, the processor can be viewed as a giant finite state machine, or as a collection of simpler interacting state machines. 7.1.3 MIPS Microarchitectures In this chapter, we develop three microarchitectures for the MIPS proces- sor architecture: single-cycle, multicycle, and read more..

  • Page - 400

    of these additional transistors to deliver more performance. Precise cost calculations require detailed knowledge of the implementation technol- ogy, but in general, more gates and more memory mean more dollars. This section lays the foundation for analyzing performance. There are many ways to measure the performance of a computer system, and marketing departments are infamous for choosing read more..

  • Page - 401

    we have an ideal memory system that does not affect the CPI. In actionGoTo:500,Chap- actionGoTo:500,ter actionGoTo:500,8, we examine how the processor sometimes has to wait for the mem- ory, which increases the CPI. The number of seconds per cycle is the clock period, Tc. The clock period is determined by the critical path through the logic on the proces- sor. Different read more..

  • Page - 402

    The processor ’s actions depend on the specific instruction that was fetched. First we will work out the datapath connections for the lw instruction. Then we will consider how to generalize the datapath to han- dle the other instructions. For a lw instruction, the next step is to read the source register con- taining the base address. This register is specified in the rs read more..

  • Page - 403

    The processor must add the base address to the offset to find the address to read from memory. actionGoTo:403,Figure actionGoTo:403,7.5 introduces an ALU to perform this addition. The ALU receives two operands, SrcA and SrcB. SrcA comes from the register file, and SrcB comes from the sign-extended immediate. The ALU can perform many operations, as was described in read more..

  • Page - 404

    The destination register for the lw instruction is specified in the rt field, Instr20:16, which is connected to the port 3 address input, A3, of the reg- ister file. The ReadData bus is connected to the port 3 write data input, WD3, of the register file. A control signal called RegWrite is connected to the port 3 write enable input, WE3, and is asserted during a lw read more..

  • Page - 405

    be written to the register file. Note that data is still read from the address given to the data memory, but that this ReadData is ignored because RegWrite = 0. Next, consider extending the datapath to handle the R-type instruc- tions add , sub , and , or , and slt . All of these instructions read two regis- ters from the register file, perform some ALU operation on read more..

  • Page - 406

    instructions to choose Result from the ALUResult; it is 1 for lw to choose ReadData. We don ’t care about the value of MemtoReg for sw , because sw does not write to the register file. Similarly, in actionGoTo:405,Figure actionGoTo:405,7.8, the register to write was specified by the rt field of the instruction, Instr20:16. However, for R-type instructions, the register is specified read more..

  • Page - 407

    and the Zero flag is asserted. Hence, Branch is 1 for beq and 0 for other instructions. For beq , ALUControl = 110, so the ALU performs a sub- traction. ALUSrc = 0 to choose SrcB from the register file. RegWrite and MemWrite are 0, because a branch does not write to the register file or memory. We don ’t care about the values of RegDst and MemtoReg, because the read more..

  • Page - 408

    actionGoTo:409,Table actionGoTo:409,7.2 is a truth table for the ALU decoder. Recall that the mean- ings of the three ALUControl signals were given in actionGoTo:274,Table actionGoTo:274,5.1. Because ALUOp is never 11, the truth table can use don ’t care ’s X1 and 1X instead of 01 and 10 to simplify the logic. When ALUOp is 00 or 01, the ALU should add or subtract, read more..

  • Page - 409

    ALU decoder output. Recall that, for instructions that do not write to the register file (e.g., sw and beq ), the RegDst and MemtoReg control signals are don ’t cares (X); the address and data to the register write port do not matter because RegWrite is not asserted. The logic for the decoder can be designed using your favorite techniques for combina- tional logic design. read more..

  • Page - 410

    (not SignImm), so ALUSrc must be 0. or is an R-type instruction, so ALUOp is 10, indicating that ALUControl should be determined from the funct field to be 001. Result is taken from the ALU, so MemtoReg is 0. The result is written to the register file, so RegWrite is 1. The instruction does not write memory, so MemWrite = 0. The selection of the destination register is read more..

  • Page - 411

    supporting some instructions simply requires enhancing the main decoder, whereas supporting others also requires more hardware in the datapath. Example 7.2 addi INSTRUCTION The add immediate instruction, addi , adds the value in a register to the immediate and writes the result to another register. The datapath already is capable of this task. Determine the necessary changes to the read more..

  • Page - 412

    Now we must add a row to the main decoder truth table for the j instruction and a column for the Jump signal, as shown in actionGoTo:412,Table actionGoTo:412,7.5.The Jump control sig- nal is 1 for the j instruction and 0 for all others. j does not write the register file or memory, so RegWrite = MemWrite = 0. Hence, we don ’t care about the com- putation done in the read more..

  • Page - 413

    7.3.4 Performance Analysis Each instruction in the single-cycle processor takes one clock cycle, so the CPI is 1. The critical path for the lw instruction is shown in actionGoTo:413,Figure actionGoTo:413,7.15 with a heavy dashed blue line. It starts with the PC loading a new address on the rising edge of the clock. The instruction memory reads the next instruction. The register read more..

  • Page - 414

    disciplining ourselves to synchronous sequential design, so the clock per- iod is constant and must be long enough to accommodate the slowest instruction. Example 7.4 SINGLE-CYCLE PROCESSOR PERFORMANCE Ben Bitdiddle is contemplating building the single-cycle MIPS processor in a 65 nm CMOS manufacturing process. He has determined that the logic elements have the delays given in read more..

  • Page - 415

    can read or write the memory or register file or use the ALU. Different instructions use different numbers of steps, so simpler instructions can complete faster than more complex ones. The processor needs only one adder; this adder is reused for different purposes on various steps. And the processor uses a combined memory for instructions and data. The instruction is fetched from read more..

  • Page - 416

    Instruction Register so that it is available for future cycles. The Instruction Register receives an enable signal, called IRWrite, that is asserted when it should be updated with a new instruction. As we did with the single-cycle processor, we will work out the data- path connections for the lw instruction. Then we will enhance the data- path to handle the other instructions. read more..

  • Page - 417

    SignImm CLK A RD Instr/Data Memory A1 A3 WD3 RD2 RD1 WE3 A2 CLK Sign Extend Register File PC PC' Instr 25:21 15:0 SrcB ALUResult SrcA ALUOut CLK ALUControl2:0 WD WE CLK CLK A CLK EN IRWrite ALU Figure 7.20 Add base address to offset SignImm CLK A RD Instr/Data Memory A1 A3 WD3 RD2 RD1 WE3 A2 CLK Sign Extend Register File PC PC' Instr 25:21 15:0 SrcB ALUResult SrcA ALUOut CLK ALUControl2:0 WD WE CLK Adr Data read more..

  • Page - 418

    memory address, Adr, from either the PC or ALUOut, as shown in actionGoTo:417,Figure actionGoTo:417,7.21. The multiplexer select signal is called IorD, to indicate either an instruction or data address. The data read from the memory is stored in another nonarchitectural register, called Data. Notice that the address multiplexer permits us to reuse the memory during the lw instruction. read more..

  • Page - 419

    This completes the datapath for the lw instruction. Next, let us extend the datapath to also handle the sw instruction. Like the lw instruc- tion, the sw instruction reads a base address from port 1 of the register file and sign-extends the immediate. The ALU adds the base address to the immediate to find the memory address. All of these functions are already supported by read more..

  • Page - 420

    For R-type instructions, the instruction is again fetched, and the two source registers are read from the register file. ALUSrcB1:0, the control input of the SrcB multiplexer, is used to choose register B as the second source register for the ALU, as shown in actionGoTo:420,Figure actionGoTo:420,7.25. The ALU performs the appropriate operation and stores the result in ALUOut. On the read more..

  • Page - 421

    called PCEn, which is TRUE either when PCWrite is asserted or when both Branch and Zero are asserted. This completes the design of the multicycle MIPS processor datapath. The design process is much like that of the single-cycle processor in that hardware is systematically connected between the state elements to handle each instruction. The main difference is that the instruction is read more..

  • Page - 422

    The main controller produces multiplexer select and register enable signals for the datapath. The select signals are MemtoReg, RegDst, IorD, PCSrc, ALUSrcB, and ALUSrcA. The enable signals are IRWrite, Mem- Write, PCWrite, Branch, and RegWrite. To keep the following state transition diagrams readable, only the relevant control signals are listed. Select signals are listed only when their read more..

  • Page - 423

    The first step for any instruction is to fetch the instruction from mem- ory at the address held in the PC. The FSM enters this state on reset. To read memory, IorD = 0, so the address is taken from the PC. IRWrite is asserted to write the instruction into the instruction register, IR. Meanwhile, the PC should be incremented by 4 to point to the next instruction. read more..

  • Page - 424

    Now the FSM proceeds to one of several possible states, depending on the opcode . If the instruction is a memory load or store (lw or sw ), the multicycle processor computes the address by adding the base address to the sign-extended immediate. This requires ALUSrcA = 1 to select reg- ister A and ALUSrcB = 10 to select SignImm. ALUOp = 00, so the ALU adds. The effective read more..

  • Page - 425

    Data,and RegDst = 0 to pull the destination register from the rt field of the instruction. RegWrite is asserted to perform the write, completing the lw instruction. Finally, the FSM returns to the initial state, S0, to fetch the next instruction. For these and subsequent steps, try to visualize the data flow on your own. From state S2, if the instruction is sw , the data read more..

  • Page - 426

    If the opcode indicates an R-type instruction, the multicycle processor must calculate the result using the ALU and write that result to the regis- ter file. actionGoTo:428,Figure actionGoTo:428,7.37 shows these two steps. In S6, the instruction is exe- cuted by selecting the A and B registers (ALUSrcA = 1, ALUSrcB = 00) and performing the ALU operation indicated by the funct field read more..

  • Page - 427

    ALUSrcB = 11 to select SignImm × 4, and ALUOp = 00 to add. The destination address is stored in ALUOut. If the instruction is not beq , the computed address will not be used in subsequent cycles, but its computation was harmless. In S8, the processor compares the two regis- ters by subtracting them and checking to determine whether the result is 0. If it is, the read more..

  • Page - 428

    7.4.3 More Instructions As we did in actionGoTo:410,Section actionGoTo:410,7.3.3 for the single-cycle processor, let us now extend the multicycle processor to support the addi and j instructions. The next two examples illustrate the general design process to support new instructions. Example 7.5 addi INSTRUCTION Modify the multicycle processor to support addi . Solution: The datapath is already read more..

  • Page - 429

    to the register specified by the rt field of the instruction (RegDst = 0, MemtoReg = 0, RegWrite asserted). The astute reader may notice that S2 and S9 are identical and could be merged into a single state. Example 7.6 j INSTRUCTION Modify the multicycle processor to support j . Solution: First, we must modify the datapath to compute the next PC value in the case of a j read more..

  • Page - 430

    actionGoTo:433,Figure actionGoTo:433,7.42 shows the enhanced main controller (see page 408). The new state, S11, simply selects PC ′ as the PCJump value (PCSrc = 10) and writes the PC. Note that the PCSrc select signal is extended to two bits in S0 and S8 as well. 7.4.4 Performance Analysis The execution time of an instruction depends on both the number of cycles it uses read more..

  • Page - 431

    Example 7.7 MULTICYCLE PROCESSOR CPI The SPECINT2000 benchmark consists of approximately 25% loads, 10% stores, 11% branches, 2% jumps, and 52% R-type instructions. actionGoTo:431,3 Determine the average CPI for this benchmark. Solution: The average CPI is the sum over each instruction of the CPI for that instruction multiplied by the fraction of the time that instruction is used. For read more..

  • Page - 432

    Recall that we designed the multicycle processor so that each cycle involved one ALU operation, memory access, or register file access. Let us assume that the register file is faster than the memory and that writing memory is faster than reading memory. Examining the datapath reveals two possible critical paths that would limit the cycle time: Tc = tpcq + tmux + max ðtALU + read more..

  • Page - 433

    example shows that the multicycle processor is slower than the single-cycle proces- sor given the assumptions of CPI and circuit element delays. The fundamental pro- blem is that even though the slowest instruction, lw , was broken into five steps, the multicycle processor cycle time was not nearly improved five-fold. This is partly because not all of the steps are exactly the read more..

  • Page - 434

    7.5 PIPELINED PROCESSOR Pipelining, introduced in actionGoTo:182,Section actionGoTo:182,3.6, is a powerful way to improve the throughput of a digital system. We design a pipelined processor by sub- dividing the single-cycle processor into five pipeline stages. Thus, five instructions can execute simultaneously, one in each stage. Because each stage has only one-fifth of the entire logic, the read more..

  • Page - 435

    Time (ps) Instr Fetch Instruction Decode Read Reg Execute ALU Memory Read/Write Write Reg 1 2 0 100 200 300 400 500 600 700 800 900 1100 1200 1300 1400 1500 1600 1700 1800 1900 1000 (a) Fetch Instruction Decode Read Reg Execute ALU Memory Read/Write Write Reg Instr 1 2 (b) 3 Fetch Instruction Decode Read Reg Execute ALU Memory Read/Write Write Reg Decode Read Reg Execute ALU Memory Read/Write Write Reg read more..

  • Page - 436

    second instruction is fetched. At 500 ps, the first instruction executes, the second instruction enters the Decode stage, and a third instruction is fetched. And so forth, until all the instructions complete. The instruc- tion latency is 5 × 250 = 1250 ps. The throughput is 1 instruction per 250 ps (4 billion instructions per second). Because the stages are not perfectly read more..

  • Page - 437

    instruction except sw . In the pipelined processor, the register file is written in the first part of a cycle and read in the second part, as suggested by the shading. This way, data can be written and read back within a single cycle. A central challenge in pipelined systems is handling hazards that occur when the results of one instruction are needed by a subsequent read more..

  • Page - 438

    7.5.2 Pipelined Control The pipelined processor takes the same control signals as the single-cycle processor and therefore uses the same control unit. The control unit examines the opcode and funct fields of the instruction in the Decode stage to produce the control signals, as was described in actionGoTo:407,Section actionGoTo:407,7.3.2. These control signals must be pipelined along with read more..

  • Page - 439

    7.5.3 Hazards In a pipelined system, multiple instructions are handled concurrently. When one instruction is dependent on the results of another that has not yet completed, a hazard occurs. SignImmE CLK ARD Instruction Memory 4 + + A1 A3 WD3 RD2 RD1 WE3 A2 CLK Sign Extend Register File 0 1 0 1 ARD Data Memory WD WE 0 1 PCF 0 1 PC' InstrD 25:21 20:16 15:0 SrcBE 20:16 15:11 RtE RdE <<2 read more..

  • Page - 440

    The register file can be read and written in the same cycle. The write takes place during the first half of the cycle and the read takes place during the second half of the cycle, so a register can be written and read back in the same cycle without introducing a hazard. actionGoTo:440,Figure actionGoTo:440,7.48 illustrates hazards that occur when one instruction writes a read more..

  • Page - 441

    enhance the pipelined processor with a hazard unit that detects hazards and handles them appropriately, so that the processor executes the pro- gram correctly. Solving Data Hazards with Forwarding Some data hazards can be solved by forwarding (also called bypassing)a result from the Memory or Writeback stage to a dependent instruction in the Execute stage. This requires adding read more..

  • Page - 442

    E m m I n g i S K L C D R A n o i t c u r t s n I y r o m e M 4 + + 1 A 3 A 3 D W 2 D R 1 D R 3 E W 2 A K L C n g i S d n e t x E r e t s i g e R e l i F 0 1 0 1 D R A a t a D y r o m e M D W E W 1 0 F C P 0 1 ' C P D r t s n I 25:21 20:16 15:0 5:0 E B c r read more..

  • Page - 443

    The hazard detection unit computes control signals for the forwarding multiplexers to choose operands from the register file or from the results in the Memory or Writeback stage. It should forward from a stage if that stage will write a destination register and the destination register matches the source register. However, $0 is hardwired to 0 and should never be forwarded. If read more..

  • Page - 444

    The alternative solution is to stall the pipeline, holding up operation until the data is available. actionGoTo:444,Figure actionGoTo:444,7.52 shows stalling the dependent instruc- tion (and) in the Decode stage. and enters the Decode stage in cycle 3 and stalls there through cycle 4. The subsequent instruction (or) must remain in the Fetch stage during both cycles as well, because read more..

  • Page - 445

    E m m I n g i S K L C D R A n o i t c u r t s n I y r o m e M 4 + + 1 A 3 A 3 D W 2 D R 1 D R 3 E W 2 A K L C n g i S d n e t x E r e t s i g e R e l i F 0 1 0 1 D R A a t a D y r o m e M D W E W 1 0 F C P 0 1 ' C P D r t s n I 1 2 : 5 2 6 1 : 0 2 0 read more..

  • Page - 446

    are asserted to force the Decode and Fetch stage pipeline registers to hold their old values. FlushE is also asserted to clear the contents of the Exe- cute stage pipeline register, introducing a bubble. actionGoTo:446,4 The MemtoReg signal is asserted for the lw instruction. Hence, the logic to compute the stalls and flushes is lwstall = ((rsD == rtE) OR (rtD == rtE)) AND read more..

  • Page - 447

    actionGoTo:447,Figure actionGoTo:447,7.55 shows the pipeline operation with the early branch deci- sion being made in cycle 2. In cycle 3, the and instruction is flushed and the slt instruction is fetched. Now the branch misprediction penalty is reduced to only one instruction rather than three. actionGoTo:448,Figure actionGoTo:448,7.56 modifies the pipelined processor to move the branch read more..

  • Page - 448

    D l a u q E E m m I n g i S K L C D R A n o i t c u r t s n I y r o m e M 4 + 1 A 3 A 3 D W 2 D R 1 D R 3 E W 2 A K L C n g i S d n e t x E r e t s i g e R e l i F 0 1 0 1 D R A a t a D y r o m e M D W E W 1 0 F C P 0 1 ' C P D r t s n I 1 2 : 5 2 6 1 : 0 2 0 : 5 1 0 : 5 E B c r S 1 2 : 5 2 1 1 : 5 1 E s R E d R 2 < < + M t u O read more..

  • Page - 449

    so that PCSrc can be determined in the Decode stage rather than the Memory stage. The PCBranch adder must also be moved into the Decode stage so that the destination address can be computed in time. The syn- chronous clear input (CLR) connected to PCSrcD is added to the Decode stage pipeline register so that the incorrectly fetched instruction can be flushed when a branch read more..

  • Page - 450

    D l a u q E E m m I n g i S K L C D R A n o i t c u r t s n I y r o m e M 4 + + 1 A 3 A 3 D W 2 D R 1 D R 3 E W 2 A K L C n g i S d n e t x E r e t s i g e R e l i F 0 1 0 1 D R A a t a D y r o m e M D W E W 1 0 F C P 0 1 ' C P D r t s n I 1 2 : 5 2 6 1 : 0 2 0 : 5 1 0 : 5 E B c r S 1 2 : 5 2 1 1 : 5 1 E s R E d R 2 < < M t u O read more..

  • Page - 451

    must be fetched. Control hazards are solved by predicting which instruc- tion should be fetched and flushing the pipeline if the prediction is later determined to be wrong. Moving the decision as early as possible mini- mizes the number of instructions that are flushed on a misprediction. You may have observed by now that one of the challenges of designing a pipelined processor read more..

  • Page - 452

    EqualD SignImmE CLK ARD Instruction memory + 4 A1 A3 WD3 RD2 RD1 WE3 A2 CLK Sign extend Register file 0 1 0 1 ARD Data memory WD WE 1 0 PCF 0 1 PC' InstrD 25:21 20:16 15:0 5:0 SrcBE 25:21 15:11 RsE RdE <<2 + ALUOutM ALUOutW ReadDataW WriteDataE WriteDataM SrcAE PCPlus4D PCBranchD ResultW PCPlus4F 31:26 RegDstD BranchD MemWriteD MemtoRegD ALUControlD 2:0 ALUSrcD RegWriteD Op Funct Control unit PCSrcD CLK CLK CLK CLK read more..

  • Page - 453

    We can determine the cycle time by considering the critical path in each of the five pipeline stages shown in actionGoTo:452,Figure actionGoTo:452,7.58. Recall that the register file is written in the first half of the Writeback cycle and read in the second half of the Decode cycle. Therefore, the cycle time of the Decode and Writeback stages is twice the time necessary to read more..

  • Page - 454

    7.6 HDL REPRESENTATION* This section presents HDL code for the single-cycle MIPS processor sup- porting all of the instructions discussed in this chapter, including addi and j . The code illustrates good coding practices for a moderately com- plex system. HDL code for the multicycle processor and pipelined proces- sor are left to actionGoTo:495,Exercises actionGoTo:495,7.25 actionGoTo:495,and read more..

  • Page - 455

    testbench and external memories. The HDL is available in electronic form on this book ’s website (see the preface). 7.6.1 Single-Cycle Processor The main modules of the single-cycle MIPS processor module are given in the following HDL examples. HDL Example 7.1 SINGLE-CYCLE MIPS PROCESSOR SystemVerilog module mips(input logic clk, reset, output logic [31:0] pc, input logic [31:0] instr, output read more..

  • Page - 456

    HDL Example 7.2 CONTROLLER SystemVerilog module controller(input logic [5:0] op, funct, input logic zero, output logic memtoreg, memwrite, output logic pcsrc, alusrc, output logic regdst, regwrite, output logic jump, output logic [2:0] alucontrol); logic [1:0] aluop; logic branch; maindec md(op, memtoreg, memwrite, branch, alusrc, regdst, regwrite, jump, aluop); aludec ad(funct, aluop, alucontrol); assign pcsrc read more..

  • Page - 457

    HDL Example 7.3 MAIN DECODER SystemVerilog module maindec(input logic [5:0] op, output logic memtoreg, memwrite, output logic branch, alusrc, output logic regdst, regwrite, output logic jump, output logic [1:0] aluop); logic [8:0] controls; assign {regwrite, regdst, alusrc, branch, memwrite, memtoreg, jump, aluop} = controls; always_comb case(op) 6'b000000: controls <= 9'b110000010; // RTYPE 6'b100011: read more..

  • Page - 458

    HDL Example 7.5 DATAPATH SystemVerilog module datapath(input logic clk, reset, input logic memtoreg, pcsrc, input logic alusrc, regdst, input logic regwrite, jump, input logic [2:0] alucontrol, output logic zero, output logic [31:0] pc, input logic [31:0] instr, output logic [31:0] aluout, writedata, input logic [31:0] readdata); logic [4:0] writereg; logic [31:0] pcnext, pcnextbr, pcplus4, pcbranch; logic [31:0] read more..

  • Page - 459

    7.6.2 Generic Building Blocks This section contains generic building blocks that may be useful in any MIPS microarchitecture, including a register file, adder, left shift unit, sign-extension unit, resettable flip-flop, and multiplexer. The HDL for the ALU is left to actionGoTo:493,Exercise actionGoTo:493,5.9. signal signimm, signimmsh: STD_LOGIC_VECTOR(31 downto 0); signal srca, srcb, result: read more..

  • Page - 460

    HDL Example 7.6 REGISTER FILE SystemVerilog module regfile(input logic clk, input logic we3, input logic [4:0] ra1, ra2, wa3, input logic [31:0] wd3, output logic [31:0] rd1, rd2); logic [31:0] rf[31:0]; // three ported register file // read two ports combinationally // write third port on rising edge of clk // register 0 hardwired to 0 // note: for pipelined processor, write third port // read more..

  • Page - 461

    HDL Example 7.8 LEFT SHIFT (MULTIPLY BY 4) SystemVerilog module sl2(input logic [31:0] a, output logic [31:0] y); // shift left by 2 assign y = {a[29:0], 2'b00}; endmodule VHDL library IEEE; use IEEE.STD_LOGIC_1164.all; entity sl2 is –– shift left by 2 port(a: in STD_LOGIC_VECTOR(31 downto 0); y: out STD_LOGIC_VECTOR(31 downto 0)); end; architecture behave of sl2 is begin y <= a(29 read more..

  • Page - 462

    7.6.3 Testbench The MIPS testbench loads a program into the memories. The program in actionGoTo:462,Figure actionGoTo:462,7.60 exercises all of the instructions by performing a computation that should produce the correct answer only if all of the instructions are functioning properly. Specifically, the program will write the value 7 to address 84 if it runs correctly, and is unlikely read more..

  • Page - 463

    The machine code is stored in a hexadecimal file called memfile.dat (see actionGoTo:462,Figure actionGoTo:462,7.61), which is loaded by the testbench during simulation. The file consists of the machine code for the instructions, one instruction per line. The testbench, top-level MIPS module, and external memory HDL code are given in the following examples. The memories in this example read more..

  • Page - 464

    HDL Example 7.13 MIPS TOP-LEVEL MODULE SystemVerilog module top(input logic clk, reset, output logic [31:0] writedata, dataadr, output logic memwrite); logic [31:0] pc, instr, readdata; // instantiate processor and memories mips mips(clk, reset, pc, instr, memwrite, dataadr, writedata, readdata); imem imem(pc[7:2], instr); dmem dmem(clk, memwrite, dataadr, writedata, readdata); endmodule VHDL library IEEE; use read more..

  • Page - 465

    7.7 EXCEPTIONS* actionGoTo:368,Section actionGoTo:368,6.7.2 introduced exceptions, which cause unplanned changes in the flow of a program. In this section, we enhance the multicycle proces- sor to support two types of exceptions: undefined instructions and HDL Example 7.15 MIPS INSTRUCTION MEMORY SystemVerilog module imem(input logic [5:0] a, output logic [31:0] rd); logic [31:0] RAM[63:0]; initial read more..

  • Page - 466

    arithmetic overflow. Supporting exceptions in other microarchitectures follows similar principles. As described in actionGoTo:368,Section actionGoTo:368,6.7.2, when an exception takes place, the pro- cessor copies the PC to the EPC register and stores a code in the Cause register indicating the source of the exception. Exception causes include 0x28 for undefined instructions and 0x30 for read more..

  • Page - 467

    that selects the appropriate code for the exception. The ALU must also generate an overflow signal, as was discussed in actionGoTo:273,Section actionGoTo:273,5.2.4. actionGoTo:467,5 To support the mfc0 instruction, we also add a way to select the Coprocessor 0 registers and write them to the register file, as shown in actionGoTo:467,Figure actionGoTo:467,7.63. The mfc0 instruction specifies read more..

  • Page - 468

    and jumps to the exception handler. Note that, when an exception occurs, the instruction is discarded and the register file is not written. When a mfc0 instruction is decoded, the processor goes to S14 and writes the appropriate Coprocessor 0 register to the main register file. IorD = 0 AluSrcA = 0 ALUSrcB = 01 ALUOp = 00 PCSrc = 00 IRWrite ALUSrcA = 0 read more..

  • Page - 469

    7.8 ADVANCED MICROARCHITECTURE* High-performance microprocessors use a wide variety of techniques to run programs faster. Recall that the time required to run a program is propor- tional to the period of the clock and to the number of clock cycles per instruction (CPI). Thus, to increase performance we would like to speed up the clock and/or reduce the CPI. This section surveys read more..

  • Page - 470

    Example 7.11 DEEP PIPELINES Consider building a pipelined processor by chopping up the single-cycle processor into N stages (N ≥ 5). The single-cycle processor has a propagation delay of 875 ps through the combinational logic. The sequencing overhead of a register is 50 ps. Assume that the combinational delay can be arbitrarily divided into any number of stages and that pipeline read more..

  • Page - 471

    7.8.2 Branch Prediction An ideal pipelined processor would have a CPI of 1. The branch mispre- diction penalty is a major reason for increased CPI. As pipelines get deeper, branches are resolved later in the pipeline. Thus, the branch misprediction penalty gets larger, because all the instructions issued after the mispredicted branch must be flushed. To address this problem, most read more..

  • Page - 472

    predicts that the branch should be taken when the loop is first run again. In summary, a 1-bit branch predictor mispredicts the first and last branches of a loop. A 2-bit dynamic branch predictor solves this problem by having four states: strongly taken, weakly taken, weakly not taken,and strongly not taken, as shown in actionGoTo:472,Figure actionGoTo:472,7.66. When the loop is read more..

  • Page - 473

    actionGoTo:473,Figure actionGoTo:473,7.68 shows a pipeline diagram illustrating the two-way super- scalar processor executing two instructions on each cycle. For this pro- gram, the processor has a CPI of 0.5. Designers commonly refer to the reciprocal of the CPI as the instructions per cycle,or IPC. This processor has an IPC of 2 on this program. Executing many instructions read more..

  • Page - 474

    based on $t0 , and between or and sw based on $t3 ) are handled by for- warding results produced in one cycle to be consumed in the next. This program, also given below, requires five cycles to issue six instructions, for an IPC of 1.17. lw $t0, 40($s0) add $t1, $t0, $s1 sub $t0, $s2, $s3 and $t2, $s4, $t0 or $t3, $s5, $s6 sw $s7, 80($t3) Recall that parallelism comes read more..

  • Page - 475

    7.8.4 Out-of-Order Processor To cope with the problem of dependencies, an out-of-order processor looks ahead across many instructions to issue, or begin executing, inde- pendent instructions as rapidly as possible. The instructions can be issued in a different order than that written by the programmer, as long as dependencies are honored so that the program produces the intended result. read more..

  • Page - 476

    ▶ Cycle 2 – Remember that there is a two-cycle latency between when a lw instruction issues and when a dependent instruction can use its result, so add cannot issue yet because of the $t0 dependence. sub writes $t0 , so it cannot issue before add , lest add receive the wrong value of $t0. and is dependent on sub . – Only the sw instruction issues. ▶ Cycle 3 – On read more..

  • Page - 477

    written to the register. For example, in the following program, add and sub both write $t0 . The final value in $t0 should come from sub accord- ing to the order of the program. If an out-of-order processor attempted to execute sub first, a WAW hazard would occur. add $t0, $s1, $s2 sub $t0, $s3, $s4 WAW hazards are not essential either; again, they are artifacts caused by read more..

  • Page - 478

    sub could be executed sooner, because $r0 has no dependency on the add instruction. The processor keeps a table of which registers were renamed so that it can consistently rename registers in subsequent dependent instructions. In this example, $t0 must also be renamed to $r0 in the and instruction, because it refers to the result of sub . actionGoTo:478,Figure actionGoTo:478,7.71 shows read more..

  • Page - 479

    ▶ Cycle 3 – On cycle 3, $t0 is available, so add issues. $t3 is also available, so sw issues. The out-of-order processor with register renaming issues the six instructions in three cycles, for an IPC of 2. 7.8.6 Single Instruction Multiple Data The term SIMD (pronounced “sim-dee”) stands for single instruction multiple data, in which a single instruction acts on multiple pieces read more..

  • Page - 480

    7.8.7 Multithreading Because the ILP of real programs tends to be fairly low, adding more execu- tion units to a superscalar or out-of-order processor gives diminishing returns. Another problem, discussed in actionGoTo:500,Chapter actionGoTo:500,8, is that memory is much slower than the processor. Most loads and stores access a smaller and fas- ter memory, called a cache. However, when read more..

  • Page - 481

    7.8.8 Homogeneous Multiprocessors A multiprocessor system consists of multiple processors and a method for communication between the processors. A common form of multiproces- sing in computer systems is homogeneous multiprocessing, also called symmetric multiprocessing (SMP), in which two or more identical proces- sors share a single main memory. The multiple processors may be separate chips read more..

  • Page - 482

    and a typical consumer might be expected to have a couple of applica- tions actually computing simultaneously. While this is enough to keep dual- and quad-core systems busy, unless programs start incorporating significantly more parallelism, continuing to add more cores beyond this point will provide diminishing benefits. As an added issue, because gen- eral purpose processors are read more..

  • Page - 483

    units on chip and are now beginning to include other types of specialized hardware. AMD and Intel both have processors that incorporate a graphics processing unit (GPU) or FPGA and one or more traditional x86 cores on the same chip. AMD ’s Fusion line and Intel ’s Sandy Bridge are the first set of devices to incorporate a processor and GPU in the same die. Intel ’s read more..

  • Page - 484

    Intel x86 microprocessors. In the 40 years since the 4004, transistor fea- ture size has shrunk 160-fold, the number of transistors on a chip has increased by five orders of magnitude, and the operating frequency has increased by almost four orders of magnitude. No other field of engineer- ing has made such astonishing progress in such a short time. The 80386 is a multicycle read more..

  • Page - 485

    clearly visible on the left. Each of the columns processes one bit of data. Some of the control signals are generated using a microcode PLA that steps through the various states of the control FSM. The memory management unit in the upper right controls access to the external memory. The 80486, shown in actionGoTo:486,Figure actionGoTo:486,7.75, dramatically improved performance using read more..

  • Page - 486

    cannot be trademarked. The Pentium uses separate instruction and data caches. It also uses a branch predictor to reduce the performance penalty for branches. The Pentium Pro, Pentium II, and Pentium III processors all share a common out-of-order microarchitecture, code-named P6. The complex x86 instructions are broken down into one or more micro-ops similar to MIPS instructions. The read more..

  • Page - 487

    8 KB Data Cache Floating- Point Unit 8 KB Instruction Cache Multiprocessor Logic Instruction Fetch & Decode Complex Instruction Support Bus Interface Unit 32-bit Datapath Superscalar Controller Branch Prediction Figure 7.76 Pentium microprocessor chip Instruction Fetch & 16 KB Cache 16 KB Data Cache Data TLB Branch Target Buffer FPU IEU SIMD Register Renaming Microcode PLA 256 KB Level 2 read more..

  • Page - 488

    The floating-point datapath is called the Floating Point Unit (FPU). The processor also has a SIMD unit to perform packed operations on short integer and floating-point data. A larger portion of the chip is dedicated to issuing instructions out-of-order than to actually executing the instruc- tions. The instruction and data caches have grown to 16 KB each. The Pentium III also read more..

  • Page - 489

    The Pentium 4 ’s reliance on deep pipelines and high clock speed led to extremely high power consumption, sometimes more than 100 W. This is unacceptable in laptops and makes cooling of desktops expensive. Intel discovered that the older P6 architecture could achieve compar- able performance at much lower clock speed and power. The Pentium M uses an enhanced version of the P6 read more..

  • Page - 490

    7.10 SUMMARY This chapter has described three ways to build MIPS processors, each with different performance and cost trade-offs. We find this topic almost magical: how can such a seemingly complicated device as a microproces- sor actually be simple enough to fit in a half-page schematic? Moreover, the inner workings, so mysterious to the uninitiated, are actually reason- ably read more..

  • Page - 491

    arrangement of components. The microarchitectures exploit regularity and modularity, reusing a library of common building blocks such as ALUs, memories, multiplexers, and registers. Hierarchy is used in numer- ous ways. The microarchitectures are partitioned into the datapath and control units. Each of these units is built from logic blocks, which can be built from gates, which in turn read more..

  • Page - 492

    Exercises Exercise 7.1 Suppose that one of the following control signals in the single-cycle MIPS processor has a stuck-at-0 fault, meaning that the signal is always 0, regardless of its intended value. What instructions would malfunction? Why? (a) RegWrite (b) ALUOp1 (c) MemWrite Exercise 7.2 Repeat actionGoTo:492,Exercise actionGoTo:492,7.1, assuming that the signal has a stuck-at-1 fault. read more..

  • Page - 493

    Exercise 7.5 Many processor architectures have a load with postincrement instruction, which updates the index register to point to the next memory word after completing the load. lwinc $rt, imm($rs) is equivalent to the following two instructions: lw $rt, imm($rs) addi $rs, $rs, 4 Repeat actionGoTo:492,Exercise actionGoTo:492,7.3 for the lwinc instruction. Is it possible to add the read more..

  • Page - 494

    control signals. Mark up a copy of actionGoTo:430,Figure actionGoTo:430,7.39 to show the changes to the controller FSM. Describe any other changes that are required. (a) srlv (b) ori (c) xori (d) jr Exercise 7.14 Repeat actionGoTo:493,Exercise actionGoTo:493,7.13 for the following MIPS instructions. (a) bne (b) lb (c) lbu (d) andi Exercise 7.15 Repeat actionGoTo:493,Exercise actionGoTo:493,7.5 for read more..

  • Page - 495

    Exercise 7.22 What is the CPI of the redesigned multicycle MIPS processor from actionGoTo:494,Exercise actionGoTo:494,7.21? Use the instruction mix from actionGoTo:431,Example actionGoTo:431,7.7. Exercise 7.23 How many cycles are required to run the following program on the multicycle MIPS processor? What is the CPI of this program? addi $s0, $0, done # result = 5 while: beq $s0, $0, done read more..

  • Page - 496

    assign rd = RAM[a[31:2]]; // word aligned always @(posedge clk) if (we) RAM[a[31:2]] <= wd; endmodule Exercise 7.26 Extend your HDL code for the multicycle MIPS processor from actionGoTo:495,Exercise actionGoTo:495,7.25 to handle one of the new instructions from actionGoTo:493,Exercise actionGoTo:493,7.13. Enhance the testbench to test the new instruction. Exercise 7.27 Repeat actionGoTo:496,Exercise read more..

  • Page - 497

    Exercise 7.34 Explain how to extend the pipelined MIPS processor to handle the addi instruction. Exercise 7.35 Explain how to extend the pipelined processor to handle the j instruction. Give particular attention to how the pipeline is flushed when a jump takes place. Exercise 7.36 actionGoTo:451,Examples actionGoTo:451,7.9 actionGoTo:451,and actionGoTo:451,7.10 point out that the pipelined MIPS read more..

  • Page - 498

    Interview Questions The following exercises present questions that have been asked at interviews for digital design jobs. Question 7.1 Explain the advantages of pipelined microprocessors. Question 7.2 If additional pipeline stages allow a processor to go faster, why don ’t processors have 100 pipeline stages? Question 7.3 Describe what a hazard is in a microprocessor and explain ways in read more..

  • Page - 499

    read more..

  • Page - 500

    8 Memory and I/O Systems 8.1 INTRODUCTION A computer ’s ability to solve problems is influenced by its memory system and the input/output (I/O) devices – such as monitors, keyboards, and printers – that allow us to manipulate and view the results of its compu- tations. This chapter investigates these practical memory and I/O systems. Computer system performance depends on the read more..

  • Page - 501

    off the shelf and bring it to your cubicle. After skimming it, you might put it back and pull out Jung ’s The Psychology of the Unconscious. You might then go back for another quote from Interpretation of Dreams, followed by yet another trip to the stacks for Freud ’s The Ego and the Id. Pretty soon you would get tired of walking from your cubicle to the stacks. read more..

  • Page - 502

    large memory stores the remainder of the data and instructions, so the overall capacity is large. The combination of two cheap memories is much less expensive than a single large fast memory. These principles extend to using an entire hierarchy of memories of increasing capacity and decreasing speed. Computer memory is generally built from DRAM chips. In 2012, a typical PC had read more..

  • Page - 503

    eliminates lengthy delays caused by traveling to and from a separate chip. In 2012, on-chip SRAM costs were on the order of $10,000/GB, but the cache is relatively small (kilobytes to several megabytes), so the overall cost is low. Caches can store both instructions and data, but we will refer to their contents generically as “data.” If the processor requests data that is read more..

  • Page - 504

    access input and output devices, such as keyboards and monitors, in much the same way as they access memory. actionGoTo:531,Section actionGoTo:531,8.5 investigates such mem- ory-mapped I/O. Section 8.6 addresses I/O for embedded systems, and actionGoTo:583,Section actionGoTo:583,8.7 describes major I/O standards for personal computers. 8.2 MEMORY SYSTEM PERFORMANCE ANALYSIS Designers (and computer read more..

  • Page - 505

    where tcache, tMM, and tVM are the access times of the cache, main mem- ory, and virtual memory, and MRcache and MRMM are the cache and main memory miss rates, respectively. Example 8.2 CALCULATING AVERAGE MEMORY ACCESS TIME Suppose a computer system has a memory organization with only two levels of hierarchy, a cache and main memory. What is the average memory access time read more..

  • Page - 506

    of the cache is smaller than that of main memory, the computer system designer must choose what subset of the main memory is kept in the cache. When the processor attempts to access data, it first checks the cache for the data. If the cache hits, the data is available immediately. If the cache misses, the processor fetches the data from main memory and places it in the read more..

  • Page - 507

    variable is likely to be used again, creating temporal locality. If an element in an array is used, other elements in the same array are also likely to be used, creating spatial locality. 8.3.2 How is Data Found? A cache is organized into S sets, each of which holds one or more blocks of data. The relationship between the address of data in main memory and the location read more..

  • Page - 508

    map to set 1, as shown in blue. Likewise, data at addresses 0x00000010, . . . , 0xFFFFFFF0 all map to set 4, and so forth. Each main memory address maps to exactly one set in the cache. Example 8.4 CACHE FIELDS To what cache set in actionGoTo:508,Figure actionGoTo:508,8.5 does the word at address 0x00000014 map? Name another address that maps to the same set. Solution: The read more..

  • Page - 509

    Example 8.5 CACHE FIELDS Find the number of set and tag bits for a direct mapped cache with 1024 (2 10) sets and a one-word block size. The address size is 32 bits. Solution: A cache with 2 10 sets requires log2(2 10) = 10 set bits. The two least signif- icant bits of the address are the byte offset, and the remaining 32 − 10 – 2 = 20 bits form the tag. read more..

  • Page - 510

    address and the valid bit is 1, the cache hits and the data is returned to the processor. Otherwise, the cache misses and the memory system must fetch the data from main memory. Example 8.6 TEMPORAL LOCALITY WITH A DIRECT MAPPED CACHE Loops are a common source of temporal and spatial locality in applications. Using the eight-entry cache of actionGoTo:509,Figure actionGoTo:509,8.7, read more..

  • Page - 511

    Example 8.7 CACHE BLOCK CONFLICT What is the miss rate when the following loop is executed on the eight-word direct mapped cache from actionGoTo:509,Figure actionGoTo:509,8.7? Assume that the cache is initially empty. addi $t0, $0, 5 loop: beq $t0, $0, done lw $t1, 0x4($0) lw $t2, 0x24($0) addi $t0, $t0, –1 j loop done: Solution: Memory addresses 0x4 and 0x24 both map to set 1. read more..

  • Page - 512

    log24 = 2 set bits rather than 3 are used to select the set. The tag increases from 27 to 28 bits. Each set contains two ways or degrees of associativity. Each way consists of a data block and the valid and tag bits. The cache reads blocks from both ways in the selected set and checks the tags and valid bits for a hit. If a hit occurs in one of the ways, read more..

  • Page - 513

    caches tend to have the fewest conflict misses for a given cache capacity, but they require more hardware for additional tag comparisons. They are best suited to relatively small caches because of the large number of comparators. Block Size The previous examples were able to take advantage only of temporal locality, because the block size was one word. To exploit spatial locality, read more..

  • Page - 514

    is organized as two sets. Thus, only log22 = 1 bit is used to select the set. A multiplexer is now needed to select the word within the block. The multiplexer is controlled by the log24 = 2 block offset bits of the address. The most significant 27 address bits form the tag. Only one tag is needed for the entire block, because the words in the block are at read more..

  • Page - 515

    consists of a data block and its associated valid and tag bits. Caches are characterized by ▶ capacity C ▶ block size b (and number of blocks, B = C/b) ▶ number of blocks in a set (N) actionGoTo:515,Table actionGoTo:515,8.2 summarizes the various cache organizations. Each address in memory maps to only one set but can be stored in any of the ways. Cache capacity, read more..

  • Page - 516

    block within the least recently used group. Such a policy is called pseudo-LRU and is good enough in practice. Example 8.10 LRU REPLACEMENT Show the contents of an eight-word two-way set associative cache after executing the following code. Assume LRU replacement, a block size of one word, and an initially empty cache. lw $t0, 0x04($0) lw $t1, 0x24($0) lw $t2, 0x54($0) Solution: The read more..

  • Page - 517

    Multiple-Level Caches Large caches are beneficial because they are more likely to hold data of interest and therefore have lower miss rates. However, large caches tend to be slower than small ones. Modern systems often use at least two levels of caches, as shown in actionGoTo:517,Figure actionGoTo:517,8.16. The first-level (L1) cache is small enough to provide a one- or two-cycle read more..

  • Page - 518

    Reducing Miss Rate Cache misses can be reduced by changing capacity, block size, and/or associativity. The first step to reducing the miss rate is to understand the causes of the misses. The misses can be classified as compulsory, capa- city, and conflict. The first request to a cache block is called a compulsory miss, because the block must be read from memory regardless of read more..

  • Page - 519

    As mentioned, miss rate can also be decreased by using larger block sizes that take advantage of spatial locality. But as block size increases, the number of sets in a fixed-size cache decreases, increasing the probabil- ity of conflicts. actionGoTo:519,Figure actionGoTo:519,8.18 plots miss rate versus block size (in number of bytes) for caches of varying capacity. For small caches, read more..

  • Page - 520

    write-back cache. Modern caches are usually write-back, because main memory access time is so large. Example 8.12 WRITE-THROUGH VERSUS WRITE-BACK Suppose a cache has a block size of four words. How many main memory accesses are required by the following code when using each write policy: write-through or write-back? sw $t0, 0x0($0) sw $t0, OxC($0) sw $t0, 0x8($0) sw $t0, 0x4($0) read more..

  • Page - 521

    8.4 VIRTUAL MEMORY Most modern computer systems use a hard drive made of magnetic or solid state storage as the lowest level in the memory hierarchy (see actionGoTo:504,Fig- actionGoTo:504,ure actionGoTo:504,8.4). Compared with the ideal large, fast, cheap memory, a hard drive is large and cheap but terribly slow. It provides a much larger capacity than is possible with a read more..

  • Page - 522

    The objective of adding a hard drive to the memory hierarchy is to inexpensively give the illusion of a very large memory while still providing the speed of faster memory for most accesses. A computer with only 128 MB of DRAM, for example, could effectively provide 2 GB of mem- ory using the hard drive. This larger 2-GB memory is called virtual mem- ory, and the smaller read more..

  • Page - 523

    the block. In an analogous virtual memory system, each physical page would need a comparator to check the most significant virtual address bits against a tag to determine whether the virtual page maps to that physical page. A realistic virtual memory system has so many physical pages that providing a comparator for each page would be excessively expensive. Instead, the virtual read more..

  • Page - 524

    actionGoTo:524,Figure actionGoTo:524,8.21 illustrates the page organization of a virtual memory sys- tem with 2 GB of virtual memory and 128 MB of physical memory divided into 4-KB pages. MIPS accommodates 32-bit addresses. With a 2-GB = 2 31-byte virtual memory, only the least significant 31 virtual address bits are used; the 32nd bit is always 0. Similarly, with a 128-MB = 2 read more..

  • Page - 525

    Example 8.13 VIRTUAL ADDRESS TO PHYSICAL ADDRESS TRANSLATION Find the physical address of virtual address 0x247C using the virtual memory sys- tem shown in actionGoTo:524,Figure actionGoTo:524,8.21. Solution: The 12-bit page offset (0x47C) requires no translation. The remaining 19 bits of the virtual address give the virtual page number, so virtual address 0x247C is found in virtual page read more..

  • Page - 526

    the index into the page table. The page table maps virtual page 0x2 to physical page 0x7FFF. So, virtual address 0x247C maps to physical address 0x7FFF47C. The least significant 12 bits are the same in both the physical and the virtual address. The page table can be stored anywhere in physical memory, at the discretion of the OS. The processor typically uses a dedicated read more..

  • Page - 527

    8.4.3 The Translation Lookaside Buffer Virtual memory would have a severe performance impact if it required a page table read on every load or store, doubling the delay of loads and stores. Fortunately, page table accesses have great temporal local- ity. The temporal and spatial locality of data accesses and the large page size mean that many consecutive loads or stores are read more..

  • Page - 528

    8.4.4 Memory Protection So far, this section has focused on using virtual memory to provide a fast, inexpensive, large memory. An equally important reason to use virtual memory is to provide protection between concurrently running programs. As you probably know, modern computers typically run several pro- grams or processes at the same time. All of the programs are simulta- neously read more..

  • Page - 529

    8.4.5 Replacement Policies* Virtual memory systems use write-back and an approximate least recently used (LRU) replacement policy. A write-through policy, where each write to physical memory initiates a write to the hard drive, would be impracti- cal. Store instructions would operate at the speed of the hard drive instead of the speed of the processor (milliseconds instead of read more..

  • Page - 530

    hard drive when V is 0. The page table offset indexes the second-level page table. The remaining 12 bits of the virtual address are the page offset, as before, for a page size of 2 12 = 4KB. In actionGoTo:530,Figure actionGoTo:530,8.26 the 19-bit virtual page number is broken into 9 and 10 bits, to indicate the page table number and the page table offset, respectively. read more..

  • Page - 531

    Solution: As always, only the virtual page number requires translation. The most significant nine bits of the virtual address, 0x0, give the page table number, the index into the first-level page table. The first-level page table at entry 0x0 indicates that the second-level page table is resident in memory (V = 1) and its physical address is 0x2375000. The next ten bits of the read more..

  • Page - 532

    systems, devices could include a toaster ’s heating element, a doll ’sspeech synthesizer, an engine ’sfuelinjector, asatellite ’s solar panel positioning motors, and so forth. A processor accesses an I/O device using the address and data busses in the same way that it accesses memory. A portion of the address space is dedicated to I/O devices rather than memory. For example, read more..

  • Page - 533

    Solution: The following MIPS assembly code writes the value 7 to I/O Device 1. addi $t0, $0, 7 sw $t0, 0xFFF4($0) # FFF4 is sign-extended to 0xFFFFFFF4 The address decoder asserts WE1 because the address is 0xFFFFFFF4 and Mem- Write is TRUE. The value on the WriteData bus, 7, is written into the register con- nected to the input pins of I/O Device 1. To read from I/O read more..

  • Page - 534

    For the sake of concreteness, this section will illustrate embedded system I/O in the context of a commercial microcontroller. Specifically, we will focus on the PIC32MX675F512H, a member of Microchip ’s PIC32-series of microcontrollers based on the 32-bit MIPS microproces- sor. The PIC32 family also has a generous assortment of on-chip periph- erals and memory so that the complete read more..

  • Page - 535

    actionGoTo:536,Figure actionGoTo:536,8.31 shows the pinout of the microcontroller. Pins include power and ground, clock, reset, and many I/O pins that can be used for general- purpose I/O and/or special-purpose peripherals. actionGoTo:536,Figure actionGoTo:536,8.32 shows a photo- graph of the microcontroller in a 64-pin Thin Quad Flat Pack (TQFP) with pins along the four sides spaced at 20 read more..

  • Page - 536

    VDD and GND pins to reduce power supply noise by providing a low- impedance path. An assortment of bypass capacitors provide a reservoir of charge to keep the power supply stable in the event of sudden changes in cur- rent demand. A bypass capacitor also connects to the VCORE pin, which serves an internal 1.8 V voltage regulator. The PIC32 typically draws 1 –2 mA/MHz of read more..

  • Page - 537

    runs at a fraction (e.g., half) of the speed of the main system clock SYSCLK. This clocking scheme can be set by configuration bits in the MPLAB development software or by placing the following lines of code at the start of your C program. #pragma config FPBDIV = DIV_2 // peripherals operate at half // sysckfreq (20 MHz) #pragma config POSCMOD = EC // configure primary read more..

  • Page - 538

    The easiest way to program the microcontroller is with a Microchip In Circuit Debugger (ICD) 3, affectionately known as a puck. The ICD3, shown in actionGoTo:538,Figure actionGoTo:538,8.34, allows the programmer to communicate with the PIC32 from a PC to download code and to debug the program. The ICD3 connects to a USB port on the PC and to a six-pin RJ-11 modular connector read more..

  • Page - 539

    determine whether the pin of the port is an input or an output, while the PORTx registers indicate the value read from an input or driven to an output. The 16 least significant bits of each register correspond to the sixteen pins of the GPIO port. When a given bit of the TRISx register is 0, the pin is an output, and when it is 1, the pin is an input. It is read more..

  • Page - 540

    sets the first and third bits of PORTD and clears the fourth bit. If the bot- tom four bits of PORTD had been 1110, they would become 0111. The number of GPIO pins available depends on the size of the pack- age. actionGoTo:540,Table actionGoTo:540,8.5 summarizes which pins are available in various packages. For example, a 100-pin TQFP provides pins RA[15:14], RA[10:9], and read more..

  • Page - 541

    in a fashion similar to SPI. USB and Ethernet are more complex, high- performance standards described in actionGoTo:584,Sections actionGoTo:584,8.7.1 actionGoTo:584,and actionGoTo:584,8.7.4, respectively. Serial Peripheral Interface (SPI) SPI (pronounced “S-P-I”) is a simple synchronous serial protocol that is easy to use and relatively fast. The physical interface consists of three pins: Serial read more..

  • Page - 542

    Table 8.6 SPIxCON register fields Bits Name Function 31 FRMEN 1: Enable framing 30 FRMSYNC Frame sync pulse direction control 29 FRMPOL Frame sync polarity (1 = active high) 28 MSSEN 1: Enable slave select generation in master mode 27 FRMSYPW Frame sync pulse width bit (1 = 1 word wide, 0 = 1 clock wide) 26:24 FRMCNT[2:0] Frame sync pulse counter (frequency of sync pulses) 23 MCLKSEL read more..

  • Page - 543

    interrupts are not used in this section but can be found in the datasheet. STAT is the status register indicating, for example, whether the receive register is full. Again, the details of this register are also fully described in the PIC32 datasheet. The serial clock can be configured to toggle at half the peripheral clock rate or less. SPI has no theoretical limit on the read more..

  • Page - 544

    Example 8.19 SENDING AND RECEIVING BYTES OVER SPI Design a system to communicate between a PIC® master and an FPGA slave over SPI. Sketch a schematic of the interface. Write the C code for the microcon- troller to send the character 'A' and receive a character back. Write HDL code for an SPI slave on the FPGA. How could the slave be simplified if it only needs to read more..

  • Page - 545

    int main(void) { char received; initspi(); // initialize the SPI port received = spi_send_receive('A'); // send letter A and receive byte // back from slave } The HDL code for the FPGA is listed below and a block diagram with timing is shown in actionGoTo:505,Figure actionGoTo:505,8.38. The FPGA uses a shift register to hold the bits that have been received from the master and read more..

  • Page - 546

    logic [2:0] cnt; logic qdelayed; // 3-bit counter tracks when full byte is transmitted always_ff @(negedge sck, posedge reset) if (reset) cnt = 0; else cnt = cnt + 3 ’ b1; // loadable shift register // loads d at the start, shifts sdi into bottom on each step always_ff @(posedge sck) q <= (cnt == 0) ? {d[6:0], sdi} : {q[6:0], sdi}; // align sdo to falling edge of read more..

  • Page - 547

    as RS-232 and RS-485. For example, computer serial ports use the RS- 232C standard, introduced in 1969 by the Electronic Industries Associa- tion. The standard originally envisioned connecting Data Terminal Equipment (DTE) such as a mainframe computer to Data Communica- tion Equipment (DCE) such as a modem. Although a UART is relatively slow compared to SPI, the standards have been read more..

  • Page - 548

    the total collection of data and parity has an even number of 1 ’s; in other words, the parity bit is the XOR of the data bits. The receiver can then check if an even number of 1 ’s was received and signal an error if not. Odd parity is the reverse and is hence the XNOR of the data bits. A common choice is 8 data bits, no parity, and 1 stop bit, read more..

  • Page - 549

    RS-232 represents a 0 electrically with 3 to 15 V and a 1 with −3to −15 V; this is called bipolar signaling. A transceiver converts the digital logic levels of the UART to the positive and negative levels expected by RS-232, and also provides electrostatic discharge protection to protect the serial port from damage when the user plugs in a cable. The MAX3232E is a read more..

  • Page - 550

    Example 8.20 SERIAL COMMUNICATION WITH A PC Develop a circuit and a C program for a PIC32 to communicate with a PC over a serial port at 115200 baud with 8 data bits, 1 stop bit, and no parity. The PC should be running a console program such as PuTTY actionGoTo:550,3 to read and write over the serial port. The program should ask the user to type a string. It read more..

  • Page - 551

    The code is listed below. The Enter key in the terminal program corresponds to a carriage return character called '\r' in C with an ASCII code of 0x0D. To advance to the beginning of the next line when printing, send both the '\n ' and '\r ' (new line and carriage return) actionGoTo:551,characters. actionGoTo:551,4 The functions to initialize, read, and write the serial port read more..

  • Page - 552

    return U2RXREG; // return character received from // serial port } void getstrserial(char *str) { int i = 0; do { // read an entire string until detecting str[i] = getcharserial(); // carriage return } while (str[i++] != ’\r’); // look for carriage return str[i −1] = 0; // null-terminate the string } void putcharserial(char c) { while (U2STAbits.UTXBF); // wait until transmit buffer read more..

  • Page - 553

    Each timer is associated with three 16-bit registers: TxCON, TMRx, and PRx. For example, T1CON is the control register for Timer 1. CON is the control register. TMR contains the current time count. PR is the period register. When a timer reaches the specified period, it rolls back to 0 and sets the TxIF bit in the IFS0 interrupt flag register. A program can poll this read more..

  • Page - 554

    void delaymillis(int millis) { while (millis--) delaymicros(1000); // repeatedly delay 1 ms until done } Another handy timer feature is gated time accumulation, in which the timer only counts while an external pin is high. This allows the timer to measure the duration of an external pulse. It is enabled using the CON register. 8.6.5 Interrupts Timers are often used in conjunction read more..

  • Page - 555

    the timer ISR would be invoked, and finally the microcontroller would return to the main program. The subpriority is in the range of 0 –3. If two events with the same prior- ity are simultaneously pending, the one with the higher subpriority will be handled first. However, the subpriority will not cause a new interrupt to pre- empt an interrupt of the same priority read more..

  • Page - 556

    ADC V DD V ref − V in(t ) V out (t ) DAC X [n] V DD V ref (b) (a) N N clk V ref + X [n] Figure 8.45 ADC and DAC symbols void _ _attribute_ _((interrupt(IPL7))) _ _attribute_ _((vector(4))) blinkISR(void) { PORTDbits.RD0 = !PORTDbits.RD0; // toggle the LED IFS0bits.T1IF = 0; // clear the interrupt flag return; } void initTimer1Interrupt(void) { T1CONbits.ON = 0; // turn timer off TMR1 = read more..

  • Page - 557

    Note that the ADC has a confusing use of the terms sampling rate and sampling time. The sampling time, also called the acquisition time,is the amount of time necessary for the input to settle before conversion takes place. The sampling rate is the number of samples taken per second. It is at most 1/(sampling time + 12 TAD). For example, an input voltage of 2.5 V (half of read more..

  • Page - 558

    be at least 65 ns for correct operation. It is set as a multiple of the peripheral clock period TPB using the ADCS bits according to the relation: TAD = 2TPB ðADCS + 1 Þ (8.7) Hence, for peripheral clocks up to 30 MHz, the ADCS bits can be left at their default value of 0. The sampling time is the amount of time necessary for a new input to stabilize before its read more..

  • Page - 559

    Some DACs accept the N-bit digital input on a parallel interface with N wires, while others accept it over a serial interface such as SPI. Some DACs require both positive and negative power supply voltages, while others operate off of a single supply. Some support a flexible range of supply voltages, while others demand a specific voltage. The input logic levels should be read more..

  • Page - 560

    frequency of 605 Hz (38.7 Ksamples/sec) is set by the time to send each point in the genwaves function, of which the SPI transmission is a major component. #include <P32xxxx.h> #include <math.h> // required to use the sine function #define NUMPTS 64 int sine[NUMPTS], triangle[NUMPTS]; void initio(int freq) { // freq can be 5-605 Hz TRISD = 0xFF00; // make the bottom 8 read more..

  • Page - 561

    else triangle[i] = 510-i*511/NUMPTS; } } void genwaves(void) { int i; while (1) { for (i=0; i<NUMPTS; i++) { IFS0bits.T1IF = 0; // clear timer overflow flag PORTFbits.RF0 = 1; // disable load while inputs are changing SPI2BUF = sine[i]; // send current points to the DACs PORTD = triangle[i]; while (SPI2STATbits.SPIBUSY); // wait until transfer completes PORTFbits.RF0 = 0; // load new read more..

  • Page - 562

    OCxR, and OCxRS. CON is the control register. The OCM bits of the CON register should be set to 1102 to activate PWM mode, and the ON bit should be enabled. By default, the output compare uses Timer2 in 16-bit mode, but the OCTSEL and OC32 bits can be used to select Timer3 and/or 32-bit mode. In PWM mode, RS sets the duty cycle, the timer ’speriodregister PR sets read more..

  • Page - 563

    links, and motor control. Standard communication interfaces including USB and Ethernet are described in actionGoTo:584,Sections actionGoTo:584,8.7.1 actionGoTo:584,and actionGoTo:584,8.7.4. Character LCDs A character LCD is a small liquid crystal display capable of showing one or a few lines of text. They are commonly used in the front panels of appliances such as cash registers, laser read more..

  • Page - 564

    ▶ Wait > 4100 μs ▶ Write 0x30 to set 8-bit mode again ▶ Wait > 100 μs ▶ Write 0x30 to set 8-bit mode yet again ▶ Wait until busy flag is clear ▶ Write 0x3C to set 2 lines and 5 × 8 dot font ▶ Wait until busy flag is clear ▶ Write 0x08 to turn display OFF ▶ Wait until busy flag is clear ▶ Write 0x01 to clear the display ▶ Wait > 1530 read more..

  • Page - 565

    ▶ Write 0x06 to set entry mode to increment cursor after each character ▶ Wait until busy flag is clear ▶ Write 0x0C to turn display ON with no cursor Then, to write text to the LCD, the microcontroller can send a sequence of ASCII characters. It may also send the instructions 0x01 to clear the display or 0x02 to return to the home position in the upper left. read more..

  • Page - 566

    char lcdprintstring(char *str) { while(*str != 0) { // loop until null terminator lcdwrite(*str, DATA); // print this character lcdbusywait(); str++; // advance pointer to next character in string } } void lcdclear(void) { lcdwrite(0x01, INSTR); // clear display delaymicros(1530); // wait for execution } void initlcd(void) { // set LCD control pins TRISC = 0x1FFF; // PORTC[15:13] are outputs, read more..

  • Page - 567

    In a cathode ray tube, an electron gun scans across the screen from left to right exciting fluorescent material to display an image. Color CRTs use three different phosphors for red, green, and blue, and three electron beams. The strength of each beam determines the intensity of each color in the pixel. At the end of each scanline, the gun must turn off for a hor- read more..

  • Page - 568

    The horizontal timing involves a front porch of 24 clocks, hsync pulse of 136 clocks, and back porch of 160 clocks. The vertical timing involves a front porch of 3 scan lines, vsync pulse of 6 lines, and back porch of 29 lines. actionGoTo:568,Figure actionGoTo:568,8.52 shows the pinout for a female connector coming from a video source. Pixel information is conveyed with three read more..

  • Page - 569

    an FPGA driving a VGA monitor through an ADV7125 triple 8-bit video DAC. The DAC receives 8 bits of R, G, and B from the FPGA. It also receives a SYNC_b signal that is driven active low whenever HSYNC or VSYNC are asserted. The video DAC produces three output currents to drive the red, green, and blue analog lines, which are normally 75 Ω trans- mission lines parallel read more..

  • Page - 570

    logic [9:0] x, y; // Use a PLL to create the 25.175 MHz VGA pixel clock // 25.175 MHz clk period = 39.772 ns // Screen is 800 clocks wide by 525 tall, but only 640 x 480 used for display // HSync = 1/(39.772 ns * 800) = 31.470 KHz // Vsync = 31.474 KHz / 525 = 59.94 Hz (~60 Hz refresh rate) pll vgapll(.inclk0(clk), .c0(vgaclk)); // generate monitor timing read more..

  • Page - 571

    // counters for horizontal and vertical positions always @(posedge vgaclk) begin x++; if (x == HMAX) begin x= 0; y++; if (y == VMAX) y = 0; end end // compute sync signals (active low) assign hsync = ~(hcnt >= HACTIVE + HFP & hcnt < HACTIVE + HFP + HSYN); assign vsync = ~(vcnt >= VACTIVE + VFP & vcnt < VACTIVE + VFP + VSYN); assign sync_b = hsync read more..

  • Page - 572

    Bluetoothisnamed forKing Harald Bluetooth of Denmark, a 10th century monarch who unified the warring Danish tribes. This wireless standard was only partially successful at unifying a host of competing wireless protocols! Table 8.10 Bluetooth classes Class Transmitter Power (mW) Range (m) 1 100 100 2 2.5 10 31 5 charrom.txt // A ASCII 65 011100 100010 100010 111110 100010 100010 100010 000000 //B ASCII read more..

  • Page - 573

    range and power consumption. In the basic rate mode, it operates at 1 Mbit/ sec using Gaussian frequency shift keying (FSK). In ordinary FSK, each bit is conveyed by transmitting a frequency of fc ± fd,where fc is the center fre- quency of the channel and fd is an offset of at least 115 kHz. The abrupt transition in frequencies between bits consumes extra bandwidth. In read more..

  • Page - 574

    (b) (c) Brushes Shaft Power lugs (a) Stator north pole Stator south pole Rotor electromagnet Commutator Figure 8.58 DC motor if the user wants to know the current position of the motor. Servo motors accept a pulse-width modulated signal to specify their position over a lim- ited range of angles. They are very easy to interface, but are not as power- ful and are not suited to read more..

  • Page - 575

    Therefore, many H-bridges also have protection diodes in parallel with the switches, as shown in actionGoTo:575,Figure actionGoTo:575,8.59(b). If the inductive kick drives either terminal of the motor above Vmotor or below ground, the diodes will turn ON and clamp the voltage at a safe level. H-bridges can dissipate large amounts of power, and a heat sink may be necessary to keep read more..

  • Page - 576

    control a motor. The microcontroller drives the enable signals with a PWM signal to control the speed of the motors. It drives the four other pins to control the direction of each motor. The PWM is configured to work at about 781 Hz with a duty cycle ranging from 0 to 100%. #include <P32xxxx.h> void setspeed(int dutycycle) { OC1RS = dutycycle; // set duty cycle between read more..

  • Page - 577

    In the previous example, there is no way to measure the position of each motor. Two motors are unlikely to be exactly matched, so one is likely to turn slightly faster than the other, causing the robot to veer off course. To solve this problem, some systems add shaft encoders. actionGoTo:577,Figure actionGoTo:577,8.61(a) shows a simple shaft encoder consisting of a disk with read more..

  • Page - 578

    remote-control model airplanes and small robots because they are small, light, and convenient. Finding a motor with an adequate datasheet can be difficult. The center pin with a red wire is normally power, and the black or brown wire is normally ground. Example 8.29 SERVO MOTOR Design a system in which a PIC32 microcontroller drives a servo motor to a desired angle. Solution: read more..

  • Page - 579

    speed backward. A continuous rotation servo may be more convenient and less expensive than a simple DC motor combined with an H-bridge and gear train. Stepper Motor A stepper motor advances in discrete steps as pulses are applied to alter- nate inputs. The step size is usually a few degrees, allowing precise posi- tioning and continuous rotation. Small stepper motors generally read more..

  • Page - 580

    applied to the coil. The current ramps up to I = V/R with a time constant set by L/R, as shown in actionGoTo:581,Figure actionGoTo:581,8.68(a). This works well for slow speed operation. However, at higher speed, the current doesn ’t have enough time to ramp up to the full level, as shown in actionGoTo:581,Figure actionGoTo:581,8.68(b), and the torque drops off. (a) Wave drive A read more..

  • Page - 581

    A more efficient way to drive a stepper motor is by pulse-width mod- ulating a higher voltage. The high voltage causes the current to ramp up to full current more rapidly, then it is turned off (PWM) to avoid over- loading the motor. The voltage is then modulated or chopped to maintain the current near the desired level. This is called chopper constant current drive and read more..

  • Page - 582

    Example 8.30 BIPOLAR STEPPER MOTOR DIRECT WAVE DRIVE Design a system in which a PIC32 microcontroller drives an AIRPAX bipolar stepper motor at a specified speed and direction using direct drive. Solution: actionGoTo:582,Figure actionGoTo:582,8.69 shows a bipolar stepper motor being driven directly by an H-bridge controlled by a PIC32. The spinstepper function initializes the sequence array read more..

  • Page - 583

    T1CONbits.ON = 0; // turn Timer1 off T1CONbits.TCKPS = 3; // prescale by 256 to run slower } void spinstepper(int dir, int steps, float rpm) { // dir = 0 for forward, 1 = reverse { int sequence[4] = {0b00011, 0b01001, 0b00101, 0b10001}; // wave drive sequence int step; PR1 = (int)(20.0e6/(256*(360.0/STEPSIZE)*(rpm/60.0))); // time/step w/ 20 MHz peripheral clock TMR1 = 0; T1CONbits.ON read more..

  • Page - 584

    around parallel links consisting of a wide data bus and a clock signal. As data rates increased, the difference in delay among the wires in the bus set a limit to how fast the bus could run. Moreover, busses con- nected to multiple devices suffer from transmission line problems such as reflections and different flight times to different loads. Noise can also corrupt the read more..

  • Page - 585

    peripherals by standardizing the cables and software configuration process. Billions of USB peripherals are now sold each year. USB 1.0 was released in 1996. It uses a simple cable with four wires: 5 V, GND, and a differential pair of wires to carry data. The cable is impossible to plug in backward or upside down. It operates at up to 12 Mb/s. A device can pull up to read more..

  • Page - 586

    8.7.3 DDR3 Memory DRAM connects to the microprocessor over a parallel bus. In 2012, the present standard is DDR3, a third generation of double-data rate memory bus operating at 1.5 V. Typical motherboards now come with two DDR3 channels so they can access two banks of memory modules simultaneously. actionGoTo:586,Figure actionGoTo:586,8.71 shows a 4 GB DDR3 dual inline memory module read more..

  • Page - 587

    bands at low power. actionGoTo:587,Table actionGoTo:587,8.12 summarizes the capabilities of three gen- erations of Wi-Fi; the emerging 802.11ac standard promises to push wire- less data rates beyond 1 Gb/s. The increasing performance comes from advancing modulation and signal processing, multiple antennas, and wider signal bandwidths. 8.7.5 SATA Internal hard disks require a fast interface to read more..

  • Page - 588

    commonly available as USB devices, making them easy to install. National Instruments (NI) is a leading DAQ manufacturer. High-performance DAQ prices tend to run into the thousands of dollars, mostly because the market is small and has limited competition. For- tunately, as of 2012, NI now sells their handy myDAQ system at a student discount price of $200 including their LabVIEW read more..

  • Page - 589

    and, at the other end, an SPI interface operating at up to 30 Mb/s, along with 3.3 V power and four general purpose I/O pins. actionGoTo:589,Figure actionGoTo:589,8.75 shows an example of connecting a PC to an FPGA using the cable. The cable can optionally supply 3.3 V power to the FPGA. The three SPI pins con- nect to an FPGA slave device like the one from read more..

  • Page - 590

    has scarcely diminished. The delay from when the processor sends an address to main memory until the memory returns the data can now exceed 100 processor clock cycles. Therefore, caches with a low miss rate are essential to good performance. actionGoTo:590,Table actionGoTo:590,8.13 summarizes the evolution of cache systems on Intel x86 processors. The 80486 introduced a unified read more..

  • Page - 591

    to the processor to improve its latency and throughput. The Pentium Pro was packaged in a multichip module (MCM) containing both the processor chip and a second-level cache chip, as shown in actionGoTo:591,Figure actionGoTo:591,8.77. Like the Pentium, the processor had separate 8-KB level 1 instruction and data caches. However, these caches were nonblocking,sothat the out-of-order processor read more..

  • Page - 592

    The Pentium 4 offered a nonblocking level 1 data cache. It switched to a trace cache to store instructions after they had been decoded into micro-ops, avoiding the delay of redecoding each time instructions were fetched from the cache. The Pentium M design was adapted from the Pentium III. It further increased the level 1 caches to 32 KB each and featured a 1- to 2-MB read more..

  • Page - 593

    locations. x86 uses programmed I/O, in which special IN and OUT instructions are used to read and write I/O devices. x86 defines 2 16 I/O ports. The IN instruction reads one, two, or four bytes from the port specified by DX into AL, AX, or EAX. OUT is similar, but writes the port. Connecting a peripheral device to a programmed I/O system is similar to connecting it to read more..

  • Page - 594

    program a MIPS processor in its native assembly language and how to build the processor and memory system using digital building blocks. Throughout, you have seen the application of abstraction, discipline, hier- archy, modularity, and regularity. With these techniques, we have pieced together the puzzle of a microprocessor ’s inner workings. From cell phones to digital television to read more..

  • Page - 595

    Exercises Exercise 8.1 In less than one page, describe four everyday activities that exhibit temporal or spatial locality. List two activities for each type of locality, and be specific. Exercise 8.2 In one paragraph, describe two short computer applications that exhibit temporal and/or spatial locality. Describe how. Be specific. Exercise 8.3 Come up with a sequence of addresses for read more..

  • Page - 596

    Exercise 8.8 A cache has the following parameters: b, block size given in numbers of words; S, number of sets; N, number of ways; and A, number of address bits. (a) In terms of the parameters described, what is the cache capacity, C? (b) In terms of the parameters described, what is the total number of bits required to store the tags? (c) What are S and N for a read more..

  • Page - 597

    (a) If you use a direct mapped cache with a cache size of 1 KB and a block size of 8 bytes (2 words), how many sets are in the cache? (b) With the same cache and block size as in part (a), what is the miss rate of the direct mapped cache for the given memory access pattern? (c) For the given memory access pattern, which of the following would decrease the read more..

  • Page - 598

    storage, address comparison, data output selection, and any other parts you feel are relevant. Note that the multiplexer and comparator blocks may be any size (n or p bits wide, respectively), but the SRAM blocks must be 16K × 4 bits. Be sure to include a neatly labeled block diagram. You need only design the cache for reads. Exercise 8.14 You ’ve joined a hot new read more..

  • Page - 599

    (a) For a given word in memory, what is the total number of locations in which it might be found in the on-chip cache and in the second-level cache? (b) What is the size, in bits, of each tag for the on-chip cache and the second-level cache? (c) Give an expression for the average memory read access time. The caches are accessed in sequence. (d) Measurements show that, read more..

  • Page - 600

    Exercise 8.17 Repeat Exercise 8.16 with the following parameters. (a) The instruction cache is perfect (i.e., always hits) but the data cache has a 15% miss rate. On a cache miss, the processor stalls for 200 ns to access main memory, then resumes normal operation. Taking cache misses into account, what is the average memory access time? (b) How many clock cycles per read more..

  • Page - 601

    you use a page table to store mappings (translations from virtual page num- ber to physical page number). How many page table entries will the page table contain? (g) Assume that, in addition to the physical page number, each page table entry also contains some status information in the form of a valid bit (V) and a dirty bit (D). How many bytes long is each page read more..

  • Page - 602

    how often the requested entry is not found. The main memory miss rate indicates how often page faults occur. (a) What is the average memory access time of the virtual memory system before and after adding the TLB? Assume that the page table is always resident in physical memory and is never held in the data cache. (b) If the TLB has 64 entries, how big (in bits) is read more..

  • Page - 603

    (c) The page table is to be integrated on the processor chip, along with the on-chip cache. The on-chip cache deals only with physical (not virtual) addresses. Is it possible to access the appropriate set of the on-chip cache concurrently with the page table access for a given memory access? Explain briefly the relationship that is necessary for concurrent access to the cache read more..

  • Page - 604

    (b) Draw a schematic for this memory-mapped I/O system. (c) Write HDL code to implement the address decoder for your memory-mapped I/O system. Exercise 8.30 Repeat actionGoTo:603,Exercise actionGoTo:603,8.29 for the FSM in actionGoTo:535,Figure actionGoTo:535,3.30(a). The input A and output Y are memory-mapped to bits 0 and 1, respectively, of address 0xFFFFF040. Exercises 579 read more..

  • Page - 605

    Interview Questions The following exercises present questions that have been asked on interviews. Question 8.1 Explain the difference between direct mapped, set associative, and fully associative caches. For each cache type, describe an application for which that cache type will perform better than the other two. Question 8.2 Explain how virtual memory systems work. Question 8.3 Explain the read more..

  • Page - 606

    This page intentionally left blank read more..

  • Page - 607

    This page intentionally left blank read more..

  • Page - 608

    A Digital System Implementation A.1 INTRODUCTION This appendix introduces practical issues in the design of digital systems. The material is not necessary for understanding the rest of the book, how- ever, it seeks to demystify the process of building real digital systems. Moreover, we believe that the best way to understand digital systems is to build and debug them yourself in read more..

  • Page - 609

    A.2.1 Logic Gates actionGoTo:610,Figure actionGoTo:610,A.1 shows the pinout diagrams for a variety of popular 74xx-series chips containing basic logic gates. These are sometimes called small-scale integration (SSI) chips, because they are built from a few transistors. The 14-pin packages typically have a notch at the top or a dot on the top left to indicate orientation. Pins are read more..

  • Page - 610

    D Q Q 1 2 3 4 5 6 7 14 13 12 11 10 9 8 GND VDD 1A 1B 1Y 2A 2B 2Y 4B 4A 4Y 3B 3A 3Y 7400 NAND 1 2 3 4 5 6 7 14 13 12 11 10 9 8 GND VDD 1Y 1A 1B 2Y 2A 2B 4Y 4B 4A 3Y 3B 3A 7402 NOR 1 2 3 4 5 6 7 14 13 12 11 10 9 8 GND VDD 1A 1Y 2A 2Y 3A 3Y 6A 6Y 5A 5Y 4A 4Y 7404 NOT 1 2 3 4 5 6 7 14 13 12 11 10 9 8 GND VDD 1A 1B 2A 2B 2C 2Y 1C 1Y 3C 3B 3A 3Y 7411 AND3 1 2 3 4 5 6 7 read more..

  • Page - 611

    CLR CLK D0 D1 D2 D3 ENP GND 74161 /163 Counter VDD Q0 RCO LOAD ENT 1 2 3 4 5 6 7 8 16 15 14 13 12 11 10 9 Q1 Q2 Q3 always @(posedge CLK) // 74163 if (~CLRb) Q <= 4'b0000; else if (~LOADb) Q <= D; else if (ENP & ENT) Q <= Q+1; assign RCO = (Q == 4'b1111) & ENT; 4-bit Counter CLK: clock Q3:0: counter output D3:0: parallel input CLRb: async reset read more..

  • Page - 612

    4-bit ALU A 3:0,B3:0 : Y 3:0 : output F 3:0 : function select M : mode select Cb n : carry in Cb nplus4 : carry out AeqB : equality (in some modes) X,Y : carry lookahead adder outputs B0 A0 S3 S2 S1 S0 C n M F0 F1 F2 GND 74181 ALU VDD A1 B1 A2 B2 A3 B3 Y C n+4 X A=B F3 1 2 3 4 5 6 7 8 9 10 11 12 24 23 22 21 20 19 18 17 16 15 14 13 D1 D2 LT RBO RBI D3 D0 GND 7447 7 - read more..

  • Page - 613

    replacing the contents of the PROM rather than rewiring connections between chips. Lookup tables are useful for small functions but become prohibitively expensive as the number of inputs grows. For example, the classic 2764 8-KB (64-Kb) erasable PROM (EPROM) is shown in actionGoTo:613,Figure actionGoTo:613,A.4. The EPROM has 13 address lines to specify one of the 8K words and 8 data read more..

  • Page - 614

    is one of the most popular classic PLDs. It has 12 dedicated input pins and 10 outputs. The outputs can come directly from the PLA or from clocked registers on the chip. The outputs can also be fed back into the PLA. Thus, the 22V10 can directly implement FSMs with up to 12 inputs, 10 outputs, and10bits ofstate.The 22V10costs about$2inquantities of100.PLDs have been rendered read more..

  • Page - 615

    turned on. The FPGA can download this information from a computer in the laboratory, or can read it from a nonvolatile ROM when power is first applied. Example A.1 FPGA TIMING ANALYSIS Alyssa P. Hacker is using an FPGA to implement an M &M sorter with a color sensor and motors to put red candy in one jar and green candy in another. Her design is implemented as an read more..

  • Page - 616

    The cost has declined at approximately 30% per year, so FPGAs are becoming extremely popular. A.4 APPLICATION-SPECIFIC INTEGRATED CIRCUITS Application-specific integrated circuits (ASICs) are chips designed for a particular purpose. Graphics accelerators, network interface chips, and cell phone chips are common examples of ASICS. The ASIC designer places transistors to form logic gates and read more..

  • Page - 617

    the data book has additional explanatory information. This information can usually be found on the Web with a careful search. This section dissects the Texas Instruments (TI) data sheet for a 74HC04 inverter chip. The data sheet is relatively simple but illustrates many of the major elements. TI still manufacturers a wide variety of 74xx-series chips. In the past, many other read more..

  • Page - 618

    SCLS078D – DECEMBER 1982 – REVISED JULY 2003 POST OFFICE BOX 655303 • DALLAS, TEXAS 75265 Wide Operating Voltage Range of 2 V to 6 V Outputs Can Drive Up To 10 LSTTL Loads Low Power Consumption, 20- μA Max I CC Typical tpd = 8 ns ±4-mA Output Drive at 5 V Low Input Current of 1 μA Max 1 2 3 4 5 6 7 14 13 12 11 10 9 8 1A 1Y 2A 2Y 3A 3Y GND VCC 6A 6Y 5A 5Y 4A 4Y read more..

  • Page - 619

    SN54HC04, SN74HC04 HEX INVERTERS SCLS078D – DECEMBER 1982 – REVISED JULY 2003 POST OFFICE BOX 655303 • DALLAS, TEXAS 75265 logic diagram (positive logic) Y A absolute maximum ratings over operating free-air temperature range (unless otherwise noted)† Supply voltage range, VCC –0.5 V to 7 V .......................................................... Input clamp current, IIK (VI < 0 or read more..

  • Page - 620

    to 50 °C + 0.02 W × 80 °C/W = 51.6 °C. Internal power dissipation is sel- dom important for 74xx-series chips, but it becomes important for mod- ern chips that dissipate tens of watts or more. The recommended operating conditions define the environment in which the chip should be used. Within these conditions, the chip should meet specifications. These conditions are more read more..

  • Page - 621

    SN54HC04, SN74HC04 HEX INVERTERS SCLS078D – DECEMBER 1982 – REVISED JULY 2003 POST OFFICE BOX 655303 • DALLAS, TEXAS 75265 TEXAS INSTRUMENTS electrical characteristics over recommended operating free-air temperature range (unless otherwise noted) PARAMETER TEST CONDITIONS V CC TA = 25 °C SN54HC04 SN74HC04 UNIT MIN TYP MAX MIN MAX MIN MAX UNIT 2V 1.9 1.998 1.9 1.9 I OH = –20 μA I read more..

  • Page - 622

    A.6 LOGIC FAMILIES The 74xx-series logic chips have been manufactured using many different technologies, called logic families, that offer different speed, power, and logic level trade-offs. Other chips are usually designed to be compatible with some of these logic families. The original chips, such as the 7404, were built using bipolar transistors in a technology called read more..

  • Page - 623

    asymmetric logic levels. They conform to the “TTL” logic levels: VIH = 2V, VIL = 0.8 V, VOH > 2.4 V, and VOL < 0.5 V. As CMOS circuits matured in the 1980s and 1990s, they became popular because they draw very little power supply or input current. The High Speed CMOS (HC) and Advanced High Speed CMOS (AHC) families draw almost no static power. They also deliver read more..

  • Page - 624

    A.7 PACKAGING AND ASSEMBLY Integrated circuits are typically placed in packages made of plastic or ceramic. The packages serve a number of functions, including connecting the tiny metal I/O pads of the chip to larger pins in the package for ease of connection, protecting the chip from physical damage, and spreading the heat generated by the chip over a larger area to help read more..

  • Page - 625

    sides, with 0.05-inch spacing. They can be soldered directly to a board or placed in special sockets. Quad flat packs (QFPs) accommodate a large number of pins using closely spaced legs on all four sides. Ball grid arrays (BGAs) eliminate the legs altogether. Instead, they have hundreds of tiny solder balls on the underside of the package. They are carefully placed over read more..

  • Page - 626

    Printed Circuit Boards Instead of breadboarding, chip packages may be soldered to a printed cir- cuit board (PCB). The PCB is formed of alternating layers of conducting copper and insulating epoxy. The copper is etched to form wires called traces. Holes called vias are drilled through the board and plated with metal to connect between layers. PCBs are usually designed with compu- read more..

  • Page - 627

    PCB traces are normally made of copper because of its low resistance. The traces are embedded in an insulating material, usually a green, fire- resistant plastic called FR4. A PCB also typically has copper power and ground layers, called planes, between signal layers. actionGoTo:627,Figure actionGoTo:627,A.13 shows a cross-section of a PCB. The signal layers are on the top and read more..

  • Page - 628

    assumption is good enough. When the wire is long or the signal is very fast, the transmission time along the wire becomes important to accurately determine the circuit delay. We must model such wires as transmission lines, in which a wave of voltage and current propagates at the speed of light. When the wave reaches the end of the line, it may reflect back along the read more..

  • Page - 629

    The speed of light in free space is v = c= 3 × 10 8 m/s. Signals in a PCB travel at about half this speed, because the FR4 insulator has four times the permittivity of air. Thus, PCB signals travel at about 1.5 × 10 8 m/s, or 15 cm/ns. The time delay for a signal to travel along a transmission line of length l is td = l v (A.4) The characteristic impedance of read more..

  • Page - 630

    switch that closes at time t = 0. The other end is connected to the 50 Ω matched load. This section analyzes the voltages and currents at points A, B, and C —at the beginning of the line, one-third of the length along the line, and at the end of the line, respectively. actionGoTo:630,Figure actionGoTo:630,A.17 shows the voltages at points A, B,and Covertime. Initially, read more..

  • Page - 631

    Z0, because the load at the end of the line cannot possibly influence the behavior of the line until a speed of light delay has elapsed. Hence, by the voltage divider equa- tion, the incident voltage flowing down the line is V = VS Z0 Z0 + ZS = VS 2 (A.6) Thus, at t = 0, a wave of voltage, V = VS 2 , is sent down the line from point A. Again, the signal read more..

  • Page - 632

    the wave is absorbed by the source termination impedance that matches the characteristic impedance of the line. Thus, the system reaches steady state at time t = 2td, and the transmission line becomes equivalent to an equipoten- tial wire with voltage VS and current I = 0. A.8.3 Short Termination actionGoTo:632,Figure actionGoTo:632,A.24 shows a transmission line terminated with a read more..

  • Page - 633

    a transmission line of characteristic impedance Z0 reaches a termination impedance ZT at the end of the line, the reflection coefficient is kr = ZT − Z0 ZT + Z0 (A.7) Note a few special cases. If the termination is an open circuit (ZT = ∞), kr = 1, because the incident wave is entirely reflected (so the current out the end of the line remains zero). If the termination read more..

  • Page - 634

    The bounce diagram shown in actionGoTo:634,Figure actionGoTo:634,A.30 helps visualize reflections off both ends of the transmission line. The horizontal axis represents dis- tance along the transmission line, and the vertical axis represents time, increasing downward. The two sides of the bounce diagram represent the source and load ends of the transmission line, points A and C. The read more..

  • Page - 635

    against time. As t approaches infinity, the voltages approach steady state with VA = VB = VC = VS. A.8.5 When to Use Transmission Line Models Transmission line models for wires are needed whenever the wire delay, td, is longer than a fraction (e.g., 20%) of the edge rates (rise or fall times) of a signal. If the wire delay is shorter, it has an insignificant effect on read more..

  • Page - 636

    with the load (between the input of the receiver gate and ground). When the driver switches from 0 to VDD, it sends a wave with voltage VDD down the line. The wave is absorbed by the matched load termination, and no reflections take place. In series termination, a source resistor ZS is placed in series with the driver to raise the source impedance to Z0. The load has read more..

  • Page - 637

    until the reflection returns. If other gates are attached to the middle of the line, they will momentarily see an illegal logic level. Therefore, series termination works best for point-to-point communication with a single driver and a single receiver. Parallel termination is better for a bus with multiple receivers, because receivers at the middle of the line never see an illegal read more..

  • Page - 638

    Using the impedances of an inductor and a capacitor, we rewrite the relationship of actionGoTo:638,Figure actionGoTo:638,A.34 in equation form: Z0 = j ωLdx + ½Z0jjð1=ðjωCdxÞފ (A.11) Rearranging, we get Z2 0ðjωCÞ − j ωL + ω2Z0LCdx = 0 (A.12) Taking the limit as dx approaches 0, the last term vanishes and we find that Z0 = ffiffiffiffi L C r (A.13) A.8.8 Derivation of the read more..

  • Page - 639

    Using Ohm ’s law and substituting for IL, Ii, and Ir in actionGoTo:638,Equation actionGoTo:638,A.15, we get Vi + Vr ZL = Vi Z0 − Vr Z0 (A.16) Rearranging, we solve for the reflection coefficient, kr: Vr Vi = ZL − Z0 ZL + Z0 = kr (A.17) A.8.9 Putting It All Together Transmission lines model the fact that signals take time to propagate down long wires because the speed of light read more..

  • Page - 640

    When the first gate switches, a wave of voltage is driven onto the transmission line. The source impedance and transmission line form a vol- tage divider, so the voltage of the incident wave is Vi = VS Z0 Z0 + ZS (A.18) At time td, the wave reaches the end of the line. Part is absorbed by the load impedance, and part is reflected. The reflection coefficient kr indicates read more..

  • Page - 641

    development kit to design and test the FPGA costs $1500. Each FPGA costs $17. The ASIC costs $600,000 for a mask set and $4 per chip. Regardless of what chip implementation he chooses, Ben needs to mount the pack- aged chip on a printed circuit board (PCB), which will cost him $1.50 per board. He thinks he can sell 1000 devices per month. Ben has coerced a team of read more..

  • Page - 642

    Solving the equation gives N = 46,039 units, or 1919 units per month. He would need to almost double his monthly sales to benefit from the ASIC solution. Example A.5 BEN GETS LESS GREEDY Ben realizes that his eyes have gotten too big for his stomach, and he doesn ’t think he can sell more than 1000 devices per month. But he does think the product life can be longer read more..

  • Page - 643

    This page intentionally left blank read more..

  • Page - 644

    B MIPS Instructions This appendix summarizes MIPS instructions used in this book. actionGoTo:645,Tables actionGoTo:645,B.1 actionGoTo:645,– actionGoTo:645,B.3 define the opcode and funct fields for each instruction, along with a short description of what the instruction does. The following notations are used: ▶ [reg]: contents of the register ▶ imm: 16-bit immediate field of the I-type read more..

  • Page - 645

    Table B.1 Instructions, sorted by opcode Opcode Name Description Operation 000000 (0) R-type all R-type instructions see actionGoTo:646,Table actionGoTo:646,B.2 000001 (1) (rt = 0/1) bltz rs, label / bgez rs, label branch less than zero/branch greater than or equal to zero if ([rs] < 0) PC = BTA/ if ([rs] ≥ 0) PC = BTA 000010 (2) j label jump PC = JTA 000011 (3) jal label jump and read more..

  • Page - 646

    Table B.1 Instructions, sorted by opcode —Cont’d Opcode Name Description Operation 101000 (40) sb rt, imm(rs) store byte [Address]7:0 = [rt]7:0 101001 (41) sh rt, imm(rs) store halfword [Address]15:0 = [rt]15:0 101011 (43) sw rt, imm(rs) store word [Address] = [rt] 110001 (49) lwc1 ft, imm(rs) load word to FP coprocessor 1 [ft] = [Address] 111001 (56) swc1 ft, imm(rs) store word to FP read more..

  • Page - 647

    Table B.2 R-type instructions, sorted by funct field —Cont’d Funct Name Description Operation 100000 (32) add rd, rs, rt add [rd] = [rs] + [rt] 100001 (33) addu rd, rs, rt add unsigned [rd] = [rs] + [rt] 100010 (34) sub rd, rs, rt subtract [rd] = [rs] – [rt] 100011 (35) subu rd, rs, rt subtract unsigned [rd] = [rs] – [rt] 100100 (36) and rd, rs, rt and [rd] = [rs] & read more..

  • Page - 648

    C C Programming C.1 INTRODUCTION The overall goal of this book is to give a picture of how computers work on many levels, from the transistors by which they are constructed all the way up to the software they run. The first five chapters of this book work up through the lower levels of abstraction, from transistors to gates to logic design. actionGoTo:320,Chapters read more..

  • Page - 649

    ▶ Moderate level of abstraction providing higher productivity than assembly language, yet giving the programmer a good understanding of how the code will be executed ▶ Suitability for generating high performance programs ▶ Ability to interact directly with the hardware This chapter is devoted to C programming for a variety of reasons. Most importantly, C allows the programmer to read more..

  • Page - 650

    ▶ Low-level access: C code is powerful because, in addition to high- level constructs, it provides access to low-level hardware and memory. C.2 WELCOME TO C A C program is a text file that describes operations for the computer to perform. The text file is compiled, converted into a machine-readable format, and run or executed on a computer. actionGoTo:650,C actionGoTo:650,Code read more..

  • Page - 651

    Body: printf("Hello world!\n"); The body of this main function contains one statement, a call to the printf function, which prints the phrase “Hello world! ” followed by a newline character indicated by the special sequence "\n" . Further details about I/O functions are described in actionGoTo:682,Section actionGoTo:682,C.9.1. All programs follow the general format of the read more..

  • Page - 652

    C.3 COMPILATION A compiler is a piece of software that reads a program in a high-level language and converts it into a file of machine code called an executable. Entire textbooks are written on compilers, but we describe them here briefly. The overall operation of the compiler is to (1) preprocess the file by including referenced libraries and expanding macro definitions, (2) read more..

  • Page - 653

    The # indicates that this line in the program will be handled by the pre- processor. Before compilation, the preprocessor replaces each occurrence of the identifier MAXGUESSES in the program with 5 . By convention, #define lines are located at the top of the file and identifiers are written in all capital letters. By defining constants in one location and then using the read more..

  • Page - 654

    C.4 Variables 629 functions, variables, or identifiers are needed, they are conventionally located at the top of a C file. Programmer-created header files can also be included by using quota- tion marks ("") around the file name instead of brackets (<>). For exam- ple, a user-created header file called myfunctions.h would be included using the following line. #include read more..

  • Page - 655

    C.4.1 Primitive Data Types C has a number of primitive, or built-in, data types available. They can be broadly characterized as integers, floating-point variables, and characters. An integer represents a 2 ’s complement or unsigned number within a finite range. A floating-point variable uses IEEE floating point representa- tion to describe real numbers with a finite range and read more..

  • Page - 656

    The scope of a variable is the context in which it can be used. For example, for a local variable, its scope is the function in which it is declared. It is out of scope everywhere else. The machine-dependent nature of the int data type is a blessing and a curse. On the bright side, it matches the native word size of the processor so it can be fetched and manipulated read more..

  • Page - 657

    actionGoTo:657,C actionGoTo:657,Code actionGoTo:657,Examples actionGoTo:657,C.4 actionGoTo:657,and actionGoTo:657,C.5 compare programs using global versus local variables. In actionGoTo:657,C actionGoTo:657,Code actionGoTo:657,Example actionGoTo:657,C.4, the global variable max can be accessed by any function. Using a local variable, as shown in actionGoTo:657,C actionGoTo:657,Code actionGoTo:657,Example actionGoTo:657,C.5, read more..

  • Page - 658

    C.4.3 Initializing Variables A variable needs to be initialized – assigned a value – before it is read. When a variable is declared, the correct number of bytes is reserved for that variable in memory. However, the memory at those locations retains whatever value it had last time it was used, essentially a random value. Global and local variables can be initialized either read more..

  • Page - 659

    Table C.3 Operators listed by decreasing precedence Category Operator Description Example Unary ++ post-increment a++; // a = a+1 −− post-decrement x--; // x = x −1 & memory address of a variable x = &y; // x = the memory // address of y ~ bitwise NOT z = ~a; ! Boolean NOT !x − negation y = -a; ++ pre-increment ++a; // a = a+1 −− pre-decrement --x; // x = x −1 read more..

  • Page - 660

    operators. Within the same category, operators are evaluated in the order that they appear in the program. Unary operators, also called monadic operators, have a single operand. Ternary operators have three operands, and all others have two. The ternary operator (from the Latin ternarius meaning consisting of three) chooses the second or third operand depending on whether the first read more..

  • Page - 661

    Simple assignment uses the = operator. C code also allows for com- pound assignment, that is, assignment after a simple operation such as addition (+=) or multiplication (*=). In compound assignments, the vari- able on the left side is both operated on and assigned the result. actionGoTo:661,C actionGoTo:661,Code actionGoTo:661,Example actionGoTo:661,C.7 shows these and other C operations. read more..

  • Page - 662

    C.6 FUNCTION CALLS Modularity is key to good programming. A large program is divided into smaller parts called functions that, similar to hardware modules, have well-defined inputs, outputs, and behavior. actionGoTo:662,C actionGoTo:662,Code actionGoTo:662,Example actionGoTo:662,C.8 shows the sum3 function. The function declaration begins with the return type, int , followed by the name, sum3 , read more..

  • Page - 663

    With careful ordering of functions, prototypes may be unnecessary. However, they are unavoidable in certain cases, such as when function f1 calls f2 and f2 calls f1 .Itis good style to place prototypes for all of a program ’s functions near the beginning of the C file or in a header file. the function, declaring the return type, function name, and function inputs. For example, the read more..

  • Page - 664

    Curly braces, {} ,are used to group one or more statements into a compound statement or block. C.7.1 Conditional Statements if , if/else , and switch/case statements are conditional statements commonly used in high-level languages including C. if Statements An if statement executes the statement immediately following it when the expression in parentheses is TRUE (i.e., nonzero). The general read more..

  • Page - 665

    switch/case Statements switch/case statements execute one of several statements depending on the conditions, as shown in the general format below. switch (variable) { case (expression1): statement1 break; case (expression2): statement2 break; case (expression3): statement3 break; default: statement4 } For example, if variable is equal to expression2 , execution con- tinues at statement2 until the read more..

  • Page - 666

    C.7.2 Loops while , do/while , and for loops are common loop constructs used in many high-level languages including C. These loops repeatedly execute a statement as long as a condition is satisfied. while Loops while loops repeatedly execute a statement until a condition is not met, as shown in the general format below. while (condition) statement The while loop in actionGoTo:666,C read more..

  • Page - 667

    for Loops for loops, like while and do/while loops, repeatedly execute a statement until a condition is not satisfied. However, for loops add support for a loop variable, which typically keeps track of the number of loop executions. The general format of the for loop is for (initialization; condition; loop operation) statement The initialization code executes only once, before the for read more..

  • Page - 668

    SUMMARY ▶ Control-flow statements: C provides control-flow statements for con- ditional statements and loops. ▶ Conditional statements: Conditional statements execute a statement when a condition is TRUE. C includes the following conditional statements: if , if/else , and switch/case . ▶ Loops: Loops repeatedly execute a statement until a condition is FALSE. C provides while , do/while , read more..

  • Page - 669

    indicated memory address contained in the pointer. The & operator is pro- nounced “address of, ” and it produces the memory address of the variable being referenced. Pointers are particularly useful when a function needs to modify a vari- able, instead of just returning a value. Because functions can ’tmodifytheir inputs directly, a function can make the input a pointer to read more..

  • Page - 670

    C.8.2 Arrays An array is a group of similar variables stored in consecutive addresses in memory. The elements are numbered from 0 to N −1, where N is the size of the array. actionGoTo:670,C actionGoTo:670,Code actionGoTo:670,Example actionGoTo:670,C.20 declares an array variable called scores that holds the final exam scores for three students. Memory space is reserved for three long read more..

  • Page - 671

    When an array is declared, the length must be constant so that the com- piler can allocate the proper amount of memory. However, when the array is passed to a function as an input argument, the length need not be defined because the function only needs to know the address of the beginning of the array. actionGoTo:671,C actionGoTo:671,Code actionGoTo:671,Example actionGoTo:671,C.24 read more..

  • Page - 672

    An array argument is equivalent to a pointer to the beginning of the array. Thus, getMean could also have been declared as float getMean(int *arr, int len); Although functionally equivalent, datatype[] is the preferred method for passing arrays as input arguments because it more clearly indicates that the argument is an array. A function is limited to a single output, i.e., return read more..

  • Page - 673

    Arrays may have multiple dimensions. actionGoTo:673,C actionGoTo:673,Code actionGoTo:673,Example actionGoTo:673,C.26 uses a two-dimensional array to store the grades across eight problem sets for ten students. Recall that initialization of array values using {} is only allowed at declaration. actionGoTo:673,C actionGoTo:673,Code actionGoTo:673,Example actionGoTo:673,C.27 shows some functions that operate on read more..

  • Page - 674

    The term “carriage return ” originates from typewriters that required the carriage, the contraption that holds the paper, to move to the right in order to allow typing to begin at theleftsideof the page. A carriage return lever, shown on the left in the figure below, is pressedsothatthe carriage wouldbothmove tothe right and advance the paper by one line, called a line feed. A read more..

  • Page - 675

    C strings are called null terminated or zero terminated because the length is determined by looking for a zero at the end. In contrast, languages such as Pascal use the first byte to specify the string length,uptoa maximum of 255 characters. This byte is called the prefix byte and such strings are called P-strings.Anadvantage of null-terminated strings is that the length can be read more..

  • Page - 676

    C.8.5 Structures In C, structures are used to store a collection of data of various types. The general format of a structure declaration is struct name { type1 element1; type2 element2; … }; where struct is a keyword indicating that it is a structure, name is the structure tag name, and element1 and element2 are members of the struc- ture. A structure may have any number of read more..

  • Page - 677

    Just like built-in C types, you can create arrays of structures and poin- ters to structures. actionGoTo:677,C actionGoTo:677,Code actionGoTo:677,Example actionGoTo:677,C.32 creates an array of contacts. It is common to use pointers to structures. C provides the member access operator - > to dereference a pointer to a structure and access a member of the structure. actionGoTo:677,C read more..

  • Page - 678

    C.8.6 * typedef C also allows you to define your own names for data types using the typedef statement. For example, writing struct contact becomes tedious when it is often used, so we can define a new type named contact and use it as shown in actionGoTo:678,CCode actionGoTo:678,Example actionGoTo:678,C.35. typedef can be used to create a new type occupying the same amount of read more..

  • Page - 679

    actionGoTo:679,C actionGoTo:679,Code actionGoTo:679,Example actionGoTo:679,C.37 illustrates defining a 3-element vector and a 3 × 3 matrix type using arrays. C.8.7 * Dynamic Memory Allocation In all the examples thus far, variables have been declared statically;that is, their size is known at compile time. This can be problematic for arrays and strings of variable size because the array must read more..

  • Page - 680

    C.8.8 * Linked Lists A linked list is a common data structure used to store a variable number of elements. Each element in the list is a structure containing one or more data fields and a link to the next element. The first element in the list is called the head. Linked lists illustrate many of the concepts of structures, pointers, and dynamic memory allocation. read more..

  • Page - 681

    char uname[80]; // user name char passwd[80]; // password int uid; // user identification number int admin; // 1 indicates administrator privileges struct userL *next; } userL; userL *users = NULL; void insertUser(char *uname, char *passwd, int uid, int admin) { userL *newUser; newUser = malloc(sizeof(userL)); // create space for new user strcpy(newUser->uname, uname); // copy values into user read more..

  • Page - 682

    SUMMARY ▶ Pointers: A pointer holds the address of a variable. ▶ Arrays: An array is a list of similar elements declared using square brackets [] . ▶ Characters: char types can hold small integers or special codes for representing text or symbols. ▶ Strings: A string is an array of characters ending with the null termi- nator 0x00. ▶ Structures: A structure stores a read more..

  • Page - 683

    Table C.5 Frequently used C libraries C Library Header File Description stdio.h Standard input/output library. Includes functions for printing or reading to/from the screen or a file (printf, fprintf and scanf , fscanf ) and to open and close files (fopen and fclose ). stdlib.h Standard library. Includes functions for random number generation (rand and srand ), for dynamically allocating or read more..

  • Page - 684

    Floating point formats (floats and doubles ) default to printing six digits after the decimal point. To change the precision, replace %f with %w.df ,where w is the minimum width of the number, and d is the number of decimal places to print. Note that the decimal point is included in the width count. In actionGoTo:684,C actionGoTo:684,Code actionGoTo:684,Example actionGoTo:684,C.41, pi read more..

  • Page - 685

    scanf The scanf function reads text typed on the keyboard. It uses format codes in the same way as printf . actionGoTo:685,C actionGoTo:685,Code actionGoTo:685,Example actionGoTo:685,C.43 shows how to use scanf . When the scanf function is encountered, the program waits until the user types a value before continuing execution. The arguments to scanf are a string indicating one or more read more..

  • Page - 686

    actionGoTo:686,C actionGoTo:686,Code actionGoTo:686,Example actionGoTo:686,C.44 shows how to open, print to, and close a file. It is good practice to always check if the file was opened successfully and to provide an error message if it was not. The exit function will be dis- cussed in Section C.9.2.2. The fprintf function is like printf but it also takes the file pointer as an read more..

  • Page - 687

    Other Handy stdio Functions The sprintf function prints characters into a string, and sscanf reads variables from a string. The fgetc function reads a single character from a file, while fgets reads a complete line into a string. fscanf is rather limited in its ability to read and parse complex files, so it is often easier to fgets one line at a time and then digest that read more..

  • Page - 688

    A programmer creates a different sequence of random numbers each time a program runs by changing the seed. This is done by calling the srand function, which takes the seed as its input argument. As shown in actionGoTo:688,C actionGoTo:688,Code actionGoTo:688,Example actionGoTo:688,C.47, the seed itself must be random, so a typical C program assigns it by calling the time function, read more..

  • Page - 689

    when reading in mixed data (a mix of strings and numbers) from a file or when processing numeric command line arguments, as described in actionGoTo:691,Section actionGoTo:691,C.10.3. C.9.3 math The math library math.h provides commonly used math functions such as trigonometry functions, square root, and logs. actionGoTo:689,C actionGoTo:689,Code actionGoTo:689,Example actionGoTo:689,C.49 shows how to read more..

  • Page - 690

    C.9.4 string The string library string.h provides commonly used string manipulation functions. Key functions include: // copy src into dst and return dst char *strcpy(char *dst, char *src); // concatenate (append) src to the end of dst and return dst char *strcat(char *dst, char *src); // compare two strings. Return 0 if equal, nonzero otherwise int strcmp(char *s1, char *s2); // read more..

  • Page - 691

    standardized, but actionGoTo:691,Table actionGoTo:691,C.7 lists ones that are commonly used. Each option is typically preceded by a dash (-) on the command line, as shown. For example, the "-o" option allows the programmer to specify an output file name other than the a.out default. A plethora of options exist; they can be viewed by typing gcc --help at the command line. read more..

  • Page - 692

    C.11 COMMON MISTAKES As with any programming language, you are almost certain to make errors while you write nontrivial C programs. Below are descriptions of some common mistakes made when programming in C. Some of these errors are particularly troubling because they compile but do not function as the programmer intended. C Code Example C.50 COMMAND LINE ARGUMENTS // Print command read more..

  • Page - 693

    C Code Mistake C.3 INDEXING PAST LAST ELEMENT OF ARRAY Erroneous Code int array[10]; array[10] = 42; // index is 0-9 Corrected Code int array[10]; array[9] = 42; C Code Mistake C.4 USING = IN #define STATEMENT Erroneous Code // replaces NUM with "= 4" in code #define NUM = 4 Corrected Code #define NUM 4 C Code Mistake C.5 USING AN UNINITIALIZED VARIABLE Erroneous Code int read more..

  • Page - 694

    C Code Mistake C.8 FORGETTING break IN A switch/case STATEMENT Erroneous Code char x = 'd'; ... switch (x) { case 'u': direction = 1; case 'd': direction = 2; case 'l': direction = 3; case 'r': direction = 4; default: direction = 0; } // direction = 0 Corrected Code char x = 'd'; ... switch (x) { case 'u': direction = 1; break; case 'd': direction = 2; break; case 'l': read more..

  • Page - 695

    C Code Mistake C.12 TRYING TO INITIALIZE AN ARRAY WITH {} AFTER DECLARATION Erroneous Code int scores[3]; scores = {93, 81, 97}; // won't compile Corrected Code int scores[3] = {93, 81, 97}; C Code Mistake C.13 ASSIGNING ONE ARRAY TO ANOTHER USING = Erroneous Code int scores[3] = {88, 79, 93}; int scores2[3]; scores2 = scores; Corrected Code int scores[3] = {88, 79, 93}; int read more..

  • Page - 696

    This appendix has focused on using C on a system with a console, such as a Linux computer. actionGoTo:533,Section actionGoTo:533,8.6 describes how C is used to pro- gram PIC32 microcontrollers used in embedded systems. Microcontrol- lers are usually programmed in C because the language provides nearly as much low-level control of the hardware as assembly language, yet is much more read more..

  • Page - 697

    This page intentionally left blank read more..

  • Page - 698

    Further Reading Berlin L., The Man Behind the Microchip: Robert Noyce and the Invention of Silicon Valley, Oxford University Press, 2005. The fascinating biography of Robert Noyce, an inventor of the microchip and founder of Fairchild and Intel. For anyone thinking of working in Silicon Valley, this book gives insights into the culture of the region, a culture influenced more read more..

  • Page - 699

    SystemVerilog IEEE Standard (IEEE STD 1800). The IEEE standard for the SystemVerilog Hardware Description Language; last updated in 2009. Available at actionURI( VHDL IEEE Standard (IEEE STD 1076). The IEEE standard for VHDL; last updated in 2008. Available from IEEE. Available at actionURI( Wakerly J., Digital Design: read more..

  • Page - 700

    Index Page numbers in italics indicate figures, tables and text boxes. 0, actionGoTo:47,22. See also LOW, OFF 1, actionGoTo:47,22. See also HIGH, ON 32-bit datapath, actionGoTo:486,461 32-bit microprocessor, actionGoTo:479,454 4004 microprocessor chip, actionGoTo:483,458, actionGoTo:484,459 74xx series logic, actionGoTo:608,583 actionGoTo:608,–587 parts, 2:1 mux (74157), actionGoTo:611,586 3:8 decoder (74138), read more..

  • Page - 701

    Amdahl ’s Law, actionGoTo:505,480 American Standard Code for Information Interchange (ASCII), actionGoTo:347,322, actionGoTo:348,323, actionGoTo:655,630, actionGoTo:674,649 actionGoTo:674,–650 Analog I/O, actionGoTo:556,531 actionGoTo:556,–537 A/D conversion, actionGoTo:557,532 actionGoTo:557,–533 D/A conversion, actionGoTo:558,533 actionGoTo:558,–537 Pulse-width modulation (PWM), actionGoTo:561,536 actionGoTo:561,–537 read more..

  • Page - 702

    equation simplification, actionGoTo:90,65 actionGoTo:90,–66 theorems, actionGoTo:86,61 actionGoTo:86,–64 Boolean equations, actionGoTo:83,58 actionGoTo:83,–60 product-of-sums (POS) form, actionGoTo:85,60 sum-of-products (SOP) form, actionGoTo:83,58 actionGoTo:83,–60 Boolean logic, actionGoTo:33,8. See also Boolean algebra, Logic gates Boolean theorems, actionGoTo:86,61 actionGoTo:86,–64 associativity, actionGoTo:88,63 combining, read more..

  • Page - 703

    Circuits (Continued) synchronous, actionGoTo:147,122 actionGoTo:147,–123 synchronous sequential, actionGoTo:145,120 actionGoTo:145,–123, actionGoTo:147,122 synthesized, actionGoTo:201,176, actionGoTo:204,179, actionGoTo:206,181 timing, actionGoTo:113,88 actionGoTo:113,–95 with two-stage pipeline, actionGoTo:185,160 without glitch, actionGoTo:120,95 CISC. See Complex Instruction Set Computer CLBs. See Configurable logic blocks Clock read more..

  • Page - 704

    Data types, actionGoTo:668,643 actionGoTo:668,–657 arrays. See Arrays characters. See Character (char) dynamic memory allocation. See Dynamic memory allocation (malloc and free ) linked list. See Linked list pointers. See Pointers strings. See Strings (str) structures. See Structures (struct) typedef, actionGoTo:678,653 actionGoTo:678,–654 Datapath multicycle MIPS processor, actionGoTo:415,390 actionGoTo:415,–396 read more..

  • Page - 705

    Equality comparator, actionGoTo:272,247 Equation minimization using Boolean algebra, actionGoTo:90,65 actionGoTo:90,–66 using Karnaugh maps. See Karnaugh maps Erasable programmable read only memory (EPROM), actionGoTo:294,269, actionGoTo:613,588 Ethernet, actionGoTo:586,561 Exception program counter (EPC), actionGoTo:368,343 actionGoTo:368,–344 Exceptions, actionGoTo:368,343 actionGoTo:368,–344, actionGoTo:465,440 read more..

  • Page - 706

    Gray codes, actionGoTo:101,76 Gray, Frank, actionGoTo:101,76 Ground (GND), actionGoTo:47,22 symbol for, actionGoTo:56,31 H Half adder, actionGoTo:265,240, actionGoTo:265,240 Hard disk, actionGoTo:503,478 actionGoTo:503,–479. See also Hard drive Hard drive, actionGoTo:503,478 actionGoTo:503,–479, actionGoTo:521,496. See also Hard disk, Solid state drive, and Virtual memory Hardware description languages (HDLs). See read more..

  • Page - 707

    Interrupt service routine (ISR), actionGoTo:554,529. See also Exceptions Interrupts, actionGoTo:368,343, actionGoTo:554,529 actionGoTo:554,–531 PIC32, actionGoTo:554,529 actionGoTo:554,–531 Invalid logic level, actionGoTo:211,186 Inverters, actionGoTo:45,20, actionGoTo:144,119, 178. See also NOT gate cross-coupled, actionGoTo:134,109, actionGoTo:135,110 in HDL, actionGoTo:203,178, actionGoTo:224,199 An Investigation of the read more..

  • Page - 708

    Logical shifter, actionGoTo:275,250 Lookup tables (LUTs), actionGoTo:295,270, actionGoTo:300,275 Loops, actionGoTo:342,317 actionGoTo:342,–319, actionGoTo:666,641 actionGoTo:666,–642 in C, do/while, actionGoTo:666,641 actionGoTo:666,–642 for, actionGoTo:667,642 while, actionGoTo:666,641 in MIPS assembly, for, actionGoTo:344,319 actionGoTo:344,–320 while, actionGoTo:343,318 actionGoTo:343,–319 LOW, actionGoTo:47,22. See also 0, read more..

  • Page - 709

    Microcontroller peripherals (Continued) motor control, actionGoTo:573,548 actionGoTo:573,–549 VGA monitor, actionGoTo:566,541 actionGoTo:566,–547 Microcontroller units (MCUs), actionGoTo:533,508 Micro-ops, actionGoTo:486,461 Microprocessors, actionGoTo:28,3, actionGoTo:38,13, actionGoTo:320,295 architectural state of, actionGoTo:335,310 designers, actionGoTo:469,444 high-performance, actionGoTo:469,444 Millions of instructions per read more..

  • Page - 710

    Nonleaf function, actionGoTo:355,330 Nonpreserved registers, actionGoTo:354,329, actionGoTo:355,330 nop , actionGoTo:367,342 nor , actionGoTo:336,311 NOR gate, actionGoTo:46,21 actionGoTo:46,–22, actionGoTo:136,111, actionGoTo:153,128, actionGoTo:610,585 chip (7402), actionGoTo:610,585 CMOS, actionGoTo:57,32 pseudo-nMOS logic, actionGoTo:58,33 truth table, actionGoTo:47,22 Not a number (NaN), actionGoTo:282,257 NOT gate, read more..

  • Page - 711

    PLAs. See Programmable logic arrays Plastic leaded chip carriers (PLCCs), actionGoTo:624,599 Platters, actionGoTo:521,496 PLCCs. See Plastic leaded chip carriers PLDs. See Programmable logic devices PLL. See Phase locked loop pMOS transistors, actionGoTo:53,28 actionGoTo:53,–31, actionGoTo:54,29 Pointers, actionGoTo:668,643 actionGoTo:668,–645, actionGoTo:672,647, actionGoTo:675,650, actionGoTo:677,652, read more..

  • Page - 712

    Semiconductors, actionGoTo:52,27 industry, sales, actionGoTo:28,3 Sequencing overhead, actionGoTo:168,143 actionGoTo:168,–144, actionGoTo:174,149, actionGoTo:185,160, actionGoTo:453,428 Sequential building blocks. See Sequential logic Sequential logic, actionGoTo:134,109 actionGoTo:134,–161, actionGoTo:285,260 actionGoTo:285,–263 counters, actionGoTo:285,260 finite state machines. See Finite state machine flip-flops, read more..

  • Page - 713

    Stores store byte (sb or sbu ), actionGoTo:327,302 actionGoTo:327,–304, actionGoTo:348,323 actionGoTo:348,–324 store half (sh or shu ), actionGoTo:370,345 store word (sw), actionGoTo:327,302 actionGoTo:327,–304 string.h, C library, actionGoTo:690,665 Strings, actionGoTo:349,324, actionGoTo:675,650 actionGoTo:675,–651. See also Characters (char) Structural modeling, actionGoTo:198,173 actionGoTo:198,–174, actionGoTo:215,190 read more..

  • Page - 714

    TLB. See Translation lookaside buffer Trace cache, actionGoTo:488,463 Transistors, actionGoTo:51,26 actionGoTo:51,–34 bipolar, actionGoTo:51,26 CMOS, actionGoTo:51,26 actionGoTo:51,–33 gates made from, actionGoTo:56,31 actionGoTo:56,–34 latches and flip-flops, actionGoTo:141,116 actionGoTo:141,–117 MOSFETs, actionGoTo:51,26 nMOS, actionGoTo:53,28 actionGoTo:53,–34, actionGoTo:54,29 actionGoTo:54,–33 pMOS, actionGoTo:53,28 read more..

  • Page - 715

    VHSIC Hardware Description Language (VHDL) (Continued) reduction operators, actionGoTo:205,180 actionGoTo:205,–181 registers, actionGoTo:218,193 actionGoTo:218,–197 enabled, actionGoTo:221,196 resettable, actionGoTo:219,194 actionGoTo:219,–196 sequential logic using, actionGoTo:218,193 actionGoTo:218,–198, actionGoTo:234,209 actionGoTo:234,–213 seven-segment display decoder, actionGoTo:226,201 simulation and synthesis, actionGoTo:200,175 read more..

  • Page - 716

    This page intentionally left blank read more..

  • Page - 717

    This page intentionally left blank read more..

  • Page - 718

    This page intentionally left blank read more..

  • Page - 719

    This page intentionally left blank read more..

  • Page - 720

    This page intentionally left blank read more..

  • Page - 721

    This page intentionally left blank read more..

Write Your Review