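Coauthored by 2017 Turing Award winners Patterson and Hennessy, this book is a classic in the field of computer architecture that emphasizes hardware/software co-design and its effect on performance. It uses the open-source RISC-V instruction set architecture to explain hardware technology, instructions, arithmetic, pipelining, memory hierarchy, I/O, and parallel processors. The second edition switches from RV64 to RV32 to make the material easier to learn, and adds a discussion of domain-specific architectures (DSAs) to reflect new technology trends. In addition, every chapter gains a performance improvement section and a self-study section, and a large number of exercises have been updated. The book is suitable as a reference for professionals working in computer architecture and as reading for university students in computer-related majors.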
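Written jointly by Turing Award winners Patterson and Hennessy, this book is a work of the new golden age of computer architecture. In response to reader requests, this edition switches from RV64 to RV32, which removes 10 instructions and makes the material easier to learn. It adds a discussion of domain-specific architectures (DSAs) that uses Google's TPUv1 as the example, along with a new comparison of the TPUv3 DSA supercomputer against an NVIDIA Volta GPU cluster. Every chapter also gains a performance improvement section that applies data-level parallelism, instruction-level parallelism, thread-level parallelism, and other techniques in turn; with only 21 added lines of code, the matrix multiply program runs nearly 50,000 times faster, a direct demonstration of how much hardware matters for improving energy efficiency.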
Preface
The most beautiful thing we can experience is the mysterious. It is the source of all true art and science.
Albert Einstein, What I Believe, 1930
About This Book
We believe that learning in computer science and engineering should reflect the current state of the field, as well as introduce the principles that are shaping computing. We also feel that readers in every specialty of computing need to appreciate the organizational paradigms that determine the capabilities, performance, energy, and, ultimately, the success of computer systems.
Modern computer technology requires professionals of every computing specialty to understand both hardware and software. The interaction between hardware and software at a variety of levels also offers a framework for understanding the fundamentals of computing. Whether your primary interest is hardware or software, computer science or electrical engineering, the central ideas in computer organization and design are the same. Thus, our emphasis in this book is to show the relationship between hardware and software and to focus on the concepts that are the basis for current computers.
The recent switch from uniprocessor to multicore microprocessors confirmed the soundness of this perspective, given since the first edition. While programmers could ignore the advice and rely on computer architects, compiler writers, and silicon engineers to make their programs run faster or be more energy-efficient without change, that era is over. For programs to run faster, they must become parallel. While the goal of many researchers is to make it possible for programmers to be unaware of the underlying parallel nature of the hardware they are programming, it will take many years to realize this vision. Our view is that for at least the next decade, most programmers are going to have to understand the hardware/software interface if they want programs to run efficiently on parallel computers.
The audience for this book includes those with little experience in assembly language or logic design who need to understand basic computer organization as well as readers with backgrounds in assembly language and/or logic design who want to learn how to design a computer or understand how a system works and why it performs as it does.
About the Other Book
Some readers may be familiar with Computer Architecture: A Quantitative Approach, popularly known as Hennessy and Patterson. (This book in turn is often called Patterson and Hennessy.) Our motivation in writing the earlier book was to describe the principles of computer architecture using solid engineering fundamentals and quantitative cost/performance tradeoffs. We used an approach that combined examples and measurements, based on commercial systems, to create realistic design experiences. Our goal was to demonstrate that computer architecture could be learned using quantitative methodologies instead of a descriptive approach. It was intended for the serious computing professional who wanted a detailed understanding of computers.
A majority of the readers for this book do not plan to become computer architects. The performance and energy efficiency of future software systems will be dramatically affected, however, by how well software designers understand the basic hardware techniques at work in a system. Thus, compiler writers, operating system designers, database programmers, and most other software engineers need a firm grounding in the principles presented in this book. Similarly, hardware designers must understand clearly the effects of their work on software applications.
Thus, we knew that this book had to be much more than a subset of the material in Computer Architecture, and the material was extensively revised to match the different audience. We were so happy with the result that the subsequent editions of Computer Architecture were revised to remove most of the introductory material; hence, there is much less overlap today than with the first editions of both books.
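David A. Patterson
Patterson has taught computer architecture at the University of California, Berkeley, since joining the faculty in 1977, where he held the Pardee Chair of Computer Science. His teaching has been honored with the University of California Distinguished Teaching Award, the ACM Karlstrom Award, the IEEE Mulligan Education Medal, and the IEEE Undergraduate Teaching Award. For his contributions to RISC, Patterson received the IEEE Technical Achievement Award and the ACM Eckert-Mauchly Award, and he shared the IEEE Johnson Information Storage Award for his contributions to RAID. He and Hennessy shared the IEEE John von Neumann Medal and the C&C Prize. Like Hennessy, Patterson is a member of the National Academy of Engineering and the National Academy of Sciences, a Fellow of the American Academy of Arts and Sciences, the Computer History Museum, ACM, and IEEE, and an inductee of the Silicon Valley Engineering Hall of Fame. He served as chair of the Computer Science Division in the Electrical Engineering and Computer Sciences (EECS) department at Berkeley, as chair of the Computing Research Association (CRA), and as President of ACM. This service earned him Distinguished Service Awards from ACM, CRA, and SIGARCH. He received the Tapia Achievement Award for his contributions to civic science and to diversifying computing, and he shared the 2017 ACM Turing Award with Hennessy.
At Berkeley, Patterson led the design and implementation of RISC I, likely the first VLSI reduced instruction set computer and the foundation of the commercial SPARC architecture. He was also a leader of the Redundant Arrays of Inexpensive Disks (RAID) project, which led many companies to develop highly dependable storage systems. He was also involved in the Network of Workstations (NOW) project, which led to the cluster technology widely used by Internet companies and later to cloud computing. These projects earned four ACM dissertation awards. In 2016, he became Professor Emeritus at Berkeley and a Distinguished Engineer at Google, where he works on domain-specific architectures for machine learning. He is also Vice Chair of RISC-V International and Director of the RISC-V International Open Source Laboratory.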
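John L. Hennessy
Hennessy was the tenth President of Stanford University, where he has been a member of the electrical engineering and computer science faculty since 1977. He is a Fellow of the IEEE and ACM, a member of the National Academy of Engineering, the National Academy of Sciences, and the American Philosophical Society, and a Fellow of the American Academy of Arts and Sciences. Among his many awards are the 2001 ACM Eckert-Mauchly Award for his contributions to RISC, the 2001 Seymour Cray Computer Engineering Award, the 2000 IEEE John von Neumann Medal, which he shared with Patterson, and the 2017 ACM Turing Award, also shared with Patterson. He has also received seven honorary doctorates.
In 1981, Hennessy started the MIPS project at Stanford with a handful of graduate students. After completing the project in 1984, he took a leave from the university to cofound MIPS Computer Systems (now MIPS Technologies), which developed one of the early commercial RISC microprocessors. As of 2006, more than 2 billion MIPS microprocessors had been used in devices ranging from video games and palmtop computers to laser printers and network switches. Hennessy subsequently led the DASH shared-memory architecture project, which designed the first scalable cache-coherent multiprocessor prototype; many of its key ideas have been adopted in modern multiprocessors. Beyond his research activities and university responsibilities, Hennessy has worked with many start-ups as an early-stage advisor and investor, contributing significantly to the commercialization of academic results in the field.
He is currently Director of the Knight-Hennessy Scholars program and serves as non-executive chairman of Alphabet.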
Contents
CHAPTERS
Computer Abstractions and Technology
1.1 Introduction
1.2 Seven Great Ideas in Computer Architecture
1.3 Below Your Program
1.4 Under the Covers
1.5 Technologies for Building Processors and Memory
1.6 Performance
1.7 The Power Wall
1.8 The Sea Change: The Switch from Uniprocessors to Multiprocessors
1.9 Real Stuff: Benchmarking the Intel Core i7
1.10 Going Faster: Matrix Multiply in Python
1.11 Fallacies and Pitfalls
1.12 Concluding Remarks
1.13 Historical Perspective and Further Reading
1.14 Self-Study
1.15 Exercises
Instructions: Language of the Computer
2.1 Introduction
2.2 Operations of the Computer Hardware
2.3 Operands of the Computer Hardware
2.4 Signed and Unsigned Numbers
2.5 Representing Instructions in the Computer
2.6 Logical Operations
2.7 Instructions for Making Decisions
2.8 Supporting Procedures in Computer Hardware
2.9 Communicating with People
2.10 RISC-V Addressing for Wide Immediates and Addresses
2.11 Parallelism and Instructions: Synchronization
2.12 Translating and Starting a Program
2.13 A C Sort Example to Put it All Together
2.14 Arrays versus Pointers
2.15 Advanced Material: Compiling C and Interpreting Java
2.16 Real Stuff: MIPS Instructions
2.17 Real Stuff: ARMv7 (32-bit) Instructions
2.18 Real Stuff: ARMv8 (64-bit) Instructions
2.19 Real Stuff: x86 Instructions
2.20 Real Stuff: The Rest of the RISC-V Instruction Set
2.21 Going Faster: Matrix Multiply in C
2.22 Fallacies and Pitfalls
2.23 Concluding Remarks
2.24 Historical Perspective and Further Reading
2.25 Self-Study
2.26 Exercises
Arithmetic for Computers
3.1 Introduction
3.2 Addition and Subtraction
3.3 Multiplication
3.4 Division
3.5 Floating Point
3.6 Parallelism and Computer Arithmetic: Subword Parallelism
3.7 Real Stuff: Streaming SIMD Extensions and Advanced Vector Extensions in x86
3.8 Going Faster: Subword Parallelism and Matrix Multiply
3.9 Fallacies and Pitfalls
3.10 Concluding Remarks
3.11 Historical Perspective and Further Reading
3.12 Self-Study
3.13 Exercises
The Processor
4.1 Introduction
4.2 Logic Design Conventions
4.3 Building a Datapath
4.4 A Simple Implementation Scheme
4.5 Multicycle Implementation
4.6 An Overview of Pipelining
4.7 Pipelined Datapath and Control
4.8 Data Hazards: Forwarding versus Stalling
4.9 Control Hazards
4.10 Exceptions
4.11 Parallelism via Instructions
4.12 Putting It All Together: The Intel Core i7 6700 and ARM Cortex-A53
4.13 Going Faster: Instruction-Level Parallelism and Matrix Multiply
4.14 Advanced Topic: An Introduction to Digital Design Using a Hardware Design Language to Describe and Model a Pipeline and More Pipelining Illustrations
4.15 Fallacies and Pitfalls
4.16 Concluding Remarks
4.17 Historical Perspective and Further Reading
4.18 Self-Study
4.19 Exercises
Large and Fast: Exploiting Memory Hierarchy
5.1 Introduction
5.2 Memory Technologies
5.3 The Basics of Caches
5.4 Measuring and Improving Cache Performance
5.5 Dependable Memory Hierarchy
5.6 Virtual Machines
5.7 Virtual Memory
5.8 A Common Framework for Memory Hierarchy
5.9 Using a Finite-State Machine to Control a Simple Cache
5.