Book Information
Programming Massively Parallel Processors (大规模并行处理器程序设计)

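- Author: Kirk, D. (USA)
- Publisher: Tsinghua University Press, Beijing
- ISBN: 9787302229735
- Publication year: 2010
- Listed page count: 258
- File size: 59 MB
- File page count: 278
- Subject terms: parallel programs; programming; higher education; textbooks; English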
Table of Contents
CHAPTER 1 INTRODUCTION
1.1 GPUs as Parallel Computers
1.2 Architecture of a Modern GPU
1.3 Why More Speed or Parallelism?
1.4 Parallel Programming Languages and Models
1.5 Overarching Goals
1.6 Organization of the Book
CHAPTER 2 HISTORY OF GPU COMPUTING
2.1 Evolution of Graphics Pipelines
2.1.1 The Era of Fixed-Function Graphics Pipelines
2.1.2 Evolution of Programmable Real-Time Graphics
2.1.3 Unified Graphics and Computing Processors
2.1.4 GPGPU: An Intermediate Step
2.2 GPU Computing
2.2.1 Scalable GPUs
2.2.2 Recent Developments
2.3 Future Trends
CHAPTER 3 INTRODUCTION TO CUDA
3.1 Data Parallelism
3.2 CUDA Program Structure
3.3 A Matrix-Matrix Multiplication Example
3.4 Device Memories and Data Transfer
3.5 Kernel Functions and Threading
3.6 Summary
3.6.1 Function declarations
3.6.2 Kernel launch
3.6.3 Predefined variables
3.6.4 Runtime API
CHAPTER 4 CUDA THREADS
4.1 CUDA Thread Organization
4.2 Using blockIdx and threadIdx
4.3 Synchronization and Transparent Scalability
4.4 Thread Assignment
4.5 Thread Scheduling and Latency Tolerance
4.6 Summary
4.7 Exercises
CHAPTER 5 CUDA™ MEMORIES
5.1 Importance of Memory Access Efficiency
5.2 CUDA Device Memory Types
5.3 A Strategy for Reducing Global Memory Traffic
5.4 Memory as a Limiting Factor to Parallelism
5.5 Summary
5.6 Exercises
CHAPTER 6 PERFORMANCE CONSIDERATIONS
6.1 More on Thread Execution
6.2 Global Memory Bandwidth
6.3 Dynamic Partitioning of SM Resources
6.4 Data Prefetching
6.5 Instruction Mix
6.6 Thread Granularity
6.7 Measured Performance and Summary
6.8 Exercises
CHAPTER 7 FLOATING POINT CONSIDERATIONS
7.1 Floating-Point Format
7.1.1 Normalized Representation of M
7.1.2 Excess Encoding of E
7.2 Representable Numbers
7.3 Special Bit Patterns and Precision
7.4 Arithmetic Accuracy and Rounding
7.5 Algorithm Considerations
7.6 Summary
7.7 Exercises
CHAPTER 8 APPLICATION CASE STUDY: ADVANCED MRI RECONSTRUCTION
8.1 Application Background
8.2 Iterative Reconstruction
8.3 Computing F^H d
Step 1. Determine the Kernel Parallelism Structure
Step 2. Getting Around the Memory Bandwidth Limitation
Step 3. Using Hardware Trigonometry Functions
Step 4. Experimental Performance Tuning
8.4 Final Evaluation
8.5 Exercises
CHAPTER 9 APPLICATION CASE STUDY: MOLECULAR VISUALIZATION AND ANALYSIS
9.1 Application Background
9.2 A Simple Kernel Implementation
9.3 Instruction Execution Efficiency
9.4 Memory Coalescing
9.5 Additional Performance Comparisons
9.6 Using Multiple GPUs
9.7 Exercises
CHAPTER 10 PARALLEL PROGRAMMING AND COMPUTATIONAL THINKING
10.1 Goals of Parallel Programming
10.2 Problem Decomposition
10.3 Algorithm Selection
10.4 Computational Thinking
10.5 Exercises
CHAPTER 11 A BRIEF INTRODUCTION TO OPENCL™
11.1 Background
11.2 Data Parallelism Model
11.3 Device Architecture
11.4 Kernel Functions
11.5 Device Management and Kernel Launch
11.6 Electrostatic Potential Map in OpenCL
11.7 Summary
11.8 Exercises
CHAPTER 12 CONCLUSION AND FUTURE OUTLOOK
12.1 Goals Revisited
12.2 Memory Architecture Evolution
12.2.1 Large Virtual and Physical Address Spaces
12.2.2 Unified Device Memory Space
12.2.3 Configurable Caching and Scratch Pad
12.2.4 Enhanced Atomic Operations
12.2.5 Enhanced Global Memory Access
12.3 Kernel Execution Control Evolution
12.3.1 Function Calls within Kernel Functions
12.3.2 Exception Handling in Kernel Functions
12.3.3 Simultaneous Execution of Multiple Kernels
12.3.4 Interruptible Kernels
12.4 Core Performance
12.4.1 Double-Precision Speed
12.4.2 Better Control Flow Efficiency
12.5 Programming Environment
12.6 A Bright Outlook
APPENDIX A MATRIX MULTIPLICATION HOST-ONLY VERSION SOURCE CODE
A.1 matrixmul.cu
A.2 matrixmul_gold.cpp
A.3 matrixmul.h
A.4 assist.h
A.5 Expected Output
APPENDIX B GPU COMPUTE CAPABILITIES
B.1 GPU Compute Capability Tables
B.2 Memory Coalescing Variations
Index