Book Information
Programming Massively Parallel Processors, 2nd Edition (English-language edition)

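- Authors: David B. Kirk (USA) and Wen-mei W. Hwu (USA)
- Publisher: Beijing: China Machine Press
- ISBN: 9787111416296
- Publication year: 2013
- Listed page count: 496 pages
- File size: 106 MB
- File page count: 517 pages
- Subject terms: Parallel programs - Programming - English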
(The file page count should be greater than the listed page count, except for e-books split into multiple volumes.)
Table of Contents
CHAPTER 1 Introduction
1.1 Heterogeneous Parallel Computing
1.2 Architecture of a Modern GPU
1.3 Why More Speed or Parallelism?
1.4 Speeding Up Real Applications
1.5 Parallel Programming Languages and Models
1.6 Overarching Goals
1.7 Organization of the Book
References
CHAPTER 2 History of GPU Computing
2.1 Evolution of Graphics Pipelines
The Era of Fixed-Function Graphics Pipelines
Evolution of Programmable Real-Time Graphics
Unified Graphics and Computing Processors
2.2 GPGPU: An Intermediate Step
2.3 GPU Computing
Scalable GPUs
Recent Developments
Future Trends
References and Further Reading
CHAPTER 3 Introduction to Data Parallelism and CUDA C
3.1 Data Parallelism
3.2 CUDA Program Structure
3.3 A Vector Addition Kernel
3.4 Device Global Memory and Data Transfer
3.5 Kernel Functions and Threading
3.6 Summary
Function Declarations
Kernel Launch
Predefined Variables
Runtime API
3.7 Exercises
References
CHAPTER 4 Data-Parallel Execution Model
4.1 CUDA Thread Organization
4.2 Mapping Threads to Multidimensional Data
4.3 Matrix-Matrix Multiplication: A More Complex Kernel
4.4 Synchronization and Transparent Scalability
4.5 Assigning Resources to Blocks
4.6 Querying Device Properties
4.7 Thread Scheduling and Latency Tolerance
4.8 Summary
4.9 Exercises
CHAPTER 5 CUDA Memories
5.1 Importance of Memory Access Efficiency
5.2 CUDA Device Memory Types
5.3 A Strategy for Reducing Global Memory Traffic
5.4 A Tiled Matrix-Matrix Multiplication Kernel
5.5 Memory as a Limiting Factor to Parallelism
5.6 Summary
5.7 Exercises
CHAPTER 6 Performance Considerations
6.1 Warps and Thread Execution
6.2 Global Memory Bandwidth
6.3 Dynamic Partitioning of Execution Resources
6.4 Instruction Mix and Thread Granularity
6.5 Summary
6.6 Exercises
References
CHAPTER 7 Floating-Point Considerations
7.1 Floating-Point Format
Normalized Representation of M
Excess Encoding of E
7.2 Representable Numbers
7.3 Special Bit Patterns and Precision in IEEE Format
7.4 Arithmetic Accuracy and Rounding
7.5 Algorithm Considerations
7.6 Numerical Stability
7.7 Summary
7.8 Exercises
References
CHAPTER 8 Parallel Patterns: Convolution
8.1 Background
8.2 1D Parallel Convolution: A Basic Algorithm
8.3 Constant Memory and Caching
8.4 Tiled 1D Convolution with Halo Elements
8.5 A Simpler Tiled 1D Convolution: General Caching
8.6 Summary
8.7 Exercises
CHAPTER 9 Parallel Patterns: Prefix Sum
9.1 Background
9.2 A Simple Parallel Scan
9.3 Work Efficiency Considerations
9.4 A Work-Efficient Parallel Scan
9.5 Parallel Scan for Arbitrary-Length Inputs
9.6 Summary
9.7 Exercises
Reference
CHAPTER 10 Parallel Patterns: Sparse Matrix-Vector Multiplication
10.1 Background
10.2 Parallel SpMV Using CSR
10.3 Padding and Transposition
10.4 Using Hybrid to Control Padding
10.5 Sorting and Partitioning for Regularization
10.6 Summary
10.7 Exercises
References
CHAPTER 11 Application Case Study: Advanced MRI Reconstruction
11.1 Application Background
11.2 Iterative Reconstruction
11.3 Computing FHD
Step 1: Determine the Kernel Parallelism Structure
Step 2: Getting Around the Memory Bandwidth Limitation
Step 3: Using Hardware Trigonometry Functions
Step 4: Experimental Performance Tuning
11.4 Final Evaluation
11.5 Exercises
References
CHAPTER 12 Application Case Study: Molecular Visualization and Analysis
12.1 Application Background
12.2 A Simple Kernel Implementation
12.3 Thread Granularity Adjustment
12.4 Memory Coalescing
12.5 Summary
12.6 Exercises
References
CHAPTER 13 Parallel Programming and Computational Thinking
13.1 Goals of Parallel Computing
13.2 Problem Decomposition
13.3 Algorithm Selection
13.4 Computational Thinking
13.5 Summary
13.6 Exercises
References
CHAPTER 14 An Introduction to OpenCL™
14.1 Background
14.2 Data Parallelism Model
14.3 Device Architecture
14.4 Kernel Functions
14.5 Device Management and Kernel Launch
14.6 Electrostatic Potential Map in OpenCL
14.7 Summary
14.8 Exercises
References
CHAPTER 15 Parallel Programming with OpenACC
15.1 OpenACC Versus CUDA C
15.2 Execution Model
15.3 Memory Model
15.4 Basic OpenACC Programs
Parallel Construct
Loop Construct
Kernels Construct
Data Management
Asynchronous Computation and Data Transfer
15.5 Future Directions of OpenACC
15.6 Exercises
CHAPTER 16 Thrust: A Productivity-Oriented Library for CUDA
16.1 Background
16.2 Motivation
16.3 Basic Thrust Features
Iterators and Memory Space
Interoperability
16.4 Generic Programming
16.5 Benefits of Abstraction
16.6 Programmer Productivity
Robustness
Real-World Performance
16.7 Best Practices
Fusion
Structure of Arrays
Implicit Ranges
16.8 Exercises
References
CHAPTER 17 CUDA FORTRAN
17.1 CUDA FORTRAN and CUDA C Differences
17.2 A First CUDA FORTRAN Program
17.3 Multidimensional Array in CUDA FORTRAN
17.4 Overloading Host/Device Routines With Generic Interfaces
17.5 Calling CUDA C Via Iso_C_Binding
17.6 Kernel Loop Directives and Reduction Operations
17.7 Dynamic Shared Memory
17.8 Asynchronous Data Transfers
17.9 Compilation and Profiling
17.10 Calling Thrust from CUDA FORTRAN
17.11 Exercises
CHAPTER 18 An Introduction to C++ AMP
18.1 Core C++ AMP Features
18.2 Details of the C++ AMP Execution Model
Explicit and Implicit Data Copies
Asynchronous Operation
Section Summary
18.3 Managing Accelerators
18.4 Tiled Execution
18.5 C++ AMP Graphics Features
18.6 Summary
18.7 Exercises
CHAPTER 19 Programming a Heterogeneous Computing Cluster
19.1 Background
19.2 A Running Example
19.3 MPI Basics
19.4 MPI Point-to-Point Communication Types
19.5 Overlapping Computation and Communication
19.6 MPI Collective Communication
19.7 Summary
19.8 Exercises
Reference
CHAPTER 20 CUDA Dynamic Parallelism
20.1 Background
20.2 Dynamic Parallelism Overview
20.3 Important Details
Launch Environment Configuration
API Errors and Launch Failures
Events
Streams
Synchronization Scope
20.4 Memory Visibility
Global Memory
Zero-Copy Memory
Constant Memory
Texture Memory
20.5 A Simple Example
20.6 Runtime Limitations
Memory Footprint
Nesting Depth
Memory Allocation and Lifetime
ECC Errors
Streams
Events
Launch Pool
20.7 A More Complex Example
Linear Bezier Curves
Quadratic Bezier Curves
Bezier Curve Calculation (Predynamic Parallelism)
Bezier Curve Calculation (with Dynamic Parallelism)
20.8 Summary
Reference
CHAPTER 21 Conclusion and Future Outlook
21.1 Goals Revisited
21.2 Memory Model Evolution
21.3 Kernel Execution Control Evolution
21.4 Core Performance
21.5 Programming Environment
21.6 Future Outlook
References
Appendix A: Matrix Multiplication Host-Only Version Source Code
Appendix B: GPU Compute Capabilities
Index