Saturday, November 22, 2014

CUDA Application Design and Development

As the computer industry retools to leverage massively parallel graphics processing units (GPUs), this book is designed to meet the needs of working software developers who need to understand GPU programming with CUDA and increase efficiency in their projects. 

CUDA Application Design and Development starts with an introduction to parallel computing concepts for readers with no previous parallel experience, and focuses on issues of immediate importance to working software developers: achieving high performance, maintaining competitiveness, analyzing CUDA benefits versus costs, and determining application lifespan.

The book then details the thought behind CUDA and teaches how to create, analyze, and debug CUDA applications. Throughout, the focus is on software engineering issues: how to use CUDA in the context of existing application code, with existing compilers, languages, software tools, and industry-standard API libraries.
Using an approach refined in a series of well-received articles at Dr. Dobb's Journal, author Rob Farber takes the reader step-by-step from fundamentals to implementation, moving from language theory to practical coding.

* Includes multiple examples building from simple to more complex applications in four key areas: machine learning, visualization, vision recognition, and mobile computing
* Addresses the foundational issues for CUDA development: multi-threaded programming and the different memory hierarchy
* Includes teaching chapters designed to give a full understanding of CUDA tools, techniques and structure.
* Presents CUDA techniques in the context of the hardware they are implemented on as well as other styles of programming that will help readers bridge into the new material.

Graphics Shaders: Theory and Practice

Programmable graphics shaders, programs that can be downloaded to a graphics processor (GPU) to carry out operations outside the fixed-function pipeline of earlier standards, have become a key feature of computer graphics. This book is designed to open computer graphics shader programming to the student, whether in a traditional class or on their own. It will complement texts based on fixed-function graphics APIs, specifically OpenGL. It introduces shader programming in general, and specifically the GLSL shader language. It also introduces a flexible, easy-to-use tool, glman, that helps you develop, test, and tune shaders outside an application that would use them.

API Design for C++

The design of application programming interfaces can affect the behavior, capabilities, stability, and ease of use of end-user applications. With this book, you will learn how to design a good API for large-scale long-term projects. With extensive C++ code to illustrate each concept, API Design for C++ covers all of the strategies of world-class API development. Martin Reddy draws on over fifteen years of experience in the software industry to offer in-depth discussions of interface design, documentation, testing, and the advanced topics of scripting and plug-in extensibility. Throughout, he focuses on various API styles and patterns that will allow you to produce elegant and durable libraries.

* The only book that teaches the strategies of C++ API development, including design, versioning, documentation, testing, scripting, and extensibility.
* Extensive code examples illustrate each concept, with fully functional examples and working source code for experimentation available online.
* Covers various API styles and patterns with a focus on practical and efficient designs for large-scale long-term projects.

Interactive Computer Graphics

This book is suitable for undergraduate students in computer science and engineering, for students in other disciplines who have good programming skills, and for professionals.

Computer animation and graphics–once rare, complicated, and comparatively expensive–are now prevalent in everyday life from the computer screen to the movie screen. Interactive Computer Graphics: A Top-Down Approach with Shader-Based OpenGL®, 6e, is the only introduction to computer graphics text for undergraduates that fully integrates OpenGL 3.1 and emphasizes application-based programming. Using C and C++, the top-down, programming-oriented approach allows for coverage of engaging 3D material early in the text so readers immediately begin to create their own 3D graphics. Low-level algorithms (for topics such as line drawing and filling polygons) are presented after readers learn to create graphics.

The Boost C++ Libraries

The Boost C++ Libraries introduces 38 general purpose Boost libraries. They should be of great use to C++ developers - no matter what industry they work in and no matter what software they create.
The most important goal of the book is to increase your efficiency as a C++ developer. You will learn how to use the Boost libraries to write less code with fewer bugs and finish projects faster. And you will see how the Boost libraries help you write more concise code that is more easily maintained and more easily understood by others.
Just as The Boost C++ Libraries focuses on increasing your efficiency, the author has tried hard to introduce the libraries as efficiently as possible so you can learn about the Boost libraries easily and quickly. Ideally you should be able to read the book in one or two days and understand each Boost library immediately, without having to read chapters a second time. Even if you have no experience with any of the 38 Boost libraries, once you have read the book, you should be able to decide which ones to use and know how to use them.
Although the book is not a reference, you may want to look up chapters from time to time to recall details. It does not replace the official documentation for the Boost libraries; instead it complements it.
The book comes with over 250 examples, which are short but complete - they can be built and run. The idea is to help you quickly understand what classes and functions the Boost libraries offer. Again, it's about getting you up to speed.
The author considers the book a success if you find the 38 Boost libraries introduced easy to use, and if they help you become a more productive C++ developer. He also considers it a success if you go through the book with ease and find the explanations and examples crystal-clear. This book and the Boost libraries should make your life as a C++ developer easier.
The Boost C++ Libraries introduces the following libraries from Boost 1.47.0, which was released in July 2011:

  1. Any 
  2. Array 
  3. Asio 
  4. Bimap 
  5. Bind 
  6. CircularBuffer 
  7. Conversion 
  8. DateTime 
  9. DynamicBitset 
  10. Exception 
  11. Filesystem 3 
  12. Foreach 
  13. Format 
  14. Function 
  15. Interprocess 
  16. Intrusive 
  17. Lambda 
  18. MinMax 
  19. MultiArray 
  20. MultiIndex 
  21. NumericConversion 
  22. Operators 
  23. PointerContainer 
  24. Ref 
  25. Regex 
  26. Serialization 
  27. Signals2 
  28. SmartPointers 
  29. Spirit 2.x 
  30. StringAlgorithms 
  31. System 
  32. Swap 
  33. Thread 
  34. Tokenizer 
  35. Tuple 
  36. Unordered 
  37. Utility 
  38. Variant

Physically-based Rendering

Physically Based Rendering, Second Edition describes both the mathematical theory behind a modern photorealistic rendering system and its practical implementation. A method known as "literate programming" combines human-readable documentation and source code into a single reference that is specifically designed to aid comprehension. Through the ideas and software in this book, you will learn to design and employ a full-featured rendering system for creating stunning imagery.
This new edition greatly refines its best-selling predecessor by adding sections on parallel rendering and system design; animating transformations; multispectral rendering; blue noise and adaptive sampling patterns and reconstruction; measured BRDFs; instant global illumination, as well as subsurface and multiple-scattering integrators. These updates reflect the current state-of-the-art technology, and along with the lucid pairing of text and code, ensure the book's leading position as a reference text for those working with images, whether it is for film, video, photography, digital design, visualization, or games.
The author team of Matt Pharr, Greg Humphreys, and Pat Hanrahan garnered a 2014 Academy Award for Scientific and Technical Achievement from the Academy of Motion Picture Arts and Sciences based on the knowledge shared in this book. The Academy called the book a "widely adopted practical roadmap for most physically based shading and lighting systems used in film production."

For further details, you can visit the official website for PBRT. 

Thursday, November 6, 2014

Using Shared Memory in CUDA C/C++

In a previous post, I looked at how global memory accesses by a group of threads can be coalesced into a single transaction, and how alignment and stride affect coalescing for various generations of CUDA hardware. For recent versions of CUDA hardware, misaligned data accesses are not a big issue. Striding through global memory, however, is problematic regardless of the hardware generation, and would seem to be unavoidable in many cases, such as when accessing elements in a multidimensional array along the second and higher dimensions. Fortunately, it is possible to coalesce memory access in such cases if we use shared memory. Before I show you how to avoid striding through global memory in the next post, I first need to describe shared memory in some detail.

Shared Memory

Because it is on-chip, shared memory is much faster than local and global memory. In fact, shared memory latency is roughly 100x lower than uncached global memory latency (provided that there are no bank conflicts between the threads, which we will examine later in this post). Shared memory is allocated per thread block, so all threads in the block have access to the same shared memory. Threads can access data in shared memory loaded from global memory by other threads within the same thread block. This capability (combined with thread synchronization) has a number of uses, such as user-managed data caches, high-performance cooperative parallel algorithms (parallel reductions, for example), and global memory coalescing in cases where it would otherwise not be possible.
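To make the per-block allocation concrete, here is a minimal sketch of a parallel reduction, one of the cooperative uses mentioned above. It is an illustration under stated assumptions, not code from the original article: the names `blockSum`, `sumOnesOnDevice`, and `kBlockSize` are hypothetical, the block size is assumed to be a power of two, and error checking is omitted for brevity. It relies on the `__syncthreads()` barrier, which is described in the Thread Synchronization section.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

constexpr int kBlockSize = 256;  // threads per block (assumed; must be a power of two here)

// Each block sums up to kBlockSize input elements into one partial sum.
__global__ void blockSum(const float *in, float *partial, int n)
{
    __shared__ float tile[kBlockSize];  // one copy of this array per thread block

    int t = threadIdx.x;
    int i = blockIdx.x * blockDim.x + t;

    // Each thread stages one element from global memory into shared memory.
    tile[t] = (i < n) ? in[i] : 0.0f;
    __syncthreads();  // barrier: every store is visible to the whole block

    // Tree reduction: each step reads values written by other threads.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (t < stride) tile[t] += tile[t + stride];
        __syncthreads();  // reached by all threads, outside the divergent branch
    }

    if (t == 0) partial[blockIdx.x] = tile[0];
}

// Host helper (hypothetical name): sums n ones on the device and returns the total.
float sumOnesOnDevice(int n)
{
    int blocks = (n + kBlockSize - 1) / kBlockSize;
    std::vector<float> h_in(n, 1.0f), h_partial(blocks);

    float *d_in = nullptr, *d_partial = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_partial, blocks * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);

    blockSum<<<blocks, kBlockSize>>>(d_in, d_partial, n);
    cudaMemcpy(h_partial.data(), d_partial, blocks * sizeof(float),
               cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_partial);

    float total = 0.0f;
    for (float p : h_partial) total += p;  // finish the sum on the host
    return total;
}

int main()
{
    printf("sum = %.0f\n", sumOnesOnDevice(1000));
    return 0;
}
```

Each block gets its own `tile` array, so the partial sums of different blocks never interfere; combining them is left to the host in this sketch to keep the kernel short.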

Thread Synchronization

When sharing data between threads, we need to be careful to avoid race conditions, because while threads in a block run logically in parallel, not all threads can execute physically at the same time. Let’s say that two threads A and B each load a data element from global memory and store it to shared memory. Then, thread A wants to read B’s element from shared memory, and vice versa. Let’s assume that A and B are threads in two different warps. If B has not finished writing its element before A tries to read it, we have a race condition, which can lead to undefined behavior and incorrect results.
To ensure correct results when parallel threads cooperate, we must synchronize the threads. CUDA provides a simple barrier synchronization primitive, __syncthreads(). A thread’s execution can only proceed past a __syncthreads() after all threads in its block have executed the __syncthreads(). Thus, we can avoid the race condition described above by calling __syncthreads() after the store to shared memory and before any threads load from shared memory. It’s important to be aware that calling __syncthreads() in divergent code is undefined and can lead to deadlock: all threads within a thread block must call __syncthreads() at the same point.
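The A-and-B scenario above can be sketched as a kernel that reverses a small array within one block. This is an illustrative sketch rather than the article's own listing: the names `reverseInBlock`, `reverseMatches`, and `kN` are hypothetical, the array length of 64 is an assumption chosen so the block spans two warps, and error checking is omitted.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int kN = 64;  // array length; one block of kN threads (assumed size)

__global__ void reverseInBlock(int *d, int n)
{
    __shared__ int s[kN];
    int t  = threadIdx.x;
    int tr = n - t - 1;   // index of the element this thread will read back

    s[t] = d[t];          // store this thread's element to shared memory
    __syncthreads();      // without this barrier, s[tr] may not be written yet
    d[t] = s[tr];         // load an element that a different thread stored
}

// Host helper (hypothetical name): true if the device reversed 0..kN-1 correctly.
bool reverseMatches()
{
    int h[kN], *d = nullptr;
    for (int i = 0; i < kN; ++i) h[i] = i;

    cudaMalloc(&d, kN * sizeof(int));
    cudaMemcpy(d, h, kN * sizeof(int), cudaMemcpyHostToDevice);
    reverseInBlock<<<1, kN>>>(d, kN);
    cudaMemcpy(h, d, kN * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d);

    for (int i = 0; i < kN; ++i)
        if (h[i] != kN - 1 - i) return false;
    return true;
}

int main()
{
    printf("%s\n", reverseMatches() ? "reversed correctly" : "mismatch");
    return 0;
}
```

Note that the barrier sits at the top level of the kernel, so every thread in the block reaches it. Placing it inside a branch that only some threads take would be the divergent case warned about above.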

For the rest of the article, have a look here

Wednesday, November 5, 2014

"... the perfect companion to Programming Massively Parallel Processors by Hwu & Kirk."
— Nicolas Pinto, Research Scientist at Harvard & MIT, NVIDIA Fellow 2009-2010.  

Graphics processing units (GPUs) can do much more than render graphics. Scientists and researchers increasingly look to GPUs to improve the efficiency and performance of computationally-intensive experiments across a range of disciplines.

GPU Computing Gems: Emerald Edition brings their techniques to you, showcasing GPU-based solutions including:

* Black hole simulations with CUDA
* GPU-accelerated computation and interactive display of molecular orbitals
* Temporal data mining for neuroscience
* GPU-based parallelization for fast circuit optimization
* Fast graph cuts for computer vision
* Real-time stereo on GPGPU using progressive multi-resolution adaptive windows
* GPU image demosaicing
* Tomographic image reconstruction from unordered lines with CUDA
* Medical image processing using GPU-accelerated ITK image filters
* 41 more chapters of innovative GPU computing ideas, written to be accessible to researchers from any domain

GPU Computing Gems: Emerald Edition is the first volume in Morgan Kaufmann's Applications of GPU Computing Series, offering the latest insights and research in computer vision, electronic design automation, emerging data-intensive applications, life sciences, medical imaging, ray tracing and rendering, scientific simulation, signal and audio processing, statistical modeling, and video / image processing.

* Covers the breadth of industry from scientific simulation and electronic design automation to audio / video processing, medical imaging, computer vision, and more
* Many examples leverage NVIDIA's CUDA parallel computing architecture, the most widely-adopted massively parallel programming solution
* Offers insights and ideas as well as practical "hands-on" skills you can immediately put to use.


“Every C++ professional needs a copy of Effective C++. It is an absolute must-read for anyone thinking of doing serious C++ development. If you’ve never read Effective C++ and you think you know everything about C++, think again.”
— Steve Schirripa, Software Engineer, Google

“C++ and the C++ community have grown up in the last fifteen years, and the third edition of Effective C++ reflects this. The clear and precise style of the book is evidence of Scott’s deep insight and distinctive ability to impart knowledge.”
— Gerhard Kreuzer, Research and Development Engineer, Siemens AG

The first two editions of Effective C++ were embraced by hundreds of thousands of programmers worldwide. The reason is clear: Scott Meyers’ practical approach to C++ describes the rules of thumb used by the experts — the things they almost always do or almost always avoid doing — to produce clear, correct, efficient code.

The book is organized around 55 specific guidelines, each of which describes a way to write better C++. Each is backed by concrete examples. For this third edition, more than half the content is new, including added chapters on managing resources and using templates. Topics from the second edition have been extensively revised to reflect modern design considerations, including exceptions, design patterns, and multithreading.

Important features of Effective C++ include:

* Expert guidance on the design of effective classes, functions, templates, and inheritance hierarchies.
* Applications of new “TR1” standard library functionality, along with comparisons to existing standard library components.
* Insights into differences between C++ and other languages (e.g., Java, C#, C) that help developers from those languages assimilate “the C++ way” of doing things.

Run CUDA without Recompilation on x86, AMD GPUs, and Intel Xeon Phi with gpuOcelot

Various pathways exist to run CUDA on a variety of different architectures. The freely available gpuOcelot project is unique in that it currently allows CUDA binaries to run on NVIDIA GPUs, AMD GPUs, x86 and Intel Xeon Phi at full speed without recompilation. It works by dynamically analyzing and recompiling the PTX instructions of the CUDA kernels so they can run on the destination device. Sound too good to be true? Udacity has prepared a tutorial on running CUDA code without a GPU under Linux. The tutorial also links to instructions for using gpuOcelot on Windows and Mac.

For further details, have a look here!