Thursday, 16 May 2013

GPGPUES!!! What?!


What the hell is GPGPUES?


This is about General Purpose computing on Graphics Processing Units for Embedded Systems.

Choices And Rationale

When it comes to computing with GPUs on embedded systems, there are myriad approaches and platforms. Presented here is what we chose and why.

GPU/Graphics APIs

Since this is about embedded systems, we'll focus on mobile and portable devices (phones, tablets, wearables, ...) and perhaps on the embedded systems hidden in your TV, robot or sensors. Hence the only sane choice is OpenGL ES 2.0 or later. Other 3D graphics APIs such as DirectX, or even the desktop version of OpenGL, find no room here. We also won't go as far as considering OpenCL and its cousins: the market share of mobile devices supporting these is near zero at the time of writing (pre 2020 AD).

Further, to avoid tying ourselves to specific hardware, we'll try our best to avoid non-standard, proprietary or even non-mandatory extensions of OpenGL ES 2.0. Wherever an extension or a feature of OpenGL ES 3.0 or higher is likely to give a boost, it will be mentioned with an unmissable, glaring alarm, and only after the equivalent standard method has been given, so that compatibility is preserved.
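Staying compatible while still using an optional feature where it exists means detecting the feature at run time. Here is a minimal sketch of such a check, assuming the extension list is the usual space-separated string an OpenGL ES 2.0 context returns from glGetString(GL_EXTENSIONS). The helper name has_gl_extension is made up for this post and is not part of any API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns true if `name` appears as a whole,
 * space-separated token in `ext_list`. A plain strstr() is not enough,
 * because one extension name can be a prefix of another
 * (GL_OES_texture_float vs. GL_OES_texture_float_linear). */
bool has_gl_extension(const char *ext_list, const char *name)
{
    assert(name != NULL && name[0] != '\0');
    if (ext_list == NULL) {
        return false;   /* e.g. no current context, or the query failed */
    }
    const size_t len = strlen(name);
    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        const bool starts_token = (p == ext_list) || (p[-1] == ' ');
        const bool ends_token = (p[len] == ' ') || (p[len] == '\0');
        if (starts_token && ends_token) {
            return true;
        }
        p += len;       /* partial match only; keep searching */
    }
    return false;
}
```

With a live context, the call would look like has_gl_extension((const char *)glGetString(GL_EXTENSIONS), "GL_OES_texture_float"). When it returns false, fall back to the standard path.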

Exception: Most general-purpose algorithms we try to run on a GPU will still need to represent data as floats. Data will be passed as textures to fragment shaders, not as coordinates to vertex shaders. Unfortunately, vanilla OpenGL ES 2.0 doesn't mandate any floating-point textures. We can, and indeed will, pack and unpack a 32-bit float into a plain RGBA color texture (red, green, blue, alpha, one byte each), but that's far from ideal. We'd have ticked the marketing checkbox of running on the GPU, but performance will be abysmal because of the packing/unpacking overhead. So we'll prefer to do GPGPU on GPUs that do support floating-point textures, where each color component (e.g. red) can hold one 32-bit float. Given the pace of mobile GPU development, especially since Android proliferated, we are already seeing good competition in supporting 32-bit floating-point textures. Many mobile GPU vendors are coming forward with such advanced features, which matter far less from a pure graphics perspective, exclusively to support GPGPU on ES.
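To give a feel for the pack/unpack fallback, here is a rough CPU-side sketch of one common scheme: treat the value as a 32-bit fixed-point fraction and spread it over the four color bytes. It assumes the data has already been scaled into the range [0, 1). That restriction, and the function names, are assumptions made for this illustration; other encodings are possible. The shader has to redo the same arithmetic on every texture read and write, which is where the overhead comes from.

```c
#include <assert.h>
#include <stdint.h>

/* Packs a value in [0, 1) into four bytes, most significant byte in
 * rgba[0] (red). This is effectively a 32-bit fixed-point encoding;
 * the [0, 1) range is an assumption of this sketch. */
void pack_unit_float(float value, uint8_t rgba[4])
{
    assert(value >= 0.0f && value < 1.0f);
    const uint32_t fixed = (uint32_t)((double)value * 4294967296.0); /* value * 2^32 */
    rgba[0] = (uint8_t)(fixed >> 24);
    rgba[1] = (uint8_t)(fixed >> 16);
    rgba[2] = (uint8_t)(fixed >> 8);
    rgba[3] = (uint8_t)fixed;
}

/* Inverse of pack_unit_float(): rebuilds the value from the four bytes. */
float unpack_unit_float(const uint8_t rgba[4])
{
    const uint32_t fixed = ((uint32_t)rgba[0] << 24) |
                           ((uint32_t)rgba[1] << 16) |
                           ((uint32_t)rgba[2] << 8) |
                           (uint32_t)rgba[3];
    return (float)((double)fixed / 4294967296.0);
}
```

Packing 0.5f, for instance, yields the pixel (0x80, 0, 0, 0), and unpacking that pixel gives 0.5f back.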

Clarification: When the GPU is used for things that don't involve graphics, there is no confusion about what GPGPU means. It can get confusing when we use the GPU in a graphics context, so here's what is and what isn't GPGPU. When we display a swarm on screen, we use the GPU for its original purpose: graphics. But if we calculate the physics (movement, kinematics, interaction of individual swarm elements, ...) on the GPU before displaying it, that is GPGPU.
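To make the distinction concrete, the "physics" half of the swarm example might look like the CPU reference below. It is only a sketch: the Particle layout, the constant acceleration and the semi-implicit Euler step are assumptions chosen for brevity, and interaction between elements is left out. The point is that each element's update depends only on its own state, so the loop body is the part a GPGPU version would move into a fragment shader.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative state of one swarm element. */
typedef struct {
    float x, y;     /* position */
    float vx, vy;   /* velocity */
} Particle;

/* Advances one element by dt under constant acceleration (ax, ay) using
 * semi-implicit Euler: update the velocity first, then move with it. */
Particle step_particle(Particle p, float ax, float ay, float dt)
{
    p.vx += ax * dt;
    p.vy += ay * dt;
    p.x += p.vx * dt;
    p.y += p.vy * dt;
    return p;
}

/* CPU reference: on a GPU, each iteration would be one fragment. */
void step_swarm(Particle *swarm, size_t count, float ax, float ay, float dt)
{
    assert(swarm != NULL || count == 0);
    for (size_t i = 0; i < count; ++i) {
        swarm[i] = step_particle(swarm[i], ax, ay, dt);
    }
}
```

Drawing the updated positions afterwards is plain graphics. Computing them is the general-purpose part.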

Platform

The platform of choice must be not just ubiquitous but open as well. Luckily, unlike a decade ago, we do have a winner: Android. This dynamic dashboard shows that at least 99.9% of active Android devices support OpenGL ES 2.0. Amen.

Platform Access


The platform itself isn't the focus here. Most of our work in GPGPU will go into writing programmable fragment shaders (typically compiled on the fly) that run on the GPU.

The platform's language just provides a framework to pass raw data to and receive processed data from the GPU.
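The division of labour can be sketched on the CPU. Below, the shader is just a string of GLSL ES 1.00 source, which the host would hand to the driver at run time (glShaderSource and glCompileShader in OpenGL ES 2.0). run_kernel is a stand-in that mimics how the GPU invokes the shader once per output texel. The names u_input, v_texcoord, TexelKernel and run_kernel are made up for this illustration, and the emulation ignores texture formats, precision and clamping.

```c
#include <assert.h>
#include <stddef.h>

/* Fragment shader source kept as text so it can be compiled on the fly.
 * It doubles every texel of the input texture. */
const char *const kDoubleFragmentShader =
    "precision mediump float;\n"
    "uniform sampler2D u_input;\n"
    "varying vec2 v_texcoord;\n"
    "void main() {\n"
    "    gl_FragColor = 2.0 * texture2D(u_input, v_texcoord);\n"
    "}\n";

/* CPU equivalent of the shader body, for one single-channel texel. */
typedef float (*TexelKernel)(float input);

float double_texel(float input)
{
    return 2.0f * input;
}

/* CPU stand-in for a GPGPU pass: raw data in, one kernel invocation per
 * output texel, processed data out. */
void run_kernel(const float *in, float *out,
                size_t width, size_t height, TexelKernel kernel)
{
    assert(in != NULL && out != NULL && kernel != NULL);
    for (size_t y = 0; y < height; ++y) {
        for (size_t x = 0; x < width; ++x) {
            const size_t i = y * width + x;
            out[i] = kernel(in[i]);
        }
    }
}
```

In the real thing, the input array becomes a texture upload, the nested loops become a draw call covering the render target, and the output array is read back from the framebuffer.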


Android provides multiple ways of harnessing the GPU. It is exposed through the OpenGL ES 1.x and OpenGL ES 2.0 APIs, both from Java (Android SDK) and from C/C++ (Android NDK), and to some extent through RenderScript. Of those, RenderScript can apparently run on the CPU, GPU or DSP, but it is specific to Android. While we choose to focus on Android, the content is expected to be the same for any other device supporting OpenGL ES 2.0, so RenderScript loses out. And why have an extra wrapper when we can go native? After all, our job here is high-performance computation. NDK wins over SDK.


GIST: We'll focus on GPGPU using OpenGL ES 2.0 on the Android platform using the NDK.

Since we can't be bothered to describe those terms in detail, here are the Wikipedia links (and excerpts) for the terms involved.

GPU: Graphics Processing Unit

From wikipedia: GPU:
A graphics processing unit (GPU), also occasionally called visual processing unit (VPU), is a specialized electronic circuit designed to rapidly manipulate and alter memory to accelerate the creation of images in a frame buffer intended for output to a display.

GPGPU: General-purpose computing on graphics processing units 

From wikipedia: GPGPU:
General-purpose computing on graphics processing units (General-purpose graphics processing unit, GPGPU, GPGP or less often GP²U) is the utilization of a graphics processing unit (GPU), which typically handles computation only for computer graphics, to perform computation in applications traditionally handled by the central processing unit (CPU).

ES: Embedded Systems 

From wikipedia: Embedded Systems:
An embedded system is a computer system with a dedicated function within a larger mechanical or electrical system, often with real-time computing constraints. It is embedded as part of a complete device often including hardware and mechanical parts. By contrast, a general-purpose computer, such as a personal computer (PC), is designed to be flexible and to meet a wide range of end-user needs.

OpenGL ES: the ES suffix in GPGPUES is of course inspired by its use in OpenGL ES

OpenGL for Embedded Systems (OpenGL ES) is a subset of the OpenGL 3D graphics application programming interface (API) designed for embedded systems such as mobile phones, PDAs, and video game consoles. There is no GLUT or GLU. OpenGL ES is managed by the not-for-profit technology consortium, the Khronos Group, Inc.

OpenCL: Open Computing Language

Open Computing Language (OpenCL) is a framework for writing programs that execute across heterogeneous platforms consisting of central processing units (CPUs), graphics processing units (GPUs), DSPs and other processors. OpenCL includes a language (based on C99) for writing kernels (functions that execute on OpenCL devices), plus application programming interfaces (APIs) that are used to define and then control the platforms. OpenCL provides parallel computing using task-based and data-based parallelism. 

Android: Android Operating System used on mobile/portable/embedded devices

Android is a Linux-based operating system designed primarily for touchscreen mobile devices such as smartphones and tablet computers. Initially developed by Android, Inc., which Google backed financially and later bought in 2005, Android was unveiled in 2007 along with the founding of the Open Handset Alliance: a consortium of hardware, software, and telecommunication companies devoted to advancing open standards for mobile devices. The first Android-powered phone was sold in October 2008.

Renderscript: Render Script for computation using CPU/GPU/DSP on Android

From Android's own website: RenderScript Compute
Renderscript offers a high performance computation API at the native level that you write in C (C99 standard). Renderscript gives your apps the ability to run operations with automatic parallelization across all available processor cores. It also supports different types of processors such as the CPU, GPU or DSP. Renderscript is useful for apps that do image processing, mathematical modeling, or any operations that require lots of mathematical computation.
While some of Renderscript's functionality has improved over the years, its graphics capability was curtailed and deprecated without prior notice. The Android team justified the deprecation by claiming that it (either the graphics feature or the entire thing) was experimental, though no open documentation said so until SDK 4.1. Should official Android remove the whole thing, here is a Wikipedia link to Renderscript.

Look forward to tutorials, code snippets and projects here.