Hardware Platforms for Embedded Computer Vision, Image Processing and Deep Learning
When developing computer vision or image processing applications for a desktop computer, the choice of hardware platform is simple: if you don’t need much processing power, use a cheap laptop or PC with an integrated GPU; if you need a lot of processing power, use a fast CPU with a powerful dedicated GPU card. For embedded systems there are many more options, and no single device suits every embedded application, because each has its own advantages and disadvantages. Here is a basic summary of embedded hardware that can be used for computer vision and image processing:
|Device Family||Peak Compute||Power Draw||Features|
|Microcontroller||<0.2 GFLOPS||<0.3 Watts||Most microcontrollers (eg: Arduino, AVR, PIC) are far too slow for camera processing, but a 32-bit ARM Cortex-M4 (eg: $15 168MHz STM32F407) might handle some extremely basic camera applications. Microcontrollers only support very minimal OSes, so your software typically runs as low-level firmware and you must write most of the algorithms & code yourself. The advantages are direct access to hardware such as I/O pins & timers, and hard real-time operation.|
|Mobile SOC dev board or tablet||1 – 25 GFLOPS||1 – 6 Watts||The latest mobile ARM CPUs can provide both great speed & low battery draw. If you mostly do integer processing then a cheaper Cortex-A8 (eg: $45 BeagleBone Black) or even an ARM11 (eg: $25 Raspberry Pi) might be good enough, but if you do a lot of floating-point then you should definitely get a Cortex-A9 (eg: $65 quad-core ODROID-U3) or Cortex-A15 (eg: $139 quad-core ODROID-XU or $192 quad-core Jetson TK1) board, since their FPU hardware is many times faster than the FPU in Cortex-A8. If you want the most performance available in this class then the GPU acceleration of Jetson TK1 is fastest, if you want the smallest size then a Gumstix Overo (>$100) is tiny, and if you want the best power efficiency then Cortex-A7 (eg: $55 A20-OLinuXino-LIME2) is the most efficient CPU. If you also want to visualize images on a screen then a rooted tablet running Android or Linux might be a better option (using Wi-Fi or Bluetooth to a microcontroller if you need I/O access). Software development for an ARM SOC is similar to desktop software development, and libraries like OpenCV are supported on ARM, but it’s not as easy as on x86. Running Android or standard Linux as your OS means your code can be interrupted just like on a PC, but some real-time OSes are available.|
|x86 SBC, netbook or small laptop||15 – 110 GFLOPS||30 – 100 Watts||Portable computers (eg: $139 Mini-ITX M/B + $300 quad-core 3.5GHz Core i7 CPU) can have a really fast CPU and are as easy to develop code on as a PC, but they are larger & a lot more power hungry. If you also want to visualize images then a netbook might be a better option than an SBC. Although software typically runs faster on x86 CPUs than on other CPUs, your OS will often interrupt your code to do things in the background, potentially causing you to lose camera frames, but some real-time OSes are available.|
|x86 laptop with dGPU||240 – 2200 GFLOPS||40 – 110 Watts||Some larger laptops include a dedicated GPU capable of CUDA or OpenCL GPU acceleration (eg: $1200 MSI GE60 or $1700 Alienware 14), so they are very well suited to intense computer vision. These are also as easy to develop code on as a PC, but they are a lot more power hungry, heavier & larger, even compared to x86 SBCs & netbooks.|
|FPGA/DSP/ASIC/DPU/CV hardware design||50 – 1000 GFLOPS||0.5 – 3 Watts||FPGAs (eg: $199 Cyclone II Starter Kit + $85 5MP camera) can be extremely fast with extremely low battery usage, but are very complex to design, taking months or years! With FPGAs you design the actual hardware logic of the chip, not the software, so the programming model is completely different to software programming, although you can insert a CPU into your FPGA design as well. Since you are effectively designing an electronic circuit, it is extremely low-level but also deterministic, so you can have guarantees that you won’t drop camera frames, etc.|
Some high-end DSPs (eg: TI C6x / EVE DSPs, Qualcomm “Machine Learning Platform” DSPs, and Analog Devices) are powerful enough for vision or deep learning. They are basically CPUs with large-scale SIMD instructions, so they need specialized programming, but it is much easier to develop algorithms on them than on FPGAs.
Some vendors offer specialized CPU designs that require specialized programming, such as multi-processor designs (eg: Adapteva Parallella FPUs) or VLIW + vector CPUs or DSPs (eg: Mobileye EyeQ, Tensilica Xtensa, CEVA CEVA-XM and Qualcomm Hexagon).
There are also some fixed-function imaging or vision accelerator ASIC chips that are extremely efficient at certain very specific algorithms (eg: Ambarella, Movidius Myriad 2, TI EVE, NEC IMAPCAR, Inuitive, FotoNation IPU, Renesas IMP, Visconti, Sensity / Eutecus, and Analog Devices).
Or if you are designing your own custom silicon, there are vision IP cores you can integrate into your chip (eg: CEVA, Tensilica, Synopsys, Videantis, Apical, CogniVue, Imagination Technologies, Vivante / VeriSilicon, and Adapteva “Epiphany”).
There are also some highly parallel Neural Network / Deep Learning Processor (DPU) AI accelerator chips on the market now or coming soon (eg: Nervana / Intel, Wave Computing, DeePhi, the IBM TrueNorth neuromorphic computer and Toshiba TDNN).
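To make the comparison in the table a bit more concrete, here is a rough Python sketch that estimates how much compute a camera pipeline needs and checks which device families could plausibly reach it. The GFLOPS and Watts figures are the upper bounds copied from the table; the function names and the figure of 100 floating-point operations per pixel are hypothetical, chosen only for illustration. Peak figures are rarely sustained in practice, so treat the results as a first filter, not a benchmark.

```python
from typing import List, NamedTuple


class DeviceFamily(NamedTuple):
    name: str
    peak_gflops: float  # upper bound of the compute range in the table
    peak_watts: float   # upper bound of the power range in the table


# Upper bounds taken from the summary table above.
FAMILIES = [
    DeviceFamily("Microcontroller", 0.2, 0.3),
    DeviceFamily("Mobile SOC dev board or tablet", 25.0, 6.0),
    DeviceFamily("x86 SBC, netbook or small laptop", 110.0, 100.0),
    DeviceFamily("x86 laptop with dGPU", 2200.0, 110.0),
    DeviceFamily("FPGA/DSP/ASIC/DPU/CV hardware design", 1000.0, 3.0),
]


def required_gflops(width: int, height: int, fps: float,
                    flops_per_pixel: float) -> float:
    """Estimate the compute needed to process every pixel of every frame.

    flops_per_pixel is an assumption about your algorithm, not a measured value.
    """
    return width * height * fps * flops_per_pixel / 1e9


def families_meeting(gflops_needed: float) -> List[DeviceFamily]:
    """Return the families whose peak compute reaches the requirement."""
    return [f for f in FAMILIES if f.peak_gflops >= gflops_needed]


def peak_gflops_per_watt(family: DeviceFamily) -> float:
    """Crude efficiency figure: peak compute divided by peak power draw."""
    return family.peak_gflops / family.peak_watts


if __name__ == "__main__":
    # Hypothetical workload: VGA video at 30 fps, 100 FLOPs per pixel.
    needed = required_gflops(640, 480, 30, 100)
    print(f"Workload needs roughly {needed:.2f} GFLOPS")
    for family in families_meeting(needed):
        print(f"  {family.name}: "
              f"{peak_gflops_per_watt(family):.1f} GFLOPS/Watt at peak")
```

Even this modest hypothetical workload is beyond the microcontroller row, which matches the summary above, and the GFLOPS-per-Watt ratio shows why the FPGA/DSP/ASIC class is attractive despite its development cost.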
Commercial Computer Vision Software providers:
In addition to hardware choices, you might also be interested in paying a computer vision development company to provide the software or algorithms, such as:
- VectorBlox (vision FPGA development)
- ZMicro (vision FPGA development)
- AImotive / AdasWorks (software for self-driving cars and ADAS)
- Algolux
- Itseez / Intel (creators & maintainers of OpenCV)
- QuEST Global (computer vision firmware development)
- Mitek (image-based POS sales)
- Numenta (software for anomaly detection)
- PathPartner Technology