Computer vision (CV) based on Convolutional Neural Networks (CNNs) is a rapidly developing field thanks to the flexibility, strong generalization capability and classification accuracy of CNNs, which match and sometimes exceed human performance. CNN-based classifiers are typically deployed on servers or high-end embedded platforms. However, their ability to “compress” low-information-density data such as images into highly informative classification tags makes them extremely interesting for wearable and IoT scenarios, provided their computational requirements can be met within the energy budget of deeply embedded devices such as visual sensor nodes. We propose a 65 nm system-on-chip (SoC) implementing a hybrid HW/SW CNN accelerator designed for the energy efficiency that such devices require. The SoC integrates a near-threshold parallel processor cluster and a hardware accelerator for convolution-accumulation operations, which constitute the basic kernel of CNNs. It achieves a peak performance of 11.2 GMAC/s at 1.2 V and a peak energy efficiency of 261 GMAC/s/W at 0.65 V.
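To make the convolution-accumulation kernel and the efficiency figure concrete, the following is a minimal software sketch, not the accelerator's actual implementation. The function names `conv_accumulate` and `energy_per_mac_pj`, the "valid" windowing without kernel flipping, and the list-of-lists data layout are assumptions chosen for illustration only.

```python
def conv_accumulate(acc, image, kernel):
    """Add the 2D sliding-window product of `image` and `kernel` onto `acc`.

    Hypothetical reference model: 'valid' windows only, no kernel flipping.
    `acc` must have shape (H - kh + 1) x (W - kw + 1). Returns the MAC count.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    macs = 0
    for y in range(out_h):
        for x in range(out_w):
            total = acc[y][x]  # start from the previously accumulated value
            for i in range(kh):
                for j in range(kw):
                    total += image[y + i][x + j] * kernel[i][j]
                    macs += 1  # one multiply-accumulate per kernel tap
            acc[y][x] = total
    return macs


def energy_per_mac_pj(gmac_per_s_per_watt):
    """Convert an efficiency in GMAC/s/W (equivalently GMAC/J) to pJ per MAC."""
    # 1 / (G * 1e9 MAC/J) joules per MAC = 1000 / G picojoules per MAC
    return 1000.0 / gmac_per_s_per_watt


if __name__ == "__main__":
    image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    kernel = [[1, 0], [0, 1]]
    acc = [[0, 0], [0, 0]]
    print(conv_accumulate(acc, image, kernel), acc)
    print(round(energy_per_mac_pj(261.0), 2), "pJ/MAC")
```

Under these assumptions, the quoted peak efficiency of 261 GMAC/s/W corresponds to roughly 3.8 pJ per multiply-accumulate. Calling `conv_accumulate` repeatedly on the same `acc` sums the contributions of several input channels into one output map, which is the accumulation pattern a CNN layer needs.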