The main objective of this subproject is to endow IoT devices with low-energy intelligence based on artificial vision by integrating advanced sensing capabilities with algorithms adapted to them. Today, convolutional neural networks (CNNs) are the underlying processing architecture for a multitude of vision-related tasks. Although their accuracy is much higher than that of classical vision algorithms based on manually designed feature extraction (in fact, this is the main reason for the great relevance of CNNs), the hardware and power resources they require are massive. This is fundamentally because the input data flow of these networks is a serialization of the raw information provided by the sensor; at most, this information first passes through a dedicated processor for image enhancement (edge enhancement, tone mapping, etc.). We intend to explore different alternatives for incorporating vision into embedded platforms much more efficiently. We will start by tackling the reliable generation of scene representations in all kinds of situations. To this end, we will study high-dynamic-range techniques inspired by the operation of natural systems (in particular the retina) in order to accommodate extreme lighting conditions within a signal range equivalent to 8 bits. This will provide computational relief from the very beginning of the signal chain. At pixel level, we will investigate an operation based on the interaction of two diodes that continuously provide each other with information about the local and global lighting in the scene. The tasks on this topic will range from the physical modeling of the photodiodes and the design of circuits to the implementation of an integrated circuit and, finally, its testing. We will also study the potential of compressive learning as an alternative to conventional frame-based sensing followed by CNN-based inference.
With this approach, the compressive samples generated by a prototype chip that we will design in this subproject will be analyzed and classified by an algorithm (for example, a support vector machine) co-designed with the sensor. As an application scenario for compressive learning, we will work on facial recognition, which is of special interest to the IoT due to the growing importance of privacy. We will also study how emerging sensing modalities (event-based vision, depth sensing, multi-spectral sensing) can be combined with CNNs to increase the performance of embedded vision systems.
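As a purely illustrative sketch of the compressive learning idea described above, the following Python fragment classifies compressive samples directly, without reconstructing a frame. Everything in it is an assumption made for illustration and not the project's design: the random +/-1 measurement matrix, the 64-pixel toy images, the choice of 8 samples per image, and all function names are hypothetical, and a simple perceptron stands in for the support vector machine mentioned in the text.

```python
import random


def measurement_matrix(m, n, seed=0):
    """Hypothetical random +/-1 matrix: m compressive samples from n pixels."""
    rng = random.Random(seed)
    return [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(m)]


def compress(phi, x):
    """Compressive samples y = phi * x (one weighted pixel sum per row)."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


def toy_image(label, rng, n=64, noise=0.05):
    """Toy two-class data (assumption): the bright half depends on the label."""
    half = n // 2
    if label > 0:
        base = [1.0] * half + [0.0] * (n - half)
    else:
        base = [0.0] * half + [1.0] * (n - half)
    return [b + rng.gauss(0.0, noise) for b in base]


def train_perceptron(samples, labels, epochs=100):
    """Linear classifier trained on compressive samples (stand-in for an SVM)."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        errors = 0
        for y, t in zip(samples, labels):
            score = sum(wi * yi for wi, yi in zip(w, y)) + b
            if t * score <= 0:  # misclassified: move the hyperplane
                w = [wi + t * yi for wi, yi in zip(w, y)]
                b += t
                errors += 1
        if errors == 0:  # every training sample is classified correctly
            break
    return w, b


def predict(w, b, y):
    """Return +1 or -1 for a vector of compressive samples."""
    return 1 if sum(wi * yi for wi, yi in zip(w, y)) + b > 0 else -1


def demo():
    """Train and evaluate in the compressed domain; returns held-out accuracy."""
    rng = random.Random(1)
    phi = measurement_matrix(8, 64)  # 64 pixels reduced to 8 samples
    labels = [1, -1] * 20
    train = [compress(phi, toy_image(t, rng)) for t in labels]
    w, b = train_perceptron(train, labels)
    test_labels = [1, -1] * 10
    hits = sum(
        predict(w, b, compress(phi, toy_image(t, rng))) == t
        for t in test_labels
    )
    return hits / len(test_labels)


if __name__ == "__main__":
    print("held-out accuracy:", demo())
```

The point of the sketch is only that the classifier operates on 8 values per image instead of 64, which is where the computational saving would come from; a real sensor would produce these samples in hardware.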
Project PID2021-128009OB-C31 funded by MICIU/AEI/10.13039/501100011033 and by FEDER 'Una manera de hacer Europa'.