PhD Thesis, Jon Zubizarreta
Title: Monocular Visual Perception Techniques for Augmented Reality and Mobile Robotics Applications in Industry
Defense Date: 15/12/2019
Director: Iker Aguinaga Hoyos
Current advances in communication and computing technologies are having a large impact on industry, leading to what is known as the fourth industrial revolution or Industry 4.0. One of the challenges being addressed is to augment machines with the intelligence to mimic the cognitive functions of the human mind. In this context, machine perception is one of the core capacities for interpreting data about the world around us. For this purpose, computer vision (CV) is a commonly used solution due to its versatility and the low cost of optical sensors.
This thesis studies two different visual perception problems: object recognition and simultaneous localization and mapping (SLAM). The proposed solutions focus on single-camera (monocular) approaches in industrial environments. This is especially challenging due to the lack of texture on surfaces typical of industrial objects, uncontrolled illumination changes, non-Lambertian materials, which produce many reflections, and cluttered scenes. Both problems involve understanding the scene and determining the camera motion as accurately as possible. Object recognition focuses on identifying target 3D objects in the scene, whereas SLAM aims to recover the 3D structure of the scene.
The first part of this thesis proposes a novel model-based object recognition method that uses geometric properties. It combines model surface conics and edge templates to reduce the image search space, increasing localization robustness and saving computation time. In addition, the proposed method is integrated into a complete augmented reality (AR) framework for maintenance guidance in industry, called ARgitu. It generates and presents virtual and augmented information, including the tools required to develop new content and adapt AR applications to the advanced manufacturing industry.
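To give a flavour of edge-template matching with a restricted search space, the sketch below scores a template against an image edge map using a Chamfer-style distance. This is a simplified illustration, not the thesis implementation: the function names, the toy image, and the choice of `scipy.ndimage.distance_transform_edt` are all assumptions; in the proposed method, a geometric prior such as a detected model surface conic would restrict which offsets are evaluated.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def chamfer_score(edge_map, template_pts, offset):
    """Mean distance from shifted template edge points to the nearest
    image edge; lower means a better match (illustrative sketch)."""
    # Distance from every pixel to the closest edge pixel.
    dist = distance_transform_edt(edge_map == 0)
    dx, dy = offset
    pts = template_pts + np.array([dy, dx])
    return dist[pts[:, 0], pts[:, 1]].mean()

# Toy image: a single vertical edge at column 30.
edges = np.zeros((64, 64), dtype=np.uint8)
edges[:, 30] = 1
# Hypothetical template: a short vertical edge segment.
template = np.array([[r, 0] for r in range(10, 20)])

print(chamfer_score(edges, template, (30, 0)))  # exact alignment -> 0.0
print(chamfer_score(edges, template, (25, 0)))  # 5 px off -> 5.0
```

Evaluating this score only at offsets consistent with a geometric prior, rather than over the whole image, is what saves computation and reduces false matches in clutter.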
The second part of this thesis presents a direct monocular SLAM system, called Direct Sparse Mapping (DSM). It uses a direct formulation within a mapping framework to locate the position of the camera in the scene and build a consistent global map. To the best of our knowledge, this is the first fully direct SLAM approach to reuse map point reobservations. As a direct method, it does not rely on point matches, and it can work with points sampled along image edges, instead of only corners, obtaining a more descriptive reconstruction despite the sparse geometry representation. The system is robust in scenes with low texture and motion blur. Extensive experimental validation demonstrates that the proposed direct mapping framework outperforms current direct odometry approaches, even those with loop closure, in both trajectory and map accuracy.
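The core idea of a direct formulation is that the residual is a difference of image intensities rather than of matched keypoint positions: a point is back-projected from a reference frame, transformed by the candidate pose, projected into the current frame, and the intensities are compared. The sketch below illustrates this for a single point with a simplified pinhole model; the intrinsics, the nearest-neighbour lookup, and the synthetic images are illustrative assumptions, not details of DSM.

```python
import numpy as np

# Illustrative pinhole intrinsics (not values from the thesis).
fx = fy = 100.0
cx = cy = 32.0
K = np.array([[fx, 0.0, cx], [0.0, fy, cy], [0.0, 0.0, 1.0]])

def photometric_residual(I_ref, I_cur, u_ref, inv_depth, R, t):
    """Photometric error of one point: back-project pixel u_ref at the
    given inverse depth, apply the rigid motion (R, t), project into
    the current frame, and compare intensities (nearest-neighbour
    lookup for brevity)."""
    u, v = u_ref
    # Back-project pixel (u, v) to a 3D point in the reference frame.
    p_ref = np.linalg.inv(K) @ np.array([u, v, 1.0]) / inv_depth
    # Rigid-body transform into the current camera frame.
    p_cur = R @ p_ref + t
    # Perspective projection into the current image.
    uv = K @ (p_cur / p_cur[2])
    uc, vc = int(round(uv[0])), int(round(uv[1]))
    # Direct methods minimise this residual over poses and depths.
    return float(I_ref[v, u]) - float(I_cur[vc, uc])

# Synthetic 64x64 image: identity motion on the same image gives
# zero photometric error.
I = np.random.default_rng(0).random((64, 64))
r = photometric_residual(I, I, (40, 20), 0.5, np.eye(3), np.zeros(3))
print(abs(r) < 1e-12)  # prints True
```

Because the residual needs only an image gradient, not a repeatable corner detector, points can be sampled along edges, which is what makes the approach usable in the low-texture scenes described above.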