TY - JOUR
T1 - A Fast Monocular 6D Pose Estimation Method for Textureless Objects Based on Perceptual Hashing and Template Matching
AU - Araya-Martinez, Jose Moises
AU - Soares Matthiesen, Vinicius
AU - Bøgh, Simon
AU - Lambrecht, Jens
AU - Figueiredo, Rui Miguel Horta Pimentel de
PY - 2025/1/8
Y1 - 2025/1/8
N2 - Object pose estimation is essential for computer vision applications such as quality inspection, robotic bin picking, and warehouse logistics. However, this task often requires expensive equipment such as 3D cameras or Lidar sensors, as well as significant computational resources. Many state-of-the-art methods for 6D pose estimation depend on deep neural networks, which are computationally demanding and require GPUs for real-time performance. Moreover, they usually involve collection and labeling of large training datasets, which is costly and time-consuming. We propose a template-based matching algorithm that utilizes a novel perceptual hashing method for binary images, enabling fast and robust pose estimation. This approach allows the automatic preselection of a subset of templates, significantly reducing inference time while maintaining similar accuracy. Our solution runs efficiently on multiple devices without GPU support, offering reduced runtime and high accuracy on cost-effective hardware. We benchmarked our proposed approach on a body-in-white automotive part, relevant to the automotive industry, and on a widely used publicly available dataset. Our experiments on a synthetically generated dataset reveal a superior trade-off between accuracy and computation time compared to a previous work evaluated on the same automotive-production use case. Additionally, our algorithm efficiently utilizes all CPU cores and includes adjustable parameters for balancing computation time and accuracy, making it suitable for a wide range of applications where hardware cost and power efficiency are critical. For instance, with a rotation step of 10° in the template database, we achieve an average rotation error of 10°, matching the template quantization level, and an average translation error of 14% of the object's size, with an average processing time of 0.3s per image on a small form-factor Nvidia AGX Orin device.
We also evaluate robustness under partial occlusions (up to 10% occlusion) and noisy inputs (SNRs up to 10dB), with only minor losses in accuracy. Additionally, we compare our method to state-of-the-art deep learning models on a public dataset. While our algorithm does not outperform them in absolute accuracy, it provides a more favorable trade-off between accuracy and processing time, which is especially relevant to applications employing resource-constrained devices.
AB - Object pose estimation is essential for computer vision applications such as quality inspection, robotic bin picking, and warehouse logistics. However, this task often requires expensive equipment such as 3D cameras or Lidar sensors, as well as significant computational resources. Many state-of-the-art methods for 6D pose estimation depend on deep neural networks, which are computationally demanding and require GPUs for real-time performance. Moreover, they usually involve collection and labeling of large training datasets, which is costly and time-consuming. We propose a template-based matching algorithm that utilizes a novel perceptual hashing method for binary images, enabling fast and robust pose estimation. This approach allows the automatic preselection of a subset of templates, significantly reducing inference time while maintaining similar accuracy. Our solution runs efficiently on multiple devices without GPU support, offering reduced runtime and high accuracy on cost-effective hardware. We benchmarked our proposed approach on a body-in-white automotive part, relevant to the automotive industry, and on a widely used publicly available dataset. Our experiments on a synthetically generated dataset reveal a superior trade-off between accuracy and computation time compared to a previous work evaluated on the same automotive-production use case. Additionally, our algorithm efficiently utilizes all CPU cores and includes adjustable parameters for balancing computation time and accuracy, making it suitable for a wide range of applications where hardware cost and power efficiency are critical. For instance, with a rotation step of 10° in the template database, we achieve an average rotation error of 10°, matching the template quantization level, and an average translation error of 14% of the object's size, with an average processing time of 0.3s per image on a small form-factor Nvidia AGX Orin device.
We also evaluate robustness under partial occlusions (up to 10% occlusion) and noisy inputs (SNRs up to 10dB), with only minor losses in accuracy. Additionally, we compare our method to state-of-the-art deep learning models on a public dataset. While our algorithm does not outperform them in absolute accuracy, it provides a more favorable trade-off between accuracy and processing time, which is especially relevant to applications employing resource-constrained devices.
KW - 6D pose estimation
KW - Artificial Intelligence (AI)
KW - Automotive production
KW - Perceptual hashing
KW - Hamming distance
KW - Object pose estimation
KW - Computer Vision
KW - Robotics
KW - Cameras
U2 - 10.3389/frobt.2024.1424036
DO - 10.3389/frobt.2024.1424036
M3 - Journal article
SN - 2296-9144
VL - 11
JO - Frontiers in Robotics and AI
JF - Frontiers in Robotics and AI
ER -