This paper proposes an image-guided HRTF selection procedure that exploits the relation between features of the pinna shape and HRTF notches. Using a 2D image of a user's pinna, the procedure selects from a database the HRTF set that best fits the anthropometry of that user. The procedure is designed to be quick to apply and easy to use for someone with no prior knowledge of binaural audio technologies. The entire process is evaluated by means of (i) an auditory model for sound localization in the mid-sagittal plane available from previous literature, and (ii) a short localization test in virtual reality. Using both virtual and real subjects from an HRTF database, the model predictions and the experimental evaluation assess vertical localization performance with HRTF sets selected by the proposed procedure. Our results show a statistically significant improvement in the auditory model's predicted localization performance with the selected HRTFs compared to KEMAR HRTFs, which are a commercial standard in many binaural audio solutions. Moreover, the results of the localization test with human listeners reflect the model's predictions, further supporting the applicability of our perceptually motivated metrics with anthropometric data extracted from pinna images.