Varietal identification plays a pivotal role in viticulture for several purposes. Nowadays, such identification is accomplished using ampelography and molecular markers, techniques requiring specific expertise and equipment. Deep learning, on the other hand, appears to be a viable and cost-effective alternative, as several recent studies claim that computer vision models can identify different vine varieties with high accuracy. Such works, however, limit their scope to a handful of selected varieties and do not provide accurate figures for external data validation. In the current study, five well-known computer vision models were applied to leaf images to verify whether the results presented in the literature can be replicated over a larger data set consisting of 27 varieties with 26 382 images. It was built over 2 years of dedicated field sampling at three geographically distinct sites, and a validation data set was collected from the Internet. Cross-validation results on the purpose-built data set confirm literature results. However, the same models, when validated against the independent data set, appear unable to generalize over the training data and retain the performances measured during cross validation. These results indicate that further enhancement have been done in filling such a gap and developing a more reliable model to discriminate among grape varieties, underlining that, to achieve this purpose, the image resolution appears to be a crucial factor in the development of such models.