Visual foveation is the association of a sensor with spatially variant resolution, the retina, with a dynamically controlled mechanism for directing the area of maximal acuity. This paper describes a processing chain that exploits and controls foveated vision for high-level interpretation tasks requiring high resolution in several areas of the field of view, such as subordinate or fine-grained recognition. The process sequentially rejects wrong hypotheses by applying binary classifiers between subsets of hypotheses on local parts, following an adaptive policy that maximizes the rejection capacity. The algorithm is evaluated on a fine-grained car classification problem. Foveation is mimicked by subsampling high-resolution images. The underlying question is how to design high-level image understanding tasks that comply with a given visual bandwidth, i.e. with a given budget of acquired pixels per unit of time.
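The sequential rejection process described above can be illustrated with a minimal sketch. It is not the paper's implementation: the greedy criterion (the worst-case number of hypotheses a test is guaranteed to reject), the fixation budget, the `PartTest` structure, the toy car classes and the oracle classifier are all assumptions made for illustration.

```python
from typing import Callable, FrozenSet, Iterable, List, NamedTuple, Set, Tuple


class PartTest(NamedTuple):
    """Hypothetical binary test between two subsets of hypotheses on one part."""
    part: str
    side_a: FrozenSet[str]
    side_b: FrozenSet[str]
    decide: Callable[[str], bool]  # True means the evidence favours side_a


def rejection_capacity(test: PartTest, alive: Set[str]) -> int:
    # Assumed criterion: hypotheses rejected in the worst case,
    # whichever side the classifier ends up favouring.
    return min(len(test.side_a & alive), len(test.side_b & alive))


def sequential_rejection(
    hypotheses: Iterable[str], tests: List[PartTest], budget: int
) -> Tuple[Set[str], List[str]]:
    """Greedily fixate parts until one hypothesis remains or the budget is spent."""
    alive = set(hypotheses)
    remaining = list(tests)
    fixations: List[str] = []
    while len(alive) > 1 and remaining and len(fixations) < budget:
        best = max(remaining, key=lambda t: rejection_capacity(t, alive))
        if rejection_capacity(best, alive) == 0:
            break  # no remaining test can separate the surviving hypotheses
        remaining.remove(best)
        fixations.append(best.part)  # one high-resolution glimpse of this part
        if best.decide(best.part):
            alive -= best.side_b
        else:
            alive -= best.side_a
    return alive, fixations


def oracle_test(part: str, side_a: Set[str], side_b: Set[str], truth: str) -> PartTest:
    # Stand-in for a trained classifier: always answers correctly (assumption).
    a, b = frozenset(side_a), frozenset(side_b)
    return PartTest(part, a, b, lambda _part: truth in a)


classes = {"sedan", "coupe", "wagon", "suv"}
truth = "wagon"
tests = [
    oracle_test("grille", {"sedan", "coupe"}, {"wagon", "suv"}, truth),
    oracle_test("rear", {"sedan", "wagon"}, {"coupe", "suv"}, truth),
    oracle_test("headlight", {"sedan"}, {"coupe"}, truth),
]
alive, fixations = sequential_rejection(classes, tests, budget=3)
print(alive, fixations)
```

In this toy run the policy is adaptive in the sense that the capacity of each test is recomputed against the hypotheses still alive, so a test that separates only already-rejected classes is never selected. The budget on fixations plays the role of the visual bandwidth constraint.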