The 'one-shot' problem in vision science

Understanding how we perceive objects, and how we might automate that perception, has been difficult, partly because of how problems in this field have been formulated.  Much of the effort to understand how we perceive physical scenes has been driven by language and its categories.  But the mechanisms of visual perception are largely unconscious, so researchers can easily overlook some basic visual skills.

There is a formulation that may clarify our understanding of object perception.  It's called the 'one-shot' problem in object identification (let's leave object classification aside for now).  Show a person two images, each containing what looks like a real but unfamiliar rigid, opaque object, and ask whether the contents of the two images might be the same object.  Not in the same class or category, but the same actual object.  Might these two objects have a single identity?

This decision involves similarity analysis and correspondence analysis between patterns of shape and pigmentation on opaque, light-reflecting surfaces in three dimensions.  Reasonable changes in point of view and illumination are the confounding factors here.
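
To make "similarity analysis under changing illumination" concrete, here is a minimal sketch in Python.  It is a toy under loud assumptions, not a model of human vision: each surface patch is a short list of intensities, a change in illumination is modeled as nothing more than a gain and an offset on those intensities, and viewpoint changes are ignored entirely.  The function name zncc and the sample patches are made up for illustration.  The score used, zero-mean normalized cross-correlation, is unchanged by a positive gain and an offset, so the same pigment pattern under brighter light still scores as a match.

```python
import math


def zncc(patch_a, patch_b):
    """Zero-mean normalized cross-correlation of two equal-length intensity lists.

    Returns a value in [-1, 1]. Subtracting the mean removes an additive
    brightness offset; dividing by the norms removes a multiplicative gain.
    """
    if len(patch_a) != len(patch_b) or not patch_a:
        raise ValueError("patches must be non-empty and the same length")
    mean_a = sum(patch_a) / len(patch_a)
    mean_b = sum(patch_b) / len(patch_b)
    dev_a = [a - mean_a for a in patch_a]
    dev_b = [b - mean_b for b in patch_b]
    norm = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    if norm == 0.0:
        return 0.0  # a flat patch carries no pattern to compare
    return sum(x * y for x, y in zip(dev_a, dev_b)) / norm


# Hypothetical pigment pattern sampled along a small surface patch.
pigment = [0.2, 0.8, 0.5, 0.1, 0.9, 0.4]

# Assumed illumination change: brighter light (gain 1.7) plus ambient offset.
relit = [1.7 * p + 0.05 for p in pigment]

# A different pigment pattern with the same overall brightness.
other = [0.9, 0.1, 0.4, 0.8, 0.2, 0.5]

score_same = zncc(pigment, relit)
score_other = zncc(pigment, other)
print(f"same pattern, new lighting: {score_same:.3f}")
print(f"different pattern:          {score_other:.3f}")
```

The relit patch scores essentially 1 and the different pattern scores far lower.  Real illumination changes involve shading, shadows and highlights that depend on 3D shape, which is exactly why the one-shot problem is harder than this sketch suggests.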

The important thing about this visual skill in humans is that the viewed objects are novel, so a person can't have memorized a library of different presentations to compare with any new presentation.  No 'holistic' consolidated representation of the object, or of how it looks under different conditions, is available.

For this to work, local patches of 3D surface and local patterns of pigment must be familiar even if the whole object is unfamiliar.  The projective geometry of different viewpoints, the effects of different illumination conditions, and how to relate these must also be familiar.

Can a cognitive system account for, and then discount, changes in illumination and viewpoint?  You and I can.

The author's affiliation with The MITRE Corporation is provided for identification purposes only, and is not intended to convey or imply MITRE's concurrence with, or support for, the positions, opinions or viewpoints expressed by the author.