Supervised machine learning: Introduction


In a previous post, I discussed statistical artificial intelligence as a family of methods for optimizing the behavior of software functions whose job it is to make judgments from data.  The oldest and most familiar of these is an approach called supervised learning.

A supervised learning system is one that processes signals in a mathematical question-and-answer format.  When presented with input (made up of 'known' variables or 'questions'), the system's job is to produce accurate output (made up of 'unknown' variables or 'answers').  In the most general case, a very large number of input and output variables may be involved, and all of the input and output variables may be continuous-valued (or analog).  Situations in which inputs or outputs take on discrete values (such as 1 for 'true' and 0 for 'false') are special cases of the more general analog case.  As such, the act of supervised categorization or classification of data is a special case of analog regression or prediction.
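One way to picture classification as a special case of analog prediction is to take a continuous-valued output and cut it at a threshold.  The sketch below is only an illustration: the function names, the logistic squashing function, the knob values `w` and `b`, and the 0.5 threshold are all assumptions chosen for the example, not something prescribed by this post.

```python
import math

def predict_score(x, w=1.5, b=-3.0):
    """Analog output in (0, 1). w and b are hypothetical, untrained knob settings."""
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def classify(x, threshold=0.5):
    """Discrete output (1 for 'true', 0 for 'false') derived from the analog score."""
    return 1 if predict_score(x) >= threshold else 0

# The same machinery yields either an analog answer or a discrete one.
inputs = [0.0, 1.0, 3.0, 4.0]
scores = [predict_score(x) for x in inputs]
labels = [classify(x) for x in inputs]
```

Here `scores` holds the continuous-valued answers and `labels` holds the discrete ones; the only difference between the two is the final thresholding step.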

In supervised systems, the training data consists of many input-output pairs, where each output is the correct answer for its input (this is the sense in which training is 'supervised').  In between inputs and outputs lies computational machinery that includes model variables akin to 'volume knobs' whose job it is to transform input signals to output signals.  These 'volume knobs' are acted upon by the learning process.  Training proceeds by presenting an input, using the existing machinery to generate an output, and measuring the error between the generated output and the correct answer.  The 'volume knobs' are then adjusted slightly to reduce that error.  Repeated in small steps across many input-output pairs, these adjustments settle on knob settings that serve all of the pairs as well as possible.
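The training procedure described above can be sketched in a few lines.  This is a minimal illustration under stated assumptions, not a definitive implementation: it assumes a model with just two knobs (`w` and `b`) that computes `w * x + b`, a squared-error measure, and a made-up data set whose correct answers follow y = 2x + 1.  The names `train`, `pairs`, `steps`, and `learning_rate` are hypothetical.

```python
import random

def train(pairs, steps=5000, learning_rate=0.01, seed=0):
    """Fit y = w * x + b by nudging the two knobs a little for each example."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0  # the 'volume knobs', starting from an arbitrary setting
    for _ in range(steps):
        x, target = rng.choice(pairs)   # present one input-output pair
        output = w * x + b              # generate an output with the current knobs
        error = output - target         # compare against the correct answer
        w -= learning_rate * error * x  # small step that reduces the squared error
        b -= learning_rate * error
    return w, b

# Hypothetical training data: the correct answers follow y = 2x + 1.
pairs = [(x, 2.0 * x + 1.0) for x in [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]]
w, b = train(pairs)
```

Because each step is small, no single pair dominates; over many presentations `w` and `b` should drift toward values near 2 and 1, which fit every pair in this data set.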

Such learning is only possible if there is statistical regularity in the training data.  Figure A shows a data set with statistical regularity that will be useful for learning how to predict the unknown variable from the known variable, whereas Figure B shows a data set with little predictive value.
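To make the contrast concrete, the sketch below generates two synthetic data sets and measures how strongly the unknown variable tracks the known one.  These data sets are hypothetical stand-ins, not the data behind Figures A and B, and the Pearson correlation used here is an assumption of this example: it captures only straight-line regularity, so it is one simple gauge among many.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation: near +1 or -1 means a strong linear relationship."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

rng = random.Random(0)
known = [rng.uniform(0.0, 10.0) for _ in range(500)]

# Regular data: the unknown variable depends on the known one, plus a little noise.
regular = [3.0 * x + rng.gauss(0.0, 1.0) for x in known]

# Irregular data: the unknown variable is drawn independently of the known one.
irregular = [rng.uniform(0.0, 30.0) for _ in known]

r_regular = pearson(known, regular)
r_irregular = pearson(known, irregular)
```

A value of `r_regular` close to 1 indicates that the known variable carries information about the unknown one, while a value of `r_irregular` close to 0 indicates there is little for a learner to exploit.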

The author's affiliation with The MITRE Corporation is provided for identification purposes only, and is not intended to convey or imply MITRE's concurrence with, or support for, the positions, opinions, or viewpoints expressed by the author.