OVERVIEW OF NEURAL NETWORK AI & DEEP LEARNING
Neural networks were developed to simulate the human nervous system for machine
learning tasks by treating the computational units in a learning model in a manner similar
to human neurons. The grand vision of neural networks is to create artificial intelligence
by building machines whose architecture simulates the computations in the human nervous
system. This is obviously not a simple task because the computational power of the
fastest computer today is a minuscule fraction of the computational power of a human
brain. Neural networks were developed soon after the advent of computers in the fifties and
sixties. Rosenblatt's perceptron algorithm was seen as a fundamental cornerstone of neural
networks, which caused an initial excitement about the prospects of artificial intelligence.
However, after the initial euphoria, there was a period of disappointment in which the
data-hungry and computationally intensive nature of neural networks was seen as an impediment
to their usability. Eventually, at the turn of the century, greater data availability and increasing
computational power led to increased successes of neural networks, and this area
was reborn under the new label of "deep learning."
AI Performance
Although we are still far from the day when artificial intelligence (AI) is close to human
performance in general, there are specific domains, such as image recognition, self-driving
cars, and game playing, where AI has matched or exceeded
human performance. It is also hard to predict what AI might be able to do in the
future. For example, few computer vision experts would have thought two decades ago that
any automated system could ever perform an intuitive task like categorizing an image more
accurately than a human.
Neural networks are theoretically capable of learning any mathematical function with
sufficient training data, and some variants like recurrent neural networks are known to be
Turing complete. Turing completeness means that such a network can, in principle, simulate
any algorithm, given sufficient computational resources; learning that algorithm from
examples additionally requires sufficient training data. The sticking point is that the amount
of data required to learn even simple tasks is often extraordinarily large, which causes a
corresponding increase in training time (if we assume that enough training data is available
in the first place). For example, the training time for image recognition, which is a simple
task for a human, can be on the order of weeks even on high-performance systems. Furthermore,
there are practical issues associated with the stability of neural network training,
which are still being resolved today. Nevertheless, given that the speed of computers is
expected to increase rapidly over time, and fundamentally more powerful paradigms like
quantum computing are on the horizon, the computational issue might eventually turn
out not to be as critical as imagined.
Although the biological analogy of neural networks is an exciting one and evokes comparisons
with science fiction, the mathematical understanding of neural networks is a more
mundane one. The neural network abstraction can be viewed as a modular approach to
enabling learning algorithms that are based on continuous optimization on a computational
graph of dependencies between the input and output. To be fair, this is not very different
from traditional work in control theory; indeed, some of the methods used for optimization
in control theory are strikingly similar to (and historically preceded) the most fundamental
algorithms in neural networks. However, the large amounts of data available in recent years
together with increased computational power have enabled experimentation with deeper
architectures of these computational graphs than was previously possible. The resulting
success has changed the broader perception of the potential of deep learning.
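
To make the phrase "continuous optimization on a computational graph" concrete, the following is a minimal sketch rather than a definitive implementation. It assumes the smallest possible graph: one input, one linear node, one sigmoid node, and a squared-error loss, trained by plain gradient descent. The function names, the toy data set, the learning rate, and the step count are all illustrative assumptions and do not come from the text.

```python
import math


def sigmoid(z):
    """Logistic activation, used here as the single nonlinear node."""
    return 1.0 / (1.0 + math.exp(-z))


def forward(w, b, x):
    """Forward pass through the graph: x -> z = w*x + b -> a = sigmoid(z)."""
    z = w * x + b
    a = sigmoid(z)
    return z, a


def loss_and_grads(w, b, data):
    """Mean squared error over the data, with gradients from the chain rule."""
    n = len(data)
    loss, grad_w, grad_b = 0.0, 0.0, 0.0
    for x, y in data:
        _, a = forward(w, b, x)
        loss += (a - y) ** 2 / n
        # Backward pass, one node at a time: dL/da, then da/dz, then dz/dw and dz/db.
        dl_da = 2.0 * (a - y) / n
        da_dz = a * (1.0 - a)
        grad_w += dl_da * da_dz * x
        grad_b += dl_da * da_dz
    return loss, grad_w, grad_b


def train(data, lr=0.5, steps=2000):
    """Gradient descent on the two parameters (lr and steps are arbitrary choices)."""
    w, b = 0.0, 0.0
    history = []
    for _ in range(steps):
        loss, grad_w, grad_b = loss_and_grads(w, b, data)
        history.append(loss)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, history


# Hypothetical toy data: the target is 1 when the input is positive, else 0.
TOY_DATA = [(-2.0, 0.0), (-1.0, 0.0), (1.0, 1.0), (2.0, 1.0)]

if __name__ == "__main__":
    w, b, history = train(TOY_DATA)
    print(f"initial loss {history[0]:.4f}, final loss {history[-1]:.4f}")
```

Deeper architectures follow the same pattern with more nodes between the input and the loss. The backward pass still multiplies local derivatives along each path of the graph, which is the modularity the paragraph above refers to.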