Reading Notes: Andrej Karpathy's Software 2.0

Notes on the paradigm shift from traditional software to neural network-based software.

Published May 8, 2024 ET

Source: https://karpathy.medium.com/software-2-0-a64152b37c35

Summary

  • Core task of software 2.0: curating, growing, massaging and cleaning labeled datasets (vs. 1.0: maintaining and iterating on code)
  • Key advantage of 2.0: for many problems, it's significantly easier to collect the data than to explicitly write the program
  • Use cases: Visual Recognition, Speech Recognition, Speech Synthesis, Machine Translation, Games, Databases

To a first order, a typical neural network is made up of only 2 operations:

  • Matrix Multiplication
  • Thresholding at Zero (ReLU)
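
To make the "two operations" idea concrete, here is a minimal sketch of a forward pass written in plain Python. The function names (matmul, relu, forward) and the toy weights are illustrative assumptions, not anything from the article; biases are left out to keep the sketch to exactly the two operations listed above.

```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m); both are lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    """Threshold every entry at zero."""
    return [[max(0.0, v) for v in row] for row in m]


def forward(x, weights):
    """Alternate matmul and ReLU; the last layer is left linear (an assumption)."""
    h = x
    for i, w in enumerate(weights):
        h = matmul(h, w)
        if i < len(weights) - 1:
            h = relu(h)
    return h


# Hypothetical toy network: 2 inputs -> 3 hidden units -> 1 output.
x = [[1.0, -2.0]]
w1 = [[1.0, -1.0, 0.5],
      [0.5, 1.0, -1.0]]
w2 = [[1.0], [2.0], [3.0]]

out = forward(x, [w1, w2])
print(out)  # [[7.5]]
```

The hidden pre-activations are [0.0, -3.0, 2.5]; ReLU zeroes the negative entry, and the final matmul gives 7.5.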

Benefits of Software 2.0

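  • Computational Homogeneity: only two operations regardless of the use case (makes it easier to bake the operations directly into hardware)
  • Constant Running Time and Memory Use: every forward pass takes the same amount of compute and memory, whatever the input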
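  • High Portability: a sequence of matrix multiplies is easier to run on arbitrary hardware than a classical binary
  • Agility: speed can be traded for accuracy, e.g. remove half of the channels and retrain to get a network that runs about twice as fast and works a bit worse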
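
As a rough illustration of the constant running time point, the sketch below counts the work in one forward pass of a fully connected network. The helper name forward_cost and the layer sizes are assumptions made for the example. The point is that the cost depends only on the layer sizes, never on the input values.

```python
def forward_cost(layer_sizes, batch=1):
    """Return (multiply_adds, activation_values) for one dense forward pass.

    Assumes plain fully connected layers with no data-dependent branching.
    """
    pairs = zip(layer_sizes, layer_sizes[1:])
    multiply_adds = sum(batch * n_in * n_out for n_in, n_out in pairs)
    activation_values = batch * sum(layer_sizes)
    return multiply_adds, activation_values


# Hypothetical network: 784 inputs -> 128 hidden units -> 10 outputs.
cost = forward_cost([784, 128, 10])
print(cost)  # (101632, 922)
```

Halving the hidden layer to 64 units halves the multiply-add count in this sketch, which is the kind of speed/accuracy trade the agility bullet describes.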

Disadvantages of Software 2.0

  • Obfuscated: "Across many applications areas, we'll be left with a choice of using a 90% accurate model we understand, or 99% accurate model we don't"
  • Unpredictable Failure: can fail in unintuitive ways, or fail silently, e.g. by adopting biases in the training data
  • Still new: peculiar properties are still being discovered (e.g. adversarial examples)

Key Concepts

  • Backpropagation & Stochastic Gradient Descent: allow an efficient search through program space for a program that works
  • Matrix Multiplication
  • Thresholding at Zero (ReLU)
  • ASICs: Application-Specific Integrated Circuits
  • Neuromorphic chips: computing inspired by the structure and function of the human brain

How Backpropagation Works

  1. Run a forward pass, then use the loss function to quantify the difference between predicted output and actual output
  2. Calculate the gradient of the loss function with respect to weights and biases by applying the chain rule backwards through the network
  3. Update the weights and biases by a small step against the gradient
  4. Repeat forward propagation and backpropagation until "convergence" (the loss stops improving)
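
The steps above can be sketched on the smallest possible model, a single linear unit y = w*x + b trained with mean squared error. This is only an illustration under stated assumptions: the function name train, the toy data, the learning rate and the tolerance are all made up for the example, and it uses full-batch gradient descent rather than stochastic mini-batches. With a single unit, the backward pass is just one application of the chain rule.

```python
def train(xs, ys, lr=0.1, tol=1e-12, max_steps=10_000):
    """Fit y = w*x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    history = []
    for _ in range(max_steps):
        # Step 1: forward pass, then measure the loss.
        preds = [w * x + b for x in xs]
        loss = sum((p - y) ** 2 for p, y in zip(preds, ys)) / n
        history.append(loss)
        if loss < tol:  # Step 4: stop at "convergence".
            break
        # Step 2: gradients via the chain rule (loss -> prediction -> w, b).
        dloss_dpred = [2.0 * (p - y) / n for p, y in zip(preds, ys)]
        grad_w = sum(g * x for g, x in zip(dloss_dpred, xs))
        grad_b = sum(dloss_dpred)
        # Step 3: move the parameters against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, history


# Hypothetical data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b, history = train(xs, ys)
print(round(w, 3), round(b, 3), len(history))
```

On this data the loop should recover w close to 2 and b close to 1; a real network repeats the same chain-rule step layer by layer.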

Static vs Recurrent Backpropagation

  • Static: employed in Feed-Forward Neural Networks (OCR, Spam Detection)
  • Recurrent: employed in Recurrent Neural Networks (sentiment analysis, time series prediction)
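
For the recurrent case, one common way to train a recurrent network is to unroll it over the input sequence and run backpropagation through the unrolled steps (often called backpropagation through time). The sketch below assumes that is what "recurrent" refers to here; the scalar model h_t = tanh(w*h_(t-1) + u*x_t), the function names and the toy values are all assumptions made for illustration.

```python
import math


def rnn_forward(w, u, xs):
    """Run a scalar recurrent unit over xs; returns all hidden states."""
    hs = [0.0]  # initial hidden state h_0
    for x in xs:
        hs.append(math.tanh(w * hs[-1] + u * x))
    return hs


def bptt(w, u, xs, target):
    """Loss on the final state and its gradients with respect to w and u."""
    hs = rnn_forward(w, u, xs)
    loss = 0.5 * (hs[-1] - target) ** 2
    dh = hs[-1] - target  # dLoss/dh_T
    grad_w = grad_u = 0.0
    # Walk backwards through time, reusing the same w and u at every step.
    for t in range(len(xs), 0, -1):
        dz = dh * (1.0 - hs[t] ** 2)  # back through tanh
        grad_w += dz * hs[t - 1]
        grad_u += dz * xs[t - 1]
        dh = dz * w                   # pass the gradient to h_(t-1)
    return loss, grad_w, grad_u


# Hypothetical parameters and a three-step input sequence.
w, u, xs, target = 0.5, -0.3, [1.0, 0.5, -1.0], 0.2
loss, grad_w, grad_u = bptt(w, u, xs, target)
print(loss, grad_w, grad_u)
```

The difference from the static case is that the same weights appear at every time step, so their gradients are summed across steps.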