Reading Notes: Andrej Karpathy's Software 2.0
Notes on the paradigm shift from traditional software to neural network-based software.
Published May 8, 2024
Source: https://karpathy.medium.com/software-2-0-a64152b37c35
Summary
- Core task of Software 2.0: curating, growing, massaging, and cleaning labeled datasets (vs. 1.0: maintaining and iterating on code)
- Key advantage of 2.0: for many problems, it's significantly easier to collect the data than to explicitly write the program
- Use cases: Visual Recognition, Speech Recognition, Speech Synthesis, Machine Translation, Games, Databases
At their core, neural networks are made up of only two operations (see the sketch after this list):
- Matrix Multiplication
- Thresholding at Zero (ReLU)
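To make the article's claim concrete, here is a minimal NumPy sketch of one network layer: a matrix multiplication followed by thresholding at zero. The weights and input values are illustrative assumptions, not taken from the article.

```python
import numpy as np

# One layer of a neural network, reduced to the two core operations.
W = np.array([[0.5, -1.0],
              [-2.0, 0.3]])   # learned weights (the "program"; illustrative values)
x = np.array([1.0, -2.0])     # input vector (illustrative)

z = W @ x                     # operation 1: matrix multiplication
h = np.maximum(z, 0.0)        # operation 2: thresholding at zero (ReLU)
print(h)                      # -> [2.5 0. ]
```

Stacking many such layers, with the weight matrices found by training rather than written by hand, is what makes the result a "2.0" program.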
Benefits of Software 2.0
- Computational Homogeneity: the same two operations regardless of the use case, which allows baking them directly into hardware
- Constant Running Time and Memory Use: a forward pass of a fixed network always performs the same amount of computation
- High Portability
- Agility
Disadvantages of Software 2.0
- Obfuscated: "Across many application areas, we'll be left with a choice of using a 90% accurate model we understand, or a 99% accurate model we don't"
- Unpredictable Failure: models can fail silently, e.g., by adopting biases present in their training data
- Still New: the paradigm's properties are still being discovered
Key Concepts
- Backpropagation & Stochastic Gradient Descent: make searching the space of possible programs (network weights) far more efficient than exhaustive search
- Matrix Multiplication
- Thresholding at Zero (ReLU)
- ASICs - Application-Specific Integrated Circuits
- Neuromorphic chips - computing inspired by the structure and function of the human brain
How Backpropagation Works
- Use the loss function to quantify the difference between predicted output and actual output
- Calculate the gradient of the loss function with respect to weights and biases
- Update the weights and biases accordingly
- Repeat forward propagation and backpropagation until "convergence" (a numeric sketch follows this list)
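A minimal NumPy sketch of these four steps on a one-layer linear model with a squared-error loss; the toy data, learning rate, and shapes are illustrative assumptions (real networks stack many layers and rely on a framework's autograd):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                  # toy inputs (illustrative)
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)    # toy targets

w, b, lr = np.zeros(3), 0.0, 0.1
for step in range(200):
    y_hat = X @ w + b                          # forward propagation
    loss = np.mean((y_hat - y) ** 2)           # 1) quantify predicted vs. actual
    grad_out = 2 * (y_hat - y) / len(y)        # d(loss)/d(y_hat)
    grad_w = X.T @ grad_out                    # 2) gradient w.r.t. weights...
    grad_b = grad_out.sum()                    #    ...and bias
    w -= lr * grad_w                           # 3) update weights and bias
    b -= lr * grad_b                           # 4) loop repeats until convergence
```

After a few hundred steps, w approaches true_w, i.e., gradient descent has "located" a good program in the space of weights.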
Static vs. Recurrent Backpropagation
- Static: employed in feed-forward neural networks (OCR, spam detection)
- Recurrent: backpropagation through time, used in recurrent networks for sentiment analysis and time-series prediction (a minimal sketch follows)
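The difference is easiest to see in code. Below is a minimal NumPy sketch of backpropagation through time for a tiny recurrent network: unlike the static case, the gradient flows backward through every timestep of the unrolled sequence. All sizes, weights, and the toy loss here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n_in, n_h = 4, 3, 5                         # illustrative sequence length and sizes
Wx = rng.normal(scale=0.1, size=(n_h, n_in))   # input-to-hidden weights
Wh = rng.normal(scale=0.1, size=(n_h, n_h))    # hidden-to-hidden (recurrent) weights
xs = rng.normal(size=(T, n_in))                # a toy input sequence
target = rng.normal(size=n_h)                  # a toy target for the final state

# Forward pass: unroll the recurrence over all timesteps, caching each state.
hs = [np.zeros(n_h)]
for t in range(T):
    hs.append(np.tanh(Wx @ xs[t] + Wh @ hs[-1]))

# Backward pass: gradients flow back through every timestep, which is what
# distinguishes recurrent backpropagation from the static, single-pass case.
dWx, dWh = np.zeros_like(Wx), np.zeros_like(Wh)
dh = 2 * (hs[-1] - target)                     # d(squared error)/d(h_T)
for t in reversed(range(T)):
    dz = dh * (1 - hs[t + 1] ** 2)             # backprop through tanh
    dWx += np.outer(dz, xs[t])                 # accumulate gradients per timestep
    dWh += np.outer(dz, hs[t])
    dh = Wh.T @ dz                             # pass the gradient to h_{t-1}
```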