Reading Notes: Andrej Karpathy's Software 2.0

Notes on the paradigm shift from traditional software to neural network-based software.

Published May 8, 2024 ET

ai reading-notes

Source: https://karpathy.medium.com/software-2-0-a64152b37c35

Summary

Core task of Software 2.0: curating, growing, massaging, and cleaning labeled datasets (vs. 1.0: maintaining and iterating on code)
Key advantage of 2.0: for many problems, it's significantly easier to collect the data than to explicitly write the program
Use cases: Visual Recognition, Speech Recognition, Speech Synthesis, Machine Translation, Games, Databases

To a first order, a typical neural network is made up of only two operations:

Matrix Multiplication
Thresholding at Zero (ReLU)
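A minimal sketch of how these two operations stack into a forward pass, in plain Python. The weights W1 and W2 and the helper names are hypothetical, picked by hand for illustration; nothing here is trained or taken from the article.

```python
def matmul(a, b):
    """Multiply matrix a (n x k) by matrix b (k x m), both given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    """Threshold every entry of the matrix at zero."""
    return [[max(0.0, v) for v in row] for row in m]


def forward(x, weights):
    """Alternate matmul and ReLU; the last layer is left linear."""
    h = x
    for i, w in enumerate(weights):
        h = matmul(h, w)
        if i < len(weights) - 1:
            h = relu(h)
    return h


# Hypothetical hand-picked weights (assumption, not from the article).
W1 = [[1.0, -1.0], [0.5, 0.5]]   # 2 inputs -> 2 hidden units
W2 = [[1.0], [-1.0]]             # 2 hidden units -> 1 output
x = [[2.0, 1.0]]                 # a single input row

y = forward(x, [W1, W2])
print(y)  # [[2.5]]
```

The hidden layer computes [2.5, -1.5] before thresholding; ReLU zeroes the negative entry, which is why the output is 2.5 rather than 4.0.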

Benefits of Software 2.0

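Computational Homogeneity: only two operations regardless of the use case (makes it easier to bake the operations directly into hardware)
Constant Running Time and Memory Use: every forward pass takes the same number of operations, and no memory is allocated dynamically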
High Portability
Agility
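A rough sketch of why running time and memory are fixed: for dense layers, the cost of a forward pass depends only on the layer shapes, never on the input values. The function name forward_cost and the layer sizes are assumptions made for illustration.

```python
def forward_cost(layer_shapes, batch_size=1):
    """Return (multiply_adds, activation_floats) for one forward pass.

    layer_shapes is a list of (inputs, outputs) pairs, one per dense layer.
    """
    multiply_adds = 0
    activations = 0
    for n_in, n_out in layer_shapes:
        multiply_adds += batch_size * n_in * n_out   # one multiply-add per weight
        activations += batch_size * n_out            # one stored output per unit
    return multiply_adds, activations


# Hypothetical layer sizes (assumption, not from the article).
full = forward_cost([(784, 128), (128, 10)])
half = forward_cost([(784, 64), (64, 10)])
print(full)  # (101632, 138)
print(half)  # (50816, 74)
```

Halving the hidden width halves the multiply-add count, which is one way to read the Agility point: a smaller network trades some accuracy for a predictable speedup.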

Disadvantages of Software 2.0

Obfuscated: "Across many applications areas, we'll be left with a choice of using a 90% accurate model we understand, or 99% accurate model we don't"
Unpredictable Failure: models can fail silently, e.g. by silently adopting biases in their training data
Still New: its properties are still being discovered

Key Concepts

Backpropagation & Stochastic Gradient Descent: allow searching the program space efficiently for a program that works well
Matrix Multiplication
Thresholding at Zero (ReLU)
ASICs - Application-Specific Integrated Circuits
Neuromorphic chips - computing inspired by the structure and function of the human brain

How Backpropagation Works

Run a forward pass to compute the predicted output
Use the loss function to quantify the difference between the predicted output and the actual output
Calculate the gradient of the loss function with respect to the weights and biases
Update the weights and biases by a small step in the direction opposite the gradient
Repeat forward propagation and backpropagation until "convergence"
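The steps above can be sketched on the smallest possible model: a single linear unit pred = w * x + b with a mean squared error loss. This is an assumption made to keep the example short. With one unit, backpropagation reduces to a single application of the chain rule, and the loop uses full-batch gradient descent rather than the stochastic variant. The names train, lr, tol and max_steps are hypothetical.

```python
def train(xs, ys, lr=0.05, tol=1e-10, max_steps=10_000):
    """Fit pred = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    loss = float("inf")
    for _ in range(max_steps):
        # Forward pass: compute predictions.
        preds = [w * x + b for x in xs]
        # Loss: mean squared difference between predicted and actual output.
        errors = [p - y for p, y in zip(preds, ys)]
        loss = sum(e * e for e in errors) / n
        if loss < tol:          # "convergence" here means the loss is below tol
            break
        # Gradient of the loss with respect to the weight and the bias.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Update: small step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, loss


# Toy data generated from y = 2x + 1 (made up for illustration).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b, loss = train(xs, ys)
```

On this data w approaches 2 and b approaches 1. In a deeper network the gradient step is the same idea, except that the chain rule is applied layer by layer from the output back to the input.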

Static vs Recurrent Backpropagation

Static: employed in Feed-Forward Neural Networks (OCR, Spam Detection)
Recurrent: employed in Recurrent Neural Networks (sentiment analysis, time series prediction)
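The difference shows up in the forward computation that the gradients later flow back through. A rough sketch with scalar weights, all of them hypothetical: the feed-forward function maps each input independently, while the recurrent one carries a hidden state from step to step, so the same weights are reused at every step and their gradients have to be accumulated across time.

```python
def relu(v):
    """Threshold a scalar at zero."""
    return max(0.0, v)


def feed_forward(xs, w=0.5):
    """Each output depends only on its own input."""
    return [relu(w * x) for x in xs]


def recurrent(xs, w_x=0.5, w_h=0.5):
    """Each output also depends on the previous hidden state."""
    h = 0.0
    outputs = []
    for x in xs:
        h = relu(w_x * x + w_h * h)   # same weights reused at every step
        outputs.append(h)
    return outputs


print(feed_forward([2.0, 4.0]))  # [1.0, 2.0]
print(recurrent([2.0, 4.0]))     # [1.0, 2.5]
print(recurrent([4.0, 2.0]))     # [2.0, 2.0]
```

Reordering the inputs only reorders the feed-forward outputs, but it changes the recurrent ones, which is why recurrent models suit sequence tasks such as time series.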