Collected Notes on LLMs and Neural Nets

My journey understanding AI, Machine Learning, LLMs, and the various models, patterns, companies, and concepts that shape this field.

Published May 8, 2024 ET

Series: My AI Journey, Part 1

This series follows my journey of understanding AI, Machine Learning, LLMs, and the many models, patterns, companies, APIs, people, concepts, movers, and news that shape that understanding.

Reading Notes:

Reading List:

A16z offers the "AI Canon": https://a16z.com/ai-canon/

What's a "vector" in the context of AI? https://www.pinecone.io/learn/

Prompt engineering guide: https://www.promptingguide.ai/

OpenAI Cookbook: https://github.com/openai/openai-cookbook/tree/main

Chain-of-thought: https://arxiv.org/abs/2201.11903

Sparks of AGI: https://arxiv.org/pdf/2303.12712

A survey of LLMs: https://arxiv.org/pdf/2303.18223v4

Chinchilla's implications: https://www.lesswrong.com/posts/6Fpvch8RR29qLEWNH/chinchilla-s-wild-implications

AI for FSD at Tesla: https://www.youtube.com/watch?v=hx7BXih7zx8

Predictive learning: https://www.youtube.com/watch?v=Ount2Y4qxQo&t=1072s

Reinforcement Learning: https://www.youtube.com/watch?v=hhiLw5Q_UFg

Reinforcement Learning from Human Feedback (RLHF): https://huyenchip.com/2023/05/02/rlhf.html

Illustrated Stable Diffusion: https://jalammar.github.io/illustrated-stable-diffusion/

Let's Build GPT: https://www.youtube.com/watch?v=kCc8FmEb1nY

Annotated Transformer: https://nlp.seas.harvard.edu/annotated-transformer/

Stanford NLP: https://www.youtube.com/playlist?list=PLoROMvodv4rOSH4v6133s9LFPRHjEmbmJ

Stanford ML: https://www.youtube.com/playlist?list=PLoROMvodv4rMiGQp3WXShtMGgzqpfVfbU

Convolutional Neural Nets: https://cs231n.github.io/

Backpropagation, Neural Nets: https://www.youtube.com/watch?v=i94OvYb6noo

Word2Vec: https://towardsdatascience.com/word2vec-explained-49c52b4ccb71

Practical Deep Learning for Coders: https://course.fast.ai/Lessons/lesson1.html

Wolfram Alpha Neural Net Repo: https://resources.wolframcloud.com/NeuralNetRepository

Building LLM applications for production: https://huyenchip.com/2023/04/11/llm-engineering.html

Criteria

To narrow the scope, at this point I need an understanding that allows me to confidently architect a user-facing application that leverages an LLM to accomplish some task.

Approach

To start, I believe the correct approach is to build a "skill map" or "conceptual framework", and to challenge that framework regularly as new information comes in.

Key Questions

  • Why doesn't Jarvis exist yet? What's stopping that?
  • How would I build such an AI with the technology we have today?
  • For an application using an LLM, how can you keep user data private?
  • Given that LLMs hallucinate, how do you mitigate hallucinations?
  • How do you teach your LLM to get smarter over time?
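One common pattern behind the hallucination and privacy questions above is retrieval grounding: fetch relevant text first, then ask the model to answer only from it. A minimal sketch of the idea (the scoring here is naive keyword overlap rather than vector embeddings, and the prompt would be handed to whatever LLM API you use; everything here is illustrative, not any specific product's method):

```python
# Toy retrieval-grounded prompting: answer only from retrieved snippets.
# Scoring is naive word overlap; real systems use vector embeddings.

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k docs sharing the most words with the query."""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())), reverse=True)
    return scored[:k]

def build_prompt(query: str, docs: list[str]) -> str:
    """Ask the model to answer strictly from the provided context."""
    context = "\n".join(retrieve(query, docs))
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

docs = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am to 5pm Eastern, Monday through Friday.",
]
print(build_prompt("refund policy", docs))
```

Constraining the model to supplied context doesn't eliminate hallucination, but it gives you something auditable: every answer can be traced back to a retrieved snippet.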

Key Vocabulary

ML

  • VC dimension, over-fitting, under-fitting
  • logistic regression, kernel trick, boosting
  • SVM, Bellman equation, decision tree
  • naive Bayesian model, autoregressive model
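To make a couple of these ML terms concrete: here is a tiny logistic regression trained by batch gradient descent on a toy 1-D dataset (pure NumPy, purely illustrative, not taken from any of the readings above):

```python
import numpy as np

# Logistic regression on a toy 1-D dataset, trained with gradient descent.
rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2, 1, 50), rng.normal(2, 1, 50)])  # features
y = np.concatenate([np.zeros(50), np.ones(50)])                    # labels

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    p = 1 / (1 + np.exp(-(w * X + b)))   # sigmoid prediction
    grad_w = np.mean((p - y) * X)        # gradient of log-loss w.r.t. w
    grad_b = np.mean(p - y)              # gradient of log-loss w.r.t. b
    w -= lr * grad_w
    b -= lr * grad_b

acc = np.mean(((1 / (1 + np.exp(-(w * X + b)))) > 0.5) == y)
print(f"train accuracy: {acc:.2f}")
```

Over-fitting and under-fitting show up when you swap this simple linear model for one with far more (or far fewer) parameters than the data supports.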

DL

  • Adam, softmax, residual connections
  • ReLU, dropout, CLIP
  • ViT, transposed convolution layer
  • SGD, BatchNorm, tokenizer, VAE
  • LSTM, GRU, GPT, GAN
  • Transformer
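A few of the DL terms are nearly one-liners in code. For example, softmax and ReLU (the softmax here uses the standard numerically stable form, shifting by the max so `exp()` can't overflow; again just an illustrative sketch):

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax: shift by max so exp() can't overflow."""
    e = np.exp(z - z.max())
    return e / e.sum()

def relu(z: np.ndarray) -> np.ndarray:
    """ReLU: zero out negative activations."""
    return np.maximum(z, 0.0)

logits = np.array([2.0, 1.0, 0.1])
probs = softmax(logits)
print(probs, probs.sum())  # probabilities sum to 1
```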

Math

  • Hessian, entropy, mutual information
  • Jacobian, gradient, Bayes' law
  • eigen-decomposition, SVD
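Likewise for a couple of the math terms: Shannon entropy and a Bayes' law update can each be worked in a few lines (the disease-test numbers below are made up for illustration):

```python
import math

def entropy(p: list[float]) -> float:
    """Shannon entropy in bits: H(p) = -sum(p_i * log2(p_i))."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

# Bayes' law: P(A|B) = P(B|A) * P(A) / P(B), with hypothetical test stats.
prior = 0.01            # P(disease)
sensitivity = 0.95      # P(positive | disease)
false_pos = 0.05        # P(positive | no disease)
p_positive = sensitivity * prior + false_pos * (1 - prior)
posterior = sensitivity * prior / p_positive  # P(disease | positive)

print(entropy([0.5, 0.5]))  # fair coin -> 1.0 bit
print(round(posterior, 3))
```

The counterintuitive part: even with a 95%-sensitive test, a rare condition (1% prior) yields a posterior of only about 16%, because false positives dominate.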