Understanding Transformer Architecture by Building GPT
In Part 2, we constructed a straightforward MLP model to generate characters based on 32k popular names. In this lecture, Andrej shows us how to gradually incorporate the transformer architecture to improve the performance of our bigram model. We will start by refactoring our previous model and then add code from the transformer architecture piece by piece to see how it helps our model. Data Preparation: Let's first import the necessary libraries and get the data ready....
Multilayer Perceptron (MLP)
In Part 1, we learned how to build a neural network with one hidden layer to generate words. The model performed fairly well, generating the same words as the counting-based approach. However, the bigram model is limited by its assumption that each character depends only on the character before it. Suppose there is only one bigram starting with a particular character. In that case, the model will always generate the following character in that bigram, regardless of the context or the probability of other characters....
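The limitation described above can be illustrated with a minimal count-based sketch. The tiny name list, the "." start/end marker, and the helper `next_char_probs` are assumptions made for this illustration; they are not taken from the lecture code.

```python
from collections import Counter, defaultdict

# Tiny hypothetical corpus; the series itself uses about 32k names.
names = ["ava", "eva", "quinn", "quentin"]

# Count bigrams, using "." as an assumed start/end marker.
counts = defaultdict(Counter)
for name in names:
    chars = ["."] + list(name) + ["."]
    for first, second in zip(chars, chars[1:]):
        counts[first][second] += 1


def next_char_probs(ch):
    """Normalize the bigram counts for `ch` into probabilities."""
    total = sum(counts[ch].values())
    return {nxt: n / total for nxt, n in counts[ch].items()}


# "q" is only ever followed by "u" in this corpus, so the model
# has no choice after "q", whatever came before it.
probs_q = next_char_probs("q")
# "a" has two observed successors, so sampling can vary.
probs_a = next_char_probs("a")
print(probs_q)
print(probs_a)
```

In this toy corpus every probability for "q" collapses onto a single successor, which is the situation the excerpt describes: the bigram model cannot use earlier characters to choose anything else.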
Bigram Character-level Language Model
This is a series of learning notes for the excellent online course Neural Networks: Zero to Hero created by Andrej Karpathy. The official Jupyter Notebook for this lecture is here. In this lecture, Andrej shows us two different approaches to generating characters. The first approach involves sampling characters based on a probability distribution, while the second uses a neural network built from scratch. Before we can generate characters using either approach, let's prepare the data first....
Back to Basics - Data Science
Statistics: Introduction to Probability; Probability Distributions; Expected Value (Code | Slides | Video); Bayes Theorem; Central Limit Theorem (Code | Slides | Video); Confidence Interval (Code | Slides | Video); Hypothesis Testing (Code | Slides | Video); p-value; Type I and Type II Errors (Code | Slides | Video); Power of a Test; t-test (Code | Slides | Video); ANOVA (Code | Slides | Video); Chi-Square Test; Linear Regression
Machine Learning: Logistic Regression; Decision Trees; Random Forests; Gradient Boosting; Support Vector Machines; K-Nearest Neighbors; K-Means Clustering; Hierarchical Clustering; Principal Component Analysis; Singular Value Decomposition
Deep Learning: Neural Networks; Convolutional Neural Networks
Natural Language Processing: Word Embeddings; Word2Vec; GloVe; FastText; BERT
LeetCode (Grind75): Two Sum
R: tidyverse; ggplot2; dplyr; tidyr; purrr; readr; tibble; stringr
Python: numpy; pandas; matplotlib; seaborn; scikit-learn; scipy; decorator