👋 Hi, I’m Gejun Zhu

Welcome to my learning journal. I learn by doing: I build projects in analytics engineering, data pipelines, AI, and machine learning to truly understand how things work. Inspired by Feynman’s insight, “What I cannot create, I don’t understand,” I document what I build, what I break, and what I learn along the way. I am currently transitioning from analyst to analytics engineer while exploring the world of LLMs and AI agents. If you believe in learning through hands-on experimentation, you’re in the right place.

📌 Understanding Transformer Architecture by Building GPT

In Part 2, we constructed a straightforward MLP model to generate characters based on 32k popular names. In this lecture, Andrej guides us through gradually incorporating the transformer architecture to improve the performance of our bigram model. We will start by refactoring our previous model and then add code from the transformer architecture piece by piece to see how each piece helps our model. ...

Kimball vs Inmon: Why Kimball is Better for Analytics

I have been learning dbt recently and wanted to get a better understanding of data warehousing methodologies. Two of the most well-known ones are the Kimball and Inmon approaches. In this post, I will explain the key differences between the two and why Kimball is generally considered better for analytics.

Kimball (Dimensional Modeling)

Kimball’s approach uses a star schema with two main types of tables:

Fact Tables: Store measurable events (sales amount, quantity sold, order totals). These are the “what happened” of your business.
Dimension Tables: Store descriptive context (customer info, product details, dates). These answer “who, what, where, when.”

The star schema looks like this: ...
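The post's own diagram is not reproduced in this excerpt. As a separate illustration of the fact/dimension split described above, here is a minimal sketch using Python's built-in sqlite3 module. The table names (`fact_sales`, `dim_customer`, `dim_product`), the columns, and the sample rows are all made up for this example and are not taken from the post.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind.
conn = sqlite3.connect(":memory:")

# Hypothetical star schema: one fact table pointing at two dimension tables.
conn.executescript("""
CREATE TABLE dim_customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT
);
CREATE TABLE dim_product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT
);
CREATE TABLE fact_sales (
    sale_id       INTEGER PRIMARY KEY,
    customer_id   INTEGER REFERENCES dim_customer(customer_id),
    product_id    INTEGER REFERENCES dim_product(product_id),
    quantity_sold INTEGER,
    sales_amount  REAL
);
INSERT INTO dim_customer VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO dim_product  VALUES (10, 'Notebook'), (20, 'Pen');
INSERT INTO fact_sales   VALUES
    (100, 1, 10, 2, 10.0),
    (101, 1, 20, 5, 7.5),
    (102, 2, 10, 1, 5.0);
""")

# A typical analytics query: aggregate the facts ("what happened"),
# grouped by a descriptive attribute from a dimension ("who").
rows = conn.execute("""
    SELECT c.customer_name, SUM(f.sales_amount) AS total_sales
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.customer_id = f.customer_id
    GROUP BY c.customer_name
    ORDER BY c.customer_name
""").fetchall()

print(rows)
```

The measures (`quantity_sold`, `sales_amount`) live only in the fact table, while names and other descriptive attributes live in the dimensions, so most analytics questions reduce to a join plus a GROUP BY.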

Fundamentals of dbt

This cheat sheet covers the fundamentals of dbt (data build tool), a popular data transformation tool used in modern data engineering and analytics workflows. It includes key concepts, commands, and best practices to get you started with dbt.

Fundamentals (Must Know)

What is dbt?

dbt (data build tool) is a transformation tool that enables analysts and engineers to transform data in their warehouse using SQL.

Key concept: dbt handles the T in ELT (Extract, Load, Transform). It doesn’t extract or load data; it transforms data that’s already in your warehouse. ...

Building My First dbt Project with DuckDB

Why I’m Learning dbt as an Analyst

You can find the project on GitHub here: zhugejun/learn-dbt-by-building

I’ve been an Institutional Research Analyst in higher education for almost a decade. In my day-to-day job, I wrangle enrollment data with SQL, automate reports with R, build prediction models with Python, and visualize data with Power BI. There are times when I need to run queries to pull data from CAMS directly, download data from ZogoTech (our third-party OLAP vendor), save it as CSV, and load it into R for aggregation, visualization, and further analysis. Sometimes, I need to ingest the enrollment history data from the National Student Clearinghouse (NSC) and combine it with data from multiple sources to create a superintendent report. ...

Multilayer Perceptron (MLP)

In Part 1, we learned how to build a neural network with one hidden layer to generate words. The model we built performed fairly well: it reproduced the words generated by the counting-based approach. However, the bigram model is limited by its assumption that each character depends only on the previous character. Suppose there is only one bigram starting with a particular character. In that case, the model will always generate the following character in that bigram, regardless of the context or the probability of other characters. This lack of context can lead to poor performance of bigram models. In this lecture, Andrej shows us how to build a multilayer neural network to improve the model performance. ...
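To make the single-bigram limitation concrete, here is a small counting sketch in plain Python. It is not the code from the lecture; the three-word list and the helper names `bigram_counts` and `next_char_probs` are assumptions made for illustration, and "." is used as a start/end marker.

```python
from collections import Counter, defaultdict


def bigram_counts(words):
    """Count how often each character is followed by each other character."""
    counts = defaultdict(Counter)
    for word in words:
        chars = ["."] + list(word) + ["."]  # "." marks word start and end
        for current, following in zip(chars, chars[1:]):
            counts[current][following] += 1
    return counts


def next_char_probs(counts, ch):
    """Turn the raw counts for one character into next-character probabilities."""
    total = sum(counts[ch].values())
    return {nxt: n / total for nxt, n in counts[ch].items()}


# Tiny made-up dataset: "q" is only ever followed by "u".
words = ["quinn", "quincy", "ava"]
counts = bigram_counts(words)

print(next_char_probs(counts, "q"))  # only one option, so it is always chosen
print(next_char_probs(counts, "a"))  # two options, split evenly in this dataset
```

Because every "q" in this toy dataset is followed by "u", the model assigns that bigram probability 1.0 and can never produce anything else after "q", no matter what came earlier in the word. A model that looks at more than one previous character has room to do better.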