Teaching
My Courses
Satbayev University (2024-present)
- Autumn 2025
- CSE7912: Development of Intelligent Applications
- Spring 2025
- CSE1273: Object-Oriented Programming in Java
- Autumn 2024
- CSE2562: Applied Text Processing (NLP)
University of Washington (2018-2022)
- Spring 2022 (grading)
- Winter 2022 (grading)
- Autumn 2021 (grading)
- Summer 2021 Full-term (instructor for a full course)
- Winter 2021
- Autumn 2020
- Summer 2020 Full-term (instructor for a full course)
- Spring 2020
- Winter 2020
- Autumn 2019
- Summer 2019 (grading)
- Spring 2019
- Winter 2019
- Autumn 2018
Kazakh-British Technical University (2017-2018)
- Calculus I
- Calculus II
- Calculus III
Resources
This section presents some resources that have been useful in my work.
General Machine Learning & Theory
- Yandex’s online machine learning textbook is designed for those who are not afraid of mathematics and want to delve into ML technologies, covering everything from classical theory to cutting-edge topics, with new chapters added regularly.
- A visual introduction to information theory. This post explores the fundamentals of information theory, including optimal encoding, entropy, cross-entropy, mutual information, and other essential concepts that underpin how machine learning models learn from data.
- Matrix multiplication as two kinds of linear combinations (row-wise and column-wise); see the short NumPy sketch after this list.
- Different upsampling techniques used in CNNs.
- Grokking: Generalization beyond overfitting on small algorithmic datasets.
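To make the row-wise vs. column-wise view of matrix multiplication concrete, here is a minimal NumPy sketch; the matrices are toy values chosen purely for illustration:

    import numpy as np

    A = np.array([[1., 2.],
                  [3., 4.]])
    B = np.array([[5., 6.],
                  [7., 8.]])

    # Column view: column j of A @ B is a linear combination of the
    # columns of A, weighted by column j of B.
    col_view = np.column_stack(
        [sum(B[k, j] * A[:, k] for k in range(A.shape[1]))
         for j in range(B.shape[1])])

    # Row view: row i of A @ B is a linear combination of the
    # rows of B, weighted by row i of A.
    row_view = np.vstack(
        [sum(A[i, k] * B[k, :] for k in range(A.shape[1]))
         for i in range(A.shape[0])])

    assert np.allclose(col_view, A @ B) and np.allclose(row_view, A @ B)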
Natural Language Processing (NLP)
- A wonderful introductory course on NLP by Lena Voita. It covers basic topics in a very beginner-friendly and visual format. Moreover, each chapter contains research-oriented questions that can inspire the reader to think about the material in novel ways.
- Nicely illustrated blog posts about transformers and Seq2Seq models with attention by Jay Alammar. These posts can serve as a great entry point into the nuts and bolts of transformer-based models. His blog also contains other well-visualized posts on machine learning topics.
- An add-on by Lena explaining Convolutional Neural Networks for text in more detail.
- A blog post by Lena digging into the nuts and bolts of attention heads in Transformer models.
- A video course on NLP from Stanford University (CS224N, Winter 2019). It can serve as a good theoretical introduction to the basics of NLP.
- A great resource for a deeper understanding of how LSTMs work, with pleasant and informative illustrations.
- The Illustrated GPT-2.
- A useful tutorial on understanding and coding self-attention, multi-head attention, cross-attention, and causal attention in LLMs; a minimal self-attention sketch follows this list.
- RLHF: Reinforcement Learning from Human Feedback.
- Mixture of Experts architecture.
- Explanation of LLaMA’s architecture in contrast to a vanilla transformer.
- Some works addressing the issue of context length in transformers: Longformer, YaRN, LongRoPE.
- COS 597G: Understanding Large Language Models.
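As a companion to the self-attention tutorial above, here is a minimal NumPy sketch of single-head scaled dot-product self-attention; the shapes, random weights, and causal flag are assumptions made purely for illustration:

    import numpy as np

    def self_attention(x, w_q, w_k, w_v, causal=False):
        # x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_head)
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        scores = q @ k.T / np.sqrt(k.shape[-1])          # (seq_len, seq_len)
        if causal:                                       # mask out future positions
            scores = np.where(np.tril(np.ones_like(scores)) == 1, scores, -1e9)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
        return weights @ v                               # (seq_len, d_head)

    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))                          # toy sequence: 4 tokens, d_model=8
    w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
    out = self_attention(x, w_q, w_k, w_v, causal=True)  # (4, 8)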
Multimodal Learning
The most notable multimodal architectures to know: CLIP (and its variations: X-CLIP, UniCLIP, DeCLIP, FILIP, ULIP), Flamingo, BLIP, BLIP-2, InstructBLIP, Macaw-LLM, LLaVA (shallow fusion), LLaVA-NeXT, CogVLM (deep fusion), ImageBind, NExT-GPT, LaVIN (Mixture-of-Modality Adaptation (MMA)), ALIGN, OFA.
- A comprehensive survey on Multimodal Large Language Models (MLLMs).
- Contrastive loss for multimodal retrieval; a CLIP-style loss sketch follows this list.
- A great blog post on Multimodality and Large Multimodal Models (LMMs).
- One more blog post about popular LMM architectures from Determined AI.
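For the contrastive-loss entry above, here is a minimal sketch of a symmetric CLIP-style (InfoNCE) objective over a batch of paired image/text embeddings; the batch size, embedding dimension, and temperature are illustrative assumptions:

    import numpy as np

    def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
        # img_emb, txt_emb: (batch, dim), assumed L2-normalized;
        # the i-th image is the positive pair for the i-th text.
        logits = img_emb @ txt_emb.T / temperature       # (batch, batch) similarities

        def cross_entropy(l):
            # row-wise log-softmax; the correct class sits on the diagonal
            l = l - l.max(axis=-1, keepdims=True)
            log_probs = l - np.log(np.exp(l).sum(axis=-1, keepdims=True))
            return -np.diag(log_probs).mean()

        # average the image->text and text->image directions
        return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

    rng = np.random.default_rng(0)
    norm = lambda m: m / np.linalg.norm(m, axis=-1, keepdims=True)
    loss = clip_contrastive_loss(norm(rng.normal(size=(8, 64))),
                                 norm(rng.normal(size=(8, 64))))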
Reinforcement Learning
- A glorious introductory course on Deep Reinforcement Learning.
- Maximum Entropy Reinforcement Learning; a tiny sketch of the entropy-augmented objective follows this list.
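The gist of the maximum-entropy entry above: the agent maximizes expected return plus a bonus for keeping its policy stochastic. A tiny sketch of the entropy-augmented objective for a discrete policy, with all numbers made up for illustration:

    import numpy as np

    def policy_entropy(probs):
        # Shannon entropy H(pi(.|s)) of a discrete action distribution
        probs = np.clip(probs, 1e-12, 1.0)
        return -(probs * np.log(probs)).sum(axis=-1)

    # Soft (max-ent) per-step objective: r(s_t, a_t) + alpha * H(pi(.|s_t)),
    # where alpha trades reward against exploration.
    alpha = 0.2
    rewards = np.array([1.0, 0.5, 2.0])          # toy per-step rewards
    action_probs = np.array([[0.7, 0.3],         # pi(.|s_t) at each step
                             [0.5, 0.5],
                             [0.9, 0.1]])
    soft_return = (rewards + alpha * policy_entropy(action_probs)).sum()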
Deep Learning Engineering
- Parameter-Efficient Fine-Tuning (PEFT) techniques: prefix-tuning and low-rank adaptation (LoRA); a minimal LoRA sketch follows below.
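To make the LoRA idea concrete, here is a minimal NumPy sketch of a low-rank adapter sitting on top of a frozen linear layer; the layer sizes, rank, and scaling follow common defaults but are assumptions here:

    import numpy as np

    class LoRALinear:
        # Frozen pretrained weight W plus a trainable low-rank update B @ A.
        def __init__(self, w_frozen, rank=4, alpha=8, rng=None):
            rng = rng or np.random.default_rng(0)
            d_out, d_in = w_frozen.shape
            self.w = w_frozen                             # frozen, never updated
            self.a = rng.normal(0.0, 0.01, (rank, d_in))  # trainable, small random init
            self.b = np.zeros((d_out, rank))              # trainable, zero init, so the
            self.scale = alpha / rank                     # output is unchanged at start

        def __call__(self, x):
            # y = x W^T + scale * (x A^T) B^T; only A and B would be trained
            return x @ self.w.T + self.scale * (x @ self.a.T) @ self.b.T

    layer = LoRALinear(np.random.default_rng(1).normal(size=(16, 32)))
    y = layer(np.ones((2, 32)))                           # (2, 16)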