Teaching
A curated collection of teaching materials, resources, and practical tips, drawn from my university studies, teaching experience, and professional work. I hope these insights and tools support your learning and growth.
My Courses
You can find my University of Washington profile with a list of courses here.
Satbayev University (2024-present)
- CSE2562: Applied Text Processing
University of Washington (2018-2022)
- Spring 2022 (grading)
- Winter 2022 (grading)
- Autumn 2021 (grading)
- Summer 2021 Full-term (instructor for a full course)
- Winter 2021
- Autumn 2020
- Summer 2020 Full-term (instructor for a full course)
- Spring 2020
- Winter 2020
- Autumn 2019
- Summer 2019 (grading)
- Spring 2019
- Winter 2019
- Autumn 2018
Kazakh-British Technical University (2017-2018)
- Calculus I
- Calculus II
- Calculus III
Resources
In this section, I have curated resources that have been instrumental in my research. The aim is to provide useful starting points for those exploring these subjects independently.
General Machine Learning & Theory
- Yandex’s online machine learning textbook, designed for those who are not afraid of mathematics and want to delve into ML technologies. It covers everything from classical theory to cutting-edge topics, with new chapters added regularly.
- A visual introduction to information theory. This post explores the fundamentals of information theory, including optimal encoding, entropy, cross-entropy, mutual information, and other essential concepts that underpin how machine learning models learn from data.
- Matrix multiplication as two kinds of linear combinations (row-wise and column-wise); see the NumPy sketch after this list.
- Different upsampling techniques used in CNNs.
- Grokking: generalization beyond overfitting on small algorithmic datasets.
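To make the row/column picture concrete, here is a minimal NumPy sketch (my own illustration, not code from the linked post) that builds the product A @ B both ways and checks them against NumPy’s result:

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
B = np.array([[5., 6.],
              [7., 8.]])

# Column-wise view: column j of A @ B is a linear combination of the
# columns of A, with weights taken from column j of B.
col_view = np.stack(
    [sum(B[k, j] * A[:, k] for k in range(A.shape[1]))
     for j in range(B.shape[1])],
    axis=1,
)

# Row-wise view: row i of A @ B is a linear combination of the
# rows of B, with weights taken from row i of A.
row_view = np.stack(
    [sum(A[i, k] * B[k, :] for k in range(A.shape[1]))
     for i in range(A.shape[0])],
    axis=0,
)

assert np.allclose(col_view, A @ B) and np.allclose(row_view, A @ B)
```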
Natural Language Processing (NLP)
- A wonderful introductory course on NLP by Lena Voita. It covers the basic topics in a beginner-friendly, visual format, and each chapter contains research-oriented questions that can inspire the reader to think about the material in novel ways.
- Nicely illustrated blog posts about transformers and seq2seq models with attention by Jay Alammar. These posts can serve as a great entry point into the nuts and bolts of transformer-based models; his blog also contains other well-visualized posts on machine learning topics.
- An add-on by Lena explaining convolutional neural networks for text in more detail.
- A blog post by Lena digging into the nuts and bolts of attention heads in Transformer models.
- A video course on NLP from Stanford University (CS224N, Winter 2019). It serves as a good theoretical introduction to the basics of NLP.
- A great resource for a deeper understanding of how LSTMs work, with pleasant and informative illustrations.
- The Illustrated GPT-2.
- A useful tutorial on understanding and coding self-attention, multi-head attention, cross-attention, and causal attention in LLMs; see the sketch after this list.
- RLHF: Reinforcement Learning from Human Feedback.
- The Mixture-of-Experts architecture.
- An explanation of LLaMA’s architecture in contrast to a vanilla transformer.
- Some works addressing the context-length limitations of transformers: Longformer, YaRN, LongRoPE.
- COS 597G: Understanding Large Language Models (Princeton).
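To complement the attention tutorial above, here is a minimal NumPy sketch of single-head scaled dot-product self-attention with an optional causal mask. It is my own simplified illustration, not code from the linked tutorial; multi-head attention repeats this computation with several projection matrices, and cross-attention computes K and V from a second sequence.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv, causal=False):
    """Single-head scaled dot-product self-attention.

    X: (seq_len, d_model) token embeddings; Wq, Wk, Wv: (d_model, d_head).
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])        # (seq_len, seq_len)
    if causal:
        # Each position may attend only to itself and earlier positions.
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    return softmax(scores) @ V                     # (seq_len, d_head)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                        # 4 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv, causal=True)
print(out.shape)                                   # (4, 8)
```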
Multimodal Learning
The most notable multimodal architectures to know: CLIP (and its variations: X-CLIP, UniCLIP, DeCLIP, FILIP, ULIP), Flamingo, BLIP, BLIP-2, InstructBLIP, Macaw-LLM, LLaVA (shallow fusion), LLaVA-NeXT, CogVLM (deep fusion), ImageBind, NExT-GPT, LaVIN (Mixture-of-Modality Adaptation (MMA)), ALIGN, OFA.
- A comprehensive survey of Multimodal Large Language Models (MLLMs).
- Contrastive loss for multimodal retrieval; see the sketch after this list.
- A great blog post on multimodality and Large Multimodal Models (LMMs).
- Another blog post about popular LMM architectures, from Determined AI.
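As a rough illustration of the contrastive objective, here is a minimal NumPy sketch of a CLIP-style symmetric InfoNCE loss. It assumes a batch in which the i-th image embedding and the i-th text embedding form the positive pair; the function name and temperature value are my own choices, not taken from the linked material.

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    image_emb, text_emb: (batch, dim); pair i is the positive on the diagonal.
    """
    # L2-normalize so the dot product becomes a cosine similarity.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)

    logits = image_emb @ text_emb.T / temperature   # (batch, batch)
    labels = np.arange(len(logits))                 # matching pairs on the diagonal

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(y)), y].mean()

    # Average the image->text and text->image directions.
    return (cross_entropy(logits, labels) + cross_entropy(logits.T, labels)) / 2
```

Minimizing this loss pulls each matching image/text pair together on the unit sphere while pushing apart all non-matching pairs in the batch, which is exactly the property that multimodal retrieval relies on.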
Reinforcement Learning
- A glorious introductory course on deep reinforcement learning.
- Maximum Entropy Reinforcement Learning.
Deep Learning Engineering
- Parameter-Efficient Fine-Tuning (PEFT) techniques: prefix-tuning and Low-Rank Adaptation (LoRA); see the sketch below.
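For a feel of what LoRA actually does, here is a minimal NumPy sketch of a linear layer with a frozen weight and a trainable low-rank update. The shapes, the alpha/r scaling, and the zero-initialization of B follow the LoRA paper’s convention; a practical implementation would use an autograd framework such as PyTorch.

```python
import numpy as np

class LoRALinear:
    """A frozen linear layer plus a trainable low-rank update.

    Instead of fine-tuning W (d_out x d_in), LoRA learns B @ A where
    A is (r x d_in) and B is (d_out x r) with r << min(d_in, d_out),
    so the effective weight is W + (alpha / r) * B @ A.
    """

    def __init__(self, W, r=8, alpha=16, rng=None):
        rng = rng or np.random.default_rng(0)
        self.W = W                                             # frozen pretrained weight
        self.A = rng.normal(scale=0.01, size=(r, W.shape[1]))  # trainable
        self.B = np.zeros((W.shape[0], r))                     # trainable; zero-init
        self.scale = alpha / r                                 # so the update starts at 0

    def __call__(self, x):
        # x: (batch, d_in) -> (batch, d_out); only A and B would receive gradients.
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T

W = np.random.default_rng(1).normal(size=(32, 64))   # stand-in for a pretrained weight
layer = LoRALinear(W, r=4)
print(layer(np.ones((2, 64))).shape)                 # (2, 32)
```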