Teaching
My Courses
Satbayev University (2024-present)
- Autumn 2025
- CSE7912: Development of Intelligent Applications
- Spring 2025
- CSE1273: Object-Oriented Programming in Java
- Autumn 2024
- CSE2562: Applied Text Processing (NLP)
University of Washington (2018-2022)
- Spring 2022 (grading)
- Winter 2022 (grading)
- Autumn 2021 (grading)
- Summer 2021 Full-term (instructor for a full course)
- Winter 2021
- Autumn 2020
- Summer 2020 Full-term (instructor for a full course)
- Spring 2020
- Winter 2020
- Autumn 2019
- Summer 2019 (grading)
- Spring 2019
- Winter 2019
- Autumn 2018
Kazakh-British Technical University (2017-2018)
- Calculus I
- Calculus II
- Calculus III
Resources
This section presents some resources that have been useful in my work.
General Machine Learning & Theory
- Yandex’s online machine learning textbook is designed for those who are not afraid of mathematics and want to delve into ML technologies, covering everything from classical theory to cutting-edge topics, with new chapters added regularly.
- A visual introduction to information theory. This post explores the fundamentals of information theory, including optimal encoding, entropy, cross-entropy, mutual information, and other essential concepts that underpin how machine learning models learn from data.
- Matrix multiplication as two kinds of linear combinations (row-wise and column-wise); see the short NumPy sketch after this list.
- Different upsampling techniques used in CNNs.
- Grokking: Generalization beyond overfitting on small algorithmic datasets.
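To make the row-wise vs. column-wise view of matrix multiplication concrete, here is a minimal NumPy sketch; the matrices are toy values chosen purely for illustration:

    import numpy as np

    A = np.array([[1., 2.],
                  [3., 4.]])
    B = np.array([[5., 6.],
                  [7., 8.]])

    # Column view: column j of A @ B is a linear combination of the
    # columns of A, weighted by column j of B.
    col_view = np.column_stack(
        [sum(B[k, j] * A[:, k] for k in range(A.shape[1]))
         for j in range(B.shape[1])])

    # Row view: row i of A @ B is a linear combination of the
    # rows of B, weighted by row i of A.
    row_view = np.vstack(
        [sum(A[i, k] * B[k, :] for k in range(A.shape[1]))
         for i in range(A.shape[0])])

    assert np.allclose(col_view, A @ B) and np.allclose(row_view, A @ B)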
Natural Language Processing (NLP)
- A wonderful introductory course on NLP by Lena Voita. It covers basic topics in a very beginner-friendly and visual format. Moreover, each chapter contains research-oriented questions that can inspire the reader to think about the material in novel ways.
- Nicely illustrated blog posts about transformers and Seq2Seq models with attention by Jay Alammar. These posts can serve as a great entry point into the nuts and bolts of transformer-based models. His blog also contains other well-visualized posts on machine learning topics.
- An add-on by Lena explaining Convolutional Neural Networks for text in more detail.
- A blog post by Lena digging into the nuts and bolts of attention heads in Transformer models.
- A video course on NLP from Stanford University (CS224N, Winter 2019). It can serve as a good theoretical introduction to the basics of NLP.
- A great resource for a deeper understanding of how LSTMs work, with pleasant and informative illustrations.
- The Illustrated GPT-2.
- A useful tutorial on understanding and coding self-attention, multi-head attention, cross-attention, and causal attention in LLMs; a minimal self-attention sketch follows this list.
- RLHF: Reinforcement Learning from Human Feedback.
- Mixture of Experts architecture.
- Explanation of LLaMA’s architecture in contrast to a vanilla transformer.
- Some works addressing the issue of context length in transformers: Longformer, YaRN, LongRoPE.
- COS 597G: Understanding Large Language Models.
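As a companion to the self-attention tutorial above, here is a minimal NumPy sketch of single-head scaled dot-product self-attention; the shapes, random weights, and causal flag are assumptions made purely for illustration:

    import numpy as np

    def self_attention(x, w_q, w_k, w_v, causal=False):
        # x: (seq_len, d_model); w_q, w_k, w_v: (d_model, d_head)
        q, k, v = x @ w_q, x @ w_k, x @ w_v
        scores = q @ k.T / np.sqrt(k.shape[-1])          # (seq_len, seq_len)
        if causal:                                       # mask out future positions
            scores = np.where(np.tril(np.ones_like(scores)) == 1, scores, -1e9)
        weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
        weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
        return weights @ v                               # (seq_len, d_head)

    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 8))                          # toy sequence: 4 tokens, d_model=8
    w_q, w_k, w_v = (rng.normal(size=(8, 8)) for _ in range(3))
    out = self_attention(x, w_q, w_k, w_v, causal=True)  # (4, 8)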
Multimodal Learning
The most notable multimodal architectures to know: CLIP (and its variations: X-CLIP, UniCLIP, DeCLIP, FILIP, ULIP), Flamingo, BLIP, BLIP-2, InstructBLIP, Macaw-LLM, LLaVA (shallow fusion), LLaVA-NeXT, CogVLM (deep fusion), ImageBind, NExT-GPT, LaVIN (Mixture-of-Modality Adaptation (MMA)), ALIGN, OFA.
- A comprehensive survey on Multimodal Large Language Models (MLLMs).
- Contrastive loss for multimodal retrieval; a CLIP-style loss sketch follows this list.
- A great blog post on Multimodality and Large Multimodal Models (LMMs).
- One more blog post about popular LMM architectures from Determined AI.
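For the contrastive-loss entry above, here is a minimal sketch of a symmetric CLIP-style (InfoNCE) objective over a batch of paired image/text embeddings; the batch size, embedding dimension, and temperature are illustrative assumptions:

    import numpy as np

    def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
        # img_emb, txt_emb: (batch, dim), assumed L2-normalized;
        # the i-th image is the positive pair for the i-th text.
        logits = img_emb @ txt_emb.T / temperature       # (batch, batch) similarities

        def cross_entropy(l):
            # row-wise log-softmax; the correct class sits on the diagonal
            l = l - l.max(axis=-1, keepdims=True)
            log_probs = l - np.log(np.exp(l).sum(axis=-1, keepdims=True))
            return -np.diag(log_probs).mean()

        # average the image->text and text->image directions
        return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))

    rng = np.random.default_rng(0)
    norm = lambda m: m / np.linalg.norm(m, axis=-1, keepdims=True)
    loss = clip_contrastive_loss(norm(rng.normal(size=(8, 64))),
                                 norm(rng.normal(size=(8, 64))))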
Reinforcement Learning
- A glorious introductory course on Deep Reinforcement Learning.
- Maximum Entropy Reinforcement Learning; a tiny sketch of the entropy-augmented objective follows this list.
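The gist of the maximum-entropy entry above: the agent maximizes expected return plus a bonus for keeping its policy stochastic. A tiny sketch of the entropy-augmented objective for a discrete policy, with all numbers made up for illustration:

    import numpy as np

    def policy_entropy(probs):
        # Shannon entropy H(pi(.|s)) of a discrete action distribution
        probs = np.clip(probs, 1e-12, 1.0)
        return -(probs * np.log(probs)).sum(axis=-1)

    # Soft (max-ent) per-step objective: r(s_t, a_t) + alpha * H(pi(.|s_t)),
    # where alpha trades reward against exploration.
    alpha = 0.2
    rewards = np.array([1.0, 0.5, 2.0])          # toy per-step rewards
    action_probs = np.array([[0.7, 0.3],         # pi(.|s_t) at each step
                             [0.5, 0.5],
                             [0.9, 0.1]])
    soft_return = (rewards + alpha * policy_entropy(action_probs)).sum()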
Deep Learning Engineering
- Parameter-Efficient Fine-Tuning (PEFT) techniques: prefix-tuning and low-rank adaptation (LoRA); a minimal LoRA sketch follows below.
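To make the LoRA idea concrete, here is a minimal NumPy sketch of a low-rank adapter sitting on top of a frozen linear layer; the layer sizes, rank, and scaling follow common defaults but are assumptions here:

    import numpy as np

    class LoRALinear:
        # Frozen pretrained weight W plus a trainable low-rank update B @ A.
        def __init__(self, w_frozen, rank=4, alpha=8, rng=None):
            rng = rng or np.random.default_rng(0)
            d_out, d_in = w_frozen.shape
            self.w = w_frozen                             # frozen, never updated
            self.a = rng.normal(0.0, 0.01, (rank, d_in))  # trainable, small random init
            self.b = np.zeros((d_out, rank))              # trainable, zero init, so the
            self.scale = alpha / rank                     # output is unchanged at start

        def __call__(self, x):
            # y = x W^T + scale * (x A^T) B^T; only A and B would be trained
            return x @ self.w.T + self.scale * (x @ self.a.T) @ self.b.T

    layer = LoRALinear(np.random.default_rng(1).normal(size=(16, 32)))
    y = layer(np.ones((2, 32)))                           # (2, 16)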