| Week | Topic | Contents | Instructor |
|---|---|---|---|
| 1 | Introduction (Part 1) | Fundamental scientific questions in life sciences; the scale of biological data.<br>A brief history of AI.<br>Overview of recent breakthroughs (e.g., protein folding, drug discovery). | Haicang Zhang |
| 2 | Introduction (Part 2) | Core ML Concepts:<br>Supervised versus unsupervised learning, clustering, and generative models;<br>Training, testing, and validation;<br>Loss functions;<br>Evaluation metrics;<br>Optimization algorithms;<br>Linear models (e.g., logistic regression, linear regression, SVM);<br>Non-linear models (e.g., decision trees, random forests, neural networks);<br>Generalization: the bias-variance tradeoff; strategies for mitigating overfitting in biological datasets. | Haicang Zhang |
| 3 | Sequence Modeling (Part 1) | Background on biological sequences.<br>The pre-LLM era: n-grams; Recurrent Neural Networks (RNNs), LSTMs, and Seq2Seq.<br>The information bottleneck and the origin of the attention mechanism. | Haicang Zhang |
| 4 | Sequence Modeling (Part 2) | Deep dive into the Transformer architecture; BERT vs. GPT (a minimal attention sketch follows this table).<br>Interpretability: token-level embeddings, sequence-level embeddings, and attention maps. | Haicang Zhang |
| 5 | Sequence Modeling (Part 3) | SOTA models: ESM2/3, MSA-Transformer, ProGen, xTrimoPGLM, and LLMs for antibodies, RNAs, and DNAs.<br>Advanced tuning: parameter-efficient fine-tuning (PEFT) using LoRA; Direct Preference Optimization (DPO). | Haicang Zhang |
| 6 | Structure Modeling (Part 1) | Background on biomolecular structure modeling.<br>Model architectures in the pre-AlphaFold2 era: CNNs, ResNets, DenseNets, AlexNet, GoogLeNet, and Inception architectures.<br>Structure modeling in the pre-AlphaFold2 era: RaptorX, ProFold, trRosetta, and AlphaFold1. | Haicang Zhang |
| 7 | Structure Modeling (Part 2) | Deep dive into AlphaFold2.<br>Adapting AF2 for multimer prediction, docking, mutation effect prediction, and protein design. | Haicang Zhang |
| 8 | Probabilistic Graphical Models (Part 1) | Introduction to Directed Graphical Models.<br>Gaussian Mixture Models (GMMs), Bayesian GMMs, and Hidden Markov Models (HMMs).<br>Markov Chain Monte Carlo (MCMC) vs. Variational Inference. | Haicang Zhang |
| 9 | Probabilistic Graphical Models (Part 2) | Deep Directed Graphical Models: the Variational Autoencoder (VAE).<br>VAEs for functional effect prediction, single-cell analysis, and sequence generation. | Haicang Zhang |
| 10 | Probabilistic Graphical Models (Part 3) | Introduction to Undirected Graphical Models.<br>Ising Models, Potts Models, and Markov Random Fields (MRFs).<br>Pseudo-likelihood approximation.<br>Deep Undirected Graphical Models.<br>Applications in protein contact prediction and protein sequence design. | Haicang Zhang |
| 11 | Diffusion Models (Part 1) | From VAEs to Denoising Diffusion Probabilistic Models (DDPM).<br>Stochastic Differential Equation (SDE)-based diffusion models; EDM.<br>Applications: de novo backbone generation (e.g., RFdiffusion, FrameDiff, Chroma, and CarbonNovo). | Haicang Zhang |
| 12 | Diffusion Models (Part 2) | Consistency Models.<br>Flow Matching and Optimal Transport.<br>More applications: FoldFlow and AlphaFold3 (EDM). | Haicang Zhang |
| 13 | Diffusion Models (Part 3) | Classifier guidance vs. classifier-free guidance.<br>Post-training with Direct Preference Optimization (DPO); applications in antibody design: AbDPO and AbNovo. | Haicang Zhang |
| 14 | Computational Proteomics | Principles of peptide sequencing with tandem mass spectrometry (MS/MS).<br>Deep learning models for de novo peptide sequencing (e.g., DeepNovo). | Shiwei Sun |
| 15 | Computational Glycomics | Introduction to glycan identification with MS/MS.<br>Deep learning models for glycan identification. | Shiwei Sun |