63.
Building an open-source post-training stack for large language models from first principles.
The goal is to understand and implement the systems behind modern reasoning models end-to-end:
• SFT
• Preference Optimization
• RLHF / RLVR
• Rew