62.
how do you sync a trillion parameter model every RL step without a shared cluster? we just wrote a blog about it,… (x.com)
how do you sync a trillion parameter model every RL step without a shared cluster? we just wrote a blog about it, led by @AmineDirhoussi. what I like the most is the way it proves you can use the Hub for basically everything → trainer on