50.
i just tested running GLM-5.2 on my rig. 753B-parameter MoE. 2x RTX PRO 6000 Blackwells, Threadripper PRO 9995WX with 1TB DDR5. prefill runs at 64 tok/s. decode holds at 13-15 tok/s. system RAM bandwidth is the bottleneck. running the UD-Q4_K_XL 4-bit quant.
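
rough sketch of why RAM bandwidth would cap decode: each generated token has to read the active expert weights once, so if most of those sit in system RAM, tok/s can't exceed bandwidth divided by bytes read per token. the function name and every number plugged in below are placeholders i made up for illustration (active parameter count, bits per weight, share of weights in RAM, sustained bandwidth), not measured values for this model or this rig. it also ignores KV cache reads, GPU work and PCIe transfers, so treat the result as a ceiling, not a prediction.

```python
def decode_tok_per_s_ceiling(active_params_b: float,
                             bits_per_weight: float,
                             ram_fraction: float,
                             ram_bw_gb_s: float) -> float:
    """Upper bound on decode speed when streaming weights from system RAM dominates.

    active_params_b : parameters touched per token, in billions (hypothetical input)
    bits_per_weight : average bits per weight of the quant (assumed, roughly 4-5 for 4-bit)
    ram_fraction    : share of the active weights read from system RAM, 0..1
    ram_bw_gb_s     : sustained system RAM bandwidth in GB/s
    """
    if min(active_params_b, bits_per_weight, ram_bw_gb_s) <= 0:
        raise ValueError("sizes and bandwidth must be positive")
    if not 0 < ram_fraction <= 1:
        raise ValueError("ram_fraction must be in (0, 1]")

    # billions of params * bytes per param = GB read per token
    gb_per_token = active_params_b * (bits_per_weight / 8) * ram_fraction
    return ram_bw_gb_s / gb_per_token


if __name__ == "__main__":
    # placeholder inputs, NOT specs of GLM-5.2 or of this machine
    ceiling = decode_tok_per_s_ceiling(
        active_params_b=40, bits_per_weight=4.5, ram_fraction=0.9, ram_bw_gb_s=300
    )
    print(f"decode ceiling: {ceiling:.1f} tok/s")
```

swap in the model's real active parameter count and a measured bandwidth figure to see how close 13-15 tok/s is to the ceiling. if the measured number is well below it, something other than RAM bandwidth is in the way.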