@antirez on Backlist

66.

I was thinking about Vector Sets and the Redis approach to this stuff in general. Now that the hype around RAG is gone, I'm 100% sure I made the right call there, saying: RAG will mostly go away, but raw vector search is a useful, fundamental

by @antirez · backlist 2026-06-22 · rubric 97.5

68.

Apparently it is possible to go much faster than my 13.5 t/s, and this is very good news. It means 4-bit GLM 5.2 will be usable on M3 Ultra Mac Studios. The problem is that it is literally hardware you *cannot buy* anymore.

by @antirez · backlist 2026-06-21 · rubric 77.0

13.

Redis Claims 50% Memory Savings on Sorted Sets

A deep data-structure optimization like this can materially change the cost profile of real production workloads

by @antirez · backlist 2026-06-19 · rubric 73.0

86.

I'm using the Unsloth 4-bit weights to integrate inference into DwarfStar. If the experiment works well, I'll specialize the quants with an optimizer for the best setup. My target, however, is the 512GB M3 Ultra and in distributed inference 4

by @antirez · backlist 2026-06-18 · rubric 68.0

36.

Another important thing: Chinese models are not strong because they distill US models. Distillation of models via API is *impossible*. If somebody tells you the contrary, they don't understand machine learning:

by @antirez · backlist 2026-06-15 · rubric 84.0

54.

DwarfStar now supports SSD streaming on the DGX Spark and Strix Halo, not just in Metal. You can run the Q4 quants at decent speed, and even DeepSeek V4 Pro at low speed, or you can run Q2 Flash if you have less than 128GB.

by @antirez · backlist 2026-06-15 · rubric 78.0

80.

People often have this idea of Chinese models being N months behind US models. This mental model is not helpful for predicting the future. The lag is due to a compute deficit, so that is the playing field. It's not by chance that OpenAI and Anthropic

by @antirez · backlist 2026-06-15 · rubric 72.0

32.

For days, many folks here have been citing DeepSWE as the benchmark that restores reality, only because it shows GPT 5.5 on top. But actually, it gets just about a single entry right, the top one, and all the rest is shuffled.

by @antirez · backlist 2026-06-07 · rubric 89.0

7.

DeepSeek V4 Pro streamed from SSD on a 128GB MacBook

A 1.6T-parameter model running on consumer Apple hardware via SSD streaming changes the practical boundary of local model experimentation

by @antirez · backlist 2026-06-04 · rubric 88.0

79.

SSD Streamed DwarfStar by @anemll (x.com)

SSD Streamed DwarfStar by @anemll, cool demo! Official implementation of streaming is arriving too. DeepSeek Flash should run at ~14 t/s on a MacBook M5 Max 64GB, DeepSeek Pro should run at 4 t/s on a MacBook M5 Max 128GB. Those are genera

by @antirez · backlist 2026-06-04 · rubric 74.0

49.

DwarfStar prefill is a hell of a lot faster after merging (t.co)

DwarfStar prefill is a hell of a lot faster after merging https://github.com/antirez/ds4/pull/264 …, I need to update the README benchmarks as they no longer mean much :D after the +40% prefill speed boost.

by @antirez · backlist 2026-05-27 · rubric 91.0