37.
In the profiling blog post we talk about the CPU chain of dispatch > how operations are wrapped > how to annotate operations > what the CUDA launches are > why there is a gap between the dispatch and the kernel launch
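
To make the last two points concrete, here is a rough CPU-only toy, not the blog post's code and not a real profiler API. It assumes a host thread that "dispatches" ops and a worker thread standing in for the GPU. The names `annotate`, `run_toy_pipeline`, and `dispatch_overhead_s` are hypothetical, and the sleeps are made-up stand-ins for wrapper and kernel cost. It only illustrates the idea that time spent in the host-side dispatch chain shows up as a gap before the kernel starts.

```python
import queue
import threading
import time
from contextlib import contextmanager

events = []  # (name, start_s, end_s) host-side ranges, filled by annotate()


@contextmanager
def annotate(name):
    """Hypothetical range annotation: record the wall-clock span of a block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        events.append((name, start, time.perf_counter()))


def run_toy_pipeline(n_ops=3, dispatch_overhead_s=0.002):
    """Dispatch n_ops fake kernels; return each dispatch-to-start gap in seconds."""
    work = queue.Queue()
    gaps = []

    def device():
        # Stand-in for the GPU: executes queued work in submission order.
        while True:
            dispatched_at = work.get()
            if dispatched_at is None:  # sentinel: no more work
                break
            gaps.append(time.perf_counter() - dispatched_at)
            time.sleep(0.001)  # pretend kernel execution time

    worker = threading.Thread(target=device)
    worker.start()
    for i in range(n_ops):
        with annotate(f"op_{i}"):
            dispatched_at = time.perf_counter()  # op enters the dispatch chain
            time.sleep(dispatch_overhead_s)      # pretend wrapper/dispatch cost
            work.put(dispatched_at)              # the "launch": hand work to the device
    work.put(None)
    worker.join()
    return gaps
```

Calling `run_toy_pipeline()` returns one gap per op, and `events` then holds one annotated range per op. Raising `dispatch_overhead_s` widens the gaps, which is the toy version of host overhead delaying the kernel launch.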