<aside> 💡 For more efficient Transformer implementations.
</aside>
Scientific LLM
StableMask: Refining Causal Masking in Decoder-only Transformer
DLCNet: Enabling Long-Range Convolution with Data Dependency
Triton Tutorials
[IMPORTANT] All I hope to learn
MikaStars @ Research
Reports