This link section is inspired by the ones from my favourite bloggers such as gwern, guzey, or nintil. It is a semi-up-to-date list of my most interesting reads of the last few months.
October 2023
- Phi-1.5 Model: A Case of Comparing Apples to Oranges?
- Flash-Decoding for long-context inference
- RingAttention
- https://arxiv.org/abs/2310.01889
- The urge to go full Tri Dao et al. and port that thing from JAX to a CUDA/Triton kernel…
- This would not only enable RingAttention to scale the sequence length with the number of devices used during training, but potentially also achieve a higher Model FLOPs Utilization than FlashAttention-2 by computing the full transformer block in a blockwise manner in one kernel (see the sketch after this list).
- You could fine-tune a CodeLLaMA 7B to a 4-million-token context window with just 32x A100s and literally fit every code repository in the context…
- It’s time to be a definite techno-optimist
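To make the blockwise idea concrete, here is a minimal single-device sketch in JAX. It is not the actual RingAttention implementation (which additionally shards the key/value blocks across devices and overlaps the blockwise compute with ring communication), just the underlying trick: iterate over key/value blocks with an online softmax so the full (seq, seq) score matrix is never materialized. Function name and block size are made up for the example.

```python
import jax
import jax.numpy as jnp


def blockwise_attention(q, k, v, block_size=128):
    """Attention computed one key/value block at a time.

    A running max, normalizer, and output accumulator are carried across
    blocks (online softmax), so memory scales with block_size instead of
    the full sequence length.
    """
    seq_len, d = q.shape
    assert seq_len % block_size == 0, "sequence length must be divisible by block_size"
    scale = 1.0 / jnp.sqrt(d)

    def scan_kv_blocks(carry, kv_block):
        acc, m, l = carry                      # running output, max, normalizer
        k_blk, v_blk = kv_block
        s = (q @ k_blk.T) * scale              # (seq, block) scores for this block only
        m_new = jnp.maximum(m, s.max(axis=-1, keepdims=True))
        p = jnp.exp(s - m_new)
        correction = jnp.exp(m - m_new)        # rescale previous partial results
        l_new = l * correction + p.sum(axis=-1, keepdims=True)
        acc_new = acc * correction + p @ v_blk
        return (acc_new, m_new, l_new), None

    n_blocks = seq_len // block_size
    k_blocks = k.reshape(n_blocks, block_size, d)
    v_blocks = v.reshape(n_blocks, block_size, d)

    init = (jnp.zeros_like(q),                     # output accumulator
            jnp.full((seq_len, 1), -jnp.inf),      # running max
            jnp.zeros((seq_len, 1)))               # running normalizer
    (acc, _, l), _ = jax.lax.scan(scan_kv_blocks, init, (k_blocks, v_blocks))
    return acc / l


# Quick sanity check on random inputs
q = k = v = jax.random.normal(jax.random.PRNGKey(0), (1024, 64))
out = blockwise_attention(q, k, v, block_size=128)
```

The ring part is what turns this memory saving into context-length scaling: each device keeps its query block resident and passes key/value blocks around the ring of devices, so the maximum sequence length grows roughly linearly with the number of devices.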
June 2023
- Large Language Models can Simulate Everything
- Large Language Models as Tool Makers
- Blockwise Parallel Transformer for Long Context Large Models
May 2023
April 2023
- Scaffolded LLMs are not just cool toys but actually the substrate of a new type of general-purpose natural language computer
March 2023
- Is ChatGPT 175 Billion Parameters? Technical Analysis
- A step towards self-improving LLMs
- Alexey Guzey’s Lifehacks: https://guzey.com/lifehacks/
- huge L for Chomsky: https://scottaaronson.blog/?p=7094
- “like the Jesuit astronomers declining to look through Galileo’s telescope, what Chomsky and his followers are ultimately angry at is reality itself, for having the temerity to offer something up that they didn’t predict and that doesn’t fit their worldview.”
- The Waluigi Effect of LLMs: https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post
- I stopped myself from reading the Waluigi post until today because I don’t really think it’s beneficial for the space to make up such words that no one outside the LW sphere understands (even though the term is quite self-explanatory). But I have to admit it’s a really good post. Go check it out.