This link section is inspired by the ones from my favourite bloggers such as gwern, guzey, or nintil. It is a semi-up-to-date list of my most interesting reads of the last few months.
October 2023
- Phi-1.5 Model: A Case of Comparing Apples to Oranges?
- Flash-Decoding for long-context inference
- RingAttention
- https://arxiv.org/abs/2310.01889
- The urge to go full Tri Dao et al. and port that thing from JAX to a CUDA/Triton kernel…
- This would not only enable RingAttention to scale the sequence length with the number of devices used during training, but potentially also achieve a higher Model FLOPs Utilization than FlashAttention-2 by computing the full transformer block in a blockwise manner in one kernel (see the sketch after this list).
- You could fine-tune a CodeLLaMA 7B to a 4-million-token context window with just 32x A100s and literally fit every code repository in the context…
- It’s time to be a definite techno-optimist
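To make the blockwise idea concrete, here is a minimal single-device sketch in JAX. It is not the actual RingAttention implementation (which additionally shards the key/value blocks across devices and overlaps the blockwise compute with ring communication), just the underlying trick: iterate over key/value blocks with an online softmax so the full (seq, seq) score matrix is never materialized. Function name and block size are made up for the example.

```python
import jax
import jax.numpy as jnp


def blockwise_attention(q, k, v, block_size=128):
    """Attention computed one key/value block at a time.

    A running max, normalizer, and output accumulator are carried across
    blocks (online softmax), so memory scales with block_size instead of
    the full sequence length.
    """
    seq_len, d = q.shape
    assert seq_len % block_size == 0, "sequence length must be divisible by block_size"
    scale = 1.0 / jnp.sqrt(d)

    def scan_kv_blocks(carry, kv_block):
        acc, m, l = carry                      # running output, max, normalizer
        k_blk, v_blk = kv_block
        s = (q @ k_blk.T) * scale              # (seq, block) scores for this block only
        m_new = jnp.maximum(m, s.max(axis=-1, keepdims=True))
        p = jnp.exp(s - m_new)
        correction = jnp.exp(m - m_new)        # rescale previous partial results
        l_new = l * correction + p.sum(axis=-1, keepdims=True)
        acc_new = acc * correction + p @ v_blk
        return (acc_new, m_new, l_new), None

    n_blocks = seq_len // block_size
    k_blocks = k.reshape(n_blocks, block_size, d)
    v_blocks = v.reshape(n_blocks, block_size, d)

    init = (jnp.zeros_like(q),                     # output accumulator
            jnp.full((seq_len, 1), -jnp.inf),      # running max
            jnp.zeros((seq_len, 1)))               # running normalizer
    (acc, _, l), _ = jax.lax.scan(scan_kv_blocks, init, (k_blocks, v_blocks))
    return acc / l


# Quick sanity check on random inputs
q = k = v = jax.random.normal(jax.random.PRNGKey(0), (1024, 64))
out = blockwise_attention(q, k, v, block_size=128)
```

The ring part is what turns this memory saving into context-length scaling: each device keeps its query block resident and passes key/value blocks around the ring of devices, so the maximum sequence length grows roughly linearly with the number of devices.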
June 2023
- Large Language Models can Simulate Everything
- Large Language Models as Tool Makers
- Blockwise Parallel Transformer for Long Context Large Models
May 2023
April 2023
- Scaffolded LLMs are not just cool toys but actually the substrate of a new type of general-purpose natural language computer
March 2023
- Is ChatGPT 175 Billion Parameters? Technical Analysis
- A step towards self-improving LLMs
- Alexey Guzey’s Lifehacks: https://guzey.com/lifehacks/
- huge L for Chomsky: https://scottaaronson.blog/?p=7094
- “like the Jesuit astronomers declining to look through Galileo’s telescope, what Chomsky and his followers are ultimately angry at is reality itself, for having the temerity to offer something up that they didn’t predict and that doesn’t fit their worldview.”
- The Waluigi Effect of LLMs: https://www.lesswrong.com/posts/D7PumeYTDPfBTp3i7/the-waluigi-effect-mega-post
- I stopped myself from reading the Waluigi post until today because I don’t really think it’s beneficial for the space to make up such words that no one outside the LW sphere understands (even though the term is quite self-explanatory). But I have to admit it’s a really good post. Go check it out.