davethehuman.com

davethehuman.comhttps://davethehuman.com/Recent content on davethehuman.comHugo -- gohugo.ioen© 2026 Dave the humanSat, 30 May 2026 00:00:00 +0000BPEhttps://davethehuman.com/notes/bpe/Sat, 30 May 2026 00:00:00 +0000https://davethehuman.com/notes/bpe/<p><em>BPE</em> stands for <em>Byte Pair Encoding</em>, a method to split text into <a href="https://davethehuman.com/notes/token/">token</a>s that can represent both common and rare words using a mix of full words and sub-word units. Spaces are also included in tokens, and this helps the <a href="https://davethehuman.com/notes/llm/">LLM</a> detect word boundaries.</p>decodinghttps://davethehuman.com/notes/decoding/Sat, 30 May 2026 00:00:00 +0000https://davethehuman.com/notes/decoding/<p>When a <a href="https://davethehuman.com/notes/tokenizer/">tokenizer</a> performs <em>decoding</em> (through its <code>decode</code> method), a list of IDs representing <a href="https://davethehuman.com/notes/token/">token</a>s is converted back to natural language.</p> <div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span>input_token_ids_list <span style="color:#f92672">=</span> [<span style="color:#ae81ff">19152</span>, <span style="color:#ae81ff">20238</span>, <span style="color:#ae81ff">11</span>, <span style="color:#ae81ff">358</span>, <span style="color:#ae81ff">646</span>, <span style="color:#ae81ff">944</span>, <span style="color:#ae81ff">653</span>, <span style="color:#ae81ff">429</span>] </span></span><span style="display:flex;"><span>text <span style="color:#f92672">=</span> tokenizer<span style="color:#f92672">.</span>decode(input_token_ids_list) </span></span><span style="display:flex;"><span>print(text)</span></span></code></pre></div></div> <pre class="code-output">Sorry Dave, I can't do that</pre> 
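<p>To make the ID-to-text step concrete, here is a minimal sketch of decoding as a plain table lookup. It is not a real tokenizer API: the <code>ID_TO_TOKEN</code> table and the <code>toy_decode</code> helper are hypothetical names introduced for illustration, the IDs are the ones from the example above, and the leading spaces stored inside some tokens are an assumption.</p>

```python
# Hypothetical toy vocabulary (not a real tokenizer).
# IDs come from the example above; the leading spaces are assumed.
ID_TO_TOKEN = {
    19152: "Sorry",
    20238: " Dave",
    11: ",",
    358: " I",
    646: " can",
    944: "'t",
    653: " do",
    429: " that",
}


def toy_decode(token_ids):
    """Look up each ID and concatenate the token strings."""
    return "".join(ID_TO_TOKEN[token_id] for token_id in token_ids)


text = toy_decode([19152, 20238, 11, 358, 646, 944, 653, 429])
print(text)
```

<p>Because the spaces live inside the tokens in this sketch, decoding needs no separate rule for where words begin: concatenating the token strings is enough.</p>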
<p>The other way around, from natural language to IDs, is called <a href="https://davethehuman.com/notes/encoding/">encoding</a>.</p>encodinghttps://davethehuman.com/notes/encoding/Sat, 30 May 2026 00:00:00 +0000https://davethehuman.com/notes/encoding/<p>When a <a href="https://davethehuman.com/notes/tokenizer/">tokenizer</a> performs <em>encoding</em> (through its <code>encode</code> method), a natural language text is broken into <a href="https://davethehuman.com/notes/token/">token</a>s that are then converted to IDs.</p> <div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#75715e"># Even notes can have code</span> </span></span><span style="display:flex;"><span>prompt <span style="color:#f92672">=</span> <span style="color:#e6db74">"Sorry Dave, I can't do that"</span> </span></span><span style="display:flex;"><span>input_token_ids_list <span style="color:#f92672">=</span> tokenizer<span style="color:#f92672">.</span>encode(prompt) </span></span><span style="display:flex;"><span>print(input_token_ids_list)</span></span></code></pre></div></div> <pre class="code-output">[19152, 20238, 11, 358, 646, 944, 653, 429]</pre> <p>The way back from IDs to natural language is called <a href="https://davethehuman.com/notes/decoding/">decoding</a>.</p>vocabularyhttps://davethehuman.com/notes/vocabulary/Sat, 30 May 2026 00:00:00 +0000https://davethehuman.com/notes/vocabulary/<p>The number of <a href="https://davethehuman.com/notes/token/">token</a>s that can be handled by a <a href="https://davethehuman.com/notes/tokenizer/">tokenizer</a>.</p> <p>A larger vocabulary in <a href="https://davethehuman.com/notes/llm/">LLM</a>s:</p> <ul> <li>increases the model size because the embedding and output layers must store more <a href="https://davethehuman.com/notes/token/">token</a> representations</li> <li>increases the per-token 
compute cost of producing next-token probabilities</li> <li>allows more words to be represented as single tokens rather than being split into subword components; this can reduce the sequence length since fewer tokens are required to represent a sentence</li> </ul> <p>So the tradeoff is between a larger vocabulary with somewhat higher per-token cost and a smaller vocabulary that often produces longer token sequences.</p>RLHFhttps://davethehuman.com/notes/rlhf/Fri, 15 May 2026 00:00:00 +0000https://davethehuman.com/notes/rlhf/<p>See <a href="https://davethehuman.com/notes/reinforcement-learning-from-human-feedback/">reinforcement learning from human feedback</a>.</p>distillationhttps://davethehuman.com/notes/distillation/Thu, 14 May 2026 00:00:00 +0000https://davethehuman.com/notes/distillation/<p><em>Distillation</em> (also called <a href="https://davethehuman.com/coming-soon/">knowledge distillation</a>) consists of transferring complex reasoning patterns learned by larger models into smaller ones. 
In <a href="https://davethehuman.com/coming-soon/">deep learning</a>, distillation happens when a smaller “student” model learns from outputs and logits of a larger “teacher” model; when talking about <a href="https://davethehuman.com/notes/large-language-model/">Large Language Model</a>s, distillation typically means performing <a href="https://davethehuman.com/notes/supervised-fine-tuning/">supervised fine-tuning</a> using high-quality labeled instruction datasets generated by a more capable <a href="https://davethehuman.com/notes/llm/">LLM</a>.</p>reinforcement learning from human feedbackhttps://davethehuman.com/notes/reinforcement-learning-from-human-feedback/Thu, 14 May 2026 00:00:00 +0000https://davethehuman.com/notes/reinforcement-learning-from-human-feedback/<p><em>Reinforcement Learning From Human Feedback</em> (or <a href="https://davethehuman.com/notes/rlhf/">RLHF</a>) involves human evaluations or rankings of model outputs as reward signals; this means that humans are involved in the process to guide the model toward human-preferred behaviors.</p> <p>This is in contrast with <a href="https://davethehuman.com/notes/reinforcement-learning/">reinforcement learning</a> in the context of <a href="https://davethehuman.com/notes/reasoning/">reasoning</a>, where the models rely on automated or environment-based reward signals (more objective but potentially less aligned with human preference).</p>inference-compute scalinghttps://davethehuman.com/notes/inference-compute-scaling/Wed, 13 May 2026 00:00:00 +0000https://davethehuman.com/notes/inference-compute-scaling/<p>See <a href="https://davethehuman.com/notes/inference-time-compute-scaling/">inference-time compute scaling</a>.</p>inference-time compute scalinghttps://davethehuman.com/notes/inference-time-compute-scaling/Wed, 13 May 2026 00:00:00 +0000https://davethehuman.com/notes/inference-time-compute-scaling/<p><em>Inference-time compute scaling</em> (also called <a 
href="https://davethehuman.com/notes/inference-compute-scaling/">inference-compute scaling</a> or <a href="https://davethehuman.com/notes/test-time-scaling/">test-time scaling</a>) is a technique that aims to improve a <a href="https://davethehuman.com/notes/large-language-model/">Large Language Model</a>’s <a href="https://davethehuman.com/notes/reasoning/">reasoning</a> capabilities at <a href="https://davethehuman.com/coming-soon/">inference</a> time without training or modifying the underlying model <a href="https://davethehuman.com/coming-soon/">weights</a>.</p> <p>The core idea is to trade increased computational resources for improved performance; in this way, even fixed models can become more capable through techniques like <a href="https://davethehuman.com/notes/chain-of-thought-cot/">chain-of-thought (COT)</a> and various sampling procedures.</p>reasoninghttps://davethehuman.com/notes/reasoning/Wed, 13 May 2026 00:00:00 +0000https://davethehuman.com/notes/reasoning/<p>In the context of <a href="https://davethehuman.com/notes/large-language-model/">Large Language Model</a>s, <em>reasoning</em> is the ability to tackle more complex problems step by step.</p> <p>The concept of <em>reasoning</em> became popular when <a href="https://openai.com/index/introducing-openai-o1-preview/">OpenAI announced o1</a> on the 12th of September 2024, highlighting its capabilities in tackling complex problems in science, math, coding, etc. A few months later, in January 2025, DeepSeek <a href="https://arxiv.org/abs/2501.12948">released their R1 model</a>, which competed with and exceeded the performance of the proprietary o1 model. 
The great thing is that they made it openly available, sharing a blueprint on how to train such a model.</p>reinforcement learninghttps://davethehuman.com/notes/reinforcement-learning/Wed, 13 May 2026 00:00:00 +0000https://davethehuman.com/notes/reinforcement-learning/<p><em>Reinforcement Learning</em> (or <a href="https://davethehuman.com/notes/rl/">RL</a>) aims to improve a <a href="https://davethehuman.com/notes/large-language-model/">Large Language Model</a>’s <a href="https://davethehuman.com/notes/reasoning/">reasoning</a> capabilities by encouraging it to take actions that lead to high reward signals.</p> <p>While <a href="https://davethehuman.com/notes/inference-time-compute-scaling/">inference-time compute scaling</a> improves a model’s reasoning performance without modifying the model, <em>RL</em> updates the model’s <a href="https://davethehuman.com/coming-soon/">weights</a> during training, enabling the model to learn through trial and error based on feedback from the environment.</p> <p>It is important to distinguish reinforcement learning in the context of reasoning from <a href="https://davethehuman.com/notes/reinforcement-learning-from-human-feedback/">reinforcement learning from human feedback</a> (<a href="https://davethehuman.com/notes/rlhf/">RLHF</a>), which is used during <a href="https://davethehuman.com/notes/preference-tuning/">preference tuning</a>. 
Both settings use reinforcement learning principles, but they differ primarily in how the reward is obtained and validated (through human verifiers for RLHF versus automated verifiers or environments for reasoning RL).</p>RLhttps://davethehuman.com/notes/rl/Wed, 13 May 2026 00:00:00 +0000https://davethehuman.com/notes/rl/<p>See <a href="https://davethehuman.com/notes/reinforcement-learning/">reinforcement learning</a>.</p>test-time scalinghttps://davethehuman.com/notes/test-time-scaling/Wed, 13 May 2026 00:00:00 +0000https://davethehuman.com/notes/test-time-scaling/<p>See <a href="https://davethehuman.com/notes/inference-time-compute-scaling/">inference-time compute scaling</a>.</p>tokenhttps://davethehuman.com/notes/token/Wed, 13 May 2026 00:00:00 +0000https://davethehuman.com/notes/token/<p>A <em>token</em> is not necessarily an entire word: it can be defined as a small unit of text that gets processed by a language model.</p> <p>It can be a full word, part of a word, or even punctuation, depending on how the text is split and mapped to IDs by the <a href="https://davethehuman.com/notes/tokenizer/">tokenizer</a>. E.g. 
the sentence <code>Sorry Dave, I can't do that</code> can be broken into tokens like this:</p> <div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-text" data-lang="text"><span style="display:flex;"><span>19152 --> Sorry </span></span><span style="display:flex;"><span>20238 --> Dave </span></span><span style="display:flex;"><span>11 --> , </span></span><span style="display:flex;"><span>358 --> I </span></span><span style="display:flex;"><span>646 --> can </span></span><span style="display:flex;"><span>944 --> 't </span></span><span style="display:flex;"><span>653 --> do </span></span><span style="display:flex;"><span>429 --> that</span></span></code></pre></div></div> <p>The ID mapping is necessary for the model to ingest the tokens.</p>tokenizerhttps://davethehuman.com/notes/tokenizer/Wed, 13 May 2026 00:00:00 +0000https://davethehuman.com/notes/tokenizer/<p>A <em>tokenizer</em> is a critical component of the <a href="https://davethehuman.com/notes/llm/">LLM</a> text processing and generation pipeline, even though it is not directly part of the model itself. 
It splits the text into <a href="https://davethehuman.com/notes/token/">token</a>s that get converted into numerical IDs to be ingested by the language model (<a href="https://davethehuman.com/notes/encoding/">encoding</a>), and it converts the LLM’s output back to human-readable text (<a href="https://davethehuman.com/notes/decoding/">decoding</a>).</p>LLM training pipelinehttps://davethehuman.com/notes/llm-training-pipeline/Tue, 12 May 2026 00:00:00 +0000https://davethehuman.com/notes/llm-training-pipeline/<p>Conventional <a href="https://davethehuman.com/notes/llm/">LLM</a>s are typically trained in two stages:</p> <ul> <li><a href="https://davethehuman.com/notes/pre-training/">pre-training</a></li> <li><a href="https://davethehuman.com/notes/post-training/">post-training</a></li> </ul> <p>Some recent research also distinguishes a <a href="https://davethehuman.com/coming-soon/">mid-training</a> stage between them.</p>preference tuninghttps://davethehuman.com/notes/preference-tuning/Tue, 12 May 2026 00:00:00 +0000https://davethehuman.com/notes/preference-tuning/<p><em>Preference tuning</em>, nowadays often performed through <a href="https://davethehuman.com/notes/reinforcement-learning-from-human-feedback/">Reinforcement Learning From Human Feedback</a> (or <a href="https://davethehuman.com/notes/rlhf/">RLHF</a>), refines the result of <a href="https://davethehuman.com/notes/supervised-fine-tuning/">supervised fine-tuning</a> with preferred stylistic choices.</p>supervised fine-tuninghttps://davethehuman.com/notes/supervised-fine-tuning/Tue, 12 May 2026 00:00:00 +0000https://davethehuman.com/notes/supervised-fine-tuning/<p><em>Supervised fine-tuning</em> (or <a href="https://davethehuman.com/coming-soon/">instruction tuning</a>) improves a <a href="https://davethehuman.com/notes/large-language-model/">Large Language Model</a>’s capabilities in question-answering, summarization, translation, etc. 
Later, <a href="https://davethehuman.com/notes/preference-tuning/">preference tuning</a> refines these capabilities.</p>chain-of-thoughthttps://davethehuman.com/notes/chain-of-thought-cot/Thu, 07 May 2026 00:00:00 +0000https://davethehuman.com/notes/chain-of-thought-cot/<p><em>Chain of thought</em> is a style of intermediate-step generation that the <a href="https://davethehuman.com/notes/llm/">LLM</a> uses to make all the <a href="https://davethehuman.com/notes/reasoning/">reasoning</a> stages explicit and easier to follow.</p> <p>With COT, the LLM does not just recall a fact; rather, it reaches the conclusion through intermediate steps, resembling a person articulating their thoughts out loud.</p> <p>Example:</p> <p><em><strong>Prompt</strong></em>: <em>“Alice has 3 apples and Bob has 5 apples. Alice gives 1 apple to Bob. How many apples do they have together?”</em></p>agentic loophttps://davethehuman.com/notes/agentic-loop/Tue, 05 May 2026 00:00:00 +0000https://davethehuman.com/notes/agentic-loop/<p>The <em>agentic loop</em> consists of the core components of an <a href="https://davethehuman.com/notes/ai-agent/">AI agent</a>:</p> <ol> <li>Brain (or <a href="https://davethehuman.com/coming-soon/">reasoning engine</a>)</li> <li>Planning: the agent uses techniques like <a href="https://davethehuman.com/notes/chain-of-thought-cot/">chain-of-thought (COT)</a> or <a href="https://davethehuman.com/coming-soon/">ReAct</a> to decide what to do next based on previous outcomes</li> <li>Memory: <ul> <li>Short-term: the immediate <a href="https://davethehuman.com/coming-soon/">context window</a> (current conversation)</li> <li>Long-term: vector databases or logs that allow the agent to “remember” past experiences across different sessions</li> </ul> </li> <li>Tools (capabilities): APIs, web search, code execution environments, or database access. Tools allow the agent to affect the real world (e.g. 
booking a flight, sending an email, etc.)</li> </ol>AI agenthttps://davethehuman.com/notes/ai-agent/Tue, 05 May 2026 00:00:00 +0000https://davethehuman.com/notes/ai-agent/<p>An <em>AI agent</em> is an autonomous software system that uses a <a href="https://davethehuman.com/coming-soon/">reasoning engine</a> (typically a <a href="https://davethehuman.com/notes/large-language-model/">Large Language Model</a>) to perceive its environment, reason about goals, and execute multi-step actions using external tools to achieve a specific objective. It basically uses an LLM as its brain.</p> <p>Unlike standard LLMs, which are reactive (answering prompts), agents are proactive (pursuing goals). Some example differences between them:</p> <ul> <li>capabilities: the standard LLM can generate text and code, while the AI agent can also execute actions via APIs/tools</li> <li>logic: the LLM works with <a href="https://davethehuman.com/coming-soon/">one-shot generation</a> while the AI agent can do iterative planning and correction</li> <li>memory: a standard LLM resets at every session while the AI agent’s memory persists via external storage</li> </ul> <p>The <a href="https://davethehuman.com/notes/agentic-loop/">agentic loop</a> identifies the core components of AI agents.</p>fine-tuninghttps://davethehuman.com/notes/fine-tuning/Tue, 05 May 2026 00:00:00 +0000https://davethehuman.com/notes/fine-tuning/<p>During <em>fine-tuning</em>, the <a href="https://davethehuman.com/notes/large-language-model/">Large Language Model</a> undergoes specialised training on curated datasets to follow specific instructions or perform niche tasks.</p> <p>There are two key fine-tuning techniques:</p> <ul> <li><a href="https://davethehuman.com/notes/supervised-fine-tuning/">supervised fine-tuning</a> or <a href="https://davethehuman.com/coming-soon/">SFT</a> or <a href="https://davethehuman.com/coming-soon/">instruction tuning</a></li> <li><a 
href="https://davethehuman.com/notes/preference-tuning/">preference tuning</a></li> </ul>hallucinationhttps://davethehuman.com/notes/hallucination/Tue, 05 May 2026 00:00:00 +0000https://davethehuman.com/notes/hallucination/<p>The tendency of a model to state false information with high confidence because it “looks” linguistically correct. It is one of the main issues of <a href="https://davethehuman.com/notes/large-language-model/">Large Language Model</a>s.</p>Large Language Modelhttps://davethehuman.com/notes/large-language-model/Tue, 05 May 2026 00:00:00 +0000https://davethehuman.com/notes/large-language-model/<p>A <em>Large Language Model</em> is a type of artificial intelligence model trained on humongous datasets of text to understand, generate, and manipulate human language. LLMs are built using <a href="https://davethehuman.com/coming-soon/">transformer</a> architectures and work by predicting the most likely next <a href="https://davethehuman.com/notes/token/">token</a> (word or part of a word) in a sequence.</p> <p>The core technical concepts:</p> <ul> <li><strong>scale</strong>: “Large” refers to both the training data (petabytes of text) and the number of <a href="https://davethehuman.com/notes/parameters/">parameters</a> (billions)</li> <li><a href="https://davethehuman.com/notes/llm-training-pipeline/">LLM training pipeline</a></li> <li><strong>context window</strong>: the maximum amount of text the model can “hold in mind” at one time during a single conversation</li> </ul> <p>The main capabilities:</p>parametershttps://davethehuman.com/notes/parameters/Tue, 05 May 2026 00:00:00 +0000https://davethehuman.com/notes/parameters/<p>The internal variables of a model that determine how the model itself processes information. 
When talking about <a href="https://davethehuman.com/notes/large-language-model/">Large Language Model</a>s, we are usually talking about billions of <em>parameters</em>.</p>post-traininghttps://davethehuman.com/notes/post-training/Tue, 05 May 2026 00:00:00 +0000https://davethehuman.com/notes/post-training/<p>During <em>post-training</em>, the model trained during <a href="https://davethehuman.com/notes/pre-training/">pre-training</a> undergoes a specialised training stage to learn to respond to user queries.</p>pre-traininghttps://davethehuman.com/notes/pre-training/Tue, 05 May 2026 00:00:00 +0000https://davethehuman.com/notes/pre-training/<p>During <em>pre-training</em> the <a href="https://davethehuman.com/notes/large-language-model/">Large Language Model</a> learns general patterns, grammar, and facts from the internet/books via self-supervised learning. At this stage, the objective for the LLM is to learn to predict the next word (or <a href="https://davethehuman.com/notes/token/">token</a>) in these texts.</p> <p>We can think of this stage as “raw language prediction” that gives the LLM basic capabilities to produce coherent texts.</p>Hello Worldhttps://davethehuman.com/posts/hello-world/Sat, 02 May 2026 00:00:00 +0000https://davethehuman.com/posts/hello-world/<p>This is the first post of this website, just to say that more will be coming soon!</p> <div class="highlight-wrapper"><div class="highlight"><pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;-webkit-text-size-adjust:none;"><code class="language-python" data-lang="python"><span style="display:flex;"><span><span style="color:#66d9ef">def</span> <span style="color:#a6e22e">greet</span>(name: str) <span style="color:#f92672">-></span> str: </span></span><span style="display:flex;"><span> <span style="color:#66d9ef">return</span> <span style="color:#e6db74">f</span><span style="color:#e6db74">"Hello, </span><span 
style="color:#e6db74">{</span>name<span style="color:#e6db74">}</span><span style="color:#e6db74">"</span> </span></span><span style="display:flex;"><span> </span></span><span style="display:flex;"><span>print(greet(<span style="color:#e6db74">"world"</span>))</span></span></code></pre></div></div>LLMhttps://davethehuman.com/notes/llm/Sat, 02 May 2026 00:00:00 +0000https://davethehuman.com/notes/llm/<p>See <a href="https://davethehuman.com/notes/large-language-model/">Large Language Model</a>.</p>Abouthttps://davethehuman.com/about/Mon, 01 Jan 0001 00:00:00 +0000https://davethehuman.com/about/<p><figure><img class="my-0 rounded-md" loading="lazy" decoding="async" fetchpriority="auto" alt="Human evolution" width="2752" height="1536" src="https://davethehuman.com/attachments/evolution_hu_b497d5ebc2712b6a.png" srcset="https://davethehuman.com/attachments/evolution_hu_b497d5ebc2712b6a.png 800w, https://davethehuman.com/attachments/evolution_hu_6c105d298514b4db.png 1280w" sizes="(min-width: 768px) 50vw, 65vw" data-zoom-src="https://davethehuman.com/attachments/evolution.png"></figure> </p> <p>A human working as a Machine Learning Scientist at a big tech company.</p> <p>This site is a <a href="https://davethehuman.com/posts/">personal blog</a> where I write about whatever’s on my mind, and a <a href="https://davethehuman.com/brain/">second brain</a> where I connect ideas, notes, and things I’m learning.</p> <p>Some of it is technical. Some of it isn’t. All of it is me thinking out loud while AI changes the world around me. Before it’s too late.</p>Not yethttps://davethehuman.com/coming-soon/Mon, 01 Jan 0001 00:00:00 +0000https://davethehuman.com/coming-soon/<p>This node doesn’t exist yet — but the fact that something links here means it’s on the radar.</p> <p>The brain grows slowly, one idea at a time.</p> <p><em>Check back later.</em></p>Projectshttps://davethehuman.com/projects/Mon, 01 Jan 0001 00:00:00 +0000https://davethehuman.com/projects/<p>Side projects. 
Mostly data, models, and experiments that got out of hand.</p> <hr> <h3 class="relative group">WC 2026 Prediction Engine <div id="wc-2026-prediction-engine" class="anchor"></div> <span class="absolute top-0 w-6 transition-opacity opacity-0 -start-6 not-prose group-hover:opacity-100 select-none"> <a class="text-primary-300 dark:text-neutral-700 !no-underline" href="#wc-2026-prediction-engine" aria-label="Anchor">#</a> </span> </h3> <p>A Dixon-Coles Poisson model trained on international football history, predicting every match of the 2026 FIFA World Cup. Group stage probabilities, knockout bracket simulations, upset detection — all running on pre-tournament data, updated after each matchday.</p> <p><a href="https://davethehuman.com/projects/wc2026/">Open the app →</a></p> <hr> <p><em>More incoming.</em></p>