{"id":159,"date":"2026-03-16T12:39:23","date_gmt":"2026-03-16T12:39:23","guid":{"rendered":"https:\/\/lab.laeka.org\/lora-explained-fine-tuning-billion-parameter-models-laptop\/"},"modified":"2026-03-16T12:39:23","modified_gmt":"2026-03-16T12:39:23","slug":"lora-explained-fine-tuning-billion-parameter-models-laptop","status":"publish","type":"post","link":"https:\/\/laeka.org\/publications\/lora-explained-fine-tuning-billion-parameter-models-laptop\/","title":{"rendered":"LoRA Explained: Fine-Tuning Billion Parameter Models on Your Laptop"},"content":{"rendered":"<p>Fine-tuning a billion-parameter model typically requires modifying billions of weights. That&#8217;s prohibitively expensive. LoRA (Low-Rank Adaptation) sidesteps this by updating only a tiny fraction of the model while achieving comparable results.<\/p>\n<p>The insight is elegant: weight updates during fine-tuning have low rank. You don&#8217;t need to update the full weight matrix. You only need to update a low-rank approximation of the update.<\/p>\n<h2>How LoRA Works<\/h2>\n<p>Instead of fine-tuning the full weight matrix W, LoRA decomposes the update as a product of two smaller matrices: \u0394W = B \u00d7 A.<\/p>\n<p>For a weight matrix of shape (d_out \u00d7 d_in), LoRA introduces:<\/p>\n<p><strong>A:<\/strong> shape (r \u00d7 d_in), where r is the rank (typically 8-64)<br \/>\n<strong>B:<\/strong> shape (d_out \u00d7 r)<\/p>\n<p>During forward pass: output = W \u00d7 input + (B \u00d7 A) \u00d7 input<\/p>\n<p>You only train A and B, freezing the original W. The rank r is typically much smaller than d_in and d_out, so the parameter count explodes down.<\/p>\n<h2>The Numbers<\/h2>\n<p>For a 70B model with 4k hidden dimensions:<\/p>\n<p><strong>Full fine-tuning:<\/strong> 70B trainable parameters<br \/>\n<strong>LoRA (rank 8):<\/strong> 70B \u00d7 (8 \/ 4000) \u2248 140M trainable parameters<br \/>\n<strong>LoRA (rank 64):<\/strong> 70B \u00d7 (64 \/ 4000) \u2248 1B trainable parameters<\/p>\n<p>You&#8217;re training 0.2% of the model with rank-8 LoRA. The memory and compute savings are massive.<\/p>\n<h2>Rank: The Key Tradeoff<\/h2>\n<p>LoRA&#8217;s rank is the tuning knob. Higher rank = more expressiveness but more parameters.<\/p>\n<p><strong>Rank 8:<\/strong> Very cheap, fast training. Works for minor domain adaptation. Fine-tuning instructions or specific styles.<\/p>\n<p><strong>Rank 16-32:<\/strong> Sweet spot for most applications. Enough expressiveness for meaningful adaptation without excessive cost.<\/p>\n<p><strong>Rank 64+:<\/strong> Approaching full fine-tuning cost. Use when minor rank isn&#8217;t expressive enough.<\/p>\n<p>In practice, rank 16 works for 80% of use cases. Rank 32 works for 95%. Diminishing returns set in fast.<\/p>\n<h2>Why It Works<\/h2>\n<p>The assumption underlying LoRA is empirically validated: fine-tuning updates have low intrinsic rank. The model doesn&#8217;t need to change very much to adapt to new domains or tasks.<\/p>\n<p>This makes sense. The pre-trained model already encodes enormous amounts of knowledge. Adapting to a new domain doesn&#8217;t require wholesale rewiring, just targeted adjustments.<\/p>\n<p>LoRA captures these adjustments efficiently.<\/p>\n<h2>Practical Implementation<\/h2>\n<p>Using LoRA in code is trivial with the peft library:<\/p>\n<pre><code>from peft import LoraConfig, get_peft_model\n\nconfig = LoraConfig(\n    r=16,\n    lora_alpha=32,\n    target_modules=[\"q_proj\", \"v_proj\"],\n    lora_dropout=0.05,\n    bias=\"none\"\n)\n\nmodel = get_peft_model(model, config)\n<\/code><\/pre>\n<p>Train the model normally. Only A and B matrices get updated. At inference, merge the weights or keep them separate for easy switching between adapters.<\/p>\n<h2>The Practical Advantage<\/h2>\n<p>A 70B model with LoRA can fine-tune on a consumer GPU. Storage is minimal (rank-8 LoRA for 70B is ~140MB). You can load multiple adapters and switch between them at runtime.<\/p>\n<p>This unlocks a new development model: base models + many specialized adapters. No need to train 10 different full models. Train 10 LoRA adapters instead, at 1% the cost.<\/p>\n<p><strong>Laeka Research \u2014 <a href=\"https:\/\/laeka.org\">laeka.org<\/a><\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Fine-tuning a billion-parameter model typically requires modifying billions of weights. That&#8217;s prohibitively expensive. LoRA (Low-Rank Adaptation) sidesteps this by updating only a tiny fraction of the model while achieving comparable results. The insight is&#8230;<\/p>\n","protected":false},"author":1,"featured_media":158,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":"","footnotes":""},"categories":[249],"tags":[],"class_list":["post-159","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-fine-tuning"],"_links":{"self":[{"href":"https:\/\/laeka.org\/publications\/wp-json\/wp\/v2\/posts\/159","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/laeka.org\/publications\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/laeka.org\/publications\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/laeka.org\/publications\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/laeka.org\/publications\/wp-json\/wp\/v2\/comments?post=159"}],"version-history":[{"count":0,"href":"https:\/\/laeka.org\/publications\/wp-json\/wp\/v2\/posts\/159\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/laeka.org\/publications\/wp-json\/wp\/v2\/media\/158"}],"wp:attachment":[{"href":"https:\/\/laeka.org\/publications\/wp-json\/wp\/v2\/media?parent=159"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/laeka.org\/publications\/wp-json\/wp\/v2\/categories?post=159"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/laeka.org\/publications\/wp-json\/wp\/v2\/tags?post=159"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}