{"id":178,"date":"2026-03-16T12:41:00","date_gmt":"2026-03-16T12:41:00","guid":{"rendered":"https:\/\/lab.laeka.org\/model-merge-phenomenon-combining-capabilities\/"},"modified":"2026-03-16T12:41:00","modified_gmt":"2026-03-16T12:41:00","slug":"model-merge-phenomenon-combining-capabilities","status":"publish","type":"post","link":"https:\/\/laeka.org\/publications\/model-merge-phenomenon-combining-capabilities\/","title":{"rendered":"The Model Merge Phenomenon: Combining Capabilities Without Training"},"content":{"rendered":"<p>What if you could combine the strengths of two models without retraining? Create a model that writes code like Model A but reasons like Model B? This is model merging, and it works.<\/p>\n<p>Model merging takes weights from two or more models and combines them in clever ways. The result is often surprising: emergent capabilities you wouldn&#8217;t expect from simple averaging.<\/p>\n<h2>How Model Merging Works<\/h2>\n<p>The simplest merge is linear interpolation. If Model A has weights W_A and Model B has weights W_B, the merged model has weights W = (1-a)*W_A + a*W_B for some weight a.<\/p>\n<p>This almost never works well. Naive averaging destroys the delicate weight distributions both models learned. But with careful techniques, it works surprisingly well.<\/p>\n<h2>SLERP: Spherical Linear Interpolation<\/h2>\n<p>SLERP (Spherical Linear Interpolation) treats weight vectors as points on a sphere. Instead of straight-line interpolation, it moves along a geodesic through the weight space.<\/p>\n<p>SLERP preserves the magnitude of weight vectors better than linear interpolation. The result: merges that maintain model coherence better.<\/p>\n<h2>TIES Merging<\/h2>\n<p>TIES (Trim, Interleave, and Ensemble) is more sophisticated. 
It trims each model&#8217;s weight changes down to the most significant ones, elects a single sign per parameter to resolve conflicts between models, and merges only the changes that agree with the elected sign.<\/p>\n<p>Published results for TIES show that merging a code model with a reasoning model can produce better performance on tasks requiring both skills than either model alone.<\/p>\n<h2>DARE Merging<\/h2>\n<p>DARE (Drop And REscale) randomly drops a large fraction of each model&#8217;s weight changes and rescales the survivors to compensate, rather than averaging everything. Counter-intuitively, this works well for merging models fine-tuned on different datasets.<\/p>\n<p>DARE is particularly good for combining multiple fine-tuned models (e.g., 5 different LoRA adapters) into a single coherent model.<\/p>\n<h2>Why Merging Works<\/h2>\n<p>The key insight is that fine-tuned models share the same base architecture and are trained from the same initialization. Their weight spaces are aligned in ways that allow meaningful interpolation.<\/p>\n<p>When you merge models that diverged from the same starting point, you&#8217;re not combining arbitrary weight matrices. You&#8217;re blending carefully learned deviations from a common base.<\/p>\n<h2>Practical Use Cases<\/h2>\n<p><strong>Combining specialized adapters:<\/strong> Train 5 LoRA adapters on different domains, merge them into a single multi-domain model.<\/p>\n<p><strong>Balancing trade-offs:<\/strong> One model is verbose but accurate. Another is concise but sometimes wrong. Merge them to balance both.<\/p>\n<p><strong>Rapid model development:<\/strong> Don&#8217;t have time to train? Merge two existing models and iterate from there.<\/p>\n<h2>Tools for Merging<\/h2>\n<p>mergekit is the standard tool. It handles SLERP, TIES, DARE, and custom merge strategies. Using it is straightforward:<\/p>\n<p>Define a YAML config specifying which models to merge and which method. Run mergekit. 
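<\/p>
<p>For concreteness, a SLERP merge config might look something like the following sketch. The model names and layer range are placeholders, and the schema shown follows mergekit&#8217;s YAML format, so check its documentation before relying on the details:<\/p>

```yaml
# Illustrative mergekit config: SLERP merge of two 32-layer models.
slices:
  - sources:
      - model: org/model-a        # placeholder: e.g. a code-tuned checkpoint
        layer_range: [0, 32]
      - model: org/model-b        # placeholder: e.g. a reasoning-tuned checkpoint
        layer_range: [0, 32]
merge_method: slerp
base_model: org/model-a
parameters:
  t: 0.5                          # 0 = all of model-a, 1 = all of model-b
dtype: bfloat16
```

<p>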
Get a merged model.<\/p>\n<p>The process is fast (minutes, not hours) and requires no training.<\/p>\n<h2>The Limitation<\/h2>\n<p>Merging only works well when models are compatible: same architecture, similar capability levels, trained from the same initialization.<\/p>\n<p>Merging a 7B and a 70B model won&#8217;t work. Merging models from different architectures won&#8217;t work. But within compatible families, merging is powerful.<\/p>\n<h2>What This Means<\/h2>\n<p>Model merging democratizes the ability to create specialized models. You don&#8217;t need to train from scratch. Combine existing models, and you often get something better than any individual model.<\/p>\n<p>This is particularly powerful in the era of open source models where dozens of fine-tuned variants exist for every task.<\/p>\n<p><strong>Laeka Research \u2014 <a href=\"https:\/\/laeka.org\">laeka.org<\/a><\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>What if you could combine the strengths of two models without retraining? Create a model that writes code like Model A but reasons like Model B? This is model merging, and it works. 
Model&#8230;<\/p>\n","protected":false},"author":1,"featured_media":176,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_kad_post_transparent":"","_kad_post_title":"","_kad_post_layout":"","_kad_post_sidebar_id":"","_kad_post_content_style":"","_kad_post_vertical_padding":"","_kad_post_feature":"","_kad_post_feature_position":"","_kad_post_header":false,"_kad_post_footer":false,"_kad_post_classname":"","footnotes":""},"categories":[243],"tags":[],"class_list":["post-178","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-architecture"],"_links":{"self":[{"href":"https:\/\/laeka.org\/publications\/wp-json\/wp\/v2\/posts\/178","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/laeka.org\/publications\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/laeka.org\/publications\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/laeka.org\/publications\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/laeka.org\/publications\/wp-json\/wp\/v2\/comments?post=178"}],"version-history":[{"count":0,"href":"https:\/\/laeka.org\/publications\/wp-json\/wp\/v2\/posts\/178\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/laeka.org\/publications\/wp-json\/wp\/v2\/media\/176"}],"wp:attachment":[{"href":"https:\/\/laeka.org\/publications\/wp-json\/wp\/v2\/media?parent=178"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/laeka.org\/publications\/wp-json\/wp\/v2\/categories?post=178"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/laeka.org\/publications\/wp-json\/wp\/v2\/tags?post=178"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}