By the “delta in the LLM weights”, I am assuming you mean the gradients. You are effectively describing large-batch training (data parallelism), which is one way to scale up, but there are quickly diminishing returns to large batch sizes.
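A minimal sketch of what that gradient averaging looks like, assuming PyTorch's torch.distributed API (the model, batch, and optimizer here are placeholders, not anything from the original discussion):

    import torch
    import torch.distributed as dist

    def data_parallel_step(model, batch, optimizer):
        # Each worker computes gradients on its own shard of the batch.
        loss = model(batch).mean()  # placeholder loss
        loss.backward()

        world_size = dist.get_world_size()
        for p in model.parameters():
            if p.grad is not None:
                # Sum the local gradients across all workers, then average.
                # This is mathematically equivalent to a single step on a
                # batch that is world_size times larger.
                dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
                p.grad /= world_size

        optimizer.step()
        optimizer.zero_grad()

Because the averaged gradient is the same as one computed on the combined batch, adding workers only grows the effective batch size; past a point, larger batches stop reducing gradient noise enough to let you raise the learning rate proportionally, which is where the diminishing returns come from.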

