I combine git-annex with the bup special remote[1], which still lets me externalize big files while benefiting from block-level deduplication. Or, depending on your needs, you can just use a tool like bup[2] or borg directly. Bup actually uses the git pack file format and git metadata.
I wrote a script, which I'm happy to share, that makes this much easier and even lets you mount your bup repo over .git/annex/objects for direct access.
Have you tested this with Unreal Engine blueprint files? If you can do block-based diffing on those and on the other binary assets used in games, it'd be huge for game development.
I have a couple of ~1TB repositories that I've had the misfortune of working with in Perforce in the past.
Last time I used Perforce in anger, it did pretty decently with an ~800GB repo (checkout + history).
I keep expecting someone to come along and dethrone it, but as far as I can tell it hasn't been done yet. The combination of specific file-tree views, drop-in proxies, a UI-forward design, and a checkout-based workflow that works well with unmergeable binary assets still leaves Git LFS and other solutions in the dust.
+1 on testing this against a moderate-size gamedev repo; those usually have some of the harder constraints, where code and assets can be coupled and the art portion of a sync can easily top a couple hundred GB.
1TB of checkout is the kind of repo I'm talking about; I have two such repos checked out on this box currently. I'm not sure I've ever checked out a repo of this scale locally with history, but I'd love to have the local history.
HashBackup author here. Your question is (I think) about how well block-based dedup works on a database, whether rows are changed or columns are changed. This answer describes how most block-based dedup software, including HashBackup, works.
Block-based dedup can be done either with fixed block sizes or variable block sizes. For a database with a fixed page size, a fixed dedup block size matching the page size is most efficient. For a database with variable page sizes, a variable block size works better, assuming the dedup "chunking" algorithm is fine-grained enough to detect the database page size. For example, if the db used a 4-6K variable page size and the dedup algo used a 1M variable block size, it could not save just the single modified db page; it would have to save the whole ~1MB chunk, i.e. a couple hundred db pages surrounding the modified page.
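If it helps to see that granularity effect concretely, here's a toy content-defined chunker (a Buzhash-style rolling hash). To be clear, this is not HashBackup's actual algorithm; the window size, hash table, and boundary rule are arbitrary choices for illustration. It overwrites one 4 KiB "page" of a random file in place and compares how much new data has to be stored with ~4 KiB versus ~1 MiB average chunks:

    # Toy content-defined chunker for illustration only (pure Python, takes a few seconds).
    import hashlib, os, random

    random.seed(0)
    TABLE = [random.getrandbits(32) for _ in range(256)]   # random byte -> 32-bit hash table

    def rol32(x, n):
        n %= 32
        return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

    def iter_chunks(data, mask_bits, window=48):
        """Yield chunks with an average size of roughly 2**mask_bits bytes.

        A boundary is declared where the low bits of a rolling hash over the last
        `window` bytes are zero, so boundaries depend only on local content and
        re-synchronize after an in-place change.
        """
        mask = (1 << mask_bits) - 1
        h, start = 0, 0
        for i, b in enumerate(data):
            h = rol32(h, 1) ^ TABLE[b]
            if i >= window:
                h ^= rol32(TABLE[data[i - window]], window)   # drop the byte leaving the window
            if (h & mask) == 0 and i + 1 - start >= window:   # enforce a minimum chunk size
                yield data[start:i + 1]
                start = i + 1
        if start < len(data):
            yield data[start:]

    def block_ids(data, mask_bits):
        # chunk hash -> chunk length, as a stand-in for a dedup block store
        return {hashlib.sha1(c).digest(): len(c) for c in iter_chunks(data, mask_bits)}

    PAGE = 4096
    original = os.urandom(PAGE * 1000)                   # ~4 MB stand-in for a db file
    modified = bytearray(original)
    modified[PAGE * 500:PAGE * 501] = os.urandom(PAGE)   # overwrite one page in place
    modified = bytes(modified)

    for bits, label in [(12, "~4 KiB avg chunks"), (20, "~1 MiB avg chunks")]:
        old, new = block_ids(original, bits), block_ids(modified, bits)
        fresh = {h: n for h, n in new.items() if h not in old}
        print(f"{label}: {len(fresh)} new chunks, {sum(fresh.values())} bytes to re-store")

With the fine-grained setting, only the chunk or two touching the modified page shows up as new; with the coarse setting, the whole ~1 MiB chunk around it does, which is the same effect as the 4-6K page vs 1M block example above.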
Your column vs row question depends on how the db stores data, whether key fields are changed, etc. The main dedup efficiency criteria are whether the changes are physically clustered together in the file or whether they are dispersed throughout the file, and how fine-grained the dedup block detection algorithm is.
Imagine you have a 500MB file (lastmonth.csv) where every day 1MB is changed.
With file-based deduplication, the full 500MB gets uploaded every day, and every clone of the repo has to download 500MB.
With block-based deduplication, only roughly the 1MB that changed is uploaded and downloaded (a quick back-of-the-envelope simulation follows below).
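To put numbers on it, here's a toy simulation of that example; the 500 x 1MB blocks and the daily in-place edit are just the figures from above, not tied to git-annex, bup, or HashBackup specifically, and it assumes edits never shift block boundaries (if they did, you'd need content-defined chunking to re-synchronize):

    # lastmonth.csv modeled as 500 blocks of 1 MB, with ~1 MB changing in place each day.
    day0 = [f"block-{i}-v0" for i in range(500)]          # 500 x 1 MB = 500 MB

    stored = set(day0)                                    # blocks the remote already has
    uploaded_file_level = 0
    uploaded_block_level = 0

    current = list(day0)
    for day in range(1, 31):                              # a month of daily 1 MB edits
        current[day % 500] = f"block-{day % 500}-v{day}"  # overwrite one 1 MB block
        uploaded_file_level += len(current)               # whole file re-uploaded: 500 MB/day
        new_blocks = [b for b in current if b not in stored]
        uploaded_block_level += len(new_blocks)           # only the changed block: ~1 MB/day
        stored.update(new_blocks)

    print(f"file-level dedup uploads over a month:  {uploaded_file_level} MB")    # 15000 MB
    print(f"block-level dedup uploads over a month: {uploaded_block_level} MB")   # 30 MB

Over a month that works out to 15,000 MB uploaded with file-level dedup versus roughly 30 MB with block-level dedup.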