How Subversion conserves disk space

I wanted to share something from our March openCollabNet Technical Newsletter. If you do not get our newsletter yet, sign up for openCollabNet. It only takes a minute.

To keep the repository as small as possible, Subversion uses deltification, also called "deltified storage". Deltification is the encoding of a chunk of data as a collection of differences against some other data. If two files are very similar, deltification saves storage because only the differences are stored, not the entire file.
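As a rough illustration of deltification, the Python sketch below encodes a new version of a text as "copy" and "insert" instructions against an old version, using the standard library's difflib. It is only a stand-in: Subversion uses its own binary delta format (svndiff), and the helper names `make_delta` and `apply_delta` are made up for this example.

```python
import difflib

def make_delta(source: str, target: str) -> list:
    """Encode `target` as instructions against `source` (hypothetical format).
    ("copy", start, end) reuses source[start:end]; ("insert", text) adds new text."""
    ops = []
    matcher = difflib.SequenceMatcher(None, source, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("insert", target[j1:j2]))
        # "delete": the source text is simply not copied, so no instruction is needed
    return ops

def apply_delta(source: str, ops: list) -> str:
    """Rebuild the target by replaying the instructions against `source`."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            parts.append(source[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)

old = "".join(f"line {i}: unchanged text\n" for i in range(50))
new = old.replace("line 25:", "line 25 (edited):")
delta = make_delta(old, new)

# Only the inserted characters are new data; everything else is a reference into `old`.
inserted = sum(len(op[1]) for op in delta if op[0] == "insert")
assert apply_delta(old, delta) == new
print(len(new), inserted)
```

The new version is over a thousand characters long, but the delta carries only the few inserted characters plus a handful of copy instructions, which is where the savings come from.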

This works differently depending on which filesystem back-end you use. In BDB (Berkeley DB), fulltexts are kept at the tips of each distinct line of a file’s history, that is, at its newest versions. When a change is committed, the new version is stored as a fulltext, and the previous version is rewritten as a delta against that new version. FSFS stores deltas in the opposite direction, so old versions never need to be rewritten: when a file is changed, the new version is stored as a delta against an older version.
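To make the difference in direction concrete, here is a toy Python model (hypothetical, not Subversion code) that records only which version each stored entry is a delta against. It assumes a single line of history with no skip-deltas and a fulltext only where the text above says one is kept. It shows the trade-off: the BDB style rewrites one old entry per commit but keeps the newest version cheap to read, while the FSFS style never rewrites anything but has to walk back through more deltas to reach recent versions.

```python
# Hypothetical model: index = version number; each entry is
# ("fulltext", None) or ("delta", base_version). No file data is stored.

def commit_bdb_style(reps: list) -> int:
    """New version is stored as fulltext; the previous tip is rewritten
    as a delta against it. Returns how many stored entries were rewritten."""
    new = len(reps)
    reps.append(("fulltext", None))
    if new == 0:
        return 0
    reps[new - 1] = ("delta", new)
    return 1

def commit_fsfs_style(reps: list) -> int:
    """New version is stored as a delta against the previous version
    (the very first version is fulltext). Existing entries are never touched."""
    new = len(reps)
    reps.append(("fulltext", None) if new == 0 else ("delta", new - 1))
    return 0

def deltas_to_apply(reps: list, version: int) -> int:
    """Follow delta bases from `version` until a fulltext is reached; count the deltas."""
    count = 0
    kind, base = reps[version]
    while kind == "delta":
        count += 1
        kind, base = reps[base]
    return count

bdb, fsfs = [], []
bdb_rewrites = sum(commit_bdb_style(bdb) for _ in range(10))
fsfs_rewrites = sum(commit_fsfs_style(fsfs) for _ in range(10))
print(bdb_rewrites, fsfs_rewrites)                         # 9 0
print(deltas_to_apply(bdb, 9), deltas_to_apply(bdb, 0))    # 0 9
print(deltas_to_apply(fsfs, 9), deltas_to_apply(fsfs, 0))  # 9 0
```

In this naive FSFS-style model the newest version needs every delta in the chain, which is exactly the cost that skip-deltas are meant to cut.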

Files under active development often change many times, and Subversion’s performance would degrade if it had to apply every individual delta to re-create a version of a file with a long history. To avoid this, Subversion uses "skip-deltas": deltas that are calculated not against the immediately next or previous version, but against a version that is closer, in the chain of deltas, to a fulltext representation of the file. A given version of a file can then be re-created by applying fewer deltas than would be needed if every change were stored as a delta against its immediate neighbor.
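The effect of skip-deltas is easiest to see by counting. The Python sketch below assumes a simplified numbering in which version 0 of a file is a fulltext and every later version n is a delta against some base. Without skip-deltas the base is n - 1; with the skip-delta rule assumed here, the base is n with its lowest set bit cleared (so 12 deltifies against 8, and 8 against 0). This is a toy model of the idea, not Subversion's exact algorithm, which differs between back-ends and releases; the function names are hypothetical.

```python
# Hypothetical model: versions 0, 1, 2, ... of one file; version 0 is a fulltext,
# every other version is stored as a delta against the version its base rule picks.

def linear_base(n: int) -> int:
    """Without skip-deltas: version n is a delta against version n - 1."""
    return n - 1

def skip_base(n: int) -> int:
    """Assumed skip-delta rule: clear the lowest set bit of n.
    For example 12 (0b1100) -> 8 (0b1000) -> 0."""
    return n & (n - 1)

def deltas_needed(n: int, base_of) -> int:
    """Number of deltas to apply to rebuild version n from the fulltext at version 0."""
    steps = 0
    while n != 0:
        n = base_of(n)
        steps += 1
    return steps

for n in (7, 100, 1000):
    print(n, deltas_needed(n, linear_base), deltas_needed(n, skip_base))
# prints:
# 7 7 3
# 100 100 3
# 1000 1000 6
```

Under this rule, rebuilding version n takes as many deltas as n has 1-bits in binary, so at most about log2(n) + 1 deltas instead of n.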

For repositories created with Subversion 1.4 or later, space savings increase further because the delta chunks are stored using a compression algorithm.
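Subversion 1.4 introduced a newer version of its delta format (svndiff1) that, to our understanding, compresses delta data with zlib. The short Python sketch below uses the standard `zlib` module only to show why this helps: the text carried inside deltas, such as newly added source lines, is usually repetitive and compresses well. The sample payload is made up, and the sketch does not reproduce Subversion's on-disk format.

```python
import zlib

# Made-up stand-in for the new text carried by a delta: repetitive source lines.
payload = "".join(f"    total += values[{i}]\n" for i in range(200)).encode("utf-8")

compressed = zlib.compress(payload, 9)  # level 9 = best compression
assert zlib.decompress(compressed) == payload  # compression is lossless
print(len(payload), len(compressed))
```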

Posted in Subversion