How not to handle large files in GitHub
What’s the problem?
As you probably know, GitHub enforces a strict limit of 100 MiB per file. If you are just uploading lines of code, you probably don’t need to worry about it. However, if you want to upload some data, graphics projects, or anything binary, this is a limit you might hit. There are at least two different solutions to this problem (Git LFS and cleaning the repo history), but in this short article we will look at just one of them.
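A file over the limit can be spotted before it is ever committed. The following is a minimal Python sketch, not something used in this story: the function name `find_oversized_files` and the constant `LIMIT_BYTES` are made up for illustration, and the limit is assumed to be the 100 MiB mentioned above.

```python
import os

# Assumption: the per-file limit is 100 MiB, as mentioned above.
LIMIT_BYTES = 100 * 1024 * 1024


def find_oversized_files(root, limit=LIMIT_BYTES):
    """Return sorted (relative_path, size_in_bytes) pairs for files above `limit`."""
    oversized = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Git's own object store; only the working tree is checked here.
        if ".git" in dirnames:
            dirnames.remove(".git")
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # ignore symbolic links, which may be dangling
            size = os.path.getsize(path)
            if size > limit:
                oversized.append((os.path.relpath(path, root), size))
    return sorted(oversized)
```

Calling `find_oversized_files(".")` from the root of the working folder returns every file that would be rejected on push, so it can be dealt with while it is still out of the history.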
What happened?
This is what happened to me last week. As you probably know, after two years of work (yes, there is a lot of content and I’m very lazy) I published my fourth technical book. After three books in the “Delphi Cookbook” series published with PacktPub, I decided to publish my latest book (DMVCFramework - the official guide) using LeanPub, a self-publishing service. I’m not a graphics guy: those who know me in person know that while I’m quite good at software UX design, I’m quite terrible at pure graphics work. However, in this case I decided on a minimalist and polished book cover, so eventually I grabbed Paint.NET and a nice photo from Unsplash and started to create something…
IMHO the book cover is nice enough.
Considering that DMVCFramework is the most popular framework for Delphi on GitHub, this book has had a good number of readers, so I decided to have it translated into Brazilian Portuguese and Spanish as well. Two friends of mine are translating the content (Diego Farisato for the Brazilian version and José Davalos for the Spanish version). All these localizations started to make the book cover project file bigger…
Where should I put all this graphic material? Yeah! Put it on GitHub along with the project itself. Here’s where the problem started…
The GitHub limit and the problem…
When I committed the file in my local Git working folder, everything worked as usual. However, when I tried to push my local changes to the remote repo, GitHub blocked me because the main Paint.NET project file is more than 170 MiB, while the GitHub limit is 100 MiB for a single file. OK, someone could think: “Easy to solve, remove the file and retry”. No, it doesn’t work that way. Git maintains the full history of the commits, so the file is still stored in the earlier commit and this trick doesn’t solve the problem. Googling for a solution, I finally found this little FOSS program named BFG, which says about itself: “Removes large or troublesome blobs like git-filter-branch does, but faster. And written in Scala”. Let’s try.
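To see that a deleted file is still sitting in the history, one can list the blobs reachable from any ref together with their sizes. This is a minimal Python sketch, assuming `git` is on the PATH. The function names and `LIMIT_BYTES` are hypothetical, and this is not how BFG works internally. It relies on the `git rev-list --objects --all` and `git cat-file --batch-check` commands.

```python
import subprocess

# Assumption: the per-file limit is 100 MiB, as mentioned above.
LIMIT_BYTES = 100 * 1024 * 1024


def parse_large_blobs(batch_check_output, limit=LIMIT_BYTES):
    """Pick blobs above `limit` out of `git cat-file --batch-check` output.

    Each line is expected to look like "<type> <object id> <size> <path>".
    """
    large = []
    for line in batch_check_output.splitlines():
        parts = line.split(" ", 3)  # the path may itself contain spaces
        if len(parts) < 4 or parts[0] != "blob":
            continue  # commits and trees are not files
        _type, object_id, size, path = parts
        if int(size) > limit:
            large.append((path, int(size), object_id))
    return large


def large_blobs_in_history(repo_dir, limit=LIMIT_BYTES):
    """List every blob above `limit` reachable from any ref in `repo_dir`."""
    # Step 1: every object reachable from any ref, one "<id> <path>" per line.
    objects = subprocess.run(
        ["git", "rev-list", "--objects", "--all"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    # Step 2: ask Git for the type and size of each object; %(rest) echoes the path.
    sizes = subprocess.run(
        ["git", "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        cwd=repo_dir, input=objects, capture_output=True, text=True, check=True,
    ).stdout
    return parse_large_blobs(sizes, limit)
```

If a file was committed and then removed in a later commit, `large_blobs_in_history` still reports it, which is why the push keeps being rejected until the history itself is rewritten.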
The Solution
The solution was quite fast and easy. There are a number of options available for BFG; however, I opted to just remove the big graphic files from all commits using the following command. Yes, using this little gem you can change the Git history.
The command line to use is the following:
java -jar bfg-1.13.0.jar -D <filename_or_mask_to_remove> <local_git_project_folder>
Here’s my command line session:
C:\> java -jar bfg-1.13.0.jar -D *_guide.pdn C:\DEV\dmvcframework\
Using repo : C:\DEV\dmvcframework\.git
Found 1794 objects to protect
Found 4 tag-pointing refs : refs/tags/dmscontainer_branch_start, refs/tags/feature_branch_dmscontainer, refs/tags/feature_dmscontainer_v_3_1, refs/tags/v3_2_0-boron-RC2
Found 44 commit-pointing refs : HEAD, refs/heads/feature_dmscontainer_v3_1, refs/heads/feature_restclient, ...
Protected commits
-----------------
These are your protected commits, and so their contents will NOT be altered:
* commit df3913f8 (protected by 'HEAD')
Cleaning
--------
Found 1455 commits
Cleaning commits: 100% (1455/1455)
Cleaning commits completed in 1.533 ms.
Updating 1 Ref
--------------
Ref Before After
---------------------------------------
refs/heads/master | df3913f8 | bdf44232
Updating references: 100% (1/1)
...Ref update completed in 22 ms.
Commit Tree-Dirt History
------------------------
Earliest Latest
| |
...........................................................D
D = dirty commits (file tree fixed)
m = modified commits (commit message or parents changed)
. = clean commits (no changes to file tree)
Before After
-------------------------------------------
First modified commit | 84af4a4c | 89565eb8
Last dirty commit | 84af4a4c | 89565eb8
Deleted files
-------------
Filename Git id
----------------------------------------------------------
dmvcframework_the_official_guide.pdn | e3f5e9a1 (175,8 MB)
In total, 7 object ids were changed. Full details are logged here:
C:\DEV\dmvcframework.bfg-report\2020-10-24\15-03-43
BFG run is complete! When ready, run: git reflog expire --expire=now --all && git gc --prune=now --aggressive
cd c:\dev\dmvcframework
C:\dev\dmvcframework>git reflog expire --expire=now --all && git gc --prune=now --aggressive
Enumerating objects: 17009, done.
Counting objects: 100% (17009/17009), done.
Delta compression using up to 8 threads
Compressing objects: 100% (16662/16662), done.
Writing objects: 100% (17009/17009), done.
Total 17009 (delta 12246), reused 4299 (delta 0), pack-reused 0
After this, the file and its history have been removed from the repo and I can freely push my repo to GitHub.
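The steps from the session above (BFG first, then the reflog and garbage-collection commands it suggests) could be wrapped in a small script. This is only a sketch, assuming `java` and `git` are on the PATH. The function names are hypothetical, and the jar path, file mask and repo folder are whatever you would pass on the command line.

```python
import subprocess


def cleanup_commands(bfg_jar, file_mask, repo_dir):
    """Build the three commands from the session above as argument lists."""
    return [
        ["java", "-jar", bfg_jar, "-D", file_mask, repo_dir],
        ["git", "reflog", "expire", "--expire=now", "--all"],
        ["git", "gc", "--prune=now", "--aggressive"],
    ]


def run_cleanup(bfg_jar, file_mask, repo_dir):
    """Run BFG, then expire the reflog and garbage-collect inside the repo."""
    bfg, *git_steps = cleanup_commands(bfg_jar, file_mask, repo_dir)
    # Argument lists bypass the shell, so the mask reaches BFG unexpanded.
    subprocess.run(bfg, check=True)
    for step in git_steps:
        # The Git commands must run from inside the repository folder.
        subprocess.run(step, cwd=repo_dir, check=True)
```

Because `check=True` stops at the first failing command, the reflog and `gc` steps only run if BFG itself completed successfully.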
From now on, I will probably use Git Large File Storage for such big files.