How to properly(?) relicense a large open source project - part 1
Relicensing VLC to LGPL
There are quite a few good reasons to do so, some more obvious than others: notably competition, the need for more professional developers around VLC, and app stores. Other reasons also exist, but this is neither the time nor the place to discuss them.
This is a crazy task, because every developer keeps all their rights, and VideoLAN holds very few rights on VLC. It involves contacting a few hundred developers, some of whom were only active 10 years ago, some with bouncing email addresses, all spread across continents, countries, languages, OSes...
Yet, I did it. Here is the first part of how I did it...
Copyright, droit d'auteur, public domain and VLC.
This means we are not under a copyright system, but under an author's rights system. As such, every author has moral rights and patrimonial rights. The former are non-transferable, while the latter can be transferred to another legal entity. This is quite different from copyright.
Moreover, this explains why public domain is not a valid concept for everyone on this planet.
Unlike a lot of large open source projects, the authors of VLC keep all their rights on their code, even when their contribution is minimal. Therefore, to change the license, one must contact every author, even small contributors. From a community management point of view, this also makes sense :).
VLC authors, core and modules
VLC is split into several parts, but most of the code is in the core or in some modules.
The core, which was successfully relicensed last year, involved around 150 developers and 80,000 lines of code. The vast majority of that code was written by about two dozen people, most of whom I am still in contact with.
The modules are a different story.
Even if we concentrate on the playback modules, which I did, we are talking about 300 developers and 300,000 lines of code, and the authorship is spread much more evenly.
Listing the right people
The first step, which is the most important, is to correctly list the authors.
This looks simple from the outside, but it is not, mostly because there was no distinction between authors and committers in the CVS and SVN days. Moreover, some code was imported from Xine or MPlayer... And sometimes, the author is not even credited in the commit log.
And this is the point where you should think I am completely crazy.
To get a proper listing of contributors, I used 3 things:
- git blame
- git log
- grep, awk
on our vlc git repository.
The first obvious thing to do is to use git blame on all the files you care about, so you know, line by line, who actually wrote the code, even after code copies, code moves or re-indentation.
I ran it, with extra protection, like this:
git blame -C -C -C20 -M -M10 -e $file
Of course, as some Git expert told me, this should have been enough:
git blame -C -C -M -e $file
but I preferred to be extra-safe.
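To turn per-file blame output into a list of people, something like the following sketch could be used. It is not the exact script I ran: the helper name blame_authors and the modules/demux path in the usage comment are only illustrative, and it relies on the --line-porcelain output of git blame, which already carries the author e-mail on every line.

```shell
# Count surviving lines per author e-mail.
# Reads `git blame --line-porcelain` output on stdin and prints
# "<count> <email>" lines, highest count first.
# blame_authors is a hypothetical helper name, not a git command.
blame_authors() {
    awk '/^author-mail / { count[$2]++ }
         END { for (mail in count) print count[mail], mail }' |
        sort -k1,1nr -k2,2
}

# Possible usage, from the top of a git checkout (the path is an example):
#   git ls-files 'modules/demux/*.c' | while read -r file; do
#       git blame -C -C -M --line-porcelain "$file"
#   done | blame_authors
```

Source lines in porcelain output start with a tab, so a line of code that happens to contain "author-mail" is not counted.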
The second obvious thing to do was to check all the logs of the specific module folder or file, in case:
- someone did some commits on one module
- the code was quite changed, so blaming does not find it
- yet the idea behind the code is the same.
This solves what I call the authorship leak.
git shortlog -sne $file was used for that task.
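As a rough sketch of how the leak can be made visible, the shortlog output can be reduced to bare e-mail addresses and compared with the addresses found by blame. The helper names and file names below are made up for the example, and lowercasing addresses is a simplifying assumption. Note that in a script or a pipe, git shortlog reads the log from standard input unless a revision such as HEAD is given.

```shell
# Reduce `git shortlog -sne` output (stdin) to one lowercase e-mail per line.
# Lowercasing is an assumption that two spellings mean the same person.
shortlog_emails() {
    sed -n 's/.*<\([^<>]*\)>[[:space:]]*$/\1/p' |
        tr '[:upper:]' '[:lower:]' |
        sort -u
}

# Print the e-mails found in the log list ($2) but not in the blame list ($1).
# Both files must contain one e-mail per line, sorted the same way.
log_only_authors() {
    comm -13 "$1" "$2"
}

# Possible usage (paths and file names are examples):
#   git shortlog -sne HEAD -- modules/demux | shortlog_emails > log.txt
#   log_only_authors blame.txt log.txt
```

Whoever shows up in the second list committed to the module but has no surviving line in blame, so they still need to be contacted.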
Finally, some people were only mentioned in the commit logs or just had their names in the final file.
For this, I grepped "patch", "original", "at" and "@" in the commit logs.
I also grepped the author section of every file to check whether any other author was missing.
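These two greps could look roughly like the sketch below. The helper names, the 40-line header limit and the exact patterns are assumptions made for the example, not the precise expressions I used, and the result still has to be read by a human because a pattern like "at" produces many false positives.

```shell
# Keep only the commit-message lines (stdin) that look like a credit:
# "patch", "original", an obfuscated "name at domain" or a plain "@".
credit_lines() {
    grep -Ei 'patch|original|[[:alnum:]] at [[:alnum:]]|@'
}

# Print the e-mail addresses written as <user@host> in the header of a file.
# Assumes the author section sits within the first 40 lines.
header_authors() {
    head -n 40 "$1" | grep -Eo '<[^<> ]+@[^<> ]+>' | tr -d '<>'
}

# Possible usage (the paths are examples):
#   git log --format='%h %s%n%b' -- modules/demux | credit_lines
#   header_authors modules/demux/avi/avi.c
```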
After those steps, I had a quite accurate list of people to contact. I'll spare you the de-duplication step, because it is obvious and boring.