Saturday, 11 January 2014

Pareto's Law in free software projects

I wonder whether Pareto's Law, applied to hours of dedication, holds in free software projects. Pareto's Law is a principle in economics, named after the economist Vilfredo Pareto, which in its generalized form holds that 80% of output originates from 20% of input. The 80/20 ratio can be applied to revenues, advertising effectiveness, or management headaches. For example, 20% of advertising produces 80% of results. Likewise, 80% of management headaches are caused by 20% of the employees. It is also called the 80/20 rule.

Applied to free software projects, this law would mean that 20% of committers produce 80% of commits.
Let's analyse some projects as a first rough check of the law. The chosen projects are:
    - LibreOffice
    - CloudStack
    - Evince
    - CVSAnalY
   
The analysis counts the commits and committers in 2013, accumulates committers, from most to least active, until they reach 20% of the commits, and calculates what percentage of all committers they represent. The results are:
    - CloudStack
 

    - Linux


       
Therefore, Pareto's Law does not seem to hold in these big projects. (Maybe we should define more precisely when a committer really counts as a contributor: is someone with a single commit a contributor?)

But is the law valid for small projects? Let's check the others, for instance Evince (a PDF viewer) and even CVSAnalY.
    - Evince
   
    - CVSAnalY



In these projects, the law does not seem to hold either.


References:
[1] Metrics Grimoire - http://metricsgrimoire.github.io/
[2] CVSAnalY - https://github.com/MetricsGrimoire/CVSAnalY
[3] LibreOffice repository - git://anongit.freedesktop.org/libreoffice/core
[4] CloudStack repository - https://git-wip-us.apache.org/repos/asf/cloudstack.git
[5] Linux repository - https://github.com/torvalds/linux

Note:
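The data was obtained with the following query on the CVSAnalY database:
-- Commits per author in 2013, most active authors first.
SELECT COUNT(DISTINCT scmlog.id) AS total, people.email
FROM scmlog, actions, people
WHERE scmlog.id = actions.commit_id  -- only commits with at least one recorded action
AND scmlog.author_id = people.id
AND YEAR(scmlog.date) = 2013
GROUP BY people.email
ORDER BY total DESC
-- Export as quoted, comma-separated rows: "total","email"
INTO OUTFILE '/tmp/data.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n';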
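
The exported file can then be processed to check the law. The following is only a minimal sketch in Python, not the script used for this post: the function names, the `commit_share` and `min_commits` parameters, and the sample rows are all assumptions made for illustration. It sorts committers from most to least active, accumulates their commits until a chosen share of all commits is reached, and returns the fraction of committers that was needed. The `min_commits` parameter is one possible way to explore the question of whether someone with a single commit should count as a contributor.

```python
import csv
import io


def committer_share(commit_counts, commit_share=0.8, min_commits=1):
    """Return the fraction of committers needed to reach `commit_share` of commits.

    Committers with fewer than `min_commits` commits are ignored (assumed filter).
    """
    counts = sorted((c for c in commit_counts if c >= min_commits), reverse=True)
    total = sum(counts)
    if total == 0:
        raise ValueError("no commits to analyse")
    accumulated = 0
    # Walk from the most active committer down until the share is reached.
    for needed, count in enumerate(counts, start=1):
        accumulated += count
        if accumulated >= commit_share * total:
            return needed / len(counts)
    return 1.0


def read_commit_counts(csv_file):
    """Read rows of ("total","email"), the layout written by the query above."""
    return [int(row[0]) for row in csv.reader(csv_file) if row]


# Hypothetical rows in the same layout as /tmp/data.csv (not real results).
sample = io.StringIO(
    '"50","a@example.org"\n'
    '"30","b@example.org"\n'
    '"10","c@example.org"\n'
    '"5","d@example.org"\n'
    '"5","e@example.org"\n'
)
counts = read_commit_counts(sample)
print(committer_share(counts, commit_share=0.8))  # 0.4
```

With the sample rows, two of the five committers account for 80% of the commits, so the result is 0.4. If Pareto's Law held for a project, the value for `commit_share=0.8` would be close to 0.2. For real data, `read_commit_counts(open('/tmp/data.csv', newline=''))` could replace the sample.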
