Monday, 20 January 2014

The Free Software Development Quantitative Analysis

Nowadays free software is widely used in companies, and most of them have some kind of business strategy around it. Besides, in the current crisis, public administrations and companies need to reduce costs and evaluate products accurately. The ecosystem of a free software project is very different from that of a proprietary software product: the development process is usually open, so you can analyze the source code, the commits, the activity of the developers (community members and committers), and so on.

There are companies, like Bitergia, that specialize in collecting, providing and analyzing these metrics. With this analysis, companies and public administrations can assess free software projects and thus make the best choice of software. Even more remarkable, in my opinion, is that they create dashboards that allow tracking a free software project and help to understand the health of its community; in short, an essential tool for making the right decisions.

For public administrations in particular, these dashboards could help to create a new paradigm for financing development: the administration could steer the development of the community using the information in the dashboard, and even assess the contributions of the different contributors and fund them accordingly.

These dashboards have data about commits, developers, tickets, mail messages, files, and so on. For example, in the following picture you can see the summary dashboard.





In this picture you can see source code tracking, with data like commits, authors, files, lines added and removed.



Finally, in this picture you can see information about tickets and top companies. I wonder if this kind of information could be a model for assessing which companies a public administration should finance for development.



It is important to know that this dashboard has been implemented with MetricsGrimoire, VizGrimoire and Bootstrap. Besides, at the bottom of each dashboard there are links to the data sources, so you can reproduce these graphs if you want.

If you are interested, you can access other community metrics dashboards: Ceph, Wikipedia and gvSIG.

References:
[0] Bitergia
[1] OpenStack dashboard
[2] MetricsGrimoire
[3] VizGrimoire
[4] Bootstrap
[5] Ceph’s development dashboard
[6] Wikipedia Community Metrics dashboard
[7] gvSIG Desktop Analysis (Mar 2013)

Sunday, 19 January 2014

X2go Metrics Analysis

X2Go provides a graphical desktop of a remote computer over a low-bandwidth connection. With X2Go you can access your desktop from another computer, over both LAN and internet connections. The transmission uses the SSH protocol, so it is encrypted. By using the free NX libraries from NoMachine, a very acceptable performance in both speed and responsiveness is achieved. Even an ISDN connection runs smoothly.

X2Go's Git projects can be cloned to a local copy through anonymous Git with the following instructions (in a Unix-like console session). We are going to analyze the following packages: x2goclient and x2goserver.


janague@metricsGrimoireHost:~/repositories$ mkdir -p x2go
janague@metricsGrimoireHost:~/repositories$ cd x2go/
janague@metricsGrimoireHost:~/repositories/x2go$ git clone git://code.x2go.org/x2goclient.git
Cloning into 'x2goclient'...
remote: Counting objects: 4171, done.
remote: Compressing objects: 100% (3364/3364), done.
remote: Total 4171 (delta 2982), reused 952 (delta 631)
Receiving objects: 100% (4171/4171), 17.88 MiB | 1.11 MiB/s, done.
Resolving deltas: 100% (2982/2982), done.

janague@metricsGrimoireHost:~/repositories/x2go$ git clone git://code.x2go.org/x2goserver.git
Cloning into 'x2goserver'...
remote: Counting objects: 8865, done.
remote: Compressing objects: 100% (2548/2548), done.
remote: Total 8865 (delta 6601), reused 8124 (delta 5930)
Receiving objects: 100% (8865/8865), 1010.62 KiB | 568 KiB/s, done.
Resolving deltas: 100% (6601/6601), done.

Firstly, we are going to use SLOCCount to automatically identify and measure the languages used in X2Go development.


janague@metricsGrimoireHost:~/repositories/x2go$ sloccount x2goclient
Have a non-directory at the top, so creating directory top_dir
Adding /home/janague/repositories/x2go/x2goclient/AUTHORS to top_dir
...
Adding /home/janague/repositories/x2go/x2goclient/xsettingswidget.h to top_dir
Categorizing files.
Finding a working MD5 command....
Found a working MD5 command.
Computing results.

SLOC    Directory    SLOC-by-Language (Sorted)
22456   top_dir         cpp=21612,perl=768,sh=76
2352    qtbrowserplugin-2.4_1-opensource cpp=2352
512     examples        perl=512
40      debian          sh=40
28      x2go-logos      sh=28
12      portable        cpp=12
2       nsis            sh=2
0       desktop         (none)
0       icons           (none)
0       man             (none)
0       png             (none)
0       provider        (none)
0       svg             (none)
0       txt             (none)

Totals grouped by language (dominant language first):
cpp:          23976 (94.39%)
perl:          1280 (5.04%)
sh:             146 (0.57%)

Total Physical Source Lines of Code (SLOC)                = 25,402
Development Effort Estimate, Person-Years (Person-Months) = 5.97 (71.67)
 (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months)                         = 1.06 (12.68)
 (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule)  = 5.65
Total Estimated Cost to Develop                           = $ 806,776
 (average salary = $56,286/year, overhead = 2.40).
SLOCCount, Copyright (C) 2001-2004 David A. Wheeler
SLOCCount is Open Source Software/Free Software, licensed under the GNU GPL.
SLOCCount comes with ABSOLUTELY NO WARRANTY, and you are welcome to
redistribute it under certain conditions as specified by the GNU GPL license;
see the documentation for details.
Please credit this data as "generated using David A. Wheeler's 'SLOCCount'."

The following graph shows the distribution of programming languages.


Therefore, C++ is the dominant language in the x2goclient package, and the total estimated cost to develop it is $ 806,776.
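
The effort, schedule and cost figures in the SLOCCount report follow from the Basic COCOMO formulas printed in its output. A minimal Python sketch that recomputes them from the SLOC total, assuming the salary and overhead defaults shown in the report (the function name `basic_cocomo` is only illustrative, it is not part of SLOCCount):

```python
def basic_cocomo(sloc, salary=56286.0, overhead=2.40):
    """Recompute Basic COCOMO estimates from a physical SLOC total."""
    ksloc = sloc / 1000.0
    person_months = 2.4 * ksloc ** 1.05             # development effort
    schedule_months = 2.5 * person_months ** 0.38   # elapsed schedule
    developers = person_months / schedule_months    # average team size
    cost = (person_months / 12.0) * salary * overhead
    return person_months, schedule_months, developers, cost


# 25,402 is the x2goclient total reported by SLOCCount above.
effort, schedule, developers, cost = basic_cocomo(25402)
print("Effort (person-months): %.2f" % effort)
print("Schedule (months):      %.2f" % schedule)
print("Average developers:     %.2f" % developers)
print("Estimated cost ($):     %.0f" % cost)
```

The results should land close to the values in the report; small differences can come from rounding. Since the cost is driven by the assumed salary and overhead, changing those two parameters changes the estimate proportionally.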

In the case of the x2goserver package:


janague@metricsGrimoireHost:~/repositories/x2go$ sloccount x2goserver/
Have a non-directory at the top, so creating directory top_dir
Adding /home/janague/repositories/x2go/x2goserver//INSTALL to top_dir
...

Totals grouped by language (dominant language first):
perl:          3814 (73.57%)
sh:            1344 (25.93%)
xml:             21 (0.41%)
ansic:            5 (0.10%)

Total Physical Source Lines of Code (SLOC)                = 5,184
Development Effort Estimate, Person-Years (Person-Months) = 1.13 (13.51)
 (Basic COCOMO model, Person-Months = 2.4 * (KSLOC**1.05))
Schedule Estimate, Years (Months)                         = 0.56 (6.72)
 (Basic COCOMO model, Months = 2.5 * (person-months**0.38))
Estimated Average Number of Developers (Effort/Schedule)  = 2.01
Total Estimated Cost to Develop                           = $ 152,069
 (average salary = $56,286/year, overhead = 2.40).
SLOCCount, Copyright (C) 2001-2004 David A. Wheeler
SLOCCount is Open Source Software/Free Software, licensed under the GNU GPL.
SLOCCount comes with ABSOLUTELY NO WARRANTY, and you are welcome to
redistribute it under certain conditions as specified by the GNU GPL license;
see the documentation for details.
Please credit this data as "generated using David A. Wheeler's 'SLOCCount'."

The following graph shows the distribution of programming languages.



Therefore, Perl is the dominant language in the x2goserver package, and the total estimated cost to develop it is $ 152,069.


Secondly, we are going to analyse the Git repository to get data about the community, using commit and committer information. We use CVSAnalY to extract the data and MariaDB to store and query it. So we need to create a new database, for example x2goDB, and then run CVSAnalY.



janague@metricsGrimoireHost:~/repositories/x2go$ mysql -u root -p
Enter password:
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 79
Server version: 5.5.34-MariaDB-1~precise-log mariadb.org binary distribution
Copyright (c) 2000, 2013, Oracle, Monty Program Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> create database x2goDB;
Query OK, 1 row affected (0.00 sec)

MariaDB [(none)]> grant all privileges on x2goDB.* to 'janague'@'localhost' identified by 'janague123';
Query OK, 0 rows affected (0.00 sec)

We use the same database for both packages:


janague@metricsGrimoireHost:~/repositories/x2go/x2goclient$ cvsanaly2 -u janague -d x2goDB
Password:
Parsing log for /home/janague/repositories/x2go/x2goclient (git)
Warning: Detected empty branch 'master', it'll be ignored
Warning: Detected empty branch 'build-main', it'll be ignored
Executing extensions

janague@metricsGrimoireHost:~/repositories/x2go/x2goclient$ cd ..
janague@metricsGrimoireHost:~/repositories/x2go$ cd x2goserver/
janague@metricsGrimoireHost:~/repositories/x2go/x2goserver$ cvsanaly2 -u janague -d x2goDB
Password:
Parsing log for /home/janague/repositories/x2go/x2goserver (git)
Warning: Detected empty branch 'build-baikal', it'll be ignored
Warning: Detected empty branch 'build-main', it'll be ignored
Executing extensions

Below we show the different results.

1.- Commits per day


using the following query


select
    year(date),
    month(date),
    day(date),
    count(*)
from
    scmlog
group by year(date), month(date), day(date);

2.- Commits aggregated from the daily counts



using the same query; the aggregation is done with a Python function.
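
The Python function used for that aggregation is not shown in this entry. As a rough sketch of what such a step could look like, assuming the rows come back from the query as (year, month, day, count) tuples, the daily counts can be summed per month. The function name and the sample rows are made up for illustration:

```python
from collections import OrderedDict


def commits_per_month(rows):
    """Sum daily commit counts into (year, month) buckets, in date order."""
    totals = OrderedDict()
    for year, month, _day, count in sorted(rows):
        key = (year, month)
        totals[key] = totals.get(key, 0) + count
    return totals


# Hypothetical sample rows in the shape returned by the query.
sample = [(2013, 1, 3, 4), (2013, 1, 20, 2), (2013, 2, 1, 5), (2012, 12, 31, 1)]
monthly = commits_per_month(sample)
for (year, month), total in monthly.items():
    print("%d-%02d: %d" % (year, month, total))
```

The resulting mapping can be passed directly to a plotting library such as matplotlib, which is listed in the references.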


3.- Commits per committer, limited to the 30 committers with the highest accumulated activity


using the following query

select
    committer_id,
    count(*) as commits
from
    scmlog
group by committer_id
order by commits desc
limit 30;

This graph shows that a small number of developers write the vast majority of the source code.

4.- Top 10 of developers, all history.

Using this query 


SELECT COUNT(distinct(scmlog.id)) as total, people.email, people.name
FROM scmlog, actions, people
WHERE scmlog.id = actions.commit_id
AND scmlog.author_id = people.id
GROUP BY people.email
ORDER BY total DESC
LIMIT 10

we have the following result


+-------+--------------------------------------+------------------------+
| total | email                                | name                   |
+-------+--------------------------------------+------------------------+
|  1444 | mike.gabriel@das-netzwerkteam.de     | Mike Gabriel           |
|    83 | oleksandr.shneyder@obviously-nice.de | Oleksandr Shneyder     |
|    55 | o.shneyder@phoca-gmbh.de             | Oleksandr Shneyder     |
|    34 | oleksandr.shneyder@treuchtlingen.de  | Oleksandr Shneyder     |
|    20 | siretart@tauware.de                  | Reinhard Tartler       |
|    14 | ionic@ionic.de                       | Mihai Moldovan         |
|    11 | morty@gmx.net                        | Moritz 'Morty' Strübe  |
|     8 | jengelh@inai.de                      | Jan Engelhardt         |
|     8 | teranders@gmail.com                  | Terje Andersen         |
|     7 | bd.dali@gmail.com                    | Daniel Lindgren        |
+-------+--------------------------------------+------------------------+
10 rows in set (0.05 sec)
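
Note that Oleksandr Shneyder appears three times in this result because the query groups by e-mail address. A small Python sketch, using only the ten rows above, that merges the rows by name and computes each developer's share of these top-10 commits. The share is relative to the rows shown here, not to the whole history, so it will not match the overall percentage exactly:

```python
from collections import defaultdict

# (total, email, name) rows copied from the query result above.
rows = [
    (1444, "mike.gabriel@das-netzwerkteam.de", "Mike Gabriel"),
    (83, "oleksandr.shneyder@obviously-nice.de", "Oleksandr Shneyder"),
    (55, "o.shneyder@phoca-gmbh.de", "Oleksandr Shneyder"),
    (34, "oleksandr.shneyder@treuchtlingen.de", "Oleksandr Shneyder"),
    (20, "siretart@tauware.de", "Reinhard Tartler"),
    (14, "ionic@ionic.de", "Mihai Moldovan"),
    (11, "morty@gmx.net", "Moritz 'Morty' Strübe"),
    (8, "jengelh@inai.de", "Jan Engelhardt"),
    (8, "teranders@gmail.com", "Terje Andersen"),
    (7, "bd.dali@gmail.com", "Daniel Lindgren"),
]


def merge_by_name(rows):
    """Add up commit totals of rows that share the same developer name."""
    merged = defaultdict(int)
    for total, _email, name in rows:
        merged[name] += total
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)


ranking = merge_by_name(rows)
grand_total = sum(total for _name, total in ranking)
for name, total in ranking:
    print("%-25s %5d  %5.1f%%" % (name, total, 100.0 * total / grand_total))
```

Merging by name is only a heuristic: two different people can share a name, and one person can use several spellings.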

Conclusion
This community is only 3 years old, really young, with outstanding and continuous activity, but with a serious maintenance and sustainability problem, because almost 83% of the commits are made by the same contributor.


References
[0] X2go homepage
[1] <package> names 
[2] SLOCCount
[3] CVSAnalY
[4] script by Daniel Izquierdo Cortazar <dizquierdo@libresoft.es>
[5] matplotlib
[6] SciPy

Saturday, 11 January 2014

LibreOffice - Roles and responsibilities - OpenOffice




Firstly, let's look at the case of LibreOffice. These are the different roles:

QA Tester
        Triage, prioritize and do in-depth analysis of bug reports and feature requests
        Do manual testing and test reporting
        Take part in localization QA.


UX/Visual Designers
        Design provides the visual basis for any tool.
        Transport usability, quality and emotions.
        Improving LibreOffice by visual means: inside the office suite, in user interaction and anywhere the product appears.
   
Localizers
        Help localize the LibreOffice user interface
        Help translate the help files into their own language

Documenters
        Work on user guides and LibreOffice's built-in help system as part of the LibreOffice documentation team.
        Writing and reviewing are a key part of the workflow.
        Recruiting people to help out in other ways: screenshot production, indexing...

Marketers
        Attend FOSS events as part of an official Foundation Team.
        Publicize LibreOffice and The Document Foundation in media, at fairs and other events, etc.
        Take part in creating content and coordinating activities for special marketing initiatives.
        Work on research (such as marketing and feature research for future versions, usability research, etc.) 

Developers
        Work on code for the LibreOffice code base and extensions.
        Work on bug-fixing and submitting patches.

Web Admins
        Maintain and develop LibreOffice and Document Foundation online infrastructure: websites, servers, etc.

Donate
        Making a donation to help fund the community's work.

On the other hand, in Apache OpenOffice we can find

Observer
        Views, but does not change project resources.
        Read-only access to most project resources.
        Read-only access to web content and source code (CVS).
        Submits issue to issue tracking (IssueZilla)
        Subscribes and posts to project mailing lists.

Developer
        Contributes directly to project -- source code and HTML.
        Gains write access to most project resources.
        Write access to HTML, news utility, files utility, CVS, Issuezilla.
        Mailing list privileges the same.

Content Developer
        Contributes directly to project's web content (HTML).
        Gains write access to project's HTML, news utility, files utility, and Issuezilla.
        Mailing list privileges the same.

Project Owner
        Defines the project's overall mission, direction, methodology, and community make-up.
        Gains administrative access to all Project functions.
        Grants members requested permissions on project.
        Administers all project mailing lists and is default moderator on all lists.
        Administers Issuezilla.
        Project Owner role supersedes any other roles you may hold on a project.

It is important to note that LibreOffice seems to have a richer and more detailed model of roles, with some specific roles such as QA Testers, Marketers and Localizers.

References:
[0] LibreOffice - Get involved
[1] Apache OpenOffice Roles

R Governance Model

The purpose of this entry is to examine whether the governance model of the R project is a cathedral, in the sense of Raymond's classic essay "The Cathedral and the Bazaar", and whether it is a community based on meritocracy. Let's analyze the following aspects of the governance model: overview, roles and responsibilities, support, decision making process, and contribution process.

    Overview
   
    The project's objective is to implement the R programming language, a language and environment for statistical computing and graphics. It is a GNU project and is available as Free Software under the terms of the Free Software Foundation's GNU General Public License. The copyright was initially held by Robert Gentleman and Ross Ihaka for the first source code; the R Core Team was added later.
    As for how users can become involved in the development of the project, the starting point is the R Developer Page, which has documentation about project development.
       
    Roles and responsibilities
   
    The current R is the result of a collaborative effort with contributions from all over the world. R was initially written by Robert Gentleman and Ross Ihaka (also known as "R & R") of the Statistics Department of the University of Auckland.
    There is a core group with write access to the R source code, currently consisting of 22 members[3][4].
    Besides, many others have contributed by donating code, bug fixes and documentation.
   
    The R Foundation is a not for profit organization that holds and administers the copyright of R software and documentation. The principal organs of the “R Foundation” are: The general assembly, the board, the auditors, and the court of arbitration.
   
    The business transactions of the general assembly include:
  1. Election and dismissal of the members of the board.
  2. Election and dismissal of the auditors.
  3. Acceptance of activity report, statement and estimates of account.
  4. Release of the board.
  5. Determination of membership fees.
  6. Approval or rejection of proposed changes to these statutes.
  7. The decision to terminate the “R Foundation”.
  8. Discussion of and decisions on other topics of the agenda
The board of the organization consists of at least four persons:
  1. Either a president and a vice-president or two presidents of equal rights
  2. A secretary general.
  3. A treasurer
The main board responsibilities:
  1. Preparation of activity report, statement and estimates of account.
  2. Preparation of and call for general assemblies.
  3. Management of all assets.
There are two auditors who routinely check business and accounting of the organization and report to the general assembly.
   
    There is a Court of Arbitration to resolve the disputes.
   
    Support
   
        There are four general mailing lists devoted to R: R-announce, R-packages, R-help, and R-devel. Additionally, there are several Special Interest Group mailing lists. And to satisfy geographic, regional or subject needs, some R users have formed "R User Groups".
   
    Decision making process
   
    The Court of Arbitration decides by majority vote; the chairman decides in case of a draw due to abstention.
        In particular, membership can be terminated by the death of the person, by voluntary withdrawal, or by an affirmative vote of a two-thirds majority of the ordinary members.
   
    Contribution process
   
The core team is the only group with write access to the source code.

The R-devel list is intended for questions and discussion about code development in R, and it is the starting point for contributing.

Ordinary members have a vote in the general assembly. New ordinary members shall be admitted only by a majority vote of the existing ordinary members. Another role is supporting member, which can be any person or legal entity.

There is a posting guide with general instructions on how to use the mailing lists.

The project uses Bugzilla to manage bugs and Subversion for version control.

In conclusion, the governance model is a cathedral composed of ordinary members, and it is based on meritocracy, since new members are elected by the existing ordinary members.
   
References:
[1] R project
[2] R Developer Page
[3] R Project Contributors
[4] R Foundation Members & Supporters
[5] Statutes of “The R Foundation for Statistical Computing”
[6] R bug

Governance model in a free software project

The variety of free software projects is immense, no doubt greater than that of proprietary software projects. What is clear is that each of them has a governance model, but how homogeneous or heterogeneous these models are is open to question. What is certain is that, for free software projects, studying these models is feasible because the environment is open.

In free software projects with a significant base of collaborators, which does not necessarily mean many contributors (possibly from about 5 upwards), the governance model always includes, implicitly or explicitly, a procedure for participating, protecting one's work and allowing it to be shared.

Governance models in free software projects range from those centralized in a single person or organization, for example Linux, centralized in Linus Torvalds, or projects like GNU Emacs, where development is completely controlled by the GNU project,
to others constituted as a meritocracy of their members. One example is the R project, developed entirely by "distinguished" members of the community and quite closed to outside development. At the other extreme of openness to collaboration among meritocratic communities is the Apache foundation, with projects such as Apache httpd.

Free software projects usually start with a centralized governance model that is closed to participation; compared with proprietary software, that is precisely its governance model. Over the course of their evolution, projects tend to widen their degrees of freedom for collaboration and to decentralize towards models based on the meritocracy of the community members. But it is important to point out that defining a clear governance model at the start of the project is essential for the community around the project to become active and grow.

Why some projects evolve towards a community governed by meritocracy, or towards being more open to participation, varies: from the personality of the leader or the managing organization, to matters related to the size and profile of the collaborators, to the evolution of the community, such as the existence of forks, or even reasons tied to the market itself, for example positioning, as would be the case of Android.

Once a governance model is established in a free software project, at first sight it may seem complicated to reach decisions. That is why governance models must explicitly document the decision-making processes and who approves each phase. In centralized communities decision making seems more intuitive, but what happens in communities governed by meritocracy? Some projects with this system keep only weak control over the decision-making process, and in this way allow the community to make decisions consciously and through voting mechanisms. We can identify two common decision models:
  • Consensus
      A proposal from a community member is analyzed and debated by the other members; if no member of the community rejects it within an agreed period of time, it is considered adopted by consensus. In some projects this consensus is categorized as lazy, lazy majority, consensus and unanimous.

  • Voting
      All or some of the community members have the right to take part in the discussion and in the votes. Voting is usually used only for blocking issues or, in some cases, legal ones.

Although communities are in principle flat, with members who have the same rights, meritocracy evolves towards having members with greater "moral" weight in the community's decision making, earned through the number and quality of their contributions.

And what elements should a governance model have?
  • Roles and responsibilities: It is important to define which roles can take part in the project, whether any role is responsible for the project or part of it, and whether there are committees. The responsibilities of each role must be detailed.
  • Support: Describes the support processes and channels for users and collaborators. This is very important, since today's users are tomorrow's contributors.
  • Decision-making process: It is critical to know how decisions are made; this must be communicated clearly and unambiguously.
  • Participation process: The documentation, repositories and everything else that a user needs in order to participate must be detailed.

To conclude, it seems reasonable to think that a clear model of participation and communication, and therefore the governance model, is fundamental to the growth and sustainability of a free software project.

References:
[1] How the ASF works
[2] Mozilla Governance
[3] Governance Models
[4] Meritocratic Governance Model

Pareto’s Law in free software projects

I wonder whether Pareto's Law holds for the effort dedicated to free software projects. Pareto's Law is a principle stated by the economist Vilfredo Pareto which, in generalized form, holds that 80% of output originates from 20% of input. The 80/20 ratio can be applied to revenues, advertising effectiveness, or management headaches. For example, 20% of advertising produces 80% of results. Likewise, 80% of management headaches are caused by 20% of the employees. It is also called the 80/20 rule.

Therefore, in free software projects this law would mean that 20% of committers produce 80% of commits.
Let's analyse some projects as a first approximation to check the law. The list of chosen projects:
    - LibreOffice
    - CloudStack
    - Evince
    - CVSAnalY
   
The analysis takes the number of commits and committers in 2013, aggregates committers up to 20% of commits, and calculates the corresponding percentage of committers. The results are:
    - CloudStack
 

    - Linux


       
Therefore, Pareto's Law does not seem to hold in these big projects. (Maybe we should define precisely when a committer is really a contributor: is a contributor with 1 commit a contributor?)

But is this law valid for small projects? Let's check the others, for instance Evince (a PDF viewer) and even CVSAnalY.
    - Evince
   
    - CVSAnalY



In these projects, the law does not seem to hold either.


References:
[1] Metrics Grimoire - http://metricsgrimoire.github.io/
[2] CVSAnalY - https://github.com/MetricsGrimoire/CVSAnalY
[3] LibreOffice repository - git://anongit.freedesktop.org/libreoffice/core
[4] CloudStack repository - https://git-wip-us.apache.org/repos/asf/cloudstack.git
[5] Linux repository - https://github.com/torvalds/linux

Note:
The data was obtained with the following query on the CVSAnalY database:
SELECT COUNT(distinct(scmlog.id)) as total, people.email
FROM scmlog, actions, people
WHERE scmlog.id = actions.commit_id
AND scmlog.author_id = people.id
AND YEAR(scmlog.date) = 2013
GROUP BY people.email
ORDER BY total DESC
INTO OUTFILE '/tmp/data.csv'
FIELDS TERMINATED BY ','
ENCLOSED BY '"'
LINES TERMINATED BY '\n';
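
A file exported this way can be checked against the 80/20 rule with a few lines of Python. The sketch below is one possible way to do the check, not necessarily the procedure used for the figures in this entry. It assumes the two-column layout produced by the query (total, email), sorts committers by number of commits, and reports the share of commits made by the top 20% of committers. The function names and the sample data are illustrative:

```python
import csv
import io


def read_counts(handle):
    """Read commit totals from CSV rows of the form "total","email"."""
    return [int(row[0]) for row in csv.reader(handle) if row]


def top_share(counts, fraction=0.2):
    """Share of all commits made by the top `fraction` of committers."""
    ordered = sorted(counts, reverse=True)
    top_n = max(1, int(round(len(ordered) * fraction)))
    return sum(ordered[:top_n]) / float(sum(ordered))


# Hypothetical data in the same format as /tmp/data.csv.
sample = io.StringIO(
    '"80","a@example.org"\n'
    '"10","b@example.org"\n'
    '"5","c@example.org"\n'
    '"3","d@example.org"\n'
    '"2","e@example.org"\n'
)
counts = read_counts(sample)
share = top_share(counts)
print("Top 20%% of committers made %.0f%% of commits" % (100 * share))
```

To run it on real data, replace the sample with `open('/tmp/data.csv')`. A share close to 0.8 would support the law for that project.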

Thursday, 2 January 2014

MetricsGrimoire setting - KVM - Python - MariaDB - CVSAnalY

The goal of this entry is to create a controlled virtual machine environment for the Metrics Grimoire tools.

My setup

Host OS: Ubuntu 12.04 LTS
KVM: QEMU emulator version 1.5.0 (Debian 1.5.0+dfsg-3ubuntu5.1)
VM OS: Ubuntu Server 12.04.3 LTS (Precise Pangolin) with libvirtd

Create a VM in KVM
With the following characteristics:
    CPU: 1 vCPU
    RAM: 1 GB
    HDD: 14 GB
   
We are going to use Virtual Machine Manager, in particular "Create a new virtual machine", to create a virtual machine (VM) named metricsGrimoireVM, choosing "Local install media (ISO image or CDROM)" in step 1.


In step 2, choose the ISO image, in this case ubuntu-12.04.3-server-amd64.iso, and the OS type and version. We chose this version since it is an LTS version.



In step 3, we choose 1 GB of memory and 1 vCPU.


In step 4, it is necessary to create a disk image for storage, in this case 14 GB, enough to hold several repositories. For instance, Linux uses around 2 GB. We use the fallocate command to create a fully allocated (non-sparse) raw file:

root@gon:/media/janague/vms# fallocate -l 14G metricsGrimoireDisk.img



Step 5 shows a configuration summary.

  
Install Ubuntu Server 12.04 LTS

The installation process is detailed in chapter 2 of the Ubuntu Server Guide:
https://help.ubuntu.com/12.04/serverguide/serverguide.pdf

hostname: metricsGrimoireHost

Besides, it is important to install the OpenSSH server during the installation process.

To make access easier, we set a static IP on metricsGrimoireHost by editing the file /etc/network/interfaces:

auto lo eth0
iface lo inet loopback

iface eth0 inet static
    address 192.168.122.101
    netmask 255.255.255.0
    gateway 192.168.122.1

and on the host server, in the /etc/hosts file:

 # MSWL - Project Evaluation
 192.168.122.101 metricsGrimoireHost
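
As a quick sanity check of these values, the address and the gateway must belong to the same network for the static configuration to work. A minimal Python sketch with the standard `ipaddress` module, using the values from this entry:

```python
import ipaddress

# Values taken from the /etc/network/interfaces example above.
interface = ipaddress.ip_interface("192.168.122.101/255.255.255.0")
gateway = ipaddress.ip_address("192.168.122.1")

print("Network:", interface.network)
print("Gateway in network:", gateway in interface.network)
```

If the second line prints False, either the netmask or the gateway is wrong for that address.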

To create your public and private SSH keys for secure access, follow this link:
https://help.ubuntu.com/community/SSH/OpenSSH/Keys

Install git

janague@metricsGrimoireHost:~$ sudo apt-get install git
Reading package lists... Done
Building dependency tree      
Reading state information... Done
The following extra packages will be installed:
  git-man liberror-perl
Suggested packages:
  git-daemon-run git-daemon-sysvinit git-doc git-el git-arch git-cvs git-svn git-email git-gui gitk
  gitweb
The following NEW packages will be installed
  git git-man liberror-perl
0 upgraded, 3 newly installed, 0 to remove and 66 not upgraded.
Need to get 6,741 kB of archives.
After this operation, 15.2 MB of additional disk space will be used.

...

Unpacking git (from .../git_1%3a1.7.9.5-1_amd64.deb) ...
Processing triggers for man-db ...
Setting up liberror-perl (0.17-1) ...
Setting up git-man (1:1.7.9.5-1) ...
Setting up git (1:1.7.9.5-1) ...

Install Python

janague@metricsGrimoireHost:~$ sudo apt-get install python3
Reading package lists... Done
Building dependency tree      
Reading state information... Done
The following extra packages will be installed:
  python3-minimal python3.2 python3.2-minimal
Suggested packages:
  python3-doc python3-tk python3.2-doc binutils binfmt-support
The following NEW packages will be installed
  python3 python3-minimal python3.2 python3.2-minimal
0 upgraded, 4 newly installed, 0 to remove and 66 not upgraded.
Need to get 4,355 kB of archives.
After this operation, 14.7 MB of additional disk space will be used.

Install MariaDB

Here are the commands to run to add MariaDB to your system:

1.- Install the repo manager

janague@metricsGrimoireHost:~$ sudo apt-get install python-software-properties
Reading package lists... Done
Building dependency tree      
Reading state information... Done
The following extra packages will be installed:
  python-pycurl unattended-upgrades
Suggested packages:
  libcurl4-gnutls-dev python-pycurl-dbg bsd-mailx
The following NEW packages will be installed
  python-pycurl python-software-properties unattended-upgrades
0 upgraded, 3 newly installed, 0 to remove and 66 not upgraded.
Need to get 97.6 kB of archives.
After this operation, 657 kB of additional disk space will be used.

...

Setting up python-pycurl (7.19.0-4ubuntu3) ...
Setting up python-software-properties (0.82.7.6) ...

2.- Import the GnuPG signing key

janague@metricsGrimoireHost:~$ sudo apt-key adv --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xcbcb082a1bb943db
Executing: gpg --ignore-time-conflict --no-options --no-default-keyring --secret-keyring /tmp/tmp.EjYqLFa95L --trustdb-name /etc/apt/trustdb.gpg --keyring /etc/apt/trusted.gpg --primary-keyring /etc/apt/trusted.gpg --recv-keys --keyserver hkp://keyserver.ubuntu.com:80 0xcbcb082a1bb943db
gpg: requesting key 1BB943DB from hkp server keyserver.ubuntu.com
gpg: key 1BB943DB: public key "MariaDB Package Signing Key <package-signing-key@mariadb.org>" imported
gpg: no ultimately trusted keys found
gpg: Total number processed: 1
gpg:               imported: 1

3.- Add the MariaDB repository

janague@metricsGrimoireHost:~$ sudo add-apt-repository 'deb http://ftp.igh.cnrs.fr/pub/mariadb/repo/5.5/ubuntu precise main'

Once the key is imported and the repository added, you can install MariaDB.

4.- Refresh your system

janague@metricsGrimoireHost:~$ sudo apt-get update
Hit http://es.archive.ubuntu.com precise Release.gpg
Hit http://es.archive.ubuntu.com precise-updates Release.gpg                  
Hit http://es.archive.ubuntu.com precise-backports Release.gpg                
Hit http://es.archive.ubuntu.com precise Release 

...

Hit http://es.archive.ubuntu.com precise-backports/multiverse Translation-en
Hit http://es.archive.ubuntu.com precise-backports/restricted Translation-en
Hit http://es.archive.ubuntu.com precise-backports/universe Translation-en
Fetched 11.5 kB in 1s (6,567 B/s)
Reading package lists... Done

5.- And finally install MariaDB

janague@metricsGrimoireHost:~$ sudo apt-get install mariadb-server
Reading package lists... Done
Building dependency tree      
Reading state information... Done
The following extra packages will be installed:
  libaio1 libdbd-mysql-perl libdbi-perl libhtml-template-perl
  libmariadbclient18 libmysqlclient18 libnet-daemon-perl libplrpc-perl
  mariadb-client-5.5 mariadb-client-core-5.5 mariadb-common mariadb-server-5.5
  mariadb-server-core-5.5 mysql-common
Suggested packages:
  libipc-sharedcache-perl tinyca mailx mariadb-test
The following NEW packages will be installed
  libaio1 libdbd-mysql-perl libdbi-perl libhtml-template-perl
  libmariadbclient18 libmysqlclient18 libnet-daemon-perl libplrpc-perl
  mariadb-client-5.5 mariadb-client-core-5.5 mariadb-common mariadb-server
  mariadb-server-5.5 mariadb-server-core-5.5 mysql-common
0 upgraded, 15 newly installed, 0 to remove and 66 not upgraded.
Need to get 32.7 MB of archives.
After this operation, 114 MB of additional disk space will be used.

...

Setting up mariadb-server-core-5.5 (5.5.34+maria-1~precise) ...
Setting up mariadb-server-5.5 (5.5.34+maria-1~precise) ...
 * Stopping MariaDB database server mysqld                               [ OK ]
131224 20:00:56 [Note] Plugin 'InnoDB' is disabled.
131224 20:00:56 [Note] Plugin 'FEEDBACK' is disabled.
 * Starting MariaDB database server mysqld                               [ OK ]
 * Checking for corrupt, not cleanly closed and upgrade needing tables.
Setting up mariadb-server (5.5.34+maria-1~precise) ...
Processing triggers for libc-bin ...
ldconfig deferred processing now taking place

See Installing MariaDB .deb Files for more information.

You can also create a custom MariaDB sources.list file. To do so, copy and paste the following into a file under /etc/apt/sources.list.d/ (we suggest naming the file MariaDB.list or something similar), or add it to the bottom of your /etc/apt/sources.list file.

# MariaDB 5.5 repository list - created 2013-12-24 18:52 UTC
# http://mariadb.org/mariadb/repositories/
deb http://ftp.igh.cnrs.fr/pub/mariadb/repo/5.5/ubuntu precise main
deb-src http://ftp.igh.cnrs.fr/pub/mariadb/repo/5.5/ubuntu precise main
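
The two repository lines above follow a fixed pattern, so they can be generated instead of typed by hand. Here is a minimal Python sketch, assuming the mirror, version and release used in this post. The function name mariadb_sources_list is hypothetical, and the script only builds the text; it does not write anything under /etc/apt.

```python
def mariadb_sources_list(mirror, version, distro, release):
    """Return the deb and deb-src lines for a MariaDB APT repository.

    The URL layout (<mirror>/repo/<version>/<distro>) is an assumption
    taken from the repository line used in this post.
    """
    base = "{0}/repo/{1}/{2}".format(mirror.rstrip("/"), version, distro)
    lines = [
        "deb {0} {1} main".format(base, release),
        "deb-src {0} {1} main".format(base, release),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Values copied from the add-apt-repository command shown earlier.
    print(mariadb_sources_list(
        "http://ftp.igh.cnrs.fr/pub/mariadb", "5.5", "ubuntu", "precise"))
```

You could redirect the output to a file such as MariaDB.list and copy it into /etc/apt/sources.list.d/ with sudo.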

References:
https://downloads.mariadb.org/mariadb/repositories/#mirror=cnrs&distro=Ubuntu&distro_release=precise&version=5.5

Create database and user for analysis

Letting applications connect with the database root account is risky. It is more secure to create a normal user with privileges only on the database the applications need.

janague@metricsGrimoireHost:~$ mysql -u root -p
Enter password:
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 30
Server version: 5.5.34-MariaDB-1~precise-log mariadb.org binary distribution
Copyright (c) 2000, 2013, Oracle, Monty Program Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [(none)]> create user 'janague'@'localhost' identified by 'janague123';
Query OK, 0 rows affected (0.00 sec)

MariaDB [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
+--------------------+
3 rows in set (0.00 sec)

MariaDB [(none)]> create database CVSAnalytDB;
Query OK, 1 row affected (0.00 sec)

MariaDB [(none)]> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| CVSAnalytDB        |
| mysql              |
| performance_schema |
+--------------------+
4 rows in set (0.00 sec)

MariaDB [(none)]> grant all privileges on CVSAnalytDB.* to 'janague'@'localhost' identified by 'janague123';
Query OK, 0 rows affected (0.00 sec)

Install CVSAnalY

Create a directory for the repositories as a non-root user.

janague@metricsGrimoireHost:~$ mkdir repositories
janague@metricsGrimoireHost:~$ cd repositories/

Clone the CVSAnalY repository

janague@metricsGrimoireHost:~/repositories$ git clone git://github.com/MetricsGrimoire/CVSAnalY.git
Cloning into 'CVSAnalY'...
remote: Counting objects: 3029, done.
remote: Compressing objects: 100% (1225/1225), done.
remote: Total 3029 (delta 1828), reused 2964 (delta 1776)
Receiving objects: 100% (3029/3029), 1.95 MiB | 704 KiB/s, done.
Resolving deltas: 100% (1828/1828), done.

You can install cvsanaly2 just by running the setup.py script in the CVSAnalY directory:

janague@metricsGrimoireHost:~/repositories/CVSAnalY$ sudo python setup.py install
Traceback (most recent call last):
  File "setup.py", line 35, in <module>
    from setuptools import setup
ImportError: No module named setuptools

In this case the installation fails with the error "No module named setuptools". To resolve the problem, install python-pip (which pulls in python-setuptools) and then upgrade setuptools:

janague@metricsGrimoireHost:~/repositories/CVSAnalY$ sudo apt-get install python-pip
Reading package lists... Done
Building dependency tree      
Reading state information... Done
The following extra packages will be installed:
  python-setuptools
The following NEW packages will be installed
  python-pip python-setuptools
0 upgraded, 2 newly installed, 0 to remove and 66 not upgraded.
Need to get 536 kB of archives.
After this operation, 1,467 kB of additional disk space will be used.
Do you want to continue [Y/n]? Y
Get:1 http://es.archive.ubuntu.com/ubuntu/ precise/main python-setuptools all 0.6.24-1ubuntu1 [441 kB]
Get:2 http://es.archive.ubuntu.com/ubuntu/ precise/universe python-pip all 1.0-1build1 [95.1 kB]
Fetched 536 kB in 0s (619 kB/s) 
Selecting previously unselected package python-setuptools.
(Reading database ... 52983 files and directories currently installed.)
Unpacking python-setuptools (from .../python-setuptools_0.6.24-1ubuntu1_all.deb) ...
Selecting previously unselected package python-pip.
Unpacking python-pip (from .../python-pip_1.0-1build1_all.deb) ...

...

janague@metricsGrimoireHost:~/repositories/CVSAnalY$ sudo  pip install --upgrade setuptools
Downloading/unpacking distribute
  Downloading distribute-0.7.3.zip (145Kb): 145Kb downloaded
  Running setup.py egg_info for package distribute

...

    Installing easy_install script to /usr/local/bin
    Installing easy_install-2.7 script to /usr/local/bin
Successfully installed distribute setuptools
Cleaning up...

Try again to install CVSAnalY


janague@metricsGrimoireHost:~/repositories/CVSAnalY$ sudo python setup.py install
running install
running bdist_egg
running egg_info
creating cvsanaly2.egg-info
writing requirements to cvsanaly2.egg-info/requires.txt
writing cvsanaly2.egg-info/PKG-INFO

...

Installed /usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg
Processing dependencies for cvsanaly2==2.1.0
Searching for RepositoryHandler==0.5.1
Best match: RepositoryHandler 0.5.1
Adding RepositoryHandler 0.5.1 to easy-install.pth file
Using /usr/local/lib/python2.7/dist-packages
Finished processing dependencies for cvsanaly2==2.1.0

Install the RepositoryHandler dependency. First clone its repository:


janague@metricsGrimoireHost:~/repositories$ git clone git://github.com/MetricsGrimoire/RepositoryHandler.git
Cloning into 'RepositoryHandler'...
remote: Counting objects: 649, done.
remote: Compressing objects: 100% (223/223), done.
remote: Total 649 (delta 423), reused 635 (delta 411)
Receiving objects: 100% (649/649), 173.71 KiB | 230 KiB/s, done.
Resolving deltas: 100% (423/423), done.

Install RepositoryHandler

janague@metricsGrimoireHost:~/repositories/RepositoryHandler$ sudo python setup.py install
[sudo] password for janague:
running install
running build
running build_py
creating build
creating build/lib.linux-x86_64-2.7
creating build/lib.linux-x86_64-2.7/repositoryhandler

...

byte-compiling /usr/local/lib/python2.7/dist-packages/repositoryhandler/Downloader.py to Downloader.pyc
running install_egg_info
Removing /usr/local/lib/python2.7/dist-packages/RepositoryHandler-0.5.1.egg-info
Writing /usr/local/lib/python2.7/dist-packages/RepositoryHandler-0.5.1.egg-info

To test the CVSAnalY installation, let's analyse the CVSAnalY repository itself. We are going to use the database CVSAnalytDB.

Before executing CVSAnalY, the database has no tables:


janague@metricsGrimoireHost:~$ mysql -u janague -p -h localhost CVSAnalytDB
Enter password:
Welcome to the MariaDB monitor.  Commands end with ; or \g.
Your MariaDB connection id is 32
Server version: 5.5.34-MariaDB-1~precise-log mariadb.org binary distribution
Copyright (c) 2000, 2013, Oracle, Monty Program Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MariaDB [CVSAnalytDB]> show tables;
Empty set (0.00 sec)

Execute cvsanaly2 in the CVSAnalY repository


janague@metricsGrimoireHost:~/repositories/CVSAnalY$ cvsanaly2 -u janague -d CVSAnalytDB
Traceback (most recent call last):
  File "/usr/local/bin/cvsanaly2", line 5, in <module>
    pkg_resources.run_script('cvsanaly2==2.1.0', 'cvsanaly2')
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources.py", line 487, in run_script
    self.require(requires)[0].run_script(script_name, ns)
  File "/usr/local/lib/python2.7/dist-packages/pkg_resources.py", line 1337, in run_script
    execfile(script_filename, namespace, namespace)
  File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/EGG-INFO/scripts/cvsanaly2", line 37, in <module>
    retval = pycvsanaly2.main.main (sys.argv[1:])
  File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/main.py", line 275, in main
    config.db_hostname)
  File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/Database.py", line 609, in create_database
    db.connect ().close ()
  File "/usr/local/lib/python2.7/dist-packages/cvsanaly2-2.1.0-py2.7.egg/pycvsanaly2/Database.py", line 462, in connect
    import MySQLdb
ImportError: No module named MySQLdb

cvsanaly2 fails because it depends on the MySQLdb module, which is not installed yet. Install it with:


janague@metricsGrimoireHost:~/repositories/CVSAnalY$ sudo apt-get install python-mysqldb
Reading package lists... Done
Building dependency tree      
Reading state information... Done
Suggested packages:
  python-egenix-mxdatetime python-mysqldb-dbg
The following NEW packages will be installed
  python-mysqldb
0 upgraded, 1 newly installed, 0 to remove and 66 not upgraded.
Need to get 64.0 kB of archives.
After this operation, 221 kB of additional disk space will be used.
Get:1 http://es.archive.ubuntu.com/ubuntu/ precise-updates/main python-mysqldb amd64 1.2.3-1ubuntu0.1 [64.0 kB]
Fetched 64.0 kB in 0s (137 kB/s)   
Selecting previously unselected package python-mysqldb.
(Reading database ... 53198 files and directories currently installed.)
Unpacking python-mysqldb (from .../python-mysqldb_1.2.3-1ubuntu0.1_amd64.deb) ...
Setting up python-mysqldb (1.2.3-1ubuntu0.1)

...
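
Both failures so far (setuptools first, then MySQLdb) were missing Python modules that only showed up when a command was run. A small pre-flight check can report them all at once. This is only a sketch and not part of CVSAnalY: the function name missing_modules is hypothetical, and the module list is an assumption based on the import errors and package names seen in this post. Run it with the same interpreter that runs cvsanaly2.

```python
def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed list, taken from the errors and install logs in this post.
    required = ["setuptools", "MySQLdb", "repositoryhandler"]
    absent = missing_modules(required)
    if absent:
        print("Missing modules: " + ", ".join(absent))
    else:
        print("All required modules can be imported")
```

The check uses a plain import attempt so that it behaves the same way on Python 2.7, which this post uses, and on newer interpreters.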

Execute cvsanaly2 again in the CVSAnalY repository


janague@metricsGrimoireHost:~/repositories/CVSAnalY$ cvsanaly2 -u janague -d CVSAnalytDB
Password:
Parsing log for /home/janague/repositories/CVSAnalY (git)
Warning: Detected empty branch 'libresoft-utils', it'll be ignored
Executing extensions

The database now contains the following tables:


MariaDB [CVSAnalytDB]> show tables;
+-----------------------+
| Tables_in_CVSAnalytDB |
+-----------------------+
| action_files          |
| actions               |
| actions_file_names    |
| branches              |
| file_copies           |
| file_links            |
| files                 |
| people                |
| repositories          |
| scmlog                |
| tag_revisions         |
| tags                  |
+-----------------------+
12 rows in set (0.00 sec)
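
If you repeat this installation on several hosts, the same check can be scripted. Here is a minimal Python sketch, assuming the twelve tables listed above are the ones a successful run should create. The names EXPECTED_TABLES, fetch_tables and missing_tables are hypothetical, and fetch_tables expects an already opened DB-API connection (for example one created with the MySQLdb module installed earlier).

```python
# Table names copied from the "show tables" output above.
EXPECTED_TABLES = frozenset([
    "action_files", "actions", "actions_file_names", "branches",
    "file_copies", "file_links", "files", "people", "repositories",
    "scmlog", "tag_revisions", "tags",
])


def fetch_tables(conn):
    """Return the table names visible through a DB-API connection."""
    cursor = conn.cursor()
    cursor.execute("SHOW TABLES")
    return [row[0] for row in cursor.fetchall()]


def missing_tables(found):
    """Return the expected tables that are absent from `found`, sorted."""
    return sorted(EXPECTED_TABLES - set(found))
```

An empty list from missing_tables(fetch_tables(conn)) would mean that all the tables shown above are present.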

Some example queries

1.- Top committers


MariaDB [CVSAnalytDB]> select p.name, count(distinct(s.id)) num_commits from people p, scmlog s where p.id=s.author_id group by p.name order by count(distinct(s.id)) desc;
+--------------------------------+-------------+
| name                           | num_commits |
+--------------------------------+-------------+
| Carlos Garcia Campos           |         277 |
| Jesus M. Gonzalez-Barahona     |          60 |
| Israel Herraiz                 |          39 |
| Andy Grunwald                  |          31 |
| Santiago Dueñas                |          23 |
| Alvaro Navarro                 |          17 |
| Germán Póo-Caamaño             |          17 |
| Santiago Dueñas Domínguez      |          17 |
| Luis Cañas Díaz                |          10 |
| Daniel Izquierdo Cortazar      |           9 |
| Alvaro del Castillo            |           7 |
| Gregorio Robles                |           6 |
| Daniel Izquierdo               |           4 |
| Luis Cañas                     |           3 |
| Luis Cañas-Díaz                |           3 |
| Felipe Ortega                  |           3 |
| companheiro.vermelho@gmail.com |           2 |
| Ilya Shakhat                   |           1 |
| Juan Francisco Gato Luis       |           1 |
| Carlos Gonzalez                |           1 |
| Liliana Tovar                  |           1 |
| Maëlick Claes                  |           1 |
+--------------------------------+-------------+
22 rows in set (0.00 sec)
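
Several rows in this result look like spelling variants of the same name (for example "Luis Cañas", "Luis Cañas Díaz" and "Luis Cañas-Díaz"), so the ranking may understate some committers. Below is a minimal Python sketch of merging such rows, assuming the variants really are the same person, which the query output alone does not prove. The alias map and the function name merge_commit_counts are hypothetical; the counts are copied from the result above.

```python
from collections import defaultdict

# Hypothetical alias map: variant spelling -> name to report under.
ALIASES = {
    "Santiago Dueñas Domínguez": "Santiago Dueñas",
    "Luis Cañas Díaz": "Luis Cañas",
    "Luis Cañas-Díaz": "Luis Cañas",
    "Daniel Izquierdo Cortazar": "Daniel Izquierdo",
}


def merge_commit_counts(rows, aliases):
    """Sum commit counts per person after mapping aliases to one name."""
    totals = defaultdict(int)
    for name, num_commits in rows:
        totals[aliases.get(name, name)] += num_commits
    # Highest count first; ties are broken alphabetically.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Subset of the query result above: only the rows affected by aliases.
    rows = [
        ("Santiago Dueñas", 23),
        ("Santiago Dueñas Domínguez", 17),
        ("Luis Cañas Díaz", 10),
        ("Daniel Izquierdo Cortazar", 9),
        ("Daniel Izquierdo", 4),
        ("Luis Cañas", 3),
        ("Luis Cañas-Díaz", 3),
    ]
    for name, total in merge_commit_counts(rows, ALIASES):
        print(name, total)
```

Adding the counts is safe here because the query counts distinct commits and each commit has a single author. The same trick would not be correct for the number of files, since two aliases may have touched the same file.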

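2.- Developers with the number of files they touched. Note that the query orders the rows by number of actions, not by number of files, which is why the num_files column is not strictly descending.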


MariaDB [CVSAnalytDB]> select p.name, count(distinct(a.file_id)) num_files from people p, actions a, scmlog s where s.id = a.commit_id and s.author_id = p.id group by p.name order by count(distinct(a.id)) desc;
+--------------------------------+-----------+
| name                           | num_files |
+--------------------------------+-----------+
| Carlos Garcia Campos           |       302 |
| Alvaro Navarro                 |       138 |
| Jesus M. Gonzalez-Barahona     |        56 |
| Israel Herraiz                 |        43 |
| Andy Grunwald                  |        27 |
| Luis Cañas Díaz                |        43 |
| Santiago Dueñas Domínguez      |         5 |
| Luis Cañas-Díaz                |        32 |
| Santiago Dueñas                |        19 |
| Germán Póo-Caamaño             |         5 |
| Liliana Tovar                  |        12 |
| companheiro.vermelho@gmail.com |        12 |
| Alvaro del Castillo            |         8 |
| Carlos Gonzalez                |         9 |
| Daniel Izquierdo Cortazar      |         7 |
| Gregorio Robles                |         6 |
| Juan Francisco Gato Luis       |         7 |
| Daniel Izquierdo               |         4 |
| Luis Cañas                     |         3 |
| Felipe Ortega                  |         2 |
| Ilya Shakhat                   |         1 |
| Maëlick Claes                  |         1 |
+--------------------------------+-----------+
22 rows in set (0.01 sec)
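
To see how the ORDER BY expression affects this result without needing a MariaDB server, the following Python sketch runs an equivalent query against an in-memory SQLite database. The three tables are reduced to the columns the query uses, and the rows are made-up placeholders, not data from the CVSAnalY repository. The function names are hypothetical.

```python
import sqlite3

SCHEMA = """
CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE scmlog (id INTEGER PRIMARY KEY, author_id INTEGER);
CREATE TABLE actions (id INTEGER PRIMARY KEY, file_id INTEGER, commit_id INTEGER);
"""


def files_per_author(conn, order_by):
    """Return (name, num_files) rows ordered by 'actions' or by 'files'."""
    # The ORDER BY expression comes from this fixed table, never from user input.
    orderings = {
        "actions": "COUNT(DISTINCT a.id)",
        "files": "COUNT(DISTINCT a.file_id)",
    }
    query = """
        SELECT p.name, COUNT(DISTINCT a.file_id) AS num_files
        FROM people p
        JOIN scmlog s ON s.author_id = p.id
        JOIN actions a ON a.commit_id = s.id
        GROUP BY p.name
        ORDER BY {0} DESC, p.name
    """.format(orderings[order_by])
    return conn.execute(query).fetchall()


def toy_database():
    """Build a tiny placeholder database with two authors."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO people VALUES (?, ?)",
                     [(1, "alice"), (2, "bob")])
    # alice: three commits to the same file; bob: one commit touching two files.
    conn.executemany("INSERT INTO scmlog VALUES (?, ?)",
                     [(1, 1), (2, 1), (3, 1), (4, 2)])
    conn.executemany("INSERT INTO actions VALUES (?, ?, ?)",
                     [(1, 10, 1), (2, 10, 2), (3, 10, 3), (4, 20, 4), (5, 21, 4)])
    return conn


if __name__ == "__main__":
    conn = toy_database()
    print(files_per_author(conn, "actions"))
    print(files_per_author(conn, "files"))
```

With this toy data, ordering by actions puts alice first even though she touched fewer files than bob, which is the same effect visible in the result above. To rank strictly by files touched, order by the num_files expression instead.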

3.- Contributors in 2013, with the number of files they touched.


MariaDB [CVSAnalytDB]> select p.name, count(distinct(a.file_id)) num_files from people p, actions a, scmlog s where s.id = a.commit_id and s.author_id = p.id and year(s.date) = 2013 group by p.name order by count(distinct(a.id)) desc;
+---------------------------+-----------+
| name                      | num_files |
+---------------------------+-----------+
| Andy Grunwald             |        27 |
| Carlos Gonzalez           |         9 |
| Santiago Dueñas           |         5 |
| Alvaro del Castillo       |         2 |
| Daniel Izquierdo Cortazar |         2 |
| Daniel Izquierdo          |         1 |
| Ilya Shakhat              |         1 |
| Maëlick Claes             |         1 |
+---------------------------+-----------+
8 rows in set (0.00 sec)

References
http://metricsgrimoire.github.io/
https://github.com/MetricsGrimoire
https://github.com/MetricsGrimoire/CVSAnalY
http://www.linux-kvm.org
https://mariadb.org/
https://help.ubuntu.com/12.04/serverguide/serverguide.pdf
https://help.ubuntu.com/community/SSH/OpenSSH/Keys