✏️ writes @inishchith


[week-4] Structuring data & Evaluating approaches

  • GSoC
  • 2019
  • week-4
  • coding-period-1

We held our 4th meeting of the first coding phase (27th May to 28th June) on Friday, 21st June 2019 at 12:30 CEST (16:00 IST), as always on the #grimoirelab channel.
In this post, I’ll highlight the key points discussed during the meeting: the work done this week, blockers or pending work (if any), and the work planned for the coming week.

# What work was done in this week?

Last week, we addressed an issue with the then-current implementation of repository-level analysis: it didn’t persist the file-level information. We also discussed a new structure for the enriched index.

  • After giving it some thought, I found an alternative that deals with the problem without altering the structure of the enriched index and that can be implemented directly. I implemented a rough version of repository-level analysis, this time with the help of lizard (which is also employed for file-level analysis), as it is optimized for running the analysis recursively over a given software repository. (Evaluation results below.)
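To make the idea concrete, here is a minimal sketch of a repository-level pass that walks a checkout once and still keeps the per-file numbers next to the totals. It is not Graal's or lizard's code: `analyze_source` is a hypothetical stand-in that counts non-blank lines and branching keywords where lizard would report lines of code and cyclomatic complexity, and the layout of the returned dictionary is an assumption.

```python
import os
import re

# Hypothetical stand-in for lizard: a crude complexity proxy that counts
# branching keywords. Real cyclomatic complexity is computed per function.
BRANCH_RE = re.compile(r"\b(if|elif|for|while|except|and|or)\b")


def analyze_source(code):
    """Return rough metrics for the source text of one file."""
    lines = [ln for ln in code.splitlines() if ln.strip()]
    return {
        "loc": len(lines),
        "ccn": 1 + sum(len(BRANCH_RE.findall(ln)) for ln in lines),
    }


def analyze_repository(root, extensions=(".py",)):
    """Walk `root` once and return totals plus the per-file breakdown."""
    files = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories such as .git in place.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in sorted(filenames):
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as fh:
                metrics = analyze_source(fh.read())
            files[os.path.relpath(path, root)] = metrics
    return {
        "loc": sum(m["loc"] for m in files.values()),
        "ccn": sum(m["ccn"] for m in files.values()),
        "num_files": len(files),
        "files": files,  # the file-level information stays available here
    }
```

Calling `analyze_repository(".")` on a checkout returns one result for the whole snapshot, so the per-file entries are not lost even though the analysis runs at repository level.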

| Repository | Number of commits | File-level time | Repository-level time |
|---|---|---|---|
| chaoss/grimoirelab-perceval | 1394 | 26:22 min | 26:56 min |
| chaoss/grimoirelab-sirmordred | 869 | 08:51 min | 03:51 min |
| chaoss/grimoirelab-graal | 171 | 02:24 min | 01:04 min |

The above results consider only the master branch. We can observe that the gap in execution time between the two approaches depends on the number of files in a given repository (for obvious reasons 😅).
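To read the timings more easily, the durations can be converted to seconds and compared as a ratio. The snippet below is only arithmetic on the figures reported above; the helper names `to_seconds` and `speedups` are made up for this illustration.

```python
def to_seconds(mm_ss):
    """Convert a 'MM:SS' duration string to seconds."""
    minutes, seconds = mm_ss.split(":")
    return int(minutes) * 60 + int(seconds)


# (repository, commits, file-level time, repository-level time),
# copied from the evaluation results above.
RESULTS = [
    ("chaoss/grimoirelab-perceval", 1394, "26:22", "26:56"),
    ("chaoss/grimoirelab-sirmordred", 869, "08:51", "03:51"),
    ("chaoss/grimoirelab-graal", 171, "02:24", "01:04"),
]


def speedups(results):
    """Return {repository: file-level time / repository-level time}."""
    return {
        repo: round(to_seconds(file_t) / to_seconds(repo_t), 2)
        for repo, _commits, file_t, repo_t in results
    }


for repo, ratio in speedups(RESULTS).items():
    print(f"{repo}: {ratio}x")
```

A ratio above 1 means the repository-level run was faster. With these numbers the ratios come out at roughly 0.98 for perceval, 2.3 for sirmordred and 2.25 for graal, so the two approaches are about even on the largest repository and repository-level analysis is more than twice as fast on the other two.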

  • With the help of the above implementation, I could produce the following metrics in Kibana with minimal configuration. The approaches for producing the others (including the evolution of the metrics) efficiently are yet to be discussed with the mentors.

Repository: chaoss/grimoirelab-perceval

[ Figure: Total Lines of Code after latest commit ]

[ Figure: Total Code Complexity after latest commit ]

[ Figure: Most Complex files after latest commit ]
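For readers who want to see what these three numbers amount to outside Kibana, here is a small sketch that derives them from a list of per-file items. The field names (`file_path`, `commit_date`, `loc`, `ccn`) and the sample values are assumptions made for the example and are not taken from the actual enriched index.

```python
from operator import itemgetter


def latest_per_file(items):
    """Keep, for every file, the metrics from its most recent commit."""
    latest = {}
    # ISO-formatted dates sort chronologically as plain strings.
    for item in sorted(items, key=itemgetter("commit_date")):
        latest[item["file_path"]] = item  # later commits overwrite earlier ones
    return latest


def summarize(items, top=3):
    """Return total LOC, total CCN and the `top` most complex files."""
    latest = list(latest_per_file(items).values())
    ranked = sorted(latest, key=itemgetter("ccn"), reverse=True)
    return {
        "total_loc": sum(i["loc"] for i in latest),
        "total_ccn": sum(i["ccn"] for i in latest),
        "most_complex": [(i["file_path"], i["ccn"]) for i in ranked[:top]],
    }


# Made-up sample items, one per (commit, file).
ITEMS = [
    {"file_path": "a.py", "commit_date": "2019-06-01", "loc": 100, "ccn": 12},
    {"file_path": "b.py", "commit_date": "2019-06-02", "loc": 40, "ccn": 5},
    {"file_path": "a.py", "commit_date": "2019-06-10", "loc": 120, "ccn": 15},
]

print(summarize(ITEMS, top=1))
```

The sketch ignores deleted files, which a real implementation would have to drop from the latest snapshot.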

This week’s task also included working on Timelion visualizations. Our idea for implementing the metrics involved bucketing (on an attribute), but I couldn’t find a clear reference for implementing it, and after spending some time on it I had to put the work on hold. It will be addressed in today’s meeting, along with some discussion on how to proceed with more visualizations.
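As a rough illustration of what bucketing on an attribute could mean for an evolution chart, the sketch below builds one time series per attribute value, keeping the last known value of a metric in each monthly bucket. This is one possible reading and not the approach agreed with the mentors; the field names and sample data are assumptions.

```python
from collections import defaultdict


def monthly_series(items, attribute, metric):
    """Return {attribute value: {'YYYY-MM': last metric value in that month}}."""
    series = defaultdict(dict)
    for item in sorted(items, key=lambda i: i["commit_date"]):
        month = item["commit_date"][:7]  # 'YYYY-MM' prefix of an ISO date
        # Within a month, later commits overwrite earlier ones.
        series[item[attribute]][month] = item[metric]
    return dict(series)


# Made-up sample items, one per (commit, file).
ITEMS = [
    {"file_path": "a.py", "commit_date": "2019-05-03", "loc": 80},
    {"file_path": "a.py", "commit_date": "2019-05-20", "loc": 90},
    {"file_path": "b.py", "commit_date": "2019-06-02", "loc": 40},
    {"file_path": "a.py", "commit_date": "2019-06-10", "loc": 120},
]

print(monthly_series(ITEMS, attribute="file_path", metric="loc"))
```

Each inner dictionary is then one line on the chart, with the attribute value as its label.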

# Plans for next week:

  • In the coming week, we plan to have the first dashboard ready, with a set of visualizations of the metrics produced with the help of Graal’s CoCom backend.

  • Currently, we have implemented: 1) overall lines of code (LOC), 2) overall code complexity (CCN), and 3) the most complex files in the software repository (as shown above). We’re planning to add more visualizations for metrics such as the relation between LOC and CCN, LOC and functions, and others.

  • Some refinements can be made to the structure of the results produced by CoCom’s repository-level analysis to make it easier to visualize other metrics (evolution of LOC, CCN, and others), along with (if needed) some alteration of the structure of the enriched index.

  • We’ll kick off the discussion about the visualizations that will be part of the dashboard at inishchith/gsoc-graal#11: [visualization] Dashboard for CoCom backend

  • [ Optional ] Analyze the data produced by Graal’s CoLic (Code License) backend, evaluate approaches, and set goals for the 2nd coding period :)

# Footnotes:

  • The next meeting will be held on Friday, 28th June 2019 at 12:30 CEST (16:00 IST), so let’s catch up next week ;)
  • IRC conversation log of the meeting can be found here
  • GSoC project proposal
  • Project tracker can be found here

I’ll be posting updates on the mailing list as well as on a Twitter thread, so in case you’re interested, please do follow the thread.

If you have any questions or suggestions, please make sure to comment down below ;)