✏️ writes @inishchith


[week-8] Finalizing the Enrichment phase

  • GSoC
  • 2019
  • week-8
  • coding-period-2

We had our last meeting for the second coding phase (28th June to 26th July) on Friday, 26th July 2019 at 13:30 CEST (17:00 IST), which, as always, was held on the #grimoirelab channel.
In this post, I'll highlight some key points discussed during the meeting, ranging from the work done this week and any blockers or pending work to the work planned for the coming week.

# What work was done this week?

  • In the last meeting, we had discussed optimization approaches to tackle performance issues related to memory usage and execution time. This week, we implemented and evaluated them and refined the dashboards where possible.
  • With the new approach (i.e., query-chunk via study), we could completely get rid of the extra memory held in cache_dict, at the cost of some extra execution time. After evaluating it, we decided to go ahead with this approach.
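To make the trade-off above concrete, here is a minimal Python sketch, not the actual grimoirelab-elk enricher code, that computes a total-LOC evolution series in both styles. The item layout `(interval_index, file_path, loc)`, the helper names, and the assumption that each per-interval aggregation can run on the storage side (as an Elasticsearch aggregation might) are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical file-level item: (interval_index, file_path, loc).
# A plain list stands in for the enriched index the real enricher would query.
Item = Tuple[int, str, int]


def study_cache_dict(items: List[Item], n_intervals: int) -> Tuple[List[int], int]:
    """cache_dict style: the latest LOC of every file stays in memory for the whole run."""
    cache_dict: Dict[str, int] = {}
    series: List[int] = []
    for i in range(n_intervals):
        # Apply this interval's updates; newer values overwrite older ones.
        for idx, path, loc in items:
            if idx == i:
                cache_dict[path] = loc
        series.append(sum(cache_dict.values()))
    # Retained state grows with the number of distinct files.
    return series, len(cache_dict)


def aggregate_up_to(items: List[Item], end: int) -> int:
    """Hypothetical stand-in for a query whose aggregation runs on the storage side:
    the sum of each file's latest LOC among items with interval_index <= end.
    The dict below is assumed to live 'server side' and is discarded after the call."""
    latest: Dict[str, int] = {}
    for idx, path, loc in sorted(items):  # oldest interval first, so the latest value wins
        if idx <= end:
            latest[path] = loc
    return sum(latest.values())


def study_query_chunk(items: List[Item], n_intervals: int) -> Tuple[List[int], int]:
    """query-chunk style: one query per interval, one value per interval kept."""
    series = [aggregate_up_to(items, i) for i in range(n_intervals)]
    # Retained state is bounded by n_intervals, paid for by re-scanning on each query.
    return series, len(series)


if __name__ == "__main__":
    items = [(0, "a.py", 10), (0, "b.py", 5), (0, "c.py", 3), (1, "a.py", 12), (1, "d.py", 7)]
    print(study_cache_dict(items, 2))
    print(study_query_chunk(items, 2))
```

Both variants yield the same series; the difference is what stays resident. The cache grows with the number of distinct files, whereas the chunked variant keeps one value per interval and pays for it by scanning the data again on every query, which mirrors the memory-for-time trade described above.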

# Code Complexity (CoCom) Dashboard

CoCom Dashboard

  • Earlier, the visualizations were bound to repository-level data, and we had concerns about the implementation of evolution metrics and their visualizations.
  • After migrating to the new approach, we decided to keep the visualizations segregated by level (file/commit-level data and data obtained via the study), which should help us improve the quality of the results shown on the dashboard.
  • With the study-level data, we could get the evolution metrics right, along with a line chart for each metric, which can be switched via the in-place selector.

# Code License (CoLic) Dashboard

CoLic Dashboard

  • The CoLic dashboard, which is bound to the commit-level (primitive) data produced by Graal, was already much cleaner and more insightful. We plan to add evolution metrics for licensed and copyrighted files by adapting the new study approach used in the CoCom enricher.
  • The latest iteration of the CoCom & CoLic dashboards can be viewed on the remote instance.

  • In summary, and as per the evaluation results: we started the process a long time back with (1) a repository-level enricher, which held about 36,000 items in memory. We improved on this with (2) file-level + study (cache_dict), which brought the execution time down significantly and the memory usage down to just 200 items. Finally, (3) file-level + study (query_chunk) brought the memory usage down to a worst case of number_of_intervals items (an interval is obtained by chunking the data into months, starting from the first commit), all while trading off just 5% more execution time. 🎉
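The interval chunking mentioned in the summary can be sketched as follows. This is an illustration under assumptions rather than the enricher's code: we assume calendar-month boundaries, with the first interval starting on the date of the first commit and each interval's end being exclusive. Under the query-chunk approach one study item is retained per interval, so the length of this list plays the role of number_of_intervals in the worst-case memory bound.

```python
from datetime import date
from typing import List, Tuple


def month_intervals(first_commit: date, last_commit: date) -> List[Tuple[date, date]]:
    """Split [first_commit, last_commit] into month-long (start, end) intervals.

    Assumption: calendar-month boundaries; the first interval begins at the first
    commit's date rather than on the 1st, and each end date is exclusive.
    """
    intervals: List[Tuple[date, date]] = []
    start = first_commit
    while start <= last_commit:
        # The interval ends at the first day of the following month.
        if start.month == 12:
            nxt = date(start.year + 1, 1, 1)
        else:
            nxt = date(start.year, start.month + 1, 1)
        intervals.append((start, nxt))
        start = nxt
    return intervals


if __name__ == "__main__":
    for start, end in month_intervals(date(2019, 1, 15), date(2019, 3, 2)):
        print(start, "->", end)
```

For a history spanning 15th January to 2nd March, this yields three intervals, so at most three study items would be held at once.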

# Plans for next week:

  • For the coming week, we'll focus on code reviews for the Pull Requests related to the Graal integration @ chaoss/grimoirelab-elk, along with fixing any errors or redundancies that may have crept in and adding any data fields that might be missing.
  • We're currently on the 2nd iteration of the CoCom & CoLic dashboards, which we plan to finalize in the next few days before looking for further improvements ;)

# Footnotes:

  • IRC conversation log of the meeting can be found here
  • Project tracker can be found here

I'll be posting updates on the mailing list as well as on the Twitter thread.

If you have any questions or suggestions, please comment below ;)