✏️ writes @inishchith


[week-7] Scope of improvement

  • GSoC
  • 2019
  • week-7
  • coding-period-2

We had our second meeting of the second coding phase (28th June to 26th July) on Tuesday, 16th July 2019 at 12:30 CEST (16:00 IST). As always, it was conducted on the #grimoirelab channel.
In this post, I’ll try to highlight the key points discussed during the meeting: the work done this week, blockers or pending work (if any), and the work planned for the coming week.
(This week’s meeting had to be rescheduled due to unavailability of mentors.)

# What work was done this week?

  • Our goal for this week was to refine the Code Complexity & Code License Dashboards by iterating over them once more.
  • Along with the improvements listed in the last blog post, we also had to explore and evaluate ways to improve the performance of Graal’s integration in terms of memory usage and execution time.

# Code Complexity (CoCom) Dashboard

CoCom Dashboard

  • We started the week by iterating over the dashboard again and making some necessary changes, such as rearranging widgets, adding appropriate labels and revamping some of the visualizations (the changes can be seen above).

During this iteration, @valcos pointed out a problem with the current line-chart: because the repositories have no data at common datetimes, the chart cannot produce the sum total of the metrics at every point, as it should. Instead, each point showed the metric corresponding to the latest commit, which was incorrect in our case.
(Earlier, we had a line-chart for each metric, in which each line showed the evolution of one repository (origin) for that metric over time. This would cause problems with a large number of repositories, so we had to think of other possible solutions.)
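
To illustrate the problem: if every repository only has data points at its own commit dates, a correct total at a given date needs the latest known value of each repository, not just the value of the most recent commit overall. The sketch below is only an assumption about how such a total could be computed outside the dashboard; the sample data, the function names `value_at` and `totals_at_each_point`, and the use of ISO date strings are hypothetical and not taken from the GrimoireLab code.

```python
from bisect import bisect_right


def value_at(series, when):
    """Return the latest value recorded at or before `when` (0 if none yet).

    `series` is a list of (iso_date, value) pairs sorted by date.
    """
    dates = [date for date, _ in series]
    idx = bisect_right(dates, when)
    return series[idx - 1][1] if idx else 0


def totals_at_each_point(series_by_origin):
    """Sum the carried-forward value of every origin at every known date."""
    all_dates = sorted({date for s in series_by_origin.values() for date, _ in s})
    return {
        when: sum(value_at(s, when) for s in series_by_origin.values())
        for when in all_dates
    }


# Hypothetical metric values for two repositories that never share a date.
series_by_origin = {
    "repo_a": [("2019-07-01", 10), ("2019-07-03", 12)],
    "repo_b": [("2019-07-02", 5), ("2019-07-04", 7)],
}

totals = totals_at_each_point(series_by_origin)
for when, total in totals.items():
    print(when, total)
```

With this sample data the totals are 10, 15, 17 and 19, whereas taking only the value of the latest commit would give 10, 5, 12 and 7 for the same dates.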

  • Before all this, we had already noted the performance issues that repository-level data brought along with its insightful results, and this was the time to rethink ways to eliminate the overhead. At this point, @valcos suggested that I explore executing a study over the available index, which would remove the need to execute the analyzer at repository level and hence reduce the memory overhead.

  • I could produce an initial implementation of the study; a comparison follows:

| Repository | Commits | File Level | Repository Level | File-Level + Study |
|---|---|---|---|---|
| grimoirelab-kingarthur | 208 | 3:12 min | 34:23 min | 5:06 min |
| grimoirelab-graal | 179 | 2:22 min | 28:35 min | 3:58 min |

  • Space Complexity: For instance, if we consider one item per file, the results for the Graal repository would be as follows (no. of files: ~200, no. of commits: ~180):
    • Repository Level: 36000 items (200*180)
    • File-Level + Study: 200 items
  • A more detailed summary and discussion related to the above exploration can be found on the issue thread.
  • Instead of a line-chart, we can now easily implement a Bar-chart split by repository, and for common data points we can show the sum total of the metrics (created with the help of the data in the study index).

Stacked Bar-Chart
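
The item counts quoted above can be reproduced with a small sketch. It assumes, as in the text, one item per file, and it models the study as a step that keeps only the most recent file-level item for each file. That model, the field names (`file_path`, `commit_date`, `ccn`) and the function names are illustrative assumptions, not the actual implementation of the study.

```python
def repository_level_item_count(n_files, n_commits):
    """Every commit produces one item for every file in the repository."""
    return n_files * n_commits


def latest_item_per_file(file_level_items):
    """Collapse file-level items to a single, most recent item per file."""
    latest = {}
    for item in file_level_items:
        current = latest.get(item["file_path"])
        if current is None or item["commit_date"] >= current["commit_date"]:
            latest[item["file_path"]] = item
    return list(latest.values())


# Approximate figures quoted for the Graal repository.
print(repository_level_item_count(200, 180))

# Hypothetical file-level items: one per file touched by a commit.
file_level_items = [
    {"file_path": "a.py", "commit_date": "2019-07-01", "ccn": 4},
    {"file_path": "b.py", "commit_date": "2019-07-01", "ccn": 2},
    {"file_path": "a.py", "commit_date": "2019-07-05", "ccn": 6},
]
study_items = latest_item_per_file(file_level_items)
print(len(study_items))
```

Under these assumptions the number of stored items grows with the number of files only, instead of with files times commits.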

  • The incremental iteration of the CoCom Dashboard can be viewed on the remote instance: here

# Code License (CoLic) Dashboard

CoLic Dashboard

  • We had decided to have as many insightful visualizations as possible on the dashboard and to reduce the number of data tables.

  • Refinements
    • Earlier, we had separate tables for the license and copyright definitions per file; we’ve now merged them into a single table, whose content is chosen with the help of a Selector.
    • We’ve added more Pie-charts for the license and copyright information:
      • All license and copyright definitions.
      • Replaced the number metric with a Pie-chart showing the number of licensed vs. non-licensed files, and a similar one for copyrighted vs. non-copyrighted files.
  • Note: In the last meeting we’d raised the issue with the extracted copyright information. The problem seems to be with the underlying tool (Scancode-toolkit) that we’re using to extract this information; we’re planning to look into it in due time.

  • Today’s meeting is crucial for defining further steps with the details that we currently have and for prioritizing the work that needs to be done.

  • The incremental iteration of the CoLic Dashboard can be viewed on the remote instance: here

  • Working branch
  • Issue Thread

# Plans for next week:

  • For the next week, we’re planning to put more effort into optimizing the in-place implementation of the study (the “cache_dict” approach), which is, as of now, better than the others. One idea is to iterate per repository, which reduces the space complexity from (number_of_repositories)*(number_of_files) to just number_of_files.
  • Once the Code Complexity enricher is completely ready, we can move on to switching the visualizations that rely on repository-level data over to the data in the study index.
  • A similar approach can be implemented for the Code License enricher and study index in order to produce license- and copyright-related evolution metrics.
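
As a rough sketch of the per-repository idea: if the cache is rebuilt for each repository, it never holds more than the files of the repository currently being processed. The code below is only an illustration under assumptions. Apart from the name `cache_dict`, which comes from the text, the field names (`origin`, `file_path`, `ccn`), the function name and the sample data are hypothetical, and items are assumed to arrive grouped by repository and ordered by commit date.

```python
from itertools import groupby
from operator import itemgetter


def totals_per_repository(items):
    """Yield (origin, total_ccn, cached_files) for one repository at a time.

    `items` must be grouped by "origin" and, within each origin, ordered by
    commit date so that later items overwrite earlier ones in the cache.
    """
    for origin, repo_items in groupby(items, key=itemgetter("origin")):
        cache_dict = {}  # fresh cache: holds only this repository's files
        for item in repo_items:
            cache_dict[item["file_path"]] = item["ccn"]
        yield origin, sum(cache_dict.values()), len(cache_dict)


# Hypothetical file-level items for two repositories.
items = [
    {"origin": "repo_a", "file_path": "a.py", "ccn": 4},
    {"origin": "repo_a", "file_path": "b.py", "ccn": 2},
    {"origin": "repo_a", "file_path": "a.py", "ccn": 6},
    {"origin": "repo_b", "file_path": "c.py", "ccn": 3},
]

results = list(totals_per_repository(items))
for origin, total_ccn, cached_files in results:
    print(origin, total_ccn, cached_files)
```

Because the cache is discarded after each repository, its peak size in this sketch is bounded by the largest single repository rather than by all repositories together.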

# Footnotes:

  • The next meeting will be conducted around Tuesday, 23rd July 2019 at 12:30 CEST (16:00 IST), so let’s catch up next week ;)
  • The IRC conversation log of the meeting can be found here
  • The project tracker can be found here

I’ll be posting updates on the mailing list as well as on the Twitter thread.

If you have any questions or suggestions, please make sure to comment down below ;)