✏️ writes @inishchith


[week-3] Ideate Visualize Repeat

  • GSoC
  • 2019
  • week-3
  • coding-period-1

We had our third meeting of the first coding phase (27th May to 28th June) on Friday, 14th June 2019 at 12:30 CEST (16:00 IST). As always, it was conducted on the #grimoirelab channel.
In this post, I’ll highlight the key points discussed during the meeting: the work done this week, blockers or pending work (if any), and the work planned for the coming week.

# What work was done this week?

Last week, we decided to provide analysis (visualization) of changes over a configurable period of time for attributes such as ccn (code complexity), loc (lines of code), num_funs (number of functions) and tokens (number of tokens), which are produced by Graal’s CoCom backend.
So this week, my task revolved around building some reference visualizations and discussing the requirements in the issue thread inishchith/gsoc-graal#6: [discussion] Considering relevant attributes from Graal for Visualization

  • To provide some context, Graal’s CoCom backend gives us analysis data for each file affected by a commit (shown below).
      analysis = [
      ...
          {
              "ccn": 19,
              "avg_ccn": 2.111111111111111,
              "avg_loc": 9.11111111111111,
              "avg_tokens": 64,
              "num_funs": 9,
              "loc": 129,
              "tokens": 786,
              "ext": "py",
              "blanks": 48,
              "comments": 63,
              "file_path": "graal/codecomplexity.py"
          }
      ...
      ]
    

At this point, we need to keep track of the files (and their analysis data) that are available after checking out each commit and calculate the sum of the attributes, in order to produce insights such as the total code complexity (and other totals) of the software repository.
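The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration and not the implementation from the working branch: the `commits` list, its `date` and `analysis` keys, and the function name `repository_totals` are assumptions made for the example, and the numbers for the second and third commits are made up. Only the shape of each analysis entry follows the CoCom output shown earlier.

```python
ATTRIBUTES = ("ccn", "loc", "num_funs", "tokens")


def repository_totals(commits):
    """Yield (date, totals) after applying each commit in order.

    `commits` is a hypothetical list of dicts, each with a "date" and an
    "analysis" list shaped like the CoCom output shown earlier.
    """
    latest = {}  # file_path -> most recent analysis entry seen for that file
    for commit in commits:
        # A commit only carries data for the files it affects, so we
        # overwrite those entries and keep every other file unchanged.
        for entry in commit["analysis"]:
            latest[entry["file_path"]] = entry
        # Sum each attribute over all files known at this point in time.
        totals = {
            attr: sum(entry.get(attr, 0) for entry in latest.values())
            for attr in ATTRIBUTES
        }
        yield commit["date"], totals


# Illustrative input: three commits touching two files.
sample_commits = [
    {"date": "2019-06-01", "analysis": [
        {"file_path": "a.py", "ccn": 19, "loc": 129, "num_funs": 9, "tokens": 786}]},
    {"date": "2019-06-02", "analysis": [
        {"file_path": "b.py", "ccn": 3, "loc": 20, "num_funs": 2, "tokens": 100}]},
    {"date": "2019-06-03", "analysis": [
        {"file_path": "a.py", "ccn": 21, "loc": 140, "num_funs": 10, "tokens": 850}]},
]

results = list(repository_totals(sample_commits))
for date, totals in results:
    print(date, totals)
```

Note that this sketch never drops a file from `latest`, so deleted or renamed files would keep contributing to the totals. Handling them would need extra information about each commit that is not shown here.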

  • Fortunately, @valcos had shared a reference implementation of the above idea and agreed that we had to maintain separate items in the enriched index for each file’s analysis data available in the raw index. (We exchanged some implementations over the course of the week in the corresponding working branch, and yesterday @valcos made some improvements to the code to reduce redundancy in the insertion of items.)

  • [ BLOCKER ] The idea was now a lot clearer, but there was still work to be done on configuring the visualization in Kibana and getting it to work according to the discussed idea (i.e. point 1). I couldn’t find a way to implement the idea for the (variable number of) files available after a point in time, though the latter part (calculating the sum total of a particular analysis attribute, e.g. ccn) was easier to implement.

  • To make good use of the time, I proposed repository-level analysis in an issue thread: producing repository-level analysis as an option in the CoCom backend, which would make the visualization in Kibana a bit easier to carry out. (A comparison of the two is below.)

    Repository             Number of Commits   File Level   Repository Level
    coala/coala            4458                58.89 min    59.92 min
    grimoirelab-perceval   1387                24.83 min    26.63 min
    grimoirelab-graal      169                 1.82 min     1.86 min


  • I felt this would be a good-to-have option in Graal and, as I’ve mentioned in the comment, it would make it easier to visualize the data in Kibana. Hence, I worked on the repository-level analysis and could produce the following results.

Repository: chaoss/grimoirelab-perceval


[ Figure: Total Lines of Code at a given point in time (here, per month) ]

[ Figure: Total Code Complexity at a given point in time (here, per month) ]

[ Figure: Code Complexity per week ]

[ Figure: Code Complexity vs. Lines of Code per week ]
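To give a feel for what "total at a given point in time (per month)" can mean, here is a small Python sketch. It assumes that the value shown for a month is the last repository-level total recorded in that month. That choice, the helper name `monthly_snapshots`, and the numbers are all illustrative, and the charts above were built in Kibana rather than with this code.

```python
def monthly_snapshots(series):
    """Map "YYYY-MM" to the last total recorded in that month.

    `series` is a list of (iso_date, total) pairs sorted by date, where
    iso_date is a "YYYY-MM-DD" string.
    """
    snapshots = {}
    for iso_date, total in series:
        month = iso_date[:7]      # "2019-06-09" -> "2019-06"
        snapshots[month] = total  # later entries in a month overwrite earlier ones
    return snapshots


# Illustrative totals (e.g. total ccn after a commit on each date).
ccn_series = [
    ("2019-04-07", 100),
    ("2019-04-28", 120),
    ("2019-05-12", 115),
    ("2019-06-02", 130),
    ("2019-06-09", 140),
]

monthly = monthly_snapshots(ccn_series)
print(monthly)
```

Other choices, such as the maximum or the average within a month, would give a different curve, so the aggregation needs to be stated next to the chart.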

# Plans for next week:

  • During today’s meeting, we discussed the repository-level analysis implementation proposed this week. The idea seems like a good-to-have option, but when it comes to visualization it has a limitation: file-level data would no longer be available, so complex files couldn’t be tracked down if one were interested in finding them (as @valcos correctly pointed out).

  • With the earlier structure of the enrich index for the CoCom backend, it would be complex to implement the proposed idea. After giving it some thought, @valcos proposed a new structure for the enrich index, which could help us achieve the expected results. (The newly proposed structure is below.)

    file-1
        commit-1 (date, ccn, ..)
        commit-2 (date, ccn, ..)
        ...
    file-2
        commit-1 (date, ccn, ..)
        ...
        commit-3 (date, ccn, ..)
  • Now, every item in the index will be a file, and it will include the commits that modified it. So the coming week will involve implementing the above-proposed structure for the enrich index, along with a reference implementation using Timelion for time-series data visualization.

  • The new issue for discussing and evaluating the implementation: [enrich] New structure for CoCom Enrich Index
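The file-keyed structure proposed above can be illustrated by regrouping commit-level data in Python. This is a rough sketch under assumptions, not the enricher’s actual code: the input layout (`hash`, `date`, `analysis`), the function name `group_by_file` and the sample values are hypothetical, and only ccn is carried over to keep the example short.

```python
from collections import defaultdict


def group_by_file(commits):
    """Regroup commit-level analysis into one item per file.

    Each file maps to the list of commits that modified it, in the
    order the commits were given.
    """
    files = defaultdict(list)
    for commit in commits:
        for entry in commit["analysis"]:
            files[entry["file_path"]].append({
                "commit": commit["hash"],
                "date": commit["date"],
                "ccn": entry["ccn"],
            })
    return dict(files)


# Illustrative input mirroring the outline above.
file_commits = [
    {"hash": "commit-1", "date": "2019-06-01", "analysis": [
        {"file_path": "file-1", "ccn": 19},
        {"file_path": "file-2", "ccn": 3}]},
    {"hash": "commit-2", "date": "2019-06-02", "analysis": [
        {"file_path": "file-1", "ccn": 21}]},
    {"hash": "commit-3", "date": "2019-06-03", "analysis": [
        {"file_path": "file-2", "ccn": 4}]},
]

index = group_by_file(file_commits)
for path, history in index.items():
    print(path, [item["commit"] for item in history])
```

With this layout, the history of a single file is one lookup away, which is what makes it possible to track down complex files while still being able to sum over files for repository-level totals.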

# Footnotes:

  • The next meeting will be conducted on Friday, 21st June 2019 at 12:30 CEST (16:00 IST), so let’s catch up next week ;)
  • IRC conversation log of the meeting can be found here
  • GSoC project proposal
  • Project tracker can be found here

I’ll be posting updates on the mailing list as well as on the Twitter thread.

If you have any questions or suggestions, please leave a comment below ;)