This is the last week of the final coding phase of GSoC 2019. In this post, I’ll try to summarize the work done during all the coding phases and share the results at the end.
This post also serves as the Final Report for the project and contains all the information about my contributions to it.
# Coding Phase I
(27th May to 28th June)
In this phase, I tried to understand how the GrimoireLab toolchain works in its entirety and came up with an initial approach to integrating the CoCom Backend with ELK.
- Before the start of this phase, I had only contributed to Graal and didn’t have any experience of how the Grimoirelab chain works. Hence, with the guidance of my mentors, I started learning how Mordred works along with ELK and how to import dashboards using Kidash and Sigils.
- The next step was to discuss and initiate the integration of Graal’s Backend with ELK. We agreed to integrate the Code Complexity (CoCom) Backend entirely and then focus on Code License (CoLic). Along these lines, I had also been fixing a few issues in Graal and adding an analyzer (ScanCode-CLI), which was merged after a code review in the second week, and we then started working on the initial integration task.
- At this point, we had an initial integration of Graal-CoCom-ELK, which depended on the raw data produced by Graal, with no preprocessing steps applied to it. I could produce some mock visualizations and understand how Kibana worked, as I had no experience with the ELK stack prior to this project apart from some familiarity with ETL tasks.
- Although we had integrated the initial version of the CoCom Backend, it wasn’t what we were looking for. At this point we faced all sorts of restrictions, as the data fields produced by graal-cocom were per-commit. One good thing about this implementation was its efficiency: an item held information only about the files affected in the given commit. The issue was that, to produce visualizations that can be analyzed over time, we need a history of data points in order to show (for instance) the evolution of a metric (Lines of Code).
- And the time had come: we had to think of a solution to either restructure the data stored in the ElasticSearch index or perform a repository-level analysis at every commit using Graal 😅
- Pull requests
- chaoss/grimoirelab-graal#29: [colic] Add support of scancode_cli to colic backend
- chaoss/grimoirelab-graal#32: [graal] Derive
- chaoss/grimoirelab-graal#34: [logger] Switch `info` logger level to
- chaoss/grimoirelab-graal#37: [analyzer] Fix results for deleted files for CoCom backend
- chaoss/grimoirelab-graal#38: [cocom] Add repository level analysis option for CoCom backend
- chaoss/grimoirelab-graal#39: [cocom] Add repository level analysis via lizard
- chaoss/grimoirelab-tutorial#86: [graal] Add Graal to the Sidebar
- chaoss/grimoirelab-tutorial#87: [micro] Add tutorial for exectution of Micro-Mordred via Docker-Compose
- inishchith/gsoc/issues#3: [doc] How to play with Grimoirelab components
- inishchith/gsoc/issues#4: [integration] Add support of Graal to ELK & Mordred
- inishchith/gsoc/issue#5: [visualization] Creation Import & Export of Kibana Dashboards
- inishchith/gsoc/issue#7: [enrich] New structure for CoCom Enrich Index
- inishchith/gsoc/issue#8: [integration] Add support of Graal’s CoCom Backend to ELK
- chaoss/grimoirelab-graal#18: [discussion] Improvements in existing analyzers and additions
- chaoss/grimoirelab-graal#27: [colic] Add scancode_cli option to CoLic Backend
- chaoss/grimoirelab-graal#33: [graal] Checkout log an issue in case of large repositories
- chaoss/grimoirelab-graal#35: [analyzer] Fix results for deleted files
- chaoss/grimoirelab-graal#36: [cocom] Evaluating results with repository level analysis
- chaoss/grimoirelab-elk#642: Add option to fetch from selected branches
- chaoss/grimoirelab-tutorial#84: [components] How to play with Grimoirelab components
# Coding Phase II
(28th June to 26th July)
In this phase, we identified potential performance issues with the repository-level analysis, came up with the `File-Level + Study` approach, and evaluated it after integrating it with ELK, also producing some visualizations from the data.
- At this point, I had discussed the repository-level analysis approach with my mentors during the weekly meetings, and we agreed to give it a try and see how things went. (evaluation below)
| Repository | Number of Commits | File Level | Repository Level |
|---|---|---|---|
| chaoss/grimoirelab-perceval | 1394 | 26:22 min | 26:56 min |
| chaoss/grimoirelab-sirmordred | 869 | 08:51 min | 3:51 min |
| chaoss/grimoirelab-graal | 171 | 2:24 min | 1:04 min |
- The above experimentation led to some good results with the visualizations: as every commit item now had information about each and every file in the repository, I could produce data tables and metrics which gave many insights about a given repository. Each of the items can be viewed as a data point on a graph and can be visualized and.. we’re done. No, not yet 😉
- I spent a lot of time implementing this approach, only to realize that the data structure we were using would consume a lot of memory once we integrated the implementation with ELK.
- Above all, we had already acknowledged the performance issues that repository-level data brought along with its insightful results, and this was the time to rethink ways to eliminate the overhead. At this point, @valcos suggested that I explore the execution of a study over the enrich index, which would eliminate the execution of the analyzer at repository level and hence reduce the memory overhead.
- I could produce an initial implementation of study and the following is a comparison.
| Repository | Commits | File Level | Repository Level | File-Level + Study |
|---|---|---|---|---|
| grimoirelab-kingarthur | 208 | 3:12 min | 34:23 min | 5:06 min |
| grimoirelab-graal | 179 | 2:22 min | 28:35 min | 3:58 min |
- Space Complexity: For instance, if we consider an item per file, the results for Graal repository would be as follows: (no. of files: ~200, no. of commits: ~180)
Repository Level: 36000 items (200*180)
File-Level + Study: 200 items
- On further improving the approach, i.e. adding an `interval_months` parameter, which lets us conduct the study over an interval of time, we could bring the total number of items in memory down to `Max(1st commit date / interval_months)` (for instance, if a repository is 3 years old and we want to analyze the evolution of a metric every month, then the number of items in memory would be 3 x 12 = 36; now imagine increasing the
- CoCom Dashboard (1st Iteration)
CoLic Dashboard (1st Iteration)
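The item counts discussed above can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the function names are hypothetical and are not part of Graal or GrimoireLab ELK, and the numbers are the approximate ones quoted for the Graal repository (~200 files, ~180 commits).

```python
import math


def repository_level_items(num_files: int, num_commits: int) -> int:
    """One item per file per commit: every commit covers the whole repository."""
    return num_files * num_commits


def study_items_per_file(num_files: int) -> int:
    """File-Level + Study, keeping one entry per file."""
    return num_files


def study_items_per_interval(repo_age_months: int, interval_months: int) -> int:
    """File-Level + Study chunked by time: one entry per interval since the first commit."""
    return math.ceil(repo_age_months / interval_months)


print(repository_level_items(200, 180))   # 36000
print(study_items_per_file(200))          # 200
print(study_items_per_interval(36, 1))    # 36 (3-year-old repository, monthly intervals)
print(study_items_per_interval(36, 3))    # 12 (same repository, quarterly intervals)
```

A larger `interval_months` gives fewer intervals, and so fewer items held at once.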
In summary, and as per the evaluation results, we started off with (1) the Repository-Level enricher, which used about 36,000 items in memory. We tried to improve this via (2) File-Level + Study (cache_dict), which brought down the execution time significantly and the memory usage to just 200 items. And finally, we moved to (3) File-Level + Study (query_chuck), which brought down the memory usage to a worst case of `number_of_intervals` items (an interval is decided by chunking data into months, starting from the first commit), all at the cost of just 5% more execution time. 🎉
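To make the idea of a study over file-level items more concrete, here is a minimal sketch in the spirit of the `cache_dict` variant. It is not the code that went into GrimoireLab ELK: the item fields (`file_path`, `commit_date`, `loc`), the function names, and the sample data are all assumptions made for illustration. It walks the file-level items in commit order, remembers the latest value per file, and emits one repository-wide total per interval.

```python
from datetime import datetime


def month_index(date: datetime) -> int:
    """Map a date to a running month number so that intervals are easy to compute."""
    return date.year * 12 + (date.month - 1)


def loc_evolution(items, interval_months=1):
    """Return [(interval_number, total_loc)], one entry per interval since the first commit."""
    items = sorted(items, key=lambda it: it["commit_date"])
    if not items:
        return []
    start = month_index(items[0]["commit_date"])
    latest_loc = {}   # cache: file path -> most recent LOC seen for that file
    snapshots = []
    current = 0
    for it in items:
        interval = (month_index(it["commit_date"]) - start) // interval_months
        # Close every interval that ended before this item was committed.
        while current < interval:
            snapshots.append((current, sum(latest_loc.values())))
            current += 1
        latest_loc[it["file_path"]] = it["loc"]
    # Close the interval that contains the last item.
    snapshots.append((current, sum(latest_loc.values())))
    return snapshots


# Hypothetical file-level items, one per file touched in a commit.
sample_items = [
    {"file_path": "a.py", "commit_date": datetime(2019, 1, 10), "loc": 100},
    {"file_path": "b.py", "commit_date": datetime(2019, 1, 20), "loc": 50},
    {"file_path": "a.py", "commit_date": datetime(2019, 3, 5), "loc": 120},
    {"file_path": "b.py", "commit_date": datetime(2019, 3, 15), "loc": 0},
]

print(loc_evolution(sample_items, interval_months=1))
# [(0, 150), (1, 150), (2, 120)]
```

The cache holds at most one entry per file and the output holds one entry per interval, which is why neither grows with the number of commits.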
- Pull requests
- chaoss/grimoirelab-graal#40: [docs] Update documentation and links to requirements
- chaoss/grimoirelab-graal#41: [analyzer] Fix scancode_cli results
- chaoss/grimoirelab-graal#46: [cloc] Fix cloc error due to mulitple word language-name
- chaoss/grimoirelab-elk#650: [elk] Add option to fetch from selected branches
- chaoss/grimoirelab-elk#651: [graal] Add support of Graal’s CoCom Backend to ELK
- chaoss/grimoirelab-elk#653: [graal] Add support of Graal’s CoLic Backend to ELK
- chaoss/grimoirelab-elk#664: [graal] Add support of Graal’s CoCom Backend to ELK (study approach)
- chaoss/grimoirelab-elk#669: [graal] Add support of Graal’s CoLic Backend to ELK (study approach)
- inishchith/gsoc/issues#12: [improvements] Evaluation of existing approaches and optimizations
- inishchith/gsoc/issues#13: [visualization] Issues related to metrics visualization
- inishchith/gsoc/issues#14: [visualization] Dashboard for CoLic backend
- inishchith/gsoc/issues#15: [refinements] Revamp Index and visualizations to appropriate levels
- chaoss/grimoirelab-graal#47: [cocom] Redundant log on every file-open operation
# Coding Phase III
(26th July to 26th August)
In this phase, we tried to identify issues in the integration by conducting full-chain tests, fixing the major issues and addressing some minor ones. We also added tests for the integration and went through several iterations on both dashboards.
- After successfully implementing the Study Enricher, I went on to add support for other categories of CoLic analyzers (`NOMOS`) and evaluated their performance on small and medium-sized repositories.
- Next, I spent time adding the corresponding tests, performing code reviews with the help of @valcos, and reducing the redundancy across the existing pull requests. Our next step was to perform full tests of the entire chain of tools, right from the raw collection of data to producing the dashboards. In this phase, we ran the entire chain test more than 10 times, identifying issues and fixing them as we went.
At this point, we started testing the import and export of the dashboard panels with the help of Micro-Mordred (panel’s task) and subsequently opened a pull request at sigils.
- The video below summarizes the results achieved. (It shows how to collect and enrich `CoLic` data and import the corresponding dashboards.)
- Pull requests
- chaoss/grimoirelab-graal#50: [colic] Add copyright flag for extraction of copyright information
- chaoss/grimoirelab-elk#672: [graal] Add support of Graal CoCom & CoLic Backend (finalized)
- chaoss/grimoirelab-sirmordred#320: [graal] Add configuration for Graal integration in ELK
- chaoss/grimoirelab-sigils#380: [graal] Add Code Complexity(CoCom) & Code License(CoLic) panels
- inishchith/gsoc/issues#16: [task-list] Plan for Final Coding Phase
- inishchith/gsoc/issues#17: [integration] Evaluation of Small/Medium size repositories
- inishchith/gsoc/issues#18: [eval] Full chain test reports
- chaoss/grimoirelab-graal#48: [colic] Incorrect extraction of copyright information
- chaoss/grimoirelab-graal#49: [colic] Add copyright flag for extraction of copyright information
- chaoss/grimoirelab-graal#54: [colic] Slow execution of ScanCode-CLI
- chaoss/grimoirelab-graal#55: [colic] KeyError on execution of ELK with ScanCode-CLI
- chaoss/grimoirelab-graal#56: [cocom] Impossible to checkout the worktree
Thank You ♥️
- To my mentors: Valerio Cosentino, Jesus Gonzalez-Barahona & Pranjal Aswani, for your continued support, which helped me grow and understand things with much more clarity this summer.
- To the @CHAOSS community: especially Georg Link & Matt Germonprez. Thanks for being so appreciative of the work 😅
- To all the readers of the blog & the twitter thread 😉
- Project tracker can be found here
I’ll be posting updates on the mailing list as well as on the twitter thread.
If you have any questions or suggestions, please make sure to comment down below ;)