
Section 1: Publication
Publication Type
Journal Article
Authorship
Nafi, K. W., Roy, B., Roy, C. K., & Schneider, K. A.
Title
CroLSim: Cross language software similarity detector using API documentation
Year
2018
Publication Outlet
In 2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM) (pp. 139-148). IEEE
DOI
ISBN
ISSN
Citation
Nafi, K. W., Roy, B., Roy, C. K., & Schneider, K. A. (2018). CroLSim: Cross language software similarity detector using API documentation. In 2018 IEEE 18th International Working Conference on Source Code Analysis and Manipulation (SCAM) (pp. 139-148). IEEE.
https://doi.org/10.1109/SCAM.2018.00023
Abstract
In today's open source era, developers look for similar software applications in source code repositories for a number of reasons, including, exploring alternative implementations, reusing source code, or looking for a better application. However, while there are a great many studies for finding similar applications written in the same programming language, there is a marked lack of studies for finding similar software applications written in different languages. In this paper, we fill the gap by proposing a novel model CroLSim which is able to detect similar software applications across different programming languages. In our approach, we use the API documentation to find relationships among the API calls used by the different programming languages. We adopt a deep learning based word-vector learning method to identify semantic relationships among the API documentation which we then use to detect cross-language similar software applications. For evaluating CroLSim, we formed a repository consisting of 8,956 Java, 7,658 C#, and 10,232 Python applications collected from GitHub. We observed that CroLSim can successfully detect similar software applications across different programming languages with a mean average precision rate of 0.65, an average confidence rate of 3.6 (out of 5) with 75% high rated successful queries, which outperforms all related existing approaches with a significant performance improvement.
Plain Language Summary