CLCDSA: cross language code clone detection using syntactical features and API documentation

Section 1: Publication

Publication Type

Authorship

Nafi, K. W., Kar, T. S., Roy, B., Roy, C. K., & Schneider, K. A.

Title

CLCDSA: cross language code clone detection using syntactical features and API documentation

Year

2019

Publication Outlet

In 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE) (pp. 1026-1037). IEEE

DOI

https://doi.org/10.1109/ASE.2019.00099

ISBN

ISSN

Citation

Nafi, K. W., Kar, T. S., Roy, B., Roy, C. K., & Schneider, K. A. (2019, November). CLCDSA: cross language code clone detection using syntactical features and API documentation. In 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE) (pp. 1026-1037). IEEE. https://doi.org/10.1109/ASE.2019.00099

Abstract

Software clones are detrimental to software maintenance and evolution, and as a result many clone detectors have been proposed. These tools target clone detection in software applications written in a single programming language. However, a software application may be written in different languages for different platforms to improve the application's platform compatibility and adoption by users of different platforms. Cross language clones (CLCs) introduce additional challenges when maintaining multi-platform applications and would likely go undetected using existing tools. In this paper, we propose CLCDSA, a cross language clone detector which can detect CLCs without extensive processing of the source code and without the need to generate an intermediate representation. The proposed CLCDSA model analyzes different syntactic features of source code across different programming languages to detect CLCs. To support large scale clone detection, the CLCDSA model uses an action filter based on cross language API call similarity to discard non-potential clones. The design methodology of CLCDSA is two-fold: (a) it detects CLCs on the fly by comparing the similarity of features, and (b) it uses a deep neural network based feature vector learning model to learn the features and detect CLCs. An early evaluation of the model yielded average precision, recall, and F-measure scores of 0.55, 0.86, and 0.64, respectively, for the first phase and 0.61, 0.93, and 0.71, respectively, for the second phase, which indicates that CLCDSA outperforms all available models in detecting cross language clones.
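
The filter-then-compare idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the Jaccard overlap used for API call similarity, the cosine comparison of feature vectors, both thresholds, and the dictionary layout of a code fragment are all assumptions chosen for illustration, and the deep neural network phase (b) is not covered.

```python
import math


def jaccard(a, b):
    """Jaccard overlap between two sets of API call names (assumed measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cosine(u, v):
    """Cosine similarity between two equal-length numeric feature vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def is_clone_candidate(frag_a, frag_b, api_threshold=0.3, feature_threshold=0.9):
    """Hypothetical two-step check: API-based filter, then feature similarity.

    Each fragment is a dict with "api_calls" (a set of names) and "features"
    (a list of syntactic feature counts). Both thresholds are illustrative.
    """
    # Step 1: discard pairs whose API calls barely overlap (the filter).
    if jaccard(frag_a["api_calls"], frag_b["api_calls"]) < api_threshold:
        return False
    # Step 2: compare the syntactic feature vectors of the surviving pair.
    return cosine(frag_a["features"], frag_b["features"]) >= feature_threshold


# Made-up fragments standing in for code written in different languages.
java_frag = {"api_calls": {"read", "split", "print"}, "features": [3, 1, 2, 0]}
python_frag = {"api_calls": {"read", "split", "print", "len"}, "features": [3, 1, 2, 1]}
unrelated = {"api_calls": {"connect", "send"}, "features": [0, 5, 0, 7]}

print(is_clone_candidate(java_frag, python_frag))
print(is_clone_candidate(java_frag, unrelated))
```

Running the cheap set comparison first means the more expensive feature comparison only runs on pairs that survive the filter, which is the role the abstract assigns to the filter for large scale detection.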

Plain Language Summary