Static Code Analysis of Data-driven Applications Through Common Lingua and the Semantic Web Technologies

"Web applications have become increasingly popular due to their potential for businesses' high revenue gain through global reach. Along with these opportunities, also come challenges in terms of Web application security. The increased rise in the number of data-driven applications has also seen an increased rise in their systematic attacks. Cyber-attacks exploit Web application vulnerabilities. Attack trends show a major increase in Web application vulnerabilities caused by improper implementation of information-flow control methods and they account for more than 50% of all Web application vulnerabilities found in the year 2013. Static code analysis using methods of information-flow control is a widely acknowledged technique to secure Web applications. Whilst this technique has been found to be both very effective and efficient in finding Web application vulnerabilities, specific tools are highly dependent on the programming language. This thesis leverages Semantic Web technologies in order to offer a common language through source code represented using the Resource Description Framework format, whereby reasoning can be applied to securely test Web applications. In this thesis, we present a framework that extracts source code facts from various programming languages at a variable-level of granularity using Abstract Syntax Trees (ASTs) generated using language grammars and the ANTLR parser generator. The methodology for detecting Web application vulnerabilities implements three phases: entry points identification, tracing information-flow and vulnerability detection using the Jena framework inference mechanism and rules describing patterns of source code. The approach discussed in this thesis is found to be effective and practical in finding Web application vulnerabilities with the limitation that it can only detect patterns that are used as training data or very similar patterns. False positives are caused by limitations of the language grammar, but they do not affect the accuracy of the security vulnerability detection method in identifying the correct Web application vulnerability." -- Abstract.
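
The three phases named in the abstract (entry points identification, tracing information-flow, vulnerability detection) can be illustrated with a minimal sketch. This is not the thesis's implementation: it uses plain Python tuples in place of RDF statements and a hand-written fixed-point loop in place of the Jena inference mechanism. The predicate names (`readFrom`, `assignedFrom`, `passedTo`) and all variable and sink names are hypothetical, chosen only for illustration.

```python
# Hypothetical source-code facts as (subject, predicate, object) triples,
# loosely mimicking RDF statements. Every name here is an assumption.
FACTS = {
    ("user_id", "readFrom", "http_request"),   # user_id comes from the request
    ("query", "assignedFrom", "user_id"),      # query is built from user_id
    ("label", "assignedFrom", "constant"),     # label holds static text
    ("query", "passedTo", "sql_execute"),      # query reaches a database sink
    ("label", "passedTo", "html_print"),       # label reaches an output sink
}


def find_entry_points(facts):
    """Phase 1: variables populated directly from untrusted input."""
    return {s for (s, p, o) in facts if p == "readFrom" and o == "http_request"}


def trace_flow(facts, entry_points):
    """Phase 2: propagate taint along assignedFrom edges to a fixed point."""
    tainted = set(entry_points)
    changed = True
    while changed:
        changed = False
        for s, p, o in facts:
            if p == "assignedFrom" and o in tainted and s not in tainted:
                tainted.add(s)
                changed = True
    return tainted


def detect(facts, tainted):
    """Phase 3: report (variable, sink) pairs where a tainted variable reaches a sink."""
    return {(s, o) for (s, p, o) in facts if p == "passedTo" and s in tainted}


entry_points = find_entry_points(FACTS)
tainted = trace_flow(FACTS, entry_points)
findings = detect(FACTS, tainted)
print(sorted(findings))  # [('query', 'sql_execute')]
```

Because the rules operate only on the triples, the same three functions would apply to facts extracted from any programming language, which is the language-independence the abstract attributes to the common RDF representation.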
