Sergej Dechand · 7 min read

The Risks of AI-Generated Code

AI is fundamentally transforming how we write, test and deploy code. However, AI is not a new phenomenon, as the term was first coined in the 1950s. With the more recent release of ChatGPT, generative AI has taken a huge step forward in delivering this technology to the masses. Especially for development teams, this has enormous potential.

Today, AI represents the biggest change since the adoption of cloud computing. However, using it to create code comes with its own risks. In this article, we will discuss the potential pitfalls of using AI tools to produce large amounts of code. We hope that you find it useful.


AI Coding Tools Are Starting to See Adoption in 2023 

The adoption of AI coding assistants such as GitHub Copilot or Amazon CodeWhisperer has undoubtedly lightened the load on developers, allowing them to write code much faster than ever before. However, this has not been without drawbacks.

A recent study by Stanford University found that programmers who used AI assistants wrote significantly less secure code than those who did not.

Moreover, the study revealed that AI assistants create a false sense of security: developers who used them tended to believe the code they produced was more secure than it actually was.

These findings are unsurprising, because such tools are heavily prompt-driven and have little contextual or project-specific understanding. They are still in their infancy and will undoubtedly improve over time. Until then, the findings highlight a crucial need for scalable, effective testing methods to ensure that AI-generated code is secure before it is shipped.

AI-Generated Code and the Limitations of Traditional Testing Methods

As AI code tools and assistants become the new norm, traditional testing methods like static analysis and dynamic testing still face inherent limitations, which will only be amplified as AI technologies are used to create vast amounts of code.

Static Analysis

Static application security testing (SAST) analyzes code without executing it, and machine-learning techniques have been used to improve it since long before the release of ChatGPT. However, the method has inherent limitations: because the code is never run, SAST often fails to detect issues that depend on runtime configuration or user input. This lack of runtime context means potentially harmful bugs and vulnerabilities can go undetected. Additionally, static analysis tools tend to produce large quantities of false positives, i.e., findings that turn out to be harmless. Sorting through these false positives is time-consuming and tedious work.

The lack of reproducibility is another drawback of static analysis: SAST tools do not provide the input that triggered an alleged finding, which adds further complexity to the process of sorting out false positives. While advances in neural networks and LLMs have improved the effectiveness of static analysis, these fundamental problems remain.
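To make the runtime-context gap concrete, here is a toy Python sketch (the function name and config value are hypothetical): a static analyzer can flag the path join as a potential traversal sink, but only runtime input decides whether it is actually exploitable.

```python
import posixpath

UPLOAD_ROOT = "/var/app/uploads"  # hypothetical config value

def upload_path(filename: str) -> str:
    # A static analyzer can flag this join as a path-traversal sink,
    # but without running the code it cannot know which filenames
    # actually arrive here at runtime.
    return posixpath.normpath(posixpath.join(UPLOAD_ROOT, filename))

# Exploitability depends entirely on the runtime input:
print(upload_path("report.pdf"))           # stays inside the upload root
print(upload_path("../../../etc/passwd"))  # escapes the root entirely
```

Without concrete inputs like these, the tool can only report a "maybe", which is exactly where the false positives come from.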

Dynamic Software Testing

On the other hand, many companies heavily rely on dynamic black-box testing, which probes the application from an attacker's perspective. During dynamic black-box testing, test cases are generated randomly or based on predefined inputs, without knowledge of the internal structure of the software under test.

Dynamic testing is performed while a system is running, alleviating some of the shortcomings of static analysis. However, it is difficult for dev teams to use this method efficiently as it provides no insights into how much of the source code was executed during a test. This makes it hard to draw conclusions from test results. As black-box tests run blindly, without any information about the system under test, they tend to miss deeply hidden bugs and vulnerabilities.
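The blindness described above can be sketched in a few lines of Python (all names are illustrative): random inputs are thrown at a target with no coverage feedback, so a bug guarded by a specific prefix is almost never reached.

```python
import random
import string

def blackbox_fuzz(target, runs=10_000):
    """Throw random inputs at `target` (a stand-in for the system under
    test) and record crashes. There is no coverage feedback: we never
    learn which branches an input reached, so deep states stay unexplored."""
    crashes = []
    for _ in range(runs):
        data = "".join(random.choices(string.printable,
                                      k=random.randint(0, 32)))
        try:
            target(data)
        except Exception as exc:
            crashes.append((data, exc))
    return crashes

# Toy target with a bug hidden behind a specific prefix -- random
# inputs almost never satisfy the guard, so the bug usually survives.
def parse(data: str) -> None:
    if data.startswith("MAGIC!"):
        raise ValueError("deeply hidden bug")
```

Running `blackbox_fuzz(parse)` typically finds nothing, even though calling `parse("MAGIC!...")` directly crashes immediately.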

Staying Ahead of Attackers in an Era Driven by AI Code

The challenges that AI-generated code brings with it require scalable testing tools that enable dev teams to ship software with high confidence and without making any concessions in speed or efficiency.

One approach to address these challenges is to incorporate white-box testing powered by self-learning AI. This method gives dev teams full access to the source code, enabling them to create more robust and comprehensive test cases. By leveraging self-learning AI, dev teams can gather valuable insights from previous test runs and use this knowledge to automatically generate new, intelligent test cases. These advanced test cases can uncover deeply hidden bugs and vulnerabilities that traditional testing methods tend to miss.
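The feedback loop behind this idea can be sketched in a few lines of Python (a minimal illustration with hypothetical names, not any particular product's implementation): inputs that reach new code paths are kept as seeds and mutated further, so each run digs deeper than blind random testing could.

```python
import random

def coverage_guided_fuzz(target, seed=b"aaa", rounds=2000):
    """Minimal sketch of coverage-guided (white-box) testing.
    `target` must return the set of branch ids it executed,
    standing in for real instrumentation."""
    corpus = [seed]
    seen = set()
    crashes = []
    for _ in range(rounds):
        data = bytearray(random.choice(corpus))
        if data:  # mutate one random byte of a known-good seed
            data[random.randrange(len(data))] = random.randrange(256)
        data = bytes(data)
        try:
            covered = target(data)
        except Exception as exc:
            crashes.append((data, exc))
            continue
        if not covered <= seen:  # new coverage -> keep as a new seed
            seen |= covered
            corpus.append(data)
    return crashes

# Toy target: the bug sits behind three nested branches, reachable
# only by incrementally discovering the "F", "FU", "FUZ" prefixes.
def toy_target(data: bytes) -> set:
    covered = {"entry"}
    if data[:1] == b"F":
        covered.add("F")
        if data[1:2] == b"U":
            covered.add("FU")
            if data[2:3] == b"Z":
                raise ValueError("bug behind three branches")
    return covered
```

Because each partial prefix yields new coverage and is kept in the corpus, the search climbs toward the bug step by step instead of guessing all three bytes at once.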

A key advantage of white-box testing is its ability to integrate seamlessly with existing unit tests for fully automated testing, allowing developers to identify bugs and vulnerabilities before they enter the codebase.

AI and the Future of Software Testing

At Code Intelligence, we believe that using AI coding tools will be a huge part of how software is developed moving forward. But to unlock their full potential and ensure that the code produced is secure before it is shipped, their use must be paired with scalable and automated testing approaches. 

AI-powered white-box testing makes it possible to test large volumes of code and identify deep-rooted bugs and vulnerabilities. Moreover, as white-box testing solutions can fully access the source code and seamlessly integrate into CI/CD pipelines, they drastically aid dev teams in writing secure code without slowing down the development process.

For more info on how AI-powered white-box testing enables dev teams to ship both human and AI-generated code securely, check out our freely available whitepaper.




What is the risk of code generated by AI?

While AI-generated code can accelerate development, it also introduces new risks. AI coding tools currently rely on LLMs that are trained on general data and know nothing about the specifics of the application at hand. Moreover, they are being used to produce much higher quantities of code, which in turn leads to more bugs and vulnerabilities. The reliance on AI can also reduce human programmers' understanding of the codebase and create a false sense of security, as found in a study by Stanford University.

What is the issue with traditional static analysis and black-box testing?

Traditional methods of software security testing, like static analysis (SAST) and dynamic black-box testing (DAST), suffer from significant limitations. 

Starting with SAST, it lacks runtime context, meaning it cannot fully assess the actual behavior of the software during execution. SAST also generates numerous false positives, and sorting through them creates alert fatigue that can lead to real vulnerabilities being overlooked or ignored.

On the other hand, DAST faces its own set of challenges. While it evaluates the external behavior of a software application, it fails to assess its internal workings, leaving potential security flaws within the code undetected. Furthermore, DAST does not achieve complete code coverage, meaning certain parts of the application might remain untested and susceptible to vulnerabilities.

What are the benefits of AI-powered white-box testing?

AI-powered white-box testing offers several advantages:

  • It can integrate easily with existing unit tests for automated testing at each code change
  • It leverages the internal design of the application to find hidden issues
  • It eliminates false positives and duplicates, providing actionable results 
  • It helps assess untested parts of the system by refining test inputs based on coverage information

What is AI-powered white-box testing?

Dynamic white-box testing with self-learning AI is a testing strategy that leverages the internal design of the software under test to generate myriads of intelligent test cases. Genetic algorithms learn from previous test data and create new test cases to dig deeper into the software under test. This approach enables dev teams to identify bugs and vulnerabilities that traditional testing methods often miss.
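The genetic-algorithm idea can be sketched as follows (a toy illustration with made-up names, not an actual implementation): inputs that proved interesting in previous runs are spliced and mutated to breed the next generation of test cases.

```python
import random

def crossover(a: bytes, b: bytes) -> bytes:
    """Splice two parent inputs at a random cut point."""
    cut = random.randrange(min(len(a), len(b)) + 1)
    return a[:cut] + b[cut:]

def mutate(data: bytes) -> bytes:
    """Flip the bits of one random byte."""
    if not data:
        return data
    out = bytearray(data)
    out[random.randrange(len(out))] ^= 0xFF
    return bytes(out)

def next_generation(corpus, size=10):
    """Breed new test cases from inputs that were 'fit' in previous
    runs -- e.g. inputs that reached new coverage."""
    return [mutate(crossover(random.choice(corpus), random.choice(corpus)))
            for _ in range(size)]
```

Each generation recombines what earlier runs learned, which is what lets the approach dig deeper into the software under test over time.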

How does AI-powered white-box testing compare to traditional black-box testing?

AI-powered white-box testing differs from black-box testing as it can leverage the internal design of software to provide a more comprehensive testing approach. It can also allow dev teams to gather information from previous test cases and use this knowledge to generate new, more intelligent ones automatically.

How can Large Language Models (LLMs) and self-learning AI work together in software testing?

LLMs can be used to analyze code and automatically identify the potential attack surface. They can also generate the corresponding test harnesses needed for self-learning AI. Self-learning AI can then “attack” these entry points while continuously refining test cases based on coverage feedback. When combined, these two forms of AI provide a more automated and scalable testing model, enabling development teams to secure self-written code, third-party components, and AI-generated code at scale.
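As an illustration, a generated harness is typically a small adapter like the following hypothetical Python sketch (`parse_config` is a toy stand-in for a real entry point): it converts the raw bytes a fuzzer produces into the structured input the entry point expects.

```python
def parse_config(text: str) -> dict:
    # Toy stand-in for the real entry point under test.
    pairs = (line.split("=", 1) for line in text.splitlines() if "=" in line)
    return {k.strip(): v.strip() for k, v in pairs}

def harness(data: bytes) -> None:
    """The kind of adapter an LLM might generate: turn raw fuzzer
    bytes into the input shape `parse_config` expects, silently
    skipping malformed shapes that are not bugs."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return  # invalid encoding is an uninteresting input, not a bug
    parse_config(text)
```

The self-learning AI then drives `harness` with mutated byte strings, while coverage feedback from `parse_config` guides which inputs to keep.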
