GitHub Copilot: Productivity Boost or DORA Metrics Disaster?

Imagine a world where measuring developer productivity is as simple as checking your fitness stats on your smartwatch. With AI programming assistants like GitHub Copilot, this seems within reach. GitHub Copilot claims to increase developer productivity with context-aware code completion and snippet generation. By using artificial intelligence to generate entire lines or modules of code, GitHub Copilot aims to reduce manual coding effort; it's the equivalent of a supercharged assistant that helps you code faster and focus on complex problem solving.

Organizations use DevOps Research and Assessment (DORA) metrics as a structured approach to evaluating software delivery and team performance. This data-driven approach enables teams to deliver software faster with greater reliability and improved system stability. By focusing on deployment frequency, lead time for changes, change failure rate, and mean time to recovery (MTTR), teams gain invaluable insight into their workflows.
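To make these metrics concrete, here is a minimal sketch of how a team might compute them from its own deployment and incident records. The record fields and the 30-day window are illustrative assumptions, not a standard schema.

```python
from datetime import timedelta

# Minimal sketch: computing the four DORA metrics from deployment and
# incident records. Field names (commit_time, deploy_time, failed,
# detected_at, resolved_at) are illustrative, not a standard schema.

def dora_metrics(deployments, incidents, period_days=30):
    total = len(deployments)
    failed = sum(1 for d in deployments if d["failed"])

    # Deployment frequency: deployments per day over the period
    deployment_frequency = total / period_days

    # Lead time for changes: average commit-to-deploy time
    lead_time = sum(
        (d["deploy_time"] - d["commit_time"] for d in deployments),
        timedelta(),
    ) / total

    # Change failure rate: share of deployments that caused a failure
    change_failure_rate = failed / total

    # MTTR: average time from incident detection to resolution
    mttr = sum(
        (i["resolved_at"] - i["detected_at"] for i in incidents),
        timedelta(),
    ) / len(incidents)

    return deployment_frequency, lead_time, change_failure_rate, mttr
```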

The Impact of Artificial Intelligence on DORA Metrics

Here's the kicker: DORA metrics aren't all sunshine and rainbows. Misused, they encourage a narrow focus on quantity over quality. Developers can game the system just to improve their numbers, like students cramming for exams without really understanding the material. Context skews the picture, too: developers working on modern microservices-based applications will naturally shine in DORA metrics compared to those maintaining older, monolithic systems.

The arrival of AI-generated code makes this problem much worse. While tools like GitHub Copilot can increase productivity metrics, the results don’t necessarily reflect better deployment practices or system stability. Auto-generated code could increase productivity statistics without actually improving development processes.

Despite their potential, AI coding assistants bring new challenges. Beyond concerns about the atrophy of developer skills and the ethics of training on public code, experts predict a massive increase in QA and security issues reaching production, which has a direct impact on your DORA metrics.

AI coding assistants, trained on vast amounts of public code, can inadvertently generate snippets with bugs or vulnerabilities. Imagine an AI generating code that doesn't properly sanitize user input, opening the door to SQL injection attacks. A lack of project-specific context can also lead to code that is misaligned with the project's unique business logic or architectural standards, causing functionality issues that are discovered late in the development cycle or even in production.
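To make that risk concrete, the sketch below contrasts the kind of unsafe query an assistant might plausibly suggest with a parameterized alternative. It uses Python's built-in sqlite3 module purely for illustration; the table and column names are assumptions.

```python
import sqlite3

conn = sqlite3.connect("app.db")  # stand-in database for illustration

def find_user_unsafe(username):
    # The kind of snippet an assistant might produce: user input is
    # interpolated directly into the SQL string, enabling injection
    # such as username = "x' OR '1'='1"
    query = f"SELECT * FROM users WHERE name = '{username}'"
    return conn.execute(query).fetchall()

def find_user_safe(username):
    # Parameterized query: the driver treats the input strictly as data
    return conn.execute(
        "SELECT * FROM users WHERE name = ?", (username,)
    ).fetchall()
```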

There is also a risk that developers will become too dependent on AI-generated code, leading to a lax approach to code review and testing. Small errors and inefficiencies could slip through the cracks and increase the likelihood of production defects.

These issues can directly affect your DORA metrics. More defects caused by AI-generated code can increase the change failure rate and undermine the stability of the deployment pipeline. Bugs that make it to production can drive up mean time to recovery (MTTR) as developers spend more time fixing problems introduced by AI. The extra reviews and tests needed to catch those bugs can also slow down development and lengthen lead time for changes.

Guidelines for Development Teams

To mitigate these impacts, development teams must follow strict code review procedures and implement comprehensive testing strategies. The ever-increasing volume of AI-generated code should be tested as thoroughly as handwritten code. Organizations must invest in comprehensive test automation and test management solutions that provide visibility into code quality earlier in the cycle and systematically automate testing throughout it. Development teams must handle the increased load of AI-generated code by being smarter about how they perform code reviews, apply security tests, and automate their testing. This ensures continuous delivery of high-quality software with the right level of trust.

Here are some guidelines for software development teams to consider:

Code review — Include testing best practices during code reviews to maintain code quality even for AI-generated code. AI assistants like GitHub Copilot can really contribute to this process by suggesting improvements to test coverage, identifying areas where additional testing may be required, and highlighting potential edge cases that need to be addressed. This helps teams maintain high standards of code quality and reliability.

Security review — Consider every input in your code a potential threat. To harden your application against common attacks such as SQL injection or cross-site scripting (XSS) that can slip into AI-generated code, consistently validate and sanitize all input. Establish robust governance policies to protect sensitive data, such as personal information and credit card numbers, that warrants additional layers of security.
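As one hedged illustration of that advice, the following Python sketch validates input against an allow-list and escapes user-supplied text before it reaches HTML output. The username pattern and function names are illustrative, not a prescription.

```python
import html
import re

# Allow-list pattern for usernames (illustrative, not a standard)
USERNAME_RE = re.compile(r"^[A-Za-z0-9_.-]{1,32}$")

def validate_username(raw: str) -> str:
    # Reject anything outside the allow-list rather than trying to
    # strip "bad" characters after the fact.
    if not USERNAME_RE.fullmatch(raw):
        raise ValueError("invalid username")
    return raw

def render_comment(comment: str) -> str:
    # Escape user-supplied text before it reaches HTML output so
    # injected markup (e.g. <script> tags) is rendered inert.
    return f"<p>{html.escape(comment)}</p>"
```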

Automated testing — Automate test case creation, allowing teams to quickly generate unit, functional, and integration tests. This helps handle the massive increase in AI-generated code in applications. Expand beyond developers and traditional QA by bringing in non-technical users to build and maintain these tests for end-to-end automated testing.
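As one small example, parameterized tests like the pytest sketch below are easy to generate in bulk, whether by a tool or an AI assistant, and they make edge cases explicit. The function under test and the cases are hypothetical.

```python
import pytest

# Hypothetical function under test, e.g. one suggested by an AI assistant.
def apply_discount(price: float, percent: float) -> float:
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

# Parameterized cases can be generated in bulk and keep edge cases visible.
@pytest.mark.parametrize(
    "price, percent, expected",
    [
        (100.0, 0, 100.0),     # no discount
        (100.0, 100, 0.0),     # full discount
        (19.99, 15, 16.99),    # rounding behavior
        (0.0, 50, 0.0),        # zero price
    ],
)
def test_apply_discount(price, percent, expected):
    assert apply_discount(price, percent) == expected

def test_apply_discount_rejects_invalid_percent():
    with pytest.raises(ValueError):
        apply_discount(100.0, 150)
```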

API testing — Use open specifications to create an AI-enhanced testing approach for your APIs, including creating and maintaining API tests and contracts. Seamlessly integrate these API tests with developer tools to speed development, reduce costs, and keep tests up-to-date with ongoing code changes.
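As a rough sketch of contract checking, the example below validates a live response against the response schema declared in an OpenAPI document, using requests, PyYAML, and jsonschema. The spec path, the endpoint, and the assumption that the schema is inlined (no unresolved $ref entries) are all illustrative.

```python
import requests
import yaml
from jsonschema import validate

# Load the service's OpenAPI document (path and endpoint are illustrative).
with open("openapi.yaml") as f:
    spec = yaml.safe_load(f)

# Response schema for GET /users/{id}, as declared in the spec
# (assumed to be inlined rather than referenced via $ref).
schema = (
    spec["paths"]["/users/{id}"]["get"]["responses"]["200"]
    ["content"]["application/json"]["schema"]
)

def test_get_user_matches_contract():
    resp = requests.get("http://localhost:8000/users/42", timeout=5)
    assert resp.status_code == 200
    # Fails if the payload drifts from the contract declared in the spec.
    validate(instance=resp.json(), schema=schema)
```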

Better test management — AI can support intelligent decision making, risk analysis, and optimization of the testing process by analyzing vast amounts of data to provide insight into test coverage, efficiency, and areas that need attention.
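As a simplified stand-in for that kind of analysis, the sketch below ranks tests with a basic risk heuristic based on recent failure rate, whether a test touches recently changed code, and runtime. The fields and weights are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass
class TestRecord:
    name: str
    recent_failure_rate: float   # failures / runs over a recent window
    touches_changed_code: bool   # covers files changed in this release
    duration_seconds: float

def risk_score(t: TestRecord) -> float:
    # Simple heuristic: prioritize historically failing or flaky tests
    # and tests that exercise recently changed code; prefer faster tests
    # when scores are close.
    score = t.recent_failure_rate * 10
    if t.touches_changed_code:
        score += 5
    return score - t.duration_seconds / 600

def prioritize(tests):
    # Run the riskiest tests first to surface defects earlier.
    return sorted(tests, key=risk_score, reverse=True)
```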

While GitHub Copilot and other AI coding assistants promise to increase productivity, they raise serious concerns that could make DORA metrics unmanageable. Developer productivity may be superficially increased, but at what cost? The hidden effort to investigate and fix AI-generated code could overshadow any initial gains and lead to potential disaster if not carefully managed. Organizations need an approach that is ready for AI-generated code, and they need to rethink how they interpret DORA metrics in light of AI-assisted productivity. By setting the right expectations, teams can reach new heights of productivity and efficiency.

Madhup Mishra is senior vice president of product marketing at SmartBear. With over two decades of technology experience at companies such as Hitachi Vantara, Volt Active Data, HPE SimpliVity, Dell, and Dell EMC, Madhup has held various roles in product management, sales engineering, and product marketing. He is passionate about how artificial intelligence is changing the world.

Generative AI Insights provides a place for technology leaders—including vendors and other external contributors—to explore and discuss the challenges and opportunities of generative AI. The selection is wide-ranging, from technology deep dives to case studies to expert opinion, but also subjective, based on our judgment of which topics and treatments will best serve InfoWorld’s tech-sophisticated audience. InfoWorld does not accept marketing materials for publication and reserves the right to edit any content submitted. Contact doug_dineley@foundryco.com.