Using AI to Write Code: How Much It Really Helps, and the Risks Your Team Needs to Know

For development teams bringing AI into real work

The picture inside most dev teams right now looks much the same. One person has GitHub Copilot open in their editor, another uses Cursor to write whole files in one shot, and the team lead tries pointing Claude Code at an entire folder to fix things. The questions that follow are always the same ones. How much does it really help? Is it safe enough to merge into main? And how does the team use it effectively without piling up a mountain of technical debt?

This article addresses both the benefits and the risks, based on research and security reports published during 2025 and 2026 rather than on marketing claims. It covers where AI genuinely helps, where the results run against your gut feeling, and the security risks that can be measured in numbers.

Confirmed as of 18 June 2026. The productivity and security figures are drawn from research and reports published in 2025 and 2026. The tools evolve very fast, but the patterns of benefit and risk that have been observed still work as a framework for decisions.

What AI coding tools can actually do

These tools fall roughly into three types based on how they fit into the work.

The first type is smart autocomplete like GitHub Copilot, which fills in code line by line or completes whole functions as you type. It is good for cutting down on repetitive typing and boilerplate. The second type is editors with AI embedded more deeply, like Cursor, which let you direct multi-file edits in natural language, ask questions about the codebase, and refactor in batches. The third type is agentic coding like Claude Code, which takes a task as a goal, then reads files, makes edits, runs commands, and checks the results itself across iterations.

The tasks these tools handle well in practice are tasks with clear scope: writing boilerplate, drafting initial functions, writing unit tests, explaining old undocumented code, and repetitive work with a clear pattern. Work that requires understanding the architecture of the whole system, making design trade-off decisions, or fixing bugs buried deep in a codebase the developer already knows well is a different matter, as the numbers below show.

How much it really helps, from the research

Start with the usage picture. The Stack Overflow Developer Survey 2025 found that around 84 percent of developers use or plan to use AI tools, and about half use them every day. This is mainstream tooling now, not an experimental toy. Note, however, that these are self-reported survey figures: they record what people say about their own usage, not a measurement of outcomes.

On the productivity side, there are two bodies of evidence that have to be read together.

The first set says it speeds up work with clear scope. GitHub’s controlled experiment found that Copilot helped people finish well-defined tasks significantly faster. The figure often quoted is around 55 percent faster. The important point usually cut out is that this number comes from an experiment under controlled conditions, a single well-defined task, not real whole-system work with tangled context. So it only tells us that on the type of task it was designed for, the tool genuinely shortens the time.

The second set is the most important point of this article, and it is where the results run against your gut feeling. METR’s randomized controlled trial, published in mid-2025 (arXiv 2507.09089), studied experienced developers working in open-source codebases they were already familiar with. The sample was 16 people across 246 tasks. When AI was allowed, this group took around 19 percent longer to finish tasks. Yet before starting, the participants forecast that AI would make them around 24 percent faster, and after the experiment they still believed it had made them around 20 percent faster.

This gap is what the “10x faster” marketing never tells you. People feel faster, but the clock says slower. The likely cause is that the time spent reading the AI’s output, verifying it, fixing what it guessed wrong, and adapting it to a codebase they already understand well costs more than writing it themselves.
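To make the gap concrete, the two METR percentages can be applied to a single task. This is only an illustrative sketch: the 60-minute baseline is an assumed number, not one from the study, and "20 percent faster" is read here as "20 percent less time".

```python
# Hypothetical baseline: a task that takes 60 minutes without AI (assumed).
BASELINE_MINUTES = 60.0

MEASURED_SLOWDOWN = 0.19  # METR: tasks took about 19 percent longer with AI
PERCEIVED_SPEEDUP = 0.20  # participants believed AI saved about 20 percent


def measured_minutes(baseline: float) -> float:
    """Time the clock records when AI is allowed."""
    return baseline * (1 + MEASURED_SLOWDOWN)


def perceived_minutes(baseline: float) -> float:
    """Time the developer believes the task took with AI."""
    return baseline * (1 - PERCEIVED_SPEEDUP)


actual = measured_minutes(BASELINE_MINUTES)
felt = perceived_minutes(BASELINE_MINUTES)
gap = actual - felt
print(f"measured: {actual:.1f} min, perceived: {felt:.1f} min, gap: {gap:.1f} min")
# prints: measured: 71.4 min, perceived: 48.0 min, gap: 23.4 min
```

On this assumed task, the developer would feel about 12 minutes ahead while actually being about 11 minutes behind.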

These two sets do not contradict each other. They measure different things. The first is clear-scope work under controlled conditions. The second is real work by skilled people who already know the codebase. The usable conclusion is that the effect of AI depends on the type of task and the user’s familiarity with the code; it is not a fixed multiplier that applies to every task. And more importantly, feeling faster is not evidence of being faster. You have to measure.

A caveat on the METR numbers themselves is that the sample is small and the context is specific: open-source that the participants knew very well. So it should not be generalized into a claim that AI makes everyone slower on every task. But it is enough to refute the claim that it is always massively faster.

The security of AI-generated code

This matters no less than speed, because code that ships fast but full of holes is debt that costs more to repay than the time it saved.

Veracode’s 2025 GenAI Code Security Report tested over 100 language models and found that around 45 percent of the code AI generated contained vulnerabilities that fall under the OWASP Top 10, the list of the most common and most serious security vulnerabilities in web applications. Other independent academic work landed in the same direction, above 40 percent. This proportion varies by the type of task and the language used, but the size of the problem is clear enough to conclude that AI-generated code has to pass human review and a security scan every time. Never merge straight from the AI’s output.

A new type of risk the team needs to know about is slopsquatting. Research found that around 20 percent of AI-generated code samples reference packages or libraries that do not actually exist, meaning the AI made up the names itself. What is dangerous is that these made-up names often recur in predictable patterns. Attackers register those fake package names on public registries and put malicious code inside them, waiting for developers who trust the AI to install them as recommended. Both the Cloud Security Alliance and Snyk have reported on this attack. The defense is to check that every dependency the AI recommends actually exists and is trustworthy before installing. Do not run an install command the AI suggests without checking the package name first.
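One way to enforce that check is to compare every suggested dependency against a list the team has already vetted. The sketch below is a minimal illustration, not a complete tool: the allowlist contents and the package name `fastjson-utils` are hypothetical, and the parser only handles simple requirement lines.

```python
import re

# Hypothetical allowlist: packages the team has already vetted (assumed names).
APPROVED_PACKAGES = {"requests", "numpy", "pytest"}

# Captures the leading package name of a requirement line.
_NAME_PATTERN = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name: str) -> str:
    """Normalize a package name as described in PEP 503."""
    return re.sub(r"[-_.]+", "-", name).lower()


def unvetted_packages(requirement_lines, approved=APPROVED_PACKAGES):
    """Return package names in requirement_lines that are not on the allowlist."""
    approved_normalized = {normalize(name) for name in approved}
    flagged = []
    for line in requirement_lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and options like -r
            continue
        match = _NAME_PATTERN.match(line)
        if match and normalize(match.group(1)) not in approved_normalized:
            flagged.append(match.group(1))
    return flagged


# "fastjson-utils" stands in for a name an AI assistant might invent.
suggested = ["requests>=2.31", "numpy", "fastjson-utils==1.0  # suggested by the AI"]
print(unvetted_packages(suggested))
# prints: ['fastjson-utils']
```

An allowlist is stricter than asking the registry whether a name exists, which matters here because a slopsquatted package does exist once an attacker has registered it.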

There is another layer of long-term quality risk. GitClear has reported that codebases relying heavily on AI contain more duplicated code, or code clones, which makes them harder to maintain over time. This is a trend signal, not a universal benchmark, but it lines up with the common-sense view that generating code faster without designing for reuse accumulates into a burden later. Fast when writing does not mean cheap when you have to come back and fix it.
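A team can watch for this trend in its own code with a rough duplicate-block check. The sketch below is an assumption-laden illustration and not GitClear’s methodology: it treats any run of three identical non-blank lines as a clone, and the function names in the sample are made up.

```python
from collections import defaultdict

WINDOW = 3  # consecutive matching lines that count as a clone (assumed threshold)


def find_clones(source: str, window: int = WINDOW):
    """Map each repeated block of `window` lines to the line numbers where it starts."""
    lines = [
        (number, text.strip())
        for number, text in enumerate(source.splitlines(), start=1)
        if text.strip()  # ignore blank lines
    ]
    seen = defaultdict(list)
    for i in range(len(lines) - window + 1):
        block = tuple(text for _, text in lines[i:i + window])
        seen[block].append(lines[i][0])
    return {block: starts for block, starts in seen.items() if len(starts) > 1}


sample = """
def total_a(items):
    result = 0
    for item in items:
        result += item.price
    return result

def total_b(items):
    result = 0
    for item in items:
        result += item.price
    return result
"""

clones = find_clones(sample)
print(sorted(clones.values()))
# prints: [[3, 9], [4, 10]]
```

Tracking the number of such blocks per release gives a trend line, which is more useful than any single count.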

The last risk is leaked secrets and credentials. The Cloud Security Alliance has reported that fast, “vibe coding” style development makes it easier for API keys, passwords, or sensitive data to slip into files committed to the repo, because people are in a hurry and do not check what the AI put in.

How to use it effectively in a team

From all the evidence, here are the practices that work for real teams.

First, people must review and understand the code the AI writes every time, especially the parts to do with security, authentication, and money. If anyone on the team cannot explain the code they committed, they will not be able to debug it or take responsibility for it when it breaks. Do not copy-paste what you do not understand.

Second, put security scanning into the pipeline, both SAST to scan the code itself and dependency scanning to check packages, and set a rule that every dependency added must be verified to actually exist and come from a trusted source before installation. This is the gate that catches both OWASP vulnerabilities and slopsquatting.

Third, match the task to the tool correctly. Use AI for repetitive work, drafts, and clear-scope work. For work that requires deep understanding of the system in a codebase the team already knows, do not assume AI will always be faster, because METR shows that in some contexts it is slower.

Fourth, never put API keys, passwords, or customer data anywhere the AI or git can see, and set up secret scanning in the repo as the last line of defense.
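To show what secret scanning looks for, here is a minimal sketch of a pattern-based check. The three rules are illustrative assumptions only; dedicated scanners ship far larger rule sets, and the key in the sample is a made-up placeholder.

```python
import re

# Illustrative patterns only; real scanners use far more rules.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "hardcoded_credential": re.compile(
        r"(?i)\b(api[_-]?key|secret|password|token)\b\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def find_secrets(text: str):
    """Return (line_number, rule_name) pairs for lines that look like secrets."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for rule_name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, rule_name))
    return findings


# The key below is a placeholder, not a real credential.
sample = '''
import os
API_KEY = "not-a-real-key-0123456789"
db_password = os.environ["DB_PASSWORD"]
'''
print(find_secrets(sample))
# prints: [(3, 'hardcoded_credential')]
```

The last line of the sample passes because the password is read from the environment instead of being written into the file, which is the habit the rule is meant to encourage.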

Fifth, and most important from a management standpoint, measure from reality. Use time per task cycle, the number of bugs that escaped to production, and review time as the metrics, not the team’s feeling that it is faster, because the METR results show that feeling and reality can run in opposite directions. Teams that do not measure will make decisions from an illusion.
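As a starting point for that measurement, the comparison can be as simple as the median cycle time before and after adoption. The sketch below uses made-up numbers, and a before-and-after comparison like this is much weaker evidence than a randomized trial, so treat the result as a signal rather than proof.

```python
from statistics import median

# Hypothetical cycle times in hours per task, recorded by the team (assumed data).
cycle_hours_before_ai = [5.0, 6.5, 4.0, 8.0, 5.5]
cycle_hours_with_ai = [4.5, 7.0, 6.0, 9.5, 5.0]


def relative_change(before, after):
    """Relative change in median cycle time; positive means slower with AI."""
    baseline = median(before)
    return (median(after) - baseline) / baseline


change = relative_change(cycle_hours_before_ai, cycle_hours_with_ai)
print(f"median cycle time changed by {change:+.0%}")
# prints: median cycle time changed by +9%
```

The median is used instead of the mean so that one unusually long task does not dominate the comparison. The same function works for review time or escaped bug counts.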

Update box: Right now (June 2026)

The information in this chapter is confirmed as of 18 June 2026. These tools change versions very fast and each one’s capabilities will keep improving, but these three patterns will likely hold for quite a while: the benefit depends on the type of task, the feeling of being faster cannot be trusted without measurement, and AI-generated code must always pass a security check. Specific numbers, such as Veracode’s 45 percent or METR’s 19 percent, should be checked against newer reports periodically, because both the models and the measurement methods will change.

A feature-by-feature and price comparison of the individual tools, between GitHub Copilot, Cursor, and Claude Code, has not been verified in this round, and will be split into a separate chapter once every source is fact-checked.

Next step

If your team does not yet have security scanning in the pipeline, start there before anything else, because it is the gate that handles the two biggest risks in this chapter at once: vulnerabilities in the code and fake packages. Then set up real measurement for 1 to 2 months, recording time per task cycle and bug counts both before and after using AI, and only then decide from the numbers which types of work to expand its use into. That is how you make AI a real accelerator for the work, not an illusion of speed.

References

Stack Overflow Developer Survey 2025 (AI section): https://survey.stackoverflow.co/2025/
METR, Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity (arXiv 2507.09089): https://arxiv.org/abs/2507.09089
Veracode 2025 GenAI Code Security Report: https://www.veracode.com/resources/analyst-reports/2025-genai-code-security-report/