AI in software development: gains, gaps, and what’s next 

Author

Artur Sossin
Lead AI Specialist

AI is reshaping software development, but the real impact depends on context. Google’s RCT shows developers completing a complex enterprise task 21% faster, and field experiments at Microsoft and Accenture report 26% more completed tasks. Yet Stanford’s 100,000-developer study finds near-zero gains on large, complex codebases, and security risks remain. The overall picture is mixed: some wins, some gaps. To save you time, here’s our overview of what the data reveals, what we’ve learned in practice, and what matters if you lead engineering or product teams.

TL;DR

  • AI helps you best with well-scoped tasks, not everything.  
  • Roles stay the same, while workflows get small, targeted upgrades.
  • Quality and security still come from discipline and structure.  
  • Measure end to end and give it at least three months before judging the actual impact.
  • Keep sensitive data out of prompts and note AI assistance for traceability. 

The tools that work (for us) 

We rely on Copilot-style assistance (often powered by Claude or Gemini) for coding tasks and on Perplexity for business research. We are also piloting an Azure-based agent platform designed to replace isolated tools with a coordinated system of micro-agents covering the whole software development cycle: coding, documentation, CI/CD, research, and more.

  • Copilot-style tools give developers immediate drafting speed and reduce boilerplate.  
  • Perplexity helps analysts find and structure relevant insights faster.  
  • The agent platform is our long-term bet: instead of juggling disconnected tools, we aim for a unified, policy-driven environment that scales across teams; a sketch of the routing idea follows this list.
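
To make “policy-driven” concrete, here is a minimal sketch of the routing idea: every task passes a single checkpoint where policy is enforced before any micro-agent runs. All names here (Agent, route_task, ALLOWED_TASKS) are hypothetical illustrations, not our platform’s actual API.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch: each micro-agent owns one narrow task type, and a
# central policy decides which task types may run at all.
@dataclass
class Agent:
    name: str
    handles: set[str]            # task types this agent covers
    run: Callable[[str], str]    # takes a task description, returns a result

ALLOWED_TASKS = {"coding", "documentation", "ci_cd", "research"}

def route_task(task_type: str, payload: str, agents: list[Agent]) -> str:
    """Send a task to the first agent whose scope covers it."""
    if task_type not in ALLOWED_TASKS:
        raise ValueError(f"task type '{task_type}' is not allowed by policy")
    for agent in agents:
        if task_type in agent.handles:
            return agent.run(payload)
    raise LookupError(f"no agent registered for '{task_type}'")

agents = [
    Agent("coder", {"coding"}, lambda p: f"draft code for: {p}"),
    Agent("writer", {"documentation"}, lambda p: f"draft docs for: {p}"),
]
print(route_task("coding", "add retry logic to the HTTP client", agents))
```

The point of the single checkpoint is that usage rules and data restrictions live in one place instead of being re-implemented per tool.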

Roles haven’t shifted, and we don’t expect them to. We’re not tearing up workflows; we’re augmenting them. Where AI helps most: market research, first drafts of code, test scaffolds, basic documentation, and focused investigations. The main caveat we keep an eye on is security:

  • Earlier studies showed developers using AI assistants often wrote less secure code while feeling more confident [6].  
  • Newer work confirms the risk persists, but shows that structured feedback and vulnerability-aware prompts can reduce it [7,8]; a sketch of such a prompt follows this list.
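
For illustration, here is what a vulnerability-aware prompt can look like in practice; a minimal sketch, assuming a plain prompt-wrapping step. The checklist wording is our own, not the exact prompts evaluated in [7,8].

```python
# Illustrative sketch: prepend explicit security requirements to a coding
# request before it goes to the assistant. Wording is our own, not from [7,8].
SECURITY_CHECKLIST = """When writing code, also:
- validate and sanitize all external input,
- use parameterized queries, never string-built SQL,
- avoid hardcoded credentials; read secrets from the environment,
- flag any part of the answer you are not sure is secure."""

def vulnerability_aware_prompt(task: str) -> str:
    """Wrap a plain coding task with explicit security constraints."""
    return f"{task.strip()}\n\n{SECURITY_CHECKLIST}"

print(vulnerability_aware_prompt("Write a login handler for our Flask app."))
```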

This is still an evolving field, and we continue to monitor new findings and adapt our practices. 

The reality check on speed 

It’s too early to claim a revolution. Juniors often feel faster; seniors point out that AI-generated code still needs work to meet the quality bar.

The data backs this up:  

  • +21% faster on a complex enterprise task in a Google RCT [1]. 
  • +26% more pull requests in large field experiments at Microsoft, Accenture, and a Fortune 100 company [2]. 
  • Near‑zero net gains (−5% to +5%) on large, complex codebases in Stanford’s 100,000 developer study [3]. 
  • A recent Upwork survey of 1,000 U.S. business leaders shows AI adoption is widespread, but productivity gains are uneven, with most companies still in early stages of integrating AI into workflows [4]. 
  • Stanford’s 2025 AI Index further confirms that impact is highly context-dependent [5].

You are still responsible for security 

AI doesn’t replace engineering discipline. We stick to three core habits when working with copilots (illustrated after the list):

  1. Be specific and clear in what you ask for. 
  2. Verify the result with your own skill. 
  3. Move in small steps so you don’t break working code. 
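
To make habits 1 and 3 concrete, here is the same request phrased vaguely and then well scoped. The file, function, and test names are hypothetical.

```python
# Illustrative only: the same request phrased vaguely vs. well scoped.
# The file, function, and test names below are hypothetical.
vague_prompt = "Fix the bug in the payment code."

scoped_prompt = (
    "In payments/refunds.py, refund_order() rounds amounts to whole euros "
    "before calling the gateway. Change it to keep two decimal places, "
    "update the affected test in tests/test_refunds.py, and change nothing else."
)
# The scoped version names the file, the function, the observed and desired
# behavior, and the blast radius: everything needed to verify the result
# (habit 2) in one small, reviewable step (habit 3).
```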

Beyond the desk-level habits, pipelines run static analysis, secrets and license checks, and dynamic tests on staging; a sketch of such a gate follows below. We also recommend noting AI assistance in PRs for traceability.
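
For concreteness, a minimal sketch of such a gate as a single script. The tool choices (gitleaks for secrets, bandit for static analysis, pytest for dynamic tests) are common options we name as examples, not a prescribed stack.

```python
import subprocess
import sys

# Minimal sketch of a pre-merge gate: secrets scan, static analysis, tests.
# Tool choices (gitleaks, bandit, pytest) are examples, not a prescribed stack.
CHECKS = [
    ("secrets scan", ["gitleaks", "detect", "--source", "."]),
    ("static analysis", ["bandit", "-r", "src"]),
    ("dynamic tests", ["pytest", "-q"]),
]

def run_gate() -> int:
    for name, cmd in CHECKS:
        print(f"== {name}: {' '.join(cmd)}")
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"FAILED: {name}")
            return result.returncode
    print("all checks passed")
    return 0

if __name__ == "__main__":
    sys.exit(run_gate())
```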

How can you measure value? 

It is not as straightforward as it seems. Focus on running selected pilots and comparing them to a baseline for at least one quarter; this has worked well for us. Then measure actual business KPIs: project throughput, quality, reliability, and cost, where cost includes people time and cloud compute. Your goal should be to produce working software that meets your quality bar with fewer resources. The sketch below shows the kind of comparison we mean.
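
A minimal sketch of that comparison, with made-up baseline and pilot numbers; metric names and values are purely illustrative.

```python
# Hypothetical quarter-over-quarter comparison of a pilot team against its
# own pre-AI baseline. All metric names and numbers are made up.
baseline = {"prs_merged": 120, "escaped_defects": 9, "cost_eur": 84_000}
pilot    = {"prs_merged": 138, "escaped_defects": 11, "cost_eur": 87_500}

def delta(metric: str) -> float:
    """Relative change from baseline to pilot, in percent."""
    return (pilot[metric] - baseline[metric]) / baseline[metric] * 100

for metric in baseline:
    print(f"{metric}: {delta(metric):+.1f}%")

# Throughput up 15% is only a win if quality and cost per delivered
# feature do not move the wrong way at the same time.
```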

Risks to watch out for: 

  • Quality decrease when working on complex projects.
  • Over-reliance on tools that erodes your core skills.
  • Leakage of sensitive data (secrets, IP, internal documents).

You can mitigate these risks through a guided platform approach, automated testing, clear usage guidelines, and role-based training. Remember: secrets and IP must never land in prompts. One cheap guardrail is sketched below.
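
A minimal sketch of that guardrail, assuming simple regex screening of prompt text; the patterns are illustrative and far from exhaustive, and a real setup would pair this with a dedicated secret scanner.

```python
import re

# Minimal sketch: redact common secret shapes before text goes into a prompt.
# Patterns are illustrative and far from exhaustive.
SECRET_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),  # AWS access key ID shape
    re.compile(r"(?i)(api[_-]?key|token|password)\s*[:=]\s*\S+"),
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----[\s\S]+?-----END [A-Z ]*PRIVATE KEY-----"),
]

def scrub(text: str) -> str:
    """Replace anything that looks like a secret with a placeholder."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

print(scrub("deploy with api_key=sk-12345 to staging"))
# -> "deploy with [REDACTED] to staging"
```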

Does AI change how you lead teams? 

The fundamentals don’t change. Set clear goals, pick tools that fit the work, and empower your people. Add one extra lens: trust and verification. AI can be wrong with confidence, so the team needs to keep an eye on the output and remains responsible for the outcome.

So, what’s coming up… in 2-3 years? 

Coding will remain a valuable skill. Tools will improve, workflows will stabilize, and productivity gains will be modest, unless there’s a true breakthrough in model design, not just a new LLM version number. For a small team starting now:

  • Investigate your process and find the real bottleneck.  
  • Pick one or two tools to pilot.  
  • Attach business KPIs such as project velocity.
  • Measure for at least a quarter, then decide to continue or switch.  
  • Share what works across teams to speed up learning. 

Read more on the topic: 

[1] Google RCT, How much does AI impact development speed? An enterprise-based randomized controlled trial, https://arxiv.org/abs/2410.12944 

[2] MIT Draft, The Effects of Generative AI on High-Skilled Work: Evidence from Three Field Experiments with Software Developers, https://economics.mit.edu/sites/default/files/inline-files/draft_copilot_experiments.pdf 

[3] Stanford talk, Does AI Actually Boost Developer Productivity? (100k Devs Study), https://www.youtube.com/watch?v=tbDDYKRFjhk 

[4] Upwork Research, AI at Work: New Models for the Future of Work, https://www.upwork.com/research/ai-enhanced-work-models 

[5] AI Index 2025, Stanford HAI AI Index Report 2025, https://hai.stanford.edu/ai-index/2025-ai-index-report 

[6] CCS’23, Do Users Write More Insecure Code with AI Assistants?, https://arxiv.org/html/2211.03622v3

[7] USENIX Security 2024, Large Language Models for Code Analysis: Do LLMs Really Do Their Job?, https://www.usenix.org/system/files/usenixsecurity24-fang.pdf 

[8] Secure Code Generation, Guiding AI to Fix Its Own Flaws: An Empirical Study on LLM-Driven Secure Code Generation, https://arxiv.org/html/2506.23034v1

Get in touch

Artur Sossin
Lead AI Specialist
Contact us