The next challenge for AI: knowing what to build

Steven Henty · 10 min read

AI has got very good at building things. What it hasn't figured out yet is what to build, and why. That's not an intelligence problem. It's a context problem, and it's one that human teams have been wrestling with for years, with mixed results. With agents now writing code and running experiments autonomously, it's about to matter a lot more.

We had three systems. Product management, engineering, support. Everyone had access to all three. In theory, an engineer picking up a ticket could open the product tool, read the customer quotes, and build something that addressed the actual pain. The support team could surface patterns that would reshape priorities: the exact language customers used, the workarounds they'd built, the features they kept asking about.

In practice, people lived where their work happened. The engineers lived in the engineering tool. The product team lived in the product tool. The support team lived in the support tool. It wasn't that nobody ever crossed over. Some did. But it depended on individual curiosity, not on how the work flowed. The spec arrived, the work got done, and the context that would have changed how it got done stayed where it was.

The instinct is always to fix this with process. Weekly syncs. Shared dashboards. "Everyone should check the other systems." A new checklist on top of the last checklist. It doesn't stick. People gravitate towards the system where their work happens. Everything else is overhead they'll do for a week, maybe two, and then stop. McKinsey found that employees spend 1.8 hours per day just searching for and gathering information. The information is there. Finding it has become the job.

Nobody decided to silo the teams. It's just where things naturally end up. People use the tool that does their job, and the context stays where it was created. But when context did flow, the results were obvious. An engineer who happened to read through the support tickets before picking up a feature would build it differently. Better. More aligned with what customers actually needed rather than what the spec described. "Happened to" is the key phrase. It was serendipity, not a system.


The loop

There's a loop that runs in every product team, whether they've named it or not. Identify an opportunity. Build something. Measure whether it worked. Feed the results back into the next decision.

At small scale, this loop runs naturally. A small remote team on Slack, maybe one or two channels for everything. Everyone sees the customer quote get pasted in. Everyone catches the thread where someone shares a support pattern. Everyone knows why yesterday's decision got made, because they were in the conversation when it happened. The context is ambient. Nobody has to go looking because it's already there.

Then the team grows. One channel becomes ten. Team channels appear where people outside the team feel less welcome. Private channels emerge where context is completely hidden, sometimes even from leadership. Each split makes sense in isolation. Nobody wants to be spammed with irrelevant messages. But each one is another fracture in the shared picture.

At scale, each step of the loop gets handled by different people with different tools. The people identifying opportunities aren't the people building. The people building aren't the people measuring. The people measuring aren't feeding results back to the people making the next decision.

The loop doesn't stop working because someone failed. It degrades because each step is missing context from the step before it. The engineer builds the feature without knowing why it was prioritised. The PM measures outcomes without knowing what trade-offs the engineer made during implementation. The results feed back into a planning process that has already moved on to the next thing.

Not everything that goes through this loop needs the same level of rigour. A button change, a copy tweak: the loop still runs; it just runs quickly. Ship it, measure it, move on. Larger things need more care. Pre-committed success criteria. Proper measurement. Someone making sure the results actually inform the next decision rather than disappearing into a dashboard nobody checks. The loop is the same at every scale. The discipline scales with the stakes.
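
To make the shape concrete, here's a minimal sketch of the loop in Python. Everything in it is invented, including the stakes scale, but it shows the two things that matter: the feedback edge is explicit, and rigour is a parameter of the loop rather than a separate process bolted on top.

```python
from dataclasses import dataclass, field

@dataclass
class Experiment:
    """One pass through the loop. Every field here is illustrative."""
    opportunity: str
    stakes: int                                  # rough cost of being wrong, 1-10
    success_criteria: list[str] = field(default_factory=list)
    results: dict[str, float] = field(default_factory=dict)

def build(exp: Experiment) -> None:
    """Stand-in for actually doing the work."""
    print(f"building: {exp.opportunity}")

def measure(exp: Experiment) -> dict[str, float]:
    """Stand-in for whatever measurement the change deserves."""
    return {"conversion_delta": 0.0}

def run_loop(exp: Experiment, history: list[Experiment]) -> None:
    # Discipline scales with the stakes: larger bets must pre-commit
    # to success criteria before any work starts; a copy tweak ships.
    if exp.stakes >= 7 and not exp.success_criteria:
        raise ValueError("high-stakes work needs pre-committed criteria")
    build(exp)
    exp.results = measure(exp)
    # The step teams drop most often: results feed the next decision
    # instead of disappearing into a dashboard nobody checks.
    history.append(exp)
```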

GitLab's 2025 survey of over 3,000 DevSecOps professionals found that 60% of organisations use five or more tools for software development. They called it the "AI Paradox": AI accelerates coding, but fragmented toolchains create new bottlenecks. Their number: seven hours per week per person lost to inefficient processes, collaboration barriers, and limited knowledge sharing across teams. That's almost a full working day, every week, spent not on building but on finding things out and getting people on the same page.


What the fastest-growing company in the world figured out

Amol Avasare, head of growth at Anthropic, was on Lenny's Podcast recently talking about something that caught my attention. Anthropic's growth team has built a system where an agent identifies opportunities for growth experiments, builds the experiment, checks it against brand guidelines and quality standards, and then analyses the results. Small experiments run fast with minimal human review. Larger ones get more oversight.

The thing that makes it work isn't the agent. It's that the agent has context. Brand guidelines, previous experiment results, what's been tried before, what worked and what didn't: all of it is available to the agent as part of the process. Not sitting in a separate system waiting for someone to go find it. Built into the loop.
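
Amol didn't share implementation details, so what follows is my own guess at the shape, not Anthropic's code. The store, its fields, and the opportunity string are all made up. The one idea it illustrates is that assembling context is a step of the loop, not something the agent goes off and finds.

```python
from dataclasses import dataclass

@dataclass
class ContextStore:
    """In-memory stand-in for wherever the context actually lives."""
    brand_guidelines: str
    past_experiments: list[str]

def assemble_context(store: ContextStore, opportunity: str) -> str:
    """Build the agent's briefing up front, so nothing depends on the
    agent knowing which other system to go and look in."""
    past = "\n".join(f"- {e}" for e in store.past_experiments)
    return (
        f"Opportunity: {opportunity}\n\n"
        f"Brand guidelines:\n{store.brand_guidelines}\n\n"
        f"Previous experiments and outcomes:\n{past}"
    )

store = ContextStore(
    brand_guidelines="Plain language. No dark patterns.",
    past_experiments=[
        "2024-11 pricing-page test: +2% signups",
        "2025-01 onboarding email test: no effect",
    ],
)
print(assemble_context(store, "simplify the signup flow"))
# The briefing above becomes part of the agent's input. Small experiments
# go straight to execution; larger ones route to a human reviewer first.
```

The design choice worth noticing is the direction: the loop pushes context to the agent, rather than the agent pulling from systems it doesn't know exist.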

Amol was talking about growth experiments specifically, but the loop is the same for any product decision. Identify the opportunity. Build something. Measure whether it worked. Feed the results back. The principle is the same whether you're running a growth experiment at Anthropic or deciding what feature to build next at a 20-person startup.

He made another observation that stuck with me. The thing that makes this hard for larger projects isn't the loop itself. It's the cross-functional coordination. Getting six people to align. His head of design put it best: "We will have AGI and it will still be impossible to get six people in a room to align."

I think shared context would take a lot of the pain out of that. Not because it eliminates disagreement. People will still disagree about priorities, about trade-offs, about what matters most. But at least they'd be disagreeing from the same starting point. I'd bet that a good chunk of the friction in those conversations comes from people arguing past each other because they're each working from a different slice of the picture. That's not a disagreement. That's a failure to share information.


Agents inherit the problem and amplify it

Here's where it gets interesting. And urgent.

Agents are now part of the team. They're writing code, running experiments, analysing data. And they have the same problem the engineers had, except worse.

Even on a fully remote team, humans accumulate context over time. The Slack thread you happened to read three weeks ago. The customer call recording someone shared. The standup where someone mentioned a pattern in the support tickets. You build up a fuzzy sense of what matters, even if you can't always point to where you heard it.

Agents don't have that. They have exactly what you give them. Nothing more. An agent building a feature has no idea that the support team fielded 47 tickets about the same friction last month, unless that context is somewhere the agent can reach. It doesn't accumulate a sense of things over time. It starts cold every session.

The current wave of solutions looks like this: connect your agent to your project tracker. Give Cursor your codebase. Set up an integration for your support system. Each agent gets smarter about its own slice. The engineer's agent knows the codebase inside out. The PM's agent can recite every opportunity. The support agent sees every ticket.

But this is the same trap. Three teams with three agents, each with deep context about one silo and virtually zero awareness of the others. The engineer's agent has never seen a customer quote. The PM's agent doesn't know what shipped last week. The support agent has no idea what's being built. A recent report on context management found that 57% of organisations are duplicating AI efforts across departments because there's no shared context layer connecting them. Each team builds its own context, picks its own tools, and defines its own version of what matters.

Apparently we're now in the era of AI psychosis. Spend long enough talking to an agent and you come out convinced you've cracked it. The thing keeps refining your reasoning, agreeing with your approach, telling you the logic is sound. (Is that actually psychosis, or am I stretching the term to make a point? A bit of both.)

Now multiply it across a team. The PM's agent confirms the PM's priorities are spot on. The engineer's agent confirms the technical approach is solid. The support lead's agent confirms the customers are being heard. Everyone walks into the meeting more certain than ever, armed with AI-backed evidence, all looking at a different slice of the same picture. Alignment just got harder, not easier.

Individual tool memory is the same problem that "everyone has access to all three tools" was. The information is technically reachable. Nothing connects it. You've added agents to the team and given them the same fragmented view that was already failing the humans.

The problem isn't giving agents more memory. It's giving them our shared context. We might still be deluded, but at least we'd be on the same page.
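
To put that distinction in code: this is a toy, and every name in it is invented, but the shape is the whole argument. One pool that every team writes into and every agent briefs from, instead of three agents with three private memories.

```python
class SharedContext:
    """One pool of context, written by every team, read by every agent.
    A toy: a real layer would need retrieval, permissions, freshness."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, str]] = []   # (source team, note)

    def add(self, source: str, note: str) -> None:
        self._entries.append((source, note))

    def briefing(self, task: str) -> str:
        # Naive: return everything. The point is that every agent sees
        # the same picture, not that this retrieval strategy scales.
        notes = "\n".join(f"[{src}] {note}" for src, note in self._entries)
        return f"Task: {task}\n\nWhat the rest of the team knows:\n{notes}"

ctx = SharedContext()
ctx.add("support", "47 tickets last month about the same export friction")
ctx.add("product", "export reliability prioritised after the churn review")
ctx.add("engineering", "export worker rewrite shipped last week")

# The same briefing goes to whichever agent picks the task up, so the
# PM's agent and the engineer's agent start from one shared picture.
print(ctx.briefing("improve the CSV export flow"))
```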


A question worth asking

I keep seeing this pattern at every scale. A small team where context flows naturally. A growing team where it fragments across three tools. A larger company where the fragmentation has become so deep that entire teams duplicate work because they don't know what the team next door already tried.

And now, agents entering the loop with deep expertise in their own silo and virtually no awareness of anything outside it. Each one getting really good at its own job while the bigger picture drifts.

The information exists. It always exists. Customer quotes, support patterns, experiment results, the reasoning behind last month's decision. It's all somewhere. The problem has never been generating context. We generate mountains of it. The problem is that it's not where it needs to be when someone, or something, is making a decision.

So here's the question I think every team should be asking right now: when someone picks up a task (a teammate, an agent, you), what do they actually know? Do they know why this work was prioritised? Do they know what customers have said about this problem? Do they know what was tried before and why it didn't work? Or do they just know what's in the ticket?

And the harder follow-up: if they don't know those things, how is the agent any different from the engineer who never checked the product tool?

I don't think anyone has fully solved this yet. Anthropic is further along than most because they're building the tools and using them at the same time. But I think the shape of the answer is becoming clearer: context can't be something you go and find. It has to be something that flows to where decisions happen, whether those decisions are being made by a person or an agent.

AI can build anything now. The next challenge is making sure it knows what's worth building.
