Cognition’s Scott Wu says AI coding agents shouldn’t replace humans
Cognition’s Scott Wu pushed back on the idea that Devin is built to replace programmers outright. That is a slightly awkward stance, given the company also says Devin commits a huge share of Cognition’s own code.
His framing is more “AI coding buddy” than “your job has been eaten by a laptop goblin.” Still, the tension is hard to miss: better agents, fewer excuses for bloated engineering teams… or so the argument goes.
This AI startup will clean your home for free to train future robots
Shift is offering free home cleaning, with a catch that is both handy and faintly unsettling: cleaners wear a camera-equipped “magic hat” so the company can gather robot-training data.
The pitch is simple: you get a clean flat, they get video of domestic chores. A tidy bargain, maybe.
Shift says it blurs sensitive details and anonymizes footage, but the broader question is still sitting there like a sock under the sofa: how much home privacy are people willing to trade for convenience?
Anthropic releases Claude Opus 4.8
Anthropic rolled out Claude Opus 4.8 with upgrades across coding, agentic workflows, reasoning, and professional work. The big sell is reliability: fewer unsupported claims, better tool use, and more self-checking.
Claude Code also gets dynamic workflows, letting the model plan, spin up parallel sub-agents, verify outputs, and report back. That sounds dry until you realize it’s basically project management in a trench coat.
Pricing stays split between standard and fast modes, with Anthropic leaning harder into effort controls so users can trade off speed, quality, and token burn.
Foxconn has immense confidence in growth momentum due to AI, chairman says
Foxconn’s chairman said AI demand is changing the company’s usual seasonal rhythm. The old mid-year supplier slump? Apparently it is not behaving normally anymore.
The reason is cloud giants’ gigantic AI spending, which Foxconn sees as its own market opportunity. That’s the hardware side of the AI boom, less glossy than chatbots, but very much where the money-pipes are clanging.
Foxconn is already a major Nvidia server maker, so its optimism is basically a temperature check on the AI infrastructure race.
A shared playbook for trustworthy third party evaluations
OpenAI published guidance on third-party AI evaluations, arguing that tests need clearer detail about what was evaluated, how it was tested, and what the results can prove.
The core point is surprisingly practical: frontier AI evals can’t just be leaderboard-shaped guesswork. Evaluators need to explain the tested system, prompts, safeguards, validity checks, and where claims stop.
That matters because as models get more agentic, shallow tests can make systems look safer or stronger than they are. Small paperwork energy, big consequences.
FAQ
Are AI coding agents like Devin meant to replace programmers?
Scott Wu frames AI coding agents as coding partners rather than full replacements for human programmers. The article does point to a tension, however: Devin is also described as contributing a large share of Cognition’s own code. In practical terms, the takeaway is that these tools may reduce some routine engineering work while still depending on humans for judgment, direction, and accountability.
Why is Shift offering free home cleaning for AI training data?
Shift is offering free home cleaning because it wants physical-world video data of domestic chores to train future robots. Cleaners wear a camera-equipped “magic hat” while working, producing footage that can help AI systems understand household tasks. The exchange is clear: customers get a clean home, while the company gets data from private living spaces.
How does Shift handle privacy when collecting home-cleaning footage?
The article says Shift claims it blurs sensitive details and anonymizes footage. That may reduce some privacy risks, but it does not remove the broader concern of recording inside people’s homes. For users, the central question is whether the convenience of free cleaning feels worth that level of data collection.
What is new in Claude Opus 4.8?
Claude Opus 4.8 is described as improving coding, agentic workflows, reasoning, and professional work. The update centers on reliability, including fewer unsupported claims, stronger tool use, and more self-checking. Claude Code also gains dynamic workflows, where the model can plan, run parallel sub-agents, verify outputs, and report results.
Why does Foxconn’s AI boom optimism matter?
Foxconn’s confidence matters because it reflects the hardware side of the AI boom. The company’s chairman said AI demand is changing its usual seasonal pattern, with cloud giants’ infrastructure spending creating a major market opportunity. Since Foxconn is already a major Nvidia server maker, its comments serve as a strong signal for demand in AI infrastructure.
What does OpenAI say makes third-party AI evaluations trustworthy?
OpenAI argues that AI evaluations need clearer explanations of what system was tested, how it was tested, and what the results genuinely demonstrate. That includes details about prompts, safeguards, validity checks, and the limits of any claims. The point is especially important for more agentic models, where shallow tests can make systems appear safer or more capable than they are.