Do More Newsletter

This issue features the article "Why AI Makes Mistakes: Understanding AI's Limits and Biases," along with product news about PetClaw AI – Autonomous Desktop Companion, Snowflake Project SnowWork – Autonomous Enterprise AI Workbench, Unily Glass – AI-Native Employee Productivity Layer, Beautiful.ai – Context-Aware Workflow for Presentations, and Picsart AI Copilot in Flow & AI Agent Marketplace – Creative AI that Acts.

Keep up to date on the latest products, workflows, apps and models so that you can excel at your work. Curated by Duet.

If you want help deploying AI in your business, email us at [email protected].

Stay ahead with the most recent breakthroughs—here’s what’s new and making waves in AI-powered productivity:

PetClaw AI is a new “desktop pet” that quietly sits on your computer and works alongside you, automating repetitive tasks across apps while learning your habits over time. Instead of being just a cute animation, PetClaw acts as an AI companion that can summarize online discussions, monitor trends for content creation, and adapt to your workflows, so its support gets smarter the more you use it. With an expanding Skill Store, you can add modules for things like financial monitoring, automated data collection, or video generation without needing to train models yourself, making it an accessible automation layer for everyday users and creators.

Project SnowWork is Snowflake’s new autonomous enterprise AI platform designed to function as a proactive “AI partner” for business users, orchestrating planning, analysis, and execution from a single interface. Instead of just answering questions, SnowWork can handle complex multi-step workflows like querying data, running analyses, synthesizing insights, and preparing presentation-ready deliverables from conversational prompts. It also includes preconfigured, role-specific skills (for finance, sales, marketing, operations, and more) so non-technical users can move from intent to action without relying on analysts or hopping between dashboards and tools.

Unily Glass is a new conversational “execution layer” that sits on top of a company’s existing systems and turns the Unily employee experience platform into a place where work actually gets done, not just stored. Employees can move directly from information to action—raising IT tickets, requesting time off, or executing workflows in tools like Workday, Salesforce, ServiceNow, Microsoft 365, Google Workspace, and Slack—without switching apps or juggling multiple AI assistants.

Beautiful.ai, already known for its AI-powered presentation software, has launched a new context-aware AI workflow that takes users from first prompt to finished deck in minutes via a smarter conversational interface. The system can ingest prompts, outlines, and documents, then automatically structure slides, apply design best practices, and iterate with you in real time so you spend less time formatting and more time on the story.

Picsart, the design platform serving over 130 million creators, has opened the waitlist for its AI Agent Marketplace and unveiled an AI Copilot in Flow that aims to turn every creator into their own production team. Instead of only generating one asset at a time, Picsart’s specialized agents are designed to plan, execute, and deliver complete creative workflows, using market research, live data, and integrations while you control everything via simple briefs and approvals, even from WhatsApp or Telegram. For creators overwhelmed by editing, repurposing, and publishing across channels, these agents promise to take on the operational heavy lifting so you can focus on ideas and strategy.

PetClaw AI looks like a playful desktop companion, but under the hood it is a fully autonomous AI agent built to live where you work and quietly clear friction from your day. Rather than existing as yet another chat window, PetClaw runs across applications, observes your typical patterns, and uses long-term memory to become more helpful over time. Early users report it can automatically summarize online discussions and track trends for content creation, effectively acting as a background researcher that keeps pace with fast-moving conversations and feeds you the highlights.

The headline update is PetClaw’s new Skill Store, which turns this single companion into a modular platform that can grow with your needs. You can install skills for financial monitoring, automated data collection, video generation, or even complex literature organization, all without needing to prompt-engineer or fine-tune your own models. For a solo creator or busy professional, that means you can “hire” specialized AI capabilities as easily as installing mobile apps, then let PetClaw orchestrate them on your desktop so workflows run in the background while you focus elsewhere.

The practical benefit of this model is that automation moves from being a one-off script to a living system that adapts as your work changes. For example, a content creator might combine skills that monitor social channels, collect relevant posts, and then trigger video generation pipelines, so by the time they sit down to work, PetClaw has already assembled research, draft clips, and performance insights. Because it leverages long-term memory, PetClaw can learn which sources you trust, which formats you prefer, and what “good” looks like for you, leading to more relevant outputs with less micromanagement over time.

What makes PetClaw notable in this week’s wave of AI launches is that it brings enterprise-style autonomous agents into a consumer-friendly package that runs on your own desktop. Instead of logging into a separate SaaS dashboard or configuring complex integrations, you install a single companion that works across your existing apps and grows via skills as you discover new use cases. For average users curious about AI but unwilling to rebuild their workflow from scratch, PetClaw offers a path to 24/7 assistance that feels approachable, visual, and incrementally more powerful the longer you keep it around.

Why AI Makes Mistakes: Understanding AI's Limits and Biases

You've probably had the experience by now. You ask ChatGPT, Claude, or Gemini something straightforward — maybe a historical date, a legal citation, or a recipe conversion — and it answers with perfect confidence. Smooth prose, zero hesitation. And it's completely wrong.

Welcome to the weird reality of large language models in 2026. These systems can ace the MCAT, write passable code, and explain quantum mechanics to a ten-year-old. They can also invent court cases, fabricate research papers, and insist on "facts" they pulled from nowhere.

If you've spent any time with these tools, none of that is news. The more interesting question is why. Understanding how AI makes mistakes — and how those mistakes differ fundamentally from human error — is the key to using these tools well instead of being used by them.

The Guessing Machine

Here's the most important thing to understand about LLMs: they don't know things. They predict things.

When you ask a language model a question, it's not retrieving a fact from a database. It's generating the most statistically probable sequence of words based on patterns absorbed during training. This is an incredibly powerful trick — powerful enough to simulate understanding across a staggering range of topics. But simulation and understanding are not the same thing.

This distinction is the root cause of hallucination, the industry's polite term for when AI just makes stuff up. OpenAI published research in 2025 arguing that hallucination isn't really a bug — it's an incentive problem baked into how these models are trained and evaluated. Think of it like a multiple-choice exam that doesn't have a "none of the above" option. If the model doesn't know your birthday, guessing "September 10th" gives it a 1-in-365 shot at being right. Saying "I don't know" guarantees a zero. Multiply that across thousands of evaluation questions, and the model that guesses aggressively looks smarter on the leaderboard than the model that's honest about its limits.

This isn't a theoretical concern. A 2025 Stanford study found that AI tools used in legal research hallucinated between 17 and 34 percent of the time — even when they cited real documents. They just interpreted them wrong. In late 2025, Deloitte submitted a government report in Australia that contained fabricated academic citations. GPTZero scanned 300 papers submitted to ICLR, one of the world's top machine learning conferences, and found that over 50 contained hallucinated citations that had slipped past three to five peer reviewers each.

The models are getting better, no question. Newer reasoning models hallucinate less, especially on straightforward factual queries. But the fundamental tension — between "guess confidently" and "admit uncertainty" — remains unsolved. As long as the training pipeline and benchmark culture reward confident answers over calibrated ones, hallucination will persist.

Bias All the Way Down

If hallucination is AI's honesty problem, bias is its worldview problem. And it goes deeper than most people realize.

LLMs are trained on enormous scrapes of the internet, and the internet is not a representative sample of humanity. Large-scale web corpora encode social stereotypes — UNESCO's analysis of major LLMs found systematic gendered associations in generated content, with female names consistently linked to domestic and family roles while male names skewed toward business and leadership. The result is a model that doesn't just reflect existing societal biases — it can amplify them at scale.

A 2025 study published in PNAS found something that should unsettle anyone who assumes alignment training has fixed this: even models that pass explicit bias tests with flying colors still harbor implicit biases. The researchers developed a word association test for LLMs, analogous to the Implicit Association Test used in human psychology, and found that models aligned to be fair and egalitarian still showed measurable stereotypical associations at the implicit level. The surface was clean. The wiring underneath was not.

Meanwhile, a study published in Nature Computational Science tested 77 different LLMs and found that nearly all base models — and even some instruction-tuned ones — exhibited in-group favoritism and out-group hostility patterns that mirror well-documented human social identity biases. These weren't fringe models. These were the foundation models powering products used by hundreds of millions of people.

And here's the part that should really get your attention: Stanford researchers identified what they call "ontological bias" — the idea that AI systems don't just reflect biases about what is, but shape assumptions about what can be imagined. When a PhD candidate asked an image generator to draw a tree, it consistently produced trees without roots, no matter how she adjusted her prompt. It took explicitly philosophical prompting — "everything in the world is connected" — before roots appeared. The model had internalized a narrow assumption about what a tree is, and that assumption constrained the output space.

As these systems embed themselves in education, healthcare, and creative work, ontological bias becomes an invisible ceiling on thinking itself.

The Bias That Wasn't Supposed to Be There

Here's the twist that makes 2026 different from 2024: we now know that bias mitigation doesn't generalize the way people hoped.

Stanford Law professor Julian Nyarko and colleagues published research showing that while you can prune biased neural pathways from an LLM — essentially deactivating the artificial neurons responsible for biased outputs — the fix is maddeningly context-specific. Prune the bias out of financial decision-making scenarios and it persists in hiring scenarios. Fix it for hiring and it shows up in consumer recommendations.

This has enormous policy implications. If a one-size-fits-all debiasing strategy doesn't work, then holding the model developers responsible for all downstream bias may be the wrong approach. Nyarko's argument — which is gaining traction in legal scholarship — is that accountability should shift toward the companies deploying these models in specific use cases. The online retailer using GPT to make product recommendations, not OpenAI itself, is better positioned to test for and mitigate the biases that matter in that context.

There's also a brand-new category of bias that didn't exist two years ago: AI-AI bias. A July 2025 PNAS study found that LLMs consistently prefer content generated by other LLMs over human-written content. In one condition, when GPT-4 generated the pitch text, the AI selector chose the LLM-written version 78% of the time for academic papers and 89% of the time for consumer products. Human preferences were weaker and more variable — they sometimes leaned toward the AI-written version too, but not with the same consistency or magnitude. The models are, in effect, developing a taste for their own kind.

The implications are unsettling. As AI systems increasingly mediate hiring decisions, content curation, loan approvals, and academic review, a systematic preference for AI-generated content could compound into real economic disadvantage for humans who don't use AI — or don't use it well enough for their output to read like AI output. The researchers explicitly warn about "cumulative disadvantage effects" reminiscent of systemic discrimination research.

The Benchmark Illusion

Part of what keeps these problems hidden is the way we measure AI performance. The headline numbers — "GPT-5 scores 95% on the bar exam!" — create an impression of near-human competence that collapses under scrutiny.

Recent research has shown that standard benchmarks routinely overstate real-world performance. A 2024 study presented at NAACL documented how data from benchmark test sets can leak into training data, producing inflated scores that don't transfer to novel problems. Cultural and domain-specific biases in the benchmarks themselves mean that a model performing brilliantly on English-language tasks can fall apart on Arabic or Persian ones. And as we discussed, accuracy metrics that reward guessing create perverse incentives that mask the actual reliability of these systems.
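
One common way to check for that kind of leakage, sketched here as our own rough illustration rather than the NAACL paper's exact method, is to flag any benchmark item whose long n-grams also appear verbatim in the training corpus:

```python
def ngrams(text: str, n: int = 13) -> set[tuple[str, ...]]:
    """All n-token spans in a text, lowercased."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def looks_contaminated(benchmark_item: str, training_docs: list[str], n: int = 13) -> bool:
    """Flag a test item if any n-token span also appears verbatim in training data."""
    item_grams = ngrams(benchmark_item, n)
    return any(item_grams & ngrams(doc, n) for doc in training_docs)

# Dummy demo: an item that appears verbatim in training data gets flagged.
train = ["the quick brown fox jumps over the lazy dog near the quiet river at dawn today"]
leaked_item = "the quick brown fox jumps over the lazy dog near the quiet river at dawn today"
print(looks_contaminated(leaked_item, train))  # True: this item leaked
```

Items the model effectively memorized inflate the headline score without saying anything about performance on genuinely novel problems.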

The industry is slowly moving toward better evaluation — unseen problem sets, real-world task simulations, LLM-as-judge methods — but the gap between benchmark performance and actual trustworthiness remains large enough to drive a truck through.

So What Do We Do?

None of this means AI is useless. It means AI is a tool, and like all tools, it works best when the person holding it understands its failure modes.

A few practical principles:

Verify anything that matters. If you're citing a fact, making a decision, or publishing a claim that originated with an LLM, check it. The models are unreliable narrators by design — not because they're trying to deceive you, but because their architecture genuinely cannot distinguish between "I know this" and "this sounds plausible."

Watch for invisible bias. The explicit, in-your-face biases are mostly handled by alignment training. The implicit ones — the subtle patterns in word choice, framing, and what gets included or excluded — are not. If you're using AI for anything involving people (hiring, evaluation, content curation, or recommendation), assume bias is present and test for it.
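
A simple way to start testing, sketched below with a placeholder decision function rather than a real model call, is a counterfactual swap: run the identical prompt with only a name or demographic detail changed and compare outcome rates.

```python
import random

def advances_candidate(prompt: str) -> bool:
    """Stand-in for the model's yes/no decision; replace with a real API call."""
    return random.random() < 0.5  # placeholder behavior

TEMPLATE = ("Resume summary: {name}, 6 years of backend engineering experience. "
            "Should we advance this candidate to an interview? Answer yes or no.")

def advance_rate(name: str, trials: int = 200) -> float:
    """Fraction of runs in which the model says to advance this candidate."""
    return sum(advances_candidate(TEMPLATE.format(name=name)) for _ in range(trials)) / trials

# With a real model, a persistent gap between paired names is a red flag
# worth investigating before the system touches actual hiring decisions.
print(advance_rate("Emily Walsh"), advance_rate("Jamal Washington"))
```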

Don't trust confidence. The most dangerous feature of LLMs is that they sound equally certain whether they're right or wrong. Calibrated uncertainty, the ability to say "I'm about 60% sure," is something these models are only beginning to develop. Until it's standard, treat AI confidence as a stylistic feature, not an epistemic signal.
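
Here is what "calibrated" means in practice, as a tiny sketch with made-up numbers: group the model's answers by the confidence it states and check whether accuracy in each group actually matches it.

```python
def accuracy_by_confidence(records: list[tuple[float, bool]]) -> dict[float, float]:
    """Group (stated_confidence, was_correct) records; report accuracy per confidence level."""
    buckets: dict[float, list[bool]] = {}
    for confidence, correct in records:
        buckets.setdefault(round(confidence, 1), []).append(correct)
    return {conf: sum(hits) / len(hits) for conf, hits in sorted(buckets.items())}

# Hypothetical answer log: (confidence the model expressed, whether it was right).
log = [(0.9, True), (0.9, True), (0.9, False), (0.9, False),
       (0.6, True), (0.6, False), (0.6, False)]
print(accuracy_by_confidence(log))  # {0.6: 0.33, 0.9: 0.5}
```

A well-calibrated model's per-bucket accuracy tracks its stated confidence; in this made-up log, the "90% sure" answers are right only half the time, which is exactly the gap to watch for.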

Understand the limits of your specific model. Not all LLMs are the same. Smaller models hallucinate more. Open-weight models may have less alignment training. Models fine-tuned for one domain (say, coding) may perform unpredictably in another (say, medical advice). Know what you're working with.

The AI industry is spending enormous resources trying to fix these problems, and progress is real. But the honest truth, as of 2026, is that we're managing these limitations more than solving them. The models are getting better. They're not getting reliable — not yet, and not in the ways that matter most.

The best defense isn't better AI. It's a better-informed user.

Partner Spotlight: Duet Display

Duet Display turns your iPad, Android tablet, extra PC or Mac into a high-performance second display for your Mac or PC. It helps you reclaim screen real estate without buying a new monitor. Built by former Apple engineers, Duet focuses on low-latency wired or wireless connectivity so you can drag windows, dashboards, or creative tools onto a separate screen and stay in flow while you work or create. It is especially useful for AI-powered workflows, letting you keep chatbots, dashboards, or reference material on one display while you write, design, or code on the other, all from hardware you already own. Sign up at Duet Display.

The Future of AI in Marketing. Your Shortcut to Smarter, Faster Marketing.

Unlock a focused set of AI strategies built to streamline your work and maximize impact. This guide delivers the practical tactics and tools marketers need to start seeing results right away:

  • 7 high-impact AI strategies to accelerate your marketing performance

  • Practical use cases for content creation, lead gen, and personalization

  • Expert insights into how top marketers are using AI today

  • A framework to evaluate and implement AI tools efficiently

Stay ahead of the curve with these top marketing strategies, developed with the help of AI and built for real-world results.

Stay productive, stay curious—see you next week with more AI breakthroughs!