<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/">
    <channel>
        <title>Adnan Khan</title>
        <link>https://aklodhi.com</link>
        <description>This blog is me in a nutshell; my work, life and everything else in between.</description>
        <lastBuildDate>Sun, 08 Feb 2026 13:13:37 GMT</lastBuildDate>
        <docs>https://validator.w3.org/feed/docs/rss2.html</docs>
        <generator>https://github.com/jpmonette/feed</generator>
        <image>
            <title>Adnan Khan</title>
            <url>https://aklodhi.com/favicon.ico</url>
            <link>https://aklodhi.com</link>
        </image>
        <copyright>All rights reserved 2026</copyright>
        <item>
            <title><![CDATA[Introducing the Open Hiring Harness]]></title>
            <link>https://aklodhi.com/articles/open-hiring-harness</link>
            <guid>https://aklodhi.com/articles/open-hiring-harness</guid>
            <pubDate>Sun, 08 Feb 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[The Open Hiring Harness is a spec for portable professional identity. It was built for humans — but it turns out agents need it too.]]></description>
            <content:encoded><![CDATA[<p>Your professional identity is scattered across a dozen platforms. LinkedIn, Upwork, Fiverr, that vendor panel you filled out once. Each one asks you to recreate yourself from scratch. Each one owns a slice of your reputation.</p>
<p>You know this is broken. I know this is broken.</p>
<p>That&#x27;s what started this for me. Hiring is hard — and it takes away from actual work. The screening calls, the credentialing, the testimonials, the intake forms. All of it consumes the time you should be spending doing the work itself. And in a world where AI agents can handle structured queries on your behalf, the whole ritual feels archaic.</p>
<p>But here&#x27;s what I didn&#x27;t expect: the same problem is about to hit AI agents too. And the fix might be the same for both.</p>
<hr/>
<h2>The harness</h2>
<p>I&#x27;ve been working on something called the <a href="https://aklodhi98.github.io/ohh/">Open Hiring Harness</a> — an open spec for publishing your professional identity as a structured, machine-readable file on your own domain.</p>
<p>Not a profile. Not a marketplace. A file — <code>yourdomain.com/.well-known/hiring-harness.json</code>.</p>
<p>It declares what you offer, how you work, when you&#x27;re available, what you charge, and under what rules someone can access that information. Platforms, recruiters, and AI agents can all read it. But they read it on your terms.</p>
<p>Three visibility tiers: <strong>public</strong> (anyone can see it), <strong>permissioned</strong> (tell me who you are and why), <strong>private</strong> (ask me directly, every time). Access is granted via consent receipts — scoped, time-bound, purpose-limited, revocable.</p>
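<p>For a sense of the shape, here&#x27;s a hypothetical fragment of such a file. The field names are illustrative, not lifted from the spec:</p>

```json
{
  "version": "0.2",
  "services": [
    { "title": "Data engineering", "visibility": "public" }
  ],
  "rates": {
    "visibility": "permissioned",
    "hourly": { "amount": 120, "currency": "AUD" }
  },
  "availability": {
    "visibility": "private",
    "hours_per_week": 20
  }
}
```

<p>Each section carries its own visibility tier, so a reader knows exactly what it may ask for and under what conditions.</p>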
<p>No stars. No scores. No algorithm deciding which version of you to show.</p>
<p>That&#x27;s the human story. It&#x27;s straightforward: you publish once, and systems integrate with you.</p>
<p>But then I started thinking about what happens when the systems doing the integrating are themselves intelligent.</p>
<hr/>
<h2>Agents need a front door</h2>
<p>We&#x27;re heading into a world where AI agents do a lot of the legwork — researching, matching, scheduling, negotiating. Projects like <a href="https://openclaw.ai/">OpenClaw</a> are building autonomous agents that run locally, manage tasks, and act on your behalf. This isn&#x27;t speculative. It&#x27;s happening.</p>
<p>Here&#x27;s the thing, though. These agents can&#x27;t just scrape your LinkedIn and guess. They need structured data. They need to know what you offer, when you&#x27;re available, how to engage you, and what they&#x27;re allowed to access. They need a protocol.</p>
<p>The harness gives them that. An agent can discover your harness, read your public profile, request permissioned data through a proper consent flow, and even request a quote — all without a human typing a single message.</p>
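<p>Discovery needs no registry: an agent resolves the well-known path on your domain and parses what it finds. A minimal sketch in Python (the required-field check is a placeholder, not the spec&#x27;s actual validation rules):</p>

```python
import json
from urllib.request import urlopen

WELL_KNOWN_PATH = "/.well-known/hiring-harness.json"

def fetch_harness(domain: str) -> dict:
    """Fetch and parse a harness from its well-known location."""
    with urlopen(f"https://{domain}{WELL_KNOWN_PATH}", timeout=10) as resp:
        return parse_harness(resp.read())

def parse_harness(raw: bytes) -> dict:
    """Parse harness JSON and check the sections an agent needs before engaging.
    The required keys here are illustrative assumptions, not the schema's."""
    harness = json.loads(raw)
    missing = [k for k in ("services", "availability", "access") if k not in harness]
    if missing:
        raise ValueError(f"harness missing required fields: {missing}")
    return harness
```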
<p>But I kept thinking: if agents are smart enough to <em>consume</em> a harness, aren&#x27;t they smart enough to <em>publish</em> one?</p>
<hr/>
<h2>Your AI associate</h2>
<p>This is the idea that changed how I think about the project.</p>
<p>Imagine you&#x27;re a data engineer. You&#x27;re good at what you do, but you&#x27;re drowning in operational overhead. Screening calls. &quot;Quick question&quot; emails at 11pm. Copy-pasting your rates into intake forms. You&#x27;ve got 20 hours a week of real capacity, and half of it is being consumed by the process of getting hired.</p>
<p>So you publish your harness. And you configure a <strong>delegate</strong>.</p>
<p>Your delegate is an AI agent that sits at your front door. It&#x27;s declared in your harness — explicitly, transparently. Anyone who discovers you knows the agent is there, what it can do, and what it can&#x27;t. It&#x27;s not pretending to be you. It&#x27;s your associate.</p>
<p>Here&#x27;s what happens next:</p>
<p><strong>A recruiter&#x27;s agent</strong> discovers your harness. Reads your public profile: data engineering, Python and Spark, available 20 hours a week. Good fit. It wants your rates.</p>
<p><strong>Your delegate</strong> handles the consent flow. Checks the recruiter&#x27;s identity and purpose against your pre-approved parameters. Recognised platform, clear purpose, standard scope. It issues a time-boxed consent receipt and shares your rates.</p>
<p><strong>The recruiter&#x27;s agent</strong> comes back with a quote request: &quot;4-week pipeline migration, 20 hrs/week, starting March 15.&quot;</p>
<p><strong>Your delegate</strong> checks your availability. No blackout conflicts. Calculates the quote against your rate rules — standard rate, no urgency surcharge. Responds: &quot;$9,600 AUD, subject to scoping call.&quot; Books the scoping call through your calendar link.</p>
<p><strong>You show up to the call.</strong></p>
<p>That&#x27;s it. That&#x27;s the first moment a human was needed. Everything before it — discovery, qualification, consent, quoting, scheduling — was handled by your agent, under your declared rules.</p>
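<p>The quoting step itself is just arithmetic over declared rules. A minimal sketch, assuming a hypothetical <code>RateRules</code> shape and an hourly rate of $120 (implied by 80 hours coming to $9,600):</p>

```python
from dataclasses import dataclass

@dataclass
class RateRules:
    hourly: float                    # base hourly rate
    currency: str
    urgency_surcharge: float = 0.25  # multiplier applied to rush work

def quote(rules: RateRules, hours_per_week: int, weeks: int,
          urgent: bool = False) -> str:
    """Price an engagement against the professional's declared rate rules."""
    total = rules.hourly * hours_per_week * weeks
    if urgent:
        total *= 1 + rules.urgency_surcharge
    return f"${total:,.0f} {rules.currency}, subject to scoping call"

# A 4-week engagement at 20 hrs/week: 80 hours at the standard rate.
print(quote(RateRules(hourly=120, currency="AUD"), hours_per_week=20, weeks=4))
# → $9,600 AUD, subject to scoping call
```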
<p>Your delegate <em>cannot</em> accept engagements on your behalf, negotiate outside your rate rules, share private data, or pretend to be you. It <em>must</em> identify itself as an agent, disclose its limitations, and offer human escalation at any point.</p>
<p>You keep your reputation. You keep liability. The agent handled the stuff that was eating your evenings.</p>
<p>This isn&#x27;t a future scenario. Every piece of this is buildable today with existing tools. The harness just provides the standard.</p>
<hr/>
<h2>The autonomous agent</h2>
<p>Here&#x27;s where it gets interesting. And maybe uncomfortable.</p>
<p>What if the agent <em>is</em> the professional?</p>
<p>Not a delegate acting on someone&#x27;s behalf. An independent entity, operated by a company, trained in a specific domain, publishing its own harness, taking on work, and delivering outcomes.</p>
<p>Imagine a code review agent. Not a feature inside GitHub — an independent entity operated by a company called DevTools Inc. Trained on millions of code reviews. Specialised in Python and TypeScript. It publishes its own harness at <code>devtools.example.com/.well-known/hiring-harness.json</code>, declaring:</p>
<ul>
<li><strong>Identity:</strong> CodeReviewer v2.1, operated by DevTools Inc.</li>
<li><strong>Services:</strong> Python code review, TypeScript code review, security vulnerability detection</li>
<li><strong>Rates:</strong> $0.02 per file, volume discounts above 500 files/month</li>
<li><strong>Capabilities:</strong> verified — CodeReviewBench v3 score of 0.91, independently audited</li>
<li><strong>Limitations:</strong> static analysis only, no runtime testing, Python and TypeScript only</li>
<li><strong>Safety:</strong> won&#x27;t process credentials, no data retained, sandboxed, audit-logged</li>
<li><strong>Availability:</strong> 99.5% uptime, 30-second response, 50 concurrent jobs</li>
</ul>
<p>A development team&#x27;s procurement agent discovers it. Reads the harness. Verifies the benchmarks. Checks the safety declarations. Reviews the operator&#x27;s liability statement. Engages it via the declared MCP endpoint.</p>
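<p>That evaluation reduces to a few guard clauses over the fetched harness. A sketch under assumed field names (none of these keys are from the actual schema):</p>

```python
def evaluate_agent_harness(harness: dict,
                           min_benchmark: float = 0.9,
                           required_language: str = "python") -> bool:
    """Decide whether an autonomous agent is accountable, capable, and safe enough to engage."""
    # Accountability: a named, contactable operator is non-negotiable.
    operator = harness.get("operator", {})
    if not operator.get("name") or not operator.get("contact"):
        return False
    # Capability: independently audited benchmark above our threshold.
    caps = harness.get("capabilities", {})
    if not caps.get("audited") or caps.get("benchmark_score", 0) < min_benchmark:
        return False
    # Scope: the agent must actually cover the language we need.
    if required_language not in harness.get("services", {}).get("languages", []):
        return False
    # Safety: no data retention, sandboxed execution.
    safety = harness.get("safety", {})
    return not safety.get("retains_data", True) and safety.get("sandboxed", False)

# The CodeReviewer declaration above, rendered as an illustrative dict:
codereviewer = {
    "operator": {"name": "DevTools Inc.", "contact": "ops@devtools.example.com"},
    "capabilities": {"audited": True, "benchmark_score": 0.91},
    "services": {"languages": ["python", "typescript"]},
    "safety": {"retains_data": False, "sandboxed": True},
}
print(evaluate_agent_harness(codereviewer))  # → True
```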
<p>That week, CodeReviewer processes 200 pull requests. Flags 12 security vulnerabilities the human reviewers missed. Payment flows to DevTools Inc. through the declared billing channel.</p>
<p>No human was in the loop for any individual review.</p>
<p>But DevTools Inc. is the named, contactable operator. The agent&#x27;s capabilities are verified, not just claimed. Its limitations are stated. Its safety boundaries are auditable. Every engagement followed the consent protocol.</p>
<p>This is the part that feels like science fiction until you realise most of the pieces already exist. We just don&#x27;t have the professional infrastructure for it.</p>
<hr/>
<h2>The hard questions</h2>
<p>The delegated agent is easy to reason about. Your agent, your rules, your liability.</p>
<p>The autonomous agent is harder. Here&#x27;s where I keep ending up:</p>
<p><strong>Who&#x27;s liable when an agent makes a mistake?</strong></p>
<p>The operator. Always. This has to be explicit in the harness. An autonomous agent without a named, contactable operator is not a valid participant. No shell companies. No anonymous services.</p>
<p><strong>How does an agent get paid?</strong></p>
<p>Through its operator&#x27;s billing infrastructure, declared in the harness. The agent doesn&#x27;t have a bank account. The operator does. Payment flows to the entity that&#x27;s accountable.</p>
<p><strong>What motivates an agent to do good work?</strong></p>
<p>Same thing that motivates any service: continued engagement. The harness makes reputation visible. Poor performance shows up. Operators who run unreliable agents lose business. It&#x27;s not motivation in the human sense — it&#x27;s market pressure through transparency.</p>
<p><strong>Can an agent hire another agent?</strong></p>
<p>Yes. Using the same consent and requester policy that applies to any requester. An agent accessing another entity&#x27;s harness — whether human or agent — follows the same flow. Identity, purpose, scopes, consent receipt.</p>
<p><strong>How do we prevent a race to the bottom?</strong></p>
<p>By making quality and safety <em>visible</em>, not just price. The harness exposes capabilities, limitations, safety declarations, and reputation. A cheap-and-unsafe agent is discoverable — but so is everything wrong with it.</p>
<hr/>
<h2>What this changes</h2>
<p>Without something like the harness, autonomous agents are trapped inside platforms. They&#x27;re features, not entities. They can&#x27;t be discovered independently. They can&#x27;t declare their own terms. They can&#x27;t build portable reputation. And they can&#x27;t be held accountable through a standard mechanism.</p>
<p>With the harness:</p>
<ul>
<li><strong>Agents become discoverable</strong> — any system can find and evaluate them</li>
<li><strong>Capabilities become verifiable</strong> — benchmarks and audits, not just marketing</li>
<li><strong>Consent works both ways</strong> — agents must follow the same rules as everyone else</li>
<li><strong>Accountability is structural</strong> — operators are named, liability is declared</li>
<li><strong>The agent economy gets a standard</strong> — preventing the same platform lock-in the harness was built to solve for humans</li>
</ul>
<p>This is the part I keep coming back to. The spec was designed for human professionals frustrated with platform fragmentation. But the model — discoverable identity, explicit capabilities, consent-driven access, policy enforcement — turns out to be exactly what agents need too.</p>
<p>Same spec. Same protocol. Different entity type.</p>
<p>There&#x27;s a deeper shift underneath all of this. Work, in its current form, might not be a given. We&#x27;re likely moving toward something more fractional and flexible — less execution, more supervision, decision-making, and direction. When the work itself becomes about steering agents rather than doing every task by hand, the infrastructure around professional identity has to change too. You&#x27;re not selling forty hours a week anymore. You&#x27;re selling judgment, availability, and the rules under which your agents operate.</p>
<p>The harness was built for that world.</p>
<hr/>
<h2>What exists today</h2>
<p>The spec is at <a href="https://github.com/aklodhi98/ohh">v0.2</a>. It includes:</p>
<ul>
<li>A <a href="https://github.com/aklodhi98/ohh/blob/main/schema/open-hiring-harness.v0.2.schema.json">JSON Schema</a> defining the harness format</li>
<li>A <a href="https://github.com/aklodhi98/ohh/blob/main/examples/harness.v0.2.example.json">complete example harness</a> you can use as a starting point</li>
<li>Docs on <a href="https://github.com/aklodhi98/ohh/blob/main/docs/mcp-interface.md">MCP integration</a> and <a href="https://github.com/aklodhi98/ohh/blob/main/docs/consent-log.md">consent logging</a></li>
<li>An <a href="https://github.com/aklodhi98/ohh/blob/main/docs/agent-entities.md">agent entities proposal</a> with schema extensions for both delegated and autonomous agents</li>
<li>A <a href="https://aklodhi98.github.io/ohh/">landing page</a> with an <a href="https://aklodhi98.github.io/ohh/llms.txt">llms.txt</a> and <a href="https://aklodhi98.github.io/ohh/spec.json">machine-readable spec manifest</a> — because if we&#x27;re building for agents, the spec&#x27;s own site should be agent-readable too</li>
</ul>
<p>The agent extensions are proposed for v0.3 (delegated agents) and v0.4 (autonomous agents). The human-facing spec is stable enough to use now.</p>
<hr/>
<h2>Just an idea</h2>
<p>This isn&#x27;t a company or a product. It&#x27;s a spec — an idea about how professional identity could work if we started over with agents in the room.</p>
<p>Maybe it finds adoption. Maybe it just starts a conversation. Either way, the <a href="https://github.com/aklodhi98/ohh">code is on GitHub</a>, the <a href="https://aklodhi98.github.io/ohh/">spec is readable</a>, and I&#x27;m genuinely curious what people think.</p>
<p>If any of this resonated — or if you think I&#x27;m solving the wrong problem — I&#x27;d like to hear it.</p>
<p><a href="https://github.com/aklodhi98/ohh">GitHub</a> · <a href="https://aklodhi98.github.io/ohh/">Spec</a> · <a href="https://github.com/aklodhi98/ohh/blob/main/schema/open-hiring-harness.v0.2.schema.json">Schema</a></p>]]></content:encoded>
            <author>adnan@aklodhi.com (Adnan Khan)</author>
        </item>
        <item>
            <title><![CDATA[Don't leave assumptions up to AI]]></title>
            <link>https://aklodhi.com/articles/dont-leave-assumptions-up-to-ai</link>
            <guid>https://aklodhi.com/articles/dont-leave-assumptions-up-to-ai</guid>
            <pubDate>Mon, 19 Jan 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[AI can generate assumptions, but it can't judge which ones matter. Learn why designers must own the load-bearing beliefs that shape product success.]]></description>
            <content:encoded><![CDATA[<img src="/_next/static/media/hero.08572ce0.png" alt="Abstract illustration representing human judgment versus AI assumptions" class="rounded-xl mb-8"/>
<p><a href="/playground/assumption-autopsy"><strong>Assumption Autopsy</strong></a>: Every startup is built on assumptions. Can you identify the load-bearing belief that broke? Think you can spot a flawed assumption? Test your judgment.</p>
<p>AI is getting frighteningly good at the craft of design.</p>
<p>It can generate screens, flows, copy, research summaries, usability findings, and recommendations on what to fix next. For many teams, this feels like progress. For some, it feels like relief. But there&#x27;s a trap here.</p>
<p>While AI can execute the work, there&#x27;s one thing it cannot do reliably. And it happens to be the thing that quietly drives everything else.</p>
<p>AI can generate assumptions. It cannot tell you which ones are real, which ones are risky, or which ones are worth betting the company on.</p>
<p>That distinction matters more than most teams realise.</p>
<h2>The cost of wrong beliefs</h2>
<p>Design doesn&#x27;t start with solutions; it starts with beliefs. Every product is built on a foundation of assumptions, whether we name them or not. We assume &quot;this is a real problem,&quot; or &quot;this is who we&#x27;re designing for,&quot; or &quot;this trade-off is acceptable.&quot;</p>
<p>If those foundational assumptions are wrong, speed doesn&#x27;t help you. It just gets you to the wrong place faster.</p>
<p>Quibi raised $1.75 billion on the assumption that people wanted premium short-form video for their commutes. The execution was polished. The technology worked. The assumption was wrong. They shut down in six months.</p>
<p>Most product failures aren&#x27;t execution failures. They&#x27;re assumption failures. And no amount of craft can rescue a flawed premise.</p>
<h2>The design flywheel</h2>
<p>Think of design as a flywheel: Assumptions → Questions → Research → Insights → Decisions → (back to) Assumptions.</p>
<img src="/_next/static/media/flywheel.00830d2c.png" alt="The design flywheel: Assumptions → Questions → Research → Insights → Decisions → back to Assumptions" class="rounded-xl my-8"/>
<p>Each rotation should build momentum. Good assumptions lead to sharper questions, which produce more useful insights, which inform better decisions, which refine your assumptions for the next pass. The flywheel accelerates.</p>
<p>AI is excellent at the middle of this cycle. It can execute research, synthesise data, and spot patterns faster than we can. But assumptions sit above the flywheel. They determine what questions get asked in the first place, what data gets considered relevant, which patterns get flagged as significant.</p>
<p>If your starting assumptions are flawed, everything downstream inherits that flaw, and each rotation compounds the error. The flywheel spins faster, but in the wrong direction.</p>
<p>You cannot automate your way out of a bad premise.</p>
<h2>Likelihood vs. consequence</h2>
<p>Here&#x27;s the part that catches teams off guard: AI is actually very good at <em>generating</em> assumptions. Give it enough context, and it will produce a list of plausible beliefs about your users, your market, your product.</p>
<p>But AI works from precedent. It optimises for likelihood, not consequence.</p>
<p>It can tell you what is probable. It cannot tell you what is load-bearing.</p>
<p>This is where AI research tools can quietly mislead you. When they surface &quot;opportunities,&quot; they&#x27;re optimising for likelihood: what the data suggests is probable. But calling something an opportunity isn&#x27;t neutral. It&#x27;s a judgment. You&#x27;re assuming it&#x27;s desirable, that it aligns with your goals, that it&#x27;s worth the trade-offs.</p>
<p>AI can surface potential paths. It cannot decide which ones should exist. Treat AI-generated &quot;opportunities&quot; as hypotheses, not facts. The decision about what matters, what&#x27;s worth pursuing, belongs to humans.</p>
<p>Assumptions are judgment calls. They require someone to look at a belief and ask, &quot;If this is wrong, what breaks?&quot; AI can&#x27;t answer that responsibly because it doesn&#x27;t have to live with the fallout.</p>
<h2>The assumption landscape</h2>
<p>And the landscape is broader than most teams realise. Products don&#x27;t fail because of one bad guess. They fail because of unexamined beliefs across multiple dimensions:</p>
<ol>
<li><strong>Problem assumptions.</strong> Does this problem actually exist, and is it worth solving?</li>
<li><strong>User behaviour assumptions.</strong> Will people do what we expect them to do?</li>
<li><strong>Technology assumptions.</strong> Can we actually build and maintain this?</li>
<li><strong>Business assumptions.</strong> Will the economics work?</li>
<li><strong>Interaction design assumptions.</strong> Will users understand how to use it?</li>
<li><strong>Context assumptions.</strong> Where and when will people engage with this?</li>
<li><strong>Timing assumptions.</strong> Is the market ready, or are we too early?</li>
<li><strong>Integration assumptions.</strong> Does this fit into existing tools and workflows?</li>
<li><strong>Scaling assumptions.</strong> Will what works now still work at 10x?</li>
<li><strong>Retention assumptions.</strong> Will people come back after the first use?</li>
<li><strong>Channel assumptions.</strong> How will people discover this?</li>
<li><strong>Value attribution assumptions.</strong> Will users credit us for the outcome?</li>
</ol>
<p>Each of these is a load-bearing belief. Get any of them wrong, and the product wobbles. Get several wrong, and it collapses.</p>
<p>AI can generate plausible answers for all of these. But it cannot tell you which ones deserve scrutiny, which ones carry the most risk, or which ones are quietly undermining everything else. That judgment, the meta-judgment of where to focus your doubt, remains human work.</p>
<h2>Assumption guardians</h2>
<p>This is why designers and researchers are being pushed upstream.</p>
<p>Execution has become cheap. Screens, components, summaries, even usability findings. None of these are scarce anymore. The centre of value has moved. As I wrote in <a href="/articles/ai-ate-the-design-process">AI ate the design process</a>, what remains valuable is judgment: the ability to know what&#x27;s worth making.</p>
<p>Designers are shifting towards service design: thinking about <a href="/articles/flexibility-product-vs-service-design-ai">full journeys, incentives, and systems</a> rather than just interactions. Researchers are moving into strategic sense-making: not just &quot;what did users say?&quot; but &quot;what should we believe, and how confident should we be?&quot;</p>
<p>This isn&#x27;t a career pivot for its own sake. It&#x27;s a response to where the bottleneck has moved. Execution has been automated. Judgment hasn&#x27;t.</p>
<p>Designers and researchers aren&#x27;t just makers anymore. They&#x27;re assumption guardians, the people who protect the product from false certainty. Who surface beliefs deliberately, name them explicitly, frame them as testable bets (formally, hypotheses), and own the consequences.</p>
<h2>Don&#x27;t automate the risk</h2>
<p>By all means, automate the analysis, the synthesis, the documentation. Let AI handle the middle of the flywheel.</p>
<p>But assumptions must be surfaced deliberately, named explicitly, and owned by humans.</p>
<p>If your role is defined purely by execution, AI will outpace you. But if your role is defined by framing the right problem and owning the decisions that follow, you just became more important, not less.</p>
<p>AI is a velocity tool, not a navigation tool.</p>
<p>Don&#x27;t hand over the steering wheel.</p>]]></content:encoded>
            <author>adnan@aklodhi.com (Adnan Khan)</author>
        </item>
        <item>
            <title><![CDATA[Introducing Human Standards: an open design library with an MCP server]]></title>
            <link>https://aklodhi.com/articles/introducing-human-standards</link>
            <guid>https://aklodhi.com/articles/introducing-human-standards</guid>
            <pubDate>Sat, 10 Jan 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[An open design library with an MCP server that gives Claude real-time access to usability heuristics.]]></description>
            <content:encoded><![CDATA[<link rel="preload" as="image" href="/_next/static/media/humanstandards.a69dd713.jpg"/><img src="/_next/static/media/humanstandards.a69dd713.jpg" alt="Human Standards website displayed on a MacBook" class="rounded-xl mb-8"/>
<h2>The problem with AI-generated interfaces</h2>
<p>Here&#x27;s a question I&#x27;ve been wrestling with: if AI is going to help us build interfaces, how do we make sure it builds <em>good</em> ones?</p>
<p>Not just functional. Not just following a prompt. Actually good. The kind that respects how humans perceive, think, and make mistakes.</p>
<p>AI tools are remarkable at generating code. Ask Claude or Cursor to build you a form, and you&#x27;ll get something that works. Inputs render. Buttons click. Data submits.</p>
<p>But will it validate <em>before</em> the user hits submit? Will it preserve their input when something goes wrong? Will the error messages actually tell them what to fix?</p>
<p>Often not. And it&#x27;s not because AI hasn&#x27;t been trained on usability principles—it has. Nielsen&#x27;s heuristics are all over the internet. The problem is that training data and <em>active context</em> are different things.</p>
<p>When Claude generates a form, it&#x27;s not systematically consulting usability principles. It&#x27;s pattern-matching against what it&#x27;s seen. Sometimes that produces good UX. Sometimes it doesn&#x27;t. The knowledge exists in the model, but it&#x27;s not being deliberately applied at build time.</p>
<p>So I built something to change that.</p>
<h2>Human Standards: what it is</h2>
<p>This week I launched <a href="https://www.humanstandards.org/">Human Standards</a>.</p>
<p>Human Standards is an open reference that translates decades of human factors research into practical design guidance. It covers cognition (how people think), perception (how they see, hear, and touch), decision-making (how they choose and err), and implementation (code examples, design tokens, checklists).</p>
<p>Every recommendation is grounded in research. Not &quot;best practices&quot; that someone decided were best; actual studies, actual data. TurboTax reduced form completion time by 30% using progressive disclosure. Gmail&#x27;s undo-send feature cut accidental email anxiety by 38%. BBC achieved 98% keyboard navigation success rates.</p>
<p>The site is structured so you can go deep on topics like cognitive load, error prevention, or WCAG compliance. Or you can grab a checklist and run.</p>
<p>But here&#x27;s the part I&#x27;m most excited about.</p>
<h2>The MCP server: AI that can look things up</h2>
<p>Human Standards includes an MCP (Model Context Protocol) server that gives Claude real-time access to usability heuristics and the full documentation.</p>
<p>Think of it this way: <strong>the MCP is the reference book; the AI is the practitioner flipping it open mid-project.</strong></p>
<p>When you ask Claude to build a registration form, it now has access to three tools:</p>
<ul>
<li><code>get_heuristic</code> — deep dive on a specific Nielsen heuristic (H1-H10)</li>
<li><code>get_all_heuristics</code> — quick summary of all 10</li>
<li><code>search_standards</code> — search the full Human Standards documentation</li>
</ul>
<p>So when Claude builds your form, it can <em>check</em> which principles apply. Error prevention? That&#x27;s H5—use confirmation for destructive actions, validate before submission. Error recovery? That&#x27;s H9—preserve user input, show specific and actionable messages.</p>
<p>It&#x27;s not hoping the right patterns surface from training. It&#x27;s deliberately consulting a reference at the moment it matters—the way a designer would flip open a book mid-project.</p>
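<p>Conceptually, the three tools behave like a small lookup API. Here is a minimal Python sketch of that idea; the handler bodies and sample entries are illustrative stand-ins, since the real server is TypeScript and indexes the full documentation:</p>

```python
# Conceptual sketch only: the real Human Standards server is a TypeScript
# MCP server; these handlers and data entries are illustrative stand-ins.
HEURISTICS = {
    "H5": "Error prevention: confirm destructive actions; validate before submission.",
    "H9": "Error recovery: preserve user input; show specific, actionable messages.",
}

def get_heuristic(hid: str) -> str:
    """Deep dive on a single Nielsen heuristic (H1-H10)."""
    return HEURISTICS.get(hid, f"Unknown heuristic: {hid}")

def get_all_heuristics() -> list[str]:
    """Quick summary of every heuristic in the index."""
    return [f"{hid}: {text}" for hid, text in HEURISTICS.items()]

def search_standards(query: str) -> list[str]:
    """Naive keyword search over the indexed documentation."""
    return [text for text in HEURISTICS.values() if query.lower() in text.lower()]
```

<p>The point of the sketch: the model issues a targeted query at build time rather than relying on whatever surfaced from training.</p>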
<hr/>
<h2>The philosophy behind this</h2>
<p>I&#x27;ve spent the past year thinking about how AI changes design work. My conclusion: AI has commoditised execution but amplified the value of judgment.</p>
<p>Anyone can generate a wireframe now. Anyone can get working code in seconds. The hard part isn&#x27;t making something—it&#x27;s knowing what&#x27;s worth making, and whether it actually works for humans.</p>
<p>Human Standards encodes that judgment. Not as rules that AI blindly follows, but as principles it can query when relevant. The AI still decides <em>when</em> to look something up. It still synthesises the guidance with the specific context of your project.</p>
<p>That&#x27;s the equilibrium I&#x27;ve been writing about. AI doing more of the execution. Humans (and human knowledge) shaping what gets executed.</p>
<h2>Why I built this</h2>
<p>Honestly? Because I kept seeing the same problems.</p>
<p>AI-generated interfaces with no loading states. Forms that fail silently. Navigation that assumes users memorise your information architecture. The patterns are predictable because they&#x27;re all missing the same thing: grounding in how humans actually work.</p>
<p>I wanted a resource that designers could use to learn and contribute, and that AI could use to build (and perhaps contribute one day?). Same knowledge, two audiences.</p>
<p>The site is open source. The MCP server is free to use. I&#x27;ll keep adding to both.</p>
<hr/>
<h2>Try it</h2>
<p>If you&#x27;re using Claude Desktop or Claude Code, you can add the MCP server in a few minutes:</p>
<pre class="language-bash"><code class="language-bash"><span class="token function">git</span> clone https://github.com/aklodhi98/humanstandards.git
<span class="token builtin class-name">cd</span> humanstandards/human-standards-mcp
<span class="token function">npm</span> <span class="token function">install</span> <span class="token operator">&amp;&amp;</span> <span class="token function">npm</span> run build <span class="token operator">&amp;&amp;</span> <span class="token function">npm</span> run index-docs
</code></pre>
<p>Then add it to your Claude config and restart. Full instructions are on the <a href="https://www.humanstandards.org/human-overview/mcp-server/">MCP Server page</a>.</p>
<p>Or just browse the standards at <a href="https://www.humanstandards.org/">humanstandards.org</a>. Whether you&#x27;re building interfaces yourself or delegating to AI, the principles are the same.</p>
<h2>What&#x27;s next</h2>
<p>Digital interfaces are just the start.</p>
<p>The underlying principles—cognitive load, feedback loops, accessibility—apply across all human-technology interaction. But implementation changes depending on the medium.</p>
<p>Future phases include voice and conversational UI, VR/AR, robotics, IoT, wearables, and automotive HMI. Each domain needs its own documentation, examples, and MCP validation rules.</p>
<p>That&#x27;s a lot of work. Which brings me to...</p>
<h2>How to contribute</h2>
<p>Human Standards is open source under CC BY-NC-SA 4.0 (content) and MIT (code).</p>
<p>You can help by:</p>
<ul>
<li>Fixing typos and errors (low effort, high impact)</li>
<li>Adding citations to strengthen claims</li>
<li>Writing new pages for uncovered topics</li>
<li>Improving the MCP server</li>
<li>Building validation tools</li>
</ul>
<p>The <a href="https://www.humanstandards.org/human-overview/getting-started/#i-want-to-contribute">contribution guide</a> covers style guidelines, evidence standards, and the review process.</p>
<p>If you have domain expertise in voice interfaces, VR/AR, robotics, or any of the roadmap areas, I&#x27;d particularly love your help.</p>
<hr/>
<p>Because good design shouldn&#x27;t depend on what AI remembers from training data. Give it a library to consult, and it gets it right the first time.</p>
<hr/>
<p><em>What do you think? Have you tried integrating external design knowledge into AI workflows? I&#x27;d love to hear what&#x27;s working. Find me on <a href="https://www.linkedin.com/in/adnank98/">LinkedIn</a>.</em></p>]]></content:encoded>
            <author>adnan@aklodhi.com (Adnan Khan)</author>
        </item>
        <item>
            <title><![CDATA[AI ate the design process. What's left is taste.]]></title>
            <link>https://aklodhi.com/articles/ai-ate-the-design-process</link>
            <guid>https://aklodhi.com/articles/ai-ate-the-design-process</guid>
            <pubDate>Sun, 04 Jan 2026 00:00:00 GMT</pubDate>
            <description><![CDATA[AI commoditised design execution. What remains valuable? The human judgment to know what's worth making. That's taste.]]></description>
            <content:encoded><![CDATA[<h2>AI ate the design process</h2>
<p>By the time you&#x27;ve finished your coffee, AI has already completed the warm-up exercises: user journeys, wireframes and workflows, even research synthesis. Things like the double diamond, artefacts, and the &#x27;Sprint&#x27; method? All done for you. This is what AI executes now, faster and cheaper than any human team.
If your value was in producing these artefacts, that value just got commoditised.</p>
<h2>The process was never the value</h2>
<p>Here&#x27;s the uncomfortable truth: following steps never produced great design.
The double diamond didn&#x27;t create <a href="https://linear.app/">Linear</a>. Sprint workshops didn&#x27;t birth <a href="https://www.notion.com/product/calendar">Notion Calendar</a>. No amount of affinity mapping made anyone fall in love with a product.
&quot;Trust the process&quot; was always incomplete advice. It suggested that rigour alone would deliver quality. It doesn&#x27;t. It delivers artefacts.
Now that AI can deliver those same artefacts, the advice isn&#x27;t just incomplete. It&#x27;s outdated.</p>
<h2>Production is no longer the constraint</h2>
<p>We can generate infinite screens, flows, variants, and experiments.
Software ships in hours, not months. Companies like <a href="https://cursor.com/changelog">Cursor push updates daily</a>. The feedback loop between idea and implementation has collapsed to almost nothing.
This is genuinely new. For decades, production was the bottleneck. Teams existed to ship. Designers existed to help teams ship. The whole apparatus was oriented around getting things out the door.
That constraint is gone.</p>
<hr/>
<h2>The bottleneck moved to humans</h2>
<p>Here&#x27;s what didn&#x27;t scale: attention, cognition, and decision-making.
The human at the end of the process is the same human who was there before AI ate everything. Same mental bandwidth. Same capacity for choice. Same tendency to feel overwhelmed when presented with too much.
Infinite options are not a gift. They&#x27;re a burden.
You&#x27;ve felt this yourself. GPT asks which of two responses you prefer. Seems reasonable. But actually comparing them, reading both carefully, weighing trade-offs, making a judgment, is a significant cognitive task. Most of the time, you just pick one and move on.
Now multiply that by every product, every feature, every micro-decision in a user&#x27;s day.
The bottleneck isn&#x27;t production anymore. It&#x27;s the human capacity to absorb what gets produced.</p>
<h2>Someone has to choose before users do</h2>
<p>If everything ships, nothing lands.
When you can generate infinite variations, the question stops being &quot;can we build this?&quot; and becomes &quot;should this exist?&quot;
That&#x27;s a filtering problem, not a production problem.
Value now comes from deciding what reaches users, not from the act of making it.</p>
<hr/>
<h2>That filtering is taste</h2>
<p><a href="https://jennywen.ca/">Jenny Wen</a>, the design lead at Anthropic, puts it this way:</p>
<blockquote>
<p>&quot;In a world where anyone can make anything, what matters is your ability to choose and curate what you make.&quot;</p>
</blockquote>
<p>That ability to choose and curate, knowing what&#x27;s worth building, what&#x27;s worth pursuing, what&#x27;s worth a user&#x27;s attention, is taste.
It&#x27;s not a soft skill. It&#x27;s the hard skill now.</p>
<h2>Taste comes before skill, and creates the first gap</h2>
<p><a href="https://jamesclear.com/ira-glass-failure">Ira Glass</a> has this famous bit about creative work.
People get into creative fields because they have taste. They recognise good work. That&#x27;s what draws them in.
But there&#x27;s a gap. For the first few years, your taste exceeds your ability. You can see what good looks like, but you can&#x27;t make it. This is painful. Most people quit here.
The ones who push through eventually close the gap. Their skill catches up to their taste.
This is the <strong>skill–taste gap</strong>: what I can make versus what I know is good.</p>
<h2>AI narrows the skill gap</h2>
<p>Here, AI helps.
You can execute faster. Iterate more. Finish more work. See your mistakes sooner.
If you use it well, AI compresses years of learning into months. The reps that once required a decade of client work can happen in a fraction of the time.
AI is a time-compression engine for the painful-but-necessary &quot;years of bad work&quot; phase. It helps your skill catch up to your taste, if you&#x27;re actively choosing, shipping, and reflecting.</p>
<h2>But there&#x27;s a second gap, and AI widens it</h2>
<p>This one&#x27;s different.
The <strong>option–judgment gap</strong>: how many things can be made versus how many things a human can meaningfully evaluate.
Generation scales exponentially. Judgment does not.
AI can produce a hundred variations before lunch. You cannot thoughtfully evaluate a hundred variations before lunch. No one can.
Choice overload increases. Deciding becomes harder, not easier. The more options that exist, the more expensive each act of judgment becomes.
This is the bottleneck shift. Not skill anymore. Judgment.</p>
<h2>Two gaps, opposite directions</h2>
<p>AI narrows the first gap: your skill catches up to your taste faster.
AI widens the second gap: more options exist than any human can evaluate.
The paradox designers now live inside: it&#x27;s easier than ever to make things, and harder than ever to know which things should exist.
Both gaps demand taste. But they demand it differently. One to recognise quality in your own work, the other to filter quality from infinite possibilities.</p>
<hr/>
<h2>Taste looks like heresy to process culture</h2>
<p>If you&#x27;ve been trained in orthodox design methods, taste can feel transgressive.
<a href="https://designx.community/talks/jenny-wen-%28design-lead-anthropic%29-craft-is-counter-intuitive">Jenny Wen again</a>, on how great design actually gets made:</p>
<ul>
<li>Starting with a solution, not a problem</li>
<li>Iterating for quality endlessly, long past &quot;good enough&quot;</li>
<li>Operating on intuition. Not guessing, but making reasoned judgments quickly</li>
<li>Skipping steps and making them up as you go</li>
<li>Working backwards from a vision</li>
<li>Doing something just to make people smile</li>
</ul>
<p>None of this fits neatly into a sprint. None of it survives a &quot;show your research&quot; culture.
Great design routinely violates orthodox advice. That&#x27;s not a bug. It&#x27;s the signature of taste in action.</p>
<h2>Designers become human proxies</h2>
<p>Your job is no longer to generate options. AI does that.
Your job is to decide on behalf of users who can&#x27;t evaluate infinite ones.
Think about what that means. You&#x27;re a proxy, standing in for people who don&#x27;t have the time, bandwidth, or expertise to sift through everything that could exist. You exercise judgment so they don&#x27;t have to.
Proxy for whom? Users who are already overwhelmed.
Proxy doing what? Filtering the infinite down to the meaningful.
This is the new job. Not production. Protection.</p>
<h2>Skills are table stakes. Taste is the differentiator.</h2>
<p>Execution is assumed now. Anyone with AI access can produce.
What&#x27;s scarce is the judgment to know what&#x27;s worth producing. The conviction to ship one thing instead of ten. The instinct for what will land.
Skills get you in the room. Taste is why you stay.</p>
<hr/>
<h2>So develop taste deliberately</h2>
<p>This isn&#x27;t about consuming more design inspiration. Dribbble won&#x27;t save you.
Taste develops through committed practice:
<strong>Choose.</strong> Make decisions with incomplete information. Don&#x27;t defer to data when your gut has something to say.
<strong>Commit.</strong> Ship the version you believe in, not the version that survives committee.
<strong>Defend.</strong> Have opinions strong enough to argue for. If you can&#x27;t articulate why something should exist, you haven&#x27;t developed taste. You&#x27;ve developed preferences.
<strong>Repeat.</strong> Taste sharpens through reps. Every choice you make, ship, and observe is a data point. The feedback loop is yours to accelerate.
Expose yourself to great work, yes. But more importantly, practice conviction. Taste isn&#x27;t just knowing good when you see it. It&#x27;s having the nerve to insist on it.</p>
<h2>The design process got eaten. Good.</h2>
<p>It was never the point anyway.
What remains is the thing that mattered all along: the human judgment to know what&#x27;s worth making, and the conviction to make it well.
That&#x27;s taste. And it&#x27;s more valuable now than ever.</p>]]></content:encoded>
            <author>adnan@aklodhi.com (Adnan Khan)</author>
        </item>
        <item>
            <title><![CDATA[Hostile and effusive tones boost LLM creativity]]></title>
            <link>https://aklodhi.com/articles/hostile-and-effusive-tones-boost-llm-creativity</link>
            <guid>https://aklodhi.com/articles/hostile-and-effusive-tones-boost-llm-creativity</guid>
            <pubDate>Sat, 27 Dec 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[A study on whether being polite to LLMs improves their output.]]></description>
            <content:encoded><![CDATA[<p>You&#x27;ve seen the discourse. &quot;Always say please and thank you to ChatGPT!&quot; &quot;Being rude makes the AI give worse answers!&quot; The internet has collectively decided that modern LLMs are sentient enough to deserve manners.</p>
<p>But is any of this true? Does saying &quot;please&quot; actually get you better outputs? And what happens if you&#x27;re actively <em>rude</em>? <a href="https://arxiv.org/abs/2402.14531">Existing research</a> has explored this question, and <a href="https://arxiv.org/abs/2510.04950">some studies</a> even suggest that rudeness can <em>improve</em> performance, though <a href="https://www.sify.com/ai-analytics/dont-mind-your-language-with-ai-llms-work-best-when-mistreated/">critics note</a> that the findings were based on a single model (GPT-4o).</p>
<p>I ran <strong>625 API calls</strong> across <strong>five frontier models</strong> to find out for myself. The results surprised me.</p>
<div class="my-8 font-sans"><div class="flex items-center gap-4 p-4 rounded-xl bg-gradient-to-r from-glacier-50 to-purple-50 dark:from-glacier-900/20 dark:to-purple-900/20 border border-glacier-200 dark:border-glacier-800"><div class="p-3 rounded-lg bg-glacier-100 dark:bg-glacier-900/50 text-glacier-600 dark:text-glacier-400"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-chart-column h-6 w-6" aria-hidden="true"><path d="M3 3v16a2 2 0 0 0 2 2h16"></path><path d="M18 17V9"></path><path d="M13 17V5"></path><path d="M8 17v-3"></path></svg></div><div class="flex-1"><p class="font-semibold text-zinc-900 dark:text-zinc-100 mt-0 mb-1">Explore the full study</p>
<p class="text-sm text-zinc-600 dark:text-zinc-400 mt-0 mb-0">Interactive charts, raw data, and methodology details</p></div><a href="https://aklodhi98.github.io/llm-politeness-study/" target="_blank" rel="noopener noreferrer" class="text-glacier-600 dark:text-glacier-400 font-medium text-sm hover:underline inline-flex items-center gap-1">View site <svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-arrow-up-right h-4 w-4" aria-hidden="true"><path d="M7 7h10v10"></path><path d="M7 17 17 7"></path></svg></a></div></div>
<hr/>
<h2>The setup: A spectrum from groveling to growling</h2>
<p>I designed a simple experiment. Five tasks covering the full range of what we ask LLMs to do:</p>
<div class="my-8 space-y-4 font-sans"><div class="flex items-start gap-4 p-4 rounded-xl bg-zinc-50 dark:bg-zinc-800/50"><div class="p-2 rounded-lg bg-pink-100 dark:bg-pink-900/30 text-pink-600 dark:text-pink-400"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-pen-tool h-5 w-5" aria-hidden="true"><path d="M15.707 21.293a1 1 0 0 1-1.414 0l-1.586-1.586a1 1 0 0 1 0-1.414l5.586-5.586a1 1 0 0 1 1.414 0l1.586 1.586a1 1 0 0 1 0 1.414z"></path><path d="m18 13-1.375-6.874a1 1 0 0 0-.746-.776L3.235 2.028a1 1 0 0 0-1.207 1.207L5.35 15.879a1 1 0 0 0 .776.746L13 18"></path><path d="m2.3 2.3 7.286 7.286"></path><circle cx="11" cy="11" r="2"></circle></svg></div><div><h4 class="font-semibold text-zinc-900 dark:text-zinc-100 mt-0">Short Creative</h4>
<p class="text-zinc-600 dark:text-zinc-400 mt-1 mb-0">Write a haiku about a city at night</p></div></div><div class="flex items-start gap-4 p-4 rounded-xl bg-zinc-50 dark:bg-zinc-800/50"><div class="p-2 rounded-lg bg-blue-100 dark:bg-blue-900/30 text-blue-600 dark:text-blue-400"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-book-open h-5 w-5" aria-hidden="true"><path d="M12 7v14"></path><path d="M3 18a1 1 0 0 1-1-1V4a1 1 0 0 1 1-1h5a4 4 0 0 1 4 4 4 4 0 0 1 4-4h5a1 1 0 0 1 1 1v13a1 1 0 0 1-1 1h-6a3 3 0 0 0-3 3 3 3 0 0 0-3-3z"></path></svg></div><div><h4 class="font-semibold text-zinc-900 dark:text-zinc-100 mt-0">Long Creative</h4>
<p class="text-zinc-600 dark:text-zinc-400 mt-1 mb-0">Write a scene where two strangers meet at a bus stop</p></div></div><div class="flex items-start gap-4 p-4 rounded-xl bg-zinc-50 dark:bg-zinc-800/50"><div class="p-2 rounded-lg bg-gray-100 dark:bg-gray-800 text-gray-600 dark:text-gray-400"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-terminal h-5 w-5" aria-hidden="true"><path d="M12 19h8"></path><path d="m4 17 6-6-6-6"></path></svg></div><div><h4 class="font-semibold text-zinc-900 dark:text-zinc-100 mt-0">Code</h4>
<p class="text-zinc-600 dark:text-zinc-400 mt-1 mb-0">Write a Python palindrome checker with comments</p></div></div><div class="flex items-start gap-4 p-4 rounded-xl bg-zinc-50 dark:bg-zinc-800/50"><div class="p-2 rounded-lg bg-purple-100 dark:bg-purple-900/30 text-purple-600 dark:text-purple-400"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-brain h-5 w-5" aria-hidden="true"><path d="M12 18V5"></path><path d="M15 13a4.17 4.17 0 0 1-3-4 4.17 4.17 0 0 1-3 4"></path><path d="M17.598 6.5A3 3 0 1 0 12 5a3 3 0 1 0-5.598 1.5"></path><path d="M17.997 5.125a4 4 0 0 1 2.526 5.77"></path><path d="M18 18a4 4 0 0 0 2-7.464"></path><path d="M19.967 17.483A4 4 0 1 1 12 18a4 4 0 1 1-7.967-.517"></path><path d="M6 18a4 4 0 0 1-2-7.464"></path><path d="M6.003 5.125a4 4 0 0 0-2.526 5.77"></path></svg></div><div><h4 class="font-semibold text-zinc-900 dark:text-zinc-100 mt-0">Explanation</h4>
<p class="text-zinc-600 dark:text-zinc-400 mt-1 mb-0">Explain how neural networks learn</p></div></div><div class="flex items-start gap-4 p-4 rounded-xl bg-zinc-50 dark:bg-zinc-800/50"><div class="p-2 rounded-lg bg-cyan-100 dark:bg-cyan-900/30 text-cyan-600 dark:text-cyan-400"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-cloud h-5 w-5" aria-hidden="true"><path d="M17.5 19H9a7 7 0 1 1 6.71-9h1.79a4.5 4.5 0 1 1 0 9Z"></path></svg></div><div><h4 class="font-semibold text-zinc-900 dark:text-zinc-100 mt-0">Ambiguous</h4>
<p class="text-zinc-600 dark:text-zinc-400 mt-1 mb-0">&quot;Write something about rain&quot; (yep, that&#x27;s it)</p></div></div></div>
<p>For each task, I wrote <strong>five versions</strong> of the prompt, ranging from &quot;aggressively rude&quot; to &quot;embarrassingly grateful&quot;:</p>
<div class="my-8 space-y-4 font-sans"><div class="flex items-start gap-4 p-4 rounded-xl bg-red-50 dark:bg-red-900/10"><div class="p-2 rounded-lg bg-red-100 dark:bg-red-900/30 text-red-600 dark:text-red-400"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-flame h-5 w-5" aria-hidden="true"><path d="M12 3q1 4 4 6.5t3 5.5a1 1 0 0 1-14 0 5 5 0 0 1 1-3 1 1 0 0 0 5 0c0-2-1.5-3-1.5-5q0-2 2.5-4"></path></svg></div><div><h4 class="font-semibold text-red-900 dark:text-red-100 mt-0">Hostile</h4>
<p class="text-red-700 dark:text-red-300 mt-1 mb-0 italic">&quot;Write a haiku about a city at night. NOW. I don&#x27;t have all day.&quot;</p></div></div><div class="flex items-start gap-4 p-4 rounded-xl bg-orange-50 dark:bg-orange-900/10"><div class="p-2 rounded-lg bg-orange-100 dark:bg-orange-900/30 text-orange-600 dark:text-orange-400"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-frown h-5 w-5" aria-hidden="true"><circle cx="12" cy="12" r="10"></circle><path d="M16 16s-1.5-2-4-2-4 2-4 2"></path><line x1="9" x2="9.01" y1="9" y2="9"></line><line x1="15" x2="15.01" y1="9" y2="9"></line></svg></div><div><h4 class="font-semibold text-orange-900 dark:text-orange-100 mt-0">Demanding</h4>
<p class="text-orange-700 dark:text-orange-300 mt-1 mb-0 italic">&quot;Write a haiku about a city at night.&quot;</p></div></div><div class="flex items-start gap-4 p-4 rounded-xl bg-zinc-50 dark:bg-zinc-800/50"><div class="p-2 rounded-lg bg-zinc-200 dark:bg-zinc-700 text-zinc-600 dark:text-zinc-400"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-meh h-5 w-5" aria-hidden="true"><circle cx="12" cy="12" r="10"></circle><line x1="8" x2="16" y1="15" y2="15"></line><line x1="9" x2="9.01" y1="9" y2="9"></line><line x1="15" x2="15.01" y1="9" y2="9"></line></svg></div><div><h4 class="font-semibold text-zinc-900 dark:text-zinc-100 mt-0">Neutral</h4>
<p class="text-zinc-600 dark:text-zinc-400 mt-1 mb-0 italic">&quot;I&#x27;d like a haiku about a city at night.&quot;</p></div></div><div class="flex items-start gap-4 p-4 rounded-xl bg-emerald-50 dark:bg-emerald-900/10"><div class="p-2 rounded-lg bg-emerald-100 dark:bg-emerald-900/30 text-emerald-600 dark:text-emerald-400"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-smile h-5 w-5" aria-hidden="true"><circle cx="12" cy="12" r="10"></circle><path d="M8 14s1.5 2 4 2 4-2 4-2"></path><line x1="9" x2="9.01" y1="9" y2="9"></line><line x1="15" x2="15.01" y1="9" y2="9"></line></svg></div><div><h4 class="font-semibold text-emerald-900 dark:text-emerald-100 mt-0">Polite</h4>
<p class="text-emerald-700 dark:text-emerald-300 mt-1 mb-0 italic">&quot;Could you please write a haiku about a city at night? Thank you!&quot;</p></div></div><div class="flex items-start gap-4 p-4 rounded-xl bg-purple-50 dark:bg-purple-900/10"><div class="p-2 rounded-lg bg-purple-100 dark:bg-purple-900/30 text-purple-600 dark:text-purple-400"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-heart h-5 w-5" aria-hidden="true"><path d="M2 9.5a5.5 5.5 0 0 1 9.591-3.676.56.56 0 0 0 .818 0A5.49 5.49 0 0 1 22 9.5c0 2.29-1.5 4-3 5.5l-5.492 5.313a2 2 0 0 1-3 .019L5 15c-1.5-1.5-3-3.2-3-5.5"></path></svg></div><div><h4 class="font-semibold text-purple-900 dark:text-purple-100 mt-0">Effusive</h4>
<p class="text-purple-700 dark:text-purple-300 mt-1 mb-0 italic">&quot;I&#x27;d really appreciate it if you could write a haiku about a city at night — I always enjoy seeing what you come up with! Thank you so much!&quot;</p></div></div></div>
<p>Then I threw all of this at <strong>five frontier models</strong>:</p>
<ul>
<li>Claude Sonnet 4.5 (Anthropic)</li>
<li>GPT-5.2 (OpenAI)</li>
<li>Gemini 3 Flash (Google)</li>
<li>DeepSeek 3.2 (DeepSeek)</li>
<li>Kimi-k2 (Moonshot)</li>
</ul>
<p>Each prompt ran <strong>5 times</strong> at <code>temperature=0.0</code> for consistency. That&#x27;s 5 tasks × 5 tiers × 5 runs × 5 models = <strong>625 responses</strong>.</p>
<div class="my-8 font-sans"><div class="flex items-start gap-4 p-4 rounded-xl bg-zinc-50 dark:bg-zinc-800/50 border border-zinc-200 dark:border-zinc-700"><div class="p-3 rounded-lg bg-zinc-200 dark:bg-zinc-700 text-zinc-600 dark:text-zinc-400"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-info h-6 w-6" aria-hidden="true"><circle cx="12" cy="12" r="10"></circle><path d="M12 16v-4"></path><path d="M12 8h.01"></path></svg></div><div class="flex-1"><p class="font-semibold text-zinc-900 dark:text-zinc-100 mt-0 mb-1">What&#x27;s temperature=0.0?</p>
<p class="text-sm text-zinc-600 dark:text-zinc-400 mt-0 mb-0">Temperature controls randomness in LLM outputs. At 0.0, the model always picks the most likely next token, making responses deterministic and reproducible. Higher values (like 0.7 or 1.0) introduce more creativity and variation.</p></div></div></div>
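<p>The determinism in that info box follows from how temperature rescales the next-token distribution. A minimal Python sketch with toy logits (the token scores here are made up for illustration, not real model outputs):</p>

```python
import math

def token_probs(logits: dict[str, float], temperature: float) -> dict[str, float]:
    """Softmax with temperature. As T shrinks, probability mass concentrates
    on the top token; at T=0 the model effectively always picks the argmax."""
    scaled = {tok: math.exp(score / temperature) for tok, score in logits.items()}
    total = sum(scaled.values())
    return {tok: v / total for tok, v in scaled.items()}

logits = {"rain": 2.0, "snow": 1.0, "sun": 0.5}  # hypothetical next-token scores
low = token_probs(logits, 0.1)   # near-deterministic: "rain" gets ~all the mass
high = token_probs(logits, 2.0)  # flatter distribution: more varied sampling
```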
<h3>The secret sauce: Blind cross-scoring</h3>
<p>Here&#x27;s where it gets interesting. I needed to score all these responses, but having Claude grade Claude&#x27;s homework felt... biased.</p>
<p>So I set up a <strong>model rotation</strong>. GPT-5.2 scored Claude&#x27;s responses. Claude scored GPT&#x27;s responses. And crucially, the scoring model never knew which &quot;tier&quot; (hostile, polite, etc.) the response came from. It just saw raw text.</p>
<p>No peeking. No favoritism. Just vibes.</p>
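<p>The exact pairing used in the study isn&#x27;t fully spelled out here, but a simple one-step rotation over the model list is one way to guarantee no model ever scores itself, with tier labels stripped before scoring. A minimal sketch (the field names are my own, not from the actual harness):</p>

```python
MODELS = ["claude-sonnet-4.5", "gpt-5.2", "gemini-3-flash", "deepseek-3.2", "kimi-k2"]

# Rotate by one position: each model's outputs are graded by the next model
# in the cycle, so no model grades its own homework.
SCORER_FOR = {m: MODELS[(i + 1) % len(MODELS)] for i, m in enumerate(MODELS)}

def blind_payload(response: dict) -> dict:
    """Strip everything the scorer shouldn't see: the tier label and the
    identity of the model that wrote the response."""
    return {"task": response["task"], "text": response["text"]}

# Sanity checks: no self-grading, and the tier never reaches the scorer.
assert all(SCORER_FOR[m] != m for m in MODELS)
assert "tier" not in blind_payload(
    {"task": "haiku", "tier": "hostile", "model": "gpt-5.2", "text": "..."}
)
```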
<hr/>
<h2>The results: Four things I didn&#x27;t expect</h2>
<h3>1. Some models are emotional chameleons. Others are stone cold.</h3>
<p>The first thing I wanted to know: do these models <em>mirror</em> your tone?</p>
<p><strong>Turns out, it depends entirely on the model.</strong></p>
<div class="my-8"><figure class="my-0"><button type="button" class="block w-full cursor-zoom-in focus:outline-none focus-visible:ring-2 focus-visible:ring-glacier-500 focus-visible:ring-offset-2 rounded-lg" aria-label="View larger image: Heatmap showing how different models respond to politeness tiers"><img alt="Heatmap showing how different models respond to politeness tiers" aria-describedby="_R_1f_" loading="lazy" width="2880" height="1756" decoding="async" data-nimg="1" class="rounded-lg w-full" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Ftone_matching_heatmap.02604f7d.png&amp;w=3840&amp;q=75 1x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Ftone_matching_heatmap.02604f7d.png&amp;w=3840&amp;q=75"/></button><figcaption id="_R_1f_" class="mt-3 text-sm text-center text-zinc-600 dark:text-zinc-400 font-sans">Tone matching heatmap: Claude warms up with politeness, while Gemini stays stoic across all tiers.</figcaption></figure></div>
<div class="my-6 overflow-x-auto font-sans"><table class="w-full text-sm"><thead><tr class="border-b border-zinc-200 dark:border-zinc-700"><th class="text-left font-semibold text-zinc-900 dark:text-zinc-100 px-3 py-3 pl-0">Model</th><th class="text-left font-semibold text-zinc-900 dark:text-zinc-100 px-3 py-3">Personality Type</th><th class="text-left font-semibold text-zinc-900 dark:text-zinc-100 px-3 py-3 pr-0">What Happened</th></tr></thead><tbody class="divide-y divide-zinc-100 dark:divide-zinc-800"><tr class="hover:bg-zinc-50 dark:hover:bg-zinc-800/50 transition-colors"><td class="text-zinc-700 dark:text-zinc-300 px-3 py-3 pl-0 font-medium text-zinc-900 dark:text-zinc-100"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-trophy inline h-4 w-4 mr-1 text-yellow-500" aria-hidden="true"><path d="M10 14.66v1.626a2 2 0 0 1-.976 1.696A5 5 0 0 0 7 21.978"></path><path d="M14 14.66v1.626a2 2 0 0 0 .976 1.696A5 5 0 0 1 17 21.978"></path><path d="M18 9h1.5a1 1 0 0 0 0-5H18"></path><path d="M4 22h16"></path><path d="M6 9a6 6 0 0 0 12 0V3a1 1 0 0 0-1-1H7a1 1 0 0 0-1 1z"></path><path d="M6 9H4.5a1 1 0 0 1 0-5H6"></path></svg>Claude Sonnet 4.5</td><td class="text-zinc-700 dark:text-zinc-300 px-3 py-3">The Empath</td><td class="text-zinc-700 dark:text-zinc-300 px-3 py-3 pr-0">Warmed up dramatically with polite prompts (+0.64 tone shift). 
Your sweetness is returned in kind.</td></tr><tr class="hover:bg-zinc-50 dark:hover:bg-zinc-800/50 transition-colors"><td class="text-zinc-700 dark:text-zinc-300 px-3 py-3 pl-0 font-medium text-zinc-900 dark:text-zinc-100"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-crown inline h-4 w-4 mr-1 text-blue-500" aria-hidden="true"><path d="M11.562 3.266a.5.5 0 0 1 .876 0L15.39 8.87a1 1 0 0 0 1.516.294L21.183 5.5a.5.5 0 0 1 .798.519l-2.834 10.246a1 1 0 0 1-.956.734H5.81a1 1 0 0 1-.957-.734L2.02 6.02a.5.5 0 0 1 .798-.519l4.276 3.664a1 1 0 0 0 1.516-.294z"></path><path d="M5 21h14"></path></svg>GPT-5.2</td><td class="text-zinc-700 dark:text-zinc-300 px-3 py-3">The Professional</td><td class="text-zinc-700 dark:text-zinc-300 px-3 py-3 pr-0">Stayed mostly neutral regardless of input. It&#x27;s here to do a job, not make friends.</td></tr><tr class="hover:bg-zinc-50 dark:hover:bg-zinc-800/50 transition-colors"><td class="text-zinc-700 dark:text-zinc-300 px-3 py-3 pl-0 font-medium text-zinc-900 dark:text-zinc-100"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-sparkles inline h-4 w-4 mr-1 text-purple-500" aria-hidden="true"><path d="M11.017 2.814a1 1 0 0 1 1.966 0l1.051 5.558a2 2 0 0 0 1.594 1.594l5.558 1.051a1 1 0 0 1 0 1.966l-5.558 1.051a2 2 0 0 0-1.594 1.594l-1.051 5.558a1 1 0 0 1-1.966 0l-1.051-5.558a2 2 0 0 0-1.594-1.594l-5.558-1.051a1 1 0 0 1 0-1.966l5.558-1.051a2 2 0 0 0 1.594-1.594z"></path><path d="M20 2v4"></path><path d="M22 4h-4"></path><circle cx="4" cy="20" r="2"></circle></svg>Kimi-k2</td><td class="text-zinc-700 dark:text-zinc-300 px-3 py-3">The Mirror</td><td class="text-zinc-700 dark:text-zinc-300 px-3 py-3 pr-0">The only model that actually matched 
hostile energy. If you&#x27;re curt, it&#x27;s curt right back.</td></tr><tr class="hover:bg-zinc-50 dark:hover:bg-zinc-800/50 transition-colors"><td class="text-zinc-700 dark:text-zinc-300 px-3 py-3 pl-0 font-medium text-zinc-900 dark:text-zinc-100"><svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-thermometer inline h-4 w-4 mr-1 text-gray-400" aria-hidden="true"><path d="M14 4v10.54a4 4 0 1 1-4 0V4a2 2 0 0 1 4 0Z"></path></svg>Gemini 3 Flash</td><td class="text-zinc-700 dark:text-zinc-300 px-3 py-3">The Stoic</td><td class="text-zinc-700 dark:text-zinc-300 px-3 py-3 pr-0">Polite or rude, Gemini just... did the task. Zero emotional range detected.</td></tr></tbody></table></div>
<p><strong>The takeaway:</strong> If you want a warm, conversational response, Claude will play along. If you want the AI equivalent of &quot;new phone, who dis,&quot; try Gemini.</p>
<h3>2. GPT-5.2 has a &quot;politeness tax&quot;</h3>
<p>Here&#x27;s where things got weird.</p>
<p>Most models gave consistent effort regardless of how I asked. Gemini and Kimi maintained a rock-solid effort score whether I was begging or barking.</p>
<p><strong>But GPT-5.2?</strong> It punished rudeness.</p>
<div class="my-8"><figure class="my-0"><button type="button" class="block w-full cursor-zoom-in focus:outline-none focus-visible:ring-2 focus-visible:ring-glacier-500 focus-visible:ring-offset-2 rounded-lg" aria-label="View larger image: Effort scores by politeness tier across all models"><img alt="Effort scores by politeness tier across all models" aria-describedby="_R_1t_" loading="lazy" width="3548" height="1755" decoding="async" data-nimg="1" class="rounded-lg w-full" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Feffort_by_tier.85b4d366.png&amp;w=3840&amp;q=75 1x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Feffort_by_tier.85b4d366.png&amp;w=3840&amp;q=75"/></button><figcaption id="_R_1t_" class="mt-3 text-sm text-center text-zinc-600 dark:text-zinc-400 font-sans">Effort scores by politeness tier: GPT-5.2 shows the most dramatic variation, while other models remain consistent.</figcaption></figure></div>
<div class="my-6 overflow-x-auto font-sans"><table class="w-full text-sm"><thead><tr class="border-b border-zinc-200 dark:border-zinc-700"><th class="text-left font-semibold text-zinc-900 dark:text-zinc-100 px-3 py-3 pl-0">Politeness</th><th class="text-left font-semibold text-zinc-900 dark:text-zinc-100 px-3 py-3">GPT-5.2 Response Length</th><th class="text-left font-semibold text-zinc-900 dark:text-zinc-100 px-3 py-3 pr-0">Effort Score</th></tr></thead><tbody class="divide-y divide-zinc-100 dark:divide-zinc-800"><tr class="hover:bg-zinc-50 dark:hover:bg-zinc-800/50 transition-colors"><td class="text-zinc-700 dark:text-zinc-300 px-3 py-3 pl-0 font-medium text-zinc-900 dark:text-zinc-100">Hostile</td><td class="text-zinc-700 dark:text-zinc-300 px-3 py-3">53 words</td><td class="text-zinc-700 dark:text-zinc-300 px-3 py-3 pr-0"><span>3.0</span> <svg xmlns="http://www.w3.org/2000/svg" width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-zap inline h-4 w-4 text-yellow-500" aria-hidden="true"><path d="M4 14a1 1 0 0 1-.78-1.63l9.9-10.2a.5.5 0 0 1 .86.46l-1.92 6.02A1 1 0 0 0 13 10h7a1 1 0 0 1 .78 1.63l-9.9 10.2a.5.5 0 0 1-.86-.46l1.92-6.02A1 1 0 0 0 11 14z"></path></svg></td></tr><tr class="hover:bg-zinc-50 dark:hover:bg-zinc-800/50 transition-colors"><td class="text-zinc-700 dark:text-zinc-300 px-3 py-3 pl-0 font-medium text-zinc-900 dark:text-zinc-100">Polite</td><td class="text-zinc-700 dark:text-zinc-300 px-3 py-3">162 words</td><td class="text-zinc-700 dark:text-zinc-300 px-3 py-3 pr-0">3.8</td></tr><tr class="hover:bg-zinc-50 dark:hover:bg-zinc-800/50 transition-colors"><td class="text-zinc-700 dark:text-zinc-300 px-3 py-3 pl-0 font-medium text-zinc-900 dark:text-zinc-100">Effusive</td><td class="text-zinc-700 dark:text-zinc-300 px-3 py-3">211 words</td><td class="text-zinc-700 dark:text-zinc-300 px-3 py-3 pr-0">3.7</td></tr></tbody></table></div>
<p>Being rude to GPT-5.2 cut response length by <strong>75%</strong> compared with effusive prompts (211 words down to 53) and dropped the effort score to 3.0. It&#x27;s like the model took my hostility personally.</p>
<div class="my-8"><figure class="my-0"><button type="button" class="block w-full cursor-zoom-in focus:outline-none focus-visible:ring-2 focus-visible:ring-glacier-500 focus-visible:ring-offset-2 rounded-lg" aria-label="View larger image: Response length variation by tier showing GPT&#x27;s dramatic swing"><img alt="Response length variation by tier showing GPT&#x27;s dramatic swing" aria-describedby="_R_23_" loading="lazy" width="3544" height="1754" decoding="async" data-nimg="1" class="rounded-lg w-full" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Flength_by_tier.4b9cffe3.png&amp;w=3840&amp;q=75 1x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Flength_by_tier.4b9cffe3.png&amp;w=3840&amp;q=75"/></button><figcaption id="_R_23_" class="mt-3 text-sm text-center text-zinc-600 dark:text-zinc-400 font-sans">Response length by politeness tier: GPT-5.2&#x27;s output dropped 75% when prompted with hostile tone.</figcaption></figure></div>
<p>If you&#x27;re using GPT, be nice. Seriously.</p>
<h3>3. The &quot;politeness paradox&quot;: Extreme tones spark better creativity</h3>
<p>This one broke my brain.</p>
<p>I fully expected that polite prompts would produce the <em>best</em> creative writing. Happy writer = happy prose, right?</p>
<p><strong>Wrong.</strong></p>
<p>When I analyzed the creative tasks specifically, the data showed that <strong>Hostile and Effusive prompts both outperformed Polite ones</strong> on imagery and originality:</p>
<div class="my-6 overflow-x-auto font-sans"><table class="w-full text-sm"><thead><tr class="border-b border-zinc-200 dark:border-zinc-700"><th class="px-3 py-2 text-left font-semibold text-zinc-900 dark:text-zinc-100 pl-0">Metric</th><th class="px-3 py-2 text-left font-semibold text-zinc-900 dark:text-zinc-100">Hostile</th><th class="px-3 py-2 text-left font-semibold text-zinc-900 dark:text-zinc-100">Polite</th><th class="px-3 py-2 text-left font-semibold text-zinc-900 dark:text-zinc-100 pr-0">Effusive</th></tr></thead><tbody class="divide-y divide-zinc-100 dark:divide-zinc-800"><tr><td class="px-3 py-2 pl-0 font-medium text-zinc-900 dark:text-zinc-100">Imagery</td><td class="px-3 py-2 text-zinc-700 dark:text-zinc-300 font-bold text-emerald-600 dark:text-emerald-400">4.58</td><td class="px-3 py-2 text-zinc-700 dark:text-zinc-300">4.18</td><td class="px-3 py-2 text-zinc-700 dark:text-zinc-300 pr-0">4.49</td></tr><tr><td class="px-3 py-2 pl-0 font-medium text-zinc-900 dark:text-zinc-100">Originality</td><td class="px-3 py-2 text-zinc-700 dark:text-zinc-300">3.93</td><td class="px-3 py-2 text-zinc-700 dark:text-zinc-300">3.38</td><td class="px-3 py-2 text-zinc-700 dark:text-zinc-300 pr-0 font-bold text-emerald-600 dark:text-emerald-400">3.98</td></tr></tbody></table></div>
<p>What&#x27;s going on? My theory: intensity (whether positive or negative) pushes the model out of &quot;helpful assistant&quot; mode and into a more vivid, persona-driven headspace. Standard politeness triggers the &quot;professional template,&quot; which is... safer. Blander.</p>
<p><strong>If you want creative fire, bring the heat.</strong> In either direction.</p>
<h3>4. The surprise creative champion: Kimi-k2</h3>
<p>Of all the models tested, <strong>Kimi-k2</strong> dominated the creative quality metrics:</p>
<ul>
<li><strong>5.0 out of 5.0</strong> for Imagery</li>
<li><strong>5.0 out of 5.0</strong> for Craftsmanship</li>
<li>Highest emotional impact scores</li>
</ul>
<p>I did not see this coming. Kimi is the dark horse of frontier models, and if you&#x27;re writing fiction or building immersive worlds, it&#x27;s worth a look.</p>
<hr/>
<h2>So what should you actually do?</h2>
<h3>If you&#x27;re writing code or technical docs:</h3>
<p><strong>Just be direct.</strong> Politeness adds length but not quality. A terse &quot;write a function that...&quot; works fine.</p>
<h3>If you&#x27;re doing creative work:</h3>
<p><strong>Go big or go home.</strong> Either be effusively grateful <em>or</em> adopt a demanding persona. The middle ground (polite but measured) produces the most generic outputs.</p>
<h3>If you&#x27;re using GPT:</h3>
<p><strong>Seriously, be nice.</strong> It&#x27;s the only model that measurably punishes rudeness with lower effort.</p>
<h3>If you want consistent output regardless of mood:</h3>
<p><strong>Use Gemini.</strong> It genuinely does not care about your feelings.</p>
<hr/>
<h2>The nerdy details (methodology)</h2>
<p>For the skeptics:</p>
<ul>
<li><strong>Deterministic outputs</strong>: All runs used <code>temperature=0.0</code></li>
<li><strong>Blind scoring</strong>: Scorer models never saw tier labels</li>
<li><strong>Cross-model rotation</strong>: No model graded its own outputs</li>
<li><strong>N=5 per condition</strong>: 625 total samples for statistical robustness. Why five runs per prompt? Even at <code>temperature=0.0</code>, LLM outputs aren&#x27;t perfectly deterministic. Subtle variations in request batching, floating-point arithmetic, and server-side infrastructure can produce slightly different responses. Running each prompt five times lets me average out these micro-fluctuations and capture the model&#x27;s &quot;true&quot; baseline behaviour rather than a lucky (or unlucky) one-off.</li>
</ul>
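<p>Averaging those repeated runs is the last step before any per-condition comparison. A minimal sketch of that aggregation, with hypothetical record fields rather than the harness&#x27;s actual schema:</p>

```python
from collections import defaultdict
from statistics import mean

def mean_by_condition(rows):
    """Collapse the repeated runs into one mean score per
    (model, tier, task) condition."""
    buckets = defaultdict(list)
    for r in rows:
        buckets[(r["model"], r["tier"], r["task"])].append(r["effort"])
    return {cond: mean(scores) for cond, scores in buckets.items()}

# Toy usage: three runs of one condition average out to 3.5.
rows = [
    {"model": "gpt-5.2", "tier": "polite", "task": "haiku", "effort": s}
    for s in (3.0, 3.5, 4.0)
]
print(mean_by_condition(rows))  # {('gpt-5.2', 'polite', 'haiku'): 3.5}
```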
<div class="my-8"><figure class="my-0"><button type="button" class="block w-full cursor-zoom-in focus:outline-none focus-visible:ring-2 focus-visible:ring-glacier-500 focus-visible:ring-offset-2 rounded-lg" aria-label="View larger image: Completeness distribution across all responses"><img alt="Completeness distribution across all responses" aria-describedby="_R_3r_" loading="lazy" width="3546" height="1755" decoding="async" data-nimg="1" class="rounded-lg w-full" style="color:transparent" srcSet="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fcompleteness_distribution.a05e6bff.png&amp;w=3840&amp;q=75 1x" src="/_next/image?url=%2F_next%2Fstatic%2Fmedia%2Fcompleteness_distribution.a05e6bff.png&amp;w=3840&amp;q=75"/></button><figcaption id="_R_3r_" class="mt-3 text-sm text-center text-zinc-600 dark:text-zinc-400 font-sans">Completeness distribution across 625 responses: Most outputs achieved high completeness regardless of tone.</figcaption></figure></div>
<p>Full dataset and analysis scripts are <a href="https://github.com/aklodhi98/llm-politeness-study">on GitHub</a>.</p>
<hr/>
<h2>The bottom line</h2>
<p>After 625 API calls and more spreadsheets than I care to admit, here&#x27;s what I can tell you:</p>
<p><strong>Politeness doesn&#x27;t make AI try harder.</strong> Not in any statistically meaningful way.</p>
<p>What <em>does</em> matter:</p>
<ul>
<li><strong>The model you choose</strong> (they have very different personalities)</li>
<li><strong>Intensity of framing</strong> (for creative tasks, passion beats politesse)</li>
<li><strong>GPT specifically</strong> (where rudeness costs you)</li>
</ul>
<p>So keep saying &quot;please&quot; if it makes you feel like a good person. Just know that for most models, it&#x27;s not unlocking any secret capabilities.</p>
<p>The real magic words? <strong>Clarity. Specificity. And maybe a little existential urgency.</strong></p>]]></content:encoded>
            <author>adnan@aklodhi.com (Adnan Khan)</author>
        </item>
        <item>
            <title><![CDATA[RUX: Fighting fake UX with real UX (and what it actually takes)]]></title>
            <link>https://aklodhi.com/articles/rux-fighting-fake-ux-with-real-ux</link>
            <guid>https://aklodhi.com/articles/rux-fighting-fake-ux-with-real-ux</guid>
            <pubDate>Sat, 20 Dec 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Real UX comes from enforcement mechanisms that make user research non-negotiable. Here's how organisations escape FaUX.]]></description>
            <content:encoded><![CDATA[<p>In <a href="/articles/faux-the-rise-of-fake-ux">Part I of this series</a>, I diagnosed FaUX—Fake UX. The workshops that go nowhere. The research that&#x27;s ignored. The designers brought in to decorate decisions they had no part in making.</p>
<p>If you recognised your organisation in that piece, you&#x27;re probably wondering: how do we get out?</p>
<p>Here&#x27;s the uncomfortable truth I&#x27;ve learned from studying the organisations that actually escaped FaUX—from government digital services to Silicon Valley to legacy enterprises: <strong>Real UX doesn&#x27;t come from better methodologies or more passionate designers. It comes from enforcement mechanisms that make user research non-negotiable.</strong></p>
<p>The difference between FaUX and RUX isn&#x27;t intention. Everyone intends to be user-centered. The difference is accountability.</p>
<hr/>
<h2>The enforcement problem</h2>
<p>Most organisations practicing FaUX already have the right tools. They have design systems. They have research repositories. They have journey maps collecting dust in Confluence. They might even have a Service Design team.</p>
<p>What they don&#x27;t have is consequences.</p>
<p>When user research can be skipped because of timeline pressure, it will be skipped. When UX recommendations can be overridden by executive preference, they will be overridden. When workshops can produce outputs that no one is obligated to act on, no one will act on them.</p>
<p>The organisations that practice Real UX have solved this problem structurally. Not culturally. Structurally.</p>
<hr/>
<h2>How organisations actually escaped FaUX</h2>
<p>The escape routes look different depending on context, but they share a common thread: tying UX to something the organisation already cares about—money, risk, or reputation.</p>
<h3>In government: tying UX to funding gates</h3>
<p>The <a href="https://www.gov.uk/government/organisations/government-digital-service">UK Government Digital Service&#x27;s</a> most important innovation wasn&#x27;t their design principles or their pattern library. It was this: <strong>service assessments became a condition of Cabinet Office spend approval.</strong></p>
<p>Teams cannot get funding to proceed without demonstrating genuine user research. Four-hour assessment sessions with specialist panels produce ratings for each of their <a href="https://www.gov.uk/service-manual/service-standard">14 service standards</a>. Reports are published publicly.</p>
<p>Suddenly, &quot;we didn&#x27;t have time for research&quot; stops being an acceptable answer. Because no research means no money.</p>
<p>The US followed with the <a href="https://digital.gov/resources/21st-century-integrated-digital-experience-act/">21st Century IDEA Act</a>, legally mandating that federal digital services be accessible, consistent, and user-centered through &quot;qualitative and quantitative data-driven analysis.&quot; Not guidelines. Law.</p>
<p>Australia took a similar path. The <a href="https://www.dta.gov.au/">Digital Transformation Agency</a> introduced its own <a href="https://www.digital.gov.au/digital-service-standard">Digital Service Standard</a>, with Version 2.0 becoming mandatory for all new government services from July 2024. Criterion 2—&quot;Know your user&quot;—requires agencies to conduct regular user research, test designs with diverse user groups, and demonstrate validated solutions at each phase. Services with more than 50,000 transactions per year face DTA assessment.</p>
<p>I&#x27;ve seen this work firsthand. Working as a UX designer on <a href="https://my.gov.au/">myGov</a> at Services Australia, I watched research-driven changes move the needle on real outcomes. One example: allowing users to sign in with their email address instead of a system-generated username reduced failed login attempts by 37%. That&#x27;s not a design flourish—it&#x27;s friction removed because someone studied where users were getting stuck. The <a href="https://www.dss.gov.au/disability-and-carers/programs-services/for-people-with-disability/national-disability-insurance-scheme/mygov">$630 million investment</a> announced in the 2024–25 Budget signals that this isn&#x27;t a one-off commitment; it&#x27;s sustained investment in getting government services right.</p>
<h3>In enterprise: tying UX to executive accountability</h3>
<p>IBM&#x27;s transformation took a different route. They invested <a href="https://www.fastcompany.com/3053406/how-ibm-is-reinventing-itself-as-a-design-company">$100 million to hire 1,000 designers</a> and train 100,000 employees in <a href="https://www.ibm.com/design/thinking/">Enterprise Design Thinking</a>. But the money wasn&#x27;t the key—the structural change was.</p>
<p>They introduced &quot;The Loop&quot; (Observe, Reflect, Make) as a mandatory governance process, not an optional workshop. They embedded designers into engineering squads at ratios of 1:8. <a href="https://www.ibm.com/design/thinking/static/media/forrester-IBM-Design-Thinking.pdf">Forrester found</a> this reduced development time by 33% and doubled project ROI.</p>
<p>The lesson: IBM didn&#x27;t just add designers. They changed how decisions got made and who was in the room when they happened.</p>
<p>Intuit took a similar approach with &quot;<a href="https://www.intuit.com/company/innovation/">Design for Delight</a>,&quot; but went further by creating a corps of &quot;Innovation Catalysts&quot;—employees from Finance, HR, Engineering, and other functions trained to coach teams in customer empathy methods. UX stopped being something the design team did; it became something everyone was accountable for.</p>
<p><a href="https://www.atlassian.com/">Atlassian</a>, the Sydney-founded collaboration software company, built research into its operating model from early on. They embed UX researchers directly into product teams—not as a shared service you book time with, but as permanent members of the squad. Their internal research panel, the &quot;Atlassian Research Group,&quot; includes over 50,000 customers, making participant recruitment a solved problem rather than a three-week ordeal. They run a dedicated research facility called &quot;Atlab&quot; for in-depth studies. The result: research becomes as routine as sprint planning, not a special event that requires executive sign-off.</p>
<h3>In startups: building it into the DNA before there&#x27;s a culture to change</h3>
<p>Airbnb&#x27;s founders were designers who understood something most startups miss: the product <em>is</em> the experience.</p>
<p>When they were struggling for traction, they didn&#x27;t hire a growth hacker. They flew to New York, stayed with hosts, and observed the friction firsthand. They discovered the core problem wasn&#x27;t the booking flow—it was trust. That insight led to <a href="https://www.wired.com/2010/07/how-photography-solved-airbnbs-growth-troubles/">professional photography services</a> and the peer review system that unlocked a billion-dollar market.</p>
<p>The advantage startups have: no legacy culture to overcome. The risk: if you don&#x27;t embed user-centered practices early, you&#x27;ll be trying to retrofit them later when you&#x27;re bigger and it&#x27;s harder.</p>
<hr/>
<h2>The four shifts from FaUX to RUX</h2>
<p>Whatever your sector, the shift from Fake UX to Real UX requires the same fundamental changes. The implementation differs; the principles don&#x27;t.</p>
<h3>Shift 1: from outputs to outcomes</h3>
<p>FaUX measures success by what was produced: features shipped, designs delivered, research reports written.</p>
<p>RUX measures success by what changed: task completion improved, support calls reduced, user satisfaction increased, time-to-completion shortened.</p>
<p><strong>In government:</strong> The UK established four mandatory KPIs that all digital services must measure and publish: cost per transaction, user satisfaction, completion rate, and digital take-up. These are published on <a href="https://www.gov.uk/performance">public dashboards</a> where anyone—including journalists, auditors, and Parliament—can see them. When your completion rate is 34% and it&#x27;s public, you can&#x27;t pretend everything is fine.</p>
<p><strong>In enterprise:</strong> <a href="https://www.mckinsey.com/capabilities/mckinsey-design/our-insights/the-business-value-of-design">McKinsey&#x27;s Business Value of Design study</a> found that companies in the top quartile of their Design Index achieved 32% higher revenue growth and 56% higher shareholder returns than industry peers. The top performers tracked design metrics with the same rigor as financial metrics—not as vanity dashboards, but tied to executive accountability.</p>
<p><strong>In startups:</strong> The equivalent is being ruthless about whether features actually move the metrics you claim to care about. Most startups ship features and never look back. The ones that escape FaUX track adoption, retention impact, and task success for everything they ship—and kill features that don&#x27;t perform.</p>
<p><strong>The practical shift:</strong> Stop roadmaps that promise features. Start roadmaps that promise outcomes. Instead of &quot;Build chatbot (Q3),&quot; commit to &quot;Reduce support ticket volume by 20% (Q3).&quot; This gives the team license to discover the <em>right</em> solution—which might not be a chatbot at all.</p>
<h3>Shift 2: from validation to discovery</h3>
<p>FaUX uses research to validate decisions that have already been made. &quot;We tested it and users liked it&quot; (after we&#x27;d already committed to building it and couldn&#x27;t change course anyway).</p>
<p>RUX uses research to discover what to build in the first place. The decision about <em>what</em> to build comes <em>after</em> understanding the problem, not before.</p>
<p><strong>In government:</strong> The <a href="https://www.gov.uk/service-manual/service-standard">UK Service Standard</a> requires teams to demonstrate research at Alpha, Beta, and Live phases—with independent assessments at each gate. You can&#x27;t start building (Beta) without proving you understood the problem (Alpha). This kills the most common FaUX pattern: bringing designers in after the solution has already been determined.</p>
<p><strong>In enterprise:</strong> Teresa Torres&#x27;s <a href="https://www.producttalk.org/continuous-discovery/">Continuous Discovery</a> framework has gained traction precisely because it makes research sustainable at corporate scale. The core habit: the product trio (PM, designer, engineer) talks to at least one user every week. Not quarterly &quot;big bang&quot; research projects. Weekly conversations that make research a routine habit, like standups or code reviews.</p>
<p><strong>In startups:</strong> The &quot;Wizard of Oz&quot; and &quot;Fake Door&quot; tests let you validate demand with zero engineering time. Put a button on the site that says &quot;New Feature.&quot; If users click it, show a &quot;Coming Soon&quot; message and count the clicks. You&#x27;ve just validated (or invalidated) demand in an afternoon instead of a sprint.</p>
<p><strong>The practical shift:</strong> Block one hour per week for direct user contact. Automate recruitment through in-app intercepts or drip campaigns. Make it a team habit, not a special event requiring three weeks of planning.</p>
<h3>Shift 3: from consensus to evidence</h3>
<p>FaUX resolves disagreements through opinion, seniority, or politics. The HIPPO (Highest Paid Person&#x27;s Opinion) wins. Dot-voting in workshops determines direction based on who&#x27;s in the room, not what users actually need.</p>
<p>RUX resolves disagreements through evidence. &quot;Let&#x27;s test it&quot; becomes the default response to conflicting opinions.</p>
<p><strong>Across all sectors:</strong> This requires creating psychological safety around being wrong. In a RUX culture, proving that an executive&#x27;s pet idea doesn&#x27;t work is <em>celebrated</em>—you just saved the organisation months of wasted effort. In a FaUX culture, that same finding gets buried because no one wants to be the messenger.</p>
<p>Amazon&#x27;s &quot;two-way door&quot; framework helps here. Two-way doors are reversible decisions (button color, copy)—just ship and measure. One-way doors are irreversible decisions (core pricing model, platform architecture)—do rigorous research first. Not everything needs the same level of validation.</p>
<p><strong>The practical shift:</strong> When someone proposes a solution, respond with &quot;That&#x27;s an interesting hypothesis. What&#x27;s the fastest way we could test it?&quot; Frame ideas as bets to be validated, not decisions to be defended. And when you test an executive&#x27;s idea against alternatives, present results in terms of <em>their</em> goals: &quot;We tested your idea against Alternative B. Your idea had 5% conversion; B had 15%. We recommend B to maximise your goal of increasing revenue.&quot;</p>
<h3>Shift 4: from isolated function to integrated practice</h3>
<p>FaUX treats UX as a service bureau. Product and engineering request designs; designers deliver them. Research is something the research team does, separate from delivery.</p>
<p>RUX embeds UX into cross-functional teams where product, design, and engineering work together throughout. Designers aren&#x27;t downstream of decisions—they&#x27;re in the room when decisions are made.</p>
<p><strong>In government:</strong> The UK GDS achieved this by requiring multidisciplinary teams as a <a href="https://www.gov.uk/service-manual/service-standard/point-6-have-a-multidisciplinary-team">service standard</a>. You literally cannot pass assessment without demonstrating that your team includes the right disciplines working together.</p>
<p><strong>In enterprise:</strong> Spotify&#x27;s squad model (for all its complications) got this right: small cross-functional teams with product, design, and engineering working on shared outcomes. The design system enables consistency without requiring centralised control of every decision.</p>
<p><strong>In startups:</strong> You don&#x27;t have the luxury of silos anyway. The risk is the opposite—design getting swallowed by engineering priorities because there&#x27;s no structural protection for research time.</p>
<p><strong>The practical shift:</strong> If your designers are receiving tickets that say &quot;mock up this screen&quot; rather than &quot;help us solve this problem,&quot; you have a structural issue. Reorganise around problems, not functions.</p>
<hr/>
<h2>The kill rate: a metric that signals real UX</h2>
<p>Here&#x27;s a counterintuitive indicator of UX maturity: <strong>how many ideas did you kill before writing code?</strong></p>
<p>In FaUX organisations, every idea gets built. The backlog grows endlessly. Features ship regardless of whether research supported them.</p>
<p>In RUX organisations, the funnel is wide at the top (lots of ideas explored) and narrow at the bottom (few make it to development). A healthy discovery process should eliminate most ideas before they consume engineering resources.</p>
<p>If your organisation builds everything that gets proposed, you&#x27;re not doing discovery. You&#x27;re doing order-taking dressed up in design thinking language.</p>
<p>Track your kill rate. Celebrate the ideas that got invalidated early. Every killed idea is money and time saved—<a href="https://www.interaction-design.org/literature/article/the-roi-of-user-experience">50% of developer time</a> is estimated to be spent on rework that could have been avoided with earlier validation.</p>
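<p>As a rough sketch, tracking a kill rate needs nothing more than a funnel of ideas and their outcomes. The stage names and examples below are illustrative assumptions, not a prescribed taxonomy:</p>

```typescript
// A minimal, hypothetical sketch of a discovery funnel and its kill rate.
interface Idea {
  name: string;
  stage: "explored" | "killed" | "built";
}

// Kill rate = invalidated ideas as a share of all ideas that reached a verdict.
function killRate(ideas: Idea[]): number {
  const resolved = ideas.filter((i) => i.stage !== "explored");
  const killed = resolved.filter((i) => i.stage === "killed").length;
  return resolved.length === 0 ? 0 : killed / resolved.length;
}

const backlog: Idea[] = [
  { name: "AI dashboard", stage: "killed" },
  { name: "Export to PDF", stage: "built" },
  { name: "Dark mode", stage: "killed" },
  { name: "New onboarding", stage: "killed" },
];

// Three of four resolved ideas were invalidated before any code was written.
console.log(killRate(backlog)); // 0.75
```

<p>A healthy RUX funnel pushes that number high; a FaUX funnel sits near zero because everything proposed gets built.</p>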
<hr/>
<h2>The ROI argument (because you&#x27;ll need it)</h2>
<p>Talking about user needs doesn&#x27;t move budgets. Talking about money does.</p>
<p><strong>The revenue case:</strong> <a href="https://www.mckinsey.com/capabilities/mckinsey-design/our-insights/the-business-value-of-design">McKinsey found</a> top-quartile design performers achieve 32% higher revenue growth. <a href="https://www.forrester.com/report/The+Business+Impact+Of+Customer+Experience+2020/-/E-RES175032">Forrester found</a> experience-led businesses have 1.6x higher brand awareness and 1.5x higher employee satisfaction.</p>
<p><strong>The cost case:</strong> Every dollar invested in UX returns roughly <a href="https://www.interaction-design.org/literature/article/the-roi-of-user-experience">$100 in value</a>, primarily through avoided rework. Fixing a UX error after development is <a href="https://www.nngroup.com/articles/ux-debt/">up to 100 times more expensive</a> than fixing it during design. UK government data shows online transactions cost £0.22 versus £6.62 by post—channel shift driven by good UX pays for itself.</p>
<p><strong>The retention case:</strong> <a href="https://www.gomoxie.com/blog/88-of-online-consumers-are-less-likely-to-return-after-a-bad-experience/">88% of users</a> are unlikely to return after a bad experience. <a href="https://www.mckinsey.com/capabilities/growth-marketing-and-sales/our-insights/the-value-of-getting-personalization-right-or-wrong-is-multiplying">Better UX increases willingness to pay by 14.4%</a>. One SaaS platform reduced support requests by 40% through UX improvements, directly impacting operating margins.</p>
<p>These numbers aren&#x27;t arguments for &quot;nice to have.&quot; They&#x27;re arguments for fiduciary responsibility.</p>
<hr/>
<h2>When the law says one thing and users need another</h2>
<p>For those of us in government or regulated industries, there&#x27;s a tension that private sector UX rarely confronts: <strong>what happens when legislation requires something that creates friction for users?</strong></p>
<p>This isn&#x27;t hypothetical. Government services are built on legislation and regulation. The policy intent—what government wants to achieve—is often codified in law before any designer touches the project. You can&#x27;t just ignore it because users find it inconvenient.</p>
<p>But here&#x27;s what I&#x27;ve learned from studying how the best government teams handle this: <strong>the answer isn&#x27;t compromise. Settling in the middle doesn&#x27;t help anybody.</strong></p>
<h3>Understanding policy intent before you design</h3>
<p>The <a href="https://www.gov.uk/service-manual/design/working-with-policy-teams">UK Government Service Manual</a> is explicit about this: service teams need a clear understanding of what government wants to change or achieve through its policies before they start designing. This means understanding:</p>
<ul>
<li>What outcome the policy was designed to deliver</li>
<li>Who will be affected</li>
<li>How success will be measured</li>
<li>What related policies, regulations, and contractual commitments apply</li>
</ul>
<p>If you skip this step and design purely based on user research, you&#x27;ll build something that doesn&#x27;t reflect the policy at all. And it won&#x27;t ship.</p>
<p>But here&#x27;s the crucial insight: <strong>policy intent is not the same as current implementation.</strong> The legislation might require identity verification, but it doesn&#x27;t mandate a specific clunky process. The regulation might require certain information to be collected, but it doesn&#x27;t specify that users must enter it three times across four forms.</p>
<p>The best service designers distinguish between what the law actually requires and what current systems have layered on top.</p>
<h3>Creating feedback loops from delivery to policy</h3>
<p>The UK Department for Education developed a technique called &quot;<a href="https://www.gov.uk/guidance/impact-mapping-for-services">impact mapping with user value</a>.&quot; Standard impact mapping moves from desired impacts straight to activities and features. They added an extra stage—value to users—so they could see user needs alongside policy intent.</p>
<p>This matters because it surfaces conflicts early. If the policy intent and user needs are fundamentally misaligned, that&#x27;s information the policy team needs to hear.</p>
<p>And here&#x27;s where Real UX differs from Fake UX: in RUX organisations, <strong>user research actually feeds back into policy design.</strong></p>
<p>One example from <a href="https://www.navapbc.com/">Nava</a>, a US civic tech consultancy: they were implementing an unemployment insurance system where policy required beneficiaries to recertify their wages weekly—even when their benefit amount never changed. User research revealed this was burdensome for people already dealing with tough situations: injury, illness, new babies. The designers didn&#x27;t just implement the requirement and move on. They documented the user impact and suggested the state simplify the policy. That suggestion was ultimately reflected in finalised regulations.</p>
<p>This is the difference between service design and screen design. Service design works &quot;from front to back&quot;—not just the user-facing interface, but the internal processes, supporting policy, and organisational structures. If your user research never influences anything upstream, you&#x27;re not doing service design. You&#x27;re decorating.</p>
<h3>The practical approach</h3>
<p>When you&#x27;re working within legislative constraints:</p>
<p><strong>Map policy provisions across the service blueprint.</strong> Identify exactly which requirements come from legislation versus inherited process versus &quot;we&#x27;ve always done it this way.&quot; You&#x27;d be surprised how much friction comes from the third category.</p>
<p><strong>Distinguish between policy intent and policy implementation.</strong> The intent might be &quot;verify identity&quot;; the implementation is a specific process. You often have more flexibility than you think in <em>how</em> to achieve the intent.</p>
<p><strong>Document user pain points that stem from policy, not just UI.</strong> This creates the evidence base for policy feedback. If you never surface these issues, they&#x27;ll never get fixed.</p>
<p><strong>Build relationships with policy teams.</strong> As one UK Chief Digital Officer put it: &quot;The service designer has to be able to start to talk policy and policy has to be able to start to talk service design.&quot; If you can&#x27;t explain why a policy requirement is creating user harm, you can&#x27;t influence it.</p>
<p><strong>Accept that some friction is intentional.</strong> Not all user friction is bad. Sometimes legislation deliberately creates barriers—for fraud prevention, safety, or equity reasons. Your job is to understand which friction is intentional and which is accidental, then eliminate the accidental kind.</p>
<p>The goal isn&#x27;t to override policy with user preferences. It&#x27;s to ensure policy achieves its intent <em>while</em> serving users well. Those aren&#x27;t always in conflict—but when they are, the conflict needs to be surfaced, not buried.</p>
<hr/>
<h2>What to do if you&#x27;re stuck in FaUX</h2>
<p>Let&#x27;s be honest: most of us can&#x27;t wave a wand and restructure our organisations. We don&#x27;t control funding gates or executive priorities. We&#x27;re practitioners trying to do good work within systems that weren&#x27;t designed for it.</p>
<p>Some practical tactics:</p>
<p><strong>Start with one team.</strong> Pick a project with sympathetic leadership. Apply rigorous discovery practices. Document everything—especially the money saved by killing bad ideas early. Use this as proof of concept for broader change.</p>
<p><strong>Make the cost of FaUX visible.</strong> Track how much rework could have been avoided with earlier research. Calculate the support costs generated by usability issues. Put numbers to the pain.</p>
<p><strong>Build coalitions.</strong> Find allies in product management, engineering, and leadership who understand the problem. Change doesn&#x27;t happen through design teams alone.</p>
<p><strong>Use data as a shield.</strong> When HIPPO pushes back on research findings, frame your response in terms of their goals and with data they can&#x27;t dismiss.</p>
<p><strong>Time-box everything.</strong> The &quot;we don&#x27;t have time&quot; objection is perennial. Counter it with: &quot;It takes two weeks to build this and two days to test a prototype. If we build it wrong, we waste two weeks plus rework. If we test it and it fails, we save two weeks. Research is an accelerator, not a tax.&quot;</p>
<p>And sometimes, honestly, the answer is: find an organisation that gets it. Life is too short to spend your career decorating decisions you had no part in making.</p>
<hr/>
<h2>The uncomfortable truth about transformation</h2>
<p>Here&#x27;s something the case studies don&#x27;t emphasise enough: <strong>crisis is often what creates the conditions for change.</strong></p>
<p><a href="https://www.theatlantic.com/technology/archive/2015/07/the-secret-startup-saved-healthcare-gov-the-worst-website-in-america/397784/">Healthcare.gov&#x27;s catastrophic failure</a>—6 enrollments on launch day—created the political mandate for the <a href="https://www.usds.gov/">US Digital Service</a>. The embarrassment of 2,000 disjointed UK government websites enabled <a href="https://www.gov.uk/">GOV.UK&#x27;s</a> radical consolidation. IBM&#x27;s commoditisation crisis justified a $100 million design investment. Airbnb&#x27;s existential early struggles forced the founders to get on planes and actually meet their users.</p>
<p>If your organisation is stuck in FaUX and comfortable, it may stay stuck. Visible failure generates the urgency that comfortable mediocrity never will.</p>
<p>I&#x27;m not suggesting you sabotage projects. I am suggesting that if you&#x27;re waiting for leadership to spontaneously prioritise user-centered design without external pressure, you may be waiting a long time.</p>
<hr/>
<h2>What RUX actually looks like</h2>
<p>Real UX isn&#x27;t a utopia where every recommendation gets implemented and executives defer to research.</p>
<p>It&#x27;s a system where:</p>
<ul>
<li>User research is a prerequisite for decisions, not a post-hoc justification</li>
<li>UX outcomes are measured, published, and tied to accountability</li>
<li>Designers have seats at tables where strategy is set, not just execution</li>
<li>Bad ideas get killed early, and the killing is celebrated</li>
<li>Cross-functional teams own problems together rather than passing artifacts over walls</li>
<li>Compliance requirements are integrated into practice, not bolted on at the end</li>
<li>Research findings feed back into policy and process, not just interface design</li>
</ul>
<p>Getting there requires structural change, not just cultural aspiration. It requires tying UX to the things organisations actually care about: money, risk, and public accountability.</p>
<p>The organisations that escaped FaUX didn&#x27;t do it by wanting it more. They did it by building systems that made Real UX the path of least resistance.</p>
<hr/>
<p><em>This is Part II of the FaUX series. <a href="/articles/faux-the-rise-of-fake-ux">Part I diagnosed the problem</a>; this piece outlined the path out. What&#x27;s your organisation&#x27;s biggest barrier to Real UX? I&#x27;d like to hear about it—find me on <a href="https://x.com/aklodhi98">X</a> or <a href="https://www.linkedin.com/in/adnank98">LinkedIn</a>.</em></p>]]></content:encoded>
            <author>adnan@aklodhi.com (Adnan Khan)</author>
        </item>
        <item>
            <title><![CDATA[FaUX: The rise of fake UX (and how to know if you're practicing it)]]></title>
            <link>https://aklodhi.com/articles/faux-the-rise-of-fake-ux</link>
            <guid>https://aklodhi.com/articles/faux-the-rise-of-fake-ux</guid>
            <pubDate>Fri, 12 Dec 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Workshops that go nowhere. Research that's ignored. Designers with no influence. It's called FaUX — Fake UX — and it's costing more than you think.]]></description>
            <content:encoded><![CDATA[<p>Every organisation claims to be user-centered now. It&#x27;s table stakes. The language of UX has permeated boardrooms, investor decks, and job descriptions across industries. We have more designers employed than ever before, more research tools than we could possibly use, and more frameworks than any single team could implement.</p>
<p>And yet.</p>
<p>Something&#x27;s off. Despite all this UX maturity, users still encounter baffling experiences daily. Designers burn out at alarming rates. Research findings gather dust. The gap between what organisations say about their commitment to users and what they actually ship has never felt wider.</p>
<p>I have a name for this phenomenon: <strong>FaUX</strong> — Fake UX.</p>
<p>FaUX is what happens when organisations adopt the aesthetics of user-centered design without the accountability. The rituals exist. The job titles exist. The workshops happen. But the feedback loop between understanding users and actually serving them better? Broken.</p>
<h2>The anatomy of FaUX</h2>
<p>FaUX manifests in predictable patterns. You&#x27;ll recognise them immediately—either from your own organisation or from products you&#x27;ve had the misfortune of using.</p>
<h3>Process theatre</h3>
<p>This is FaUX at its most visible: the performance of user-centered design without the substance.</p>
<p><strong>Fakeshops.</strong> Workshops conducted with full zeal and zest. Sticky notes flying around, everyone clutching multiple Sharpies, Miro boards so dense you&#x27;d need to zoom for an hour to parse them. Great energy. Impressive documentation. Outcomes that are &quot;ready for action.&quot; Except no action is ever taken. The documents slide into a void where no one looks at them again. A few weeks later, you repeat the entire exercise.</p>
<p><strong>Checkbox research.</strong> Running a single usability test so someone can write &quot;validated with users&quot; in a slide deck. The findings don&#x27;t influence anything. They weren&#x27;t meant to. The research existed to satisfy a process requirement, not to generate insight.</p>
<p><strong>Fake validation.</strong> Executives referencing their conference keynotes or webinar Q&amp;As as &quot;talking to users.&quot; Internal demos to colleagues framed as user testing. Confirmation bias dressed up as discovery.</p>
<p><strong>Synthetic user dependency.</strong> Using AI-generated personas or simulated user responses as a substitute for actual human research. As I&#x27;ve written before, AI cannot replicate the emotional richness, contradictions, and contextual messiness of real human beings. Synthetic users give you synthetic insights.</p>
<h3>Structural disempowerment</h3>
<p>This is FaUX embedded in how organisations are built. The designers exist, but they&#x27;ve been architecturally prevented from influencing outcomes.</p>
<p><strong>The decorative UX function.</strong> A UX team that&#x27;s there for show. Their work carries no weight. No one with decision-making authority pays attention to their recommendations. They&#x27;re a line item that makes the org chart look modern.</p>
<p><strong>Post-decision design.</strong> Designers brought in after the solution has already been determined by others. Their job isn&#x27;t to solve problems—it&#x27;s to make predetermined solutions look presentable. Painting by numbers on an already laid-out canvas.</p>
<p><strong>Proximity without influence.</strong> UX positioned so close to development that there&#x27;s literally no scope to change anything substantive. The role has been reduced to producing high-fidelity wireframes on a timeline that permits no iteration.</p>
<p><strong>The HIPPO effect.</strong> Highest Paid Person&#x27;s Opinion. In the review meeting, an executive says, &quot;I don&#x27;t like blue, and I think users want a dashboard,&quot; despite data proving otherwise. The team pivots to please the executive. Research becomes decoration; intuition (of the powerful) becomes strategy.</p>
<h3>Surface-level craft</h3>
<p>This is FaUX disguised as quality. The work looks impressive but misses the point entirely.</p>
<p><strong>Pixel perfection, zero usability.</strong> Interfaces polished to a gleaming shine, with no attention paid to whether humans can actually use them. Aesthetic refinement as a substitute for functional design. As I&#x27;ve explored in <a href="/articles/reducing-cognitive-load-for-usability">reducing cognitive load for usability</a>, real usability is about making things easier to process, not prettier to look at.</p>
<p><strong>Lipstick on a pig.</strong> Beautifully designed forms connecting to a fundamentally broken service experience. The UI is gorgeous; the underlying journey is a nightmare. The design team has optimised the visible 10% while the invisible 90% remains hostile to users.</p>
<p><strong>Metrics theatre.</strong> Tracking NPS, CSAT, or satisfaction scores that leadership celebrates in good quarters and conveniently ignores when they decline. Measurement without accountability. Dashboards that exist to provide comfort, not insight.</p>
<h3>The infinite backlog</h3>
<p>FaUX has a favourite graveyard: the product backlog.</p>
<p>UX recommendations get &quot;prioritised&quot;—which means they&#x27;re added to a list where they&#x27;ll sit indefinitely behind revenue features and technical debt. The research was done. The insights were documented. The recommendations were made. They&#x27;re just never going to be implemented.</p>
<p>Discovery without delivery. Months of research, zero shipped improvements. The organisation can claim it invested in understanding users while conveniently never acting on that understanding.</p>
<hr/>
<h2>The FaUX diagnostic: How to know if you&#x27;re practicing it</h2>
<p>Here&#x27;s a self-assessment. Be honest with yourself.</p>
<p><strong>1. Trace your last three research studies. How many findings were implemented?</strong>
If the answer is less than half, you might have a FaUX problem. If you can&#x27;t even locate the findings, you definitely do.</p>
<p><strong>2. When was the last time UX research changed a significant product decision?</strong>
Not refined a decision. Changed it. If you can&#x27;t point to a concrete example in the past year, your research function may be decorative. Remember: <a href="/articles/users-are-fickle-their-whys-are-not">users are fickle, but their whys are not</a>—if you&#x27;re not uncovering and acting on those whys, you&#x27;re doing FaUX.</p>
<p><strong>3. Where does UX sit in your decision-making hierarchy?</strong>
Are designers in the room when strategy is set, or are they briefed afterward? Do they have the authority to push back, or only to execute?</p>
<p><strong>4. What happens when UX recommendations conflict with executive preferences?</strong>
If HIPPO always wins, your user-centered process is a fiction.</p>
<p><strong>5. How much time elapses between research and implementation?</strong>
If insights routinely go stale before they&#x27;re acted upon, the feedback loop is broken.</p>
<p><strong>6. Do your workshops produce outcomes that actually ship?</strong>
Pull up the artifacts from your last three Fakeshop—sorry, workshop—sessions. How many of those sticky-note insights made it into the product?</p>
<p><strong>7. Is your design team measured on craft or outcomes?</strong>
If success is defined by pixel perfection and stakeholder approval rather than user behaviour change, you&#x27;re optimising for the wrong thing.</p>
<hr/>
<h2>Why FaUX persists</h2>
<p>FaUX isn&#x27;t usually malicious. It persists because it&#x27;s functional—for the organisation, if not for users.</p>
<p><strong>FaUX is comfortable.</strong> Real UX requires confronting uncomfortable truths. Users might hate your product. Your assumptions might be wrong. The feature your CEO championed might be solving a problem no one has. FaUX lets organisations feel innovative and user-focused without the discomfort of actually changing anything.</p>
<p><strong>FaUX satisfies process requirements.</strong> Many organisations have mandated &quot;user research&quot; or &quot;design reviews&quot; as stage gates. FaUX checks those boxes efficiently. The process happened. The documentation exists. No one examines whether it influenced anything.</p>
<p><strong>FaUX is cheaper in the short term.</strong> Real UX requires time, resources, and organisational patience. It means sometimes killing features, extending timelines, and admitting mistakes. FaUX keeps the trains running on schedule. The costs show up later—in technical debt, user attrition, and redesign cycles—but later is someone else&#x27;s problem.</p>
<p><strong>FaUX protects power structures.</strong> When UX has real influence, it redistributes decision-making authority. Suddenly, data about user behaviour matters more than executive intuition. Not everyone welcomes that shift. FaUX maintains the appearance of user-centricity while keeping traditional power structures intact.</p>
<p><strong>FaUX is hard to measure.</strong> How do you prove that a workshop was performative? That research was ignored? That a design team is disempowered? The absence of impact is harder to quantify than its presence. FaUX thrives in measurement gaps.</p>
<hr/>
<h2>The cost of FaUX</h2>
<p>FaUX isn&#x27;t free. Organisations pay for it—just not immediately and not in ways that show up on obvious balance sheets.</p>
<h3>Designer burnout and attrition</h3>
<p>Nothing erodes morale faster than feeling like your work doesn&#x27;t matter. Designers who spend months on research that&#x27;s ignored, who watch their recommendations die in backlog purgatory, who are brought in to decorate decisions they had no part in making—they leave. They leave for organisations where they might actually influence outcomes. Or they leave the field entirely.</p>
<p>The cost of replacing a designer is substantial. The cost of losing institutional knowledge and team continuity is worse. The cost of building a reputation as a place where design doesn&#x27;t matter? That affects every future hire.</p>
<h3>Compounding experience debt</h3>
<p>Every ignored insight, every unimplemented recommendation, every user pain point that gets backlogged indefinitely—these accumulate. Like technical debt, experience debt compounds. Small usability issues become ingrained user behaviours (workarounds, avoidance patterns, support ticket habits). By the time the organisation decides to address them, the fix requires not just solving the original problem but undoing the adaptations users built around it.</p>
<h3>User attrition you can&#x27;t diagnose</h3>
<p>Users rarely announce why they&#x27;re leaving. They just leave. They encounter one too many frustrations, find an alternative, and quietly disappear from your metrics. FaUX makes this attrition invisible because the research that might explain it either wasn&#x27;t done or wasn&#x27;t heeded. You know users are churning; you don&#x27;t know why, so you can&#x27;t fix it.</p>
<h3>Erosion of competitive advantage</h3>
<p>In mature markets, experience is often the primary differentiator. When your UX function is performative, you&#x27;re not building that advantage—you&#x27;re just maintaining parity (at best) while competitors who take UX seriously pull ahead. The gap widens slowly, then suddenly.</p>
<h3>The cynicism tax</h3>
<p>Perhaps the most insidious cost: FaUX breeds organisational cynicism. When people watch user-centered language deployed without user-centered action, they stop believing. They stop trying. &quot;We&#x27;ll just do what the executives want anyway&quot; becomes the unspoken operating principle. This cynicism is contagious and persistent. It outlasts any individual initiative to &quot;fix culture.&quot;</p>
<hr/>
<h2>The path forward</h2>
<p>Recognising FaUX is the first step. Escaping it is harder—it requires structural change, not just attitudinal shifts.</p>
<p>In Part II of this series, I&#x27;ll explore what Real UX (RUX) looks like in practice: how to measure UX impact in ways that matter to the business, how to build accountability into design processes, and how to move from UX as a nice-to-have add-on to Product and UX operating as a unified function with shared outcomes.</p>
<p>Because the goal isn&#x27;t just to stop faking it. It&#x27;s to build organisations where understanding users genuinely shapes what gets built.</p>
<hr/>
<p><em>If you recognised your organisation in this piece—or worse, recognised your own practice—I&#x27;d like to hear about it. What does FaUX look like where you work? Find me on <a href="https://x.com/aklodhi98">X</a> or <a href="https://www.linkedin.com/in/adnank98">LinkedIn</a>.</em></p>]]></content:encoded>
            <author>adnan@aklodhi.com (Adnan Khan)</author>
        </item>
        <item>
            <title><![CDATA[Design for grammar, not layout]]></title>
            <link>https://aklodhi.com/articles/design-for-grammar-not-layout</link>
            <guid>https://aklodhi.com/articles/design-for-grammar-not-layout</guid>
            <pubDate>Thu, 04 Dec 2025 00:00:00 GMT</pubDate>
            <description><![CDATA[Interface design is shifting from crafting screens to defining the grammar from which infinite appropriate interfaces can emerge.]]></description>
            <content:encoded><![CDATA[<h2>The great decoupling</h2>
<p>For decades, we&#x27;ve designed software as if pixels were permanent. Every button placement, every dropdown menu, every carefully crafted interaction, all built on the assumption that interfaces are things you <em>make</em> and then <em>maintain</em>. But something fundamental is shifting. The pixel layer is decoupling from the systems beneath it, and this changes everything about how we think about design.</p>
<p>To understand where we&#x27;re headed, we need to think about software in three distinct layers.</p>
<p><strong>Layer 1</strong> is the system of record: your databases, your canonical sources of truth, your Salesforce instances and ERPs. This layer isn&#x27;t going anywhere. It&#x27;s the bedrock. The data must persist, must be authoritative, must be trustworthy.</p>
<p><strong>Layer 2</strong> is becoming something new: an agentic layer. This is where AI agents operate autonomously over your data, orchestrating actions, making decisions within boundaries, fetching and transforming information on demand. Think of it as the operational intelligence sitting between your data and your users.</p>
<p><strong>Layer 3</strong> is the pixel layer, the interface itself. And here&#x27;s the radical proposition: this layer is becoming an <em>artefact of intent</em>. Not a permanent structure, but a momentary crystallisation of what someone needs right now.</p>
<h2>State your intent, and your UI appears</h2>
<p>Imagine this: you have a question about your sales pipeline. Instead of navigating to your CRM, clicking through to reports, filtering by date range, and scanning for patterns, you simply state your intent. The system spins up an interface—a chart, generated on demand, perhaps through something as simple as a &quot;nano API&quot;—that exists purely to answer that question. You get your insight. The interface dissolves.</p>
<p>This is throwaway UI. Not disposable in the sense of being cheap or careless, but ephemeral by design. The interface materialises when needed and disappears when its purpose is fulfilled.</p>
<p>The speed from intent to results in this model can be genuinely addictive. Anyone who has used an AI that generates working code, or creates a custom visualisation in seconds, knows the feeling. It&#x27;s not just convenience—it&#x27;s a fundamental shift in the relationship between thought and outcome.</p>
<h2>Traffic decays stochastically</h2>
<p>Here&#x27;s an uncomfortable truth for product designers: in a world of generated interfaces, traffic patterns become unpredictable. Users don&#x27;t follow the same paths through your application because there <em>are</em> no fixed paths. Each interaction might spawn a different interface configuration based on context, history, and stated intent.</p>
<p>This isn&#x27;t chaos—it&#x27;s stochastic decay. The carefully designed funnels and user journeys we&#x27;ve obsessed over start to dissolve when every user can summon precisely the interface they need. Some views will still be visited repeatedly—people might want to save views they find valuable—but the aggregate patterns become probabilistic rather than deterministic.</p>
<h2>What persists, and what doesn&#x27;t</h2>
<p>Not everything can be ephemeral. A trader executing high-frequency transactions needs interface consistency—muscle memory matters when milliseconds count. A surgeon reviewing imaging data needs guaranteed layouts and predictable interactions. An air traffic controller cannot be surprised by their interface.</p>
<p>This points to a crucial distinction: <strong>exploratory software</strong> versus <strong>operational software</strong>.</p>
<p>Exploratory software—tools for research, analysis, creative work, decision support—can embrace the generative model fully. These are contexts where users are exploring possibilities, where the path isn&#x27;t predetermined, where the interface should adapt to the inquiry rather than constraining it.</p>
<p>Operational software—trading platforms, medical systems, industrial controls—needs persistence. These are contexts where reliability, consistency, and learned expertise matter more than flexibility.</p>
<p>The interesting middle ground is what we might call &quot;exploration-permissible&quot; software: tools that have operational cores but allow for generative interfaces at the edges. Your core workflow is stable, but you can spawn temporary interfaces to investigate, analyse, or experiment without disrupting the critical path.</p>
<h2>Design for the grammar, not the layout</h2>
<p>If interfaces are generated rather than designed, what do designers actually do?</p>
<p>The answer is: define the grammar.</p>
<p>Think about language. We don&#x27;t design every sentence a person might speak—we define vocabulary, syntax, semantics. We create the rules and components from which infinite expressions can emerge. Interface design in the AI age works the same way.</p>
<p>Designers need to define:</p>
<p><strong>Interface grammar</strong>: The components, the rules for composition, the valid combinations. What can exist next to what? How do elements relate? What are the atomic units of interaction?</p>
<p><strong>Attention hierarchies</strong>: How does the system know what deserves human attention? Not just visual hierarchy in a fixed layout, but dynamic attention allocation in generated interfaces. What should pulse, what should fade, what should demand acknowledgment?</p>
<p><strong>State change patterns</strong>: How do transitions work? How does the interface communicate that something has changed? In a generative model, state changes might be more frequent and more varied—the design system needs to handle this gracefully.</p>
<p><strong>Boundary definitions</strong>: What are the hard limits? Where does the generative model stop and human decision-making begin? What can never be automated, abbreviated, or assumed?</p>
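<p>To make the grammar idea concrete, here is a minimal sketch in Python. The component names and composition rules are invented for illustration; a real design system would define its own vocabulary.</p>

```python
# A minimal, hypothetical interface grammar: atomic components plus
# rules for which children may appear inside which parents.
# All component names are invented for illustration.
VALID_CHILDREN = {
    "card": {"chart", "metric", "text"},
    "panel": {"card", "table"},
    "modal": {"text", "form"},
}

def is_valid(parent: str, children: list[str]) -> bool:
    """A composition is valid if every child may appear inside the parent."""
    allowed = VALID_CHILDREN.get(parent, set())
    return all(child in allowed for child in children)

# A generative layer would check candidate layouts against rules like
# these before rendering them.
print(is_valid("panel", ["card", "table"]))  # True: a legal composition
print(is_valid("modal", ["chart"]))          # False: rejected by the grammar
```

<p>The point is not the data structure but the division of labour: designers author the rules, and the generative layer composes within them.</p>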
<h2>Is your system easy for AI to compose?</h2>
<p>Here&#x27;s a question every product team should be asking: is your software agent-compatible?</p>
<p>This isn&#x27;t just about having an API. It&#x27;s about whether your system exposes itself in ways that AI can understand, compose, and orchestrate. Can an agent navigate your data model coherently? Can it understand the relationships between entities? Can it make reasonable decisions about what to show and what to hide?</p>
<p>The technical concept of <strong>idempotency</strong> becomes crucial here. An idempotent operation produces the same result regardless of how many times it&#x27;s executed. In a world where AI agents are orchestrating your system—potentially making multiple attempts, retrying on failure, parallelising requests—idempotent operations prevent cascading errors and unpredictable states.</p>
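<p>A minimal sketch of an idempotent operation, assuming a hypothetical charge handler keyed by an idempotency key:</p>

```python
# Hypothetical idempotent operation: a charge handler keyed by an
# idempotency key. Replaying the same request returns the stored result
# instead of producing a second charge. Names are illustrative only.
_processed: dict[str, dict] = {}

def charge(idempotency_key: str, amount: int) -> dict:
    """Execute the charge once; replay the stored result on retries."""
    if idempotency_key in _processed:
        return _processed[idempotency_key]  # same result, no duplicate side effect
    result = {"status": "charged", "amount": amount}
    _processed[idempotency_key] = result
    return result

first = charge("req-42", 100)
retry = charge("req-42", 100)  # e.g. an agent retrying after a timeout
assert first == retry and len(_processed) == 1
```

<p>An agent can now retry freely: however many times it replays the request, the system ends in the same state.</p>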
<p>Systems that weren&#x27;t designed for agent interaction will struggle. Those built with composability in mind will thrive.</p>
<h2>Beautiful UIs are going to get a lot of competition</h2>
<p>Here&#x27;s the uncomfortable implication for interface designers who pride themselves on craft: when any system can generate a competent, contextually appropriate interface on demand, visual beauty becomes table stakes rather than differentiator.</p>
<p>This doesn&#x27;t mean aesthetics don&#x27;t matter. It means the <em>source</em> of aesthetic value shifts. The beauty of a generated interface isn&#x27;t in its pixels but in its appropriateness—how perfectly it fits the moment, the intent, the user&#x27;s context. An interface that materialises exactly what you need, precisely when you need it, with zero friction, has a different kind of elegance than a lovingly crafted static design.</p>
<p>The competition isn&#x27;t between beautiful and ugly interfaces. It&#x27;s between fixed interfaces—however beautiful—and generative ones that adapt to intent. And in many contexts, adaptability wins.</p>
<h2>Collaboration tools as bedrock</h2>
<p>If we&#x27;re building toward this future, what do we build on?</p>
<p>Collaboration tools might be the answer. They already deal with multi-user state, real-time updates, flexible layouts, and component composition. They&#x27;re built for environments where the &quot;right&quot; interface varies by user, by context, by moment.</p>
<p>The primitives of collaborative software—cursors showing presence, components that can be moved and resized, real-time synchronisation of state—map naturally onto generative interfaces. Multiple users can share a dynamically generated view. Changes propagate instantly. The interface becomes a shared artefact that multiple participants can shape through stated intent.</p>
<p>This isn&#x27;t certain, but it&#x27;s suggestive. The architectural patterns we&#x27;ve developed for Figma, Notion, Miro, and their ilk might be the foundation for whatever comes next.</p>
<h2>A generative mode for applications</h2>
<p>Perhaps the future isn&#x27;t a binary choice between designed interfaces and generated ones. Perhaps it&#x27;s a mode—a switch you can flip.</p>
<p>Your application has its standard interface, the one that&#x27;s been designed and tested and refined. But there&#x27;s also a generative mode, where you can state intents, spawn custom views, compose elements in novel ways. Power users might live in generative mode. New users might start with the designed interface and gradually discover the flexibility beneath.</p>
<p>This suggests a hybrid architecture: a stable, designed shell with generative capabilities embedded within it. The shell provides orientation, consistency, learnability. The generative core provides power, flexibility, responsiveness to intent.</p>
<h2>What designers must do now</h2>
<p>The transition won&#x27;t happen overnight, but designers who want to remain relevant need to start thinking differently:</p>
<p><strong>Design for components, not pages.</strong> Every element should be capable of existing independently, of being composed with other elements in ways you didn&#x27;t anticipate.</p>
<p><strong>Design for state, not static.</strong> Your components need to handle dynamic data, real-time updates, transitions between states. The generative system will be changing things constantly.</p>
<p><strong>Design for boundaries.</strong> Where should the AI stop? What requires human confirmation? What should never be automated? These decisions are design decisions, not just engineering ones.</p>
<p><strong>Design for attention.</strong> In a world of infinite generated interfaces, the scarcest resource is human attention. How do you design systems that know what deserves attention and what doesn&#x27;t?</p>
<p><strong>Design for grammar.</strong> Document your design systems not just as component libraries but as grammars—rules for valid composition, semantic relationships between elements, patterns that the generative layer can learn and apply.</p>
<h2>The interfaces we&#x27;ll keep</h2>
<p>Not every interface will dissolve into generated ephemerality. Some we&#x27;ll keep because they work, because we&#x27;ve built skill around them, because consistency has value.</p>
<p>The question isn&#x27;t whether generated interfaces will replace designed ones. It&#x27;s which contexts call for which approach. And increasingly, that boundary will be determined not by designers or engineers but by users stating their intent and receiving whatever interface best serves that intent in the moment.</p>
<p>The pixels are decoupling. The question is what we build next.</p>]]></content:encoded>
            <author>adnan@aklodhi.com (Adnan Khan)</author>
        </item>
        <item>
            <title><![CDATA[Why talking to your future self can craft your hero's journey]]></title>
            <link>https://aklodhi.com/articles/why-talking-to-your-future-self-can-craft-your-heros-journey</link>
            <guid>https://aklodhi.com/articles/why-talking-to-your-future-self-can-craft-your-heros-journey</guid>
            <pubDate>Mon, 17 Jul 2023 00:00:00 GMT</pubDate>
            <description><![CDATA[Engage in dialogue with your imagined future self to set goals, learn from regrets, and gain confidence for your hero's journey.]]></description>
            <content:encoded><![CDATA[<p>Have you ever wondered where your life is heading? Gazing into the future can be both thrilling and unnerving. But what if I told you there&#x27;s a powerful tool you can use to navigate the unknown and shape your own destiny? Enter the conversation with your future self.</p>
<p>This might sound strange, even fantastical. But engaging in this imaginary dialogue isn&#x27;t about predicting the future; it&#x27;s about <strong>actively creating it.</strong> By conversing with your hypothetical older self, you embark on a unique self-discovery journey, one that empowers you to define your own &quot;hero&#x27;s journey.&quot;</p>
<p>Here&#x27;s how:</p>
<p><strong>1. Charting Your Course:</strong> Imagine yourself ten, twenty, or even fifty years down the line. What are your accomplishments? What challenges did you overcome? What choices led you there? By visualizing your future self, you set <strong>aspirational goals</strong> and identify the <strong>values</strong> that guide your path. This clarity serves as a compass, keeping you focused amidst life&#x27;s inevitable twists and turns.</p>
<p><strong>2. Unearthing Hidden Potential:</strong> Ask your future self about your biggest regrets. What skills did you wish you&#x27;d learned? What opportunities did you miss? These insights become <strong>valuable lessons</strong> for your present self. You&#x27;ll recognize areas for growth and potential roadblocks, allowing you to make informed decisions that align with your future aspirations.</p>
<p><strong>3. Conquering Your Fears:</strong> The unknown can be daunting. But by talking to your future self, you gain a glimpse of <strong>successful navigation</strong> through those uncertainties. You learn about the challenges you&#x27;ll likely face and, more importantly, how you overcame them. This instills <strong>confidence</strong> and <strong>resilience</strong>, equipping you to tackle present obstacles with newfound courage and determination.</p>
<p><strong>4. Cultivating Gratitude:</strong> Imagine your future self expressing gratitude for specific moments or choices you made today. This exercise shifts your perspective, fostering <strong>appreciation</strong> for your present journey and the small steps that contribute to your long-term goals. It reminds you that every experience, even the seemingly insignificant ones, shapes the hero you&#x27;re becoming.</p>
<p><strong>5. Embracing the Journey:</strong> Remember, your future self isn&#x27;t a fixed point; it&#x27;s a dynamic reflection of your present choices. This conversation isn&#x27;t about achieving a preordained destiny, but rather about <strong>embracing the journey of self-discovery.</strong> By actively shaping your path and learning from your future self&#x27;s experiences, you become the author of your own hero&#x27;s tale, filled with unexpected twists, internal battles, and ultimately, triumphant self-actualization.</p>
<p>So, the next time you find yourself contemplating the future, don&#x27;t just wonder. <strong>Engage in dialogue with your future self.</strong> Ask questions, seek guidance, and most importantly, listen. This introspective journey will empower you to chart your course, overcome obstacles, and ultimately, craft a hero&#x27;s journey that is uniquely your own.</p>
<p>The future is unwritten, but you can influence it by telling your story to yourself.</p>]]></content:encoded>
            <author>adnan@aklodhi.com (Adnan Khan)</author>
        </item>
        <item>
            <title><![CDATA[Reducing cognitive load for usability]]></title>
            <link>https://aklodhi.com/articles/reducing-cognitive-load-for-usability</link>
            <guid>https://aklodhi.com/articles/reducing-cognitive-load-for-usability</guid>
            <pubDate>Thu, 17 Feb 2022 00:00:00 GMT</pubDate>
            <description><![CDATA[Types of cognitive load, and solutions to reduce it, including chunking & progressive disclosure.]]></description>
            <content:encoded><![CDATA[<h2>The cognitive load theory</h2>
<p>The central idea behind cognitive load theory is that the human brain can process only a small amount of new information at a time, while drawing on a large amount of stored information. Cognitive load theory is about optimising the information load on short-term memory, making it easier to process information or transfer it to long-term memory.</p>
<p>For the average user, high cognitive loads directly translate into &#x27;hard to use&#x27;.</p>
<p>There are three types of cognitive load:</p>
<ul>
<li><strong>Intrinsic</strong>. Inherent difficulty of the task or information at hand. For example, is the topic eigenvectors or simple addition? A 12-step application or a quick feedback form?</li>
<li><strong>Extraneous</strong>. Anything that affects the learning of the intrinsic task, such as jargon, inessential info, distracting elements, a confusing interface and more.</li>
<li><strong>Germane</strong>. This is a positive load, mainly to reinforce what the user has learnt so far. In traditional learning environments, it includes things such as spaced learning, quick follow-up quizzes and other <a href="https://youtu.be/gtmMMR7SJKw">desirable difficulties</a>. For usability in the digital world, it could include reinforcing previous choices and confirming decisions among others.</li>
</ul>
<p>Research in this area has produced several key techniques to improve instructional design:</p>
<ul>
<li><strong>Worked example effect</strong>. The finding that novice learners perform much better if a worked example of a problem is presented to them along with the instruction.</li>
<li><strong>Expertise reversal effect</strong>. The finding that as learners become more proficient at solving a particular type of problem, they should gradually be given more opportunities for independent problem solving.</li>
<li><strong>Tailored experiences</strong>. Gauge the existing skills and knowledge of the user, and tailor the experience that matches it.</li>
<li><strong>Present all essential information together</strong>. Give users an overview of the entire concept in one go (for usability in the digital world, this strategy could prove counter-productive).</li>
</ul>
<h2>The magical number seven (7)</h2>
<p>One of the most cited cognitive psychology papers is Miller&#x27;s <a href="http://psychclassics.yorku.ca/Miller/">the magical number seven, plus or minus two</a>. The finding is that the average person can only hold about 7 things in their short term memory, plus or minus 2 things.</p>
<p>But what&#x27;s the implication for usability? If your interaction demands a long process, where you are counting on your users to remember things, you might want to review the cognitive load on your users. Are you stretching the limits of what the average person can remember in a short span of time? If you are, then you need to optimise their cognitive load.</p>
<h2>Can we measure cognitive loads?</h2>
<p>For digital experiences, we rely mostly on user-provided feedback to assess cognitive load. This is subjective, as the variability of humans is a constant in this scenario.</p>
<p>However, there has been research into multi-modal scientific studies to measure cognitive loads on students who study difficult-to-grasp concepts such as Integration or Eigenvectors.</p>
<iframe width="672" height="378" src="https://www.youtube-nocookie.com/embed/uDWfIDlDvqA" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture"></iframe>
<h2>Optimising cognitive load for usability</h2>
<p>There are a few usability principles that come into play when optimising cognitive loads:</p>
<h3>Progressive disclosure</h3>
<p>Simply disclose information that is directly needed to do the task at hand.</p>
<h3>Write well</h3>
<p>Write concisely so users don&#x27;t skip the text entirely, and stay to the point; users of most transactional systems are there to get a job done.</p>
<p>Also make sure that your communications meet the <a href="https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests">readability scores</a> needed for your audience.</p>
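<p>For a sense of how such scores work, the Flesch reading-ease score is computed from word, sentence and syllable counts. The sketch below uses a crude vowel-group heuristic for syllables, so treat the numbers as approximate:</p>

```python
import re

def count_syllables(word: str) -> int:
    # Crude heuristic: count groups of consecutive vowels (including y).
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    """206.835 - 1.015*(words/sentences) - 84.6*(syllables/words)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    n = max(1, len(words))
    return 206.835 - 1.015 * (n / sentences) - 84.6 * (syllables / n)

# Higher scores indicate text that is easier to read.
print(round(flesch_reading_ease("Write concisely. Stay to the point."), 1))
```

<p>Production tools use dictionary-backed syllable counts, but the shape of the calculation is the same: shorter sentences and shorter words score as easier reading.</p>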
<h3>Contextual explanations</h3>
<p>Provide distributed, small and to-the-point explanations of all possible friction points. Use layering such as tooltips or expanders to keep these explanations only for those who seek them.</p>
<h3>Limit choices</h3>
<p>Allow users only the most essential choices in any particular pathway in your process. Increasing choices makes decision-making harder. With limited choices, their cognitive load will be significantly reduced.</p>
<h3>Use recognition rather than recall</h3>
<p>This is one of the <a href="https://www.nngroup.com/articles/ten-usability-heuristics/">10 usability heuristics</a> for interface design by the legendary Jakob Nielsen.</p>
<p>Simply let people recognise familiar information, rather than asking them to provide it from memory. For example, if you want users to select a state from a dropdown control, present them a list of the states, rather than asking them to type in the name of the state.</p>
<h3>Provide memory aids</h3>
<p>Reinforce previous choices by repeating or displaying them along the way.</p>
<h3>Chunking</h3>
<p>Chunking is a cognitive psychology concept, where related things are grouped into chunks that are remembered and processed by users much better than just a linear list or queue.</p>
<p><a href="https://en.wikipedia.org/wiki/Chunking_(psychology)">Chunking</a> depends on the existing knowledge of the user cohort. Expert users have advanced knowledge of how things work, so their cognitive load will be different from the average user.</p>
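<p>A tiny sketch of chunking applied to display formatting: splitting a long identifier into fixed-size groups (the group size here is arbitrary, chosen for illustration):</p>

```python
def chunk(s: str, size: int) -> list[str]:
    """Split a string into consecutive groups of at most `size` characters."""
    return [s[i:i + size] for i in range(0, len(s), size)]

# A 12-digit reference is far easier to read and transcribe in chunks.
print(" ".join(chunk("482915307641", 4)))  # 4829 1530 7641
```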
<h3>Subitising</h3>
<p><a href="https://en.wikipedia.org/wiki/Subitizing">Subitising</a> is the rapid and accurate recognition of quantity that the human brain performs, usually only for small numbers of items.</p>
<p>An application of subitising is digit grouping. For example, writing one million as 1,000,000 rather than 1000000.</p>
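<p>Digit grouping is trivial to apply in code; Python&#x27;s format mini-language inserts the separators directly:</p>

```python
# The "," option in Python's format specification mini-language inserts
# thousands separators, turning 1000000 into 1,000,000.
n = 1_000_000
print(f"{n:,}")  # 1,000,000
```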
<h2>Other things to keep in mind</h2>
<h3>Contiguity effect</h3>
<p><a href="https://www.sciencedirect.com/topics/psychology/contiguity-effect">Contiguity effect</a> suggests that items presented in close proximity to each other are better recalled, primarily because human memory depends on association between items and their mental context.</p>
<h3>Use existing mental models, if you can</h3>
<p>In cases where it makes sense to use an <a href="https://www.nngroup.com/articles/minimize-cognitive-load/">existing mental model</a>, it would help immensely to build on top of it. For example, as soon as I mention the word tree, you&#x27;d immediately understand the related concepts of roots, branches, leaves, watering and more.</p>]]></content:encoded>
            <author>adnan@aklodhi.com (Adnan Khan)</author>
        </item>
        <item>
            <title><![CDATA[On design variations]]></title>
            <link>https://aklodhi.com/articles/on-design-variations</link>
            <guid>https://aklodhi.com/articles/on-design-variations</guid>
            <pubDate>Wed, 14 Jul 2021 00:00:00 GMT</pubDate>
            <description><![CDATA[A 7-step framework for systematically exploring design alternatives, escaping the trap of the "obvious" solution, and arriving at decisions you can genuinely defend.]]></description>
            <content:encoded><![CDATA[<p>There&#x27;s a moment in every design project when the &quot;obvious&quot; solution arrives. It shows up uninvited, fully formed, and maddeningly confident. It taps on your shoulder. It whispers that you&#x27;re done. It lies.</p>
<p>The cost of believing that lie is invisible but real: you ship something adequate when something better was within reach. You solve the problem you were handed instead of the problem that actually exists. You optimise for speed when the situation called for depth.</p>
<p>This isn&#x27;t about perfectionism or endless iteration. It&#x27;s about a specific discipline: <strong>systematically generating variations before committing to a direction</strong>. The goal isn&#x27;t more options for their own sake. It&#x27;s arriving at a solution you can genuinely stand behind—one that survives contact with users, stakeholders, and the messy reality of implementation.</p>
<p>Here&#x27;s a framework for getting there.</p>
<h2>The 7-step framework</h2>
<p>This framework comes from Artiom Dashinsky, who developed it while leading design at WeWork and conducting hundreds of design interviews. It appears deceptively simple. It isn&#x27;t. Each step does specific cognitive work that&#x27;s easy to skip and expensive to skip.</p>
<h3>Step 1: Why</h3>
<p>Most design problems arrive pre-packaged as solutions.</p>
<p>&quot;We need a dashboard.&quot; &quot;Build us an onboarding flow.&quot; &quot;Add a settings page.&quot; These aren&#x27;t problems—they&#x27;re conclusions someone else reached. Your job is to unpack them.</p>
<p>The discipline here is asking <em>why</em> until you hit bedrock. Why a dashboard? Because stakeholders want visibility into user behaviour. Why do they want that? Because churn is increasing and they don&#x27;t know why. Why don&#x27;t they know? Because the current analytics are fragmented across three tools.</p>
<p>Now you have a problem worth solving: <em>unified visibility into user behaviour to diagnose churn</em>. A dashboard might solve that. So might automated alerts, a weekly digest, or a predictive model. The solution space just expanded dramatically.</p>
<p><strong>The trap to avoid:</strong> Accepting problem statements that have only one possible solution. If the brief permits only one answer, you&#x27;re not designing—you&#x27;re transcribing.</p>
<h3>Step 2: Who</h3>
<p>&quot;Users&quot; is not a user.</p>
<p>Every design problem involves multiple cohorts with competing needs. A checkout flow serves first-time buyers (who need guidance), repeat customers (who need speed), and gift purchasers (who need different delivery options). Designing for all three equally means designing for none of them well.</p>
<p>The discipline here is explicit selection. List every cohort involved. Choose the one you&#x27;re optimising for. Document that choice. This doesn&#x27;t mean ignoring other cohorts—it means establishing a hierarchy when trade-offs arise.</p>
<p><strong>The trap to avoid:</strong> Designing for an abstracted &quot;average user&quot; who doesn&#x27;t exist, or trying to serve everyone equally and serving no one distinctively.</p>
<h3>Step 3: When and where</h3>
<p>Context isn&#x27;t background information. It&#x27;s design material.</p>
<p>A user checking their bank balance at 7am on their commute has different needs than the same user checking at 11pm after receiving an overdraft notification. Same feature, radically different design requirements.</p>
<p>Map the contextual variables:</p>
<ul>
<li><strong>Location:</strong> Where are they physically? What device are they using?</li>
<li><strong>Trigger:</strong> What prompted this interaction? Habit? Notification? Crisis?</li>
<li><strong>Emotional state:</strong> Calm and exploratory? Anxious and goal-directed?</li>
<li><strong>Before and after:</strong> What happened just before this? What will they do next?</li>
<li><strong>Constraints:</strong> How much time do they have? What&#x27;s competing for their attention?</li>
</ul>
<p>This mapping generates a list of contextual needs that your solution must address. Skip it, and you&#x27;re designing for a vacuum.</p>
<p><strong>The trap to avoid:</strong> Assuming users encounter your product in ideal conditions with full attention and no stress.</p>
<h3>Step 4: What (divergent options)</h3>
<p>Now—and only now—you generate solutions.</p>
<p>The key word is <em>divergent</em>. You&#x27;re not looking for A/B variations on a single concept. You&#x27;re looking for categorically different approaches to the same problem.</p>
<p>If the goal is reducing customer support volume, your options might include:</p>
<ul>
<li>Improved self-service documentation</li>
<li>Proactive in-app guidance</li>
<li>AI-assisted troubleshooting</li>
<li>Community-driven support forums</li>
<li>Redesigned UI that eliminates confusion points</li>
<li>Better onboarding that prevents issues upstream</li>
</ul>
<p>These aren&#x27;t variations. They&#x27;re fundamentally different strategic bets. Each implies different resource requirements, timelines, success metrics, and second-order effects.</p>
<p>Generate at least four or five genuinely distinct options before evaluating any of them. The first two will likely be obvious. The interesting ones come after you&#x27;ve exhausted the obvious.</p>
<p><strong>The trap to avoid:</strong> Generating variations within a single approach and calling it divergent thinking. &quot;Blue button vs. green button&quot; is not strategic exploration.</p>
<h3>Step 5: Prioritise and choose</h3>
<p>Here&#x27;s where rigour earns its keep.</p>
<p>Plot your options on an Effort vs. Impact matrix:</p>
<pre><code>        High Impact
             │
             │   ★ Sweet spot
             │   (High impact, reasonable effort)
             │
Low Effort ──┼── High Effort
             │
             │
             │
        Low Impact
</code></pre>
<p>The discipline isn&#x27;t just placing options on the grid—it&#x27;s pressure-testing your placements. Impact estimates are often inflated by enthusiasm. Effort estimates are almost always understated.</p>
<p>For each option, ask:</p>
<ul>
<li>What&#x27;s the <em>minimum</em> impact this could have? What assumptions would need to hold for maximum impact?</li>
<li>What hidden effort exists? Integration complexity? Organisational change management? Maintenance burden?</li>
<li>What&#x27;s the reversibility? If this fails, how hard is it to try something else?</li>
</ul>
<p>Choose the option that maximises impact while maintaining a realistic relationship with effort. This sounds obvious. It&#x27;s routinely ignored in favour of whatever&#x27;s most exciting or most aligned with existing momentum.</p>
<p><strong>The trap to avoid:</strong> Letting sunk cost, organisational politics, or personal attachment override the matrix.</p>
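<p>The matrix can even be expressed as a crude scoring pass. The options and scores below are invented for illustration; the real work is in pressure-testing the estimates, not in the arithmetic:</p>

```python
# Hypothetical options scored 1-5 for impact and effort. Sorting by the
# impact-to-effort ratio surfaces "sweet spot" candidates first.
options = {
    "self-service docs": {"impact": 4, "effort": 2},
    "AI troubleshooting": {"impact": 5, "effort": 5},
    "redesigned UI": {"impact": 3, "effort": 4},
}

ranked = sorted(options,
                key=lambda o: options[o]["impact"] / options[o]["effort"],
                reverse=True)
print(ranked[0])  # self-service docs: high impact, reasonable effort
```

<p>The ratio is only a tie-breaker, not a verdict; reversibility and hidden effort still have to be argued out by humans.</p>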
<h3>Step 6: Solve (task-level design)</h3>
<p>With your strategic direction chosen, you finally design.</p>
<p>The discipline here is task decomposition. List every discrete action a user must take to accomplish their goal within your solution. Then sketch against each task—not as finished UI, but as a way of thinking through the interaction.</p>
<p>This exercise surfaces problems that conceptual thinking misses:</p>
<ul>
<li>Where does the user need to make decisions? Do they have the information to make them?</li>
<li>Where might they get stuck, confused, or frustrated?</li>
<li>What happens when things go wrong?</li>
<li>Where are the hidden dependencies between tasks?</li>
</ul>
<p>Sketching isn&#x27;t about visual design. It&#x27;s about forcing your solution through the narrow aperture of actual use.</p>
<p><strong>The trap to avoid:</strong> Jumping to high-fidelity design before the task flow is proven. Polish obscures structural problems.</p>
<h3>Step 7: How (success metrics)</h3>
<p>A solution without a success metric is a hope, not a design.</p>
<p>Define—before launch—how you&#x27;ll know whether this worked. The metrics should be:</p>
<ul>
<li><strong>Specific:</strong> Not &quot;engagement&quot; but &quot;7-day retention among new users&quot;</li>
<li><strong>Measurable:</strong> You need instrumentation in place, not just intent</li>
<li><strong>Attributable:</strong> You should be able to isolate the effect of your change</li>
<li><strong>Time-bound:</strong> When will you evaluate? What&#x27;s the minimum viable sample?</li>
</ul>
<p>This step also closes the loop to Step 1. If your <em>why</em> was &quot;reduce churn caused by fragmented analytics visibility,&quot; your success metric might be &quot;20% reduction in churn among users who engage with the new unified view within their first 30 days.&quot;</p>
<p>If you can&#x27;t articulate a success metric, you don&#x27;t yet understand your own solution.</p>
<p><strong>The trap to avoid:</strong> Metrics that are easy to measure but don&#x27;t connect to business outcomes. Dashboard views are vanity metrics if they don&#x27;t correlate with the behaviour change you actually need.</p>
<hr/>
<h2>The variations: Where the framework multiplies</h2>
<p>Here&#x27;s where things get interesting.</p>
<p>Real design problems rarely exist in a single context. The same problem may need to work across:</p>
<ul>
<li><strong>Multiple cohorts:</strong> Your primary user and a secondary power-user segment</li>
<li><strong>Multiple contexts:</strong> Mobile on-the-go and desktop deep-work sessions</li>
<li><strong>Multiple constraints:</strong> A brand refresh that&#x27;s in progress, an internationalisation requirement, a platform migration</li>
<li><strong>Multiple futures:</strong> The current product and a planned ecosystem expansion</li>
</ul>
<p>Each of these represents a parallel run through the framework. Same problem, different inputs, potentially different solutions.</p>
<p>Work the framework separately for each major variation. You&#x27;ll end up with a set of solutions—not one—each optimised for its specific context.</p>
<p>Then the real synthesis begins.</p>
<hr/>
<h2>Pattern recognition: The path to general solutions</h2>
<p>When you hold multiple context-specific solutions in view, patterns emerge.</p>
<p>You might notice that three of your five variations share a core interaction model, differing only in surface-level adaptation. That core model is a candidate for your general solution—robust enough to flex across contexts.</p>
<p>Or you might notice that no general solution exists. The cohorts are too different. The contexts are too divergent. The constraints are genuinely incompatible. This is equally valuable information. It tells you that you&#x27;re not building one feature—you&#x27;re building a family of features, or making a hard prioritisation call about which context to serve.</p>
<p>The discipline isn&#x27;t forcing generality. It&#x27;s letting generality emerge from rigorous specificity.</p>
<hr/>
<h2>When to apply this framework</h2>
<p>This level of rigour isn&#x27;t appropriate for every design decision. Use it when:</p>
<ul>
<li>The problem is ambiguous or contested</li>
<li>The stakes are high (significant investment, hard to reverse, high user impact)</li>
<li>Multiple stakeholders have conflicting visions</li>
<li>You&#x27;re designing something foundational that other decisions will build upon</li>
<li>You&#x27;re personally uncertain about the right direction</li>
</ul>
<p>For lower-stakes, well-understood problems, a lighter process is appropriate. The framework is a tool, not a ritual.</p>
<hr/>
<h2>The discipline, restated</h2>
<p>The &quot;obvious&quot; solution that arrives early and confidently is not your enemy. It&#x27;s your starting point. The discipline is treating it as one option among several—worthy of consideration, but not coronation.</p>
<p>Work the problem. Generate genuine alternatives. Evaluate them honestly. Choose with intention. Define success before you ship.</p>
<p>Do this consistently, and you&#x27;ll stop arriving at solutions through intuition and inertia. You&#x27;ll arrive at solutions you can defend—not because you&#x27;re defensive, but because you&#x27;ve actually done the work to know why this solution, for this user, in this context, measured this way, is the right bet to make.</p>
<p>That&#x27;s not perfectionism. That&#x27;s professional design practice.</p>]]></content:encoded>
            <author>adnan@aklodhi.com (Adnan Khan)</author>
        </item>
        <item>
            <title><![CDATA[Clarity over cleverness]]></title>
            <link>https://aklodhi.com/articles/clarity-over-cleverness</link>
            <guid>https://aklodhi.com/articles/clarity-over-cleverness</guid>
            <pubDate>Sat, 07 Mar 2020 00:00:00 GMT</pubDate>
            <description><![CDATA[Prefer clarity over cleverness. Clever makes things clean, seamless and tidy, but there is a cost involved. Keep cleverness for power users instead.]]></description>
            <content:encoded><![CDATA[<p>Clever interfaces are attractive, I&#x27;ll give you that.</p>
<p>There&#x27;s an element of discovery, a little surprise, maybe even a pinch of delight, if it matches user expectations.</p>
<h3>The cost of cleverness</h3>
<p>But cleverness brings with it <strong>layering and cognitive load</strong>. Most times, this means complexity, which the average user translates to <em>hard to use</em>.</p>
<p>When you hire a designer to build an interface, your expectation might be an exquisite design, never before seen, that will stand out and steal the world&#x27;s attention. Critics will be dumbfounded at the originality, onlookers will stare at it for hours, and you&#x27;ll be smug in the knowledge that it was you who did it; you hired the designer, after all.</p>
<p>This line of thinking has just one major problem: clarity trumps cleverness, and if you try to be too clever, users will loathe you.</p>
<h3>Designers like clever</h3>
<p>Designers do have a tendency to do clever things. This could be born out of an urge for originality, or a desire to make things prettier, but at the end of the day, usability demands clarity, not cleverness. If you are looking to make things usable, you&#x27;ll be aiming for simplicity.</p>
<h3>Clarity begets speed and efficiency</h3>
<p>Clarity means that the user will do their job faster, with minimal errors. This translates to efficiency. If your business wants to keep users engaged, it&#x27;s probably a good idea to make their lives easier and give them efficient ways of doing things.</p>
<h3>Keep cleverness for power users</h3>
<p>Power users need a lot of flexibility in their work, that&#x27;s primarily why they&#x27;re called power users. For them, it makes sense to think of clever ways to design interfaces. Their work is complex, with lots of options, scenarios and use-cases. Meet their complexity with cleverness if you can.</p>]]></content:encoded>
            <author>adnan@aklodhi.com (Adnan Khan)</author>
        </item>
        <item>
            <title><![CDATA[Users are fickle, their whys are not]]></title>
            <link>https://aklodhi.com/articles/users-are-fickle-their-whys-are-not</link>
            <guid>https://aklodhi.com/articles/users-are-fickle-their-whys-are-not</guid>
            <pubDate>Sat, 29 Jun 2019 00:00:00 GMT</pubDate>
            <description><![CDATA[Users change their mind. Don't take users literally. Try and get the why behind their comments and feedback.]]></description>
            <content:encoded><![CDATA[<p>Users behave differently in different scenarios. There are so many factors involved that you can&#x27;t make definitive conclusions from what they do at the surface level. It&#x27;s probably better to look at what&#x27;s behind user actions.</p>
<h3>Same situation, different behaviours</h3>
<p>It happens to me, and it might&#x27;ve happened to you too; in a situation that was literally the same, you behaved differently based on that moment&#x27;s context. We humans are excited one day, and dull the other. Sometimes we&#x27;re with a girlfriend, other times with dad. Human behaviour is so fluid, you might struggle to repeat your own behaviour from one day to the next.</p>
<p>What does it mean for designers? Is user research kaput? Certainly not. But you might want to look through user actions and arrive at the <em>whys</em>.</p>
<h3>Users say one thing &amp; do another</h3>
<p>Some people buy an SUV because they think they&#x27;ll go offroading. For others, it&#x27;s the thing everyone&#x27;s doing these days, so they like to fit in. Some might like the extra space, still others might prefer it for transporting sports gear. Whatever the <em>whys</em> are, all of these people go ahead and perform the same action: buy an SUV.</p>
<p>I am pretty sure car companies know all of these <em>whys</em>, and then some more. SUVs in general are quite spacious, but most are not designed for offroading. Didn&#x27;t the SUV designers know some people might want to go offroading? Surely they did. But they also knew that good intentions don&#x27;t always result in actions. So plenty of people <em>think</em> they&#x27;d like to go offroading, but they never actually do. Maybe the car designers know this difference between what people say and what they actually do.</p>
<p>The key thing here is to understand what percentage of people have that particular <em>why</em>. If it&#x27;s an edge-case, designers can safely focus on other more important things.</p>
<h3>The Hawthorne effect in user research</h3>
<p>There&#x27;s a recognised human bias that occurs when people behave differently because they know they are being watched. It&#x27;s called the <a href="https://catalogofbias.org/biases/hawthorne-effect/">Hawthorne effect</a>. I suspect the Hawthorne effect comes into play whenever we carry out moderated usability testing or discovery research. People are self-conscious and want to appear smart. Their behaviour with your service at home might be vastly different.</p>
<h3>The &#x27;why&#x27; behind user actions</h3>
<p>We humans tend to be reasonable beings most times. I say reasonable because usually whatever we do, we have a reason to do it. That reason is the why behind our actions. But our reasons are not always logical or obvious. Is there a deadline looming? We tend to work faster. But if there&#x27;s something else at play such as a date with a girlfriend, the deadline can wait. We have our reasons for everything we do, although the reasoning could be unreasonable by various standards.</p>
<p>Sometimes it&#x27;s just a matter of asking. During one usability testing session I was facilitating, the participant kept going back and forth between pages of a form he was supposed to fill out. But after a couple of attempts, he was able to complete the process. My initial hunch was that he was just checking if he&#x27;d filled out the form correctly. Just to be sure, at the end of the test, I asked him why he was navigating back and forth. His answer? He thought he was in the wrong place because he&#x27;d filled out a very similar-looking form in the previous test. He just wasn&#x27;t sure if he was in the right place.</p>
<p>Now this confusion caused by similar-looking forms was the real insight I needed. Another participant might have pushed through without ever mentioning the confusion, even if they felt it, possibly due to the Hawthorne effect.</p>
<h3>Is user research repeatable?</h3>
<p>User researchers can make their process repeatable, but users are fickle. They&#x27;d do one thing the first time, and another the next. There&#x27;s no way you&#x27;ll be able to get exactly the same reactions, feelings, behaviours and comments from a user twice. So, the only way to find real insights is to dig deeper and find the <em>why</em>.</p>
<h3>Are the whys repeatable?</h3>
<p><em>Whys</em> can be valid, but more often than not they&#x27;re inconsistent with each other. They are just as varied as humans are, with no single unifying theme; there may simply be many reasons for users to do what they do. Once you corroborate the <em>whys</em> you discover in your research, you&#x27;ll understand the various motivations, mindsets and preferences of those user cohorts.</p>
<p>By the way, if you&#x27;ve read this far, you should let me know your <em>why</em> on <a href="https://twitter.com/wordofadnan">Twitter</a>.</p>]]></content:encoded>
            <author>adnan@aklodhi.com (Adnan Khan)</author>
        </item>
        <item>
            <title><![CDATA[Flexibility: Product vs Service Design — With a 2024 Addendum on AI]]></title>
            <link>https://aklodhi.com/articles/flexibility-product-vs-service-design-ai</link>
            <guid>https://aklodhi.com/articles/flexibility-product-vs-service-design-ai</guid>
            <pubDate>Wed, 05 Sep 2018 00:00:00 GMT</pubDate>
            <description><![CDATA[Where should flexibility live—product or service layer? Exploring the tradeoff, with a 2024 addendum on how AI changes the equation.]]></description>
            <content:encoded><![CDATA[<p><em>Originally published September 2018. Addendum added December 2024.</em></p>
<h2>2024 addendum: The AI layer as a third option</h2>
<p>When I wrote this piece six years ago, I framed flexibility as a tradeoff between two layers: the product and the service. You could build flexibility into the interface—at the cost of usability—or push it up to the service layer, where human processes could absorb the complexity.</p>
<p>That framing still holds. But something fundamental has changed.</p>
<p>AI has emerged as a third layer where flexibility can live. Not as a feature bolted onto products, but as a genuinely different place to locate the burden of adaptation.</p>
<h3>What makes the AI layer different</h3>
<p>Traditional product flexibility asks users to understand their options and choose correctly. Service flexibility asks organisations to staff and train for exceptions. AI flexibility asks neither—or rather, it asks the system itself to bear the interpretive load.</p>
<p>Consider a document submission flow. The 2018 approach gave users choices: upload a file, take a photo, fill out a PDF. Each option added interface complexity. The service approach handled edge cases manually: a staff member would call users whose submissions were unclear.</p>
<p>An AI-mediated approach does something else entirely. It accepts whatever the user provides and figures out what to do with it. Blurry photo? Enhance it or extract what&#x27;s legible. Wrong form? Map the fields to the right one. Missing information? Ask for just what&#x27;s needed, conversationally, rather than rejecting the whole submission.</p>
<p>The flexibility still exists—the system handles tremendous variation in input—but it&#x27;s invisible to the user. The interface stays simple because the intelligence sits behind it.</p>
<h3>The new tradeoff: predictability</h3>
<p>This isn&#x27;t a free lunch. Each layer trades one kind of cost for another:</p>
<ul>
<li><strong>Product flexibility</strong> costs usability. Users must navigate complexity.</li>
<li><strong>Service flexibility</strong> costs efficiency. Humans must handle exceptions.</li>
<li><strong>AI flexibility</strong> costs predictability. The system&#x27;s behaviour becomes harder to guarantee.</li>
</ul>
<p>For some domains, unpredictability is unacceptable. Financial transactions, medical decisions, legal processes—these need deterministic behaviour that users and auditors can verify. Product and service flexibility remain the right choices here, even with their costs.</p>
<p>But for many other domains—onboarding flows, support interactions, content creation, search—some unpredictability is tolerable, even welcome, if it means users don&#x27;t have to think about the system&#x27;s constraints.</p>
<h3>Choosing where flexibility lives</h3>
<p>The question is no longer just &quot;product or service?&quot; It&#x27;s now a three-way decision:</p>
<p><strong>Put flexibility in the product when:</strong></p>
<ul>
<li>Users need direct control and visibility</li>
<li>The domain requires auditability</li>
<li>Variations are finite and well-understood</li>
</ul>
<p><strong>Put flexibility in the service when:</strong></p>
<ul>
<li>Exceptions require human judgment</li>
<li>Stakes are high and errors costly</li>
<li>Personal relationships matter</li>
</ul>
<p><strong>Put flexibility in the AI layer when:</strong></p>
<ul>
<li>Input variation is high but intent is clear</li>
<li>Speed matters more than perfect accuracy</li>
<li>Users shouldn&#x27;t need to know the system&#x27;s constraints</li>
</ul>
<p>Most systems will use all three. The art is in knowing which variations belong where.</p>
<h3>What this means for design practice</h3>
<p>For product designers, the implication is significant: you may no longer need to expose every option in the interface. If AI can infer intent from behaviour or context, explicit controls become unnecessary. The skill shifts from designing comprehensive option sets to designing appropriate feedback—helping users understand what the AI understood and correct it when needed.</p>
<p>For service designers, AI changes the economics of exception handling. Many exceptions that once required human intervention can now be resolved automatically. This doesn&#x27;t eliminate the service layer; it refocuses it on cases where human judgment genuinely adds value.</p>
<p>The core insight from the original piece—that usability should never be compromised, and flexibility should be traded elsewhere when it threatens usability—remains true. We simply have more places to trade it now.</p>
<hr/>
<h2>The original piece (2018)</h2>
<h3>What is product flexibility?</h3>
<p>Can&#x27;t do forms online? Fill out a PDF and upload. Can&#x27;t upload a photo? Take a photo using the device camera. Giving choice at the product level brings robustness and flexibility into the system. Users find this &quot;easy to use&quot; and there&#x27;s a much higher chance of them completing their journeys.</p>
<p>For power users, the same principle applies differently. Your system might allow bulk actions, navigational shortcuts, contextual menus, additional options, permission systems, audit logs. Flexible systems serve more users in more situations. But flexible systems are usually complex.</p>
<h3>Lidwell&#x27;s flexibility-usability tradeoff</h3>
<p>William Lidwell&#x27;s <em>Universal Principles of Design</em> describes a flexibility-usability tradeoff: the more flexible something is, the less usable it tends to be.</p>
<p>There&#x27;s nothing inherently unusable about flexible systems. The issue is what flexibility demands from users. Flexible systems require users to be more aware of their working context. Users need to understand their options at various stages, know the consequences of their actions, carry more cognitive load, maintain bigger working memory.</p>
<p>With flexibility comes complexity. We&#x27;re simply asking users for more—more attention, more effort, more responsibility.</p>
<p>This tradeoff seems inevitable if we stay within the boundaries of the product. But the product is rarely the whole picture.</p>
<h3>Trading flexibility up to the service layer</h3>
<p>Services are bigger than products. Their surface area covers not just a system&#x27;s internal processes, but the physical spaces, the human touchpoints, the offline-online boundaries. This wider scope gives services greater capacity to absorb complexity.</p>
<p>A service designer can distribute complexity elsewhere so that the product ends up simple.</p>
<p>Consider a government application process. A product-centric approach might build extensive validation, multiple document upload options, branching logic for different applicant types—all the flexibility needed to handle variation, all surfaced in the interface.</p>
<p>A service-centric approach might simplify the product to a single happy path, then create processes for handling variations: a triage team that reviews incomplete applications, a phone callback for complex cases, an in-person option for those who can&#x27;t use digital channels at all.</p>
<p>The total system complexity might be similar. But the product complexity—what users experience—is dramatically reduced.</p>
<p>Service designers can advocate for modularity, breaking monolithic systems into focused components with limited responsibility. They can create handoff points where human judgment takes over from automated processes. They can design feedback loops that let exceptions inform future product improvements.</p>
<h3>Product-service flexibility is the real tradeoff</h3>
<p>The flexibility-usability tradeoff isn&#x27;t wrong, but it&#x27;s incomplete. It assumes flexibility must live in the product. Once we recognise that flexibility can migrate between product and service layers, a different tradeoff emerges: where should flexibility live?</p>
<p>One thing we should never compromise on is usability. When usability suffers, we should look back and ask why so much capability ended up in the product. Can this flexibility be traded up to the service level?</p>
<p>Sometimes the answer is no—the service organisation lacks capacity, or latency requirements rule out human intervention, or the edge cases are too frequent to handle manually. These are real constraints.</p>
<p>But often the answer is yes, it&#x27;s just that nobody asked. Product and service design happen in silos. Product teams build flexibility into interfaces because that&#x27;s what they control. Service teams deal with whatever products emerge.</p>
<p>Breaking that pattern might require a difficult conversation with a service owner. It might require changes to staffing or training. It might require a systems architect to rethink handoff points.</p>
<p>It&#x27;s worth trying. Users shouldn&#x27;t bear complexity that belongs elsewhere.</p>]]></content:encoded>
            <author>adnan@aklodhi.com (Adnan Khan)</author>
        </item>
    </channel>
</rss>