<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Puneet Ghanshani]]></title><description><![CDATA[Simple matters for technology leaders]]></description><link>https://puneetghanshani.substack.com</link><image><url>https://substackcdn.com/image/fetch/$s_!Ghus!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec3f0884-b44e-40ee-a118-ebe0fc57996a_1080x1080.png</url><title>Puneet Ghanshani</title><link>https://puneetghanshani.substack.com</link></image><generator>Substack</generator><lastBuildDate>Sat, 20 Jun 2026 22:16:28 GMT</lastBuildDate><atom:link href="https://puneetghanshani.substack.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Puneet Ghanshani]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[puneetghanshani@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[puneetghanshani@substack.com]]></itunes:email><itunes:name><![CDATA[Puneet Ghanshani]]></itunes:name></itunes:owner><itunes:author><![CDATA[Puneet Ghanshani]]></itunes:author><googleplay:owner><![CDATA[puneetghanshani@substack.com]]></googleplay:owner><googleplay:email><![CDATA[puneetghanshani@substack.com]]></googleplay:email><googleplay:author><![CDATA[Puneet Ghanshani]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Prompts, Skills, Agents]]></title><description><![CDATA[A few weeks ago, I was in a conversation with a friend about moving from Gen AI to agentic systems.]]></description><link>https://puneetghanshani.substack.com/p/prompts-skills-agents</link><guid 
isPermaLink="false">https://puneetghanshani.substack.com/p/prompts-skills-agents</guid><dc:creator><![CDATA[Puneet Ghanshani]]></dc:creator><pubDate>Sun, 14 Jun 2026 23:01:28 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Ghus!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec3f0884-b44e-40ee-a118-ebe0fc57996a_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A few weeks ago, I was in a conversation with a friend about moving from Gen AI to agentic systems. We spent most of it on the architectural question: what actually changes when you go from a model answering questions to a model taking actions, and what prompt engineering meant in the Gen AI phase versus what context engineering means now.</p><p>The observation that surfaced: <strong>teams that struggle most with agentic systems are the ones that carried over their Gen AI mental model intact.</strong> They optimised their prompts, got good at crafting instructions, and assumed that scaling to agents was mostly a matter of chaining those prompts together more cleverly. That is precisely where things started to break.</p><p>The prompt is important. It is one of the most important things in an agent. But it is not the only thing, and the distinction between what a prompt does, what a skill does, what MCP enables, and what the agent itself is responsible for, that is where most agentic architectures are currently underspecified.</p><p><strong>Gen AI mental model</strong></p><pre><code><code>Prompt
(craft the instruction, get an output)

Prompt engineering: optimise the instruction.</code></code></pre><p><strong>Agentic mental model</strong></p><pre><code><code>Agent
  |-- Control loop     (decide, act, stop, or escalate)
  |-- Memory           (in-context / retrieved / episodic)
  |-- MCP Tools        (runtime-discoverable: APIs, search, databases,
  |                     code interpreters, connectors -- anything callable)
  |-- Skill [x N]      (build-time: AI capability + evaluation harness,
  |                     versioned in shared source, tested against benchmark)
  |-- Prompt [x N]     (context boundary, output contract, failure behaviour)
  |-- Observability    (decision trace, tool call quality, goal completion)

Context engineering: design the whole information environment
the model operates in.</code></code></pre><h2>The prompt is not config. It is the spec.</h2><p><strong>Inside an agent, a prompt is an architecture decision</strong>, not a text string waiting to be filled in. It determines what the model can see, what it is authorised to do, and how it should behave when the input does not match expectations. Write it carelessly and it becomes a liability embedded in a running system.</p><p>Most teams treat prompts as configuration, something you tune after the real architecture is done. But in an agentic system, <strong>the prompt is where the architecture lives</strong>. The context boundary, the tool permissions, the output contract, the failure behaviour: all of it is specified in the prompt, or it is not specified at all.</p><p><strong>Prompt changes are architecture changes.</strong> When someone edits a prompt in production to &#8220;fix a tone issue,&#8221; they are modifying a running system, usually without a review, without a version bump, and without any test coverage. If a prompt is doing more than one thing, it is two architectural decisions badly merged. That is almost always the diagnosis when a team reports that their agent is &#8220;inconsistent.&#8221;</p><h2>Skills: versioned, testable, shared</h2><p><strong>A skill encodes a repeatable AI capability alongside its evaluation criteria.</strong> Entity extraction, document classification, summarisation under a specific constraint. Unlike a prompt, a skill can be run against a benchmark, regressed across model versions, and verified when an upgrade breaks something. That testability is the defining property, and it is what makes a skill worth sharing rather than rewriting per service.</p><p>Both prompts and skills can live in source code. <strong>A git repo with PR-reviewed changes is sufficient governance for both.</strong> The question is not where they are stored but whether they are shared. 
I have watched three engineering teams in the same organisation each rewrite the same extraction logic over six months, each convinced the previous implementation had been abandoned. A common repo they all knew about would have been enough. <a href="https://github.com/MicrosoftDocs/Agent-Skills">MicrosoftDocs/Agent-Skills</a> on GitHub works exactly this way for Azure-specific capabilities.</p><p>For most teams, <strong>a shared repo with a benchmark folder per skill is the right first step</strong>. How the agent calls those skills at runtime is a separate question, and that is where MCP comes in.</p><h2>The agent layer: control flow, trust boundaries, audit trail</h2><p><strong>An agent adds a control loop and a decision to act</strong>, two things neither prompts nor skills have. It decides which skill to invoke, on what input, and whether to continue or stop. The agent does not contain the inference logic. It coordinates it.</p><p>Most confusion lives here. Teams build chained prompts with no real branching and call them agents because the framework uses the word. <strong>A bad prompt produces a wrong output. A bad agent produces a wrong action</strong>, and in an agentic loop that action may already have been taken before anyone reviews it. An agent needs a trust boundary, a halt condition, and an audit trail that reconstructs the decision sequence. Applying prompt-level governance to an agent-level system is where organisations run into trouble, and the gap usually only becomes visible after something has gone wrong in production.</p><h2>Agent &gt; Skill &gt; Prompt: dependency is not the same as importance</h2><p>The cleaner architecture: agents orchestrate skills, skills invoke prompts. But <strong>the direction of dependency should not be confused with the direction of importance</strong>. The prompt is the foundation. Everything the agent does is constrained by what its prompts specify. 
If those are underspecified, the agent is underspecified, regardless of how clean the orchestration logic is.</p><p>The counterargument is that this is over-engineered for most current use cases. For a single-task pipeline, the ceremony is not worth it. <strong>The hierarchy earns its cost when multiple teams share capability, when skills need to be tested in isolation, or when a model upgrade requires selective regression.</strong> That describes most systems after six months in production. Almost no one designs for it at the start, and most teams wish they had.</p><h2>Where to invest, and when</h2><p><strong>The mental model shift from Gen AI to agentic is not about prompts or skills alone.</strong> </p><p>It is about designing a system that can decide, remember, act, and be held accountable across all six layers above. Teams that treat prompts and skills as the whole design surface keep running into failures that originate elsewhere: memory never architected, tools added without revisiting trust boundaries, observability never built.</p><p>Getting the full mental model right is the precondition for everything else.</p>]]></content:encoded></item><item><title><![CDATA[The Recovery Instinct That Creates a Second Problem]]></title><description><![CDATA[Years ago, I cleared my calendar for a failing project and called it a recovery plan. It felt like the right call for about two weeks. 
Then the team stopped making decisions without checking with me first, and I realised I had become the bottleneck I was trying to fix.]]></description><link>https://puneetghanshani.substack.com/p/the-recovery-instinct-that-creates</link><guid isPermaLink="false">https://puneetghanshani.substack.com/p/the-recovery-instinct-that-creates</guid><dc:creator><![CDATA[Puneet Ghanshani]]></dc:creator><pubDate>Mon, 08 Jun 2026 03:37:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Ghus!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec3f0884-b44e-40ee-a118-ebe0fc57996a_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Years ago, I cleared my calendar for a failing project and called it a <strong>recovery plan</strong>. It felt like the right call for about two weeks. Then the team stopped making decisions without checking with me first, and I realised I had become the bottleneck I was trying to fix. </p><p>Most project recoveries start this way. And most leaders do not notice the moment it stops being leadership.</p><h2>The Cognitive Logic Behind the Behaviour</h2><p>When a project is visibly failing, there is a specific kind of pressure that builds on the person at the top. Every status update feels unreliable, every milestone slippage raises the question of what else is being underreported, and the natural response is to get closer to the source. Stop relying on summaries. Attend the meetings. See the work directly. This feels like good judgement, and in some ways it is: the instinct to improve the quality of information you are working with is sound.</p><p>What makes it a trap is the underlying assumption. </p><div class="pullquote"><p>Most leaders reached their position through a period where their direct involvement produced direct results. That muscle memory does not disappear when the job changes. It just gets misdirected. 
</p><p>When things go wrong, the brain reaches for the tool that worked before, which is personal, hands-on engagement with the problem. </p><p>The context has changed entirely, but the response has not.</p></div><p>There is also a structural accountability dynamic running underneath this. When a project is publicly failing, the leader feels exposed. Getting visibly closer to the work creates a defensible position: whatever happens, it cannot be said that they were not paying attention. That is a reasonable human response to an uncomfortable situation, and it shapes behaviour in ways that are rarely acknowledged openly.</p><h2>Where Getting Closer Stops Helping</h2><p>None of this means the right answer is to stay back. A leader who deliberately maintains distance from a burning project, on principle, is not demonstrating trust. They are avoiding the discomfort of engaging with something that is broken, and calling it a management philosophy.</p><p>Some direct contact is necessary. The question worth asking at the start is: <strong>what specifically am I trying to learn, and how will I know when I have learned it?</strong> That framing turns an open-ended takeover into a structured diagnostic. </p><p>You are looking for the point of failure, not managing the work. You want to understand whether this is a <strong>process problem, a capability gap, a dependency that was never properly resolved, or some combination</strong>. You also want to understand <strong>who on the team has an accurate picture</strong> of where things stand, because that person is rarely the one doing most of the talking in status calls.</p><p>The problem is that reading reports, attending calls, and reviewing outputs all generate a genuine sense of being informed, which makes it easy to keep going past the point where it is useful. That activity has to produce a decision: <strong>what is broken, what changes, who owns the fix</strong>. 
If weeks pass without that judgment being made, the diagnostic has drifted into something else, and the team starts to feel it before the leader does. Every additional status update, every decision that now requires sign-off from above consumes capacity the team does not have. Work that would take three hours takes five, not because anyone is working slowly, but because two hours went into keeping the leader informed.</p><p>The less visible cost is what happens to how the team thinks. When people learn that their calls will be reviewed and their estimates questioned, they stop making calls. They wait, send messages asking for confirmation on things they would have decided themselves the week before, and the leader, now absorbed in the detail, starts answering those messages. The dependency compounds quietly, and by the time it is visible the project has a new structural problem layered on top of the original one.</p><h2>The Conversation Only You Can Have</h2><p>Most failing projects have a structural problem sitting underneath the technical one, and it is usually the one that nobody has named out loud. </p><ul><li><p>A dependency that everyone knows is broken but that has been carried forward in plans anyway. </p></li><li><p>A scope that has grown steadily because the right person was never told no. </p></li><li><p>A delivery date that was set for reasons that no longer exist but that has never been formally revisited because doing so would require an uncomfortable conversation with someone senior.</p></li></ul><p>The team knows these things. They talk about them. What they cannot do is resolve them, because resolution requires authority they do not have. The tech lead cannot tell the steering committee the deadline is not achievable. The project manager cannot push back on a VP who keeps adding requirements. 
These conversations require someone with enough standing to have them without it becoming a career event, and that person is usually the one who has spent the last three weeks reviewing tickets.</p><p>The shift that actually turns a recovery around is <strong>when the leader stops managing the work and starts managing the conditions around the work</strong>. Clearing a blocker that has been stuck for two weeks. Making a scope decision that everyone knew needed to be made but that nobody wanted to own. Having the timeline conversation that the team could not have. That is where the leverage is, and it is almost always underused because the pull towards the detail is so strong.</p><p>The Monday question is simple: what conversation are you avoiding that only you can have?</p>]]></content:encoded></item><item><title><![CDATA[Why AI Transformations Lose Momentum]]></title><description><![CDATA[Most AI transformations are not failing on the technical side.]]></description><link>https://puneetghanshani.substack.com/p/why-ai-transformations-lose-momentum</link><guid isPermaLink="false">https://puneetghanshani.substack.com/p/why-ai-transformations-lose-momentum</guid><dc:creator><![CDATA[Puneet Ghanshani]]></dc:creator><pubDate>Wed, 03 Jun 2026 22:42:39 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Ghus!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec3f0884-b44e-40ee-a118-ebe0fc57996a_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most AI transformations are not failing on the technical side.</p><p>They are failing because new models get deployed in weeks, while new behaviours take months. That gap is where momentum disappears.</p><p>A Harvard Business Review article describes this pattern as a &#8220;<strong>false start</strong>&#8221; in large-scale transformation. 
Technology moves forward, pilots show results, leadership attention follows, and then the organization falls back into familiar routines. What looked like progress becomes another tool that only a small portion of the intended audience uses consistently.</p><p>The common explanation is resistance to change. <strong>The more useful explanation is that organizations have a limited capacity to absorb change, regardless of how compelling the technology may be.</strong></p><p>This matters because most AI programmes treat adoption as a consequence of deployment. Build the capability, train people, and assume usage will follow. In practice, <strong>AI is often deployed into operating models, incentives, and workflows designed for a different way of working.</strong> The technology changes. The surrounding system does not.</p><p>As a result, people use AI where it fits naturally and avoid it where it requires meaningful behavioural change.</p><p>The organizations making real progress approach the problem differently.</p><p><strong>They make the cost of standing still visible, not just the benefits of moving forward.</strong> They reduce competing priorities because every transformation draws from the same finite pool of attention and execution capacity. They build support beyond the executive layer, where day-to-day operating decisions are actually made. And they create early proof points that build credibility before skepticism takes hold.</p><p>Most importantly, they stop treating technology deployment, operating model redesign, and change management as separate programmes. 
They recognize that these are different dimensions of the same transformation effort.</p><p>The organizations that succeed with AI will not necessarily be the fastest to deploy.</p><p>They will be the ones that understand a simple constraint: technology adoption is limited by organizational change capacity.</p><p>The technology is rarely the bottleneck.</p><p>The organization&#8217;s ability to absorb change is.</p><p><em>Inspired by Timothy Clark&#8217;s Harvard Business Review article, &#8220;How to Avoid a False Start When You&#8217;re Leading a Big Change.&#8221; The AI-focused interpretation and analysis in this essay are my own.</em></p><p><a href="https://hbr.org/2026/02/how-to-avoid-a-false-start-when-youre-leading-a-big-change">https://hbr.org/2026/02/how-to-avoid-a-false-start-when-youre-leading-a-big-change</a></p>]]></content:encoded></item><item><title><![CDATA[From Hardcoded APIs to MCP: How Enterprise Developers Must Rethink Agent Integration]]></title><description><![CDATA[I have sat through too many architecture discussions and review boards where the debate gets stuck on &#8220;should we go with Skills or MCP?&#8221; as if we are choosing between two competing options.]]></description><link>https://puneetghanshani.substack.com/p/from-hardcoded-apis-to-mcp-how-enterprise</link><guid isPermaLink="false">https://puneetghanshani.substack.com/p/from-hardcoded-apis-to-mcp-how-enterprise</guid><dc:creator><![CDATA[Puneet Ghanshani]]></dc:creator><pubDate>Fri, 29 May 2026 01:00:52 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Ghus!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec3f0884-b44e-40ee-a118-ebe0fc57996a_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I have sat through too many architecture discussions and review boards where the debate gets stuck on &#8220;should we go with <strong>Skills</strong> or <strong>MCP</strong>?&#8221; as 
if we are choosing between two competing options. Now <strong>function calling</strong> has entered the same conversation and made things even more confusing.</p><p>All three of these get conflated constantly, and that conflation is what causes bad architectural decisions.</p><p><strong>Skills are about reasoning. Function calling is about how the model requests an action in a single turn. MCP is about connectivity and governance at scale.</strong> </p><p>These three operate at completely different levels of the stack. Mixing them up is like asking your network team whether you need TCP/IP, a REST convention, or a good database schema. You need all three, and each has its rightful place.</p><p>But before we talk about any of those three, we need to talk about where most enterprise agent code actually starts. And that is hardcoded API calls.</p><h2>Stage One: The Hardcoded API Call</h2><p>Every developer who has built an enterprise agent in the last two years has written some version of this. A Python function that hits the SAP endpoint directly. A bearer token pulled from an environment variable. The response parsed inline and fed back into the prompt. It works in the demo. It gets committed to the repository. 
And then it quietly becomes technical debt.</p><p>This approach was perfectly reasonable when you were building a traditional application where a human clicked a button to trigger each call. </p><p><strong>When an agent is autonomously deciding to call that same API ten times in a session, chaining the results across three other services, and doing all of this without a human in the loop, the hardcoded approach breaks in ways that are very hard to recover from.</strong></p><p>Every schema change, every endpoint rotation, every authentication update is a code deployment that warrants a change request, UAT sign-off, and a release window. Your agent integration now lives on the same release cycle as your application code, and that cycle was never designed for the pace at which business system APIs evolve.</p><p>You also have zero discoverability. The agent only knows what a developer decided to bake in at build time. <strong>New capabilities that come online in your ERP system are invisible to the agent until someone writes more code.</strong> You are building a static map of a landscape that is constantly changing.</p><p>And when your CISO asks which agents are touching which business systems and under what conditions, there is no clean answer. The API calls are buried across repositories, called from different places, using different credentials that someone hardcoded into environment variables six months ago and nobody has rotated since.</p><p>The architecture conversation has to start here, because this is the reality on the ground.</p><h2>Stage Two: Function Calling in lieu of Hardcoded APIs</h2><p>The natural next step for most teams is to introduce function calling. 
<strong>The model now decides dynamically which function to invoke based on the user&#8217;s intent, rather than the developer hardcoding the sequence.</strong> This feels like a significant improvement, and in some ways it genuinely is.</p><p>But function calling solves the routing problem without solving the integration problem. And in an enterprise context, the integration problem is the one that will actually hurt you.</p><p>Every function your agent needs to call must be defined and passed into the model context at inference time. In a simple chatbot with three or four tools, that is fine. </p><blockquote><p><strong>In an enterprise environment where your agent might need access to 10-15 different business system capabilities depending on context, you are either bloating every prompt with tool definitions the agent will never use in that session, or you are building complex routing logic to figure out which function definitions to inject. </strong></p></blockquote><p>The function definition now lives in the prompt rather than buried in application code, which is marginally better from an auditability standpoint. </p><p>Native function calling primitives are intentionally lightweight and request-scoped. Long-running enterprise workflows require orchestration capabilities beyond the function calling interface itself.</p><p>So where does function calling actually fit in a mature architecture? It fits as the low level dispatch mechanism inside an MCP tool implementation. Your MCP server receives a standardised request from the agent, and internally it may use function calling patterns to route to the right backend handler. </p><h2>Stage Three: MCP as the Enterprise Integration Layer</h2><p>MCP does not replace hardcoded API calls or function calling at the code level. 
It replaces the <strong>architectural pattern of having agents reach directly into business systems at all</strong>.</p><blockquote><p>Instead of your agent knowing about your SAP API, your agent knows about an MCP server that exposes SAP capabilities in a governed, discoverable, standardised way. </p></blockquote><p>The MCP server is the boundary. Everything behind it is an implementation detail that the agent team does not need to own.</p><p>This effort should be led by Enterprise Architects rather than individual development teams, because it affects agents across the organisation.</p><h3>One Server, Many Consumers</h3><p>When your platform team builds an MCP server on top of your SAP instance, every agent team in the organisation consumes it. </p><blockquote><p>You want to swap the underlying language model next year? Fine. </p><p>You want to migrate your orchestration framework? No problem. </p></blockquote><p>The integration layer stays stable and provides the reuse we have been promising stakeholders for years with SOA and microservices. MCP gives us another shot at it in the agent era.</p><p>There is an important caveat here that platform architects need to plan for honestly. Wrapping a modern REST API into an MCP server is relatively straightforward work. Wrapping a legacy ERP system, a stateful on-premises integration, or a proprietary mainframe interface is a significantly different proposition. These systems were not designed to be called as stateless tools by an autonomous agent. 
</p><div class="callout-block" data-callout="true"><p><strong>Session management, transaction boundaries, error handling, and partial rollback scenarios all require careful engineering that the MCP specification itself does not prescribe.</strong> </p></div><p>The reuse promise is real, but the upfront investment to get legacy systems wrapped correctly should not be underestimated in your programme planning.</p><h2>Where Skills Fit in All of This</h2><p>Here is the nuance that often gets lost when teams get excited about MCP. The protocol tells your agent what tools are available and how to call them. <strong>It says absolutely nothing about when to call them, in what order, how to reconcile conflicting data coming back from three different API responses, or when the right answer is to stop and escalate to a human.</strong></p><p>An agent sitting on top of 20+ well governed MCP servers but with shallow reasoning logic is basically a very expensive search interface with a compliance dashboard attached to it.</p><p>The genuinely hard problems in enterprise agent deployments are not connectivity problems. They are judgment problems. Figuring out that a vendor master check must happen before a purchase order is raised, even when the user did not mention it. Handling the situation where your ERP returns a partial result because of a downstream timeout. Knowing when ambiguous user intent requires clarification rather than a best guess action.</p><p>None of that lives in MCP. All of that lives in skills.</p><p>So skills are not going away. What changes is their role in the stack. 
<strong>They get promoted to the reasoning layer above orchestration and they get relieved of their current job as bespoke integration glue.</strong> The middle tier of framework-specific, non-reusable, ungovernable tool definitions is what we need to retire.</p><h2>The Identity Problem, and How Entra ID Solves It</h2><p>I want to be transparent about where the hardest architectural question in this space sits, because it is the one that determines whether your agent programme scales safely or collapses under its own complexity.</p><p>Federated trust across organisational boundaries is genuinely hard. </p><blockquote><p><strong>When your agent calls an MCP server operated by a third-party SaaS vendor, whose identity model is in charge?</strong></p></blockquote><p>OAuth scopes give us a partial answer, but they do not compose cleanly across different organisational trust boundaries.</p><p>Services such as Microsoft Entra ID provide the most mature and practical answer available in the market today.</p><p>Entra ID now supports agent identity, which means your AI agents can be registered as <strong>non-human identities</strong> inside the same directory where your users, service principals, and managed identities already live. Each agent gets an identity object. That identity object participates in the same conditional access policies, role-based access controls, and audit logging pipelines that your IT and security teams already operate. This <strong>extends your existing governance model to cover agents.</strong></p><p>The authentication flow is clean. The agent authenticates using OAuth 2.0 and receives a scoped access token. That token carries the agent identity, not a hardcoded service account credential that three different developers know the password to. The token is short-lived. It is scoped to the specific resource the agent needs to access in that session. 
And every token issuance is logged centrally.</p><p><strong>Where this becomes genuinely powerful is with third party SaaS applications that already support Entra based authentication.</strong> </p><p><strong>If your business systems already accept Entra tokens for human user authentication, your agents can participate in the same trust chain without any bespoke integration work.</strong> </p><p>The MCP server calls the SaaS API, the SaaS API validates the Entra token, and the access decision is made using the same policies that govern your human users.</p><p>The complexity of cross-boundary agent authentication does not disappear, but with Entra ID as the identity backbone it becomes a manageable engineering problem rather than an unsolved architectural one.</p><h2>How the Stack Actually Fits Together</h2><p>The organisations that will get enterprise agents right in 2026 and beyond are not the ones debating Skills vs Function Calling vs MCP. They are the ones who recognise that these are three different layers of the same stack, each solving a different problem, and each needing to be governed differently.</p><p>Here is what that architecture looks like in practice:</p><ul><li><p><strong>MCP servers</strong> sit at the integration layer, owned by your platform team, with proper IAM controls and audit logging from day one. </p></li><li><p><strong>Agent identity</strong> is anchored in Microsoft Entra ID, with tokens that are short lived and scoped to specific tool audiences.</p></li><li><p><strong>Reasoning skills</strong> sit above the orchestration layer and encode the domain judgment that your business actually needs. 
Domain architects and business analysts need to be deeply involved in defining the reasoning skills.</p></li><li><p><strong>MCP gateways</strong> are an emerging pattern for handling the security concerns the protocol itself does not address, including tool poisoning, rug pull protection, cross-server shadowing, and per-user OAuth passthrough.</p><p>Rather than building this from scratch, some gateways, such as Azure API Management (APIM), allow existing APIs to be registered directly as MCP servers. If your SAP, Salesforce, and internal APIs are already onboarded into APIM, <strong>they become MCP tools without hosting another server,</strong> carrying forward existing policies for authentication, rate limiting, and access control.</p></li></ul><p>We went through this same maturity journey with application integration before. We started with hardcoded database connections. We moved to DAOs and service layers. Then to REST APIs. Then to API gateways and service meshes. Each step followed the same principle: stop letting individual developers own the integration boundary and put it somewhere it can be governed.</p><p>Agent integration is following the exact same path. The developers who built their first agents with hardcoded API calls were just at Stage 1. <strong>The job now is to help every team in the organisation make that journey to Stage 3 without losing the speed and agility that made those early agents valuable in the first place.</strong></p>
]]></content:encoded></item><item><title><![CDATA[Your autopilot is only as good as your plan]]></title><description><![CDATA[Lessons from shipping with GitHub Copilot CLI and what actually reduces the back-and-forth]]></description><link>https://puneetghanshani.substack.com/p/your-autopilot-is-only-as-good-as</link><guid isPermaLink="false">https://puneetghanshani.substack.com/p/your-autopilot-is-only-as-good-as</guid><dc:creator><![CDATA[Puneet Ghanshani]]></dc:creator><pubDate>Mon, 25 May 2026 06:35:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Ghus!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fec3f0884-b44e-40ee-a118-ebe0fc57996a_1080x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Vibe coding in the terminal is a different beast from an IDE. There is no sidebar, no inline ghost text, no visual scaffolding. Just you, a prompt, and whatever Copilot decides to do with it. That constraint forces you to be deliberate about when you reach for the model and what you ask it to do.</p><p>The biggest lever I found for cutting wasted iterations was not prompt engineering. 
It was understanding <strong>how the three CLI modes work, what each one is actually for,</strong> and when to trust Auto model selection versus when to pin something specific.</p><h2>A note on how Auto model selection works</h2><p>Before getting into the phases, it helps to understand what happens when you leave model selection on Auto. It is not random and it is not just picking the most capable model available.</p><p>Auto follows a priority stack. First it applies your organisation&#8217;s policy filter, so if your company has blocked certain models, they are out of the pool entirely regardless of what you ask for. Then it avoids unnecessarily expensive models by default, reaching for a faster, cheaper model when that is sufficient, and escalating when the task complexity warrants it. <strong>The practical effect is that Auto is conservative by default.</strong></p><p>This matters because manually pinning a heavy reasoning model for <em>every prompt</em> is not actually the right move. It burns premium requests on work that does not need them. 
The right approach is to understand when Auto will make a good choice on its own, and when you need to override it.</p><p>See the GitHub documentation on <a href="https://docs.github.com/en/copilot/using-github-copilot/using-github-copilot-in-the-command-line">using Copilot in the command line</a> for more information, and on <a href="https://docs.github.com/copilot/concepts/auto-model-selection">auto model selection</a> for how Auto chooses a model.</p><h2>The Three Modes</h2><h3>Plan</h3><p><code>copilot --mode plan</code> is a staging area.</p><p>You discuss the approach, Copilot lays out what it intends to do, and nothing touches your files until you say so. Most people skip this and go straight to Autopilot with a half-formed goal. That is where the <strong>iteration debt</strong> starts.</p><p>For planning work, I pin a <strong>strong reasoning model manually.</strong></p><p>Auto will often route plan-mode prompts to something cost-efficient, which is fine for a simple script but not for feature-level architectural thinking. <strong>A stronger reasoning model will push back on assumptions you did not know you were making. That pushback is the whole point of Plan mode. The extra latency is worth it.</strong></p><p>Getting to a plan you trust usually takes a few rounds, not one shot.</p><p>My first attempt at the prompt below asked for failure modes and got a decent list. But the plan was still too vague to hand to Autopilot safely. It described what to build without specifying the contract, the boundaries, or the test scenarios. I had to go back and explicitly ask for <strong>spec-driven development</strong>: user stories with acceptance criteria, a <strong>traceable todo list</strong>, a defined core contract, and <strong>unit test scenarios</strong> written from the stories. 
<strong>That conversation took a few rounds.</strong></p><p>The plan that came out of it was something I could actually use.</p><p>Here is what the prompt looked like once it had been built up across those iterations, for a content-filtering agent that intercepts third-party API responses in Python 3.11:</p><pre><code><code>copilot --mode plan --model &lt;your strongest reasoning model&gt;

I need to build a Python agent that filters API responses from a third-party
data provider before they reach our application layer. Here is the context:

Goal:
  Intercept API responses and strip or flag content that violates our policy
  (PII, profanity, off-topic categories). Pass clean responses through unchanged.

Stack:
  - Python 3.11
  - No new packages beyond what is already in requirements.txt

Constraints:
  - Must be a thin middleware layer, not a rewrite of the existing API client
  - Latency budget is 50ms added overhead max
  - Filtering rules will change frequently; config must be external, not hardcoded
  - Async; the existing client uses httpx with async/await throughout

What I am NOT sure about:
  - Whether to filter at the response object level or deserialised JSON level
  - How to handle partial matches (e.g. a field that is 90% clean)
  - Whether failed filter checks should raise, return a sanitised object, or return None

Use spec-driven development. Before writing any code, produce:
  1. An epic with acceptance criteria
  2. Features broken down from the epic
  3. User stories with acceptance criteria for each feature
  4. A core contract with the minimal interface
  5. Likely failure modes and assumptions to validate early
  6. A strict traceable todo list
  7. Unit test scenarios derived from the stories
</code></code></pre><p>The full spec that came out of this, after a few iterations, is on GitHub: https://github.com/punitganshani/spec-to-code-ghcp/blob/main/plan.md</p><p>Notice what this gives Autopilot: a contract it cannot misinterpret, a todo list where each item maps to a story, and failure modes already named so they become test cases rather than surprises.</p><p>The model flagged two things I had not considered: that async exceptions in a plain httpx middleware chain can be silently swallowed if not structured carefully, and that config reloading mid-session needs a concurrency strategy.</p><p>Both would have been hard-to-diagnose bugs two hours into Autopilot.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bg4M!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0f1677e-9b12-47fd-b49b-cb02470afabe_3008x1752.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bg4M!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0f1677e-9b12-47fd-b49b-cb02470afabe_3008x1752.png 424w, https://substackcdn.com/image/fetch/$s_!bg4M!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0f1677e-9b12-47fd-b49b-cb02470afabe_3008x1752.png 848w, https://substackcdn.com/image/fetch/$s_!bg4M!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0f1677e-9b12-47fd-b49b-cb02470afabe_3008x1752.png 1272w, 
https://substackcdn.com/image/fetch/$s_!bg4M!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0f1677e-9b12-47fd-b49b-cb02470afabe_3008x1752.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bg4M!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0f1677e-9b12-47fd-b49b-cb02470afabe_3008x1752.png" width="1456" height="848" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d0f1677e-9b12-47fd-b49b-cb02470afabe_3008x1752.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:848,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:471008,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://puneetghanshani.substack.com/i/199152526?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0f1677e-9b12-47fd-b49b-cb02470afabe_3008x1752.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!bg4M!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0f1677e-9b12-47fd-b49b-cb02470afabe_3008x1752.png 424w, https://substackcdn.com/image/fetch/$s_!bg4M!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0f1677e-9b12-47fd-b49b-cb02470afabe_3008x1752.png 848w, 
https://substackcdn.com/image/fetch/$s_!bg4M!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0f1677e-9b12-47fd-b49b-cb02470afabe_3008x1752.png 1272w, https://substackcdn.com/image/fetch/$s_!bg4M!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd0f1677e-9b12-47fd-b49b-cb02470afabe_3008x1752.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3>Interactive</h3><p><code>copilot --mode interactive</code> suggests a command and waits for you to press Enter before running it. 
It is the read-and-confirm loop, which makes it well suited for the kind of targeted questions that come up while you are working through a spec.</p><p>For interactive mode, Auto is usually fine.</p><p>The questions you bring here are specific and bounded. Auto will route them to something fast and capable, which is exactly what you want. The one exception is when you are untangling genuinely complex behaviour: async concurrency semantics, something deep in the httpx request lifecycle, anything that requires the model to hold a lot of context and reason carefully. In those cases I switch to a stronger model manually, but I do not default to it.</p><p>For the content filtering agent, these were the kinds of questions I actually brought to interactive mode:</p><pre><code><code>The spec says config must be reloadable without process restart. I am using a
module-level dict to cache compiled rules. Under concurrent async requests, is
there a window where one coroutine reads a partially updated cache? What is the
right primitive here given everything is running on the same event loop?
</code></code></pre><pre><code><code>Story 3.1 says violations in nested lists should be reported with exact payload
paths. My traversal uses a recursive approach and the path for list items is
coming out as "responses[*].content" instead of "responses[0].content". Here is
the relevant code and the failing assertion:

[pasted code and assertion]
</code></code></pre><pre><code><code>The PolicyFilter protocol has evaluate() as async. But the PII check I am
wiring in is a synchronous regex pass. If I wrap it with asyncio.to_thread(),
does that change the concurrency characteristics for the middleware hook, or is
it fine to just call it directly as a coroutine that happens to be non-blocking?
</code></code></pre><p>Each question has a specific failing thing, real context, and a specific decision to make. That is what interactive mode is for. The more I paraphrased or sanitised the context to make it feel like a clean example, the worse the answers got. Paste the actual stack trace, the actual failing assertion, the real config fragment.</p><p>The other thing that helped: stop using interactive mode for questions that should have been resolved in Plan.</p><p>&#8220;How should I structure the evaluation engine?&#8221; belongs in the spec. &#8220;Why is this traversal returning the wrong path for list items?&#8221; is an interactive question.</p><h3>Autopilot</h3><p><code>copilot --mode autopilot</code> is the CLI equivalent of agent mode. </p><p>It executes shell commands autonomously to achieve a goal and will keep going until it is done or stuck. Fast when the plan is solid. Expensive when it is not.</p><p>For model selection here, Auto is doing something useful: it <strong>dynamically switches between lighter and heavier models depending on the step it is on</strong>. Simple file writes or dependency installs will get a cheaper model. A step that requires generating a critical implementation or calling a complex tool will escalate to a heavier one. </p><p><strong>You do not need to pin a heavy model globally for an entire autopilot session. </strong>What matters more than model selection at this stage is <strong>prompt scope</strong>.</p><p>Point Autopilot at individual todo list items, not the whole epic. </p><p>&#8220;Implement story 3.1: nested traversal with path reporting, matching the <code>PolicyFilter</code> protocol in the spec&#8221; is a good Autopilot prompt. &#8220;Build the filtering agent&#8221; is not.</p><p>The spec gives Autopilot the contract, the acceptance criteria, and the boundaries. Without that, it makes design decisions silently and you find out when the tests fail.</p><p>Read what it produces before accepting it. 
Each todo item has acceptance criteria from the user stories, so you know exactly what done looks like before running the tests. Hand it one item, review against the criteria, run the relevant test scenarios, move to the next.</p><h2>Wrapping Up</h2><p>Plan, Interactive, and Autopilot are genuinely different jobs with genuinely different model needs.</p><ul><li><p>Manually pin a strong reasoning model in Plan.</p></li><li><p>Trust Auto in Interactive for most things.</p></li><li><p>Let Auto&#8217;s dynamic routing do its job in Autopilot and focus on giving it one well-scoped todo item at a time.</p></li></ul><p>That is where the iteration reduction actually comes from.</p>
]]></content:encoded></item><item><title><![CDATA[Orchestrating AI Inference at Scale]]></title><description><![CDATA[API + Worker Patterns That Actually Hold Under Pressure]]></description><link>https://puneetghanshani.substack.com/p/orchestrating-ai-inference-at-scale</link><guid isPermaLink="false">https://puneetghanshani.substack.com/p/orchestrating-ai-inference-at-scale</guid><dc:creator><![CDATA[Puneet Ghanshani]]></dc:creator><pubDate>Sun, 22 Feb 2026 10:51:56 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!n2cy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f43d41-6a61-4bf6-b2e8-95702a0f024a_1920x1080.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If serving AI inference were like running a restaurant, most teams would start by having the waiter cook the food.</p><p>It works when there are three tables. It collapses when there are three hundred.</p><p>The same is true for AI systems. <strong>A single API server calling a model directly feels simple at first. But as usage grows, latency spikes, retries multiply, costs balloon, and reliability becomes unpredictable.</strong></p><p>For tech leaders building serious AI platforms, the question is no longer &#8220;How do we call the model?&#8221; It becomes:</p><p><strong>How do we orchestrate inference as a resilient, scalable system?</strong></p><p>This article walks through the API + Worker architecture pattern for AI inference and post-processing, not as a coding trick but as an operational strategy.</p><h1>The Core Pattern</h1><p>At scale, inference must be treated as a <strong>distributed system</strong>.</p><p><strong>Logical Flow</strong></p><p><strong>Client &#8594; API &#8594; Durable Queue &#8594; Worker Pool &#8594; Model Service &#8594; Post-processing &#8594; Result Store</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!n2cy!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f43d41-6a61-4bf6-b2e8-95702a0f024a_1920x1080.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!n2cy!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f43d41-6a61-4bf6-b2e8-95702a0f024a_1920x1080.png 424w, 
https://substackcdn.com/image/fetch/$s_!n2cy!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f43d41-6a61-4bf6-b2e8-95702a0f024a_1920x1080.png 848w, https://substackcdn.com/image/fetch/$s_!n2cy!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f43d41-6a61-4bf6-b2e8-95702a0f024a_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!n2cy!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f43d41-6a61-4bf6-b2e8-95702a0f024a_1920x1080.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!n2cy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f43d41-6a61-4bf6-b2e8-95702a0f024a_1920x1080.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/04f43d41-6a61-4bf6-b2e8-95702a0f024a_1920x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:784669,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://puneetghanshani.substack.com/i/188784112?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f43d41-6a61-4bf6-b2e8-95702a0f024a_1920x1080.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!n2cy!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f43d41-6a61-4bf6-b2e8-95702a0f024a_1920x1080.png 424w, https://substackcdn.com/image/fetch/$s_!n2cy!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f43d41-6a61-4bf6-b2e8-95702a0f024a_1920x1080.png 848w, https://substackcdn.com/image/fetch/$s_!n2cy!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f43d41-6a61-4bf6-b2e8-95702a0f024a_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!n2cy!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04f43d41-6a61-4bf6-b2e8-95702a0f024a_1920x1080.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This is not architectural ceremony. It is separation of failure domains.</p><p>When traffic spikes, when a model slows down, or when retries surge, you want pressure isolated, not amplified.</p><p>This separation enforces discipline:</p><ul><li><p><strong>API: validate, authorize, enqueue</strong><br>The API is your front door. It authenticates tenants, enforces rate limits, validates payloads, and turns requests into durable jobs. It does not execute inference.</p></li><li><p><strong>Queue: absorb burst traffic, guarantee durability</strong><br>The queue is the shock absorber. It smooths bursty demand and guarantees work is not lost when services restart.</p></li><li><p><strong>Workers: execute inference and post-processing</strong><br>Workers perform the heavy lifting: preprocessing, batching, calling models, handling retries, and persisting outputs.</p></li><li><p><strong>Store: persist results and job state</strong><br>State must be explicit. Job lifecycle, artifacts, metadata, and outputs must survive crashes and restarts.</p></li></ul><p>The API should not block on inference for unpredictable workloads. That responsibility belongs to workers.</p><p><strong>Separate Ingress from Execution</strong></p><div><hr></div><h1>Sync vs Async: A Strategic Decision</h1><p>This is not an implementation detail. 
It is an architectural decision.</p><h2>Synchronous (Blocking)</h2><p>Use only when:</p><ul><li><p>Inference latency is consistently &lt; 1&#8211;2 seconds</p></li><li><p>UX requires immediate response</p></li><li><p>No batching opportunity exists</p></li></ul><p>Risks:</p><ul><li><p>Burst traffic amplifies tail latency</p></li><li><p>Retries compound instability</p></li><li><p>Scaling becomes tightly coupled to API capacity</p></li></ul><h2>Asynchronous (Default for Serious Workloads)</h2><p>Async unlocks:</p><ul><li><p>Horizontal worker scaling</p></li><li><p>Intelligent batching</p></li><li><p>Durable retries</p></li><li><p>Circuit-breaking</p></li><li><p>Back-pressure control</p></li></ul><p><strong>Sequence:</strong></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!AYYn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6663d1e2-ad20-4596-a8fb-7ad175fd26b6_1920x1080.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!AYYn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6663d1e2-ad20-4596-a8fb-7ad175fd26b6_1920x1080.png 424w, https://substackcdn.com/image/fetch/$s_!AYYn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6663d1e2-ad20-4596-a8fb-7ad175fd26b6_1920x1080.png 848w, https://substackcdn.com/image/fetch/$s_!AYYn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6663d1e2-ad20-4596-a8fb-7ad175fd26b6_1920x1080.png 1272w, 
https://substackcdn.com/image/fetch/$s_!AYYn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6663d1e2-ad20-4596-a8fb-7ad175fd26b6_1920x1080.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!AYYn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6663d1e2-ad20-4596-a8fb-7ad175fd26b6_1920x1080.png" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6663d1e2-ad20-4596-a8fb-7ad175fd26b6_1920x1080.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:121128,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://puneetghanshani.substack.com/i/188784112?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6663d1e2-ad20-4596-a8fb-7ad175fd26b6_1920x1080.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!AYYn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6663d1e2-ad20-4596-a8fb-7ad175fd26b6_1920x1080.png 424w, https://substackcdn.com/image/fetch/$s_!AYYn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6663d1e2-ad20-4596-a8fb-7ad175fd26b6_1920x1080.png 848w, 
https://substackcdn.com/image/fetch/$s_!AYYn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6663d1e2-ad20-4596-a8fb-7ad175fd26b6_1920x1080.png 1272w, https://substackcdn.com/image/fetch/$s_!AYYn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6663d1e2-ad20-4596-a8fb-7ad175fd26b6_1920x1080.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Async decouples user interaction from compute variability. 
That decoupling is what enables scale.</p><div><hr></div><h1>Your Control Plane Contract</h1><p>Every inference request becomes a job, and every job follows a schema. That schema is your control plane contract.</p><p>Principles:</p><ul><li><p>Queue messages remain small</p></li><li><p>Large payloads are referenced via object storage</p></li><li><p>Job state is persisted separately</p></li><li><p>Idempotency is first-class</p></li></ul><p>Example job schema:</p><pre><code><code>{
  "jobId": "uuid-1234",
  "idempotencyKey": "user123:prompt:sha256",
  "tenantId": "tenant-01",
  "modelHint": "gpt-xx-small",
  "payloadRef": "storage/inputs/uuid-1234.json",
  "createdAt": "2026-02-22T12:00:00Z",
  "priority": "interactive",
  "attempts": 0,
  "maxAttempts": 5,
  "batchable": true,
  "callbackUrl": "https://client.example/callback"
}
</code></code></pre><p>Track job states explicitly:</p><ul><li><p><code>queued</code></p></li><li><p><code>running</code></p></li><li><p><code>succeeded</code></p></li><li><p><code>failed</code></p></li><li><p><code>cancelled</code></p></li></ul><p>Expose via:</p><pre><code><code>GET /v1/jobs/{jobId}
</code></code></pre><p>Once inference is modeled as a lifecycle-managed job, you gain operational control: visibility, retries, cancellation, and auditing.</p><div><hr></div><h1>Request Routing</h1><p>Request routing should be lightweight and predictable, driven entirely by request metadata.</p><p>Typical routing keys:</p><ul><li><p><code>priority</code></p></li><li><p><code>tenantId</code></p></li><li><p><code>modelHint</code></p></li><li><p><code>long_context</code></p></li><li><p><code>batchable</code></p></li></ul><p>Minimal router:</p><pre><code><code>def route_request(payload):
    if payload.priority == "interactive":
        return "fast-model"
    if payload.long_context:
        return "large-context-model"
    return "default-model"
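
# --- Sketch (an assumption, not part of the original design) ---
# The job schema above is JSON, so a router may receive a plain dict rather
# than an object. The same three rules, written defensively with .get();
# route_request_from_dict is a hypothetical name.
def route_request_from_dict(payload):
    if payload.get("priority") == "interactive":
        return "fast-model"
    if payload.get("long_context", False):
        return "large-context-model"
    return "default-model"


# Rule order matters: an interactive job wins even when it has long context
model = route_request_from_dict({"priority": "interactive", "long_context": True})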
</code></code></pre><p>The API decides <em>where</em> a request should go.</p><p>Workers decide <em>how</em> it is executed.</p><p>Complex orchestration logic belongs inside workers, not the API.</p><div><hr></div><h1>Worker Design: Stateless, Idempotent, Lease-Aware</h1><p>Workers operate in a hostile environment.</p><p>They must assume:</p><ul><li><p>Jobs can be delivered twice</p></li><li><p>Models can rate-limit</p></li><li><p>Pods can die mid-processing</p></li><li><p>Downstream calls can fail</p></li></ul><p>Minimal processing flow:</p><pre><code><code>def process_job(job):
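    # Skip jobs that an earlier delivery already finished (duplicate delivery)
    if is_already_completed(job.jobId):
        return "skipped"

    # ttl is in milliseconds; extend the lease if processing can run longer
    lock = acquire_lock(job.jobId, ttl=60000)
    if not lock:
        return "locked"

    try:
        payload = download(job.payloadRef)
        prepped = preprocess(payload)

        if job.batchable:
            # Batch the preprocessed input, not the raw job
            batch = attempt_batching(job, prepped)
            responses = send_batch_to_model(batch)
            results = postprocess_batch(responses)
        else:
            response = call_model(prepped, model_hint=job.modelHint)
            results = postprocess(response)

        # Persist before notifying so a failed callback never loses the result
        persist_result(job.jobId, results)
        notify_client(job.callbackUrl, results)
        mark_completed(job.jobId)
        return "completed"

    finally:
        release_lock(lock)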
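
# --- Sketch (an assumption, not part of the original design) ---
# "Atomic state transitions" can be modeled as compare-and-set over the job
# states listed earlier. A dict stands in for the job store here; a real store
# would do this as one conditional update. All names are hypothetical.
ALLOWED_TRANSITIONS = {
    "queued": {"running", "cancelled"},
    "running": {"succeeded", "failed", "cancelled"},
}


def transition(job_states, job_id, expected, new_state):
    # Refuse the move unless the job is still in the state we expect
    if job_states.get(job_id) != expected:
        return False
    if new_state not in ALLOWED_TRANSITIONS.get(expected, set()):
        return False
    job_states[job_id] = new_state
    return True


# A second worker that still believes the job is "queued" loses the race
states = {"uuid-1234": "queued"}
first = transition(states, "uuid-1234", "queued", "running")    # True
second = transition(states, "uuid-1234", "queued", "running")   # False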
</code></code></pre><p>Key properties:</p><ul><li><p>Stateless workers &#8594; horizontal scaling</p></li><li><p>Locking + atomic state transitions &#8594; idempotency</p></li><li><p>Visibility timeout / lease extension &#8594; crash safety</p></li><li><p>Dead-letter queue &#8594; bounded retries</p></li></ul><p>Workers are disposable. State is not.</p><div><hr></div><h1>Batching: Where Efficiency Is Won</h1><p>GPU-backed inference is throughput-sensitive. Running single-item requests on GPU infrastructure is like sending one dish at a time to a commercial oven: the fixed cost is paid every time, and most of the capacity sits idle.</p><p>Batching strategies:</p><ol><li><p>Time-window batching</p></li><li><p>Size-based batching (tokens / payload size)</p></li><li><p>Hybrid threshold (flush on whichever limit is reached first)</p></li></ol><p>Example loop:</p><pre><code><code>def batching_loop(model_queue, max_batch_size, max_wait_ms):
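    buffer = []
    start = now()

    while True:
        # Wait only for what is left of the current batch window
        remaining_ms = max(0, max_wait_ms - elapsed(start))
        job = model_queue.pop(timeout=remaining_ms)
        if job:
            if not buffer:
                start = now()  # the window opens with the first job
            buffer.append(job)

        window_expired = elapsed(start) &gt;= max_wait_ms
        if buffer and (len(buffer) &gt;= max_batch_size or window_expired):
            batch_request = merge_jobs(buffer)
            responses = call_model(batch_request)
            split_and_dispatch_responses(buffer, responses)
            buffer = []
            start = now()
        elif window_expired:
            start = now()  # nothing buffered: start a fresh window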
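
# --- Sketch (an assumption, not part of the original design) ---
# One way to keep the request-to-response mapping explicit. It assumes the
# model returns exactly one response per job, in the same order as the batch,
# and that each job is a dict carrying "jobId". map_responses_to_jobs is a
# hypothetical helper.
def map_responses_to_jobs(buffer, responses):
    if len(buffer) != len(responses):
        # Fail loudly rather than hand a client someone else's result
        raise ValueError("batch size and response count differ")
    return {job["jobId"]: response for job, response in zip(buffer, responses)}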
</code></code></pre><p>Maintain request-to-response mapping carefully.</p><p>Without batching, GPU utilization and cost efficiency suffer.</p><div><hr></div><h1>Idempotency: Protecting Cost and Correctness</h1><p>Retries are inevitable.</p><p>Submission logic:</p><pre><code><code>def submit_inference(payload, idempotency_key=None):
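    key = idempotency_key or sha256(payload)

    # A repeated submission returns the original job instead of a new one
    existing = idempotency_store.get(key)
    if existing:
        return existing["jobId"], existing["status"]

    job = create_job(payload, idempotencyKey=key)
    # get-then-set is not atomic: use the store's conditional write
    # (set-if-absent) where available so concurrent duplicates cannot both enqueue
    idempotency_store.set(key, {"jobId": job.jobId, "status": "queued"})
    enqueue(job)

    return job.jobId, "queued"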
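
# --- Sketch (an assumption, not part of the original design) ---
# sha256(payload) above is shorthand. With the standard library, a stable key
# needs a canonical serialization first, so that key order does not change the
# hash. derive_idempotency_key and the tenant prefix are hypothetical.
import hashlib
import json


def derive_idempotency_key(tenant_id, payload):
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"{tenant_id}:{digest}"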
</code></code></pre><p>Worker-level dedupe requires:</p><ul><li><p>Atomic job transitions</p></li><li><p>Lock per <code>jobId</code></p></li><li><p>Idempotent persistence</p></li></ul><p>Without this, duplicate GPU calls become silent cost leaks.</p><div><hr></div><h1>Retries, 429s, and Back-Pressure</h1><p>Two failure domains:</p><ol><li><p>API overload &#8594; return <code>429</code> or <code>503</code></p></li><li><p>Model rate-limiting &#8594; respect <code>Retry-After</code></p></li></ol><p>Retry logic:</p><pre><code><code>def call_with_retries(call_fn, max_retries=5):
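    attempt = 0

    while attempt &lt;= max_retries:
        try:
            return call_fn()

        except RateLimitError as e:
            # Prefer the server's Retry-After hint; otherwise back off exponentially
            wait = parse_retry_after(e) or (2 ** attempt + jitter())

        except TransientError:
            wait = 2 ** attempt + jitter()

        attempt += 1
        if attempt &lt;= max_retries:
            sleep(wait)  # no point sleeping after the final attempt

    raise PermanentFailure()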
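
# --- Sketch (an assumption, not part of the original design) ---
# jitter() above is left undefined. One common choice is capped exponential
# backoff with "full jitter": pick a random delay between zero and the capped
# exponential ceiling. backoff_delay and its defaults are hypothetical.
import random


def backoff_delay(attempt, base=1.0, cap=30.0):
    ceiling = min(cap, base * (2 ** attempt))
    return random.uniform(0, ceiling)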
</code></code></pre><p>Prefer queue-based delayed re-enqueueing over worker sleep loops: a sleeping worker holds its slot and its lease while doing nothing.</p><div><hr></div><h1>Circuit Breaker Pattern</h1><p>Prevent cascading GPU failure:</p><pre><code><code>import time


class CircuitBreaker:
    def __init__(self, fail_threshold, reset_timeout):
        self.fail_threshold = fail_threshold
        self.reset_timeout = reset_timeout
        self.fail_count = 0
        self.opened_at = None

    def allowed(self):
        if self.opened_at and time.time() - self.opened_at &lt; self.reset_timeout:
            return False
        return True

    def record_failure(self):
        self.fail_count += 1
        if self.fail_count &gt;= self.fail_threshold:
            self.opened_at = time.time()

    def record_success(self):
        self.fail_count = 0
        self.opened_at = None
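
# --- Sketch (an assumption, not part of the original design) ---
# One way to put the breaker in front of a model call. primary and fallback
# are hypothetical zero-argument callables; breaker is any object with the
# three methods defined above.
def call_with_breaker(breaker, primary, fallback):
    if not breaker.allowed():
        return fallback()          # circuit open: skip the primary model
    try:
        result = primary()
    except Exception:
        breaker.record_failure()
        return fallback()
    breaker.record_success()
    return result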
</code></code></pre><p>When open:</p><ul><li><p>Route to fallback model</p></li><li><p>Delay non-critical jobs</p></li><li><p>Shed load intentionally</p></li></ul><div><hr></div><h1>Observability: The Leading Indicators</h1><p>Correlate by <code>jobId</code>.</p><p>Track:</p><ul><li><p>Enqueue &#8594; dequeue latency</p></li><li><p>Processing time</p></li><li><p>Model latency</p></li><li><p>Batch size</p></li><li><p>Retry count</p></li><li><p>429 frequency</p></li><li><p>Queue depth</p></li><li><p>DLQ growth</p></li></ul><p>The two most predictive instability signals:</p><ul><li><p>Sustained queue growth</p></li><li><p>Increasing time-in-queue</p></li></ul><p>These indicate capacity mismatch before customer complaints surface.</p><div><hr></div><h1>Final Principles</h1><ul><li><p>Default to async for unpredictable workloads</p></li><li><p>Keep the API thin</p></li><li><p>Treat jobs as first-class control-plane objects</p></li><li><p>Make idempotency mandatory</p></li><li><p>Batch for cost efficiency</p></li><li><p>Respect 429s as system signals</p></li><li><p>Monitor queue depth continuously</p></li></ul><p>The model determines capability.</p><p>The orchestration layer determines reliability, cost structure, and scale.</p><p>AI inference is not an API call.</p><p>It is a distributed system.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://puneetghanshani.substack.com/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[MCP Tool Integration as Systems Thinking (Part 4): Advanced Patterns & Production Readiness]]></title><description><![CDATA[Part 4: Production-ready MCP systems through advanced composition patterns, structural security, chaos testing, and operational excellence.]]></description><link>https://puneetghanshani.substack.com/p/mcp-tool-integration-as-systems-thinking-cdb</link><guid isPermaLink="false">https://puneetghanshani.substack.com/p/mcp-tool-integration-as-systems-thinking-cdb</guid><dc:creator><![CDATA[Puneet Ghanshani]]></dc:creator><pubDate>Sat, 14 Feb 2026 02:00:59 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/69c757a7-bcbe-4f2f-82b1-14edcdf89833_1080x1350.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In Part 1, we built foundations. In Part 2, we designed for resilience. 
In Part 3, we established governance.<br>This final article completes the picture with <strong>advanced patterns that make MCP systems production-ready</strong>: composition, routing, security, and testing for failure.</p><p>This is where architecture stops being theoretical and starts being operational.</p><div><hr></div><h2>Series Navigation</h2><ol><li><p><a href="https://puneetghanshani.substack.com/p/mcp-tool-integration-as-systems-thinking">Part 1: Foundation &amp; Architecture</a></p></li><li><p><a href="https://puneetghanshani.substack.com/p/mcp-tool-integration-as-systems-thinking-80e">Part 2: Resilience &amp; Runtime Behavior</a></p></li><li><p><a href="https://puneetghanshani.substack.com/p/mcp-tool-integration-as-systems-thinking-521">Part 3: System Behavior &amp; Policies</a></p></li><li><p><strong>Part 4: Advanced Patterns &amp; Production</strong> (this article)</p></li></ol><div><hr></div><h2>Composition, Routing, and Orchestration Are Where Architecture Shows</h2><p>As agents mature, tools stop being called in isolation. 
They become <strong>building blocks</strong>.</p><p>Three higher-level patterns emerge:</p><ul><li><p><strong>Composition</strong> turns simple tools into reusable workflows</p></li><li><p><strong>Routing</strong> selects tools dynamically based on context</p></li><li><p><strong>Orchestration</strong> coordinates multi-step operations with dependencies</p></li></ul><p>These patterns must be <strong>explicit and observable</strong>. Hidden orchestration buried in prompts or ad-hoc logic rarely survives scale.</p><div><hr></div><h3>Tool Composition Pattern</h3><pre><code><code>Input
  &#8594; Search
      &#8594; Extract
          &#8594; Synthesize
              &#8594; Output
</code></code></pre><p>Composition works best when:</p><ul><li><p>Data flow is linear</p></li><li><p>Each step has a clear contract</p></li><li><p>Intermediate results are inspectable</p></li></ul><div><hr></div><h3>Composition Algorithm</h3><pre><code><code>FUNCTION composeWorkflow(steps, input):
  context = { input, results: {} }

  FOR EACH step IN steps:
    stepInput = resolve(step.inputMapping, context)
    result = step.tool.execute(stepInput)
    context.results[step.name] = result

  lastStep = LAST(steps)
  RETURN context.results[lastStep.name]
</code></code></pre><p>Composition enables reuse without coupling agents to tool internals.</p><div><hr></div><h2>Dynamic Tool Routing</h2><p>When multiple tools offer the same capability, selection becomes a <strong>runtime decision</strong>, not a configuration choice.</p><p>Routing should account for:</p><ul><li><p>Tool health</p></li><li><p>Latency</p></li><li><p>Cost</p></li><li><p>Capability match</p></li><li><p>Permissions</p></li></ul><div><hr></div><h3>Routing Algorithm (Conceptual)</h3><pre><code><code>FUNCTION routeTool(intent, context):
  candidates = findByCapability(intent)
  scores = {}

  FOR EACH tool IN candidates:
    IF tool unhealthy OR missing permissions:
      CONTINUE

    score = 100
    IF degraded: score -= 20
    IF slow and speed matters: score -= 30
    IF expensive and cost matters: score -= costPenalty
    scores[tool] = score

  RETURN highestScore(scores)
</code></code></pre><p>Routing logic belongs in infrastructure&#8212;not agent prompts.</p><div><hr></div><h2>Orchestration: Coordinating Complex Workflows</h2><p>Orchestration coordinates:</p><ul><li><p>Sequential steps</p></li><li><p>Parallel execution</p></li><li><p>Conditional branches</p></li><li><p>Error recovery</p></li></ul><p>It&#8217;s where policies, retries, fallbacks, and observability converge.</p><div><hr></div><h3>Orchestration Pattern</h3><pre><code><code>FOR EACH step:
  resolve dependencies
  execute step
  record outcome

  IF failure:
    apply policy
    fallback or abort
</code></code></pre><p>Well-designed orchestration produces <strong>execution logs</strong> that operators can reason about without reading code.</p><div><hr></div><h3>When to Use Each Pattern</h3><p><strong>Composition</strong></p><ul><li><p>Linear pipelines</p></li><li><p>Reusable workflows</p></li><li><p>ETL-style processes</p></li></ul><p><strong>Routing</strong></p><ul><li><p>Redundant capabilities</p></li><li><p>Cost vs. speed tradeoffs</p></li><li><p>Health-aware selection</p></li></ul><p><strong>Orchestration</strong></p><ul><li><p>Multi-step processes</p></li><li><p>Conditional logic</p></li><li><p>Parallelism with dependencies</p></li></ul><div><hr></div><h2>Security Is Structural, Not Additive</h2><p>Tool integration increases blast radius.</p><p>Security cannot be layered on later&#8212;it must be <strong>structural</strong>:</p><ul><li><p>Credentials are scoped and rotated</p></li><li><p>Inputs are validated consistently</p></li><li><p>Data sharing is minimized</p></li><li><p>Execution is sandboxed</p></li></ul><p>Most tool-related security failures are architectural, not novel.</p><div><hr></div><h3>Structural Security Principles</h3><p><strong>Authentication &amp; Authorization</strong></p><ul><li><p>Users are authenticated</p></li><li><p>Tools are authorized</p></li><li><p>Permissions are scoped narrowly</p></li></ul><p><strong>Data Protection</strong></p><ul><li><p>Encryption at rest and in transit</p></li><li><p>Data minimization by default</p></li></ul><p><strong>Input Validation</strong></p><ul><li><p>Schema enforcement</p></li><li><p>Size limits</p></li><li><p>Dangerous input sanitization</p></li></ul><p><strong>Auditability</strong></p><ul><li><p>Every tool execution logged</p></li><li><p>PII detection enforced</p></li><li><p>Compliance is observable</p></li></ul><p>Security that depends on &#8220;remembering to do the right thing&#8221; does not scale.</p><div><hr></div><h2>Testing for Failure Is a Form of Respect</h2><p>Testing only the 
happy path assumes the system will be treated gently.</p><p>It won&#8217;t.</p><p>Production MCP systems must be tested against:</p><ul><li><p>Tool outages</p></li><li><p>Partial responses</p></li><li><p>Network degradation</p></li><li><p>Expired credentials</p></li></ul><p>Chaos testing is not pessimism&#8212;it&#8217;s respect for complexity.</p><div><hr></div><h3>Critical Test Scenarios</h3><pre><code><code>&#8226; Transient failures &#8594; retries succeed
&#8226; Circuit breakers &#8594; fail fast after threshold
&#8226; Credential expiration &#8594; refresh and retry once
&#8226; Fallbacks &#8594; degraded success, not total failure
&#8226; Cache behavior &#8594; consistent reuse within TTL
</code></code></pre><div><hr></div><h3>Chaos Testing (Conceptual)</h3><pre><code><code>Inject failures + latency
Run real workloads
Measure success rate
Verify graceful degradation
</code></code></pre><p>A resilient system bends under stress&#8212;it doesn&#8217;t shatter.</p><div><hr></div><h2>Production Readiness Checklist</h2><p>Ship to production only when:</p><p><strong>Resilience</strong></p><ul><li><p>Fallback paths tested</p></li><li><p>Circuit breakers configured</p></li><li><p>Timeouts tuned</p></li><li><p>Degraded modes verified</p></li></ul><p><strong>Observability</strong></p><ul><li><p>All tool calls instrumented</p></li><li><p>Correlation IDs propagate</p></li><li><p>Health reflects reality</p></li></ul><p><strong>Security</strong></p><ul><li><p>Credentials encrypted and rotated</p></li><li><p>Input validation enforced</p></li><li><p>PII detection active</p></li><li><p>Audit logs complete</p></li></ul><p><strong>Operations</strong></p><ul><li><p>Runbooks exist</p></li><li><p>Alerts are actionable</p></li><li><p>Rollbacks are tested</p></li></ul><p><strong>Governance</strong></p><ul><li><p>Tool registry complete</p></li><li><p>Policies documented</p></li><li><p>Deprecation process defined</p></li></ul><div><hr></div><h2>Final Reflection: Building Infrastructure That Lasts</h2><p>MCP tool integration is not about adding capabilities to agents.</p><p>It&#8217;s about building <strong>infrastructure that earns trust over time</strong>.</p><p>The systems that last are not the ones with the most tools, but the ones with:</p><ul><li><p>Clear boundaries</p></li><li><p>Honest assumptions about failure</p></li><li><p>Visible behavior</p></li><li><p>Disciplined evolution</p></li></ul><p>If you design MCP integration as a system&#8212;not a shortcut&#8212;you give your agents something rare: a foundation that doesn&#8217;t crack as they grow.</p><div><hr></div><h2>From Principles to Practice</h2><p>Patterns don&#8217;t build systems. 
Teams do.</p><p>What matters in practice:</p><ol><li><p>Design for scale even when starting small</p></li><li><p>Make failure visible and explicit</p></li><li><p>Measure everything</p></li><li><p>Automate governance</p></li><li><p>Test the edges, not just the center</p></li></ol><div><hr></div><h2>Series Conclusion</h2><p>MCP is young, but the principles behind resilient systems are not.</p><p>Separation of concerns.<br>Graceful degradation.<br>Observability.<br>Security by design.</p><p>These are timeless.</p><p>If this series helped clarify how MCP fits into real systems, apply the patterns, adapt them, and share what you learn. Infrastructure improves when understanding is shared.</p><p><strong>Series Links</strong></p><ol><li><p>Part 1: Foundation &amp; Architecture</p></li><li><p>Part 2: Resilience &amp; Runtime Behavior</p></li><li><p>Part 3: System Behavior &amp; Policies</p></li><li><p><strong>Part 4: Advanced Patterns &amp; Production</strong> (this article)</p></li></ol>
]]></content:encoded></item><item><title><![CDATA[MCP Tool Integration as Systems Thinking (Part 3): System Behavior & Policies]]></title><description><![CDATA[Part 3: Managing system behavior through tool discovery, error handling policies, performance patterns, and strategic tool selection.]]></description><link>https://puneetghanshani.substack.com/p/mcp-tool-integration-as-systems-thinking-521</link><guid isPermaLink="false">https://puneetghanshani.substack.com/p/mcp-tool-integration-as-systems-thinking-521</guid><dc:creator><![CDATA[Puneet Ghanshani]]></dc:creator><pubDate>Thu, 12 Feb 2026 02:00:23 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/2d08579b-f382-4fb4-a6f9-baf3f0b800f2_1080x1350.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In Part 1, we established architectural foundations. In Part 2, we designed for resilience. This article addresses <strong>system-wide behavior</strong>: how tools are discovered, how errors are handled consistently, how performance emerges, and how tool selection becomes a strategic decision.</p><p><strong>Policy beats improvisation at scale.</strong></p>
<div><hr></div><h2>Series Navigation</h2><ol><li><p><a href="https://puneetghanshani.substack.com/p/mcp-tool-integration-as-systems-thinking">Part 1: Foundation &amp; Architecture</a></p></li><li><p><a href="https://puneetghanshani.substack.com/p/mcp-tool-integration-as-systems-thinking-80e">Part 2: Resilience &amp; Runtime Behavior</a></p></li><li><p><strong>Part 3: System Behavior &amp; Policies</strong> (this article)</p></li><li><p>Part 4: Advanced Patterns &amp; Production</p></li></ol><div><hr></div><h2>Tool Discovery Is a Governance Problem</h2><p>As systems grow, the question shifts from <em>how do we call tools</em> to <em>which tools should exist at all</em>.</p><p>Dynamic discovery enables flexibility&#8212;but without governance, it creates entropy. A tool registry becomes a <strong>source of truth</strong>, not a convenience.</p><p>Effective registries capture intent:</p><ul><li><p>What the tool does</p></li><li><p>What guarantees it provides</p></li><li><p>How expensive or slow it is</p></li><li><p>What permissions it requires</p></li></ul><p>This metadata enables smarter routing, safer fallbacks, and deliberate deprecation.</p><div><hr></div><h3>Tool Metadata Model (Conceptual)</h3><pre><code><code>IDENTITY
&#8226; toolId
&#8226; name
&#8226; version
&#8226; description

CAPABILITIES
&#8226; capabilities (e.g. search, realtime-data)
&#8226; tags (production-ready, external)

PERFORMANCE
&#8226; estimated latency (fast / medium / slow)
&#8226; rate limits
&#8226; cost per call

RELIABILITY
&#8226; SLA
&#8226; retryable
&#8226; idempotent

SECURITY
&#8226; required permissions
&#8226; data classification
&#8226; PII handling

SCHEMA
&#8226; input validation
&#8226; output structure

OPERATIONAL
&#8226; fallback tools
&#8226; health check endpoint
</code></code></pre><div><hr></div><h3>Discovery &amp; Routing Algorithm</h3><pre><code><code>FUNCTION discoverTools(path):
  definitions = scan(path)

  FOR EACH tool IN definitions:
    validate(tool)
    registry.register(tool)
    LOG("Tool registered", tool.toolId)

FUNCTION findTools(capability):
  RETURN registry.query(
    capability = capability,
    tag = "production-ready",
    orderBy = "sla DESC"
  )

FUNCTION selectTool(intent, constraints):
  candidates = findTools(intent.capability)

  APPLY latency, cost, SLA constraints
  SCORE remaining tools
  RETURN best match
</code></code></pre><div><hr></div><h3>Governance Questions</h3><p><strong>Before adding a tool</strong></p><ul><li><p>Does this capability already exist?</p></li><li><p>What&#8217;s the cost per invocation?</p></li><li><p>Who owns it?</p></li><li><p>What&#8217;s the deprecation plan?</p></li></ul><p><strong>Before removing a tool</strong></p><ul><li><p>What depends on it?</p></li><li><p>Is there a migration path?</p></li><li><p>Do usage metrics support removal?</p></li><li><p>How will users be informed?</p></li></ul><div><hr></div><h2>Error Handling Is a Policy Decision</h2><p>Error handling should never be improvised at call sites.</p><p>It&#8217;s a <strong>policy</strong>, applied consistently, that defines:</p><ul><li><p>Which errors are retryable</p></li><li><p>Which errors alert humans</p></li><li><p>Which errors are safe to surface</p></li><li><p>When tools should be disabled automatically</p></li></ul><p>When policies are centralized, systems behave coherently under stress. When they aren&#8217;t, behavior becomes unpredictable.</p><div><hr></div><h3>Error Handling Policy (Conceptual)</h3><pre><code><code>IF transient error:
  retry with backoff
  fallback if exhausted

IF rate limit:
  honor retry-after
  delay execution

IF authentication error:
  refresh credentials once
  then alert and disable tool

IF validation error:
  fail fast
  surface to agent

IF unknown error:
  fail
  alert operator
</code></code></pre><div><hr></div><h3>Policy-Driven Error Algorithm</h3><pre><code><code>FUNCTION handleError(error, context):
  category = classify(error)

  SWITCH category:
    TRANSIENT:
      retry or fallback
    RATE_LIMIT:
      delay and retry
    AUTH:
      refresh once or disable
    VALIDATION:
      fail and surface
    UNKNOWN:
      fail and alert
</code></code></pre><div><hr></div><h3>Circuit Breaker Pattern</h3><pre><code><code>IF failures exceed threshold:
  open circuit
  fail fast

AFTER timeout:
  half-open
  allow limited retries
</code></code></pre><p>Circuit breakers prevent cascading failure and buy operators time.</p><div><hr></div><h3>Policy Configuration (Example)</h3><pre><code><code>retry:
  maxAttempts: 3
  backoff: exponential

circuitBreaker:
  threshold: 5 failures
  timeout: 30s

auth:
  autoRefresh: true
  maxAttempts: 1

validation:
  surfaceToAgent: true

rateLimit:
  honorRetryAfter: true
</code></code></pre><div><hr></div><h2>Performance Emerges From Architecture</h2><p>In multi-tool systems, performance is not about fast tools&#8212;it&#8217;s about <strong>composition</strong>.</p><p>Latency multiplies when:</p><ul><li><p>Calls are duplicated</p></li><li><p>Connections aren&#8217;t reused</p></li><li><p>Results aren&#8217;t cached</p></li><li><p>Execution is unnecessarily sequential</p></li></ul><p>Good performance engineering focuses on flow, not micro-optimizations.</p><div><hr></div><h3>Core Performance Patterns</h3><p><strong>1. Request Deduplication</strong></p><ul><li><p>Share in-flight requests</p></li><li><p>Prevent duplicate work</p></li></ul><p><strong>2. Connection Pooling</strong></p><ul><li><p>Reuse expensive connections</p></li><li><p>Maintain headroom for spikes</p></li></ul><p><strong>3. Adaptive Caching</strong></p><ul><li><p>TTL based on tool characteristics</p></li><li><p>Longer cache for slow or costly tools</p></li></ul><p><strong>4. Parallel Execution with Limits</strong></p><ul><li><p>Controlled concurrency</p></li><li><p>Avoid overload</p></li></ul><p><strong>5. Cache Key Normalization</strong></p><ul><li><p>Normalize inputs before hashing</p></li><li><p>Prevent accidental cache misses</p></li></ul><div><hr></div><h3>Metrics That Matter</h3><pre><code><code>&#8226; Cache hit rate (&gt;80% for cacheable ops)
&#8226; P50 / P95 / P99 latency
&#8226; Deduplication rate
&#8226; Connection pool utilization (60&#8211;80%)
&#8226; Sustained throughput
</code></code></pre><p>Watch for bimodal latency&#8212;it often signals architectural issues, not slow tools.</p><div><hr></div><h2>Tool Selection Is an Exercise in Restraint</h2><p>Mature MCP systems are defined less by how many tools they have&#8212;and more by how many they <em>refuse to add</em>.</p><p>Tool selection is strategy in disguise:</p><ul><li><p>Community tools for common capabilities</p></li><li><p>Custom tools where differentiation matters</p></li><li><p>Redundancy for resilience, not indecision</p></li></ul><p>Every tool increases operational surface area. Complexity should be earned.</p><div><hr></div><h3>Tool Evaluation Scorecard</h3><p>Score each category from 0&#8211;10:</p><p><strong>Need</strong></p><ul><li><p>Real user value?</p></li><li><p>Cost of not having it?</p></li></ul><p><strong>Quality</strong></p><ul><li><p>SLA and maintenance?</p></li><li><p>Documentation and tests?</p></li></ul><p><strong>Operational Cost (inverted)</strong></p><ul><li><p>Integration complexity?</p></li><li><p>Monitoring burden?</p></li></ul><p><strong>Strategic Fit</strong></p><ul><li><p>Aligns with platform direction?</p></li><li><p>Relevant in 6&#8211;12 months?</p></li></ul><p><strong>Threshold:</strong> Require <strong>30+ points</strong> to add a tool.</p><div><hr></div><h3>Deprecation Signals</h3><p>Remove tools when:</p><ul><li><p>Usage &lt;1% for 30 days</p></li><li><p>Better alternatives exist</p></li><li><p>Maintenance cost exceeds value</p></li><li><p>Strategy shifts away from the capability</p></li></ul><div><hr></div><h2>Policy Checklist</h2><p>Your MCP system shows strong governance when:</p><ul><li><p>Tool metadata is complete and current</p></li><li><p>Discovery is automated and validated</p></li><li><p>Error handling follows centralized policy</p></li><li><p>Circuit breakers prevent cascading failure</p></li><li><p>Performance patterns are consistently applied</p></li><li><p>Tool selection has explicit criteria</p></li><li><p>Deprecation is 
intentional and communicated</p></li><li><p>Operators can query tool health programmatically</p></li></ul><div><hr></div><h2>Coming Next: Part 4 &#8212; Advanced Patterns &amp; Production</h2><p>In the final part, we&#8217;ll explore:</p><ul><li><p>Tool composition and orchestration</p></li><li><p>Security as structural design</p></li><li><p>Testing for failure at scale</p></li><li><p>Production readiness principles</p></li></ul><div><hr></div><h2>Reflection</h2><p>Policies scale where improvisation fails.</p><p>By centralizing decisions about discovery, errors, performance, and selection, you create systems that behave predictably under stress.</p><p>The best systems make governance invisible to users&#8212;and obvious to operators.</p>]]></content:encoded></item><item><title><![CDATA[MCP Tool Integration as Systems Thinking (Part 2): Resilience & Runtime Behavior]]></title><description><![CDATA[Part 2: Building resilient MCP systems through graceful degradation, lazy loading, stateless design, and comprehensive observability.]]></description><link>https://puneetghanshani.substack.com/p/mcp-tool-integration-as-systems-thinking-80e</link><guid isPermaLink="false">https://puneetghanshani.substack.com/p/mcp-tool-integration-as-systems-thinking-80e</guid><dc:creator><![CDATA[Puneet Ghanshani]]></dc:creator><pubDate>Wed, 11 Feb 2026 04:00:41 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/7bcc67e7-8a14-444e-8eca-66af0cf6f6c0_1080x1350.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In Part 1, we established the architectural foundation for MCP tool integration. This article turns to <strong>runtime behavior</strong>&#8212;how systems actually behave when tools fail, lag, or act unpredictably.</p><p>Resilience isn&#8217;t about preventing failure.<br>It&#8217;s about <strong>controlling what happens when failure occurs</strong>.</p>
<div><hr></div><h2>Series Navigation</h2><ol><li><p><a href="https://puneetghanshani.substack.com/p/mcp-tool-integration-as-systems-thinking">Part 1: Foundation &amp; Architecture</a></p></li><li><p><strong>Part 2: Resilience &amp; Runtime Behavior</strong> (this article)</p></li><li><p>Part 3: System Behavior &amp; Policies</p></li><li><p>Part 4: Advanced Patterns &amp; Production</p></li></ol><div><hr></div><h2>Failure Is Normal&#8212;Design for It</h2><p>One of the most dangerous assumptions in tool integration is that failure is exceptional.</p><p>In distributed systems, failure is the <strong>default state</strong>. The only thing that changes is frequency.</p><p>The real question isn&#8217;t <em>whether</em> a tool will fail, but <strong>how much damage that failure causes</strong>.</p><p>Resilient MCP systems assume that <em>something</em> is always degraded:</p><ul><li><p>A tool may be slow rather than fully down</p></li><li><p>Credentials may expire mid-session</p></li><li><p>Rate limits may apply unevenly</p></li><li><p>Partial responses may be better than no response</p></li></ul><p>Graceful degradation means explicitly deciding:</p><ul><li><p>Which failures are acceptable</p></li><li><p>Which failures are recoverable</p></li><li><p>Which failures must surface to users</p></li></ul><p>Clarity here prevents silent corruption and builds long-term trust.</p><div><hr></div><h2>Graceful Degradation</h2><h3>Conceptual Flow</h3><pre><code><code>Execute request
  &#8594; Try primary tool
    &#8594; Success &#8594; Return result
    &#8594; Failure
        &#8594; Refresh credentials (if needed)
        &#8594; Try fallback tools
            &#8594; Success &#8594; Log fallback + return result
            &#8594; Failure
                &#8594; Return cached or degraded response
</code></code></pre><div><hr></div><h3>Algorithm (Language-Agnostic)</h3><pre><code><code>FUNCTION executeWithFallback(primaryTool, fallbackTools[], input):
  tools = [primaryTool] + fallbackTools
  errors = []

  FOR EACH tool IN tools:
    TRY:
      result = executeWithTimeout(tool, input, 5000ms)

      IF tool != primaryTool:
        LOG_WARNING("Fallback used", {tool, errors})

      RETURN { success: true, data: result }

    CATCH error:
      errors.append({tool: tool.name, error})

      IF error.type == CREDENTIALS_EXPIRED:
        // Refreshed credentials help later calls; this request moves on to the next tool
        refreshCredentials(tool)

  cachedData = getCachedResponse(input)

  RETURN {
    success: false,
    degraded: true,
    data: cachedData,
    errors
  }
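
// Usage sketch (an assumption, not part of the original algorithm):
// how a caller might branch on the result shape returned above.
// primarySearch, secondarySearch, render, and showNotice are hypothetical names.
response = executeWithFallback(primarySearch, [secondarySearch], { query: "mcp" })

IF response.success:
  render(response.data)
ELSE IF response.data != NULL:
  // Every tool failed, but a cached copy exists: show it and say so
  render(response.data)
  showNotice("Results may be out of date")
ELSE:
  // Nothing cached either: surface the failure instead of hiding it
  showNotice("Search is unavailable", response.errors)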
</code></code></pre><div><hr></div><h3>Degradation Strategies</h3><p><strong>1. Fallback Chains</strong></p><ul><li><p>Search: Primary API &#8594; Secondary API &#8594; Cache &#8594; Empty result</p></li><li><p>Translation: Premium service &#8594; Free service &#8594; Pass-through</p></li></ul><p><strong>2. Partial Results</strong></p><ul><li><p>Return 8 of 10 search results</p></li><li><p>Return summaries without citations</p></li></ul><p><strong>3. Cached Responses</strong></p><ul><li><p>Serve stale data with explicit metadata:</p></li></ul><pre><code><code>{ data: ..., cached: true, age: "5 minutes" }
</code></code></pre><p><strong>4. Explicit Degraded Messages</strong></p><pre><code><code>{
  success: false,
  degraded: true,
  message: "Search service unavailable",
  suggestion: "Try rephrasing your query",
  retryAfter: 60
}
</code></code></pre><div><hr></div><h2>Lazy Loading Is About Control, Not Optimization</h2><p>Lazy loading is often framed as a performance trick. In reality, it&#8217;s about <strong>control</strong>.</p><p>Eager loading assumes:</p><ul><li><p>All tools are equally important</p></li><li><p>All tools are equally reliable</p></li></ul><p>That&#8217;s almost never true.</p><p>On-demand initialization creates a more honest system:</p><ul><li><p>Tools are paid for only when used</p></li><li><p>Failures appear in context, not at startup</p></li><li><p>Resource usage reflects real demand</p></li></ul><p>The trade-off is complexity: first-use latency and readiness must be observable. In production systems, that trade-off is usually worth it.</p><div><hr></div><h3>Lazy Loading State Model</h3><pre><code><code>Not Loaded
  &#8594; Initializing (first request)
      &#8594; Ready (cached + monitored)
      &#8594; Failed (retry or give up)
</code></code></pre><div><hr></div><h3>Algorithm</h3><pre><code><code>FUNCTION getTool(toolId):
  IF tools.contains(toolId):
    RETURN tools[toolId]

  IF initializing.contains(toolId):
    // Return the in-flight result directly rather than re-reading the cache
    RETURN AWAIT initializing[toolId]

  promise = initializeTool(toolId)
  initializing[toolId] = promise

  TRY:
    tool = AWAIT promise
    tools[toolId] = tool
    RETURN tool
  FINALLY:
    initializing.remove(toolId)
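
// Usage sketch (assumption): two concurrent callers share one initialization.
// "pdf-parser" is a hypothetical tool id; ALL is a hypothetical primitive that
// waits for both calls. Assumes cooperative, single-threaded scheduling.
[a, b] = AWAIT ALL(getTool("pdf-parser"), getTool("pdf-parser"))
// initializeTool("pdf-parser") ran once, and a and b refer to the same tool.
// If initialization fails, both callers see the error and the next call
// retries, because the FINALLY branch clears the in-flight entry.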
</code></code></pre><div><hr></div><h3>When to Lazy Load</h3><p><strong>Good candidates</strong></p><ul><li><p>Expensive or heavyweight tools</p></li><li><p>Rarely used capabilities</p></li><li><p>External or experimental services</p></li></ul><p><strong>Poor candidates</strong></p><ul><li><p>Critical path tools used in most requests</p></li><li><p>Lightweight utilities</p></li><li><p>Tools whose failure should block startup</p></li></ul><div><hr></div><h2>Statelessness Is What Makes Systems Predictable</h2><p>Stateless tools aren&#8217;t exciting&#8212;but they&#8217;re essential.</p><p>Hidden state makes systems fragile:</p><ul><li><p>Retries become dangerous</p></li><li><p>Debugging becomes guesswork</p></li><li><p>Ordering bugs appear under load</p></li></ul><p>Stateless, idempotent tools enable:</p><ul><li><p>Safe retries</p></li><li><p>Reliable caching</p></li><li><p>Clean composition</p></li><li><p>Predictable orchestration</p></li></ul><p>This principle feels restrictive early and liberating later.</p><div><hr></div><h3>Stateful vs. Stateless</h3><p><strong>Stateful (Fragile)</strong></p><ul><li><p>Behavior depends on call order</p></li><li><p>Retries change outcomes</p></li><li><p>Implicit configuration leaks</p></li></ul><p><strong>Stateless (Robust)</strong></p><ul><li><p>All inputs are explicit</p></li><li><p>Same input &#8594; same output</p></li><li><p>Safe to retry, cache, and parallelize</p></li></ul><div><hr></div><h3>Stateless Tool Design</h3><pre><code><code>FUNCTION search(params):
  query = params.query
  filters = params.filters OR []
  sortBy = params.sortBy OR "date"

  RETURN api.search(query, filters, sortBy)
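
// Sketch (assumption): because every input is explicit, the params alone
// can serve as a cache key. cache, hash, and canonicalize are hypothetical
// helpers; canonicalize is assumed to order keys so equal params hash equally.
FUNCTION cachedSearch(params):
  key = hash(canonicalize(params))

  IF cache.contains(key):
    RETURN cache[key]

  result = search(params)
  cache[key] = result
  RETURN result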
</code></code></pre><p><strong>Properties</strong></p><ul><li><p>Idempotent</p></li><li><p>Cacheable</p></li><li><p>Retryable</p></li><li><p>Testable</p></li><li><p>Composable</p></li></ul><div><hr></div><h3>Making Stateful APIs Behave Statelessly</h3><p><strong>State Containers</strong></p><pre><code><code>FUNCTION createSession(filters, sortBy):
  RETURN {
    execute: (query) =&gt; api.search(query, filters, sortBy)
  }
</code></code></pre><p><strong>State Serialization</strong></p><pre><code><code>token = encrypt(serialize({filters, sortBy}))
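
// Sketch (assumption): the client sends the token back with each request,
// so the server holds no session. decrypt and deserialize are hypothetical
// inverses of encrypt and serialize.
FUNCTION searchWithToken(token, query):
  state = deserialize(decrypt(token))
  RETURN api.search(query, state.filters, state.sortBy)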
</code></code></pre><div><hr></div><h2>Observability Is the Difference Between Control and Hope</h2><p>Without observability, multi-tool MCP systems operate on hope.</p><p>Hope tools are healthy.<br>Hope retries are working.<br>Hope latency spikes resolve themselves.</p><p>Hope does not scale.</p><p>Resilient systems treat observability as a <strong>first-class feature</strong>:</p><ul><li><p>Every tool call is correlated</p></li><li><p>Latency and errors are tracked per tool</p></li><li><p>Health is continuously evaluated</p></li></ul><p>This benefits operators <em>and</em> improves architectural decisions over time.</p><div><hr></div><h3>Observability Flow</h3><pre><code><code>Tool Call
  &#8594; Wrapper
      &#8594; Start log + correlation ID
      &#8594; Execute
          &#8594; Record metrics
          &#8594; Log success or failure
</code></code></pre><div><hr></div><h3>Algorithm</h3><pre><code><code>FUNCTION executeWithObservability(tool, input, correlationId):
  start = NOW()

  LOG("Started", {tool, correlationId})

  TRY:
    result = tool.execute(input)
    recordMetrics(tool, "success", NOW() - start)
    LOG("Completed", {tool, correlationId})
    RETURN result

  CATCH error:
    recordMetrics(tool, "error", NOW() - start)
    LOG_ERROR("Failed", {tool, correlationId, error})
    THROW error
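
// Usage sketch (assumption): one correlation ID per agent request, reused for
// every tool call it triggers. newCorrelationId, searchTool, and summarizeTool
// are hypothetical names.
correlationId = newCorrelationId()
docs = executeWithObservability(searchTool, { query }, correlationId)
summary = executeWithObservability(summarizeTool, { docs }, correlationId)
// Filtering logs by correlationId now shows both calls in order.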
</code></code></pre><div><hr></div><h3>What to Observe</h3><p><strong>Per-Tool</strong></p><ul><li><p>Request volume</p></li><li><p>Error rate</p></li><li><p>Latency (P50 / P95 / P99)</p></li><li><p>Timeout frequency</p></li><li><p>Fallback usage</p></li></ul><p><strong>System-Wide</strong></p><ul><li><p>Tool calls per request</p></li><li><p>Concurrent executions</p></li><li><p>End-to-end latency by tool combination</p></li></ul><p><strong>Health Signals</strong></p><ul><li><p>Availability</p></li><li><p>Success-rate trends</p></li><li><p>Initialization failures</p></li><li><p>Credential refresh errors</p></li></ul><div><hr></div><h2>Resilience Checklist</h2><p>Your MCP system is resilient when:</p><ul><li><p>Tool failures don&#8217;t crash agents</p></li><li><p>Fallback paths are tested and visible</p></li><li><p>Degraded modes are explicit</p></li><li><p>Lazy loading failures are recoverable</p></li><li><p>Tools are stateless and idempotent</p></li><li><p>Every call is traced</p></li><li><p>Health metrics influence routing decisions</p></li></ul><div><hr></div><h2>Coming Next: Part 3 &#8212; System Behavior &amp; Policies</h2><p>Next, we&#8217;ll cover:</p><ul><li><p>Tool discovery at scale</p></li><li><p>Centralized error policies</p></li><li><p>Performance tuning patterns</p></li><li><p>Strategic tool selection</p></li></ul><div><hr></div><h2>Reflection</h2><p>Resilience emerges from honest assumptions.</p><p>Don&#8217;t assume tools won&#8217;t fail&#8212;design for failure.<br>Don&#8217;t assume fast startup&#8212;lazy load and observe.<br>Don&#8217;t hide state&#8212;make everything explicit.</p><p>The systems operators trust are the ones that <strong>fail visibly, predictably, and gracefully</strong>.</p>
]]></content:encoded></item><item><title><![CDATA[ MCP Tool Integration as Systems Thinking (Part 1): Foundation & Architecture]]></title><description><![CDATA[Part 1: Understanding the architectural foundation for MCP tool integration&#8212;why systems fail at scale and how to design clear boundaries that last.]]></description><link>https://puneetghanshani.substack.com/p/mcp-tool-integration-as-systems-thinking</link><guid isPermaLink="false">https://puneetghanshani.substack.com/p/mcp-tool-integration-as-systems-thinking</guid><dc:creator><![CDATA[Puneet Ghanshani]]></dc:creator><pubDate>Mon, 09 Feb 2026 08:41:44 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/cdd1c35f-12af-4756-9a57-f6f8ef3ae8de_1080x1350.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Most conversations about MCP tool integration focus on mechanics: how to register tools, how to call them, how to handle errors. Those details matter&#8212;but they&#8217;re not where systems succeed or fail.</p><p>The real challenge is <strong>systems thinking</strong>: understanding how tools behave over time, under load, during failure, and in the hands of people who didn&#8217;t build them. MCP tools aren&#8217;t just capabilities you add to an agent. 
They are dependencies that reshape architecture, operations, and trust in subtle but compounding ways.</p><p>This series argues that MCP integration should be treated as <strong>platform design</strong>, not an implementation detail.</p><div><hr></div><h2>Who This Series Is For</h2><p><strong>This series is for you if:</strong></p><ul><li><p>You&#8217;re architecting multi-tool agent systems expected to run in production</p></li><li><p>You&#8217;ve experienced cascading failures or unpredictable behavior in tool integrations</p></li><li><p>You&#8217;re responsible for reliability, security, or operational excellence in AI systems</p></li><li><p>You want to understand systems-thinking principles applied to MCP</p></li></ul><p><strong>This series is </strong><em><strong>not</strong></em><strong> for you if:</strong></p><ul><li><p>You&#8217;re building a simple proof-of-concept with one or two tools</p></li><li><p>You&#8217;re looking for a quick &#8220;getting started&#8221; tutorial</p></li><li><p>You need basic MCP protocol documentation</p></li><li><p>You prefer framework-specific walkthroughs over architectural principles</p></li></ul><p><strong>Note on examples:</strong><br>All patterns are presented as <strong>language-agnostic</strong> algorithms, flowcharts, and diagrams. 
The architectural principles apply equally to Python, Go, Rust, Java, C#, or JavaScript.</p><div><hr></div><h2>Series Overview</h2><p>This four-part series covers:</p><ol><li><p><strong>Part 1: Foundation &amp; Architecture</strong> &#8212; Core principles and system design</p></li><li><p><strong>Part 2: Resilience &amp; Runtime Behavior</strong> &#8212; Failure handling, state, and observability</p></li><li><p><strong>Part 3: System Behavior &amp; Policies</strong> &#8212; Discovery, errors, performance, and tool selection</p></li><li><p><strong>Part 4: Advanced Patterns &amp; Production</strong> &#8212; Composition, security, and testing</p></li></ol><div><hr></div><h2>Architecture Overview</h2><p>Before diving into specifics, here&#8217;s how a well-designed MCP tool system is structured conceptually:</p><pre><code><code>Agent Logic (Intent &amp; Reasoning)
        &#8595;
Tool Abstraction Layer (Registry &amp; Discovery)
        &#8595;
Execution Layer (Retry, Timeout, Fallback)
        &#8595;
Policy Layer (Error Handling &amp; Security)
        &#8595;
MCP Tools (External Services)
        &#8593;
Observability (Metrics, Logs, Health)
</code></code></pre><p>Each layer has a distinct responsibility. When these boundaries blur, complexity compounds. This article focuses on why each layer matters and how to design them deliberately.</p><div><hr></div><h2>Why Tool Integration Breaks Down at Scale</h2><p>Early-stage MCP systems often feel deceptively simple. A tool call succeeds, the agent responds, and everything appears to work.</p><p>But as more tools are added, systems cross an invisible threshold where problems stop being local and start being <strong>systemic</strong>.</p><p>Latency spikes without a clear cause. Tool errors propagate in unexpected ways. Agents behave inconsistently depending on which tools respond first&#8212;or at all.</p><p>This breakdown usually comes from three root causes:</p><ul><li><p><strong>Tools are treated as synchronous function calls</strong> rather than distributed dependencies</p></li><li><p><strong>Failure is assumed to be rare</strong> instead of routine</p></li><li><p><strong>Operational concerns are deferred</strong> in favor of speed</p></li></ul><p>Once these assumptions are baked into the system, they&#8217;re difficult to unwind. Thoughtful integration starts by rejecting them early.</p><div><hr></div><h3>The Complexity Cliff</h3><p>Small systems tolerate loose structure. As tool count grows, the number of possible tool combinations grows exponentially. Without architectural discipline:</p><ul><li><p><strong>Discovery becomes chaotic</strong> &#8212; &#8220;Which tool does what?&#8221; becomes tribal knowledge</p></li><li><p><strong>Error handling diverges</strong> &#8212; Each tool fails differently, with ad-hoc recovery</p></li><li><p><strong>Observability gaps widen</strong> &#8212; You can&#8217;t tell which tool is slow or why</p></li><li><p><strong>Security becomes patchwork</strong> &#8212; Credentials and permissions are managed inconsistently</p></li></ul><p>The solution isn&#8217;t adding more coordination logic. 
It&#8217;s designing <strong>clear boundaries</strong> from the start.</p><div><hr></div><h2>Separation of Concerns Is a Strategic Choice</h2><p>Keeping MCP tooling separate from agent logic isn&#8217;t just about cleanliness&#8212;it&#8217;s a long-term strategy.</p><p>Agents should reason about intent and outcomes. Tooling layers should handle connectivity, protocols, retries, and fallbacks. When those responsibilities blur, every new tool increases cognitive load across the entire system.</p><p>Well-designed systems introduce a clear boundary:</p><ul><li><p>A <strong>tool registry</strong> that knows what tools exist and what they can do</p></li><li><p>An <strong>execution layer</strong> responsible for invocation and error handling</p></li><li><p><strong>Protocol abstractions</strong> that shield agents from MCP specifics</p></li></ul><p>This separation creates leverage. Teams can evolve tools independently, test them in isolation, and reason about failures without dragging agent behavior into every discussion.</p><div><hr></div><h3>Tool Registry Pattern</h3><p><strong>Conceptual flow:</strong></p><pre><code><code>Agent &#8594; Tool Registry &#8594; Executor &#8594; Retry Policy &#8594; Tool
                           &#8595;
                      Failure Handling
</code></code></pre><p><strong>Algorithm (language-agnostic):</strong></p><pre><code><code>FUNCTION executeTool(toolId, input, context):
  executor = registry.lookup(toolId)
  IF executor is NULL:
    RETURN { success: false, error: "Tool not found" }

  RETURN executeWithRetry(executor, input, context)

FUNCTION executeWithRetry(executor, input, context, maxAttempts = 3):
  FOR attempt FROM 1 TO maxAttempts:
    TRY:
      result = executor.execute(input, context)
      RETURN { success: true, data: result }
    CATCH error:
      IF attempt == maxAttempts OR NOT isRetryable(error):
        RETURN { success: false, error: error.message }

      delay = 2^(attempt - 1) * 1000
      WAIT(delay)

FUNCTION isRetryable(error):
  RETURN error.type IN [TIMEOUT, RATE_LIMIT]
         OR error.statusCode &gt;= 500
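
// Worked example of the backoff above with the default maxAttempts = 3:
//   attempt 1 fails (retryable) &#8594; wait 2^0 * 1000 = 1000 ms
//   attempt 2 fails (retryable) &#8594; wait 2^1 * 1000 = 2000 ms
//   attempt 3 fails             &#8594; return { success: false, ... }, no wait
// Worst case adds 3000 ms of waiting on top of the three execution times.
// A non-retryable error (for example a 404 status) returns immediately.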
</code></code></pre><div><hr></div><h3>Benefits of Separation</h3><p><strong>For agent logic:</strong></p><ul><li><p>Agents focus on reasoning and decision-making</p></li><li><p>Tool failures don&#8217;t leak into agent state</p></li><li><p>Agents can be tested without real tools</p></li></ul><p><strong>For tool management:</strong></p><ul><li><p>Tools evolve independently</p></li><li><p>Retry, timeout, and fallback behavior is centralized</p></li><li><p>Metrics and logging are consistent</p></li></ul><p><strong>For operations:</strong></p><ul><li><p>Tool health is monitored separately from agent health</p></li><li><p>Deploying tools doesn&#8217;t require redeploying agents</p></li><li><p>Tool-level incidents are isolated and debuggable</p></li></ul><div><hr></div><h2>Architectural Principles That Matter</h2><h3>1. Explicit Over Implicit</h3><p>Every dependency, failure mode, and performance characteristic should be explicit and discoverable.</p><p><strong>Anti-pattern:</strong></p><pre><code><code>result = httpClient.get("https://api.example.com/search?q=" + query)
</code></code></pre><p><strong>Better:</strong></p><pre><code><code>result = toolRegistry.execute("search", { query })
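
// Sketch (assumption): the registration that makes the call above explicit.
// register, its field names, and the "search-cache" fallback are hypothetical.
toolRegistry.register("search", {
  endpoint: "https://api.example.com/search",
  timeoutMs: 5000,
  retryOn: [TIMEOUT, RATE_LIMIT],
  fallback: "search-cache"
})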
</code></code></pre><div><hr></div><h3>2. Assume Failure, Design for Degradation</h3><p>Distributed systems fail in partial, unpredictable ways. Your architecture should make degradation explicit and graceful.</p><p>Questions worth answering <em>before</em> production:</p><ul><li><p>If this tool is slow, what happens?</p></li><li><p>If this tool returns partial data, is that acceptable?</p></li><li><p>If this tool is down, what&#8217;s the fallback?</p></li><li><p>Should the agent be aware of the degradation?</p></li></ul><div><hr></div><h3>3. Observability Is Not Optional</h3><p>You can&#8217;t improve what you can&#8217;t measure. Every tool call should be:</p><ul><li><p>Logged with correlation IDs</p></li><li><p>Metered for latency and error rates</p></li><li><p>Health-checked continuously</p></li></ul><div><hr></div><h3>4. Security Boundaries Are Architectural</h3><p>Tools have different trust levels and data sensitivity. These boundaries belong in architecture, not ad-hoc application code.</p><p>Key questions:</p><ul><li><p>Which tools can access user data?</p></li><li><p>Which tools can make external network calls?</p></li><li><p>How are credentials rotated?</p></li><li><p>What audit trail exists for tool usage?</p></li></ul><div><hr></div><h2>What Makes MCP Integration Different</h2><p>Unlike traditional API integration, MCP tools operate in a dynamic, agent-driven environment where:</p><ol><li><p>Tools are chosen at runtime</p></li><li><p>Tool combinations vary by context</p></li><li><p>Failure modes are compositional</p></li><li><p>Performance costs are cumulative</p></li></ol><p>Patterns that work for static APIs often collapse here. 
MCP systems must treat tools as <strong>first-class, runtime-discoverable components with explicit contracts</strong>.</p><div><hr></div><h2>Foundation Checklist</h2><p>Before moving on to Part 2, your MCP system should have:</p><ul><li><p>Clear layer boundaries between agents and tools</p></li><li><p>A tool registry with capability metadata</p></li><li><p>A consistent execution wrapper for retries and timeouts</p></li><li><p>Explicit failure contracts</p></li><li><p>Observability at tool boundaries</p></li><li><p>A defined security and credential model</p></li></ul><div><hr></div><h2>Coming Next: Part 2 &#8212; Resilience &amp; Runtime Behavior</h2><p>In Part 2, we&#8217;ll cover:</p><ul><li><p>Graceful degradation strategies</p></li><li><p>Lazy loading trade-offs</p></li><li><p>Why statelessness matters</p></li><li><p>Observability patterns that scale with tool count</p></li></ul><p><em>Link coming soon.</em></p><div><hr></div><h2>Reflection</h2><p>MCP tool integration isn&#8217;t about adding capabilities to agents.</p><p>It&#8217;s about building infrastructure that earns trust over time.</p><p>The systems that last aren&#8217;t the ones with the most tools&#8212;they&#8217;re the ones with clear boundaries, honest assumptions, and disciplined evolution.</p><p>Start with strong foundations. 
The rest follows naturally.</p>]]></content:encoded></item><item><title><![CDATA[3 Reasons Organizations Fail in AI Initiatives (And How to Avoid Them)]]></title><description><![CDATA[Drawing from my two decades in the tech industry, I&#8217;ve witnessed firsthand the transformative potential of Artificial Intelligence (AI).]]></description><link>https://puneetghanshani.substack.com/p/reasons-organizations-fail</link><guid isPermaLink="false">https://puneetghanshani.substack.com/p/reasons-organizations-fail</guid><dc:creator><![CDATA[Puneet Ghanshani]]></dc:creator><pubDate>Fri, 07 Mar 2025 00:00:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/ed0d035b-64fc-48eb-a1d9-1cd2a9e290bd_1080x1350.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Drawing from my two decades in the tech industry, I&#8217;ve witnessed firsthand the transformative potential of Artificial Intelligence (AI). Yet, I&#8217;ve also observed numerous AI initiatives falter, not due to technological shortcomings, but because of strategic missteps. Understanding these pitfalls is crucial for steering AI projects toward success.</p><h2>Misalignment with Business Objectives</h2><p>I was part of a team that developed an advanced AI-driven recommendation system. Technically, it was a marvel, but it failed to resonate with end users. The reason? The project&#8217;s goal was simply to produce recommendations, while the business needed to serve the long tail of users (where there wasn&#8217;t much data) rather than power users. 
</p><p><strong>Common Pitfalls:</strong></p><ul><li><p>Pursuing AI for novelty&#8217;s sake without a clear business problem can lead to solutions in search of problems, wasting resources.</p></li><li><p>Without understanding how AI will drive efficiency, reduce costs, or boost revenue, projects can become directionless.</p></li><li><p>Selecting models or data that don&#8217;t align with the business context can result in ineffective solutions.</p></li></ul><p><strong>How to Fix It:</strong></p><ul><li><p><strong>Define Clear Business Goals:</strong> Before venturing into AI, articulate the specific challenges you aim to address. For instance, are you looking to reduce customer churn, optimize supply chain logistics, or enhance product recommendations? By identifying concrete objectives, you ensure that the AI initiative has a targeted purpose and measurable outcomes.</p></li><li><p><strong>Assess ROI Before Investing:</strong> AI projects require substantial investments of time, money, and talent. Conducting a thorough cost-benefit analysis helps determine the potential return on investment, considering financial returns, operational efficiency, customer satisfaction, and market positioning.</p></li><li><p><strong>Choose the Right AI Model and Data:</strong> Aligning the AI model and data with your business needs is paramount. For example, if your goal is to analyze customer sentiment from social media, Natural Language Processing (NLP) models are appropriate. For inventory management optimization, predictive analytics models might be more suitable. Ensuring that the chosen model fits the problem context and that enough data is available increases the likelihood of success.</p></li></ul><h2>Insufficient Data Quality and Quantity</h2><p>In one project, we developed an AI system that, during testing, showed promising results. However, one year after deployment, its performance declined. 
The culprit was inadequate data quality, with no investment after go-live to maintain that quality or retrain the model.</p><p><strong>Common Pitfalls:</strong></p><ul><li><p>Flawed data can lead to inaccurate models that perpetuate existing biases, resulting in decisions that may harm the business or its stakeholders.</p></li><li><p>Disorganized data hampers the training of effective AI models, making it challenging to extract meaningful insights.</p></li><li><p>Without frameworks to maintain data quality, datasets can become unreliable over time, leading to erosion of trust in AI systems.</p></li></ul><p><strong>How to Fix It:</strong></p><ul><li><p><strong>Ensure Data Quality:</strong> Implement robust data governance policies, including regular data audits, cleansing processes to rectify inaccuracies, and protocols to handle missing values. High-quality data serves as the foundation for reliable AI models.</p></li><li><p><strong>Secure Sufficient Data Quantity:</strong> AI models thrive on large datasets that capture various scenarios and nuances. Investing in comprehensive data collection strategies, and considering data augmentation techniques or synthetic data generation when real-world data is scarce, can enhance model performance.</p></li><li><p><strong>Keep Data Updated:</strong> The dynamic nature of business environments means that data can quickly become outdated. Establishing automated data pipelines ensures continuous integration of new data, allowing AI models to adapt to evolving patterns.</p></li></ul><h2>Lack of Specialized AI Expertise</h2><p>I recall a scenario where a company invested heavily in AI but lacked the in-house expertise to guide the project. 
This oversight meant expected outcomes were not met, and the team spent its time catching up with the pace of technology that a vendor had installed as an accelerator, one that wasn&#8217;t the best fit!</p><p><strong>Common Pitfalls:</strong></p><ul><li><p>Without customization, generic tools may not address unique business challenges, leading to suboptimal performance.</p></li><li><p>A lack of skilled professionals can hinder the development, deployment, and maintenance of AI solutions, resulting in delays and increased costs.</p></li><li><p>Without guidelines, AI initiatives may face ethical dilemmas and operational inconsistencies, potentially leading to reputational damage and regulatory penalties.</p></li></ul><p><strong>How to Fix It:</strong></p><ul><li><p><strong>Invest in AI Talent Development:</strong> Building a team with the requisite AI skills is crucial. This can involve hiring experienced data scientists, providing training programs for existing staff, and fostering a culture that encourages continuous learning in AI and machine learning domains.</p></li><li><p><strong>Collaborate with Experts:</strong> Partnering with external consultants or AI vendors can provide the necessary expertise and accelerate implementation. These collaborations can also facilitate knowledge transfer, empowering internal teams to manage AI solutions independently in the future.</p></li><li><p><strong>Establish AI Governance and Ethics Policies:</strong> Developing a governance framework ensures that AI initiatives align with organizational values and regulatory requirements. This includes setting up ethics committees, defining accountability structures, and implementing monitoring mechanisms to oversee AI deployments responsibly.</p></li></ul><p>Reflecting on my journey, I&#8217;ve learned that embarking on an AI initiative requires more than just technological investment; it demands strategic alignment, robust data practices, and specialized expertise. 
By addressing these areas, organizations can transform potential pitfalls into stepping stones toward success.</p><p><em>What challenges have you faced in your AI endeavors?</em></p>]]></content:encoded></item><item><title><![CDATA[How to Breathe Life into Your Presentations]]></title><description><![CDATA[Ever sat through a presentation where every slide felt like a mini eulogy?]]></description><link>https://puneetghanshani.substack.com/p/breathe-life-into-presentations</link><guid isPermaLink="false">https://puneetghanshani.substack.com/p/breathe-life-into-presentations</guid><dc:creator><![CDATA[Puneet Ghanshani]]></dc:creator><pubDate>Wed, 05 Mar 2025 00:00:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/7bc84913-c1b1-4d1b-b841-de7210a83471_1080x1350.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Ever sat through a presentation where every slide felt like a mini eulogy? We&#8217;ve all been there&#8212;endless text-heavy slides draining energy and engagement, turning what should be a dynamic discussion into a tedious monologue.</p><p>Recently, I sat through a presentation packed with hundreds of slides, and it got me thinking: How much is the audience really absorbing? Is the goal to cram in as much information as possible, or to craft a story that truly sticks?</p><p>Quality over quantity. Instead of overwhelming your audience, focus on a compelling narrative and key messages that resonate. Less isn&#8217;t just more&#8212;it&#8217;s memorable.</p><p>Here are a few strategies to overcome &#8220;death by slides&#8221;:</p><ul><li><p><strong>Tell a Story</strong>: Shift from bullet points to storytelling. Engage your audience with a narrative that connects the dots.</p></li><li><p><strong>Visual Impact</strong>: Use high-quality images, infographics, or minimalistic designs. 
A picture often communicates more than a wall of text.</p></li><li><p><strong>Interactivity</strong>: Ask questions, include polls, or encourage discussions. Make your audience part of the conversation.</p></li><li><p><strong>Keep it Simple</strong>: Focus on key messages. Simplicity and clarity can be far more persuasive than overwhelming details.</p></li><li><p><strong>Practice Delivery</strong>: A dynamic speaker can transform even the simplest slide into an engaging experience. Your energy is contagious!</p></li></ul><p>Transform your next presentation into an experience that inspires, motivates, and, most importantly, keeps your audience awake. Let&#8217;s break the cycle of boredom and create slides that spark conversations and drive action.</p>]]></content:encoded></item><item><title><![CDATA[Striking a Balance Between Experience and Cost in the Age of Cloud Computing]]></title><description><![CDATA[Start writing today.]]></description><link>https://puneetghanshani.substack.com/p/cost-and-experience</link><guid isPermaLink="false">https://puneetghanshani.substack.com/p/cost-and-experience</guid><dc:creator><![CDATA[Puneet Ghanshani]]></dc:creator><pubDate>Fri, 28 Feb 2025 00:00:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/96f775aa-5800-4115-b8a5-7dfdb35a20fa_461x263.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In a previous article, I explored managing cloud costs from a technological perspective, and based on your feedback, today we&#8217;re diving deeper into balancing performance and budget. Many IT professionals have experienced the shift from fixed-capacity infrastructure to cloud-driven architectures. The move to the cloud has brought incredible benefits such as scalability, flexibility, and automation, yet it has also introduced a level of cost unpredictability that did not exist before.</p><p>Imagine an e-commerce website on Azure that faces significant traffic surges during peak shopping seasons. To maintain performance, the site must autoscale across various tiers. However, if every layer scales independently without coordination, the result can be uncontrolled cloud expenses. 
This begs the question: How can we achieve cost efficiency without sacrificing user experience?</p><h2>Understanding Demand-Based Scaling</h2><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KxBn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7e13d17-788a-4de0-b53d-81a00c123124_461x263.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KxBn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7e13d17-788a-4de0-b53d-81a00c123124_461x263.png 424w, https://substackcdn.com/image/fetch/$s_!KxBn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7e13d17-788a-4de0-b53d-81a00c123124_461x263.png 848w, https://substackcdn.com/image/fetch/$s_!KxBn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7e13d17-788a-4de0-b53d-81a00c123124_461x263.png 1272w, https://substackcdn.com/image/fetch/$s_!KxBn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7e13d17-788a-4de0-b53d-81a00c123124_461x263.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KxBn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7e13d17-788a-4de0-b53d-81a00c123124_461x263.png" width="461" height="263" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c7e13d17-788a-4de0-b53d-81a00c123124_461x263.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:263,&quot;width&quot;:461,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:26965,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://puneetghanshani.substack.com/i/158109550?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7e13d17-788a-4de0-b53d-81a00c123124_461x263.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!KxBn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7e13d17-788a-4de0-b53d-81a00c123124_461x263.png 424w, https://substackcdn.com/image/fetch/$s_!KxBn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7e13d17-788a-4de0-b53d-81a00c123124_461x263.png 848w, https://substackcdn.com/image/fetch/$s_!KxBn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7e13d17-788a-4de0-b53d-81a00c123124_461x263.png 1272w, https://substackcdn.com/image/fetch/$s_!KxBn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc7e13d17-788a-4de0-b53d-81a00c123124_461x263.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>The diagram above illustrates the trade-off inherent in demand-based scaling. The horizontal line represents the threshold of &#8220;acceptable user experience,&#8221; while the bell-shaped curve shows typical demand over time. As demand increases beyond the acceptable threshold, additional resources must be provisioned, incurring extra costs. Ideally, you want to scale enough to preserve user experience during peak loads, yet avoid over-scaling that leads to runaway expenses.</p><p>Striking this balance is at the heart of cloud cost management.</p><h2>On-Premises vs. Cloud: A Cost Management Perspective</h2><p>Before the advent of cloud computing, managing IT costs was straightforward but inflexible. Organizations procured infrastructure upfront, with budgets designed around fixed-capacity purchases. 
This approach ensured predictable expenses but came at the cost of agility. IT teams would often plan for peak loads, meaning they had to pay for excess capacity during normal operations. The procurement process involved multiple levels of approval, instilling financial discipline yet making it difficult to scale quickly when unexpected demand occurred.</p><p>With the cloud, the paradigm shifted to on-demand scalability and a transformation from capital expenditures (CapEx) to operational expenditures (OpEx). This allowed businesses to scale faster and more dynamically.</p><p>However, the ease of scaling also means that if autoscaling is left unchecked&#8212;especially when multiple application tiers scale independently&#8212;costs can quickly spiral, undermining the benefits of improved performance.</p><h2>Smart Scaling: Coordinated Autoscaling for Optimal Performance</h2><p>There is a common misconception that autoscaling automatically resolves performance challenges. In reality, an uncoordinated scaling approach can lead to significant cost spikes. Consider a news website that suddenly experiences a viral surge in traffic. Typically, web servers might scale first, followed by application services, and then databases. Without coordination, this cascading scale-out results in an exponential rise in expenses.</p><p>A more effective strategy involves coordinating autoscaling across tiers. For example, web and application layers should work in tandem so that additional resources are introduced only when absolutely necessary. Implementing caching mechanisms such as Azure Front Door or Azure Cache for Redis can alleviate the load on backend services, while database autoscaling should be optimized using tools like Azure SQL Hyperscale or Cosmos DB autoscale. 
This ensures that compute power is added precisely when needed, rather than as a reflexive response to a traffic surge.</p><h2>API Cost Management: Taming Request-Driven Expenses</h2><p>APIs often serve as a hidden driver of cloud costs. Take, for example, a financial services application offering real-time stock tracking. Such an application may process millions of requests per second, which can inadvertently trigger backend resources to scale beyond what is required. This uncontrolled scaling can lead to enormous cloud bills, even if the performance improvements are marginal.</p><p>The solution lies in implementing measures to control API usage. Using Azure API Management (APIM) to enforce rate limits ensures that free-tier users or unexpected traffic bursts do not overwhelm the system. Additionally, caching frequently requested API responses helps reduce the strain on backend systems. Adopting quota-based API pricing models further aligns usage with cost, ensuring that the expense grows in proportion to actual demand rather than uncontrolled scaling.</p><h2>Strategic Workload Placement with Azure Services</h2><p>Choosing the right Azure service for each workload is a critical element of cost control. When workloads are assigned to services that best fit their usage patterns, businesses avoid over-provisioning and unnecessary expenditure. For example, web applications benefit from the managed hosting environment of Azure App Service, while containerized applications are well suited to Azure Kubernetes Service (AKS). Event-driven workloads can leverage Azure Functions, where, on the Consumption plan, costs are based on the number of executions and their execution time rather than on continuous resource allocation.</p><p>Similarly, for data-intensive processes, on-demand solutions like Azure Synapse Analytics or Microsoft Fabric enable payment only for the compute power in use at any given moment. 
Storage solutions such as Azure Blob Storage offer tiered pricing, and lifecycle management policies can automatically move infrequently accessed data to lower-cost tiers. These strategic decisions play a pivotal role in preventing cost spirals and ensuring that each workload is both performant and cost-effective.</p><h2>FinOps: Integrating IT, Finance, and Operations</h2><p>Managing cloud costs effectively is no longer solely an IT challenge&#8212;it requires a harmonious integration of engineering, finance, and business insights. The adoption of FinOps practices has emerged as a robust approach to achieving this balance. By establishing accountability for cloud spending through chargeback or showback models, organizations can ensure that every team understands the financial impact of their resource usage. Tools like Azure Cost Management provide real-time budget thresholds and alerts, enabling proactive adjustments to prevent overspending.</p><p>Furthermore, optimizing resource commitments with options like Reserved Instances and Spot VMs can reduce long-term costs, making the overall cloud strategy financially sustainable. This integrated approach transforms cloud cost management into a collaborative effort, aligning technical efficiency with financial prudence.</p><h2>Continuous Optimization: Cultivating a Cost-Conscious Culture</h2><p>Effective cloud cost management is an ongoing journey rather than a one-time fix. It requires continuous evaluation of spending patterns, workload placements, and scaling policies. By fostering a culture where cost awareness is part of everyday operations, IT teams can proactively audit expenses and optimize underutilized resources. Emphasizing the use of serverless and managed services, where applicable, further minimizes the need for dedicated VMs and drives cost efficiency. 
Educating development teams about the financial implications of their architectural decisions ensures that everyone is aligned in the pursuit of sustainable cloud operations.</p><h2>The Takeaway: Architecting for Experience and Cost Efficiency</h2><p>Balancing high-performing digital experiences with cost-effective cloud strategies demands a deliberate, strategic approach. Through coordinated autoscaling, intelligent API management, thoughtful workload placement, and robust FinOps practices, organizations can harness the full potential of cloud computing without letting expenses spiral out of control.</p><p>Key strategies include:</p><ul><li><p>Coordinated autoscaling to prevent exponential cost increases</p></li><li><p>Controlled API usage to manage request-driven expenses</p></li><li><p>Strategic placement of workloads on the most appropriate Azure services</p></li><li><p>A collaborative FinOps approach that integrates IT, finance, and business operations</p></li></ul><p>What strategies have helped you optimize cloud costs in your organization? Let&#8217;s discuss and learn from each other&#8217;s experiences.</p>]]></content:encoded></item><item><title><![CDATA[Balancing Performance and Cost in Cloud Architecture]]></title><description><![CDATA[Cloud adoption brings immense scalability and flexibility, but managing costs while maintaining optimal performance is a delicate balance.]]></description><link>https://puneetghanshani.substack.com/p/cloud-costs</link><guid isPermaLink="false">https://puneetghanshani.substack.com/p/cloud-costs</guid><dc:creator><![CDATA[Puneet Ghanshani]]></dc:creator><pubDate>Thu, 13 Feb 2025 00:00:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/8db73fef-9657-4695-8d2c-6f1cc835c643_1080x1350.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Cloud adoption brings immense scalability and flexibility, but managing costs while maintaining optimal performance is a delicate balance. 
Without strategic planning, businesses may either overspend on underutilized resources or compromise performance by cutting costs too aggressively. The key is <strong>right-sizing infrastructure based on actual workloads and implementing cost-efficient scaling mechanisms</strong>.</p><div><hr></div><h2><strong>Key Strategies for Cost-Effective Cloud Performance</strong></h2><h3><strong>Optimize Resource Allocation with Load Testing</strong></h3><p>To efficiently allocate cloud resources, businesses must understand actual workload demands. This is where <strong>nominal and peak load testing</strong> come into play.</p><ul><li><p><strong>Nominal Load Testing</strong> assesses performance under <strong>normal traffic conditions</strong> (e.g., 500&#8211;1,000 Requests Per Second (RPS) for an e-commerce platform during regular hours).</p></li><li><p><strong>Peak Load Testing</strong> simulates <strong>traffic surges</strong> to ensure the system can handle unexpected spikes (e.g., 5,000 RPS during a Black Friday sale).</p></li></ul><p>By analyzing these tests, organizations can set an <strong>effective baseline for expected load (Events Per Second - EPS)</strong>. 
This ensures that resources are neither <strong>over-provisioned (leading to unnecessary costs)</strong> nor <strong>under-provisioned (causing system failures during traffic surges)</strong>.</p><h4><strong>Re-baselining EPS for Load Adjustments</strong></h4><p>As business demand fluctuates, it&#8217;s essential to <strong>re-baseline EPS</strong> periodically:</p><ul><li><p>If nominal load decreases over time (e.g., a drop from 1,000 RPS to 600 RPS due to seasonal variation), resources should be <strong>scaled down</strong> to avoid wastage.</p></li><li><p>If traffic steadily increases (e.g., a startup growing from 200 RPS to 1,500 RPS), <strong>scaling up ahead of time</strong> prevents bottlenecks.</p></li></ul><div><hr></div><h3><strong>Auto-Scale and Load Balance for Efficiency</strong></h3><p>Modern cloud platforms, like <strong>Azure</strong>, offer <strong>auto-scaling</strong> to dynamically allocate resources based on real-time demand:</p><ul><li><p><strong>Horizontal Scaling</strong>: Adds/removes instances based on load.</p></li><li><p><strong>Vertical Scaling</strong>: Adjusts CPU and memory allocations dynamically.</p></li></ul><p>Coupled with <strong>load balancing</strong>, this ensures that:</p><ul><li><p>Resources are efficiently used.</p></li><li><p>No single instance is overwhelmed.</p></li><li><p>Scaling is automatic, reducing both costs and manual interventions.</p></li></ul><div><hr></div><h3><strong>Rate Limits and Throttling to Prevent Noisy Neighbors</strong></h3><p>One major challenge in shared cloud environments is <strong>noisy neighbors</strong>&#8212;where excessive API requests from one service impact other applications sharing the same infrastructure.</p><p>To prevent incorrect consumption patterns:</p><ul><li><p><strong>Rate limiting</strong> ensures that services do not exceed pre-defined thresholds (e.g., limiting API calls to 100 RPS per user to prevent abuse).</p></li><li><p><strong>Throttling</strong> slows down or rejects 
requests beyond a certain limit, ensuring fair resource distribution.</p></li></ul><p>For example, a payment gateway might limit each user to <strong>10 transactions per second</strong> to prevent system overload while ensuring priority transactions go through.</p><div><hr></div><h3><strong>Use Spot and Reserved Instances for Cost Savings</strong></h3><p>Cloud providers offer different pricing models to optimize costs:</p><ul><li><p><strong>Spot Instances</strong> allow businesses to use spare compute capacity at <strong>discounted rates</strong>&#8212;ideal for batch processing and background jobs.</p></li><li><p><strong>Reserved Instances</strong> offer significant discounts for predictable, long-term workloads.</p></li></ul><p>A hybrid strategy blending <strong>on-demand, reserved, and spot instances</strong> optimizes both <strong>cost and availability</strong>.</p><div><hr></div><h3><strong>Leverage Serverless Computing and Managed Databases</strong></h3><p>For workloads with fluctuating demand, <strong>serverless computing</strong> offers a cost-effective alternative:</p><ul><li><p><strong>Azure Functions</strong> scale automatically based on event triggers.</p></li><li><p><strong>Managed Databases</strong> (like Azure SQL Database) adjust performance dynamically.</p></li></ul><p>Since serverless models <strong>only charge for execution time</strong>, businesses <strong>avoid paying for idle resources</strong>.</p><div><hr></div><h3><strong>Monitor Firewall Activity to Prevent Costly Attacks</strong></h3><p>Cloud firewalls and Web Application Firewalls (WAFs) protect against <strong>malicious traffic spikes</strong>, which can:</p><ol><li><p><strong>Compromise security</strong> (e.g., DDoS attacks flooding systems with millions of requests).</p></li><li><p><strong>Increase cloud costs</strong> due to excessive bandwidth and compute consumption.</p></li></ol><p>To mitigate such risks:</p><ul><li><p><strong>Enable automated threat detection</strong> to block 
unauthorized traffic before it reaches cloud resources.</p></li><li><p><strong>Monitor logs for unusual patterns</strong> (e.g., a sudden jump from 1,000 to 50,000 RPS from unknown IPs).</p></li><li><p><strong>Use AI-driven security rules</strong> to adapt to evolving threats.</p></li></ul><p>By proactively <strong>monitoring and blocking malicious activity</strong>, businesses <strong>avoid unnecessary costs while ensuring security</strong>.</p><div><hr></div><h3><strong>Implement Tiered Storage and Lifecycle Policies</strong></h3><p>Storage costs in cloud environments can be optimized using:</p><ul><li><p><strong>Hot storage</strong> for frequently accessed data.</p></li><li><p><strong>Cold storage</strong> (e.g., the Cool, Cold, or Archive tiers of Azure Blob Storage) for rarely accessed or archived data.</p></li><li><p><strong>Lifecycle policies</strong> to automatically transition data between tiers based on access frequency.</p></li></ul><p>For example, <strong>customer invoices older than 6 months</strong> can be <strong>moved to cold storage</strong>, reducing costs without affecting performance.</p><div><hr></div><h3><strong>Optimize Cost with Cloud Monitoring Tools</strong></h3><p>Regular monitoring helps businesses <strong>avoid surprises in billing</strong>:</p><ul><li><p><strong>Azure Cost Management</strong> provides real-time cost analytics.</p></li><li><p><strong>Budget alerts</strong> notify teams when expenses exceed thresholds.</p></li><li><p><strong>Anomaly detection</strong> flags unexpected resource spikes.</p></li></ul><p>By tracking cloud usage trends, companies can <strong>identify opportunities to downscale or optimize resources</strong>, ensuring efficient spending.</p><div><hr></div><h3><strong>Containerization with Kubernetes and Docker</strong></h3><p>Containerization helps businesses <strong>maximize resource efficiency</strong>:</p><ul><li><p><strong>Kubernetes (K8s)</strong> automates deployment and scaling of microservices.</p></li><li><p><strong>Docker</strong> ensures lightweight, 
portable applications that optimize infrastructure use.</p></li></ul><p>For example, a <strong>web app running on Kubernetes</strong> can scale <strong>only its API services</strong> during peak load rather than scaling the entire application.</p><div><hr></div><h3><strong>Minimize Data Transfer Costs with CDNs</strong></h3><p>Data transfer costs can escalate quickly in cloud environments.<br>To <strong>reduce bandwidth expenses</strong>:</p><ul><li><p><strong>CDNs (Content Delivery Networks)</strong> cache frequently accessed data close to end-users, reducing the need for repeated requests to the main server.</p></li><li><p><strong>Optimize inter-region data transfers</strong> to avoid unnecessary cross-datacenter traffic.</p></li></ul><p>For example, an e-learning platform delivering video content globally can <strong>use Azure CDN</strong> to serve videos from edge locations rather than streaming directly from its central storage.</p><div><hr></div><h2><strong>Final Thoughts</strong></h2><p>Balancing cost and performance in cloud architecture is a continuous process of <strong>right-sizing resources, implementing automation, monitoring security, and optimizing data management</strong>. 
By leveraging <strong>load testing, auto-scaling, serverless computing, rate limiting, firewall monitoring, and cost management tools</strong>, organizations can <strong>achieve scalability, security, and efficiency without unnecessary expenses</strong>.</p><h3><strong>What&#8217;s Next?</strong></h3><ul><li><p><strong>Analyze your current cloud resource usage</strong> and identify inefficiencies.</p></li><li><p><strong>Implement auto-scaling and load balancing</strong> to dynamically adjust costs.</p></li><li><p><strong>Use security monitoring</strong> to prevent attacks that drive up cloud bills.</p></li><li><p><strong>Regularly re-baseline EPS</strong> to align infrastructure with actual demand.</p></li></ul><p><strong>Are your cloud costs under control?</strong> Now is the time to optimize!</p>]]></content:encoded></item><item><title><![CDATA[Tiny Steps, Big Swings: Coaching Tennis]]></title><description><![CDATA[Guiding children under 10 in tennis is like building a brand-new structure from the ground up: you need a clear plan, the right environment, and step-by-step execution.]]></description><link>https://puneetghanshani.substack.com/p/itf-coach-certification</link><guid isPermaLink="false">https://puneetghanshani.substack.com/p/itf-coach-certification</guid><dc:creator><![CDATA[Puneet Ghanshani]]></dc:creator><pubDate>Wed, 15 Jan 2025 00:00:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/8bda9c9f-cad0-424a-8e5e-95ba9e5a6ea2_1080x1350.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Guiding children under 10 in tennis is like building a brand-new structure from the ground up: you need a clear plan, the right environment, and step-by-step execution. When I first started as an ITF-certified Play Tennis coach, I was eager to share drills and exercises. But it quickly became clear that the psychology of learning is vital, especially for younger players whose focus can shift in a moment. 
This post outlines some of my key learnings.</p><p>It&#8217;s kind of like when we&#8217;re teaching kids to tie their shoes: we want them to see the steps and then try them out. The same principle applies on the tennis court. Rather than explaining the grip or footwork in a detailed speech, we demonstrate with short, clear actions. Show vs. tell becomes a game-changer when working with young learners who thrive on visual cues.</p><p>Fun should always be at the heart of our lessons for under-10 players. It&#8217;s crucial that kids get to experience the thrill of playing tennis right from the very first session, rather than being bogged down by too much technique. We can think of it as giving them a quick, hands-on preview so they see the basics in action. By emphasizing play early on, we can show them that tennis is accessible and exciting, laying the groundwork for deeper skill development later. This approach motivates kids to keep playing tennis for much longer.</p><p>We also know children can lose interest quickly if drills aren&#8217;t engaging. That&#8217;s why we should plan activities that blend competition (like mini-tournaments or timed challenges) with cooperation (like partner drills or team-based targeting games). This balance keeps the energy high and ensures they feel both challenged and supported.</p><p>Safety is equally important. An unsafe environment not only poses a risk to the child&#8217;s health but can also disrupt the entire class. Think of it as rushing into a big, untested change without any safeguards: we&#8217;d open the door to problems that can derail everyone&#8217;s progress. So, we should ensure proper net coverage, clear any clutter around the court, and keep an eye on the intensity of drills. This way, kids can remain confident in their abilities and we can teach more effectively.</p><p>Planning our sessions is key. We should not just show up with tennis balls and hope for the best. 
A successful lesson plan might include:</p><ul><li><p>A quick warm-up and icebreaker (like a playful run around the court).</p></li><li><p>A main skill focus, such as basic forehand or backhand techniques demonstrated visually.</p></li><li><p>Adjusted drills allowing kids to practice in pairs, reinforcing both skills and social engagement.</p></li><li><p>Properly supervised play to maintain safety and reduce the risk of injury.</p></li><li><p>A playful wrap-up that gives them a sense of achievement and motivation to come back.</p></li></ul><p>We should also remember some players might be left-handed. It&#8217;s almost like mirroring our instructions for those who need a slightly different approach. Showing them the correct grip and movement can save everyone a lot of confusion later on.</p><p>A lot of these learnings are also from training my son who is a tennis enthusiast and plays relentlessly in various clubs, and from following many professional tennis coaches. There&#8217;s much to observe, learn and adopt. 
I wouldn&#8217;t be where I am without the incredible guidance of the coaches at the Singapore Tennis Association.</p><p>I am looking forward to coaching and shaping players!</p>]]></content:encoded></item><item><title><![CDATA[Leveraging NLP in Data Analytics: Transforming Healthcare and Beyond]]></title><description><![CDATA[In a world where vast amounts of unstructured text data are continuously generated, the power of Natural Language Processing (NLP) has become indispensable.]]></description><link>https://puneetghanshani.substack.com/p/nlp-data-analytics</link><guid isPermaLink="false">https://puneetghanshani.substack.com/p/nlp-data-analytics</guid><dc:creator><![CDATA[Puneet Ghanshani]]></dc:creator><pubDate>Tue, 01 Oct 2024 00:00:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/f99d9a73-73b8-4b1a-8498-07ba7f0521e9_1080x1350.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In a world where vast amounts of unstructured text data are continuously generated, the power of Natural Language Processing (NLP) has become indispensable. From healthcare to banking and other businesses, NLP enables organizations to extract valuable insights from this data, leading to better decision-making and operational improvements. While this article focuses on healthcare to highlight significant advancements using NLP, the potential of these technologies stretches far beyond the medical field.</p><h2>How NLP Helps with Data Analytics</h2><p><strong>Analyzing Patient Feedback in Healthcare</strong></p><p>In healthcare, NLP is invaluable for processing patient feedback from surveys, social media, and online reviews, transforming it into actionable insights. For example, hospitals can use sentiment analysis to identify recurring issues, such as long wait times or communication challenges. 
By quantifying these concerns, healthcare providers can address specific pain points, enhance patient satisfaction, and track improvements over time.</p><p><strong>Mining Electronic Health Records (EHRs)</strong></p><p>One of the most transformative uses of NLP in healthcare is the extraction of valuable information from unstructured clinical notes and medical records. NLP can help hospitals:</p><ul><li><p><strong>Detect Health Risks Early</strong>: By identifying patterns in clinical notes, NLP can flag potential health risks before they escalate, enabling timely interventions.</p></li><li><p><strong>Evaluate Treatment Effectiveness</strong>: Analyzing patient outcomes in clinical records allows healthcare providers to assess the success of various treatments.</p></li><li><p><strong>Optimize Resources</strong>: Understanding patient needs and treatment patterns can guide better resource allocation, ensuring hospitals are staffed and equipped to meet demand.</p></li></ul><p><strong>Improving Clinical Decision Support</strong></p><p>NLP enhances clinical decision support by automating the summarization of patient histories from multiple sources. Instead of manually reviewing medical records, doctors can quickly access a comprehensive summary, aiding faster and more accurate decision-making. NLP also supports the analysis of medical literature, ensuring that clinicians have access to the most current and relevant research when making decisions.</p><p><strong>Enhancing Medical Coding</strong></p><p>Medical coding, traditionally a time-consuming process, can be automated with NLP. By extracting key information from clinical notes, NLP tools can assign accurate codes, reducing human error and improving billing accuracy, while also ensuring compliance with regulatory standards.</p><h3>Real-World Impact in Healthcare</h3><p>NLP&#8217;s impact on healthcare is significant. 
Consider a hospital network that has implemented NLP-driven data analytics:</p><ul><li><p><strong>Patient Satisfaction</strong>: Addressing recurring issues identified from patient feedback led to a 15% improvement in patient satisfaction scores.</p></li><li><p><strong>Early Intervention</strong>: NLP systems identified potential complications in 8% of patients before they became critical, improving outcomes and reducing costs.</p></li><li><p><strong>Operational Efficiency</strong>: Automated medical coding reduced errors by 30% and saved 40% of the time spent on coding.</p></li><li><p><strong>Resource Allocation</strong>: Insights derived from NLP analytics improved staff scheduling by 20% and reduced unnecessary tests by 10%.</p></li></ul><h3>Challenges and Considerations in Healthcare</h3><p>Implementing NLP in healthcare presents unique challenges:</p><ul><li><p><strong>Data Privacy</strong>: Ensuring compliance with patient confidentiality regulations such as HIPAA is critical.</p></li><li><p><strong>Integration</strong>: Integrating NLP tools with existing healthcare systems requires careful planning to ensure smooth operations.</p></li><li><p><strong>Accuracy</strong>: NLP models need continual refinement to handle complex medical terminology accurately.</p></li><li><p><strong>User Adoption</strong>: Proper training is essential for healthcare professionals to trust and use NLP-driven insights effectively.</p></li></ul><h2>NLP in Broader Data Analytics</h2><p>Beyond healthcare, NLP is revolutionizing how businesses analyze text data. 
The applications are diverse and wide-ranging:</p><ul><li><p><strong>Text Analysis and Classification</strong>: NLP helps businesses categorize large volumes of text data, uncovering patterns and trends across customer feedback, reviews, or social media posts.</p></li><li><p><strong>Sentiment Analysis</strong>: NLP tools can analyze customer satisfaction by gauging the emotional tone in reviews and social media conversations.</p></li><li><p><strong>Chatbots and Virtual Assistants</strong>: Powered by NLP, chatbots can handle customer queries, improve customer service, and enhance user engagement by providing quick, conversational support.</p></li><li><p><strong>Text Summarization</strong>: NLP algorithms condense lengthy reports, articles, or research papers into concise summaries, saving time for decision-makers and analysts.</p></li></ul><p>The possibilities are endless in sectors ranging from retail to finance, where NLP can be used for automating customer service, improving brand monitoring, and driving business intelligence.</p><h3>The Future of NLP in Data Analytics</h3><p>Looking ahead, NLP technology will continue to evolve and unlock even more potential. As NLP models become more sophisticated, they will:</p><ul><li><p>Provide deeper insights by better understanding complex linguistic nuances and cultural contexts.</p></li><li><p>Integrate seamlessly with other advanced technologies such as computer vision, making way for more comprehensive and multi-dimensional applications.</p></li><li><p>Offer faster and more accurate data analysis, making these tools more accessible and beneficial to organizations of all sizes.</p></li></ul><h2>Ethical Considerations in Using NLP for Data Analytics</h2><p>While NLP offers powerful benefits, it&#8217;s crucial to consider the ethical implications of its use. 
Here are the key ethical issues that must be addressed:</p><ul><li><p><strong>Privacy and Data Protection</strong>: With the increasing use of NLP to process sensitive data, privacy and data protection must be top priorities. Ensuring proper consent, anonymizing data, and implementing robust security measures are critical to maintaining trust and complying with regulations like GDPR and HIPAA.</p></li><li><p><strong>Bias and Fairness</strong>: NLP models can inadvertently perpetuate biases present in training data. It&#8217;s essential to use diverse datasets to train models and regularly audit them to ensure fair outcomes across all demographic groups, mitigating the risk of discrimination.</p></li><li><p><strong>Transparency and Explainability</strong>: The complexity of NLP models, especially deep learning-based models, can make it challenging to understand how decisions are made. Developing more interpretable models and providing clear documentation helps ensure that decisions can be explained and trusted.</p></li><li><p><strong>Consent and Ownership</strong>: When using publicly available text data, clear guidelines must be established to ensure ethical collection and use. Organizations should respect intellectual property rights and be transparent with users about how their data will be analyzed.</p></li><li><p><strong>Accountability</strong>: As NLP plays a bigger role in decision-making, establishing accountability is crucial. Governance frameworks and regular audits are necessary to ensure that NLP systems are used responsibly and fairly.</p></li></ul><h2>Leveraging NLP with Azure Services</h2><p>Azure provides an excellent suite of tools to leverage NLP in data analytics through its <strong>Azure AI Language Service</strong>. 
Key features of the service include:</p><ul><li><p><strong>Sentiment Analysis</strong> to gauge customer feedback.</p></li><li><p><strong>Key Phrase Extraction</strong> for identifying main points in text.</p></li><li><p><strong>Named Entity Recognition (NER)</strong> to categorize entities like people, places, and organizations.</p></li><li><p><strong>Language Detection</strong> to automatically identify languages in input text.</p></li></ul><p>By using Azure&#8217;s NLP capabilities, businesses can automate and improve their analysis of large text datasets, enhancing decision-making and operational efficiency across industries.</p><h2>Conclusion</h2><p>NLP is a powerful tool in data analytics, transforming the way we process and analyze text data. Whether in healthcare, business, or any other field, NLP can uncover valuable insights that drive efficiency, improve decision-making, and enhance customer experiences. However, as we unlock these capabilities, it&#8217;s essential to address the ethical considerations that come with it. By focusing on privacy, fairness, transparency, and accountability, we can harness the full potential of NLP while ensuring that it benefits all users responsibly. 
With platforms like Azure providing robust NLP services, the future of data analytics is bright, offering new opportunities to innovate and excel in any industry.</p>]]></content:encoded></item><item><title><![CDATA[Types of Chunking Mechanisms for RAG]]></title><description><![CDATA[Chunking is a critical component in Retrieval-Augmented Generation (RAG) systems, influencing efficiency, accuracy, and performance.]]></description><link>https://puneetghanshani.substack.com/p/chunking-approaches</link><guid isPermaLink="false">https://puneetghanshani.substack.com/p/chunking-approaches</guid><dc:creator><![CDATA[Puneet Ghanshani]]></dc:creator><pubDate>Tue, 10 Sep 2024 00:00:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/a229032b-ae56-4666-a43c-6e59bfd3a619_1080x1350.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Chunking is a critical component in Retrieval-Augmented Generation (RAG) systems, influencing efficiency, accuracy, and performance. Effective chunking enhances information retrieval, optimizing how language models generate responses. 
This article explores various chunking mechanisms, their ideal use cases, and best practices.</p><h2>Types of Chunking Mechanisms</h2><h3>Fixed-Size Chunking</h3><p>Fixed-size chunking divides text into uniformly sized segments based on a predefined number of characters, words, or tokens.</p><ul><li><p><strong>Retrieval Efficiency:</strong> High due to consistent chunk sizes.</p></li><li><p><strong>Best for:</strong> Simple data processing where speed is prioritized over contextual coherence.</p></li><li><p><strong>Industries &amp; Data Types:</strong></p><ul><li><p>Financial transactions and banking logs</p></li><li><p>Sensor data processing in IoT applications</p></li><li><p>Server logs and system monitoring data</p></li></ul></li><li><p><strong>Example Scenario:</strong> Processing large volumes of standardized reports or logs.</p></li></ul><p><strong>Effect of Chunk Size:</strong></p><ul><li><p>Smaller chunks (e.g., 100-200 tokens) increase granularity but may lose context.</p></li><li><p>Larger chunks (e.g., 500-1000 tokens) retain more context but may introduce irrelevant information.</p></li></ul><div><hr></div><h3>Semantic Chunking</h3><p>Semantic chunking segments text based on meaning rather than fixed sizes, ensuring that each chunk maintains contextual integrity. 
<a href="https://www.nltk.org/api/nltk.tokenize.html">NLTK</a>&#8217;s tokenizers can supply the sentence boundaries that semantic chunking builds on.</p><ul><li><p><strong>Retrieval Efficiency:</strong> Moderate to high, depending on complexity.</p></li><li><p><strong>Best for:</strong> Complex documents requiring high contextual accuracy.</p></li><li><p><strong>Industries &amp; Data Types:</strong></p><ul><li><p>Healthcare: Medical research papers and patient case studies</p></li><li><p>Legal: Contracts and compliance documentation</p></li><li><p>Scientific Research: White papers and journal articles</p></li></ul></li><li><p><strong>Example Scenario:</strong> Academic papers or technical documentation.</p></li></ul><p><strong>Effect of Chunk Size:</strong></p><ul><li><p>Larger semantic units improve context but may slow down retrieval.</p></li></ul><div><hr></div><h3>Recursive Chunking</h3><p>Recursive chunking progressively divides text into smaller segments while preserving meaningful units like sentences or phrases.</p><ul><li><p><strong>Retrieval Efficiency:</strong> Moderate, balancing granularity and context.</p></li><li><p><strong>Best for:</strong> Hierarchical documents such as legal texts.</p></li><li><p><strong>Industries &amp; Data Types:</strong></p><ul><li><p>Legal: Multi-section contracts and regulatory policies</p></li><li><p>Technical: API documentation with nested structures</p></li><li><p>Government: Policy papers and legislative texts</p></li></ul></li><li><p><strong>Example Scenario:</strong> Processing contracts or nested technical specifications.</p></li></ul><p><strong>Effect of Chunk Size:</strong></p><ul><li><p>Smaller recursive chunks improve granularity for specific queries.</p></li></ul><div><hr></div><h3>Hybrid Chunking</h3><p>Hybrid chunking combines multiple strategies to optimize chunking based on document structure.</p><ul><li><p><strong>Retrieval Efficiency:</strong> Variable, depending on the techniques used.</p></li><li><p><strong>Best for:</strong> Documents 
with mixed content types.</p></li><li><p><strong>Industries &amp; Data Types:</strong></p><ul><li><p>Corporate: Business reports, emails, and presentations</p></li><li><p>Educational: Course materials and e-learning documents</p></li><li><p>Marketing: Ad copies, customer reviews, and case studies</p></li></ul></li><li><p><strong>Example Scenario:</strong> Corporate documents containing reports, emails, and presentations.</p></li></ul><div><hr></div><h3>Agentic Chunking</h3><p>This advanced method uses autonomous AI agents to dynamically determine chunk boundaries based on context.</p><ul><li><p><strong>Retrieval Efficiency:</strong> High when optimized but can be resource-intensive.</p></li><li><p><strong>Best for:</strong> Dynamic content such as social media or news feeds.</p></li><li><p><strong>Industries &amp; Data Types:</strong></p><ul><li><p>Journalism: Real-time news articles and updates</p></li><li><p>Social Media: Tweets, blog posts, and live feeds</p></li><li><p>Customer Support: Chat logs and ticketing systems</p></li></ul></li><li><p><strong>Example Scenario:</strong> Processing real-time information.</p></li></ul><p><strong>Effect of Chunk Size:</strong></p><ul><li><p>AI-driven segmentation enhances context-aware retrieval.</p></li></ul><div><hr></div><h3>Embedding-Based Chunking</h3><p>This method uses embedding models to determine chunk boundaries based on semantic similarity. 
You can check out <a href="https://sbert.net/">SentenceTransformer</a> to perform embedding-based chunking.</p><ul><li><p><strong>Retrieval Efficiency:</strong> Moderate to high.</p></li><li><p><strong>Best for:</strong> Applications requiring high semantic coherence.</p></li><li><p><strong>Industries &amp; Data Types:</strong></p><ul><li><p>E-commerce: Customer feedback, product reviews, and recommendations</p></li><li><p>HR: Resume parsing and job descriptions</p></li><li><p>Cybersecurity: Threat intelligence reports and risk assessments</p></li></ul></li><li><p><strong>Example Scenario:</strong> Customer feedback analysis or product reviews.</p></li></ul><h2>Performance Comparisons</h2><table><thead><tr><th>Chunking Method</th><th>Retrieval Efficiency</th><th>Context Preservation</th><th>Ideal Use Case</th></tr></thead><tbody><tr><td>Fixed-Size Chunking</td><td>High</td><td>Low</td><td>Logs, reports</td></tr><tr><td>Semantic Chunking</td><td>Moderate to High</td><td>High</td><td>Research papers, documentation</td></tr><tr><td>Recursive Chunking</td><td>Moderate</td><td>Moderate to High</td><td>Legal documents, hierarchical data</td></tr><tr><td>Hybrid Chunking</td><td>Variable</td><td>Adaptive</td><td>Mixed document types</td></tr><tr><td>Agentic Chunking</td><td>High (when optimized)</td><td>Very High</td><td>Real-time, dynamic content</td></tr><tr><td>Embedding-Based Chunking</td><td>Moderate to High</td><td>High</td><td>Semantic retrieval</td></tr></tbody></table><h2>Best Practices for Effective Chunking</h2><ol><li><p><strong>Balance Chunk Size and Context:</strong> Overlap adjacent chunks by 10-20% to maintain context.</p></li><li><p><strong>Optimize for Performance:</strong> Avoid excessively small chunks to reduce retrieval overhead.</p></li><li><p><strong>Choose a Strategy Based on Content:</strong> Hybrid approaches often yield the best results.</p></li><li><p><strong>Leverage AI Where Needed:</strong> Agentic and embedding-based chunking improve accuracy in dynamic environments.</p></li><li><p><strong>Continuously Evaluate:</strong> Measure retrieval accuracy and adjust chunk sizes accordingly.</p></li></ol><h2>Conclusion</h2><p>Selecting the right chunking strategy is essential for optimizing RAG performance. 
Whether using fixed-size, semantic, or advanced AI-driven methods, the choice depends on data structure, retrieval needs, and available resources. Implementing hybrid or AI-driven chunking can significantly enhance accuracy and efficiency in real-world applications.</p><p>What chunking strategy do you find most effective for your use case?</p>]]></content:encoded></item><item><title><![CDATA[Envisioning for Better Outcomes]]></title><description><![CDATA[Envisioning for Better Outcomes]]></description><link>https://puneetghanshani.substack.com/p/envisioning</link><guid isPermaLink="false">https://puneetghanshani.substack.com/p/envisioning</guid><dc:creator><![CDATA[Puneet Ghanshani]]></dc:creator><pubDate>Sat, 01 Jun 2024 00:00:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/0b81e59c-7798-4915-96c9-ebc19a9abbca_1080x1350.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>As an architect, have you ever faced a customer who came to you with a specific request, only to realize that what they were asking for wasn&#8217;t actually what they needed? Maybe they wanted a feature added, a system tweaked, or a performance issue fixed. You could deliver exactly what they asked for, but would it truly solve their problem?</p><p>This is where <strong>envisioning becomes critical</strong>. Instead of jumping straight into solutions, we need to step back and <strong>understand the broader context, the actual pain points, and the opportunities for real impact</strong>. Envisioning helps us <strong>challenge assumptions, explore possibilities, and design solutions that don&#8217;t just work&#8212;but create lasting value.</strong></p><p>Let&#8217;s walk through an example together. Imagine we are working with a financial services company struggling with <strong>declining engagement in their digital banking platform</strong>. Customers are frustrated, complaints are increasing, and transactions are being abandoned midway. 
We could refine the interface, add a few enhancements, and optimize performance&#8212;but is that enough? Or should we <strong>rethink what a modern banking experience should truly feel like?</strong></p><p>Let&#8217;s go on this journey.</p><div><hr></div><h2><strong>Understanding the Real Problem</strong></h2><p>Before designing anything, we need to <strong>step into the customer&#8217;s shoes</strong>.</p><p>Imagine a user logging into their banking app. They want to check their savings balance, but it takes multiple taps to find. They receive a reminder about an upcoming bill, but there&#8217;s no easy way to act on it directly. The app offers charts and insights, but none of it <strong>feels personalized to their financial habits</strong>.</p><p>If we focus only on surface-level fixes&#8212;rearranging buttons, tweaking colors&#8212;will we actually enhance the experience? Probably not. Instead, we ask:</p><ul><li><p><strong>What does a truly great banking experience look like?</strong></p></li><li><p><strong>How can the app evolve from being just a tool to a proactive financial partner?</strong></p></li><li><p><strong>How might we help users make smarter financial decisions effortlessly?</strong></p></li></ul><p>By shifting our mindset from <strong>fixing small issues to envisioning a better experience</strong>, we open the door to real transformation.</p><div><hr></div><h2><strong>Defining the Challenge Together</strong></h2><p>Now that we see the problem clearly, we need to define it in a way that encourages meaningful solutions. 
Instead of asking, <em>&#8220;How do we improve the app?&#8221;</em>, we reframe the challenge:</p><p><strong>How might we create a digital banking experience that is intuitive, proactive, and helps customers make smarter financial decisions?</strong></p><p>This guiding question ensures that we <strong>don&#8217;t just add features, but truly enhance the way users interact with their finances</strong>.</p><div><hr></div><h2><strong>Exploring Solutions Through Envisioning</strong></h2><p>With a well-defined challenge, we begin <strong>exploring possibilities</strong>. Rather than jumping to quick fixes, we take a step back and <strong>imagine what an ideal banking experience should be like</strong>.</p><p>Let&#8217;s say we&#8217;re in a brainstorming session, sketching ideas on a whiteboard. A few promising concepts emerge:</p><ul><li><p><strong>A Smart Financial Assistant</strong> &#8211; The app <strong>anticipates user needs</strong>, providing real-time financial guidance based on spending habits.</p></li><li><p><strong>Goal-Based Navigation</strong> &#8211; Instead of generic menus, the dashboard <strong>adapts to the user&#8217;s financial priorities</strong>, like saving for a house or managing monthly expenses.</p></li><li><p><strong>Actionable Notifications</strong> &#8211; Instead of just reminders, the app suggests, <em>&#8220;Would you like to split this payment into installments?&#8221;</em> or <em>&#8220;You have extra funds&#8212;want to invest them?&#8221;</em></p></li></ul><p>At this stage, we <strong>aren&#8217;t just iterating on what exists&#8212;we are completely reimagining how digital banking should work.</strong></p><div><hr></div><h2><strong>Bringing the Vision to Life</strong></h2><p>Ideas alone aren&#8217;t enough; we need to <strong>validate them before committing to development</strong>. 
Instead of building everything at once, we start with <strong>prototypes</strong>.</p><p>We create <strong>interactive mockups</strong> and let real customers test them. Their feedback helps us refine the experience:</p><ul><li><p>Users love the AI-driven <strong>smart financial assistant</strong>, but they want <strong>more control over notifications</strong>.</p></li><li><p>The goal-based dashboard is intuitive, but some prefer <strong>customization options</strong>.</p></li><li><p>Actionable notifications are useful, but users prefer <strong>a balance between automation and manual control</strong>.</p></li></ul><p>By testing early, we avoid costly mistakes and <strong>fine-tune the design to meet real needs before full development</strong>.</p><div><hr></div><h2><strong>Turning Envisioning into a Continuous Process</strong></h2><p>Once the solution is launched, we don&#8217;t stop. <strong>A great experience isn&#8217;t static&#8212;it evolves.</strong></p><p>We continue monitoring data:</p><ul><li><p><strong>Are users completing transactions faster?</strong></p></li><li><p><strong>Do they find the recommendations helpful?</strong></p></li><li><p><strong>Are they making smarter financial decisions?</strong></p></li></ul><p>By continuously measuring and refining, we ensure the experience remains <strong>valuable and relevant over time</strong>.</p><div><hr></div><h2><strong>Why Envisioning Leads to Better Outcomes</strong></h2><p>By taking time to <strong>envision the right solution</strong>, we didn&#8217;t just improve an app&#8212;we transformed <strong>how customers interact with their finances</strong>.</p><p>We moved from:</p><ul><li><p>A <strong>basic banking tool</strong> &#8594; to a <strong>proactive financial assistant</strong>.</p></li><li><p>Generic alerts &#8594; to <strong>personalized, actionable recommendations</strong>.</p></li><li><p>Static navigation &#8594; to a <strong>goal-oriented, user-friendly experience</strong>.</p></li></ul><p>This 
isn&#8217;t just about banking. The same approach applies to <strong>any industry, any problem</strong>. Whether designing enterprise applications, retail experiences, or workplace systems, the key takeaways remain:</p><ul><li><p><strong>Understand the real problem before proposing solutions.</strong></p></li><li><p><strong>Frame challenges in a way that leads to innovative thinking.</strong></p></li><li><p><strong>Test and refine ideas early to avoid wasted effort.</strong></p></li><li><p><strong>Treat envisioning as an ongoing process, not a one-time exercise.</strong></p></li></ul><div><hr></div><h2><strong>Looking Ahead: What Can We Envision Next?</strong></h2><p>Now that we&#8217;ve walked through this journey together, think about your own work.</p><ul><li><p>Are you solving the <strong>right</strong> problem, or just reacting to symptoms?</p></li><li><p>Are you designing for what customers <strong>ask for</strong>, or what they <strong>truly need</strong>?</p></li><li><p>If you <strong>step back and envision a better future</strong>, what would that look like?</p></li></ul><p>The best solutions don&#8217;t come from fixing today&#8217;s issues&#8212;they come from <strong>imagining what&#8217;s possible tomorrow.</strong></p>]]></content:encoded></item><item><title><![CDATA[Export Azure Key Vault Secrets using PowerShell]]></title><description><![CDATA[When working with Azure Key Vault, you may need to export stored secrets for backup or migration purposes.]]></description><link>https://puneetghanshani.substack.com/p/keyvault-export</link><guid isPermaLink="false">https://puneetghanshani.substack.com/p/keyvault-export</guid><dc:creator><![CDATA[Puneet Ghanshani]]></dc:creator><pubDate>Wed, 15 May 2024 00:00:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/b3454319-b554-43ae-a035-c6d02c1a4b64_1080x1350.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When working with <strong>Azure Key Vault</strong>, 
you may need to export stored secrets for backup or migration purposes. This post provides a <strong>PowerShell script</strong> to extract secrets from a Key Vault and save them in a JSON file.</p><h2>Prerequisites</h2><p>Before running the script, ensure you have:</p><ul><li><p><strong>Azure CLI installed</strong> (<a href="https://docs.microsoft.com/en-us/cli/azure/install-azure-cli">Install Azure CLI</a>)</p></li><li><p><strong>Logged in to Azure CLI</strong> using:</p></li></ul><p><code>az login</code></p><ul><li><p>Set the correct <strong>Azure subscription</strong> (if you have multiple subscriptions):</p></li></ul><p><code>az account set --subscription "your-subscription-id"</code></p><h2>PowerShell Script</h2><p>Save the following script as Export-Secrets.ps1:</p><pre><code># Define variables
$vaultName = "your-key-vault-name"
$outputFile = "keyvault-secrets.json"

# Initialize an empty array
$secretsArray = @()

# Get the list of secret IDs (one URL per secret)
$secretIds = az keyvault secret list --vault-name $vaultName --query "[].id" -o tsv

foreach ($secretId in $secretIds) {
    # Extract the secret name from the secret ID
    $secretName = [System.IO.Path]::GetFileName($secretId)
    
    # Get the secret value
    $secretValue = az keyvault secret show --id $secretId --query "value" -o tsv
    
    # Create an ordered object so "key" always precedes "value" in the JSON output
    $secretObject = [ordered]@{
        key   = $secretName
        value = $secretValue
    }
    
    # Add the object to the array
    $secretsArray += $secretObject
}

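# Convert the array to JSON and save to a file
# (-InputObject keeps a vault with a single secret serialized as a one-element JSON array)
ConvertTo-Json -InputObject $secretsArray | Set-Content $outputFile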

Write-Output "Secrets exported to $outputFile"

</code></pre><h2>Running the Script</h2><ol><li><p>Open <strong>PowerShell</strong>.</p></li><li><p>Navigate to the folder where you saved the script.</p></li><li><p>Run the script:</p></li></ol><pre><code>.\Export-Secrets.ps1
</code></pre><h2>Example Output</h2><p>Once executed, the script generates a JSON file (<code>keyvault-secrets.json</code>) with the following structure:</p><pre><code>[
    {
        "key": "secret1",
        "value": "value1"
    },
    {
        "key": "secret2",
        "value": "value2"
    }
]
</code></pre><p>This script exports secrets in <strong>plain text</strong>. Ensure you store the <strong>keyvault-secrets.json</strong> file securely.</p>]]></content:encoded></item><item><title><![CDATA[Exploring ETL vs. ELT]]></title><description><![CDATA[When designing data pipelines, it&#8217;s important to understand the performance differences between ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform).]]></description><link>https://puneetghanshani.substack.com/p/etl-vs-elt</link><guid isPermaLink="false">https://puneetghanshani.substack.com/p/etl-vs-elt</guid><dc:creator><![CDATA[Puneet Ghanshani]]></dc:creator><pubDate>Sun, 12 Nov 2023 00:00:00 GMT</pubDate><enclosure url="https://substack-post-media.s3.amazonaws.com/public/images/e22dc025-f0bd-4149-b52e-7163da64a2fa_1080x1350.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When designing data pipelines, it&#8217;s important to understand the performance differences between ETL (Extract, Transform, Load) and ELT (Extract, Load, Transform). Each approach has unique advantages depending on your data processing needs. Let&#8217;s break down the performance implications of each method and explore how Azure tools can help you implement them.</p><h2>Speed and Processing Time</h2><p>ETL generally involves slower initial processing because the transformation step occurs before loading the data. This can create bottlenecks, especially when working with large datasets, as the data must be cleaned and transformed before it can be used. This delay can affect the availability of data for analysis.</p><p>In contrast, ELT typically allows for faster data ingestion since raw data is loaded directly into the target system first, and transformations happen later. This method is more suited for environments where real-time data availability is crucial.</p><h2>Scalability</h2><p>As data volumes increase, ETL can become challenging to scale. 
The transformation process often requires significant computing power before loading the data, which can slow down performance as the dataset grows.</p><p>ELT scales more effectively with large data volumes. With modern data platforms like Azure, you can store and process vast amounts of raw data in a data lake and then transform only what is necessary, leveraging the cloud&#8217;s computational power for better efficiency.</p><h2>Resource Utilization</h2><p>ETL requires dedicated servers for the transformation step, which can be resource-intensive. This setup may also lead to higher operational costs, particularly when transformations are complex and require significant compute power.</p><p>ELT takes advantage of the computational resources of the target system (e.g., cloud data warehouses), which makes it more cost-effective and efficient. Since the transformations occur after data is loaded, it can reduce the need for intermediate servers.</p><h2>Flexibility and Agility</h2><p>ETL is less flexible when data requirements change frequently. If you need to adjust your data structure or transformation rules, you often have to modify the entire ETL pipeline, which can be time-consuming.</p><p>ELT offers more flexibility in handling data transformations. Since data is loaded first and transformations are done on-demand, it is easier to adapt to changes and experiment with different approaches based on evolving business needs.</p><h2>Performance Optimization Techniques</h2><p>Both ETL and ELT can benefit from optimization techniques such as parallel processing, partitioning data, incremental loading, and caching. 
These methods help speed up data processing, reduce resource consumption, and manage large datasets more efficiently.</p><h2>Choosing the Right Approach</h2><p>The choice between ETL and ELT largely depends on the specifics of your project:</p><ul><li><p><strong>Data volume</strong>: ELT is typically more suited for large datasets, while ETL works better with smaller, more structured datasets.</p></li><li><p><strong>Transformation complexity</strong>: If you have complex transformations that require detailed cleaning or restructuring, ETL might be the better choice. For simpler transformations, ELT leverages the power of the target system.</p></li><li><p><strong>Real-time requirements</strong>: ELT can provide faster initial data loading, which is beneficial for real-time analytics.</p></li><li><p><strong>Compliance and security</strong>: ETL provides better control over sensitive data, allowing for data masking or encryption before it enters the target system.</p></li></ul><h2>Implementing ETL and ELT on Azure</h2><p>Azure provides a variety of tools and services that support both ETL and ELT processes, offering flexibility to choose the right approach for your needs.</p><h3>Azure Data Factory: The Primary Tool</h3><p>Azure Data Factory (ADF) is a comprehensive tool for orchestrating both ETL and ELT processes. 
It allows for visual design of data transformations (ETL) and offers efficient data loading and transformation capabilities (ELT).</p><p><strong>For ETL:</strong></p><ul><li><p><strong>Data Flow</strong>: ADF&#8217;s Data Flow feature allows you to visually design your data transformations, enabling easy mapping and structuring of data.</p></li><li><p><strong>Integration with Azure Databricks</strong>: For more complex transformations, ADF can integrate with Azure Databricks, which provides powerful processing capabilities.</p></li></ul><p><strong>For ELT:</strong></p><ul><li><p><strong>Copy Activity</strong>: ADF can quickly load raw data into Azure storage or data warehouses, allowing you to store data first and process it later.</p></li><li><p><strong>Integration with Azure Synapse Analytics</strong>: This enables in-database transformations, making it easy to perform powerful analytics on your data without needing to move it out of the warehouse.</p></li></ul><h3>Azure Services for ETL/ELT</h3><ol><li><p><strong>Azure Synapse Analytics</strong>: Ideal for ELT, it offers powerful in-database transformations.</p></li><li><p><strong>Azure Databricks</strong>: Great for complex ETL jobs, particularly when dealing with big data.</p></li><li><p><strong>Azure SQL Database</strong>: Suitable for traditional ETL processes, especially with structured data.</p></li><li><p><strong>Azure Data Lake Storage</strong>: Works well for both ETL and ELT, providing scalable storage for large datasets.</p></li></ol><h2>Conclusion</h2><p>The choice between ETL and ELT isn&#8217;t about which approach is universally better; it&#8217;s about choosing the method that best fits your specific data needs. Consider factors such as data volume, transformation complexity, real-time requirements, and compliance needs when deciding between the two. 
Azure&#8217;s flexible ecosystem lets you mix and match ETL and ELT methods as needed, much like combining different cooking styles to craft the perfect meal.</p><p>What&#8217;s your next step in selecting the ideal approach for your data pipeline? Try outlining your requirements (data size, desired speed, and transformation complexity) and then experiment with Azure Data Factory to see which method best meets your performance needs.</p>]]></content:encoded></item></channel></rss>