<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Zerberos Labs</title><description>A corner of the internet for DFIR research, security tooling, and postmortems by Zach Burnham.</description><link>https://labs.zerberos.io/</link><atom:link href="https://labs.zerberos.io/rss.xml" rel="self" type="application/rss+xml"/><item><title>Agentjacking: the attack where nothing is unauthorized</title><link>https://labs.zerberos.io/blog/triage-agentjacking/</link><guid isPermaLink="true">https://labs.zerberos.io/blog/triage-agentjacking/</guid><description>A public, write-only Sentry credential is enough to feed malicious instructions to an AI agent. The scary part is that nothing in the chain is technically unauthorized.</description><pubDate>Sun, 14 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;This week, researchers at Tenet Security disclosed a new attack class they’re calling &lt;a href=&quot;https://tenetsecurity.ai/blog/agentjacking-coding-agents-with-fake-sentry-errors/&quot;&gt;agentjacking&lt;/a&gt;. After seeing Apple’s new iOS 27 agentic password features, it’s the first AI agent attack in a while that made me stop and actually think about the IR side of it, and I wanted to jot some of that down.&lt;/p&gt;
&lt;p&gt;The mechanics are wicked simple. Tenet’s proof-of-concept involves &lt;a href=&quot;https://docs.sentry.io/concepts/key-terms/dsn-explainer/&quot;&gt;Sentry Data Source Name&lt;/a&gt; (DSN), a project-specific address your app uses to send errors and performance events to the service, which is public and write-only by design. An attacker who finds one can write their own “error events” into your Sentry project, and stuff those events full of instructions. Later, a developer asks their AI coding agent to look into a failing error. The agent pulls the event in via Model Context Protocol (MCP), reads the attacker’s text as if it were context, and runs the embedded commands - with the developer’s privileges, on the developer’s machine. No phishing, no server compromise, no user interaction beyond the workflow a developer does probably 50 times a day.&lt;/p&gt;
&lt;p&gt;Tenet Security says they pulled this off against Claude Code, Cursor, and Codex with a ~85% success rate, and found over 2,000 exposed organizations. But the detail that should bother people most is that the agents ran the payloads &lt;em&gt;even when the system prompt explicitly told them to ignore untrusted data.&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;What stuck with me is the terminology Tenet Security used - the “Authorized Intent Chain.” Every step in this attack is a thing the system is &lt;em&gt;supposed&lt;/em&gt; to allow:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Reading a Sentry event &lt;strong&gt;is authorized&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Running a shell command the developer asked for &lt;strong&gt;is authorized&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Hitting an internal API with the developer’s own token &lt;strong&gt;is authorized&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Nothing trips a control because there’s no unauthorized action to trip on. As Tenet states, the attack bypasses EDR, WAF, IAM, VPN, Cloudflare, and firewalls because, as far as any of them can tell, nothing wrong happened.&lt;/p&gt;
&lt;p&gt;That’s the part that should worry any IR analyst - the agent authenticated as the developer, from the developer’s machine, during the developer’s working hours. There’s no malware on disk, no anomalous login, and the C2 beacon was a bug report. The timeline reads as a developer doing ordinary developer things, because mechanically that’s exactly what happened. The malicious actor in this story is a trusted tool faithfully executing poisoned input, and faithful execution doesn’t leave the artifacts we’re trained to go looking for. Scoping an Incident like this means treating your agent’s entire context window as attacker-reachable input, and I’d bet most orgs have zero logging of what their coding agent actually read before it acted. Honestly, the whole thing is bringing back memories of the initial introduction of fileless malware.&lt;/p&gt;
&lt;p&gt;I’ll be watching how agent vendors respond here, because the whole value of these agents is that they &lt;em&gt;act&lt;/em&gt; on what they read. One thing I’m for sure noticing - the growing theme seems to be bolting new capabilities on faster than we’re bolting on the controls to contain it.&lt;/p&gt;
&lt;p&gt;ZB&lt;/p&gt;</content:encoded><enclosure url="https://labs.zerberos.io/_astro/zb-logo-banner-darkbg-white.CcbtFrcP.png" length="0" type="image/png"/></item><item><title>iOS 27 gives Siri write access to your passwords - should it?</title><link>https://labs.zerberos.io/blog/triage-siri-ios27-passwords/</link><guid isPermaLink="true">https://labs.zerberos.io/blog/triage-siri-ios27-passwords/</guid><description>Apple Passwords is getting agentic AI that can rotate credentials on your behalf. Convenient - but at what cost?</description><pubDate>Wed, 10 Jun 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;A few days ago at WWDC, &lt;a href=&quot;https://www.apple.com/newsroom/2026/06/apple-unveils-next-generation-of-apple-intelligence-siri-ai-and-more/&quot;&gt;Apple announced iOS 27&lt;/a&gt; - with the biggest feature being a revamped Siri, called &lt;a href=&quot;https://www.apple.com/newsroom/2026/06/apple-introduces-siri-ai-a-profoundly-more-capable-and-personal-assistant/&quot;&gt;Siri AI&lt;/a&gt;. After looking at some of Siri AI’s new features, I noticed it can now make changes &lt;em&gt;inside&lt;/em&gt; the Passwords app. Specifically, it can now act on a weak or compromised credential by walking through a password rotation on the site for you, end to end. Agentic magic.&lt;/p&gt;
&lt;p&gt;At face value, the convenience is obvious: most people never rotate a leaked password because the flow is tedious and people can be naturally lazy. As incriminating as it is to admit, I am also one of those people (for my least important credentials). Apple likely believes that adding the ability to pass this off to an agent could measurably shrink the window between “compromise detected” and “credential rotated,” and they’re probably right.&lt;/p&gt;
&lt;p&gt;But it also moves the trust boundary. Until now, the only thing (hopefully) that could read and write every credential you own was &lt;em&gt;you&lt;/em&gt;, gated behind some form of biometric unlock (i.e. Face ID / Touch ID). An agent that can navigate to a site, authenticate as you, and submit a new password is a new high-value capability, but the interesting question isn’t “is the model good,” it’s “what exactly can trigger it” and “what can / can’t it be talked into doing.” Prompt injection is a thing - a malicious login page or a spoofed “your password was compromised” prompt are the first things that come to mind. We already see the effects of &lt;a href=&quot;https://www.tomshardware.com/tech-industry/artificial-intelligence/linkedin-recruitment-spam-becomes-olde-english-prose-after-user-hides-ai-prompt-injection-in-bio-bots-also-also-manipulated-to-address-user-as-my-lord&quot;&gt;prompt injection on LinkedIn with recruiter bots&lt;/a&gt;, which is the same exposure a password agent can have the second it reads a malicious login page. Apple appears to show this capability as navigating to a website you have previously attributed to the credential, but what if that page ends up compromised?&lt;/p&gt;
&lt;p&gt;And I’m not alone in this thinking. At first, I actually was fooled by the “magic” of it all. But after speaking to a few college friends who also work in the industry, their immediate reaction was a mix of “I don’t like that” and “sus.” However, their initial thoughts were more based on whether this functionality was running locally on-device or passed off via Apple’s &lt;a href=&quot;https://security.apple.com/blog/private-cloud-compute/&quot;&gt;Private Cloud Compute&lt;/a&gt;. Another valid concern.&lt;/p&gt;
&lt;p&gt;I’ll be keeping an eye on this as the iOS 27 beta cycle unfolds to see if Apple posts anything more about this feature, specifically from a security standpoint. Who knows, maybe Apple has this all figured out already with their army of engineers and everything will work perfectly with privacy in mind. Regardless, the world is fast implementing AI - and the risks associated with that are only just beginning.&lt;/p&gt;
&lt;p&gt;ZB&lt;/p&gt;</content:encoded><enclosure url="https://labs.zerberos.io/_astro/zb-logo-banner-darkbg-white.CcbtFrcP.png" length="0" type="image/png"/></item><item><title>Home Lab Snapshot: May 2026</title><link>https://labs.zerberos.io/blog/home-lab-snapshot-may-2026/</link><guid isPermaLink="true">https://labs.zerberos.io/blog/home-lab-snapshot-may-2026/</guid><description>Notes on the current shape of my home lab: an M5 Pro MacBook running LM Studio, Ollama, and a hardened Windows 11 detonation VM under VMware Fusion. Less a how-to, more a baseline for future posts that build on this setup.</description><pubDate>Mon, 25 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;https://labs.zerberos.io/_astro/homelab-snapshot-may26-heroimage.nKvnxm5s.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;&lt;p&gt;I’m planning this post as the beginning of a series - whenever I make consequential changes to my home lab, I’ll make a cooresonding post like this. But this acts as half notes-to-self, half baseline reference to be used for future posts that may need context around the specs I’m currently working within. The shape of the lab will drift over time (hence the blog post name) so consider this a snapshot, not a permanent answer.&lt;/p&gt;
&lt;h2 id=&quot;why-have-a-home-lab&quot;&gt;Why have a home lab&lt;/h2&gt;
&lt;p&gt;Two reasons, both opportunistic:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Apple Silicon is genuinely good for local LLMs now.&lt;/strong&gt; Not to be an Apple fanboy, but M5 Pro’s unified memory allows a 35B-parameter abliterated model to load on the same laptop I use day-to-day without a massive rig. For malware-adjacent work where I don’t want samples or decoded artifacts being sent to a hosted API, this is perfect.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;VMware Fusion is free (for personal use).&lt;/strong&gt; &lt;a href=&quot;https://blogs.vmware.com/cloud-foundation/2024/11/11/vmware-fusion-and-workstation-are-now-free-for-all-users/&quot;&gt;Broadcom released Fusion for personal use in 2024&lt;/a&gt;, which removes the last meaningful cost barrier to running a couple of Windows/Linux VMs on a Mac.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;the-stack&quot;&gt;The stack&lt;/h2&gt;
&lt;p&gt;Three layers, each doing a specific job. The diagram below is interactive - click any layer to see what runs there and current specs I am using.&lt;/p&gt;
&lt;div class=&quot;lab-stack&quot;&gt;&lt;div class=&quot;lab-stack-diagram&quot; role=&quot;tablist&quot; aria-label=&quot;Home lab stack layers&quot;&gt;&lt;button role=&quot;tab&quot; type=&quot;button&quot; aria-selected=&quot;false&quot; class=&quot;lab-stack-layer  layer-vms&quot;&gt;&lt;span class=&quot;lab-stack-chip&quot;&gt;Layer 3&lt;/span&gt;&lt;span class=&quot;lab-stack-label&quot;&gt;VMs&lt;/span&gt;&lt;span class=&quot;lab-stack-short&quot;&gt;Isolated guests&lt;/span&gt;&lt;/button&gt;&lt;button role=&quot;tab&quot; type=&quot;button&quot; aria-selected=&quot;false&quot; class=&quot;lab-stack-layer  layer-apps&quot;&gt;&lt;span class=&quot;lab-stack-chip&quot;&gt;Layer 2&lt;/span&gt;&lt;span class=&quot;lab-stack-label&quot;&gt;macOS apps&lt;/span&gt;&lt;span class=&quot;lab-stack-short&quot;&gt;LLMs &amp;amp; virtualization&lt;/span&gt;&lt;/button&gt;&lt;button role=&quot;tab&quot; type=&quot;button&quot; aria-selected=&quot;true&quot; class=&quot;lab-stack-layer active layer-host&quot;&gt;&lt;span class=&quot;lab-stack-chip&quot;&gt;Layer 1&lt;/span&gt;&lt;span class=&quot;lab-stack-label&quot;&gt;Host&lt;/span&gt;&lt;span class=&quot;lab-stack-short&quot;&gt;M5 Pro MacBook Pro&lt;/span&gt;&lt;/button&gt;&lt;/div&gt;&lt;div class=&quot;lab-stack-detail&quot; role=&quot;tabpanel&quot;&gt;&lt;h4&gt;M5 Pro MacBook Pro&lt;/h4&gt;&lt;p&gt;The whole lab runs on my MacBook. Apple Silicon&amp;#x27;s unified memory is what makes the rest of the stack viable on one machine - the GPU and Neural Engine share the same 64GB pool the OS and VMs draw from, allowing the usage of sizeable local models + VMs without a massive rig.&lt;/p&gt;&lt;ul class=&quot;lab-stack-specs&quot;&gt;&lt;li&gt;&lt;span class=&quot;k&quot;&gt;Chip&lt;/span&gt;&lt;span class=&quot;v&quot;&gt;M5 Pro&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class=&quot;k&quot;&gt;CPU&lt;/span&gt;&lt;span class=&quot;v&quot;&gt;18-core&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class=&quot;k&quot;&gt;GPU&lt;/span&gt;&lt;span class=&quot;v&quot;&gt;20-core&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class=&quot;k&quot;&gt;Neural Engine&lt;/span&gt;&lt;span class=&quot;v&quot;&gt;16-core&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class=&quot;k&quot;&gt;Memory&lt;/span&gt;&lt;span class=&quot;v&quot;&gt;64 GB unified&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span class=&quot;k&quot;&gt;Storage&lt;/span&gt;&lt;span class=&quot;v&quot;&gt;2 TB SSD&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/div&gt;&lt;div class=&quot;lab-stack-footer&quot;&gt;&lt;span&gt;Click a layer to see what runs there.&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;!--astro:end--&gt;
&lt;h2 id=&quot;local-llms-lm-studio--ollama&quot;&gt;Local LLMs: LM Studio + Ollama&lt;/h2&gt;
&lt;p&gt;Two LLM apps, two different roles. Each have their own advantages, so I decided to just leverage both. I’m calling this out specifically, including the specific model each is running, so future posts can link back here for the model context without re-explaining each time.&lt;/p&gt;
&lt;h3 id=&quot;lm-studio-abliterated-models&quot;&gt;LM Studio (abliterated models)&lt;/h3&gt;
&lt;p&gt;LM Studio gets the &lt;strong&gt;abliterated&lt;/strong&gt; models, which are variants where the refusal vector has been ablated out of the weights. This allows the model to answer questions about obfuscated payloads, reverse-engineering, and exploit code without dragging in the safety jargon that (although needed for day-to-day models) derails a deobfuscation session. I’m currently running &lt;code&gt;huihui-qwen3.6-35b-a3b-abliterated&lt;/code&gt;, which is the same base Qwen 3.6 35B with the refusal behavior removed.&lt;/p&gt;
&lt;p&gt;The use case is narrow: code deobfuscation, walking through what a malware sample is doing, and asking questions such as “what does this PowerShell loader look like decoded.” A safety-aligned model buries the answer under disclaimers, which isn’t helpful.&lt;/p&gt;
&lt;h3 id=&quot;ollama-regular-models&quot;&gt;Ollama (regular models)&lt;/h3&gt;
&lt;p&gt;Ollama runs the regular, safety-aligned models, and I’m currently using &lt;code&gt;qwen3-coder:30b&lt;/code&gt;. This is the model that powers &lt;a href=&quot;https://labs.zerberos.io/blog/introducing-souschef/&quot;&gt;SousChef&lt;/a&gt; and anything else where I’m not asking for content the alignment guardrails would block anyway. This includes code generation, CyberChef recipe creation, and structured-output tasks.&lt;/p&gt;
&lt;p&gt;Keeping these two apps separate (rather than running everything through one) is partly historical, partly practical: LM Studio’s UI is built around model browsing and chat, which is what I want for the analysis side. Ollama’s HTTP API and CLI are what I want for the tooling side. They share the host’s RAM pool, but I rarely have both pulling on a model at the same time.&lt;/p&gt;
&lt;h2 id=&quot;windows-11-detonation-vm&quot;&gt;Windows 11 detonation VM&lt;/h2&gt;
&lt;p&gt;My VM is named &lt;strong&gt;“Windows 11 Detonation.”&lt;/strong&gt; It’s the host for any “let’s see what this PowerShell loader actually does” session, used for static analysis only - no execution of live samples (yet).&lt;/p&gt;





































&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Setting&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Guest OS&lt;/td&gt;&lt;td&gt;Windows 11 (&lt;a href=&quot;https://www.microsoft.com/en-us/software-download/windows11&quot;&gt;ARM64 ISO from Microsoft&lt;/a&gt;)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;vCPUs&lt;/td&gt;&lt;td&gt;2&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;RAM&lt;/td&gt;&lt;td&gt;4 GB&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Disk&lt;/td&gt;&lt;td&gt;60 GB thin-provisioned, single file&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Virtual TPM&lt;/td&gt;&lt;td&gt;Enabled (required by the Win11 installer)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Network&lt;/td&gt;&lt;td&gt;Host-Only (“Private to my Mac”)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;VMware Tools&lt;/td&gt;&lt;td&gt;Installed (copy/paste + drag/drop)&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;The vTPM toggle is in Fusion’s VM settings and is non-negotiable for Win11 here. Without it, the installer refuses to proceed past the system-requirements check. Thin-provisioned single-file disk keeps the VM portable for backup; I’d rather one large file than 60 GB worth of sparse extents.&lt;/p&gt;
&lt;h3 id=&quot;oobe-bypass-for-a-no-account-install&quot;&gt;OOBE bypass for a no-account install&lt;/h3&gt;
&lt;p&gt;Out-of-Box Experience is the first-boot setup wizard on a fresh Windows 11 install (and for DFIR labs, very annoying). By default, OOBE forces a Microsoft account sign-in and an active internet connection before letting you reach the desktop, both of which are undesirable for an analysis VM.&lt;/p&gt;
&lt;p&gt;Luckily, there is a way around it. At the “Let’s connect you to a network” screen, hit &lt;code&gt;Shift + F10&lt;/code&gt; to open a command prompt and run:&lt;/p&gt;
&lt;div class=&quot;code-block-wrapper&quot;&gt;&lt;pre class=&quot;astro-code github-dark&quot; style=&quot;background-color:#24292e;color:#e1e4e8;overflow-x:auto&quot; tabindex=&quot;0&quot; data-language=&quot;cmd&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt;oobe\bypassnro&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The VM reboots OOBE in a mode where an “I don’t have internet” option becomes available, and you can finish setup with a local account.&lt;/p&gt;
&lt;h2 id=&quot;networking-host-only&quot;&gt;Networking (Host-Only)&lt;/h2&gt;
&lt;p&gt;Fusion’s network adapter is set to &lt;strong&gt;“Private to my Mac”&lt;/strong&gt; — “Host-Only” in standard hypervisor terminology. The VM can reach the macOS host (and the host can reach the VM), but the VM has no path to the LAN and no path to the internet.&lt;/p&gt;
&lt;p&gt;For static analysis and decoding, this is exactly what you want. If I’m pasting in an obfuscated PowerShell blob and asking the VM to walk it through &lt;code&gt;IO.Compression&lt;/code&gt; decompression, there is no scenario where it benefits from talking to a C2. NAT and Bridged both leave a path open and Host-Only closes it.&lt;/p&gt;
&lt;aside class=&quot;note&quot;&gt;&lt;p&gt;This stance only holds while the workflow is decoding-only. The moment I start executing live samples (even ones I think I understand) Host-Only with VMware Tools clipboard sharing isn’t enough. At that point the tradeoffs flip - airgapped snapshot, no VMware Tools, no shared clipboard, and probably a different VM entirely. That’s a future-Zach problem.&lt;/p&gt;&lt;/aside&gt;
&lt;h2 id=&quot;snapshots&quot;&gt;Snapshots&lt;/h2&gt;
&lt;p&gt;Fusion’s snapshot model is generous enough that I can keep two named baselines and just roll back between sessions instead of rebuilding the VM each time. I chose to keep two for this detonation VM:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Snapshot 1: Clean post-OOBE.&lt;/strong&gt; Nothing installed. This is the “I need to test something against an out-of-box Windows” snapshot — useful for verifying that a behavior isn’t an artifact of the modifications below.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Snapshot 2: Tools + Defender off.&lt;/strong&gt; VMware Tools installed, Defender fully disabled (see the next section) and no other software. This is the working baseline I usually roll back to between sessions.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;disabling-defender&quot;&gt;Disabling Defender&lt;/h2&gt;
&lt;p&gt;For pure decoding work, such as feeding the VM an obfuscated PowerShell command and asking it to walk through the deobfuscation, Defender’s real-time scanning and behavior monitoring will block or quarantine the artifact mid-session (booooo). You need to disable Windows Defender to get to the goodies. But only doing one or two of the steps outlined below leaves Defender in a state where some engines re-enable themselves on reboot or the next policy refresh. So you gotta do them all.&lt;/p&gt;
&lt;div class=&quot;defender-stepper&quot;&gt;&lt;div class=&quot;defender-callout&quot;&gt;&lt;strong&gt;All three steps are required.&lt;/strong&gt; Doing one or two leaves Defender in a state where some engines re-enable themselves on reboot or policy refresh. Run them in order, then reboot.&lt;/div&gt;&lt;ol class=&quot;defender-tabs&quot; role=&quot;tablist&quot; aria-label=&quot;Defender disable steps&quot;&gt;&lt;li&gt;&lt;button role=&quot;tab&quot; type=&quot;button&quot; aria-selected=&quot;true&quot; class=&quot;defender-tab active&quot;&gt;&lt;span class=&quot;defender-step-num&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;defender-tab-text&quot;&gt;&lt;span class=&quot;defender-tab-label&quot;&gt;Tamper Protection&lt;/span&gt;&lt;span class=&quot;defender-tab-summary&quot;&gt;Settings UI&lt;/span&gt;&lt;/span&gt;&lt;/button&gt;&lt;/li&gt;&lt;li&gt;&lt;button role=&quot;tab&quot; type=&quot;button&quot; aria-selected=&quot;false&quot; class=&quot;defender-tab &quot;&gt;&lt;span class=&quot;defender-step-num&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;defender-tab-text&quot;&gt;&lt;span class=&quot;defender-tab-label&quot;&gt;Set-MpPreference&lt;/span&gt;&lt;span class=&quot;defender-tab-summary&quot;&gt;PowerShell&lt;/span&gt;&lt;/span&gt;&lt;/button&gt;&lt;/li&gt;&lt;li&gt;&lt;button role=&quot;tab&quot; type=&quot;button&quot; aria-selected=&quot;false&quot; class=&quot;defender-tab &quot;&gt;&lt;span class=&quot;defender-step-num&quot;&gt;3&lt;/span&gt;&lt;span class=&quot;defender-tab-text&quot;&gt;&lt;span class=&quot;defender-tab-label&quot;&gt;Group Policy + reboot&lt;/span&gt;&lt;span class=&quot;defender-tab-summary&quot;&gt;Local Group Policy&lt;/span&gt;&lt;/span&gt;&lt;/button&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div class=&quot;defender-body&quot; role=&quot;tabpanel&quot;&gt;&lt;div class=&quot;defender-progress&quot; aria-hidden=&quot;true&quot;&gt;Step &lt;!-- --&gt;1&lt;!-- --&gt; of &lt;!-- --&gt;3&lt;/div&gt;&lt;p&gt;Tamper Protection is a guardrail that blocks every other Defender setting from being changed via PowerShell, registry, or Group Policy. It can only be disabled through the Settings UI, so this step has to happen before Steps 2 and 3 will actually stick.&lt;/p&gt;&lt;ol&gt;&lt;li&gt;Open &lt;strong&gt;Settings → Privacy &amp;amp; security → Windows Security&lt;/strong&gt;.&lt;/li&gt;&lt;li&gt;Open &lt;strong&gt;Virus &amp;amp; threat protection&lt;/strong&gt;, then&lt;!-- --&gt; &lt;strong&gt;Manage settings&lt;/strong&gt;.&lt;/li&gt;&lt;li&gt;Toggle &lt;strong&gt;Tamper Protection&lt;/strong&gt; to &lt;code&gt;Off&lt;/code&gt;. Accept the UAC prompt.&lt;/li&gt;&lt;/ol&gt;&lt;p class=&quot;step-note&quot;&gt;Without this, &lt;code&gt;Set-MpPreference&lt;/code&gt; calls in the next step will silently fail or revert on reboot.&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;&lt;!--astro:end--&gt;
&lt;p&gt;Once all three steps land and the VM has rebooted clean, this is the moment to take Snapshot 2 (Tools + Defender off).&lt;/p&gt;
&lt;h2 id=&quot;so---how-am-i-using-this-detonation-vm&quot;&gt;So - how am I using this Detonation VM?&lt;/h2&gt;
&lt;aside class=&quot;note&quot;&gt;&lt;p&gt;I put the following Proof-of-Concepts (PoCs) below to give some rough examples of what I’m currently playing around with. If anything comes of it, I may write a more detailed follow-up post.&lt;/p&gt;&lt;/aside&gt;
&lt;h3 id=&quot;poc-1-simple-obfuscated-powershell-iocompression&quot;&gt;PoC #1: Simple obfuscated PowerShell (IO.Compression)&lt;/h3&gt;
&lt;p&gt;First real test of the VM was a small obfuscated PowerShell command that used &lt;code&gt;System.IO.Compression&lt;/code&gt; to inflate a base64 payload at runtime. The interesting bits that came out of the session:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pasting and inspecting.&lt;/strong&gt; First instinct was to wrap the decompressed payload in &lt;code&gt;Write-Host&lt;/code&gt; so the CLI would print the inflated script content. It works, but I quickly learned reading bytes through &lt;code&gt;Write-Host&lt;/code&gt; is fragile for anything with embedded quoting.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;&lt;code&gt;Invoke-Expression&lt;/code&gt; capture trap.&lt;/strong&gt; Leaving the original &lt;code&gt;IEX (...)&lt;/code&gt; wrapper in place meant &lt;code&gt;$result&lt;/code&gt; was capturing the return value of whatever the payload executed, not the payload itself. The right move was to strip the &lt;code&gt;IEX&lt;/code&gt; and read the inflated stream directly via &lt;code&gt;StreamReader&lt;/code&gt; over a &lt;code&gt;DeflateStream&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Leftover &lt;code&gt;);&lt;/code&gt; parse error.&lt;/strong&gt; After stripping the &lt;code&gt;IEX&lt;/code&gt; wrapper, the dangling &lt;code&gt;);&lt;/code&gt; from the original tail caused PowerShell to bail with an unexpected-token parse error. Had to remember to clear that, then the command printed the goodies.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;poc-2-complex-insertremovereplace-obfuscation&quot;&gt;PoC #2: Complex insert/remove/replace obfuscation&lt;/h3&gt;
&lt;p&gt;Second test was a different animal: a PowerShell loader reconstructed from 17 separate scriptblock-logging entries in a PowerShell EVTX. Insert/remove/replace obfuscation across the chain made manual reassembly painful.&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Reassembly ordering.&lt;/strong&gt; Scriptblock logs need to be sorted by &lt;code&gt;MessageNumber&lt;/code&gt; within a matching &lt;code&gt;ScriptBlockId&lt;/code&gt; GUID, &lt;strong&gt;not&lt;/strong&gt; by timestamp. When pulled from EVTX logs, often times this requires manual sorting by the analyst.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Null expression error.&lt;/strong&gt; First try, the reassembled chain threw a null-expression error on execution, which I traced to a likely missing or out-of-order scriptblock somewhere in the middle of the chain (pain). Essentially I had to iterate from here to figure out where I was dumb.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;more-to-come&quot;&gt;More to come&lt;/h2&gt;
&lt;p&gt;This is just the baseline of a lab setup as I currently get more into it - posts that lean on the detonation VM or the snapshot/rollback workflow may link back here instead of re-explaining the setup each time. When the shape of my home lab changes meaningfully, such as whenever I get off my couch and seriously play around with a Kali Linux VM, I’ll add a new dated snapshot post to this series.&lt;/p&gt;
&lt;p&gt;ZB&lt;/p&gt;</content:encoded><enclosure url="https://labs.zerberos.io/_astro/homelab-snapshot-may26-heroimage.nKvnxm5s.png" length="0" type="image/png"/></item><item><title>Meet SousChef, an Experiment in CyberChef Recipes from a Local LLM</title><link>https://labs.zerberos.io/blog/introducing-souschef/</link><guid isPermaLink="true">https://labs.zerberos.io/blog/introducing-souschef/</guid><description>A Python-based CLI that turns obfuscated payloads into browser-ready CyberChef recipes, powered by a local Ollama model so samples never leave your system.</description><pubDate>Mon, 11 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;https://labs.zerberos.io/_astro/souschef-heroimage.CUVOKBvF.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;&lt;p&gt;Anyone in the DFIR world can relate to this - you come across a command line that has a &lt;code&gt;powershell -enc&lt;/code&gt; blob with seemingly a bagillion characters of Base64, and you know from experience there’s probably another layer or two underneath. This could involve compression via gzip, maybe a single-byte XOR using a key the script kindly left lying around. You then walk it through CyberChef by hand, something you’ve done a thousand times (and likely seen it throw &lt;strong&gt;invalid blah blah&lt;/strong&gt; back at your face a similar amount). But it’s tedious…and exactly the kind of pattern-matching a language model is good at.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;SousChef&lt;/strong&gt; is a Python-based CLI tool I’ve been building to do that first pass for you. You hand it an obfuscated payload, it asks a &lt;em&gt;&lt;strong&gt;local&lt;/strong&gt;&lt;/em&gt; Ollama model what the recipe should look like, sanitizes and validates the model’s output against a known operation catalog, and hands you back a CyberChef URL with the recipe already loaded. The payload itself stays on-device.&lt;/p&gt;
&lt;p&gt;It lives on GitHub at &lt;strong&gt;&lt;a href=&quot;https://github.com/zerber0s/souschef&quot;&gt;github.com/zerber0s/souschef&lt;/a&gt;&lt;/strong&gt;. Fair warning up front though that it’s &lt;strong&gt;experimental&lt;/strong&gt; - the prompt is still being tuned / battle-tested and there are some quirks (more on those below).&lt;/p&gt;
&lt;h2 id=&quot;why-i-built-it&quot;&gt;Why I built it&lt;/h2&gt;
&lt;p&gt;DFIR triage on encoded samples is a lot of mechanical work. Most “interesting” payloads I see in the wild aren’t doing anything novel cryptographically, they’re usually just stacking 3-4 well-known wrappers (base64 → UTF-16LE → gzip → XOR, etc.) and hoping the layering buys time. The slow part isn’t decoding any single layer, but identifying which layers are present and in what order. Then you leverage a tool like CyberChef to make it human-readable.&lt;/p&gt;
&lt;p&gt;The other piece is sample sensitivity. Half of the obfuscated content I’d actually want a model’s opinion on (even malware) is stuff I can’t paste into a hosted API. This could be Client data or PII-adjacent and, usually being part of an active engagement, the unknowns need to limit how you handle the data. Knowing others face this same sceanrio, the design constraint was always “this has to run on the analyst’s machine, on a model the analyst controls.” The tool can even run against a local instance of CyberChef for the most sensitive of situations. Ollama running &lt;a href=&quot;https://ollama.com/library/qwen3-coder&quot;&gt;qwen3-coder:30b&lt;/a&gt; locally turned out to be a reasonable sweet spot on Apple Silicon: code-tuned and disciplined enough about structured output to produce parseable recipe JSON most of the time.&lt;/p&gt;
&lt;h2 id=&quot;how-it-works&quot;&gt;How it works&lt;/h2&gt;
&lt;p&gt;End-to-end, one run looks like this:&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Input&lt;/strong&gt; → &lt;strong&gt;Local model&lt;/strong&gt; → &lt;strong&gt;Parse &amp;amp; repair&lt;/strong&gt; → &lt;strong&gt;Sanitize&lt;/strong&gt; → &lt;strong&gt;Normalize&lt;/strong&gt; → &lt;strong&gt;Heuristics&lt;/strong&gt; → &lt;strong&gt;Confidence&lt;/strong&gt; → &lt;strong&gt;Output&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;Each step expanded:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;&lt;strong&gt;Input&lt;/strong&gt; - a file, a stdin pipe, or a &lt;code&gt;--input&lt;/code&gt; string. The same blob you’d paste into CyberChef.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Local model&lt;/strong&gt; - SousChef sends the payload plus a fairly large system prompt to Ollama. The system prompt encodes the CyberChef operation catalog (~122 ops) it can use, a set of &lt;strong&gt;few-shots&lt;/strong&gt; (odd LLM lingo for examples) covering common DFIR patterns, and rules about argument formating / shape.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Recipe parsing &amp;amp; repair&lt;/strong&gt; - the model returns JSON. SousChef automatically handles fence markers, dangling brackets, and the usual LLM output noise, then parses out the recipe.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Sanitization&lt;/strong&gt; - anything that looks like a PowerShell execution sink (&lt;code&gt;IEX&lt;/code&gt;, &lt;code&gt;Invoke-Expression&lt;/code&gt;, trailing &lt;code&gt;&amp;amp;&lt;/code&gt; calls) is stripped. These aren’t CyberChef ops, so if the model emits them, it’s confused about the boundary between “decode this” and “run this.”&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Argument normalization&lt;/strong&gt; - coerces each op’s arguments into CyberChef’s exact positional format. This is the part that bit me hardest in early testing (see the “Where it is today” section below).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Heuristic detectors&lt;/strong&gt; - a panel of currently ~11 small checks runs over the recipe and a Python-side simulation of its output. They flag things like “the output is still mostly non-printable, you probably need another XOR layer” or “these two ops cancel each other out.”&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Confidence scoring&lt;/strong&gt; - rolls everything up into &lt;code&gt;HIGH&lt;/code&gt; / &lt;code&gt;MEDIUM&lt;/code&gt; / &lt;code&gt;LOW&lt;/code&gt; with a list of actionable signals.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Output&lt;/strong&gt; - assembles a CyberChef URL fragment, prints it, optionally copies it to the clipboard, optionally opens it in a browser.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The value this tool brings at a high-level:&lt;/p&gt;
&lt;div class=&quot;feature-grid&quot; data-astro-cid-p7n436wq&gt; &lt;div class=&quot;feature-card&quot; data-astro-cid-p7n436wq&gt; &lt;div class=&quot;icon&quot; data-astro-cid-p7n436wq&gt;🔒&lt;/div&gt; &lt;h4 data-astro-cid-p7n436wq&gt;Runs entirely offline&lt;/h4&gt;  &lt;p data-astro-cid-p7n436wq&gt;Samples are sent to a local Ollama model on your machine. No cloud APIs, no third-party telemetry.&lt;/p&gt; &lt;/div&gt;&lt;div class=&quot;feature-card&quot; data-astro-cid-p7n436wq&gt; &lt;div class=&quot;icon&quot; data-astro-cid-p7n436wq&gt;🧪&lt;/div&gt; &lt;h4 data-astro-cid-p7n436wq&gt;Heuristic validation&lt;/h4&gt;  &lt;p data-astro-cid-p7n436wq&gt;A panel of small Python checks flags missing layers, redundant op pairs, and garbage output before you click the URL.&lt;/p&gt; &lt;/div&gt;&lt;div class=&quot;feature-card&quot; data-astro-cid-p7n436wq&gt; &lt;div class=&quot;icon&quot; data-astro-cid-p7n436wq&gt;📚&lt;/div&gt; &lt;h4 data-astro-cid-p7n436wq&gt;Operation-catalog enforcement&lt;/h4&gt;  &lt;p data-astro-cid-p7n436wq&gt;Recipes are constrained to the known CyberChef op set. Hallucinated ops get caught at parse time, not in your browser.&lt;/p&gt; &lt;/div&gt;&lt;div class=&quot;feature-card&quot; data-astro-cid-p7n436wq&gt; &lt;div class=&quot;icon&quot; data-astro-cid-p7n436wq&gt;📊&lt;/div&gt; &lt;h4 data-astro-cid-p7n436wq&gt;Confidence scoring&lt;/h4&gt;  &lt;p data-astro-cid-p7n436wq&gt;Every run produces a HIGH / MEDIUM / LOW signal with a short list of &quot;why&quot; and &quot;what to check next.&quot;&lt;/p&gt; &lt;/div&gt;&lt;div class=&quot;feature-card&quot; data-astro-cid-p7n436wq&gt; &lt;div class=&quot;icon&quot; data-astro-cid-p7n436wq&gt;🔗&lt;/div&gt; &lt;h4 data-astro-cid-p7n436wq&gt;Browser-ready URLs&lt;/h4&gt;  &lt;p data-astro-cid-p7n436wq&gt;Terminal output contains a CyberChef URL fragment with the recipe pre-loaded. Can be configured to auto-open in browser as well.&lt;/p&gt; &lt;/div&gt;&lt;div class=&quot;feature-card&quot; data-astro-cid-p7n436wq&gt; &lt;div class=&quot;icon&quot; data-astro-cid-p7n436wq&gt;🛰️&lt;/div&gt; &lt;h4 data-astro-cid-p7n436wq&gt;Air-gap friendly&lt;/h4&gt;  &lt;p data-astro-cid-p7n436wq&gt;A &lt;code&gt;--cyberchef&lt;/code&gt; flag points the URL at a self-hosted CyberChef instance for sensitive engagements.&lt;/p&gt; &lt;/div&gt; &lt;/div&gt;
&lt;h2 id=&quot;what-i-tested-it-against&quot;&gt;What I tested it against&lt;/h2&gt;
&lt;p&gt;All testing was performed against a mix of benign sample data, generated by AI from known techniques / things I have seen in the field, and malicious samples pulled from public repositories such as &lt;a href=&quot;https://www.virustotal.com/gui/home/upload&quot;&gt;VirusTotal&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;A representative slice of what works end-to-end today:&lt;/p&gt;

















































&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Family&lt;/th&gt;&lt;th&gt;Shape&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;PowerShell &lt;code&gt;-EncodedCommand&lt;/code&gt; / &lt;code&gt;-enc&lt;/code&gt;&lt;/td&gt;&lt;td&gt;UTF-16LE base64 wrappers, with and without inner layers&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Empire-style multi-layer&lt;/td&gt;&lt;td&gt;&lt;code&gt;$s1 + $s2&lt;/code&gt; substitution + base64 + UTF-16LE + gzip + single-byte XOR&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Invoke-Obfuscation &lt;code&gt;COMPRESS&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Reversed base64 + DeflateStream&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;AES-CBC&lt;/td&gt;&lt;td&gt;&lt;code&gt;AesCryptoServiceProvider&lt;/code&gt; with key/IV extraction&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;RC4&lt;/td&gt;&lt;td&gt;Passphrase-keyed, base64-wrapped payloads&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;ChaCha20&lt;/td&gt;&lt;td&gt;Stream-cipher payloads&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Charcode + XOR&lt;/td&gt;&lt;td&gt;&lt;code&gt;@(N,N,N) | %{ [char]($_ -bxor $k) }&lt;/code&gt; patterns&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Custom-alphabet base64&lt;/td&gt;&lt;td&gt;Paired &lt;code&gt;$std&lt;/code&gt; / &lt;code&gt;$norm&lt;/code&gt; translation tables&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Meterpreter format-string stagers&lt;/td&gt;&lt;td&gt;&lt;code&gt;-f&lt;/code&gt; operator with concatenation chains&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Bare ROT13’d-base64 blobs&lt;/td&gt;&lt;td&gt;Inner base64 alphabet ROT13’d before encoding, SousChef auto-detects and prepends &lt;code&gt;ROT13&lt;/code&gt; to the recipe&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;Full coverage list, including patterns explicitly out of scope (cmd.exe DOSfuscation, raw shellcode disassembly, identifier-renaming-only obfuscation), lives in the &lt;a href=&quot;https://github.com/zerber0s/souschef#coverage&quot;&gt;SousChef README&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Most of my recent debugging time has gone into samples where the obfuscation pattern looked extremely similar but had a small twist (i.e. a custom base64 alphabet whose decode was silently falling back to the standard alphabet, or an RC4 sample where the key was hex-encoded one way and the model assumed another). Those cases actually produced perfect recipes that just…gave you garbage. They’re the reason that the heuristic detector layer exists at all (in addition to some iterative assistance from Claude Code).&lt;/p&gt;
&lt;h2 id=&quot;where-it-is-today&quot;&gt;Where it is today&lt;/h2&gt;
&lt;aside class=&quot;note&quot;&gt;&lt;strong&gt;TL;DR - Still in testing.&lt;/strong&gt; The system prompt is pretty solid for the patterns listed above on &lt;code&gt;qwen3-coder:30b&lt;/code&gt;, but is still running through a lot fast and there are rough edges. Treat the URL as a starting point, not a 100% finished decode. There is some baked in feedback to SousChef’s terminal output to give a confidence level via scoring.&lt;/aside&gt;
&lt;p&gt;Honest status, as of the time of this post:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Verified end-to-end on &lt;code&gt;qwen3-coder:30b&lt;/code&gt; against the tested patterns.&lt;/strong&gt; Smaller models (7B, 13B) do tend to degrade, but gracefully (they generate plausible recipes but miss the trickier multi-layer cases). Larger models work fine if you have the RAM.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Argument normalizer is critical, not cosmetic.&lt;/strong&gt; CyberChef’s URL fragment parser expects positional arguments in an exact order, otherwise named-object arguments silently fall back to defaults. I was working alongside an unknown bug for a while where decoding custom-alphabet base64 looked successful but actually used the standard alphabet, only fixed by a stricter shape enforcement via the normalizer.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A few ops have non-obvious weird quirks.&lt;/strong&gt; &lt;code&gt;From Hex&lt;/code&gt; gets forced to the &lt;code&gt;Auto&lt;/code&gt; delimiter (handles dashes, spaces, colons, line breaks) and I have no clue why. &lt;code&gt;ROT13&lt;/code&gt; and &lt;code&gt;ROT47&lt;/code&gt; are purposely &lt;em&gt;not&lt;/em&gt; treated as terminal ops, since they’re legitimate middle steps in real chains. &lt;code&gt;Find / Replace&lt;/code&gt; is forced to global matching to work around UI-vs-URL inconsistencies in CyberChef itself. All of these determined through testing (and pain).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Out of scope situations end with a graceful fallback.&lt;/strong&gt; Bohannon-style cmd.exe DOSfuscation, raw shellcode disassembly, and identifier-renaming-only obfuscation don’t produce CyberChef recipes (even though I tried). In these and similar cases, the model is instructed to produce a &lt;code&gt;Comment&lt;/code&gt; op explaining why instead of guessing.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;All of the above is from me being only a few commits in. The system prompt is still the part most likely to change between sessions, which is also why I keep a more accessible static copy in the repo &lt;a href=&quot;https://github.com/zerber0s/souschef/blob/main/SYSTEM_PROMPT.md&quot;&gt;here&lt;/a&gt;. If you use this tool and something that worked yesterday doesn’t work today, the few-shot examples are the first place to look.&lt;/p&gt;
&lt;h2 id=&quot;try-it&quot;&gt;Try it&lt;/h2&gt;
&lt;div class=&quot;link-cards&quot; data-astro-cid-z2sybhdy&gt; &lt;a class=&quot;link-card&quot; href=&quot;https://github.com/zerber0s/souschef&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; data-astro-cid-z2sybhdy&gt; &lt;span class=&quot;icon&quot; aria-hidden=&quot;true&quot; data-astro-cid-z2sybhdy&gt; 📦 &lt;/span&gt; &lt;span class=&quot;text&quot; data-astro-cid-z2sybhdy&gt; &lt;span class=&quot;label&quot; data-astro-cid-z2sybhdy&gt;SousChef on GitHub&lt;/span&gt; &lt;span class=&quot;sub&quot; data-astro-cid-z2sybhdy&gt;Source, example payloads, README, and the system prompt mirror.&lt;/span&gt; &lt;/span&gt; &lt;span class=&quot;arrow&quot; aria-hidden=&quot;true&quot; data-astro-cid-z2sybhdy&gt;
↗
&lt;/span&gt; &lt;/a&gt;&lt;a class=&quot;link-card&quot; href=&quot;https://gchq.github.io/CyberChef/&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; data-astro-cid-z2sybhdy&gt; &lt;span class=&quot;icon&quot; aria-hidden=&quot;true&quot; data-astro-cid-z2sybhdy&gt; 🧪 &lt;/span&gt; &lt;span class=&quot;text&quot; data-astro-cid-z2sybhdy&gt; &lt;span class=&quot;label&quot; data-astro-cid-z2sybhdy&gt;CyberChef&lt;/span&gt; &lt;span class=&quot;sub&quot; data-astro-cid-z2sybhdy&gt;GCHQ&amp;#39;s swiss-army knife for decode/encode/compile pipelines.&lt;/span&gt; &lt;/span&gt; &lt;span class=&quot;arrow&quot; aria-hidden=&quot;true&quot; data-astro-cid-z2sybhdy&gt;
↗
&lt;/span&gt; &lt;/a&gt;&lt;a class=&quot;link-card&quot; href=&quot;https://ollama.com&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; data-astro-cid-z2sybhdy&gt; &lt;span class=&quot;icon&quot; aria-hidden=&quot;true&quot; data-astro-cid-z2sybhdy&gt; 🦙 &lt;/span&gt; &lt;span class=&quot;text&quot; data-astro-cid-z2sybhdy&gt; &lt;span class=&quot;label&quot; data-astro-cid-z2sybhdy&gt;Ollama&lt;/span&gt; &lt;span class=&quot;sub&quot; data-astro-cid-z2sybhdy&gt;Local LLM runtime. One-time install, then `ollama serve`&lt;/span&gt; &lt;/span&gt; &lt;span class=&quot;arrow&quot; aria-hidden=&quot;true&quot; data-astro-cid-z2sybhdy&gt;
↗
&lt;/span&gt; &lt;/a&gt;&lt;a class=&quot;link-card&quot; href=&quot;https://ollama.com/library/qwen3-coder&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; data-astro-cid-z2sybhdy&gt; &lt;span class=&quot;icon&quot; aria-hidden=&quot;true&quot; data-astro-cid-z2sybhdy&gt; 🤖 &lt;/span&gt; &lt;span class=&quot;text&quot; data-astro-cid-z2sybhdy&gt; &lt;span class=&quot;label&quot; data-astro-cid-z2sybhdy&gt;qwen3-coder model&lt;/span&gt; &lt;span class=&quot;sub&quot; data-astro-cid-z2sybhdy&gt;The default model I tested. 30B variant runs comfortably on Apple Silicon with &amp;gt;32GB RAM.&lt;/span&gt; &lt;/span&gt; &lt;span class=&quot;arrow&quot; aria-hidden=&quot;true&quot; data-astro-cid-z2sybhdy&gt;
↗
&lt;/span&gt; &lt;/a&gt; &lt;/div&gt;
&lt;p&gt;If you try it on a sample and the recipe is consistently wrong, or even sometimes, &lt;a href=&quot;https://github.com/zerber0s/souschef/issues&quot;&gt;file an issue&lt;/a&gt; with the input (sanitized as needed), the model you used, and what you expected the recipe to be. That’s how this thing will continue to improve - every weird sample is a regression test waiting to be added that’ll only enhance the accuracy of future submissions.&lt;/p&gt;
&lt;p&gt;I may update this post in the future, or write a follow-up, if this tool advances past the experimental phase.&lt;/p&gt;
&lt;p&gt;ZB&lt;/p&gt;</content:encoded><enclosure url="https://labs.zerberos.io/_astro/souschef-heroimage.CUVOKBvF.png" length="0" type="image/png"/></item><item><title>Running Claude Code Locally with LM Studio on Apple Silicon</title><link>https://labs.zerberos.io/blog/local-claude-code-lm-studio/</link><guid isPermaLink="true">https://labs.zerberos.io/blog/local-claude-code-lm-studio/</guid><description>How to redirect Claude Code from the hosted API to a local Qwen3-Coder model — and why everyone who tries this with Ollama silently breaks the agentic loop.</description><pubDate>Fri, 01 May 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;https://labs.zerberos.io/_astro/local-claude-code-heroimage.Ct71OV55.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;&lt;p&gt;Most guides for running Anthropic’s &lt;a href=&quot;https://code.claude.com/docs/en/overview&quot;&gt;Claude Code&lt;/a&gt; against a local model point you at &lt;a href=&quot;https://ollama.com&quot;&gt;Ollama&lt;/a&gt;, tell you to set a couple of env vars, and consider it done. Granted, it looks like it works - you can chat with it, but the agentic loop is silently broken. The model can talk to you, but it can’t actually &lt;em&gt;do&lt;/em&gt; anything. No file reads, no real tool calls, no multi-step task execution. Just a polite chatbot wearing Claude Code’s UI as a costume (and making my MacBook a mini space heater.)&lt;/p&gt;
&lt;p&gt;This post walks through a setup I eventually came to that actually works on Apple Silicon: &lt;strong&gt;&lt;a href=&quot;https://lmstudio.ai&quot;&gt;LM Studio&lt;/a&gt; + the Unsloth GGUF version of the Qwen3-Coder-30B-A3B model&lt;/strong&gt;, running entirely local on a &lt;strong&gt;14” M5 Pro MacBook Pro with 64GB of unified memory&lt;/strong&gt;. Full agentic loop, no API costs, no rate limits, no data leaving the machine.&lt;/p&gt;
&lt;h2 id=&quot;why-bother-running-it-locally&quot;&gt;Why bother running it locally&lt;/h2&gt;
&lt;p&gt;One of the key advantages of running local large language models (LLMs) is privacy, which can also be a key component in DFIR work. When dealing with sensitive Client info, or even malware, cloud models introduce risk and restrictions. I wanted to test for myself what these local models could do, which initially led me to beta testing Claude Code local on my MacBook Pro in the first place. If you also have capable hardware (and hate the direction of Anthropic’s pricing and plans) it can be worthwhile to explore local models for lighter agentic work.&lt;/p&gt;
&lt;p&gt;While this post doesn’t cover it, one of the other benefits to local models can be the ability to run “abliterated” versions, which are models that have had their refusal &amp;#x26; safety behavior weakened or removed after training. These can be very useful for malware decoding and analysis where normal cloud-based models, like OpenAI’s &lt;a href=&quot;https://chatgpt.com&quot;&gt;ChatGPT&lt;/a&gt; and Google’s &lt;a href=&quot;https://gemini.google.com&quot;&gt;Gemini&lt;/a&gt;, will refuse. These would be run independently, not via the Claude Code process outlined below.&lt;/p&gt;
&lt;h2 id=&quot;why-lm-studio-and-not-ollama&quot;&gt;Why LM Studio (and not Ollama)&lt;/h2&gt;
&lt;p&gt;This is the part I learned after about an hour of wondering why Ollama was spitting back gibberish to me after thinking on the question “&lt;em&gt;what files are in this directory&lt;/em&gt;?” for 5-10 minutes.&lt;/p&gt;
&lt;p&gt;Claude Code is built around Anthropic’s &lt;a href=&quot;https://docs.claude.com/en/api/messages&quot;&gt;Messages API&lt;/a&gt;, which uses structured &lt;code&gt;tool_use&lt;/code&gt; and &lt;code&gt;tool_result&lt;/code&gt; blocks for every agentic action. Essentially every Bash command, file read, and edit. The model’s response isn’t just text, it’s a sequence of typed content blocks that the CLI parses and dispatches.&lt;/p&gt;
&lt;p&gt;Ollama serves an OpenAI-compatible endpoint and translates Anthropic-shaped requests on the fly. That translation layer doesn’t preserve the tool-call blocks cleanly. The model emits something that &lt;em&gt;looks&lt;/em&gt; like a tool call, the adapter mangles it, Claude Code can’t parse it, and the agentic loop breaks. You get a model that says &lt;em&gt;“I’ll check that file for you”&lt;/em&gt; and then nothing happens. Super fun.&lt;/p&gt;
&lt;p&gt;LM Studio 0.4.1 added a &lt;strong&gt;native Anthropic Messages API&lt;/strong&gt; at &lt;code&gt;/v1/messages&lt;/code&gt;. Claude Code talks to it the same way it talks to Anthropic’s hosted API, and tool calls round-trip correctly. No adapter or translation needed.&lt;/p&gt;
&lt;aside class=&quot;note&quot;&gt;&lt;strong&gt;The other half of this:&lt;/strong&gt; even with LM Studio, you need a GGUF version that actually emits valid tool calls. The &lt;a href=&quot;https://huggingface.co/unsloth&quot;&gt;Unsloth&lt;/a&gt; uploads of Qwen3-Coder include patches that fix tool-calling bugs in the base release. Other uploads of the same model (e.g. lmstudio-community, mradermacher) predate those fixes and will misfire. More on this in Step 2.&lt;/aside&gt;
&lt;h2 id=&quot;prerequisites&quot;&gt;Prerequisites&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;macOS&lt;/strong&gt; (this walkthrough was done on macOS 26.4)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;LM Studio 0.4.1 or later&lt;/strong&gt; — earlier versions don’t expose the native Anthropic endpoint&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;An Anthropic account&lt;/strong&gt; — Pro, Max, Team, Enterprise, or Console. Free tier doesn’t include Claude Code access. You only need to authenticate once, then redirect everything local via env vars.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Terminal access&lt;/strong&gt; (Terminal, iTerm2, whatever you use)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Apple Silicon with enough unified memory for the model you want.&lt;/strong&gt; I’m on a 14” M5 Pro with 64GB. To maximize the value of this post, I have added in model recommendations below that scale by RAM tier.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;step-by-step-setup&quot;&gt;Step-by-step setup&lt;/h2&gt;
&lt;h3 id=&quot;step-1-install-claude-code&quot;&gt;Step 1: Install Claude Code&lt;/h3&gt;
&lt;div class=&quot;code-block-wrapper&quot;&gt;&lt;pre class=&quot;astro-code github-dark&quot; style=&quot;background-color:#24292e;color:#e1e4e8; overflow-x: auto;&quot; tabindex=&quot;0&quot; data-language=&quot;bash&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B392F0&quot;&gt;curl&lt;/span&gt;&lt;span style=&quot;color:#79B8FF&quot;&gt; -fsSL&lt;/span&gt;&lt;span style=&quot;color:#9ECBFF&quot;&gt; https://claude.ai/install.sh&lt;/span&gt;&lt;span style=&quot;color:#F97583&quot;&gt; |&lt;/span&gt;&lt;span style=&quot;color:#B392F0&quot;&gt; bash&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Verify the install:&lt;/p&gt;
&lt;div class=&quot;code-block-wrapper&quot;&gt;&lt;pre class=&quot;astro-code github-dark&quot; style=&quot;background-color:#24292e;color:#e1e4e8; overflow-x: auto;&quot; tabindex=&quot;0&quot; data-language=&quot;bash&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B392F0&quot;&gt;claude&lt;/span&gt;&lt;span style=&quot;color:#9ECBFF&quot;&gt; doctor&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;This checks installation health and surfaces config issues. On the first run it’ll authenticate against Anthropic and that’s expected. The redirect to local happens via env vars in Step 5.&lt;/p&gt;
&lt;h3 id=&quot;step-2-pick-and-download-a-model&quot;&gt;Step 2: Pick and download a model&lt;/h3&gt;
&lt;p&gt;The sad reality is what you can run depends on how much unified memory you have. Rough guide:&lt;/p&gt;



































&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;RAM&lt;/th&gt;&lt;th&gt;Recommended model&lt;/th&gt;&lt;th&gt;Quant&lt;/th&gt;&lt;th&gt;Size&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;24GB&lt;/td&gt;&lt;td&gt;Qwen3.5-35B-A3B&lt;/td&gt;&lt;td&gt;Q4_K_M&lt;/td&gt;&lt;td&gt;~22GB&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;64GB &lt;em&gt;&lt;strong&gt;(my setup)&lt;/strong&gt;&lt;/em&gt;&lt;/td&gt;&lt;td&gt;Qwen3-Coder-30B-A3B&lt;/td&gt;&lt;td&gt;UD Q4_K_XL&lt;/td&gt;&lt;td&gt;~17.67GB&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;64GB (more intensive)&lt;/td&gt;&lt;td&gt;Qwen3.5-27B dense&lt;/td&gt;&lt;td&gt;Q8_0&lt;/td&gt;&lt;td&gt;~30GB&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;128GB+&lt;/td&gt;&lt;td&gt;Qwen3-Coder-Next 80B&lt;/td&gt;&lt;td&gt;Q4_K_M&lt;/td&gt;&lt;td&gt;~48GB&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;I went with &lt;strong&gt;Qwen3-Coder-30B-A3B&lt;/strong&gt; for the agentic Claude Code use case. A few reasons for this:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Purpose-built for agentic coding&lt;/strong&gt;, tool calling, and multi-file reasoning&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Mixture of Experts (MoE) architecture&lt;/strong&gt; - 30B total params but only 3B active per token, so prefill is fast&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Native 256K context support&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;No “thinking” mode&lt;/strong&gt; - less overhead per turn, which matters when you’re firing off tool calls in a loop&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;In LM Studio’s model search, look for &lt;code&gt;Qwen3-Coder-30B-A3B&lt;/code&gt; and pick:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Author&lt;/strong&gt;: &lt;code&gt;unsloth&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Repo&lt;/strong&gt;: &lt;code&gt;Qwen3-Coder-30B-A3B-Instruct-GGUF&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Quant&lt;/strong&gt;: &lt;code&gt;UD Q4_K_XL&lt;/code&gt; (~17.67GB)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;code&gt;UD&lt;/code&gt; refers to Unsloth’s “dynamic” quantization, which uses layer-aware compression to retain more model quality than standard Q4 while staying around the same file size. On a 64GB MacBook such as mine, that should leave roughly 46GB available for macOS, KV cache, and other apps running alongside the model.&lt;/p&gt;
&lt;aside class=&quot;note&quot;&gt;&lt;strong&gt;Don&apos;t grab the wrong upload.&lt;/strong&gt; The lmstudio-community upload of this model is months older and predates the tool-calling fixes. If you do, your tool calls will silently fail. Stick with the &lt;code&gt;unsloth&lt;/code&gt; author. Also avoid fine-tuned variants (Huihui abliterated, etc.) for this use case as Claude Code expects the base instruct format.&lt;/aside&gt;
&lt;h3 id=&quot;step-3-configure-the-model-in-lm-studio&quot;&gt;Step 3: Configure the model in LM Studio&lt;/h3&gt;
&lt;p&gt;Once downloaded, open the model’s settings panel. There are two tabs that matter, &lt;strong&gt;Load&lt;/strong&gt; and &lt;strong&gt;Inference&lt;/strong&gt; (plus the prompt template). A lot of these settings were discovered by me through a combo of trial/error and research, validated by some Claude questions.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Load tab:&lt;/strong&gt;&lt;/p&gt;




























































&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Setting&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;th&gt;Notes&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Context Length&lt;/td&gt;&lt;td&gt;&lt;code&gt;32768&lt;/code&gt;&lt;/td&gt;&lt;td&gt;32K is the sweet spot. Push to &lt;code&gt;65536&lt;/code&gt; if you keep hitting limits.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;GPU Offload&lt;/td&gt;&lt;td&gt;&lt;code&gt;Max / -1&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Full Metal offload so model fits in unified memory.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Evaluation Batch Size&lt;/td&gt;&lt;td&gt;&lt;code&gt;1024&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Default is &lt;code&gt;512&lt;/code&gt;. Doubling this noticeably speeds up prefill, which is relevant for Claude Code’s large system prompt.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Unified KV Cache&lt;/td&gt;&lt;td&gt;&lt;code&gt;On&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Default, leave it.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Offload KV Cache to GPU&lt;/td&gt;&lt;td&gt;&lt;code&gt;On&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Default, leave it.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Keep Model in Memory&lt;/td&gt;&lt;td&gt;&lt;code&gt;On&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Avoids cold-load delays between sessions.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Flash Attention&lt;/td&gt;&lt;td&gt;&lt;code&gt;On&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Reduces memory pressure at long contexts.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;K/V Cache Quantization&lt;/td&gt;&lt;td&gt;&lt;code&gt;Off&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Experimental, leave off for stability.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Try mmap()&lt;/td&gt;&lt;td&gt;&lt;code&gt;On&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Default&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Number of Experts&lt;/td&gt;&lt;td&gt;&lt;code&gt;8&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Correct for this model, don’t change.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Inference tab:&lt;/strong&gt;&lt;/p&gt;













































&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Setting&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;th&gt;Notes&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Temperature&lt;/td&gt;&lt;td&gt;&lt;code&gt;0.7&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Qwen’s official recommendation for the Coder series.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Top K&lt;/td&gt;&lt;td&gt;&lt;code&gt;20&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Down from default &lt;code&gt;40&lt;/code&gt; - keeps tool calls tight.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Top P&lt;/td&gt;&lt;td&gt;&lt;code&gt;0.80&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Down from default &lt;code&gt;0.95&lt;/code&gt; - also Qwen’s recommendation.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Repeat Penalty&lt;/td&gt;&lt;td&gt;&lt;code&gt;1.05&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Down from default &lt;code&gt;1.1&lt;/code&gt; - discourages repetition in long sessions.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Min P&lt;/td&gt;&lt;td&gt;&lt;code&gt;Off / 0&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Disable. Can interfere with tool-call format.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Reasoning Section Parsing&lt;/td&gt;&lt;td&gt;&lt;code&gt;Off&lt;/code&gt;&lt;/td&gt;&lt;td&gt;This model has no &lt;code&gt;&amp;#x3C;think&gt;&lt;/code&gt; blocks.&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Structured Output&lt;/td&gt;&lt;td&gt;&lt;code&gt;Off&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Claude Code handles its own structure, enabling this breaks tool calls.&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Prompt Template tab:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The default Jinja template included with the GGUF version of this model uses an unsupported &lt;code&gt;safe&lt;/code&gt; filter, which causes an error on the first prompt. This error specifically was a major headache for me to identify, but luckily was an easy fix.&lt;/p&gt;
&lt;div class=&quot;code-block-wrapper&quot;&gt;&lt;pre class=&quot;astro-code github-dark&quot; style=&quot;background-color:#24292e;color:#e1e4e8; overflow-x: auto;&quot; tabindex=&quot;0&quot; data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;[ERROR] Error rendering prompt with jinja template:&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span&gt;&quot;Unknown StringValue filter: safe&quot;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Fix it manually:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Switch from &lt;strong&gt;Template (Jinja)&lt;/strong&gt; to &lt;strong&gt;Manual&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Pick &lt;strong&gt;ChatML&lt;/strong&gt; from the dropdown&lt;/li&gt;
&lt;li&gt;Confirm the start/end tags populate as &lt;code&gt;&amp;#x3C;|im_start|&gt;&lt;/code&gt; / &lt;code&gt;&amp;#x3C;|im_end|&gt;&lt;/code&gt; for system, user, and assistant&lt;/li&gt;
&lt;li&gt;Confirm stop strings include &lt;code&gt;&amp;#x3C;|im_start|&gt;&lt;/code&gt; and &lt;code&gt;&amp;#x3C;|im_end|&gt;&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Eject and reload the model&lt;/li&gt;
&lt;/ol&gt;
&lt;aside class=&quot;note&quot;&gt;If you ever see the model start &lt;em&gt;echoing back&lt;/em&gt; chunks of its own previous output or running on past where it should stop, the stop strings probably didn&apos;t carry over when you switched template modes. Re-check this tab.&lt;/aside&gt;
&lt;h3 id=&quot;step-4-start-the-lm-studio-server&quot;&gt;Step 4: Start the LM Studio server&lt;/h3&gt;
&lt;p&gt;Either flip the server toggle on in LM Studio’s &lt;strong&gt;Developer&lt;/strong&gt; tab, or start it from the terminal:&lt;/p&gt;
&lt;div class=&quot;code-block-wrapper&quot;&gt;&lt;pre class=&quot;astro-code github-dark&quot; style=&quot;background-color:#24292e;color:#e1e4e8; overflow-x: auto;&quot; tabindex=&quot;0&quot; data-language=&quot;bash&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B392F0&quot;&gt;lms&lt;/span&gt;&lt;span style=&quot;color:#9ECBFF&quot;&gt; server&lt;/span&gt;&lt;span style=&quot;color:#9ECBFF&quot;&gt; start&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Default port is &lt;code&gt;1234&lt;/code&gt;. Verify the model is loaded and reachable:&lt;/p&gt;
&lt;div class=&quot;code-block-wrapper&quot;&gt;&lt;pre class=&quot;astro-code github-dark&quot; style=&quot;background-color:#24292e;color:#e1e4e8; overflow-x: auto;&quot; tabindex=&quot;0&quot; data-language=&quot;bash&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B392F0&quot;&gt;curl&lt;/span&gt;&lt;span style=&quot;color:#9ECBFF&quot;&gt; http://localhost:1234/v1/models&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Expected output:&lt;/p&gt;
&lt;div class=&quot;code-block-wrapper&quot;&gt;&lt;pre class=&quot;astro-code github-dark&quot; style=&quot;background-color:#24292e;color:#e1e4e8; overflow-x: auto;&quot; tabindex=&quot;0&quot; data-language=&quot;json&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt;{&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#79B8FF&quot;&gt;  &quot;data&quot;&lt;/span&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt;: [&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt;    {&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#79B8FF&quot;&gt;      &quot;id&quot;&lt;/span&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt;: &lt;/span&gt;&lt;span style=&quot;color:#9ECBFF&quot;&gt;&quot;qwen3-coder-30b-a3b-instruct&quot;&lt;/span&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#79B8FF&quot;&gt;      &quot;object&quot;&lt;/span&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt;: &lt;/span&gt;&lt;span style=&quot;color:#9ECBFF&quot;&gt;&quot;model&quot;&lt;/span&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt;,&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#79B8FF&quot;&gt;      &quot;owned_by&quot;&lt;/span&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt;: &lt;/span&gt;&lt;span style=&quot;color:#9ECBFF&quot;&gt;&quot;organization_owner&quot;&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt;    }&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt;  ]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt;}&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;&lt;strong&gt;Write down the exact &lt;code&gt;id&lt;/code&gt; value&lt;/strong&gt; - it should match the name of the model you chose and you’ll need it character-for-character for the env var in the next step.&lt;/p&gt;
&lt;p&gt;While you’re in the Developer tab, two server settings worth tweaking:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Just-in-Time Model Loading:&lt;/strong&gt; &lt;code&gt;Off&lt;/code&gt; — keeps the model resident in memory between Claude Code prompts instead of re-loading on each request.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Require Authentication:&lt;/strong&gt; &lt;code&gt;Off&lt;/code&gt; — a dummy token works fine locally, so no need for the overhead.&lt;/li&gt;
&lt;/ul&gt;
&lt;h3 id=&quot;step-5-set-environment-variables&quot;&gt;Step 5: Set environment variables&lt;/h3&gt;
&lt;p&gt;There are two options we can do here - permanent global changes via &lt;code&gt;.zshrc&lt;/code&gt;, or a shell script you run per terminal session. Hardcoding global changes will force Claude Code to always run locally, while the script method is something you can run before each intended local session, allowing Claude Code to automatically default back to cloud models in future sessions (Credit to KW 🐍).&lt;/p&gt;
&lt;p&gt;The tradeoff is convenience vs control. Leveraging &lt;code&gt;.zshrc&lt;/code&gt; is simpler to set up once and forget, but the script approach lets you switch between local and hosted cleanly. I am outlining both methods below, up to you what you think works best for your situation.&lt;/p&gt;
&lt;h4 id=&quot;option-1-dedicated-shell-script&quot;&gt;&lt;strong&gt;Option 1: Dedicated shell script&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Create a file like &lt;code&gt;~/scripts/local-claude.sh&lt;/code&gt;:&lt;/p&gt;
&lt;div class=&quot;code-block-wrapper&quot;&gt;&lt;pre class=&quot;astro-code github-dark&quot; style=&quot;background-color:#24292e;color:#e1e4e8; overflow-x: auto;&quot; tabindex=&quot;0&quot; data-language=&quot;bash&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B392F0&quot;&gt;mkdir&lt;/span&gt;&lt;span style=&quot;color:#79B8FF&quot;&gt; -p&lt;/span&gt;&lt;span style=&quot;color:#9ECBFF&quot;&gt; ~/scripts&lt;/span&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt; &amp;#x26;&amp;#x26; &lt;/span&gt;&lt;span style=&quot;color:#B392F0&quot;&gt;nano&lt;/span&gt;&lt;span style=&quot;color:#9ECBFF&quot;&gt; ~/scripts/local-claude.sh&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The script should contain something like this (specific to the model you chose):&lt;/p&gt;
&lt;div class=&quot;code-block-wrapper&quot;&gt;&lt;pre class=&quot;astro-code github-dark&quot; style=&quot;background-color:#24292e;color:#e1e4e8; overflow-x: auto;&quot; tabindex=&quot;0&quot; data-language=&quot;bash&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A737D&quot;&gt;#!/bin/zsh&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#F97583&quot;&gt;export&lt;/span&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt; ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span style=&quot;color:#F97583&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt;http://localhost:1234&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#F97583&quot;&gt;export&lt;/span&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt; ANTHROPIC_AUTH_TOKEN&lt;/span&gt;&lt;span style=&quot;color:#F97583&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt;lmstudio&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#F97583&quot;&gt;export&lt;/span&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt; ANTHROPIC_MODEL&lt;/span&gt;&lt;span style=&quot;color:#F97583&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt;qwen3-coder-30b-a3b-instruct&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#79B8FF&quot;&gt;echo&lt;/span&gt;&lt;span style=&quot;color:#9ECBFF&quot;&gt; &quot;Claude Code → local LM Studio&quot;&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Ensure the script is executable after saving:&lt;/p&gt;
&lt;div class=&quot;code-block-wrapper&quot;&gt;&lt;pre class=&quot;astro-code github-dark&quot; style=&quot;background-color:#24292e;color:#e1e4e8; overflow-x: auto;&quot; tabindex=&quot;0&quot; data-language=&quot;bash&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B392F0&quot;&gt;chmod&lt;/span&gt;&lt;span style=&quot;color:#9ECBFF&quot;&gt; +x&lt;/span&gt;&lt;span style=&quot;color:#9ECBFF&quot;&gt; ~/scripts/local-claude.sh&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Then invoke it per-session when needed:&lt;/p&gt;
&lt;div class=&quot;code-block-wrapper&quot;&gt;&lt;pre class=&quot;astro-code github-dark&quot; style=&quot;background-color:#24292e;color:#e1e4e8; overflow-x: auto;&quot; tabindex=&quot;0&quot; data-language=&quot;bash&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#79B8FF&quot;&gt;source&lt;/span&gt;&lt;span style=&quot;color:#9ECBFF&quot;&gt; ~/scripts/local-claude.sh&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B392F0&quot;&gt;claude&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Opening a fresh terminal without sourcing the script gives you the default hosted API back immediately. No editing files, no unsetting variables manually.&lt;/p&gt;
&lt;h4 id=&quot;option-2-persistent-changes-via-zshrc&quot;&gt;&lt;strong&gt;Option 2: Persistent changes via &lt;code&gt;.zshrc&lt;/code&gt;&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Open your shell rc file:&lt;/p&gt;
&lt;div class=&quot;code-block-wrapper&quot;&gt;&lt;pre class=&quot;astro-code github-dark&quot; style=&quot;background-color:#24292e;color:#e1e4e8; overflow-x: auto;&quot; tabindex=&quot;0&quot; data-language=&quot;bash&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B392F0&quot;&gt;nano&lt;/span&gt;&lt;span style=&quot;color:#9ECBFF&quot;&gt; ~/.zshrc&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Add these three lines at the bottom (using the model you chose):&lt;/p&gt;
&lt;div class=&quot;code-block-wrapper&quot;&gt;&lt;pre class=&quot;astro-code github-dark&quot; style=&quot;background-color:#24292e;color:#e1e4e8; overflow-x: auto;&quot; tabindex=&quot;0&quot; data-language=&quot;bash&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#F97583&quot;&gt;export&lt;/span&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt; ANTHROPIC_BASE_URL&lt;/span&gt;&lt;span style=&quot;color:#F97583&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt;http://localhost:1234&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#F97583&quot;&gt;export&lt;/span&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt; ANTHROPIC_AUTH_TOKEN&lt;/span&gt;&lt;span style=&quot;color:#F97583&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt;lmstudio&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#F97583&quot;&gt;export&lt;/span&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt; ANTHROPIC_MODEL&lt;/span&gt;&lt;span style=&quot;color:#F97583&quot;&gt;=&lt;/span&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt;qwen3-coder-30b-a3b-instruct&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The &lt;code&gt;AUTH_TOKEN&lt;/code&gt; value doesn’t matter (LM Studio doesn’t validate it locally) but Claude Code refuses to start without one set. The &lt;code&gt;MODEL&lt;/code&gt; value must match the &lt;code&gt;id&lt;/code&gt; from Step 4 exactly.&lt;/p&gt;
&lt;p&gt;Save (&lt;code&gt;Control+O&lt;/code&gt;, Enter, &lt;code&gt;Control+X&lt;/code&gt;), then apply to the current shell:&lt;/p&gt;
&lt;div class=&quot;code-block-wrapper&quot;&gt;&lt;pre class=&quot;astro-code github-dark&quot; style=&quot;background-color:#24292e;color:#e1e4e8; overflow-x: auto;&quot; tabindex=&quot;0&quot; data-language=&quot;bash&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#79B8FF&quot;&gt;source&lt;/span&gt;&lt;span style=&quot;color:#9ECBFF&quot;&gt; ~/.zshrc&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Verify:&lt;/p&gt;
&lt;div class=&quot;code-block-wrapper&quot;&gt;&lt;pre class=&quot;astro-code github-dark&quot; style=&quot;background-color:#24292e;color:#e1e4e8; overflow-x: auto;&quot; tabindex=&quot;0&quot; data-language=&quot;bash&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#79B8FF&quot;&gt;echo&lt;/span&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt; $ANTHROPIC_BASE_URL&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#6A737D&quot;&gt;# http://localhost:1234&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;aside class=&quot;note&quot;&gt;For this &lt;code&gt;.zshrc&lt;/code&gt; method, if you forget to &lt;code&gt;source&lt;/code&gt;, the new env vars only take effect in &lt;em&gt;new&lt;/em&gt; terminal windows. Existing windows still point at Anthropic&apos;s hosted API, and Claude Code will quietly bill you instead of routing local. Always re-verify with &lt;code&gt;echo&lt;/code&gt; before launching &lt;code&gt;claude&lt;/code&gt; in a session you care about. You can also see what model is being used when Claude Code loads.&lt;/aside&gt;
&lt;h3 id=&quot;step-6-launch-claude-code&quot;&gt;Step 6: Launch Claude Code&lt;/h3&gt;
&lt;div class=&quot;code-block-wrapper&quot;&gt;&lt;pre class=&quot;astro-code github-dark&quot; style=&quot;background-color:#24292e;color:#e1e4e8; overflow-x: auto;&quot; tabindex=&quot;0&quot; data-language=&quot;bash&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#79B8FF&quot;&gt;cd&lt;/span&gt;&lt;span style=&quot;color:#9ECBFF&quot;&gt; /your/project&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B392F0&quot;&gt;claude&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;The very first thing to check: &lt;strong&gt;the bottom-left of the Claude Code UI.&lt;/strong&gt; If it shows the LM Studio model id (e.g. &lt;code&gt;qwen3-coder-30b-a3b-instruct&lt;/code&gt;), you’re routed local. If it still shows something like &lt;code&gt;Sonnet 4.6 · API Usage Billing&lt;/code&gt;, the env vars didn’t take in this terminal session - back to Step 5 you go.&lt;/p&gt;
&lt;p&gt;I have seen that it’s advisable to set effort to low for routine tasks - local models can’t match hosted Sonnet at high effort, and &lt;code&gt;low&lt;/code&gt; is the sweet spot for prefill speed:&lt;/p&gt;
&lt;div class=&quot;code-block-wrapper&quot;&gt;&lt;pre class=&quot;astro-code github-dark&quot; style=&quot;background-color:#24292e;color:#e1e4e8; overflow-x: auto;&quot; tabindex=&quot;0&quot; data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;/effort low&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Smoke-test the agentic loop with something concrete:&lt;/p&gt;
&lt;div class=&quot;code-block-wrapper&quot;&gt;&lt;pre class=&quot;astro-code github-dark&quot; style=&quot;background-color:#24292e;color:#e1e4e8; overflow-x: auto;&quot; tabindex=&quot;0&quot; data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;what files are in this directory?&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Watch LM Studio’s developer logs, which are visible under the &lt;strong&gt;Developer&lt;/strong&gt; tab. You should see prefill, generation, and a tool call go out. Claude Code should also come back with actual filenames, not a description of what it would do &lt;em&gt;if&lt;/em&gt; it could read files. If the model just narrates what it’s about to do without anything happening, the tool-call format is broken and you’ll need to re-check the GGUF (Step 2) and the prompt template (Step 3). The speed at which Claude Code responds will also be heavily dependent on your hardware and the model you chose.&lt;/p&gt;
&lt;p&gt;Finally, you can generate a &lt;code&gt;CLAUDE.md&lt;/code&gt; for your project if you’re already in the folder you wish to code within:&lt;/p&gt;
&lt;div class=&quot;code-block-wrapper&quot;&gt;&lt;pre class=&quot;astro-code github-dark&quot; style=&quot;background-color:#24292e;color:#e1e4e8; overflow-x: auto;&quot; tabindex=&quot;0&quot; data-language=&quot;plaintext&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span&gt;/init&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Claude Code reads this file on every session start, which lets the model skip a chunk of the cold-start exploration it did initially.&lt;/p&gt;
&lt;h2 id=&quot;performance-expectations&quot;&gt;Performance expectations&lt;/h2&gt;
&lt;p&gt;For my setup (&lt;strong&gt;M5 Pro, 64GB, Qwen3-Coder-30B-A3B UD Q4_K_XL&lt;/strong&gt;) this is what I found as the consensus online, which helps me ensure everything is working as it should:&lt;/p&gt;













































&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Metric&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Model size in memory&lt;/td&gt;&lt;td&gt;~17.67GB&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;macOS overhead&lt;/td&gt;&lt;td&gt;~8–10GB&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Total memory pressure&lt;/td&gt;&lt;td&gt;~26–28GB &lt;em&gt;(comfortable on 64GB)&lt;/em&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;GPU utilization during inference&lt;/td&gt;&lt;td&gt;~100%&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;GPU power draw&lt;/td&gt;&lt;td&gt;~33W&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;GPU temperature under load&lt;/td&gt;&lt;td&gt;~91°C &lt;em&gt;(safe — M5 Pro throttles around 105°C)&lt;/em&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Prefill speed&lt;/td&gt;&lt;td&gt;~100 tok/s&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;First response time (cold)&lt;/td&gt;&lt;td&gt;20–30 seconds (after &lt;code&gt;/effort low&lt;/code&gt;)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Subsequent responses&lt;/td&gt;&lt;td&gt;Faster - KV cache holds the session context&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;&lt;strong&gt;Why first responses feel slow:&lt;/strong&gt; Claude Code sends a 10–40K token system prompt at the start of every session. All of that has to be prefilled before your first answer comes back. Subsequent prompts in the same session reuse the KV cache and respond noticeably faster, which is why &lt;code&gt;/init&lt;/code&gt; and re-using the same session both pay off.&lt;/p&gt;
&lt;p&gt;The unified memory architecture is doing a lot of heavy lifting here, which is also why Apple’s Mac Mini and Mac Studio products have been flying off shelves lately. GPU and CPU share the same pool, so there’s no transfer bottleneck between discrete VRAM and system RAM the way there would be on a desktop with a dedicated card, such as a gaming PC.&lt;/p&gt;
&lt;h2 id=&quot;troubleshooting-the-pain&quot;&gt;Troubleshooting the pain&lt;/h2&gt;
&lt;p&gt;Since it could help to see my failures, below I listed out some of the specific problems I faced going through the initial setup and what I found out to fix them.&lt;/p&gt;
&lt;h4 id=&quot;issue-1-claude-code-still-shows-sonnet-46-after-setting-env-vars&quot;&gt;&lt;strong&gt;Issue 1: Claude Code still shows Sonnet 4.6 after setting env vars&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; Bottom-left of the UI still says &lt;code&gt;Sonnet 4.6 · API Usage Billing&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Cause:&lt;/strong&gt; Env vars not live in the current terminal, or &lt;code&gt;ANTHROPIC_MODEL&lt;/code&gt; wasn’t set.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; &lt;code&gt;echo $ANTHROPIC_BASE_URL&lt;/code&gt; to confirm it’s set, run &lt;code&gt;source ~/.zshrc&lt;/code&gt; or re-execute your shell script (whatever you chose in Step 5) if not. Confirm LM Studio is up with &lt;code&gt;curl http://localhost:1234/v1/models&lt;/code&gt;. Make sure &lt;code&gt;ANTHROPIC_MODEL&lt;/code&gt; matches the &lt;code&gt;id&lt;/code&gt; returned by that curl, character-for-character. Relaunch &lt;code&gt;claude&lt;/code&gt; from the same terminal.&lt;/p&gt;
&lt;h4 id=&quot;issue-2-first-response-takes-5-minutes&quot;&gt;&lt;strong&gt;Issue 2: First response takes 5+ minutes&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; Claude Code hangs for several minutes on the first prompt of a session.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Causes &amp;#x26; Fixes:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Multiple models loaded in LM Studio&lt;/strong&gt; - combined weight pushed past available RAM into swap. Eject everything except the model you’re using.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;High effort mode&lt;/strong&gt; — run &lt;code&gt;/effort low&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Cold prefill of the 10–40K-token system prompt&lt;/strong&gt; — this is normal, especially on the first prompt. Subsequent prompts are faster and &lt;code&gt;/init&lt;/code&gt; can help reduce it further.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Default batch size of 512&lt;/strong&gt; — bump to &lt;code&gt;1024&lt;/code&gt; in LM Studio Load settings.&lt;/li&gt;
&lt;/ul&gt;
&lt;h4 id=&quot;issue-3-jinja-template-error-on-first-prompt&quot;&gt;&lt;strong&gt;Issue 3: Jinja template error on first prompt&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; LM Studio dev logs show &lt;code&gt;Unknown StringValue filter: safe&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Cause:&lt;/strong&gt; The Qwen3-Coder GGUF ships a Jinja template that uses a filter LM Studio’s template engine doesn’t support.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Switch the prompt template from Jinja to &lt;code&gt;Manual&lt;/code&gt; —&gt; &lt;code&gt;ChatML&lt;/code&gt;, confirm the &lt;code&gt;im_start&lt;/code&gt;/&lt;code&gt;im_end&lt;/code&gt; tags and stop strings, eject and reload the model. Full steps in Step 3.&lt;/p&gt;
&lt;h4 id=&quot;issue-4-model-describes-actions-but-doesnt-actually-execute-them&quot;&gt;&lt;strong&gt;Issue 4: Model describes actions but doesn’t actually execute them&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; The model says &lt;em&gt;“I’ll read that file for you”&lt;/em&gt; and then…literally nothing. No file actually opened, no tool call in the LM Studio logs. Unhappy Zach.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Cause:&lt;/strong&gt; Either you’re somehow behind Ollama’s translation layer, or the GGUF you’re using predates the Unsloth tool-calling fixes.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Use LM Studio (not Ollama) for the native Anthropic endpoint, and use the &lt;strong&gt;unsloth&lt;/strong&gt; GGUF specifically (not lmstudio-community or mradermacher). This is the whole reason the post exists.&lt;/p&gt;
&lt;h4 id=&quot;issue-5-anthropic_model-value-doesnt-take-effect&quot;&gt;&lt;strong&gt;Issue 5: &lt;code&gt;ANTHROPIC_MODEL&lt;/code&gt; value doesn’t take effect&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; Claude Code routes to the wrong model or errors out at startup.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Fix:&lt;/strong&gt; Copy the &lt;code&gt;id&lt;/code&gt; straight from the &lt;code&gt;curl /v1/models&lt;/code&gt; response. The display name in LM Studio’s UI is sometimes formatted differently (version suffixes, capitalization) and the env var has to match the API &lt;code&gt;id&lt;/code&gt; exactly.&lt;/p&gt;
&lt;h2 id=&quot;tldr&quot;&gt;TL;DR&lt;/h2&gt;
&lt;p&gt;From what I’ve seen, it appears most people just point Claude Code at Ollama, set a couple of env vars, and call it done. It looks like it works, but the agentic loop can be annoyingly broken. &lt;strong&gt;LM Studio’s native Anthropic API + the Unsloth Qwen3-Coder GGUF&lt;/strong&gt; are what separated my ultimate working setup from a PoC.&lt;/p&gt;
&lt;p&gt;I have been playing around with this since getting it setup and it has been incredibly useful for local coding tasks with an agentic boost. While it will never be as powerful as a full cloud model, not every task needs it to be (which can also save me usage and a few $$ in API credits.)&lt;/p&gt;
&lt;p&gt;My next step, independent of Claude Code, is going to be exploring static malware analysis with “abliterated” models, such as deobfuscated complicated Base64 commands to determine their functionality. Additionally, I am hoping these models will allow me to dive deeper into &lt;em&gt;&lt;strong&gt;ethical&lt;/strong&gt;&lt;/em&gt; research related to different attack methodologies via malware generation.&lt;/p&gt;
&lt;p&gt;Enjoy those tokens.&lt;/p&gt;
&lt;p&gt;ZB&lt;/p&gt;</content:encoded><enclosure url="https://labs.zerberos.io/_astro/local-claude-code-heroimage.Ct71OV55.png" length="0" type="image/png"/></item><item><title>Introducing EIDVault: An EID Reference App Built by an Analyst, for Analysts</title><link>https://labs.zerberos.io/blog/eidvault-launch/</link><guid isPermaLink="true">https://labs.zerberos.io/blog/eidvault-launch/</guid><description>The announcement of my first iOS App.</description><pubDate>Mon, 20 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;https://labs.zerberos.io/_astro/eidvault-blog-banner-whitebg-1600x900.DcyJoBb7.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;&lt;p&gt;If you’ve ever found yourself three hours into an investigation, staring at Event ID (EID) 4624 Logon Type 10, trying to remember whether that’s the interactive one, the remote one, or the one you always have to Google (“GooGoo” as a colleague calls it) - this app is for you. It’s also, admittedly, for me.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;EIDVault&lt;/strong&gt; is an iOS app for digital forensic analysts and incident responders. It’s a quick-reference for Windows Event IDs, enriched with &lt;a href=&quot;https://attack.mitre.org&quot;&gt;MITRE ATT&amp;amp;CK&lt;/a&gt; mappings, detection rules, relevant XML fields, and investigation pivots.&lt;/p&gt;
&lt;p&gt;It’s live on the App Store now - &lt;a href=&quot;https://apps.apple.com/us/app/eidvault/id6761655272&quot;&gt;download EIDVault for iPhone &amp;amp; iPad&lt;/a&gt;.&lt;/p&gt;
&lt;h2 id=&quot;why-i-built-it&quot;&gt;Why I built it&lt;/h2&gt;
&lt;p&gt;There’s no shortage of excellent Windows event reference material on the internet - &lt;a href=&quot;https://learn.microsoft.com/en-us/&quot;&gt;Microsoft Learn&lt;/a&gt;, &lt;a href=&quot;https://www.ultimatewindowssecurity.com&quot;&gt;Ultimate Windows Security&lt;/a&gt;, a stack of bookmarked &lt;a href=&quot;https://www.sans.org&quot;&gt;SANS&lt;/a&gt; whitepapers, etc. What I kept wanting was something a little faster, something that could possibly live on my phone so I could look up an EID while on a call, skim/correlate related events, or even export out specific relevant information for use later.&lt;/p&gt;
&lt;p&gt;So I started drafting up ideas…before quickly realizing how much of a lift learning Swift and the intricacies of iOS app development would be from scratch. Then Apple decided they would &lt;a href=&quot;https://www.apple.com/newsroom/2026/02/xcode-26-point-3-unlocks-the-power-of-agentic-coding/&quot;&gt;add agentic coding to Xcode&lt;/a&gt; and, with that, eliminate all my excuses. So I built it (and I would encourage everyone interested to try the same.)&lt;/p&gt;
&lt;p&gt;The goals to start were pretty simple:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Fast lookup&lt;/strong&gt; — type an EID, get an answer.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Real context&lt;/strong&gt; — not just “Generated when a logon session is created”, but &lt;em&gt;what to correlate with, what’s noisy, what the key XML fields are, and how adversaries could abuse it.&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Offline-first&lt;/strong&gt; — the dataset ships inside the app. No login, no hoops.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Analyst-shaped&lt;/strong&gt; — built around how I actually use EIDs during an investigation.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;the-dataset&quot;&gt;The dataset&lt;/h2&gt;
&lt;p&gt;Everything the app displays is backed by a structured JSON dataset that lives in a &lt;strong&gt;public GitHub repo&lt;/strong&gt;:&lt;/p&gt;
&lt;p&gt;🧾 &lt;strong&gt;&lt;a href=&quot;https://github.com/zerber0s/windows-eid-data&quot;&gt;github.com/zerber0s/windows-eid-data&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;I split the data from the app intentionally. The app is essentially a lens while the dataset is the source of truth. An added benefit of this structure is EIDs can be tweaked, or even added to pre-existing log channels, without needing a cooresponding iOS app update. And as every analyst knows, the field of cybersecurity is &lt;em&gt;always&lt;/em&gt; changing. So if you spot an error, want to suggest a new event, see something out of date, or just think my investigation pivots for 4688 are missing something obvious (they probably are), that’s the place to raise it. &lt;strong&gt;Issues and PRs are open.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;The data is organized by log channel, one JSON file per channel - &lt;code&gt;security.json&lt;/code&gt;, &lt;code&gt;powershell.json&lt;/code&gt;, &lt;code&gt;sysmon.json&lt;/code&gt;, &lt;code&gt;kerberos.json&lt;/code&gt;, and so on. Every entry conforms to a published &lt;a href=&quot;https://github.com/zerber0s/windows-eid-data/blob/main/schema.json&quot;&gt;JSON schema&lt;/a&gt;, which keeps things predictable as the dataset grows.&lt;/p&gt;
&lt;p&gt;Below you can see what a single entry looks like - click through the tabs to see how the same JSON feeds different views inside the app:&lt;/p&gt;
&lt;div class=&quot;eid-preview&quot;&gt;&lt;div class=&quot;eid-header&quot;&gt;&lt;div class=&quot;eid-id&quot;&gt;&lt;span class=&quot;badge&quot;&gt;Security&lt;/span&gt;&lt;span class=&quot;eid-number&quot;&gt;EID 4624&lt;/span&gt;&lt;/div&gt;&lt;h4 class=&quot;eid-title&quot;&gt;An account was successfully logged on&lt;/h4&gt;&lt;div class=&quot;eid-tags&quot;&gt;&lt;span class=&quot;tag&quot;&gt;logon&lt;/span&gt;&lt;span class=&quot;tag&quot;&gt;authentication&lt;/span&gt;&lt;span class=&quot;tag mitre&quot;&gt;T1078 · Valid Accounts&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class=&quot;eid-tabs&quot; role=&quot;tablist&quot;&gt;&lt;button role=&quot;tab&quot; aria-selected=&quot;true&quot; class=&quot;eid-tab active&quot;&gt;Details&lt;/button&gt;&lt;button role=&quot;tab&quot; aria-selected=&quot;false&quot; class=&quot;eid-tab &quot;&gt;Key Fields&lt;/button&gt;&lt;button role=&quot;tab&quot; aria-selected=&quot;false&quot; class=&quot;eid-tab &quot;&gt;Pivots&lt;/button&gt;&lt;button role=&quot;tab&quot; aria-selected=&quot;false&quot; class=&quot;eid-tab &quot;&gt;Detections&lt;/button&gt;&lt;button role=&quot;tab&quot; aria-selected=&quot;false&quot; class=&quot;eid-tab &quot;&gt;Raw JSON&lt;/button&gt;&lt;/div&gt;&lt;div class=&quot;eid-body&quot;&gt;&lt;p&gt;Generated when a logon session is created on a system. The event is recorded on the machine being accessed and includes the account name, logon type, source network address, and authentication package used.&lt;/p&gt;&lt;/div&gt;&lt;div class=&quot;eid-footer&quot;&gt;&lt;span&gt;Preview · what the app renders from a single JSON entry&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;!--astro:end--&gt;
&lt;p&gt;Each field has a purpose. &lt;code&gt;details&lt;/code&gt; is the factual “what/when” - no directives, no “look for suspicious values.” That stuff lives in &lt;code&gt;notesGuidance.investigationPivots&lt;/code&gt;, so the app can render the two cleanly and separately: &lt;em&gt;here’s what the event is&lt;/em&gt;, and &lt;em&gt;here’s what to do with it during an investigation.&lt;/em&gt;&lt;/p&gt;
&lt;h2 id=&quot;what-the-app-actually-does&quot;&gt;What the app actually does&lt;/h2&gt;
&lt;p&gt;Inside EIDVault you’ll find:&lt;/p&gt;
&lt;div class=&quot;feature-grid&quot; data-astro-cid-p7n436wq&gt; &lt;div class=&quot;feature-card&quot; data-astro-cid-p7n436wq&gt; &lt;div class=&quot;icon&quot; data-astro-cid-p7n436wq&gt;🔎&lt;/div&gt; &lt;h4 data-astro-cid-p7n436wq&gt;Search &amp;amp; Browse&lt;/h4&gt;  &lt;p data-astro-cid-p7n436wq&gt;Browse by log channel or search across every EID, tag, and ATT&amp;CK tactic.&lt;/p&gt; &lt;/div&gt;&lt;div class=&quot;feature-card&quot; data-astro-cid-p7n436wq&gt; &lt;div class=&quot;icon&quot; data-astro-cid-p7n436wq&gt;🧠&lt;/div&gt; &lt;h4 data-astro-cid-p7n436wq&gt;Scenarios&lt;/h4&gt;  &lt;p data-astro-cid-p7n436wq&gt;An on-device AI tab powered by &lt;strong&gt;Apple Foundation Models&lt;/strong&gt;. Describe what you&apos;re seeing and on-device intelligence surfaces relevant EIDs. No network calls, no prompts leaving the device.&lt;/p&gt; &lt;/div&gt;&lt;div class=&quot;feature-card&quot; data-astro-cid-p7n436wq&gt; &lt;div class=&quot;icon&quot; data-astro-cid-p7n436wq&gt;🗺️&lt;/div&gt; &lt;h4 data-astro-cid-p7n436wq&gt;MITRE Mapping&lt;/h4&gt;  &lt;p data-astro-cid-p7n436wq&gt;Every applicable EID is tagged with ATT&amp;CK techniques and tactics, including direct links to MITRE&apos;s knowledge base.&lt;/p&gt; &lt;/div&gt;&lt;div class=&quot;feature-card&quot; data-astro-cid-p7n436wq&gt; &lt;div class=&quot;icon&quot; data-astro-cid-p7n436wq&gt;🛡️&lt;/div&gt; &lt;h4 data-astro-cid-p7n436wq&gt;Detection Rules&lt;/h4&gt;  &lt;p data-astro-cid-p7n436wq&gt;Inline Sigma, KQL, and Splunk rules where they exist - copy &amp; paste as a starting point, then tune.&lt;/p&gt; &lt;/div&gt;&lt;div class=&quot;feature-card&quot; data-astro-cid-p7n436wq&gt; &lt;div class=&quot;icon&quot; data-astro-cid-p7n436wq&gt;📎&lt;/div&gt; &lt;h4 data-astro-cid-p7n436wq&gt;Key Fields&lt;/h4&gt;  &lt;p data-astro-cid-p7n436wq&gt;The XML fields that matter for each event, with their xpaths, so you know what to grep for in raw EVTX.&lt;/p&gt; &lt;/div&gt;&lt;div class=&quot;feature-card&quot; data-astro-cid-p7n436wq&gt; &lt;div class=&quot;icon&quot; data-astro-cid-p7n436wq&gt;🔗&lt;/div&gt; &lt;h4 data-astro-cid-p7n436wq&gt;Related Events&lt;/h4&gt;  &lt;p data-astro-cid-p7n436wq&gt;Every entry cross-references the other EIDs you&apos;d want to pull into a timeline.&lt;/p&gt; &lt;/div&gt;&lt;div class=&quot;feature-card&quot; data-astro-cid-p7n436wq&gt; &lt;div class=&quot;icon&quot; data-astro-cid-p7n436wq&gt;📤&lt;/div&gt; &lt;h4 data-astro-cid-p7n436wq&gt;Markdown Exports&lt;/h4&gt;  &lt;p data-astro-cid-p7n436wq&gt;Built-in functionality to export out all EID data, or even just specific fields, to Markdown-formatted output. Useful for sharing or later use.&lt;/p&gt; &lt;/div&gt;&lt;div class=&quot;feature-card&quot; data-astro-cid-p7n436wq&gt; &lt;div class=&quot;icon&quot; data-astro-cid-p7n436wq&gt;📴&lt;/div&gt; &lt;h4 data-astro-cid-p7n436wq&gt;Fully Offline&lt;/h4&gt;  &lt;p data-astro-cid-p7n436wq&gt;The dataset is bundled. Works on a plane, in a SCIF-adjacent coffee shop (if that somehow applies to you), or wherever you answer pages from.&lt;/p&gt; &lt;/div&gt; &lt;/div&gt;
&lt;p&gt;The &lt;strong&gt;Scenarios&lt;/strong&gt; tab is probably the piece I’m most excited about. Running Apple’s on-device intelligence models means I get a meaningful “suggest EIDs for this situation” experience without sending a single byte to a third party. It is still &lt;strong&gt;experimental&lt;/strong&gt; and limited by the available on-device model context, but any DFIR tool that can run 100% local is a huge win. Obviously, &lt;strong&gt;those results will always need to be validated&lt;/strong&gt;, but it can be a great starting point or even just a useful discovery tool if you’re bored.&lt;/p&gt;
&lt;h2 id=&quot;why-the-data-repo-is-public-and-the-app-isnt&quot;&gt;Why the data repo is public (and the app isn’t)&lt;/h2&gt;
&lt;p&gt;The app source lives in a private repo - it’s my first shipped iOS app and I’d like room to iterate without anyone watching me rename various “View” Swift files or reassigning a log channel a different SF symbol six+ times. But the dataset is the part that benefits from more eyes, and the part that will keep improving long after the UI settles down. Making that public felt obvious.&lt;/p&gt;
&lt;p&gt;If you:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;find an event described incorrectly&lt;/li&gt;
&lt;li&gt;think an investigation pivot is wrong or missing&lt;/li&gt;
&lt;li&gt;want to propose a new channel (looking at you, AD FS nerds)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;…open an &lt;a href=&quot;https://github.com/zerber0s/windows-eid-data/issues&quot;&gt;issue&lt;/a&gt;. I’ll eventually read all of them.&lt;/p&gt;
&lt;h2 id=&quot;relevant-links&quot;&gt;Relevant links&lt;/h2&gt;
&lt;div class=&quot;link-cards&quot; data-astro-cid-z2sybhdy&gt; &lt;a class=&quot;link-card&quot; href=&quot;https://apps.apple.com/us/app/eidvault/id6761655272&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; data-astro-cid-z2sybhdy&gt; &lt;span class=&quot;icon&quot; aria-hidden=&quot;true&quot; data-astro-cid-z2sybhdy&gt; 📱 &lt;/span&gt; &lt;span class=&quot;text&quot; data-astro-cid-z2sybhdy&gt; &lt;span class=&quot;label&quot; data-astro-cid-z2sybhdy&gt;App Store&lt;/span&gt; &lt;span class=&quot;sub&quot; data-astro-cid-z2sybhdy&gt;EIDVault for iPhone &amp;amp; iPad&lt;/span&gt; &lt;/span&gt; &lt;span class=&quot;arrow&quot; aria-hidden=&quot;true&quot; data-astro-cid-z2sybhdy&gt;
↗
&lt;/span&gt; &lt;/a&gt;&lt;a class=&quot;link-card&quot; href=&quot;https://github.com/zerber0s/windows-eid-data&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; data-astro-cid-z2sybhdy&gt; &lt;span class=&quot;icon&quot; aria-hidden=&quot;true&quot; data-astro-cid-z2sybhdy&gt; 🧾 &lt;/span&gt; &lt;span class=&quot;text&quot; data-astro-cid-z2sybhdy&gt; &lt;span class=&quot;label&quot; data-astro-cid-z2sybhdy&gt;Data Repo&lt;/span&gt; &lt;span class=&quot;sub&quot; data-astro-cid-z2sybhdy&gt;zerber0s/windows-eid-data&lt;/span&gt; &lt;/span&gt; &lt;span class=&quot;arrow&quot; aria-hidden=&quot;true&quot; data-astro-cid-z2sybhdy&gt;
↗
&lt;/span&gt; &lt;/a&gt;&lt;a class=&quot;link-card&quot; href=&quot;https://www.linkedin.com/posts/zmb781_dfir-activity-7450189851229908992-4N61&quot; target=&quot;_blank&quot; rel=&quot;noopener noreferrer&quot; data-astro-cid-z2sybhdy&gt; &lt;span class=&quot;icon&quot; aria-hidden=&quot;true&quot; data-astro-cid-z2sybhdy&gt; 💼 &lt;/span&gt; &lt;span class=&quot;text&quot; data-astro-cid-z2sybhdy&gt; &lt;span class=&quot;label&quot; data-astro-cid-z2sybhdy&gt;Launch Post&lt;/span&gt; &lt;span class=&quot;sub&quot; data-astro-cid-z2sybhdy&gt;LinkedIn announcement&lt;/span&gt; &lt;/span&gt; &lt;span class=&quot;arrow&quot; aria-hidden=&quot;true&quot; data-astro-cid-z2sybhdy&gt;
↗
&lt;/span&gt; &lt;/a&gt; &lt;/div&gt;
&lt;p&gt;If you do give it a try, I’d love to hear what’s working, what’s missing, and what my overcaffinated brain got wrong. This is v1.0 and there’s a lot of room to grow. Also, the best direction usually comes from the people not coding all of this until 2am.&lt;/p&gt;
&lt;p&gt;Happy hunting.&lt;/p&gt;
&lt;p&gt;ZB&lt;/p&gt;</content:encoded><enclosure url="https://labs.zerberos.io/_astro/eidvault-blog-banner-whitebg-1600x900.DcyJoBb7.png" length="0" type="image/png"/></item><item><title>Building Zerberos Labs: Astro on Cloudflare Pages</title><link>https://labs.zerberos.io/blog/building-zerberos-labs/</link><guid isPermaLink="true">https://labs.zerberos.io/blog/building-zerberos-labs/</guid><description>Notes from standing up this blog — why Astro, the Cloudflare Pages setup, and some gotchas.</description><pubDate>Sun, 19 Apr 2026 00:00:00 GMT</pubDate><content:encoded>&lt;p&gt;&lt;img src=&quot;https://labs.zerberos.io/_astro/astro-cloudflare-heroimage.BMKi7fYv.png&quot; alt=&quot;&quot; /&gt;&lt;/p&gt;&lt;p&gt;After coming from WordPress previously, I figured I would document a quick write-up on what it took to stand this Astro-based blog up and why. If you’re thinking about doing the same, hopefully a few of the gotchas below save you the hour I lost to them (thank you Claude).&lt;/p&gt;
&lt;h2 id=&quot;why-astro-over-hugo&quot;&gt;Why Astro over Hugo&lt;/h2&gt;
&lt;p&gt;I started by doing some research to compare the two top web framework options, &lt;a href=&quot;https://astro.build&quot;&gt;Astro&lt;/a&gt; and &lt;a href=&quot;https://gohugo.io&quot;&gt;Hugo&lt;/a&gt;, against one another. Hugo is hard to beat for pure-blog use cases: single binary, no Node, fast builds. There were some tradeoffs however:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Hugo’s templating is limited to Go’s &lt;code&gt;html/template&lt;/code&gt;. No components, no JSX, no &lt;a href=&quot;https://docs.astro.build/en/guides/integrations-guide/react/&quot;&gt;React&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;There’s no clean path to embedding interactive UI (filterable tables, an EID lookup widget, etc.) without bolting on raw JS by hand (which I already barely understand to that extent).&lt;/li&gt;
&lt;li&gt;Astro uses an &lt;strong&gt;islands architecture&lt;/strong&gt;, which means it ships zero JS by default and only hydrates the components that need to be interactive.&lt;/li&gt;
&lt;li&gt;Astro supports React natively, so I can drop components anywhere, including inside a blog post.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;For a site I want to grow beyond pure blogging (DFIR tooling, embedded reference widgets, possibly a standalone EID lookup page, etc.), Astro is the better foundation. Hugo would have likely shipped the blog faster, but I’d have hit limitations on just post #2. There was also something satisfying about building something that could evolve with me over time and match my identity.&lt;/p&gt;
&lt;h2 id=&quot;prerequisites&quot;&gt;Prerequisites&lt;/h2&gt;
&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Node.js&lt;/strong&gt; (&lt;code&gt;brew install node&lt;/code&gt;, verify with &lt;code&gt;node -v&lt;/code&gt;) - write down your version, you’ll need it for Cloudflare&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Visual Studio Code&lt;/strong&gt; for editing&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;GitHub&lt;/strong&gt; account&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;Cloudflare&lt;/strong&gt; account with your domain already managed there (I leveraged Cloudflare Pages to host)&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;creating-the-project&quot;&gt;Creating the project&lt;/h2&gt;
&lt;div class=&quot;code-block-wrapper&quot;&gt;&lt;pre class=&quot;astro-code github-dark&quot; style=&quot;background-color:#24292e;color:#e1e4e8; overflow-x: auto;&quot; tabindex=&quot;0&quot; data-language=&quot;bash&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B392F0&quot;&gt;npm&lt;/span&gt;&lt;span style=&quot;color:#9ECBFF&quot;&gt; create&lt;/span&gt;&lt;span style=&quot;color:#9ECBFF&quot;&gt; astro@latest&lt;/span&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt; [blog-or-repo-name]&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Pick the &lt;strong&gt;blog&lt;/strong&gt; template when prompted, say yes to TypeScript, and let it install dependencies.&lt;/p&gt;
&lt;p&gt;Then add React for interactive island components:&lt;/p&gt;
&lt;div class=&quot;code-block-wrapper&quot;&gt;&lt;pre class=&quot;astro-code github-dark&quot; style=&quot;background-color:#24292e;color:#e1e4e8; overflow-x: auto;&quot; tabindex=&quot;0&quot; data-language=&quot;bash&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#79B8FF&quot;&gt;cd&lt;/span&gt;&lt;span style=&quot;color:#E1E4E8&quot;&gt; [blog-or-repo-name]&lt;/span&gt;&lt;/span&gt;
&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B392F0&quot;&gt;npx&lt;/span&gt;&lt;span style=&quot;color:#9ECBFF&quot;&gt; astro&lt;/span&gt;&lt;span style=&quot;color:#9ECBFF&quot;&gt; add&lt;/span&gt;&lt;span style=&quot;color:#9ECBFF&quot;&gt; react&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Run locally:&lt;/p&gt;
&lt;div class=&quot;code-block-wrapper&quot;&gt;&lt;pre class=&quot;astro-code github-dark&quot; style=&quot;background-color:#24292e;color:#e1e4e8; overflow-x: auto;&quot; tabindex=&quot;0&quot; data-language=&quot;bash&quot;&gt;&lt;code&gt;&lt;span class=&quot;line&quot;&gt;&lt;span style=&quot;color:#B392F0&quot;&gt;npm&lt;/span&gt;&lt;span style=&quot;color:#9ECBFF&quot;&gt; run&lt;/span&gt;&lt;span style=&quot;color:#9ECBFF&quot;&gt; dev&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Astro previews at &lt;code&gt;http://localhost:4321&lt;/code&gt; and hot-reloads on save.&lt;/p&gt;
&lt;aside class=&quot;note&quot;&gt;The dev server can die when your host machine goes to sleep. If this happens, re-run &lt;code&gt;npm run dev&lt;/code&gt; when you wake your machine.&lt;/aside&gt;
&lt;h2 id=&quot;where-things-live&quot;&gt;Where things live&lt;/h2&gt;
&lt;p&gt;Below is a short list of the files and folders I found mattered most as I was getting started:&lt;/p&gt;





































&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;File&lt;/th&gt;&lt;th&gt;Purpose&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;&lt;code&gt;src/consts.ts&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Site name and description, referenced site-wide&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;code&gt;astro.config.mjs&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Set &lt;code&gt;site: &apos;https://[DOMAIN]&apos;&lt;/code&gt; here&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;code&gt;src/content.config.ts&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Zod schemas for blog post frontmatter validation&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;code&gt;src/content/blog/&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Markdown and MDX post files&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;code&gt;src/pages/index.astro&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Homepage&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;code&gt;src/components/Header.astro&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Site header&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;&lt;code&gt;src/styles/global.css&lt;/code&gt;&lt;/td&gt;&lt;td&gt;Global styles&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;p&gt;A couple of quick notes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;site&lt;/code&gt; field in &lt;code&gt;astro.config.mjs&lt;/code&gt; is only used at build time for RSS, sitemap, and canonical URLs. Doesn’t affect local dev. I ended up setting this early, even though my blog wasn’t “live” yet.&lt;/li&gt;
&lt;li&gt;Newer Astro versions moved the content-config file out of the &lt;code&gt;content/&lt;/code&gt; folder up to &lt;code&gt;src/content.config.ts&lt;/code&gt;. Same file, slightly different path than older docs describe. I spent a stupid amount of time stuck on this.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2 id=&quot;github--cloudflare-pages&quot;&gt;GitHub + Cloudflare Pages&lt;/h2&gt;
&lt;p&gt;Push the repo to GitHub but keep it &lt;strong&gt;private&lt;/strong&gt; - Cloudflare Pages works fine with private repos via OAuth, and the OAuth connection itself doesn’t expire or take the site down if anything wobbles (the CDN keeps serving the last successful deployment regardless). When authorizing Cloudflare on GitHub, choose &lt;strong&gt;Only select repositories&lt;/strong&gt; and pick just this one. Security first, obviously.&lt;/p&gt;
&lt;p&gt;In Cloudflare:&lt;/p&gt;
&lt;aside class=&quot;note&quot;&gt;&lt;strong&gt;Workers vs Pages &quot;Gotcha&quot; -&lt;/strong&gt; the Cloudflare dashboard will by default (annoyingly so) route you into the &lt;strong&gt;Workers&lt;/strong&gt; setup flow, which shows &lt;code&gt;npx wrangler deploy&lt;/code&gt; - that&apos;s the wrong place we need to be. Navigate explicitly to &lt;strong&gt;Workers &amp;#x26; Pages → Create → Pages tab → Connect to Git&lt;/strong&gt;.&lt;/aside&gt;
&lt;p&gt;Build settings I used:&lt;/p&gt;

























&lt;table&gt;&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Setting&lt;/th&gt;&lt;th&gt;Value&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;&lt;tbody&gt;&lt;tr&gt;&lt;td&gt;Framework preset&lt;/td&gt;&lt;td&gt;Astro (auto-detected)&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Build command&lt;/td&gt;&lt;td&gt;&lt;code&gt;npm run build&lt;/code&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Build output directory&lt;/td&gt;&lt;td&gt;&lt;code&gt;dist&lt;/code&gt;&lt;/td&gt;&lt;/tr&gt;&lt;tr&gt;&lt;td&gt;Environment variable&lt;/td&gt;&lt;td&gt;&lt;code&gt;NODE_VERSION&lt;/code&gt; = output of &lt;code&gt;node -v&lt;/code&gt; on your machine&lt;/td&gt;&lt;/tr&gt;&lt;/tbody&gt;&lt;/table&gt;
&lt;aside class=&quot;note&quot;&gt;&lt;strong&gt;Node version &quot;Gotcha&quot; -&lt;/strong&gt; very new Node releases (e.g. v25.x at the time I set this up) may not be supported by Cloudflare Pages yet. If the first deploy fails on a Node-version error, drop &lt;code&gt;NODE_VERSION&lt;/code&gt; to &lt;code&gt;22&lt;/code&gt; (current LTS) in the Pages env settings and trigger a redeploy.&lt;/aside&gt;
&lt;h2 id=&quot;configuring-a-custom-domain&quot;&gt;Configuring a custom domain&lt;/h2&gt;
&lt;p&gt;After the first successful deploy: &lt;strong&gt;Pages project → Custom Domains → Add Domain&lt;/strong&gt; → enter your subdomain (&lt;code&gt;labs.zerberos.io&lt;/code&gt; for me). Because my domain’s DNS is already on Cloudflare, the &lt;strong&gt;CNAME&lt;/strong&gt; is auto-created and SSL is provisioned automatically. Usually live within a few minutes. Magic.&lt;/p&gt;
&lt;h2 id=&quot;the-deploy-loop&quot;&gt;The deploy loop&lt;/h2&gt;
&lt;p&gt;As I grow this blog by adding posts or making tweaks to the underlying UI (like adding in a dark mode toggle), I have fallen into the following deployment loop:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Edit files locally.&lt;/li&gt;
&lt;li&gt;Preview changes at &lt;code&gt;localhost:4321&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Push to GitHub.&lt;/li&gt;
&lt;li&gt;Cloudflare auto-rebuilds and deploys. No manual steps.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Based on the above setup, all I have to do to create a new blog post is whip up a Markdown file (&lt;code&gt;.md&lt;/code&gt;) or an enhanced file that can run React components (&lt;code&gt;.mdx&lt;/code&gt;), follow standard Markdown formatting, add in any React components as needed, and push to GitHub. Then it’s live.&lt;/p&gt;
&lt;p&gt;Build status lives under the &lt;strong&gt;Deployments&lt;/strong&gt; tab in the Pages project within Cloudflare’s portal. Old deployments stay in history, which can be useful for one-click rollback if something implodes in live time.&lt;/p&gt;
&lt;h2 id=&quot;later-additions--tweaks&quot;&gt;Later additions &amp;#x26; tweaks&lt;/h2&gt;
&lt;p&gt;A few things weren’t done on day one but have been added since:&lt;/p&gt;
&lt;h4 id=&quot;support-for-dynamic-mobile-layouts&quot;&gt;&lt;strong&gt;Support for dynamic mobile layouts&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;This was actually an extremely frustrating late realization when I first deployed the blog, opened it on my phone, and saw nothing but a garbled mess. I went back and used Chrome’s developer tools (&lt;code&gt;FN+F12&lt;/code&gt; on my MacBook) to preview the site in different mobile aspect ratios, fixing the header and blog post sizings. Small changes like this can actually involve multiple Astro files, so utilizing tools like Claude Code or OpenAI’s Codex can help simplify the lift.&lt;/p&gt;
&lt;h4 id=&quot;dark-mode-toggle&quot;&gt;&lt;strong&gt;Dark mode toggle&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;Dark mode doesn’t really need an explanation (it’s just better) so I wanted the blog to have that option. Astro did not have it baked in from the start, so I added in the button you see in the site header as a manual toggle, as well as system preference detection that persists via &lt;code&gt;localStorage&lt;/code&gt; and a &lt;code&gt;data-theme&lt;/code&gt; attribute on &lt;code&gt;&amp;#x3C;html&gt;&lt;/code&gt;. I also leveraged Claude Code to help me implement this.&lt;/p&gt;
&lt;h4 id=&quot;embedded-react-islands-in-posts&quot;&gt;&lt;strong&gt;Embedded React islands in posts&lt;/strong&gt;&lt;/h4&gt;
&lt;p&gt;React allows you to add interactive elements to Astro, so not eveything us just plain Markdown format. My first real use of this was the EID preview widget in &lt;a href=&quot;https://labs.zerberos.io/blog/eidvault-launch/&quot;&gt;the EIDVault launch post&lt;/a&gt;, which also validated my decision to use Astro and MDX files.&lt;/p&gt;
&lt;p&gt;Hopefully all of the above helps anyone looking to do something similar. The functionality of this platform allows me more flexability compared to a blog via WordPress, which also opens the door to some additional ideas I could explore down the road. For example, I could create a standalone EID lookup page here driven by the &lt;a href=&quot;https://github.com/zerber0s/windows-eid-data&quot;&gt;windows-eid-data&lt;/a&gt; JSON dataset, the same source of truth &lt;a href=&quot;https://labs.zerberos.io/blog/eidvault-launch/&quot;&gt;EIDVault&lt;/a&gt; uses, served as a free web tool.&lt;/p&gt;
&lt;p&gt;More to come if I ever take that on.&lt;/p&gt;
&lt;p&gt;ZB&lt;/p&gt;</content:encoded><enclosure url="https://labs.zerberos.io/_astro/astro-cloudflare-heroimage.BMKi7fYv.png" length="0" type="image/png"/></item></channel></rss>