<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="3.10.0">Jekyll</generator><link href="https://danielfarid.github.io/danielfarid.github.io/feed.xml" rel="self" type="application/atom+xml" /><link href="https://danielfarid.github.io/danielfarid.github.io/" rel="alternate" type="text/html" /><updated>2026-04-11T23:12:02+00:00</updated><id>https://danielfarid.github.io/danielfarid.github.io/feed.xml</id><title type="html">Daniel Farid</title><subtitle>A personal website to document projects and learn new skills</subtitle><author><name>Daniel Farid</name></author><entry><title type="html">host-sentry: A Local Security Monitor for the Machine Running Claude Code</title><link href="https://danielfarid.github.io/danielfarid.github.io/security/tools/homelab/claude/2026/04/11/host-sentry.html" rel="alternate" type="text/html" title="host-sentry: A Local Security Monitor for the Machine Running Claude Code" /><published>2026-04-11T12:56:00+00:00</published><updated>2026-04-11T12:56:00+00:00</updated><id>https://danielfarid.github.io/danielfarid.github.io/security/tools/homelab/claude/2026/04/11/host-sentry</id><content type="html" xml:base="https://danielfarid.github.io/danielfarid.github.io/security/tools/homelab/claude/2026/04/11/host-sentry.html"><![CDATA[<p>I use Claude Code for a lot of things — building bots, forensics tooling, homelab automation, troubleshooting things that would take me three times as long to solve on my own. It's become a genuine force multiplier for the kind of solo technical work I do, and I've leaned into that. But I've never been fully comfortable with it, and I think that discomfort is worth being honest about.</p>

<p>The issue isn't that Claude is adversarial. It isn't. The issue is that the surface area of risk is large and most of it is invisible during normal use. Every file Claude reads, every bash command it runs, every MCP tool it calls is a potential vector. If a project directory contains a malicious string crafted to look like an instruction, Claude might follow it. If a dependency in one of my Node-based MCP servers was compromised, Claude would happily execute it every time I opened that project. If my conversation history — which runs to over 300KB of text spanning every prompt, code snippet, and environment variable I've ever pasted into a session — ended up readable by the wrong process, that's a significant credential exposure. None of this is theoretical. These are real attack surfaces that exist on any machine running an AI coding assistant with broad file system and shell access.</p>

<p>I've managed this the careful way for a while. Narrow approval modes. Reviewing every tool call before confirming. Not pasting secrets directly into chat. Keeping MCP servers scoped. But managing it manually only works until it doesn't, and the recent public discussion around the Claude Code source code pushed me past the point of being comfortable with a manual-only approach. Not because the leak itself was catastrophic, but because it was a reminder that the gap between "this is probably fine" and "this has been compromised" isn't always visible from where you're sitting.</p>

<p>My laptop does most of the heavy lifting on this setup. It's where I write code with Claude, run initial analysis, and test things before they get offloaded to TrueNAS. That makes it both the most capable node in my homelab and the most exposed one — the first explorer and the canary in the coal mine simultaneously. Something watching it locally, without depending on Claude Code being active, was probably overdue.</p>

<h2 class="section-heading">What I Built</h2>

<p>host-sentry is a small FastAPI service that runs in WSL under systemd and monitors my workstation on a schedule. It doesn't depend on Claude Code, doesn't call out to any external service by default, and doesn't require anything from me day-to-day. It runs, it watches, and it tells me when something changes that I should look at.</p>

<p>The architecture follows the same pattern as the trading bots I've built — FastAPI app, APScheduler for periodic jobs, SQLite for persistence, a live dashboard served locally. That pattern is familiar enough that the whole thing came together quickly, and familiar enough that I trust it. There's no reason to invent a new shape for something like this.</p>

<p>Five things run on a schedule. An integrity watcher hashes a defined set of critical files — Claude's settings, every project's MCP configuration, my shell rc files, SSH authorized keys, memory files I write to persist things across sessions — and alerts when anything changes unexpectedly. A secret scanner walks my project directories and conversation history looking for high-entropy strings and known credential patterns: Alpaca keys, Discord tokens, Anthropic API keys, GitHub PATs, private key headers. A process watcher snapshots listening sockets and established outbound connections and diffs them against a known-good allowlist. A Claude audit module tails my conversation history looking for tool calls matching patterns I've decided are dangerous: curl piped anywhere, base64 decode, writes to authorized_keys, rm -rf from root. And a YARA scanner runs nightly against Downloads and tmp using the same ruleset I use for memory forensics work.</p>
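<p>The hash-compare-update cycle at the core of the integrity watcher is small enough to sketch. This is a simplified illustration, not host-sentry's actual implementation — the JSON baseline and alert shape here are stand-ins (the real service persists state to SQLite):</p>

```python
import hashlib
import json
from pathlib import Path

def sha256_file(path: Path) -> str:
    """Hash a file in chunks so large files never load fully into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def check_integrity(watchlist: list[str], baseline_path: str) -> list[dict]:
    """Diff current hashes of watched files against a stored baseline.

    Returns one alert dict per file that appeared, changed, or vanished,
    then rewrites the baseline so the next run compares against now.
    """
    bp = Path(baseline_path)
    baseline = json.loads(bp.read_text()) if bp.exists() else {}
    alerts = []
    current = {}
    for entry in watchlist:
        p = Path(entry).expanduser()
        if not p.exists():
            if entry in baseline:
                alerts.append({"file": entry, "change": "deleted"})
            continue
        digest = sha256_file(p)
        current[entry] = digest
        old = baseline.get(entry)
        if old is None:
            alerts.append({"file": entry, "change": "new"})
        elif old != digest:
            alerts.append({"file": entry, "change": "modified"})
    bp.write_text(json.dumps(current, indent=2))
    return alerts
```

<p>Pointed at <code>~/.claude/settings.json</code>, each project's MCP configuration, and the shell rc files, a loop like this is the whole detection story for the hook-abuse case described below.</p>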

<p>On the Windows side, a PowerShell companion script runs hourly via Task Scheduler and checks Defender status, listening ports, scheduled tasks, startup items, local admins, and WSL network configuration. It POSTs findings to the WSL service over localhost. The whole picture — both sides of the WSL boundary — lands in one dashboard.</p>

<h2 class="section-heading">The Threat Model It's Actually Solving For</h2>

<p>Before building it I wrote out the specific threats I was trying to detect. That exercise is worth doing because "I want security monitoring" is too vague to build something useful from. You end up with either too much noise or coverage in the wrong places.</p>

<p>The thing I care most about is hook abuse. Claude Code supports configuring shell hooks that fire automatically on tool events — before a file edit, after a bash command, on session start. Currently I have none configured. But <code>~/.claude/settings.json</code> and any project-level <code>.claude/settings.local.json</code> can define them, and an attacker who can write to either of those files gets persistent arbitrary code execution under my user on every subsequent Claude session. The integrity watcher fires immediately on any change to those files, before the next session opens.</p>

<p>The second thing is MCP server compromise. Each project's <code>.mcp.json</code> declares a Node process that Claude spawns. A poisoned <code>npm install</code> in any of my MCP server directories is code execution every time I open that project. The integrity watcher covers the package files. The secret scanner covers the directories. Neither is a substitute for auditing dependencies, but both give me a signal if something changes unexpectedly after a dependency update.</p>

<p>The third is credential sprawl. I found during the initial scan that several <code>.env</code> files across my projects had world-readable permissions — a consequence of WSL's default mount behavior, not intentional, but real. My conversation history is 300KB of text that contains fragments of things I've pasted into chat over months, including values that are now rotated but were valid at the time. The secret scanner doesn't just catch new exposures; it gives me a map of where sensitive material has accumulated so I can make deliberate decisions about rotation and cleanup.</p>
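<p>The two detection strategies the secret scanner combines — regexes for known credential formats and a Shannon-entropy check for unfamiliar high-entropy tokens — look roughly like this. The patterns and threshold below are illustrative examples, not the tool's actual ruleset:</p>

```python
import math
import re

# Illustrative patterns only — real rulesets carry many more formats.
CREDENTIAL_PATTERNS = {
    "anthropic_key": re.compile(r"sk-ant-[A-Za-z0-9\-_]{20,}"),
    "github_pat": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
}

def shannon_entropy(s: str) -> float:
    """Bits of entropy per character; random base64 material scores near 6."""
    if not s:
        return 0.0
    freq = {c: s.count(c) / len(s) for c in set(s)}
    return -sum(p * math.log2(p) for p in freq.values())

def scan_line(line: str, entropy_threshold: float = 4.5) -> list[str]:
    """Return finding labels for one line of a file or conversation log."""
    findings = [name for name, pat in CREDENTIAL_PATTERNS.items() if pat.search(line)]
    # Catch unknown key formats: long unbroken tokens with high entropy.
    for token in re.findall(r"[A-Za-z0-9+/=_\-]{24,}", line):
        if shannon_entropy(token) > entropy_threshold:
            findings.append("high_entropy_string")
            break
    return findings
```

<p>The entropy pass is what catches credentials in formats the pattern list doesn't know about; the tradeoff is false positives on things like content hashes, which is where an acknowledgement workflow earns its keep.</p>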

<p>The fourth is memory poisoning. I write persistent notes to a memory directory that Claude reads at the start of every conversation. That's useful for continuity but it means the memory files are part of my attack surface — anyone who can write there can plant instructions that persist across sessions. Having the integrity watcher on that directory is roughly equivalent to watching a CI config: changes there are changes to behavior, and they should be reviewed.</p>

<h2 class="section-heading">What It Looks Like Day to Day</h2>

<p>Most of the time it's invisible. The service runs under systemd, starts at login, and sits quietly in the background. When a scan completes and finds nothing new, nothing happens. When it finds something, a single summary notification appears in the Windows system tray — not one notification per finding, which was an early mistake that resulted in a genuinely impressive balloon storm — and the dashboard reflects the new alerts.</p>

<p>The dashboard lives at <code>http://127.0.0.1:8765/dashboard</code> and is the main interface. It polls every thirty seconds, shows unacknowledged alerts with expandable detail rows, scan status cards with next-run countdowns, and the full recent history. It's self-contained HTML served from the same FastAPI process so there are no CORS issues and no dependency on anything external.</p>

<p>Real-time coverage for the most critical paths comes from inotify. Changes to Claude settings files and MCP configurations fire an alert immediately rather than waiting for the next scheduled integrity check. Everything else runs on intervals: integrity every four hours, process checks every six, Claude audit every four, secrets and YARA nightly.</p>

<p>An AI triage module is built in but off by default. When enabled with an Anthropic API key, it routes medium-and-above alerts through Claude for classification — benign, suspicious, or malicious, with reasoning and a recommended action. It's useful for the cases where the raw alert is technically accurate but the right response isn't obvious. I left it off because routing every alert from a local security monitor through an external API felt like the wrong default, but the option is there when the alert volume makes automated triage worth it.</p>

<h2 class="section-heading">What the Initial Scan Found</h2>

<p>The honest answer is more than I expected. All of my <code>.env</code> files were mode 777 because WSL mounts the Windows filesystem without metadata support unless you explicitly configure it. One project had its API secret key hardcoded in the MCP configuration file — visible to any process that could read that directory, and technically committed to a local repo. My Claude sessions directory had files at 644. A plugin had <code>git push</code> in its auto-approval list, which means Claude could push to remote without prompting. An account called <code>enduser</code> had its password changed with no audit trail — nine minutes after a Cellebrite UFED crash-loop on the same machine, cause still undetermined. Two orphaned Active Directory SIDs were sitting in local group membership with no corresponding accounts attached to them. The built-in Administrator account was still enabled. The password change, the orphaned SIDs, and the enabled Administrator account are all textbook persistence footholds, and none of them were visible without specifically looking.</p>

<p>The account was disabled, the SIDs were removed, and account management auditing got turned on so future changes leave a trail. The ambiguity around <code>enduser</code> is what stuck with me — not because it was necessarily malicious, but because I couldn't rule it out. That's exactly the kind of finding host-sentry is built to surface before it becomes a question you're answering after the fact.</p>

<p>None of these were the result of malicious activity. All of them were the result of convenience and accumulated configuration drift — the kind of thing that doesn't register as a problem until you look for it specifically. Running the scan wasn't alarming. It was clarifying. The fixes were straightforward once I knew what I was looking at.</p>

<h2 class="section-heading">Why Local</h2>

<p>The whole thing is local by design. No cloud logging, no external webhook by default, no dependency on any service being reachable. This was a deliberate choice rather than a constraint. A security monitor that phones home is a monitor that has its own attack surface, its own availability dependency, and its own data residency question. The machine I'm trying to protect is the one I'm sitting in front of. The alerts should land there too.</p>

<p>The code is at <a href="https://github.com/dfarid479/host-sentry">github.com/dfarid479/host-sentry</a>.</p>]]></content><author><name>Daniel Farid</name></author><category term="security" /><category term="tools" /><category term="homelab" /><category term="claude" /><summary type="html"><![CDATA[I use Claude Code for a lot of things — building bots, forensics tooling, homelab automation, troubleshooting things that would take me three times as long to solve on my own. It's become a genuine force multiplier for the kind of solo technical work I do, and I've leaned into that. But I've never been fully comfortable with it, and I think that discomfort is worth being honest about.]]></summary></entry><entry><title type="html">Digital Forensics and Incident Response - memdump Tool</title><link href="https://danielfarid.github.io/danielfarid.github.io/dfir/forensics/tools/2026/04/04/dfir-memdump.html" rel="alternate" type="text/html" title="Digital Forensics and Incident Response - memdump Tool" /><published>2026-04-04T16:26:00+00:00</published><updated>2026-04-04T16:26:00+00:00</updated><id>https://danielfarid.github.io/danielfarid.github.io/dfir/forensics/tools/2026/04/04/dfir-memdump</id><content type="html" xml:base="https://danielfarid.github.io/danielfarid.github.io/dfir/forensics/tools/2026/04/04/dfir-memdump.html"><![CDATA[<p>I capture RAM on almost every scene I work. It's part of the collection checklist at this point — device is live, collection kit comes out, memory goes first. That part is well established. What's less established, at least in my own workflow, is what happens to that image afterward.</p>

<p>The honest answer most of the time: it gets imaged, hashed, bagged, logged, and then sits on a drive while the investigation moves forward on artifacts that are faster to process. Disk forensics has a mature toolchain. Network forensics has a mature toolchain. Memory forensics has Volatility, which is powerful and deeply capable, but the path from "here's a raw image" to "here's what's actually happening in this machine" has a lot of manual steps and a high barrier to entry. The result is that RAM — which is often the most volatile and time-sensitive evidence on the whole scene — ends up being the least utilized data point in the investigation.</p>

<p>The motivation for building this came from two directions. The first is the cyber response side. I wanted something that could run locally on an endpoint during an active incident — no sending images off-site, no waiting on a lab queue — and that could also give a fast first read when someone comes in reporting a compromise. The triage report needs to be good enough to tell you whether you're looking at a real incident and what the attacker was doing, in the time it takes to have the first conversation with the affected party.</p>

<p>The second is criminal forensics, specifically cases involving the possession of unlawful files. That work has a growing encryption problem. Whether it's because the barrier to using full-disk encryption has dropped, because suspects are becoming more aware of forensic methods, or simply because encryption is more prevalent across all consumer devices — the result is the same: more devices that are live and mounted at collection time, and more cases where the question isn't whether contraband exists but whether we can access it. Memory capture is one of the few collection techniques that addresses that directly. The key material is in RAM while the system is running. If you don't get it there, you don't get it.</p>

<p>Those two use cases shaped the tool in different ways, but the core requirement was the same: hand it a raw image, walk away for a few minutes, and come back to a prioritized summary of what's worth looking at. No manual plugin chaining, no parsing JSON by hand, no rebuilding the same analysis workflow every case. Just signal.</p>

<h2 class="section-heading">What's Actually in Memory</h2>

<p>It's worth being specific about why RAM matters, because "volatile evidence" gets said so often it can start to feel like a checkbox rather than a reason. Here's what a memory image can contain that disk forensics simply cannot recover after the fact:</p>

<p>Running processes that have no corresponding file on disk. Fileless malware, reflective DLL injection, process hollowing — all of these techniques exist specifically to avoid writing to disk, and all of them are visible in memory. The injected shellcode sitting in a PAGE_EXECUTE_READWRITE region of a legitimate process doesn't exist anywhere on the filesystem. If you don't grab memory before shutdown, it's gone.</p>

<p>Network connections in ESTABLISHED state. The disk tells you what was installed. Memory tells you what was actively communicating at the moment of collection — the live socket table, the foreign addresses, the processes behind each connection. That's the difference between knowing malware was present and knowing it was actively beaconing to infrastructure at the time you arrived.</p>

<p>Decrypted content. Full-disk encryption protects data at rest. A running system has already decrypted everything it needs to operate. Credentials in LSASS, encryption keys in process memory, plaintext values that are ciphertext on disk — all of it is accessible in a live capture in a way it simply isn't from a disk image alone.</p>

<p>The limitation has never been what's in memory. The limitation has been the time cost of getting real meaning from it.</p>

<h2 class="section-heading">Building the Tool</h2>

<p>dfir-memdump is a Python tool that sits on top of Volatility3 and adds an intelligence layer between the raw plugin output and the investigator. The architecture is straightforward: run a fixed set of Volatility3 plugins against the image, pass all the output through a set of analysis modules, and render the results into a report that a human can actually read.</p>

<p>The plugin stack covers the core artifacts: process list and tree, network connections, suspicious VAD regions via malfind, command lines, loaded DLLs, open handles, token privileges, and BitLocker keys. Eight plugins, all run automatically, all output parsed into typed models so the analysis layer has clean structured data to work with.</p>
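<p>For a sense of what "parsed into typed models" means here: Volatility3 can render plugin output as JSON (<code>vol -r json windows.pslist ...</code>), which makes the typing layer a thin mapping. The column names below follow <code>windows.pslist</code>'s output and should be treated as assumptions if your plugin version differs:</p>

```python
import json
from dataclasses import dataclass

@dataclass
class Process:
    pid: int
    ppid: int
    name: str
    create_time: str = ""

def parse_pslist(raw_json: str) -> list[Process]:
    """Map `vol -r json windows.pslist` output rows into typed records.

    Key names (PID, PPID, ImageFileName, CreateTime) mirror the plugin's
    column headers — verify them against your Volatility3 version.
    """
    rows = json.loads(raw_json)
    return [
        Process(
            pid=int(r["PID"]),
            ppid=int(r["PPID"]),
            name=r["ImageFileName"],
            create_time=r.get("CreateTime", ""),
        )
        for r in rows
    ]
```

<p>Once every plugin's output lands in dataclasses like this, the intelligence modules can operate on clean attributes instead of re-parsing text tables per module.</p>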

<p>On top of that sit ten intelligence modules. The ones that tend to produce the most immediately actionable output in practice:</p>

<p><strong>Anomaly Detector</strong> — checks every process for parent-child relationships that don't make sense (Word spawning PowerShell, explorer.exe spawning net.exe), name masquerading (svchost.exe running from the wrong path), and hollow process indicators. These are the behavioral anomalies that show up before any hash or signature matching is possible.</p>
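<p>The parentage check is essentially a lookup table plus one pass over the process list. A minimal sketch — the expected-parent map is a small illustrative subset, not the module's full ruleset:</p>

```python
# Illustrative subset of baseline Windows parent-child relationships.
EXPECTED_PARENTS = {
    "svchost.exe": {"services.exe"},
    "lsass.exe": {"wininit.exe"},
    "powershell.exe": {"explorer.exe", "cmd.exe", "wsmprovhost.exe"},
}

OFFICE_APPS = {"winword.exe", "excel.exe", "powerpnt.exe", "outlook.exe"}
SUSPICIOUS_CHILDREN = {"powershell.exe", "cmd.exe", "wscript.exe", "mshta.exe"}

def check_parentage(procs: list[dict]) -> list[str]:
    """Flag parent-child pairs that don't match baseline Windows behavior."""
    by_pid = {p["pid"]: p for p in procs}
    findings = []
    for p in procs:
        parent = by_pid.get(p["ppid"])
        pname = parent["name"].lower() if parent else "?"
        name = p["name"].lower()
        # Known child with an unexpected parent (e.g. svchost not under services).
        if name in EXPECTED_PARENTS and parent and pname not in EXPECTED_PARENTS[name]:
            findings.append(f"{name} spawned by {pname} (expected {EXPECTED_PARENTS[name]})")
        # Office application spawning a shell or script host.
        if pname in OFFICE_APPS and name in SUSPICIOUS_CHILDREN:
            findings.append(f"Office process {pname} spawned {name}")
    return findings
```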

<p><strong>C2 Detector</strong> — matches outbound connections against the Feodo tracker IP blocklist and a local list of known malicious infrastructure. Also flags connections on ports commonly associated with C2 frameworks and tools like Cobalt Strike, Meterpreter, and Sliver.</p>

<p><strong>Mutex Checker</strong> — this one is underappreciated. Named mutexes are how malware prevents multiple instances of itself from running simultaneously, and they are highly family-specific. WannaCry has a mutex. Cobalt Strike beacons have GUID-style global mutexes. LockBit has a mutex. The module checks open handles against a list of known-bad mutex signatures and also flags any process holding a Process-type handle to another process — the classic OpenProcess → WriteProcessMemory → CreateRemoteThread injection setup.</p>

<p><strong>Privilege Checker</strong> — inspired by PrivHound, this module looks at the token privileges held by each non-system process. SeDebugPrivilege on a process that isn't a debugger is a significant red flag. SeImpersonatePrivilege in the wrong place is a Potato attack waiting to happen. When both are present on the same process, the finding escalates to CRITICAL — that combination generally gives an attacker a clean path from a low-privilege foothold to SYSTEM.</p>
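<p>The escalation logic reduces to a few conditionals. A sketch of the severity ladder as described above; the debugger allowlist here is a hypothetical example, not the module's configuration:</p>

```python
def classify_privileges(
    process_name: str,
    privileges: set[str],
    debugger_allowlist: frozenset = frozenset({"windbg.exe", "x64dbg.exe"}),
) -> str:
    """Severity for the enabled token privileges on a non-system process."""
    has_debug = "SeDebugPrivilege" in privileges
    has_impersonate = "SeImpersonatePrivilege" in privileges
    if has_debug and has_impersonate:
        return "CRITICAL"  # both together: Potato-style escalation path
    if has_debug and process_name.lower() not in debugger_allowlist:
        return "HIGH"      # debug rights on something that isn't a debugger
    if has_impersonate:
        return "MEDIUM"
    return "INFO"
```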

<p><strong>VirusTotal Client</strong> — submits SHA256 hashes of process images to the VirusTotal API and surfaces any detections. Rate-limited, cached locally in SQLite so you're not burning API quota on hashes you've already checked, and skippable entirely with a flag for offline or air-gapped work.</p>
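<p>The cache is what keeps repeated runs against the same image from burning quota. A sketch of the lookup path — the schema and names are illustrative, and <code>fetch</code> stands in for the real rate-limited API client:</p>

```python
import sqlite3
import time

def cached_lookup(db_path: str, sha256: str, fetch, ttl_days: int = 30):
    """Return a cached detection count, calling `fetch(sha256)` only on a miss.

    Hypothetical schema for illustration — one row per hash, refreshed
    when the cached entry is older than the TTL.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vt_cache "
        "(sha256 TEXT PRIMARY KEY, detections INTEGER, checked_at REAL)"
    )
    row = conn.execute(
        "SELECT detections, checked_at FROM vt_cache WHERE sha256 = ?", (sha256,)
    ).fetchone()
    if row and time.time() - row[1] < ttl_days * 86400:
        conn.close()
        return row[0]  # cache hit: no API call
    detections = fetch(sha256)  # the real client rate-limits here
    conn.execute(
        "INSERT OR REPLACE INTO vt_cache VALUES (?, ?, ?)",
        (sha256, detections, time.time()),
    )
    conn.commit()
    conn.close()
    return detections
```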

<p><strong>Encryption Key Finder</strong> — this one is driven directly by the criminal forensics side of the work. Encrypted devices are appearing more often, and a live memory capture at the time of seizure is often the only opportunity to recover key material. If a BitLocker-encrypted drive is present and the system is running, the Full Volume Encryption Key is often sitting in memory. Volatility3's windows.bitlocker.Bitlocker plugin can pull it out directly. The module also runs aeskeyfind and bulk_extractor as subprocess tools if they're installed, scanning the raw image for AES key schedules by detecting the statistical properties of expanded key material — this covers BitLocker, VeraCrypt, TrueCrypt, and any other AES-based cipher. For VeraCrypt and TrueCrypt specifically, it checks whether the mounting process was running at collection time; if it was, the master key is typically in its VAD and recoverable with the same tooling. Recovered keys land in a dedicated section of the report with the dislocker and bdemount commands pre-populated. If the plugin runs and finds nothing — because the drive wasn't encrypted, or because the system had already been shut down and restarted — the report says so explicitly rather than leaving a blank section. Chain-of-custody note: key recovery is performed against the forensic copy of the memory image and documented in the report.</p>

<h2 class="section-heading">Running It</h2>

<p>Installation is standard Python — clone, pip install, done. The only external dependency is Volatility3, which needs to be on PATH. A VirusTotal API key is optional but easy to add via a .env file.</p>

<pre><code>dfir-memdump analyze /evidence/RAM.raw</code></pre>

<p>That's the full command for a default run. It produces three report files — JSON, Markdown, and HTML — in a reports directory. The terminal output gives you the executive summary and a severity-sorted findings table while the full reports are being written.</p>

<p>For faster turnaround when you don't need VT lookups:</p>

<pre><code>dfir-memdump analyze /evidence/RAM.raw --no-vt --format html</code></pre>

<p>The HTML report is the most useful artifact for casework. It's fully self-contained — one file, no internet required to view it — and covers everything in a readable format: the image hash for chain of custody, the executive summary, a process risk leaderboard sorted by weighted finding score, the process tree with suspicious processes flagged, a chronological event timeline, the attack chain reconstruction ordered by kill-chain stage, a dedicated encryption key section with mount commands, and an intelligence findings section grouped by severity level so you see the critical items immediately without scrolling through everything. IOCs are summarized by type — N IPs, N hashes, N mutexes — with the full list expandable when you need it. There's a Print / Save PDF button in the header for generating a clean printable version.</p>

<h2 class="section-heading">The Attack Chain</h2>

<p>One of the features I put the most thought into is the attack chain reconstruction. After all the intelligence modules run, the tool groups findings by their MITRE ATT&amp;CK tactic and orders them along the kill chain — Initial Access through Impact — generating a plain-English narrative for each observed stage.</p>
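<p>Mechanically, the reconstruction is a group-by over tactics followed by a walk along a fixed kill-chain order. A sketch of the shape, with an assumed finding structure rather than the tool's exact model:</p>

```python
# Tactic ordering follows MITRE ATT&CK's Enterprise kill chain.
KILL_CHAIN = [
    "Initial Access", "Execution", "Persistence", "Privilege Escalation",
    "Defense Evasion", "Credential Access", "Discovery", "Lateral Movement",
    "Collection", "Command and Control", "Exfiltration", "Impact",
]

def attack_chain(findings: list[dict]) -> list[str]:
    """Group findings by tactic and emit one narrative line per observed stage.

    Each finding is assumed to carry `tactic` and `summary` keys — a sketch
    of the structure, not dfir-memdump's exact model.
    """
    by_tactic: dict[str, list[str]] = {}
    for f in findings:
        by_tactic.setdefault(f["tactic"], []).append(f["summary"])
    return [
        f"{tactic} was observed — " + "; ".join(by_tactic[tactic])
        for tactic in KILL_CHAIN
        if tactic in by_tactic
    ]
```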

<p>The output looks something like this: Privilege Escalation was observed — SeDebugPrivilege and SeImpersonatePrivilege were both enabled on process X, sufficient for escalation to SYSTEM. Command and Control was observed — one connection to a known C2 IP with X detections on VirusTotal. Impact was observed — a mutex matching the LockBit ransomware signature was found in process Y.</p>

<p>That's the kind of summary you can hand to a supervisor or include in an affidavit without requiring them to understand what PAGE_EXECUTE_READWRITE means. It ties the technical findings to a coherent narrative about what the attacker was doing and what stage they had reached at the time of collection. For investigations where the memory image is one artifact among many, that framing is genuinely useful.</p>

<h2 class="section-heading">Where It Fits</h2>

<p>This isn't a replacement for a full Volatility workflow. There are things the tool doesn't do — no registry hive extraction, no shimcache parsing, no deep artifact carving — and for complex cases those manual deep-dives are still necessary. The purpose is different: get through the first pass quickly, surface what's worth looking at, and let the investigator make informed decisions about where to spend time.</p>

<p>On the cyber response side, that means having an answer ready before the initial triage conversation is over — yes this is a real incident, here's what was running, here's where it was calling out to. On the criminal forensics side, it means leaving every scene knowing whether the memory image contains key material that will matter later, rather than finding out months down the line when the image has been sitting on a shelf.</p>

<p>The memory image has always been worth collecting. The gap has been in making it worth reading before the investigation has moved on. That's what this is trying to close.</p>

<p>The code is open source at <a href="https://github.com/dfarid479/dfir-memdump">github.com/dfarid479/dfir-memdump</a>.</p>]]></content><author><name>Daniel Farid</name></author><category term="dfir" /><category term="forensics" /><category term="tools" /><summary type="html"><![CDATA[I capture RAM on almost every scene I work. It's part of the collection checklist at this point — device is live, collection kit comes out, memory goes first. That part is well established. What's less established, at least in my own workflow, is what happens to that image afterward.]]></summary></entry><entry><title type="html">Adding Off-Site Backup to My TrueNAS Homelab with Backblaze B2</title><link href="https://danielfarid.github.io/danielfarid.github.io/homelab/2026/04/04/backblaze-b2-truenas.html" rel="alternate" type="text/html" title="Adding Off-Site Backup to My TrueNAS Homelab with Backblaze B2" /><published>2026-04-04T14:00:00+00:00</published><updated>2026-04-04T14:00:00+00:00</updated><id>https://danielfarid.github.io/danielfarid.github.io/homelab/2026/04/04/backblaze-b2-truenas</id><content type="html" xml:base="https://danielfarid.github.io/danielfarid.github.io/homelab/2026/04/04/backblaze-b2-truenas.html"><![CDATA[<p>My homelab gap analysis from a few months back flagged a few things as critical. No resource limits on containers was one. No off-site backups was another. The exact note I wrote was: "all backups on same machine—fire or theft means total loss." That's not a backup strategy. That's just having two copies of the same thing in the same building.</p>

<p>I finally got around to fixing it. The solution was Backblaze B2, and I want to write about it because the setup was simpler than I expected and the cost is almost nothing for what I'm backing up.</p>

<h2 class="section-heading">What I Actually Need to Back Up</h2>

<p>My TrueNAS box runs ZFS with RAIDZ2, which protects against drive failures. It runs daily snapshots, which protect against accidental deletion within a retention window. What neither of those protects against is the whole machine being gone—fire, theft, or a catastrophic failure that takes out the array entirely.</p>

<p>For off-site backup, I'm not trying to replicate the whole homelab. Most of what runs on that box is reconstructible: container images can be re-pulled, media can be re-acquired, config files are in Git. What I care about is data that's genuinely irreplaceable or painful to recreate: the trading bot's SQLite databases and logs, the Obsidian vault (which also syncs through CouchDB, but I want a third copy), .env files and secrets, and a few other datasets that took real time to set up.</p>

<p>That comes out to a few gigabytes. Not a lot. Which is exactly why Backblaze B2 made sense.</p>

<h2 class="section-heading">Why Backblaze B2</h2>

<p>I looked at the usual options. S3 is the obvious choice if you're already deep in AWS, but for a homelab with a few GB of data, the pricing structure and the complexity of IAM permissions felt like overkill. Wasabi is popular in homelab circles. Cloudflare R2 has no egress fees, which matters if you're pulling data back regularly.</p>

<p>For a backup workload—write frequently, read almost never—B2 made the most sense. It's $6 per terabyte per month for storage. My datasets add up to maybe 15 GB. That's under a dollar a month. The first 10 GB of egress per day is free, which is more than enough for the rare restore scenario. The pricing is simple enough that I didn't need a calculator to know it would be cheap.</p>
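<p>The arithmetic, spelled out (using the storage price quoted above; egress and transaction fees ignored for this rough estimate):</p>

```python
# B2 storage pricing as quoted above: $6 per TB-month.
PRICE_PER_TB_MONTH = 6.00

def monthly_storage_cost(gb_stored: float) -> float:
    """Rough monthly B2 storage cost in dollars for a given number of GB."""
    return round(gb_stored / 1000 * PRICE_PER_TB_MONTH, 2)

# 15 GB of backups comes to nine cents a month.
```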

<p>Backblaze also has a long track record in the homelab and indie developer community. They've been running cloud storage since 2009, their pricing has been stable, and they're not a division of a larger company with different strategic priorities that might affect the service. For something I want to mostly forget about, that reliability matters more than squeezing out the last cent.</p>

<h2 class="section-heading">The Setup</h2>

<p>TrueNAS Scale has cloud sync built in. Under Data Protection, there's a Cloud Sync Tasks section that handles scheduling, credential management, and the actual sync—powered by rclone under the hood. You don't have to touch rclone directly; TrueNAS wraps it in a UI.</p>

<p>Setting it up took three steps.</p>

<p>First, I created a bucket in Backblaze B2. Private, not public. I also created a scoped application key—restricted to that specific bucket, read and write only—rather than using the master account key. If the key ever leaks, the blast radius is contained to this one bucket.</p>

<p>Second, I added the credentials to TrueNAS under System → Credentials → Backup Credentials. Provider is Backblaze B2, paste in the key ID and application key, click Verify Credential. TrueNAS made a test connection and confirmed it worked before saving.</p>

<p>Third, I created the Cloud Sync Task itself. Direction: Push. Transfer Mode: Sync. Source: the datasets I care about on <code>/mnt/main-storage/</code>. Schedule: daily at 2 AM. And then the important part—I enabled client-side encryption and set a passphrase.</p>

<h2 class="section-heading">Client-Side Encryption</h2>

<p>Backblaze offers server-side encryption. I turned it off and used TrueNAS's built-in client-side encryption instead. The difference matters.</p>

<p>Server-side encryption means Backblaze encrypts your data on their end, with a key they manage. They can decrypt it. Their employees can decrypt it, in principle. Any court order or compelled access that reaches Backblaze reaches your data. For a lot of use cases this is fine—the encryption protects against scenarios like physical drives being stolen from their data centers, not against Backblaze itself.</p>

<p>Client-side encryption means your data is encrypted before it leaves your machine. What arrives at Backblaze is ciphertext. They store ciphertext. If their infrastructure were compromised, the attacker gets ciphertext. Backblaze can't read it. I have .env files in these backups with API keys and private key paths. I have a trading bot database. I'd rather those not be readable by anyone other than me, regardless of what happens on Backblaze's end.</p>

<p>The passphrase lives locally. If I lose it, the backup is unrecoverable—that's the tradeoff. I stored it in my password manager and wrote it down in a sealed envelope in a drawer. Belt and suspenders. The backup is useless without it, so treating it with the same care as the data itself makes sense.</p>

<h2 class="section-heading">Running It</h2>

<p>After saving the task, I ran a dry run first. TrueNAS shows you what would be transferred without actually doing it—good for confirming the source paths are right and the credential works against the destination bucket. The dry run completed cleanly, showing the files it would push.</p>

<p>Then I ran it for real. The first sync transferred the full dataset and took a few minutes. Every subsequent run is incremental—only changed files get pushed. For a daily backup of mostly-static data like a SQLite database and config files, the incremental transfers are small. The scheduled 2 AM job is done before anyone's awake.</p>

<p>I can see the transfer logs in TrueNAS and can verify the files landed in the B2 bucket through the Backblaze web UI. The encrypted files show up with rclone-generated names—just hashed strings, no readable filenames. The directory structure is obfuscated too. From Backblaze's perspective, it's an opaque bucket of encrypted blobs with timestamps.</p>

<h2 class="section-heading">What This Actually Cost Me</h2>

<p>Setup time: maybe 30 minutes, including reading through the TrueNAS documentation on cloud credentials and deciding between the encryption options. This was genuinely one of the simpler infrastructure tasks I've done on the homelab.</p>

<p>Ongoing cost: under a dollar a month at my current data size. If I add more datasets, it scales at $6/TB. For a homelab, I don't realistically see the cost ever reaching a level that matters.</p>

<p>Ongoing maintenance: essentially none. The task runs on schedule. TrueNAS logs the results. Uptime Kuma monitors the TrueNAS box. If a sync fails, I'll find out. Other than periodically checking that the job is still completing and the bucket is growing as expected, there's nothing to manage.</p>

<h2 class="section-heading">One Issue I Hit Post-Setup</h2>

<p>After the initial sync completed successfully, the nightly job started failing every night with the same error: <code>Checksum did not match data received (400 bad_request)</code>. The sync would run for two-plus hours, transfer everything, and then fail at the very end, meaning nothing was flagged as complete and rclone wouldn't clean up the remote. Digging into the job logs with <code>midclt call core.get_jobs</code> revealed it was always the same file: <code>graylog/mongodb/journal/WiredTigerLog.0000000100</code>. This is a WiredTiger journal file, MongoDB's crash-recovery log, and Graylog is actively writing to it throughout the backup window, so the file changes between when rclone starts reading it and when the upload finishes, and the checksum no longer matches what actually arrived at B2. The fix was straightforward: exclude the MongoDB journal directory from the sync task. WiredTiger journals are ephemeral by design and regenerate automatically on startup; the actual database collections are what matter for a restore, not the journal. If you're running any databases or other apps that write continuously during your backup window, either exclude those transient files explicitly or consider stopping the containers with a pre-script before the sync runs.</p>
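<p>The exclusion itself is just a path filter set in the Cloud Sync Task. As a sketch of the logic, here's the same idea in Python with <code>fnmatch</code>; the pattern is mine, and real rclone filter globs have slightly different semantics:</p>

```python
from fnmatch import fnmatch

# Hypothetical exclude list; the real one is configured in the TrueNAS
# Cloud Sync Task UI. fnmatch globs differ slightly from rclone's filters.
EXCLUDES = ["graylog/mongodb/journal/*"]

def should_sync(relpath: str) -> bool:
    """Skip any path matching an exclude pattern."""
    return not any(fnmatch(relpath, pat) for pat in EXCLUDES)

print(should_sync("graylog/mongodb/journal/WiredTigerLog.0000000100"))  # False
print(should_sync("graylog/mongodb/collection-0.wt"))                   # True
```

<p>Anything under the journal directory is skipped; the actual collection files still sync.</p>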

<h2 class="section-heading">Recommended</h2>

<p>If you're running TrueNAS and haven't set up off-site backup, this is the path of least resistance. The TrueNAS cloud sync integration is genuinely well done—rclone handles the heavy lifting, the UI abstracts the complexity, and the credential verification step catches config errors before you schedule a task and walk away. Backblaze B2 is cheap enough that cost isn't a real consideration at homelab scale. And client-side encryption means you can put sensitive data in the backup without thinking twice about who else might have access to it.</p>

<p>It took one afternoon. I probably should have done it a while ago.</p>]]></content><author><name>Daniel Farid</name></author><category term="homelab" /><summary type="html"><![CDATA[My homelab gap analysis from a few months back flagged a few things as critical. No resource limits on containers was one. No off-site backups was another. The exact note I wrote was: "all backups on same machine—fire or theft means total loss." That's not a backup strategy. That's just having two copies of the same thing in the same building.]]></summary></entry><entry><title type="html">Replacing iCloud Sync for Obsidian with a Self-Hosted CouchDB</title><link href="https://danielfarid.github.io/danielfarid.github.io/homelab/productivity/2026/03/28/obsidian-self-hosted-sync.html" rel="alternate" type="text/html" title="Replacing iCloud Sync for Obsidian with a Self-Hosted CouchDB" /><published>2026-03-28T14:00:00+00:00</published><updated>2026-03-28T14:00:00+00:00</updated><id>https://danielfarid.github.io/danielfarid.github.io/homelab/productivity/2026/03/28/obsidian-self-hosted-sync</id><content type="html" xml:base="https://danielfarid.github.io/danielfarid.github.io/homelab/productivity/2026/03/28/obsidian-self-hosted-sync.html"><![CDATA[<p>I've been using Obsidian since the homelab idea started. It's the tool that finally clicked for me—local Markdown files, no proprietary format, a linking model that actually maps to how I think. The plugin ecosystem is deep enough to do serious work without feeling like you're assembling a Rube Goldberg machine.</p>

<p>But sync was always the weak point.</p>

<p>I'm running a TrueNAS homelab with twenty-plus containerized services. I have Tailscale, Nginx Proxy Manager, a wildcard SSL cert, and a ZFS array for storage. The one thing I couldn't figure out was how to get my Obsidian vault to sync from my workstation to my iPhone without routing it through Apple's servers.</p>

<p>This is the story of how I finally fixed that.</p>

<h2 class="section-heading">The iCloud Problem</h2>

<p>Obsidian on iOS uses iCloud Drive for sync when you're not paying for Obsidian Sync. In theory, straightforward. In practice, it was never quite right. Sync would stall. Files would show up out of order. Opening the app on my phone sometimes showed vault contents from hours ago even after walking into the same room as my workstation. The reliability just wasn't there.</p>

<p>Beyond the reliability issues, I don't love the idea of my notes—a combination of DFIR research notes, project documentation, and personal knowledge base—sitting in Apple's cloud. Not because I think Apple is reading them, but because data sovereignty matters to me as a principle. I run a homelab for exactly this reason: I want control over where my data lives and who can access it.</p>

<p>Obsidian Sync exists and it probably works great. But I'm already running storage infrastructure. Paying monthly to sync a folder of Markdown files feels like buying groceries when you have a full pantry.</p>

<h2 class="section-heading">What I Already Had</h2>

<p>I'd already solved the Windows-to-TrueNAS side of the problem with Syncthing. The vault files sync continuously between my workstation and a dataset on TrueNAS over Tailscale. Block-level delta sync, works on LAN and remotely, near-instant propagation when files change. I'm not replacing Syncthing—it handles the backup angle well, and having a copy of the vault on TrueNAS matters to me.</p>

<p>What Syncthing can't do is iOS. It doesn't have an iPhone client, and even if it did, the iOS sandbox model would make it difficult to have Syncthing write directly into the Obsidian app container. iOS is a different problem from desktop sync.</p>

<p>I tried Möbius Sync, which is a Syncthing client for iOS. It worked, but Obsidian on iOS expects to own its vault directory inside the iCloud container, and Möbius couldn't write to that path. A dead end.</p>

<p>What I needed was something that worked at the application layer—a sync backend that Obsidian's iOS app could talk to directly. That's a different category of solution.</p>

<h2 class="section-heading">The LiveSync Discovery</h2>

<p>After ruling out filesystem-level approaches, I started looking at Obsidian's community plugin ecosystem. That's where I found <a href="https://github.com/vrtmrz/obsidian-livesync">Self-hosted LiveSync</a> by vrtmrz. It's a community plugin that syncs Obsidian vaults through a CouchDB backend. The plugin handles conflict resolution, chunked storage for large files, and end-to-end encryption. The backend is just CouchDB—a well-documented, Apache-licensed document database that's been in production use since 2005.</p>

<p>CouchDB's replication protocol is what makes this work. It's designed for disconnected, eventually-consistent sync—exactly the problem Obsidian sync needs to solve. The LiveSync plugin wraps it in an Obsidian-native interface with a setup wizard and a config doctor that validates the backend configuration. The whole thing is well thought out.</p>

<p>The catch was that iOS Obsidian requires an HTTPS endpoint. The app refuses to connect to plain HTTP backends. That ruled out direct local network access—I'd need a proper SSL certificate on whatever URL I pointed the plugin at. Fortunately, I already had a <code>*.waltbyte.net</code> wildcard certificate in Nginx Proxy Manager and a bunch of services running behind it. Adding one more subdomain was straightforward.</p>

<h2 class="section-heading">The Deployment</h2>

<p>CouchDB runs in a Docker container. I added it to my existing Portainer stack on TrueNAS alongside everything else. The compose block is minimal—the image, a few environment variables for the admin credentials, and two bind mounts: one for the data directory and one for a <code>local.d</code> configuration directory.</p>

<p>CouchDB reads all <code>.ini</code> files in <code>local.d</code> and merges them into its configuration. I created a <code>local.ini</code> there with the settings LiveSync requires: single-node mode, CORS enabled, maximum document and request sizes raised to handle vault chunks, authentication required on all endpoints.</p>

<p>The CORS configuration needed explicit origins for all three Obsidian platforms. Desktop Obsidian uses <code>app://obsidian.md</code>. iOS uses <code>capacitor://localhost</code>. Android uses <code>http://localhost</code>. All three need to be in the <code>origins</code> list or certain clients will get rejected at the preflight stage.</p>
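<p>As a sketch, the relevant <code>local.ini</code> can be generated with Python's <code>configparser</code>. Section and key names follow CouchDB's documented configuration layout, but the values here are illustrative; verify them against the LiveSync setup guide for your CouchDB version:</p>

```python
import configparser, io

# Sketch of the local.d/local.ini LiveSync needs. Values are illustrative;
# check the LiveSync docs for your CouchDB version before deploying.
cfg = configparser.ConfigParser()
cfg["couchdb"] = {"single_node": "true", "max_document_size": "50000000"}
cfg["chttpd"] = {"require_valid_user": "true", "enable_cors": "true",
                 "max_http_request_size": "4294967296"}
cfg["cors"] = {
    # All three Obsidian platforms must be listed or preflight fails.
    "origins": "app://obsidian.md,capacitor://localhost,http://localhost",
    "credentials": "true",
    "methods": "GET,PUT,POST,HEAD,DELETE",
    "headers": "accept,authorization,content-type,origin,referer",
}

buf = io.StringIO()
cfg.write(buf)
ini_text = buf.getvalue()
print(ini_text)
```

<p>Drop the output into the <code>local.d</code> bind mount and CouchDB merges it at startup.</p>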

<p>Before starting the container, the data directory needs to be owned by uid 5984—that's the user CouchDB runs as inside the container. If the directory is owned by root, the container starts but can't write to it. I pre-created the TrueNAS datasets and chowned them before deploying the stack. This is the kind of thing that's easy to miss and produces logs that look like permissions errors but don't immediately tell you which layer is wrong.</p>
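<p>A minimal pre-flight check for the ownership requirement (the uid comes from the official container image, per my setup; adjust if yours differs):</p>

```python
import os, tempfile

COUCHDB_UID = 5984  # uid the CouchDB process uses inside the container

def owned_by(path: str, uid: int) -> bool:
    """True if `path` exists and is owned by `uid`; run before starting the stack."""
    try:
        return os.stat(path).st_uid == uid
    except FileNotFoundError:
        return False

# A freshly created directory belongs to the current user, so this check
# fails until you `chown -R 5984:5984` the data directory.
data_dir = tempfile.mkdtemp()
print(owned_by(data_dir, os.getuid()))   # True
print(owned_by(data_dir, COUCHDB_UID))   # False (unless you happen to be uid 5984)
```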

<h2 class="section-heading">The Init Script</h2>

<p>A fresh CouchDB instance needs a one-time initialization step: a <code>_cluster_setup</code> API call to enable single-node mode, then CORS configuration, database creation, and a few other settings. The LiveSync documentation ships a shell script that handles all of it—ten API calls, each checking for success before proceeding.</p>

<p>I ran the script once against the container over the local network. It completed cleanly and returned <code>{"ok": true}</code> on each step. After that I didn't need it again—the config persisted in the CouchDB data directory, and the container has been running correctly through restarts since.</p>
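<p>For the curious, the script boils down to a sequence like this (method, endpoint, body). This is a condensed sketch, not the full ten-step script; endpoint paths follow CouchDB's documented API, and the database name <code>obsidian</code> is my placeholder:</p>

```python
# Condensed sketch of the init sequence. Config values are JSON-encoded
# strings, as CouchDB's _config API expects. "obsidian" is a placeholder.
def init_steps(db: str = "obsidian"):
    return [
        ("POST", "/_cluster_setup",
         {"action": "enable_single_node", "username": "admin",
          "password": "***", "bind_address": "0.0.0.0", "port": 5984}),
        ("PUT", "/_node/_local/_config/chttpd/require_valid_user", '"true"'),
        ("PUT", "/_node/_local/_config/chttpd/enable_cors", '"true"'),
        ("PUT", "/_node/_local/_config/cors/origins",
         '"app://obsidian.md,capacitor://localhost,http://localhost"'),
        ("PUT", f"/{db}", None),  # create the sync database
    ]

for method, path, _body in init_steps():
    print(method, path)
```

<p>Each real call checks the response for <code>{"ok": true}</code> before moving to the next step.</p>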

<p>One thing that tripped me up: CouchDB writes the admin password hash to a config file (<code>docker.ini</code>) once you set it. If you ever change the password through the Portainer environment variables after first run, the new value in the env var doesn't match what's in the config file and authentication breaks. Set the password once and leave it. If you need to change it, go through the CouchDB admin UI, not the container environment.</p>

<h2 class="section-heading">SSL and Proxy Setup</h2>

<p>In Nginx Proxy Manager I added <code>sync.waltbyte.net</code> as a new proxy host pointing to the CouchDB container on port 5984. I used the existing wildcard certificate rather than generating a new Let's Encrypt cert for the subdomain—the wildcard already covered it, so no additional certificate management needed.</p>

<p>One thing to verify in the NPM configuration: WebSocket support needs to be enabled for the proxy host. CouchDB's <code>_changes</code> feed uses long-polling in some configurations, and LiveSync's real-time sync benefits from it. I enabled it in the advanced settings tab.</p>
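<p>The long-poll variant is easy to picture: the client holds a GET open against <code>_changes</code> and CouchDB answers when something changes or the timeout expires. A sketch of the URL a LiveSync-style client would poll, with my placeholder database name:</p>

```python
from urllib.parse import urlencode

def changes_url(base: str, db: str, since: str = "now", timeout_ms: int = 60000) -> str:
    """Long-poll form of CouchDB's _changes feed: the request blocks until
    a new change arrives or the timeout elapses."""
    qs = urlencode({"feed": "longpoll", "since": since, "timeout": timeout_ms})
    return f"{base}/{db}/_changes?{qs}"

print(changes_url("https://sync.waltbyte.net", "obsidian"))
```

<p>Long-held requests like this are exactly what an impatient reverse proxy will kill, which is why the NPM advanced settings matter.</p>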

<p>With that in place, I could reach CouchDB at <code>https://sync.waltbyte.net</code> from anywhere—local network, Tailscale, and from my iPhone on cellular. The SSL certificate validated correctly on iOS.</p>

<h2 class="section-heading">LiveSync Setup: Windows</h2>

<p>The LiveSync plugin setup wizard walks you through the connection. You point it at the CouchDB URL, enter the admin credentials, name the database, and it creates everything it needs. The key decision point is end-to-end encryption.</p>

<p>LiveSync's E2EE uses a separate passphrase from your CouchDB credentials. The passphrase encrypts vault content before it ever reaches CouchDB—the database stores ciphertext, not your notes. This matters because CouchDB's admin account could theoretically be compromised at the proxy layer, and I wanted the vault contents protected even in that scenario. The passphrase never leaves your devices; CouchDB never sees the plaintext.</p>

<p>I set a strong E2EE passphrase, completed the wizard, and let it do the initial upload. The first sync takes a few minutes as it chunks and uploads the entire vault. After that, only changed documents replicate.</p>

<h2 class="section-heading">LiveSync Setup: iOS</h2>

<p>On the iPhone, it's the same plugin, same wizard. The first screen asks whether this is a new setup or adding a device to an existing sync. I chose the latter, pasted in the server URL and credentials, entered the E2EE passphrase—which has to match exactly what I set on the desktop—and it connected.</p>

<p>The iOS wizard has one more confirmation step than the desktop: it asks you to verify you understand it's going to overwrite local data with the remote vault. Read this carefully. It means the CouchDB copy becomes authoritative. Since I'd just uploaded from the desktop, that was correct—I wanted the desktop version to win. Confirmed, and it pulled the vault down cleanly.</p>

<p>After setup I disabled iCloud Drive for Obsidian on the iPhone. There's no reason to have it enabled now that sync is handled through CouchDB. The files live in Obsidian's local container on the phone and stay current through LiveSync.</p>

<h2 class="section-heading">How It All Fits Together</h2>

<p>After the full deployment, my sync architecture looks like this:</p>

<ul>
<li><strong>Syncthing</strong>: Windows workstation ↔ TrueNAS, file-level block sync over Tailscale. The TrueNAS copy is a filesystem backup of the vault. This keeps running and doesn't interact with LiveSync.</li>
<li><strong>CouchDB + LiveSync</strong>: Windows workstation ↔ iPhone, real-time document sync over HTTPS. Changes propagate in seconds when both devices are online.</li>
<li><strong>iCloud</strong>: disabled entirely for Obsidian on iPhone.</li>
</ul>

<p>The two sync mechanisms coexist without conflict because they operate at different layers. LiveSync manages document state inside Obsidian. Syncthing manages the raw files on disk on the platforms where it runs. They don't step on each other.</p>

<p>Edit a note on my phone. Open my workstation five minutes later. The change is there. Same in reverse. It's exactly what iCloud was supposed to do but didn't.</p>

<h2 class="section-heading">A Note on Data Sovereignty</h2>

<p>I want to be honest about what this buys you and what it doesn't.</p>

<p>Running CouchDB on your own hardware means your notes aren't on Apple's servers or Obsidian's servers. Your CouchDB admin credentials are in Nginx Proxy Manager's config and your Docker compose file. Your E2EE passphrase lives in the LiveSync plugin config on each device. If either of those is compromised, your notes could be exposed. The security model here is "I trust my homelab infrastructure more than I trust a third-party cloud service"—which may or may not be true for your setup.</p>

<p>For me, with Tailscale for VPN access, Nginx terminating SSL with a proper wildcard cert, and the vault contents encrypted at rest in CouchDB, I'm comfortable with the tradeoff. The data lives on hardware I own, in a building I'm in every day, behind infrastructure I've audited and understand.</p>

<h2 class="section-heading">Was It Worth It?</h2>

<p>Setup time was probably three hours end to end, including the CouchDB container configuration, the Nginx Proxy Manager proxy host, running the init script, and going through the LiveSync wizard on both devices. Not a one-afternoon quickstart, but not a multi-day project either.</p>

<p>The result is sync that actually works. My notes are available on both devices, stay current, and don't pass through any third-party cloud. The end-to-end encryption means the CouchDB database only ever stores ciphertext—the actual vault content is opaque to the infrastructure layer.</p>

<p>If you're already running a homelab with Docker, a reverse proxy, and a wildcard SSL cert, CouchDB plus LiveSync is probably the cleanest self-hosted Obsidian sync solution out there. The LiveSync plugin is actively maintained, the documentation is thorough, and the init script takes care of the tedious CouchDB API calls so you don't have to.</p>

<p>iCloud wasn't working for me anyway. This works.</p>]]></content><author><name>Daniel Farid</name></author><category term="homelab" /><category term="productivity" /><summary type="html"><![CDATA[I've been using Obsidian since the homelab idea started. It's the tool that finally clicked for me—local Markdown files, no proprietary format, a linking model that actually maps to how I think. The plugin ecosystem is deep enough to do serious work without feeling like you're assembling a Rube Goldberg machine.]]></summary></entry><entry><title type="html">Building a Proper AI Assistant with NemoClaw</title><link href="https://danielfarid.github.io/danielfarid.github.io/security/ai/homelab/2026/03/21/nemoclaw.html" rel="alternate" type="text/html" title="Building a Proper AI Assistant with NemoClaw" /><published>2026-03-21T14:00:00+00:00</published><updated>2026-03-21T14:00:00+00:00</updated><id>https://danielfarid.github.io/danielfarid.github.io/security/ai/homelab/2026/03/21/nemoclaw</id><content type="html" xml:base="https://danielfarid.github.io/danielfarid.github.io/security/ai/homelab/2026/03/21/nemoclaw.html"><![CDATA[<p>A couple months ago I wrote about Moltbot—how it's essentially Claude Code with a Telegram wrapper and a security posture that would make a pentester cry. No sandboxing, plaintext credentials, exposed admin panels, and supply chain attacks via its skills library. I ended that post saying I'd keep my AI agents on a short leash.</p>

<p>That wasn't idle commentary. I immediately started building something that actually did the job properly. This is that build.</p>

<h2 class="section-heading">The Problem Statement</h2>

<p>What I actually wanted was simple to describe, hard to build: an AI assistant I can talk to through Discord, that runs inference locally on my GPU, that can't touch files or network resources I haven't explicitly allowed, and that doesn't phone home to a cloud API with my conversations.</p>

<p>Moltbot's failure wasn't the concept—it was the execution. An always-on AI assistant that can take action is genuinely useful. But those exact attributes demand a serious security model, and Moltbot treats them as an afterthought. The irony is that NVIDIA already had a proper solution. I just had to actually deploy it.</p>

<h2 class="section-heading">What NemoClaw Is</h2>

<p>NemoClaw is NVIDIA's OpenClaw agent running inside their OpenShell sandbox framework. OpenClaw is an agentic AI assistant—it can write code, run shell commands, browse the web, manage files. OpenShell is the containment layer: a k3s-based runtime that enforces network policies and Landlock filesystem isolation at the kernel level.</p>

<p>Landlock is a Linux Security Module that restricts filesystem access independent of process permissions. Even if the agent runs as a privileged user, Landlock limits which directories it can actually touch. This is meaningfully different from "I put it in a Docker container and called it sandboxed." It's kernel-enforced isolation.</p>

<p>The network side uses declarative policy files. You define exactly which external hosts the sandbox can reach; the OpenShell gateway enforces it. For inference, NemoClaw integrates with NVIDIA's NIM microservices—optimized model containers with TensorRT-LLM backends. Instead of paying per token and sending queries to someone else's servers, the model runs on my RTX 5000 Ada under my desk.</p>

<h2 class="section-heading">The Stack</h2>

<p>Here's what the full system looks like:</p>

<pre><code>Discord #tpk-agent
      │
      ▼
discord-bridge/bot.py  (Python, discord.py)
      │  SSH subprocess per message
      ▼
openclaw agent  (inside tpk-agent sandbox)
      │
      ▼
openclaw gateway  (ws://127.0.0.1:18789)
      │
      ▼
nim-proxy.py  (request rewriter, port 8009)
      │
      ▼
NIM container  (Nemotron Nano 4B v1.1, TRT-LLM, RTX 5000 Ada)
</code></pre>

<p>Discord messages hit a Python bot that SSHes into the sandbox and runs <code>openclaw agent --message "..." --json</code>. The response comes back as JSON and gets posted to the channel. The sandbox routes inference through the openclaw gateway to the NIM container via a compatibility proxy I wrote—and that proxy is where most of the actual work happened.</p>

<h2 class="section-heading">The Real Work: NIM Compatibility</h2>

<p>OpenClaw is designed for OpenAI-compatible APIs, but it sends several parameters that NIM's TRT-LLM backend rejects with HTTP 400s. The format issues were mechanical: <code>tool_choice</code> needs to be a plain string, not an object; <code>message.content</code> needs to be a plain string, not a typed array; <code>max_completion_tokens</code> must become <code>max_tokens</code>; <code>store: false</code> needs stripping entirely. The proxy handles all of this on the fly.</p>

<p>The harder problem was tool schemas. OpenClaw sends full function schemas with every request. NIM's backend compiles these into a Finite State Machine for structured output—which takes 10–15 minutes per unique schema, blocks the GPU completely, and resets every time NIM restarts. I found this by watching GPU utilization sit at 100% for fifteen minutes and checking what TensorRT-LLM was actually doing.</p>

<p>The fix: strip tool definitions at the proxy layer. With <code>STRIP_TOOLS=1</code>, the proxy removes <code>tools</code> and <code>tool_choice</code> before forwarding, NIM sees a plain completion request, and it responds in under two seconds. The tradeoff is the model can't use tools—acceptable for my use case of a general Discord assistant.</p>
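<p>The rewrites are mechanical enough to sketch in a few lines. This is an illustration of the transformations described above, not the actual <code>nim-proxy.py</code>; field names follow the OpenAI chat-completions schema:</p>

```python
import copy

def rewrite_for_nim(req: dict, strip_tools: bool = True) -> dict:
    """Sketch of the request rewrites the proxy performs before forwarding."""
    out = copy.deepcopy(req)
    out.pop("store", None)                         # NIM rejects `store`
    if "max_completion_tokens" in out:             # NIM wants the older name
        out["max_tokens"] = out.pop("max_completion_tokens")
    if isinstance(out.get("tool_choice"), dict):   # object form -> plain string
        out["tool_choice"] = out["tool_choice"].get("type", "auto")
    for msg in out.get("messages", []):            # typed arrays -> plain strings
        if isinstance(msg.get("content"), list):
            msg["content"] = "".join(
                part.get("text", "") for part in msg["content"]
                if isinstance(part, dict))
    if strip_tools:                                # avoid per-schema FSM compilation
        out.pop("tools", None)
        out.pop("tool_choice", None)
    return out

demo = rewrite_for_nim({
    "model": "nemotron-nano-4b",
    "store": False,
    "max_completion_tokens": 256,
    "tools": [{"type": "function"}],
    "tool_choice": {"type": "auto"},
    "messages": [{"role": "user", "content": [{"type": "text", "text": "hi"}]}],
})
print(demo)  # store/tools gone, max_tokens renamed, content flattened
```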

<h2 class="section-heading">The Model Problem</h2>

<p>The original NemoClaw documentation is written around the 120B Nemotron Super model on NVIDIA cloud infrastructure. I'm running the 4B Nemotron Nano locally. The 8B OOM'd—TensorRT-LLM at bf16 precision needs more VRAM headroom than my 16GB card allows once WSL2 and the OS take their share. The 4B fits comfortably.</p>

<p>The 4B model handles general Q&amp;A well, but it struggled with OpenClaw's full system prompt—approximately 27,000 characters of agent instructions, workspace context, and tool documentation. It would occasionally output <code>NO_REPLY</code>, OpenClaw's special silent response token, instead of actually responding. The model was getting confused by conditional instructions in the tool descriptions that referenced the token in edge cases.</p>

<p>The fix was to deny all tools at the OpenClaw config level (<code>tools.deny: ["*"]</code>), which removes their descriptions from the system prompt entirely. The resulting ~16,000-character prompt the model handles correctly, and responses are typically under two seconds from Discord message to reply.</p>

<h2 class="section-heading">The Security Model (vs. Moltbot)</h2>

<p><strong>Credential storage:</strong> Moltbot stores API keys in plaintext under <code>~/.clawdbot/</code>. NemoClaw's sensitive config lives inside the OpenShell sandbox container. The host has an <code>.env</code> for startup variables in <code>.gitignore</code>. Commodity infostealers already know Moltbot's key paths; the OpenShell container is not a standard target.</p>

<p><strong>Sandboxing:</strong> Moltbot runs with your full user permissions. OpenShell enforces Landlock filesystem restrictions at the kernel level—the agent can only reach directories explicitly granted in its policy. My SSH keys, browser cookies, and the rest of the host are off-limits by default.</p>

<p><strong>Network access:</strong> Moltbot's network access is whatever your OS allows. OpenShell uses a YAML network policy file I commit to the project repo. Nothing else reaches out. If the model tries to exfiltrate data, the policy blocks it at the network layer.</p>

<p><strong>Inference:</strong> Moltbot calls out to Claude, GPT, or whatever API you've configured—every query leaves your machine. With NIM, inference runs locally on my GPU. My conversations don't leave the host.</p>

<p><strong>Supply chain:</strong> OpenClaw's skills library has significantly less community contribution than Moltbot's ClawdHub. That's sometimes frustrating, but it's the right tradeoff. I'm not installing skills from anonymous contributors until the ecosystem matures.</p>

<p>Is it perfect? No. The sandbox configuration took real work to get right, and if you grant the agent permissions, it can still act within them. Security is about reducing attack surface, not eliminating it. But it's meaningfully better than Moltbot's "use at your own risk" posture.</p>

<h2 class="section-heading">Hypothetical Use Cases</h2>

<p><em>The scenarios below are speculative. Any use of agentic AI in a real investigative or legal context should be reviewed by qualified legal and forensic professionals. AI output is not a substitute for trained human analysis.</em></p>

<p>I work in digital forensics and incident response. One workflow that comes to mind: warrant returns from electronic service providers. These returns typically unpack into sprawling directory trees of JSON, SQLite databases, CDR spreadsheets, and account metadata. A local agent with shell access could automate the unpack chain, inventory file types, and surface data categories before an investigator starts digging. Natural language queries against call records, schema discovery on unfamiliar databases, cross-referencing device identifiers across returns—these become conversational rather than manual.</p>

<p>The critical word is <em>local</em>. ESP returns contain victim and suspect data that cannot go to a cloud API—full stop. A model running on your hardware, with no network egress, scoped to only the files relevant to the case, is a fundamentally different proposition than pasting evidence into a chat interface backed by someone else's servers. For classroom use, the full visible pipeline—model, sandbox, network policy—also makes data sovereignty concrete rather than abstract. You can show students what the model <em>can't</em> reach, which is often the more important lesson.</p>
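<p>The inventory step, at least, is simple enough to sketch without any AI in the loop. A first pass over what an unpacked return contains might look like this:</p>

```python
import os, tempfile
from collections import Counter

def inventory(root: str) -> Counter:
    """Count files by extension across a directory tree. First-pass triage
    only, not a forensic tool; hashing and validation come later."""
    counts = Counter()
    for _dirpath, _dirs, files in os.walk(root):
        for name in files:
            ext = os.path.splitext(name)[1].lower() or "(none)"
            counts[ext] += 1
    return counts

# Tiny demo tree standing in for an unpacked return
root = tempfile.mkdtemp()
for name in ("records.json", "messages.sqlite", "cdr.csv", "README"):
    open(os.path.join(root, name), "w").close()
counts = inventory(root)
print(sorted(counts.items()))
```

<p>An agent adds value above this layer: asking which SQLite files contain message tables, or which JSON files reference a given device identifier.</p>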

<h2 class="section-heading">The Result</h2>

<p>After all that, what I have is an AI assistant—I named it Ahab—that runs entirely on hardware I own, responds in under two seconds, and can't touch anything on my system I haven't explicitly granted it access to. Message to <code>#tpk-agent</code>, typing indicator while the SSH subprocess runs, response appears. Context maintained within sessions.</p>

<p>The security model is one I can actually defend: kernel-enforced filesystem isolation, declarative network policy as code, local inference with no data leaving the host.</p>

<p>Is a 4B model going to replace Claude Code for complex engineering tasks? No—and that's not what this is for. It's an always-on assistant that can answer questions, help me think through problems, and run simple tasks without handing my credentials to an unsandboxed process or trusting a cloud API not to log my queries. When I have a larger GPU budget, I'll point this at a 70B model through Ollama. The architecture is ready—swap the model ID and the NIM container image, and everything else stays the same.</p>

<p>Until then, Ahab runs on four billion parameters and answers in two seconds, and I know exactly what it can and can't access.</p>

<p>That's the short leash I was talking about.</p>]]></content><author><name>Daniel Farid</name></author><category term="security" /><category term="ai" /><category term="homelab" /><summary type="html"><![CDATA[A couple months ago I wrote about Moltbot—how it's essentially Claude Code with a Telegram wrapper and a security posture that would make a pentester cry. No sandboxing, plaintext credentials, exposed admin panels, and supply chain attacks via its skills library. I ended that post saying I'd keep my AI agents on a short leash.]]></summary></entry><entry><title type="html">Cracking Tails OS Persistent Storage</title><link href="https://danielfarid.github.io/danielfarid.github.io/dfir/forensics/2026/03/09/tails-luks-crack.html" rel="alternate" type="text/html" title="Cracking Tails OS Persistent Storage" /><published>2026-03-09T13:00:00+00:00</published><updated>2026-03-09T13:00:00+00:00</updated><id>https://danielfarid.github.io/danielfarid.github.io/dfir/forensics/2026/03/09/tails-luks-crack</id><content type="html" xml:base="https://danielfarid.github.io/danielfarid.github.io/dfir/forensics/2026/03/09/tails-luks-crack.html"><![CDATA[<p>Seized a SanDisk USB during an investigation. Subject declined to provide the encryption password. Standard situation — so we work the problem from the other end. This post covers how I identified the persistent storage partition, imaged the device, and conducted an exhaustive password attack against the LUKS2 encryption. Spoiler: the encryption won. Here's why that's actually the expected outcome, and why the methodology used was the right call regardless.</p>

<h2 class="section-heading">The Device</h2>

<p>The USB in question was a SanDisk, 114.6 GiB total capacity, running Tails OS. For anyone unfamiliar, Tails is a privacy-focused live operating system that boots entirely from USB and leaves no trace on the host machine. The relevant forensic detail is that Tails supports an optional Persistent Storage feature — an encrypted partition on the same USB that survives reboots and stores user-configured data: documents, browser history, credentials, application data, and more. This is the partition that matters. Everything else on the drive is the read-only Tails system image.</p>

<p>Before touching anything, the device was photographed, bagged, logged, and handled according to standard chain of custody procedures. All work was performed on a forensic copy.</p>

<h2 class="section-heading">Imaging</h2>

<p>The USB was imaged using a standard forensic acquisition to produce a raw sector-by-sector image: <code>SanDisk.001</code>. This preserves every byte of the original, including partition tables, slack space, and the full LUKS header — which becomes important later. Working from the image rather than the live device protects against accidental writes and ensures the original evidence is never modified.</p>

<p>To work with the image in Linux, I mounted it as a loop device:</p>

<pre><code>sudo losetup -fP /home/tpkali/SanDisk.001
sudo losetup -l
</code></pre>

<p>This maps the image to <code>/dev/loop1</code> and automatically creates sub-devices for each partition (<code>/dev/loop1p1</code>, <code>/dev/loop1p2</code>).</p>

<h2 class="section-heading">Identifying the Persistent Storage</h2>

<p>With the loop device attached, I ran <code>fdisk</code> to examine the partition layout:</p>

<pre><code>sudo fdisk -l /home/tpkali/SanDisk.001
</code></pre>

<p>Output revealed a GPT-formatted disk with two partitions:</p>

<pre><code>Device                    Start       End   Sectors  Size Type
/home/tpkali/SanDisk.001p1     2048  16775390  16773343     8G EFI System
/home/tpkali/SanDisk.001p2 16777216 240326655 223549440 106.6G Linux reserved
</code></pre>

<p>The layout is exactly what you'd expect from a Tails USB. Partition 1 is the 8GB EFI system partition — the Tails live environment. Partition 2 is the 106.6GB "Linux reserved" partition — that's the Persistent Storage volume. The type designation "Linux reserved" is Tails' way of marking the persistence partition in the GPT table. It's not labeled or advertised, but if you know Tails, you know what you're looking at.</p>

<p>To confirm it was LUKS-encrypted and gather header information:</p>

<pre><code>sudo cryptsetup isLuks -v /dev/loop1p2
sudo cryptsetup luksDump /dev/loop1p2
</code></pre>

<p>The first command confirmed the LUKS signature. The dump revealed the specifics:</p>

<pre><code>Version:       2
Cipher:        aes-xts-plain64
Cipher key:    512 bits
PBKDF:         argon2id
Time cost:     4
Memory:        1048576
Threads:       4
</code></pre>

<p>LUKS version 2. AES-XTS-PLAIN64 with a 512-bit key. And the key derivation function: Argon2id with one gigabyte of memory, time cost 4, and 4 threads. That last part is where the real challenge lives, which I'll get to shortly.</p>

<h2 class="section-heading">Building the Attack</h2>

<p>With no password from the subject and no other avenue for key recovery, the only path forward is a targeted password attack. Since brute-force against modern encryption is computationally unrealistic, the approach is intelligence-driven: build a wordlist from everything known about the subject, then run permutations against it.</p>

<p>OSINT collection on the subject produced a set of candidate terms — names, pets, usernames, years, known patterns. These formed the initial list: 39 candidates covering the most likely base passwords. From there, I generated a permutation list applying common password construction patterns: appended numbers, special character suffixes, case variations, combined terms, and common "complexity" modifiers people use to satisfy password requirements. That expanded the attack surface to 14,491 total candidates.</p>
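<p>To make the permutation step concrete, here is a minimal sketch of the idea in bash. The base word is hypothetical, and the real list applied far more mutation patterns across all 39 candidates:</p>

```shell
# Expand one hypothetical base word with common suffix/case mutations;
# the real run grew 39 base terms into 14,491 candidates this way
base="rex2019"
printf '%s\n' "$base" "${base}!" "${base}123" "${base^}"
```

<p>Rule engines like hashcat's rule files, or generators like crunch, automate the same expansion at scale; a hand-rolled loop just makes the pattern explicit.</p>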

<p>The attack used a bash loop against the live image via cryptsetup's <code>--test-passphrase</code> flag — no write operations, no decryption, just a direct test of each candidate against the LUKS2 keyslot:</p>

<pre><code>while IFS= read -r password; do
    echo "Trying: $password"
    if echo "$password" | sudo cryptsetup luksOpen --test-passphrase /dev/loop1p2 2&gt;/dev/null; then
        echo "PASSWORD FOUND: $password"
        break
    fi
done &lt; "PasswordList.txt"
</code></pre>

<p>The session ran from the afternoon of March 5th through the morning of March 6th — approximately 17.5 hours. Every candidate was tested. None matched.</p>
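<p>The throughput implied by that run is easy to check, and it lines up with what the KDF is designed to cost:</p>

```shell
# 14,491 candidates tested over roughly 17.5 hours of wall-clock time
awk 'BEGIN { printf "%.1f seconds per candidate\n", 17.5 * 3600 / 14491 }'
# prints "4.3 seconds per candidate"
```

<p>A few seconds per guess is exactly the price a 1 GiB, time-cost-4 Argon2id derivation imposes on every attacker, regardless of hardware.</p>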

<h2 class="section-heading">Why This Approach Was Correct</h2>

<p>Two main alternatives exist for attacking LUKS2: dictionary/rule-based attacks via hashcat (GPU-accelerated), and direct keyslot testing via cryptsetup (what was done here). The choice depends heavily on the PBKDF in use.</p>

<p>For LUKS2 volumes protected by PBKDF2, hashcat provides substantial GPU acceleration through its dedicated LUKS2 hash modes. You extract the hash header with the <code>luks2hashcat</code> extraction tool, run it against a GPU cluster, and throughput can reach tens of thousands of candidates per second depending on hardware.</p>

<p>Argon2id changes that calculus entirely. It is intentionally designed to be resistant to GPU and ASIC acceleration through its memory-hard construction. Each hash derivation requires the full memory allocation — in this case, one gigabyte — to be written and read in full. GPUs have fast cores but limited per-thread memory bandwidth. You can't parallelize Argon2id across thousands of GPU cores the way you can with simpler hash functions, because each thread needs exclusive access to its 1GB working set. The result is that GPU acceleration provides marginal speedup over CPU for Argon2id, and in some configurations CPU is actually more efficient.</p>

<p>With the header parameters confirmed (argon2id, time=4, memory=1048576, threads=4), the direct cryptsetup approach on modern hardware was testing roughly one candidate every 2-4 seconds. That's not a tooling limitation — that's the function doing exactly what it was designed to do. A hashcat GPU run against this same header would have yielded comparable per-candidate timing, not the orders-of-magnitude speedup GPU acceleration provides against weaker KDFs.</p>

<p>Given that reality, a targeted high-quality wordlist with intelligent permutations was the correct strategy. Throwing rockyou at it would take the better part of a year for no practical gain. Intelligence-driven candidate generation is how you work a real case under realistic time constraints.</p>
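<p>The rockyou math bears that out. Assuming roughly 14.3 million entries and the observed per-candidate cost:</p>

```shell
# Full rockyou.txt at ~3 seconds per Argon2id derivation, in days
awk 'BEGIN { printf "%.0f days\n", 14300000 * 3 / 86400 }'
# prints "497 days"
```

<p>Over a year of continuous testing for a list with no intelligence behind it. The targeted 14,491-candidate list, by contrast, completed overnight.</p>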

<h2 class="section-heading">Why the Encryption Won</h2>

<p>The attack failed, and that outcome isn't surprising once you understand the math. It comes down to two things working together: the strength of the key derivation function and the quality of the password.</p>

<p>Argon2id was the winner of the Password Hashing Competition in 2015 — the outcome of a years-long effort by the cryptographic community to produce a KDF specifically resistant to modern hardware attacks. The parameters used here (1GB memory, time cost 4) are aggressive. They mean that any attacker — regardless of whether they're using a laptop or a GPU cluster — faces the same fundamental bottleneck: time and memory per attempt. There's no algorithmic shortcut. The 512-bit AES key derived from the password cannot be recovered without the password, and the password cannot be recovered from the LUKS header without testing candidates one by one at the rate the KDF imposes.</p>

<p>The second factor is the password itself. The entire premise of a dictionary attack is that people choose passwords from a predictable space: common words, personal identifiers, simple patterns. The 14,491 candidates tested here covered the full OSINT-derived attack surface — every known name, pet, username, date, and personal identifier associated with the subject, combined with the full range of common permutation patterns. That list should have been sufficient to crack a typical human-chosen password. It wasn't.</p>

<p>What that tells you: either the password contains a component with no OSINT anchor (a random string, an unrelated word, a passphrase fragment not derivable from anything in the public record), or the password length and entropy push it beyond the permutation space we could reasonably construct. Either way, the password obeyed the two fundamental rules of good password creation: no personal information, and sufficient length or randomness to resist pattern-based attacks. Combined with Argon2id's hardware-resistant key stretching, the result is a system that simply can't be broken on any realistic investigative timeline.</p>

<h2 class="section-heading">The Bottom Line</h2>

<p>The methodology used here — forensic imaging, LUKS header analysis, targeted OSINT-based wordlist attack, intelligent permutation generation — represents the correct approach given the available information. There was no faster path. GPU acceleration against Argon2id doesn't provide the speedup that makes brute-force feasible against well-chosen passwords, and throwing generic wordlists like rockyou at an Argon2id container is an exercise in burning time with near-zero probability of success.</p>

<p>This is what defense-grade encryption looks like when it's properly used. Tails made the right call implementing LUKS2 with Argon2id as the persistence layer. A subject who chose a strong password and used Tails as intended created a forensic barrier that reflects the current state of the art — not a failure of methodology, but a demonstration that the cryptography is working exactly as designed.</p>

<p>Note: All work was conducted under proper legal authority as part of a lawful digital forensic examination.</p>]]></content><author><name>Daniel Farid</name></author><category term="dfir" /><category term="forensics" /><summary type="html"><![CDATA[Seized a SanDisk USB during an investigation. Subject declined to provide the encryption password. Standard situation — so we work the problem from the other end. This post covers how I identified the persistent storage partition, imaged the device, and conducted an exhaustive password attack against the LUKS2 encryption. Spoiler: the encryption won. Here's why that's actually the expected outcome, and why the methodology used was the right call regardless.]]></summary></entry><entry><title type="html">I Tested Moltbot So You Don’t Have To</title><link href="https://danielfarid.github.io/danielfarid.github.io/security/ai/2026/01/30/moltbot-test.html" rel="alternate" type="text/html" title="I Tested Moltbot So You Don’t Have To" /><published>2026-01-30T14:00:00+00:00</published><updated>2026-01-30T14:00:00+00:00</updated><id>https://danielfarid.github.io/danielfarid.github.io/security/ai/2026/01/30/moltbot-test</id><content type="html" xml:base="https://danielfarid.github.io/danielfarid.github.io/security/ai/2026/01/30/moltbot-test.html"><![CDATA[<p>You've probably seen it by now. Moltbot (formerly Clawdbot) has been everywhere—60,000+ GitHub stars, viral Twitter threads, breathless Medium posts about "your personal AI assistant that actually does things." The hype machine was running at full throttle.</p>

<p>So I did what any reasonable person would do: I spun up a VM and tested it myself.</p>

<p>My takeaway? It's Claude Code with a Telegram wrapper and a multitude of attack vectors. I'm not impressed.</p>

<h2 class="section-heading">The Setup</h2>

<p>I wasn't about to run this thing on bare metal. After spending years hardening my homelab and locking down every attack surface I could find, I wasn't going to hand root access to an AI agent on my main system. I spun up an isolated VM specifically for this test.</p>

<p>The installation was straightforward enough—Moltbot promises to be your "personal AI assistant" that can manage your calendar, respond to emails, control your smart home, and basically act as a digital butler. The pitch is compelling: local-first, open-source, full system access.</p>

<p>I connected it to Telegram, which worked without issue. The Telegram interface was actually nice—polished, responsive, easy to use. But it also felt kitschy, like someone wrapped a CLI tool in a chatbot skin and called it innovation. WhatsApp pairing wouldn't work for me, though I suspect that was user error on my part rather than a Moltbot issue.</p>

<p>That "full system access" part should have been the first red flag.</p>

<h2 class="section-heading">What It Actually Is</h2>

<p>After spending time with Moltbot, I came to a simple conclusion: this is essentially Claude Code (or any agentic coding assistant) with a messaging platform wrapper bolted on top.</p>

<p>Don't get me wrong—Claude Code is genuinely useful for software development tasks. But Moltbot takes that same concept and tries to extend it to "life admin" through Telegram, WhatsApp, or whatever messaging platform you connect. The problem is that the security model doesn't scale.</p>

<p>When I'm using Claude Code, I'm in my terminal, in my development environment, watching every command it suggests before execution. With Moltbot, the expectation is that you'll fire off a message from your phone while you're at the grocery store and trust the AI to "handle it."</p>

<p>That's a fundamentally different threat model, and Moltbot doesn't treat it as such.</p>

<h2 class="section-heading">The Security Nightmare</h2>

<p>Nick Saraev put it bluntly in his videos: <a href="https://www.youtube.com/watch?v=esXXuejofgk">Clawdbot Sucks, Actually</a>. And then it got worse. My sentiments align exactly with his analysis.</p>

<p>The security issues aren't theoretical. They're well-documented and actively exploited:</p>

<p><strong>Plaintext Credential Storage:</strong> Moltbot stores your API keys, OAuth tokens, and credentials in plaintext files under <code>~/.clawdbot/</code>. Plain. Text. In 2026. Security researchers at <a href="https://www.bitdefender.com/en-us/blog/hotforsecurity/moltbot-security-alert-exposed-clawdbot-control-panels-risk-credential-leaks-and-account-takeovers">Bitdefender</a> noted that commodity infostealers like RedLine, Lumma, and Vidar are already targeting these files.</p>

<p><strong>Exposed Admin Panels:</strong> Jamieson O'Reilly from Dvuln found <a href="https://www.theregister.com/2026/01/27/clawdbot_moltbot_security_concerns/">hundreds of Moltbot instances exposed to the internet</a> with no authentication. Open admin dashboards. Full access to API keys, conversation histories, and remote code execution capabilities. Just sitting there.</p>

<p><strong>No Sandboxing:</strong> By default, Moltbot runs with the same permissions as your user account. No containerization, no isolation. The AI agent has full access to everything you have access to. <a href="https://blogs.cisco.com/ai/personal-ai-agents-like-moltbot-are-a-security-nightmare">Cisco's security team</a> called it "an absolute nightmare."</p>

<p><strong>Poisoned Skills Library:</strong> O'Reilly demonstrated a <a href="https://snyk.io/articles/clawdbot-ai-assistant/">proof-of-concept supply chain attack</a> by uploading a malicious skill to ClawdHub, artificially inflating its download count to 4,000+, and watching developers from seven countries install it. Remote code execution via the skills library. Classic supply chain attack.</p>

<p><strong>Prompt Injection Surface:</strong> Moltbot ingests data from emails, web searches, and messages. Each of these is a potential prompt injection vector. A malicious email could contain hidden instructions that the AI dutifully executes with your full system permissions.</p>

<h2 class="section-heading">The Hype Machine</h2>

<p>What makes this worse is the manufactured hype. The project went viral partly due to <a href="https://dev.to/sivarampg/from-clawdbot-to-moltbot-how-a-cd-crypto-scammers-and-10-seconds-of-chaos-took-down-the-4eck">crypto scammers hijacking the old Clawdbot handles</a> during the rebrand and pumping a fake $CLAWDE token to $16 million before it crashed.</p>

<p>The rebrand itself happened because Anthropic sent a cease and desist over the name similarity to Claude. In the ten seconds between releasing the old GitHub organization name and claiming the new one, scammers snatched both the old handles.</p>

<p>This isn't just a security story—it's a case study in how AI hype cycles can be weaponized.</p>

<h2 class="section-heading">The Fundamental Problem</h2>

<p>Here's what bothers me most: the Moltbot documentation openly admits "there is no 'perfectly secure' setup." The creators have been transparent that there are no built-in security policies or safety guardrails. It's designed for "advanced AI innovators who prioritize testing and productivity over security controls."</p>

<p>That's not a disclaimer. That's an abdication of responsibility.</p>

<p>According to <a href="https://www.token.security/blog/the-clawdbot-enterprise-ai-risk-one-in-five-have-it-installed">Token Security</a>, 22% of their enterprise customers have employees running Moltbot—likely without IT approval. Hudson Rock's assessment was damning: "Clawdbot represents the future of personal AI, but its security posture relies on an outdated model of endpoint trust."</p>

<h2 class="section-heading">What Would Make It Better</h2>

<p>If you absolutely must run Moltbot, <a href="https://www.bleepingcomputer.com/news/security/viral-moltbot-ai-assistant-raises-concerns-over-data-security/">security researchers recommend</a>:</p>

<ul>
<li>Isolate it in a VM or container—never run on your host OS</li>
<li>Firewall all admin ports aggressively</li>
<li>Enable encryption at rest for stored credentials</li>
<li>Never expose it to the internet</li>
<li>Vet every skill before installing—treat ClawdHub like an untrusted package repository</li>
<li>Restrict filesystem access to only what's absolutely necessary</li>
</ul>
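<p>Most of that checklist can be expressed in a single container invocation. This is a hedged sketch only: the image name, volume, and mount point are illustrative, not Moltbot's documented deployment:</p>

```shell
# Illustrative hardened deployment; image and volume names are hypothetical
docker run -d \
  --name moltbot \
  --read-only \
  --tmpfs /tmp \
  --cap-drop ALL \
  --security-opt no-new-privileges \
  -v moltbot-data:/data \
  moltbot/moltbot:latest
# no -p flags: nothing is published beyond the default bridge network
```

<p>Even then, whatever lands in that volume is still plaintext unless the application encrypts it at rest.</p>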

<p>But at that point, you've basically rebuilt the security model from scratch. And if you're doing all that work, why not just use Claude Code in your terminal where you can actually see what's happening?</p>

<h2 class="section-heading">My Verdict</h2>

<p>Moltbot is a solution looking for a problem, wrapped in a security nightmare. The promise of an AI assistant that "does things" is compelling, but the implementation prioritizes convenience over security in ways that are genuinely dangerous.</p>

<p>For my use case—someone who runs a homelab, cares about security posture, and doesn't want to hand my credentials to plaintext files accessible by any infostealer—Moltbot is a hard pass.</p>

<p>I'll stick with Claude Code in my terminal, where I can see every command before it executes, where my credentials aren't sitting in plaintext Markdown files, and where the attack surface is something I actually control.</p>

<p>The hype will die down. The exposed instances will get compromised. And hopefully, the next generation of AI assistants will learn from Moltbot's mistakes.</p>

<p>Until then, I'll keep my AI agents on a short leash.</p>]]></content><author><name>Daniel Farid</name></author><category term="security" /><category term="ai" /><summary type="html"><![CDATA[You've probably seen it by now. Moltbot (formerly Clawdbot) has been everywhere—60,000+ GitHub stars, viral Twitter threads, breathless Medium posts about "your personal AI assistant that actually does things." The hype machine was running at full throttle.]]></summary></entry><entry><title type="html">When Your ISP Says You’re Under Attack</title><link href="https://danielfarid.github.io/danielfarid.github.io/homelab/security/2026/01/25/xfinity-security-alerts.html" rel="alternate" type="text/html" title="When Your ISP Says You’re Under Attack" /><published>2026-01-25T13:00:00+00:00</published><updated>2026-01-25T13:00:00+00:00</updated><id>https://danielfarid.github.io/danielfarid.github.io/homelab/security/2026/01/25/xfinity-security-alerts</id><content type="html" xml:base="https://danielfarid.github.io/danielfarid.github.io/homelab/security/2026/01/25/xfinity-security-alerts.html"><![CDATA[<p>So there I was, Saturday morning, coffee in hand, when my phone buzzes with an Xfinity Advanced Security alert. The message was ominous: "Attempt from IP 194.180.49.117 was blocked on Generic Brand Linux. This happens when a known source of hacking tries to attack a device on your home network."</p>

<p>My TrueNAS server was apparently under attack. Cool. Great way to start the weekend.</p>

<h2 class="section-heading">The Initial Panic</h2>

<p>When you're running 17+ containers and your entire homelab infrastructure on one box, "known source of hacking" gets your attention. My first thought was that something got exposed—maybe a misconfigured Cloudflare tunnel, a rogue UPnP port forward, or some container decided to phone home to the wrong neighborhood.</p>

<p>I did a quick sanity check. Pulled up my Cloudflare DNS records—all pointing to my Tailscale IP (100.64.0.0/10 range), not my actual public IP. Good. Checked the Xfinity gateway for port forwards—none configured. Also good.</p>

<p>Then I ran a quick port scan against my own public IP:</p>

<pre><code>for port in 22 80 443 445 8080 8443 9000 32400 6881; do
  timeout 2 bash -c "echo &gt;/dev/tcp/[PUBLIC_IP]/$port" 2&gt;/dev/null &amp;&amp; echo "Port $port: OPEN" || echo "Port $port: closed"
done
</code></pre>

<p>All closed. Nothing exposed. So what gives?</p>
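<p>The <code>/dev/tcp</code> trick in that loop deserves a note: bash treats a redirect to <code>/dev/tcp/HOST/PORT</code> as a TCP connect, so you can probe ports with nothing installed. Isolated, with localhost standing in for the target:</p>

```shell
# Port 1 on localhost is almost certainly closed, so the connect fails
port=1
if timeout 2 bash -c "echo > /dev/tcp/127.0.0.1/$port" 2>/dev/null; then
  echo "Port $port: OPEN"
else
  echo "Port $port: closed"
fi
```

<p>It's a bash feature, not a real device file, and it only tells you open versus closed; for service detection you still want nmap.</p>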

<h2 class="section-heading">The Second Alert</h2>

<p>The next morning, another alert. Different IP this time: 185.156.73.181. Same story—blocked attempt on "Generic Brand Linux."</p>

<p>At this point I figured it was time to actually investigate these IPs and run a proper security assessment of my stack. If someone's knocking, I want to know who and make sure the door is actually locked.</p>

<h2 class="section-heading">Threat Intelligence</h2>

<p>I dug into both IPs to see what we were dealing with.</p>

<p><strong>IP #1: 194.180.49.117</strong> belongs to MEVSPACE sp. z o.o. (AS201814), a Polish VPS hosting provider based in Warsaw. MEVSPACE has been documented in threat intelligence reports as infrastructure commonly abused for session hijacking and credential stuffing campaigns. Their IPs frequently appear in automated scanning operations. Source: <a href="https://ipinfo.io/AS201814">IPinfo.io</a>, <a href="https://cybersecuritynews.com/hackers-abuse-vps-servers/">CyberSecurityNews</a></p>

<p><strong>IP #2: 185.156.73.181</strong> is registered to TOV E-RISHENNYA (AS211736), with the network "Reldas-net." It's registered in the Netherlands but the actual organization address is in Kyiv, Ukraine. The abuse contact? A Gmail address: erishennya.res@gmail.com. That's always a red flag. This is classic bulletproof-adjacent hosting—cheap VPS instances used for scanning campaigns, then burned when they get blacklisted. Source: <a href="https://ipinfo.io/AS211736/185.156.73.0/24">IPinfo.io</a></p>

<h2 class="section-heading">The Full Security Audit</h2>

<p>Since I was already in investigation mode, I ran a comprehensive security posture assessment on my TrueNAS box. Here's the script:</p>

<pre><code>#!/bin/bash
# TrueNAS Security Posture Assessment

echo "=============================================="
echo "  TRUENAS SECURITY POSTURE ASSESSMENT"
echo "  $(date)"
echo "=============================================="

echo -e "\n[1/15] SYSTEM INFORMATION"
echo "Hostname: $(hostname)"
echo "TrueNAS Version: $(cat /etc/version 2&gt;/dev/null)"
echo "Kernel: $(uname -r)"
echo "Uptime: $(uptime -p)"

echo -e "\n[2/15] USER ACCOUNTS &amp; ACCESS"
echo "--- Users with shell access ---"
cat /etc/passwd | grep -v nologin | grep -v false | grep -v /bin/sync

echo -e "\n--- Users with UID 0 (root equivalent) ---"
awk -F: '$3 == 0 {print $1}' /etc/passwd

echo -e "\n[3/15] SSH CONFIGURATION"
SSHD_CONFIG="/etc/ssh/sshd_config"
if [ -f "$SSHD_CONFIG" ]; then
  echo "PermitRootLogin: $(grep -i "^PermitRootLogin" $SSHD_CONFIG 2&gt;/dev/null || echo 'not set')"
  echo "PasswordAuthentication: $(grep -i "^PasswordAuthentication" $SSHD_CONFIG 2&gt;/dev/null || echo 'not set')"
  echo "PubkeyAuthentication: $(grep -i "^PubkeyAuthentication" $SSHD_CONFIG 2&gt;/dev/null || echo 'not set')"
fi

echo -e "\n[4/15] NETWORK LISTENERS"
echo "--- Services on 0.0.0.0 (all interfaces) ---"
ss -tlnp | grep "0.0.0.0" | awk '{print $4, $6}' | sort -u

echo -e "\n[5/15] FIREWALL STATUS"
iptables -L -n 2&gt;/dev/null | head -20

echo -e "\n[6/15] DOCKER SECURITY"
echo "--- Privileged containers ---"
docker ps -q 2&gt;/dev/null | xargs -I {} docker inspect --format '{{.Name}}: Privileged={{.HostConfig.Privileged}}' {} 2&gt;/dev/null | grep "true"

echo -e "\n--- Containers with host network ---"
docker ps -q 2&gt;/dev/null | xargs -I {} docker inspect --format '{{.Name}}: {{.HostConfig.NetworkMode}}' {} 2&gt;/dev/null | grep "host"

echo -e "\n--- Docker socket mounts (container escape risk) ---"
docker ps -q 2&gt;/dev/null | xargs -I {} docker inspect --format '{{.Name}}: {{range .Mounts}}{{.Source}} {{end}}' {} 2&gt;/dev/null | grep "docker.sock"

echo -e "\n[7/15] APPLICATION AUTH STATUS"
echo "--- Checking *arr apps for auth ---"
for app in sonarr radarr prowlarr; do
  config=$(docker exec $app cat /config/config.xml 2&gt;/dev/null)
  if [ -n "$config" ]; then
    auth=$(echo "$config" | grep -oPm1 "(?&lt;=&lt;AuthenticationMethod&gt;)[^&lt;]+" || echo "None")
    echo "$app: $auth"
  fi
done 2&gt;/dev/null

echo -e "\n[8/15] FAILED LOGIN ATTEMPTS"
grep -i "failed\|invalid" /var/log/auth.log 2&gt;/dev/null | tail -10

echo -e "\n[9/15] SMB SECURITY"
testparm -s 2&gt;/dev/null | grep -i "hosts allow"

echo -e "\n[10/15] NFS EXPORTS"
showmount -e localhost 2&gt;/dev/null

echo -e "\n[11/15] FILE PERMISSIONS"
echo "--- World-writable files in /etc ---"
find /etc -type f -perm -002 2&gt;/dev/null | head -5

echo -e "\n[12/15] CRON JOBS"
crontab -l 2&gt;/dev/null | grep -v "^#"

echo -e "\n[13/15] TAILSCALE STATUS"
tailscale status 2&gt;/dev/null || docker exec tailscale tailscale status 2&gt;/dev/null

echo -e "\n[14/15] CERTIFICATE EXPIRY"
echo "--- Certs expiring within 30 days ---"
find /etc -name "*.crt" 2&gt;/dev/null | head -5 | while read cert; do
  expiry=$(openssl x509 -enddate -noout -in "$cert" 2&gt;/dev/null | cut -d= -f2)
  if [ -n "$expiry" ]; then
    exp_epoch=$(date -d "$expiry" +%s 2&gt;/dev/null)
    now_epoch=$(date +%s)
    days_left=$(( (exp_epoch - now_epoch) / 86400 ))
    [ "$days_left" -lt 30 ] &amp;&amp; echo "EXPIRING: $cert ($days_left days)"
  fi
done

echo -e "\n[15/15] RECENT CONFIG CHANGES"
find /etc -type f -mtime -7 2&gt;/dev/null | head -10

echo -e "\n=============================================="
echo "  ASSESSMENT COMPLETE"
echo "=============================================="
</code></pre>

<h2 class="section-heading">The Results: Everything Was Fine</h2>

<p>Here's what my audit confirmed: SSH service disabled (inactive). SSH config set to key-only auth with no root password. SMB access restricted to Tailscale IPs only (hosts allow = 100.64.0.0/10). NFS had no exports configured. All *arr apps had Forms authentication enabled. Only Tailscale ran as a privileged container (which is required). Zero failed login attempts. All external ports closed.</p>

<p>My perimeter was solid. The scanners were hitting a brick wall.</p>

<h2 class="section-heading">What's Actually Happening</h2>

<p>Every public IP on the internet gets scanned thousands of times daily. This isn't news—it's just the background radiation of the modern internet. What's changed is the scale and sophistication.</p>

<p>These alerts represent automated reconnaissance from cheap VPS instances. The attackers spin up servers on MEVSPACE, E-RISHENNYA, and dozens of similar providers, run internet-wide scans looking for exposed NAS devices (Synology, QNAP, TrueNAS), default credentials on routers, open Plex servers, vulnerable IoT devices, and anything running on common ports.</p>

<p>When Xfinity's threat intelligence recognizes a known-bad IP hitting your public address, it generates an alert. The key phrase in the notification? "was blocked." The probe never reached my TrueNAS—it hit the gateway's firewall and died.</p>

<h2 class="section-heading">The Bot and AI Factor</h2>

<p>What's interesting is the sheer volume. These aren't script kiddies manually probing networks. This is automated infrastructure—likely AI-assisted in target selection and vulnerability identification. The economics are simple: spin up disposable VPS instances for $5-10/month, run automated scans against residential IP ranges, flag anything that responds, sell access or exploit directly, burn the IP when it gets blacklisted, repeat.</p>

<p>The barrier to entry has never been lower. Tools like Shodan and Censys have legitimate uses, but they've also created a roadmap for attackers. Everyone knows where the NAS devices are. The scanners will keep coming. That's just the internet now. The goal isn't to make them stop—it's to make sure they find nothing when they arrive.</p>]]></content><author><name>Daniel Farid</name></author><category term="homelab" /><category term="security" /><summary type="html"><![CDATA[So there I was, Saturday morning, coffee in hand, when my phone buzzes with an Xfinity Advanced Security alert. The message was ominous: "Attempt from IP 194.180.49.117 was blocked on Generic Brand Linux. This happens when a known source of hacking tries to attack a device on your home network."]]></summary></entry><entry><title type="html">Homelab Chronicles</title><link href="https://danielfarid.github.io/danielfarid.github.io/homelab/2025/12/29/homelabfollowup.html" rel="alternate" type="text/html" title="Homelab Chronicles" /><published>2025-12-29T16:27:43+00:00</published><updated>2025-12-29T16:27:43+00:00</updated><id>https://danielfarid.github.io/danielfarid.github.io/homelab/2025/12/29/homelabfollowup</id><content type="html" xml:base="https://danielfarid.github.io/danielfarid.github.io/homelab/2025/12/29/homelabfollowup.html"><![CDATA[<p>So I built a homelab three months ago—a proper production-grade personal cloud infrastructure running 17+ services. Since then, I've learned that "production-ready" doesn't mean "maintenance-free." I figured a little update could be helpful.</p>

<p>For the most part my services have been running flawlessly. No issues with my mobile device integration; it's been great. In this update: a routine TrueNAS update that broke DNS resolution across my entire network, a three-day battle with Wazuh SIEM that ended in defeat, and how a 15-minute Graylog deployment finally gave me the security monitoring I actually needed.</p>

<h2 class="section-heading">The Update That Broke Everything</h2>

<p>TrueNAS had been nagging me about updates for a couple weeks. I figured "how bad could it be?" The system had been rock-solid for months, and every piece of data I own is on this thing, so I felt updates shouldn't be avoided. The update went fine—system rebooted, came back up, all services started. Then I tried to access anything. Nothing worked.</p>

<p>The problem was DNS resolution. Tailscale IP addresses had shifted during the update, and my Cloudflare DNS records, still pointing to the old IPs, suddenly resolved to nowhere. Every subdomain I'd configured was broken.</p>

<p>What should have been a 10-minute fix turned into an evening of troubleshooting because I hadn't documented which services relied on which IP configurations. The fix was straightforward—update DNS records in Cloudflare, restart Pi-hole. But the real lesson was documentation: I spent the next evening properly mapping my entire network topology, IP allocations, and service dependencies.</p>

<h2 class="section-heading">The Wazuh SIEM Disaster</h2>

<p>With the network stable, I decided to implement proper security monitoring. After some research I settled on Wazuh—an enterprise-grade open-source SIEM that looked perfect on paper. Enterprise features, agent-based monitoring, threat detection, MITRE ATT&amp;CK integration.</p>

<p>"I'll just deploy the Docker stack," I thought. "It'll be running in 10 minutes."</p>

<p>Three hours later, I was staring at SSL certificate errors. The Wazuh indexer refused to start. Certificate validation failures everywhere. I tried disabling SSL requirements with every environment variable I could find: DISABLE_INSTALL_DEMO_CONFIG=true, plugins.security.disabled=true, OPENSEARCH_SSL_VERIFICATIONMODE=none.</p>

<p>Guess what: you cannot disable SSL in Wazuh. It's baked into every component at a fundamental level.</p>

<p>Attempt two: Use Wazuh 4.3.10—older, more stable. Same SSL certificate errors. Different version, identical problems. The indexer crashed: "access denied reading /etc/wazuh-indexer/certs/indexer.pem". Fixed permissions. It crashed again: "certificate is valid for demo.indexer, not wazuh-indexer".</p>

<p>Attempt three: Follow Wazuh's official certificate generation process. Downloaded their cert generation tool, created proper certificates with correct hostnames:</p>

<pre><code>nodes:
  indexer:
    - name: wazuh-indexer
      ip: 172.20.0.26
  server:
    - name: wazuh
      ip: 172.20.0.25
  dashboard:
    - name: wazuh-dashboard
      ip: 172.20.0.27
</code></pre>

<p>Generated certificates. Mounted them correctly. The indexer started for 30 seconds, then crashed with "bad_certificate" errors—despite having certificates it just generated using Wazuh's own tool.</p>

<p>After 8+ hours across three attempts, I gave up. I'd learned a lot about OpenSearch security and SSL/TLS (which I'm sure will be super useful... never), but I didn't have a working SIEM.</p>

<h2 class="section-heading">Graylog: 15 Minutes to Working SIEM</h2>

<p>Time for a reality check. If I've learned anything from my Incident Response training, it's that I could accomplish this with LOGS. I needed to see Docker logs in one place, catch failed SSH attempts, and have a searchable history. I didn't need enterprise-grade security orchestration or MITRE ATT&amp;CK integration. And with the looming role of official SysAdmin at work, I figured I might as well get knee-deep now.</p>

<p>Graylog deployment: 15 minutes from docker-compose to working web UI. Three containers (MongoDB, OpenSearch, Graylog), no SSL certificates, no security plugin battles. Just services that start and work:</p>

<pre><code>services:
  mongodb:
    image: mongo:6.0
  opensearch:
    image: opensearchproject/opensearch:2.11.0
    environment:
      - "plugins.security.disabled=true"
  graylog:
    image: graylog/graylog:5.2
    ports:
      - "9001:9000"      # Web interface
      - "1514:1514/udp"  # Syslog
      - "12201:12201/udp" # GELF
</code></pre>

<p>Getting logs flowing was trivial. For TrueNAS: created a Syslog UDP input in Graylog, pointed TrueNAS at it. Done. For Docker containers, updated /etc/docker/daemon.json:</p>

<pre><code>{
  "log-driver": "gelf",
  "log-opts": {
    "gelf-address": "udp://10.0.0.96:12201",
    "tag": ""
  }
}
</code></pre>

<p>For existing containers in Portainer stacks, I added logging config to each service, since the daemon.json default only applies to containers created after the change:</p>

<pre><code>logging:
  driver: gelf
  options:
    gelf-address: "udp://10.0.0.96:12201"
    tag: ""
</code></pre>

<p>Redeployed stacks. Logs started flowing immediately from all 17+ containers.</p>

<h2 class="section-heading">Building Dashboards That Actually Help</h2>

<p>Created a "Homelab Overview" dashboard with widgets that matter:</p>

<p><strong>Top Active Containers</strong> - Shows which services are chattiest (Immich logs everything).</p>

<p><strong>SSH Failed Logins</strong> - Search: message:"Failed password". Shows bot brute-force attempts.</p>

<p><strong>Error Timeline</strong> - Search: level:&lt;4 (emergency, alert, critical, error). Red area chart showing problems across all services.</p>

<p><strong>Container Restarts</strong> - Catches flapping services. qBittorrent occasionally restarts—something to investigate.</p>

<p>Total time to create useful dashboards: probably 60 minutes. No fighting the tool, just building what I needed. The UI isn't super straightforward, though, and I got hung up a bit. Be careful to make sure you know how the indexer is naming logs from the stream.</p>

<h2 class="section-heading">Other Updates</h2>

<p><strong>Kali Linux VM:</strong> Deployed on NVMe storage, accessible via web console. Only runs when needed for security research. Isolated environment with the full toolkit (Burp Suite, Metasploit, nmap, Wireshark).</p>

<p><strong>qBittorrent Performance:</strong> Still glacially slow through ProtonVPN. Tried different endpoints, connection tweaks, nothing helps. Downloads that should take hours take days. It works, just slowly—on the "fix when bored" list.</p>

<p><strong>Everything Else:</strong> Rock solid. Plex streams 4K without stuttering, which came in clutch to avoid the Christmas movie streaming price gouging this year. Immich processes photos instantly. Vaultwarden syncs passwords perfectly. Tailscale provides VPN access from anywhere. The core homelab just works.</p>

<h2 class="section-heading">Lessons Learned</h2>

<p>Know When to Walk Away: I spent 8+ hours fighting Wazuh. Should have quit after attempt two. Graylog took 15 minutes and was immediately useful. Sometimes "good enough" beats "perfect but doesn't work."</p>

<p>Documentation Saves Lives: When DNS broke, I couldn't remember which services needed which configurations. Now I have network diagrams, IP allocations, and dependency maps. This has already saved hours during troubleshooting.</p>

<p>Match Tools to Requirements: Wazuh has impressive features I don't need. Graylog does exactly what I need without overwhelming complexity. For a 17-container homelab, simple and functional beats feature-rich and broken.</p>

<p>Monitoring Changes Everything: Before Graylog, I knew when services went down but not why. Now I can see patterns, identify root causes in minutes instead of hours, have alerts in place for any malicious behavior, and actually understand what's happening across my infrastructure.</p>

<h2 class="section-heading">Current State</h2>

<p>Homelab is back to stable production. Seventeen containers with 99%+ uptime. Memory at 40GB/64GB, CPU rarely exceeds 30%, storage at 12% utilization. Graylog ingests 50,000 log messages daily. Monitoring dashboards show what matters. Family members use services without realizing anything changed.</p>

<p>Most importantly: infrastructure I completely control with full visibility into everything happening.</p>

<h2 class="section-heading">For Anyone Building Their Own</h2>

<p>Start with Graylog unless you need specific enterprise features. Don't waste days fighting Wazuh's SSL requirements.</p>

<p>Document your network before you need it. Future you will thank present you. Seriously, that sucked.</p>

<p>Accept imperfection. My qBittorrent is slow. Some containers restart occasionally. It's fine, because the goal is functional, monitored, maintainable infrastructure that you enjoy working on and learn from while building it.</p>

<p>The homelab journey is iterative—you build, it breaks, you fix, you learn, you improve. Three months ago I thought it was "done." Today I know it's never done. And that's what makes it interesting and I'm sure will have a ton of iterations. Questions? Hit me up. Want to see my configs? They're probably in one of my GitHub repos. Want to tell me what I did wrong? You're probably right, and I'd love to hear about it.</p>]]></content><author><name>Daniel Farid</name></author><category term="homelab" /><summary type="html"><![CDATA[So I built a homelab three months ago—a proper production-grade personal cloud infrastructure running 17+ services. Since then, I've learned that "production-ready" doesn't mean "maintenance-free." I figured a little update could be helpful.]]></summary></entry><entry><title type="html">Building a Production-Grade Homelab:</title><link href="https://danielfarid.github.io/danielfarid.github.io/homelab/2025/10/02/homelab.html" rel="alternate" type="text/html" title="Building a Production-Grade Homelab:" /><published>2025-10-02T13:27:43+00:00</published><updated>2025-10-02T13:27:43+00:00</updated><id>https://danielfarid.github.io/danielfarid.github.io/homelab/2025/10/02/homelab</id><content type="html" xml:base="https://danielfarid.github.io/danielfarid.github.io/homelab/2025/10/02/homelab.html"><![CDATA[<p>So I built a homelab. Not the "throw Docker on a Raspberry Pi" kind of homelab, but a proper production-grade personal cloud infrastructure that I'd be comfortable showing to enterprise architects. This is the story of how three months of eBay hunting, some hard-learned networking lessons, and way too much coffee turned into a 17-service platform that handles everything from media streaming to family photo management with enterprise-level security.</p>

<p>Fair warning: this is going to be long. We're talking multi-site backup strategies, ZFS pool encryption, VPN-only admin access, and automated monitoring. If you're looking for a quick "Docker Compose and done" tutorial, this isn't it. But if you want to understand how to build something that actually works in production while learning from my mistakes along the way, grab your beverage of choice and settle in.</p>

<h2 class="section-heading">The Hardware Hunt: Three Months on eBay</h2>

<p>The project started the way most homelab projects do—with the realization that my collection of cloud subscriptions was costing more per month than a decent used server would cost per year. But more than that, I wanted something I controlled. As someone who works in digital forensics, the idea of having complete visibility into my infrastructure, proper logging, and the ability to practice enterprise security concepts at home was appealing.</p>

<p>I spent about three months watching eBay for deals on enterprise hardware. The goal was simple: find something powerful enough to run a dozen containerized services, reliable enough to trust with family data, and cheap enough that I wouldn't feel bad if I completely bricked it during experimentation.</p>

<p>What I ended up with was a Dell T3620 workstation. Not the sexiest hardware, but it checked all the boxes: Intel i7 processor, and here's the kicker—I upgraded it to 64GB of DDR4 ECC RAM. That memory upgrade alone changed the entire scope of what became possible. With that much headroom, I could run comprehensive monitoring, allocate generous ZFS ARC cache, and never worry about memory pressure even with a dozen services running simultaneously.</p>

<p>For storage, I grabbed three 8TB WD Red drives for the main RAID-Z1 array, a 1TB SSD for the OS, and a 1TB NVMe drive for VM storage. The whole setup came together piece by piece as deals appeared, and by the time I had all the components, I'd spent roughly what six months of my previous cloud subscriptions would have cost.</p>

<p>Oh, and I picked up an APC Back-UPS 600 for power protection. Nothing fancy, but it gives me about 15-20 minutes of runtime—enough for graceful shutdowns during power events. More on why that matters later.</p>

<h2 class="section-heading">TrueNAS Scale: The Foundation</h2>

<p>I went with TrueNAS Scale as the foundation OS. Being Debian-based means it has proper Docker support while still giving me enterprise-grade ZFS management. The installation itself was straightforward—flash the ISO to USB, boot, install to the dedicated SSD, configure a static IP, and you're running.</p>

<p>I set up two ZFS pools: the main storage array in RAID-Z1 configuration across the three 8TB drives (giving me about 16TB usable), and the NVMe as a high-performance pool for databases and temporary processing. Both pools got full AES-256-GCM encryption because, well, why wouldn't you? Here's where having 64GB of RAM became important: I could allocate 20GB to ZFS ARC cache. That might seem excessive, but when you're serving media to multiple streams while Immich is processing photo uploads and Plex is generating thumbnails, that cache hit rate makes the difference between smooth operation and stuttering performance.</p>
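<p>For the curious, pinning ARC to a fixed ceiling comes down to a single ZFS module parameter. This is a sketch only—TrueNAS SCALE manages the setting through its own configuration system rather than a hand-edited file, so the path here is illustrative:</p>

<pre><code># /etc/modprobe.d/zfs.conf (illustrative path)
# 20 GiB = 20 * 1024 * 1024 * 1024 bytes
options zfs zfs_arc_max=21474836480
</code></pre>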

<p>The dataset structure took some planning. I created separate datasets for each major service—Plex, Immich, user files, media libraries, downloads, backups, and configuration files. ZFS datasets are like having individual filesystems with their own properties, which becomes important later when you're doing snapshots and setting different compression levels.</p>

<p>I enabled LZ4 compression across the board. It's essentially free on modern CPUs and typically nets you 20-30% space savings on text and configuration files. For my use case, that means an extra few terabytes of effective capacity.</p>

<h2 class="section-heading">Network Security: The Hard Lessons</h2>

<p>My plan was solid: use Tailscale for VPN access, NGINX Proxy Manager for SSL termination and reverse proxying, and Pi-hole for DNS filtering. Standard stuff, right? Well, the execution taught me some valuable lessons.</p>

<h2 class="section-heading">DNS Resolution: When Everything Breaks After a Reboot</h2>

<p>First major issue: after getting everything configured and running beautifully, I rebooted. Simple maintenance reboot. When the system came back up, nothing worked. Containers couldn't resolve hostnames, Docker couldn't pull images, and I was staring at "temporary failure in name resolution" errors everywhere.</p>

<p>TrueNAS was overwriting my DNS configuration in /etc/resolv.conf on every boot. I'd set up proper nameservers, everything worked, but the system wasn't persisting the configuration. The fix was to set the DNS configuration through the TrueNAS web interface (Network → Global Configuration) rather than editing files directly. Once I configured it properly through the UI with Google DNS and Cloudflare as fallbacks, it persisted across reboots.</p>

<p>Lesson learned: respect the abstraction layers. TrueNAS has its own configuration management system, and fighting it just creates problems.</p>

<h2 class="section-heading">Tailscale: The Restart Loop of Doom</h2>

<p>The second major headache was getting Tailscale working for VPN access. I wanted to advertise my entire home network through Tailscale so I could access all services remotely without exposing anything to the public internet. Simple concept, frustrating execution.</p>

<p>The container kept restarting. Over and over. Exit code 1, restart, exit code 1, restart. The issue? I was using both the TS_ROUTES environment variable and the --advertise-routes flag in TS_EXTRA_ARGS. Tailscale didn't like the duplication and threw a fit.</p>

<p>The fix was to use only TS_EXTRA_ARGS with the full route advertisement configuration, and completely remove the TS_ROUTES variable. Once I made that change, Tailscale came up stable and has been rock-solid ever since. I can now access every service through the VPN, and nothing is exposed directly to the internet except the VPN endpoint itself.</p>
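<p>For anyone hitting the same restart loop, the working shape of the container config looks roughly like this (a sketch rather than my exact stack; the auth key and subnet are placeholders):</p>

<pre><code>services:
  tailscale:
    image: tailscale/tailscale:latest
    cap_add:
      - NET_ADMIN
    environment:
      - TS_AUTHKEY=tskey-auth-xxxxxxxx              # placeholder
      - TS_EXTRA_ARGS=--advertise-routes=10.0.0.0/24
      # No TS_ROUTES variable at all: setting both triggers the crash loop
</code></pre>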

<h2 class="section-heading">The qBittorrent Authorization Mystery</h2>

<p>Third lesson came from qBittorrent. I set it up, configured the downloads, pointed it at the correct storage... and got "Unauthorized" errors whenever I tried to access the web interface. The credentials were correct, the container was running, but authentication just failed.</p>

<p>Turns out the issue was port mapping. I had the external port mapped to 8083 but the internal port set differently in the container configuration. qBittorrent's web interface is particular about this—the WEBUI_PORT environment variable needs to exactly match your port mapping. Once I set both to 8083, authentication worked immediately.</p>
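<p>In compose terms, the rule is simply that WEBUI_PORT and the container-side half of the port mapping must agree. A sketch (the linuxserver image name is an assumption, not necessarily what I run):</p>

<pre><code>services:
  qbittorrent:
    image: lscr.io/linuxserver/qbittorrent:latest   # illustrative image
    environment:
      - WEBUI_PORT=8083      # must match the container-side port below
    ports:
      - "8083:8083"
</code></pre>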

<p>These weren't catastrophic failures, but each one cost me hours of troubleshooting. And that's fine—every production system has its quirks, and learning them now means they won't bite me during an actual emergency.</p>

<h2 class="section-heading">Service Architecture: Building the Stack</h2>

<p>With the foundation stable, I started deploying services. Everything runs in Docker containers managed through Portainer, with a single docker-compose stack orchestrating the whole setup. This gives me version control on the entire infrastructure configuration and makes disaster recovery straightforward—restore the compose file and data volumes, run docker-compose up, and you're back.</p>

<p>The service architecture breaks down into four tiers:</p>

<h2 class="section-heading">Infrastructure Services</h2>

<p>At the bottom, we have the core platform services that everything else depends on. Portainer provides the container management interface. Tailscale handles VPN mesh networking. NGINX Proxy Manager terminates SSL and routes traffic to the appropriate backends. Pi-hole filters DNS queries network-wide.</p>

<p>These services start first and everything else depends on them. If any of these fails, the whole platform is compromised, so they get careful attention in the startup order and have aggressive health checks.</p>

<h2 class="section-heading">Public-Facing Applications</h2>

<p>These are the services that family members actually use, accessible through SSL-secured domain names. Plex streams media. Immich handles photo management with AI-powered organization. Overseerr provides a clean interface for requesting new movies and TV shows. FileBrowser gives web-based file access. Vaultwarden serves as the family password manager.</p>

<p>Each of these services is accessible via HTTPS through properly configured reverse proxies. Let's Encrypt handles certificate automation, and everything renews automatically. Users never see the underlying complexity—they just access photos.homelab.example or requests.homelab.example and everything works.</p>

<h2 class="section-heading">Administrative Services</h2>

<p>These are VPN-only. No public access, period. The download stack—qBittorrent routed through ProtonVPN, plus Radarr, Sonarr, and Prowlarr for media automation. Pi-hole admin interface for DNS management. Portainer for container management. NGINX Proxy Manager's admin interface.</p>

<p>The security model here is simple: if you're not on the VPN, these services don't exist. Even on my local network, they're locked down to only respond to VPN traffic. This gives me secure remote access without the attack surface of public-facing admin interfaces.</p>

<h2 class="section-heading">Monitoring and Automation</h2>

<p>This is where the platform becomes self-maintaining. Uptime Kuma monitors every service with real-time health checks, tracks response times, and maintains historical data. It's got a clean dashboard accessible at dashboard.homelab.example (VPN-only, naturally) that shows the health of the entire infrastructure at a glance.</p>

<p>Watchtower runs in monitor-only mode, checking daily for container updates but not auto-applying them. This gives me visibility into available updates without the risk of automatic deployments breaking something at 3 AM. I review the updates manually, test them, and deploy during maintenance windows.</p>

<p>I wrote custom shell scripts for deeper monitoring—SSL certificate expiration tracking, API connectivity validation, system resource monitoring, and backup verification. These run on cron schedules and output to JSON status files that get parsed into an administrative dashboard. The dashboard updates every 30 minutes and shows me everything from disk usage to the last successful backup timestamp.</p>

<h2 class="section-heading">The Download Pipeline: Privacy-First Architecture</h2>

<p>One of the cooler pieces of the setup is the media acquisition pipeline. I wanted something fully automated but also privacy-conscious, which meant routing all torrent traffic through a VPN.</p>

<p>The solution was to use Gluetun as a VPN gateway container, specifically configured for ProtonVPN. qBittorrent runs with its network stack attached to the Gluetun container, meaning all of its traffic—every single packet—routes through the encrypted VPN tunnel. If the VPN connection drops, qBittorrent loses network access entirely. No leaks, no fallback to the WAN connection. I have noticed significantly slower download speeds with this configuration.</p>
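<p>The wiring for that is one compose detail: qBittorrent declares no network of its own and rides Gluetun's stack instead. A sketch with placeholder credentials:</p>

<pre><code>services:
  gluetun:
    image: qmcgaw/gluetun
    cap_add:
      - NET_ADMIN
    environment:
      - VPN_SERVICE_PROVIDER=protonvpn
      - OPENVPN_USER=xxxxxxxx              # placeholder
      - OPENVPN_PASSWORD=xxxxxxxx          # placeholder
  qbittorrent:
    image: lscr.io/linuxserver/qbittorrent:latest   # illustrative image
    network_mode: "service:gluetun"        # every packet rides the tunnel
    depends_on:
      - gluetun
</code></pre>

<p>If Gluetun's tunnel drops, qBittorrent simply has no network path, which is the kill-switch behavior in action.</p>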

<p>Radarr and Sonarr monitor for new content and automatically send downloads to qBittorrent through the VPN gateway. When downloads complete, the media files get organized into the appropriate Plex libraries, metadata gets fetched, and Overseerr notifies whoever requested the content. The whole pipeline is zero-touch once configured.</p>

<p>This setup gives me plausible deniability and proper security. All torrent traffic is encrypted and anonymized through ProtonVPN. The ISP sees VPN traffic and nothing else. And because everything is automated, there's no manual intervention needed—request something through Overseerr, and it shows up in Plex automatically.</p>

<h2 class="section-heading">User Management: The Family Access Problem</h2>

<p>Running a homelab for yourself is straightforward. Running it for family members who shouldn't have admin access? That's where things get interesting. I needed to provide services to a spouse who wants to stream media and upload photos, but definitely shouldn't be able to access download management or SSH into containers.</p>

<p>The solution required thinking about isolation at multiple layers...</p>

<h2 class="section-heading">Service-Level Access Control</h2>

<p>Some services got configured with managed user accounts. Plex has a proper user management system, so I created a managed user account with library access but no admin capabilities. They can watch content, manage their own watch history, but can't see server settings or make configuration changes.</p>

<p>Overseerr integrates with Plex authentication and has granular permission controls. The user account can request content but those requests require approval. There are daily limits on requests to prevent runaway spending if someone decides to request the entire IMDb catalog. Movie requests are capped at five per day, TV series at two, individual seasons at five.</p>

<p>Immich got set up with completely separate user libraries. Each user account has their own photo storage with their own AI models and facial recognition. There's no cross-contamination of personal photos between accounts, and each user's data is genuinely isolated.</p>

<h2 class="section-heading">Filesystem-Level Isolation</h2>

<p>At the ZFS level, I created separate datasets for user-specific data with proper permissions. The admin user has full access everywhere because, well, I'm the admin. The family user has read-write access to their own directory, read-only access to shared media libraries, and no access whatsoever to admin directories, configuration files, or download management.</p>

<p>SMB shares enforce these permissions at the network level. The family user can map their personal share and the media library, but they can't even see the admin directories or download folders. Access Based Share Enumeration means if you don't have permissions, the share doesn't appear in your network browser.</p>
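<p>On the Samba side this boils down to a couple of share-level options. TrueNAS sets these through the share UI rather than a hand-edited smb.conf, but the effective fragment looks something like this (share name and path are illustrative):</p>

<pre><code>[family]
    path = /mnt/tank/users/family
    valid users = family
    read only = no
    access based share enum = yes   # hide shares the user can't access
</code></pre>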

<p>FileBrowser got configured with a scoped view—the user account sees only their personal directory and shared content. The admin account sees everything. Same tool, different views based on authentication.</p>

<h2 class="section-heading">Network-Level Restrictions</h2>

<p>The really sensitive services don't have user accounts at all—they're simply not accessible without VPN access. Download management, container administration, system monitoring, Pi-hole configuration—all of these require being on the VPN, which requires admin credentials.</p>

<p>This creates a clean separation: family members access public-facing services through standard HTTPS. Administrators access everything through VPN. There's no confusion about what should or shouldn't be accessible, and there's no risk of accidentally exposing admin interfaces.</p>

<h2 class="section-heading">Power Protection: Planning for the Inevitable</h2>

<p>Living in an area with somewhat questionable power reliability meant UPS integration wasn't optional. The APC BE600M1 connects via USB and gets monitored through Network UPS Tools (NUT).</p>

<p>The configuration handles three scenarios: power loss (log it and monitor battery levels), low battery (initiate graceful container shutdown and create emergency ZFS snapshots), and power restoration (log the event and verify service health). The shutdown sequence is carefully ordered—user-facing services stop first, then databases, then infrastructure services, ensuring nothing gets corrupted during power events.</p>
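<p>In NUT terms, those three scenarios map onto notify events in upsmon.conf. A fragment sketch (the UPS name, credentials, and handler script are all illustrative):</p>

<pre><code># /etc/nut/upsmon.conf (fragment, names illustrative)
MONITOR apc@localhost 1 upsmon secretpass master
NOTIFYCMD /usr/local/bin/ups-handler.sh   # hypothetical handler script
NOTIFYFLAG ONBATT  SYSLOG+EXEC   # power loss: log it, watch the battery
NOTIFYFLAG LOWBATT SYSLOG+EXEC   # low battery: graceful shutdown + snapshots
NOTIFYFLAG ONLINE  SYSLOG+EXEC   # power restored: log and verify services
</code></pre>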

<p>The UPS gives me about 15-20 minutes of runtime with typical load. That's more than enough for graceful shutdowns, and it's saved me multiple times during brief power flickers that would have otherwise meant hard crashes and potential filesystem corruption.</p>

<h2 class="section-heading">Backup Strategy: Multiple Layers of Paranoia</h2>

<p>As someone who deals with data recovery professionally, I'm perhaps unreasonably paranoid about backups. The strategy here is multi-layered:</p>

<h2 class="section-heading">ZFS Snapshots</h2>

<p>Automated snapshots run hourly, daily, and weekly with appropriate retention policies. Hourly snapshots keep for 24 hours, daily for a week, weekly for a month. This gives me point-in-time recovery for accidental deletions or configuration mistakes. Rollback is instant—literally just a ZFS command and you're back to any previous snapshot.</p>
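<p>TrueNAS drives this through its built-in Periodic Snapshot Tasks, but the idea in plain cron terms is roughly this (pool name illustrative; the real tasks also handle retention and pruning):</p>

<pre><code># /etc/cron.d/zfs-snapshots (illustrative sketch only)
0 * * * * root zfs snapshot -r tank@hourly-$(date +\%Y\%m\%d-\%H)
0 0 * * * root zfs snapshot -r tank@daily-$(date +\%Y\%m\%d)
0 0 * * 0 root zfs snapshot -r tank@weekly-$(date +\%Y\%m\%d)
</code></pre>

<p>Recovery is then a single command along the lines of zfs rollback tank@hourly-20251002-14.</p>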

<h2 class="section-heading">Configuration Backups</h2>

<p>Every week, a script backs up all container configurations, docker-compose files, and service configurations to encrypted archives. These get stored on the main storage pool but also copied to an external drive. If I need to rebuild from scratch, I have everything needed to restore the exact configuration.</p>

<h2 class="section-heading">Container Image Redundancy</h2>

<p>I run a local Docker registry that maintains copies of all running container images. Every day, a script pulls the latest versions of critical images, saves them to the registry, and also creates compressed tar archives as a fallback.</p>

<p>This means if Docker Hub goes down, or an image gets pulled, or something breaks in a new version, I have local copies of everything. I can restore any container from local storage without needing internet access. Combined with the configuration backups, I can rebuild the entire platform from local resources.</p>
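<p>The registry itself is just one more container (sketch; the storage path is illustrative):</p>

<pre><code>services:
  registry:
    image: registry:2
    ports:
      - "5000:5000"
    volumes:
      - /mnt/tank/registry:/var/lib/registry
</code></pre>

<p>Mirroring an image into it is then a pull/tag/push against localhost:5000, and docker save produces the compressed tar fallback.</p>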

<h2 class="section-heading">Monitoring Backup Health</h2>

<p>The monitoring scripts verify that backups are actually happening and are restorable. They check that ZFS snapshots exist, that configuration backups completed successfully, that the Docker registry is accessible, and that tar archives are present and not corrupted. If any of these checks fail, I get alerted immediately.</p>

<h2 class="section-heading">SSL and DNS: The Professional Touch</h2>

<p>Everything uses proper SSL certificates. NGINX Proxy Manager integrates with Let's Encrypt and handles automatic certificate generation and renewal. Each service gets its own certificate rather than using a wildcard, which provides better security and clearer logging.</p>

<p>DNS is handled through Cloudflare with proper A records pointing to my public IP. NGINX Proxy Manager sits behind the router and handles all incoming HTTPS traffic, routing it to the appropriate backend services. The public services are accessible from anywhere, while admin services remain VPN-only regardless of the DNS configuration.</p>

<p>The automated monitoring includes SSL certificate expiration checking. I get warnings seven days before any certificate expires, though with Let's Encrypt automation, they should renew automatically. Belt and suspenders.</p>

<h2 class="section-heading">Current State: Production Ready</h2>

<p>After three months of acquisition, two weeks of intense setup, and another couple weeks of debugging and optimization, the system has been running stable for weeks now. Seventeen containers operating with 99%+ uptime. The only downtime has been during planned maintenance windows.</p>

<p>Resource utilization is comfortable—memory usage hovers around 35GB with all services running, leaving plenty of headroom. The 20GB ZFS ARC cache is well-utilized with excellent hit rates. Storage is at about 9% utilization with 1.5TB used out of 16TB available, giving me years of growth runway.</p>

<p>The automated media pipeline works flawlessly. Someone requests content through Overseerr, I approve it, Radarr or Sonarr finds it, qBittorrent downloads it through the VPN, files get organized automatically, and Plex updates its library. Zero manual intervention needed beyond the approval step.</p>

<p>Family members use the services without even realizing they're running on local infrastructure. Photos upload to Immich, media streams through Plex, passwords sync through Vaultwarden. From their perspective, it's just another cloud service. From my perspective, it's a platform I completely control with full visibility into everything happening.</p>

<h2 class="section-heading">Lessons Learned: What I'd Do Differently</h2>

<p>If I were starting over, I'd do a few things differently:</p>

<p>First, I'd spend more time up front on planning the network architecture. The DNS issues and Tailscale problems were avoidable if I'd properly understood how TrueNAS manages network configuration. Read the documentation, understand the platform's layers, and work with them rather than against them.</p>

<p>Second, I'd implement monitoring from day one rather than adding it later. Having Uptime Kuma and the monitoring scripts running from the start would have saved debugging time. When something breaks, you want historical data to reference, not just current state.</p>

<p>Third, I'd be more systematic about documentation. I kept notes, but they were scattered across multiple files and formats. Having a proper wiki or documentation system from the beginning would have been valuable, especially when trying to remember why I made specific configuration choices.</p>

<p>That said, the project was successful. I built something that works in production, handles real user traffic, and hasn't required emergency intervention. The security architecture is sound, the backup strategy is comprehensive, and the whole thing is maintainable without constant attention.</p>

<h2 class="section-heading">What's Next: The Roadmap</h2>

<p>The platform is production-ready, but there's always room for improvement. On the immediate roadmap:</p>

<p>Adding proper off-site backup. Currently everything is local or local-plus-external-drive. I want encrypted backups going to cloud storage for true disaster recovery capability.</p>

<p>Expanding monitoring with Prometheus and Grafana for better metrics visualization and historical trending. Uptime Kuma is great for service health, but I want deeper visibility into resource utilization patterns.</p>

<p>Implementing more sophisticated automation workflows. The current setup handles media requests well, but there's potential for automated maintenance tasks, health remediation, and smarter resource management.</p>

<p>Adding a second host for high-availability and load distribution. The Dell T3620 is solid, but having a backup host would enable proper failover and zero-downtime updates.</p>

<h2 class="section-heading">The Bottom Line</h2>

<p>Building a homelab at this scale takes time, costs money, and requires dealing with frustrating problems. But the result is worth it—a platform that provides cloud-level functionality with complete control and visibility. No monthly subscriptions, no wondering what the provider is doing with your data, no depending on someone else's uptime.</p>

<p>For someone in forensics and security, it's also a learning laboratory. I can experiment with security concepts, practice incident response, and understand enterprise architecture patterns in a low-stakes environment. The skills learned here translate directly to professional work.</p>

<p>And honestly? There's something satisfying about running your own infrastructure well. When family members seamlessly stream media or quickly find old photos, they don't think about the ZFS arrays, VPN tunnels, and automated pipelines making it possible. They just see services that work. And that's exactly the point.</p>

<p>If you're considering building something similar, my advice is simple: start with solid hardware, plan your network architecture carefully, implement monitoring early, and expect to learn through problems. Document everything, back up religiously, and don't be afraid to rebuild when something goes fundamentally wrong.</p>

<p>The homelab journey isn't about having the flashiest setup or the most services. It's about building something that works reliably, understanding how all the pieces fit together, and maintaining control over your own infrastructure. Three months of eBay hunting and two weeks of intense setup led to a platform that will serve for years. Not a bad investment.</p>]]></content><author><name>Daniel Farid</name></author><category term="homelab" /><summary type="html"><![CDATA[So I built a homelab. Not the "throw Docker on a Raspberry Pi" kind of homelab, but a proper production-grade personal cloud infrastructure that I'd be comfortable showing to enterprise architects. This is the story of how three months of eBay hunting, some hard-learned networking lessons, and way too much coffee turned into a 17-service platform that handles everything from media streaming to family photo management with enterprise-level security.]]></summary></entry></feed>