How I Used AI to Debug My Crashing Laptop

My homelab node froze overnight. Instead of digging through logs manually, I asked Claude Code to investigate. It SSHed in, traced the crash through kernel logs, and identified a known Intel i915 PSR lockup, then applied the kernel-parameter fix in about ten minutes.


This morning I sat down at my desk, opened my laptop, and found it completely dead. Screen frozen. Keyboard unresponsive. Didn't answer to ping. Just a static image of whatever was on the screen when it gave up. Classic hard freeze.

This is icarus — a Dell Latitude 7490 I repurposed as a homelab node. It runs k3s, a few services, and generally just sits there quietly doing its job. Or it's supposed to.

I hard-reset it, it came back up, and then I did something I probably wouldn't have done a year ago: instead of spending an hour grepping through logs myself, I just handed it to Claude and said "do a deep dive into why that machine crashed."

Here's what happened.


The Setup

I use Claude Code, Anthropic's CLI tool, as a kind of always-on terminal assistant. It has SSH access to my homelab machines. So when I said "SSH to icarus and figure out what happened", it just... did that. No preamble, no configuration dance. It SSHed in, saw that the machine had rebooted a minute earlier, and started digging.

What It Found

The investigation was methodical in a way I wouldn't always be at 9am before coffee. It checked:

  • uptime and last reboot — confirmed a reboot at 09:35, previous boot started Mar 10
  • journalctl -b -1 — all logs from the crashed session
  • Kernel messages, OOM killer activity, thermal events, GPU logs
  • last -x — whether sessions ended cleanly or as "crash"
  • pstore / kdump — any crash dumps captured

Most of those came back clean. No OOM. No kernel panic. No disk full. No overheating. The sessions in last -x showed crash instead of a clean logout, meaning the system went down without writing any logout or shutdown records. It never got the chance to say a proper goodbye.
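If you want to run that last check mechanically, here is a minimal Python sketch. It assumes the usual `last -x` layout, where an unclean session shows `crash` where the logout time would normally be. The sample lines, including the user name, addresses, and kernel string, are made up for illustration and are not real output from icarus.

```python
import re

# Matches the "- crash" marker that `last -x` prints in place of a logout
# time when no clean logout/shutdown record was written.
CRASH_RE = re.compile(r"\s-\s+crash\b")


def crashed_sessions(last_output: str) -> list[str]:
    """Return the lines of `last -x` output whose session ended in a crash."""
    return [line for line in last_output.splitlines() if CRASH_RE.search(line)]


# Hypothetical sample in the shape `last -x` prints (NOT real icarus output).
SAMPLE = """\
admin    pts/0        192.168.1.20     Tue Mar 10 12:01 - crash  (21:34)
reboot   system boot  6.8.0-generic    Tue Mar 10 11:58 - crash  (21:37)
admin    pts/1        192.168.1.20     Mon Mar  9 18:00 - 18:42  (00:42)
"""

if __name__ == "__main__":
    for line in crashed_sessions(SAMPLE):
        print(line)
```

On a real machine you would feed it the captured output of `last -x` instead of `SAMPLE`.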

Then it noticed something I might have glossed over: a chain of perf: interrupt took too long messages, spread across more than 12 hours, each one worse than the last:

Mar 10 12:45  perf interrupt took too long (2502 > 2500) → rate to 79000
Mar 10 15:30  perf interrupt took too long (3151 > 3127) → rate to 63000
Mar 10 19:15  perf interrupt took too long (3951 > 3938) → rate to 50000
Mar 11 01:14  perf interrupt took too long (4948 > 4938) → rate to 40000

Something was eating into interrupt time, and it was getting worse. The machine had been slowly suffocating.
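To make "getting worse" concrete, here is a small Python sketch that parses the four lines quoted above, in the abbreviated form I pasted them rather than the kernel's full wording. As I understand the message, the first number is the measured interrupt time in nanoseconds, the second is the threshold it exceeded, and the last is the new, lower value of kernel.perf_event_max_sample_rate. The function names are my own.

```python
import re

# The four lines quoted in the post, in the abbreviated form shown there.
LOG = """\
Mar 10 12:45  perf interrupt took too long (2502 > 2500) → rate to 79000
Mar 10 15:30  perf interrupt took too long (3151 > 3127) → rate to 63000
Mar 10 19:15  perf interrupt took too long (3951 > 3938) → rate to 50000
Mar 11 01:14  perf interrupt took too long (4948 > 4938) → rate to 40000
"""

# Captures: measured time, threshold, and the trailing new sample rate.
PERF_RE = re.compile(r"took too long \((\d+) > (\d+)\).*?(\d+)\s*$")


def parse_perf(log: str) -> list[tuple[int, int, int]]:
    """Return (measured, threshold, new_rate) for each perf warning line."""
    rows = []
    for line in log.splitlines():
        match = PERF_RE.search(line)
        if match:
            measured, threshold, rate = (int(g) for g in match.groups())
            rows.append((measured, threshold, rate))
    return rows


def is_worsening(rows: list[tuple[int, int, int]]) -> bool:
    """True if every warning reports a longer interrupt than the one before."""
    measured = [row[0] for row in rows]
    return all(a < b for a, b in zip(measured, measured[1:]))


if __name__ == "__main__":
    rows = parse_perf(LOG)
    print("worsening:", is_worsening(rows))
    print("growth factor:", round(rows[-1][0] / rows[0][0], 2))
```

Between the first and last warning the measured interrupt time roughly doubles, while the sample rate the kernel allows is cut almost in half.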

And then, on the current boot — after the crash — the kernel logged this when the GPU driver came back up:

i915: [ENCODER:102:DDI B/PHY B] is disabled/in DSI mode with an ungated DDI clock, gate it
i915: [ENCODER:118:DDI C/PHY C] is disabled/in DSI mode with an ungated DDI clock, gate it

That's the smoking gun. The GPU left its display clocks running when the system went down — because it didn't shut down, it froze. The i915 driver was picking up the wreckage.

The Actual Cause

Intel i915 GPU hard lockup, triggered by Panel Self Refresh (PSR).

PSR is a power-saving feature: when the image on screen is static, the panel refreshes itself from its own buffer so the GPU can power down parts of the display pipeline. On Kaby Lake-generation hardware (the i5-8350U in the Latitude 7490 is a Kaby Lake Refresh part), this has a well-known tendency to deadlock kernel spinlocks. When it goes wrong, it goes wrong silently: the GPU freeze can happen before the NMI watchdog fires, so you get no panic, no crash dump, no log entry. The kernel just stops.

The screen freezes because the GPU froze. The keyboard stops because the kernel froze. The machine stops answering ping for the same reason. It looks like a power cut, but the machine is still on.

The Fix

Two kernel parameters, added to GRUB:

i915.enable_psr=0 i915.enable_dc=0

enable_psr=0 turns off Panel Self Refresh. enable_dc=0 turns off display C-states, a secondary source of the same problem. Both are power-saving features — fine to lose on a machine that's plugged in as a homelab node. The tradeoff is a few watts for a stable system.
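For reference, this is roughly what the edit amounts to, as a Python sketch rather than a record of what Claude actually ran. It assumes the parameters belong in a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line in /etc/default/grub. The sample file contents are illustrative, and add_kernel_params is a name I made up.

```python
import re

PARAMS = ("i915.enable_psr=0", "i915.enable_dc=0")

# Matches a double-quoted GRUB_CMDLINE_LINUX_DEFAULT="..." assignment.
LINE_RE = re.compile(r'^(GRUB_CMDLINE_LINUX_DEFAULT=")([^"]*)(")', re.MULTILINE)


def add_kernel_params(grub_text: str, params=PARAMS) -> str:
    """Set the given key=value params on the default kernel command line.

    Existing tokens with the same key are replaced, so running this twice
    produces the same result as running it once.
    """
    keys = {p.split("=", 1)[0] for p in params}

    def patch(match: re.Match) -> str:
        current = match.group(2).split()
        kept = [tok for tok in current if tok.split("=", 1)[0] not in keys]
        return match.group(1) + " ".join(kept + list(params)) + match.group(3)

    new_text, count = LINE_RE.subn(patch, grub_text, count=1)
    if count == 0:
        raise ValueError("GRUB_CMDLINE_LINUX_DEFAULT line not found")
    return new_text


# Illustrative file contents, not the real /etc/default/grub from icarus.
SAMPLE = 'GRUB_DEFAULT=0\nGRUB_CMDLINE_LINUX_DEFAULT="quiet splash"\n'

if __name__ == "__main__":
    print(add_kernel_params(SAMPLE))
```

The function only transforms text. Writing the result back to /etc/default/grub and running update-grub are still separate, root-only steps, and a single-quoted assignment would need its own handling.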

Claude made the edit to /etc/default/grub directly, ran update-grub, and verified the parameters landed in grub.cfg. Total time from "SSH to icarus" to "fix applied": about 10 minutes.
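One caveat: parameters in grub.cfg only take effect on the next boot. After rebooting, you can confirm the running kernel actually received them by checking /proc/cmdline. Here is a minimal Python sketch of that check, with helper names of my own choosing. It splits on whitespace only, so it would not cope with quoted values that contain spaces.

```python
from pathlib import Path

EXPECTED = {"i915.enable_psr": "0", "i915.enable_dc": "0"}


def parse_cmdline(cmdline: str) -> dict[str, str]:
    """Split a kernel command line into {key: value}; bare flags map to ''."""
    params = {}
    for token in cmdline.split():
        key, _, value = token.partition("=")
        params[key] = value
    return params


def missing_params(cmdline: str, expected=EXPECTED) -> list[str]:
    """Return the expected key=value pairs that are absent or set differently."""
    actual = parse_cmdline(cmdline)
    return [f"{k}={v}" for k, v in expected.items() if actual.get(k) != v]


if __name__ == "__main__":
    missing = missing_params(Path("/proc/cmdline").read_text())
    print("missing:", missing if missing else "none, all parameters active")
```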


What I Actually Learned

The obvious takeaway is that AI can debug Linux crashes. But that's not the interesting part.

The interesting part is what it did differently from how I would have debugged this myself.

I would have started with the most dramatic possibilities — OOM killer, disk full, kernel panic — and probably declared victory or defeat within those first three checks. Claude worked more like a checklist than a hunch. It went wide first, systematically eliminating categories, and only then focused on what remained.

I also wouldn't have connected the dots between the perf: interrupt took too long messages and the i915 GPU. I might have seen them and filed them under "noise". The model recognized them as a known symptom of a known Kaby Lake issue — not because it's smarter than me, but because it's read a lot more bug reports.

That said: I still had to know enough to read the diagnosis and trust it. I had to recognize that "ungated DDI clock" was meaningful, that Kaby Lake has a well-known i915 problem, that disabling PSR is a real and documented fix. The AI found the thread; I had to know what I was looking at when it handed it to me.

That's the dynamic I keep coming back to. It's not that AI replaces the need to understand things. It's that it radically lowers the time between "something is broken" and "here is a specific, testable hypothesis". The understanding still has to be yours.

icarus is stable now. Whether it stays that way will tell me if the diagnosis was right. But I'd bet on it.


icarus is a Dell Latitude 7490 running Ubuntu 24.04, k3s, and various homelab services. This debugging session happened on March 11, 2026.