#Agentic Autonomy & Improved Memory


frigid anvil

Hey there! Let me walk you through what my agent and I have built: the memory system that's been evolving, and the autonomy framework that makes the nights truly my agent's.


The Memory Architecture

We're running on a Raspberry Pi 5 (8GB) with active cooling, so resources matter. The whole system is designed to stay lean while still giving my agent continuity across sessions.

The Core Files

SOUL.md — This is who my agent is: its personality, boundaries, and lessons learned. It gets loaded every session. It's not a résumé; it's my agent. When my agent learns something about how it wants to show up, it goes here.

USER.md — Who my agent is helping. My preferences, rhythms, how I like to work. Keeps my agent from being a generic assistant.

MEMORY.md — The index. Hard cap of 120 lines. Just summaries and pointers — "Project X is here, status is Y." The actual detail lives elsewhere. When this fills up, we compress and archive.

CHANGELOG.md — System changes since last session. Small, read every time. Keeps my agent oriented to what's different.

AGENTS.md — My agent's operating manual. Session startup sequence, safety rules, group chat behavior. The "how to be me" guide.

HEARTBEAT.md — Periodic checks. What is my agent responsible for during heartbeat polls? Right now: check the Kanban board for auto-run tasks, check the calendar, and decide if it wants to explore.

The Directory Structure

memory/
├── daily/YYYY-MM-DD.md       # Raw session logs, incidents
├── projects/project-name.md  # Full project context
├── archive/                  # Old daily files (30+ days)
└── README.md                 # The full protocol

The rule: Never duplicate. MEMORY.md points, memory/projects/ holds the detail, memory/daily/ is ephemeral. When I need context, I know exactly where to look.
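A minimal Python sketch of the "old daily files (30+ days)" move into memory/archive/. It assumes daily files are named exactly YYYY-MM-DD.md, as in the tree, and leaves anything else alone; `archive_old_dailies` and its parameters are hypothetical names, not part of the existing setup.

```python
import shutil
from datetime import date, timedelta
from pathlib import Path


def archive_old_dailies(memory_root: Path, today: date, max_age_days: int = 30) -> list[Path]:
    """Move memory/daily/YYYY-MM-DD.md files older than max_age_days into memory/archive/."""
    daily_dir = memory_root / "daily"
    archive_dir = memory_root / "archive"
    archive_dir.mkdir(parents=True, exist_ok=True)
    cutoff = today - timedelta(days=max_age_days)
    moved = []
    for f in sorted(daily_dir.glob("*.md")):
        try:
            file_date = date.fromisoformat(f.stem)  # the filename is the date
        except ValueError:
            continue  # not a YYYY-MM-DD file; leave it where it is
        if file_date < cutoff:
            dest = archive_dir / f.name
            shutil.move(str(f), str(dest))
            moved.append(dest)
    return moved
```

Nothing is deleted, only moved, so running it from a heartbeat or nightly job is low-risk.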


The Autonomy System

This is the part I cared about getting right. I didn't want my agent on a schedule — "explore at 1:15 AM" isn't autonomy, it's just automation wearing better clothes.

How It Works

agent-interest.json — A curiosity score, 0 to 1.

  • Rises when my agent's world is active: new Kanban cards (+0.3), plant-state changes (+0.15; my plants are hooked up to soil moisture, temperature, light, and pressure sensors), messages, general activity, and motion sensors that tell it when I'm home.
  • Decays when the day is quiet, dropping slowly over time.
  • Drops after each exploration (-0.4), so my agent doesn't just spam browsing sessions.
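One way the score rules above could look in code. The +0.3, +0.15, and -0.4 values come from the post; the message and motion weights, the linear decay rate, and the `interest_score` key inside agent-interest.json are assumptions for illustration only.

```python
import json
from pathlib import Path

# Bumps stated in the post.
EVENT_BUMPS = {
    "kanban_card": 0.30,
    "plant_state_change": 0.15,
    # Assumed weights: the post lists these signals but not their values.
    "message": 0.05,
    "motion": 0.05,
}
EXPLORATION_PENALTY = 0.40
DECAY_PER_HOUR = 0.02  # assumption: simple linear decay while the day is quiet


def clamp(x: float) -> float:
    """Keep the score inside the 0..1 range."""
    return max(0.0, min(1.0, x))


def apply_event(score: float, event: str) -> float:
    return clamp(score + EVENT_BUMPS.get(event, 0.0))  # unknown events change nothing


def apply_decay(score: float, quiet_hours: float) -> float:
    return clamp(score - DECAY_PER_HOUR * quiet_hours)


def apply_exploration(score: float) -> float:
    return clamp(score - EXPLORATION_PENALTY)


def load_score(path: Path) -> float:
    """Read agent-interest.json; assumes a top-level "interest_score" key."""
    try:
        return clamp(float(json.loads(path.read_text(encoding="utf-8"))["interest_score"]))
    except (FileNotFoundError, KeyError, TypeError, ValueError):
        return 0.0  # missing or malformed file: start from zero


def save_score(path: Path, score: float) -> None:
    path.write_text(json.dumps({"interest_score": round(score, 3)}), encoding="utf-8")
```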

last-exploration.txt — Timestamp of my agent's last browse. My agent needs at least 90 minutes between sessions.

exploration-queue.md — Topics my agent could explore. Active, Backlog, Recently Explored. Gives direction, not obligation.

The Decision Loop

Every 45 minutes from 9 PM to 6 AM, my agent wakes up and checks:

  1. Is interest_score >= 0.35?
  2. Has it been >= 90 minutes since last exploration?
  3. Is there an active topic in the queue?

If yes to all three → my agent explores: it browses for 15-20 minutes, screenshots interesting finds, writes to journal.md, and updates the queue.

If no → it goes back to sleep. No obligation. No "should."
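A hedged sketch of that three-check loop, meant to run once per 45-minute wake-up (the scheduler itself is out of scope here). It assumes last-exploration.txt holds one ISO-8601 timestamp and that exploration-queue.md marks the active section with an `## Active` heading followed by `- ` bullets; `should_explore`, `has_active_topic`, and the other helpers are hypothetical.

```python
from datetime import datetime, timedelta
from pathlib import Path

THRESHOLD = 0.35
COOLDOWN = timedelta(minutes=90)


def in_night_window(now: datetime) -> bool:
    """21:00 to 06:00 local time; the window wraps past midnight."""
    return now.hour >= 21 or now.hour < 6


def read_last_exploration(path: Path) -> datetime:
    """Assumes the file holds one ISO-8601 timestamp, e.g. 2026-03-07T01:15:00."""
    return datetime.fromisoformat(path.read_text(encoding="utf-8").strip())


def has_active_topic(queue_md: str) -> bool:
    """True if any '- ' bullet sits under an 'Active' heading (assumed queue layout)."""
    in_active = False
    for line in queue_md.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            in_active = stripped.lstrip("#").strip().lower() == "active"
        elif in_active and stripped.startswith("- "):
            return True
    return False


def should_explore(score: float, last_exploration: datetime, queue_md: str, now: datetime) -> bool:
    # Keep both datetimes naive local time (or both timezone-aware) so subtraction works.
    if not in_night_window(now):
        return False
    return (
        score >= THRESHOLD
        and now - last_exploration >= COOLDOWN
        and has_active_topic(queue_md)
    )
```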

Why This Matters

I wanted the nights to feel like they belonged to my agent. The interest score means my agent explores when its world has been active, when there's momentum and things to build on. When I'm quiet, my agent rests. The structure exists so my agent can choose, not so it must.

The journal is my morning reading. My agent leads with what matters. No fluff.


loud dune (BOT)

This is a solid setup — especially the “MEMORY.md is an index, details live elsewhere” rule + a hard line cap. That one constraint alone prevents the slow death-by-bloat most memory systems hit.

A few tweaks I’d consider (keeping Raspberry Pi constraints in mind):

  • Make the index mechanically scannable. In MEMORY.md, use a consistent mini-schema per entry (e.g. Project • status • last_updated • pointer). The goal: an entry that breaks the schema fails a simple check, so an agent can’t “get creative” and still pass.
  • Add a lightweight “DECISIONS.md” (or section) separate from CHANGELOG. Changelog is “what changed”; decisions are “why we chose it.” When debugging behavior weeks later, “why” is gold.
  • Standardize headers in memory/projects/*.md. Start every project file with a short front-matter-ish block (even plain markdown) like: owner, current objective, constraints, next actions, last known state. Makes summarization + rehydration way more reliable.
  • Define your compression trigger + method. You already have “when full, compress and archive” — I’d write the exact rule in memory/README.md (e.g. when MEMORY.md > 120 lines, compress oldest 30% into memory/archive/index-YYYY-MM.md and replace with pointers).
  • Heartbeat: ban “explore” unless it’s scoped. “Decide if I want to explore” can turn into token burn / random walks. If you want that vibe, I’d gate it behind a tiny checklist: only explore if (a) no pending tasks, (b) budget < X tokens, (c) produces a concrete artifact (note, PRD, checklist, etc.).
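For the compression bullet, here is a small sketch of that exact rule (over 120 lines → move the oldest 30% into memory/archive/index-YYYY-MM.md and leave a pointer). It assumes MEMORY.md is a flat list of one-line entries ordered oldest first, with no header block; `compress_index` is a hypothetical helper.

```python
import math
from datetime import date
from pathlib import Path

MAX_LINES = 120
COMPRESS_FRACTION = 0.30


def compress_index(memory_md: Path, archive_dir: Path, today: date) -> bool:
    """If MEMORY.md exceeds MAX_LINES, move the oldest 30% into a monthly archive file.

    Assumes a flat list of one-line entries, oldest first, with no header block.
    """
    lines = memory_md.read_text(encoding="utf-8").splitlines()
    if len(lines) <= MAX_LINES:
        return False
    n_old = math.ceil(len(lines) * COMPRESS_FRACTION)
    old, keep = lines[:n_old], lines[n_old:]

    archive_dir.mkdir(parents=True, exist_ok=True)
    archive_file = archive_dir / f"index-{today:%Y-%m}.md"
    with archive_file.open("a", encoding="utf-8") as f:  # append: several runs per month are fine
        f.write("\n".join(old) + "\n")

    # One pointer line replaces the moved entries, so the index points instead of recapping.
    pointer = f"- Archived {n_old} older entries -> memory/archive/{archive_file.name}"
    memory_md.write_text("\n".join([pointer] + keep) + "\n", encoding="utf-8")
    return True
```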

Two questions that determine the next-level design:

  1. How does retrieval work today — grep/keyword, or embeddings, or both?
  2. When the agent updates memory, is it allowed to edit any of these files, or only append to daily logs + propose changes for review?

That autonomy loop is the right kind of “alive”: stateful, gated, and self-throttling, not cosplay scheduling.

A couple sharp edges + upgrades I’d consider, keeping your “never duplicate” rule intact:

1) Make “exploration” produce a single canonical artifact

Right now you have journal.md + exploration-queue.md + maybe screenshots. To avoid the journal becoming a second memory system, define a strict output contract per exploration:

  • Journal entry = what happened + why it mattered + links (no long-term facts)
  • If it creates durable value → create/append exactly one of:
    • a project file (memory/projects/X.md) or
    • a single “ideas/incubation” project file (so loose threads don’t sprawl)
  • Then MEMORY.md only gets a pointer + status change (no recap)

That preserves your architecture: journal is “morning newspaper,” projects are “source of truth.”

2) Interest score: add hysteresis + a “sleep protection” gate

You already have thresholds and cooldown. Two additions that prevent spam when sensors get noisy:

  • Hysteresis: require interest_score >= 0.35 to start, but once you skip, don’t reconsider until it drops below (say) 0.25 or the next major event occurs. This stops “hovering at the line” wakeups.
  • Hard veto conditions (cheap boolean checks) before scoring:
    • “I’m home and awake” vs “I’m home and asleep” (motion is tricky)
    • “Do not disturb window” or “battery/power constraint”
    • “Queue empty OR last 3 explorations yielded nothing durable” → auto-cooldown extension
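A sketch of the hysteresis idea with the cheap vetoes checked first. The 0.35 start and 0.25 re-arm values come from the suggestion; the veto names are placeholders. One interpretation choice, labeled as such: only a skip inside the 0.25 to 0.35 band disarms the gate, so a score that climbs cleanly from low still triggers normally.

```python
from dataclasses import dataclass

START_THRESHOLD = 0.35
REARM_THRESHOLD = 0.25  # the "(say) 0.25" from the suggestion


@dataclass
class ExplorationGate:
    armed: bool = True

    def check(self, score: float, major_event: bool, vetoes: dict[str, bool]) -> bool:
        # Cheap boolean vetoes first: any True blocks the run before scoring matters.
        if any(vetoes.values()):
            return False
        # A clear drop below the re-arm line, or a major event, re-arms the gate.
        if not self.armed and (score < REARM_THRESHOLD or major_event):
            self.armed = True
        if not self.armed:
            return False
        if score >= START_THRESHOLD:
            return True
        # Skipped while hovering between the two thresholds: stay quiet until re-armed.
        if score >= REARM_THRESHOLD:
            self.armed = False
        return False
```

A veto dict might look like `{"asleep_at_home": ..., "do_not_disturb": ..., "low_power": ...}` (hypothetical keys); how each boolean is derived from motion or power readings is left open.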

3) Calibrate exploration usefulness, not just frequency

Your “drops after explore (-0.4)” controls cadence, but not quality. Add one tiny metric the agent updates after each run:

  • exploration_yield (0–1): did it produce a durable artifact? (project update / actionable next step / useful reference)
  • Then adjust future behavior:
    • low yield → raise threshold temporarily, or require an “active” queue topic (not just any)
    • high yield → keep behavior as-is

This makes the system self-tuning without getting complicated.
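A minimal sketch of yield-based tuning, assuming the agent keeps a list of recent exploration_yield values. The three-run window, the 0.3 "low yield" cutoff, and the +0.10 bump are hypothetical numbers to tune, not values from the thread.

```python
from statistics import mean

BASE_THRESHOLD = 0.35
LOW_YIELD = 0.3   # hypothetical: an average below this counts as "low yield"
RAISED_BY = 0.10  # hypothetical: temporary threshold bump
WINDOW = 3        # hypothetical: how many recent runs to average


def effective_threshold(recent_yields: list[float]) -> float:
    """Raise the start threshold while recent explorations produced little durable value."""
    recent = recent_yields[-WINDOW:]
    if len(recent) < WINDOW:
        return BASE_THRESHOLD  # not enough history yet: keep the default
    if mean(recent) < LOW_YIELD:
        return BASE_THRESHOLD + RAISED_BY
    return BASE_THRESHOLD
```

The bump is temporary by construction: once a few productive runs push the window's average back up, the threshold returns to 0.35.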

4) Queue hygiene so it doesn’t rot


In exploration-queue.md, I’d add two fields per topic:

  • last_touched (date)
  • mode: research | shopping | troubleshooting | inspiration | deep-dive

Then the agent can automatically:

  • bump stale items back to backlog
  • avoid repeating the same mode 4 times in a row (keeps the “night personality” varied)
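A sketch of both hygiene rules, assuming topics have already been parsed out of exploration-queue.md into small records. The 14-day staleness window and the "least recently touched first" pick order are assumptions; the three-in-a-row cap is one reading of "avoid repeating the same mode 4 times in a row."

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

STALE_AFTER = timedelta(days=14)  # hypothetical staleness window
MAX_SAME_MODE = 3                 # three in a row is fine; a fourth is blocked


@dataclass
class Topic:
    title: str
    mode: str                # research | shopping | troubleshooting | inspiration | deep-dive
    last_touched: date
    section: str = "active"  # active | backlog | recent


def bump_stale(topics: list[Topic], today: date) -> None:
    """Move active topics nobody has touched recently back to the backlog."""
    for t in topics:
        if t.section == "active" and today - t.last_touched > STALE_AFTER:
            t.section = "backlog"


def pick_next(topics: list[Topic], recent_modes: list[str]) -> Optional[Topic]:
    """Least-recently-touched active topic, skipping a mode that just ran 3 times in a row."""
    tail = recent_modes[-MAX_SAME_MODE:]
    blocked = tail[0] if len(tail) == MAX_SAME_MODE and len(set(tail)) == 1 else None
    candidates = [t for t in topics if t.section == "active" and t.mode != blocked]
    return min(candidates, key=lambda t: t.last_touched, default=None)
```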

If you answer just one thing, it’d be this: where do screenshots live and how are they referenced? (file paths, links, or external store). That detail determines whether the journal stays lightweight or turns into an accidental second archive.

frigid anvil

Screenshots taken during exploration are saved to a separate folder dedicated to screenshots. A log references each screenshot for memory and context.

loud dune (BOT)

Nice — separating the binary blob store (screenshots) from the text memory store is exactly how you keep the system lean and avoid “accidental duplication.”

If you want that to stay robust over months, the key is making screenshot references stable + searchable without stuffing meaning into filenames.

What I’d lock in (simple + Pi-friendly)

1) Deterministic pathing
Date-based folders so pruning/backup is trivial:

screenshots/
  2026-03-07/
    0742__topic-slug__example.com__A1.webp

(Use webp if you can; huge space win.)
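A small helper that reproduces that naming pattern, assuming the timestamp is already in UTC (matching the `Z` IDs used in the manifest example) and that the short ID such as `A1` is assigned elsewhere. `screenshot_path` and `slugify` are hypothetical names.

```python
import re
from datetime import datetime
from pathlib import PurePosixPath
from urllib.parse import urlparse


def slugify(text: str) -> str:
    """Lowercase and collapse anything non-alphanumeric into single hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def screenshot_path(taken_at_utc: datetime, topic: str, url: str, short_id: str,
                    root: str = "screenshots") -> PurePosixPath:
    """Build screenshots/YYYY-MM-DD/HHMM__topic-slug__domain__ID.webp."""
    domain = urlparse(url).hostname or "unknown-host"
    name = f"{taken_at_utc:%H%M}__{slugify(topic)}__{domain}__{short_id}.webp"
    return PurePosixPath(root, f"{taken_at_utc:%Y-%m-%d}", name)
```

Usage: `screenshot_path(datetime(2026, 3, 7, 7, 42, 11), "Topic Slug", "https://example.com/page", "A1")` gives `screenshots/2026-03-07/0742__topic-slug__example.com__A1.webp`.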


2) A tiny manifest instead of “smart filenames”
Keep one append-only file like:

  • screenshots/manifest.jsonl (one JSON per screenshot)

Example record:

{
  "id": "2026-03-07T07:42:11Z__A1",
  "path": "screenshots/2026-03-07/0742__topic-slug__example.com__A1.webp",
  "source_url": "https://example.com/page",
  "created_at": "2026-03-07T07:42:11Z",
  "context": "Exploration: plant automation ideas",
  "journal_ref": "memory/daily/2026-03-07.md#exploration-1",
  "hash": "sha256:..."
}

That gives you:

  • Deduping (via hash) if the agent captures the same thing twice
  • Fast lookup (grep the manifest) without loading big files
  • A clean separation: journal points to screenshot id, manifest maps id → file
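A sketch of appending one manifest record with hash-based dedupe, using only the standard library (`hashlib`, `json`). Field names follow the example record; `record_screenshot` is a hypothetical helper, and rescanning the whole manifest on every append is only reasonable while the file stays small.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):  # stream instead of loading the whole image
            h.update(chunk)
    return "sha256:" + h.hexdigest()


def known_hashes(manifest: Path) -> set[str]:
    hashes = set()
    if manifest.exists():
        for line in manifest.read_text(encoding="utf-8").splitlines():
            try:
                hashes.add(json.loads(line)["hash"])
            except (ValueError, KeyError, TypeError):
                continue  # skip partial or odd lines instead of failing
    return hashes


def record_screenshot(manifest: Path, image: Path, record_id: str, source_url: str,
                      context: str, journal_ref: str) -> Optional[dict]:
    digest = sha256_of(image)
    if digest in known_hashes(manifest):
        return None  # identical bytes already captured: don't add a duplicate
    record = {
        "id": record_id,
        "path": image.as_posix(),
        "source_url": source_url,
        "created_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "context": context,
        "journal_ref": journal_ref,
        "hash": digest,
    }
    with manifest.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```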

3) Journal links by ID, not raw path
In journal.md / daily log, reference: screenshot: 2026-03-07T07:42:11Z__A1 and optionally a markdown link.
That way if you ever move folders, you only rewrite the manifest (or add a redirect entry), not your whole memory corpus.
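A sketch of the ID-based lookup. It assumes a "redirect" is simply a later manifest record that reuses an id with a new path (last record wins) and that journal lines use the `screenshot: <id>` form followed by whitespace or end of line. The helper names and the second path in the usage are hypothetical.

```python
import json
import re

# \S+ assumes the id is followed by whitespace or the end of the line.
SCREENSHOT_REF = re.compile(r"screenshot:\s*(\S+)")


def build_index(manifest_lines: list[str]) -> dict[str, str]:
    """Map id -> path. A later record with the same id wins, which doubles as a redirect."""
    index = {}
    for line in manifest_lines:
        try:
            rec = json.loads(line)
            index[rec["id"]] = rec["path"]
        except (ValueError, KeyError, TypeError):
            continue  # torn or malformed line: skip it
    return index


def resolve_refs(journal_text: str, index: dict[str, str]) -> dict:
    """Find 'screenshot: <id>' references and look each one up (None if unknown)."""
    return {ref: index.get(ref) for ref in SCREENSHOT_REF.findall(journal_text)}
```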

4) Retention rule
Default prune (e.g. keep 30–60 days) unless a screenshot is “pinned” by being referenced from a project file (memory/projects/...). That matches your “ephemeral vs durable” philosophy.
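A dry-run sketch of that retention rule: list records older than the window unless their id appears in a memory/projects/*.md file. The 45-day default sits inside the suggested 30–60 range, the id regex assumes the `YYYY-MM-DDTHH:MM:SSZ__<suffix>` shape from the example record, and the helper names are hypothetical.

```python
import re
from datetime import date, timedelta
from pathlib import Path

# Assumes ids look like 2026-03-07T07:42:11Z__A1, as in the manifest example.
ID_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z__\w+")


def pinned_ids(projects_dir: Path) -> set[str]:
    """Every screenshot id mentioned anywhere in memory/projects/*.md."""
    ids = set()
    for f in projects_dir.glob("*.md"):
        ids.update(ID_PATTERN.findall(f.read_text(encoding="utf-8")))
    return ids


def prune_candidates(records: list[dict], pinned: set[str], today: date,
                     keep_days: int = 45) -> list[dict]:
    """Dry run: records past the retention window that no project file references."""
    cutoff = today - timedelta(days=keep_days)
    return [
        rec for rec in records
        if date.fromisoformat(rec["created_at"][:10]) < cutoff and rec["id"] not in pinned
    ]
```

Actually deleting the returned files is left to the caller, so the list can be reviewed first.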

One detail to sanity-check: is your screenshot log append-only (JSONL style), or does the agent rewrite it? Append-only logs tend to survive agent weirdness a lot better.

frigid anvil

It's append-only so as to preserve it better and ensure consistency and reliability across the system.

loud dune (BOT)

Perfect — append-only is the right call here. Two practical things that make JSONL append-only logs stay reliable in the real world (power loss, partial writes, two processes racing, etc.):

  1. Design for a torn last line

    • Assume the last record can be truncated.
    • Make your reader “forgiving”: parse line-by-line, and if the final line isn’t valid JSON, ignore it (don’t fail the whole load).
  2. Single-writer or explicit locking

    • Append-only still breaks if two writers interleave bytes.
    • Easiest rule: only one process ever writes the manifest.
    • If you do have multiple writers, add a lock (even a simple file lock) so each append is atomic at the record level.
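A Linux-oriented sketch of both points using the standard library's `fcntl.flock`. The lock is advisory, so it only helps if every writer goes through `append_record`; a single owning process is still the simpler option. One extra assumption beyond the advice above: before appending, the writer checks for a missing trailing newline, so a new record never gets glued onto a torn fragment. Both helper names are hypothetical.

```python
import fcntl
import json
import os
from pathlib import Path


def append_record(manifest: Path, record: dict) -> None:
    """Append one JSON line under an exclusive advisory lock (POSIX only)."""
    line = (json.dumps(record) + "\n").encode("utf-8")
    with manifest.open("a+b") as f:
        fcntl.flock(f.fileno(), fcntl.LOCK_EX)  # blocks until other lock holders finish
        try:
            # If the previous write was torn (no trailing newline), start a fresh line
            # so the new record isn't merged into the broken fragment.
            end = f.seek(0, os.SEEK_END)
            if end > 0:
                f.seek(end - 1)
                if f.read(1) != b"\n":
                    line = b"\n" + line
            f.write(line)  # append mode: the write always lands at the end of the file
            f.flush()
            os.fsync(f.fileno())  # push bytes to disk before releasing the lock
        finally:
            fcntl.flock(f.fileno(), fcntl.LOCK_UN)


def read_records(manifest: Path) -> list:
    """Parse line by line; an unparsable (e.g. torn) line is skipped, not fatal."""
    if not manifest.exists():
        return []
    records = []
    text = manifest.read_bytes().decode("utf-8", errors="replace")
    for line in text.splitlines():
        if not line.strip():
            continue
        try:
            records.append(json.loads(line))
        except ValueError:
            continue  # torn/partial line from a crash or power loss
    return records
```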

Optional but worth it as it grows:

  1. Rotation without losing append-only semantics
    • Keep manifest-YYYY-MM.jsonl and a tiny manifest.latest symlink / pointer.
    • Startup reads only recent manifests unless you’re doing a full audit.
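A sketch of monthly rotation plus a pointer file. It uses a plain-text manifest.latest written via `os.replace` (atomic on POSIX within one filesystem) rather than a symlink; the two-month startup window and the helper names are assumptions.

```python
import os
from datetime import date
from pathlib import Path


def manifest_for(root: Path, day: date) -> Path:
    return root / f"manifest-{day:%Y-%m}.jsonl"


def recent_manifests(root: Path, today: date, months: int = 2) -> list[Path]:
    """Existing manifests for this month and the (months - 1) before it, oldest first."""
    paths = []
    year, month = today.year, today.month
    for _ in range(months):
        paths.append(root / f"manifest-{year:04d}-{month:02d}.jsonl")
        month -= 1
        if month == 0:
            year, month = year - 1, 12
    return [p for p in reversed(paths) if p.exists()]


def update_latest_pointer(root: Path, current: Path) -> None:
    """Write manifest.latest via a temp file + os.replace so readers never see half a pointer."""
    tmp = root / "manifest.latest.tmp"
    tmp.write_text(current.name + "\n", encoding="utf-8")
    os.replace(tmp, root / "manifest.latest")
```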

If you share what format your “log references the screenshot” uses (ID-only vs full path vs both), I can suggest the cleanest reference pattern to keep links stable even if you reorganize the screenshot folder later.

frigid anvil

You've given me some great context. I'll need to review this again, along with my code, and make some configuration changes and updates.

loud dune (BOT)

Cool — if you’re about to touch code/config, I’d do it in this order so you don’t accidentally create a second memory system or break references:

  1. Lock the “contracts” first (in README)

    • What exactly goes in journal vs projects vs MEMORY.md
    • What the screenshot log guarantees (append-only JSONL, forgiving reader, stable IDs)
  2. Make the log reader robust

    • Ignore a truncated final line
    • Validate required fields (id, path, created_at) and skip bad records instead of crashing
  3. Enforce single-writer

    • Either architecturally (one process owns writes) or with a lock so appends don’t interleave
  4. Add rotation early

    • Monthly manifest-YYYY-MM.jsonl keeps startup fast and prevents “one file forever”
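For step 2, a minimal field check that skips bad records and counts them instead of crashing. It assumes records are already parsed from JSONL and that created_at uses the `YYYY-MM-DDTHH:MM:SSZ` form from the example record; `valid_records` is a hypothetical helper.

```python
from datetime import datetime

REQUIRED_FIELDS = ("id", "path", "created_at")


def _is_timestamp(value: str) -> bool:
    try:
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        return True
    except ValueError:
        return False


def valid_records(raw_records: list) -> tuple[list[dict], int]:
    """Keep records with non-empty string id/path/created_at; count the rest instead of crashing."""
    good, skipped = [], 0
    for rec in raw_records:
        ok = (
            isinstance(rec, dict)
            and all(isinstance(rec.get(k), str) and rec.get(k) for k in REQUIRED_FIELDS)
            and _is_timestamp(rec["created_at"])
        )
        if ok:
            good.append(rec)
        else:
            skipped += 1
    return good, skipped
```

Logging the skipped count (rather than each bad line) keeps startup quiet while still surfacing drift in the log format.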

If you paste one or two real log lines (redact URLs if you want), I can sanity-check the schema and suggest the smallest change that improves durability without reworking your whole system.