Protocol Design Decisions That Keep You Out of Trouble

There's a category of engineering mistake that only becomes visible months after you made it. The code worked fine when you wrote it. The tests passed. The audit came back clean. And then, six months into production, a combination of edge cases and real-world usage patterns reveals a design decision that seemed harmless but wasn't.

We've made some of these mistakes. We've also watched others make them. This is a collection of the design decisions that, in our experience, tend to determine whether a DeFi protocol stays out of trouble or creates its own problems.

Decision 1: Separate state from execution from settlement

The biggest structural mistake we see in protocol design is conflating state management, execution logic, and settlement in the same contract. When these concerns are mixed, upgrades become dangerous (changing execution logic might accidentally change state layout), bugs are harder to isolate, and auditors have to hold more context simultaneously to evaluate security properties.

In the Defimec protocol, these three concerns live in separate contracts with well-defined interfaces between them. The state contract holds positions and balances. The execution contract contains routing and swap logic. The settlement contract handles final fund movement. Upgrading execution logic doesn't touch state. A bug in execution can be addressed without the settlement contract being involved.
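To make the separation concrete, here is a minimal sketch in Python rather than contract code. The class names (StateStore, ExecutionV1, Settlement) and the Instruction type are hypothetical and are not the actual Defimec contracts; the point is only that execution produces a description of a transfer, and settlement is the sole component that changes balances.

```python
from dataclasses import dataclass, field


@dataclass
class StateStore:
    """Holds balances only; contains no routing or transfer logic."""
    balances: dict[str, int] = field(default_factory=dict)

    def balance_of(self, account: str) -> int:
        return self.balances.get(account, 0)

    def set_balance(self, account: str, amount: int) -> None:
        if amount < 0:
            raise ValueError("balance cannot be negative")
        self.balances[account] = amount


@dataclass(frozen=True)
class Instruction:
    """Output of execution: a description of a transfer, not the transfer itself."""
    sender: str
    recipient: str
    amount: int


class ExecutionV1:
    """Decides what should happen. It never reads or writes state directly."""

    def plan(self, sender: str, recipient: str, amount: int) -> Instruction:
        if amount <= 0:
            raise ValueError("amount must be positive")
        return Instruction(sender, recipient, amount)


class Settlement:
    """The only component that moves funds, and only through StateStore's interface."""

    def __init__(self, state: StateStore) -> None:
        self.state = state

    def settle(self, ins: Instruction) -> None:
        available = self.state.balance_of(ins.sender)
        if available < ins.amount:
            raise ValueError("insufficient balance")
        self.state.set_balance(ins.sender, available - ins.amount)
        self.state.set_balance(
            ins.recipient, self.state.balance_of(ins.recipient) + ins.amount
        )


# Usage: execution plans, settlement applies.
state = StateStore({"alice": 100})
Settlement(state).settle(ExecutionV1().plan("alice", "bob", 40))
```

Under this layout, replacing ExecutionV1 with another class that exposes the same plan method leaves StateStore and Settlement untouched, which is the property the paragraph above describes.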

This sounds like obvious software engineering. It is. But DeFi protocols routinely mix these concerns in a single contract, usually because it started simple and grew organically. Growth without refactoring creates the mixed architecture that causes trouble later.

Decision 2: Design for failure modes, not happy paths

Most protocol design documents describe what happens when everything works. Few describe what should happen when something fails. This is backwards.

For a cross-chain protocol, the relevant failure modes include:

  • Source chain transaction confirmed, destination chain down
  • Bridge relayer goes offline mid-transfer
  • Gas spike on destination chain causes delivery to fail
  • Price moves adversely during routing, invalidating route assumptions
  • Destination contract call reverts

For each failure mode, you need a defined outcome: does the user get their funds back? How? Through what mechanism? Who can trigger recovery? What are the time bounds?

If your protocol design document doesn't have a section on failure modes, write one before you write any code. The decisions you make in that section will shape the architecture more than any performance optimization.

We use a 72-hour escrow with a defined recovery path for all cross-chain transfers. If a transfer doesn't settle within 72 hours, funds are automatically returned to the sender on the source chain. No manual intervention needed, no governance vote required. The recovery path was designed before a single line of protocol code was written.
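The recovery rule can be sketched as a small state machine. This is an illustration in Python, not the protocol's contract code: the Transfer type and the settle and refund functions are hypothetical, and two behaviors are assumptions of the sketch rather than facts from the design above: that settlement is rejected once the window has elapsed, and that refund needs no privileged caller.

```python
from dataclasses import dataclass
from enum import Enum

ESCROW_WINDOW_SECONDS = 72 * 60 * 60  # the 72-hour window described above


class Status(Enum):
    PENDING = "pending"
    SETTLED = "settled"
    REFUNDED = "refunded"


@dataclass
class Transfer:
    sender: str
    amount: int
    created_at: int  # Unix seconds when funds entered escrow
    status: Status = Status.PENDING


def window_elapsed(transfer: Transfer, now: int) -> bool:
    return now - transfer.created_at >= ESCROW_WINDOW_SECONDS


def settle(transfer: Transfer, now: int) -> None:
    """Complete a pending transfer. Assumption: late settlement is rejected."""
    if transfer.status is not Status.PENDING:
        raise ValueError("transfer is not pending")
    if window_elapsed(transfer, now):
        raise ValueError("escrow window elapsed; transfer can only be refunded")
    transfer.status = Status.SETTLED


def refund(transfer: Transfer, now: int) -> int:
    """Return the escrowed amount to the sender once the window has elapsed.

    Assumption: no admin or governance check is needed to call this.
    """
    if transfer.status is not Status.PENDING:
        raise ValueError("transfer is not pending")
    if not window_elapsed(transfer, now):
        raise ValueError("escrow window has not elapsed yet")
    transfer.status = Status.REFUNDED
    return transfer.amount
```

Because every transfer ends in exactly one of SETTLED or REFUNDED, each failure mode listed earlier maps to a defined outcome with a known time bound.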

Decision 3: Make administrative power minimal and time-locked

Every admin function in a DeFi protocol is a potential attack vector, either for external attackers who compromise admin keys or for insider decisions that harm users. The principle: administrative power should be exactly as large as it needs to be and no larger, and every use of it outside a narrowly scoped emergency pause should be time-locked so users have time to react.

In practice, this means:

  • Protocol fee changes: 48-hour timelock before taking effect
  • New chain additions: governance vote required, minimum 7-day voting period
  • Emergency pause: no timelock (by definition), but scope limited to halting new transactions, not touching in-progress ones
  • Contract upgrades: 7-day timelock, announcement required

Users should never be surprised by protocol changes. The timelock gives them time to exit if they disagree with a change before it takes effect.
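A timelock of this kind is essentially a queue in which each action carries an earliest execution time. The sketch below is a simplified Python model under stated assumptions: the action names ("set_fee", "upgrade_contract"), the Timelock class, and the string payload are hypothetical, and only the 48-hour and 7-day delays come from the list above.

```python
from dataclasses import dataclass

HOUR = 60 * 60
DAY = 24 * HOUR

# Delays taken from the list above; the action names are illustrative.
DELAYS = {
    "set_fee": 48 * HOUR,
    "upgrade_contract": 7 * DAY,
}


@dataclass
class QueuedAction:
    name: str
    payload: str
    eta: int  # earliest Unix time at which the action may execute


class Timelock:
    def __init__(self) -> None:
        self.queue: dict[int, QueuedAction] = {}
        self.next_id = 0

    def schedule(self, name: str, payload: str, now: int) -> int:
        """Queue an action; it becomes executable only after its delay."""
        if name not in DELAYS:
            raise ValueError(f"no timelock configured for {name!r}")
        action_id = self.next_id
        self.next_id += 1
        self.queue[action_id] = QueuedAction(name, payload, now + DELAYS[name])
        return action_id

    def execute(self, action_id: int, now: int) -> QueuedAction:
        """Release a queued action once its delay has passed."""
        action = self.queue[action_id]
        if now < action.eta:
            raise PermissionError("timelock has not expired")
        del self.queue[action_id]
        return action
```

The queue itself is what gives users their notice period: a pending fee change or upgrade is visible from the moment it is scheduled, well before it can take effect.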

Decision 4: Never rely on off-chain data for critical path decisions

Oracles are useful. They're also a significant attack surface. The larger the financial stake that depends on an oracle's output, the more attractive that oracle becomes as a manipulation target.

Our rule: no oracle price data touches the critical transfer path. Routing decisions use on-chain pool state pulled directly from DEX contracts. Slippage calculations use on-chain reserves. The only place we use price feeds is in non-critical analytics — reporting, dashboards, historical analysis.
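As an illustration of what a slippage calculation from on-chain reserves looks like, here is a sketch that assumes a constant-product pool with a swap fee expressed in basis points. That pool model, the 30 bps default, and the function names are assumptions made for the example; the pools the protocol actually routes through may price differently.

```python
def amount_out(amount_in: int, reserve_in: int, reserve_out: int,
               fee_bps: int = 30) -> int:
    """Quote a swap against a constant-product pool using only its reserves.

    Integer arithmetic with floor division, as contract code would use.
    """
    if amount_in <= 0 or reserve_in <= 0 or reserve_out <= 0:
        raise ValueError("amounts and reserves must be positive")
    amount_in_after_fee = amount_in * (10_000 - fee_bps)
    numerator = amount_in_after_fee * reserve_out
    denominator = reserve_in * 10_000 + amount_in_after_fee
    return numerator // denominator


def min_amount_out(quoted_out: int, max_slippage_bps: int) -> int:
    """Lowest acceptable output for a given slippage tolerance."""
    if not 0 <= max_slippage_bps <= 10_000:
        raise ValueError("slippage tolerance must be between 0 and 10000 bps")
    return quoted_out * (10_000 - max_slippage_bps) // 10_000


# Usage: quote from reserves, then derive the floor the swap must clear.
quote = amount_out(1_000, reserve_in=1_000_000, reserve_out=1_000_000)
floor = min_amount_out(quote, max_slippage_bps=50)
```

Both inputs to the quote are values read from the pool itself, so no external price feed appears anywhere in the calculation.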

This decision constrains some product features. We can't offer certain types of limit orders or conditional routing that would require reliable price feeds. We've chosen to accept those constraints rather than introduce oracle risk into the core transfer flow.

Decision 5: Audit everything, but especially the interfaces

Smart contract audits focus heavily on the internals of each contract. This is correct. But the interfaces between contracts — the assumptions each contract makes about the others — are where subtle bugs hide. One contract assumes a call will succeed; the other might revert under certain conditions. One contract passes a value in one unit; the other expects a different unit.

We audit interfaces explicitly, with test cases that simulate adversarial inputs at every inter-contract boundary. This is separate from the standard audit scope and adds time to the process. We've found two non-trivial bugs this way that the standard contract-level audit didn't catch.
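The unit-mismatch case lends itself to a small example. The sketch below models a boundary where the receiving side only accepts 18-decimal amounts; the Amount type, to_decimals, and settlement_receive are hypothetical names invented for illustration, not part of any audited interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Amount:
    """An integer amount tagged with the number of decimals it is expressed in."""
    raw: int
    decimals: int


def to_decimals(amount: Amount, target_decimals: int) -> Amount:
    """Convert between decimal scales, refusing conversions that lose precision."""
    if amount.raw < 0:
        raise ValueError("negative amount")
    if target_decimals >= amount.decimals:
        scale = 10 ** (target_decimals - amount.decimals)
        return Amount(amount.raw * scale, target_decimals)
    divisor = 10 ** (amount.decimals - target_decimals)
    if amount.raw % divisor != 0:
        raise ValueError("conversion would truncate")
    return Amount(amount.raw // divisor, target_decimals)


def settlement_receive(amount: Amount) -> int:
    """Hypothetical settlement entry point that only accepts 18-decimal amounts."""
    if amount.decimals != 18:
        raise ValueError("settlement expects 18-decimal amounts")
    if amount.raw < 0:
        raise ValueError("negative amount")
    return amount.raw


def boundary_rejects(amount: Amount) -> bool:
    """Return True if the settlement boundary refuses the given input."""
    try:
        settlement_receive(amount)
    except ValueError:
        return True
    return False


# Adversarial inputs a boundary test might feed across the interface.
adversarial_inputs = [
    Amount(1_000_000, 6),   # caller forgot to rescale a 6-decimal token
    Amount(-1, 18),         # negative value smuggled through a signed type
]
```

A boundary test then asserts that every adversarial input is rejected and that a correctly rescaled amount, such as to_decimals(Amount(1_000_000, 6), 18), is accepted.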

What we'd do differently

The decision we'd reverse if we could: our initial choice to use a proxy pattern for upgradeability on the routing engine. Proxy patterns introduce storage layout risks that require careful management across upgrades. We've managed it correctly so far, but the cognitive overhead is significant. If we were starting today, we'd use immutable contracts with versioned interfaces instead — accept that upgrades require new deployments, and design migration paths accordingly. Simpler to reason about, less room for subtle upgrade mistakes.

Good protocol design is mostly about accepting constraints clearly. Know what you won't do, and make sure the system enforces those boundaries. The protocols that stay out of trouble aren't the ones that got lucky — they're the ones that made explicit decisions about where the risks were and how to bound them.
