Chapter 03 · Defensible decisions

Design & trade studies.

When two reasonable engineers reach opposite conclusions, the right answer isn’t louder argument. It’s a trade study with weighted criteria, scored honestly, and a written rationale you can defend a year later when someone asks why.

Trade Studies: How to Choose

Stop guessing: When faced with alternatives, most teams either pick the familiar option or argue endlessly. Trade studies force objectivity. Define criteria upfront, weight them by importance, score honestly. The math doesn't lie—it surfaces hidden biases and makes your rationale defensible to stakeholders and reviewers.

Trade Study Example

Each cell shows score (out of 10) × weight = weighted score.

Criterion (weight)      VxWorks RTOS       Zephyr RTOS        Bare Metal
Performance (25%)       8 × 0.25 = 2.0     9 × 0.25 = 2.25    10 × 0.25 = 2.5
Cost (20%)              3 × 0.20 = 0.6     10 × 0.20 = 2.0    10 × 0.20 = 2.0
Heritage/Risk (30%)     10 × 0.30 = 3.0    6 × 0.30 = 1.8     4 × 0.30 = 1.2
Dev Time (15%)          9 × 0.15 = 1.35    8 × 0.15 = 1.2     3 × 0.15 = 0.45
Maintainability (10%)   8 × 0.10 = 0.8     9 × 0.10 = 0.9     4 × 0.10 = 0.4
TOTAL                   7.75               8.15 ✓             6.55
Decision: Zephyr wins on weighted score (8.15). VxWorks is close behind (7.75) and could overtake it if heritage/risk were weighted more heavily. Bare metal is eliminated by its weak development-time, maintainability, and heritage/risk scores despite the best performance. Document the rationale for audits and future reference.
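The arithmetic in the table is easy to script, and scripting it makes sensitivity checks cheap. A minimal Python sketch, assuming a plain weighted sum (the criterion keys such as `heritage_risk` are illustrative names, not taken from any tool): it reproduces the totals above, then applies a hypothetical reweighting that moves 10 points of weight from cost to heritage/risk to see whether the ranking holds.

```python
# Weighted-sum trade study: each option is scored out of 10 per criterion,
# each score is multiplied by the criterion weight, and the products are summed.
CRITERIA_WEIGHTS = {
    "performance": 0.25,
    "cost": 0.20,
    "heritage_risk": 0.30,
    "dev_time": 0.15,
    "maintainability": 0.10,
}

SCORES = {
    "VxWorks RTOS": {"performance": 8, "cost": 3, "heritage_risk": 10,
                     "dev_time": 9, "maintainability": 8},
    "Zephyr RTOS": {"performance": 9, "cost": 10, "heritage_risk": 6,
                    "dev_time": 8, "maintainability": 9},
    "Bare Metal": {"performance": 10, "cost": 10, "heritage_risk": 4,
                   "dev_time": 3, "maintainability": 4},
}


def weighted_totals(weights, scores):
    """Return {option: weighted total}, rounded to hide floating-point noise."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1.0")
    return {
        option: round(sum(weights[c] * per_criterion[c] for c in weights), 4)
        for option, per_criterion in scores.items()
    }


def shift_weight(weights, to_criterion, from_criterion, amount):
    """Move `amount` of weight between criteria so the total stays 1.0."""
    shifted = dict(weights)
    shifted[to_criterion] += amount
    shifted[from_criterion] -= amount
    return shifted


if __name__ == "__main__":
    baseline = weighted_totals(CRITERIA_WEIGHTS, SCORES)
    print(baseline)  # {'VxWorks RTOS': 7.75, 'Zephyr RTOS': 8.15, 'Bare Metal': 6.55}

    # Hypothetical sensitivity check: heritage/risk 30% -> 40%, cost 20% -> 10%.
    reweighted = weighted_totals(
        shift_weight(CRITERIA_WEIGHTS, "heritage_risk", "cost", 0.10), SCORES
    )
    print(max(reweighted, key=reweighted.get))  # VxWorks RTOS
```

With that particular shift, VxWorks moves ahead (8.45 vs 7.75). That is the kind of result worth recording in the rationale: the decision hinges on how heavily heritage/risk is weighted.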

COTS vs Custom Decision

Default to COTS: Custom engineering is expensive, risky, and time-consuming. Every custom component is a potential failure mode and maintenance burden. But sometimes COTS doesn't exist, doesn't fit constraints, or costs more long-term. The question isn't "can we build it?" but "should we build it?"

Choose COTS When:

  • Component is not a core differentiator
  • Mature technology exists
  • Development time is limited
  • Long-term support is available
  • COTS lifecycle cost is lower than custom NRE plus sustainment

Choose Custom When:

  • No COTS solution exists
  • Requirements are unique/extreme
  • COTS lifecycle cost > custom NRE plus sustainment
  • Performance/size/power requirements can't be met with COTS
  • Component is core IP or a competitive advantage
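Both lists hinge on a cost comparison, and a common mistake is to compare COTS lifecycle cost against custom NRE alone, leaving out what the custom part costs to sustain. A minimal sketch, assuming a simple linear model (one-time NRE, per-unit cost, and a yearly sustainment cost); the `CostModel` class and every number in it are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CostModel:
    """Hypothetical linear lifecycle cost model (any currency unit)."""
    nre: float             # one-time engineering cost (typically 0 for COTS)
    unit_cost: float       # purchase or build cost per unit
    annual_sustain: float  # licenses/support (COTS) or sustaining engineering (custom)

    def lifecycle_cost(self, units: int, years: int) -> float:
        return self.nre + self.unit_cost * units + self.annual_sustain * years


def cheaper_option(cots: CostModel, custom: CostModel, units: int, years: int) -> str:
    """Compare full lifecycle cost on both sides; ties go to COTS."""
    if cots.lifecycle_cost(units, years) <= custom.lifecycle_cost(units, years):
        return "COTS"
    return "custom"


if __name__ == "__main__":
    # Illustrative numbers only; substitute real quotes and estimates.
    cots = CostModel(nre=0, unit_cost=12_000, annual_sustain=30_000)
    custom = CostModel(nre=400_000, unit_cost=2_000, annual_sustain=60_000)
    for units in (10, 50, 100):
        print(units, cheaper_option(cots, custom, units=units, years=5))
```

With these made-up numbers the two options break even at 55 units over five years: below that COTS is cheaper, above it custom is. The crossover point, not either single number, is what belongs in the rationale.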

Innovative Problem Solving

Think outside the box, but use your head: Innovation means finding better solutions, not being different for its own sake. Question conventional approaches, but validate with analysis before committing.

Example: Gigabit Data Pipe

Conventional: "Use Gigabit Ethernet—it's standard."

First Principles: What are we solving?

  • Need: 1 Gbps throughput, low latency
  • Constraint: Limited CPU, low overhead

Alternative: Stripe the data across three USB 2.0 High-Speed interfaces (480 Mbps signaling rate each = 1.44 Gbps raw, before protocol overhead). Simpler framing than TCP/IP, lower CPU overhead, well-supported drivers.
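Before prototyping, a back-of-envelope budget shows how much margin each path has on paper. A minimal sketch, assuming USB 2.0 High-Speed bulk transfers (at most 13 packets of 512 bytes per 125 µs microframe) and standard 1500-byte-MTU Ethernet framing with IPv4/TCP headers and no options. These are protocol ceilings; measured throughput is typically lower, especially through USB host controllers.

```python
# Back-of-envelope link ceilings in Mbps. These are protocol limits, not
# measurements; real throughput also depends on host controller, driver, and CPU.

USB2_HS_SIGNALING_MBPS = 480.0

# USB 2.0 High-Speed bulk transfers: at most 13 packets of 512 bytes
# per 125 us microframe, i.e. 8000 microframes per second.
USB2_BULK_BYTES_PER_MICROFRAME = 13 * 512
USB2_MICROFRAMES_PER_SECOND = 8000


def usb2_bulk_ceiling_mbps() -> float:
    """Maximum bulk payload rate of one USB 2.0 High-Speed link."""
    bits_per_second = USB2_BULK_BYTES_PER_MICROFRAME * 8 * USB2_MICROFRAMES_PER_SECOND
    return bits_per_second / 1e6


def gige_tcp_payload_ceiling_mbps(mtu: int = 1500) -> float:
    """TCP payload rate on Gigabit Ethernet with IPv4, no IP or TCP options."""
    payload = mtu - 20 - 20          # IPv4 header + TCP header
    on_wire = mtu + 14 + 4 + 8 + 12  # Ethernet header, FCS, preamble/SFD, inter-frame gap
    return 1000.0 * payload / on_wire


if __name__ == "__main__":
    links = 3
    print(f"3x USB 2.0, raw signaling: {links * USB2_HS_SIGNALING_MBPS:.0f} Mbps")  # 1440
    print(f"3x USB 2.0, bulk payload:  {links * usb2_bulk_ceiling_mbps():.0f} Mbps")  # 1278
    print(f"GigE, TCP payload:         {gige_tcp_payload_ceiling_mbps():.0f} Mbps")  # 949
```

On these assumptions, three USB links top out near 1278 Mbps of bulk payload (against 1440 Mbps raw), and Gigabit Ethernet near 949 Mbps of TCP payload. Both are close enough to a 1 Gbps need that prototype measurements, not datasheet rates, should decide.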

Evaluation: Prototype both. Measure sustained throughput, CPU utilization, latency, and implementation complexity. Choose based on data, not assumptions.

Balance innovation with pragmatism: Innovate when conventional solutions don't meet constraints and you can prototype quickly. Use standard approaches when proven solutions exist and risk tolerance is low.

Complexity by inheritance.

The most expensive form of complexity is the kind nobody chose. It arrived because the previous project did things this way, the templates were copied, the architecture was reused, the assumption was that what worked there must work here. By the time anyone asks why, the team is six months in and the cost of unwinding it is higher than the cost of finishing it badly.

The diagnostic question. Does this complexity exist because of this problem, or because of a different problem we worked on before? If you cannot trace it to this problem, you are inheriting solutions that were designed for someone else’s constraints.

Inherited complexity is hard to spot from inside the team that’s living with it. Everyone is working in good faith. The architecture has its own internal logic, and the people maintaining it can explain how it works in detail. What they often cannot explain is why it exists in this shape, beyond “that’s how it’s always been done.” That answer is not an answer. It is a flag.

The remedy is structural: every project starts with a first-principles question about what the problem actually requires, asked before any architecture is selected. The cost of asking is a meeting. The cost of not asking is sometimes the project. Below is a worked example that’s composite but recognizable to anyone who’s been around defense or aerospace work for a few years.

A team needed to get serial data from a sensor to a flight computer. The data rate was 1200 baud, slow by any modern standard: an interface a bored undergraduate could implement on a microcontroller in an afternoon. What the team built instead was a separate debug board with a custom high-speed bus, a custom protocol layered on top of that bus, and custom FPGA images on both ends to run the protocol. All to carry 1200 baud.

When asked why, the answer was: “that’s what the previous project did.” The previous project had moved high-bandwidth radar data, and the architecture made sense there. This project moved one number per second, every second, and the architecture became a multi-year overrun waiting to happen. By the time anyone questioned it, year three was approaching with no end in sight, and the cost of unwinding the inheritance was higher than the cost of pushing through.
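For scale, the entire framing for a link like that fits in a few lines. A minimal sketch, assuming the common 8N1 asynchronous format (one start bit, eight data bits sent LSB first, no parity, one stop bit), which the example does not actually specify: every byte costs 10 bit times, so 1200 baud carries at most 120 bytes per second, far more than one number per second needs.

```python
def frame_8n1(byte: int) -> list[int]:
    """Frame one byte for asynchronous serial in 8N1 format:
    start bit (0), eight data bits LSB first, no parity, one stop bit (1)."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    data_bits = [(byte >> i) & 1 for i in range(8)]  # least significant bit first
    return [0] + data_bits + [1]


def bytes_per_second(baud: int, bits_per_frame: int = 10) -> float:
    """Payload bytes per second when each byte costs `bits_per_frame` bit times."""
    return baud / bits_per_frame


if __name__ == "__main__":
    print(frame_8n1(0x41))         # 'A' -> [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
    print(bytes_per_second(1200))  # 120.0
```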

The lesson. Inherited architecture is the most expensive form of complexity, because nobody questions it. The first-principles question — “what does this problem actually require?” — has to be asked at the beginning of every project, even when the previous project’s architecture is sitting right there waiting to be copied. Especially then. The previous project solved a different problem.

Lead times are an engineering input.

One specific class of inherited complexity comes from how teams react to long lead times. The reasoning sounds responsible: this part has a 40-week lead time, we need to lock the design now to avoid a slip. The team commits to an architecture built around a specific component, accepts the complexity that part imposes, and works around it for the next two years.

The first-principles version of the same situation: lead times are an engineering input, not an obstacle that excuses panic moves. If a part has a 40-week lead time, that fact is data. The right response is to ask whether there is a simpler design built around parts that don’t have that lead time — not to lock in the long-lead part out of fear of the slip.

Treating constraints as inputs, not obstacles. Lead times, regulatory requirements, environmental specs, available expertise — all of these are inputs you design around, not problems that excuse panic. Every constraint that looks like an obstacle is also a forcing function: it filters out the architectures that don’t survive contact with reality, and points toward the ones that do.

The way this fails: a team commits to a long-lead, high-complexity part early, then designs the rest of the system around the constraints that part imposes. By the time the part arrives, the architecture has fossilized around it. If the part turns out to have problems — and complex parts often do — the team is committed to working around them rather than choosing a simpler alternative that was always available. The original decision was framed as risk management. It was actually risk concentration.

The way this succeeds: when you see a long lead time, treat it as a signal to revisit the architecture. Is there a path that uses parts you can buy this week? If yes, that path is almost always cheaper, faster, and less risky than the long-lead path, even if it looks slightly worse on paper. Engineering is the discipline of finding solutions inside the constraints that actually exist, not the discipline of working around the ones you committed to first.
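Treating lead time as data can be as simple as checking the bill of materials against the date each part is needed. A minimal sketch, where the `Part` class, the part names, and the lead-time numbers are all hypothetical: it lists every part that cannot arrive in time, longest lead first, as prompts for an architecture review rather than a purchase order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Part:
    name: str             # hypothetical part description
    lead_time_weeks: int  # quoted lead time from the supplier


def architecture_review_list(parts, weeks_until_needed):
    """Return parts that cannot arrive by the need date, longest lead first.

    Each entry is a prompt to ask whether a design built around shorter-lead
    parts would be simpler, not a cue to order early and lock the design.
    """
    late = [p for p in parts if p.lead_time_weeks > weeks_until_needed]
    return sorted(late, key=lambda p: p.lead_time_weeks, reverse=True)


if __name__ == "__main__":
    # Hypothetical bill of materials, for illustration only.
    bom = [
        Part("specialized processor module", 40),
        Part("off-the-shelf microcontroller board", 2),
        Part("custom backplane connector", 26),
        Part("commodity DC-DC converter", 1),
    ]
    for part in architecture_review_list(bom, weeks_until_needed=12):
        print(f"{part.name}: {part.lead_time_weeks} weeks -> revisit the architecture")
```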