Chapter 05 · Proving it works

Verification & validation.

“It seems fine” is not a verification result. Real verification is the disciplined act of proving each requirement is met, with evidence you can hand to an auditor or a customer. This chapter is about the difference between testing and proving.

Verification vs validation.

Two questions, often confused. Verification asks “did we build it right?” Validation asks “did we build the right thing?” The first is internal — does the system meet its specifications? The second is external — do those specifications actually solve the customer’s problem? You can verify a flawless implementation of requirements that the customer didn’t actually need. You need both.

Why the distinction matters. A team that runs only verification ships systems that pass every test and disappoint every customer. A team that runs only validation ships systems the customer likes but no one can prove will hold up. The real failure modes are different: verification catches engineering errors against the contract, validation catches the contract being wrong about the problem.

The mental model. Verification is the building inspector checking that the construction matches the plans. Validation is asking the family if the house actually fits how they live. The inspector can’t answer the family’s question, and the family can’t answer the inspector’s. Both visits are necessary, and they happen at different times for different reasons.

Verification.

Did we build it right?

  • Test against specifications
  • Measure performance vs requirements
  • Check all interfaces work correctly
  • Verify error handling and edge cases
  • Document results objectively

Validation.

Did we build the right thing?

  • Customer acceptance testing
  • Does it solve the real problem?
  • Usable in operational environment?
  • Meets unstated but obvious needs?
  • Customer signs off

Test isolation by tier.

The failure mode this fixes. A team where the first integration test is the entire system. Every bug found requires the full stack standing up to reproduce. Every fix requires the full stack to verify. Every test cycle costs a coordinated team effort. Days of debugging burn on bugs that should have surfaced in seconds. The fix is not “test harder.” It is testing at the right level for the bug you’re hunting, with infrastructure built on purpose to catch each kind early.

The principle is simple: each tier introduces exactly one new variable. Tier 1 exercises pure logic with zero dependencies. Tier 2 adds component-to-component integration in a single process. Tier 3 adds real serialization and transport on a local machine. Tier 4 brings up the full software stack. Hardware enters only above Tier 4, as a separate problem with its own tiers. When a test fails at any tier, you know with certainty that the failure was introduced by what changed between that tier and the one below it. That is the diagnostic value — and it is the value system-only testing throws away.

[Figure: Test isolation by tier. A four-tier software isolation pyramid sits below a hardware integration block. The software tiers, none of which require hardware, are: 1, pure logic with zero dependencies; 2, component contracts; 3, wire-level transport; 4, full software stack. Tier 1 is the widest base, representing many cheap, fast, plentiful tests; Tier 4 at the top represents few expensive, slow, sparse tests. Above Tier 4, hardware integration appears as a separate dashed block with its own four tiers: dev boards, interface bring-up, integrated, production. Each tier introduces exactly one new variable; by the time code reaches hardware, the software logic is already verified.]

The pyramid shape is also the cost shape. Tier 1 tests are cheap, fast, and run on every commit. Tier 4 tests are expensive, slow, and run rarely. You want hundreds of tests at the bottom of the pyramid filtering bugs out cheaply before they ever reach the slow tiers. A team that only has Tier 4 tests pays full price to find every bug, including the ones a thirty-second unit test would have caught in isolation.

The software tiers.

The names below are general. In any specific project the boundaries land where the architecture has natural seams — what counts as “transport” depends on whether you are dealing with gRPC, MQTT, shared memory, or raw sockets. The principle — each tier adds exactly one new thing — is what carries across.

Tier 1 · Pure logic
  What it tests: algorithms, state machines, data structures, message types.
  What it doesn't need: I/O, threading, serialization, sockets.
  Catches: math errors, edge cases, contract bugs.

Tier 2 · Component contracts
  What it tests: components calling each other through their public interfaces, in a single process.
  What it doesn't need: serialization, transport, sockets, network.
  Catches: interface mismatches, contract drift, ordering bugs.

Tier 3 · Wire-level transport
  What it tests: real serialization across a real transport, on a local machine.
  What it doesn't need: network latency, remote dependencies, hardware.
  Catches: encoding bugs, protocol mismatches, framing.

Tier 4 · Full software stack
  What it tests: end-to-end software, all components, real transport, no hardware.
  What it doesn't need: hardware. (That is the next pyramid.)
  Catches: emergent behavior, resource contention, lifecycle bugs.
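To make Tier 1 concrete, here is a hedged sketch in Python. The framing protocol, checksum, and function names are invented for this illustration, not taken from any project in the chapter; the point is the shape of the test: pure logic, no I/O, no threads, no sockets, so it runs in milliseconds on every commit.

```python
# Tier 1 sketch: pure logic under test, zero dependencies.
# The frame format (length byte, payload, additive checksum) is
# a hypothetical example invented for illustration.

def checksum(data: bytes) -> int:
    """Simple additive checksum, modulo 256."""
    return sum(data) % 256

def frame_payload(payload: bytes) -> bytes:
    """Prepend a length byte, append a checksum. Pure logic, no I/O."""
    if len(payload) > 255:
        raise ValueError("payload too large for one frame")
    body = bytes([len(payload)]) + payload
    return body + bytes([checksum(body)])

def parse_frame(frame: bytes) -> bytes:
    """Inverse of frame_payload; rejects corrupt frames."""
    if len(frame) < 2 or checksum(frame[:-1]) != frame[-1]:
        raise ValueError("bad checksum")
    if len(frame) != frame[0] + 2:
        raise ValueError("bad length")
    return frame[1:-1]

# Tier 1 tests: round-trip, empty payload, corruption.
assert parse_frame(frame_payload(b"hello")) == b"hello"
assert parse_frame(frame_payload(b"")) == b""
rejected = False
try:
    parse_frame(frame_payload(b"hello")[:-1] + b"\x00")  # corrupt checksum
except ValueError:
    rejected = True
assert rejected
```

A Tier 2 version of the same code would wire two such components together in one process; a Tier 3 version would push the frames through a real socket on localhost. Each step adds exactly one new variable.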

Hardware integration is its own pyramid.

Above the software pyramid sits hardware integration, and it deserves to be treated as a separate problem rather than another tier on top. Software tiers test logic; hardware tiers test the physical environment. They are hunting different kinds of bugs, on different cost curves. By the time code reaches dev hardware, state-machine bugs and serialization bugs are already gone. What remains are the questions only hardware can answer.

HW Tier 1 · Software on dev hardware
  What it tests: verified software running on the target processor. No FPGA logic yet.
  Catches: toolchain issues, target-specific timing, processor surprises.

HW Tier 2 · Interface bring-up
  What it tests: the hardware interface boundary specifically — GPIO, UART, SPI, interrupts — with the FPGA loaded but its logic not yet exercised.
  Catches: pinout errors, signal integrity, interrupt storms.

HW Tier 3 · Integrated hardware
  What it tests: full system on dev hardware. FPGA running real logic, software running on target.
  Catches: end-to-end timing, real-world data flow, system-level surprises.

HW Tier 4 · Production hardware
  What it tests: final hardware, full system, formal verification.
  Catches: production-only failure modes (thermal, EMI, vibration, manufacturing variation).

Not strictly sequential. The tiers represent levels of confidence, not gates. You will likely run HW Tier 1 as soon as anything compiles for the target, in parallel with continued work in the software pyramid. The point is that the lower tiers should run continuously, on every change, because they are cheap; the upper tiers run when their cost is justified. The discipline is keeping the lower tiers alive throughout the project, not finishing each one before starting the next.

Tiers are infrastructure, not stepping stones.

A team had built solid tiered tests during early development — no specific hardware required, fast feedback, clean isolation. When production hardware finally arrived, support for the development boards was dropped. The reasoning sounded sensible at the time: “the dev hardware is no longer needed, supporting it costs engineer time, drop it.” The savings looked real on the schedule.

Months later, the only way to test anything was to bring it up as an entire system on production hardware — the most expensive, slowest, most contested test environment available. Every bug fix required the full stack standing up. Every regression chased on production silicon. The team that was “saving money” was paying full retail for every subsequent bug, while the engineers who’d built the original infrastructure watched it erode from under them.

The lesson. Lower tiers don’t get cheaper to skip as the project matures — they get more valuable to keep. The cost of maintaining a tier is fixed. The cost of debugging at the wrong level is paid every time a bug appears. By the time someone is willing to revisit the decision, the institutional knowledge to rebuild the tiers is usually gone too.

Maintained tiers are infrastructure. Treat them like the build system, the version control server, the CI pipeline — permanent, owned, and worth investing in across the project lifecycle. The team that abandons its dev-hardware support once final hardware arrives will pay full price for every subsequent bug. The team that keeps the lower tiers running pays the cheap price most of the time and the expensive price rarely. Over a multi-year project, the difference is enormous.

This is a management decision as much as an engineering one.

Sustaining tiered test infrastructure costs engineer time, and that cost shows up on someone’s budget every quarter. Without management understanding the math — that the cost of the tiers is dwarfed by the cost of debugging at the wrong level — the tiers will erode under schedule pressure. Engineering rigor here cannot stand alone. If you are an engineer reading this without that backing, your job is to make the cost case explicitly, with numbers, before the erosion starts.

Requirements traceability matrix.

Close the loop: Every requirement must have a test. Every test must trace to a requirement. If you can't demonstrate compliance with a test, the requirement is unverifiable (and probably poorly written).
RTM example.

  Requirement ID · Requirement · Test procedure · Status
  SYS-001 · Boot within 30 seconds · TP-SYS-001: Measure boot time · ✓ PASS
  SYS-002 · Operate -40°C to +85°C · TP-SYS-002: Thermal chamber test · ✓ PASS
  COM-001 · Ethernet link at 1 Gbps · TP-COM-001: Throughput measurement · ✗ FAIL (850 Mbps)
When tests fail: Don't just retest and hope. Root cause the failure. Is the requirement wrong? Is the design inadequate? Is the test procedure flawed? Fix the actual problem, not the symptom.
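Closing the loop can be mechanized. A hedged sketch in Python: the IDs below come from the example RTM above, but in a real project the requirement IDs would be pulled from the requirements tool and the trace links from test reports, not hard-coded.

```python
# Traceability check sketch: every requirement has a test,
# every test traces to a requirement. IDs are illustrative.

requirements = {"SYS-001", "SYS-002", "COM-001"}

tests = {
    "TP-SYS-001": "SYS-001",   # test procedure -> requirement it verifies
    "TP-SYS-002": "SYS-002",
    "TP-COM-001": "COM-001",
}

traced = set(tests.values())
untested = requirements - traced                           # unverifiable requirements
orphans = {t for t, r in tests.items() if r not in requirements}  # tests tracing to nothing

assert not untested, f"unverifiable requirements: {untested}"
assert not orphans, f"tests tracing to nothing: {orphans}"
```

Run as a CI gate, a check like this turns "every requirement must have a test" from a review-time aspiration into a build failure.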

Reliability engineering.

MTBF and failure rates.

MTBF (Mean Time Between Failures): the average time a system operates before failing. Higher is better. But understand what it actually means — an MTBF of 10,000 hours doesn't mean every unit lasts 10,000 hours. It's a statistical average.

MTBF calculation (strictly, MTBF applies to repairable systems; for non-repairable items the analogous figure is MTTF, mean time to failure, and the arithmetic is the same):

MTBF = Total Operating Time / Number of Failures

Example calculation.

Test 10 units for 1000 hours each. 2 failures occur.

MTBF = (10 units × 1000 hours) / 2 failures = 5000 hours

Failure rate (λ): 1 / MTBF = 1 / 5000 = 0.0002 failures/hour
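The same arithmetic in code, extended by one step the text only gestures at: under the standard constant-failure-rate (exponential) assumption, which this sketch adopts explicitly, only about 37% of units are expected to survive to the MTBF itself.

```python
import math

# Numbers from the worked example above.
units, hours_each, failures = 10, 1000, 2

total_hours = units * hours_each      # 10,000 unit-hours of operation
mtbf = total_hours / failures         # 5000 hours
lam = 1 / mtbf                        # 0.0002 failures/hour

# Constant-failure-rate model: reliability at time t is R(t) = exp(-lambda * t).
# At t = MTBF, R = exp(-1) ~ 0.368: most of a fleet fails *before* the MTBF.
r_at_mtbf = math.exp(-lam * mtbf)

assert mtbf == 5000
assert abs(lam - 0.0002) < 1e-12
assert 0.36 < r_at_mtbf < 0.37
```

This is the quantitative version of "it's a statistical average": the MTBF is not a lifetime guarantee for any individual unit.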

Design for reliability.

Derating.

Don't run components at max ratings. Stress causes early failure.

  • Power: Use components at 50-70% max power
  • Voltage: Stay under 80% of max voltage
  • Temperature: Keep well below max junction temp
  • Current: Don't push wire/trace limits
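A derating guideline is only useful if it is checked. A hedged sketch: the 70% power and 80% voltage factors follow the bullets above, while the part itself (a 0.25 W resistor rated for 16 V) is invented for illustration.

```python
# Derating check sketch: compare an operating point against
# derated limits, not against the datasheet maximums.

DERATE_POWER = 0.70     # run parts at no more than 70% of rated power
DERATE_VOLTAGE = 0.80   # stay under 80% of rated voltage

def derating_ok(rated_power_w: float, rated_voltage_v: float,
                actual_power_w: float, actual_voltage_v: float) -> bool:
    return (actual_power_w <= DERATE_POWER * rated_power_w
            and actual_voltage_v <= DERATE_VOLTAGE * rated_voltage_v)

# Hypothetical 0.25 W, 16 V-rated resistor dissipating 0.15 W at 10 V:
assert derating_ok(0.25, 16.0, 0.15, 10.0)       # within derated margins
# Same part at 0.20 W is inside the datasheet rating but over the
# 70% derating guideline, so the check correctly rejects it:
assert not derating_ok(0.25, 16.0, 0.20, 10.0)
```

The second assertion is the whole point of derating: "under the absolute maximum" is not the same as "within margin."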

Redundancy.

Critical functions need backup paths.

  • Dual power supplies (N+1 redundancy)
  • Redundant communication links
  • Watchdog timers for processors
  • RAID for storage systems
  • Graceful degradation (fail soft, not hard)
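The payoff of redundancy can be put in numbers. This sketch assumes independent failures (an assumption the bullets above don't state, and one that common-cause failures routinely violate in practice); the 0.99 reliability figure is illustrative.

```python
# Parallel redundancy sketch: the system works if at least one
# unit works, so R_system = 1 - product(1 - R_i), assuming
# independent failures.

def parallel_reliability(*unit_reliabilities: float) -> float:
    p_all_fail = 1.0
    for r in unit_reliabilities:
        p_all_fail *= (1.0 - r)
    return 1.0 - p_all_fail

single = 0.99                              # one supply: 1% failure probability
dual = parallel_reliability(0.99, 0.99)    # N+1: both must fail together

# Failure probability drops from 1e-2 to 1e-4, a factor of 100.
assert abs(dual - 0.9999) < 1e-9
```

The independence caveat matters: a shared input fuse, a common firmware bug, or one thermal event can fail both "redundant" paths at once, which is why graceful degradation belongs on the same list.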

Environmental testing.

Test like it'll be used: lab conditions don't match real-world operation. Temperature extremes, vibration, humidity, EMI — if your system will see it, test for it.

Thermal testing.

  • Operating temperature range (-40°C to +85°C typical)
  • Storage temperature (-55°C to +125°C)
  • Thermal cycling (stress joints and connections)
  • Power-on thermal soak

Vibration & shock.

  • Random vibration (simulates transportation)
  • Sinusoidal sweep (resonance detection)
  • Mechanical shock (drop/impact)
  • Ensure connectors, solder joints survive

EMI/EMC testing.

  • Radiated emissions (FCC Part 15)
  • Conducted emissions
  • Susceptibility to external RF
  • ESD tolerance (human body model)

Test automation.

Automate regression tests: Manual testing is slow, error-prone, and doesn't scale. Automated tests run consistently, catch regressions immediately, and free engineers to focus on new functionality.
What to automate:
  • Unit tests (always automated, run on every commit)
  • Integration tests (automated for common interfaces)
  • Regression tests (detect if changes break existing functionality)
  • Performance benchmarks (catch degradation early)
  • Basic system tests (smoke tests after builds)
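As a sketch of the last bullet: a smoke suite is just a list of cheap checks run after every build, with an empty failure list gating the next stage. The checks below are placeholders standing in for real probes (version strings, config parsing, service liveness).

```python
# Smoke-test runner sketch. Each check is a cheap yes/no probe;
# the two below are trivial placeholders for illustration.

def check_version_string() -> bool:
    banner = "fw 1.0 build 42"        # stand-in for a real version query
    return "1.0" in banner

def check_config_parses() -> bool:
    config = {"baud": 115200}         # stand-in for loading a real config
    return isinstance(config.get("baud"), int)

SMOKE_TESTS = [check_version_string, check_config_parses]

def run_smoke_tests() -> list[str]:
    """Return names of failed checks; empty means the build may proceed."""
    return [t.__name__ for t in SMOKE_TESTS if not t()]

assert run_smoke_tests() == []
```

The structure, not the checks, is the point: named probes, a single aggregated verdict, and a result a CI pipeline can act on without a human reading logs.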
What to test manually:
  • Exploratory testing (finding unknown unknowns)
  • Usability and user experience
  • Complex environmental testing
  • Acceptance testing with customer