Verification & Validation

Prove it works. Not "it seems fine"—prove it meets every requirement.

Verification vs Validation

Know the difference: Verification asks "Did we build it right?" (meets specifications). Validation asks "Did we build the right thing?" (solves customer problem). You can verify a perfect implementation of the wrong requirements. You need both.

Verification

Did we build it right?

  • Test against specifications
  • Measure performance vs requirements
  • Check all interfaces work correctly
  • Verify error handling and edge cases
  • Document results objectively

Validation

Did we build the right thing?

  • Customer acceptance testing
  • Does it solve the real problem?
  • Usable in operational environment?
  • Meets unstated but obvious needs?
  • Customer signs off

Testing Strategies

Test early, test often: Waiting until integration to test is how schedules explode. Test components individually (unit tests), test subsystems together (integration tests), test the full system (system tests). Each layer catches different failures.

Levels of Testing

  • Unit Tests: individual functions/modules, tested during development. Typical issues found: logic errors, boundary conditions, math mistakes.
  • Integration Tests: interfaces between components, tested as modules are integrated. Typical issues found: protocol mismatches, timing issues, data corruption.
  • System Tests: complete end-to-end functionality, tested after integration is complete. Typical issues found: performance degradation, resource contention, emergent behavior.
  • Acceptance Tests: requirements compliance, tested before delivery. Typical issues found: missing features, unclear requirements, usability problems.
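
As a concrete illustration of the unit-test level, here is a minimal sketch using Python's standard unittest module. The saturate() function and its limits are hypothetical, chosen only to show boundary-condition checks:

```python
import unittest

def saturate(value, low, high):
    """Clamp value into the inclusive range [low, high]."""
    if value < low:
        return low
    if value > high:
        return high
    return value

class TestSaturate(unittest.TestCase):
    def test_inside_range_passes_through(self):
        self.assertEqual(saturate(5, 0, 10), 5)

    def test_boundaries_pass_unchanged(self):
        # Boundary conditions: values exactly at the limits must not be altered.
        self.assertEqual(saturate(0, 0, 10), 0)
        self.assertEqual(saturate(10, 0, 10), 10)

    def test_out_of_range_is_clamped(self):
        self.assertEqual(saturate(-3, 0, 10), 0)
        self.assertEqual(saturate(42, 0, 10), 10)

if __name__ == "__main__":
    unittest.main()
```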

Requirements Traceability Matrix

Close the loop: Every requirement must have a test. Every test must trace to a requirement. If you can't demonstrate compliance with a test, the requirement is unverifiable (and probably poorly written).

RTM Example

  • SYS-001, "Boot within 30 seconds": verified by TP-SYS-001 (measure boot time). Status: ✓ PASS
  • SYS-002, "Operate -40°C to +85°C": verified by TP-SYS-002 (thermal chamber test). Status: ✓ PASS
  • COM-001, "Ethernet link at 1 Gbps": verified by TP-COM-001 (throughput measurement). Status: ✗ FAIL (measured 850 Mbps)

When tests fail: Don't just retest and hope. Root cause the failure. Is the requirement wrong? Is the design inadequate? Is the test procedure flawed? Fix the actual problem, not the symptom.
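
The RTM cross-check itself is easy to automate. A minimal sketch in Python, assuming the matrix is kept as simple records; the IDs and statuses mirror the example above:

```python
# Minimal RTM coverage check: every requirement needs a test,
# and every test must trace back to a requirement.

requirements = {"SYS-001", "SYS-002", "COM-001"}

# Requirement ID -> (test procedure, status), mirroring the example rows above.
rtm = {
    "SYS-001": ("TP-SYS-001", "PASS"),
    "SYS-002": ("TP-SYS-002", "PASS"),
    "COM-001": ("TP-COM-001", "FAIL"),
}

untested = requirements - rtm.keys()   # requirements with no test
orphaned = rtm.keys() - requirements   # tests with no requirement
failing = [r for r, (_, status) in rtm.items() if status != "PASS"]

print("Untested requirements:", sorted(untested) or "none")
print("Orphaned tests:", sorted(orphaned) or "none")
print("Failing requirements:", failing or "none")
```

Run as part of the build, a check like this flags unverifiable or orphaned entries before a review has to.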

Reliability Engineering

MTBF and Failure Rates

MTBF (Mean Time Between Failures): Average time a system operates before failing. Higher is better. But understand what it actually means—MTBF of 10,000 hours doesn't mean every unit lasts 10,000 hours. It's a statistical average.

MTBF Calculation (for repairable systems; the equivalent metric for non-repairable items is MTTF, Mean Time To Failure):

MTBF = Total Operating Time / Number of Failures

Example Calculation

Test 10 units for 1000 hours each. 2 failures occur.

MTBF = (10 units × 1000 hours) / 2 failures = 5000 hours

Failure rate (λ): 1 / MTBF = 1 / 5000 = 0.0002 failures/hour
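
The same arithmetic in a few lines of Python, using the numbers from the example above:

```python
def mtbf(total_operating_hours, failures):
    """Estimated MTBF: total operating time divided by observed failures."""
    return total_operating_hours / failures

units, hours_each, failures = 10, 1000, 2
estimate = mtbf(units * hours_each, failures)  # 5000.0 hours
failure_rate = 1 / estimate                    # 0.0002 failures/hour

print(f"MTBF = {estimate:.0f} h, lambda = {failure_rate:.4f} failures/h")
```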

Design for Reliability

Derating

Don't run components at max ratings. Stress causes early failure.

  • Power: Use components at 50-70% max power
  • Voltage: Stay under 80% of max voltage
  • Temperature: Keep well below max junction temp
  • Current: Keep currents well below wire and PCB trace ampacity limits
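
A derating review can be scripted during design. A minimal sketch, with allowed fractions taken from the guidelines above; the component names and stress values are hypothetical:

```python
# Flag components operating above their derating guideline.
# Each entry: (name, applied stress, maximum rating, allowed fraction of rating).
components = [
    ("R12 power",    0.40,   0.50, 0.70),  # 0.40 W applied on a 0.50 W resistor, 70% guideline
    ("C3 voltage",  12.00,  16.00, 0.80),  # 12 V applied on a 16 V capacitor, 80% guideline
    ("Q1 junction", 95.00, 150.00, 0.75),  # 95 degC junction vs 150 degC max, 75% guideline
]

for name, applied, rating, allowed in components:
    ratio = applied / rating
    status = "OK" if ratio <= allowed else "OVERSTRESSED"
    print(f"{name}: {ratio:.0%} of rating (limit {allowed:.0%}) -> {status}")
```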

Redundancy

Critical functions need backup paths.

  • Dual power supplies (N+1 redundancy)
  • Redundant communication links
  • Watchdog timers for processors
  • RAID for storage systems
  • Graceful degradation (fail soft, not hard)
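
One of these mechanisms, the watchdog timer, is easy to sketch in software. A minimal Python illustration with a supervisory thread (the timeout and fallback action are placeholders): the main loop kicks the watchdog each cycle, and if it stops, a fallback runs instead of the system hanging silently:

```python
import threading
import time

class Watchdog:
    """Software watchdog: if kick() is not called within `timeout` seconds,
    the supplied fallback action runs (fail soft, not hard)."""

    def __init__(self, timeout, on_expire):
        self.timeout = timeout
        self.on_expire = on_expire
        self._last_kick = time.monotonic()
        threading.Thread(target=self._monitor, daemon=True).start()

    def kick(self):
        self._last_kick = time.monotonic()

    def _monitor(self):
        while True:
            time.sleep(self.timeout / 4)
            if time.monotonic() - self._last_kick > self.timeout:
                self.on_expire()
                self._last_kick = time.monotonic()  # avoid immediately re-firing

def enter_safe_mode():
    print("Watchdog expired: switching to degraded/safe mode")

wd = Watchdog(timeout=2.0, on_expire=enter_safe_mode)
for _ in range(5):
    wd.kick()      # healthy main loop kicks the watchdog each cycle
    time.sleep(0.5)
time.sleep(3)      # simulate a hung main loop; the fallback fires
```

In embedded hardware this role is usually played by a dedicated watchdog peripheral that resets the processor when it is not serviced; the software version above illustrates the same contract.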

Environmental Testing

Test like it'll be used: Lab conditions don't match real-world operation. Temperature extremes, vibration, humidity, EMI—if your system will see it, test for it.

Thermal Testing

  • Operating temperature range (-40°C to +85°C typical)
  • Storage temperature (-55°C to +125°C)
  • Thermal cycling (stress joints and connections)
  • Power-on thermal soak

Vibration & Shock

  • Random vibration (simulates transportation)
  • Sinusoidal sweep (resonance detection)
  • Mechanical shock (drop/impact)
  • Ensure connectors and solder joints survive

EMI/EMC Testing

  • Radiated emissions (FCC Part 15)
  • Conducted emissions
  • Susceptibility to external RF
  • ESD tolerance (human body model)

Test Automation

Automate regression tests: Manual testing is slow, error-prone, and doesn't scale. Automated tests run consistently, catch regressions immediately, and free engineers to focus on new functionality.

What to automate:
  • Unit tests (always automated, run on every commit)
  • Integration tests (automated for common interfaces)
  • Regression tests (detect if changes break existing functionality; a minimal sketch appears at the end of this section)
  • Performance benchmarks (catch degradation early)
  • Basic system tests (smoke tests after builds)

What to test manually:
  • Exploratory testing (finding unknown unknowns)
  • Usability and user experience
  • Complex environmental testing
  • Acceptance testing with customer
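
To make the regression-test guidance concrete, here is a minimal sketch written for pytest. The frame_checksum() function, the expected values, and the timing budget are illustrative; the point is that known-good answers pin current behavior and a generous time budget catches gross slowdowns:

```python
import time
import zlib

def frame_checksum(payload: bytes) -> int:
    """Function under regression test: CRC-32 of a message payload."""
    return zlib.crc32(payload) & 0xFFFFFFFF

def test_checksum_known_vectors():
    # Known-good answers pin current behavior; a change that breaks them
    # is a regression, not a refactor.
    assert frame_checksum(b"") == 0
    assert frame_checksum(b"123456789") == 0xCBF43926  # standard CRC-32 check value

def test_checksum_performance_budget():
    payload = bytes(range(256)) * 4096  # ~1 MiB of data
    start = time.perf_counter()
    frame_checksum(payload)
    elapsed = time.perf_counter() - start
    # Generous budget; the goal is to catch order-of-magnitude slowdowns.
    assert elapsed < 0.1
```

Wired into continuous integration, tests like these run on every commit and flag a regression before it reaches integration testing.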