Verification vs Validation
Know the difference: Verification asks "Did we build it right?" (meets specifications). Validation asks "Did we build the right thing?" (solves customer problem). You can verify a perfect implementation of the wrong requirements. You need both.
Verification
Did we build it right?
- Test against specifications
- Measure performance vs requirements
- Check all interfaces work correctly
- Verify error handling and edge cases
- Document results objectively
Validation
Did we build the right thing?
- Customer acceptance testing
- Does it solve the real problem?
- Usable in operational environment?
- Meets unstated but obvious needs?
- Customer signs off
Testing Strategies
Test early, test often: Waiting until integration to test is how schedules explode. Test components individually (unit tests), test subsystems together (integration tests), test the full system (system tests). Each layer catches different failures.
Levels of Testing
| Test Level | What You're Testing | When to Do It | Typical Issues Found |
|---|---|---|---|
| Unit Tests | Individual functions/modules | During development | Logic errors, boundary conditions, math mistakes |
| Integration Tests | Interfaces between components | As modules are integrated | Protocol mismatches, timing issues, data corruption |
| System Tests | Complete end-to-end functionality | After integration complete | Performance degradation, resource contention, emergent behavior |
| Acceptance Tests | Requirements compliance | Before delivery | Missing features, unclear requirements, usability problems |
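As a concrete illustration of the unit-test level, here is a minimal pytest-style sketch. The `clamp()` helper and its limits are invented for the example, not taken from any real system; the point is that logic errors and boundary conditions get caught while the code is still cheap to fix.

```python
# test_clamp.py -- minimal unit-test sketch (pytest); clamp() is a hypothetical helper
import pytest

def clamp(value, low, high):
    """Constrain value to the inclusive range [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))

def test_value_inside_range_is_unchanged():
    assert clamp(5, 0, 10) == 5

def test_boundary_values_are_preserved():
    # Boundary conditions are a classic source of unit-level bugs
    assert clamp(0, 0, 10) == 0
    assert clamp(10, 0, 10) == 10

def test_out_of_range_values_are_clamped():
    assert clamp(-3, 0, 10) == 0
    assert clamp(42, 0, 10) == 10

def test_invalid_range_is_rejected():
    with pytest.raises(ValueError):
        clamp(1, 10, 0)
```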
Requirements Traceability Matrix
Close the loop: Every requirement must have a test. Every test must trace to a requirement. If you can't demonstrate compliance with a test, the requirement is unverifiable (and probably poorly written).
RTM Example
| Requirement ID | Requirement | Test Procedure | Status |
|---|---|---|---|
| SYS-001 | Boot within 30 seconds | TP-SYS-001: Measure boot time | ✓ PASS |
| SYS-002 | Operate -40°C to +85°C | TP-SYS-002: Thermal chamber test | ✓ PASS |
| COM-001 | Ethernet link at 1 Gbps | TP-COM-001: Throughput measurement | ✗ FAIL (850 Mbps) |
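Closing the loop is easy to script. The sketch below assumes the RTM is kept as a simple list of (requirement, test) records, invented here for illustration; it flags requirements with no test and test traces that point at no requirement.

```python
# rtm_check.py -- sketch of a traceability check over a hypothetical RTM structure
requirements = {"SYS-001", "SYS-002", "COM-001", "COM-002"}   # IDs from the requirements spec
rtm_rows = [
    ("SYS-001", "TP-SYS-001"),
    ("SYS-002", "TP-SYS-002"),
    ("COM-001", "TP-COM-001"),
    ("COM-999", "TP-COM-099"),   # traces to an unknown requirement (deliberate example)
]

covered = {req for req, _ in rtm_rows}

untested = requirements - covered   # requirements with no test -> unverifiable
orphaned = covered - requirements   # tests tracing to nothing -> stale or mislabeled

for req in sorted(untested):
    print(f"NO TEST: {req}")
for req in sorted(orphaned):
    print(f"ORPHAN TEST TRACE: {req}")
```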
When tests fail: Don't just retest and hope. Root cause the failure. Is the requirement wrong? Is the design inadequate? Is the test procedure flawed? Fix the actual problem, not the symptom.
Reliability Engineering
MTBF and Failure Rates
MTBF (Mean Time Between Failures): Average time a system operates before failing. Higher is better. But understand what it actually means—MTBF of 10,000 hours doesn't mean every unit lasts 10,000 hours. It's a statistical average.
MTBF Calculation (for repairable systems; the equivalent figure for non-repairable items is MTTF, Mean Time To Failure):
MTBF = Total Operating Time / Number of Failures
Example Calculation
Test 10 units for 1000 hours each. 2 failures occur.
MTBF = (10 units × 1000 hours) / 2 failures = 5000 hours
Failure rate (λ): 1 / MTBF = 1 / 5000 = 0.0002 failures/hour
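The same arithmetic in a short Python sketch, extended with the constant-failure-rate reliability formula R(t) = e^(−λt). The exponential model is an assumption added here for illustration, and the mission time is made up; the MTBF numbers mirror the example above.

```python
import math

# Numbers from the example above: 10 units on test for 1000 hours each, 2 failures
total_operating_hours = 10 * 1000
failures = 2

mtbf = total_operating_hours / failures   # 5000 hours
failure_rate = 1 / mtbf                   # lambda = 0.0002 failures/hour

# Assuming a constant failure rate (exponential model), reliability over a mission time t:
mission_hours = 1000
reliability = math.exp(-failure_rate * mission_hours)   # ~0.82, i.e. ~18% chance of a failure in 1000 h

print(f"MTBF = {mtbf:.0f} h, lambda = {failure_rate:.4f} /h, R({mission_hours} h) = {reliability:.2f}")
```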
Design for Reliability
Derating
Don't run components at max ratings. Stress causes early failure.
- Power: Use components at 50-70% max power
- Voltage: Stay under 80% of max voltage
- Temperature: Keep well below max junction temp
- Current: Don't push wire/trace limits
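A derating check is just the ratio of applied stress to rated maximum, compared against a limit. A minimal sketch, using the guideline percentages from the list above rather than any specific derating standard; the resistor example is hypothetical.

```python
# Hypothetical derating check: stress ratio = applied / rated, compared to a guideline limit
DERATING_LIMITS = {
    "power": 0.70,     # use parts at no more than 70% of rated power
    "voltage": 0.80,   # stay under 80% of rated voltage
}

def check_derating(kind, applied, rated):
    """Return (stress ratio, within-limit flag) for one component parameter."""
    ratio = applied / rated
    return ratio, ratio <= DERATING_LIMITS[kind]

# Example: a 0.25 W resistor dissipating 0.2 W sits at 80% of its rating -> over the 70% guideline
ratio, ok = check_derating("power", applied=0.2, rated=0.25)
print(f"power stress ratio = {ratio:.0%}, within derating limit: {ok}")
```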
Redundancy
Critical functions need backup paths.
- Dual power supplies (N+1 redundancy)
- Redundant communication links
- Watchdog timers for processors
- RAID for storage systems
- Graceful degradation (fail soft, not hard)
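The payoff of a redundant path can be shown with the standard series/parallel reliability formulas. The sketch below assumes independent failures and uses a made-up per-supply reliability; it is an illustration, not a claim about any particular hardware.

```python
# Independent-failure approximation: series vs. dual-redundant (parallel) reliability
def series(*r):
    """System works only if every element works."""
    result = 1.0
    for x in r:
        result *= x
    return result

def parallel(*r):
    """System works if at least one element works (N+1 style redundancy)."""
    all_fail = 1.0
    for x in r:
        all_fail *= (1 - x)
    return 1 - all_fail

r_supply = 0.95   # hypothetical reliability of one power supply over the mission
print(f"single supply:  {r_supply:.4f}")
print(f"dual supplies:  {parallel(r_supply, r_supply):.4f}")   # 0.9975 -- redundancy helps
print(f"two in series:  {series(r_supply, r_supply):.4f}")     # 0.9025 -- chains hurt
```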
Environmental Testing
Test like it'll be used: Lab conditions don't match real-world operation. Temperature extremes, vibration, humidity, EMI—if your system will see it, test for it.
Thermal Testing
- Operating temperature range (-40°C to +85°C typical)
- Storage temperature (-55°C to +125°C)
- Thermal cycling (stress joints and connections)
- Power-on thermal soak
Vibration & Shock
- Random vibration (simulates transportation)
- Sinusoidal sweep (resonance detection)
- Mechanical shock (drop/impact)
- Ensure connectors, solder joints survive
EMI/EMC Testing
- Radiated emissions (FCC Part 15)
- Conducted emissions
- Susceptibility to external RF
- ESD tolerance (human body model)
Test Automation
Automate regression tests: Manual testing is slow, error-prone, and doesn't scale. Automated tests run consistently, catch regressions immediately, and free engineers to focus on new functionality.
What to automate:
- Unit tests (always automated, run on every commit)
- Integration tests (automated for common interfaces)
- Regression tests (detect if changes break existing functionality)
- Performance benchmarks (catch degradation early)
- Basic system tests (smoke tests after builds)
What to keep manual:
- Exploratory testing (finding unknown unknowns)
- Usability and user experience
- Complex environmental testing
- Acceptance testing with customer
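As one example of an automated regression guard, the sketch below compares a measured runtime against a stored baseline and fails the build when it degrades past a tolerance. The baseline value, tolerance, and workload are hypothetical; a real project would load the baseline from a tracked file.

```python
# perf_regression.py -- sketch of an automated performance-regression check (pytest)
import time

BASELINE_SECONDS = 0.50      # last known-good runtime for the operation under test (hypothetical)
ALLOWED_REGRESSION = 0.20    # fail if the run is more than 20% slower than baseline

def operation_under_test():
    """Stand-in for the real workload being benchmarked."""
    time.sleep(0.1)

def test_operation_has_not_regressed():
    start = time.perf_counter()
    operation_under_test()
    elapsed = time.perf_counter() - start
    assert elapsed <= BASELINE_SECONDS * (1 + ALLOWED_REGRESSION), (
        f"Performance regression: {elapsed:.2f}s vs baseline {BASELINE_SECONDS:.2f}s"
    )
```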