Real Bugs We Found
These are actual bugs discovered during production validation of 13 customer extensions (81+ alerts deployed):
| Extension | Bug | Impact | Root Cause |
|---|---|---|---|
| APIC v0.0.6 | 3 Python typos | 3 metrics always zero | identInt16 → identInst16, wrong API field names |
| Nexus v0.0.1 | SNMPv2c auth broken | 50 SNMP metrics missing | v2 fell through to SNMPv3 handler |
| ACI v0.0.2 | Cross-table OID mixing | Wrong data matched to rows | ipAddrTable OID in ifTable subgroup (DED018) |
| MSSQL v2.10.6 | ROUND() missing precision | SQL query error | ROUND(..., ) → ROUND(..., 2) |
| Catalyst v1.0.0 | Uptime threshold 10x off | False alerts every reboot | sysUpTime in centiseconds: 360000 → 3600000 |
| MikroTik v0.0.3 | 8 wrong metric key refs | Screens show empty tiles | Metric keys in screens didn't match extension.yaml |
| Checkpoint v0.0.1 | 8 metric gaps | Missing monitoring coverage | OIDs exist in MIB but not in extension.yaml |
| F5 BIG-IP v2.16 | 7 CLI/API gaps | Metrics not in SNMP extension | Some F5 metrics only available via iControl REST |
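The Catalyst row comes down to unit arithmetic: sysUpTime is an SNMP TimeTicks value, counted in hundredths of a second, so 360000 ticks is 1 hour and 3600000 ticks is 10 hours. A minimal Python sketch of deriving the threshold from a readable duration instead of hard-coding the number; the helper names are hypothetical and not part of any extension:

```python
# SNMP TimeTicks (the type of sysUpTime) count hundredths of a second.
TICKS_PER_SECOND = 100


def seconds_to_timeticks(seconds: float) -> int:
    """Convert a duration in seconds to SNMP TimeTicks (centiseconds)."""
    return round(seconds * TICKS_PER_SECOND)


def timeticks_to_seconds(ticks: int) -> float:
    """Convert SNMP TimeTicks (centiseconds) back to seconds."""
    return ticks / TICKS_PER_SECOND


# The two values from the Catalyst row, expressed as durations.
one_hour_ticks = seconds_to_timeticks(60 * 60)
ten_hours_ticks = seconds_to_timeticks(10 * 60 * 60)

print(one_hour_ticks, ten_hours_ticks)
```

Writing the threshold as a duration plus a conversion makes a factor-of-ten slip visible in review, which a bare six- or seven-digit literal does not.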
Validation Scorecard
| Extension | Metrics | Validated | Alerts | Status |
|---|---|---|---|---|
| APIC | 8/8 | VALID | 12 | 3 bugs FIXED |
| ASR | 15/15 | code only | 19 | 4 metrics NO DATA |
| ACI | 9/9 | VALID | 7 | Fault isolation fix |
| Catalyst | 25/25 | VALID | 12 | Reference implementation |
| FortiSwitch | 19/19 | VALID | 19 | 3 deploy methods documented |
| F5 BIG-IP | 16/23 | partial | 10 | 7 gaps (CLI/API only) |
| MSSQL | 3/3 | VALID | 2 | 4 metrics missing from screens |
| MikroTik | 4/7 | partial | TBD | 8 wrong metric key refs FIXED |
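Wrong metric key references, as in the MikroTik case, can be caught mechanically by comparing the keys that screens reference against the keys the extension defines. A rough Python sketch, assuming both lists have already been extracted from extension.yaml; the function name and the metric keys are hypothetical, for illustration only:

```python
def find_unknown_metric_keys(defined_keys, referenced_keys):
    """Return referenced keys that are not defined, sorted for stable output."""
    return sorted(set(referenced_keys) - set(defined_keys))


# Hypothetical keys: the second screen reference contains a typo.
defined = ["com.example.device.cpu.usage", "com.example.device.memory.used"]
referenced = ["com.example.device.cpu.usage", "com.example.device.mem.used"]

unknown = find_unknown_metric_keys(defined, referenced)
print(unknown)  # ['com.example.device.mem.used']
```

An empty result means every screen tile points at a metric the extension actually defines.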
Production Patterns
Fault Isolation
⚠️ ACI v0.0.5 fix: A device's interface table hung for 180 seconds, blocking ALL metrics. Fix: move interfaces into a separate group: so that CPU and memory keep polling independently.
# Bad: everything in one group
snmp:
  - group: Device Default
    subgroups:
      - subgroup: CPU          # blocked when interfaces hang
      - subgroup: Interfaces   # hangs for 180s

# Good: separate groups
snmp:
  - group: Device Default
    subgroups:
      - subgroup: CPU          # keeps polling
  - group: Interfaces          # hangs independently
    subgroups:
      - subgroup: Interfaces
Feature Set Granularity
Customers monitoring 1000+ interfaces need to toggle feature sets. Don't put everything in one group.
Duplicate Alert Cleanup
Before deploying new alerts, check for pre-existing ones. ASR had 28 alerts, including duplicates left over from previous deployments; we cleaned these down to 19.
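One way to spot duplicates before a deployment is to group the existing alert definitions by the fields that make them equivalent. A minimal Python sketch, assuming the alerts have already been exported as a list of dicts; the field names (id, name, metric_key) and the sample data are hypothetical, and fetching the alerts from the environment is out of scope here:

```python
from collections import defaultdict


def group_duplicates(alerts, key_fields=("name", "metric_key")):
    """Group alert dicts that share the same values for key_fields."""
    groups = defaultdict(list)
    for alert in alerts:
        groups[tuple(alert[field] for field in key_fields)].append(alert)
    # Keep only the groups that contain more than one alert.
    return {key: members for key, members in groups.items() if len(members) > 1}


# Hypothetical export: a1 and a2 describe the same alert.
alerts = [
    {"id": "a1", "name": "High CPU", "metric_key": "com.example.cpu"},
    {"id": "a2", "name": "High CPU", "metric_key": "com.example.cpu"},
    {"id": "a3", "name": "Link down", "metric_key": "com.example.if.status"},
]

duplicates = group_duplicates(alerts)
for key, members in duplicates.items():
    print(key, [alert["id"] for alert in members])
```

Which fields count as "the same alert" is a judgment call; name plus metric key is only a starting point.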
Lessons Learned
| Lesson | Why |
|---|---|
| Always use 64-bit counters | A 32-bit octet counter wraps about every 3.4 s on a saturated 10 Gbps link |
| Test with real devices | Simulators don't reproduce timeout/hang bugs |
| Check ActiveGate (AG) logs for silent failures | Extensions fail silently; no UI error is shown |
| func: metrics don't work in DQL | Only in screens/dashboards (Metrics API v2) |
| Sprint needs custom: prefix | Non-Dynatrace extensions rejected without it |
| CA cert in BOTH locations | Credential Vault (server) + AG filesystem (runtime) |
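The 64-bit counter lesson follows from simple arithmetic: an octet counter holds at most 2^bits bytes, and a saturated link fills it at the line rate. A short Python sketch of that calculation, assuming full line-rate traffic; the function name is made up for illustration. In IF-MIB terms this is the difference between ifInOctets (32-bit) and ifHCInOctets (64-bit).

```python
def wrap_seconds(counter_bits: int, link_bps: float) -> float:
    """Seconds until an octet (byte) counter wraps at full line rate."""
    max_octets = 2 ** counter_bits
    return max_octets * 8 / link_bps


TEN_GBPS = 10e9
SECONDS_PER_YEAR = 365 * 24 * 3600

wrap_32 = wrap_seconds(32, TEN_GBPS)
wrap_64 = wrap_seconds(64, TEN_GBPS)

print(f"32-bit counter wraps after {wrap_32:.1f} s")
print(f"64-bit counter wraps after {wrap_64 / SECONDS_PER_YEAR:.0f} years")
```

With a polling interval of a minute or more, a 32-bit counter on such a link can wrap many times between two polls, so the computed rate is meaningless.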
Course Complete!
You now know how to build, validate, and deploy Dynatrace Extensions 2.0. Check out the Apps course to build custom Dynatrace applications.