Modernizing WAFR Automation: From Python Scripts to a TypeScript, Hexagonal, Nx-Powered Platform

Author

Alexian Kauffmann

Date Published

Enterprises working to maintain AWS Partner status must conduct frequent Well-Architected Framework Reviews (WAFR). Manual reviews and ad-hoc scripts often slow progress, create configuration drift, and conceal risk in spreadsheets and ticketing systems. The WAFR Automation Tool streamlines these processes by scanning AWS environments with open-source utilities, generating findings, and mapping them to the AWS Well-Architected Framework—some assisted by AI, others curated manually.

The sections below use the WAFR Automation Tool as a reference implementation to illustrate how large-scale automation systems can evolve from script-based workflows to a structured, TypeScript-driven architecture. The modernization approach described—built on hexagonal design principles and an Nx monorepo—demonstrates how typed contracts, consistent patterns, and stronger boundaries can accelerate delivery, improve reliability, and make ongoing change easier to manage across architecture, repository structure, migration, contracts, testing, CI/CD, observability, and risk management.

The Case for Change: Python-era Pain Points

The original WAFR automation platform, built on Python scripts and ad-hoc services, enabled rapid experimentation early on but became difficult to maintain as usage expanded. Growing complexity, inconsistent conventions, and limited visibility across components slowed delivery and increased operational risk.

  • Coordination tax: Multiple languages and one-off scripts made onboarding slow and integrations brittle.
  • Runtime surprises: Dynamic typing and partial test coverage pushed defects to production, with mismatched payloads often discovered late.
  • Scaling complexity: Expanding dependencies across Lambdas and ECS jobs increased cold starts and blurred boundaries between logic layers.
  • Governance gaps: Limited dependency visibility, drifting schemas, and inconsistent release workflows made change management unpredictable.

These challenges collectively reduced change velocity, increased incident frequency, and drove up total cost of ownership as the platform surface area continued to grow.

Why TypeScript

The shift to TypeScript addresses long-standing reliability, consistency, and scalability issues that surfaced as the automation platform grew. Strong typing, mature tooling, and ecosystem alignment make it easier to build predictable, maintainable, and high-performing systems across all layers of the stack.

  • Typed contracts, earlier feedback: Compile-time checks and static analysis eliminate many “stringly-typed” errors before deployment. Shared type definitions with runtime validation at service boundaries make integrations more reliable.
  • Ecosystem maturity: TypeScript now benefits from robust AWS SDK v3 clients, Powertools for AWS Lambda (TypeScript), OpenTelemetry, Zod, TypeORM, Vitest, Playwright, and Nx, along with mature bundling and cold-start optimization options for serverless environments.
  • Stack alignment: With most frontends, CLIs, and infrastructure tooling now built around TypeScript and Node.js, using a single language across tiers promotes consistent patterns for APIs, events, and automation workflows.

These advantages enable a unified, typed platform that scales cleanly, reduces runtime uncertainty, and supports faster, more confident delivery.

Architectural North Star: Hexagonal Architecture

The modernization adopts a hexagonal, or ports-and-adapters, architecture to isolate business logic from delivery mechanisms and infrastructure details. This structure preserves a pure, testable core while keeping external layers thin, replaceable, and easier to evolve over time.

| Layer | Responsibility | Examples |
| --- | --- | --- |
| Use cases (domain) | Business rules, policies, orchestration decisions | Evaluate findings, plan remediation, score risk |
| Adapters | Translate external calls to use cases | AWS API Gateway/Lambda handlers, Step Functions tasks, CLI |
| Infrastructure | Implement ports behind interfaces | S3 for findings, Aurora for recommendations, SNS/SQS for events, Bedrock for AI assists |

Key principles include maintaining a domain layer free of cloud SDKs and frameworks, ensuring adapters act as mappers rather than decision-makers, and designing infrastructure to be swappable without changing domain logic. Validating data at boundaries ensures the core operates only on well-typed, trusted inputs.
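
The principles above can be illustrated with a minimal sketch. All names here (Finding, FindingsRepository, scoreWorkloadRisk, the severity weights, and the request shape) are hypothetical and are not taken from the WAFR Automation Tool. The port is synchronous for brevity, whereas a real port would likely return a Promise.

```typescript
// Domain types: no cloud SDK or framework imports appear in this layer.
type Severity = "low" | "medium" | "high";

interface Finding {
  id: string;
  pillar: string;
  severity: Severity;
}

// Port: the domain states what it needs, not how it is provided.
interface FindingsRepository {
  listByWorkload(workloadId: string): Finding[];
}

// Use case: a pure business rule (the weights are assumed for illustration).
const SEVERITY_WEIGHT: Record<Severity, number> = { low: 1, medium: 3, high: 9 };

function scoreWorkloadRisk(repo: FindingsRepository, workloadId: string): number {
  return repo
    .listByWorkload(workloadId)
    .reduce((total, finding) => total + SEVERITY_WEIGHT[finding.severity], 0);
}

// Infrastructure: an in-memory implementation of the port. A storage-backed
// class could implement the same interface without touching the use case.
class InMemoryFindingsRepository implements FindingsRepository {
  private readonly byWorkload: Map<string, Finding[]>;

  constructor(byWorkload: Map<string, Finding[]>) {
    this.byWorkload = byWorkload;
  }

  listByWorkload(workloadId: string): Finding[] {
    return this.byWorkload.get(workloadId) ?? [];
  }
}

// Adapter: maps an external request shape to the use case and back.
// It validates and translates, but makes no business decisions.
interface ScoreRequest {
  pathParameters?: { workloadId?: string };
}

interface HttpResponse {
  statusCode: number;
  body: string;
}

function handleScoreRequest(repo: FindingsRepository, request: ScoreRequest): HttpResponse {
  const workloadId = request.pathParameters?.workloadId;
  if (!workloadId) {
    return { statusCode: 400, body: JSON.stringify({ error: "workloadId is required" }) };
  }
  const score = scoreWorkloadRisk(repo, workloadId);
  return { statusCode: 200, body: JSON.stringify({ workloadId, score }) };
}
```

Because scoreWorkloadRisk depends only on the FindingsRepository interface, swapping the in-memory class for a different backend changes the wiring code but not the domain logic.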

This structure aligns closely with the AWS Well-Architected pillars—Operational Excellence through clear separation of responsibilities, Reliability through testable logic, and Cost Optimization through incremental infrastructure replacement.

Visualization of the hexagonal architecture used in TrackIt's WAFR Automation Tool


Repository Strategy: Nx-Powered Monorepo

An Nx-based monorepo provides a unified structure for managing all components of the modernization effort. It centralizes governance, enforces architectural boundaries, and accelerates development through caching and dependency awareness.

  • Project graph and affected logic: Nx automatically determines which projects are impacted by a change, allowing CI pipelines to build and test only what’s necessary. Remote caching further speeds up pull requests and ensures consistent, repeatable builds.
  • Enforced boundaries: Projects are tagged by layer—domain, adapters, infrastructure, UI, or contracts. Dependency rules prevent the domain from importing adapters or infrastructure code, preserving architectural integrity.
  • Generators and consistency: Opinionated templates scaffold new use cases, handlers, repositories, and workflows to prevent drift. Shared libraries are versioned internally, and semantic commits automate changelog generation and release management.
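
As a rough illustration of the boundary rules, the data below mirrors the shape of the depConstraints option accepted by Nx's @nx/enforce-module-boundaries lint rule (sourceTag and onlyDependOnLibsWithTags). The tag names and the isImportAllowed helper are assumptions made for this sketch. Nx performs the real check against the project graph during linting.

```typescript
// Shape mirrors the depConstraints option of Nx's
// @nx/enforce-module-boundaries lint rule. Tag names are hypothetical.
interface DepConstraint {
  sourceTag: string;
  onlyDependOnLibsWithTags: string[];
}

const depConstraints: DepConstraint[] = [
  { sourceTag: "layer:contracts", onlyDependOnLibsWithTags: ["layer:contracts"] },
  { sourceTag: "layer:domain", onlyDependOnLibsWithTags: ["layer:domain", "layer:contracts"] },
  { sourceTag: "layer:adapters", onlyDependOnLibsWithTags: ["layer:domain", "layer:contracts"] },
  {
    sourceTag: "layer:infrastructure",
    onlyDependOnLibsWithTags: ["layer:domain", "layer:contracts"],
  },
];

// Illustrative check only: it shows what the constraints express,
// not how Nx implements the rule.
function isImportAllowed(sourceTag: string, targetTag: string): boolean {
  const rule = depConstraints.find((constraint) => constraint.sourceTag === sourceTag);
  // Sources without a rule are treated as unconstrained in this sketch.
  return rule === undefined || rule.onlyDependOnLibsWithTags.includes(targetTag);
}
```

With these constraints, a domain project importing an infrastructure library would be reported as a lint violation, while adapters and infrastructure remain free to depend on the domain.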

Migration Method: Strangler-Fig, Incremental Replacement

The modernization follows a strangler-fig approach—gradually replacing legacy components with TypeScript-based services while minimizing disruption to ongoing operations. This method allows controlled rollout, continuous validation, and safe decommissioning of older code.

  • High-leverage entry point: Begin with a focused, high-visibility workflow such as PrepareFindings to validate the new architecture early.
  • Dual-run behind feature flags: Route a small fraction of traffic (1–5%) to the new implementation and compare outputs silently using shadow reads. AWS AppConfig enables progressive exposure and provides instant rollback through kill-switches.
  • Anti-corruption layer: A lightweight façade normalizes legacy inputs and outputs to align with the new contracts without embedding business logic.
  • Tight feedback loops: Track latency, error rates, and payload differences, addressing discrepancies before expanding traffic.
  • Decommissioning: Retire legacy endpoints once functional parity is confirmed, removing transitional components to prevent long-term complexity.

Visualization of the strangler-fig pattern, where a new TypeScript-based service gradually grows around the legacy system like a fig enveloping its host tree
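
The dual-run step can be sketched roughly as follows. The function names, the hash-based bucketing, and the JSON-based comparison are assumptions for illustration. In the approach described above, the shadow percentage would come from a feature-flag service such as AWS AppConfig rather than from a function argument.

```typescript
interface DualRunResult<T> {
  served: T; // response returned to the caller (always the legacy output here)
  shadowed: boolean; // whether the new implementation was also invoked
  mismatch: boolean; // true when both ran and their outputs differed
}

// Deterministic bucketing: the same key always maps to the same bucket (0-99).
function bucketOf(key: string): number {
  let hash = 0;
  for (let i = 0; i < key.length; i++) {
    hash = (hash * 31 + key.charCodeAt(i)) >>> 0;
  }
  return hash % 100;
}

function dualRun<I, O>(
  input: I,
  routingKey: string,
  shadowPercent: number, // 0 acts as a kill switch, 100 shadows every request
  legacy: (input: I) => O,
  candidate: (input: I) => O,
): DualRunResult<O> {
  const served = legacy(input);
  if (bucketOf(routingKey) >= shadowPercent) {
    return { served, shadowed: false, mismatch: false };
  }
  try {
    const shadow = candidate(input);
    // Simplistic comparison: JSON.stringify is sensitive to key order.
    const mismatch = JSON.stringify(shadow) !== JSON.stringify(served);
    return { served, shadowed: true, mismatch };
  } catch {
    // A failing candidate never affects the caller; it is recorded as a mismatch.
    return { served, shadowed: true, mismatch: true };
  }
}
```

The caller always receives the legacy output, so mismatches can be counted and investigated before any traffic is actually served by the new implementation.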


Contracts and Schemas: Shared Types Across Services and Clients

Contracts form the foundation of predictable integrations and long-term maintainability. Treating them as products—versioned, validated, and well-documented—creates a consistent interface between services and clients while reducing runtime uncertainty.

  • Single source of truth: A shared contracts package defines type information, JSON Schemas, OpenAPI specifications, and standardized error taxonomies.
  • Runtime validation: All external inputs such as API requests, event payloads, and key downstream responses are validated. Discriminated unions are used to represent workflow states and prevent invalid combinations.
  • Versioning and evolution: Contracts follow semantic versioning. During transitions, adapters handle mixed versions, while consumer-driven contract tests ensure backward compatibility for downstream systems.
  • Documentation and discoverability: Human-readable documentation generated from schemas is published in the internal developer portal, including examples for common and edge cases along with structured error codes and remediation guidance.
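
A minimal sketch of a discriminated union with boundary validation is shown below. The ReviewState type and its fields are hypothetical, and the hand-written parser stands in for a schema library such as Zod so that the example has no dependencies.

```typescript
// Discriminated union: each state carries only the fields that are valid for it.
type ReviewState =
  | { status: "pending" }
  | { status: "scanning"; startedAt: string }
  | { status: "completed"; findingCount: number }
  | { status: "failed"; errorCode: string };

type ParseResult<T> = { ok: true; value: T } | { ok: false; error: string };

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null && !Array.isArray(value);
}

// Boundary validation: untrusted input in, typed value or structured error out.
function parseReviewState(input: unknown): ParseResult<ReviewState> {
  if (!isRecord(input)) {
    return { ok: false, error: "expected an object" };
  }
  const { status, startedAt, findingCount, errorCode } = input;
  switch (status) {
    case "pending":
      return { ok: true, value: { status: "pending" } };
    case "scanning":
      if (typeof startedAt !== "string") {
        return { ok: false, error: "scanning requires a string startedAt" };
      }
      return { ok: true, value: { status: "scanning", startedAt } };
    case "completed":
      if (typeof findingCount !== "number" || !Number.isInteger(findingCount) || findingCount < 0) {
        return { ok: false, error: "completed requires a non-negative integer findingCount" };
      }
      return { ok: true, value: { status: "completed", findingCount } };
    case "failed":
      if (typeof errorCode !== "string") {
        return { ok: false, error: "failed requires a string errorCode" };
      }
      return { ok: true, value: { status: "failed", errorCode } };
    default:
      return { ok: false, error: "unknown status" };
  }
}

// Exhaustive handling: adding a new state makes this function fail to compile
// until the new case is handled.
function summarize(state: ReviewState): string {
  switch (state.status) {
    case "pending":
      return "waiting to start";
    case "scanning":
      return `scanning since ${state.startedAt}`;
    case "completed":
      return `completed with ${state.findingCount} findings`;
    case "failed":
      return `failed with ${state.errorCode}`;
    default: {
      const unreachable: never = state;
      return unreachable;
    }
  }
}
```

Invalid combinations, such as a completed review without a finding count, are rejected at the boundary and cannot be constructed in typed code, so the core never has to check for them.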

Testing Strategy: Unit-First, Contract-Focused, Targeted E2E

A pragmatic testing approach balances speed with confidence, ensuring that each layer validates what matters most without slowing delivery.

  • Unit-first on use cases: Pure, deterministic tests cover core business rules comprehensively, providing instant feedback on logic changes.
  • Contract tests on adapters: Mapping accuracy, schema conformance, and error translation are verified at integration boundaries. Snapshot testing is reserved for stable, intentional presentation details to avoid brittle dependencies.
  • Targeted end-to-end: A focused suite of golden-path scenarios runs against ephemeral stacks or sandboxes. Test data builders and seeded fixtures streamline reproducibility and reduce manual setup.
  • Non-functional tests: Load tests measure p95/p99 latency and cold-start behavior, while resilience tests validate timeouts and downstream recovery. Security coverage includes IAM least-privilege verification, dependency scans, and secret exposure checks.

Unit testing lifecycle diagram
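
The unit-first idea with test data builders might look like the sketch below. The use case, the builder, and the assertion helper are all hypothetical. A real suite would more likely use a test runner such as Vitest, which is replaced here by a small helper so the example runs on its own.

```typescript
interface Finding {
  id: string;
  pillar: string;
  severity: "low" | "medium" | "high";
  resolved: boolean;
}

// Pure use case (assumed rule): count unresolved high-severity findings per pillar.
function countHighRiskByPillar(findings: Finding[]): Record<string, number> {
  const counts: Record<string, number> = {};
  for (const finding of findings) {
    if (finding.severity === "high" && !finding.resolved) {
      counts[finding.pillar] = (counts[finding.pillar] ?? 0) + 1;
    }
  }
  return counts;
}

// Test data builder: sensible defaults, with overrides for what a test cares about.
function buildFinding(overrides: Partial<Finding> = {}): Finding {
  return { id: "finding-1", pillar: "security", severity: "low", resolved: false, ...overrides };
}

// Minimal assertion helper so the sketch runs without a test framework.
function expectEqual<T>(actual: T, expected: T, label: string): void {
  const actualJson = JSON.stringify(actual);
  const expectedJson = JSON.stringify(expected);
  if (actualJson !== expectedJson) {
    throw new Error(`${label}: expected ${expectedJson}, got ${actualJson}`);
  }
}

function runUseCaseTests(): void {
  expectEqual(countHighRiskByPillar([]), {}, "empty input");
  expectEqual(
    countHighRiskByPillar([
      buildFinding({ severity: "high" }),
      buildFinding({ id: "finding-2", severity: "high", pillar: "reliability" }),
      buildFinding({ id: "finding-3", severity: "high", resolved: true }),
      buildFinding({ id: "finding-4", severity: "medium" }),
    ]),
    { security: 1, reliability: 1 },
    "counts only unresolved high-severity findings",
  );
}

runUseCaseTests();
```

Since the use case takes plain data and returns plain data, the tests need no mocks, network access, or cloud credentials, which keeps feedback fast.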

CI/CD and Developer Experience

Continuous integration and delivery pipelines are designed to optimize iteration speed while preserving reliability, traceability, and security across environments.

  • Affected-based builds and caching: Nx executes only affected projects while reusing prior results from the remote cache. This shortens CI duration and ensures reproducibility across environments.
  • Generators and guardrails: Opinionated templates enforce structure—folders, naming, linting, and scripts. Pre-commit hooks surface formatting and lint violations before code reaches CI.
  • Branching and releases: Trunk-based workflows with short-lived feature branches keep integration smooth. Semantic commits automate versioning and changelog generation for shared packages.
  • Environment promotion: Multi-account AWS environments (dev → staging → prod) follow controlled deployment gates with manual approval for production. Artifacts are signed and promoted rather than rebuilt, preserving integrity and traceability.

Observability and Operations

Operational excellence depends on visibility across every layer—connecting technical telemetry with business outcomes to guide continuous improvement.

  • Logging: Structured logs include correlation and trace IDs, workload identifiers, and key domain fields. Redaction policies protect PII and secrets, while standardized log levels and contexts enable reliable filtering and analysis.
  • Tracing: Distributed tracing spans API Gateway, Lambda, Step Functions, and storage calls. Key business steps, such as “associate finding to pillar,” are instrumented as spans to tie performance back to user and business impact.
  • Metrics: Business KPIs track high-risk findings per workload, remediation lead time, and time-to-report. Technical SLOs monitor error rate, p95 latency, cold-start ratio, throttling, and retry or DLQ frequency.
  • Alarming and runbooks: Budget-based SLO alerts trigger documented runbooks with clear ownership and escalation paths. Dashboards are segmented by audience—executive dashboards highlight outcomes, engineering dashboards surface root causes, and cost dashboards track telemetry overhead.
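
The logging practices above can be sketched as a small formatter that attaches correlation context and redacts sensitive fields. The field names and the list of redacted keys are assumptions for this example. A production setup would more likely rely on an established logger, such as the one in Powertools for AWS Lambda, than on a hand-rolled formatter.

```typescript
type LogLevel = "debug" | "info" | "warn" | "error";

interface LogContext {
  correlationId: string;
  workloadId?: string;
}

// Keys whose values are never written to logs (hypothetical list).
const REDACTED_KEYS = new Set(["password", "secret", "token", "email"]);

// Recursively replaces the values of sensitive keys in objects and arrays.
function redact(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => redact(item));
  }
  if (typeof value === "object" && value !== null) {
    const output: Record<string, unknown> = {};
    for (const [key, inner] of Object.entries(value)) {
      output[key] = REDACTED_KEYS.has(key.toLowerCase()) ? "[REDACTED]" : redact(inner);
    }
    return output;
  }
  return value;
}

// Produces one JSON log line with a stable shape for filtering and analysis.
function formatLogLine(
  level: LogLevel,
  message: string,
  context: LogContext,
  fields: Record<string, unknown> = {},
  now: Date = new Date(),
): string {
  return JSON.stringify({
    timestamp: now.toISOString(),
    level,
    message,
    ...context,
    fields: redact(fields),
  });
}
```

Keeping the correlation ID at the top level of every line makes it straightforward to join log entries with traces for the same request.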

Risk Management

A deliberate risk management strategy ensures that each migration step can be observed, rolled back, or corrected without large-scale impact. The goal is to validate behavior early, preserve data integrity, and maintain trust in production systems.

  • Performance parity checks: Latency and payload sizes are compared throughout dual-run phases, with alerts configured for significant deviations. This continuous verification provides evidence of operational parity before full cutover.
  • Canary rollouts and rollback: Weighted routing allows progressive traffic shifts. Automatic rollback triggers on predefined error or latency thresholds prevent broad impact from unnoticed regressions.
  • Data safety: Idempotency keys and deduplication logic make mutations safe to retry, so a repeated or redelivered request takes effect only once. Stateful systems adopt blue/green patterns, tested backup and restore procedures, and dry-run migrations to ensure consistency.
  • Security posture: Least-privilege IAM roles, short-lived credentials, and KMS-encrypted secrets protect sensitive assets. Dependency policies and SBOM tracking maintain visibility into potential supply-chain vulnerabilities.
  • Planned decommissioning: Each legacy endpoint follows a controlled retirement process—traffic drained to zero, alarms disabled, code removed, and documentation updated—to eliminate residual technical debt.
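
The idempotency-key idea from the data safety point can be illustrated with a minimal in-memory sketch. The class name and key format are hypothetical. A real system would need a durable store with an atomic conditional write instead of a Map, which is assumed away here.

```typescript
// Remembers the result of each mutation by idempotency key so that a retried
// or redelivered request returns the stored result instead of running again.
class IdempotentExecutor<R> {
  private readonly completed = new Map<string, R>();

  run(idempotencyKey: string, mutation: () => R): R {
    if (this.completed.has(idempotencyKey)) {
      return this.completed.get(idempotencyKey) as R;
    }
    // If the mutation throws, nothing is recorded, so a later retry runs it again.
    const result = mutation();
    this.completed.set(idempotencyKey, result);
    return result;
  }
}
```

A caller would derive the key from the request itself, for example a client-supplied request ID, so that every retry of the same logical operation presents the same key.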

Outcomes and Impact

Modernizing the WAFR Automation platform with TypeScript, hexagonal architecture, and an Nx monorepo led to measurable improvements in delivery speed, reliability, and operational clarity.

  • Faster delivery: Typed contracts, generators, and Nx caching shortened iteration cycles from weeks to days, accelerating feature rollout without compromising quality.
  • Fewer runtime errors: Compile-time type safety and runtime validation eliminated entire categories of production defects, improving stability and developer confidence.
  • Clear extensibility: New scanners, lenses, or storage backends integrate seamlessly through new ports or adapters, allowing expansion without modifying core business logic.
  • Better governance: Defined ownership boundaries and dependency graphs enhance visibility, enabling controlled evolution and safer change management.

Conclusion

Adopting a TypeScript-based, hexagonal, Nx-powered architecture transforms WAFR automation into a long-lived, adaptable platform—simpler to test, observe, and evolve. The WAFR Automation Tool delivers the core assessment and mapping foundation, while this modernization blueprint ensures that future iterations remain fast, reliable, and maintainable. Success depends on enforcing clear boundaries early, focusing on one high-impact use case at a time, and keeping transitional bridges minimal so they can be retired cleanly.

Explore the benefits of using WAFR Automation Tool on AWS Marketplace:

Unit-Based Pricing – Source | Monthly Subscription – Source