AWS Infrastructure Management
Overview
My role: Product concept to deployed system.
Challenge: Efficiently manage the infrastructure and cloud application for an SME client, including deployment processes, backup/restore, and monitoring, from Windows and macOS workstations.
Solution: Multi-platform desktop application integrating infrastructure-as-code, status monitoring, and log analytics.
Impact: Enabled safe day-to-day operations (deploy/backup/restore/log review) without requiring AWS console access or broad administrative permissions.
Technologies: Avalonia, C#/.NET, AWS SDK
Infrastructure as Code
In infrastructure as code, the key question is how much of the system is declarative configuration and how much is behavioural automation. For this client, a suite of Python automation already existed to manage the cloud infrastructure: it did exactly what we needed, but it was command-line driven and required deep operational knowledge to use safely.
Transitioning from Python to C# while still using the AWS SDK allowed us to keep the proven automation model while overlaying a modern Avalonia UI for safer day-to-day operations.
Standard Processes
Running and monitoring deployment processes was also handled by our Python suite of tools. With the new UI, deployment and recovery workflows are easier to initiate and monitor. This reduces the risk of operator error (wrong command-line calls or parameters) and makes day-to-day operations accessible to non-specialists who previously had to delegate these tasks to technical team members.
Log Analytics
Logs are essential, and they are large. Various analytics tools are available, and the AWS Console offers CloudWatch log review capabilities. However, an off-the-shelf tool often needs significant tailoring, and a basic UI such as the AWS Console still leaves substantial manual effort to interpret logs consistently.
The solution enabled local analytics with semantic parsing of log formats and supporting visualisations (e.g., histogram charts) to understand when key events occur.
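As a minimal sketch of what semantic parsing could look like, assume a hypothetical log format of `<ISO timestamp>Z <LEVEL> <component>: <message>`. The client's actual formats are not shown here, and the names `LOG_PATTERN`, `LogEvent`, and `parse_line` are illustrative only. Each line is turned into a structured record that charts and searches can then work on:

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# Hypothetical log format (an assumption for illustration), e.g.:
#   2024-03-01T10:15:42Z ERROR billing: payment gateway timeout
LOG_PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})Z\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"(?P<component>[\w.-]+):\s+"
    r"(?P<message>.*)$"
)


class LogEvent(NamedTuple):
    timestamp: datetime
    level: str
    component: str
    message: str


def parse_line(line: str) -> Optional[LogEvent]:
    """Return a structured event, or None if the line does not match the format."""
    match = LOG_PATTERN.match(line.strip())
    if match is None:
        return None
    return LogEvent(
        timestamp=datetime.strptime(match["ts"], "%Y-%m-%dT%H:%M:%S"),
        level=match["level"],
        component=match["component"],
        message=match["message"],
    )
```

Returning `None` for unmatched lines lets the caller count or display unparsed input separately instead of silently dropping it.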
Delivery and Technical Approach
The production environment is intentionally self-contained. Core services (e.g., RDS and the ECS/Fargate cluster) are not directly administered through day-to-day console usage. Instead, operational actions are wrapped into predetermined, externally triggerable workflows inside the environment:
- AWS Lambda functions and ECS tasks for build/deploy, backup/restore, and rollback operations
- read-only status visibility into what is running and what artefacts/backups are available
The desktop UI allows authorised users to execute these workflows without logging into the AWS console and without requiring broad administrative permissions.
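Triggering such a workflow from a client can be sketched roughly as follows. The shipped application is written in C#; Python is used here only for brevity, and the function name `trigger_workflow`, the Lambda name, and the payload fields are hypothetical. The client object is assumed to behave like `boto3.client("lambda")` and is passed in so the sketch can be exercised without AWS credentials:

```python
import json
from typing import Any, Dict


def trigger_workflow(lambda_client: Any, function_name: str,
                     params: Dict[str, Any]) -> Dict[str, Any]:
    """Invoke a workflow Lambda synchronously and return its decoded JSON result.

    `lambda_client` is assumed to expose the same `invoke` call as
    boto3.client("lambda").
    """
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the function's result
        Payload=json.dumps(params).encode("utf-8"),
    )
    body = response["Payload"].read()
    # Lambda sets FunctionError when the function itself raised an error.
    if response.get("FunctionError"):
        detail = body.decode("utf-8", "replace")
        raise RuntimeError(f"{function_name} failed: {detail}")
    return json.loads(body) if body else {}
```

A call might look like `trigger_workflow(client, "ops-backup", {"label": "nightly"})`, where `ops-backup` is a placeholder name. Because the UI only ever names a predefined function, the operator cannot run arbitrary actions through this path.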
Least-Privilege IAM Model
The application uses a constrained IAM role:
- read-only access to selected resources (e.g., available backups, available container images, running ECS tasks)
- execution permission for specific Lambdas
- permission to start specific ECS tasks that are wired to perform a single operational action
This reduces the blast radius and makes operational intent explicit.
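A policy of this shape might look like the sketch below, generated here as a Python dictionary. The account ID, region, and resource names are placeholders, and the sketch assumes backups are RDS snapshots and images live in ECR, neither of which is confirmed by the description above:

```python
import json

# Placeholder values; every ARN below is hypothetical.
ACCOUNT = "123456789012"
REGION = "eu-west-1"


def build_operator_policy() -> dict:
    """Build an illustrative least-privilege policy document for the operator role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Read-only visibility; assumes RDS snapshots and ECR images.
                "Sid": "ReadOnlyStatus",
                "Effect": "Allow",
                "Action": [
                    "ecs:ListTasks",
                    "ecs:DescribeTasks",
                    "ecr:DescribeImages",
                    "rds:DescribeDBSnapshots",
                ],
                "Resource": "*",
            },
            {
                # Only the named workflow functions may be invoked.
                "Sid": "InvokeWorkflowLambdas",
                "Effect": "Allow",
                "Action": "lambda:InvokeFunction",
                "Resource": [
                    f"arn:aws:lambda:{REGION}:{ACCOUNT}:function:ops-deploy",
                    f"arn:aws:lambda:{REGION}:{ACCOUNT}:function:ops-backup",
                ],
            },
            {
                # Only the single-purpose task definition may be started.
                "Sid": "RunSingleActionTasks",
                "Effect": "Allow",
                "Action": "ecs:RunTask",
                "Resource": f"arn:aws:ecs:{REGION}:{ACCOUNT}:task-definition/ops-restore:*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_operator_policy(), indent=2))
```

In practice `ecs:RunTask` usually also needs an `iam:PassRole` statement for the task's roles, and the read-only statement could be narrowed further where the service supports resource-level permissions; both are omitted here for brevity.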
Build and Deploy Integrity
Container builds run inside the production environment from source code in a protected GitHub main branch. The build workflow reports the Git commit hash used to build the images, allowing operators to validate that production artefacts correspond to an expected, reviewed commit before deployment.
Deployment is a separately triggered step after validation, updating the running container constellation to the new version. Backup, restore, and rollback to earlier versions follow the same controlled, explicit workflow pattern.
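The validation step before deployment can be as simple as comparing the commit hash reported by the build with the hash of the reviewed commit. The sketch below is an assumption about how such a check might be written, not the application's actual code; `commit_matches` is a hypothetical helper that also accepts abbreviated hashes of at least seven hex characters:

```python
import re

# A Git SHA-1 commit hash: 40 hex characters, or an abbreviation of 7 or more.
_HEX = re.compile(r"^[0-9a-f]{7,40}$")


def commit_matches(reported: str, expected: str) -> bool:
    """Return True if the build's reported commit matches the expected one.

    Either side may be abbreviated; the shorter value must be a prefix
    of the longer one. Malformed input never matches.
    """
    reported = reported.strip().lower()
    expected = expected.strip().lower()
    if not (_HEX.match(reported) and _HEX.match(expected)):
        return False
    return reported.startswith(expected) or expected.startswith(reported)
```

Rejecting malformed or very short values keeps an empty or truncated field from accidentally validating a build.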
Local Log Analytics
CloudWatch logs are accessible, but interactive investigation through the console can be slow and limited without a dedicated log management tool. The solution therefore provides local log analytics:
- Lucene-based deep search and semantic parsing for known log formats
- distribution charts across time slices (e.g., day/hour/15-minute buckets)
This reduces time-to-diagnosis for operational issues from hours to minutes.
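The time-slice distribution can be sketched as flooring each event timestamp to the start of its bucket and counting events per bucket. The function names below are illustrative, and the sketch assumes bucket sizes that divide a day evenly (15, 60, or 1440 minutes):

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable


def bucket_start(ts: datetime, minutes: int) -> datetime:
    """Floor a timestamp to the start of its bucket, counted from midnight."""
    midnight = ts.replace(hour=0, minute=0, second=0, microsecond=0)
    elapsed = int((ts - midnight).total_seconds() // 60)
    return midnight + timedelta(minutes=(elapsed // minutes) * minutes)


def histogram(timestamps: Iterable[datetime], minutes: int = 15) -> Dict[datetime, int]:
    """Count events per time bucket, ordered by bucket start."""
    counts = Counter(bucket_start(ts, minutes) for ts in timestamps)
    return dict(sorted(counts.items()))
```

Calling `histogram(timestamps, 60)` or `histogram(timestamps, 1440)` on the same data gives the hourly and daily views, so one pass over parsed events can feed all three chart resolutions.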