The Challenge
A business owner reached out needing help with Athena queries against CloudFront logs stored in S3. The queries were not returning the expected results, and without reliable log analysis, she had limited visibility into how her platform was being used and where it might be underperforming.
On the surface, this looked like a straightforward debugging task: fix the query logic, restore visibility, move on. But that framing turned out to be too narrow.
What I Found
Correcting the Athena queries was achievable, but once I started reviewing the broader environment, the picture became more complex.
The log collection approach was unnecessarily convoluted. Content was being delivered directly from S3 without appropriate security controls. And when I engaged with the technical custodians, I discovered the environment held over 350TB of data spread across multiple S3 buckets - a scale that made the existing approach not just inefficient, but genuinely unsustainable.
The original query problem was real, but it was a symptom. The underlying challenge was an architecture that had grown organically without a clear model for managing data at scale.
How I Approached It
I resolved the immediate issue first by correcting the query logic, then introduced a simpler path using CloudWatch for log analysis and QuickSight for cost visualisation. I built a working CloudWatch dashboard during the engagement so the customer had something tangible and usable straight away.
With the immediate pressure removed, I shifted to the broader architecture. I documented the risks I had identified and, rather than trying to work through the 350TB data consolidation problem alone, I reached out to an AWS storage specialist to explore what was actually possible. That conversation surfaced S3 replication as a practical mechanism for managing data across buckets asynchronously and at scale.
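The case study does not record the replication configuration that was used. As an illustration only, here is a minimal Python sketch of a single S3 replication rule, built in the shape that the PutBucketReplication API accepts. The bucket names, account ID and IAM role ARN are hypothetical placeholders, and the choice of one rule per source bucket, feeding a single consolidated destination, is an assumption made for the example.

```python
import json

# Hypothetical values for illustration only; not taken from the engagement.
ROLE_ARN = "arn:aws:iam::111122223333:role/example-replication-role"
DESTINATION_BUCKET = "example-consolidated"
SOURCE_BUCKETS = ["example-media-a", "example-media-b"]


def build_replication_config(role_arn: str, destination_bucket: str,
                             storage_class: str = "STANDARD_IA") -> dict:
    """Return a ReplicationConfiguration dict with one rule that
    replicates every object in the source bucket to the destination."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to copy objects
        "Rules": [
            {
                "ID": f"consolidate-to-{destination_bucket}",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": ""},  # empty prefix matches the whole bucket
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {
                    "Bucket": f"arn:aws:s3:::{destination_bucket}",
                    "StorageClass": storage_class,
                },
            }
        ],
    }


# One configuration per (hypothetical) source bucket.
configs = {src: build_replication_config(ROLE_ARN, DESTINATION_BUCKET)
           for src in SOURCE_BUCKETS}

if __name__ == "__main__":
    print(json.dumps(configs, indent=2))
```

Each configuration could then be applied with boto3's `put_bucket_replication(Bucket=src, ReplicationConfiguration=cfg)`. Replication requires versioning on both the source and destination buckets. It also only copies objects written after the rule is enabled, so data that already exists in the buckets would need S3 Batch Replication or a separate copy job. At a scale of hundreds of terabytes, that distinction matters for planning.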
From there, I worked with the customer's technical team to design an improved architecture - one that simplified log management, addressed the security gaps, and was maintainable long-term. We validated the approach in a non-production environment before any changes were made to production.
What Changed
The platform was re-architected with stronger security controls, a significantly simplified data management model, and better observability built in from the start rather than bolted on.
The engagement also progressed into a Well-Architected review - a signal that the relationship had moved well beyond the original support request. The customer went from struggling to get basic log queries working to having a clear, defensible platform architecture and a structured improvement pathway.
Lessons for Enterprise Cloud Platform Management
Start with the symptom, but do not stop there. The Athena query issue was real and urgent, and resolving it quickly mattered. But treating it as the whole problem would have left a fragile, poorly understood architecture in place. Trusted advisory work starts with solving what is in front of you, then earning the right to look further.
Scale changes what is possible. At 350TB across multiple buckets, manual or ad hoc approaches to data management are not just inefficient. They are a governance and reliability risk. Recognising that the right solution had to be asynchronous and automated, rather than trying to fit a human-scale process to a machine-scale problem, was the key shift.
Bring in the right expertise. Consulting the AWS storage specialist was not an admission of limitation. It was the right call. Platform problems at this scale rarely have single-discipline solutions. Knowing when to pull in a specialist, and how to translate their input into something actionable for the customer, is a core part of the advisory role.
Operational simplicity has compounding value. Every time a platform is made easier to understand, operate and maintain, the organisation benefits beyond the immediate change. Simpler architectures are easier to secure, cheaper to run and more resilient when something goes wrong.