Financial data reporting is required to enable informed decision-making. In this article, we will look at the importance of an Operational Data Store (ODS) to an e-commerce business – how it’s built, the challenges we’ve encountered in existing implementations, and how we would configure it to address shortcomings.
The main objective of an ODS is to Extract, Transform, and Load (ETL) data generated by one or more production applications to create a Data Warehouse (DW)or a Data Lake (DL). The DW serves as a system used for data reporting and analysis and is considered to be an essential component of a company’s business intelligence process.
With vast amounts of data consistently being gathered from website visitors and customers, business data intelligence is a valuable addition to any e-commerce business. With a well-architected Data Warehouse, companies can perform analysis and create reports on the data generated across all platforms and applications.
These analyses can help identify potential irregularities from a financial perspective such as items and taxes that may not add up correctly to the order price for instance. This is a key feature that companies can benefit immensely from to ensure proper accounting. It can also be leveraged to reveal inconsistencies in an application stack such as missing data or strange behavior.
For reporting, the ODS should provide nicely formatted data so that users can create the types of reports they need such as graphs and dashboards generally created using a visualization tool. A simple example might be that of a historical trend of refunds over the past six months.
Let’s take a look at the ODS architecture design for an actual customer. We used Serverless Framework which is very convenient when working on the AWS cloud since the framework is robust, scalable, and cost-effective. We leveraged several Amazon Web Services (AWS) cloud services such as Lambda, S3, Redshift, CloudWatch, CloudFormation, VPC, IAM, CodeBuild, CodeCommit, CodeDeploy, and CodePipeline to implement the solution.
We will briefly detail the elements of this architectural design.
The Serverless Framework is a very powerful Infrastructure as Code (IaC) tool that enables communication with the AWS web console from code and the command line. It permits easily defining of our application as functions and events, deploying infrastructure and code. It also helps make use of AWS Lambda, AWS CloudFormation, AWS VPC, and AWS IAM.
The logic of this ODS implementation lies in AWS Lambdas, which are functions that can be called after an event such as a daily schedule, or after a new document is created in S3. It innately provides the project with auto-scaling so that charges are incurred for only what processing is required. Auto-scaling facilitates real-time ETL processes even during peak traffic periods (Black Friday, Christmas, Super Bowl, etc.) in order to get near-instant feedback under any circumstances.
Amazon S3 (Simple Storage Service)is used to store the files that are sent by the production website. Following that event, the ODS performs its ETL processes to complete its ingest process into an AWS Redshift Data Warehouse.
One of the most important services within the ODS is AWS Redshift. After ETL processes are finished, it is important to be efficient in performing analysis and reporting on the captured data. AWS Redshift addresses this requirement as an Online Analytical Processing System (OLAP), yielding the performance required for analytics, which is especially important when the business generates a large amount of data every day. This is particularly efficient with Redshift because it is column-based, not row-based like many other databases. The potential drawback with RedShift is that it might struggle more than an OLTP (Online Transaction Processing system) when performing data mutations (Inserts, Updates, and Deletes). However, in a typical data lake (DL) environment, this is not an issue since the focus is mainly placed on extracting data (Select) for analysis that helps in decision making.
AWS CloudWatch is a tool that helps monitor the AWS services in use. It presents users with convenient graphs that indicate the status and metrics required to assess the health of the AWS services that have been deployed.
CloudFormation, AWS’ infrastructure as code toolset, is mostly used through Serverless Framework. It enables management of S3, Lambda, and CloudWatch as Infrastructure as Code (IaC).
Amazon Virtual Private Cloud (VPC) also works with Serverless Framework. It permits the creation of a virtual network for AWS resources.
Identity and Access Management (IAM) is used to control who is authenticated and authorized to use resources.
We inherited an ODS project that was originally built by another team, and we faced three major challenges.
The first was a lack of documentation. This was a significant obstacle that required some reverse engineering on our part to clearly understand what the last team had achieved.
The second challenge we faced was to ensure maintainability, particularly Store Procedures (SQL code) living in AWS Redshift. The files in the Git repository that were supposed to be used as backups for Store Procedures (SP) were slightly different from those in production. We realized the importance of keeping the files are consistently up to date to facilitate future backups.
The third challenge we faced was with the Store Procedures (SP). We encountered difficulties when building and debugging them. The solution to these issues was to make use of Temporary Tables within the DB-IDE or in the SP.
First, we would have taken advantage of AWS Glue versus utilizing AWS Lambda because Glue specifically addresses the desired data flow required for ETL. We would Stick with the Serverless Framework because of its compatibility with AWS Glue.
Second, we would implement testing from day one since it is easier to do incrementally on an ongoing basis versus doing it all at once. Effective testing is an absolute requirement for the implementation of a robust and stable application.
Third, we would carefully plan out how the AWS StepFunctions are implemented. The implementation discussed in this article uses AWS Lambdas that succeed each other, but are not necessarily linked together. For example, the output of one process would trigger the event of another to follow and this can impact the readability of the code. For developers reading this article, bear in mind that cost considerations of this approach should be taken into account when selecting an architectural choice.
Fourth, as a modification, we would have opted for a hexagonal architecture in order to have a clearer understanding of the input and output of each Lambda.
Finally, we would explore the applicability of AWS Glue DataBrew and AWS Glue Elastic Views for the same reason as AWS Glue since both these services are specifically used for ETL.
In this article, we have outlined the importance of an ODS when it comes to the reporting and analysis of financial data. An ODS serves as a crucial tool to aggregate information from multiple data sources and helps optimize both the reporting and analysis of data. We reviewed an architecture we implemented in a real-world example, we discussed the challenges we faced during the implementation, and lastly, we ended the article with some insights into how we would build it differently with other advanced AWS services if we were starting from scratch.
TrackIt is an Amazon Web Services Advanced Consulting Partner specializing in cloud management, consulting, and software development solutions based in Venice, CA.
TrackIt specializes in Modern Software Development, DevOps, Infrastructure-As-Code, Serverless, CI/CD, and Containerization with specialized expertise in Media & Entertainment workflows, High-Performance Computing environments, and data storage.
TrackIt’s forté is cutting-edge software design with deep expertise in containerization, serverless architectures, and innovative pipeline development. The TrackIt team can help you, architect, design, build, and deploy customized solutions tailored to your exact requirements.
In addition to providing cloud management, consulting, and modern software development services, TrackIt also provides an open-source AWS cost management tool that allows users to optimize their costs and resources on AWS.