Financial data reporting is required to enable informed decision-making. In this article, we will look at the importance of an Operational Data Store (ODS) to an e-commerce business – how it’s built, the challenges we’ve encountered in existing implementations, and how we would configure it to address shortcomings.
Contents [hide]
- What is an ODS?
- Why an E-Commerce Company Needs an ODS
- How an ODS is Built for an E-Commerce Retailer
- Serverless Framework
- AWS Lambda
- Amazon S3
- Amazon Redshift
- Amazon CloudWatch
- AWS CloudFormation
- Amazon VPC
- AWS IAM
- Challenges Faced by the TrackIt Team
- How We Would Build It Differently If We Had to Start from Scratch
- Let’s Review
- About TrackIt
What is an ODS?
The main objective of an ODS is to Extract, Transform, and Load (ETL) data generated by one or more production applications to create a Data Warehouse (DW) or a Data Lake (DL). The DW serves as a system used for data reporting and analysis and is considered to be an essential component of a company’s business intelligence process.
Why an E-Commerce Company Needs an ODS
With vast amounts of data consistently being gathered from website visitors and customers, business data intelligence is a valuable addition to any e-commerce business. With a well-architected Data Warehouse, companies can perform analysis and create reports on the data generated across all platforms and applications.
These analyses can help identify potential irregularities from a financial perspective such as items and taxes that may not add up correctly to the order price for instance. This is a key feature that companies can benefit immensely from to ensure proper accounting. It can also be leveraged to reveal inconsistencies in an application stack such as missing data or strange behavior.
For reporting, the ODS should provide nicely formatted data so that users can create the types of reports they need such as graphs and dashboards generally created using a visualization tool. A simple example might be that of a historical trend of refunds over the past six months.
How an ODS is Built for an E-Commerce Retailer
Let’s take a look at the ODS architecture design for an actual customer. We used Serverless Framework which is very convenient when working on the AWS cloud since the framework is robust, scalable, and cost-effective. We leveraged several Amazon Web Services (AWS) cloud services such as Lambda, S3, Redshift, CloudWatch, CloudFormation, VPC, IAM, CodeBuild, CodeCommit, CodeDeploy, and CodePipeline to implement the solution.
We will briefly detail the elements of this architectural design.
Serverless Framework
The Serverless Framework is a very powerful Infrastructure as Code (IaC) tool that enables communication with the AWS web console from code and the command line. It permits easily defining of our application as functions and events, deploying infrastructure and code. It also helps make use of AWS Lambda, AWS CloudFormation, AWS VPC, and AWS IAM.
AWS Lambda
The logic of this ODS implementation lies in AWS Lambdas, which are functions that can be called after an event such as a daily schedule, or after a new document is created in S3. It innately provides the project with auto-scaling so that charges are incurred for only what processing is required. Auto-scaling facilitates real-time ETL processes even during peak traffic periods (Black Friday, Christmas, Super Bowl, etc.) in order to get near-instant feedback under any circumstances.
Amazon S3
Amazon S3 (Simple Storage Service)is used to store the files that are sent by the production website. Following that event, the ODS performs its ETL processes to complete its ingest process into an AWS Redshift Data Warehouse.
Amazon Redshift
One of the most important services within the ODS is AWS Redshift. After ETL processes are finished, it is important to be efficient in performing analysis and reporting on the captured data. AWS Redshift addresses this requirement as an Online Analytical Processing System (OLAP), yielding the performance required for analytics, which is especially important when the business generates a large amount of data every day.
This is particularly efficient with Redshift because it is column-based, not row-based like many other databases. The potential drawback with RedShift is that it might struggle more than an OLTP (Online Transaction Processing system) when performing data mutations (Inserts, Updates, and Deletes). However, in a typical data lake (DL) environment, this is not an issue since the focus is mainly placed on extracting data (Select) for analysis that helps in decision making.
Amazon CloudWatch
AWS CloudWatch is a tool that helps monitor the AWS services in use. It presents users with convenient graphs that indicate the status and metrics required to assess the health of the AWS services that have been deployed.
AWS CloudFormation
CloudFormation, AWS’ infrastructure as code toolset, is mostly used through Serverless Framework. It enables management of S3, Lambda, and CloudWatch as Infrastructure as Code (IaC).
Amazon VPC
Amazon Virtual Private Cloud (VPC) also works with Serverless Framework. It permits the creation of a virtual network for AWS resources.
AWS IAM
Identity and Access Management (IAM) is used to control who is authenticated and authorized to use resources.
Challenges Faced by the TrackIt Team
We inherited an ODS project that was originally built by another team, and we faced three major challenges.
The first was a lack of documentation. This was a significant obstacle that required some reverse engineering on our part to clearly understand what the last team had achieved.
The second challenge we faced was to ensure maintainability, particularly Store Procedures (SQL code) living in AWS Redshift. The files in the Git repository that were supposed to be used as backups for Store Procedures (SP) were slightly different from those in production. We realized the importance of keeping the files are consistently up to date to facilitate future backups.
The third challenge we faced was with the Store Procedures (SP). We encountered difficulties when building and debugging them. The solution to these issues was to make use of Temporary Tables within the DB-IDE or in the SP.
How We Would Build It Differently If We Had to Start from Scratch
First, we would have taken advantage of AWS Glue versus utilizing AWS Lambda because Glue specifically addresses the desired data flow required for ETL. We would Stick with the Serverless Framework because of its compatibility with AWS Glue.
Second, we would implement testing from day one since it is easier to do incrementally on an ongoing basis versus doing it all at once. Effective testing is an absolute requirement for the implementation of a robust and stable application.
Third, we would carefully plan out how the AWS StepFunctions are implemented. The implementation discussed in this article uses AWS Lambdas that succeed each other, but are not necessarily linked together. For example, the output of one process would trigger the event of another to follow and this can impact the readability of the code. For developers reading this article, bear in mind that cost considerations of this approach should be taken into account when selecting an architectural choice.
Fourth, as a modification, we would have opted for a hexagonal architecture in order to have a clearer understanding of the input and output of each Lambda.
Finally, we would explore the applicability of AWS Glue DataBrew and AWS Glue Elastic Views for the same reason as AWS Glue since both these services are specifically used for ETL.
Let’s Review
In this article, we have outlined the importance of an ODS when it comes to the reporting and analysis of financial data. An ODS serves as a crucial tool to aggregate information from multiple data sources and helps optimize both the reporting and analysis of data. We reviewed an architecture we implemented in a real-world example, we discussed the challenges we faced during the implementation, and lastly, we ended the article with some insights into how we would build it differently with other advanced AWS services if we were starting from scratch.
About TrackIt
TrackIt is an international AWS cloud consulting, systems integration, and software development firm headquartered in Marina del Rey, CA.
We have built our reputation on helping media companies architect and implement cost-effective, reliable, and scalable Media & Entertainment workflows in the cloud. These include streaming and on-demand video solutions, media asset management, and archiving, incorporating the latest AI technology to build bespoke media solutions tailored to customer requirements.
Cloud-native software development is at the foundation of what we do. We specialize in Application Modernization, Containerization, Infrastructure as Code and event-driven serverless architectures by leveraging the latest AWS services. Along with our Managed Services offerings which provide 24/7 cloud infrastructure maintenance and support, we are able to provide complete solutions for the media industry.