Jul. 24, 2023
In searching for the optimal architecture for our new Data Insights offering, TrackIt carried out a rigorous evaluation comparing the suitability of a data lake implementation against a data warehouse implementation. The objective was to design a data repository solution that would enable advanced analytics, provide deep insights into optimizing AWS infrastructure, and showcase TrackIt’s expertise in implementing comprehensive data workflows.
Note: Readers looking for a comprehensive guide to the differences between a data lake and a data warehouse can read the recently published Data Lake vs. Data Warehouse article, which compares both solutions across eight different criteria.
Data Lake Architecture Diagram
The cost estimate below is based on the following assumptions: one data refresh per day, and 10 accounts (including 1 editor) with access to view visualizations for up to 4 hours per day.
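To make this usage envelope concrete, the assumptions can be written down as a small Python structure that derives the monthly figures they imply. This is an illustrative sketch, not part of TrackIt's tooling: the `UsageAssumptions` name is hypothetical, and the 30-day month is a simplifying assumption chosen for round numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageAssumptions:
    """Usage envelope behind the cost estimates (values taken from the article)."""
    refreshes_per_day: int = 1
    accounts: int = 10               # total accounts with access to the dashboards
    editors: int = 1                 # accounts that can also author visualizations
    max_view_hours_per_day: float = 4.0
    days_per_month: int = 30         # simplifying assumption: a 30-day month

    @property
    def readers(self) -> int:
        # Accounts that only view visualizations.
        return self.accounts - self.editors

    def refreshes_per_month(self) -> int:
        return self.refreshes_per_day * self.days_per_month

    def max_view_hours_per_month(self) -> float:
        # Upper bound: every account views for the full daily maximum every day.
        return self.accounts * self.max_view_hours_per_day * self.days_per_month


u = UsageAssumptions()
print(u.readers, u.refreshes_per_month(), u.max_view_hours_per_month())
# → 9 30 1200.0
```

In other words, the estimates cover 30 refreshes and at most 1,200 account-hours of viewing per month; real usage below that ceiling would only lower the viewing-related costs.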
A detailed breakdown of costs for the data lake can be found on the AWS Pricing Calculator.
| Monthly cost | Annual cost |
|---|---|
| $68.00 | $816.05 |
Note: The cost estimates provided are based on currently known parameters and may vary with changes in usage patterns, data volumes, or refresh rates.
Data Warehouse Architecture Diagram
The cost estimate below is based on the following assumptions: one data refresh per day, and 10 accounts (including 1 editor) with access to view visualizations for up to 4 hours per day.
Given the small data volume (under 100 GB) and modest compute requirements (2 vCPUs) of this use case, a small Redshift node type (dc2.large) was chosen. Note, however, that Redshift costs rise sharply as data volume and compute requirements grow.
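One way to see why Redshift costs climb in steps: a provisioned cluster is billed per node, and each dc2.large node provides a fixed 160 GB of storage, with at most 32 nodes per cluster. The Python sketch below sizes a dc2.large cluster on storage alone. It deliberately ignores compression, free-space headroom, and compute requirements. The hourly node rate is a parameter to fill in from current AWS pricing, not a quoted price, and the function names are hypothetical. The 730-hour month follows a common AWS pricing convention.

```python
import math

# Per-node limits for the dc2.large node type.
DC2_LARGE_STORAGE_GB = 160
DC2_LARGE_MAX_NODES = 32
HOURS_PER_MONTH = 730  # common AWS convention for an average month


def dc2_large_nodes_needed(data_gb: float) -> int:
    """Smallest dc2.large node count whose combined storage holds data_gb.

    Simplification: sizes on raw storage only, ignoring compression,
    free-space headroom, and compute (vCPU) requirements.
    """
    nodes = max(1, math.ceil(data_gb / DC2_LARGE_STORAGE_GB))
    if nodes > DC2_LARGE_MAX_NODES:
        raise ValueError("data exceeds dc2.large cluster capacity; another node type is needed")
    return nodes


def monthly_cluster_cost(data_gb: float, hourly_rate_per_node: float) -> float:
    """Cost of an always-on cluster; take hourly_rate_per_node from current AWS pricing."""
    return dc2_large_nodes_needed(data_gb) * hourly_rate_per_node * HOURS_PER_MONTH


for gb in (100, 500, 1000):
    print(gb, "GB ->", dc2_large_nodes_needed(gb), "node(s)")
# → 100 GB -> 1 node(s)
# → 500 GB -> 4 node(s)
# → 1000 GB -> 7 node(s)
```

Under these simplifications, the current dataset fits on a single node, but growth to 500 GB already implies four nodes and therefore four times the node-hours.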
A detailed breakdown of costs for the data warehouse can be found on the AWS Pricing Calculator.
| Monthly cost | Annual cost |
|---|---|
| $80.96 | $971.48 |
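The two estimates can be compared directly. The Python sketch below uses the figures from the data lake and data warehouse cost tables. It uses `Decimal` so that the dollar amounts are not subject to floating-point rounding. The helper names are illustrative only.

```python
from decimal import Decimal

# Estimates copied from the two cost tables (USD).
DATA_LAKE = {"monthly": Decimal("68.00"), "annual": Decimal("816.05")}
DATA_WAREHOUSE = {"monthly": Decimal("80.96"), "annual": Decimal("971.48")}


def difference(period: str) -> Decimal:
    """How much more the data warehouse costs than the data lake for the period."""
    return DATA_WAREHOUSE[period] - DATA_LAKE[period]


def percent_lower(period: str) -> Decimal:
    """Data lake cost reduction relative to the warehouse, rounded to 0.1 percent."""
    return (difference(period) / DATA_WAREHOUSE[period] * 100).quantize(Decimal("0.1"))


print(difference("monthly"), difference("annual"), percent_lower("annual"))
# → 12.96 155.43 16.0
```

At this scale, the data lake estimate is about $155 (roughly 16%) cheaper per year. The larger argument for the data lake concerns how costs behave as the implementation grows.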
Note: The cost estimates provided are based on currently known parameters and may vary with changes in usage patterns, data volumes, or refresh rates.
TrackIt chose a data lake for its Data Insights offering because of the scalability, flexibility, and cost-effectiveness that data lakes offer, particularly for larger implementations. A data lake can efficiently store, process, and analyze substantial volumes of both structured and unstructured data without a significant increase in cost. This aligned with the initial goal of building a solution that can accommodate future data sources and analytics requirements.
In this example, the need to accommodate growing data sources while keeping costs from rising steeply favored a data lake. Had the project's objective instead been optimized query performance, insights from structured data, and simplified maintenance, a data warehouse would have been the more appropriate solution.
This highlights the importance of conducting a thorough evaluation before choosing a solution. The choice between a data lake and a data warehouse depends on multiple factors, such as workflow requirements, the volume of data to be stored and processed, and the degree of flexibility required. A careful examination of workflow needs and of each solution's advantages and trade-offs helps in making the right decision.
TrackIt is an Amazon Web Services Advanced Tier Services Partner based in Marina del Rey, CA, specializing in cloud management, consulting, and software development solutions.
TrackIt specializes in Modern Software Development, DevOps, Infrastructure-as-Code, Serverless, CI/CD, and Containerization, with particular expertise in Media & Entertainment workflows, High-Performance Computing environments, and data storage.
In addition to providing cloud management, consulting, and modern software development services, TrackIt also provides an open-source AWS cost management tool that allows users to optimize their costs and resources on AWS.