Written by Maram Ayari, Software Engineer at TrackIt
Migrating databases is a critical process for organizations seeking to modernize infrastructure, improve scalability, or adopt cloud-native platforms. It often involves not only transferring data but also transforming and validating it to align with the target environment’s requirements. This becomes especially complex when metadata structures, access methods, and schema definitions differ significantly between legacy and modern systems.
Successful database migrations require careful planning, reliable tooling, and clear strategies for minimizing downtime and ensuring data integrity. The following example outlines a metadata migration project from an on-premises environment to a cloud-based Dalet Flex deployment, designed and implemented for a media customer.
Contents
- Project Overview
- Architecture Overview
- Step 1: Discovery – Evaluating the Existing Infrastructure
- Step 2: Defining Metadata Requirements in Dalet Flex
- Step 3: Migration from the On-Prem Database to Amazon RDS
- Step 4: Transform and Validate the Data (ETL)
- Step 5: Load into Dalet Flex
- Conclusion
- How TrackIt Can Help
- About TrackIt
Project Overview
The objective of the project was to migrate transformed metadata from an on-premises Reach Engine environment to Dalet Flex hosted on AWS. TrackIt addressed specific data quality issues identified in the client-provided documentation to ensure that the transformed data aligned with the requirements of Dalet Flex.
While Reach Engine provides a RESTful API for data access, limitations related to API rate throttling and time constraints from the client made it impractical for use in this migration. As a result, an alternative pipeline was designed to authenticate directly against the underlying database. This approach enabled the extraction of metadata from the relevant database tables, followed by the reconstruction and validation of entities as part of the ETL process.
Steps
- Evaluate the existing infrastructure and current Reach Engine setup.
- Define the metadata fields, asset types, and any configurations that need to be adapted in Dalet Flex.
- Create scripts using the API to extract data from Reach Engine and move it to an intermediate database.
- Use a structured ETL (Extract, Transform, Load) process to prepare data for Dalet Flex.
- Move the newly formatted data into Dalet Flex using the Dalet Flex API.
Architecture Overview
Amazon RDS (Relational Database Service)
RDS is used to host the intermediate database that temporarily stores data copied from the source Reach Engine. At this stage, the data remains untransformed but serves as a stable and secure staging area for analysis, testing, and verification to ensure that all necessary data is accurately captured for the migration.
AWS Direct Connect
Direct Connect establishes a dedicated and secure network connection between the on-premises Reach Engine and the AWS infrastructure. This enables smooth data transfer and low latency throughout the migration process, even with the large data volumes involved.
AWS Glue
AWS Glue handles the transformation of metadata from the original database format to the new structure. Glue jobs extract data from the intermediate Amazon RDS database, transform it as required, and prepare it for loading into the Dalet Flex environment. Glue’s serverless architecture and automatic scaling capabilities make it well-suited for this large-scale migration.
Amazon S3 (Simple Storage Service)
S3 plays a critical role throughout the migration, beyond serving as the final destination for transformed data. During both the extraction and transformation phases, S3 provides temporary storage for in-process data. Its scalability and reliability are essential for handling large datasets. S3 also enables secure storage and flexible access at various stages, supporting workflow management and minimizing the risk of data loss.
Step 1: Discovery – Evaluating the Existing Infrastructure
The first step in the migration process involved evaluating the existing Reach Engine environment. This included an assessment of the infrastructure and database to understand the scale and accessibility of the data.
While Reach Engine offers a RESTful API for interacting with metadata, it was subject to high rate limiting. Given the volume of data and project time constraints, using the API would have significantly prolonged the extraction process. As a result, the team opted for a more efficient approach: accessing the database directly to extract and reconstruct metadata for the migration pipeline.
As part of the discovery, the following details were gathered:
- Database Size: 77GB
- Total Tables: 227
- Database Version: PostgreSQL 13.3
- Connectivity: Internet access is routed through a proxy; access to RDS is functional via this proxy.
Step 2: Defining Metadata Requirements in Dalet Flex
In coordination with the client and the Dalet Flex team, the metadata fields and asset types that needed to be supported were defined. The necessary transformations and field mappings were also documented to ensure alignment with Dalet Flex’s schema and ingestion requirements.
Authentication
Dalet Flex’s API uses Basic Authentication (illustrated in the sketch below):
- A username and password are provided.
- These credentials are sent in the Authorization header, encoded in base64 format.
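The snippet below is a minimal sketch of this authentication flow using Python’s requests library; the endpoint, username, and password are placeholders rather than real project values.

```python
import requests

# Placeholder endpoint and credentials; substitute the values for the target Flex environment.
BASE_URL = "https://development.flex-dev.cloud.example.com/api"
USERNAME = "api-user"       # hypothetical service account
PASSWORD = "api-password"

# requests builds the base64-encoded Authorization header from the (username, password) tuple.
response = requests.get(
    f"{BASE_URL}/assets",
    auth=(USERNAME, PASSWORD),
    headers={"Accept": "application/json"},
    timeout=30,
)
response.raise_for_status()
print(response.json())
```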
API Endpoints
- Development Environment: https://development.flex-dev.cloud.example.com/api/
- Production Environment: https://production.flex-prod.cloud.example.com/api/
Matrix Parameters Format
Unlike typical REST APIs that use query strings (?param=value), Dalet Flex employs matrix URIs. For example: /api/assets;limit=10;page=2;sortBy=id;sort=desc
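As an illustration, a small helper such as the one below can assemble matrix-style URIs from keyword arguments; it is a sketch rather than part of the production migration tooling.

```python
def build_matrix_uri(resource: str, **params) -> str:
    """Append matrix-style parameters (;key=value) to a resource path."""
    suffix = "".join(f";{key}={value}" for key, value in params.items())
    return f"{resource}{suffix}"

# Produces: /api/assets;limit=10;page=2;sortBy=id;sort=desc
print(build_matrix_uri("/api/assets", limit=10, page=2, sortBy="id", sort="desc"))
```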
Common Parameters
- limit: Maximum number of items per page (default: 100)
- page: Pagination control
- createdFrom, createdTo: Date filtering (format: 24 Jun 2021 10:48:41)
- deleted: Include deleted assets (default: false)
- sort, sortBy: Sorting results
Available Metadata Types
Metadata types can be viewed directly from the Dalet Flex interface or retrieved via the API (a sketch of these calls follows the list):
- Get all metadata definitions: GET /api/metadataDefinitions
- Filter by name (e.g., asset types): GET /api/metadataDefinitions;name=assetType
- Retrieve the full definition of a metadata type: GET /api/metadataDefinitions/{metadataId}/definition
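The sketch below strings these three calls together with Python’s requests library; the credentials are placeholders, and the response shape used to pull the metadata ID is an assumption about the deployment’s API output.

```python
import requests

BASE_URL = "https://development.flex-dev.cloud.example.com/api"
AUTH = ("api-user", "api-password")  # placeholder Basic Auth credentials

# List all metadata definitions, then filter by name using a matrix parameter.
all_definitions = requests.get(f"{BASE_URL}/metadataDefinitions", auth=AUTH, timeout=30).json()
asset_types = requests.get(f"{BASE_URL}/metadataDefinitions;name=assetType", auth=AUTH, timeout=30).json()

# Retrieve the full definition of one metadata type (response shape assumed for illustration).
metadata_id = asset_types["metadataDefinitions"][0]["id"]
full_definition = requests.get(
    f"{BASE_URL}/metadataDefinitions/{metadata_id}/definition", auth=AUTH, timeout=30
).json()
print(full_definition)
```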
Common Asset Metadata Types
Examples of metadata types used for assets in this deployment include:
- assetType_Project
- assetType_Promo
- assetType_Collection
- assetType_DeliverableImage
Creating a New Asset with Metadata
- Create the asset: POST /assets
  Include the metadatadefinitionId field to indicate the metadata type to apply.
- Set metadata attributes (annotations): POST /assets/{assetId}/annotations
  This is a separate request used to populate the asset’s metadata fields after creation. Metadata and asset creation cannot be performed in the same API call (see the sketch below).
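The following sketch illustrates the two-step sequence with Python’s requests library; the payload fields, metadata definition ID, and response shape are hypothetical and depend on the actual Flex configuration.

```python
import requests

BASE_URL = "https://development.flex-dev.cloud.example.com/api"
AUTH = ("api-user", "api-password")  # placeholder credentials

# Step 1: create the asset and reference the metadata definition to apply.
asset_payload = {
    "name": "promo_clip_001",        # hypothetical asset name
    "metadatadefinitionId": 1234,    # e.g., the ID of assetType_Promo (placeholder value)
}
created = requests.post(f"{BASE_URL}/assets", json=asset_payload, auth=AUTH, timeout=30)
created.raise_for_status()
asset_id = created.json()["id"]      # response shape assumed for illustration

# Step 2: populate the asset's metadata fields in a separate annotations request.
annotations_payload = {"title": "Promo Clip 001", "description": "Migrated from Reach Engine"}
annotated = requests.post(
    f"{BASE_URL}/assets/{asset_id}/annotations", json=annotations_payload, auth=AUTH, timeout=30
)
annotated.raise_for_status()
```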
Step 3: Migration from the On-Prem Database to Amazon RDS
Connect to the On-Premises Environment
Connect to Jump Station: Use SSH or your preferred method to connect to the jump station in the on-premises environment.
Connect to the PostgreSQL Server Instance
From the jump station, connect to the server instance hosting the PostgreSQL database.
Create a backup with pg_dump
Using pg_dump, a full logical backup of the PostgreSQL database was created and stored in a shared location on the local SAN. This ensured a reliable snapshot of the data for migration.
Restore Backup to Amazon RDS Instance
- Initial Test with Selected Tables:
- Perform a test restore with a backup containing only two or three tables (totaling approximately 8 GB).
- Use pg_restore to restore the backup to the AWS RDS instance.
- pg_restore -U <username> -h <rds_endpoint> -d <database_name> -v /path/to/backup_file_name.backup
- Full Database Restore:
- Once the test restore is successful, proceed with a full backup and restore it to the RDS instance.
Post-Migration Verification
- Test the Tables:
- After the transfer is complete, thoroughly test the tables in the AWS RDS instance to ensure data integrity and consistency.
- Handle Inconsistencies:
- If any inconsistencies are detected, troubleshoot the issues and repeat the migration process as necessary.
Ongoing Replication with pglogical
- Install pglogical:
- Install pglogical on both the source (on-premises) and target (AWS RDS) PostgreSQL databases.
- Set Up Provider Node (Source Database):
- Configure the on-premises PostgreSQL database as a provider node (a setup sketch follows this list).
- Set Up Subscriber Node (Target Database):
- Configure the AWS RDS PostgreSQL database as a subscriber node.
- Monitor and Verify Replication:
- Monitor the replication task with CloudWatch Logs to ensure data is continuously synchronized.
- Perform data consistency checks periodically to verify the integrity of the replicated data.
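The sketch below shows one way the provider and subscriber nodes might be configured by running the pglogical SQL functions through psycopg2; the connection strings, node names, and replicated schema are assumptions.

```python
import psycopg2

# Placeholder DSNs; the RDS subscriber must be able to reach the on-premises provider.
PROVIDER_DSN = "host=onprem-postgres port=5432 dbname=reachengine user=replicator password=***"
SUBSCRIBER_DSN = "host=my-rds-endpoint port=5432 dbname=reachengine user=replicator password=***"

def run(dsn: str, statements: list) -> None:
    """Execute a list of SQL statements against the given database."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        for statement in statements:
            cur.execute(statement)

# Provider (on-premises): create the node and publish the tables in the public schema.
run(PROVIDER_DSN, [
    f"SELECT pglogical.create_node(node_name := 'provider', dsn := '{PROVIDER_DSN}');",
    "SELECT pglogical.replication_set_add_all_tables('default', ARRAY['public']);",
])

# Subscriber (Amazon RDS): create the node and subscribe to the provider.
run(SUBSCRIBER_DSN, [
    f"SELECT pglogical.create_node(node_name := 'subscriber', dsn := '{SUBSCRIBER_DSN}');",
    f"SELECT pglogical.create_subscription(subscription_name := 'reachengine_sub', provider_dsn := '{PROVIDER_DSN}');",
])
```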
Alternative Ongoing Replication with AWS DMS through VPN
- Configure DMS:
- Create a replication instance in AWS DMS.
- Configure source and target endpoints (requires a Site-to-Site VPN connection):
- Source: On-premises PostgreSQL database.
- Target: AWS RDS for PostgreSQL instance.
- Create a Migration Task:
- Set up a DMS task to perform ongoing replication (a configuration sketch follows this list).
- Ensure the task is configured for continuous data replication to keep the Amazon RDS instance synchronized with the on-premises database.
- Monitor and Verify Replication:
- Monitor the replication task with CloudWatch Logs to ensure data is continuously synchronized.
- Perform data consistency checks periodically to verify the integrity of the replicated data.
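As a rough illustration, a DMS task for full load plus ongoing replication could be created with boto3 as sketched below; the region, ARNs, and table mappings are placeholders, and the replication instance and endpoints are assumed to exist already.

```python
import json
import boto3

dms = boto3.client("dms", region_name="us-west-2")  # region is an assumption

# Table mapping rule: replicate every table in the public schema.
table_mappings = {
    "rules": [{
        "rule-type": "selection",
        "rule-id": "1",
        "rule-name": "include-public-schema",
        "object-locator": {"schema-name": "public", "table-name": "%"},
        "rule-action": "include",
    }]
}

# Placeholder ARNs for resources created beforehand in DMS.
dms.create_replication_task(
    ReplicationTaskIdentifier="reachengine-to-rds",
    SourceEndpointArn="arn:aws:dms:us-west-2:123456789012:endpoint:SOURCE",
    TargetEndpointArn="arn:aws:dms:us-west-2:123456789012:endpoint:TARGET",
    ReplicationInstanceArn="arn:aws:dms:us-west-2:123456789012:rep:INSTANCE",
    MigrationType="full-load-and-cdc",  # initial load followed by ongoing change data capture
    TableMappings=json.dumps(table_mappings),
)
```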
Step 4: Transform and Validate the Data (ETL)
Extracting and Filtering Asset IDs
The initial step involved extracting asset IDs from the Reach Engine database. A Python script applied specific filters to capture only the relevant asset types (e.g., videos, images, documents) required for the migration. The extracted asset IDs were written to an S3 bucket for further processing, and an AWS Glue job was triggered to filter out assets missing mandatory data, ensuring that only complete records were considered for migration and establishing a roadmap for subsequent phases.
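A simplified sketch of this extraction step is shown below, assuming psycopg2 against the RDS copy of the database and boto3 for S3; the table, column, and bucket names are hypothetical, since the actual Reach Engine schema is not reproduced here.

```python
import csv
import io
import boto3
import psycopg2

RDS_DSN = "host=my-rds-endpoint dbname=reachengine user=etl password=***"  # placeholder
BUCKET = "migration-staging-bucket"                                        # hypothetical bucket

# Table and column names are illustrative; the real Reach Engine schema differs.
QUERY = """
    SELECT id, asset_type
    FROM assets
    WHERE asset_type IN ('video', 'image', 'document')
"""

with psycopg2.connect(RDS_DSN) as conn, conn.cursor() as cur:
    cur.execute(QUERY)
    rows = cur.fetchall()

# Write the filtered asset IDs to S3 as CSV for the downstream Glue job.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["asset_id", "asset_type"])
writer.writerows(rows)

boto3.client("s3").put_object(
    Bucket=BUCKET,
    Key="extracted/asset_ids.csv",
    Body=buffer.getvalue().encode("utf-8"),
)
```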
Fetching Metadata for Each Asset
Following the extraction of asset IDs, the next step was to fetch the associated metadata. This metadata, including descriptive information, media attributes, and relationships to other assets, was stored in various tables within the database.
ETL jobs were designed to join asset IDs with these metadata tables, ensuring that all relevant details for each asset were gathered (see the PySpark sketch after this list). The logic was to:
- Query the metadata tables using the asset IDs as the primary filter.
- Extract the metadata related to each asset, including technical information (file formats, resolutions), descriptive attributes (titles, tags, descriptions), and relational data (such as links to other assets or categories).
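The PySpark sketch below illustrates this join inside an AWS Glue job; the Glue Data Catalog database, table names, and selected columns are assumptions standing in for the real Reach Engine tables.

```python
from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Catalog database/table names are placeholders for crawled copies of the RDS tables.
asset_ids = glue_context.create_dynamic_frame.from_catalog(
    database="reachengine_staging", table_name="asset_ids"
).toDF()
metadata = glue_context.create_dynamic_frame.from_catalog(
    database="reachengine_staging", table_name="asset_metadata"
).toDF()

# Join metadata onto the filtered asset IDs and keep the fields needed downstream.
joined = (
    asset_ids.join(metadata, asset_ids.asset_id == metadata.asset_id, "left")
    .select(asset_ids.asset_id, "title", "description", "file_format", "resolution")
)

joined.write.mode("overwrite").json("s3://migration-staging-bucket/joined-metadata/")
```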
Transforming the Data for the Target Database (Dalet Flex)
After extracting the necessary metadata, the data was transformed into a format compatible with Dalet Flex. This transformation process included:
- Data normalization: Standardizing metadata fields to align with the target database’s structure.
- Field mapping: Ensuring that fields from the source database were correctly mapped to the corresponding fields in the new database. For instance, technical attributes such as resolution or file format might need to be renamed or reformatted to meet the requirements of the new platform (see the mapping sketch after this list).
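A minimal sketch of this kind of mapping is shown below; both the source keys and the Dalet Flex field names are illustrative, since the real mapping came from the documented field requirements.

```python
# Illustrative mapping from Reach Engine metadata keys to Dalet Flex field names.
FIELD_MAP = {
    "asset_title": "title",
    "asset_description": "description",
    "file_format": "mediaFormat",
    "resolution": "videoResolution",
}

def transform_record(source: dict) -> dict:
    """Rename source fields and apply simple normalization for Dalet Flex ingestion."""
    target = {flex_key: source.get(source_key) for source_key, flex_key in FIELD_MAP.items()}
    # Example normalization rule (assumed): lowercase format identifiers.
    if target.get("mediaFormat"):
        target["mediaFormat"] = target["mediaFormat"].lower()
    return target

print(transform_record({"asset_title": "Promo Clip 001", "file_format": "MXF"}))
```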
Loading the Data into Amazon S3
Once the transformation was complete, the final step involved loading the transformed data into Amazon S3. This staging area provided a secure and scalable storage solution where the data could be temporarily held before being migrated to Dalet Flex. The transformed data in Amazon S3 was organized and prepared for the final migration phase, ensuring that all metadata was properly validated and ready for transfer.
Step 5: Load into Dalet Flex
This workflow outlines a data transformation pipeline designed to prepare JSON files for ingestion into Dalet Flex. It uses AWS services such as S3, DataSync, and Glue to automate and scale the process efficiently:
- Source of Truth S3 Bucket: Raw JSON files are stored in a centralized S3 bucket acting as the source of truth.
- AWS DataSync to Transformation Bucket: DataSync transfers the contents of the source bucket to a transformation S3 bucket for staging and processing.
- Sanitize Filenames with AWS Glue: A Glue script standardizes file names to ensure Spark compatibility during downstream processing (a simplified sketch follows this list).
- Purge Unwanted Files: A second Glue job identifies and removes unwanted or unnecessary JSON files based on defined criteria.
- Conflict Detection: Another Glue script scans for content conflicts (e.g., schema mismatches) to determine whether transformation rules need updates.
- Data Transformation: A final Glue job applies the latest transformation logic to reformat JSON files according to Dalet Flex requirements.
- DataSync to Dalet Flex Buckets: The transformed files are moved to the final S3 buckets (prod and dev) using DataSync, where they are ready for ingestion into the Dalet Flex platform.
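The snippet below is a simplified, boto3-based sketch of the filename sanitization step; the bucket name, prefix, and character rules are assumptions rather than the production Glue script.

```python
import re
import boto3

s3 = boto3.client("s3")
BUCKET = "transformation-bucket"  # hypothetical staging bucket

def sanitize(key: str) -> str:
    """Replace characters that commonly break Spark paths with underscores (assumed rule set)."""
    return re.sub(r"[^A-Za-z0-9/._-]", "_", key)

paginator = s3.get_paginator("list_objects_v2")
for page in paginator.paginate(Bucket=BUCKET, Prefix="incoming/"):
    for obj in page.get("Contents", []):
        key = obj["Key"]
        clean_key = sanitize(key)
        if clean_key != key:
            # Copy to the sanitized key, then remove the original object.
            s3.copy_object(Bucket=BUCKET, CopySource={"Bucket": BUCKET, "Key": key}, Key=clean_key)
            s3.delete_object(Bucket=BUCKET, Key=key)
```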
Conclusion
Database migrations, particularly those involving legacy systems and evolving metadata requirements, are inherently complex and demand careful orchestration across infrastructure, data integrity, and workflow compatibility. The transition from Reach Engine to Dalet Flex highlights the importance of tailored pipelines that balance technical constraints with business objectives. In media environments, where metadata accuracy and content accessibility are critical, aligning migration strategies with platform-specific schemas is essential.
How TrackIt Can Help
TrackIt, an AWS Advanced Tier Services Partner with deep expertise in media workflows, brings the technical capability and industry understanding required to execute such transformations efficiently. By leveraging AWS-native services and a media-focused approach, TrackIt can help media organizations modernize their infrastructure while ensuring operational continuity and data fidelity.
About TrackIt
TrackIt is an international AWS cloud consulting, systems integration, and software development firm headquartered in Marina del Rey, CA.
We have built our reputation on helping media companies architect and implement cost-effective, reliable, and scalable Media & Entertainment workflows in the cloud. These include streaming and on-demand video solutions, media asset management, and archiving, incorporating the latest AI technology to build bespoke media solutions tailored to customer requirements.
Cloud-native software development is at the foundation of what we do. We specialize in Application Modernization, Containerization, Infrastructure as Code and event-driven serverless architectures by leveraging the latest AWS services. Along with our Managed Services offerings which provide 24/7 cloud infrastructure maintenance and support, we are able to provide complete solutions for the media industry.