Apr. 01, 2020
TrackIt deployed the Magento on-line selling platform in Amazon Web Services using a variety of AWS infrastructure while encountering challenges along the way. This application note details the process.
The client website, a Magento ecommerce store, written in PHP
The Product Information Management (PIM) system, which provides manipulation of the product sets without requiring access to the store backend
Client’s Custom Connection application, which is a bridge between the store, the marketplace and the logistics environment (warehouse, shipping orders, etc.)
The Mirakl-HiPay connector that integrates the cash-out operations between HiPay and Mirakl
Client’s Custom application that processes orders and prints shipping labels
Data storage for all media and statics are hosted on an NFS server distributed to the Magento instances
The front end and back end of Magento is hosted on AWS Elastic Beanstalk, with separate development and a production environments
The PIM, Custom Connection application, and Mirakl connectors are hosted on Beanstalk
The Redis in-memory data store is hosted on AWS ElastiCache
AWS CloudFront is utilized as the Content Distribution Network (CDN)
Databases are hosted on AWS Relational Database Service (RDS)
OpenVPN, Ansible, Rundeck, Microsoft Windows Servers, Gateway Servers and phpMyAdmin are hosted on AWS EC2
EC2 (and operating environments) is a client-managed service; all other services are managed by AWS
The Magento software is deployed and managed by Beanstalk. Two nearly identical environments were created, one for pre-production and one for production.
In the production environment there are two beanstalk environments. The first is for the front end servers that serve traffic to clients. The second is for the back office server which handles tasks such as reindexing, product import, routines, and the access interface to the backend web.
There are four front end servers and one backend server. They are in two different target groups, and are both served by a single Application Load Balancer. This ALB uses a path-based routing rule, which directs all traffic on /backend/* to the back office server, and the rest to the frontend server.
The code is kept in sync between the front end and back office server.
The source of the code is in a repository handled by the client.
Beanstalk configuration files, that lists all the actions performed by the instance during deployment, are stored in this repository “email@example.com: client/beanstalk-config.git”, on the branch “prod” for production, and “preprod” for pre-production.
The configuration file with the secret parameters (keys, database connections, etc) is kept on AWS S3, and was pulled by the instances during the deployment.
The Magento instances communicates with these services:
The deployment process was handled by Rundeck. There is not automatic deployment when a new commit is pushed, but it is a “one button” deployment.
Rundeck deployment process:
Beanstalk deployment process:
The PIM is deployed on Beanstalk, but the deployment is much simpler. The deployment only requires:
● Setting up the credentials
● Updating the DB scheme from the code
● Chowning the repository to the correct user permissions
The PIM Beanstalk configuration can be found on the same repository as Magento’s, except it is in the branch master in the “pim” directory.
There is only one instance being deployed, as high availability is not required.
The instance is accessible from the internet via the load balancer DNS name, in HTTP only.
It is connected to an RDS instance database, shared with the Custom Connection application and Mirakl connector.
The Client’s in-house developed Custom Connection application deployment is almost identical to the PIM, except it also has an ftp server, used to transfer product images and other items to the server.
The instance is not directly publicly accessible as it goes through a gateway server (detailed later in this document) that only exposes the required ftp ports.
The ftp user and password are created during the deployment, as well as the vsftpd configuration.
The Beanstalk configuration files can be found in the same repository and branch as the PIM, but in the Custom Connection application’ own directory.
The web root directory (/var/app/current) needs to be accessible, and writeable, from ftp. Since the web user is distinct from the ftp user, the directories in the web root directory need to be executable by anyone in the web user group, so they are chmoded to 775.
The instance is accessible from the internet via the load balancer DNS name, in HTTP only.
It is connected to an RDS instance database and shared with the PIM and Mirakl connector.
The Mirakl-HiPay connector deployment is almost identical to the PIM and Custom Connection application deployments.
The configuration file must be added, along with the Apache configuration. A cron operation is also added.
The Beanstalk configuration can be found in the same repository and branch as the Custom Connection application, except in the “mirakl-cashout” directory.
The instance is accessible from the internet via the load balancer DNS name, in HTTP only. It is connected to an RDS instance database, shared with the Custom Connection application and PIM.
The Custom application, developed by the client, is neither managed nor deployed via an AWS service. It runs on a Windows instance and all the software in it was installed manually.
There are two attached EBS volumes. Some of its ports are accessible from the internet via the gateway server. Those accessible ports are for RDP from the internet, and access to an ftp server running on the instance.
The following services are managed by Packer and Terraform. Packer builds the AMI to run on those servers, and Terraform launches those servers.
The packer configuration will first copy private keys to the instance (private keys are persistent such that if the image needs to be updated the same keys are used), then an installation script is run to generate client configurations.
The client creation script is located in /root/client-configs/create_user.sh. The only parameter required by this script is a unique username.
An Elastic IP is attached to the instance.
There are two separate parts to the Packer configuration for Rundeck.
Rundeck must have persistent data for its different jobs and configuration, and the data cannot be lost if the instance is re-deployed. The Packer AMI does not allow nor provide persistent data on its instances however. Therefore, all of the persistent data for Rundeck is stored and maintained on a mounted AWS EBS volume.
First, there is a Packer configuration to install Rundeck itself on an instance, and to then build an AMI for it. There is nothing particularly custom added here, only the specific dependencies for Rundeck.
Second, there is a Packer configuration to deploy the EBS volume attached to the instance that stores the persistent data for Rundeck. This configuration is only needed for bootstrapping the EBS volume, as once it is configured it should not have to be restarted from scratch.
The web portal for Rundeck is accessible on the internal domain name (rundeck.internal-client.url) on port 4440. It is only accessible from inside the VPC, with no external access allowed.
The Packer configuration for the EFS server is relatively light, as it only needs to install its specific dependencies.
The actual configuration of shares, and mounting of EBS volumes, is performed by a user_data script in the Terraform configuration. This script attaches the EBS volumes to mount points, and shares those mount points in the exports file.
The exported shares are the following:
The gateway server is an instance running HAProxy. It provides public access for certain services, without completely exposing the instance.
The exposed services are all running on a Windows instance:
The packer configuration is fairly trivial. Tt installs the HAProxy packet, and copies the configuration file.
For security purposes, the instance is not completely open to the internet, the security group is a whitelisting of IP addresses. The IP addresses authorized to access this instance are the following :
Three different databases are utilized. All are accessible only from inside the VPC.
The largest (in this case two r4.4xlarge instances) is the production database. These instances implement high availability by utilizing synchronous replication to another instance in another availability zone. Only the production website connects with the production database.
The second is the pre-production database. It is significantly smaller than the production database with only one r4.large). Only the pre-production website connects to it.
The third database shared is by the PIM and the Custom Connection application.
For each database, a snapshot is performed daily and retained until the next snapshot.
Elasticache is implemented for two Redis clusters; one cluster for the production environment and one for the pre-production.
The production Redis cluster is designed for high availability, with a primary node and a read replica node. Automatic failover will occur if the primary fails.
The pre-production cluster does not have a high availability implementation.
Cloudfront is used as a CDN for the media and static files of the website. Two distributions are configured: one for the media files and one for the static files.
Both distributions are configured according to Magento recommendations for Cloudfront.
They both use only HTTPS, and have domain names in the form .client.url.
Both distributions point to the production instances load balancer, but with /pub/static or /pub/media added to the url.
S3 is used to store multiple type of data. None exposed publicly.
A SSL certificate for the domain *.client.url is imported to the Certificate Manager. This certificate is added to the CloudFront distribution, and to the production load balancer.
Since the domain name .client.url is not managed by AWS, this certificate was simply imported, not generated.
Cloudwatch is used for monitoring of the production environment. The monitoring is split in two parts; alarms, and a monitoring dashboard.
All metrics that AWS generates on from services, as well as custom ones, are accessible from CloudWatch.
When alarms are triggered, an email is sent to a mailing list so that actions can be taken to resolve the problem.
Those alarms are as follow:
The monitoring dashboard provides representation of the important metrics of the production environment at a glance.
Metrics displayed in the production dashboard:
All services and instances are in the same VPC, which has the following IP range: XXX.0.0.0/16
There are 6 subnets in the VPC, 2 per availability zone; one public and one private.
The public subnets are:
The private subnets are:
The VPC has an Internet Gateway to communicate with the external Internet, and each private subnet has a NAT to provide connectivity to the internet.
By convention, all instances that do not require high availability are hosted on the first subnet (XXX.0.1.0/24, XXX.0.101.0/24); either the public or the private one.
There are also three VPN IPSec links that are connected to different parts of the client infrastructure and are needed to access different services.
There is also a VPN access to this network via an OpenVPN server. All subnets are accessible when connected to this VPN.
AWS Route 53 is used to manage the DNS of two domain names. The first one is .internal-client.url. which is used for internal domain names (e.g. rundeck, windows instance, nfs server), and is not publicly advertised (it is only advertised by the internal AWS DNS servers). This domain is registered with AWS.
The second one is the .client.url. domain. The ownership of the domain is not AWS.
Subdomains in use for the domain .internal-client.url.:
Subdomain in use solely for the AWS infrastructure, for the domain .client.url.:
Disable the maintenance mode on the store. To do so, run this command on the four production instances: “sudo -u webapp php /var/app/current/bin/magento maintenance:disable”
If an error happens during a deployment, Beanstalk will automatically stop the deployment and will not re-attach the instance to the Load Balancer. Beanstalk will print details about the error (such as the instance id on which the error happened), and a truncated version of the error message in the log file.
To troubleshoot the problem, reconnect via SSH to the instance on which the error happened. To find the IP address of the instance, get the instance id given by Beanstalk (for instance i-012e92fb498ad40d7). Then go into the EC2 console:
e.g.: https://eu-west-3.console.aws.amazon.com/ec2/v2/home?region=eu-west-3#Instances:sort=tag:Name, select the instance with the corresponding ID, and retrieve the private IP of the instance.
Tail the log file (via the command tail) /var/log/eb-activity.log. This file logs all the commands executed by Beanstalk during a deployment.
If there was an error during the deployment, it will be logged in this file.
After having found and corrected the cause of the error, relaunch the deployment in the same manner as before, via Rundeck.
Specifically Beanstalk related to high availability:
When deploying a new Magento website, databases migrations are frequent. Blue/Green deployment are thus very difficult, since a migration from the new environment will impact the old deployment, and break the website.
It would be possible to have one database per Blue/Green environment, but this adds significant complexity due to the requirement to sync databases that have diverged; especially difficult when handling payments, so it’s not really feasible.
Therefore deployments require downtime.
Since deployments are performed sequentially on each host, the website needs to be in maintenance mode when occurring, again, since online database upgrades would break the website. It should also be noted that maintenance mode is a database setting, so it cannot be activated/deactivated for single hosts.
While loading the website when in maintenance mode, it will return an HTTP code of 5xx. This unfortunately confuses the Beanstalk health check run once a deployment is completed on each host, and the deployment will fail.
To address this issue maintenance mode must be deactivated after each host has finished its deployment. The website is available for end users, but in a broken state since the code on the host may not match the database version. The statics and cache that are generated after the deployment may be affected as well.