Mar. 09, 2023
This tutorial demonstrates how to configure the Deadline Spot Plugin for bursting AWS cloud rendering jobs.
There are three prerequisites and one limitation to consider before proceeding with the setup of the Spot Plugin.
Prerequisite #1: At least one node must be running Pulse (or at least the Remote Connection Server, RCS) for instances to be requested correctly. It is also preferable to prevent Workers from doing House Cleaning during this process.
Prerequisite #2: To avoid unexpected expenses, it is required to disable the submission of jobs without groups (i.e. ‘none’ group jobs).
Prerequisite #3: It is important to have a reliable connection between the Workers’ VPC and the projects/Deadline repository storages. This can be done either using Direct Connect or site-to-site VPN.
Limitation: Only one Region can be configured per repository on the Spot Plugin. For multiple Regions, users will need to configure new repositories.
This section describes the configuration of Roles and Profiles needed for the Spot Event Plugin.
Spot credentials are used by the Spot Event Plugin and contain the permissions necessary to create, maintain, and modify the Spot Fleets and optionally those permissions necessary to deploy the Resource Tracker. The required permissions can be granted by attaching the following AWS-managed IAM policies to the IAM user:
These credentials will be used when configuring Deadline. Readers can consult the step-by-step tutorial on how to create IAM Users here. In the example presented in this tutorial, the user created is called DeadlineSpotEventPluginAdmin.
The Spot Fleet IAM Instance Profile is an IAM role used by the Workers that are started by the Spot Fleet Requests. The role is used to give the Workers permission to terminate themselves, determine what Group they are part of, and report their status to the Resource Tracker (if in use).
The required permissions can be assigned by attaching the following AWS-managed IAM policy to an IAM role.
Readers can consult the step-by-step tutorial on how to create IAM Roles here. In the example presented in this tutorial, the role created is called DeadlineSpotWorker. Note that the name of this role has to begin with “DeadlineSpot”. Additional permissions can be added to the role for AWS Systems Manager access and to join domains seamlessly.
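The naming rule above is easy to miss when roles are created by hand or by script. A minimal sketch of a pre-flight check follows; the function name `is_valid_worker_role_name` is hypothetical and not part of Deadline or AWS, and the check only covers the prefix rule stated in this tutorial.

```python
# Prefix required for the Spot Fleet IAM Instance Profile role name,
# as stated in this tutorial.
REQUIRED_PREFIX = "DeadlineSpot"


def is_valid_worker_role_name(role_name: str) -> bool:
    """Return True if the role name starts with the required prefix.

    Hypothetical helper: it checks the naming rule only, not the
    attached policies or the existence of the role.
    """
    return role_name.startswith(REQUIRED_PREFIX)


if __name__ == "__main__":
    for name in ("DeadlineSpotWorker", "SpotWorkerDeadline"):
        status = "ok" if is_valid_worker_role_name(name) else "rejected"
        print(f"{name}: {status}")
```

The comparison is case-sensitive on purpose, since the tutorial gives the prefix with this exact capitalization.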
The IAM Fleet Role is used directly by the Spot Fleet. It gives the Spot Fleet the permissions required to start, stop, and tag instances. By default, a role will automatically be created for the account called aws-ec2-spot-fleet-tagging-role containing all the permissions that are needed. If the role has not been created, users can assign the following policies:
o AmazonEC2SpotFleetTaggingRole
o AWSThinkboxDeadlineSpotEventPluginWorkerPolicy
o AWSThinkboxDeadlineSpotEventPluginAdminPolicy
The Resource Tracker Role is an IAM role used by Deadline Resource Tracker to access the AWS resources that it creates in the account. The IAM role must have the following settings:
When creating Spot Fleet Requests, an Amazon Machine Image (AMI) is required for each Spot Fleet. The AMIs represent the base states for each of the instances. Readers can consult a tutorial on how to create a custom AMI here.
To ensure the proper functioning of the fleet, certain procedures need to be executed at runtime. For this, PowerShell scripts are created and will later be used to create the template. For domain-bound instances, each instance has to join the domain at launch and leave it at termination. For more information, readers can consult the tutorial on how to manage domain membership of a dynamic fleet of EC2 instances.
A Spot Fleet Request defines a collection of Spot instances and their launching parameters. The Spot Event Plugin uses a separate Spot Fleet Request for each Deadline Group. With the images, roles, and users already created, this section covers the creation of the launch template and the JSON (JavaScript Object Notation) that will be used to request it.
1) Go to Services → EC2 and click AMIs
2) Find the AMI to be used as a base for the launch template, right-click it and choose Images and templates → Create template from instance
3) Create a name for the launch template
4) Create a description for the launch template
5) Make sure the AMI ID is the one originally selected
6) Under Storage (volumes) select the volume size (the size can only be scaled up from the image's size, so create the image with a volume that meets the minimum required size without being unnecessarily large)
7) Under Resource tags, click Add tag (*These steps must be followed in order for the plugin to work)
8) Under Network interfaces make sure that Auto-assign public IP is set. Set the subnet to Production Private Subnet 1, and Security group ID to the one corresponding to deadline-spot-SG
9) Open Advanced Details
10) Click Create launch template
A Spot Request needs to be created to enable the automatic launching of Workers by Deadline:
1) In the EC2 Console, click on Spot Requests on the left
2) Click Request Spot Instances
3) Under Launch Parameters, select Use a launch template and choose the template created in the previous section
4) Most of the settings can be left as default
5) Scroll down and select the Total Target Capacity
6) Select the checkbox next to Maintain Target Capacity
7) Once ready, DO NOT CLICK Launch
8) Scroll down to the bottom and click JSON config in the right corner. This downloads a file called config.json that will be modified and used as a base for the requests
Below is the edited version of the config.json file. Notice that an “Overrides” field was added under “LaunchTemplateConfigs” in order to further filter instances and add weight. Fields corresponding to a new group name can also be added in order to create another fleet. The “Version” field has been changed to $Latest.
{
  "awsspot": {
    "IamFleetRole": "arn:aws:iam:::role/aws-ec2-spot-fleet-tagging-role",
    "AllocationStrategy": "capacityOptimizedPrioritized",
    "TargetCapacity": 4,
    "ValidFrom": "2022-07-25T02:52:51.000Z",
    "ValidUntil": "2023-07-25T02:52:51.000Z",
    "TerminateInstancesWithExpiration": true,
    "Type": "maintain",
    "OnDemandAllocationStrategy": "lowestPrice",
    "LaunchSpecifications": [],
    "LaunchTemplateConfigs": [
      {
        "LaunchTemplateSpecification": {
          "LaunchTemplateId": "-",
          "Version": "$Latest"
        },
        "Overrides": [
          {
            "InstanceType": "c6i.16xlarge",
            "WeightedCapacity": 1,
            "SubnetId": "-",
            "Priority": 1
          },
          {
            "InstanceType": "c6i.16xlarge",
            "WeightedCapacity": 1,
            "SubnetId": "-",
            "Priority": 1
          },
          {
            "InstanceType": "m6i.12xlarge",
            "WeightedCapacity": 1,
            "SubnetId": "-",
            "Priority": 5
          },
          {
            "InstanceType": "m6i.12xlarge",
            "WeightedCapacity": 1,
            "SubnetId": "-",
            "Priority": 5
          },
          {
            "InstanceType": "m6i.8xlarge",
            "WeightedCapacity": 1,
            "SubnetId": "-",
            "Priority": 3
          },
          {
            "InstanceType": "m6i.8xlarge",
            "WeightedCapacity": 1,
            "SubnetId": "-",
            "Priority": 3
          },
          {
            "InstanceType": "r6i.12xlarge",
            "WeightedCapacity": 1,
            "SubnetId": "-",
            "Priority": 6
          },
          {
            "InstanceType": "r6i.12xlarge",
            "WeightedCapacity": 1,
            "SubnetId": "-",
            "Priority": 6
          },
          {
            "InstanceType": "r6i.8xlarge",
            "WeightedCapacity": 1,
            "SubnetId": "-",
            "Priority": 4
          },
          {
            "InstanceType": "r6i.8xlarge",
            "WeightedCapacity": 1,
            "SubnetId": "-",
            "Priority": 4
          },
          {
            "InstanceType": "r6i.4xlarge",
            "WeightedCapacity": 1,
            "SubnetId": "-",
            "Priority": 2
          },
          {
            "InstanceType": "r6i.4xlarge",
            "WeightedCapacity": 1,
            "SubnetId": "-",
            "Priority": 2
          }
        ]
      }
    ]
  }
}
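Maintaining a long “Overrides” list by hand is error-prone, so the same structure can be generated. The sketch below is one possible way to do it in Python. It assumes that each instance type appears twice in the example because there is one entry per subnet (the subnet IDs are redacted in the original, so this is a guess), and every identifier such as `subnet-PLACEHOLDER-A` or `lt-PLACEHOLDER` is a made-up placeholder. The helper names `build_overrides` and `build_group_config` are hypothetical.

```python
import json


def build_overrides(priorities, subnet_ids, weight=1):
    """Return one override entry per (instance type, subnet) pair.

    priorities maps an instance type to its "Priority" value.
    """
    overrides = []
    for instance_type, priority in priorities.items():
        for subnet_id in subnet_ids:
            overrides.append({
                "InstanceType": instance_type,
                "WeightedCapacity": weight,
                "SubnetId": subnet_id,
                "Priority": priority,
            })
    return overrides


def build_group_config(group, fleet_role_arn, template_id, overrides,
                       target_capacity):
    """Wrap the overrides in a Spot Fleet Request keyed by the Deadline Group."""
    return {
        group: {
            "IamFleetRole": fleet_role_arn,
            "AllocationStrategy": "capacityOptimizedPrioritized",
            "TargetCapacity": target_capacity,
            "TerminateInstancesWithExpiration": True,
            "Type": "maintain",
            "LaunchSpecifications": [],
            "LaunchTemplateConfigs": [{
                "LaunchTemplateSpecification": {
                    "LaunchTemplateId": template_id,
                    "Version": "$Latest",
                },
                "Overrides": overrides,
            }],
        }
    }


# Placeholder values for illustration only.
priorities = {"c6i.16xlarge": 1, "r6i.4xlarge": 2, "m6i.8xlarge": 3}
subnets = ["subnet-PLACEHOLDER-A", "subnet-PLACEHOLDER-B"]
config = build_group_config(
    group="awsspot",
    fleet_role_arn="arn:aws:iam:::role/aws-ec2-spot-fleet-tagging-role",
    template_id="lt-PLACEHOLDER",
    overrides=build_overrides(priorities, subnets),
    target_capacity=4,
)

if __name__ == "__main__":
    print(json.dumps(config, indent=2))
```

The sketch leaves out the “ValidFrom” and “ValidUntil” fields from the example; they can be added to the dictionary in the same way if the request should expire.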
This section covers the configuration of the Deadline Spot Plugin. To access the plugin, users need to enter Super User mode in the Deadline Monitor.
Go to Tools → Configure Events → Spot. Multiple fields are required:
Configuration
Example:
{
  "group_name": ["pool1", "pool2"],
  "2nd_group_name": ["pool3"]
}
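Because the plugin uses a separate Spot Fleet Request for each Deadline Group, a group listed in the example above with no matching fleet configuration is a likely source of mistakes. The following sketch cross-checks the two JSON documents. It assumes the example above maps group names to pools and that the fleet configuration is keyed by group name, as in the config.json shown earlier. The function `groups_missing_fleet_config` is a hypothetical helper, not part of Deadline.

```python
import json


def groups_missing_fleet_config(pool_mapping_json, fleet_config_json):
    """Return the groups in the pool mapping that have no fleet request.

    Both arguments are JSON strings whose top-level keys are group names.
    """
    pool_mapping = json.loads(pool_mapping_json)
    fleet_config = json.loads(fleet_config_json)
    return sorted(group for group in pool_mapping if group not in fleet_config)


# Illustrative inputs; the fleet request bodies are left empty on purpose.
pool_mapping = '{"group_name": ["pool1", "pool2"], "2nd_group_name": ["pool3"]}'
fleet_config = '{"group_name": {}}'

if __name__ == "__main__":
    missing = groups_missing_fleet_config(pool_mapping, fleet_config)
    print("Groups without a Spot Fleet Request:", missing)
```

A non-empty result means at least one group has pools assigned but no fleet request to serve it.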
For users who want bursting render farm capabilities, the following steps have to be followed:
{ "hybrid_group_1": ["cloud"], "hybrid_group_2": ["cloud"] }
When submitting jobs, it is important to choose the desired hybrid group (for example “hybrid_group_1”). In the Pool options, select “onprem” as the primary Pool and “cloud” as the Secondary Pool. Otherwise, cloud nodes may be requested even when on-premise nodes are sufficient.
Pool submission options
The Spot Plugin keeps instances active as long as they have work. Cloud nodes cannot be selectively prevented from taking jobs in the “none” group. If such jobs are allowed, cloud nodes can pick them up and keep running even when they are not required, leading to unpredictable costs. The only way to stop cloud nodes from picking up jobs in the “none” group is to prevent all nodes from taking them. This setting can be configured under Configure Repository Options → Worker Settings → Exclude Jobs in the ‘none’ Group, and Super User access is needed to change it. It is also good practice to stop accepting jobs in the “none” pool, which avoids unpredictable behavior.
Repository Options: Exclude Jobs in the ‘none’ Group
There are no changes from the end user’s perspective. The user selects the group configured in the plugin and the request is automatically sent to AWS. The estimated time to start and configure instances is between 10 and 12 minutes. If the new instances take longer than 15 minutes to appear on the Dashboard, the logs must be checked. This will be covered in the Troubleshooting section.
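The 15-minute rule above can be turned into a simple check for anyone scripting their monitoring. This is a minimal sketch, assuming the request time is recorded by the operator; `should_check_logs` is a hypothetical helper and does not query Deadline or AWS.

```python
from datetime import datetime, timedelta

# Threshold taken from the tutorial: check the logs after 15 minutes.
LOG_CHECK_THRESHOLD = timedelta(minutes=15)


def should_check_logs(requested_at, now, instance_visible):
    """Return True if the instance is still missing past the threshold."""
    if instance_visible:
        return False
    return now - requested_at > LOG_CHECK_THRESHOLD


if __name__ == "__main__":
    requested = datetime(2023, 3, 9, 12, 0)
    for minutes in (11, 16):
        now = requested + timedelta(minutes=minutes)
        print(minutes, "min:", should_check_logs(requested, now, False))
```

An instance that appears within the typical 10 to 12 minutes never triggers the check.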
This section covers some key troubleshooting areas.
TrackIt is an Amazon Web Services Advanced Consulting Partner specializing in cloud management, consulting, and software development solutions based in Marina del Rey, CA.