Storage Management

Refer to Introduction to learn more about shared storage configuration in IDEA.

Apps and Data Storage (Required)

For the IDEA Cluster to function, the shared storage configuration must include both Apps and Data storage. Both are cluster-scoped file systems and are mounted automatically on all applicable infrastructure hosts, eVDI Linux sessions, and SOCA Compute Nodes.

Apps

  • Apps shared storage holds critical cluster configuration scripts, files, and logs.

  • For Scale-Out Computing workloads, additional applications (e.g., OpenMPI or Intel MPI, Python, solvers) can be installed on Apps shared storage and used by Compute Nodes.

  • Default Configuration:

    • Apps storage is mounted at /apps by default; the mount path is configurable.

    • Amazon EFS is the default storage provider for Apps storage.

    • Custom CloudWatch monitoring rules and a Lambda function are deployed for EFS Apps storage volumes. They monitor the file system's throughput and dynamically switch its throughput mode between provisioned and bursting.
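This guide does not describe the exact rules the Lambda function applies. As a rough illustration only, the kind of decision it makes can be sketched as a shell function that compares the EFS BurstCreditBalance CloudWatch metric (reported in bytes) with a threshold. The function name choose_throughput_mode, the threshold, and the rule itself are assumptions, not IDEA's implementation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: NOT the Lambda function that IDEA deploys.
# Decide which EFS throughput mode to use from the current value of the
# BurstCreditBalance CloudWatch metric (bytes) and an assumed threshold (bytes).
choose_throughput_mode() {
  local credits="$1" threshold="$2"
  if [ "$credits" -lt "$threshold" ]; then
    # Burst credits are running low: provisioned throughput avoids throttling.
    echo provisioned
  else
    # Enough credits remain: bursting mode is sufficient.
    echo bursting
  fi
}

# Applying a decision would use the AWS CLI, for example:
#   aws efs update-file-system --file-system-id <FILE_SYSTEM_ID> \
#     --throughput-mode provisioned --provisioned-throughput-in-mibps <MIBPS>
```

For example, choose_throughput_mode 500000000 1000000000 prints provisioned. A real rule would likely also need some hysteresis so the mode does not flip back and forth between checks.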

Data

  • Data storage is primarily used to store User Home Directories.

  • Additional directories for project/group level file shares can be created on Data Storage.

  • Default Configuration:

    • Data storage is mounted at /data by default; the mount path is configurable.

    • Amazon EFS is the default storage provider for Data storage.

    • To reduce cost, an EFS lifecycle policy moves files that have not been accessed for 30 days to the Infrequent Access (IA) storage class.
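To check that a file system has this policy, you can inspect the output of the AWS CLI describe-lifecycle-configuration command. The sketch below assumes that output is piped in; has_30_day_ia_policy is a hypothetical helper name, and the plain-text match is a simplification rather than a JSON parser.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Reads the JSON printed by
#   aws efs describe-lifecycle-configuration --file-system-id <FILE_SYSTEM_ID>
# on stdin and succeeds only if a 30-day transition to IA is configured.
# This is a plain-text match, not a full JSON parser.
has_30_day_ia_policy() {
  grep -Eq '"TransitionToIA"[[:space:]]*:[[:space:]]*"AFTER_30_DAYS"'
}

# Setting the same policy by hand would look like:
#   aws efs put-lifecycle-configuration --file-system-id <FILE_SYSTEM_ID> \
#     --lifecycle-policies '[{"TransitionToIA":"AFTER_30_DAYS"}]'
```

For example: aws efs describe-lifecycle-configuration --file-system-id <FILE_SYSTEM_ID> | has_30_day_ia_policy && echo "policy set".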

Scope

IDEA introduces the notion of scope so that cluster administrators can manage multiple file systems and specify mount criteria based on access, use case, and workload needs. Shared storage mounts can be scoped by:

Cluster

Cluster scoped shared storage mounts are applied across all nodes in the cluster. These include applicable infrastructure nodes, SOCA Compute Nodes and eVDI Hosts.

Scale-Out Computing: Queue Profiles

Queue Profile scoped shared storage mounts apply to all Compute Nodes launched for jobs submitted to the queues configured under that Queue Profile.
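The two scopes can be summarized as a small decision function. This is a hypothetical sketch of the rules described above, not the logic IDEA uses internally; the function name should_mount and the scope and node-type labels (cluster, queue_profile, infra, evdi, compute) are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the scope rules; NOT IDEA's internal code.
# Usage: should_mount SCOPE NODE_TYPE NODE_QUEUE_PROFILE FS_QUEUE_PROFILES
#   SCOPE              "cluster" or "queue_profile" (assumed labels)
#   NODE_TYPE          "infra", "evdi" or "compute" (assumed labels)
#   NODE_QUEUE_PROFILE queue profile a compute node was launched for ("" otherwise)
#   FS_QUEUE_PROFILES  space-separated queue profiles the file system targets
# Prints "yes" if the file system should be mounted on the node, else "no".
should_mount() {
  local scope="$1" node_type="$2" node_profile="$3" fs_profiles="$4" p
  case "$scope" in
    cluster)
      # Cluster scope: every applicable node mounts the file system.
      echo yes ;;
    queue_profile)
      # Queue Profile scope: only compute nodes launched for a listed profile.
      if [ "$node_type" = compute ]; then
        for p in $fs_profiles; do
          if [ "$p" = "$node_profile" ]; then
            echo yes
            return 0
          fi
        done
      fi
      echo no ;;
    *)
      echo no ;;
  esac
}
```

For example, should_mount queue_profile compute gpu 'gpu' prints yes, while the same file system on an eVDI host (should_mount queue_profile evdi '' 'gpu') prints no.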

Add or Attach Shared Storage to Cluster

The idea-admin.sh shared-storage utility enables admins to generate configurations for:

  • Provisioning new file systems

  • Reusing existing file systems

Either use case can be run before the initial cluster deployment or after the cluster is deployed.

If shared storage configurations are updated after an IDEA Cluster is deployed, then depending on the scope, manual steps are required to mount the file system on applicable existing cluster nodes. All hosts launched after the configuration update automatically mount the configured file systems. See the examples below.
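One way to mount a file system by hand on an existing node within its scope is to add an /etc/fstab entry built from the file system's dns, mount_dir and mount_options settings (see the shared-storage.<name>.* entries listed under Remove a File System). The sketch below assumes mount_options holds the fstab fields that follow the mount point (type, options, dump, pass), which is how the example value reads; the helper name fstab_entry is hypothetical, and IDEA's own mount scripts may do this differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper. Assumes mount_options holds the fstab fields that follow
# the mount point (type, options, dump, pass), as in the example config value.
# Prints one /etc/fstab line for an EFS file system; modifies nothing.
fstab_entry() {
  local dns="$1" mount_dir="$2" mount_options="$3"
  # EFS exports its root as "<dns>:/", matching the df output in this guide.
  printf '%s:/ %s %s\n' "$dns" "$mount_dir" "$mount_options"
}

# Manual steps on an existing node (run as root; review before applying):
#   mkdir -p /custom_path
#   fstab_entry <DNS> /custom_path "<MOUNT_OPTIONS>" >> /etc/fstab
#   mount /custom_path
```

With the testefs values shown later on this page, fstab_entry produces a line starting with fs-01db45fc6a9eaf20a.efs.us-east-2.amazonaws.com:/ /custom_path nfs4 followed by the NFS options and 0 0.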

Provision new File System

Generating shared storage configurations for new file systems is currently supported only for Amazon EFS.

To generate configurations for provisioning a new file system, use the idea-admin.sh shared-storage add-file-system command as shown below:

Example

The idea-admin.sh utility automatically updates your IDEA cluster environment if you select "Update Cluster Settings and Exit". You can also choose "Deploy", which automates the steps described below. In this example, we only update the cluster settings and then deploy manually.

Once done, you can validate your new mount point in the web interface via "Cluster Management" > "Settings" > "Shared Storage".

At this point, the FileSystem ID is empty because you asked IDEA to provision a brand new EFS. To update the backend infrastructure and trigger the EFS creation, you must run the deploy command (see this page for more details about the deploy utility).

First, run idea-admin.sh cdk diff to confirm that the new EFS will be created:

./idea-admin.sh cdk diff shared-storage \
   --cluster-name <CLUSTER_NAME> \
   --aws-region <REGION> 
Stack <CLUSTER_NAME>-shared-storage
IAM Statement Changes
┌───┬────────────────────────────┬────────┬────────────────────────────────────┬───────────┬──────────────────────────────────────────────────────┐
│   │ Resource                   │ Effect │ Action                             │ Principal │ Condition                                            │
├───┼────────────────────────────┼────────┼────────────────────────────────────┼───────────┼──────────────────────────────────────────────────────┤
│ + │ ${testefs-storage-efs.Arn} │ Allow  │ elasticfilesystem:ClientMount      │ AWS:*     │ "Bool": {                                            │
│   │                            │        │ elasticfilesystem:ClientRootAccess │           │   "elasticfilesystem:AccessedViaMountTarget": "true" │
│   │                            │        │ elasticfilesystem:ClientWrite      │           │ }                                                    │
└───┴────────────────────────────┴────────┴────────────────────────────────────┴───────────┴──────────────────────────────────────────────────────┘
(NOTE: There may be security-related changes not in this list. See https://github.com/aws/aws-cdk/issues/1299)

Resources
[+] AWS::EFS::FileSystem testefs-storage-efs testefsstorageefs
[+] AWS::EFS::MountTarget testefs-storage-efs/testefs-storage-efs-mount-target-1 testefsstorageefstestefsstorageefsmounttarget15EF95F6B
[+] AWS::EFS::MountTarget testefs-storage-efs/testefs-storage-efs-mount-target-2 testefsstorageefstestefsstorageefsmounttarget20E352012
[+] AWS::EFS::MountTarget testefs-storage-efs/testefs-storage-efs-mount-target-3 testefsstorageefstestefsstorageefsmounttarget32982F891
[~] Custom::ClusterSettings <CLUSTER_NAME>-shared-storage-settings ideapatchusharedstoragesettings
 └─ [~] settings
     ├─ [~] .deployment_id:
     │   ├─ [-] d99dccd4-0c05-4535-a13f-8dda34662848
     │   └─ [+] 918057b8-b0c3-4ea5-a92a-6da5569920f5
     ├─ [+] Added: .testefs.efs.dns
     └─ [+] Added: .testefs.efs.file_system_id


The diff confirms the change. CDK creates the EFS when you run the actual deploy command:

$ ./idea-admin.sh deploy shared-storage \
   --cluster-name <CLUSTER_NAME> \
   --aws-region <REGION> \
   --upgrade
deploying module: shared-storage, module id: shared-storage

✨  Synthesis time: 19.54s

<CLUSTER_NAME>-shared-storage: building assets...

[0%] start: Building d70814031a62eca4c91303efaad90b00703c3d9adbcc3b64c4f3de07322adf24:<REDACTED>-us-east-2
[100%] success: Built d70814031a62eca4c91303efaad90b00703c3d9adbcc3b64c4f3de07322adf24:<REDACTED>-us-east-2

<CLUSTER_NAME>-shared-storage: assets built

<CLUSTER_NAME>-shared-storage: deploying...
[0%] start: Publishing d70814031a62eca4c91303efaad90b00703c3d9adbcc3b64c4f3de07322adf24:<REDACTED>-us-east-2
[100%] success: Published d70814031a62eca4c91303efaad90b00703c3d9adbcc3b64c4f3de07322adf24:<REDACTED>-us-east-2
<CLUSTER_NAME>-shared-storage: creating CloudFormation changeset...
<CLUSTER_NAME>-shared-storage | 0/7 | 7:04:12 PM | UPDATE_IN_PROGRESS   | AWS::CloudFormation::Stack | <CLUSTER_NAME>-shared-storage User Initiated
<CLUSTER_NAME>-shared-storage | 0/7 | 7:04:17 PM | CREATE_IN_PROGRESS   | AWS::EFS::FileSystem    | testefs-storage-efs (testefsstorageefs)
<CLUSTER_NAME>-shared-storage | 0/7 | 7:04:18 PM | CREATE_IN_PROGRESS   | AWS::EFS::FileSystem    | testefs-storage-efs (testefsstorageefs) Resource creation Initiated
<CLUSTER_NAME>-shared-storage | 1/7 | 7:04:22 PM | CREATE_COMPLETE      | AWS::EFS::FileSystem    | testefs-storage-efs (testefsstorageefs)
<CLUSTER_NAME>-shared-storage | 1/7 | 7:04:23 PM | CREATE_IN_PROGRESS   | AWS::EFS::MountTarget   | testefs-storage-efs/testefs-storage-efs-mount-target-3 (testefsstorageefstestefsstorageefsmounttarget32982F891)
<CLUSTER_NAME>-shared-storage | 1/7 | 7:04:23 PM | CREATE_IN_PROGRESS   | AWS::EFS::MountTarget   | testefs-storage-efs/testefs-storage-efs-mount-target-2 (testefsstorageefstestefsstorageefsmounttarget20E352012)
<CLUSTER_NAME>-shared-storage | 1/7 | 7:04:23 PM | CREATE_IN_PROGRESS   | AWS::EFS::MountTarget   | testefs-storage-efs/testefs-storage-efs-mount-target-1 (testefsstorageefstestefsstorageefsmounttarget15EF95F6B)
<CLUSTER_NAME>-shared-storage | 1/7 | 7:04:23 PM | UPDATE_IN_PROGRESS   | Custom::ClusterSettings | <CLUSTER_NAME>-shared-storage-settings/Default (ideapatchusharedstoragesettings)
<CLUSTER_NAME>-shared-storage | 1/7 | 7:04:25 PM | CREATE_IN_PROGRESS   | AWS::EFS::MountTarget   | testefs-storage-efs/testefs-storage-efs-mount-target-2 (testefsstorageefstestefsstorageefsmounttarget20E352012) Resource creation Initiated
<CLUSTER_NAME>-shared-storage | 1/7 | 7:04:25 PM | CREATE_IN_PROGRESS   | AWS::EFS::MountTarget   | testefs-storage-efs/testefs-storage-efs-mount-target-1 (testefsstorageefstestefsstorageefsmounttarget15EF95F6B) Resource creation Initiated
<CLUSTER_NAME>-shared-storage | 2/7 | 7:04:27 PM | UPDATE_COMPLETE      | Custom::ClusterSettings | <CLUSTER_NAME>-shared-storage-settings/Default (ideapatchusharedstoragesettings)
<CLUSTER_NAME>-shared-storage | 2/7 | 7:04:32 PM | CREATE_IN_PROGRESS   | AWS::EFS::MountTarget   | testefs-storage-efs/testefs-storage-efs-mount-target-3 (testefsstorageefstestefsstorageefsmounttarget32982F891) Resource creation Initiated
2/7 Currently in progress: <CLUSTER_NAME>-shared-storage, testefsstorageefstestefsstorageefsmounttarget32982F891, testefsstorageefstestefsstorageefsmounttarget20E352012, testefsstorageefstestefsstorageefsmounttarget15EF95F6B
<CLUSTER_NAME>-shared-storage | 3/7 | 7:05:50 PM | CREATE_COMPLETE      | AWS::EFS::MountTarget   | testefs-storage-efs/testefs-storage-efs-mount-target-3 (testefsstorageefstestefsstorageefsmounttarget32982F891)
<CLUSTER_NAME>-shared-storage | 4/7 | 7:05:58 PM | CREATE_COMPLETE      | AWS::EFS::MountTarget   | testefs-storage-efs/testefs-storage-efs-mount-target-1 (testefsstorageefstestefsstorageefsmounttarget15EF95F6B)
<CLUSTER_NAME>-shared-storage | 5/7 | 7:05:59 PM | CREATE_COMPLETE      | AWS::EFS::MountTarget   | testefs-storage-efs/testefs-storage-efs-mount-target-2 (testefsstorageefstestefsstorageefsmounttarget20E352012)
<CLUSTER_NAME>-shared-storage | 6/7 | 7:06:00 PM | UPDATE_COMPLETE_CLEA | AWS::CloudFormation::Stack | <CLUSTER_NAME>-shared-storage
<CLUSTER_NAME>-shared-storage | 7/7 | 7:06:01 PM | UPDATE_COMPLETE      | AWS::CloudFormation::Stack | <CLUSTER_NAME>-shared-storage

 ✅  <CLUSTER_NAME>-shared-storage

✨  Deployment time: 124.62s

Stack ARN:
arn:aws:cloudformation:us-east-2:<REDACTED>:stack/<CLUSTER_NAME>-shared-storage/b7ada700-73d6-11ed-a507-0645fa8e8430

✨  Total time: 144.16s
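If you script the deployment, you can check the result from the AWS CLI instead of reading the log. The sketch below classifies a CloudFormation stack status such as the UPDATE_COMPLETE shown above; the helper name stack_outcome and its grouping of statuses are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: group a CloudFormation stack status into an outcome.
# The status can be fetched with:
#   aws cloudformation describe-stacks --stack-name <CLUSTER_NAME>-shared-storage \
#     --query 'Stacks[0].StackStatus' --output text
stack_outcome() {
  case "$1" in
    # Any rollback or failure means the update did not apply cleanly.
    *ROLLBACK*|*FAILED*) echo failed ;;
    # Still running, including UPDATE_COMPLETE_CLEANUP_IN_PROGRESS.
    *_IN_PROGRESS) echo in-progress ;;
    CREATE_COMPLETE|UPDATE_COMPLETE|IMPORT_COMPLETE) echo succeeded ;;
    *) echo unknown ;;
  esac
}
```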

Now that the deploy command has completed, go back to the web interface and confirm that the new EFS has been created and now has a valid FileSystem ID assigned.
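If you prefer to check from a shell, the sketch below classifies a file_system_id value as empty (not yet provisioned, as before the deploy), a well-formed ID, or something unexpected. The helper name classify_fs_id is hypothetical, and the fs- followed by hexadecimal characters pattern is inferred from the IDs shown in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a file_system_id setting after a deploy.
#   ""             -> pending  (not provisioned yet)
#   fs-<8+ hex>    -> ok       (pattern inferred from the IDs in this guide)
#   anything else  -> invalid
classify_fs_id() {
  if [ -z "$1" ]; then
    echo pending
  elif printf '%s\n' "$1" | grep -Eq '^fs-[0-9a-f]{8,}$'; then
    echo ok
  else
    echo invalid
  fi
}

# To confirm the file system itself is ready (expected output: available):
#   aws efs describe-file-systems --file-system-id <FILE_SYSTEM_ID> \
#     --query 'FileSystems[0].LifeCycleState' --output text
```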

To further validate the new mount point, submit a test job that runs the df command:

qsub -- /bin/df -h

The job output should display the mount point (/custom_path in this example) for your new file system:

Filesystem                                          Size  Used Avail Use% Mounted on
devtmpfs                                            1.9G     0  1.9G   0% /dev
tmpfs                                               1.9G     0  1.9G   0% /dev/shm
tmpfs                                               1.9G  408K  1.9G   1% /run
tmpfs                                               1.9G     0  1.9G   0% /sys/fs/cgroup
/dev/nvme0n1p1                                       10G  3.2G  6.9G  32% /
fs-0175f5f6e34dd73ee.efs.us-east-2.amazonaws.com:/  8.0E  278M  8.0E   1% /apps
fs-0782abfc0e273d46d.efs.us-east-2.amazonaws.com:/  8.0E     0  8.0E   0% /data
fs-01db45fc6a9eaf20a.efs.us-east-2.amazonaws.com:/  8.0E     0  8.0E   0% /custom_path
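If you want the test job to fail when the mount is missing, instead of reading the output by eye, a small filter over the df output is enough. This is a sketch; the helper name mount_source is hypothetical, and it assumes each df entry fits on one line, as in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `df -h` output on stdin and print the source
# (Filesystem column) mounted at the given mount point.
# Exits non-zero if no line has that mount point. Assumes one entry per line.
mount_source() {
  awk -v mp="$1" '
    NR > 1 && $NF == mp { print $1; found = 1 }  # skip the header line
    END { exit !found }                          # 0 if found, 1 otherwise
  '
}
```

Inside a job script this would be used as df -h | mount_source /custom_path. On hosts with util-linux, mountpoint -q /custom_path is a simpler yes/no check.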

Attach existing File System

To generate configurations for attaching an existing file system, use the idea-admin.sh shared-storage attach-file-system command as shown below. The utility automatically searches your VPC for existing storage back ends (Amazon EFS and Amazon FSx for Lustre, NetApp ONTAP, OpenZFS, and Windows File Server).
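To see for yourself which FSx file systems exist in your VPC, you can query it with the AWS CLI. This is a manual check, not the attach-file-system implementation; the helper name list_fsx_in_vpc is hypothetical, and it requires AWS credentials and a configured region.

```shell
#!/usr/bin/env bash
# Hypothetical manual check, NOT the attach-file-system implementation.
# Lists Amazon FSx file systems (ID and type) in the given VPC.
# Requires AWS credentials and a configured region.
list_fsx_in_vpc() {
  local vpc_id="$1"
  # describe-file-systems returns every FSx file system in the region;
  # the JMESPath query keeps only those whose VpcId matches.
  aws fsx describe-file-systems \
    --query "FileSystems[?VpcId=='${vpc_id}'].[FileSystemId,FileSystemType]" \
    --output text
}
```

For Amazon EFS, the VPC is recorded on each mount target rather than on the file system, so a similar check would go through aws efs describe-mount-targets --file-system-id <FILE_SYSTEM_ID>.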

Example

Remove a File System

Run ./idea-admin.sh config delete shared-storage.<filesystem_name> to remove a shared file system from IDEA:

./idea-admin.sh config delete shared-storage.testefs \
 --cluster-name <CLUSTER_NAME> \
 --aws-region <REGION_NAME>

searching for config entries with prefix: shared-storage.testefs
found 14 config entries matching: shared-storage.testefs
deleting config entry - shared-storage.testefs.efs.performance_mode = generalPurpose
deleting config entry - shared-storage.testefs.efs.dns = fs-01db45fc6a9eaf20a.efs.us-east-2.amazonaws.com
deleting config entry - shared-storage.testefs.efs.throughput_mode = bursting
deleting config entry - shared-storage.testefs.efs.transition_to_ia = None
deleting config entry - shared-storage.testefs.title = "New Shared EFS for Project A"
deleting config entry - shared-storage.testefs.mount_options = nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport 0 0
deleting config entry - shared-storage.testefs.efs.removal_policy = DESTROY
deleting config entry - shared-storage.testefs.mount_dir = /custom_path
deleting config entry - shared-storage.testefs.efs.kms_key_id = None
deleting config entry - shared-storage.testefs.efs.encrypted = True
deleting config entry - shared-storage.testefs.provider = efs
deleting config entry - shared-storage.testefs.efs.file_system_id = fs-01db45fc6a9eaf20a
deleting config entry - shared-storage.testefs.efs.cloudwatch_monitoring = False
deleting config entry - shared-storage.testefs.scope = ['cluster']
deleted 14 config entries

Removing a file system from IDEA does not delete the file system itself. If you want to remove a file system that IDEA previously created, make sure to re-deploy the shared-storage module.
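Before re-deploying, it helps to know what the deleted configuration said. In the example above, the testefs entry recorded removal_policy = DESTROY; assuming this maps to the AWS CDK removal policy of the same name, re-deploying would delete the file system and its data once it is no longer part of the stack. The hypothetical helper below extracts the EFS file system ID and removal policy from the saved output of config delete; it only handles the efs.* keys shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read saved `idea-admin.sh config delete` output on stdin
# and print the EFS file system ID and removal policy that were deleted.
# Only the efs.* keys shown in this guide are handled.
deleted_fs_summary() {
  awk '
    /^deleting config entry - shared-storage\.[^.]+\.efs\.file_system_id = / { id = $NF }
    /^deleting config entry - shared-storage\.[^.]+\.efs\.removal_policy = / { policy = $NF }
    END { printf "file_system_id=%s removal_policy=%s\n", id, policy }
  '
}
```

For example, save the command output to delete.log and run deleted_fs_summary < delete.log; for the example above this prints file_system_id=fs-01db45fc6a9eaf20a removal_policy=DESTROY.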

Shared Storage Providers

Amazon EFS

New EFS Configuration

Existing EFS Configuration

Amazon FSx for Lustre

Existing FSx for Lustre Configuration

Amazon FSx for NetApp ONTAP

Existing FSx for NetApp ONTAP

Amazon FSx for OpenZFS

Existing FSx for OpenZFS

Amazon FSx for Windows File Server

Existing FSx for Windows File Server

Visualize Cluster Settings

Shared storage settings can be viewed via the Web Portal and the IDEA CLI.

Web Portal

Navigate to "Cluster Management" > "Settings" > "Shared Storage"

IDEA CLI
