Sunday 20 March 2022

AWS ParallelCluster

 

  • An AWS-supported open source cluster management tool for deploying and managing High Performance Computing (HPC) clusters on AWS. ParallelCluster uses a simple text file to model and provision all the resources needed for your HPC applications in an automated and secure manner.
  • AWS ParallelCluster provisions a master instance for build and control, a cluster of compute instances, a shared filesystem, and a batch scheduler. You can also extend and customize your use cases using custom pre-install and post-install bootstrap actions.

How It Works


  • AWS ParallelCluster supports four schedulers:
    • SGE (Son of Grid Engine)
    • Torque
    • Slurm
    • AWS Batch
  • AWS ParallelCluster supports the following purchasing options:
    • On-Demand Instances
    • Reserved Instances
    • Spot Instances
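
The scheduler and purchasing option are chosen per cluster in the configuration file. The sketch below assumes the ParallelCluster 2.x key names scheduler (sge, torque, slurm, or awsbatch) and cluster_type (ondemand or spot); check them against the documentation for your version. There is no value for Reserved Instances because they are a billing discount that applies automatically to matching On-Demand usage. The helper name render_cluster_section is hypothetical.

```python
# Assumed ParallelCluster 2.x values for the [cluster] keys "scheduler" and
# "cluster_type"; verify them against the docs for the version you run.
ALLOWED_SCHEDULERS = {"sge", "torque", "slurm", "awsbatch"}
ALLOWED_CLUSTER_TYPES = {"ondemand", "spot"}


def render_cluster_section(label, scheduler, cluster_type="ondemand"):
    """Hypothetical helper: build a [cluster <label>] snippet after validating it."""
    if scheduler not in ALLOWED_SCHEDULERS:
        raise ValueError(f"unsupported scheduler: {scheduler!r}")
    if cluster_type not in ALLOWED_CLUSTER_TYPES:
        raise ValueError(f"unsupported cluster_type: {cluster_type!r}")
    return (
        f"[cluster {label}]\n"
        f"scheduler = {scheduler}\n"
        f"cluster_type = {cluster_type}\n"
    )
```

For example, render_cluster_section("default", "slurm", "spot") yields a [cluster default] section with scheduler = slurm and cluster_type = spot, while an unknown scheduler name raises ValueError instead of producing a config that would fail later.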

Networking

    • AWS ParallelCluster uses Amazon Virtual Private Cloud (VPC) for networking. The VPC must have DNS Resolution = yes, DNS Hostnames = yes, and DHCP options with the correct domain name for the Region.
    • AWS ParallelCluster supports the following high-level configurations:
      • One subnet for both master and compute instances.
      • Two subnets, with the master in one public subnet, and compute instances in a private subnet. The subnets can be new or existing.
    • AWS ParallelCluster can also be deployed to use an HTTP proxy for all AWS requests.
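
A quick way to confirm the two DNS settings before creating a cluster is to query the VPC attributes with boto3. The sketch assumes an EC2 client from boto3.client("ec2") is passed in; "DNS Resolution" and "DNS Hostnames" in the console correspond to the enableDnsSupport and enableDnsHostnames attributes. It does not check the DHCP options set, and the function names are hypothetical.

```python
def vpc_dns_settings(ec2_client, vpc_id):
    """Read the two VPC attributes behind the console's DNS settings."""
    support = ec2_client.describe_vpc_attribute(
        VpcId=vpc_id, Attribute="enableDnsSupport"
    )
    hostnames = ec2_client.describe_vpc_attribute(
        VpcId=vpc_id, Attribute="enableDnsHostnames"
    )
    return {
        "dns_resolution": support["EnableDnsSupport"]["Value"],
        "dns_hostnames": hostnames["EnableDnsHostnames"]["Value"],
    }


def vpc_ready_for_parallelcluster(ec2_client, vpc_id):
    """True only if both DNS settings are enabled (DHCP options are not checked)."""
    return all(vpc_dns_settings(ec2_client, vpc_id).values())
```

Call it as vpc_ready_for_parallelcluster(boto3.client("ec2"), vpc_id) with your own VPC ID; passing the client in also makes the check easy to exercise with a stub.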

Storage

    • By default, AWS ParallelCluster automatically configures an external 15 GB Amazon Elastic Block Store (EBS) volume that is attached to the cluster’s master node and exported to the cluster’s compute nodes via Network File System (NFS).
    • AWS ParallelCluster is also compatible with Amazon Elastic File System (EFS), RAID arrays of EBS volumes, and Amazon FSx for Lustre file systems.
    • You can configure AWS ParallelCluster with Amazon S3 object storage as the source of job inputs or as a destination for job output.
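
When S3 is the source of inputs and the destination for outputs, a job script typically stages files in before it runs and stages results out when it finishes. The following is a minimal sketch using the boto3 S3 client's download_file and upload_file calls; the helper names, bucket names, and directory layout are hypothetical. The instances also need IAM permission on the bucket (in ParallelCluster 2.x the s3_read_resource and s3_read_write_resource settings are intended for this; verify for your version).

```python
from pathlib import Path


def stage_in(s3_client, bucket, keys, dest_dir):
    """Download job inputs from S3 into dest_dir and return the local paths."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    local_paths = []
    for key in keys:
        local = dest / Path(key).name  # flatten: keep only the file name
        s3_client.download_file(bucket, key, str(local))
        local_paths.append(local)
    return local_paths


def stage_out(s3_client, src_dir, bucket, prefix):
    """Upload every file under src_dir to s3://bucket/prefix/... and return the keys."""
    src = Path(src_dir)
    uploaded = []
    for path in sorted(p for p in src.rglob("*") if p.is_file()):
        # Preserve the relative directory layout under the prefix.
        key = f"{prefix.rstrip('/')}/{path.relative_to(src).as_posix()}"
        s3_client.upload_file(str(path), bucket, key)
        uploaded.append(key)
    return uploaded
```

Because the client is a parameter, the same functions work with boto3.client("s3") on the cluster or with a stub in tests.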

Cluster Configuration

    • By default, AWS ParallelCluster reads all configuration parameters from the file ~/.parallelcluster/config. A custom configuration file can be specified with the -c or --config command-line option or with the AWS_PCLUSTER_CONFIG_FILE environment variable.
    • The following sections are required:
      • The [global] and [aws] sections.
      • At least one [cluster] section and at least one [vpc] section.
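
The lookup order and the required sections can be sketched in Python with configparser. This assumes the ParallelCluster 2.x INI format, in which cluster and VPC sections carry a label such as [cluster default] and [vpc public], and it assumes the -c/--config option takes precedence over the environment variable. The key names in EXAMPLE follow the 2.x format as far as we know; the key pair, instance types, and IDs are placeholders, and the function names are hypothetical.

```python
import configparser
import os
from pathlib import Path

# Default location described above.
DEFAULT_CONFIG = Path.home() / ".parallelcluster" / "config"


def resolve_config_path(cli_path=None, environ=None):
    """Return the config file to use.

    Assumed precedence: -c/--config, then AWS_PCLUSTER_CONFIG_FILE, then the default.
    """
    env = os.environ if environ is None else environ
    if cli_path:
        return Path(cli_path).expanduser()
    if env.get("AWS_PCLUSTER_CONFIG_FILE"):
        return Path(env["AWS_PCLUSTER_CONFIG_FILE"]).expanduser()
    return DEFAULT_CONFIG


def missing_sections(config_text):
    """List the required sections absent from a 2.x-style INI config."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    names = parser.sections()
    missing = [name for name in ("global", "aws") if name not in names]
    # [cluster ...] and [vpc ...] sections carry a label, e.g. [cluster default].
    for kind in ("cluster", "vpc"):
        if not any(name.split(" ", 1)[0] == kind for name in names):
            missing.append(f"{kind} <label>")
    return missing


# Illustrative 2.x-style config; the key pair, instance types and IDs are placeholders.
EXAMPLE = """
[aws]
aws_region_name = us-east-1

[global]
cluster_template = default
update_check = true
sanity_check = true

[cluster default]
key_name = my-key-pair
scheduler = slurm
master_instance_type = c5.large
compute_instance_type = c5.xlarge
vpc_settings = public

[vpc public]
vpc_id = vpc-0123456789abcdef0
master_subnet_id = subnet-0123456789abcdef0
"""
```

Here missing_sections(EXAMPLE) returns an empty list, and the [global] cluster_template and [cluster] vpc_settings values are what tie the labeled sections together.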

Cluster Processes

    • While a cluster is running, a process called jobwatcher monitors the configured scheduler (SGE, Slurm, or Torque) and evaluates the queue every minute to decide when to scale up.
    • The sqswatcher process monitors Amazon SQS for the messages that Auto Scaling sends to signal state changes within the cluster.
    • The nodewatcher process runs on each node in the compute fleet and terminates instances that have been idle for a set amount of time.
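
The scaling behavior can be illustrated with a simplified model. This is not ParallelCluster's code: it assumes jobwatcher adds enough nodes to cover the slots requested by pending jobs, capped at a maximum fleet size (max_queue_size in 2.x configs), and that nodewatcher terminates its own node once it has had no jobs for an idle limit in minutes (scaledown_idletime in 2.x configs). The real processes may account for more, such as nodes that are still launching. The class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class FleetState:
    running_nodes: int   # compute nodes currently in the fleet
    pending_slots: int   # slots requested by queued jobs
    slots_per_node: int  # e.g. vCPUs per compute instance
    max_nodes: int       # cap on fleet size (max_queue_size in 2.x)


def desired_node_count(state):
    """Simplified jobwatcher-style decision: target fleet size after a queue check."""
    extra = math.ceil(state.pending_slots / state.slots_per_node)
    return min(state.max_nodes, state.running_nodes + extra)


def should_self_terminate(idle_minutes, has_running_jobs, idle_limit):
    """Simplified nodewatcher-style decision for the node it runs on."""
    return not has_running_jobs and idle_minutes >= idle_limit
```

For example, with 2 running 16-slot nodes and 36 pending slots, ceil(36 / 16) = 3 more nodes are needed, so the target is 5 nodes unless the cap is lower.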

Pricing

    • AWS ParallelCluster is available at no additional charge. You pay only for the AWS resources needed to run your applications.

Limitations

    • AWS ParallelCluster does not support building Windows clusters.
    • AWS ParallelCluster does not currently support mixed instance types for a cluster. However, you can pick one instance type for the master node and another instance type for the compute nodes.
