Lightspin’s Research and Discoveries for Securing AWS SageMaker, a Popular Data Science Tool
Disclaimer: This post includes findings from December 2020. Some of them are already fixed in production and cannot be reproduced.
Amazon SageMaker is a fully managed machine learning AWS service that enables data scientists and developers to build and train machine learning models, and then directly deploy them into a production-ready hosted environment. It is used by thousands of leading organizations worldwide.
This post aims to look at this service in more detail, and to point out the potential risks and misconfigurations that can expose an AWS account to security breaches through the SageMaker service.
Some Background on AWS SageMaker
Amazon SageMaker provides a Jupyter notebook that runs on a managed EC2 instance with three main networking options: no VPC, a VPC with internet access, and a VPC without internet access. The VPC option lets a user provision a logically isolated section of the cloud. That section is cut off from the broader cloud environment and is accessible only through specific, secure private networks.
Users tend to choose the easy default, no VPC, which gives the notebook instance direct access to the internet so that it can download packages or notebooks easily. The downside is that internet access exposes the notebook to threats through direct access or through packages that include malicious code. This graphic can provide a visual representation of the environment.
The malicious code could potentially access the notebook data or perform lateral movement inside the AWS account.
There are two ways to create a new Jupyter notebook in Amazon SageMaker:
- Amazon SageMaker Studio, which is a web-based, integrated development environment (IDE) for machine learning that lets you build, train, debug, deploy, and monitor your machine learning models. The Studio extends the JupyterLab interface and provides an integrated Jupyter notebook instance.
- Amazon SageMaker notebook instance, which is a machine learning (ML) compute instance running the Jupyter Notebook App.
Lightspin’s Research and Discoveries
As part of our in-depth research into this Machine Learning service, we compared SageMaker Studio and SageMaker notebook instances, examining their environment and the underlying architecture in terms of security. Additionally, we have outlined a few potential attack flows, risky configurations and ideas for shoring up your cloud security defenses when using SageMaker.
We started by opening a JupyterLab terminal on both SageMaker Studio and a SageMaker notebook instance and examining their environment and underlying architecture.
The first thing we wanted to try was to access the Metadata Endpoint (see below).
SageMaker notebook instance
It was clear that the Metadata Endpoint was accessible from the notebook instance but not from SageMaker Studio. This finding fits Amazon’s documentation regarding IMDS access, which states, “Due to security concerns, access to the Amazon Elastic Compute Cloud (Amazon EC2) Instance Metadata Service (IMDS) is unavailable in SageMaker Studio.”
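The reachability check described above can be approximated from any Python environment. The following is a minimal sketch, not the exact commands we ran: it assumes the standard EC2 metadata address (169.254.169.254), and the function name `imds_reachable` and its `opener` parameter are our own illustrative choices. It only reports whether the endpoint answers; it does not read any metadata.

```python
import urllib.error
import urllib.request

# Standard link-local address of the EC2 Instance Metadata Service.
IMDS_URL = "http://169.254.169.254/latest/meta-data/"


def imds_reachable(url=IMDS_URL, timeout=2.0, opener=urllib.request.urlopen):
    """Return True if the metadata endpoint answers at all, False otherwise.

    `opener` is a hypothetical injection point so the check can be exercised
    without a network; by default it is urllib's urlopen.
    """
    try:
        with opener(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An HTTP error status (for example when a session token is required)
        # still means the endpoint is reachable from this environment.
        return True
    except OSError:
        # Timeouts, refused connections and unroutable addresses land here.
        return False
```

Calling `imds_reachable()` in a JupyterLab terminal or notebook cell returns a boolean. A `False` result only means the endpoint did not answer within the timeout, which is what the documentation quoted above would lead you to expect in Studio.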
Next, we dug deeper, to explore the environment variables:
SageMaker notebook instance
From the names of the environment variables, we understood that the notebook instance runs on a regular EC2 instance, while SageMaker Studio uses Amazon Elastic Container Service (ECS) in the background.
The evidence for that is the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable. The user can provide a task role when running a task on ECS. The Amazon ECS agent sets a unique task credential ID as an identification token and updates its internal credential cache, so that the identification token for the task points to the role credentials that are received in the payload.
The Amazon ECS agent populates the AWS_CONTAINER_CREDENTIALS_RELATIVE_URI environment variable for all containers that belong to this task, with the following relative URI: /credential_provider_version/credentials?id=task_credential_id
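To make the mechanism concrete, here is a small sketch of how a process inside an ECS task turns that environment variable into a full URL. It assumes the link-local credentials address that AWS documents for ECS tasks (169.254.170.2); the helper name `container_credentials_url` is hypothetical, and the sketch only builds the URL without sending a request.

```python
import os

# Link-local address of the ECS container credentials endpoint (AWS-documented).
ECS_CREDENTIALS_HOST = "http://169.254.170.2"


def container_credentials_url(env=None):
    """Return the full credentials URL, or None if the variable is not set."""
    env = os.environ if env is None else env
    relative_uri = env.get("AWS_CONTAINER_CREDENTIALS_RELATIVE_URI")
    if not relative_uri:
        # Not running inside an ECS task that was given a task role.
        return None
    return ECS_CREDENTIALS_HOST + relative_uri
```

On a plain EC2-backed notebook instance the variable is absent and the function returns `None`, which matches the difference between the two environments described above.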
To our surprise, this contradicted the Amazon documentation that said “Due to security concerns, access to the Amazon Elastic Compute Cloud (Amazon EC2) Instance Metadata Service (IMDS) is unavailable in SageMaker Studio.”
We realized that we had found a bypass to access the EC2 instance metadata on SageMaker Studio through ECS task metadata.
We immediately contacted AWS with our findings.
Since we reached out, AWS has fixed this vulnerability by blocking access to the above link so that it is inaccessible.
Privilege Escalation Methods in SageMaker
Our next approach was to look at options for privilege escalation from inside SageMaker.
PE by Create Notebook and Pass Role
Both notebook instances and SageMaker Studio require an attached IAM role, referred to as an execution role. This role is used by the SageMaker service to perform operations on the user’s behalf. A notebook within SageMaker can access the execution role with the following code:
import sagemaker
sagemaker_session = sagemaker.Session()
role = sagemaker.get_execution_role()
The execution role’s permissions should be limited to the actions the notebook actually needs. An execution role with overly loose permissions can assist an attacker with privilege escalation and may even lead to AWS account takeover.
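One rough way to spot an overly loose execution role is to look for wildcard grants in its policy documents. The sketch below is an illustrative heuristic, not a complete IAM evaluator: it takes a policy document already parsed into a Python dict, and the function name `overly_broad_statements` is our own. It ignores conditions, `NotAction`, and Deny statements.

```python
def overly_broad_statements(policy):
    """Return Allow statements that pair a wildcard action with Resource '*'."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]

    def as_list(value):
        # IAM allows a single string or a list for Action and Resource.
        if value is None:
            return []
        return value if isinstance(value, list) else [value]

    flagged = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = as_list(stmt.get("Action"))
        resources = as_list(stmt.get("Resource"))
        # "*" grants everything; "service:*" grants a whole service.
        broad_action = any(a == "*" or a.endswith(":*") for a in actions)
        broad_resource = "*" in resources
        if broad_action and broad_resource:
            flagged.append(stmt)
    return flagged
```

Any statement this returns deserves a manual review. An empty result does not prove the role is least-privilege, only that it contains no blanket grants of this particular shape.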
Another way to access the execution role’s temporary credentials is through the instance metadata endpoint.
An attacker with the iam:PassRole and sagemaker:CreateNotebookInstance permissions can create a new notebook instance. They will have complete access to this instance, and can pass an existing execution role to it.
aws sagemaker create-notebook-instance --notebook-instance-name <name> --instance-type <type> --role-arn <role>
They can then access the execution role credentials through one of the options above, which gives them every permission that the execution role has.
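Defenders can look for this combination when reviewing what a principal is allowed to do. The following sketch assumes you have already collected the principal's allowed action patterns into a list (for example from its attached policies); the names `RISKY_COMBINATION`, `allows`, and `can_escalate_via_notebook` are hypothetical. It matches IAM-style wildcards only and does not account for Deny statements, conditions, or resource restrictions.

```python
from fnmatch import fnmatchcase

# The two permissions that, together, enable the escalation path above.
RISKY_COMBINATION = ("iam:PassRole", "sagemaker:CreateNotebookInstance")


def allows(granted_patterns, action):
    """True if any granted pattern (which may contain * or ?) covers the action."""
    # IAM action names are case-insensitive, so compare in lower case.
    return any(fnmatchcase(action.lower(), p.lower()) for p in granted_patterns)


def can_escalate_via_notebook(granted_patterns):
    """True if the granted patterns cover every action in RISKY_COMBINATION."""
    return all(allows(granted_patterns, a) for a in RISKY_COMBINATION)
```

For instance, a principal granted `iam:PassRole` and `sagemaker:Create*` would be flagged, while one granted only `sagemaker:*` would not, because it cannot pass a role to the new instance.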
PE by Notebook Lifecycle
SageMaker enables the user to create lifecycle configurations that perform a number of tasks. Examples include installing packages or sample notebooks on the instance, configuring networking and security, or using shell scripts to further customize the notebook.
There are two options for script execution in a lifecycle configuration:
- Start notebook - the script code will be executed when the notebook instance is started or restarted, including its initial creation.
- Create notebook - the script code will be executed when the notebook instance is created (only initial creation).
An attacker with the sagemaker:CreateNotebookInstanceLifecycleConfig or sagemaker:UpdateNotebookInstanceLifecycleConfig permission can create or update a notebook lifecycle configuration to include malicious code that runs when the instance is created or started. The malicious code can open a reverse shell, manipulate existing data on the instance, or access the execution role.
Open reverse shell Lifecycle Configuration
The configuration above is attached to a notebook instance in a production environment.
When the instance is initialized or restarts, it enters a pending status and a reverse shell is opened to the attacker machine.
Pending notebook instance
Opened reverse shell
Note that the attacker runs as the root user, as Amazon mentions in its documentation for lifecycle configurations:
“Lifecycle configurations run as the root user.”
This is an example of a Jupyter notebook that opens a reverse shell to the attacker once the code in the notebook is executed.
Securing AWS SageMaker Against Privilege Escalation
As a result of our research, here are three actionable tips for data science teams to ensure secure use of AWS SageMaker.
- Attach only the minimum required permissions to the SageMaker execution role, to avoid a privilege escalation that could lead to account takeover.
- Always analyze the content of configured lifecycle configurations, because they run with root privileges on the machine by default.
- Identify who can create new lifecycle configurations and attach them to a notebook instance.
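As a starting point for the second tip, lifecycle scripts can be decoded and scanned automatically. This is a minimal sketch under stated assumptions: `config` is a dict shaped like the response of boto3's `describe_notebook_instance_lifecycle_config` call, where `OnCreate` and `OnStart` hold lists of entries with base64-encoded `Content`. The function name and the patterns are our own illustrative heuristics, so a clean result does not prove that a script is safe.

```python
import base64
import re

# Illustrative heuristics only; attackers can trivially evade simple patterns.
SUSPICIOUS_PATTERNS = {
    "bash /dev/tcp redirection": re.compile(r"/dev/tcp/"),
    "netcat with -e": re.compile(r"\b(?:nc|ncat|netcat)\b.*\s-e\b"),
    "download piped to shell": re.compile(r"\b(?:curl|wget)\b.*\|\s*(?:ba)?sh\b"),
}


def audit_lifecycle_config(config):
    """Decode OnCreate/OnStart scripts and return (hook, label) findings."""
    findings = []
    for hook in ("OnCreate", "OnStart"):
        for entry in config.get(hook, []):
            # Script bodies are stored base64-encoded in the API response.
            script = base64.b64decode(entry["Content"]).decode("utf-8", "replace")
            for label, pattern in SUSPICIOUS_PATTERNS.items():
                if pattern.search(script):
                    findings.append((hook, label))
    return findings
```

Each finding names the hook (`OnCreate` or `OnStart`) and the heuristic that matched, which is enough to decide which scripts to read in full. The decoded scripts themselves remain the real source of truth and should be reviewed by a person.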
Understanding Cloud Risk
It’s clear that when you’re thinking about cloud risk, your security teams need to think beyond the usual AWS suspects. Even more niche services, such as data science tools like SageMaker, could open up your environment to risk. [Watch this space for an in-depth white paper on how data science teams can better protect the organization when working with cloud-native tools.]