How to Guard Your Secrets — Best Practices for Handling Sensitive Data in Python

Navigating secrets management in Python from environment variables to cloud services

Handling sensitive data using cloud services. Image generated by Midjourney, prompt by author.

It was a typical Thursday afternoon at CodeHive. At this bustling software development firm, Ruby, an enthusiastic junior developer, eagerly discussed a new project with her experienced team lead, Brad.

They were exploring integrating a Bitcoin wallet service into their latest product, which required handling a sensitive API key.

“Brad,” Ruby began, her voice filled with anticipation, “I’ve thought this through. What if we stored our API key in a file and added it to our private repository? It’s private, after all. That should be secure, right?”

Brad, a seasoned developer with countless projects behind him, gently shook his head. He knew it was a teachable moment.

“Ruby,” he said with a comforting smile, “I appreciate your initiative. However, storing sensitive information like API keys in a file, even in a private repo, can lead to serious security issues.

“If someone gains unauthorized access to our repository, or if we accidentally make it public, our sensitive data is exposed. Plus, it’s hard to update or rotate secrets when they’re hardcoded in files.”

Brad’s eyes twinkled with the excitement of sharing knowledge. “A better way would be to use environment variables. In Python, we can leverage the os module to manage these.

“The API key is stored only on the machine where the code runs, not in the code itself. It’s safer, and it’s also easier to manage and update.”

He paused, letting Ruby absorb the information before adding another layer. “This method works well and is a significant improvement over storing our keys in files.

“However, it has limitations in a professional environment, especially when multiple developers are involved or when we must deploy our applications to different environments.

“Environment variables can differ between development, testing, and production environments, and keeping them synchronized can be tricky.”

Seeing Ruby’s nod of understanding, Brad continued, “And that’s where cloud-based solutions come into play. We’re already using Azure for hosting our project.

“We can use Azure Key Vault to manage our secrets. This way, the API key is stored safely in the cloud and can be accessed only by authorized entities. It’s a best practice for managing sensitive data in a dynamic, multi-developer context.”

This article delves into these essential aspects of secure coding in Python. We’ll explore using environment variables to store sensitive data and discuss their limitations. We’ll then walk through a practical guide to using cloud services such as Azure Key Vault, AWS Secrets Manager, and Google Cloud Secret Manager for remote secrets management in a Python application.


Table of contents

· Environment Variables in Python
Setting an Environment Variable in Python
Getting an Environment Variable
Using .env Files for Environment Variables
· Limitations of Environment Variables
· Understanding Identity, Access Control, and Secrets Management in the Cloud
Identities
Access Controls
Development vs. Production Access
· Using Azure Key Vault with Python
1. Create a Managed Identity
2. Create a Key Vault
3. Configure Access Policy
4. Add Secrets to Key Vault
5. Use Azure Key Vault libraries for Python
6. Access Secrets in your Python code
· Using AWS Secrets Manager with Python
1. Create an IAM role
2. Define a Secrets Manager secret
3. Set the IAM policy for the secret
4. Use AWS SDK for Python (Boto3)
5. Access the Secret in Python code
· Using Google Cloud Secret Manager with Python
1. Create a Service Account
2. Create a Secret in Secret Manager
3. Grant the Service Account Access to the Secret
4. Use Google Cloud Client Libraries for Python
5. Access the Secret in Python code
· Conclusion: Level Up Your Python Application Security with Secret Management


Environment Variables in Python

Environment variables are a common way to store sensitive data like API keys, database passwords, and other credentials. They keep sensitive values out of your application’s code, so the same code can run unchanged in different environments without knowing the specific secrets of any of them.

Instead of hardcoding this data directly into your application, you can set it in the environment where your code runs.

In Python, we can use the os module, a part of the standard library, to interact with environment variables. Here's a brief tutorial on how to set and get environment variables using the os module.

Setting an Environment Variable in Python

Environment variables are typically set through a terminal command or within a setup script. This script would contain the variables and their respective values.

For instance, you might define these in the YAML of a Kubernetes deployment, a Docker Compose YAML file, or even within a startup shell script.

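You can also set environment variables directly from your Python code. Python’s os module provides the os.environ object, a dictionary-like mapping of your process’s environment that is captured when the os module is first imported. Assigning to a key in os.environ adds or updates a variable for the current process and for any child processes it starts afterwards. Keep in mind that writing a real key into your code this way puts it right back into your source; this technique is mainly useful for tests or for passing values on to subprocesses.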

Let’s look at an example:

import os 
 
# set a new environment variable 
os.environ['API_KEY'] = 'your-api-key'

Getting an Environment Variable

Once you’ve set an environment variable through code or via the terminal, you can access it using os.getenv. Here's how:

import os 
 
# get the value of an existing environment variable 
api_key = os.getenv('API_KEY')

In this example, we read the value of the API_KEY environment variable we just set and store it in the api_key variable. If the environment variable is not set, os.getenv returns None, or the default value you pass as its second argument.
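For secrets your application cannot run without, failing fast is usually better than passing None along. Here is a minimal sketch using only the standard library. `require_env` is a hypothetical helper name, not part of the `os` module, and the `EXAMPLE_*` variable names are placeholders:

```python
import os


def require_env(name: str) -> str:
    """Return a required environment variable or fail with a clear error.

    Hypothetical helper: the error names the missing variable but never
    includes any secret value.
    """
    value = os.getenv(name)
    if value is None:
        raise RuntimeError(f"Required environment variable {name!r} is not set")
    return value


# os.getenv returns None, or a default you supply, for a missing variable
os.environ.pop("EXAMPLE_MISSING_VAR", None)
print(os.getenv("EXAMPLE_MISSING_VAR"))              # None
print(os.getenv("EXAMPLE_MISSING_VAR", "fallback"))  # fallback

# Indexing os.environ directly raises KeyError instead
try:
    os.environ["EXAMPLE_MISSING_VAR"]
except KeyError:
    print("EXAMPLE_MISSING_VAR is not set")

os.environ["EXAMPLE_API_KEY"] = "placeholder-value"
api_key = require_env("EXAMPLE_API_KEY")
```

Calling a helper like this once at startup surfaces a misconfigured deployment immediately, rather than at the first request that needs the key.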

It’s important to note that environment variables set from Python this way are temporary. They exist only in the current process (and any child processes it starts) and disappear when that process exits.

If you want to set environment variables permanently, do so through your operating system’s command line interface (for example, in your shell profile) or within the settings of your IDE or text editor, depending on your development setup.
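You can check this process scoping with a minimal sketch that launches a child Python interpreter through `subprocess`. The `EXAMPLE_*` variable names are placeholders. A value assigned through `os.environ` is inherited by the child, but a variable that the child sets never flows back to the parent:

```python
import os
import subprocess
import sys

# Set a variable in the current (parent) process
os.environ["EXAMPLE_TOKEN"] = "parent-value"

# A child process starts with a copy of the parent's environment
child = subprocess.run(
    [sys.executable, "-c", "import os; print(os.environ['EXAMPLE_TOKEN'])"],
    capture_output=True,
    text=True,
    check=True,
)
print(child.stdout.strip())  # parent-value

# A variable set inside the child is gone once the child exits
subprocess.run(
    [sys.executable, "-c", "import os; os.environ['EXAMPLE_CHILD_ONLY'] = 'x'"],
    check=True,
)
print("EXAMPLE_CHILD_ONLY" in os.environ)  # False
```

This is also why any program your application launches can see its secrets: children inherit the whole environment unless you pass an explicit `env=` mapping to `subprocess.run`.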

Using .env Files for Environment Variables

Another effective approach for managing environment variables in Python is using .env files in conjunction with the python-dotenv library. .env files allow you to define your environment variables in one place, making managing and updating them easier.

To get started, you first need to install the python-dotenv library. You can do this via pip:

pip install python-dotenv

Then, you can create a .env file in your project's root directory. In this file, you can define your environment variables in the KEY=VALUE format like this:

API_KEY=your-api-key 
DB_PASSWORD=your-database-password

Replace your-api-key and your-database-password with your API key and database password. You can add as many key-value pairs as you need.

Next, in your Python code, use python-dotenv to load the variables from the .env file:

from dotenv import load_dotenv 
import os 
 
# load the .env file 
load_dotenv() 
 
# get the value of an existing environment variable 
api_key = os.getenv('API_KEY') 
db_password = os.getenv('DB_PASSWORD')

With this setup, the load_dotenv() function loads the key-value pairs from the .env file into the environment. You can then use os.getenv to access the environment variables normally.
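To make the mechanism concrete, here is a deliberately simplified, standard-library-only sketch of what loading a .env file amounts to. It is not python-dotenv's implementation, whose parser also handles quoting, escapes, `export` prefixes, and variable expansion. `load_env_file` is a hypothetical name. Like `load_dotenv()` with its default `override=False`, it leaves variables that are already set in the environment untouched. As a result, a value set by your deployment platform takes precedence over the file:

```python
import os
import tempfile
from pathlib import Path


def load_env_file(path: str, override: bool = False) -> dict[str, str]:
    """Simplified .env loader for illustration; not python-dotenv's parser.

    Reads KEY=VALUE lines, skipping blank lines and # comments, and copies the
    pairs into os.environ. Variables that are already set are left alone
    unless override=True. Returns the parsed pairs.
    """
    pairs: dict[str, str] = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        pairs[key.strip()] = value.strip()
    for key, value in pairs.items():
        if override or key not in os.environ:
            os.environ[key] = value
    return pairs


# Usage: a value already present in the environment wins over the file
with tempfile.TemporaryDirectory() as tmp:
    env_path = Path(tmp) / ".env"
    env_path.write_text("# local settings\nEXAMPLE_DB_PASSWORD=from-file\n", encoding="utf-8")
    os.environ["EXAMPLE_DB_PASSWORD"] = "already-set"
    load_env_file(str(env_path))
    still_set = os.environ["EXAMPLE_DB_PASSWORD"]  # "already-set"
```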

A critical thing to remember when working with .env files is to prevent them from ending up in your version control system. Exposing these files publicly can reveal your sensitive data. Therefore, if you're using Git, add .env to your .gitignore file. This practice ensures that Git ignores the .env file and doesn't include it in your repository.

This does, however, mean that every developer on the team must create their own .env file in their copy of the repository.

With the addition of .env files, managing environment variables in your Python application becomes more streamlined, paving the way for the secure handling of sensitive data. In the following sections, we will further enhance this security by introducing cloud-based solutions.


Limitations of Environment Variables

While environment variables and .env files provide a practical solution for managing sensitive data, they have limitations. One of the significant drawbacks lies in their vulnerability when the application runs on a shared system.

Despite all precautions, if an unauthorized person gains access to the terminal or to the machine your application is running on, they can easily reveal the secrets by inspecting the environment variables.

For example, on most Unix-like systems, a simple command like printenv or env displays all the environment variables, including your sensitive data:

printenv API_KEY

When executed on the terminal, the command above would reveal the API_KEY value stored in the environment variable.

Moreover, in some cases, environment variables can be leaked through error messages or log files if the application’s error handling and logging are not configured correctly.
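One way to reduce the risk of logging leaks is a logging filter that masks known secret values before records are written. The sketch below uses the standard `logging` module; `RedactSecretsFilter` is a hypothetical name. It only masks exact matches of the values you register, and exception tracebacks pass through unchanged. It therefore complements, rather than replaces, the habit of never logging secrets in the first place:

```python
import io
import logging
import os


class RedactSecretsFilter(logging.Filter):
    """Hypothetical filter that masks registered secret values with '***'."""

    def __init__(self, secrets):
        super().__init__()
        # Skip None/empty values (e.g. unset variables) so nothing odd is masked
        self._secrets = [s for s in secrets if s]

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # merge msg % args into the final text
        for secret in self._secrets:
            message = message.replace(secret, "***")
        record.msg = message
        record.args = ()  # args were already merged into msg
        return True  # keep the record; only its text is rewritten


# Usage: attach the filter to a handler so every record it emits is checked
os.environ["EXAMPLE_API_KEY"] = "s3cr3t-value"  # placeholder secret
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RedactSecretsFilter([os.getenv("EXAMPLE_API_KEY")]))
logger = logging.getLogger("redaction-example")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

logger.info("Connecting with key %s", os.environ["EXAMPLE_API_KEY"])
print(stream.getvalue().strip())  # Connecting with key ***
```

The filter is attached to the handler rather than the logger because handler filters also see records that propagate up from child loggers.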

So, while environment variables are a step in the right direction, they aren’t the ultimate solution for managing sensitive data securely.

This is where cloud services come in. Cloud vendors like Azure, Amazon Web Services (AWS), and Google Cloud offer dedicated secrets management services that provide a more secure way of storing and accessing sensitive data.

In the next sections, we will explore these services and learn how to leverage them for secrets management in our Python applications.


Understanding Identity, Access Control, and Secrets Management in the Cloud

All major cloud vendors offer sophisticated secrets management services that significantly enhance the security of sensitive data. These services rest on two key concepts: identities and access controls. You will see later how each cloud vendor implements these concepts.

Identities

Each application or service running on the cloud platform can be assigned its own identity. This identity represents the application when it talks to other cloud services, much as a username represents a user in a traditional login system.

Access Controls

Once identities are established, the cloud platform allows you to set fine-grained access controls on your secrets. These controls decide which identities (applications or services) have permission to access which secrets.

When your application needs a secret, it sends a request to the secrets management service. The service authenticates the application’s identity and checks whether that identity has the required permissions. If the check succeeds, the service returns the secret; otherwise, the request is denied.
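The authorization step of that flow can be modelled in a few lines. This is a toy, in-memory sketch for illustration only. `ToySecretStore`, `AccessDenied`, the identity names, and the permission mapping are all hypothetical and do not correspond to any vendor's API. Authentication is simply assumed to have happened already, because in a real cloud service the platform itself verifies the caller's identity:

```python
from dataclasses import dataclass, field


class AccessDenied(Exception):
    """Raised when an identity lacks permission to read a secret."""


@dataclass
class ToySecretStore:
    # secret name -> secret value
    secrets: dict[str, str] = field(default_factory=dict)
    # identity -> set of secret names that identity may read
    grants: dict[str, set[str]] = field(default_factory=dict)

    def get_secret(self, identity: str, name: str) -> str:
        # Authorization: is this (already authenticated) identity granted this secret?
        if name not in self.grants.get(identity, set()):
            raise AccessDenied(f"{identity!r} may not read {name!r}")
        # Only after the check passes is the value returned
        return self.secrets[name]


# The production app and a developer machine get separate, non-overlapping grants
store = ToySecretStore(
    secrets={"prod-api-key": "prod-value", "dev-api-key": "dev-value"},
    grants={"prod-web-app": {"prod-api-key"}, "developer-laptop": {"dev-api-key"}},
)
```

The grants mirror the development and production separation described in the next section: each identity can read only the secrets meant for its environment.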

Development vs. Production Access

There’s a clear distinction between the development and production accesses in a secure application environment. Developers working in the development setting use a distinct set of test secrets granting access to non-production resource versions.

This can include a sandboxed database filled with mock data, a dummy API that mirrors your actual one, or similar test resources. Developers should neither need nor be granted access to production secrets.

Managing production secrets and setting access controls usually involves a DevOps team or similar roles in charge of deployment and infrastructure management.

When your application is ready for production deployment, these individuals assign the application an identity authorized to access the production secrets. This assignment is typically a part of the application’s deployment pipeline.

This separation of access for development and production minimizes the risk of accidental exposure of sensitive production data. It allows finer control over who has access to your production secrets and under what circumstances.

Using these services, you can ensure your secrets are always securely stored and only accessible by authorized applications, reducing the risk of data breaches. In the upcoming sections, we will explore these services in more detail and discuss how to use them in your Python applications.


Using Azure Key Vault with Python

Before we delve into how to use Azure Key Vault with Python, let’s familiarize ourselves with the relevant terminology. Understanding these terms is crucial, as each cloud vendor has unique resource naming conventions.

In Azure’s ecosystem, the services that host web applications are known as ‘Azure Web Apps.’ Regardless of the language your application is implemented in — be it Python, Go, C#, or any other — Azure Web Apps provide a robust platform for running it.

These hosted services offer several advantages. For instance, Azure maintains and patches the underlying platform for you. Additionally, Azure Web Apps can scale automatically, enabling your application to handle changes in load and traffic.

In the context of Azure, identities play a crucial role in managing access controls. Azure employs a ‘Managed Identities’ system to facilitate secure and easy access to cloud resources.

There are two types of Managed Identities:

  1. System-assigned Managed Identity: This is an identity automatically created and tied to an instance of an Azure service, such as an Azure VM or Azure Web App. If the instance is deleted, Azure automatically cleans up the credentials and rights associated with that identity.
  2. User-assigned Managed Identity: You manually create and manage this identity. You can assign this identity to one or more instances of an Azure service. Unlike system-assigned identities, user-assigned identities aren’t automatically deleted when the associated service instance is deleted.

By employing Managed Identities, Azure allows you to authenticate to any service that supports Azure Active Directory (Azure AD, now Microsoft Entra ID) authentication without storing any credentials in your code.

Leveraging Azure Key Vault with Managed Identities and Python

Using Azure Key Vault with Managed Identities involves the following steps:

1. Create a Managed Identity

As mentioned earlier, you can create either a system-assigned or a user-assigned Managed Identity. If you run your Python application as an Azure Web App, you may prefer to create a system-assigned Managed Identity for simplicity. This identity will be tied to your Web App and automatically cleaned up if the app is deleted.

2. Create a Key Vault

Navigate to the Azure portal and create a new Key Vault. Give it a globally unique name, since the name becomes part of the vault’s URL. Select the same region as your Web App to keep latency low.

3. Configure Access Policy

Within the Key Vault settings, configure an access policy that grants your Managed Identity the permissions it needs (Get, List, etc.) on secrets, keys, or certificates, based on what your Python app must access. Vaults that use the Azure role-based access control (RBAC) permission model instead assign the identity a role, such as Key Vault Secrets User.

4. Add Secrets to Key Vault

Add your sensitive data (like API keys, connection strings, etc.) as secrets in the Key Vault.

5. Use Azure Key Vault libraries for Python

Azure provides SDKs for multiple languages, including Python, to make it easier to interact with its services. Install the azure-identity and azure-keyvault-secrets libraries in your Python environment using pip:

pip install azure-identity azure-keyvault-secrets

6. Access Secrets in your Python code

You can now read your secrets from Python code without embedding them directly in your source. Here’s a basic example:

from azure.identity import DefaultAzureCredential 
from azure.keyvault.secrets import SecretClient 
 
# set up the default credential, which uses the Managed Identity when running in Azure
credential = DefaultAzureCredential()

# create a secret client using the credential
vault_url = "https://your-key-vault-name.vault.azure.net/"
secret_client = SecretClient(vault_url=vault_url, credential=credential)

# get a secret and use its value; avoid printing or logging it
secret = secret_client.get_secret("your-secret-name")
api_key = secret.value

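In this example, DefaultAzureCredential tries a chain of credential sources in turn. When the app runs in Azure with a Managed Identity enabled, it uses that identity to authenticate to Azure AD. On a developer machine, it can fall back to other sources such as your Azure CLI login. The SecretClient is then initialized with your Key Vault's URL and the credential, allowing your app to fetch secrets from the Key Vault.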
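Building on that example, here is a minimal sketch of a helper that creates the client once and caches secret values for a short time, so repeated lookups don't each call Key Vault. `make_key_vault_client` and `CachedSecrets` are hypothetical names, and the five-minute TTL is an arbitrary choice. `DefaultAzureCredential(managed_identity_client_id=...)` is how you point the credential at a user-assigned Managed Identity. The Azure imports sit inside the factory function so the caching logic can be tried without the SDK installed:

```python
import time
from typing import Any, Callable, Optional


def make_key_vault_client(vault_url: str, client_id: Optional[str] = None) -> Any:
    """Create a SecretClient authenticated with DefaultAzureCredential.

    Pass client_id only for a user-assigned Managed Identity; a
    system-assigned identity needs no extra configuration.
    """
    # Imported lazily so the caching class below works without the Azure SDK
    from azure.identity import DefaultAzureCredential
    from azure.keyvault.secrets import SecretClient

    kwargs = {"managed_identity_client_id": client_id} if client_id else {}
    credential = DefaultAzureCredential(**kwargs)
    return SecretClient(vault_url=vault_url, credential=credential)


class CachedSecrets:
    """Hypothetical helper that caches secret values for ttl_seconds.

    A short TTL means a rotated secret is picked up without a restart.
    """

    def __init__(self, client: Any, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._client = client
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: dict[str, tuple[str, float]] = {}

    def get(self, name: str) -> str:
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and now - cached[1] < self._ttl:
            return cached[0]
        # get_secret returns a KeyVaultSecret; .value holds the secret string
        value = self._client.get_secret(name).value
        self._cache[name] = (value, now)
        return value


# Usage in the app (placeholder URL and secret name):
# secrets = CachedSecrets(make_key_vault_client("https://your-key-vault-name.vault.azure.net/"))
# api_key = secrets.get("your-secret-name")
```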


Using AWS Secrets Manager with Python

Like Azure with Key Vault, AWS provides a secure and robust service for handling sensitive data like API keys, passwords, and database connection strings.

This service, called AWS Secrets Manager, follows a similar philosophy but with a few AWS-specific terms and procedures. Here are the steps to leverage AWS Secrets Manager in your Python application:

1. Create an IAM role

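IAM (Identity and Access Management) is AWS’s service for managing identities and permissions, and IAM roles play a part similar to Azure’s Managed Identities. Create an IAM role for your application and attach it to the compute resource that runs your code (for example, as an EC2 instance profile, an ECS task role, or a Lambda execution role). Your Python application then authenticates to AWS with temporary credentials for this role.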

2. Define a Secrets Manager secret

Navigate to the AWS Secrets Manager service in the AWS Management Console and create a new secret. This secret can hold sensitive data your application needs, like an API key or a database password. Once the secret is created, note down the ARN (Amazon Resource Name), as you will need it to access the secret from your Python application.

3. Set the IAM policy for the secret

Grant your IAM role permission to read this secret. You can do this either by attaching an identity-based policy to the role or by adding a resource-based policy to the secret itself. Typically, you will allow the secretsmanager:GetSecretValue action for your role.
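As a sketch, an identity-based policy that lets the role read a single secret looks like the document below, built here as a Python dict and serialized with `json`. The ARN is a placeholder. Scoping `Resource` to one secret rather than `*` keeps the role’s access as narrow as possible. If the secret is encrypted with a customer managed KMS key rather than the default AWS managed key, the role also needs `kms:Decrypt` on that key:

```python
import json

# Placeholder ARN; copy the real one from the secret's details page
SECRET_ARN = "arn:aws:secretsmanager:us-east-1:123456789012:secret:your-secret-name-AbCdEf"

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "secretsmanager:GetSecretValue",
            "Resource": SECRET_ARN,
        }
    ],
}

# The printed JSON can be pasted into the IAM console's policy editor
print(json.dumps(policy, indent=2))
```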

4. Use AWS SDK for Python (Boto3)

AWS provides an SDK for Python called Boto3. Install Boto3 into your Python environment using pip.

pip install boto3

5. Access the Secret in Python code

With Boto3 and your IAM role, you can now access your secret from Python code. Here’s an example:

import base64
import boto3
from botocore.exceptions import BotoCoreError, ClientError

session = boto3.session.Session()
client = session.client(
    service_name='secretsmanager',
    region_name="your-aws-region"
)

try:
    get_secret_value_response = client.get_secret_value(
        SecretId='your-secret-arn'
    )
except (BotoCoreError, ClientError) as e:
    # BotoCoreError covers local problems (e.g. no credentials); ClientError covers errors returned by AWS
    raise RuntimeError("Couldn't retrieve the secret") from e
else:
    # Depending on whether the secret is a string or binary, one of these fields will be populated
    if 'SecretString' in get_secret_value_response:
        secret = get_secret_value_response['SecretString']
    else:
        secret = base64.b64decode(get_secret_value_response['SecretBinary'])

    # Use `secret` here; avoid printing or logging secret values

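In this code, the Secrets Manager client obtains credentials through boto3’s default credential chain. When the code runs on AWS compute such as EC2, ECS, or Lambda with your IAM role attached, those are the role’s temporary credentials. Then, client.get_secret_value() fetches the secret value from Secrets Manager.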
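Secrets created in the console as key/value pairs are stored as a JSON string in `SecretString`, so a common next step is to parse them into a dict. The sketch below uses a hypothetical helper name, `get_secret_dict`. It accepts any object with a `get_secret_value(SecretId=...)` method, such as a boto3 Secrets Manager client, so it can also be tried with a stand-in client:

```python
import json
from typing import Any


def get_secret_dict(client: Any, secret_id: str) -> dict:
    """Fetch a key/value secret and return it as a dict.

    `client` is expected to be a boto3 Secrets Manager client, for example
    boto3.client("secretsmanager", region_name="your-aws-region").
    `secret_id` may be the secret's name or its ARN.
    """
    response = client.get_secret_value(SecretId=secret_id)
    if "SecretString" not in response:
        raise ValueError("Secret is binary; expected a JSON SecretString")
    return json.loads(response["SecretString"])


# Usage (placeholder secret name and keys):
# creds = get_secret_dict(client, "your-secret-name")
# db_password = creds["password"]
```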


Using Google Cloud Secret Manager with Python

In the Google Cloud ecosystem, the Google Cloud Secret Manager serves as the secure vault for your application’s sensitive data. It offers a safe and reliable way to store and retrieve your secrets. Here’s a step-by-step guide on how to harness its potential.

1. Create a Service Account

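Google Cloud uses service accounts for authentication and access control, similar to Azure’s Managed Identities and AWS’s IAM roles. Create a new service account for your application. When your code runs on Google Cloud (for example, on Compute Engine or Cloud Run), attach the service account to that resource. If it runs elsewhere, you can download a service account key JSON file, but treat that file as a secret in its own right.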

2. Create a Secret in Secret Manager

Navigate to the Secret Manager in the Google Cloud Console and create a new secret. As with Azure and AWS, this secret can hold sensitive data your application requires.

3. Grant the Service Account Access to the Secret

Under your secret's “Permissions” tab, add your service account and select the appropriate role, such as ‘Secret Manager Secret Accessor’, enabling your service account to access the secret’s value.

4. Use Google Cloud Client Libraries for Python

Google provides client libraries for Python to interact with its services, including Secret Manager. Install the Secret Manager client library into your Python environment using pip:

pip install google-cloud-secret-manager

5. Access the Secret in Python code

With the Secret Manager client library and your service account, you can now access your secret from Python code. Here’s an example:

from google.cloud import secretmanager 
 
# Create the Secret Manager client. 
client = secretmanager.SecretManagerServiceClient() 
 
# Build the resource name of the secret version.
project_id = "your-project-id"
secret_id = "your-secret-name"
secret_name = f"projects/{project_id}/secrets/{secret_id}/versions/latest"

# Access the secret version.
response = client.access_secret_version(request={"name": secret_name})

# Decode the secret payload; avoid printing or logging it.
payload = response.payload.data.decode('UTF-8')

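In this code, secretmanager.SecretManagerServiceClient() finds credentials through Application Default Credentials. On Google Cloud, that is the service account attached to the resource; elsewhere, it can be the key file named by the GOOGLE_APPLICATION_CREDENTIALS environment variable. Then, client.access_secret_version() fetches the secret value from Secret Manager.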
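Here is a minimal sketch that wraps this in a reusable function. It uses the client’s `secret_version_path()` helper to build the resource name instead of formatting it by hand. `access_secret` is a hypothetical name, and the client is passed in so the logic can be tried with a stand-in object. Passing a specific version number rather than "latest" makes deployments more predictable, because a newly added version isn’t picked up until you change it:

```python
from typing import Any


def access_secret(client: Any, project_id: str, secret_id: str,
                  version: str = "latest") -> str:
    """Return the payload of one secret version as text.

    `client` is expected to be a secretmanager.SecretManagerServiceClient;
    `version` can be "latest" or a specific version number such as "3".
    """
    # Builds "projects/{project}/secrets/{secret}/versions/{version}"
    name = client.secret_version_path(project_id, secret_id, version)
    response = client.access_secret_version(request={"name": name})
    return response.payload.data.decode("utf-8")


# Usage (placeholder project and secret IDs):
# from google.cloud import secretmanager
# client = secretmanager.SecretManagerServiceClient()
# api_key = access_secret(client, "your-project-id", "your-secret-name")
```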


Conclusion: Level Up Your Python Application Security with Secret Management

Our journey through secure sensitive data handling in Python began with basic environment variables, transitioned through .env files, and ultimately, arrived at using powerful, cloud-based secrets management services.

We observed that while environment variables and .env files offer simplicity and convenience, they suffer from significant security vulnerabilities, especially when terminal access falls into the wrong hands.

In response to this challenge, we explored cloud-based secrets management services. Offered by major players like Azure, AWS, and Google Cloud, these services provide an efficient and highly secure way to manage secrets.

By tying access to defined identities and segregating development from production environments, they ensure that sensitive data is only available to those needing it.

An important factor that emerged from this discussion was the separation of responsibilities. In a well-managed system, developers shouldn’t have access to production secrets.

Instead, this access is typically managed by a DevOps team or engineers specifically tasked with deployment and system administration. This way, the individuals who need sensitive data for their work have access to it, while those who don’t cannot accidentally misuse or mishandle it.

The choice of your secrets management solution will depend largely on the specifics of your organization, its infrastructure, and its data handling policies.

For those operating in an on-premises environment, tools like HashiCorp Vault provide similar functionality to their cloud-based counterparts.

Security is not just about erecting barriers; it’s about intelligently managing access to sensitive data. It involves protecting your users, your system, and your organization’s reputation.

The tools and practices covered in this article help construct a framework where secure coding and data handling can thrive.

So remember: handling sensitive data well goes beyond safeguarding information. It means strategically orchestrating your resources, roles, and responsibilities.

With these insights and tools at your disposal, your Python applications will be more secure and in line with best practices for modern software development.

Happy coding!