Threat Model¶

Last Updated: January 15, 2025

Table of Contents:¶

Purpose
Key points
Architectural Walkthrough
Reviews and thoughts provided by Gemini 2.0.

Purpose:¶

A security threat model is a useful tool for understanding not only the security threats that are possible for a given solution, but also the architectural constructs that can be leveraged to counter them. The purpose of this page is to walk through a threat model specifically focused on running a Kubernetes service on a cloud provider. In this case, the Kubernetes cluster is deployed to a Virtual Private Cloud (VPC) based network.

Key Points:¶

The key to dealing with threats is to use multiple lines of defense against both external and internal threats, with monitoring and auditing at each level.
- This document focuses more on the control points than on the monitor/audit points. That’s not to say one is more important than the other. The latter will be covered in a later post.
The architecture leveraged here is founded on a set of key building blocks:
- It is cloud based. The same threat model can be applied to IBM Cloud, AWS, etc.
- The network is based on Virtual Private Clouds (VPCs)
  - There can be many VPCs hosting workloads in a solution architecture. The view below represents just one.
- The concept of a Transit VPC is leveraged to provide a secure public entry/exit point for all traffic AND a consolidation point for network access rules.
  - Transit VPC Pattern: https://aws.amazon.com/marketplace/solutions/infrastructure-software/transit-VPC
- The use of Transit Gateway(s) to connect VPCs between accounts, regions, etc.
  - Of course, VPC Peering can be used within a region/account.
- The use of Kubernetes, the Kubernetes network, and a Container Network Interface (CNI) implementation (e.g., Calico).
A key element of control is also the Identity and Access Management (IAM) configuration.
- It is critical to controlling access to the elements of the VPC, the Kubernetes cluster, and the namespaces within the cluster.
As mentioned above, there are multiple lines of defense. In this case:
- The Perimeter: This can be defended by leveraging several technologies working in tandem: a Web App Firewall, a Transit VPC, and/or the firewall in the VPC. Alternative solutions could be NSX in a VMware solution, a vSRX GW Appliance, or FSA.
  - The Transit VPC and the Transit Gateway are key architectural elements enhancing the control and maintainability of this architecture
- The virtual networks and associated controls within them. These include, but are not limited to:
  - The cloud provider VPC
    - VPC ACL: Control inbound and outbound traffic to your cluster by creating and applying access control lists (ACLs) to each subnet that your cluster is attached to.
    - The Route configuration of the VPC.
    - VPC Security Group: Control inbound and outbound traffic to your worker nodes by modifying a VPC security group.
    - ACLs and Security Groups:
  - The Kubernetes Network
    - Kubernetes Network Policy: You can use Kubernetes network policies to control network traffic between pods in your cluster and to isolate app microservices from each other within a namespace or across namespaces. In clusters that use the Calico network plug-in, Kubernetes network policies are automatically converted into Calico network policies when they are applied. The Calico plug-in enforces these policies by setting up Linux iptables rules on the worker nodes. The iptables rules act as a firewall for the worker node, defining the characteristics that network traffic must meet to be forwarded to the targeted resource. Kubernetes network policies are configured via YAML and kubectl.
    - Kubernetes Calico Network Policy: Many Kubernetes distributions use the Calico CNI plug-in, which provides the cluster-internal network that pods are connected to. Calico supports the standard Kubernetes Network Policy, but also provides its own custom resource definitions, which offer a number of enhancements. Install 'calicoctl' on your jump system and use it to manage Calico network policies. Some documentation and education links are listed below.
There are a number of best practices relative to VPC and Kubernetes deployment encompassed in this simplified threat model. See the Supporting Information Section for more details.
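
To make the Kubernetes Network Policy control described above more concrete, here is a minimal sketch in Python that builds two `networking.k8s.io/v1` NetworkPolicy manifests: a default-deny for ingress, and an allow rule for one source namespace. The namespace names (`payments`, `frontend`), the `app=api` label, and the helper function names are illustrative assumptions and do not come from the architecture in this document. The sketch also assumes a Kubernetes version that automatically sets the `kubernetes.io/metadata.name` label on namespaces.

```python
import json


def default_deny_ingress(namespace: str) -> dict:
    """Build a NetworkPolicy that selects every pod in `namespace` and lists no ingress rules."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-ingress", "namespace": namespace},
        "spec": {
            "podSelector": {},           # an empty selector matches all pods in the namespace
            "policyTypes": ["Ingress"],  # Ingress is restricted and no rule allows any traffic
        },
    }


def allow_from_namespace(namespace: str, app_label: str, source_namespace: str) -> dict:
    """Allow ingress to pods labelled app=<app_label> only from pods in `source_namespace`."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {
            "name": f"allow-{source_namespace}-to-{app_label}",
            "namespace": namespace,
        },
        "spec": {
            "podSelector": {"matchLabels": {"app": app_label}},
            "policyTypes": ["Ingress"],
            "ingress": [
                {
                    "from": [
                        {
                            "namespaceSelector": {
                                # Assumption: the namespace carries this automatic label.
                                "matchLabels": {"kubernetes.io/metadata.name": source_namespace}
                            }
                        }
                    ]
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical namespaces and labels, for illustration only.
    policies = [
        default_deny_ingress("payments"),
        allow_from_namespace("payments", "api", "frontend"),
    ]
    # kubectl accepts JSON as well as YAML manifests.
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": policies}, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`. Starting from a default-deny policy and then adding narrow allow rules is one common way to get the "east/west" isolation discussed in this document.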

Walkthrough:¶

The following is a walkthrough of the various elements in the picture below:

Cloud Identity & Access Management (IAM) Controls:
1. Role: Access to all services (compute, storage, networking) MUST be controlled via IAM. Many of the threats from rogue internal users can be controlled via the capabilities of IAM. It is typically also integrated with the Kubernetes Role Based Access Control (RBAC) mechanism to control access within K8S clusters.
  - The use of Multi-Factor Authentication (MFA) can also be included here.
  - https://w3.ibm.com/w3publisher/ibm-cloud-did-you-know/security/understanding-iam
Leverage the Transit VPC pattern: All public access, inbound and outbound, goes through here.
1. Role: To provide a single control point for all traffic in/out of VPCs AND between VPCs. In effect, a Transit VPC is a DMZ into a multi-zone environment. This is accomplished via the use of an HA-enabled, public-facing VPC Load Balancer; a set of Virtual Network Firewalls (VNFs) for consistent firewall rule management; public egress gateway(s) to control/manage outbound traffic; and VPN Gateways.
2. The key here is that the Transit VPC is the home for VNFs that provide a central point of control.
Micro-segmented networks based on 1..N VPCs, with all subnets exposed only to the cloud private network:
1. Role: The network design is the foundation for controlling the placement of resources, and hence the controls.
  - By properly segmenting the number of VPCs leveraged and/or the subnets within a VPC, using micro-segmentation techniques coupled with VPC and subnet level ACLs and Security Groups, access to key resources can be tightly controlled.
Transit Gateway (TGW)
1. Role: The TGW is an enabler to connect VPCs, in the same or different cloud accounts, via the cloud private network. TGWs can be leveraged as another vehicle to segment the network.
  - By leveraging VPC routing rules, traffic to/from TGWs can be tightly controlled if desired.
VPC controls: These act as the “traffic cop” for traffic into and out of a VPC, and to/from subnets and the VMs running on those subnets, with increasingly fine-grained access control.
1. Access Control Lists - Role: Control traffic to and from the subnets. They control the IP addresses that can get into a VPC or subnet (inbound access) AND/OR the IP addresses that can be targeted (outbound). [Stateless controls - Allow or Deny]
2. Security Groups - Role: Control access to virtual servers. In the case of a K8S deployment, they provide firewall-level control to limit access to worker nodes. For example, allow port 22 (SSH) access to VMs. [Stateful controls - Explicit Allow]
3. Routing Rules - Role: Routing rules (tables) exist at the VPC level to control traffic (based on IP address) that can ingress or egress a VPC.
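
The difference between the stateless ACL and the stateful Security Group can be illustrated with a toy model. The following Python sketch is a deliberate simplification and not any cloud provider's implementation: it assumes ACL rules are evaluated in order with first match winning and an implicit deny, and that a Security Group holds allow rules only and remembers flows it has already admitted. The class names, the `Packet` shape, and the addresses are all hypothetical.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Packet:
    direction: str   # "inbound" or "outbound", relative to the protected resource
    remote_ip: str   # address of the peer outside the subnet or server
    local_port: int  # port on the protected resource


@dataclass(frozen=True)
class Rule:
    action: str                 # "allow" or "deny" (Security Groups only use "allow")
    direction: str
    cidr: str
    port: Optional[int] = None  # None matches any port

    def matches(self, pkt: Packet) -> bool:
        return (
            self.direction == pkt.direction
            and ip_address(pkt.remote_ip) in ip_network(self.cidr)
            and self.port in (None, pkt.local_port)
        )


class StatelessAcl:
    """Every packet is checked on its own; replies need their own rule."""

    def __init__(self, rules):
        self.rules = list(rules)

    def permits(self, pkt: Packet) -> bool:
        for rule in self.rules:  # first matching rule decides
            if rule.matches(pkt):
                return rule.action == "allow"
        return False             # assumed implicit deny


class StatefulSecurityGroup:
    """Allow rules only; traffic on an already admitted flow is let through."""

    def __init__(self, allow_rules):
        self.allow_rules = list(allow_rules)
        self._flows = set()

    def permits(self, pkt: Packet) -> bool:
        flow = (pkt.remote_ip, pkt.local_port)
        if flow in self._flows:
            return True
        if any(rule.matches(pkt) for rule in self.allow_rules):
            self._flows.add(flow)
            return True
        return False


if __name__ == "__main__":
    ssh_rule = Rule("allow", "inbound", "10.0.0.0/24", 22)
    ssh_in = Packet("inbound", "10.0.0.5", 22)
    ssh_reply = Packet("outbound", "10.0.0.5", 22)

    acl = StatelessAcl([ssh_rule])
    sg = StatefulSecurityGroup([ssh_rule])
    print("ACL:", acl.permits(ssh_in), acl.permits(ssh_reply))
    print("SG: ", sg.permits(ssh_in), sg.permits(ssh_reply))
```

With a single inbound SSH rule, the ACL admits the request but drops the reply until a matching outbound rule is added, while the Security Group lets the reply through because it belongs to a flow that was already allowed.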
VPC Load Balancers:
1. Role: To provide the control point (domain name/IP address) to the services hosted in K8S. By leveraging a “private” VPC load balancer, only traffic from the cloud private network can access the K8S cluster.
  - The usage of a Private ALB in IKS is dependent on your choice for routing traffic. See https://w3.ibm.com/w3publisher/ibm-cloud-did-you-know/understanding-vpcs/accessing-apps-on-iks-via-vpc
  - VPC load balancers provide either Layer 4 or Layer 7 load balancing.
Kubernetes Cluster Role Management:
1. Role: To provide access controls within the K8S cluster. This controls who has admin access and what they can access. KEY POINT: In a hosted Kubernetes service, this can be configured and controlled via the cloud IAM. Additional tuning can be performed using the K8S capabilities, but it must be managed in conjunction with cloud IAM.
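
As an illustration of the "additional tuning" that Kubernetes RBAC allows, the sketch below builds a namespaced read-only Role and a RoleBinding as Python dictionaries using the `rbac.authorization.k8s.io/v1` API. The namespace, the role name, and the user identity are hypothetical. How a cloud IAM identity appears as a Kubernetes user name differs by provider, so the `user` string here is an assumption.

```python
import json


def read_only_pod_role(namespace: str, name: str = "pod-reader") -> dict:
    """Role that can only read pods in a single namespace."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "Role",
        "metadata": {"name": name, "namespace": namespace},
        "rules": [
            {
                "apiGroups": [""],  # "" is the core API group, where pods live
                "resources": ["pods"],
                "verbs": ["get", "list", "watch"],
            }
        ],
    }


def bind_role_to_user(namespace: str, role_name: str, user: str) -> dict:
    """RoleBinding that grants `role_name` to one user within `namespace`."""
    return {
        "apiVersion": "rbac.authorization.k8s.io/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{role_name}-binding", "namespace": namespace},
        "subjects": [
            {"kind": "User", "name": user, "apiGroup": "rbac.authorization.k8s.io"}
        ],
        "roleRef": {
            "kind": "Role",
            "name": role_name,
            "apiGroup": "rbac.authorization.k8s.io",
        },
    }


if __name__ == "__main__":
    # Hypothetical namespace and identity, for illustration only.
    role = read_only_pod_role("payments")
    binding = bind_role_to_user("payments", "pod-reader", "auditor@example.com")
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": [role, binding]}, indent=2))
```

Keeping such bindings narrow (one namespace, read-only verbs) is one way to limit what a compromised or rogue account can reach inside the cluster.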
Kubernetes Network Policies and Calico Network Policies
1. Kubernetes Network Policies - Role: Control access between namespaces, applications, and pods within a cluster. This provides “east/west” control IF the previous lines of control are breached. Limits what an attacker can access.
  - 2 levels of control: 1) Globally applied to the entire cluster. These rules enforce common standards across all applications and enable standard controls. Because standard Kubernetes NetworkPolicy objects are namespaced, cluster-wide rules of this kind are typically expressed with a Calico GlobalNetworkPolicy. 2) Application/namespace controls. These policies are defined to apply only to specific namespaces and enable controls unique to an application.
2. Calico Network Policies - Role: Build on Kubernetes Network Policies to provide capabilities such as prioritizing access rules, increased flexibility in matching the controls (rules) to the environment, and managing controls for VMs in one place.
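
The idea of prioritized rules, with global standards evaluated ahead of application-specific policies, can be sketched with a toy evaluator. This is a simplified model and not Calico's actual engine: it assumes each policy carries a numeric `order` (lower values are evaluated first, as in Calico), a single match condition, and an `Allow` or `Deny` action, and that traffic matching no policy is denied. All policy names, namespaces, and ports are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Flow:
    source_namespace: str
    dest_port: int


@dataclass(frozen=True)
class Policy:
    name: str
    order: float                    # lower order is evaluated first
    action: str                     # "Allow" or "Deny"
    matches: Callable[[Flow], bool]


def decide(policies: Iterable[Policy], flow: Flow) -> str:
    """Return the verdict of the first policy, by ascending order, that matches the flow."""
    for policy in sorted(policies, key=lambda p: p.order):
        if policy.matches(flow):
            return f"{policy.action} ({policy.name})"
    return "Deny (no policy matched)"


# A cluster-wide standard with a low order, plus an application-specific rule.
POLICIES = [
    Policy("app-allow-frontend", 200, "Allow",
           lambda f: f.source_namespace == "frontend" and f.dest_port == 8080),
    Policy("global-block-telnet", 10, "Deny",
           lambda f: f.dest_port == 23),
]

if __name__ == "__main__":
    for flow in (Flow("frontend", 8080), Flow("frontend", 23), Flow("batch", 8080)):
        print(flow, "->", decide(POLICIES, flow))
```

Because the global deny has the lower order, it takes effect before any application policy is consulted, which is how a platform team can enforce common standards that application teams cannot override.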
Web Application Firewalls:
1. Role: Front line of defense, serving as a DDoS control point and monitoring for known attack vectors. The WAF monitors incoming traffic, matches it against known attack types, and blocks matching traffic before it gets in.
Access to Cloud Services via private service endpoints:
1. Role: To ensure that access to cloud services consumed by an application workload ONLY utilizes the private network and hence is not exposed to public internet threats.
Bastion Host access to private subnet:
1. Role: An optional control point to ensure that all operations access to an environment goes through a separate network where tracking and controls are in place. This bastion network is also only accessible via a client VPN.

The following text was provided by Google Gemini 2.0. It references the picture above, drawn by Gary Zeien. The output was reviewed.

Validation of Security Controls and Governance in the Cloud Native Deployment Architecture

Overall, the diagram presents a valid approach to implementing security controls and governance in a cloud-native environment. It demonstrates a layered defense strategy with controls at multiple levels, including infrastructure, network, and application.

Key Valid Points:

Multi-Level Security: The diagram illustrates a layered approach with security controls at various levels:
- Infrastructure Level: VPCs, subnets, security groups, and network ACLs provide foundational security by segmenting the network and controlling traffic flow at the infrastructure level.
- Network Level: Kubernetes Network Policies, Istio, and Calico Network Policies enforce fine-grained control over traffic between Pods and namespaces within the cluster.
- Application Level: Security Groups, IAM roles, and service accounts control access to resources and services within the application layer.
Defense in Depth: The use of multiple security controls at different levels provides a defense-in-depth strategy, making it more difficult for attackers to exploit vulnerabilities.
Key Security Controls:
- VPCs and Subnets: Segmenting the network into VPCs and subnets provides isolation and limits the blast radius of potential attacks.
- Security Groups: Act as firewalls for EC2 instances, controlling traffic flow based on source/destination IP addresses and ports.
- Network ACLs: Control traffic flow at the subnet level, providing an additional layer of security.
- Kubernetes Network Policies: Enforce granular control over traffic between Pods and namespaces within the Kubernetes cluster.
- Istio: A service mesh that provides advanced traffic management, security, and observability features.
- IAM Roles and Policies: Control access to AWS services and resources based on the principle of least privilege.
- WAF (Web Application Firewall): Protects web applications from common attacks such as SQL injection and cross-site scripting.
Cloud Services: The diagram highlights the use of cloud services like IAM, VPC, and Security Groups, which provide built-in security controls and ease of management.

Areas for Improvement/Clarification:

Transit VPC: While the diagram mentions a Transit VPC, its specific role and how it interacts with other components could be further clarified.
Istio Integration: The diagram could provide more detail on how Istio interacts with other security controls, such as Kubernetes Network Policies, and how it enforces security policies.
Data Flow: The diagram could be more explicit in showing the flow of traffic between different components and how security controls are applied at each stage.
Threat Modeling: The diagram could benefit from incorporating threat modeling concepts to identify potential attack vectors and how the implemented controls mitigate those risks.

Overall, the diagram presents a good foundation for understanding how to implement security controls and governance in a cloud-native environment. By incorporating the suggestions for improvement, the diagram could be further enhanced to provide a more comprehensive and detailed view of the security architecture.

More on a Transit VPC¶

A Transit VPC implements a "hub-and-spoke" network topology where all incoming and outgoing traffic traverses a central "hub" (the Transit VPC). This design offers several advantages for security and governance:

Benefits of a Transit VPC:

Centralized Security Inspection: By routing all traffic through the Transit VPC, you can centralize security inspection and enforcement. This allows you to deploy security appliances (firewalls, intrusion detection/prevention systems, etc.) in a single location and inspect all traffic entering and leaving your network.
Simplified Network Management: A Transit VPC simplifies network management by reducing the number of direct connections between VPCs. Instead of managing a complex mesh of peerings, you only need to manage connections to the central Transit VPC.
Improved Security Posture: Centralized security inspection and simplified network management contribute to an improved security posture. You have better visibility into network traffic and can more easily enforce consistent security policies across all VPCs.
Reduced Operational Overhead: Centralized management reduces operational overhead by eliminating the need to configure and maintain multiple security appliances and network connections.
Cost Optimization: In some cases, a Transit VPC can help optimize costs by reducing data transfer charges between VPCs.
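
The "complex mesh of peerings" point can be made concrete with a quick count. The sketch below compares the number of connections needed to fully mesh N VPCs with direct peerings against the number needed when each VPC attaches once to a central hub. It assumes one peering per pair of VPCs in the mesh and exactly one attachment per spoke, which ignores redundancy and cross-region details.

```python
def full_mesh_links(vpcs: int) -> int:
    """Peerings needed so that every VPC connects directly to every other VPC."""
    return vpcs * (vpcs - 1) // 2


def hub_and_spoke_links(spoke_vpcs: int) -> int:
    """Attachments needed when every spoke VPC connects only to the central hub."""
    return spoke_vpcs


if __name__ == "__main__":
    for n in (3, 10, 50):
        print(f"{n} VPCs: full mesh = {full_mesh_links(n)}, hub-and-spoke = {hub_and_spoke_links(n)}")
```

The mesh grows quadratically while the hub-and-spoke layout grows linearly, which is the management simplification described above.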

How a Transit VPC Enhances Security and Governance:

Consistent Security Policies: You can enforce consistent security policies across all VPCs by implementing them in the Transit VPC. This ensures that all traffic is subject to the same security rules, regardless of its source or destination.
Centralized Logging and Monitoring: The Transit VPC provides a central point for logging and monitoring network traffic. This simplifies security analysis and incident response.
Isolation and Segmentation: While the Transit VPC acts as a central hub, it still allows for isolation between different VPCs. You can use routing rules and security policies to control which VPCs can communicate with each other.
Compliance and Auditing: A Transit VPC can help you meet compliance requirements by providing a clear and auditable record of network traffic.

Use Cases for Transit VPC:

Connecting Multiple VPCs: A Transit VPC is ideal for connecting multiple VPCs in a hub-and-spoke topology.
Hybrid Cloud Connectivity: You can use a Transit VPC to connect your cloud environment to your on-premises data center.
Centralized Security Inspection: A Transit VPC is well-suited for implementing centralized security inspection and enforcement.

Alternatives to Transit VPC:

AWS Transit Gateway: AWS Transit Gateway is a managed service that provides similar functionality to a Transit VPC but with greater scalability and availability.
VPC Peering: VPC peering allows you to create direct connections between VPCs. However, this can become complex to manage with a large number of VPCs.

In summary: A Transit VPC provides a centralized hub for network traffic, which significantly helps in implementing security controls and governance. It simplifies network management, improves security posture, and reduces operational overhead. However, it's essential to consider the specific needs of your environment and evaluate whether a Transit VPC or an alternative solution like AWS Transit Gateway is the best fit.

Adding Istio¶

This section explores how integrating Istio would affect the diagram and enhance the security and governance posture.

Key Impacts of Istio on the Diagram:

Sidecar Proxies: Istio's core mechanism is the injection of Envoy proxy sidecars into each application Pod. This is already partially represented in the diagram, but Istio would make this explicit for all relevant Pods within the mesh.
Traffic Interception and Control: Istio would take over the responsibility of managing almost all network traffic within the Kubernetes cluster. This means:
- Traffic between Pods within the same namespace.
- Traffic between Pods in different namespaces.
- Egress traffic to external services.
Enhanced Security: Istio would significantly enhance security through several features:
- Mutual TLS (mTLS): Istio can automatically enforce mTLS between all services within the mesh, encrypting all communication and providing strong authentication. This would complement, rather than replace, the network-level controls provided by network policies.
- Authorization Policies: Istio allows you to define fine-grained authorization policies to control access to services based on various criteria (e.g., identity, source IP, HTTP headers). This adds an application-level security layer.
Traffic Management: Istio provides advanced traffic management capabilities:
- Routing Rules: You can define sophisticated routing rules to control how traffic is directed between services (e.g., based on HTTP headers, paths, or weights).
- Traffic Splitting: You can gradually roll out new versions of a service by splitting traffic between different versions.
- Fault Injection: You can inject faults (e.g., delays, errors) to test the resilience of your application.
Observability: Istio collects detailed telemetry data (metrics, logs, traces) about all traffic within the mesh. This provides valuable insights into the performance and behavior of your applications.
Egress Control: Istio can manage egress traffic to external services through its egress gateway. This provides more fine-grained control over which external services can be accessed and how.
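
As a sketch of what the mTLS and authorization features above look like in configuration, the Python below builds an Istio `PeerAuthentication` resource that requires strict mTLS and an `AuthorizationPolicy` that allows one caller identity to issue GET requests. It assumes that `istio-system` is the mesh root namespace and uses the `security.istio.io/v1beta1` API version. The namespaces, labels, service account, and path are hypothetical.

```python
import json


def mesh_wide_strict_mtls(root_namespace: str = "istio-system") -> dict:
    """PeerAuthentication in the root namespace: workloads accept only mTLS traffic."""
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "PeerAuthentication",
        "metadata": {"name": "default", "namespace": root_namespace},
        "spec": {"mtls": {"mode": "STRICT"}},
    }


def allow_get_from_service_account(namespace: str, app_label: str,
                                   caller_namespace: str, caller_sa: str) -> dict:
    """AuthorizationPolicy: only the named service account may GET /api/* on the app."""
    return {
        "apiVersion": "security.istio.io/v1beta1",
        "kind": "AuthorizationPolicy",
        "metadata": {"name": f"{app_label}-allow-{caller_sa}", "namespace": namespace},
        "spec": {
            "selector": {"matchLabels": {"app": app_label}},
            "action": "ALLOW",
            "rules": [
                {
                    "from": [
                        {
                            "source": {
                                # Caller identity as carried in its mTLS certificate.
                                "principals": [f"cluster.local/ns/{caller_namespace}/sa/{caller_sa}"]
                            }
                        }
                    ],
                    "to": [{"operation": {"methods": ["GET"], "paths": ["/api/*"]}}],
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical workload names, for illustration only.
    resources = [
        mesh_wide_strict_mtls(),
        allow_get_from_service_account("payments", "api", "frontend", "web"),
    ]
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": resources}, indent=2))
```

The authorization rule depends on the peer identity established by mTLS, which is why the two resources are usually introduced together.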

Specific Changes to the Diagram:

More Prominent Sidecars: The Istio sidecar proxies would be more prominently shown in all relevant Pods.
Reduced Reliance on Network Policies: While Kubernetes Network Policies would still be relevant for basic network segmentation (e.g., preventing traffic between namespaces that should be completely isolated), Istio would handle most of the fine-grained traffic control within the mesh.
Istio Configuration: The diagram would need to show how Istio is configured (using YAML files or istioctl).
mTLS Enforcement: The diagram would need to indicate the enforcement of mTLS between services.
Egress Gateway Enhancement: The role of the Istio Egress Gateway would be more prominent, showing how it controls and secures egress traffic.

How Istio Improves Security and Governance in this Architecture:

Zero Trust Security: Istio's mTLS and authorization policies enable a zero-trust security model, where every service is authenticated and authorized before it can communicate with another service.
Fine-Grained Access Control: Istio's authorization policies allow you to implement fine-grained access control at the application level, going beyond the basic network-level controls provided by Network Policies and Security Groups.
Improved Observability and Auditing: Istio's telemetry data provides detailed insights into network traffic and service behavior, which is crucial for monitoring, troubleshooting, and auditing.
Simplified Security Management: Istio simplifies security management by providing a centralized platform for configuring and enforcing security policies.

In summary: Integrating Istio would significantly enhance the security and governance of the architecture depicted in the diagram. It would provide more fine-grained control over traffic, enforce strong authentication and authorization, and provide valuable observability data. While the diagram already shows a good foundation for security, Istio would add a powerful layer of application-level security and traffic management.

Threat Modeling¶

Threat modeling is a structured process for identifying, analyzing, and mitigating security threats in a system or application. It involves understanding potential attackers, their motivations, and the methods they might use to exploit vulnerabilities. The goal is to proactively identify and address security risks before they can be exploited.

Here are some core threat modeling concepts:

1. Assets:

These are the valuable components of your system that you want to protect. They can include data (customer data, financial records), applications, systems (servers, databases), and even physical infrastructure.
Identifying assets is the first step in threat modeling, as it helps you focus your efforts on protecting what matters most.

2. Threats:

These are potential events that could harm your assets. They can be intentional (attacks by malicious actors) or unintentional (accidental data deletion).
Threats can be categorized in various ways, such as:
- STRIDE: Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, Elevation of Privilege.
- DREAD: Damage, Reproducibility, Exploitability, Affected Users, Discoverability.

3. Vulnerabilities:

These are weaknesses in your system that could be exploited by threats. They can be technical (software bugs, misconfigurations) or non-technical (weak passwords, lack of security awareness).
Identifying vulnerabilities is crucial for understanding how threats could be realized.

4. Attack Vectors:

These are the methods that attackers might use to exploit vulnerabilities and compromise your system. They can include:
- Network attacks (e.g., port scanning, denial-of-service attacks)
- Application attacks (e.g., SQL injection, cross-site scripting)
- Social engineering (e.g., phishing)

5. Threat Agents:

These are the individuals or groups who might carry out threats. They can include:
- External attackers (hackers, cybercriminals)
- Internal attackers (malicious employees)
- Nation-state actors

6. Risk:

Risk is the potential for harm resulting from a threat exploiting a vulnerability. It is typically assessed based on the likelihood of the threat occurring and the impact it would have.
Risk assessment helps prioritize mitigation efforts by focusing on the most critical risks.
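
A minimal sketch of likelihood-and-impact risk assessment follows. It assumes a 1 to 5 scale for each factor and uses their product as the score, which is one common convention rather than a fixed standard. The example threats and their ratings are invented for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Risk:
    threat: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)

    @property
    def score(self) -> int:
        """Likelihood multiplied by impact, giving a value from 1 to 25."""
        return self.likelihood * self.impact


def prioritize(risks: Iterable[Risk]) -> List[Risk]:
    """Order risks from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    # Invented ratings, for illustration only.
    register = [
        Risk("Pod-to-pod lateral movement", likelihood=2, impact=4),
        Risk("Stolen cluster admin credentials", likelihood=3, impact=5),
        Risk("DDoS against the public load balancer", likelihood=4, impact=3),
    ]
    for risk in prioritize(register):
        print(f"{risk.score:>2}  {risk.threat}")
```

Sorting by score gives a first-pass ordering for mitigation work. In practice the ratings themselves need review by people who know the system.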

Threat Modeling Methodologies:

Several methodologies can be used for threat modeling, including:

STRIDE: Focuses on identifying threats based on six categories: Spoofing, Tampering, Repudiation, Information Disclosure, Denial of Service, and Elevation of Privilege.
DREAD: Focuses on risk assessment based on five categories: Damage, Reproducibility, Exploitability, Affected Users, and Discoverability.
PASTA (Process for Attack Simulation and Threat Analysis): A seven-step methodology that focuses on understanding the business context and simulating attacks.
OCTAVE (Operationally Critical Threat, Asset, and Vulnerability Evaluation): A risk-based methodology that focuses on identifying critical assets and threats to those assets.
Trike: A methodology that focuses on understanding the system from the perspective of different stakeholders.
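
To show how one of these methodologies turns into numbers, here is a small DREAD scoring sketch. It assumes each of the five categories is rated from 1 to 10 and that the overall score is the plain average of the five ratings. Teams use different scales and weightings, so treat both choices as assumptions. The example ratings are invented.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class DreadRating:
    damage: int
    reproducibility: int
    exploitability: int
    affected_users: int
    discoverability: int

    def __post_init__(self) -> None:
        # Reject ratings outside the assumed 1..10 scale.
        for value in astuple(self):
            if not 1 <= value <= 10:
                raise ValueError("each DREAD rating must be between 1 and 10")

    def score(self) -> float:
        """Average of the five category ratings."""
        return sum(astuple(self)) / 5


if __name__ == "__main__":
    # Invented ratings for a hypothetical exposed admin endpoint.
    rating = DreadRating(damage=8, reproducibility=6, exploitability=7,
                         affected_users=9, discoverability=5)
    print(f"DREAD score: {rating.score():.1f}")
```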

The Threat Modeling Process:

A typical threat modeling process involves the following steps:

Define Scope: Determine the system or application to be analyzed.
Identify Assets: Identify the valuable components of the system.
Decompose the Application: Understand how the application works, including its architecture, data flow, and dependencies.
Identify Threats: Identify potential threats to the system.
Identify Vulnerabilities: Identify weaknesses in the system that could be exploited by threats.
Analyze and Prioritize Risks: Assess the likelihood and impact of each threat and prioritize mitigation efforts.
Develop Mitigation Strategies: Develop and implement security controls to mitigate the identified risks.

By applying these threat modeling concepts and methodologies, you can proactively identify and address security risks in your systems and applications, improving your overall security posture.
