The biggest threat to enterprise AI isn't hallucination; it's networking. Here is how to lock down your LLM infrastructure using private subnets, CIDR blocks, and custom ENIs.
Every week, another enterprise announces a “secure” internal AI copilot, and every week, I see architecture diagrams that make my skin crawl.
We’ve spent the last two years obsessing over retrieval-augmented generation (RAG). We’ve mastered chunking strategies, embedding models, and orchestration graphs. But if you look closely at how most of these systems are deployed, they are an absolute security nightmare.
Developers are spinning up cloud instances, loading proprietary company data into vector databases, and letting them communicate over public internet gateways. It is the cloud computing equivalent of leaving the keys in the ignition of a running car.
When you’re dealing with enterprise-grade deployments, the data passing through your embedding models and vector databases is highly sensitive. You cannot afford to leak context. As the industry matures, the focus is rapidly shifting from building AI to securing AI.
Having spent years navigating both machine learning optimization and the tangled web of cloud networking, I want to bridge the gap. Let’s talk about how to actually isolate your AI infrastructure.
The Public Route vs. The Private Fortress

The typical "tutorial" RAG deployment looks like this: your application server queries a public LLM API, pulls context from a managed vector database (often with a publicly accessible endpoint), and stitches it together.
In a production environment for healthcare, finance, or proprietary tech, this is unacceptable. You need absolute network isolation. This means bringing your infrastructure inside a Virtual Private Cloud (VPC) and meticulously managing your subnets.
Defining the Network: VPCs and CIDR

To build a private AI enclave, you need to start with the foundational boundaries. A VPC acts as your logically isolated section of the AWS cloud. But simply having a VPC isn't enough; you need to partition it correctly using Classless Inter-Domain Routing (CIDR).
When allocating your CIDR blocks, you must ensure your inference nodes (which might need to autoscale) have enough IP addresses, while keeping your vector databases locked in completely isolated private subnets.
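As a quick sanity check on the math, Python's built-in ipaddress module can model the partitioning (a standalone sketch, independent of any AWS tooling):

import ipaddress

# The VPC's address space: one /16 block
vpc_block = ipaddress.ip_network("10.0.0.0/16")
print(vpc_block.num_addresses)  # 65536 addresses in total

# Carving the VPC into /24 subnets yields 256 blocks of 256 addresses each
subnets = list(vpc_block.subnets(new_prefix=24))
print(len(subnets), subnets[0])  # 256 10.0.0.0/24

# AWS reserves 5 addresses per subnet, so each /24 leaves 251 usable hosts
print(subnets[0].num_addresses - 5)  # 251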
Here is how you might use the AWS Cloud Development Kit (CDK) in Python to define a strict, secure network foundation for your AI apps:
from aws_cdk import (
    aws_ec2 as ec2,
    App, Stack,
)
from constructs import Construct

class SecureAIVpcStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # 1. Define the VPC with a specific CIDR block.
        # A /16 gives us 65,536 IP addresses - plenty for autoscaling ML clusters.
        self.ai_vpc = ec2.Vpc(
            self, "EnterpriseAIVPC",
            ip_addresses=ec2.IpAddresses.cidr("10.0.0.0/16"),
            max_azs=2,  # High availability across two Availability Zones
            # 2. Strict subnet routing
            subnet_configuration=[
                ec2.SubnetConfiguration(
                    # Public subnet for load balancers ONLY
                    subnet_type=ec2.SubnetType.PUBLIC,
                    name="Ingress",
                    cidr_mask=24,
                ),
                ec2.SubnetConfiguration(
                    # Private subnet for the application and LLM inference nodes
                    subnet_type=ec2.SubnetType.PRIVATE_WITH_EGRESS,
                    name="InferenceCluster",
                    cidr_mask=24,
                ),
                ec2.SubnetConfiguration(
                    # ISOLATED subnet for the vector database (no internet access at all)
                    subnet_type=ec2.SubnetType.PRIVATE_ISOLATED,
                    name="VectorDB",
                    cidr_mask=24,
                ),
            ],
        )
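Note the difference between the two private tiers: PRIVATE_WITH_EGRESS subnets route outbound traffic through a NAT gateway, so your inference nodes can still pull model weights and container images, while PRIVATE_ISOLATED subnets have no route out of the VPC at all. The vector database never needs the internet, so it never gets it.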
The ENI Bottleneck: Connecting the Pieces

Now you have a vector DB sitting in an isolated subnet (meaning it literally has no route to the internet) and an open-source LLM (like Llama 3 or Mistral) hosted on an EC2 instance in a private subnet.
How do they talk?
This is where many developers get tripped up and accidentally open a public route. Instead, the two services should communicate via Elastic Network Interfaces (ENIs). Think of an ENI as a virtual network card attached to each instance.
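If you want to see these virtual network cards for yourself, you can list every ENI in the VPC with boto3 (a minimal sketch; the VPC ID is a placeholder, and it assumes your AWS credentials are already configured):

import boto3

ec2_client = boto3.client("ec2")

# List every ENI attached inside the VPC (placeholder VPC ID)
response = ec2_client.describe_network_interfaces(
    Filters=[{"Name": "vpc-id", "Values": ["vpc-0123456789abcdef0"]}]
)
for eni in response["NetworkInterfaces"]:
    print(eni["NetworkInterfaceId"], eni["PrivateIpAddress"], eni["SubnetId"])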
When your inference node queries the vector database, the traffic routes internally through these interfaces, with security groups acting as the bouncers.
# 3. Define the security groups (the bouncers).
# This continues inside SecureAIVpcStack.__init__, after the VPC definition.
llm_security_group = ec2.SecurityGroup(
    self, "LLMSecurityGroup",
    vpc=self.ai_vpc,
    description="Allow traffic from Load Balancer to Inference Nodes",
    allow_all_outbound=True,
)

vector_db_security_group = ec2.SecurityGroup(
    self, "VectorDBSecurityGroup",
    vpc=self.ai_vpc,
    description="Strictly allow traffic ONLY from the LLM nodes",
    allow_all_outbound=False,  # Lock it down
)

# 4. The crucial step: tying them together via internal routing.
# The vector DB accepts connections on port 5432 (e.g., pgvector)
# only from instances in the LLM security group.
vector_db_security_group.add_ingress_rule(
    peer=llm_security_group,
    connection=ec2.Port.tcp(5432),
    description="Allow private RAG queries via internal ENI routing",
)
With this configuration, even if a malicious actor breached your public-facing application, they couldn't reach the vector database directly. The data is isolated at the network layer, not just behind an API key.
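One way to verify the lockdown is a simple TCP reachability check (a hypothetical smoke test; the private IP below is a placeholder). Run it from an inference node and the connection should succeed; run it from anywhere else in the VPC and it should time out:

import socket

def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds before the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Private IP of the vector DB node (placeholder) on the pgvector port
print(can_reach("10.0.4.15", 5432))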
Final Thoughts

Moving from experimental Jupyter notebooks to enterprise reality requires a paradigm shift. You have to start thinking like a network engineer.
As we push towards highly specialized, agentic workflows, the volume of data moving between models and databases will only increase. By mastering your VPCs and CIDR blocks and locking down your ENIs, you ensure that your cutting-edge AI remains an asset rather than your company's biggest vulnerability.