Production goal
- Deliver a stable prod stack for the first customer
- Keep cost under 200 dollars per month at low traffic
- Keep operations light and make scale-out simple
Core components
- Spring API Gateway as the single backend entry point
- Java microservices, Rasa, Kafka in KRaft mode
- PostgreSQL on Amazon RDS
- S3 plus CloudFront for the website and assets
High-level architecture
c1.connexing.io ──▶ CloudFront ──▶ S3 (static site)
api.c1.connexing.io ──▶ ALB (TLS terminates) ──▶ EC2 t4g.xlarge (Docker Compose)
                                                  ├─ Spring API Gateway :8080 ◀─ ALB target
                                                  ├─ Java services (internal only)
                                                  ├─ Rasa server (internal only)
                                                  └─ Kafka broker (KRaft) + 200 GB gp3
PostgreSQL (RDS db.t4g.micro Single-AZ, private subnets)
Nginx is not required. The Spring API Gateway is the only internet-facing container behind the ALB.
Networking and DNS
- Public subnets for ALB and the app EC2. Private subnets for RDS
- Route 53 for DNS. ACM for TLS
- No NAT Gateway. No VPC interface endpoints
- EC2 has a public IP for outbound traffic only. No inbound rule from 0.0.0.0/0
- SSM Session Manager for admin access. SSH on port 22 is available from allowlisted office IPs only
Domains
- c1.connexing.io → CloudFront → S3 origin with OAC. S3 bucket has Block Public Access
- api.c1.connexing.io → ALB → target group on port 8080 → Spring API Gateway
Certificates
- CloudFront certificate in us-east-1
- ALB certificate in the application region
- Note. A wildcard *.connexing.io does not cover api.c1.connexing.io
- Options. Issue per-tenant certs for api.c1.connexing.io or use api-c1.connexing.io so the wildcard applies
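The wildcard note above can be checked with a small sketch. It assumes the usual simplified rule that a leading "*" stands for exactly one DNS label and never spans a dot. The function name wildcard_covers is hypothetical and is not part of any AWS or TLS library.

```python
def wildcard_covers(pattern: str, hostname: str) -> bool:
    """Return True if a certificate name covers the hostname.

    Simplified rule (assumption): "*" is only honoured as the whole
    leftmost label and matches exactly one non-empty label.
    """
    p_labels = pattern.lower().rstrip(".").split(".")
    h_labels = hostname.lower().rstrip(".").split(".")
    # A wildcard never adds or removes labels, so the counts must agree.
    if len(p_labels) != len(h_labels):
        return False
    for index, (p, h) in enumerate(zip(p_labels, h_labels)):
        if index == 0 and p == "*":
            if not h:
                return False
            continue
        if p != h:
            return False
    return True


if __name__ == "__main__":
    for host in ("api.c1.connexing.io", "api-c1.connexing.io"):
        print(host, wildcard_covers("*.connexing.io", host))
```

api.c1.connexing.io has one label more than *.connexing.io, so the wildcard does not cover it. api-c1.connexing.io has the same label count and is covered.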
ALB specifics
- Listener 443 only. TLS terminates at ALB
- Target group port 8080
- Health check path /actuator/health. Success codes 200–399
Compute and runtime
- One EC2 t4g.xlarge. Amazon Linux 2023
- Docker Compose runs Spring API Gateway and all Java services
- Also runs Rasa and one Kafka broker in KRaft mode
- One EBS gp3 200 GB attached for Kafka data
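Because everything shares one host, it may help to check that the Docker Compose memory limits fit the instance before deploying. The sketch below is only an illustration. The 16 GiB figure is the t4g.xlarge memory size. The container names, the per-container limits, and the host reserve are all assumptions and need to be replaced with measured values.

```python
INSTANCE_MEMORY_GIB = 16.0  # t4g.xlarge
HOST_RESERVE_GIB = 1.5      # assumption: OS, Docker daemon, CloudWatch Agent

# Hypothetical per-container memory limits in GiB (not taken from the plan).
LIMITS_GIB = {
    "api-gateway": 1.0,
    "java-service-a": 1.0,
    "java-service-b": 1.0,
    "rasa": 3.0,
    "kafka": 6.0,  # assumption: 4 GiB heap plus page-cache headroom
}


def memory_headroom_gib(limits: dict, instance_gib: float, reserve_gib: float) -> float:
    """Return unallocated memory after the host reserve and all container limits."""
    return instance_gib - reserve_gib - sum(limits.values())


if __name__ == "__main__":
    headroom = memory_headroom_gib(LIMITS_GIB, INSTANCE_MEMORY_GIB, HOST_RESERVE_GIB)
    if headroom < 0:
        raise SystemExit(f"Over-committed by {-headroom:.1f} GiB")
    print(f"Headroom: {headroom:.1f} GiB")
```

A negative headroom means the limits over-commit the host, which is the point at which the scale plan's Kafka split becomes relevant.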
Service discovery
- Use Docker DNS service names
- If you already rely on Eureka, run a small Eureka container
Database
- Amazon RDS for PostgreSQL db.t4g.micro. Single-AZ
- 20–40 GB storage. Encryption on. Storage autoscaling on
- Automated backups and point-in-time restore. Retain 7–14 days
- Security group allows port 5432 only from the app EC2 security group
Kafka settings
- KRaft mode. No ZooKeeper
- advertised.listeners uses the EC2 private hostname or Docker network alias
- Heap target 3–4 GB. Tune retention by time or bytes to fit the 200 GB volume
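Sizing retention by bytes can be sketched as simple arithmetic. Kafka applies retention.bytes per partition and only deletes closed log segments, so the sketch subtracts one segment per partition from the budget. The partition count and the 25 percent disk headroom are assumptions. The 1 GiB segment size is Kafka's default for log.segment.bytes.

```python
GIB = 1024 ** 3


def retention_bytes_per_partition(
    volume_gib: int,
    total_partitions: int,
    headroom_fraction: float = 0.25,  # assumption: free space kept for safety
    segment_bytes: int = GIB,         # Kafka default log.segment.bytes
) -> int:
    """Suggest a per-partition retention.bytes value for a single broker."""
    if total_partitions <= 0:
        raise ValueError("total_partitions must be positive")
    usable = int(volume_gib * GIB * (1 - headroom_fraction))
    per_partition = usable // total_partitions
    # The active segment is not eligible for deletion, so leave room for it.
    budget = per_partition - segment_bytes
    if budget <= 0:
        raise ValueError("too many partitions for this volume and segment size")
    return budget


if __name__ == "__main__":
    # Hypothetical example: 50 partitions in total on the 200 GB volume.
    print(retention_bytes_per_partition(200, 50))
```

With these assumed inputs each partition gets 3 GiB of disk, of which 2 GiB is set as retention.bytes. Time-based retention (retention.ms) can be combined with this. Kafka deletes data when either limit is reached.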
Security and access
- SSM Session Manager preferred. SSH on port 22 allowed from allowlisted office IPs
- Least privilege IAM roles
- AWS Secrets Manager for DB passwords and API keys. Inject as env vars
- Encrypt at rest on RDS and S3
- ALB → EC2 SG rule allows inbound to port 8080 from the ALB SG only
Do not enable AWS WAF for now. Do not add NAT Gateways or VPC interface endpoints.
CI and CD — GitLab to AWS
- Keep GitLab CI on Netcup
- GitLab OIDC assumes a prod deploy IAM role in AWS
- Build arm64 images. Push to Amazon ECR. One repo per service. Lifecycle keeps last 5–10 tags
- Deploy via SSM Run Command on EC2
Deploy steps run on EC2
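aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <account-id>.dkr.ecr.<region>.amazonaws.com
docker compose pull && docker compose up -d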
# DB migrations as a one-shot container
- Static assets uploaded to S3. Invalidate CloudFront
- Rollback job redeploys previous tag by SHA
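The deploy and rollback jobs can share one command list that differs only in the image tag. The sketch below builds that list as plain strings, for example to pass as the commands parameter of the AWS-RunShellScript document. Several things here are assumptions: the compose directory /opt/app, a Compose file that reads an IMAGE_TAG variable, a one-shot Compose service named migrate, and running migrations before the services restart.

```python
import shlex


def build_deploy_commands(
    registry: str,
    region: str,
    image_tag: str,
    compose_dir: str = "/opt/app",     # hypothetical location on the EC2 host
    migrate_service: str = "migrate",  # hypothetical one-shot Compose service
) -> list:
    """Return shell commands that deploy one image tag with Docker Compose."""
    q = shlex.quote
    return [
        "set -eu",
        f"cd {q(compose_dir)}",
        # ECR needs a short-lived password from the AWS CLI.
        f"aws ecr get-login-password --region {q(region)}"
        f" | docker login --username AWS --password-stdin {q(registry)}",
        # Assumption: the Compose file references ${IMAGE_TAG} for every image.
        f"export IMAGE_TAG={q(image_tag)}",
        "docker compose pull",
        f"docker compose run --rm {q(migrate_service)}",
        "docker compose up -d",
    ]


if __name__ == "__main__":
    # Placeholder registry, region, and tag for illustration only.
    for line in build_deploy_commands(
        "123456789012.dkr.ecr.eu-central-1.amazonaws.com", "eu-central-1", "abc1234"
    ):
        print(line)
```

A rollback under this scheme is the same call with the previous commit SHA as image_tag. Note that redeploying an older image does not reverse migrations that have already run.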
IAM summary
- EC2 instance profile: AmazonSSMManagedInstanceCore, ECR pull, CloudWatch logs
- GitLab OIDC deploy role: ECR push, SSM SendCommand, EC2 Describe, S3 Put on web bucket, CloudFront CreateInvalidation, Logs Put
Observability
- CloudWatch Agent on EC2. Ship container logs
- Log retention 30 days
- Alarms. EC2 memory and disk. Kafka disk and consumer lag. RDS CPU and free storage. ALB HTTP 5xx
Cost guardrails
- One EC2 t4g.xlarge only
- RDS db.t4g.micro Single-AZ only
- One ALB only
- No NAT Gateway. No VPC interface endpoints
- ECR storage capped by lifecycle
- AWS Budgets alert when forecast spend reaches 180 dollars. Notify by email and SNS
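The ECR storage cap can be expressed as a lifecycle policy document. The sketch below generates one that expires images beyond a fixed count. It counts images rather than tags, which only approximates "last 5–10 tags" when each build pushes one image. The helper name and the choice of 10 are assumptions.

```python
import json


def ecr_lifecycle_policy(keep_last: int) -> str:
    """Return an ECR lifecycle policy (JSON text) keeping the newest images."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    policy = {
        "rules": [
            {
                "rulePriority": 1,
                "description": f"Expire all but the newest {keep_last} images",
                "selection": {
                    "tagStatus": "any",
                    "countType": "imageCountMoreThan",
                    "countNumber": keep_last,
                },
                "action": {"type": "expire"},
            }
        ]
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(ecr_lifecycle_policy(10))
```

The same JSON can be attached to each per-service repository, for instance through the Terraform deliverable.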
Scale plan
- If EC2 memory or IO is high. Split Kafka to a dedicated m7g.large. Keep app on t4g.large
- If availability is required. Add 2 more Kafka brokers. Move DB to Aurora PostgreSQL Serverless v2
- If traffic grows. Add WAF and CloudFront rules. Consider moving stateless services to App Runner or ECS Fargate
Deliverables
- Terraform. VPC, subnets, ALB, EC2, EBS, RDS, S3, CloudFront, Route 53, ACM, IAM roles, Secrets, SSM, CloudWatch, Budgets
- Docker Compose file on EC2 with memory limits and KRaft config
- GitLab CI template for build, push, deploy, rollback
- Runbook. Backup, restore, and scale-out steps
Exact rules and roles to avoid guessing
Security groups
- alb-sg. Inbound 443 from 0.0.0.0/0. Outbound any
- app-ec2-sg. Inbound 8080 from alb-sg only; Inbound 22 from allowlisted office IPs. Outbound any
- rds-sg. Inbound 5432 from app-ec2-sg only
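The inbound rules above are small enough to model and lint before they go into Terraform. The sketch below is a plain-Python check, not an AWS API call. It flags any rule open to the whole internet other than 443 on alb-sg. The office CIDR is a documentation-range placeholder, and the class and function names are hypothetical.

```python
from dataclasses import dataclass

OFFICE_CIDR = "203.0.113.0/24"  # placeholder (RFC 5737 documentation range)
WORLD = ("0.0.0.0/0", "::/0")


@dataclass(frozen=True)
class InboundRule:
    group: str   # security group the rule belongs to
    port: int
    source: str  # CIDR block or source security group name


RULES = [
    InboundRule("alb-sg", 443, "0.0.0.0/0"),
    InboundRule("app-ec2-sg", 8080, "alb-sg"),
    InboundRule("app-ec2-sg", 22, OFFICE_CIDR),
    InboundRule("rds-sg", 5432, "app-ec2-sg"),
]


def world_open_violations(rules: list) -> list:
    """Return rules open to the internet, except HTTPS on the ALB."""
    return [
        rule
        for rule in rules
        if rule.source in WORLD and not (rule.group == "alb-sg" and rule.port == 443)
    ]


if __name__ == "__main__":
    bad = world_open_violations(RULES)
    print("violations:", bad)
```

Running a check like this in CI would catch an accidental 0.0.0.0/0 rule on SSH or PostgreSQL before it is applied.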
EC2 instance profile
- AmazonSSMManagedInstanceCore
- Read-only ECR pull
- CloudWatch logs: CreateLogStream, PutLogEvents
GitLab deploy role (OIDC)
- ECR: GetAuthorizationToken, BatchCheckLayerAvailability, InitiateLayerUpload, UploadLayerPart, CompleteLayerUpload, PutImage, BatchGetImage
- SSM: SendCommand, GetCommandInvocation, DescribeInstanceInformation
- EC2: DescribeInstances
- S3: PutObject, ListBucket, DeleteObject on the web bucket prefix
- CloudFront: CreateInvalidation
- Logs: CreateLogStream, PutLogEvents for a deploy log group
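The deploy role permissions can be written out as an IAM policy document. The sketch below assembles one from the actions listed above. Every identifier passed in (account ID, region, bucket, instance ID, distribution ID, log group) is a placeholder. The resource scoping is an assumption, and the S3 statements cover the whole web bucket rather than a prefix, so they should be narrowed where a prefix is used.

```python
import json


def deploy_role_policy(account_id, region, web_bucket, instance_id,
                       distribution_id, log_group) -> dict:
    """Build an IAM policy document for the GitLab OIDC deploy role."""
    def allow(sid, actions, resource):
        return {"Sid": sid, "Effect": "Allow", "Action": actions, "Resource": resource}

    return {
        "Version": "2012-10-17",
        "Statement": [
            # GetAuthorizationToken does not support resource-level scoping.
            allow("EcrAuth", ["ecr:GetAuthorizationToken"], "*"),
            allow("EcrPush", [
                "ecr:BatchCheckLayerAvailability", "ecr:InitiateLayerUpload",
                "ecr:UploadLayerPart", "ecr:CompleteLayerUpload",
                "ecr:PutImage", "ecr:BatchGetImage",
            ], f"arn:aws:ecr:{region}:{account_id}:repository/*"),
            allow("SsmSend", ["ssm:SendCommand"], [
                f"arn:aws:ec2:{region}:{account_id}:instance/{instance_id}",
                f"arn:aws:ssm:{region}::document/AWS-RunShellScript",
            ]),
            allow("SsmEc2Read", [
                "ssm:GetCommandInvocation", "ssm:DescribeInstanceInformation",
                "ec2:DescribeInstances",
            ], "*"),
            allow("WebBucketList", ["s3:ListBucket"], f"arn:aws:s3:::{web_bucket}"),
            allow("WebBucketWrite", ["s3:PutObject", "s3:DeleteObject"],
                  f"arn:aws:s3:::{web_bucket}/*"),
            allow("CloudFrontInvalidate", ["cloudfront:CreateInvalidation"],
                  f"arn:aws:cloudfront::{account_id}:distribution/{distribution_id}"),
            allow("DeployLogs", ["logs:CreateLogStream", "logs:PutLogEvents"],
                  f"arn:aws:logs:{region}:{account_id}:log-group:{log_group}:*"),
        ],
    }


if __name__ == "__main__":
    # All arguments below are placeholders for illustration only.
    print(json.dumps(deploy_role_policy(
        "123456789012", "eu-central-1", "example-web-bucket",
        "i-0123456789abcdef0", "EXAMPLEDISTID", "/deploy/prod"), indent=2))
```

The printed JSON can be pasted into Terraform or the console. The trust policy that lets GitLab OIDC assume the role is separate and is not shown here.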
All numbers refer to the first customer only. Review before adding more tenants.