AWS RDS: Setup, Multi-AZ, Read Replicas, Parameter Groups and Cost Optimization

Learn AWS RDS from a developer's perspective. Covers instance setup, Multi-AZ for high availability, read replicas, parameter groups, automated backups, security, monitoring, and cost optimization.

Yusuf Seyitoğlu · March 17, 2026


Amazon RDS (Relational Database Service) manages the operational overhead of running a relational database (provisioning, patching, backups, and failover) so you can focus on your application. But knowing how to configure it correctly for production workloads, security, and cost is where the real skill lies.

What RDS Manages for You

  • Operating system and database engine patches
  • Automated backups and point-in-time recovery
  • Storage auto-scaling
  • Multi-AZ failover
  • Monitoring and alerting via CloudWatch
  • Encryption at rest and in transit

What you still manage: schema design, query optimization, indexes, and connection management.

Creating an RDS Instance with Terraform

hcl
# modules/rds/main.tf
resource "aws_db_subnet_group" "main" {
  name       = "${var.name}-subnet-group"
  subnet_ids = var.private_subnet_ids

  tags = { Name = "${var.name}-subnet-group" }
}

resource "aws_security_group" "rds" {
  name   = "${var.name}-rds-sg"
  vpc_id = var.vpc_id

  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = var.app_security_group_ids  # only app servers can connect
  }

  tags = { Name = "${var.name}-rds-sg" }
}

resource "aws_db_instance" "main" {
  identifier            = var.name
  engine                = "postgres"
  engine_version        = "16.1"
  instance_class        = var.instance_class
  allocated_storage     = var.storage_gb
  max_allocated_storage = var.storage_gb * 3  # auto-scaling ceiling

  db_name  = var.db_name
  username = var.db_username
  password = var.db_password  # use AWS Secrets Manager in production

  # Networking
  db_subnet_group_name   = aws_db_subnet_group.main.name
  vpc_security_group_ids = [aws_security_group.rds.id]
  publicly_accessible    = false  # never expose to the internet

  # High availability
  multi_az = var.environment == "prod"

  # Backups
  backup_retention_period = var.environment == "prod" ? 7 : 1
  backup_window           = "03:00-04:00"
  maintenance_window      = "Mon:04:00-Mon:05:00"

  # Performance
  parameter_group_name = aws_db_parameter_group.main.name
  storage_type         = "gp3"
  storage_encrypted    = true
  kms_key_id           = var.kms_key_arn

  # Monitoring
  monitoring_interval                   = 60  # Enhanced Monitoring every 60s
  monitoring_role_arn                   = var.monitoring_role_arn
  enabled_cloudwatch_logs_exports       = ["postgresql", "upgrade"]
  performance_insights_enabled          = true
  performance_insights_retention_period = 7  # days

  # Deletion protection
  deletion_protection       = var.environment == "prod"
  skip_final_snapshot       = var.environment != "prod"
  final_snapshot_identifier = "${var.name}-final-snapshot"

  tags = { Environment = var.environment }
}

Multi-AZ for High Availability

Multi-AZ maintains a synchronous standby replica in a different Availability Zone. Failover is automatic and typically completes in 60-120 seconds.

code
     AZ-1 (eu-west-1a)             AZ-2 (eu-west-1b)
   ┌───────────────────┐         ┌───────────────────┐
   │    Primary RDS    │────────►│    Standby RDS    │
   │   (read/write)    │  sync   │   (no traffic)    │
   └───────────────────┘         └───────────────────┘
             ▲
             │
     DNS name (stays the same after failover)

What triggers automatic failover:

  • AZ outage
  • Primary instance failure
  • OS patching on the primary
  • Instance class change

Your application connects via the RDS endpoint DNS name, which automatically points to the new primary after failover. No application code change is needed.

bash
# RDS endpoint format
your-db.abc123xyz.eu-west-1.rds.amazonaws.com

# After failover: same DNS name, now points to the former standby
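One practical consequence: in-flight connections drop during failover, so applications should retry transient connection errors rather than crash. A minimal retry sketch (generic helper with hypothetical usage; not AWS-specific):

```python
import random
import time

def with_retry(operation, max_attempts=5, base_delay=0.5, retryable=(ConnectionError,)):
    """Retry a database operation with exponential backoff and jitter.

    During a Multi-AZ failover (typically 60-120 seconds) open connections
    drop; retrying lets the app reconnect once DNS points at the new primary.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except retryable:
            if attempt == max_attempts:
                raise
            # Exponential backoff with jitter to avoid a thundering herd
            delay = base_delay * (2 ** (attempt - 1)) * (0.5 + random.random() / 2)
            time.sleep(delay)

# Hypothetical usage: wrap any query call
# result = with_retry(lambda: conn.execute("SELECT 1"))
```

Most mature database drivers and pools offer similar retry/reconnect settings; this only illustrates the pattern.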

Read Replicas for Read Scaling

Read replicas serve read-only traffic, reducing load on the primary:

hcl
resource "aws_db_instance" "read_replica" {
  identifier          = "${var.name}-replica-1"
  replicate_source_db = aws_db_instance.main.identifier
  instance_class      = "db.t3.medium"
  publicly_accessible = false
  skip_final_snapshot = true

  # Read replicas use async replication.
  # Small replication lag is normal (usually <1 second).
}
python
# Application: use the replica for reads, the primary for writes
import os

from sqlalchemy import create_engine

WRITE_DB_URL = os.environ["DATABASE_URL"]      # primary
READ_DB_URL = os.environ["DATABASE_READ_URL"]  # replica

write_engine = create_engine(WRITE_DB_URL, pool_size=5, max_overflow=10)
read_engine = create_engine(READ_DB_URL, pool_size=10, max_overflow=20)

# Use read_engine for SELECT-heavy operations
# Use write_engine for INSERT, UPDATE, DELETE

Read replicas vs Multi-AZ:

  • Multi-AZ is for availability (failover): the standby receives no traffic
  • Read replicas are for performance (read scaling): they receive query traffic
  • A read replica can be promoted to a standalone instance for disaster recovery
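Because replication to a replica is asynchronous, a client that writes and then immediately reads from the replica can miss its own write. One common mitigation is to pin a session's reads to the primary for a short window after each write. A minimal sketch (hypothetical class, not part of any AWS SDK):

```python
import time

class SessionRouter:
    """Route reads to a replica, but pin to the primary briefly after a write.

    Reading your own write from an async replica can return stale data;
    a short pin window papers over typical sub-second replication lag.
    """

    def __init__(self, pin_seconds=2.0, clock=time.monotonic):
        self.pin_seconds = pin_seconds
        self.clock = clock          # injectable for testing
        self._last_write = float("-inf")

    def record_write(self):
        """Call after every INSERT/UPDATE/DELETE on this session."""
        self._last_write = self.clock()

    def engine_for_read(self):
        if self.clock() - self._last_write < self.pin_seconds:
            return "primary"  # read-your-writes: the replica may still lag
        return "replica"
```

The returned label would map to the `write_engine`/`read_engine` pair shown earlier; the pin window is a tunable guess, not a guarantee of consistency.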

Parameter Groups

Parameter groups configure database engine settings:

hcl
resource "aws_db_parameter_group" "main" {
  family = "postgres16"
  name   = "${var.name}-params"

  parameter {
    name  = "shared_buffers"
    value = "{DBInstanceClassMemory/32768}"  # 25% of RAM for buffer cache (value is in 8KB pages)
  }

  parameter {
    name  = "work_mem"
    value = "16384"  # 16MB per sort/hash operation (value is in kB)
  }

  parameter {
    name  = "maintenance_work_mem"
    value = "524288"  # 512MB for VACUUM, CREATE INDEX
  }

  parameter {
    name  = "effective_cache_size"
    value = "{DBInstanceClassMemory/16384}"  # hint to the planner: ~50% of RAM is cacheable
  }

  parameter {
    name  = "log_min_duration_statement"
    value = "1000"  # log queries taking more than 1 second
  }

  parameter {
    name  = "log_lock_waits"
    value = "1"  # log lock wait events
  }

  parameter {
    name  = "max_connections"
    value = "200"
  }

  parameter {
    name  = "rds.force_ssl"
    value = "1"  # require SSL connections
  }
}
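The memory formulas are easy to get wrong because PostgreSQL expresses `shared_buffers` and `effective_cache_size` in 8KB pages and `work_mem` in kB, while `DBInstanceClassMemory` is in bytes. An illustrative calculation (my own helper, mirroring the intent of the formulas above):

```python
def pg_memory_params(ram_bytes: int) -> dict:
    """Illustrative PostgreSQL memory parameters for a given instance RAM.

    shared_buffers and effective_cache_size are in 8KB pages; work_mem and
    maintenance_work_mem are in kB. Percentages are common starting points,
    not official AWS recommendations.
    """
    page = 8192  # PostgreSQL block size: 8KB
    return {
        "shared_buffers": ram_bytes // 4 // page,        # 25% of RAM, in pages
        "effective_cache_size": ram_bytes // 2 // page,  # ~50% of RAM, in pages
        "work_mem": 16 * 1024,                           # 16MB, in kB
        "maintenance_work_mem": 512 * 1024,              # 512MB, in kB
    }

# Example: a db.t3.medium has 4GB of RAM
params = pg_memory_params(4 * 1024**3)
# shared_buffers: 131072 pages, i.e. 1GB of buffer cache
```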

Automated Backups and Point-in-Time Recovery

RDS takes daily automated snapshots and streams transaction logs continuously:

bash
# Restore to a specific point in time (AWS CLI)
aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier mydb \
  --target-db-instance-identifier mydb-restored \
  --restore-time 2025-03-10T08:30:00Z

# Create a manual snapshot (before a risky migration)
aws rds create-db-snapshot \
  --db-instance-identifier mydb \
  --db-snapshot-identifier mydb-pre-migration-20250311

Best practice: take a manual snapshot before any destructive migration or major change.

Secrets Manager Integration

Never hardcode database passwords. Use AWS Secrets Manager with automatic rotation:

hcl
resource "aws_secretsmanager_secret" "db_password" {
  name                    = "${var.name}/db-password"
  recovery_window_in_days = 7
}

resource "aws_secretsmanager_secret_rotation" "db_password" {
  secret_id           = aws_secretsmanager_secret.db_password.id
  rotation_lambda_arn = var.rotation_lambda_arn

  rotation_rules {
    automatically_after_days = 30
  }
}
python
# Application fetches the password from Secrets Manager at startup
import json
from functools import lru_cache

import boto3

@lru_cache(maxsize=1)
def get_db_password() -> str:
    client = boto3.client("secretsmanager", region_name="eu-west-1")
    response = client.get_secret_value(SecretId="myapp/db-password")
    secret = json.loads(response["SecretString"])
    return secret["password"]
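One caveat worth knowing: with 30-day rotation, a password cached for the life of the process eventually goes stale. A common pattern is to clear the cache and retry once when the database rejects the credentials. A minimal sketch (hypothetical names; the exception type and hooks depend on your driver):

```python
class AuthError(Exception):
    """Stand-in for your driver's authentication-failure exception."""

def connect_with_refresh(connect, get_password, clear_cache):
    """Try to connect; on an auth failure, refetch the secret and retry once.

    After Secrets Manager rotates the password, the first connection attempt
    with the cached value fails; clearing the cache forces a fresh fetch.
    """
    try:
        return connect(get_password())
    except AuthError:
        clear_cache()  # e.g. get_db_password.cache_clear() for an lru_cache
        return connect(get_password())
```

Rotation Lambdas typically keep the old password valid briefly during rotation, so in practice this retry path fires rarely, but it keeps long-lived processes from needing a restart.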

Connection Pooling with RDS Proxy

RDS has a maximum connection limit. With many Lambda functions or app instances, you can exhaust connections quickly. RDS Proxy pools and multiplexes connections:

hcl
resource "aws_db_proxy" "main" {
  name                   = "${var.name}-proxy"
  engine_family          = "POSTGRESQL"
  idle_client_timeout    = 1800
  require_tls            = true
  role_arn               = aws_iam_role.rds_proxy.arn
  vpc_security_group_ids = [aws_security_group.rds_proxy.id]
  vpc_subnet_ids         = var.private_subnet_ids

  auth {
    auth_scheme = "SECRETS"
    iam_auth    = "REQUIRED"
    secret_arn  = aws_secretsmanager_secret.db_password.arn
  }
}

RDS Proxy is especially valuable for serverless workloads (Lambda) where connection thrashing is common.

Monitoring and Alerting

hcl
# CloudWatch alarm for high CPU
resource "aws_cloudwatch_metric_alarm" "rds_cpu_high" {
  alarm_name          = "${var.name}-cpu-high"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "CPUUtilization"
  namespace           = "AWS/RDS"
  period              = 300  # 5 minutes
  statistic           = "Average"
  threshold           = 80
  alarm_actions       = [var.sns_topic_arn]

  dimensions = {
    DBInstanceIdentifier = aws_db_instance.main.identifier
  }
}

# Also monitor:
#   FreeStorageSpace            (alert at <10GB)
#   DatabaseConnections         (alert near max_connections)
#   ReadLatency / WriteLatency  (alert above 20ms)
#   ReplicaLag                  (alert above 5 seconds for read replicas)

Cost Optimization

  • Right-size instances: use Performance Insights to check whether your instance is over-provisioned. A db.t3.medium handles most small web apps.
  • Reserved Instances: a 1-year reservation saves ~40% and a 3-year reservation ~60% versus on-demand.
  • Aurora Serverless v2: for variable workloads; scales down when idle and bills per ACU-hour.
  • Stop dev/test databases: RDS instances can be stopped for up to 7 days. Automate stopping non-production instances outside business hours.
  • gp3 over gp2: gp3 gives a 3,000 IOPS baseline (versus 3 IOPS/GB, minimum 100, for gp2) at lower cost.
  • Delete old snapshots: automated snapshots expire automatically; manual snapshots accumulate and incur storage costs.
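To make the Reserved Instance savings concrete, a quick effective-cost calculation (the hourly rate is a made-up placeholder; check the AWS pricing page for real numbers):

```python
def monthly_cost(hourly_on_demand: float, discount: float, hours: int = 730) -> float:
    """Effective monthly cost at a given discount off the on-demand rate.

    730 is the average number of hours in a month.
    """
    return hourly_on_demand * (1 - discount) * hours

# Hypothetical on-demand rate of $0.10/hour:
on_demand  = monthly_cost(0.10, 0.00)  # full price
one_year   = monthly_cost(0.10, 0.40)  # ~40% off with a 1-year RI
three_year = monthly_cost(0.10, 0.60)  # ~60% off with a 3-year RI
```

The same function also shows why stopping dev/test databases pays off: cutting `hours` from 730 to ~200 business hours per month is a bigger saving than any reservation discount.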

Common Interview Questions

Q: What is the difference between Multi-AZ and a read replica?

Multi-AZ maintains a synchronous standby in another AZ for automatic failover; it improves availability but does not serve traffic. A read replica uses asynchronous replication to a separate instance that serves read queries; it improves read performance but does not automatically fail over. Some teams use both: Multi-AZ for availability and read replicas for scaling.

Q: What is RDS Proxy and why would you use it with Lambda?

RDS Proxy is a fully managed database proxy that pools and shares database connections. Lambda functions are ephemeral: each invocation may open a new connection, and with thousands of concurrent invocations you can hit RDS connection limits quickly. RDS Proxy multiplexes many Lambda connections into a smaller pool of actual database connections, solving this problem without application changes.

Q: How does point-in-time recovery work in RDS?

RDS continuously streams transaction logs (WAL for PostgreSQL) to S3. Combined with daily automated snapshots, this lets you restore to any second within your retention window. RDS restores the latest snapshot before your target time, then replays the transaction logs up to the exact point you specified.

Practice on Froquiz

Cloud database knowledge is tested in backend and DevOps interviews. Explore our AWS and infrastructure quizzes on Froquiz to test your cloud knowledge.

Summary

  • RDS manages patching, backups, failover, and monitoring; you manage schema and queries
  • Multi-AZ = high availability (automatic failover in 60-120s); read replicas = read scalability
  • Never expose RDS publicly: place instances in private subnets and restrict security groups to app servers
  • Parameter groups tune engine settings: log slow queries and set appropriate memory parameters
  • Always take a manual snapshot before destructive migrations
  • Use Secrets Manager for passwords with automatic rotation; never hardcode credentials
  • RDS Proxy multiplexes connections, which is essential for Lambda and other ephemeral compute
  • Cost: right-size instances, use Reserved Instances for steady workloads, and prefer gp3 over gp2
