
GCP
163+ free practice questions with AI-verified answers
AI-powered
Every Google Professional Cloud Database Engineer answer is cross-verified by 3 leading AI models to ensure maximum accuracy. Get detailed explanations for every option and in-depth question analyses.
Your media-streaming platform uses Memorystore for Redis (Standard Tier, Redis 6.x) as a cache for user session tokens and frequently requested metadata. During live event traffic spikes, p95 cache latency jumps from 6 ms to 180–250 ms, and Cloud Monitoring shows memory utilization at 95–98%, with Redis INFO reporting ~25,000 evicted_keys/min under an allkeys-lru eviction policy. Average key size is ~1.5 KB, average TTL is 5 minutes, CPU stays under 40%, and network RTT between your GKE cluster and the Redis instance is ~1 ms in the same region. You need to reduce the frequency and impact of these latency spikes. What should you do?
Increasing TTL keeps entries in cache longer, which can improve hit rate only if there is sufficient memory headroom. Here memory is already 95–98% with massive evictions. A longer TTL increases the resident set size and reduces natural expiration, typically increasing eviction pressure and churn under allkeys-lru. That usually worsens tail latency and miss storms during spikes rather than reducing them.
Moving workloads into the same zone can reduce cross-zone latency, but the measured RTT is already ~1 ms in the same region and CPU is not saturated. The dominant symptom is memory pressure and high evictions, not network delay. Also, Memorystore Standard Tier is regional with replicas for HA; pinning apps to one zone can reduce resilience and doesn’t address eviction-driven latency spikes.
Resizing to a larger memory tier directly addresses the root cause: near-capacity memory utilization and a very high eviction rate. More RAM reduces eviction frequency, stabilizes cache hit rate, and avoids eviction bookkeeping overhead that contributes to p95 spikes. This is the primary scaling mechanism for Memorystore when the working set plus overhead exceeds available memory.
Additional replicas (read scaling) can help when the primary is CPU-bound on reads or when client connections saturate throughput. In this scenario CPU is under 40% and the issue is eviction/memory pressure. Replicas do not increase the primary’s memory capacity for writes and do not prevent evictions on the primary. They also add replication overhead and cost without fixing the root cause.
Core concept: This question tests diagnosing Redis cache latency under load in Memorystore for Redis (Standard Tier) and selecting the right scaling lever. Key signals are memory pressure (95–98%), a high eviction rate (~25,000 evicted_keys/min), the allkeys-lru policy, and low CPU/network latency.
Why the answer is correct: The p95 latency spikes correlate with extreme memory utilization and heavy evictions. With allkeys-lru, Redis must constantly sample keys, update LRU metadata, and evict items to admit new writes. Under bursty traffic this creates churn: hot keys can be evicted, causing cache misses, more backend fetches, and additional cache repopulation writes, which amplifies latency. Since CPU is below 40% and RTT is ~1 ms, the bottleneck is not compute or network; it is memory capacity relative to the working set and write rate. Resizing to a larger memory tier increases available RAM, reduces eviction frequency, stabilizes the hit rate, and removes the eviction-driven tail latency.
Key features / best practices: Memorystore capacity is fixed per instance; when memory is near full, Redis eviction behavior dominates performance. Best practice is to size Redis so typical utilization stays well below the limit (often targeting ~60–80%, depending on workload) to absorb spikes, fragmentation, and overhead. Also note that the average key size (1.5 KB) plus Redis object overhead and TTL bookkeeping means real memory per key is higher than the payload size. Standard Tier provides HA via replication/failover, but it does not automatically add memory; you must scale up (or shard at the application layer if needed).
Common misconceptions: It is tempting to blame the network (but RTT is already ~1 ms) or to add replicas (but replicas do not increase write-side memory capacity and do not stop evictions on the primary). Increasing TTL sounds like it improves the hit rate, but with memory already saturated it increases residency time, worsening evictions and churn.
Exam tips: When you see high evicted_keys with high memory utilization and low CPU, think “insufficient cache capacity/working set too large” and choose scale-up memory (or redesign/shard). Replicas help read scaling, not memory pressure. TTL changes must be evaluated against memory headroom; longer TTL under pressure usually hurts.
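The headroom guidance above can be turned into a rough sizing check. A minimal sketch in Python, where the ~100-byte per-key overhead and the 70% target utilization are illustrative assumptions, not Memorystore constants:

```python
# Back-of-envelope sizing for a Redis cache under eviction pressure.
# The 100-byte per-key overhead is an assumption for illustration;
# real overhead depends on data types and Redis version.

def required_memory_gb(keys, avg_payload_bytes, overhead_bytes=100,
                       target_utilization=0.7):
    """Estimate instance capacity so the working set sits at
    target_utilization of the limit, leaving headroom for spikes,
    fragmentation, and bookkeeping."""
    working_set_bytes = keys * (avg_payload_bytes + overhead_bytes)
    return working_set_bytes / target_utilization / (1024 ** 3)

# Example: ~10 million resident keys at ~1.5 KB payload each.
size = required_memory_gb(10_000_000, 1536)
```

If the current instance is smaller than this estimate, heavy evictions under allkeys-lru are expected, which matches the scenario's symptoms.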
Want to practice all the questions on the go?
Download Cloud Pass for free, with practice tests, progress tracking, and more.
Your healthcare analytics company is migrating a self-managed PostgreSQL 12 database from an on-premises datacenter to Cloud SQL for PostgreSQL. After cutover, the system must tolerate a single-zone outage in the target region with no more than 3 minutes of disruption (RTO ≤ 3 minutes) and zero transaction loss (RPO = 0), and you want to follow Google-recommended practices for the migration. What should you do?
Nightly automated backups are a disaster recovery mechanism, not high availability. Restoring from backup during an outage typically takes far longer than 3 minutes (instance provisioning, restore time, DNS/app reconfiguration), so it fails the RTO requirement. It also cannot guarantee RPO = 0 because transactions after the last backup would be lost. Backups remain important, but they do not meet the stated objectives.
A CDC/logical replication pipeline to a secondary instance is complex to operate and is not the Google-recommended HA approach for Cloud SQL. Logical replication is typically asynchronous, so it generally cannot guarantee RPO = 0. Manual failover also risks exceeding the 3-minute RTO due to detection, decision, and cutover steps. This option is more error-prone than managed HA.
A cross-region read replica is primarily a disaster recovery strategy for regional outages, not a zonal outage within the same region. Read replicas for Cloud SQL are asynchronous, so they cannot guarantee RPO = 0. Promotion is also a manual/operational action and may not meet the 3-minute RTO consistently. It also adds cross-region cost and complexity beyond the requirement.
Cloud SQL High Availability (regional) is the managed, recommended solution for zonal resilience. It uses a synchronous standby in a different zone within the same region, enabling RPO = 0 for committed transactions. Automatic failover minimizes disruption and is designed to meet tight RTO targets (minutes) during a single-zone failure. This aligns directly with the requirements and best practices.
Core Concept: This question tests Cloud SQL for PostgreSQL high availability design for zonal resilience, specifically meeting strict recovery objectives: RTO ≤ 3 minutes and RPO = 0. In Google Cloud, the recommended pattern for tolerating a single-zone outage within a region is Cloud SQL High Availability (HA), also called a regional instance.
Why the Answer is Correct: A Cloud SQL for PostgreSQL HA (regional) instance provisions a primary in one zone and a standby in another zone in the same region using synchronous replication. Because replication is synchronous, committed transactions are durably written to both zones before acknowledgement, which is the key requirement for RPO = 0. During a zonal failure, Cloud SQL performs automatic failover to the standby, typically within minutes, aligning with a 3-minute RTO far better than manual approaches. This is the Google-recommended practice for zonal fault tolerance for Cloud SQL.
Key Features / Configurations:
- Choose "High availability (regional)" when creating the instance.
- The standby is in a different zone in the same region, which matches the requirement (single-zone outage in the target region).
- Synchronous replication enables zero data loss for committed transactions.
- Automatic failover reduces operational burden and improves RTO predictability.
- Combine with backups and (optionally) PITR for protection against logical corruption or user error, which HA does not address.
Common Misconceptions: Many confuse read replicas (asynchronous) with HA. Read replicas are primarily for scaling reads and for DR patterns, and promotion is a manual, operationally complex step; asynchronous replication cannot guarantee RPO = 0. Backups are essential but are not an HA mechanism and cannot meet a 3-minute RTO. Custom CDC/logical replication pipelines add complexity and still typically cannot guarantee synchronous, zero-loss behavior with fast, automated failover.
Exam Tips: Map requirements to Cloud SQL features:
- RPO = 0 plus zonal outage tolerance in-region => Cloud SQL HA (regional) with a synchronous standby.
- Cross-region replicas are for regional disasters and usually imply a non-zero RPO.
- Backups/PITR address data recovery, not availability.
Use the Google Cloud Architecture Framework principle of designing for reliability with managed HA and automated failover rather than bespoke replication unless explicitly required.
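The requirement-to-feature mapping above can be sketched as a tiny decision helper. The returned labels are illustrative descriptions for study purposes, not gcloud flags or API values:

```python
def cloudsql_resilience_choice(rpo_zero: bool, outage_scope: str) -> str:
    """Map recovery requirements to the Cloud SQL pattern discussed above.
    outage_scope is 'zonal' or 'regional'; labels are illustrative only."""
    if outage_scope == "zonal" and rpo_zero:
        # Synchronous standby in another zone: RPO = 0, automatic failover.
        return "HA (regional) with synchronous standby"
    if outage_scope == "regional":
        # Cross-region replicas are asynchronous: DR with non-zero RPO.
        return "cross-region replica (async, RPO > 0) for DR"
    # Backups/PITR recover data but cannot meet tight RTO/RPO targets.
    return "backups/PITR for data recovery, not availability"

choice = cloudsql_resilience_choice(rpo_zero=True, outage_scope="zonal")
```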
Your edtech platform uses Cloud Firestore for storage and serves a React web app to a global audience. Each day at 00:00 UTC, you publish the same Top 20 practice tips (20 documents, ~5 KB each; total payload ~100 KB) to approximately 5 million daily active users across North America, Europe, and APAC. You need to cut Firestore read costs and achieve sub-150 ms p95 load times for this daily list while the content remains identical for 24 hours. What should you do?
Enabling serializable isolation (or stronger consistency semantics) targets correctness under concurrency, not cost or latency. Firestore already provides strong consistency for document reads and queries in native mode. Adding stricter transactional patterns would not reduce the number of reads; it may increase latency and contention. This option misunderstands the problem: the bottleneck is repeated global reads of identical content, not inconsistent results.
Moving to a US multi-region can improve availability and latency for North America, but it can worsen or not materially improve p95 latency for Europe/APAC compared to serving from CDN edges. More importantly, it does not reduce Firestore read costs: each user still triggers reads for 20 documents daily. Multi-region placement is not a substitute for caching when the same payload is requested millions of times.
Firestore bundles allow you to pre-generate the exact 20 documents and serve them as a static file. Hosting + Cloud CDN caches the bundle at edge locations worldwide with a 24-hour TTL, delivering low latency and offloading traffic from Firestore. This directly reduces billed Firestore reads and improves p95 load times because most requests are served from nearby CDN POPs rather than hitting the database.
A composite index on publish_date can improve query execution efficiency and avoid query failures for certain filter/order combinations, but it does not change Firestore billing for document reads returned, and it does not address global latency for 5 million users. The query returns the same 20 documents regardless; indexing won’t eliminate the repeated reads or provide edge caching benefits.
Core concept: This question tests how to reduce Cloud Firestore read costs and latency for globally distributed, read-heavy content that is identical for a day. The key pattern is to avoid per-user database reads by precomputing the content and caching the immutable (for 24 hours) payload at the edge.
Why the answer is correct: A Firestore bundle lets you export a known set of documents (the Top 20 tips) into a static artifact that clients can load without issuing Firestore document reads. Serving that bundle from Firebase Hosting fronted by Cloud CDN shifts traffic from Firestore to CDN edge caches. With a 24-hour TTL, most of the 5 million daily users will be served from nearby edge locations, achieving sub-150 ms p95 globally while dramatically reducing Firestore read operations (and therefore cost). Firestore remains the system of record, but the daily list becomes a cacheable static asset.
Key features / best practices:
- Firestore bundles: package documents/queries into a file that the client SDK can load, populating the local cache and avoiding network reads for those docs.
- Firebase Hosting + Cloud CDN: global edge caching, HTTP caching headers, and low-latency delivery for small payloads (~100 KB).
- Cache control: set Cache-Control: max-age=86400 (and optionally immutable) aligned to the 24-hour content window; publish a new bundle at 00:00 UTC (often with a versioned filename) to ensure a clean cache refresh.
- Architecture Framework alignment: improves performance efficiency and cost optimization by offloading repetitive reads to edge caching.
Common misconceptions:
- Stronger consistency (serializable isolation) does not reduce reads or latency; it can increase overhead.
- Moving Firestore to a different multi-region may help some users but will not eliminate the fundamental cost driver: millions of repeated reads of identical data.
- Indexing helps query performance, not per-document read billing or global edge latency.
Exam tips: When content is identical for many users and changes on a predictable schedule, prefer CDN/edge caching or static delivery over database reads. For Firestore specifically, remember bundles are designed for preloading and reducing read costs, and Hosting+CDN is the standard global distribution mechanism for web assets.
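The cost argument can be checked with back-of-envelope arithmetic. A minimal sketch, where the per-100k-read price and the 99% CDN hit ratio are illustrative assumptions rather than current Firestore pricing or a measured hit rate:

```python
# Back-of-envelope read-cost comparison for the daily Top 20 list.
# The $0.06 per 100k document reads figure is an assumption for
# illustration; check current Firestore pricing for real numbers.

def daily_doc_reads(users, docs_per_user=20, cache_hit_ratio=0.0):
    """Firestore document reads per day; requests served from the CDN
    (or from a preloaded bundle) trigger no billed document reads."""
    return int(users * docs_per_user * (1 - cache_hit_ratio))

def read_cost_usd(reads, price_per_100k=0.06):
    return reads / 100_000 * price_per_100k

no_cache = daily_doc_reads(5_000_000)   # every user reads 20 docs daily
with_cdn = daily_doc_reads(5_000_000, cache_hit_ratio=0.99)
```

Even with conservative assumptions, shifting the identical payload to edge caching removes almost all of the repeated document reads that drive the daily cost.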
Your travel-tech company operates a globally distributed seat allocation platform on Cloud Spanner with 3 read-write and 6 read-only regions. After importing 12 million seat and route records from a partner, you observe write latency spikes and CPU hotspots on 2 of 8 leader replicas, and Cloud Monitoring shows hot ranges on a table keyed by a monotonically increasing ticket_id and a composite key (route_id, class), where class has only 3 distinct values. At a peak of 35,000 writes/second, 70% of writes concentrate in a narrow key range. To follow Google-recommended schema design practices and avoid hotspots without sacrificing strong consistency or availability, what should you do? (Choose two.)
Incorrect. Auto-incrementing (monotonically increasing) primary keys are a known Cloud Spanner anti-pattern for high write rates because new rows land in the same “end” of the keyspace. This concentrates writes on a small number of ranges and their leaders, producing CPU hotspots and latency spikes. Spanner can split ranges, but the newest keys still target the hottest range, so the bottleneck persists.
Incorrect. Normalization changes how data is modeled across tables, which can help with data integrity and reduce duplication, but it does not directly address hot ranges caused by poor primary key distribution. If the primary keys remain sequential or low-cardinality-leading, the same leaders and ranges will still receive the majority of writes. Hotspot mitigation in Spanner is primarily about key design and write distribution.
Incorrect. Promoting low-cardinality attributes (like class with only 3 values) earlier in a composite primary key tends to cluster rows into a few large contiguous key ranges (e.g., all “economy” together). Under heavy writes, this increases the likelihood that a small number of ranges/leaders become hot. In Spanner, the leading key columns determine locality and are critical for distributing write load.
Correct. Promoting high-cardinality attributes early in a multi-attribute primary key improves distribution across the sorted keyspace, which spreads writes across more ranges and leader replicas. In the (route_id, class) example, class has only 3 values, so it should not be a leading discriminator for write distribution. Putting higher-cardinality fields first is a core Spanner schema best practice to avoid hotspots.
Correct. Using a bit-reversed sequential value (or similar key transformation) is a recommended technique to avoid hotspots when you need a unique, sequentially generated identifier. Bit-reversal spreads adjacent sequence numbers across the keyspace, distributing inserts among many ranges and leaders, reducing CPU hotspots and write latency spikes. This preserves Spanner’s strong consistency and availability while improving write scalability.
Core concept: This question tests Cloud Spanner schema design to prevent hotspots. In Spanner, data is stored in sorted key order and divided into ranges ("splits"). Writes concentrate on the range(s) that contain the targeted key values. If many writes land in a narrow, adjacent key space, a small number of splits and their leaders become CPU/latency bottlenecks, even in a multi-region configuration.
Why the answer is correct: Two hotspot patterns are described: (1) a monotonically increasing ticket_id primary key, which causes all new inserts to target the "end" of the keyspace (the latest range), and (2) a composite key (route_id, class) where class has only 3 values. If class is placed early (or otherwise drives locality), it creates only three large contiguous partitions, concentrating writes. Google-recommended practice is to ensure the leading portion of the primary key has high cardinality and good distribution. Therefore, promote high-cardinality attributes in multi-attribute primary keys (D). For sequential IDs that must remain ordered logically, use a key transformation such as bit-reversed sequential values (E) to spread inserts across the keyspace while preserving uniqueness and enabling efficient generation.
Key features / best practices: Spanner splits ranges automatically, but it cannot eliminate a "single hot end" caused by sequential keys; it can only keep splitting the hot range, which still has a single leader handling the write load. Bit reversal (or similar hashing/salting patterns) distributes writes across many ranges and leaders. For composite keys, ordering matters: put the attribute with the most distinct values (and the best write distribution) first, and keep low-cardinality fields later.
Common misconceptions: Normalization (B) may improve update anomalies or storage, but it does not inherently fix key-range write concentration. Promoting low-cardinality attributes (C) often worsens hotspots by clustering writes into a few contiguous ranges. Auto-incrementing keys (A) are a classic anti-pattern in Spanner for high write throughput.
Exam tips: When you see "hot ranges," "monotonically increasing," "low cardinality," or "writes concentrated in a narrow key range," think primary key design and key ordering. In Spanner, the first key parts dominate locality; choose high-cardinality leading keys and use techniques like bit-reversed sequences for sequential identifiers to maintain strong consistency and availability without hotspots.
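The bit-reversal transformation described above can be sketched in a few lines. This illustrates the idea only; in practice Spanner also offers native bit-reversed sequences so you do not have to implement this yourself:

```python
def bit_reverse(n: int, width: int = 64) -> int:
    """Reverse the bits of n within a fixed width. Consecutive inputs
    differ in their lowest bits, so their reversals differ in their
    highest bits and land far apart in the sorted keyspace."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (n & 1)
        n >>= 1
    return result

# Adjacent ticket IDs 1, 2, 3 map to widely separated key values,
# spreading inserts across many splits and leader replicas.
keys = [bit_reverse(i) for i in (1, 2, 3)]
```

Note the transformation is its own inverse for a fixed width, so the original sequential ID is recoverable if ever needed.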
You are building a Pub/Sub–triggered service on Cloud Functions (2nd gen) in us-central1 that must connect to a Cloud SQL for PostgreSQL instance. Your security policy requires the database to accept connections only from workloads inside the prod-vpc VPC, with no public internet exposure. The function can scale up to 200 concurrent requests during peak load, and you need stable connection management. What should you do to meet the security and performance requirements?
Incorrect. Using an external (public) IP violates the requirement of “no public internet exposure,” even if firewall rules restrict source ranges. Also, Cloud Functions does not have fixed egress IPs by default, making firewall allowlisting brittle. The Cloud SQL Auth Proxy helps with IAM-based auth and TLS, but it does not inherently provide stable connection management for 200 concurrent requests.
Incorrect. A public IP still exposes the database to the internet, conflicting with the security policy. Additionally, relying on firewall rules to allow only Cloud Functions traffic is problematic because serverless egress IPs are not stable unless you add additional NAT/egress controls. Connection pooling is good for performance, but the networking model fails the “inside prod-vpc only” requirement.
Partially correct on networking: private IP + private services access + Serverless VPC Access is the right private connectivity pattern. However, choosing Cloud SQL Auth Proxy as the primary answer does not address the stated need for “stable connection management” under 200 concurrent requests. The proxy is not a substitute for pooling; without pooling you can still exhaust PostgreSQL connections and degrade performance.
Correct. Cloud SQL private IP with private services access ensures the instance is reachable only within prod-vpc and not exposed publicly. Serverless VPC Access in the same region lets Cloud Functions (2nd gen) reach that private IP. Using a connection pool (and optionally a pooling layer like PgBouncer) provides stable connection management under high concurrency, preventing connection storms and aligning with Cloud SQL connection limits and best practices.
Core concept: This question tests secure private connectivity from serverless (Cloud Functions 2nd gen) to Cloud SQL for PostgreSQL, and how to handle high concurrency with stable database connections. The key pieces are Cloud SQL private IP with private services access, Serverless VPC Access, and application-side connection pooling.
Why the answer is correct: The requirement that the database accept connections only from workloads inside prod-vpc, with no public internet exposure, means Cloud SQL should use only a private IP and no public IP. Cloud Functions (2nd gen) is serverless and does not attach directly to your VPC by default, so you must use a Serverless VPC Access connector in the same region and VPC to route traffic to private resources. For performance, 200 concurrent requests can create too many database sessions if each request opens its own connection, so a connection pool is needed to bound and reuse connections. Option D is the only choice that satisfies both the private networking requirement and the stable connection management requirement.
Key features:
- Cloud SQL for PostgreSQL configured with private IP only, using private services access so the instance receives an internal address reachable from prod-vpc.
- A Serverless VPC Access connector in us-central1 attached to prod-vpc so Cloud Functions (2nd gen) can reach the private IP.
- Application-level connection pooling to limit total database connections and improve reuse under bursty serverless concurrency.
- Standard database authentication and IAM controls still apply; private IP reduces exposure but does not replace least-privilege access design.
Common misconceptions:
- Restricting a public IP with firewall rules is not the same as eliminating public exposure; a public endpoint still violates a strict no-internet-exposure policy.
- The Cloud SQL Auth Proxy improves authentication and encrypted connectivity, but it does not by itself solve connection scaling problems caused by high concurrency.
- Serverless workloads do not automatically live inside your VPC, so private IP access requires a Serverless VPC Access connector.
Exam tips: When a question says serverless must reach a private database, think Serverless VPC Access plus Cloud SQL private IP. When it also mentions high concurrency or connection stability, think connection pooling. Prefer answers that remove the public IP entirely when the requirement explicitly forbids internet exposure.
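The bounded-pool idea can be sketched with a blocking queue. A minimal illustration, assuming some connect() factory from your PostgreSQL driver (a stub object stands in for real connections here); in production you would typically use a driver's built-in pool or a pooling layer such as PgBouncer instead:

```python
import queue

class BoundedPool:
    """Caps total connections so 200 concurrent requests cannot open
    200 database sessions; callers block until a connection is free."""

    def __init__(self, connect, max_size=10):
        self._pool = queue.Queue(maxsize=max_size)
        for _ in range(max_size):
            self._pool.put(connect())  # pre-open a fixed set of connections

    def acquire(self, timeout=5):
        # Blocks (up to timeout) instead of opening a new session,
        # keeping total sessions within Cloud SQL connection limits.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

# Stub factory stands in for a real driver's connect() call.
pool = BoundedPool(connect=lambda: object(), max_size=10)
conn = pool.acquire()
pool.release(conn)
```

The design choice: bounding and reusing connections converts a burst of requests into queueing at the application layer rather than a connection storm at the database.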
Your healthcare analytics platform is closing its private data center and must move a 2-node Oracle RAC 19c OLTP cluster (24 TB) that uses ASM and SCAN listeners to Google Cloud. The application depends on RAC services with FAN/TAF-based failover and requires minimal to no code changes, while maintaining equivalent performance (~5 ms storage latency and >40,000 TPS) after migration. You need a supported landing zone that preserves the RAC architecture and existing Oracle licensing with the least rework. What should you do?
Cloud Spanner is a cloud-native, horizontally scalable relational database with strong consistency and multi-region HA. However, migrating from Oracle RAC OLTP to Spanner requires significant schema and application changes (SQL dialect differences, data model changes, and transaction semantics considerations). It also does not preserve Oracle RAC services, ASM, SCAN, or FAN/TAF behavior. This violates the minimal/no code change requirement and is not a “preserve RAC architecture” landing zone.
Compute Engine VMs with persistent disks and instance groups are not the right fit for a supported Oracle RAC deployment. RAC depends on tightly controlled cluster networking and shared storage behavior; building RAC on generic VMs can run into supportability constraints and may not meet the required low storage latency and high TPS consistently. Instance groups address VM availability, not Oracle RAC’s cluster semantics (SCAN, ASM, FAN/TAF) and shared-disk requirements.
Cloud SQL for PostgreSQL (even with Database Migration Service) implies a cross-engine migration from Oracle to PostgreSQL. That requires application and SQL changes, revalidation of stored procedures, data types, and performance tuning, and it will not preserve RAC-specific client failover mechanisms (FAN/TAF) or Oracle features. It also conflicts with the requirement to keep existing Oracle licensing and to preserve the RAC architecture with minimal rework.
Bare Metal Solution for Oracle is purpose-built to run Oracle workloads, including Oracle RAC, on dedicated physical servers with predictable performance and supported architectures. It allows you to keep Oracle RAC features (ASM, SCAN listeners, FAN/TAF-based failover) with minimal application changes and supports BYOL licensing models. It is the most appropriate landing zone when you must preserve RAC and meet stringent OLTP latency/TPS requirements in Google Cloud.
Core Concept: This question tests choosing the correct Google Cloud landing zone for a highly specialized, performance-sensitive Oracle RAC workload that must retain RAC-specific features (ASM, SCAN, FAN/TAF) with minimal application change and continued use of existing Oracle licenses.
Why the Answer is Correct: Bare Metal Solution (BMS) for Oracle is the supported Google Cloud offering designed specifically to run Oracle databases (including Oracle RAC) on dedicated physical servers in Google-managed facilities, connected to a customer VPC. It preserves the RAC architecture (multiple nodes, shared storage semantics as implemented for RAC on BMS), supports ASM and SCAN listeners, and enables the same client failover patterns (FAN/TAF) with minimal to no code changes because the database and connectivity model remain Oracle RAC. It also aligns with the performance requirement: BMS provides predictable, low-latency storage and network characteristics suitable for OLTP at high TPS, avoiding the variability and feature gaps you would encounter on generic virtualized platforms.
Key Features / Best Practices:
- Supported Oracle RAC on dedicated hosts, avoiding unsupported DIY RAC builds.
- Retains the Oracle licensing model (commonly BYOL) and operational tooling.
- Network integration with your VPC enables private connectivity from apps (including hybrid connectivity via Cloud Interconnect/VPN).
- Migration typically uses Oracle-native methods (RMAN duplicate/restore, Data Guard, or storage-based approaches, depending on design) to minimize downtime.
- Fits Google Cloud Architecture Framework goals: reliability (RAC HA), performance efficiency (dedicated hardware), and operational excellence (supported reference architectures).
Common Misconceptions: Teams often assume "lift-and-shift to Compute Engine" is equivalent, but Oracle RAC has strict requirements around shared storage, cluster interconnect, and vendor support. Similarly, modernizing to Spanner or PostgreSQL can improve cloud-native HA, but it violates the "minimal to no code changes" constraint and may not meet Oracle feature parity.
Exam Tips: When you see Oracle RAC + ASM + SCAN + FAN/TAF + minimal change + keep licensing + high OLTP performance, strongly bias toward Bare Metal Solution for Oracle. Compute Engine is appropriate for single-instance Oracle or some HA patterns, but RAC supportability and performance predictability are the differentiators in exam scenarios.
Your media-streaming platform runs an AlloyDB for PostgreSQL cluster in us-central1 that is accessible only via a private IP. Compliance forbids opening a public endpoint or installing agents on database VMs. You must continuously replicate a subset of 12 tables (about 1.8 TB total, up to 2,500 row changes per second) into BigQuery for analytics and ML, with end-to-end latency under 10 seconds and 99.9% delivery reliability, using Google-managed services that automatically handle schema changes and scale without downtime. What should you do?
A custom GKE microservice that polls AlloyDB is not CDC and will struggle to meet <10s latency and 99.9% delivery reliability at 2,500 changes/sec without significant custom engineering (deduplication, ordering, retries, backpressure, schema drift handling). Polling also increases load on the source and is operationally complex. It violates the requirement to use Google-managed services that automatically scale and handle schema changes without downtime.
BigQuery federated queries (via external connections) are for querying data in place, not continuously replicating it into BigQuery. They typically won’t meet the requirement to deliver changes with under-10-second end-to-end latency into BigQuery tables for downstream analytics/ML, and performance/cost can be unpredictable at high change rates. It also doesn’t provide a durable delivery pipeline with 99.9% delivery reliability guarantees for replicated data.
Database Migration Service is designed for migrations and continuous replication between databases (e.g., on-prem/MySQL/PostgreSQL to Cloud SQL/AlloyDB). It is not the standard managed service for streaming CDC directly into BigQuery as a target. Even if you could stage data elsewhere, it would add components and latency and would not match the requirement for automatic schema change handling and a direct, scalable, managed path into BigQuery.
Datastream is the managed CDC service that captures changes from PostgreSQL-compatible sources like AlloyDB using log-based replication without installing VM agents. It supports private connectivity so the source can remain private-only. Pairing Datastream with the Google-provided Dataflow template to stream into BigQuery provides autoscaling, checkpointing, and robust retry semantics to achieve low-latency (<10s) ingestion and high delivery reliability, while also supporting schema evolution handling in a managed way.
Core concept: This question tests near-real-time change data capture (CDC) from a private AlloyDB for PostgreSQL source into BigQuery using fully managed Google services, with low latency, high reliability, automatic scaling, and resilience to schema evolution.
Why the answer is correct: Datastream is Google's managed CDC service for databases, including PostgreSQL-compatible sources such as AlloyDB. It captures ongoing row-level changes and streams them with low latency. Because the AlloyDB cluster is private-only and you cannot install agents on database VMs, Datastream fits: it uses database-native, log-based replication rather than host agents, and it supports private connectivity patterns (e.g., VPC peering or Private Service Connect, depending on setup) so traffic stays on private IP paths. To land changes in BigQuery with <10s end-to-end latency and 99.9% delivery reliability, the recommended pattern is Datastream CDC feeding the Google-provided Datastream-to-BigQuery Dataflow template. Dataflow provides autoscaling, checkpointing, and at-least-once or exactly-once processing semantics (depending on sink behavior), enabling high delivery reliability and continuous operation without downtime.
Key features / configurations / best practices:
- Configure Datastream for PostgreSQL CDC with an include list for the 12 required tables (subset replication).
- Use private connectivity (no public endpoints) and a least-privilege database replication user.
- Use the Datastream-to-BigQuery Dataflow template to handle continuous ingestion, scaling, and operational management.
- Enable schema evolution handling: Datastream captures DDL changes, and the template/pipeline can propagate supported schema updates to BigQuery, reducing manual intervention.
- Design for reliability per the Google Cloud Architecture Framework: use regional resources appropriately, monitor lag/throughput, set alerts, and plan quotas (BigQuery streaming inserts, Dataflow worker limits, Datastream throughput) for 2,500 changes/sec and the 1.8 TB initial backfill.
Common misconceptions:
- "Federated queries" feel simpler, but they do not replicate data and cannot meet sub-10-second analytics freshness at scale.
- "DMS can replicate to BigQuery" is a common confusion: DMS is for database-to-database migrations/replication (e.g., to Cloud SQL/AlloyDB), not continuous CDC into BigQuery as the primary target.
- A custom polling service seems flexible, but it violates the managed-service, scaling, and reliability requirements and typically cannot meet low-latency CDC without heavy engineering.
Exam tips: When you see a private-only source, no agents, continuous CDC, a BigQuery target, low latency, and managed schema change handling, think Datastream plus the Dataflow template to BigQuery. Reserve DMS for migrations between operational databases, not analytics ingestion into BigQuery.
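As a concrete illustration of subset replication, here is a hedged Python sketch that builds an include-list spec limiting CDC to 12 tables. The table names are hypothetical, and the field names only approximate the Datastream API's PostgreSQL source configuration; consult the API reference for the exact schema.

```python
# Hypothetical table names for illustration only.
REQUIRED_TABLES = [
    "orders", "order_items", "payments", "refunds",
    "customers", "sessions", "events", "tickets",
    "venues", "inventory", "prices", "promotions",
]

def build_include_list(schema: str, tables: list[str]) -> dict:
    """Return an include-object spec restricting CDC to the given tables.

    Field names approximate the Datastream REST API's PostgreSQL source
    configuration and may differ in detail from the real schema.
    """
    return {
        "postgresqlSourceConfig": {
            "includeObjects": {
                "postgresqlSchemas": [{
                    "schema": schema,
                    "postgresqlTables": [{"table": t} for t in tables],
                }]
            }
        }
    }

spec = build_include_list("public", REQUIRED_TABLES)
included = spec["postgresqlSourceConfig"]["includeObjects"][
    "postgresqlSchemas"][0]["postgresqlTables"]
assert len(included) == 12  # only the 12 required tables are replicated
```

The key design point is that the include list is declared on the stream itself, so Datastream never reads the other tables' WAL changes into the pipeline.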
Your company runs a global event-ticketing platform on Google Cloud. Your engineering team is building a real-time seat reservation service that must prevent double-booking via strongly consistent, durable writes and elastically scale with live traffic spikes. The system must sustain up to 120,000 write operations per second at peak, keep p95 write latency under 15 ms, and incur less than 5 minutes of unplanned downtime per month (99.99% availability). You need a primary data store with very low latency, high write throughput, and a 99.99% uptime SLA for production. What should you do?
Cloud SQL (MySQL/PostgreSQL/SQL Server) provides strong consistency and ACID transactions, but it primarily scales vertically and has practical limits for sustained 120,000 writes/sec with p95 <15 ms without significant sharding and operational complexity. High availability is available, but meeting 99.99% with elastic spike handling at this write rate is not its typical sweet spot. It’s better for regional OLTP workloads with moderate scaling needs.
Cloud Spanner is purpose-built for strongly consistent, horizontally scalable relational workloads. It supports ACID transactions to prevent double-booking, scales write throughput by adding nodes/processing units, and offers a 99.99% SLA with multi-region configurations. With proper schema/key design to avoid hotspots and colocated application compute, Spanner can achieve low write latency at high throughput while remaining a durable system of record for reservations.
Memorystore (Redis/Memcached) is an in-memory cache/data store optimized for low latency, not a primary durable database of record. While Redis can support atomic operations, it does not provide the same durability, multi-region relational transactional guarantees, or 99.99% SLA characteristics expected for a mission-critical reservation ledger. It is commonly used to cache seat maps, sessions, or rate limits, but not as the authoritative booking store.
Cloud Bigtable delivers very high throughput and low latency for wide-column/key-value workloads and can scale massively. However, it is not a relational database and does not provide full ACID transactions across rows in the way needed for strict seat reservation semantics (preventing double-booking with multi-entity constraints). Bigtable is excellent for event streams, time-series, and large-scale analytics serving, but not ideal as the primary transactional reservation database.
Core Concept: This question tests selecting a primary operational database that can deliver (1) strong consistency to prevent double-booking, (2) very high write throughput with low latency, (3) elastic scaling during spikes, and (4) a 99.99% availability SLA. In Google Cloud, this combination most directly maps to Cloud Spanner.
Why the Answer is Correct: Cloud Spanner is a globally distributed, strongly consistent relational database designed for horizontal scale. Seat reservation is a classic transactional workload requiring ACID semantics (e.g., conditional updates, unique constraints, serializable isolation) so that two users cannot reserve the same seat. Spanner provides strongly consistent, durable writes and scales write throughput by adding nodes. It also offers a 99.99% SLA (and 99.999% for multi-region configurations), aligning with the <5 minutes/month downtime requirement.
Key Features / Best Practices:
- Strong consistency + ACID transactions: use read-write transactions for seat holds/confirmations; enforce uniqueness with primary keys/unique indexes.
- Horizontal scaling: size by nodes/processing units; Spanner scales reads and writes without sharding at the application layer.
- Low latency at scale: place compute close to the Spanner instance; weigh multi-region configurations for availability against regional configurations for lower single-region latency.
- Availability: use a multi-region instance configuration (e.g., nam* or eur*) and follow schema design best practices to avoid hotspots (e.g., avoid monotonically increasing keys; use hashed/UUID keys or key salting).
- Architecture Framework alignment: reliability (multi-region, automated replication), performance (scale-out writes), and operational excellence (managed service, online schema changes).
Common Misconceptions: Bigtable can handle massive write throughput and low latency, but it is not a relational/ACID transactional database and does not provide the strong transactional guarantees needed to prevent double-booking across rows/entities. Memorystore is in-memory and not a durable system of record. Cloud SQL provides strong consistency but typically cannot elastically scale to 120k writes/sec with p95 <15 ms without complex sharding, and it still may not meet the 99.99% SLA requirement in the same way.
Exam Tips: When you see "prevent double-booking," "strongly consistent durable writes," "global," "elastic scale," and "99.99% SLA," think Cloud Spanner. For very high throughput key-value/time-series without relational transactions, think Bigtable. For caching, think Memorystore. For traditional relational workloads with vertical scaling and read replicas, think Cloud SQL.
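To make the hotspot-avoidance advice concrete, here is a hedged Python sketch contrasting key-design options for a reservation table. The naming scheme and helpers are illustrative assumptions, not Spanner APIs; only the key-construction pattern matters.

```python
import hashlib
import uuid

def reservation_key_naive(event_id: str, seq: int) -> str:
    # Anti-pattern: a monotonically increasing suffix means all recent
    # writes for a hot event land on the same Spanner split.
    return f"{event_id}#{seq:012d}"

def reservation_key_salted(event_id: str, seat_id: str) -> str:
    # Prefixing with a short hash of the natural key spreads writes across
    # splits while keeping the key deterministic per seat (so a unique
    # primary key still prevents double-booking).
    salt = hashlib.sha256(f"{event_id}/{seat_id}".encode()).hexdigest()[:4]
    return f"{salt}#{event_id}#{seat_id}"

def reservation_key_uuid() -> str:
    # Alternative: a random UUIDv4 key; requires a secondary index for
    # seat lookups but distributes writes uniformly.
    return str(uuid.uuid4())

k1 = reservation_key_salted("event-123", "A-12")
k2 = reservation_key_salted("event-123", "A-12")
assert k1 == k2  # deterministic: the same seat always maps to the same key
```

The salted-key variant is usually preferred for this workload because the uniqueness constraint on the seat still holds, while the hash prefix breaks up the write hotspot a sequential key would create.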
You operate a real-time sports ticketing platform that uses Cloud SQL for MySQL in asia-northeast1. At 10:12 JST on a weekday launch, the instance entered an automatic maintenance event that caused 6 minutes of downtime, impacting over 15,000 concurrent users. Your SLO requires that any maintenance occur only outside your peak window of 09:00–22:00 JST, and you must make an immediate configuration change without migrating platforms or adding new components. What should you do to prevent future maintenance from occurring during peak hours?
Correct. Configuring the Cloud SQL instance maintenance window to a non-business period is the intended control to reduce the likelihood of maintenance occurring during peak hours. You select a weekly day/time window (convert JST to UTC when setting it) so Google schedules eligible maintenance within that window. This is an immediate configuration change and meets the constraint of not adding components or migrating.
Incorrect. Migrating to Cloud Spanner is a platform change and a significant migration effort (schema changes, application changes, data migration, cost model differences). The question explicitly forbids migrating platforms or adding new components. Also, even fully managed services still have operational events; the key is meeting the stated requirement with the simplest allowed change.
Incorrect. Opening a Support case may help explain why downtime exceeded expectations, but it does not implement a preventative control to ensure maintenance occurs outside 09:00–22:00 JST. The requirement is to make an immediate configuration change to prevent future peak-hour maintenance. Support is reactive and not a configuration mechanism for scheduling maintenance windows.
Incorrect. Cloud Scheduler cannot “enforce” Cloud SQL maintenance windows or limit Google-managed maintenance to 5 minutes. Maintenance duration is not something you can cap with an external scheduler, and Cloud SQL maintenance is controlled through Cloud SQL settings (maintenance window/track), not by triggering jobs. This option misunderstands the managed nature of Cloud SQL maintenance.
Core Concept: This question tests Cloud SQL operational controls for planned maintenance, specifically configuring a Cloud SQL maintenance window to align with availability objectives. In Cloud SQL for MySQL, Google periodically applies maintenance (patching, minor version updates, infrastructure work). Some maintenance can cause brief downtime, especially for single-zone instances or when failover/replica promotion is involved.
Why the Answer is Correct: Your SLO requires maintenance only outside 09:00–22:00 JST, and you must make an immediate configuration change without migrating platforms or adding components. The direct, supported control is to set the instance's maintenance window to a non-peak period (e.g., Sunday 01:00–03:00 JST). Cloud SQL will then schedule eligible maintenance within that window (best-effort), significantly reducing the risk of maintenance starting during peak hours. This is exactly what the maintenance window feature is designed for.
Key Features / Best Practices:
- Cloud SQL maintenance window: choose a day-of-week and hour (configured in UTC; you must convert from JST to UTC correctly). Also consider the maintenance track (preview vs. stable) depending on risk tolerance.
- Align with the Google Cloud Architecture Framework (Reliability): plan for operational events, use controlled maintenance windows, and design for graceful degradation. While HA (regional) can reduce downtime, the question forbids adding components.
- Operational detail: maintenance windows are not a hard guarantee for every event, but they are the primary control available without architectural changes.
Common Misconceptions:
- "Avoid maintenance by switching databases" (Spanner) is a migration and violates the constraints.
- "Support can explain downtime" doesn't prevent recurrence during peak hours.
- "Cloud Scheduler can enforce maintenance duration" is not how Cloud SQL maintenance works; you cannot programmatically cap Google-managed maintenance at 5 minutes.
Exam Tips: When a question asks to prevent Cloud SQL maintenance during business hours and restricts you from migrations or new components, the expected answer is configuring the Cloud SQL maintenance window (and optionally the maintenance track). For availability beyond maintenance windows, think HA (regional), read replicas, and application-level retries, but only if the question allows architectural changes.
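Because the maintenance window is configured in UTC, the JST conversion can be sketched as follows. A minimal illustration: JST is UTC+9 with no daylight saving, so a fixed offset suffices.

```python
from datetime import datetime, timedelta

JST_OFFSET = timedelta(hours=9)  # Japan Standard Time, no DST
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

def jst_window_to_utc(day: str, hour: int) -> tuple[str, int]:
    """Map a maintenance-window start (day-of-week, hour) from JST to UTC."""
    # Anchor on an arbitrary week; 2024-01-01 happens to be a Monday.
    base = datetime(2024, 1, 1) + timedelta(days=DAYS.index(day), hours=hour)
    utc = base - JST_OFFSET
    return DAYS[utc.weekday()], utc.hour

# Desired window: Sunday 01:00 JST (well outside the 09:00-22:00 JST peak)
# must be entered in the Cloud SQL settings as Saturday 16:00 UTC.
assert jst_window_to_utc("Sun", 1) == ("Sat", 16)
```

Note how the conversion can cross a day boundary: a Sunday-early-morning JST window is a Saturday-evening UTC window, which is easy to get wrong when filling in the console form.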
You are designing a global order reconciliation service for a multinational retail platform. Applications in North America, Europe, and Asia must be able to read and write concurrently, with p95 cross-region transaction commit latency under 200 ms. The database must be fully managed, relational (ANSI SQL), provide global external consistency with RPO=0, support online schema changes, and meet a 99.99% availability target. Which Google Cloud service should you choose?
Bigtable is a fully managed wide-column NoSQL database optimized for very high throughput and low-latency key/value access (time-series, IoT, analytics serving). It is not ANSI SQL relational and does not provide globally externally consistent, multi-region transactional semantics like Spanner. While it can replicate for availability, it is not the right fit for cross-region ACID transactions with strict consistency and online relational schema evolution requirements.
Firestore is a fully managed document database with strong consistency and multi-region replication options, and it can support global applications. However, it is not a relational ANSI SQL database and does not provide the same relational schema, joins, and SQL transaction model expected for an order reconciliation system described as relational. It also does not match the classic “external consistency + SQL + global transactions” requirement that points to Spanner.
Cloud SQL for MySQL is a managed relational database, but it is primarily regional. Cross-region designs typically rely on read replicas and asynchronous replication, which cannot guarantee RPO=0. Achieving concurrent multi-region writes with global external consistency is not a Cloud SQL capability; failover changes the primary and introduces operational complexity and potential data loss depending on replication mode. Meeting 99.99% globally with active-active writes is not its target use case.
Cloud Spanner is the only Google Cloud database that directly matches all requirements: fully managed, relational with ANSI SQL, global external consistency (true serializability), synchronous replication for RPO=0, and multi-region configurations designed for 99.99% availability. It supports concurrent reads/writes from multiple regions and provides online schema changes. It is purpose-built for global transactional systems like order processing and reconciliation.
Core Concept: This question tests selecting a fully managed, globally distributed relational database that supports concurrent multi-region reads/writes with strong consistency (external consistency), zero data loss (RPO=0), and very high availability.
Why the Answer is Correct: Cloud Spanner is Google Cloud's globally distributed, horizontally scalable, ANSI SQL relational database designed for multi-region active-active workloads. It provides external consistency (true serializability) for transactions across regions, which is the key requirement behind "global external consistency with RPO=0." With Spanner, commits are replicated synchronously across replicas in a multi-region configuration, so acknowledged commits are durable even if a region fails (RPO=0). Spanner is fully managed and supports online schema changes, aligning with the operational requirements.
Key Features / Configurations / Best Practices:
- Multi-region instance configurations (e.g., nam3, eur3, asia1) provide high availability and synchronous replication across geographically separated regions, supporting a 99.99% availability target when configured appropriately.
- External consistency is enabled by Spanner's TrueTime-based concurrency control, ensuring globally consistent reads and writes.
- ANSI SQL and relational modeling are first-class, including secondary indexes and transactions.
- Online schema changes allow many schema updates without downtime (important for global retail platforms).
- Latency: cross-region commit latency depends on replica placement and network distance; Spanner is the intended service when you must run cross-region transactions with strong consistency. In practice, you design the instance configuration and application access patterns (e.g., locality, leader placement) to meet p95 targets.
Common Misconceptions: Firestore is multi-region and strongly consistent, but it is a NoSQL document database, not ANSI SQL relational. Cloud SQL is relational but is not designed for global active-active writes with external consistency and RPO=0 across continents; cross-region replication is typically asynchronous, and failover changes the primary. Bigtable is wide-column NoSQL and does not provide relational SQL transactions or external consistency across regions.
Exam Tips: When you see "global," "concurrent writes in multiple regions," "ANSI SQL," "external consistency," and "RPO=0," default to Cloud Spanner. If the question instead emphasizes PostgreSQL/MySQL compatibility with read replicas and regional HA, think Cloud SQL; if it emphasizes document data and offline sync, think Firestore; if it emphasizes massive-throughput key-value/time-series, think Bigtable.
A global event ticketing platform runs its reservation and seat-allocation system on Cloud SQL for PostgreSQL 14; your SLOs require RTO ≤ 5 minutes and RPO ≤ 5 minutes, you must tolerate a single-zone outage with zero data loss and also be able to recover from a regional outage within minutes, and you need to choose a high-availability and disaster-recovery topology that meets these constraints without changing the application.
Cross-region read replicas use asynchronous replication, which cannot guarantee zero data loss during a primary failure (RPO could be > 0 depending on lag). Also, having only cross-region replicas does not address the requirement to tolerate a single-zone outage with zero data loss and fast automatic failover; you still need HA within the primary region (multi-zone synchronous standby). This option focuses on DR but misses HA and the zero-data-loss zonal requirement.
A synchronous failover replica (HA) must be in a different zone, but keeping everything only within one region does not satisfy the requirement to recover from a regional outage within minutes. Same-region HA protects against zonal failures, not regional disasters. Even if you add multiple replicas, a region-wide outage would take out the primary and all same-region replicas, violating the DR requirement.
This matches Cloud SQL best practice: deploy an HA (regional) instance with a synchronous standby in another zone for zero-data-loss zonal failover, and add cross-region read replicas (asynchronous) for DR. It satisfies zero data loss for a single-zone outage (synchronous) and provides a near-real-time copy in another region that can be promoted to meet RPO/RTO targets with proper monitoring and tested procedures, without changing the application.
Cloud SQL does not provide synchronous replication for cross-region replicas; cross-region replication is asynchronous. Additionally, using asynchronous replicas across zones in the same region would not guarantee zero data loss for a zonal outage, violating the requirement. This option reverses the correct pattern: you want synchronous within a region (multi-zone HA) and asynchronous across regions (DR).
Core concept: This question tests Cloud SQL for PostgreSQL high availability (HA) versus disaster recovery (DR). In Cloud SQL, HA is achieved with a regional instance that maintains a synchronous standby (failover replica) in a different zone within the same region. DR across regions is typically achieved with cross-region read replicas that use asynchronous replication.
Why the answer is correct: You must tolerate a single-zone outage with zero data loss and meet RPO ≤ 5 minutes. Zero data loss for a zonal failure requires synchronous replication to a standby in another zone, which is exactly what Cloud SQL HA (regional) provides. For a regional outage, you need a copy of the data in another region and the ability to promote it quickly. Cross-region replicas in Cloud SQL are asynchronous, but with typically low replication lag they can meet an RPO of minutes, and promotion provides a recovery path within minutes (RTO ≤ 5 minutes) if operational runbooks/automation are in place. This topology also avoids application changes: failover within a region keeps the same instance connection name, and DR promotion can be handled by updating connection endpoints/DNS or using a connection indirection layer without changing application code.
Key features / best practices:
- Use a Cloud SQL HA (regional) primary: synchronous standby in another zone, automatic failover for zonal outages.
- Add cross-region read replica(s): asynchronous replication for DR and optional read scaling.
- Monitor replication lag and set alerting to ensure RPO objectives are met.
- Pre-provision the DR replica with an adequate machine type/storage and test promotion procedures to meet the RTO.
- Align with the Google Cloud Architecture Framework: design for reliability (multi-zone HA + multi-region DR), operational excellence (runbooks, testing), and cost awareness (cross-region replicas add cost).
Common misconceptions:
- Assuming read replicas can be synchronous across regions (Cloud SQL cross-region replication is asynchronous).
- Believing same-region replicas alone cover regional outages (they do not).
- Thinking "asynchronous everywhere" can still guarantee zero data loss for zonal failures (it cannot).
Exam tips:
- For Cloud SQL, "HA/regional instance" implies a synchronous standby across zones and automatic failover.
- "Read replicas" are primarily for scaling and DR; they are asynchronous.
- Map requirements explicitly: zero data loss for a zonal outage => synchronous multi-zone; regional outage recovery => cross-region replica + promotion and operational readiness.
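The replication-lag monitoring recommendation can be sketched as a simple threshold check. The alert fraction is an illustrative assumption; in practice you would drive this from Cloud Monitoring's replication-lag metric and an alerting policy rather than application code.

```python
RPO_SECONDS = 5 * 60     # RPO <= 5 minutes, from the stated SLO
ALERT_FRACTION = 0.5     # page operators once half the budget is consumed (assumed policy)

def lag_status(replica_lag_seconds: float) -> str:
    """Classify cross-region replica lag against the RPO budget."""
    if replica_lag_seconds >= RPO_SECONDS:
        return "RPO_VIOLATED"   # promoting now could lose > 5 min of writes
    if replica_lag_seconds >= RPO_SECONDS * ALERT_FRACTION:
        return "ALERT"          # still within RPO, but the margin is shrinking
    return "OK"

assert lag_status(30) == "OK"
assert lag_status(200) == "ALERT"
assert lag_status(400) == "RPO_VIOLATED"
```

Alerting at a fraction of the budget, rather than at the budget itself, gives operators time to throttle writes or investigate before the RPO is actually at risk.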
You manage a development analytics environment running on Cloud SQL for MySQL in us-central1. The instance stores approximately 1.5 TB of data, with automated backups enabled to run daily and retain 7 copies. The workload is not mission-critical, and an RPO of 24 hours is acceptable. You need to lower monthly backup storage charges without disabling backups or changing the instance machine type. What should you change to reduce backup costs?
Changing automated backups to every 48 hours would reduce the number of backups created, but it conflicts with the requirement of a 24-hour RPO. If the last backup is up to 48 hours old, you could lose up to 2 days of data, exceeding the acceptable data loss window. Also, Cloud SQL automated backups are typically configured on a daily schedule window rather than an every-N-hours cadence.
Cloud SQL does not provide a supported option to choose a different storage tier (SSD vs HDD) specifically for automated backup storage. The instance’s disk type affects primary storage performance/cost, but backup artifacts are managed by the service and billed as backup storage without a user-selectable HDD/SSD tier. Therefore, this is not a valid or applicable cost-reduction lever for automated backups.
Cloud SQL automated backups are managed by the service and are not configurable to be stored in an arbitrary “lower-cost region within the same continent.” While some services support multi-region or cross-region storage classes, Cloud SQL automated backups don’t offer a setting to relocate them for cost optimization. Additionally, moving backups across regions can introduce compliance and restore-time considerations.
Reducing retention from 7 days to 2 days directly lowers the amount of backup storage retained and therefore reduces monthly backup storage charges. It still supports a 24-hour RPO because you continue taking daily backups but keep fewer historical copies. This aligns backup configuration with business requirements and is the most straightforward, supported way to reduce Cloud SQL backup costs without changing compute or disabling backups.
Core Concept: This question tests Cloud SQL automated backup cost drivers and how to tune backup configuration to meet a stated recovery objective (RPO) at lower cost. In Cloud SQL for MySQL, automated backups create backup artifacts that are stored and billed as backup storage. The two primary levers that directly affect ongoing backup storage consumption are (1) how many backups you retain and (2) how large each backup is (which you are not changing here).
Why the Answer is Correct: With 1.5 TB of data and a 7-backup retention policy, you are paying for multiple days of retained backup storage. The workload is not mission-critical, and an RPO of 24 hours is acceptable, meaning you need the ability to restore to a point no more than 24 hours before an incident. Retaining 2 days of backups (2 copies) still supports a 24-hour RPO while materially reducing retained backup storage compared to 7 copies. Therefore, reducing automated backup retention from 7 days to 2 days is the most direct, supported, and predictable way to lower monthly backup storage charges without disabling backups or changing the instance machine type.
Key Features / Best Practices: Cloud SQL lets you configure automated backup retention (number of backups/days). Align retention with business requirements (RPO/RTO) per the Google Cloud Architecture Framework's cost optimization and reliability principles: don't retain more than needed for the stated recovery objective. For dev/test analytics environments, shorter retention is common, especially when the RPO is relaxed.
Common Misconceptions: Many assume lowering backup frequency (Option A) is the best cost lever. However, Cloud SQL automated backups are designed around daily scheduling; even if frequency were adjustable, reducing it to every 48 hours would violate the stated 24-hour RPO. Others assume you can change the backup "storage tier" (Option B) or move backups to a cheaper region (Option C), but Cloud SQL automated backups offer neither a selectable HDD/SSD tier for backup artifacts nor a supported configuration to store automated backups in an arbitrary cheaper region.
Exam Tips: When you see "reduce backup costs" in Cloud SQL, first look at retention settings and confirm they still satisfy the RPO. If the RPO is 24 hours, keep at least daily backups and retain enough copies to cover operational realities (a missed backup, corruption discovered late), but avoid excessive retention for non-critical environments.
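A rough cost illustration of the retention change. This assumes, pessimistically, that each retained backup is close to full dataset size (real Cloud SQL backups are incremental after the first, so actual savings differ) and uses a made-up per-GB rate, not a published price.

```python
DATASET_GB = 1536            # ~1.5 TB of data
PRICE_PER_GB_MONTH = 0.08    # illustrative rate, NOT a published Cloud SQL price

def monthly_backup_cost(copies: int) -> float:
    """Naive upper bound: every retained copy billed at full dataset size."""
    return DATASET_GB * copies * PRICE_PER_GB_MONTH

before = monthly_backup_cost(7)  # 7 retained copies
after = monthly_backup_cost(2)   # 2 retained copies
assert after < before
# Dropping 5 of 7 copies cuts the retained-storage bound by 5/7 (~71%).
assert round(1 - after / before, 4) == round(5 / 7, 4)
```

Even if incremental backups make the real numbers smaller, retention count remains the configuration lever that scales the bill, which is why it is the answer the question is looking for.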
You are designing a new centralized fleet maintenance scheduling system for 180 municipal depots, each with approximately 450 GB of historical data; you plan to onboard 12 depots per week over a 15-week phased rollout; the solution must use an SQL database, minimize costs and user disruption during each regional cutover, and allow capacity to scale up during weekday peaks and down on nights and public holidays to control spend; what should you do?
Oracle RAC on Bare Metal Solution can deliver high performance and compatibility for Oracle workloads, but it is typically high cost and operationally heavier than cloud-native managed databases. It also doesn’t naturally meet the requirement to scale capacity up/down frequently to control spend; RAC capacity is largely fixed to provisioned hardware. This option is more appropriate for lift-and-shift of existing Oracle RAC with strict compatibility needs.
Sharded Cloud SQL instances can reduce per-instance size and isolate depots, but it introduces significant operational complexity (many instances, backups, patching, connection routing, shard rebalancing). Cross-depot analytics and centralized scheduling queries become harder. Cloud SQL scaling is primarily vertical and can be disruptive; scaling down is limited and not designed for frequent weekday/holiday elasticity. This conflicts with minimizing disruption and enabling elastic cost control.
Cloud Bigtable supports autoscaling and massive throughput, but it is a wide-column NoSQL database, not an SQL relational database. The requirement explicitly states the solution must use an SQL database, which rules Bigtable out. Bigtable also requires different data modeling (row keys, column families) and does not provide relational joins/constraints expected in a scheduling system without additional layers.
Cloud Spanner provides a fully managed SQL database with horizontal scalability and high availability. It supports online scaling of compute (nodes/processing units), enabling capacity to increase during weekday peaks and decrease during nights/holidays to manage cost. While Spanner lacks built-in autoscaling, you can implement a custom mechanism using Cloud Monitoring metrics and automation (Cloud Scheduler + Cloud Functions/Run) to adjust capacity with minimal user disruption during phased cutovers.
Core concept: This question tests selecting a managed SQL database that can support a centralized, multi-tenant workload with phased onboarding and minimal cutover disruption, while controlling cost by scaling capacity up and down with predictable usage patterns. It also probes knowledge of which Google Cloud databases support SQL plus elastic scaling.
Why the answer is correct: Cloud Spanner is the only Google Cloud-native, fully managed relational (SQL) database among the options that is designed for horizontal scale and high availability across zones/regions. With 180 depots at ~450 GB each (~81 TB total), a single Cloud SQL topology becomes operationally complex and can hit practical limits (instance sizing, storage/IOPS, connection management, maintenance windows). Spanner supports online scaling of compute capacity (adding/removing nodes or processing units) without downtime, enabling you to scale up for weekday peaks and down on nights/holidays to control spend. Because onboarding is phased (12 depots/week), you can migrate depot-by-depot (or region-by-region) into Spanner with controlled cutovers, using dual-write or change data capture patterns to minimize user disruption.
Key features / best practices:
- Spanner SQL with strong consistency and high availability (regional or multi-regional configurations).
- Online compute scaling: adjust nodes/processing units; pair with a scheduler (e.g., Cloud Scheduler + Cloud Functions/Run) and metrics (Cloud Monitoring) to implement "custom autoscaling."
- Schema design for multi-tenancy: use interleaving/secondary indexes carefully; choose primary keys that avoid hotspots (e.g., include depot_id plus a time-based component, with hashing if needed).
- Migration approach: staged backfill + CDC (Datastream where applicable, or application-level dual writes) to reduce downtime during each cutover.
Common misconceptions: Cloud SQL "feels" cheaper and simpler, but sharding 180 depots across many instances increases operational overhead, complicates cross-depot reporting, and doesn't provide true scale-up/scale-down elasticity without disruptive instance resizing. Bigtable autoscaling is attractive, but it is not an SQL relational database. Oracle RAC on Bare Metal Solution is powerful but expensive and not aligned with managed, elastic scaling goals.
Exam tips: When requirements include (1) SQL, (2) a very large aggregate dataset, (3) horizontal scalability and high availability, and (4) online capacity changes, Spanner is the default answer. Also note that Spanner doesn't have built-in autoscaling; exam questions often expect you to implement scheduled/metric-driven scaling using Cloud Monitoring and automation.
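The scheduled scaling pattern described above can be sketched as a decision function that a Cloud Scheduler-triggered job might evaluate before calling the instance-update API. The capacity values and the holiday list are illustrative assumptions, not recommendations.

```python
from datetime import date, datetime

PEAK_PU = 3000      # processing units for weekday working hours (assumed)
OFF_PEAK_PU = 1000  # nights, weekends, public holidays (assumed)
PUBLIC_HOLIDAYS = {date(2024, 1, 1), date(2024, 12, 25)}  # hypothetical calendar

def target_processing_units(now: datetime) -> int:
    """Pick a Spanner capacity target from the clock and holiday calendar."""
    if now.date() in PUBLIC_HOLIDAYS or now.weekday() >= 5:  # Sat/Sun
        return OFF_PEAK_PU
    return PEAK_PU if 8 <= now.hour < 20 else OFF_PEAK_PU

assert target_processing_units(datetime(2024, 3, 6, 10)) == 3000  # Wed 10:00
assert target_processing_units(datetime(2024, 3, 6, 23)) == 1000  # Wed 23:00
assert target_processing_units(datetime(2024, 1, 1, 10)) == 1000  # holiday
```

A real implementation would compare the target against current capacity and apply changes gradually (Spanner capacity changes are online but rebalancing takes time), and would combine this schedule with Cloud Monitoring CPU/priority metrics as a guardrail.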
Your company needs to relocate a high-traffic payment ledger database from a co-location data center to Google Cloud. The source is MySQL 8.0.28 using the InnoDB engine with binary logging enabled in ROW format; the dataset is 1.6 TB, averaging 900 TPS with bursts up to 1,500 TPS. A dedicated Cloud VPN tunnel provides 1 Gbps bandwidth between on-premises and Google Cloud. You must preserve ACID transactions and keep the production cutover under 3 minutes at 02:00 UTC. Cloud SQL for MySQL supports the source version. What should you do?
Correct. Database Migration Service is the Google-recommended service for migrating supported MySQL databases into Cloud SQL with minimal downtime. It performs an initial full load of the 1.6 TB dataset and then uses MySQL binlog-based replication to continuously apply ongoing changes from the source, which is exactly what is needed for a database sustaining 900 TPS with bursts to 1,500 TPS. Because the source already has binary logging enabled in ROW format, the prerequisites align well with DMS continuous migration. This allows the team to keep production online during the bulk transfer and use the 3-minute cutover window only to stop writes, let replication catch up, finalize the migration, and redirect clients.
Incorrect. Cloud Data Fusion is primarily an ETL and data integration platform, not a purpose-built transactional database migration service for low-downtime cutovers. A table-by-table pipeline approach does not naturally preserve commit ordering, transactional consistency, or continuous synchronization for a high-traffic payment ledger. It would also require substantial downtime or custom reconciliation logic to handle changes occurring during the migration. That makes it a poor fit for a 1.6 TB OLTP database with a strict sub-3-minute cutover requirement.
Incorrect. mysqldump is a logical export tool and is generally too slow for a 1.6 TB production database when downtime must remain under 3 minutes. Even if the export were compressed, the combined time for dumping, transferring, importing, and validating the data would be far longer than the allowed outage window. This method also places heavy load on the source and target during export and import, which increases operational risk. It is acceptable for smaller databases or maintenance windows with long downtime, but not for this scenario.
Incorrect. Exporting tables to CSV and importing them into Cloud SQL is a manual bulk-load pattern that is unsuitable for a transactional MySQL migration of this size and criticality. CSV workflows do not preserve full database semantics such as triggers, routines, foreign keys, and other schema-level objects without significant extra work. Rebuilding indexes and constraints after import would add even more time and complexity, making the outage far exceed 3 minutes. This approach is better suited to analytical data movement than to a payment ledger requiring low-downtime migration.
Core concept: This question tests low-downtime migration of MySQL to Cloud SQL using Database Migration Service (DMS) with continuous replication (binlog-based CDC). It emphasizes meeting a strict cutover RTO (under 3 minutes) while preserving ACID behavior at the database layer.

Why the answer is correct: With a 1.6 TB dataset and sustained 900 TPS (bursts to 1,500), any “offline” export/import approach (mysqldump/CSV) will take far longer than 3 minutes and introduces extended downtime. DMS supports MySQL 8.0 to Cloud SQL for MySQL and can perform an initial load followed by continuous replication from MySQL binary logs in ROW format. By letting the initial load run ahead of time and keeping replication nearly caught up, the final cutover becomes a short, controlled operation: stop writes on the source, allow replication to drain to near-zero lag, promote the destination, and repoint applications.

Key features / configurations / best practices:
- Use DMS continuous migration with binlog-based replication (GTID preferred when available; otherwise file/position). Ensure the source has ROW binlog format (given) and appropriate retention so logs aren’t purged during the initial load.
- Validate network throughput: a 1 Gbps VPN (~125 MB/s theoretical, lower effective) is adequate for the initial load over time and for continuous replication; the key is that cutover depends only on draining remaining changes, not on transferring 1.6 TB during the window.
- Preserve transactional integrity: InnoDB plus row-based binlogs allow deterministic replication of committed changes. Application cutover should include connection string updates and possibly a DNS/connection pool refresh.
- Follow Google Cloud Architecture Framework reliability guidance: perform rehearsals, monitor replication lag, and plan rollback (keep the source read-only but available briefly).
Common misconceptions:
- “mysqldump is simplest”: true for small databases, but at 1.6 TB it creates long downtime and risks missing the 3-minute requirement.
- “ETL tools can migrate data”: Data Fusion pipelines are for transformation/analytics-style movement, not for maintaining transactional consistency with near-zero downtime.
- “CSV export is faster”: it loses schema fidelity (constraints, triggers, routines), requires index rebuilds, and still implies long downtime.

Exam tips: When you see large datasets plus a strict cutover window plus MySQL with binlogs enabled, default to DMS continuous migration (CDC). Reserve dump/CSV imports for small datasets or for when downtime is acceptable. Also remember Cloud SQL supports read replicas, but cross-environment migration is best handled by DMS for managed, monitored cutovers.
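The binlog prerequisites called out above can be verified on the source before creating the DMS migration job. A sketch, with the source host and user as placeholders:

```shell
# Confirm DMS continuous-migration prerequisites on the MySQL 8.0 source.
mysql -h onprem-mysql.example.com -u migration_user -p -e "
  SHOW VARIABLES LIKE 'binlog_format';               -- expect ROW (given in the scenario)
  SHOW VARIABLES LIKE 'gtid_mode';                   -- GTID preferred when available
  SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';  -- retention must outlast the initial load
"
```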
During a quarterly disaster-recovery review at a fintech company, you discover that a production Cloud SQL for MySQL 8.0 instance (single-zone, us-central1-a, 800 GB with storage auto-increase enabled) is not configured for high availability (HA), while SLOs require RTO < 60 seconds and zero data loss during zonal failures; you have a 30-minute maintenance window this weekend and cannot accept multi-hour downtime for data migration. Following Google-recommended practices, how should you enable HA on this existing instance with the least operational risk?
Creating a new HA instance and using export/import is a classic migration pattern, but it is high risk here. Export/import for 800 GB can take hours and requires a cutover window; it also risks data loss unless you freeze writes or implement additional replication. It violates the constraint of not accepting multi-hour downtime and is not the least operational risk compared to in-place HA enablement.
Cloud Data Fusion is an ETL/integration tool, not the standard or lowest-risk method to migrate a production MySQL database for HA. Using it for full database migration adds complexity, operational overhead, and potential data consistency issues. It also does not inherently solve the cutover/RPO problem without additional replication and careful orchestration, making it unsuitable under tight downtime constraints.
Patching the existing instance to set availability-type=REGIONAL is the intended Cloud SQL approach to enable HA with minimal operational risk. It avoids a full data migration and leverages Cloud SQL’s managed creation of a standby in another zone with synchronous replication and automatic failover. It typically requires only a short maintenance interruption, aligning with the 30-minute window and HA SLOs.
Manually shutting down the instance and “toggling HA” is not the recommended or necessary procedure. HA enablement is done through an instance update (console/API/gcloud patch), and introducing manual stop/start steps increases the chance of errors and extended downtime. It also doesn’t add value beyond what option C already provides in a controlled, supported way.
Core concept: This question tests Cloud SQL high availability (HA) for MySQL and how to retrofit HA onto an existing instance with minimal risk and downtime. In Cloud SQL, HA is provided by a REGIONAL configuration (availability-type=REGIONAL) that creates a standby in a different zone within the same region and uses synchronous replication with automatic failover.

Why the answer is correct: Option C is the Google-recommended operational approach: patch the existing instance to REGIONAL. Cloud SQL supports enabling HA on an existing instance by updating the availability type. This avoids a full logical migration (export/import) and avoids building a parallel stack with cutover risk. It fits the constraint of a short maintenance window and the requirement to avoid multi-hour downtime. The operation incurs some downtime during the reconfiguration (typically minutes, varying with instance size and workload), but it is the least risky path compared with a data migration.

Key features and best practices: REGIONAL HA places the primary and standby in different zones (zonal failure protection) and uses synchronous replication to minimize data loss (meeting “zero data loss” expectations for zonal failures). It also enables automatic failover with a low RTO (often well under 60 seconds, though applications must follow recommended connection practices such as the Cloud SQL connector/proxy and retry logic). Storage auto-increase remains supported; ensure sufficient regional quota for the additional standby resources. Plan the change in the maintenance window and validate with a controlled failover test.

Common misconceptions: Export/import (A) or Data Fusion (B) could eventually produce an HA deployment, but they introduce long migration time for 800 GB, higher cutover complexity, and greater risk of missing the “zero data loss” requirement unless you add replication and carefully orchestrate the cutover.
Option D suggests “toggling HA” but implies a manual stop/start; Cloud SQL HA enablement is performed via an update/patch operation rather than an ad-hoc shutdown procedure, and manual steps increase operational risk.

Exam tips: For Cloud SQL, “HA” generally maps to availability-type=REGIONAL (standby in another zone, automatic failover). When asked for the least operational risk on an existing instance, prefer in-place configuration changes supported by the service over migrations. Always connect HA requirements to RTO/RPO: REGIONAL HA targets zonal failures with low RTO and near-zero RPO, but still requires client retry/connection best practices.
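The in-place HA enablement described above is a single patch operation; a sketch with a hypothetical instance name:

```shell
# Switch the existing instance to a REGIONAL availability type.
# Cloud SQL provisions a synchronous standby in another zone;
# expect a brief restart, so run this inside the maintenance window.
gcloud sql instances patch ledger-prod --availability-type=REGIONAL

# Afterwards, validate the HA setup with a controlled failover test.
gcloud sql instances failover ledger-prod
```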
Your team runs a meal-delivery ordering platform in us-central1. The API is served from Cloud Run, and transactional data is stored in a single Cloud SQL for PostgreSQL instance with automatic maintenance updates enabled. 92% of customers are in the America/Chicago time zone and expect the app to be available every day from 6:00 to 22:00 local time. Security policy requires that database maintenance patches be applied within 7 days of release. You need to apply regular Cloud SQL maintenance without creating downtime for users during operating hours. What should you do?
Setting a maintenance window helps control when maintenance occurs, but it does not prevent downtime on a single Cloud SQL instance because patching often requires a restart. Additionally, Cloud SQL maintenance windows are configured in UTC (not a local time zone), so “02:00–03:00 America/Chicago” is not how the setting is applied. Sequencing non-prod first is good practice, but it doesn’t solve production availability.
A read replica can offload read traffic, but it does not provide automatic, seamless continuity for a transactional (write-heavy) system during primary maintenance. Cloud SQL replicas are asynchronous; promoting a replica to primary is a disruptive operational event and typically requires connection endpoint changes and careful failover planning. This does not meet the requirement of avoiding downtime for users during operating hours.
Maintenance notifications and rescheduling are operationally helpful, but they don’t eliminate downtime. The requirement is to apply patches within 7 days and avoid downtime during 06:00–22:00 local time. Notifying users implies accepting downtime, which conflicts with the stated goal. Also, maintenance timing can be constrained by Google’s maintenance policies and the patch compliance window.
Cloud SQL High Availability (regional) is designed to minimize downtime during both planned maintenance and unplanned failures. With a synchronous standby in another zone, Cloud SQL can fail over during maintenance so the service remains available with minimal disruption. This best aligns with the Architecture Framework’s reliability principles and meets the requirement to patch within 7 days without impacting users during operating hours.
Core concept: This question tests Cloud SQL maintenance behavior and how to design for availability during planned maintenance. In Cloud SQL, maintenance (OS/database patching) can require a restart and therefore downtime on a single-instance deployment. The key architectural principle from the Google Cloud Architecture Framework is to design for high availability and minimize single points of failure.

Why the answer is correct: Enabling High Availability (regional) for Cloud SQL for PostgreSQL creates a primary instance with a synchronous standby in a different zone within the same region (us-central1). During maintenance, Cloud SQL can perform a controlled failover to the standby, apply updates, and then (optionally) fail back. This significantly reduces user-visible disruption compared with patching a single instance, helping meet the requirement of no downtime during operating hours (06:00–22:00 America/Chicago) while still complying with the policy to apply patches within 7 days.

Key features / configurations:
- Cloud SQL HA (regional) uses zonal redundancy with automatic failover and a regional IP option.
- Maintenance can be scheduled, but even with a window, a single instance still experiences downtime during the restart.
- HA is the recommended approach for production workloads that require high availability and reduced planned/unplanned downtime.
- Cloud Run is stateless and can tolerate brief connection blips; HA minimizes the database-side interruption window.

Common misconceptions:
- A maintenance window alone (Option A) does not eliminate downtime; it only controls when it happens. Also, Cloud SQL maintenance windows are configured in UTC, and “sequencing” across instances doesn’t help if production is a single instance.
- Read replicas (Option B) are for scaling reads and disaster-recovery patterns, but they are asynchronous and cannot automatically take over as a primary for writes during maintenance without promotion and application reconfiguration.
- Notifying users (Option C) does not meet the requirement to avoid downtime during operating hours.

Exam tips: When you see “avoid downtime during maintenance” for Cloud SQL, think HA (regional) first. Maintenance windows are about scheduling, not eliminating, downtime. Read replicas do not provide seamless write availability. Also remember Cloud SQL maintenance windows are specified in UTC, and security patch timelines may force maintenance within a limited period, making HA the safest compliance-friendly design.
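The combination of regional HA plus a low-traffic maintenance window can be sketched as follows; the instance name is hypothetical, and note the window is expressed in UTC:

```shell
# Enable regional HA so planned maintenance can fail over to the standby.
gcloud sql instances patch orders-db --availability-type=REGIONAL

# Schedule maintenance for early Sunday morning local time.
# Windows are specified in UTC: 08:00 UTC is 02:00 or 03:00
# America/Chicago depending on DST, i.e., outside 06:00-22:00 local hours.
gcloud sql instances patch orders-db \
  --maintenance-window-day=SUN \
  --maintenance-window-hour=8
```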
You are deploying a new Java service on a Windows Server 2019 VM in your company’s on-premises data center (no Cloud VPN or Interconnect to Google Cloud), and the service uses JDBC on port 5432 to connect to a Cloud SQL for PostgreSQL instance that has a private IP 10.20.30.40 and a public IP 34.98.120.10, with SSL disabled on the instance; you must ensure the service can access the database without making any configuration changes to the Cloud SQL instance—what should you do?
Incorrect. JDBC authentication to PostgreSQL uses database users (created in PostgreSQL/Cloud SQL), not Google Workspace usernames/passwords. Even if the public IP is reachable, Workspace credentials are not valid database credentials. Additionally, connecting directly to the public IP without the Cloud SQL Auth Proxy typically requires configuring authorized networks or other instance-side controls, which the question forbids changing.
Incorrect. The private IP (10.20.30.40) is only reachable from within a connected VPC network or from on-prem via Cloud VPN/Interconnect. The scenario explicitly states there is no VPN or Interconnect, so there is no route from the on-prem VM to the private IP. A direct JDBC connection to the private address will fail regardless of correct database credentials.
Incorrect. While using the Cloud SQL Auth Proxy with a service account is a best practice, configuring it to connect to the instance’s private IP still requires network connectivity to that private IP. Without VPN/Interconnect (or being in the same VPC), the on-prem VM cannot reach 10.20.30.40. The proxy does not magically provide private IP routing; it provides authenticated tunneling to Cloud SQL endpoints.
Correct. Because the VM is on-premises and there is no Cloud VPN or Interconnect, the instance’s private IP 10.20.30.40 is not reachable from the data center. The Cloud SQL Auth Proxy can be run on the Windows Server and authenticated with a service account, and it will establish a secure connection to the Cloud SQL instance over Google-managed connectivity using the instance’s public path. The Java application should then use JDBC to connect to the proxy’s local listener, while the proxy handles the authenticated, encrypted connection to Cloud SQL without requiring changes to the instance configuration.
Core concept: This question tests Cloud SQL connectivity patterns from an external (on-prem) environment, specifically the difference between private-IP and public-IP access and when to use the Cloud SQL Auth Proxy (or Cloud SQL connectors). It also implicitly tests IAM vs database authentication and secure-connectivity best practices.

Why the answer is correct: Because the VM is in an on-premises data center with no Cloud VPN or Cloud Interconnect, it has no network path to the Cloud SQL instance’s private IP (10.20.30.40). Private IP for Cloud SQL is reachable only from VPC-connected networks (same VPC, peered VPCs, or on-prem via VPN/Interconnect). You are also not allowed to change the Cloud SQL instance configuration (so you cannot add VPN/Interconnect, change authorized networks, enforce SSL, etc.). The only viable connectivity path is via the instance’s public IP (34.98.120.10). To connect securely and without changing instance settings, use the Cloud SQL Auth Proxy with a service account. The proxy establishes an authenticated, encrypted tunnel to Cloud SQL using IAM, and your application connects locally (e.g., to 127.0.0.1:5432) via standard JDBC.

Key features / best practices:
- The Cloud SQL Auth Proxy (or language connectors) provides IAM-based authorization and TLS encryption in transit even if the instance’s “require SSL” setting is disabled.
- It works over outbound connections from the client to Google-managed endpoints; typically no inbound firewall openings are needed on the client side.
- Use a service account with least privilege (e.g., roles/cloudsql.client) and avoid exposing database credentials in broad network contexts.
- This aligns with Google Cloud Architecture Framework security principles: strong identity, secure connectivity, and a minimized attack surface.
Common misconceptions:
- Confusing Google Workspace credentials with database credentials (Cloud SQL for PostgreSQL uses database users for authentication; Workspace login is not a JDBC auth mechanism).
- Assuming the private IP is reachable from anywhere if you “know the IP.” Without hybrid connectivity, it is not routable.
- Thinking SSL disabled means traffic is unencrypted; the proxy still uses TLS to Cloud SQL.

Exam tips: When you see “on-prem without VPN/Interconnect,” eliminate any option that relies on private IP. When you see “no changes to the instance,” prefer the Cloud SQL Auth Proxy/connectors over instance-side network allowlisting or SSL-enforcement changes. Also remember: IAM controls who can connect (proxy/connector), while PostgreSQL users/passwords control database login once connected.
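The proxy-based setup described above can be sketched on the Windows host; the key path, project, region, and instance name are placeholders:

```shell
# On the Windows Server, start the Cloud SQL Auth Proxy (v2) with a
# service-account key that holds roles/cloudsql.client, listening locally
# on the PostgreSQL port.
cloud-sql-proxy.exe --credentials-file=C:\keys\sa.json --port 5432 my-project:us-central1:crm-pg

# The Java service then connects through the local listener with a
# standard JDBC URL, e.g.:
#   jdbc:postgresql://127.0.0.1:5432/appdb
# The proxy handles IAM authorization and TLS to Cloud SQL; no changes
# are made to the instance configuration.
```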
You manage a small, non-critical Cloud SQL for PostgreSQL instance (2 vCPUs, 50-GB storage) used by a staging QA pipeline; the business accepts a recovery point objective of up to 72 hours and you want to minimize ongoing operational and storage costs while still retaining basic recovery capability—what backup/restore configuration should you choose?
Disabling all backups and turning off transaction log retention is the absolute lowest cost, but it eliminates recovery capability. Even for non-critical systems, the question explicitly requires “still retaining basic recovery capability.” Without backups or PITR logs, you cannot restore after accidental deletion, corruption, or instance failure beyond what high availability might cover. This fails the stated business requirement.
One manual backup per day with transaction log retention off can meet a 72-hour RPO (daily backups are more frequent than required). However, it increases operational overhead and risk: someone must ensure backups are created, retained, and not accidentally deleted. Manual backups also don’t inherently optimize retention unless you manage lifecycle yourself. For “minimize ongoing operational cost,” automated backups are preferred.
Automated backups with transaction log retention off provide a managed, low-ops baseline recovery method at lower cost than PITR. You can set backup retention to at least 3 days to satisfy the 72-hour RPO. This configuration avoids the extra storage costs of retaining WAL/transaction logs while still allowing restore to the most recent automated backup. It best matches “basic recovery capability” plus cost minimization.
Automated backups with transaction log retention on enables PITR, allowing restores to a specific timestamp between backups. This is valuable for tighter RPO/RTO targets and protection from logical errors discovered quickly. But it increases ongoing storage consumption (transaction logs) and can raise costs without providing meaningful benefit when the business accepts up to 72 hours of data loss. It is over-engineered for staging QA.
Core concept: This question tests Cloud SQL for PostgreSQL backup-strategy tradeoffs: automated vs manual backups, and transaction log retention (point-in-time recovery, PITR). It focuses on meeting an RPO requirement at the lowest operational and storage cost, aligning with the Google Cloud Architecture Framework principles of cost optimization and operational excellence.

Why the answer is correct: An RPO of up to 72 hours means the business can tolerate losing up to 3 days of data. The lowest-cost configuration that still provides basic recovery capability is to enable automated backups but disable transaction log retention (PITR). Automated backups provide scheduled, managed backups with retention controls and minimal operator effort. With PITR disabled, you avoid the additional storage and ongoing write/retention overhead of keeping WAL/transaction logs, which is unnecessary given the relaxed RPO.

Key features and best practices: Cloud SQL automated backups are managed by the service and can be configured with a retention window (number of backups/days). To satisfy a 72-hour RPO, ensure retention covers at least 3 days (commonly 3–7 days). Disabling transaction log retention removes the ability to restore to an arbitrary point in time between backups, but it reduces storage consumption and cost. For a non-critical staging QA pipeline, restore-to-last-backup is typically sufficient.

Common misconceptions: Option B (manual daily backups) may seem cheaper because you control frequency, but it increases operational burden and the risk of missed backups, and manual backups still consume storage. Option D (automated + PITR) is the most robust but adds cost and complexity not justified by the relaxed RPO. Option A (no backups) minimizes cost but violates the requirement to retain basic recovery capability.

Exam tips: Translate RPO into backup frequency and retention: if the RPO is 72 hours, you need backups at least every 72 hours, retained for at least that window. Use automated backups for low-ops environments. Enable PITR only when you need finer-grained recovery (minutes/hours) or protection against logical corruption between backups. For cost questions, remember PITR implies ongoing transaction log storage and typically higher cost than backups alone.
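The low-cost baseline described above maps to a single patch operation; a sketch, with the instance name hypothetical:

```shell
# Automated daily backups with ~3 days of retention to cover the
# 72-hour RPO, and transaction log retention (PITR) disabled to
# avoid ongoing WAL storage costs.
gcloud sql instances patch qa-staging-pg \
  --backup-start-time=03:00 \
  --retained-backups-count=3 \
  --no-enable-point-in-time-recovery
```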
You operate a Cloud SQL for PostgreSQL deployment in Google Cloud. The primary instance runs in zone europe-west1-b, and a read replica runs in zone europe-west1-c within the same region. An alert reports that the read replica in europe-west1-c was unreachable for 11 minutes due to a zonal network disruption. You must ensure that the read-only workload continues to function and that the replica remains available with minimal manual intervention. What should you do?
A clone of the primary is not the same as a read replica. Clones are independent copies and do not continue asynchronous replication from the primary after creation. Using a clone for read traffic would create data divergence over time and require more manual management. Therefore, it does not properly restore a managed read replica topology.
This is the best choice because the read-only workload depends on a serving replica, and the only replica has been unreachable for an extended period. Creating a new read replica is the most direct way to restore read capacity while keeping the primary online for writes. Redirecting traffic to the new replica addresses the workload requirement, whereas waiting for automatic recovery does not guarantee service continuity. Although not ideal compared with having multiple replicas preconfigured, it is the strongest recovery action among the listed options.
This option is incorrect because it relies on an assumed automatic recreation of the read replica in a healthy zone. Cloud SQL does not provide a documented guarantee that a single read replica will be automatically recreated elsewhere to preserve read availability after a zonal disruption. Verifying status is a reasonable operational check, but it does not itself ensure the read-only workload continues to function. The question asks what you should do to ensure availability, not merely what you should observe.
Restarting the primary is unnecessary and potentially harmful. The problem is with the replica being unreachable due to a zonal network disruption, not with the primary database process. Restarting the primary can interrupt write traffic and does not guarantee restoration of the replica or read service. It adds risk without addressing the actual failure domain.
Core concept: This question tests Cloud SQL for PostgreSQL read replica availability and operational recovery during a zonal disruption. A Cloud SQL read replica is a separate asynchronous replica instance, and a single replica in one zone is still a single point of failure for read traffic in that zone. Cloud SQL does not provide automatic read-workload failover for a lone read replica, so if continued read availability is required, you must restore replica capacity yourself or design multiple replicas in advance.

Why correct: Because the only read replica was unreachable for 11 minutes and the requirement is to ensure the read-only workload continues to function with minimal manual intervention, the best available action is to create a new read replica from the latest available state and redirect read traffic to it. This restores a serving replica without disrupting the primary. It is the closest operational response among the options to re-establishing read availability after the zonal failure.

Key features:
- Cloud SQL read replicas are asynchronous and do not by themselves provide automatic read-failover semantics.
- A zonal outage affecting the replica zone can leave read traffic without a serving target unless another replica already exists.
- Recreating a read replica is an administrative recovery action; for stronger resilience, deploy multiple replicas across zones and use application-side routing or connection management.

Common misconceptions: A common mistake is assuming Cloud SQL will automatically recreate a read replica in another zone after a zonal disruption. Another is that restarting the primary helps replica recovery; it usually only adds unnecessary write disruption. Clones are also not substitutes for read replicas, because they are independent copies, not replication targets.

Exam tips:
- Distinguish primary HA/failover behavior from read replica behavior; they are not the same.
- If the question asks about maintaining read-workload continuity, think about replica redundancy and traffic redirection.
- When only one replica exists and it becomes unavailable, manual recreation may be necessary unless the architecture already includes additional replicas.
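The recovery action described above can be sketched with gcloud; the instance names and target zone are placeholders:

```shell
# Create a replacement read replica of the primary, pinned to a
# healthy zone in the same region.
gcloud sql instances create crm-replica-2 \
  --master-instance-name=crm-primary \
  --region=europe-west1 \
  --zone=europe-west1-d

# Fetch the new replica's address so the read-only workload can be
# redirected to it.
gcloud sql instances describe crm-replica-2 \
  --format="value(ipAddresses[0].ipAddress)"
```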
You are migrating a vendor CRM database from a legacy SQL Server 2014 Enterprise instance running on a 3‑node VMware cluster with a Fibre Channel SAN to a single Cloud SQL for SQL Server instance in Google Cloud. Storage telemetry from the SAN shows peak read workloads reaching approximately 27,000 IOPS with read latency under 2 ms during quarterly reporting. You want to size the Cloud SQL instance to maximize read performance while keeping licensing costs minimal. What should you do?
This option is too small on both compute and storage for the stated workload. With only 8 vCPUs, 32 GB of RAM, and 600 GB of SSD, it is the least capable configuration and is unlikely to sustain the peak read demand during quarterly reporting. The storage allocation is especially problematic because SSD-backed IOPS capacity in Cloud SQL scales with disk size, and 600 GB does not provide enough headroom for a 27,000 IOPS target. Even though it is cheaper, it does not meet the performance requirement.
This is the best answer because it uses SQL Server Standard, which keeps licensing costs lower than Enterprise, while still providing a high-memory 16 vCPU / 104 GB configuration suited to a read-heavy CRM workload. The key sizing factor is storage: because SSD-backed IOPS capacity in Cloud SQL scales with provisioned disk size, the 800 GB allocation is the smallest of the listed Standard options intended to reach the required read-performance tier, making it the most cost-efficient Standard configuration presented. It balances compute, memory, and storage without overprovisioning disk capacity more than necessary. For an exam question emphasizing both performance and minimal licensing cost, this is the most economical option designed to meet the stated read-performance requirement.
This option would likely provide sufficient performance, but it is not the most cost-efficient choice. The 1 TB SSD gives more storage performance headroom than option B, yet the question asks you to maximize read performance while keeping licensing costs minimal, which implies avoiding unnecessary overprovisioning. Since both B and C use SQL Server Standard and the same compute shape, the extra storage in C adds cost without being the minimal sizing choice. Therefore, C is not the best answer when cost optimization is part of the requirement.
This option uses SQL Server Enterprise, which significantly increases licensing cost and directly conflicts with the requirement to keep licensing costs minimal. Although it has the same 16 vCPU / 104 GB high-memory shape as B and C, its 500 GB SSD is also smaller than the Standard options that are better suited for high read IOPS. Enterprise edition does not inherently remove Cloud SQL storage performance limits, so paying more for Enterprise does not solve the core sizing issue. It is therefore both more expensive and less appropriate than the Standard alternatives.
Core concept: This question tests Cloud SQL for SQL Server performance sizing, specifically how storage size and machine shape affect IOPS and latency, while balancing SQL Server licensing cost (Standard vs Enterprise). In Cloud SQL, storage performance is primarily governed by the underlying persistent disk characteristics and provisioned size (which drives the IOPS/throughput ceilings), while CPU and RAM influence query execution and buffer-cache hit rate.

Why the answer is correct: You have a measured peak read workload of ~27,000 IOPS with sub-2 ms latency. To maximize read performance on Cloud SQL, you need SSD storage provisioned with enough capacity to provide IOPS headroom, because persistent-disk SSD performance scales with provisioned size and has per-GB IOPS limits and per-instance caps. Among the Standard-edition options, the 800 GB SSD is the smallest allocation intended to reach the required performance tier, sustaining the quarterly-reporting read peaks without paying for unneeded capacity. Keeping SQL Server Standard also minimizes licensing cost versus Enterprise.

Key features / best practices:
- Prefer SSD for latency-sensitive OLTP and reporting bursts.
- Size storage for peak IOPS plus headroom; don’t size only for data volume.
- Use high-memory shapes when SQL Server benefits from a larger buffer pool (caching hot data improves read performance), but recognize that storage IOPS can still be the limiting factor.
- Validate with load testing and monitor Cloud SQL metrics (disk read ops, latency, CPU, buffer-cache hit ratio).

Common misconceptions:
- Assuming more vCPUs alone increase IOPS: CPU helps query processing, but storage IOPS limits can dominate.
- Choosing Enterprise “for performance”: Enterprise adds features (e.g., advanced HA and online operations) but doesn’t remove Cloud SQL storage limits, and it increases licensing cost.
- Under-sizing SSD (e.g., 500–600 GB here) can look cost-effective but may fail peak IOPS targets.
Exam tips: When a question provides IOPS/latency requirements, map them first to storage type and size, then choose the smallest edition (Standard) that meets requirements. Use larger SSD to raise the IOPS ceiling, and only choose Enterprise if a required feature explicitly demands it (e.g., specific HA/DR or advanced SQL Server capabilities).