
GCP
300+ Free Practice Questions (AI-Verified Answers Included)
AI-Powered
Every Google Professional Data Engineer answer is cross-verified by three top AI models to ensure the highest accuracy. Detailed explanations for each answer choice and in-depth question analysis are provided.
You are troubleshooting an Apache Flink streaming cluster running on 12 Compute Engine VMs in a managed instance group without external IPs on the custom VPC "analytics-vpc" and subnet "stream-subnet". TaskManager nodes cannot communicate with one another. Your networking team manages access using Google Cloud network tags to define firewall rules. Flink has been configured to use TCP ports 12345 and 12346 for RPC and data transport between nodes. You need to identify the issue while following Google-recommended networking security practices. What should you do?
Your company operates three independent data workflows that must be orchestrated from a single place with consistent scheduling, monitoring, and on-demand execution.
You are designing a platform to store 1-second interval temperature and humidity readings from 12 million cold-chain sensors across 40 warehouses. Analysts require real-time, ad hoc range queries over the most recent 7 days with sub-second latency. You must avoid per-query charges and ensure the schema can scale to 25 million sensors and accommodate new metrics without frequent schema changes. Which database and data model should you choose?
You operate a Cloud Run service that receives messages from a Cloud Pub/Sub push subscription at a steady rate of ~1,200 messages per minute, aggregates events into 5-minute batches, and writes compressed JSON files to a dedicated Cloud Storage bucket. You want to configure Cloud Monitoring alerts that will reliably indicate if the pipeline stalls for more than 10 minutes by detecting a growing upstream backlog and a slowdown in data written downstream. Which alerts should you create?
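As a rough sanity check on alert thresholds for a scenario like this, here is a back-of-the-envelope sketch of the backlog math under the stated assumptions (1,200 messages/minute inbound, 10-minute stall window, one batch every 5 minutes); the variable names are illustrative, not Cloud Monitoring metric names:

```python
# Illustrative backlog math for a stalled Pub/Sub -> Cloud Run -> GCS pipeline.
# Assumption: steady inbound rate of 1,200 messages/minute (from the scenario).
INBOUND_RATE_PER_MIN = 1_200
STALL_WINDOW_MIN = 10

# If the subscriber stalls completely, undelivered messages accumulate at the
# inbound rate, so a backlog threshold near this value, sustained for the full
# window, indicates a genuine stall rather than a transient blip.
expected_backlog = INBOUND_RATE_PER_MIN * STALL_WINDOW_MIN

# Downstream, a healthy pipeline writes one batch every 5 minutes, so zero
# object writes for 10 minutes means two consecutive missed batches.
missed_batches = STALL_WINDOW_MIN // 5

print(expected_backlog, missed_batches)
```

This suggests why the window matters: a 12,000-message backlog sustained for 10 minutes is unambiguous, while a momentary spike of the same size is not.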
Your fintech compliance team must store 12 TB of transaction audit files (about 200,000 objects per month) in a Cloud Storage Archive bucket with a 7-year retention requirement. Due to a zero-trust mandate, you must implement a Trust No One (TNO) model so that even cloud provider personnel cannot decrypt the data; uploads will be performed from an on-prem hardened host using gsutil, and only the internal security team may hold the encryption material. What should you do to meet these requirements?
Want to work through every question on the go?
Download Cloud Pass for free: practice exams, study progress tracking, and more.
Your marketing analytics team needs to run a weekly PySpark batch job on Google Cloud Dataproc to score customer churn propensity using input data in Cloud Storage and write results to BigQuery; testing shows the workload completes in about 35 minutes on a 16-worker n1-standard-4 cluster when triggered every Friday at 02:00 UTC; you are asked to cut infrastructure costs without rewriting the job or changing the schedule—how should you configure the cluster for cost optimization?
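For intuition on why a job-scoped (ephemeral) cluster is dramatically cheaper than an always-on one for a weekly 35-minute batch, here is a back-of-the-envelope sketch; the runtime and worker count come from the scenario, while the 10-minute provisioning/teardown buffer is an illustrative assumption:

```python
# Worker-hours per week: always-on cluster vs. an ephemeral cluster that
# exists only while the weekly job runs.
# From the scenario: 16 workers, ~35-minute weekly run.
# Assumption: ~10 minutes of spin-up/teardown overhead per run.
HOURS_PER_WEEK = 7 * 24                        # 168
job_hours = (35 + 10) / 60                     # 0.75 cluster-hours per week

always_on_worker_hours = 16 * HOURS_PER_WEEK   # worker-hours billed 24/7
ephemeral_worker_hours = 16 * job_hours        # worker-hours billed per run

savings_ratio = always_on_worker_hours / ephemeral_worker_hours
print(f"Ephemeral cluster uses ~1/{savings_ratio:.0f} of the worker-hours")
```

The ratio is what makes this question about cluster lifecycle rather than machine sizing: the hardware is idle more than 99% of the week.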
Your factory collects 50 MB/s of PLC telemetry into an on-premises Apache Kafka cluster (3 brokers, 6 topics with 48 total partitions, 7-day retention), and you must replicate these topics to Google Cloud so raw events land in Cloud Storage and can later be analyzed in BigQuery with end-to-end replication lag under 3 minutes; due to strict change control you must avoid deploying any Kafka Connect plugins on-premises and the team prefers a mirroring-based approach for replication; what should you do?
Your company runs a private Google Kubernetes Engine (GKE) cluster in a custom VPC in us-central1 using a subnetwork named analytics-subnet; due to the organization policy constraints/compute.vmExternalIpAccess, all nodes have only internal IPs with no external IPs. A nightly Kubernetes Job must download 500 MB CSV files from Cloud Storage and load transformed results into BigQuery using the BigQuery Storage Write API, but pods fail with DNS resolution/connection errors when contacting storage.googleapis.com and bigquery.googleapis.com. What should you do to allow access to Google APIs while keeping the nodes on internal IPs only?
Your ride-hailing platform operates a Standard Tier Memorystore for Redis instance (15 GB capacity, ~80k QPS, multi-zone production deployment with 12-hour key TTLs), and you need to run the most realistic disaster recovery drill by triggering a Redis failover while guaranteeing zero impact on production data (no data loss); what should you do?
Your team operates a 7-node RabbitMQ ingress tier (~50,000 msgs/sec) and a 5-node TimescaleDB cluster for durable storage; both clusters run on Compute Engine VMs with 2 TB Persistent Disks per node spread across three zones. Compliance mandates that all data at rest be encrypted with keys your team can create, rotate every 90 days, and destroy on demand, without requiring changes to the application code. What should you do?
Your company runs a real-time vehicle telemetry system on Google Cloud, where a Cloud Dataflow streaming job consumes events from a Cloud Pub/Sub topic 'telemetry-prod' via subscription 'telemetry-prod-v1' at an average rate of 25,000 messages per minute with a 60-second ack deadline. You must roll out a new version of the pipeline within the next hour that changes the keying and windowing logic in a way that is incompatible with the current job, and you cannot pause event producers; the business requires zero data loss during the cutover. What should you do to deploy the new pipeline without losing data?
ByteFarm, an agri-tech startup, runs a Cloud Dataflow streaming pipeline that ingests telemetry from 75,000 greenhouse sensors via Pub/Sub and writes aggregated metrics to BigQuery. To prepare for seasonal peaks where throughput can triple for up to 4 hours, you enabled autoscaling and set the initial number of workers to 25. During a load test, the job stops scaling at 40 workers and backlog grows; you want Dataflow to be able to scale compute higher without manual intervention. Which Cloud Dataflow pipeline configuration setting should you update?
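For context on the knobs involved: Dataflow autoscaling is bounded by a maximum-workers setting that is distinct from the initial worker count. A sketch of the relevant Python SDK pipeline flags follows; the value 100 is an illustrative assumption, not a recommendation from the scenario:

```python
# Illustrative Dataflow (Apache Beam Python SDK) pipeline flags.
# --num_workers sets the starting size; --max_num_workers is the ceiling the
# autoscaler may scale up to (the Java SDK spells it --maxNumWorkers).
flags = [
    "--num_workers=25",
    "--max_num_workers=100",   # assumed headroom value for illustration
    "--autoscaling_algorithm=THROUGHPUT_BASED",
]
print(flags)
```

If the autoscaler plateaus below the load's needs while backlog grows, the ceiling, not the initial count, is the setting doing the limiting.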
Your healthcare analytics startup must lift-and-shift a single-region 2.3 TB on-premises PostgreSQL database that powers your billing API; you have fewer than 400 concurrent client connections, require standard SQL with ACID transactions and point-in-time recovery, cannot redesign the schema or application within the next quarter, do not need global distribution, and minimizing ongoing operating cost is the top priority; which Google Cloud service should you use to store and serve this workload?
You manage a BigQuery dataset that stores hourly IoT telemetry for 500,000 sensors, and you must let 5 internal departments across 10 consumer projects discover and use the data without creating copies, keeping monthly maintenance under 1 hour and costs minimal; within the same Google Cloud organization, what is the most self-service, low-maintenance, and cost-effective way to share this dataset?
Your globally distributed ride-hailing platform lets drivers accept trip requests, and occasionally multiple drivers tap Accept for the same request within 10–50 ms while different regional application clusters handle those taps; each acceptance event includes rideId, driverId, acceptTimestamp (RFC3339 UTC), region, and fareEstimate, and events may arrive out of order by up to 3 seconds; you must aggregate these events centrally in real time with under 2 seconds end-to-end latency at a sustained rate of 200,000 events per minute to determine which driver accepted first. What should you do?
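Whatever streaming stack carries the events, the "who accepted first" decision itself reduces to a per-ride minimum over event timestamps once the out-of-order window for a ride has closed. A minimal pure-Python sketch of that reduction (field names are from the scenario; the tie-break on driverId for identical timestamps is an assumption):

```python
from datetime import datetime

def parse_rfc3339(ts: str) -> datetime:
    # datetime.fromisoformat rejects a trailing 'Z' before Python 3.11,
    # so normalize it to an explicit UTC offset first.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))

def first_acceptance(events):
    """Pick the earliest acceptance per rideId.

    Events may arrive up to ~3 s out of order (per the scenario), so this runs
    after the window for a ride closes. Ties on acceptTimestamp are broken
    deterministically by driverId (an assumption, not from the scenario).
    """
    winners = {}
    for e in events:
        candidate = (parse_rfc3339(e["acceptTimestamp"]), e["driverId"])
        if e["rideId"] not in winners or candidate < winners[e["rideId"]]:
            winners[e["rideId"]] = candidate
    return {ride: driver for ride, (_, driver) in winners.items()}

events = [
    {"rideId": "r1", "driverId": "d2", "acceptTimestamp": "2024-05-01T10:00:00.050Z"},
    {"rideId": "r1", "driverId": "d1", "acceptTimestamp": "2024-05-01T10:00:00.020Z"},
]
print(first_acceptance(events))  # {'r1': 'd1'}
```

Note that arrival order is irrelevant here: d2's event arriving first does not matter because the comparison is on acceptTimestamp, which is why event-time (not processing-time) semantics are central to this scenario.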
You are using Cloud Bigtable to persist and serve real-time error logs from five microservices in a payment platform, and the on-call dashboard needs only the most recent log entry per service (logs stream at up to 1,000 rows per second per service) with the simplest possible query to fetch the latest per service—how should you design your row keys and tables?
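A common Bigtable idiom for "latest entry first" is to prefix the row key with the service identifier and append a reversed timestamp, so the newest log for a service sorts to the top of its key range and a single limit-1 prefix scan fetches it. A sketch of the key construction follows; the `sys.maxsize`-based reversal and the `#`-delimited key format are illustrative assumptions:

```python
import sys

MAX_MICROS = sys.maxsize  # 2**63 - 1 on 64-bit builds; illustrative ceiling

def log_row_key(service_id: str, ts_micros: int) -> str:
    # Bigtable sorts rows lexicographically, so subtracting the event time from
    # a fixed ceiling makes NEWER entries sort FIRST within a service's prefix.
    # Zero-padding keeps the numeric part fixed-width so string order matches
    # numeric order.
    reversed_ts = MAX_MICROS - ts_micros
    return f"{service_id}#{reversed_ts:020d}"

older = log_row_key("payments-api", 1_700_000_000_000_000)
newer = log_row_key("payments-api", 1_700_000_000_500_000)
# The newer entry's key sorts lexicographically before the older one's, so
# reading the first row with prefix "payments-api#" yields the latest log.
assert newer < older
```

With only five services at ~1,000 rows/s each, the trade-off to weigh is that this layout concentrates writes per service prefix, which is exactly what these exam scenarios probe.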
You manage an overnight telemetry-validation workflow in Cloud Composer 2; one Airflow task calls a partner's device registry API via an HTTP operator and is configured with retries=3 and retry_delay=5 minutes, while the DAG has an SLA of 45 minutes; you want a notification to be sent only when this specific task ultimately fails after exhausting all retries (and not on retries or SLA misses); what should you do?
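For background on the hook this scenario points at: in Airflow, a task-level `on_failure_callback` fires only when the task fails terminally, i.e. after its retries are exhausted; per-retry failures go to `on_retry_callback`, and SLA misses are handled separately at the DAG level. A minimal sketch follows, with a hypothetical `notify_team` stand-in and a plain dict standing in for Airflow's real callback context (no Airflow import or alerting backend is wired up):

```python
# Sketch of an Airflow-style terminal-failure notification.
def notify_team(message: str) -> str:
    # Hypothetical stand-in for a Slack/email/PagerDuty integration.
    print(f"ALERT: {message}")
    return message

def on_final_failure(context: dict) -> str:
    # Called by Airflow only after retries=3 are exhausted; retries and SLA
    # misses do not reach this callback.
    ti = context["task_instance"]
    return notify_team(f"Task {ti['task_id']} failed after all retries")

# Task-level wiring as it would appear on the operator (illustrative kwargs):
task_kwargs = {
    "retries": 3,
    "retry_delay_minutes": 5,
    "on_failure_callback": on_final_failure,
}

# Simulated terminal failure (plain dict standing in for Airflow's context):
on_final_failure({"task_instance": {"task_id": "call_device_registry"}})
```

The key distinction the scenario tests is that the notification must be attached to the task's terminal-failure path, not to retry events or the DAG's SLA machinery.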
Your analytics team streams 80,000 events per second into a BigQuery table via a Pub/Sub BigQuery subscription in us-central1. Currently, both the Pub/Sub topic (project: stream-prd) and the BigQuery table (project: analytics-prd, dataset: ops_ds, table: events_raw) use Google-managed encryption keys. A new organization policy mandates that all at-rest data for this pipeline must use a customer-managed encryption key (CMEK) from a centralized KMS project (project: sec-kms-prj, key ring: analytics-ring, key: event-data-key, region: us-central1). You must comply with the policy and keep streaming ingestion running while you transition and preserve historical data. What should you do?
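Whatever migration sequence is chosen, both the Pub/Sub topic's CMEK setting and the new BigQuery table need the fully qualified Cloud KMS key resource name. A small sketch assembling it from the identifiers given in the scenario:

```python
def kms_key_name(project: str, location: str, key_ring: str, key: str) -> str:
    # Fully qualified Cloud KMS CryptoKey resource name, the format expected
    # by the kms_key_name fields on Pub/Sub topics and BigQuery tables.
    return (f"projects/{project}/locations/{location}"
            f"/keyRings/{key_ring}/cryptoKeys/{key}")

key = kms_key_name("sec-kms-prj", "us-central1", "analytics-ring", "event-data-key")
print(key)
```

Note the key's region (us-central1) matches the pipeline's region, which matters because CMEK keys must be co-located with the resources they protect.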
A logistics company (AeroFleet) ingests 120,000 events/sec (avg ~40 MB/s, peak 80 MB/s) from a 3-broker on-premises Apache Kafka cluster into Google Cloud over a 10 Gbps Dedicated Interconnect with 7–10 ms RTT; security policy allows only private IPs and TLS/SASL to Kafka, and the analytics team needs events queryable in BigQuery with p50 < 5 s and p99 < 20 s end-to-end latency while keeping architecture hops to a minimum and ensuring horizontal scalability; what should you do to meet throughput and latency goals with minimal added components?
You are building a regression model to estimate hourly fuel consumption for cargo drones from 70 telemetry features in historical flight logs stored in BigQuery. You have 120M labeled rows, you randomly shuffle the table and create an 85/15 train–test split, then train a 4-layer neural network with early stopping in TensorFlow. After evaluation, you observe that the RMSE on the training set is about 2x higher than on the test set (e.g., 3.0 L vs 1.5 L). To improve overall model performance without changing the dataset source, what should you do next?
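For reference, RMSE is simply the square root of the mean squared residual, and the anomaly in this scenario is that the training split scores worse than the test split, the reverse of ordinary overfitting (one possible cause, for example, is regularization such as dropout being active while the training-set metric is computed). A tiny sketch of the metric with made-up residual patterns mirroring the 2x gap:

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error over paired sequences."""
    assert len(y_true) == len(y_pred) and len(y_true) > 0
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Made-up values: train residuals of +/-3 L, test residuals of +/-1.5 L,
# reproducing the scenario's anomalous 3.0 L vs 1.5 L pattern.
train_rmse = rmse([10.0, 12.0, 14.0], [13.0, 9.0, 17.0])
test_rmse = rmse([10.0, 12.0, 14.0], [11.5, 10.5, 15.5])
assert train_rmse > test_rmse  # the unusual direction the question highlights
```

Seeing the direction of the gap spelled out numerically makes clear why "add more regularization" would be the wrong reflex here: the model already fits the held-out data better than the data it trained on.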
Study period: 1 month
I tend to get overwhelmed with large exams, but doing a few questions every day kept me on track. The explanations and domain coverage felt balanced and practical. Happy to say I passed on the first try.
Study period: 2 months
Thank you! These practice questions helped me pass the GCP PDE exam on the first try.
Study period: 1 month
The layout and pacing make it comfortable to study on the bus or during breaks. I solved around 20–30 questions a day, and after a few days I could feel my confidence improving.
Study period: 1 month
The explanations are in English, but they still helped quite a bit! The questions are similar to the real exam, too. Recommended!
Study period: 2 months
I combined this app with some hands-on practice in GCP, and the mix worked really well. The questions pointed out gaps I didn’t notice during practice labs. Good companion for PDE prep.
Get the Free App