
Simulate the real exam experience with 50 questions and a 120-minute time limit. Practice with AI-verified answers and detailed explanations.
Powered by AI
Every answer is verified by 3 leading AI models to ensure maximum accuracy. Get detailed per-option explanations and in-depth question analysis.
You are designing a platform to store 1-second interval temperature and humidity readings from 12 million cold-chain sensors across 40 warehouses. Analysts require real-time, ad hoc range queries over the most recent 7 days with sub-second latency. You must avoid per-query charges and ensure the schema can scale to 25 million sensors and accommodate new metrics without frequent schema changes. Which database and data model should you choose?
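Scenarios like this one hinge on time-series row-key design in a wide-column store, where keys sort lexicographically and recent data should cluster together. A minimal sketch of one common scheme (sensor ID plus reversed timestamp, so the newest readings for a sensor sort first); all names here are illustrative, not tied to any specific product API:

```python
# Sketch of a wide-column row-key scheme for time-series telemetry.
# Keys combine the sensor ID with a reversed timestamp so that the most
# recent readings for a sensor sort first in a lexicographic row scan.

MAX_TS = 10**13  # illustrative upper bound on epoch-milliseconds, used for reversal

def row_key(sensor_id: str, epoch_millis: int) -> str:
    """Build 'sensor#reversed-timestamp' so newer rows sort before older ones."""
    reversed_ts = MAX_TS - epoch_millis
    return f"{sensor_id}#{reversed_ts:013d}"

# A newer reading yields a lexicographically smaller key for the same sensor,
# so a prefix scan on the sensor ID returns the latest data first.
k_old = row_key("sensor-0042", 1_700_000_000_000)
k_new = row_key("sensor-0042", 1_700_000_600_000)
assert k_new < k_old
```

The prefix-scan property is what makes "last 7 days for sensor X" a narrow, cheap range read instead of a full scan.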
Want to practice anywhere?
Download Cloud Pass for free — includes practice tests, progress tracking, and more.
Preparation period: 1 month
I tend to get overwhelmed with large exams, but doing a few questions every day kept me on track. The explanations and domain coverage felt balanced and practical. Happy to say I passed on the first try.
Preparation period: 2 months
Thank you! These practice questions helped me pass the GCP PDE exam on the first try.
Preparation period: 1 month
The layout and pacing make it comfortable to study on the bus or during breaks. I solved around 20–30 questions a day, and after a few days I could feel my confidence improving.
Preparation period: 1 month
The explanations are in English, but they still helped! The questions are similar to the real exam too, which is great.
Preparation period: 2 months
I combined this app with some hands-on practice in GCP, and the mix worked really well. The questions pointed out gaps I didn’t notice during practice labs. Good companion for PDE prep.
Download Cloud Pass and get free access to all Google Professional Data Engineer practice questions.
Get the free app
Your micromobility platform migrated a 4.5 TB ride-events warehouse from an on-prem system to BigQuery; the core fact_rides table (≈2.2 billion rows, ~75 million new rows per day) is modeled in a star schema with small dimension tables and currently stored as one unpartitioned table. Analysts run dashboards that filter for the last 30 days using WHERE event_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY), yet queries still scan nearly the entire table and take 30–45 seconds, increasing query costs. Without increasing storage costs, what should you change to speed up these 30-day queries in line with Google-recommended practices?
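For date-filtered dashboards like this, Google's BigQuery guidance is to partition the fact table on the date column (optionally clustering on a frequent filter column) so the 30-day predicate prunes partitions instead of scanning the whole table. A sketch of the DDL shape, built as a plain string so it runs without a cloud client; the project, dataset, and `city_id` clustering column are hypothetical, while `fact_rides` and `event_date` come from the question:

```python
# Sketch: the kind of BigQuery DDL this scenario points toward -- a
# date-partitioned (and optionally clustered) copy of the fact table, so
# WHERE event_date >= ... prunes partitions rather than scanning all rows.
# Only a string is built here; executing it would need a BigQuery client.

def partitioned_table_ddl(project: str, dataset: str, table: str,
                          partition_col: str, cluster_cols: list[str]) -> str:
    cluster_clause = ", ".join(cluster_cols)
    return (
        f"CREATE TABLE `{project}.{dataset}.{table}_partitioned`\n"
        f"PARTITION BY {partition_col}\n"
        f"CLUSTER BY {cluster_clause}\n"
        f"AS SELECT * FROM `{project}.{dataset}.{table}`"
    )

ddl = partitioned_table_ddl("my-project", "rides", "fact_rides",
                            "event_date", ["city_id"])
print(ddl)
```

Because BigQuery bills on-demand queries by bytes scanned, partition pruning cuts both latency and cost without duplicating storage long-term (the unpartitioned original can be dropped after the swap).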
You are migrating a Scala Spark 3 nightly ETL pipeline that processes 2 TB of JSON logs from an Azure HDInsight cluster to Google Cloud. You need the job to read from a Cloud Storage bucket and append results to a BigQuery table with no application logic changes. The job is tuned for Spark with each executor using 8 vCPUs and 16 GB memory, and you want to retain similar executor sizing. You want to minimize installation and infrastructure management (no cluster lifecycle or connector setup) while running the job. What should you do?
You are the data platform lead at a global ride-sharing company where five regional operations teams share a single BigQuery project billed with on-demand pricing. The project is capped at 2,000 concurrent on-demand slots; during end-of-quarter surge analysis, some analysts cannot obtain slots and their queries are queued or canceled. You must avoid creating additional projects, enforce a priority scheme across teams (e.g., Finance > Operations > Marketing), and ensure predictable performance during spikes; what should you do?
A regional public transit agency runs a 160-node on-prem Hadoop environment (Spark and Hive on HDFS) to process ridership and farebox logs; workloads are sized for weekday peak demand, but over 70% of pipelines are nightly batch and midday utilization often drops below 20%. The lease on the municipal server room ends in 60 days, and an extension is expensive; the agency wants to reduce operational overhead, favor serverless where practical, and lower storage and compute costs without jeopardizing its SLA of completing nightly batch by 5:00 a.m. They have approximately 900 TB of Parquet and ORC data and 250 scheduled Spark/Hive jobs; the immediate goal is to move within the deadline, minimize risk, and realize near-term cost savings. Which migration strategy should they choose to maximize cost savings in the cloud while still meeting the 60-day timeline?
You are building a global restaurant reservation microservice on Google Cloud that must handle sudden growth from 50,000 to 20,000,000 daily active users and peak write traffic of 6,000 requests per second while you avoid provisioning or managing database servers; you need a fully managed, automatically scaling operational database with low-latency reads/writes and simple transactional updates on small entity groups. Which Google Cloud database service should you choose?
You are the data platform lead at a nationwide healthcare network rolling out a virtual assistant for the patient portal using Dialogflow CX. You analyzed 180,000 historical chat transcripts and labeled intents: about 70% of patient requests are routine tasks (e.g., check lab results, reschedule appointment, password reset) that resolve within 10 intents and under 4 turns; the remaining 30% are complex, multi-turn workflows (e.g., prior-authorization appeals, insurance coordination) that average 20–30 turns and frequently need live-agent handoff. Your goal is to reduce live-agent volume by 40% in the first quarter without degrading patient experience. Which intents should you automate first?
At a logistics company, you created a Dataprep recipe on a 5% sample of a BigQuery table that stores daily truck telemetry, and each day a batch load with variable completion time (between 02:10 and 03:50 UTC) appends the new day's data with the same schema; you want the same transformations to run automatically on each daily upload after the load completes—what should you do?
A logistics company streams shipment scan events in a compact JSON schema from 1,200 handheld devices (about 50,000 events per minute) into a Pub/Sub topic; a Dataflow streaming pipeline reads from a subscription, applies fixed 1-minute windows and aggregations, and feeds an operations dashboard that should reflect every scan in real time; during a 2-hour pilot, the dashboard intermittently shows 3–5% fewer scans than expected, while producer logs show all HTTP publish calls succeeding and Cloud Monitoring for the topic reports 0% publish errors with median publish latency under 100 ms. What should you do next to isolate the issue?
You are building a healthcare analytics warehouse in BigQuery that stores 80 million lab-result rows and PII for 600,000 patients across 12 tables. Compliance requires per-patient cryptographic deletion so that, upon an erasure request, only that patient’s sensitive columns become permanently undecipherable by removing their key material—without exporting data, rewriting other rows, or changing the storage location. You must rely on native Google Cloud capabilities (no custom cryptographic libraries or client-side encryption) and allow authorized analysts to decrypt data at query time using SQL; what should you implement?
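Per-patient crypto-deletion in BigQuery is typically done with its native AEAD SQL functions: each patient gets a keyset row, sensitive columns are stored as ciphertext, and deleting a patient's keyset row makes their data permanently undecipherable. A sketch of the query shape, assembled as strings only (table and column names such as `lab_result` and `patient_keys` are illustrative):

```python
# Sketch of the SQL shape for per-patient crypto-deletion in BigQuery
# using its AEAD functions: one keyset per patient, ciphertext columns,
# and the patient_id bound in as additional authenticated data.
# Query strings only; running them would require a BigQuery client.

def encrypt_select(data_table: str, keys_table: str) -> str:
    return (
        "SELECT d.patient_id,\n"
        "       AEAD.ENCRYPT(k.keyset, d.lab_result, d.patient_id) AS lab_result_enc\n"
        f"FROM `{data_table}` AS d\n"
        f"JOIN `{keys_table}` AS k USING (patient_id)"
    )

def decrypt_select(enc_table: str, keys_table: str) -> str:
    return (
        "SELECT e.patient_id,\n"
        "       AEAD.DECRYPT_STRING(k.keyset, e.lab_result_enc, e.patient_id) AS lab_result\n"
        f"FROM `{enc_table}` AS e\n"
        f"JOIN `{keys_table}` AS k USING (patient_id)"
    )

print(decrypt_select("proj.hc.labs_enc", "proj.hc.patient_keys"))
```

An erasure request then reduces to `DELETE FROM patient_keys WHERE patient_id = ...`: no data export, no rewriting of other rows, and authorized analysts keep decrypting remaining patients at query time via the join.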