
Simulate the real exam with 50 questions and a 120-minute time limit. Study with AI-verified answers and detailed explanations.
AI-powered
Every answer is cross-checked by three leading AI models to ensure top accuracy, with detailed per-option explanations and in-depth question analysis.
Your marketing analytics team needs to run a weekly PySpark batch job on Google Cloud Dataproc to score customer churn propensity, reading input data from Cloud Storage and writing results to BigQuery. Testing shows the workload completes in about 35 minutes on a 16-worker n1-standard-4 cluster when triggered every Friday at 02:00 UTC. You are asked to cut infrastructure costs without rewriting the job or changing the schedule. How should you configure the cluster for cost optimization?
Migrating to Dataflow could be cost-effective for some pipelines, but it generally requires rewriting the job (PySpark on Dataproc is not directly portable to Dataflow without changes). The question explicitly forbids rewriting or changing the schedule. Also, Dataflow is a different execution model (Beam) and operational approach, so it’s not the best answer for “configure the cluster” cost optimization.
Preemptible (Spot) VMs on Dataproc worker nodes reduce compute cost significantly and are designed for fault-tolerant batch processing. Keep the master as a regular VM and make most workers preemptible to maximize savings. Spark can reschedule tasks if a worker is reclaimed, and a weekly 35-minute batch job is a strong fit for discounted, interruptible capacity without changing code or timing.
Higher-memory machine types may reduce runtime, but they increase per-hour VM cost and don’t guarantee lower total cost for a job that already completes in 35 minutes. This option optimizes performance rather than cost. Without evidence that the job is memory-bound and that fewer nodes could be used, switching to larger machines is a risky and often more expensive change.
Local SSDs can improve I/O performance for shuffle-heavy Spark workloads, but they add cost and are not necessary when reading from Cloud Storage and writing to BigQuery for a short weekly batch. Dataproc jobs often benefit more from compute pricing optimizations than from adding premium storage. This is a performance tuning option, not the most direct cost reduction lever.
Core Concept: The question tests cost optimization for a scheduled, non-interactive Dataproc batch workload. The key levers are Dataproc cluster lifecycle (ephemeral vs long-running), VM pricing models (standard vs Spot/Preemptible), and maintaining the same job code and schedule while reducing compute spend.
Why the Answer is Correct: Using preemptible (Spot) VMs for Dataproc worker nodes is a classic way to reduce compute cost for fault-tolerant batch processing. A weekly job that runs ~35 minutes is well-suited because the cluster exists only for the job window and can tolerate retries. Dataproc/Spark can handle executor loss; if a preemptible worker is reclaimed, Spark can reschedule tasks on remaining executors. The cost reduction can be substantial versus on-demand VMs, and it does not require rewriting the PySpark job or changing the Friday 02:00 UTC schedule.
Key Features / Best Practices:
- Configure a Dataproc cluster with a standard (non-preemptible) master node and most or all worker nodes as preemptible. Optionally keep a small number of non-preemptible workers to reduce risk of excessive churn, and enable autoscaling policies if allowed (though not required here).
- Use ephemeral clusters (create cluster, submit job, delete cluster) to avoid paying for idle time; this is often paired with preemptible workers for maximum savings.
- From an Architecture Framework perspective, this aligns with Cost Optimization (use discounted resources) while maintaining Reliability through Spark’s distributed retry behavior.
Common Misconceptions: Migrating to Dataflow may reduce ops overhead, but it violates the “no rewriting” constraint because PySpark on Dataproc is not a lift-and-shift to Dataflow without reimplementation. Choosing higher-memory machine types or adding local SSDs can improve performance, but they typically increase hourly cost and are not guaranteed to reduce total cost for a 35-minute job; they optimize speed, not necessarily spend.
Exam Tips: For Dataproc batch jobs, look first for: (1) ephemeral clusters, (2) preemptible/Spot workers, (3) right-sizing. Preemptibles are best when workloads are restartable and time-bounded. Always keep the master on standard VMs. Consider that preemptibles can be reclaimed at any time, so the workload must tolerate interruptions; Spark generally can, but extremely tight SLAs or non-idempotent side effects may require caution.
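As a rough illustration of why Spot/preemptible workers cut spend for a short weekly job, the arithmetic can be sketched in a few lines. The hourly prices below are hypothetical placeholders, not current Google Cloud list prices; check the pricing pages before relying on any numbers.

```python
# Hypothetical hourly prices for illustration only; real Spot discounts
# vary by machine type and region (often quoted as 60-91% off on-demand).
ON_DEMAND_HOURLY = 0.19   # assumed n1-standard-4 on-demand $/hr
SPOT_HOURLY = 0.04        # assumed Spot/preemptible $/hr

def weekly_worker_cost(workers: int, spot_workers: int, job_hours: float) -> float:
    """Estimate compute cost of one weekly job run for the worker pool."""
    on_demand = workers - spot_workers
    return job_hours * (on_demand * ON_DEMAND_HOURLY + spot_workers * SPOT_HOURLY)

job_hours = 35 / 60  # the job runs ~35 minutes
all_on_demand = weekly_worker_cost(16, 0, job_hours)
mostly_spot = weekly_worker_cost(16, 14, job_hours)  # keep 2 standard workers
savings = 1 - mostly_spot / all_on_demand
```

Under these assumed prices, keeping two standard workers and making the rest preemptible still cuts worker compute cost by well over half, with no change to the job or its schedule.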
Your company runs a private Google Kubernetes Engine (GKE) cluster in a custom VPC in us-central1 using a subnetwork named analytics-subnet; due to the organization policy constraints/compute.vmExternalIpAccess, all nodes have only internal IPs with no external IPs. A nightly Kubernetes Job must download 500 MB CSV files from Cloud Storage and load transformed results into BigQuery using the BigQuery Storage Write API, but pods fail with DNS resolution/connection errors when contacting storage.googleapis.com and bigquery.googleapis.com. What should you do to allow access to Google APIs while keeping the nodes on internal IPs only?
Network tags and firewall rules can allow or deny traffic, but they do not provide a path to reach Google APIs when nodes have no external IP and no NAT/PGA. Tags are also not a mechanism to selectively enable Google API access. This option confuses authorization (firewall) with connectivity (routing/egress).
Creating egress firewall rules to “Cloud Storage and BigQuery IP ranges” is not the right solution. Google APIs commonly use anycast VIPs and IPs can change; maintaining IP allowlists is brittle. More importantly, even with permissive egress rules, internal-only nodes still need a valid egress mechanism (Private Google Access or Cloud NAT) to reach those endpoints.
VPC Service Controls perimeters help reduce data exfiltration risk by restricting access to supported Google services from outside a perimeter. They do not solve basic network reachability from private nodes to Google APIs. You could still have connection failures without Private Google Access (or NAT/PSC). This is a security boundary feature, not an egress connectivity feature.
Enabling Private Google Access on analytics-subnet is the correct way to let internal-only GKE nodes (and pods) access Google APIs like Cloud Storage and BigQuery without external IPs. It provides a Google-internal route to Google API front ends while keeping the cluster private. This directly addresses the connectivity errors while meeting the org policy constraint.
Core Concept: This question tests private GKE networking and how workloads on VMs/pods without external IPs reach Google APIs (Cloud Storage and BigQuery). The key feature is Private Google Access (PGA) on a subnet, which allows resources that have only internal IP addresses to access Google APIs and services over Google’s network.
Why the Answer is Correct: In a private GKE cluster with nodes that have only internal IPs (and with org policy blocking external IPs), pods typically egress through the node’s network. Without Cloud NAT or Private Google Access, calls to public Google API endpoints (e.g., storage.googleapis.com, bigquery.googleapis.com) can fail due to lack of a valid egress path to the public internet. Enabling Private Google Access on the specific subnet used by the nodes (analytics-subnet) allows those internal-only nodes (and therefore pods) to reach Google APIs using internal routing to Google’s front ends, without assigning external IPs.
Key Features / Configurations:
- Enable Private Google Access on analytics-subnet (subnet-level setting).
- Ensure DNS resolution works (Cloud DNS default is fine); the key is routing/egress, not DNS itself.
- Use the standard Google API hostnames; PGA handles access without changing application code.
- This aligns with the Google Cloud Architecture Framework security principle of minimizing public exposure while maintaining required connectivity.
Common Misconceptions:
- Firewall rules (including tags) do not create internet or Google API reachability; they only permit/deny traffic that already has a route.
- “Allowing IP ranges” for Google APIs is not practical because many Google APIs are served via anycast front ends and IPs can change; also, without a route (NAT/PGA), allowing egress doesn’t help.
- VPC Service Controls is for data exfiltration controls and service perimeters, not for providing network egress from private nodes.
Exam Tips: For private GKE/VMs with no external IPs: - To reach Google APIs: enable Private Google Access (or use Private Service Connect for Google APIs in more advanced designs). - To reach the public internet/non-Google endpoints: use Cloud NAT. When the question explicitly says “keep nodes on internal IPs only” and the destination is Google APIs, Private Google Access is the canonical answer.
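Enabling PGA is a single subnet-level change. A minimal sketch, wrapping the `gcloud` CLI from Python (this assumes `gcloud` is installed and authenticated; by default the function only builds the command without running it):

```python
import subprocess

def enable_private_google_access(subnet: str, region: str, dry_run: bool = True):
    """Build (and optionally run) the gcloud command that turns on
    Private Google Access for a subnet; nodes keep internal IPs only."""
    cmd = [
        "gcloud", "compute", "networks", "subnets", "update", subnet,
        "--region", region,
        "--enable-private-ip-google-access",
    ]
    if not dry_run:
        # Requires gcloud auth and compute.subnetworks.update permission.
        subprocess.run(cmd, check=True)
    return cmd

cmd = enable_private_google_access("analytics-subnet", "us-central1")
```

The same setting can be toggled in the console on the subnet's details page; nothing in the application or cluster configuration needs to change.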
Your mobility startup needs to build a predictive maintenance model with BigQuery ML and deploy a near–real-time prediction endpoint on Vertex AI. You will ingest continuous telemetry from 12 scooter OEMs averaging 80,000 messages per minute with an end-to-end latency target under 3 seconds, and incoming payloads may include malformed JSON, missing fields, and outliers (for example, speed > 120 km/h). What should you do to reliably ingest, validate, and deliver this data for training and inference?
Streaming raw OEM data directly into BigQuery and training BigQuery ML on the ingestion table ignores the need for robust validation and cleansing. Malformed JSON and missing fields can cause ingestion failures or inconsistent schemas, and outliers can poison training. BigQuery is excellent for analytics, but it is not the right place to implement streaming data quality controls and dead-letter handling at ingestion time.
Writing all streaming data into the same dataset as the model and querying it for near–real-time use conflates ingestion, training, and serving concerns. It still lacks a scalable validation/sanitization layer and does not address malformed records or outlier routing. Also, using BigQuery queries as a near-real-time serving mechanism is not a substitute for a Vertex AI online prediction endpoint and can struggle with strict sub-3-second end-to-end SLAs.
Pub/Sub plus Cloud Functions can work for lightweight transformations, but sustained throughput (~1,333 messages/sec) with strict latency and complex validation/cleansing is a poor fit operationally. Functions introduce concurrency tuning, cold starts, and retry/duplication complexities, and building robust dead-letter routing and stateful processing is harder. Dataflow is designed for exactly this kind of continuous, high-volume stream processing.
Pub/Sub ingestion plus a Dataflow streaming pipeline is the recommended architecture for reliable, low-latency telemetry processing at scale. Dataflow can parse JSON, enforce schemas, handle missing fields, filter/flag outliers, and route bad records to a dead-letter topic/table while continuing to process good events. Clean data can be streamed into BigQuery for BigQuery ML training and simultaneously delivered to a serving path that supports Vertex AI near–real-time predictions.
Core concept: This question tests designing a robust streaming ingestion and processing architecture on Google Cloud: Pub/Sub for durable event ingestion, Dataflow (Apache Beam) for scalable stream processing with validation/cleansing, and BigQuery as the analytical store feeding BigQuery ML and downstream Vertex AI online prediction.
Why the answer is correct: With 80,000 messages/min (~1,333/sec) from 12 OEMs and an end-to-end latency target under 3 seconds, you need a horizontally scalable, low-latency streaming pipeline that can handle malformed JSON, missing fields, and outliers while preserving reliability. Pub/Sub provides backpressure handling, at-least-once delivery, and buffering during downstream slowdowns. Dataflow streaming is purpose-built for continuous processing at this scale, enabling parsing, schema enforcement, enrichment, windowing, and routing of invalid records to a dead-letter path without blocking good data. Clean, validated events can then be streamed into BigQuery for training datasets and feature generation, while the same pipeline can publish sanitized features to a serving path (e.g., another Pub/Sub topic or online store) used by Vertex AI endpoints.
Key features / best practices:
- Pub/Sub topics (one per OEM or shared with attributes) for isolation, quota management, and easier troubleshooting.
- Dataflow streaming with schema validation, side outputs for bad records, and dead-letter topics/tables.
- Outlier handling (filtering, capping, or flagging) and missing-field defaults to stabilize model training.
- Exactly-once semantics are not guaranteed end-to-end; design idempotent writes (e.g., BigQuery insertId/dedup keys) and use replayable sources.
- Aligns with Google Cloud Architecture Framework: reliability (DLQ, retries), operational excellence (monitoring/alerts), performance efficiency (autoscaling), and security (IAM, CMEK where needed).
Common misconceptions: BigQuery streaming inserts alone do not provide robust validation, dead-letter routing, or complex transformations; pushing raw, malformed payloads directly into BigQuery complicates downstream training and can break pipelines. Cloud Functions can parse messages, but at this sustained throughput and strict latency, it’s harder to manage concurrency, retries, ordering, and operational stability compared to Dataflow.
Exam tips: For high-throughput, low-latency streaming with data quality requirements, the canonical pattern is Pub/Sub -> Dataflow (validate/transform + DLQ) -> BigQuery (analytics/training) and a parallel serving path. Prefer managed streaming engines (Dataflow) over ad hoc serverless functions when you need sustained scale, complex processing, and strong operational controls.
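The validate-and-route step that would live inside a Dataflow DoFn (good records to the main output, bad records to a dead-letter side output) reduces to per-record logic like this pure-Python sketch. The field names (`vehicle_id`, `event_ts`, `speed_kmh`) are assumptions for illustration, not a schema from the scenario:

```python
import json

REQUIRED_FIELDS = {"vehicle_id", "event_ts", "speed_kmh"}  # assumed schema
MAX_SPEED_KMH = 120  # outlier threshold from the scenario

def route_event(raw: str):
    """Return ("good", record) or ("dead_letter", reason) for one message,
    mirroring a Beam DoFn with a side output for bad records."""
    try:
        record = json.loads(raw)
    except json.JSONDecodeError:
        return ("dead_letter", "malformed JSON")
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        return ("dead_letter", f"missing fields: {sorted(missing)}")
    if record["speed_kmh"] > MAX_SPEED_KMH:
        record["outlier"] = True  # flag rather than drop, for later analysis
    return ("good", record)
```

Flagging outliers instead of dropping them keeps the events available for analysis while letting training queries filter on the flag.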
A media intelligence firm receives irregularly timed 2–5 GB CSV files from 50 partners into a dedicated Cloud Storage bucket via Storage Transfer Service. A Dataproc PySpark job must then standardize the files and write them to BigQuery, followed by table-specific BigQuery SQL transformations that vary by table and can run for up to 3 hours across roughly 600 destination tables. You must design the most efficient and maintainable workflow to process all tables promptly and deliver the freshest results to analysts. What should you do?
Hourly scheduling with a single shared DAG is more maintainable than per-table DAGs, but it fails the freshness requirement because files arrive irregularly and could wait up to an hour. With SQL steps that can run 3 hours, hourly triggers can stack up and increase end-to-end latency. It also doesn’t explicitly address event-driven triggering, which is typically preferred for prompt processing on arrival.
Creating a separate DAG for each of ~600 tables is a classic anti-pattern in Composer: it increases operational overhead (code duplication, deployments, monitoring noise) and can stress the Airflow scheduler. Hourly scheduling also delays processing and can cause backlog when transformations run for hours. While isolation per table sounds clean, it is not efficient or maintainable at this scale.
This is the best design: a single parameterized DAG keeps the workflow maintainable while still supporting table-specific SQL logic via parameters/config. Triggering the DAG from Cloud Storage object notifications through a Cloud Function enables near-real-time processing after each file lands, maximizing freshness. Airflow can then manage dependencies and controlled parallelism across Dataproc and BigQuery tasks.
Event-driven triggering is good for freshness, but creating a separate DAG per table is not maintainable for ~600 tables and can overwhelm Composer’s scheduler and operational processes. It also complicates consistent changes (e.g., updating Dataproc job args or retry policies) across hundreds of DAGs. A parameterized single DAG achieves the same behavior with far less overhead.
Core Concept: This question tests event-driven orchestration and maintainable workflow design using Cloud Composer (Airflow) to coordinate Dataproc and BigQuery, triggered by Cloud Storage arrivals. It emphasizes freshness (process promptly after file arrival), scalability (600 tables, long-running SQL up to 3 hours), and maintainability (avoid DAG sprawl).
Why the Answer is Correct: Option C provides an event-driven trigger (Cloud Storage object notification -> Cloud Function -> trigger DAG) so processing starts as soon as a partner file lands, rather than waiting for an hourly schedule. This best meets the requirement to deliver the freshest results. It also uses a single shared, parameterized DAG, which is far more maintainable than creating 600 separate DAGs. Parameterization (e.g., table name, SQL path, destination dataset, partition date) allows the same DAG to run per table or per file/table mapping, while still enabling parallelism via Airflow task mapping/dynamic task generation and appropriate concurrency settings.
Key Features / Best Practices:
- Cloud Storage notifications via Pub/Sub (commonly used under the hood) enable near-real-time triggering.
- Cloud Function acts as lightweight glue to call the Composer/Airflow REST API (or trigger a Pub/Sub-based DAG) with run-time parameters.
- Airflow operators: DataprocSubmitJobOperator for PySpark standardization; BigQueryInsertJobOperator for per-table SQL transforms.
- Use Airflow pools/queues and max_active_runs/concurrency to prevent overwhelming BigQuery slots/quotas and Dataproc cluster capacity; consider per-table parallelism with limits.
- Keep table-specific SQL in version-controlled files (e.g., in Cloud Source Repositories/GitHub) and reference them by parameter to improve maintainability.
Common Misconceptions: Hourly scheduling (A/B) seems simpler, but it increases data latency and can create backlog when transformations run up to 3 hours.
Creating a DAG per table (B/D) appears to isolate logic, but it becomes operationally unmanageable (deployment, monitoring, code duplication) and can overload the scheduler.
Exam Tips: Prefer event-driven orchestration for irregular arrivals and freshness requirements. For many similar pipelines, choose a single parameterized DAG over hundreds of DAGs. Also remember to design for quotas and concurrency: BigQuery job limits, slot availability, and Composer scheduler limits often drive the “most efficient and maintainable” answer.
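The glue between a storage notification and the shared parameterized DAG is mostly a mapping from the landed object to run-time parameters. A hedged sketch of that mapping (the `<partner>/<table>/<file>` naming convention and the config table are assumptions, not something from the scenario):

```python
# Assumed per-table config; in practice this would live in version control
# and cover all ~600 destination tables.
TABLE_CONFIG = {
    "orders": {"sql_path": "sql/orders_transform.sql", "dataset": "curated"},
    "clicks": {"sql_path": "sql/clicks_transform.sql", "dataset": "curated"},
}

def build_dag_run_conf(object_name: str) -> dict:
    """Map a landed object like 'partner7/orders/2024-05-01.csv' to the
    conf payload a Cloud Function would pass when triggering the DAG."""
    parts = object_name.split("/")
    table = parts[1]  # assumes <partner>/<table>/<file> layout
    cfg = TABLE_CONFIG[table]
    return {
        "table": table,
        "source_object": object_name,
        "sql_path": cfg["sql_path"],
        "destination_dataset": cfg["dataset"],
    }

conf = build_dag_run_conf("partner7/orders/2024-05-01.csv")
```

The Cloud Function would post this `conf` to the Airflow REST API's dag-run endpoint, and the single DAG reads it to pick the Dataproc job arguments and the per-table SQL file.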
In a fintech company, Business Intelligence developers hold the Project Owner role in their respective Google Cloud projects to work across multiple services. Your compliance policy requires that all Cloud Storage Data Access audit logs be retained for 180 days, and only the internal audit team may read these logs across all current and future projects. What should you do?
Enabling Data Access logs per project and restricting Cloud Logging access does not reliably meet the requirement that only the audit team can read logs across all projects, because BI developers are Project Owners and can grant themselves access or change settings. It also doesn’t address a strict 180-day retention requirement via an immutable retention policy. Additionally, it is operationally burdensome to manage per-project for current and future projects.
A project-level sink to a bucket in BI teams’ projects centralizes logs only within each project and keeps control in the hands of Project Owners, which violates separation of duties. Project owners could change the sink, bucket IAM, or delete/alter data unless protected elsewhere. It also doesn’t scale to “all current and future projects” without repeated configuration and ongoing governance.
Exporting via project-level sinks to a dedicated audit project improves separation of duties, but it still requires creating and maintaining a sink in every project, and it will not automatically include future projects. Because BI developers are Project Owners, they could disable or alter their project sink, creating compliance gaps. This option is closer, but fails the “current and future projects” automation requirement.
An aggregated sink at the organization or folder level automatically captures matching Data Access logs from all descendant projects, including newly created ones, meeting the “current and future projects” requirement. Exporting to a Cloud Storage bucket in a dedicated audit-logs project enables strong separation of duties. A 180-day bucket retention policy enforces required retention, and IAM can restrict read access exclusively to the audit team.
Core concept: This question tests centralized audit logging governance in Google Cloud: enabling and exporting Cloud Storage Data Access audit logs, enforcing retention, and restricting read access across all current and future projects. Key services/features are Cloud Audit Logs (Data Access logs), Cloud Logging sinks (aggregated sinks at folder/org), Cloud Storage retention policies, and IAM separation of duties.
Why the answer is correct: You need a solution that (1) covers all current and future projects, (2) retains logs for 180 days, and (3) ensures only the internal audit team can read them. An aggregated sink at the organization or folder level exports matching logs from all descendant projects automatically, including newly created projects, which satisfies the “current and future projects” requirement. Exporting to a Cloud Storage bucket in a dedicated audit-logs project centralizes control and reduces the risk that project owners can tamper with or access the logs. A bucket retention policy (180 days) enforces immutability of the objects for the retention period, meeting compliance retention requirements.
Key features/configurations:
- Create an aggregated sink (org/folder) with an inclusion filter for Cloud Storage Data Access audit logs (e.g., resource.type="gcs_bucket" and logName matching data_access).
- Choose a destination Cloud Storage bucket in a dedicated project controlled by the audit team.
- Apply a 180-day Cloud Storage retention policy (and optionally enable Bucket Lock for stronger compliance guarantees).
- Restrict IAM: grant the sink’s writer identity permission to write objects to the bucket; grant read access (e.g., Storage Object Viewer) only to the audit team; avoid granting broad Logging Viewer roles to BI project owners for the exported dataset.
Common misconceptions: Many assume enabling logs and restricting Cloud Logging access is enough, but project owners can often still access logs within their projects and retention in Cloud Logging is not the same as an explicit 180-day compliance retention requirement. Project-level sinks also fail the “future projects” requirement and can be modified by project owners.
Exam tips: When requirements mention “across all current and future projects,” think organization/folder-level policies and aggregated sinks. When compliance mentions a fixed retention period, think Cloud Storage retention policies (and Bucket Lock) rather than relying on default log retention. For separation of duties, centralize audit logs in a dedicated project with tightly scoped IAM.
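The aggregated sink's inclusion filter can be kept small. A minimal sketch of building it in Python follows; the exact filter text is an assumption based on the common Cloud Audit Logs naming (`cloudaudit.googleapis.com%2Fdata_access`), so verify the field values against the Cloud Logging documentation for your organization:

```python
def gcs_data_access_filter() -> str:
    """Inclusion filter for an org/folder aggregated sink that matches
    Cloud Storage Data Access audit log entries (assumed log name)."""
    return (
        'resource.type="gcs_bucket" AND '
        'logName:"logs/cloudaudit.googleapis.com%2Fdata_access"'
    )

f = gcs_data_access_filter()
```

The sink's destination would then be the Cloud Storage bucket in the dedicated audit project, with the sink's writer identity granted object-create permission on that bucket.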
A global ride-hailing platform is migrating driver and trip ledgers from multiple transactional sources (Cloud SQL for MySQL and an on-prem PostgreSQL cluster) into BigQuery. These systems emit log-based CDC events (operation type INSERT/UPDATE/DELETE, commit_ts, and primary key) at a steady 7,500 rows/sec with spikes to 18,000 rows/sec. Product managers require that changes become queryable in a BigQuery reporting table within 60 seconds, and the data team must reduce slot consumption for applying changes by at least 40% compared to per-row DML. You will stream the CDC events continuously into BigQuery. Which two steps should you take so that changes reach the reporting table with minimal latency while keeping compute overhead low? (Choose two.)
Incorrect. Applying each CDC event as an immediate per-row INSERT/UPDATE/DELETE on the reporting table maximizes DML overhead and slot consumption, especially at 7,500–18,000 rows/sec. It can also increase contention and reduce throughput. While latency may be low, it conflicts with the requirement to reduce compute/slot usage by at least 40% compared to per-row DML.
Correct. Streaming CDC events into an append-only staging table is the recommended ingestion pattern for log-based CDC. It keeps writes simple and scalable (streaming inserts are optimized for append workloads) and preserves the full change history (op type, commit_ts, PK). This staging layer enables efficient downstream batching/deduplication before updating the reporting table.
Incorrect. Periodically deleting outdated records from the reporting table does not correctly implement CDC semantics (especially updates and deletes by primary key) and does not address the need to apply inserts/updates/deletes within 60 seconds. It also risks expensive table scans unless carefully partitioned, and it is not a substitute for proper upsert/delete logic driven by CDC events.
Correct. A periodic DML MERGE batches many CDC changes into a single set-based operation, typically reducing slot consumption substantially versus per-row DML. MERGE can apply INSERT/UPDATE/DELETE in one statement and can be scheduled every 30–60 seconds to meet freshness requirements. Combined with deduplication of the latest event per key, it minimizes both latency and compute overhead.
Incorrect. Writing CDC events directly into the reporting table and relying on a materialized view to expose only the newest version is generally not suitable for full CDC with deletes and “latest per key” logic. BigQuery materialized views have limitations on supported SQL patterns and incremental maintenance; they are not a replacement for applying upserts/deletes via MERGE into a curated table.
Core concept: This question tests CDC ingestion into BigQuery with low-latency availability and cost-efficient change application. The key BigQuery pattern is: stream immutable CDC events into a staging (append-only) table, then periodically apply them to a curated reporting table using set-based operations (MERGE) rather than per-row DML.
Why the answer is correct: Streaming each CDC event directly into the reporting table with per-row INSERT/UPDATE/DELETE (option A) is expensive in slot consumption and can cause contention and inefficiency at 7,500 rows/sec with spikes to 18,000 rows/sec. Instead, you should (B) stream all CDC events (including op type, commit_ts, PK, and payload) into a staging table. This keeps ingestion simple, scalable, and low-latency because streaming inserts are optimized for append workloads. Then (D) run a frequent, scheduled DML MERGE (e.g., every 30–60 seconds) that deduplicates/chooses the latest event per primary key (often using commit_ts and a tie-breaker) and applies INSERT/UPDATE/DELETE in one set-based statement. MERGE reduces overhead by batching many row changes into a single query execution, typically cutting slot usage significantly versus executing one DML per event, meeting the requirement to reduce compute overhead by at least 40% while still achieving <60s freshness.
Key features / best practices:
- Use an append-only staging table for streaming, partitioned by ingestion time or commit_ts and clustered by primary key to speed MERGE scans.
- In the MERGE source, select only the newest event per primary key within the batch window (QUALIFY ROW_NUMBER() OVER (PARTITION BY pk ORDER BY commit_ts DESC) = 1).
- Schedule MERGE with Cloud Scheduler + BigQuery scheduled queries or orchestrate with Cloud Composer/Workflows.
- Keep the MERGE window bounded (e.g., last N minutes) to limit scanned bytes and latency.
Common misconceptions:
- “Real-time per-row DML is lowest latency”: it is, but it is the highest overhead and often fails cost/throughput goals.
- “Materialized views can resolve CDC latest-state cheaply”: BigQuery materialized views have constraints and do not support arbitrary “latest per key” logic with deletes in a way that replaces proper upsert/delete application.
Exam tips: For CDC into BigQuery, default to: stream to raw/staging, then batch-apply with MERGE into curated tables. Mention partitioning/clustering and dedup-by-key-by-timestamp to meet both latency and cost goals, aligning with the Google Cloud Architecture Framework’s cost optimization and operational excellence pillars.
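The "newest event per primary key" dedup that feeds the MERGE can be modeled in plain Python; in BigQuery it would be the QUALIFY ROW_NUMBER() pattern described above. The `seq` tie-breaker field is an assumption for events sharing a commit_ts:

```python
def latest_per_key(events):
    """Keep only the newest CDC event per primary key, ordering by
    (commit_ts, seq), like ROW_NUMBER() OVER (PARTITION BY pk
    ORDER BY commit_ts DESC, seq DESC) = 1 in SQL."""
    latest = {}
    for e in events:
        key = e["pk"]
        current = latest.get(key)
        if current is None or (e["commit_ts"], e["seq"]) > (current["commit_ts"], current["seq"]):
            latest[key] = e
    return list(latest.values())

batch = [
    {"pk": 1, "commit_ts": 100, "seq": 1, "op": "INSERT"},
    {"pk": 1, "commit_ts": 105, "seq": 2, "op": "UPDATE"},
    {"pk": 2, "commit_ts": 101, "seq": 1, "op": "DELETE"},
]
deduped = latest_per_key(batch)
```

Applied every 30 to 60 seconds over a bounded staging window, one MERGE then carries a whole batch of such deduped changes in a single set-based statement.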
Your retail analytics team receives mixed-format files (Avro and JSON) from branch exports and a partner SFTP feed, totaling about 300 GB per day and up to 2 million objects per month. You must land all files in a Cloud Storage bucket encrypted with your own customer-managed encryption key (CMEK), and you want to build the ingestion with a GUI-driven pipeline where you can explicitly configure an object sink that uses your KMS key. What should you do?
Storage Transfer Service is the managed Google Cloud service designed specifically for transferring data from external storage systems, including SFTP, into Cloud Storage. It is highly suitable for the stated scale of 300 GB per day and millions of objects per month because it handles transfer orchestration, scheduling, and operational reliability without requiring custom code. The CMEK requirement is satisfied by configuring the destination Cloud Storage bucket with a default Cloud KMS key, so transferred objects are encrypted with the customer-managed key. Although the prompt mentions GUI preferences, the primary technical requirement is large-scale file landing from SFTP and file exports into Cloud Storage, which aligns most directly with Storage Transfer Service.
Cloud Data Fusion is a visual data integration and ETL service, but it is not the best fit for straightforward bulk file transfer from SFTP and branch exports into Cloud Storage. Using Data Fusion would introduce unnecessary pipeline complexity for a use case that mainly requires managed file movement rather than transformation or orchestration across multiple processing stages. While it can interact with Cloud Storage and is GUI-driven, the exam-preferred service for moving files from SFTP into Cloud Storage is Storage Transfer Service. CMEK is also primarily a property of the destination bucket configuration, not a reason to choose Data Fusion over the dedicated transfer service.
Dataflow can certainly be used to build custom ingestion pipelines and can write to Cloud Storage, but it is a code-first processing framework based on Apache Beam. The question does not require custom transformation logic, streaming computation, or advanced processing semantics; it only requires landing files from file-based sources into Cloud Storage. For this kind of managed transfer workload, Dataflow is overengineered and adds development and maintenance overhead. Storage Transfer Service is the simpler and more appropriate managed option for scheduled file movement from SFTP and similar sources.
BigQuery Data Transfer Service is intended for loading data into BigQuery from supported SaaS applications and certain Google data sources on a recurring basis. It does not target Cloud Storage as the destination for raw file landing, and it is not designed to ingest arbitrary Avro and JSON files from branch exports and SFTP feeds into a bucket. The question explicitly requires storing files in Cloud Storage with CMEK, which is outside the primary scope of BigQuery Data Transfer Service. Therefore, it is the wrong service both in destination and in ingestion pattern.
Core concept: This question is about choosing the most appropriate managed ingestion service to land files from external file-based sources, including SFTP, into Cloud Storage at scale while using a bucket protected by a customer-managed encryption key (CMEK). The deciding factors are the source type, the destination being Cloud Storage rather than an analytics system, the operational scale, and the need for a managed transfer service rather than a custom processing pipeline.
Why correct: Storage Transfer Service is purpose-built for moving data from external storage systems and SFTP sources into Cloud Storage. It supports scheduled and managed transfers, scales well for large numbers of objects, and works with destination buckets configured to use a default Cloud KMS key for CMEK encryption. This makes it the most direct and operationally appropriate choice for landing raw Avro and JSON files into Cloud Storage.
Key features:
1) Native support for transferring from SFTP and other storage-based sources into Cloud Storage.
2) Managed, scalable transfer jobs suitable for hundreds of GB per day and millions of objects per month.
3) Compatibility with Cloud Storage buckets configured with default CMEK via Cloud KMS.
4) Minimal operational overhead compared with building and running a full ETL pipeline.
Common misconceptions: A GUI-driven product does not automatically make it the best answer if the core task is simply bulk file transfer. Cloud Data Fusion is a visual ETL/integration tool, but it is not the canonical service for large-scale file movement from SFTP into Cloud Storage. Dataflow is powerful but code-centric and unnecessary for straightforward landing of files. BigQuery Data Transfer Service targets loading data into BigQuery, not storing raw files in Cloud Storage.
Exam tips: When the requirement is to copy or move files from storage systems or SFTP into Cloud Storage on a schedule, think Storage Transfer Service first.
When the requirement emphasizes visual ETL pipelines with transformations across sources and sinks, think Cloud Data Fusion. Also remember that CMEK for Cloud Storage is typically implemented by configuring the destination bucket with a default KMS key rather than by selecting a special transfer engine.
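The CMEK part of this setup lives on the destination bucket, not on the transfer service. A minimal sketch of that configuration, using placeholder names (BUCKET, PROJECT, PROJECT_NUMBER, KEYRING, KEY are all hypothetical):

```shell
# Set a default Cloud KMS key on the destination bucket so every object
# landed by a transfer job is encrypted with CMEK by default.
gcloud storage buckets update gs://BUCKET \
  --default-encryption-key=projects/PROJECT/locations/us/keyRings/KEYRING/cryptoKeys/KEY

# The Cloud Storage service agent must be allowed to use the key,
# otherwise writes to the bucket fail with a KMS permission error.
gcloud kms keys add-iam-policy-binding KEY \
  --keyring=KEYRING --location=us --project=PROJECT \
  --member=serviceAccount:service-PROJECT_NUMBER@gs-project-accounts.iam.gserviceaccount.com \
  --role=roles/cloudkms.cryptoKeyEncrypterDecrypter
```

Transfer jobs themselves then need no encryption-specific settings; objects inherit the bucket's default key on write.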
You are migrating a nightly batch ETL for an e-commerce company: at 02:00 UTC, about 300 GB of gzip-compressed JSON files with sensitive purchase data land in a Google Cloud Storage bucket (gs://orchid-orders-batch), and a PySpark job on a temporary Cloud Dataproc cluster (1 master, 8 workers) transforms them and writes aggregated results to a BigQuery dataset (analytics.orders_agg) in the same project. You currently trigger the job manually with your user account, but you want to automate it while following security best practices and the principle of least privilege. How should you run this workload securely?
Restricting the bucket so only a personal user can access the files breaks automation and operational resilience. Batch ETL should not depend on a human identity (risk of account disablement, MFA prompts, offboarding). It also violates least privilege and separation of duties because the user would need additional permissions to run Dataproc and write to BigQuery, expanding the user’s access unnecessarily.
Granting Project Owner to a service account is overly permissive and violates the principle of least privilege. Owner includes broad administrative capabilities across the project (IAM changes, resource deletion), greatly increasing blast radius if the credentials are misused. Exams frequently test avoiding primitive roles and avoiding broad roles when a narrow set of permissions (GCS read + BigQuery write) is sufficient.
This is the recommended approach: run the Dataproc workload under a dedicated service account with only required permissions. Grant roles/storage.objectViewer on the specific input bucket and BigQuery dataset-scoped write permissions (e.g., roles/bigquery.dataEditor) plus roles/bigquery.jobUser to execute jobs. This supports secure automation, auditing, and least privilege, aligning with Google Cloud security best practices.
A user account with Project Viewer cannot write to BigQuery and is not appropriate for automated workloads. Even if additional roles were added, using a human identity for scheduled ETL is discouraged due to lifecycle and security issues (password/MFA, offboarding, inconsistent ownership). The correct pattern is a service account with narrowly scoped permissions and clear auditability.
Core Concept: This question tests IAM best practices for automating data workloads on Dataproc and accessing Cloud Storage and BigQuery securely. The key idea is using a dedicated service account with least-privilege permissions rather than human identities or overly broad roles.

Why the Answer is Correct: Option C is correct because the Dataproc job should run under a dedicated service account that has only the permissions required to (1) read the input objects from gs://orchid-orders-batch and (2) write the aggregated output to the specific BigQuery dataset analytics.orders_agg. This aligns with the Google Cloud Architecture Framework security pillar: minimize blast radius, separate duties, and avoid long-lived broad access. It also enables reliable automation (e.g., Cloud Scheduler/Workflows/Composer triggering Dataproc) without depending on a user account.

Key Features / Configurations:
- Create a dedicated service account (e.g., dataproc-etl-sa).
- Grant Storage permissions at the narrowest scope: bucket-level IAM such as roles/storage.objectViewer on gs://orchid-orders-batch (or even object-level via IAM Conditions if needed).
- Grant BigQuery permissions at dataset scope: typically roles/bigquery.dataEditor (or a more restrictive custom role) on the analytics dataset, plus roles/bigquery.jobUser at the project level so the job can run load/query jobs.
- Configure the Dataproc cluster/job to use that service account (the Dataproc cluster service account) and ensure workers use it for GCS/BigQuery access.

Common Misconceptions: People often think restricting access to a single user (A) improves security, but it harms automation and violates separation of duties. Others grant Owner (B) "to make it work," which is explicitly against least privilege and increases risk. Using a Viewer user (D) won't allow writes to BigQuery and still relies on a human identity.

Exam Tips: For automated pipelines, prefer service accounts over user accounts. Grant permissions at the smallest resource scope (bucket/dataset) and only the roles needed (Storage read + BigQuery write + BigQuery job execution). Avoid primitive roles (Owner/Editor/Viewer) unless explicitly required. Remember that BigQuery often needs both dataset-level data permissions and project-level job execution permissions.
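The configuration steps above can be sketched with gcloud, assuming a hypothetical project my-project, cluster etl-cluster, and service account dataproc-etl-sa (dataset-level BigQuery grants are done separately via `bq` or the console):

```shell
# Dedicated identity for the ETL job.
gcloud iam service-accounts create dataproc-etl-sa --project=my-project

# Read-only access scoped to the input bucket only.
gcloud storage buckets add-iam-policy-binding gs://orchid-orders-batch \
  --member=serviceAccount:dataproc-etl-sa@my-project.iam.gserviceaccount.com \
  --role=roles/storage.objectViewer

# Project-level permission to run BigQuery jobs; write access to the
# analytics dataset is granted at dataset scope (e.g., dataEditor).
gcloud projects add-iam-policy-binding my-project \
  --member=serviceAccount:dataproc-etl-sa@my-project.iam.gserviceaccount.com \
  --role=roles/bigquery.jobUser

# Run the ephemeral cluster as the dedicated service account so all
# worker access to GCS and BigQuery uses its narrow permissions.
gcloud dataproc clusters create etl-cluster \
  --region=us-central1 \
  --service-account=dataproc-etl-sa@my-project.iam.gserviceaccount.com
```

Triggering the job from Cloud Scheduler/Workflows/Composer then requires no human identity at all.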
A Singapore-based fintech platform ingests real-time authorization events from point-of-sale terminals worldwide, and the primary ledger table grows by approximately 280,000 rows per second. Multiple partner banks integrate your query APIs to embed live risk and compliance checks into their own systems. Your query APIs must meet the following requirements:
BigQuery supports ANSI SQL and can ingest streaming data, but it is an OLAP warehouse where queries run as jobs and data freshness can be affected by streaming buffers and ingestion latency. Also, BigQuery datasets must have a location (US/EU/region); you cannot create a truly locationless “global” dataset. BigQuery is not the best choice for strongly consistent, up-to-the-second serving APIs for partner banks.
Cloud Spanner provides horizontal scale, ANSI SQL, and strong consistency across regions. A multi-region instance with the leader in asia-southeast1 and read-only replicas in Europe and the US supports global availability and read scaling while maintaining transactional correctness. This matches a rapidly growing ledger table and the requirement for consistent access to the most up-to-date data. A single global endpoint is typically provided by the API front end.
Cloud SQL for PostgreSQL supports SQL, but cross-region read replicas are asynchronous. That means partners reading from replicas may see stale data, violating “most up-to-date” consistency requirements. Cloud SQL also has vertical scaling limits and may struggle with sustained ingestion at ~280k rows/sec depending on row size and transaction patterns. A global HTTP(S) load balancer doesn’t fix database replication lag or write scalability constraints.
Cloud Bigtable can handle extremely high write throughput and multi-cluster replication, but it does not provide ANSI SQL and is not a relational database. Replication is eventually consistent across clusters, so “most up-to-date” reads globally are not guaranteed. Bigtable is excellent for time-series and key-value access patterns, but partner banks integrating SQL-based risk/compliance queries would be poorly served without significant additional systems.
Core concept: This question tests choosing the right serving datastore for globally distributed, high-write-rate, low-latency query APIs that require strong consistency and ANSI SQL. It is primarily about operational/serving databases versus analytical warehouses.

Why the answer is correct: Cloud Spanner is the best fit because it provides (1) ANSI SQL, (2) horizontal scalability for very high write throughput, and (3) strong consistency with externally consistent reads/writes across regions using TrueTime. With a multi-region Spanner instance and a leader in asia-southeast1, writes can be committed with strong consistency while read-only replicas in Europe and the US serve reads with strong consistency (at higher latency) or stale reads (lower latency), depending on API requirements. The requirement "consistent access to the most up-to-date data" implies strong reads, which Spanner supports globally.

Key features / configurations:
- Multi-region Spanner instance: automatic synchronous replication and high availability across regions.
- Leader region placement (asia-southeast1) aligns with Singapore-based primary operations and write locality.
- Read-only replicas in europe-west1 and us-central1 support global read scaling.
- A "single global endpoint" is typically implemented at the API layer (e.g., a global external HTTP(S) load balancer) routing to stateless API services that connect to the same Spanner instance; Spanner itself is a single logical database from the application's perspective.
- Spanner is designed for financial/ledger-like workloads requiring correctness, transactions, and SQL.

Common misconceptions: BigQuery is ANSI SQL and highly scalable, but it is an analytics data warehouse with batch/streaming ingestion and query jobs; it is not designed as a strongly consistent, up-to-the-moment serving database for high-QPS partner APIs. Bigtable scales massively but is not ANSI SQL and does not provide relational querying/joins. Cloud SQL read replicas are asynchronous, so "most up-to-date" reads globally cannot be guaranteed.

Exam tips: When you see "global users + SQL + strong consistency + very high write rate + serving APIs," think Cloud Spanner. When you see "analytics/BI + large scans + OLAP," think BigQuery. For "wide-column key/value at massive scale without SQL," think Bigtable. Also watch for replica semantics: Cloud SQL replicas are typically async, which breaks strict freshness requirements.
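Provisioning such an instance is a small amount of configuration. A minimal sketch with hypothetical names (ledger-instance; CONFIG stands in for a real multi-region instance configuration whose default leader region is asia-southeast1):

```shell
# List available instance configurations, including multi-region ones
# with their replica topology and default leader region.
gcloud spanner instance-configs list

# Create a multi-region instance; replication across the leader and
# read-only replica regions is managed automatically by Spanner.
gcloud spanner instances create ledger-instance \
  --config=CONFIG \
  --description="Global ledger serving" \
  --nodes=10
```

Applications in every region then connect to the same logical database; the choice between strong and bounded-staleness reads is made per query in the client, not in the instance configuration.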
Your team is building a Google Cloud–hosted tool to auto-tag up to 80 customer support emails per second with topic labels so agents can route them, you must release this in 10 business days with zero additional headcount and no team ML experience, and the labels only need to capture subject matter such as product names or issue types; what should you do?
Correct. Entity Analysis extracts entities (e.g., product names, components, organizations, common issue terms) and provides salience to rank them, which maps naturally to “topic labels.” It is fully managed, requires no training data or ML expertise, and can be integrated quickly via API calls. This best satisfies the 10-day timeline and zero-headcount constraint while meeting the “subject matter” labeling requirement.
Incorrect. Sentiment Analysis returns polarity/score and magnitude describing emotional tone (positive/negative/neutral), not subject matter. While sentiment can be useful for prioritization or escalation workflows, it will not reliably produce labels like product names or issue types. Choosing sentiment would fail the core functional requirement of topic-based routing labels.
Incorrect for this scenario. A custom TensorFlow text classifier could produce accurate, domain-specific issue categories, but it requires labeled training data, feature engineering/model selection, evaluation, and an MLOps pipeline. Even with managed deployment (Vertex AI/legacy ML Engine), the team’s lack of ML experience and the 10-business-day deadline make this high risk and likely infeasible.
Incorrect. Building and deploying a custom model on GKE adds even more operational burden than option C: containerization, cluster management, autoscaling, monitoring, security patching, and reliability engineering. It also still requires training data and ML expertise. This violates the “no additional headcount” and rapid delivery constraints and is not aligned with best practices when a managed API can meet requirements.
Core concept: This question tests choosing a managed ML/AI capability versus building custom ML under tight constraints. In Google Cloud, the Cloud Natural Language API provides pre-trained NLP features (entities, sentiment, syntax, classification) that can be called via REST with no model training.

Why the answer is correct: You need topic-like labels (product names, issue types) from email text, must ship in 10 business days, have zero additional headcount, and no ML experience. Entity Analysis is designed to extract and categorize "things" mentioned in text (e.g., product names, organizations, locations, common nouns) and returns entity names plus salience scores. Using entities as labels is the fastest path to production because it avoids data labeling, model training, MLOps, and ongoing model maintenance. At 80 emails/second, this is a straightforward online inference pattern: your service calls the API and maps top entities (by salience/type) to routing labels.

Key features / best practices:
- Entity Analysis returns entity type, salience, and (when available) metadata (e.g., Wikipedia/Knowledge Graph IDs), which helps normalize labels.
- Implement batching where possible (e.g., concatenate short emails with separators only if acceptable) and add caching/deduplication for repeated templates to reduce cost.
- Plan for quotas and latency: ensure the Natural Language API quota supports your QPS, request increases if needed, and use retries with exponential backoff and circuit breaking.
- Data governance: emails may contain PII; follow the Google Cloud Architecture Framework (security and compliance) by minimizing the data sent, using TLS, and considering DLP if needed.

Common misconceptions: Sentiment Analysis is often confused with "topic," but it measures emotional tone, not subject matter. Custom TensorFlow models can produce better domain-specific labels, but they require labeled training data, ML expertise, and deployment/MLOps, which is unlikely to fit within 10 days.

Exam tips: When requirements emphasize rapid delivery, minimal ops, and no ML expertise, prefer managed pre-trained APIs. Choose the NLP feature that matches the label type: entities for "what is mentioned," sentiment for "how it feels," and custom models only when pre-trained capabilities cannot meet accuracy or domain needs.
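Entity Analysis is easy to try from the command line before writing any integration code. A quick sketch (the sample sentence is made up; requires a project with the Natural Language API enabled):

```shell
# Inspect Entity Analysis output for a sample support email.
gcloud ml language analyze-entities \
  --content="My Pixel Tablet will not charge after the latest update."
# The JSON response lists entities with their type and salience; a
# routing service would map the highest-salience entities (product
# names, issue terms) to topic labels for agent assignment.
```

The production service would make the equivalent REST/client-library call per email and apply the same entity-to-label mapping.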
Study period: 1 month
I tend to get overwhelmed with large exams, but doing a few questions every day kept me on track. The explanations and domain coverage felt balanced and practical. Happy to say I passed on the first try.
Study period: 2 months
Thank you! These practice questions helped me pass the GCP PDE exam on the first try.
Study period: 1 month
The layout and pacing make it comfortable to study on the bus or during breaks. I solved around 20–30 questions a day, and after a few days I could feel my confidence improving.
Study period: 1 month
The explanations are in English, but they still helped! The questions are similar to the real exam too, which is great. :)
Study period: 2 months
I combined this app with some hands-on practice in GCP, and the mix worked really well. The questions pointed out gaps I didn’t notice during practice labs. Good companion for PDE prep.