
GCP
90+ Free Practice Questions (with AI-Verified Answers)
AI-Powered
Every Google Associate Data Practitioner answer is cross-checked by three leading AI models for maximum accuracy. Detailed per-option explanations and in-depth question analysis are included.
A global sportswear retailer is standardizing on BigQuery for analytics and needs a fully managed way to run a nightly batch ETL at 02:00 UTC that pulls 50 tables (~12 TB total) from mixed sources (Cloud SQL, an SFTP server, and a partner REST API), triggers transformations across multiple Google Cloud services, and then loads curated datasets into BigQuery. Your engineering team (8 developers) is strongest in Python and wants to write maintainable code, use pre-built connectors/operators for Google services, set task dependencies with retries/alerts, and avoid managing servers. Which tool should you recommend to orchestrate these batch ETL workflows while leveraging the team’s Python skills?
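The requirements above (Python-native authoring, pre-built operators, dependencies, retries, alerts, no servers) map directly onto Apache Airflow concepts as managed by Cloud Composer. A schematic of what the nightly DAG would encode, written as a plain Python dict rather than actual Airflow operator code, purely for illustration:

```python
# Schematic of the nightly DAG the team would express in Airflow on
# Cloud Composer. Task names are hypothetical; values mirror the question.
nightly_etl = {
    "schedule": "0 2 * * *",  # 02:00 UTC daily, cron syntax
    "default_args": {"retries": 3, "email_on_failure": True},
    # task -> list of upstream dependencies
    "tasks": {
        "extract_cloud_sql": [],      # pre-built Airflow operators exist
        "extract_sftp": [],           # for each of these source types
        "extract_partner_api": [],
        "transform": ["extract_cloud_sql", "extract_sftp", "extract_partner_api"],
        "load_bigquery": ["transform"],
    },
}
```

In real Airflow code, each dict key becomes an operator instance and each dependency list becomes a `>>` edge; Composer runs the scheduler and workers so no servers are managed.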
At a multinational retailer, you maintain a BigQuery dataset ret_prod.sales_tx in project ret-prod that stores tokenized credit card transactions, and you must ensure that only the 8-person Risk-Analytics Google Group (risk-analytics@retail.example) can run SELECT queries on the tables while preventing the other 120 employees in the organization from querying them and adhering to the principle of least privilege; what should you do?
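Not the official answer key, but one least-privilege setup for this scenario can be sketched as BigQuery DCL, granting read access on the dataset to the group only (names taken from the question):

```python
# Dataset-level read access via a BigQuery GRANT statement.
GRANT_SQL = """
GRANT `roles/bigquery.dataViewer`
ON SCHEMA `ret-prod.ret_prod`
TO "group:risk-analytics@retail.example"
"""
# The group also needs roles/bigquery.jobUser on the project to run query
# jobs; the other 120 employees receive no role on the dataset at all.
```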
You work for a video-streaming platform. An existing Bash/Python ETL script on a Compute Engine VM aggregates ~120,000 playback events each day from a legacy NFS share, transforms them, and loads the results into BigQuery. The script is run manually today; you must automate a 02:00 UTC daily trigger and add centralized monitoring with run history, task-level logs, and retry visibility for troubleshooting. You want a single, managed solution that uses open-source tooling for orchestration and does not require rewriting the ETL code. What should you do?
A gaming analytics startup collects in-app telemetry from 2 million daily active users across 6 Google Cloud regions (us-central1, europe-west1, asia-east1, australia-southeast1, southamerica-east1, us-east4), producing approximately 120,000 JSON events per minute. You must deliver dashboards in BigQuery with near real-time freshness (under 90 seconds end-to-end). Before loading, each event must be cleaned (drop null fields), enriched with a region_code derived from the producing region, and flattened from nested JSON into a columnar schema. To accelerate delivery and enable future maintainability, the pipeline must be built using a visual, low-code interface. What should you do?
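Whatever low-code tool is chosen, the per-event transformation the question describes (drop null fields, add a `region_code`, flatten nested JSON into a columnar shape) is conceptually simple. A minimal sketch in plain Python, with a hypothetical `region_code` derivation:

```python
def flatten(event, parent_key="", sep="_"):
    """Recursively flatten nested JSON into a single-level, columnar-friendly dict."""
    flat = {}
    for key, value in event.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            flat.update(flatten(value, new_key, sep))
        elif value is not None:  # drop null fields
            flat[new_key] = value
    return flat

def transform(event, producing_region):
    # Hypothetical mapping: region_code is the first segment of the
    # producing region name, upper-cased (e.g. "us-central1" -> "US").
    row = flatten(event)
    row["region_code"] = producing_region.split("-")[0].upper()
    return row
```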
Your healthcare analytics startup stores patient encounter data that is updated once per day at 02:00 UTC and is spread across 6 BigQuery datasets; several tables contain PHI fields like full_name, phone_number, and notes. You need to let a new contract analyst query only non-sensitive operational metrics (e.g., clinic_id, visit_date, procedure_code, total_cost) for the last 180 days while ensuring they cannot access any PHI or underlying base tables. What should you do?
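A common pattern for this requirement is an authorized view that selects only the non-sensitive columns; the analyst is granted access to the view's dataset, never to the base tables. A sketch of the view DDL as a SQL string (dataset, view, and source table names are hypothetical; the column list comes from the question):

```python
CURATED_VIEW_SQL = """
CREATE OR REPLACE VIEW `analytics_shared.ops_metrics` AS
SELECT clinic_id, visit_date, procedure_code, total_cost
FROM `clinical.encounters`
WHERE visit_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 180 DAY)
"""
# The view is then authorized on the source dataset(s), and the analyst
# gets roles/bigquery.dataViewer on analytics_shared only, so PHI columns
# and base tables stay out of reach.
```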
You work for a cold-chain logistics company that streams real-time IoT telemetry (temperature, GPS, battery) from 8,000 refrigerated containers into Pub/Sub at a peak of 50,000 messages per second. You must process the stream with sub–5-second end-to-end p95 latency to: (1) filter out invalid readings (e.g., battery_level < 10%), (2) enrich each event with a static route lookup (~500 route IDs updated hourly), and (3) compute 1-minute per-container aggregates (avg temperature, count) before loading both raw and aggregated records into BigQuery tables partitioned by event_time (daily partitions). You need a Google-recommended design that provides low latency, high throughput, windowed aggregation, and easy autoscaling from Pub/Sub to BigQuery. What should you do?
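The three processing steps named in the question can be modeled in plain Python to make the expected output shape concrete; this mirrors what a streaming pipeline with 1-minute fixed windows would emit (the route lookup content is hypothetical):

```python
from collections import defaultdict
from datetime import datetime

ROUTE_LOOKUP = {"c-001": "route-7"}  # hypothetical static side input, refreshed hourly

def minute_aggregates(events):
    """Filter invalid readings, enrich with a route, and compute per-container
    1-minute averages and counts, keyed by (container_id, window minute)."""
    buckets = defaultdict(list)
    for e in events:
        if e["battery_level"] < 10:           # (1) drop invalid readings
            continue
        e["route_id"] = ROUTE_LOOKUP.get(e["container_id"])  # (2) enrich
        minute = e["event_time"].replace(second=0, microsecond=0)
        buckets[(e["container_id"], minute)].append(e["temperature"])
    return {k: {"avg_temp": sum(v) / len(v), "count": len(v)}  # (3) aggregate
            for k, v in buckets.items()}
```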
Your e-commerce company has 160 data staff split across four regional squads (Americas, EMEA, APAC, LATAM). Leadership is concerned that any user can currently move or delete dashboards in the Global Reports Shared folder. You need an easy-to-manage setup that allows everyone to view everything in Global Reports, but only lets each squad move or delete dashboards that belong to their own squad. What should you do?
Your mobile game studio needs to measure player sentiment about a new in-game economy update. You have 30 million rows of player comments from in-app support and app store reviews stored in BigQuery; messages average 140 characters and contain gamer slang, emojis, and mixed casing. You must build and deploy a sentiment classification solution within two weeks with minimal ML operations overhead using managed Google Cloud services. What should you do?
You manage a municipal water utility and must forecast the next 30 days of daily water demand for 85 service districts to plan pumping capacity and avoid shortages. Five years of historical daily meter readings are stored in a BigQuery table utility.daily_demand (district_id STRING, reading_date DATE, liters_used INT64) that exhibits weekday/weekend and summer seasonality. You need a scalable approach that leverages this seasonality and historical data and writes the forecasts into a new BigQuery table. What should you do?
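An in-warehouse, seasonality-aware approach to this kind of problem is a BigQuery ML `ARIMA_PLUS` model. A sketch of the two statements as SQL strings (the model and output table names are hypothetical; the source table and columns come from the question):

```python
CREATE_MODEL_SQL = """
CREATE OR REPLACE MODEL `utility.demand_forecast`
OPTIONS (
  model_type = 'ARIMA_PLUS',
  time_series_timestamp_col = 'reading_date',
  time_series_data_col = 'liters_used',
  time_series_id_col = 'district_id',  -- one series per district, one statement
  horizon = 30                         -- forecast 30 days ahead
) AS
SELECT district_id, reading_date, liters_used
FROM `utility.daily_demand`
"""

FORECAST_SQL = """
CREATE OR REPLACE TABLE `utility.demand_forecast_30d` AS
SELECT *
FROM ML.FORECAST(MODEL `utility.demand_forecast`,
                 STRUCT(30 AS horizon, 0.9 AS confidence_level))
"""
```

`ARIMA_PLUS` models weekly and yearly seasonality automatically, which matches the weekday/weekend and summer patterns described above.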
Your media streaming service archives daily viewer comments as newline-delimited JSON files (~5 files/day, ~80 MB each) in a Cloud Storage bucket gs://stream-comments-prod. The comments arrive in 12 languages and must be normalized and translated to French within 30 minutes of file arrival before being stored in BigQuery for analytics. You need a pipeline that is fully serverless, auto-scales to about 60,000 comments per day, and requires minimal maintenance with no clusters to manage. What should you do?
Your analytics team has a 180 MB CSV file (~1.2 million rows) stored in Cloud Storage (gs://retail-dumps/2025-08/sales.csv) that must be filtered to exclude rows where test_flag = true and aggregated to daily revenue by product_id, then loaded into BigQuery for analysis once per day; to minimize operational overhead and cost while keeping performance efficient for this small dataset and simple transformations, which approach should you choose?
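The transformation itself is small enough to state exactly. A stdlib-only sketch of the filter-and-aggregate logic (the `date` and `revenue` column names are assumptions; `test_flag` and `product_id` come from the question):

```python
import csv
import io
from collections import defaultdict

def daily_revenue(csv_text):
    """Drop test rows, then sum revenue per (date, product_id):
    the entire transformation the once-a-day job needs."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["test_flag"].lower() == "true":  # exclude rows where test_flag = true
            continue
        totals[(row["date"], row["product_id"])] += float(row["revenue"])
    return dict(totals)
```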
You operate a real-time fraud detection service for a fintech app where 1,500 JSON events per second are published to a Pub/Sub topic from mobile devices. You must validate JSON schema, drop records missing required fields, mask PII, and deduplicate by event_id within a 10-minute window before loading to BigQuery. The pipeline must autoscale, handle bursts up to 5,000 events/sec, and keep end-to-end 99th-percentile latency under 4 seconds with minimal operations overhead. What should you do?
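The record-level logic the question demands (drop records missing required fields, mask PII, deduplicate by `event_id` within 10 minutes) can be sketched in plain Python; the field names are hypothetical, and a streaming runner would apply the same logic per window:

```python
from datetime import datetime, timedelta

REQUIRED = {"event_id", "user_id", "amount", "ts"}  # hypothetical schema

def process(events, window=timedelta(minutes=10)):
    """Validate, mask PII, and deduplicate by event_id within a 10-minute window."""
    seen = {}  # event_id -> timestamp of the last record kept
    out = []
    for e in sorted(events, key=lambda e: e["ts"]):
        if not REQUIRED <= e.keys():                 # drop records missing fields
            continue
        last = seen.get(e["event_id"])
        if last is not None and e["ts"] - last < window:
            continue                                 # duplicate within the window
        seen[e["event_id"]] = e["ts"]
        e = dict(e, user_id="****" + str(e["user_id"])[-4:])  # mask PII
        out.append(e)
    return out
```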
You oversee a smart-city media archive in Cloud Storage containing approximately 200 TB/month of raw 4K camera footage, 50 TB of processed highlight clips, and 80 TB of daily backups. Compliance requires that any footage tagged as “evidence” remain immutable for at least 7 years; other data follow these patterns: raw footage is frequently accessed for 14 days then rarely, processed clips are accessed daily for 90 days then infrequently, and backups are rarely accessed but must be retained for at least 365 days. You need to minimize storage costs and satisfy the retention/immutability requirements using a managed, low-overhead approach without building custom code. What should you do?
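The access patterns above translate naturally into Object Lifecycle Management rules. A sketch of the lifecycle configuration as a Python dict in the bucket-JSON shape (the prefixes are hypothetical; in practice each data class might live in its own bucket):

```python
lifecycle = {
    "rule": [
        # Raw 4K footage: hot for 14 days, then rarely accessed.
        {"action": {"type": "SetStorageClass", "storageClass": "COLDLINE"},
         "condition": {"age": 14, "matchesPrefix": ["raw/"]}},
        # Highlight clips: accessed daily for 90 days, then infrequently.
        {"action": {"type": "SetStorageClass", "storageClass": "NEARLINE"},
         "condition": {"age": 90, "matchesPrefix": ["clips/"]}},
        # Backups: rarely read, so store in the cheapest class immediately.
        {"action": {"type": "SetStorageClass", "storageClass": "ARCHIVE"},
         "condition": {"age": 1, "matchesPrefix": ["backups/"]}},
    ]
}
# "Evidence" footage belongs in a separate bucket with a locked 7-year
# retention policy (Bucket Lock); lifecycle rules cannot delete or rewrite
# objects before a locked retention period expires.
```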
You manage an energy utility that ingests approximately 8 million smart meter readings per day into BigQuery for billing and analytics. A new compliance rule requires that all meter readings be retained for a minimum of seven years for auditability while keeping storage cost and operations overhead low; what should you do?
A media analytics startup operates an existing Dataproc cluster (1 master, 3 workers) that runs Spark batch jobs on roughly 60 GB of log files stored in Cloud Storage, and they must generate a daily summary CSV at 06:00 UTC and email it to 20 regional managers; they want a fully managed, easy-to-implement approach that minimizes operational overhead and avoids standing up a separate orchestration platform—what should they do?
A national retail chain stores background checks and performance notes for 12,000 employees in BigQuery; compliance requires that within 24 hours of termination, the personal records of the departing employee must be rendered irreversibly unreadable while keeping the data stored for 7 years for audit purposes and without affecting access to other employees’ records—what should you do?
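"Irreversibly unreadable while still stored" is the classic crypto-deletion (crypto-shredding) pattern: encrypt each employee's records with a per-employee key and destroy only that key at termination. A toy, stdlib-only illustration of the idea (a real deployment would use Cloud KMS-managed keys with AEAD encryption, for example BigQuery's AEAD functions, not this hand-rolled keystream):

```python
import hashlib
import secrets

keys = {}  # employee_id -> per-employee key (in production: KMS-wrapped)

def _keystream(key, n):
    """Derive n pseudo-random bytes from the key (illustration only, not real crypto)."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:n]

def encrypt_record(employee_id, plaintext):
    key = keys.setdefault(employee_id, secrets.token_bytes(32))
    return bytes(a ^ b for a, b in zip(plaintext, _keystream(key, len(plaintext))))

def decrypt_record(employee_id, ciphertext):
    key = keys[employee_id]
    return bytes(a ^ b for a, b in zip(ciphertext, _keystream(key, len(ciphertext))))

def terminate(employee_id):
    # Destroying the key renders that employee's ciphertext unreadable
    # forever, while the encrypted rows remain stored for the 7-year audit
    # window and other employees' records are untouched.
    del keys[employee_id]
```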
Your e-commerce platform streams about 15 million clickstream events per day into a BigQuery table (analytics.clicks_raw) that is partitioned by ingestion time; to reduce storage costs and meet a retention policy, you must automatically remove any data older than 180 days with minimal ongoing maintenance and query overhead; what should you do?
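For an ingestion-time partitioned table, the lowest-maintenance retention mechanism is a partition expiration, after which BigQuery drops old partitions automatically with no scheduled jobs or query overhead. A sketch of the DDL as a SQL string (table name from the question):

```python
SET_EXPIRATION_SQL = """
ALTER TABLE `analytics.clicks_raw`
SET OPTIONS (partition_expiration_days = 180)
"""
```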
Your hospital analytics team receives a 5-GB daily CSV export (about 8 million rows, 30 columns) of patient-monitoring events in a Cloud Storage bucket and needs to load it into a partitioned BigQuery table for clinical KPI dashboards. You must stand up a scalable batch pipeline within one day that applies type casting and reference data joins, and that also provides built-in data quality insights (e.g., profiling of nulls, outliers, and schema anomalies) during ingestion; what should you do?
At a university, you store 120,000 course-enrollment records in a BigQuery table university.enrollments partitioned by term, with a STRING column dept_code (e.g., BIO, CHEM, MATH) indicating the student’s department; you must ensure that each academic advisor—who belongs to a Google Group mapped to a single department—can run queries against the table but only see rows where dept_code matches their department, without creating per-department tables or requiring query changes—what should you do?
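Filtering rows by the querying user's group without per-department tables or query changes is what BigQuery row-level access policies provide. A sketch of one policy as a SQL string, repeated per department (the policy name and group email are hypothetical; table and column come from the question):

```python
BIO_POLICY_SQL = """
CREATE ROW ACCESS POLICY bio_advisors
ON `university.enrollments`
GRANT TO ("group:bio-advisors@university.example")
FILTER USING (dept_code = 'BIO')
"""
# Advisors keep querying university.enrollments unchanged; BigQuery applies
# the filter transparently, returning only their department's rows.
```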
Your IoT-based fleet tracking platform streams about 50,000 GPS events per minute (peaks to 120,000/min) that must be deduplicated, validated, and enriched by joining each event with a 2,000-row region-code lookup, with an end-to-end latency target under 2 seconds; the cleaned, enriched data will be stored for ad hoc SQL analysis and to train weekly forecasting models, so you must choose the appropriate data manipulation approach and Google Cloud services for this pipeline—what should you select?
Foundational