
GCP
335+ Free Practice Questions with AI-Verified Answers
AI-Powered
Every Google Professional Machine Learning Engineer answer is cross-checked by 3 leading AI models to ensure maximum accuracy. Get detailed per-option explanations and in-depth question analysis.
You deployed a TensorFlow recommendation model to a Vertex AI Prediction endpoint in us-central1 with autoscaling enabled. Over the last week, you observed sustained traffic of ~1,200 requests per minute (20 RPS) during business hours, which is 2x higher than your original estimate, and you need to keep P95 latency under 150 ms during future surges. You want the endpoint to scale efficiently to handle this higher baseline and upcoming spikes without causing user-visible latency. What should you do?
Deploying a second model to the same endpoint is mainly for A/B testing, canarying, or gradual rollouts via traffic splitting. It does not inherently guarantee more serving capacity or lower P95 latency unless you also increase total replicas. It also adds operational complexity (two model versions, monitoring two deployments) without directly addressing the need for warm baseline capacity.
Setting minReplicaCount ensures a baseline number of replicas are always running and ready to serve, preventing cold-start/warm-up delays and reducing queueing during predictable business-hour load. This is the standard approach to protect tail latency (P95) when traffic is consistently higher than expected. Autoscaling can still add replicas above the minimum for future surges.
Increasing the target utilization percentage delays scale-out, meaning each replica runs hotter before new replicas are added. That can increase queueing and push P95 latency above the 150 ms SLO during spikes. While it may reduce cost by using fewer replicas, it conflicts with the requirement to avoid user-visible latency during surges and is generally the opposite of what you want for strict latency targets.
Switching to GPU-accelerated machines can reduce inference time for compute-heavy models, but it’s not the first lever for a scaling/latency issue caused by insufficient warm capacity. GPUs increase cost and may have quota/availability constraints in us-central1. If the model already meets latency when enough CPU replicas are warm, GPUs won’t solve autoscaling reaction time or cold-start effects.
Core Concept: This question tests Vertex AI Prediction online serving autoscaling behavior and how to meet latency SLOs under a higher steady-state load. Key knobs are minimum/maximum replica counts and autoscaling signals (utilization/QPS), which directly affect cold-start risk and tail latency.

Why the Answer is Correct: With a sustained new baseline (~20 RPS during business hours) that is 2x the original estimate, relying purely on reactive autoscaling can cause periods where too few replicas are available, leading to queueing and cold-start/warm-up delays that inflate P95 latency. Setting minReplicaCount to match the new baseline ensures enough replicas are always provisioned (“warm”) to absorb steady traffic and small surges immediately. Autoscaling can still add replicas for larger spikes, but the floor prevents user-visible latency while new replicas start.

Key Features / Best Practices:
- Configure minReplicaCount based on observed steady-state throughput per replica and latency headroom. Use load testing to determine safe RPS/replica at P95 < 150 ms.
- Keep maxReplicaCount high enough for anticipated surges; ensure quotas (CPU/GPU, regional) support it.
- Monitor endpoint metrics (latency percentiles, replica count, utilization, request backlog/errors) and adjust.
- This aligns with Google Cloud Architecture Framework reliability and performance principles: provision for predictable load, autoscale for variability, and design to meet SLOs.

Common Misconceptions: It’s tempting to “scale later” (option C) to save cost, but that worsens latency during spikes. Similarly, adding another model (option A) doesn’t increase capacity unless it results in more replicas, and it complicates routing/versioning. GPUs (option D) can reduce per-request latency for some models, but they don’t address cold-start and may be unnecessary/costly for a TensorFlow recommender that already meets latency when adequately provisioned.
Exam Tips: For online prediction, when you see sustained baseline traffic plus strict tail-latency targets, think “set min replicas to cover baseline” and “autoscale for spikes.” Use GPUs only when profiling shows compute-bound inference and CPU can’t meet latency at reasonable replica counts. Always consider warm capacity, scaling reaction time, and quotas in the chosen region.
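The "set min replicas to cover baseline, autoscale for spikes" sizing can be sketched as a small capacity-planning calculation. This is an illustrative sketch, not an official Google formula: the per-replica throughput and headroom figures are assumptions you would replace with your own load-test measurements.

```python
import math

def plan_replicas(baseline_rps: float, peak_rps: float,
                  rps_per_replica: float, headroom: float = 0.7) -> dict:
    """Size min/max replica counts for an online prediction endpoint.

    rps_per_replica: sustained RPS one replica can serve while staying
    under the latency SLO (measure this with a load test).
    headroom: fraction of that capacity to actually target, leaving
    slack so P95 latency survives small bursts.
    """
    effective = rps_per_replica * headroom
    return {
        "min_replica_count": math.ceil(baseline_rps / effective),
        "max_replica_count": math.ceil(peak_rps / effective),
    }

# Example: 20 RPS baseline, 3x surge allowance, and an assumed
# 8 RPS/replica measured at P95 < 150 ms.
print(plan_replicas(baseline_rps=20, peak_rps=60, rps_per_replica=8))
# → {'min_replica_count': 4, 'max_replica_count': 11}
```

The resulting values would then feed the `min_replica_count`/`max_replica_count` deployment settings so the floor covers the observed baseline and autoscaling handles everything above it.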
You plan to fine-tune a video-frame classifier via transfer learning using a pre-trained ResNet-50 backbone. Your labeled dataset contains 18,000 1080p frames, and you will retrain the model once per day; each training run completes in under 60 minutes on 4 V100 GPUs, and you must minimize infrastructure cost and operational overhead. Which platform components and configuration should you choose?
A single Deep Learning VM with 4 V100s can meet the performance requirement, and local SSDs provide high throughput. However, it increases operational overhead (VM lifecycle, patching, GPU/CUDA/driver management) and can increase cost if the VM is left running between daily trainings. You also must manage data staging to local SSD each run and handle durability/backups yourself.
Reading/writing directly to Cloud Storage from a Deep Learning VM reduces local storage management, but you still own the VM operations and typically pay for idle GPUs unless you automate start/stop and image maintenance. For 1080p frames, Cloud Storage I/O is usually fine, but the primary requirement is minimizing operational overhead, which is better addressed by managed Vertex AI Training.
GKE with GPU node pools and a self-managed NFS server is the highest operational overhead option: cluster management, GPU scheduling, node autoscaling tuning, NFS reliability/backup, and security hardening. It’s appropriate for complex multi-tenant training platforms or many concurrent jobs, not a single daily <60-minute training run where simplicity and cost minimization are priorities.
Vertex AI Training custom job with 4 V100 GPUs and data in Cloud Storage is the most managed, lowest-ops approach. It provisions GPUs only for the job, integrates with Cloud Logging/Monitoring, and avoids managing VM images, drivers, and cluster infrastructure. Cloud Storage provides durable, low-cost dataset storage and easy access for repeatable daily retraining runs.
Core concept: This question tests choosing the right managed training platform for recurring GPU training with minimal ops. It contrasts self-managed GPU VMs/GKE versus Vertex AI Training, and how to store training data (Cloud Storage) for durability and repeatability.

Why the answer is correct: A managed Vertex AI Training job (custom scale tier with 4 V100 GPUs) best matches “minimize infrastructure cost and operational overhead” for a once-per-day retraining workload. Vertex AI Training provisions GPU resources only for the job duration, tears them down automatically, and provides managed logging/metrics and artifact handling. With daily runs under 60 minutes, paying only for the training job runtime (plus storage) typically reduces idle GPU cost compared to keeping a VM or cluster around. Storing the 18,000 1080p frames in Cloud Storage is the standard pattern: durable, low-ops, and accessible from managed training without managing disks or file servers.

Key features / best practices:
- Vertex AI Training custom jobs: specify machine type and accelerators (4x V100) and container-based training.
- Cloud Storage as the system of record for datasets and outputs; supports versioning/lifecycle policies and integrates with IAM.
- Operational excellence (Google Cloud Architecture Framework): managed service reduces toil (patching, driver/CUDA management, job retries, centralized logs).
- Cost optimization: ephemeral training resources; no need to pay for GPUs when not training. (If available in your region, consider Spot/Preemptible GPUs for additional savings, but the question doesn’t require it.)

Common misconceptions:
- “A single VM is simpler”: it can be, but it often leads to paying for GPUs while idle or to manual start/stop automation, plus ongoing maintenance (drivers, images, security patching).
- “GKE is more scalable”: true, but it adds significant operational overhead for a simple daily training job.
Exam tips: When you see recurring training, short runtimes, and a requirement to minimize ops, prefer Vertex AI Training + Cloud Storage. Choose self-managed VMs/GKE only when you need custom networking, specialized storage/latency, or long-running/interactive workflows that justify the management burden.
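As a concrete illustration of the recommended option, a custom job's worker pool can be described with a spec like the one below. This is a hedged sketch: the container image URI and Cloud Storage path are placeholders, and the field names follow the shape of the Vertex AI CustomJob `worker_pool_specs` payload as commonly used from the Python SDK.

```python
# Sketch of a Vertex AI custom training job worker pool with 4 V100 GPUs.
# The image URI and bucket path are placeholders for illustration only.
def build_worker_pool_specs(image_uri: str, data_uri: str) -> list:
    return [{
        "machine_spec": {
            "machine_type": "n1-standard-16",
            "accelerator_type": "NVIDIA_TESLA_V100",
            "accelerator_count": 4,
        },
        "replica_count": 1,  # single node, multi-GPU training
        "container_spec": {
            "image_uri": image_uri,
            "args": ["--data-dir", data_uri, "--epochs", "10"],
        },
    }]

specs = build_worker_pool_specs(
    "us-docker.pkg.dev/my-project/training/resnet50:latest",  # placeholder
    "gs://my-bucket/frames/",                                 # placeholder
)
print(specs[0]["machine_spec"]["accelerator_type"])
```

A spec like this would typically be handed to the SDK's CustomJob constructor (e.g., `aiplatform.CustomJob(..., worker_pool_specs=specs)`); the GPUs exist only for the job's duration, which is the cost/ops advantage the answer relies on.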
Your team is preparing to train a fraud detection model using data in BigQuery that includes several fields containing PII (for example, card_number, customer_email, and phone_number). The dataset has approximately 250 million rows and every column is required as a feature. Security requires that you reduce the sensitivity of PII before training while preserving each column’s format and length so downstream SQL joins and validations continue to work. The transformation must be deterministic so the same input always maps to the same protected value, and authorized teams must be able to decrypt values for audits. How should you proceed?
Randomizing sensitive values is not deterministic unless you maintain a mapping table, and it is typically not reversible for audits. It also risks breaking referential integrity and downstream joins/validations because randomized outputs may not preserve the original format/length constraints (e.g., card number patterns). While Dataflow can scale to 250M rows, this approach does not meet the deterministic and decryptable requirements.
Cloud DLP can both identify PII and apply de-identification at scale. Using DLP Format-Preserving Encryption (FPE) preserves the original data’s format and length, enabling downstream SQL joins and validations to continue working. With Cloud KMS protecting the key material, authorized teams can re-identify/decrypt for audits under controlled IAM. Dataflow provides the scalable execution layer for transforming hundreds of millions of BigQuery rows.
AES-256 with a per-row random salt makes the transformation non-deterministic (same input encrypts differently each time), which breaks the requirement for stable mapping needed for joins and consistent feature values. Additionally, standard ciphertext encoding (base64/hex) changes length and character set, violating format/length preservation. Building custom crypto also increases implementation risk compared to DLP’s managed FPE designed for this use case.
Dropping PII columns contradicts the requirement that every column is required as a feature for training. Authorized views can restrict access, but they do not reduce the sensitivity of the data used in training; the model pipeline would still either need the raw PII (violating security requirements) or lose critical features. This option addresses access control, not deterministic, reversible de-identification.
Core Concept: This question tests privacy-preserving feature engineering for ML using managed de-identification. The key services are Cloud Data Loss Prevention (DLP) for de-identification and Format-Preserving Encryption (FPE), Cloud KMS for key management, and Dataflow for scalable transformation of BigQuery-scale datasets.

Why the Answer is Correct: You must reduce PII sensitivity while (1) preserving each column’s format and length, (2) ensuring deterministic mapping (same input -> same output), and (3) enabling authorized re-identification (decrypt) for audits. Cloud DLP’s FPE is designed exactly for this: it produces ciphertext that matches the original data’s character set/length constraints (e.g., credit card-like strings), can be configured deterministically, and supports reversible transformation when paired with appropriate keying material. Using Cloud KMS to protect the wrapping key aligns with enterprise security and auditability requirements. Dataflow provides the throughput needed for ~250M rows and integrates well with BigQuery I/O.

Key Features / Configurations:
- DLP de-identification template using cryptoReplaceFfxFpeConfig (FPE/FFX mode) to preserve format/length.
- Deterministic behavior via consistent keying material and configuration; optionally use a stable surrogate/“tweak” strategy if required by policy.
- Cloud KMS-managed key encryption key (KEK) to protect the DLP crypto key material (enables centralized IAM, rotation, and audit logs).
- Dataflow pipeline (batch) reading from BigQuery, applying the DLP transform to specific columns, and writing back to BigQuery for training.
- Principle of least privilege: the Dataflow service account needs BigQuery read/write and DLP/KMS permissions; restrict decrypt capability to audit teams.

Common Misconceptions:
- “Randomizing” values removes PII but breaks joins/validations and is not reversible.
- Standard encryption with random salts improves security but defeats determinism and typically changes length/format, breaking downstream SQL expectations.
- Dropping PII columns violates the requirement that every column is needed as a feature.

Exam Tips: When you see requirements for (a) preserving format/length, (b) deterministic tokenization, and (c) reversible access for authorized users, think Cloud DLP FPE + Cloud KMS. For very large BigQuery datasets, pair it with Dataflow for scalable batch processing and use DLP templates for repeatability and governance (aligned with the Google Cloud Architecture Framework’s security and operational excellence pillars).
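The FPE configuration described above might look roughly like the fragment below. This is a hedged sketch, not a complete pipeline: field names mirror the DLP API's cryptoReplaceFfxFpeConfig (snake_case as in the Python client), while the KMS key path and wrapped key value are placeholders; a real job would submit this via the DLP client from a Dataflow transform.

```python
# Sketch of a DLP de-identify config using format-preserving encryption
# (FPE/FFX) with a Cloud KMS-wrapped key. Key path and wrapped key
# bytes below are placeholders.
def build_fpe_transform(kms_key_name: str, wrapped_key_b64: str) -> dict:
    return {
        "deidentify_config": {
            "record_transformations": {
                "field_transformations": [{
                    "fields": [{"name": "card_number"}],
                    "primitive_transformation": {
                        "crypto_replace_ffx_fpe_config": {
                            "crypto_key": {
                                "kms_wrapped": {
                                    "wrapped_key": wrapped_key_b64,
                                    "crypto_key_name": kms_key_name,
                                }
                            },
                            # Digits-only output alphabet so the card
                            # number keeps its length and character set.
                            "common_alphabet": "NUMERIC",
                        }
                    },
                }]
            }
        }
    }

cfg = build_fpe_transform(
    "projects/my-proj/locations/global/keyRings/ring/cryptoKeys/key",  # placeholder
    "base64-wrapped-key==",                                            # placeholder
)
```

Because FPE with a fixed key is deterministic, the same card number always maps to the same protected value, which is what keeps joins and validations working; decryption for audits uses the same key material under KMS IAM.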
Your team deployed a regression model that predicts hourly water usage for industrial chillers. Four months after launch, a vendor firmware update changed sensor sampling and units for three input features, and the live feature distributions diverged: 5 of 18 features now have a population stability index > 0.25, 27% of temperature readings fall outside the training range, and production RMSE increased from 0.62 to 1.45. How should you address the input differences in production?
Correct. The evidence (high PSI, many values outside training range, RMSE jump) indicates data drift/skew driven by upstream firmware changes. Automated monitoring with alerts is required to detect and quantify ongoing drift, and an automated retraining pipeline using recent production data (with corrected preprocessing/unit normalization) is the standard operational response to restore performance and reduce future downtime.
Incorrect. Feature selection may reduce sensitivity to some drifting inputs, but it does not fix the root cause: the meaning/units/sampling of inputs changed. Removing “low-importance” features also won’t address that 27% of temperature values are out of range or that multiple key features have shifted. You still need monitoring and likely preprocessing updates and retraining on correctly interpreted data.
Incorrect. Hyperparameter tuning (e.g., L2 regularization) addresses overfitting/underfitting, not a production data contract change. With unit changes and distribution shift, tuning regularization may slightly stabilize predictions but will not restore accuracy reliably. The correct approach is to detect drift, validate feature semantics, update transformations, and retrain with representative recent data.
Incorrect. A fixed monthly retraining schedule is weaker than event-driven monitoring and response. It can leave the system degraded for weeks after a sudden upstream change. Feature selection still doesn’t solve unit/sampling changes. Best practice is continuous monitoring with alerting and automated pipelines that retrain when drift/performance thresholds are exceeded, with evaluation gates before deployment.
Core Concept: This scenario tests production ML monitoring for data skew/drift and the operational response when upstream systems change. In Google Cloud, this maps to Vertex AI Model Monitoring (feature skew/drift, out-of-distribution detection, performance monitoring) plus an automated retraining pipeline (Vertex AI Pipelines/Cloud Composer/Cloud Build) to continuously adapt models.

Why the Answer is Correct: A vendor firmware update changed sampling and units for multiple input features, causing clear distribution shift (PSI > 0.25 on 5/18 features, 27% of temperature readings outside the training range) and a large performance regression (RMSE 0.62 → 1.45). This is not primarily a modeling/regularization problem; it’s a data contract and data drift problem. The correct response is (1) detect and alert on skew/drift and (2) update the model (and often the preprocessing) using recent, correctly interpreted production data. Automated monitoring prevents silent degradation, and a retraining pipeline shortens mean time to recovery when upstream changes recur.

Key Features / Best Practices:
- Use Vertex AI Model Monitoring to track feature skew/drift (training vs serving), set thresholds, and route alerts to Cloud Monitoring.
- Log prediction requests/responses and ground truth (when available) to enable performance monitoring (RMSE) and root-cause analysis.
- Implement robust feature engineering with explicit unit normalization and schema validation (e.g., TFDV/Great Expectations) to catch unit changes early.
- Automate retraining with Vertex AI Pipelines, including data extraction, validation, training, evaluation gates, and safe rollout (canary/rollback).

Common Misconceptions: It’s tempting to “fix” the model with feature selection or stronger regularization, but those do not address incorrect units/sampling or out-of-range inputs. If the feature semantics changed, the model is effectively receiving different variables than it was trained on.
Exam Tips: When you see PSI drift, out-of-training-range rates, and degraded metrics, prioritize monitoring + data validation + retraining/refresh. For upstream changes, also consider updating preprocessing/feature store transformations and establishing data contracts with vendors. In the Architecture Framework, this aligns with Operational Excellence (monitoring/automation) and Reliability (rapid detection and recovery).
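The PSI threshold cited in the scenario (> 0.25) comes from a standard drift statistic that is straightforward to compute over binned feature distributions. A minimal sketch:

```python
import math

def psi(expected: list, actual: list, eps: float = 1e-6) -> float:
    """Population Stability Index between two binned distributions.

    expected/actual: counts per bin from the training (baseline) and
    serving (current) data, computed over the same bin edges.
    eps guards against log(0) for empty bins.
    """
    e_total, a_total = sum(expected), sum(actual)
    total = 0.0
    for e, a in zip(expected, actual):
        e_pct = max(e / e_total, eps)
        a_pct = max(a / a_total, eps)
        total += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return total

# Identical distributions -> PSI of exactly 0 (no drift).
print(psi([100, 200, 300], [100, 200, 300]) == 0.0)  # True
# Heavily shifted distribution -> PSI well above 0.25.
print(psi([100, 200, 300], [300, 200, 100]) > 0.25)  # True
```

A common rule of thumb is PSI < 0.1 (stable), 0.1 to 0.25 (moderate shift), and > 0.25 (significant shift), which is why 5 of 18 features exceeding 0.25 is treated as a strong drift signal in this question.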
You are a data scientist at a city transportation agency tasked with forecasting hourly bike-share demand per station to optimize rebalancing. Your historical trips table in BigQuery contains 24 months of data (~22 million rows) with columns: timestamp, station_id, neighborhood, weather_condition (sunny/rainy/snow), special_event (boolean), and surge_pricing_flag (boolean). You need to choose the most effective combination of a BigQuery ML model and feature engineering to minimize RMSE while capturing weekly/seasonal patterns and handling multiple categorical variables; what should you do?
Best choice: LINEAR_REG can perform very well when you engineer strong temporal features (hour, day-of-week, month) that expose weekly/seasonal cycles. One-hot encoding is appropriate for nominal categories like station_id and weather_condition, preventing false ordinal relationships. This combination is a standard, exam-favored approach in BigQuery ML for demand prediction with multiple categorical inputs and periodic patterns.
Boosted trees can be powerful, but label encoding introduces artificial ordering for nominal categories (e.g., station_id=10 > station_id=2), which can distort splits and reduce generalization. Casting timestamp to a single Unix time value does not explicitly represent cyclical weekly/seasonal patterns; the model must infer periodicity indirectly, often leading to worse RMSE than using engineered time components.
Autoencoders are primarily unsupervised models used for dimensionality reduction or anomaly detection, not for directly optimizing supervised regression RMSE on demand forecasting. Label encoding has the same ordinal-risk issue as in option B. Normalizing timestamp to [0,1] still fails to capture cyclical structure (e.g., hour 23 is close to hour 0), so it’s not effective for weekly/seasonal patterns.
Matrix factorization is suited to collaborative filtering (user-item rating prediction) and latent factor discovery, not time-dependent demand regression with exogenous features. While interaction features (station_id x weather) can be useful, the underlying model type is mismatched for forecasting hourly demand and won’t naturally incorporate temporal seasonality without additional time feature engineering and a regression-appropriate algorithm.
Core Concept: This question tests selecting an appropriate BigQuery ML model type and feature engineering for time-based demand forecasting with many categorical variables. It emphasizes how model choice interacts with feature representation (one-hot vs label encoding) and how to encode seasonality/weekly patterns from timestamps.

Why the Answer is Correct: Option A (linear regression + one-hot encoding + explicit time features) is the most effective and reliable BigQuery ML approach among the choices for minimizing RMSE while capturing weekly/seasonal patterns. In BigQuery ML, linear models (LINEAR_REG) work well when you provide informative engineered features. Creating hour-of-day, day-of-week, and month features (and often additional ones like is_weekend, holiday flags, and cyclic transforms) allows the model to learn periodic demand patterns. One-hot encoding is appropriate for nominal categorical variables (station_id, neighborhood, weather_condition) because it avoids imposing the artificial ordering that label encoding introduces, which can degrade performance and stability.

Key Features / Best Practices:
- Use BigQuery ML LINEAR_REG with automatic feature preprocessing where appropriate, but explicitly engineer time-derived features to expose periodicity.
- One-hot encode nominal categories; consider reducing cardinality (e.g., station_id) by grouping rare stations or using neighborhood-level features if needed for sparsity/compute.
- Consider interactions (e.g., station_id x hour, weather x hour) if supported via feature crosses in SQL to capture localized temporal effects.
- Use proper train/validation splits by time (e.g., last N weeks as eval) to avoid leakage and to reflect forecasting reality.

Common Misconceptions: Boosted trees are often strong, but casting timestamp to a single Unix number (Option B) hides cyclical structure; trees may learn some thresholds but typically won’t represent weekly/seasonal periodicity as cleanly as explicit time features. Label encoding for nominal categories can mislead both linear and tree models by implying rank/order. Autoencoders (Option C) are for representation learning/anomaly detection, not supervised regression RMSE optimization in this setup. Matrix factorization (Option D) is designed for recommendation-style user-item interactions, not time-series demand regression with exogenous variables.

Exam Tips: For BigQuery ML forecasting-like problems without using specialized time-series models, prioritize: (1) explicit time feature engineering for periodicity, (2) correct categorical handling (one-hot for nominal), and (3) time-based evaluation splits. Watch for answers that “simplify” timestamps into a single numeric value—this usually harms seasonality learning and is a common trap.
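In BigQuery ML this feature engineering is written in SQL, but the same transformations can be sketched in Python to make "explicit time features + one-hot encoding" concrete. The cyclic sin/cos encoding is the optional extra mentioned above, ensuring hour 23 sits next to hour 0; the function names here are illustrative, not any library's API.

```python
import math
from datetime import datetime

def time_features(ts: datetime) -> dict:
    """Expose periodic structure explicitly instead of a raw epoch value."""
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),  # 0=Mon .. 6=Sun
        "month": ts.month,
        "is_weekend": int(ts.weekday() >= 5),
        # Cyclic encoding so hour 23 is numerically close to hour 0.
        "hour_sin": math.sin(2 * math.pi * ts.hour / 24),
        "hour_cos": math.cos(2 * math.pi * ts.hour / 24),
    }

def one_hot(value: str, vocabulary: list) -> dict:
    """One-hot encode a nominal category; implies no false ordering."""
    return {v: int(value == v) for v in vocabulary}

row = time_features(datetime(2024, 7, 6, 23))  # a Saturday, 23:00
row.update(one_hot("rainy", ["sunny", "rainy", "snow"]))
print(row["is_weekend"], row["rainy"])  # 1 1
```

The contrast with the trap options is visible here: a raw Unix timestamp collapses all of this into one monotonic number, and label encoding would replace the three one-hot weather columns with an arbitrary 0/1/2 ordering.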
You work for a vacation rental marketplace with 1.8 million property listings stored across BigQuery and Cloud Storage; the current search relies on keyword matching and filter chips, but you are seeing more complex semantic queries that reference amenities and metadata (for example, "quiet pet-friendly cabin near a lake with a fireplace, sleeps 6, under $200/night, host rating > 4.7"). You must deliver a revamped semantic search proof of concept within 2 weeks with minimal custom modeling and integration effort that can quickly index both structured listing attributes and unstructured descriptions; what should you choose as the search backend?
A foundational LLM is not a search backend or index. While an LLM can interpret queries and generate responses, it does not natively provide document/listing indexing, deterministic filtering (price, rating thresholds), faceting, or scalable retrieval over 1.8M listings. You would still need a retrieval system (keyword, vector, or managed search) to ground results and meet latency/cost expectations.
Vertex AI Vector Search is excellent for approximate nearest neighbor retrieval over embeddings and can support semantic similarity at scale. However, for a 2-week POC with minimal integration, it is usually more work: you must build ingestion, generate embeddings for descriptions, design and maintain metadata filtering, and implement hybrid ranking and query orchestration. It’s a component, not a full search product.
Hosting a BERT-based model on a Vertex AI endpoint implies building a custom semantic retrieval/ranking system. You would need to handle embedding generation, indexing, retrieval, and filtering yourself, plus manage model serving, scaling, and updates. This is higher engineering effort and risk for a 2-week proof of concept, and it’s unnecessary when managed search capabilities exist.
Vertex AI Agent Builder (Search) provides a low-code, managed semantic search backend with ingestion, indexing, hybrid retrieval, and metadata filtering/faceting. It is well-suited for quickly unifying structured listing attributes with unstructured descriptions and delivering an API-ready search experience. This matches the requirement for minimal custom modeling and rapid POC delivery.
Core Concept: This question tests choosing a low-code, production-ready semantic search backend on Google Cloud that can index both structured attributes (price, sleeps, ratings) and unstructured text (descriptions) with minimal custom modeling and fast time-to-value.

Why the Answer is Correct: Vertex AI Agent Builder (specifically its Search capability, formerly tied to Gen App Builder/Enterprise Search) is designed to stand up semantic search quickly. It provides managed ingestion, indexing, and retrieval over heterogeneous data sources, and supports hybrid retrieval (keyword + semantic) and filtering/faceting over structured metadata—exactly what a marketplace search needs. For a 2-week proof of concept, Agent Builder minimizes integration effort: you configure a data store, connect sources (e.g., Cloud Storage documents/JSON, BigQuery exports or feeds), map metadata fields, and get an API/UI-ready search experience without building and serving your own retrieval stack.

Key Features / Best Practices:
- Hybrid search: combines lexical matching with semantic relevance to handle complex queries.
- Metadata filtering and facets: critical for constraints like “sleeps 6”, “under $200”, “rating > 4.7”, “pet-friendly”.
- Managed indexing and relevance tuning: reduces operational burden versus self-managed pipelines.
- Rapid POC path: minimal custom modeling; you can optionally add embeddings/LLM-based query understanding later.
- Architecture Framework alignment: accelerates delivery (performance and operational excellence) while reducing undifferentiated heavy lifting (reliability and security via managed service).

Common Misconceptions: Many candidates jump to “Vector Search” because semantic search implies embeddings. However, vector retrieval alone doesn’t provide a complete search product: you still must build ingestion, chunking, embedding generation, metadata schema, filtering logic, ranking, and query orchestration. Agent Builder packages these capabilities into a cohesive search backend.

Exam Tips:
- If the prompt emphasizes “2 weeks”, “minimal custom modeling”, and a “search backend” with structured + unstructured data, look for managed search solutions (Agent Builder/Search) rather than raw model hosting or standalone vector databases.
- Use Vector Search when you are building a custom RAG/retrieval layer and can invest in pipeline and ranking logic; use Agent Builder when you want an end-to-end enterprise search experience quickly.
- LLMs are not search indexes; they complement retrieval but don’t replace indexing and filtering requirements.
You are a data scientist at a national power utility analyzing 850 million smart-meter readings from 3,000 substations collected over 5 years; for exploratory analysis, you must compute descriptive statistics (mean, median, mode) by device and region, perform complex hypothesis tests (e.g., differences between peak vs off-peak and seasonal periods with multiple comparisons), and plot feature variations at hourly and daily granularity over time, while using as much of the telemetry as possible and minimizing computational resources—what should you do?
Not ideal because it calculates descriptive statistics and runs statistical analyses inside a notebook after importing data. With 850 million rows, pulling large volumes into a user-managed notebook is expensive and slow, and may exceed memory/IO limits. Looker Studio is fine for visualization, but the heavy computations should be pushed down to BigQuery to minimize compute and leverage MPP execution.
Incorrect because it relies entirely on a Vertex AI Workbench user-managed notebook for importing and analyzing the full dataset. Notebooks are not designed as a primary engine for scanning and aggregating hundreds of millions of records; this typically requires large VM sizing, long runtimes, and high cost. It also reduces reproducibility and scalability compared to BigQuery-based processing.
Partially correct: BigQuery is the right place for descriptive statistics at scale, and Workbench can run complex hypothesis tests. However, using notebooks to generate all time plots is not the most resource-efficient approach for interactive hourly/daily visual exploration. Looker Studio can query BigQuery directly and offload visualization without keeping notebook compute running.
Correct: BigQuery handles large-scale aggregations and descriptive statistics efficiently, minimizing compute and cost. Looker Studio connects directly to BigQuery for interactive time-series plots at hourly/daily granularity without exporting data. Vertex AI Workbench is then used only for complex hypothesis testing and multiple-comparison procedures, ideally on filtered/aggregated extracts from BigQuery, balancing fidelity with resource efficiency.
Core Concept: This question tests choosing the right tools for large-scale exploratory data analysis (EDA) on Google Cloud: push aggregation and filtering to BigQuery (serverless MPP analytics), use a BI tool for interactive visualization, and reserve notebooks for advanced statistics that are not easily expressed in SQL. Why the Answer is Correct: With 850 million time-series readings, importing “full data” into a notebook is inefficient and often infeasible due to memory/IO limits and high compute cost. BigQuery is designed to scan and aggregate massive datasets efficiently and can compute descriptive statistics by device/region (mean, approximate quantiles for median, counts for mode) using SQL at scale. For plotting hourly/daily variations over time, Looker Studio (formerly Data Studio) can query BigQuery directly, enabling interactive dashboards without exporting data or running a notebook continuously. Complex hypothesis tests with multiple comparisons (e.g., t-tests/ANOVA variants, nonparametric tests, p-value adjustments) are better handled in Python/R in Vertex AI Workbench; critically, the notebook should query only the necessary slices/aggregates from BigQuery to minimize resources while still using as much telemetry as possible. Key Features / Best Practices: - BigQuery: partitioning by timestamp and clustering by device_id/region to reduce scanned bytes and cost; approximate quantiles for scalable median; materialized views or scheduled queries for repeated rollups. - Looker Studio: direct BigQuery connector, cached results, parameterized filters for peak/off-peak and seasonal windows. - Vertex AI Workbench: use BigQuery client/BigQuery Storage API to pull only required subsets; run statistical libraries (SciPy/Statsmodels) for hypothesis testing and multiple-comparison corrections. 
These align with Google Cloud Architecture Framework principles: choose managed services, optimize cost/performance, and separate concerns (analytics vs visualization vs advanced computation).
Common Misconceptions: A and B assume notebooks are the primary engine for both aggregation and visualization, but notebooks are not optimized for scanning hundreds of millions of rows and lead to oversized instances and long runtimes. C is close, but it misses the most resource-efficient approach for visualization: using Looker Studio directly on BigQuery avoids notebook-based plotting workloads and supports broad stakeholder exploration.
Exam Tips: For very large datasets, default to BigQuery for heavy aggregations and filtering, BI tools for dashboards, and notebooks for specialized analyses. Watch for phrases like “minimize computational resources” and “use as much telemetry as possible”—they usually imply serverless analytics (BigQuery) plus direct-connect visualization rather than exporting data into notebooks.
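The multiple-comparison procedures mentioned above can be made concrete. Below is a minimal pure-Python sketch of the Benjamini-Hochberg false discovery rate procedure, the kind of p-value adjustment a Workbench notebook would normally apply via Statsmodels' `multipletests(..., method="fdr_bh")`; the function name and inputs here are illustrative, not from any Google Cloud API.

```python
def benjamini_hochberg(pvals, alpha=0.05):
    """Return a reject/keep flag per p-value under the BH FDR procedure.

    Illustrative sketch; in practice use statsmodels' multipletests.
    """
    m = len(pvals)
    # Sort indices by ascending p-value.
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k whose p-value lies under the BH line rank/m * alpha.
    k = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= rank / m * alpha:
            k = rank
    # Reject every hypothesis with rank <= k (in sorted order).
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k:
            reject[i] = True
    return reject
```

For example, with per-device p-values `[0.01, 0.5, 0.02, 0.03]` and alpha 0.05, the first, third, and fourth tests survive the correction while the second does not.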
You are launching a grocery delivery mobile app across 3 cities and will use Google Cloud's Recommendations AI to build, test, and deploy product suggestions; you currently capture about 2.5 million user events per day, maintain a catalog of 120,000 SKUs with accurate price and availability, and your business objective is to raise average order value (AOV) by at least 6% within the next quarter while adhering to best practices. Which approach should you take to develop recommendations that most directly increase revenue under these constraints?
"You Might Also Like" is commonly used for discovery and can improve click-through rate on a home feed, but CTR is not the same as revenue. Home feed users may be in browsing mode, so incremental clicks may not translate into larger baskets or higher AOV. For an explicit AOV +6% goal in a short timeframe, cross-sell at high-intent moments is typically more effective than broad personalization on the home screen.
"Frequently Bought Together" is purpose-built to recommend complementary items that increase basket size (attach rate). Showing these recommendations on product detail and cart pages targets users close to purchase, which most directly impacts AOV and revenue. This aligns with best practices: leverage strong purchase/add-to-cart signals from high event volume and use accurate catalog metadata (price/availability) to avoid recommending out-of-stock or irrelevant items.
This reverses best practice. Recommendations AI Retail requires a product catalog to exist so events can be correctly attributed to items and enriched with metadata. Importing events first can lead to unmatched item IDs, reduced training quality, and delayed time-to-value. The system does not "backfill" missing metadata reliably from events; you should upload/maintain the catalog first (or in parallel) and then stream/import user events.
Creating placeholder SKUs with default categories/prices is explicitly counterproductive for recommendation quality and business outcomes. Recommendations AI relies on accurate item attributes (category, price, availability) to train and to filter/serve relevant results. Placeholders can cause irrelevant or misleading recommendations (e.g., wrong price or out-of-stock), harming user trust and conversion. It also complicates governance and measurement during A/B tests, producing noisy results.
Core Concept: This question tests how to choose the most appropriate Recommendations AI model type and placement to meet a concrete business KPI (increase AOV/revenue) while following data-quality best practices. Recommendations AI offers different recommendation types optimized for different user intents and surfaces (home feed vs product detail vs cart).
Why the Answer is Correct: To most directly increase revenue/AOV, you want to increase basket size and attach rate (adding complementary items to an order). "Frequently Bought Together" is designed for cross-sell by recommending items that are commonly purchased in the same transaction. Placing it on product detail pages and especially the cart page targets users at high purchase intent, where incremental add-ons are most likely to convert and increase order value within a quarter. With 2.5M events/day and a well-maintained catalog (120k SKUs with accurate price/availability), you have the key prerequisites to train and serve high-quality recommendations quickly.
Key Features / Best Practices:
- Use the Retail domain in Recommendations AI with a complete, accurate product catalog (including price, availability, categories, and attributes) and high-volume user events (view, add-to-cart, purchase).
- Ensure event logging includes user IDs (or visitor IDs), product IDs, timestamps, and event types; prioritize purchase and add-to-cart signals for revenue impact.
- Run online A/B tests (e.g., via your experimentation framework) comparing placements (PDP vs cart) and measure AOV, conversion rate, and revenue per session, not just CTR.
- Follow the Google Cloud Architecture Framework: align with business goals (AOV), ensure data quality and governance (accurate catalog), and design for reliability/observability (monitor recommendation serving latency and drift).
Common Misconceptions: CTR-optimized placements (home feed) can look successful but may not move revenue.
Also, trying to shortcut catalog quality (placeholders) often degrades model performance and can violate best practices for Retail recommendations.
Exam Tips: When the KPI is revenue/AOV, prefer cross-sell/upsell recommendation types and high-intent surfaces (PDP/cart). When the KPI is engagement/discovery, home-feed "You Might Also Like" can be appropriate. Always import the catalog first (or keep it current) and avoid synthetic placeholders; Recommendations AI depends heavily on accurate item metadata and availability.
You are training a LightGBM model to forecast daily inventory for 120 stores using a small dataset (~60 MB) on Vertex AI; your training script needs a system library (libgomp) and several custom Python packages, and each run takes about 10 minutes, so you want job startup time to be under 2 minutes to minimize overhead. How should you configure the Vertex AI custom training job to minimize startup time while keeping the dataset easy to update?
Correct. A custom container image pre-installs libgomp and all Python dependencies, eliminating runtime installation overhead and improving reproducibility. Keeping the dataset in Cloud Storage makes it easy to update independently of the container image. For a 60 MB dataset, Cloud Storage read time is small compared to the time saved by avoiding pip/apt installs, helping meet the <2 minute startup goal.
Incorrect. The main problem is that this option relies on the scikit-learn prebuilt container and installs custom dependencies at runtime, which adds startup latency and variability that works against the under-2-minute requirement. It also does not properly address the need for the system library libgomp, because OS-level dependencies are not reliably handled through a Python source distribution or normal pip installation. In addition, bundling the dataset with the source package is not a good pattern for Vertex AI training inputs, since datasets should generally be stored separately in Cloud Storage so they can be updated and versioned independently of the training code.
Incorrect. Baking the dataset into the container can reduce data download time, but it makes the dataset difficult to update because every data refresh requires rebuilding and redeploying the container image. It also increases image size, which can increase image pull time and negate startup benefits. This conflicts with the requirement to keep the dataset easy to update.
Incorrect. Keeping data in Cloud Storage is good for updates, but installing dependencies at startup from a requirements file is the main problem: it adds significant latency and can be unreliable due to network/package repository issues. Additionally, libgomp is a system library and may require apt-get steps, further increasing startup time beyond the 2-minute requirement.
Core Concept: This question tests how Vertex AI Custom Training startup time is affected by environment provisioning (container image pull + dependency installation) versus data access patterns (reading from Cloud Storage). It also tests best practices for packaging system libraries (like libgomp for LightGBM) and Python dependencies to ensure fast, repeatable jobs.
Why the Answer is Correct: Option A minimizes startup time by prebuilding a custom container image that already contains (1) the OS-level dependency libgomp and (2) all required Python packages and your training entrypoint. With this approach, Vertex AI only needs to schedule the worker and pull the image; it does not spend extra minutes installing apt packages or pip dependencies at runtime. The dataset remains in Cloud Storage, which keeps it easy to update without rebuilding the image. For a small dataset (~60 MB), reading from Cloud Storage adds minimal overhead and is typically far less than runtime dependency installation.
Key Features / Best Practices:
- Use a custom container for Custom Training when you need system libraries or nontrivial dependencies. This aligns with reproducibility and operational excellence in the Google Cloud Architecture Framework.
- Keep data external (Cloud Storage) so updates don’t require image rebuilds; version data via object versioning or date-partitioned paths.
- Avoid runtime installs: pip/apt at startup increases latency, introduces network variability, and can fail due to transient repository issues.
- LightGBM commonly requires libgomp (OpenMP); baking it into the image is the reliable approach.
Common Misconceptions: A frequent mistake is assuming prebuilt containers plus requirements.txt is “good enough.” In practice, installing system packages and Python wheels at job start often pushes startup beyond a 2-minute target. Another misconception is baking data into the image for speed; that harms maintainability and forces rebuilds for every data refresh.
Exam Tips: For Vertex AI Custom Training, choose custom containers when you need OS-level dependencies or strict startup SLAs. Keep datasets in Cloud Storage (or BigQuery) for easy updates and governance. Prebuilt containers are best when dependencies are minimal and can be installed quickly, or when you can accept longer startup times.
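As a concrete illustration of the custom-container approach described above, a minimal Dockerfile might look like the following. The base image tag, package list, and entrypoint path are assumptions for the sketch, not details taken from the question.

```dockerfile
# Assumed base image and paths -- adjust to your project.
FROM python:3.10-slim

# Bake the OS-level dependency LightGBM needs (libgomp) into the image
# at build time, so nothing is apt-installed at job startup.
RUN apt-get update && apt-get install -y --no-install-recommends libgomp1 \
    && rm -rf /var/lib/apt/lists/*

# Pre-install Python dependencies at build time, not at runtime.
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Training code reads its dataset from Cloud Storage at run time,
# so data updates never require rebuilding this image.
COPY trainer/ /trainer/
ENTRYPOINT ["python", "/trainer/task.py"]
```

The image would be pushed to Artifact Registry and referenced in the custom job's container spec; between runs, only the Cloud Storage dataset path changes.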
You are building an end-to-end scikit-learn MLOps workflow in Vertex AI Pipelines (Kubeflow Pipelines) that ingests 50 GB of CSV data from Cloud Storage, performs data cleaning, feature selection, model training, and model evaluation, then writes a .pkl model artifact to a versioned path in a GCS bucket. You are iterating on multiple versions of the feature selection and training components, submitting each version as a new pipeline run in us-central1 on n1-standard-4 CPU-only executors; each end-to-end run currently takes about 80 minutes. You want to reduce iteration time during development without increasing your GCP costs; what should you do?
Skipping or commenting out components can reduce runtime, but it is a manual, error-prone workflow that changes the pipeline graph and may bypass important validations. It also doesn’t leverage Vertex AI Pipelines’ built-in orchestration best practices. In team settings, this harms reproducibility and makes it harder to compare runs because different runs execute different subsets of steps.
Step caching is designed for exactly this scenario: repeated pipeline runs where only a subset of components change. With caching enabled, unchanged steps (like ingestion and cleaning) can reuse prior outputs, cutting iteration time while typically lowering costs because fewer tasks execute. This is the most direct, MLOps-aligned approach within Vertex AI Pipelines/Kubeflow Pipelines.
Dataflow can be excellent for large-scale ETL, but migrating feature processing to Dataflow is a redesign that adds operational overhead and may increase costs (Dataflow job charges, potential always-on resources, and additional integration). It also doesn’t inherently solve iteration speed if the pipeline still retriggers expensive processing; caching would still be needed.
Adding a T4 GPU increases cost and often provides little to no benefit for scikit-learn training, which is typically CPU-bound and not GPU-accelerated. Even if training sped up, the overall 80-minute runtime likely includes significant data ingestion/cleaning time, so a GPU would not address the main bottleneck and violates the “without increasing costs” requirement.
Core Concept: This question tests Vertex AI Pipelines (Kubeflow Pipelines) execution optimization during iterative development, specifically pipeline/step caching (a.k.a. reuse of execution results) to avoid recomputing unchanged components.
Why the Answer is Correct: Enabling step caching allows Vertex AI Pipelines to reuse outputs from prior runs when a component’s inputs, container image, command, and relevant metadata have not changed. In an iterative workflow where you repeatedly modify only feature selection and training, the expensive upstream steps (e.g., ingesting 50 GB from Cloud Storage, cleaning, and any stable preprocessing) can be skipped automatically, reducing end-to-end runtime without adding compute resources. Because you are not increasing machine sizes or adding accelerators, costs typically decrease (fewer CPU-minutes consumed) while iteration speed improves.
Key Features / Best Practices: Vertex AI Pipelines supports caching at the task/component level. Best practice is to:
1) Ensure deterministic components (same inputs -> same outputs) and stable base images.
2) Version inputs explicitly (e.g., GCS URIs with generation numbers or versioned paths) so cache behavior is predictable.
3) Avoid embedding timestamps/randomness in component logic or output paths unless intentionally invalidating the cache.
4) Use pipeline parameters for feature-selection configuration so only the affected steps invalidate.
This aligns with the Google Cloud Architecture Framework principles of cost optimization and operational excellence by reducing wasteful recomputation.
Common Misconceptions: It’s tempting to “comment out” steps (Option A), but that changes the pipeline definition and can break dependencies, reduce test coverage, and doesn’t scale as a disciplined MLOps practice.
Moving to Dataflow (Option C) may improve performance but introduces additional services and can increase costs/complexity; it’s not the most direct solution for iteration speed “without increasing costs.” Adding a GPU (Option D) increases cost and may not help scikit-learn CPU-bound training.
Exam Tips: For questions about faster iteration in pipelines, first consider caching, modular components, and parameterization before scaling hardware. On the exam, “reduce time without increasing cost” strongly signals reuse/caching rather than bigger machines, GPUs, or service migrations.
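The caching behavior described above can be sketched in a few lines of plain Python. This is not the Vertex AI Pipelines implementation, only a toy illustration of content-based caching: a step's cache key is derived from its name, container image, and serialized inputs, so unchanged upstream steps hit the cache while steps with modified inputs re-execute. All names and the `gs://` URIs are invented for the example.

```python
import hashlib
import json

_cache = {}

def run_step(name, fn, inputs, image="trainer:v1"):
    """Execute a pipeline step, reusing a prior result on a cache hit.

    The key mirrors what a pipeline engine fingerprints: component name,
    container image, and the serialized inputs.
    """
    key = hashlib.sha256(json.dumps(
        {"name": name, "image": image, "inputs": inputs}, sort_keys=True
    ).encode()).hexdigest()
    if key in _cache:
        return _cache[key], True      # cache hit: skip execution
    out = fn(inputs)                  # cache miss: run the step
    _cache[key] = out
    return out, False
```

Rerunning the pipeline with an unchanged ingestion input reuses the stored output; changing the dataset URI (or the image tag) invalidates the key and the step runs again.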
Your team must deliver an ML solution on Google Cloud to triage warranty claim emails for a global appliance manufacturer into 8 categories within 4 weeks. You are required to use TensorFlow to maintain full control over the model's code, serving, and deployment, and you will orchestrate the workflow with Kubeflow Pipelines. You have 30,000 labeled examples and want to accelerate delivery by leveraging existing resources and managed services instead of training a brand-new model from scratch. How should you build the classifier?
Natural Language API provides pretrained NLP capabilities (sentiment, entity extraction, and a limited content classification taxonomy). It is fast to integrate but offers minimal control over model architecture, training, and deployment. It also may not support custom 8-category warranty triage labels. This conflicts with the requirement to use TensorFlow and maintain full control over code, serving, and deployment, making it unsuitable here.
AutoML Natural Language can train a custom text classifier quickly from labeled data and is often a strong choice for rapid delivery. However, it is a managed training and serving solution where you do not maintain full control over the TensorFlow model code and deployment mechanics. While it can be orchestrated in pipelines, it violates the explicit requirement for full TensorFlow control, so it is not the best answer.
Transfer learning with an established text classification model (pretrained language model or embedding backbone) lets you fine-tune on 30,000 labeled emails quickly and reliably. You keep full control by implementing training in TensorFlow, packaging the model, and deploying with custom serving (Vertex AI custom containers or GKE/KServe). This aligns with the 4-week timeline, leverages existing resources, and fits Kubeflow Pipelines orchestration.
Using an established text classification model “as-is” is unlikely to work because the target labels are specific (8 warranty triage categories) and won’t match the pretrained model’s original label set or taxonomy. Even if the model outputs generic categories, it won’t map cleanly to your business classes without adaptation. The requirement emphasizes leveraging existing resources, but still implies customization; transfer learning is needed.
Core concept: This question tests when to use transfer learning with TensorFlow on Google Cloud (Vertex AI/legacy AI Platform) versus fully managed “no/low-code” NLP services, under constraints requiring full control of model code, serving, and deployment, and pipeline orchestration with Kubeflow Pipelines.
Why the answer is correct: You have 30,000 labeled emails and only 4 weeks, so training a modern NLP model from scratch is unnecessary and risky. The requirement to “use TensorFlow to maintain full control over the model’s code, serving, and deployment” rules out managed black-box training/serving approaches (Natural Language API classification and AutoML Natural Language). The best fit is to start from an established text classification model (for example, a pretrained Transformer encoder or a TF Hub text embedding/classifier backbone) and fine-tune it on your 8 warranty categories. This is classic transfer learning: it accelerates convergence, reduces data requirements, and improves accuracy and time-to-market. You can implement training in TensorFlow, package the model artifact, and deploy it on Vertex AI Prediction (or GKE) with custom containers, all orchestrated via Kubeflow Pipelines.
Key features / best practices: Use pretrained language representations (e.g., BERT-style encoders or TF Hub text embeddings) and fine-tune a classification head for 8 classes. Build a Kubeflow Pipeline with components for data validation, preprocessing (tokenization), training, evaluation (precision/recall per class, confusion matrix), and conditional deployment. Use Vertex AI custom training jobs (or GKE) for reproducibility, and Vertex AI Model Registry + endpoints (or KFServing/KServe) for controlled serving. Account for the languages of a global email base (multilingual models if needed) and monitor for drift.
Common misconceptions: Managed APIs (Natural Language API) feel fast, but they don’t provide full control over model code and deployment.
AutoML is also fast, but it abstracts training and typically doesn’t satisfy “full control” requirements. Using a pretrained model “as-is” rarely matches domain-specific labels like warranty triage categories.
Exam tips: When a question explicitly requires TensorFlow control and custom deployment, prefer custom training/transfer learning over AutoML/APIs. When labels are domain-specific, expect fine-tuning rather than zero-shot or off-the-shelf classification. Map “accelerate delivery” + “limited data” to transfer learning.
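The freeze-the-backbone, train-a-new-head pattern can be illustrated without TensorFlow. The toy sketch below uses a fixed (frozen) character-count "embedding" standing in for a pretrained encoder, and trains only a small softmax classification head on top; the feature scheme, function names, and example labels are all invented for illustration, mirroring `base_model.trainable = False` plus a new `Dense` head in a real Keras fine-tune.

```python
import math

def frozen_embed(text):
    # Stand-in for a pretrained backbone: a fixed feature extractor
    # that is never updated during fine-tuning.
    v = [0.0] * 4
    for ch in text.lower():
        v[ord(ch) % 4] += 1.0
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def train_head(examples, classes, epochs=200, lr=0.5):
    # Only the head (a softmax layer) is trained; the embedding stays frozen.
    dim = 4
    W = [[0.0] * dim for _ in classes]
    b = [0.0] * len(classes)
    for _ in range(epochs):
        for text, label in examples:
            x = frozen_embed(text)
            logits = [sum(W[k][i] * x[i] for i in range(dim)) + b[k]
                      for k in range(len(classes))]
            m = max(logits)
            exps = [math.exp(z - m) for z in logits]
            total = sum(exps)
            probs = [e / total for e in exps]
            y = classes.index(label)
            for k in range(len(classes)):
                grad = probs[k] - (1.0 if k == y else 0.0)
                for i in range(dim):
                    W[k][i] -= lr * grad * x[i]   # cross-entropy gradient step
                b[k] -= lr * grad
    return W, b

def predict(text, W, b, classes):
    x = frozen_embed(text)
    logits = [sum(W[k][i] * x[i] for i in range(4)) + b[k]
              for k in range(len(classes))]
    return classes[logits.index(max(logits))]
```

Because only the head's weights are updated, a few thousand labeled examples suffice, which is exactly why transfer learning fits the 30,000-example, 4-week constraint in the question.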
You are building an anomaly detection model for an industrial IoT platform using Keras and TensorFlow. The last 24 months of sensor events (~900 million rows, ~2.6 TB) are stored in a single partitioned table in BigQuery, and you need to apply feature scaling, categorical encoding, and time-window aggregations in a cost-effective and efficient way before training. The trained model will be used to run weekly batch inference directly in BigQuery against newly ingested partitions. How should you implement the preprocessing workflow?
Dataproc/Spark can preprocess large datasets, but exporting transformed Parquet to Cloud Storage introduces extra data movement, storage management, and pipeline operations. It also risks training/serving skew because weekly inference is required to run directly in BigQuery; you would need to re-implement the same feature logic in BigQuery or continuously export new partitions. This is typically less cost-effective and less consistent than doing feature engineering in BigQuery.
Loading 2.6 TB (900M rows) into a local pandas DataFrame is infeasible due to memory and compute constraints and would be extremely slow and costly. It also breaks scalability and operational best practices for production ML. This option ignores distributed processing and BigQuery’s strengths, and it would not support ongoing weekly inference in BigQuery without duplicating preprocessing logic elsewhere.
BigQuery SQL is ideal for feature scaling, categorical encoding, and time-window aggregations at this scale using partition pruning, clustering, and window functions. Keeping preprocessing in BigQuery reduces data movement and supports consistent feature definitions for both training and weekly batch inference on new partitions. Using the TensorFlow I/O BigQuery connector to feed a tf.data pipeline enables scalable training input without exporting massive intermediate files.
Dataflow/Beam is strong for streaming and ETL, but writing preprocessed data as CSV is inefficient (large size, slow parsing, poor typing) and increases storage and pipeline overhead. Like option A, it also complicates training/serving consistency because inference must run in BigQuery; you would still need equivalent SQL feature logic or repeated exports for new partitions, increasing cost and operational complexity.
Core Concept: This question tests scalable feature engineering and training data input pipelines when the source of truth is BigQuery and inference will run in BigQuery. It emphasizes pushing preprocessing to the data (BigQuery SQL) and using efficient, distributed ingestion into TensorFlow.
Why the Answer is Correct: Option C aligns the entire workflow with BigQuery as the central analytical engine. BigQuery is well-suited for large-scale transformations (2.6 TB, 900M rows) using partition pruning, clustering, window functions, and SQL-based feature engineering. Doing scaling, categorical encoding, and time-window aggregations in BigQuery is cost-effective because you can restrict scans to relevant partitions (e.g., last 24 months) and materialize features into a derived table or view. For training, the TensorFlow I/O BigQuery connector (or equivalent BigQuery-to-tf.data integration) enables streaming data into a tf.data pipeline without exporting massive intermediate files, supporting shuffling, batching, and parallel reads. This also keeps the feature logic consistent with weekly batch inference “directly in BigQuery” (e.g., via BigQuery ML remote models or by applying the same SQL feature view to new partitions).
Key Features / Best Practices:
- Use partitioned tables and WHERE filters on partition columns to minimize bytes scanned and cost.
- Use window functions (e.g., SUM/AVG over time windows) and APPROX functions where appropriate for performance.
- Materialize engineered features into a partitioned/clustered feature table to avoid recomputation and improve repeatability.
- Ensure training/serving consistency by reusing the same SQL feature definitions for both training and weekly inference.
- Follow Google Cloud Architecture Framework principles: optimize cost (partition pruning), performance (BigQuery’s distributed execution), and operational excellence (single source of feature truth).
Common Misconceptions: Spark/Dataflow pipelines can be powerful, but exporting large intermediate datasets often increases operational overhead and storage costs, and risks training/serving skew if inference is done in BigQuery with different logic. CSV exports are especially inefficient at this scale.
Exam Tips: When data is already in BigQuery and inference will run in BigQuery, prefer SQL-based feature engineering and avoid unnecessary ETL exports. Look for answers that minimize data movement, leverage partitioning/clustering, and keep preprocessing logic consistent across training and serving.
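The time-window aggregations that the SQL window functions express can be pictured with a small pure-Python equivalent. This sketch is illustrative only (in the actual workflow this computation stays in BigQuery over 900M rows); it computes a trailing mean analogous to `AVG(value) OVER (ORDER BY hour RANGE BETWEEN window_hours PRECEDING AND CURRENT ROW)`.

```python
from collections import deque

def trailing_mean(events, window_hours):
    """Trailing mean per event over the last window_hours hours (inclusive).

    events: (hour, value) pairs sorted by hour. Pure-Python illustration
    of the window aggregation you would push into BigQuery instead.
    """
    out = []
    q = deque()
    total = 0.0
    for hour, value in events:
        q.append((hour, value))
        total += value
        # Drop points that fell out of the trailing window.
        while q[0][0] < hour - window_hours:
            _, old = q.popleft()
            total -= old
        out.append((hour, total / len(q)))
    return out
```

Note how a gap in the series (no reading at hour 2 below) simply shrinks the window, just as partition-pruned SQL would only aggregate the rows that exist.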
Your edtech company operates a live Q&A chat in virtual classrooms, where an automated text moderation model flags toxic messages. After recent complaints, you discover that benign messages referencing certain indigenous festivals are being misclassified as abusive; an audit on a 10,000-message holdout shows a 12–15% false positive rate for messages containing those festival names versus 3% overall, and those references make up <1% of your training set. With a tight budget and an overextended team this quarter, a major overhaul or full replacement is not feasible; what should you do?
Correct. The subgroup terms are rare in training, and the holdout audit shows a large false positive disparity. Adding curated or carefully reviewed synthetic non-toxic examples containing those festival phrases increases representation and helps the model learn correct context. This is a targeted, budget-friendly mitigation that addresses the root cause and can be validated via slice-based metrics before redeploying.
Incorrect. Fully switching to human moderation may reduce model-driven bias, but it is typically expensive, slow, and hard to scale for live classroom chat. It also removes the benefits of automation (latency and coverage) and is not aligned with the constraint that the team is overextended and budget is tight. A targeted model improvement is more practical.
Incorrect. Replacing with an off-the-shelf classifier is a major change with integration, evaluation, and retraining costs, and it may still exhibit similar bias or poor performance on domain-specific language (festival names, slang, classroom context). Even if adopted, you would still need representative data and slice-based evaluation, so it doesn’t solve the core issue efficiently.
Incorrect. Increasing the global threshold reduces overall flags but is a blunt instrument: it does not specifically reduce false positives for the affected subgroup and can significantly increase false negatives for genuinely toxic messages, undermining safety. It also fails the fairness objective because the disparity may persist even if the overall flag rate drops.
Core Concept: This question tests responsible ML operations: diagnosing and mitigating bias/representation gaps that cause disparate error rates across subgroups. It also touches on practical model improvement under constraints—targeted data augmentation and retraining rather than large architectural changes. This aligns with the Google Cloud Architecture Framework’s Responsible AI and Operational Excellence principles: measure, monitor, and iteratively improve with minimal-risk changes.
Why the Answer is Correct: The audit shows a clear slice-based performance issue: messages containing specific festival names have a 12–15% false positive rate vs 3% overall, and those terms are underrepresented (<1%) in training. This is a classic data imbalance/coverage problem leading to poor generalization for a minority subgroup. Adding targeted, clearly non-toxic examples (synthetic or curated) that include those phrases directly addresses the root cause by improving representation and helping the model learn that these tokens are not inherently toxic. This is the highest-leverage, lowest-cost intervention compared to replacing systems or changing global thresholds.
Key Features / Best Practices: Use slice-based evaluation (e.g., by keyword, locale, or demographic proxy) and track subgroup metrics (false positive rate, precision) before/after retraining. Prefer a small, high-quality augmentation set plus validation to avoid overfitting or introducing artifacts. If using synthetic data (LLM-generated), apply human review and deduplication, and ensure it matches production language patterns. Retrain with class/feature balancing and consider calibration checks so predicted toxicity aligns with real-world rates.
Common Misconceptions: Raising the global threshold (D) may reduce flags but does not fix the subgroup disparity and can increase false negatives for truly toxic content.
Replacing the model (C) is costly and risky; off-the-shelf models often share similar biases and still require domain adaptation. Removing automation (B) is operationally expensive and harms scalability/latency.
Exam Tips: When you see “subgroup has much worse error rate” plus “underrepresented in training,” the exam typically expects a data-centric fix: collect/augment representative data, then retrain and re-evaluate with slice metrics. Choose the option that addresses the root cause with minimal blast radius and aligns with responsible AI monitoring and continuous improvement.
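Slice-based evaluation, mentioned repeatedly above, is straightforward to compute. The sketch below compares the false positive rate inside a keyword-defined slice against the rest of the holdout, which is exactly how a disparity like 12-15% vs 3% would be measured. Function names and the sample messages are made up for illustration; a production pipeline might use TFMA or a BigQuery query instead.

```python
def false_positive_rate(pairs):
    # pairs: (predicted_toxic, actually_toxic) booleans
    fp = sum(1 for pred, truth in pairs if pred and not truth)
    negatives = sum(1 for _, truth in pairs if not truth)
    return fp / negatives if negatives else 0.0

def slice_fpr(dataset, in_slice):
    """FPR inside the subgroup slice vs everything else.

    dataset: (message, predicted_toxic, actually_toxic) triples.
    in_slice: predicate selecting the subgroup (e.g., festival keywords).
    """
    hit = [(p, t) for msg, p, t in dataset if in_slice(msg)]
    rest = [(p, t) for msg, p, t in dataset if not in_slice(msg)]
    return false_positive_rate(hit), false_positive_rate(rest)
```

Running the same comparison before and after retraining with the augmented data validates that the subgroup FPR actually converged toward the overall rate, rather than just trusting aggregate metrics.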
A fintech analytics team has migrated 12 time-series forecasting and anomaly-detection models to Google Cloud over the last 90 days and is now standardizing new training on Vertex AI. You must implement a system that automatically tracks model artifacts (datasets, feature snapshots, checkpoints, and model binaries) and end-to-end lineage across pipeline steps for dev, staging, and prod; the solution must be simple to adopt via reusable templates, require minimal custom code, retain lineage for at least 180 days, and scale to future models without re-architecting; what should you do?
Vertex AI Pipelines integrates natively with Vertex ML Metadata to automatically capture lineage: which inputs produced which outputs, per component execution, across pipeline runs. Using the Vertex AI SDK enables reusable pipeline templates and components, minimizing custom code while standardizing dev/stage/prod workflows. This scales well as new models are added because lineage capture is built-in and consistent, and artifact URIs are tracked centrally.
Mixing Vertex AI Pipelines for artifacts and MLflow for lineage adds unnecessary operational complexity and weakens standardization. MLflow tracking is not the native lineage store for Vertex Pipelines, so you would need custom integration to correlate pipeline steps, artifacts, and environments. This violates the “minimal custom code” and “simple to adopt via reusable templates” requirements and increases maintenance burden over time.
Vertex AI Experiments is primarily for experiment/run tracking (parameters, metrics, comparisons) and is not designed to provide full end-to-end lineage across multi-step pipelines and artifact dependencies. While ML Metadata can store lineage, using Experiments for artifacts is a mismatch: artifacts like datasets, checkpoints, and binaries are typically managed via pipeline artifacts and storage, with MLMD capturing their relationships automatically.
Using Cloud Composer to schedule lineage capture via Cloud Run functions is a custom-built metadata system. It requires significant bespoke code to infer lineage, map artifacts to steps, and maintain consistency across dev/stage/prod. This approach is harder to standardize, less reliable for audit-grade provenance, and does not leverage Vertex AI’s native MLMD integration, making it a poor fit for low-code, scalable governance.
Core Concept: This question tests end-to-end ML governance on Google Cloud: tracking artifacts (datasets, feature snapshots, checkpoints, model binaries) and lineage across pipeline steps/environments using Vertex AI Pipelines and Vertex ML Metadata (MLMD). This aligns with the Google Cloud Architecture Framework pillars of Operational Excellence (repeatable automation), Reliability (consistent provenance), and Security/Compliance (auditability).
Why the Answer is Correct: Vertex AI Pipelines (Kubeflow Pipelines on Vertex) automatically integrates with Vertex ML Metadata to record executions, inputs/outputs, and artifact URIs for each pipeline component. Using the Vertex AI SDK and reusable pipeline templates/components provides a low-code adoption path: teams standardize a pipeline pattern once, then future models inherit artifact and lineage tracking without re-architecting. This directly satisfies the requirement for minimal custom code, reusable templates, and scaling to additional models.
Key Features / How to Implement:
- Define pipelines with the Vertex AI SDK (KFP v2) and standard components (e.g., data extraction, feature generation, training, evaluation, deployment).
- Ensure each step produces typed artifacts (Dataset, Model, Metrics, etc.) and writes outputs to durable storage (typically Cloud Storage). MLMD stores metadata/lineage references to these artifacts.
- Use separate projects or environments (dev/stage/prod) with consistent pipeline templates; lineage is captured per run and can be queried for audits and debugging.
- Retention: MLMD retains lineage/metadata; the 180-day requirement is met by keeping metadata and underlying artifact storage (e.g., GCS lifecycle policies for binaries/checkpoints). If organizational policy requires, configure dataset/model artifact retention and access controls.
Common Misconceptions: Some assume MLflow is required for lineage; on Vertex, MLMD is the native lineage system tightly integrated with Pipelines.
Others confuse Vertex AI Experiments (run tracking) with full artifact lineage across pipeline steps; Experiments is not a complete lineage solution for multi-step pipelines. Exam Tips: When you see “end-to-end lineage across pipeline steps” and “simple via templates/minimal custom code,” default to Vertex AI Pipelines + Vertex ML Metadata. Prefer native integrations over assembling multiple tools unless requirements explicitly demand third-party portability. Also remember: metadata stores references; artifact retention is handled by the backing storage (often GCS) and lifecycle policies.
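The execution-to-artifact links described above can be illustrated with a toy, pure-Python lineage store (this is not the MLMD API; the step names and URIs are made up) to show what "querying lineage for audits" means in practice:

```python
# Toy lineage store illustrating the execution -> artifact links that
# Vertex ML Metadata records for each pipeline component.

class LineageStore:
    def __init__(self):
        self.executions = []  # each: {"step", "inputs", "outputs"}

    def record(self, step, inputs, outputs):
        self.executions.append({"step": step, "inputs": inputs, "outputs": outputs})

    def upstream(self, artifact):
        """Return every artifact that (transitively) produced `artifact`."""
        lineage, frontier = set(), [artifact]
        while frontier:
            current = frontier.pop()
            for ex in self.executions:
                if current in ex["outputs"]:
                    for parent in ex["inputs"]:
                        if parent not in lineage:
                            lineage.add(parent)
                            frontier.append(parent)
        return lineage

store = LineageStore()
store.record("extract",  ["bq://trips_raw"],          ["gs://bucket/dataset.csv"])
store.record("train",    ["gs://bucket/dataset.csv"], ["gs://bucket/model"])
store.record("evaluate", ["gs://bucket/model"],       ["gs://bucket/metrics.json"])

# An auditor asking "what produced these metrics?" walks the graph upstream.
print(sorted(store.upstream("gs://bucket/metrics.json")))
```

In the real service this graph is populated automatically per pipeline run, which is why templated pipelines give lineage "for free" without bespoke tracking code.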
You manage the ML engineering team for a regional logistics network; most training runs are multi-node PyTorch Lightning jobs on managed training with NVIDIA T4 GPUs where a single experiment consumes ~3,000 GPU-hours, new model versions are released every 6–10 weeks, and finance requires at least a 40% reduction in training compute spend without degrading model quality or materially increasing wall-clock time; your pipeline already writes restartable checkpoints to Cloud Storage every 10 minutes with <2% overhead and can tolerate node interruptions. What should you do to reduce Google Cloud compute costs without impacting the model’s performance?
Continuing with Vertex AI Training with checkpoints improves resiliency but does not, by itself, guarantee the required >=40% cost reduction. If the underlying compute remains on-demand T4 GPUs, spend will be similar to current levels. This option is attractive because it is operationally simple, but the question’s explicit cost target and interruption tolerance point to using discounted interruptible capacity rather than only improving fault tolerance.
Running distributed training without checkpoints is risky and conflicts with the stated tolerance for interruptions. Without checkpoints, any node preemption or failure can force restarting from scratch or losing significant progress, materially increasing wall-clock time and potentially increasing total cost. The small checkpoint overhead (<2%) is not a meaningful savings lever compared to GPU-hour costs, so removing checkpoints is a poor trade-off.
This option combines the strongest cost lever (Spot/Preemptible GPU pricing) with the necessary engineering control plane (Kubeflow on GKE) and the already-implemented frequent Cloud Storage checkpoints. Because the workload can tolerate interruptions and checkpoints are frequent, lost work per preemption is small, helping keep wall-clock time roughly stable while achieving large compute cost reductions. This best meets the finance requirement without affecting model quality.
Using Spot/Preemptible GPU node pools can reduce cost, but doing so without checkpoints is likely to cause large recomputation after evictions and can significantly increase wall-clock time. In multi-node distributed training, a single node interruption can disrupt the whole job; without checkpoints, recovery is expensive and may negate savings. Given the pipeline already supports efficient checkpointing, omitting it is unjustified.
Core Concept: This question tests cost optimization for large-scale training on GPUs, specifically using interruptible capacity (Spot/Preemptible GPUs) together with fault-tolerant training via frequent checkpoints. It also implicitly tests when to choose managed training (Vertex AI Training) versus self-managed orchestration (GKE/Kubeflow) to unlock specific pricing models.

Why the Answer is Correct: To achieve at least a 40% reduction in training compute spend without degrading model quality, the most direct lever is switching from on-demand GPU VMs to Spot (preemptible) GPU capacity, which is typically discounted substantially versus on-demand. The workload already writes restartable checkpoints to Cloud Storage every 10 minutes with minimal overhead (<2%) and can tolerate interruptions, which is exactly the prerequisite for using Spot/Preemptible resources without materially increasing wall-clock time. With multi-node distributed PyTorch Lightning, interruptions can be handled by restarting workers and resuming from the latest checkpoint, limiting lost work to roughly the checkpoint interval.

Key Features / Configurations / Best Practices:
- Use GKE node pools with Spot/Preemptible GPU nodes (e.g., T4) and configure autoscaling and node auto-provisioning as appropriate.
- Run training via Kubeflow (e.g., PyTorchJob) so the controller can recreate pods after eviction and resume from Cloud Storage checkpoints.
- Ensure checkpointing includes optimizer state, RNG seeds (if needed), and distributed state to avoid quality regressions.
- Use PodDisruptionBudgets and appropriate retry/backoff policies; keep checkpoint frequency aligned with acceptable lost work.
- This aligns with Google Cloud Architecture Framework cost optimization: use discounted compute for interruptible workloads and design for resiliency.

Common Misconceptions: A common trap is assuming Vertex AI Training automatically provides the same Spot GPU savings. While Vertex AI supports various accelerators and managed infrastructure, the option set here implies that the guaranteed way to use Spot/Preemptible GPU node pools is to migrate to GKE/Kubeflow. Another misconception is disabling checkpoints to reduce overhead; however, the overhead is already <2%, and removing checkpoints increases the risk of large recomputation after interruptions.

Exam Tips: When you see requirements like “>=40% cost reduction,” “interruptions tolerated,” and “frequent checkpoints,” think Spot/Preemptible compute plus robust restart logic. Also, prefer the option that preserves model quality (no algorithmic changes) and avoids large wall-clock increases by minimizing lost work through checkpointing.
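A back-of-envelope sketch of this trade-off, using an assumed Spot discount and preemption count (illustrative numbers, not quoted GCP prices): frequent checkpoints cap the lost work per preemption at roughly half the checkpoint interval, so the Spot discount dominates.

```python
# Back-of-envelope cost check (illustrative assumptions, not real prices):
# does Spot capacity plus checkpoint-resume meet a >=40% savings target?

ON_DEMAND_RATE = 1.00      # normalized cost per GPU-hour on demand
SPOT_DISCOUNT = 0.60       # assumed Spot discount vs on-demand (60%)
GPU_HOURS = 3000           # per experiment (from the scenario)
CKPT_OVERHEAD = 0.02       # <2% checkpoint overhead (from the scenario)
CKPT_INTERVAL_H = 10 / 60  # checkpoint every 10 minutes
PREEMPTIONS = 40           # assumed preemptions over the whole run

# Average lost work per preemption is about half a checkpoint interval.
lost_hours = PREEMPTIONS * CKPT_INTERVAL_H / 2

on_demand_cost = GPU_HOURS * (1 + CKPT_OVERHEAD) * ON_DEMAND_RATE
spot_rate = ON_DEMAND_RATE * (1 - SPOT_DISCOUNT)
spot_cost = (GPU_HOURS * (1 + CKPT_OVERHEAD) + lost_hours) * spot_rate

savings = 1 - spot_cost / on_demand_cost
print(f"recomputed hours: {lost_hours:.1f}")
print(f"savings: {savings:.1%}")  # comfortably above the 40% target here
```

Note that even 40 preemptions cost only a few recomputed GPU-hours out of 3,000, which is why wall-clock time stays roughly stable.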
Want to practice all the questions anywhere?
Download Cloud Pass for free, including practice tests, progress tracking & more.
Your robotics team is deploying a quality inspection system for ceramic floor tiles on a high-speed conveyor line; each tile is captured as a 3840x3840 RGB image under controlled lighting, and you have 60,000 labeled images (defective vs. non-defective); operations managers require pixel-level attribution heatmaps overlaid on each image so they can pinpoint hairline cracks and decide whether to discard a tile within the same shift. How should you build the model?
Incorrect. Tree-based models in scikit-learn are intended for tabular features; using raw pixels would create tens of millions of features per image, which is infeasible. Even with engineered features, spatial localization is lost. SHAP values can provide feature attributions, but for images this becomes computationally expensive and does not naturally yield clean pixel-level crack localization at 3840x3840 resolution in a production setting.
Incorrect. Partial dependence plots are global interpretability tools showing how changing one (or a few) features affects predictions on average. They are not designed for per-example, pixel-level attribution, and they assume a tabular feature representation. For image inspection requiring localized heatmaps, PDPs are the wrong explanation modality and do not meet the operational requirement to pinpoint hairline cracks on each tile.
Correct. TensorFlow deep learning models (CNNs/transfer learning) are the standard approach for image classification and can learn spatial patterns like hairline cracks directly from pixels. Integrated Gradients is a gradient-based explainability method for differentiable models that produces per-input attributions, which can be visualized as pixel-level heatmaps overlaid on the image—matching the managers’ requirement for localized defect evidence.
Mostly incorrect for this scenario. While TensorFlow is appropriate for the vision model, sampled Shapley methods are typically too computationally heavy for very high-dimensional inputs like 3840x3840 RGB images, especially if explanations are needed routinely in operations. Shapley approximations can also be noisy and require many samples for stable pixel-level maps, making Integrated Gradients a more practical and commonly tested choice.
Core Concept: This question tests selecting an appropriate model family for high-resolution image inspection and choosing an explanation method that produces pixel-level attribution heatmaps. In GCP/ML Engineer terms, it’s about using deep vision models (e.g., CNN/ViT in TensorFlow) and applying a gradient-based explainability technique suitable for dense, per-pixel interpretability.

Why the Answer is Correct: For 3840x3840 RGB images and hairline crack detection, tree-based models in scikit-learn are a poor fit because they require engineered tabular features and cannot naturally preserve spatial structure. A deep learning vision model (TensorFlow/Keras) can learn spatial patterns directly from pixels and scale to large datasets (60,000 labeled images). Operations managers require pixel-level attribution overlays; Integrated Gradients (IG) is designed for differentiable models and produces saliency/attribution maps aligned to input pixels, making it well-suited for highlighting crack regions. IG is also relatively efficient compared to Shapley-style methods on large inputs.

Key Features / Best Practices: Use a CNN-based classifier (or transfer learning with EfficientNet/ResNet) and generate IG attributions for each prediction. Because images are very large, you typically downsample, crop/patch, or use a two-stage approach (coarse model then high-res patch inspection) to meet latency needs. IG requires a baseline (e.g., black image or blurred image) and multiple interpolation steps; choose steps to balance fidelity and throughput. This aligns with the Google Cloud Architecture Framework’s performance and reliability principles: design for low-latency inference and operational usability (interpretable outputs for same-shift decisions).

Common Misconceptions: SHAP is popular for explainability, but Shapley methods are computationally expensive and scale poorly with very high-dimensional inputs like 3840x3840x3. PDPs explain global feature effects in tabular settings and do not provide pixel-level localization. Sampled Shapley for images can work in theory but is typically too slow/noisy for high-resolution, high-throughput inspection.

Exam Tips: When you see “pixel-level heatmap” + “images” + “localize defects,” prefer deep vision models plus gradient-based attribution (Integrated Gradients, Grad-CAM). Reserve SHAP/PDP primarily for tabular models. Also consider practical constraints: high-dimensional inputs make Shapley-style explanations costly, which matters for production inspection pipelines.
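The mechanics of Integrated Gradients (interpolate from the baseline to the input, average the gradients along that path, scale by the input-baseline difference) can be sketched on a toy differentiable function. In production the "model" is a TF/Keras network and the gradients come from autodiff; here a numerical gradient stands in so the completeness axiom (attributions sum to f(x) - f(baseline)) can be checked directly:

```python
def f(x):
    # Toy "model": a smooth scalar function of a 2-feature input.
    return x[0] ** 2 + x[1] ** 2 + x[0] * x[1]

def grad(x, eps=1e-5):
    # Central-difference numerical gradient of f at x.
    g = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        g.append((f(hi) - f(lo)) / (2 * eps))
    return g

def integrated_gradients(x, baseline, steps=200):
    n = len(x)
    totals = [0.0] * n
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of each path segment
        point = [baseline[i] + alpha * (x[i] - baseline[i]) for i in range(n)]
        g = grad(point)
        for i in range(n):
            totals[i] += g[i]
    # Average gradient along the path, scaled by (input - baseline).
    return [(x[i] - baseline[i]) * totals[i] / steps for i in range(n)]

x, baseline = [1.0, 2.0], [0.0, 0.0]
attrs = integrated_gradients(x, baseline)
# Completeness axiom: attributions sum to f(x) - f(baseline) (= 7.0 here).
print(attrs, sum(attrs))
```

For images, `x` is the flattened pixel tensor and the per-pixel attributions are reshaped into the overlay heatmap; the `steps` parameter is the fidelity/throughput knob mentioned above.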
A digital payments startup trained a binary classification model on Vertex AI to flag potentially fraudulent card transactions using 24 months of historical data (validation AUC = 0.93) and deployed it to a Vertex AI online endpoint that processes ~60,000 requests per day; after 4 weeks, the production AUC computed from feedback labels has dropped to 0.76, while autoscaling shows sufficient replicas and Cloud Monitoring reports P95 latency around 110 ms and error rate < 0.1%. What should you do first to troubleshoot the drop in predictive performance?
Correct. AUC degradation with stable latency/error rates points to data drift or training-serving skew rather than serving capacity. Vertex AI Model Monitoring is designed to compare serving feature distributions to a baseline and surface which features drifted or are skewed. This is the fastest way to identify whether the production population changed, a feature pipeline broke, or the model needs retraining/refresh.
Incorrect. CPU and memory utilization help diagnose serving bottlenecks (e.g., saturation causing timeouts), but the question states autoscaling is sufficient, latency is low, and error rate is <0.1%. Resource bottlenecks typically manifest as elevated latency, throttling, or errors—not a clean drop in AUC computed from feedback labels. This is not the first troubleshooting step for predictive performance loss.
Incorrect as a first step. Explainable AI attributions can help understand which features influence predictions and can be useful after you identify drifted features or suspect spurious correlations. However, explanations do not directly measure distribution shift or confirm training-serving mismatches. Without first checking drift/skew, you may misinterpret changing attributions that are actually driven by changed input distributions.
Incorrect. Latency/throughput checks validate the serving SLO, but the question already provides healthy serving metrics (P95 ~110 ms, low error rate, sufficient replicas). Confirming SLO compliance won’t explain why AUC dropped. For ML systems, predictive quality issues are usually rooted in data/label changes, drift, skew, or concept drift—requiring model monitoring rather than performance monitoring.
Core concept: This question tests ML solution monitoring in production, specifically diagnosing a drop in predictive quality (AUC) when serving health (latency, errors, autoscaling) looks normal. On Vertex AI, the first-line tool is Vertex AI Model Monitoring to detect data drift and training-serving skew, which are common root causes of performance degradation.

Why the answer is correct: AUC falling from 0.93 (validation) to 0.76 (production feedback) after several weeks strongly suggests the model is no longer seeing data that matches the training distribution, or the feature engineering/serving pipeline differs from training (skew). Since endpoint replicas are sufficient, P95 latency is stable (~110 ms), and error rate is low, the system is serving predictions correctly and quickly; the issue is likely statistical, not infrastructural. The most effective first troubleshooting step is to quantify whether input feature distributions have shifted (data drift) and whether the online features match the training features (training-serving skew). Model Monitoring provides drift metrics, thresholds, and per-feature signals to quickly identify which features changed and guide remediation (retraining, feature pipeline fixes, or updated labeling strategy).

Key features / best practices: Vertex AI Model Monitoring can be configured to:
- Capture prediction requests (sampling) and compare feature distributions against a baseline (training or a reference window).
- Detect training-serving skew by comparing training data statistics to serving request statistics.
- Alert via Cloud Monitoring when drift/skew exceeds thresholds.
Operationally, this aligns with the Google Cloud Architecture Framework’s Reliability and Operational Excellence pillars: observe model quality, detect changes early, and automate alerts. After identifying drift, you typically validate upstream data sources, feature store freshness, schema changes, seasonality, new fraud patterns, or policy changes; then decide on retraining cadence, feature updates, or model replacement.

Common misconceptions: It’s tempting to focus on CPU/memory, latency, or throughput because those are classic production issues. However, those metrics explain availability/performance, not predictive quality. Explainable AI can help interpret model behavior, but it does not directly diagnose distribution shift or skew, and it’s usually a second step after confirming drift.

Exam tips: When a question states predictive metrics degraded while serving metrics are healthy, prioritize monitoring for drift/skew and label/feedback integrity. Use Vertex AI Model Monitoring first, then investigate data pipelines, feature generation, and retraining triggers.
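The per-feature distribution comparison described above can be sketched with the Population Stability Index, one common drift score (Model Monitoring uses its own distance metrics and configurable thresholds; the 0.2 alert level below is a conventional rule of thumb, not a Vertex AI default):

```python
# Sketch of a per-feature drift score: bucket a feature's baseline vs
# serving values and compute the Population Stability Index (PSI).
import math
import random

def psi(expected, actual, buckets=10):
    lo, hi = min(expected + actual), max(expected + actual)
    width = (hi - lo) / buckets or 1.0

    def histogram(values):
        counts = [0] * buckets
        for v in values:
            idx = min(int((v - lo) / width), buckets - 1)
            counts[idx] += 1
        # Smooth empty buckets so the log stays defined.
        return [max(c / len(values), 1e-4) for c in counts]

    e, a = histogram(expected), histogram(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

random.seed(0)
training = [random.gauss(50, 10) for _ in range(5000)]  # baseline window
stable   = [random.gauss(50, 10) for _ in range(5000)]  # no real shift
shifted  = [random.gauss(65, 10) for _ in range(5000)]  # mean moved 1.5 sigma

print(f"stable PSI:  {psi(training, stable):.3f}")   # small, below alert levels
print(f"shifted PSI: {psi(training, shifted):.3f}")  # far above a 0.2 alert level
```

Running this per feature is what lets a monitor name which inputs (e.g., specific transaction features) drifted, rather than just reporting that AUC fell.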
You are part of a data science team at a ride‑sharing platform and need to train and compare multiple TensorFlow models on Vertex AI using 850 million labeled trip records (≈2.3 TB) stored in a BigQuery table; training will run on 4–8 workers and you want to minimize data‑ingestion bottlenecks while ensuring the pipeline remains scalable and repeatable. What should you do?
Loading 2.3 TB into a pandas DataFrame is not feasible (memory limits, single-node bottleneck) and does not scale to 4–8 distributed workers. tf.data.Dataset.from_tensor_slices() is appropriate for small in-memory datasets or prototypes, not production-scale training. This approach also makes repeatability and fault tolerance difficult because the dataset must be rebuilt in memory each run.
Exporting to CSV in Cloud Storage improves decoupling from BigQuery, but CSV is inefficient for ML training at this scale. Text parsing is CPU-heavy, files are larger than binary formats, and schema/typing issues are common. While tf.data.TextLineDataset can be parallelized, overall throughput is typically worse than TFRecord, increasing the risk of input bottlenecks on multi-worker training.
Sharded TFRecords in Cloud Storage are a best-practice format for high-throughput TensorFlow training. Sharding (e.g., 1–2 GB) enables parallel reads across workers, reduces single-file contention, and supports repeatable experiments by reusing the same immutable dataset version. Using TFRecordDataset with parallel interleave and prefetch overlaps I/O and compute, minimizing data-ingestion bottlenecks and improving scalability.
Streaming directly from BigQuery during training can create ingestion bottlenecks due to BigQuery read throughput limits, concurrency/quotas, and per-request overhead, especially with multiple workers. It also couples training stability to BigQuery availability and query performance, reducing repeatability. While TensorFlow I/O can work for smaller datasets or experimentation, for TB-scale multi-worker training it is generally safer to materialize to Cloud Storage in an efficient format.
Core concept: This question tests scalable input pipelines for distributed TensorFlow training on Vertex AI. The key is decoupling training from the source system (BigQuery) and using an efficient, parallelizable file format and tf.data best practices to avoid input bottlenecks at multi-worker scale.

Why the answer is correct: With 850M rows (~2.3 TB) and 4–8 workers, streaming directly from BigQuery or materializing into a single-node structure will bottleneck on network, per-request overhead, and/or BigQuery concurrency limits. Sharded TFRecords in Cloud Storage are a standard, repeatable “training-ready” dataset format: they enable high-throughput sequential reads, easy parallelization across workers, and deterministic reuse across experiments. Proper sharding (e.g., 1–2 GB) balances metadata overhead (too many small files) against parallelism (too few large files). Using tf.data.TFRecordDataset with parallel interleave, map, and prefetch allows overlapping I/O and compute, maximizing accelerator/CPU utilization.

Key features / best practices:
- Store training data in Cloud Storage in a binary, splittable format (TFRecord) with compression (e.g., GZIP) when appropriate.
- Use many shards and let each worker read different shards (via file patterns and dataset sharding options) to reduce contention.
- Use tf.data optimizations: parallel interleave (Dataset.interleave with num_parallel_calls), map with AUTOTUNE, prefetch(AUTOTUNE), and optionally cache only when the data fits.
- Make the pipeline repeatable: a one-time (or scheduled) export/transform step from BigQuery to TFRecords can be orchestrated (e.g., Vertex AI Pipelines / Dataflow) and versioned.

Common misconceptions:
- “Directly read from BigQuery” sounds convenient, but it couples training throughput to BigQuery read performance, quotas, and transient query/streaming behavior, which is risky at scale.
- “CSV is universal,” but it is inefficient: large text-parsing overhead, larger storage footprint, and slower input pipelines.
- “Load into pandas” is a common prototype pattern but fails for multi-terabyte datasets and distributed training.

Exam tips: For large-scale training on Vertex AI, prefer Cloud Storage + TFRecords (or similarly efficient formats) with tf.data performance patterns. Choose architectures that separate data preparation from training, support multi-worker parallel reads, and minimize per-record parsing overhead. When you see TB-scale data and multiple workers, avoid pandas and avoid text formats unless explicitly required.
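The sharding arithmetic and per-worker assignment can be sketched in pure Python (the bucket path is hypothetical). The assignment mirrors sharding a list of file names across workers, so each file is read by exactly one worker:

```python
# Sketch: how many shards for ~2.3 TB at ~1.5 GB each, and which files
# each of 8 workers reads (round-robin, like sharding a file-name dataset).
import math

DATASET_GB = 2300        # ~2.3 TB of trip records
TARGET_SHARD_GB = 1.5    # aim for 1-2 GB per shard
NUM_WORKERS = 8

num_shards = math.ceil(DATASET_GB / TARGET_SHARD_GB)
filenames = [f"gs://bucket/trips/part-{i:05d}.tfrecord" for i in range(num_shards)]

def shards_for_worker(files, num_workers, worker_index):
    # Worker w reads every num_workers-th file starting at offset w.
    return files[worker_index::num_workers]

per_worker = [shards_for_worker(filenames, NUM_WORKERS, w)
              for w in range(NUM_WORKERS)]

print(num_shards)  # 1534 shards of ~1.5 GB each
# Every file is read by exactly one worker: no contention, no duplication.
assert sorted(f for shard in per_worker for f in shard) == filenames
```

With ~190 files per worker there is ample parallelism for interleaved reads, and the same immutable file list makes every experiment run reproducible.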
Your e-commerce price-optimization model serves about 30,000 predictions per hour on a Vertex AI endpoint, and a Vertex AI Model Monitoring job is configured to detect training-serving skew using a 24-hour sliding window with a 0.3 sampling rate and a baseline dataset at gs://retail-ml/training/2025-06/data.parquet; after three consecutive windows reporting skew on features inventory_days and competitor_price, you retrained the model using the last 45 days of data at gs://retail-ml/training/2025-08/data.parquet and deployed version v2 to the same endpoint, but the monitoring job still raises the same skew alert—what should you do?
Lowering the sampling rate reduces the number of prediction requests used to estimate serving distributions, which can reduce monitoring cost and sometimes reduce noisy estimates. However, it does not correct a systematic skew caused by comparing against an outdated baseline. It can also increase variance and potentially miss true skew/drift, weakening reliability and observability.
Training-serving skew is defined relative to a baseline dataset. After retraining on gs://retail-ml/training/2025-08/data.parquet and deploying v2, the monitoring baseline must be updated to that same dataset (or an equivalent reference) so the comparison reflects the new model’s expected feature distributions. This directly addresses the root cause: baseline/model mismatch.
Waiting for more production traffic can help if the issue is insufficient sample size or cold-start behavior. But here, the monitor already had three consecutive windows and still alerts after deployment, indicating a persistent distribution difference. If the baseline remains the June dataset, additional traffic won’t resolve the skew; it will likely confirm it more strongly.
Disabling alerts until another retrain is risky and contrary to good MLOps practices. It sacrifices monitoring coverage and can hide genuine data issues (pipeline bugs, upstream changes, seasonality impacts). Also, retraining again without fixing the monitoring baseline/configuration may still produce the same alert pattern, because the monitor would still be comparing to the wrong reference.
Core Concept: This question tests Vertex AI Model Monitoring for training-serving skew. Skew detection compares the distribution of features seen in online predictions (serving data) against a configured baseline dataset (typically the training dataset) over a defined window (here, a 24-hour sliding window) and sampling rate.

Why the Answer is Correct: You retrained and deployed model version v2 using a newer 45-day dataset (gs://retail-ml/training/2025-08/data.parquet), but the monitoring job is still configured to compare serving traffic against the old baseline (gs://retail-ml/training/2025-06/data.parquet). If the feature distributions legitimately shifted between June and August (common in retail pricing, inventory, and competitor dynamics), the monitor will continue to flag skew because it is still measuring serving data against an outdated reference distribution. The correct remediation is to update the Model Monitoring job’s baseline to the dataset that corresponds to the currently deployed model’s training data.

Key Features / Best Practices:
- Baseline management: For training-serving skew, the baseline should align with the deployed model version’s training set (or a curated “golden” reference) and be updated when the model is retrained.
- Windowing and sampling: Sliding windows and sampling rate affect sensitivity and cost, but they do not fix a baseline mismatch.
- Operational practice: Treat baseline updates as part of the deployment/release process (MLOps). This aligns with the Google Cloud Architecture Framework’s operational excellence and reliability principles: monitoring must reflect the current system state.

Common Misconceptions:
- Lowering sampling (Option A) may reduce alert frequency but can hide real issues and does not address the root cause (wrong baseline).
- Waiting 72 hours (Option C) doesn’t help if the comparison target remains the old baseline; more traffic just provides more evidence of the same mismatch.
- Disabling alerts until retraining again (Option D) is an anti-pattern: it reduces observability and can allow real drift/skew to go undetected.

Exam Tips: When a monitoring alert persists after a model update, check configuration coupling: baseline dataset, feature schema, objective thresholds, and which model/version the monitor is attached to. For training-serving skew specifically, the first thing to verify is that the baseline corresponds to the training data for the currently deployed model version.
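Treating the baseline as part of the release can be sketched as a simple pre-deployment guard (toy code, not the Vertex AI API; the URIs are the ones from the scenario):

```python
# Toy release-process guard: fail fast when the deployed model's training
# dataset and the monitoring job's baseline URI disagree.

def baseline_matches(deployed_training_uri, monitor_baseline_uri):
    return deployed_training_uri == monitor_baseline_uri

monitor = {"baseline": "gs://retail-ml/training/2025-06/data.parquet"}
v2_training_data = "gs://retail-ml/training/2025-08/data.parquet"

# v2 deployed but baseline not updated: the skew alert keeps firing.
assert not baseline_matches(v2_training_data, monitor["baseline"])

# Update the baseline as part of the rollout; the comparison is now valid.
monitor["baseline"] = v2_training_data
assert baseline_matches(v2_training_data, monitor["baseline"])
```

Wiring a check like this into the deployment pipeline makes a stale baseline a build failure instead of a week of false alerts.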
You trained an automated scholarship eligibility classifier for a national education nonprofit using Vertex AI on 1.2 million labeled applications, reaching an offline ROC AUC of 0.95; the review board is concerned that predictions may be biased by applicant demographics (e.g., gender, ZIP-code–derived income bracket, first-generation college status) and asks you to deliver transparent insight into how the model makes decisions for 500 sampled approvals and denials and to identify any fairness issues across these cohorts. What should you do?
Separating features in Feature Store and retraining without demographics is a remediation attempt, not the requested analysis. It also risks “fairness through unawareness,” which often fails because proxy variables (e.g., ZIP code, school, essay topics) can still encode demographics. Additionally, removing sensitive features can make it harder to measure fairness and audit outcomes by protected class. The board asked for transparent insight and cohort fairness evaluation first.
Vertex AI feature attribution (Explainable AI) provides per-instance explanations for approvals and denials, quantifying how each feature influenced each prediction. You can then aggregate and compare attributions and outcomes across cohorts (gender, income bracket, first-gen) to surface potential bias and proxy-feature reliance. This directly satisfies the requirement for transparency on 500 sampled decisions and supports fairness analysis using group-level comparisons and fairness metrics.
Vertex AI Model Monitoring focuses on operational issues like training-serving skew, feature drift, and prediction drift. While valuable for production reliability, it does not provide decision transparency for specific cases nor does it directly identify fairness issues across demographic cohorts. Retraining with recent data may even perpetuate or amplify bias if the underlying labeling or historical decision process is biased. This option addresses a different problem than requested.
Vector Search nearest-neighbor retrieval can help find similar examples, but it is not a primary tool for explainability or fairness auditing. Similarity search depends on embeddings and distance metrics and may not reveal which features drove the model’s decision or quantify demographic impacts. It can be a supplemental qualitative investigation technique, but it does not provide the systematic, per-feature, cohort-based transparency and bias evidence the board requested.
Core Concept: This question tests model transparency and fairness evaluation in Vertex AI: using explainability (feature attribution) to understand why individual predictions were made, then analyzing those explanations across demographic cohorts to detect potential bias. This aligns with responsible AI practices in the Google Cloud Architecture Framework (governance, risk management, and trust).

Why the Answer is Correct: The board requests “transparent insight” for 500 sampled approvals/denials and to “identify fairness issues across cohorts.” Vertex AI feature attribution (Vertex Explainable AI) provides per-prediction explanations (local explanations) showing how each input feature contributed to a specific decision. By aggregating attributions and outcomes by cohort (e.g., gender, income bracket, first-gen status), you can identify whether sensitive or proxy features disproportionately drive approvals/denials, and whether similarly qualified applicants receive different outcomes across groups, which is key evidence for bias investigation.

Key Features / How to Do It: Use Vertex AI Explainable AI on the deployed model or batch predictions for the 500 sampled cases to obtain attributions (e.g., Integrated Gradients / sampled Shapley depending on model type). Then slice results by cohort and compare: (1) distribution of prediction scores, (2) top contributing features, and (3) fairness metrics such as demographic parity difference, equal opportunity / TPR gaps, and calibration by group (often computed outside Vertex AI using BigQuery/Looker/Python, but driven by the attribution outputs). Also look for proxy variables (ZIP-code–derived income) acting as sensitive-feature surrogates.

Common Misconceptions: High ROC AUC (0.95) does not imply fairness; it can coexist with discriminatory behavior. Monitoring drift/skew is operationally important but does not answer “why” decisions are made or whether they are biased. Removing demographic features may not remove bias because proxies remain, and it can reduce the ability to measure/mitigate fairness.

Exam Tips: When the prompt asks for transparency, interpretability, and cohort-based bias analysis, think “Vertex Explainable AI/feature attribution + slice by groups.” When it asks for production data drift or training-serving skew, think “Vertex Model Monitoring.” For fairness, prefer measurement and evidence (explanations + group metrics) before remediation steps like reweighting, constraints, or feature removal.
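The group metrics named above can be sketched on toy records (hypothetical cohorts, decisions, and eligibility labels); in practice the records would be the 500 sampled predictions joined with cohort attributes:

```python
# Toy cohort fairness metrics from per-applicant decisions and
# ground-truth eligibility labels (cohorts "A" and "B" are hypothetical).

def approval_rate(records, predicate):
    group = [r for r in records if predicate(r)]
    return sum(r["approved"] for r in group) / len(group)

records = [
    {"cohort": "A", "approved": True,  "eligible": True},
    {"cohort": "A", "approved": True,  "eligible": True},
    {"cohort": "A", "approved": False, "eligible": True},
    {"cohort": "A", "approved": False, "eligible": False},
    {"cohort": "B", "approved": True,  "eligible": True},
    {"cohort": "B", "approved": False, "eligible": True},
    {"cohort": "B", "approved": False, "eligible": True},
    {"cohort": "B", "approved": False, "eligible": False},
]

# Demographic parity difference: gap in overall approval rates.
dp_gap = (approval_rate(records, lambda r: r["cohort"] == "A")
          - approval_rate(records, lambda r: r["cohort"] == "B"))

# Equal opportunity gap: TPR difference among truly eligible applicants.
tpr_gap = (approval_rate(records, lambda r: r["cohort"] == "A" and r["eligible"])
           - approval_rate(records, lambda r: r["cohort"] == "B" and r["eligible"]))

print(f"demographic parity gap: {dp_gap:.2f}")  # 0.25
print(f"equal opportunity gap:  {tpr_gap:.2f}")  # 0.33
```

Non-zero gaps like these, combined with per-case attributions showing which features drove each denial, are the kind of evidence a review board can act on.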
Study period: 1 month
Just want to say a massive thank you to the entire Cloud Pass team for helping me pass my exam the first time. I won't lie, it wasn't easy, especially the way the real exam is worded, but the way the practice questions teach you why your option was wrong really helps to frame your mind: you understand what the question is asking and the solutions you should be focusing on. Thanks once again.
Study period: 1 month
Good question banks and explanations that helped me practise and pass the exam.
Study period: 1 month
I worked through the questions right after the lectures and got around 80% correct, and I passed the exam with a high score. The app served me well.
Study period: 1 month
Good mix of theory and practical scenarios
Study period: 1 month
I used the app mainly to review the fundamentals—data preparation, model tuning, and deployment options on GCP. The explanations were simple and to the point, which really helped before the exam.
