
Simulate the real exam with 65 questions and a 90-minute time limit. Study with AI-verified answers and detailed explanations.
AI-Powered
All answers are cross-validated across three leading AI models to ensure maximum accuracy, with detailed explanations for every option and in-depth question analysis.
A public health analytics team at a city hospital network plans to build an AI application using large language models (LLMs) that will read 20–50 page clinical incident reports (PDFs) and produce key takeaways for safety reviews; the system must return a concise summary of the top findings (≤200 words) within 5 seconds per request and support up to 10 concurrent reviewers. Which solution meets these requirements?
NER is a classic NLP extraction task that identifies and labels entities (e.g., drug names, dosages, patient identifiers). While useful for structuring clinical text and supporting downstream analytics, it does not generate a concise narrative or bullet summary of “top findings.” It also doesn’t inherently satisfy the requirement to return a ≤200-word takeaway summary within 5 seconds.
A recommendation engine suggests similar or related incident reports, which can improve reviewer workflow and discovery. However, it does not read a single 20–50 page PDF and produce key takeaways. Even if paired with search, it fails the primary functional requirement: generating a concise summary of the report’s top findings within the specified word and latency constraints.
Option C is correct because the primary requirement is to read long clinical incident reports and generate a concise summary of the top findings. That is a classic LLM summarization use case, since large language models are designed to understand unstructured text and produce coherent condensed outputs. The option also explicitly includes the requirement to keep the response within 200 words and return results in under 5 seconds, matching the stated functional and performance constraints. None of the other options produce a summary, so C is the only choice that directly satisfies the business need.
Translation systems convert text between languages. The prompt does not require multilingual support; it requires summarization of clinical incident reports. Translation may be valuable in other hospital contexts, but it does not produce “top findings” summaries and does not address the core requirement. Therefore it is not the best solution for this use case.
Core Concept: This question tests selecting the correct generative AI application pattern—LLM-based document summarization—based on explicit functional and nonfunctional requirements (output length, latency, and concurrency). In AWS terms, this commonly maps to using a foundation model (for example via Amazon Bedrock) to perform summarization on long-form documents.

Why the Answer is Correct: The team needs an AI application that reads 20–50 page PDFs and produces a concise summary of top findings (≤200 words) within 5 seconds per request, supporting up to 10 concurrent reviewers. Option C directly describes an LLM-powered summarization assistant that performs exactly this task and aligns with the constraints (bounded summary length and low latency). The other options describe different ML tasks (NER, recommendations, translation) that do not satisfy the primary requirement: generating a concise narrative/bulleted summary of key findings.

Key AWS Features: A typical AWS implementation would use Amazon S3 to store PDFs, text extraction (often Amazon Textract for PDFs) to convert documents to text, and an LLM for summarization (for example, Amazon Bedrock with an appropriate model). To meet the 5-second SLA and 10 concurrent users, you would focus on:
- Prompt design to enforce ≤200 words (explicit instruction + max tokens).
- Low-latency inference (choose a model/throughput configuration appropriate for real-time requests).
- Caching summaries for repeated access (for example, DynamoDB/ElastiCache) and asynchronous pre-summarization for newly uploaded reports when possible.
- Concurrency controls and scaling at the API layer (API Gateway + Lambda/ECS) while ensuring the model endpoint can handle parallel requests.

Common Misconceptions: NER (Option A) can look relevant because clinical reports contain entities like drugs and dosages, but tagging entities does not produce a coherent “top findings” summary. Recommendations (Option B) help discovery, not summarization. Translation (Option D) is unrelated unless multilingual output is required.

Exam Tips: For exam questions, anchor on the verb and output: “produce key takeaways/summary” strongly indicates generative AI summarization. Then validate against nonfunctional requirements (latency, concurrency, output length). Choose the option that matches the end-user deliverable, not adjacent analytics tasks (entity extraction, search, recommendations).
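To make the prompt-design point concrete, here is a minimal sketch of how such a summarization request might be assembled, enforcing the ≤200-word constraint with both an explicit instruction and a token cap. The model ID, field names, and token heuristic are illustrative assumptions, not a specific Bedrock model's contract; the payload follows the general messages-plus-inference-config shape and is only constructed here, never sent.

```python
# Hypothetical sketch: build a summarization request body that bounds output length
# via an explicit word-limit instruction plus a max-token cap. The model ID is a
# placeholder; extracted_text would come from a PDF-to-text step (e.g., Textract).

def build_summary_request(extracted_text: str, max_words: int = 200) -> dict:
    prompt = (
        f"Summarize the top safety findings of the incident report below "
        f"in at most {max_words} words, as a short bulleted list.\n\n"
        f"Report:\n{extracted_text}"
    )
    return {
        "modelId": "example.placeholder-model-v1",  # placeholder, not a real model ID
        "messages": [{"role": "user", "content": [{"text": prompt}]}],
        # ~300 tokens comfortably bounds 200 English words (rough 1.3 tokens/word heuristic)
        "inferenceConfig": {"maxTokens": 300, "temperature": 0.2},
    }

request = build_summary_request("...extracted PDF text...")
```

Capping tokens alone is not enough (the model could be cut off mid-sentence), which is why the word limit also appears in the instruction itself.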
An IT operations team uses an LLM to diagnose incidents by analyzing 8–12 sequential service logs and 5 metric anomalies, and they require the model to produce a numbered, step-by-step reasoning trace with intermediate calculations (e.g., latency deltas and error-rate ratios) that justifies the final root cause and remediation; which prompt engineering technique best meets these requirements?
Few-shot prompting provides a small set of examples (input-output pairs) to teach the model the desired pattern. It can help the LLM format incident analyses consistently and may improve accuracy by demonstrating how to compute deltas/ratios. However, the requirement is specifically to produce an explicit step-by-step reasoning trace with intermediate calculations, which is more directly addressed by chain-of-thought prompting than by merely providing examples.
Zero-shot prompting relies only on instructions without examples. While you could ask for numbered steps and calculations, zero-shot is generally less reliable for complex, multi-evidence reasoning across sequential logs and metrics. It often increases variability in structure and completeness, making it a weaker choice when the output must consistently include intermediate computations and a justified root cause/remediation trace.
Directional stimulus prompting steers the model by emphasizing specific signals, constraints, or hints (e.g., “focus on latency spikes after deployment,” “prioritize 5xx errors”). This can improve relevance and reduce distraction from noisy logs, but it does not inherently cause the model to expose a full reasoning chain with intermediate calculations. It is more about guiding attention than eliciting explicit step-by-step reasoning.
Chain-of-thought prompting is intended to elicit multi-step reasoning by having the model articulate intermediate steps before producing a final answer. This directly matches the requirement for a numbered reasoning trace with intermediate calculations (latency deltas, error-rate ratios) that justify the final root cause and remediation. It is the best fit when the task requires transparent, structured reasoning over multiple logs and metric anomalies.
Core Concept: This question tests prompt engineering techniques for generative AI/LLMs, specifically how to elicit structured, multi-step reasoning with intermediate computations from a model. In AWS exam contexts, this commonly appears alongside Amazon Bedrock or Amazon SageMaker JumpStart usage patterns, but the core is the prompting method.

Why the Answer is Correct: The requirement is explicit: the team needs a numbered, step-by-step reasoning trace with intermediate calculations (latency deltas, error-rate ratios) that justifies the final diagnosis and remediation. Chain-of-thought prompting is designed to encourage the model to “show its work” by producing intermediate reasoning steps rather than only a final answer. When you ask for a structured reasoning trace (e.g., “Think step by step; include calculations; output numbered steps”), you are applying chain-of-thought prompting to improve coherence across multiple evidence items (8–12 sequential logs plus 5 metric anomalies) and to reduce the chance the model jumps to an unsupported conclusion.

Key AWS Features / Best Practices: In practice on AWS (e.g., Amazon Bedrock), you would combine chain-of-thought style instructions with output formatting constraints (numbered steps, sections for calculations, final root cause, remediation). You may also pair it with retrieval (RAG) to supply the log/metric context, but the question is specifically about the prompting technique that yields intermediate reasoning. For operational use, teams often additionally enforce guardrails (e.g., “cite which log lines/metrics support each step”) and validate calculations externally.

Common Misconceptions: Few-shot prompting can improve task adherence by providing examples, but it does not inherently require the model to reveal intermediate reasoning. Zero-shot is least likely to reliably produce detailed traces. Directional stimulus prompting can steer attention toward certain evidence, but it is not the canonical technique for eliciting explicit multi-step reasoning with calculations.

Exam Tips: When a question asks for “step-by-step reasoning,” “show your work,” “intermediate steps,” or “multi-hop reasoning,” the exam answer is typically chain-of-thought prompting. If the question instead emphasizes “learn from examples,” choose few-shot. If it emphasizes “do it with only instructions,” choose zero-shot. If it emphasizes “focus on specific cues or constraints,” directional stimulus may apply.
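The instructions described above can be sketched as a chain-of-thought prompt builder. The log and metric formats below are made up for illustration; the key elements are the step-by-step instruction, the requirement to cite evidence and show calculations, and the fixed final-answer labels.

```python
# Illustrative chain-of-thought prompt for incident diagnosis. Log/metric strings
# are invented examples; any real system would substitute its own evidence.

def build_cot_prompt(logs: list[str], anomalies: list[str]) -> str:
    evidence = "\n".join(logs + anomalies)
    return (
        "You are diagnosing a production incident. Think step by step.\n"
        "Output numbered steps; each step must cite the log line or metric it uses\n"
        "and show intermediate calculations (latency deltas, error-rate ratios).\n"
        "Finish with two lines: ROOT CAUSE: <cause> and REMEDIATION: <fix>.\n\n"
        f"Evidence:\n{evidence}"
    )

prompt = build_cot_prompt(
    ["10:01 svc-a p95=120ms", "10:06 svc-a p95=480ms"],   # latency delta: +360 ms
    ["error_rate svc-a 0.2% -> 4.1%"],                     # error-rate ratio: ~20x
)
```

Fixing the output labels (ROOT CAUSE / REMEDIATION) also makes the trace easy to parse and validate downstream.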
A retail analytics team is building a retrieval-augmented generation (RAG) prototype that stores 20 million 768-dimensional text embeddings in Amazon OpenSearch Service and must retrieve the top 20 most similar vectors within 100 ms; which OpenSearch capability specifically enables this type of vector database application?
Native integration with Amazon S3 is useful for snapshots, backups, and data lifecycle patterns (e.g., storing index snapshots in S3). However, S3 is object storage and does not provide the low-latency, high-dimensional similarity search required for RAG. S3 integration helps durability and cost management, but it does not enable k-NN retrieval of embeddings within 100 ms.
Geospatial indexing and queries support location-based use cases such as finding points within a radius, bounding box searches, and geo-distance sorting. While it is a specialized indexing capability, it is unrelated to semantic similarity over 768-dimensional embedding vectors. Geospatial features won’t help retrieve the top 20 nearest embeddings for a RAG workload.
Scalable vector index management and k-nearest neighbor (k-NN) search is the specific OpenSearch capability that enables vector database applications. It allows storing embeddings in vector fields and performing approximate nearest neighbor searches (top K most similar vectors) efficiently at large scale, meeting tight latency requirements typical of RAG systems (e.g., retrieving top 20 matches from tens of millions of vectors).
Real-time analysis on streaming data refers to ingesting and querying continuously arriving events (logs, metrics, clickstreams) with low indexing latency. Although OpenSearch is commonly used for near-real-time analytics, streaming support does not address the core requirement here: fast similarity search over high-dimensional embeddings. Vector retrieval performance depends on k-NN/ANN indexing, not streaming analytics.
Core Concept: This question tests Amazon OpenSearch Service’s vector database capabilities used in retrieval-augmented generation (RAG): storing high-dimensional embeddings and performing fast approximate similarity search (k-nearest neighbor) at scale.

Why the Answer is Correct: A RAG system needs to retrieve the most semantically similar documents to a query embedding. With 20 million 768-dimensional vectors and a strict latency target (top 20 within 100 ms), the key OpenSearch capability is scalable vector indexing plus k-NN search. OpenSearch provides vector fields and k-NN/ANN (approximate nearest neighbor) search so you can efficiently find the closest vectors without scanning all 20 million embeddings. This is exactly what enables OpenSearch to function as a vector database for semantic search and RAG.

Key AWS Features: OpenSearch supports vector search through k-NN functionality (commonly backed by ANN algorithms such as HNSW) and manages vector indexes across shards and nodes for horizontal scalability. You typically store embeddings in a vector field, choose an ANN index type/parameters, and query using k-NN to return the top K results (here, K=20). Performance depends on index parameters (e.g., graph construction/ef settings), shard sizing, instance types (memory/CPU), and using filters to narrow candidate sets when applicable. This aligns with Well-Architected performance efficiency: optimize data structures and scale-out rather than brute-force scans.

Common Misconceptions: S3 integration (A) is about storing snapshots or ingesting data, not low-latency vector similarity search. Geospatial indexing (B) is specialized for location-based queries, not embedding similarity. Streaming analytics (D) relates to near-real-time log/event analytics, not ANN vector retrieval.

Exam Tips: When you see “embeddings,” “top K similar,” “vector database,” or “RAG,” look for “vector search,” “k-NN,” or “ANN.” In OpenSearch/Elasticsearch-style services, the enabling feature is the k-NN/vector index, not generic search, storage integrations, or streaming features. Also note that strict latency with tens of millions of vectors strongly implies approximate nearest neighbor indexing rather than exact brute-force similarity computation.
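As a rough sketch of what this looks like in practice, the request bodies below declare a 768-dimensional k-NN vector field backed by an HNSW graph and query for the top 20 neighbors. Field names, engine choice, and parameter values are illustrative assumptions (consult the OpenSearch k-NN plugin docs for the exact options available on your version); the dicts are only constructed here, not sent to a cluster.

```python
# Sketch of OpenSearch k-NN structures for this scenario (names are illustrative).
# Index mapping: a 768-dim knn_vector field using an HNSW-based ANN method.
index_body = {
    "settings": {"index.knn": True},
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": 768,
                "method": {"name": "hnsw", "engine": "faiss", "space_type": "l2"},
            }
        }
    },
}

# Query: approximate top-k retrieval against the vector field.
def knn_query(query_vector: list[float], k: int = 20) -> dict:
    return {"size": k, "query": {"knn": {"embedding": {"vector": query_vector, "k": k}}}}

query = knn_query([0.0] * 768)  # placeholder embedding; real queries use model output
```

In a real deployment these bodies would be passed to the index-creation and search APIs of an OpenSearch client; the point here is that the enabling feature is the `knn_vector` field plus the k-NN query, not generic full-text search.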
An e-commerce marketplace plans to deploy a product Q&A assistant using a managed large language model on Amazon Bedrock; in a two-day pilot, each request averages 800 prompt tokens and 200 completion tokens, the temperature is set to 0.7, and no custom training or fine-tuning is performed for the model. Which factor will primarily drive the daily inference charges when the system processes 10,000 requests?
Correct. Amazon Bedrock on-demand inference charges are primarily based on token usage: the number of input (prompt) tokens plus output (completion) tokens. With 10,000 requests/day and ~1,000 tokens/request, total daily tokens dominate cost. This aligns with typical Bedrock model pricing, which specifies rates for input and output tokens separately.
Incorrect. Temperature controls randomness/creativity in generation, affecting determinism and sometimes indirectly influencing response length. However, Bedrock does not bill “per temperature setting.” Billing is metered by usage (primarily tokens). Even if temperature changes the style of output, the charge is still driven by how many tokens are generated and processed.
Incorrect. The scenario explicitly states no custom training or fine-tuning is performed. Training data volume would matter only if the solution included model customization (fine-tuning) or separate training workflows. For managed FM inference in Bedrock, you pay for inference usage (tokens), not for the provider’s original pre-training data.
Incorrect. Training time is irrelevant here because the model is not being trained by the customer. Bedrock provides managed foundation models where customers typically consume inference APIs. Charges based on training duration would apply to training jobs in services like Amazon SageMaker training or certain customization workflows, not standard Bedrock inference.
Core Concept: Amazon Bedrock inference pricing for managed foundation models is primarily usage-based. For most Bedrock models, on-demand inference charges are driven by the number of tokens processed (input/prompt tokens plus output/completion tokens). This question tests understanding of how generative AI inference is metered and billed.

Why the Answer is Correct: With 10,000 requests/day and an average of 800 prompt tokens + 200 completion tokens, the system consumes ~1,000 tokens per request, or ~10,000,000 tokens/day total. Bedrock’s inference cost scales with this token volume because the service meters how much text you send to the model and how much text the model generates. Since no fine-tuning or custom training is performed, there are no training-related charges to consider. Therefore, the dominant cost driver is tokens consumed per request.

Key AWS Features: Bedrock provides access to multiple foundation models with pricing typically expressed per 1,000 (or 1 million) input tokens and per 1,000 (or 1 million) output tokens, often at different rates. This encourages cost control via prompt optimization (shorter prompts), response length limits (max tokens), caching/retrieval strategies (RAG to reduce unnecessary context), and request batching where supported. Temperature is an inference parameter that affects randomness, but it does not change the billing unit.

Common Misconceptions: Many assume “temperature” or other generation settings directly change cost. While temperature can indirectly influence output length or retries (which could increase tokens), the billing mechanism is still token-based. Another misconception is that training time or training data affects inference charges; those apply only when you are actually training or fine-tuning a model (and Bedrock’s managed FMs are typically used without customer-managed training).

Exam Tips: For Bedrock and most LLM services, remember: inference cost is usually proportional to input + output tokens. Always compute approximate daily/monthly token totals from request volume and average token counts. Separate inference pricing from customization (fine-tuning) pricing, and don’t confuse model hyperparameters (temperature, top-p) with billing dimensions unless the question explicitly mentions a pricing model based on compute time or provisioned throughput.
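The daily-token arithmetic above can be worked through explicitly. The per-1K-token rates below are made up purely for illustration; real Bedrock rates vary by model and Region, but the structure of the calculation is the same.

```python
# Back-of-the-envelope daily token cost for the scenario in the question.
# The per-1K-token prices are hypothetical, not actual Bedrock rates.
requests_per_day = 10_000
input_tokens_per_request = 800
output_tokens_per_request = 200

daily_input = requests_per_day * input_tokens_per_request    # 8,000,000 input tokens
daily_output = requests_per_day * output_tokens_per_request  # 2,000,000 output tokens

price_in_per_1k = 0.003    # hypothetical $ per 1K input tokens
price_out_per_1k = 0.015   # hypothetical $ per 1K output tokens (often priced higher)

daily_cost = (daily_input / 1000) * price_in_per_1k + (daily_output / 1000) * price_out_per_1k
# ~= $24 for input + $30 for output, about $54/day under these assumed rates
```

Note how the output tokens, though only 20% of the volume, contribute a comparable share of cost under asymmetric input/output pricing, which is why capping response length is a common cost lever.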
A retail marketplace uses a foundation model to classify product photos into 200 categories; before launch, the team wants to verify accuracy using a held-out benchmark of 10,000 labeled images with a target of at least 92% top-1 accuracy—what is the most appropriate strategy to evaluate the model’s accuracy?
Compute cost and runtime are operational metrics (efficiency), not predictive performance metrics (effectiveness). They help with budgeting, scaling, and latency/throughput planning, but they cannot confirm whether the model meets the required 92% top-1 accuracy. A model can be cheap and fast yet inaccurate, or expensive and slow yet accurate. This option does not evaluate correctness against labeled ground truth.
This is the correct strategy because it directly measures the required metric on the specified evaluation dataset. Run inference on the 10,000 labeled images, compute top-1 accuracy (argmax prediction equals the true label), and compare the result to the 92% acceptance threshold. This is standard ML evaluation practice for multi-class classification and best reflects expected real-world performance prior to launch.
Counting layers or parameters describes model capacity/complexity, not actual accuracy on the task. Parameter count is sometimes correlated with capability, but it is not a substitute for empirical evaluation on a representative labeled benchmark. Two models with similar size can have very different accuracy due to training data, fine-tuning, preprocessing, or domain shift. This option fails to validate the 92% requirement.
Color fidelity checks can be useful for validating image preprocessing pipelines (e.g., ensuring no unintended transformations), but they do not measure classification accuracy against labeled categories. A model could receive perfectly color-accurate images and still misclassify them. The requirement is explicitly top-1 accuracy on a labeled benchmark, which must be computed from predictions versus ground truth labels.
Core Concept: This question tests fundamental ML model evaluation using a labeled holdout dataset and an appropriate metric (top-1 accuracy) for multi-class image classification. In AWS terms, this aligns with standard evaluation practices you would apply whether you built the model in Amazon SageMaker (training jobs + evaluation jobs) or are assessing a foundation model’s performance on a benchmark dataset.

Why the Answer is Correct: The team has a clear acceptance criterion: at least 92% top-1 accuracy on a held-out benchmark of 10,000 labeled images across 200 categories. The most appropriate strategy is to run inference on that benchmark set, compute top-1 accuracy (percentage of images where the model’s highest-probability predicted class matches the ground-truth label), and compare the result to the 92% target. This directly measures the stated objective and uses the correct dataset split (held-out) to estimate generalization performance prior to launch.

Key AWS Features / Best Practices: In practice, you would store the benchmark in Amazon S3, run batch inference (e.g., SageMaker Batch Transform or a processing job), and compute metrics in a repeatable pipeline (SageMaker Pipelines). You may also track metrics and artifacts with SageMaker Experiments/Model Registry. For classification, also consider complementary metrics (confusion matrix, per-class accuracy, macro/micro F1) to detect class imbalance, but the question explicitly requires top-1 accuracy.

Common Misconceptions: Cost/runtime (A) is important for operations but does not validate predictive quality. Model size (C) is not a performance guarantee; larger models can still be inaccurate or miscalibrated. Image color fidelity checks (D) relate to preprocessing/quality assurance, not classification correctness against labels.

Exam Tips: When a question provides (1) a labeled holdout set and (2) a target metric threshold, the correct approach is almost always to compute that metric on the holdout set and compare to the threshold. Match the evaluation method to the business requirement (here: top-1 accuracy for multi-class classification).
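The evaluation described above reduces to a few lines of code once predictions and ground-truth labels are available. The toy predictions below are invented to show the mechanics; a real run would compare 10,000 argmax predictions against the benchmark labels.

```python
# Minimal top-1 accuracy computation against ground-truth labels.
# Each prediction is the model's argmax class index for one image.

def top1_accuracy(predictions: list[int], labels: list[int]) -> float:
    assert len(predictions) == len(labels), "prediction/label counts must match"
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)

# Toy data: 5 images, 4 correct predictions.
preds = [3, 7, 7, 1, 0]
truth = [3, 7, 2, 1, 0]

acc = top1_accuracy(preds, truth)   # 4/5 = 0.8
meets_target = acc >= 0.92          # False for this toy set
```

With the real 10,000-image benchmark, the same comparison against the 0.92 threshold is the launch gate; complementary views (confusion matrix, per-class accuracy) then explain where failures concentrate.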
A media-streaming platform operates 150 ML inference containers across 3 AWS Regions processing 25,000 requests per minute and needs a highly scalable AWS service to centrally track and alert on P95 latency, 5xx error rate, and GPU/CPU utilization for these workloads; which AWS service should the company use?
Amazon CloudWatch is the correct service for centralized operational monitoring. It collects and stores metrics at scale, supports percentile statistics (such as P95) for latency distributions, and provides alarms, dashboards, and notifications. With Container Insights and custom metrics, it can track CPU/GPU utilization and application KPIs like 5xx error rate across many containers and multiple Regions.
AWS CloudTrail records AWS API calls and events for auditing, governance, and security investigations (for example, who launched an instance or changed an IAM policy). It is not intended for performance monitoring such as P95 latency, HTTP 5xx rates, or GPU/CPU utilization, and it does not provide metric-based alarming for application SLOs.
AWS Trusted Advisor provides periodic checks and recommendations across cost optimization, performance, security, fault tolerance, and service limits. While it can flag issues like approaching quotas or underutilized resources, it does not provide real-time, per-request latency percentiles, 5xx error monitoring, or container-level GPU/CPU telemetry with alerting.
AWS Config tracks configuration state and changes of AWS resources and evaluates them against compliance rules (for example, whether S3 buckets are public or security groups allow 0.0.0.0/0). It is not a metrics/observability service and cannot natively compute P95 latency, monitor 5xx error rates, or track runtime GPU/CPU utilization for containers.
Core Concept: This question tests observability and operational monitoring on AWS—collecting metrics, aggregating them across many containers and Regions, creating percentiles (P95), and alerting on thresholds. The AWS-native service for metrics, logs, dashboards, and alarms is Amazon CloudWatch.

Why the Answer is Correct: The platform needs centralized tracking and alerting for P95 latency, 5xx error rate, and GPU/CPU utilization across 150 inference containers in 3 Regions at high request volume. CloudWatch is designed to ingest high-cardinality time-series metrics, compute statistics (including percentiles for distributions), and trigger alarms. It can collect application metrics (latency, HTTP 5xx) via custom metrics or embedded metric format, and infrastructure/container metrics (CPU, memory, GPU) via CloudWatch Agent and Container Insights (ECS/EKS). For multi-Region operations, CloudWatch supports cross-account and cross-Region dashboards and can route alarms/notifications through Amazon SNS, OpsCenter, or incident tooling.

Key AWS Features:
1) CloudWatch Metrics + Alarms: Create alarms on P95 latency and 5xx rate; use metric math to compute error rates (5xx/total).
2) Container Insights: Collect per-container and per-node CPU/memory/network; integrate with EKS/ECS.
3) GPU monitoring: Publish GPU utilization as custom metrics (e.g., via CloudWatch Agent/telegraf/nvidia-smi exporters) and alarm on thresholds.
4) Dashboards and cross-Region visibility: Central dashboards to view all Regions; optionally centralize via cross-account observability.
5) Anomaly Detection and Logs Insights (optional): Detect latency regressions and query logs for correlation.

Common Misconceptions: CloudTrail is often confused with monitoring, but it records API activity (who did what) rather than performance metrics. Trusted Advisor provides best-practice checks and cost/security recommendations, not real-time P95 latency tracking. AWS Config tracks resource configuration changes and compliance, not runtime latency or GPU utilization.

Exam Tips: When you see “track metrics,” “percentiles (P95),” “alerting,” “dashboards,” and “operational monitoring,” default to CloudWatch. Choose CloudTrail for audit/API history, Config for configuration compliance/drift, and Trusted Advisor for account-level recommendations. For containerized ML inference, remember Container Insights + custom metrics for model latency and GPU utilization are common patterns.
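The metric-math pattern for the 5xx rate can be sketched as the data-query list an alarm definition would use. The namespace and metric names are placeholders for the application's custom metrics; the list is only built here, not submitted, but it mirrors the structure CloudWatch expects, where non-returned series feed a returned expression.

```python
# Sketch of a metric-math alarm input: compute 5xx error rate as (5xx / total) * 100.
# "App/Inference", "Http5xx", and "Requests" are hypothetical custom-metric names.
alarm_metrics = [
    {
        "Id": "m5xx",
        "MetricStat": {
            "Metric": {"Namespace": "App/Inference", "MetricName": "Http5xx"},
            "Period": 60,
            "Stat": "Sum",
        },
        "ReturnData": False,  # intermediate series, not alarmed on directly
    },
    {
        "Id": "mtotal",
        "MetricStat": {
            "Metric": {"Namespace": "App/Inference", "MetricName": "Requests"},
            "Period": 60,
            "Stat": "Sum",
        },
        "ReturnData": False,
    },
    {
        "Id": "errRate",
        "Expression": "(m5xx / mtotal) * 100",
        "Label": "5xx error rate (%)",
        "ReturnData": True,  # the alarm evaluates this derived series
    },
]
```

A boto3 `put_metric_alarm` call would take a list like this as `Metrics=`, alongside a threshold (for example, alarm when the rate exceeds 1%); P95 latency alarms use the same mechanism with `Stat` set to the `p95` extended statistic on the latency metric.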
A financial advisory chatbot hosted on Amazon Bedrock shows a 12% hallucination rate in an internal evaluation of 500 prompts when invoked with temperature=0.9 and top_p=0.95, and the team must reduce hallucinations below 5% within 24 hours without retraining or changing the foundation model—what should they do?
Incorrect. Agents for Amazon Bedrock are used to orchestrate multi-step tasks at inference time (tool use, function calling, retrieval, and workflow execution). They do not “supervise the model’s training process,” and they do not directly reduce hallucinations by changing how the foundation model was trained. While agents can improve factuality by grounding responses with tools or knowledge bases, the option’s premise about training supervision is wrong.
Incorrect. Data pre-processing to remove problematic training examples implies you have access to and can modify the model’s training dataset, followed by retraining or fine-tuning. The question explicitly prohibits retraining and requires improvement within 24 hours. In managed foundation models on Bedrock, customers generally cannot edit the original training corpus. This approach is not feasible under the stated constraints.
Correct. Lowering temperature reduces randomness in token sampling, making outputs more deterministic and typically reducing hallucinations—especially in high-stakes domains like financial advice where creative phrasing can become fabricated facts. This is an immediate inference-time change using Bedrock invocation parameters (temperature/top_p) and can be validated quickly against the same 500-prompt evaluation set to confirm hallucinations drop below 5%.
Incorrect. Switching to a different foundation model could reduce hallucinations, but the question explicitly says the team cannot change the foundation model. Even if allowed, model switching often requires re-validation, prompt adjustments, and regression testing—unlikely to be the fastest compliant fix within 24 hours. The best answer must respect the constraint and use inference-time controls.
Core Concept: This question tests inference-time controls for foundation models on Amazon Bedrock—specifically how decoding parameters (temperature and top_p) affect output variability and hallucination risk. When you cannot retrain, fine-tune, or change the model, the fastest lever is generation configuration.

Why the Answer is Correct: A temperature of 0.9 with top_p=0.95 encourages diverse, creative outputs by increasing randomness in token selection. In a financial advisory chatbot, that creativity often manifests as fabricated facts, citations, or overly confident incorrect statements (hallucinations). Lowering temperature (e.g., to ~0.2) makes sampling more deterministic, pushing the model toward higher-probability tokens and more conservative completions. This typically reduces hallucinations quickly and can be implemented immediately (within minutes) by changing the Bedrock invocation parameters—meeting the 24-hour constraint.

Key AWS Features: Amazon Bedrock runtime APIs allow per-request or default configuration of inference parameters such as temperature and top_p (nucleus sampling). Teams can A/B test parameter sets against the same evaluation prompts to quantify hallucination reduction. In production, these settings can be applied consistently via the application layer or orchestration components (for example, a Bedrock invocation wrapper) without modifying the underlying foundation model.

Common Misconceptions: It’s tempting to think “hallucinations require better training data” or “a different model,” but the prompt explicitly forbids retraining and changing the model. Another misconception is that Agents for Amazon Bedrock “supervise training”; agents orchestrate tool use, retrieval, and action execution at inference time, not model training. While retrieval-augmented generation (RAG) and guardrails can also reduce hallucinations, the option set here focuses on the quickest guaranteed change: decoding randomness.

Exam Tips: When constraints say “no retraining/fine-tuning” and “must fix fast,” look for inference-time mitigations: lower temperature/top_p, add grounding via retrieval, add guardrails, and require citations. High temperature/top_p increases creativity; low temperature increases determinism—preferred for regulated domains like finance, healthcare, and legal. Map the mitigation to the constraint and timeline: parameter tuning is the fastest, lowest-risk change.
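A minimal sketch of the fix, under the assumption of a Converse-style request body with a placeholder model ID: the only change is the inference configuration, and the same 500-prompt evaluation defines the pass/fail arithmetic. No API call is made here; only the payload and the evaluation targets are computed.

```python
# The inference-time fix: swap high-randomness sampling parameters for
# conservative ones. Model ID and request shape are illustrative placeholders.
before = {"temperature": 0.9, "topP": 0.95}  # settings that produced the 12% rate
after = {"temperature": 0.2, "topP": 0.9}    # more deterministic sampling

def build_request(question: str, params: dict) -> dict:
    return {
        "modelId": "example.placeholder-model-v1",  # placeholder
        "messages": [{"role": "user", "content": [{"text": question}]}],
        "inferenceConfig": {"maxTokens": 512, **params},
    }

req = build_request("Explain the risk profile of this bond fund.", after)

# Re-running the same 500-prompt evaluation quantifies the improvement:
eval_prompts = 500
baseline_flags = round(eval_prompts * 0.12)  # 60 hallucinated responses before
max_allowed = round(eval_prompts * 0.05)     # must fall below 25 to pass
```

Because the evaluation set is fixed, the before/after comparison is apples-to-apples: the change passes only if flagged responses drop from 60 to fewer than 25.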
A fintech startup using a general-purpose foundation model on Amazon Bedrock must ensure the model consistently uses industry-specific terminology (e.g., 'ACH return code R01', 'SAR filing') and compliant report formats; they have 12,000 labeled examples pairing prompts with the desired domain-specific outputs and want to adapt the model to this vocabulary and constraints—what technique should they use?
Data augmentation creates additional training examples by transforming or synthesizing variations of existing data (e.g., paraphrasing prompts, adding noise, generating extra labeled pairs). It can help improve robustness and reduce overfitting, but by itself it is not the technique that makes a foundation model consistently adopt new terminology and formatting. Augmentation is often complementary to fine-tuning, not a replacement for it.
Fine-tuning uses labeled prompt–response examples to update a model’s weights so it reliably follows domain-specific language, tone, and output formats. With 12,000 labeled pairs, the startup can teach the model to consistently use fintech terminology (ACH codes, SAR phrasing) and produce compliant report structures. In Amazon Bedrock, fine-tuning is the standard approach for supervised model customization when you need consistent behavior beyond prompt engineering.
Model quantization reduces model precision (e.g., from FP16 to INT8/INT4) to lower inference cost, memory footprint, and sometimes latency. It does not teach the model new vocabulary, improve adherence to compliance formats, or align outputs to domain-specific constraints. Quantization is an optimization technique applied after training/customization, not a method for adapting a general-purpose model to fintech terminology.
Continuous pre-training (continued pre-training) trains the model further on large volumes of unlabeled domain text to improve domain knowledge and language patterns. While it can help a model “sound” more like the domain, it is typically used when you have massive domain corpora and need broader domain adaptation. Given the presence of 12,000 labeled prompt/output pairs and the need for specific compliant formats, supervised fine-tuning is the more direct and exam-appropriate choice.
Core Concept: This question tests how to adapt a general-purpose foundation model (FM) on Amazon Bedrock to reliably produce domain-specific language and structured, compliant outputs. The key concept is supervised adaptation of an FM using labeled prompt–response pairs.

Why the Answer is Correct: Fine-tuning is the appropriate technique when you have a dataset of labeled examples (here, 12,000 prompt/output pairs) and you want the model to learn consistent terminology (e.g., ACH return codes, SAR filing language) and formatting constraints. Fine-tuning updates model weights so the behavior becomes more “baked in” than prompt-only approaches, improving consistency and reducing reliance on long, brittle prompts. In Bedrock, fine-tuning is designed for exactly this: aligning an FM to your organization’s domain vocabulary, tone, and output structure using supervised examples.

Key AWS Features: On Amazon Bedrock, you can use model customization (fine-tuning) with your training data stored in Amazon S3, producing a customized model artifact you can invoke like the base model. This supports repeatable, versioned deployments and can be combined with guardrails (Amazon Bedrock Guardrails) and evaluation to help enforce safety/compliance requirements. For fintech, you’d typically also apply data governance controls (encryption, access policies) and maintain auditability of training data and model versions.

Common Misconceptions: Data augmentation can increase dataset size/variety but does not itself adapt the model. Continuous pre-training sounds like “teach the model finance,” but it requires large-scale unlabeled corpora and is more complex/costly than needed for 12,000 labeled pairs. Quantization improves inference efficiency/cost, not domain adherence.

Exam Tips: If the question mentions “labeled prompt–completion pairs” and the goal is consistent style/terminology/format, think fine-tuning. If it mentions “unlabeled domain text at scale,” think continued/continuous pre-training. If it mentions “reduce latency/cost,” think quantization. If it mentions “expand training examples,” think augmentation. Also remember: RAG helps with factual grounding from documents, but the prompt explicitly asks to adapt vocabulary/constraints using labeled examples—classic fine-tuning territory.
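A supervised fine-tuning dataset of this kind is typically serialized as JSON Lines, one labeled pair per record. The field names (prompt/completion) and the sample content below are illustrative assumptions; the exact schema depends on the base model being customized.

```python
import json

# Hypothetical labeled prompt/output pairs (illustrative only).
examples = [
    {"prompt": "Classify this ACH return reason: insufficient funds.",
     "completion": "ACH return code R01 (Insufficient Funds)."},
    {"prompt": "Draft the opening sentence of a SAR filing narrative.",
     "completion": "This Suspicious Activity Report (SAR) describes transactions "
                   "flagged for review under the institution's monitoring program."},
]

# One JSON object per line, the format commonly expected for Bedrock
# model-customization training data uploaded to S3.
jsonl = "\n".join(json.dumps(rec, ensure_ascii=False) for rec in examples)
print(jsonl.splitlines()[0])
```

With 12,000 such records, the startup would upload the file to S3 and reference it from a Bedrock customization job.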
An online retail company has deployed a product-recommendation model behind an API that serves about 10,000 inference requests per hour with an SLO that 95% of requests complete under 200 ms; which metric would most directly indicate the runtime efficiency of the operating model?
Customer satisfaction score (CSAT) is a business KPI that can correlate with performance, but it is not a direct measure of runtime efficiency. CSAT is influenced by many non-technical factors (product quality, pricing, UX, delivery experience). Even if latency improves, CSAT might not change, and vice versa. For an inference SLO in milliseconds, you should focus on latency/response-time metrics instead.
Training time for each epoch measures how quickly the model trains, which is part of the ML development lifecycle, not the runtime serving lifecycle. It can indicate training efficiency and cost, but it does not tell you whether the deployed API can meet a 200 ms latency SLO for inference. Training performance and inference performance are often optimized differently and can even trade off.
Average response time is a direct runtime metric that reflects how quickly inference requests are completed by the deployed API. It is closely tied to operating efficiency (model size, compute choice, container performance, scaling). Although the SLO is defined as p95 under 200 ms (a percentile metric), average response time is still the most directly relevant option provided for runtime efficiency and is commonly monitored in CloudWatch.
Number of training instances indicates the scale of compute used during training, which affects training speed and cost. It does not directly measure inference runtime efficiency or whether the production endpoint meets latency SLOs. Inference performance depends more on the serving infrastructure (instance type, autoscaling, concurrency, batching) than on how many instances were used to train the model.
Core Concept - The question is testing operational (runtime) efficiency for an ML model served behind an inference API. In AWS terms, this maps to model serving performance/latency metrics (for example, Amazon SageMaker real-time endpoints, ECS/EKS-based inference, or Lambda-based inference), and how those metrics relate to an SLO such as “95% of requests under 200 ms.”

Why the Answer is Correct - “Average response time” is the metric among the options that most directly reflects runtime efficiency of the operating model. Runtime efficiency is about how quickly the system can complete inference requests given the deployed model, instance type, container/runtime, and scaling configuration. While the SLO is explicitly percentile-based (p95 latency), average response time is still a direct latency metric and is far closer to runtime efficiency than training or business satisfaction metrics.

Key AWS Features - In production, you would typically monitor latency percentiles (p50/p90/p95/p99) and throughput using Amazon CloudWatch metrics and logs. For SageMaker endpoints, key metrics include ModelLatency and OverheadLatency (and their percentile statistics), plus Invocations and CPUUtilization/MemoryUtilization to correlate performance with resource saturation. For API Gateway + Lambda/ECS/EKS, you’d use CloudWatch metrics (Latency, IntegrationLatency), ALB target response time, and distributed tracing (AWS X-Ray) to isolate where time is spent. Autoscaling (SageMaker endpoint autoscaling, ECS Service Auto Scaling, KEDA on EKS) helps maintain latency under load.

Common Misconceptions - CSAT is an outcome metric influenced by many factors beyond inference runtime (UI, pricing, delivery, etc.). Training time per epoch and number of training instances relate to the training phase, not inference serving. These can affect model iteration speed and cost, but they do not directly indicate how efficiently the deployed endpoint handles live requests.
Exam Tips - When you see an SLO stated in milliseconds and percentiles, think “latency metrics” (especially p95/p99). If the exact percentile metric is not offered, choose the closest direct runtime performance indicator (response time/latency) rather than business KPIs or training metrics. Also remember to distinguish training-time optimization (epochs, training cluster size) from inference-time optimization (instance type, batching, model compilation, autoscaling, caching).
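The difference between average latency and a p95 SLO can be seen in a small sketch. The nearest-rank percentile used here is one common definition (in practice CloudWatch computes percentile statistics for you); the sample values are made up for illustration.

```python
import math

def p95(latencies_ms):
    """Nearest-rank 95th percentile of a list of latencies."""
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))  # 1-based rank
    return ordered[rank - 1]

# 94 fast requests and 6 slow ones: the average looks healthy,
# but the p95 tail violates a 200 ms SLO.
samples = [100] * 94 + [300] * 6
avg = sum(samples) / len(samples)
print(avg, p95(samples))  # 112.0 300
```

This is why exam answers favor percentile latency when it is offered: an average of 112 ms here masks the fact that more than 5% of requests exceed 200 ms.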
A regional healthcare clinic uses a foundation model (FM) from Amazon Bedrock to power a triage assistant that answers patient questions; the system handles about 18,000 queries per day, and the clinic wants to improve accuracy on clinic-specific policies by fine-tuning the FM with 6,000 curated examples. Which strategy will successfully fine-tune the model?
Correct. Fine-tuning in Amazon Bedrock is driven by labeled examples that pair an input with the desired output. Representing each example with a prompt field (instruction/user input/context) and a completion field (ideal assistant response) matches the supervised fine-tuning paradigm and is the core requirement to teach the model clinic-specific policy behavior.
Incorrect. A plain .txt file containing CSV-formatted lines is not the expected dataset structure for Bedrock fine-tuning. Bedrock customization workflows typically require a structured dataset format (commonly JSON Lines) with clearly defined fields (for example, prompt/completion). Using an arbitrary text file risks schema validation failure and unusable training data.
Incorrect. Provisioned Throughput for Amazon Bedrock is an inference capacity feature that reserves model throughput for consistent performance and lower variance in latency. It does not modify model weights and therefore does not improve accuracy on clinic-specific policies. It addresses scaling and predictability, not fine-tuning or customization.
Incorrect. Training on journals and textbooks is not a targeted fine-tuning strategy for clinic-specific policies, and it is not described as labeled prompt-response pairs. Unlabeled domain text is more aligned with pretraining or building a retrieval corpus for RAG. To improve policy adherence, the model needs supervised examples of desired answers.
Core Concept: This question tests Amazon Bedrock model customization (fine-tuning) and the required training data format for supervised fine-tuning of a foundation model. In Bedrock, fine-tuning typically uses labeled prompt-response pairs so the model learns to produce clinic-specific outputs given clinic-specific inputs.

Why the Answer is Correct: To fine-tune an FM for better accuracy on clinic policies, the clinic must supply curated examples that map an input (what the user asks or the instruction/context) to the desired output (the ideal assistant answer). Option A describes the essential supervised fine-tuning structure: a “prompt” field and a “completion” field (i.e., input-output pairs). This is the canonical approach for instruction tuning and is what Bedrock customization workflows expect conceptually: the model is trained to predict the completion conditioned on the prompt.

Key AWS Features: Amazon Bedrock supports model customization for certain FMs, where you provide training and (optionally) validation datasets stored in Amazon S3. The dataset is commonly provided as structured records (often JSON Lines) containing prompt/completion (or equivalent input/output) fields. You also configure training parameters (epochs, learning rate where applicable), and Bedrock produces a customized model artifact you can invoke via the Bedrock runtime. For healthcare, you should also ensure data governance: least-privilege access to S3, encryption (SSE-KMS), and careful PHI handling.

Common Misconceptions: A frequent trap is confusing fine-tuning with throughput scaling. Provisioned Throughput improves performance/consistency for inference but does not change model weights or accuracy. Another trap is thinking any text corpus (journals/textbooks) will “teach” the model policies; without labeled target outputs, that’s not supervised fine-tuning for policy adherence. Finally, incorrect file formats (like a .txt containing CSV lines) won’t meet Bedrock’s expected dataset schema.

Exam Tips: When you see “fine-tune with curated examples,” think “supervised prompt-response pairs” and “S3-hosted structured dataset.” If the option talks about throughput, latency, or capacity, that’s inference scaling, not training. If it talks about unlabeled corpora, that’s closer to pretraining or RAG content, not fine-tuning for specific response behavior.
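Wiring the S3-hosted dataset into a customization job might look like the following sketch. Every identifier here (job name, role ARN, bucket, base model ID) is a placeholder, and the available hyperparameter keys vary by base model, so this is a shape to recognize rather than a definitive configuration.

```python
# Sketch of the request for a Bedrock model-customization (fine-tuning) job.
# All names, ARNs, and S3 URIs below are placeholders, not real resources.
job_kwargs = {
    "jobName": "clinic-policy-ft-001",
    "customModelName": "triage-assistant-clinic-v1",
    "roleArn": "arn:aws:iam::111122223333:role/BedrockCustomizationRole",
    "baseModelIdentifier": "example-base-model-id",
    "trainingDataConfig": {"s3Uri": "s3://example-bucket/train/clinic-policies.jsonl"},
    "outputDataConfig": {"s3Uri": "s3://example-bucket/output/"},
    "hyperParameters": {"epochCount": "2"},  # supported keys depend on the base model
}

# The actual call (not run here) would be:
#   bedrock = boto3.client("bedrock")
#   bedrock.create_model_customization_job(**job_kwargs)
print(job_kwargs["trainingDataConfig"]["s3Uri"])
```

Note that the training data URI points at a JSONL file of prompt/completion records; this is what separates supervised customization from simply reserving Provisioned Throughput for inference.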