
AWS
192+ Soal Latihan Gratis dengan Jawaban Terverifikasi AI
Didukung AI
Setiap jawaban AWS Certified AI Practitioner (AIF-C01) diverifikasi silang oleh 3 model AI terkemuka untuk memastikan akurasi maksimum. Dapatkan penjelasan detail per opsi dan analisis soal mendalam.
A public health analytics team at a city hospital network plans to build an AI application using large language models (LLMs) that will read 20–50 page clinical incident reports (PDFs) and produce key takeaways for safety reviews; the system must return a concise summary of the top findings (≤200 words) within 5 seconds per request and support up to 10 concurrent reviewers. Which solution meets these requirements?
NER is a classic NLP extraction task that identifies and labels entities (e.g., drug names, dosages, patient identifiers). While useful for structuring clinical text and supporting downstream analytics, it does not generate a concise narrative or bullet summary of “top findings.” It also doesn’t inherently satisfy the requirement to return a ≤200-word takeaway summary within 5 seconds.
A recommendation engine suggests similar or related incident reports, which can improve reviewer workflow and discovery. However, it does not read a single 20–50 page PDF and produce key takeaways. Even if paired with search, it fails the primary functional requirement: generating a concise summary of the report’s top findings within the specified word and latency constraints.
Option C is correct because the primary requirement is to read long clinical incident reports and generate a concise summary of the top findings. That is a classic LLM summarization use case, since large language models are designed to understand unstructured text and produce coherent condensed outputs. The option also explicitly includes the requirement to keep the response within 200 words and return results in under 5 seconds, matching the stated functional and performance constraints. None of the other options produce a summary, so C is the only choice that directly satisfies the business need.
Translation systems convert text between languages. The prompt does not require multilingual support; it requires summarization of clinical incident reports. Translation may be valuable in other hospital contexts, but it does not produce “top findings” summaries and does not address the core requirement. Therefore it is not the best solution for this use case.
Core Concept: This question tests selecting the correct generative AI application pattern—LLM-based document summarization—based on explicit functional and nonfunctional requirements (output length, latency, and concurrency). In AWS terms, this commonly maps to using a foundation model (for example via Amazon Bedrock) to perform summarization on long-form documents. Why the Answer is Correct: The team needs an AI application that reads 20–50 page PDFs and produces a concise summary of top findings (≤200 words) within 5 seconds per request, supporting up to 10 concurrent reviewers. Option C directly describes an LLM-powered summarization assistant that performs exactly this task and aligns with the constraints (bounded summary length and low latency). The other options describe different ML tasks (NER, recommendations, translation) that do not satisfy the primary requirement: generating a concise narrative/bulleted summary of key findings. Key AWS Features: A typical AWS implementation would use Amazon S3 to store PDFs, text extraction (often Amazon Textract for PDFs) to convert documents to text, and an LLM for summarization (for example, Amazon Bedrock with an appropriate model). To meet the 5-second SLA and 10 concurrent users, you would focus on: - Prompt design to enforce ≤200 words (explicit instruction + max tokens). - Low-latency inference (choose a model/throughput configuration appropriate for real-time requests). - Caching summaries for repeated access (for example, DynamoDB/ElastiCache) and asynchronous pre-summarization for newly uploaded reports when possible. - Concurrency controls and scaling at the API layer (API Gateway + Lambda/ECS) while ensuring the model endpoint can handle parallel requests. Common Misconceptions: NER (Option A) can look relevant because clinical reports contain entities like drugs and dosages, but tagging entities does not produce a coherent “top findings” summary. Recommendations (Option B) help discovery, not summarization. Translation (Option D) is unrelated unless multilingual output is required. Exam Tips: For exam questions, anchor on the verb and output: “produce key takeaways/summary” strongly indicates generative AI summarization. Then validate against nonfunctional requirements (latency, concurrency, output length). Choose the option that matches the end-user deliverable, not adjacent analytics tasks (entity extraction, search, recommendations).
Ingin berlatih semua soal di mana saja?
Unduh Cloud Pass gratis — termasuk tes latihan, pelacakan progres & lainnya.
Ingin berlatih semua soal di mana saja?
Unduh Cloud Pass gratis — termasuk tes latihan, pelacakan progres & lainnya.
Ingin berlatih semua soal di mana saja?
Unduh Cloud Pass gratis — termasuk tes latihan, pelacakan progres & lainnya.
Masa belajar: 1 month
I found this practice questions and explanations very aligned with actual certification exam.
Masa belajar: 2 weeks
문제 좋네요. 비슷한 유형들이 꽤나 많았어요
Masa belajar: 1 month
실무에서 ai 서비스 개발 중이어서 문제 풀 때 쉬웠고, 시험도 무난하게 합격했어요
Masa belajar: 2 weeks
실제 시험 문제 절반이 비슷했고, 나머지는 새로보는 유형이었어요
Masa belajar: 1 month
I learned about this certification only with these questions and passed it after learning for 2 days


Unduh Cloud Pass dan akses semua soal latihan AWS Certified AI Practitioner (AIF-C01) secara gratis.
Ingin berlatih semua soal di mana saja?
Dapatkan aplikasi gratis
Unduh Cloud Pass gratis — termasuk tes latihan, pelacakan progres & lainnya.
An IT operations team uses an LLM to diagnose incidents by analyzing 8–12 sequential service logs and 5 metric anomalies, and they require the model to produce a numbered, step-by-step reasoning trace with intermediate calculations (e.g., latency deltas and error-rate ratios) that justifies the final root cause and remediation; which prompt engineering technique best meets these requirements?
Few-shot prompting provides a small set of examples (input-output pairs) to teach the model the desired pattern. It can help the LLM format incident analyses consistently and may improve accuracy by demonstrating how to compute deltas/ratios. However, the requirement is specifically to produce an explicit step-by-step reasoning trace with intermediate calculations, which is more directly addressed by chain-of-thought prompting than by merely providing examples.
Zero-shot prompting relies only on instructions without examples. While you could ask for numbered steps and calculations, zero-shot is generally less reliable for complex, multi-evidence reasoning across sequential logs and metrics. It often increases variability in structure and completeness, making it a weaker choice when the output must consistently include intermediate computations and a justified root cause/remediation trace.
Directional stimulus prompting steers the model by emphasizing specific signals, constraints, or hints (e.g., “focus on latency spikes after deployment,” “prioritize 5xx errors”). This can improve relevance and reduce distraction from noisy logs, but it does not inherently cause the model to expose a full reasoning chain with intermediate calculations. It is more about guiding attention than eliciting explicit step-by-step reasoning.
Chain-of-thought prompting is intended to elicit multi-step reasoning by having the model articulate intermediate steps before producing a final answer. This directly matches the requirement for a numbered reasoning trace with intermediate calculations (latency deltas, error-rate ratios) that justify the final root cause and remediation. It is the best fit when the task requires transparent, structured reasoning over multiple logs and metric anomalies.
Core Concept: This question tests prompt engineering techniques for generative AI/LLMs, specifically how to elicit structured, multi-step reasoning with intermediate computations from a model. In AWS exam contexts, this commonly appears alongside Amazon Bedrock or Amazon SageMaker JumpStart usage patterns, but the core is the prompting method. Why the Answer is Correct: The requirement is explicit: the team needs a numbered, step-by-step reasoning trace with intermediate calculations (latency deltas, error-rate ratios) that justifies the final diagnosis and remediation. Chain-of-thought prompting is designed to encourage the model to “show its work” by producing intermediate reasoning steps rather than only a final answer. When you ask for a structured reasoning trace (e.g., “Think step by step; include calculations; output numbered steps”), you are applying chain-of-thought prompting to improve coherence across multiple evidence items (8–12 sequential logs plus 5 metric anomalies) and to reduce the chance the model jumps to an unsupported conclusion. Key AWS Features / Best Practices: In practice on AWS (e.g., Amazon Bedrock), you would combine chain-of-thought style instructions with output formatting constraints (numbered steps, sections for calculations, final root cause, remediation). You may also pair it with retrieval (RAG) to supply the log/metric context, but the question is specifically about the prompting technique that yields intermediate reasoning. For operational use, teams often additionally enforce guardrails (e.g., “cite which log lines/metrics support each step”) and validate calculations externally. Common Misconceptions: Few-shot prompting can improve task adherence by providing examples, but it does not inherently require the model to reveal intermediate reasoning. Zero-shot is least likely to reliably produce detailed traces. Directional stimulus prompting can steer attention toward certain evidence, but it is not the canonical technique for eliciting explicit multi-step reasoning with calculations. Exam Tips: When a question asks for “step-by-step reasoning,” “show your work,” “intermediate steps,” or “multi-hop reasoning,” the exam answer is typically chain-of-thought prompting. If the question instead emphasizes “learn from examples,” choose few-shot. If it emphasizes “do it with only instructions,” choose zero-shot. If it emphasizes “focus on specific cues or constraints,” directional stimulus may apply.
A real-time financial news platform ingests about 12 GB of new multilingual articles per day and wants its in-house foundation model (FM) to reflect breaking developments within 24 hours without resetting weights; the team plans daily refreshes using a rolling 90-day corpus and warm-starting from the latest checkpoint to preserve capabilities. Which training strategy will keep the FM current with the most recent data while maintaining previously learned knowledge?
Batch learning is too generic for this scenario and does not specifically describe the continued training of a foundation model from an existing checkpoint. Many training jobs process data in batches, but that alone does not address the requirement to keep the FM current with newly arriving text while preserving prior knowledge. The question is asking for a lifecycle strategy for updating a pretrained model, not merely a data ingestion style. In exam terms, batch learning is not the precise label for warm-started ongoing FM refreshes.
Continuous pre-training is the correct strategy because it continues training an existing foundation model from its latest checkpoint instead of starting over from randomly initialized weights. That directly matches the requirement to refresh the model daily, keep it current with breaking multilingual news, and preserve previously learned capabilities. Using a rolling 90-day corpus helps the model absorb recent developments while still revisiting prior data, which reduces catastrophic forgetting. This is the standard approach when an organization wants an FM to stay up to date without resetting weights or discarding prior knowledge.
Static training means training the model once on a fixed dataset and then leaving the weights unchanged until a future full retraining cycle. That conflicts with the requirement to reflect breaking developments within 24 hours and to perform daily refreshes. A static model would quickly become outdated in a financial news setting where new events materially change the information landscape every day. It also does not align with the stated plan to warm-start from the latest checkpoint on a rolling corpus.
Latent training is not a standard training strategy term used for foundation model maintenance in AWS certification contexts or mainstream ML practice. Although the word 'latent' appears in discussions of latent representations or latent spaces, it does not describe a recognized method for incrementally updating an FM with new corpora. The option therefore does not match the operational pattern of daily checkpoint-based updates over a rolling dataset. It is essentially a distractor rather than a valid answer choice for this use case.
Core Concept: This question tests training lifecycle strategies for foundation models (FMs), specifically how to keep an FM up to date with newly arriving data without losing previously learned capabilities. The relevant concept is incremental/ongoing training on new data using an existing checkpoint (warm start), often called continuous pre-training (continued pretraining). Why the Answer is Correct: The platform ingests new multilingual articles daily and needs the FM to reflect breaking news within 24 hours. The team also explicitly wants to avoid “resetting weights” and instead warm-start from the latest checkpoint while training on a rolling 90-day corpus. That is the hallmark of continuous pre-training: you periodically continue training the same base model on newly collected domain data, starting from the most recent weights, so the model adapts to recent information while retaining general language competence. Using a rolling window helps balance recency with retention and reduces catastrophic forgetting compared to training only on the newest day’s data. Key AWS Features / Best Practices: On AWS, this pattern is typically implemented as a scheduled training pipeline (for example, orchestrated with Amazon SageMaker Pipelines or AWS Step Functions) that: 1) Curates daily data into Amazon S3, 2) Builds a rolling 90-day training set (often with versioning and lineage), 3) Launches a training job that initializes from the latest model checkpoint stored in S3, and 4) Registers the updated model in a model registry for controlled deployment. Best practices include checkpointing, data/version governance, evaluation gates (to detect regressions), and monitoring for drift and degradation. Common Misconceptions: “Batch learning” can sound similar because it uses periodic batches, but in ML terminology it usually contrasts with online learning and does not specifically imply continuing pre-training of an FM from a prior checkpoint to keep knowledge current. “Static training” implies one-and-done training, which fails the 24-hour freshness requirement. “Latent training” is not a standard strategy for keeping FMs current. Exam Tips: When you see requirements like “warm-start from latest checkpoint,” “rolling corpus,” “daily refresh,” and “keep the base model current,” map it to continuous/continued pre-training. If the question instead emphasized task-specific adaptation with small labeled datasets, that would point more toward fine-tuning rather than pre-training.
While using Amazon Bedrock, a team sees that a 44-word chat message is billed as 1,536 input tokens and 72 output tokens; in this context, what does the term token refer to?
Correct. Tokens are the discrete text units a foundation model consumes and produces after tokenization. A token can be a whole word, part of a word (subword), punctuation, whitespace markers, or special symbols. Bedrock meters many models by counting these input and output tokens, which explains why a short message in words can still be large in tokens due to subword splitting and request overhead.
Incorrect. Mathematical vector representations are embeddings, not tokens. Embeddings are continuous-valued vectors used to represent meaning for tasks like semantic search or retrieval-augmented generation (RAG). Token counts used for billing refer to the number of discrete token IDs processed by the model, not the dimensionality or number of embedding vectors.
Incorrect. Pre-trained weights are the model parameters learned during training (often billions of parameters). They determine the model’s behavior and are not counted per request. Token billing is about runtime usage (how much text is processed/generated), whereas weights relate to the model’s size and training, not per-inference metering.
Incorrect. Prompts are the instructions and context you provide, but a prompt is composed of tokens after tokenization. Bedrock bills based on the number of tokens in the prompt (plus any included context/system messages) and the number of tokens in the generated completion. Tokens are the units inside the prompt, not the prompt itself.
Core Concept: This question tests understanding of “tokens” in generative AI usage and billing, specifically in Amazon Bedrock. Bedrock (and most LLM providers) meters requests based on token counts for input (prompt + conversation history + system instructions) and output (model-generated text). Why the Answer is Correct: A token is the basic unit of text that a model reads and writes. Tokens are not always whole words; they can be subwords (word pieces), punctuation, whitespace markers, or special control symbols. That’s why a 44-word message can map to a much larger number of input tokens: the model’s tokenizer may split words into multiple pieces, and the billed “input tokens” often include more than the visible user message (for example, system prompts, formatting wrappers, safety instructions, and prior chat turns that are sent along with the request). Key AWS Features: In Amazon Bedrock, pricing and quotas are typically expressed in terms of input and output tokens for the selected foundation model. Tokenization is model-specific: different models (and even different versions) can tokenize the same text differently, leading to different token counts and costs. In chat use cases, the request commonly includes structured message roles (system/user/assistant) and may include conversation context, which increases input tokens. Understanding token-based metering helps with cost optimization (shorter prompts, trimming history, summarizing context) and latency management. Common Misconceptions: People often confuse tokens with embeddings (vectors) because both relate to text processing. Others assume tokens equal words, which is incorrect; tokenization is a preprocessing step that converts text into discrete IDs. Another confusion is thinking tokens are model parameters (weights) or prompts themselves—those are different concepts. Exam Tips: When you see Bedrock billing, throughput, or context window questions, “tokens” almost always means the text units produced by a tokenizer (word/subword/symbol units). Remember that input token counts can include hidden overhead (system prompts, chat formatting, and conversation history). If a word count seems inconsistent with token count, that’s a clue the question is about tokenization granularity and request composition, not an error in billing.
A video streaming platform wants to use AI to protect its content delivery APIs from malicious traffic; the AI must determine whether a new request’s client IP and user agent come from a suspicious source by comparing against 30 days of baseline traffic patterns, score up to 12,000 requests per minute with under 150 ms latency per request, and automatically flag unusual sources; which solution meets these requirements?
Speech recognition converts audio to text (ASR) and is used for call transcription, voice assistants, and media captioning. The problem here involves structured request attributes (client IP and user agent) and detecting suspicious sources, not interpreting audio signals. Even if the platform is “video streaming,” the security requirement is about API traffic analysis, so speech recognition is not applicable.
NLP named entity recognition (NER) extracts entities (people, organizations, locations, etc.) from unstructured text. The inputs in this scenario are client IP addresses and user-agent strings used for request fingerprinting and behavioral analysis. While user-agent is a string, the task is not entity extraction; it is identifying deviations from normal traffic patterns. Therefore, NER is the wrong ML approach.
An anomaly detection system is designed to learn normal patterns from historical data (e.g., 30 days of traffic) and score new events to flag unusual behavior. This directly matches the requirement to compare new requests’ IP/user-agent against baseline patterns and automatically identify suspicious sources. It also fits real-time scoring needs (12,000 RPM, <150 ms) via low-latency inference (e.g., SageMaker endpoints) and automated response (e.g., AWS WAF updates).
Fraud forecasting focuses on predicting future fraud rates or volumes over time (time-series forecasting), not evaluating each incoming request for abnormality relative to a learned baseline. The requirement is to score individual requests and flag unusual sources immediately, which is anomaly detection/classification. Forecasting could complement capacity planning or trend analysis, but it does not satisfy per-request, low-latency suspicious-source detection.
Core Concept: This question tests selecting the correct ML problem type for security telemetry: detecting unusual request sources (client IP + user agent) by learning “normal” behavior over a historical window and scoring new events in near real time. That is classic anomaly detection (often unsupervised or semi-supervised) rather than NLP, speech, or time-series forecasting. Why the Answer is Correct: The platform needs to compare new requests against 30 days of baseline traffic patterns and automatically flag unusual sources. That maps directly to anomaly detection: build a model of normal patterns (e.g., typical IP/user-agent combinations, frequency, geo/ASN distributions, request rates) and score incoming requests for deviation. The requirement to “score up to 12,000 requests per minute with under 150 ms latency per request” implies low-latency online inference, which anomaly detection systems commonly support via real-time endpoints. Key AWS Features: On AWS, this is commonly implemented with Amazon SageMaker (real-time inference endpoint) using built-in algorithms such as Random Cut Forest (RCF) for anomaly detection, or custom models. You would train on 30 days of logs (e.g., from Amazon CloudFront, ALB, API Gateway access logs, or AWS WAF logs) stored in Amazon S3, then deploy to a SageMaker endpoint with autoscaling to meet 12k RPM and latency targets. For streaming ingestion and near-real-time scoring, Amazon Kinesis Data Streams / Firehose can feed features to the model, and results can trigger automated actions (e.g., update AWS WAF IP sets, publish to Amazon EventBridge, or alert via Amazon SNS). This aligns with Well-Architected Security and Reliability pillars: automate detection/response and scale horizontally. Common Misconceptions: Some may confuse “suspicious traffic” with “fraud forecasting,” but forecasting predicts future values (e.g., demand or fraud volume) rather than classifying individual requests as anomalous. NLP named entity recognition and speech recognition are irrelevant because the inputs are structured request metadata, not text entities or audio. Exam Tips: When you see “baseline of normal behavior,” “flag unusual,” and “score events,” think anomaly detection. When you see “predict future trend,” think forecasting. Match the ML task to the data type: IP/user-agent telemetry is structured, not language or audio. Also note operational constraints (RPM/latency) point to real-time inference endpoints and autoscaling.
A healthcare startup is using few-shot prompting with a foundation model hosted on Amazon Bedrock; the current prompt includes 12 examples, the model is invoked once per day at 07:00 UTC, and output quality is satisfactory. The company wants to lower its monthly cost while maintaining current performance. Which solution will meet these requirements?
Customizing/fine-tuning a model can sometimes reduce the need for long few-shot prompts, but it adds extra cost and operational overhead (training, evaluation, and potentially higher ongoing charges). Given the model runs only once per day and quality is already satisfactory, fine-tuning is unlikely to be the most cost-effective way to reduce monthly spend while maintaining current performance.
Bedrock usage costs are largely driven by input and output tokens. Few-shot prompting with 12 examples increases input tokens every invocation. Reducing prompt tokens (fewer/shorter examples, removing redundant text, concise formatting) directly lowers cost while keeping the same model and invocation frequency. You can iteratively trim and validate quality to maintain current performance.
Increasing the number of tokens in the prompt (for example, adding more examples or longer instructions) will increase input token usage and therefore increase cost. While it might improve quality in some cases, the question states output quality is already satisfactory and the goal is to lower monthly cost, making this the opposite of what is needed.
Provisioned Throughput on Amazon Bedrock is intended for workloads that need guaranteed capacity, predictable latency, and sustained throughput. For a single invocation per day, provisioned capacity would typically be more expensive than on-demand token-based usage. It does not reduce token charges and is not aligned with the low-frequency usage pattern described.
Core Concept: Amazon Bedrock pricing for most foundation models is primarily usage-based (input tokens + output tokens). Few-shot prompting increases input tokens because each example adds text to the prompt. If quality is already satisfactory, the most direct way to reduce cost while keeping the same model and invocation pattern is to reduce token usage. Why the Answer is Correct: The model is invoked only once per day, so throughput/latency guarantees are not a requirement. With 12 few-shot examples, the prompt likely contains many tokens that are billed every invocation. Decreasing the number of tokens in the prompt (for example, reducing the number of examples, shortening examples, removing redundant instructions, or compressing formatting) reduces input token consumption and therefore lowers monthly cost. Because the current output quality is satisfactory, you can iteratively trim tokens while validating that performance remains acceptable—meeting the “maintain current performance” requirement. Key AWS Features: Bedrock model invocation costs scale with token counts; prompt engineering is an operational lever to control spend. Practical techniques include: keeping only the most representative few-shot examples, using shorter exemplars, removing verbose system instructions, and standardizing concise schemas. You can measure token usage and response quality via logging/observability (for example, capturing prompt/response metadata) and run A/B tests to ensure quality is maintained. Common Misconceptions: Fine-tuning/customization can improve quality or reduce prompt length, but it introduces additional costs (training and potentially hosting/management) and is unnecessary when quality is already satisfactory and traffic is extremely low. Provisioned Throughput is often misunderstood as a cost saver; it is designed for consistent, high-throughput workloads and reserved capacity, not a once-per-day invocation. Exam Tips: When a question asks to “lower cost” for LLMs, first look for token-reduction strategies (shorter prompts, fewer examples, smaller outputs) before considering customization or reserved capacity. Choose Provisioned Throughput only when the workload needs predictable capacity/latency at scale or sustained high request rates. For sporadic, low-volume usage, on-demand token-based pricing plus prompt optimization is typically the most cost-effective approach.
A retail analytics team is building a retrieval-augmented generation (RAG) prototype that stores 20 million 768-dimensional text embeddings in Amazon OpenSearch Service and must retrieve the top 20 most similar vectors within 100 ms; which OpenSearch capability specifically enables this type of vector database application?
Native integration with Amazon S3 is useful for snapshots, backups, and data lifecycle patterns (e.g., storing index snapshots in S3). However, S3 is object storage and does not provide the low-latency, high-dimensional similarity search required for RAG. S3 integration helps durability and cost management, but it does not enable k-NN retrieval of embeddings within 100 ms.
Geospatial indexing and queries support location-based use cases such as finding points within a radius, bounding box searches, and geo-distance sorting. While it is a specialized indexing capability, it is unrelated to semantic similarity over 768-dimensional embedding vectors. Geospatial features won’t help retrieve the top 20 nearest embeddings for a RAG workload.
Scalable vector index management and k-nearest neighbor (k-NN) search is the specific OpenSearch capability that enables vector database applications. It allows storing embeddings in vector fields and performing approximate nearest neighbor searches (top K most similar vectors) efficiently at large scale, meeting tight latency requirements typical of RAG systems (e.g., retrieving top 20 matches from tens of millions of vectors).
Real-time analysis on streaming data refers to ingesting and querying continuously arriving events (logs, metrics, clickstreams) with low indexing latency. Although OpenSearch is commonly used for near-real-time analytics, streaming support does not address the core requirement here: fast similarity search over high-dimensional embeddings. Vector retrieval performance depends on k-NN/ANN indexing, not streaming analytics.
Core Concept: This question tests Amazon OpenSearch Service’s vector database capabilities used in retrieval-augmented generation (RAG): storing high-dimensional embeddings and performing fast approximate similarity search (k-nearest neighbor) at scale. Why the Answer is Correct: A RAG system needs to retrieve the most semantically similar documents to a query embedding. With 20 million 768-dimensional vectors and a strict latency target (top 20 within 100 ms), the key OpenSearch capability is scalable vector indexing plus k-NN search. OpenSearch provides vector fields and k-NN/ANN (approximate nearest neighbor) search so you can efficiently find the closest vectors without scanning all 20 million embeddings. This is exactly what enables OpenSearch to function as a vector database for semantic search and RAG. Key AWS Features: OpenSearch supports vector search through k-NN functionality (commonly backed by ANN algorithms such as HNSW) and manages vector indexes across shards and nodes for horizontal scalability. You typically store embeddings in a vector field, choose an ANN index type/parameters, and query using k-NN to return the top K results (here, K=20). Performance depends on index parameters (e.g., graph construction/ef settings), shard sizing, instance types (memory/CPU), and using filters to narrow candidate sets when applicable. This aligns with Well-Architected performance efficiency: optimize data structures and scale-out rather than brute-force scans. Common Misconceptions: S3 integration (A) is about storing snapshots or ingesting data, not low-latency vector similarity search. Geospatial indexing (B) is specialized for location-based queries, not embedding similarity. Streaming analytics (D) relates to near-real-time log/event analytics, not ANN vector retrieval. Exam Tips: When you see “embeddings,” “top K similar,” “vector database,” or “RAG,” look for “vector search,” “k-NN,” or “ANN.” In OpenSearch/Elasticsearch-style services, the enabling feature is the k-NN/vector index, not generic search, storage integrations, or streaming features. Also note that strict latency with tens of millions of vectors strongly implies approximate nearest neighbor indexing rather than exact brute-force similarity computation.
A university research team used Amazon Bedrock to customize a foundation model to answer questions about campus services; they now want to run a structured validation on 2,500 new, unseen queries and need to upload a single 150 MB JSONL file that Amazon Bedrock can access for evaluation in the same AWS Region as the Bedrock instance. Which AWS service should they use to store this dataset so Bedrock can read it directly from object storage?
Amazon S3 is the correct choice because it is AWS’s native object storage service and is commonly used as the source location for datasets consumed by managed services like Amazon Bedrock. A 150 MB JSONL file is well within S3 object limits, and placing it in an S3 bucket in the same Region enables Bedrock to access it via an S3 URI with appropriate IAM permissions (and optional SSE-KMS encryption).
Amazon EBS is block storage designed to be attached to an EC2 instance (or certain managed compute services) as a volume. It is not an object storage service and does not provide a direct “read from object storage” integration point for Bedrock evaluation. Using EBS would require provisioning compute to host and expose the data, adding unnecessary complexity and not meeting the stated requirement.
Amazon EFS is a managed network file system (NFS) used primarily with EC2, containers, and some AWS compute services that can mount file systems. While it can store files, it is not object storage and is not the standard direct input source for Bedrock evaluation datasets. It would also require a mounted environment and network configuration rather than a simple S3 object reference.
AWS Snowcone is a small edge device used for offline/edge compute and data transfer to AWS, typically when connectivity is limited or for rugged environments. It is not a regional object storage service that Bedrock can directly read from for an evaluation job. Snowcone might help move data into AWS, but the dataset would still ultimately need to land in S3 for direct Bedrock access.
Core Concept: This question tests which AWS storage service provides object storage that Amazon Bedrock can read directly for model evaluation datasets. Bedrock evaluation workflows commonly reference datasets stored in Amazon S3 (for example, JSONL files) via an S3 URI in the same AWS Region. Why the Answer is Correct: Amazon S3 is AWS’s regional object storage service designed for storing and retrieving files (objects) like a single 150 MB JSONL dataset. Bedrock can access evaluation inputs from S3 because S3 is natively integrated across AWS services and is the standard “object storage” target referenced by managed AI/ML services. Storing the dataset in an S3 bucket in the same Region as the Bedrock resources satisfies the requirement for regional access and minimizes latency and data transfer complexity. Key AWS Features / Best Practices: S3 provides durable, highly available storage with simple access patterns (S3 URI), supports large objects well beyond 150 MB, and integrates with IAM for fine-grained access control. For Bedrock to read the dataset, you typically grant permissions (e.g., s3:GetObject on the bucket/prefix) to the Bedrock service role or execution role used for the evaluation job. You can also apply bucket policies, encryption (SSE-S3 or SSE-KMS), and S3 versioning for governance and repeatable evaluations. Common Misconceptions: EBS and EFS are file/block storage attached to compute (EC2, some container runtimes) and are not “object storage” endpoints that Bedrock reads from directly. Snowcone is an edge/offline data transfer and compute device, not a regional object store for online service access. These options can seem plausible because they store data, but they don’t match the “read directly from object storage” requirement. Exam Tips: When a question says “object storage,” “upload a file,” “service reads from a bucket,” or references datasets for managed ML/GenAI services, default to Amazon S3 unless there’s a specific constraint pushing you to another service. Also watch for Region requirements: choose an S3 bucket in the same Region and ensure IAM permissions and (if required) KMS key policies allow the service to read the objects.
An e-commerce marketplace plans to deploy a product Q&A assistant using a managed large language model on Amazon Bedrock; in a two-day pilot, each request averages 800 prompt tokens and 200 completion tokens, the temperature is set to 0.7, and no custom training or fine-tuning is performed for the model. Which factor will primarily drive the daily inference charges when the system processes 10,000 requests?
Correct. Amazon Bedrock on-demand inference charges are primarily based on token usage: the number of input (prompt) tokens plus output (completion) tokens. With 10,000 requests/day and ~1,000 tokens/request, total daily tokens dominate cost. This aligns with typical Bedrock model pricing, which specifies rates for input and output tokens separately.
Incorrect. Temperature controls randomness/creativity in generation, affecting determinism and sometimes indirectly influencing response length. However, Bedrock does not bill “per temperature setting.” Billing is metered by usage (primarily tokens). Even if temperature changes the style of output, the charge is still driven by how many tokens are generated and processed.
Incorrect. The scenario explicitly states no custom training or fine-tuning is performed. Training data volume would matter only if the solution included model customization (fine-tuning) or separate training workflows. For managed FM inference in Bedrock, you pay for inference usage (tokens), not for the provider’s original pre-training data.
Incorrect. Training time is irrelevant here because the model is not being trained by the customer. Bedrock provides managed foundation models where customers typically consume inference APIs. Charges based on training duration would apply to training jobs in services like Amazon SageMaker training or certain customization workflows, not standard Bedrock inference.
Core Concept: Amazon Bedrock inference pricing for managed foundation models is primarily usage-based. For most Bedrock models, on-demand inference charges are driven by the number of tokens processed (input/prompt tokens plus output/completion tokens). This question tests understanding of how generative AI inference is metered and billed. Why the Answer is Correct: With 10,000 requests/day and an average of 800 prompt tokens + 200 completion tokens, the system consumes ~1,000 tokens per request, or ~10,000,000 tokens/day total. Bedrock’s inference cost scales with this token volume because the service meters how much text you send to the model and how much text the model generates. Since no fine-tuning or custom training is performed, there are no training-related charges to consider. Therefore, the dominant cost driver is tokens consumed per request. Key AWS Features: Bedrock provides access to multiple foundation models with pricing typically expressed per 1,000 (or 1 million) input tokens and per 1,000 (or 1 million) output tokens, often at different rates. This encourages cost control via prompt optimization (shorter prompts), response length limits (max tokens), caching/retrieval strategies (RAG to reduce unnecessary context), and request batching where supported. Temperature is an inference parameter that affects randomness, but it does not change the billing unit. Common Misconceptions: Many assume “temperature” or other generation settings directly change cost. While temperature can indirectly influence output length or retries (which could increase tokens), the billing mechanism is still token-based. Another misconception is that training time or training data affects inference charges; those apply only when you are actually training or fine-tuning a model (and Bedrock’s managed FMs are typically used without customer-managed training). Exam Tips: For Bedrock and most LLM services, remember: inference cost is usually proportional to input + output tokens. Always compute approximate daily/monthly token totals from request volume and average token counts. Separate inference pricing from customization (fine-tuning) pricing, and don’t confuse model hyperparameters (temperature, top-p) with billing dimensions unless the question explicitly mentions a pricing model based on compute time or provisioned throughput.
A fintech startup needs to build an AWS Glue ETL job to process 120 CSV files (about 80 GB) that arrive daily in Amazon S3, but the team has little AWS Glue programming experience and wants step-by-step guidance and code suggestions directly in the AWS console to produce a working PySpark job within 2 days; which AWS service should they use to help them build and use AWS Glue?
Amazon Q Developer is the best answer because it is AWS’s AI-powered assistant for developers, designed to help users build applications and workflows on AWS more quickly. For an AWS Glue ETL job, it can help generate or explain PySpark code, suggest implementation patterns, and provide guidance on how to use Glue features such as reading from Amazon S3, transforming data, and writing outputs. This is especially valuable for a team with limited Glue programming experience and a short delivery timeline. Among the options, it is the only service whose primary purpose is developer productivity and guided code assistance.
AWS Config is a service for assessing, auditing, and evaluating the configurations of AWS resources. It helps with compliance monitoring and governance, such as checking whether resources conform to desired rules or policies. It does not provide code generation, step-by-step development help, or PySpark authoring assistance for AWS Glue jobs. Therefore, it does not meet the requirement for guided ETL job creation.
Amazon Personalize is a managed machine learning service used to build recommendation systems and personalized user experiences. Its purpose is to generate recommendations based on user behavior and item metadata, not to help developers write ETL code or learn AWS Glue. Although ETL pipelines may prepare data for Personalize, the service itself does not provide development guidance for Glue. That makes it unrelated to the need described in the question.
Amazon Comprehend is a natural language processing service that extracts insights such as entities, sentiment, key phrases, and PII from text. It is useful for text analytics workloads, but it is not a developer assistant and does not help users build AWS Glue ETL jobs. It cannot provide step-by-step coding guidance or generate PySpark scripts for Glue. As a result, it does not satisfy the stated requirement.
Core concept: identify the AWS service that provides AI-powered developer assistance to help a team quickly create an AWS Glue ETL job with minimal prior experience. The correct choice is Amazon Q Developer because it offers conversational guidance, code generation, and troubleshooting help for AWS development tasks, including Glue-related PySpark patterns. Key features include code suggestions, explanations of AWS services and APIs, and help accelerating implementation when teams are unfamiliar with service-specific programming. A common misconception is to confuse operational or ML services with developer-assistance tooling; AWS Config is for compliance, while Personalize and Comprehend are application AI services, not coding assistants. Exam tip: when the question emphasizes step-by-step guidance, code suggestions, and helping developers build on AWS faster, look for Amazon Q Developer.
A regional e-commerce startup planning a 6-week holiday campaign needs a solution that can automatically generate 300 unique marketing assets per week from text briefs of 100 words or fewer without relying on any labeled historical datasets; which of the following best represents a generative AI use case for this requirement?
Intrusion detection and anomaly flagging is a security analytics/ML classification use case, typically using services like Amazon GuardDuty, Amazon Detective, or custom anomaly detection. It focuses on identifying suspicious patterns, not generating new content. It also often relies on historical telemetry and baselines rather than prompt-driven creation, so it does not match the requirement to generate marketing assets from short text briefs.
This is a direct generative AI use case: text-to-image generation to create photorealistic marketing/product images from short prompts. It satisfies the need for hundreds of unique assets per week and does not require labeled historical datasets because pretrained foundation models can generate images via prompting. On AWS, this aligns well with Amazon Bedrock image generation models (invoked via API) and storing outputs in Amazon S3 for campaign workflows.
Optimizing indexing strategies is a database engineering/performance tuning activity (e.g., Amazon RDS, Aurora, DynamoDB design patterns). It is not an AI/ML task and does not involve generating new content. While it can improve query latency and throughput, it does not address the requirement to create marketing assets from text briefs, so it is not a generative AI use case.
Forecasting financial time-series data is a predictive analytics/ML use case (e.g., Amazon Forecast or custom models in SageMaker). It produces predictions (future values) rather than creative assets like images. It also commonly benefits from historical labeled/structured time-series data. Therefore, it does not match the prompt-based, content-generation requirement described in the question.
Core Concept: This question tests recognition of a generative AI use case: using foundation models to create new content (images, text, audio) from prompts, without requiring labeled historical training data. On AWS, this commonly maps to using Amazon Bedrock (or Amazon SageMaker JumpStart) to access pretrained image generation models and generate assets from short text briefs. Why the Answer is Correct: The startup needs to automatically generate 300 unique marketing assets per week from short (<=100 words) text briefs, over a 6-week campaign, and explicitly cannot rely on labeled historical datasets. That aligns directly with text-to-image generation: a generative model can synthesize novel, photorealistic product/marketing images from prompts. The “unique assets” requirement is a hallmark of generative AI (creating new outputs), and the “no labeled data” constraint points away from supervised ML and toward pretrained foundation models that can be prompted (and optionally lightly customized) rather than trained from scratch. Key AWS Features: With Amazon Bedrock, you can invoke image generation foundation models via API, control variability/uniqueness with parameters (e.g., seed, guidance, style), and scale generation through serverless patterns (AWS Lambda, Step Functions, EventBridge) to meet weekly volume. You can store outputs in Amazon S3, track prompts/metadata, and use IAM for least-privilege access. If brand consistency is needed, you can explore model customization where supported or use prompt templates and reference images (model-dependent) while still avoiding labeled datasets. Common Misconceptions: Options involving anomaly detection, forecasting, or performance tuning are “predictive/analytical” or “systems engineering” tasks, not content generation. They may use ML, but they do not produce new creative assets from prompts. Also, some may think any AI automation qualifies; on exams, “generative AI” specifically means generating new content. Exam Tips: Look for keywords: “generate,” “create,” “from prompts,” “unique assets,” “no labeled data,” and “foundation model.” These strongly indicate generative AI and services like Amazon Bedrock. If the task is classification, detection, or forecasting, it is typically traditional ML/analytics rather than generative AI.
An edtech company operating in 3 regions and managing 12 TB of proprietary learning content that includes student PII governed by FERPA is using the AWS Generative AI Security Scoping Matrix to evaluate security responsibilities across four proposed solution scopes. Under this matrix, which approach gives the company the MOST ownership of security responsibilities?
Licensing a third-party enterprise SaaS with embedded GenAI places most security responsibilities on the SaaS provider (application security, model operations, patching, availability). The customer mainly manages identity/access, configuration, and data governance decisions (what data is uploaded, retention, user permissions). In the scoping matrix, this is typically the lowest customer ownership model.
Using a third-party FM via API increases customer responsibility compared to SaaS (you secure your app, prompts, connectors, and any stored outputs). However, the FM provider still owns core model training, model-layer security controls, and much of the inference platform. You must focus on secure integration, data minimization, encryption in transit, and preventing sensitive data leakage in prompts.
Fine-tuning a third-party FM shifts more responsibility to the customer than simple API usage because you manage training data preparation, privacy controls, and evaluation for harmful outputs. Still, the underlying base model and often the managed training/inference environment are controlled by the provider/service. Customer ownership is high, but not as high as building and training a model entirely from scratch.
Building, training, and deploying a model from scratch maximizes customer security ownership: you secure data pipelines, training infrastructure, model artifacts, deployment endpoints, monitoring, and incident response. You must implement FERPA-aligned controls for PII, regional compliance, encryption, access control, logging, and governance across the full ML lifecycle. This is the highest-responsibility scope in the matrix.
Core Concept: This question tests the AWS Generative AI Security Scoping Matrix concept: as you move from consuming GenAI as a managed SaaS to building and training your own model, your organization assumes progressively more security responsibilities (data governance, model security, infrastructure, SDLC, monitoring, compliance). Why the Answer is Correct: Option D (designing, building, and training a model from scratch using the company’s own data) gives the company the MOST ownership of security responsibilities. In the scoping matrix, this is the highest-customer-responsibility scenario because the customer controls (and must secure) the entire stack: data ingestion and labeling, training pipelines, model artifacts, evaluation, deployment endpoints, and ongoing operations. With FERPA-governed student PII across 3 regions and 12 TB of proprietary content, the company must implement end-to-end controls for confidentiality, integrity, access, residency, retention, and auditability. Key AWS Features / Best Practices: In this scope, the company would typically rely on AWS shared responsibility for the underlying cloud, but must configure everything above it: IAM least privilege, KMS encryption for data/model artifacts, VPC isolation and private connectivity, Secrets Manager, CloudTrail/CloudWatch logging, GuardDuty/Security Hub, S3 bucket policies and access points, data classification and DLP controls, and strong SDLC controls (code scanning, artifact signing). For multi-region operations, they must also manage replication, key policies, and region-specific compliance controls. Common Misconceptions: Fine-tuning (Option C) can feel “most responsible” because proprietary/PII data is used, but the base model and much of the platform responsibility still sits with the model provider and/or managed service. Using an FM via API (Option B) may involve sensitive prompts, yet the provider still owns most model-layer security. SaaS (Option A) is the least customer ownership because the vendor manages application, model, and much of the operational security. Exam Tips: For questions asking “MOST ownership/responsibility,” choose the option where you build/train/host the model yourself. For “LEAST,” choose SaaS. Map options to the spectrum: SaaS (lowest) -> API consumption -> fine-tune -> build/train from scratch (highest).
A regional healthcare provider is deploying a triage chatbot powered by a large language model (LLM) that answers scheduling and insurance questions and retrieves clinic policies from a vector index; during red-team testing, 12% of 500 adversarial prompts coerced the model into revealing masked sample IDs even with temperature set to 0.2 and an 8,192-token context window. Which action will most effectively reduce the risk of prompt-injection and jailbreak attempts that try to elicit sensitive information or unsafe behaviors?
Correct. A structured system prompt and reusable template are foundational mitigations for prompt injection: they define strict behavioral boundaries, explicitly reject role-change/override attempts, and standardize refusal and safe-completion patterns. This reduces ambiguity and makes it harder for adversarial prompts to supersede instructions. In AWS deployments, this pairs well with Bedrock Guardrails and application-layer validation for stronger defense-in-depth.
Incorrect. Increasing temperature generally increases output variability, which can reduce reproducibility but does not reduce vulnerability. In many cases it can worsen safety by making the model more likely to produce unexpected or policy-violating content. Security controls should be deterministic and enforceable (guardrails, filtering, access control), not based on randomness.
Incorrect. Choosing models from SageMaker listings (or any catalog) does not inherently prevent prompt injection or jailbreaks. Prompt injection is largely an application-layer and interaction-pattern problem, not solely a model provenance problem. Even well-vetted models can be coerced without strong system instructions, guardrails, and data minimization.
Incorrect. Trimming user input to 200 tokens may slightly reduce the space for elaborate attacks, but many jailbreaks are short and still effective. It also degrades user experience and may break legitimate complex queries. Effective mitigation focuses on policy enforcement (system prompts/guardrails), output filtering, and preventing sensitive data from being available to the model.
Core Concept: This question tests LLM application security controls against prompt injection/jailbreaks—an AI security and governance topic. In AWS terms, it aligns with implementing guardrails and policy enforcement (for example, Amazon Bedrock Guardrails and application-layer input/output validation) to prevent sensitive data disclosure and unsafe behaviors. Why the Answer is Correct: A structured system prompt plus a reusable prompt template that explicitly defines allowed behavior, refusal rules, and handling of role-change attempts is the most direct and effective control among the options. Prompt injection commonly works by overriding instructions (e.g., “ignore previous instructions,” “act as system,” “reveal hidden data”). A robust system prompt and templating approach establishes a consistent instruction hierarchy, reduces ambiguity, and enables deterministic enforcement patterns (refuse, redirect, or sanitize). While not sufficient alone, it is the best single action listed to reduce jailbreak success rates. Key AWS Features / Best Practices: In production on AWS, you typically combine: (1) strong system prompts and templates, (2) input/output filtering and policy checks (e.g., Bedrock Guardrails for topic restrictions, PII redaction, and blocked content), (3) retrieval controls for RAG (limit what can be retrieved, store only non-sensitive embeddings, enforce document-level authorization), and (4) monitoring and incident response (CloudWatch logs, audit trails). Also apply least privilege to data sources and avoid placing secrets or sensitive identifiers in the prompt context. Common Misconceptions: Temperature does not “secure” a model; it changes randomness. Model marketplace listing does not guarantee jailbreak resistance. Trimming tokens can reduce some attack surface but also harms legitimate use and does not prevent short, effective injections. Exam Tips: For questions about prompt injection, choose controls that enforce policy and instruction hierarchy (system prompts, guardrails, input/output validation, and data minimization). Avoid answers that rely on randomness, vendor selection alone, or superficial context reduction as primary security measures.
A healthcare analytics startup partners with eight medical device ISVs to validate its AI pipelines every quarter. The compliance team wants to receive email notifications within 15 minutes whenever any ISV publishes a new quarterly compliance report, without building a custom polling solution, by using a managed AWS service that supports subscriptions to third-party data products; which AWS service should the startup use?
AWS Audit Manager helps continuously collect evidence from AWS services and map it to compliance frameworks (e.g., HIPAA, SOC) to simplify audits. It is not a subscription platform for third-party ISVs to publish quarterly reports, and it does not natively provide a “new report published by an external vendor” notification workflow. It’s about assessing your AWS environment’s controls, not distributing partner content.
AWS Artifact provides on-demand access to AWS compliance reports (e.g., SOC reports, ISO certifications) and allows acceptance of certain agreements. It is limited to AWS-provided artifacts, not third-party medical device ISV reports. While Artifact is central to compliance documentation, it does not support subscribing to external data products or notifying you when an ISV publishes a new quarterly report.
AWS Trusted Advisor delivers best-practice recommendations and checks across cost optimization, performance, security, fault tolerance, and service limits. It can generate alerts for certain account-related findings, but it is unrelated to third-party data product subscriptions or publication events. Trusted Advisor won’t notify you when an external ISV publishes a compliance report; it focuses on your AWS account posture.
AWS Data Exchange is designed for subscribing to and consuming third-party data products published by providers. ISVs can publish quarterly compliance reports as dataset revisions, and subscribers can use managed event-driven integrations (commonly EventBridge events routed to SNS email) to receive notifications when new revisions are published—meeting the 15-minute requirement without building a custom polling solution.
Core Concept: This question tests knowledge of AWS managed services for distributing and subscribing to third-party data products and triggering notifications on updates without custom polling. It also touches compliance workflows and event-driven integration. Why the Answer is Correct: AWS Data Exchange is purpose-built for providers (the ISVs) to publish data products and for subscribers (the startup) to subscribe to those products. When a provider publishes a new revision (e.g., a quarterly compliance report), subscribers can be notified via managed integrations (commonly through Amazon EventBridge events for Data Exchange actions), enabling near-real-time alerting (well within 15 minutes) without building a polling mechanism. This matches the requirement: “managed AWS service,” “subscriptions to third-party data products,” and “email notifications on publish.” Key AWS Features: 1) Data products and revisions: ISVs can publish datasets as products; each quarterly report can be a new revision. 2) Subscription model: The startup subscribes once and receives access to new revisions as they are published. 3) Event-driven notifications: Use Amazon EventBridge rules for AWS Data Exchange events (e.g., new revision published) and route to Amazon SNS for email delivery. This is a standard serverless pattern: EventBridge -> SNS (email subscription), meeting the “no custom polling” requirement. 4) Governance and auditability: Data Exchange integrates with IAM for access control and CloudTrail for API auditing, aligning with compliance needs. Common Misconceptions: AWS Artifact and Audit Manager are compliance-related, so they can appear relevant. However, Artifact is for AWS’s own compliance reports, not third-party ISV publications. Audit Manager helps automate evidence collection against frameworks but does not act as a marketplace/subscription service for external ISV reports. Trusted Advisor provides best-practice checks and alerts about AWS account posture, not third-party report publication. Exam Tips: When you see “subscribe to third-party data products” and “provider publishes updates,” think AWS Data Exchange. If the question also requires “notifications without polling,” pair it mentally with EventBridge and SNS. Artifact is specifically “AWS compliance documents,” while Audit Manager is “collect evidence and assess controls,” and Trusted Advisor is “account optimization checks.”
A regional bank is fine-tuning a foundation model (FM) to generate approval recommendations for small-business loans using both applicant text summaries and structured financial features; regulators require transparent, audit-ready explanations, including bias metrics across 4 demographic groups and per-prediction feature attributions (e.g., SHAP) for at least 90% of sampled inferences within 24 hours of each weekly model update. Which solution will meet these requirements?
Amazon Inspector focuses on automated vulnerability management for workloads (e.g., EC2 instances, container images in ECR, and certain Lambda package scanning). It helps with security posture and finding CVEs, not ML transparency. Inspector does not compute fairness/bias metrics across demographic groups and does not generate per-inference feature attributions like SHAP, so it cannot satisfy regulator explainability requirements.
SageMaker Clarify is purpose-built for Responsible AI: it can evaluate bias across sensitive attributes (such as demographic groups) and generate explainability outputs using feature attribution methods, including SHAP. Clarify can run as a processing job on a schedule or after each model update, producing reports that can be stored in S3 for audit readiness. This directly matches the bias + per-prediction attribution requirements and the 24-hour reporting window.
Amazon Macie discovers and classifies sensitive data (like PII) in Amazon S3 and helps with data security and privacy governance. While important for a bank’s compliance program, Macie does not provide model bias evaluation, fairness metrics, or prediction-level explainability. It addresses “what sensitive data exists and where,” not “why the model made this decision” or “whether outcomes are biased.”
Amazon Rekognition is a computer vision service for analyzing images and videos (labels, faces, text in images, custom labels). The use case here is loan underwriting using applicant text summaries and structured financial features, not image classification. Adding Rekognition labels does not produce bias metrics or SHAP explanations and does not meet the regulator’s audit-ready transparency requirements.
Core concept: The question is testing Responsible AI governance and model explainability on AWS—specifically bias detection/monitoring and per-inference explainability artifacts (e.g., SHAP) with audit-ready reporting after model updates. Why the answer is correct: Amazon SageMaker Clarify is the AWS service designed to (1) compute bias metrics across sensitive/demographic groups and (2) generate feature attribution explanations (including SHAP-based explanations) for model predictions. The requirement calls for transparent, regulator-ready explanations, bias metrics across 4 demographic groups, and per-prediction feature attributions for at least 90% of sampled inferences within 24 hours of each weekly model update. Clarify supports bias analysis for pre-training and post-training (including group-based metrics) and can produce explainability reports that include feature attributions. These artifacts can be stored (for example, in Amazon S3) and used for audits, and the analysis jobs can be scheduled/automated as part of a weekly model update pipeline. Key AWS features and best practices: Use SageMaker Clarify processing jobs to run bias and explainability analyses after each model update. Configure the sensitive attributes (the 4 demographic groups) and the label/outcome to compute bias metrics. For explainability, enable SHAP explainers to generate per-prediction attributions on a sampled inference dataset (meeting the “90% of sampled inferences” requirement). Orchestrate the weekly workflow with SageMaker Pipelines or AWS Step Functions, trigger on model package approval in SageMaker Model Registry, and write reports to S3 with versioning and retention for auditability. Common misconceptions: Security/compliance scanners (Inspector) and data discovery/classification tools (Macie) are important for governance, but they do not generate bias metrics or SHAP explanations. Rekognition is unrelated to tabular/text loan underwriting explainability and does not address regulatory transparency. Exam tips: When you see “bias metrics,” “demographic groups,” “explainability,” “feature attributions,” or “SHAP,” the exam is usually pointing to SageMaker Clarify. Pair it mentally with automation (Pipelines/Step Functions) and durable storage (S3) for audit-ready reporting timelines.
A telemedicine platform has deployed a virtual care assistant that returns AI-curated medical illustration images in response to natural-language queries; to comply with clinical guidelines and app store policies, the company must prevent display of images containing explicit nudity, graphic violence, or self-harm and must block and log any image with a moderation label confidence of 90% or higher for those categories at inference time without retraining the base model. Which solution will meet these requirements?
Correct. Scanning each image at inference time with a moderation service (e.g., Amazon Rekognition DetectModerationLabels) enables deterministic enforcement using the required confidence threshold (≥90%) for nudity/violence/self-harm categories. The application can block or blur the image before display and log the labels, confidence, and decision to CloudWatch/S3 for auditability. This meets the “no retraining” and “block and log at inference time” requirements.
Incorrect. Retraining may reduce the probability of unsafe outputs but does not guarantee that explicit content will never appear, and it does not satisfy the requirement to block based on a 90% moderation confidence at inference time. It also violates the constraint “without retraining the base model.” In regulated or policy-driven environments, runtime enforcement is required even if the model is improved.
Incorrect. A one-time validation on a held-out test set is a pre-release quality step, not a runtime control. Generative systems can produce novel outputs depending on prompts and context, so pre-release testing cannot ensure ongoing compliance. It also does not implement the required threshold-based blocking and logging for each inference request.
Incorrect. Incorporating user feedback can help identify issues and improve future behavior, but it is reactive and non-deterministic. It does not prevent unsafe images from being displayed in the moment, and it does not guarantee blocking when confidence is ≥90%. Feedback mechanisms are complementary to, not a substitute for, automated inference-time moderation and governance logging.
Core Concept: This question tests inference-time safety controls and governance for generative AI outputs without changing the underlying model. The key pattern is to apply automated content moderation (policy enforcement) on every generated/retrieved image before it is shown to users, and to log enforcement actions for audit/compliance. Why the Answer is Correct: The requirement is explicit: block and log any image containing explicit nudity, graphic violence, or self-harm when the moderation label confidence is 90% or higher, and do so at inference time without retraining. Option A directly implements a runtime “safety filter” by scanning each image output and enforcing a threshold-based decision (≥90%) to block/blur and record the event. This meets clinical guideline and app store policy needs because it prevents unsafe content from being displayed regardless of how it was produced (generated or retrieved). Key AWS Features: Amazon Rekognition provides DetectModerationLabels for images, returning labels and confidence scores that can be compared to a 90% threshold. You can implement this in the inference path (e.g., Lambda/ECS/EKS middleware) and log outcomes to CloudWatch Logs, S3, or a database for audit trails. If using Amazon Bedrock, Guardrails can help enforce safety policies at runtime (primarily for text; for images you typically pair with image moderation such as Rekognition or a Bedrock-compatible moderation workflow). Best practices include fail-closed behavior (block if moderation fails), configurable thresholds per category, and structured logging (label, confidence, request metadata) for governance. Common Misconceptions: Retraining (B) may reduce risk but cannot guarantee compliance at runtime and violates the “without retraining” constraint. One-time validation (C) is necessary but insufficient because generative outputs vary and new prompts can produce unsafe content after release. User feedback loops (D) are helpful for continuous improvement but do not provide deterministic, immediate blocking at inference time. Exam Tips: When you see “must block at inference time,” “threshold confidence,” and “no retraining,” choose runtime guardrails/moderation services. For images, think Amazon Rekognition moderation labels; for foundation model applications, think Bedrock guardrails plus logging/auditing. Always include enforcement + observability (logging) for compliance and governance requirements.
A retail marketplace uses a foundation model to classify product photos into 200 categories; before launch, the team wants to verify accuracy using a held-out benchmark of 10,000 labeled images with a target of at least 92% top-1 accuracy—what is the most appropriate strategy to evaluate the model’s accuracy?
Compute cost and runtime are operational metrics (efficiency), not predictive performance metrics (effectiveness). They help with budgeting, scaling, and latency/throughput planning, but they cannot confirm whether the model meets the required 92% top-1 accuracy. A model can be cheap and fast yet inaccurate, or expensive and slow yet accurate. This option does not evaluate correctness against labeled ground truth.
This is the correct strategy because it directly measures the required metric on the specified evaluation dataset. Run inference on the 10,000 labeled images, compute top-1 accuracy (argmax prediction equals the true label), and compare the result to the 92% acceptance threshold. This is standard ML evaluation practice for multi-class classification and best reflects expected real-world performance prior to launch.
Counting layers or parameters describes model capacity/complexity, not actual accuracy on the task. Parameter count is sometimes correlated with capability, but it is not a substitute for empirical evaluation on a representative labeled benchmark. Two models with similar size can have very different accuracy due to training data, fine-tuning, preprocessing, or domain shift. This option fails to validate the 92% requirement.
Color fidelity checks can be useful for validating image preprocessing pipelines (e.g., ensuring no unintended transformations), but they do not measure classification accuracy against labeled categories. A model could receive perfectly color-accurate images and still misclassify them. The requirement is explicitly top-1 accuracy on a labeled benchmark, which must be computed from predictions versus ground truth labels.
Core Concept: This question tests fundamental ML model evaluation using a labeled holdout dataset and an appropriate metric (top-1 accuracy) for multi-class image classification. In AWS terms, this aligns with standard evaluation practices you would apply whether you built the model in Amazon SageMaker (training jobs + evaluation jobs) or are assessing a foundation model’s performance on a benchmark dataset. Why the Answer is Correct: The team has a clear acceptance criterion: at least 92% top-1 accuracy on a held-out benchmark of 10,000 labeled images across 200 categories. The most appropriate strategy is to run inference on that benchmark set, compute top-1 accuracy (percentage of images where the model’s highest-probability predicted class matches the ground-truth label), and compare the result to the 92% target. This directly measures the stated objective and uses the correct dataset split (held-out) to estimate generalization performance prior to launch. Key AWS Features / Best Practices: In practice, you would store the benchmark in Amazon S3, run batch inference (e.g., SageMaker Batch Transform or a processing job), and compute metrics in a repeatable pipeline (SageMaker Pipelines). You may also track metrics and artifacts with SageMaker Experiments/Model Registry. For classification, also consider complementary metrics (confusion matrix, per-class accuracy, macro/micro F1) to detect class imbalance, but the question explicitly requires top-1 accuracy. Common Misconceptions: Cost/runtime (A) is important for operations but does not validate predictive quality. Model size (C) is not a performance guarantee; larger models can still be inaccurate or miscalibrated. Image color fidelity checks (D) relate to preprocessing/quality assurance, not classification correctness against labels. Exam Tips: When a question provides (1) a labeled holdout set and (2) a target metric threshold, the correct approach is almost always to compute that metric on the holdout set and compare to the threshold. Match the evaluation method to the business requirement (here: top-1 accuracy for multi-class classification).
A media-streaming platform operates 150 ML inference containers across 3 AWS Regions processing 25,000 requests per minute and needs a highly scalable AWS service to centrally track and alert on P95 latency, 5xx error rate, and GPU/CPU utilization for these workloads; which AWS service should the company use?
Amazon CloudWatch is the correct service for centralized operational monitoring. It collects and stores metrics at scale, supports percentile statistics (such as P95) for latency distributions, and provides alarms, dashboards, and notifications. With Container Insights and custom metrics, it can track CPU/GPU utilization and application KPIs like 5xx error rate across many containers and multiple Regions.
AWS CloudTrail records AWS API calls and events for auditing, governance, and security investigations (for example, who launched an instance or changed an IAM policy). It is not intended for performance monitoring such as P95 latency, HTTP 5xx rates, or GPU/CPU utilization, and it does not provide metric-based alarming for application SLOs.
AWS Trusted Advisor provides periodic checks and recommendations across cost optimization, performance, security, fault tolerance, and service limits. While it can flag issues like approaching quotas or underutilized resources, it does not provide real-time, per-request latency percentiles, 5xx error monitoring, or container-level GPU/CPU telemetry with alerting.
AWS Config tracks configuration state and changes of AWS resources and evaluates them against compliance rules (for example, whether S3 buckets are public or security groups allow 0.0.0.0/0). It is not a metrics/observability service and cannot natively compute P95 latency, monitor 5xx error rates, or track runtime GPU/CPU utilization for containers.
Core Concept: This question tests observability and operational monitoring on AWS—collecting metrics, aggregating them across many containers and Regions, creating percentiles (P95), and alerting on thresholds. The AWS-native service for metrics, logs, dashboards, and alarms is Amazon CloudWatch. Why the Answer is Correct: The platform needs centralized tracking and alerting for P95 latency, 5xx error rate, and GPU/CPU utilization across 150 inference containers in 3 Regions at high request volume. CloudWatch is designed to ingest high-cardinality time-series metrics, compute statistics (including percentiles for distributions), and trigger alarms. It can collect application metrics (latency, HTTP 5xx) via custom metrics or embedded metric format, and infrastructure/container metrics (CPU, memory, GPU) via CloudWatch Agent and Container Insights (ECS/EKS). For multi-Region operations, CloudWatch supports cross-account and cross-Region dashboards and can route alarms/notifications through Amazon SNS, OpsCenter, or incident tooling. Key AWS Features: 1) CloudWatch Metrics + Alarms: Create alarms on P95 latency and 5xx rate; use metric math to compute error rates (5xx/total). 2) Container Insights: Collect per-container and per-node CPU/memory/network; integrate with EKS/ECS. 3) GPU monitoring: Publish GPU utilization as custom metrics (e.g., via CloudWatch Agent/telegraf/nvidia-smi exporters) and alarm on thresholds. 4) Dashboards and cross-Region visibility: Central dashboards to view all Regions; optionally centralize via cross-account observability. 5) Anomaly Detection and Logs Insights (optional): Detect latency regressions and query logs for correlation. Common Misconceptions: CloudTrail is often confused with monitoring, but it records API activity (who did what) rather than performance metrics. Trusted Advisor provides best-practice checks and cost/security recommendations, not real-time P95 latency tracking. AWS Config tracks resource configuration changes and compliance, not runtime latency or GPU utilization. Exam Tips: When you see “track metrics,” “percentiles (P95),” “alerting,” “dashboards,” and “operational monitoring,” default to CloudWatch. Choose CloudTrail for audit/API history, Config for configuration compliance/drift, and Trusted Advisor for account-level recommendations. For containerized ML inference, remember Container Insights + custom metrics for model latency and GPU utilization are common patterns.
A healthtech startup needs to launch a multilingual symptom-checking chatbot and a product-description generator across 3 AWS Regions (us-east-1, eu-west-1, ap-southeast-1), must experiment with at least 4 different foundation models from multiple providers, requires serverless pay-per-request inference with built-in guardrails and retrieval-augmented generation integrations, and must scale from 500 to 50,000 requests per day without managing model hosting—Which AWS service provides managed access to foundation models to build and scale these generative AI applications?
Amazon Q Developer is a generative AI assistant focused on software development workflows (code generation, explanations, debugging, IDE integration, and AWS console assistance). It is not intended as a general managed service to access multiple third-party foundation models for building custom multilingual chatbots or content generators. It also does not represent the primary AWS service for serverless FM inference with configurable guardrails and RAG knowledge base integrations.
Amazon Bedrock is the correct choice because it provides fully managed, serverless access to multiple foundation models from different providers through a unified API. It supports pay-per-request inference, scales automatically, and removes the need to host or manage model infrastructure. Bedrock also includes Amazon Bedrock Guardrails for safety controls and Knowledge Bases for Bedrock to implement retrieval-augmented generation, matching the chatbot and product-description generation requirements.
Amazon Kendra is an enterprise search service that indexes and retrieves information from multiple data sources using connectors and relevance tuning. While it can be used as part of a RAG architecture (retrieving documents to ground responses), it does not provide managed access to multiple foundation models for text generation. Kendra solves search and retrieval, not serverless multi-model FM inference with built-in guardrails.
Amazon Comprehend is a managed natural language processing service for tasks like entity recognition, key phrase extraction, sentiment analysis, topic modeling, and PII detection. It is not a foundation-model platform and does not provide generative capabilities for building chatbots or product-description generators. Comprehend may complement a solution (e.g., PII detection), but it does not meet the requirement for managed multi-provider FM access and RAG/guardrails.
Core Concept: This question tests knowledge of AWS managed services for building generative AI applications using foundation models (FMs) without provisioning or operating model infrastructure. The key requirement is “managed access to foundation models” with serverless, pay-per-request inference, multi-model experimentation, and built-in safety and RAG integrations. Why the Answer is Correct: Amazon Bedrock is AWS’s fully managed service that provides API-based access to multiple foundation models from different providers (e.g., Anthropic, Meta, Mistral, Amazon Titan, Cohere, Stability AI—availability varies by Region and over time). It is designed for exactly this scenario: rapidly experimenting with several FMs, deploying generative AI apps (chatbots, content generation), and scaling usage without managing model hosting. Bedrock is serverless and supports on-demand throughput, aligning with the requirement to scale from 500 to 50,000 requests/day. Key AWS Features: Bedrock offers (1) model choice across providers via a consistent API, (2) on-demand and (where needed) provisioned throughput options, (3) built-in guardrails through Amazon Bedrock Guardrails to help enforce safety policies (e.g., topic restrictions, PII handling, toxicity filters), and (4) retrieval-augmented generation support via Knowledge Bases for Amazon Bedrock, which integrates with vector stores (including Amazon OpenSearch Service and others) to ground responses in enterprise data. It also integrates with AWS security primitives (IAM, KMS, CloudWatch, CloudTrail) for governance and auditing—important for healthtech contexts. Common Misconceptions: Amazon Kendra is often associated with “search + RAG,” but it is an enterprise search service, not a managed multi-FM inference platform. Amazon Comprehend is for classic NLP (entity extraction, sentiment, PII detection) rather than generative text creation. Amazon Q Developer is a specialized assistant for software development tasks, not a general-purpose FM hosting/access layer for multilingual symptom-checking and product generation. Exam Tips: When you see “multiple foundation models,” “serverless pay-per-request inference,” “no model hosting,” “guardrails,” and “RAG integrations,” the exam is pointing to Amazon Bedrock. Distinguish Bedrock (FM access + genAI app building) from SageMaker (custom ML training/hosting) and from search/NLP point services (Kendra/Comprehend). Also note that multi-Region deployment is an application architecture choice; Bedrock provides managed endpoints in supported Regions, while you deploy your app stack per Region for latency and resiliency.
A financial advisory chatbot hosted on Amazon Bedrock shows a 12% hallucination rate in an internal evaluation of 500 prompts when invoked with temperature=0.9 and top_p=0.95, and the team must reduce hallucinations below 5% within 24 hours without retraining or changing the foundation model—what should they do?
Incorrect. Agents for Amazon Bedrock are used to orchestrate multi-step tasks at inference time (tool use, function calling, retrieval, and workflow execution). They do not “supervise the model’s training process,” and they do not directly reduce hallucinations by changing how the foundation model was trained. While agents can improve factuality by grounding responses with tools or knowledge bases, the option’s premise about training supervision is wrong.
Incorrect. Data pre-processing to remove problematic training examples implies you have access to and can modify the model’s training dataset, followed by retraining or fine-tuning. The question explicitly prohibits retraining and requires improvement within 24 hours. In managed foundation models on Bedrock, customers generally cannot edit the original training corpus. This approach is not feasible under the stated constraints.
Correct. Lowering temperature reduces randomness in token sampling, making outputs more deterministic and typically reducing hallucinations—especially in high-stakes domains like financial advice where creative phrasing can become fabricated facts. This is an immediate inference-time change using Bedrock invocation parameters (temperature/top_p) and can be validated quickly against the same 500-prompt evaluation set to confirm hallucinations drop below 5%.
Incorrect. Switching to a different foundation model could reduce hallucinations, but the question explicitly says the team cannot change the foundation model. Even if allowed, model switching often requires re-validation, prompt adjustments, and regression testing—unlikely to be the fastest compliant fix within 24 hours. The best answer must respect the constraint and use inference-time controls.
Core Concept: This question tests inference-time controls for foundation models on Amazon Bedrock—specifically how decoding parameters (temperature and top_p) affect output variability and hallucination risk. When you cannot retrain, fine-tune, or change the model, the fastest lever is generation configuration. Why the Answer is Correct: A temperature of 0.9 with top_p=0.95 encourages diverse, creative outputs by increasing randomness in token selection. In a financial advisory chatbot, that creativity often manifests as fabricated facts, citations, or overly confident incorrect statements (hallucinations). Lowering temperature (e.g., to ~0.2) makes sampling more deterministic, pushing the model toward higher-probability tokens and more conservative completions. This typically reduces hallucinations quickly and can be implemented immediately (within minutes) by changing the Bedrock invocation parameters—meeting the 24-hour constraint. Key AWS Features: Amazon Bedrock runtime APIs allow per-request or default configuration of inference parameters such as temperature and top_p (nucleus sampling). Teams can A/B test parameter sets against the same evaluation prompts to quantify hallucination reduction. In production, these settings can be applied consistently via the application layer or orchestration components (for example, a Bedrock invocation wrapper) without modifying the underlying foundation model. Common Misconceptions: It’s tempting to think “hallucinations require better training data” or “a different model,” but the prompt explicitly forbids retraining and changing the model. Another misconception is that Agents for Amazon Bedrock “supervise training”; agents orchestrate tool use, retrieval, and action execution at inference time, not model training. While retrieval-augmented generation (RAG) and guardrails can also reduce hallucinations, the option set here focuses on the quickest guaranteed change: decoding randomness. Exam Tips: When constraints say “no retraining/fine-tuning” and “must fix fast,” look for inference-time mitigations: lower temperature/top_p, add grounding via retrieval, add guardrails, and require citations. High temperature/top_p increases creativity; low temperature increases determinism—preferred for regulated domains like finance, healthcare, and legal. Map the mitigation to the constraint and timeline: parameter tuning is the fastest, lowest-risk change.
Associate








