
A real-time financial news platform ingests about 12 GB of new multilingual articles per day and wants its in-house foundation model (FM) to reflect breaking developments within 24 hours without resetting weights; the team plans daily refreshes using a rolling 90-day corpus and warm-starting from the latest checkpoint to preserve capabilities. Which training strategy will keep the FM current with the most recent data while maintaining previously learned knowledge?
Batch learning is too generic for this scenario and does not specifically describe the continued training of a foundation model from an existing checkpoint. Many training jobs process data in batches, but that alone does not address the requirement to keep the FM current with newly arriving text while preserving prior knowledge. The question is asking for a lifecycle strategy for updating a pretrained model, not merely a data ingestion style. In exam terms, batch learning is not the precise label for warm-started ongoing FM refreshes.
Continuous pre-training is the correct strategy because it continues training an existing foundation model from its latest checkpoint instead of starting over from randomly initialized weights. That directly matches the requirement to refresh the model daily, keep it current with breaking multilingual news, and preserve previously learned capabilities. Using a rolling 90-day corpus helps the model absorb recent developments while still revisiting prior data, which reduces catastrophic forgetting. This is the standard approach when an organization wants an FM to stay up to date without resetting weights or discarding prior knowledge.
Static training means training the model once on a fixed dataset and then leaving the weights unchanged until a future full retraining cycle. That conflicts with the requirement to reflect breaking developments within 24 hours and to perform daily refreshes. A static model would quickly become outdated in a financial news setting where new events materially change the information landscape every day. It also does not align with the stated plan to warm-start from the latest checkpoint on a rolling corpus.
Latent training is not a standard training strategy term used for foundation model maintenance in AWS certification contexts or mainstream ML practice. Although the word 'latent' appears in discussions of latent representations or latent spaces, it does not describe a recognized method for incrementally updating an FM with new corpora. The option therefore does not match the operational pattern of daily checkpoint-based updates over a rolling dataset. It is essentially a distractor rather than a valid answer choice for this use case.
Core Concept: This question tests training lifecycle strategies for foundation models (FMs), specifically how to keep an FM up to date with newly arriving data without losing previously learned capabilities. The relevant concept is incremental/ongoing training on new data using an existing checkpoint (warm start), often called continuous pre-training (continued pretraining).

Why the Answer is Correct: The platform ingests new multilingual articles daily and needs the FM to reflect breaking news within 24 hours. The team also explicitly wants to avoid “resetting weights” and instead warm-start from the latest checkpoint while training on a rolling 90-day corpus. That is the hallmark of continuous pre-training: you periodically continue training the same base model on newly collected domain data, starting from the most recent weights, so the model adapts to recent information while retaining general language competence. Using a rolling window helps balance recency with retention and reduces catastrophic forgetting compared to training only on the newest day’s data.

Key AWS Features / Best Practices: On AWS, this pattern is typically implemented as a scheduled training pipeline (for example, orchestrated with Amazon SageMaker Pipelines or AWS Step Functions) that: 1) curates daily data into Amazon S3, 2) builds a rolling 90-day training set (often with versioning and lineage), 3) launches a training job that initializes from the latest model checkpoint stored in S3, and 4) registers the updated model in a model registry for controlled deployment. Best practices include checkpointing, data/version governance, evaluation gates (to detect regressions), and monitoring for drift and degradation.

Common Misconceptions: “Batch learning” can sound similar because it uses periodic batches, but in ML terminology it usually contrasts with online learning and does not specifically imply continuing pre-training of an FM from a prior checkpoint to keep knowledge current. “Static training” implies one-and-done training, which fails the 24-hour freshness requirement. “Latent training” is not a standard strategy for keeping FMs current.

Exam Tips: When you see requirements like “warm-start from latest checkpoint,” “rolling corpus,” “daily refresh,” and “keep the base model current,” map it to continuous/continued pre-training. If the question instead emphasized task-specific adaptation with small labeled datasets, that would point more toward fine-tuning rather than pre-training.
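The daily refresh described above comes down to two mechanical steps: selecting the files that fall inside the rolling 90-day window, and resuming from the newest saved weights instead of random initialization. Below is a minimal Python sketch of that selection logic; the S3 paths and date-keyed structures are hypothetical placeholders, not an actual SageMaker API.

```python
from datetime import date, timedelta

def select_rolling_corpus(files_by_date: dict, today: date, window_days: int = 90) -> list:
    """Keep only corpus files whose date falls inside the rolling window."""
    cutoff = today - timedelta(days=window_days)
    return [path for d, path in sorted(files_by_date.items()) if cutoff < d <= today]

def latest_checkpoint(checkpoints: dict) -> str:
    """Warm-start from the most recent checkpoint rather than random init."""
    return checkpoints[max(checkpoints)]

# Daily refresh: build the 90-day training set and resume from the newest weights.
# All paths below are illustrative.
today = date(2024, 6, 1)
files = {today - timedelta(days=i): f"s3://corpus/news-{i}.jsonl" for i in range(120)}
ckpts = {
    today - timedelta(days=1): "s3://models/ckpt-latest",
    today - timedelta(days=2): "s3://models/ckpt-old",
}
train_set = select_rolling_corpus(files, today)   # 90 of the 120 daily files survive
resume_from = latest_checkpoint(ckpts)            # the warm-start point
```

In a real pipeline, `train_set` would become the input channel of a training job and `resume_from` would seed the job's initial weights; the window-plus-warm-start combination is what distinguishes continuous pre-training from a cold retrain.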
While using Amazon Bedrock, a team sees that a 44-word chat message is billed as 1,536 input tokens and 72 output tokens; in this context, what does the term token refer to?
Correct. Tokens are the discrete text units a foundation model consumes and produces after tokenization. A token can be a whole word, part of a word (subword), punctuation, whitespace markers, or special symbols. Bedrock meters many models by counting these input and output tokens, which explains why a short message in words can still be large in tokens due to subword splitting and request overhead.
Incorrect. Mathematical vector representations are embeddings, not tokens. Embeddings are continuous-valued vectors used to represent meaning for tasks like semantic search or retrieval-augmented generation (RAG). Token counts used for billing refer to the number of discrete token IDs processed by the model, not the dimensionality or number of embedding vectors.
Incorrect. Pre-trained weights are the model parameters learned during training (often billions of parameters). They determine the model’s behavior and are not counted per request. Token billing is about runtime usage (how much text is processed/generated), whereas weights relate to the model’s size and training, not per-inference metering.
Incorrect. Prompts are the instructions and context you provide, but a prompt is composed of tokens after tokenization. Bedrock bills based on the number of tokens in the prompt (plus any included context/system messages) and the number of tokens in the generated completion. Tokens are the units inside the prompt, not the prompt itself.
Core Concept: This question tests understanding of “tokens” in generative AI usage and billing, specifically in Amazon Bedrock. Bedrock (and most LLM providers) meters requests based on token counts for input (prompt + conversation history + system instructions) and output (model-generated text).

Why the Answer is Correct: A token is the basic unit of text that a model reads and writes. Tokens are not always whole words; they can be subwords (word pieces), punctuation, whitespace markers, or special control symbols. That’s why a 44-word message can map to a much larger number of input tokens: the model’s tokenizer may split words into multiple pieces, and the billed “input tokens” often include more than the visible user message (for example, system prompts, formatting wrappers, safety instructions, and prior chat turns that are sent along with the request).

Key AWS Features: In Amazon Bedrock, pricing and quotas are typically expressed in terms of input and output tokens for the selected foundation model. Tokenization is model-specific: different models (and even different versions) can tokenize the same text differently, leading to different token counts and costs. In chat use cases, the request commonly includes structured message roles (system/user/assistant) and may include conversation context, which increases input tokens. Understanding token-based metering helps with cost optimization (shorter prompts, trimming history, summarizing context) and latency management.

Common Misconceptions: People often confuse tokens with embeddings (vectors) because both relate to text processing. Others assume tokens equal words, which is incorrect; tokenization is a preprocessing step that converts text into discrete IDs. Another confusion is thinking tokens are model parameters (weights) or prompts themselves—those are different concepts.
Exam Tips: When you see Bedrock billing, throughput, or context window questions, “tokens” almost always means the text units produced by a tokenizer (word/subword/symbol units). Remember that input token counts can include hidden overhead (system prompts, chat formatting, and conversation history). If a word count seems inconsistent with token count, that’s a clue the question is about tokenization granularity and request composition, not an error in billing.
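To make the word-count-versus-token-count gap concrete, here is a toy greedy longest-match subword splitter. Real tokenizers (BPE, SentencePiece, etc.) are model-specific and far more sophisticated; the tiny vocabulary below is invented purely for illustration.

```python
def toy_tokenize(text: str, vocab: set) -> list:
    """Greedy longest-match subword split; unknown residue falls back to single chars."""
    tokens = []
    for word in text.split():
        i = 0
        while i < len(word):
            # Take the longest vocabulary piece starting at position i, else one character.
            for j in range(len(word), i, -1):
                if word[i:j] in vocab:
                    tokens.append(word[i:j])
                    i = j
                    break
            else:
                tokens.append(word[i])
                i += 1
    return tokens

# Invented vocabulary: just enough to show subword splitting.
vocab = {"token", "iz", "ation", "bill", "ed", "by"}
text = "billed by tokenization"
toks = toy_tokenize(text, vocab)  # 3 words become 6 tokens
```

Here 3 words become 6 tokens ("billed" splits into "bill" + "ed", "tokenization" into "token" + "iz" + "ation"), which mirrors why a 44-word message can bill as far more input tokens, before even counting system prompts and chat history sent along with it.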
A video streaming platform wants to use AI to protect its content delivery APIs from malicious traffic; the AI must determine whether a new request’s client IP and user agent come from a suspicious source by comparing against 30 days of baseline traffic patterns, score up to 12,000 requests per minute with under 150 ms latency per request, and automatically flag unusual sources; which solution meets these requirements?
Speech recognition converts audio to text (ASR) and is used for call transcription, voice assistants, and media captioning. The problem here involves structured request attributes (client IP and user agent) and detecting suspicious sources, not interpreting audio signals. Even if the platform is “video streaming,” the security requirement is about API traffic analysis, so speech recognition is not applicable.
NLP named entity recognition (NER) extracts entities (people, organizations, locations, etc.) from unstructured text. The inputs in this scenario are client IP addresses and user-agent strings used for request fingerprinting and behavioral analysis. While user-agent is a string, the task is not entity extraction; it is identifying deviations from normal traffic patterns. Therefore, NER is the wrong ML approach.
An anomaly detection system is designed to learn normal patterns from historical data (e.g., 30 days of traffic) and score new events to flag unusual behavior. This directly matches the requirement to compare new requests’ IP/user-agent against baseline patterns and automatically identify suspicious sources. It also fits real-time scoring needs (12,000 RPM, <150 ms) via low-latency inference (e.g., SageMaker endpoints) and automated response (e.g., AWS WAF updates).
Fraud forecasting focuses on predicting future fraud rates or volumes over time (time-series forecasting), not evaluating each incoming request for abnormality relative to a learned baseline. The requirement is to score individual requests and flag unusual sources immediately, which is anomaly detection/classification. Forecasting could complement capacity planning or trend analysis, but it does not satisfy per-request, low-latency suspicious-source detection.
Core Concept: This question tests selecting the correct ML problem type for security telemetry: detecting unusual request sources (client IP + user agent) by learning “normal” behavior over a historical window and scoring new events in near real time. That is classic anomaly detection (often unsupervised or semi-supervised) rather than NLP, speech, or time-series forecasting.

Why the Answer is Correct: The platform needs to compare new requests against 30 days of baseline traffic patterns and automatically flag unusual sources. That maps directly to anomaly detection: build a model of normal patterns (e.g., typical IP/user-agent combinations, frequency, geo/ASN distributions, request rates) and score incoming requests for deviation. The requirement to “score up to 12,000 requests per minute with under 150 ms latency per request” implies low-latency online inference, which anomaly detection systems commonly support via real-time endpoints.

Key AWS Features: On AWS, this is commonly implemented with Amazon SageMaker (real-time inference endpoint) using built-in algorithms such as Random Cut Forest (RCF) for anomaly detection, or custom models. You would train on 30 days of logs (e.g., from Amazon CloudFront, ALB, API Gateway access logs, or AWS WAF logs) stored in Amazon S3, then deploy to a SageMaker endpoint with autoscaling to meet 12k RPM and latency targets. For streaming ingestion and near-real-time scoring, Amazon Kinesis Data Streams / Firehose can feed features to the model, and results can trigger automated actions (e.g., update AWS WAF IP sets, publish to Amazon EventBridge, or alert via Amazon SNS). This aligns with Well-Architected Security and Reliability pillars: automate detection/response and scale horizontally.

Common Misconceptions: Some may confuse “suspicious traffic” with “fraud forecasting,” but forecasting predicts future values (e.g., demand or fraud volume) rather than classifying individual requests as anomalous. NLP named entity recognition and speech recognition are irrelevant because the inputs are structured request metadata, not text entities or audio.

Exam Tips: When you see “baseline of normal behavior,” “flag unusual,” and “score events,” think anomaly detection. When you see “predict future trend,” think forecasting. Match the ML task to the data type: IP/user-agent telemetry is structured, not language or audio. Also note operational constraints (RPM/latency) point to real-time inference endpoints and autoscaling.
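The core "learn normal, score deviation" loop can be sketched without any ML library at all: learn how often each (IP, user agent) pair appears in the baseline window, then flag pairs whose observed frequency is near zero. Production systems would use something like Random Cut Forest with richer features; this frequency baseline is a deliberately minimal stand-in, and the traffic values are fabricated for illustration.

```python
from collections import Counter

class SourceBaseline:
    """Learn (client IP, user agent) frequencies from baseline traffic,
    then flag pairs whose observed frequency is unusually low (or zero)."""

    def __init__(self, baseline: list):
        self.counts = Counter(baseline)   # how often each (ip, ua) pair was seen
        self.total = len(baseline)

    def score(self, ip: str, ua: str) -> float:
        # Relative frequency of the pair in the baseline; 0.0 means never seen.
        return self.counts[(ip, ua)] / self.total

    def is_suspicious(self, ip: str, ua: str, threshold: float = 0.001) -> bool:
        return self.score(ip, ua) < threshold

# Fabricated 30-day baseline: two well-known sources.
baseline = [("10.0.0.1", "Mozilla/5.0")] * 900 + [("10.0.0.2", "curl/8.0")] * 100
model = SourceBaseline(baseline)
```

A never-seen combination such as `("203.0.113.9", "python-requests/2.31")` scores 0.0 and is flagged, while the common browser pair passes. Because scoring is a dictionary lookup, the per-request latency budget (<150 ms) is trivially met in this sketch; a real RCF endpoint adds model inference but follows the same request-scoring shape.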
A healthcare startup is using few-shot prompting with a foundation model hosted on Amazon Bedrock; the current prompt includes 12 examples, the model is invoked once per day at 07:00 UTC, and output quality is satisfactory. The company wants to lower its monthly cost while maintaining current performance. Which solution will meet these requirements?
Customizing/fine-tuning a model can sometimes reduce the need for long few-shot prompts, but it adds extra cost and operational overhead (training, evaluation, and potentially higher ongoing charges). Given the model runs only once per day and quality is already satisfactory, fine-tuning is unlikely to be the most cost-effective way to reduce monthly spend while maintaining current performance.
Bedrock usage costs are largely driven by input and output tokens. Few-shot prompting with 12 examples increases input tokens every invocation. Reducing prompt tokens (fewer/shorter examples, removing redundant text, concise formatting) directly lowers cost while keeping the same model and invocation frequency. You can iteratively trim and validate quality to maintain current performance.
Increasing the number of tokens in the prompt (for example, adding more examples or longer instructions) will increase input token usage and therefore increase cost. While it might improve quality in some cases, the question states output quality is already satisfactory and the goal is to lower monthly cost, making this the opposite of what is needed.
Provisioned Throughput on Amazon Bedrock is intended for workloads that need guaranteed capacity, predictable latency, and sustained throughput. For a single invocation per day, provisioned capacity would typically be more expensive than on-demand token-based usage. It does not reduce token charges and is not aligned with the low-frequency usage pattern described.
Core Concept: Amazon Bedrock pricing for most foundation models is primarily usage-based (input tokens + output tokens). Few-shot prompting increases input tokens because each example adds text to the prompt. If quality is already satisfactory, the most direct way to reduce cost while keeping the same model and invocation pattern is to reduce token usage.

Why the Answer is Correct: The model is invoked only once per day, so throughput/latency guarantees are not a requirement. With 12 few-shot examples, the prompt likely contains many tokens that are billed every invocation. Decreasing the number of tokens in the prompt (for example, reducing the number of examples, shortening examples, removing redundant instructions, or compressing formatting) reduces input token consumption and therefore lowers monthly cost. Because the current output quality is satisfactory, you can iteratively trim tokens while validating that performance remains acceptable—meeting the “maintain current performance” requirement.

Key AWS Features: Bedrock model invocation costs scale with token counts; prompt engineering is an operational lever to control spend. Practical techniques include: keeping only the most representative few-shot examples, using shorter exemplars, removing verbose system instructions, and standardizing concise schemas. You can measure token usage and response quality via logging/observability (for example, capturing prompt/response metadata) and run A/B tests to ensure quality is maintained.

Common Misconceptions: Fine-tuning/customization can improve quality or reduce prompt length, but it introduces additional costs (training and potentially hosting/management) and is unnecessary when quality is already satisfactory and traffic is extremely low. Provisioned Throughput is often misunderstood as a cost saver; it is designed for consistent, high-throughput workloads and reserved capacity, not a once-per-day invocation.
Exam Tips: When a question asks to “lower cost” for LLMs, first look for token-reduction strategies (shorter prompts, fewer examples, smaller outputs) before considering customization or reserved capacity. Choose Provisioned Throughput only when the workload needs predictable capacity/latency at scale or sustained high request rates. For sporadic, low-volume usage, on-demand token-based pricing plus prompt optimization is typically the most cost-effective approach.
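The cost lever here is simple arithmetic, which a short sketch makes concrete. The per-1K-token prices below are illustrative placeholders, not actual Bedrock rates, and the token counts are invented for the scenario of trimming few-shot examples.

```python
def monthly_cost(input_tokens: int, output_tokens: int, invocations: int,
                 in_price_per_1k: float, out_price_per_1k: float) -> float:
    """On-demand cost model: (tokens / 1000) * price per 1K tokens, per invocation."""
    per_call = (input_tokens / 1000) * in_price_per_1k \
             + (output_tokens / 1000) * out_price_per_1k
    return per_call * invocations

# Illustrative prices only (not actual Bedrock rates): $0.003 / 1K in, $0.015 / 1K out.
# One invocation per day at 07:00 UTC -> ~30 invocations per month.
before = monthly_cost(input_tokens=4000, output_tokens=300, invocations=30,
                      in_price_per_1k=0.003, out_price_per_1k=0.015)
after = monthly_cost(input_tokens=1500, output_tokens=300, invocations=30,
                     in_price_per_1k=0.003, out_price_per_1k=0.015)
```

Trimming the 12-example prompt from 4,000 to 1,500 input tokens cuts the bill proportionally while the model, schedule, and output length stay unchanged, which is exactly the "lower cost, maintain performance" lever the question is after.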
A university research team used Amazon Bedrock to customize a foundation model to answer questions about campus services; they now want to run a structured validation on 2,500 new, unseen queries and need to upload a single 150 MB JSONL file that Amazon Bedrock can access for evaluation in the same AWS Region as the Bedrock instance. Which AWS service should they use to store this dataset so Bedrock can read it directly from object storage?
Amazon S3 is the correct choice because it is AWS’s native object storage service and is commonly used as the source location for datasets consumed by managed services like Amazon Bedrock. A 150 MB JSONL file is well within S3 object limits, and placing it in an S3 bucket in the same Region enables Bedrock to access it via an S3 URI with appropriate IAM permissions (and optional SSE-KMS encryption).
Amazon EBS is block storage designed to be attached to an EC2 instance (or certain managed compute services) as a volume. It is not an object storage service and does not provide a direct “read from object storage” integration point for Bedrock evaluation. Using EBS would require provisioning compute to host and expose the data, adding unnecessary complexity and not meeting the stated requirement.
Amazon EFS is a managed network file system (NFS) used primarily with EC2, containers, and some AWS compute services that can mount file systems. While it can store files, it is not object storage and is not the standard direct input source for Bedrock evaluation datasets. It would also require a mounted environment and network configuration rather than a simple S3 object reference.
AWS Snowcone is a small edge device used for offline/edge compute and data transfer to AWS, typically when connectivity is limited or for rugged environments. It is not a regional object storage service that Bedrock can directly read from for an evaluation job. Snowcone might help move data into AWS, but the dataset would still ultimately need to land in S3 for direct Bedrock access.
Core Concept: This question tests which AWS storage service provides object storage that Amazon Bedrock can read directly for model evaluation datasets. Bedrock evaluation workflows commonly reference datasets stored in Amazon S3 (for example, JSONL files) via an S3 URI in the same AWS Region.

Why the Answer is Correct: Amazon S3 is AWS’s regional object storage service designed for storing and retrieving files (objects) like a single 150 MB JSONL dataset. Bedrock can access evaluation inputs from S3 because S3 is natively integrated across AWS services and is the standard “object storage” target referenced by managed AI/ML services. Storing the dataset in an S3 bucket in the same Region as the Bedrock resources satisfies the requirement for regional access and minimizes latency and data transfer complexity.

Key AWS Features / Best Practices: S3 provides durable, highly available storage with simple access patterns (S3 URI), supports large objects well beyond 150 MB, and integrates with IAM for fine-grained access control. For Bedrock to read the dataset, you typically grant permissions (e.g., s3:GetObject on the bucket/prefix) to the Bedrock service role or execution role used for the evaluation job. You can also apply bucket policies, encryption (SSE-S3 or SSE-KMS), and S3 versioning for governance and repeatable evaluations.

Common Misconceptions: EBS and EFS are file/block storage attached to compute (EC2, some container runtimes) and are not “object storage” endpoints that Bedrock reads from directly. Snowcone is an edge/offline data transfer and compute device, not a regional object store for online service access. These options can seem plausible because they store data, but they don’t match the “read directly from object storage” requirement.
Exam Tips: When a question says “object storage,” “upload a file,” “service reads from a bucket,” or references datasets for managed ML/GenAI services, default to Amazon S3 unless there’s a specific constraint pushing you to another service. Also watch for Region requirements: choose an S3 bucket in the same Region and ensure IAM permissions and (if required) KMS key policies allow the service to read the objects.
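Before uploading, it is worth validating that every line of the JSONL file parses and carries the fields the evaluation job expects. The sketch below checks a few records and then notes the upload step; the record keys ("prompt", "referenceResponse") are an assumed shape, so check the Bedrock evaluation documentation for the exact schema your job requires.

```python
import json

def validate_jsonl(lines: list, required_keys: set) -> int:
    """Check every line parses as JSON and carries the expected keys.
    Returns the number of valid records; raises on the first malformed line."""
    for n, line in enumerate(lines, start=1):
        record = json.loads(line)
        missing = required_keys - record.keys()
        if missing:
            raise ValueError(f"line {n} missing keys: {missing}")
    return len(lines)

# Hypothetical record shape -- consult the Bedrock evaluation docs for the real schema.
records = [
    json.dumps({"prompt": f"Question {i}?", "referenceResponse": f"Answer {i}"})
    for i in range(3)
]
count = validate_jsonl(records, {"prompt", "referenceResponse"})

# The upload itself is a standard S3 put, e.g. with boto3 (needs AWS credentials):
#   boto3.client("s3").upload_file("eval.jsonl", "my-eval-bucket", "datasets/eval.jsonl")
# The evaluation job then references s3://my-eval-bucket/datasets/eval.jsonl,
# with the bucket in the same Region as the Bedrock resources.
```

Validating locally first avoids burning an evaluation run on a file with a malformed line 2,317 of 2,500.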
A fintech startup needs to build an AWS Glue ETL job to process 120 CSV files (about 80 GB) that arrive daily in Amazon S3, but the team has little AWS Glue programming experience and wants step-by-step guidance and code suggestions directly in the AWS console to produce a working PySpark job within 2 days; which AWS service should they use to help them build and use AWS Glue?
Amazon Q Developer is the best answer because it is AWS’s AI-powered assistant for developers, designed to help users build applications and workflows on AWS more quickly. For an AWS Glue ETL job, it can help generate or explain PySpark code, suggest implementation patterns, and provide guidance on how to use Glue features such as reading from Amazon S3, transforming data, and writing outputs. This is especially valuable for a team with limited Glue programming experience and a short delivery timeline. Among the options, it is the only service whose primary purpose is developer productivity and guided code assistance.
AWS Config is a service for assessing, auditing, and evaluating the configurations of AWS resources. It helps with compliance monitoring and governance, such as checking whether resources conform to desired rules or policies. It does not provide code generation, step-by-step development help, or PySpark authoring assistance for AWS Glue jobs. Therefore, it does not meet the requirement for guided ETL job creation.
Amazon Personalize is a managed machine learning service used to build recommendation systems and personalized user experiences. Its purpose is to generate recommendations based on user behavior and item metadata, not to help developers write ETL code or learn AWS Glue. Although ETL pipelines may prepare data for Personalize, the service itself does not provide development guidance for Glue. That makes it unrelated to the need described in the question.
Amazon Comprehend is a natural language processing service that extracts insights such as entities, sentiment, key phrases, and PII from text. It is useful for text analytics workloads, but it is not a developer assistant and does not help users build AWS Glue ETL jobs. It cannot provide step-by-step coding guidance or generate PySpark scripts for Glue. As a result, it does not satisfy the stated requirement.
Core concept: identify the AWS service that provides AI-powered developer assistance to help a team quickly create an AWS Glue ETL job with minimal prior experience. The correct choice is Amazon Q Developer because it offers conversational guidance, code generation, and troubleshooting help for AWS development tasks, including Glue-related PySpark patterns.

Key features include code suggestions, explanations of AWS services and APIs, and help accelerating implementation when teams are unfamiliar with service-specific programming.

A common misconception is to confuse operational or ML services with developer-assistance tooling; AWS Config is for compliance, while Personalize and Comprehend are application AI services, not coding assistants.

Exam tip: when the question emphasizes step-by-step guidance, code suggestions, and helping developers build on AWS faster, look for Amazon Q Developer.
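The kind of help an assistant like Q Developer provides is scaffolding row-level transform logic that the Glue job applies at scale. Below, the per-row cleanup is shown in plain Python with the standard csv module so it runs anywhere; in an actual Glue job the same logic would live in a PySpark script operating on a DynamicFrame, and the column names are invented for the fintech scenario.

```python
import csv
import io

def transform_row(row: dict) -> dict:
    """Row-level cleanup a Glue ETL job would apply at scale:
    cast the amount to integer cents and normalize the currency code."""
    return {
        "txn_id": row["txn_id"],
        "amount_cents": int(round(float(row["amount"]) * 100)),
        "currency": row["currency"].upper(),
    }

# A tiny in-memory stand-in for one of the 120 daily CSV files in S3.
raw = "txn_id,amount,currency\nT1,19.99,usd\nT2,5.00,eur\n"
rows = [transform_row(r) for r in csv.DictReader(io.StringIO(raw))]
```

Keeping the transform as a pure function makes it easy to unit-test locally before porting into the PySpark job, which is exactly the iterate-quickly workflow a two-day deadline demands.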
A regional e-commerce startup planning a 6-week holiday campaign needs a solution that can automatically generate 300 unique marketing assets per week from text briefs of 100 words or fewer without relying on any labeled historical datasets; which of the following best represents a generative AI use case for this requirement?
Intrusion detection and anomaly flagging is a security analytics/ML classification use case, typically using services like Amazon GuardDuty, Amazon Detective, or custom anomaly detection. It focuses on identifying suspicious patterns, not generating new content. It also often relies on historical telemetry and baselines rather than prompt-driven creation, so it does not match the requirement to generate marketing assets from short text briefs.
This is a direct generative AI use case: text-to-image generation to create photorealistic marketing/product images from short prompts. It satisfies the need for hundreds of unique assets per week and does not require labeled historical datasets because pretrained foundation models can generate images via prompting. On AWS, this aligns well with Amazon Bedrock image generation models (invoked via API) and storing outputs in Amazon S3 for campaign workflows.
Optimizing indexing strategies is a database engineering/performance tuning activity (e.g., Amazon RDS, Aurora, DynamoDB design patterns). It is not an AI/ML task and does not involve generating new content. While it can improve query latency and throughput, it does not address the requirement to create marketing assets from text briefs, so it is not a generative AI use case.
Forecasting financial time-series data is a predictive analytics/ML use case (e.g., Amazon Forecast or custom models in SageMaker). It produces predictions (future values) rather than creative assets like images. It also commonly benefits from historical labeled/structured time-series data. Therefore, it does not match the prompt-based, content-generation requirement described in the question.
Core Concept: This question tests recognition of a generative AI use case: using foundation models to create new content (images, text, audio) from prompts, without requiring labeled historical training data. On AWS, this commonly maps to using Amazon Bedrock (or Amazon SageMaker JumpStart) to access pretrained image generation models and generate assets from short text briefs.

Why the Answer is Correct: The startup needs to automatically generate 300 unique marketing assets per week from short (<=100 words) text briefs, over a 6-week campaign, and explicitly cannot rely on labeled historical datasets. That aligns directly with text-to-image generation: a generative model can synthesize novel, photorealistic product/marketing images from prompts. The “unique assets” requirement is a hallmark of generative AI (creating new outputs), and the “no labeled data” constraint points away from supervised ML and toward pretrained foundation models that can be prompted (and optionally lightly customized) rather than trained from scratch.

Key AWS Features: With Amazon Bedrock, you can invoke image generation foundation models via API, control variability/uniqueness with parameters (e.g., seed, guidance, style), and scale generation through serverless patterns (AWS Lambda, Step Functions, EventBridge) to meet weekly volume. You can store outputs in Amazon S3, track prompts/metadata, and use IAM for least-privilege access. If brand consistency is needed, you can explore model customization where supported or use prompt templates and reference images (model-dependent) while still avoiding labeled datasets.

Common Misconceptions: Options involving anomaly detection, forecasting, or performance tuning are “predictive/analytical” or “systems engineering” tasks, not content generation. They may use ML, but they do not produce new creative assets from prompts. Also, some may think any AI automation qualifies; on exams, “generative AI” specifically means generating new content.
Exam Tips: Look for keywords: “generate,” “create,” “from prompts,” “unique assets,” “no labeled data,” and “foundation model.” These strongly indicate generative AI and services like Amazon Bedrock. If the task is classification, detection, or forecasting, it is typically traditional ML/analytics rather than generative AI.
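Operationally, each 100-word brief becomes a small JSON request body sent to a Bedrock image model. The field names below follow the general shape of a Titan Image Generator request but are an assumption; consult the chosen model's documentation for its actual schema, and note the invoke call is shown only in a comment because it requires AWS credentials.

```python
import json

def build_image_request(brief: str, num_images: int = 1, seed: int = 0) -> str:
    """Assemble a text-to-image request body from a short marketing brief.
    Field names are assumed (Titan Image Generator style) -- verify per model."""
    body = {
        "taskType": "TEXT_IMAGE",
        "textToImageParams": {"text": brief},
        "imageGenerationConfig": {"numberOfImages": num_images, "seed": seed},
    }
    return json.dumps(body)

payload = build_image_request(
    "Minimalist banner of a red scarf on a snowy background", seed=42
)
# The payload would then go to the Bedrock runtime, e.g.:
#   boto3.client("bedrock-runtime").invoke_model(
#       modelId="amazon.titan-image-generator-v1", body=payload)
# Varying the seed per invocation is one way to keep the 300 weekly assets unique.
```

A Lambda function triggered per brief, writing results to S3, turns this single call into the 300-assets-per-week pipeline the question describes, with no labeled training data involved.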
A regional healthcare provider is deploying a triage chatbot powered by a large language model (LLM) that answers scheduling and insurance questions and retrieves clinic policies from a vector index; during red-team testing, 12% of 500 adversarial prompts coerced the model into revealing masked sample IDs even with temperature set to 0.2 and an 8,192-token context window. Which action will most effectively reduce the risk of prompt-injection and jailbreak attempts that try to elicit sensitive information or unsafe behaviors?
Correct. A structured system prompt and reusable template are foundational mitigations for prompt injection: they define strict behavioral boundaries, explicitly reject role-change/override attempts, and standardize refusal and safe-completion patterns. This reduces ambiguity and makes it harder for adversarial prompts to supersede instructions. In AWS deployments, this pairs well with Bedrock Guardrails and application-layer validation for stronger defense-in-depth.
Incorrect. Increasing temperature generally increases output variability, which can reduce reproducibility but does not reduce vulnerability. In many cases it can worsen safety by making the model more likely to produce unexpected or policy-violating content. Security controls should be deterministic and enforceable (guardrails, filtering, access control), not based on randomness.
Incorrect. Choosing models from SageMaker listings (or any catalog) does not inherently prevent prompt injection or jailbreaks. Prompt injection is largely an application-layer and interaction-pattern problem, not solely a model provenance problem. Even well-vetted models can be coerced without strong system instructions, guardrails, and data minimization.
Incorrect. Trimming user input to 200 tokens may slightly reduce the space for elaborate attacks, but many jailbreaks are short and still effective. It also degrades user experience and may break legitimate complex queries. Effective mitigation focuses on policy enforcement (system prompts/guardrails), output filtering, and preventing sensitive data from being available to the model.
Core Concept: This question tests LLM application security controls against prompt injection and jailbreaks, an AI security and governance topic. In AWS terms, it aligns with implementing guardrails and policy enforcement (for example, Amazon Bedrock Guardrails and application-layer input/output validation) to prevent sensitive data disclosure and unsafe behaviors.
Why the Answer is Correct: A structured system prompt plus a reusable prompt template that explicitly defines allowed behavior, refusal rules, and handling of role-change attempts is the most direct and effective control among the options. Prompt injection commonly works by overriding instructions (e.g., “ignore previous instructions,” “act as system,” “reveal hidden data”). A robust system prompt and templating approach establishes a consistent instruction hierarchy, reduces ambiguity, and enables deterministic enforcement patterns (refuse, redirect, or sanitize). While not sufficient alone, it is the best single action listed to reduce jailbreak success rates.
Key AWS Features / Best Practices: In production on AWS, you typically combine: (1) strong system prompts and templates; (2) input/output filtering and policy checks (e.g., Bedrock Guardrails for topic restrictions, PII redaction, and blocked content); (3) retrieval controls for RAG (limit what can be retrieved, store only non-sensitive embeddings, enforce document-level authorization); and (4) monitoring and incident response (CloudWatch logs, audit trails). Also apply least privilege to data sources and avoid placing secrets or sensitive identifiers in the prompt context.
Common Misconceptions: Temperature does not “secure” a model; it changes randomness. A model marketplace listing does not guarantee jailbreak resistance. Trimming tokens can reduce some attack surface but also harms legitimate use and does not prevent short, effective injections.
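Attaching a guardrail at invocation time can be sketched with the Bedrock Converse API as below. The model ID, guardrail ID, and version are placeholders, and the boto3 client is passed in so the function can be exercised with a stub; treat this as a sketch under those assumptions rather than a production implementation:

```python
# Sketch of invoking a Bedrock model with a guardrail enforced on both the
# input and the output via the Converse API. The model ID and guardrail
# identifier/version are placeholders; the client is injected so the
# function can be tested without AWS credentials.

def converse_with_guardrail(
    client,
    user_text: str,
    model_id: str = "anthropic.claude-3-sonnet-20240229-v1:0",
    guardrail_id: str = "YOUR_GUARDRAIL_ID",
    guardrail_version: str = "1",
) -> str:
    """Send one user turn through the model with the guardrail applied."""
    response = client.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": user_text}]}],
        guardrailConfig={
            "guardrailIdentifier": guardrail_id,
            "guardrailVersion": guardrail_version,
        },
    )
    return response["output"]["message"]["content"][0]["text"]

# In production: client = boto3.client("bedrock-runtime")
```

Because the guardrail is referenced by ID in every call, topic restrictions and PII redaction are enforced centrally and cannot be overridden by user text in the prompt.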
Exam Tips: For questions about prompt injection, choose controls that enforce policy and instruction hierarchy (system prompts, guardrails, input/output validation, and data minimization). Avoid answers that rely on randomness, vendor selection alone, or superficial context reduction as primary security measures.
A regional hospital network is deploying a customer-facing Q&A assistant using the Claude 3 Sonnet foundation model through Amazon Bedrock. The team needs the model to answer with up-to-date facts from 75,000 internal policy PDFs privately stored in Amazon S3, without custom fine-tuning, and they want nightly data refresh and source citations in responses with a target latency under 2 seconds. Which solution will meet this requirement?
Switching to a different foundation model does not solve the core need: securely grounding responses in 75,000 private PDFs with nightly refresh and citations. Any FM (including Claude) will not automatically know the hospital’s internal policies unless you provide them at inference time (RAG) or train/fine-tune. Model choice may affect quality/latency, but it won’t provide private data access or source attribution by itself.
Reducing the temperature makes outputs more deterministic and can reduce hallucination slightly, but it does not provide the model with new knowledge from internal S3 PDFs. Temperature tuning cannot enable nightly refresh, semantic retrieval across a large corpus, or citations. It’s a generation parameter, not a data integration or grounding mechanism, so it cannot meet the “up-to-date facts + citations” requirement.
Amazon Bedrock Knowledge Bases connected to the S3 data source is the correct approach because it implements retrieval-augmented generation: ingest and index the PDFs, retrieve relevant chunks at query time, and provide grounded answers with citations. It supports ongoing refresh via ingestion jobs (e.g., nightly) and avoids custom fine-tuning. With an appropriately configured vector store and retrieval settings, it can meet low-latency targets.
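The query-time side of this workflow can be sketched with the RetrieveAndGenerate API of the Bedrock agent runtime. The knowledge base ID and model ARN are placeholders, the citation-parsing assumes S3-backed sources, and the client is injected so the function can be tested with a stub:

```python
# Sketch of querying a Bedrock knowledge base with RetrieveAndGenerate and
# returning the grounded answer plus S3 source citations. The knowledge
# base ID and model ARN below are placeholders.

def ask_knowledge_base(
    client,
    question: str,
    kb_id: str = "YOUR_KB_ID",
    model_arn: str = ("arn:aws:bedrock:us-east-1::foundation-model/"
                      "anthropic.claude-3-sonnet-20240229-v1:0"),
):
    """Return (answer_text, list_of_source_uris) for one question."""
    response = client.retrieve_and_generate(
        input={"text": question},
        retrieveAndGenerateConfiguration={
            "type": "KNOWLEDGE_BASE",
            "knowledgeBaseConfiguration": {
                "knowledgeBaseId": kb_id,
                "modelArn": model_arn,
            },
        },
    )
    answer = response["output"]["text"]
    # Flatten citations into the S3 URIs of the retrieved source documents.
    sources = [
        ref["location"]["s3Location"]["uri"]
        for citation in response.get("citations", [])
        for ref in citation.get("retrievedReferences", [])
    ]
    return answer, sources

# In production: client = boto3.client("bedrock-agent-runtime")
```

The returned URIs are what the application would render as clickable source citations alongside each answer.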
Model invocation logging in Amazon Bedrock helps with monitoring, troubleshooting, and audit/compliance by recording prompts/responses (subject to configuration). However, it does not improve factual accuracy, does not retrieve content from S3, does not enable nightly refresh of knowledge, and does not generate citations. It’s an observability feature, not a solution for grounding or enterprise knowledge integration.
Core Concept: This question tests retrieval-augmented generation (RAG) on Amazon Bedrock using Amazon Bedrock Knowledge Bases to ground a foundation model’s responses in private enterprise data stored in Amazon S3, without fine-tuning.
Why the Answer is Correct: The requirement is for Claude 3 Sonnet to answer with up-to-date facts from 75,000 internal policy PDFs, refreshed nightly, privately stored in S3, and to include source citations with low latency. A Bedrock knowledge base connected to an S3 data source is purpose-built for this: it ingests documents from S3, chunks and indexes them into a vector store, and at query time retrieves the most relevant passages and injects them into the model prompt. This “grounding” enables accurate, current answers without custom fine-tuning and can return citations that reference the retrieved source documents.
Key AWS Features:
- Amazon Bedrock Knowledge Bases: managed RAG workflow (ingestion, chunking, embeddings, retrieval, orchestration).
- S3 as a data source: supports large document corpora such as PDFs; ingestion jobs can be scheduled/triggered for nightly refresh.
- Vector store integration: Knowledge Bases can use supported vector databases (including AWS-native options) to enable fast semantic retrieval, helping meet sub-2-second latency targets when properly sized and tuned.
- Citations: Knowledge Bases can return retrieved references/attributions so the application can display source citations alongside answers.
- Security: keeps data private in the customer’s AWS account with IAM-based access controls; aligns with least privilege and data governance best practices.
Common Misconceptions: Changing the model (A) doesn’t inherently provide access to private, up-to-date internal PDFs. Temperature (B) affects randomness/creativity, not factual grounding or access to proprietary data. Invocation logging (D) is for observability/auditing and does not improve answer accuracy or add citations.
Exam Tips: When you see “private S3 documents,” “no fine-tuning,” “fresh data,” and “citations,” think Bedrock Knowledge Bases (RAG). Fine-tuning is for style/task adaptation, not for continuously updated factual corpora. Also, latency requirements often imply retrieval + prompt augmentation rather than large prompt stuffing of thousands of documents.
A metropolitan public transit agency plans to deploy a large language model (LLM) to triage customer complaints and route them to the correct department; during a 6-week pilot the system will process about 20,000 tickets across 5 languages, and the agency must evaluate the LLM outputs for bias and potential discrimination toward riders from different demographic groups. Which data source will enable the agency to evaluate the LLM outputs with the least administrative effort?
Live pilot tickets are realistic and reflect current user behavior, but they rarely include the labels needed to measure bias (e.g., demographic group identifiers, protected-class proxies, or ground-truth fairness outcomes). Using them responsibly often requires consent/privacy review, data minimization, and a human labeling effort to create evaluation targets. That administrative overhead is high for a short pilot.
Historical logs may include prior routing decisions and notes, which can help assess consistency and operational accuracy. However, they can also encode historical human bias, and they typically still lack explicit, reliable demographic labels needed for discrimination analysis. Cleaning, de-identifying, and labeling historical data for fairness slices is time-consuming and increases administrative effort.
Policy documents define what “nondiscrimination” means for the agency and can guide evaluation criteria, thresholds, and escalation processes. But policies are not a data source for measuring model bias/toxicity; they contain rules, not labeled examples. You would still need an evaluation dataset to test outputs against demographic groups and bias metrics.
Public benchmark datasets for bias/toxicity/fairness come with predefined labels and established evaluation protocols, enabling rapid, repeatable measurement with minimal setup. They reduce the need for custom annotation, demographic taxonomy design, and governance work required to label internal tickets. While not perfectly domain-specific, they are the lowest-effort way to get an initial bias/discrimination assessment.
Core Concept: This question tests Responsible AI evaluation: selecting an evaluation dataset that minimizes operational overhead while enabling bias/toxicity/fairness assessment. In the AWS exam context, this aligns with using standardized, labeled evaluation resources (often used with tools like Amazon SageMaker Clarify or model evaluation pipelines) rather than building a bespoke labeled dataset.
Why the Answer is Correct: Publicly available benchmark datasets for bias/toxicity/fairness with predefined labels require the least administrative effort because they are already curated, documented, and labeled for specific responsible-AI metrics. The agency can quickly run repeatable evaluations (e.g., disparate toxicity across demographic identifiers, stereotyping prompts, or fairness slices) without first creating a labeling program, defining sensitive-attribute taxonomies, or collecting consent for demographic inference. For a short 6-week pilot and only ~20,000 tickets, the fastest path to an initial bias/discrimination signal is to use established benchmarks.
Key AWS Features / Best Practices: In AWS, responsible AI evaluations commonly leverage:
- Dataset-driven bias analysis and slice metrics (e.g., SageMaker Clarify bias metrics and explainability workflows).
- Repeatable evaluation pipelines (SageMaker Pipelines) using fixed benchmark datasets to track regressions.
- Governance practices: documenting evaluation datasets, metrics, and limitations (model cards / evaluation reports) consistent with AWS Well-Architected ML guidance (governance, monitoring, and continuous evaluation).
Benchmarks also help standardize comparisons across model versions and prompt templates.
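The slice-metric idea above can be sketched in a few lines of Python: given a pre-labeled benchmark where each model output has been flagged (e.g., as toxic) and tagged with a demographic group, compute the flagged rate per group and the largest gap between groups. The record layout here is an illustrative assumption; real benchmarks and tools like SageMaker Clarify define their own schemas and metrics:

```python
# Sketch of a slice-based bias check over a pre-labeled benchmark:
# per-group flagged-output rates plus the largest gap between any two
# groups. The {"group": ..., "flagged": ...} record layout is illustrative.
from collections import defaultdict

def flagged_rate_by_group(records):
    """records: iterable of dicts with a 'group' key and boolean 'flagged'."""
    counts = defaultdict(lambda: [0, 0])  # group -> [flagged_count, total]
    for r in records:
        counts[r["group"]][0] += int(r["flagged"])
        counts[r["group"]][1] += 1
    return {g: flagged / total for g, (flagged, total) in counts.items()}

def max_group_gap(rates):
    """Largest absolute difference in flagged rate across groups."""
    values = list(rates.values())
    return max(values) - min(values) if values else 0.0
```

For instance, two group-A records with one flagged and two clean group-B records yield rates `{"A": 0.5, "B": 0.0}` and a gap of `0.5`; tracking that gap across model versions gives a simple, repeatable regression signal.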
Common Misconceptions: It can seem intuitive to use live pilot tickets (A) or historical logs (B) because they are “real.” However, for bias evaluation they typically lack ground-truth labels for protected attributes and outcomes, and using them responsibly may require additional privacy reviews, demographic labeling/inference decisions, and human annotation, all of which is significant administrative work. Policy documents (C) are important for requirements but are not an evaluation dataset.
Exam Tips: When a question emphasizes “least administrative effort” for bias/toxicity/fairness evaluation, look for “pre-labeled benchmark datasets” or “standard evaluation sets.” Real-world data is valuable later for domain-specific validation, but it usually increases effort due to labeling, privacy, and governance steps. Separate “requirements/policies” from “evaluation data.”