
Simulate the real exam experience with 65 questions and a 130-minute time limit. Practice with AI-verified answers and detailed explanations.
Powered by AI
Every answer is verified by 3 state-of-the-art AI models to guarantee maximum accuracy. Get detailed per-option explanations and in-depth question analysis.
A media streaming startup lands ~3 TB of raw clickstream logs per day in Amazon S3 and loads curated aggregates into an Amazon Redshift RA3 cluster. Analysts also need to run low-latency ad hoc queries on the freshest S3 data via Amazon Redshift Spectrum, using an external schema backed by the AWS Glue Data Catalog. Given that most filters are on event_date (YYYY-MM-DD) and region, and the team wants the fastest Spectrum query performance, which two actions should they take? (Choose two.)
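For context on why the filter columns matter here: when S3 objects are laid out under Hive-style prefixes keyed on the columns most queries filter by, Spectrum can prune entire partitions instead of scanning them. A minimal sketch of such a prefix builder (the bucket layout and path names are hypothetical, not the graded answer):

```python
from datetime import date

def partition_prefix(event_date: date, region: str) -> str:
    """Build a Hive-style S3 key prefix keyed on the two columns most
    Spectrum queries filter by, so the engine can skip unrelated data."""
    return f"clickstream/event_date={event_date.isoformat()}/region={region}/"

# Writers would land Parquet files under prefixes such as:
#   s3://example-bucket/clickstream/event_date=2024-06-01/region=eu-west-1/part-0000.parquet
print(partition_prefix(date(2024, 6, 1), "eu-west-1"))
```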
A media analytics company needs a workflow orchestrator for 200+ scheduled data pipelines that run across an on-premises Kubernetes cluster (3 worker nodes, 32 vCPU each) and an AWS account in us-east-1. The team requires the same open-source DAG definitions in both locations, wants to avoid vendor lock-in, and must support at least 500 task runs per day. Which AWS service should the team adopt so they can run the open-source engine on premises and a fully managed equivalent in the cloud?
A media analytics startup operates an on-premises Oracle 12c database connected to AWS over a 1 Gbps Direct Connect link. A data engineer must crawl a specific table (~50 million rows, 30 columns) via JDBC to catalog the schema, then extract, transform, and load the data into an Amazon S3 bucket as partitioned Parquet (Snappy) on a daily 01:00 UTC schedule, orchestrating the end-to-end pipeline with minimal managed-service overhead to keep costs low. Which AWS service or feature will most cost-effectively meet these requirements?
A fintech company streams payment event logs to an Amazon Kinesis Data Streams data stream with 12 shards; each record is 2 KB and producers send about 5,000 records per second overall, but CloudWatch shows two shards at 95% write utilization while the other shards are under 10%, and PutRecords calls return ProvisionedThroughputExceeded for those hot shards. Producers currently use merchantId as the partition key, and during a flash sale a single merchant generates approximately 70% of events, creating hot shards even though total throughput is below the stream's aggregate limits. How should the data engineer eliminate the throttling while keeping the same overall throughput?
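The hot-shard symptom described above is the classic result of a skewed partition key. One common mitigation is to salt the key so a single hot merchant's records spread across several shards, with consumers stripping the suffix before re-aggregating. A minimal sketch (the salt bucket count and key format are assumptions for illustration, not the graded answer):

```python
import random

def salted_partition_key(merchant_id: str, salt_buckets: int = 8) -> str:
    """Append a random salt bucket so one hot merchant's records hash to
    several shards instead of one; consumers split on '#' to recover the id."""
    salt = random.randrange(salt_buckets)
    return f"{merchant_id}#{salt}"

key = salted_partition_key("merchant-42", salt_buckets=4)
merchant_id, salt = key.split("#")
```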
A media platform needs to analyze playback logs stored in a PostgreSQL database. The company wants to correlate the logs with customer issues tracked in Zendesk. The company receives 2 GB of new playback logs each day. The company has 100 GB of historical Zendesk tickets. A data engineer must develop a process that analyzes and correlates the logs and tickets. The process must run once each night. Which solution will meet these requirements with the LEAST operational overhead?
Want to practice anywhere?
Download Cloud Pass for free. It includes practice tests, progress tracking, and more.
A media-streaming analytics team uses Amazon Redshift Serverless (workgroup: prod-analytics in us-east-1) with 9 materialized views over a clickstream schema. The team must automate a schedule that runs REFRESH MATERIALIZED VIEW for all 9 views every 30 minutes between 08:00 and 20:00 UTC, without provisioning or managing any orchestration infrastructure. Which approach meets this requirement with the least effort?
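Whatever scheduling mechanism is chosen, the recurring work itself reduces to nine REFRESH statements. A sketch that generates them (the view names are hypothetical; the cron expression in the comment is one possible form, and as written it fires from 08:00 through 19:30 UTC):

```python
def refresh_statements(view_names):
    """Generate the REFRESH MATERIALIZED VIEW statements the schedule must run."""
    return [f"REFRESH MATERIALIZED VIEW {name};" for name in view_names]

# Nine hypothetical materialized views over the clickstream schema.
views = [f"clickstream.mv_agg_{i}" for i in range(1, 10)]
statements = refresh_statements(views)

# An EventBridge-style cron for every 30 minutes during the window could be:
#   cron(0/30 8-19 * * ? *)
```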
A travel-tech company is consolidating booking and customer-support datasets from multiple legacy systems into an Amazon S3 data lake; an engineer reviewing historical exports (about 3 TB of CSV and JSON per week, ~120 million rows) finds that many bookings and customer profiles are duplicated across systems. The engineer must identify and remove duplicate information before publishing to the curated zone and wants a solution that minimizes operational overhead, scales automatically, and avoids managing servers or third-party libraries. Which approach meets these requirements with the least operational overhead?
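For exact duplicates, the core of any deduplication pass is a stable key over the identifying fields; fuzzy matches across systems need more sophisticated matching than this. A minimal exact-match sketch (the field names are made up for illustration):

```python
import hashlib
import json

def record_key(record: dict, fields: tuple) -> str:
    """Stable hash over the identifying fields, insensitive to dict key order."""
    payload = json.dumps({f: record.get(f) for f in fields}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def dedupe(records, fields=("email", "booking_id")):
    """Keep the first occurrence of each distinct identifying-field combination."""
    seen, unique = set(), []
    for r in records:
        k = record_key(r, fields)
        if k not in seen:
            seen.add(k)
            unique.append(r)
    return unique
```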
An urban mobility firm ingests 8,000 sensor events per second from city traffic cameras into Amazon Kinesis Data Streams. It requires a highly fault-tolerant, near-real-time analytics solution that performs multiple aggregations over event-time windows of up to 30 minutes, tolerates up to 90 seconds of late arrivals, and keeps operational overhead to a minimum. Which approach should the data engineer choose?
A gaming analytics company streams real-time gameplay telemetry from console clients, dedicated game servers, and anti-cheat sensors into Amazon Kinesis Data Streams at an average of 12 MB/s with peaks up to 30 MB/s across 6 shards. A data engineer must process this streaming feed and land it in an Amazon Redshift Serverless workgroup for analytics. The dashboards must provide near real-time insights with sub-60-second freshness while also joining against the previous day's data, and the solution must minimize operational overhead. Which solution will meet these requirements with the least operational overhead?
A data engineer configured a custom Amazon EventBridge rule named trigger-etl on the analytics-bus in account 111111111111 (us-west-2) to invoke the AWS Lambda function arn:aws:lambda:us-west-2:111111111111:function:etl-summarizer-v2 on a rate(5 minutes) schedule. When a test event is sent, the target invocation fails with an AccessDeniedException from Lambda. How should the engineer resolve the exception?
Preparation period: 1 month
If you properly understand the questions as you solve them, you can pass too! Good luck!
Preparation period: 1 month
I passed the AWS Data Engineer Associate exam. Cloud Pass is the best app for helping candidates prepare well for any exam. Thanks!
Preparation period: 1 month
The question patterns are similar to the actual exam.
Preparation period: 2 months
I passed with 813/1000!! Many of the questions were similar to the actual exam.
Preparation period: 1 month
The explanations made it great for studying. I'll be back again.
Get the free app