
Simulate the real exam experience with 65 questions and a 130-minute time limit. Practice with AI-verified answers and detailed explanations.
Powered by AI
Every answer is cross-verified by 3 leading AI models to ensure maximum accuracy. Get detailed per-option explanations and an in-depth analysis of each question.
A media streaming startup lands ~3 TB of raw clickstream logs per day in Amazon S3 and loads curated aggregates into an Amazon Redshift RA3 cluster. Analysts also need to run low-latency ad hoc queries on the freshest S3 data via Amazon Redshift Spectrum, using an external schema backed by the AWS Glue Data Catalog. Given that most filters are on event_date (YYYY-MM-DD) and region, and the team wants the fastest Spectrum query performance, which two actions should they take? (Choose two.)
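Scenarios like this typically hinge on laying the S3 data out in Hive-style partitions on the filtered columns so Spectrum can prune partitions instead of scanning everything. A minimal sketch of that key layout (the bucket and prefix names are hypothetical, not from the question):

```python
from datetime import date

def partition_prefix(bucket: str, prefix: str, event_date: date, region: str) -> str:
    """Build a Hive-style S3 key prefix (key=value path segments) so that
    Redshift Spectrum, via the Glue Data Catalog, can prune partitions
    when queries filter on event_date and region."""
    return (
        f"s3://{bucket}/{prefix}/"
        f"event_date={event_date.isoformat()}/region={region}/"
    )

# Example: objects for Jan 5 in eu-west land under their own prefix.
print(partition_prefix("clickstream-raw", "events", date(2024, 1, 5), "eu-west"))
# s3://clickstream-raw/events/event_date=2024-01-05/region=eu-west/
```

With this layout, a query filtering on `event_date` and `region` only touches the matching prefixes.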
Want to practice every question anywhere?
Download Cloud Pass for free; it includes practice exams, progress tracking, and more.
Study period: 1 month
If you properly understand and work through the questions, you can pass too. Good luck!
Study period: 1 month
I passed the AWS Data Engineer Associate exam. Cloud Pass is the best app to help candidates prepare well for the exam. Thanks!
Study period: 1 month
The question patterns are similar to the actual exam.
Study period: 2 months
I passed with 813/1000!! Many of the questions were similar to the real exam.
Study period: 1 month
The explanations made it great for studying; I'll be back.
Download Cloud Pass and get access to all AWS Certified Data Engineer - Associate (DEA-C01) practice questions for free.
Get the app free
A media analytics company needs a workflow orchestrator for 200+ scheduled data pipelines that run across an on-premises Kubernetes cluster (3 worker nodes, 32 vCPU each) and an AWS account in us-east-1. The team requires the same open-source DAG definitions in both locations, wants to avoid vendor lock-in, and must support at least 500 task runs per day. Which AWS service should the team adopt so they can run the open-source engine on premises and a fully managed equivalent in the cloud?
A media analytics startup operates an on-premises Oracle 12c database connected to AWS over a 1 Gbps Direct Connect link. A data engineer must crawl a specific table (~50 million rows, 30 columns) via JDBC to catalog the schema, then extract, transform, and load the data into an Amazon S3 bucket as partitioned Parquet (Snappy) on a daily 01:00 UTC schedule, orchestrating the end-to-end pipeline with minimal managed-service overhead to keep costs low. Which AWS service or feature will most cost-effectively meet these requirements?
A fintech company streams payment event logs to an Amazon Kinesis Data Streams data stream with 12 shards; each record is 2 KB and producers send about 5,000 records per second overall, but CloudWatch shows two shards at 95% write utilization while the other shards are under 10%, and PutRecords calls return ProvisionedThroughputExceeded for those hot shards. Producers currently use merchantId as the partition key, and during a flash sale a single merchant generates approximately 70% of events, creating hot shards even though total throughput is below the stream's aggregate limits. How should the data engineer eliminate the throttling while keeping the same overall throughput?
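A common remedy for a hot partition key (a hedged sketch of the technique, not the graded answer) is to salt the key for high-volume merchants so their records hash onto several shards, and strip the salt on the consumer side. The merchant id and bucket count below are invented for illustration:

```python
import random

HOT_MERCHANTS = {"merchant-42"}  # hypothetical flash-sale merchant
SALT_BUCKETS = 8                 # spread a hot merchant across up to 8 key values

def salted_partition_key(merchant_id: str) -> str:
    """Append a random salt to a hot merchant's key so Kinesis hashes its
    records onto multiple shards; cold merchants keep a stable key so
    their per-merchant ordering is unaffected."""
    if merchant_id in HOT_MERCHANTS:
        return f"{merchant_id}#{random.randrange(SALT_BUCKETS)}"
    return merchant_id

def original_key(partition_key: str) -> str:
    """Consumer side: strip the salt to recover the merchant id."""
    return partition_key.split("#", 1)[0]

keys = {salted_partition_key("merchant-42") for _ in range(1000)}
print(len(keys))                 # several distinct key values, not one
print(original_key(keys.pop()))  # merchant-42
```

The trade-off is that per-merchant ordering is only preserved within each salt bucket, which is why only the hot keys are salted.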
A media platform needs to analyze playback logs stored in a PostgreSQL database. The company wants to correlate the logs with customer issues tracked in Zendesk. The company receives 2 GB of new playback logs each day. The company has 100 GB of historical Zendesk tickets. A data engineer must develop a process that analyzes and correlates the logs and tickets. The process must run once each night. Which solution will meet these requirements with the LEAST operational overhead?
A media-streaming analytics team uses Amazon Redshift Serverless (workgroup: prod-analytics in us-east-1) with 9 materialized views over a clickstream schema. The team must automate a schedule that runs REFRESH MATERIALIZED VIEW for all 9 views every 30 minutes between 08:00 and 20:00 UTC, without provisioning or managing any orchestration infrastructure. Which approach meets this requirement with the least effort?
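Whichever serverless scheduler is chosen, the scheduled batch itself is just nine REFRESH statements, for example submitted through the Redshift Data API. A sketch that renders them (the view names are hypothetical; the real schema would supply them):

```python
# Hypothetical materialized view names; a real deployment would list its own.
VIEWS = [f"clickstream.mv_agg_{i:02d}" for i in range(1, 10)]

def refresh_statements(views):
    """Render one REFRESH MATERIALIZED VIEW statement per view, suitable
    for submitting as a batch against the Serverless workgroup."""
    return [f"REFRESH MATERIALIZED VIEW {v};" for v in views]

stmts = refresh_statements(VIEWS)
print(len(stmts))   # 9
print(stmts[0])     # REFRESH MATERIALIZED VIEW clickstream.mv_agg_01;
```

Running this batch on a 30-minute schedule between 08:00 and 20:00 UTC is then a single cron-style rule, with no infrastructure to manage.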
A travel-tech company is consolidating booking and customer-support datasets from multiple legacy systems into an Amazon S3 data lake; an engineer reviewing historical exports (about 3 TB of CSV and JSON per week, ~120 million rows) finds that many bookings and customer profiles are duplicated across systems. The engineer must identify and remove duplicate information before publishing to the curated zone and wants a solution that minimizes operational overhead, scales automatically, and avoids managing servers or third-party libraries. Which approach meets these requirements with the least operational overhead?
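Whatever managed service performs it, deduplication across systems usually reduces to grouping rows by a normalized business key and keeping one survivor per key. A toy sketch of that idea in plain Python (the field names and sample rows are invented):

```python
def dedupe(records, key_fields=("email", "booking_ref")):
    """Keep the first record seen for each normalized business key.
    Normalizing (trim + lowercase) catches near-duplicates that differ
    only in casing or whitespace across source systems."""
    seen = set()
    survivors = []
    for rec in records:
        key = tuple(str(rec.get(f, "")).strip().lower() for f in key_fields)
        if key not in seen:
            seen.add(key)
            survivors.append(rec)
    return survivors

rows = [
    {"email": "Ana@Example.com", "booking_ref": "BK-1", "src": "legacy-a"},
    {"email": "ana@example.com ", "booking_ref": "bk-1", "src": "legacy-b"},
    {"email": "bo@example.com", "booking_ref": "BK-2", "src": "legacy-a"},
]
print(len(dedupe(rows)))  # 2: the two Ana rows collapse into one
```

At the 3 TB/week scale in the question this logic would run inside a serverless engine rather than a single process, but the key-normalization step is the same.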
An urban mobility firm ingests 8,000 sensor events per second from city traffic cameras into Amazon Kinesis Data Streams. It requires a highly fault-tolerant, near-real-time analytics solution that performs multiple aggregations over event-time windows of up to 30 minutes, tolerates up to 90 seconds of late arrivals, and keeps operational overhead to a minimum. Which approach should the data engineer choose?
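The core mechanic in such a pipeline, regardless of the engine, is assigning events to event-time windows and tolerating a bounded amount of lateness. A minimal pure-Python illustration, using the window size and lateness bound from the question:

```python
WINDOW_SECONDS = 30 * 60   # 30-minute tumbling windows
ALLOWED_LATENESS = 90      # seconds of late arrival still accepted

def window_start(event_ts: int) -> int:
    """Map an epoch timestamp to the start of its tumbling window."""
    return event_ts - (event_ts % WINDOW_SECONDS)

def accepts(event_ts: int, watermark_ts: int) -> bool:
    """A late event is still counted while the watermark has not yet
    passed its window's end plus the allowed lateness."""
    window_end = window_start(event_ts) + WINDOW_SECONDS
    return watermark_ts < window_end + ALLOWED_LATENESS

print(window_start(1_700_000_123))                 # 1699999200
print(accepts(1_700_000_123, 1_700_000_123 + 60))  # on time -> True
```

Stream processors expose this as declarative window and watermark configuration; the sketch just makes the arithmetic explicit.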
A gaming analytics company streams real-time gameplay telemetry from console clients, dedicated game servers, and anti-cheat sensors into Amazon Kinesis Data Streams at an average of 12 MB/s with peaks up to 30 MB/s across 6 shards. A data engineer must process this streaming feed and land it in an Amazon Redshift Serverless workgroup for analytics. The dashboards must provide near real-time insights with sub-60-second freshness while also joining against the previous day's data, and the solution must minimize operational overhead. Which solution will meet these requirements with the least operational overhead?
A data engineer configured a custom Amazon EventBridge rule named trigger-etl on the analytics-bus in account 111111111111 (us-west-2) to invoke the AWS Lambda function arn:aws:lambda:us-west-2:111111111111:function:etl-summarizer-v2 on a rate(5 minutes) schedule. When a test event is sent, the target invocation fails with an AccessDeniedException from Lambda. How should the engineer resolve the exception?
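An AccessDeniedException on an EventBridge-to-Lambda target usually indicates that the function's resource-based policy has no statement letting events.amazonaws.com invoke it. A hedged sketch of the kind of policy statement that granting invoke permission would attach (the function ARN comes from the question; the Sid and rule ARN format are illustrative):

```python
import json

# Sketch of the resource-based policy statement that allows the
# EventBridge rule to invoke the function. The Sid is an invented label.
statement = {
    "Sid": "allow-trigger-etl-rule",
    "Effect": "Allow",
    "Principal": {"Service": "events.amazonaws.com"},
    "Action": "lambda:InvokeFunction",
    "Resource": "arn:aws:lambda:us-west-2:111111111111:function:etl-summarizer-v2",
    "Condition": {
        "ArnLike": {
            # Scope the grant to this specific rule on the custom bus.
            "AWS:SourceArn": "arn:aws:events:us-west-2:111111111111:rule/analytics-bus/trigger-etl"
        }
    },
}
print(json.dumps(statement, indent=2))
```

Scoping the grant with a SourceArn condition follows least privilege: only this rule, not any EventBridge rule in any account, may invoke the function.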