
Simulate the real exam experience with 45 questions and a 90-minute time limit. Practice with AI-verified answers and detailed explanations.
AI-Powered
Every answer is cross-verified by 3 leading AI models to ensure maximum accuracy. Get detailed per-option explanations and in-depth question analysis.
Which of the following is hosted completely in the control plane of the classic Databricks architecture?
Worker nodes are part of the Spark cluster compute that execute tasks and store shuffle/cache data. In classic Databricks architecture, clusters (both workers and the driver) run in the customer’s cloud account (data plane). They are not hosted in the Databricks-managed control plane, even though they are created and managed via the control plane services.
A JDBC data source refers to an external database/system accessed via JDBC. The actual network connection to that system is initiated by the running compute (Spark executors/driver) in the data plane. While connection configuration may be stored in workspace artifacts, the data access path and query execution occur from the cluster, not from the control plane.
The Databricks web application (workspace UI) is a Databricks-managed SaaS component hosted in the control plane. It provides the interface for notebooks, jobs, clusters, and administration. It does not run on customer-managed cluster nodes and does not require data plane compute to be available to render the UI, making it the best example of something fully hosted in the control plane.
DBFS is a filesystem abstraction that can map to cloud object storage (like S3/ADLS/GCS) and also includes cluster-local storage (e.g., /local_disk0). The underlying data is stored in the customer’s storage account and/or on data plane compute, not entirely in the control plane. DBFS is a feature exposed via the control plane UI, but its storage is not control-plane hosted.
The driver node is the cluster’s coordinator that runs the Spark driver process, schedules tasks, and often hosts the notebook’s execution context. In classic architecture, the driver VM/container runs in the customer’s cloud account (data plane) alongside worker nodes. It is managed by the control plane but not hosted there.
Core concept: This question tests the classic Databricks architecture split between the control plane and the data plane. In classic deployments, the control plane is operated by Databricks and hosts the managed services that orchestrate work (UI, APIs, job/cluster management, notebooks metadata, etc.). The data plane runs in the customer’s cloud account and contains the compute resources (driver/worker VMs/containers) and access to customer data.

Why the answer is correct: The Databricks web application (the workspace UI) is hosted entirely in the control plane. It is part of the Databricks-managed services layer that provides the user interface for notebooks, jobs, clusters, repos, and administrative settings. Because it is a managed SaaS component, it does not run on your cluster VMs and does not require customer-managed compute resources.

Key features / best practices:
- Control plane components include the web UI and REST APIs that coordinate cluster lifecycle, job scheduling, and workspace operations.
- Data plane components include the driver and worker nodes that execute Spark code and connect to storage.
- For security and exam readiness: remember that customer data processing happens in the data plane; the control plane primarily handles orchestration and metadata. Network controls (e.g., VPC/VNet injection, private connectivity) are often used to tightly control data plane egress while still using the control plane UI.

Common misconceptions:
- People often assume “Databricks Filesystem (DBFS)” is control-plane hosted because it’s a Databricks feature. In reality, DBFS is an abstraction over storage and local disks; the underlying data resides in the customer’s cloud storage (data plane) and/or on cluster nodes.
- JDBC data sources are external systems; connectivity is established from the compute (data plane), not from the control plane.
- Driver/worker nodes are compute resources and therefore live in the data plane.

Exam tips: When asked “completely in the control plane,” look for SaaS management surfaces (UI, APIs, orchestration services). When you see anything that executes code (driver/worker) or stores/reads customer data (DBFS, JDBC connectivity), it is typically in the data plane in classic architecture.
Want to practice all questions on the go?
Download Cloud Pass for free — includes practice tests, progress tracking & more.


Which of the following code blocks will remove the rows where the value in column age is greater than 25 from the existing Delta table my_table and save the updated table?
SELECT * FROM my_table WHERE age > 25 returns only rows matching the condition in the query result, but it does not change the underlying Delta table. This is a common point of confusion: filtering data at read time is not the same as modifying stored data. The table remains unchanged after the query completes, so it does not “remove rows” or “save the updated table.”
UPDATE my_table WHERE age > 25 is not valid SQL syntax for updating rows because UPDATE requires a SET clause (for example, UPDATE ... SET col = value WHERE ...). Even if corrected, UPDATE modifies values in existing rows rather than deleting them. Therefore it cannot be used to remove rows from the table.
DELETE FROM my_table WHERE age > 25 is the correct Delta Lake DML statement to remove rows that satisfy the predicate. Because my_table is an existing Delta table, the delete is executed transactionally and persisted automatically as a new table version in the Delta transaction log, effectively saving the updated table state.
UPDATE my_table WHERE age <= 25 is also invalid syntax due to the missing SET clause. More importantly, UPDATE changes data values rather than removing rows. Even if the intent were to keep only age <= 25, UPDATE cannot delete the other rows; you would still need DELETE with the opposite predicate or a rewrite operation.
DELETE FROM my_table WHERE age <= 25 is valid Delta Lake syntax, but it deletes the wrong set of rows. The question asks to remove rows where age > 25; this option would remove rows with age 25 or less, leaving the age > 25 rows behind. It’s a predicate-direction trap.
Core concept: This question tests Delta Lake DML (Data Manipulation Language) operations in Databricks SQL—specifically how to remove rows from an existing Delta table and persist the change. Delta Lake supports ACID transactions and native DML (DELETE, UPDATE, MERGE) directly against tables.

Why the answer is correct: To remove rows where age > 25 and save the updated table, you must issue a DELETE statement with a predicate that matches the rows to be removed. "DELETE FROM my_table WHERE age > 25;" (Option C) precisely expresses this intent: it deletes only the rows meeting the condition and commits the change to the Delta table as a new transaction/version. No additional “save” step is required because the operation is executed directly against the managed table.

Key features and best practices:
- Delta Lake records the operation in the transaction log, enabling time travel and auditability (you can query previous versions if needed).
- Deletes are implemented with file-level rewrites under the hood (copy-on-write), so they are transactional and consistent.
- For large deletes, consider partitioning strategies and predicate selectivity to reduce the amount of data rewritten.
- After heavy DML, OPTIMIZE (and optionally ZORDER) can improve read performance; VACUUM can reclaim storage after the retention period.

Common misconceptions: A SELECT query (Option A) only filters results at read time; it does not modify stored data. UPDATE statements (Options B and D) change column values, not remove rows; additionally, an UPDATE requires a SET clause. Another trap is reversing the predicate: deleting age <= 25 (Option E) would keep the wrong subset.

Exam tips: Remember the mapping: SELECT = read-only; DELETE = remove rows; UPDATE = modify existing rows (requires SET); MERGE = upsert logic. For Delta tables, DML statements persist changes automatically via ACID transactions—no separate write/save call is needed. Always verify the predicate direction (>, >=, <=) matches the rows you intend to remove.
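As a concrete sketch of the correct option and how Delta persists it (the VERSION AS OF 0 check assumes the table was created at version 0):

```sql
-- Remove rows where age is greater than 25; Delta commits the change
-- transactionally as a new version in the table's transaction log.
DELETE FROM my_table WHERE age > 25;

-- The DELETE shows up as the latest entry in the table history.
DESCRIBE HISTORY my_table;

-- Time travel: earlier versions (including the deleted rows) remain queryable.
SELECT COUNT(*) FROM my_table VERSION AS OF 0;
```

No separate save call is needed; the commit to the transaction log is the save.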
A data engineer has left the organization. The data team needs to transfer ownership of the data engineer’s Delta tables to a new data engineer. The new data engineer is the lead engineer on the data team. Assuming the original data engineer no longer has access, which of the following individuals must be the one to transfer ownership of the Delta tables in Data Explorer?
A Databricks account representative is not part of your workspace’s security boundary and does not administer your Unity Catalog objects. They can help with licensing, support escalation, and account-level guidance, but they do not log in to your environment to change table ownership. Exam pattern: vendor reps are almost never the correct choice for operational governance tasks.
This transfer is possible. Databricks governance is designed to handle offboarding and prevent orphaned data assets. Administrators can manage securables and reassign ownership or adjust privileges even when the original owner is no longer present. If you see “not possible,” it’s usually a distractor unless the question explicitly states there is no admin access at all.
A privileged administrator is the only viable actor in this scenario because the original owner no longer has access and the new engineer does not automatically inherit ownership rights. Among the provided options, Workspace administrator is the closest match to the administrative role expected to resolve orphaned table ownership in Data Explorer. Technically, this action depends on having the appropriate Unity Catalog administrative authority, but the exam option is clearly aiming at the admin role rather than a standard user. This makes C the best answer from the choices given.
The new lead data engineer may be the desired recipient of ownership, but they are not necessarily authorized to perform the transfer. Unless they also hold an administrative role (workspace admin/metastore admin) or have been delegated the necessary privileges, they cannot override current ownership. The question asks who must perform the transfer, which points to an admin.
The original data engineer cannot transfer ownership because the scenario states they no longer have access. Even if they were the owner, ownership transfer requires them to be able to authenticate and execute the change. This option is included to test whether you notice the constraint that the original owner is unavailable.
Core concept: This question tests Unity Catalog ownership and administrative control over securable objects such as Delta tables in Data Explorer. In Databricks, table ownership can normally be changed by the current owner or by an administrator with the appropriate Unity Catalog authority when the current owner is unavailable.

Why the answer is correct: Given the answer choices, the intended correct choice is the administrative role rather than the new engineer or the departed owner. When the original owner no longer has access, a privileged administrator must perform the ownership transfer so the table does not remain orphaned. Although in practice this is specifically a Unity Catalog/metastore-level administrative capability rather than merely any workspace-level privilege, the exam option that best matches that administrative function is Workspace administrator.

Key features and best practices:
- Ownership of a table includes the ability to manage grants and transfer ownership.
- Administrative roles are needed to resolve offboarding scenarios where the original owner cannot act.
- Best practice is to assign ownership of important production data assets to groups or service principals instead of individual users.

Common misconceptions:
- A lead engineer does not automatically gain authority to take ownership just because of team seniority.
- Ownership transfer is possible; it does not require the original owner if an appropriate administrator exists.
- Workspace administration and Unity Catalog administration are related but not identical; exam questions sometimes simplify this distinction.

Exam tips: For ownership-transfer questions, prefer the answer that represents an internal administrator over a regular user or external party. In real Databricks environments, remember that Unity Catalog permissions are governed by metastore/object ownership rules, so look for metastore admin or object owner concepts when available.
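Once an administrator with the right authority is acting, the transfer itself is a single SQL statement. A hedged sketch; the table and principal names are illustrative placeholders, not values from the question:

```sql
-- Run by a metastore admin (or the current owner) to reassign ownership.
-- `my_table` and `new_lead@company.com` are placeholder names.
ALTER TABLE my_table OWNER TO `new_lead@company.com`;

-- Ownership can also be assigned to a group, the recommended practice
-- for production assets so offboarding does not orphan tables.
ALTER TABLE my_table OWNER TO `data-engineering-leads`;
```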
A data engineer is designing a data pipeline. The source system generates files in a shared directory that is also used by other processes. As a result, the files should be kept as is and will accumulate in the directory. The data engineer needs to identify which files are new since the previous run in the pipeline, and set up the pipeline to only ingest those new files with each run. Which of the following tools can the data engineer use to solve this problem?
Unity Catalog provides governance, access control, and metadata management for data assets such as catalogs, schemas, tables, volumes, and external locations. It can secure and organize the source or target locations used by a pipeline, but it does not detect newly arrived files or maintain ingestion progress across runs. The requirement here is operational file discovery and incremental ingestion, not governance. Therefore, Unity Catalog is not the tool that solves this problem.
Delta Lake is a storage layer that adds ACID transactions, schema enforcement, and reliable table semantics on top of cloud object storage. It is commonly used as the destination format after ingestion, and Delta tables can be read incrementally once data is already loaded. However, Delta Lake does not itself watch a raw directory and determine which source files are new since the last pipeline run. That discovery and state-tracking function is handled by Auto Loader.
Databricks SQL is designed for querying, reporting, and analytics over data stored in Databricks-managed or external tables. While it can query files and tables, it does not provide a stateful ingestion mechanism that remembers which files were processed in prior runs. The problem specifically requires incremental file ingestion from an accumulating directory, which is outside the core purpose of Databricks SQL. As a result, it is not the best tool for this use case.
Data Explorer is a UI feature for browsing and inspecting catalogs, schemas, tables, volumes, and related metadata in Databricks. It helps users discover and understand available data assets, but it does not execute ingestion pipelines or track processed files over time. There is no checkpointing or file-notification capability in Data Explorer for incremental loading. It is an exploration interface, not an ingestion solution.
Auto Loader is the Databricks-native tool for incrementally ingesting files from a landing directory while keeping the source files in place. It tracks which files have already been processed using checkpoint state and scalable file discovery metadata, so subsequent runs only ingest newly arrived files. This directly matches the requirement of a shared directory where files accumulate over time and must not be modified. In exam scenarios involving 'ingest only new files from a folder,' Auto Loader is the canonical choice.
Core concept: This question tests incremental file ingestion from a directory where files are not deleted or moved, requiring the pipeline to reliably detect and ingest only newly arrived files across runs. In Databricks, this is a classic use case for Auto Loader (cloudFiles) in Structured Streaming.

Why the answer is correct: Auto Loader is designed to efficiently discover new files in cloud object storage and ingest them incrementally while maintaining state (a checkpoint) so that each run processes only files not previously seen. This directly matches the requirement: the directory is shared, files accumulate, and the pipeline must identify “new since last run” without modifying the source directory. Auto Loader tracks processed files using its checkpoint location and (depending on configuration) file notification services or directory listing with scalable metadata management.

Key features and best practices: Auto Loader supports schema inference and evolution, exactly-once processing semantics when used with Structured Streaming and a stable checkpoint, and scalable file discovery. Common configurations include:
- A persistent checkpointLocation to remember progress across runs
- cloudFiles.format to specify the source format (json, csv, parquet, etc.)
- schemaLocation options to persist the inferred schema
- File notification mode (when available) for better scalability vs. repeated listing
Auto Loader is frequently paired with writing to Delta Lake for reliable downstream storage, but Delta Lake itself is not the mechanism that discovers new raw files.

Common misconceptions: Delta Lake provides ACID tables and supports incremental reads from Delta tables (e.g., streaming reads), but it does not inherently solve incremental discovery of new files in an arbitrary directory. Unity Catalog governs access and metadata; it doesn’t track which raw files have been ingested. Databricks SQL is a query engine and can query external locations, but it doesn’t provide robust, stateful “ingest only new files since last run” behavior for accumulating file drops.

Exam tips: When you see “new files arriving in a folder” + “don’t move/delete files” + “ingest only new files each run,” think Auto Loader with checkpoints. If the question instead says “incremental changes in a Delta table,” then Delta streaming/CDC features may be the focus. Always distinguish file discovery (Auto Loader) from storage/transactionality (Delta Lake) and governance (Unity Catalog).
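A minimal sketch of Auto Loader in its Databricks SQL streaming-table form; the path and table name are assumptions for illustration (the checkpointLocation/cloudFiles.* options named above belong to the equivalent Python readStream API):

```sql
-- Incrementally ingest only new files from a shared landing directory.
-- Source files are left in place; ingestion progress is tracked in the
-- streaming table's managed state, so each refresh picks up only new files.
CREATE OR REFRESH STREAMING TABLE orders_bronze
AS SELECT *
FROM STREAM read_files(
  '/mnt/landing/orders',   -- placeholder path to the shared directory
  format => 'json'
);
```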
Which of the following SQL keywords can be used to convert a table from a long format to a wide format?
TRANSFORM in Spark SQL is commonly a higher-order function used with arrays (and sometimes maps) to apply an expression to each element and return a new array. It does not convert a relational table from long format to wide format. Because the word “transform” sounds like a general reshaping operation, it can be tempting, but it is not the pivoting keyword in Databricks SQL.
PIVOT is the correct SQL keyword to convert long-format data into wide-format data by turning distinct values from a pivot column into separate output columns. In Databricks (Spark SQL), PIVOT is used with an aggregate function (e.g., SUM, MAX, COUNT) to resolve multiple rows that map to the same pivoted cell. This is the standard rows-to-columns operation tested on exams.
SUM is an aggregation function used to add numeric values within groups. While SUM is frequently used inside a PIVOT clause (e.g., SUM(amount)), it cannot by itself create new columns from row values. It changes the number of rows via aggregation but does not perform the structural reshaping from long to wide without PIVOT.
CONVERT is generally associated with data type conversion in some SQL dialects (e.g., converting strings to dates) or formatting. It is not used to reshape relational data from long to wide. In Databricks SQL, casting is typically done with CAST(expr AS type) rather than a CONVERT keyword for pivot-like transformations.
WHERE filters rows based on a predicate. It can reduce the dataset before pivoting, but it does not change the orientation of data (rows to columns). WHERE is part of selection and filtering, not reshaping. On exam questions about long-to-wide conversion, WHERE is a distractor because it is a common SQL keyword but unrelated to pivoting.
Core concept: This question tests understanding of reshaping data in SQL—specifically converting “long” (row-oriented) data into “wide” (column-oriented) data. In analytics and reporting, long format often stores a category/value pair per row (e.g., metric_name, metric_value), while wide format spreads categories across columns (e.g., one column per metric). In Databricks SQL / Spark SQL, the primary keyword for this operation is PIVOT.

Why the answer is correct: PIVOT converts distinct values from a row-level categorical column into multiple columns and aggregates the corresponding values. This is exactly the long-to-wide transformation: you choose a pivot column (the category to become columns) and an aggregation over a value column (because multiple rows may map to the same output cell). Spark SQL supports PIVOT syntax (often used as: SELECT ... FROM ... PIVOT (agg(value) FOR pivot_col IN (...))). This produces a wide table with one column per pivot value.

Key features / best practices: PIVOT requires an aggregate function (SUM, MAX, COUNT, etc.) because pivoting can create collisions (multiple input rows for the same group and pivot value). For performance and schema stability, it’s best to explicitly list pivot values in the IN clause rather than relying on dynamic pivoting. Also ensure you group by the correct identifier columns (the “entity” columns that remain as rows).

Common misconceptions: TRANSFORM is not the SQL keyword for reshaping long-to-wide in Spark SQL; it is typically an array higher-order function used to transform elements of an array. SUM is an aggregation function but does not reshape rows into columns by itself. CONVERT is used for type conversion in some SQL dialects, not pivoting. WHERE filters rows and does not change table shape.

Exam tips: When you see “long to wide,” “rows to columns,” or “create columns from values,” think PIVOT. When you see “wide to long,” think UNPIVOT/stack/union patterns. Also remember: pivoting almost always implies an aggregation and a fixed set of output columns for predictable downstream schemas.
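A short example of the long-to-wide pattern; the sales table and its columns (region, quarter, amount) are assumptions for illustration:

```sql
-- Long format in:  one row per (region, quarter, amount).
-- Wide format out: one row per region, one column per quarter.
SELECT *
FROM sales
PIVOT (
  SUM(amount)                          -- aggregate resolves collisions
  FOR quarter IN ('Q1', 'Q2', 'Q3', 'Q4')  -- explicit values keep the schema stable
);
```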
In which of the following scenarios should a data engineer select a Task in the Depends On field of a new Databricks Job Task?
Incorrect. The "Depends on" field does not replace an existing task with a new one; it only defines execution order between tasks. If a task needs to be replaced, the job definition itself must be edited to remove or modify the old task. Dependencies are about orchestration and prerequisites, not substitution of workflow components. Choosing this option confuses workflow design changes with runtime dependency configuration.
Incorrect. Standard task dependencies in Databricks Jobs are used to wait for successful completion of upstream tasks, not for upstream failure. A failed upstream task typically prevents dependent tasks from running unless special control-flow or repair logic is used. Failure scenarios are handled through retries, alerts, and operational remediation rather than by setting a dependency that requires failure. Therefore this does not describe the normal purpose of the "Depends on" field.
Incorrect. Having the same dependency libraries does not require one task to depend on another. Library configuration is managed separately at the task or cluster level and does not imply any execution ordering. Two tasks can use identical libraries and still run independently or in parallel if no data dependency exists. This option confuses environment setup with workflow orchestration.
Incorrect. The purpose of "Depends on" is not to minimize compute resource usage, even though serialized execution may incidentally reduce concurrent resource consumption. Compute efficiency is addressed through cluster sizing, autoscaling, serverless options, and job design choices rather than dependency configuration alone. A dependency should be added because of logical or data prerequisites, not simply to reduce resource usage. This option mistakes an indirect side effect for the primary function of the feature.
Correct. The "Depends on" field is used when the new task should wait for another task to complete successfully before it begins. In Databricks Jobs, this creates an upstream-downstream dependency in the workflow DAG and enforces execution order between tasks. This is essential when the downstream task relies on outputs, tables, files, or side effects produced by the upstream task. Using dependencies ensures the new task only becomes eligible to run after the prerequisite task finishes successfully.
Core concept: This question tests Databricks Workflows (Jobs) task orchestration, specifically task dependencies configured via the “Depends on” field. In a multi-task job, dependencies define the directed execution order (a DAG) and control when a task is eligible to start.

Why the answer is correct: You select a Task in the “Depends on” field when the new task must wait for another task to finish successfully before it begins. This is the primary purpose of dependencies: enforce sequencing and correctness across pipeline stages (e.g., ingest -> bronze -> silver -> gold, or extract -> transform -> load). By default, a dependent task will not run until its upstream dependency reaches a successful terminal state. This prevents downstream tasks from operating on incomplete or invalid intermediate outputs.

Key features, configurations, best practices: Databricks Jobs support multi-task workflows where each task can be a notebook, Spark JAR, Python script, SQL task, dbt task, or Delta Live Tables pipeline. “Depends on” creates a DAG, enabling parallelism where possible and serialization where required. Best practice is to model dependencies around data readiness and contractual outputs (tables/files/checkpoints), not around incidental similarities like shared libraries. Also, use task-level retries, timeouts, and alerts to make dependency chains robust. When tasks share compute, consider job clusters vs existing all-purpose clusters, but compute choice is separate from dependency definition.

Common misconceptions: Some may think “Depends on” is used to replace tasks (it is not; you edit/remove tasks for replacement), or to group tasks with the same libraries (libraries can be attached per task or cluster, but that’s not what dependencies do). Others may confuse dependencies with compute optimization; while dependencies can reduce wasted work by preventing premature downstream runs, they do not directly enforce “use as little compute as possible.”

Exam tips: If the question mentions “must run after,” “wait until,” “only start when upstream completes,” or “orchestrate stages,” it’s a dependency question and the correct choice is the option describing successful completion gating. Remember: dependencies define execution order and eligibility, not code/library reuse or resource minimization.
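In the Jobs API JSON that underlies the UI, the same “Depends on” relationship appears as a depends_on list on the downstream task. A minimal fragment as a sketch; the task keys are placeholder assumptions:

```json
{
  "tasks": [
    { "task_key": "bronze_ingest" },
    {
      "task_key": "silver_transform",
      "depends_on": [ { "task_key": "bronze_ingest" } ]
    }
  ]
}
```

Here silver_transform becomes eligible to run only after bronze_ingest completes successfully.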
A data engineer needs access to a table new_table, but they do not have the correct permissions. They can ask the table owner for permission, but they do not know who the table owner is.
Which of the following approaches can be used to identify the owner of new_table?
This is not the best answer because the most direct way to identify the table owner is the explicit Owner field on the table’s page in Data Explorer. However, the Permissions tab can also help identify the owner because the owner is the principal with the OWN privilege on the table. On the exam, Databricks typically expects the most straightforward metadata location, which is the Owner field rather than inferring ownership from the permissions listing.
Not all listed options can identify the owner. The cloud storage solution does not track Unity Catalog table ownership, and the Permissions tab is not the canonical “owner” indicator compared to the explicit Owner field. Therefore, it is incorrect to say all options work.
Correct. The table’s page in Data Explorer includes an explicit Owner field that is sourced from Databricks metastore (Unity Catalog) metadata. This is the authoritative owner of the table for governance and permission management, and it’s the most straightforward way to find who to contact for access.
Cloud storage ownership/ACLs (S3/ADLS/GCS) do not represent the Databricks/Unity Catalog table owner. A table can be managed by Databricks (managed table) or point to external storage (external table), but in both cases the table owner is a metastore concept. Looking in the storage solution will not reliably identify the table owner.
This is incorrect. Databricks provides multiple ways to identify ownership, including the Owner field in Data Explorer and metadata/SQL commands. Ownership is a key governance attribute and is intended to be discoverable for access requests and administration.
Core Concept: This question tests Unity Catalog (or Databricks governance) metadata discovery: how to find the owner of a securable object (a table). In Databricks, ownership is a first-class property stored in the metastore and shown in the object’s metadata UI (Data Explorer) and via SQL (e.g., DESCRIBE EXTENDED / SHOW GRANTS). Ownership is not derived from cloud storage ACLs.

Why the Answer is Correct: The most direct and reliable way to identify the owner of a table in Databricks is to view the table’s metadata in Data Explorer and read the Owner field. The Owner field is explicitly maintained by Databricks governance (Unity Catalog) and indicates the principal (user, group, or service principal) that owns the table and can manage permissions. If you can access the table’s page in Data Explorer (even without SELECT privileges), the metadata panel typically exposes the owner.

Key Features / Best Practices:
- Data Explorer provides a centralized UI for catalog/schema/table metadata, including owner and grants.
- Ownership is part of the metastore object definition; it is not tied to the underlying Delta files’ storage owner.
- If UI access is limited, similar discovery can be done with SQL commands like SHOW GRANTS ON TABLE <name> (to see who has privileges) and DESCRIBE EXTENDED <name> (to see metadata), but the question specifically targets the Data Explorer approach.

Common Misconceptions:
- Confusing the “Permissions tab” with the “Owner field”: the Permissions tab shows who has what privileges, but the owner may not be obvious or may require additional navigation; the Owner field is explicit.
- Assuming cloud storage ownership equals table ownership: for managed tables, storage is abstracted; for external tables, storage ACLs still do not define the Unity Catalog owner.
- Thinking it’s impossible: Databricks governance is designed to make ownership discoverable.

Exam Tips:
- For Unity Catalog objects, remember: owner is metastore metadata and is visible in Data Explorer’s Owner field.
- Don’t conflate storage-layer permissions (S3/ADLS/GCS) with Unity Catalog privileges.
- When asked “who owns this table?”, prefer metadata/UC answers (Owner field, DESCRIBE EXTENDED) over storage answers.
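If the Data Explorer UI is unavailable, the same metadata can be read with SQL; new_table is the name from the question:

```sql
-- Table metadata from the metastore, including the Owner field.
DESCRIBE TABLE EXTENDED new_table;

-- Who holds which privileges on the table (useful context when
-- requesting access from the owner).
SHOW GRANTS ON TABLE new_table;
```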
A financial services company stores financial data in Amazon Redshift. A data engineer wants to run real-time queries on the financial data to support a web-based trading application. The data engineer wants to run the queries from within the trading application. Which solution will meet these requirements with the LEAST operational overhead?
WebSocket connections are not a supported or typical interface for Amazon Redshift query execution. Redshift is accessed via SQL over JDBC/ODBC or via the Redshift Data API (HTTPS). Using WebSockets would require custom middleware/proxies and would add significant complexity and operational overhead, failing the “least operational overhead” requirement.
Amazon Redshift Data API provides HTTPS endpoints to run SQL statements on Redshift without managing persistent connections or database drivers. It supports IAM-based authentication, integrates with Secrets Manager, and allows asynchronous execution and result retrieval. This is purpose-built for application and serverless access patterns and minimizes operational tasks like connection pooling and network/session management.
JDBC connections can run real-time queries, but they require the application to manage database drivers, credentials, network connectivity (often VPC configuration), connection pooling, retries, and scaling under concurrent web traffic. This increases operational overhead compared to using the Redshift Data API, which abstracts away connection management and simplifies authentication via IAM.
S3 Select queries data stored in Amazon S3 objects (CSV/JSON/Parquet) and cannot directly query data that resides in Amazon Redshift. To use S3 Select, the company would need to export/replicate frequently accessed Redshift data to S3 and manage freshness, partitioning, and consistency—adding operational overhead and risking stale results for a real-time trading application.
Core concept: The question tests how to execute application-driven, near real-time queries against Amazon Redshift with minimal operational burden. The key AWS feature is the Amazon Redshift Data API, which provides HTTPS-based access to run SQL without managing persistent database connections.

Why the answer is correct: The Amazon Redshift Data API (Option B) is designed for exactly this use case: applications (including web apps) can submit SQL statements to Redshift over a secure API endpoint. This eliminates the need to provision and manage connection pools, handle network connectivity complexities, or embed database drivers in the application runtime. It also fits "least operational overhead" because AWS manages the API layer, authentication integration (IAM), and the statement execution lifecycle.

Key features / best practices:
- Serverless-style access via HTTPS: no JDBC/ODBC driver required in the app.
- IAM-based authentication and authorization, reducing secret-management overhead (can integrate with AWS Secrets Manager if needed).
- Asynchronous execution: submit statements, poll for status, and fetch results, which suits web workloads where you don't want long-lived DB sessions.
- Works well with event-driven architectures (e.g., API Gateway/Lambda) and can be called directly from application backends.

Common misconceptions: JDBC (Option C) is a standard approach for apps, but it increases operational overhead: you must manage drivers, VPC/network routing, connection pooling, retries, and scaling behavior under spiky web traffic. S3 Select (Option D) can query objects in S3, but it does not query Redshift data directly and introduces data duplication, latency, and governance complexity. WebSockets (Option A) are not a standard or supported mechanism for Redshift query execution.

Exam tips: When you see "run queries from within an application" plus "least operational overhead," look for managed APIs (like the Redshift Data API) rather than direct database connections. Also, distinguish between services that query Redshift (Data API) and services that query S3 objects (S3 Select).
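A minimal sketch of the Data API call pattern described above, assuming a Redshift Serverless workgroup and valid AWS credentials (the workgroup and database names passed in are placeholders). The `field_value` helper unwraps the typed field dicts that `get_statement_result` returns, and is pure Python so it works without an AWS connection.

```python
def run_query(sql, workgroup, database):
    """Submit SQL via the Redshift Data API, poll until done, return rows.

    Requires AWS credentials; `workgroup` and `database` are placeholders
    for your own Redshift Serverless workgroup and database names.
    """
    import time
    import boto3  # imported lazily so the helper below stays testable offline

    client = boto3.client("redshift-data")
    stmt = client.execute_statement(
        WorkgroupName=workgroup, Database=database, Sql=sql
    )
    while True:
        desc = client.describe_statement(Id=stmt["Id"])
        if desc["Status"] in ("FINISHED", "FAILED", "ABORTED"):
            break
        time.sleep(0.25)  # asynchronous model: poll for completion
    if desc["Status"] != "FINISHED":
        raise RuntimeError(desc.get("Error", desc["Status"]))
    result = client.get_statement_result(Id=stmt["Id"])
    return [[field_value(f) for f in record] for record in result["Records"]]


def field_value(field):
    """Unwrap one Data API result field (e.g. {'longValue': 42}) to a value."""
    if field.get("isNull"):
        return None
    for key in ("stringValue", "longValue", "doubleValue", "booleanValue"):
        if key in field:
            return field[key]
    return None

print(field_value({"longValue": 42}))  # -> 42
```

Note there is no driver, connection pool, or session to manage: each statement is an HTTPS request authorized by IAM, which is the source of the "least operational overhead" advantage.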
A manufacturing company collects sensor data from its factory floor to monitor and enhance operational efficiency. The company uses Amazon Kinesis Data Streams to publish the data that the sensors collect to a data stream. Then Amazon Kinesis Data Firehose writes the data to an Amazon S3 bucket. The company needs to display a real-time view of operational efficiency on a large screen in the manufacturing facility. Which solution will meet these requirements with the LOWEST latency?
Flink reads from Kinesis Data Streams and processes continuously with very low latency. Writing results to Timestream provides a fast time-series serving layer for operational metrics. Grafana is designed for near-real-time monitoring dashboards and can refresh frequently. This path avoids S3/Firehose buffering delays and avoids batch triggers, making it the lowest-latency option.
This path relies on Firehose writing objects to S3, S3 event notifications triggering a Lambda function, the function writing to Aurora, and QuickSight visualizing the result. Firehose buffering plus object-creation events introduce micro-batch latency, and Aurora is not optimized as a time-series sink for high-ingest sensor metrics. QuickSight is typically not used for second-level operational wallboards, which further increases effective latency.
Although Flink is low-latency, the option’s key issue is the claim of using a Kinesis Data Firehose delivery stream to publish directly to Timestream (not a standard/typical native Firehose destination in many exam contexts). Additionally, QuickSight generally has higher refresh latency than Grafana for operational monitoring. Even with Timestream, the visualization layer makes this less suitable for lowest-latency display.
AWS Glue bookmarks are intended for incremental/batch ETL processing, not true real-time streaming from S3. Reading “in real time” from S3 is constrained by object delivery and listing/polling patterns, and bookmarks don’t change that. This approach would have higher latency than consuming directly from Kinesis Data Streams with Flink, and it adds unnecessary ETL overhead.
Core concept: This question tests selecting the lowest-latency real-time analytics and visualization path for streaming IoT/sensor data. The key is to avoid batch-oriented sinks (S3 object delivery) and polling-based BI tools, and instead use true stream processing plus a time-series store and a dashboard designed for near-real-time refresh.

Why A is correct: Amazon Managed Service for Apache Flink provides continuous stream processing directly from Kinesis Data Streams with millisecond-to-second latency. Writing the processed metrics to Amazon Timestream (a purpose-built time-series database) enables fast ingestion and querying of time-window aggregations (e.g., OEE/operational efficiency KPIs). Grafana is commonly used for operational dashboards and supports near-real-time querying and refresh against time-series backends such as Timestream. This combination minimizes end-to-end latency: stream -> Flink -> Timestream -> Grafana.

Key features / best practices:
- Flink supports event-time processing, windowed aggregations, and stateful computations, which are typical for operational efficiency metrics (rolling averages, rates, anomaly flags).
- Timestream is optimized for high-ingest time-series data and time-based queries, reducing query latency versus general-purpose relational stores.
- Grafana is designed for live operational monitoring; configure short refresh intervals and use pre-aggregated metrics from Flink to keep dashboards responsive.
- Architecturally, the lowest latency comes from consuming Kinesis Data Streams directly rather than waiting for Firehose to buffer and write objects to S3.

Common misconceptions: Many candidates assume "S3 is the source of truth, so read from S3 in real time." In practice, S3 object creation is inherently micro-batch (Firehose buffers by size/time), and downstream triggers (Lambda, Glue) add more delay. Similarly, QuickSight is excellent for BI but is not typically the lowest-latency choice for second-by-second operational wallboards.

Exam tips: When you see "real-time view" and "lowest latency," prefer Kinesis Data Streams -> stream processing (Flink) -> a low-latency serving store (time-series/NoSQL) -> an operational dashboard (Grafana). Avoid S3-triggered pipelines and batch ETL constructs (Glue bookmarks) for true real-time requirements.
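The kind of windowed aggregation a Flink job would compute before writing to Timestream can be sketched in plain Python to make the idea concrete. This is an illustration of tumbling-window averaging only, not Flink code; the event tuples and machine IDs are hypothetical.

```python
from collections import defaultdict

def tumbling_window_avg(events, window_ms):
    """Group (timestamp_ms, machine_id, value) events into tumbling windows
    and compute the per-machine average per window -- the kind of rolling
    efficiency metric a Flink job would emit to a time-series store."""
    sums = defaultdict(lambda: [0.0, 0])  # (window_start, machine) -> [sum, count]
    for ts, machine, value in events:
        window_start = ts // window_ms * window_ms  # align to window boundary
        acc = sums[(window_start, machine)]
        acc[0] += value
        acc[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

events = [
    (1_000, "press-1", 0.90),
    (4_000, "press-1", 0.80),
    (11_000, "press-1", 0.60),
]
print(tumbling_window_avg(events, window_ms=10_000))
# window starting at 0 averages ~0.85; window starting at 10000 averages 0.60
```

In the real pipeline, Flink maintains this state continuously and emits each window's result as soon as it closes, which is why the dashboard can refresh within seconds of the sensor readings.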
A company uses Amazon S3 to store semi-structured data in a transactional data lake. Some of the data files are small, but other data files are tens of terabytes. A data engineer must perform a change data capture (CDC) operation to identify changed data from the data source. The data source sends a full snapshot as a JSON file every day and ingests the changed data into the data lake. Which solution will capture the changed data MOST cost-effectively?
A Lambda-based diff approach is not cost-effective or scalable for tens-of-terabytes daily snapshots. Lambda has execution time and memory constraints, and implementing a robust distributed comparison would require chunking, orchestration, and large S3 read volumes. It also lacks built-in ACID/merge semantics, so you would be re-implementing CDC logic and state management at high operational complexity and cost.
Loading snapshot files into Amazon RDS for MySQL to derive changes is expensive and not aligned with the source pattern. RDS is not designed for ingesting and comparing tens of TB daily snapshots, and AWS DMS CDC works best with database transaction logs, not file-to-file snapshot comparisons. This adds unnecessary database licensing/instance costs and operational overhead.
Open source transactional lake formats (e.g., Delta Lake, Apache Hudi, Apache Iceberg) are purpose-built for CDC-style upserts on S3. They provide ACID transactions, scalable metadata, and efficient MERGE/UPSERT operations to insert new records and update existing ones based on keys. This keeps data in the lake, avoids extra database infrastructure, and is typically the most cost-effective and maintainable approach.
Aurora Serverless reduces some capacity management, but it still introduces an OLTP database layer that is ill-suited for ingesting and comparing massive daily snapshot files. AWS DMS is primarily for log-based CDC from databases, not for computing deltas from file snapshots. The approach increases cost and complexity without providing an efficient mechanism to detect changes from JSON snapshots at scale.
Core concept: This question tests cost-effective CDC in an S3-based transactional data lake when the source delivers daily full snapshots (JSON) and files can be extremely large. In lakehouse architectures, CDC is commonly implemented by comparing snapshots and applying changes with ACID table formats (Delta Lake, Apache Hudi, Apache Iceberg) rather than by moving data into an OLTP database.

Why option C is correct: Using an open source data lake format that supports MERGE/UPSERT semantics allows the engineer to ingest each daily snapshot and efficiently compute and apply changes directly in S3. These formats maintain transaction logs and metadata and enable row-level updates and deletes (via copy-on-write or merge-on-read strategies). The CDC result is produced by merging the incoming snapshot into the target table keyed by a business primary key, updating changed rows and inserting new rows. This avoids standing up and operating a database solely to derive CDC, and avoids per-file custom diff logic that does not scale to tens of TB.

Key features / best practices:
- Use a transactional table format (commonly Delta Lake on Databricks) with ACID guarantees and scalable metadata.
- Implement CDC via MERGE INTO (upsert) using a deterministic key and an optional hash of non-key columns to detect changes.
- Partitioning and file compaction (OPTIMIZE / clustering) mitigate small-file issues and improve merge performance.
- Schema evolution support helps with semi-structured JSON sources.

Common misconceptions:
- "Lambda can diff snapshots cheaply": diffing tens of TB in Lambda is impractical due to runtime/memory limits and would require scanning huge datasets daily, driving high S3 read and compute costs.
- "Put it in RDS/Aurora and use DMS": DMS CDC is designed for database log-based replication, not for deriving changes from daily snapshot files. Loading tens-of-TB snapshots into a relational database is expensive and operationally heavy.

Exam tips: When you see S3 + transactional data lake + a need for updates/deletes/CDC, the exam often expects a lakehouse table format (Delta/Hudi/Iceberg) and MERGE-based upserts. Prefer solutions that keep compute close to the lake and avoid introducing OLTP databases just to compute CDC from files.
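The MERGE-style change detection described above (match on a key, compare a hash of the non-key columns) can be sketched in plain Python. This illustrates only the decision logic; in practice Delta Lake's MERGE INTO does the equivalent work at lakehouse scale. The `id` key and the sample rows are hypothetical.

```python
import hashlib

def classify_snapshot(previous, snapshot):
    """Compare a new full snapshot against the current table state and
    classify each row as insert / update / unchanged, using the primary
    key plus a hash of the non-key columns -- the same decision a
    MERGE INTO upsert makes, sketched in plain Python."""
    def row_hash(row):
        # Deterministic hash over non-key columns to detect changed rows.
        payload = "|".join(f"{k}={row[k]}" for k in sorted(row) if k != "id")
        return hashlib.sha256(payload.encode()).hexdigest()

    prev_hashes = {row["id"]: row_hash(row) for row in previous}
    changes = {"insert": [], "update": [], "unchanged": []}
    for row in snapshot:
        if row["id"] not in prev_hashes:
            changes["insert"].append(row["id"])
        elif row_hash(row) != prev_hashes[row["id"]]:
            changes["update"].append(row["id"])
        else:
            changes["unchanged"].append(row["id"])
    return changes

prev = [{"id": 1, "qty": 5}, {"id": 2, "qty": 9}]
snap = [{"id": 1, "qty": 5}, {"id": 2, "qty": 7}, {"id": 3, "qty": 1}]
print(classify_snapshot(prev, snap))
# -> {'insert': [3], 'update': [2], 'unchanged': [1]}
```

In Delta Lake this maps to a single statement along the lines of `MERGE INTO target USING snapshot ON target.id = snapshot.id WHEN MATCHED AND hashes differ THEN UPDATE ... WHEN NOT MATCHED THEN INSERT ...`, executed distributedly so the tens-of-terabytes case stays tractable.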