Autonomous, AI-driven processes that analyze, validate, and optimize ETL pipelines without constant manual intervention.
Rule-based analysis that evaluates ETL code for best practices, performance, and compliance before it is merged or deployed.
A curated library of guardrails that ensures ETL jobs are designed and implemented consistently and reliably.
Applying continuous integration and deployment practices to data pipelines, including automated testing, code review, and staged releases.
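
For instance, a minimal sketch of the kind of automated test a CI job might run before an ETL change is merged; the transformation and field names below are hypothetical and not tied to any specific toolchain.

```python
# test_transformations.py - illustrative unit test a CI pipeline could run
# before promoting an ETL change. The transform is a made-up example.

def normalize_customer(record: dict) -> dict:
    """Trim whitespace and standardize casing for a raw customer record."""
    return {
        "customer_id": int(record["customer_id"]),
        "email": record["email"].strip().lower(),
        "country": record["country"].strip().upper(),
    }

def test_normalize_customer_lowercases_email():
    raw = {"customer_id": "42", "email": "  Jane@Example.COM ", "country": "us"}
    assert normalize_customer(raw)["email"] == "jane@example.com"

def test_normalize_customer_casts_id_to_int():
    raw = {"customer_id": "42", "email": "a@b.com", "country": "us"}
    assert normalize_customer(raw)["customer_id"] == 42
```
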
Accumulated shortcuts, anti-patterns, and inefficiencies in ETL logic that increase maintenance cost and slow future change.
Unexpected changes in schema, distribution, or content of data that can break pipelines or corrupt downstream analytics.
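
A lightweight way to surface this kind of drift is to compare incoming batches against an expected contract before loading. The Python sketch below is a simplified illustration; the column contract and sample data are assumed for the example.

```python
# Assumed column contract for an incoming feed (illustrative only).
EXPECTED_COLUMNS = {"order_id": int, "amount": float, "order_date": str}

def detect_schema_drift(batch: list[dict]) -> list[str]:
    """Return human-readable drift findings for a batch of incoming records."""
    findings = []
    for row in batch[:100]:  # sample a handful of rows per batch
        missing = EXPECTED_COLUMNS.keys() - row.keys()
        extra = row.keys() - EXPECTED_COLUMNS.keys()
        if missing:
            findings.append(f"missing columns: {sorted(missing)}")
        if extra:
            findings.append(f"unexpected columns: {sorted(extra)}")
        for col, expected_type in EXPECTED_COLUMNS.items():
            if col in row and not isinstance(row[col], expected_type):
                findings.append(
                    f"{col}: expected {expected_type.__name__}, got {type(row[col]).__name__}"
                )
    return sorted(set(findings))

if __name__ == "__main__":
    sample = [{"order_id": 1, "amount": "19.99", "order_date": "2024-01-01", "channel": "web"}]
    print(detect_schema_drift(sample))  # flags the string amount and the extra column
```
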
A full trace of data origins, transformations, and flows to understand impact and ensure trust in analytics outputs.
Continuous monitoring of freshness, volume, quality, and anomalies to catch issues before they reach business users.
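
A minimal Python sketch of such a check, assuming illustrative freshness and volume thresholds and a timezone-aware load timestamp:

```python
from datetime import datetime, timedelta, timezone

# Illustrative thresholds; real values would come from monitoring configuration.
MAX_STALENESS = timedelta(hours=6)
MIN_EXPECTED_ROWS = 10_000

def check_table_health(last_loaded_at: datetime, row_count: int) -> list[str]:
    """Return alerts for a table based on freshness and volume.

    last_loaded_at must be timezone-aware (UTC) for the comparison to be valid.
    """
    alerts = []
    age = datetime.now(timezone.utc) - last_loaded_at
    if age > MAX_STALENESS:
        alerts.append(f"stale: last load {age} ago exceeds {MAX_STALENESS}")
    if row_count < MIN_EXPECTED_ROWS:
        alerts.append(f"low volume: {row_count} rows < expected minimum {MIN_EXPECTED_ROWS}")
    return alerts
```
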
Moving data from sources, transforming it into the required structure, and loading it into a target system.
Loading raw data directly into a target platform and transforming it there using the platform’s compute engine.
A sequence of automated steps that move and transform data from sources into analytics-ready destinations.
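
To make the extract, transform, and load steps concrete, here is a minimal file-based Python sketch; the CSV source, SQLite target, and the "completed orders" rule are placeholders rather than a recommended implementation.

```python
import csv
import sqlite3

def extract(path: str) -> list[dict]:
    """Read raw rows from a CSV source (stand-in for a real source system)."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows: list[dict]) -> list[tuple]:
    """Apply a simple business rule: keep completed orders and cast types."""
    return [
        (int(r["order_id"]), float(r["amount"]))
        for r in rows
        if r.get("status") == "completed"
    ]

def load(rows: list[tuple], db_path: str = "warehouse.db") -> None:
    """Write transformed rows into a target table (SQLite stands in for a warehouse)."""
    with sqlite3.connect(db_path) as conn:
        conn.execute("CREATE TABLE IF NOT EXISTS orders (order_id INTEGER, amount REAL)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)

if __name__ == "__main__":
    # "orders.csv" is a placeholder input file for this sketch.
    load(transform(extract("orders.csv")))
```
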
Business and technical checks to validate completeness, accuracy, timeliness, and consistency of data.
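
For example, a few such rules expressed as a Python sketch, with the field names, ranges, and currency list assumed purely for illustration:

```python
def validate_order(row: dict) -> list[str]:
    """Apply illustrative completeness, accuracy, and consistency rules to one record."""
    errors = []
    # Completeness: required fields must be present and non-empty.
    for field in ("order_id", "customer_id", "amount"):
        if not row.get(field):
            errors.append(f"missing required field: {field}")
    # Accuracy: amounts must be positive numbers.
    try:
        if float(row.get("amount", 0)) <= 0:
            errors.append("amount must be greater than zero")
    except (TypeError, ValueError):
        errors.append("amount is not numeric")
    # Consistency: currency codes restricted to an allowed set.
    if row.get("currency") not in {"USD", "EUR", "GBP"}:
        errors.append(f"unexpected currency: {row.get('currency')}")
    return errors
```
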
Structured controls that make data processes repeatable, auditable, and compliant with policies or regulations.
Linking ETL controls to regulations and frameworks (FFIEC, NIST, HIPAA, etc.) so audit evidence can be produced quickly.
Formal processes that govern how ETL jobs are modified, reviewed, tested, and promoted.
Evaluating downstream tables, jobs, and reports affected by an ETL change to scope the right regression tests.
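
One common approach is a downstream traversal of the lineage graph. The Python sketch below assumes a small hypothetical lineage map; asset names are invented for the example.

```python
from collections import deque

# Hypothetical lineage: each key feeds the assets listed as its value.
LINEAGE = {
    "stg_orders": ["fct_orders"],
    "fct_orders": ["rpt_daily_revenue", "rpt_customer_ltv"],
    "dim_customer": ["rpt_customer_ltv"],
}

def downstream_impact(changed_asset: str) -> set[str]:
    """Return every table, job, or report reachable from the changed asset."""
    impacted, queue = set(), deque([changed_asset])
    while queue:
        for child in LINEAGE.get(queue.popleft(), []):
            if child not in impacted:
                impacted.add(child)
                queue.append(child)
    return impacted

print(downstream_impact("stg_orders"))
# {'fct_orders', 'rpt_daily_revenue', 'rpt_customer_ltv'}
```
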
Standardized, parameterized ETL templates that speed delivery and improve consistency.
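
A minimal Python sketch of a parameterized extraction template; the table, column, and watermark values are placeholders, and production code would bind parameters rather than interpolate strings.

```python
from string import Template

# A reusable, parameterized incremental-extract template (illustrative names).
INCREMENTAL_EXTRACT = Template(
    "SELECT * FROM $source_table "
    "WHERE $watermark_column > '$last_loaded_value'"
)

def render_extract(source_table: str, watermark_column: str, last_loaded_value: str) -> str:
    """Fill the template for a specific source so every feed follows the same pattern."""
    return INCREMENTAL_EXTRACT.substitute(
        source_table=source_table,
        watermark_column=watermark_column,
        last_loaded_value=last_loaded_value,
    )

print(render_extract("crm.contacts", "updated_at", "2024-06-01"))
```
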
Using configuration and metadata instead of hard-coded logic to drive how data pipelines behave.
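
As a rough illustration, the Python sketch below derives load jobs from a JSON metadata block instead of hard-coding one job per feed; the feed names and paths are invented for the example.

```python
import json

# Illustrative pipeline metadata; in practice this would live in a config store or repo.
CONFIG = json.loads("""
{
  "feeds": [
    {"name": "orders",    "source": "s3://raw/orders/",    "target": "analytics.orders",    "load_type": "incremental"},
    {"name": "customers", "source": "s3://raw/customers/", "target": "analytics.customers", "load_type": "full"}
  ]
}
""")

def build_jobs(config: dict) -> list[str]:
    """Turn metadata into job descriptions instead of hard-coding one job per feed."""
    return [
        f"{feed['load_type']} load: {feed['source']} -> {feed['target']}"
        for feed in config["feeds"]
    ]

for job in build_jobs(CONFIG):
    print(job)
```
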
Ensuring development, test, and production environments align so behavior is predictable when promoting changes.
Performance, scalability, availability, observability, and security expectations for data pipelines.
Patterns that capture, log, and gracefully recover from errors, including controlled retries and fallbacks.
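
A common building block is retry with exponential backoff. The Python sketch below assumes a generic callable step and illustrative attempt and delay settings.

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("etl.retry")

def run_with_retries(step, max_attempts: int = 3, base_delay: float = 2.0):
    """Run a pipeline step, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception as exc:  # in practice, catch only transient error types
            logger.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt == max_attempts:
                logger.error("step failed permanently; routing to fallback/alerting")
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
```
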
Coordinating execution order, dependencies, and scheduling of ETL jobs across workflows and environments.
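
A stripped-down illustration of dependency-aware ordering using Python's standard-library graphlib; the job names and dependency map are hypothetical, and real orchestrators add scheduling, retries, and cross-environment concerns on top of this.

```python
from graphlib import TopologicalSorter

# Hypothetical job dependencies: each job maps to the jobs it must wait for.
DEPENDENCIES = {
    "load_orders": {"extract_orders"},
    "load_customers": {"extract_customers"},
    "build_revenue_report": {"load_orders", "load_customers"},
}

def run(job: str) -> None:
    print(f"running {job}")  # placeholder for invoking the real ETL job

# Execute jobs in an order that respects every dependency.
for job in TopologicalSorter(DEPENDENCIES).static_order():
    run(job)
```
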
Tracking ETL job health in real time—run times, failures, error codes, and SLA adherence.
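
A minimal Python sketch of capturing run time, outcome, and SLA breaches per job, with an assumed per-job SLA table:

```python
import time
from dataclasses import dataclass

@dataclass
class RunRecord:
    job: str
    duration_seconds: float
    succeeded: bool

SLA_SECONDS = {"load_orders": 900}  # illustrative per-job runtime SLAs

def timed_run(job: str, fn) -> RunRecord:
    """Execute a job, capture its runtime and outcome, and flag SLA breaches."""
    start = time.monotonic()
    try:
        fn()
        succeeded = True
    except Exception:
        succeeded = False
    duration = time.monotonic() - start
    if duration > SLA_SECONDS.get(job, float("inf")):
        print(f"ALERT: {job} ran {duration:.0f}s, over its {SLA_SECONDS[job]}s SLA")
    return RunRecord(job, duration, succeeded)
```
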
A reference implementation that demonstrates the ideal way to build a pipeline, used as a starting point for new work.
Moving quality checks earlier in development so issues are caught before deployment.
Analyzing ETL mappings without executing them to detect bad practices, complexity, and potential failures.
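
A toy Python example of the idea, scanning SQL text for a few anti-patterns without running it; real static analysis works on full mapping metadata, and the rules here are illustrative only.

```python
import re

# A few illustrative rules; real analyzers inspect mapping metadata, not just text.
RULES = [
    (re.compile(r"\bselect\s+\*", re.IGNORECASE), "avoid SELECT *; list needed columns"),
    (re.compile(r"\bdelete\s+from\s+\w+\s*;", re.IGNORECASE), "DELETE without WHERE clause"),
    (re.compile(r"\bcross\s+join\b", re.IGNORECASE), "CROSS JOIN may explode row counts"),
]

def lint_sql(sql: str) -> list[str]:
    """Return warnings for anti-patterns found without executing the SQL."""
    return [message for pattern, message in RULES if pattern.search(sql)]

print(lint_sql("SELECT * FROM staging.orders CROSS JOIN dim_date"))
```
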
Understanding ETL logic at the business-rule level—validating intent, joins, aggregations, and transformations.
Ensuring pipelines consistently meet timelines, performance metrics, and data quality thresholds.
Transforming legacy ETL platforms into modern, cloud-native, automation-friendly architectures.
How quickly, cost-effectively, and reliably a data pipeline processes workloads.
Automated checkpoints that block non-compliant or low-quality ETL code from moving forward.
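
A bare-bones Python sketch of such a gate, assuming check results are collected by earlier CI steps; the check names are placeholders.

```python
import sys

def quality_gate(check_results: dict[str, bool]) -> None:
    """Fail the build (non-zero exit) if any required check did not pass."""
    failures = [name for name, passed in check_results.items() if not passed]
    if failures:
        print(f"Quality gate FAILED: {', '.join(failures)}")
        sys.exit(1)
    print("Quality gate passed; pipeline may be promoted.")

# Illustrative results that a CI job might collect from earlier steps.
quality_gate({"unit_tests": True, "lint": True, "row_count_check": True})
```
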
Improving data engineering capabilities via better tools, patterns, training, and automation.
The business logic applied to raw data to create trusted metrics, dimensions, and facts.
ETL assets stored in Git with full change history, approvals, and rollback capabilities.
Reducing runtime and compute costs by tuning queries, partitioning, schedules, and resources.
Standardized functional, non-functional, and operational checks before a pipeline goes live.
Documented steps for handling incidents, failures, and routine operational tasks.
Practices that keep cloud data workloads cost-efficient, visible, and aligned with business value.