Data Engineering & ETL Glossary

A concise reference for the language of modern ETL, agentic data engineering, and code-quality governance—designed for teams using CoeurData and Undraleu to raise their engineering standards.
  • 38+ curated definitions
  • 28 topical tags
  • Aligned to Undraleu, modernization, and governance programs
AI & Automation
Agentic Data Engineering

Autonomous, AI-driven processes that analyze, validate, and optimize ETL pipelines without constant manual intervention.

Code Quality
Automated Code Review (ACR)

Rule-based analysis that evaluates ETL code for best practices, performance, and compliance before it is merged or deployed.

Standards
Best-Practice Ruleset

A curated library of guardrails that ensures ETL jobs are designed and implemented consistently and reliably.

DevOps
CI/CD for ETL

Applying continuous integration and deployment practices to data pipelines, including automated testing, code review, and staged releases.

Risk
Code Debt (ETL Debt)

Accumulated shortcuts, anti-patterns, and inefficiencies in ETL logic that increase maintenance cost and slow future change.

Data Quality
Data Drift

Unexpected changes in schema, distribution, or content of data that can break pipelines or corrupt downstream analytics.
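
For illustration only, a minimal Python sketch of a drift check might compare a fresh batch against a stored baseline; the tolerance and the use of null rates as the content signal are assumptions, not a prescribed method.

```python
import pandas as pd

def detect_drift(baseline: pd.DataFrame, current: pd.DataFrame,
                 null_rate_tolerance: float = 0.05) -> list[str]:
    """Report schema and content drift between a baseline snapshot and a new batch."""
    issues = []

    # Schema drift: columns added or removed since the baseline was captured.
    added = set(current.columns) - set(baseline.columns)
    removed = set(baseline.columns) - set(current.columns)
    if added:
        issues.append(f"new columns: {sorted(added)}")
    if removed:
        issues.append(f"missing columns: {sorted(removed)}")

    # Content drift: null rates moving beyond the allowed tolerance.
    for col in set(baseline.columns) & set(current.columns):
        base_nulls = baseline[col].isna().mean()
        curr_nulls = current[col].isna().mean()
        if abs(curr_nulls - base_nulls) > null_rate_tolerance:
            issues.append(f"{col}: null rate {base_nulls:.1%} -> {curr_nulls:.1%}")

    return issues
```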

Traceability
Data Lineage

A full trace of data origins, transformations, and flows to understand impact and ensure trust in analytics outputs.

Monitoring
Data Observability

Continuous monitoring of freshness, volume, quality, and anomalies to catch issues before they reach business users.
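
A simple freshness-and-volume check illustrates the idea; the six-hour freshness limit and 20% volume tolerance below are hypothetical, and the load timestamp is assumed to be timezone-aware UTC.

```python
from datetime import datetime, timedelta, timezone

def check_freshness_and_volume(last_loaded_at: datetime, row_count: int,
                               expected_rows: int,
                               max_age: timedelta = timedelta(hours=6),
                               volume_tolerance: float = 0.2) -> list[str]:
    """Flag stale or unexpectedly sized loads before they reach business users."""
    alerts = []

    # Freshness: how long ago the table was last loaded.
    age = datetime.now(timezone.utc) - last_loaded_at
    if age > max_age:
        alerts.append(f"stale data: last load {age} ago (limit {max_age})")

    # Volume: deviation from the expected row count for this run.
    deviation = abs(row_count - expected_rows) / max(expected_rows, 1)
    if deviation > volume_tolerance:
        alerts.append(f"volume anomaly: {row_count} rows vs ~{expected_rows} expected")

    return alerts
```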

Core Concept
ETL (Extract, Transform, Load)

Moving data from sources, transforming it into the required structure, and loading it into a target system.
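
As a toy end-to-end example (the CSV source, column names, and SQLite target are placeholders, not a recommended stack), the three stages can be read directly from the code:

```python
import sqlite3
import pandas as pd

def run_etl(source_csv: str, target_db: str) -> None:
    # Extract: read raw records from the source file.
    raw = pd.read_csv(source_csv)

    # Transform: normalize column names and derive a business field
    # (quantity and unit_price are hypothetical source columns).
    raw.columns = [c.strip().lower() for c in raw.columns]
    raw["order_total"] = raw["quantity"] * raw["unit_price"]

    # Load: write the shaped data into the target table.
    with sqlite3.connect(target_db) as conn:
        raw.to_sql("orders", conn, if_exists="replace", index=False)
```

In ELT (next entry), the transform step would instead be pushed down into the target platform, typically as SQL run on its own compute engine.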

Core Concept
ELT (Extract, Load, Transform)

Loading raw data directly into a target platform and transforming it there using the platform’s compute engine.

Architecture
Data Pipeline

A sequence of automated steps that move and transform data from sources into analytics-ready destinations.

Data Quality
Data Quality Rules

Business and technical checks to validate completeness, accuracy, timeliness, and consistency of data.
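
A handful of such rules, sketched in Python against hypothetical columns (customer_id, amount, order_date, order_id):

```python
import pandas as pd

def apply_quality_rules(df: pd.DataFrame) -> dict[str, bool]:
    """Evaluate completeness, validity, timeliness, and consistency rules."""
    return {
        # Completeness: key identifiers must never be null.
        "customer_id_complete": bool(df["customer_id"].notna().all()),
        # Validity: amounts must be non-negative.
        "amount_non_negative": bool((df["amount"] >= 0).all()),
        # Timeliness: no records dated in the future.
        "no_future_dates": bool(
            (pd.to_datetime(df["order_date"]) <= pd.Timestamp.now()).all()
        ),
        # Consistency: business keys must be unique.
        "unique_order_ids": not bool(df["order_id"].duplicated().any()),
    }
```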

Governance
Control Framework (Data)

Structured controls that make data processes repeatable, auditable, and compliant with policies or regulations.

Regulation
Compliance Mapping

Linking ETL controls to regulatory and security frameworks (FFIEC, NIST, HIPAA, etc.) so audit evidence can be produced quickly.

Process
Change Management (ETL)

Formal processes that govern how ETL jobs are modified, reviewed, tested, and promoted.

Testing
Regression Impact Analysis

Evaluating downstream tables, jobs, and reports affected by an ETL change to scope the right regression tests.
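
The core of the analysis is a downstream traversal of the dependency graph; a minimal sketch, with a made-up graph shape and asset names:

```python
from collections import deque

def downstream_impact(dependencies: dict[str, list[str]], changed: str) -> set[str]:
    """Return every asset reachable downstream of a changed job or table.

    `dependencies` maps an asset to the assets that directly consume it,
    e.g. {"stg_orders": ["dim_orders"], "dim_orders": ["sales_report"]}.
    """
    impacted, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for consumer in dependencies.get(node, []):
            if consumer not in impacted:
                impacted.add(consumer)
                queue.append(consumer)
    return impacted

# Everything returned here would need regression coverage, e.g.:
# downstream_impact(deps, "stg_orders") -> {"dim_orders", "sales_report"}
```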

Productivity
Reusable Components (ETL Patterns)

Standardized, parameterized ETL templates that speed delivery and improve consistency.

Design
Metadata-Driven Design

Using configuration and metadata instead of hard-coded logic to drive how data pipelines behave.
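
A small sketch of the pattern, with hypothetical feed names and file paths: the pipeline's behavior is declared as configuration, and one generic code path interprets it.

```python
import pandas as pd

# Hypothetical feed configuration: behavior lives in metadata,
# so adding a new feed means editing config, not writing new code.
FEEDS = [
    {"name": "customers", "source": "customers.csv", "key": "customer_id",
     "target_table": "dim_customer"},
    {"name": "orders", "source": "orders.csv", "key": "order_id",
     "target_table": "fct_order"},
]

def run_feed(feed: dict) -> pd.DataFrame:
    """One generic code path, parameterized entirely by the feed's metadata."""
    df = pd.read_csv(feed["source"])
    df = df.drop_duplicates(subset=[feed["key"]])
    print(f"loading {len(df)} rows into {feed['target_table']}")
    return df
```

Each configured feed then runs through the same `run_feed` path, so onboarding a new source is a configuration change rather than new code.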

DevOps
Environment Parity

Ensuring development, test, and production environments align so behavior is predictable when promoting changes.

Architecture
Non-Functional Requirements (NFRs)

Performance, scalability, availability, observability, and security expectations for data pipelines.

Reliability
Exception Handling & Retry Logic

Patterns that capture, log, and gracefully recover from errors, including controlled retries and fallbacks.
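
A common shape for this pattern is a retry wrapper with exponential backoff; the attempt count and delays below are illustrative defaults, not a standard.

```python
import logging
import time

log = logging.getLogger("etl")

def run_with_retry(step, max_attempts: int = 3, base_delay: float = 2.0):
    """Run an ETL step, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return step()
        except Exception as exc:  # in practice, catch only transient error types
            log.warning("attempt %d/%d failed: %s", attempt, max_attempts, exc)
            if attempt == max_attempts:
                raise  # surface the error after the final attempt
            time.sleep(base_delay * 2 ** (attempt - 1))  # wait 2s, 4s, 8s, ...
```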

Operations
Job Orchestration

Coordinating execution order, dependencies, and scheduling of ETL jobs across workflows and environments.
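
At its simplest, orchestration is dependency-ordered execution. A sketch using Python's standard-library graphlib, with hypothetical job names; real orchestrators add scheduling, retries, and alerting on top.

```python
from graphlib import TopologicalSorter

# Hypothetical job graph: each job lists the jobs it depends on.
jobs = {
    "load_staging": set(),
    "build_dimensions": {"load_staging"},
    "build_facts": {"load_staging", "build_dimensions"},
    "refresh_reports": {"build_facts"},
}

# static_order() yields an execution order that respects every dependency.
for job in TopologicalSorter(jobs).static_order():
    print(f"running {job}")
```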

Operations
Runtime Monitoring

Tracking ETL job health in real time—run times, failures, error codes, and SLA adherence.

Standards
Golden Path Template

A reference implementation that demonstrates the ideal way to build a pipeline, used as a starting point for new work.

Testing
Shift-Left Testing (Data)

Moving quality checks earlier in development so issues are caught before deployment.
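
In practice this often means unit-testing transformation logic on small fixtures so CI catches defects before deployment; a pytest-style sketch with a hypothetical transformation:

```python
import pandas as pd

def add_order_total(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical transformation under test."""
    out = df.copy()
    out["order_total"] = out["quantity"] * out["unit_price"]
    return out

def test_add_order_total():
    # Tiny fixture exercised in CI, long before the pipeline touches real data.
    sample = pd.DataFrame({"quantity": [2, 3], "unit_price": [5.0, 1.5]})
    result = add_order_total(sample)
    assert list(result["order_total"]) == [10.0, 4.5]
```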

Code Analysis
Static Analysis (ETL)

Analyzing ETL mappings without executing them to detect bad practices, complexity, and potential failures.
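
A deliberately simplified sketch of the idea, pattern-matching ETL SQL for a few anti-patterns without executing it; production tools apply far richer, platform-aware rule sets, and these three rules are only examples.

```python
import re

# Hypothetical rule set: each rule pairs a pattern with a finding message.
RULES = [
    (re.compile(r"SELECT\s+\*", re.IGNORECASE), "avoid SELECT *; list needed columns"),
    (re.compile(r"CROSS\s+JOIN", re.IGNORECASE), "review CROSS JOIN for unintended fan-out"),
    (re.compile(r"DELETE\s+FROM\s+\w+\s*;", re.IGNORECASE), "DELETE without a WHERE clause"),
]

def analyze(sql_text: str) -> list[str]:
    """Inspect ETL SQL without running it and report rule violations."""
    return [message for pattern, message in RULES if pattern.search(sql_text)]

findings = analyze("SELECT * FROM orders CROSS JOIN customers;")
# -> ['avoid SELECT *; ...', 'review CROSS JOIN ...']
```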

AI & Intelligence
Semantic Code Analysis

Understanding ETL logic at the business-rule level—validating intent, joins, aggregations, and transformations.

Service Levels
SLA Compliance (Data Engineering)

Ensuring pipelines consistently meet timelines, performance metrics, and data quality thresholds.

Modernization
ETL Modernization

Transforming legacy ETL platforms into modern, cloud-native, automation-friendly architectures.

Performance
Pipeline Efficiency

How quickly, cost-effectively, and reliably a data pipeline processes workloads.

Governance
Quality Gates

Automated checkpoints that block non-compliant or low-quality ETL code from moving forward.
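
In a CI/CD pipeline, a gate is often just a script that fails the build when review findings exceed agreed thresholds; the severity categories and limits below are hypothetical.

```python
import sys

def quality_gate(findings: dict[str, int],
                 max_critical: int = 0, max_major: int = 5) -> None:
    """Block promotion when review findings exceed agreed thresholds."""
    failed = (findings.get("critical", 0) > max_critical
              or findings.get("major", 0) > max_major)
    if failed:
        print(f"Quality gate FAILED: {findings}")
        sys.exit(1)  # non-zero exit stops the CI/CD pipeline
    print(f"Quality gate passed: {findings}")

quality_gate({"critical": 0, "major": 2, "minor": 9})
```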

Maturity
Technical Uplift

Improving data engineering capabilities via better tools, patterns, training, and automation.

Business Rules
Transformation Logic

The business logic applied to raw data to create trusted metrics, dimensions, and facts.

DevOps
Version-Controlled Pipelines

ETL assets stored in Git with full change history, approvals, and rollback capabilities.

Cost & Performance
Workload Optimization

Reducing runtime and compute costs by tuning queries, partitioning, schedules, and resources.

Readiness
Production Readiness Checklist

Standardized functional, non-functional, and operational checks before a pipeline goes live.

Operations
Runbook / Playbook

Documented steps for handling incidents, failures, and routine operational tasks.

FinOps
Cloud Cost Governance (Data)

Practices that keep cloud data workloads cost-efficient, visible, and aligned with business value.