1. The Monolithic “Kitchen Sink” Job
One giant job extracts from many systems, transforms everything, and loads multiple targets. It is hard to understand and even harder to fix under pressure.
- Problem: A single failure can impact many domains and is difficult to troubleshoot.
- Better pattern: Break work into smaller, purpose-driven pipelines (staging, cleansing, business rules, loading).
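As a rough illustration of the better pattern, the sketch below splits a job into small, named stages that can be tested and rerun independently. The stage functions, the sample rows, and the "large order" threshold are hypothetical and stand in for real staging, cleansing, and business-rule steps.

```python
from typing import Any, Callable

Row = dict[str, Any]
Stage = Callable[[list[Row]], list[Row]]


def cleanse(rows: list[Row]) -> list[Row]:
    """Drop rows without an id and trim whitespace from names."""
    return [
        {**r, "name": str(r["name"]).strip()}
        for r in rows
        if r.get("id") is not None
    ]


def apply_business_rules(rows: list[Row]) -> list[Row]:
    """Hypothetical rule: flag orders of 1000 or more as large."""
    return [{**r, "large": r["amount"] >= 1000} for r in rows]


def run_pipeline(rows: list[Row], stages: list[Stage]) -> list[Row]:
    """Run stages in order and name the stage that failed."""
    for step in stages:
        try:
            rows = step(rows)
        except Exception as exc:
            raise RuntimeError(f"stage '{step.__name__}' failed") from exc
    return rows


# Hypothetical staged input
staged = [
    {"id": 1, "name": " Ada ", "amount": 1500},
    {"id": None, "name": "orphan", "amount": 10},
    {"id": 2, "name": "Grace", "amount": 200},
]
result = run_pipeline(staged, [cleanse, apply_business_rules])
```

Because each stage is a plain function, a failure points at one named step instead of somewhere inside a single giant job.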
2. Row-by-Row Processing
Loops, cursors, or step logic that processes one row at a time where set-based operations would be far more efficient.
- Problem: Poor performance and a design that does not scale to large volumes.
- Better pattern: Use set-based SQL or Spark logic, and push heavy work into the engine.
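The difference can be sketched with Python's built-in sqlite3 module. The orders table, its columns, and the tax rate are assumptions made for the example; the same contrast applies to any SQL engine or to Spark DataFrame operations.

```python
import sqlite3

TAX_RATE = 0.2  # hypothetical rate, for illustration only

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, amount_with_tax REAL)"
)
conn.executemany(
    "INSERT INTO orders (id, amount) VALUES (?, ?)",
    [(i, i * 10.0) for i in range(1, 1001)],
)


def update_row_by_row(conn: sqlite3.Connection) -> None:
    """Anti-pattern: fetch every row, then issue one UPDATE per row."""
    rows = conn.execute("SELECT id, amount FROM orders").fetchall()
    for order_id, amount in rows:
        conn.execute(
            "UPDATE orders SET amount_with_tax = ? WHERE id = ?",
            (amount * (1 + TAX_RATE), order_id),
        )


def update_set_based(conn: sqlite3.Connection) -> None:
    """Better: one statement; the engine processes the whole set."""
    conn.execute(
        "UPDATE orders SET amount_with_tax = amount * (1 + ?)", (TAX_RATE,)
    )


update_set_based(conn)
total = conn.execute("SELECT SUM(amount_with_tax) FROM orders").fetchone()[0]
```

Both functions produce the same values, but the set-based version issues one statement instead of a thousand, and the gap widens as volumes grow.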
3. Copy-Paste Pipelines
Entire jobs or mappings are duplicated with minor tweaks instead of being parameterized and reused.
- Problem: Fixing a bug or improving logic requires changes in many places.
- Better pattern: Introduce reusable components or templates and drive variation through configuration.
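One way to picture configuration-driven reuse is a single template function fed by a list of configs. This is a minimal sketch: the LoadConfig fields and the table names are hypothetical, and the config values are assumed to come from trusted, version-controlled settings rather than user input.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadConfig:
    """Everything that varies between otherwise identical load jobs."""
    source_table: str
    target_table: str
    key_column: str
    filter_clause: str = "1=1"


def build_load_sql(cfg: LoadConfig) -> str:
    """Single template; a fix made here reaches every pipeline that uses it."""
    # Identifiers are interpolated, so configs must be trusted (assumption).
    return (
        f"INSERT INTO {cfg.target_table} "
        f"SELECT * FROM {cfg.source_table} "
        f"WHERE {cfg.filter_clause} AND {cfg.key_column} IS NOT NULL"
    )


# Hypothetical configs: variation lives in data, not in duplicated jobs.
CONFIGS = [
    LoadConfig("stg_customers", "dw_customers", "customer_id"),
    LoadConfig("stg_orders", "dw_orders", "order_id", "status <> 'CANCELLED'"),
]
statements = [build_load_sql(c) for c in CONFIGS]
```

Adding a new feed then means adding one config entry rather than copying a job.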
4. Hidden Business Rules
Critical business rules are embedded in complex expressions or nested logic with no external documentation.
- Problem: Changes to business rules are risky because their full impact is unclear.
- Better pattern: Document rules, centralize them where possible, and align them with specifications.
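A centralized rule registry might look something like the sketch below, where each rule carries an identifier and a plain-language description next to its logic. The rule IDs, thresholds, and country list are invented for illustration and would map to entries in a real specification.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Rule:
    rule_id: str        # hypothetical ID that would match the specification
    description: str    # plain-language statement of the rule
    predicate: Callable[[dict[str, Any]], bool]


# All rules live in one place instead of being buried in nested expressions.
RULES = [
    Rule("BR-001", "Order is high value when amount >= 1000",
         lambda row: row["amount"] >= 1000),
    Rule("BR-002", "Order ships within the EU (illustrative country list)",
         lambda row: row["country"] in {"DE", "FR", "NL"}),
]


def evaluate(row: dict[str, Any]) -> dict[str, bool]:
    """Return the outcome of every registered rule for one row."""
    return {rule.rule_id: rule.predicate(row) for rule in RULES}


outcome = evaluate({"amount": 1500, "country": "DE"})
```

Keeping the description beside the predicate makes it easier to see which rules a change touches.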
5. Using ETL for Everything
Using ETL tools for tasks better suited to apps, services, or orchestration engines (e.g., business workflows in ETL).
- Problem: Pipelines become a tangle of control flow, with unclear ownership and behavior.
- Better pattern: Keep ETL focused on data movement and transformation; orchestrate processes with specialized tools.
6. No Housekeeping
Pipelines create large staging tables, temp files, or logs that are never cleaned up.
- Problem: Storage growth, slower jobs, and operational surprises.
- Better pattern: Implement explicit cleanup and archival patterns, and monitor growth.
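An explicit cleanup step can be as small as a retention-based purge. The sketch below assumes staging files live in a single directory and that a seven-day retention window is acceptable; the function name and file names are hypothetical.

```python
import os
import tempfile
import time
from pathlib import Path
from typing import Optional

SECONDS_PER_DAY = 86400


def purge_old_files(
    directory: Path, max_age_days: float, now: Optional[float] = None
) -> list[Path]:
    """Delete files whose modification time is older than the retention window."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    removed = []
    for path in directory.iterdir():
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path)
    return removed


# Demo in a throwaway directory: one stale file, one fresh file.
with tempfile.TemporaryDirectory() as tmp:
    staging = Path(tmp)
    old_file = staging / "extract_old.csv"
    new_file = staging / "extract_new.csv"
    old_file.write_text("stale")
    new_file.write_text("fresh")
    ten_days_ago = time.time() - 10 * SECONDS_PER_DAY
    os.utime(old_file, (ten_days_ago, ten_days_ago))  # backdate the stale file

    removed = purge_old_files(staging, max_age_days=7)
    remaining = sorted(p.name for p in staging.iterdir())
```

Returning the list of removed paths gives the job something to log, which also supports monitoring growth over time.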
7. “It Works on My Machine” Configurations
Hard-coded paths, credentials, or environment settings that only work in development.
- Problem: Deployment friction and brittle jobs when moving across environments.
- Better pattern: Externalize configuration via parameters, config files, or environment-specific settings.
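Externalized configuration could be sketched as follows, reading settings from environment variables and failing fast when one is missing. The variable names (ETL_INPUT_DIR, ETL_DB_HOST, ETL_DB_PASSWORD) and the sample values are assumptions chosen for the example.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Hypothetical variable names; each environment supplies its own values.
REQUIRED_KEYS = ("ETL_INPUT_DIR", "ETL_DB_HOST", "ETL_DB_PASSWORD")


@dataclass(frozen=True)
class PipelineSettings:
    input_dir: str
    db_host: str
    db_password: str


def load_settings(env: Mapping[str, str] = os.environ) -> PipelineSettings:
    """Build settings from the environment; fail fast if anything is missing."""
    missing = [key for key in REQUIRED_KEYS if key not in env]
    if missing:
        raise RuntimeError(f"missing configuration: {', '.join(missing)}")
    return PipelineSettings(
        input_dir=env["ETL_INPUT_DIR"],
        db_host=env["ETL_DB_HOST"],
        db_password=env["ETL_DB_PASSWORD"],
    )


# A plain dict stands in for a real environment in this demo.
settings = load_settings({
    "ETL_INPUT_DIR": "/data/incoming",
    "ETL_DB_HOST": "db.test.internal",
    "ETL_DB_PASSWORD": "example-only",
})
```

The pipeline code stays identical across development, test, and production; only the injected values differ.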
Anti-patterns are much easier to spot across hundreds of pipelines when you use automated static analysis. Platforms like Undraleu highlight recurring design issues so teams can fix root causes—not just individual defects.