Database Source Generator
<!--
Work info
-->
Company:
VAE, Inc.
Role:
Software Engineer
Year:
2024-2025

Project Overview
This project focused on modernizing a legacy PostgreSQL data-access layer to be compatible with source-generated database commands, with an emphasis on correctness, nullability safety, and long-term maintainability.
While working in the codebase, I repeatedly noticed a pattern where raw SQL was embedded directly inside C# query classes, tightly coupling query logic, execution, and mapping. Although this approach worked at runtime, it made queries difficult to reason about, harder to validate, and increasingly brittle as the system grew.
I raised this concern to my manager, arguing that the pattern did not scale and that there should be a clearer separation between SQL, models, and execution. We aligned that a better approach was needed, and source generation emerged as the direction forward.
Although I did not build the source generator itself from scratch, I became the first on-the-ground implementer and primary adopter in the production codebase—responsible for migrating legacy queries, identifying gaps in the generator’s behavior, and providing concrete feedback based on real usage.
Background: Why Source Generation Was Needed
Source generators allow database commands to be generated at compile time based on method signatures, parameters, and SQL definitions. When used correctly, they provide:
compile-time validation of parameters
strong typing and nullability guarantees
elimination of ad-hoc runtime mapping
clearer contracts between SQL and application code
In this system, source generation was desirable because the existing pattern had several drawbacks:
SQL embedded in C# obscured query intent and structure
Runtime validation hid correctness issues until late stages
Nullability assumptions differed between SQL and C#
Conventions existed, but were not mechanically enforced
Without a disciplined architecture, SQL-in-code made it easy for subtle issues to accumulate unnoticed.
Problem Statement
The core problem was not simply legacy code, but a pattern that encouraged ambiguity.
Specifically:
SQL embedded in C# made queries harder to audit and validate
Legacy query code compiled successfully while violating generator requirements
Nullability mismatches surfaced late via integration tests
Developers had no clear, repeatable way to implement new queries correctly
Runtime constructs such as
DBNullblocked source generation entirely
As more queries were added, the cost of maintaining this pattern increased, and attempts to adopt source generation surfaced widespread inconsistencies.
Approach & Constraints
Several constraints shaped the solution:
Queries needed to satisfy the constraints of source-generated database commands in a PostgreSQL/Npgsql-based data-access layer
SQL needed to live outside C# code to be reasoned about independently
DBNullcould not be used, as it prevented command generationNullability had to be correct end-to-end (SQL → C# → generator output)
Some domain models matched database schemas exactly; others did not
Existing queries had to be migrated incrementally without breaking consumers
Feedback from real usage needed to flow back into generator design
Given these constraints, the guiding principle was to make correctness explicit and structural, rather than implicit and convention-based.
Key Decisions
Separate SQL from C# and make structure explicit
The first step was enforcing a clean separation:
SQL lived in dedicated
.sqlfilesC# classes defined clear models and method contracts
Source-generated commands acted as the bridge
This made SQL easier to inspect, test, and validate independently, while giving the generator a predictable structure to operate on.
Introduce explicit query patterns based on model parity
I formalized two explicit patterns for implementing queries:
Simplified pattern
Used when the domain model matched the database schema exactly. A single record type served as both the database model and return type, enabling direct use in generated commands.
Verbose pattern
Used when the domain model diverged from the database schema. This pattern introduced a private database-specific record and an explicit mapping step.
This made divergence visible and intentional, rather than accidental.
Eliminate runtime null handling (DBNull)
Legacy queries relied on DBNull to represent missing values. While functional at runtime, this blocked command generation and obscured intent.
I refactored parameters and models to rely on nullable types, aligning SQL definitions, C# signatures, and generator expectations. This required careful coordination across layers and surfaced several previously hidden assumptions.
Treat generator friction as feedback, not failure
As the first real user of the source generator in the codebase, I repeatedly encountered missing functionality, edge cases, and integration issues that only surfaced under real-world usage.
Examples included:
generator failures caused by materialized view behavior in pipelines
nullability mismatches that were not obvious from legacy code
unsupported patterns that required architectural adjustment
Rather than working around these issues, I treated them as feedback loops, adjusting query structure, refining patterns, and communicating gaps so the generator could be used reliably at scale.
Tradeoffs & Risks
This approach introduced upfront rigidity:
More structure than embedding SQL directly in C#
Additional boilerplate in cases where models diverged
Slower initial migration as legacy assumptions were uncovered
These tradeoffs were accepted because they replaced fragile runtime behavior with compile-time guarantees and made future query work safer and more predictable.
Results and Impact
This work transformed the PostgreSQL data-access layer from a collection of hand-written, runtime-driven queries into a structured, source-generator-compatible system with explicit correctness guarantees.
Key outcomes included:
Migrated and standardized 75+ production queries, replacing inline SQL and bespoke execution logic with typed records and source-generated commands
Eliminated 100% of custom result-mapping logic for migrated queries, reducing duplication and long-term maintenance overhead
Resolved entire classes of nullability-related integration test failures by enforcing model–schema alignment at design time rather than runtime
Established two explicit query patterns (simplified and verbose) that became the default approach for new and refactored query work
Introduced SQL view–based validation workflows (schema and nullability checks), enabling verification of model correctness without relying on production data
Extended CI pipelines with 3+ database-focused validation checks, blocking schema-invalid SQL and analyzer regressions during merge requests
Surfaced generator limitations and edge cases through real production usage, directly informing refinements and unblocking broader adoption
As a result, source generation became viable across the codebase, correctness failures shifted left into compile time, and new query work became harder to implement incorrectly by default.
Reflections and Takeaways
Looking back, earlier validation tooling could have reduced iteration cycles, but being embedded directly in the codebase surfaced real constraints that would not have appeared in abstract design.
Key takeaways:
Real adoption exposes issues design alone cannot
Being the first implementer carries architectural responsibility
Explicit structure scales better than flexible conventions
Generator constraints are design tools, not limitations
What I Intentionally Did Not Do
I avoided forcing all queries into a single pattern or abstracting away meaningful differences between domain and database models. Preserving this distinction kept the architecture honest and prevented accidental complexity.
Results
Production queries migrated and standardized
Elimination of custom result-mapping logic
Explicit query patterns established




