This guide provides comprehensive instructions for adding new T-SQL system functions to the SqlScriptDOM parser, incorporating lessons learned from fixing JSON function parsing in RETURN statements.
Adding a new system function involves three main components:
- AST Definition (
Ast.xml) - Define the abstract syntax tree node structure - Grammar Rules (
.gfiles) - Define parsing logic for the function syntax - Script Generator - Handle conversion from AST back to T-SQL text
- Testing - Ensure functionality works correctly across all contexts
Critical Requirement: New system functions must be parseable in ALTER FUNCTION RETURN statements. This requires special handling due to ANTLR v2's limitation with semantic predicates during syntactic predicate lookahead.
The returnStatement grammar rule uses a syntactic predicate for lookahead:
returnStatement: Return ((expression) => expression)? semicolonOptDuring lookahead, ANTLR cannot evaluate semantic predicates (which check runtime values like vResult.FunctionName.Value). This causes new functions to fail parsing in RETURN contexts even if they work elsewhere.
This section explains the fundamental issue we encountered with JSON functions and why it affects any new system function.
SELECT JSON_ARRAY('name'); -- ✅ Always workedGrammar Path: selectStatement → selectElementsList → selectElement → expression → expressionPrimary → builtInFunctionCall
Why it works: No syntactic predicates in the path - parser can evaluate semantic predicates normally during parsing.
RETURN JSON_ARRAY('name'); -- ❌ Failed before our fixGrammar Path: returnStatement uses syntactic predicate ((expression) => for lookahead
Why it failed:
- Parser encounters
RETURN JSON_ARRAY(...) - Syntactic predicate triggers lookahead to check if
JSON_ARRAY(...)is a valid expression - During lookahead:
expression→expressionPrimary→builtInFunctionCall builtInFunctionCallhas semantic predicate:{(vResult.FunctionName.Value == "JSON_ARRAY")}?- ANTLR v2 limitation: Cannot evaluate
vResult.FunctionName.Valueduring lookahead (object doesn't exist yet) - Lookahead fails → parser assumes not an expression → syntax error
The Solution: Add token-based syntactic predicates in expressionPrimary that work during lookahead:
{NextTokenMatches(CodeGenerationSupporter.JsonArray) && (LA(2) == LeftParenthesis)}?
vResult=jsonArrayCallThis is why every new system function must include the syntactic predicate pattern to work in RETURN statements.
Define the function's AST node structure:
<!-- Add to appropriate location in Ast.xml -->
<Class Name="YourNewFunctionCall" Base="PrimaryExpression">
<Member Name="Parameter1" Type="ScalarExpression" Summary="First parameter of the function" />
<Member Name="Parameter2" Type="StringLiteral" Summary="Second parameter (if string literal only)" />
<!-- Use ScalarExpression for dynamic values, specific types for literals only -->
</Class>Best Practice: Use ScalarExpression for parameters that should support:
- Literals (
'value',123) - Parameters (
@param) - Variables (
@variable) - Column references (
table.column) - Computed expressions (
value + 1)
Use specific literal types only when the SQL syntax strictly requires literals.
Add to the appropriate grammar files (typically TSql160.g, TSql170.g, TSqlFabricDW.g):
yourNewFunctionCall returns [YourNewFunctionCall vResult = FragmentFactory.CreateFragment<YourNewFunctionCall>()]
{
ScalarExpression vParam1;
StringLiteral vParam2;
}
:
tFunction:Identifier LeftParenthesis
{
Match(tFunction, CodeGenerationSupporter.YourFunctionName);
UpdateTokenInfo(vResult, tFunction);
}
vParam1 = expression
{
vResult.Parameter1 = vParam1;
}
(Comma vParam2 = stringLiteral
{
vResult.Parameter2 = vParam2;
})?
RightParenthesis
;Add to expressionPrimary rule before the generic (Identifier LeftParenthesis) predicate:
expressionPrimary returns [PrimaryExpression vResult]
// ... existing rules ...
// Add BEFORE the generic identifier predicate
| {NextTokenMatches(CodeGenerationSupporter.YourFunctionName) && (LA(2) == LeftParenthesis)}?
vResult=yourNewFunctionCall
// ... rest of existing rules including the generic identifier case ...
| (Identifier LeftParenthesis) => vResult=builtInFunctionCallWhy This is Required:
- The syntactic predicate uses
NextTokenMatches()which works during lookahead - It must come before the generic
builtInFunctionCallpredicate - This enables the function to be recognized in RETURN statements
If your function should also be recognized through the general built-in function mechanism, add it to builtInFunctionCall:
builtInFunctionCall returns [FunctionCall vResult = FragmentFactory.CreateFragment<FunctionCall>()]
// ... existing cases ...
| {(vResult.FunctionName.Value == "YOUR_FUNCTION_NAME")}?
vResult=yourNewFunctionCallAdd the function name constant to CodeGenerationSupporter.cs:
public const string YourFunctionName = "YOUR_FUNCTION_NAME";Add visitor method to handle AST-to-script conversion in the appropriate script generator file:
public override void ExplicitVisit(YourNewFunctionCall node)
{
GenerateIdentifier(CodeGenerationSupporter.YourFunctionName);
GenerateSymbol(TSqlTokenType.LeftParenthesis);
if (node.Parameter1 != null)
{
GenerateFragmentIfNotNull(node.Parameter1);
if (node.Parameter2 != null)
{
GenerateSymbol(TSqlTokenType.Comma);
GenerateSpace();
GenerateFragmentIfNotNull(node.Parameter2);
}
}
GenerateSymbol(TSqlTokenType.RightParenthesis);
}Add your function to the appropriate place in the grammar hierarchy:
// Add to function call expressions
functionCall returns [FunctionCall vResult]
: vResult=yourNewFunctionCall
| // ... other function types
;
// Or add to primary expressions if it's a primary expression type
primaryExpression returns [PrimaryExpression vResult]
: vResult=yourNewFunctionCall
| // ... other primary expressions
;dotnet build SqlScriptDom/Microsoft.SqlServer.TransactSql.ScriptDom.csproj -c DebugThis will regenerate parser files from the grammar.
YOU MUST ADD UNIT TESTS - DO NOT CREATE STANDALONE PROGRAMS TO TEST
Create test script in Test/SqlDom/TestScripts/YourFunctionTests160.sql:
-- Test basic function call
SELECT YOUR_FUNCTION_NAME('param1', 'param2');
-- CRITICAL: Test in ALTER FUNCTION RETURN statement
ALTER FUNCTION TestYourFunction()
RETURNS NVARCHAR(MAX)
AS
BEGIN
RETURN (YOUR_FUNCTION_NAME('value1', 'value2'));
END;
GO- Create placeholder baseline file:
Test/SqlDom/Baselines160/YourFunctionTests160.sql - Run the test (it will fail)
- Copy the "Actual" output from the test failure
- Update the baseline file with the correctly formatted output
Add test entry to Test/SqlDom/Only160SyntaxTests.cs:
new ParserTest160("YourFunctionTests160.sql", nErrors80: 1, nErrors90: 1, nErrors100: 1, nErrors110: 1, nErrors120: 1, nErrors130: 1, nErrors140: 1, nErrors150: 1),Adjust error counts based on which SQL versions should support your function.
dotnet test Test/SqlDom/UTSqlScriptDom.csproj -c DebugEnsure all tests pass, including existing ones (no regressions).
This guide incorporates lessons learned from fixing JSON_OBJECT and JSON_ARRAY parsing in RETURN statements:
-- This worked fine:
SELECT JSON_ARRAY('name'); -- ✅ Always worked
-- This failed before the fix:
ALTER FUNCTION GetAuth() RETURNS NVARCHAR(MAX) AS BEGIN
RETURN (JSON_OBJECT('key': 'value')); -- ❌ Parse error here
END;SELECT Statement Context (Always Worked):
SELECT JSON_ARRAY('name');In a SELECT statement, the parser follows this path:
selectStatement→queryExpression→querySpecificationselectElementsList→selectElement→expressionexpression→expressionPrimary→builtInFunctionCall- ✅ No syntactic predicate blocking the path
RETURN Statement Context (Previously Failed):
RETURN JSON_ARRAY('name');In a RETURN statement, the parser follows this path:
returnStatementuses a syntactic predicate:((expression) =>- During lookahead, parser tries:
expression→expressionPrimary→builtInFunctionCall builtInFunctionCallhas a semantic predicate:{(vResult.FunctionName.Value == "JSON_ARRAY")}?- ❌ ANTLR v2 limitation: Semantic predicates cannot be evaluated during syntactic predicate lookahead
- ❌ Lookahead fails → parser doesn't recognize
JSON_ARRAYas valid expression
The semantic predicate {(vResult.FunctionName.Value == "JSON_OBJECT")}? in builtInFunctionCall could not be evaluated during the syntactic predicate lookahead in returnStatement.
Added syntactic predicates in expressionPrimary:
// Added before generic identifier predicate
| {NextTokenMatches(CodeGenerationSupporter.JsonObject) && (LA(2) == LeftParenthesis)}?
vResult=jsonObjectCall
| {NextTokenMatches(CodeGenerationSupporter.JsonArray) && (LA(2) == LeftParenthesis)}?
vResult=jsonArrayCallThis uses token-based checking (NextTokenMatches) which works during lookahead, unlike semantic predicates.
For SQL Server 2022+ functions, typically modify:
SqlScriptDom/Parser/TSql/TSql160.g(SQL Server 2022)SqlScriptDom/Parser/TSql/TSql170.g(SQL Server 2025)SqlScriptDom/Parser/TSql/TSqlFabricDW.g(Azure Synapse)
For earlier versions, add to appropriate grammar files (TSql150.g, TSql140.g, etc.).
- Forgetting RETURN Statement Support: Always add syntactic predicates to
expressionPrimary - Wrong Predicate Order: Syntactic predicates must come before generic predicates
- Semantic Predicates in Lookahead: Don't rely on semantic predicates in contexts with syntactic predicate lookahead
- Missing Script Generator: Every AST node needs a corresponding script generation visitor
- Incomplete Testing: Test both standalone function calls and RETURN statement usage
- Version Compatibility: Consider which SQL versions should support your function
- Function parses in SELECT statements
- Function parses in WHERE clauses
- Function parses in ALTER FUNCTION RETURN statements
- Function parses with literal parameters
- Function parses with variable parameters
- Function parses with computed expressions as parameters
- Script generation produces correct T-SQL output
- Round-trip parsing (parse → generate → parse) works
- No regressions in existing tests
- Appropriate error handling for invalid syntax
- Syntactic Predicates: Can check token types during lookahead (
LA(),NextTokenMatches()) - Semantic Predicates: Check runtime values, but fail during lookahead in syntactic predicates
- RETURN Statement Context: Uses syntactic predicate
((expression) =>which triggers lookahead
returnStatement
└── expression
└── expressionPrimary
├── yourNewFunctionCall (syntactic predicate)
└── builtInFunctionCall (semantic predicate)
By adding syntactic predicates to expressionPrimary, we catch function calls before they reach the problematic semantic predicate in builtInFunctionCall.
Following this guide ensures new system functions work correctly in all T-SQL contexts, especially the challenging RETURN statement scenario. The key insight is that ANTLR v2's limitations require careful predicate ordering and the use of token-based syntactic predicates for functions that need to work in lookahead contexts.