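1. Overview
The previous article, "Lexical Analysis", showed how the Lexer tokenizes SQL. This article shows how the SQL parsing engine parses and understands SQL. Because it builds on "Lexical Analysis", read that article first.
The Parser has two components:
- SQLParsingEngine: the SQL parsing engine
- SQLParser: the SQL parser
SQLParsingEngine calls SQLParserFactory to create a SQLParser, and the SQLParser calls LexerEngine (which wraps a Lexer) to tokenize the SQL.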
2. SQLParsingEngine
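SQLParsingEngine is the SQL parsing engine. Its parse() method is the entry point for SQL parsing. It contains no complex logic of its own; it delegates to the SQLParser that matches the SQL statement.
The core code is as follows: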
// SQLParsingEngine.java
public SQLStatement parse() {
LexerEngine lexerEngine = LexerEngineFactory.newInstance(dbType, sql);
lexerEngine.nextToken();
return SQLParserFactory.newInstance(dbType, lexerEngine.getCurrentToken().getType(), shardingRule, lexerEngine).parse();
}
@NoArgsConstructor(access = AccessLevel.PRIVATE)
public final class LexerEngineFactory {
/**
* Create lexical analysis engine instance.
*
* @param dbType database type
* @param sql SQL
* @return lexical analysis engine instance
*/
public static LexerEngine newInstance(final DatabaseType dbType, final String sql) {
switch (dbType) {
case H2:
case MySQL:
return new LexerEngine(new MySQLLexer(sql));
case Oracle:
return new LexerEngine(new OracleLexer(sql));
case SQLServer:
return new LexerEngine(new SQLServerLexer(sql));
case PostgreSQL:
return new LexerEngine(new PostgreSQLLexer(sql));
default:
throw new UnsupportedOperationException(String.format("Cannot support database [%s].", dbType));
}
}
}
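The main flow is:
- Create the Lexer that matches the db type and the SQL statement, and pass it to the LexerEngine constructor. The db types currently supported are H2, MySQL, Oracle, SQLServer, and PostgreSQL.
- Call lexerEngine.nextToken() to produce the first Token. For a query, the first Token's literal is "select" and its type is DefaultKeyword#SELECT.
- Use the type of the first Token together with the db type to obtain the matching SQLParser, for example MySQLSelectParser.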
// SQLParserFactory.java
public static SQLParser newInstance(final DatabaseType dbType, final TokenType tokenType, final ShardingRule shardingRule, final LexerEngine lexerEngine) {
if (!(tokenType instanceof DefaultKeyword)) {
throw new SQLParsingUnsupportedException(tokenType);
}
switch ((DefaultKeyword) tokenType) {
case SELECT:
return SelectParserFactory.newInstance(dbType, shardingRule, lexerEngine);
case INSERT:
return InsertParserFactory.newInstance(dbType, shardingRule, lexerEngine);
case UPDATE:
return UpdateParserFactory.newInstance(dbType, shardingRule, lexerEngine);
case DELETE:
return DeleteParserFactory.newInstance(dbType, shardingRule, lexerEngine);
case CREATE:
return CreateParserFactory.newInstance(dbType, shardingRule, lexerEngine);
case ALTER:
return AlterParserFactory.newInstance(dbType, shardingRule, lexerEngine);
case DROP:
return DropParserFactory.newInstance(dbType, shardingRule, lexerEngine);
case TRUNCATE:
return TruncateParserFactory.newInstance(dbType, shardingRule, lexerEngine);
case SET:
case COMMIT:
case ROLLBACK:
case SAVEPOINT:
case BEGIN:
return TCLParserFactory.newInstance(dbType, shardingRule, lexerEngine);
default:
throw new SQLParsingUnsupportedException(lexerEngine.getCurrentToken().getType());
}
}
// SelectParserFactory.java
public static AbstractSelectParser newInstance(final DatabaseType dbType, final ShardingRule shardingRule, final LexerEngine lexerEngine) {
switch (dbType) {
case H2:
case MySQL:
return new MySQLSelectParser(shardingRule, lexerEngine);
case Oracle:
return new OracleSelectParser(shardingRule, lexerEngine);
case SQLServer:
return new SQLServerSelectParser(shardingRule, lexerEngine);
case PostgreSQL:
return new PostgreSQLSelectParser(shardingRule, lexerEngine);
default:
throw new UnsupportedOperationException(String.format("Cannot support database [%s].", dbType));
}
}
Finally, SQLParser#parse is called to parse the SQL. Below, we use a MySQL query as the example and walk through its parsing flow.
3. Parsing flow for a query SQL (MySQL)
The main flow for parsing a query is:

// AbstractSelectParser.java
public final SelectStatement parse() {
SelectStatement result = parseInternal();
if (result.containsSubQuery()) {
result = result.mergeSubQueryStatement();
}
// TODO move to rewrite
appendDerivedColumns(result);
appendDerivedOrderBy(result);
return result;
}
3.1 SelectStatement
SelectStatement is the result object produced by parsing a query statement.
public final class SelectStatement extends DQLStatement {
// Whether the select list contains "*"
private boolean containStar;
// Start position of the Token that follows the last select item
private int selectListLastPosition;
// Start position of the Token that follows the last group-by item
private int groupByLastPosition;
// Select items
private final Set<SelectItem> items = new HashSet<>();
// Group-by items
private final List<OrderItem> groupByItems = new LinkedList<>();
// Order-by items
private final List<OrderItem> orderByItems = new LinkedList<>();
// Paging information
private Limit limit;
}
3.2 AbstractSQLStatement
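The abstract parent class of the result objects for insert, delete, update, and query statements.
public abstract class AbstractSQLStatement implements SQLStatement {
// SQL type
private final SQLType type;
// Table names
private final Tables tables = new Tables();
// Filter conditions. Only conditions that affect the routing result are added
private final Conditions conditions = new Conditions();
// SQL marker objects
private final List<SQLToken> sqlTokens = new LinkedList<>();
}
Note that the conditions property holds only the conditions that affect the routing result, that is, the filter conditions on sharding keys.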
3.3 SQLToken
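SQLToken is the interface for SQL marker objects. It records the start position of the marked element. Its implementations are:
| Class | Description |
|---|---|
| GeneratedKeyToken | Marker for an auto-increment primary key |
| TableToken | Marker for a table |
| ItemsToken | Marker for select items |
| OffsetToken | Marker for the paging offset |
| OrderByToken | Marker for ordering |
| RowCountToken | Marker for the paging row count |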
3.4 Walking through the parsing flow
Using a MySQL query as the example, let's go straight to the source of AbstractSelectParser#parseInternal():
// AbstractSelectParser.java
private SelectStatement parseInternal() {
SelectStatement result = new SelectStatement();
lexerEngine.nextToken();
parseInternal(result);
return result;
}
// MySQLSelectParser.java
@Override
protected void parseInternal(final SelectStatement selectStatement) {
parseDistinct();
parseSelectOption();
parseSelectList(selectStatement, getItems());
parseFrom(selectStatement);
parseWhere(getShardingRule(), selectStatement, getItems());
parseGroupBy(selectStatement);
parseHaving();
parseOrderBy(selectStatement);
parseLimit(selectStatement);
parseSelectRest();
}
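This method calls lexerEngine to tokenize the SQL and produces the SelectStatement object.
One point deserves attention: the SQLParser does not wait for the Lexer to tokenize the whole statement and then interpret the Tokens. Instead, it calls the Lexer to fetch Tokens on demand while it interprets the SQL.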
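To make the pull-based interaction concrete, here is a minimal sketch. It is not Sharding-JDBC code: `TinyLexer`, `PullParserSketch`, and the `tokensProduced` counter are hypothetical names invented for illustration, and the tokenizer only understands words and single-character symbols. The parser asks for one token at a time and stops as soon as it has what it needs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a lexer: lexes exactly one token per nextToken() call.
final class TinyLexer {
    private final String sql;
    private int pos;
    private String current = "";
    int tokensProduced; // how many tokens were actually lexed so far

    TinyLexer(final String sql) {
        this.sql = sql;
    }

    void nextToken() {
        while (pos < sql.length() && Character.isWhitespace(sql.charAt(pos))) {
            pos++;
        }
        if (pos >= sql.length()) {
            current = ""; // end of input, nothing more to lex
            return;
        }
        int start = pos;
        if (isWordChar(sql.charAt(pos))) {
            while (pos < sql.length() && isWordChar(sql.charAt(pos))) {
                pos++;
            }
        } else {
            pos++; // single-character symbol such as ',' or '*'
        }
        current = sql.substring(start, pos);
        tokensProduced++;
    }

    private static boolean isWordChar(final char c) {
        return Character.isLetterOrDigit(c) || c == '_';
    }

    String getCurrent() {
        return current;
    }
}

final class PullParserSketch {
    // Reads only the select list; no token after FROM is ever requested.
    static List<String> parseSelectList(final TinyLexer lexer) {
        List<String> items = new ArrayList<>();
        lexer.nextToken(); // the SELECT keyword
        lexer.nextToken(); // first select item
        while (!lexer.getCurrent().isEmpty() && !"FROM".equalsIgnoreCase(lexer.getCurrent())) {
            if (!",".equals(lexer.getCurrent())) {
                items.add(lexer.getCurrent());
            }
            lexer.nextToken();
        }
        return items;
    }
}
```

For `SELECT a, b FROM t WHERE id = 1`, this sketch stops after lexing five of the ten tokens, because nothing past FROM is requested until a later clause parser asks for it.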
3.4.1 #parseDistinct()
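Handles the DISTINCT and DISTINCTROW predicates. In the code shown here, ALL is skipped, while DISTINCT and its synonyms are passed to unsupportedIfEqual and therefore rejected as unsupported.
The core code is DistinctClauseParser#parse: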
/**
* Parse distinct.
*/
public final void parse() {
lexerEngine.skipAll(DefaultKeyword.ALL);
Collection<Keyword> distinctKeywords = new LinkedList<>();
distinctKeywords.add(DefaultKeyword.DISTINCT);
distinctKeywords.addAll(Arrays.asList(getSynonymousKeywordsForDistinct()));
lexerEngine.unsupportedIfEqual(distinctKeywords.toArray(new Keyword[distinctKeywords.size()]));
}
public class MySQLDistinctClauseParser extends DistinctClauseParser {
public MySQLDistinctClauseParser(final LexerEngine lexerEngine) {
super(lexerEngine);
}
@Override
protected Keyword[] getSynonymousKeywordsForDistinct() {
return new Keyword[] {MySQLKeyword.DISTINCTROW};
}
}
DISTINCT here differs from DISTINCT(column): it de-duplicates the query result, so a row is dropped only when the whole row repeats. For example:
mysql> SELECT item_id, order_id FROM t_order_item;
+---------+----------+
| item_id | order_id |
+---------+----------+
| 1 | 1 |
| 1 | 1 |
+---------+----------+
2 rows in set (0.03 sec)
mysql> SELECT DISTINCT item_id, order_id FROM t_order_item;
+---------+----------+
| item_id | order_id |
+---------+----------+
| 1 | 1 |
+---------+----------+
1 rows in set (0.02 sec)
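The whole-row semantics can be mimicked in memory. The following is only an illustrative sketch with a hypothetical `DistinctRowSketch` class; it says nothing about how MySQL or Sharding-JDBC implement DISTINCT. Each row is a list of column values, and two rows count as duplicates only when every column matches.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

final class DistinctRowSketch {
    // Removes duplicate rows while keeping first-seen order.
    // List.equals compares element by element, so equality is row-wide.
    static List<List<Integer>> distinctRows(final List<List<Integer>> rows) {
        return new ArrayList<>(new LinkedHashSet<>(rows));
    }
}
```

Rows (1, 1) and (1, 2) share an item_id but both survive, because the comparison covers the entire row.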
3.4.2 #parseSelectList()
Splits the SQL select list on commas ( , ) into multiple select items ( SelectItem ). The core code is SelectListClauseParser#parse:
public void parse(final SelectStatement selectStatement, final List<SelectItem> items) {
do {
selectStatement.getItems().add(parseSelectItem(selectStatement));
} while (lexerEngine.skipIfEqual(Symbol.COMMA));
selectStatement.setSelectListLastPosition(lexerEngine.getCurrentToken().getEndPosition() - lexerEngine.getCurrentToken().getLiterals().length());
items.addAll(selectStatement.getItems());
}
private SelectItem parseSelectItem(final SelectStatement selectStatement) {
lexerEngine.skipIfEqual(getSkippedKeywordsBeforeSelectItem());
SelectItem result;
if (isRowNumberSelectItem()) {
// ROW_NUMBER keyword (SQLServer only)
result = parseRowNumberSelectItem(selectStatement);
} else if (isStarSelectItem()) {
// Select-all item "*"
selectStatement.setContainStar(true);
result = parseStarSelectItem();
} else if (isAggregationSelectItem()) {
// Aggregate function item, such as SUM or AVG
result = parseAggregationSelectItem(selectStatement);
parseRestSelectItem(selectStatement);
} else {
// Ordinary select item
result = new CommonSelectItem(SQLUtil.getExactlyValue(parseCommonSelectItem(selectStatement) + parseRestSelectItem(selectStatement)), aliasExpressionParser.parseSelectItemAlias());
}
return result;
}
This method parses the select items that follow the select literal and stores them in SelectStatement#items.
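The do/while loop driven by skipIfEqual(Symbol.COMMA) is the heart of the method. Here is a stripped-down sketch of that control flow over a pre-split token array. `SelectListSketch` and its string tokens are assumptions for illustration; the real parser works on Token objects and handles aliases, aggregates, and expressions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class SelectListSketch {
    private final List<String> tokens;
    private int index;

    SelectListSketch(final String... tokens) {
        this.tokens = Arrays.asList(tokens);
    }

    private String current() {
        return index < tokens.size() ? tokens.get(index) : "";
    }

    // Same idea as skipIfEqual: consume the current token only when it matches.
    private boolean skipIfEqual(final String expected) {
        if (expected.equalsIgnoreCase(current())) {
            index++;
            return true;
        }
        return false;
    }

    // Parse one item, then continue for as long as a comma follows.
    List<String> parseSelectList() {
        List<String> items = new ArrayList<>();
        do {
            items.add(current());
            index++;
        } while (skipIfEqual(","));
        return items;
    }

    // The token the cursor rests on once the list is finished.
    String tokenAfterList() {
        return current();
    }
}
```

After the loop ends, the cursor rests on the first token that is not part of the select list, which is the position the original code records in selectListLastPosition.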
3.4.3 #parseFrom()
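Parses tables and their join relationships, such as JOIN ON and subqueries. Table names found during parsing are stored in AbstractSQLStatement#tables, and the corresponding TableToken marker objects are stored in AbstractSQLStatement#sqlTokens.
The core code is TableReferencesClauseParser#parseTableFactor: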
protected final void parseTableFactor(final SQLStatement sqlStatement, final boolean isSingleTableOnly) {
final int beginPosition = lexerEngine.getCurrentToken().getEndPosition() - lexerEngine.getCurrentToken().getLiterals().length();
String literals = lexerEngine.getCurrentToken().getLiterals();
lexerEngine.nextToken();
if (lexerEngine.equalAny(Symbol.DOT)) {
throw new UnsupportedOperationException("Cannot support SQL for `schema.table`");
}
// Get the table name
String tableName = SQLUtil.getExactlyValue(literals);
if (Strings.isNullOrEmpty(tableName)) {
return;
}
// Parse the alias
Optional<String> alias = aliasExpressionParser.parseTableAlias();
if (isSingleTableOnly || shardingRule.tryFindTableRule(tableName).isPresent() || shardingRule.findBindingTableRule(tableName).isPresent()
|| shardingRule.getDataSourceMap().containsKey(shardingRule.getDefaultDataSourceName())) {
sqlStatement.getSqlTokens().add(new TableToken(beginPosition, literals));
sqlStatement.getTables().add(new Table(tableName, alias));
}
// Parse joined tables
parseJoinTable(sqlStatement);
if (isSingleTableOnly && !sqlStatement.getTables().isSingleTable()) {
throw new UnsupportedOperationException("Cannot support Multiple-Table.");
}
}
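In parseTableFactor, the start position of the table name is computed as the Token's end position minus the length of its literal. The sketch below reproduces that arithmetic and then shows one plausible use of the recorded position: substituting a physical table name for the logical one. That substitution step is an assumption made for illustration, since this section does not show the rewrite code, and `TablePositionSketch` is a hypothetical name.

```java
final class TablePositionSketch {
    // Start offset of a token, derived from its end offset and literal length.
    static int beginPosition(final int endPosition, final String literals) {
        return endPosition - literals.length();
    }

    // Replaces the literal that starts at beginPosition with another table name.
    static String replaceTable(final String sql, final int beginPosition,
                               final String literals, final String actualTable) {
        return sql.substring(0, beginPosition)
                + actualTable
                + sql.substring(beginPosition + literals.length());
    }
}
```

Recording a position rather than searching for the name again means a later step can edit exactly the right span even when the same text also appears elsewhere in the statement.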
3.4.4 #parseWhere()
Parses the WHERE clause. Conditions that affect the routing result, that is, filter conditions on sharding keys, are stored in AbstractSQLStatement#conditions.
The core code is WhereClauseParser#parseComparisonCondition:
private void parseComparisonCondition(final ShardingRule shardingRule, final SQLStatement sqlStatement, final List<SelectItem> items) {
lexerEngine.skipIfEqual(Symbol.LEFT_PAREN);
SQLExpression left = basicExpressionParser.parse(sqlStatement);
if (lexerEngine.skipIfEqual(Symbol.EQ)) {
// Parse an = condition
parseEqualCondition(shardingRule, sqlStatement, left);
lexerEngine.skipIfEqual(Symbol.RIGHT_PAREN);
return;
}
if (lexerEngine.skipIfEqual(DefaultKeyword.IN)) {
// Parse an IN condition
parseInCondition(shardingRule, sqlStatement, left);
lexerEngine.skipIfEqual(Symbol.RIGHT_PAREN);
return;
}
if (lexerEngine.skipIfEqual(DefaultKeyword.BETWEEN)) {
// Parse a BETWEEN ... AND condition, that is, a range condition
parseBetweenCondition(shardingRule, sqlStatement, left);
lexerEngine.skipIfEqual(Symbol.RIGHT_PAREN);
return;
}
if (sqlStatement instanceof SelectStatement && isRowNumberCondition(items, left)) {
// Queries paged with ROW_NUMBER (does not apply to MySQL)
if (lexerEngine.skipIfEqual(Symbol.LT)) {
parseRowCountCondition((SelectStatement) sqlStatement, false);
return;
}
if (lexerEngine.skipIfEqual(Symbol.LT_EQ)) {
parseRowCountCondition((SelectStatement) sqlStatement, true);
return;
}
if (lexerEngine.skipIfEqual(Symbol.GT)) {
parseOffsetCondition((SelectStatement) sqlStatement, false);
return;
}
if (lexerEngine.skipIfEqual(Symbol.GT_EQ)) {
parseOffsetCondition((SelectStatement) sqlStatement, true);
return;
}
}
// Other condition operators, such as <, <=, >, >=, and !=
List<Keyword> otherConditionOperators = new LinkedList<>(Arrays.asList(getCustomizedOtherConditionOperators()));
otherConditionOperators.addAll(
Arrays.asList(Symbol.LT, Symbol.LT_EQ, Symbol.GT, Symbol.GT_EQ, Symbol.LT_GT, Symbol.BANG_EQ, Symbol.BANG_GT, Symbol.BANG_LT, DefaultKeyword.LIKE, DefaultKeyword.IS));
if (lexerEngine.skipIfEqual(otherConditionOperators.toArray(new Keyword[otherConditionOperators.size()]))) {
lexerEngine.skipIfEqual(DefaultKeyword.NOT);
parseOtherCondition(sqlStatement);
}
if (lexerEngine.skipIfEqual(DefaultKeyword.NOT)) {
lexerEngine.nextToken();
lexerEngine.skipIfEqual(Symbol.LEFT_PAREN);
parseOtherCondition(sqlStatement);
lexerEngine.skipIfEqual(Symbol.RIGHT_PAREN);
}
lexerEngine.skipIfEqual(Symbol.RIGHT_PAREN);
}
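The key design point of the WHERE parsing above is that only conditions on sharding keys are kept. Below is a minimal sketch of that filtering idea, assuming a simplified `EqualCondition` type and an exact, case-sensitive column match. Both are hypothetical and much simpler than the real Conditions and ShardingRule classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class ShardingConditionSketch {
    // Hypothetical, simplified condition of the form: column = value.
    static final class EqualCondition {
        final String column;
        final Object value;

        EqualCondition(final String column, final Object value) {
            this.column = column;
            this.value = value;
        }
    }

    // Keeps only conditions on sharding columns; the others cannot narrow routing.
    static List<EqualCondition> routingConditions(final List<EqualCondition> parsed,
                                                  final Set<String> shardingColumns) {
        List<EqualCondition> result = new ArrayList<>();
        for (EqualCondition each : parsed) {
            if (shardingColumns.contains(each.column)) {
                result.add(each);
            }
        }
        return result;
    }
}
```

With `order_id` as the only sharding column, a statement such as `WHERE order_id = 10 AND status = 'OK'` would keep just the `order_id` condition. The `status` filter is still executed by the database, it simply plays no part in choosing shards.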
3.4.5 #parseGroupBy()
Parses the GROUP BY clause. The implementation resembles #parseSelectList() but is somewhat simpler.
The parsed grouping information is stored in SelectStatement#groupByItems.
The core code is GroupByClauseParser#parse:
public final void parse(final SelectStatement selectStatement) {
if (!lexerEngine.skipIfEqual(DefaultKeyword.GROUP)) {
return;
}
lexerEngine.accept(DefaultKeyword.BY);
while (true) {
// Parse the grouping expression into an OrderItem and store it in SelectStatement#groupByItems
addGroupByItem(basicExpressionParser.parse(selectStatement), selectStatement);
if (!lexerEngine.equalAny(Symbol.COMMA)) {
break;
}
lexerEngine.nextToken();
}
lexerEngine.skipAll(getSkippedKeywordAfterGroupBy());
selectStatement.setGroupByLastPosition(lexerEngine.getCurrentToken().getEndPosition() - lexerEngine.getCurrentToken().getLiterals().length());
}
3.4.6 #parseHaving()
Sharding-JDBC does not currently support HAVING conditions.
The core code is HavingClauseParser#parse:
public void parse() {
lexerEngine.unsupportedIfEqual(DefaultKeyword.HAVING);
}
// LexerEngine.java
public void unsupportedIfEqual(final TokenType... tokenTypes) {
if (equalAny(tokenTypes)) {
throw new SQLParsingUnsupportedException(lexer.getCurrentToken().getType());
}
}
3.4.7 #parseOrderBy()
Parses the ORDER BY clause. The logic resembles #parseGroupBy(), so it is skipped here; interested readers can look at the source.
3.4.8 #parseLimit()
Parses the LIMIT paging clause. It is relatively simple, so it is skipped here; interested readers can look at the source. Note that there are 3 cases:
- LIMIT row_count
- LIMIT offset, row_count
- LIMIT row_count OFFSET offset
The parsed paging information is stored in SelectStatement#limit.
- Limit
public final class Limit {
// Database type
private final DatabaseType databaseType;
// offset
private LimitValue offset;
// row
private LimitValue rowCount;
}
When the paging values are numeric literals rather than placeholders, an OffsetToken and a RowCountToken are generated.
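Since the LIMIT parsing code is skipped, here is a small sketch of how the three forms listed in this section map onto an offset and a row count. It deliberately uses regular expressions instead of the LexerEngine, handles numeric literals only (no `?` placeholders), and `LimitSketch` is a hypothetical name, so treat it as an illustration of the three shapes rather than the project's implementation.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LimitSketch {
    private static final Pattern ROW_COUNT_ONLY = Pattern.compile("(?i)LIMIT\\s+(\\d+)");
    private static final Pattern OFFSET_COMMA = Pattern.compile("(?i)LIMIT\\s+(\\d+)\\s*,\\s*(\\d+)");
    private static final Pattern OFFSET_KEYWORD = Pattern.compile("(?i)LIMIT\\s+(\\d+)\\s+OFFSET\\s+(\\d+)");

    // Returns {offset, rowCount}; the offset defaults to 0 when absent.
    static int[] parse(final String limitClause) {
        String clause = limitClause.trim();
        Matcher m = OFFSET_COMMA.matcher(clause);
        if (m.matches()) {
            // LIMIT offset, row_count
            return new int[] {Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2))};
        }
        m = OFFSET_KEYWORD.matcher(clause);
        if (m.matches()) {
            // LIMIT row_count OFFSET offset (note the reversed order)
            return new int[] {Integer.parseInt(m.group(2)), Integer.parseInt(m.group(1))};
        }
        m = ROW_COUNT_ONLY.matcher(clause);
        if (m.matches()) {
            // LIMIT row_count
            return new int[] {0, Integer.parseInt(m.group(1))};
        }
        throw new IllegalArgumentException("Unsupported LIMIT clause: " + limitClause);
    }
}
```

The detail most easily gotten wrong is argument order: in the comma form the offset comes first, while in the OFFSET form the row count comes first.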
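3.4.9 appendDerived and related methods
Because Sharding-JDBC shards tables, queries that use AVG, GROUP BY, or ORDER BY need some SQL rewriting so that the results can be processed further in memory, for example to compute averages, group, and sort.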
3.4.9.1 #appendAvgDerivedColumns()
Handles AVG queries.
The core code is AbstractSelectParser#appendAvgDerivedColumns:
private void appendAvgDerivedColumns(final ItemsToken itemsToken, final SelectStatement selectStatement) {
int derivedColumnOffset = 0;
for (SelectItem each : selectStatement.getItems()) {
if (!(each instanceof AggregationSelectItem) || AggregationType.AVG != ((AggregationSelectItem) each).getType()) {
continue;
}
AggregationSelectItem avgItem = (AggregationSelectItem) each;
// COUNT column
String countAlias = String.format(DERIVED_COUNT_ALIAS, derivedColumnOffset);
AggregationSelectItem countItem = new AggregationSelectItem(AggregationType.COUNT, avgItem.getInnerExpression(), Optional.of(countAlias));
// SUM column
String sumAlias = String.format(DERIVED_SUM_ALIAS, derivedColumnOffset);
AggregationSelectItem sumItem = new AggregationSelectItem(AggregationType.SUM, avgItem.getInnerExpression(), Optional.of(sumAlias));
// Attach the derived items to the AVG AggregationSelectItem
avgItem.getDerivedAggregationSelectItems().add(countItem);
avgItem.getDerivedAggregationSelectItems().add(sumItem);
// TODO replace the AVG column with a constant so the database does not compute a useless AVG
itemsToken.getItems().add(countItem.getExpression() + " AS " + countAlias + " ");
itemsToken.getItems().add(sumItem.getExpression() + " AS " + sumAlias + " ");
derivedColumnOffset++;
}
}
For each AVG aggregate, derived columns are added: AVG is rewritten into SUM and COUNT queries, and the result is computed in memory as AVG = SUM / COUNT.
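Why the rewrite is needed becomes clear with numbers. The sketch below merges per-shard SUM and COUNT values and contrasts that with naively averaging the per-shard averages. `AvgMergeSketch` and `Partial` are hypothetical names, and the merge shown is only the arithmetic, not the project's merging code.

```java
final class AvgMergeSketch {
    // One shard's partial result for the derived SUM and COUNT columns.
    static final class Partial {
        final double sum;
        final long count;

        Partial(final double sum, final long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // Correct merge: total SUM divided by total COUNT.
    // Returns 0 for an empty input as a simplification (SQL would yield NULL).
    static double mergeAvg(final Partial... partials) {
        double sum = 0;
        long count = 0;
        for (Partial each : partials) {
            sum += each.sum;
            count += each.count;
        }
        return count == 0 ? 0 : sum / count;
    }

    // Naive merge for comparison: the mean of the per-shard averages.
    // Assumes every shard has count > 0.
    static double averageOfAverages(final Partial... partials) {
        double total = 0;
        for (Partial each : partials) {
            total += each.sum / each.count;
        }
        return total / partials.length;
    }
}
```

Take one shard holding a single row with value 10 and another holding four rows that sum to 20. The true average is 30 / 5 = 6, while the average of the two shard averages (10 and 5) is 7.5, which is wrong whenever shards hold different row counts.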
3.4.9.2 #appendDerivedOrderColumns()
Handles GROUP BY and ORDER BY.
The core code is AbstractSelectParser#appendDerivedOrderColumns:
private void appendDerivedOrderColumns(final ItemsToken itemsToken, final List<OrderItem> orderItems, final String aliasPattern, final SelectStatement selectStatement) {
int derivedColumnOffset = 0;
for (OrderItem each : orderItems) {
if (!isContainsItem(each, selectStatement)) {
String alias = String.format(aliasPattern, derivedColumnOffset++);
each.setAlias(Optional.of(alias));
itemsToken.getItems().add(each.getQualifiedName().get() + " AS " + alias + " ");
}
}
}
private boolean isContainsItem(final OrderItem orderItem, final SelectStatement selectStatement) {
if (selectStatement.isContainStar()) {
return true;
}
for (SelectItem each : selectStatement.getItems()) {
if (-1 != orderItem.getIndex()) {
return true;
}
if (each.getAlias().isPresent() && orderItem.getAlias().isPresent() && each.getAlias().get().equalsIgnoreCase(orderItem.getAlias().get())) {
return true;
}
if (!each.getAlias().isPresent() && orderItem.getQualifiedName().isPresent() && each.getExpression().equalsIgnoreCase(orderItem.getQualifiedName().get())) {
return true;
}
}
return false;
}
For GROUP BY and ORDER BY columns, derived columns are added.
If such a column is not in the select list, it must be queried additionally; only then can GROUP BY or ORDER BY be applied in memory.
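The rule can be sketched on plain strings. In the following illustration, `DerivedColumnSketch` is a hypothetical class and the alias pattern is passed in by the caller, because the actual pattern constant is not shown in this article. It returns the extra select-list fragments that would have to be appended so that every ordering column is present in the result set.

```java
import java.util.ArrayList;
import java.util.List;

final class DerivedColumnSketch {
    // Returns the fragments to append so that every order column is fetched.
    static List<String> derivedColumns(final List<String> selectItems,
                                       final List<String> orderColumns,
                                       final String aliasPattern) {
        List<String> result = new ArrayList<>();
        int offset = 0;
        for (String each : orderColumns) {
            if (!isSelected(selectItems, each)) {
                result.add(each + " AS " + String.format(aliasPattern, offset++));
            }
        }
        return result;
    }

    // A column counts as selected if it is listed, or if "*" selects everything.
    private static boolean isSelected(final List<String> selectItems, final String column) {
        for (String each : selectItems) {
            if ("*".equals(each) || each.equalsIgnoreCase(column)) {
                return true;
            }
        }
        return false;
    }
}
```

For `SELECT user_id FROM t_order ORDER BY user_id, create_time`, only `create_time` needs a derived column. Without it, the rows coming back from each shard would carry no value to compare when the results are merged and sorted in memory.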
3.4.9.3 #appendDerivedOrderBy()
When there is no ORDER BY clause, the GROUP BY items are used as the sort condition.
The core code is AbstractSelectParser#appendDerivedOrderBy:
private void appendDerivedOrderBy(final SelectStatement selectStatement) {
if (!selectStatement.getGroupByItems().isEmpty() && selectStatement.getOrderByItems().isEmpty()) {
selectStatement.getOrderByItems().addAll(selectStatement.getGroupByItems());
selectStatement.getSqlTokens().add(new OrderByToken(selectStatement.getGroupByLastPosition()));
}
}
3.4.10 ItemsToken
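The select-items marker object. It belongs to the sharding context and is currently created in 3 cases:
- An AVG query needs extra COUNT and SUM columns: #appendAvgDerivedColumns()
- A GROUP BY column is not in the select list and must be queried additionally: #appendDerivedOrderColumns()
- An ORDER BY column is not in the select list and must be queried additionally: #appendDerivedOrderColumns()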
public final class ItemsToken implements SQLToken {
/**
* Start position in the SQL
*/
private final int beginPosition;
/**
* List of column expressions to add
*/
private final List<String> items = new LinkedList<>();
}
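4. Conclusion
That wraps up SQL parsing for query statements. INSERT, UPDATE, and DELETE are simpler, and interested readers can explore them on their own. With the parsing result, a SQLStatement, in hand, we can move on to routing. The next article will discuss Sharding-JDBC's routing flow, so stay tuned!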