How MyBatis's Cursor Avoids OOM Errors

程序浅谈 · Backend · Database · 2024-09-06

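Before looking at how Cursor avoids OOM errors, let's first see what Cursor actually is.
MyBatis has a special type called Cursor, and the Javadoc on the class states clearly what it is for.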

/**
 * Cursor contract to handle fetching items lazily using an Iterator.
 * Cursors are a perfect fit to handle millions of items queries that would not normally fits in memory.
 * If you use collections in resultMaps then cursor SQL queries must be ordered (resultOrdered="true")
 * using the id columns of the resultMap.
 *
 * @author Guillaume Darmont / guillaume@dropinocean.com
 */

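The key sentence is "Cursors are a perfect fit to handle millions of items queries that would not normally fits in memory." In other words, Cursor is aimed at queries that return millions of rows, more than would normally fit in memory.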

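The Javadoc even goes out of its way to call it a "perfect fit".
The purpose of the class is to keep the application from running into an OOM error when a query returns a very large amount of data.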

How to use Cursor

Using Cursor in MyBatis is very simple: declare the return type of the method in the Mapper interface as Cursor<T>.

@Select("SELECT * FROM log")
Cursor<Log> selectAll();

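Note: to use Cursor in Spring Boot, you have to pick one of the following two approaches, otherwise reading from the Cursor fails with an error.

Create the SqlSession manually.
Annotate the method that calls the Mapper method with @Transactional.

The extra setup is needed because in Spring Boot a MyBatis SqlSession lives only for the duration of a single Mapper method call, and closing the SqlSession also closes any Cursor bound to it. The SqlSession therefore has to stay alive for as long as the Cursor is being read.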
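To make the lifecycle problem concrete, here is a minimal sketch that uses only the JDK. ToySession and ToyCursor are hypothetical stand-ins invented for this illustration, not MyBatis classes; the only behavior they model is the one described above, namely that closing the session also closes the cursors it handed out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class SessionScopeDemo {

    /** Toy cursor (hypothetical): it can only be iterated while it is open. */
    static class ToyCursor implements Iterable<Integer>, AutoCloseable {
        private final int rows;
        private boolean open = true;

        ToyCursor(int rows) { this.rows = rows; }

        @Override
        public Iterator<Integer> iterator() {
            if (!open) {
                throw new IllegalStateException("cursor is closed");
            }
            return new Iterator<Integer>() {
                private int next = 0;

                @Override
                public boolean hasNext() { return next < rows; }

                @Override
                public Integer next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    return next++;
                }
            };
        }

        @Override
        public void close() { open = false; }
    }

    /** Toy session (hypothetical): closing it closes every cursor it created. */
    static class ToySession implements AutoCloseable {
        private final List<ToyCursor> cursors = new ArrayList<>();

        ToyCursor selectCursor(int rows) {
            ToyCursor cursor = new ToyCursor(rows);
            cursors.add(cursor);
            return cursor;
        }

        @Override
        public void close() {
            for (ToyCursor cursor : cursors) cursor.close();
        }
    }

    /** The session ends with the "mapper call", so the cursor is dead on return. */
    public static int readAfterSessionClosed() {
        ToyCursor cursor;
        try (ToySession session = new ToySession()) {
            cursor = session.selectCursor(3);
        }
        int count = 0;
        for (Integer row : cursor) count++; // iterator() throws here
        return count;
    }

    /** The session stays open until the caller has finished iterating. */
    public static int readInsideSession() {
        int count = 0;
        try (ToySession session = new ToySession()) {
            for (Integer row : session.selectCursor(3)) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(readInsideSession()); // prints 3
        try {
            readAfterSessionClosed();
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage()); // prints "failed: cursor is closed"
        }
    }
}
```

Both options listed above amount to the second shape: either the calling method is wrapped in a transaction so the session outlives the Mapper call, or the caller opens the SqlSession itself and iterates before closing it.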

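How Cursor works

Resolving the Mapper method's return type

When a Mapper method is called in MyBatis, the call is proxied by MapperProxy, which analyzes the method and handles it according to its signature.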

public MethodSignature(Configuration configuration, Class<?> mapperInterface, Method method) {
    // Resolve the method's return type
    Type resolvedReturnType = TypeParameterResolver.resolveReturnType(method, mapperInterface);
    if (resolvedReturnType instanceof Class<?>) {
        this.returnType = (Class<?>) resolvedReturnType;
    } else if (resolvedReturnType instanceof ParameterizedType) {
        this.returnType = (Class<?>) ((ParameterizedType) resolvedReturnType).getRawType();
    } else {
        this.returnType = method.getReturnType();
    }
    this.returnsVoid = void.class.equals(this.returnType);
    this.returnsMany = configuration.getObjectFactory().isCollection(this.returnType) || this.returnType.isArray();
    // Whether the method returns a Cursor
    this.returnsCursor = Cursor.class.equals(this.returnType);
    this.returnsOptional = Optional.class.equals(this.returnType);
    this.mapKey = getMapKey(method);
    this.returnsMap = this.mapKey != null;
    this.rowBoundsIndex = getUniqueParamIndex(method, RowBounds.class);
    this.resultHandlerIndex = getUniqueParamIndex(method, ResultHandler.class);
    this.paramNameResolver = new ParamNameResolver(configuration, method);
}
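The raw-type check in the constructor can be tried out with plain reflection. The sketch below is a simplification: it uses Method.getGenericReturnType() where MyBatis uses its own TypeParameterResolver, and ToyCursor and LogMapper are hypothetical types defined only for this example.

```java
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

public class ReturnTypeDemo {

    /** Hypothetical stand-in for a cursor type; not the MyBatis class. */
    interface ToyCursor<T> extends Iterable<T> {}

    /** Hypothetical mapper interface used only for this illustration. */
    interface LogMapper {
        ToyCursor<String> selectAll();
        List<String> selectList();
        String selectOne();
    }

    /** Reduce a possibly generic return type to its raw class. */
    static Class<?> rawReturnType(Method method) {
        Type type = method.getGenericReturnType();
        if (type instanceof Class<?>) {
            return (Class<?>) type;
        }
        if (type instanceof ParameterizedType) {
            // ToyCursor<String> becomes ToyCursor
            return (Class<?>) ((ParameterizedType) type).getRawType();
        }
        return method.getReturnType();
    }

    /** Same kind of test as returnsCursor: compare the raw type with the cursor class. */
    public static boolean returnsCursor(String methodName) {
        try {
            Method method = LogMapper.class.getMethod(methodName);
            return ToyCursor.class.equals(rawReturnType(method));
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("unknown method: " + methodName, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(returnsCursor("selectAll"));  // prints true
        System.out.println(returnsCursor("selectList")); // prints false
        System.out.println(returnsCursor("selectOne"));  // prints false
    }
}
```

Because the comparison is made on the raw class, the element type in Cursor<T> plays no part in deciding which query path is taken.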

Dispatching to selectCursor based on the Cursor return type

Once the Mapper method's return type has been resolved, it determines which query method is actually called.

public Object execute(SqlSession sqlSession, Object[] args) {
    Object result;
    switch (command.getType()) {
    // ---------- other command types omitted ----------
        case SELECT:
            if (method.returnsVoid() && method.hasResultHandler()) {
                executeWithResultHandler(sqlSession, args);
                result = null;
            } else if (method.returnsMany()) {
                result = executeForMany(sqlSession, args);
            } else if (method.returnsMap()) {
                result = executeForMap(sqlSession, args);
            } else if (method.returnsCursor()) {
                // The method returns a Cursor
                result = executeForCursor(sqlSession, args);
            } else {
                Object param = method.convertArgsToSqlCommandParam(args);
                result = sqlSession.selectOne(command.getName(), param);
                if (method.returnsOptional() && (result == null || !method.getReturnType().equals(result.getClass()))) {
                    result = Optional.ofNullable(result);
                }
            }
            break;
    // ---------- other command types omitted ----------
    }
    return result;
}

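Building the statement

Using the SQL obtained from parsing the Mapper method above, MyBatis creates a PreparedStatement from the database connection and fills in the parameter values.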

private Statement prepareStatement(StatementHandler handler, Log statementLog) throws SQLException {
    Statement stmt;
    Connection connection = getConnection(statementLog);
    stmt = handler.prepare(connection, transaction.getTimeout());
    handler.parameterize(stmt);
    return stmt;
}

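Wrapping the result in a Cursor

At the end of the call, the ResultSet obtained from the database and MyBatis's internal ResultSetHandler are wrapped in a Cursor object that is handed back to the caller.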

public <E> Cursor<E> handleCursorResultSets(Statement stmt) throws SQLException {
    ErrorContext.instance().activity("handling cursor results").object(mappedStatement.getId());
    
    ResultSetWrapper rsw = getFirstResultSet(stmt);
    
    List<ResultMap> resultMaps = mappedStatement.getResultMaps();
    
    int resultMapCount = resultMaps.size();
    validateResultMapsCount(rsw, resultMapCount);
    if (resultMapCount != 1) {
        throw new ExecutorException("Cursor results cannot be mapped to multiple resultMaps");
    }
    
    ResultMap resultMap = resultMaps.get(0);
    return new DefaultCursor<>(this, resultMap, rsw, rowBounds);
}

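Why it avoids running out of memory

Before answering that, let's look at what MyBatis actually calls for a query that returns a Cursor and for an ordinary batch query.

Cursor query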

  @Override
  protected <E> Cursor<E> doQueryCursor(MappedStatement ms, Object parameter, RowBounds rowBounds, BoundSql boundSql)
      throws SQLException {
    Configuration configuration = ms.getConfiguration();
    StatementHandler handler = configuration.newStatementHandler(wrapper, ms, parameter, rowBounds, null, boundSql);
    Statement stmt = prepareStatement(handler, ms.getStatementLog());
    Cursor<E> cursor = handler.queryCursor(stmt);
    stmt.closeOnCompletion();
    return cursor;
  }

Batch query

  @Override
  public <E> List<E> doQuery(MappedStatement ms, Object parameter, RowBounds rowBounds, ResultHandler resultHandler,
      BoundSql boundSql) throws SQLException {
    Statement stmt = null;
    try {
      Configuration configuration = ms.getConfiguration();
      StatementHandler handler = configuration.newStatementHandler(wrapper, ms, parameter, rowBounds, resultHandler,
          boundSql);
      stmt = prepareStatement(handler, ms.getStatementLog());
      return handler.query(stmt, resultHandler);
    } finally {
      closeStatement(stmt);
    }
  }

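Comparing the two methods, the obvious difference is that the batch query explicitly closes the Statement it opened in a finally block, while the Cursor query does not close it right away. It only calls stmt.closeOnCompletion(), so the Statement is closed once its result set has been closed. The reason is that a Cursor is essentially a view over the live Statement and its ResultSet, so neither can be closed as soon as the query returns.
In addition, handler.query(stmt, resultHandler) in the batch query reads every row of the result before returning, and with a very large result this is what fills the heap and causes the OOM.
The Cursor query does not fetch everything before returning. Rows are fetched as the caller iterates, so MyBatis never holds the full result in memory. Whether the JDBC driver itself streams rows rather than buffering them depends on the driver and its fetch-size settings.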
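The difference between the two styles can be shown with a small JDK-only sketch. RowSource is a hypothetical stand-in for a result set that produces rows on request; none of this is MyBatis code, and the counter only shows how many rows each style pulls from the source.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

public class LazyVsEagerDemo {

    /** Simulated result set (hypothetical): rows are produced one at a time. */
    static class RowSource {
        private final int total;
        private int produced = 0;

        RowSource(int total) { this.total = total; }

        boolean next() {
            if (produced >= total) return false;
            produced++;
            return true;
        }

        String current() { return "row-" + produced; }

        int produced() { return produced; }
    }

    /** Eager style: drain the whole source into a list before returning. */
    static List<String> queryAll(RowSource source) {
        List<String> rows = new ArrayList<>();
        while (source.next()) rows.add(source.current());
        return rows;
    }

    /** Lazy style: pull the next row only when the caller asks for it. */
    static Iterator<String> queryCursor(RowSource source) {
        return new Iterator<String>() {
            private String buffered; // at most one row is held at a time

            @Override
            public boolean hasNext() {
                if (buffered == null && source.next()) {
                    buffered = source.current();
                }
                return buffered != null;
            }

            @Override
            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                String row = buffered;
                buffered = null;
                return row;
            }
        };
    }

    /** Rows pulled from the source after the caller has consumed `wanted` rows lazily. */
    public static int producedAfterLazyRead(int total, int wanted) {
        RowSource source = new RowSource(total);
        Iterator<String> rows = queryCursor(source);
        for (int i = 0; i < wanted && rows.hasNext(); i++) rows.next();
        return source.produced();
    }

    /** Rows pulled from the source by the eager query. */
    public static int producedAfterEagerRead(int total) {
        RowSource source = new RowSource(total);
        queryAll(source);
        return source.produced();
    }

    public static void main(String[] args) {
        System.out.println(producedAfterEagerRead(1000));   // prints 1000
        System.out.println(producedAfterLazyRead(1000, 3)); // prints 3
    }
}
```

The eager version materializes all 1000 rows no matter how many the caller needs, while the lazy version touches only the rows that were actually consumed and keeps a single row buffered at any moment.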

Reposted from: https://juejin.cn/post/7383917346644361256