Serve PostgreSQL large objects via HTTP

Question

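I'm building an application that serves data from a PostgreSQL database through a REST API (with Spring MVC) and a PWA (with Vaadin).

The PostgreSQL database stores files of up to 2GB using Large Objects (I'm not in control of that); the JDBC driver provides streamed access to their binary content via Blob#getBinaryStream, so the data does not need to be read entirely into memory.

The only requirement is that the stream from the blob must be consumed within the same transaction, otherwise the JDBC driver throws an exception.

The problem is that even if I retrieve the stream in a transactional repository method, both Spring MVC and Vaadin's StreamResource consume it outside the transaction, so the JDBC driver throws.

For example, given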

public interface SomeRepository extends JpaRepository<SomeEntity, Long> {

    @Transactional(readOnly = true)
    default InputStream getStream() {
        return findById(1).getBlob().getBinaryStream();
    }
}

this Spring MVC method will fail

@RestController
public class SomeController {

    private final SomeRepository repository;

    @GetMapping
    public ResponseEntity getStream() {
        var stream = repository.getStream();
        var resource = new InputStreamResource(stream);
        return new ResponseEntity(resource, HttpStatus.OK);
    }
}

and so will this Vaadin StreamResource

public class SomeView extends VerticalLayout {

    public SomeView(SomeRepository repository) {
        var resource = new StreamResource("x", repository::getStream);
        var anchor = new Anchor(resource, "Download");
        add(anchor);
    }
}

with the same exception:

org.postgresql.util.PSQLException: ERROR: invalid large-object descriptor: 0

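This means the transaction has already been closed by the time the stream is read.

I see two possible solutions:

1. keep the transaction open during the download;
2. write the stream to disk during the transaction, then serve the file from disk during the download.

Solution 1 is an anti-pattern and a security risk: the transaction duration is left to the client, so a slow reader or an attacker could block data access.

Solution 2 creates a huge delay between the client request and the server response, because the stream is first read from the database and written to disk.

One idea might be to start reading from disk while the file is still being written with data from the database, so that the transfer starts immediately but the transaction duration is decoupled from the client download; but I don't know what side effects this might have.

How can I achieve the goal of serving PostgreSQL large objects in a secure and performant way?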

Answer 1

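As you described, one option is to decouple reading from the database from writing to the client response. The downside is the complexity of the solution: you need to synchronize the reader and the writer.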
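To give an idea of what that synchronization involves, here is a minimal sketch of the reader side, assuming one thread spools the blob into a temporary file inside the transaction while the response is served from that file. The class name TailingInputStream and the use of a CountDownLatch as the "writer finished" signal are illustrative choices, not part of any library, and the sketch does not propagate writer failures to the reader.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: reads a file that another thread may still be appending to. */
final class TailingInputStream extends InputStream {
    private final InputStream file;
    // The writer counts this latch down once the whole blob has been written to the file.
    private final CountDownLatch writerDone;

    TailingInputStream(Path path, CountDownLatch writerDone) throws IOException {
        this.file = Files.newInputStream(path);
        this.writerDone = writerDone;
    }

    @Override
    public int read() throws IOException {
        byte[] one = new byte[1];
        int n = read(one, 0, 1);
        return n < 0 ? -1 : one[0] & 0xFF;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        while (true) {
            // Sample the flag BEFORE reading: if the writer was already done,
            // an end-of-file result from the read below is the real end.
            boolean done = writerDone.getCount() == 0;
            int n = file.read(b, off, len);
            if (n > 0) {
                return n;
            }
            if (done) {
                return -1;
            }
            try {
                // Caught up with the writer: wait briefly for more data.
                writerDone.await(50, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IOException("Interrupted while waiting for data", e);
            }
        }
    }

    @Override
    public void close() throws IOException {
        file.close();
    }
}
```

The writer thread would copy Blob#getBinaryStream into the file inside its transaction and count the latch down in a finally block, so the transaction lasts only as long as the copy to disk, whatever the client's download speed. Cleaning up the temporary file and reporting a failed copy to the reader are left out here.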

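Another option is to first get the large object id in the main transaction and then read the data in chunks, each chunk in a separate transaction.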

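// Needs org.postgresql.jdbc.PgBlob, org.postgresql.core.BaseConnection and Apache Commons IO's IOUtils.
byte[] getBlobChunk(Connection connection, long lobId, long start, long chunkSize) throws SQLException, IOException {
   // PgBlob expects the driver's own connection type, so unwrap a pooled connection first.
   Blob blob = new PgBlob(connection.unwrap(BaseConnection.class), lobId);
   // start is a 1-based byte position, as defined by java.sql.Blob#getBinaryStream(long, long).
   try (InputStream is = blob.getBinaryStream(start, chunkSize)) {
      return IOUtils.toByteArray(is);
   }
}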

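This solution is much simpler, but it incurs the overhead of establishing a new connection for each chunk; if you use a connection pool, this should not be a big problem.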
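To hand the chunks to Spring MVC or Vaadin as a single stream, they can be wrapped in an InputStream that fetches the next chunk only when the previous one has been consumed. The following is a rough sketch under stated assumptions: ChunkLoader and ChunkedInputStream are hypothetical names, and the loader is expected to open a short transaction, call something like getBlobChunk, and return an empty array once the end of the large object is reached.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Objects;

/** Hypothetical callback: returns up to size bytes starting at the 0-based offset, or an empty array at the end. */
interface ChunkLoader {
    byte[] load(long offset, int size) throws IOException;
}

/** Hypothetical stream that pulls one chunk at a time, so no transaction spans the whole download. */
final class ChunkedInputStream extends InputStream {
    private final ChunkLoader loader;
    private final int chunkSize;
    private long offset;               // number of bytes fetched from the loader so far
    private byte[] chunk = new byte[0];
    private int pos;                   // read position inside the current chunk
    private boolean eof;

    ChunkedInputStream(ChunkLoader loader, int chunkSize) {
        this.loader = Objects.requireNonNull(loader);
        this.chunkSize = chunkSize;
    }

    @Override
    public int read() throws IOException {
        return fill() ? chunk[pos++] & 0xFF : -1;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
        Objects.checkFromIndexSize(off, len, b.length);
        if (len == 0) {
            return 0;
        }
        if (!fill()) {
            return -1;
        }
        int n = Math.min(len, chunk.length - pos);
        System.arraycopy(chunk, pos, b, off, n);
        pos += n;
        return n;
    }

    /** Makes sure the current chunk has unread bytes; returns false at the end of the data. */
    private boolean fill() throws IOException {
        if (pos < chunk.length) {
            return true;
        }
        if (eof) {
            return false;
        }
        chunk = loader.load(offset, chunkSize);
        pos = 0;
        offset += chunk.length;
        if (chunk.length == 0) {
            eof = true;
            return false;
        }
        return true;
    }
}
```

A loader built on getBlobChunk would pass offset + 1 as start, since Blob positions are 1-based. Because each chunk is read in its own transaction, the large object could in principle change between two chunks, so this fits best when the objects are not modified while they are being served.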

Answer 2

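We solved this problem in Spring Content by using threads + piped streams and a special input stream wrapper, ClosingInputStream, that delays closing the connection/transaction until the consumer closes the input stream. Maybe something like this would help you too?

FYI, we have found that using Postgres's OID and large object API is extremely slow compared with similar databases.

Perhaps you could also retrofit Spring Content JPA to your solution and thereby use its HTTP endpoints (and the solution I just outlined) instead of creating your own? Something like this: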
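For illustration, the threads + piped streams idea can be sketched with plain JDK classes. This is not Spring Content's actual code: BlobWriter and PipedBlobStreams are hypothetical names, and the callback is assumed to open the transaction, copy Blob#getBinaryStream into the given output stream, and end the transaction when it returns or throws.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.concurrent.Executor;

/** Hypothetical callback: copies the blob into out while its transaction is open. */
interface BlobWriter {
    void writeTo(OutputStream out) throws Exception;
}

final class PipedBlobStreams {
    private PipedBlobStreams() {
    }

    /** Starts the producer on the executor and returns the consumer's end of the pipe. */
    static InputStream open(Executor executor, BlobWriter writer) throws IOException {
        PipedInputStream in = new PipedInputStream(64 * 1024);
        PipedOutputStream out = new PipedOutputStream(in);
        executor.execute(() -> {
            // Closing the pipe's write end signals end-of-stream to the reader.
            try (out) {
                writer.writeTo(out);
            } catch (Exception e) {
                // Also reached when the reader closes early: the next write then fails
                // with an IOException, which lets the callback roll back its transaction.
                // This sketch does not report the failure to the reader.
            }
        });
        return in;
    }
}
```

The returned stream can be given to InputStreamResource or StreamResource as usual. Note that the pipe buffer is bounded, so the producer blocks while the client is slow and the transaction still stays open until the download finishes or the consumer closes the stream; the sketch only moves the transaction onto a thread that outlives the controller method.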

pom.xml

   <!-- Java API -->
   <dependency>
      <groupId>com.github.paulcwarren</groupId>
      <artifactId>spring-content-jpa-boot-starter</artifactId>
      <version>0.4.0</version>
   </dependency>

   <!-- REST API -->
   <dependency>
      <groupId>com.github.paulcwarren</groupId>
      <artifactId>spring-content-rest-boot-starter</artifactId>
      <version>0.4.0</version>
   </dependency>

SomeEntity.java

@Entity
public class SomeEntity {
   @Id
   @GeneratedValue
   private long id;

   @ContentId
   private String contentId;

   @ContentLength
   private long contentLength = 0L;

   @MimeType
   private String mimeType = "text/plain";

   ...
}

SomeEntityContentStore.java

@StoreRestResource(path="someEntityContent")
public interface SomeEntityContentStore extends ContentStore<SomeEntity, String> {
}

That is all you need to get REST endpoints that let you associate content with your entity SomeEntity. There is a working example in our examples repo here.

postgresql

spring

hibernate

spring-mvc

vaadin