How to fetch a Google Spreadsheet having more than 80K rows using ListFeed in JAVA (GAE)?

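I would like to know how to fetch a complete Google Spreadsheet with more than 80K rows using the "List-based Feeds" of Google Spreadsheets.

To be clearer, the flow of the application is as follows:

  1. Using service.getFeed()
  2. Connecting to the Google Spreadsheet
  3. Fetching all the rows using the list-based feed and pushing tasks to the task queue to enter the data into the datastore.

Problems:

  1. The application runs fine on localhost, but when deployed it fails with a timeout error, "HardDeadlineExceeded Exception". I have read the documentation for this exception and found that handling it is of little use. The following code establishes the connection and fetches the list-based feed: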

            try
            {
                lf = service.getFeed(url, ListFeed.class); //Exception occurs at this point
                timeoutflag=1;
            }
            catch(Exception e)
            {
                timeoutinc += 3;
                service.setConnectTimeout(timeoutinc * 10000);
                service.setReadTimeout(timeoutinc * 10000);
            }
  2. The second exception I get is an out-of-memory error:

    java.lang.OutOfMemoryError: Java heap space
        at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1213)
        at com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:642)
        at org.xml.sax.helpers.ParserAdapter.parse(ParserAdapter.java:430)
        ...

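I have read Google's official documentation and found that I could use the cell-based feed instead. However, my application relies entirely on the list-based feed, so switching to the cell-based feed is not the best option for my use case: I need to fetch the data row by row, not cell by cell.

Please guide!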

1. The application runs fine on localhost, but when deployed it fails with a timeout error, "HardDeadlineExceeded Exception". I have read the documentation for this exception and found that handling it is of little use.

Based on this documentation, if the DeadlineExceededException is not caught, an uncatchable HardDeadlineExceededError is thrown. The instance is terminated in both cases, but the HardDeadlineExceededError does not give any time margin to return a custom response. To make sure your request returns within the allowed time frame, you can use the ApiProxy.getCurrentEnvironment().getRemainingMillis() method to checkpoint your code and return if you have no time left.
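A minimal sketch of that checkpointing idea, assuming the remaining request time is supplied through a `LongSupplier` so that the example stays self-contained. `DeadlineGuard`, `processWhileTimeRemains`, and the safety margin are hypothetical names and values, not part of the App Engine SDK; on App Engine the supplier would presumably wrap `ApiProxy.getCurrentEnvironment().getRemainingMillis()`.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongSupplier;

/** Hypothetical helper: handles work items only while enough request time remains. */
class DeadlineGuard {
    private final LongSupplier remainingMillis; // time left in the current request
    private final long safetyMarginMillis;      // stop this early to leave room to respond

    DeadlineGuard(LongSupplier remainingMillis, long safetyMarginMillis) {
        this.remainingMillis = remainingMillis;
        this.safetyMarginMillis = safetyMarginMillis;
    }

    /** True while the remaining time is strictly above the safety margin. */
    boolean hasTimeLeft() {
        return remainingMillis.getAsLong() > safetyMarginMillis;
    }

    /**
     * Handles items in order and checks the clock before each one.
     * Returns the index of the first unprocessed item
     * (equal to items.size() when everything was handled).
     */
    <T> int processWhileTimeRemains(List<T> items, Consumer<T> handler) {
        int next = 0;
        while (next < items.size() && hasTimeLeft()) {
            handler.accept(items.get(next));
            next++;
        }
        return next;
    }
}
```

The returned index tells you where to resume, for example by enqueuing a follow-up task for the rows that were not handled in this request.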

2. The second exception I get is an out-of-memory error

From this related SO post, you may be getting the error because the heap is being over-allocated. The only way to solve this, other than increasing the heap space, is to find out what is using all the heap space and then make sure that objects which stay around longer than they are needed can be collected. If it is a file or something else that cannot be collected that is making you run out of heap space, you should re-engineer your program if the file sizes are not constant and keep changing. If they are constant, just increase the heap space above the file size. You can check this thread for more information.
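One way to re-engineer the fetch so that the whole sheet is never parsed in a single response is to read it in fixed-size pages. The sketch below only shows the paging loop; `RowPageSource` and `ChunkedRowReader` are hypothetical names. It assumes the list feed honors the GData `start-index` and `max-results` query parameters (in the Java client, `Query.setStartIndex` and `Query.setMaxResults`), which you should verify against your own sheet.

```java
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical page source: returns up to maxResults rows starting at the
 * 1-based startIndex, or an empty list when no rows are left. A real
 * implementation would build a query, call service.getFeed(...), and wrap
 * any checked exceptions.
 */
interface RowPageSource<R> {
    List<R> fetchPage(int startIndex, int maxResults);
}

/** Hypothetical helper: reads rows page by page so only one page is held at a time. */
class ChunkedRowReader {
    /** Passes each page to pageHandler and returns the total number of rows seen. */
    static <R> int readAll(RowPageSource<R> source, int pageSize,
                           Consumer<List<R>> pageHandler) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("pageSize must be positive");
        }
        int startIndex = 1; // GData-style start-index is 1-based
        int total = 0;
        while (true) {
            List<R> page = source.fetchPage(startIndex, pageSize);
            if (page.isEmpty()) {
                break; // nothing left to read
            }
            pageHandler.accept(page); // e.g. enqueue one task per page
            total += page.size();
            if (page.size() < pageSize) {
                break; // a short page means this was the last one
            }
            startIndex += page.size();
        }
        return total;
    }
}
```

A smaller page size lowers peak memory per request at the cost of more round trips, so the right value depends on row width and the instance's heap.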

For your question, "How to fetch a Google Spreadsheet having more than 80K rows using ListFeed in JAVA (GAE)?", I suggest checking this documentation. This sample code may also help.

    // Make a request to the API and get all spreadsheets.
    SpreadsheetFeed feed = service.getFeed(SPREADSHEET_FEED_URL,
        SpreadsheetFeed.class);
    List<SpreadsheetEntry> spreadsheets = feed.getEntries();

    if (spreadsheets.size() == 0) {
        // TODO: There were no spreadsheets, act accordingly.
    }