Reading .xls file via PHPExcel throws Fatal error: allowed memory size... even with chunk reader
I am using PHPExcel to read .xls files. Very quickly I run into:
Fatal error: Allowed memory size of 1073741824 bytes exhausted (tried to allocate 730624 bytes) in Excel\PHPExcel\Shared\OLERead.php on line 93
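After some googling, I tried using a chunk reader to prevent this (it is even mentioned on the PHPExcel homepage), but I still get this error.
My thought was that with the chunk reader I would read the file piece by piece and my memory would not overflow. But is there some serious memory leak? Or am I freeing memory incorrectly? I even tried raising the memory limit to 1GB. The file I am trying to read is about 700k, which is not large (I can also read ~20MB pdf, xlsx, docx, doc, etc. files without problems). So I assume there is just some small detail I am overlooking.
The code looks like this: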
function parseXLS($fileName){
    // dirname() returns the directory without a trailing slash, so the suffix must start with '/'
    require_once dirname(__FILE__) . '/sphider_design/include/Excel/PHPExcel/IOFactory.php';
    require_once dirname(__FILE__) . '/sphider_design/include/Excel/PHPExcel/ChunkReadFilter.php';

    $inputFileType = 'Excel5';
    /** Create a new Reader of the type defined in $inputFileType **/
    $objReader = PHPExcel_IOFactory::createReader($inputFileType);
    /** Define how many rows we want to read for each "chunk" **/
    $chunkSize = 20;
    /** Create a new Instance of our Read Filter **/
    $chunkFilter = new chunkReadFilter();
    /** Tell the Reader that we want to use the Read Filter that we've Instantiated **/
    $objReader->setReadFilter($chunkFilter);

    /** Loop to read our worksheet in "chunk size" blocks **/
    /** $startRow is set to 2 initially because we always read the headings in row #1 **/
    for ($startRow = 2; $startRow <= 65536; $startRow += $chunkSize) {
        /** Tell the Read Filter, the limits on which rows we want to read this iteration **/
        $chunkFilter->setRows($startRow, $chunkSize);
        /** Load only the rows that match our filter from $fileName to a PHPExcel Object **/
        $objPHPExcel = $objReader->load($fileName);
        // Do some processing here

        // Free up some of the memory
        $objPHPExcel->disconnectWorksheets();
        unset($objPHPExcel);
    }
}
And here is the code for the chunk reader:
class chunkReadFilter implements PHPExcel_Reader_IReadFilter
{
    private $_startRow = 0;
    private $_endRow = 0;

    /** Set the list of rows that we want to read */
    public function setRows($startRow, $chunkSize) {
        $this->_startRow = $startRow;
        $this->_endRow = $startRow + $chunkSize;
    }

    public function readCell($column, $row, $worksheetName = '') {
        // Only read the heading row, and the rows that are configured in $this->_startRow and $this->_endRow
        if (($row == 1) || ($row >= $this->_startRow && $row < $this->_endRow)) {
            return true;
        }
        return false;
    }
}
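To see which rows each pass of the loop actually accepts, the filter's row test can be exercised on its own, without PHPExcel. This is only a sketch: `ChunkWindow` is a hypothetical stand-in that copies the `setRows`/`readCell` logic above but does not implement `PHPExcel_Reader_IReadFilter`.

```php
<?php
// Hypothetical stand-in for chunkReadFilter: same row logic, no PHPExcel dependency.
class ChunkWindow
{
    private $startRow = 0;
    private $endRow = 0;

    public function setRows($startRow, $chunkSize)
    {
        $this->startRow = $startRow;
        $this->endRow = $startRow + $chunkSize; // exclusive upper bound
    }

    public function readCell($column, $row, $worksheetName = '')
    {
        // Row 1 (headings) is always accepted, plus the current window.
        return $row == 1 || ($row >= $this->startRow && $row < $this->endRow);
    }
}

$window = new ChunkWindow();
$window->setRows(2, 20);

// Collect the rows accepted in the first chunk out of rows 1..30.
$accepted = array();
for ($row = 1; $row <= 30; $row++) {
    if ($window->readCell('A', $row)) {
        $accepted[] = $row;
    }
}
echo implode(',', $accepted), "\n"; // rows 1 through 21

// Count how many times the loop in the question would call load().
$loads = 0;
for ($startRow = 2; $startRow <= 65536; $startRow += 20) {
    $loads++;
}
echo $loads, "\n"; // 3277
```

With a chunk size of 20, the loop in the question calls `load()` 3277 times no matter how short the sheet is. The error is also raised in `OLERead.php`, which suggests (though I have not verified it) that memory runs out while the file container is being read, before the row filter has any effect.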
Hope the following link helps:
PHPExcel runs out of 256, 512 and also 1024MB of RAM
http://phpexcel.codeplex.com/discussions/242712?ProjectName=phpexcel
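So I found an interesting solution here: How to read large worksheets from large Excel files (27MB+) with PHPExcel? (see Addendum 3 in that question).
edit1: With this solution too, I hit the wall of my favourite error message, but I found something about caching, so I implemented this: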
$cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
// the key must be exactly 'memoryCacheSize' (no surrounding spaces)
$cacheSettings = array('memoryCacheSize' => '8MB');
PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);
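So far I have only tested it on xls files smaller than 10MB, but it seems to work (I also set $objReader->setReadDataOnly(true);), and it seems to strike a reasonable balance between speed and memory consumption. (I will follow my thorny path further if possible.)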
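One detail worth checking in the settings array: PHP matches array keys exactly, so a key written with surrounding spaces is not the same key as `memoryCacheSize`. The sketch below uses a hypothetical `resolveMemoryCacheSize()` helper (not part of PHPExcel) that looks the option up by its exact name and falls back to a default. As far as I can tell this is how the phpTemp cache reads its setting, but treat both that and the '1MB' default as assumptions.

```php
<?php
// Hypothetical helper: looks up the option by its exact key, with a fallback.
function resolveMemoryCacheSize(array $settings, $default = '1MB')
{
    return isset($settings['memoryCacheSize']) ? $settings['memoryCacheSize'] : $default;
}

$padded = array(' memoryCacheSize ' => '8MB'); // key contains spaces
$exact  = array('memoryCacheSize' => '8MB');   // key matches exactly

echo resolveMemoryCacheSize($padded), "\n"; // falls back to the default
echo resolveMemoryCacheSize($exact), "\n";  // picks up 8MB
```

If the library behaves like this helper, a padded key would silently leave the cache at its default size rather than raise an error.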
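edit2:
So I did some further research and found that the chunk reader is unnecessary in my case. (It seems to me that the memory problem is the same with the chunk reader as without it.) So my final answer to my question is the following, which reads an .xls file (only the data from the cells, without formatting, and even filtering out formulas). When I use cache_to_phpTemp, I am able to read xls files (tested up to 10MB) with about 10k rows and multiple columns in a few seconds and without memory issues.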
function parseXLS($fileName){
    /** PHPExcel_IOFactory */
    // dirname() returns the directory without a trailing slash, so the suffix must start with '/'
    require_once dirname(__FILE__) . '/sphider_design/include/Excel/PHPExcel/IOFactory.php';
    require_once dirname(__FILE__) . '/sphider_design/include/Excel/PHPExcel/ChunkReadFilter.php';
    require_once dirname(__FILE__) . '/sphider_design/include/Excel/PHPExcel.php';

    $inputFileName = $fileName;
    $fileContent = "";

    // get inputFileType (most of the time Excel5)
    $inputFileType = PHPExcel_IOFactory::identify($inputFileName);

    // initialize cache, so that PHPExcel does not run out of memory
    // (the key must be exactly 'memoryCacheSize', no surrounding spaces)
    $cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
    $cacheSettings = array('memoryCacheSize' => '8MB');
    PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);

    // initialize object reader by file type
    $objReader = PHPExcel_IOFactory::createReader($inputFileType);

    // read only data (without formatting) for memory and time performance
    $objReader->setReadDataOnly(true);

    // load file into PHPExcel object
    $objPHPExcel = $objReader->load($inputFileName);

    // get worksheetIterator, so we can loop over the sheets in the workbook
    $worksheetIterator = $objPHPExcel->getWorksheetIterator();

    // loop over all sheets
    foreach ($worksheetIterator as $worksheet) {
        // use worksheet rowIterator to get the content of each row
        foreach ($worksheet->getRowIterator() as $row) {
            // use cell iterator to get the content of each cell in the row
            $cellIterator = $row->getCellIterator();
            // iterate over every cell in the row, not only the ones that already exist
            $cellIterator->setIterateOnlyExistingCells(false);

            // iterate over each cell
            foreach ($cellIterator as $cell) {
                // check that the cell exists
                if (!is_null($cell)) {
                    // get raw value (without formatting and other unnecessary extras)
                    $rawValue = $cell->getValue();
                    // if the cell is not empty and is not a formula, append its value
                    if ((trim($rawValue) <> "") && (substr(trim($rawValue), 0, 1) <> "=")) {
                        $fileContent .= $rawValue . " ";
                    }
                }
            }
        }
    }

    return $fileContent;
}
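The test applied to each cell value can be pulled out and tried in isolation. A minimal sketch, assuming a hypothetical helper name `isPlainValue()`: it mirrors the condition above (non-empty after trimming, not starting with `=`) and casts to string first so that `null` cells can be passed safely on newer PHP versions.

```php
<?php
// Hypothetical helper mirroring the cell test in parseXLS().
function isPlainValue($rawValue)
{
    $text = trim((string) $rawValue);
    // Keep values that are non-empty and do not look like a formula.
    return $text !== '' && substr($text, 0, 1) !== '=';
}

$values = array('Name', '  ', null, '=SUM(A1:A3)', 42, ' =B2*2');

$kept = array();
foreach ($values as $value) {
    if (isPlainValue($value)) {
        $kept[] = $value;
    }
}

echo implode(' ', $kept), "\n"; // Name 42
```

Note that, like the original condition, this also drops ordinary text that happens to begin with `=`.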
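Here is what I did based on your example. I found that some PHP engine settings need to be set for the function to succeed. Take a look. I removed the parts that insert into my database, but the main idea is here.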
$upload_dir = dirname(__DIR__) . "/uploads/";
$inputFileName = $upload_dir . basename($_FILES["fileToUpload"]["name"]);
$insertOk = FALSE;

// get inputFileType (most of the time Excel5)
$inputFileType = PHPExcel_IOFactory::identify($inputFileName);

// initialize cache, so that PHPExcel does not run out of memory
ini_set('memory_limit', '-1');
ini_set('max_execution_time', 180); // 180 seconds of execution time maximum
$cacheMethod = PHPExcel_CachedObjectStorageFactory::cache_to_phpTemp;
// the key must be exactly 'memoryCacheSize' (no surrounding spaces)
$cacheSettings = array('memoryCacheSize' => '8MB');
PHPExcel_Settings::setCacheStorageMethod($cacheMethod, $cacheSettings);

// initialize object reader by file type
$objReader = PHPExcel_IOFactory::createReader($inputFileType);

// read only data (without formatting) for memory and time performance
$objReader->setReadDataOnly(true);

// load file into PHPExcel object
// NOTE: this loads the whole workbook once, before the read filter is set
$objPHPExcel = $objReader->load($inputFileName);
$objPHPExcel->setActiveSheetIndex(0);

$spreadsheetInfo = $objReader->listWorksheetInfo($inputFileName);
$maxRowsAllowed = $spreadsheetInfo[0]['totalRows'];

// Define how many rows we want to read for each "chunk"
$chunkSize = 200;

// Create a new Instance of our Read Filter
// (ReportChunkReadFilter is not shown here)
$chunkFilter = new ReportChunkReadFilter();

// Tell the Reader that we want to use the Read Filter that we've
// Instantiated
$objReader->setReadFilter($chunkFilter);

// Loop to read our worksheet in "chunk size" blocks
for ($startRow = 0; $startRow <= $maxRowsAllowed; $startRow += $chunkSize) {
    // Tell the Read Filter, the limits on which rows we want to
    // read this iteration
    $chunkFilter->setRows($startRow, $chunkSize);

    // Load only the rows that match our filter from $inputFileName
    // to a PHPExcel Object
    $objPHPExcel = $objReader->load($inputFileName);
    $sheetData = $objPHPExcel->getActiveSheet()->toArray(null, true, true, true);

    // loop on the rows of the filtered excel file (the chunk)
    foreach ($sheetData as $rowArray) {
        echo $rowArray['A'];
        // do your stuff here
    }

    // Free up some of the memory
    $objPHPExcel->disconnectWorksheets();
    unset($objPHPExcel);
}

unlink($inputFileName);