Keep text within quotation marks intact while splitting text

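I need the data inside quotation marks in the string [$str] not to be split. In this case, "Accounting company" should stay together as one string instead of being broken apart.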

<?php

$str =
'#PROGRAM   "Accounting company"    98.2
 #GENERATED     2020715 "SE"';

$data = explode("\n", $str);

foreach($data as &$value){
    $value = preg_split("/\s+/", $value);
}

var_dump($data);

Result:

array(2) {
  [0]=>
  array(4) {
    [0]=>
    string(8) "#PROGRAM"
    [1]=>
    string(11) ""Accounting" // Unwanted split
    [2]=>
    string(8) "company""  // Unwanted split
    [3]=>
    string(4) "98.2"
  }
  [1]=>
  &array(4) {
    [0]=>
    string(0) ""
    [1]=>
    string(10) "#GENERATED"
    [2]=>
    string(7) "2020715"
    [3]=>
    string(4) ""SE""
  }
}

Desired result:

array(2) {
  [0]=>
  array(3) {
    [0]=>
    string(8) "#PROGRAM"
    [1]=>
    string(20) ""Accounting company""
    [2]=>
    string(4) "98.2"
  }
  [1]=>
  &array(4) {
    [0]=>
    string(0) ""
    [1]=>
    string(10) "#GENERATED"
    [2]=>
    string(7) "2020715"
    [3]=>
    string(4) ""SE""
  }
}

Here is a solution without regular expressions:

$str =
'#PROGRAM   "Accounting company"    98.2
 #GENERATED     2020715 "SE"';

$quoted = false;
$index = 0;
$data = [];
$rows = explode("\n", $str);

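foreach($rows as $row) {
    $temp = [];
    for ($i = 0; $i < strlen($row); $i++) {
        // Toggle the "inside quotes" state on every double quote
        if ($row[$i] === "\"") $quoted = !$quoted;
        // An unquoted space ends the current token: move on to the next index
        if ($row[$i] === " " && !$quoted) {
            $index++;
            continue;
        }

        // Append the character to the current token
        $temp[$index] = ($temp[$index] ?? "") . $row[$i];
    }

    // Runs of spaces leave gaps in the indices; array_values() reindexes from 0
    $data[] = array_values($temp);
}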

var_dump($data);

Result:

array(2) {
  [0]=>
  array(3) {
    [0]=>
    string(8) "#PROGRAM"
    [1]=>
    string(20) ""Accounting company""
    [2]=>
    string(4) "98.2"
  }
  [1]=>
  array(3) {
    [0]=>
    string(10) "#GENERATED"
    [1]=>
    string(7) "2020715"
    [2]=>
    string(4) ""SE""
  }
}

Still looking for a regex solution, though :)

If you want to keep the empty element at [1][0]: Demo
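A minimal sketch of one way to keep that empty element, assuming the goal is to mirror what preg_split() does with \h+: a leading space yields an empty first token, and each run of unquoted spaces counts as a single separator. The function name splitKeepEmpty is hypothetical, and unlike the loop above it resets the quote state for every line.

```php
<?php
// Hypothetical helper: split a line on runs of unquoted spaces, keeping an
// empty first token when the line starts with a space (like preg_split does).
function splitKeepEmpty(string $row): array
{
    $tokens = [''];        // the current token is always the last element
    $quoted = false;       // are we inside a "..." pair?
    $prevWasSpace = false; // collapse a run of spaces into one separator

    for ($i = 0, $len = strlen($row); $i < $len; $i++) {
        $ch = $row[$i];
        if ($ch === '"') {
            $quoted = !$quoted;
        }
        if ($ch === ' ' && !$quoted) {
            if (!$prevWasSpace) {
                $tokens[] = ''; // start a new token after this separator run
            }
            $prevWasSpace = true;
            continue;
        }
        $prevWasSpace = false;
        $tokens[count($tokens) - 1] .= $ch; // append to the current token
    }
    return $tokens;
}

$str = '#PROGRAM   "Accounting company"    98.2
 #GENERATED     2020715 "SE"';

$data = array_map('splitKeepEmpty', explode("\n", $str));
var_dump($data);
```

Here $data[1] starts with an empty string, followed by #GENERATED, 2020715 and "SE".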

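You can use the SKIP FAIL pattern to skip over everything from an opening double quote to the closing double quote, and then match 1+ horizontal whitespace characters to split on: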

"[^"]*"(*SKIP)(*FAIL)|\h+

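To see the pattern on its own, here is a minimal sketch that applies it to the first input line only. The PHP string is single-quoted so the double quotes in the pattern need no escaping; it is the same pattern as in the code that follows, just spelled differently.

```php
<?php
// A quoted run "..." is matched and then thrown away by (*SKIP)(*FAIL),
// so the engine resumes after the closing quote; only \h+ outside quotes
// remains as a split point.
$pattern = '/"[^"]*"(*SKIP)(*FAIL)|\h+/';

$line = '#PROGRAM   "Accounting company"    98.2';
$parts = preg_split($pattern, $line);

print_r($parts);
```

This prints three pieces: #PROGRAM, "Accounting company" and 98.2.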
$str =
    '#PROGRAM   "Accounting company"    98.2
 #GENERATED     2020715 "SE"';

$data = explode("\n", $str);

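foreach($data as &$value){
    $value = preg_split("/\"[^\"]*\"(*SKIP)(*FAIL)|\h+/", $value);
}
unset($value); // break the reference that foreach leaves behind

// Print the pieces of each line separately
foreach ($data as $row) {
    print_r($row);
}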

Output

Array
(
    [0] => #PROGRAM
    [1] => "Accounting company"
    [2] => 98.2
)
Array
(
    [0] => 
    [1] => #GENERATED
    [2] => 2020715
    [3] => "SE"
)

If you don't want the empty entry in the second array, you can use the PREG_SPLIT_NO_EMPTY flag:

$value = preg_split("/\"[^\"]*\"(*SKIP)(*FAIL)|\h+/", $value, -1, PREG_SPLIT_NO_EMPTY);
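Putting it together, a minimal end-to-end sketch with the flag. It swaps the foreach by reference for array_map() with an arrow function (which assumes PHP 7.4 or later), so no dangling reference is left behind.

```php
<?php
$str = '#PROGRAM   "Accounting company"    98.2
 #GENERATED     2020715 "SE"';

// Split each line on unquoted horizontal whitespace and drop empty pieces,
// such as the one produced by the leading space on the second line.
$data = array_map(
    fn(string $line): array => preg_split(
        '/"[^"]*"(*SKIP)(*FAIL)|\h+/',
        $line,
        -1,
        PREG_SPLIT_NO_EMPTY
    ),
    explode("\n", $str)
);

print_r($data);
```

With the flag, both inner arrays have three elements, and the second one starts at #GENERATED.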
