PHP: parsing text into structured JSON
I have text like this:
some text Xª 1234567-89.0123.45.6789 (YZ) 01/01/2011 Esbjörn Svensson 02/02/2022 Awesome Trio Wª 0987654-32.1098.76.5432 (KBoo) 07/09/2013 Some Full Name 09/07/2017 Observation 12/12/2018 some text that I don't want to keep Xª 4335678-98.7123.95.5689 09/10/2010 Name Here 08/09/2020 Observation and more text to delete
I need structured JSON like this:
{
"data":
{
"Team": "Xª",
"ID": "1234567-89.0123.45.6789",
"Type": "(YZ)",
"Date 1": "01/01/2011",
"Name": "Esbjörn Svensson",
"Date 2: "02/02/2022",
"Obs": "Awesome Trio",
"Date 3": ""
},
{
"Team": "Wª",
"ID": "0987654-32.1098.76.5432",
"Type": "(KBoo)",
"Date 1": "07/09/2013",
"Name": "Some Full Name",
"Date 2: "09/07/2017",
"Obs": "Observation",
"Date 3": "12/12/2018"
},
{
"Team": "Xª",
"ID": "4335678-98.7123.95.5689",
"Type": "",
"Date 1": "09/10/2010",
"Name": "Name Here",Name Here
"Date 2: "08/09/2020",
"Obs": "Observation",
"Date 3": ""
}
}
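I have searched through a lot of code here, but I can't get any of it to work the way I need. I tried splitting the text on whitespace and on the "ª" character, without success.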
foreach($textsource as &$lista) {
$y = implode(' ',$lista);
$x = preg_split(' ', $y);
$delimiter = '/\ª/';
$childIndex = array_keys(preg_grep($delimiter, $x));
$chunks = [];
$final = [];
for ($i=0; $i<count($childIndex); $i++) {
$chunks[$i]['begin'] = $childIndex[$i];
if (isset($childIndex[$i+1])) {
$chunks[$i]['len'] = $childIndex[$i+1]-$childIndex[$i];
}
}
foreach ($chunks as $chunk) {
if (isset($chunk['len'])){
$final[] = array_slice($x, $chunk['begin'], $chunk['len']);
} else {
$final[] = array_slice($x, $chunk['begin']);
}
}
echo "<pre>";
print_r($final);
echo "</pre>";
Any help is appreciated.
I tried to solve this, and here is a working solution. By the way, your JSON is invalid; check it with jsonlint.
$text = "some text Xª 1234567-89.0123.45.6789 (YZ) 01/01/2011 Esbjörn Svensson 02/02/2022 Awesome Trio Wª 0987654-32.1098.76.5432 (KBoo) 07/09/2013 Some Full Name 09/07/2017 Observation 12/12/2018 some text that I don't want to keep Xª 4335678-98.7123.95.5689 09/10/2010 Name Here 08/09/2020 Observation and more text to delete";

// Split on the "ª" marker. Every chunk except the last one ends with the
// team letter of the record that follows it.
$arr = explode("ª", $text);
// Team of record k = last character of chunk k (assumes a single ASCII letter).
$team_arr = array_map(function ($team) { return substr($team, -1) . "ª"; }, $arr);
array_shift($arr);     // drop the text before the first record
array_pop($team_arr);  // the last chunk is not followed by a record

// Return the text that follows "$date " in $chunk, up to the next digit.
function text_after_date($chunk, $date) {
    if ($date === "") {
        return "";
    }
    $parts = explode($date . " ", $chunk, 2);
    if (!isset($parts[1])) {
        return "";
    }
    $out = '';
    foreach (str_split($parts[1]) as $ch) {
        if (is_numeric($ch)) {
            break;
        }
        $out .= $ch;
    }
    return trim($out);
}

$res = [];
$end = count($arr);
// After array_shift() the records are indexed from 0, in step with $team_arr.
for ($i = 0; $i < $end; $i++) {
    $chunk = $arr[$i];
    // Remove the next record's team letter from the end of the chunk.
    if ($i < $end - 1) {
        $chunk = substr($chunk, 0, -1);
    }
    $temp_obj_arr = explode(' ', trim($chunk));
    // Optional type in parentheses, e.g. "(YZ)".
    preg_match('#\((.*?)\)#', $chunk, $match);
    $type = (!empty($match[0]) ? $match[0] : "");
    // Up to three dd/mm/yyyy dates, in order of appearance.
    preg_match_all('/(\d{2})\/(\d{2})\/(\d{4})/', $chunk, $dates);
    $date1 = (!empty($dates[0][0]) ? $dates[0][0] : "");
    $date2 = (!empty($dates[0][1]) ? $dates[0][1] : "");
    $date3 = (!empty($dates[0][2]) ? $dates[0][2] : "");
    $obj = [];
    $obj['Team'] = $team_arr[$i];
    $obj['ID'] = $temp_obj_arr[0];
    $obj['Type'] = $type;
    $obj['Date 1'] = $date1;
    $obj['Name'] = text_after_date($chunk, $date1);
    $obj['Date 2'] = $date2;
    // Note: for the last record, any trailing text without digits ends up in Obs.
    $obj['Obs'] = text_after_date($chunk, $date2);
    $obj['Date 3'] = $date3;
    $res[] = $obj;
}
// The second argument of json_encode() is a bitmask of flags, not a boolean.
$json_res = json_encode(['data' => $res], JSON_UNESCAPED_UNICODE);
print_r($json_res);
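The same extraction could also be written as a single regular expression with named groups. The following is only a sketch, not part of the solution above. It assumes that a team is one uppercase ASCII letter followed by "ª", that the ID contains no whitespace, that names and observations contain no digits, and that an observation runs until Date 3, the next team marker, or the end of the input. The function name parse_records is hypothetical.

```php
<?php
// Hypothetical helper (not from the original answer): extract all records
// with one regular expression instead of splitting on "ª".
function parse_records(string $text): array
{
    $d = '\d{2}\/\d{2}\/\d{4}'; // dd/mm/yyyy
    $pattern = '/(?<team>[A-Z]ª)\s+(?<id>\S+)\s+(?:(?<type>\([^)]*\))\s+)?'
        . '(?<date1>' . $d . ')\s+(?<name>\D+?)\s+'
        . '(?<date2>' . $d . ')\s+(?<obs>\D+?)'
        // Obs ends at Date 3, at the next team marker, or at the end of input.
        . '(?:\s+(?<date3>' . $d . ')|(?=\s+[A-Z]ª)|$)/u';

    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $records = [];
    foreach ($matches as $m) {
        $records[] = [
            'Team'   => $m['team'],
            'ID'     => $m['id'],
            'Type'   => $m['type'] ?? '',   // optional group
            'Date 1' => $m['date1'],
            'Name'   => trim($m['name']),
            'Date 2' => $m['date2'],
            'Obs'    => trim($m['obs']),
            'Date 3' => $m['date3'] ?? '',  // optional group
        ];
    }
    return $records;
}

$text = "some text Xª 1234567-89.0123.45.6789 (YZ) 01/01/2011 Esbjörn Svensson 02/02/2022 Awesome Trio Wª 0987654-32.1098.76.5432 (KBoo) 07/09/2013 Some Full Name 09/07/2017 Observation 12/12/2018 some text that I don't want to keep Xª 4335678-98.7123.95.5689 09/10/2010 Name Here 08/09/2020 Observation and more text to delete";

echo json_encode(
    ['data' => parse_records($text)],
    JSON_UNESCAPED_UNICODE | JSON_PRETTY_PRINT
);
```

With the sample text, the third record's Obs still includes "and more text to delete", because nothing in the input marks where the observation stops. Separating the two would need an additional rule about what an observation can contain.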