Regex for parsing quoted strings in an OpenEdge data export
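I have a data export from a Progress OpenEdge system that I want to parse in JavaScript. I want to use a regular expression to find all the fields in the export.
I have tried many variations along the lines of: /("[^"]*")|[^\s]+/g
I have also experimented with a negative lookahead, (?!"")
But so far I have had no success.
An export might look like this: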
12345 24,25 0 2015-06-30T14:53:14.891 "12345" "24,25" "0" "2015-06-30T14:53:14.891" "" yes no ? "String with ""quoted"" word" "String
with a multi
line string. "" <- Just a quote
" " This is the last value "
6789 35,36 0 2016-07-31T15:54:15.892 "6789" "35,36" "0" "2016-07-31T15:54:15.892" "" no yes ? "Just a simple string" ? ?
The fields are:
DEFINE TEMP-TABLE tt_test NO-UNDO
FIELD valueA AS INTEGER
FIELD valueB AS DECIMAL
FIELD valueC AS INTEGER
FIELD valueD AS DATETIME
FIELD valueE AS CHARACTER
FIELD valueF AS CHARACTER
FIELD valueG AS CHARACTER
FIELD valueH AS CHARACTER
FIELD valueI AS CHARACTER
FIELD valueJ AS LOGICAL
FIELD valueK AS LOGICAL
FIELD valueL AS LOGICAL
FIELD valueM AS CHARACTER
FIELD valueN AS CHARACTER
FIELD valueO AS CHARACTER
.
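The export format is:
All fields are separated by a space. Strings are enclosed in double-quote characters ("). A quote inside a string is escaped by doubling it (""). An empty string is also written as two double-quote characters (""), but with separator spaces around it.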
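To make these rules concrete, here is a minimal sketch of how a single, already isolated field could be decoded. Splitting the export into fields is the part I am stuck on; `decodeField` is only a hypothetical helper name used for illustration.

```javascript
// decodeField is a hypothetical helper illustrating the quoting rules above.
function decodeField(raw) {
  const quoted = raw.length >= 2 && raw.startsWith('"') && raw.endsWith('"');
  if (!quoted) {
    return raw; // bare value such as a number, a date, yes/no or ?
  }
  // Drop the enclosing quotes, then turn each doubled quote into a single one.
  return raw.slice(1, -1).replace(/""/g, '"');
}

console.log(decodeField('"String with ""quoted"" word"')); // String with "quoted" word
console.log(decodeField('""'));                            // (empty string)
console.log(decodeField('24,25'));                         // 24,25
```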
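The actual data types, and the fact that this is a Progress system, are not important; they are only here to give some background to my question.
To sum up: how can I write a (JavaScript-compatible) regex that successfully separates the different parts of the exported data while ignoring the escaped double quotes inside strings?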
I don't think this can be done with a single regex. You will need a parser. Fortunately, one is easy to write, for example:
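const str = `12345 24,25 0 2015-06-30T14:53:14.891 "12345" "24,25" "0" "2015-06-30T14:53:14.891" "" yes no ? "String with ""quoted"" word" "String
with a multi
line string. "" <- Just a quote
" " This is the last value "
6789 35,36 0 2016-07-31T15:54:15.892 "6789" "35,36" "0" "2016-07-31T15:54:15.892" "" no yes ? "Just a simple string" ? ?`;

// Placeholder for an escaped quote (""); assumed never to occur in the data.
const PLACEHOLDER = '\u0000';

// Step 1: hide every "" so that the remaining quotes only delimit strings.
// Caveat: a field that begins with an escaped quote, such as """x", is not handled.
const masked = str.replace(/""/g, PLACEHOLDER);

// Step 2: tokenize into quoted strings, bare values, and row-ending newlines.
const matches = masked.match(/"[\s\S]*?"|\S+|\n/g) || [];

// Step 3: build the rows, restoring the escaped quotes in each field.
const rows = [[]];
for (let m of matches) {
  if (m === '\n') {
    rows.push([]); // a newline outside a string starts a new row
    continue;
  }
  if (m === PLACEHOLDER) {
    m = ''; // a lone "" is an empty string
  } else if (m[0] === '"') {
    m = m.slice(1, -1); // drop the enclosing quotes
  }
  m = m.split(PLACEHOLDER).join('"');
  rows[rows.length - 1].push(m);
}
console.log(rows);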
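If the data may contain the placeholder character, or a field may begin with an escaped quote (for example """x"), a variation on the code above is to let the token pattern itself treat "" as part of a quoted string, so no placeholder step is needed. The following is a minimal sketch under that assumption, not a tested drop-in for real OpenEdge exports; `parseExport` is a hypothetical name, and dropping empty rows is a simplification.

```javascript
// Sketch of a tokenizer that understands "" inside quoted strings.
// parseExport is a hypothetical name, not an OpenEdge or library API.
function parseExport(text) {
  // A token is one of:
  //   "(?:[^"]|"")*"  a quoted string; "" inside it is an escaped quote
  //   [^\s"]+         a bare value (number, date, yes/no, ?)
  //   \r?\n           a newline outside any string, which ends the row
  const tokenPattern = /"(?:[^"]|"")*"|[^\s"]+|\r?\n/g;
  const rows = [[]];
  for (const [token] of text.matchAll(tokenPattern)) {
    if (token === '\n' || token === '\r\n') {
      rows.push([]);
    } else if (token[0] === '"') {
      // Strip the enclosing quotes, then collapse each "" to a single ".
      rows[rows.length - 1].push(token.slice(1, -1).replace(/""/g, '"'));
    } else {
      rows[rows.length - 1].push(token);
    }
  }
  // Simplification: drop empty rows, e.g. the one left by a trailing newline.
  return rows.filter((row) => row.length > 0);
}

const sample = '1 "a ""b"" c" "" ?\n2 """x" "multi\nline" no';
console.log(parseExport(sample));
// → [ [ '1', 'a "b" c', '', '?' ], [ '2', '"x', 'multi\nline', 'no' ] ]
```

The tokenizing is done by one pattern, but the surrounding loop is still needed to group the tokens into rows and to unescape the quoted fields.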