REGEX_TOO_COMPLEX error when parsing a regular expression

I need to split a CSV file at the commas, but the problem is that the file can contain commas inside fields. For example:

one,two,tree,"four,five","six,seven".

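Fields that contain commas are enclosed in double quotes, and I can't work out how to handle that. I tried something like the following regular expression, but I get an error: REGEX_TOO_COMPLEX.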

data: lv_sep     type string,
      lv_rep_pat type string.
data(lv_row) = iv_row.
"Define a separator to replace commas in double quotes
lv_sep = cl_abap_conv_in_ce=>uccpi( uccp = 10 ).
concatenate '' lv_sep into lv_rep_pat.
"replace all commas that are separator with the new separator
replace all occurrences of regex '(?:"((?:""|[^"]+)+)"|([^,]*))(?:,|$)' in lv_row with lv_rep_pat.

split lv_row at lv_sep into table rt_cells.

I have never worked with ABAP, so please treat this as pseudocode.

I would suggest a non-regex solution here:

data: checkedOffsetComma type i,
checkedOffsetQuotes type i,
baseOffset type i,
testString type string value 'val1, "val2, val21", val3'.

LOOP AT SomeFancyConditionYouDefine.
    checkedOffsetComma = baseOffset.
    checkedOffsetQuotes = baseOffset.
    find FIRST OCCURRENCE OF ','(or end of line here) in testString match OFFSET checkedOffsetComma.
    write checkedOffsetComma.
    find FIRST OCCURRENCE OF '"' in testString match OFFSET checkedOffsetQuotes.
    write checkedOffsetQuotes.
    
    " If the next comma comes before the next quote, it is a real separator
    IF checkedOffsetComma < checkedOffsetQuotes.
        REPLACE SECTION OFFSET checkedOffsetComma LENGTH 1 OF testString WITH lv_rep_pat.
        baseOffset = checkedOffsetComma + 1.
    ELSE.
        " If a quote comes first, look for the closing quote and continue after that position
        find FIRST OCCURRENCE OF '"' in testString match OFFSET checkedOffsetQuotes.
        baseOffset = checkedOffsetQuotes + 1.
    ENDIF.
ENDLOOP.

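This assumes there are no quotes inside quoted fields. It is untested and not verified in any way. I'd be happy if it at least partially compiles :)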
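For comparison, here is a minimal sketch of the same non-regex idea as a single pass over the row. It does not search for offsets as the pseudocode above does. It walks the string character by character and toggles an "inside quotes" flag. The report name and all variable names are hypothetical, and the inline declarations assume ABAP 7.40 or later.

```abap
REPORT z_csv_split_sketch.

" Hypothetical demo: split one CSV row at commas that are outside double quotes.
DATA(lv_row) = `one,two,tree,"four,five","six,seven"`.
DATA lt_cells     TYPE string_table.
DATA lv_cell      TYPE string.
DATA lv_char      TYPE string.
DATA lv_in_quotes TYPE abap_bool VALUE abap_false.
DATA lv_pos       TYPE i.

DATA(lv_len) = strlen( lv_row ).

WHILE lv_pos < lv_len.
  lv_char = substring( val = lv_row off = lv_pos len = 1 ).
  IF lv_char = `"`.
    " Toggle the quote state; the quote itself stays in the cell
    IF lv_in_quotes = abap_true.
      lv_in_quotes = abap_false.
    ELSE.
      lv_in_quotes = abap_true.
    ENDIF.
    lv_cell = lv_cell && lv_char.
  ELSEIF lv_char = `,` AND lv_in_quotes = abap_false.
    " A comma outside quotes ends the current cell
    APPEND lv_cell TO lt_cells.
    CLEAR lv_cell.
  ELSE.
    lv_cell = lv_cell && lv_char.
  ENDIF.
  lv_pos = lv_pos + 1.
ENDWHILE.

" The last cell is not followed by a comma, so append it explicitly
APPEND lv_cell TO lt_cells.

LOOP AT lt_cells INTO DATA(lv_out).
  WRITE / lv_out.
ENDLOOP.
```

For the sample row this should produce five cells, with the two quoted cells still carrying their double quotes. Because a doubled quote toggles the flag twice, a `""` inside a quoted field leaves the state unchanged, so this sketch should also tolerate that form of escaping.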

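You need to use this regular expression: ,(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)

It matches only those commas that are followed by an even number of double quotes up to the end of the line, that is, commas outside quoted fields.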

DATA: lv_sep     TYPE string,
      lv_rep_pat TYPE string.
DATA(lv_row) = 'one,two,tree,"four,five","six,seven"'.
"Define a separator to replace commas in double quotes
lv_sep = cl_abap_conv_in_ce=>uccpi( uccp = 10 ).
CONCATENATE '' lv_sep INTO lv_rep_pat.
"replace all commas that are separator with the new separator
REPLACE ALL OCCURRENCES OF REGEX ',(?=(?:[^\"]*\"[^\"]*\")*[^\"]*$)' IN lv_row WITH lv_rep_pat.

SPLIT lv_row AT lv_sep INTO TABLE data(rt_cells).

LOOP AT rt_cells into data(cells).
  WRITE cells.
  SKIP.

ENDLOOP.

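Note that after the split, quoted cells such as "four,five" still carry their enclosing double quotes. If those are not wanted, a follow-up step along these lines could strip them. This is only a sketch: the report name and the sample table are hypothetical, and it assumes the common CSV convention that a doubled quote inside a quoted field stands for one literal quote.

```abap
REPORT z_csv_unquote_sketch.

" Hypothetical sample cells, shaped like the result of the SPLIT above
DATA(lt_cells) = VALUE string_table( ( `tree` )
                                     ( `"four,five"` )
                                     ( `"say ""hi"""` ) ).

LOOP AT lt_cells ASSIGNING FIELD-SYMBOL(<lv_cell>).
  " Drop one pair of enclosing quotes if the whole cell is quoted
  REPLACE FIRST OCCURRENCE OF REGEX '^"(.*)"$' IN <lv_cell> WITH '$1'.
  IF sy-subrc = 0.
    " Inside a quoted field, a doubled quote stands for one literal quote
    REPLACE ALL OCCURRENCES OF `""` IN <lv_cell> WITH `"`.
  ENDIF.
  WRITE / <lv_cell>.
ENDLOOP.
```

Unquoted cells are left untouched because the first REPLACE finds no match and sets sy-subrc to a non-zero value.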