Parsing a table-like string into a JavaScript object

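The string below is structured like a human-readable table with three columns. The only information I need, however, is the list of all values in the first column.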

app115                                115.115                              winget
app225                                115.115Chrome                        winget
Knotes                                1MHz.Knotes                          winget
BPMN-RPA Studio                       1ic.BPMN-RPAstudio                   winget
Fishing Funds                         1zilc.FishingFunds                   winget
3601                                  360.360Chrome                        winget
3602                                  360.360Chrome.X                      winget
3603                                  360.360CleanMaster                   winget
3604                                  360.360se                            winget
3CX Call Flow Designer (.exe edition) 3CX.3CXCallFlowDesigner              winget

Using JavaScript, how would I parse this string to get a result like this:

['app115', 'app225', 'Knotes', 'BPMN-RPA Studio', 'Fishing Funds', '3601', '3602', '3603', '3604', '3CX Call Flow Designer (.exe edition)']

Here are some ideas that I could not get to work:

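Step 1: since the last two columns are not needed, first replace 'winget' with an empty string: string1.replaceAll("winget", ""). This removes the entire rightmost column, because every value in that column is 'winget'.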

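Step 2: remove every run of characters that is surrounded by two or more spaces on each side. This should get rid of the whole second column, since each of its values has at least two spaces on either side. However, this will not work: if a value in the first column is too long, the second-column value may have only a single space next to it. See the last line of the original string.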

Final step: once the string looks like "app115 app225 Knotes BPMN-RPA Studio Fishing Funds...", use string.split(" ") to form an array.

I hope my question makes sense. Thanks for your help.
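The problem with step 2 can be seen by splitting each row on runs of two or more spaces and keeping the first piece. This is only an illustrative sketch of a variant of that idea, not a fix. The names `row`, `rows` and `firstCells` are made up for the example, and the column widths 38 and 37 are assumptions read off the sample data.

```javascript
// Rebuild two rows of the sample with padEnd so the spacing is explicit.
// Widths 38 and 37 are assumed from the sample table.
const row = (name, id, source) => name.padEnd(38) + id.padEnd(37) + source;

const rows = [
  row('app115', '115.115', 'winget'),
  row('3CX Call Flow Designer (.exe edition)', '3CX.3CXCallFlowDesigner', 'winget'),
];

// Split on two or more whitespace characters and keep the first piece.
const firstCells = rows.map(line => line.split(/\s{2,}/)[0]);

console.log(firstCells[0]); // "app115"
// The long name is followed by a single space, so the second column sticks to it:
console.log(firstCells[1]); // "3CX Call Flow Designer (.exe edition) 3CX.3CXCallFlowDesigner"
```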

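This works for me.

It assumes any kind of whitespace, including tabs, between the columns.

\s+\w+\. matches the run of whitespace before a word that is followed by a period, i.e. the gap in front of the second column plus the start of its value.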

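// `table` is defined in the <script> block below
const lines = table.split(/\r?\n/)  // one entry per row
// keep only the text before the first "whitespace + word + period"
const column1 = lines.map(line => line.split(/\s+\w+\./)[0])
console.log(column1)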
<script>
const table = `app115                                115.115                              winget
app225                                115.115Chrome                        winget
Knotes                                1MHz.Knotes                          winget
BPMN-RPA Studio                       1ic.BPMN-RPAstudio                   winget
Fishing Funds                         1zilc.FishingFunds                   winget
3601                                  360.360Chrome                        winget
3602                                  360.360Chrome.X                      winget
3603                                  360.360CleanMaster                   winget
3604                                  360.360se                            winget
3CX Call Flow Designer (.exe edition) 3CX.3CXCallFlowDesigner              winget`
</script>
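To make clearer what the split does, and where it might trip, here is a small sketch. The helper name `splitFirstColumn` is made up for the example. The first row is taken from the sample with the spacing shortened. The second row is hypothetical and not part of the original data: it shows that a name containing something like a version number would presumably be cut short, because " 2." also matches \s+\w+\.

```javascript
// Hypothetical helper wrapping the split used in this answer.
const splitFirstColumn = line => line.split(/\s+\w+\./)[0];

// Sample row (spacing shortened): the first "whitespace + word + period"
// is the gap in front of "1MHz.", so the whole name survives.
const ok = splitFirstColumn('Knotes    1MHz.Knotes    winget');
console.log(ok); // "Knotes"

// Hypothetical row, not in the sample: " 2." inside the name matches first,
// so everything from there on is dropped.
const truncated = splitFirstColumn('Some Tool 2.0    vendor.SomeTool    winget');
console.log(truncated); // "Some Tool"
```

If names like that can occur in the data, a fixed-width approach is probably the safer choice.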

If column 1 is always 38 characters wide, then

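// `table` is defined in the <script> block below
const lines = table.split(/\r?\n/)  // one entry per row
// take the first 38 characters of each row and drop the padding
const column1 = lines.map(line => line.slice(0,38).trim())
console.log(column1)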
<script>
const table = `app115                                115.115                              winget
app225                                115.115Chrome                        winget
Knotes                                1MHz.Knotes                          winget
BPMN-RPA Studio                       1ic.BPMN-RPAstudio                   winget
Fishing Funds                         1zilc.FishingFunds                   winget
3601                                  360.360Chrome                        winget
3602                                  360.360Chrome.X                      winget
3603                                  360.360CleanMaster                   winget
3604                                  360.360se                            winget
3CX Call Flow Designer (.exe edition) 3CX.3CXCallFlowDesigner              winget`
</script>
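The same slicing idea can be extended to any column once its offset and width are known. The following is a minimal sketch under that assumption. `sliceColumn` and `sample` are hypothetical names, and the widths 38 and 37 are read off the sample table rather than given anywhere.

```javascript
// Hypothetical helper: cut the same character range out of every line
// and trim the padding around the value.
function sliceColumn(text, offset, width) {
  return text
    .split(/\r?\n/)
    .map(line => line.slice(offset, offset + width).trim());
}

// Two rows of the sample, rebuilt with padEnd (assumed widths: 38 and 37).
const sample = [
  ['app115', '115.115', 'winget'],
  ['3CX Call Flow Designer (.exe edition)', '3CX.3CXCallFlowDesigner', 'winget'],
].map(([name, id, source]) => name.padEnd(38) + id.padEnd(37) + source).join('\n');

const names = sliceColumn(sample, 0, 38);  // first column
const ids = sliceColumn(sample, 38, 37);   // second column

console.log(names); // [ 'app115', '3CX Call Flow Designer (.exe edition)' ]
console.log(ids);   // [ '115.115', '3CX.3CXCallFlowDesigner' ]
```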

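This regex extracts a fixed-width first column (38 characters here).
It is a template that can be modified to get any column.
It also trims leading and trailing whitespace: (?<=^\s*(?!\s)).{1,38}(?<!\s)(?<=^.{1,38})|^(?=\s{38})
It is a single operation, and as a template it only works in engines that support variable-length lookbehind, such as JS and C#.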

The regex is no more complicated than a typical combined password-validation regex.

  (?<=               # Alignment using a look behind assertion
    ^ \s*              # Beginning of line, optional ws
    (?! \s )           # Not a ws forward
  )
  .{1,38}            # 1-38 characters width column
  (?<! \s )          # Look behind assertion for trailing ws trim
  (?<= ^ .{1,38} )   # Look behind assertion to fix overall length to 38
| 
  ^                  # Or the entire column is WS
  (?= \s{38} )       # Check with look ahead assertion

const column1 = table.match( /(?<=^\s*(?!\s)).{1,38}(?<!\s)(?<=^.{1,38})|^(?=\s{38})/gm )
console.log(column1)
<script>
const table = `app115                                115.115                              winget
app225                                115.115Chrome                        winget
   Knotes                             1MHz.Knotes                          winget
BPMN-RPA Studio                       1ic.BPMN-RPAstudio                   winget
Fishing Funds                         1zilc.FishingFunds                   winget
3601                                  360.360Chrome                        winget
                                      1zilc.FishingFunds                   winget
3602                                  360.360Chrome.X                      winget
3603                                  360.360CleanMaster                   winget
3604                                  360.360se                            winget
3CX Call Flow Designer (.exe edition) 3CX.3CXCallFlowDesigner              winget
    Call Flow Designer (.exe edition) 3CX.3CXCallFlowDesigner              winget
         Flow Designer (.exe edition) 3CX.3CXCallFlowDesigner              winget`
</script>

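To generalize the above to any column, only the column offset (in characters)
and the column width are needed. These can be plugged into this regex template:
(?:(?<=^.{N}\s*(?!\s)).{1,W}(?<!\s)(?<=^.{N}.{1,W})|(?<=^.{N})(?=\s{W}))
where N is the offset to the column and W is the width of the column.
In the linked example, N = 10 and W = 38.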

(?:
  (?<=               # Alignment using a look behind assertion
    ^ .{N} \s*         # Offset to column and leading ws trim
    (?! \s )           # Not a ws forward
  )
  .{1,W}             # 1 - width column characters
  (?<! \s )          # Not a ws behind for trailing ws trim
  (?<=               # Behind col offset and 1 - width, to fix overall length
    ^ .{N} .{1,W}   
  )
|                   # Or the entire column is WS
  (?<= ^ .{N} )      # Alignment behind offset to column
  (?= \s{W} )        # Ahead ensure entire column is ws
)
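In JavaScript the template can be filled in with `new RegExp`, since N and W have to be substituted into the pattern text. The sketch below is one possible way to do that. `columnRegex` and `text` are hypothetical names, and the offsets and widths (0/38, 38/37, 75/6) are assumptions taken from the sample table, not from the linked example. It needs an engine with lookbehind support.

```javascript
// Hypothetical helper: build the template above for a given offset N and width W.
function columnRegex(offset, width) {
  const skip = `.{${offset}}`; // the ".{N}" part of the template
  return new RegExp(
    `(?:(?<=^${skip}\\s*(?!\\s)).{1,${width}}(?<!\\s)(?<=^${skip}.{1,${width}})` +
    `|(?<=^${skip})(?=\\s{${width}}))`,
    'gm' // g: one match per line, m: ^ matches at every line start
  );
}

// Three rows of the sample, rebuilt with padEnd (assumed widths: 38 and 37).
const text = [
  ['app115', '115.115', 'winget'],
  ['Knotes', '1MHz.Knotes', 'winget'],
  ['3CX Call Flow Designer (.exe edition)', '3CX.3CXCallFlowDesigner', 'winget'],
].map(([name, id, source]) => name.padEnd(38) + id.padEnd(37) + source).join('\n');

const col1 = text.match(columnRegex(0, 38));
const col2 = text.match(columnRegex(38, 37));
const col3 = text.match(columnRegex(75, 6));

console.log(col1); // [ 'app115', 'Knotes', '3CX Call Flow Designer (.exe edition)' ]
console.log(col2); // [ '115.115', '1MHz.Knotes', '3CX.3CXCallFlowDesigner' ]
console.log(col3); // [ 'winget', 'winget', 'winget' ]
```

As in the first-column version, a cell that is entirely whitespace shows up as an empty string in the result.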