Combine 2 regex patterns to get a substring
I have a text file that I want to parse ("sanitise"). Sample data from the file:
Trade '4379160'\Acquire Day 2015-05-07 Create acquire_day
Trade '4379160'\Fund XXXY Create acquirer_ptynbr
Trade '4379160'\Assinf Create assinf
Trade '4379160'\Authorizer Create authorizer_usrnbr
Trade '4379160'\Base Curr Equivalent 0 Create base_cost_dirty
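What I want to achieve is to get the first 2 "fields" after the first backslash, e.g. Acquire Day 2015-05-07. Note that the second field is sometimes empty (that is fine; I don't need any of the Create strings). What I have done is use RegEx to first find everything after the backslash and then pick out the 2 required fields. My test code so far: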
Private Sub SanitiseTradeAudit(fileInput)
    Dim objFSO, objFile, regEx, validTxt, validTxt1, arrValidTxt, i
    Set objFSO = CreateObject("Scripting.FileSystemObject")
    Set objFile = objFSO.OpenTextFile(fileInput, 1)
    validTxt = objFile.ReadAll
    objFile.Close
    Set objFile = Nothing
    Set regEx = New RegExp
    regEx.Pattern = "(.*)\'\(.*)" 'To Remove all [[ Trade '4379160'\ ]] prefix from audit lines
    regEx.Global = True
    validTxt = regEx.Replace(validTxt, "") 'Text would be ==> Aggregate 0 Create aggregate
    regEx.Pattern = "[(\t.*)](\t.*)" 'Pick only first 2 data points ==> Aggregate 0
    regEx.Global = True
    validTxt1 = regEx.Replace(validTxt, vbCr)
    arrValidTxt = Split(validTxt1, vbCrLf) 'To Remove the first 2 header lines, split it based on new line
    Set objFile = objFSO.OpenTextFile(fileInput, 2)
    For i = 2 To (Ubound(arrValidTxt) - 1) 'Ignore first 2 header lines
        objFile.WriteLine arrValidTxt(i)
    Next
    objFile.Close
    Set objFile = Nothing
    Set regEx = Nothing
    Set objFSO = Nothing
End Sub
Call SanitiseTradeAudit("C:\Users\pankaj.jaju\Desktop\ActualAuditMessage.txt")
My question is: can this regex replacement be done in a single pattern?
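If you process the file line by line, a pattern like this should work:
^.*?\\([^\t]*)\t([^\t]*)
The above matches everything up to the first backslash (non-greedy match; the backslash itself is escaped as \\), followed by two groups of zero or more non-tab characters (greedy match) separated by a single tab. Because both groups use *, an empty second field still matches.
Sample code: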
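Set objFSO = CreateObject("Scripting.FileSystemObject")
Set re = New RegExp
re.Pattern = "^.*?\\([^\t]*)\t([^\t]*)"
' Read the whole file first, then reopen the same file for writing (2 = ForWriting)
txt = objFSO.OpenTextFile(fileInput).ReadAll
Set objFile = objFSO.OpenTextFile(fileInput, 2)
For Each line In Split(txt, vbNewLine)
    ' Lines that do not match (e.g. header lines) produce no output
    For Each m In re.Execute(line)
        objFile.WriteLine m.SubMatches(0) & vbTab & m.SubMatches(1)
    Next
Next
objFile.Close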
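To check what the two groups capture before touching a real file, the pattern can be tried on individual lines. The following is only a minimal sketch: the helper name FirstTwoFields is made up for illustration, and the sample lines assume that the fields after the backslash are separated by single tab characters, as the pattern requires. It is meant to be run with cscript.exe.

```vbscript
Option Explicit

' Hypothetical helper (not part of the code above): returns the first two
' tab-separated fields after the first backslash, joined by a tab, or ""
' when the line does not match (for example a header line).
Function FirstTwoFields(line)
    Dim re, matches
    Set re = New RegExp
    re.Pattern = "^.*?\\([^\t]*)\t([^\t]*)"
    Set matches = re.Execute(line)
    If matches.Count = 0 Then
        FirstTwoFields = ""
    Else
        FirstTwoFields = matches(0).SubMatches(0) & vbTab & matches(0).SubMatches(1)
    End If
End Function

Dim fullLine, emptySecond
' Assumed layout: Trade '...'\<field1> TAB <field2> TAB Create TAB <name>
fullLine = "Trade '4379160'\Acquire Day" & vbTab & "2015-05-07" & vbTab & "Create" & vbTab & "acquire_day"
emptySecond = "Trade '4379160'\Assinf" & vbTab & vbTab & "Create" & vbTab & "assinf"

WScript.Echo FirstTwoFields(fullLine)      ' Acquire Day, a tab, 2015-05-07
WScript.Echo FirstTwoFields(emptySecond)   ' Assinf followed by a tab (empty second field)
WScript.Echo FirstTwoFields("Header line") ' empty string, no backslash so no match
```

Returning an empty string for non-matching lines is a choice made for this sketch; the loop above simply writes nothing for such lines.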
If you need to process large files, I would drop ReadAll entirely and read the input file line by line to avoid memory exhaustion:
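Set objFSO = CreateObject("Scripting.FileSystemObject")
Set re = New RegExp
re.Pattern = "^.*?\\([^\t]*)\t([^\t]*)"
' fileOutput must be a different path from fileInput, since both files are open at once
Set inFile = objFSO.OpenTextFile(fileInput)
Set outFile = objFSO.OpenTextFile(fileOutput, 2, True) ' 2 = ForWriting, True = create if missing
Do Until inFile.AtEndOfStream
    line = inFile.ReadLine
    For Each m In re.Execute(line)
        outFile.WriteLine m.SubMatches(0) & vbTab & m.SubMatches(1)
    Next
Loop
inFile.Close
outFile.Close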
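As for doing the whole thing as one Replace call on the text returned by ReadAll, which is what the question literally asks, something along the following lines might work. This is a hedged sketch rather than a tested drop-in: the function name KeepFirstTwoFields is invented here, the fields are assumed to be tab-separated, and the pattern is a variant of the one above that uses negated character classes (instead of the non-greedy .*?) so that no part of a match can run across a line break.

```vbscript
Option Explicit

' Hypothetical helper: rewrites every matching line of a multi-line string so
' that only the first two tab-separated fields after the first backslash remain.
' Lines without a backslash (e.g. header lines) are left unchanged.
Function KeepFirstTwoFields(text)
    Dim re
    Set re = New RegExp
    re.Global = True      ' replace on every line, not only the first match
    re.Multiline = True   ' let ^ match at the start of each line
    ' prefix up to the first backslash, field 1, tab, field 2, rest of the line
    re.Pattern = "^[^\\\r\n]*\\([^\t\r\n]*)\t([^\t\r\n]*)[^\r\n]*"
    KeepFirstTwoFields = re.Replace(text, "$1" & vbTab & "$2")
End Function

Dim sample
sample = "Trade '4379160'\Acquire Day" & vbTab & "2015-05-07" & vbTab & "Create" & vbTab & "acquire_day" & vbCrLf & _
         "Trade '4379160'\Assinf" & vbTab & vbTab & "Create" & vbTab & "assinf"

' Each line is reduced to its first two fields; the line break is kept.
WScript.Echo KeepFirstTwoFields(sample)
```

Unlike the Execute loops, this keeps non-matching lines in the output, so header lines would still have to be dropped separately.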