Regular expression repeating pattern
I am trying to use a regular expression to capture groups of data from the log below. The pattern is
<item> : <key> = <value> , <key> = <value>, ..., <key> = <value>
([#\w\d]*?)[\s]*=[\s]*([.\w\d]*)
can capture the <key> group and the <value> group, but I also want to capture the <item> group, so I grouped the above and repeated it with {n}:
([\w]*):([\s]*(([#\w\d]*?)[\s]*=[\s]*([.\w\d]*)),*){1,}
20141207,07:15:52,0,>>RATIO: casher#=30, Value=2.579,Units=ratio,Error=N
20141207,07:15:52,0,>>RATIO: casher#=31, Value=4.509,Units=ratio,Error=N
20141207,07:15:52,0,>>RATIO: casher#=32, Value=3.735,Units=ratio,Error=N
20141207,07:15:52,0,>>RATIO: casher#=33, Value=2.401,Units=ratio,Error=N
20141207,07:15:52,0,>>CUSTOMER: casher#=30, Value=50,Units= count
20141207,07:15:52,0,>>CUSTOMER: casher#=31, Value=6,Units= count
20141207,07:15:52,0,>>CUSTOMER: casher#=32, Value=88,Units= count
20141207,07:15:52,0,>>CUSTOMER: casher#=33, Value=33,Units= count
Obviously the result is not what I expected. Can anyone give me some hints? I am using Python and will eventually turn this into code. Thanks.
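A minimal sketch of why the repeated-group attempt falls short, assuming Python's standard `re` module: when a capturing group sits inside a quantifier, the match object keeps only the text from that group's last repetition. The variable names (`line`, `repeated`, `pairs`) are illustrative and not from the original post.

```python
import re

line = "20141207,07:15:52,0,>>RATIO: casher#=30, Value=2.579,Units=ratio,Error=N"

# The pattern from the question: key and value groups nested in a repeated group.
repeated = re.compile(r"([\w]*):([\s]*(([#\w\d]*?)[\s]*=[\s]*([.\w\d]*)),*){1,}")

m = repeated.search(line)
print(m.group(1))              # RATIO
print(m.group(4), m.group(5))  # Error N  (only the last repetition survives)

# Matching the pair pattern on its own returns every pair instead.
pairs = re.findall(r"([#\w]+)\s*=\s*([.\w]+)", line)
print(pairs)
```

So the item is captured, but the earlier key/value pairs are overwritten; collecting all pairs needs a separate pass or a pattern that is applied repeatedly.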
(?<=>>)(\w+):|([\w#]+)\s*=\s*(\S+?)(?:,|\s)
Try this. Grab the captures. See the demo.
https://regex101.com/r/fA6wE2/1
NODE                     EXPLANATION
--------------------------------------------------------------------------------
  (?<=                     look behind to see if there is:
--------------------------------------------------------------------------------
    >>                       '>>'
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  :                        ':'
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  (                        group and capture to \2:
--------------------------------------------------------------------------------
    [\w#]+                   any character of: word characters (a-z,
                             A-Z, 0-9, _), '#' (1 or more times
                             (matching the most amount possible))
--------------------------------------------------------------------------------
  )                        end of \2
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  =                        '='
--------------------------------------------------------------------------------
  \s*                      whitespace (\n, \r, \t, \f, and " ") (0 or
                           more times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  (                        group and capture to \3:
--------------------------------------------------------------------------------
    \S+?                     non-whitespace (all but \n, \r, \t, \f,
                             and " ") (1 or more times (matching the
                             least amount possible))
--------------------------------------------------------------------------------
  )                        end of \3
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    ,                        ','
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    \s                       whitespace (\n, \r, \t, \f, and " ")
--------------------------------------------------------------------------------
  )                        end of grouping
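Since the question mentions Python, here is a minimal sketch of how the pattern above might be applied with `re.finditer`. It assumes each match is either an item (group 1) or a key/value pair (groups 2 and 3), and attaches every pair to the most recently seen item. The names `log`, `pattern` and `records` are illustrative, and the sample is shortened to two lines of the log.

```python
import re

log = """\
20141207,07:15:52,0,>>RATIO: casher#=30, Value=2.579,Units=ratio,Error=N
20141207,07:15:52,0,>>CUSTOMER: casher#=30, Value=50,Units= count
"""

# Group 1 is the item; groups 2 and 3 are a key and its value.
pattern = re.compile(r"(?<=>>)(\w+):|([\w#]+)\s*=\s*(\S+?)(?:,|\s)")

records = []  # one (item, {key: value}) tuple per log record
for m in pattern.finditer(log):
    if m.group(1) is not None:
        # First alternative matched: start a new record for this item.
        records.append((m.group(1), {}))
    elif records:
        # Second alternative matched: add the pair to the current record.
        records[-1][1][m.group(2)] = m.group(3)

for item, pairs in records:
    print(item, pairs)
```

Note that the value must be followed by a comma or whitespace, so the last line of the input needs a trailing newline. Also, a pair on a line that has no item would be attached to whichever item came before it.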
Your file is a CSV file, so you can use the csv module to make your life easier:
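import csv

# newline='' is what the csv module expects for files opened in text mode
with open('data.txt', newline='') as f:
    for row in csv.reader(f, delimiter=','):
        if row:  # skip blank lines
            # row[3] looks like '>>RATIO: casher#=30'
            item, key_and_val = row[3].split(':')
            item = item[2:]  # drop the leading '>>'
            key, val = key_and_val.split('=')
            print(item)
            print(' {} => {}'.format(key.strip(), val.strip()))
            # the remaining fields are plain 'key=value' pairs
            for key_and_val in row[4:]:
                key, val = key_and_val.split('=')
                print(' {} => {}'.format(key.strip(), val.strip()))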
--output:--
RATIO
 casher# => 30
 Value => 2.579
 Units => ratio
 Error => N
RATIO
 casher# => 31
 Value => 4.509
 Units => ratio
 Error => N
RATIO
 casher# => 32
 Value => 3.735
 Units => ratio
 Error => N
RATIO
 casher# => 33
 Value => 2.401
 Units => ratio
 Error => N
CUSTOMER
 casher# => 30
 Value => 50
 Units => count
CUSTOMER
 casher# => 31
 Value => 6
 Units => count
CUSTOMER
 casher# => 32
 Value => 88
 Units => count
CUSTOMER
 casher# => 33
 Value => 33
 Units => count
Your pattern also matches key=value pairs even when the "item :" part does not exist. Is there any advanced technique to exclude those key = value lines?
The following will skip lines that have no item:
import csv

with open('data.txt', newline='') as f:
    for row in csv.reader(f, delimiter=','):
        # Only handle rows whose fourth field carries an item, e.g. '>>RATIO: ...'
        if len(row) > 3 and row[3].startswith('>>'):
            item, key_and_val = row[3].split(': ')
            item = item[2:]  # drop the leading '>>'
            key, val = key_and_val.split('=')
            print(item)
            print(' {} => {}'.format(key.strip(), val.strip()))
            for key_and_val in row[4:]:
                key, val = key_and_val.split('=')
                print(' {} => {}'.format(key.strip(), val.strip()))
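For a regex-only way to leave out key = value lines that have no item, one possible sketch is a two-step match: first require `>>ITEM:` on the line, then extract pairs only from the text that follows it. This is not from the original answers; the names `ITEM_RE`, `PAIR_RE` and `parse_line` are hypothetical, and the pair pattern assumes values contain no commas or whitespace.

```python
import re

# Step 1: a line only qualifies if it contains '>>ITEM:'.
ITEM_RE = re.compile(r">>(\w+):(.*)")
# Step 2: key = value pairs, where a value runs up to the next comma or space.
PAIR_RE = re.compile(r"([\w#]+)\s*=\s*([^,\s]+)")


def parse_line(line):
    """Return (item, {key: value}) for a line with an item, else None."""
    m = ITEM_RE.search(line)
    if m is None:
        return None  # no '>>ITEM:' on this line, so its pairs are ignored
    return m.group(1), dict(PAIR_RE.findall(m.group(2)))


with_item = parse_line(
    "20141207,07:15:52,0,>>CUSTOMER: casher#=30, Value=50,Units= count"
)
without_item = parse_line("Value=2.579,Units=ratio,Error=N")
print(with_item)
print(without_item)
```

Working line by line keeps the two patterns simple and avoids relying on a single expression to track which item a pair belongs to.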