Extract a string from a CSV cell containing special characters in Python
I am writing a Python program to extract a specific value from each cell in a .CSV file column, and then put all the extracted values into a new column.
A sample column cell (this is actually only a small portion; the real cell contains much more data):
AudioStreams":[{"JitterInterArrival":10,"JitterInterArrivalMax":24,"PacketLossRate":0.01353227,"PacketLossRateMax":0.09027778,"BurstDensity":null,"BurstDuration":null,"BurstGapDensity":null,"BurstGapDuration":null,"BandwidthEst":25245423,"RoundTrip":520,"RoundTripMax":11099,"PacketUtilization":2843,"RatioConcealedSamplesAvg":0.02746676,"ConcealedRatioMax":0.01598402,"PayloadDescription":"SIREN","AudioSampleRate":16000,"AudioFECUsed":true,"SendListenMOS":null,"OverallAvgNetworkMOS":3.487248,"DegradationAvg":0.2727518,"DegradationMax":0.2727518,"NetworkJitterAvg":253.0633,"NetworkJitterMax":1149.659,"JitterBufferSizeAvg":220,"JitterBufferSizeMax":1211,"PossibleDataMissing":false,"StreamDirection":"FROM-to-
One value I want to extract is the number 10 between "JitterInterArrival": and ,"JitterInterArrivalMax". But because each cell contains a relatively long string with special characters (such as "") around the target, opener=re.escape(r"***") and closer=re.escape(r"***") will not work.
Does anyone know a better solution? Thanks a lot!
IIUC, you have a JSON string and want to get the values of its properties. So, given
s = '''
{"AudioStreams":[{"JitterInterArrival":10,"JitterInterArrivalMax":24,"PacketLossRate":0.01353227,"PacketLossRateMax":0.09027778,"BurstDensity":null,
"BurstDuration":null,"BurstGapDensity":null,"BurstGapDuration":null,"BandwidthEst":25245423,"RoundTrip":520,"RoundTripMax":11099,"PacketUtilization":2843,"RatioConcealedSamplesAvg":0.02746676,"ConcealedRatioMax":0.01598402,"PayloadDescription":"SIREN","AudioSampleRate":16000,"AudioFECUsed":true,"SendListenMOS":null,"OverallAvgNetworkMOS":3.487248,"DegradationAvg":0.2727518,
"DegradationMax":0.2727518,"NetworkJitterAvg":253.0633,
"NetworkJitterMax":1149.659,"JitterBufferSizeAvg":220,"JitterBufferSizeMax":1211,
"PossibleDataMissing":false}]}
'''
you can do
>>> import json
>>> data = json.loads(s)
>>> ji = data['AudioStreams'][0]['JitterInterArrival']
>>> ji
10
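Note that the sample cell in the question begins at AudioStreams": with the opening {" apparently cut off. If the real CSV cells are truncated like that, json.loads will raise a JSONDecodeError, so a small repair step may be needed first. A sketch, under the assumption that only the leading {" is missing (the cell text here is a shortened stand-in, not the full data):

```python
import json

# Hypothetical cell text as it might come out of the CSV: the leading '{"' is missing.
cell = 'AudioStreams":[{"JitterInterArrival":10,"JitterInterArrivalMax":24}]}'

# Re-add the opening characters before parsing; adjust this to match your actual data.
if not cell.startswith('{"'):
    cell = '{"' + cell

data = json.loads(cell)
value = data["AudioStreams"][0]["JitterInterArrival"]  # 10
```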
In a dataframe scenario, if you have a column col of strings such as the above, e.g.
import pandas as pd
df = pd.DataFrame({"col": [s]})
you can use transform, passing json.loads as the argument:
df.col.transform(json.loads)
to get a Series of dictionaries. You can then manipulate those dictionaries, or access the data, as above.
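To cover the original goal of putting every extracted value into a new column, the parsed Series can then be mapped with apply. A minimal runnable sketch (the sample string and the new column name JitterInterArrival are my choices for illustration, not from the question):

```python
import json
import pandas as pd

# Minimal stand-in for the real CSV data: one cell of JSON text.
s = '{"AudioStreams":[{"JitterInterArrival":10,"JitterInterArrivalMax":24}]}'
df = pd.DataFrame({"col": [s]})

# Parse every cell into a dict, then pull the field of interest into a new column.
parsed = df["col"].transform(json.loads)
df["JitterInterArrival"] = parsed.apply(
    lambda d: d["AudioStreams"][0]["JitterInterArrival"]
)
```

The same pattern extends to any number of fields: one apply per new column, each indexing into the parsed dictionaries.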