Double For Looping (over DataFrame & List)
I have this test2 DataFrame:
   manufacturer  condition  fuel      drive  cylinders     description
0  ford          excellent  gas       rwd    NaN           ford in excellent condition. 4 cylinders
1  cadillac      NaN        NaN       NaN    4 cylinders   4 cylinders. Half-new cadillac. Diesel.
2  NaN           new        diesel    fwd    12 cylinders  Ford, diesel, new condition.
3  NaN           NaN        electric  NaN    10 cylinders  Ferrari, excellent condition. 4wd
4  ferrari       NaN        NaN       4wd    NaN           New ferrari. Electric with 12 cylinders.
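I want to iterate over the DataFrame and fill the NaN values in each column using information from the 'description' column. To do that, I did this: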
import re

# Raw strings keep backslashes such as \S and \d intact.
# Alternatives inside a group are separated by |, with no surrounding spaces.
manufacturer = r'(ford|cadillac|ferrari)'
condition = r'(excellent|good|fair|like new|salvage|new)'
fuel = r'(gas|hybrid|diesel|electric)'
drive = r'(\S*wd)'
cylinders = r'(\d+\s+cylinders?)'

test2['manufacturer'] = test2['manufacturer'].fillna(
    test2['description'].str.extract(manufacturer, flags=re.IGNORECASE, expand=False)).str.lower()
test2['condition'] = test2['condition'].fillna(
    test2['description'].str.extract(condition, flags=re.IGNORECASE, expand=False)).str.lower()
test2['fuel'] = test2['fuel'].fillna(
    test2['description'].str.extract(fuel, flags=re.IGNORECASE, expand=False)).str.lower()
test2['drive'] = test2['drive'].fillna(
    test2['description'].str.extract(drive, flags=re.IGNORECASE, expand=False)).str.lower()
test2['cylinders'] = test2['cylinders'].fillna(
    test2['description'].str.extract(cylinders, flags=re.IGNORECASE, expand=False)).str.lower()
test2
But that doesn't look very good, so I tried to write a for loop to simplify the code:
columns = [manufacturer, condition, fuel, drive, cylinders]
for i in test2:
    for column in columns:
        test2[i] = test2[i].fillna(
            test2['description'].str.extract(column, flags=re.IGNORECASE, expand=False)).str.lower()
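No matter what I try, it keeps giving me errors. It loops fine over 'i' in test2, but as soon as it starts looping over the list 'columns', the loop raises an error...
Any idea how to solve this?
Thanks!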
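The nested loops apply every pattern to every column, so each column is processed several times. Each column should be processed only once, with its own pattern. Use the zip function to pair the column names with the list of patterns.
Try this code: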
keys = ['manufacturer', 'condition', 'fuel', 'drive', 'cylinders']
columns = [manufacturer, condition, fuel, drive, cylinders]
for i, column in zip(keys, columns):
    test2[i] = test2[i].fillna(
        test2['description'].str.extract(column, flags=re.IGNORECASE, expand=False)).str.lower()
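For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds test2 by hand from the table in the question, so the exact construction is an assumption rather than the asker's real data, and it assumes the patterns are meant as alternations written with | and no surrounding spaces.

```python
import re

import numpy as np
import pandas as pd

# Sample frame rebuilt by hand from the table in the question (assumed layout).
test2 = pd.DataFrame({
    "manufacturer": ["ford", "cadillac", np.nan, np.nan, "ferrari"],
    "condition": ["excellent", np.nan, "new", np.nan, np.nan],
    "fuel": ["gas", np.nan, "diesel", "electric", np.nan],
    "drive": ["rwd", np.nan, "fwd", np.nan, "4wd"],
    "cylinders": [np.nan, "4 cylinders", "12 cylinders", "10 cylinders", np.nan],
    "description": [
        "ford in excellent condition. 4 cylinders",
        "4 cylinders. Half-new cadillac. Diesel.",
        "Ford, diesel, new condition.",
        "Ferrari, excellent condition. 4wd",
        "New ferrari. Electric with 12 cylinders.",
    ],
})

# One capture group per pattern, as str.extract requires.
manufacturer = r"(ford|cadillac|ferrari)"
condition = r"(excellent|good|fair|like new|salvage|new)"
fuel = r"(gas|hybrid|diesel|electric)"
drive = r"(\S*wd)"
cylinders = r"(\d+\s+cylinders?)"

keys = ["manufacturer", "condition", "fuel", "drive", "cylinders"]
columns = [manufacturer, condition, fuel, drive, cylinders]

# zip yields exactly one (column name, pattern) pair per column,
# so 'description' is never overwritten and no column gets a foreign pattern.
for i, column in zip(keys, columns):
    found = test2["description"].str.extract(column, flags=re.IGNORECASE, expand=False)
    test2[i] = test2[i].fillna(found).str.lower()

print(test2.drop(columns="description"))
```

With this sample data, row 1 keeps NaN in drive because its description mentions no drive type; fillna only fills what the pattern actually finds. A dict mapping column names to patterns, iterated with .items(), would do the same pairing without keeping two lists in sync.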