Create Hex column from Red column, Green column & Blue column in pandas
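I have a pandas DataFrame with 16,777,216 rows: every possible combination of three columns (Red, Green and Blue), each ranging from 0 to 255.
I would like to add a column to this DataFrame that holds the hex code for the three values in each row. I thought something like the following would be the best solution: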
df["Hex"] = "#{0:02x}{1:02x}{2:02x}".format(df["Red"],df["Green"],df["Blue"])
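However, it seems you cannot pass a Series to the string format method.
Is there a way around this? Also, given that the DataFrame is fairly large, would this be the most efficient approach?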
You can use .apply. For example, given:
df = pd.DataFrame(np.random.randint(256, size=(10, 3)), columns=['Red', 'Green', 'Blue'])
This gives, for example:
Red Green Blue
0 125 100 174
1 107 247 235
2 230 254 33
3 91 107 33
4 209 220 232
5 175 10 47
6 120 66 44
7 21 136 254
8 226 237 32
9 89 57 71
Then:
df.apply('#{Red:02X}{Green:02X}{Blue:02X}'.format_map, axis=1)
gives you:
0 #7D64AE
1 #6BF7EB
2 #E6FE21
3 #5B6B21
4 #D1DCE8
5 #AF0A2F
6 #78422C
7 #1588FE
8 #E2ED20
9 #593947
dtype: object
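To see what the `.apply(..., axis=1)` call does per row: each row is handed to `format_map` as a mapping from column name to value. A minimal sketch on a single row, using a plain dict as a stand-in for the row (the names `template` and `row` are just for illustration):

```python
# The named fields in the template are looked up by key in the mapping
template = '#{Red:02X}{Green:02X}{Blue:02X}'

# Stand-in for the first row of the example frame above
row = {'Red': 125, 'Green': 100, 'Blue': 174}

# 02X means: uppercase hex, zero-padded to two digits
print(template.format_map(row))  # #7D64AE
```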
For Python 3.6+ you can use f-strings, which are very fast:
z = zip(df['Red'], df['Green'], df['Blue'])
df["Hex"] = [f'#{R:02X}{G:02X}{B:02X}' for R, G, B in z]
For lower versions:
df["Hex"] = ['#{0:02X}{1:02X}{2:02X}'.format(R, G, B) for R, G, B in zip(df['Red'], df['Green'], df['Blue'])]
Thanks to @Jon for improving the solution:
df["Hex"] = ['#{0:02X}{1:02X}{2:02X}'.format(*el) for el in zip(df['Red'], df['Green'], df['Blue'])]
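For reference, the list-comprehension approach can be put into one self-contained script. This is only a sketch: the helper name `rgb_to_hex` is made up for illustration, and the three sample rows are the first three rows of the example frame shown earlier.

```python
import pandas as pd

def rgb_to_hex(df):
    """Return a list of '#RRGGBB' strings, one per row of df."""
    # zip walks the three columns in lockstep; the order must be Red, Green, Blue
    rows = zip(df['Red'], df['Green'], df['Blue'])
    # Each value becomes two uppercase hex digits, zero-padded
    return ['#{0:02X}{1:02X}{2:02X}'.format(*row) for row in rows]

df = pd.DataFrame({'Red': [125, 107, 230],
                   'Green': [100, 247, 254],
                   'Blue': [174, 235, 33]})
df['Hex'] = rgb_to_hex(df)
print(df['Hex'].tolist())  # ['#7D64AE', '#6BF7EB', '#E6FE21']
```

Building the zip inside the helper avoids reusing an already consumed iterator, since in Python 3 a `zip` object can only be iterated once.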
Performance:
#10000 rows
df = pd.DataFrame(np.random.randint(256, size=(10000, 3)), columns=['Red', 'Green', 'Blue'])
In [244]: %%timeit
...: z = zip(df['Red'], df['Green'], df['Blue'])
...: df["Hex"] = [f'#{R:02X}{G:02X}{B:02X}' for R,G,B in z]
...:
12.9 ms ± 45.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [245]: %%timeit
...: z = zip(df['Red'], df['Green'], df['Blue'])
...: df["Hex"] = ['#{0:02X}{1:02X}{2:02X}'.format(R,G,B) for R,G,B in z]
...:
12.4 ms ± 1.14 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [246]: %%timeit
...: z = zip(df['Red'], df['Green'], df['Blue'])
...: df["Hex"] = ['#{0:02X}{1:02X}{2:02X}'.format(*el) for el in z]
...:
11.3 ms ± 55 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [246]: %%timeit
...: df["Hex"] = df.apply('#{Red:02X}{Green:02X}{Blue:02X}'.format_map, axis=1)
...:
346 ms ± 42.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
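Since the question is about 16,777,216 rows, one further variation that might be worth timing is to pack the three channels into a single 24-bit integer with NumPy and format that once per row. This is not one of the approaches timed above and it was not benchmarked here, so treat it as a sketch under that assumption; the helper name `rgb_to_hex_packed` is hypothetical.

```python
import numpy as np
import pandas as pd

def rgb_to_hex_packed(df):
    """Return '#RRGGBB' strings by packing R, G, B into one integer per row."""
    # Use a wide integer type so the shifts cannot overflow a narrow dtype
    r = df['Red'].to_numpy(dtype=np.int64)
    g = df['Green'].to_numpy(dtype=np.int64)
    b = df['Blue'].to_numpy(dtype=np.int64)
    # Red occupies bits 16-23, Green bits 8-15, Blue bits 0-7
    packed = (r << 16) | (g << 8) | b
    # tolist() yields plain Python ints; 06X pads to six uppercase hex digits
    return [f'#{v:06X}' for v in packed.tolist()]

sample = pd.DataFrame({'Red': [0, 21, 255],
                       'Green': [0, 136, 255],
                       'Blue': [0, 254, 255]})
sample['Hex'] = rgb_to_hex_packed(sample)
print(sample['Hex'].tolist())  # ['#000000', '#1588FE', '#FFFFFF']
```

The packing step is vectorized, but the string formatting still runs once per row in Python, so any gain over the zip-based versions would need to be measured on the real data.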