How to base64 encode and decode a column in python pandas?
I read this post, but I don't want to encrypt the dataframe, just convert it to base64. I imported a list of words, separated by carriage returns, into a dataframe:
words = pd.read_table("sampleText.txt",names=['word'], header=None)
words.head()
This gives
word
0 difference
1 where
2 mc
3 is
4 the
Then
words['words_encoded'] = map(lambda x: x.encode('base64','strict'), words['word'])
print (words)
gives
word words_encoded
0 difference <map object at 0x7fad3e89e410>
1 where <map object at 0x7fad3e89e410>
2 mc <map object at 0x7fad3e89e410>
3 is <map object at 0x7fad3e89e410>
4 the <map object at 0x7fad3e89e410>
... ... ...
999995 distribution <map object at 0x7fad3e89e410>
999996 in <map object at 0x7fad3e89e410>
999997 scenario <map object at 0x7fad3e89e410>
999998 less <map object at 0x7fad3e89e410>
999999 land <map object at 0x7fad3e89e410>
[1000000 rows x 2 columns]
I don't understand why my encoded column refers to a map object instead of the actual data, so I tried:
b64words = words.word.str.encode('base64')
print(b64words)
which gives
0 NaN
1 NaN
2 NaN
3 NaN
4 NaN
..
999995 NaN
999996 NaN
999997 NaN
999998 NaN
999999 NaN
Name: word, Length: 1000000, dtype: float64
Hmm, that stumped me, so I read the answer linked above and tried
import base64
def encode(text):
    return base64.b64encode(text)
words['Encoded_Column'] = [encode(x) for x in words]
but got
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-89-8cf5a6f1f3a9> in <module>
2 def encode(text):
3 return base64.b64encode(text)
----> 4 words['Encoded_Column'] = [encode(x) for x in words]
<ipython-input-89-8cf5a6f1f3a9> in <listcomp>(.0)
2 def encode(text):
3 return base64.b64encode(text)
----> 4 words['Encoded_Column'] = [encode(x) for x in words]
<ipython-input-89-8cf5a6f1f3a9> in encode(text)
1 import base64
2 def encode(text):
----> 3 return base64.b64encode(text)
4 words['Encoded_Column'] = [encode(x) for x in words]
~/miniconda3/envs/p37cu10.2PyTo/lib/python3.7/base64.py in b64encode(s, altchars)
56 application to e.g. generate url or filesystem safe Base64 strings.
57 """
---> 58 encoded = binascii.b2a_base64(s, newline=False)
59 if altchars is not None:
60 assert len(altchars) == 2, repr(altchars)
TypeError: a bytes-like object is required, not 'str'
So I tried converting to a bytes-like object like this:
import base64
def encode(text):
    btext = text.str.encode('utf-8')
    return base64.b64encode(btext)
words['Encoded_Column'] = [encode(x) for x in words]
but got
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-90-46db6d3688ba> in <module>
3 btext = text.str.encode('utf-8')
4 return base64.b64encode(btext)
----> 5 words['Encoded_Column'] = [encode(x) for x in words]
<ipython-input-90-46db6d3688ba> in <listcomp>(.0)
3 btext = text.str.encode('utf-8')
4 return base64.b64encode(btext)
----> 5 words['Encoded_Column'] = [encode(x) for x in words]
<ipython-input-90-46db6d3688ba> in encode(text)
1 import base64
2 def encode(text):
----> 3 btext = text.str.encode('utf-8')
4 return base64.b64encode(btext)
5 words['Encoded_Column'] = [encode(x) for x in words]
AttributeError: 'str' object has no attribute 'str'
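In this C example, they also convert to a byte string first and then to base64, but I can't get this simple task done in Python. I've fallen down this rabbit hole, and every attempt gets me in deeper. I'd greatly appreciate any help from someone with a clear head.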
Just remove .str from the function body.
Working code:
import base64
def encode(text):
    btext = text.encode('utf-8')
    return base64.b64encode(btext)
words = {'1': 1, '2': 2, '3': 3, 'asdasd': 4}
words['Encoded_Column'] = [encode(x) for x in words]
print(words)
Its output is:
{'1': 1, '2': 2, '3': 3, 'asdasd': 4, 'Encoded_Column': [b'MQ==', b'Mg==', b'Mw==', b'YXNkYXNk']}
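One caveat when carrying this over to the dataframe in the question: the snippet above uses a plain dict, and iterating a dict or a DataFrame walks over its keys or column labels, not the cell values. A minimal sketch of the difference, assuming a small made-up dataframe in place of sampleText.txt:

```python
import base64
import pandas as pd

def encode(text):
    # str -> UTF-8 bytes -> base64 bytes
    return base64.b64encode(text.encode('utf-8'))

# Hypothetical stand-in for the data read from sampleText.txt.
words = pd.DataFrame({'word': ['difference', 'where', 'mc']})

# Iterating the DataFrame itself yields its column labels.
labels = [x for x in words]
print(labels)

# Iterating the column (a Series) yields the actual words.
words['Encoded_Column'] = [encode(x) for x in words['word']]
print(words)
```

So in the dataframe case the comprehension probably wants `for x in words['word']` rather than `for x in words`.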
map returns an iterator, not a list, so pandas just assigns that one object to every slot in the newly created "words_encoded" column. Likewise, if you do words['all_ones'] = 1, pandas fills the whole column with 1.
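The laziness of map can be seen without pandas at all. This is only a sketch using the standard library; the list `sample_words` is made up for illustration:

```python
import base64

# Hypothetical sample data standing in for the 'word' column.
sample_words = ['difference', 'where', 'mc']

# map() returns a lazy iterator; nothing has been encoded yet.
lazy = map(lambda w: base64.b64encode(w.encode('utf-8')), sample_words)
is_list = isinstance(lazy, list)
print(is_list)

# Wrapping the iterator in list() runs the function and collects the results.
encoded = list(lazy)
print(encoded)
```

Passing the materialized list (or a list comprehension) gives one value per row instead of a single map object.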
Second, "base64" is not a codec for strings; it works on bytes. You have to pick a text encoding, encode to that first, and then base64 encode the result. So,
words['word_encoded'] = words.word.str.encode(
    'utf-8', 'strict').str.encode('base64')
works, except that this encoder puts a "\n" at the end of the base64 string, which I find odd. Instead, you could do one of the following
words['word_encoded'] = words.word.str.encode(
    'utf-8', 'strict').apply(
        base64.b64encode)
# or
words['word_encoded'] = [base64.b64encode(x.encode('utf-8', 'strict'))
                         for x in words.word]
Personally, I think the first is a bit more "pandas", since it produces the series directly, with no intermediate list.
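The bytes-versus-str point and the trailing newline can both be checked on a single value with the standard library. This is a sketch on one word rather than on the whole column:

```python
import base64
import codecs

text = 'where'

# 'base64' is not a text encoding, so calling it on a str is rejected.
try:
    text.encode('base64')
    str_encode_failed = False
except LookupError:
    str_encode_failed = True

raw = text.encode('utf-8')  # pick a text encoding first

# The 'base64' codec works on bytes and appends a newline.
with_newline = codecs.encode(raw, 'base64')

# base64.b64encode works on bytes and does not.
without_newline = base64.b64encode(raw)

print(str_encode_failed, with_newline, without_newline)
```

The codec form also breaks long output into lines, so base64.b64encode is likely the better fit when one unbroken value per cell is wanted.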
The solutions in action
>>> import base64
>>> import pandas as pd
>>> words = pd.read_table("sampleText.txt",names=['word'], header=None)
__main__:1: FutureWarning: read_table is deprecated, use read_csv instead, passing sep='\t'.
>>> words['word_encoded'] = words.word.str.encode(
... 'utf-8', 'strict').str.encode('base64')
>>>
>>> words
word word_encoded
0 difference b'ZGlmZmVyZW5jZQ==\n'
1 where b'd2hlcmU=\n'
2 mc b'bWM=\n'
3 is b'aXM=\n'
4 the b'dGhl\n'
>>>
>>> words['word_encoded'] = words.word.str.encode(
... 'utf-8', 'strict').apply(
... base64.b64encode)
>>>
>>> words
word word_encoded
0 difference b'ZGlmZmVyZW5jZQ=='
1 where b'd2hlcmU='
2 mc b'bWM='
3 is b'aXM='
4 the b'dGhl'
>>>
>>> words['word_encoded'] = [base64.b64encode(x.encode('utf-8', 'strict'))
... for x in words.word]
>>>
>>> words
word word_encoded
0 difference b'ZGlmZmVyZW5jZQ=='
1 where b'd2hlcmU='
2 mc b'bWM='
3 is b'aXM='
4 the b'dGhl'
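For the decoding half of the question, the same steps can be run in reverse: base64 decode the bytes, then decode the UTF-8 bytes back to text. A minimal round-trip sketch, assuming a small made-up dataframe in place of sampleText.txt and a hypothetical column name `word_decoded`:

```python
import base64
import pandas as pd

# Hypothetical stand-in for the data read from sampleText.txt.
words = pd.DataFrame({'word': ['difference', 'where', 'mc', 'is', 'the']})

# Encode: text -> UTF-8 bytes -> base64 bytes.
words['word_encoded'] = words['word'].str.encode('utf-8').apply(base64.b64encode)

# Decode: base64 bytes -> UTF-8 bytes -> text.
words['word_decoded'] = (
    words['word_encoded'].apply(base64.b64decode).str.decode('utf-8')
)

print(words)
```

If the encoded column should hold text rather than bytes, adding `.str.decode('ascii')` after the encoding step should work, since base64 output is plain ASCII.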