Decode Japanese Characters to Base64
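I have a slightly complicated question about decoding...
I have some code that reads notes from Gmail (recorded by Siri), puts them into a variable, and compares the len of the matched words to work out whether the words are in a keywords list, which lives in another .py file.
The problem is that Gmail changes the Japanese character 車 into 6luk, and then it doesn't match... Even if I change the word 車 in the keywords .py file to 6luk, it doesn't work.... It only works when I write 6luk directly into the code.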
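6luk can be changed to 車 when I use

    base64.b64decode(command).decode('utf-8')

but because the data has already been decoded in

    voice_command = email.message_from_string(data[0][1].decode('utf-8'))

it doesn't work well.... I can remove .decode('utf-8') from there, but then it doesn't work at all...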
I tried to decode the variable command, which contains 6luk from Gmail, as base64. That works online (on a decoding site), and even in another file with

    base64.b64decode(command).decode('utf-8')

but it doesn't work on the command variable.
It says

    The word(s) '6luk' have been said
    Received an exception while running: 'utf-8' codec can't decode byte 0xea in position 0: invalid continuation byte
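I looked up 0xea, which looks like Latin-1, but when I decoded the bytes as Latin-1 it got even more confusing: ê[¤
Here is the code. It is part of the hackster.io/thesanjeetc/siricontrol-add-siri-voice-control-to-any-project-644b52 project.
By the way, the original note in Gmail looks like this: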
    Content-Type: text/html; charset=utf-8
    Content-Transfer-Encoding: base64
    From: <@gmail.com>
    X-Uniform-Type-Identifier: com.apple.mail-note
    Mime-Version: 1.0 (iOS/12.2 \(-----\) dataaccessd/1.0)
    Date: Thu, 25 Apr 2019 11:42:33 +0900
    X-Mail-Created-Date: Thu, 25 Apr 2019 11:42:33 +0900
    Subject: 車
    X-Universally-Unique-Identifier: --------
    Message-Id: <-------@gmail.com>

    6LuK
    import time
    import imaplib
    import email
    import os
    import pkgutil
    import base64

    ##########################################
    # Add your gmail username and password here
    username = ""
    password = ""
    ##########################################


    class ControlException(Exception):
        pass


    class Control():
        def __init__(self, username, password):
            print("------------------------------------------------------")
            print("- SIRI CONTROL -")
            print("- Created by Sanjeet Chatterjee -")
            print("- Website: https://medium.com/@thesanjeetc -")
            print("------------------------------------------------------")
            try:
                self.last_checked = -1
                self.mail = imaplib.IMAP4_SSL("imap.gmail.com", 993)
                self.mail.login(username, password)
                self.mail.list()
                self.mail.select("Notes")
                # Gets last Note id to stop last command from executing
                result, uidlist = self.mail.search(None, "ALL")
                try:
                    self.last_checked = uidlist[0].split()[-1]
                except IndexError:
                    pass
                self.load()
                self.handle()
            except imaplib.IMAP4.error:
                print("Your username and password is incorrect")
                print("Or IMAP is not enabled.")

        def load(self):
            """Try to load all modules found in the modules folder"""
            print("\n")
            print("Loading modules...")
            self.modules = []
            path = os.path.join(os.path.dirname(__file__), "modules")
            directory = pkgutil.iter_modules(path=[path])
            for finder, name, ispkg in directory:
                try:
                    loader = finder.find_module(name)
                    module = loader.load_module(name)
                    if hasattr(module, "commandWords") \
                            and hasattr(module, "moduleName") \
                            and hasattr(module, "execute"):
                        self.modules.append(module)
                        print("The module '{0}' has been loaded, "
                              "successfully.".format(name))
                    else:
                        print("[ERROR] The module '{0}' is not in the "
                              "correct format.".format(name))
                except:
                    print("[ERROR] The module '" + name + "' has some errors.")
            print("\n")

        def fetch_command(self):
            """Retrieve the last Note created if new id found"""
            self.mail.list()
            self.mail.select("Notes")
            result, uidlist = self.mail.search(None, "ALL")
            try:
                latest_email_id = uidlist[0].split()[-1]
            except IndexError:
                return
            if latest_email_id == self.last_checked:
                return
            self.last_checked = latest_email_id
            result, data = self.mail.fetch(latest_email_id, "(RFC822)")
            voice_command = email.message_from_string(data[0][1].decode('utf-8'))
            return str(voice_command.get_payload()).lower().strip()

        def handle(self):
            """Handle new commands

            Poll continuously every second and check for new commands.
            """
            print("Fetching commands...")
            print("\n")
            while True:
                try:
                    command = self.fetch_command()
                    if not command:
                        raise ControlException("No command found.")
                    print("The word(s) '" + command + "' have been said")
                    command = base64.b64decode(command)
                    command = (command.decode('Latin-1'))
                    command = base64.b64encode(command).encode('utf-8')
                    command = base64.b64encode(command).decode('utf-8')
                    print(command)
                    for module in self.modules:
                        foundWords = []
                        for word in module.commandWords:
                            if str(word) in command:
                                foundWords.append(str(word))
                        if len(foundWords) == len(module.commandWords):
                            try:
                                module.execute(command)
                                print("The module {0} has been executed "
                                      "successfully.".format(module.moduleName))
                            except:
                                print("[ERROR] There has been an error "
                                      "when running the {0} module".format(
                                          module.moduleName))
                        else:
                            print("\n")
                except (TypeError, ControlException):
                    pass
                except Exception as exc:
                    print("Received an exception while running: {exc}".format(
                        **locals()))
                    print("Restarting...")
                time.sleep(1)


    if __name__ == '__main__':
        Control(username, password)
The body you retrieve with imaplib is a bytes object. You can pass it to b64decode without decoding it first:

    >>> base64.b64decode(b'6LuK')
    b'\xe8\xbb\x8a'

This is the UTF-8 encoding of the character U+8ECA, so the next step is to decode that.

    >>> base64.b64decode(b'6LuK').decode('utf-8')
    '車'
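Base64 is case-sensitive, which is a likely explanation for the error in the question: fetch_command lower-cases the payload before anything decodes it. The following is a minimal sketch using only the payload from the raw note shown in the question; the variable names are made up for illustration.

```python
import base64

original = b"6LuK"          # payload exactly as it appears in the raw note
lowered = original.lower()  # b"6luk", the value the question's code tries to decode

# Decoding the untouched payload gives the expected character.
print(base64.b64decode(original))                  # b'\xe8\xbb\x8a'
print(base64.b64decode(original).decode("utf-8"))  # 車

# Decoding the lower-cased payload gives different bytes altogether.
garbled = base64.b64decode(lowered)
print(garbled)                    # b'\xea[\xa4'
print(garbled.decode("latin-1"))  # ê[¤

try:
    garbled.decode("utf-8")
except UnicodeDecodeError as exc:
    # 0xea announces a multi-byte sequence, but b"[" is not a continuation byte
    print("decode failed:", exc)
```

So any lower-casing has to happen after the Base64 and character-set decoding, not before.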
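How best to fix your code is a good question. I would change fetch_command to return the actual decoded string from the payload, since that function already makes a number of assumptions about the input you expect.
Without access to your infrastructure I don't really have a good way to test this, but off the cuff, maybe something like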
    def fetch_command(self):
        """Retrieve the body of the last Note created if new id found"""
        self.mail.list()
        self.mail.select("Notes")
        result, uidlist = self.mail.search(None, "ALL")
        try:
            latest_email_id = uidlist[0].split()[-1]
        except IndexError:
            return
        if latest_email_id == self.last_checked:
            return
        self.last_checked = latest_email_id
        result, data = self.mail.fetch(latest_email_id, "(RFC822)")
        # use message_from_bytes instead of attempting to decode something
        # which is not guaranteed to be UTF-8
        note = email.message_from_bytes(data[0][1])
        # extract body part; decode=True undoes the Content-Transfer-Encoding
        # (base64 here) and returns bytes, or None for a multipart message
        body = note.get_payload(decode=True)
        # turn the bytes into str using the charset declared in Content-Type
        voice_command = body.decode(note.get_content_charset() or "utf-8")
        # lower-case only after decoding; Base64 itself is case-sensitive
        return voice_command.lower().strip()

    def handle(self):
        """Handle new commands

        Poll continuously every second and check for new commands.
        """
        print("Fetching commands...")
        #print("\n") # empty output lines are an annoyance up with which I will not put
        while True:
            try:
                command = self.fetch_command()
                if not command:
                    raise ControlException("No command found.")
                print("The word(s) '" + command + "' have been said")
                #print(command)
                # etc etc
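Since the IMAP part is hard to test without an account, the parsing step can be tried in isolation. This is a rough sketch, not the project's code: RAW_NOTE is a hypothetical, trimmed-down copy of the note from the question, and extract_command is a made-up helper name.

```python
import email

# Hypothetical raw message, modelled on the note shown in the question.
RAW_NOTE = (
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"Mime-Version: 1.0\r\n"
    b"\r\n"
    b"6LuK\r\n"
)


def extract_command(raw: bytes) -> str:
    """Return the decoded, lower-cased body of a single-part message."""
    note = email.message_from_bytes(raw)
    # decode=True reverses the transfer encoding and yields bytes
    body = note.get_payload(decode=True)
    # fall back to UTF-8 if the message declares no charset (an assumption)
    charset = note.get_content_charset() or "utf-8"
    return body.decode(charset).lower().strip()


print(extract_command(RAW_NOTE))  # 車
```

Feeding data[0][1] from the IMAP fetch into a helper like this should behave the same way, provided the note really is a single-part message.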
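If your Python is new enough (technically 3.3+, but realistically 3.6+, which is when the new API stopped being provisional), you might want to look into using the newer email library features, selected through email.policy, instead of the legacy interface. You still have to pass the policy explicitly to get them.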
    from email.policy import default
    # ....
    note = email.message_from_bytes(data[0][1], policy=default)
    # get_body() returns a message part, get_content() returns its decoded text
    voice_command = note.get_body().get_content()
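For comparison, here is the same parsing step with the policy-based API as a self-contained sketch. RAW_NOTE is again a hypothetical stand-in for data[0][1], and the preference list passed to get_body is my own choice for a note that is declared as text/html.

```python
import email
from email.policy import default

# Hypothetical raw message, modelled on the note shown in the question.
RAW_NOTE = (
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Content-Transfer-Encoding: base64\r\n"
    b"Mime-Version: 1.0\r\n"
    b"\r\n"
    b"6LuK\r\n"
)

# With policy=default the parser returns an EmailMessage object.
note = email.message_from_bytes(RAW_NOTE, policy=default)

# get_body() picks the best matching part; it can return None if nothing fits.
body_part = note.get_body(preferencelist=("html", "plain"))

# get_content() applies both the transfer encoding and the charset for us.
voice_command = body_part.get_content().lower().strip()
print(voice_command)  # 車
```

Note that nothing in this version mentions base64 or utf-8; both come from the message headers.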
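You will notice that we let the email library decide what to decode and how. We avoid hard-coding things like utf-8 or base64, because different messages may use different character sets and/or different transfer encodings. You have to examine and obey the MIME headers of each individual message part. (We are still hard-coding the expectation that there is only a single payload. I'm not entirely sure that is a robust assumption either.)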
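If the single-payload assumption ever turns out to be wrong, one option is to walk over all parts and let each one be decoded according to its own headers. The sketch below assumes the policy-based API; text_parts is a made-up helper, and the two-part demo message is fabricated purely to exercise two different charsets and transfer encodings.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default


def text_parts(raw: bytes) -> list:
    """Collect the decoded text of every non-attachment text/* part."""
    msg = message_from_bytes(raw, policy=default)
    texts = []
    for part in msg.walk():
        # multipart containers have maintype "multipart" and are skipped here
        if part.get_content_maintype() == "text" and not part.is_attachment():
            # each part is decoded using its own charset and transfer encoding
            texts.append(part.get_content().strip())
    return texts


# Fabricated demo message: UTF-8/base64 plain text plus Latin-1/quoted-printable HTML.
demo = EmailMessage()
demo.set_content("車", charset="utf-8", cte="base64")
demo.add_alternative("<p>café</p>", subtype="html",
                     charset="iso-8859-1", cte="quoted-printable")

print(text_parts(demo.as_bytes()))  # ['車', '<p>café</p>']
```

Which of the collected parts should count as the command is a separate decision that depends on how the notes are actually structured.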
Incidentally, this message format is not a peculiarity of Gmail. It is how MIME encapsulates content to make it compatible with the basic 7-bit, ASCII-only RFC 822 email message format.
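To see that this has nothing to do with Gmail, you can let Python's own email library produce such a message. This is only an illustration under my own assumptions: the subject and body are taken from the question, and everything else is whatever the library generates by default.

```python
from email import message_from_bytes
from email.message import EmailMessage
from email.policy import default

msg = EmailMessage()
msg["Subject"] = "車"
# Ask for the same content type and transfer encoding as the note in the question.
msg.set_content("車", subtype="html", charset="utf-8", cte="base64")

wire = msg.as_bytes()
print(wire.decode("ascii"))  # the serialized message contains only ASCII

# Parsing the ASCII-only form gives the original text back.
parsed = message_from_bytes(wire, policy=default)
print(parsed["Subject"], parsed.get_content().strip())
```

The library appends a line break to the body before encoding it, so the Base64 text in the output will not be byte-for-byte identical to the 6LuK in the note.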