Python Protobuf (IPv4/IPv6 address) into ClickHouse FixedString(16)
Using a simple protobuf definition like this:
syntax = "proto3";
package flowprotobuf;
message FlowMessage {
    bytes SourceIP = 6;
    bytes DestIP = 7;
}
What is the proper way to encode an IPv4/IPv6 address so that it can be inserted into a ClickHouse table where SourceIP and DestIP are both of type FixedString(16)?
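After a few days of struggling, I am currently doing the following (Python 3) to dump a protobuf stream to a Kafka topic (which is then consumed through the ClickHouse Kafka engine and a materialized view), with "good" results: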
#!/usr/bin/env python
import flow_pb2
from google.protobuf.internal.encoder import _VarintBytes
from socket import inet_pton, AF_INET, AF_INET6
from binascii import hexlify

def pack_addr(ipaddr):
    # Pack a textual address into 16 big-endian bytes for FixedString(16).
    if ':' in ipaddr:
        l = int(hexlify(inet_pton(AF_INET6, ipaddr)), 16)
        return l.to_bytes(16, byteorder='big')
    else:
        # IPv4: the 4 address bytes are left-padded with zeros to 16 bytes.
        l = int(hexlify(inet_pton(AF_INET, ipaddr)), 16)
        return l.to_bytes(16, byteorder='big')

# ip_src, ip_dst, producer and kafka_producer_topic are not shown here.
fm = flow_pb2.FlowMessage()
fm.SourceIP = pack_addr(ip_src)
fm.DestIP = pack_addr(ip_dst)
size = fm.ByteSize()
# Length-delimited framing: varint message size, then the message itself.
fpb = _VarintBytes(size) + fm.SerializeToString()
producer.produce(kafka_producer_topic, fpb)
producer.poll(0)
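For readers unfamiliar with the framing in the last lines of the script: as far as I understand the ClickHouse documentation, its Protobuf input format reads length-delimited messages, meaning each serialized message is preceded by its size encoded as a varint. The following is a minimal pure-Python sketch of that prefix, meant only to illustrate what `_VarintBytes(size)` contributes. `encode_varint` and `frame` are hypothetical helper names, not part of protobuf or of the script above.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf-style base-128 varint."""
    if value < 0:
        raise ValueError("varint length prefix must be non-negative")
    out = bytearray()
    while True:
        low7 = value & 0x7F   # take the lowest 7 bits
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow: set the high bit
        else:
            out.append(low7)         # last byte: high bit clear
            return bytes(out)


def frame(payload: bytes) -> bytes:
    """Prefix an already serialized message with its varint-encoded size."""
    return encode_varint(len(payload)) + payload


print(encode_varint(300).hex())  # ac02
print(frame(b"abc"))             # b'\x03abc'
```

In the script above, `payload` would correspond to `fm.SerializeToString()`.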
I put "good" in quotes because, according to the ClickHouse documentation for IPv6NumToString():
Accepts a FixedString(16) value containing the IPv6 address in binary format. Returns a string containing this address in text format.
IPv6-mapped IPv4 addresses are output in the format ::ffff:111.222.33.44.
But my query results do not show the ::ffff:x.x.x.x format. Instead I get:
de33137dfc80 :) SELECT Date,TimeReceived,IPv6NumToString(SourceIP),IPv6NumToString(DestIP) FROM test LIMIT 5;
SELECT
    Date,
    TimeReceived,
    IPv6NumToString(SourceIP),
    IPv6NumToString(DestIP)
FROM test
LIMIT 5
┌───────Date─┬────────TimeReceived─┬─IPv6NumToString(SourceIP)─┬─IPv6NumToString(DestIP)─┐
│ 2020-08-05 │ 2020-08-05 06:41:27 │ ::98.158.157.211          │ ::202.122.147.98        │
│ 2020-08-05 │ 2020-08-05 06:41:27 │ ::98.158.157.211          │ ::217.118.23.125        │
│ 2020-08-05 │ 2020-08-05 06:41:27 │ ::192.34.21.69            │ ::104.34.73.41          │
│ 2020-08-05 │ 2020-08-05 06:41:27 │ ::98.158.157.211          │ ::194.28.167.103        │
│ 2020-08-05 │ 2020-08-05 06:41:27 │ ::98.158.148.89           │ ::79.170.71.49          │
└────────────┴─────────────────────┴───────────────────────────┴─────────────────────────┘
5 rows in set. Elapsed: 0.006 sec.
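I know the IPv4 addresses are correct, and IPv6 addresses are displayed correctly as well. I just want to make sure I'm not missing anything glaring or obvious. Thanks.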
Edited to add: ClickHouse server version 20.5.4 revision 54435
Edit 2: Denis's suggestion below led me to this solution, which replaces the IPv4 branch of pack_addr:
    else:
        m = '::ffff:' + ipaddr
        l = int(hexlify(inet_pton(AF_INET6, m)), 16)
        return l.to_bytes(16, byteorder='big')
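A shorter variant of the same idea, as a sketch that uses only the standard library `ipaddress` module instead of `inet_pton` and `hexlify`. It assumes the goal is exactly what the edit above does: IPv6 addresses are stored as their 16 raw bytes, and IPv4 addresses are stored in IPv4-mapped form (`::ffff:a.b.c.d`). The name `pack_addr_mapped` is hypothetical.

```python
import ipaddress

# 10 zero bytes followed by 0xffff: the prefix of an IPv4-mapped IPv6 address.
IPV4_MAPPED_PREFIX = b"\x00" * 10 + b"\xff\xff"


def pack_addr_mapped(ipaddr: str) -> bytes:
    """Return 16 bytes suitable for a FixedString(16) column."""
    addr = ipaddress.ip_address(ipaddr)  # raises ValueError on bad input
    if addr.version == 4:
        # addr.packed is the 4-byte big-endian IPv4 address.
        return IPV4_MAPPED_PREFIX + addr.packed
    # For IPv6, addr.packed is already the 16-byte big-endian form.
    return addr.packed


print(pack_addr_mapped("98.158.157.211").hex())
# 00000000000000000000ffff629e9dd3
```

Using `ip_address` also removes the need to test for `':'` in the string, since the parser decides the address family itself.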
SELECT hex(IPv6StringToNum('::98.158.157.211'))
┌─hex(IPv6StringToNum('::98.158.157.211'))─┐
│ 000000000000000000000000629E9DD3         │
└──────────────────────────────────────────┘
SELECT hex(IPv6StringToNum('::ffff:98.158.157.211'))
┌─hex(IPv6StringToNum('::ffff:98.158.157.211'))─┐
│ 00000000000000000000FFFF629E9DD3              │
└───────────────────────────────────────────────┘
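The difference between the two byte layouts above can also be checked from Python. This sketch assumes the 16-byte values are exactly the ones ClickHouse printed and uses the standard `ipaddress` module; `mapped_ipv4` is a hypothetical helper name. The `ipv4_mapped` property is only set for addresses in the `::ffff:0:0/96` range, which is why the zero-padded form is not recognized as carrying an IPv4 address.

```python
import ipaddress


def mapped_ipv4(raw: bytes):
    """Return the embedded IPv4Address if raw is IPv4-mapped, else None."""
    # IPv6Address accepts a 16-byte big-endian bytes object.
    return ipaddress.IPv6Address(raw).ipv4_mapped


zero_padded = bytes.fromhex("000000000000000000000000629E9DD3")
mapped = bytes.fromhex("00000000000000000000FFFF629E9DD3")

print(mapped_ipv4(zero_padded))  # None
print(mapped_ipv4(mapped))       # 98.158.157.211
```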
https://en.wikipedia.org/wiki/IPv6_address
For example, the IPv4-mapped IPv6 address ::ffff:c000:0280 is written as ::ffff:192.0.2.128, thus clearly expressing the original IPv4 address that was mapped to IPv6.
https://www.ultratools.com/tools/ipv4toipv6
0:0:0:0:0:ffff:629e:9dd3
Converting IPv4 Address to a Hex IPv6 Address in Python