How to load data from a CSV file into a Python script for probability analysis?
I have a Python script that simulates an IoT environment. The script uses static values and is basic (unlike more advanced simulators such as Pymote or NS). I later plan to add a lightweight neural network to the script for anomaly detection, but that comes later.
I have a CSV file (node.csv) with the following data -
1.0,0.5,0.2
0.6,1.0,0.2
0.3,0.4,1.0
To be precise, these are the "rate of success" probabilities of a "message" being delivered from node 0 to nodes 1 and 2, from node 1 to nodes 2 and 0, and so on.
The code is as follows -
import numpy as np

# dict <epoch, dict <node id, value>>
# change later to read from file instead
sensor_values = {0: {0: 12.0, 1: 15.0, 2: 20.0},
                 1: {0: 12.5, 1: 18.0, 2: np.nan},
                 2: {0: 11.0, 1: np.nan, 2: 20.0},
                 3: {0: 10.0, 1: 150.0, 2: 28.0},
                 4: {0: np.nan, 1: 15.0, 2: 27.0}
                 }

# static configuration
max_epoch = 5
node_ids = range(3)
connection_success_rates = [[1.0, 0.5, 0.2],
                            [0.6, 1.0, 0.2],
                            [0.3, 0.4, 1.0]]  # change later to read from
                                              # file instead

# simulation global variables
nodes_memory = None  # dict <node_id, dict <key, value>>
successful_communications = None
failed_communications = None

# initialize simulation global variables
def init():
    global nodes_memory
    nodes_memory = dict()
    for node_id in node_ids:
        nodes_memory[node_id] = dict()
    global successful_communications
    global failed_communications
    successful_communications, failed_communications = 0, 0

def print_node_memory(node_id):
    print('Memory content of node %d' % node_id)
    for key in nodes_memory[node_id]:
        print('Key={%s}, value={%s}' % (key, nodes_memory[node_id][key]))
    # here we could print some extra stats, like the size of used memory...

def print_communication_stats():
    print('--- Communication stats ---')
    print('successful communications: %d' % successful_communications)
    print('failed communications: %d' % failed_communications)
    print('total communications: %d' % (successful_communications +
                                        failed_communications))

# send <key, value> from sender to receiver according to connection success
# rates
def send_value(sender_id, receiver_id, key, value):
    success_rate = connection_success_rates[sender_id][receiver_id]
    if np.random.rand() < success_rate:
        # communication successful
        # write transmitted value in target node memory
        nodes_memory[receiver_id][key] = value
        global successful_communications
        successful_communications += 1
        return True
    else:
        # communication fails
        global failed_communications
        failed_communications += 1
        return False
    # some energy counter could be added here too

# internal behavior of each node
# (neural networks will be added here later)
def run_node(local_id, epoch):
    # send current value to all other nodes
    for target_node_id in node_ids:
        if target_node_id != local_id:
            # this will be adapted later to communicate with neighbor nodes only
            key = 'node=%d, epoch=%d' % (local_id, epoch)
            value = sensor_values[epoch][local_id]
            send_value(local_id, target_node_id, key, value)
    # here we could do some additional stuff
    # for example clean old values (more than 3 epochs) from node memory
    keys_to_remove = []
    for key in nodes_memory[local_id]:
        # parse key to get epoch when the value was written
        # later the values should be encapsulated in objects for convenience
        epoch_received = int(key[key.find('epoch=') + len('epoch='):])
        if epoch - epoch_received > 3:
            # mark for deletion, as Python does not support deleting values
            # while iterating through a dict
            # print('marked old value with key %s from epoch %d for removal' %
            #       (key, epoch_received))
            keys_to_remove.append(key)
    for key_to_remove in keys_to_remove:
        print('removing old value with key %s' % key_to_remove)
        nodes_memory[local_id].pop(key_to_remove)
    return 0

def main():
    print('Simulation example')
    init()
    for epoch in range(max_epoch):
        print('------------- Epoch %d -------------' % epoch)
        for node_id in node_ids:
            print('--- Node %d ---' % node_id)
            run_node(node_id, epoch)
            print_node_memory(node_id)
    print_communication_stats()
    return 0

if __name__ == '__main__':
    main()
I hope the code is clear and readable enough, although it certainly needs improvement.
In short, the code runs 5 'epochs', and each node has its own success rate for communicating with every other node. In addition, each node has a memory that stores the 'data' received from other nodes. (The output is cleared once the code has run.)
The problem I am currently facing is getting connection_success_rates from the CSV file into my code. I tried using the following in my original code -
import csv  # at the very top

f = open('node.csv')
csvfile = csv.readfile(f, delimiter=',')
connection_success_rates = []
for row in csvfile:
    connection_success_rates.append(row)  # Commented out the previous
                                          # connection_success_rates section completely
But that did not help. The code still runs, but it completely ignores the CSV file part, so it cannot compute the success rates the way the original code does. I tried several other combinations of the csv options, but nothing helped. (I also plan to extend the same approach to the sensor_values section.)
I particularly like the CSV file format because I will plug in a neural network later, and NNs work with large datasets from CSV files. Also, I will later use a much larger dataset with many nodes in the script, but first I need this simple task running on this simple data.
I humbly request anyone's help. Please also correct me if I should make changes to the data itself or to further definitions in the code.
Thanks in advance, everyone.
P.S. - This is my first question on SOF, so please be gentle if there are any mistakes.
I can see that the connection_success_rates you created manually is a list, so a possible solution is to use pandas, like this:
import pandas as pd

connection_success_rates = pd.read_csv("node.csv", header=None)
# convert the dataframe to a list
connection_success_rates = connection_success_rates.values.tolist()
print(connection_success_rates)
Result:
[[1.0, 0.5, 0.2], [0.6, 1.0, 0.2], [0.3, 0.4, 1.0]]
So remove this part:
import csv

f = open('node.csv')
csvfile = csv.readfile(f, delimiter=',')
connection_success_rates = []
for row in csvfile:
    connection_success_rates.append(row)
and try the one I posted above instead.
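The same pandas approach extends to the sensor_values part you mention. This is only a sketch under the assumption that the sensor CSV has one row per epoch and one column per node (io.StringIO stands in for a hypothetical sensor.csv so the snippet is self-contained); DataFrame.to_dict(orient="index") then produces exactly the dict <epoch, dict <node id, value>> shape your script uses, with empty cells becoming NaN like the np.nan entries:

```python
import io
import pandas as pd

# stand-in for the contents of a hypothetical sensor.csv
# (rows = epochs, columns = node ids; empty cells become NaN)
csv_text = "12.0,15.0,20.0\n12.5,18.0,\n11.0,,20.0\n"
df = pd.read_csv(io.StringIO(csv_text), header=None)

# to_dict(orient="index") yields {row_index: {column: value}},
# i.e. the dict <epoch, dict <node id, value>> shape of sensor_values
sensor_values = df.to_dict(orient="index")
print(sensor_values[0])  # {0: 12.0, 1: 15.0, 2: 20.0}
```

With a real file you would replace the StringIO with pd.read_csv("sensor.csv", header=None).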
So that the question also has a pure-Python solution, it can be solved like this:
with open('node.csv') as file:
    connection_success_rates = [[float(cell) for cell in line.split(',')]
                                for line in file]
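For completeness, the stdlib csv module you originally tried also works; the attribute you want is csv.reader (csv.readfile does not exist, which is why that attempt failed). A minimal sketch, writing the sample node.csv first so the snippet is self-contained; note that csv.reader yields each row as a list of strings, so every cell must be converted to float before it can be used as a probability:

```python
import csv

# write the sample node.csv so this snippet is self-contained
with open('node.csv', 'w', newline='') as f:
    f.write("1.0,0.5,0.2\n0.6,1.0,0.2\n0.3,0.4,1.0\n")

# csv.reader (not csv.readfile) parses each line into a list of strings
connection_success_rates = []
with open('node.csv', newline='') as f:
    for row in csv.reader(f):
        connection_success_rates.append([float(cell) for cell in row])

print(connection_success_rates)
# [[1.0, 0.5, 0.2], [0.6, 1.0, 0.2], [0.3, 0.4, 1.0]]
```

After this, the rest of the script can use connection_success_rates unchanged, since it has the same list-of-lists shape as the hard-coded version.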