How to load data from a CSV file into a Python script for probability analysis?

I have a Python script that simulates an IoT environment. The script uses static values and is basic (not like more advanced simulators such as Pymote, NS, etc.). I later plan to add a lightweight neural network to the script for anomaly detection, but that comes later. I have a CSV file (node.csv) with the following data -

1.0,0.5,0.2
0.6,1.0,0.2
0.3,0.4,1.0

To be precise, these are the probabilities of success ("rate of success") of a "message" being delivered from node 0 to nodes 1 and 2, from node 1 to nodes 2 and 0, and so on. Row i, column j holds the success rate for sending from node i to node j; for example, a message from node 0 to node 2 succeeds with probability 0.2.
The code is as follows -

import numpy as np

# dict <epoch, dict <node id, value>>
# change later to read from file instead
sensor_values = {0: {0: 12.0, 1: 15.0, 2: 20.0},
                 1: {0: 12.5, 1: 18.0, 2: np.nan},
                 2: {0: 11.0, 1: np.nan, 2: 20.0},
                 3: {0: 10.0, 1: 150.0, 2: 28.0},
                 4: {0: np.nan, 1: 15.0, 2: 27.0}
                }


# static configuration
max_epoch = 5
node_ids = range(3)
connection_success_rates = [[1.0, 0.5, 0.2],
                            [0.6, 1.0, 0.2],
                            [0.3, 0.4, 1.0]]  # change later to read from 
                                              # file instead

# simulation global variables
nodes_memory = None  # dict <node_id, dict <key, value>>
successful_communications = None
failed_communications = None


# initialize simulation global variables
def init():
    global nodes_memory
    nodes_memory = dict()
    for node_id in node_ids:
        nodes_memory[node_id] = dict()
    global successful_communications
    global failed_communications
    successful_communications, failed_communications = 0, 0


def print_node_memory(node_id):
    print 'Memory content of node %d' % node_id
    for key in nodes_memory[node_id]:
        print 'Key={%s}, value={%s}' % (key, nodes_memory[node_id][key])

# here we could print some extra stats, like the size of used memory...


def print_communication_stats():
    print '--- Communication stats ---'
    print 'successful communications: %d' % successful_communications
    print 'failed communications: %d' % failed_communications
    print 'total communications: %d' % (successful_communications + 
                                         failed_communications)


# send <key, value> from sender to receiver according to connection success 
# rates
def send_value(sender_id, receiver_id, key, value):
    success_rate = connection_success_rates[sender_id][receiver_id]
    if np.random.rand() < success_rate:
        # communication successful
        # write transmitted value in target node memory
        nodes_memory[receiver_id][key] = value
        global successful_communications
        successful_communications += 1
        return True
    else:
        # communication fails
        global failed_communications
        failed_communications += 1
        return False
    # some energy counter could be added here too


# internal behavior of each node
# (neural networks will be added here later)
def run_node(local_id, epoch):

    # send current value to all other nodes
    for target_node_id in node_ids:
        if target_node_id != local_id:
            # this will be adapted later to communicate with neighbor nodes only
            key = 'node=%d, epoch=%d' % (local_id, epoch)
            value = sensor_values[epoch][local_id]
            send_value(local_id, target_node_id, key, value)

    # here we could do some additional stuff

    # for example clean old values (more than 3 epochs) from node memory
    keys_to_remove = []
    for key in nodes_memory[local_id]:
        # parse key to get epoch when the value was written
        # later the values should be encapsulated in objects for convenience
        epoch_received = int(key[key.find('epoch=')+len('epoch='):])
        if epoch - epoch_received > 3:
            # mark for delete, as python does not support deleting entries
            # while iterating through a dict
            # print 'marked old value with key %s from epoch %d for removal' % (key, epoch_received)
            keys_to_remove.append(key)

    for key_to_remove in keys_to_remove:
        print 'removing old value with key %s' % key_to_remove
        nodes_memory[local_id].pop(key_to_remove)
    return 0


def main():
    print 'Simulation example'
    init()

    for epoch in range(max_epoch):
        print '------------- Epoch %d -------------' % epoch
        for node_id in node_ids:
            print '--- Node %d ---' % node_id
            run_node(node_id, epoch)
            print_node_memory(node_id)

    print_communication_stats()

    return 0

if __name__ == '__main__':
    main()  

I hope the code is clear and readable enough, although it certainly needs improvement.
In short, the code runs 5 'epochs', and each node has its own success rate for communication with every other node. In addition, each node has a memory that stores the 'data' received from other nodes. (The output will be clear once the code is run.)
The problem I am currently facing is getting the connection_success_rates from the CSV file into my code. I tried using the following in my original code -

import csv # at the very top

f = open('node.csv')
csvfile = csv.readfile(f, delimiter=',')
connection_success_rates = []

for row in csvfile:
    connection_success_rates.append(row)   # Commented the previous 
                           # connection_success_rates section completely

But this did not help. Although the code still runs, it completely ignores the CSV file part, so it cannot compute the success rates the way the original code does. I have tried several other combinations of the csv options, but nothing helped. (I also plan to extend the same approach to the sensor_values section.)
I particularly like the CSV file format because I will plug in a neural network later, and NNs work with large datasets from CSV files. Also, I will later use a much larger dataset with many nodes in the script, but I first need this simple task running on this simple data.
I humbly request anyone's help. If I should make changes to the specific data, or to further definitions in the code itself, please correct me.
Thanks to everyone in advance.

P.S. - This is my first question on SOF, so please bear with me if there are any mistakes.

I can see that the connection_success_rates you created manually is a list, so a possible solution is to use pandas, as below:

import pandas as pd

connection_success_rates = pd.read_csv("node.csv", header=None)

# convert the dataframe to a list of lists
connection_success_rates = connection_success_rates.values.tolist()

print(connection_success_rates)

Result:

[[1.0, 0.5, 0.2], [0.6, 1.0, 0.2], [0.3, 0.4, 1.0]]
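Note that header=None matters here: without it, pandas would treat the first row of node.csv (1.0,0.5,0.2) as column names instead of data.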

So remove this part:

import csv 

f = open('node.csv')
csvfile = csv.readfile(f, delimiter=',')
connection_success_rates = []

for row in csvfile:
    connection_success_rates.append(row)  

and try using the one I posted above.
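Since you mention extending the same idea to sensor_values: here is a minimal sketch of how that could look with pandas, assuming a hypothetical sensor.csv in which row i holds the readings of all nodes at epoch i and missing readings are left as empty cells:

import pandas as pd

# hypothetical sensor.csv, one row per epoch and one column per node:
# 12.0,15.0,20.0
# 12.5,18.0,
# 11.0,,20.0
df = pd.read_csv("sensor.csv", header=None)  # empty cells are read as NaN

# with header=None the column labels are already the ints 0, 1, 2, ...,
# so this directly rebuilds the dict <epoch, dict <node id, value>>
# structure your script expects
sensor_values = df.to_dict(orient='index')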

To give the question a pure-Python solution, it can be solved like this:

connection_success_rates = []
with open('node.csv') as file:
    # split each line on commas and convert the fields to float
    # (without float() the rates would be strings, and the comparison
    # against np.random.rand() in send_value would not work as intended)
    connection_success_rates = [[float(x) for x in line.split(',')]
                                for line in file.readlines()]
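
As a side note on the original attempt: the csv module has no readfile function; the call you want is csv.reader, and it yields each row as a list of strings, so a float conversion is still needed. A minimal sketch of that approach, plus a numpy alternative (the script already imports numpy as np):

import csv

import numpy as np

# standard-library csv: csv.reader yields each row as a list of strings
with open('node.csv') as f:
    reader = csv.reader(f, delimiter=',')
    connection_success_rates = [[float(x) for x in row] for row in reader]

# alternatively, numpy parses the whole file into a 2-D float array
connection_success_rates = np.loadtxt('node.csv', delimiter=',').tolist()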