How to reshape a numpy data table?

Question

I am new to Python, and my task is to "reshape" some data from .txt files. A simplified version of the raw data looks like this:

A 1 x      
A 2 y      
A 3 z    
B 1 q    
B 2 w    
B 3 e   
 ...

What I need to end up with is this:

  A B
1 x q
2 y w  
3 z e
 ...

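The problem is that there are multiple .txt files I have to reshape, and there is no fixed number of 1-2-3 entries for each A-B-C: A may run from 1 to 50, while B may run from 1 to 10 or 75. I am looking for an algorithm to do this. I have already figured out how to extract the data I need and discard the rest, but I don't know how to "reduce" the dimension of the data.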

What I have done so far is collect the necessary data in arrays and put those arrays into a numpy array:

data = np.array([station, depth, temperature])

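Now I am trying to fill a new 2D data array whose x and y axes are the distinct stations and depths: if the raw data has AAAABBCCDDDD, then the x axis of the new array would contain ABCD (using Counter().keys()).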

Answer 1

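First, you can parse everything by reading the file line by line and storing the values in a dictionary. Since each line looks like A 1 x, the general form is: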

BIG_LETTER INDEX VALUE WHITESPACE

In the dictionary, the keys are the BIG_LETTERs and each value is another dictionary that maps indices to values, something like {'A': {1: 'q', 2: 'c'}}. This is easy to implement:

replace_with_your_file_name = "./text.txt"
data = {}  # maps each big letter to a dictionary of {index: value}
with open(replace_with_your_file_name, "r") as file:
    for line in file:
        line = line.split()  # split on any run of whitespace
        if len(line) < 3:
            continue  # skip blank or incomplete lines

        # Store in a dictionary the big letter and all its values
        # something like {'A': {1: 'q', 2: 'c'}}
        if line[0] not in data:
            data[line[0]] = {}
        # Convert the index to int so it can be sorted and compared numerically
        data[line[0]][int(line[1])] = line[2]  # data[big_letter][number] = char
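
As a quick check of the parsing step, here is a minimal sketch that runs the same logic on a few in-memory lines instead of a file. The sample lines and the helper name parse_lines are made up for illustration; the indices are converted to int on the assumption that they are always whole numbers.

```python
def parse_lines(lines):
    """Build {big_letter: {index: value}} from 'LETTER INDEX VALUE' lines."""
    data = {}
    for line in lines:
        parts = line.split()  # split on any run of whitespace
        if len(parts) < 3:
            continue  # skip blank or incomplete lines
        letter, index, value = parts[0], int(parts[1]), parts[2]
        data.setdefault(letter, {})[index] = value
    return data


# Hypothetical sample mirroring the format in the question
sample = ["A 1 x   ", "A 2 y", "B 1 q", "", "B 2 w"]
parsed = parse_lines(sample)
print(parsed)  # {'A': {1: 'x', 2: 'y'}, 'B': {1: 'q', 2: 'w'}}
```

Using setdefault here replaces the explicit "if not in data" check; both do the same thing.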

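Once that is done, you can use another for loop to sort the keys of each nested dictionary, so {'B': {5: 'a', 2: 'c'}} becomes {'B': {2: 'c', 5: 'a'}}. At the same time you can extract, for each big letter, the maximum index that has a value, which solves the problem of non-fixed lengths. The highest of these maxima is kept for later use.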

# Sort by the dictionary key (assumes the indices are stored as integers)
GLOBAL_MAX_NUMBER: int = 0 # the largest number among all big letters

for item in data:
    big_letter: dict = data[item]

    data[item] = dict(sorted(big_letter.items())) # Sort according to the keys
    local_max_number = list(data[item])[-1] # The last element is the largest

    if local_max_number > GLOBAL_MAX_NUMBER:
        GLOBAL_MAX_NUMBER = local_max_number

iterations = GLOBAL_MAX_NUMBER # Improve readability
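
If only the largest index is needed, the same result can be obtained more compactly with max(). A minimal sketch, assuming the indices are stored as integers; the sample dictionary is invented for illustration:

```python
# Hypothetical parsed data with unsorted integer indices
data = {"A": {3: "z", 1: "x", 2: "y"}, "B": {5: "a", 2: "c"}}

# Sort each inner dictionary by index (dicts keep insertion order in Python 3.7+)
data = {letter: dict(sorted(values.items())) for letter, values in data.items()}

# Largest index over all big letters; default=0 covers empty input
global_max_number = max(
    (index for values in data.values() for index in values), default=0
)

print(data)               # {'A': {1: 'x', 2: 'y', 3: 'z'}, 'B': {2: 'c', 5: 'a'}}
print(global_max_number)  # 5
```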

Now you can write the data to a new file in the format you want:

# Write them to a new file
with open("newfile.txt", "w") as file:
    # FORMAT: A B C D ... (BIG LETTERS)
    # ----- 1 a b c d ... (INDEX AND VALUE FOR EACH BIG LETTER IN THE FIRST ROW)

    # Write all the big letters in a row
    WHITESPACE: str = "  "
    file.write(WHITESPACE + " ".join(list(data)) + "\n")

    # Loop up to the `GLOBAL_MAX_NUMBER` we kept track of
    for i in range(iterations):
        current_number: int = i+1 # Current index
        file.write(f'{current_number} ')

        for big_letter in data: # A, B, C ...
            if current_number not in data[big_letter]:
                file.write("0 ") # in case this does not exist write 0
            else:
                file.write(f'{data[big_letter][current_number]} ') # write the value

        file.write("\n")

Putting all of the above together gives the desired output:

  A B
1 x q 
2 y w 
3 z e
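
The three steps can also be folded into one function that works on any iterable of lines, which makes it easier to apply to several files. This is only a sketch under the same assumptions as above (integer indices, whitespace-separated fields); the name reshape and the fill parameter are illustrative, and missing entries are padded with 0 as in the writing loop above.

```python
def reshape(lines, fill="0"):
    """Turn 'LETTER INDEX VALUE' lines into rows of a letter-by-index table."""
    data = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 3:
            continue  # skip blank or incomplete lines
        data.setdefault(parts[0], {})[int(parts[1])] = parts[2]

    # Largest index over all letters decides how many rows are written
    max_index = max((i for col in data.values() for i in col), default=0)

    rows = ["  " + " ".join(data)]  # header: one column per big letter
    for i in range(1, max_index + 1):
        cells = [data[letter].get(i, fill) for letter in data]
        rows.append(f"{i} " + " ".join(cells))
    return rows


# Hypothetical sample where B has fewer entries than A
sample = ["A 1 x", "A 2 y", "A 3 z", "B 1 q", "B 2 w"]
table = reshape(sample)
print("\n".join(table))
```

For a real file, the function could be called as reshape(open(path)), since a file object yields its lines when iterated.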

python

arrays

formatting

numpy