Why is asort in awk messing up one of the records in my array?

Question

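I'm working on a homework assignment, and this is my first time using awk. I'm trying to sort an array in descending order, and I seem to have gotten it working... mostly. The output should show each person's name, position, and total sales from a database file. It works fine unsorted, but when I sort, one of the people (Davy Jones, whose record should land in the middle) loses his name and position, and the formatting gets scrambled. Here is my code: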

BEGIN {
    printf("%4s  %22s  %15s\n", "Name", "Position", "Sales Amount");
    printf("=============================================\n");
    FS = ":";
}

/^[0-9]*:[a-z]*:[A-Z || a-z || -]*:[0-9]*\.[0-9]*$/ {
    productPRICEar[pprice_key++] = $4;
}

/^[0-9]*:[A-Z || a-z]*:[A-Z || a-z]*$/ {
    associateNUMar[anum_key++] = $1;
    associateNAMEar[aname_key++] = $2;
    associatePOSar[apos_key++] = $3;
}

/^[0-9]*:[0-9]*:[0-9]*:[0-9]*\/[0-9]*\/[0-9]*:[0-9]*$/ {
    transactionIDar[tID_key++] = $1;
    productIDar[pID_key++] = $2;
    quantityar[quant_key++] = $3;
    associateIDar[aID_key++] = $5;
}

END {
    # Create an empty array value for each associate
    for (key in associateNUMar) {
        associate_total[key] = 0; # Stores the total sales made by the associate
    }

    # For each transaction
    for(transaction in transactionIDar) {
        # Declare variables
        belongs_to = associateIDar[transaction]; # Who the transaction belongs to
        product_id = productIDar[transaction]; # ID of the product sold in the transaction
        quantity_sold = quantityar[transaction]; # Quantity of the product sold in the transaction
        transaction_total = productPRICEar[product_id-1] * quantity_sold; # Total revenue from the transaction.

        # For each associate
        for (associate in associateNUMar) {
            # If this is the associate the current transaction belongs to
            if (associateNUMar[associate] == belongs_to) {
                current_total = associate_total[associate]; # Get the associate's current sales total
                associate_total[associate] = current_total + transaction_total; # Add the transaction total to the associate's sales total
            }
        }
    }
    print "\nUnsorted\n=============================================";
    # For each associate's sales total
    for(key2 in associate_total) {
        # Retrieve the associate's records
        associate_name = associateNAMEar[key2]; # Associate's name
        associate_position = associatePOSar[key2]; # Associate's position
        associate_salestotal = associate_total[key2]; # Associate's sales total

        printf("%-18s  %-13s  %10.2f\n", associate_name, associate_position, associate_salestotal);
    }


    n = asort(associate_total);
    print "\nSorted\n=============================================";
    # For each associate's sales total
    for (key2=n; key2>=1; key2--) {
        # Retrieve the associate's records
        associate_name = associateNAMEar[key2]; # Associate's name
        associate_position = associatePOSar[key2]; # Associate's position
        associate_salestotal = associate_total[key2]; # Associate's sales total

        printf("%-18s  %-13s  %10.2f\n", associate_name, associate_position, associate_salestotal);
    }
}

Here is the database:

1:software:Word Processor:55.00
2:software:Bad Wolf Video Game:19.99
3:software:Return to Gallifrey Video Game:59.99
4:vehicle:TARDIS:999999.99
5:hardware:sonic screwdriver:9999.99
6:merchandise:company t-shirt:20.00

1:Davy Jones:Security
2:Ricky Davis:Developer
3:Samantha Smith:Salesperson
4:Matt Smith:Doctor
5:David Tennant:Doctor
6:Buckminster Fuller:Engineer
7:Clara Oswald:Nurse
8:Amelia Pond:Nurse

1:1:1:01/02/2015:2
2:2:1:02/04/2017:2
3:3:1:03/06/2018:5
4:4:1:11/05/2018:5
5:1:1:01/12/2018:2
6:2:2:02/11/2018:2
7:3:1:05/13/2018:6
8:6:3:06/24/2018:1
9:5:1:02/02/2016:5
10:1:1:05/01/2017:5
11:2:1:11/05/2018:5
12:3:1:12/06/2018:5
13:2:1:02/12/2018:5
14:1:1:10/16/2018:5
15:6:4:05/18/2018:3
16:5:1:06/28/2018:6
17:1:1:07/30/2018:5
18:2:1:08/04/2018:7
19:3:1:09/07/2018:5
20:6:1:10/17/2018:4
21:6:1:10/17/2018:8
22:2:1:08/04/2018:7
23:3:1:09/07/2018:5

Below is my output, showing both the unsorted and the sorted version. The sorted version is clearly wrong.

Name                Position     Sales Amount
=============================================

Unsorted
=============================================
Davy Jones          Security            60.00
Ricky Davis         Developer          169.97
Samantha Smith      Salesperson         80.00
Matt Smith          Doctor              20.00
David Tennant       Doctor         1010444.92
Buckminster Fuller  Engineer         10059.98
Clara Oswald        Nurse               39.98
Amelia Pond         Nurse               20.00

Sorted
=============================================
                                   1010444.92
Amelia Pond         Nurse            10059.98
Clara Oswald        Nurse              169.97
Buckminster Fuller  Engineer            80.00
David Tennant       Doctor              60.00
Matt Smith          Doctor              39.98
Samantha Smith      Salesperson         20.00
Ricky Davis         Developer           20.00

Here is the record layout the program works with:

1. Products - each product record has the following fields
    1. Product id - an integer uniquely identifying a product
    2. Product category - a string describing the category of the product
    3. Description - a string describing the product
    4. Price - floating point number with 2 decimal places - how much does this product cost?

2. Associates - each record for an associate will have the following fields:
    1. Associate id - an integer uniquely identifying the associate
    2. Name - a string containing the name of the associate
    3. Position - a string describing the job position of the associate

3. Sales - each record for a sale will have the following fields
    1. Transaction id - integer uniquely identifying the transaction
    2. Product id - the product id of the product sold in this transaction
    3. Quantity - integer quantifying how many of the specified product were sold
    4. Date - date of the transaction in the format mm/dd/yyyy
    5. Associate id - the associate id of the associate that made this sale
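The layout above can be parsed by field count instead of the three regular expressions. Products have 4 fields, associates 3, and sales 5, and blank lines have none. Each record can also be stored under its own id rather than under a separate counter, so associate id 1 is key 1 in every array. The following is a minimal portable sketch that assumes every record has exactly those field counts. The shell function name `associate_totals` is hypothetical. It reads the database from the files given as arguments, or from standard input.

```shell
# Sketch: compute each associate's sales total, keyed by the ids in the data.
# Assumption: products have 4 fields, associates 3, sales 5 (blank lines are skipped).
associate_totals() {
    awk -F: '
        NF == 4 { price[$1] = $4 }                              # product id -> price
        NF == 3 { name[$1] = $2; pos[$1] = $3; total[$1] = 0 }  # associate id -> name, position
        NF == 5 { prod[$1] = $2; qty[$1] = $3; who[$1] = $5 }   # transaction id -> sale fields
        END {
            # Add price * quantity of every sale to the associate who made it.
            for (t in prod)
                total[who[t]] += price[prod[t]] * qty[t]
            # One line per associate: id:name:position:total (order unspecified).
            for (id in name)
                printf("%s:%s:%s:%.2f\n", id, name[id], pos[id], total[id])
        }
    ' "$@"
}
```

For example, `associate_totals database.txt | sort -t: -k1,1n` (the file name is hypothetical). The output is piped through `sort` because `for (id in name)` visits keys in an unspecified order.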

I'm not sure why asort does this, and I'm completely stuck. Please let me know where the problem is so I can fix it.

Answer 1

First, the asort function rewrites the array's indices. For example, suppose we have an array like this:

a["foo"] = "world"
a[9] = "there"
a[3.5] = "hello"

After sorting, it looks like this:

a[1] = "hello"
a[2] = "there"
a[3] = "world"

Note that our original indices have been destroyed: "world" is no longer under the key "foo", and so on.
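You may need the values in sorted order but still want to know which key each value came from. In that case, sort a list of the keys rather than the values. Here is a portable sketch in plain awk, with no gawk extensions, using the same three elements. The helper name `sort_keys_by_value` and the shell wrapper `demo_sort_keys` are hypothetical. The insertion sort was chosen for brevity, not speed.

```shell
# Portable sketch (any POSIX awk): sort the KEYS of an array by their values,
# so each value stays reachable through its original key.
demo_sort_keys() {
    awk '
    # Hypothetical helper: fill keys[1..n] with the indices of arr in ascending
    # order of arr[index]; returns n. Insertion sort, fine for small arrays.
    function sort_keys_by_value(arr, keys,    k, n, i, j, tmp) {
        n = 0
        for (k in arr)
            keys[++n] = k
        for (i = 2; i <= n; i++) {
            tmp = keys[i]
            for (j = i - 1; j >= 1 && arr[keys[j]] > arr[tmp]; j--)
                keys[j + 1] = keys[j]
            keys[j + 1] = tmp
        }
        return n
    }
    BEGIN {
        a["foo"] = "world"
        a[9] = "there"
        a[3.5] = "hello"
        split("", order)                  # start with an empty array
        n = sort_keys_by_value(a, order)
        for (i = 1; i <= n; i++)
            print order[i], a[order[i]]   # original key, then its value
    }'
}
demo_sort_keys
```

This prints `3.5 hello`, `9 there`, `foo world`. The values come out in sorted order, and `a` itself is left untouched.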

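Your first loop iterates over the original keys of associate_total. asort then destroys those keys and replaces them with the natural numbers 1 to n, and your second loop iterates numerically over those numbers. Evidently your original keys were not the set of natural numbers from 1 to n.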

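For example, if the original keys run from 0 to n-1, item 0 can no longer be reached. The new associate_total array also has a key n that does not correspond to any item in your other arrays. That is exactly what happens here. anum_key, aname_key and apos_key start out uninitialized, which awk treats as 0, so Davy Jones is stored at index 0 and is never printed. The sorted loop starts at the nonexistent associateNAMEar[8], which is why the largest total appears with a blank name. And because asort moves only the totals, the names that do appear are simply associateNAMEar[7] down to associateNAMEar[1] (the input order reversed), paired with totals that no longer belong to them.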
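For your report, the fix is to key every array by associate id instead of by a separate counter. Leave the totals unsorted and sort a separate list of ids by descending total. Each printed row then uses one id for the name, the position, and the total. This is a portable sketch that assumes the record layout from the question and tells record types apart by field count. `sales_report` is a hypothetical name. In gawk specifically, you could instead set `PROCINFO["sorted_in"] = "@val_num_desc"` before a `for (id in total)` loop to walk the keys in descending order of value.

```shell
# Portable sketch: per-associate report sorted by descending sales total,
# without asort. All arrays are keyed by associate id; only the separate
# ids[] list is reordered, so name, position and total stay paired.
sales_report() {
    awk -F: '
        NF == 4 { price[$1] = $4 }                              # product record
        NF == 3 { name[$1] = $2; pos[$1] = $3; total[$1] = 0 }  # associate record
        NF == 5 { prod[$1] = $2; qty[$1] = $3; who[$1] = $5 }   # sale record
        END {
            for (t in prod)
                total[who[t]] += price[prod[t]] * qty[t]
            # Collect the associate ids, then insertion-sort them by total, largest first.
            n = 0
            for (id in name)
                ids[++n] = id
            for (i = 2; i <= n; i++) {
                cur = ids[i]
                for (j = i - 1; j >= 1 && total[ids[j]] < total[cur]; j--)
                    ids[j + 1] = ids[j]
                ids[j + 1] = cur
            }
            printf("%-18s  %-13s  %12s\n", "Name", "Position", "Sales Amount")
            for (i = 1; i <= n; i++)
                printf("%-18s  %-13s  %12.2f\n", name[ids[i]], pos[ids[i]], total[ids[i]])
        }
    ' "$@"
}
```

Run it as `sales_report database.txt` (the file name is hypothetical). Associates with equal totals, such as Matt Smith and Amelia Pond at 20.00 in the question's data, come out in an unspecified relative order.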

So we can reproduce a problem similar to yours with this program:

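BEGIN {
   # create the keys 0, 1 and 2 (referencing an element creates it)
   for (i = 0; i < 3; i++)
     key[i];

   color[0] = "red"
   color[1] = "green"
   color[2] = "blue"

   # before sorting, the keys of key[] line up with the keys of color[]
   for (i in key)
     print i, color[i]

   print "---"

   # asort renumbers key[] as 1..3, so key 0 (and "red") is no longer reachable
   asort(key)

   for (i in key)
     print i, color[i]
}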

Output:

$ awk -f asort.awk 
0 red
1 green
2 blue
---
1 green
2 blue
3

Oops, where did red go after asort?

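My program includes the keys in its output, which points straight at the source of the problem: the set of keys has clearly changed. You may want to print the keys in your program too, until it is debugged.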
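Along those lines, a small helper that prints every key next to its value makes this kind of problem visible. This portable sketch fills an array the way the question's code does, with a post-incremented counter. It shows that the first key is 0 rather than 1. The helper `dump` and the wrapper `dump_demo` are hypothetical names. The output is piped through `sort` because `for (k in arr)` visits keys in an unspecified order.

```shell
# Portable debugging sketch: print every key of an array next to its value.
# dump() is a hypothetical helper name; `for (k in arr)` order is unspecified.
dump_demo() {
    awk '
    function dump(arr, label,    k) {
        for (k in arr)
            printf("%s[%s] = %s\n", label, k, arr[k])
    }
    BEGIN {
        # Same pattern as associateNAMEar[aname_key++] = $2 in the question:
        # an uninitialized counter is 0, so the first key is 0, not 1.
        names[aname_key++] = "Davy Jones"
        names[aname_key++] = "Ricky Davis"
        dump(names, "names")
    }'
}
dump_demo | sort
```

This prints `names[0] = Davy Jones` and `names[1] = Ricky Davis`.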

Tags: awk, gnu