Splitting columns into rows from a text file using unix shell script - Dynamically changing source file structure
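I have a tab-delimited source file with the structure below. Only the first 9 columns, from ID through Line Item/Property, are fixed; the remaining columns vary in both number and structure.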
ID Date/Time (UTC) User Description Security Change Previous Value New Value Module/List Line Item/Property Scenarios Region EM2 Plan Item PB6 Market EM4 Plants - Master Plan Brand PB4 T/DI GRS 6 GRS 7 Target User Import Object Target Role Export Dashboard Action Time
Here is a sample record from the file:
2572561 3/24/2020 14:01 chiara.bettini@gmail.com FALSE TRUE FILTER: Brand P&L Report - Market Plan Brands Polly Pocket chiara.bettini@gmail.com
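I need to use a Unix shell script to convert it into a CSV file with the headers and data format shown below. I want to keep the fixed columns (ID through Line Item/Property) and put all of the other, dynamically varying columns into Attribute Name and Attribute Value columns: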
ID,Date/Time (UTC),User,Description,Security Change,Previous Value,New Value,Module/List,Line Item/Property,Attribute Name,Attribute Value
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Scenarios,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Region EM2,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Plan Item PB6,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Market EM4,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Plants - Master,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Plan Brand PB4,Polly Pocket
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,T/DI,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,GRS 6,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,GRS 7,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Target User,chiara.bettini@gmail.com
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Import,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Object,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Target Role,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Export,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Dashboard,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Action,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Time,
Note: The following will not work properly if any of the fields contain a comma (,).
Try this bash script (named process for the terminal session that follows):
#!/bin/bash
tr '\t' ',' | {
    IFS=','    # separator for all array reads and printfs
    # read and output heading
    read -r -a heading
    printf "%s\n" "${heading[*]:0:9},Attribute Name,Attribute Value"
    # process one line of data
    while read -r -a data ; do
        for (( i=9; i<${#heading[*]}; ++i )) ; do
            printf "%s\n" "${data[*]:0:9},${heading[i]},${data[i]}"
        done
    done
}
Terminal session:
$ cat data.in | tr '\t' ','
ID,Date/Time (UTC),User,Description,Security Change,Previous Value,New Value,Module/List,Line Item/Property,Scenarios,Region EM2,Plan Item PB6,Market EM4,Plants - Master,Plan Brand PB4,T/DI,GRS 6,GRS 7,Target User,Import,Object,Target Role,Export,Dashboard,Action,Time
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,,,,,,Polly Pocket,,,,chiara.bettini@gmail.com
$ ./process < data.in
ID,Date/Time (UTC),User,Description,Security Change,Previous Value,New Value,Module/List,Line Item/Property,Attribute Name,Attribute Value
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Scenarios,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Region EM2,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Plan Item PB6,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Market EM4,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Plants - Master,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Plan Brand PB4,Polly Pocket
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,T/DI,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,GRS 6,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,GRS 7,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Target User,chiara.bettini@gmail.com
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Import,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Object,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Target Role,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Export,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Dashboard,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Action,
2572561,3/24/2020 14:01,chiara.bettini@gmail.com,,,FALSE,TRUE,FILTER: Brand P&L Report - Market,Plan Brands,Time,
$
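The comma limitation noted above comes from using a comma both as the intermediate field separator and as the output delimiter. One possible workaround, shown here only as a minimal sketch, is to split on a character that is assumed never to occur in the data (the ASCII unit separator, 0x1F) and to quote output fields that contain a comma or a double quote. The function names csv_row and unpivot, and the optional column-count argument, are hypothetical and not part of the original answer. Fields containing embedded newlines are not handled.

```bash
#!/bin/bash
# Sketch only: unpivot tab-delimited input into CSV, quoting where needed.
# Assumption: the byte 0x1F (ASCII unit separator) never appears in the data.

# Print the arguments as one CSV record. A field containing a comma or a
# double quote is wrapped in double quotes, with embedded quotes doubled.
csv_row() {
    local q='"' field out='' first=1
    for field in "$@"; do
        if [[ $field == *,* || $field == *"$q"* ]]; then
            field=$q${field//$q/$q$q}$q
        fi
        if (( first )); then
            out=$field
            first=0
        else
            out+=,$field
        fi
    done
    printf '%s\n' "$out"
}

# Read tab-delimited text on stdin and write the unpivoted CSV to stdout.
# $1 is the number of fixed leading columns (defaults to 9, as in the question).
unpivot() {
    local fixed=${1:-9} i
    local -a heading data
    # Tab is IFS whitespace, so runs of tabs would collapse and empty fields
    # would be lost; a non-whitespace separator keeps them.
    local IFS=$'\037'
    tr '\t' '\037' | {
        read -r -a heading || exit 1
        csv_row "${heading[@]:0:fixed}" 'Attribute Name' 'Attribute Value'
        # The extra test keeps a final line that has no trailing newline.
        while read -r -a data || (( ${#data[@]} )); do
            for (( i = fixed; i < ${#heading[@]}; ++i )); do
                csv_row "${data[@]:0:fixed}" "${heading[i]}" "${data[i]-}"
            done
        done
    }
}

# Example: unpivot < data.in > data.csv
```

Saved to a file and sourced, this could be run as unpivot < data.in. For input without commas or double quotes in any field, the output should match that of the process script above.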