Converting text data file to csv format via shell/bash
I'm looking for a simple console solution to turn a text file that looks like this:
...
Gender: M
Age: 46
History: 01305
Gender: F
Age: 46
History: 01306
Gender: M
Age: 19
History: 01307
Gender: M
Age: 19
History: 01308
....
into a csv file like this:
Gender,Age,History
M,46,01305
F,46,01306
M,19,01307
M,19,01308
Any help is appreciated.
With the following solution I get the output below. Am I doing something wrong?
awk 'BEGIN{printf "Gender,Age,History%s",ORS;FS=":"}{c++} {sub(/^ */,"",$2);printf "%s%s",$2,(c==3)?ORS:","}c==3{c=0}' data.txt >> 1.csv
Gender,Age,History
M
,37
,00001
M
,37
,00001
M
,41
,00001
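A quick way to check whether the input has Windows-style CRLF line endings, a common reason for rows breaking apart like this. This is a sketch under that assumption; the helper names has_crlf and show_line_end are hypothetical, while grep, head and od are standard tools:

```shell
#!/bin/bash
# has_crlf: exit status 0 if the file contains at least one carriage
# return ("\r"), i.e. it most likely has CRLF line endings.
has_crlf() {
  grep -q $'\r' "$1"
}

# show_line_end: dump the bytes of the first line; on a CRLF file the
# od -c output ends with \r followed by \n.
show_line_end() {
  head -n 1 "$1" | od -c
}
```

Usage: `has_crlf data.txt && echo "data.txt has CRLF line endings"`, or `show_line_end data.txt` to look at the raw bytes yourself.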
This one-liner should help:
awk 'BEGIN{FS=":|\n";RS="Gender";OFS=",";print "Gender,Age,History"}$0{print $2,$4,$6}' file
With your example as input, it gives:
Gender,Age,History
M, 46, 01305
F, 46, 01306
M, 19, 01307
M, 19, 01308
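The one-liner relies on a multi-character RS ("Gender"). POSIX leaves that unspecified; gawk and mawk treat it as a regular expression, but some older awks use only its first character. A sketch that should work in any POSIX awk matches on the field name instead (to_csv is a hypothetical name; it reads the data on stdin):

```shell
#!/bin/bash
to_csv() {
  # -F': *' splits on a colon plus any spaces after it, so $1 is the
  # field name and $2 the value. A row is printed when History arrives.
  awk -F': *' '
    BEGIN           { print "Gender,Age,History" }
    $1 == "Gender"  { g = $2 }
    $1 == "Age"     { a = $2 }
    $1 == "History" { print g "," a "," $2 }
  '
}
```

Usage: `to_csv < data.txt > 1.csv`. It does not count lines, but it still assumes History is the last line of each record, and the values come out without the leading space the one-liner leaves in.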
Here is one way to do it in bash, assuming your data file is named data.txt:
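#!/bin/bash
echo "Gender,Age,History"
while read -r line; do
  # print the value (second space-separated field) of each line
  printf '%s' "$(cut -d ' ' -f2 <<< "$line")"
  # History is the last field of a record: end the row, otherwise add a comma
  if [[ "$line" =~ ^History ]]; then
    printf '\n'
  else
    printf ','
  fi
done < data.txt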
Output:
Gender,Age,History
M,46,01305
F,46,01306
M,19,01307
M,19,01308
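The same idea (take the second field of every line, join each group of three) can also be written as a single pipeline, which avoids starting a cut process per line. A sketch, assuming exactly one space after each colon and exactly three lines per record (to_csv is a hypothetical name; it reads the data on stdin):

```shell
#!/bin/bash
to_csv() {
  echo "Gender,Age,History"
  # cut keeps the second space-separated field (the value) of every line;
  # paste reads stdin once per "-", so "- - -" joins 3 lines with ",".
  cut -d ' ' -f2 | paste -d, - - -
}
```

Usage: `to_csv < data.txt > 1.csv`.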
Using only bash builtins, I would do:
#!/bin/bash
echo "Gender,Age,History"
while read -r line; do
  if [[ $line =~ ^Gender:\ *([^\ ]+) ]]; then
    r=${BASH_REMATCH[1]}            # start a new row
  elif [[ $line =~ ^Age:\ *([^\ ]+) ]]; then
    r+=,${BASH_REMATCH[1]}
  elif [[ $line =~ ^History:\ *([^\ ]+) ]]; then
    echo "$r,${BASH_REMATCH[1]}"    # History closes the row
  fi
done < data.txt
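A variation on the builtins-only loop that lets read do the splitting instead of a regex. This is a sketch, assuming the lines always come in Gender, Age, History order (to_csv is a hypothetical name; it reads the data on stdin):

```shell
#!/bin/bash
to_csv() {
  echo "Gender,Age,History"
  local key value row=
  # With IFS=': ', the colon and the space next to it count as one
  # separator, so "Age: 46" gives key=Age and value=46.
  while IFS=': ' read -r key value; do
    case $key in
      Gender)  row=$value ;;
      Age)     row+=",$value" ;;
      History) printf '%s,%s\n' "$row" "$value" ;;
    esac
  done
}
```

Usage: `to_csv < data.txt > 1.csv`. Lines with any other key (such as the `...` placeholders) fall through the case statement and are ignored.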
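I still don't know exactly what the problem was, so I decided to strip the data of every character except those that are supposed to be there (most likely the culprit was some unusual line-ending character):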
sed -e 's/[^a-zA-Z*0-9:]/ /g;s/  */ /g' history.txt > output.txt
Then @sjsam's solution worked:
awk 'BEGIN{printf "Gender,Age,History%s",ORS;FS=":"}{c++} {sub(/^ */,"",$2);printf "%s%s",$2,(c==3)?ORS:","}c==3{c=0}' data.txt >> 1.csv
Thanks, everyone!
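A narrower alternative to the sed cleanup above, assuming the stray characters really are the carriage returns of CRLF line endings: delete only "\r" with tr and pipe the result straight into @sjsam's awk program. This is a sketch; to_csv is a hypothetical name and it reads the data on stdin:

```shell
#!/bin/bash
to_csv() {
  # tr removes every "\r" so awk never sees it at the end of $2.
  tr -d '\r' | awk '
    BEGIN { printf "Gender,Age,History%s", ORS; FS = ":" }
    { c++ }
    # strip leading spaces from the value; end the row on every 3rd line
    { sub(/^ */, "", $2); printf "%s%s", $2, (c == 3) ? ORS : "," }
    c == 3 { c = 0 }
  '
}
```

Usage: `to_csv < data.txt > 1.csv`. Writing with `>` instead of `>>` means rerunning it replaces 1.csv rather than appending a second header and a second copy of the rows.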