Remove duplicate files by filename in a directory (linux)
I have a directory structure like this:

ARCHIVE_LOC -> epoch1 -> a.txt
                         b.txt
            -> epoch2 -> b.txt
                         c.txt
            -> epoch3 -> b.txt
                         c.txt
I have a base archive directory. This directory receives logs (periodically, via rsync) from an Android application, and the logs are saved into directories named after the epoch/timestamp of the rsync run. I want to delete all duplicate log files (files that have the same name) and keep only the latest one. Any help on how to achieve this?

In short, I only want to keep the latest version of each file. One way to tell which file is the latest is its size, since a newer file is always greater than or equal to the older one in size.
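Before deleting anything, it can help to just list which names are duplicated and how their sizes compare across the epoch directories. A minimal sketch (assuming GNU find and awk; ARCHIVE_LOC stands for the base archive directory from the tree above):

# list every file name that occurs more than once, together with its size and path
find ARCHIVE_LOC -type f -printf '%f\t%s\t%p\n' | sort | awk -F'\t' '
    { count[$1]++; lines[$1] = lines[$1] $0 "\n" }
    END { for (n in count) if (count[n] > 1) printf "%s", lines[n] }
'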
I wrote the following script; it works well for me.
#!/bin/bash
# usage: <script> BASE_DIR [-d]
# check that the base directory provided as the first argument exists
[ -e "$1" ] || {
    printf "\nError: invalid path. \n\n"
    exit 1
}
cd "$1"
option="$2"   # pass -d to actually delete; otherwise duplicates are only printed
# find the files in the base directory, sort the names and keep only the duplicated ones,
# then iterate over the resulting list of file names
# note: no extension filter is applied here; add e.g. -name "*.json" to restrict which log files are considered
for name in `find -type f -printf "%f\n" | sort | uniq -d`;
do
    # count the duplicates of this name so we can keep the last file (the biggest in size)
    numDups=$(find -name $name | wc -l); # number of duplicates found for a given file name
    for file in $(find -name $name | sort -h); # sort the paths so the latest/biggest file comes last
    do
        if [ $numDups -ne 1 ];
        then
            if [ "$option" = -d ] # remove the duplicate file
            then
                rm $file
            else
                echo $file # if -d is not provided, just print the duplicate file names
                # note: this prints only the duplicate files, not the latest/biggest one
            fi
        fi
        numDups=$(($numDups-1))
        # note: as the code stands, the option value is checked for every duplicate file;
        # the if conditions could be moved out of the loop, but that would duplicate code.
        # the script can be reworked if serious performance issues show up.
    done
done;
exit 0;
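Assuming the script is saved as dedup.sh (the name is just an example), a run could look like this; without -d it only prints the older duplicates, with -d it removes them:

./dedup.sh /path/to/ARCHIVE_LOC       # dry run: print the duplicate files that would be removed
./dedup.sh /path/to/ARCHIVE_LOC -d    # actually remove the older duplicates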
#!/bin/bash
declare -A arr        # associative array: md5 checksum -> number of times seen
shopt -s globstar     # make ** match files in all subdirectories
for file in **; do
    [[ -f "$file" ]] || continue        # skip directories
    read cksm _ < <(md5sum "$file")     # first field of md5sum output is the checksum
    if ((arr[$cksm]++)); then           # non-zero means this content was seen before
        echo "rm $file"                 # print the rm command instead of deleting
    fi
done
https://superuser.com/questions/386199/how-to-remove-duplicated-files-in-a-directory
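Note that this variant identifies duplicates by md5 checksum (i.e. identical content) rather than by file name, and it only prints the rm commands instead of running them. One way to use it, assuming it is saved as dedup-md5.sh (a made-up name): run it from the archive directory, review the output, and only then change echo "rm $file" to rm -- "$file" to actually delete:

cd /path/to/ARCHIVE_LOC
bash dedup-md5.sh        # prints one "rm <path>" line per duplicate found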
On Debian 7, I managed to come up with the following one-liner:
find path/to/folder -type f -name '*.txt' -printf '%Ts\t%p\n' | sort -nr | cut -f2 | perl -ne '/(\w+\.txt)/; print if $seen{$&}++' | xargs rm
It is long and there may well be a shorter way, but it seems to do the trick. I combined the findings from here

https://superuser.com/questions/608887/how-can-i-make-find-find-files-in-reverse-chronological-order

and here
Perl regular expression removing duplicate consecutive substrings in a string
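Since the question notes that the newest file is also the biggest, the same pipeline could rank by size instead of mtime. A sketch of that variant (untested beyond the layout described above), with rm echoed first as a dry run:

find path/to/folder -type f -name '*.txt' -printf '%s\t%p\n' | sort -nr | cut -f2- | perl -ne '/([^\/]+\.txt)$/; print if $seen{$1}++' | xargs -r echo rm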