How do I find the file name and size of each file from an fsimage?

Question

I am trying to find files in HDFS that are smaller than the block size.

Using OIV, I converted the fsimage into a delimited text file, as shown below.

hdfs oiv_legacy -i /tmp/fsimage -o /tmp/fsimage_$RUNDATE/fsimage.txt -p Delimited -delimiter '@'

The fsimage contains a lot of data. How can I get the file name and file size of every file in HDFS from this output?

Can anyone help?

Thanks in advance.

Answer 1

hadoop fs -find /tmp/fsimage size 64 -print

Note: I am using MapR Hadoop. The syntax may differ on other distributions such as Cloudera or Hortonworks.

Answer 2

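Take a look at the script at the end of this documentation.

Start with the following. Note that it loads a tab-delimited file; for output produced with -delimiter '@' as in the question, pass '@' to PigStorage instead of '\t'.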

A = LOAD '$inputFile' USING PigStorage('\t') AS (path:chararray,
                                                replication:int,
                                                modTime:chararray,
                                                accessTime:chararray,
                                                blockSize:long,
                                                numBlocks:int,
                                                fileSize:long,
                                                NamespaceQuota:int,
                                                DiskspaceQuota:int,
                                                perms:chararray,
                                                username:chararray,
                                                groupname:chararray);

-- Grab the pathname and filesize
B = FOREACH A generate path, fileSize;

-- Save results
STORE B INTO '$outputFile';
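
For a quick check without Pig, the same column layout can be filtered directly with awk. The following is only a minimal sketch: it assumes the delimited file uses the same twelve-column order as the Pig schema above (path in column 1, blockSize in column 5, fileSize in column 7), that no path contains the delimiter character, and that directories appear with a block size of 0 so they never match. The function name small_files is hypothetical and chosen only for illustration.

```shell
# small_files FILE [DELIM]
# Print "path<TAB>fileSize" for every row whose fileSize (column 7)
# is smaller than its blockSize (column 5).
# Assumption: column order matches the Pig schema in this answer.
# Assumption: rows with blockSize 0 (e.g. directories) should be skipped;
# they are excluded automatically because 0 < 0 is false.
small_files() {
  awk -F"${2:-@}" '($7 + 0) < ($5 + 0) { print $1 "\t" $7 }' "$1"
}
```

With the file from the question, this could be called as small_files /tmp/fsimage_$RUNDATE/fsimage.txt '@' and redirected to a file. Empty files (fileSize 0 with a non-zero block size) are reported as well, since they are also smaller than one block.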


hadoop

apache-pig