Processing a log file using Pig

I have the following logs. Can anyone tell me how to process them using Pig Latin?

**

SYSTEM IP:192.168.68.78 
Distro info:Red Hat Enterprise Linux Server release 6.6 (Santiago)
Kernel:Linux bugzilla-blr-in 2.6.32-504.16.2.el6.x86_64 #1 SMP Tue Mar 10 17:01:00 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
Uptime:12:27:42 up 8 days, 17:57,  0 users,  load average: 0.00, 0.00, 0.00
Memory:Total:1869Mb Memory:Used:1567Mb  Memory:Free:302Mb
Swap:Total:1999Mb   Swap:Used:0Mb   Swap:Free: 1999Mb
Architecture:x86_64
  Processor:0:Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz
Date:Wed Jun 29 12:27:42 IST 2016

SCRIPT USER
User:aimsadm (uid:503)
Groups:aimsadm
Working dir:/home/aimsadm
Home dir:/home/aimsadm

NETWORK DETAILS
Hostname:bugzilla-blr-in
IP (    ):127.0.0.1/8
IP (eth0):192.168.68.78/24
Gateway:192.168.68.1
Name Server:8.8.8.8
Name Server:192.168.68.80

LIST OF USERS:sdudam,sudutha,djegathesa,aimsadm,krishnang,

CLAMD STATUS: CLAM AV service is stopped or not installed

NAGIOS STATUS: Nagios service is running

OSSEC STATUS: Ossec service is stopped or not installed

NTPD STATUS: NTP service is running

HARDENING STATUS:Hardening Done

AD INTEGRATION STATUS:AD Integration Not Done

HARDWARE/PLATFORM DETAILS
Hardware Platform:64Bit
Hardware Info :DMI 2.3 present.
DMI: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006  05/23/2012

OS DETAILS
Red Hat Enterprise Linux Server release 6.6 (Santiago)
Linux bugzilla-blr-in 2.6.32-504.16.2.el6.x86_64 #1 SMP Tue Mar 10 17:01:00 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux

CPU INFO
model name  : Intel(R) Xeon(R) CPU E5-2640 0 @ 2.50GHz

MEMORY INFO
MemTotal:        1914776 kB
RAM:1 GB

HARD DISK DETAILS

MOUNT DETAILS
Filesystem:/dev/mapper/vg_bugzillablrin-LogVol00,Type:ext4,Total Size:22G,Used:2.4G,Avail:19G,Use%:12%,Mounted on:/
Filesystem:tmpfs,Type:tmpfs,Total Size:981M,Used:0,Avail:981M,Use%:0%,Mounted on:/dev/shm
Filesystem:/dev/sda1,Type:ext4,Total Size:297M,Used:95M,Avail:186M,Use%:34%,Mounted on:/boot
Filesystem:/dev/mapper/vg_bugzillablrin-LogVol01,Type:ext4,Total Size:21G,Used:5.8G,Avail:14G,Use%:30%,Mounted on:/var

LSBLK OUTPUT
NAME:sr0,
MAJ:MIN:11:0,RM:1,SIZE:1024M,RO:0,TYPE:rom,MOUNTPOINT::
NAME:sda,
MAJ:MIN:8:0,RM:0,SIZE:60G,RO:0,TYPE:disk,MOUNTPOINT::
NAME:sda1,
MAJ:MIN:8:1,RM:0,SIZE:300M,RO:0,TYPE:part,MOUNTPOINT::/boot
NAME:sda2,
MAJ:MIN:8:2,RM:0,SIZE:59.7G,RO:0,TYPE:part,MOUNTPOINT::

RUNNING SERVICES
auditd running...
crond running...
messagebus running...
nrpe running...
ntpd running...
rhnsd running...
rhsmcertd running...
rpcbind running...
openssh-daemon running...




**

SYSTEM IP:192.168.68.35 
Distro info:CentOS release 6.6 (Final)
Kernel:Linux altifin-ci-app 2.6.32-504.16.2.el6.x86_64 #1 SMP Wed Apr 22 06:48:29 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Uptime:12:28:06 up 48 days, 20:31,  0 users,  load average: 0.00, 0.00, 0.00
Memory:Total:11903Mb    Memory:Used:1277Mb  Memory:Free:10625Mb
Swap:Total:8191Mb   Swap:Used:0Mb   Swap:Free: 8191Mb
Architecture:x86_64
  Processor:0:Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
  Processor:1:Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
Date:Wed Jun 29 12:28:06 IST 2016

SCRIPT USER
User:aimsadm (uid:509)
Groups:aimsadm
Working dir:/home/aimsadm
Home dir:/home/aimsadm

NETWORK DETAILS
Hostname:altifin-ci-app
IP (lo):127.0.0.1/8
IP (eth0):192.168.68.35/24
Gateway:192.168.68.1
Name Server:192.168.68.10
Name Server:192.168.68.4

LIST OF USERS:altipay,aramesh,sdudam,nagios,kpankaj,sudutha,miyappan,skosanam,djegathesa,aimsadm,

CLAMD STATUS: CLAM AV service is stopped or not installed

NAGIOS STATUS: Nagios service is running

OSSEC STATUS: Ossec service is stopped or not installed

NTPD STATUS: NTP service is running

HARDENING STATUS:Hardening Done

AD INTEGRATION STATUS:AD Integration Not Done

HARDWARE/PLATFORM DETAILS
Hardware Platform:64Bit
Hardware Info :DMI 2.3 present.
DMI: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090006  05/23/2012

OS DETAILS
CentOS release 6.6 (Final)
Linux altifin-ci-app 2.6.32-504.16.2.el6.x86_64 #1 SMP Wed Apr 22 06:48:29 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

CPU INFO
model name  : Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz
model name  : Intel(R) Xeon(R) CPU E5-2650 v2 @ 2.60GHz

MEMORY INFO
MemTotal:       12189032 kB
RAM:11 GB

HARD DISK DETAILS

MOUNT DETAILS
Filesystem:/dev/mapper/vg_altifinci-LogVol01,Type:ext4,Total Size:203G,Used:80G,Avail:113G,Use%:42%,Mounted on:/
Filesystem:tmpfs,Type:tmpfs,Total Size:6.3G,Used:0,Avail:6.3G,Use%:0%,Mounted on:/dev/shm
Filesystem:/dev/sda1,Type:ext4,Total Size:500M,Used:64M,Avail:410M,Use%:14%,Mounted on:/boot

LSBLK OUTPUT
NAME:sr0,
MAJ:MIN:11:0,RM:1,SIZE:1024M,RO:0,TYPE:rom,MOUNTPOINT::
NAME:sda,
MAJ:MIN:8:0,RM:0,SIZE:200G,RO:0,TYPE:disk,MOUNTPOINT::
NAME:sda1,
MAJ:MIN:8:1,RM:0,SIZE:500M,RO:0,TYPE:part,MOUNTPOINT::/boot
NAME:sda2,
MAJ:MIN:8:2,RM:0,SIZE:199.5G,RO:0,TYPE:part,MOUNTPOINT::

RUNNING SERVICES
abrtd running...
abrt-dump-oops running...
acpid running...
atd running...
auditd running...
automount running...
crond running...
cupsd running...
hald running...
mcelog running...
messagebus running...
MySQL but
rpc.statd running...
nrpe running...
ntpd running...
rpcbind running...
openssh-daemon running...

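Yes, there is a way. Let me explain.
Although the sample data falls into the 'unstructured' category, there is always something specific we want out of it.
In other words, we are looking for a pattern: the line or lines that hold the data you need.
To get at it, identify that pattern in the sample data and write a suitable regular expression (RegEx) to extract it.
Pig also ships with the 'piggybank' jar, which supports a variety of predefined file formats, including unstructured ones like yours. The jar has to be loaded with REGISTER before its classes can be used.
Try the 'RegExLoader' class from piggybank's org.apache.pig.piggybank.storage package: https://pig.apache.org/docs/r0.15.0/api/ (RegExLoader itself is an abstract base class, so in a script you would normally use a concrete subclass such as MyRegExLoader, which takes the pattern as its argument.)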
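
Before wiring patterns into a Pig script, it can help to prototype them on a small slice of the log. The sketch below does that in plain Python rather than Pig Latin, so it only illustrates the "identify the pattern, then extract with a RegEx" step. The field names (`ip`, `distro`, `hostname`, `mem_total_mb`), the `parse_records` helper, and the choice of `SYSTEM IP:` as the record boundary are my assumptions, since the question does not say which fields are wanted.

```python
import re

# Abbreviated copy of the two records from the sample log.
SAMPLE = """**

SYSTEM IP:192.168.68.78
Distro info:Red Hat Enterprise Linux Server release 6.6 (Santiago)
Memory:Total:1869Mb Memory:Used:1567Mb  Memory:Free:302Mb

NETWORK DETAILS
Hostname:bugzilla-blr-in

**

SYSTEM IP:192.168.68.35
Distro info:CentOS release 6.6 (Final)
Memory:Total:11903Mb    Memory:Used:1277Mb  Memory:Free:10625Mb

NETWORK DETAILS
Hostname:altifin-ci-app
"""

# One pattern per field (hypothetical field names); each pattern has
# exactly one capture group holding the value we want.
PATTERNS = {
    "ip": re.compile(r"^SYSTEM IP:(\S+)", re.MULTILINE),
    "distro": re.compile(r"^Distro info:(.+)$", re.MULTILINE),
    "hostname": re.compile(r"^Hostname:(\S+)", re.MULTILINE),
    "mem_total_mb": re.compile(r"^Memory:Total:(\d+)Mb", re.MULTILINE),
}

# Assumption: every record starts with a "SYSTEM IP:" line.
RECORD_START = re.compile(r"^SYSTEM IP:", re.MULTILINE)


def parse_records(text):
    """Cut the log into per-host chunks and extract one dict per chunk."""
    starts = [m.start() for m in RECORD_START.finditer(text)]
    bounds = starts + [len(text)]
    records = []
    for begin, end in zip(bounds, bounds[1:]):
        chunk = text[begin:end]
        record = {}
        for name, pattern in PATTERNS.items():
            match = pattern.search(chunk)
            # A missing field becomes None instead of raising an error.
            record[name] = match.group(1).strip() if match else None
        records.append(record)
    return records


if __name__ == "__main__":
    for rec in parse_records(SAMPLE):
        print(rec)
```

These patterns use only basic constructs (`^`, `\S`, `\d`, one capture group), so they should carry over to the Java regex syntax that Pig uses, for example as the pattern argument of the built-in `REGEX_EXTRACT` function or of a piggybank loader. Keep in mind that Pig loaders read one line at a time, so grouping lines back into per-host records would need an extra step there.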

Also, please tell us the exact output you are looking for.