Using pattern matching to sort from a file, Java
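So I have got my program to correctly separate the lines of a text file and even match the pattern for the first line of text, but I also need to detect and separate the address lines of the text file and sort them by their direction or by street/broadway. However, I can't even get an initial pattern to detect the addresses. Am I using the wrong regular expressions, and is that why the address portion is not detected correctly?
Code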
package csi311;
// Import some standard Java libraries.
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.util.ArrayList;
/**
 * Hello world example. Shows passing in command line arguments, in this case a filename.
 * If the filename is given, read in the file and echo it to stdout.
 */
public class HelloCsi311 {
    /**
     * Class constructor.
     */
    public HelloCsi311() {
    }
    /**
     * @param filename the name of a file to read in
     * @throws Exception on anything bad happening
     */
    public void run(String filename) throws Exception {
        if (filename != null) {
            readFile(filename);
        }
    }
    /**
     * @param filename the name of a file to read in
     * @throws Exception on anything bad happening
     */
    private void readFile(String filename) throws Exception {
        System.out.println("Dumping file " + filename);
        // Open the file and connect it to a buffered reader.
        BufferedReader br = new BufferedReader(new FileReader(filename));
        ArrayList<String> foundaddr = new ArrayList<String>();
        String line = null;
        // Package ID: three digits, three letters, four digits.
        String pattern = "^\\d\\d\\d-[A-Za-z][A-Za-z][A-Za-z]-\\d\\d\\d\\d";
        // Candidate address patterns (these are the ones that never match).
        String address[] = new String[4];
        address[0] = "\\d{1,3}\\s\\[A-Za-z]{1,20}";
        address[1] = "\\d{1,3}\\s\\[A-Za-z]{1,20}\\s\\d{1,3}\\[A-Za-z]{1,20}\\s\\[A-Za-z]{1,20}";
        address[2] = "\\d{1,3}\\s\\d{1,3}\\[A-Za-z]{1,20}\\s\\[A-Za-z]{1,20}";
        address[3] = "\\d\\d\\s\\[A-Za-z]{1,20}";
        Pattern r = Pattern.compile(pattern);
        // Get lines from the file one at a time until there are no more.
        while ((line = br.readLine()) != null) {
            if(line.trim().isEmpty()) {
                continue;
            }
            // Strip whitespace around the commas, then split into columns.
            String sample = line.replaceAll("\\s+,", ",").replaceAll(",+\\s",",");
            String[] result = sample.split(",");
            String pkgId = result[0].trim().toUpperCase();
            String pkgAddr = result[1].trim();
            Float f = Float.valueOf(result[2]);
            for(String str : result){
                // Trying to match for different types
                for(String pat : address){
                    if(str.matches(pat)){
                        System.out.println(pat);
                    }
                }
                if(f < 50 && !pkgId.matches(pattern)) {
                    Matcher m = r.matcher(str);
                    if(m.find()) {
                        foundaddr.add(str);
                    }
                }
            }
        }
        if(foundaddr != null) {
            System.out.println(foundaddr.size());
        }
        // Close the buffer and the underlying file.
        br.close();
    }
    /**
     * @param args filename
     */
    public static void main(String[] args) {
        // Make an instance of the class.
        HelloCsi311 theApp = new HelloCsi311();
        String filename = null;
        // If a command line argument was given, use it as the filename.
        if (args.length > 0) {
            filename = args[0];
        }
        try {
            // Run the run(), passing in the filename, null if not specified.
            theApp.run(filename);
        }
        catch (Exception e) {
            // If anything bad happens, report it.
            System.out.println("Something bad happened!");
            e.printStackTrace();
        }
    }
}
Text file
123-ABC-4567, 15 W. 15th St., 50.1
456-BGT-9876,22 Broadway,24
QAZ-456-QWER, 100 East 20th Street,50
Q2Z-457-QWER, 200 East 20th Street, 49
6785-FGH-9845 ,45 5th Ave, 12.2,
678-FGH-9846 ,45 5th Ave, 12.2
123-ABC-9999, 46 Foo Bar, 220.0
347-poy-3465, 101 B'way,24
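Below are the lines of code that are supposed to handle the address lines, but for some reason they never match the patterns that should pick out the addresses. The print statement above the for loop shows the addresses being processed, yet the address lines are not even detected as matches, and I am confused as to why.
The problematic lines of code are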
for(String str : result){
    //System.out.println(str);
    // Trying to match for different types
    for(String pat : address){
        if(str.matches(pat)){
            System.out.println(pat);
        }
    }
Desired output (edited as requested)
22 Broadway
45 5th Ave
101 B'way
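I think the problem is with your regular expressions. Take \\d\\d\\s\\[A-Za-z]{1,20} for example: once the string escapes are resolved, it becomes \d\d\s\[A-Za-z]{1,20}. Broken down:
    \d      matches any digit
    \d      matches any digit
    \s      matches any whitespace character
    \[      matches a literal [ character
    A-Za-z  matches the literal text A-Za-z
    ]       matches a literal ] character
    {1,20}  repeats the preceding character (]) 1 to 20 times
The regex you probably want is \d\d\s[A-Za-z]{1,20}, which as an escaped Java string is \\d\\d\\s[A-Za-z]{1,20}. Note that there is no \ before the [.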
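To make the difference concrete, here is a minimal sketch that compiles both the pattern from the question and the corrected one and tries them on a sample address. The class name BracketEscapeDemo and its helper methods are made up for illustration; only java.util.regex from the standard library is used.

```java
import java.util.regex.Pattern;

class BracketEscapeDemo {
    // Pattern from the question: "\\[" escapes the bracket, so it matches a literal '['.
    static final Pattern BUGGY = Pattern.compile("\\d\\d\\s\\[A-Za-z]{1,20}");
    // Corrected pattern: the unescaped '[' opens a character class of letters.
    static final Pattern FIXED = Pattern.compile("\\d\\d\\s[A-Za-z]{1,20}");

    // Both helpers require the whole input to match, like String.matches().
    static boolean buggyMatches(String s) {
        return BUGGY.matcher(s).matches();
    }

    static boolean fixedMatches(String s) {
        return FIXED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(buggyMatches("22 Broadway"));   // false: no literal '[' in the input
        System.out.println(fixedMatches("22 Broadway"));   // true
        System.out.println(buggyMatches("22 [A-Za-z]]]")); // true: this is what the buggy pattern describes
    }
}
```

The last call shows the kind of input the escaped version actually accepts, which is why none of the real addresses were reported as matches.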
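One more thing to keep in mind: a regex can match anywhere inside a string. For example, the regex a matches the string a, but it also matches abc, bac, abracadabra, and so on. To avoid this, use the anchors ^ and $ to match the start and the end of the input respectively. Your regex then becomes ^\d\d\s[A-Za-z]{1,20}$. Strictly speaking, String.matches(), which your inner loop uses, already requires the entire string to match, so the anchors matter mainly when you search with Matcher.find(), as your code does for the package ID.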
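As a rough illustration of the anchoring point, the sketch below compares Matcher.find() with and without anchors, and String.matches() without them. The class name AnchorDemo and its helper methods are hypothetical names chosen for this example.

```java
import java.util.regex.Pattern;

class AnchorDemo {
    static final String REGEX = "\\d\\d\\s[A-Za-z]{1,20}";
    static final Pattern UNANCHORED = Pattern.compile(REGEX);
    static final Pattern ANCHORED = Pattern.compile("^" + REGEX + "$");

    // find() succeeds if the pattern matches any substring of the input.
    static boolean findUnanchored(String s) {
        return UNANCHORED.matcher(s).find();
    }

    // With ^ and $, find() can only succeed on the whole input.
    static boolean findAnchored(String s) {
        return ANCHORED.matcher(s).find();
    }

    // String.matches() always tests the whole input, anchors or not.
    static boolean fullMatch(String s) {
        return s.matches(REGEX);
    }

    public static void main(String[] args) {
        String addr = "100 East 20th Street";
        System.out.println(findUnanchored(addr)); // true: the substring "00 East" matches
        System.out.println(findAnchored(addr));   // false
        System.out.println(fullMatch(addr));      // false
        System.out.println(fullMatch("22 Broadway")); // true
    }
}
```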
I also noticed that you use the for loop for(String str : result){ to match every column against the address regexes. It seems to me that you should only match result[1], that is, pkgAddr.
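Matching only the address column could look roughly like the sketch below. The class AddressColumnDemo, the method addressOf, and the ADDRESS pattern are all assumptions made for illustration; the pattern is deliberately loose (a house number of up to three digits followed by words made of letters, digits, apostrophes, or periods) and is not meant as the final set of patterns you need for sorting.

```java
import java.util.Optional;
import java.util.regex.Pattern;

class AddressColumnDemo {
    // Hypothetical, loose address pattern: 1-3 digits, then one or more
    // whitespace-separated words of letters, digits, apostrophes or periods.
    static final Pattern ADDRESS = Pattern.compile("\\d{1,3}(\\s[A-Za-z0-9'.]+)+");

    // Returns the trimmed second column if it looks like an address.
    static Optional<String> addressOf(String line) {
        String[] fields = line.split(",");
        if (fields.length < 2) {
            return Optional.empty(); // no address column on this line
        }
        String addr = fields[1].trim();
        return ADDRESS.matcher(addr).matches() ? Optional.of(addr) : Optional.empty();
    }

    public static void main(String[] args) {
        String[] lines = {
            "456-BGT-9876,22 Broadway,24",
            "678-FGH-9846 ,45 5th Ave, 12.2",
            "347-poy-3465, 101 B'way,24"
        };
        for (String line : lines) {
            // Print the address only when the second column matches.
            addressOf(line).ifPresent(System.out::println);
        }
    }
}
```

Only the second column is tested here, so the package ID and the weight can never be mistaken for an address.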
Finally, take a look at Regex 101. It lets you test your regex against a batch of inputs to see whether they match.