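Extracting lines with only 1 word?
I'm trying to get only the lines that contain exactly 1 word.
My current approach gives the correct results, but sometimes the input file has more than 4 lines between each word. So I need a way to get only the lines that contain 1 word. Any ideas?
Here is a sample of the input text: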
adversary
someone who offers opposition
The students are united by shared suffering, and by a common adversary.
— New York Times (Nov 10, 2014)
aplomb
great coolness and composure under strain
I wish I had handled it with aplomb.
— New York Times (May 18, 2014)
apprehensive
So the output should look like this:
adversary
aplomb
apprehensive
The current code is as follows:
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.file.Files;
import java.nio.file.Paths;
public class Process {
    public static void main(String[] args) {
        String fileNameOutput = "OutputFile.txt";
        String fileName = "InputWords";
        try (BufferedReader bReader = Files.newBufferedReader(Paths.get(fileName))) {
            PrintWriter outputStream = new PrintWriter(fileNameOutput);
            int lineNum = 0;
            String line = null;
            while ((line = bReader.readLine()) != null) {
                lineNum++;
                if (lineNum % 4 == 0) continue;
                outputStream.println(line);
            }
            outputStream.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Thanks for your time.
Edit
After applying the fix suggested below, I get this error on the console:
java.nio.charset.MalformedInputException: Input length = 1
at java.nio.charset.CoderResult.throwException(Unknown Source)
at sun.nio.cs.StreamDecoder.implRead(Unknown Source)
at sun.nio.cs.StreamDecoder.read(Unknown Source)
at java.io.InputStreamReader.read(Unknown Source)
at java.io.BufferedReader.fill(Unknown Source)
at java.io.BufferedReader.readLine(Unknown Source)
at java.io.BufferedReader.readLine(Unknown Source)
at Process.main(Process.java:20)
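Well, instead of the
if ( lineNum % 4 == 0 ) continue;
condition, you can simply check whether the line you just read contains more than one token:
if (line.split(" ").length > 1) continue;
or
if (line.indexOf(" ") >= 0) continue;
The latter should be more efficient than the former, since it does not have to build an array of tokens.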
Instead of
if ( lineNum % 4 == 0 ) continue;
just check for lines that contain a space:
if (line.trim().contains(" ")) continue;
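The three space checks suggested so far do not behave identically on lines with stray leading or trailing spaces, so here is a small sketch that runs them side by side. The class name SpaceCheckDemo, the method names, and the sample strings are made up for illustration; only the three conditions come from the answers above.

```java
final class SpaceCheckDemo {
    // First variant: skip when splitting on a single space yields more than one token.
    static boolean skipBySplit(String line) {
        return line.split(" ").length > 1;
    }

    // Second variant: skip when the line contains a space anywhere.
    static boolean skipByIndexOf(String line) {
        return line.indexOf(" ") >= 0;
    }

    // Third variant: skip when a space remains after trimming both ends.
    static boolean skipByTrimContains(String line) {
        return line.trim().contains(" ");
    }

    public static void main(String[] args) {
        // Hypothetical sample lines, including leading and trailing spaces.
        String[] samples = {"aplomb", "aplomb ", " aplomb", "great coolness", ""};
        for (String line : samples) {
            System.out.printf("[%s] split=%b indexOf=%b trimContains=%b%n",
                    line,
                    skipBySplit(line),
                    skipByIndexOf(line),
                    skipByTrimContains(line));
        }
    }
}
```

Only the trim() variant keeps a single word regardless of surrounding spaces. None of the three reacts to tabs, and all of them let empty lines through, so a blank line in the input would still be written to the output.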
You are getting the error at java.io.BufferedReader.readLine(Unknown Source), so the input file was not found...
Try changing the file name from
String fileName = "InputWords";
to
String fileName = "InputWords.txt";
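It depends on your definition of a "word":
- a sequence of letters
- a sequence of non-space characters
- a glyph that represents a word (e.g. in Chinese)
Let's stick with the first two and check with a regular expression, which also makes it easy to ignore leading and trailing whitespace. Here are three ways: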
if (line.matches("\\s*[a-zA-Z]+\\s*")) // One or more ASCII letters
    outputStream.println(line);
if (line.matches("\\s*\\p{L}+\\s*")) // One or more Unicode letters
    outputStream.println(line);
if (line.matches("\\s*\\S+\\s*")) // One or more non-space characters
    outputStream.println(line);
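To see where the three patterns differ, here is a minimal sketch that tries them on a few lines. The class name OneWordRegexDemo, the constant names, and the sample strings are assumptions for illustration. The patterns are precompiled with Pattern.compile, because String.matches compiles the pattern again on every call.

```java
import java.util.regex.Pattern;

final class OneWordRegexDemo {
    // Same three expressions as above, compiled once and reused.
    static final Pattern ASCII_WORD = Pattern.compile("\\s*[a-zA-Z]+\\s*");
    static final Pattern UNICODE_WORD = Pattern.compile("\\s*\\p{L}+\\s*");
    static final Pattern NON_SPACE = Pattern.compile("\\s*\\S+\\s*");

    // True when the whole line matches the pattern, like String.matches.
    static boolean isMatch(Pattern pattern, String line) {
        return pattern.matcher(line).matches();
    }

    public static void main(String[] args) {
        // Hypothetical sample lines; \u00ef is a precomposed i with diaeresis.
        String[] samples = {"  aplomb ", "na\u00efve", "well-known", "great coolness", ""};
        for (String line : samples) {
            System.out.printf("[%s] ascii=%b unicode=%b nonSpace=%b%n",
                    line,
                    isMatch(ASCII_WORD, line),
                    isMatch(UNICODE_WORD, line),
                    isMatch(NON_SPACE, line));
        }
    }
}
```

The ASCII pattern rejects accented words, both letter-based patterns reject hyphenated words, and all three reject empty lines because + requires at least one character.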
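As for the MalformedInputException, it is caused by a code page mismatch (the exception is thrown by StreamDecoder). newBufferedReader(path) reads the file as UTF-8, but the file is most likely in the system default code page rather than UTF-8. Use newBufferedReader(path, Charset.defaultCharset()) instead.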
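The mismatch can be reproduced in isolation. The sketch below writes a small file in ISO-8859-1 and then reads it back twice, once as UTF-8 and once with the charset it was written in. ISO-8859-1 is only a stand-in here, since the encoding of the real input file is not known; the class name CharsetMismatchDemo, the helper names, and the sample text are likewise made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.MalformedInputException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class CharsetMismatchDemo {
    // Reads the first line of the file, decoding it with the given charset.
    static String firstLine(Path path, Charset cs) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(path, cs)) {
            return reader.readLine();
        }
    }

    // Writes "caf\u00e9" as ISO-8859-1 bytes; the byte 0xE9 is not valid UTF-8 here.
    static Path writeLatin1Sample() throws IOException {
        Path path = Files.createTempFile("charset-demo", ".txt");
        Files.write(path, "caf\u00e9\n".getBytes(StandardCharsets.ISO_8859_1));
        return path;
    }

    public static void main(String[] args) throws IOException {
        Path path = writeLatin1Sample();
        try {
            // Same situation as newBufferedReader(path) with no charset argument.
            firstLine(path, StandardCharsets.UTF_8);
            System.out.println("Decoded as UTF-8 without error");
        } catch (MalformedInputException e) {
            System.out.println("UTF-8 decoding failed: " + e.getMessage());
        }
        // Decoding with the charset the file was written in succeeds.
        System.out.println(firstLine(path, StandardCharsets.ISO_8859_1));
        Files.delete(path);
    }
}
```

Charset.defaultCharset() works when the file was produced on the same machine with default settings. If the file's encoding is known, naming it explicitly, for example through StandardCharsets or Charset.forName, does not depend on the platform.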
It works!! I needed to add the charset.
// Requires import java.nio.charset.Charset; in addition to the imports in the question.
public static void main(String[] args) {
    String fileNameOutput = "OutputFile.txt";
    String fileName = "InputWords.txt";
    Charset cs = Charset.defaultCharset();
    // Both the reader and the writer are closed automatically, even on an exception.
    try (BufferedReader bReader = Files.newBufferedReader(Paths.get(fileName), cs);
         PrintWriter outputStream = new PrintWriter(fileNameOutput)) {
        String line;
        while ((line = bReader.readLine()) != null) {
            // Skip any line that holds more than one space-separated token.
            if (line.split(" ").length > 1) continue;
            outputStream.println(line);
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
}