Splitting a Paragraph into Sentences in Java
Source: 爱站网 | Date: 2022-11-11 | Editor: shared by a user
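Splitting a paragraph into sentences in Java looks simple: take a paragraph and run a splitting routine over it. Numbers and abbreviations that contain periods make it harder than it first appears. The question and approach below, shared by the 爱站技术频道 editor, work through one such case.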
Problem description
I am working on a task that requires splitting a paragraph into sentences. For example, given the paragraph:
"This is a long string with some numbers 123.456,78 or 100.000 and e.g. some abbreviations in it, which shouldn't split the sentence. Sometimes there are problems, i.e. in this one. here and abbr at the end x.y.. cool."
I need the following 4 sentences:
This is a long string with some numbers 123.456,78 or 100.000 and e.g. some abbreviations in it, which shouldn't split the sentence.
Sometimes there are problems, i.e. in this one.
here and abbr at the end x.y..
cool
This is very similar to this task, which was implemented in JavaScript:
var re = /\b(\w\.\w\.)|([.?!])\s+(?=[A-Za-z])/g;
var str = 'This is a long string with some numbers 123.456,78 or 100.000 and e.g. some abbreviations in it, which shouldn\'t split the sentence. Sometimes there are problems, i.e. in this one. here and abbr at the end x.y.. cool.';
var result = str.replace(re, function(m, g1, g2){
    return g1 ? g1 : g2+"\r";
});
var arr = result.split("\r");
document.body.innerHTML = "<pre>" + JSON.stringify(arr, 0, 4) + "</pre>";
I tried to implement this in Java with the help of this link, but I am stuck on how to reproduce the replace function call above in Java code.
public static void main(String[] args) {
    String content = "This is a long string with some numbers 123.456,78 or 100.000 and e.g. some abbreviations in it, which shouldn't split the sentence. Sometimes there are problems, i.e. in this one. here and abbr at the end x.y.. cool.";
    Pattern p = Pattern.compile("/\\b(\\w\\.\\w\\.)|([.?!])\\s+(?=[A-Za-z])/g");
    Matcher m = p.matcher(content);
    List<String> tokens = new LinkedList<String>();
    while (m.find()) {
        String token = m.group(1); // group 0 is always the entire match
        tokens.add(token);
    }
    System.out.println(tokens);
}
How can I do the same thing in Java? For sample text like this, is there a better way to split a paragraph into sentences in Java?
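For the first half of the question, a direct port of the JavaScript snippet might look like the sketch below. It assumes that the intent is to keep the same regular expression and the same "\r" marker trick. The class name RegexSentenceSplitter and the method name split are made up for illustration. Two things differ from the attempt above: the /.../g delimiters are JavaScript literal syntax and must not appear inside the Java pattern string, and the callback passed to replace is emulated with Matcher.appendReplacement and Matcher.appendTail.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexSentenceSplitter {
    // Same expression as the JavaScript version, minus the /.../g delimiters.
    private static final Pattern BOUNDARY =
            Pattern.compile("\\b(\\w\\.\\w\\.)|([.?!])\\s+(?=[A-Za-z])");

    // Hypothetical helper: marks each sentence end with '\r', then splits on it.
    static List<String> split(String text) {
        Matcher m = BOUNDARY.matcher(text);
        StringBuffer marked = new StringBuffer();
        while (m.find()) {
            // Group 1 matched: an abbreviation such as "e.g.", kept unchanged.
            // Otherwise group 2 holds the terminator; the whitespace after it
            // is replaced by a '\r' marker.
            String replacement = m.group(1) != null ? m.group(1) : m.group(2) + "\r";
            // quoteReplacement stops '$' and '\' from being read as group references.
            m.appendReplacement(marked, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(marked);
        return Arrays.asList(marked.toString().split("\r"));
    }

    public static void main(String[] args) {
        String content = "This is a long string with some numbers 123.456,78 or 100.000 and e.g. some abbreviations in it, which shouldn't split the sentence. Sometimes there are problems, i.e. in this one. here and abbr at the end x.y.. cool.";
        for (String sentence : split(content)) {
            System.out.println(sentence);
        }
    }
}
```

On the sample paragraph this prints four lines, the last being "cool." with its trailing period. The pattern only recognizes abbreviations of the form letter, period, letter, period, so longer abbreviations such as "etc." followed by a letter would still cause a split.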
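Approach: use java.text.BreakIterator, which locates sentence boundaries with locale-specific rules instead of a hand-written regular expression.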
import java.text.BreakIterator;

public class SentenceSplitter {
    public static void main(String[] args) {
        String content = "This is a long string with some numbers 123.456,78 or 100.000 and e.g. some abbreviations in it, which shouldn't split the sentence. Sometimes there are problems, i.e. in this one. here and abbr at the end x.y.. cool.";
        // Sentence iterator for the default locale
        BreakIterator bi = BreakIterator.getSentenceInstance();
        bi.setText(content);
        int index = 0; // start offset of the current sentence
        // next() returns the following boundary, or DONE at the end of the text
        while (bi.next() != BreakIterator.DONE) {
            String sentence = content.substring(index, bi.current());
            System.out.println(sentence);
            index = bi.current();
        }
    }
}
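If the sentences are needed as a list rather than printed, the same loop can be wrapped in a small helper. The following is only a sketch: the class name SentenceLister, the method name sentences, and the choice to pass the locale explicitly and trim each sentence are assumptions, not part of the original answer.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class SentenceLister {
    // Hypothetical helper: collects trimmed, non-empty sentences into a list.
    static List<String> sentences(String text, Locale locale) {
        BreakIterator bi = BreakIterator.getSentenceInstance(locale);
        bi.setText(text);
        List<String> result = new ArrayList<>();
        int start = bi.first();
        // Walk boundary to boundary until the iterator reports DONE.
        for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
            String sentence = text.substring(start, end).trim();
            if (!sentence.isEmpty()) {
                result.add(sentence);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String s : sentences("Hello world. This is Java.", Locale.ENGLISH)) {
            System.out.println(s);
        }
    }
}
```

Passing the locale explicitly keeps the result independent of the machine's default settings. BreakIterator applies its own boundary rules, so on the sample paragraph from the question it may not reproduce the four requested sentences exactly, for example where a sentence starts with a lowercase word such as "here".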