Fetching all the document URIs in MarkLogic using the Java Client API
I am trying to fetch all the documents from a database without knowing their exact URIs. I have this query:
DocumentPage documents = docMgr.read();
while (documents.hasNext()) {
    DocumentRecord document = documents.next();
    System.out.println(document.getUri());
}
But I don't have specific URIs; I want all the documents.
The first step is to enable the URI lexicon on your database.
You can eval some XQuery and run cts:uris() (or server-side JavaScript and run cts.uris()):
ServerEvaluationCall call = client.newServerEval()
    .xquery("cts:uris()");
for (EvalResult result : call.eval()) {
    String uri = result.getString();
    System.out.println(uri);
}
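Two drawbacks are: (1) you need a user with the privileges required to evaluate code on the server, and (2) there is no pagination.
If you have a small number of documents, you don't need pagination. For a large number of documents, pagination is recommended. Here is some code that uses the search API with pagination: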
// do the next eight lines just once
String options =
    "<options xmlns='http://marklogic.com/appservices/search'>" +
    " <values name='uris'>" +
    " <uri/>" +
    " </values>" +
    "</options>";
QueryOptionsManager optionsMgr = client.newServerConfigManager().newQueryOptionsManager();
optionsMgr.writeOptions("uriOptions", new StringHandle(options));
// run the following each time you need to list all uris
QueryManager queryMgr = client.newQueryManager();
long pageLength = 10000;
queryMgr.setPageLength(pageLength);
ValuesDefinition query = queryMgr.newValuesDefinition("uris", "uriOptions");
// the following "and" query just matches all documents
query.setQueryDefinition(new StructuredQueryBuilder().and());
int start = 1;
boolean hasMore = true;
Transaction transaction = client.openTransaction();
try {
    while (hasMore) {
        CountedDistinctValue[] uriValues =
            queryMgr.values(query, new ValuesHandle(), start, transaction).getValues();
        for (CountedDistinctValue uriValue : uriValues) {
            String uri = uriValue.get("string", String.class);
            //System.out.println(uri);
        }
        start += uriValues.length;
        // this is the last page if uriValues is smaller than pageLength
        hasMore = uriValues.length == pageLength;
    }
} finally {
    transaction.commit();
}
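The transaction is only necessary if you need a guaranteed "snapshot" list, isolated from adds/deletes happening concurrently with this process. Since it adds some overhead, feel free to remove it if you don't need that exactness.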
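The stopping rule in the paged loop above (keep requesting pages until one comes back shorter than the page length) can be checked in isolation. This is only a sketch: `PagedUriLister`, `PageSource`, `listAll`, and `inMemory` are hypothetical names, and the in-memory list stands in for the `queryMgr.values(...)` call; none of them are part of the MarkLogic API. It keeps the 1-based start position used above.

```java
import java.util.ArrayList;
import java.util.List;

class PagedUriLister {
    /** Hypothetical stand-in for one paged request; start is 1-based. */
    interface PageSource {
        List<String> fetch(long start, int pageLength);
    }

    /** Requests pages until one comes back shorter than pageLength. */
    static List<String> listAll(PageSource source, int pageLength) {
        List<String> all = new ArrayList<>();
        long start = 1;
        boolean hasMore = true;
        while (hasMore) {
            List<String> page = source.fetch(start, pageLength);
            all.addAll(page);
            start += page.size();
            // a short (or empty) page means there is nothing left to fetch
            hasMore = page.size() == pageLength;
        }
        return all;
    }

    /** In-memory source that serves slices of a fixed list (illustration only). */
    static PageSource inMemory(List<String> uris) {
        return (start, pageLength) -> {
            int from = (int) Math.min(start - 1, uris.size());
            int to = Math.min(from + pageLength, uris.size());
            return new ArrayList<>(uris.subList(from, to));
        };
    }

    public static void main(String[] args) {
        List<String> uris = new ArrayList<>();
        for (int i = 1; i <= 25; i++) {
            uris.add("/doc/" + i + ".xml"); // made-up URIs
        }
        List<String> listed = listAll(inMemory(uris), 10);
        System.out.println(listed.size()); // prints 25
    }
}
```

When the total is an exact multiple of the page length, this rule makes one extra request that returns an empty page and then stops, which is harmless.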
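Find out the page length; in queryMgr you can specify the start position of each request. Keep increasing the start position and loop through all the URIs. I was able to fetch all the URIs this way. It may not be the best approach, but it works.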
List<String> uriList = new ArrayList<>();
QueryManager queryMgr = client.newQueryManager();
StructuredQueryBuilder qb = new StructuredQueryBuilder();
// match documents that are in all of the listed collections
StructuredQueryDefinition querydef = qb.and(qb.collection("xxxx"), qb.collection("whatever"), qb.collection("whatever")); // outputs 241152
// result positions are 1-based, so the first page starts at 1
SearchHandle results = queryMgr.search(querydef, new SearchHandle(), 1);
long pageLength = results.getPageLength();
long totalResults = results.getTotalResults();
System.out.println("Total Results: " + totalResults);
// step through the result set one page at a time
for (long start = 1; start <= totalResults; start += pageLength) {
    System.out.println("Printing Results from: " + start + " to: " + (start + pageLength - 1));
    results = queryMgr.search(querydef, new SearchHandle(), start);
    MatchDocumentSummary[] summaries = results.getMatchResults(); // 10 results because page length is 10
    for (MatchDocumentSummary summary : summaries) {
        // System.out.println("Extracted from URI-> " + summary.getUri());
        uriList.add(summary.getUri());
    }
    if (start >= 1000) { // optional cap on the number of URIs to retrieve (plus 10); remove it to fetch everything
        break;
    }
}
uriList = uriList.stream().distinct().collect(Collectors.toList());
return uriList;
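When paging with a 1-based start position, the page starts are 1, 1 + pageLength, 1 + 2 × pageLength, and so on, up to the total number of results. Here is a minimal sketch of that arithmetic; `PageStarts` and `pageStarts` are hypothetical helper names (not MarkLogic API) and the totals are made-up numbers:

```java
import java.util.ArrayList;
import java.util.List;

class PageStarts {
    /** Returns the 1-based start position of every page needed to cover totalResults. */
    static List<Long> pageStarts(long totalResults, long pageLength) {
        if (pageLength <= 0) {
            throw new IllegalArgumentException("pageLength must be positive");
        }
        List<Long> starts = new ArrayList<>();
        for (long start = 1; start <= totalResults; start += pageLength) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(pageStarts(25, 10)); // prints [1, 11, 21]
    }
}
```

The number of pages is the total divided by the page length, rounded up. Plain integer division (25 / 10 = 2) would drop the final partial page, so the loop bound compares the start position against the total instead.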