Read Roman-numeral page numbers of PDF pages
In Adobe Reader, the first pages of an e-book can have Roman-numeral page numbers, as shown in the image below:
Image: http://i.stack.imgur.com/GSm0Q.jpg
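I want to read these page numbers (not the page index) with iText, but I don't know which properties (labels, annotations, ...) I should use. I can already open the file with PdfReader and iterate over all pages, but I don't know how to access these Roman numerals: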
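using System.IO;
using iTextSharp.text.pdf;

using (Stream pdfStream = new FileStream(sourceFileName, FileMode.Open))
{
    PdfReader pdfReader = new PdfReader(pdfStream);
    for (int index = 1; index <= pdfReader.NumberOfPages; index++)
    {
        // How do I get the displayed label (e.g. "iv") of page `index` here?
    }
    pdfReader.Close();
}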
Thanks.
You are looking for page labels, as demonstrated in the PageLabelExample. In this example, we have a PDF, page_labels.pdf:
In the listPageLabels() method, we create a txt file that contains all the page labels:
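import java.io.FileOutputStream;
import java.io.IOException;
import java.io.PrintStream;
import com.itextpdf.text.pdf.PdfPageLabels;
import com.itextpdf.text.pdf.PdfReader;

public void listPageLabels(String src, String dest) throws IOException {
    // no PDF, just a text file
    PrintStream out = new PrintStream(new FileOutputStream(dest));
    PdfReader reader = new PdfReader(src);
    // one label per page; null if the PDF has no /PageLabels entry
    String[] labels = PdfPageLabels.getPageLabels(reader);
    if (labels != null) {
        for (int i = 0; i < labels.length; i++) {
            out.println(labels[i]);
        }
    }
    out.flush();
    out.close();
    reader.close();
}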
The result looks like this:
A
B
1
2
3
Movies-4
Movies-5
Movies-6
Movies-7
Movies-8
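Where do labels like "B" and "Movies-4" come from? The PDF spec stores them in the /PageLabels number tree of the document catalog: each key is the 0-based index of the page where a labelling range starts, and its dictionary can hold a style (/S), a prefix (/P) and a start value (/St); within a range the counter goes up by one per page. The following is a minimal, iText-free sketch of that expansion in Java. The class, the `Range` type and the ranges in `main` are hypothetical: they were chosen to reproduce the output above and are not necessarily how page_labels.pdf is actually defined. Only the decimal (D) and uppercase-letter (A) styles are handled.

```java
import java.util.Map;
import java.util.TreeMap;

class PageLabelSketch {
    // Hypothetical stand-in for one /PageLabels range dictionary:
    // style = /S ('D' or 'A' here, anything else = no numeric part), prefix = /P, start = /St
    static final class Range {
        final char style;
        final String prefix;
        final int start;

        Range(char style, String prefix, int start) {
            this.style = style;
            this.prefix = prefix;
            this.start = start;
        }
    }

    // Uppercase letters as the PDF spec describes them: A..Z, then AA..ZZ, then AAA..., and so on
    static String letters(int n) {
        char c = (char) ('A' + (n - 1) % 26);
        return String.valueOf(c).repeat((n - 1) / 26 + 1);
    }

    // ranges maps the 0-based index of a range's first page to that range
    static String[] labels(int pageCount, TreeMap<Integer, Range> ranges) {
        String[] result = new String[pageCount];
        for (int i = 0; i < pageCount; i++) {
            Map.Entry<Integer, Range> e = ranges.floorEntry(i);
            if (e == null) { // defensive: page before the first range, use the plain page number
                result[i] = Integer.toString(i + 1);
                continue;
            }
            Range r = e.getValue();
            int value = r.start + (i - e.getKey()); // the counter restarts at /St in each range
            String number;
            if (r.style == 'D') {
                number = Integer.toString(value);
            } else if (r.style == 'A') {
                number = letters(value);
            } else {
                number = ""; // no /S: the label is just the prefix
            }
            result[i] = r.prefix + number;
        }
        return result;
    }

    public static void main(String[] args) {
        TreeMap<Integer, Range> ranges = new TreeMap<>();
        ranges.put(0, new Range('A', "", 1));        // pages 1-2: A, B
        ranges.put(2, new Range('D', "", 1));        // pages 3-5: 1, 2, 3
        ranges.put(5, new Range('D', "Movies-", 4)); // pages 6-10: Movies-4 ... Movies-8
        System.out.println(String.join(" ", labels(10, ranges)));
    }
}
```

A `TreeMap` with `floorEntry` is used so that each page finds the range that started most recently at or before it, which mirrors how the number tree keys are interpreted.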
If you want an iTextSharp example, take a look at this method:
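using System;
using System.Text;
using iTextSharp.text.pdf;

/**
 * Reads the page labels from an existing PDF
 * @param src the existing PDF
 */
public string ListPageLabels(byte[] src) {
    StringBuilder sb = new StringBuilder();
    PdfReader reader = new PdfReader(src);
    // one label per page; null if the PDF has no /PageLabels entry
    String[] labels = PdfPageLabels.GetPageLabels(reader);
    reader.Close();
    if (labels == null)
        return sb.ToString();
    for (int i = 0; i < labels.Length; i++) {
        sb.Append(labels[i]);
        sb.AppendLine();
    }
    return sb.ToString();
}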
Update
As promised in the comments: PdfPageLabels.cs
I'm not a C# developer, but here is a quick and dirty version of the GetPageLabels() method that doesn't add the prefixes:
using System;
using System.Collections.Generic;
using iTextSharp.text.factories;
using iTextSharp.text.pdf;

public static String[] GetPageLabels(PdfReader reader) {
    int n = reader.NumberOfPages;
    PdfDictionary dict = reader.Catalog;
    PdfDictionary labels = (PdfDictionary)PdfReader.GetPdfObjectRelease(dict.Get(PdfName.PAGELABELS));
    if (labels == null)
        return null; // the PDF defines no page labels
    String[] labelstrings = new String[n];
    // keys are the 0-based indexes of the pages where a new labelling range starts
    Dictionary<int, PdfObject> numberTree = PdfNumberTree.ReadTree(labels);
    int pagecount = 1;
    char type = 'D';
    for (int i = 0; i < n; i++) {
        if (numberTree.ContainsKey(i)) {
            // a new range starts on this page: read its start value (/St) and style (/S)
            PdfDictionary d = (PdfDictionary)PdfReader.GetPdfObjectRelease(numberTree[i]);
            if (d.Contains(PdfName.ST)) {
                pagecount = ((PdfNumber)d.Get(PdfName.ST)).IntValue;
            }
            else {
                pagecount = 1;
            }
            if (d.Contains(PdfName.S)) {
                // the name's string form starts with '/', so [1] is the style letter
                type = ((PdfName)d.Get(PdfName.S)).ToString()[1];
            }
            else {
                type = 'e'; // no style: the label has no numeric part
            }
        }
        switch (type) {
            default:
                labelstrings[i] = "" + pagecount;
                break;
            case 'R':
                labelstrings[i] = RomanNumberFactory.GetUpperCaseString(pagecount);
                break;
            case 'r':
                labelstrings[i] = RomanNumberFactory.GetLowerCaseString(pagecount);
                break;
            case 'A':
                labelstrings[i] = RomanAlphabetFactory.GetUpperCaseString(pagecount);
                break;
            case 'a':
                labelstrings[i] = RomanAlphabetFactory.GetLowerCaseString(pagecount);
                break;
            case 'e':
                labelstrings[i] = "";
                break;
        }
        pagecount++;
    }
    return labelstrings;
}
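The 'R' and 'r' branches hand the counter to iText's RomanNumberFactory. If you want to see what that conversion involves, or need it without iText, a minimal standalone sketch in Java follows. It is not iText's implementation; the class and method names are hypothetical, and it uses the usual greedy approach with subtractive pairs (CM, XC, IV, ...).

```java
import java.util.Locale;

class RomanSketch {
    // Values and symbols in descending order, including the subtractive pairs
    private static final int[] VALUES = {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    private static final String[] SYMBOLS = {"M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"};

    // Greedy conversion: repeatedly append the largest symbol that still fits
    static String toUpperRoman(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("Roman numerals need n >= 1");
        }
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < VALUES.length; k++) {
            while (n >= VALUES[k]) {
                sb.append(SYMBOLS[k]);
                n -= VALUES[k];
            }
        }
        return sb.toString();
    }

    // The 'r' style is the lowercase form; Locale.ROOT avoids locale-specific casing of 'I'
    static String toLowerRoman(int n) {
        return toUpperRoman(n).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        // labels for the first five pages of a range with /S /r and /St 1
        StringBuilder line = new StringBuilder();
        for (int page = 1; page <= 5; page++) {
            line.append(toLowerRoman(page)).append(' ');
        }
        System.out.println(line.toString().trim());
    }
}
```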