Hebrew rendering in website

Question

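I'm developing a product that has a web "Admin Panel" - a place where users can see information about the product. One of the minimum requirements is that the site is available in both English and Hebrew. So what's the problem? The problem is that some characters look like this, but they should look like this.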

When I receive a request from the browser, I read an HTML file using the following code (Java):

public static String loadPage(String page, String lang) {
    Path path = Paths.get(System.getProperty("user.dir"), "htmlTemplate", lang, page + ".html");
    try (BufferedReader br = Files.newBufferedReader(path)) {
        StringBuilder website = new StringBuilder();
        String currentLine;
        while ((currentLine = br.readLine()) != null) {
            website.append(currentLine);
        }
        return website.toString();
    } catch (Exception e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
    return null;
}

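(Thanks to Jon Skeet for helping me read it as UTF-8.) After reading the file, I replace some comments with the correct data (for example, I have a comment like this:  and I replace it with "Itay"). After the replacement I simply send the response.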

The server itself is hosted using Sun's HttpServer.

I also made sure of the following:

I saved the HTML files as UTF-8
The HTML files contain this meta tag: <meta charset="UTF-8">
One of the response headers is: Content-Type=text/html;charset=utf-8

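By the way, I'm using Chrome.

I hope I've provided enough detail about my problem. If you need more information, feel free to ask!

(I also hope I posted the question with the right tags and title.)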

Answer 1

You need to tell your FileReader to read the file as UTF-8.
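One way to do that on older Java versions, where FileReader has no constructor that takes a charset, is to wrap a FileInputStream in an InputStreamReader and name the charset explicitly. This is only a minimal sketch; the class and method names (Utf8ReaderSketch, readUtf8) are made up for illustration and are not from the asker's code.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class Utf8ReaderSketch {

    // Reads the whole file as UTF-8, independent of the platform default encoding.
    static String readUtf8(File file) throws IOException {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            StringBuilder sb = new StringBuilder();
            int c;
            // Read char by char so that line terminators are kept as they are.
            while ((c = br.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        }
    }
}
```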

Answer 2

Basically, don't use FileReader. It always uses the platform default encoding, which may not be appropriate for this file.

If you're using a modern version of Java, it's better to use:

Path path = Paths.get(System.getProperty("user.dir"), "htmlTemplate", lang, page + ".html");
br = Files.newBufferedReader(path);

That will read as UTF-8 by default - if you want a different charset, you can specify it as another argument to newBufferedReader.
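For what it's worth, passing the charset explicitly might look like the sketch below. The class and method names (ExplicitCharsetSketch, firstLine) are hypothetical; only Files.newBufferedReader and StandardCharsets come from the JDK. Spelling out UTF-8 changes nothing compared with the default, but it makes the intent visible to the next reader of the code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only.
final class ExplicitCharsetSketch {

    // Returns the first line of the file, decoded as UTF-8.
    static String firstLine(Path path) throws IOException {
        try (BufferedReader br = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            return br.readLine();
        }
    }
}
```

A reader created this way reports malformed input as an exception, so a template saved in the wrong encoding should fail loudly instead of showing up as garbled text.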

I'd also suggest using a try-with-resources statement to get rid of all the hassle of a manual finally block:

Path path = Paths.get(System.getProperty("user.dir"), "htmlTemplate", lang, page + ".html");
try (BufferedReader br = Files.newBufferedReader(path)) {
    StringBuilder website = new StringBuilder();
    String currentLine;
    while ((currentLine = br.readLine()) != null) {
        website.append(currentLine);
    }
    return website.toString();
}

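Note that this will remove all line breaks, because readLine() strips the line terminator and nothing is appended in its place. (Also note that I'm using a StringBuilder to avoid the performance problems caused by repeated string concatenation...)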
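If the line breaks matter, one option is to read all the bytes in one go and decode them yourself. A minimal sketch, with made-up names (KeepNewlinesSketch, readAll):

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration only.
final class KeepNewlinesSketch {

    // Reads the entire file as UTF-8 and leaves line terminators untouched.
    static String readAll(Path path) throws IOException {
        return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
    }
}
```

On Java 11 and later, Files.readString(path) should do the same job in a single call.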

Answer 3

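In the end I found that I did have a problem reading the file as UTF-8, but the other problem was that I wasn't sending it back as UTF-8. This is how I send it now: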

public void end(HttpExchange t, String response, long tStart, int status) throws IOException {
    try {
        // convertToUTF8 is defined elsewhere (not shown here).
        String temp = convertToUTF8(response);
        // The second argument is the length of the response body in bytes.
        t.sendResponseHeaders(status, temp.length());
        OutputStream os = t.getResponseBody();
        OutputStream bout = new BufferedOutputStream(os);
        // Encode the characters as UTF-8 while writing them out.
        OutputStreamWriter out = new OutputStreamWriter(bout, "UTF-8");
        out.write(response);
        out.flush();
        out.close();
    } catch (UnsupportedEncodingException e) {
        System.out.println("This VM does not support the UTF-8 character set.");
    } catch (IOException e) {
        System.out.println(e.getMessage());
    }
    long tEnd = System.currentTimeMillis();
    long tDelta = tEnd - tStart;
    System.out.println("Done handling request! Time took: " + tDelta);
}
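A note on the length passed to sendResponseHeaders: String.length() counts UTF-16 chars, while the header needs the number of bytes in the body, and for Hebrew text those two numbers differ. Since convertToUTF8 isn't shown, I can't say how it deals with that. The sketch below is an alternative under my own assumptions, not the code above: it encodes the string once and uses the byte array for both the length and the body. The names Utf8ResponseSketch, encode and send are hypothetical.

```java
import com.sun.net.httpserver.HttpExchange;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
final class Utf8ResponseSketch {

    // Encodes the body as UTF-8; the array length is the byte count to announce.
    static byte[] encode(String body) {
        return body.getBytes(StandardCharsets.UTF_8);
    }

    // Sends the body as UTF-8 HTML with a matching length.
    static void send(HttpExchange t, int status, String body) throws IOException {
        byte[] bytes = encode(body);
        t.getResponseHeaders().set("Content-Type", "text/html; charset=utf-8");
        // A length of 0 makes HttpServer fall back to chunked transfer encoding.
        t.sendResponseHeaders(status, bytes.length);
        try (OutputStream os = t.getResponseBody()) {
            os.write(bytes);
        }
    }
}
```

Writing the bytes directly also means no OutputStreamWriter is needed, so there is no second place where the charset has to be named.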

Thanks again to Jon Skeet for his answer, it was very helpful!

Path path = Paths.get(System.getProperty("user.dir"), "htmlTemplate", lang, page + ".html");
try (BufferedReader br = Files.newBufferedReader(path)) {
    StringBuilder website = new StringBuilder();
    String currentLine;
    while ((currentLine = br.readLine()) != null) {
        website.append(currentLine);
    }
    return website.toString();
}

(This is how to read a UTF-8 file his way.)

html

java

http

utf-8