ReadAsStringAsync returns dashed description

I have a method ReadJsonUrl that takes a URL (a string address, e.g. https://www.ah.nl/service/rest/delegate?url=%2Fproducten%2Fproduct%2Fwi224732%2Fsmiths-nibb-it-happy-ones-kruis-rond-paprika ) pointing to a JSON file.

This method reads the JSON and prints some of its data to the console.

The problem is that the product description is printed as

Smiths Nibb-it hap-­py on-­es kruis-rond pa-­pri-­ka

but if I inspect the JSON in my browser, it shows

Smiths Nibb-it hap­py on­es kruis-rond pa­pri­ka

which is how I want it to be printed.

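I think the problem is that the request is made as if by a browser with a 0px × 0px resolution, so the returned text splits the words to keep them readable. If I make my browser window very small, it also shows the description with dashes. I added a User-Agent header to my code, but that didn't help.

Does anyone know how to fix this?

My code: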

    using System;
    using System.Net.Http;
    using System.Threading.Tasks;

    public static async Task<object> ReadJsonUrl(string address)
    {
        // A shared, long-lived HttpClient is usually preferable;
        // a per-call instance is kept here as in the original code.
        using (HttpClient client = new HttpClient())
        {
            client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36");

            HttpResponseMessage response = await client.GetAsync(address);
            response.EnsureSuccessStatusCode(); // throw on 4xx/5xx instead of parsing an error page
            var content = await response.Content.ReadAsStringAsync();

            // Empty.FromJson comes from my generated model classes (not shown)
            var data = Empty.FromJson(content);

            var product = data.Embedded.Lanes[4].Embedded.Items[0].Embedded.Product;

            Console.WriteLine(product.Id);
            Console.WriteLine(product.Description);
            Console.WriteLine(product.PriceLabel.Now);
            Console.WriteLine(product.Availability.Label);
            Console.WriteLine("-------------------------------------");

            // Wait without blocking the thread (Thread.Sleep would block inside an async method)
            await Task.Delay(5000);

            // the return value is for later use
            return product;
        }
    }

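If you copy the second string (the expected output) into a hex editor, you will see that it contains 0xAD characters. These are soft hyphens (Unicode U+00AD, encoded in UTF-8 as the byte pair C2 AD).

Browsers such as Internet Explorer or Firefox display a soft hyphen only when it is needed, i.e. when a line is actually broken at that position, but the console displays it every time. This is also why the dashes appear when you make the browser window very narrow.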
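To confirm this without a hex editor, you can print the code point of every character from C# itself. This is a minimal sketch; `CharDump`, `Describe` and `ContainsSoftHyphen` are hypothetical helper names, not part of any library.

```csharp
using System;
using System.Linq;

public static class CharDump
{
    // Formats every UTF-16 code unit as "U+XXXX" so that invisible
    // characters such as the soft hyphen (U+00AD) become visible.
    public static string Describe(string input)
    {
        return string.Join(" ", input.Select(c => $"U+{(int)c:X4}"));
    }

    // String.IndexOf(char) is an ordinal search, so it finds U+00AD even
    // though culture-aware string comparisons may treat it as ignorable.
    public static bool ContainsSoftHyphen(string input)
    {
        return input.IndexOf('\u00AD') >= 0;
    }
}
```

For example, `Console.WriteLine(CharDump.Describe(product.Description));` should show `U+00AD` at the positions where the console printed the unexpected dashes.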

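To complement Thomas Weller's answer, which explains the problem well, here is a function that removes all soft hyphens from a string. It is written as an extension method, so you can use it like this:

    Console.WriteLine(product.Description.RemoveSoftHyphens());

The extension method: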

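    using System.Text;

    public static class StringExtensions
    {
        // U+00AD SOFT HYPHEN: only visible when a line is broken at it
        private const char SoftHyphen = '\u00AD';

        public static string RemoveSoftHyphens(this string input)
        {
            var output = new StringBuilder(input.Length);
            foreach (char c in input)
            {
                // Copy every character except soft hyphens
                if (c != SoftHyphen)
                {
                    output.Append(c);
                }
            }
            return output.ToString();
        }
    }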
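The same result can also be had with a single `string.Replace` call. This is a minimal sketch; `SoftHyphenExtensions` and `StripSoftHyphens` are hypothetical names, chosen so they do not clash with `RemoveSoftHyphens` above.

```csharp
using System;

public static class SoftHyphenExtensions
{
    // U+00AD SOFT HYPHEN
    private const string SoftHyphen = "\u00AD";

    // string.Replace(string, string) performs an ordinal search, so the
    // soft hyphen is matched exactly rather than being skipped as an
    // ignorable character by culture-aware comparison rules.
    public static string StripSoftHyphens(this string input)
    {
        if (input == null) throw new ArgumentNullException(nameof(input));
        return input.Replace(SoftHyphen, string.Empty);
    }
}
```

Regular hyphens (U+002D), as in `kruis-rond`, are left untouched; only the invisible U+00AD characters are removed. For short strings like product descriptions, the choice between this and the `StringBuilder` loop is mostly a matter of taste.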

For additional context, here is how HTML4 describes the use of soft hyphens:

> In HTML, there are two types of hyphens: the plain hyphen and the soft hyphen. The plain hyphen should be interpreted by a user agent as just another character. The soft hyphen tells the user agent where a line break can occur. Those browsers that interpret soft hyphens must observe the following semantics. If a line is broken at a soft hyphen, a hyphen character must be displayed at the end of the first line. If a line is not broken at a soft hyphen, the user agent must not display a hyphen character. For operations such as searching and sorting, the soft hyphen should always be ignored.
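To make that rule concrete, here is a hedged sketch of greedy word wrapping that treats spaces and soft hyphens as break points and shows a visible `-` only when a line is actually broken at a soft hyphen. `SoftHyphenWrap.Wrap` is a hypothetical helper for illustration; it is not how any particular browser implements line breaking, and its simplifications are listed in the comments.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class SoftHyphenWrap
{
    private const char SoftHyphen = '\u00AD';

    // Greedy line wrapping following the HTML4 soft-hyphen rule.
    // Simplifications (assumptions of this sketch): only ' ' and U+00AD are
    // break points, the trailing '-' is not counted against maxWidth, and a
    // piece longer than maxWidth is placed on its own line unbroken.
    public static List<string> Wrap(string text, int maxWidth)
    {
        var lines = new List<string>();
        var line = new StringBuilder();

        foreach (string word in text.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries))
        {
            // Split the word at its soft hyphens; the soft hyphens themselves are dropped.
            string[] pieces = word.Split(SoftHyphen);
            for (int i = 0; i < pieces.Length; i++)
            {
                // Between words the joiner is a space; inside a word it is nothing.
                string joiner = (i == 0) ? " " : "";
                if (line.Length == 0)
                {
                    line.Append(pieces[i]);
                }
                else if (line.Length + joiner.Length + pieces[i].Length <= maxWidth)
                {
                    line.Append(joiner).Append(pieces[i]);
                }
                else
                {
                    // Break here: show a hyphen only when breaking at a soft hyphen.
                    lines.Add(i == 0 ? line.ToString() : line.ToString() + "-");
                    line.Clear().Append(pieces[i]);
                }
            }
        }

        if (line.Length > 0) lines.Add(line.ToString());
        return lines;
    }
}
```

With this sketch, `SoftHyphenWrap.Wrap("pa\u00ADpri\u00ADka", 5)` returns the two lines `papri-` and `ka`, while a width of 10 returns the single line `paprika`: the hyphen is displayed only where the line is broken, exactly as the quoted rule requires.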