How can I change certain cells in a certain column of a CSV file with C# when a certain condition is triggered?
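In my program the user can choose a file with an OpenFileDialog, and if they then save the file with a SaveFileDialog, the columns and rows of the CSV file should change. This is what I have tried so far.
SaveFileDialog: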
List<string> liste = new List<string>();
// Build the header row into the output:
liste.Add(String.Join(',', Enum.GetNames<CsvColumn>()));
CultureInfo ci = new CultureInfo("de-DE"); // necessary only when running the code from other cultures.
SaveFileDialog dialog = new SaveFileDialog();
dialog.Filter = "CSV (*.csv)|*.csv|All files (*.*)|*.*";
if (dialog.ShowDialog() == true)
{
    string line;
    // Read the file and display it line by line.
    try
    {
        System.IO.StreamReader file = new System.IO.StreamReader(path);
        while ((line = file.ReadLine()) != null)
        {
            var cellArray = Regex.Split(line, @"[\t,](?=(?:[^\""]|\""[^\""]*\"")*$)")
                .Select(s => Regex.Replace(s.Replace("\"\"", "\""), "^\"|\"$", "")).ToArray();
            // Check value of Betrag, only operate on it if there is a decimal value there
            if (decimal.TryParse(cellArray[(int)CsvColumn.Betrag], NumberStyles.Any, ci, out decimal betrag))
            {
                if (betrag >= 0)
                {
                    cellArray[(int)CsvColumn.Soll] = "42590";
                    cellArray[(int)CsvColumn.Haben] = "441206";
                }
                else
                {
                    cellArray[(int)CsvColumn.Soll] = "441206";
                    cellArray[(int)CsvColumn.Haben] = "42590";
                }
                // Assuming we only write to the purple field when the green field was a decimal:
                cellArray[(int)CsvColumn.Belegnummer] = "a dummy text";
            }
            // Make sure you escape the columns that have a comma
            liste.Add(string.Join(",", cellArray.Select(x => x.Contains(',') ? $"\"{x}\"" : x)) + "\n");
        }
        File.WriteAllLines(dialog.FileName, liste);
        file.Close();
    }
    catch
    {
        MessageBox.Show("Der Gewählte Prozess wird bereits von einem anderen verwendet,\n " +
            " bitte versuchen sie es erneut");
    }
}
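I have now changed the header, but as you can see in the picture here, this is what I want.
What I want to do:
- When the green field is positive, 42590 should go in the blue field and 441206 in the orange field.
- If the green value is negative, then 441206 should go in the blue field and 42590 in the orange field.
- A dummy text should automatically be written into the purple field.
So how can I use my C# code to fill in the fields I have marked?
Edit
A sample of my input CSV file in text format: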
Datum;Wertstellung;Kategorie;Name;Verwendungszweck;Konto;Bank;Betrag;Währung
31.10.2019;01.11.2019;;Some Text;;;;-42,89;EUR
31.10.2019;01.11.2019;;Some Text;;;;-236,98;EUR
31.10.2019;31.10.2019;;Some Text;;;;-200;EUR
30.10.2019;29.10.2019;;Some Text;;;;204,1;EUR
30.10.2019;31.10.2019;;Some Text;;;;-646,98;EUR
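The task itself is pretty simple, but your attempt shows many external influences and very little documentation. That has led to many comments on your post, and a best-practice answer does need to address a number of smaller elements that you have overlooked so far. You already have the file management sorted out, so I will try to focus on the array logic.
Make sure you have run and debugged your code before posting. The output from the initial post has a few quirks: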
Your input file uses a semi-colon, so you need to split the line by ONLY THAT CHARACTER in your regular expression:
var cellArray = Regex.Split(line, @"[;](?=(?:[^\""]|\""[^\""]*\"")*$)")
    .Select(s => Regex.Replace(s.Replace("\"\"", "\""), "^\"|\"$", "")).ToArray();
You can't split the string by multiple delimiters at the same time, because only values that contain the file-specific delimiter will be quote escaped.
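To see what the semicolon-only split produces, here is a minimal sketch that wraps the same expression in a helper. The class and method names (CsvSplitSketch, SplitLine) are my own and not part of your code, and the pattern is written without the redundant backslashes, which should match the same text.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class CsvSplitSketch
{
    // Matches a ';' only when the rest of the line contains balanced quotes,
    // i.e. when the ';' is not inside a quoted value.
    private static readonly Regex Delimiter =
        new Regex(@";(?=(?:[^""]|""[^""]*"")*$)");

    public static string[] SplitLine(string line)
    {
        return Delimiter.Split(line)
            // Un-double embedded quotes, then strip the surrounding quotes.
            .Select(cell => Regex.Replace(cell.Replace("\"\"", "\""), "^\"|\"$", ""))
            .ToArray();
    }
}
```

Called with the second line of your sample, SplitLine returns nine cells and the Betrag value -42,89 stays intact in cell 7, because the comma is no longer treated as a delimiter.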
This line is doing nothing and looks like a leftover from a previous attempt. .Split() and .ToArray() return new values; they do not manipulate the source value. As you are not using the result of this line of code, just remove it:
//line.Split(new char[] { '\t' }).ToArray();
The header row is being written into the first cell of the first row. While it may look like it works, I challenge you to explain the intent. You have also used a semicolon as the delimiter there, even though the rest of your output uses a comma, so this is fixed too. You will also find it far simpler to write this header row first, before we even read the input file:
List<String> liste = new List<string>();
// Build the header row into the output:
liste.Add("Belegdatum,Buchungsdatum,Belegnummer,Buchungstext,Verwendungszweck,Soll,Haben,Betrag,Währung");
With the German decimal separator being a comma, you will also need to escape the Betrag decimal value in the output:
liste.Add(string.Join(",", cellArray.Select(x => x.Contains(',') ? $"\"{x}\"" : x)) + "\n");
Alternatively, you could use a semicolon like your input data; however, it is still good practice to test for and escape the values that might contain the delimiter character.
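If you want the escaping to work for either delimiter, a slightly more defensive version could look like the sketch below. This is my own assumption of a reasonable helper (CsvEscapeSketch.Escape is a made-up name); unlike the one-liner above it also doubles embedded quotes, which is the usual CSV convention.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class CsvEscapeSketch
{
    public static string Escape(string value, char delimiter = ',')
    {
        if (value == null) return string.Empty;

        // Quote the value when it contains the delimiter, a quote or a line break.
        bool needsQuotes = value.IndexOf(delimiter) >= 0
            || value.IndexOf('"') >= 0
            || value.IndexOf('\n') >= 0
            || value.IndexOf('\r') >= 0;
        if (!needsQuotes) return value;

        // Embedded quotes are doubled inside a quoted value.
        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }
}
```

With a comma delimiter the German amount -42,89 gets wrapped in quotes; with a semicolon delimiter it is written unchanged.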
Do you really want the additional line break in the output?
It is not necessary to append the "\n" line feed character to each line, because you are later using WriteAllLines(). This method accepts an array of lines and writes a line terminator after each line for you. In file processing like this, it is only necessary to manually include the line feed if you are storing the output as a single string variable and perhaps later using WriteAllText() to write the final output to file.
- This is often not clear when referencing different guidance material on text file manipulation. Be aware of it if you combine one technique from an article that maintains an array of lines with a separate example that uses a single string variable, or the StringBuilder or StringWriter approaches.
The line from above now becomes this; note that the trailing \n has been removed:
liste.Add(string.Join(",", cellArray.Select(x => x.Contains(',') ? $"\"{x}\"" : x)));
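If you want to convince yourself of the extra line break, a small experiment like the following should show it. The helper name is my own; it writes the given lines to a temporary file with File.WriteAllLines and counts what a reader sees afterwards.

```csharp
using System;
using System.IO;

// Hypothetical helper, for illustration only.
public static class LineBreakSketch
{
    // Writes the lines with File.WriteAllLines and returns how many lines
    // File.ReadAllLines finds in the resulting file.
    public static int CountLinesAfterWrite(string[] lines)
    {
        string tempFile = Path.GetTempFileName();
        try
        {
            File.WriteAllLines(tempFile, lines);
            return File.ReadAllLines(tempFile).Length;
        }
        finally
        {
            File.Delete(tempFile);
        }
    }
}
```

Two clean lines come back as two lines, while two lines that already end in "\n" come back as four, because each one is now followed by an empty line.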
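tl;dr - show me the code!
A simple forward-processing approach
This makes for light-weight code, although complex logic can become harder to read. As you parse each line into the array, you can simply manipulate the values according to your rules. We can call this sequential, in-line or forward processing, because we read the input, process it and prepare the output in a single pass.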
List<string> liste = new List<string>();
// Build the header row into the output:
liste.Add("Belegdatum,Buchungsdatum,Belegnummer,Buchungstext,Verwendungszweck,Soll,Haben,Betrag,Währung");
CultureInfo ci = new CultureInfo("de-DE"); // necessary only when running the code from other cultures.
SaveFileDialog dialog = new SaveFileDialog();
dialog.Filter = "CSV (*.csv)|*.csv|All files (*.*)|*.*";
if (dialog.ShowDialog() == true)
{
    string line;
    try
    {
        int counter = 0;
        // Read the file line by line; the using block closes it before the output is written.
        using (var file = new System.IO.StreamReader(path))
        {
            while ((line = file.ReadLine()) != null)
            {
                counter++;
                var cellArray = Regex.Split(line, @"[;](?=(?:[^\""]|\""[^\""]*\"")*$)")
                    .Select(s => Regex.Replace(s.Replace("\"\"", "\""), "^\"|\"$", "")).ToArray();
                // Skip lines that fail for any reason
                try
                {
                    // Check value of Betrag, only operate on it if there is a decimal value there
                    if (decimal.TryParse(cellArray[7], NumberStyles.Any, ci, out decimal betrag))
                    {
                        if (betrag >= 0)
                        {
                            cellArray[5] = "42590";
                            cellArray[6] = "441206";
                        }
                        else
                        {
                            cellArray[5] = "441206";
                            cellArray[6] = "42590";
                        }
                        // Assuming we only write to the purple field when the green field was a decimal:
                        cellArray[2] = "a dummy text";
                    }
                    else
                    {
                        // Skip lines where the Betrag is not a decimal;
                        // this also covers the case where the first line is the header.
                        continue;
                    }
                }
                catch (Exception ex)
                {
                    // Construct a message box to help the user resolve the issue.
                    // You can use the MessageBox API to allow the user to cancel the process if you want to extend this,
                    // or remove the message altogether if you want it to silently skip the erroneous rows.
                    MessageBox.Show("Fehler beim Analysieren der Eingabezeile,\n" +
                        $"{ex.Message}\n\n " +
                        $"{counter}:> {line} \n " +
                        $"{new String(' ', counter.ToString().Length)} - {cellArray.Length} Cells\n " +
                        $"|{String.Join("|", cellArray)}|\n " +
                        "\n " +
                        " Zeile wird verworfen, weiter!");
                    continue; // advance to the next iteration of the while loop.
                }
                // Make sure you escape the columns that have a comma
                liste.Add(string.Join(",", cellArray.Select(x => x.Contains(',') ? $"\"{x}\"" : x)));
            }
        }
        File.WriteAllLines(dialog.FileName, liste);
    }
    catch
    {
        MessageBox.Show("Der Gewählte Prozess wird bereits von einem anderen verwendet,\n " +
            " bitte versuchen sie es erneut");
    }
}
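The per-line rules above can also be pulled out of the dialog and file handling so that they can be run and tested on a single string. The following is only a sketch of that idea: ForwardProcessingSketch and TransformLine are names I made up, and returning null for skipped lines is my own convention.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, for illustration only.
public static class ForwardProcessingSketch
{
    // Returns the transformed output line, or null when the line should be
    // skipped (wrong column count, header row, or a non-numeric Betrag).
    public static string TransformLine(string line, IFormatProvider provider)
    {
        var cells = Regex.Split(line, @";(?=(?:[^""]|""[^""]*"")*$)")
            .Select(s => Regex.Replace(s.Replace("\"\"", "\""), "^\"|\"$", ""))
            .ToArray();

        if (cells.Length != 9) return null;
        if (!decimal.TryParse(cells[7], NumberStyles.Any, provider, out decimal betrag))
            return null;

        cells[5] = betrag >= 0 ? "42590" : "441206";  // Soll
        cells[6] = betrag >= 0 ? "441206" : "42590";  // Haben
        cells[2] = "a dummy text";                    // Belegnummer

        // Quote any value that contains the output delimiter.
        return string.Join(",", cells.Select(x => x.Contains(",") ? $"\"{x}\"" : x));
    }
}
```

In your program you would pass new CultureInfo("de-DE") as the provider; any IFormatProvider that uses a comma as the decimal separator behaves the same way for these values.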
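Using named constants
If you are trying to avoid an OO approach, you can still make the code easier to read by introducing some constants to refer to the indexes. There are many variations on this, but making the code easier to read will help with maintaining and understanding it later.
Define constants. I recommend doing this in a static class definition to group the values together, rather than just defining them as local or instance variables.
An enum is another option if you only need to map strings to ints, or just want to give integer values a name.
public enum CsvColumn
{
    Belegdatum = 0,
    Buchungsdatum = 1,
    Belegnummer = 2,
    Buchungstext = 3,
    Verwendungszweck = 4,
    Soll = 5,
    Haben = 6,
    Betrag = 7,
    Währung = 8
}
Enums have the added benefit that simple commands can retrieve all the names of the columns, so we can now use the enum both to build the header row AND as the index references in the code: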
List<string> liste = new List<string>();
// Build the header row into the output:
liste.Add(String.Join(',', Enum.GetNames<CsvColumn>()));
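In versions before .NET 5 the generic overloads of the Enum functions are not defined; in that case you need to pass the type of the enum using typeof. The String.Join overload that takes a char separator is also missing from .NET Framework, so use a string separator there:
liste.Add(String.Join(",", Enum.GetNames(typeof(CsvColumn))));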
https://docs.microsoft.com/en-us/dotnet/api/system.enum.getnames?view=netframework-4.7.2
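As a small, version-independent sketch of both uses of the enum (building the header and looking up an index), something like the following should work. HeaderSketch and its method names are my own; the enum repeats the definition from above so the snippet stands alone.

```csharp
using System;

public enum CsvColumn
{
    Belegdatum = 0, Buchungsdatum = 1, Belegnummer = 2, Buchungstext = 3,
    Verwendungszweck = 4, Soll = 5, Haben = 6, Betrag = 7, Währung = 8
}

// Hypothetical helper, for illustration only.
public static class HeaderSketch
{
    // Non-generic GetNames and a string separator compile on older frameworks too.
    public static string BuildHeader()
    {
        return string.Join(",", Enum.GetNames(typeof(CsvColumn)));
    }

    // Maps a column name back to its index, or -1 when the name is unknown.
    public static int IndexOfColumn(string name)
    {
        if (Enum.TryParse(name, out CsvColumn column) && Enum.IsDefined(typeof(CsvColumn), column))
            return (int)column;
        return -1;
    }
}
```

The IsDefined check is there because Enum.TryParse also accepts numeric strings that do not correspond to any named member.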
In the following logic that uses the enum references, we need to explicitly cast the enum values to int. If you used int constants instead, the explicit (int) cast would not be required. Either way, the intent of the logic is now immediately clear, without having to remember what the columns at index 5 and 6 are supposed to mean.
if (decimal.TryParse(cellArray[(int)CsvColumn.Betrag], NumberStyles.Any, ci, out decimal betrag))
{
    if (betrag >= 0)
    {
        cellArray[(int)CsvColumn.Soll] = "42590";
        cellArray[(int)CsvColumn.Haben] = "441206";
    }
    else
    {
        cellArray[(int)CsvColumn.Soll] = "441206";
        cellArray[(int)CsvColumn.Haben] = "42590";
    }
    // Assuming we only write to the purple field when the green field was a decimal:
    cellArray[(int)CsvColumn.Belegnummer] = "a dummy text";
}
View a fiddle of this implementation: https://dotnetfiddle.net/Cd10Cd
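Of course, a similar technique could be used for the "42590" and "441206" values. These must have some sort of business relevance or significance, so again store them as named constant strings.
- I refer to these as magic strings: they carry no meaning of their own and are easily broken during refactoring. If a discrete value has a specific business meaning, it should also have a specific name in the code.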
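A sketch of what that could look like is below. I do not know what these two numbers mean in your accounting, so ClearingAccount and BankAccount are placeholder names that I invented; replace them with whatever the values really represent.

```csharp
using System;

// Hypothetical names, for illustration only.
public static class BookingAccounts
{
    public const string ClearingAccount = "42590";  // placeholder name, meaning assumed
    public const string BankAccount = "441206";     // placeholder name, meaning assumed

    // Returns the Soll and Haben values for a given Betrag.
    public static (string Soll, string Haben) For(decimal betrag)
    {
        return betrag >= 0
            ? (ClearingAccount, BankAccount)
            : (BankAccount, ClearingAccount);
    }
}
```

The rule for swapping the two values then lives in exactly one place, and a change to either number is a one-line edit.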
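OO approach
Using an Object-Oriented approach can mean a lot of things. In this context there are 3 different concerns or processes that we want to separate: parsing the input, executing the business logic, and formatting the output. You could simply create 3 methods that accept an array of strings, but that code becomes hard to understand. By using a structured object to model our business domain concept of a row from the CSV file, we can remove a lot of guesswork, for example about which element in the array is the Betrag (amount).
View the OO Fiddle here: https://dotnetfiddle.net/tjxcQN
You could use this Object-Oriented concept directly in the code above, parsing each line into the object, processing it and serializing it back to a string value all in one code block. However, that makes it hard to gain the higher-level view of the process that is necessary to understand the code itself. Even if you do this in your head, when we look at our peers' code we break it down into blocks or discrete steps. So to be a good coding citizen, modularise your logic into functional methods where you can. It will assist you in the future when you need to write unit tests, it will help to keep your code clean, and it will allow us to extend your code in the future.
For this example we will create a simple model to represent each line. Note that this example takes the extra step of parsing the date fields into DateTime properties, even though you do not need them for this example. I have deliberately used constants instead of an enum to show you a different concept. Use whatever makes sense on the day; this is still a first-principles approach, and there are various libraries you can use to manage serialization to and from CSV, XML, JSON and other text formats.
If your business need is to display this information in an application, rather than just reading one file and writing directly back out to another, then this approach may be helpful to you. Otherwise it is still a good habit to get into if you are just practicing, because larger applications or larger teams will require this level of modularization, which is not in itself specifically an OO concept. The OO part comes from where we define the processing logic: in this example BankRecord contains the logic to parse a CSV string input and to serialize back to a CSV output.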
public class BankRecord
{
    /// <summary> Receipt Date </summary>
    public DateTime Belegdatum { get; set; }
    /// <summary> Entry Date </summary>
    public DateTime Buchungsdatum { get; set; }
    /// <summary>Sequence number</summary>
    public string Belegnummer { get; set; }
    /// <summary>Memo - Description</summary>
    public string Buchungstext { get; set; }
    /// <summary>Purpose</summary>
    public string Verwendungszweck { get; set; }
    /// <summary>Debit</summary>
    public string Soll { get; set; }
    /// <summary>Credit</summary>
    public string Haben { get; set; }
    /// <summary>Amount</summary>
    public decimal Betrag { get; set; }
    /// <summary>Currency</summary>
    public string Währung { get; set; }

    /// <summary> Column Index Definitions to simplify the CSV parsing</summary>
    public static class Columns
    {
        public const int Belegdatum = 0;
        public const int Buchungsdatum = 1;
        public const int Belegnummer = 2;
        public const int Buchungstext = 3;
        public const int Verwendungszweck = 4;
        public const int Soll = 5;
        public const int Haben = 6;
        public const int Betrag = 7;
        public const int Währung = 8;

        /// <summary>
        /// Construct a CSV Header row from these column definitions
        /// </summary>
        public static string BuildCsvHeader()
        {
            return String.Join(',',
                nameof(Belegdatum),
                nameof(Buchungsdatum),
                nameof(Belegnummer),
                nameof(Buchungstext),
                nameof(Verwendungszweck),
                nameof(Soll),
                nameof(Haben),
                nameof(Betrag),
                nameof(Währung));
        }
    }

    /// <summary>
    /// Parse a CSV string using the <see cref="Columns"/> definitions as the index for each of the named properties in this class
    /// </summary>
    /// <param name="csvLine">The CSV Line to parse</param>
    /// <param name="provider">An object that supplies culture-specific formatting information.</param>
    /// <returns>BankRecord populated from the input string</returns>
    public static BankRecord FromCSV(string csvLine, IFormatProvider provider)
    {
        // The input file is semicolon-delimited, so split on that character only.
        var cellArray = Regex.Split(csvLine, @"[;](?=(?:[^\""]|\""[^\""]*\"")*$)")
            .Select(s => Regex.Replace(s.Replace("\"\"", "\""), "^\"|\"$", "")).ToArray();

        // TODO: add in some validation, today we'll just check the number of cells.
        if (cellArray.Length != 9)
            throw new NotSupportedException("Input CSV did not contain the expected number of columns. (Expected 9)");

        // The following is rudimentary and doesn't perform any active error checking, the good news is that when it fails you
        // will at least know that it was in this specific method. Proper production level error handling is out of scope for this issue.
        var transaction = new BankRecord();
        transaction.Belegdatum = DateTime.Parse(cellArray[Columns.Belegdatum], provider);
        transaction.Buchungsdatum = DateTime.Parse(cellArray[Columns.Buchungsdatum], provider);
        transaction.Belegnummer = cellArray[Columns.Belegnummer];
        transaction.Buchungstext = cellArray[Columns.Buchungstext];
        transaction.Verwendungszweck = cellArray[Columns.Verwendungszweck];
        transaction.Soll = cellArray[Columns.Soll];
        transaction.Haben = cellArray[Columns.Haben];
        transaction.Betrag = Decimal.Parse(cellArray[Columns.Betrag], provider);
        transaction.Währung = cellArray[Columns.Währung];
        return transaction;
    }

    /// <summary>
    /// Write this object out to a CSV string that can be interpreted using the <see cref="Columns"/> definitions as the index for each of the named properties in this class
    /// </summary>
    /// <param name="provider">An object that supplies culture-specific formatting information.</param>
    /// <returns>CSV string that represents this record.</returns>
    public string ToCSV(IFormatProvider provider)
    {
        return String.Join(',',
            CsvEscape(Belegdatum, provider),
            CsvEscape(Buchungsdatum, provider),
            CsvEscape(Belegnummer, provider),
            CsvEscape(Buchungstext, provider),
            CsvEscape(Verwendungszweck, provider),
            CsvEscape(Soll, provider),
            CsvEscape(Haben, provider),
            CsvEscape(Betrag, provider),
            CsvEscape(Währung, provider));
    }

    /// <summary>
    /// Simple routine to format a value for CSV output
    /// </summary>
    /// <param name="value">The value to serialize</param>
    /// <param name="provider">An object that supplies culture-specific formatting information.</param>
    /// <returns>Value escaped and safe for direct inclusion in a CSV output</returns>
    private string CsvEscape(object value, IFormatProvider provider)
    {
        if (value == null)
            return string.Empty;
        string stringValue = String.Format(provider, "{0}", value);
        if (stringValue.Contains(','))
            return $"\"{stringValue}\"";
        else
            return stringValue;
    }

    /// <summary>
    /// Format a Date value for CSV output
    /// </summary>
    /// <param name="value">The value to serialize</param>
    /// <param name="provider">An object that supplies culture-specific formatting information.</param>
    /// <remarks>Simple overload to allow for common syntax between types, removes the need for the caller to understand the differences</remarks>
    /// <returns>Value escaped and safe for direct inclusion in a CSV output</returns>
    private string CsvEscape(DateTime value, IFormatProvider provider)
    {
        string stringValue = String.Format(provider, "{0:d}", value);
        if (stringValue.Contains(','))
            return $"\"{stringValue}\"";
        else
            return stringValue;
    }
}
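FromCSV above relies on DateTime.Parse and Decimal.Parse, which throw on bad input. If you would rather check each field and decide for yourself what to do with a bad row, a non-throwing variant could be sketched as follows. The class and method names are my own, and the fixed "dd.MM.yyyy" format is an assumption based on your sample file.

```csharp
using System;
using System.Globalization;

// Hypothetical helper, for illustration only.
public static class FieldParsingSketch
{
    // Parses dates such as 31.10.2019 without depending on the current culture.
    public static bool TryParseDate(string text, out DateTime value)
    {
        return DateTime.TryParseExact(text, "dd.MM.yyyy", CultureInfo.InvariantCulture,
            DateTimeStyles.None, out value);
    }

    // Parses an amount using the separators supplied by the provider.
    public static bool TryParseAmount(string text, IFormatProvider provider, out decimal value)
    {
        return decimal.TryParse(text, NumberStyles.Number, provider, out value);
    }
}
```

Both methods return false instead of throwing, so the caller can count or report the rejected rows rather than swallowing an exception.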
And this is the process logic:
CultureInfo ci = new CultureInfo("de-DE"); // necessary only when running the code from other cultures.
// I'll leave this in, but don't call your list "liste"; instead give it some context or meaning, like "records" or "transactions".
List<BankRecord> liste = new List<BankRecord>();
SaveFileDialog dialog = new SaveFileDialog();
dialog.Filter = "CSV (*.csv)|*.csv|All files (*.*)|*.*";
if (dialog.ShowDialog() == true)
{
    string line;
    // Read the file line by line.
    try
    {
        #region Parse the input File
        // The using block closes the input file before the output is written.
        using (var file = new System.IO.StreamReader(path))
        {
            while ((line = file.ReadLine()) != null)
            {
                try
                {
                    liste.Add(BankRecord.FromCSV(line, ci));
                }
                catch
                {
                    // TODO: re-raise or otherwise handle this error if you want.
                    // today we will simply ignore erroneous entries and will suppress this error
                }
            }
        }
        #endregion Parse the input File

        #region Evaluate your business rules
        // Evaluate your business rules here, natively in C#, no arrays or indexes, just manipulate the business domain object.
        // assuming that Belegnummer is a sequencing number, not sure if it is from the start of this file or a different context...
        // This just demonstrates a potential reason for NOT encapsulating the processing logic inside the BankRecord class.
        int previousLineNumber = 47; // arbitrary start...
        foreach (var transaction in liste)
        {
            // Betrag is already a decimal here, so only its sign needs checking
            if (transaction.Betrag >= 0)
            {
                transaction.Soll = "42590";
                transaction.Haben = "441206";
            }
            else
            {
                transaction.Soll = "441206";
                transaction.Haben = "42590";
            }
            transaction.Belegnummer = $"#{++previousLineNumber}";
        }
        #endregion Evaluate your business rules

        #region Now write to the output
        List<string> outputLines = new List<string>();
        outputLines.Add(BankRecord.Columns.BuildCsvHeader());
        outputLines.AddRange(liste.Select(x => x.ToCSV(ci)));
        File.WriteAllLines(dialog.FileName, outputLines);
        #endregion Now write to the output
    }
    catch
    {
        MessageBox.Show("Der Gewählte Prozess wird bereits von einem anderen verwendet,\n " +
            " bitte versuchen sie es erneut");
    }
}
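One thing to watch in the read-then-write flow is saving over the file that was just opened: if the input is still open when the output is written, the write can fail with the "used by another process" error that your message box reports. A sketch that sidesteps this by reading everything first is shown below. FileProcessingSketch.Process is a name I made up, and the transform delegate stands in for whichever per-line logic you choose.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, for illustration only.
public static class FileProcessingSketch
{
    // Reads all input lines, applies the transform, writes header plus results.
    // Lines for which the transform returns null are skipped.
    // Returns the number of data rows written.
    public static int Process(string inputPath, string outputPath, string header,
        Func<string, string> transformLine)
    {
        // File.ReadAllLines opens, reads and closes the file before returning,
        // so outputPath may safely be the same file as inputPath.
        string[] inputLines = File.ReadAllLines(inputPath);

        var outputLines = new List<string> { header };
        foreach (string line in inputLines)
        {
            string transformed = transformLine(line);
            if (transformed != null)
                outputLines.Add(transformed);
        }

        File.WriteAllLines(outputPath, outputLines);
        return outputLines.Count - 1;
    }
}
```

Reading the whole file into memory is fine for bank exports of this size; for very large files you would go back to a streaming reader and write to a different path.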
Final output:
Belegdatum,Buchungsdatum,Belegnummer,Buchungstext,Verwendungszweck,Soll,Haben,Betrag,Währung
31.10.2019,01.11.2019,#48,Some Text,,42590,441206,"50,43",EUR
31.10.2019,01.11.2019,#49,Some Text,,441206,42590,"-239,98",EUR
31.10.2019,31.10.2019,#50,Some Text,,441206,42590,-500,EUR
所以在我的程序中,用户可以选择一个带有 OpenFileDialog 的文件,如果他想用 SaveFileDialog 保存文件,csv 文件的列和行应该改变。为此我已经试过了
保存文件对话框:
List<string> liste = new List<string>();
// Build the header row into the output:
liste.Add(String.Join(',', Enum.GetNames<CsvColumn>()));
CultureInfo ci = new CultureInfo("de-DE"); // neccessary only when running the code from other cultures.
SaveFileDialog dialog = new SaveFileDialog();
dialog.Filter = "CVS (*.cvs)|*.csv|All files (*.*)|*.*";
if (dialog.ShowDialog() == true)
{
string line;
// Read the file and display it line by line.
try
{
System.IO.StreamReader file = new System.IO.StreamReader(path);
while ((line = file.ReadLine()) != null)
{
var cellArray = Regex.Split(line, @"[\t,](?=(?:[^\""]|\""[^\""]*\"")*$)")
.Select(s => Regex.Replace(s.Replace("\"\"", "\""), "^\"|\"$", "")).ToArray();
// Check value of Betrag, only operate on it if there is a decimal value there
if (decimal.TryParse(cellArray[(int)CsvColumn.Betrag], NumberStyles.Any, ci, out decimal betrag))
{
if (betrag >= 0)
{
cellArray[(int)CsvColumn.Soll] = "42590";
cellArray[(int)CsvColumn.Haben] = "441206";
}
else
{
cellArray[(int)CsvColumn.Soll] = "441206";
cellArray[(int)CsvColumn.Haben] = "42590";
}
// Assuming we only write to the purple field when the green field was a decimal:
cellArray[(int)CsvColumn.Belegnummer] = "a dummy text";
}
// Make sure you escape the columns that have a comma
liste.Add(string.Join(",", cellArray.Select(x => x.Contains(',') ? $"\"{x}\"" : x)) + "\n");
}
File.WriteAllLines(dialog.FileName, liste);
file.Close();
}
catch
{
MessageBox.Show("Der Gewählte Prozess wird bereits von einem anderen verwendet,\n " +
" bitte versuchen sie es erneut");
}
}
现在我更改了 header,但现在当您看这里的图片时我想要这样:
我要执行的操作:
- 当绿色字段在正区域时,
42590
应该在蓝色 字段和 orange 字段中的441206
。 - 如果 green 值为 negative 那么
441206
应该在 blue 字段和 orange. 中的 42590
- 在紫色字段中,应该会自动写入一个虚拟文本。
那么如何使用我的C#代码来填写我在代码中标记的字段呢?
编辑
我的文本格式输入 CSV 文件的示例:
Datum;Wertstellung;Kategorie;Name;Verwendungszweck;Konto;Bank;Betrag;Währung 31.10.2019;01.11.2019;;Some Text;;;;-42,89;EUR 31.10.2019;01.11.2019;;Some Text;;;;-236,98;EUR 31.10.2019;31.10.2019;;Some Text;;;;-200;EUR 30.10.2019;29.10.2019;;Some Text;;;;204,1;EUR 30.10.2019;31.10.2019;;Some Text;;;;-646,98;EUR
任务本身非常简单,但您的尝试显示出许多外部影响且几乎没有文档。这会导致对您的 post 发表许多评论,但是 best-practise 答案确实需要解决您迄今为止忽略的许多较小的元素。你已经把文件管理整理好了,所以我会尽量把重点放在数组逻辑上。
Make sure you have run and debugged your code before posting, the output from the initial post has a few quirks:
Your input file uses a semi-colon, so you need to split the line by ONLY THAT CHARACTER in your regular expression:
var cellArray = Regex.Split(line, @"[;](?=(?:[^\""]|\""[^\""]*\"")*$)") .Select(s => Regex.Replace(s.Replace("\"\"", "\""), "^\"|\"$", "")).ToArray();
You can't assume to split the string by multiple delimiters at the same time because only value that contain the file specific delimiter will be quote escaped.
This line is doing nothing, it looks like a previous attempt,
.Split()
and.ToArray()
return new values, they do not manipulate the source value, as you are not using the result of this line of code just remove it://line.Split(new char[] { '\t' }).ToArray();
The header row is being written into the first cell of the first row, while it may look like it works, I challenge you to explain the intent. You have also used a semicolon as the delimiter, even though the rest of your output is using comma, so this is fixed too. You will also find it far simpler to write this header row first, before we even read the input file:
List<String> liste = new List<string>(); // Build the header row into the output: liste.Add("Belegdatum,Buchungsdatum,Belegnummer,Buchungstext,Verwendungszweck,Soll,Haben,Betrag,Währung");
With the german decimal separator being a comma, you will also need to escape the
Betrag
decimal value in the outputliste.Add(string.Join(",", cellArray.Select(x => x.Contains(',') ? $"\"{x}\"" : x)) + "\n");
Alternatively, you could use a semi-colon like your input data however it is still good practise to test for and escape the values that might contain the delimiter character.
Do you really want the additional line break in the output?
It is not necessary to append each line with the
"\n"
line feed character because you are later usingWriteAllLines()
. This method accepts an array of lines and will inject the line break between each line for you. In file processing like this it is only necessary to manually include the line feed if you were storing the output as a single string variable and perhaps later usingWriteAllText()
to write the final output to file.
- This is often not clear when referencing different guidance material on text file manipulations, be aware of this if you copy one technique from an article that maintains an array of the lines, and a separate example that uses a single string variable or
StringBuilder
orStringWriter
approaches.The line from above now becomes this, note the trailing
\n
has been removed:liste.Add(string.Join(",", cellArray.Select(x => x.Contains(',') ? $"\"{x}\"" : x)));
tldr; - 给我看代码!
一种简单的正向处理方法
它用于 light-weight 代码,但复杂的逻辑可能更难阅读,但是当您将每一行解析到数组中时,您可以根据您的规则简单地操作值。我们可以将其称为 sequential、in-line 或 forward processing 因为我们阅读了一次输入、处理和准备输出。
List<string> liste = new List<string>(); // Build the header row into the output: liste.Add("Belegdatum,Buchungsdatum,Belegnummer,Buchungstext,Verwendungszweck,Soll,Haben,Betrag,Währung"); CultureInfo ci = new CultureInfo("de-DE"); // necessary only when running the code from other cultures. SaveFileDialog dialog = new SaveFileDialog(); dialog.Filter = "CVS (*.cvs)|*.csv|All files (*.*)|*.*"; if (dialog.ShowDialog() == true) { string line; // Read the file and display it line by line. try { System.IO.StreamReader file = new System.IO.StreamReader(path); int counter = 0; while ((line = file.ReadLine()) != null) { counter++; var cellArray = Regex.Split(line, @"[;](?=(?:[^\""]|\""[^\""]*\"")*$)") .Select(s => Regex.Replace(s.Replace("\"\"", "\""), "^\"|\"$", "")).ToArray(); // Skip lines that fail for any reason try { // Check value of Betrag, only operate on it if there is a decimal value there if (decimal.TryParse(cellArray[7], NumberStyles.Any, ci, out decimal betrag)) { if (betrag >= 0) { cellArray[5] = "42590"; cellArray[6] = "441206"; } else { cellArray[5] = "441206"; cellArray[6] = "42590"; } // Assuming we only write to the purple field when the green field was a decimal: cellArray[2] = "a dummy text"; } else { // Skip lines where the Betrag is not a decimal // this will cover the case when or if the first line is the header. continue; } } catch(Exception ex) { // Construct a message box to help the user resolve the issue. // You can use the MessageBox API to allow the user to cancel the process if you want to extend this. // or remove the message altogether if you want it to silently skip the erroneous rows. MessageBox.Show("Fehler beim Analysieren der Eingabezeile,\n" + $"{ex.Message}\n\n " + $"{counter}:> {line} \n " + $"{new String(' ', counter.ToString().Length)} - {cellArray.Length} Cells\n " + $"|{String.Join("|", cellArray)}|\n " + "\n " + " Zeile wird verworfen, weiter!"); continue; // advance to the next iteration of the while loop. 
} // Make sure you escape the columns that have a comma liste.Add(string.Join(",", cellArray.Select(x => x.Contains(',') ? $"\"{x}\"" : x))); } File.WriteAllLines(dialog.FileName, liste); file.Close(); } catch { MessageBox.Show("Der Gewählte Prozess wird bereits von einem anderen verwendet,\n " + " bitte versuchen sie es erneut"); } }
使用命名常量
如果您试图避免 OO 方法,那么它可以通过引入一些常量来引用索引来使代码更易于阅读,这有很多变体,但是使代码更易于阅读将有所帮助在以后的代码维护和理解中。
定义常量,我建议在静态 class 定义中执行此操作以将这些值组合在一起,而不是仅将它们定义为局部变量或实例变量。
如果您只需要将字符串映射到整数,或者只想给整数值一个名称,
enum
是另一种方法。public enum CsvColumn { Belegdatum = 0, Buchungsdatum = 1, Belegnummer = 2, Buchungstext = 3, Verwendungszweck = 4, Soll = 5, Haben = 6, Betrag = 7, Währung = 8 }
枚举的额外好处是可以使用简单的命令来检索列的所有名称,现在我们可以使用它来构建 header 行 AND 作为索引代码中的引用:
List<string> liste = new List<string>(); // Build the header row into the output: liste.Add(String.Join(',', Enum.GetNames<CsvColumn>()));
In previous versions of .Net the generic overload for Enum functions were not defined, in that case you will need to cast the type of the enum:
liste.Add(String.Join(',', Enum.GetNames(typeof(CsvColumn))));
https://docs.microsoft.com/en-us/dotnet/api/system.enum.getnames?view=netframework-4.7.2
在以下使用枚举引用的逻辑中,我们需要将枚举值显式转换为
int
。如果您改用int
常量,则不需要(int)
显式转换。无论哪种方式,现在我们都可以立即理解逻辑的意图,而不必记住索引 5 和 6 处的列应该是什么意思。if (decimal.TryParse(cellArray[(int)CsvColumn.Betrag], NumberStyles.Any, ci, out decimal betrag)) { if (betrag >= 0) { cellArray[(int)CsvColumn.Soll] = "42590"; cellArray[(int)CsvColumn.Haben] = "441206"; } else { cellArray[(int)CsvColumn.Soll] = "441206"; cellArray[(int)CsvColumn.Haben] = "42590"; } // Assuming we only write to the purple field when the green field was a decimal: cellArray[(int)CsvColumn.Belegnummer] = "a dummy text"; }
View a fiddle of this implementation: https://dotnetfiddle.net/Cd10Cd
当然,类似的技术可以用于
"42590"
和"441206"
值,这些值必须有某种业务 relevance/significance。所以再次将它们存储为常量 named 字符串变量。- 这里我称之为魔术字符串,它们没有任何意义并且在代码重构过程中很容易被破坏,如果离散值具有特定的业务意义,那么它也应该在代码中有一个特定的名称。
OO 方法
使用 Object-Oriented 方法可能意味着很多事情,在这种情况下,我们想要分离 3 个不同的关注点或过程,解析输入、执行业务逻辑、格式化输出。您可以简单地创建 3 个接受字符串数组的方法,但是通过使用结构化 object 对我们的业务领域概念 row 进行建模,这段代码变得难以理解CSV 文件中我们可以删除很多数字,例如,数组中的哪个元素是
Betrag
(值)。在此处查看 OO Fiddle:https://dotnetfiddle.net/tjxcQN
You could use this Object-Oriented concept in the above code directly, parsing each line into the object, processing and serializing back to a string value all in one code block, however that makes it hard to gain a higer level view of the process which is necessary to understand the code itself. Even if you do this in your head, when we look at our peer's code, we break it down into blocks or discrete steps. So to be a good coding citizen, modularise your logic into functional methods where you can, it will assist you in the future when you need to write unit tests and it will help to keep your code clean, but also to allow us to extend your code in the future.
对于这个例子,我们将创建一个简单的模型来表示每条线。请注意,此示例采用了将日期字段解析为
DateTime
属性的额外步骤,即使您在本示例中不需要它们。我故意使用常量而不是枚举来向您展示不同的概念。你使用当天有意义的东西,这仍然是第一个原则方法,你可以使用不同的库来管理与 CSV、XML、JSON 和其他文本格式之间的序列化。如果您的业务需求是在应用程序中显示这些信息,而不是仅仅读取一个文件然后直接写回另一个文件,那么这些信息可能对您有所帮助,否则获取是一个好习惯如果您只是在练习,因为更大的应用程序或更大的团队将需要这种级别的模块化,这本身并不是一个特定的 OO 概念...... OO 部分来自我们定义处理逻辑的地方,在这个例子中
BankRecord
This class contains the logic for parsing a CSV string input and for serializing back to CSV output.
using System;
using System.Linq;
using System.Text.RegularExpressions;

public class BankRecord
{
    /// <summary>Receipt Date</summary>
    public DateTime Belegdatum { get; set; }
    /// <summary>Entry Date</summary>
    public DateTime Buchungsdatum { get; set; }
    /// <summary>Sequence number</summary>
    public string Belegnummer { get; set; }
    /// <summary>Memo - Description</summary>
    public string Buchungstext { get; set; }
    /// <summary>Purpose</summary>
    public string Verwendungszweck { get; set; }
    /// <summary>Debit</summary>
    public string Soll { get; set; }
    /// <summary>Credit</summary>
    public string Haben { get; set; }
    /// <summary>Amount</summary>
    public decimal Betrag { get; set; }
    /// <summary>Currency</summary>
    public string Währung { get; set; }

    /// <summary>Column index definitions to simplify the CSV parsing</summary>
    public static class Columns
    {
        public const int Belegdatum = 0;
        public const int Buchungsdatum = 1;
        public const int Belegnummer = 2;
        public const int Buchungstext = 3;
        public const int Verwendungszweck = 4;
        public const int Soll = 5;
        public const int Haben = 6;
        public const int Betrag = 7;
        public const int Währung = 8;

        /// <summary>
        /// Construct a CSV header row from these column definitions
        /// </summary>
        public static string BuildCsvHeader()
        {
            return String.Join(',',
                nameof(Belegdatum),
                nameof(Buchungsdatum),
                nameof(Belegnummer),
                nameof(Buchungstext),
                nameof(Verwendungszweck),
                nameof(Soll),
                nameof(Haben),
                nameof(Betrag),
                nameof(Währung));
        }
    }

    /// <summary>
    /// Parse a CSV string using the <see cref="Columns"/> definitions as the index for each of the named properties in this class
    /// </summary>
    /// <param name="csvLine">The CSV line to parse</param>
    /// <param name="provider">An object that supplies culture-specific formatting information.</param>
    /// <returns>BankRecord populated from the input string</returns>
    public static BankRecord FromCSV(string csvLine, IFormatProvider provider)
    {
        // Split on tabs or commas that are not inside a quoted cell, then un-double "" and strip the outer quotes.
        var cellArray = Regex.Split(csvLine, @"[\t,](?=(?:[^\""]|\""[^\""]*\"")*$)")
                             .Select(s => Regex.Replace(s.Replace("\"\"", "\""), "^\"|\"$", ""))
                             .ToArray();

        // TODO: add in some validation, today we'll just check the number of cells.
        if (cellArray.Length != 9)
            throw new NotSupportedException("Input CSV did not contain the expected number of columns. (Expected 9)");

        // The following is rudimentary and doesn't perform any active error checking; the good news is that when it fails you
        // will at least know that it was in this specific method. Proper production-level error handling is out of scope for this issue.
        var transaction = new BankRecord();
        transaction.Belegdatum = DateTime.Parse(cellArray[Columns.Belegdatum], provider);
        transaction.Buchungsdatum = DateTime.Parse(cellArray[Columns.Buchungsdatum], provider);
        transaction.Belegnummer = cellArray[Columns.Belegnummer];
        transaction.Buchungstext = cellArray[Columns.Buchungstext];
        transaction.Verwendungszweck = cellArray[Columns.Verwendungszweck];
        transaction.Soll = cellArray[Columns.Soll];
        transaction.Haben = cellArray[Columns.Haben];
        transaction.Betrag = Decimal.Parse(cellArray[Columns.Betrag], provider);
        transaction.Währung = cellArray[Columns.Währung];
        return transaction;
    }

    /// <summary>
    /// Write this object out to a CSV string that can be interpreted using the <see cref="Columns"/> definitions as the index for each of the named properties in this class
    /// </summary>
    /// <param name="provider">An object that supplies culture-specific formatting information.</param>
    /// <returns>CSV string that represents this record.</returns>
    public string ToCSV(IFormatProvider provider)
    {
        return String.Join(',',
            CsvEscape(Belegdatum, provider),
            CsvEscape(Buchungsdatum, provider),
            CsvEscape(Belegnummer, provider),
            CsvEscape(Buchungstext, provider),
            CsvEscape(Verwendungszweck, provider),
            CsvEscape(Soll, provider),
            CsvEscape(Haben, provider),
            CsvEscape(Betrag, provider),
            CsvEscape(Währung, provider));
    }

    /// <summary>
    /// Simple routine to format a value for CSV output
    /// </summary>
    /// <param name="value">The value to serialize</param>
    /// <param name="provider">An object that supplies culture-specific formatting information.</param>
    /// <returns>Value escaped and safe for direct inclusion in a CSV output</returns>
    private string CsvEscape(object value, IFormatProvider provider)
    {
        if (value == null)
            return string.Empty;

        string stringValue = String.Format(provider, "{0}", value);
        if (stringValue.Contains(','))
            return $"\"{stringValue}\"";
        else
            return stringValue;
    }

    /// <summary>
    /// Format a date value for CSV output
    /// </summary>
    /// <param name="value">The value to serialize</param>
    /// <param name="provider">An object that supplies culture-specific formatting information.</param>
    /// <remarks>Simple overload to allow for common syntax between types; removes the need for the caller to understand the differences</remarks>
    /// <returns>Value escaped and safe for direct inclusion in a CSV output</returns>
    private string CsvEscape(DateTime value, IFormatProvider provider)
    {
        string stringValue = String.Format(provider, "{0:d}", value);
        if (stringValue.Contains(','))
            return $"\"{stringValue}\"";
        else
            return stringValue;
    }
}
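The parser in FromCSV un-doubles "" inside quoted cells, so the writing side has to double embedded quotes again for such a value to survive a round trip. The following is only a minimal sketch, assuming that cell values may contain double quotes or line breaks in addition to commas. CsvText and Escape are hypothetical names and are not part of the BankRecord class:

```csharp
// Hypothetical helper, not part of the original BankRecord class.
public static class CsvText
{
    // Quote a cell when it contains a comma, a double quote, or a line break,
    // and double any embedded quotes (the usual CSV convention).
    public static string Escape(string value)
    {
        if (string.IsNullOrEmpty(value))
            return string.Empty;

        bool needsQuotes = value.IndexOfAny(new[] { ',', '"', '\r', '\n' }) >= 0;
        if (!needsQuotes)
            return value;

        return "\"" + value.Replace("\"", "\"\"") + "\"";
    }
}
```

CsvEscape could pass its formatted string through a helper like this instead of only checking for a comma. Note that the line-by-line reader in this answer would still not handle a cell that spans several lines, so quoting line breaks only helps other consumers of the file.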
Here is the process logic:
CultureInfo ci = new CultureInfo("de-DE"); // necessary only when running the code from other cultures.
// I'll leave this in, but don't call your list "liste"; give it some context or meaning instead, like "records" or "transactions".
List<BankRecord> liste = new List<BankRecord>();
SaveFileDialog dialog = new SaveFileDialog();
dialog.Filter = "CSV (*.csv)|*.csv|All files (*.*)|*.*";
if (dialog.ShowDialog() == true)
{
    string line;
    try
    {
        #region Parse the input File
        // Read the file line by line. The using block closes the reader before we write,
        // so saving over the input file does not fail because the file is still open.
        using (System.IO.StreamReader file = new System.IO.StreamReader(path))
        {
            while ((line = file.ReadLine()) != null)
            {
                try
                {
                    liste.Add(BankRecord.FromCSV(line, ci));
                }
                catch
                {
                    // TODO: re-raise or otherwise handle this error if you want.
                    // Today we will simply ignore erroneous entries (such as a header row) and suppress this error.
                }
            }
        }
        #endregion Parse the input File

        #region Evaluate your business rules
        // Evaluate your business rules here, natively in C#: no arrays or indexes, just manipulate the business domain object.
        // Assuming that Belegnummer is a sequencing number, not sure if it is from the start of this file or a different context...
        // This just demonstrates a potential reason for NOT encapsulating the processing logic inside the BankRecord class.
        int previousLineNumber = 47; // arbitrary start...
        foreach (var transaction in liste)
        {
            // Betrag is already a decimal here, so only its sign needs checking.
            if (transaction.Betrag >= 0)
            {
                transaction.Soll = "42590";
                transaction.Haben = "441206";
            }
            else
            {
                transaction.Soll = "441206";
                transaction.Haben = "42590";
            }
            transaction.Belegnummer = $"#{++previousLineNumber}";
        }
        #endregion Evaluate your business rules

        #region Now write to the output
        List<string> outputLines = new List<string>();
        outputLines.Add(BankRecord.Columns.BuildCsvHeader());
        outputLines.AddRange(liste.Select(x => x.ToCSV(ci)));
        File.WriteAllLines(dialog.FileName, outputLines);
        #endregion Now write to the output
    }
    catch
    {
        MessageBox.Show("Der Gewählte Prozess wird bereits von einem anderen verwendet,\n " +
                        " bitte versuchen sie es erneut");
    }
}
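The debit/credit rule from the loop can also be pulled out into a small pure function, which makes it easy to check without a file or a dialog. This is only a sketch of that idea. BookingRules and AccountsFor are hypothetical names, and the account numbers are the ones given in the question:

```csharp
// Hypothetical helper; the names are illustrative only.
public static class BookingRules
{
    // Returns the (Soll, Haben) pair for an amount:
    // zero or positive amounts use 42590 / 441206, negative amounts swap the two.
    public static (string Soll, string Haben) AccountsFor(decimal betrag)
    {
        return betrag >= 0
            ? ("42590", "441206")
            : ("441206", "42590");
    }
}
```

Inside the loop this would read (transaction.Soll, transaction.Haben) = BookingRules.AccountsFor(transaction.Betrag); which keeps the record class free of business rules while leaving the rule itself testable on its own.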
Final output:
Belegdatum,Buchungsdatum,Belegnummer,Buchungstext,Verwendungszweck,Soll,Haben,Betrag,Währung
31.10.2019,01.11.2019,#48,Some Text,,42590,441206,"50,43",EUR
31.10.2019,01.11.2019,#49,Some Text,,441206,42590,"-239,98",EUR
31.10.2019,31.10.2019,#50,Some Text,,441206,42590,-500,EUR
Belegdatum  Buchungsdatum  Belegnummer  Buchungstext  Verwendungszweck  Soll    Haben   Betrag   Währung
31.10.2019  01.11.2019     #48          Some Text                       42590   441206  50,43    EUR
31.10.2019  01.11.2019     #49          Some Text                       441206  42590   -239,98  EUR
31.10.2019  31.10.2019     #50          Some Text                       441206  42590   -500     EUR