Garbage data issue while loading data from multiple sources to .csv file using SAP DS

2023-10-6 04:31:5 Author: blogs.sap.com

In today’s world, migrating or integrating data from one system to another plays a vital role for any business.

When loading data from an SAP source to a CSV file, we often work with tables that contain data in several languages: not only Western languages but also others such as Chinese and Japanese.

When these kinds of tables are loaded into a CSV file, text in languages other than English shows up as improper data. The reason is that when we open the CSV file in Excel, Excel typically reads it using the Windows-1252 encoding (on Western-locale systems), commonly called ANSI (American National Standards Institute) encoding, which is compatible with English and other Western languages but cannot represent scripts such as Chinese or Japanese.
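The effect can be reproduced outside SAP DS. The following is a minimal Python sketch, not part of the SAP DS job: it assumes the text was written as UTF-8 and then read back as Windows-1252. The function name `misread_as_cp1252` and the sample value are hypothetical.

```python
def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then decode the same bytes as Windows-1252."""
    utf8_bytes = text.encode("utf-8")
    # errors="replace" handles the few byte values cp1252 leaves undefined.
    return utf8_bytes.decode("cp1252", errors="replace")


# Hypothetical sample value: the two Chinese characters U+4E2D U+6587.
original = "\u4e2d\u6587"
garbled = misread_as_cp1252(original)

# Two characters become six, one per UTF-8 byte.
print(len(original), len(garbled))
# ascii() keeps the output printable on any console.
print(ascii(garbled))
```

Each Chinese character occupies three bytes in UTF-8, and a single-byte code page turns every one of those bytes into a separate, unrelated character, which is the garbage seen in the file.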

Scenario:

In table TPART, the field VTEXT contains Chinese data at the source, but when we load it into a .csv file, garbage data appears instead.

Fig 1) Source data from table TPART, which contains Chinese data in field VTEXT.

Fig 2) Target data in the .csv file, which contains garbage data in field VTEXT.

Solution:

To overcome this kind of data issue, we need to tweak some settings in the flat file editor in SAP DS.

Code page: should be UTF-8 or UTF-16. (We used UTF-8 because it is a flexible and efficient character encoding scheme that can represent data from all writing systems and is widely supported by modern software.)
Write BOM: should be Yes. (Enabling the BOM is optional for the UTF-8 code page, but we enable it because some programs, especially older text editors, might not recognize the file as UTF-8 without a BOM.)
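For readers who want to see what these two settings amount to at the file level, here is a rough Python equivalent. It is only an illustration and not how SAP DS writes the file internally; the helper `write_csv_with_bom`, the file name, and the sample rows are all hypothetical. Python's `utf-8-sig` codec writes UTF-8 and puts a BOM at the start of the file.

```python
import codecs
import csv
import os
import tempfile


def write_csv_with_bom(path, rows):
    """Write rows as UTF-8 CSV with a leading byte order mark."""
    # "utf-8-sig" emits the bytes EF BB BF before the first row.
    with open(path, "w", encoding="utf-8-sig", newline="") as f:
        csv.writer(f).writerows(rows)


# Hypothetical sample data; only the VTEXT column name comes from the scenario.
rows = [["ID", "VTEXT"], ["1", "\u4e2d\u6587"]]
path = os.path.join(tempfile.mkdtemp(), "tpart_sample.csv")
write_csv_with_bom(path, rows)

# Inspect the raw bytes to confirm the BOM is present.
with open(path, "rb") as f:
    raw = f.read()
starts_with_bom = raw.startswith(codecs.BOM_UTF8)
print(starts_with_bom)

# Reading with "utf-8-sig" strips the BOM again.
with open(path, encoding="utf-8-sig", newline="") as f:
    read_back = list(csv.reader(f))
```

A file written this way carries its own encoding hint, so a program that checks for the BOM can pick UTF-8 instead of falling back to a single-byte code page.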

A byte order mark (BOM) is a special marker that can be placed at the beginning of a Unicode (UTF-8, UTF-16, or UTF-32) encoded text file. For UTF-16 and UTF-32 it indicates the byte order (little endian or big endian). UTF-8 has no byte-order variants, so there the BOM acts only as a signature that tells the software reading the file which encoding was used.
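To make the idea concrete, the sketch below shows how a reading program might look for a BOM in the first bytes of a file. It is a simplified illustration under the assumption that only the standard Unicode BOMs matter; `detect_bom` is a hypothetical helper, and the BOM constants come from Python's standard `codecs` module.

```python
import codecs

# Check the 4-byte UTF-32 marks before the 2-byte UTF-16 marks, because the
# UTF-32 little-endian mark begins with the same two bytes as the UTF-16 one.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return the encoding named by a leading BOM, or None if there is none."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    return None


print(detect_bom("VTEXT".encode("utf-8-sig")))  # UTF-8 bytes with a BOM
print(detect_bom("VTEXT".encode("utf-8")))      # same text without a BOM
```

When no BOM is found, the reader has to guess the encoding, which is where the fallback to a legacy code page comes from.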

Fig 3) Snip of the file format editor where the above changes are implemented.

The screenshot below shows the output data after applying the code page settings:

Fig 4) Output snip of the .csv file after applying the above settings in the flat file editor.

See the links below for more details on UTF-8 and file format properties.

What is UTF-8: https://www.freecodecamp.org/news/what-is-utf-8-character-encoding/

Input/Output options in the File Format editor

I hope you found this blog informative.

Source: https://blogs.sap.com/2023/10/05/garbage-data-issue-while-loading-data-from-multiple-sources-to-.csv-file-using-sap-ds/