Convert a PDF file to DOCX format within SAP using Python Script
2023-10-19 05:10:07 Author: blogs.sap.com

We ran into a roadblock when our client expressed a specific requirement: the occasional need to edit invoices after generation. If you have bought an Adobe license this is a no-brainer; otherwise it is not feasible with the native PDF format.

Use Case: Converting Invoices from PDF to DOCX

The invoices are generated as non-editable PDFs by SAP Adobe forms with a complicated design. After exploring many technical options, I realized there was no way to convert the file within SAP while keeping the exact original formatting (including images and colors), unless I converted all the Adobe form invoices to Smartforms and then printed them as XLS. That wasn’t acceptable, so I decided to make things interesting by integrating Python with SAP.

Python offers a large catalog of libraries that can do nearly anything, but not all of them are safe to use. I came across a library that converted PDF to XLS like a charm, in exactly the same format. But I could not rely on it with my client’s data, because I wasn’t sure how it worked in the backend or whether it saved uploaded files somewhere in its cloud. So I decided to go with a relatively reliable and very popular library, pdf2docx (its code is available online, so you can verify that it works locally).

Every now and then we need to convert a PDF to some editable format, so I decided to create a generic SAP program that calls a Python script in the backend to convert any PDF file to .docx format.

Aim of the Integration

  • Maintain Data Accuracy: Ensuring that during the conversion from PDF to DOCX, all data, formatting, and structural elements are preserved accurately to prevent any loss or misrepresentation of information.
  • Uphold Data Security: Guaranteeing that the client’s sensitive and confidential data remains secure by managing the conversion process in-house, without transmitting data externally, thus adhering to stringent data security and privacy protocols.
  • In-House Solution: Developing a solution within the existing technological framework (SAP and Python), thereby negating the need for external software or services and ensuring tighter integration with existing processes.
  • User Accessibility: Crafting a user-friendly interface and process within SAP, where users can effortlessly select, convert, and utilize documents, while ensuring that the behind-the-scenes complexity remains abstracted.
  • Cost-Effective Approach: Providing a reliable solution for document conversion without incurring additional costs related to third-party software licenses or services.

Detailed Workflow

  1. Since the Python code is going to be executed on the SAP server, Python needs to be installed there. Generally the Basis or OS team has access to install Python on the SAP server. The script uses two Python libraries: os, which ships with Python’s standard library, and pdf2docx, which must be installed separately after the Python installation.
  2. Once Python is installed on the SAP server, go to T-code SM69 and create a command for Linux to confirm the installation:

Check Version of installed Python package

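Make sure the installed package is not too old. Here Python was installed using RPM, where 2.7.5 was the latest package available. Note that pdf2docx is published for Python 3, so check that the interpreter behind the command can actually import it.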
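Beyond the version banner, the same SM69 command can run a small script that reports whether the interpreter meets a minimum version and whether the required module can be imported. The sketch below is an illustration, not part of the original setup; the function name check_environment and the minimum version (3, 6) are assumptions.

```python
import importlib.util
import sys


def check_environment(module_name="pdf2docx", min_version=(3, 6)):
    """Report the interpreter version and whether a module is importable.

    min_version is an assumed threshold; adjust it to what the
    installed library actually requires.
    """
    version = tuple(sys.version_info[:3])
    return {
        "python_version": version,
        "version_ok": version >= min_version,
        # find_spec returns None when a top-level module is not installed
        "module_found": importlib.util.find_spec(module_name) is not None,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print("{}: {}".format(key, value))
```

Each printed line then shows up in the command's output log, where a missing module is easy to spot.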

~~~ Now the actual interesting work starts ~~~

3. Create an SM69 command, Z_PYTHON, to execute Python with the OS command ‘python’ (I reused the OS command I created to check the version). Make sure you have ticked the checkbox ‘Additional Command’.

4. Create Logical and Physical Path (T-code-FILE)

Step-(a)

Step-(b)

Step-(c)

5. Create an include, ZCONVERT_PY, in SE38, write the Python code below in it, and activate it.

Python Code Logic: Use the libraries pdf2docx and os (pdf2docx must be installed on the SAP server; os is part of the Python standard library). Put <PLACEHOLDER> where the user-selected PDF file path is to be inserted at runtime.

from pdf2docx import Converter
import os

# get the input path from SAP Program Selection-Screen user-Input
input_path = '<PLACEHOLDER>'

# save the docx file in the same folder as the pdf file
base = os.path.splitext(input_path)[0]
output_path = base + '.docx'

# convert pdf to docx
cv = Converter(input_path)
cv.convert(output_path) # all pages by default
cv.close()

Note: Do NOT run Pretty Printer on the code, and make sure it keeps exactly the same indentation and letter case as it would have in your Python text editor.

If you want to manipulate something in the PDF, refer to the pdf2docx library documentation.
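As one example of such manipulation, pdf2docx's Converter.convert accepts start and end page indices (zero-based), so only part of a document is converted. The following is a minimal sketch: the helper names docx_path_for and convert_pages are hypothetical, and to my understanding end is exclusive, as in a Python slice, which is worth confirming against the library documentation. The import sits inside the function so the path helper can be tried without the library installed.

```python
import os


def docx_path_for(pdf_path):
    """Return the .docx path next to the given PDF (same folder, same stem)."""
    base, _ext = os.path.splitext(pdf_path)
    return base + ".docx"


def convert_pages(pdf_path, start=0, end=None):
    """Convert a page range of a PDF; end=None means up to the last page."""
    from pdf2docx import Converter  # third-party; must be installed

    output_path = docx_path_for(pdf_path)
    cv = Converter(pdf_path)
    try:
        cv.convert(output_path, start=start, end=end)
    finally:
        cv.close()  # release the file handle even if conversion fails
    return output_path
```

For instance, convert_pages(input_path, end=1) would be a way to convert only the first page of an invoice.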

6. In SE37, create a new generic FM, ZCM_EXECUTE_SCRIPT, to run any Python code.

  • This FM can be used in future to execute any other Python scripts as well.
  • Import parameters: (i) the name of the include that holds the Python code, (ii) a temporary .py file name, and (iii) Placeholder_1. There will be scenarios where you have to pass a value from SAP to the Python script on execution (the user-selected file path in this case). This value can be passed via the placeholder.

FM Import Parameters
  • The export parameter is a log table (table type of BTCXPM).

FM Export Parameters

Log Table: Table Type
  • Code Logic: Check whether a file with the imported file name already exists in the directory declared in the logical path ZPYTHON. If it exists, delete it. Then create a new temporary AL11 file and write the content of the include ZCONVERT_PY (the Python code) to it, replacing every occurrence of <PLACEHOLDER> in the content with the imported file path. Run the SM69 command Z_PYTHON to execute the temporary Python file. Once the execution is complete, delete the temporary file.
"-Variables
DATA lv_filename(255) TYPE c.
DATA: lv_fname     TYPE epsf-epsfilnam,
      lv_directory TYPE epsf-epsdirnam,
      lt_files     TYPE STANDARD TABLE OF epsfili.
DATA: lv_file_path TYPE epsf-epspath,
      lv_long      TYPE eps2path.

DATA lv_tadir TYPE tadir.
DATA lt_incl TYPE TABLE OF string.
DATA lv_lineincl TYPE string.
DATA lv_strincl TYPE string.
DATA lv_status TYPE extcmdexex-status.
DATA lv_exitcode TYPE extcmdexex-exitcode.

"Gets directory path
CALL FUNCTION 'FILE_GET_NAME'
  EXPORTING
    client           = sy-mandt
    logical_filename = 'ZPYTHON' "created in Step-4
    operating_system = sy-opsys
    parameter_1      = i_filename
    eleminate_blanks = 'X'
  IMPORTING
    file_name        = lv_filename
  EXCEPTIONS
    file_not_found   = 1
    OTHERS           = 2.
IF sy-subrc = 0.
  " Extract the directory part of the path
  lv_directory = lv_filename.
  REPLACE FIRST OCCURRENCE OF REGEX '/[^/]*$' IN lv_directory WITH space.
  CONDENSE lv_directory.

  "-Check if file exists in the AL11 Directory
  IF lv_directory IS NOT INITIAL.
    " Get the list of files in the directory
    CALL FUNCTION 'EPS_GET_DIRECTORY_LISTING'
      EXPORTING
        dir_name               = lv_directory
      TABLES
        dir_list               = lt_files
      EXCEPTIONS
        invalid_eps_subdir     = 1
        sapgparam_failed       = 2
        build_directory_failed = 3
        no_authorization       = 4
        read_directory_failed  = 5
        too_many_read_errors   = 6
        empty_directory_list   = 7
        OTHERS                 = 8.
    IF sy-subrc = 0.
      LOOP AT lt_files INTO DATA(ls_file).
        IF ls_file-name EQ i_filename.

          "If file exists, delete it
          CLEAR : lv_fname.
          lv_fname = i_filename.

          CALL FUNCTION 'EPS_DELETE_FILE'
            EXPORTING
              file_name              = lv_fname
              dir_name               = lv_directory
            IMPORTING
              file_path              = lv_file_path
              ev_long_file_path      = lv_long
            EXCEPTIONS
              invalid_eps_subdir     = 1
              sapgparam_failed       = 2
              build_directory_failed = 3
              no_authorization       = 4
              build_path_failed      = 5
              delete_failed          = 6
              OTHERS                 = 7.
          IF sy-subrc <> 0.
            MESSAGE 'Unable to delete already existing file with same name!' TYPE sy-abcde+4(1). "'E'.
          ENDIF.
          EXIT.
        ENDIF.
      ENDLOOP.
    ENDIF.
  ENDIF.
ELSE.
  WRITE: / 'Error getting file path:', sy-subrc.
ENDIF.

"-Gets script content-----------------------------------------------
SELECT SINGLE * FROM tadir INTO lv_tadir
  WHERE obj_name = i_inclname.
IF sy-subrc = 0.
  READ REPORT i_inclname INTO lt_incl.
  IF sy-subrc = 0.
    LOOP AT lt_incl INTO lv_lineincl.
      lv_strincl = lv_strincl && lv_lineincl &&
                   cl_abap_char_utilities=>cr_lf.
    ENDLOOP.

    "If there is any placeholder in the Python script, replace it with the input parameter
    REPLACE ALL OCCURRENCES OF '<PLACEHOLDER>' IN lv_strincl WITH i_plholder.
  ENDIF.
ELSE.
  MESSAGE 'Error' TYPE 'E'.
ENDIF.

"-Writes script-----------------------------------------------------
OPEN DATASET lv_filename FOR OUTPUT IN TEXT MODE
  ENCODING NON-UNICODE WITH WINDOWS LINEFEED.
IF sy-subrc = 0.
  TRANSFER lv_strincl TO lv_filename.
  CLOSE DATASET lv_filename.
ELSE.
  MESSAGE 'Error' TYPE 'E'.
ENDIF.

"-Executes script---------------------------------------------------
CALL FUNCTION 'SXPG_COMMAND_EXECUTE'
  EXPORTING
    commandname                   = 'Z_PYTHON' "created in step-3
    additional_parameters         = lv_filename
  IMPORTING
    status                        = lv_status
    exitcode                      = lv_exitcode
  TABLES
    exec_protocol                 = e_execprot
  EXCEPTIONS
    no_permission                 = 1
    command_not_found             = 2
    parameters_too_long           = 3
    security_risk                 = 4
    wrong_check_call_interface    = 5
    program_start_error           = 6
    program_termination_error     = 7
    x_error                       = 8
    parameter_expected            = 9
    too_many_parameters           = 10
    illegal_command               = 11
    wrong_asynchronous_parameters = 12
    cant_enq_tbtco_entry          = 13
    jobcount_generation_error     = 14
    OTHERS                        = 15.
IF sy-subrc <> 0.
  MESSAGE 'Error' TYPE 'E'.
ENDIF.

"Delete script
DELETE DATASET lv_filename.
IF sy-subrc <> 0.
  MESSAGE 'Error' TYPE 'E'.
ENDIF.
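
For testing a script template outside SAP, the function module's flow (fill the placeholder, write a temporary .py file, run it, collect the output, delete the file) can be mirrored in plain Python. This is only a sketch under assumptions: the function name execute_script is hypothetical, the temporary file goes to the system temp directory rather than an AL11 path, and the interpreter is whichever one runs the sketch.

```python
import os
import subprocess
import sys
import tempfile


def execute_script(template, placeholder_value, filename="TEMP.PY"):
    """Fill the placeholder, run the script, return (exit_code, output_lines)."""
    source = template.replace("<PLACEHOLDER>", placeholder_value)
    script_path = os.path.join(tempfile.gettempdir(), filename)

    # Remove a leftover file with the same name, as the FM does
    if os.path.exists(script_path):
        os.remove(script_path)

    with open(script_path, "w") as handle:
        handle.write(source)
    try:
        result = subprocess.run(
            [sys.executable, script_path],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge errors into the same log
            universal_newlines=True,
        )
    finally:
        os.remove(script_path)  # always clean up the temporary script
    return result.returncode, result.stdout.splitlines()
```

Because the value is inserted as plain text, a path containing backslashes ends up inside a normal string literal. Writing the template line as a raw string (r'<PLACEHOLDER>') is one way to avoid accidental escape sequences.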

7. Create a new SAP program, Z_CONVERT, in SE38, which will:

  • Let the user press F4 and select a PDF file from their local machine (p_path)
cl_gui_frontend_services=>file_open_dialog(
  EXPORTING
    window_title            = lv_title   " Title Of File Open Dialog
    default_extension       = 'PDF'      " Default Extension
    default_filename        = lv_file    " Default File Name
    file_filter             = '*.PDF'    " File Extension Filter String
  CHANGING
    file_table              = lt_file    " Table Holding Selected Files
    rc                      = lv_rc      " Return Code, Number of Files or -1 If Error Occurred
    user_action             = lv_action  " User Action (See Class Constants ACTION_OK, ACTION_CANCEL)
  EXCEPTIONS
    file_open_dialog_failed = 1          " "Open File" dialog failed
    cntl_error              = 2          " Control error
    error_no_gui            = 3          " No GUI available
    not_supported_by_gui    = 4          " GUI does not support this
    OTHERS                  = 5 ).
IF sy-subrc = 0.
  "display user-selected path on selection-screen (first selected file)
  READ TABLE lt_file INTO DATA(ls_file) INDEX 1.
  IF sy-subrc = 0.
    p_path = ls_file-filename.
  ENDIF.
ENDIF.
  • Call FM — ZCM_EXECUTE_SCRIPT by passing:

i_inclname = ‘ZCONVERT_PY’

i_filename = ‘TEMP.PY’

i_plholder = p_path

  • Check the returned logs to see if any error occurred in ZCM_EXECUTE_SCRIPT. You can do error handling as per your scenario and requirement.
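
Error handling on the SAP side is easier when the script itself reports failure unambiguously. One possible approach, sketched below with hypothetical names (run_conversion, fake_convert, and the CONVERT_OK / CONVERT_ERROR markers), is to catch exceptions in the script, print a single status line, and return a non-zero exit code, so that the exit code and log table returned by SXPG_COMMAND_EXECUTE can be checked for them.

```python
def run_conversion(convert, input_path):
    """Run convert(input_path); print one status line; return an exit code."""
    try:
        output_path = convert(input_path)
    except Exception as exc:  # report any failure as a single log line
        print("CONVERT_ERROR: {}".format(exc))
        return 1
    print("CONVERT_OK: {}".format(output_path))
    return 0


def fake_convert(path):
    """Stand-in for the real pdf2docx call, used here for illustration only."""
    if not path.lower().endswith(".pdf"):
        raise ValueError("not a PDF file: " + path)
    return path[:-4] + ".docx"
```

In the actual script, the last line could pass the result to sys.exit(), with a function that wraps the pdf2docx conversion in place of fake_convert.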

Now, every time a user executes the program Z_CONVERT and selects a PDF file, the FM ZCM_EXECUTE_SCRIPT creates a temporary file (TEMP.PY) with the Python code from the include (ZCONVERT_PY) and the user-selected file’s path (p_path), executes the temporary file using the SM69 command (Z_PYTHON), and deletes the temporary file. SIMPLE !!

You can also place the Python .py file permanently in AL11 instead of creating an include and reading it in the FM. But that is not the best approach in terms of solution maintenance: every little code change means downloading (CG3Y) and uploading (CG3Z) the file in AL11 again, and you have to make sure the file exists in the same folder at all times. Also, I liked the idea of making the function module generic for any future Python-related solution.

This integration between SAP and Python aspires to elegantly solve this challenge, demonstrating that with the right blend of technologies, solutions can be crafted that are not only technically sound but also deeply aligned with user needs and organizational processes. My journey through this integration explores the technical depth, challenges, and innovative strategies employed to bring this objective to fruition.

REFERENCE:

https://blogs.sap.com/2016/02/09/how-to-use-python-via-external-os-commands-and-embed-the-scripts-seamlessly/


Source: https://blogs.sap.com/2023/10/18/convert-a-pdf-file-to-docx-format-within-sap-using-python-script/