Hey Champs,
I hope you enjoyed the last blog, SAP Datasphere Data Flow Series – Operators (Joins, Projection, Aggregation, Union). Well, that was just the trailer. Today, let's jump into the Script operator of Datasphere in detail.
The Script Operator brings the functionality of the popular Python libraries Pandas and NumPy into SAP Datasphere (formerly SAP Data Warehouse Cloud), so you can use them when building Data Flows and tailored information views. This versatile operator covers a wide range of tasks, including data cleansing, data transformation, and more.
Syntax for script operator:
———————————————————————–
def transform(data):
    # Fill in your script here, using data as the source,
    # and save the resulting values back into data as the output
    return data
————————————————————————
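To make the template concrete, here is a minimal sketch of a data cleansing script. The column name "Payment Mode" is borrowed from the use case later in this blog, and the sample rows are made up purely for illustration. The import line and the sample DataFrame are only there so the sketch can be run locally; whether each Pandas call is permitted inside the operator should be checked against the official documentation.

```python
import pandas as pd  # assumption: only needed to run this sketch outside Datasphere


def transform(data):
    # Work on a copy so the incoming DataFrame is not changed in place
    data = data.copy()
    # Trim stray spaces and normalise the case of the payment mode
    data["Payment Mode"] = data["Payment Mode"].str.strip().str.upper()
    # Drop rows where the payment mode is missing
    data = data.dropna(subset=["Payment Mode"])
    return data


# Made-up sample rows, not taken from the order table in this blog
sample = pd.DataFrame(
    {"Order ID": [1, 2, 3], "Payment Mode": [" upi", "Card ", None]}
)
cleaned = transform(sample)
print(cleaned)
```

The function keeps the same shape as the template: it takes data, changes it, and returns it.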
Currently, the included libraries are restricted to Pandas, NumPy, and several built-in module operations. Even within these, not all functions are supported, so check the official documentation for the list of what is supported.
Limitations of SAP Datasphere Script Operator:
Limitation due to batch size
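A small sketch of why batch size can matter, under the assumption that the operator passes the data to the script one batch of rows at a time instead of the full table at once. The batch size of 2 and the sample rows are hypothetical and chosen only to keep the example short; the batching is simulated locally with plain Pandas.

```python
import pandas as pd  # assumption: only needed to run this sketch outside Datasphere


def transform(data):
    # Counts payment modes, but only for the rows it is given
    return data.groupby("Payment Mode").size().reset_index(name="Count")


# Made-up sample rows for illustration
orders = pd.DataFrame({"Payment Mode": ["UPI", "Card", "UPI", "UPI"]})

# Hypothetical batch size of 2 rows, simulated locally
batch_size = 2
batches = [orders.iloc[i:i + batch_size] for i in range(0, len(orders), batch_size)]

# Each batch produces its own partial counts, so UPI shows up twice
per_batch = pd.concat([transform(b) for b in batches], ignore_index=True)
print(per_batch)

# Summing the partial counts again recovers the overall totals
totals = per_batch.groupby("Payment Mode")["Count"].sum()
print(totals)
```

If the data is split this way, a count or any other whole-table calculation written in the script may return partial results per batch, so a further aggregation step after the script may be needed.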
Importance of the Datasphere Script Operator in the real world:
The Script Operator in SAP Datasphere offers several valuable use cases and is crucial for real-world data manipulation and analysis tasks. Here are some key uses and advantages of the Script Operator:
Note: We will learn from use cases how to use the Python operators and get familiar with the syntax. Not every use case will be directly useful to you. Focus on understanding how the operators work and how to write the code.
Use Case 1:
We often run into scenarios where we want to count the number of employees in each department or the number of items in an order. Here we will take a similar example, where I will use the Script Operator to count how many times each payment mode occurs. Using this, we will see how efficient the Script Operator is and how much flexibility it gives us when writing Python code.
Let’s have a look at our data:
This is our order table data and using this I want to count the payment modes.
Order Table Sample Data
So we will expect some output like this.
Expected Output Script Operator
Let’s have a look at the code and understand it step by step:
Python Script to group Data
This code defines a function called transform that takes a Pandas DataFrame as input and returns a new DataFrame with two columns: ‘Payment Mode’ and ‘Count of Payment mode’. The function performs the following steps:
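A minimal sketch of what such a function might look like, assuming the input DataFrame has a column named "Payment Mode". The sample orders below are made up for illustration and are not the rows from the order table above; the import is only needed to run the sketch locally.

```python
import pandas as pd  # assumption: only needed to run this sketch outside Datasphere


def transform(data):
    # Group the rows by payment mode and count the rows in each group
    counts = data.groupby("Payment Mode").size()
    # Turn the grouped result back into a DataFrame with two columns
    result = counts.reset_index(name="Count of Payment mode")
    return result


# Made-up sample orders for illustration
orders = pd.DataFrame(
    {
        "Order ID": [101, 102, 103, 104, 105],
        "Payment Mode": ["UPI", "Card", "UPI", "COD", "Card"],
    }
)
summary = transform(orders)
print(summary)
```

Here groupby collects the rows for each payment mode, size counts them, and reset_index gives the count column its name so the result has exactly the two columns described above.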
Now let's get our hands dirty by going into Datasphere and using the data flow.
Dataflow Script Operator
Conclusion:
This blog introduced the Script Operator available in Datasphere data flows, which we'll use throughout the blog series, and I have explained each part of it in detail. The Script Operator is a very handy tool for a lot of data manipulation and analysis. What we can do using SQL script we can also do using the Script Operator, but since Python is the language, it gives us the flexibility to use lots of built-in functions to do more operations on the data. I will continue with a second part on the Script Operator to show what can be done easily with it that we can't do using SQL script.
Thanks for reading! I hope you find this post helpful. For any questions or feedback, just leave a comment below this post. Feel free to also check out the other blog posts in the series and follow me to learn as well as master SAP analytics. Let me know if you find something that can be improved or added.
Best wishes,
Kunal Mohanty