In blog 6 of our Spotify series, we shift our focus to visualizing the data from the Playlist and Audio Features APIs, as discussed in the second and third blogs of this series. We will look in detail at consuming calculation views from SAP HANA Cloud with tools such as SAP Analytics Cloud and Microsoft Power BI. For the SAP Datasphere models built in the third blog, the consumption process is quite similar, and we will highlight the differences where they exist.
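In this blog, we will cover the following topics in detail: accessing the calculation view in SAP Analytics Cloud, building a live model and stories (including R visualizations), consuming SAP Datasphere models, and visualizing the same calculation view in Microsoft Power BI.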
This blog post is part of a comprehensive series. If you’re interested in exploring more, feel free to visit the other blogs in this series.
This section concentrates on accessing the deployed calculation view in SAP Analytics Cloud (SAC). If you followed along with blog 2 and cloned and deployed the associated Git repository, which now includes content for both reporting and building graph networks, you should have access to the calculation view CV_TOP_ALL[1].
To access the calculation view from either SAC or Microsoft Power BI, the hdbrole[2] must be assigned to the database user created for frontend consumption. This assignment gives the user the permissions needed to read the data through these platforms.
Remember, the correct assignment of roles and permissions is a crucial step in maintaining the security and integrity of your data while still allowing for flexible and robust data analysis capabilities.
This calculation view is based on the SQL view TOPALL1 discussed in blog 2.
The semantics details for the calculation view, CV_TOP_ALL[1], include a calculated column named “Speechiness1”.
As highlighted previously, the audio features from the Spotify API include a “speechiness” attribute. Speechiness measures the presence of spoken words in a track. Songs with exclusively instrumental music and no vocals have low speechiness, while rap songs and podcasts with continuous speaking have higher scores.
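This metric helps compare speech patterns across playlists and geo-markets. Per the Spotify API documentation, analyzing speechiness allows identifying playlists that contain tracks made almost entirely of spoken words (values above 0.66), tracks that mix music and speech, such as rap (values between 0.33 and 0.66), and tracks that are mostly music (values below 0.33).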
In summary, the speechiness measure detects tracks with more spoken emphasis over instrumentation. We can leverage this to spot regional playlist preferences for words vs music.
Source: Spotify API Documentation
I establish a threshold to differentiate speech-centric playlists by subtracting 0.33 from the Spotify speechiness score, so that tracks above the threshold end up with a positive value. Playlists whose tracks contain predominantly spoken words (e.g., rap, hip hop, podcasts) typically have speechiness scores that exceed this threshold.
The interactive visualizations in SAP Analytics Cloud will spotlight playlists based on this speechiness threshold.
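As a minimal sketch of the threshold logic described above, the base R snippet below subtracts 0.33 from a raw speechiness score and flags tracks that land above zero. The sample rows and the column names TRACKNAME, SPEECHINESS, SPEECHINESS1, and SPEECH_CENTRIC are assumptions for illustration; the real calculated column lives in the calculation view, and its exact definition is not shown in this blog.

```r
# Hypothetical sample rows; the real values come from the CV_TOP_ALL calculation view.
tracks <- data.frame(
  TRACKNAME   = c("Track A", "Track B", "Track C"),
  SPEECHINESS = c(0.05, 0.33, 0.48),
  stringsAsFactors = FALSE
)

threshold <- 0.33

# Mirror the idea of the calculated column: raw speechiness minus the threshold.
tracks$SPEECHINESS1 <- tracks$SPEECHINESS - threshold

# Tracks strictly above the threshold end up with a positive value.
tracks$SPEECH_CENTRIC <- tracks$SPEECHINESS1 > 0

print(tracks)
```

Working with the offset value means a chart can use zero as the visual dividing line between mostly musical and speech-heavy tracks.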
With the calculation view deployed, the next step is allowing consumption by assigning database permissions.
Specifically:
This role assignment via the Cockpit is a prerequisite before the calculation view can be used for reporting and analytics. After the assignment, you can confirm access to the view in the Explorer.
After creating a new database user, provisioning the roles, and connecting to the HDI container, you can generate a scatter plot[4] based on the calculation view CV_TOP_ALL[1].
A scatter plot shows the relationship between two variables. In this case, we have chosen to visualize the correlation between the metrics ‘Danceability’ and ‘Popularity’[3] across the tracks of playlists from four different countries. This is achieved by setting a filter[2] on ‘Tracklistname’.
To interpret the scatter plot, each point on the plot represents a track. The position of a point on the horizontal axis indicates its ‘Danceability’ score, and its position on the vertical axis indicates its ‘Popularity’ score. If there is a pattern in the points, such as a line or curve, this suggests a correlation between ‘Danceability’ and ‘Popularity’.
Remember, correlation does not imply causation. While the scatter plot may show a relationship between ‘Danceability’ and ‘Popularity’, it does not prove that increasing ‘Danceability’ will increase ‘Popularity’. Other factors may be influencing both variables.
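To put a number on what the scatter plot suggests visually, a correlation coefficient can be computed alongside it. The sketch below uses base R's `cor()` on a few made-up rows; the values and the column names DANCEABILITY and POPULARITY are assumptions, not data from the actual playlists.

```r
# Hypothetical sample rows standing in for tracks from the calculation view.
tracks <- data.frame(
  DANCEABILITY = c(0.52, 0.61, 0.70, 0.78, 0.85, 0.66),
  POPULARITY   = c(64, 71, 69, 83, 88, 75)
)

# Pearson correlation: +1 is a perfect increasing line, -1 a perfect decreasing line,
# and values near 0 indicate no linear relationship.
r <- cor(tracks$DANCEABILITY, tracks$POPULARITY, method = "pearson")

cat(sprintf("Pearson r = %.2f\n", r))
```

A coefficient close to 1 or -1 matches a tight line of points in the scatter plot. As noted above, it still says nothing about causation.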
If you already have access to the SAP Analytics Cloud (SAC) tenant, you can proceed with the next steps. However, if you don’t, you have the option to register for a SAC trial, which lasts up to 60 days. This trial period allows you to explore and familiarize yourself with the functionalities of SAC.
For further information about the SAC trial, including registration process, available features, and any limitations, please refer to the provided link. This link should direct you to a FAQ page and additional details about the SAC trial.
You already have access to the SAC tenant.
To create a connection to SAP HANA Cloud from your active SAC tenant, follow these steps:
Now, your SAC tenant is connected to your SAP HANA Cloud instance, and you can start creating models and stories using your SAP HANA Cloud data.
Select the Modeler from the drop-down and build a Live data connection. Use the connection that was created in the previous step and select the calculation view CV_TOP_ALL.
Save the model once you have validated all the measures and dimensions. In my case, the analytical model is “Tracksall”.
Now that we have built an analytical model, we can use it to gain insights into playlists and songs by visualizing key metrics. Specifically, we will build a standard trend chart and two R (ggplot2) visualizations to examine three aspects: how key audio metrics compare across the TOPUSA playlist, how danceability is distributed across playlists, and how speechiness varies across tracks and playlists.
This is a standard trend chart[1] based on the SAC analytic model TRACKSALL. It compares the three metrics across the playlist TOPUSA[2]. TRACKLISTNAME groups all 50 tracks from the USA, and I have used the filter[2] to restrict the chart to that playlist.
When you analyze the top 50 songs from the USA, you will notice that songs with high energy or danceability tend to have lower “speechiness”.
And songs like Rich Flex by Drake have a higher “speechiness” value because of all the rapping, yo! 😊 It’s a mix of R&B and rap, hence the high speechiness factor.
Prerequisites:
1. To utilize R-scripts, connect to either a remote R environment (BYOR) or an R environment provided by SAP in various data centers. Refer to this link for availability details. Follow these steps to set up a remote R server, similar to integrating R with SAP HANA.
2. For production use cases, verify required R packages. If packages needed for your use case are unsupported by SAP, set up an R environment with those packages accordingly.
3. Review the basics of ggplot2, an R library providing flexible, tidy, optimized data visualization. It enables integrated data exploration and analysis. Refer to the ggplot2 documentation.
For this Spotify example, I leveraged an SAP-provided R environment in the EU10 data center. First I will share the visualization, followed by the script, explanation, and SAC steps. This output utilizes interactive data visualization to compare “danceability” metrics across music playlists.
And here is the script for the R visualization.
library(plotly)
library(ggplot2)
salmon <- "#F8766D"
teal <- "#00BFC4"
orange <- "#D95E0E"
limegreen <- "#7CAE00"
tangerine <- "#FF9E13"
skyblue <- "#56B4E9"
junglegreen <- "#009E73"
mustard <- "#F0E442"
sapphire<- "#0072B2"
goldenrod <- "#E69F00"
viz4 <- ggplot(Tracksall, aes(x = DANCEABILITY, fill = TRACKLISTNAME,
                              text = paste(TRACKLISTNAME))) +
  geom_density(alpha = 0.7, color = NA) +  # filled, semi-transparent curves without outlines
  scale_fill_manual(values = c(salmon, teal, orange, limegreen, tangerine,
                               skyblue, junglegreen, mustard, sapphire, goldenrod)) +
  labs(x = "Danceability", y = "Density") +
  guides(fill = guide_legend(title = "Playlist")) +
  theme_minimal() +
  ggtitle("Distribution of Danceability Data")
# Convert the static ggplot into an interactive plotly chart; the tooltip shows the playlist name.
ggplotly(viz4, tooltip = c("text"))
This graph shows the danceability distribution across various playlists based on the dataset TRACKSALL. Using density plots, we can visualize how concentrated certain playlists are in high or low danceability scores. For example, some playlists have most of their tracks clustered at the higher end of danceability, meaning the songs tend to be quite danceable (e.g., the Chile playlist). Other playlists have a wider spread across the axis. Just by glancing at the colors and density shapes, we can get a sense of the variation in dance “suitability” across these playlists.
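The idea of a playlist being "concentrated" at a certain danceability can also be checked numerically. The sketch below uses base R's `density()` to estimate where each playlist's density curve peaks. The two playlists are simulated with `rbeta()` purely for illustration; the playlist names and values are assumptions, not data from TRACKSALL.

```r
set.seed(42)

# Simulated danceability scores in [0, 1] for two hypothetical playlists:
# "Playlist A" is skewed toward high danceability, "Playlist B" is spread around the middle.
tracks <- data.frame(
  TRACKLISTNAME = rep(c("Playlist A", "Playlist B"), each = 50),
  DANCEABILITY  = c(rbeta(50, 8, 2), rbeta(50, 3, 3)),
  stringsAsFactors = FALSE
)

# For each playlist, estimate the kernel density and report the x position of its peak.
peak_by_playlist <- sapply(
  split(tracks$DANCEABILITY, tracks$TRACKLISTNAME),
  function(x) {
    d <- density(x, from = 0, to = 1)
    d$x[which.max(d$y)]
  }
)

print(round(peak_by_playlist, 2))
```

A peak further to the right corresponds to a density curve that bulges at the high end of the danceability axis in the chart.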
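Let’s break down the script (viz4). The first two lines load plotly and ggplot2. The next ten lines define named hex colors for the playlists. The ggplot() call on line 13 maps DANCEABILITY to the x-axis and TRACKLISTNAME to the fill color and tooltip text. geom_density() draws one semi-transparent density curve per playlist, scale_fill_manual() applies the custom colors, and labs(), guides(), theme_minimal(), and ggtitle() set the axis labels, legend title, theme, and chart title. Finally, ggplotly() converts the static plot into an interactive one whose tooltip shows the playlist name.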
Please make sure the R server is enabled on your SAC tenant or your remote R connection is established.
As part of your story, add an R visualization as shown below:
Once you add the R visualization, provide the input data source[1] as your analytic model “Tracksall”. Select the “Edit script”[2] option and copy in the script that I shared. If your analytic model name is different, please make sure you replace it in the script on line 13.
With just a few lines of code, the flexibility and ease of ggplot2 allow us to quickly turn the data into an insightful plot, demonstrating the power of this graphics package.
I will share the visualization, R script, and an explanation of the script. You can follow the same implementation steps in SAC as we discussed for the “Distribution of Danceability Data” example. Let’s start with the visualization. The script creates an interactive bar chart that represents the “Speechiness” of different tracks across various playlists. Each bar represents a track, and the bar height indicates its speechiness: the higher the value, the more spoken words in the track. The bars are color-coded by playlist, making the playlists easy to tell apart. Some tracks in different playlists have Speechiness > 0 (that is, above the 0.33 threshold), potentially indicating rap songs, podcasts, or audiobooks.
This interactive chart lets us compare the speechiness of tracks across playlists. Songs that are purely instrumental have low speechiness, while podcast and audiobook tracks would score very high.
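A bar chart of this kind could be sketched with ggplot2 roughly as follows. This is not the original script from the blog, only an illustration under assumptions: the sample rows, the column names TRACKNAME and SPEECHINESS1, the axis labels, and the chart title are all made up, and the SAC model data frame is replaced by a small local one.

```r
library(ggplot2)

# Hypothetical sample rows standing in for the SAC model data frame.
Tracksall <- data.frame(
  TRACKNAME     = c("Track A", "Track B", "Track C", "Track D"),
  TRACKLISTNAME = c("Playlist 1", "Playlist 1", "Playlist 2", "Playlist 2"),
  SPEECHINESS1  = c(-0.28, 0.12, -0.05, 0.21),  # assumed: raw speechiness minus 0.33
  stringsAsFactors = FALSE
)

# One bar per track, bar height = offset speechiness, fill color = playlist.
viz5 <- ggplot(Tracksall, aes(x = TRACKNAME, y = SPEECHINESS1, fill = TRACKLISTNAME)) +
  geom_col() +
  labs(x = "Track", y = "Speechiness (offset by 0.33)") +
  guides(fill = guide_legend(title = "Playlist")) +
  theme_minimal() +
  ggtitle("Speechiness by Track and Playlist")

# Wrapping the plot with plotly::ggplotly(viz5) would make it interactive,
# in the same way as the danceability script.
viz5
```

Bars that rise above zero correspond to tracks whose raw speechiness exceeds the 0.33 threshold.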
And the SAC Implementation?
Follow the same SAC implementation steps as before, and you should be able to view both R-based visualizations in your SAC story.
In Blog 3 of the Spotify Series, we detailed how to construct analytical models akin to those in SAP HANA Cloud.
When using SAP HANA Cloud, we connected directly to the database with a user who had access to the deployed containers. With SAP Datasphere, however, analytic models built within a space can be directly exposed when constructing trend charts in SAC. There is no need to establish a separate database connection: the data models are available for visualization as soon as they are deployed to the space. This tight integration enables simpler and faster data access when SAP Datasphere is the backend data source for SAC.
You can build stories in SAC connected to SAP Datasphere in the same way as described in previous examples linking to SAP HANA Cloud. The process of building visualizations, filters, stories, and explanatory text follows the same methodology whether your data models reside in HANA Cloud or Datasphere.
For this example, I will connect SAP HANA Cloud to Power BI. Once you are in the Power BI Desktop App, click on the “Home” tab in the ribbon, then click on “Get Data”.
You can choose either “Import” or “DirectQuery” to connect to SAP HANA Cloud. Import means you’ll be pulling the data into Power BI, while DirectQuery means you’ll be working directly with the data on the server.
Enter your SAP HANA Cloud credentials when prompted in the Power BI connection window. After a successful connection, the Power BI Navigator will appear allowing you to select the specific tables, views, or calculation views to load. For this analysis, we will be working with the CV_TOP_ALL calculation view that was previously created in SAP Business Application Studio.
It may take a few seconds for the CV_TOP_ALL calculation view to load in Power BI. Once loaded, you will see CV_TOP_ALL listed on the right side of the screen along with options to build visualizations. To demonstrate some useful features in Power BI, I will create a simple table visualization[2] based on the data from CV_TOP_ALL.
Once you select the table option, a blank table visualization will load. You can then select the specific columns from the CV_TOP_ALL calculation view that you want to display in the table. For this example, I chose to include the Album name, Track name, and Image columns. The Image column contains public URLs pointing to album artwork associated with each track.
Select the Image column[1] from CV_TOP_ALL, and change the data category[2] to “Image URL”.
Once you add those columns to the table, Power BI works its magic to transform the image URLs into actual album cover art on the fly. How awesome is that! With its slick auto-image rendering wizardry, Power BI saves us muggles from having to manually extract and embed images in visualizations. We just provide the URLs, and presto – album covers appear in the table as if by divination! 🙂
If you have been following this Spotify blog series, you may recall that we used the HANA_ML library to extract image URLs from Spotify and ingest them into SAP HANA Cloud. We loaded the JSON metadata containing these image links into the SAP HANA Cloud Document Store. SQL views were then created to select the image URLs from the JSON artifacts stored in the document store. These SQL views were incorporated into the CV_TOP_ALL calculation view, which combines datasets from various sources. This calculation view is then accessed in both SAP Analytics Cloud for visualization and Power BI to demonstrate auto-image rendering, as we have explored.
In this Power BI report, I used the same CV_TOP_ALL calculation view that was consumed in SAP Analytics Cloud and generated two visualizations: a table with album cover images and an image grid.
Power BI’s integration with custom visual apps allows for a more flexible analysis of the CV_TOP_ALL calculation view, including visualizing based on images. You can access these additional visualizations in Power BI by selecting “Get more visuals” from the Visualizations pane.
Selecting “Get more visuals” will open the Power BI visuals gallery. In the gallery, search for the “image grid” custom visual. When you find the image grid visual, add it to your report by clicking the “Add” button.
Once added, you will see the visual as part of Power BI Desktop.
When you select the image grid visual and choose the Image column from the CV_TOP_ALL calculation view, Power BI will automatically populate an image grid displaying all 500 album cover images that were extracted from the Spotify data. By default, the images are rendered in the grid sorted by the predefined order in the view. A key capability offered by the image grid is the ability to visualize a collection of images and rapidly sort them by different attributes to spot visual patterns or trends. For example, with a few clicks, you can rearrange the grid sorted alphabetically by artist, genre, release date, etc. Looking at the images sorted in different ways allows you to analyze the data in new visual perspectives that may yield additional insights. Pretty cool, eh?
Additional filters can be applied to the image grid visual to narrow down the list of tracks being analyzed. For example, the grid can be filtered to show only images from the “Top Indian tracks” playlist that was ingested from Spotify. Furthermore, the Speechiness audio feature can be added as a filter criterion to display only tracks above or below a certain speechiness threshold. Applying these filters slices the 500-track image grid down to the subset of images matching the given criteria, which enables more focused visual analysis. In this case, filtering to Indian tracks with high speechiness shows the album images associated with that list of speech-heavy tracks.
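The same two-step filter (one playlist, then a speechiness threshold) can be expressed in a few lines of base R, which may help when checking what the Power BI visual should return. The rows below, the example.com URLs, and the column names TRACKNAME, SPEECHINESS, and IMAGE are assumptions for illustration; the actual TRACKLISTNAME value for the Indian playlist may also differ from the label used here.

```r
# Hypothetical sample rows; real data would come from the CV_TOP_ALL calculation view.
tracks <- data.frame(
  TRACKNAME     = c("Track A", "Track B", "Track C", "Track D"),
  TRACKLISTNAME = c("Top Indian tracks", "Top Indian tracks", "TOPUSA", "Top Indian tracks"),
  SPEECHINESS   = c(0.41, 0.06, 0.38, 0.35),
  IMAGE         = c("https://example.com/a.jpg", "https://example.com/b.jpg",
                    "https://example.com/c.jpg", "https://example.com/d.jpg"),
  stringsAsFactors = FALSE
)

# Keep only tracks from the chosen playlist whose speechiness exceeds the 0.33 threshold.
verbal_indian <- subset(
  tracks,
  TRACKLISTNAME == "Top Indian tracks" & SPEECHINESS > 0.33
)

print(verbal_indian[, c("TRACKNAME", "SPEECHINESS", "IMAGE")])
```

The IMAGE values of the remaining rows are the album covers the filtered image grid would display.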
I hope this Spotify series has sparked interest and ideas for adopting SAP HANA Cloud and integrating it into your own analytics use cases. My sincere thanks to the colleagues and community members who reached out to share the visualization blog—your feedback motivated me to complete it before the end of the year 🙂
Looking ahead to 2024, we plan to continue creating enablement blogs on SAP HANA Cloud, either expanding this Spotify series or developing a new one that covers multi-model analysis and extends it to the new SAP HANA Cloud Vector Engine as well as SAP’s Generative AI Hub. There are so many emerging capabilities to explore!
Please stay tuned for more to come, and feel free to reach out with any additional questions or feedback on the topics covered in this Spotify series. Looking forward to hearing from you, and happy learning on your own data analytics discovery journeys in the New Year!