AI Challenge | Web Scraping with Generative AI
2023-10-23 18:09:06 Author: blogs.sap.com

I have read the blog State of GenAI in the SAP Community 09/2023, which is a great analysis of blogs about the applicability of GenAI. I had this conversation, which you can find in the comments:

Sergiu: How did you collect statistics for Overview SAP Community Insights from 82 blogs?
Peter: I would love to say I used a sophisticated method like accessing an API or at least did web scraping. But I simply invested one evening to search for the right keywords and another one to manually collect the numbers (likes/views).
Sergiu: I thought you succeeded in giving a set of prompt instructions to ChatGPT Plus to access the web and return the results in CSV or JSON. One day we will get that without coding.

I decided to develop an online app, SAP Blog Statistics, to collect the statistics, and to launch a community challenge: achieve the same results with Generative AI, without coding and without web scraping apps.

Large language models (LLMs) are an impressive piece of technology. Generative AI applications like ChatOpenAI and MONICA are great copilots for coding. ChatGPT Plus and MONICA Pro have a web access option that is great at summarizing; however, they can’t perform specific tasks like web scraping of statistics.

With the prompt instruction “Collect the number of views,” I received the answer, “As an AI language model, I don’t have access to the exact number of views mentioned in the blog post.”

Have you engineered a prompt to extract statistics of SAP blogs?

Is it possible?

Share the solution in the comments.

For now, you can collect statistics with Python code. 🐍 😊
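As a rough illustration of what collecting statistics with Python code can look like, here is a minimal sketch that pulls counters out of a blog page's HTML using only the standard library. It is not the code behind SAP Blog Statistics. The class names `views-count` and `likes-count`, the helper `extract_stats`, and the sample HTML are all hypothetical placeholders; the real SAP Community markup would need to be inspected and the names adjusted.

```python
from html.parser import HTMLParser


class BlogStatsParser(HTMLParser):
    """Collects integer counters from elements with a known CSS class.

    The class names in TARGETS are hypothetical placeholders, not the
    real SAP Community markup.
    """

    TARGETS = {"views-count": "views", "likes-count": "likes"}

    def __init__(self):
        super().__init__()
        self.stats = {}
        self._current = None  # key of the counter being read, if any

    def handle_starttag(self, tag, attrs):
        # An element may carry several classes; check each one.
        classes = (dict(attrs).get("class") or "").split()
        for name in classes:
            if name in self.TARGETS:
                self._current = self.TARGETS[name]

    def handle_data(self, data):
        if self._current is None:
            return
        # Keep digits only, so "1,234 Views" becomes 1234.
        digits = "".join(ch for ch in data if ch.isdigit())
        if digits:
            self.stats[self._current] = int(digits)
            self._current = None

    def handle_endtag(self, tag):
        # Stop looking for a number once an element closes.
        self._current = None


def extract_stats(html):
    """Return a dict such as {"views": ..., "likes": ...} for one page."""
    parser = BlogStatsParser()
    parser.feed(html)
    parser.close()
    return parser.stats


# Assumed sample markup, for illustration only.
sample = (
    '<div><span class="views-count">1,234 Views</span>'
    '<span class="likes-count">17</span></div>'
)
print(extract_stats(sample))
```

In a real run, the HTML string would come from fetching each blog URL (for example with `urllib.request.urlopen`), which is left out here so the sketch runs without network access.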

The online app SAP Blog Statistics has three options:

On GitHub, you can find the Jupyter Notebook for a playground.

Select the list of blogs or upload a list of blogs and press the Extract button. You can download the results in CSV or JSON format.

Screenshots:

Select an option.

Press Extract and wait for results. URLs are clickable, and columns are sortable.
Press Download CSV or Download JSON to save Blog Statistics to a local file.
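The CSV and JSON downloads can be approximated with Python's standard `csv` and `json` modules. The sketch below is only an assumption about how such an export might work, not the app's actual implementation. The column names (`url`, `views`, `likes`), the helper names `to_csv` and `to_json`, and the example rows are made up for illustration.

```python
import csv
import io
import json


def to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize a list of dicts to pretty-printed JSON text."""
    return json.dumps(rows, indent=2, ensure_ascii=False)


# Illustrative rows only; the URLs and numbers are invented.
rows = [
    {"url": "https://example.com/blog-a", "views": 1234, "likes": 17},
    {"url": "https://example.com/blog-b", "views": 56, "likes": 3},
]

csv_text = to_csv(rows, ["url", "views", "likes"])
json_text = to_json(rows)
print(csv_text)
print(json_text)
```

Writing `csv_text` or `json_text` to a file with `open(path, "w", encoding="utf-8")` would give the equivalent of saving the statistics locally.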

The code is contained in the class SapBlogStatistics so you can integrate it into other apps.

Humans designed computer formats and languages to store, process, and transmit the information we perceive in a natural way. Somehow we now think that we can rely on LLMs to accomplish any past or future computing task through prompts made of text or images, so that we never again have to type thousands of lines of code, or test and debug hundreds of times. How will computer formats and languages evolve without further human contribution?

What would you prefer for web scraping in terms of efficiency: LLM or apps and code?

Enjoy! 🎈🎈🎈

SAP HANA Cloud Machine Learning Challenge “I quit!” – Understanding metrics

Could machine learning build a model for prime numbers?

“Hello, world!” your crafted chat GPT bot!

SAP Machine Learning Embedding in OpenAI

Building Trust in AI: YouTube Transcript OpenAI Assistant

AI Challenge | Web scraping with Generative AI


Article source: https://blogs.sap.com/2023/10/23/ai-challenge-web-scraping-with-generative-ai/