How Will Web Scraping Google Play App Reviews Create a Dataset for Sentiment Analysis?
11 November 2021
A guide to using Python to scrape Android app reviews and turn the data into a dataset for sentiment analysis.
Let's look at how to scrape reviews and ratings for Android apps to build a dataset for sentiment analysis. You'll convert the app and review data into DataFrames and then save them to CSV files.
The code is written to run in a notebook such as Google Colab. Start by installing the required packages and setting up the imports:
# In a notebook, install the scraper first: !pip install google-play-scraper
import json

import pandas as pd
from tqdm import tqdm

import seaborn as sns
import matplotlib.pyplot as plt

from pygments import highlight
from pygments.lexers import JsonLexer
from pygments.formatters import TerminalFormatter

from google_play_scraper import Sort, reviews, app

%matplotlib inline
%config InlineBackend.figure_format='retina'

sns.set(style='whitegrid', palette='muted', font_scale=1.2)
You want feedback on your app, and both positive and negative reviews are valuable. Negative reviews in particular may reveal critical features that are missing or, when complaints become much more frequent, service disruptions.
Fortunately, Google Play offers a diverse selection of apps, ratings, and reviews. We can scrape app metadata and reviews using the google-play-scraper package.
There are many apps you could analyse. However, different app categories have different target audiences, domain-specific characteristics, and so on. Let's start with the fundamentals.
We need apps that have been around for a long time, so that feedback has accumulated naturally. We also want to keep the influence of advertising on that feedback to a minimum. Because apps are updated regularly, the date of each review matters too.
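The "around for a long time" criterion can be checked mechanically once app metadata is available. The sketch below assumes the release date comes as a string of the form shown in the sample metadata later in this guide (for example "Nov 10, 2011"). The helper name `is_established` and the five-year threshold are hypothetical choices for illustration, not part of the scraper.

```python
from datetime import date, datetime


def is_established(released, as_of, min_years=5):
    """Return True if an app was released at least `min_years` before `as_of`.

    `released` is assumed to look like "Nov 10, 2011". Parsing "%b" relies on
    English month abbreviations, i.e. the default C/English locale.
    """
    released_on = datetime.strptime(released, "%b %d, %Y").date()
    age_years = (as_of - released_on).days / 365.25
    return age_years >= min_years


# Any.do's release date from the sample metadata, checked against the article date
print(is_established("Nov 10, 2011", as_of=date(2021, 11, 11)))
```

Passing the reference date explicitly keeps the check reproducible; in a notebook you might pass `date.today()` instead.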
In a perfect world, you'd collect every possible review and use it to your advantage. In the real world, however, data is often limited (too large, inaccessible, etc.), so we'll do the best we can.
Let's choose a few apps from the Productivity category that fit these criteria. We'll use AppAnnie to select some of the most popular apps in the US:
app_packages = [
    'com.anydo',
    'com.todoist',
    'com.ticktick.task',
    'com.habitrpg.android.habitica',
    'cc.forestapp',
    'com.oristats.habitbull',
    'com.levor.liferpgtasks',
    'com.habitnow',
    'com.microsoft.todos',
    'prox.lab.calclock',
    'com.gmail.jmartindev.timetune',
    'com.artfulagenda.app',
    'com.tasks.android',
    'com.appgenix.bizcal',
    'com.appxy.planner'
]
Scraping the information for every application
app_infos = []

for ap in tqdm(app_packages):
    info = app(ap, lang='en', country='us')
    # Drop the bundled sample comments; pop() avoids a KeyError if the key is absent
    info.pop('comments', None)
    app_infos.append(info)

We now have information for each of the 15 apps. Let's write a helper function to make printing JSON objects easier:

def print_json(json_object):
    json_str = json.dumps(
        json_object,
        indent=2,
        sort_keys=True,
        default=str
    )
    print(highlight(json_str, JsonLexer(), TerminalFormatter()))

Here's an example of app data from the list:

print_json(app_infos[0])

{
  "adSupported": null,
  "androidVersion": "Varies",
  "androidVersionText": "Varies with device",
  "appId": "com.anydo",
  "containsAds": null,
  "contentRating": "Everyone",
  "contentRatingDescription": null,
  "currency": "USD",
  "description": "\ud83c\udfc6 Editor's Choice by Google\r\n\r\nAny.do is a To Do List, Calendar, Planner, Tasks & Reminders App That Helps Over 25M People Stay Organized and Get More Done.\r\n\r\n\ud83e\udd47 \"It\u2019s A MUST HAVE PLANNER & TO DO LIST APP\" (NYTimes, USA TODAY, WSJ & Lifehacker).\r\n\r\nAny.do is a free to-do list, planner & calendar app for managing and organizing your daily tasks, to-do lists, notes, reminders, checklists, calendar events, grocery lists and more.\r\n\r\n\ud83d\udcc5 Organize Your Tasks & To-Do List in Seconds\r\n\r\n\u2022 ADVANCED CALENDAR & DAILY PLANNER - Keep your to-do list and calendar events always at hand with our calendar widget. Any.do to-do list & planner support daily calendar view, 3-day Calendar view, Weekly calendar view & agenda view, with built-in reminders. Review and organize your calendar events and to do list side by side.\r\n\r\n\u2022 SYNCS SEAMLESSLY - Keeps all your to do list, tasks, reminders, notes, calendar & agenda always in sync so you\u2019ll never forget a thing. 
Sync your phone\u2019s calendar, google calendar, Facebook events, outlook calendar or any other calendar so you don\u2019t forget an important event.\r\n\r\n\u2022 SET REMINDERS - One time reminders, recurring reminders, Location reminders & voice reminders. NEW! Easily create tasks and get reminders in WhatsApp.\r\n\r\n\u2022 WORK TOGETHER - Share your to do list and assign tasks with your friends, family & colleagues from your task list to collaborate and get more done. \r\n\r\n---\r\n\r\nALL-IN-ONE PLANNER & CALENDAR APP FOR GETTING THINGS DONE\r\nCreate and set reminders with voice to your to do list. \r\nFor better task management flow we added a calendar integration to keep your agenda always up to date. \r\nFor better productivity, we added recurring reminders, location reminders, one-time reminder, sub-tasks, notes & file attachments. \r\nTo keep your to do list up to date, we\u2019ve added a daily planner and focus mode.\r\n\r\nINTEGRATIONS\r\nAny.do To do list, Calendar, planner & Reminders Integrates with Google Calendar, Outlook, WhatsApp, Slack, Gmail, Google Tasks, Evernote, Trello, Wunderlist, Todoist, Zapier, Asana, Microsoft to-do, Salesforce, OneNote, Google Assistant, Amazon Alexa, Office 365, Exchange, Jira & More.\r\n\r\nTO DO LIST, CALENDAR, PLANNER & REMINDERS MADE SIMPLE\r\nDesigned to keep you on top of your to do list, tasks and calendar events with no hassle. With intuitive drag and drop of tasks, swiping to mark to-do's as complete, and shaking your device to remove completed from your to do list - you can stay organized and enjoy every minute of it.\r\n\r\nPOWERFUL TO DO LIST TASK MANAGEMENT\r\nAdd a to do list item straight from your email / Gmail / Outlook inbox by forwarding do@Any.do. 
Attach files from your computer, Dropbox, or Google Drive to your to- tasks.\r\n\r\nDAILY PLANNER & LIFE ORGANIZER\r\nAny.do is a to do list, a calendar, an inbox, a notepad, a checklist, task list, a board for post its or sticky notes, a task & project management tool, a reminder app, a daily planner, a family organizer, an agenda, a bill planner and overall the simplest productivity tool you will ever have. \r\n\r\nSHARE LISTS, ASSIGN & ORGANIZE TASKS\r\nTo plan & organize projects has never been easier. Now you can share lists between family members, assign tasks to each other, chat and much more. Any.do will help you and the people around you stay in-sync and get reminders so that you can focus on what matters, knowing you had a productive day and crossed off your to do list.\r\n\r\nGROCERY LIST & SHOPPING LIST\r\nAny.do task list, calendar, agenda, reminders & planner is also great for shopping lists at the grocery store. Simply create a list on Any.do, share it with your loved ones and see them adding their shopping items in real-time.", "descriptionHTML": "\ud83c\udfc6 Editor's Choice by Google
Any.do is a To Do List, Calendar, Planner, Tasks & Reminders App That Helps Over 25M People Stay Organized and Get More Done.
\ud83e\udd47 "It\u2019s A MUST HAVE PLANNER & TO DO LIST APP" (NYTimes, USA TODAY, WSJ & Lifehacker).
Any.do is a free to-do list, planner & calendar app for managing and organizing your daily tasks, to-do lists, notes, reminders, checklists, calendar events, grocery lists and more.
\ud83d\udcc5 Organize Your Tasks & To-Do List in Seconds
\u2022 ADVANCED CALENDAR & DAILY PLANNER - Keep your to-do list and calendar events always at hand with our calendar widget. Any.do to-do list & planner support daily calendar view, 3-day Calendar view, Weekly calendar view & agenda view, with built-in reminders. Review and organize your calendar events and to do list side by side.
\u2022 SYNCS SEAMLESSLY - Keeps all your to do list, tasks, reminders, notes, calendar & agenda always in sync so you\u2019ll never forget a thing. Sync your phone\u2019s calendar, google calendar, Facebook events, outlook calendar or any other calendar so you don\u2019t forget an important event.
\u2022 SET REMINDERS - One time reminders, recurring reminders, Location reminders & voice reminders. NEW! Easily create tasks and get reminders in WhatsApp.
\u2022 WORK TOGETHER - Share your to do list and assign tasks with your friends, family & colleagues from your task list to collaborate and get more done.
---
ALL-IN-ONE PLANNER & CALENDAR APP FOR GETTING THINGS DONE
Create and set reminders with voice to your to do list.
For better task management flow we added a calendar integration to keep your agenda always up to date.
For better productivity, we added recurring reminders, location reminders, one-time reminder, sub-tasks, notes & file attachments.
To keep your to do list up to date, we\u2019ve added a daily planner and focus mode.
INTEGRATIONS
Any.do To do list, Calendar, planner & Reminders Integrates with Google Calendar, Outlook, WhatsApp, Slack, Gmail, Google Tasks, Evernote, Trello, Wunderlist, Todoist, Zapier, Asana, Microsoft to-do, Salesforce, OneNote, Google Assistant, Amazon Alexa, Office 365, Exchange, Jira & More.
TO DO LIST, CALENDAR, PLANNER & REMINDERS MADE SIMPLE
Designed to keep you on top of your to do list, tasks and calendar events with no hassle. With intuitive drag and drop of tasks, swiping to mark to-do's as complete, and shaking your device to remove completed from your to do list - you can stay organized and enjoy every minute of it.
POWERFUL TO DO LIST TASK MANAGEMENT
Add a to do list item straight from your email / Gmail / Outlook inbox by forwarding do@Any.do. Attach files from your computer, Dropbox, or Google Drive to your to- tasks.
DAILY PLANNER & LIFE ORGANIZER
Any.do is a to do list, a calendar, an inbox, a notepad, a checklist, task list, a board for post its or sticky notes, a task & project management tool, a reminder app, a daily planner, a family organizer, an agenda, a bill planner and overall the simplest productivity tool you will ever have.
SHARE LISTS, ASSIGN & ORGANIZE TASKS
To plan & organize projects has never been easier. Now you can share lists between family members, assign tasks to each other, chat and much more. Any.do will help you and the people around you stay in-sync and get reminders so that you can focus on what matters, knowing you had a productive day and crossed off your to do list.
GROCERY LIST & SHOPPING LIST
Any.do task list, calendar, agenda, reminders & planner is also great for shopping lists at the grocery store. Simply create a list on Any.do, share it with your loved ones and see them adding their shopping items in real-time.",
  "developer": "Any.do Calendar & To-Do List",
  "developerAddress": "Any.do Inc.\n\n6 Agripas Street, Tel Aviv\n6249106 ISRAEL",
  "developerEmail": "feedback+androidtodo@any.do",
  "developerId": "5304780265295461149",
  "developerInternalID": "5304780265295461149",
  "developerWebsite": "https://www.any.do",
  "free": true,
  "genre": "Productivity",
  "genreId": "PRODUCTIVITY",
  "headerImage": "https://lh3.googleusercontent.com/dZknnlk1LM8fYS3wjOvVHOmWKOGH1HAe691Yuh7LAeBj6a730A1CQqZnXxjNahAYUFFw",
  "histogram": [27291, 9246, 13735, 29904, 262997],
  "icon": "https://lh3.googleusercontent.com/zgOLUXCHkF91H8xuMTMLT17smwgLPwSBjUlKVWF-cZRFjlv-Uvtman7DiHEii54fbEE",
  "installs": "10,000,000+",
  "minInstalls": 10000000,
  "offersIAP": true,
  "price": 0,
  "privacyPolicy": "https://www.any.do/privacy",
  "ratings": 343174,
  "recentChanges": "Faster and smoother for better user experience!",
  "recentChangesHTML": "Faster and smoother for better user experience!",
  "released": "Nov 10, 2011",
  "reviews": 122170,
  "score": 4.43388,
  "screenshots": [
    "https://lh3.googleusercontent.com/C-L3_FPMlKVrZItAORaszhnQzlzMyXcqF_-oGaabHm_OnwUW1jz02BXBVSKi0HRUtQ",
    "https://lh3.googleusercontent.com/uAP6G5ANQcgVs4Uj6yrcsAo4OUhejTJRVCXOxnAVA5Efit_OtAnrOYyL1SUHj1rv",
    "https://lh3.googleusercontent.com/AI5mLFu0Atsl0km2FO9_IwJXNy_1q1_X6Ua3EVMZNedp0dsDToDRaWQ1UDvI6mb1-I0",
    "https://lh3.googleusercontent.com/bYCAn3mjgB4ugSY0PL-PCcMBfbvXCSFkzL-pLSIIbZ8sQByQPerHboPQ2fA126K4LDtU",
    "https://lh3.googleusercontent.com/u-dX4lpTepsvXs33ds4xxYpApuGS4JBAEb0UsvY_fPbptxnF0QxaKNW0-tJVXaP8a1E",
    "https://lh3.googleusercontent.com/qvUz_9IXHQd6FSLUALZo8NKLx-s4uDGyElPOGRsU28TCEficQc0BoNRloRRLqUkH2A",
    "https://lh3.googleusercontent.com/tEyGs6MGlY97ccLc4c_HxV9xNOpsvwQyHz6uGAezkVtxm1ydAaTj5EZSUgqlg69qrrk",
    "https://lh3.googleusercontent.com/StN0i2BskOs6HCfaPO0DMBOCQMCag3okWVI_SlFJtMytwbgNMBnD5i9hbSqdNlGxffmn",
    "https://lh3.googleusercontent.com/GRKqWfo-PLzCKwpgZ8fej4PGsUp1q9eM5a3LQeiYCOW-KUpCOIHXOp3mteZWbJ-pz4My",
    "https://lh3.googleusercontent.com/pFQQ_qi8u92duWCNXpEcNKpH2lVpD_hFd5f-UlTP_f6wft3YyYLMzwLitxt-UI6G8vs",
    "https://lh3.googleusercontent.com/AoeCU6bT1x0eHRvJwvQyOSKJ31oSayox959qMNVaSzz3uN9bvk1cGek5zyRDe1BdtA",
    "https://lh3.googleusercontent.com/vICme1f4J9vFt8wY3xBY-LshGgYyvSbsa4TLJyEtNsy0alUI0i9oMQVq8oJ4l_yR1Aw",
    "https://lh3.googleusercontent.com/7sn9m__iVM-peiG6_jkKBuE-QVH_xDaycF_oR1XJlwcAC45ybNZ_Exor09ENOJ41Q2U",
    "https://lh3.googleusercontent.com/9I_m2ZXgPtiU4Po4cw_cyIaEpZxynxQ1n3YkhFgakATfbu63a8_f8vGQDxKOHYITzew"
  ],
  "size": "Varies with device",
  "summary": "Task Manager \u2705 Organizer \ud83d\udcc5 Agenda \ud83d\udcdd Daily Reminders \ud83d\udd14 All-in-One Simple App.",
  "summaryHTML": "Task Manager \u2705 Organizer \ud83d\udcc5 Agenda \ud83d\udcdd Daily Reminders \ud83d\udd14 All-in-One Simple App.",
  "title": "Any.do: To do list, Calendar, Planner & Reminders",
  "updated": 1586258773,
  "url": "https://play.google.com/store/apps/details?id=com.anydo&hl=en&gl=us",
  "version": "Varies with device",
  "video": "https://www.youtube.com/embed/2nkllLD0x6o?ps=play&vq=large&rel=0&autohide=1&showinfo=0",
  "videoImage": "https://i.ytimg.com/vi/2nkllLD0x6o/hqdefault.jpg"
}
This gives us a lot of information, including the number of ratings, the number of reviews, and the number of ratings for each score (1 to 5). Let's set all of that aside for now and look at the apps' icons:
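import urllib.request

from PIL import Image


def format_title(title):
    # Keep the part before the first ':' (or '-' if there is no ':'), capped at 10 characters
    sep_index = title.find(':') if title.find(':') != -1 else title.find('-')
    if sep_index != -1:
        title = title[:sep_index]
    return title[:10]

# A 2 x 7 grid holds 14 icons, so with 15 apps the last one is not drawn
fig, axs = plt.subplots(2, len(app_infos) // 2, figsize=(14, 5))

for i, ax in enumerate(axs.flat):
    ai = app_infos[i]
    # Newer matplotlib releases no longer accept a URL in plt.imread,
    # so fetch the icon and open it with Pillow instead
    img = Image.open(urllib.request.urlopen(ai['icon']))
    ax.imshow(img)
    ax.set_title(format_title(ai['title']))
    ax.axis('off')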
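Returning to the metadata for a moment: the histogram field lists the number of ratings for each score, which reads most naturally as scores 1 to 5 in order. A minimal sketch, using the Any.do values from the sample output, shows how it relates to the reported score. The helper names `mean_score` and `score_shares` are illustrative.

```python
def mean_score(histogram):
    """Average rating implied by a histogram of counts for scores 1 to 5."""
    total = sum(histogram)
    weighted = sum(score * count for score, count in enumerate(histogram, start=1))
    return weighted / total


def score_shares(histogram):
    """Fraction of all ratings that fall on each score, keyed by score."""
    total = sum(histogram)
    return {score: count / total for score, count in enumerate(histogram, start=1)}


# Values copied from the Any.do sample output
histogram = [27291, 9246, 13735, 29904, 262997]

print(round(mean_score(histogram), 5))
print({score: round(share, 3) for score, share in score_shares(histogram).items()})
```

The weighted mean agrees with the score of 4.43388 in the sample output to several decimal places, which supports reading the list as counts for 1 to 5 stars. The shares also show how skewed the raw ratings are towards 5 stars.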
We can save the app information for later by converting the JSON objects into a Pandas data frame and saving the output to a CSV file:
app_infos_df = pd.DataFrame(app_infos)
app_infos_df.to_csv('apps.csv', index=None, header=True)
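One thing to watch when reloading apps.csv: fields such as histogram and screenshots are lists, and CSV has no list type, so they are written out in their string form. The sketch below illustrates the round trip with the standard-library `csv` module rather than pandas, on the assumption that pandas stringifies such values in the same way. `ast.literal_eval` then turns the text back into a list.

```python
import ast
import csv
import io

# A trimmed-down record with one list-valued field, taken from the sample output
record = {"appId": "com.anydo", "histogram": [27291, 9246, 13735, 29904, 262997]}

# Write the record to an in-memory CSV file
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)

# Read it back: every value is now a string
buffer.seek(0)
row = next(csv.DictReader(buffer))
print(type(row["histogram"]).__name__)

# Parse the string form back into a Python list
restored = ast.literal_eval(row["histogram"])
print(restored == record["histogram"])
```

`ast.literal_eval` only accepts Python literals, which makes it a safer choice than `eval` for text read from a file.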
When scraping the reviews, we're looking for two things: a dataset that is balanced across review scores, and a representative sample of reviews for each app.
We can meet the first criterion by using the scraping package's option to filter by review score. For the second, we'll sort the reviews by helpfulness, which surfaces the ones Google Play considers most important. Just in case, we'll also fetch a subset of the newest reviews:
app_reviews = []

for ap in tqdm(app_packages):
    for score in range(1, 6):
        for sort_order in [Sort.MOST_RELEVANT, Sort.NEWEST]:
            rvs, _ = reviews(
                ap,
                lang='en',
                country='us',
                sort=sort_order,
                count=200 if score == 3 else 100,
                filter_score_with=score
            )
            for r in rvs:
                r['sortOrder'] = 'most_relevant' if sort_order == Sort.MOST_RELEVANT else 'newest'
                r['appId'] = ap
            app_reviews.extend(rvs)
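To see how many reviews the loop above can return at most, it helps to tabulate the requests. The following sketch only does the arithmetic and makes no network calls. The sort orders are represented by plain strings rather than the library's `Sort` values, and `requested_count` is a hypothetical helper that mirrors the `count` expression.

```python
from itertools import product

N_APPS = 15
SCORES = range(1, 6)
SORT_ORDERS = ["most_relevant", "newest"]


def requested_count(score):
    """Number of reviews requested per call, mirroring the scraping loop."""
    return 200 if score == 3 else 100


# Upper bound per app and score, summed over both sort orders
per_score = {score: 0 for score in SCORES}
for score, _sort_order in product(SCORES, SORT_ORDERS):
    per_score[score] += requested_count(score)

per_app = sum(per_score.values())
print(per_score)
print(per_app, N_APPS * per_app)
```

The loop therefore asks for at most 1,200 reviews per app and 18,000 overall. The scraper can return fewer than requested, presumably when an app has fewer reviews for a given score, so the final total may fall below this bound.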
We added the app's id and the sort order to each review. Here's an example:
print_json(app_reviews[0])

{
  "appId": "com.anydo",
  "at": "2020-04-05 22:25:57",
  "content": "Update: After getting a response from the developer I would change my rating to 0 stars if possible. These guys hide behind confusing and opaque terms and refuse to budge at all. I'm so annoyed that my money has been lost to them! Really terrible customer experience. Original: Be very careful when signing up for a free trial of this app. If you happen to go over they automatically charge you for a full years subscription and refuse to refund. Terrible customer experience and the app is just OK.",
  "repliedAt": "2020-04-07 14:09:03",
  "replyContent": "Our policy and TOS are completely transparent and can be found in the Help Center and our main page. In addition, payment can only be made upon the user's authorization via the app and Google Play. We provide users with a full 7 days trial to test the app with an additional 48 hours for a refund, along with priority support for all issues.",
  "reviewCreatedVersion": "4.17.0.3",
  "score": 1,
  "sortOrder": "most_relevant",
  "thumbsUpCount": 37,
  "userImage": "https://lh3.googleusercontent.com/a-/AOh14GiHdfNEu1DwwcJ6yNyju8Yvn4JwjpzuXvD74aVmDA",
  "userName": "Andrew Thomas"
}
repliedAt and replyContent hold the developer's reply to the review. They can be missing when the developer has not replied.
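Because the reply fields can be empty, downstream code should not assume they hold text. Here is a small sketch, with made-up sample records and a hypothetical helper `has_reply`, that counts how many reviews received a developer reply:

```python
def has_reply(review):
    """True if the review dict carries a non-empty developer reply."""
    return bool(review.get("replyContent"))


# Made-up records shaped like the scraped reviews
sample_reviews = [
    {"userName": "A", "score": 1, "replyContent": "Thanks for the feedback.",
     "repliedAt": "2020-04-07 14:09:03"},
    {"userName": "B", "score": 5, "replyContent": None, "repliedAt": None},
    {"userName": "C", "score": 3},  # reply keys absent altogether
]

replied = [r for r in sample_reviews if has_reply(r)]
print(f"{len(replied)} of {len(sample_reviews)} reviews have a developer reply")
```

Using `dict.get` covers both cases at once: a key that is present but empty and a key that is absent.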
Let's check how many reviews we collected:

len(app_reviews)

15750

Let's save the reviews to a CSV file:

app_reviews_df = pd.DataFrame(app_reviews)
app_reviews_df.to_csv('reviews.csv', index=None, header=True)
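Since every app and score is fetched under two sort orders, the same review can in principle show up twice, once as most relevant and once as newest. The sketch below removes such repeats. The matching key (app, author, timestamp, and text) is an assumption made for illustration, and the sample records are made up.

```python
def deduplicate(review_list):
    """Keep the first occurrence of each review, preserving order.

    The key below is an assumption: two records count as the same review
    when app, author, timestamp and text all match.
    """
    seen = set()
    unique = []
    for review in review_list:
        key = (
            review.get("appId"),
            review.get("userName"),
            str(review.get("at")),
            review.get("content"),
        )
        if key not in seen:
            seen.add(key)
            unique.append(review)
    return unique


# Made-up records: the first two differ only in the sort order that returned them
sample = [
    {"appId": "com.anydo", "userName": "A", "at": "2020-04-05 22:25:57",
     "content": "Great", "sortOrder": "most_relevant"},
    {"appId": "com.anydo", "userName": "A", "at": "2020-04-05 22:25:57",
     "content": "Great", "sortOrder": "newest"},
    {"appId": "com.todoist", "userName": "B", "at": "2020-04-06 10:00:00",
     "content": "Okay", "sortOrder": "newest"},
]

unique_reviews = deduplicate(sample)
print(len(sample), "->", len(unique_reviews))
```

Whether to drop such repeats depends on the use case. For training a sentiment model, duplicated rows would give some reviews extra weight.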
We now have over 15K user reviews from 15 different productivity apps.
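For sentiment analysis, the star ratings usually need to be collapsed into classes. One common convention, assumed here and not stated in the text, treats 1 and 2 stars as negative, 3 as neutral, and 4 and 5 as positive. Under that reading, requesting twice as many 3-star reviews would give the three classes similar sizes. The function name `to_sentiment` is illustrative.

```python
from collections import Counter


def to_sentiment(score):
    """Map a 1-5 star score to a label (assumed convention, not from the article)."""
    if score <= 2:
        return "negative"
    if score == 3:
        return "neutral"
    return "positive"


# Made-up scores standing in for the score column of reviews.csv
scores = [1, 2, 3, 3, 4, 5, 5, 1]
labels = [to_sentiment(s) for s in scores]
print(Counter(labels))
```

On the real data, the same function could be applied to the score column of the reviews DataFrame to check the class balance before training.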
Next, we'll use BERT to analyse the sentiment of the reviews.
Are you looking for an app store reviews API? Request a quotation from ReviewGators today.