Scraping apps and automatically transferring the data by mail: a Python learning journey

How the ‘Ommetje-app’ taught me my first steps in automating certain tasks

justinkraaijenbrink
Analytics Vidhya


Photo by Shane Aldendorff on Unsplash

Imagine you want to kick off 2022 with a great start, so you decide to really stick to your healthy intentions this year. You install an activity tracker on your phone and each week you want an update in your mailbox with an overview of the progress you made. Or imagine another situation, where your country has an amazing neuropsychology professor who has developed an application that encourages people to walk more often, and you want to track your walking activity.

This might sound like an artificially fabricated scenario, but in The Netherlands we actually have such a professor, Erik Scherder, and he is definitely worth a Google search! He does a wonderful job of inspiring people to adopt a more active lifestyle, and together with the Dutch Brain Foundation he developed the ‘Ommetje-app’. By using this app, people can collect points for taking a walk of at least 20 minutes (called an ommetje in Dutch) and compete with each other in teams. My colleagues and I take part in such a competition at work, where our team captain must report our scores to the organisation on a weekly basis. Extracting the scores, summing them to a total team score, sending the information by mail… these are all parts that could be assembled into an automated process! Well, this is actually where my learning journey started. Let’s explore!

Alright, first a small disclaimer: before I started this project, I had no previous experience with automating tasks using Python or any other programming language. So don’t expect a high-level-expert-how-to-tutorial! But anyway, I hope to provide you with some useful tips and tricks by taking you along on my learning journey. What I found highly beneficial was to first formulate the problem and divide it into subproblems:

  1. Extract scores from the Ommetje-app;
  2. Format the scores into a text message;
  3. Send the text message by mail;
  4. Schedule the process such that steps 1–3 are executed automatically.

In the following sections we will explore these steps one by one.

1. Extract scores from the Ommetje-app

Before we dive into the steps needed for extracting data from the Ommetje-app, I think it would be nice to first get an impression of the look and feel of the application. Here you see a screenshot of our team scores, all beautifully designed with the nickname, the different scores (XP) and several medals for different achievements.

For our project we are particularly interested in the nickname and corresponding XP score, which are the features we want to extract from the app. To do so, we must intercept the data traffic between the app and the Internet, for which mitmproxy is an excellent tool. Gaurav Sharma has written an amazing tutorial (Android and iOS) on how to do this, but I will summarize the steps here for Apple users:

  1. [Mac] Install mitmproxy in Terminal: brew install mitmproxy
  2. [Mac] Find IP-address of your network: System Preferences → Network → Wi-Fi → Advanced… → TCP/IP → IPv4 Address
  3. [iPhone] Set up your iPhone: Settings → Wi-Fi → [click on blue (i) next to connected network] [scroll down to HTTP-PROXY] → Configure proxy → Manual
  4. [iPhone] Server: IPv4 address from step 2
    Port: 8080
  5. [Mac] Open mitmproxy in Terminal: mitmproxy
  6. [iPhone] Go to mitm.it, select Apple and install the certificate
  7. [iPhone] Settings → General → VPN → install and certify [mitmproxy]
  8. [iPhone] Settings → General → About → Certificate Trust Settings → toggle on the mitmproxy button, and voilà!

Oh yeah, we are now almost good to go and intercept the data we want! Only a few small steps remain. We type mitmproxy in the terminal and open the app on our phone. If all went well in the previous steps, we are now shown all the traffic that the proxy intercepts:

The last line (GET https://ommetje.nu/api/users/highscore) is the one we want to jump into, because that’s where our data is stored. Now, simply click on it and under Response we actually see the nicknames and scores, which is exactly what we are looking for!

To transform the information to a format that Python can work with, we convert it to a Python request:

  1. Press w and replace save.file @focus path with export.clip curl @focus. Hitting [Enter] will copy the curl to your clipboard
  2. Go to a curl converter (e.g., https://curlconverter.com) and convert the copied curl to a Python request
  3. Copy the Python request to your favourite Python IDE
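
The converted request will look different for everyone, because it carries the headers from your own session. As a rough sketch only, and using the standard library's urllib instead of the requests-based code a curl converter usually produces, the call could look something like this. The header values are placeholders, not the app's real ones, and the helper names are made up for illustration.

```python
import urllib.request

# Endpoint seen in the mitmproxy flow list.
HIGHSCORE_URL = "https://ommetje.nu/api/users/highscore"


def build_highscore_request(headers):
    """Build (but do not send) a GET request for the highscore endpoint."""
    return urllib.request.Request(HIGHSCORE_URL, headers=headers, method="GET")


def fetch_highscore_text(headers, timeout=10):
    """Send the request and return the response body as text."""
    request = build_highscore_request(headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Placeholder headers: copy the real ones from your own exported curl command.
example_headers = {
    "Accept": "application/json",
    "User-Agent": "<copied from your curl command>",
}
request = build_highscore_request(example_headers)
print(request.get_method(), request.full_url)
```

Calling fetch_highscore_text(example_headers) would perform the actual request; it is left out here because it only works with the headers from your own intercepted session.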

All steps successfully completed? Well done! A great deal of the work has been finished and we are now ready to do the actual cool stuff, so hold on!^^

2. Format the scores into a text message

Alright, let’s start this section with some good news: the most tedious part of the process has already been covered, so we can switch to some actual Python programming. The first thing we want to do is import the libraries we need, extract the text from the object returned by the request, convert it to a BeautifulSoup() object and load the data as JSON. There are other formats possible as well, but I found this one the easiest to work with, given its dictionary-like structure. Here is the corresponding Python code:
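
A minimal sketch of the loading step, under a few assumptions: the response body is taken to be plain JSON text, so the BeautifulSoup step is skipped, and the field names users, nickname and xp are hypothetical. Check the real structure under Response in mitmproxy before relying on them.

```python
import json

# Hypothetical response body: the real field names in the app's
# response may differ, so inspect them in mitmproxy first.
sample_response_text = """
{"users": [
    {"nickname": "Walker1", "xp": 1200},
    {"nickname": "Walker2", "xp": 950}
]}
"""


def parse_scores(response_text):
    """Turn the raw response text into Python dictionaries and lists."""
    return json.loads(response_text)


data = parse_scores(sample_response_text)
print(type(data).__name__, len(data["users"]))
```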

Now that we have the data in JSON format, we first determine the number of competitors, then create lists that store the nicknames and XP scores, and finally format the entire thing into a beautiful dataframe. We finish by computing the total team score.
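
These steps could be sketched as follows. This version sticks to plain Python lists instead of a dataframe so that it runs without extra installs, and the key names users, nickname and xp are again assumptions about the response, not the app's documented format.

```python
# Hypothetical parsed response; the key names are assumptions.
data = {
    "users": [
        {"nickname": "Walker1", "xp": 1200},
        {"nickname": "Walker2", "xp": 950},
        {"nickname": "Walker3", "xp": 1430},
    ]
}


def summarise_team(data):
    """Return the number of competitors, (nickname, xp) rows and the team total."""
    competitors = data["users"]
    n_competitors = len(competitors)
    nicknames = [person["nickname"] for person in competitors]
    xp_scores = [person["xp"] for person in competitors]
    # Sort the rows so the highest score comes first.
    rows = sorted(zip(nicknames, xp_scores), key=lambda row: row[1], reverse=True)
    total = sum(xp_scores)
    return n_competitors, rows, total


n_competitors, rows, total = summarise_team(data)
print(n_competitors, rows, total)
```

If pandas is installed, the same rows can be turned into a dataframe with pandas.DataFrame(rows, columns=["nickname", "xp"]).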

The only thing that’s left is to create a text message that we can send by email. There are several ways of doing this (mine is probably not the most aesthetically pleasing), but when you use HTML, make sure the final message starts with <body> and closes with </body>. You can also use <br> to start a new line.
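
One possible way to build such a message is sketched below. The wording of the lines and the helper name build_html_message are made up for illustration; only the <body> wrapper and the <br> line breaks come from the description above.

```python
from html import escape


def build_html_message(rows, total):
    """Build an HTML body with one line per competitor and a total."""
    lines = ["Team scores this week:"]
    # Escape nicknames so characters like < or & cannot break the HTML.
    lines += [f"{escape(name)}: {xp} XP" for name, xp in rows]
    lines.append(f"Total team score: {total} XP")
    return "<body>" + "<br>".join(lines) + "</body>"


# Hypothetical scores, just to show the output format.
rows = [("Walker1", 1200), ("Walker2", 950)]
message = build_html_message(rows, 2150)
print(message)
```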

And as a matter of fact, that’s all there is to step 2 of our automation process!

3. Send the text message by mail

You might want to take a small coffee break by now, but my advice would be to hold on just a little longer, because this step really just requires a few copy-paste-insert-actions. Some elucidating remarks for the code below: we start with importing a few essential libraries, create a connection with our email server, then add the email we want to write and finally send the message.
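
A sketch of those steps with Python's standard smtplib and email modules follows. The addresses, the server name smtp.example.com, port 465 and the use of an SSL connection are all assumptions; use the settings of your own mail provider.

```python
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_email(sender, recipient, subject, html_body):
    """Wrap the HTML message in an email with the usual headers."""
    message = MIMEMultipart("alternative")
    message["From"] = sender
    message["To"] = recipient
    message["Subject"] = subject
    message.attach(MIMEText(html_body, "html"))
    return message


def send_email(message, host, port, username, password):
    """Connect to the mail server over SSL, log in and send the message."""
    with smtplib.SMTP_SSL(host, port) as server:
        server.login(username, password)
        server.send_message(message)


# Placeholder addresses and content, for illustration only.
email_message = build_email(
    "captain@example.com",
    "organisation@example.com",
    "Weekly Ommetje team score",
    "<body>Total team score: 2150 XP</body>",
)
# Uncomment and fill in your own server details to actually send:
# send_email(email_message, "smtp.example.com", 465, "captain@example.com", "app-password")
```

Keeping the building and the sending in separate functions makes it possible to inspect the message before anything goes out.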

Now you might have your coffee. ☕️

4. Schedule the process

Alright, if you have made it to this final part you have definitely earned that coffee, so I hope it tasted great! It might even prove quite handy, because we can really use a caffeine shot to get us through the last part of this tutorial: the actual automation step. This shouldn’t be too difficult when using crontab, but I ran into some security issues, so I decided to take an alternative route. You can either stop here and follow, for example, this tutorial from Gavin Wiener, or you can continue reading for a summary of the implementation by Yaniss Illoul. The choice is yours, but it might be interesting to know that, although I will describe the steps for Mac users, the latter is also suitable for Windows users. With that said, let’s see how we can get the scheduling task to work. The first step is to wrap our Python script in an executable file, which can be done quite easily with a few lines in the terminal:

  1. Navigate to the folder where you want to store your executable and type in nano [insert file name]. Make sure you don’t add any file extension to the filename!
  2. Add the following two lines of code to the empty file:
#!/bin/sh
python [insert path_to_Python_file]

Here is what it should look like:

  3. When you have saved the file and you are back in the terminal, simply type in chmod 755 [insert path to newly created file] and hit [Enter].
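
For those who prefer to stay in Python, the three steps above can also be sketched as a small script that writes the launcher file and sets its permissions. The file name send_scores and the script path are hypothetical, and the interpreter command python is taken from the steps above; on some systems it is python3 instead.

```python
import os
import stat
import tempfile


def write_launcher(launcher_path, python_file):
    """Write a two-line shell launcher and make it executable (chmod 755)."""
    with open(launcher_path, "w") as handle:
        handle.write("#!/bin/sh\n")
        handle.write(f'python "{python_file}"\n')
    os.chmod(launcher_path, 0o755)
    return launcher_path


# Demo in a temporary folder; the script path is a placeholder.
demo_dir = tempfile.mkdtemp()
launcher = write_launcher(
    os.path.join(demo_dir, "send_scores"), "/path/to/send_scores.py"
)
print(oct(stat.S_IMODE(os.stat(launcher).st_mode)))
```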

We now have an executable file, but unfortunately quite a few additional steps are required before we can actually run it on a schedule. The fundamental thing to do here is to convert our executable into an application, for which we use Automator:

  1. [Cmd] + [Space], type in Automator and hit [Enter]
  2. New Document → Application → drag both Get Specified Finder Items and Open Finder Items to the panel on the right
  3. In the Get Specified Finder Items panel, click on Add, navigate to your executable and add it. The result should look something like this:

  4. Save the file ([Cmd] + [S]) and your app is created!

Now that we have an application, the only thing left to do is to run it on a schedule, for which we can use the Calendar application:

  1. Open the Calendar app on your Mac and create a new iCloud agenda item. For clarity it might be useful to create a separate calendar first: File → New Calendar → [insert calendar name].
  2. Set the date and time to the moment you want your file to be executed and repeat the agenda item at the desired frequency. For example, I wanted to send an email each Friday at 3PM, for a period of two months. Nothing special here, but the trick is in setting the alert: Alert → Custom → Open file → [select created application].

That’s it! Your Mac will now execute the Python script at the chosen date and time, repeating at the frequency you set! I realise there were quite a few steps involved in the full process, but I hope that some of my learning experiences, which I found very instructive, might come in useful for you as well. Just maybe it will even help you stick to your 2022 health intentions, which would be a nice bonus, but if it doesn’t, don’t blame yourself: fewer than 10% of people manage to make their new habits stick. Happy 2022!


Statistics lover | Likes Python, loves R | Gets happy from peanut butter recipes.