Reverse Engineering Read Later Data from the Apple News App

As we navigate the digital world, we often come across articles we don't have time to read but still want to save for later. One way to accomplish this is by using the Read Later feature in Apple News. But what if you want to access those articles outside the Apple News app, say on a different device, or share them with someone who doesn't use Apple News? Or what if you want to automatically post links to those articles on your blog? That's where the nerd powers come in.

Reverse Engineering the Data

Initially, I reached out to Rhet Turnbull, the creator of the amazing osxphotos app/Python library that I use to extract the data from Apple Photos. I use that data to power the photo section of my site.

I asked Rhet if he had ever pulled this data from News. While I waited to hear back from him, I used lsof, which lists the files a process has open, to look for the file that Apple News uses to store Read Later articles. I discovered that Apple News uses a binary plist file located in a super obvious place:

/Users/eecue/Library/Containers/com.apple.news/Data/Library/Application Support/com.apple.news/com.apple.news.public-com.apple.news.private-production/reading-list
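
For anyone who wants to repeat the lsof hunt, a minimal sketch in Python might look like the following. It is not the exact command used for this post. It assumes the app's process name starts with News and that filtering on com.apple.news is enough to spot files inside the container. The helper names parse_lsof_names and open_news_files are made up for illustration.

```python
import subprocess


def parse_lsof_names(field_output: str) -> list[str]:
    """Return the file names found in `lsof -Fn` output.

    In -F mode lsof prints one field per line, each prefixed with a
    one-character tag; lines starting with 'n' carry a file name.
    """
    return [line[1:] for line in field_output.splitlines() if line.startswith("n")]


def open_news_files(process_name: str = "News") -> list[str]:
    """List open files that look like they belong to the Apple News container.

    Assumption: the app's command name begins with `process_name`.
    """
    result = subprocess.run(
        ["lsof", "-c", process_name, "-Fn"],
        capture_output=True,
        text=True,
        check=False,  # lsof exits non-zero when nothing matches
    )
    # Keep only paths that mention the News bundle identifier
    return [path for path in parse_lsof_names(result.stdout) if "com.apple.news" in path]
```

Calling open_news_files() while the app is running should return a list of candidate paths to sift through, which is the kind of search that surfaces a deeply nested path like the one above.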

Simple and obvious, right?! After I found it, I noticed it was in a strange format that a normal binary plist parser couldn’t understand. However, I was able to just run strings on the file and extract the Apple News article ID, which slots into a URL like this: https://apple.news/AbtWOAgVqToW62MeeZ1xkcQ.
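
To make the strings trick concrete, here is a rough pure-Python sketch of the same idea: scan the raw bytes for runs of printable ASCII and keep the ones that carry an ID. The rl- prefix and trailing underscore come from the script further down. The four-character minimum mirrors the default of strings, and the function name extract_article_ids is an assumption for illustration.

```python
import re

# Runs of 4 or more printable ASCII bytes, roughly what `strings` reports by default
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def extract_article_ids(data: bytes) -> list[str]:
    """Pull Apple News article IDs out of raw reading-list bytes."""
    ids = []
    for match in PRINTABLE_RUN.finditer(data):
        text = match.group().decode("ascii")
        if not text.startswith("rl-"):
            continue
        # Drop the "rl-" prefix and, if present, the trailing "_"
        article_id = text.strip()[3:]
        if article_id.endswith("_"):
            article_id = article_id[:-1]
        ids.append(article_id)
    return ids
```

Reading the file with open(path, "rb").read() and passing the bytes to extract_article_ids would give a list of IDs without shelling out, which can be handy on a machine where strings isn't installed.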

I wrote a script to pull the IDs out of the file, fetch the page above for each one, and then use Beautiful Soup to extract the article data. It wasn’t perfect, but it did the job:

import subprocess
import requests
from bs4 import BeautifulSoup

# Run the `strings` command to extract the strings from the binary file
proc = subprocess.Popen(['strings', '/Users/eecue/Library/Containers/com.apple.news/Data/Library/Application Support/com.apple.news/com.apple.news.public-com.apple.news.private-production/reading-list'], stdout=subprocess.PIPE)

# Loop through the output and look for article IDs
article_ids = []
for line in proc.stdout:
    # Check if the line starts with "rl-" (the trailing "_" is handled below)
    if line.startswith(b'rl-'):
        # Extract the article ID by removing the "rl-" prefix and "_" suffix
        article_id = line.decode().strip()[3:]
        if article_id.endswith('_'):
            article_id = article_id[:-1]
        article_ids.append(article_id)

def extract_info_from_apple_news(news_id):
    # Construct the Apple News URL from the ID
    apple_news_url = f'https://apple.news/{news_id}'
March 13, 2023