I subscribe to blogs’ RSS feeds in my “limbo reader” in NetNewsWire, and these usually come in dumps like Blaugust 2023, 2024, etc. This means there will be duplicate entries across different folders, which likely leads to duplicate articles in the main feed.

To find duplicates by URL from the Subscriptions.opml file (in /Users/juhis/Library/Containers/com.ranchero.NetNewsWire-Evergreen/Data/Library/Application Support/NetNewsWire/Accounts/OnMyMac), I ran:

xmllint --xpath "//outline/@xmlUrl" Subscriptions.opml | sort | uniq -c | sort | awk -F' ' '{if($1>1)print$2}'

explained:

xmllint --xpath "//outline/@xmlUrl" Subscriptions.opml | # Find all xmlUrls from <outline> elements
sort | # Sort them by URL
uniq -c | # Group by URL and add count at the start
sort -n | # Sort numerically by count
awk '$1 > 1 {print $2}' # Only print ones where count > 1
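As a sketch with a hypothetical subscription list where one feed sits in two folders (and assuming your xmllint prints each matched attribute on its own line, as recent versions do), the intermediate output after uniq -c would look something like:

   1 xmlUrl="https://another.example/rss.xml"
   2 xmlUrl="https://example.com/feed.xml"

and the final awk step would then print just xmlUrl="https://example.com/feed.xml".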

I then did the same for titles, as not all URLs were exactly the same across matching feeds:

xmllint --xpath "//outline/@title" Subscriptions.opml | # Find all xmlUrls from <outline> elements
sort | # Sort them by URL
uniq -c | # Group by url and add count at the start
sort | # Sort again by count
grep -ev "^\s*1" # Only print ones where count > 1
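One note on why grep fits better for this last step: titles usually contain spaces, so the earlier awk print $2 would cut them off after the first word, while grep keeps whole lines. If you’d rather stay in awk, a bare condition prints the whole line too; a sketch against the same Subscriptions.opml:

xmllint --xpath "//outline/@title" Subscriptions.opml | sort | uniq -c | awk '$1 > 1' # Print the whole line when the count is greater than 1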