There must be huge overlap between PubSub, Technorati, Google, BlogPulse, Yahoo! and other search engines. One strategy for measuring the overlap:
- Design a hash that hides the URL but always yields the same unique value for any given URL.
- Each search engine dumps its hashed list of URLs to a file.
- A neutral third party (university, research firm, Blogcount.com) analyzes for duplicates and coverage of the space.
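The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SHA-256 stands in for "a hash that hides the URL", the normalization rules in `normalize` are a guess at what the engines would have to agree on beforehand, and the names `hash_url` and `overlap_report` are made up for this sketch. A plain hash can also be reversed by anyone who hashes a list of known URLs, so a real scheme would probably need something stronger.

```python
import hashlib
from itertools import combinations
from urllib.parse import urlsplit, urlunsplit


def normalize(url: str) -> str:
    """Assumed normalization: lowercase scheme and host, drop trailing slash and fragment."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def hash_url(url: str) -> str:
    """Hash a URL so the same URL always maps to the same opaque hex string."""
    return hashlib.sha256(normalize(url).encode("utf-8")).hexdigest()


def overlap_report(dumps: dict[str, set[str]]) -> dict:
    """Given each engine's set of hashes, report coverage and pairwise overlap."""
    union = set().union(*dumps.values())
    report = {"total_unique": len(union), "coverage": {}, "pairwise_overlap": {}}
    for name, hashes in dumps.items():
        # Share of the combined space that this engine has indexed.
        report["coverage"][name] = len(hashes) / len(union) if union else 0.0
    for a, b in combinations(sorted(dumps), 2):
        # Number of hashed URLs that both engines have.
        report["pairwise_overlap"][(a, b)] = len(dumps[a] & dumps[b])
    return report


# Hypothetical dumps from two engines (placeholder URLs, not real data).
engine_a = {hash_url(u) for u in ["http://example.com/a", "http://example.com/b"]}
engine_b = {hash_url(u) for u in ["http://EXAMPLE.com/b/", "http://example.com/c"]}
report = overlap_report({"engine_a": engine_a, "engine_b": engine_b})
```

The third party only ever sees the hex digests, so it can count duplicates without learning which URLs an engine indexes, as long as every engine applies exactly the same normalization before hashing.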
Optionally, each hashed URL may carry its TLD (top-level domain, such as .com, .ch or .iq), so we can break down coverage by source and, for country-code domains, by country.
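A rough sketch of the TLD variant, again in Python and again with assumptions: each dumped record is a `(tld, hash)` pair, SHA-256 is a stand-in hash, the TLD is taken as the last label of the host name, and `tagged_hash` and `coverage_by_tld` are hypothetical names. URL normalization is left out here to keep the example short.

```python
import hashlib
from collections import defaultdict
from urllib.parse import urlsplit


def tagged_hash(url: str) -> tuple[str, str]:
    """Return (tld, digest): the TLD stays readable, the URL itself is hashed."""
    url = url.strip()
    host = urlsplit(url).hostname or ""
    tld = host.rsplit(".", 1)[-1] if "." in host else ""
    return tld, hashlib.sha256(url.encode("utf-8")).hexdigest()


def coverage_by_tld(dumps: dict[str, set[tuple[str, str]]]) -> dict[str, dict[str, float]]:
    """For each TLD, the fraction of all hashed URLs seen there that each engine has."""
    all_by_tld = defaultdict(set)
    per_engine = defaultdict(lambda: defaultdict(set))
    for engine, records in dumps.items():
        for tld, digest in records:
            all_by_tld[tld].add(digest)
            per_engine[tld][engine].add(digest)
    return {
        tld: {engine: len(per_engine[tld][engine]) / len(hashes) for engine in dumps}
        for tld, hashes in all_by_tld.items()
    }


# Hypothetical dumps (placeholder URLs, not real data).
dumps = {
    "engine_a": {tagged_hash("http://example.com/a"), tagged_hash("http://example.ch/x")},
    "engine_b": {tagged_hash("http://example.com/a"), tagged_hash("http://example.com/b")},
}
by_tld = coverage_by_tld(dumps)
```

Exposing the TLD reveals a little about each hashed URL, which is the price of being able to slice the coverage numbers by domain.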