Your Map to the World of Data Scraping

How Are Your Competitors Managing Bot Traffic?

Of the top 100 sites:
30% block OpenAI GPT
22% restrict Google web scraping
93% have explicit rules for bots

How Does This Domain Handle Bot Traffic?

Bots can do whatever they want with your content,
Unless you write rules for them.

Without specifying authorized behavior for bots, you expose yourself to unauthorized data scraping.
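
As a rough illustration of what "writing rules" can look like, the sketch below generates a small robots.txt that turns away named crawlers site-wide and leaves everyone else unrestricted. The helper name build_robots_txt and the choice of agents are assumptions made for this example, not part of RobotsMapper. GPTBot and Google-Extended are the user-agent tokens published by OpenAI and Google for their AI-related crawling. A robots.txt is a request that compliant crawlers honor, not an enforcement mechanism.

```python
def build_robots_txt(blocked_agents, disallowed_paths_for_all=()):
    """Return robots.txt text that blocks the named agents site-wide.

    Hypothetical helper for illustration: every agent in `blocked_agents`
    gets "Disallow: /", and all other bots are only kept out of the
    paths listed in `disallowed_paths_for_all`.
    """
    lines = []
    for agent in blocked_agents:
        # One group per blocked crawler, followed by a blank separator line.
        lines += [f"User-agent: {agent}", "Disallow: /", ""]

    # Catch-all group for every crawler not named above.
    lines.append("User-agent: *")
    if disallowed_paths_for_all:
        lines += [f"Disallow: {path}" for path in disallowed_paths_for_all]
    else:
        # An empty Disallow value means nothing is off limits.
        lines.append("Disallow:")

    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(["GPTBot", "Google-Extended"], ["/private/"]))
```

The generated text would be served at the root of the domain as /robots.txt.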

Why Do I Need This?

Protect Your Data From Unauthorized Bots

"I have seen reports that it may or may not have been used. I have no information myself."
- YouTube CEO Neal Mohan, on whether OpenAI (ChatGPT) scraped and used YouTube's data

© RobotsMapper. All rights reserved.

What Is a robots.txt?

A robots.txt file tells bots which parts of your website they are allowed to access
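
To make this concrete, here is a minimal sketch of how a well-behaved bot might read those rules, using Python's standard urllib.robotparser module. The sample rules, the example.com URLs, and the helper name is_allowed are assumptions made for illustration. A real crawler would fetch the file from the site's /robots.txt path instead of using an inline string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent, url in [
        ("GPTBot", "https://example.com/page"),
        ("Googlebot", "https://example.com/page"),
        ("Googlebot", "https://example.com/private/report"),
    ]:
        print(agent, url, is_allowed(SAMPLE_ROBOTS_TXT, agent, url))
```

In this sample, GPTBot is kept out of the whole site, while any other bot falls under the catch-all group and is only kept out of /private/.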

"Web Robots (also known as Web Wanderers, Crawlers, or Spiders), are programs that traverse the Web automatically. Search engines such as Google use them to index the web content, spammers use them to scan for email addresses, and they have many other uses."

Source:
https://www.robotstxt.org/robotstxt.html

Glossary background:
https://darkvisitors.com/agents/

More resources on creating and managing robots.txt files:

https://moz.com/learn/seo/robotstxt
http://www.robotstxt.org/about.html
https://moz.com/blog/interactive-guide-to-robots-txt
http://www.robotstxt.org/robotstxt.html

About Us

Futureproof TMT: Welcome to RobotsMapper!

RobotsMapper is a freemium tool built and developed by Futureproof TMT, the research and advisory firm of FIT Holdings.

Futureproof’s inaugural research report on changing technical dynamics and market economics was released in April 2024. It includes an analysis of how the technology, media, and telecommunications sectors have evolved to serve the hyperscale needs of the modern media and data landscape. The paper, which can be found here, also outlines how GenAI tools and LLMs have further increased the volume and impact of non-human web traffic, affecting not just analytics but also media intellectual property rights.

While reviewing news coverage ahead of the report’s release, our team saw an interview in which the YouTube CEO discussed the company’s stance on crawlers and scrapers from firms like OpenAI using YouTube’s hosted content to train their models. Surprisingly, we found that YouTube does not set even basic rules for crawlers from OpenAI and others as they navigate its sitemap on the open web!

Since even YouTube isn’t yet doing the basics to manage web robots, we created a tool to help others interested in this space analyze which sites are managing their sitemaps properly for the age of AI.

Creators: Shailin Dhar, Scott Thomson and Arjun Krishna

Contact Us

Want more data? Let us know

List of Sites and Categories

Want access to our full dataset? Contact us

Domain Category Link to Report
amazon.com eCommerce robotsmapper.com/amazon.com
walmart.com eCommerce robotsmapper.com/walmart.com
ebay.com eCommerce robotsmapper.com/ebay.com
target.com eCommerce robotsmapper.com/target.com
temu.com eCommerce robotsmapper.com/temu.com
rakuten.com eCommerce robotsmapper.com/rakuten.com
amazon.in eCommerce robotsmapper.com/amazon.in
apple.com eCommerce robotsmapper.com/apple.com
craigslist.com eCommerce robotsmapper.com/craigslist.com
aliexpress.com eCommerce robotsmapper.com/aliexpress.com
alibaba.com eCommerce robotsmapper.com/alibaba.com
taobao.com eCommerce robotsmapper.com/taobao.com
ikea.com eCommerce robotsmapper.com/ikea.com
bestbuy.com eCommerce robotsmapper.com/bestbuy.com
wayfair.com eCommerce robotsmapper.com/wayfair.com
etsy.com eCommerce robotsmapper.com/etsy.com
booking.com eCommerce robotsmapper.com/booking.com
expedia.com eCommerce robotsmapper.com/expedia.com
baolau.com eCommerce robotsmapper.com/baolau.com
kayak.com eCommerce robotsmapper.com/kayak.com
indiamart.com eCommerce robotsmapper.com/indiamart.com
youtube.com Social Media robotsmapper.com/youtube.com
x.com Social Media robotsmapper.com/x.com
facebook.com Social Media robotsmapper.com/facebook.com
tiktok.com Social Media robotsmapper.com/tiktok.com
whatsapp.com Social Media robotsmapper.com/whatsapp.com
instagram.com Social Media robotsmapper.com/instagram.com
telegram.org Social Media robotsmapper.com/telegram.org
reddit.com Social Media robotsmapper.com/reddit.com
quora.com Social Media robotsmapper.com/quora.com
snapchat.com Social Media robotsmapper.com/snapchat.com
linkedin.com Social Media robotsmapper.com/linkedin.com
threads.net Social Media robotsmapper.com/threads.net
wechat.com Social Media robotsmapper.com/wechat.com
discord.com Social Media robotsmapper.com/discord.com
twitch.com Social Media robotsmapper.com/twitch.com
tumblr.com Social Media robotsmapper.com/tumblr.com
blsky.app Social Media robotsmapper.com/blsky.app
pinterest.com Social Media robotsmapper.com/pinterest.com
kuaishou.com Social Media robotsmapper.com/kuaishou.com
weibo.com Social Media robotsmapper.com/weibo.com
forbes.com News/Information robotsmapper.com/forbes.com
google.com News/Information robotsmapper.com/google.com
bing.com News/Information robotsmapper.com/bing.com
yahoo.com News/Information robotsmapper.com/yahoo.com
msn.com News/Information robotsmapper.com/msn.com
perplexity.ai News/Information robotsmapper.com/perplexity.ai
yandex.com News/Information robotsmapper.com/yandex.com
baidu.com News/Information robotsmapper.com/baidu.com
bloomberg.com News/Information robotsmapper.com/bloomberg.com
cnn.com News/Information robotsmapper.com/cnn.com
bbc.com News/Information robotsmapper.com/bbc.com
aljazeera.com News/Information robotsmapper.com/aljazeera.com
washingtonpost.com News/Information robotsmapper.com/washingtonpost.com
newyorktimes.com News/Information robotsmapper.com/newyorktimes.com
apnews.com News/Information robotsmapper.com/apnews.com
reuters.com News/Information robotsmapper.com/reuters.com
foxnews.com News/Information robotsmapper.com/foxnews.com
indiatimes.com News/Information robotsmapper.com/indiatimes.com
hindustantimes.com News/Information robotsmapper.com/hindustantimes.com
archive.org News/Information robotsmapper.com/archive.org
commoncrawl.org News/Information robotsmapper.com/commoncrawl.org
wikipedia.org News/Information robotsmapper.com/wikipedia.org
pluto.tv Entertainment Media robotsmapper.com/pluto.tv
fubo.tv Entertainment Media robotsmapper.com/fubo.tv
peacocktv.com Entertainment Media robotsmapper.com/peacocktv.com
therokuchannel.roku.com Entertainment Media robotsmapper.com/therokuchannel.roku.com
crunchyroll.com Entertainment Media robotsmapper.com/crunchyroll.com
tubitv.com Entertainment Media robotsmapper.com/tubitv.com
crackle.com Entertainment Media robotsmapper.com/crackle.com
imdb.com Entertainment Media robotsmapper.com/imdb.com
vudu.com Entertainment Media robotsmapper.com/vudu.com
xumo.com Entertainment Media robotsmapper.com/xumo.com
pandora.com Entertainment Media robotsmapper.com/pandora.com
max.com Entertainment Media robotsmapper.com/max.com
paramountplus.com Entertainment Media robotsmapper.com/paramountplus.com
spotify.com Entertainment Media robotsmapper.com/spotify.com
tidal.com Entertainment Media robotsmapper.com/tidal.com
soundcloud.com Entertainment Media robotsmapper.com/soundcloud.com
stitcher.com Entertainment Media robotsmapper.com/stitcher.com
netflix.com Entertainment Media robotsmapper.com/netflix.com
disneyplus.com Entertainment Media robotsmapper.com/disneyplus.com
dailymotion.com Entertainment Media robotsmapper.com/dailymotion.com
jiocinema.com Entertainment Media robotsmapper.com/jiocinema.com
shopify.com Business/Enterprise robotsmapper.com/shopify.com
oracle.com Business/Enterprise robotsmapper.com/oracle.com
thetradedesk.com Business/Enterprise robotsmapper.com/thetradedesk.com
canva.com Business/Enterprise robotsmapper.com/canva.com
squarespace.com Business/Enterprise robotsmapper.com/squarespace.com
wix.com Business/Enterprise robotsmapper.com/wix.com
hubspot.com Business/Enterprise robotsmapper.com/hubspot.com
salesforce.com Business/Enterprise robotsmapper.com/salesforce.com
slack.com Business/Enterprise robotsmapper.com/slack.com
adobe.com Business/Enterprise robotsmapper.com/adobe.com
accenture.com Business/Enterprise robotsmapper.com/accenture.com
ey.com Business/Enterprise robotsmapper.com/ey.com
deloitte.com Business/Enterprise robotsmapper.com/deloitte.com
kpmg.com Business/Enterprise robotsmapper.com/kpmg.com
mckinsey.com Business/Enterprise robotsmapper.com/mckinsey.com
microsoft.com Business/Enterprise robotsmapper.com/microsoft.com
cisco.com Business/Enterprise robotsmapper.com/cisco.com
tata.com Business/Enterprise robotsmapper.com/tata.com
tencent.com Business/Enterprise robotsmapper.com/tencent.com
reliance.com Business/Enterprise robotsmapper.com/reliance.com
intel.com Business/Enterprise robotsmapper.com/intel.com
weather.com Miscellaneous robotsmapper.com/weather.com
zergnet.com Miscellaneous robotsmapper.com/zergnet.com