OpenAI has launched a bot that crawls the internet for data to improve its AI systems, raising concerns among website operators and prompting guidance on how to block the bot from specific sites.
Purpose of the Bot:
OpenAI has developed a new bot to crawl the internet to gather data for training AI systems.
AI systems like ChatGPT need vast amounts of data for proper training and accurate outputs.
Previously, much of the required data was freely sourced from the internet.
Concerns Raised:
Authors and web users have expressed dissatisfaction with OpenAI for using personal and copyrighted content.
Such content could influence AI responses or even be reproduced in them.
Companies like OpenAI have been criticized for straining web infrastructures with their data crawlers.
Elon Musk said that heavy traffic from such bots led Twitter to limit how many posts users could view.
OpenAI's Current and New Systems:
The GPT-3.5 and GPT-4 models behind ChatGPT were trained on web data collected up to the end of 2021.
Owners of this data or the websites it came from cannot have it removed from OpenAI's models.
The new bot, 'GPTBot', aims to collect data and content from the web to train upcoming models.
Guidance for Website Operators:
Website administrators can instruct the bot to avoid crawling their site if they don't want their data accessed.
Instructions can be included in a "robots.txt" file, similar to directives for other web crawlers.
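As a rough illustration of the robots.txt guidance above, the sketch below embeds two example rule sets and checks them with Python's standard urllib.robotparser module. It assumes the crawler identifies itself with the user-agent token "GPTBot" (the name given in this summary); the example.com URLs, the /blog/ and /private/ paths, and the is_allowed helper are hypothetical and exist only for the demonstration.

```python
from urllib.robotparser import RobotFileParser

# Rule set 1: keep GPTBot away from the whole site.
BLOCK_ALL = """\
User-agent: GPTBot
Disallow: /
"""

# Rule set 2: allow one directory and block another.
# The /blog/ and /private/ paths are hypothetical examples.
PARTIAL = """\
User-agent: GPTBot
Allow: /blog/
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    print(is_allowed(BLOCK_ALL, "GPTBot", "https://example.com/any/page"))
    print(is_allowed(BLOCK_ALL, "SomeOtherBot", "https://example.com/any/page"))
    print(is_allowed(PARTIAL, "GPTBot", "https://example.com/blog/post"))
    print(is_allowed(PARTIAL, "GPTBot", "https://example.com/private/data"))
```

In practice the rules would be saved as a plain text file named robots.txt at the root of the site. Compliance is voluntary on the crawler's side, so this only affects bots that choose to honor the file.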
OpenAI's Assurances:
The bot may be used to refine future AI models.
It is designed to exclude sources that sit behind paywalls, are known to gather personal data, or contain content that breaches OpenAI's guidelines.
OpenAI posits that allowing bot access could enhance the precision, overall functionality, and safety of AI models.