Happy Friday! If you're receiving this, it’s because you’re a cherished member of my network and/or a subscriber to my Savvy Musings newsletter, where I share my learnings and experiences every few months.
For nearly two years, I’ve been deeply immersed in generative AI, a technology that’s quickly reshaping business and our lives in ways we are only beginning to understand.
Many of you have expressed interest in learning more and staying updated on what matters most, but with rapid advancements, constant hype, and busy lives, it’s hard to find the time to stay informed and cut through the noise.
That’s why I’m starting a new weekly AI newsletter on Substack.
Each Friday, you’ll receive:
Curated AI Developments and Insights: A selection of the most important AI news and insights, presented mostly as links with brief summaries to help you quickly grasp the key points. Some weeks it will include 15 links and other weeks 3, depending on what's happening.
Occasional Extras: Some weeks, I’ll also share my insights and perspective, prompts, favorite tools, or practical tips and tricks to help you stay ahead.
My goal is to keep it short and sweet, and most importantly, valuable to you — so you can stay informed without feeling overwhelmed.
I'm open to feedback to evolve the format and make it even more helpful (while keeping it manageable for me too, so I don’t get overwhelmed either 😊).
If this isn't for you, you can easily unsubscribe below 👇.
For those who are curious to learn more, thanks for being part of this journey.
Let’s dive into this week’s edition:
It has been a big week in AI. Here’s what you need to know:
Meta has released Llama 3.1, its first “best in class” AI model, claiming to match or even exceed the performance of top models like OpenAI's GPT-4o and Anthropic’s Claude 3.5 Sonnet on certain benchmarks. This is a big deal because the model is open source, meaning anyone can examine, use, modify and build apps on top of it – for free. This approach challenges the business models of other AI giants and raises concerns about the security and misuse of powerful, widely accessible AI technologies.
OpenAI is launching SearchGPT, an AI-powered search engine that summarizes information from websites and cites sources, including news from its partner publishers like The Wall Street Journal, The Associated Press, Vox Media and The Atlantic. SearchGPT also offers a sidebar with additional results and sources, and allows users to ask follow-up questions. The Google rival will be available as a prototype in limited release, with plans to eventually build it into ChatGPT. You can sign up for the waitlist here.
A few key points from OpenAI’s blog:
“We are also launching a way for publishers to manage how they appear in SearchGPT, so publishers have more choices. Importantly, SearchGPT is about search and is separate from training OpenAI’s generative AI foundation models. Sites can be surfaced in search results even if they opt out of generative AI training. To read more about publisher controls and OpenAI’s bots, see here.” (You’ll want to make sure your SEO team has this info ASAP.)
Major gaming companies like Activision Blizzard are rapidly adopting generative AI for game development, concept art and asset creation amid mass layoffs. While some studios embrace AI to cut costs, others ban its use, sparking industry-wide debates on job security, copyright infringement, artistic quality and the need for unionization.
Etsy now permits the sale of AI-generated art as long as artists disclose the use of AI in their product descriptions. But the platform bans the sale of standalone AI prompts or prompt bundles, as it views them as essential to the creative process and believes they “should not be sold separately from the final artwork.” I’m curious about how this will shape consumer perception of AI-generated art and whether transparency about AI use impacts purchasing decisions. Also, as AI art can’t currently be copyrighted, what are the implications for artists and marketplaces?
Condé Nast has sent a cease-and-desist letter to Perplexity telling the AI search engine to stop including content from its publications in its results, one month after Forbes did the same.
Runway's highly praised Gen-3 AI video generator was secretly trained on thousands of YouTube videos and pirated films without permission, including content from popular creators and major media outlets like Pixar, Disney, Netflix, Sony, The New Yorker and Vice News, among others. This contradicts Runway's claims of using "curated, internal datasets," and raises serious ethical and legal issues regarding the use of copyrighted material. Though my hunch is that most of the top video generation tools have also trained on YouTube data…
Kling AI, a high-quality text- and image-to-video generation model from China, similar to OpenAI’s yet-to-be-released Sora, is now available worldwide. It’s currently free to use, though that likely won’t be the case for long. Developed by Kuaishou Technology, Kling can generate videos up to two minutes long in HD. Users can create approximately six videos daily. I hope to try it soon and will report back.
Google is sponsoring Team USA at the Paris Olympics and will use the opportunity to promote Gemini and AI features like Google Maps 3D views and Circle to Search. These technologies will provide real-time AI insights, detailed venue visualizations, and athlete-driven social content, demonstrating Google's AI capabilities in a high-profile setting.
Meta AI’s new “Imagine me” feature allows you to generate selfies in any style while doing anything you want. Zuck demonstrates how it works by creating images of himself as a gladiator, a member of a boy band and a streetwear designer.
FEATURED STORY
The Data That Powers A.I. Is Disappearing Fast 🚨
New research from MIT reveals a big decline in publicly available data for AI training.
The study found that over the past year, many crucial web sources (including publishers and online platforms) have restricted the use of their data.
How are websites limiting access?
↳ The Robots Exclusion Protocol - a decades-old method (the robots.txt file) that asks automated bots not to crawl certain web pages
↳ Website Terms of Service, which serve as legal barriers to data scraping
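For the technically curious, here is a minimal sketch of how a well-behaved crawler might check the Robots Exclusion Protocol before fetching a page. It uses Python's standard urllib.robotparser module. The robots.txt contents, the bot name "ExampleAIBot", the helper may_crawl and the example.com URL are all made up for illustration and are not taken from any real site or crawler.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: blocks one made-up AI crawler, allows everyone else.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/articles/some-story"  # placeholder URL
    for agent in ("ExampleAIBot", "SomeOtherBot"):
        print(agent, "allowed:", may_crawl(ROBOTS_TXT, agent, url))
```

Notice that the check runs on the crawler's side. Nothing in the file itself stops a bot that simply skips this step.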
Some major players like Reddit, The Associated Press and News Corp are cutting deals with AI companies.
Others, like The New York Times, are taking the legal route, suing for copyright infringement.
Why It Matters:
Quality data is essential for training advanced generative AI models like ChatGPT. Restrictions from major sources may pose a threat to AI companies in advancing their models.
Smaller AI startups and academic researchers are particularly vulnerable, unable to compete with tech giants for licensed data.
To overcome this challenge, some AI companies are considering using synthetic data (data generated by AI itself) to train models. “But many researchers doubt that today’s AI systems are capable of generating enough high-quality synthetic data to replace the human-created data they’re losing.”
Also, although publishers can attempt to stop AI companies from scraping their data, these requests are not legally binding and rely on voluntary compliance.
The big takeaway? While the legal battles over data usage rights unfold, we need new tools that give website owners more precise ways to control the use of their data.
But I also don’t think time is on the side of the publishers who are holding out, as online search is transforming and we are all starting to get used to a new normal…
For a deeper dive, check out the piece from The New York Times.
That's all!
I’ll see you next Friday. Thoughts, feedback and questions are welcome and much appreciated. Shoot me a note at avi@joinsavvyavi.com.
Stay curious,
Avi
Great newsletter, Avi. As your featured article highlights, AI value really lies in who owns the data, and quality data will help avoid hallucinations!
All the big players have been collecting our data for years, and that’s why the likes of Meta don’t mind open sourcing their LLMs! I.e., they want the wider community to continue to improve the models that they can pump their proprietary data into!
Thanks Allan.
Yes, that and they want to commoditize the underlying tech to neutralize advantages for OpenAI, Google and Anthropic. Will be interesting to see where things land in a year.