2nd stage of the AI age: AI Labs Need Your Data
The first stage of the "AI age" was about getting inference costs down to "too cheap to meter".
AI models have two kinds of cost:
Training time: a one-time process that takes around three months and costs up to a billion dollars.
Inference time: the cost of answering each question you ask in your favorite chat app. This cost is converging to a fraction of a cent.
The second stage of the AI age is all about your data
In his NeurIPS talk, Ilya Sutskever declared that AI labs are running out of data, "the fossil fuel of AI".
"We have but one internet": the data on the internet is finite (which also means the labs have already scraped whatever they could).
In light of this, several recent announcements from the companies behind the AI labs make immediate sense.
Microsoft is now offering GitHub Copilot for free, although it previously cost $10 per month (and was still a loss leader even at that price!). The catch: on all plans under $19 per month, they train on your data by default.
Google released Gemini 2.0 Flash, one of the most advanced models yet, with real-time audio and video-streaming input! And if that's not enough, it's entirely free in Google AI Studio and through the API!
The catch? When you use the Free version, Google uses your data for model training. At least they clearly point out that you should not submit "sensitive, confidential, or personal information to the Unpaid Services".
(Also, can anyone tell me how to access the paid version? I really tried but couldn't figure it out. Do you need a Google Cloud account for that?)
OpenAI is offering previously premium features, like Search, to all users, even those on the Free plan. They really want to see Google dance. (Meanwhile, Google is making them dance with the Veo 2 video model; it's really fun to watch.)
They now also give Free users a monthly allowance of "Advanced voice mode" and have even launched 1-800-ChatGPT, a free phone line where you can talk with ChatGPT!
The catch? You guessed it: on all consumer plans (i.e., anything other than Team or Enterprise), your data is used for model training by default. Ironically, this seems to include even the new $200/month Pro plan!
One company conspicuously absent from the past two weeks' flurry of announcements was Anthropic, which happens to be the only one of these labs that does not train on your data.
They seem to be taking a different path from the other players, and they also happen to have the best model for coding! Claude Sonnet is so good that Microsoft was forced to start offering it in GitHub Copilot; otherwise, developers would likely have left the platform.
It's wonderful to watch this dance of titans as they outbid each other by giving away loss-making products for free. Just remember to stay aware of how they handle your data and which providers and plans actually respect your privacy.
Also, to be entirely honest, there may be other reasons for offering these products for free. For example, by making Copilot free, Microsoft sealed the fate of dozens of "coding assistant" startups overnight. The chances of a new "Cursor" being founded in 2025 just got quite a bit slimmer.
I’ve also written 3 TIL posts since the last newsletter.