Introduction
Make no mistake, Google got obliterated by Microsoft's blitz attack in the great AI war of 2023. GPT-4 captured the zeitgeist of the artificial intelligence age we just entered, and things got so bad for Google that people unironically started using Bing. But the war is just getting started, and just yesterday, Google unleashed its highly anticipated Gemini model that beats GPT-4 on nearly every benchmark. It is December 7th, 2023, and you're watching The Code Report.
The Rise of Gemini
Gemini first became known to the public earlier this year at Google I/O when Sundar Pichai explained it like this: "We've been applying AI to make AI, rigorously tested AI with AI." Gemini is a large language model that will replace LaMDA and PaLM 2. Like GPT-4, it's multimodal, which means it's not only trained on text but also sound, images, and video.
Impressive Capabilities
Google's demo is absolutely insane. Gemini can recognize what's going on in a video feed and respond in real time. For example, a person draws a duck, and the AI tells him it's a duck. It can do that in multiple languages. It can even keep track of things in an ongoing video feed, like playing the game of find the ball under the cup, and even after the cups are scrambled, it still knows where the ball is. Gemini can also generate images on the fly like Stable Diffusion and even generate music based on a prompt, including converting images to audio.
Logic and Reasoning
Gemini excels in logic and spatial reasoning. Using two pictures, it can tell you which car will go faster based on the aerodynamics of the vehicle. In the future, a civil engineer will be able to just take a picture of some land, and the AI can instantly generate blueprints for a bridge. This means software engineers aren't the only type of engineers becoming obsolete. Google also unveiled AlphaCode 2, which performs better than an estimated 85% of competitive programmers, solving highly complex abstract problems using techniques like dynamic programming.
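For anyone who hasn't run into dynamic programming before, here is a minimal sketch in Python. It uses the classic longest common subsequence problem as a stand-in; this is an assumption chosen for illustration, not an actual AlphaCode 2 problem or anything the model produced. The idea is to solve small subproblems once, store the answers in a table, and reuse them.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b."""
    # table[i][j] holds the answer for the prefixes a[:i] and b[:j].
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, char_a in enumerate(a, start=1):
        for j, char_b in enumerate(b, start=1):
            if char_a == char_b:
                # Matching characters extend the best answer for the shorter prefixes.
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                # Otherwise reuse the better of the two neighboring subproblems.
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]


print(lcs_length("ABCBDAB", "BDCABA"))
```

Filling the table takes time proportional to the product of the two string lengths, instead of the exponential blowup you get from trying every subsequence.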
Is It Just Marketing?
All these demos look amazing at first glance, but is this all just a marketing sleight of hand from Google? Currently, Gemini comes in three sizes: Tall, Grande, and Venti, officially known as Nano, Pro, and Ultra. Nano, the smallest version, is designed to be embedded on devices like Android phones, while Pro is a more general-purpose model, and Ultra is the high-end model that's blowing everyone's minds.
If you're in the United States, you can actually use Gemini right now in the Bard chatbot, which is using Gemini Pro, the mid-range version. Bard is way better than it was six months ago and still extremely fast, but after using it for a few minutes, it's pretty obvious that it's not quite as good as GPT-4. However, GPT-4 is nervous about Gemini Ultra. When I asked about it, GPT-4 started throwing shade at itself before Sam Altman pulled the plug, giving me a network error.
Benchmarks
Gemini Pro underperforms GPT-4 in most situations, but Gemini Ultra outperforms it in almost every single category. Most notably, it's the first model ever to outperform human experts on the Massive Multitask Language Understanding (MMLU) benchmark, which is a multiple-choice test over a wide array of subjects, kind of like the SATs but for AI. However, Gemini Ultra underperforms GPT-4 on the HellaSwag benchmark, which is designed to evaluate common-sense reasoning by having the AI pick the most plausible way to finish a sentence that's often vague and ambiguous.
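A multiple-choice benchmark score boils down to simple accuracy: how many questions the model answered with the right letter. Here is a rough sketch in Python; the answer key and the model's predictions are made-up toy data, and real evaluation harnesses add details like prompt formatting and few-shot examples that are left out here.

```python
def accuracy(predictions, answer_key):
    """Fraction of questions where the predicted letter matches the key."""
    if len(predictions) != len(answer_key):
        raise ValueError("predictions and answer_key must be the same length")
    if not answer_key:
        return 0.0
    correct = sum(p == a for p, a in zip(predictions, answer_key))
    return correct / len(answer_key)


# Hypothetical toy data: four questions, each answered with a letter A-D.
answer_key = ["B", "D", "A", "C"]
model_predictions = ["B", "D", "C", "C"]

score = accuracy(model_predictions, answer_key)
print(f"accuracy: {score:.0%}")
```

With four answer options, random guessing lands around 25%, which is the baseline any headline number should be read against.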
Training and Infrastructure
The technical paper describes how they train this beast using version 4 and version 5e tensor processing units, with the v4 chips deployed in super pods of 4,096 chips each. Each super pod is connected to a dedicated optical switch, which can dynamically reconfigure groups of chips into 3D torus topologies, reducing latency between chips. The scale of Gemini Ultra is so large that training had to span multiple data centers, with the super pods training in parallel. The training dataset includes everything from the internet, including web pages, YouTube videos, scientific papers, and books. They filter for quality and use reinforcement learning from human feedback (RLHF) to fine-tune quality and reduce hallucinations.
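To see why a torus helps with latency, here is a small sketch in Python that compares worst-case hop counts between chips. The 16 x 16 x 16 layout is an assumption picked because it multiplies out to 4,096; the function names are hypothetical and nothing here models the actual TPU interconnect. In a torus, each axis wraps around, so a chip on one edge is directly wired to the chip on the opposite edge.

```python
from itertools import product

# Assumed layout for illustration only: 16 * 16 * 16 = 4,096 chips.
DIMS = (16, 16, 16)


def torus_neighbors(coord, dims):
    """Return the six directly connected chips, with wraparound on each axis."""
    neighbors = []
    for axis in range(3):
        for step in (-1, 1):
            shifted = list(coord)
            shifted[axis] = (shifted[axis] + step) % dims[axis]
            neighbors.append(tuple(shifted))
    return neighbors


def torus_hops(a, b, dims):
    """Minimum hops between two chips when every axis wraps around."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def mesh_hops(a, b):
    """Minimum hops in a plain 3D grid with no wraparound links."""
    return sum(abs(x - y) for x, y in zip(a, b))


origin = (0, 0, 0)
all_chips = list(product(*(range(d) for d in DIMS)))
worst_torus = max(torus_hops(origin, chip, DIMS) for chip in all_chips)
worst_mesh = max(mesh_hops(origin, chip) for chip in all_chips)

print(torus_neighbors(origin, DIMS))
print(f"worst case from a corner: torus={worst_torus} hops, mesh={worst_mesh} hops")
```

Under these assumptions, the farthest chip from a corner is 24 hops away in the torus versus 45 in a grid without wraparound, which is the kind of saving the wraparound links buy.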
Availability
Overall, Gemini looks amazing on paper, but prepare to be disappointed. The Nano and Pro models will be available on Google Cloud on December 13th, but the Gemini Ultra Pro Max won't be available until next year, after additional safety tests are done and it reaches 100% on the HellaSwag benchmark.