Google has emerged as a leading force in artificial intelligence (AI) and chatbot technology, with its products competing against the likes of Claude and ChatGPT. The company is currently embracing its “Gemini era” after rebranding its chatbot, formerly known as Bard. In typical Google fashion, it has also applied its family of multimodal AI models to many of its other products.
Here’s what we know about Google Gemini.
What is Google Gemini?
Google Gemini came onto the AI scene in February this year and quickly made waves. But it was the release of Gemini Live at the “Made by Google” event in August that truly captured attention. ReadWrite reported that Gemini Live brings conversational AI directly to Android phones, which allows users to talk about complex topics in real time using voice instead of typing—a much more natural and interactive experience.
At its core, Gemini is Google’s large language model (LLM), which powers a range of AI tools similar to those you may have seen, like OpenAI’s ChatGPT. Just as OpenAI’s GPT-4 model fuels ChatGPT and ChatGPT Plus, Gemini powers Google’s AI chatbot and tools. However, Gemini is more than just an AI model: it’s also the new identity for Google’s chatbot, previously called Bard. This rebranding simplifies things by unifying the model and chatbot under the Gemini name.
Gemini Live is coming to 40+ languages! It's rolling out to Android devices over the next few weeks, starting with French, German, Portuguese, Hindi, and Spanish, with many more on the way.
Soon, you'll be able to communicate, collaborate, and get creative in even more… pic.twitter.com/yP4YVFwFCr
— Google Gemini App (@GeminiApp) October 3, 2024
So, what can Gemini do? It can answer questions, summarize text, write code, translate, and create images (on mobile, not in the free browser version). Google has also built Imagen 3, its response to Midjourney, and has been bringing it into Gemini for even more creative power.
Image generation with Imagen 3 is now available to all Gemini users around the world.
Imagen 3 is our highest quality image generation model yet and brings an even higher degree of photorealism, better instruction following, and fewer distracting artifacts than ever before. pic.twitter.com/E8CrcyFcz5
— Google Gemini App (@GeminiApp) October 9, 2024
Beyond being a conversational tool, Gemini is also integrated across various Google apps, adding intelligent features to Google Workspace tools like Gmail, Google Docs, and other productivity apps for paying users.
Enable extensions to get more out of Gemini.
Connect the dots across your @Google universe and use Gemini to find info in your Gmail, summarize lengthy Docs, and more.
See below to learn how ⬇️ #GeminiProTip pic.twitter.com/RAhigHbIpo
— Google Gemini App (@GeminiApp) October 22, 2024
Developers can even incorporate Gemini’s capabilities into their own applications. Gemini could eventually replace Google Assistant, possibly offering an improved, AI-powered assistant that interacts seamlessly with Google’s ecosystem.
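For developers curious what that integration might look like, here is a minimal sketch that only builds a text request for the Gemini API's REST interface, without sending it. The endpoint path, the model name `gemini-1.5-flash`, and the helper `build_generate_request` are assumptions for illustration and should be checked against Google's current API documentation.

```python
import json

# Assumed base URL for the Gemini API's REST interface; verify against current docs.
BASE_URL = "https://generativelanguage.googleapis.com/v1beta"


def build_generate_request(prompt: str, model: str = "gemini-1.5-flash") -> tuple[str, str]:
    """Return the (url, json_body) pair for a single text prompt.

    This hypothetical helper only prepares the request; it does not send it.
    """
    url = f"{BASE_URL}/models/{model}:generateContent"
    # A prompt is wrapped as a list of "contents", each holding a list of "parts".
    body = json.dumps({"contents": [{"parts": [{"text": prompt}]}]})
    return url, body


if __name__ == "__main__":
    url, body = build_generate_request("Summarize this email thread in two sentences.")
    print(url)
    print(body)
```

An application would send that body as an HTTP POST along with an API key, then read the generated text from the JSON response.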
How does it compare to ChatGPT?
Google has shared some interesting insights into how its Gemini model works. Like many leading AI models, Gemini uses a transformer architecture and applies both pretraining and fine-tuning techniques. However, what makes Gemini unique is that it was trained on multiple types of media (text, images, audio, and video) all at once, rather than focusing on each individually.
This approach aims to give Gemini a more nuanced understanding of language and context. Imagine a phrase like “small talk.” If an AI is simply trained to associate images of “small” and “talk,” it might take it literally, generating an image of short people conversing. But because Gemini’s training integrates language and visuals simultaneously, it should grasp the playful undertones of “small talk.”
This multimodal training helps Gemini “seamlessly understand and reason about all kinds of inputs from the ground up.” It can, for example, read charts alongside captions, interpret signs, and blend information across text, images, and more. While these features were innovative when Gemini launched, other models, like Claude 3.5 and GPT-4o, now have similar multimodal capabilities.
Another major feature of Gemini is its long context window. With Gemini 1.5 Pro, you can include up to two million tokens in a single prompt, accommodating extensive documents, databases, and complex contracts. This is particularly handy if you’re working with large text resources or building a retrieval-augmented generation (RAG) pipeline—though costs could add up if you use the full capacity regularly.
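To get a feel for what a two-million-token window means in practice, here is a rough planning sketch. It assumes about four characters per token, which is only a common rule of thumb, and uses a placeholder price that is not Google's actual rate. The function names are hypothetical.

```python
CONTEXT_WINDOW_TOKENS = 2_000_000  # Gemini 1.5 Pro limit cited above
CHARS_PER_TOKEN = 4                # assumption: crude heuristic, real tokenizers vary
HYPOTHETICAL_USD_PER_MILLION_TOKENS = 1.00  # placeholder, not a real price


def estimate_tokens(text: str) -> int:
    """Approximate a token count from character length, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def plan_prompt(documents: list[str]) -> dict:
    """Estimate total tokens, whether they fit the window, and a rough input cost."""
    total = sum(estimate_tokens(doc) for doc in documents)
    return {
        "estimated_tokens": total,
        "fits_in_window": total <= CONTEXT_WINDOW_TOKENS,
        "estimated_input_cost_usd": total / 1_000_000 * HYPOTHETICAL_USD_PER_MILLION_TOKENS,
    }


if __name__ == "__main__":
    # Two stand-in documents of 3 million and 1 million characters.
    docs = ["x" * 3_000_000, "y" * 1_000_000]
    print(plan_prompt(docs))
```

Because the cost scales with every token sent, a RAG pipeline that retrieves only the relevant passages can be much cheaper than filling the whole window on each request.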
In terms of performance, benchmarks show that Gemini 1.5 Pro is slightly behind the top models like GPT-4o and Claude 3.5 Sonnet but on par with models like Llama 3 70B. The lighter version, Gemini 1.5 Flash, is comparable to GPT-4o Mini and Claude 3 Haiku, making it a solid option among mid-range models.
Is Google Gemini free?
There’s now a free Gemini app for Android, which may even replace Google Assistant on your phone if you like. iPhone users can find Gemini in the Google app, and it’s accessible to everyone through any web browser.
In addition to the free version, Google offers a premium option called Gemini Advanced. This subscription, part of the Google One AI Premium plan, gives access to a more powerful model, Gemini Ultra. Subscribers get extra perks, like using Gemini Live on mobile—a hands-free, voice-controlled AI experience for Android. So, whether you’re using the free version or the upgraded one, there are plenty of ways to access Gemini across devices.
What is Gemini in Google Messages?
Google’s focus with Gemini has been on integrating it into productivity apps like Docs and Gmail, but now it’s made its way into Google Messages—an app most Android users rely on daily. Originally announced at I/O 2024, Gemini in Messages makes it easy to get AI help with everything from drafting texts to planning your weekend.
Before you can start chatting with Gemini in Messages, you’ll need to meet a few requirements: you should be 18 or older, have RCS chats enabled, use a personal Google Account, have an Android phone with at least 6GB of RAM, and be set to either English (in supported countries) or French (Canada).
Once you’re set, here’s how to chat with Gemini:
- Open Google Messages
- Tap “Start chat” in the bottom right corner
- Select Gemini at the top as the contact
- Pick a sample prompt or type your request
- Chat until you get the text or image you need
Gemini is also behind Magic Compose, a feature Google introduced in 2023 to help you rewrite and tweak message styles. While Magic Compose can adjust your messages in a few ways, its flexibility is more limited than a full chat with Gemini.
While Gemini in Messages means you don’t have to switch to the dedicated Gemini app or set it as your default assistant, it’s not quite the full experience. Responses are formatted like texts, which can lead to a few hiccups. For now, it’s a convenient tool for quick ideas and responses, even if it lacks some of the versatility you’d find in other Gemini-powered apps.
Is it any good?
Google Gemini is holding its own in the AI race, especially with its strong multimodal abilities and seamless integration across Google’s apps. Meanwhile, ChatGPT is making strides with its new SearchGPT feature, which provides real-time data access for the first time.
Google, however, has a significant advantage in its extensive search index, covering hundreds of billions of pages—a strong foundation for its reliability. It’s also reportedly working on a new AI tool, codenamed “Project Jarvis,” designed to operate a web browser for managing daily tasks.
The project may be previewed as early as December, along with Google’s next flagship Gemini model, expected to power Jarvis. If successful, it could leapfrog the other models in AI capabilities, but we’ll have to wait and see how it performs.
Featured image: Google / Canva