TalkBack can read images even if your phone is offline – thanks to the on-device Gemini Nano

TalkBack, the indispensable Android feature for people who are blind or have low vision, is getting a lot more useful, and more powerful, thanks to the Gemini Nano with multimodality model.

The Android Developers Blog has an extensive post in which the team details the latest enhancement to the screen reader, which is part of the Android Accessibility Suite.

Today, thanks to Gemini Nano with multimodality, TalkBack automatically provides users with blindness or low vision more vivid and detailed image descriptions to better understand the images on their screen.

– Android Developers Blog, September 2024

TalkBack includes a feature that provides image descriptions when developers haven’t added descriptive alt text. Previously, this feature relied on a small machine learning model called Garcon, which generated brief and generic responses, often lacking specific details like landmarks or products. The introduction of Gemini Nano with multimodal capabilities presented an ideal opportunity to enhance TalkBack’s accessibility features. Now, when users opt in on eligible devices, TalkBack uses Gemini Nano’s multimodal capabilities to automatically deliver clear and detailed image descriptions in apps like Google Photos and Chrome, even when the device is offline or on an unstable network connection.

Google’s team provides an example that illustrates how Gemini Nano improves image descriptions. Presented with a panorama of the Sydney, Australia shoreline at night, Garcon might read: “Full moon over the ocean”. Gemini Nano with multimodality, however, can paint a richer picture, with a description like: “A panoramic view of Sydney Opera House and the Sydney Harbour Bridge from the north shore of Sydney, New South Wales, Australia”. Sounds far better, right?

Utilizing an on-device model like Gemini Nano was the only practical solution for TalkBack to automatically generate detailed image descriptions, even when the device is offline.

The average TalkBack user comes across 90 unlabeled images per day, and those images weren’t as accessible before this new feature. The feature has gained positive user feedback, with early testers writing that the new image descriptions are a “game changer” and that it’s “wonderful” to have detailed image descriptions built into TalkBack.

– Lisie Lillianfeld, product manager at Google

When implementing Gemini Nano with multimodality, the Android accessibility team had to choose between inference verbosity and speed, a decision partly influenced by image resolution. Gemini Nano currently supports images at either 512 pixels or 768 pixels.

While the 512-pixel resolution generates the first token almost two seconds faster than the 768-pixel option, the resulting descriptions are less detailed. The team ultimately prioritized providing longer, more detailed descriptions, even at the cost of increased latency. To reduce the impact of this delay on the user experience, the tokens are streamed directly to the text-to-speech system, allowing users to begin hearing the response before the entire text is generated.
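To make the streaming idea concrete, here is a minimal Python sketch of how tokens could be handed to a speech engine in small chunks as they arrive. This is not TalkBack's actual code: `chunk_for_speech`, `stream_description`, the `speak` callback, the phrase-boundary rule, and the 80-character cap are all assumptions made up for illustration.

```python
import re
from typing import Callable, Iterable, Iterator

# Assumption: flush the buffer whenever it ends at a phrase boundary.
_BOUNDARY = re.compile(r"[.!?,;:]\s*$")


def chunk_for_speech(tokens: Iterable[str], max_chars: int = 80) -> Iterator[str]:
    """Group streamed tokens into speakable chunks without waiting for the end."""
    buffer = ""
    for token in tokens:
        buffer += token
        # Emit as soon as we hit punctuation or the buffer grows too long.
        if _BOUNDARY.search(buffer) or len(buffer) >= max_chars:
            chunk = buffer.strip()
            if chunk:
                yield chunk
            buffer = ""
    # Emit whatever is left once the model stops generating.
    tail = buffer.strip()
    if tail:
        yield tail


def stream_description(tokens: Iterable[str], speak: Callable[[str], None]) -> None:
    """Pass each chunk to a (hypothetical) text-to-speech callback as it is ready."""
    for chunk in chunk_for_speech(tokens):
        speak(chunk)


if __name__ == "__main__":
    # Stand-in for a model's token stream; a real model would yield these over time.
    demo_tokens = ["A panoramic", " view", " of Sydney", " Opera House,", " seen at", " night."]
    stream_description(demo_tokens, print)
```

Because `chunk_for_speech` is a generator, the first chunk can be spoken while later tokens are still being produced, which is the effect the team describes: the listener waits for the first phrase, not the whole description.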

While I’m not yet fully boarding the AI hype train, AI-powered features like this are stunning – just think about the potential! And then, there are stories like this one that make you want to tone down this “wonderful” progress of ours:
