The two-hour Google I/O keynote was held on May 10, and to no one’s surprise the event focused heavily on AI. AI was mentioned almost 150 times during the keynote, more than once per minute. As always, Bontouch analyzes what Google shares during I/O and sends out a newsletter with takeaways and insights to our brand partners.
This year we wanted to conduct an experiment: what if a Generative AI model could do the job instead of our professional experts? We already know Generative AI can be used to create all sorts of content, so let’s put it to the test in this real-life scenario and see how it compares.
Excerpt from the AI-generated article:
An AI-Driven Future: Google I/O 2023 Highlights
At the crux of innovation, Google sets the stage at this year's Google I/O, demonstrating the company's strides in harnessing the power of AI to improve people's lives, enhance user experiences across products, and ensure a responsible approach to development. As a trailblazer in the product innovation industry, we delve into the highlights of this game-changing conference.
AI for All
Google's latest PaLM 2 model is revolutionizing the way we use AI, powering over 25 products and features, from Gmail to Google Photos. With fine-tuned domains in security and medicine, PaLM 2 aims to make AI accessible and universally useful. Google's commitment to AI responsibility includes watermarking and metadata to identify synthetically generated content, ensuring transparency and trust.
Let’s start with the basics. This year’s newsletter took two of our senior Google experts about 12 hours to create, from watching the keynote to distilling insights, writing the newsletter, and creating or finding suitable illustrations. In less than two days it was ready to be sent out to our partners. A job well done.
The AI summary, which was made from a transcript of the keynote video, took about 2–3 hours to create using ChatGPT, with illustrations generated by Midjourney. That’s roughly a 5x time gain using AI.
What about the actual result (the entire AI summary is at the end of this post)? ChatGPT did a good job of summarizing the event, finding the main themes, and grouping the information. There were, however, drawbacks:
The grouping wasn’t spot on. Sometimes it would group topics that weren’t related, like “Project Tailwind” and “Responsible AI Tools”. Project Tailwind is a new product from Google meant to help users summarize and make sense of their own information in Google Docs, whereas responsible AI tools refers to Google’s overall effort to create AI tools that can’t be misused or cause harm.
The language of the resulting summary isn’t dynamic and engaging enough for the intended audience. It’s pretty generic and at times repetitive.
The main drawback of the AI summary is that it doesn’t provide any additional insights or takeaways. It is a plain summary of the transcript: a report, not an analysis. This is where we – humans – as experts and creatives can provide differentiating value.
The first two drawbacks above can most likely be mitigated by spending more time crafting better prompts for ChatGPT. At that point, however, the time gained from using the tool would be lost on polishing details. A better approach, at this stage in the process, is to hand the draft over to the experts to craft the end result.
“Do you need a Lo-Fi MP3 sample or a freshly composed piece of music?”
In its current state, AI can cut away large pieces of bulk work for you. Developers can get help with boilerplate code, designers can have an assistant when ideating and prototyping, and data analysts can leave tedious data transformation to an AI and focus on the actual analysis.
In the case of summarizing an event or article, ChatGPT acts effectively as a compression tool. It takes the available data as input and returns a more condensed version of the same content. In the process, you never really know what gets cut, but the model seems to have done a decent job of selecting the best pieces and presenting them in a coherent way. In a way, it’s similar to listening to an MP3 instead of going to a concert.
What ChatGPT currently lacks is the ability to understand how the audience will receive and interpret the information provided, and what consequences it might have for the future. A human expert who does these things is effectively creating new knowledge that didn’t exist before. It’s akin to composing a completely new score after having been inspired by another performance.
Which is the better solution depends solely on the user’s needs.
Anyone who’s been experimenting with long texts in, for example, ChatGPT has encountered one significant limitation of language models: the maximum amount of text you can include in a single prompt (the model’s context window). To tackle this, we chained multiple model calls together in a series of prompts. The idea was to compress the text while still maintaining its original meaning. Below you can follow the approach we took.
We started out by extracting a transcript from the video presentation. The transcript was then segmented into chunks of text based on the maximum word count the model could handle.
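As a minimal sketch (not our production code), the segmentation could look like this in Python. The word budget is a placeholder: in practice the limit is measured in tokens rather than words and varies per model, so a tokenizer-based split would be more precise.

```python
# Minimal chunking sketch. MAX_WORDS is a placeholder budget; the real
# constraint is the model's token limit, so leave headroom for instructions.
MAX_WORDS = 2500

def chunk_transcript(transcript: str, max_words: int = MAX_WORDS) -> list[str]:
    """Split a transcript into word-count-bounded chunks."""
    words = transcript.split()
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]

with open("keynote_transcript.txt") as f:  # hypothetical file name
    chunks = chunk_transcript(f.read())
```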
Before summarizing the entire transcript, we went through a context preparation step where the model was instructed to pull key takeaways from each of the created chunks. These key takeaways were later used as context for each summarization prompt.
The purpose is to provide relevant information to the model in each summarization step, since the entire text can’t be summarized in one go. An earlier chunk of text might, for example, contain information that is important when summarizing a later chunk.
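Continuing the sketch, the takeaway extraction could be done with the OpenAI Python client along these lines; the model name and prompt wording here are illustrative, not the exact ones we used.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def extract_takeaways(chunk: str) -> str:
    """Ask the model for the key takeaways of a single transcript chunk."""
    response = client.chat.completions.create(
        model="gpt-4",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "You extract the key takeaways from keynote transcript excerpts."},
            {"role": "user",
             "content": f"List the key takeaways of this excerpt:\n\n{chunk}"},
        ],
    )
    return response.choices[0].message.content

# One set of takeaways per chunk, reused as context in the next step.
takeaways = [extract_takeaways(chunk) for chunk in chunks]
```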
In this step, we create the summaries. The model is instructed to summarize each chunk, but this time we provide the context explained in the previous step, which allows the model to generate higher-quality summaries. The per-chunk summaries are then combined into one large summary.
To introduce diverse perspectives, we guided the summarization process by including specific descriptions in the prompts. For instance, we instructed the model to condense the text from the perspective of a developer. This approach aimed to improve the variety of the output.
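Combining the context and the perspective guidance, the summarization step of the sketch might look like this, reusing the client, chunks, and takeaways from above (again, the prompt text is illustrative):

```python
def summarize_chunk(chunk: str, context: str,
                    perspective: str = "developer") -> str:
    """Summarize one chunk, grounded in takeaways from the whole keynote."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": (f"You summarize keynote transcripts from the "
                         f"perspective of a {perspective}. Preserve the "
                         "original meaning of the text.")},
            {"role": "user",
             "content": (f"Key takeaways from the rest of the keynote:\n"
                         f"{context}\n\n"
                         f"Summarize the following excerpt:\n\n{chunk}")},
        ],
    )
    return response.choices[0].message.content

# Feed all takeaways in as context, then stitch the chunk summaries together.
context = "\n".join(takeaways)
combined_summary = "\n\n".join(summarize_chunk(c, context) for c in chunks)
```

Varying the perspective argument ("developer", "designer", and so on) is how the prompt descriptions above would translate into code.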
In the final step, we instructed the model to generate an article based on the compiled summaries. We provided explicit instructions regarding the desired writing style, composition, conveyed emotion, and other relevant aspects. Here’s the prompt we used (yes, we also used ChatGPT to formulate the prompt itself):
“You are a professional writer, writing about Google I/O, which is an annual developer conference held by Google in Mountain View, California. You will be provided a text about this year’s conference and should transform it into a section in an article.
Write an engaging and delightful article section about the highlights of Google I/O 2023 that is easy to understand for a wide audience. Put the analysis in the perspective of your organization as an innovative and digital product developer.
Use a writing style inspired by Ernest Hemingway, with its concise and straightforward approach, but make sure it still feels modern and reflects the tone of an innovative tech company. The article section should have good pacing, with well-structured sections that are neither too long nor too short. Make sure to keep the content concise, avoiding unnecessary fluff or filler such as buzzwords and business jargon. It should evoke emotion and engagement, providing readers with an enjoyable reading experience.”
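To close out the sketch, that prompt can be passed to the model together with the combined summary. How the prompt is split between the system and user messages here is our illustration, not a requirement:

```python
# The full prompt quoted above goes here, abbreviated for readability.
ARTICLE_PROMPT = """You are a professional writer, writing about Google I/O, ..."""

def write_article(summary: str) -> str:
    """Turn the combined chunk summaries into an article section."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": ARTICLE_PROMPT},
            {"role": "user", "content": summary},
        ],
    )
    return response.choices[0].message.content

article = write_article(combined_summary)
```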
While summarization can be seen as a valuable breakdown of information, automatically generating articles based on third-party content may feel like appropriating someone else’s work.
Among other things, it raises the question of whether the value lies not only in the end result but also in the effort required to produce the content in question. Where do you draw the line, and how will rules and regulations adapt to deal with this? That’s a topic for another day.
__________________________________________________________________________
(Article created by ChatGPT, images by Midjourney)
At the crux of innovation, Google sets the stage at this year’s Google I/O, demonstrating the company’s strides in harnessing the power of AI to improve people’s lives, enhance user experiences across products, and ensure a responsible approach to development. As a trailblazer in the product innovation industry, we delve into the highlights of this game-changing conference.
Google’s latest PaLM 2 model is revolutionizing the way we use AI, powering over 25 products and features, from Gmail to Google Photos. With fine-tuned domains in security and medicine, PaLM 2 aims to make AI accessible and universally useful. Google’s commitment to AI responsibility includes watermarking and metadata to identify synthetically generated content, ensuring transparency and trust.
Google is taking significant strides in integrating generative AI technology into its products. Gmail’s “Help me write” feature streamlines email drafting, while Google Maps’ Immersive View offers a bird’s eye overview of journeys, complete with traffic and weather data. Google Photos’ Magic Editor employs semantic understanding and generative AI to perform photo editing tasks like removing distractions and repositioning subjects.
The advanced PaLM 2 model will power over 25 products, aiming to make AI helpful for everyone and fulfill Google’s mission of organizing the world’s information for universal accessibility. With fine-tuning in specific domains, PaLM 2 facilitates global developer collaboration and emphasizes AI responsibility, including watermarking and metadata for synthetically generated content.
Google’s conversational AI experiment, Bard, now supports more programming languages and collaborates on tasks such as code generation, debugging, and code snippet explanations. With code citations, visual responses, and Google Lens integration, Bard is set to become a vital tool for developers.
Google’s next-generation foundation model, Gemini, is a multi-modal and highly efficient tool for API integrations. Expected to be as versatile as PaLM 2, Gemini showcases Google’s deep commitment to AI responsibility. Duet AI for Workspace, on the other hand, brings generative AI features to users, aiding collaborative writing with contextual prompts.
Generative AI is also transforming Google Search, providing comprehensive, user-friendly search experiences. The AI-powered snapshot feature and conversational mode enable quick topic overviews and seamless shopping. Vertex AI lets businesses build generative applications, while Enterprise Search retrieves fresh, factual information from various sources.
Google introduces three new models in Vertex: Imagine for image generation, Codey for code completion, and Chirp for universal speech. Reinforcement Learning From Human Feedback will enable fine-tuning of pre-trained models with human input. Google’s AI-optimized infrastructure accelerates large-scale training workloads and reduces costs, backed by NVIDIA’s H100 GPU-based A3 Virtual Machines.
Project Tailwind, an AI-first Notebook, facilitates rapid learning with personalized, private AI models. Google also presents tools to combat misinformation, like the “About this Image” feature in Search and metadata for AI-generated images. The experimental AI video dubbing service, Universal Translator, incorporates guardrails to prevent misuse.
AI technology is elevating experiences on tablets and watches, enabling seamless device integration through features like Fast Pair and Nearby Share. The updated Find My Device experience supports a broader range of devices, while Android 14 introduces new customization options for lock screens and wallpapers.
Google’s Tensor G2 chip brings cutting-edge AI research to Pixel devices, combining on-device intelligence with cloud-based AI for personalized experiences. The Pixel 7a and Pixel Tablet, powered by the Tensor G2 chip, boast upgraded camera hardware, AI-driven features, and smart home integration.
The Pixel Fold, a foldable phone that transforms into a compact tablet, offers a versatile form factor, flagship camera system, and fluid app experience across both screens. Built to last with a durable hinge and Corning Gorilla Glass Victus, the Pixel Fold is priced at $1799 and features Personal AI capabilities like Dual Screen Interpreter Mode for live translations. The Pixel Tablet is available for pre-order at $499, with a free $129 charging speaker dock included.
Google’s dedication to making AI helpful for everyone and unlocking vast opportunities with the developer community is evident in the innovations presented at Google I/O 2023, setting the stage for a technologically advanced and responsible future.
Google I/O 2023 has provided us with a wealth of inspiration and cutting-edge technology that we, as product developers, can leverage to enhance our digital products. From AI-driven hardware experiences to responsible implementation, the possibilities are vast and exciting.
As we move forward, we will continue to embrace the power of AI, unlocking its potential to revolutionize our digital products and improve the lives of our users. Together, we embark on a journey towards a future where AI is helpful, accessible, and responsible for all.