Top 10 AI News #weekly
Anthropic is introducing an upgraded Claude 3.5 Sonnet and a new model, Claude 3.5 Haiku. It is also introducing a new capability in beta: computer use.
Developers can now direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking, and typing text.
Genmo, an AI company focused on video generation, has released a research preview of Mochi 1, a new open-source model for generating high-quality videos from text prompts. The company claims performance comparable to, or exceeding, leading proprietary rivals such as Runway’s Gen-3 Alpha, Luma AI’s Dream Machine, Kuaishou’s Kling, and Minimax’s Hailuo.
Canadian AI image startup Ideogram, founded last year by former AI researchers from Google Brain, has made a name for itself among AI creators with its text-to-image models. They produce a wide range of styles, from realistic to fantastical, and, most impressively of all, highly accurate text baked into the image itself (something other leading image generators, including Midjourney, took a while to implement and still struggle to generate reliably).
The Stable Diffusion 3.5 (SD 3.5) family is designed to run on consumer-grade systems, even ones that are low end by some standards, making advanced image generation more accessible than ever. And yes, Stability AI has heard the complaints about the previous version, so this one promises to be a lot better.
Another important aspect of this release is the new licensing model. Stable Diffusion 3.5 comes under a more permissive license, allowing both commercial and non-commercial use. Small businesses and people who make less than $1,000,000 in revenue from the tool can use and build on these models for free.
ElevenLabs just introduced Voice Design, a new AI voice generation feature that lets you generate a unique voice from a text prompt alone.
You can describe the age, accent, tone, or character itself to generate a new and accurate AI voice in seconds. The new Voice Design is fairly easy to use, and ElevenLabs has also stated that the API will be available within a week.
Google DeepMind has been using its AI watermarking method on Gemini chatbot responses for months – and now it’s making the tool available to any AI developer.
Microsoft is introducing autonomous artificial intelligence agents, or virtual employees, that can perform tasks such as handling client queries and identifying sales leads.
The US tech company is giving customers the ability to build their own AI agents as well as releasing 10 off-the-shelf bots that can carry out a range of roles including supply chain management and customer service.
Anthropic’s Claude chatbot can now write and run JavaScript code.
Today, Anthropic launched a new analysis tool that helps Claude respond with what the company describes as “mathematically precise and reproducible answers.” With the tool enabled — it’s currently in preview — Claude can perform calculations and analyze data from files like spreadsheets and PDFs, rendering the results as interactive visualizations.
OmniParser is a general screen-parsing tool that converts UI screenshots into a structured format to improve existing LLM-based UI agents.
Google is developing “Project Jarvis,” an AI that can take over a person’s web browser to complete tasks for them.
Google is “developing artificial intelligence that takes over a person’s web browser to complete tasks such as gathering research, purchasing a product or booking a flight.”
“Project Jarvis” — in a nod to J.A.R.V.I.S. in Iron Man — would operate in Google Chrome and is a consumer-facing (rather than enterprise) feature to “automate everyday, web-based tasks.” The article doesn’t specify whether this would be for mobile or desktop.
At I/O, Pichai showed off “Gemini and Chrome working together to help you do a number of things to get ready: Organizing, reasoning, synthesizing on your behalf.” That on-stage scenario ran generically through gemini.google.com with no other UI shown, unlike the previous example, which ran through Gemini for Android.
Best Prompt(s) #weekly
Prompt from Anthropic for Claude 3.5 Sonnet
A prompt from Anthropic for the new Claude 3.5 Sonnet.
Best Open Source Alternatives to Proprietary Software #weekly
Zerox OCR
Zerox OCR is a zero-configuration OCR tool powered by GPT-4o mini that converts documents such as PDFs, Word files and images into Markdown.
Agent.exe
Agent.exe is a simple Electron app that lets Claude 3.5 Sonnet control your local computer directly.
This Week's Summary
In this edition, we explored a variety of developments in the world of artificial intelligence. Big moments include Anthropic's new "computer use" ability for its Claude AI model, allowing it to automate tasks by directly controlling computers; Ideogram's "Infinite Canvas" for manipulating and combining generated images with accurate text; Genmo's open-source Mochi 1 model for high-quality video generation; and ElevenLabs' Voice Design tool, which generates unique AI voices from text prompts. We also covered Microsoft's autonomous AI agents, Google DeepMind's AI watermarking tool, and Stable Diffusion 3.5.