ISSUE #2404

From Anthropic's Computer Control to Ideogram's Infinite Canvas, This Week is Filled with Key Insights You Won't Want to Miss

Stay informed with the latest in AI! This week, we highlight a range of exciting advancements, from AI-powered computer control to cutting-edge text-to-image generation and voice synthesis.

Top 10 AI News #weekly

Anthropic is introducing an upgraded Claude 3.5 Sonnet and a new model, Claude 3.5 Haiku. It is also introducing a new capability in beta: computer use.

Developers can now direct Claude to use computers the way people do—by looking at a screen, moving a cursor, clicking, and typing text.

Further Reading

Genmo, an AI company focused on video generation, has announced the release of a research preview for Mochi 1, a new open-source model for generating high-quality videos from text prompts — and claims performance comparable to, or exceeding, leading closed-source/proprietary rivals such as Runway’s Gen-3 Alpha, Luma AI’s Dream Machine, Kuaishou’s Kling, Minimax’s Hailuo, and many others.

Further Reading

Canadian AI image startup Ideogram, founded last year by former AI researchers from Google Brain, has made a name for itself among AI creators with its text-to-image models, which produce a wide range of styles from realistic to fantastical and, most impressively of all, highly accurate text baked into the image itself (something other leading image generators, including Midjourney, took a while to implement and still struggle to generate reliably).

Further Reading

Stability AI's Stable Diffusion 3.5 (SD 3.5) family is designed to run on consumer-grade systems, even ones that are low end by some standards, making advanced image generation more accessible than ever. And yes, the company has heard the complaints about the previous version, so this one promises to be a lot better.

Another important aspect of this release is the new licensing model. Stable Diffusion 3.5 comes under a more permissive license, allowing both commercial and non-commercial use. Small businesses and people who make less than $1,000,000 in revenue from the tool can use and build on these models for free.

Further Reading

ElevenLabs just introduced Voice Design, a new AI voice generation feature that lets you generate a unique voice from a text prompt alone.

You can describe the age, accent, tone, or character of the voice to generate a new, matching AI voice in seconds. Voice Design is fairly easy to use, and ElevenLabs has also stated that the API will be available in one week.

Further Reading

Google DeepMind has been using its AI watermarking method on Gemini chatbot responses for months – and now it’s making the tool available to any AI developer.

Further Reading

Microsoft is introducing autonomous artificial intelligence agents, or virtual employees, that can perform tasks such as handling client queries and identifying sales leads.

The US tech company is giving customers the ability to build their own AI agents as well as releasing 10 off-the-shelf bots that can carry out a range of roles including supply chain management and customer service.

Further Reading

Anthropic’s Claude chatbot can now write and run JavaScript code.

Today, Anthropic launched a new analysis tool that helps Claude respond with what the company describes as “mathematically precise and reproducible answers.” With the tool enabled — it’s currently in preview — Claude can perform calculations and analyze data from files like spreadsheets and PDFs, rendering the results as interactive visualizations.

Further Reading

OmniParser is a general screen-parsing tool that converts UI screenshots into a structured format in order to improve existing LLM-based UI agents.

Further Reading

Google is developing “Project Jarvis,” an AI that can take over a person’s web browser to complete tasks such as gathering research, purchasing a product or booking a flight.

Google is “developing artificial intelligence that takes over a person’s web browser to complete tasks such as gathering research, purchasing a product or booking a flight.” 

“Project Jarvis” — in a nod to J.A.R.V.I.S. in Iron Man — would operate in Google Chrome and is a consumer-facing (rather than enterprise) feature to “automate everyday, web-based tasks.” The article doesn’t specify whether this would be for mobile or desktop.

At I/O, Pichai showed off “Gemini and Chrome working together to help you do a number of things to get ready: Organizing, reasoning, synthesizing on your behalf.” That on-stage scenario ran generically through gemini.google.com, with no other UI shown, whereas the previous example ran through Gemini for Android.

Further Reading

Best Prompt(s) #weekly

Prompt from Anthropic Claude Sonnet 3.5

Prompt from Anthropic for the new Claude Sonnet 3.5.

Read More

Best Open Source Alternatives to Proprietary Software #weekly

Zerox OCR

Zerox OCR is a zero-configuration OCR tool powered by GPT-4o mini that can easily convert documents such as PDFs, Word files and images into Markdown.

Read More

Agent.exe

Agent.exe is a simple Electron app that lets Claude 3.5 Sonnet control your local computer directly.

Read More

This Week's Summary

In this edition, we have explored a variety of developments in the world of artificial intelligence. Big moments include Anthropic's new "computer use" ability for its Claude AI model, allowing it to automate tasks by directly controlling computers; Ideogram's "Infinite Canvas" for manipulating and combining generated images with accurate text; Genmo's open-source Mochi 1 model for high-quality video generation; and ElevenLabs' innovative Voice Design tool that can generate personalized AI voices from text prompts. We also cover news on Microsoft's AI employees, Google's AI watermarking tool, and updates to Stable Diffusion 3.5.

Our Rating System

To maintain objectivity and fairness, our news or project selection has not been influenced by any advertisers.