This Week in AI | Week 10


You are reading the latest edition of This Week in AI, where we bring you the most recent advancements and updates in artificial intelligence. Week 10 brought major launches and developments from organisations including Meta, Microsoft, and Hugging Face. Here are the highlights from this week in the AI industry.


Meta’s Llama 3: Open Large Language Model

Meta has released Llama 3, its latest flagship large language model, initially in 8B and 70B parameter sizes. Despite ongoing debate about how “open” it really is, Llama 3 is one of the most prominent openly available models in the community.

Key Features of Llama 3:

  • Multimodal Capabilities: Meta has said it plans to add multimodal capabilities in future Llama 3 releases; the initial 8B and 70B models accept text only.
  • Scale: Meta reports pretraining Llama 3 on over 15 trillion tokens and is still training larger versions of the model.
  • Community Engagement: Its openly available weights make Llama 3 a focal point for collaboration and innovation within the AI community.

Other Model Releases This Week:

  • Mixtral 8×22B: A sparse “mixture of experts” model from French startup Mistral AI that activates only about 39B of its 141B total parameters for each token.
  • Stable Diffusion 3 Turbo: A faster variant of Stability AI’s Stable Diffusion 3 image-generation model.
  • Adobe Acrobat AI Assistant: An assistant for asking questions about and summarising documents in Acrobat, possibly powered by ChatGPT.
  • Reka Core: Built by Reka, a small team of researchers who previously worked at major AI labs, Reka Core is a multimodal model that competes with established models on the market.
  • Idefics2: Hugging Face’s latest multimodal model, emphasising openness and enhanced capabilities in text and image processing.

These releases show how quickly the field is expanding, with models targeting a wide range of applications and user needs.
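For readers unfamiliar with the “mixture of experts” design mentioned for Mixtral 8×22B, the toy Python sketch below illustrates the core idea: a router scores every expert for each token, only the top two experts actually run, and their outputs are mixed using softmax weights computed over the selected scores. The 8-expert, top-2 setup is an assumption chosen to mirror Mistral’s Mixtral design; the expert functions, dimensions, and random weights are hypothetical stand-ins. This is not Mistral’s implementation, and real MoE layers use learned feed-forward networks inside a transformer.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def make_expert(seed, dim):
    """Hypothetical expert: a fixed random linear map (dim x dim)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

    def expert(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

    return expert


def moe_forward(x, router_weights, experts, top_k=2):
    # Router: one score per expert (dot product of the token with that expert's router vector).
    logits = [sum(w * xi for w, xi in zip(rw, x)) for rw in router_weights]
    # Keep only the top_k highest-scoring experts for this token.
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Gate weights: softmax over the selected experts' scores only.
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for g, i in zip(gates, top):
        y = experts[i](x)  # only the selected experts are evaluated
        out = [o + g * yi for o, yi in zip(out, y)]
    return out, top


# Usage: 8 hypothetical experts, one 4-dimensional token.
DIM, NUM_EXPERTS = 4, 8
experts = [make_expert(seed, DIM) for seed in range(NUM_EXPERTS)]
rng = random.Random(42)
router = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(NUM_EXPERTS)]
token = [0.5, -1.0, 0.25, 2.0]
output, chosen = moe_forward(token, router, experts)
# output has the same dimension as the token; chosen lists the two experts that ran.
print(len(output), chosen)
```

Because only two of the eight experts run per token, compute per token scales with the active experts rather than the total parameter count, which is why sparse models can be large yet comparatively cheap to run.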


Microsoft’s VASA-1: Lifelike Audio-Driven Talking Faces

Microsoft presented VASA-1, a framework for generating hyper-realistic talking-face videos from a single portrait photo and a speech audio clip. The model produces precise lip-audio synchronisation, lifelike facial behaviour, and natural head movements, all in real time.

Key Highlights of VASA-1:

  • Real-time Generation: VASA-1 generates lifelike talking-face videos in real time (Microsoft reports 512×512 video at up to 40 FPS), enabling live interaction with virtual characters.
  • Visual Affective Skills: The framework captures a broad spectrum of facial nuances, enhancing the perceived authenticity and liveliness of the generated content.
  • Controllability: VASA-1 offers optional control over generation, including eye-gaze direction, head distance, and emotional expression.

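While VASA-1 holds great potential for interactive applications and virtual communication, Microsoft emphasises responsible AI practices to mitigate misuse and has said it will not release a demo, API, or product until it is confident the technology will be used responsibly.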


Hugging Face Introduces Idefics2

Hugging Face introduces Idefics2, a powerful 8B multimodal model that handles text and images, with capabilities such as image description, visual question answering, and document analysis.

Key Features of Idefics2:

  • Enhanced Capabilities: With 8 billion parameters and an open Apache 2.0 licence, Idefics2 is a major advance for open multimodal AI research.
  • Top-tier Performance: Idefics2 achieves competitive results on Visual Question Answering benchmarks, performing at the top of its size class and rivalling larger models.
  • Integration with Hugging Face Transformers: Idefics2 is integrated into the Hugging Face 🤗 Transformers library, making it straightforward to fine-tune for a variety of multimodal applications.

Thanks to its careful training and community-driven development, Idefics2 promises to be a useful tool for both researchers and practitioners.



Conclusion

The latest releases, including Llama 3 and the other models above, VASA-1, and Idefics2, highlight ongoing advances in open and multimodal AI, showcasing improved capabilities and new frameworks for a range of applications. Together they represent significant progress in the field and give researchers and practitioners promising opportunities to explore openly available AI technology.

Neha
April 21, 2024