The race to become your go-to AI-powered search engine is heating up, while Claude's latest update lets it control your computer, and recent breakthroughs in scent teleportation and tactile sensing demonstrate the expanding influence of AI across scientific domains.
Join us for an overview of the latest news in AI R&D, the greatest open-source model releases in the past weeks, and a curated list of the month's top 10 trending research papers.
🎉 Catch us live on December 6th for the year-end special edition of our webinar.
News Articles
Model Releases
Anthropic: Claude 3.5 Sonnet & Haiku
DeepSeek: Janus
Zyphra: Zamba 2
Genmo: Mochi 1 Preview
Decart AI & Etched: Oasis
Microsoft: Omniparser
Stability AI: Stable Diffusion 3.5
Recraft: Recraft V3
Black Forest Labs: Flux 1.1 [pro] Ultra
Cohere For AI: Aya Expanse
HuggingFace: SmolLM 2
UsefulSensors: Moonshine
Trending AI papers for November 2024
[1] TapeAgents: a Holistic Framework for Agent Development and Optimization - D. Bahdanau et al. (ServiceNow) - 16 October 2024
→ A novel framework for agent development and optimization based on tapes (structured logs) for session persistence, debugging, and data-driven optimization.
🤔 Why? To provide AI practitioners with holistic tooling support for all stages of the agent lifecycle.
💡 Key Results:
LLM agents are resumable, modular state machines that read and write the tape.
The tape serves as the memory, the blackboard, and the orchestrator queue.
Supports the creation of various agents and finetuning both prompts and LLM weights.
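To make the "agent as a state machine over a tape" idea concrete, here is a minimal sketch in plain Python. It does not use the TapeAgents library API; `Step`, `Tape`, `toy_agent`, `toy_environment`, and `run` are hypothetical names chosen for illustration, and the rule-based agent stands in for an LLM call.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One tape entry: who wrote it, what kind of step it is, and its content."""
    author: str
    kind: str
    content: str


@dataclass
class Tape:
    """Append-only structured log shared by the agent and its environment."""
    steps: list = field(default_factory=list)

    def append(self, step: Step) -> None:
        self.steps.append(step)


def toy_agent(tape: Tape) -> Step:
    """Pick the next step from the tape alone, so a run can resume from any saved tape.

    A real agent would call an LLM here; this stand-in uses fixed rules.
    """
    last = tape.steps[-1]
    if last.kind == "user_message":
        return Step("agent", "tool_call", f"search({last.content!r})")
    if last.kind == "tool_result":
        return Step("agent", "final_answer", f"Based on the search: {last.content}")
    raise ValueError(f"no transition defined for step kind {last.kind!r}")


def toy_environment(step: Step) -> Step:
    """Hypothetical environment that answers every tool call with a canned result."""
    return Step("environment", "tool_result", "3 relevant documents found")


def run(tape: Tape, max_steps: int = 10) -> Tape:
    """Alternate agent and environment until the agent emits a final answer."""
    for _ in range(max_steps):
        step = toy_agent(tape)
        tape.append(step)
        if step.kind == "final_answer":
            break
        tape.append(toy_environment(step))
    return tape


tape = run(Tape([Step("user", "user_message", "papers on model merging")]))
kinds = [s.kind for s in tape.steps]
```

Because the agent reads only the tape, passing a truncated copy of `tape.steps` back into `run` resumes the session from that point, and the same log can be inspected for debugging or reused as training data.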
[2] Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free - Z. Li & T. Zhou (UMD) - 14 October 2024
→ The routing weights (RW) in MoEs can be used as embedding models without any fine-tuning.
🤔 Why? RWs better capture the underlying themes of the input, while hidden states (HS) are designed to predict the next output token.
💡 Key Results:
MoE Embedding models outperform embedding models based solely on HS.
RWs are more robust to prompt variations and focus on higher-level semantics.
A weighted sum of the RW-based and HS-based similarity scores outperforms concatenating the two embeddings.
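The difference between the two ways of combining RW and HS can be sketched with toy vectors. This is an illustration under assumptions: the function names, the mixing weight `alpha`, and the exact form `sim_HS + alpha * sim_RW` are our own choices and may differ from the paper's formulation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def weighted_sum_similarity(hs_a, rw_a, hs_b, rw_b, alpha=0.5):
    """Score HS and RW separately, then mix the two similarity scores."""
    return cosine(hs_a, hs_b) + alpha * cosine(rw_a, rw_b)


def concat_similarity(hs_a, rw_a, hs_b, rw_b):
    """Concatenate HS and RW into one vector per input, then take a single cosine."""
    return cosine(hs_a + rw_a, hs_b + rw_b)


# Toy inputs: the hidden states disagree, the routing weights agree.
hs_a, hs_b = [1.0, 0.0], [0.0, 1.0]
rw_a, rw_b = [0.5, 0.5], [0.5, 0.5]

summed = weighted_sum_similarity(hs_a, rw_a, hs_b, rw_b, alpha=0.5)
concatenated = concat_similarity(hs_a, rw_a, hs_b, rw_b)
```

In the concatenated version the vector with the larger norm dominates the score, whereas the weighted sum lets each signal contribute on its own scale, controlled by `alpha`.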
[3] MM-Embed: Universal Multimodal Retrieval with Multimodal LLMs - S. Lin et al. (NVIDIA) - 04 November 2024
→ MM-Embed is a multimodal retrieval framework that finetunes multimodal LLMs as bi-encoders (LLaVa-NeXT + NV-Embed-v1).
🤔 Why? More and more real-world retrieval needs span multiple modalities (such as tables, figures & complex PDF pages).
💡 Key Results:
Naive fine-tuning on multimodal retrieval data underperforms CLIP in cross-modal retrieval because of modality bias.
Modality-aware negative mining fixes this, achieving SOTA on multimodal retrieval while preserving text-only skills.
MM-Embed can also be used as a 0-shot multimodal re-ranker (+7 pts. mAP@5).
[4] HyQE: Ranking Contexts with Hypothetical Query Embeddings - W. Zhou et al. (Intuit) - 19 October 2024
→ HyQE creates document representations by using hypothetical queries generated by LLMs. It is the inverse of HyDE.
🤔 Why? Even with finetuning, it is hard to bring queries and documents into the same parts of the vector space.
💡 Key Results:
HyQE outperforms baseline embedding models in terms of NDCG@10 across various datasets.
HyQE is compatible with methods like HyDE, and further improves the ranking quality beyond using HyDE alone.
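The core idea, comparing the user query against queries that each context could plausibly answer, can be sketched as follows. Everything here is an assumption made for illustration: `embed` is a toy bag-of-words stand-in for a real embedding model, `HYPOTHETICAL_QUERIES` is hard-coded where an LLM would generate the queries, and scoring by the best-matching hypothetical query is one simple choice that may differ from the paper's scoring function.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words embedding; a real system would use a neural embedding model."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


# Hypothetical queries per context: hard-coded stand-ins for LLM generations.
HYPOTHETICAL_QUERIES = {
    "doc_merging": ["how does model merging work", "what is task arithmetic"],
    "doc_retrieval": ["how to rank contexts for a query", "what is dense retrieval"],
}


def rank_contexts(query, hypothetical_queries):
    """Rank contexts by the best match between the query and their hypothetical queries."""
    q = embed(query)
    scores = {
        doc_id: max(cosine(q, embed(hq)) for hq in hqs)
        for doc_id, hqs in hypothetical_queries.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


ranking = rank_contexts("how does task arithmetic work", HYPOTHETICAL_QUERIES)
```

Since both sides of the comparison are queries, the similarity is computed between texts of the same kind, which is the motivation stated above.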
[5] Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval - S. Hsu et al. (Stanford) - 30 October 2024
→ Learning to Retrieve by Trying (LeReT) is an RL-based framework that improves the grounding of LLM outputs by training the search query generator using preference optimization.
🤔 Why? It enables accurate retrieval to tackle complex queries that require multi-hop QA.
💡 Key Results:
Evaluated on HotpotQA and HoVer, both multi-hop datasets based on Wikipedia.
LeReT outperforms few-shot prompting with up to 29% improvement in recall and 17% in downstream accuracy.
Applying LeReT iteratively leads to further performance gains.
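One way to picture the "learning by trying" step is to sample several search queries, score each by how many gold documents it retrieves, and keep (preferred, rejected) pairs for preference optimization. The sketch below covers only that data-construction step; the function names and the toy candidates are hypothetical, and the retrieval reward used in the paper may be defined differently.

```python
from itertools import combinations


def recall(retrieved, gold):
    """Fraction of gold documents that appear in the retrieved list."""
    return len(set(retrieved) & set(gold)) / len(gold)


def build_preference_pairs(candidates, gold):
    """Turn scored query candidates into (preferred, rejected) pairs.

    `candidates` maps each sampled search query to the documents it retrieved.
    Pairs with equal reward carry no preference signal and are skipped.
    """
    scored = [(query, recall(docs, gold)) for query, docs in candidates.items()]
    pairs = []
    for (q_a, r_a), (q_b, r_b) in combinations(scored, 2):
        if r_a > r_b:
            pairs.append((q_a, q_b))
        elif r_b > r_a:
            pairs.append((q_b, q_a))
    return pairs


# Toy example: three sampled queries for one question with two gold documents.
gold_docs = ["d1", "d2"]
sampled = {
    "q_broad": ["d1", "d9"],
    "q_precise": ["d1", "d2"],
    "q_off_topic": ["d7"],
}
pairs = build_preference_pairs(sampled, gold_docs)
```

The resulting pairs would then be fed to a preference-optimization objective to update the query generator, and the loop can be repeated with the updated model.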
[6] Differential Transformer - T. Ye et al. (Microsoft) - 07 October 2024
→ Differential Transformer enhances attention to relevant context while canceling noise.
🤔 Why? Transformers often over-attend to irrelevant context (i.e., attention noise), which hurts context modeling.
💡 Key Results:
Outperforms the Transformer in various tasks like language modeling, key information retrieval, and in-context learning.
Experimental results show its effectiveness in mitigating hallucination, handling long-context modeling, and reducing activation outliers.
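The noise-canceling mechanism can be sketched as the difference of two softmax attention maps applied to the same values. This is a simplified single-head version in plain Python: `lam` is passed in as a fixed scalar, and any learned parameterization of it, multi-head handling, or normalization used in the paper is left out.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_map(Q, K):
    """Scaled dot-product attention weights, one softmax row per query."""
    d = len(Q[0])
    return [
        softmax([sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d) for k_row in K])
        for q_row in Q
    ]


def differential_attention(Q1, K1, Q2, K2, V, lam):
    """Subtract a second attention map (scaled by lam) from the first, then apply to V."""
    A1 = attention_map(Q1, K1)
    A2 = attention_map(Q2, K2)
    weights = [[a1 - lam * a2 for a1, a2 in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return [
        [sum(w * v_row[j] for w, v_row in zip(w_row, V)) for j in range(len(V[0]))]
        for w_row in weights
    ]


Q = [[1.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0]]

# Identical maps with lam=1 cancel completely; lam=0 recovers standard attention.
cancelled = differential_attention(Q, K, Q, K, V, lam=1.0)
standard = differential_attention(Q, K, Q, K, V, lam=0.0)
```

Attention mass that both maps assign to the same tokens is subtracted out, so only what the first map attends to beyond the second survives.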
[7] What Matters for Model Merging at Scale? - P. Yadav et al. (Google) - 04 October 2024
→ An investigation into the factors that affect model merging, such as model size, base model quality, merging methods, and the number of experts involved.
🤔 Why? Model merging has emerged as an effective way to enhance capabilities while allowing for decentralized development.
💡 Key Results:
Merging methods studied: Averaging, Task Arithmetic, TIES-Merging, DARE-TIES.
Larger models and instruction tuning facilitate easier merging.
All methods show similar performance at large scales.
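For readers new to merging, the two simplest methods in that list can be sketched on toy weight dictionaries. This is a minimal illustration with made-up parameter names and values, not the paper's experimental setup; TIES-Merging and DARE-TIES add pruning and sign-resolution steps that are not shown.

```python
def average_merge(experts):
    """Element-wise mean of the expert weights."""
    keys = experts[0].keys()
    return {k: sum(e[k] for e in experts) / len(experts) for k in keys}


def task_arithmetic_merge(base, experts, scale=1.0):
    """Add the scaled sum of task vectors (expert minus base) back onto the base weights."""
    return {
        k: base[k] + scale * sum(e[k] - base[k] for e in experts)
        for k in base
    }


# Toy weights: one base model and two experts fine-tuned from it.
base = {"w": 1.0, "b": 0.0}
expert_a = {"w": 2.0, "b": 0.0}
expert_b = {"w": 1.0, "b": 4.0}

averaged = average_merge([expert_a, expert_b])
merged = task_arithmetic_merge(base, [expert_a, expert_b], scale=1.0)
halved = task_arithmetic_merge(base, [expert_a, expert_b], scale=0.5)
```

With `scale` set to one over the number of experts, task arithmetic reduces to plain averaging, which is why `halved` matches `averaged` here.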
[8] DexMimicGen: Automated Data Generation for Bimanual Dexterous Manipulation via Imitation Learning - Z. Jiang et al. (NVIDIA) - 31 October 2024
→ DexMimicGen automatically generates large (21k) datasets for bimanual dexterous manipulation from a few human demonstrations (60) using simulation.
🤔 Why? Data collection is the main bottleneck for training robotics models, especially for complex (humanoid or dexterous) robots.
💡 Key Results:
Create a simulated digital twin for a real-world task, replay real-world human demonstrations in the simulation, and synthesize trajectories.
Transfer generated trajectories back into the real world, producing a visuomotor policy with a 90% success rate, as opposed to 0% from just using the human demos.
[9] Artificial Kuramoto Oscillatory Neurons - T. Miyato et al. (University of Tübingen, University of Amsterdam) - 17 October 2024
→ AKOrNs leverage the dynamic, spatiotemporal nature of neuron interactions, using dynamic synchronization to bind neurons together.
🤔 Why? Inspired by neuroscience, the work explores neurons as more complex oscillatory units subject to synchronization, which encourages them to become aligned or anti-aligned.
💡 Key Results:
Applications in unsupervised object discovery, solving Sudoku puzzles, image classification with robustness to adversarial attacks, and calibrated uncertainty quantification.
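As background, the synchronization behavior that AKOrN builds on can be seen in the classical Kuramoto model of coupled phase oscillators. The sketch below simulates that classical model with a simple Euler step; it is not the paper's architecture, and the initial phases, frequencies, coupling strength, and step size are arbitrary illustrative values.

```python
import cmath
import math


def kuramoto_step(phases, natural_freqs, coupling, dt):
    """One Euler step of d(theta_i)/dt = omega_i + (K/N) * sum_j sin(theta_j - theta_i)."""
    n = len(phases)
    new_phases = []
    for i in range(n):
        pull = sum(math.sin(phases[j] - phases[i]) for j in range(n))
        new_phases.append(phases[i] + dt * (natural_freqs[i] + coupling / n * pull))
    return new_phases


def order_parameter(phases):
    """Magnitude of the mean phase vector: near 0 is incoherent, 1 is fully synchronized."""
    return abs(sum(cmath.exp(1j * p) for p in phases) / len(phases))


# Four oscillators with identical natural frequency and scattered initial phases.
phases = [0.0, 1.0, 2.5, 4.0]
freqs = [1.0, 1.0, 1.0, 1.0]
initial_r = order_parameter(phases)
for _ in range(600):
    phases = kuramoto_step(phases, freqs, coupling=2.0, dt=0.05)
final_r = order_parameter(phases)
```

With positive coupling the oscillators pull each other into phase and the order parameter rises toward 1, which is the "binding by synchronization" intuition in its simplest form.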
[10] Does equivariance matter at scale? - J. Brehmer et al. (Qualcomm) - 30 October 2024
→ Do equivariant architectures retain their benefits when scaling to large datasets and compute?
🤔 Why? Explores the fundamental question of whether explicitly modeling inductive biases is advantageous compared to learning from data.
💡 Key Results:
Equiv. models improve data efficiency and outperform non-equiv. ones on large sets.
Non-equiv. models can close the gap with data augmentation given enough epochs.
Scaling with compute follows a power law, with equiv. models being more efficient.
The compute-optimal allocation of a budget between model size and training duration differs between the two model types.
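A power-law scaling claim of this kind is typically checked by fitting a straight line in log-log space. The sketch below fits loss = a * compute^(-b) by least squares; the data points are synthetic, generated from a known law purely to show the procedure, and are not results from the paper.

```python
import math


def fit_power_law(compute, loss):
    """Least-squares fit of loss = a * compute**(-b) on log-transformed data."""
    xs = [math.log(c) for c in compute]
    ys = [math.log(v) for v in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    a = math.exp(mean_y - slope * mean_x)
    return a, -slope


# Synthetic points from loss = 3.0 * compute**(-0.1); illustrative only.
compute_budgets = [1e15, 1e16, 1e17, 1e18]
losses = [3.0 * c ** -0.1 for c in compute_budgets]
a, b = fit_power_law(compute_budgets, losses)
```

Fitting one curve per model family and comparing the fitted `a` and `b` is one simple way to express "more efficient at every compute budget" quantitatively.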
And a few runners-up:
MIRAGE-Bench: Automatic Multilingual Benchmark Arena for Retrieval-Augmented Generation Systems - N. Thakur et al. (U. Waterloo, Vectara) - 17 October 2024
Starbucks: Improved Training for 2D Matryoshka Embeddings - S. Zhuang et al. (CSIRO) - 17 October 2024
CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos - N. Karaev et al. (Meta) - 15 October 2024
MAIR: A Massive Benchmark for Evaluating Instructed Retrieval - W. Sun et al. (CMU, SDU, SUDA, Baidu, Leiden University) - 13 October 2024
Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition - Z. Xiong et al. (UW-Madison) - 07 October 2024
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations - H. Orgad et al. (Technion) - 03 October 2024
MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering - J. Chan et al. (OpenAI) - 09 October 2024
MatMamba: A Matryoshka State Space Model - A. Shukla et al. (Scaled Foundations) - 09 October 2024
You can find an annotated collection of these papers (+ more that didn't make the cut) in Zeta Alpha, allowing you to easily discover relevant literature and dive deeper into any topic that interests you.
Here is a 4-minute overview of the papers in our top-10 list:
As always, the full recording of our latest Trends in AI episode is available on our YouTube channel, covering all of the news, model releases, and papers in depth. Until next time, enjoy discovery!