
Trends in AI — January 2025

By Dinos Papakostas

Will 2025 be the year of AGI? We're kicking off the year by looking at the key trends that shaped AI and the early signs of where things might head next. OpenAI's o3 models smashed the ARC-AGI benchmark, agents are poised to disrupt the workforce as we know it, and a volatile dynamic is shaping up between the US, the EU, and the rest of the world.

Join us for an overview of the most exciting developments in AI R&D, important model releases, and 10 of the latest trending research papers.

 

Catch us live for the next edition of our webinar, covering all the latest news and developments in AI:

 

News

Models

Trending AI papers for January 2025


[1] Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought - V. Xiang et al. (Stanford, SynthLabs) - 08 January 2025

Meta-CoT extends the traditional Chain-of-Thought (CoT) paradigm by explicitly modeling the underlying reasoning process involved in complex problem-solving, including non-linear, iterative exploration and verification.


🤔 Why? Language models often struggle with complex reasoning tasks, as the reasoning data in their pre-training does not accurately represent the true data generation process for such problems.


💡 Key Findings

  • Training data and model size both greatly affect performance on the HARP benchmark.

  • The integration of search algorithms enhances the models' ability to handle complex reasoning.

  • Process Reward Models help with search efficiency and improve downstream accuracy.
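
The search-and-verify loop can be made concrete with a short sketch: a best-first search over reasoning steps, guided by a process reward model. Here `propose_steps`, `prm_score`, and `is_complete` are hypothetical stand-ins for LLM calls, illustrating the general idea rather than the paper's exact algorithm.

```python
# Minimal sketch of search-guided reasoning in the spirit of Meta-CoT:
# a proposer suggests candidate next steps, a process reward model (PRM)
# scores partial traces, and best-first search explores the tree.
import heapq

def propose_steps(trace: list[str]) -> list[str]:
    # Stub: in practice, sample k candidate continuations from an LLM.
    return [f"step-{len(trace)}a", f"step-{len(trace)}b"]

def prm_score(trace: list[str]) -> float:
    # Stub: a process reward model scoring the partial reasoning trace.
    return 1.0 / (1 + len(trace))

def is_complete(trace: list[str]) -> bool:
    return len(trace) >= 3  # toy termination check

def best_first_search(max_expansions: int = 20) -> list[str]:
    frontier = [(-prm_score([]), [])]  # max-heap via negated scores
    for _ in range(max_expansions):
        _, trace = heapq.heappop(frontier)
        if is_complete(trace):
            return trace
        for step in propose_steps(trace):
            child = trace + [step]
            heapq.heappush(frontier, (-prm_score(child), child))
    return trace  # budget exhausted; return best partial trace

print(best_first_search())
```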



[2] Search-o1: Agentic Search-Enhanced Large Reasoning Models - X. Li et al. (RUC) - 09 January 2025

Search-o1 combines an agentic RAG mechanism and a Reason-in-Documents module to acquire and integrate external knowledge during the reasoning process.



🤔 Why? Uncertainty in reasoning models can cause hallucinations & incoherence.


💡 Key Findings

  • Search-o1 outperforms QwQ (+4.7%) and QwQ+RAG (+3.1%) on Science QA, Maths, and Coding.

  • On GPQA, it outperforms human expert biologists, which the base and naive-RAG models failed to do.

  • Also strong performance on both single-hop and multi-hop QA.
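
Here is a rough sketch of what such an agentic loop can look like: the reasoner emits search queries mid-chain, and retrieved documents pass through a condensation step before re-entering the context. All functions are hypothetical stubs, not the paper's implementation.

```python
# Search-o1-style agentic loop (illustrative): the reasoner may emit
# <search>query</search> markers; retrieved documents are first condensed
# by a Reason-in-Documents step so only distilled evidence re-enters
# the reasoning chain instead of long, noisy raw text.
import re

def reason(context: str) -> str:
    # Stub LLM call: returns either a search request or a final answer.
    if "evidence" not in context:
        return "<search>boiling point of sulfur</search>"
    return "Final answer: 444.6 °C"

def retrieve(query: str) -> list[str]:
    # Stub retriever returning raw (possibly noisy) documents.
    return ["...long document mentioning sulfur boils at 444.6 °C..."]

def reason_in_documents(query: str, docs: list[str]) -> str:
    # Stub: condense raw docs into a short, reasoning-ready note.
    return f"evidence for '{query}': sulfur boils at 444.6 °C"

def agentic_search_loop(question: str, max_turns: int = 5) -> str:
    context = question
    for _ in range(max_turns):
        out = reason(context)
        match = re.search(r"<search>(.*?)</search>", out)
        if match is None:
            return out  # model produced a final answer
        note = reason_in_documents(match.group(1), retrieve(match.group(1)))
        context += "\n" + note  # inject distilled evidence, not raw docs
    return "no answer within budget"

print(agentic_search_loop("What is the boiling point of sulfur?"))
```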


[3] Training Large Language Models to Reason in a Continuous Latent Space - S. Hao et al. (FAIR) - 09 December 2024

Chain of Continuous Thought (Coconut) enables LLMs to reason in a continuous latent space rather than through explicit tokens, by feeding the model's last hidden state back as the representation of the reasoning state. This allows them to explore multiple potential reasoning paths simultaneously, similar to a breadth-first search.



🤔 Why? Current reasoning methods like CoT rely heavily on language, which is primarily a communication tool rather than one for reasoning.


💡 Key Findings

  • Coconut outperforms CoT in logical reasoning tasks that require planning and backtracking.

  • Models with Coconut also demonstrate advanced reasoning patterns with fewer tokens.

  • Increasing the number of continuous thoughts correlates with improved performance.
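
The core mechanism is easy to sketch: during "thought" steps, the last hidden state bypasses the vocabulary and is reused directly as the next input embedding. Below is a toy illustration with a randomly initialized GRU standing in for the LLM backbone (our simplification, not the paper's setup).

```python
# Toy sketch of latent-space reasoning à la Coconut: instead of decoding
# a token at each step, the model's last hidden state is fed back directly
# as the next input embedding, so "thoughts" stay continuous.
import torch
import torch.nn as nn

hidden_dim, vocab = 64, 1000
backbone = nn.GRUCell(hidden_dim, hidden_dim)  # stand-in for a transformer
embed = nn.Embedding(vocab, hidden_dim)        # token embeddings
lm_head = nn.Linear(hidden_dim, vocab)         # vocabulary projection

def generate_with_continuous_thoughts(prompt_ids, num_thoughts=4):
    h = torch.zeros(1, hidden_dim)
    # 1) Consume the prompt as ordinary token embeddings.
    for tok in prompt_ids:
        h = backbone(embed(tok).unsqueeze(0), h)
    # 2) Continuous thoughts: the last hidden state *is* the next input,
    #    so no information is lost to token discretization.
    x = h
    for _ in range(num_thoughts):
        h = backbone(x, h)
        x = h
    # 3) Switch back to "language mode" and decode a token.
    return lm_head(h).argmax(-1).item()

print(generate_with_continuous_thoughts(torch.tensor([1, 2, 3])))
```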


[4] Titans: Learning to Memorize at Test Time - A. Behrouz et al. (Google Research) - 31 December 2024

Titans is a new architectural variant of the Transformer that incorporates long-term neural memory modules to effectively memorize historical context at test time.


🤔 Why? Existing architectures struggle with long contexts, generalization, and reasoning, which limits their effectiveness in real-world applications.



💡 Key Findings

  • Titans outperform modern recurrent models and their hybrid variants on language modeling and commonsense reasoning tasks.

  • They remain competitive with Transformers while scaling to context window sizes > 2M tokens.

  • The architecture demonstrates superior accuracy in needle-in-the-haystack tasks.
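
A rough sketch of the test-time memorization idea, under our own simplifying assumptions (a linear memory module and plain SGD on a prediction-error "surprise" loss; the paper's actual update rule is more elaborate):

```python
# Test-time memorization in the spirit of Titans: a small neural memory is
# updated online with a gradient step for each incoming chunk, so
# associations are memorized during inference rather than held in a KV cache.
import torch
import torch.nn as nn

dim, lr = 32, 0.5
memory = nn.Linear(dim, dim)  # stand-in long-term memory module

def memory_update(key: torch.Tensor, value: torch.Tensor) -> None:
    # Surprise = how badly memory predicts `value` from `key`;
    # one online SGD step per incoming pair.
    loss = (memory(key) - value).pow(2).mean()
    memory.zero_grad()
    loss.backward()
    with torch.no_grad():
        for p in memory.parameters():
            p -= lr * p.grad

def memory_read(query: torch.Tensor) -> torch.Tensor:
    with torch.no_grad():
        return memory(query)

# Stream of (key, value) pairs, e.g. projections of incoming tokens.
for _ in range(300):
    k = torch.randn(dim)
    memory_update(k, 2 * k)  # memorize the association value = 2 * key

print(memory_read(torch.ones(dim))[:4])  # entries drift toward ~2.0
```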


[5] Agent Laboratory: Using LLM Agents as Research Assistants - S. Schmidgall et al. (AMD) - 07 January 2025

Agent Laboratory is an end-to-end autonomous research workflow that helps researchers implement their ideas. It uses agents to conduct literature reviews, formulate plans, execute experiments, and write reports.



🤔 Why? Automate repetitive and tedious tasks to complement creativity rather than replace it; humans should focus on ideation.


💡 Key Findings

  • o1-preview is the best-performing “brain” (backbone LLM) for the agents.

  • Human involvement significantly improves the overall research quality.

  • Research expenses are reduced by 84% compared to previous methods.
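
The workflow itself is a straightforward pipeline, sketched below with stub agents. The stage names follow the paper's description; the function signatures and pass-through feedback are our own invention.

```python
# Agent-Laboratory-style pipeline (illustrative): specialized agent roles
# handle each research stage, with optional human feedback between stages
# ("co-pilot mode"). All agent functions are hypothetical stubs.
def literature_review(idea: str) -> str:
    return f"related work for: {idea}"

def formulate_plan(idea: str, review: str) -> str:
    return f"plan for '{idea}', grounded in [{review}]"

def run_experiments(plan: str) -> str:
    return f"results of {plan}"

def write_report(plan: str, results: str) -> str:
    return f"report: {plan} -> {results}"

def human_feedback(artifact: str) -> str:
    # In co-pilot mode a researcher reviews/edits each stage's output.
    return artifact  # pass-through in this sketch

idea = "does latent reasoning help small models?"
review = human_feedback(literature_review(idea))
plan = human_feedback(formulate_plan(idea, review))
results = human_feedback(run_experiments(plan))
print(write_report(plan, results))
```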



[6] Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference - B. Warner et al. (Answer.AI, LightOn) - 18 December 2024

ModernBERT is an improved encoder-only Transformer model optimized for performance and efficiency, particularly for tasks involving long sequence lengths.


🤔 Why? Despite the success of BERT, there have been limited improvements in encoders, with existing models relying on outdated architectures and training data.



💡 Key Findings

  • ModernBERT achieves state-of-the-art performance on the GLUE benchmark.

  • In retrieval tasks, ModernBERT outperforms other baselines in both single-vector and multi-vector settings.

  • It supports long-context inputs (up to 8192 tokens) two times faster than the next fastest model.
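
To see what using it looks like in practice, here is a short sketch of extracting sentence embeddings via Hugging Face transformers. It assumes a recent transformers release with ModernBERT support and the "answerdotai/ModernBERT-base" checkpoint name; the mean-pooling recipe is a common default, not necessarily the recommended one, so check the model card.

```python
# Sentence embeddings with ModernBERT (illustrative usage).
import torch
from transformers import AutoModel, AutoTokenizer

name = "answerdotai/ModernBERT-base"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

texts = ["ModernBERT supports sequences up to 8192 tokens."]
inputs = tokenizer(texts, padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state  # (batch, seq, dim)

# Mean-pool over non-padding tokens for a single-vector representation.
mask = inputs["attention_mask"].unsqueeze(-1)
embedding = (hidden * mask).sum(1) / mask.sum(1)
print(embedding.shape)
```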



[7] MiniMax-01: Scaling Foundation Models with Lightning Attention - MiniMax Team - 14 January 2025

MiniMax-01 uses lightning attention and a mixture-of-experts architecture to scale foundation models effectively and handle long context windows on par with top-tier commercial models.


🤔 Why? There is an increasing demand for models that can process long contexts for real-world applications, while the Transformer architecture struggles with computational complexity and memory limitations when handling long sequences.



💡 Key Findings

  • MiniMax-01 achieves comparable performance with SOTA models at lower computational costs.

  • The model is trained with context windows up to 1M tokens and can extrapolate up to 4M tokens during inference.

  • Its architecture brings significant improvements in pre-filling latency.
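
Lightning attention is an I/O-aware, tiled implementation of linear attention. The sketch below shows only the core reordering trick that makes the cost linear in sequence length, leaving out the tiling, causal masking, and the hybrid softmax layers that MiniMax-01 interleaves; the elu-based feature map is one common choice, not necessarily the paper's.

```python
# Core idea of linear attention: apply a feature map to Q and K and
# reorder the matmuls so cost is O(n·d²) instead of O(n²·d).
import torch

def linear_attention(q, k, v, eps=1e-6):
    # q, k, v: (seq, dim). Feature map elu(x)+1 keeps scores positive.
    phi = lambda x: torch.nn.functional.elu(x) + 1
    q, k = phi(q), phi(k)
    kv = k.T @ v                      # (dim, dim): no n×n attention matrix
    z = q @ k.sum(0, keepdim=True).T  # normalizer, shape (seq, 1)
    return (q @ kv) / (z + eps)

n, d = 4096, 64
q, k, v = (torch.randn(n, d) for _ in range(3))
print(linear_attention(q, k, v).shape)  # (4096, 64)
```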


[8] Cosmos World Foundation Model Platform for Physical AI - NVIDIA Team - 07 January 2025

Cosmos is a permissively licensed world model that can be finetuned for real-world downstream applications. Large-scale pre-training on digital data simulating the world allows it to be adapted to physical scenarios through post-training.



🤔 Why? “Physical AI needs to be trained digitally first”. It needs a digital twin of itself (policy model) and the world (world model).


[9] GME: Improving Universal Multimodal Retrieval by Multimodal LLMs - X. Zhang et al. (Alibaba) - 21 December 2024

GME enables search across multiple modalities with a unified retrieval model, built on a multimodal LLM and trained on a diverse dataset augmented with synthetic data.


🤔 Why? Existing multimodal training data is imbalanced, limiting the potential of multimodal LLMs for retrieval.



💡 Key Findings

  • Diverse modalities in training data and synthetic data quality are crucial.

  • It outperforms strong baselines on all modality types (single-modal, cross-modal, and fused).

  • The representation space is much more unified compared to CLIP.
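
To illustrate what "unified" buys you, here is a toy sketch in which documents of any modality share a single index and one nearest-neighbor search serves every query type. The `embed` function is a hypothetical stand-in for the MLLM-based encoder: it is hash-seeded, so only exact duplicates land close together, whereas a real encoder would place semantically similar content nearby.

```python
# Unified multimodal retrieval à la GME: one encoder maps text, images,
# or fused image+text documents into the same vector space.
import numpy as np

def embed(content: dict) -> np.ndarray:
    # Stub pseudo-embedding; a real system would call the MLLM encoder.
    seed = abs(hash(str(sorted(content.items())))) % 2**32
    v = np.random.default_rng(seed).standard_normal(128)
    return v / np.linalg.norm(v)

corpus = [
    {"text": "a red bicycle"},
    {"image": "bike.jpg"},
    {"image": "chart.png", "text": "Q3 revenue"},  # fused document
]
index = np.stack([embed(doc) for doc in corpus])  # one index, all modalities

query = {"text": "a red bicycle"}
scores = index @ embed(query)  # cosine similarity (unit vectors)
print(corpus[int(scores.argmax())])
```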


[10] Arctic-Embed 2.0: Multilingual Retrieval Without Compromise - P. Yu et al. (Snowflake) - 03 December 2024

Arctic-Embed 2.0 is a text embedding model with strong retrieval performance across both multilingual and English-only benchmarks, trained with Matryoshka Representation Learning (MRL).


🤔 Why? Multilingual embedding models often underperform in English retrieval, creating a need for a solution that maintains high performance across languages.



💡 Key Findings

  • Arctic Embed shows competitive performance on benchmarks like BEIR, CLEF, and MIRACL.

  • Detailed ablation experiments validate key modeling choices, such as the scale of the pre-training datasets and the benefits of cross-lingual transfer.

  • MRL allows Arctic Embed to maintain ~99% of its performance with truncated embeddings.
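
The practical payoff of MRL is cheap embedding truncation at inference time: because the training loss is applied to nested prefixes of the embedding, you can keep only the first k dimensions and renormalize, trading index size for a small accuracy loss. A minimal sketch, assuming that standard truncate-and-renormalize recipe (dimensions are illustrative):

```python
# Using a Matryoshka-trained embedding at a smaller dimensionality.
import numpy as np

def truncate_embedding(e: np.ndarray, k: int) -> np.ndarray:
    e_k = e[:k]                        # keep the first k dims
    return e_k / np.linalg.norm(e_k)   # renormalize for cosine similarity

full = np.random.randn(1024)
full /= np.linalg.norm(full)
small = truncate_embedding(full, 256)  # 4x smaller index
print(small.shape, np.linalg.norm(small))
```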


And a few runners-up:


You can find an annotated collection of these papers (+ more that didn't make the cut) in Zeta Alpha, allowing you to easily discover relevant literature and dive deeper into any topic that interests you.


Here is a 3-minute overview of the papers in our top-10 list:


As always, the full recording of our latest Trends in AI episode is available on our YouTube channel, covering all of the news, model releases, and papers in depth.


Until next time, enjoy discovery!
