
Analyzing the homerun year for LLMs: the top-100 most cited AI papers in 2023, with all medals for open models.

9 October 2024, Mathias Parisot, Jakub Zavrel.

Even in the red-hot global race for AI dominance, you publish and you perish, unless your peers pick up your work, build further on it, and you manage to drive real progress in the field. And of course, we are all very curious who is currently having that kind of impact. Are the billions of dollars spent on AI R&D paying off in the long run? So here, in continuation of our popular publication impact analysis of last year, is Zeta Alpha's ranking of the most cited AI papers of 2023. Citations offer a much clearer signal of the innovations that continue to shape the trajectory of the field than, for example, social media mentions. As we progress through 2024, we can separate the wheat from the chaff and see which papers produced the foundational research that is driving the next wave of breakthroughs.


While the race to bring competitive AI products to market has led many industry labs, like OpenAI and Google, to become more secretive and publish less, favoring closed models and product-focused development, others, like Microsoft and Meta, are picking up the gauntlet and moving towards more open research. Tracking citation counts and academic contributions over time allows us to better understand which ideas have lasting impact and where AI research is heading.


Figure 1: Top organisations by count of papers in the top-100 of 2023

So without further ado, the most cited papers of 2023 are:


1️⃣ LLaMA: Open and Efficient Foundation Language Models by Meta with 8534 citations - Meta's first collection of foundation models, outperforming GPT-3 and openly released to the research community.

2️⃣ Llama 2: Open Foundation and Fine-Tuned Chat Models by Meta with 7774 citations - The second version of the Llama collection, this time fine-tuned for dialogue.

3️⃣ Segment Anything by Meta with 5293 citations - Release of a model and dataset for zero-shot image segmentation.

4️⃣ GPT-4 Technical Report by OpenAI with 3384 citations - A significant improvement over GPT-3, now accepting multimodal (text and image) inputs.

5️⃣ BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models by Salesforce with 3099 citations - Pre-training method for SOTA vision-language performance with far fewer parameters.

6️⃣ Visual Instruction Tuning by University of Wisconsin–Madison, Microsoft and Columbia University with 2818 citations - LLaVA, a large multimodal model instruction-tuned for multimodal chat.

7️⃣ InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning by Salesforce, Hong Kong University of Science and Technology and Nanyang Technological University with 2818 citations - Improvement over BLIP-2 with vision-language instruction tuning.

8️⃣ Sparks of Artificial General Intelligence: Early experiments with GPT-4 by Microsoft with 2712 citations - Microsoft's deep dive into GPT-4's capabilities: AGI yet?

9️⃣ A Survey of Large Language Models by Renmin University of China and Université de Montréal with 2275 citations - A comprehensive survey on LLMs.

🔟 Adding Conditional Control to Text-to-Image Diffusion Models by Stanford University with 2263 citations - ControlNet, a text-to-image diffusion model with precise control over image generation through various inputs like edges, depth, and segmentation.


Read on below for the full list of 100 papers for 2023, but let's first dive into the analysis by country and institution. We see some rather big changes in the ranking this year. Guess Microsoft really made Google dance... as Google loses the top spot.

| 2023 Rank | Organization | Number of papers in AI top-100 | Rank change from 2022 |
|-----------|--------------|--------------------------------|-----------------------|
| 1 | Microsoft | 13 | +3 |
| 2 | Stanford University | 11 | +1 |
| 3 | Google | 10 | -2 |
| 3 | Carnegie Mellon University | 10 | +4 |
| 5 | Meta | 8 | -3 |
| 6 | UC Berkeley | 6 | -2 |
| 7 | MIT | 5 | +6 |
| 7 | HKUST | 5 | +6 |
| 7 | UC San Diego | 5 | +15 |
| 10 | OpenAI | 4 | -3 |


Meta's focus on open-source AI, also emphasized in Mark Zuckerberg's 2024 letter "Open Source AI Is the Path Forward", has earned them widespread appreciation from the AI community (even though their definition of "open" has also received criticism). This is reflected in the top 3 most cited AI papers of 2023 all coming from Meta, although their overall ranking by paper count is down three positions.


Counting by author affiliation in the top-100, Google and Meta both saw their counts shrink, by 59% and 44% respectively, compared to the average of the previous three years. In Meta's case this is clearly offset by the impact of their most cited contributions. In contrast, Microsoft's top-100 count in 2023 increased by 56% over the same baseline. However, it should be noted that their most impactful paper (Sparks of AGI) is a study of OpenAI's GPT-4 capabilities.
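For clarity, here is the simple arithmetic behind such growth figures, as a minimal Python sketch. The counts in the example are made up, and we assume the baseline is the average of the previous three years, as used elsewhere in this post.

```python
# Illustrative arithmetic for the growth figures above: change of the 2023
# count relative to the average of the previous three years. Counts are
# made up for the example.
def growth_vs_prev_avg(prev_counts: list[int], count_2023: int) -> float:
    baseline = sum(prev_counts) / len(prev_counts)
    return (count_2023 - baseline) / baseline * 100

# e.g. an organisation averaging 8 top-100 papers in 2020-2022, 12 in 2023:
print(f"{growth_vs_prev_avg([5, 8, 11], 12):+.0f}%")  # prints +50%
```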


Figure 2: Top countries by count of papers in the top-100 of 2023

When we break down the top-100 by country, to better understand the geopolitics of the AI race, the US and China are still the strongest players, with China accelerating its growth while the US remains stable (and still in the lead). The UK drops out of the top 3 with its lowest count in four years, but this is mainly due to the full merger of Google Brain and DeepMind in April 2023 (DeepMind alone had 6 papers in the 2022 top-100), and one could argue that those R&D teams have not actually moved on the map.


Figure 3: Top regions (North America, Asia, Europe, Oceania, Middle East, Africa) by count of papers in the top-100, 2020-2023

In total, North America still leads Europe and Asia by a large margin. In 2023, Europe had 35% fewer papers in the top-100 than its average over the previous three years, while the Middle East and Africa showed the highest growth, with counts up 145% and 350% respectively.


Figure 4: Top conversion rate (ratio of the number of papers in the top-100 to the number of papers published) by company

In 2023, Stability AI emerged as the leader in converting publications into top-tier papers, although that position might be short-lived. OpenAI continued to perform strongly, demonstrating a consistent ability to produce high-impact research. The Salesforce AI Research team proved it can hold its own in high-impact research, especially with its work on vision-language pre-training. Meanwhile, industry heavyweights like NVIDIA, Microsoft, Meta, and Google show up with lower conversion rates, reflecting their broad publication volume but a more diluted concentration of blockbuster papers.
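As a minimal illustration of the metric in Figure 4, the sketch below computes conversion rates for two hypothetical labs; the names and counts are invented, not the actual data behind the figure.

```python
# Conversion rate: share of an organisation's publications that landed
# in the top-100. Lab names and counts below are hypothetical.
def conversion_rate(top100_count: int, total_published: int) -> float:
    return top100_count / total_published

labs = {"SmallLab": (3, 20), "BigLab": (12, 600)}
for name, (hits, total) in labs.items():
    print(f"{name}: {conversion_rate(hits, total):.1%}")
# SmallLab: 15.0%  -- few papers, high impact density
# BigLab: 2.0%     -- many papers, more diluted
```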


Topics and trends in the top 100 of 2023

2023 was definitely the year of the breakthrough of LLMs and ChatGPT, in society in general, and the same holds for the academic publication record as measured by citations. In fact, out of the top-100 cited papers, a whopping 83% are about LLMs, leaving only 17 papers on other topics, such as computer vision, image generation with diffusion models, or robotics.

Figure 5: Most common topics in the top-100, as a percentage of papers covering each topic (singletons omitted)

Multimodal Integration

The second most popular topic in the top-100 is multimodality, mainly focused on vision-language models, with some progress in other modalities and embodied models as well. 2023 was a year in which models like GPT-4, Gemini, and LLaVA broke new ground in understanding and interacting with the world around us. These multimodal models blend text, visuals, and other data types, opening doors to exciting applications like robotic manipulation with PaLM-E and real-world interaction with Qwen-VL and Visual ChatGPT.


Relevant papers:


Ethical implications, Education and Research

Generative AI tools like ChatGPT are reshaping the education landscape and personal productivity, offering personalized and interactive learning experiences. While some research points to transformative benefits, other work urges caution regarding AI's potential to disrupt education and perpetuate biases. The discussion is vibrant, reflecting on how AI can both support and challenge traditional educational models. 16% of the top-100 discusses the ethical implications of LLMs and generative AI, and 12% raises concerns about its impact on education. As LLMs grow more powerful, understanding their impact is crucial: evaluation isn't just about accuracy, it's also about societal effects and ethical use. Multiple studies challenge us to consider AI's implications for jobs, academia, and beyond, emphasizing the need for guidelines that ensure AI benefits society responsibly and equitably.


Relevant papers:


Instruction Tuning and Alignment

AI's ability to follow human instructions is getting a makeover, thanks to models like LLaVA and InstructBLIP. These advances in instruction tuning allow AI to handle complex tasks involving both text and images more effectively. WizardLM shows the potential of machine-generated instructions, while LIMA shows that quality beats quantity when it comes to instruction-tuning data. This is all about making AI that listens better and acts smarter.


Relevant papers:


Efficiency and Scalability

Given the price of compute, doing more with less is the name of the game. Tools like QLoRA have revolutionized the fine-tuning of large models, making it possible on limited hardware thanks to advanced quantization techniques on top of Low-Rank Adapters. The Mamba architecture steps away from the traditional Transformer, bringing speed and linear scaling in sequence length while maintaining comparable performance. And with breakthroughs like PagedAttention for better memory management during inference, AI is becoming not just more powerful, but also more accessible.
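To make the QLoRA recipe concrete, here is a minimal sketch using the Hugging Face transformers, peft, and bitsandbytes stack: the frozen base model is loaded with 4-bit NF4 quantization, and small Low-Rank Adapters are trained on top. The model name and hyperparameters are illustrative choices, not the exact settings from the paper.

```python
# Minimal QLoRA-style setup: quantize the frozen base weights to 4-bit NF4
# and train only small Low-Rank Adapters on top. The model name and
# hyperparameters are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NF4: the 4-bit data type from QLoRA
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16 for stability
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-hf",             # illustrative base model choice
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,                                   # adapter rank
    lora_alpha=32,
    target_modules=["q_proj", "v_proj"],    # adapters on attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()          # typically well under 1% trainable
```

The quantized base weights stay frozen and only the adapter matrices are updated, which is what makes fine-tuning a multi-billion-parameter model feasible on a single consumer GPU.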


Relevant papers:


Domain-Specific Applications

From healthcare to finance and robotics, AI is also about tailoring expertise to specific fields. Whether reaching physician-level accuracy with Med-PaLM 2 in medicine or navigating financial tasks with BloombergGPT, AI specialisation is transforming industries. In robotics, models like RT-2 and PaLM-E demonstrate the power of integrating vision-language pre-training with robotic control, enhancing generalization and reasoning in complex tasks.


Relevant papers:


Medical Applications and Implications

AI's potential in medicine is expanding: some studies show it can provide compassionate and accurate interactions that sometimes surpass those of human counterparts. But beneath the promise there's caution: these tools must improve care without spreading misinformation. The medical community is exploring a future where AI helps diagnose and supports clinician-patient interactions.


Relevant papers:


Top 100 papers of 2023 - the full list

And finally, here is our top-100 list itself, with titles, citation counts, and affiliations.


Methodology


To create the analysis above, we first collected the most cited papers per year in the Zeta Alpha platform (using the publication date of the arXiv preprint where available). We supplemented this list by mining for highly cited AI papers on Semantic Scholar, with its broader coverage of closed-source publishers (e.g. Nature, Elsevier, Springer, and other journals). For each paper we then take the number of citations on Google Scholar as the representative metric and sort the papers by this number to yield the top-100 for a year. For these papers we used GPT-4o-mini to extract the authors, their affiliations, and the topics, and manually checked the results. A paper with authors from multiple affiliations counts once for each of those affiliations.

As a disclaimer, we want to note that the citation counts were collected on September 20th, 2024, and will change over time. Also, the cut-off at one hundred papers skews the picture towards the true blockbusters and may not accurately represent the broader research impact of certain labs or institutions. And finally, the great ideas of tomorrow might have their roots in a largely overlooked paper from many years ago. So always keep your eyes open, think for yourself, swim against the current when needed, and enjoy discovery!
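As a minimal sketch of the counting logic described above (the paper records and field names here are hypothetical, not our actual pipeline code):

```python
# Sketch of the top-100 construction and per-affiliation counting.
from collections import Counter

papers = [
    {"title": "LLaMA", "citations": 8534, "affiliations": ["Meta"]},
    {"title": "Visual Instruction Tuning", "citations": 2818,
     "affiliations": ["University of Wisconsin-Madison", "Microsoft",
                      "Columbia University"]},
    # ... one record per candidate paper
]

# Sort by Google Scholar citation count and keep the top 100.
top100 = sorted(papers, key=lambda p: p["citations"], reverse=True)[:100]

# A paper counts once for each distinct affiliation on its author list.
org_counts = Counter()
for paper in top100:
    org_counts.update(set(paper["affiliations"]))

for org, count in org_counts.most_common(10):
    print(f"{org}: {count}")
```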


 

This concludes our analysis; what surprised you the most about these numbers? Try out our platform, follow us on Twitter @zetavector and let us know if you have any feedback or would like to receive a more detailed analysis for your domain or organization.

