Unlocking AI Mysteries: DeepSeek LLM Open Source Release Dominates Llama 2 and Claude-2

China's DeepSeek Releases Powerful DeepSeek LLM

DeepSeek, a Chinese company dedicated to unraveling the mysteries of AGI, has open-sourced DeepSeek LLM on GitHub, Hugging Face, and AWS S3. The 67-billion-parameter model was trained from scratch on a dataset of 2 trillion tokens.

Model Versions

  • Two versions: Base and Chat
  • Available in English and Chinese
  • Released under the MIT license (see the loading sketch after this list)
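
For readers who want to try the weights, here is a minimal loading sketch using the Hugging Face transformers library. The repo id deepseek-ai/deepseek-llm-67b-base reflects DeepSeek's Hugging Face organization, and the prompt is purely illustrative; this is a sketch, not an official quickstart.

```python
# Minimal sketch: load a DeepSeek LLM base model from Hugging Face and sample a completion.
# Assumes the transformers and accelerate packages are installed and enough GPU memory is available.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-llm-67b-base"  # the 7B variant loads the same way

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half-precision weights to cut memory use
    device_map="auto",           # spread layers across available devices
)

inputs = tokenizer("The main benefit of open-sourcing LLMs is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```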

Performance Highlights

  • DeepSeek LLM 67B Base outperforms Llama 2 70B in reasoning, coding, mathematics, and Chinese comprehension
  • Coding: HumanEval pass@1 of 73.78 (see the pass@k estimator sketch after this list)
  • Mathematics: GSM8K 0-shot: 84.1, MATH 0-shot: 32.6
  • Generalization: a score of 65 on the Hungarian National High School Exam
  • Open-sourced on GitHub, Hugging Face, and AWS S3, with intermediate training checkpoints also available
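
HumanEval pass@1 figures like the one above are conventionally computed with the unbiased pass@k estimator from the original HumanEval paper; the sketch below shows that estimator, with the sample counts in the example chosen for illustration.

```python
import math

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator from the HumanEval paper.
    n: completions sampled per problem, c: how many passed the tests, k: budget.
    Computes 1 - C(n-c, k) / C(n, k) in a numerically stable product form."""
    if n - c < k:
        return 1.0  # fewer failures than the budget, so some sample must pass
    return 1.0 - math.prod(1.0 - k / i for i in range(n - c + 1, n + 1))

# Example: 200 samples per problem with 148 passing gives pass@1 = 0.74.
print(pass_at_k(200, 148, 1))
```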

Evaluations

  • Rigorous evaluations against models such as LLaMA-2, GPT-3.5, and Claude-2
  • Strong results in both English and Chinese, including on held-out tests such as the Hungarian National High School Exam
  • Fine-tuning DeepSeek LLM 7B Chat on 20 million Chinese multiple-choice questions boosts its scores on MMLU, C-Eval, and CMMLU (a chat inference sketch follows this list).
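
As a concrete illustration of querying the chat variant the way a multiple-choice benchmark harness might, here is a sketch using transformers' chat-template support; the repo id deepseek-ai/deepseek-llm-7b-chat and the question itself are assumptions for illustration.

```python
# Sketch: ask the chat model a multiple-choice question in the style of those benchmarks.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/deepseek-llm-7b-chat"  # assumed repo id on Hugging Face

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user",
             "content": "Which of these is prime? A. 21  B. 33  C. 37  D. 51. Answer with one letter."}]
# apply_chat_template wraps the message in the model's expected chat format.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=8)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```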

Pre-training

  • Transparent pre-training process, emphasizing openness
  • Model architecture similar to LLaMA: an auto-regressive transformer decoder, with the 67B variant using grouped-query attention (GQA) in place of standard multi-head attention (see the sketch after this list)
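
To make grouped-query attention concrete, here is a minimal PyTorch sketch: groups of query heads share a single key/value head, which shrinks the KV cache during auto-regressive decoding. The head counts and shapes below are illustrative, not DeepSeek's actual configuration.

```python
import torch
import torch.nn.functional as F

def grouped_query_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Minimal grouped-query attention.
    q: (batch, n_q_heads, seq, head_dim); k, v: (batch, n_kv_heads, seq, head_dim),
    with n_q_heads divisible by n_kv_heads. Each group of query heads shares one KV head."""
    group = q.shape[1] // k.shape[1]
    # Expand each KV head across its group of query heads.
    k = k.repeat_interleave(group, dim=1)
    v = v.repeat_interleave(group, dim=1)
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    # Causal mask for auto-regressive decoding: no attending to future positions.
    seq = q.shape[-2]
    mask = torch.triu(torch.ones(seq, seq, dtype=torch.bool, device=q.device), diagonal=1)
    scores = scores.masked_fill(mask, float("-inf"))
    return F.softmax(scores, dim=-1) @ v

# Example: 8 query heads sharing 2 KV heads (group size 4).
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
out = grouped_query_attention(q, k, v)  # shape (1, 8, 16, 64)
```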

Alibaba's Entry

  • Alibaba introduced its own LLM, Qwen-72B, trained on 3 trillion tokens
  • A smaller model, Qwen-1.8B, was released as a gift to the research community.
