Unlocking AI Mysteries: DeepSeek LLM Open Source Release Dominates Llama 2 and Claude-2
![Unlocking AI Mysteries: DeepSeek LLM Open Source Release Dominates Llama 2 and Claude-2](/content/images/size/w1200/2023/12/unlocking-ai-mysteries-deepseek-llm-open-source-release-dominates-llama-2-and-claude-2.png)
China's DeepSeek Releases Powerful DeepSeek LLM Model
DeepSeek, a Chinese company dedicated to unraveling the mysteries of AGI, has open-sourced its DeepSeek LLM model on GitHub, Hugging Face, and AWS S3. This model, with 67 billion parameters, was carefully trained from scratch on a massive dataset of 2 trillion tokens.
Model Versions
- Two versions: Base and Chat
- Available in English and Chinese
- Accessible under the MIT license
Performance Highlights
- 67B Base outperforms competitors such as Llama 2 70B Base in reasoning, coding, mathematics, and Chinese comprehension
- Coding: HumanEval Pass@1 score of 73.78
- Mathematics: GSM8K 0-shot: 84.1; MATH 0-shot: 32.6
- Generalization: a score of 65 on the Hungarian National High School Exam
- Open-sourced on GitHub, Hugging Face, and AWS S3 with access to intermediate checkpoints during training
Evaluations
- Rigorous evaluations against models such as Llama 2, GPT-3.5, and Claude-2
- Strong results in both English and Chinese, including on held-out exams such as the Hungarian National High School Exam
- Training DeepSeek LLM 7B Chat with 20 million Chinese multiple-choice questions improved its benchmark scores on MMLU, C-Eval, and CMMLU.
Pre-training
- Transparent pre-training process, emphasizing openness
- Model architecture follows LLaMA: an auto-regressive transformer decoder, with the larger 67B variant using grouped-query attention (GQA)
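To make "auto-regressive" concrete, here is a minimal sketch of greedy auto-regressive generation, the decoding scheme used by decoder-only transformers like DeepSeek LLM. The `next_token` function below is a hypothetical stand-in for a real model's forward pass, not DeepSeek's actual implementation:

```python
def next_token(tokens):
    # Toy stand-in for a trained model: here it just predicts
    # the next integer, so the loop structure is easy to follow.
    return tokens[-1] + 1

def generate(prompt, max_new_tokens):
    """Greedy auto-regressive generation: at every step the entire
    sequence produced so far is fed back in to predict one more token."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        tokens.append(next_token(tokens))
    return tokens

print(generate([1, 2, 3], 4))  # → [1, 2, 3, 4, 5, 6, 7]
```

The key property is that each new token is conditioned on all previously generated tokens, which is why generation cost grows with sequence length and why attention variants such as GQA matter at the 67B scale.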
Alibaba's Entry
- Alibaba introduces its own LLM, Qwen-72B, trained on 3 trillion tokens
- A smaller model, Qwen-1.8B, was also released as a gift to the research community.