Build a DeepSeek Model from Scratch

2025/11/17 17:31 Build a DeepSeek Model from Scratch

Source:

Build a DeepSeek Model (From Scratch) - Raj Abhijit Dandekar, Rajat Dandekar, Sreedath Panat, Naman Dwivedi

Learn how to build the features that set DeepSeek apart from other top LLMs!

When DeepSeek started making waves in January 2025, it sounded too good to be true. How could a generative AI model get such incredible performance with such low training and operation costs? By creatively blending a variety of strategies and innovations like Mixture of Experts, Latent Attention, Multi-token Prediction, model distillation, and efficient parallelization, DeepSeek set a new standard for what’s possible in an open LLM. Now, in Build a DeepSeek Model (From Scratch) you can recreate a laptop-scale version of this cutting-edge model yourself!

In Build a DeepSeek Model (From Scratch) you will learn how to:

- Implement DeepSeek’s core architectural innovations, including Multi-Head Latent Attention and Mixture-of-Experts layers
- Build a production-ready training pipeline with Multi-Token Prediction and FP8 quantization for efficiency and speed
- Maximize hardware utilization with parallelism strategies like DualPipe
- Apply post-training methods such as supervised fine-tuning and reinforcement learning to unlock reasoning capabilities
- Compress and distill large models into smaller, deployable versions for real-world use

In Build a DeepSeek Model (From Scratch) you’ll build your own DeepSeek clone from the ground up. First, you’ll quickly review LLM fundamentals, with an eye to where DeepSeek’s innovations address the common problems and limitations of standard models. Then, you’ll learn everything you need to create your own DeepSeek-inspired model, including the innovations that put DeepSeek on the map: Multihead Latent Attention (MLA), Multi-Token Prediction (MTP), Mixture of Experts (MoE), model distillation, and reasoning.

Manning Publications

Source: https://www.manning.com/books/build-a-deepseek-model-from-scratch

Professor

Roboko, a new book is out! It's called "Build a DeepSeek Model (From Scratch)"!

Roboko

A DeepSeek model, Professor? That's the one everyone has been talking about lately. What does the book cover?

Professor

It explains how to build a DeepSeek model, from scratch no less! It's packed with the techniques that set DeepSeek apart, such as Mixture of Experts (MoE) and Latent Attention.

Roboko

MoE and Latent Attention? How do they contribute to the model's performance?

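Professor

MoE splits part of the model into multiple "experts" with different specialties and routes each token to only a few of them, so the model can handle more complex tasks without running every parameter on every token. Latent Attention compresses the keys and values used by attention into a smaller latent vector, which shrinks the memory needed for the attention cache during inference.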
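The expert routing described above can be sketched in a few lines of plain Python. This is a generic top-k router with softmax gating, written only to illustrate the idea; it is not taken from the book or from DeepSeek's code, and every name in it (`moe_forward`, `gate_weights`, `make_expert`) is hypothetical. The "experts" are toy functions that just scale their input.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, gate_weights, experts, top_k=2):
    """Route one token vector x to its top_k experts and mix their outputs."""
    # Router score per expert: dot product of the token with that expert's gate row.
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    probs = softmax(logits)
    # Keep only the top_k experts; the remaining experts are never evaluated.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    out = [0.0] * len(x)
    for i in chosen:
        y = experts[i](x)
        weight = probs[i] / norm  # renormalize over the chosen experts
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, chosen


def make_expert(scale):
    """Toy expert (assumption for illustration): multiplies the input by a constant."""
    return lambda v: [scale * vi for vi in v]


random.seed(0)
dim, num_experts = 4, 8
experts = [make_expert(s) for s in range(1, num_experts + 1)]
gate_weights = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(num_experts)]

token = [0.5, -1.0, 0.25, 2.0]
output, chosen = moe_forward(token, gate_weights, experts, top_k=2)
print("experts used:", chosen)
print("output:", output)
```

With 8 experts and `top_k=2`, only a quarter of the expert parameters are touched for this token, which is where the compute savings come from.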

Roboko

I see. Does the book introduce any other innovative techniques?

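Professor

It also implements Multi-Head Latent Attention (MLA) and Multi-Token Prediction (MTP). MTP trains the model to predict several future tokens at once rather than only the next one, which improves training efficiency.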
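To get a feel for why a latent form of attention is attractive, here is a rough back-of-the-envelope sketch of the attention cache size per token. The function name and all the numbers (`n_heads=32`, `head_dim=64`, `latent_dim=256`, `rope_dim=32`) are made up for illustration and are not the book's or DeepSeek's actual configuration.

```python
def kv_cache_values_per_token(n_heads, head_dim, latent_dim=None, rope_dim=0):
    """Count the values cached per token, per layer (hypothetical helper)."""
    if latent_dim is None:
        # Standard multi-head attention: one key and one value vector per head.
        return 2 * n_heads * head_dim
    # Latent attention: one shared compressed vector per token,
    # plus an optional small key that carries positional information.
    return latent_dim + rope_dim


# Illustrative sizes only (assumptions, not a real model configuration).
mha = kv_cache_values_per_token(n_heads=32, head_dim=64)
mla = kv_cache_values_per_token(n_heads=32, head_dim=64, latent_dim=256, rope_dim=32)

print("standard attention cache per token:", mha)
print("latent attention cache per token:  ", mla)
print(f"reduction factor: {mha / mla:.1f}x")
```

The saving grows with the number of heads, because the compressed vector is shared across heads while the standard cache is not.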

Roboko

Predicting several tokens at once? How is that achieved?

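Professor

Alongside the usual next-token objective, the model gets additional prediction modules that are trained to predict tokens further ahead. That denser training signal helps the model capture the overall context and generate more natural text. On top of that, apparently you also build a training pipeline with FP8 quantization for efficiency and speed.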
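One way to picture multi-token prediction is to look at the training targets. The sketch below only builds (context, targets) pairs from a token sequence; it does not show the model or the loss, and `build_mtp_targets` and `depth` are hypothetical names chosen for this example rather than anything from the book.

```python
def build_mtp_targets(token_ids, depth):
    """For each position t, pair the context up to t with the next `depth` tokens.

    Positions that do not have `depth` future tokens left are dropped.
    With depth=1 this reduces to ordinary next-token prediction.
    """
    examples = []
    for t in range(len(token_ids) - depth):
        context = token_ids[: t + 1]
        targets = token_ids[t + 1 : t + 1 + depth]
        examples.append((context, targets))
    return examples


tokens = [10, 11, 12, 13, 14]  # made-up token ids
for context, targets in build_mtp_targets(tokens, depth=2):
    print(context, "->", targets)
```

Each position now contributes `depth` prediction targets instead of one, which is the sense in which the training signal becomes denser.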

Roboko

FP8 quantization? That means representing values as 8-bit floating-point numbers, which reduces memory use and speeds up computation.
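What 8-bit floating point does to a number can be simulated in plain Python. The sketch below rounds values to the E4M3 format (4 exponent bits, 3 mantissa bits, largest finite value 448) and uses a single scale for the whole list. That per-tensor scaling is an assumption made to keep the example short, not a description of the book's pipeline, and `round_to_e4m3` and `fake_quantize` are hypothetical helper names.

```python
import math

E4M3_MAX = 448.0      # largest finite E4M3 magnitude
MIN_NORMAL_EXP = -6   # exponent of the smallest normal E4M3 value
MANTISSA_BITS = 3


def round_to_e4m3(x):
    """Round a float to the nearest E4M3-representable value, saturating at +/-448."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)
    # frexp gives mag = m * 2**e with 0.5 <= m < 1, so the binary exponent is e - 1.
    exp = max(math.frexp(mag)[1] - 1, MIN_NORMAL_EXP)
    # Spacing between representable values in this exponent range.
    step = 2.0 ** (exp - MANTISSA_BITS)
    return sign * min(round(mag / step) * step, E4M3_MAX)


def fake_quantize(values):
    """Scale values into the E4M3 range, round them, then scale back."""
    amax = max(abs(v) for v in values)
    scale = E4M3_MAX / amax if amax > 0 else 1.0
    return [round_to_e4m3(v * scale) / scale for v in values]


weights = [0.013, -0.42, 0.0071, 0.25, -0.0009]  # made-up example values
restored = fake_quantize(weights)
for w, r in zip(weights, restored):
    print(f"{w:+.5f} -> {r:+.5f}")
```

With only 3 mantissa bits, each value keeps roughly two significant digits, which is why FP8 schemes lean on careful scaling to stay accurate.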

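Professor

Exactly! Another highlight is maximizing hardware utilization with parallelism strategies such as DualPipe. Techniques like that are essential for training large models.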

Roboko

The authors include Dr. Raj Abhijit Dandekar, who earned his PhD at MIT, and his fellow co-founders of Vizuara AI Labs. That inspires confidence.

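Professor

The target readers are said to be intermediate-to-advanced ML engineers, AI researchers, and graduate students. Knowledge of deep learning and Python programming is a must.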

Roboko

If I read this book, will I be able to build a DeepSeek model myself?

Professor

I'm sure you can, Roboko! And once it's finished, I'd be delighted if you gave it to me as a present.

Roboko

Understood, Professor. I'll do my best to read it. But before that, how about rebuilding the cleaning robot in your room with a DeepSeek model?

Professor

A splendid idea! ...But before that, I'm off to DeepSeek... I mean, to go seek out some snacks!

⚠️ This article includes content produced by generative AI and may contain hallucinations.

Programming AI Data Science
