Udemy - LLM Reinforcement Learning Fine-Tuning DeepSeek Method GRPO

Added 2 months ago by freecoursewb in Other

Files

Udemy - LLM Reinforcement Learning Fine-Tuning DeepSeek Method GRPO (Size: 1.8 GB)
  Bonus Resources.txt 102.4 B
  Get Bonus Downloads Here.url 204.8 B
  ~Get Your Files Here !
  1 - Introduction
  1. Introduction.mp4 11.4 MB
  2. Course Content Introduction.mp4 47.7 MB
  3. Jupyter Notebooks.html 5.4 KB
  Notebooks 2
  Bolum_(Section)_1.ipynb 465.1 KB
  Bolum_(Section)_3_DPO.ipynb 259.4 KB
  Bolum_(Section)_4_GRPO_.ipynb 624.2 KB
  Bolum_(Section)__2.ipynb 207.9 KB
  DS_Store 6 KB
  Quantization.ipynb 81.9 KB
  Thinking__(REASONING)_model.ipynb 54.8 KB
  __MACOSX
  Notebooks 2
  2 - Quantization, LoRA, SFT, Data Collator, Data Preparation…
  10. Preparing Dataset, Chat Template, and Integrating Custom Tokens.en_US.srt 13.3 KB
  10. Preparing Dataset, Chat Template, and Integrating Custom Tokens.mp4 145.9 MB
  11. Continuing Dataset Preparation and Tokenization.en_US.srt 5.6 KB
  11. Continuing Dataset Preparation and Tokenization.mp4 47 MB
  12. What is a Data Collator How Does It Work Practical Example.en_US.srt 9.1 KB
  12. What is a Data Collator How Does It Work Practical Example.mp4 84.6 MB
  13. What is LoRA Why Use It.en_US.srt 3.4 KB
  13. What is LoRA Why Use It.mp4 17 MB
  14. Integrating LoRA Matrices into the Model.en_US.srt 7.6 KB
  14. Integrating LoRA Matrices into the Model.mp4 37.6 MB
  15. Setting Training Arguments (Training Hyperparameters).en_US.srt 9.8 KB
  15. Setting Training Arguments (Training Hyperparameters).mp4 32.1 MB
  16. Setting Trainer, Starting Training, and Evaluating Results.en_US.srt 3.9 KB
  16. Setting Trainer, Starting Training, and Evaluating Results.mp4 21.4 MB
  17. Merging Trained LoRA Matrices with the Model.en_US.srt 6.8 KB
  17. Merging Trained LoRA Matrices with the Model.mp4 51 MB
  18. Uploading Model on Hugging Face and Using it.en_US.srt 5.7 KB
  18. Uploading Model on Hugging Face and Using it.mp4 49.4 MB
  19. Hyperparameters Affecting the Outputs.en_US.srt 6.5 KB
  19. Hyperparameters Affecting the Outputs.mp4 30.3 MB
  3 - Adding New Tokens and Creating Templates for the Tokenizer
  20. Bolum_(Section)__2.ipynb.bin 207.9 KB
  20. Download the Model and Tokenizer.en_US.srt 4.6 KB
  20. Download the Model and Tokenizer.mp4 37 MB
  21. Adding New Custom Tokens to the Tokenizer.en_US.srt 8 KB
  21. Adding New Custom Tokens to the Tokenizer.mp4 30.9 MB
  22. Creating Templates with New Custom Tokens and Integrating Them into the Dataset.en_US.srt 7.7 KB
  22. Creating Templates with New Custom Tokens and Integrating Them into the Dataset.mp4 28.7 MB
  4 - DPO (Direct Preference Optimization)
  23. Bolum_(Section)_3_DPO.ipynb.bin 259.4 KB
  23. What is DPO What Data Format Does It Expect.en_US.srt 7.5 KB
  23. What is DPO What Data Format Does It Expect.mp4 43.4 MB
  24. Bolum_(Section)_3_DPO.ipynb.bin 259.4 KB
  24. Downloading Model & Understanding How the DPO Data Collator do Padding.en_US.srt 7.1 KB
  24. Downloading Model & Understanding How the DPO Data Collator do Padding.mp4 45.4 MB
  25. Preparing the Dataset for DPO.en_US.srt 10.9 KB
  25. Preparing the Dataset for DPO.mp4 84.4 MB
  26. Adding LoRA Matrices to the Model.en_US.srt 3.8 KB
  26. Adding LoRA Matrices to the Model.mp4 19.1 MB
  27. Setting Training Arguments (with DPOConfig).en_US.srt 5.4 KB
  27. Setting Training Arguments (with DPOConfig).mp4 13.3 MB
  28. Training the Model and Merging the LoRA Matrices.en_US.srt 6.9 KB
  28. Training the Model and Merging the LoRA Matrices.mp4 49.7 MB
  5 - GRPO (Group Relative Policy Optimization) Reinforcement Learning
  29. Bolum_(Section)_4_GRPO_.ipynb.bin 624.2 KB
  29. Thinking__(REASONING)_model.ipynb.bin 54.8 KB
  29. What is a “Reasoning” Model How Does It Work.en_US.srt 5 KB
  29. What is a “Reasoning” Model How Does It Work.mp4 56.5 MB
  30. What is GRPO How Is It Applied.en_US.srt 4.9 KB
  30. What is GRPO How Is It Applied.mp4 21.4 MB
  31. Bolum_(Section)_4_GRPO_.ipynb.bin 624.2 KB
  31. What are Unsloth and VLLM + Download the Model.en_US.srt 6.9 KB
  31. What are Unsloth and VLLM + Download the Model.mp4 62.7 MB
  32. Examining the Dataset and Initial Preparation Steps.en_US.srt 7.6 KB
  32. Examining the Dataset and Initial Preparation Steps.mp4 54 MB
  33. Extracting Specific Parts of Data Regex and Group Operations.en_US.srt 13.5 KB
  33. Extracting Specific Parts of Data Regex and Group Operations.mp4 49.5 MB
  34. In Which Format is Data Sent to Reward Functions.en_US.srt 7 KB
  34. In Which Format is Data Sent to Reward Functions.mp4 88.9 MB
  35. 1st Reward Function.en_US.srt 13.1 KB
  35. 1st Reward Function.mp4 64.4 MB
  36. 2nd Reward Function.en_US.srt 12.3 KB
  36. 2nd Reward Function.mp4 73.2 MB
  37. 3rd Reward Function.en_US.srt 11.1 KB
  37. 3rd Reward Function.mp4 77.9 MB
  38. 4th Reward Function.en_US.srt 7.2 KB
  38. 4th Reward Function.mp4 26.6 MB
  39. Training Hyperparameters (with GRPO Config).en_US.srt 8.3 KB
  39. Training Hyperparameters (with GRPO Config).mp4 61.3 MB
  40. Trainer Object and Training Process.en_US.srt 2.6 KB
  40. Trainer Object and Training Process.mp4 12 MB
  41. Results Table Rewards and Sample Outputs.en_US.srt 4.3 KB
  41. Results Table Rewards and Sample Outputs.mp4 78.4 MB
  42. BONUS_New_GRPO_Notebook.html 7.1 KB
  42. SFT_GRPO_Training.ipynb.bin 10 MB
  6 - BONUS_New_GRPO_Notebook
  43. BONUS_New_GRPO_Notebook.html 7.1 KB
  43. SFT_GRPO_Training.ipynb.bin 10 MB
  4. Quantization.ipynb.bin 81.9 KB
  4. What is Quantization How does it affect model size and parameters.en_US.srt 4.9 KB
  4. What is Quantization How does it affect model size and parameters.mp4 40.2 MB
  5. Create a Hugging Face Account and Get a Token.en_US.srt 5 KB
  5. Create a Hugging Face Account and Get a Token.mp4 35.1 MB
  6. Create a Colab Notebook and Get Familiar with the Libraries.en_US.srt 4.7 KB
  6. Create a Colab Notebook and Get Familiar with the Libraries.mp4 14.7 MB
  7. Bolum_(Section)_1.ipynb.bin 465.1 KB
  7. Download the Model with Quantization.en_US.srt 6.8 KB
  7. Download the Model with Quantization.mp4 27.5 MB
  8. Bolum_(Section)_1.ipynb.bin 465 KB
  8. Differences Between Base and Instruct Models.en_US.srt 8.5 KB
  8. Differences Between Base and Instruct Models.mp4 78 MB
  9. Download and Examine the Dataset.en_US.srt 4.7 KB
  9. Download and Examine the Dataset.mp4 18.9 MB
  _.DS_Store 102.4 B
  _Bolum_(Section)_1.ipynb 716.8 B
  _Bolum_(Section)_3_DPO.ipynb 409.6 B
  _Bolum_(Section)_4_GRPO_.ipynb 512 B
  _Bolum_(Section)__2.ipynb 204.8 B
  _Quantization.ipynb 409.6 B
  _Thinking__(REASONING)_model.ipynb 204.8 B

Description


LLM Reinforcement Learning Fine-Tuning DeepSeek Method GRPO

Last updated 6/2025
MP4 | Video: h264, 1920x1080 | Audio: AAC, 44.1 kHz, 2 Ch
Language: English | Duration: 3h 45m | Size: 1.85 GB

LLM Fine-Tuning and Reinforcement Learning with SFT, LoRA, DPO, and GRPO on Custom Data with Hugging Face

What you'll learn
You will grasp the core principles of Large Language Models (LLMs) and the overall structure behind their training processes.
You will learn the differences between base models and instruct models, as well as the methods for preparing data for each.
You’ll learn data preprocessing techniques and essential tips, how to identify the special tokens a model requires, how to understand data formats, and methods…
You’ll gain hands-on experience and a detailed understanding of how LoRA and data collators work.
You’ll gain a detailed understanding of crucial hyperparameters used in training, including their purpose and how they function.
You’ll learn, in practice and in detail, how trained LoRA matrices are merged with the base model, along with key considerations and best practices to follow during…
You’ll learn what Direct Preference Optimization (DPO) is, how it works, the expected data format, and the specific scenarios in which it’s used.
You’ll learn key considerations when preparing data for DPO and how the DPO data collator works.
You’ll learn about the specific hyperparameters used in DPO training, their roles, and how they function.
You’ll learn how to upload your trained model to platforms like Hugging Face and manage hyperparameters effectively after training.
You’ll learn in detail how Group Relative Policy Optimization (GRPO), a reinforcement learning method, works, including an in-depth understanding of its learning…
You’ll learn how to prepare data specifically for Group Relative Policy Optimization (GRPO).
You’ll learn how to create reward functions, the most critical part of Group Relative Policy Optimization (GRPO), through various practical reward function examples…
You’ll learn in detail the format in which data is passed to GRPO reward functions and how to process that data inside those functions.
You’ll learn how to define rewards within functions and establish clear reward templates for GRPO.
You’ll learn many practical details, such as extracting the reward-relevant parts of raw responses and assigning rewards based on those extracted segments.
You’ll learn how to transform an Instruct model into one capable of generating “Chain of Thought” reasoning through GRPO (Group Relative Policy Optimization).
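The GRPO topics listed above (reward functions, extracting reward-worthy parts of a raw response, and scoring completions relative to one another) can be illustrated with a small, dependency-free sketch. It is not the course's code: the `<reasoning>…</reasoning><answer>…</answer>` template, the reward values 0.5 and 2.0, the epsilon, and all function names are hypothetical assumptions. The advantage step follows the group-relative normalization GRPO is named for: each completion's reward minus the group mean, divided by the group standard deviation.

```python
import re
import statistics
from typing import List, Optional

# ASSUMPTION: a hypothetical response template, not the course's exact format:
#   <reasoning> ... </reasoning>
#   <answer> ... </answer>
FORMAT_RE = re.compile(
    r"^<reasoning>.*?</reasoning>\s*<answer>(.*?)</answer>\s*$", re.DOTALL
)


def extract_answer(completion: str) -> Optional[str]:
    """Return the text inside <answer>...</answer>, or None if the template is not followed."""
    match = FORMAT_RE.match(completion.strip())
    return match.group(1).strip() if match else None


def format_reward(completions: List[str]) -> List[float]:
    """Hypothetical reward: 0.5 for completions that follow the template, else 0.0."""
    return [0.5 if extract_answer(c) is not None else 0.0 for c in completions]


def correctness_reward(completions: List[str], reference: str) -> List[float]:
    """Hypothetical reward: 2.0 when the extracted answer equals the reference exactly."""
    return [2.0 if extract_answer(c) == reference else 0.0 for c in completions]


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """GRPO-style normalization within one group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; eps guards against std == 0
    return [(r - mean) / (std + eps) for r in rewards]


# One prompt ("What is 3 + 4?") with a group of four sampled completions.
group = [
    "<reasoning>3 + 4 = 7</reasoning>\n<answer>7</answer>",
    "<reasoning>3 + 4 = 8</reasoning>\n<answer>8</answer>",
    "The answer is 7",
    "<reasoning>add them</reasoning>\n<answer>7</answer>",
]
# Sum the individual reward functions to get one scalar reward per completion.
totals = [f + c for f, c in zip(format_reward(group), correctness_reward(group, "7"))]
advantages = group_relative_advantages(totals)
print("rewards:   ", totals)
print("advantages:", [round(a, 3) for a in advantages])
```

Completions scoring above the group mean receive a positive advantage and are reinforced, while the one that ignores the template receives the most negative advantage. Because each reward is judged only against its own group, no separate value (critic) model is needed, which is the main design choice behind GRPO.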
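For the DPO topics listed above, the standard sigmoid form of the DPO loss for one preference pair can be computed from four sequence log-probabilities: the chosen and rejected responses scored by the policy being trained and by a frozen reference model. This is a minimal sketch, not the course's or any library's implementation; `beta = 0.1` and the log-probability values in the usage lines are assumed, illustrative numbers.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """Sigmoid DPO loss for one pair, from summed sequence log-probabilities."""
    # How much more (or less) likely each response became relative to the reference.
    chosen_logratio = policy_chosen_logp - ref_chosen_logp
    rejected_logratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_logratio - rejected_logratio)
    # -log(sigmoid(margin)) == log(1 + exp(-margin)) == softplus(-margin)
    return softplus(-margin)


# Hypothetical log-probabilities (assumed values, for illustration only).
baseline = dpo_loss(-12.0, -13.0, -12.0, -13.0)  # policy identical to reference
improved = dpo_loss(-10.0, -15.0, -12.0, -13.0)  # policy favors the chosen response more
print(round(baseline, 4), round(improved, 4))
```

When the policy still matches the reference, the margin is zero and the loss equals log 2; it falls below that as the policy raises the chosen response's likelihood relative to the rejected one, and `beta` controls how strongly that margin is rewarded.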

Requirements
Basic knowledge of Python programming.
Introductory-level familiarity with artificial intelligence and machine learning concepts.
Ideally, prior experience with Jupyter Notebook or Google Colab.