1 00:00:00,420 --> 00:00:06,420 Hello and welcome to this new tutorial, and most importantly to this very essential new code section 2 00:00:06,420 --> 00:00:10,970 that we're about to implement, and that is of course the training of the AI. 3 00:00:11,250 --> 00:00:19,020 So we built the AI, and basically we made all the tools that will allow us to implement all the steps 4 00:00:19,020 --> 00:00:22,360 here of the augmented random search algorithm. 5 00:00:22,380 --> 00:00:28,410 You understand that what we've done so far is just preparing the tools so that we can integrate 6 00:00:28,470 --> 00:00:30,520 all of them in one same training. 7 00:00:30,630 --> 00:00:36,290 And this is exactly what we'll do in this code section, this new code section, to do the training. 8 00:00:36,300 --> 00:00:38,500 However, this will still be a function. 9 00:00:38,550 --> 00:00:44,450 That doesn't mean that at the end of this new code section about the training we will run the code. 10 00:00:44,610 --> 00:00:45,930 No, it is still a function. 11 00:00:45,930 --> 00:00:52,560 However, at the very end, in the very last code section of this implementation, we will run everything by 12 00:00:52,560 --> 00:00:58,540 preparing the environment, connecting the environment to the AI, and then executing this train 13 00:00:58,540 --> 00:01:02,220 function to train the AI to walk inside this environment. 14 00:01:02,220 --> 00:01:04,960 So we need to understand the approach we're taking. 15 00:01:05,000 --> 00:01:08,680 We first made some tools to organize and structure everything. 16 00:01:08,820 --> 00:01:14,790 And now we're about to make this huge training function, which will use the previous classes and methods 17 00:01:14,790 --> 00:01:17,460 we made before to run this whole training. 18 00:01:17,460 --> 00:01:18,590 So let's get started. 19 00:01:18,810 --> 00:01:20,870 And let's go back to Python. 
20 00:01:20,880 --> 00:01:25,920 All right, so as I said, we will integrate the whole thing still inside a function. 21 00:01:25,920 --> 00:01:32,340 So I'm starting with def, and the name of the function will simply be train, and it will take as arguments 22 00:01:32,670 --> 00:01:34,890 first the environment. 23 00:01:34,890 --> 00:01:40,790 Then of course the policy, which is our AI, which will train inside this environment. 24 00:01:41,100 --> 00:01:46,850 Then our normalizer. Why should the normalizer be an argument of this function? 25 00:01:47,100 --> 00:01:53,040 Well, it's in case you want to change your normalizer. In fact, when you build an AI, when 26 00:01:53,040 --> 00:01:58,700 you build a machine learning or deep learning model, you usually try several normalizers. 27 00:01:58,800 --> 00:02:04,230 There can be several of them, so it's nice to still have the option to change it easily by sending it 28 00:02:04,230 --> 00:02:07,750 as an argument of a function, which is the train function here. 29 00:02:08,010 --> 00:02:15,510 And then the last argument we will use is hp, which will be our future object that will contain all 30 00:02:15,510 --> 00:02:19,230 the hyperparameters that are going to be fixed during the training. 31 00:02:19,350 --> 00:02:25,320 And again, why do we use it as an argument? That's because we want to be able to do some tuning easily in 32 00:02:25,320 --> 00:02:31,020 the end, meaning that we want to be able to try other values of the hyperparameters. 33 00:02:31,260 --> 00:02:37,020 And indeed it will be simple for us to do that, because since hp is an argument of this train function, 34 00:02:37,380 --> 00:02:44,760 well, we'll simply need to change and tweak the values here to try some other ones, and still keep this 35 00:02:44,940 --> 00:02:48,710 hyperparameter object in the function without having to change anything. 
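A minimal sketch of the signature described above, under stated assumptions. The argument names env, policy, normalizer and hp follow the narration; the Hp class and its fields (nb_steps, learning_rate) are hypothetical stand-ins used only to show why passing the hyperparameters as an object keeps the function body unchanged while the values are tuned.

```python
class Hp:
    """Hypothetical hyperparameter container; the field names are assumptions."""

    def __init__(self, nb_steps=1000, learning_rate=0.02):
        self.nb_steps = nb_steps            # number of training loops (updates)
        self.learning_rate = learning_rate  # assumed step size for the update


def train(env, policy, normalizer, hp):
    """Stub of the training function: only the signature comes from the narration.

    The body is a placeholder that reports which settings it would use,
    so that the effect of swapping the hp object is visible.
    """
    return {"nb_steps": hp.nb_steps, "learning_rate": hp.learning_rate}


# Trying other hyperparameter values only requires building another Hp object;
# the train function itself is left untouched.
default_run = train(env=None, policy=None, normalizer=None, hp=Hp())
short_run = train(env=None, policy=None, normalizer=None, hp=Hp(nb_steps=10))
```

The same reasoning applies to the normalizer: because it is passed in, a different normalizer object can be tried without editing train.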
36 00:02:48,710 --> 00:02:54,810 Basically, if you want to try other values of your hyperparameters, you only need to change them here. 37 00:02:54,910 --> 00:02:56,480 So that's pretty practical. 38 00:02:56,480 --> 00:03:00,800 And that's it, those are the arguments we'll need for this train function. 39 00:03:00,810 --> 00:03:06,110 All right, let's not forget the colon, and let's define what the train function has to do. 40 00:03:06,150 --> 00:03:07,410 And so what does it have to do? 41 00:03:07,410 --> 00:03:13,580 Well, let's go back to the paper and let me show you exactly what we'll do in this train function. We'll 42 00:03:13,680 --> 00:03:21,120 basically do this whole while loop, you know, while the ending condition is not satisfied, do all this stuff, and 43 00:03:21,210 --> 00:03:23,950 all this stuff will be very easy for us to do 44 00:03:23,970 --> 00:03:25,710 thanks to all the functions we made. 45 00:03:25,950 --> 00:03:31,860 So what we're going to do is, well, it's not actually going to be a while loop, because 46 00:03:31,860 --> 00:03:35,580 we made this nb_steps variable, 47 00:03:35,610 --> 00:03:42,060 you know, as one of our hyperparameters, which I remind you is the number of loops in the training, or more 48 00:03:42,060 --> 00:03:48,300 specifically the number of times we're going to update our model, you know, with this update function. 49 00:03:48,420 --> 00:03:54,150 So it's also the number of steps of gradient descent we're going to do to update the weights of our 50 00:03:54,150 --> 00:03:55,030 policy. 51 00:03:55,320 --> 00:04:03,240 And since we have this variable that is going to set the number of training loops or the number of updates, 
52 00:04:03,450 --> 00:04:08,430 well, we don't need to do a while loop, and it's actually better to do a for loop, because indeed we're 53 00:04:08,460 --> 00:04:15,540 going to loop over all the steps in the range starting from zero, but we don't have to specify it because 54 00:04:15,540 --> 00:04:20,240 it's the default value, you know, of the lower bound of the range function in Python. 55 00:04:20,490 --> 00:04:29,700 So from 0 up to this total number of steps of the training, and I didn't forget the hp here because 56 00:04:29,700 --> 00:04:33,450 nb_steps is a hyperparameter of our implementation. 57 00:04:33,450 --> 00:04:37,410 All right, and now we are ready to start the for loop. 58 00:04:37,710 --> 00:04:44,850 So this training code section is quite long, so I want to structure it as much as possible, and more specifically 59 00:04:44,850 --> 00:04:49,320 I want to highlight each of the different steps of the training we're going through. 60 00:04:49,530 --> 00:04:54,900 So we're going to break this down into several tutorials, and in the next one we're going to start with 61 00:04:54,900 --> 00:05:01,370 the first step of the training, which will be to initialize the perturbations deltas and the positive 62 00:05:01,370 --> 00:05:03,600 rewards as well as the negative rewards. 63 00:05:03,770 --> 00:05:10,610 So basically we will sample the deltas using our sample_deltas method from the Policy class that we 64 00:05:10,610 --> 00:05:11,500 made earlier. 65 00:05:11,600 --> 00:05:16,060 So that will sample our deltas, then we will initialize the positive rewards. 66 00:05:16,070 --> 00:05:21,370 That is the rewards that we'll get by applying the perturbations in the positive directions. 67 00:05:21,650 --> 00:05:24,170 And then we'll initialize the negative rewards. 68 00:05:24,170 --> 00:05:25,650 That is the rewards 69 00:05:25,670 --> 00:05:29,380 we're going to get by applying the perturbations in the negative. 
70 00:05:29,390 --> 00:05:32,510 Or, remember, it's actually the opposite direction. 71 00:05:32,750 --> 00:05:37,010 So just an initialization step, and then we'll move on to the next step. 72 00:05:37,030 --> 00:05:40,190 There are going to be about eight steps in total for the training. 73 00:05:40,190 --> 00:05:45,530 So as you can see it's pretty long, but since we're going to break it down you'll be able to take it 74 00:05:45,530 --> 00:05:51,950 step by step, and most of the time I recommend taking a step back in order to see the logical 75 00:05:51,950 --> 00:05:55,800 flow of the implementation, and here of the training. 76 00:05:55,820 --> 00:05:57,930 So let's start the first step in the next tutorial. 77 00:05:57,950 --> 00:05:59,770 And until then, enjoy AI.
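The loop and the initialization step described above can be sketched as follows. This is an illustration under assumptions, not the course's code: the Policy class here is a simplified stand-in, nb_directions is a hypothetical hyperparameter naming how many perturbations are sampled, and sample_deltas is assumed to take that count as an argument. Only the standard library is used, so the perturbations are plain nested lists.

```python
import random


class Hp:
    """Hypothetical hyperparameter container (field names are assumptions)."""

    def __init__(self, nb_steps=3, nb_directions=4):
        self.nb_steps = nb_steps
        self.nb_directions = nb_directions


class Policy:
    """Simplified stand-in: a weight matrix stored as a list of rows."""

    def __init__(self, input_size, output_size):
        self.theta = [[0.0] * input_size for _ in range(output_size)]

    def sample_deltas(self, nb_directions):
        # One Gaussian perturbation matrix per direction, same shape as theta.
        rows, cols = len(self.theta), len(self.theta[0])
        return [
            [[random.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]
            for _ in range(nb_directions)
        ]


def train(env, policy, normalizer, hp):
    history = []
    # range(hp.nb_steps) starts at 0 by default, so no lower bound is given.
    for step in range(hp.nb_steps):
        # First step: sample the perturbations and initialize both reward lists.
        deltas = policy.sample_deltas(hp.nb_directions)
        positive_rewards = [0] * hp.nb_directions
        negative_rewards = [0] * hp.nb_directions
        # The remaining steps of the training would follow here.
        history.append((step, len(deltas), positive_rewards, negative_rewards))
    return history


history = train(env=None, policy=Policy(3, 2), normalizer=None, hp=Hp())
```

The history list exists only so the sketch returns something inspectable; a real training loop would instead fill the two reward lists by running episodes with the perturbed weights.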