1 00:00:00,560 --> 00:00:06,200 Hi, and welcome back to Python. Now we're going to start the real implementation from scratch, 2 00:00:06,280 --> 00:00:12,400 step by step as usual, and here we go with the first step: importing the libraries, which you're going 3 00:00:12,400 --> 00:00:17,870 to see are not going to be any advanced libraries like TensorFlow or PyTorch. 4 00:00:17,870 --> 00:00:24,210 We're just going to use NumPy to build one of the most powerful AIs that you can build today. 5 00:00:24,400 --> 00:00:25,600 So this is a big deal. 6 00:00:25,600 --> 00:00:32,500 This contrast between the power of the AI and the simple library, which is NumPy. Remember that TensorFlow 7 00:00:32,510 --> 00:00:38,920 and PyTorch were invented because NumPy was not advanced enough to build some AIs, you know, to 8 00:00:38,920 --> 00:00:40,500 build some deep learning models. 9 00:00:40,690 --> 00:00:47,040 But indeed, here we're not doing some deep learning models, we're not doing deep learning at all, just shallow learning, 10 00:00:47,050 --> 00:00:53,170 if I may say so. We'll still have a perceptron with some neurons, but we'll just have one layer, and therefore 11 00:00:53,170 --> 00:00:56,780 we can totally use NumPy to build our AI. 12 00:00:57,100 --> 00:01:03,820 So what we're going to import now, importing the libraries, will be, well, first, before we import NumPy, 13 00:01:03,820 --> 00:01:10,150 we're going to import the os library, which is the library you need when you're going to use your operating 14 00:01:10,150 --> 00:01:15,820 system, and indeed we're going to use our operating system because, you know, in the end we will create 15 00:01:15,880 --> 00:01:23,850 a folder, an output folder, which will contain the videos of our AI walking on the field in the PyBullet 16 00:01:23,850 --> 00:01:24,930 framework.
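As a rough sketch of why the os library is needed, the snippet below creates an output folder for recorded videos. The helper `make_output_dir` and the folder name `videos` are hypothetical (the narration only says an output folder will exist), and a temporary directory is used so nothing is written to your working directory.

```python
import os
import tempfile


def make_output_dir(base_dir, name="videos"):
    """Create (if needed) and return a folder for output files.

    Both this helper and the default folder name are illustrative
    assumptions, not names taken from the tutorial.
    """
    path = os.path.join(base_dir, name)
    os.makedirs(path, exist_ok=True)  # no error if the folder already exists
    return path


base = tempfile.mkdtemp()          # throwaway parent directory
video_dir = make_output_dir(base)  # first call creates the folder
print(os.path.isdir(video_dir))
```

Calling `make_output_dir` a second time with the same arguments simply returns the same path, because `exist_ok=True` tolerates an existing folder.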
17 00:01:25,090 --> 00:01:35,560 OK, so operating system, and then to build our AI we will just need NumPy. Only this, and that will be enough 18 00:01:35,560 --> 00:01:37,500 to build our AI. 19 00:01:37,890 --> 00:01:39,600 OK, so here we go. 20 00:01:39,610 --> 00:01:45,940 I just saved, and now we're going to set the hyperparameters, all the hyperparameters that will be used 21 00:01:46,120 --> 00:01:47,820 to build the AI. 22 00:01:47,860 --> 00:01:49,630 So how are we going to do that? 23 00:01:49,630 --> 00:01:54,200 The best way, this is a classic way of setting the hyperparameters. 24 00:01:54,310 --> 00:02:00,130 The best way is doing that through a class. I remind those of you who are new to programming: a class 25 00:02:00,250 --> 00:02:04,350 is like a set of instructions for something you want to build. 26 00:02:04,480 --> 00:02:10,390 You know, you define instructions, and then what you can do is create some objects, as many objects as you 27 00:02:10,390 --> 00:02:15,870 want, which will contain all the properties that were defined in the class. 28 00:02:16,030 --> 00:02:22,150 So you first define the class, and then you create the objects of the class, which we'll do later on. But 29 00:02:22,510 --> 00:02:28,240 the class that we're about to build right now is just a class that will contain all the hyperparameters 30 00:02:28,450 --> 00:02:29,420 of our AI. 31 00:02:29,420 --> 00:02:34,220 I remind you that a hyperparameter is a parameter that is supposed to have a fixed value. 32 00:02:34,240 --> 00:02:39,940 You know, it won't be a variable of some function or any other kind of variable, but that doesn't mean that 33 00:02:39,940 --> 00:02:41,970 you can't try some other values. 34 00:02:41,980 --> 00:02:45,410 But in the training of the AI the value is fixed. 35 00:02:45,430 --> 00:02:46,540 That's what you need to understand. 36 00:02:46,540 --> 00:02:51,690 You know, the hyperparameter during the building and the training of the AI is fixed.
37 00:02:51,730 --> 00:02:57,640 But then if you want to try some other trainings, well, you can try these trainings with some other values 38 00:02:58,000 --> 00:03:03,280 of the hyperparameters, and therefore what we're going to define right now through this class is the 39 00:03:03,280 --> 00:03:09,610 list of all these parameters that will have a fixed value during the training and which won't be modified 40 00:03:09,640 --> 00:03:12,570 unless you want to try some other trainings. 41 00:03:12,610 --> 00:03:13,940 All right, so let's do this. 42 00:03:14,020 --> 00:03:20,560 So to define a class in Python you simply need to start with class, and then you add here the name of your 43 00:03:20,560 --> 00:03:25,400 class. The name of a class in Python usually starts with a capital letter. 44 00:03:25,450 --> 00:03:31,990 And since we're just making a new class that will give the list of all the hyperparameters of the AI, 45 00:03:32,140 --> 00:03:39,490 well, I'm going to call it Hp, as in hyperparameters, and then you add some parentheses, because 46 00:03:39,880 --> 00:03:47,950 in a class definition you can specify there the classes to inherit from, but we won't put anything in them 47 00:03:47,950 --> 00:03:56,380 here, simply because we just want to define some hyperparameters that will be defined through the first 48 00:03:56,590 --> 00:03:57,370 method of the class. 49 00:03:57,370 --> 00:04:04,480 You know, it is always the same method you start with when building a class, which is the init method, 50 00:04:04,870 --> 00:04:12,370 surrounded by two double underscores, and this method will take one argument, which is self. 51 00:04:12,370 --> 00:04:20,020 So self is a mystery for many people when they start programming, but it is only used to refer to the 52 00:04:20,110 --> 00:04:22,750 object that will be created from the class.
53 00:04:22,750 --> 00:04:28,630 We also call it an instance of the class. You know, the future objects we'll create are called instances 54 00:04:28,630 --> 00:04:35,710 of the class, and self here is to specify that when you're going to use a variable that belongs to the 55 00:04:35,710 --> 00:04:41,980 object, then you will specify that with self in order to specify that the variable indeed belongs to 56 00:04:41,980 --> 00:04:42,840 the object. 57 00:04:43,060 --> 00:04:49,240 OK, so you're going to understand now what I'm going to do, because I'm going to define all the variables 58 00:04:49,240 --> 00:04:54,380 of the object, and to define them, to specify that these are variables of the object, 59 00:04:54,400 --> 00:04:58,370 I will indeed use self here to refer to the object. 60 00:04:58,690 --> 00:05:02,050 So what is going to be the first hyperparameter? 61 00:05:02,050 --> 00:05:06,590 What is going to be the first parameter that we're going to use and that is going to be fixed in all the 62 00:05:06,590 --> 00:05:07,350 training? 63 00:05:07,700 --> 00:05:10,790 Well, that's going to be the number of steps. 64 00:05:10,790 --> 00:05:16,080 So here I'm just choosing a name for the number of steps. And what is the number of steps? 65 00:05:16,080 --> 00:05:21,110 That's basically the number of training loops we're going to have in the end, or in other words, 66 00:05:21,190 --> 00:05:24,270 the number of times we're going to update our model. 67 00:05:24,350 --> 00:05:31,400 You know, this perceptron, one layer of several neurons, which is a policy taking as input the state of 68 00:05:31,400 --> 00:05:38,780 the environment and returning as outputs the actions to play in order to walk properly. And which value 69 00:05:38,780 --> 00:05:41,370 are we going to choose? Well, we're going to choose 1000.
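The role of self can be sketched with a minimal version of the class holding only this first hyperparameter. The class name `Hp` and the value 1000 follow the narration; the attribute name `nb_steps` is an assumed spelling, since the recording does not spell it out.

```python
class Hp:
    """Container for hyperparameters (only the first one so far)."""

    def __init__(self):
        # `self` is the instance being created, so this attribute
        # belongs to that particular object, not to the class.
        self.nb_steps = 1000  # number of training loops (model updates)


hp_a = Hp()
hp_b = Hp()
hp_b.nb_steps = 500  # a different value for a separate training run

print(hp_a.nb_steps, hp_b.nb_steps)  # prints: 1000 500
```

Changing `hp_b.nb_steps` leaves `hp_a` untouched, which is what "the variable belongs to the object" means in practice.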
70 00:05:41,570 --> 00:05:48,560 We will get some good results with that. Then the next parameter, still with self because this is still a parameter 71 00:05:48,560 --> 00:05:51,580 of the future instances of the Hp class. 72 00:05:51,590 --> 00:05:57,610 It's going to be the episode length. The episode length, which is what? 73 00:05:57,830 --> 00:06:00,810 Which is the maximum length of an episode. 74 00:06:01,130 --> 00:06:08,480 So the length here is just the maximum length of an episode, meaning the maximum time the AI will walk on 75 00:06:08,480 --> 00:06:11,630 the field, and the value we're going to choose for that, 76 00:06:11,720 --> 00:06:20,420 still after experimentation, is the same, one thousand. Feel free to test any other values. Then we're going 77 00:06:20,420 --> 00:06:26,270 to define a new parameter. It is going to be self.learning_rate. 78 00:06:26,570 --> 00:06:33,590 So the learning rate is an inevitable parameter that you will always have, whether you're doing some machine 79 00:06:33,590 --> 00:06:39,950 learning, deep learning, or AI. Well, it is always here, and that's just to control how fast your 80 00:06:39,950 --> 00:06:41,270 AI is learning. 81 00:06:41,300 --> 00:06:47,150 And usually you want to start with a not too small but not too large learning rate, and a good value 82 00:06:47,150 --> 00:06:52,530 here is 0.02. Again, feel free to try some other values. 83 00:06:52,820 --> 00:06:58,730 Then another hyperparameter that is really important and really, you know, at the heart of the augmented 84 00:06:58,730 --> 00:07:08,800 random search algorithm is the number of directions. The number of directions, which is the number of 85 00:07:08,800 --> 00:07:11,580 perturbations we'll apply on each of the weights. 86 00:07:11,590 --> 00:07:17,920 Remember that we're testing a certain number of directions and also their opposite directions to figure 87 00:07:17,920 --> 00:07:20,640 out which direction increases the reward the most.
88 00:07:20,800 --> 00:07:26,230 And this number of directions is a fixed hyperparameter, and the value we're going to choose here: I actually 89 00:07:26,230 --> 00:07:27,370 started with 8. 90 00:07:27,580 --> 00:07:33,750 But I realized that I had better results with 16, and actually I will even try with some more directions 91 00:07:33,760 --> 00:07:38,930 because indeed the more directions you test, the better chance you have to increase the reward. 92 00:07:39,010 --> 00:07:43,140 But of course the more directions you test, the longer the training will take. 93 00:07:43,210 --> 00:07:44,820 So I will test it. 94 00:07:44,860 --> 00:07:47,830 But right now let's just start with 16. 95 00:07:48,010 --> 00:07:53,500 Then remember, in the article, in the paper, they want to consider separately the number of directions 96 00:07:53,890 --> 00:07:59,410 and the number of best directions, you know, the best directions, the ones that are increasing the reward the 97 00:07:59,410 --> 00:08:00,070 most. 98 00:08:00,250 --> 00:08:05,440 You want to keep them separate because you're going to reduce them, and therefore separately we're 99 00:08:05,440 --> 00:08:14,440 going to create the number of best directions, and we're going to start with 16 as well, which means that 100 00:08:14,500 --> 00:08:16,690 we're going to test all the directions so far. 101 00:08:16,690 --> 00:08:22,510 But later on we will choose a number that is lower than the total number of directions. 102 00:08:22,510 --> 00:08:25,030 And speaking of being lower, 103 00:08:25,210 --> 00:08:28,690 that's what we'll need to assert right now. 104 00:08:28,800 --> 00:08:34,840 That is, we need to make sure that the number of best directions is lower than or equal to the number of directions. 105 00:08:34,840 --> 00:08:35,960 Right, that makes sense. 106 00:08:36,130 --> 00:08:41,950 You want to keep your top directions among all your directions, and in order to assert something in Python,
107 00:08:42,070 --> 00:08:55,040 the trick to do that is to add here assert self.nb_best_directions lower than or equal to the number of 108 00:08:55,670 --> 00:09:02,970 directions. There you go: we assert that the number of best directions is always lower than or equal to the number of 109 00:09:02,970 --> 00:09:04,360 directions. 110 00:09:04,920 --> 00:09:08,790 Perfect. Then three more variables. 111 00:09:08,790 --> 00:09:15,690 The next one is the noise, which is going to be the sigma in the Gaussian distribution we'll use to sample 112 00:09:15,690 --> 00:09:21,570 the perturbations, because actually these perturbations are going to be sampled following a Gaussian 113 00:09:21,570 --> 00:09:26,640 distribution, and you know, in a Gaussian distribution you have the standard deviation, and this noise is 114 00:09:26,640 --> 00:09:34,650 actually this standard deviation sigma, and we're going to set it equal to 0.03 to not have 115 00:09:34,650 --> 00:09:35,690 a large variance. 116 00:09:35,940 --> 00:09:44,720 OK, then the next hyperparameter will be the seed, which is just to fix the current configuration of 117 00:09:44,720 --> 00:09:45,850 the environment. 118 00:09:45,860 --> 00:09:50,620 So that's basically to fix the parameters of the environment. 119 00:09:50,840 --> 00:09:54,280 We can just choose any value here, just so that we have the same results. 120 00:09:54,290 --> 00:09:58,870 You know, if you want to get the same results as the ones I will show you in the end. 121 00:09:58,910 --> 00:10:04,970 In addition, when we test our AI on several environments, well, it's normal to observe the same thing. 122 00:10:04,970 --> 00:10:08,030 So let's just pick 1. That's the seed. 123 00:10:08,270 --> 00:10:12,500 And then finally one last variable, which is of course the environment. 124 00:10:12,710 --> 00:10:16,920 So I'm going to give the following name to that variable. 125 00:10:16,970 --> 00:10:20,510 It's basically the environment we'll connect to our AI.
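The noise and seed described above can be sketched as follows. The tutorial relies on NumPy; to stay dependency-free this sketch swaps in Python's standard `random` module. The helper names, the three-weight policy, and the use of the seed to make the sampled perturbations reproducible are all assumptions made for illustration.

```python
import random

NOISE = 0.03        # standard deviation sigma of the perturbations
SEED = 1            # any fixed value makes the run reproducible
NB_DIRECTIONS = 16  # number of perturbation directions tested


def sample_directions(nb_weights, nb_directions, rng):
    """Draw one standard-normal perturbation per weight, per direction."""
    return [[rng.gauss(0.0, 1.0) for _ in range(nb_weights)]
            for _ in range(nb_directions)]


def perturb(weights, delta, noise, sign):
    """Return the weights shifted along +delta or -delta, scaled by noise."""
    return [w + sign * noise * d for w, d in zip(weights, delta)]


rng = random.Random(SEED)
weights = [0.0, 0.0, 0.0]  # hypothetical policy with three weights
deltas = sample_directions(len(weights), NB_DIRECTIONS, rng)

# Each direction is tried both ways: the direction and its opposite.
positive = perturb(weights, deltas[0], NOISE, +1)
negative = perturb(weights, deltas[0], NOISE, -1)
```

Scaling a standard-normal sample by `NOISE` gives a perturbation whose standard deviation is 0.03, which keeps the tested weights close to the current ones.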
126 00:10:20,650 --> 00:10:27,500 It's just the name of the environment, so we can call it env_name, and so far we're just going 127 00:10:27,500 --> 00:10:34,190 to input some quotes, and later on when we choose our environment to play with, we will give the name 128 00:10:34,190 --> 00:10:36,660 of this environment inside the quotes. 129 00:10:36,710 --> 00:10:37,080 All right. 130 00:10:37,080 --> 00:10:38,170 And that's it. 131 00:10:38,180 --> 00:10:42,630 These are basically the hyperparameters that will be used in the AI. 132 00:10:42,770 --> 00:10:44,300 You can actually add some more. 133 00:10:44,300 --> 00:10:45,980 For example, you can add a decay. 134 00:10:45,990 --> 00:10:51,410 You know, when you use the learning rate, you can add a decay hyperparameter to reduce the learning rate 135 00:10:51,410 --> 00:10:53,270 over the epochs in the training. 136 00:10:53,450 --> 00:10:55,130 But this will be enough. 137 00:10:55,130 --> 00:10:58,730 With this group of hyperparameters we will get some great results. 138 00:10:58,730 --> 00:10:59,510 All right. 139 00:10:59,510 --> 00:11:04,910 So that was the first step, and in the next step it's going to get slightly more challenging. 140 00:11:04,910 --> 00:11:12,080 We will implement the first important feature of the paper, which is about normalizing the states, so 141 00:11:12,080 --> 00:11:14,410 we will follow exactly what is in the paper. 142 00:11:14,660 --> 00:11:18,640 And I remind you that we do this for performance purposes. 143 00:11:18,980 --> 00:11:20,650 So let's do this in the next tutorial. 144 00:11:20,660 --> 00:11:21,480 I'll see you there. 145 00:11:21,500 --> 00:11:22,900 And until then, enjoy AI.
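Putting the walkthrough together, the class might look like the sketch below. The class name `Hp` and the values follow the narration, though several values are best readings of an imperfect recording. Attribute names such as `nb_steps`, `episode_length`, `nb_directions`, `noise` and `seed` are assumed spellings.

```python
class Hp:
    """Hyperparameters that stay fixed during one training run."""

    def __init__(self):
        self.nb_steps = 1000          # number of training loops
        self.episode_length = 1000    # maximum length of one episode
        self.learning_rate = 0.02     # how fast the AI learns
        self.nb_directions = 16       # perturbations tested per step
        self.nb_best_directions = 16  # top directions kept for the update
        # Non-strict comparison: both counts start at 16, so a strict
        # "lower than" check would fail with these values.
        assert self.nb_best_directions <= self.nb_directions
        self.noise = 0.03             # sigma of the Gaussian perturbations
        self.seed = 1                 # fixed for reproducible results
        self.env_name = ''            # environment name, filled in later


hp = Hp()
print(hp.nb_directions, hp.nb_best_directions)  # prints: 16 16
```

To try another training run, create a new instance and change an attribute (for example `hp.nb_directions = 32`) before training starts; the values then stay fixed for that run.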