1 00:00:00,670 --> 00:00:02,670 Hello and welcome to this new tutorial. 2 00:00:02,830 --> 00:00:09,430 So in the Bruce Doyle we made this evaluate method that returns the output when we feed the perception 3 00:00:09,430 --> 00:00:16,420 with a certain input and with three possible options which are first no perturbation is applied when 4 00:00:16,420 --> 00:00:17,500 Direction is known. 5 00:00:17,500 --> 00:00:20,770 Second when a positive perturbation is applied. 6 00:00:20,860 --> 00:00:27,010 And third when a negative perturbation is applied and negative I remind it means it's the opposite. 7 00:00:27,130 --> 00:00:31,560 It's a perturbation in the opposite direction as this one the positive one. 8 00:00:31,810 --> 00:00:33,300 So that's a good thing done. 9 00:00:33,310 --> 00:00:40,840 But the only thing missing in all this is that the doubters are not simple yet anywhere in the code 10 00:00:40,850 --> 00:00:47,890 so we just need to make this extra method here too simple to Deltas and then in the arguments here we'll 11 00:00:47,890 --> 00:00:53,780 put the perturbations that were sampled thanks to this additional method that we're about to make. 12 00:00:54,020 --> 00:00:55,960 So let's make it deaf. 13 00:00:56,230 --> 00:01:05,240 And as I said in the peruses oil we're going to call it simple deltas or you can call it simple perturbations 14 00:01:05,270 --> 00:01:12,230 but the Deltas are the perturbations of course so simple they are to us and it's not going to take any 15 00:01:12,230 --> 00:01:20,690 argument except self of course because simply it just consists of returning some small values following 16 00:01:20,780 --> 00:01:22,430 a normal distribution. 17 00:01:22,430 --> 00:01:26,090 That is a Gaussian distribution of mean zero and variance one. 18 00:01:26,090 --> 00:01:29,410 So indeed we don't need any argument here to do that. 19 00:01:29,430 --> 00:01:36,370 We'll just return it directly by using the rand and function from the library. 20 00:01:36,500 --> 00:01:43,970 And since it's actually direct to use this function well we can just start here with a return and this 21 00:01:43,970 --> 00:01:51,080 will be the only line of the method will return exactly what we want that is these simpled perturbations 22 00:01:51,100 --> 00:01:51,820 deltas. 23 00:01:52,010 --> 00:01:52,680 OK. 24 00:01:52,940 --> 00:02:01,610 So as I said we can do this through the non-pay library and the non-pay library is given by the shortcut 25 00:02:01,940 --> 00:02:09,440 and then from this non-pilot rii we're going to take the random module because it's a random module 26 00:02:09,440 --> 00:02:19,550 that contains the rand n function that is returning these simpled small values following a normal distribution. 27 00:02:19,550 --> 00:02:24,920 You know the end here is for normal and Rande it's because it's returning some random values. 28 00:02:24,920 --> 00:02:26,870 So normal distribution. 29 00:02:27,110 --> 00:02:34,610 But now it's important to understand that we're not only going to return one small value we're going 30 00:02:34,610 --> 00:02:39,580 to return a matrix of small values following a random distribution. 31 00:02:39,650 --> 00:02:40,600 And why is that. 32 00:02:40,610 --> 00:02:48,290 That's simply because we are adding this delta here multiplied by the little noise to our matrix of 33 00:02:48,290 --> 00:02:49,510 weight theta. 34 00:02:49,670 --> 00:02:53,570 And therefore this delta here must be a matrix of course. 35 00:02:53,570 --> 00:03:00,420 So basically we're adding those little very small values to each of the weights of the matrix of weight 36 00:03:00,440 --> 00:03:01,170 theta. 37 00:03:01,520 --> 00:03:09,510 And that's why here in the random function we have to specify the dimensions of the matrix theta. 38 00:03:09,500 --> 00:03:15,260 It must have the exact same dimensions of the matrix theta because we are adding those small values 39 00:03:15,260 --> 00:03:18,240 to each of the values of the matrix theta. 40 00:03:18,410 --> 00:03:25,460 And therefore here we need to specify the dimensions of this matrix of small values we want to add to 41 00:03:25,460 --> 00:03:28,390 the matrix of weight theta. 42 00:03:28,630 --> 00:03:32,070 And there is a trick in Python to do that quickly. 43 00:03:32,150 --> 00:03:34,490 It is by adding a star here. 44 00:03:34,490 --> 00:03:42,840 Then we take our matrix of weight self the theta and then we just add shape and this will return as 45 00:03:43,070 --> 00:03:47,770 the dimensions of the matrix of weight theta which is exactly what we need. 46 00:03:47,780 --> 00:03:53,630 Otherwise the other way was to take some of that theta that shape and then in square brackets 0 that 47 00:03:53,630 --> 00:03:58,810 would give us the first time mention of the matrix of where theta and then come up and then sell that 48 00:03:58,820 --> 00:04:04,550 theta that shape and then square brackets 1 which would give us the second dimension of the matrix of 49 00:04:04,550 --> 00:04:05,520 what theta. 50 00:04:05,540 --> 00:04:10,970 So you know we would have a couple with these two elements but it's much quicker it's actually a little 51 00:04:10,970 --> 00:04:11,610 trick. 52 00:04:11,630 --> 00:04:18,050 That is only compatible with Python 3 2 at this door here to specify that we want the two dimensions 53 00:04:18,110 --> 00:04:19,670 of the shape of theta. 54 00:04:19,670 --> 00:04:21,720 All right so good to know. 55 00:04:21,920 --> 00:04:29,030 So that gives us the dimensions and therefore that creates a matrix of the exact same time emotions 56 00:04:29,150 --> 00:04:30,130 as theta. 57 00:04:30,200 --> 00:04:36,020 So we can add this matrix to theta and this matrix will contain some small values following a normal 58 00:04:36,020 --> 00:04:37,100 distribution. 59 00:04:37,100 --> 00:04:38,500 So that's a good first thing done. 60 00:04:38,720 --> 00:04:42,660 But then that's not all that's not only what we want to return. 61 00:04:42,740 --> 00:04:48,020 We not only want to return a matrix of small values following a normal distribution we want to return 62 00:04:48,230 --> 00:04:59,030 16 matrices of these small values why 16 is because we are playing those perturbations for 16 different 63 00:04:59,030 --> 00:05:00,050 directions. 64 00:05:00,050 --> 00:05:04,760 Remember this and the directions hyper parameter here that is equal to 16. 65 00:05:04,850 --> 00:05:09,310 That means that when we apply a perturbation in some direction. 66 00:05:09,500 --> 00:05:17,270 Well we are going to do that for 16 directions and 16 opposite directions so therefore in total 32 directions 67 00:05:17,480 --> 00:05:20,640 16 positive directions and 16 negative directions. 68 00:05:20,670 --> 00:05:29,290 So we want to return these matrices as a list of matrices. 69 00:05:29,290 --> 00:05:35,920 And that's why I'm adding some square brackets here surrounding these matrices that we're creating and 70 00:05:35,920 --> 00:05:44,200 in order to get 16 matrices we just need to add a for loop with any you know boy will I can just add 71 00:05:44,520 --> 00:05:56,730 I or even an underscore in the range of the total number of directions and I as you can notice the HP 72 00:05:56,740 --> 00:06:03,550 because the number of directions is a hyper parameter of our future HP objects that will create at the 73 00:06:03,550 --> 00:06:05,230 end of the implementation. 74 00:06:05,230 --> 00:06:11,020 All right so basically by just adding this full hoop here for the total number of directions I'm going 75 00:06:11,020 --> 00:06:20,290 to create 16 matrices of small random values following a normal distribution that is a Gaussian distribution 76 00:06:20,290 --> 00:06:21,970 of mean zero and variance 1. 77 00:06:22,120 --> 00:06:24,030 And then things to this noise here. 78 00:06:24,160 --> 00:06:27,910 Well actually add to the matrix of weights. 79 00:06:27,910 --> 00:06:35,470 Theta are not small values following normal distribution but small values following a Gaussian distribution 80 00:06:35,530 --> 00:06:43,060 of mean zero and of variance or standard deviation 0.03 because remember that we initialized the noise 81 00:06:43,390 --> 00:06:45,070 to 0.03. 82 00:06:45,160 --> 00:06:53,450 So that's how we'll add the perturbations in a positive direction and the opposite negative direction. 83 00:06:53,510 --> 00:07:00,790 Right now we have our sampled values and so we're ready to move on to the final method of this policy 84 00:07:00,790 --> 00:07:07,910 class which is basically the next step of the paper that is make the update step. 85 00:07:08,110 --> 00:07:09,940 Well actually there is step 6 first. 86 00:07:09,970 --> 00:07:17,110 So the directions us or perturbations by the max or do we ward in the positive direction and the reward 87 00:07:17,110 --> 00:07:18,670 in the negative direction. 88 00:07:18,670 --> 00:07:20,000 But no worries we will do that. 89 00:07:20,020 --> 00:07:26,240 And actually the training itself will implement that feature here in the training we don't have to implement 90 00:07:26,240 --> 00:07:28,300 it right now in the evaluate method. 91 00:07:28,300 --> 00:07:37,390 However we have to implement this make the update which is actually one step of great in the sense because 92 00:07:37,390 --> 00:07:43,170 indeed we are updating the matrix of weight in the direction of the perturbation here. 93 00:07:43,240 --> 00:07:43,780 Right. 94 00:07:43,780 --> 00:07:51,850 The gradient is approximated by taking the differences of the words into two couple of positive and 95 00:07:51,940 --> 00:07:55,090 opposite directions and multiplied by the perturbation. 96 00:07:55,090 --> 00:08:01,900 That's one step of gradient descent and that's the update step that we have to implement now and we'll 97 00:08:01,900 --> 00:08:03,600 do that in the next hour tomorrow. 98 00:08:03,880 --> 00:08:05,550 Until then enjoy AI.