1 00:00:00,390 --> 00:00:03,870 Hello and welcome to this new tutorial, and here we go with the next step: 2 00:00:03,870 --> 00:00:10,200 gathering all the positive and negative rewards into one same list to compute the standard deviation 3 00:00:10,200 --> 00:00:11,610 of all these rewards. 4 00:00:11,910 --> 00:00:12,670 So here we go. 5 00:00:12,720 --> 00:00:15,160 Let's introduce that huge list. 6 00:00:15,210 --> 00:00:24,000 Well, that's a list of 32 rewards containing first the 16 positive rewards and then second the 16 negative rewards. 7 00:00:24,120 --> 00:00:30,540 So basically we're going to do a concatenation of lists, and this list, which I'm introducing right now, 8 00:00:30,540 --> 00:00:38,710 will be called all_rewards, and we'll get it by taking the NumPy library, which has the shortcut 9 00:00:38,720 --> 00:00:46,200 np, and then the trick is to use the array function, because this allows us to do a concatenation of lists 10 00:00:46,590 --> 00:00:49,500 by just summing these two lists. 11 00:00:49,500 --> 00:00:53,690 So by these two lists I'm talking of course about the list of positive rewards. 12 00:00:53,700 --> 00:00:57,810 Let's take that first: positive_rewards. 13 00:00:57,810 --> 00:00:58,530 Here we go. 14 00:00:58,560 --> 00:01:06,240 And then here's what you can do: you can add a plus and then add the other list you want to concatenate 15 00:01:06,300 --> 00:01:08,780 to the first list: negative_rewards. 16 00:01:08,820 --> 00:01:09,840 Here it is. 17 00:01:09,930 --> 00:01:17,040 I'm pasting it here, and there we go: we have concatenated the two previous lists into a NumPy array. 18 00:01:17,070 --> 00:01:23,220 So we have to understand that now we don't have a list like before but a NumPy array, but of one dimension, 19 00:01:23,220 --> 00:01:25,380 and therefore it is like a list.
20 00:01:25,590 --> 00:01:32,820 But we wanted to convert that into a number array because then we're going to use the SDD method to 21 00:01:32,910 --> 00:01:39,000 directly compute the standard deviation of all the values in this list integrated into a number. 22 00:01:38,990 --> 00:01:39,420 Right. 23 00:01:39,570 --> 00:01:41,350 So that's the purpose of doing this. 24 00:01:41,490 --> 00:01:48,870 So I'm going to show you you take all three words and then just add a dot and then this s t d function 25 00:01:48,870 --> 00:01:55,380 which will return you the standard deviation of all the values inside your list of other words positive 26 00:01:55,380 --> 00:01:56,670 ones and negative ones. 27 00:01:56,850 --> 00:02:02,130 And therefore since it returns the standard deviation we're going to introduce a new variable that will 28 00:02:02,130 --> 00:02:06,960 be exactly this Sigma are variable. 29 00:02:06,960 --> 00:02:15,600 That was the variable in the Date function because remember in order to do our one step of in the sense 30 00:02:15,890 --> 00:02:22,080 we were dividing the learning rate here by little number of desperations times this standard deviation 31 00:02:22,080 --> 00:02:22,800 of two words. 32 00:02:22,830 --> 00:02:24,730 That's why we're getting it. 33 00:02:25,020 --> 00:02:32,160 And therefore now that we've made and prepare this Sigma our Very well we'll be able to input it in 34 00:02:32,160 --> 00:02:38,970 the arguments of the update function to make that one step of gray in the sense to update our policy. 35 00:02:38,970 --> 00:02:41,110 All right so perfect. 36 00:02:41,160 --> 00:02:46,050 Newstead done and now we're going to move onto the next step which will be something that we haven't 37 00:02:46,050 --> 00:02:49,480 done yet any time with no function or method. 
38 00:02:49,500 --> 00:02:56,040 It is the sorting of the directions by the highest reward, whether it is the reward in the positive direction 39 00:02:56,130 --> 00:02:58,600 or the reward obtained in the opposite direction. 40 00:02:58,800 --> 00:02:59,760 So we have to do it. 41 00:02:59,790 --> 00:03:04,740 It will be quite easy, just taking three lines of code, and then that will mean that we will have done 42 00:03:05,070 --> 00:03:07,950 everything here, you know, from 1 to 6. 43 00:03:08,010 --> 00:03:14,380 And so finally we'll be able to make that update step by applying this one step of gradient descent 44 00:03:14,370 --> 00:03:19,260 to update the weights in the direction that increases the reward, which is the whole purpose of what we're 45 00:03:19,260 --> 00:03:20,130 doing. 46 00:03:20,160 --> 00:03:25,860 So let's do that in the next tutorial, sorting the directions, and then we'll make that update step. Until 47 00:03:25,860 --> 00:03:26,970 then, enjoy.
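The step described in this tutorial (concatenating the two reward lists, converting the result to a NumPy array, and taking its standard deviation) can be sketched as follows. This is a minimal illustration, not the course's exact code: the reward values, `learning_rate`, `nb_best_directions`, and `step_scale` are hypothetical placeholders, and the lists are shortened to three entries each instead of 16. Note that it is the `+` on plain Python lists that performs the concatenation; `np.array` then wraps the combined list so that `.std()` is available.

```python
import numpy as np

# Hypothetical reward values for illustration; the tutorial uses 16 per list.
positive_rewards = [1.0, 2.0, 3.0]
negative_rewards = [0.0, 2.0, 4.0]

# `+` concatenates the two Python lists; np.array turns the result
# into a one-dimensional NumPy array.
all_rewards = np.array(positive_rewards + negative_rewards)

# Standard deviation of all rewards, positive and negative together.
# NumPy's default is the population standard deviation (ddof=0).
sigma_r = all_rewards.std()

# Assumed use in the later update step: the learning rate is divided by
# the number of best directions times sigma_r (placeholder values here).
learning_rate = 0.02
nb_best_directions = 3
step_scale = learning_rate / (nb_best_directions * sigma_r)

print(all_rewards.shape, sigma_r, step_scale)
```

Dividing by `sigma_r` keeps the size of the update roughly independent of how spread out the rewards are in a given episode, which is why the value is computed before the update function is called.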