1
00:00:00,390 --> 00:00:03,870
Hello and welcome to this new tutorial. Here we go with the next step:

2
00:00:03,870 --> 00:00:10,200
gathering all the positive and negative rewards into a single list to compute the standard deviation

3
00:00:10,200 --> 00:00:11,610
of all these rewards.

4
00:00:11,910 --> 00:00:12,670
So here we go.

5
00:00:12,720 --> 00:00:15,160
Let's introduce that huge list.

6
00:00:15,210 --> 00:00:24,000
Well, that's a list of 32 rewards, containing first the 16 positive rewards and then the 16 negative rewards.

7
00:00:24,120 --> 00:00:30,540
So basically we're going to do a concatenation of lists, and this list, which I'm introducing right now,

8
00:00:30,540 --> 00:00:38,710
will be called all_rewards, and we'll get it by taking the NumPy library, which has the shortcut np,

9
00:00:38,720 --> 00:00:46,200
and then the trick is to use the array function, because this allows us to do a concatenation of lists

10
00:00:46,590 --> 00:00:49,500
by just summing these two lists.

11
00:00:49,500 --> 00:00:53,690
So by these two lists I'm talking, of course, about the list of positive rewards.

12
00:00:53,700 --> 00:00:57,810
Let's take that first: positive_rewards.

13
00:00:57,810 --> 00:00:58,530
Here we go.

14
00:00:58,560 --> 00:01:06,240
And then here's what you can do: you can add a plus and then add the other list you want to concatenate

15
00:01:06,300 --> 00:01:08,780
to the first list: negative_rewards.

16
00:01:08,820 --> 00:01:09,840
Here it is.

17
00:01:09,930 --> 00:01:17,040
I'm pasting it here, and there we go: we have concatenated the two previous lists into a NumPy array.

18
00:01:17,070 --> 00:01:23,220
So we have to understand that now we don't have a list like before but a NumPy array, but of one dimension,

19
00:01:23,220 --> 00:01:25,380
and therefore it is like a list.

20
00:01:25,590 --> 00:01:32,820
But we wanted to convert that into a NumPy array because then we're going to use the std method to

21
00:01:32,910 --> 00:01:39,000
directly compute the standard deviation of all the values in this list integrated into a NumPy

22
00:01:38,990 --> 00:01:39,420
array.

23
00:01:39,570 --> 00:01:41,350
So that's the purpose of doing this.

24
00:01:41,490 --> 00:01:48,870
So I'm going to show you: you take all_rewards and then just add a dot and then this std function,

25
00:01:48,870 --> 00:01:55,380
which will return the standard deviation of all the values inside your list of all the rewards, positive

26
00:01:55,380 --> 00:01:56,670
ones and negative ones.
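
The step described so far can be sketched as follows. This is a minimal illustration, not the tutorial's exact code: the reward values are made up and there are only three per list instead of 16. Note that the plus sign concatenates the two Python lists first, and np.array then wraps the combined list.

```python
import numpy as np

# Hypothetical reward values, for illustration only (the tutorial has 16 of each).
positive_rewards = [1.0, 2.0, 3.0]
negative_rewards = [0.5, 1.5, 2.5]

# "+" concatenates the two Python lists; np.array converts the result
# into a one-dimensional NumPy array.
all_rewards = np.array(positive_rewards + negative_rewards)

# Standard deviation of all the rewards (NumPy's default is the
# population standard deviation, ddof=0).
sigma_r = all_rewards.std()

print(all_rewards.shape)  # (6,)
print(sigma_r)
```

If the two inputs were already NumPy arrays, the plus sign would add them element by element instead of concatenating, so the lists have to stay plain Python lists until np.array is called.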

27
00:01:56,850 --> 00:02:02,130
And therefore, since it returns the standard deviation, we're going to introduce a new variable that will

28
00:02:02,130 --> 00:02:06,960
be exactly this sigma_r variable.

29
00:02:06,960 --> 00:02:15,600
That was the variable in the update function, because remember, in order to do our one step of gradient descent,

30
00:02:15,890 --> 00:02:22,080
we were dividing the learning rate here by the number of best directions times this standard deviation

31
00:02:22,080 --> 00:02:22,800
of the rewards.

32
00:02:22,830 --> 00:02:24,730
That's why we're getting it.

33
00:02:25,020 --> 00:02:32,160
And therefore, now that we've made and prepared this sigma_r variable, we'll be able to input it in

34
00:02:32,160 --> 00:02:38,970
the arguments of the update function to make that one step of gradient descent to update our policy.
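
As a rough sketch of where sigma_r ends up, here is one way such an update function could look. The function signature, the rollouts structure, and the default values are assumptions made for illustration, not the tutorial's actual code. The only part taken from the explanation above is the scaling: the learning rate divided by the number of best directions times the standard deviation of the rewards.

```python
import numpy as np

def update(theta, rollouts, sigma_r, learning_rate=0.02, nb_best_directions=2):
    """Return the updated weights (hypothetical helper for illustration).

    rollouts is assumed to be a list of (r_pos, r_neg, delta) tuples:
    the reward in the positive direction, the reward in the opposite
    direction, and the perturbation that was applied.
    """
    step = np.zeros(theta.shape)
    for r_pos, r_neg, delta in rollouts:
        # Weight each perturbation by the reward difference it produced.
        step += (r_pos - r_neg) * delta
    # Scale by learning_rate / (nb_best_directions * sigma_r).
    return theta + learning_rate / (nb_best_directions * sigma_r) * step

# Toy usage with made-up numbers.
theta = np.zeros(2)
rollouts = [
    (2.0, 1.0, np.array([1.0, 0.0])),
    (1.0, 3.0, np.array([0.0, 1.0])),
]
new_theta = update(theta, rollouts, sigma_r=1.0)
print(new_theta)
```

Dividing by sigma_r keeps the size of the step roughly stable when the rewards become more spread out.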

35
00:02:38,970 --> 00:02:41,110
All right, so perfect.

36
00:02:41,160 --> 00:02:46,050
New step done, and now we're going to move on to the next step, which will be something that we haven't

37
00:02:46,050 --> 00:02:49,480
done yet at any time, with any function or method.

38
00:02:49,500 --> 00:02:56,040
It is the sorting of the directions by the highest reward, whether it is the reward in the positive direction

39
00:02:56,130 --> 00:02:58,600
or the reward obtained in the opposite direction.

40
00:02:58,800 --> 00:02:59,760
So we have to do it.

41
00:02:59,790 --> 00:03:04,740
It will be quite easy, just taking three lines of code, and then that will mean that we will have done

42
00:03:05,070 --> 00:03:07,950
everything here, you know, from 1 to 6.

43
00:03:08,010 --> 00:03:14,380
And so finally we'll be able to make that update step by applying this one step of gradient descent

44
00:03:14,370 --> 00:03:19,260
to update the weights in the direction that increases the reward, which is the whole purpose of what we're

45
00:03:19,260 --> 00:03:20,130
doing.

46
00:03:20,160 --> 00:03:25,860
So let's do that in the next tutorial: sorting the directions, and then we'll make that update step. Until

47
00:03:25,860 --> 00:03:26,970
then, enjoy.