1
00:00:00,670 --> 00:00:02,670
Hello and welcome to this new tutorial.

2
00:00:02,830 --> 00:00:09,430
So in the previous tutorial we made this evaluate method that returns the output when we feed the perceptron

3
00:00:09,430 --> 00:00:16,420
with a certain input, and with three possible options, which are: first, no perturbation is applied, when

4
00:00:16,420 --> 00:00:17,500
direction is None.

5
00:00:17,500 --> 00:00:20,770
Second when a positive perturbation is applied.

6
00:00:20,860 --> 00:00:27,010
And third, when a negative perturbation is applied, and negative, I remind you, means it's the opposite.

7
00:00:27,130 --> 00:00:31,560
It's a perturbation in the opposite direction to this one, the positive one.

8
00:00:31,810 --> 00:00:33,300
So that's a good thing done.

9
00:00:33,310 --> 00:00:40,840
But the only thing missing in all this is that the deltas are not sampled yet anywhere in the code,

10
00:00:40,850 --> 00:00:47,890
so we just need to make this extra method here to sample the deltas, and then in the arguments here we'll

11
00:00:47,890 --> 00:00:53,780
put the perturbations that were sampled thanks to this additional method that we're about to make.

12
00:00:54,020 --> 00:00:55,960
So let's make it: def.

13
00:00:56,230 --> 00:01:05,240
And as I said in the previous tutorial, we're going to call it sample_deltas, or you can call it sample_perturbations,

14
00:01:05,270 --> 00:01:12,230
but the deltas are the perturbations of course, so sample_deltas, and it's not going to take any

15
00:01:12,230 --> 00:01:20,690
argument except self of course because simply it just consists of returning some small values following

16
00:01:20,780 --> 00:01:22,430
a normal distribution.

17
00:01:22,430 --> 00:01:26,090
That is a Gaussian distribution of mean zero and variance one.

18
00:01:26,090 --> 00:01:29,410
So indeed we don't need any argument here to do that.

19
00:01:29,430 --> 00:01:36,370
We'll just return it directly by using the randn function from the NumPy library.

20
00:01:36,500 --> 00:01:43,970
And since it's actually straightforward to use this function, we can just start here with a return, and this

21
00:01:43,970 --> 00:01:51,080
will be the only line of the method. We'll return exactly what we want, that is, these sampled perturbations,

22
00:01:51,100 --> 00:01:51,820
deltas.

23
00:01:52,010 --> 00:01:52,680
OK.

24
00:01:52,940 --> 00:02:01,610
So as I said, we can do this through the NumPy library, and the NumPy library is given by the shortcut np,

25
00:02:01,940 --> 00:02:09,440
and then from this NumPy library we're going to take the random module, because it's the random module

26
00:02:09,440 --> 00:02:19,550
that contains the randn function, which returns these sampled small values following a normal distribution.

27
00:02:19,550 --> 00:02:24,920
You know, the n here is for normal, and rand is because it's returning some random values.

28
00:02:24,920 --> 00:02:26,870
So normal distribution.

29
00:02:27,110 --> 00:02:34,610
But now it's important to understand that we're not only going to return one small value we're going

30
00:02:34,610 --> 00:02:39,580
to return a matrix of small values following a normal distribution.

31
00:02:39,650 --> 00:02:40,600
And why is that?

32
00:02:40,610 --> 00:02:48,290
That's simply because we are adding this delta here multiplied by the little noise to our matrix of

33
00:02:48,290 --> 00:02:49,510
weight theta.

34
00:02:49,670 --> 00:02:53,570
And therefore this delta here must be a matrix of course.

35
00:02:53,570 --> 00:03:00,420
So basically we're adding those little very small values to each of the weights of the matrix of weight

36
00:03:00,440 --> 00:03:01,170
theta.

37
00:03:01,520 --> 00:03:09,510
And that's why here in the randn function we have to specify the dimensions of the matrix theta.

38
00:03:09,500 --> 00:03:15,260
It must have the exact same dimensions as the matrix theta because we are adding those small values

39
00:03:15,260 --> 00:03:18,240
to each of the values of the matrix theta.

40
00:03:18,410 --> 00:03:25,460
And therefore here we need to specify the dimensions of this matrix of small values we want to add to

41
00:03:25,460 --> 00:03:28,390
the matrix of weight theta.

42
00:03:28,630 --> 00:03:32,070
And there is a trick in Python to do that quickly.

43
00:03:32,150 --> 00:03:34,490
It is by adding a star here.

44
00:03:34,490 --> 00:03:42,840
Then we take our matrix of weights, self.theta, and then we just add .shape, and this will return us

45
00:03:43,070 --> 00:03:47,770
the dimensions of the matrix of weight theta which is exactly what we need.

46
00:03:47,780 --> 00:03:53,630
Otherwise the other way was to take self.theta.shape and then in square brackets 0, that

47
00:03:53,630 --> 00:03:58,810
would give us the first dimension of the matrix of weights theta, and then comma, and then self.

48
00:03:58,820 --> 00:04:04,550
theta.shape and then square brackets 1, which would give us the second dimension of the matrix of

49
00:04:04,550 --> 00:04:05,520
weights theta.

50
00:04:05,540 --> 00:04:10,970
So you know, we would have a tuple with these two elements, but it's much quicker. It's actually a little

51
00:04:10,970 --> 00:04:11,610
trick.

52
00:04:11,630 --> 00:04:18,050
That is only compatible with Python 3, to add this star here to specify that we want the two dimensions

53
00:04:18,110 --> 00:04:19,670
of the shape of theta.

54
00:04:19,670 --> 00:04:21,720
All right so good to know.

55
00:04:21,920 --> 00:04:29,030
So that gives us the dimensions and therefore that creates a matrix of the exact same dimensions

56
00:04:29,150 --> 00:04:30,130
as theta.

57
00:04:30,200 --> 00:04:36,020
So we can add this matrix to theta and this matrix will contain some small values following a normal

58
00:04:36,020 --> 00:04:37,100
distribution.

59
00:04:37,100 --> 00:04:38,500
So that's a good first thing done.

60
00:04:38,720 --> 00:04:42,660
But then that's not all; that's not the only thing we want to return.

61
00:04:42,740 --> 00:04:48,020
We not only want to return a matrix of small values following a normal distribution we want to return

62
00:04:48,230 --> 00:04:59,030
16 matrices of these small values. Why 16? Because we are applying those perturbations for 16 different

63
00:04:59,030 --> 00:05:00,050
directions.

64
00:05:00,050 --> 00:05:04,760
Remember this nb_directions hyperparameter here that is equal to 16.

65
00:05:04,850 --> 00:05:09,310
That means that when we apply a perturbation in some direction.

66
00:05:09,500 --> 00:05:17,270
Well we are going to do that for 16 directions and 16 opposite directions so therefore in total 32 directions

67
00:05:17,480 --> 00:05:20,640
16 positive directions and 16 negative directions.

68
00:05:20,670 --> 00:05:29,290
So we want to return these matrices as a list of matrices.

69
00:05:29,290 --> 00:05:35,920
And that's why I'm adding some square brackets here surrounding these matrices that we're creating and

70
00:05:35,920 --> 00:05:44,200
in order to get 16 matrices we just need to add a for loop with any, you know, variable. I can just add

71
00:05:44,520 --> 00:05:56,730
i or even an underscore, in the range of the total number of directions, and I add, as you can notice, the hp,

72
00:05:56,740 --> 00:06:03,550
because the number of directions is a hyperparameter of our future hp object that we'll create at the

73
00:06:03,550 --> 00:06:05,230
end of the implementation.

74
00:06:05,230 --> 00:06:11,020
All right, so basically by just adding this for loop here for the total number of directions I'm going

75
00:06:11,020 --> 00:06:20,290
to create 16 matrices of small random values following a normal distribution that is a Gaussian distribution

76
00:06:20,290 --> 00:06:21,970
of mean zero and variance 1.

77
00:06:22,120 --> 00:06:24,030
And then thanks to this noise here,

78
00:06:24,160 --> 00:06:27,910
what we'll actually add to the matrix of weights

79
00:06:27,910 --> 00:06:35,470
theta are not small values following a standard normal distribution, but small values following a Gaussian distribution

80
00:06:35,530 --> 00:06:43,060
of mean zero and of standard deviation 0.03, because remember that we initialized the noise

81
00:06:43,390 --> 00:06:45,070
to 0.03.

82
00:06:45,160 --> 00:06:53,450
So that's how we'll add the perturbations in a positive direction and the opposite negative direction.

83
00:06:53,510 --> 00:07:00,790
All right, now we have our sampled values and so we're ready to move on to the final method of this policy

84
00:07:00,790 --> 00:07:07,910
class which is basically the next step of the paper that is make the update step.

85
00:07:08,110 --> 00:07:09,940
Well actually there is step 6 first.

86
00:07:09,970 --> 00:07:17,110
Sort the directions, or perturbations, by the max of the reward in the positive direction and the reward

87
00:07:17,110 --> 00:07:18,670
in the negative direction.

88
00:07:18,670 --> 00:07:20,000
But no worries we will do that.

89
00:07:20,020 --> 00:07:26,240
And actually, in the training itself we'll implement that feature, here in the training. We don't have to implement

90
00:07:26,240 --> 00:07:28,300
it right now in the evaluate method.

91
00:07:28,300 --> 00:07:37,390
However, we have to implement this "make the update" step, which is actually one step of gradient descent, because

92
00:07:37,390 --> 00:07:43,170
indeed we are updating the matrix of weight in the direction of the perturbation here.

93
00:07:43,240 --> 00:07:43,780
Right.

94
00:07:43,780 --> 00:07:51,850
The gradient is approximated by taking the differences of the rewards in each couple of positive and

95
00:07:51,940 --> 00:07:55,090
opposite directions and multiplied by the perturbation.

96
00:07:55,090 --> 00:08:01,900
That's one step of gradient descent and that's the update step that we have to implement now and we'll

97
00:08:01,900 --> 00:08:03,600
do that in the next tutorial.

98
00:08:03,880 --> 00:08:05,550
Until then enjoy AI.
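
The method described in this tutorial can be sketched as follows. This is only a minimal sketch under some assumptions: the Hp and Policy class layouts, the constructor arguments (input_size, output_size), the zero initialization of theta, and passing hp into the policy are illustrative choices, not necessarily the code shown on screen. Only np.random.randn, self.theta.shape, the 16 directions and the 0.03 noise come from the narration.

```python
import numpy as np


class Hp:
    """Hyperparameters mentioned in the narration (class layout is an assumption)."""

    def __init__(self):
        self.nb_directions = 16  # number of perturbation directions
        self.noise = 0.03        # scale applied to each sampled delta


class Policy:
    def __init__(self, input_size, output_size, hp):
        # Constructor arguments and zero initialization are assumptions.
        self.hp = hp
        self.theta = np.zeros((output_size, input_size))

    def sample_deltas(self):
        # One standard-normal matrix per direction, each with theta's shape.
        # The star unpacks the shape tuple into separate positional arguments.
        return [np.random.randn(*self.theta.shape)
                for _ in range(self.hp.nb_directions)]


hp = Hp()
policy = Policy(input_size=4, output_size=2, hp=hp)
deltas = policy.sample_deltas()

# Scaling a delta by the noise gives perturbations with standard deviation 0.03.
perturbed_positive = policy.theta + hp.noise * deltas[0]
perturbed_negative = policy.theta - hp.noise * deltas[0]
```

Each of the 16 deltas is used twice, once added and once subtracted, which is where the 32 perturbed directions come from.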