1
00:00:00,560 --> 00:00:06,200
Hi guys and welcome back to Python. Now we're going to start the real implementation from scratch.

2
00:00:06,280 --> 00:00:12,400
Step by step as usual, and here we go with the first step: importing the libraries, which you're going

3
00:00:12,400 --> 00:00:17,870
to see are not going to be any advanced libraries like TensorFlow or PyTorch.

4
00:00:17,870 --> 00:00:24,210
We're just going to use NumPy to build one of the most powerful AIs that you can build today.

5
00:00:24,400 --> 00:00:25,600
So this is a big deal.

6
00:00:25,600 --> 00:00:32,500
This contrast between the power of the AI and the simple library, which is NumPy. Remember that TensorFlow

7
00:00:32,510 --> 00:00:38,920
and PyTorch were invented because NumPy was not advanced enough to build some AIs, you know, to

8
00:00:38,920 --> 00:00:40,500
build some deep learning models.

9
00:00:40,690 --> 00:00:47,040
But indeed here we're not doing some deep models, we're not doing deep learning at all, just shallow learning

10
00:00:47,050 --> 00:00:53,170
if I may say so. We'll still have a perceptron with some neurons, but we'll just have one layer and therefore

11
00:00:53,170 --> 00:00:56,780
we can totally use NumPy to build our AI.

12
00:00:57,100 --> 00:01:03,820
So what we're going to import now, importing the libraries, will be, well, first, before we import NumPy,

13
00:01:03,820 --> 00:01:10,150
we're going to import the os library, which is the library you need when you're going to use your operating

14
00:01:10,150 --> 00:01:15,820
system. And indeed we're going to use our operating system because, you know, in the end we will create

15
00:01:15,880 --> 00:01:23,850
a folder, an output folder, which will contain the videos of our AI walking on the field in the PyBullet

16
00:01:23,850 --> 00:01:24,930
framework.

17
00:01:25,090 --> 00:01:35,560
OK, so operating system, and then to build our AI we will just need NumPy. Only this, and that will be enough

18
00:01:35,560 --> 00:01:37,500
to build our AI.

19
00:01:37,890 --> 00:01:39,600
OK so here we go.
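As a minimal sketch of this first step, the two imports could look like the lines below. The helper mkdir and the folder name used in the test are hypothetical, added here only to show why os is needed for the output folder of videos.

```python
import os           # operating system tools, used to create the output folder
import numpy as np  # the only numerical library needed for this AI


def mkdir(base, name):
    """Create the folder base/name if it does not exist yet and return its path.

    Hypothetical helper: the name and signature are assumptions made for
    illustration, they are not given in the tutorial.
    """
    path = os.path.join(base, name)
    os.makedirs(path, exist_ok=True)  # no error if the folder already exists
    return path
```

Calling mkdir twice with the same arguments is harmless, which is convenient when the training script is run several times.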

20
00:01:39,610 --> 00:01:45,940
I just saved and now we're going to set the hyperparameters, all the hyperparameters that will be used

21
00:01:46,120 --> 00:01:47,820
to build the AI.

22
00:01:47,860 --> 00:01:49,630
So how are we going to do that?

23
00:01:49,630 --> 00:01:54,200
The best way, and this is a classic way of setting the hyperparameters.

24
00:01:54,310 --> 00:02:00,130
The best way is doing that through a class. I remind, for those of you who are new to programming, that a class

25
00:02:00,250 --> 00:02:04,350
is like a set of instructions for something you want to build.

26
00:02:04,480 --> 00:02:10,390
You know, you define instructions, and then what you can do is create some objects, as many objects as you

27
00:02:10,390 --> 00:02:15,870
want which will contain all the properties that were defined in the class.

28
00:02:16,030 --> 00:02:22,150
So you first define the class and then you create the object of the class, which we'll do later on. But

29
00:02:22,510 --> 00:02:28,240
the class that we're about to build right now is just a class that will contain all the hyperparameters

30
00:02:28,450 --> 00:02:29,420
of our AI.

31
00:02:29,420 --> 00:02:34,220
I remind that a hyperparameter is a parameter that is supposed to have a fixed value.

32
00:02:34,240 --> 00:02:39,940
You know, it won't be any variable of some function or any other kind of variable. But that doesn't mean that

33
00:02:39,940 --> 00:02:41,970
you can't try some other values.

34
00:02:41,980 --> 00:02:45,410
But in the training of the AI the value is fixed.

35
00:02:45,430 --> 00:02:46,540
That's what you need to understand.

36
00:02:46,540 --> 00:02:51,690
You know, the hyperparameter during the building and the training of the AI is fixed.

37
00:02:51,730 --> 00:02:57,640
But then if you want to try some other trainings, well, you can try these trainings with some other values

38
00:02:58,000 --> 00:03:03,280
of the hyperparameters. And therefore what we're going to define right now through this class is the

39
00:03:03,280 --> 00:03:09,610
list of all these parameters that will have a fixed value during the training and which won't be modified

40
00:03:09,640 --> 00:03:12,570
unless you want to try some other trainings.

41
00:03:12,610 --> 00:03:13,940
All right so let's do this.

42
00:03:14,020 --> 00:03:20,560
So to define a class in Python you simply need to start with class and then you add here the name of your

43
00:03:20,560 --> 00:03:25,400
class. The name of a class in Python usually starts with a capital letter.

44
00:03:25,450 --> 00:03:31,990
And since we're just making a new class that will give the list of all the hyperparameters of the AI,

45
00:03:32,140 --> 00:03:39,490
well, I'm going to call it Hp, as in hyperparameters, and then you need to add some parentheses because

46
00:03:39,880 --> 00:03:47,950
in a class you can specify some variables or even some inheritance tools but we won't use any variables

47
00:03:47,950 --> 00:03:56,380
here simply because we just want to define some hyperparameters that will be defined through the first

48
00:03:56,590 --> 00:03:57,370
method of the class.

49
00:03:57,370 --> 00:04:04,480
You know, it is always the same method you start with when building a class, which is the init method,

50
00:04:04,870 --> 00:04:12,370
surrounded by two double underscores, and this method will take one argument, which is self.

51
00:04:12,370 --> 00:04:20,020
So self is a mystery for many people when they start programming, but it is only used to refer to the

52
00:04:20,110 --> 00:04:22,750
object that will be created from the class.

53
00:04:22,750 --> 00:04:28,630
We also call it an instance of the class. You know, the future objects we'll create are called instances

54
00:04:28,630 --> 00:04:35,710
of the class and self here is to specify that when you're going to use a variable that belongs to the

55
00:04:35,710 --> 00:04:41,980
object then you will specify that with self in order to specify that the variable indeed belongs to

56
00:04:41,980 --> 00:04:42,840
the object.

57
00:04:43,060 --> 00:04:49,240
OK so you're going to understand now what I'm going to do then because I'm going to define all the variables

58
00:04:49,240 --> 00:04:54,380
of the object and to define them to specify that these are variables of the object.

59
00:04:54,400 --> 00:04:58,370
I will indeed use the self here to refer to the object.

60
00:04:58,690 --> 00:05:02,050
So what is going to be the first hyperparameter?

61
00:05:02,050 --> 00:05:06,590
What is going to be the first parameter that we're going to use and that is going to be fixed in all the

62
00:05:06,590 --> 00:05:07,350
training.

63
00:05:07,700 --> 00:05:10,790
Well that's going to be the number of steps.

64
00:05:10,790 --> 00:05:16,080
So here I'm just choosing a name for the number of steps, and what is the number of steps?

65
00:05:16,080 --> 00:05:21,110
That's basically the number of training loops we're going to have in the end, or in other words that's

66
00:05:21,190 --> 00:05:24,270
the number of times we're going to update our model.

67
00:05:24,350 --> 00:05:31,400
You know, this perceptron of one layer of several neurons, which is a policy taking as input the state of

68
00:05:31,400 --> 00:05:38,780
the environment and returning as output the actions to play in order to walk properly. And which value

69
00:05:38,780 --> 00:05:41,370
are we going to choose? Well, we're going to choose 1000.

70
00:05:41,570 --> 00:05:48,560
We will get some good results with that. Then the next parameter, still with self because this is still a parameter

71
00:05:48,560 --> 00:05:51,580
of the future instances of the Hp class.

72
00:05:51,590 --> 00:05:57,610
It's going to be the episode length. The episode length, which is what?

73
00:05:57,830 --> 00:06:00,810
Which is the maximum length of an episode.

74
00:06:01,130 --> 00:06:08,480
So the length here is just the maximum length of an episode, meaning the maximum time the AI will walk on

75
00:06:08,480 --> 00:06:11,630
the field and the value we're going to choose for that.

76
00:06:11,720 --> 00:06:20,420
still after experimentation, is the same, one thousand. Feel free to test any other values. Then we're going

77
00:06:20,420 --> 00:06:26,270
to define a new parameter, which is going to be self.learning_rate.

78
00:06:26,570 --> 00:06:33,590
So the learning rate is an inevitable parameter that you will always have whether you're doing some machine

79
00:06:33,590 --> 00:06:39,950
learning, deep learning or AI. Well, it is always here, and that's just to control how fast your

80
00:06:39,950 --> 00:06:41,270
AI is learning.

81
00:06:41,300 --> 00:06:47,150
And usually you want to start with a learning rate that is not too small but not too large, and a good value

82
00:06:47,150 --> 00:06:52,530
here is 0.02. Again, feel free to try some other values.

83
00:06:52,820 --> 00:06:58,730
Then another hyperparameter that is really important and really, you know, at the heart of the augmented

84
00:06:58,730 --> 00:07:08,800
random search algorithm is the number of directions. The number of directions, which is the number of

85
00:07:08,800 --> 00:07:11,580
perturbations we'll apply on each of the weights.

86
00:07:11,590 --> 00:07:17,920
Remember that we're testing a certain number of directions and also their opposite directions to figure

87
00:07:17,920 --> 00:07:20,640
out which direction increases the reward the most.

88
00:07:20,800 --> 00:07:26,230
And this number of directions is a fixed hyperparameter. For the value we're going to choose here, I actually

89
00:07:26,230 --> 00:07:27,370
started with 8.

90
00:07:27,580 --> 00:07:33,750
But I realized that I had better results with 16 and actually I will even try with some more directions

91
00:07:33,760 --> 00:07:38,930
because indeed the more directions you test the better chance you have to increase the reward.

92
00:07:39,010 --> 00:07:43,140
But of course the more directions you test, the longer the training will take.

93
00:07:43,210 --> 00:07:44,820
So I will test it.

94
00:07:44,860 --> 00:07:47,830
But right now let's just start with 16.

95
00:07:48,010 --> 00:07:53,500
Then remember in the article, in the paper, they want to consider separately the number of directions

96
00:07:53,890 --> 00:07:59,410
and the number of best directions, you know, the best directions that are increasing the reward the

97
00:07:59,410 --> 00:08:00,070
most.

98
00:08:00,250 --> 00:08:05,440
You want to keep them separately because you're going to reduce them and therefore separately we're

99
00:08:05,440 --> 00:08:14,440
going to create the number of best directions and we're going to start with 16 as well which means that

100
00:08:14,500 --> 00:08:16,690
we're going to test all the directions so far.

101
00:08:16,690 --> 00:08:22,510
But later on we will choose a number that is lower than the total number of directions.

102
00:08:22,510 --> 00:08:25,030
And speaking of being lower.

103
00:08:25,210 --> 00:08:28,690
That's what we'll need to assert right now.

104
00:08:28,800 --> 00:08:34,840
That is, we need to make sure that the number of best directions is not larger than the number of directions.

105
00:08:34,840 --> 00:08:35,960
Right, that makes sense.

106
00:08:36,130 --> 00:08:41,950
You want to keep your top directions among all your directions. And in order to assert something in Python,

107
00:08:42,070 --> 00:08:55,040
the trick to do that is to add here assert self.nb_best_directions lower than or equal to the number of

108
00:08:55,670 --> 00:09:02,970
directions. There you go, we assert that the number of best directions is never larger than the number of

109
00:09:02,970 --> 00:09:04,360
directions.

110
00:09:04,920 --> 00:09:08,790
Perfect. Then three more variables.

111
00:09:08,790 --> 00:09:15,690
The next one is the noise, which is going to be the sigma in the Gaussian distribution we'll use to sample

112
00:09:15,690 --> 00:09:21,570
the perturbations because actually these perturbations are going to be sampled following a Gaussian

113
00:09:21,570 --> 00:09:26,640
distribution, and you know in a Gaussian distribution you have the standard deviation, and this noise is

114
00:09:26,640 --> 00:09:34,650
actually this standard deviation sigma, and we're going to set it equal to 0.03 to not have

115
00:09:34,650 --> 00:09:35,690
a large variance.
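To make the role of the noise concrete, here is a small sketch of how the perturbations could be sampled from a Gaussian distribution and scaled by sigma. The function names, the weight shape and the use of NumPy's default_rng are assumptions made for illustration, not the code written in this tutorial.

```python
import numpy as np


def sample_deltas(nb_directions, shape, seed=1):
    """Sample one standard Gaussian perturbation matrix per direction."""
    rng = np.random.default_rng(seed)  # seeded so the samples are reproducible
    return [rng.standard_normal(shape) for _ in range(nb_directions)]


def perturbed_weights(theta, delta, noise=0.03):
    """Return the weights moved in the direction delta and in its opposite.

    noise is the standard deviation sigma that scales the perturbation.
    """
    return theta + noise * delta, theta - noise * delta
```

With a small sigma such as 0.03 the two perturbed policies stay close to the current weights, which is what keeps the variance of the explored directions low.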

116
00:09:35,940 --> 00:09:44,720
OK, then the next hyperparameter will be the seed, which is just to fix the current configuration of

117
00:09:44,720 --> 00:09:45,850
the environment.

118
00:09:45,860 --> 00:09:50,620
So that's basically to fix the parameters of the environment.

119
00:09:50,840 --> 00:09:54,280
We can just choose any value here just so that we have the same results.

120
00:09:54,290 --> 00:09:58,870
You know, if you want to get the same results as the ones I will show you in the end.

121
00:09:58,910 --> 00:10:04,970
In addition when we test our AI on several environments Well it's normal to observe the same thing.

122
00:10:04,970 --> 00:10:08,030
So let's just pick one. That's the seed.

123
00:10:08,270 --> 00:10:12,500
And then finally one last variable, which is of course the environment.

124
00:10:12,710 --> 00:10:16,920
So I'm going to give the following name to that variable.

125
00:10:16,970 --> 00:10:20,510
It's basically the environment we'll connect to our AI.

126
00:10:20,650 --> 00:10:27,500
It's just the name of the environment, so we can call it env_name. And env_name, so far, we're just going

127
00:10:27,500 --> 00:10:34,190
to input some quotes and later on when we choose our environment to play with we will give the name

128
00:10:34,190 --> 00:10:36,660
of this environment inside the quotes.

129
00:10:36,710 --> 00:10:37,080
All right.

130
00:10:37,080 --> 00:10:38,170
And that's it.

131
00:10:38,180 --> 00:10:42,630
These are basically the hyperparameters that will be used in the AI.
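Putting the values from this tutorial together, the class could look roughly like the sketch below. The attribute names are assumptions, since the transcript does not spell them out, and the assert uses lower than or equal because both direction counts start at 16.

```python
class Hp():
    """Container for the hyperparameters, fixed during one training run."""

    def __init__(self):
        self.nb_steps = 1000          # number of training loops (model updates)
        self.episode_length = 1000    # maximum length of one episode
        self.learning_rate = 0.02     # controls how fast the AI learns
        self.nb_directions = 16       # number of perturbations tested per step
        self.nb_best_directions = 16  # top directions kept for the update
        # The best directions are chosen among all directions, so there
        # cannot be more of them than there are directions in total.
        assert self.nb_best_directions <= self.nb_directions
        self.noise = 0.03             # sigma of the Gaussian perturbations
        self.seed = 1                 # fixes the environment configuration
        self.env_name = ''            # environment name, filled in later
```

An object is then created with hp = Hp(), and each value is read as an attribute of that object, for example hp.learning_rate.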

132
00:10:42,770 --> 00:10:44,300
You can actually add some more.

133
00:10:44,300 --> 00:10:45,980
For example you can add a decay.

134
00:10:45,990 --> 00:10:51,410
You know, when you use the learning rate you can add a decay hyperparameter to reduce the learning rate

135
00:10:51,410 --> 00:10:53,270
over the epochs in the training.
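One possible way to apply such a decay is sketched below. The inverse-time formula and the function name are assumptions chosen for illustration; the tutorial does not say which decay schedule it has in mind.

```python
def decayed_learning_rate(initial_rate, decay, step):
    """Inverse-time decay: the rate shrinks as the training step grows.

    With decay equal to 0 the learning rate stays constant.
    """
    return initial_rate / (1.0 + decay * step)
```

For instance, with an initial rate of 0.02 and a decay of 0.01, the learning rate is halved after 100 steps.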

136
00:10:53,450 --> 00:10:55,130
But this will be enough.

137
00:10:55,130 --> 00:10:58,730
With this group of hyperparameters we will get some great results.

138
00:10:58,730 --> 00:10:59,510
All right.

139
00:10:59,510 --> 00:11:04,910
So that was the first step, an easy step. In the next step it is going to get slightly more challenging.

140
00:11:04,910 --> 00:11:12,080
We will implement the first important feature of the paper, which is about normalizing the states, so

141
00:11:12,080 --> 00:11:14,410
we will follow exactly what is in the paper.

142
00:11:14,660 --> 00:11:18,640
And I remind that we do this for performance purposes.

143
00:11:18,980 --> 00:11:20,650
So let's do this in the next tutorial.

144
00:11:20,660 --> 00:11:21,480
I'll see you there.

145
00:11:21,500 --> 00:11:22,900
And until then, enjoy AI.