1
00:00:00,570 --> 00:00:06,900
Hello and welcome to the beginning of this new journey, in which we're going to build one

2
00:00:06,900 --> 00:00:13,470
of the most powerful AIs, a new artificial intelligence that was just released in 2018, late March

3
00:00:13,470 --> 00:00:14,370
2018.

4
00:00:14,640 --> 00:00:19,320
And so in this tutorial I want to take the opportunity to show you the environment

5
00:00:19,340 --> 00:00:25,400
we'll be working on, as well as the research paper that contains the whole theory of this new AI.

6
00:00:25,440 --> 00:00:31,170
And because we're going to go back and forth from our code to the research paper and vice versa because

7
00:00:31,380 --> 00:00:37,860
really I want to show you how to understand a research paper and translate it into code so it's important

8
00:00:37,860 --> 00:00:42,630
that we already start to get familiar with it so that it doesn't get too overwhelming when we start

9
00:00:42,630 --> 00:00:43,730
digging into it.

10
00:00:43,740 --> 00:00:44,290
All right.

11
00:00:44,460 --> 00:00:49,560
But first let me show you what we're going to do, you know, to build the AI, and on which kind of environment we're

12
00:00:49,560 --> 00:00:50,570
going to train it.

13
00:00:50,640 --> 00:00:54,720
And before I show you exactly the environment we'll be working on.

14
00:00:54,720 --> 00:01:00,610
Let me show you something that is very exciting and that made a huge buzz in the AI community.

15
00:01:00,930 --> 00:01:05,940
What I want to show you is the Google DeepMind humanoid.

16
00:01:05,940 --> 00:01:06,570
Here we go.

17
00:01:06,570 --> 00:01:12,590
I already have the search results here, and the video I want to show you is exactly this one: Google

18
00:01:12,610 --> 00:01:13,860
DeepMind.

19
00:01:15,150 --> 00:01:23,450
You can see the AI walking on some field, and this AI is taking as input the states of the environment, which are not

20
00:01:24,280 --> 00:01:29,880
the images but something called vectors, encoding what's happening in the environment.

21
00:01:29,880 --> 00:01:36,740
So, you know, these are all the angles between the axes, the angles of rotation, the coordinates of the AI.

22
00:01:36,840 --> 00:01:45,510
And this is predicting the actions to play in order to walk on this kind of field, and you have to

23
00:01:45,510 --> 00:01:51,810
understand that the actions to play are not only one action but several actions, like, you know, the muscle

24
00:01:51,860 --> 00:01:58,620
impulsions of the humanoid, or of these other ant or cheetah robots.

25
00:01:58,620 --> 00:02:03,910
So you have to understand that what you will build is actually a function that will take as input the

26
00:02:03,990 --> 00:02:09,570
states of the environment, which will be vectors encoding what's happening in the environment, and which

27
00:02:09,570 --> 00:02:16,320
will be returning the actions to play, which are the muscle impulsions, in order to walk

28
00:02:16,320 --> 00:02:17,730
on these fields.

29
00:02:17,730 --> 00:02:19,580
So this is very exciting.

30
00:02:19,580 --> 00:02:21,990
This is the Google environment.

31
00:02:22,290 --> 00:02:26,790
But the bad news is that this environment is not open source.

32
00:02:26,800 --> 00:02:30,080
However, I would like us to do something that is at least as exciting as this.

33
00:02:30,240 --> 00:02:36,930
So what we're going to do is work on a similar environment, where we are going to train an AI to walk

34
00:02:36,990 --> 00:02:42,730
on some field, not with these amazing graphics, but still the graphics will be awesome.

35
00:02:42,810 --> 00:02:48,820
And this is what I'm going to show you right now: the environment that we'll be working on.

36
00:02:49,260 --> 00:02:51,020
So here we are back on Google.

37
00:02:51,090 --> 00:02:57,630
And before I show you this environment, I just want to say that today, if you want to build an AI to

38
00:02:57,850 --> 00:03:02,230
train it to walk on a field or run across a field, you have two options.

39
00:03:02,370 --> 00:03:09,930
The first option is MuJoCo, which you can find in OpenAI Gym and also in the DM Control Suite by

40
00:03:09,930 --> 00:03:10,500
DeepMind.

41
00:03:10,710 --> 00:03:13,120
And the second one is PyBullet.

42
00:03:13,140 --> 00:03:20,790
Now the bad news is that MuJoCo is not open source; you have to buy a license to get

43
00:03:20,790 --> 00:03:21,920
the environments.

44
00:03:21,930 --> 00:03:27,450
You can actually get a free trial for one month but you know I want you to have fun as much as you want

45
00:03:27,750 --> 00:03:30,500
with this course and with this AI we're going to build.

46
00:03:30,660 --> 00:03:37,050
So we are not going to go for MuJoCo. If you really want to go for MuJoCo, you can get a free trial

47
00:03:37,080 --> 00:03:40,070
or buy a license, or, if you are a student,

48
00:03:40,110 --> 00:03:44,500
well, you can get the license for free, but I know that not all of you are students in this course.

49
00:03:44,520 --> 00:03:46,970
So we're not going to go for this one.

50
00:03:47,010 --> 00:03:52,050
The other one we're going to go for and which is by the way even better in terms of graphics and mechanics

51
00:03:52,320 --> 00:03:53,610
is PyBullet.

52
00:03:53,760 --> 00:04:00,020
And let me show you this amazing environment which is the one we're going to use for this course.

53
00:04:00,030 --> 00:04:00,600
All right.

54
00:04:00,600 --> 00:04:08,130
PyBullet. PyBullet is an easy-to-use Python module for physics simulation, robotics, and deep reinforcement

55
00:04:08,130 --> 00:04:08,910
learning.

56
00:04:08,910 --> 00:04:16,890
It was built by Erwin Coumans, with whom I sometimes discuss AI and robotics, and mostly

57
00:04:16,890 --> 00:04:20,810
PyBullet, because I actually contributed to the development of PyBullet.

58
00:04:20,820 --> 00:04:27,970
I made a pull request recently to improve one of the features. So, PyBullet: here you have the GitHub page

59
00:04:28,380 --> 00:04:36,120
of PyBullet, which is originally from bullet3, where you can find some more details if you scroll down.

60
00:04:36,120 --> 00:04:40,980
All right, you have all the details about the Bullet Physics SDK, for those of you who are into

61
00:04:40,980 --> 00:04:41,940
mechanics.

62
00:04:41,940 --> 00:04:49,230
You can check it out more closely, but what I really want to show you right now is some examples of PyBullet

63
00:04:49,290 --> 00:04:50,360
environments.

64
00:04:50,400 --> 00:04:54,390
So if we click on videos here we can see tons of examples.

65
00:04:54,390 --> 00:04:57,230
These are different PyBullet environments.

66
00:04:57,240 --> 00:04:59,810
Let's have a look at this one for example.

67
00:04:59,870 --> 00:05:08,210
The Minitaur. This is the Minitaur, and this is an agent you can train to move forward with some AI you build,

68
00:05:08,330 --> 00:05:15,140
so it can be an AI based on deep reinforcement learning, like, you know, deep Q-learning or A3C. This

69
00:05:15,140 --> 00:05:16,620
is not what we're gonna do for this course.

70
00:05:16,620 --> 00:05:24,040
We're going to build the latest and very powerful AI that was released in March 2018: ARS.

71
00:05:24,260 --> 00:05:27,070
But you can train it with several models.

72
00:05:27,200 --> 00:05:28,240
Let's have a look.

73
00:05:28,250 --> 00:05:32,000
I'm clicking on it, and here is the Minitaur moving forward.

74
00:05:32,120 --> 00:05:34,540
And it was trained by an AI model.

75
00:05:34,790 --> 00:05:36,260
So that's pretty cool.

76
00:05:36,260 --> 00:05:38,840
This is not the environment we'll be playing with.

77
00:05:38,870 --> 00:05:44,110
We will actually be playing with the HalfCheetah environment, for two reasons.

78
00:05:44,120 --> 00:05:50,010
I personally think it is more fun to train a cheetah on how to run than a Minitaur.

79
00:05:50,210 --> 00:05:57,560
And I hesitated between the HalfCheetah and the Humanoid, but the thing is, the Humanoid is the most challenging

80
00:05:57,800 --> 00:06:04,340
environment to train an AI to walk, and it actually takes several weeks on a normal computer to train

81
00:06:04,340 --> 00:06:05,020
the Humanoid.

82
00:06:05,060 --> 00:06:09,950
And, you know, I want everyone in this course to be able to train the AI we're going to build in this course,

83
00:06:10,170 --> 00:06:15,720
and the HalfCheetah is the best compromise between something challenging to train and the training time.

84
00:06:15,730 --> 00:06:22,150
No, we won't have to wait for days or weeks to train our HalfCheetah, and you're going to see that ARS is

85
00:06:22,210 --> 00:06:26,750
so powerful that we'll only have to wait for a few minutes.

86
00:06:26,750 --> 00:06:27,700
This is insane.

87
00:06:27,800 --> 00:06:30,450
I can't wait to show this to you.

88
00:06:30,470 --> 00:06:31,220
All right.

89
00:06:31,220 --> 00:06:37,340
So now that we have the environment, let me show you again the research paper. I'm sure I already showed that

90
00:06:37,340 --> 00:06:38,030
to you.

91
00:06:38,030 --> 00:06:42,500
But let me show it to you again because we really need to get familiar with it because we're going to

92
00:06:42,500 --> 00:06:46,640
go back and forth from the code to the research paper and vice versa.

93
00:06:46,640 --> 00:06:52,820
So I just want to show it to you once again and especially show you exactly what we'll be implementing

94
00:06:53,090 --> 00:06:58,670
in the course, you know, which version of the ARS algorithm we'll be implementing, because indeed the

95
00:06:58,670 --> 00:07:02,560
research paper suggests several versions and no worries.

96
00:07:02,720 --> 00:07:05,160
We will be implementing the most powerful one.

97
00:07:05,440 --> 00:07:12,530
So let's go to the research paper. It's called Simple random search provides a competitive approach to

98
00:07:12,530 --> 00:07:14,360
reinforcement learning.

99
00:07:14,360 --> 00:07:16,160
Actually this is the research paper.

100
00:07:16,160 --> 00:07:17,600
Let's click on it.

101
00:07:17,600 --> 00:07:18,460
Here we go.

102
00:07:18,530 --> 00:07:23,080
And then PDF here, and here we are in the research paper.

103
00:07:23,150 --> 00:07:27,810
Simple random search provides a competitive approach to reinforcement learning.

104
00:07:27,820 --> 00:07:34,850
It's a paper written by Horia Mania, Aurelia Guy, and Benjamin Recht. Benjamin Recht is very famous

105
00:07:34,850 --> 00:07:36,110
in the AI community.

106
00:07:36,110 --> 00:07:42,410
One of the best. And so all of this was invented at the University of California, Berkeley, one of the

107
00:07:42,410 --> 00:07:43,860
top universities in the world.

108
00:07:43,950 --> 00:07:47,450
And as you can see March 20 2018.

109
00:07:47,450 --> 00:07:51,490
So just very recently at the time I'm speaking.

110
00:07:51,500 --> 00:07:57,680
And so what I want to show you exactly is the algorithm we'll be implementing in the course which is

111
00:07:58,160 --> 00:08:00,440
on page 6 if I remember correctly.

112
00:08:00,440 --> 00:08:00,900
Here we go.

113
00:08:00,920 --> 00:08:01,310
Yeah.

114
00:08:01,490 --> 00:08:07,970
This is the algorithm we'll be implementing exactly as it is so you know you will really learn how to

115
00:08:08,000 --> 00:08:13,570
follow a research paper and translate what's going on here into code.

116
00:08:13,580 --> 00:08:15,960
We will be implementing exactly the same thing actually.

117
00:08:16,130 --> 00:08:18,700
They suggest two versions of the algorithm.

118
00:08:18,880 --> 00:08:25,040
One, V1, which is without any normalization of the states, and V2, which is with normalization of the states,

119
00:08:25,040 --> 00:08:30,770
because indeed they say that we can normalize the states to improve the performance and we can also

120
00:08:31,010 --> 00:08:37,130
scale by the standard deviation to improve the performance as well, which we will also be doing: the standard

121
00:08:37,130 --> 00:08:39,170
deviation of the rewards, by the way.

122
00:08:39,170 --> 00:08:44,750
So basically we will implement the best version of this paper, which is V2, with the normalization of

123
00:08:44,750 --> 00:08:48,030
the states and the normalization of the rewards.
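To give an idea of what normalizing the states can look like, here is a minimal pure-Python sketch. It assumes a running mean and variance per state value; the Normalizer class and its method names are hypothetical, not the paper's or the course's code.
```python
import math
class Normalizer:
    """Keeps a running mean and variance for each value of the state vector."""
    def __init__(self, n_inputs):
        self.n = 0
        self.mean = [0.0] * n_inputs
        self.m2 = [0.0] * n_inputs  # running sum of squared deviations
    def observe(self, state):
        # Welford's online update: one pass, no need to store past states.
        self.n += 1
        for i, x in enumerate(state):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])
    def normalize(self, state, eps=1e-2):
        # Subtract the mean and divide by the standard deviation (floored by eps).
        out = []
        for i, x in enumerate(state):
            var = self.m2[i] / self.n if self.n > 0 else 0.0
            out.append((x - self.mean[i]) / math.sqrt(max(var, eps)))
        return out
```
In a training loop you would call observe on every new state before normalize, so that the statistics follow what the AI actually sees.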

124
00:08:48,110 --> 00:08:51,740
And so we'll be making some classes and functions to implement this.

125
00:08:51,770 --> 00:08:55,120
The best way you know how a real AI developer would do it.

126
00:08:55,220 --> 00:09:01,820
So you will learn how to think logically like an AI developer and work on some pretty advanced research

127
00:09:01,820 --> 00:09:07,310
paper but I promise you you will understand everything because we will really go into the details of

128
00:09:07,310 --> 00:09:08,040
this.

129
00:09:08,270 --> 00:09:13,070
And now I would like to say just a few words on how this works because you know we could explain it

130
00:09:13,070 --> 00:09:14,270
in a few words.

131
00:09:14,300 --> 00:09:20,390
So basically the first important thing to understand is that your AI you have to see it as a function

132
00:09:20,390 --> 00:09:27,490
which is called a policy, taking as input the states of the environment, which are input vectors encoding

133
00:09:27,530 --> 00:09:30,770
exactly what's happening at each time in the environment.

134
00:09:30,770 --> 00:09:36,230
So they are, like, the angles of the axes of your robot, the angles of rotation, the coordinates of the points

135
00:09:36,230 --> 00:09:42,470
of your robot, and more, you know, enough values to describe exactly what's happening in the environment, so that

136
00:09:42,470 --> 00:09:44,970
we could almost draw a picture.

137
00:09:44,980 --> 00:09:45,730
All right.

138
00:09:45,740 --> 00:09:51,680
So this is a function taking this as input and returning as output the actions to play the actions to

139
00:09:51,680 --> 00:09:52,720
play in order to walk.

140
00:09:52,830 --> 00:09:58,790
But then the second very important thing to understand is that the output is not only one action but

141
00:09:58,880 --> 00:10:02,200
several It's you know a group of actions and that makes sense.

142
00:10:02,200 --> 00:10:07,240
That's because in order to walk on some field you not only have to move one leg, you have to move all

143
00:10:07,240 --> 00:10:09,790
the parts of your body. And more precisely,

144
00:10:09,790 --> 00:10:15,120
these actions will be all the muscle impulsions you can have within these parts of the body.

145
00:10:15,130 --> 00:10:20,360
So this is probably very different from what you've done before, you know, because in the previous AIs

146
00:10:20,360 --> 00:10:21,360
you might have built.

147
00:10:21,460 --> 00:10:26,220
Generally you return one action, and mostly one discrete action. Here

148
00:10:26,210 --> 00:10:31,360
it's not only one action, it's a group of actions, but also this is a group of continuous actions, because

149
00:10:31,360 --> 00:10:37,420
the muscle impulsions are measured by some continuous metrics in order to make it even closer to reality.

150
00:10:37,420 --> 00:10:44,350
So, you know, we are really going to build an AI for some very realistic goal, and I remind you that, you

151
00:10:44,350 --> 00:10:50,170
know, it's fun to train an AI to walk on some field, but this is not only for fun purposes but also

152
00:10:50,170 --> 00:10:56,290
it is a benchmark for your AI because if you manage to train your AI to walk on some field Well you

153
00:10:56,290 --> 00:11:03,280
can adapt this AI to other problems even business problems by just changing the inputs of the environment

154
00:11:03,520 --> 00:11:08,110
you know, which will still be some encoding vectors, and by of course adapting the outputs, the actions to

155
00:11:08,110 --> 00:11:10,530
play and the reward strategy.

156
00:11:10,570 --> 00:11:17,320
So by only a few tweaks you could transpose this AI that we're going to build and train into another

157
00:11:17,320 --> 00:11:18,270
kind of problem.

158
00:11:18,340 --> 00:11:20,240
Even business problems.

159
00:11:20,260 --> 00:11:21,810
Now, how does that work? Well.

160
00:11:21,850 --> 00:11:26,950
So your AI is a policy, that is, a function taking as input the states of the environment and returning a

161
00:11:26,950 --> 00:11:30,430
group of continuous actions, which are the muscle impulsions.

162
00:11:30,490 --> 00:11:36,520
And how does it work? Well, between those inputs and those outputs you're going to have a perceptron, which

163
00:11:36,520 --> 00:11:40,820
is actually a neural network of one layer composed of several neurons.
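As a rough illustration of such a one-layer policy, here is a minimal pure-Python sketch. The function name policy and the toy sizes are hypothetical; the real environment has many more state values and actions.
```python
def policy(weights, state):
    # One output neuron per action: each action is a weighted sum of the state values.
    return [sum(w * s for w, s in zip(row, state)) for row in weights]
# Hypothetical toy sizes: 2 state values in, 3 continuous actions out.
weights = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
actions = policy(weights, [0.5, -1.0])
print(actions)  # [0.5, -2.0, -0.5]
```
Note that the output is a whole group of continuous values, one per row of the weight matrix, and not a single discrete action.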

164
00:11:41,050 --> 00:11:47,530
Each of these neurons will have a weight, and we will do an exploration on the policy, meaning we will

165
00:11:47,590 --> 00:11:54,310
explore lots and lots of updates on these weights by just adding you know some little values here that

166
00:11:54,310 --> 00:11:59,560
will follow a normal distribution, to see which update of the weights will increase

167
00:11:59,560 --> 00:12:00,610
the reward.

168
00:12:00,620 --> 00:12:04,660
And so basically what we'll do is that we'll test several directions.

169
00:12:04,690 --> 00:12:09,970
Each direction corresponds to a matrix of small values we'll add to the matrix of weights.

170
00:12:09,970 --> 00:12:15,280
This is the matrix of weights, and for each of these directions we'll also do the same update, you know, with

171
00:12:15,280 --> 00:12:18,140
the same value but with the opposite direction.

172
00:12:18,200 --> 00:12:23,890
You know, that's why we have a plus and a minus here, which correspond to one specific direction, but both

173
00:12:24,160 --> 00:12:26,610
for one way and the opposite way.

174
00:12:26,800 --> 00:12:30,150
And why do we need to do this you know with one way and the opposite way.

175
00:12:30,190 --> 00:12:37,030
That's because in this step 7 here we will do what is called an approximated gradient ascent, meaning

176
00:12:37,030 --> 00:12:43,410
that we will take the reward that we'll get by applying this perturbation here in the plus direction, the

177
00:12:43,420 --> 00:12:48,970
positive direction, and we will subtract from it the reward we get by applying the same perturbation

178
00:12:48,970 --> 00:12:52,460
with the same value but with the opposite direction.

179
00:12:52,630 --> 00:13:00,190
And that will be in order to compute the finite difference of the reward with respect to this small update

180
00:13:00,310 --> 00:13:05,020
of the weights, which is the small perturbation value we add to each of the weights.

181
00:13:05,100 --> 00:13:10,750
And indeed this is an approximation of the gradient because this is an approximation of the partial

182
00:13:10,750 --> 00:13:17,950
derivative of the reward with respect to the weights, because indeed you are trying to optimize the reward

183
00:13:18,250 --> 00:13:19,900
with respect to the weights.

184
00:13:19,940 --> 00:13:24,880
All right, and then you're going to add this approximation of the gradient of the reward with respect to

185
00:13:24,880 --> 00:13:31,300
the weights to your current matrix of weights, to update the weights in the best directions, which will increase

186
00:13:31,300 --> 00:13:33,140
the reward the most.
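The steps just described can be sketched as follows. This is a minimal pure-Python sketch under stated assumptions, not the paper's or the course's actual code: the names ars_step, perturb, evaluate, n_directions, n_best, noise and lr are hypothetical, and evaluate stands in for playing one episode with the given weights and returning the total reward.
```python
import random
import statistics
def perturb(weights, delta, sign, noise):
    # Return weights + sign * noise * delta, element by element.
    return [[w + sign * noise * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weights, delta)]
def ars_step(weights, evaluate, n_directions=8, n_best=4, noise=0.05, lr=0.02, rng=random):
    rows, cols = len(weights), len(weights[0])
    # Sample the random directions from a normal distribution.
    deltas = [[[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]
              for _ in range(n_directions)]
    # Evaluate each direction one way (plus) and the opposite way (minus).
    r_pos = [evaluate(perturb(weights, d, +1, noise)) for d in deltas]
    r_neg = [evaluate(perturb(weights, d, -1, noise)) for d in deltas]
    # Keep only the directions whose better side got the highest reward.
    order = sorted(range(n_directions), key=lambda k: max(r_pos[k], r_neg[k]), reverse=True)[:n_best]
    # Scale the step by the standard deviation of the rewards that were kept.
    sigma_r = statistics.pstdev([r for k in order for r in (r_pos[k], r_neg[k])]) or 1.0
    step = lr / (n_best * sigma_r)
    # Finite-difference update: add the sum of (r_plus - r_minus) * direction.
    return [[weights[i][j] + step * sum((r_pos[k] - r_neg[k]) * deltas[k][i][j] for k in order)
             for j in range(cols)] for i in range(rows)]
```
Calling ars_step in a loop, each time with the weights it returned, is what moves the policy toward higher reward; no backpropagation is involved.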

187
00:13:33,580 --> 00:13:33,980
Great.

188
00:13:33,970 --> 00:13:35,650
So I can't wait to start.

189
00:13:35,710 --> 00:13:42,190
But before we start, I want to show you one last thing, which is an implementation by a developer we

190
00:13:42,370 --> 00:13:47,610
know personally and who was very happy to share his code for the course.

191
00:13:47,620 --> 00:13:49,320
His name is Alexis Jacq.

192
00:13:49,330 --> 00:13:56,280
He is a very talented AI developer who made this amazing Python implementation for this same algorithm

193
00:13:56,290 --> 00:14:01,580
augmented random search, from this exact same paper, and he obtained amazing results.

194
00:14:01,780 --> 00:14:07,450
And for those of you who don't know Alexis Jacq, he is a highly skilled Ph.D. student in one of the best

195
00:14:07,450 --> 00:14:14,620
universities, EPFL in Switzerland, who has made some great contributions to the community. And not

196
00:14:14,620 --> 00:14:21,110
only that, he is on the prestigious Facebook AI page.

197
00:14:21,220 --> 00:14:21,930
Here we go.

198
00:14:21,940 --> 00:14:27,310
That's a prestigious page, you know: all the people on this page are very talented in the AI community

199
00:14:27,310 --> 00:14:28,780
and great contributors.

200
00:14:28,780 --> 00:14:36,250
And indeed, if we scroll down, we will find Alexis Jacq right here with his amazing tutorial on neural transfer

201
00:14:36,250 --> 00:14:37,190
with PyTorch.

202
00:14:37,390 --> 00:14:42,970
So I would like to say a huge thank you to Alexis Jacq for helping us with this course and sharing his

203
00:14:42,970 --> 00:14:47,710
amazing implementation of ARS, which we'll get inspired from. We won't have exactly the same code,

204
00:14:47,730 --> 00:14:51,670
but still we'll use the same type of parameters and the same logic.

205
00:14:51,700 --> 00:14:52,360
All right.

206
00:14:52,450 --> 00:14:53,760
So I can't wait to start.

207
00:14:53,800 --> 00:14:55,500
Let's start from the next tutorial.

208
00:14:55,540 --> 00:14:57,210
And until then enjoy AI.