1 00:00:00,570 --> 00:00:06,900 Hello and welcome to the beginning of this new journey, in which we're going to build one 2 00:00:06,900 --> 00:00:13,470 of the most powerful AIs, a new artificial intelligence that was released very recently, in late March 3 00:00:13,470 --> 00:00:14,370 2018. 4 00:00:14,640 --> 00:00:19,320 In this tutorial I want to take the opportunity to show you the environment 5 00:00:19,340 --> 00:00:25,400 we'll be working on, as well as the research paper that contains the whole theory of this new AI. 6 00:00:25,440 --> 00:00:31,170 We're going to go back and forth from our code to the research paper and vice versa, because 7 00:00:31,380 --> 00:00:37,860 I really want to show you how to understand a research paper and translate it into code, so it's important 8 00:00:37,860 --> 00:00:42,630 that we start getting familiar with it now, so that it doesn't get too overwhelming when we start 9 00:00:42,630 --> 00:00:43,730 digging into it. 10 00:00:43,740 --> 00:00:44,290 All right. 11 00:00:44,460 --> 00:00:49,560 But first let me show you what we're going to do to build this AI, and on which kind of environment we're 12 00:00:49,560 --> 00:00:50,570 going to train it. 13 00:00:50,640 --> 00:00:54,720 And before I show you exactly the environment we'll be working on, 14 00:00:54,720 --> 00:01:00,610 let me show you something that is very exciting and that made a huge buzz in the AI community. 15 00:01:00,930 --> 00:01:05,940 What I want to show you is the Google DeepMind humanoid. 16 00:01:05,940 --> 00:01:06,570 Here we go. 17 00:01:06,570 --> 00:01:12,590 I already have the search results, and the video I want to show you is exactly this one: Google 18 00:01:12,610 --> 00:01:13,860 DeepMind.
19 00:01:15,150 --> 00:01:23,450 You can see an AI walking on some field, and this AI is taking as input the states of the environment, which are not 20 00:01:24,280 --> 00:01:29,880 images but vectors encoding what's happening in the environment. 21 00:01:29,880 --> 00:01:36,740 So these are all the angles between the axes, the angles of rotation, the coordinates of the AI. 22 00:01:36,840 --> 00:01:45,510 And this AI is predicting the actions to play in order to walk on this kind of field, and you have to 23 00:01:45,510 --> 00:01:51,810 understand that the actions to play are not only one action but several actions, like the muscle 24 00:01:51,860 --> 00:01:58,620 impulsions of the humanoid robot, or of these other robots such as the cheetah. 25 00:01:58,620 --> 00:02:03,910 So you have to understand that what you will build is actually a function that will take as input the 26 00:02:03,990 --> 00:02:09,570 states of the environment, which will be vectors encoding what's happening in the environment, and which 27 00:02:09,570 --> 00:02:16,320 will return the actions to play, which are the muscle impulsions, in order to walk 28 00:02:16,320 --> 00:02:17,730 on these fields. 29 00:02:17,730 --> 00:02:19,580 So this is very exciting. 30 00:02:19,580 --> 00:02:21,990 This is the Google DeepMind environment. 31 00:02:22,290 --> 00:02:26,790 But the bad news is that this environment is not open source. 32 00:02:26,800 --> 00:02:30,080 However, I would like us to work on something that is at least as exciting as this. 33 00:02:30,240 --> 00:02:36,930 So what we're going to do is work on a similar environment, where we are going to train an AI to walk 34 00:02:36,990 --> 00:02:42,730 on some field, not with these amazing graphics, but the graphics will still be awesome. 35 00:02:42,810 --> 00:02:48,820 And this is what I'm going to show you right now: the environment that we'll be working on. 36 00:02:49,260 --> 00:02:51,020 So here we are back on Google.
37 00:02:51,090 --> 00:02:57,630 And before I show you this environment, I just want to say that today, if you want to build an AI and 38 00:02:57,850 --> 00:03:02,230 train it to walk on a field or run across a field, you have two options. 39 00:03:02,370 --> 00:03:09,930 The first option is MuJoCo, which you can find in OpenAI Gym and also in the DM Control Suite by 40 00:03:09,930 --> 00:03:10,500 DeepMind. 41 00:03:10,710 --> 00:03:13,120 And the second one is PyBullet. 42 00:03:13,140 --> 00:03:20,790 Now the bad news is that MuJoCo is not open source; you have to buy a license to get 43 00:03:20,790 --> 00:03:21,920 the environments. 44 00:03:21,930 --> 00:03:27,450 You can actually get a free trial for one month, but I want you to have as much fun as you want 45 00:03:27,750 --> 00:03:30,500 with this course and with this AI we're going to build. 46 00:03:30,660 --> 00:03:37,050 So we are not going to go for MuJoCo. If you really want to go for MuJoCo, you can get a free trial 47 00:03:37,080 --> 00:03:40,070 or buy a license, or, if you are a student, 48 00:03:40,110 --> 00:03:44,500 you can get the license for free, but I know that not all of you in this course are students. 49 00:03:44,520 --> 00:03:46,970 So we're not going to go for this one. 50 00:03:47,010 --> 00:03:52,050 The other one, which we're going to go for, and which is by the way even better in terms of graphics and mechanics, 51 00:03:52,320 --> 00:03:53,610 is PyBullet. 52 00:03:53,760 --> 00:04:00,020 And let me show you this amazing environment, which is the one we're going to use for this course. 53 00:04:00,030 --> 00:04:00,600 All right. 54 00:04:00,600 --> 00:04:08,130 PyBullet. PyBullet is an easy-to-use Python module for physics simulation, robotics and deep reinforcement 55 00:04:08,130 --> 00:04:08,910 learning.
56 00:04:08,910 --> 00:04:16,890 It was built by Erwin Coumans, with whom I sometimes discuss AI and robotics, and mostly 57 00:04:16,890 --> 00:04:20,810 PyBullet, because I actually contributed to the development of PyBullet. 58 00:04:20,820 --> 00:04:27,970 I made a pull request recently to improve one of its features. So, PyBullet: here you have the GitHub page 59 00:04:28,380 --> 00:04:36,120 of PyBullet, which originally comes from bullet3, where you can find some more details if you scroll down. 60 00:04:36,120 --> 00:04:40,980 All right, you have all the details about the Bullet Physics SDK, so those of you who are into 61 00:04:40,980 --> 00:04:41,940 mechanics 62 00:04:41,940 --> 00:04:49,230 can check it out more closely, but what I really want to show you right now is some examples of PyBullet 63 00:04:49,290 --> 00:04:50,360 environments. 64 00:04:50,400 --> 00:04:54,390 So if we click on videos here, we can see tons of examples. 65 00:04:54,390 --> 00:04:57,230 These are different PyBullet environments. 66 00:04:57,240 --> 00:04:59,810 Let's have a look at this one, for example. 67 00:04:59,870 --> 00:05:08,210 The Minitaur. This is the Minitaur, and this is an agent you can train to move forward with some AI you build, 68 00:05:08,330 --> 00:05:15,140 so it can be an AI based on deep reinforcement learning, like deep Q-learning or A3C. This 69 00:05:15,140 --> 00:05:16,620 is not what we're going to do in this course. 70 00:05:16,620 --> 00:05:24,040 We're going to build the latest and very powerful AI that was released in March 2018: ARS. 71 00:05:24,260 --> 00:05:27,070 But you can train it with several models. 72 00:05:27,200 --> 00:05:28,240 Let's have a look. 73 00:05:28,250 --> 00:05:32,000 I'm clicking on it, and here is the Minitaur moving forward. 74 00:05:32,120 --> 00:05:34,540 And it was trained by an AI model. 75 00:05:34,790 --> 00:05:36,260 So that's pretty cool.
76 00:05:36,260 --> 00:05:38,840 This is not the environment we'll be playing with. 77 00:05:38,870 --> 00:05:44,110 We will actually be playing with the half cheetah environment, for two reasons. 78 00:05:44,120 --> 00:05:50,010 First, I personally think it is more fun to train a cheetah to run than a Minitaur. 79 00:05:50,210 --> 00:05:57,560 Second, I hesitated between the half cheetah and the humanoid, but the thing is, the humanoid is the most challenging 80 00:05:57,800 --> 00:06:04,340 environment in which to train an AI to walk, and it actually takes several weeks on a normal computer to train 81 00:06:04,340 --> 00:06:05,020 the humanoid. 82 00:06:05,060 --> 00:06:09,950 And I want everyone in this course to be able to train the AI we're going to build in this course, 83 00:06:10,170 --> 00:06:15,720 and the half cheetah is the best compromise between something challenging to train and the training time. 84 00:06:15,730 --> 00:06:22,150 So we won't have to wait for days or weeks to train our half cheetah, and you're going to see that ARS is 85 00:06:22,210 --> 00:06:26,750 so powerful that we'll only have to wait for a few minutes. 86 00:06:26,750 --> 00:06:27,700 This is insane. 87 00:06:27,800 --> 00:06:30,450 I can't wait to show this to you. 88 00:06:30,470 --> 00:06:31,220 All right. 89 00:06:31,220 --> 00:06:37,340 So now that we have the environment, let me show you the research paper again. I may already have shown it 90 00:06:37,340 --> 00:06:38,030 to you, 91 00:06:38,030 --> 00:06:42,500 but let me show it to you again, because we really need to get familiar with it, since we're going to 92 00:06:42,500 --> 00:06:46,640 go back and forth from the code to the research paper and vice versa.
93 00:06:46,640 --> 00:06:52,820 So I just want to show it to you once again, and especially show you exactly what we'll be implementing 94 00:06:53,090 --> 00:06:58,670 in the course, that is, which version of the ARS algorithm we'll be implementing, because indeed the 95 00:06:58,670 --> 00:07:02,560 research paper suggests several versions. And no worries: 96 00:07:02,720 --> 00:07:05,160 we will be implementing the most powerful one. 97 00:07:05,440 --> 00:07:12,530 So let's go to the research paper. It's called "Simple random search provides a competitive approach to 98 00:07:12,530 --> 00:07:14,360 reinforcement learning". 99 00:07:14,360 --> 00:07:16,160 This is the research paper. 100 00:07:16,160 --> 00:07:17,600 Let's click on it. 101 00:07:17,600 --> 00:07:18,460 Here we go. 102 00:07:18,530 --> 00:07:23,080 And then PDF here, and here we are in the research paper. 103 00:07:23,150 --> 00:07:27,810 "Simple random search provides a competitive approach to reinforcement learning". 104 00:07:27,820 --> 00:07:34,850 This paper was written by Horia Mania, Aurelia Guy and Benjamin Recht. Benjamin Recht is very famous 105 00:07:34,850 --> 00:07:36,110 in the AI community, 106 00:07:36,110 --> 00:07:42,410 one of the best. And all of this was invented at the University of California, Berkeley, one of the 107 00:07:42,410 --> 00:07:43,860 top universities in the world. 108 00:07:43,950 --> 00:07:47,450 And as you can see, the date is March 20, 2018. 109 00:07:47,450 --> 00:07:51,490 So it is very recent at the time I'm speaking. 110 00:07:51,500 --> 00:07:57,680 And what I want to show you exactly is the algorithm we'll be implementing in the course, which is 111 00:07:58,160 --> 00:08:00,440 on page 6, if I remember correctly. 112 00:08:00,440 --> 00:08:00,900 Here we go. 113 00:08:00,920 --> 00:08:01,310 Yeah.
114 00:08:01,490 --> 00:08:07,970 This is the algorithm we'll be implementing exactly as it is, so you will really learn how to 115 00:08:08,000 --> 00:08:13,570 follow a research paper and translate what's going on here into code. 116 00:08:13,580 --> 00:08:15,960 We will be implementing exactly the same thing. Actually, 117 00:08:16,130 --> 00:08:18,700 they suggest two versions of the algorithm: 118 00:08:18,880 --> 00:08:25,040 V1, which is without any normalization of the states, and V2, which is with normalization of the states, 119 00:08:25,040 --> 00:08:30,770 because indeed they say that we can normalize the states to improve the performance, and we can also 120 00:08:31,010 --> 00:08:37,130 scale by the standard deviation to improve the performance as well, which we will also be doing; the standard 121 00:08:37,130 --> 00:08:39,170 deviation of the reward, by the way. 122 00:08:39,170 --> 00:08:44,750 So basically we will implement the best version in this paper, which is V2, with the normalization of 123 00:08:44,750 --> 00:08:48,030 the states and the normalization of the reward. 124 00:08:48,110 --> 00:08:51,740 And so we'll be making some classes and functions to implement this 125 00:08:51,770 --> 00:08:55,120 in the best way, the way a real AI developer would do it. 126 00:08:55,220 --> 00:09:01,820 So you will learn how to think logically like an AI developer and work on a pretty advanced research 127 00:09:01,820 --> 00:09:07,310 paper, but I promise you will understand everything, because we will really go into the details of 128 00:09:07,310 --> 00:09:08,040 this. 129 00:09:08,270 --> 00:09:13,070 And now I would like to say just a few words on how this works, because we can explain it 130 00:09:13,070 --> 00:09:14,270 in a few words.
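To make the state normalization mentioned above more concrete, here is a minimal sketch in plain Python. It keeps a running mean and variance per state dimension and rescales each incoming state with them. The class name `Normalizer`, its method names, and the variance floor `eps` are assumptions made for illustration; they are not taken from the paper or from the course code.

```python
import math


class Normalizer:
    """Running per-dimension mean and variance of the states seen so far."""

    def __init__(self, n_inputs):
        self.n = 0                      # number of states observed
        self.mean = [0.0] * n_inputs    # running mean per dimension
        self.m2 = [0.0] * n_inputs      # running sum of squared deviations

    def observe(self, state):
        # Welford's online update: one pass, no need to store past states.
        self.n += 1
        for i, x in enumerate(state):
            delta = x - self.mean[i]
            self.mean[i] += delta / self.n
            self.m2[i] += delta * (x - self.mean[i])

    def normalize(self, state, eps=1e-2):
        # eps is an assumed floor on the variance to avoid dividing by ~0.
        out = []
        for i, x in enumerate(state):
            var = self.m2[i] / self.n if self.n > 0 else 0.0
            out.append((x - self.mean[i]) / math.sqrt(max(var, eps)))
        return out
```

In use, each state coming from the environment would first be passed to `observe` and then to `normalize` before it is given to the policy.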
131 00:09:14,300 --> 00:09:20,390 So basically the first important thing to understand is that you have to see your AI as a function, 132 00:09:20,390 --> 00:09:27,490 which is called a policy, taking as input the states of the environment, which are input vectors encoding 133 00:09:27,530 --> 00:09:30,770 exactly what's happening at each time in the environment. 134 00:09:30,770 --> 00:09:36,230 So they are like the angles of the axes of your robot, the angles of rotation, the coordinates of the points 135 00:09:36,230 --> 00:09:42,470 of your robot, and more: enough values to describe exactly what's happening in the environment, so that 136 00:09:42,470 --> 00:09:44,970 we could almost draw a picture. 137 00:09:44,980 --> 00:09:45,730 All right. 138 00:09:45,740 --> 00:09:51,680 So this is a function taking this as input and returning as output the actions to play, the actions to 139 00:09:51,680 --> 00:09:52,720 play in order to walk. 140 00:09:52,830 --> 00:09:58,790 But then the second very important thing to understand is that the output is not only one action but 141 00:09:58,880 --> 00:10:02,200 several. It's a group of actions, and that makes sense. 142 00:10:02,200 --> 00:10:07,240 That's because in order to walk on some field you not only have to move one leg, you have to move all 143 00:10:07,240 --> 00:10:09,790 the parts of your body. And more precisely, 144 00:10:09,790 --> 00:10:15,120 these actions will be all the muscle impulsions you can have within these parts of the body. 145 00:10:15,130 --> 00:10:20,360 So this is probably very different from what you've done before, because in the previous AIs 146 00:10:20,360 --> 00:10:21,360 you might have built, 147 00:10:21,460 --> 00:10:26,220 you generally return one action, and mostly one discrete action here.
148 00:10:26,210 --> 00:10:31,360 It's not only one action, it's a group of actions, and also this is a group of continuous actions, because 149 00:10:31,360 --> 00:10:37,420 the muscle impulsions are measured by continuous metrics in order to make it even closer to reality. 150 00:10:37,420 --> 00:10:44,350 So we are really going to build an AI for a very realistic goal, and I remind you that 151 00:10:44,350 --> 00:10:50,170 it's fun to train an AI to walk on some field, but this is not only for fun purposes; 152 00:10:50,170 --> 00:10:56,290 it is also a benchmark for your AI, because if you manage to train your AI to walk on some field, you 153 00:10:56,290 --> 00:11:03,280 can adapt this AI to other problems, even business problems, by just changing the inputs of the environment, 154 00:11:03,520 --> 00:11:08,110 which will still be some encoding vectors, and by of course adapting the outputs, the actions to 155 00:11:08,110 --> 00:11:10,530 play, and the reward strategy. 156 00:11:10,570 --> 00:11:17,320 So with only a few tweaks you could transpose this AI that we're going to build and train to another 157 00:11:17,320 --> 00:11:18,270 kind of problem, 158 00:11:18,340 --> 00:11:20,240 even business problems. 159 00:11:20,260 --> 00:11:21,810 Now, how does that work? 160 00:11:21,850 --> 00:11:26,950 So your AI is a policy, that is, a function taking as input the states of the environment and returning a 161 00:11:26,950 --> 00:11:30,430 group of continuous actions, which are the muscle impulsions. 162 00:11:30,490 --> 00:11:36,520 And how does it work? Well, between those inputs and those outputs you're going to have a perceptron, which 163 00:11:36,520 --> 00:11:40,820 is actually a neural network of one layer composed of several neurons.
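The policy described above, a single layer mapping a state vector to a group of continuous actions, can be sketched as a plain matrix-vector product. This is only an illustration under stated assumptions: the class name `LinearPolicy`, the attribute `theta`, and the zero initialization are choices made here, not the course's actual code.

```python
class LinearPolicy:
    """One-layer policy: actions = theta x state, with no activation function."""

    def __init__(self, n_inputs, n_outputs):
        # One row of weights per action, one column per state value.
        # Zeros are used here as a simple starting point (an assumption).
        self.theta = [[0.0] * n_inputs for _ in range(n_outputs)]

    def act(self, state):
        # Each action is a weighted sum of all state values, so the output
        # is a list of continuous numbers, one per action.
        return [sum(w * x for w, x in zip(row, state)) for row in self.theta]
```

With, say, a state of 26 values and 6 joints to drive, `theta` would be a 6 x 26 matrix and `act` would return 6 continuous numbers; those dimensions are given only as an example.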
164 00:11:41,050 --> 00:11:47,530 Each of these neurons will have weights, and we will do an exploration on the policy, meaning we will 165 00:11:47,590 --> 00:11:54,310 explore lots and lots of updates of these weights by just adding some small values here that 166 00:11:54,310 --> 00:11:59,560 follow a normal distribution, to see which update of the weights will increase 167 00:11:59,560 --> 00:12:00,610 the reward. 168 00:12:00,620 --> 00:12:04,660 And so basically what we'll do is test several directions, 169 00:12:04,690 --> 00:12:09,970 each direction corresponding to one small value we'll add to the matrix of weights. 170 00:12:09,970 --> 00:12:15,280 This is a matrix of weights, and for each of these directions we'll also do the same update, 171 00:12:15,280 --> 00:12:18,140 with the same value but in the opposite direction. 172 00:12:18,200 --> 00:12:23,890 That's why we have a plus and a minus here, which correspond to one specific direction but both 173 00:12:24,160 --> 00:12:26,610 for one way and the opposite way. 174 00:12:26,800 --> 00:12:30,150 And why do we need to do this with one way and the opposite way? 175 00:12:30,190 --> 00:12:37,030 That's because in step 7 here we will do what is called an approximated gradient ascent, meaning 176 00:12:37,030 --> 00:12:43,410 that we will take the reward we get by applying this perturbation here in the plus direction, the 177 00:12:43,420 --> 00:12:48,970 positive direction, and we will subtract from it the reward we get by applying the same perturbation 178 00:12:48,970 --> 00:12:52,460 with the same value but in the opposite direction. 179 00:12:52,630 --> 00:13:00,190 And that will be in order to compute the finite difference of the reward with respect to this small update 180 00:13:00,310 --> 00:13:05,020 of the weights, which is the small perturbation value we add to each of the weights.
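The exploration just described, trying each random direction both ways and using the difference of the two rewards, can be sketched as one update step. This is a simplified illustration, not the paper's exact algorithm and not the course code: the function name `ars_step`, the parameter names and their default values are assumptions, `evaluate` is a hypothetical stand-in for "run one episode with these weights and return the total reward", and the paper's option of keeping only the best directions is left out.

```python
import random
import statistics


def ars_step(theta, evaluate, n_directions=8, step_size=0.02, noise=0.03, rng=random):
    """One simplified ARS-style update of the weight matrix theta (a list of rows)."""
    rows, cols = len(theta), len(theta[0])

    def shifted(delta, sign):
        # theta + sign * noise * delta, element by element
        return [[theta[r][c] + sign * noise * delta[r][c] for c in range(cols)]
                for r in range(rows)]

    # One perturbation matrix per direction, drawn from a normal distribution.
    deltas = [[[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]
              for _ in range(n_directions)]

    # Try every direction both ways: the "plus" and the "minus" perturbation.
    r_pos = [evaluate(shifted(d, +1)) for d in deltas]
    r_neg = [evaluate(shifted(d, -1)) for d in deltas]

    # Standard deviation of the collected rewards, used to scale the step.
    sigma_r = statistics.pstdev(r_pos + r_neg) or 1.0  # avoid dividing by zero

    scale = step_size / (n_directions * sigma_r)
    new_theta = [row[:] for row in theta]
    for delta, rp, rn in zip(deltas, r_pos, r_neg):
        diff = rp - rn  # finite difference of the reward along this direction
        for r in range(rows):
            for c in range(cols):
                new_theta[r][c] += scale * diff * delta[r][c]
    return new_theta
```

Calling `ars_step` repeatedly on the returned matrix moves the weights toward higher reward. A toy `evaluate`, for instance the negative squared distance between `theta` and some target matrix, is enough to try it without any physics simulator.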
181 00:13:05,100 --> 00:13:10,750 And indeed this is an approximation of the gradient, because this is an approximation of the partial 182 00:13:10,750 --> 00:13:17,950 derivative of the reward with respect to the weights, because indeed you are trying to optimize the reward 183 00:13:18,250 --> 00:13:19,900 with respect to the weights. 184 00:13:19,940 --> 00:13:24,880 All right, and then you're going to add this approximation of the gradient of the reward with respect to 185 00:13:24,880 --> 00:13:31,300 the weights to your current matrix of weights, to update the weights in the best directions, the ones that will increase 186 00:13:31,300 --> 00:13:33,140 the reward the most. 187 00:13:33,580 --> 00:13:33,980 Great. 188 00:13:33,970 --> 00:13:35,650 So I can't wait to start. 189 00:13:35,710 --> 00:13:42,190 But before we start I want to show you one last thing, which is an implementation by a developer we 190 00:13:42,370 --> 00:13:47,610 know personally and who was very happy to share his code for the course. 191 00:13:47,620 --> 00:13:49,320 His name is Alexis Jacq. 192 00:13:49,330 --> 00:13:56,280 He is a very talented AI developer who made this amazing Python implementation of this same algorithm, 193 00:13:56,290 --> 00:14:01,580 Augmented Random Search, from this exact same paper, and he obtained amazing results. 194 00:14:01,780 --> 00:14:07,450 And for those of you who don't know Alexis Jacq, he is a highly skilled Ph.D. student at one of the best 195 00:14:07,450 --> 00:14:14,620 universities, EPFL in Switzerland, who has made some great contributions to the community, and not 196 00:14:14,620 --> 00:14:21,110 only that, he appears on this prestigious Facebook AI page. 197 00:14:21,220 --> 00:14:21,930 Here we go. 198 00:14:21,940 --> 00:14:27,310 That's a prestigious page; all the people on this page are very talented members of the AI community 199 00:14:27,310 --> 00:14:28,780 and great contributors.
200 00:14:28,780 --> 00:14:36,250 And indeed if we scroll down we will find Alexis Jacq right here, with his amazing tutorial on neural transfer 201 00:14:36,250 --> 00:14:37,190 with PyTorch. 202 00:14:37,390 --> 00:14:42,970 So I would like to say a huge thank you to Alexis Jacq for helping us with this course and sharing his 203 00:14:42,970 --> 00:14:47,710 amazing implementation of ARS, which we'll take inspiration from. We won't have exactly the same code, 204 00:14:47,730 --> 00:14:51,670 but we'll still use the same type of parameters and the same logic. 205 00:14:51,700 --> 00:14:52,360 All right. 206 00:14:52,450 --> 00:14:53,760 So I can't wait to start. 207 00:14:53,800 --> 00:14:55,500 Let's start in the next tutorial. 208 00:14:55,540 --> 00:14:57,210 And until then, enjoy AI.