1 00:00:01,240 --> 00:00:05,830 Hello and welcome back to the course on augmented random search engine today's tutorial we're going 2 00:00:05,830 --> 00:00:09,060 to compare basic versus augmented random search. 3 00:00:09,190 --> 00:00:13,990 So we're going to find out why is augmented random search called augmented random search. 4 00:00:13,990 --> 00:00:17,440 What is this word augmented in the name stands for. 5 00:00:17,470 --> 00:00:18,700 So let's have a look. 6 00:00:18,700 --> 00:00:26,230 We're going to look at three main updates that were done in Erris to make it augmented. 7 00:00:26,370 --> 00:00:32,640 The first one is scale update step by the standard deviation of rewards. 8 00:00:32,850 --> 00:00:35,640 Second one was online normalization of states. 9 00:00:35,640 --> 00:00:41,230 And third one is discarding directions that yield the lowest rewards. 10 00:00:41,400 --> 00:00:51,000 So the first one which is about scaling up data step by step division rewards is as simple as taking 11 00:00:51,060 --> 00:00:57,840 this calculation which we discussed in the previous tutorial and dividing it by the standard deviation 12 00:00:57,840 --> 00:01:01,070 of the rewards involved in this in this calculation. 13 00:01:01,080 --> 00:01:05,330 Now it's not as important for the purposes of our intuition. 14 00:01:05,340 --> 00:01:10,610 There's a technical reason behind this and what we're not going to dive into it. 15 00:01:10,800 --> 00:01:15,200 You can find more information on this in part three point one. 16 00:01:15,230 --> 00:01:20,070 All of the research paper which will reference at the end of that tutorials same research paper that 17 00:01:20,190 --> 00:01:24,700 we referenced in the previous article the main research paper. 18 00:01:24,720 --> 00:01:27,770 So this is not as important for us as the other one. 19 00:01:27,770 --> 00:01:29,570 So let's move on from here. 20 00:01:29,580 --> 00:01:34,410 But just something important something to keep in mind and something to have a look in the paper or 21 00:01:34,700 --> 00:01:37,050 to look out for in the practical tutorial. 22 00:01:37,260 --> 00:01:43,940 The second update is online normalization of states and online basically means real time normalization 23 00:01:43,980 --> 00:01:50,070 of states as the AI is learning as the AI is going through these environments. 24 00:01:50,070 --> 00:01:54,690 The states are normalized and what does that what is their emotional states mean. 25 00:01:54,690 --> 00:02:02,010 It means normalizing these values that are inputs over here and not just normalizing them based on what 26 00:02:02,010 --> 00:02:11,280 they are normalizing them on based on what values we've already seen so that everything is treated in 27 00:02:11,280 --> 00:02:11,960 a similar way. 28 00:02:11,970 --> 00:02:16,350 And so they talk about this in more detail in their research paper and this will be one of the first 29 00:02:16,350 --> 00:02:18,000 things that will code I'd love. 30 00:02:18,000 --> 00:02:21,980 I'll just give a quick overview of how to ensure we think about this. 31 00:02:21,990 --> 00:02:24,940 And this is the same example as they given the research paper as well. 32 00:02:25,080 --> 00:02:32,430 So imagine that these input states for one case are between ranging between 90 or 100 so these values 33 00:02:32,430 --> 00:02:38,010 are between somewhere like between 80 and 100 like 91 92 and maybe 99. 34 00:02:38,100 --> 00:02:40,860 And another case there between minus 1 and 1. 35 00:02:40,860 --> 00:02:48,140 Now the thing is that the weights the fact that they're going to have like changing the way you remember 36 00:02:48,140 --> 00:02:54,870 how we learn we learn by being the weights while changing the way slightly by like 0.1. 37 00:02:55,250 --> 00:03:02,750 In the case when this value is 100 is going to yield a much more drastic change in output as opposed 38 00:03:02,750 --> 00:03:08,810 to when this value is something like 1 or zero point five or something like that. 39 00:03:08,810 --> 00:03:16,880 So basically even though like it's the environment might be so the environment might be described with 40 00:03:16,880 --> 00:03:21,200 different ranging values which is fair environment might be changing might be different it might be 41 00:03:21,200 --> 00:03:26,800 some you know different forces happening different types of terrain or things like that. 42 00:03:26,960 --> 00:03:33,590 And because because of that because these values might be in different ranges what can happen is that 43 00:03:33,590 --> 00:03:39,950 slight perturbation of weights might have different likely different magnitude or impacts on the output 44 00:03:39,950 --> 00:03:47,780 values even though they is the only reason for that is that the is in a different environment so we 45 00:03:47,780 --> 00:03:49,970 want to minimize that effect. 46 00:03:50,120 --> 00:03:58,100 And we want to have cheap perturbations in the way it's to be kind of like fair the perturbations to 47 00:03:58,100 --> 00:04:01,270 be fairly treated across the whole training process. 48 00:04:01,400 --> 00:04:05,930 And that's why we're going to normalize them but not just normalize them because a set of weights I 49 00:04:05,930 --> 00:04:11,600 think is eventually going to normalize them online which is also called widening of states meaning we're 50 00:04:11,600 --> 00:04:18,350 going to take the values that we've already seen for these states and we're going to normalize these 51 00:04:18,350 --> 00:04:20,520 values alongside what we've already seen. 52 00:04:20,690 --> 00:04:28,220 And that way the weights won't be affected as drastically by the different input the different ranges 53 00:04:28,220 --> 00:04:29,390 of these in propellors. 54 00:04:29,600 --> 00:04:37,700 So that was one of the biggest effects if you read the blog post that was in reference reading at the 55 00:04:37,700 --> 00:04:42,500 start of this section you would have noticed that the supervisor of the 56 00:04:45,530 --> 00:04:53,130 Horia money and really a guy he actually mentioned that this was one of the biggest changes that helped 57 00:04:53,570 --> 00:04:58,610 us tackle challenges such as humanoid walking. 58 00:04:58,610 --> 00:05:00,320 All right so that's important to understand. 59 00:05:00,320 --> 00:05:04,970 And again you'll see a bit more of that in the whole practical side of things. 60 00:05:04,970 --> 00:05:13,390 And in the research paper and finally the third update was discarding directions that yield lowest rewards. 61 00:05:13,430 --> 00:05:15,550 Quite a bold move. 62 00:05:15,920 --> 00:05:19,850 And it turned out to be a good bet for the authors. 63 00:05:19,850 --> 00:05:24,890 So here we've got our four or eight actually different perturbations. 64 00:05:24,890 --> 00:05:31,940 So for positive or negative the ones that we talked about the to tutorial and here we've got the results 65 00:05:32,600 --> 00:05:36,140 how they went through and we've got the rewards. 66 00:05:36,320 --> 00:05:40,760 And basically what they like as you recall previously we would use all of them but now they're saying 67 00:05:40,790 --> 00:05:48,320 all right we're only going to take the top k top k of these rewards the top k of these perturbations 68 00:05:48,320 --> 00:05:52,940 are actually going to work with them when are we going to discard the the ones that weren't part of 69 00:05:52,940 --> 00:05:53,290 the top. 70 00:05:53,320 --> 00:05:58,060 OK so we're going to take this one we're going to take this one and we're going to throw away these 71 00:05:58,220 --> 00:06:03,860 these two or these four perturbations we're not actually going to include them in our calculation and 72 00:06:03,860 --> 00:06:05,010 that changes the formula. 73 00:06:05,010 --> 00:06:06,280 We discussed over here. 74 00:06:06,590 --> 00:06:13,160 It's actually not going to have these include basically the weights are going to evolve in the direction 75 00:06:13,160 --> 00:06:18,520 of only the most successful results that we saw. 76 00:06:19,230 --> 00:06:27,440 And yeah and so that was also a significant update that helped this method and it's quite quite an intuitive 77 00:06:27,440 --> 00:06:28,240 one as well. 78 00:06:28,250 --> 00:06:28,510 Why. 79 00:06:28,520 --> 00:06:36,270 Why not take the the highest pathé the directions or revolution that have the highest potential of all. 80 00:06:36,290 --> 00:06:42,630 Just in that direction and discard the ones that have the lowest part of sounds a bit like natural selection. 81 00:06:42,950 --> 00:06:50,330 And now those are the three main differences if you'd like to get a bit more details of course you'll 82 00:06:50,330 --> 00:06:53,300 have them in all of these things in the practical terms. 83 00:06:53,300 --> 00:06:57,710 But if you'd like to read a bit more about them and understand them not just sort of intuitive Bay bases 84 00:06:57,740 --> 00:07:05,810 but like more philosophically or more mathematically then you once again would like to refer you to 85 00:07:06,140 --> 00:07:12,700 the research paper by her money and really a guy on an argument and random searches same research paper 86 00:07:12,710 --> 00:07:13,770 reference the embryos that are. 87 00:07:13,780 --> 00:07:19,310 But just in case you skipped that tural because it was quite advanced then this is a good opportunity 88 00:07:19,370 --> 00:07:23,530 to get back to that research paper and have a look at some of these things. 89 00:07:23,660 --> 00:07:27,530 And on that note I look forward to seeing that next of Tauriel. 90 00:07:27,540 --> 00:07:29,240 And until then enjoy AI.