1
00:00:00,420 --> 00:00:06,420
Hello and welcome to this new tutorial and most importantly to this very essential new code section

2
00:00:06,420 --> 00:00:10,970
that we're about to implement and that is of course the training of the AI.

3
00:00:11,250 --> 00:00:19,020
So we built the AI, and basically we made all the tools that will allow us to implement all the steps

4
00:00:19,020 --> 00:00:22,360
here of the augmented random search algorithm.

5
00:00:22,380 --> 00:00:28,410
You know, you understand that what we've done so far is just preparing the tools so that we can integrate

6
00:00:28,470 --> 00:00:30,520
all of them into one single training.

7
00:00:30,630 --> 00:00:36,290
And this is exactly what we'll do in this code section, this new code section, to do the training.

8
00:00:36,300 --> 00:00:38,500
However, this will still be a function.

9
00:00:38,550 --> 00:00:44,450
That doesn't mean that at the end of this new code section about the training we will run the code.

10
00:00:44,610 --> 00:00:45,930
No, it is still a function.

11
00:00:45,930 --> 00:00:52,560
However, at the very end, in the very last code section of this implementation, we will run everything by

12
00:00:52,560 --> 00:00:58,540
preparing the environment, connecting the environment to the AI, and then executing this train

13
00:00:58,540 --> 00:01:02,220
function to train the AI to walk inside this environment.

14
00:01:02,220 --> 00:01:04,960
So we need to understand the approach we're taking.

15
00:01:05,000 --> 00:01:08,680
We first made some tools to organize and structure everything.

16
00:01:08,820 --> 00:01:14,790
And now we're about to make this huge training function which will use the previous classes and methods

17
00:01:14,790 --> 00:01:17,460
we made before to run this whole training.

18
00:01:17,460 --> 00:01:18,590
So let's get started.

19
00:01:18,810 --> 00:01:20,870
And let's go back to Python.

20
00:01:20,880 --> 00:01:25,920
All right, so as I said, we will integrate the whole thing, still inside a function.

21
00:01:25,920 --> 00:01:32,340
So I'm starting with def, and the name of the function will simply be train, and it will take as arguments

22
00:01:32,670 --> 00:01:34,890
first the environment.

23
00:01:34,890 --> 00:01:40,790
Then of course the policy, which is our AI, which will train inside this environment.

24
00:01:41,100 --> 00:01:46,850
Then our normalizer. Why should the normalizer be an argument of this function?

25
00:01:47,100 --> 00:01:53,040
Well, it's in case you want to change your normalizer. In fact, when you build an AI, when

26
00:01:53,040 --> 00:01:58,700
you build a machine learning or deep learning model, you usually try several normalizers.

27
00:01:58,800 --> 00:02:04,230
There can be several of them so it's nice to still have the option to change it easily by sending it

28
00:02:04,230 --> 00:02:07,750
as an argument of a function, which is the train function here.

29
00:02:08,010 --> 00:02:15,510
And then the last argument we will use is hp, which will be our future object that will contain all

30
00:02:15,510 --> 00:02:19,230
the hyperparameters that are going to be fixed during the training.

31
00:02:19,350 --> 00:02:25,320
And again, why do we use it as an argument? That's because we want to be able to do some tuning easily in

32
00:02:25,320 --> 00:02:31,020
the end, meaning that we want to be able to try other values of the hyperparameters.

33
00:02:31,260 --> 00:02:37,020
And indeed it will be simple for us to do that, because since hp is an argument of this train function,

34
00:02:37,380 --> 00:02:44,760
well we'll simply need to change and tweak the values here to try some other ones and still keep this

35
00:02:44,940 --> 00:02:48,710
hyperparameter object in the function without having to change anything.

36
00:02:48,710 --> 00:02:54,810
Basically, if you want to try other values of your hyperparameters, you only need to change them here.

37
00:02:54,910 --> 00:02:56,480
So that's pretty practical.

38
00:02:56,480 --> 00:03:00,800
And that's it, those are the arguments we'll need for this train function.

39
00:03:00,810 --> 00:03:06,110
All right, let's not forget the colon, and let's define what the train function has to do.

40
00:03:06,150 --> 00:03:07,410
And so what does it have to do?

41
00:03:07,410 --> 00:03:13,580
Well, let's go back to the paper and let me show you exactly what we'll do. In this train function we'll

42
00:03:13,680 --> 00:03:21,120
basically do this whole while loop, you know, while ending condition not satisfied, do all this stuff, and

43
00:03:21,210 --> 00:03:23,950
all this stuff will be very easy for us to do.

44
00:03:23,970 --> 00:03:25,710
Thanks to all the functions we made.

45
00:03:25,950 --> 00:03:31,860
So what we're going to do is, well, it's not actually going to be a while loop, because

46
00:03:31,860 --> 00:03:35,580
we made this nb_steps variable,

47
00:03:35,610 --> 00:03:42,060
you know, as one of our hyperparameters, which, I remind you, is the number of loops in the training, or more

48
00:03:42,060 --> 00:03:48,300
specifically the number of times we're going to update our model, you know, with this update function.

49
00:03:48,420 --> 00:03:54,150
So it's also the number of steps of gradient descent we're going to do to update the weights of our

50
00:03:54,150 --> 00:03:55,030
policy.

51
00:03:55,320 --> 00:04:03,240
And since we have this variable that is going to set the number of training loops or the number of updates.

52
00:04:03,450 --> 00:04:08,430
Well we don't need to do a while loop and it's actually better to do a for loop because indeed we're

53
00:04:08,460 --> 00:04:15,540
going to loop over all the steps in the range starting from zero, but we don't have to specify it because

54
00:04:15,540 --> 00:04:20,240
it's the default value of the lower bound of the range function in Python.

55
00:04:20,490 --> 00:04:29,700
So from 0 up to this total number of steps of the training, and I didn't forget the hp here because

56
00:04:29,700 --> 00:04:33,450
nb_steps is a hyperparameter of our implementation.
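
The function header and loop described so far can be sketched as follows. This is only a minimal skeleton under stated assumptions: the attribute name nb_steps is inferred from the narration, the SimpleNamespace object is a hypothetical stand-in for the hyperparameter object, and the returned counter exists only so the sketch can be checked (the function in the tutorial is not described as returning anything).

```python
from types import SimpleNamespace


def train(env, policy, normalizer, hp):
    """Skeleton of the training function: one loop pass per policy update."""
    completed_steps = 0
    # range(hp.nb_steps) starts at 0 by default, so no lower bound is given.
    for step in range(hp.nb_steps):
        # The individual training steps would go here; this sketch
        # only counts the iterations (an illustrative assumption).
        completed_steps += 1
    return completed_steps


# Hypothetical stand-in for the hyperparameter object.
hp = SimpleNamespace(nb_steps=5)
steps_run = train(env=None, policy=None, normalizer=None, hp=hp)
```

Because hp is passed in as an argument, trying a different number of updates only means building the object with another nb_steps value; the function body stays untouched.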

57
00:04:33,450 --> 00:04:37,410
All right, and now we are ready to start the for loop.

58
00:04:37,710 --> 00:04:44,850
So this training code section is quite long so I want to structure it as much as possible and more specifically

59
00:04:44,850 --> 00:04:49,320
I want to highlight each of the different steps of the training we're going into.

60
00:04:49,530 --> 00:04:54,900
So we're going to break this down in several tutorials and in the next one we're going to start with

61
00:04:54,900 --> 00:05:01,370
the first step of the training which will be to initialize the perturbations Delta and the positive

62
00:05:01,370 --> 00:05:03,600
rewards as well as the negative rewards.

63
00:05:03,770 --> 00:05:10,610
So basically we will sample the deltas using our sample_deltas method from the Policy class that we

64
00:05:10,610 --> 00:05:11,500
made earlier.

65
00:05:11,600 --> 00:05:16,060
So that will sample our deltas. Then we will initialize the positive rewards.

66
00:05:16,070 --> 00:05:21,370
That is, the rewards that we'll get by applying the perturbations in the positive directions.

67
00:05:21,650 --> 00:05:24,170
And then we'll initialize the negative rewards.

68
00:05:24,170 --> 00:05:25,650
That is, the rewards

69
00:05:25,670 --> 00:05:29,380
we're going to get by applying the perturbations in the negative,

70
00:05:29,390 --> 00:05:32,510
or remember, it's actually the opposite, directions.
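
The initialization step just described might look roughly like this. It is a hedged sketch, not the tutorial's code: the Policy class below is a hypothetical stand-in for the one built earlier, nb_directions and the input/output sizes are assumed names, and drawing the perturbations from a standard Gaussian is an assumption as well.

```python
import random


class Policy:
    """Hypothetical stand-in for the policy class built earlier."""

    def __init__(self, input_size, output_size, nb_directions):
        self.input_size = input_size
        self.output_size = output_size
        self.nb_directions = nb_directions

    def sample_deltas(self):
        # One matrix of Gaussian noise per direction, with the same
        # shape as the weight matrix (output_size x input_size).
        return [
            [[random.gauss(0.0, 1.0) for _ in range(self.input_size)]
             for _ in range(self.output_size)]
            for _ in range(self.nb_directions)
        ]


policy = Policy(input_size=3, output_size=2, nb_directions=4)

# Sample the perturbations, then start both reward lists at zero.
deltas = policy.sample_deltas()
positive_rewards = [0] * policy.nb_directions
negative_rewards = [0] * policy.nb_directions
```

Each entry of positive_rewards and negative_rewards is meant to be filled in later with the reward obtained by applying the matching perturbation in the positive or the opposite direction.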

71
00:05:32,750 --> 00:05:37,010
So just an initialization step and then we'll move on to the next step.

72
00:05:37,030 --> 00:05:40,190
There are going to be about eight steps in total for the training.

73
00:05:40,190 --> 00:05:45,530
So as you can see it's pretty long but since we're going to break it down you'll be able to take it

74
00:05:45,530 --> 00:05:51,950
step by step, and most of the time I recommend taking a step back in order to see the logical

75
00:05:51,950 --> 00:05:55,800
flow of the implementation, and here, of the training.

76
00:05:55,820 --> 00:05:57,930
So let's start the first step in the next tutorial.

77
00:05:57,950 --> 00:05:59,770
And until then enjoy AI.