Hi, first of all, thanks for contributing such a template; it is very useful and I am trying to use it for a domain adaptation algorithm.
I have a couple of questions regarding iteration-based training, since the template is supposed to work for both epoch-based and iteration-based training. In the following lines of pytorch-template/trainers/trainer.py (lines 28 to 33 at e73871d):
```python
if self.len_epoch is None:
    # epoch-based training
    self.len_epoch = len(self.train_data_loaders["data"])
else:
    # iteration-based training
    self.train_data_loaders["data"] = inf_loop(self.train_data_loaders["data"])
```
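(As a side note, I am assuming inf_loop is the usual endless data-loader wrapper from the template's utils, roughly like the sketch below; that assumption is why I refer to the wrapped iterator as infinite further down.)

```python
from itertools import repeat

def inf_loop(data_loader):
    # my assumption of what inf_loop does: cycle the loader forever,
    # so iterating over the wrapped loader never terminates on its own
    for loader in repeat(data_loader):
        yield from loader
```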
That snippet sets up both training modes. However, in the _train_epoch function you run validation only after the epoch is finished, which is fine for epoch-based training. But if I want to evaluate the model every certain number of iterations (say, every val_steps), the function will never perform that evaluation, because the iterator declared in line 33 is infinite and the for-loop in _train_epoch does not stop until the following condition (trainer.py, lines 110 to 111) is met:
```python
if batch_idx == self.len_epoch:
    break
```
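What I have in mind is something along these lines inside the training loop (just a rough sketch on my side; val_steps and _valid_epoch are placeholder names I made up, not part of the template):

```python
for batch_idx, batch in enumerate(self.train_data_loaders["data"]):
    # ... forward pass, loss computation, optimizer step ...

    # hypothetical: evaluate every `val_steps` iterations instead of once per epoch
    if self.val_steps is not None and (batch_idx + 1) % self.val_steps == 0:
        val_log = self._valid_epoch(epoch)  # placeholder name for the validation routine
        self.model.train()                  # switch back to training mode afterwards

    if batch_idx == self.len_epoch:
        break
```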
Another issue is that you step the learning rate scheduler only after the epoch (trainer.py, lines 123 to 124):
```python
if self.do_lr_scheduling:
    self.lr_scheduler.step()
```
but for iteration-based training that is supposed to happen every step. Shouldn't lines 123-124 go inside the for-loop?
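In other words, for iteration-based training I would expect something along these lines inside the for-loop of _train_epoch (again just a sketch, reusing the same do_lr_scheduling flag):

```python
for batch_idx, batch in enumerate(self.train_data_loaders["data"]):
    # ... training step ...

    # step the scheduler every iteration rather than once per epoch
    if self.do_lr_scheduling:
        self.lr_scheduler.step()

    if batch_idx == self.len_epoch:
        break
```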