Questions about iteration-based training #1

@fabriziojpiva

Hi, first of all, thanks for contributing such a template. It is very useful, and I am trying to use it for a domain adaptation algorithm.

I have a couple of questions regarding iteration-based training, since the template is supposed to work for both epoch-based and iteration-based training. In the following lines:

    if self.len_epoch is None:
        # epoch-based training
        self.len_epoch = len(self.train_data_loaders["data"])
    else:
        # iteration-based training
        self.train_data_loaders["data"] = inf_loop(self.train_data_loaders["data"])

you define the logic for both training modes. However, _train_epoch runs validation only after the epoch has finished. That is fine for epoch-based training, but if I want to evaluate the model every val_steps iterations, validation never happens at that interval: the iterator declared in line 33 is infinite, so the for-loop in _train_epoch does not stop until the condition:

    if batch_idx == self.len_epoch:
        break

is met. Another issue is that the learning rate scheduler is stepped only once, after the epoch:

    if self.do_lr_scheduling:
        self.lr_scheduler.step()

but for iteration-based training that should happen at every step. Shouldn't lines 123-124 go inside the for-loop?
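
To make the question concrete, here is a rough sketch of the behaviour I would expect from iteration-based training. It is not the template's code: the body of inf_loop is my assumption about what such a helper does, and train_iterations, val_steps, train_step, validate and lr_step are hypothetical names used only for illustration. A plain list stands in for the data loader so the snippet runs without a model.

```python
from itertools import repeat


def inf_loop(data_loader):
    # Assumed behaviour: restart the loader whenever it runs out,
    # so iteration never ends on its own.
    for loader in repeat(data_loader):
        yield from loader


def train_iterations(data_loader, len_epoch, val_steps,
                     train_step, validate, lr_step=None):
    """Run len_epoch training steps, validating every val_steps steps.

    All names are hypothetical; train_step, validate and lr_step are
    callables supplied by the caller.
    """
    val_logs = []
    for batch_idx, batch in enumerate(inf_loop(data_loader)):
        if batch_idx == len_epoch:
            break  # same condition as the one quoted above
        train_step(batch)
        if lr_step is not None:
            lr_step()  # scheduler stepped once per iteration
        step = batch_idx + 1
        if step % val_steps == 0:
            # validation happens inside the loop, not only at the end
            val_logs.append((step, validate()))
    return val_logs


# Toy usage with stand-in callables (no real model or optimizer).
seen = []
lr_calls = []
logs = train_iterations(
    data_loader=[1, 2, 3],
    len_epoch=10,
    val_steps=4,
    train_step=seen.append,
    validate=lambda: len(seen),
    lr_step=lambda: lr_calls.append(1),
)
```

With these toy values the loop runs 10 training steps over the repeated 3-item loader, steps the scheduler stand-in 10 times, and validates after steps 4 and 8.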
