To train your own model, you need two things:

• A dataset ready to be consumed by Luminoth (see Adapting a dataset).
• A configuration file for the run.

We’ll start by covering the configuration file, then proceed to the training itself, both locally and in the cloud.

## Configuration

Training orchestration, including the model to use, the dataset location, and the training schedule, is specified in a YAML config file. Luminoth consumes this file and merges it with the default configuration to start the training session.

You can see a minimal config file example in sample_config.yml. This file illustrates the entries you'll most likely need to modify, which are:

• train.run_name: The run name for the training session, used to identify it.
• train.job_dir: Directory in which both model checkpoints and summaries (for TensorBoard consumption) will be saved. The actual files will be stored under {job_dir}/{run_name}, so pointing TensorBoard at {job_dir} will let you see all your runs at once.
• dataset.dir: Directory from which to read the TFRecord files.
• model.type: Model to use for object detection (e.g. fasterrcnn, ssd).
• network.num_classes: Number of classes to predict.
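Putting those entries together, a minimal config along the lines of sample_config.yml might look like this (the run name, paths, and class count below are illustrative values, not defaults; see sample_config.yml for the authoritative layout, in which the network section is nested under model):

```yaml
train:
  # Identifies this session; files go to {job_dir}/{run_name}.
  run_name: my-run
  job_dir: jobs/
dataset:
  # Directory containing the TFRecord files.
  dir: datasets/my-dataset/tf/
model:
  # Detector to use, e.g. fasterrcnn or ssd.
  type: fasterrcnn
  network:
    # Number of classes in your dataset.
    num_classes: 20
```

Any key you leave out is taken from the default configuration, so the file only needs to state what differs from the defaults.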

There are a great many configuration options, mostly related to the model itself. For instance, you can see the full range of options for the Faster R-CNN model, each with a brief description, in its base_config.yml file.

## Training

The model training itself can either be run locally (on the available CPU or GPU) or in Google Cloud's Cloud ML Engine.

### Locally

Assuming you already have both your dataset and the config file ready, you can start a training session by running:

$ lumi train -c my_config.yml


The lumi train CLI tool provides the following training-related options:

• --config/-c: Config file to use. If the flag is repeated, the config files are merged in order, left to right, so that each file overrides keys defined by the files before it.
• --override/-o: Override any configuration setting using dot notation (e.g.: -o model.rpn.proposals.nms_threshold=0.8).
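The merge and override semantics above can be sketched in plain Python; the two functions here are hypothetical illustrations, not Luminoth's actual internals:

```python
def deep_merge(base, override):
    """Recursively merge `override` into `base`; later values win,
    mirroring the left-to-right merge of repeated -c flags."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def apply_dot_override(config, dotted_key, value):
    """Set a nested key given in dot notation, as -o does."""
    *path, last = dotted_key.split(".")
    node = config
    for part in path:
        node = node.setdefault(part, {})
    node[last] = value
    return config


# As if loaded from two -c files, in order:
base = {"train": {"run_name": "base", "learning_rate": 0.001}}
extra = {"train": {"run_name": "my-run"}}

config = deep_merge(base, extra)
apply_dot_override(config, "model.rpn.proposals.nms_threshold", 0.8)

print(config["train"]["run_name"])       # my-run (rightmost file wins)
print(config["train"]["learning_rate"])  # 0.001 (kept from the first file)
```

Note that the merge is per-key, not per-file: a later file only replaces the keys it actually defines, leaving the rest of the earlier configuration intact.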

If you're using a CUDA-based GPU, you can select which GPUs to use by setting the CUDA_VISIBLE_DEVICES environment variable. (See the NVIDIA site for more information.)
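For example, to pin the session to a single GPU (a sketch; which index to use depends on your machine):

```shell
# Make only GPU 0 visible to the training process (indices are zero-based).
# An empty value ("") hides all GPUs and forces CPU-only training.
export CUDA_VISIBLE_DEVICES=0
```

You can also set the variable inline for a single invocation, e.g. `CUDA_VISIBLE_DEVICES=0 lumi train -c my_config.yml`.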

You can run TensorBoard on the job_dir to visualize training, including the loss, evaluation metrics, training speed, and even partial images.