Models

class luminoth.models.base.BaseNetwork(config, name='base_network')

Convolutional Neural Network used for image classification, whose architecture can be any of the VALID_ARCHITECTURES.

This class wraps the tf.slim implementations of these models, with some helpful additions.

_build(inputs, is_training=False)

Add elements to the Graph, computing output Tensors from input Tensors.

Subclasses must implement this method, which will be wrapped in a Template.

Parameters:
  • *args – Input Tensors.
  • **kwargs – Additional Python flags controlling connection.
Returns:

output Tensor(s).

_get_base_network_vars()

Returns a list of all the base network’s variables.

_normalize(inputs)

Normalize the inputs to the range -1.0 to 1.0.

Parameters: inputs – A Tensor of images we want to normalize. Its shape is (1, height, width, num_channels).
Returns: A Tensor of images normalized between -1 and 1. Its shape is the same as the input.
Return type: outputs
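
A minimal sketch of this kind of scaling, assuming the input pixel values lie in the range 0 to 255 (an assumption; the input range is not stated above). It works on plain nested lists rather than TensorFlow tensors, and the name normalize_pixels is hypothetical:

```python
def normalize_pixels(image, max_value=255.0):
    """Scale a (height, width, num_channels) nested list to the range [-1, 1].

    The 0..max_value input range is an assumption made for this sketch.
    """
    return [
        [[(value / max_value - 0.5) * 2.0 for value in pixel] for pixel in row]
        for row in image
    ]


# One row with two RGB pixels: black and white.
image = [[[0, 0, 0], [255, 255, 255]]]
normalized = normalize_pixels(image)
```

The output keeps the nesting of the input, which mirrors the "same shape as the input" guarantee described above.
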
_subtract_channels(inputs, means=[123.68, 116.78, 103.94])

Subtract channels from images.

It is common for CNNs to subtract from each channel its mean over all images. For RGB images, we first calculate the mean of each channel (red, green, blue) and then subtract those values during both training and inference.

Parameters:
  • inputs – A Tensor of images we want to normalize. Its shape is (1, height, width, num_channels).
  • means – A Tensor of shape (num_channels,) with the means to be subtracted from each channel of the inputs.
Returns:

A Tensor of images normalized with the means.

Its shape is the same as the input.

Return type:

outputs
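
As a rough illustration of per-channel mean subtraction, the sketch below uses plain nested lists instead of TensorFlow tensors. The default means are copied from the signature above; the function name subtract_channel_means is hypothetical:

```python
def subtract_channel_means(image, means=(123.68, 116.78, 103.94)):
    """Subtract a per-channel mean from every pixel of a (H, W, C) nested list."""
    return [
        [[value - mean for value, mean in zip(pixel, means)] for pixel in row]
        for row in image
    ]


# One row with two RGB pixels; the first pixel equals the channel means.
image = [[[123.68, 116.78, 103.94], [223.68, 16.78, 103.94]]]
centered = subtract_channel_means(image)
```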

get_base_network_checkpoint_vars()

Returns the vars which the base network checkpoint will load into.

We return a dict which maps a variable name to a variable object. This is needed because the base network may be created inside a particular scope, which the checkpoint may not contain. Therefore we must map each variable to its unscoped name in order to be able to find them in the checkpoint file.
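
The name mapping can be sketched as follows, assuming the enclosing scope appears as a leading "scope/" prefix on each variable name. The variable names and the helper unscoped_var_map are hypothetical, and plain strings stand in for variable objects:

```python
def unscoped_var_map(variables, scope):
    """Map each variable's name, with the leading scope removed, to the variable.

    `variables` maps scoped names to variable objects (plain strings here).
    """
    prefix = scope + '/'
    mapping = {}
    for name, var in variables.items():
        # Drop the enclosing scope so the key matches the checkpoint's name.
        key = name[len(prefix):] if name.startswith(prefix) else name
        mapping[key] = var
    return mapping


scoped = {
    'fasterrcnn/vgg_16/conv1/conv1_1/weights': 'var_a',
    'fasterrcnn/vgg_16/conv1/conv1_1/biases': 'var_b',
}
checkpoint_map = unscoped_var_map(scoped, scope='fasterrcnn')
```

A dict of this form is the kind of input that a checkpoint loader keyed by variable name can consume.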

get_trainable_vars()

Returns a list of the variables that are trainable.

If a value for fine_tune_from is specified in the config, only the variables starting from the first that contains this string in its name will be trainable. For example, specifying vgg_16/fc6 for a VGG16 will set only the variables in the fully connected layers to be trainable. If fine_tune_from is None, then all the variables will be trainable.

Returns: a tuple of tf.Variable.
Return type: trainable_variables
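
The fine_tune_from selection rule can be sketched on a list of variable names. This is an illustration only: the names are hypothetical, and raising ValueError when nothing matches is an assumption of this sketch, not documented behavior:

```python
def select_trainable(var_names, fine_tune_from=None):
    """Return the names from the first one containing `fine_tune_from` onward."""
    if fine_tune_from is None:
        # No restriction: every variable is trainable.
        return list(var_names)
    for index, name in enumerate(var_names):
        if fine_tune_from in name:
            return list(var_names[index:])
    # Assumption for this sketch: an unmatched string is treated as an error.
    raise ValueError('No variable name contains {!r}'.format(fine_tune_from))


names = [
    'vgg_16/conv5/conv5_3/weights',
    'vgg_16/fc6/weights',
    'vgg_16/fc7/weights',
]
trainable = select_trainable(names, fine_tune_from='vgg_16/fc6')
```

Note that the rule depends on the order of the variables: everything after the first match is kept, whether or not its name contains the string.
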
class luminoth.models.base.TruncatedBaseNetwork(config, name='truncated_base_network', **kwargs)

Feature extractor for images using a regular CNN.

By using the notion of an “endpoint”, we truncate a classification CNN at a certain layer output, and return this partial feature map to be used as a good image representation for other ML tasks.

_build(inputs, is_training=False)

Parameters: inputs – A Tensor of shape (batch_size, height, width, channels).
Returns: A Tensor of shape (batch_size, feature_map_height, feature_map_width, depth). The resulting dimensions depend on the CNN architecture, the endpoint used, and the dimensions of the input images.
Return type: feature_map

_get_endpoint(endpoints)

Returns the endpoint tensor from the list of possible endpoints.

Since we already have a dictionary with variable names we should be able to get the desired tensor directly. Unfortunately the variable names change with scope and the scope changes between TensorFlow versions. We opted to just select the tensor for which the variable name ends with the endpoint name we want (it should be just one).

Parameters: endpoints – a dictionary with {variable_name: tensor}.
Returns: a tensor.
Return type: endpoint_value
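
A minimal sketch of the suffix-matching approach described above. The endpoint names are hypothetical, strings stand in for tensors, and raising an error unless exactly one key matches is an assumption of this sketch:

```python
def get_endpoint(endpoints, endpoint_name):
    """Return the value whose key ends with `endpoint_name`."""
    matches = [
        tensor for name, tensor in endpoints.items()
        if name.endswith(endpoint_name)
    ]
    # The text expects a single match; anything else is treated as an error here.
    if len(matches) != 1:
        raise ValueError(
            'Expected exactly one match, found {}'.format(len(matches))
        )
    return matches[0]


endpoints = {
    'truncated_base_network/vgg_16/conv4/conv4_3': 'tensor_conv4_3',
    'truncated_base_network/vgg_16/conv5/conv5_3': 'tensor_conv5_3',
}
feature_map = get_endpoint(endpoints, 'conv5/conv5_3')
```

Matching on the suffix makes the lookup independent of whatever scope prefix was prepended to the names.
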
get_trainable_vars()

Returns a list of the variables that are trainable.

Returns: a tuple of tf.Variable.
Return type: trainable_variables

class luminoth.models.fasterrcnn.FasterRCNN(config, name='fasterrcnn')

Faster RCNN Network module

Builds the Faster RCNN network architecture from different submodules and calculates the total loss of the model from the losses of each submodule.

It is also responsible for building the anchor reference, which is used in the graph to generate the dynamic anchors.

_build(image, gt_boxes=None, is_training=False)

Returns bounding boxes and classification probabilities.

Parameters:
  • image – A tensor with the image. Its shape should be (height, width, 3).
  • gt_boxes – A tensor with all the ground truth boxes of that image. Its shape should be (num_gt_boxes, 5), where for each gt box we have (x1, y1, x2, y2, label), in that order.
  • is_training – A boolean indicating whether or not the model is used for training.
Returns:

classification_prob: A tensor with the softmax probability for each of the bounding boxes found in the image. Its shape should be (num_bboxes, num_categories + 1).

classification_bbox: A tensor with the bounding boxes found. Its shape should be (num_bboxes, 4). For each of the bboxes we have (x1, y1, x2, y2).

_generate_anchors(feature_map_shape)

Generate anchors for an image.

Using the feature map (the output of the pretrained network for an image) and the anchor_reference generated from the anchor config values, we generate a list of anchors.

Anchors are just fixed bounding boxes of different ratios and sizes that are uniformly generated throughout the image.

Parameters: feature_map_shape – Shape of the convolutional feature map used as input for the RPN. Should be (batch, height, width, depth).
Returns: A flattened Tensor with all the anchors, of shape (num_anchors_per_points * feature_width * feature_height, 4), using the (x1, y1, x2, y2) convention.
Return type: all_anchors
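
The idea of tiling a small set of reference anchors over the feature map can be sketched in plain Python. The stride parameter, the reference boxes, and the ordering of the flattened output are all assumptions of this sketch, and the name generate_anchors is hypothetical:

```python
def generate_anchors(feature_map_shape, anchor_reference, stride):
    """Shift every reference anchor to each feature map position.

    feature_map_shape: (batch, height, width, depth).
    anchor_reference: list of (x1, y1, x2, y2) boxes centered on the origin.
    stride: image pixels per feature map cell (an assumed value).
    """
    _, height, width, _ = feature_map_shape
    anchors = []
    for row in range(height):
        for col in range(width):
            shift_x, shift_y = col * stride, row * stride
            for x1, y1, x2, y2 in anchor_reference:
                anchors.append(
                    (x1 + shift_x, y1 + shift_y, x2 + shift_x, y2 + shift_y)
                )
    return anchors


# Two reference anchors tiled over a 2 x 3 feature map.
reference = [(-8, -8, 8, 8), (-16, -8, 16, 8)]
all_anchors = generate_anchors((1, 2, 3, 512), reference, stride=16)
```

The number of anchors produced is the number of reference anchors times the feature map height and width, matching the first dimension of the shape given above.
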
get_trainable_vars()

Get trainable vars included in the module.

loss(prediction_dict, return_all=False)

Compute the joint training loss for Faster RCNN.

Parameters: prediction_dict – The output dictionary of the _build method, from which we use two main keys:
  • rpn_prediction – A dictionary with the output Tensors from the RPN.
  • classification_prediction – A dictionary with the output Tensors from the RCNN.
Returns: If return_all is False, a tensor for the total loss. If True, a dict with all the internal losses (RPN’s, RCNN’s, regularization and total loss).
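
The effect of return_all can be illustrated with a small sketch that sums component losses. Floats stand in for tensors, and the function name and all dictionary keys are hypothetical rather than the names used by the library:

```python
def combine_losses(rpn_losses, rcnn_losses, regularization_loss,
                   return_all=False):
    """Sum component losses; return the total or a dict of all losses."""
    all_losses = {}
    all_losses.update(rpn_losses)
    all_losses.update(rcnn_losses)
    all_losses['regularization_loss'] = regularization_loss
    # The sum is evaluated before the 'total_loss' key is added,
    # so the total is not counted twice.
    all_losses['total_loss'] = sum(all_losses.values())
    return all_losses if return_all else all_losses['total_loss']


rpn = {'rpn_cls_loss': 0.5, 'rpn_reg_loss': 0.25}
rcnn = {'rcnn_cls_loss': 1.0, 'rcnn_reg_loss': 0.125}
total = combine_losses(rpn, rcnn, regularization_loss=0.125)
details = combine_losses(rpn, rcnn, regularization_loss=0.125, return_all=True)
```
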
summary

Generate merged summary of all the sub-summaries used inside the Faster R-CNN network.

class luminoth.models.ssd.SSD(config, name='ssd')

SSD: Single Shot MultiBox Detector

_build(image, gt_boxes=None, is_training=False)

Returns bounding boxes and classification probabilities.

Parameters:
  • image – A tensor with the image. Its shape should be (height, width, 3).
  • gt_boxes – A tensor with all the ground truth boxes of that image. Its shape should be (num_gt_boxes, 5), where for each gt box we have (x1, y1, x2, y2, label), in that order.
  • is_training – A boolean indicating whether or not the model is used for training.
Returns:

A dictionary with the following keys:

predictions: proposal_prediction: A dictionary with:
  • proposals – The proposals of the network after applying some filters, such as negative-area removal and NMS.
  • proposals_label – A tensor with the label for each proposal.
  • proposals_label_prob – A tensor with the softmax probability for the label of each proposal.
bbox_offsets: A tensor with the predicted bbox_offsets.
class_scores: A tensor with the predicted class scores.
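
The two filters mentioned for proposals, negative-area removal and NMS, can be sketched with a generic greedy non-maximum suppression in plain Python. This is a textbook version for illustration, not the library's implementation; the function names and the 0.5 threshold are assumptions:

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


def filter_proposals(boxes, scores, iou_threshold=0.5):
    """Drop boxes with non-positive area, then apply greedy NMS.

    Returns the indices of the kept boxes, highest score first.
    """
    valid = [
        i for i, (x1, y1, x2, y2) in enumerate(boxes)
        if x2 > x1 and y2 > y1
    ]
    order = sorted(valid, key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a kept one.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (5, 5, 2, 2)]
scores = [0.9, 0.8, 0.7, 0.95]
kept = filter_proposals(boxes, scores)
```

In this example the last box is discarded for its inverted coordinates despite having the highest score, and the second box is suppressed because it overlaps heavily with the first.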

get_trainable_vars()

Get trainable vars included in the module.

loss(prediction_dict, return_all=False)

Compute the loss for SSD.

Parameters: prediction_dict – The output dictionary of the _build method, from which we use the following main keys:
  • cls_pred – A dictionary with the class predictions.
  • loc_pred – A dictionary with the localization predictions.
  • target – A dictionary with the targets for both classes and localizations.
Returns: A tensor for the total loss.
summary

Generate merged summary of all the sub-summaries used inside the SSD network.