Models¶

class
luminoth.models.base.
BaseNetwork
(config, name='base_network')[source]¶ Convolutional Neural Network used for image classification, whose architecture can be any of the VALID_ARCHITECTURES.
This class wraps the tf.slim implementations of these models, with some helpful additions.

_build
(inputs, is_training=False)[source]¶ Add elements to the Graph, computing output Tensors from input Tensors.
Subclasses must implement this method, which will be wrapped in a Template.
Parameters:  *args – Input Tensors.
 **kwargs – Additional Python flags controlling connection.
Returns: output Tensor(s).

_normalize
(inputs)[source]¶ Normalize between -1.0 and 1.0.
Parameters: inputs – A Tensor of images we want to normalize. Its shape is (1, height, width, num_channels).
Returns: A Tensor of images normalized between -1 and 1. Its shape is the same as the input.
Return type: outputs
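As a rough illustration of this kind of normalization, the sketch below maps pixel values to the range [-1.0, 1.0] in plain Python. It assumes the inputs are 8-bit style values in [0, 255], which the text above does not state, and the function name normalize is hypothetical; the real method operates on TensorFlow tensors.

```python
def normalize(pixels):
    """Map pixel values from [0, 255] to [-1.0, 1.0].

    Assumption: inputs lie in [0, 255]. This is a plain-Python sketch,
    not the TensorFlow implementation used by the class.
    """
    # Scale to [0, 1], center around zero, then stretch to [-1, 1].
    return [(p / 255.0 - 0.5) * 2.0 for p in pixels]


# A black, a mid-gray and a white pixel value.
normalized = normalize([0, 127.5, 255])
print(normalized)  # [-1.0, 0.0, 1.0]
```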

_subtract_channels
(inputs, means=[123.68, 116.78, 103.94])[source]¶ Subtract channels from images.
It is common for CNNs to subtract the per-channel mean of all images. For RGB images, we first calculate the mean of each channel (red, green, blue) and then subtract those values, both for training and for inference.
Parameters:  inputs – A Tensor of images we want to normalize. Its shape is (1, height, width, num_channels).
 means – A Tensor of shape (num_channels,) with the means to be subtracted from each channel of the inputs.
Returns: A Tensor of images normalized with the means. Its shape is the same as the input.
Return type: outputs
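The channel subtraction described above can be sketched in plain Python as follows. The default means come from the signature; the helper name subtract_channels and the nested-list image layout (height, width, num_channels), with the batch dimension omitted for brevity, are assumptions made for this example only.

```python
# Default per-channel means (R, G, B) taken from the method signature.
DEFAULT_MEANS = [123.68, 116.78, 103.94]


def subtract_channels(image, means=DEFAULT_MEANS):
    """Subtract one mean per channel from every pixel.

    `image` is a nested list of shape (height, width, num_channels).
    This is an illustrative sketch; the real method works on Tensors.
    """
    return [
        [[value - mean for value, mean in zip(pixel, means)] for pixel in row]
        for row in image
    ]


# A 1x2 image: one pixel equal to the means, one pure white pixel.
image = [[[123.68, 116.78, 103.94], [255.0, 255.0, 255.0]]]
centered = subtract_channels(image)
print(centered[0][0])  # [0.0, 0.0, 0.0]
```

The output keeps the shape of the input, as stated in the Returns field.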

get_base_network_checkpoint_vars
()[source]¶ Returns the vars which the base network checkpoint will load into.
We return a dict which maps a variable name to a variable object. This is needed because the base network may be created inside a particular scope, which the checkpoint may not contain. Therefore we must map each variable to its unscoped name in order to be able to find them in the checkpoint file.
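A minimal sketch of the name mapping idea, using plain strings in place of variable objects. The scope and variable names below are hypothetical, and the helper names are not part of Luminoth; the sketch only shows how a scope prefix could be stripped so that names match those stored in a checkpoint.

```python
def unscoped_name(scoped_name, scope):
    """Strip a leading 'scope/' prefix from a variable name, if present."""
    prefix = scope + "/"
    if scoped_name.startswith(prefix):
        return scoped_name[len(prefix):]
    return scoped_name


def checkpoint_var_map(variables, scope):
    """Map unscoped names to variable objects.

    `variables` maps scoped names to variable objects (strings here,
    standing in for tf.Variable instances).
    """
    return {unscoped_name(name, scope): var for name, var in variables.items()}


# Hypothetical scoped variables, as they might appear inside a larger model.
scoped = {
    "fasterrcnn/base_network/vgg_16/conv1/conv1_1/weights": "weights_var",
    "fasterrcnn/base_network/vgg_16/conv1/conv1_1/biases": "biases_var",
}
var_map = checkpoint_var_map(scoped, scope="fasterrcnn/base_network")
print(sorted(var_map))
```

A dict of this shape is what a checkpoint loader would need in order to look variables up by the names stored in the checkpoint file.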

get_trainable_vars
()[source]¶ Returns a list of the variables that are trainable.
If a value for fine_tune_from is specified in the config, only the variables starting from the first that contains this string in its name will be trainable. For example, specifying vgg_16/fc6 for a VGG16 will set only the variables in the fully connected layers to be trainable. If fine_tune_from is None, then all the variables will be trainable.
Returns: a tuple of tf.Variable.
Return type: trainable_variables
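The selection rule can be sketched on a list of variable names. Only vgg_16/fc6 comes from the text above; the other names are illustrative, the function name select_trainable is hypothetical, and raising an error when nothing matches is an assumption of this sketch rather than documented behavior.

```python
def select_trainable(var_names, fine_tune_from=None):
    """Return the names considered trainable.

    With fine_tune_from=None everything is trainable. Otherwise, keep the
    variables from the first one whose name contains the string onwards.
    """
    if fine_tune_from is None:
        return list(var_names)
    for index, name in enumerate(var_names):
        if fine_tune_from in name:
            return list(var_names[index:])
    # Assumption: treat a string that matches nothing as a config error.
    raise ValueError("No variable name contains %r" % fine_tune_from)


# Illustrative variable names, in creation order.
names = [
    "vgg_16/conv5/conv5_3/weights",
    "vgg_16/fc6/weights",
    "vgg_16/fc6/biases",
    "vgg_16/fc7/weights",
]
print(select_trainable(names, fine_tune_from="vgg_16/fc6"))
```

Note that the rule depends on the order of the variables: everything after the first match is kept, whether or not its name contains the string.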


class
luminoth.models.base.
TruncatedBaseNetwork
(config, name='truncated_base_network', **kwargs)[source]¶ Feature extractor for images using a regular CNN.
By using the notion of an “endpoint”, we truncate a classification CNN at a certain layer output, and return this partial feature map to be used as a good image representation for other ML tasks.

_build
(inputs, is_training=False)[source]¶
Parameters: inputs – A Tensor of shape (batch_size, height, width, channels).
Returns: A Tensor of shape (batch_size, feature_map_height, feature_map_width, depth). The resulting dimensions depend on the CNN architecture, the endpoint used, and the dimensions of the input images.
Return type: feature_map

_get_endpoint
(endpoints)[source]¶ Returns the endpoint tensor from the list of possible endpoints.
Since we already have a dictionary with variable names we should be able to get the desired tensor directly. Unfortunately the variable names change with scope and the scope changes between TensorFlow versions. We opted to just select the tensor for which the variable name ends with the endpoint name we want (it should be just one).
Parameters: endpoints – a dictionary with {variable_name: tensor}.
Returns: a tensor.
Return type: endpoint_value
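The suffix-matching approach described above might look like the following sketch. The endpoint names are hypothetical, strings stand in for tensors, and the explicit check that exactly one entry matches is an assumption added here to make the "it should be just one" expectation visible.

```python
def get_endpoint(endpoints, endpoint_name):
    """Return the value whose key ends with `endpoint_name`.

    `endpoints` maps (possibly scoped) names to tensors.
    """
    matches = [
        tensor for name, tensor in endpoints.items()
        if name.endswith(endpoint_name)
    ]
    # Assumption: fail loudly unless exactly one endpoint matches.
    if len(matches) != 1:
        raise ValueError(
            "Expected one endpoint ending with %r, found %d"
            % (endpoint_name, len(matches))
        )
    return matches[0]


# Hypothetical endpoints dictionary; the scope prefix may vary.
endpoints = {
    "some_scope/vgg_16/conv5/conv5_1": "tensor_a",
    "some_scope/vgg_16/conv5/conv5_3": "tensor_b",
}
print(get_endpoint(endpoints, "conv5/conv5_3"))  # tensor_b
```

Matching on the suffix makes the lookup independent of whatever scope prefix was added to the names.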


class
luminoth.models.fasterrcnn.
FasterRCNN
(config, name='fasterrcnn')[source]¶ Faster RCNN Network module
Builds the Faster RCNN network architecture using different submodules. Calculates the total loss of the model based on the losses of each of the submodules.
It is also responsible for building the anchor reference, which is used in the graph to generate the dynamic anchors.

_build
(image, gt_boxes=None, is_training=False)[source]¶ Returns bounding boxes and classification probabilities.
Parameters:  image – A tensor with the image. Its shape should be (height, width, 3).
 gt_boxes – A tensor with all the ground truth boxes of that image. Its shape should be (num_gt_boxes, 5), where each ground truth box is given as (x1, y1, x2, y2, label), in that order.
 is_training – A boolean indicating whether or not the model is being used for training.
Returns:  classification_prob: A tensor with the softmax probability for each of the bounding boxes found in the image. Its shape should be (num_bboxes, num_categories + 1).
 classification_bbox: A tensor with the bounding boxes found. Its shape should be (num_bboxes, 4). For each of the bboxes we have (x1, y1, x2, y2).

_generate_anchors
(feature_map_shape)[source]¶ Generate anchors for an image.
Using the feature map (the output of the pretrained network for an image) and the anchor_reference generated from the anchor config values, we generate a list of anchors.
Anchors are just fixed bounding boxes of different ratios and sizes that are uniformly generated throughout the image.
Parameters: feature_map_shape – Shape of the convolutional feature map used as input for the RPN. Should be (batch, height, width, depth).
Returns: A flattened Tensor with all the anchors, of shape (num_anchors_per_points * feature_width * feature_height, 4), using the (x1, y1, x2, y2) convention.
Return type: all_anchors
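To make the anchor layout concrete, here is a plain-Python sketch that shifts a small anchor reference across every feature map position. The stride value, the reference boxes, the iteration order and the function name generate_anchors are all assumptions chosen for illustration; the actual method builds the anchors with TensorFlow ops from the anchor config values.

```python
def generate_anchors(feature_map_shape, anchor_reference, stride):
    """Place every reference anchor at every feature map position.

    feature_map_shape: (batch, height, width, depth).
    anchor_reference: list of (x1, y1, x2, y2) boxes centered on the origin.
    stride: assumed pixel distance between neighboring feature map points.
    Returns a flat list of (x1, y1, x2, y2) tuples.
    """
    _, height, width, _ = feature_map_shape
    anchors = []
    for row in range(height):
        for col in range(width):
            shift_x = col * stride
            shift_y = row * stride
            for x1, y1, x2, y2 in anchor_reference:
                anchors.append(
                    (x1 + shift_x, y1 + shift_y, x2 + shift_x, y2 + shift_y)
                )
    return anchors


# Two hypothetical reference anchors: a square one and a wide one.
reference = [(-8, -8, 8, 8), (-16, -4, 16, 4)]
all_anchors = generate_anchors((1, 2, 3, 512), reference, stride=16)
print(len(all_anchors))  # 12 = 2 anchors per point * 3 wide * 2 high
```

The resulting length matches the num_anchors_per_points * feature_width * feature_height count given in the Returns field.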

loss
(prediction_dict, return_all=False)[source]¶ Compute the joint training loss for Faster RCNN.
Parameters: prediction_dict – The output dictionary of the _build method from which we use two different main keys:
 rpn_prediction: A dictionary with the output Tensors from the RPN.
 classification_prediction: A dictionary with the output Tensors from the RCNN.
Returns: If return_all is False, a tensor for the total loss. If True, a dict with all the internal losses (RPN’s, RCNN’s, regularization and total loss).

summary
¶ Generate merged summary of all the subsummaries used inside the Faster RCNN network.


class
luminoth.models.ssd.
SSD
(config, name='ssd')[source]¶ SSD: Single Shot MultiBox Detector

_build
(image, gt_boxes=None, is_training=False)[source]¶ Returns bounding boxes and classification probabilities.
Parameters:  image – A tensor with the image. Its shape should be (height, width, 3).
 gt_boxes – A tensor with all the ground truth boxes of that image. Its shape should be (num_gt_boxes, 5), where each ground truth box is given as (x1, y1, x2, y2, label), in that order.
 is_training – A boolean indicating whether or not the model is being used for training.
Returns: A dictionary with the following keys:
 proposal_prediction: A dictionary with:
  proposals: The proposals of the network after applying some filters, such as negative area, and NMS.
  proposals_label: A tensor with the label for each proposal.
  proposals_label_prob: A tensor with the softmax probability for the label of each proposal.
 bbox_offsets: A tensor with the predicted bbox_offsets.
 class_scores: A tensor with the predicted class scores.
Return type: predictions

loss
(prediction_dict, return_all=False)[source]¶ Compute the loss for SSD.
Parameters: prediction_dict – The output dictionary of the _build method from which we use different main keys:
 cls_pred: A dictionary with the classification predictions.
 loc_pred: A dictionary with the localization predictions.
 target: A dictionary with the targets for both classes and localizations.
Returns: A tensor for the total loss.

summary
¶ Generate merged summary of all the subsummaries used inside the SSD network.
