Add a Feature Type
1. Define the new feature type¶
Feature types are defined as constants in ludwig/constants.py
.
Add the name of the new feature type as a constant:
BINARY = "binary"
CATEGORY = "category"
...
NEW_FEATURE_TYPE = "new_feature_type_name"
2. Add feature classes in a new python module¶
Source code for feature classes lives under ludwig/features/
. Add the implementation of the new feature into a new
python module ludwig/feature/<new_name>_feature.py
.
Input and output feature classes are defined in the same file, for example CategoryInputFeature
and
CategoryOutputFeature
are defined in ludwig/features/category_feature.py
.
Input features inherit from ludwig.features.base_feature.InputFeature
and corresponding mixin feature classes:
class CategoryInputFeature(CategoryFeatureMixin, InputFeature):
Similarly, output features inherit from the ludwig.features.base_feature.OutputFeature
and corresponding mixin feature
classes:
class CategoryOutputFeature(CategoryFeatureMixin, OutputFeature):
Feature base classes (InputFeature
, OutputFeature
) inherit from LudwigModule
which is itself a
torch.nn.Module, so all the usual concerns of
developing Torch modules apply.
Mixin classes provide shared preprocessing/postprocessing state and logic, such as the mapping from categories to indices, which are shared by input and output feature implementations. Mixin classes are not torch modules, and do not need to provide a forward method.
class CategoryFeatureMixin(BaseFeatureMixin):
3. Implement required methods¶
Input features¶
Constructor¶
Feature parameters are provided in a dictionary of key-value pairs as an argument to the constructor. The feature
dictionary should usually be passed to the superclass constructor before initialization:
def __init__(self, feature: [str, Any], encoder_obj=None):
super().__init__(feature)
# Initialize any modules, layers, or variable state
Inputs
- feature: (dict) contains all feature config parameters.
- encoder_obj: (Encoder, default:
None
) is an encoder object of the supported type (category encoder, binary encoder, etc.). Input features typically create their own encoder,encoder_obj
is only specified when two input features share the same encoder.
forward¶
All input features must implement the forward
method with the following signature:
def forward(self, inputs: torch.Tensor) -> torch.Tensor:
# perform forward pass
# ...
# inputs_encoded = result of encoder forward pass
return inputs_encoded
Inputs
- inputs (torch.Tensor): The input tensor.
Return
- (torch.Tensor): Input data encoded by the input feature's encoder.
input_shape¶
@property
def input_shape(self) -> torch.Size:
Return
- (torch.Size): The fully-specified size of the feature's expected input, without batch dimension.
Output features¶
Constructor¶
def __init__(self, feature: Dict[str, Any], output_features: Dict[str, OutputFeature]):
super().__init__(feature, output_features)
self.overwrite_defaults(feature)
# Initialize any decoder modules, layers, metrics, loss objects, etc...
Inputs
- feature (dict): contains all feature parameters.
- output_features (dict[Str, OutputFeature]): Dictionary of other output features, only used if this output feature depends on other outputs.
logits¶
Computes feature logits from the combiner output (and any features this feature depends on).
def logits(self, inputs: Dict[str, torch.Tensor], **kwargs):
hidden = inputs[HIDDEN]
# logits = results of decoder operation
return logits
Inputs
- inputs (dict): input dictionary which contains the
HIDDEN
key, whose value is the output of the combiner. Will contain other input keys if this feature depends on other output features.
Return
- (torch.Tensor): feature logits.
create_predict_module¶
Creates and returns a torch.nn.Module
that converts raw model outputs (logits) to predictions.
This module is required for exporting models to Torchscript.
def create_predict_module(self) -> PredictModule:
Return
- (PredictModule): A module whose forward method convert feature logits to predictions.
output_shape¶
@property
def output_shape(self) -> torch.Size:
Return
- (torch.Size): The fully-specified size of the feature's output, without batch dimension.
Feature Mixins¶
If your new feature can re-use the preprocessing and postprocessing logic of an existing feature type, you do not need
to implement a new mixin class. If your new feature does require unique pre or post-processing, add a new subclass of
ludwig.features.base_feature.BaseFeatureMixin
. Implement all abstract methods of BaseFeatureMixin
.
4. Add the new feature classes to the corresponding feature registries¶
Input and output feature registries are defined in ludwig/features/feature_registries.py
. Import your new feature
classes, and add them to the appropriate registry dictionaries:
base_type_registry = {
CATEGORY: CategoryFeatureMixin,
...
}
input_type_registry = {
CATEGORY: CategoryInputFeature,
...
}
output_type_registry = {
CATEGORY: CategoryOutputFeature,
...
}