ColumnReference
The ColumnReference type is a special Callable parameter type, used to reference a column of the Callable’s target DataFrame.
Definition
- class ColumnReference
Every Callable receives data from a DataFrame aggregated from the outputs of the Callable’s immediate parents.
The DataFrame is known as the Callable’s target DataFrame.
This DataFrame is headered (meaning its columns have names), and each column name must be in the format callable_id.output_name.
As such, the ColumnReference type is used to reference a column of the DataFrame by its name.
Note
Most of the time, the Callable’s output column name is configurable through its output_name parameter.
However, some Callables have fixed output names. This is especially common for Callables that add more than one column to the output DataFrame.
Consult the Output section of a Callable’s documentation to find the output_name of each of its output columns.
Formally, this type is an alias for a Callable parameter of str type with a specific format: $<str>.
- The $ character is used to distinguish a column reference from a regular string.
- The <str> part must be a valid column name in the Callable’s target DataFrame.
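The $<str> format can be illustrated with a minimal sketch. The helper name parse_column_reference is hypothetical and not part of the documented API; it only assumes that a reference is a str beginning with $ followed by a non-empty column name.

```python
def parse_column_reference(value: str) -> str:
    """Return the column name from a `$<str>` reference.

    Hypothetical helper for illustration; not part of the documented API.
    """
    # The leading "$" is what marks the string as a column reference.
    if not value.startswith("$"):
        raise ValueError(f"not a column reference (missing '$'): {value!r}")
    column_name = value[1:]
    # The part after "$" must name a column, so it cannot be empty.
    if not column_name:
        raise ValueError("column reference has an empty column name")
    return column_name


print(parse_column_reference("$get_all_students.student.student_id"))
```

This sketch checks only the shape of the string; whether the named column exists in the target DataFrame is a separate check.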
Note
It is the Callable’s responsibility to validate the column reference. The pipeline runner does not check if the column referenced actually exists in the Callable’s target DataFrame.
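Because the check is left to the Callable, a Callable author might validate the reference along these lines. This is a sketch under assumptions: resolve_column is a hypothetical helper, and the target DataFrame’s header is modeled as a plain list of column names rather than any particular DataFrame library.

```python
from typing import Sequence


def resolve_column(reference: str, columns: Sequence[str]) -> str:
    """Return the referenced column name if it exists in `columns`.

    Hypothetical helper; `columns` stands in for the header of the
    Callable's target DataFrame.
    """
    if not reference.startswith("$"):
        raise ValueError(f"not a column reference: {reference!r}")
    name = reference[1:]
    # The pipeline runner does not perform this check, so the Callable does.
    if name not in columns:
        raise KeyError(f"column {name!r} not found in target DataFrame")
    return name


# Assumed header of a target DataFrame with a single parent output.
header = ["get_all_students.student.student_id"]
print(resolve_column("$get_all_students.student.student_id", header))
```

Failing early with a clear error keeps a mistyped reference from surfacing later as a harder-to-trace failure inside the Callable.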
Example
Consider a simple Pipeline of 2 Callables connected sequentially:
- Query Callable. Its ID is get_all_students, and it selects the student.student_id field. (See the documentation for the Query Callable for more details on its output format.)
- Max Callable. It has a parameter column of type ColumnReference.
A valid value for Max’s column parameter would then be $get_all_students.student.student_id.
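The example can be traced with a small sketch. It assumes that Max returns the largest value of the referenced column, models the target DataFrame as a plain dict from column name to values, and uses made-up student IDs; none of this reflects the actual implementation of the Max Callable.

```python
# Target DataFrame of the Max Callable, modeled as a dict (assumption).
# The column name follows callable_id.output_name, where the Query
# Callable's output name is taken to be "student.student_id".
target = {"get_all_students.student.student_id": [3, 11, 7]}  # made-up values

# Value supplied for Max's `column` parameter.
column = "$get_all_students.student.student_id"

# Strip the leading "$" to obtain the column name, then look it up.
column_name = column[1:]
result = max(target[column_name])
print(result)  # 11
```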