ColumnReference

The ColumnReference type is a special Callable parameter type, used to reference a column of the Callable’s target DataFrame.

Definition

class ColumnReference

Every Callable receives data from a DataFrame aggregated from the outputs of the Callable’s immediate parents. This DataFrame is known as the Callable’s target DataFrame. Its columns are named, and each name must follow the format callable_id.output_name. The ColumnReference type is used to reference a column of the target DataFrame by that name.

Note

Most of the time, the Callable’s output column name is configurable through its output_name parameter. However, some Callables have fixed output names. This is especially common for Callables that add more than one column to the output DataFrame. Consult the Output section of a Callable’s documentation for the output_name of each of its output columns.

Concretely, this type is an alias for a Callable parameter of str type with a specific format: $<str>.

  • The $ character is used to distinguish a column reference from a regular string.

  • The <str> part must be a valid column name in the Callable’s target DataFrame.
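The format rules above can be sketched in a few lines of Python. This is only an illustration: the helper names is_column_reference and column_name_from_reference are hypothetical and are not part of the documented API.

```python
# Hypothetical helpers illustrating the "$<str>" format; not part of the library.
COLUMN_REFERENCE_PREFIX = "$"


def is_column_reference(value: str) -> bool:
    """Return True if value starts with "$" and has a non-empty name after it."""
    return (
        value.startswith(COLUMN_REFERENCE_PREFIX)
        and len(value) > len(COLUMN_REFERENCE_PREFIX)
    )


def column_name_from_reference(value: str) -> str:
    """Strip the leading "$" and return the referenced column name."""
    if not is_column_reference(value):
        raise ValueError(f"not a column reference: {value!r}")
    return value[len(COLUMN_REFERENCE_PREFIX):]
```

With these helpers, a plain string such as "student_id" is treated as a regular string, while "$get_all_students.student.student_id" is treated as a reference to the column named get_all_students.student.student_id.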

Note

It is the Callable’s responsibility to validate the column reference. The pipeline runner does not check if the column referenced actually exists in the Callable’s target DataFrame.
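Because validation is left to the Callable, a custom Callable would typically check the reference against the column names of its target DataFrame before using it. The following is a minimal sketch under that assumption. The function resolve_column_reference is hypothetical, and it takes the column names as a plain iterable of strings so that it does not depend on any particular DataFrame library.

```python
from typing import Iterable


def resolve_column_reference(reference: str, columns: Iterable[str]) -> str:
    """Validate a "$<str>" reference against the available column names.

    Hypothetical helper: returns the column name without the "$" prefix,
    or raises if the reference is malformed or the column does not exist.
    """
    if not reference.startswith("$") or len(reference) == 1:
        raise ValueError(f"expected a value of the form $<str>, got {reference!r}")

    name = reference[1:]
    available = list(columns)
    if name not in available:
        raise KeyError(f"column {name!r} not found; available columns: {available}")
    return name
```

With a pandas DataFrame, the column names could be passed as df.columns.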

Example

Consider a simple Pipeline of two Callables connected sequentially:

  1. Query Callable. Its ID is get_all_students, and it selects the student.student_id field. (See the documentation for the Query Callable for more details on its output format.)

  2. Max Callable. It has a parameter column of type ColumnReference.

A valid value for Max’s column parameter would then be $get_all_students.student.student_id.
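The example can be traced by hand with a small stand-in for the target DataFrame. The sketch below uses a plain dict of lists in place of a real DataFrame, invents the student IDs for illustration, and assumes that Max simply takes the maximum of the referenced column. None of this reflects the actual implementation of the Query or Max Callables.

```python
# Column names follow the callable_id.output_name format.
callable_id = "get_all_students"
output_name = "student.student_id"  # assumed output name of the Query Callable
column_name = f"{callable_id}.{output_name}"

# Stand-in for Max's target DataFrame (illustrative data only).
target = {column_name: [101, 205, 150]}

# The value supplied for Max's column parameter.
column_param = "$get_all_students.student.student_id"

# Strip the "$" prefix to obtain the column name, then look the column up.
referenced = column_param[1:]
if referenced not in target:
    raise KeyError(f"column {referenced!r} not found in target DataFrame")

max_value = max(target[referenced])
```

Here referenced equals column_name, so the lookup succeeds and max_value is the largest ID in the illustrative data.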