agents.components.mllm

Module Contents

Classes

MLLM

This component utilizes multi-modal large language models (e.g. Llava) that can be used to process text and image data.

API

class agents.components.mllm.MLLM(*, inputs: List[Union[agents.ros.Topic, agents.ros.FixedInput]], outputs: List[agents.ros.Topic], model_client: agents.clients.model_base.ModelClient, config: Optional[agents.config.MLLMConfig] = None, db_client: Optional[agents.clients.db_base.DBClient] = None, trigger: Union[agents.ros.Topic, List[agents.ros.Topic], float, agents.ros.Event] = 1.0, component_name: str, **kwargs)

Bases: agents.components.llm.LLM

This component utilizes multi-modal large language models (e.g. Llava) that can be used to process text and image data.

Parameters:
  • inputs (list[Topic | FixedInput]) – The input topics or fixed inputs for the MLLM component. This should be a list of Topic objects or FixedInput instances, limited to String and Image types.

  • outputs (list[Topic]) – The output topics for the MLLM component. This should be a list of Topic objects. String, Detections2D and PointsOfInterest2D types are handled automatically.

  • model_client (ModelClient) – The model client for the MLLM component. This should be an instance of ModelClient.

  • config (MLLMConfig) – Optional configuration for the MLLM component. This should be an instance of MLLMConfig. If not provided, defaults to MLLMConfig().

  • trigger (Union[Topic, list[Topic], float, Event]) – The trigger value or topic for the MLLM component. This can be a single Topic object, a list of Topic objects, a float value for a timed component, or an Event. Defaults to 1.0.

  • component_name (str) – The name of the MLLM component. This should be a string and defaults to “mllm_component”.

Example usage:

text0 = Topic(name="text0", msg_type="String")
image0 = Topic(name="image0", msg_type="Image")
text1 = Topic(name="text1", msg_type="String")
config = MLLMConfig()
model = TransformersMLLM(name='idefics')
model_client = ModelClient(model=model)
mllm_component = MLLM(inputs=[text0, image0],
                      outputs=[text1],
                      model_client=model_client,
                      config=config,
                      component_name='mllm_component')
custom_on_configure()

Create model client if provided and initialize model.

set_task(task: Literal["general", "pointing", "affordance", "trajectory", "grounding"]) None

Set a task for the MLLM component. This is useful when using a multimodal LLM that has been trained on specific tasks. This method can be invoked as an action, in response to an event, to change the task at runtime. For an example, check out RoboBrain2.0, available on RoboML.

Parameters:

task – A task, one of the following: “general”, “pointing”, “affordance”, “trajectory”, “grounding”.
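Since the accepted values are fixed by the Literal in the signature, a caller can guard against typos before invoking the method. The helper below is illustrative only (it is not part of the agents API):

```python
from typing import Literal, get_args

# Tasks accepted by set_task, copied from the signature above.
MLLMTask = Literal["general", "pointing", "affordance", "trajectory", "grounding"]

def validate_task(task: str) -> str:
    """Reject values outside the Literal before calling set_task (illustrative)."""
    if task not in get_args(MLLMTask):
        raise ValueError(f"Unknown MLLM task: {task!r}")
    return task

# mllm_component.set_task(validate_task("pointing"))
```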

custom_on_deactivate()

Destroy the model client if it exists.

add_documents(ids: List[str], metadatas: List[Dict], documents: List[str]) None

Add documents to vector DB for Retrieval Augmented Generation (RAG).

Important

Documents can be provided after parsing them with a document parser. Check out the various document parsers available in packages like langchain_community.

Parameters:
  • ids (list[str]) – List of unique string ids for each document

  • metadatas (list[dict]) – List of metadata dicts for each document

  • documents (list[str]) – List of documents which are to be stored in the vector DB

Return type:

None
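As a concrete illustration, the three parallel lists could be assembled from parsed text chunks like this (the helper and its names are illustrative, not part of the agents API):

```python
# Illustrative helper: build the parallel ids/metadatas/documents lists
# that add_documents expects, one entry per text chunk.
def build_rag_inputs(chunks, source: str):
    ids = [f"{source}-{i}" for i in range(len(chunks))]
    metadatas = [{"source": source, "chunk_index": i} for i in range(len(chunks))]
    documents = list(chunks)
    return ids, metadatas, documents

ids, metadatas, documents = build_rag_inputs(
    ["Robots perceive through sensors.", "MLLMs accept text and images."],
    source="robotics_notes",
)
# mllm_component.add_documents(ids=ids, metadatas=metadatas, documents=documents)
```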

set_topic_prompt(input_topic: agents.ros.Topic, template: Union[str, pathlib.Path]) None

Set a prompt template on any input topic of type String.

Parameters:
  • input_topic (Topic) – The input topic on which the prompt template is to be applied

  • template (Union[str, Path]) – Template in the form of a valid jinja2 string or a path to a file containing the jinja2 string.

Return type:

None

Example usage:

llm_component = LLM(inputs=[text0],
                    outputs=[text1],
                    model_client=model_client,
                    config=config,
                    component_name='llama_component')
llm_component.set_topic_prompt(text0, template="Please answer the following: {{ text0 }}")
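The template receives the incoming message under the topic's name. The component uses jinja2 for rendering; the stand-in below only handles simple `{{ name }}` placeholders, purely to illustrate the substitution:

```python
import re

def render(template: str, **values) -> str:
    # Minimal stand-in for jinja2 variable substitution, illustration only.
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(values[m.group(1)]), template)

rendered = render("Please answer the following: {{ text0 }}",
                  text0="What is in front of you?")
```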
set_component_prompt(template: Union[str, pathlib.Path]) None

Set a component-level prompt template which can use multiple input topics.

Parameters:

template (Union[str, Path]) – Template in the form of a valid jinja2 string or a path to a file containing the jinja2 string.

Return type:

None

Example usage:

llm_component = LLM(inputs=[text0],
                    outputs=[text1],
                    model_client=model_client,
                    config=config,
                    component_name='llama_component')
llm_component.set_component_prompt(template="You can see the following items: {{ detections }}. Please answer the following: {{ text0 }}")
set_system_prompt(prompt: str) None

Set a system prompt for the model, which defines the model’s ‘personality’.

Parameters:

prompt – A string or a path to a file containing the string.

Return type:

None

Example usage:

llm_component = LLM(inputs=[text0],
                    outputs=[text1],
                    model_client=model_client,
                    config=config,
                    component_name='llama_component')
llm_component.set_system_prompt(prompt="You are an amazing and funny robot. You answer all questions with short and concise answers.")
register_tool(tool: Callable, tool_description: Dict, send_tool_response_to_model: bool = False) None

Register a tool with the component which can be called by the model. If the send_tool_response_to_model flag is set to True, then the output of the tool is sent back to the model and the model’s final output is sent to the component publishers (i.e. the model “uses” the tool to give a more accurate response). If the flag is set to False, then the output of the tool is sent directly to the publishers of the component.

Parameters:
  • tool (Callable) – An arbitrary function that needs to be called. The model response will describe a call to this function.

  • tool_description (dict) – A dictionary describing the function. This dictionary needs to be made in the format shown here. Also see usage example.

  • send_tool_response_to_model (bool) – Whether the model should be called with the tool response. If set to False, the tool response will be sent to component publishers. If set to True, the response will be sent back to the model and the final response from the model will be sent to the publishers. Default is False.

Return type:

None

Example usage:

def my_arbitrary_function(first_param: str, second_param: int) -> str:
    return f"{first_param}, {second_param}"

my_func_description = {
    'type': 'function',
    'function': {
        'name': 'my_arbitrary_function',
        'description': 'Description of my arbitrary function',
        'parameters': {
            'type': 'object',
            'properties': {
                'first_param': {
                    'type': 'string',
                    'description': 'Description of the first param',
                },
                'second_param': {
                    'type': 'integer',
                    'description': 'Description of the second param',
                },
            },
            'required': ['first_param', 'second_param'],
        },
    },
}

my_component.register_tool(tool=my_arbitrary_function, tool_description=my_func_description, send_tool_response_to_model=False)
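The dispatch step that register_tool enables can be pictured as follows: the model emits a tool call (a name plus arguments), and the component looks up the registered callable and invokes it. This is a hedged sketch of that flow, not the component's actual internals:

```python
import json

def my_arbitrary_function(first_param: str, second_param: int) -> str:
    return f"{first_param}, {second_param}"

# Sketch of the registry register_tool maintains: tool name -> callable.
registry = {"my_arbitrary_function": my_arbitrary_function}

def dispatch(tool_call: str) -> str:
    # tool_call stands in for the model's structured tool-call output.
    call = json.loads(tool_call)
    return registry[call["name"]](**call["arguments"])

result = dispatch(
    '{"name": "my_arbitrary_function",'
    ' "arguments": {"first_param": "door", "second_param": 2}}'
)
```

With send_tool_response_to_model=False, a result like this would go straight to the component publishers; with True, it would be fed back to the model first.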
property additional_model_clients: Optional[Dict[str, agents.clients.model_base.ModelClient]]

Get the dictionary of additional model clients registered to this component.

Returns:

A dictionary mapping client names (str) to ModelClient instances, or None if not set.

Return type:

Optional[Dict[str, ModelClient]]

change_model_client(model_client_name: str) bool

Hot-swap the active model client at runtime.

This method replaces the component’s current model_client with one from the registered additional_model_clients. It handles the safe de-initialization of the old client and initialization of the new one.

This is commonly used as a target for Actions in the Event system.

Parameters:

model_client_name (str) – The key corresponding to the desired client in additional_model_clients.

Returns:

True if the swap was successful, False otherwise (e.g., if the name was not found or initialization failed).

Return type:

bool

Example:

from agents.ros import Action

# Define an action to switch to the 'local_backup' client defined previously
switch_to_local = Action(
    method=brain.change_model_client,
    args=("local_backup",)
)

# Trigger this action if the component fails (e.g. internet loss)
brain.on_component_fail(action=switch_to_local, max_retries=3)
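The bookkeeping behind the swap can be pictured with a toy model (illustration only; the real component also safely de-initializes the old client and initializes the new one):

```python
class ClientSwapperSketch:
    """Toy model of change_model_client's lookup-and-swap logic."""

    def __init__(self, active, additional=None):
        self.active = active
        self.additional = additional or {}

    def change_model_client(self, name: str) -> bool:
        new_client = self.additional.get(name)
        if new_client is None:
            return False  # name not registered among additional clients
        # (real component: de-initialize old client, initialize new one here)
        self.active = new_client
        return True

swapper = ClientSwapperSketch(active="cloud", additional={"local_backup": "local"})
ok = swapper.change_model_client("local_backup")
```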
property warmup: bool

Enable warmup of the model.

custom_on_activate()

Custom configuration for creating triggers.

create_all_subscribers()

Override to handle trigger topics and fixed inputs. Called by the parent BaseComponent.

activate_all_triggers() None

Activates component triggers by attaching the execution step to callbacks.

destroy_all_subscribers() None

Destroys all node subscribers.