`agents.config`¶

Module Contents¶

Classes¶

`LLMConfig`	Configuration for the Large Language Model (LLM) component.
`MLLMConfig`	Configuration for the Multi-Modal LLM (VLM) component.
`CortexConfig`	Configuration for the Cortex task planning and execution component.
`VLAConfig`	Configuration for the Vision-Language-Action (VLA) component.
`SpeechToTextConfig`	Configuration for a Speech-To-Text component.
`TextToSpeechConfig`	Configuration for a Text-To-Speech component.
`SemanticRouterConfig`	Configuration parameters for a semantic router component.
`MapConfig`	Configuration for a MapEncoding component.
`MemoryConfig`	Configuration for the Memory component.
`VideoMessageMakerConfig`	Configuration parameters for a video message maker component.
`VisionConfig`	Configuration for a detection component.

API¶

class agents.config.LLMConfig¶

Bases: agents.config.ModelComponentConfig

Configuration for the Large Language Model (LLM) component.

It defines various settings that control how the LLM component operates, including whether to enable chat history, retrieval augmented generation (RAG) and more.

Parameters:

enable_rag (bool) – Enables or disables Retrieval Augmented Generation.
collection_name (Optional[str]) – The name of the vectordb collection to use for RAG.
distance_func (str) – The distance metric used for nearest neighbor search for RAG. Supported values are “l2”, “ip”, and “cosine”.
n_results (int) – The maximum number of results to return for RAG. Defaults to 1. For numbers greater than 1, results will be concatenated together in a single string.
chat_history (bool) – Whether to include chat history in the LLM’s prompt.
history_reset_phrase (str) – Phrase to reset chat history. Defaults to ‘chat reset’
history_size (int) – Number of user messages to keep in chat history. Defaults to 10
temperature (float) – Temperature used for sampling tokens during generation. Default is 0.8 and must be greater than 0.0.
max_new_tokens (int) – The maximum number of new tokens to generate. Default is 100 and must be greater than 0.
stream (bool) – Publish the LLM output as a stream of tokens, useful when sending LLM output to a user-facing client or to a TTS component. Cannot be used in conjunction with tool calling. Default is False.
break_character (str) – A string marking the point at which the output received so far in a stream should be published. This parameter only takes effect when stream is set to True. As stream output is received token by token, it is useful to publish full sentences instead of individual tokens as the component’s output (for example, for downstream text-to-speech conversion). This value can be set to an empty string to publish output token by token. Default is ‘.’ (period).
response_terminator (str) – A string token marking the end of a single response from the model. This token is only used with persistent clients, such as a websocket client, and when stream is set to True. It is not published. This value cannot be an empty string. Default is ‘<>’.
strip_think_tokens (bool) – Whether to strip <think>...</think> blocks from model output. Reasoning models (e.g. Qwen3, DeepSeek-R1) emit these blocks which are useful for debugging but should typically not be forwarded to downstream components such as TTS or UI. Applies to both streaming and non-streaming output. Default is True.
enable_local_model (bool) – Whether to enable a local LLM model via llama.cpp, allowing the component to run without a remote model client. Requires the llama-cpp-python package. Default is False.
device_local_model (str) – Device to run the local model on, either “cpu” or “cuda” (default: “cuda”). This parameter is only effective when enable_local_model is True.
ncpu_local_model (int) – Number of CPU cores to allocate to the local model when using CPU (default: 1). This parameter is only effective when enable_local_model is True.
local_model_path (Optional[str]) – HuggingFace repository ID for a GGUF model (default: Qwen/Qwen3-0.6B-GGUF), or a local path to a .gguf file. This parameter is only effective when enable_local_model is True.
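
The interaction between stream and break_character can be sketched as follows. This is a minimal illustration of the behaviour described above, not the component's actual implementation; the helper name publish_chunks is hypothetical.

```python
from typing import Iterable, Iterator


def publish_chunks(tokens: Iterable[str], break_character: str = ".") -> Iterator[str]:
    """Group streamed tokens into publishable chunks (hypothetical helper)."""
    if break_character == "":
        # Empty break character: publish token by token.
        yield from tokens
        return

    buffer = ""
    for token in tokens:
        buffer += token
        # Publish everything up to and including the last break character seen.
        cut = buffer.rfind(break_character)
        if cut != -1:
            end = cut + len(break_character)
            yield buffer[:end]
            buffer = buffer[end:]
    if buffer:
        # Flush whatever remains when the stream ends.
        yield buffer


tokens = ["Hel", "lo", ".", " How", " are", " you", "?"]
chunks = list(publish_chunks(tokens))
```

With the default break character, a downstream TTS component would receive "Hello." as one message rather than three separate tokens.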

Example of usage:

config = LLMConfig(enable_rag=True, collection_name="my_collection", distance_func="l2")

Example of usage with local model:

config = LLMConfig(enable_local_model=True)

get_inference_params() → Dict¶: Get inference params from model components
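
To make the effect of strip_think_tokens concrete, the following sketch removes complete <think>...</think> blocks from a non-streaming response with a regular expression. It is an assumption about how such stripping could be done, not the library's code; in the streaming case a block may span several tokens and would need buffering, which is not shown here.

```python
import re

# Matches a complete <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)


def strip_think(text: str) -> str:
    """Remove complete <think> blocks and trim surrounding whitespace."""
    return THINK_BLOCK.sub("", text).strip()


raw = "<think>\nThe user greets me.\n</think>\nHello! How can I help?"
clean = strip_think(raw)
```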

class agents.config.MLLMConfig¶

Bases: agents.config.LLMConfig

Configuration for the Multi-Modal LLM (VLM) component.

It defines various settings that control how the VLM component operates, including whether to enable chat history, retrieval augmented generation (RAG) and more.

Parameters:

enable_rag (bool) – Enables or disables Retrieval Augmented Generation.
collection_name (Optional[str]) – The name of the vectordb collection to use for RAG.
distance_func (str) – The distance metric used for nearest neighbor search for RAG. Supported values are “l2”, “ip”, and “cosine”.
n_results (int) – The maximum number of results to return for RAG. Defaults to 1. For numbers greater than 1, results will be concatenated together in a single string.
chat_history (bool) – Whether to include chat history in the LLM’s prompt.
history_reset_phrase (str) – Phrase to reset chat history. Defaults to ‘chat reset’
history_size (int) – Number of user messages to keep in chat history. Defaults to 10
temperature (float) – Temperature used for sampling tokens during generation. Default is 0.8 and must be greater than 0.0.
max_new_tokens (int) – The maximum number of new tokens to generate. Default is 100 and must be greater than 0.
stream (bool) – Publish the LLM output as a stream of tokens, useful when sending LLM output to a user-facing client or to a TTS component. Cannot be used in conjunction with tool calling. Default is False.
break_character (str) – A string marking the point at which the output received so far in a stream should be published. This parameter only takes effect when stream is set to True. As stream output is received token by token, it is useful to publish full sentences instead of individual tokens as the component’s output (for example, for downstream text-to-speech conversion). This value can be set to an empty string to publish output token by token. Default is ‘.’ (period).
response_terminator (str) – A string token marking the end of a single response from the model. This token is only used with persistent clients, such as a websocket client, and when stream is set to True. It is not published. This value cannot be an empty string. Default is ‘<>’.
strip_think_tokens (bool) – Whether to strip <think>...</think> blocks from model output. Reasoning models (e.g. Qwen3, DeepSeek-R1) emit these blocks which are useful for debugging but should typically not be forwarded to downstream components such as TTS or UI. Applies to both streaming and non-streaming output. Default is True.
enable_local_model (bool) – Whether to enable a local VLM via llama.cpp (Moondream2), allowing the component to run without a remote model client. Requires the llama-cpp-python package. Default is False.
device_local_model (str) – Device to run the local model on, either “cpu” or “cuda” (default: “cuda”). This parameter is only effective when enable_local_model is True.
ncpu_local_model (int) – Number of CPU cores to allocate to the local model when using CPU (default: 1). This parameter is only effective when enable_local_model is True.
local_model_path (Optional[str]) – HuggingFace repository ID for a GGUF VLM model (default: ggml-org/moondream2-20250414-GGUF). This parameter is only effective when enable_local_model is True.

Example of usage:

config = MLLMConfig(enable_rag=True, collection_name="my_collection", distance_func="l2", task="grounding")

Example of usage with local model:

config = MLLMConfig(enable_local_model=True)

get_inference_params() → Dict¶: Get inference params from model components
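
The chat_history, history_size and history_reset_phrase parameters together describe a bounded, resettable buffer. A rough sketch of that idea is given below. The class ChatHistory is hypothetical, and the way the reset phrase is matched (whole message, case-insensitive) is an assumption, since this page does not specify it.

```python
from collections import deque
from typing import Deque


class ChatHistory:
    """Hypothetical rolling buffer of user messages."""

    def __init__(self, history_size: int = 10, reset_phrase: str = "chat reset") -> None:
        self.reset_phrase = reset_phrase
        # deque(maxlen=...) silently drops the oldest entry when full.
        self.messages: Deque[str] = deque(maxlen=history_size)

    def add(self, user_message: str) -> None:
        # Assumption: the reset phrase is matched case-insensitively
        # against the whole message.
        if user_message.strip().lower() == self.reset_phrase.lower():
            self.messages.clear()
            return
        self.messages.append(user_message)


history = ChatHistory(history_size=3)
for msg in ["one", "two", "three", "four"]:
    history.add(msg)
kept = list(history.messages)

history.add("Chat Reset")
after_reset = list(history.messages)
```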

class agents.config.CortexConfig¶

Bases: agents.config.LLMConfig

Configuration for the Cortex task planning and execution component.

The Cortex component uses an LLM to decompose high-level tasks into sub-tasks and executes them by dispatching Actions registered on other components.

The task execution follows a two-phase approach:

Planning — A multi-step conversational loop where the LLM can call inspect_component to research available components and their capabilities. Once the LLM has enough context, it returns action tool calls which become the execution plan. RAG context from a vector DB is also available during this phase. Controlled by max_planning_steps.
Execution — Each planned step is executed sequentially. Before each step, a brief LLM confirmation call decides: EXECUTE, SKIP, or ABORT, based on the original plan and results so far. After a plan is fully executed, Cortex feeds the results back to the planner and may produce a follow-up plan, repeating the plan-execute loop until the planner signals completion. Both the per-plan length and the number of plan-execute iterations are capped by max_execution_steps.

The chat_history and stream fields are enforced by the component (chat_history=True, stream=False) and cannot be overridden.

Parameters:

max_planning_steps (int) – Maximum number of LLM calls allowed during the planning phase (e.g. inspect_component calls). Default is 10.
max_execution_steps (int) – Caps two things at once: (1) the maximum number of action steps allowed in any single execution plan (plans with more steps are truncated); and (2) the maximum number of plan-execute iterations Cortex will run before giving up if the planner never signals completion. The worst-case total number of actions executed for one task is therefore max_execution_steps². Default is 10.
confirmation_temperature (float) – Temperature for the per-step confirmation LLM calls. Used for both the decision and resolving tool call arguments from prior step results. Default is 0.3.
confirmation_max_tokens (int) – Maximum tokens for confirmation responses. Must be large enough to accommodate a tool call with resolved arguments when the LLM returns EXECUTE. Default is 500.
temperature (float) – Temperature used for the planning LLM call. Default is 0.8 and must be greater than 0.0.
max_new_tokens (int) – The maximum number of new tokens to generate during planning. Default is 1000 (inherited from LLMConfig) and must be greater than 0.
enable_rag (bool) – Enable Retrieval Augmented Generation to provide context during planning. Requires a db_client to be passed to the Cortex component. Default is False.
strip_think_tokens (bool) – Whether to strip <think>...</think> blocks from model output. Default is True.
enable_local_model (bool) – Whether to enable a local LLM via llama.cpp. Requires llama-cpp-python. Default is False.
device_local_model (str) – Device to run the local model on, either “cpu” or “cuda” (default: “cuda”).
ncpu_local_model (int) – Number of CPU cores for the local model (default: 1).
local_model_path (Optional[str]) – HuggingFace repository ID for a GGUF model (default: Qwen/Qwen3-0.6B-GGUF), or a local path to a .gguf file.

Example of usage:

config = CortexConfig(max_planning_steps=10, max_execution_steps=15, temperature=0.2)

Example of usage with local model:

config = CortexConfig(enable_local_model=True, max_execution_steps=20)

get_inference_params() → Dict¶: Get inference params from model components
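
The double role of max_execution_steps is easier to see in a toy loop. The sketch below is not Cortex's implementation: run_task and greedy_planner are hypothetical names, and the per-step confirmation call (EXECUTE, SKIP or ABORT) is left out so that only the two caps remain visible.

```python
from typing import Callable, List, Optional


def run_task(
    plan_fn: Callable[[List[str]], Optional[List[str]]],
    max_execution_steps: int = 10,
) -> List[str]:
    """Run a capped plan-execute loop and return the executed actions.

    plan_fn is a hypothetical planner: it receives the results so far and
    returns the next plan, or None to signal completion.
    """
    executed: List[str] = []
    for _ in range(max_execution_steps):  # cap (2): plan-execute iterations
        plan = plan_fn(executed)
        if plan is None:
            break
        for action in plan[:max_execution_steps]:  # cap (1): truncate long plans
            executed.append(action)
    return executed


def greedy_planner(results: List[str]) -> List[str]:
    # Worst case: always returns an over-long plan and never signals completion.
    return [f"action_{i}" for i in range(50)]


executed = run_task(greedy_planner, max_execution_steps=4)
```

With max_execution_steps=4 the loop stops after 4 iterations of 4 actions each, i.e. 16 actions, which matches the max_execution_steps² worst case stated above.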

class agents.config.VLAConfig¶

Bases: agents.config.ModelComponentConfig

Configuration for the Vision-Language-Action (VLA) component.

It defines settings that control how the VLA component maps sensor inputs to the model, manages the frequency of observation and action loops, and enforces safety constraints through URDF limits.

Parameters:

joint_names_map (Dict[str, str]) – A dictionary mapping the joint names expected by the model (keys) to the actual joint names in the robot’s URDF/ROS system (values).
camera_inputs_map (Mapping[str, Union[Topic, Dict]]) – A mapping of camera names expected by the model (keys) to the corresponding ROS topics (values).
state_input_type (Literal[“positions”, “velocities”, “accelerations”, “efforts”]) – The type of state data to extract from the joint state inputs. Supported values are “positions”, “velocities”, “accelerations”, and “efforts”. Default is “positions”.
action_output_type (Literal[“positions”, “velocities”, “accelerations”, “efforts”]) – The type of action data to publish to the robot controller. Supported values are “positions”, “velocities”, “accelerations”, and “efforts”. Default is “positions”.
observation_sending_rate (float) – The frequency (in Hz) at which observations are captured and sent to the model for inference. Default is 10.0 Hz.
action_sending_rate (float) – The frequency (in Hz) at which action commands are published to the robot’s controllers. Default is 10.0 Hz.
input_timeout (float) – The maximum time (in seconds) to wait, after an action request, for all required inputs (joints, images) to become available before the action is aborted. Default is 30.0s.
robot_urdf_file (Optional[str]) – Path to the robot’s URDF file. This is strongly recommended for safety, as it allows the component to read joint limits and cap generated actions within safe bounds.
joint_limits (Optional[Dict]) – A manual dictionary of joint limits to be used if a URDF file is not provided. Format should match parsed URDF limits.

Example of usage:

joints_map = {"shoulder_pan": "joint1", "elbow_flex": "joint2"}
camera_map = {"front_view": camera_topic}

config = VLAConfig(
    joint_names_map=joints_map,
    camera_inputs_map=camera_map,
    observation_sending_rate=5.0,
    robot_urdf_file="/path/to/robot.urdf"
)

get_inference_params() → Dict¶: Get inference params from model components
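
Capping generated actions within joint limits amounts to a per-joint clamp. The following sketch shows the idea under an assumed limits format (joint name mapped to a lower and upper bound); the real joint_limits dictionary should match the parsed URDF limits, whose exact layout is not given on this page.

```python
from typing import Dict, Tuple

# Assumed format: joint name -> (lower, upper). The real format should match
# the parsed URDF limits, which this page does not spell out.
JointLimits = Dict[str, Tuple[float, float]]


def clamp_action(action: Dict[str, float], limits: JointLimits) -> Dict[str, float]:
    """Clamp each commanded joint value into its [lower, upper] range."""
    clamped = {}
    for joint, value in action.items():
        if joint in limits:
            lower, upper = limits[joint]
            value = min(max(value, lower), upper)
        # Joints without a known limit are passed through unchanged.
        clamped[joint] = value
    return clamped


limits = {"joint1": (-1.57, 1.57), "joint2": (0.0, 2.0)}
action = {"joint1": 2.4, "joint2": -0.3, "joint3": 0.5}
safe = clamp_action(action, limits)
```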

class agents.config.SpeechToTextConfig¶

Bases: agents.config.ModelComponentConfig

Configuration for a Speech-To-Text component.

This class defines the configuration options for speech transcription, voice activity detection, wakeword detection, and audio streaming.

– Local Model

Parameters:

enable_local_model (bool) – Whether to enable a local STT model via sherpa-onnx (Whisper tiny.en by default), allowing the component to run without a remote model client. Requires the sherpa-onnx pip package. Default is False.
device_local_model (str) – Device to run the local model on, either “cpu” or “cuda” (default: “cuda”). This parameter is only effective when enable_local_model is True.
ncpu_local_model (int) – Number of CPU cores to allocate to the local model when using CPU (default: 1). This parameter is only effective when enable_local_model is True.
local_model_path (Optional[str]) – HuggingFace repository ID for a sherpa-onnx compatible Whisper STT model (default: csukuangfj/sherpa-onnx-whisper-tiny.en). For available models see https://k2-fsa.github.io/sherpa/onnx/pretrained_models/index.html. This parameter is only effective when enable_local_model is True.

– Transcription

Parameters:

initial_prompt (str or None) – Optional initial prompt to guide transcription (e.g. speaker name or topic). Defaults to None.
language (str) – Language code for transcription (e.g. “en”, “zh”). Must be one of the supported language codes. Defaults to “en”.
max_new_tokens (int or None) – Maximum number of tokens to generate. If None, no limit is applied. Defaults to None.

– Voice Activity Detection (VAD)

Parameters:

enable_vad (bool) – Enable VAD to detect when speech is present in audio input. Requires onnxruntime and silero-vad model. Defaults to False.
device_audio (Optional[int]) – Audio input device ID. Only used if enable_vad is True. Defaults to None.
vad_threshold (float) – Threshold above which speech is considered present. Only used if enable_vad is True. Range: 0.0–1.0. Defaults to 0.5.
min_silence_duration_ms (int) – Minimum silence duration (ms) before it’s treated as a pause. Only used if enable_vad is True. Defaults to 300.
speech_pad_ms (int) – Silence padding (ms) added to start and end of detected speech regions. Only used if enable_vad is True. Defaults to 30.
speech_buffer_max_len (int) – Max length of speech buffer in ms. Only used if enable_vad is True. Defaults to 30000.
device_vad (str) – Device for VAD (‘cpu’ or ‘gpu’). Only used if enable_vad is True. Defaults to ‘cpu’.
ncpu_vad (int) – Number of CPU cores to use for VAD (if device_vad is ‘cpu’). Defaults to 1.

– Wakeword Detection

Parameters:

enable_wakeword (bool) – Enable detection of a wakeword phrase (e.g. ‘Hey Jarvis’). Requires enable_vad to be True. Defaults to False.
wakeword_threshold (float) – Minimum confidence score to trigger wakeword detection. Only used if enable_wakeword is True. Defaults to 0.6.
device_wakeword (str) – Device for Wakeword Detection (‘cpu’ or ‘gpu’). Only used if enable_wakeword is True. Defaults to ‘cpu’.
ncpu_wakeword (int) – Number of CPU cores for Wakeword Detection (if device_wakeword is ‘cpu’). Defaults to 1.

– Streaming

Parameters:

stream (bool) – Send audio as a stream to a persistent client (e.g., websockets). Requires enable_vad to be True. Useful for real-time transcription. Defaults to False.
min_chunk_size (int) – Audio chunk size in ms to send when streaming. Requires stream to be True. Must be > 100 ms. Defaults to 2000.

– Model Paths

Parameters:

vad_model_path (str) – Path or URL to VAD ONNX model. Defaults to the Silero VAD model URL.
melspectrogram_model_path (str) – Path or URL to melspectrogram model used in wakeword detection. Defaults to openWakeWord model URL.
embedding_model_path (str) – Path or URL to audio embedding model for wakeword detection. Defaults to openWakeWord model URL.
wakeword_model_path (str) – Path or URL to wakeword ONNX model (e.g. ‘Hey Jarvis’). Defaults to a pretrained openWakeWord model. For custom models, see: https://github.com/dscripka/openWakeWord/blob/main/notebooks/automatic_model_training.ipynb

– Example

Example usage:

config = SpeechToTextConfig(
    enable_vad=True,
    enable_wakeword=True,
    vad_threshold=0.5,
    wakeword_threshold=0.6,
    min_silence_duration_ms=1000,
    speech_pad_ms=30,
    speech_buffer_max_len=8000,
)

Example of usage with local model:

config = SpeechToTextConfig(enable_local_model=True, enable_vad=True)

get_inference_params() → Dict¶: Get inference params from model components
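
Several of the options above depend on each other (wakeword detection and streaming both require VAD, and min_chunk_size has a lower bound). The sketch below collects those documented constraints in one place. SttOptions and check_options are hypothetical stand-ins written for illustration; how the real class reports violations is not described here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SttOptions:
    """Hypothetical stand-in holding only the fields checked below."""

    enable_vad: bool = False
    enable_wakeword: bool = False
    stream: bool = False
    vad_threshold: float = 0.5
    min_chunk_size: int = 2000


def check_options(opts: SttOptions) -> List[str]:
    """Return the violated constraints (empty when the options are consistent)."""
    problems = []
    if opts.enable_wakeword and not opts.enable_vad:
        problems.append("enable_wakeword requires enable_vad=True")
    if opts.stream and not opts.enable_vad:
        problems.append("stream requires enable_vad=True")
    if not 0.0 <= opts.vad_threshold <= 1.0:
        problems.append("vad_threshold must be within 0.0-1.0")
    if opts.stream and opts.min_chunk_size <= 100:
        problems.append("min_chunk_size must be > 100 ms")
    return problems


ok = check_options(SttOptions(enable_vad=True, enable_wakeword=True))
bad = check_options(SttOptions(enable_wakeword=True, stream=True, min_chunk_size=50))
```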

class agents.config.TextToSpeechConfig¶

Bases: agents.config.ModelComponentConfig

Configuration for a Text-To-Speech component.

This class defines the configuration options for a Text-To-Speech component.

Parameters:

enable_local_model (bool) – Whether to enable a local TTS model via sherpa-onnx (Kokoro English by default), allowing the component to run without a remote model client. Requires the sherpa-onnx pip package. Default is False.
device_local_model (str) – Device to run the local model on, either “cpu” or “cuda” (default: “cuda”). This parameter is only effective when enable_local_model is True.
ncpu_local_model (int) – Number of CPU cores to allocate to the local model when using CPU (default: 1). This parameter is only effective when enable_local_model is True.
local_model_path (Optional[str]) – HuggingFace repository ID for a sherpa-onnx compatible TTS model (default: csukuangfj/kokoro-en-v0_19). For available models see https://k2-fsa.github.io/sherpa/onnx/pretrained_models/index.html. This parameter is only effective when enable_local_model is True.
play_on_device (bool) – Whether to play the audio on available audio device (default: False).
device (int) – Optional device id (int) for playing the audio. Only effective if play_on_device is True (default: None).
stream_to_ip (Optional[str]) – If set, streams the audio to this IP address via UDP instead of playing locally. Requires play_on_device to be True.
stream_to_port (Optional[int]) – The target port for UDP streaming. Must be set if stream_to_ip is set.
buffer_size (int) – Size of the buffer for playing audio on device. Only effective if play_on_device is True (default: 20).
block_size (int) – Size of the audio block to be read for playing audio on device. Only effective if play_on_device is True (default: 4096).
thread_shutdown_timeout (int) – Timeout to shutdown a playback thread, if data is not received for more than a certain number of seconds. Only effective if play_on_device is True (default: 5 seconds).
stream (bool) – Stream output when used with WebSocketClient. Useful when model output is large and broken into chunks by the server (default: True).

Example of usage for local playback:

config = TextToSpeechConfig(play_on_device=True)

Example of usage for UDP streaming:

config = TextToSpeechConfig(play_on_device=True, stream_to_ip="192.168.1.100", stream_to_port=12345)

Example of usage with local model:

config = TextToSpeechConfig(enable_local_model=True, play_on_device=True)

get_inference_params() → Dict¶: Get inference params from model components

class agents.config.SemanticRouterConfig¶

Bases: agents.config.ModelComponentConfig

Configuration parameters for a semantic router component.

Parameters:

router_name (str) – The name of the router.
distance_func (str) – The function used to calculate distance from route samples in vectordb. Can be one of “l2” (L2 distance), “ip” (Inner Product), or “cosine” (Cosine similarity). Default is “l2”.
maximum_distance (float) – The maximum distance threshold for routing. A value between 0.1 and 1.0. Defaults to 0.4

Example of usage:

config = SemanticRouterConfig(router_name="my_router")
# or
config = SemanticRouterConfig(router_name="my_router", distance_func="ip", maximum_distance=0.7)

get_inference_params() → Dict¶: Get inference params from model components
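
The role of maximum_distance can be illustrated with a nearest-sample lookup. This is a simplified sketch using plain Euclidean distance on toy 2-D vectors. The function names are hypothetical, the vector database is replaced by a dictionary, and returning None for an unmatched query is an assumption about the fallback behaviour.

```python
import math
from typing import Dict, List, Optional, Sequence


def l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def route(
    query: Sequence[float],
    samples: Dict[str, List[Sequence[float]]],
    maximum_distance: float = 0.4,
) -> Optional[str]:
    """Return the route with the nearest sample, or None if it is too far."""
    best_route, best_dist = None, math.inf
    for name, vectors in samples.items():
        for vec in vectors:
            dist = l2(query, vec)
            if dist < best_dist:
                best_route, best_dist = name, dist
    return best_route if best_dist <= maximum_distance else None


samples = {
    "navigation": [[1.0, 0.0], [0.9, 0.1]],
    "conversation": [[0.0, 1.0]],
}
near = route([0.95, 0.05], samples)
far = route([0.5, 0.5], samples)
```

Raising maximum_distance makes the router more permissive: with maximum_distance=0.7 the second query would be routed to "navigation" instead of being rejected.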

class agents.config.MapConfig¶

Bases: agents.ros.BaseComponentConfig

Configuration for a MapEncoding component.

Parameters:

map_name (str) – The name of the map.
distance_func (str) – The function used to calculate distance when retrieving information from the map collection. Can be one of “l2” (L2 distance), “ip” (Inner Product), or “cosine” (Cosine similarity). Default is “l2”.

Example of usage:

config = MapConfig(map_name="my_map", distance_func="ip")

class agents.config.MemoryConfig¶

Bases: agents.ros.BaseComponentConfig

Configuration for the Memory component.

Parameters:

db_path (str) – Path to the eMEM SQLite database file.
embedding_checkpoint (str) – Model name for sentence-transformers fallback. Only used when no embedding_client is provided to the Memory component.
auto_store (bool) – Automatically store layer data on each execution step. If False, storage only happens via the store component action.
working_memory_size (int) – Max observations held in the in-process buffer before the oldest are dropped. Observations are flushed to persistent storage well before this limit via flush_batch_size and flush_interval.
flush_interval (float) – Seconds between auto-flushes of the working memory buffer to persistent storage. Lower values mean observations become searchable faster but increase write frequency.
flush_batch_size (int) – Number of observations accumulated before an automatic flush is triggered, regardless of flush_interval.
consolidation_window (float) – Maximum temporal gap in seconds between consecutive observations within the same consolidation chunk. When an episode is consolidated, observations separated by more than this gap produce separate gist summaries. For example, 1800 (30 min) means a multi-session episode spanning days will get one gist per session, not one monolithic summary.
consolidation_spatial_eps (float) – DBSCAN epsilon in meters for spatial clustering during time-window consolidation. Observations farther apart than this are placed in separate clusters and produce separate gists. Only applies to non-episodic consolidation.
consolidation_min_samples (int) – Minimum number of observations required to form a spatial cluster during time-window consolidation. Clusters smaller than this are left in short-term memory.
archive_after_seconds (float) – How long (in seconds) observations remain in long-term memory (with full text preserved) before archival drops their text and embeddings, leaving only the gist searchable. Set higher to keep raw observations searchable longer at the cost of storage.
entity_extract_flush_interval (int) – Trigger entity extraction every N working-memory flushes.
entity_extract_time_interval (float) – Trigger entity extraction every N seconds, whichever comes first with entity_extract_flush_interval.
entity_similarity_threshold (float) – Cosine similarity threshold (0-1) for merging a newly detected entity with an existing one. Higher values require closer name matches before merging (e.g. 0.85 means “red chair” and “chair” may merge, but “chair” and “table” won’t).
entity_spatial_radius (float) – Maximum distance in meters between an existing entity and a new detection for them to be considered the same object. Only entities within this radius AND above the similarity threshold are merged.
recency_weight (float) – Alpha multiplier for recency-weighted semantic search. When > 0, recent observations are boosted over older ones at equal semantic distance. Set to 0.0 (default) for pure semantic ordering.
recency_halflife (float) – Time constant in seconds for recency decay. An observation this many seconds old receives half the recency boost. Only effective when recency_weight > 0.
hnsw_ef_construction (int) – HNSW index build-time quality parameter. Higher values produce a better quality index but take longer to build. Default (200) is suitable for most use cases.
hnsw_m (int) – Number of bidirectional links per node in the HNSW graph. Higher values improve recall but increase memory usage. Default (16) is suitable for most use cases.
hnsw_ef_search (int) – HNSW search-time quality parameter. Higher values improve recall at the cost of query latency. Default (50) is suitable for most use cases.
hnsw_max_elements (int) – Maximum number of vectors the HNSW index can hold. Should be set higher than the expected total number of observations + gists + entities over the system’s lifetime.
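
The relationship between recency_weight and recency_halflife can be sketched with an exponential decay. The half-life behaviour follows the description above. The way the boost is combined with semantic distance (subtracting a weighted boost) and the 3600-second default are assumptions made for this illustration only.

```python
def recency_boost(age_seconds: float, recency_halflife: float) -> float:
    """Exponential decay: 1.0 for a brand-new observation, 0.5 at one half-life."""
    return 0.5 ** (age_seconds / recency_halflife)


def ranking_score(
    semantic_distance: float,
    age_seconds: float,
    recency_weight: float = 0.0,
    recency_halflife: float = 3600.0,  # hypothetical default for this sketch
) -> float:
    """Lower is better. Assumed combination: distance minus weighted boost."""
    if recency_weight <= 0:
        return semantic_distance  # pure semantic ordering
    boost = recency_boost(age_seconds, recency_halflife)
    return semantic_distance - recency_weight * boost


# Two observations at the same semantic distance, one minute vs one day old.
fresh = ranking_score(0.30, age_seconds=60.0, recency_weight=0.2)
stale = ranking_score(0.30, age_seconds=86400.0, recency_weight=0.2)
```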

Example of usage:

config = MemoryConfig(db_path="/tmp/robot_memory.db")
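
The entity merging rule (within entity_spatial_radius AND above entity_similarity_threshold) can be written as a small predicate. The sketch below uses toy 2-D embeddings and positions. The function names are hypothetical, the 0.85 threshold is taken from the example in the parameter description, and the 1.0 m radius is an illustrative value rather than a documented default.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def should_merge(
    emb_a: Sequence[float],
    pos_a: Tuple[float, float],
    emb_b: Sequence[float],
    pos_b: Tuple[float, float],
    entity_similarity_threshold: float = 0.85,
    entity_spatial_radius: float = 1.0,  # illustrative value, in meters
) -> bool:
    """Merge only when BOTH the name embedding and the position agree."""
    close_enough = math.dist(pos_a, pos_b) <= entity_spatial_radius
    similar_enough = cosine_similarity(emb_a, emb_b) >= entity_similarity_threshold
    return close_enough and similar_enough


same_spot = should_merge([1.0, 0.1], (0.0, 0.0), [1.0, 0.2], (0.5, 0.0))
other_room = should_merge([1.0, 0.1], (0.0, 0.0), [1.0, 0.2], (5.0, 0.0))
different_name = should_merge([1.0, 0.0], (0.0, 0.0), [0.0, 1.0], (0.5, 0.0))
```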

class agents.config.VideoMessageMakerConfig¶

Bases: agents.ros.BaseComponentConfig

Configuration parameters for a video message maker component.

Parameters:

min_video_frames (int) – The minimum number of frames in a video segment. Default is 15, assuming a 0.5 second video at 30 fps.
max_video_frames (int) – The maximum number of frames in a video segment. Default is 600, assuming a 20 second video at 30 fps.
motion_estimation_func (Optional[str]) – The function used for motion estimation. Can be one of “frame_difference” or “optical_flow”. Default is None.
threshold (float) – The threshold value for motion detection. A float between 0.1 and 5.0. Default is 0.3.
flow_kwargs – Additional keyword arguments for the optical flow algorithm. Default is a dictionary with reasonable values.

Example of usage:

config = VideoMessageMakerConfig()
# or
config = VideoMessageMakerConfig(min_video_frames=30, motion_estimation_func="optical_flow", threshold=0.5)
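
The frame-count defaults assume a 30 fps camera. For other frame rates, the equivalent values can be derived with a one-line helper such as the hypothetical frames_for below, which simply multiplies duration by frame rate.

```python
def frames_for(duration_seconds: float, fps: float = 30.0) -> int:
    """Number of frames covering the given duration at the given frame rate."""
    return round(duration_seconds * fps)


min_frames = frames_for(0.5)                 # 15, the documented default
max_frames = frames_for(20.0)                # 600, the documented default
slow_camera_min = frames_for(0.5, fps=10.0)  # 5 frames for a 10 fps camera
```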

class agents.config.VisionConfig¶

Bases: agents.config.ModelComponentConfig

Configuration for a detection component.

The config allows you to customize the detection and/or tracking process.

Parameters:

threshold (float) – The confidence threshold for object detection, ranging from 0.1 to 1.0 (default: 0.5).
get_dataset_labels (bool) – Whether to return data labels along with detections (default: True).
labels_to_track (Optional[list]) – A list of specific labels to track, when the model is used as a tracker (default: None).
enable_visualization (Optional[bool]) – Whether to enable visualization of detections (default: False). Useful for testing vision component output.
enable_local_classifier (bool) – Whether to enable a local classifier model for detections (default: False). If a model client is given to the component, then this has no effect.
input_height (int) – Height of the input to the local classifier model in pixels (default: 640). This parameter is only effective when enable_local_classifier is set to True.
input_width (int) – Width of the input to the local classifier model in pixels (default: 640). This parameter is only effective when enable_local_classifier is set to True.
dataset_labels (Dict) – A dictionary mapping label indices to names, used to interpret model outputs (default: COCO labels). This parameter is only effective when enable_local_classifier is set to True.
device_local_classifier (str) – Device to run the local classifier on, either “cpu” or “gpu” (default: “gpu”). This parameter is only effective when enable_local_classifier is set to True.
ncpu_local_classifier (int) – Number of CPU cores to allocate to the local classifier when using CPU (default: 1). This parameter is only effective when enable_local_classifier is set to True.
local_classifier_model_path (str) – Path or URL to the ONNX model used by the local classifier (default: DEIM, Huang et al. CVPR 2025). Other models based on DEIM can be checked here. This parameter is only effective when enable_local_classifier is set to True.

Example of usage:
```
config = VisionConfig(threshold=0.3)
```

get_inference_params() → Dict¶: Get inference params from model components
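
The combined effect of threshold and labels_to_track can be pictured as a two-stage filter over raw detections. The sketch below assumes each detection is a dictionary with "label" and "score" keys, which is a hypothetical format chosen for illustration; it does not show the actual tracking logic.

```python
from typing import Dict, List, Optional


def filter_detections(
    detections: List[Dict],
    threshold: float = 0.5,
    labels_to_track: Optional[List[str]] = None,
) -> List[Dict]:
    """Keep detections at or above the threshold, optionally restricted to some labels."""
    kept = [d for d in detections if d["score"] >= threshold]
    if labels_to_track is not None:
        kept = [d for d in kept if d["label"] in labels_to_track]
    return kept


detections = [
    {"label": "person", "score": 0.92},
    {"label": "chair", "score": 0.41},
    {"label": "dog", "score": 0.77},
]
confident = filter_detections(detections, threshold=0.5)
people_only = filter_detections(detections, threshold=0.3, labels_to_track=["person"])
```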
