crab.core package
Subpackages
- crab.core.models package
- Submodules
- crab.core.models.action module
Action
Action.description
Action.entry
Action.env_name
Action.from_function()
Action.get_required_params()
Action.kept_params
Action.local
Action.model_computed_fields
Action.model_config
Action.model_fields
Action.name
Action.parameters
Action.returns
Action.run()
Action.set_kept_param()
Action.to_openai_json_schema()
Action.to_raw_action()
ClosedAction
EMPTY_MODEL
- crab.core.models.benchmark_interface module
- crab.core.models.config module
BenchmarkConfig
BenchmarkConfig.common_setup
BenchmarkConfig.default_env
BenchmarkConfig.environments
BenchmarkConfig.model_computed_fields
BenchmarkConfig.model_config
BenchmarkConfig.model_fields
BenchmarkConfig.multienv
BenchmarkConfig.name
BenchmarkConfig.prompting_tools
BenchmarkConfig.root_action_space
BenchmarkConfig.step_limit
BenchmarkConfig.tasks
EnvironmentConfig
EnvironmentConfig.action_space
EnvironmentConfig.description
EnvironmentConfig.extra_attributes
EnvironmentConfig.model_computed_fields
EnvironmentConfig.model_config
EnvironmentConfig.model_fields
EnvironmentConfig.name
EnvironmentConfig.observation_space
EnvironmentConfig.remote_url
EnvironmentConfig.reset
VMEnvironmentConfig
- crab.core.models.evaluator module
- crab.core.models.task module
- Module contents
Action
Action.description
Action.entry
Action.env_name
Action.from_function()
Action.get_required_params()
Action.kept_params
Action.local
Action.model_computed_fields
Action.model_config
Action.model_fields
Action.name
Action.parameters
Action.returns
Action.run()
Action.set_kept_param()
Action.to_openai_json_schema()
Action.to_raw_action()
ActionOutput
BackendOutput
BenchmarkConfig
BenchmarkConfig.common_setup
BenchmarkConfig.default_env
BenchmarkConfig.environments
BenchmarkConfig.model_computed_fields
BenchmarkConfig.model_config
BenchmarkConfig.model_fields
BenchmarkConfig.multienv
BenchmarkConfig.name
BenchmarkConfig.prompting_tools
BenchmarkConfig.root_action_space
BenchmarkConfig.step_limit
BenchmarkConfig.tasks
EnvironmentConfig
EnvironmentConfig.action_space
EnvironmentConfig.description
EnvironmentConfig.extra_attributes
EnvironmentConfig.model_computed_fields
EnvironmentConfig.model_config
EnvironmentConfig.model_fields
EnvironmentConfig.name
EnvironmentConfig.observation_space
EnvironmentConfig.remote_url
EnvironmentConfig.reset
Evaluator
GeneratedTask
Message
MessageType
StepResult
SubTask
SubTaskInstance
Task
VMEnvironmentConfig
Submodules
crab.core.benchmark module
- class crab.core.benchmark.Benchmark(name: str, tasks: list[Task], environments: list[Environment], default_env: str | None = None, multienv: bool = False, prompting_tools: dict[str, dict[str, Action]] = {}, root_action_space: list[Action] = [], step_limit: int = 30, common_setup: list[Annotated[Action, AfterValidator(func=_check_no_param)]] = [])
Bases: object
The crab benchmark controller managing environments and agent evaluation.
The class manages multiple environments together and provides a simple API, consisting of step(), observe(), and reset(), for language model agents to perform tasks in multiple environments.
This class introduces a “root” environment with no action or observation capabilities, intended as a utility for evaluations not directly tied to a specific environment.
This class operates in two distinct modes: “multi-environment” and “single-environment”. In multi-environment mode, observations and action results are separated by environment and returned as a dictionary. In single-environment mode, all observations and action outcomes are merged under the “root” environment, with actions routed to their respective environments.
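The difference between the two modes can be sketched with a small stand-in function. This is not the library's implementation: `merge_observations`, the sample environment names, and the simple key-merging strategy are assumptions made purely for illustration.

```python
from typing import Any


def merge_observations(
    per_env: dict[str, dict[str, Any]], multienv: bool
) -> dict[str, dict[str, Any]]:
    """Illustrative stand-in for how results might be grouped in each mode."""
    if multienv:
        # Multi-environment mode: keep results separated by environment name.
        return per_env
    # Single-environment mode: fold every environment's results under "root".
    merged: dict[str, Any] = {}
    for observations in per_env.values():
        merged.update(observations)
    return {"root": merged}


# Hypothetical environments and observation names.
raw = {
    "ubuntu": {"screenshot": "<ubuntu image>"},
    "android": {"ui_tree": "<android tree>"},
}

multi = merge_observations(raw, multienv=True)
single = merge_observations(raw, multienv=False)
```

In this sketch, `multi` keeps one entry per environment, while `single` contains a single "root" entry holding both observations.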
- export_action_space() → dict[str, list[Action]]
Returns the action spaces from all environments.
- Returns:
A dict of action lists for each environment, keyed by environment name.
- observe() → dict[str, dict[str, Any]]
Collects observations from all environments.
- Returns:
A dict-of-dict with observations from each environment. The first level keys are environment names, the second level keys are observation action names.
- observe_with_prompt() → tuple[dict[str, dict[str, Any]], dict[str, tuple[str, MessageType]]]
Collects observations and applies prompting tools.
- Returns:
A tuple (observations, prompts), where “observations” holds the observations from each environment and “prompts” holds the result of applying the prompting tools to them. The first-level keys are environment names and the second-level keys are observation action names. Note that a dict can be empty if no prompting tool is set for it.
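As a rough illustration of the (observations, prompts) pair described above, the following sketch applies per-observation prompting tools to a dict-of-dict of observations. `apply_prompting_tools` is a hypothetical helper, plain callables stand in for Action objects, and prompts are simplified to plain strings instead of the (str, MessageType) tuples in the signature.

```python
from typing import Any, Callable

# Assumption: a prompting tool turns one observation value into prompt text.
PromptTool = Callable[[Any], str]


def apply_prompting_tools(
    observations: dict[str, dict[str, Any]],
    prompting_tools: dict[str, dict[str, PromptTool]],
) -> tuple[dict[str, dict[str, Any]], dict[str, dict[str, str]]]:
    """Return the observations plus prompts for those that have a tool."""
    prompts: dict[str, dict[str, str]] = {}
    for env_name, env_observations in observations.items():
        tools = prompting_tools.get(env_name, {})
        # Observations without a registered tool produce no prompt entry.
        prompts[env_name] = {
            name: tools[name](value)
            for name, value in env_observations.items()
            if name in tools
        }
    return observations, prompts


# Hypothetical observation data and tools.
sample_observations = {
    "ubuntu": {"screenshot": "1920x1080"},
    "android": {"ui_tree": "<node/>"},
}
sample_tools = {"ubuntu": {"screenshot": lambda v: f"Screenshot of size {v}"}}

obs, prompts = apply_prompting_tools(sample_observations, sample_tools)
```

Here the "android" entry of `prompts` is an empty dict because no prompting tool was registered for that environment.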
- start_task(task_id: str) → tuple[Task, dict[str, list[Action]]]
Initializes and starts a specified task.
- Parameters:
task_id – The ID of the task to start.
- Returns:
A tuple (task, action_space), where task is the started task object and action_space is a dict mapping each environment name to its list of available Action objects.
- step(action: str, parameters: dict[str, Any] = {}, env_name: str | None = None) → StepResult
Executes a step in the benchmark by performing an action.
- Parameters:
action – The action to execute.
parameters – Parameters for the action.
env_name – The name of the environment.
- Returns:
The result of the step, including observations and evaluation metrics. Note that the truncated field in the result is currently not meaningful.
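A typical agent loop built on observe() and step() might look like the sketch below. It does not import crab: `FakeBenchmark` and `FakeStepResult` are hypothetical stand-ins that only mirror the method names documented above, and the `terminated` field is an assumption (this page only mentions `truncated`).

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class FakeStepResult:
    """Hypothetical stand-in for StepResult; the field name is an assumption."""

    terminated: bool


class FakeBenchmark:
    """Hypothetical stand-in exposing the same method names as Benchmark."""

    def __init__(self, goal: int) -> None:
        self.goal = goal
        self.counter = 0

    def observe(self) -> dict[str, dict[str, Any]]:
        return {"root": {"counter": self.counter}}

    def step(
        self,
        action: str,
        parameters: Optional[dict[str, Any]] = None,
        env_name: Optional[str] = None,
    ) -> FakeStepResult:
        if action == "increment":
            self.counter += (parameters or {}).get("by", 1)
        return FakeStepResult(terminated=self.counter >= self.goal)


Policy = Callable[[dict[str, dict[str, Any]]], tuple[str, dict[str, Any]]]


def run_episode(benchmark: FakeBenchmark, policy: Policy, step_limit: int = 30) -> int:
    """Observe, choose an action, step; stop on termination or the step limit."""
    for steps_taken in range(1, step_limit + 1):
        observation = benchmark.observe()
        action, parameters = policy(observation)
        result = benchmark.step(action, parameters)
        if result.terminated:
            return steps_taken
    return step_limit


def always_increment(observation: dict[str, dict[str, Any]]) -> tuple[str, dict[str, Any]]:
    # A trivial policy; a real agent would query a language model here.
    return "increment", {"by": 2}


steps = run_episode(FakeBenchmark(goal=5), always_increment)
```

With a real Benchmark instance, the policy would usually be a language model call that receives the observation (or the prompts from observe_with_prompt()) and returns an action name with parameters.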
- crab.core.benchmark.create_benchmark(config: BenchmarkConfig) → Benchmark
Creates a benchmark from a BenchmarkConfig.
crab.core.decorators module
crab.core.environment module
- class crab.core.environment.Environment(name: str, action_space: list[Action], observation_space: list[Annotated[Action, AfterValidator(func=_check_no_param)]], description: str = '', reset: Action | None = None, remote_url: str | None = None, extra_attributes: dict[str, Any] = {})
Bases: object
A crab environment for language model agent interaction and evaluation.
This class supports action execution and observation within a simulated or actual ecosystem. The environment is defined by customizable action and observation spaces, comprising various crab actions. Actions should include comprehensive docstrings to facilitate agent understanding and interaction.
Typically, users instantiate this class directly to perform actions within the local execution context (i.e., the device running the crab framework). This class may also serve as a base for specialized environments requiring unique action execution processes, such as forwarding actions to remote systems for execution. This is achieved by overriding the take_action method.
Actions defined in the action_space, observation_space, or reset, as well as those invoked through the take_action method that include an env parameter, will have this parameter automatically populated with the current environment instance. This allows actions to access and manipulate environment states and variables.
- name
The name of the environment.
- Type:
str
- description
A description of the environment.
- Type:
str
- trajectory
A record of actions taken, their parameters, and the results.
- Type:
List[tuple[str, dict[str, Any], Any]]
- Parameters:
name (str) – The name of the environment.
action_space (List[Action]) – A list of actions that can be executed, defining the possible interactions agents can undertake.
observation_space (List[ClosedAction]) – A list of observations defining the possible states agents can perceive.
description (str, optional) – A textual description of the environment. Defaults to an empty string.
reset (Action | None, optional) – An action to reset the environment to its initial state. Defaults to None.
remote_url (str | None, optional) – If set, actions are executed on the remote machine at this URL; otherwise they are executed locally. Example: http://192.168.1.1:8000. Defaults to None.
- observe() → dict[str, Any]
Observes the current state.
- Returns:
A dictionary containing the current observations. Keys represent the names of the observation actions.
- Return type:
Dict[str, Any]
- observe_with_prompt(prompt_tools: dict[str, Action]) → tuple[dict[str, Any], dict[str, Any]]
Observes the current state and applies the given prompting tools to the observations.
- set_action(action: Action) → None
Adds an action to the environment’s action space, or replaces the existing action if one with the same name already exists.
- Parameters:
action (Action) – The action to replace or add.
- step(action_name: str, parameters: dict[str, Any] = {})
Executes an action from the action space and records it in the trajectory.
- Parameters:
action_name – Name of the action to execute. Must be in action space.
parameters (dict[str, Any], optional) – Parameters for the action. Defaults to an empty dict.
- Returns:
The result of the action execution.
- Return type:
Any
- Raises:
ActionNotFound – If the action is not found within the environment’s action space.
- take_action(action: Action, parameters: dict[str, Any] = {}) → Any
Executes an action within the environment.
- Parameters:
action (Action) – The action to execute. Can be an action name or an Action object.
parameters (dict[str, Any], optional) – Parameters for the action. Defaults to an empty dict.
- Returns:
The result of the action execution.
- Return type:
Any
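The interplay between step(), take_action(), the trajectory, and the automatically populated env parameter can be sketched as follows. `MiniEnvironment` is a hypothetical, heavily simplified stand-in and not the library class: plain functions replace Action objects, and KeyError stands in for ActionNotFound.

```python
from __future__ import annotations

import inspect
from typing import Any, Callable


class MiniEnvironment:
    """Hypothetical, simplified stand-in for Environment."""

    def __init__(self, name: str, actions: list[Callable[..., Any]]) -> None:
        self.name = name
        # Plain functions keyed by name stand in for Action objects.
        self.action_space = {fn.__name__: fn for fn in actions}
        self.trajectory: list[tuple[str, dict[str, Any], Any]] = []
        self.state: dict[str, Any] = {}

    def take_action(self, action: Callable[..., Any], parameters: dict[str, Any]) -> Any:
        kwargs = dict(parameters)
        # Populate "env" automatically when the action declares that parameter.
        if "env" in inspect.signature(action).parameters:
            kwargs["env"] = self
        return action(**kwargs)

    def step(self, action_name: str, parameters: dict[str, Any] | None = None) -> Any:
        parameters = parameters or {}
        if action_name not in self.action_space:
            # The documented class raises ActionNotFound; KeyError stands in here.
            raise KeyError(action_name)
        result = self.take_action(self.action_space[action_name], parameters)
        # Record the action name, its parameters, and the result.
        self.trajectory.append((action_name, parameters, result))
        return result


def set_value(key: str, value: int, env: MiniEnvironment) -> str:
    """Store a value in the environment state."""
    env.state[key] = value
    return "ok"


def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b


env = MiniEnvironment("local", [set_value, add])
env.step("set_value", {"key": "x", "value": 1})
total = env.step("add", {"a": 2, "b": 3})
```

A subclass that forwards actions to a remote system would, in this sketch, only need to override take_action, which matches the extension point described in the class documentation.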
crab.core.exceptions module
crab.core.graph_evaluator module
- class crab.core.graph_evaluator.GraphEvaluator(incoming_graph_data, enable_shortcut: bool = False)
Bases: object
- step(envs: dict[str, Environment], default_env: str = 'root')
- static visualize(evaluators: list[GraphEvaluator], path: str)
crab.core.task_generator module
- class crab.core.task_generator.TaskGenerator(attribute_pool: dict[str, list] = {}, subtasks: list[SubTask] = [])
Bases: object
Class to generate tasks based on a directed graph of subtasks.
- combine(current_description: str, target_description: str) → str
Combines two task descriptions into a single task description using a GPT model.
- Parameters:
current_description (str) – The current task description.
target_description (str) – The target task description to combine.
- Returns:
The combined task description.
- Return type:
str
- combine_subtask_list(subtask_list: list[SubTask])
Combines a list of subtasks into a single task sequence.
- Parameters:
subtask_list (list) – A list of SubTask instances to combine.
- Returns:
A tuple containing the final task description and a directed graph of the task sequence.
- Return type:
tuple
- combine_two_subtasks(sub_task_id_1: int, sub_task_id_2: int) → tuple[str, DiGraph]
Combines two subtasks into a single task sequence based on user input.
- Parameters:
sub_task_id_1 (int) – ID of the first subtask.
sub_task_id_2 (int) – ID of the second subtask.
- Returns:
A tuple containing the combined task description and a directed graph of the task sequence.
- Return type:
tuple
- static dump_generated_task(description, task_instance_graph, dir_path='.')
Saves a generated task to a file.
- Parameters:
description (str) – The description of the generated task.
task_instance_graph (nx.DiGraph) – The directed graph of the task instance.
dir_path (str) – The directory path where the task file will be saved.
- classmethod from_config(config_path: str) → TaskGenerator
Class method to create a TaskGenerator instance from a configuration file.
- Parameters:
config_path (str) – Path to the YAML configuration file.
- Returns:
An instance of TaskGenerator.
- Return type:
TaskGenerator
- static generate_evaluator(subtasks_graph: DiGraph)
Generates an evaluator graph from a directed graph of subtask instances.
- Parameters:
subtasks_graph (nx.DiGraph) – A directed graph of subtask instances.
- Returns:
A directed graph representing the combined evaluator.
- Return type:
nx.DiGraph
- static generate_single_node_task(subtask: SubTask)
Generates a single node task based on a SubTask instance.
- Parameters:
subtask (SubTask) – The subtask to generate a task for.
- Returns:
A tuple containing the task description and a directed graph of the task.
- Return type:
tuple
- get_task_from_file(file_name) → Task
Loads a task from a file.
- Parameters:
file_name (str) – The file name containing the task data.
- Returns:
An instance of Task loaded from the file.
- Return type:
Task
- gpt_choice(current_description: str, outgoing_edges: list[tuple[SubTask, SubTask, str]]) → tuple[SubTask, dict[str, str], str, str]
Determines the best task choice from a list of possible target tasks using a GPT model.
- Parameters:
current_description (str) – Description of the current task.
outgoing_edges (list) – List of possible outgoing edges representing target tasks.
- Returns:
A tuple containing the chosen SubTask, attributes, new description, and combined description.
- Return type:
tuple
- graph_generation(subtask_list: list[SubTask]) → None
Generates a directed graph from a list of subtasks based on output and input types.
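The idea of linking subtasks by matching output and input types can be sketched as below. This is an assumption-laden illustration, not the library's code: `MiniSubTask` and its field names are hypothetical, each subtask has at most one input type, and a plain adjacency dict replaces the DiGraph used by the library.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MiniSubTask:
    """Hypothetical stand-in for SubTask; the field names are assumptions."""

    id: str
    input_type: str | None
    output_type: str


def build_graph(subtasks: list[MiniSubTask]) -> dict[str, list[str]]:
    """Add an edge u -> v whenever u's output type matches v's input type."""
    graph: dict[str, list[str]] = {task.id: [] for task in subtasks}
    for source in subtasks:
        for target in subtasks:
            if source is not target and source.output_type == target.input_type:
                graph[source.id].append(target.id)
    return graph


# Hypothetical subtasks chained by the type of data they produce and consume.
subtasks = [
    MiniSubTask("find_file", None, "file_path"),
    MiniSubTask("open_file", "file_path", "text"),
    MiniSubTask("summarize", "text", "text"),
]
graph = build_graph(subtasks)
```

In this sketch, self-loops are excluded, so "summarize" has no outgoing edge even though its output type equals its own input type.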
- random_walk(current_description: str, start_node: SubTask, random_number: int) → tuple[SubTask, dict[str, str]] | None
Performs a random walk from the starting node to generate a task sequence.
- Parameters:
current_description (str) – The current task description.
start_node (SubTask) – The starting subtask node.
random_number (int) – Maximum number of edges to consider.
- Returns:
A tuple containing the next SubTask and its attributes if a next step is available; otherwise None.
- Return type:
tuple | None
- task_generation(start_id: int | None = None, max_iter: int = 3, random_number: int = 5) → tuple[str, list[SubTask]]
Generates a sequence of tasks starting from a given subtask ID or randomly.
- Parameters:
start_id (int | None) – The ID of the starting subtask or None to choose randomly.
max_iter (int) – The maximum number of iterations to perform in the generation process.
random_number (int) – The maximum number of neighbors to consider for random walk.
- Returns:
A tuple containing the final task description and a list of SubTask objects.
- Return type:
tuple
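The roles of max_iter and random_number can be illustrated with a simplified walk over an adjacency dict. This sketch makes several assumptions: node names are plain strings, each iteration appends at most one subtask, and a seeded random pick stands in for the GPT-based choice that the documented methods use. `random_walk_step` and `generate_sequence` are hypothetical helper names.

```python
from __future__ import annotations

import random


def random_walk_step(
    graph: dict[str, list[str]], node: str, random_number: int, rng: random.Random
) -> str | None:
    """Sample up to `random_number` outgoing edges, then pick one of them."""
    neighbors = graph.get(node, [])
    if not neighbors:
        return None
    candidates = rng.sample(neighbors, min(random_number, len(neighbors)))
    # The documented method asks a GPT model to choose; a random pick stands in.
    return rng.choice(candidates)


def generate_sequence(
    graph: dict[str, list[str]],
    start: str | None = None,
    max_iter: int = 3,
    random_number: int = 5,
    seed: int = 0,
) -> list[str]:
    """Walk the graph for at most `max_iter` steps from `start` or a random node."""
    rng = random.Random(seed)
    node = start if start is not None else rng.choice(sorted(graph))
    sequence = [node]
    for _ in range(max_iter):
        next_node = random_walk_step(graph, node, random_number, rng)
        if next_node is None:
            # Dead end: no outgoing edges from the current subtask.
            break
        node = next_node
        sequence.append(node)
    return sequence


# Hypothetical subtask graph.
subtask_graph = {
    "find_file": ["open_file"],
    "open_file": ["summarize"],
    "summarize": [],
}
sequence = generate_sequence(subtask_graph, "find_file")
```

The walk ends early when it reaches a node without outgoing edges, which mirrors the None return value documented for random_walk().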
- crab.core.task_generator.generate_length1_all(generator: TaskGenerator, dir_path: str, subtask_collection: list)
Generates tasks for all subtasks in a collection and saves them.
- Parameters:
generator (TaskGenerator) – The task generator instance.
dir_path (str) – The directory path where the tasks will be saved.
subtask_collection (list) – The collection of subtasks to generate tasks for.
- crab.core.task_generator.generate_length1_by_id(generator: TaskGenerator, dir_path: str)
Generates a single task for a specified subtask ID and saves it.
- Parameters:
generator (TaskGenerator) – The task generator instance.
dir_path (str) – The directory path where the task will be saved.
- crab.core.task_generator.generate_length2_manual(generator: TaskGenerator, dir_path: str)
Manually generates a two-step task sequence from user-specified subtask IDs and saves it.
- Parameters:
generator (TaskGenerator) – The task generator instance.
dir_path (str) – The directory path where the task sequence will be saved.