crab.core package

Subpackages#

Submodules#

crab.core.benchmark module#

class crab.core.benchmark.Benchmark(name: str, tasks: list[Task], environments: list[Environment], default_env: str | None = None, multienv: bool = False, prompting_tools: dict[str, dict[str, Action]] = {}, root_action_space: list[Action] = [], step_limit: int = 30, common_setup: list[Annotated[Action, AfterValidator(func=_check_no_param)]] = [])[source]#

Bases: object

The crab benchmark controller managing environments and agent evaluation.

The class manages multiple environments together and provides a simple API, consisting of step(), observe() and reset(), for language model agents to perform tasks across those environments.

This class introduces a “root” environment with no action or observation capabilities, intended as a utility for evaluations not directly tied to a specific environment.

This class operates in two distinct modes: “multi-environment” and “single-environment”. In multi-environment mode, observations and action results are kept separate for each environment and returned as a dictionary. In single-environment mode, all observations and action outcomes are merged under the “root” environment, and actions are routed to their respective environments.
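
The difference between the two modes can be pictured with plain dictionaries. The sketch below is illustrative only: merge_under_root is a hypothetical helper that is not part of crab, the environment names "ubuntu" and "android" are made up, and the flat merge (a later environment overwrites an earlier one when names clash) is an assumption about how results might be combined.

```python
from __future__ import annotations

from typing import Any

ROOT = "root"  # name of the utility environment described above


def merge_under_root(
    per_env: dict[str, dict[str, Any]],
) -> dict[str, dict[str, Any]]:
    """Hypothetical helper: collapse per-environment results under "root".

    Assumption: names are merged flat, and a later environment overwrites
    an earlier one when two observation names clash.
    """
    merged: dict[str, Any] = {}
    for observations in per_env.values():
        merged.update(observations)
    return {ROOT: merged}


# Multi-environment shape: one inner dict per environment name.
multi_env_result = {
    "ubuntu": {"screenshot": "<ubuntu image>"},
    "android": {"screen_text": "Settings"},
}

# Single-environment shape: everything sits under the "root" key.
single_env_result = merge_under_root(multi_env_result)
```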

close_task() None[source]#

Cleans up after a task is completed.

evaluate()[source]#

export_action_space() dict[str, list[Action]][source]#

Returns the action spaces from all environments.

Returns:

A dict of action lists for each environment, keyed by environment name.

get_env_descriptions() dict[str, str][source]#

Get environment descriptions as a dict structure.

human_evaluation(task_id: str) None[source]#

observe() dict[str, dict[str, Any]][source]#

Collects observations from all environments.

Returns:

A dict-of-dict with observations from each environment. The first level keys are environment names, the second level keys are observation action names.

observe_with_prompt() tuple[dict[str, dict[str, Any]], dict[str, tuple[str, MessageType]]][source]#

Collects observations and applies prompting tools.

Returns:

A tuple (observations, prompts), where observations holds the observations from each environment and prompts holds the results of applying the prompting tools to them. The first-level keys are environment names and the second-level keys are observation action names. Note that a dict can be empty if no prompting tool was set for it.

reset() None[source]#

Resets all environments and the current task.

start_task(task_id: str) tuple[Task, dict[str, list[Action]]][source]#

Initializes and starts a specified task.

Parameters:

task_id – The ID of the task to start.

Returns:

A tuple (task, action_space), where task is the started task object and action_space is a dict of action lists with the same dict[str, list[Action]] type that export_action_space() returns.

step(action: str, parameters: dict[str, Any] = {}, env_name: str | None = None) StepResult[source]#

Executes a step in the benchmark by performing an action.

Parameters:
  • action – The action to execute.

  • parameters – Parameters for the action.

  • env_name – The name of the environment.

Returns:

The result of the step, including observations and evaluation metrics. Note that the truncated field of the result is currently not meaningful.

crab.core.benchmark.create_benchmark(config: BenchmarkConfig) Benchmark[source]#

Creates a benchmark from a BenchmarkConfig.

crab.core.decorators module#

crab.core.decorators.action(*args: Callable, env_name: str | None = None, local=False)[source]#

Use @action to convert a function into an Action.

crab.core.decorators.evaluator(*args: Callable, require_submit: bool = False, env_name: str | None = None, local=False)[source]#

Use @evaluator to convert a function into an Evaluator.
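
The signatures above take the decorated function through *args and the options as keyword arguments, which suggests that both decorators can be applied bare or with options. The sketch below shows that general pattern with stand-ins: SketchAction and sketch_action are hypothetical names, not crab's implementation, and the real Action class has more to it than these fields.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SketchAction:
    """Stand-in for crab's Action; the fields here are illustrative."""

    name: str
    description: str
    entry: Callable[..., Any]
    env_name: str | None = None
    local: bool = False

    def run(self, **parameters: Any) -> Any:
        return self.entry(**parameters)


def sketch_action(*args: Callable, env_name: str | None = None, local: bool = False):
    """Works both as @sketch_action and as @sketch_action(env_name=...)."""

    def wrap(func: Callable[..., Any]) -> SketchAction:
        return SketchAction(
            name=func.__name__,
            description=(func.__doc__ or "").strip(),
            entry=func,
            env_name=env_name,
            local=local,
        )

    if args:  # bare form: the decorated function arrives as args[0]
        return wrap(args[0])
    return wrap  # called form: return the actual decorator


@sketch_action
def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b


@sketch_action(env_name="ubuntu")  # "ubuntu" is a made-up environment name
def echo(text: str) -> str:
    """Return the text unchanged."""
    return text
```

In this sketch the docstring becomes the action description, which is one reason the Environment documentation asks for comprehensive docstrings on actions.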

crab.core.environment module#

class crab.core.environment.Environment(name: str, action_space: list[Action], observation_space: list[Annotated[Action, AfterValidator(func=_check_no_param)]], description: str = '', reset: Action | None = None, remote_url: str | None = None, extra_attributes: dict[str, Any] = {})[source]#

Bases: object

A crab environment for language model agent interaction and evaluation.

This class supports action execution and observation within a simulated or actual ecosystem. The environment is defined by customizable action and observation spaces, comprising various crab actions. Actions should include comprehensive docstrings to facilitate agent understanding and interaction.

Typically, users instantiate this class directly to perform actions within the local execution context (i.e., the device running the crab framework). This class may also serve as a base for specialized environments requiring unique action execution processes, such as forwarding actions to remote systems for execution. This is achieved by overriding the take_action method.

Actions defined in the action_space, observation_space, or reset, as well as those invoked through the take_action method that include an env parameter, will have this parameter automatically populated with the current environment instance. This allows actions to access and manipulate environment states and variables.
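
One way to picture the automatic env parameter is signature inspection: the environment passes itself in only when the function declares a parameter named env. The following is a minimal sketch under that assumption. SketchEnvironment and its state attribute are hypothetical stand-ins, and crab's own mechanism may differ.

```python
from __future__ import annotations

import inspect
from typing import Any, Callable


class SketchEnvironment:
    """Stand-in environment holding a little state for actions to use."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state: dict[str, Any] = {}

    def take_action(
        self,
        func: Callable[..., Any],
        parameters: dict[str, Any] | None = None,
    ) -> Any:
        kwargs = dict(parameters or {})
        # Inject this environment only if the function asks for `env`.
        if "env" in inspect.signature(func).parameters:
            kwargs["env"] = self
        return func(**kwargs)


def set_flag(value: bool, env: SketchEnvironment) -> str:
    """Store a flag in the environment state."""
    env.state["flag"] = value
    return f"flag set in {env.name}"


def ping() -> str:
    """An action that does not need the environment."""
    return "pong"


env = SketchEnvironment("local")
first = env.take_action(set_flag, {"value": True})  # env is filled in
second = env.take_action(ping)                      # nothing is injected
```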

name#

The name of the environment.

Type:

str

description#

A description of the environment.

Type:

str

trajectory#

A record of actions taken, their parameters, and the results.

Type:

List[tuple[str, dict[str, Any], Any]]

Parameters:
  • name (str) – The name of the environment.

  • action_space (List[Action]) – A list of actions that can be executed, defining the possible interactions agents can undertake.

  • observation_space (List[ClosedAction]) – A list of observations defining the possible states agents can perceive.

  • description (str, optional) – A textual description of the environment. Defaults to an empty string.

  • reset (Action | None, optional) – An action to reset the environment to its initial state. Defaults to None.

  • remote_url (str | None, optional) – If set, actions are executed on the remote machine at this URL; otherwise they are executed locally. Example: http://192.168.1.1:8000. Defaults to None.

property action_space: list[Action]#

close() None[source]#

Closes the environment, performing any necessary cleanup.

property observation_space: list[Annotated[Action, AfterValidator(func=_check_no_param)]]#

observe() dict[str, Any][source]#

Observes the current state.

Returns:

A dictionary containing the current observations. Keys represent the names of the observation actions.

Return type:

Dict[str, Any]

observe_with_prompt(prompt_tools: dict[str, Action]) tuple[dict[str, Any], dict[str, Any]][source]#

Observes the current state with prompt.
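
The return type pairs the raw observations with a second dict of prompt results. A rough sketch of that shape follows, assuming that prompt_tools is keyed by observation name and that each tool receives the matching observation. Both points are assumptions, the function and sample data are hypothetical, and plain callables stand in for Action objects.

```python
from __future__ import annotations

from typing import Any, Callable


def observe_with_prompt_sketch(
    observations: dict[str, Any],
    prompt_tools: dict[str, Callable[[Any], Any]],
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (observations, prompts); prompts exist only where a tool is set."""
    prompts = {
        name: tool(observations[name])
        for name, tool in prompt_tools.items()
        if name in observations
    }
    return observations, prompts


# Made-up observations: only "clipboard" has a prompting tool.
observations = {"screenshot": b"\x89PNG", "clipboard": "hello"}
tools = {"clipboard": lambda text: f"The clipboard contains: {text}"}

obs, prompts = observe_with_prompt_sketch(observations, tools)
```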

reset() None[source]#

Resets the environment using the provided reset action.

set_action(action: Action) None[source]#

Adds an action to the environment’s action space, or replaces the existing action if one with the same name already exists.

Parameters:

action (Action) – The action to replace or add.
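
The add-or-replace behaviour can be sketched on a plain list. NamedAction and the free function below are hypothetical stand-ins, and keeping the replaced action at its original position is an assumption of this sketch rather than documented behaviour.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NamedAction:
    """Stand-in for an Action; only the name matters here."""

    name: str
    version: int = 1


def set_action(action_space: list[NamedAction], action: NamedAction) -> None:
    """Replace the action with the same name, or append if none exists."""
    for index, existing in enumerate(action_space):
        if existing.name == action.name:
            action_space[index] = action
            return
    action_space.append(action)


space = [NamedAction("click"), NamedAction("type")]
set_action(space, NamedAction("click", version=2))  # same name: replaced
set_action(space, NamedAction("scroll"))            # new name: appended
```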

start() None[source]#

Starts the environment.

step(action_name: str, parameters: dict[str, Any] = {})[source]#

Executes an action from the action space and records it in the trajectory.

Parameters:
  • action_name – Name of the action to execute. Must be in action space.

  • parameters (dict[str, Any], optional) – Parameters for the action. Defaults to an empty dict.

Returns:

The result of the action execution.

Return type:

Any

Raises:

ActionNotFound – If the action is not found within the environment’s action space.
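
As a rough illustration of the contract described above (look up by name, execute, record in the trajectory, raise if unknown), here is a self-contained sketch. MiniEnvironment is a hypothetical stand-in that uses plain callables, and ActionNotFound is redefined locally so the example runs without crab installed.

```python
from __future__ import annotations

from typing import Any, Callable


class ActionNotFound(ValueError):
    """Local stand-in mirroring the documented base class (ValueError)."""


class MiniEnvironment:
    def __init__(self, actions: dict[str, Callable[..., Any]]) -> None:
        self._actions = actions
        # Same shape as the documented trajectory attribute.
        self.trajectory: list[tuple[str, dict[str, Any], Any]] = []

    def step(self, action_name: str, parameters: dict[str, Any] | None = None) -> Any:
        if action_name not in self._actions:
            raise ActionNotFound(f'Action "{action_name}" is not in the action space')
        parameters = dict(parameters or {})
        result = self._actions[action_name](**parameters)
        self.trajectory.append((action_name, parameters, result))
        return result


env = MiniEnvironment({"add": lambda a, b: a + b})
total = env.step("add", {"a": 1, "b": 2})

try:
    env.step("missing")
except ValueError as error:  # ActionNotFound is a ValueError subclass
    failure = str(error)
```

Because the exception derives from ValueError, callers that already handle ValueError will also catch it.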

take_action(action: Action, parameters: dict[str, Any] = {}) Any[source]#

Executes an action within the environment.

Parameters:
  • action (Action) – The action to execute. Can be an action name or an Action object.

  • parameters (dict[str, Any], optional) – Parameters for the action. Defaults to an empty dict.

Returns:

The result of the action execution.

Return type:

Any

crab.core.environment.create_environment(config)[source]#

crab.core.exceptions module#

exception crab.core.exceptions.ActionNotFound[source]#

Bases: ValueError

exception crab.core.exceptions.TaskNotFound[source]#

Bases: ValueError

crab.core.graph_evaluator module#

class crab.core.graph_evaluator.GraphEvaluator(incoming_graph_data, enable_shortcut: bool = False)[source]#

Bases: object

calculate_longest_unfinished_path_length() int[source]#
calculate_step_to_complete() int[source]#
compute_radar_stats() dict[str, float][source]#
entry() bool[source]#
get_completeness() float[source]#
get_completeness_per_action() float[source]#
get_longest_unfinished_path_length() int[source]#
get_next_source_nodes() set[Evaluator][source]#

Get next source nodes to evaluate.

get_step_to_complete() int[source]#
is_complete() bool[source]#
reset()[source]#
stat() dict[str, Any][source]#
step(envs: dict[str, Environment], default_env: str = 'root')[source]#
update()[source]#
static visualize(evaluators: list[GraphEvaluator], path: str)[source]#

crab.core.task_generator module#

class crab.core.task_generator.TaskGenerator(attribute_pool: dict[str, list] = {}, subtasks: list[SubTask] = [])[source]#

Bases: object

Class to generate tasks based on a directed graph of subtasks.

combine(current_description: str, target_description: str) str[source]#

Combines two task descriptions into a single task description using a GPT model.

Parameters:
  • current_description (str) – The current task description.

  • target_description (str) – The target task description to combine.

Returns:

The combined task description.

Return type:

str

combine_subtask_list(subtask_list: list[SubTask])[source]#

Combines a list of subtasks into a single task sequence.

Parameters:

subtask_list (list) – A list of SubTask instances to combine.

Returns:

A tuple containing the final task description and a directed graph of the task sequence.

Return type:

tuple

combine_two_subtasks(sub_task_id_1: int, sub_task_id_2: int) tuple[str, DiGraph][source]#

Combines two subtasks into a single task sequence based on user input.

Parameters:
  • sub_task_id_1 (int) – ID of the first subtask.

  • sub_task_id_2 (int) – ID of the second subtask.

Returns:

A tuple containing the combined task description and a directed graph of the task sequence.

Return type:

tuple

static dump_generated_task(description, task_instance_graph, dir_path='.')[source]#

Saves a generated task to a file.

Parameters:
  • description (str) – The description of the generated task.

  • task_instance_graph (nx.DiGraph) – The directed graph of the task instance.

  • dir_path (str) – The directory path where the task file will be saved.

classmethod from_config(config_path: str) TaskGenerator[source]#

Class method to create a TaskGenerator instance from a configuration file.

Parameters:

config_path (str) – Path to the YAML configuration file.

Returns:

An instance of TaskGenerator.

Return type:

TaskGenerator

static generate_evaluator(subtasks_graph: DiGraph)[source]#

Generates an evaluator graph from a directed graph of subtask instances.

Parameters:

subtasks_graph (nx.DiGraph) – A directed graph of subtask instances.

Returns:

A directed graph representing the combined evaluator.

Return type:

nx.DiGraph

static generate_single_node_task(subtask: SubTask)[source]#

Generates a single node task based on a SubTask instance.

Parameters:

subtask (SubTask) – The subtask to generate a task for.

Returns:

A tuple containing the task description and a directed graph of the task.

Return type:

tuple

get_task_from_file(file_name) Task[source]#

Loads a task from a file.

Parameters:

file_name (str) – The file name containing the task data.

Returns:

An instance of Task loaded from the file.

Return type:

Task

gpt_choice(current_description: str, outgoing_edges: list[tuple[SubTask, SubTask, str]]) tuple[SubTask, dict[str, str], str, str][source]#

Determines the best task choice from a list of possible target tasks using a GPT model.

Parameters:
  • current_description (str) – Description of the current task.

  • outgoing_edges (list) – List of possible outgoing edges representing target tasks.

Returns:

A tuple containing the chosen SubTask, attributes, new description, and combined description.

Return type:

tuple

graph_generation(subtask_list: list[SubTask]) None[source]#

Generates a directed graph from a list of subtasks based on output and input types.
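
The idea of linking subtasks by type can be sketched with an adjacency list: an edge runs from one subtask to another when the first one's output type matches the second one's input type. MiniSubTask and its fields are hypothetical simplifications (a single input type per subtask), and the subtask names are made up. The real SubTask class and graph type may differ.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MiniSubTask:
    """Stand-in subtask; the field names are hypothetical."""

    id: str
    input_type: str | None
    output_type: str | None


def build_graph(subtasks: list[MiniSubTask]) -> dict[str, list[str]]:
    """Adjacency list: edge a -> b when a's output type equals b's input type."""
    graph: dict[str, list[str]] = {task.id: [] for task in subtasks}
    for source in subtasks:
        for target in subtasks:
            if source is target or source.output_type is None:
                continue
            if source.output_type == target.input_type:
                graph[source.id].append(target.id)
    return graph


subtasks = [
    MiniSubTask("find_file", input_type=None, output_type="file_path"),
    MiniSubTask("open_file", input_type="file_path", output_type="text"),
    MiniSubTask("send_message", input_type="text", output_type=None),
]
graph = build_graph(subtasks)
```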

random_walk(current_description: str, start_node: SubTask, random_number: int) tuple[SubTask, dict[str, str]] | None[source]#

Performs a random walk from the starting node to generate a task sequence.

Parameters:
  • current_description (str) – The current task description.

  • start_node (SubTask) – The starting subtask node.

  • random_number (int) – Maximum number of edges to consider.

Returns:

A tuple containing the next SubTask and its attributes if a next step is available; otherwise None.

Return type:

tuple | None
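
A single step of such a walk can be sketched as: take the outgoing edges of the current node, consider at most random_number of them, and pick one, or return None at a dead end. In the sketch below a uniform random pick stands in for whatever selection the real method performs (for example a model-based choice), and nodes are plain strings rather than SubTask objects. All names are hypothetical.

```python
from __future__ import annotations

import random


def random_walk_step(
    graph: dict[str, list[str]],
    start_node: str,
    random_number: int,
    rng: random.Random,
) -> str | None:
    """Pick the next node from at most `random_number` sampled outgoing edges."""
    neighbours = graph.get(start_node, [])
    if not neighbours:
        return None  # dead end: no next step available
    candidates = rng.sample(neighbours, min(random_number, len(neighbours)))
    return rng.choice(candidates)  # random pick as a stand-in for the real choice


graph = {"a": ["b", "c", "d"], "b": ["d"], "c": [], "d": []}
rng = random.Random(0)  # seeded so repeated runs follow the same path

path = ["a"]
for _ in range(3):  # plays the role of max_iter in task_generation
    next_node = random_walk_step(graph, path[-1], random_number=2, rng=rng)
    if next_node is None:
        break
    path.append(next_node)
```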

task_generation(start_id: int | None = None, max_iter: int = 3, random_number: int = 5) tuple[str, list[SubTask]][source]#

Generates a sequence of tasks starting from a given subtask ID or randomly.

Parameters:
  • start_id (int | None) – The ID of the starting subtask or None to choose randomly.

  • max_iter (int) – The maximum number of iterations to perform in the generation process.

  • random_number (int) – The maximum number of neighbors to consider for random walk.

Returns:

A tuple containing the final task description and a list of SubTask objects.

Return type:

tuple

crab.core.task_generator.generate_length1_all(generator: TaskGenerator, dir_path: str, subtask_collection: list)[source]#

Generates tasks for all subtasks in a collection and saves them.

Parameters:
  • generator (TaskGenerator) – The task generator instance.

  • dir_path (str) – The directory path where the tasks will be saved.

  • subtask_collection (list) – The collection of subtasks to generate tasks for.

crab.core.task_generator.generate_length1_by_id(generator: TaskGenerator, dir_path: str)[source]#

Generates a single task for a specified subtask ID and saves it.

Parameters:
  • generator (TaskGenerator) – The task generator instance.

  • dir_path (str) – The directory path where the task will be saved.

crab.core.task_generator.generate_length2_manual(generator: TaskGenerator, dir_path: str)[source]#

Manually generates a two-step task sequence from user-specified subtask IDs and saves it.

Parameters:
  • generator (TaskGenerator) – The task generator instance.

  • dir_path (str) – The directory path where the task sequence will be saved.

crab.core.task_generator.load_subtasks(version)[source]#

Loads subtasks from specified benchmark version modules.

Parameters:

version (str) – The version of the benchmark to load subtasks from.

Returns:

A tuple containing two collections of subtasks.

Return type:

tuple

crab.core.task_generator.main()[source]#

crab.core.vagrant_manager module#

Module contents#