Refactor voice control core and robot behavior

2026-02-02 12:29:59 -03:00
parent b9798a2f46
commit 695d309816
36 changed files with 3436 additions and 1065 deletions
--- a/dora_voice_control/README.md
+++ b/dora_voice_control/README.md
@@ -1,131 +1,129 @@
 # Dora Voice Control Node

-A Dora node that processes Spanish voice commands from children and translates them into robot actions (movement, grasping, releasing objects). Includes a web debug interface.
+Dora node that processes Spanish voice commands and translates them into robot actions. Supports multiple robot types via robot subfolders.

 ## Features

- Spanish voice command parsing (rule-based or Gemini LLM)
+- Spanish voice command parsing (rule-based or LLM)
+- Robot adapter pattern for different gripper types
 - Real-time web debug interface
 - Command queue management
 - Workspace bounds validation
- Object detection integration

 ## File Structure

-```
+```text
 dora_voice_control/
-├── __init__.py
-├── main.py        # Main Dora node entry point
-├── api.py         # FastAPI web server
-├── config.py      # Configuration management
-├── models.py      # Pydantic request/response models
-├── parser.py      # Voice command parsing logic
-├── state.py       # Shared state management
-└── templates.py   # HTML template for web interface
+├── main.py                 # Thin orchestrator
+│
+├── core/                   # Shared logic
+│   ├── behavior.py         # RobotBehavior with actions
+│   ├── config.py           # Configuration classes
+│   ├── node.py             # Dora adapter + dispatcher + context
+│   ├── robot.py            # RobotAdapter base
+│   ├── robot_io.py         # Pose/status/image handlers + command queue
+│   ├── scene.py            # Scene state + notifier + objects handler
+│   ├── state.py            # Thread-safe shared state
+│   └── voice.py            # Voice input + parsing + intents
+│
+├── robots/                 # Robot-specific implementations
+│   └── littlehand/         # Vacuum gripper robot
+│       ├── adapter.py      # Vacuum adapter
+│       ├── actions.py      # Action vocabulary
+│       └── behavior.py     # Behavior binding
+│
+└── web/                    # Web interface
+    ├── api.py              # FastAPI server
+    ├── models.py           # Pydantic models
+    └── templates.py        # HTML template
 ```

-## Web Debug Interface
+## Robot Adapters

-Access the debug interface at `http://localhost:8080` (default).
+Set `ROBOT_TYPE` to select the robot package:

-Features:
- Real-time status monitoring (pose, objects, queue)
- Send manual voice commands
- Quick command buttons
- View parse results
- Command history
- Clear queue
+| Type | Grab Command | Release Command |
+|------|--------------|-----------------|
+| `littlehand` (alias: `vacuum`) | `vacuum_on` | `vacuum_off` |
+
+To add a new robot, create a new subfolder under `robots/` with its adapter and behavior, then register it in `robots/__init__.py`.

 ## Inputs/Outputs

-| Input         | Type   | Description                              |
-|---------------|--------|------------------------------------------|
-| `voice_in`    | string | Text transcription of voice command      |
-| `tcp_pose`    | array  | Current robot pose [x, y, z, roll, pitch, yaw] |
-| `objects`     | JSON   | Detected objects from vision system      |
-| `status`      | JSON   | Command execution status from robot      |
+| Input | Type | Description |
+|-------|------|-------------|
+| `voice_in` | string | Voice command text |
+| `tcp_pose` | array | Robot pose [x, y, z, roll, pitch, yaw] |
+| `objects` | JSON | Detected objects |
+| `status` | JSON | Command execution status |
+| `image_annotated` | array | Camera image |

-| Output        | Type   | Description                              |
-|---------------|--------|------------------------------------------|
-| `robot_cmd`   | JSON   | Robot command with action and payload    |
-| `voice_out`   | JSON   | Response confirmation to user            |
-| `scene_update`| JSON   | Updated scene with all visible objects   |
+| Output | Type | Description |
+|--------|------|-------------|
+| `robot_cmd` | JSON | Robot command |
+| `voice_out` | JSON | Response to user |
+| `scene_update` | JSON | Scene state |

 ## Supported Commands (Spanish)

-| Command       | Action         | Example                        |
-|---------------|----------------|--------------------------------|
-| `subir`       | Move up        | "sube"                         |
-| `bajar`       | Move down      | "baja"                         |
-| `tomar`       | Grab object    | "agarra el cubo rojo"          |
-| `soltar`      | Release object | "suelta en la caja azul"       |
-| `ir`          | Go to object   | "ve al cilindro"               |
-| `reiniciar`   | Reset          | "reinicia"                     |
+| Command | Action | Example |
+|---------|--------|---------|
+| `subir` | Move up | "sube" |
+| `bajar` | Move down | "baja" |
+| `tomar` | Grab object | "agarra el cubo rojo" |
+| `soltar` | Release object | "suelta en la caja azul" |
+| `ir` | Go to object | "ve al cilindro" |
+| `reiniciar` | Reset | "reinicia" |

 ## Environment Variables

 ```bash
-# Web API Server
-API_ENABLED=true        # Enable/disable web interface
-API_HOST=0.0.0.0        # Bind address
-API_PORT=8080           # Listen port
+# Robot Configuration
+ROBOT_TYPE=littlehand       # "littlehand" (alias: "vacuum")
+
+# Web API
+API_ENABLED=true
+API_PORT=9001

 # TCP Parameters
-TCP_OFFSET_MM=63.0          # Z-offset to object surface
-APPROACH_OFFSET_MM=50.0     # Safe approach distance above object
-STEP_MM=20.0                # Distance for up/down increments
+TCP_OFFSET_MM=63.0
+APPROACH_OFFSET_MM=50.0
+STEP_MM=20.0

-# LLM Configuration (optional)
-LLM_PROVIDER=rules          # "rules" or "gemini"
-GOOGLE_API_KEY=your_key     # Required if using gemini
-GEMINI_MODEL=gemini-2.0-flash
+# LLM (optional)
+LLM_PROVIDER=rules          # "rules", "gemini", "ollama"
+GOOGLE_API_KEY=your_key

-# Workspace Safety (optional)
-WORKSPACE_MIN_X=-300
-WORKSPACE_MAX_X=300
-WORKSPACE_MIN_Y=-300
-WORKSPACE_MAX_Y=300
+# Initial Position
+INIT_ON_START=true
+INIT_X=300.0
+INIT_Y=0.0
+INIT_Z=350.0
+
+# Safety
+DRY_RUN=false
 WORKSPACE_MIN_Z=0
 WORKSPACE_MAX_Z=500
-
-# Misc
-DRY_RUN=false               # Skip sending robot commands
 ```

-## Installation
+## Web Debug Interface
+
+Access at `http://localhost:8080` (substitute the `API_PORT` value if it differs):
+
+- Camera view with detections
+- Real-time status (pose, objects, queue)
+- Send manual commands
+- View parse results
+
+## API Endpoints

 ```bash
-cd dora_voice_control
-pip install -e .
-
-# With LLM support
-pip install -e ".[llm]"
-```
-
-## Testing
-
-### Web Interface
-
-```bash
-# Start the node (standalone for testing)
-python -m dora_voice_control.main
-
-# Open in browser
-open http://localhost:8080
-```
-
-### API Endpoints
-
-```bash
-# Get status
+# Status
 curl http://localhost:8080/api/status

-# Get objects
+# Objects
 curl http://localhost:8080/api/objects

-# Get queue
-curl http://localhost:8080/api/queue
-
 # Send command
 curl -X POST http://localhost:8080/api/command \
  -H "Content-Type: application/json" \
@@ -135,77 +133,30 @@ curl -X POST http://localhost:8080/api/command \
 curl -X POST http://localhost:8080/api/queue/clear
 ```

-### Python Test
-
-```python
-from dora_voice_control.parser import rule_parse, normalize
-
-# Test command parsing
-text = "agarra el cubo rojo grande"
-result = rule_parse(normalize(text))
-print(result)
-# {'resultado': 'ok', 'accion': 'tomar', 'objeto': 'cubo', 'color': 'rojo', 'tamano': 'grande'}
-```
-
-## Dora Dataflow Configuration
+## Dataflow Example

 ```yaml
-nodes:
-  - id: voice_control
-    build: pip install -e ./dora_voice_control
-    path: dora_voice_control
-    inputs:
-      voice_in: iobridge/voice_in
-      tcp_pose: robot/tcp_pose
-      objects: detector/objects
-      status: robot/status
-    outputs:
-      - robot_cmd
-      - voice_out
-      - scene_update
-    env:
-      API_ENABLED: "true"
-      API_PORT: "8080"
-      DRY_RUN: "false"
+- id: voice
+  build: uv pip install -e dora_voice_control
+  path: dora_voice_control/dora_voice_control/main.py
+  env:
+    ROBOT_TYPE: "vacuum"
+    API_ENABLED: "true"
+  inputs:
+    voice_in: iobridge/text_out
+    tcp_pose: robot/tcp_pose
+    objects: detector/objects
+    status: robot/status
+  outputs:
+    - robot_cmd
+    - voice_out
+    - scene_update
 ```

-## Message Examples
+## Adding a New Robot

-### Input: voice_in
-```
-"sube"
-"agarra el cubo rojo"
-"suelta en la caja azul"
-```
-
-### Output: robot_cmd
-```json
-{
-  "id": "550e8400-e29b-41d4-a716-446655440000",
-  "action": "move_to_pose",
-  "payload": {
-    "x": 150.0,
-    "y": 200.0,
-    "z": 280.0,
-    "roll": 180.0,
-    "pitch": 0.0,
-    "yaw": 0.0
-  }
-}
-```
-
-### Output: voice_out
-```json
-{"text": "Ok, voy a subir", "status": "ok"}
-{"text": "No entendi el comando", "status": "error"}
-```
-
-## Dependencies
-
- dora-rs >= 0.3.9
- numpy < 2.0.0
- pyarrow >= 12.0.0
- fastapi >= 0.109.0
- uvicorn >= 0.27.0
- pydantic >= 2.0.0
- google-genai (optional, for Gemini mode)
+1) Create `dora_voice_control/dora_voice_control/robots/<robot_name>/` with:
+   - `adapter.py` implementing a `RobotAdapter`
+   - `actions.py` defining action aliases (can reuse defaults)
+   - `behavior.py` binding the behavior class
+2) Register it in `dora_voice_control/dora_voice_control/robots/__init__.py`
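
Note on the adapter pattern described in the README above: the diff shown here does not include `core/robot.py` or `robots/__init__.py`, so the following is only a minimal sketch of how a `RobotAdapter` base and a `ROBOT_TYPE` registry with an alias could fit together. The method names `grab` and `release`, the `_REGISTRY` and `_ALIASES` tables, the `get_adapter` helper, and the `{"action", "payload"}` command shape are all assumptions made for illustration, not the code in this commit.

```python
import os
from abc import ABC, abstractmethod
from typing import Dict, Type


class RobotAdapter(ABC):
    """Hypothetical base class: turns abstract grab/release into robot commands."""

    @abstractmethod
    def grab(self) -> dict:
        """Return the command that makes the gripper hold an object."""

    @abstractmethod
    def release(self) -> dict:
        """Return the command that makes the gripper let go."""


class VacuumAdapter(RobotAdapter):
    """Sketch of a vacuum-gripper adapter using the README's command names."""

    def grab(self) -> dict:
        return {"action": "vacuum_on", "payload": {}}

    def release(self) -> dict:
        return {"action": "vacuum_off", "payload": {}}


# Assumed registry layout: canonical robot name -> adapter class.
_REGISTRY: Dict[str, Type[RobotAdapter]] = {"littlehand": VacuumAdapter}
# Assumed alias table: "vacuum" resolves to "littlehand", as in the README table.
_ALIASES: Dict[str, str] = {"vacuum": "littlehand"}


def get_adapter(robot_type: str) -> RobotAdapter:
    """Resolve a ROBOT_TYPE value (or alias) to an adapter instance."""
    name = robot_type.strip().lower()
    name = _ALIASES.get(name, name)
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"Unknown ROBOT_TYPE: {robot_type!r}") from None


if __name__ == "__main__":
    # Fall back to "littlehand" when ROBOT_TYPE is unset (an assumed default).
    adapter = get_adapter(os.environ.get("ROBOT_TYPE", "littlehand"))
    print(adapter.grab(), adapter.release())
```

Under this layout, adding a robot would mean writing one more `RobotAdapter` subclass and adding one entry to the registry, which matches the two steps the README lists.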