Add voice control: working, but needs more work

This commit is contained in:
cristhian aguilera
2026-01-31 11:41:50 -03:00
parent 380c466170
commit b9798a2f46
21 changed files with 3101 additions and 0 deletions


@@ -0,0 +1,211 @@
# Dora Voice Control Node
A Dora node that processes Spanish voice commands from children and translates them into robot actions (movement, grasping, releasing objects). Includes a web debug interface.
## Features
- Spanish voice command parsing (rule-based or Gemini LLM)
- Real-time web debug interface
- Command queue management
- Workspace bounds validation
- Object detection integration
## File Structure
```
dora_voice_control/
├── __init__.py
├── main.py # Main Dora node entry point
├── api.py # FastAPI web server
├── config.py # Configuration management
├── models.py # Pydantic request/response models
├── parser.py # Voice command parsing logic
├── state.py # Shared state management
└── templates.py # HTML template for web interface
```
## Web Debug Interface
Access the debug interface at `http://localhost:8080` (default).
Features:
- Real-time status monitoring (pose, objects, queue)
- Send manual voice commands
- Quick command buttons
- View parse results
- Command history
- Clear queue
## Inputs/Outputs
| Input | Type | Description |
|---------------|--------|------------------------------------------|
| `voice_in` | string | Text transcription of voice command |
| `tcp_pose` | array | Current robot pose [x, y, z, roll, pitch, yaw] |
| `objects` | JSON | Detected objects from vision system |
| `status`      | JSON   | Command execution status from robot      |

| Output        | Type   | Description                              |
|---------------|--------|------------------------------------------|
| `robot_cmd` | JSON | Robot command with action and payload |
| `voice_out` | JSON | Response confirmation to user |
| `scene_update`| JSON | Updated scene with all visible objects |
## Supported Commands (Spanish)
| Command | Action | Example |
|---------------|----------------|--------------------------------|
| `subir` | Move up | "sube" |
| `bajar` | Move down | "baja" |
| `tomar` | Grab object | "agarra el cubo rojo" |
| `soltar` | Release object | "suelta en la caja azul" |
| `ir` | Go to object | "ve al cilindro" |
| `reiniciar` | Reset | "reinicia" |
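To make the table above concrete, here is an illustrative sketch of how a rule-based parser could map these Spanish commands to actions. The real logic lives in `dora_voice_control/parser.py` and may differ; the stem table and helper names below are assumptions for illustration only.

```python
# Illustrative rule-based command mapping; NOT the actual parser.py logic.
import unicodedata

# Keyword stems -> canonical actions, mirroring the command table above.
ACTIONS = {
    "sub": "subir",
    "baj": "bajar",
    "agarr": "tomar",
    "tom": "tomar",
    "suelt": "soltar",
    "ve": "ir",
    "reinici": "reiniciar",
}

def normalize(text: str) -> str:
    """Lowercase and strip accents so e.g. 'Sube' matches the 'sub' stem."""
    text = unicodedata.normalize("NFD", text.lower())
    return "".join(c for c in text if unicodedata.category(c) != "Mn")

def rule_parse(text: str) -> dict:
    """Return the first action whose stem starts a word in the input."""
    for word in normalize(text).split():
        for stem, action in ACTIONS.items():
            if word.startswith(stem):
                return {"resultado": "ok", "accion": action}
    return {"resultado": "error"}
```

Matching on stems rather than exact words lets conjugated forms ("sube", "subir", "subiendo") resolve to the same action without listing every variant.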
## Environment Variables
```bash
# Web API Server
API_ENABLED=true         # Enable/disable web interface
API_HOST=0.0.0.0         # Bind address
API_PORT=8080            # Listen port

# TCP Parameters
TCP_OFFSET_MM=63.0       # Z-offset to object surface
APPROACH_OFFSET_MM=50.0  # Safe approach distance above object
STEP_MM=20.0             # Distance for up/down increments

# LLM Configuration (optional)
LLM_PROVIDER=rules       # "rules" or "gemini"
GOOGLE_API_KEY=your_key  # Required if using gemini
GEMINI_MODEL=gemini-2.0-flash

# Workspace Safety (optional)
WORKSPACE_MIN_X=-300
WORKSPACE_MAX_X=300
WORKSPACE_MIN_Y=-300
WORKSPACE_MAX_Y=300
WORKSPACE_MIN_Z=0
WORKSPACE_MAX_Z=500

# Misc
DRY_RUN=false            # Skip sending robot commands
```
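A minimal sketch of reading these variables with typed defaults is shown below. The actual implementation lives in `dora_voice_control/config.py` and may be structured differently; the helper names here are illustrative.

```python
# Sketch of loading the environment variables above with typed defaults.
# The real config.py may differ; helper names are illustrative.
import os

def _env_bool(name: str, default: bool) -> bool:
    """Parse a boolean env var, accepting 'true', '1', or 'yes'."""
    return os.getenv(name, str(default)).strip().lower() in ("1", "true", "yes")

def _env_float(name: str, default: float) -> float:
    return float(os.getenv(name, str(default)))

def load_config() -> dict:
    """Read the documented environment variables, falling back to defaults."""
    return {
        "api_enabled": _env_bool("API_ENABLED", True),
        "api_host": os.getenv("API_HOST", "0.0.0.0"),
        "api_port": int(os.getenv("API_PORT", "8080")),
        "tcp_offset_mm": _env_float("TCP_OFFSET_MM", 63.0),
        "approach_offset_mm": _env_float("APPROACH_OFFSET_MM", 50.0),
        "step_mm": _env_float("STEP_MM", 20.0),
        "dry_run": _env_bool("DRY_RUN", False),
    }
```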
## Installation
```bash
cd dora_voice_control
pip install -e .
# With LLM support
pip install -e ".[llm]"
```
## Testing
### Web Interface
```bash
# Start the node (standalone for testing)
python -m dora_voice_control.main
# Open in browser
open http://localhost:8080
```
### API Endpoints
```bash
# Get status
curl http://localhost:8080/api/status
# Get objects
curl http://localhost:8080/api/objects
# Get queue
curl http://localhost:8080/api/queue
# Send command
curl -X POST http://localhost:8080/api/command \
-H "Content-Type: application/json" \
-d '{"text": "sube"}'
# Clear queue
curl -X POST http://localhost:8080/api/queue/clear
```
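The same endpoints can be driven from Python using only the standard library, which is handy for scripted tests. This is a sketch assuming the server from the curl examples above is running on the default port; the helper names are not part of the package.

```python
# Stdlib-only client for the debug API; endpoint paths match the curl
# examples above. Helper names here are illustrative, not part of the package.
import json
import urllib.request

API_BASE = "http://localhost:8080"

def build_command_request(text: str) -> urllib.request.Request:
    """Build a POST request for /api/command with a JSON body."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/api/command",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def send_command(text: str) -> dict:
    """Send a voice command and return the decoded JSON response."""
    with urllib.request.urlopen(build_command_request(text)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```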
### Python Test
```python
from dora_voice_control.parser import rule_parse, normalize
# Test command parsing
text = "agarra el cubo rojo grande"
result = rule_parse(normalize(text))
print(result)
# {'resultado': 'ok', 'accion': 'tomar', 'objeto': 'cubo', 'color': 'rojo', 'tamano': 'grande'}
```
## Dora Dataflow Configuration
```yaml
nodes:
- id: voice_control
build: pip install -e ./dora_voice_control
path: dora_voice_control
inputs:
voice_in: iobridge/voice_in
tcp_pose: robot/tcp_pose
objects: detector/objects
status: robot/status
outputs:
- robot_cmd
- voice_out
- scene_update
env:
API_ENABLED: "true"
API_PORT: "8080"
DRY_RUN: "false"
```
## Message Examples
### Input: voice_in
```
"sube"
"agarra el cubo rojo"
"suelta en la caja azul"
```
### Output: robot_cmd
```json
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"action": "move_to_pose",
"payload": {
"x": 150.0,
"y": 200.0,
"z": 280.0,
"roll": 180.0,
"pitch": 0.0,
"yaw": 0.0
}
}
```
### Output: voice_out
```json
{"text": "Ok, voy a subir", "status": "ok"}
{"text": "No entendi el comando", "status": "error"}
```
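The `robot_cmd` message above can be sketched together with the workspace bounds validation mentioned in the features list. This is a hedged illustration: the function names are hypothetical, and the bounds mirror the default `WORKSPACE_*` environment variables rather than the node's actual validation code.

```python
# Hypothetical sketch: building a robot_cmd like the example above and
# checking it against the default WORKSPACE_* bounds before dispatch.
import uuid

# Mirrors the default WORKSPACE_MIN/MAX_* values from the env var section.
WORKSPACE = {"x": (-300, 300), "y": (-300, 300), "z": (0, 500)}

def in_workspace(x: float, y: float, z: float) -> bool:
    """True when the target point lies inside the safety box."""
    axes = (WORKSPACE["x"], WORKSPACE["y"], WORKSPACE["z"])
    return all(lo <= v <= hi for v, (lo, hi) in zip((x, y, z), axes))

def make_move_cmd(x: float, y: float, z: float,
                  roll: float = 180.0, pitch: float = 0.0,
                  yaw: float = 0.0) -> dict:
    """Build a move_to_pose command; raise if the target is out of bounds."""
    if not in_workspace(x, y, z):
        raise ValueError(f"target ({x}, {y}, {z}) outside workspace bounds")
    return {
        "id": str(uuid.uuid4()),
        "action": "move_to_pose",
        "payload": {"x": x, "y": y, "z": z,
                    "roll": roll, "pitch": pitch, "yaw": yaw},
    }
```

Rejecting out-of-bounds targets before they reach the robot is what keeps a misparsed child's command from driving the arm somewhere unsafe.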
## Dependencies
- dora-rs >= 0.3.9
- numpy < 2.0.0
- pyarrow >= 12.0.0
- fastapi >= 0.109.0
- uvicorn >= 0.27.0
- pydantic >= 2.0.0
- google-genai (optional, for Gemini mode)