# Dora Voice Control Node A Dora node that processes Spanish voice commands from children and translates them into robot actions (movement, grasping, releasing objects). Includes a web debug interface. ## Features - Spanish voice command parsing (rule-based or Gemini LLM) - Real-time web debug interface - Command queue management - Workspace bounds validation - Object detection integration ## File Structure ``` dora_voice_control/ ├── __init__.py ├── main.py # Main Dora node entry point ├── api.py # FastAPI web server ├── config.py # Configuration management ├── models.py # Pydantic request/response models ├── parser.py # Voice command parsing logic ├── state.py # Shared state management └── templates.py # HTML template for web interface ``` ## Web Debug Interface Access the debug interface at `http://localhost:8080` (default). Features: - Real-time status monitoring (pose, objects, queue) - Send manual voice commands - Quick command buttons - View parse results - Command history - Clear queue ## Inputs/Outputs | Input | Type | Description | |---------------|--------|------------------------------------------| | `voice_in` | string | Text transcription of voice command | | `tcp_pose` | array | Current robot pose [x, y, z, roll, pitch, yaw] | | `objects` | JSON | Detected objects from vision system | | `status` | JSON | Command execution status from robot | | Output | Type | Description | |---------------|--------|------------------------------------------| | `robot_cmd` | JSON | Robot command with action and payload | | `voice_out` | JSON | Response confirmation to user | | `scene_update`| JSON | Updated scene with all visible objects | ## Supported Commands (Spanish) | Command | Action | Example | |---------------|----------------|--------------------------------| | `subir` | Move up | "sube" | | `bajar` | Move down | "baja" | | `tomar` | Grab object | "agarra el cubo rojo" | | `soltar` | Release object | "suelta en la caja azul" | | `ir` | Go to object | "ve al cilindro" | | `reiniciar` | Reset | "reinicia" | ## Environment Variables ```bash # Web API Server API_ENABLED=true # Enable/disable web interface API_HOST=0.0.0.0 # Bind address API_PORT=8080 # Listen port # TCP Parameters TCP_OFFSET_MM=63.0 # Z-offset to object surface APPROACH_OFFSET_MM=50.0 # Safe approach distance above object STEP_MM=20.0 # Distance for up/down increments # LLM Configuration (optional) LLM_PROVIDER=rules # "rules" or "gemini" GOOGLE_API_KEY=your_key # Required if using gemini GEMINI_MODEL=gemini-2.0-flash # Workspace Safety (optional) WORKSPACE_MIN_X=-300 WORKSPACE_MAX_X=300 WORKSPACE_MIN_Y=-300 WORKSPACE_MAX_Y=300 WORKSPACE_MIN_Z=0 WORKSPACE_MAX_Z=500 # Misc DRY_RUN=false # Skip sending robot commands ``` ## Installation ```bash cd dora_voice_control pip install -e . # With LLM support pip install -e ".[llm]" ``` ## Testing ### Web Interface ```bash # Start the node (standalone for testing) python -m dora_voice_control.main # Open in browser open http://localhost:8080 ``` ### API Endpoints ```bash # Get status curl http://localhost:8080/api/status # Get objects curl http://localhost:8080/api/objects # Get queue curl http://localhost:8080/api/queue # Send command curl -X POST http://localhost:8080/api/command \ -H "Content-Type: application/json" \ -d '{"text": "sube"}' # Clear queue curl -X POST http://localhost:8080/api/queue/clear ``` ### Python Test ```python from dora_voice_control.parser import rule_parse, normalize # Test command parsing text = "agarra el cubo rojo grande" result = rule_parse(normalize(text)) print(result) # {'resultado': 'ok', 'accion': 'tomar', 'objeto': 'cubo', 'color': 'rojo', 'tamano': 'grande'} ``` ## Dora Dataflow Configuration ```yaml nodes: - id: voice_control build: pip install -e ./dora_voice_control path: dora_voice_control inputs: voice_in: iobridge/voice_in tcp_pose: robot/tcp_pose objects: detector/objects status: robot/status outputs: - robot_cmd - voice_out - scene_update env: API_ENABLED: "true" API_PORT: "8080" DRY_RUN: "false" ``` ## Message Examples ### Input: voice_in ``` "sube" "agarra el cubo rojo" "suelta en la caja azul" ``` ### Output: robot_cmd ```json { "id": "550e8400-e29b-41d4-a716-446655440000", "action": "move_to_pose", "payload": { "x": 150.0, "y": 200.0, "z": 280.0, "roll": 180.0, "pitch": 0.0, "yaw": 0.0 } } ``` ### Output: voice_out ```json {"text": "Ok, voy a subir", "status": "ok"} {"text": "No entendi el comando", "status": "error"} ``` ## Dependencies - dora-rs >= 0.3.9 - numpy < 2.0.0 - pyarrow >= 12.0.0 - fastapi >= 0.109.0 - uvicorn >= 0.27.0 - pydantic >= 2.0.0 - google-genai (optional, for Gemini mode)