Dora Voice Control Node
A Dora node that processes Spanish voice commands from children and translates them into robot actions (movement, grasping, releasing objects). Includes a web debug interface.
Features
- Spanish voice command parsing (rule-based or Gemini LLM)
- Real-time web debug interface
- Command queue management
- Workspace bounds validation
- Object detection integration
File Structure
```text
dora_voice_control/
├── __init__.py
├── main.py        # Main Dora node entry point
├── api.py         # FastAPI web server
├── config.py      # Configuration management
├── models.py      # Pydantic request/response models
├── parser.py      # Voice command parsing logic
├── state.py       # Shared state management
└── templates.py   # HTML template for web interface
```
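Because the FastAPI server (api.py) and the Dora loop (main.py) both read and modify the same pose, object list and command queue, the shared state needs to be safe to access concurrently. The following is a minimal sketch, assuming the web server runs in a separate thread. The class name `SharedState` and its methods are hypothetical and are not taken from state.py.

```python
import threading
from collections import deque
from typing import Any, Optional


class SharedState:
    """Hypothetical thread-safe store shared by the Dora loop and the web API."""

    def __init__(self, history_size: int = 50) -> None:
        self._lock = threading.Lock()
        self._pose: Optional[list[float]] = None
        self._objects: list[dict[str, Any]] = []
        self._queue: deque = deque()
        # Bounded history: the oldest entries are dropped automatically.
        self._history: deque = deque(maxlen=history_size)

    def update_pose(self, pose: list[float]) -> None:
        with self._lock:
            self._pose = list(pose)

    def update_objects(self, objects: list[dict[str, Any]]) -> None:
        with self._lock:
            self._objects = list(objects)

    def enqueue(self, command: dict[str, Any], text: str) -> None:
        """Queue a parsed command and remember the text that produced it."""
        with self._lock:
            self._queue.append(command)
            self._history.append(text)

    def pop_next(self) -> Optional[dict[str, Any]]:
        """Return the oldest pending command, or None if the queue is empty."""
        with self._lock:
            return self._queue.popleft() if self._queue else None

    def clear_queue(self) -> int:
        """Drop all pending commands and return how many were removed."""
        with self._lock:
            removed = len(self._queue)
            self._queue.clear()
            return removed

    def snapshot(self) -> dict[str, Any]:
        """Return a copy of the current state, for example for a status endpoint."""
        with self._lock:
            return {
                "pose": list(self._pose) if self._pose is not None else None,
                "objects": list(self._objects),
                "queue_length": len(self._queue),
                "history": list(self._history),
            }
```

Returning copies from `snapshot()` means the web handler can serialize the result without holding the lock.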
Web Debug Interface
Access the debug interface at http://localhost:8080 (default).
Features:
- Real-time status monitoring (pose, objects, queue)
- Send manual voice commands
- Quick command buttons
- View parse results
- Command history
- Clear queue
Inputs/Outputs
| Input | Type | Description |
|---|---|---|
| voice_in | string | Text transcription of voice command |
| tcp_pose | array | Current robot pose [x, y, z, roll, pitch, yaw] |
| objects | JSON | Detected objects from vision system |
| status | JSON | Command execution status from robot |
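The `tcp_pose` input is a plain six-element array, so it is easy to swap fields by accident. One way to guard against this is to convert it into named fields and reject arrays of the wrong length. The sketch below assumes millimetres for x, y, z (consistent with the `*_MM` settings) and degrees for roll, pitch, yaw (consistent with `"roll": 180.0` in the robot_cmd example). `TcpPose` is a hypothetical name.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TcpPose:
    """Hypothetical typed view of the tcp_pose input.

    Units are assumed: millimetres for x, y, z and degrees for roll, pitch, yaw.
    """

    x: float
    y: float
    z: float
    roll: float
    pitch: float
    yaw: float

    @classmethod
    def from_array(cls, values: Sequence[float]) -> "TcpPose":
        # Reject malformed messages instead of silently mis-assigning fields.
        if len(values) != 6:
            raise ValueError(
                f"expected [x, y, z, roll, pitch, yaw] (6 values), got {len(values)}"
            )
        return cls(*(float(v) for v in values))
```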
| Output | Type | Description |
|---|---|---|
| robot_cmd | JSON | Robot command with action and payload |
| voice_out | JSON | Response confirmation to user |
| scene_update | JSON | Updated scene with all visible objects |
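The `robot_cmd` and `voice_out` outputs are JSON strings. The sketch below builds them in the same shape as the examples under Message Examples: a UUID `id`, an `action` and a `payload` for robot commands, and `text` plus an `"ok"`/`"error"` `status` for voice replies. The helper names are hypothetical, and the assumption that the node uses random (version 4) UUIDs is inferred only from the example id.

```python
import json
import uuid
from typing import Any


def make_robot_cmd(action: str, payload: dict[str, Any]) -> str:
    """Serialize a robot_cmd message with a fresh random UUID as its id."""
    return json.dumps({"id": str(uuid.uuid4()), "action": action, "payload": payload})


def make_voice_out(text: str, ok: bool) -> str:
    """Serialize a voice_out message; ensure_ascii=False keeps Spanish accents readable."""
    return json.dumps({"text": text, "status": "ok" if ok else "error"}, ensure_ascii=False)
```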
Supported Commands (Spanish)
| Command | Action | Example |
|---|---|---|
| subir | Move up | "sube" |
| bajar | Move down | "baja" |
| tomar | Grab object | "agarra el cubo rojo" |
| soltar | Release object | "suelta en la caja azul" |
| ir | Go to object | "ve al cilindro" |
| reiniciar | Reset | "reinicia" |
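In rule-based mode, each example phrase has to be mapped to one of the actions in the table. Children conjugate verbs freely ("sube", "súbelo", "subir"), so one simple approach is to match verb stems after lowercasing and removing accents. The sketch below illustrates this idea. The stem lists, `GO_WORDS`, and `match_action` are hypothetical and are not the rules used in parser.py. This `normalize` is likewise an illustrative stand-in for the module's own function.

```python
import re
import unicodedata
from typing import Optional

# Hypothetical verb stems per action; the real rule set in parser.py may differ.
VERB_PREFIXES: list[tuple[tuple[str, ...], str]] = [
    (("sub",), "subir"),
    (("baj",), "bajar"),
    (("agarr", "tom"), "tomar"),
    (("suelt", "solt"), "soltar"),
    (("reinici",), "reiniciar"),
]
# Short forms of "ir" are matched as whole words so "ve" does not match "verde".
GO_WORDS = {"ve", "ir", "anda"}


def normalize(text: str) -> str:
    """Lowercase, strip accents (NFKD) and punctuation, collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(re.sub(r"[^a-z0-9]+", " ", no_accents).split())


def match_action(text: str) -> Optional[str]:
    """Return the action for the first word that matches a known verb, else None."""
    for word in normalize(text).split():
        if word in GO_WORDS:
            return "ir"
        for prefixes, action in VERB_PREFIXES:
            if word.startswith(prefixes):  # str.startswith accepts a tuple
                return action
    return None
```

Prefix matching keeps the rule set small, but it can misfire on unrelated words that share a stem. For open-ended phrasing, the Gemini mode (`LLM_PROVIDER=gemini`) is the documented alternative.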
Environment Variables
```bash
# Web API Server
API_ENABLED=true          # Enable/disable web interface
API_HOST=0.0.0.0          # Bind address
API_PORT=8080             # Listen port

# TCP Parameters
TCP_OFFSET_MM=63.0        # Z-offset to object surface
APPROACH_OFFSET_MM=50.0   # Safe approach distance above object
STEP_MM=20.0              # Distance for up/down increments

# LLM Configuration (optional)
LLM_PROVIDER=rules        # "rules" or "gemini"
GOOGLE_API_KEY=your_key   # Required if LLM_PROVIDER=gemini
GEMINI_MODEL=gemini-2.0-flash

# Workspace Safety (optional)
WORKSPACE_MIN_X=-300
WORKSPACE_MAX_X=300
WORKSPACE_MIN_Y=-300
WORKSPACE_MAX_Y=300
WORKSPACE_MIN_Z=0
WORKSPACE_MAX_Z=500

# Misc
DRY_RUN=false             # If true, skip sending robot commands
```
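The sketch below reads these variables into a typed object, such as config.py might provide. It assumes that the values listed above are also the defaults when a variable is unset, which the README only confirms for `API_PORT=8080`. It also assumes that a missing `GOOGLE_API_KEY` should fail fast in Gemini mode. `VoiceConfig` and `load_config` are hypothetical names. Passing an explicit mapping instead of `os.environ` makes the function easy to test.

```python
import os
from dataclasses import dataclass
from typing import Mapping


def _env_bool(env: Mapping[str, str], key: str, default: bool) -> bool:
    value = env.get(key)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class VoiceConfig:
    api_enabled: bool
    api_host: str
    api_port: int
    tcp_offset_mm: float
    approach_offset_mm: float
    step_mm: float
    llm_provider: str
    gemini_model: str
    dry_run: bool
    workspace_min: tuple[float, float, float]  # (x, y, z)
    workspace_max: tuple[float, float, float]  # (x, y, z)


def load_config(env: Mapping[str, str] = os.environ) -> VoiceConfig:
    """Read the variables above, assuming the listed values are the defaults."""

    def num(key: str, default: float) -> float:
        return float(env.get(key, default))

    provider = env.get("LLM_PROVIDER", "rules")
    if provider == "gemini" and not env.get("GOOGLE_API_KEY"):
        raise ValueError("GOOGLE_API_KEY is required when LLM_PROVIDER=gemini")

    return VoiceConfig(
        api_enabled=_env_bool(env, "API_ENABLED", True),
        api_host=env.get("API_HOST", "0.0.0.0"),
        api_port=int(env.get("API_PORT", 8080)),
        tcp_offset_mm=num("TCP_OFFSET_MM", 63.0),
        approach_offset_mm=num("APPROACH_OFFSET_MM", 50.0),
        step_mm=num("STEP_MM", 20.0),
        llm_provider=provider,
        gemini_model=env.get("GEMINI_MODEL", "gemini-2.0-flash"),
        dry_run=_env_bool(env, "DRY_RUN", False),
        workspace_min=(num("WORKSPACE_MIN_X", -300), num("WORKSPACE_MIN_Y", -300), num("WORKSPACE_MIN_Z", 0)),
        workspace_max=(num("WORKSPACE_MAX_X", 300), num("WORKSPACE_MAX_Y", 300), num("WORKSPACE_MAX_Z", 500)),
    )
```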
Installation
```bash
cd dora_voice_control
pip install -e .

# With LLM support
pip install -e ".[llm]"
```
Testing
Web Interface
```bash
# Start the node (standalone for testing)
python -m dora_voice_control.main

# Open in browser (macOS; on Linux use xdg-open)
open http://localhost:8080
```
API Endpoints
```bash
# Get status
curl http://localhost:8080/api/status

# Get objects
curl http://localhost:8080/api/objects

# Get queue
curl http://localhost:8080/api/queue

# Send command
curl -X POST http://localhost:8080/api/command \
  -H "Content-Type: application/json" \
  -d '{"text": "sube"}'

# Clear queue
curl -X POST http://localhost:8080/api/queue/clear
```
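For scripted tests, the same endpoints can be called from Python with only the standard library. This sketch mirrors the curl examples. It assumes the endpoints return JSON bodies (the usual behavior for FastAPI routes) and that the server listens on the default port. The helper names are hypothetical.

```python
import json
import urllib.request
from typing import Any

BASE_URL = "http://localhost:8080"  # default API_PORT; adjust if changed


def build_command_request(text: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST /api/command request as the curl example."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/command",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_command(text: str, base_url: str = BASE_URL, timeout: float = 5.0) -> Any:
    """Send a command and decode the reply, assuming the endpoint returns JSON."""
    with urllib.request.urlopen(build_command_request(text, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def get_status(base_url: str = BASE_URL, timeout: float = 5.0) -> Any:
    """GET /api/status and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/status", timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the node running, `send_command("agarra el cubo rojo")` followed by `get_status()` should show the effect of the command in the queue and status fields.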
Python Test
```python
from dora_voice_control.parser import rule_parse, normalize

# Test command parsing
text = "agarra el cubo rojo grande"
result = rule_parse(normalize(text))
print(result)
# {'resultado': 'ok', 'accion': 'tomar', 'objeto': 'cubo', 'color': 'rojo', 'tamano': 'grande'}
```
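The parse result above contains the object, color and size in addition to the action. A simple way to obtain these fields from normalized text is to look up each word in small vocabularies. The sketch below shows this approach. The word lists and `extract_attributes` are hypothetical, since the vocabularies used by parser.py are not documented here. Only the output keys (`objeto`, `color`, `tamano`) come from the example.

```python
from typing import Optional

# Hypothetical vocabularies; parser.py's real word lists are not documented here.
OBJECTS = {"cubo", "cilindro", "esfera"}
COLORS = {"rojo", "azul", "verde", "amarillo"}
SIZES = {"grande", "pequeno", "mediano"}  # normalized text has no accents or ñ


def _first_match(words: list[str], vocabulary: set[str]) -> Optional[str]:
    """Return the first word found in the vocabulary, or None."""
    return next((w for w in words if w in vocabulary), None)


def extract_attributes(normalized: str) -> dict[str, Optional[str]]:
    """Pick object, color and size words out of already-normalized text."""
    words = normalized.split()
    return {
        "objeto": _first_match(words, OBJECTS),
        "color": _first_match(words, COLORS),
        "tamano": _first_match(words, SIZES),
    }
```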
Dora Dataflow Configuration
```yaml
nodes:
  - id: voice_control
    build: pip install -e ./dora_voice_control
    path: dora_voice_control
    inputs:
      voice_in: iobridge/voice_in
      tcp_pose: robot/tcp_pose
      objects: detector/objects
      status: robot/status
    outputs:
      - robot_cmd
      - voice_out
      - scene_update
    env:
      API_ENABLED: "true"
      API_PORT: "8080"
      DRY_RUN: "false"
```
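Inside the node, each input declared in the dataflow arrives as an event. The sketch below shows one way to route those events: `tcp_pose`, `objects` and `status` update local state, and `voice_in` is handed to a parser callback whose replies are published. It assumes the dora-rs 0.3.x Python API (`from dora import Node`, iterating over events, `event["value"]` as a pyarrow array, `node.send_output`). It also assumes that `objects` and `status` arrive as a single JSON string element. `handle_input`, `main` and `on_voice` are hypothetical names. The dora and pyarrow imports sit inside `main` so that the routing logic can be tested without them.

```python
import json
from typing import Any, Callable

# (output_id, JSON string) pairs to publish, e.g. ("robot_cmd", "...").
Outputs = list[tuple[str, str]]
VoiceHandler = Callable[[str, dict[str, Any]], Outputs]


def handle_input(input_id: str, values: list[Any], state: dict[str, Any],
                 on_voice: VoiceHandler) -> Outputs:
    """Route one Dora input (already converted to a Python list) into state."""
    if input_id == "tcp_pose":
        state["pose"] = [float(v) for v in values]  # [x, y, z, roll, pitch, yaw]
    elif input_id in ("objects", "status"):
        # Assumption: JSON inputs arrive as a single string element.
        state[input_id] = json.loads(values[0])
    elif input_id == "voice_in":
        return on_voice(str(values[0]), state)
    return []


def main(on_voice: VoiceHandler) -> None:
    # Imported here so handle_input can be tested without dora or pyarrow installed.
    import pyarrow as pa
    from dora import Node

    node = Node()
    state: dict[str, Any] = {}
    for event in node:
        if event["type"] != "INPUT":
            continue
        outputs = handle_input(event["id"], event["value"].to_pylist(), state, on_voice)
        for output_id, payload in outputs:
            node.send_output(output_id, pa.array([payload]))
```

In a complete node, `on_voice` would call the parser and return `robot_cmd` and `voice_out` pairs. Keeping it as a parameter lets the rule-based and Gemini parsers share the same event loop.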
Message Examples
Input: voice_in
```text
"sube"
"agarra el cubo rojo"
"suelta en la caja azul"
```
Output: robot_cmd
```json
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "action": "move_to_pose",
  "payload": {
    "x": 150.0,
    "y": 200.0,
    "z": 280.0,
    "roll": 180.0,
    "pitch": 0.0,
    "yaw": 0.0
  }
}
```
Output: voice_out
```json
{"text": "Ok, voy a subir", "status": "ok"}
{"text": "No entendi el comando", "status": "error"}
```
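The `move_to_pose` payload in the robot_cmd example uses the same six fields as `tcp_pose`. One plausible way to derive a "sube"/"baja" target is to add or subtract `STEP_MM` from the current z and then check the result against the `WORKSPACE_*` bounds. For example, a current z of 260 mm with the default 20 mm step gives z = 280. The sketch below follows this approach. It assumes that out-of-bounds targets are rejected rather than clamped, which the README does not specify. `step_vertical` and `within_workspace` are hypothetical names, and the bounds are the values listed under Environment Variables.

```python
from typing import Mapping

# Bounds mirror the WORKSPACE_* values listed under Environment Variables.
WORKSPACE_MIN = {"x": -300.0, "y": -300.0, "z": 0.0}
WORKSPACE_MAX = {"x": 300.0, "y": 300.0, "z": 500.0}


def within_workspace(pose: Mapping[str, float]) -> bool:
    """True if x, y and z all lie inside the configured workspace box."""
    return all(WORKSPACE_MIN[a] <= pose[a] <= WORKSPACE_MAX[a] for a in ("x", "y", "z"))


def step_vertical(pose: Mapping[str, float], direction: int,
                  step_mm: float = 20.0) -> dict[str, float]:
    """Return a copy of pose moved step_mm up (+1, "sube") or down (-1, "baja").

    Raises ValueError instead of producing a target outside the workspace.
    """
    if direction not in (1, -1):
        raise ValueError("direction must be +1 (up) or -1 (down)")
    target = dict(pose)  # keep x, y and orientation unchanged
    target["z"] = float(pose["z"]) + direction * step_mm
    if not within_workspace(target):
        raise ValueError(f"target z={target['z']} mm is outside the workspace")
    return target
```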
Dependencies
- dora-rs >= 0.3.9
- numpy < 2.0.0
- pyarrow >= 12.0.0
- fastapi >= 0.109.0
- uvicorn >= 0.27.0
- pydantic >= 2.0.0
- google-genai (optional, for Gemini mode)