Dora Voice Control Node

A Dora node that processes Spanish voice commands from children and translates them into robot actions (movement, grasping, releasing objects). Includes a web debug interface.

Features

  • Spanish voice command parsing (rule-based or Gemini LLM)
  • Real-time web debug interface
  • Command queue management
  • Workspace bounds validation
  • Object detection integration

File Structure

dora_voice_control/
├── __init__.py
├── main.py        # Main Dora node entry point
├── api.py         # FastAPI web server
├── config.py      # Configuration management
├── models.py      # Pydantic request/response models
├── parser.py      # Voice command parsing logic
├── state.py       # Shared state management
└── templates.py   # HTML template for web interface

Web Debug Interface

Access the debug interface at http://localhost:8080 (default).

Features:

  • Real-time status monitoring (pose, objects, queue)
  • Manual voice-command entry
  • Quick command buttons
  • Parse-result inspection
  • Command history
  • Queue clearing

Inputs/Outputs

Input     Type    Description
voice_in  string  Text transcription of voice command
tcp_pose  array   Current robot pose [x, y, z, roll, pitch, yaw]
objects   JSON    Detected objects from vision system
status    JSON    Command execution status from robot

Output        Type  Description
robot_cmd     JSON  Robot command with action and payload
voice_out     JSON  Response confirmation to user
scene_update  JSON  Updated scene with all visible objects
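As a rough sketch of how an incoming event might be routed to an output channel (the `move_relative` action name and the handler signature are illustrative assumptions, not the node's actual API; the real `main.py` also consumes `tcp_pose`, `objects`, and `status` inside a Dora event loop):

```python
import uuid
from typing import Optional, Tuple

def handle_voice_in(text: str, step_mm: float = 20.0) -> Optional[Tuple[str, dict]]:
    """Map a transcribed command to an (output_channel, payload) pair.

    Simplified sketch: only "sube" is handled here, using the STEP_MM
    default from the configuration table. Returns None when the text
    is not understood.
    """
    if text.strip().lower() == "sube":
        cmd = {
            "id": str(uuid.uuid4()),
            "action": "move_relative",   # hypothetical action name
            "payload": {"dz": step_mm},  # move up by one step
        }
        return "robot_cmd", cmd
    return None
```

In the actual node, the returned payload would be serialized and emitted on the `robot_cmd` output, with a matching confirmation on `voice_out`.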

Supported Commands (Spanish)

Command    Action          Example
subir      Move up         "sube"
bajar      Move down       "baja"
tomar      Grab object     "agarra el cubo rojo"
soltar     Release object  "suelta en la caja azul"
ir         Go to object    "ve al cilindro"
reiniciar  Reset           "reinicia"
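In rules mode, a verb-stem lookup like the following is enough to cover the table above. This is a simplified sketch, not the actual `parser.py` (which also extracts object, color, and size fields); the stem lists are assumptions:

```python
import unicodedata

# Verb stems for each supported action (illustrative; the real parser
# may use different tables, or delegate to the Gemini LLM entirely).
ACTIONS = {
    "tomar": ("agarra", "toma", "coge"),
    "soltar": ("suelta", "deja"),
    "subir": ("sube",),
    "bajar": ("baja",),
    "ir": ("ve", "anda"),
    "reiniciar": ("reinicia", "resetea"),
}

def normalize(text: str) -> str:
    """Lowercase and strip accents so 'Sube' and 'subé' both match."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(c for c in decomposed if unicodedata.category(c) != "Mn")

def rule_parse(text: str) -> dict:
    """Match the first word of the command against the verb tables."""
    words = normalize(text).split()
    for action, stems in ACTIONS.items():
        if words and words[0] in stems:
            return {"resultado": "ok", "accion": action}
    return {"resultado": "error"}
```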

Environment Variables

# Web API Server
API_ENABLED=true        # Enable/disable web interface
API_HOST=0.0.0.0        # Bind address
API_PORT=8080           # Listen port

# TCP Parameters
TCP_OFFSET_MM=63.0          # Z-offset to object surface
APPROACH_OFFSET_MM=50.0     # Safe approach distance above object
STEP_MM=20.0                # Distance for up/down increments

# LLM Configuration (optional)
LLM_PROVIDER=rules          # "rules" or "gemini"
GOOGLE_API_KEY=your_key     # Required if using gemini
GEMINI_MODEL=gemini-2.0-flash

# Workspace Safety (optional)
WORKSPACE_MIN_X=-300
WORKSPACE_MAX_X=300
WORKSPACE_MIN_Y=-300
WORKSPACE_MAX_Y=300
WORKSPACE_MIN_Z=0
WORKSPACE_MAX_Z=500

# Misc
DRY_RUN=false               # Skip sending robot commands
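The workspace bounds above can be enforced with a simple box check read from the environment. A minimal sketch, assuming the node rejects any target pose outside the box (helper names are hypothetical; the node's actual validation may differ):

```python
import os

def env_float(name: str, default: float) -> float:
    """Read a numeric setting from the environment, with a fallback."""
    return float(os.environ.get(name, default))

def in_workspace(x: float, y: float, z: float) -> bool:
    """True if (x, y, z) in mm lies inside the WORKSPACE_* safety box."""
    return (
        env_float("WORKSPACE_MIN_X", -300) <= x <= env_float("WORKSPACE_MAX_X", 300)
        and env_float("WORKSPACE_MIN_Y", -300) <= y <= env_float("WORKSPACE_MAX_Y", 300)
        and env_float("WORKSPACE_MIN_Z", 0) <= z <= env_float("WORKSPACE_MAX_Z", 500)
    )
```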

Installation

cd dora_voice_control
pip install -e .

# With LLM support
pip install -e ".[llm]"

Testing

Web Interface

# Start the node (standalone for testing)
python -m dora_voice_control.main

# Open in browser
open http://localhost:8080

API Endpoints

# Get status
curl http://localhost:8080/api/status

# Get objects
curl http://localhost:8080/api/objects

# Get queue
curl http://localhost:8080/api/queue

# Send command
curl -X POST http://localhost:8080/api/command \
  -H "Content-Type: application/json" \
  -d '{"text": "sube"}'

# Clear queue
curl -X POST http://localhost:8080/api/queue/clear
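The same endpoints can be exercised from Python with only the standard library. A sketch of a small client for `POST /api/command` (the helper names are illustrative, and `post_command` needs the node running on the default port):

```python
import json
from urllib import request

API_BASE = "http://localhost:8080"  # default API_HOST/API_PORT

def build_command_body(text: str) -> bytes:
    """Encode the JSON body expected by POST /api/command."""
    return json.dumps({"text": text}).encode("utf-8")

def post_command(text: str, base: str = API_BASE) -> dict:
    """Send a voice command to the running node's web API."""
    req = request.Request(
        base + "/api/command",
        data=build_command_body(text),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```

With the node up, `post_command("sube")` should return the same parse result the web interface displays.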

Python Test

from dora_voice_control.parser import rule_parse, normalize

# Test command parsing
text = "agarra el cubo rojo grande"
result = rule_parse(normalize(text))
print(result)
# {'resultado': 'ok', 'accion': 'tomar', 'objeto': 'cubo', 'color': 'rojo', 'tamano': 'grande'}

Dora Dataflow Configuration

nodes:
  - id: voice_control
    build: pip install -e ./dora_voice_control
    path: dora_voice_control
    inputs:
      voice_in: iobridge/voice_in
      tcp_pose: robot/tcp_pose
      objects: detector/objects
      status: robot/status
    outputs:
      - robot_cmd
      - voice_out
      - scene_update
    env:
      API_ENABLED: "true"
      API_PORT: "8080"
      DRY_RUN: "false"

Message Examples

Input: voice_in

"sube"
"agarra el cubo rojo"
"suelta en la caja azul"

Output: robot_cmd

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "action": "move_to_pose",
  "payload": {
    "x": 150.0,
    "y": 200.0,
    "z": 280.0,
    "roll": 180.0,
    "pitch": 0.0,
    "yaw": 0.0
  }
}
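A command like the one above might be assembled by combining a detected object's position with the TCP and approach offsets from the configuration table. This is a sketch of how the offsets could compose (the `approach_command` helper is hypothetical; the node's real pose computation may differ):

```python
import uuid

TCP_OFFSET_MM = 63.0       # defaults from the Environment Variables table
APPROACH_OFFSET_MM = 50.0

def approach_command(obj_x: float, obj_y: float, obj_z: float) -> dict:
    """Build a move_to_pose command hovering safely above an object.

    The Z target stacks the tool offset and the approach clearance on
    top of the object's detected height.
    """
    return {
        "id": str(uuid.uuid4()),
        "action": "move_to_pose",
        "payload": {
            "x": obj_x,
            "y": obj_y,
            "z": obj_z + TCP_OFFSET_MM + APPROACH_OFFSET_MM,
            "roll": 180.0,
            "pitch": 0.0,
            "yaw": 0.0,
        },
    }
```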

Output: voice_out

{"text": "Ok, voy a subir", "status": "ok"}
{"text": "No entendi el comando", "status": "error"}

Dependencies

  • dora-rs >= 0.3.9
  • numpy < 2.0.0
  • pyarrow >= 12.0.0
  • fastapi >= 0.109.0
  • uvicorn >= 0.27.0
  • pydantic >= 2.0.0
  • google-genai (optional, for Gemini mode)