# Dora Voice Control Node

A Dora node that parses Spanish voice commands from children and translates them into robot actions (moving, grasping, and releasing objects). It also serves a web debug interface.

## Features

- Spanish voice command parsing (rule-based or Gemini LLM)
- Real-time web debug interface
- Command queue management
- Workspace bounds validation
- Object detection integration

## File Structure

```
dora_voice_control/
├── __init__.py
├── main.py        # Main Dora node entry point
├── api.py         # FastAPI web server
├── config.py      # Configuration management
├── models.py      # Pydantic request/response models
├── parser.py      # Voice command parsing logic
├── state.py       # Shared state management
└── templates.py   # HTML template for web interface
```

## Web Debug Interface

Access the debug interface at `http://localhost:8080` (default).

Features:
- Monitor status in real time (pose, objects, queue)
- Send manual voice commands
- Use quick command buttons
- View parse results
- Browse command history
- Clear the queue

## Inputs/Outputs

| Input         | Type   | Description                                      |
|---------------|--------|--------------------------------------------------|
| `voice_in`    | string | Text transcription of voice command              |
| `tcp_pose`    | array  | Current robot pose `[x, y, z, roll, pitch, yaw]` |
| `objects`     | JSON   | Detected objects from vision system              |
| `status`      | JSON   | Command execution status from robot              |

| Output        | Type   | Description                              |
|---------------|--------|------------------------------------------|
| `robot_cmd`   | JSON   | Robot command with action and payload    |
| `voice_out`   | JSON   | Response confirmation to user            |
| `scene_update`| JSON   | Updated scene with all visible objects   |

## Supported Commands (Spanish)

| Canonical action | Meaning        | Example utterance              |
|------------------|----------------|--------------------------------|
| `subir`          | Move up        | "sube"                         |
| `bajar`          | Move down      | "baja"                         |
| `tomar`          | Grab object    | "agarra el cubo rojo"          |
| `soltar`         | Release object | "suelta en la caja azul"       |
| `ir`             | Go to object   | "ve al cilindro"               |
| `reiniciar`      | Reset          | "reinicia"                     |
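
The table above maps spoken verbs to canonical actions. As a rough illustration of how a rule-based lookup of this kind could work, here is a minimal sketch. The names `VERB_TO_ACTION`, `simple_normalize`, and `guess_action` are hypothetical and are not taken from `parser.py`; the verb list contains only the example utterances shown in the table, so the real vocabulary is likely larger.

```python
import unicodedata
from typing import Optional

# Hypothetical verb-to-action table built only from the examples in this README.
VERB_TO_ACTION = {
    "sube": "subir",
    "baja": "bajar",
    "agarra": "tomar",
    "suelta": "soltar",
    "ve": "ir",
    "reinicia": "reiniciar",
}


def simple_normalize(text: str) -> str:
    """Lowercase, strip combining accents, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    no_accents = "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    )
    return " ".join(no_accents.split())


def guess_action(text: str) -> Optional[str]:
    """Return the canonical action for the first known verb, or None."""
    for word in simple_normalize(text).split():
        if word in VERB_TO_ACTION:
            return VERB_TO_ACTION[word]
    return None


if __name__ == "__main__":
    for phrase in ("Sube", "agarra el cubo rojo", "hola robot"):
        print(phrase, "->", guess_action(phrase))
```

A lookup like this only identifies the action; extracting the object, color, and size is left to the actual parser.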

## Environment Variables

```bash
# Web API Server
API_ENABLED=true        # Enable/disable web interface
API_HOST=0.0.0.0        # Bind address
API_PORT=8080           # Listen port

# TCP (tool center point) parameters
TCP_OFFSET_MM=63.0          # Z-offset to object surface
APPROACH_OFFSET_MM=50.0     # Safe approach distance above object
STEP_MM=20.0                # Distance for up/down increments

# LLM Configuration (optional)
LLM_PROVIDER=rules          # "rules" or "gemini"
GOOGLE_API_KEY=your_key     # Required if using gemini
GEMINI_MODEL=gemini-2.0-flash

# Workspace Safety (optional)
WORKSPACE_MIN_X=-300
WORKSPACE_MAX_X=300
WORKSPACE_MIN_Y=-300
WORKSPACE_MAX_Y=300
WORKSPACE_MIN_Z=0
WORKSPACE_MAX_Z=500

# Misc
DRY_RUN=false               # Skip sending robot commands
```
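
To make the workspace safety variables concrete, the sketch below reads the six `WORKSPACE_*` values and checks whether a target position lies inside the box. It is an illustration only: `WorkspaceBounds` and `bounds_from_env` are hypothetical names, not the API of `config.py`, and the fallback values are assumed to equal the ones listed above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class WorkspaceBounds:
    """Axis-aligned box, in the same units as the robot pose."""

    min_x: float
    max_x: float
    min_y: float
    max_y: float
    min_z: float
    max_z: float

    def contains(self, x: float, y: float, z: float) -> bool:
        """True when the point lies inside the box (limits inclusive)."""
        return (
            self.min_x <= x <= self.max_x
            and self.min_y <= y <= self.max_y
            and self.min_z <= z <= self.max_z
        )


def bounds_from_env(env: Optional[Mapping[str, str]] = None) -> WorkspaceBounds:
    """Build bounds from WORKSPACE_* variables.

    Assumption: unset variables fall back to the values shown in this README.
    """
    source = os.environ if env is None else env

    def read(name: str, default: float) -> float:
        return float(source.get(name, default))

    return WorkspaceBounds(
        min_x=read("WORKSPACE_MIN_X", -300.0),
        max_x=read("WORKSPACE_MAX_X", 300.0),
        min_y=read("WORKSPACE_MIN_Y", -300.0),
        max_y=read("WORKSPACE_MAX_Y", 300.0),
        min_z=read("WORKSPACE_MIN_Z", 0.0),
        max_z=read("WORKSPACE_MAX_Z", 500.0),
    )


if __name__ == "__main__":
    bounds = bounds_from_env()
    print(bounds.contains(150.0, 200.0, 280.0))
```

Passing a plain dictionary instead of `os.environ` keeps the check easy to exercise in unit tests.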

## Installation

```bash
cd dora_voice_control
pip install -e .

# With LLM support
pip install -e ".[llm]"
```

## Testing

### Web Interface

```bash
# Start the node (standalone for testing)
python -m dora_voice_control.main

# Open in browser (macOS; on Linux use xdg-open)
open http://localhost:8080
```

### API Endpoints

```bash
# Get status
curl http://localhost:8080/api/status

# Get objects
curl http://localhost:8080/api/objects

# Get queue
curl http://localhost:8080/api/queue

# Send command
curl -X POST http://localhost:8080/api/command \
  -H "Content-Type: application/json" \
  -d '{"text": "sube"}'

# Clear queue
curl -X POST http://localhost:8080/api/queue/clear
```
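
The same `POST /api/command` call can be made from Python with only the standard library. This is a sketch under two assumptions: the server runs on the default `http://localhost:8080`, and it answers with a JSON body (the response schema is not documented here). The helper names `build_command_request` and `send_command` are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # default host and port from this README


def build_command_request(
    text: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Create the POST request for /api/command without sending it."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/command",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_command(
    text: str, base_url: str = BASE_URL, timeout: float = 5.0
) -> dict:
    """Send the command and decode the reply.

    Requires a running node; assumes the reply body is JSON.
    """
    request = build_command_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    print(send_command("sube"))
```

Separating request construction from sending makes it possible to inspect the payload without a live server.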

### Python Test

```python
from dora_voice_control.parser import rule_parse, normalize

# Test command parsing
text = "agarra el cubo rojo grande"
result = rule_parse(normalize(text))
print(result)
# {'resultado': 'ok', 'accion': 'tomar', 'objeto': 'cubo', 'color': 'rojo', 'tamano': 'grande'}
```

## Dora Dataflow Configuration

```yaml
nodes:
  - id: voice_control
    build: pip install -e ./dora_voice_control
    path: dora_voice_control
    inputs:
      voice_in: iobridge/voice_in
      tcp_pose: robot/tcp_pose
      objects: detector/objects
      status: robot/status
    outputs:
      - robot_cmd
      - voice_out
      - scene_update
    env:
      API_ENABLED: "true"
      API_PORT: "8080"
      DRY_RUN: "false"
```

## Message Examples

### Input: voice_in
```
"sube"
"agarra el cubo rojo"
"suelta en la caja azul"
```

### Output: robot_cmd
```json
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "action": "move_to_pose",
  "payload": {
    "x": 150.0,
    "y": 200.0,
    "z": 280.0,
    "roll": 180.0,
    "pitch": 0.0,
    "yaw": 0.0
  }
}
```
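
As an illustration of how a message with this shape could be produced, the sketch below builds a `move_to_pose` command that shifts the current `tcp_pose` along z by one step. It assumes that `subir` and `bajar` are realized as a z offset of `STEP_MM`, which this README suggests but does not state; `step_command` and `POSE_KEYS` are hypothetical names.

```python
import json
import uuid
from typing import Sequence

STEP_MM = 20.0  # same value as STEP_MM under Environment Variables
POSE_KEYS = ("x", "y", "z", "roll", "pitch", "yaw")


def step_command(
    tcp_pose: Sequence[float], direction: int, step_mm: float = STEP_MM
) -> dict:
    """Build a move_to_pose command that moves z by one step.

    direction is +1 for "subir" and -1 for "bajar" (assumed mapping).
    """
    if len(tcp_pose) != len(POSE_KEYS):
        raise ValueError("tcp_pose must have 6 values")
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    payload = dict(zip(POSE_KEYS, (float(v) for v in tcp_pose)))
    payload["z"] += direction * step_mm
    return {
        "id": str(uuid.uuid4()),  # unique id, as in the example message
        "action": "move_to_pose",
        "payload": payload,
    }


if __name__ == "__main__":
    command = step_command([150.0, 200.0, 260.0, 180.0, 0.0, 0.0], +1)
    print(json.dumps(command, indent=2))
```

A real node would also validate the resulting pose against the workspace bounds before emitting the command.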

### Output: voice_out
```json
{"text": "Ok, voy a subir", "status": "ok"}
{"text": "No entendi el comando", "status": "error"}
```

## Dependencies

- dora-rs >= 0.3.9
- numpy < 2.0.0
- pyarrow >= 12.0.0
- fastapi >= 0.109.0
- uvicorn >= 0.27.0
- pydantic >= 2.0.0
- google-genai (optional, for Gemini mode)