# Dora Voice Control Node

A Dora node that processes Spanish voice commands from children and translates them into robot actions (movement, grasping, releasing objects). Includes a web debug interface.

## Features

- Spanish voice command parsing (rule-based or Gemini LLM)
- Real-time web debug interface
- Command queue management
- Workspace bounds validation
- Object detection integration

## File Structure

```
dora_voice_control/
├── __init__.py
├── main.py          # Main Dora node entry point
├── api.py           # FastAPI web server
├── config.py        # Configuration management
├── models.py        # Pydantic request/response models
├── parser.py        # Voice command parsing logic
├── state.py         # Shared state management
└── templates.py     # HTML template for web interface
```

## Web Debug Interface

Access the debug interface at `http://localhost:8080` (default).

Features:
- Real-time status monitoring (pose, objects, queue)
- Send manual voice commands
- Quick command buttons
- View parse results
- Command history
- Clear queue

## Inputs/Outputs

| Input      | Type   | Description                                    |
|------------|--------|------------------------------------------------|
| `voice_in` | string | Text transcription of voice command            |
| `tcp_pose` | array  | Current robot pose [x, y, z, roll, pitch, yaw] |
| `objects`  | JSON   | Detected objects from vision system            |
| `status`   | JSON   | Command execution status from robot            |

| Output         | Type | Description                            |
|----------------|------|----------------------------------------|
| `robot_cmd`    | JSON | Robot command with action and payload  |
| `voice_out`    | JSON | Response confirmation to user          |
| `scene_update` | JSON | Updated scene with all visible objects |

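To illustrate how the six-element `tcp_pose` input could be consumed, the sketch below unpacks it into named fields. The `Pose` class and `pose_from_array` helper are hypothetical names used only for this example; they are not part of the package, and the actual node may handle the array differently.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Pose:
    """Hypothetical container for the six tcp_pose components."""

    x: float
    y: float
    z: float
    roll: float
    pitch: float
    yaw: float


def pose_from_array(values: Sequence[float]) -> Pose:
    """Convert a [x, y, z, roll, pitch, yaw] sequence into a Pose."""
    if len(values) != 6:
        raise ValueError(f"expected 6 pose values, got {len(values)}")
    return Pose(*(float(v) for v in values))


pose = pose_from_array([150.0, 200.0, 280.0, 180.0, 0.0, 0.0])
print(pose.z)  # 280.0
```

Rejecting arrays of the wrong length early keeps a malformed pose from silently propagating into later motion calculations.
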
## Supported Commands (Spanish)

| Command     | Action         | Example                  |
|-------------|----------------|--------------------------|
| `subir`     | Move up        | "sube"                   |
| `bajar`     | Move down      | "baja"                   |
| `tomar`     | Grab object    | "agarra el cubo rojo"    |
| `soltar`    | Release object | "suelta en la caja azul" |
| `ir`        | Go to object   | "ve al cilindro"         |
| `reiniciar` | Reset          | "reinicia"               |

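The mapping from spoken phrases to commands can be pictured as a keyword lookup. The following is a minimal sketch, assuming whole-word matching against a small verb table built from the examples in the table; `ACTION_KEYWORDS`, `simplify`, and `match_action` are hypothetical names, and the real `rule_parse` in `parser.py` may work quite differently and also extracts object, color, and size.

```python
import unicodedata
from typing import Optional

# Hypothetical keyword table: the canonical command name plus the verb
# form shown in the Example column. The real parser may accept more forms.
ACTION_KEYWORDS = {
    "subir": ("sube", "subir"),
    "bajar": ("baja", "bajar"),
    "tomar": ("agarra", "tomar"),
    "soltar": ("suelta", "soltar"),
    "ir": ("ve", "ir"),
    "reiniciar": ("reinicia", "reiniciar"),
}


def simplify(text: str) -> str:
    """Lowercase the text and strip accents so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def match_action(text: str) -> Optional[str]:
    """Return the first action with a keyword that appears as a whole word."""
    words = simplify(text).split()
    for action, keywords in ACTION_KEYWORDS.items():
        if any(word in keywords for word in words):
            return action
    return None


print(match_action("agarra el cubo rojo"))  # tomar
print(match_action("hola"))                 # None
```

Matching on whole words rather than substrings avoids false hits such as the `ve` inside an unrelated longer word.
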
## Environment Variables

```bash
# Web API Server
API_ENABLED=true            # Enable/disable web interface
API_HOST=0.0.0.0            # Bind address
API_PORT=8080               # Listen port

# TCP Parameters
TCP_OFFSET_MM=63.0          # Z-offset to object surface
APPROACH_OFFSET_MM=50.0     # Safe approach distance above object
STEP_MM=20.0                # Distance for up/down increments

# LLM Configuration (optional)
LLM_PROVIDER=rules          # "rules" or "gemini"
GOOGLE_API_KEY=your_key     # Required if using gemini
GEMINI_MODEL=gemini-2.0-flash

# Workspace Safety (optional)
WORKSPACE_MIN_X=-300
WORKSPACE_MAX_X=300
WORKSPACE_MIN_Y=-300
WORKSPACE_MAX_Y=300
WORKSPACE_MIN_Z=0
WORKSPACE_MAX_Z=500

# Misc
DRY_RUN=false               # Skip sending robot commands
```

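As an illustration of how the `WORKSPACE_*` variables could drive bounds validation, the sketch below reads them from the environment and checks whether a target position lies inside the box. `WorkspaceBounds` and `load_bounds` are hypothetical names, the fallback values simply mirror the example values listed above (the package's real defaults may differ), and treating the limits as inclusive is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class WorkspaceBounds:
    """Hypothetical axis-aligned workspace box."""

    min_x: float
    max_x: float
    min_y: float
    max_y: float
    min_z: float
    max_z: float

    def contains(self, x: float, y: float, z: float) -> bool:
        """Return True if the point lies inside the box (limits inclusive)."""
        return (
            self.min_x <= x <= self.max_x
            and self.min_y <= y <= self.max_y
            and self.min_z <= z <= self.max_z
        )


def load_bounds(env: Mapping[str, str] = os.environ) -> WorkspaceBounds:
    """Read WORKSPACE_* variables, falling back to the example values."""

    def read(name: str, default: float) -> float:
        return float(env.get(name, default))

    return WorkspaceBounds(
        min_x=read("WORKSPACE_MIN_X", -300.0),
        max_x=read("WORKSPACE_MAX_X", 300.0),
        min_y=read("WORKSPACE_MIN_Y", -300.0),
        max_y=read("WORKSPACE_MAX_Y", 300.0),
        min_z=read("WORKSPACE_MIN_Z", 0.0),
        max_z=read("WORKSPACE_MAX_Z", 500.0),
    )


bounds = load_bounds({})  # empty mapping: use the example values only
print(bounds.contains(150.0, 200.0, 280.0))  # True
print(bounds.contains(0.0, 0.0, 600.0))      # False
```

Passing the environment as a mapping makes the loader easy to exercise in tests without modifying `os.environ`.
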
## Installation

```bash
cd dora_voice_control
pip install -e .

# With LLM support
pip install -e ".[llm]"
```

## Testing

### Web Interface

```bash
# Start the node (standalone for testing)
python -m dora_voice_control.main

# Open in browser (macOS; use xdg-open on Linux)
open http://localhost:8080
```

### API Endpoints

```bash
# Get status
curl http://localhost:8080/api/status

# Get objects
curl http://localhost:8080/api/objects

# Get queue
curl http://localhost:8080/api/queue

# Send command
curl -X POST http://localhost:8080/api/command \
  -H "Content-Type: application/json" \
  -d '{"text": "sube"}'

# Clear queue
curl -X POST http://localhost:8080/api/queue/clear
```

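The same `POST /api/command` call can be scripted from Python using only the standard library. This is a sketch under the assumption that the node is reachable at the default address; `build_command_request` and `send` are hypothetical helper names, and because the response schema is not documented here the body is returned as raw text.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # default address used in the curl examples


def build_command_request(
    text: str, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build the POST /api/command request without sending it."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/api/command",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 5.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


request = build_command_request("sube")
print(request.get_method(), request.full_url)
# With the node running, the command can then be sent: print(send(request))
```

Separating request construction from sending lets the payload be inspected or unit-tested without a running server.
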
### Python Test

```python
from dora_voice_control.parser import rule_parse, normalize

# Test command parsing
text = "agarra el cubo rojo grande"
result = rule_parse(normalize(text))
print(result)
# {'resultado': 'ok', 'accion': 'tomar', 'objeto': 'cubo', 'color': 'rojo', 'tamano': 'grande'}
```

## Dora Dataflow Configuration

```yaml
nodes:
  - id: voice_control
    build: pip install -e ./dora_voice_control
    path: dora_voice_control
    inputs:
      voice_in: iobridge/voice_in
      tcp_pose: robot/tcp_pose
      objects: detector/objects
      status: robot/status
    outputs:
      - robot_cmd
      - voice_out
      - scene_update
    env:
      API_ENABLED: "true"
      API_PORT: "8080"
      DRY_RUN: "false"
```

## Message Examples

### Input: voice_in

```
"sube"
"agarra el cubo rojo"
"suelta en la caja azul"
```

### Output: robot_cmd

```json
{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "action": "move_to_pose",
  "payload": {
    "x": 150.0,
    "y": 200.0,
    "z": 280.0,
    "roll": 180.0,
    "pitch": 0.0,
    "yaw": 0.0
  }
}
```

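A message with this shape could be assembled as in the sketch below. The helpers `make_robot_cmd` and `step_up` are hypothetical, and two points are assumptions rather than documented behavior: that the `id` is a random UUID, and that an upward step is expressed as a `move_to_pose` whose `z` is raised by `STEP_MM`.

```python
import json
import uuid
from typing import Any, Dict


def make_robot_cmd(action: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    """Wrap an action and payload in an id/action/payload envelope."""
    # Assumption: the id is a freshly generated random UUID string.
    return {"id": str(uuid.uuid4()), "action": action, "payload": payload}


def step_up(pose: Dict[str, float], step_mm: float = 20.0) -> Dict[str, Any]:
    """Build a move_to_pose command that raises z by step_mm.

    Assumption: an upward step keeps the other pose fields unchanged.
    """
    target = dict(pose, z=pose["z"] + step_mm)
    return make_robot_cmd("move_to_pose", target)


current = {
    "x": 150.0, "y": 200.0, "z": 260.0,
    "roll": 180.0, "pitch": 0.0, "yaw": 0.0,
}
cmd = step_up(current)
print(json.dumps(cmd["payload"]))
```

Copying the pose with `dict(pose, z=...)` leaves the caller's dictionary untouched, so the current pose can be reused for later commands.
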
### Output: voice_out

```json
{"text": "Ok, voy a subir", "status": "ok"}
{"text": "No entendi el comando", "status": "error"}
```

## Dependencies

- dora-rs >= 0.3.9
- numpy < 2.0.0
- pyarrow >= 12.0.0
- fastapi >= 0.109.0
- uvicorn >= 0.27.0
- pydantic >= 2.0.0
- google-genai (optional, for Gemini mode)