Refactor voice control core and robot behavior

cristhian aguilera
2026-02-02 12:29:59 -03:00
parent b9798a2f46
commit 695d309816
36 changed files with 3436 additions and 1065 deletions

@@ -1,178 +1,106 @@
# Dora IOBridge Node
A WebSocket server that bridges web clients with the Dora dataflow for real-time voice commands and scene updates.
Generic WebSocket bridge between web clients and Dora dataflow.
## Inputs/Outputs
## Dora Outputs (WebSocket → Dora)
| Input | Type | Description |
|----------------|--------|---------------------------------------|
| `voice_out` | JSON | Response from voice control node |
| `scene_update` | JSON | Scene objects from voice control |
| Output | Type | Description |
|-------------|--------|--------------------------|
| `text_out` | string | Text from clients |
| `audio_out` | bytes | WAV audio from clients |
| `data_out` | JSON | Generic data from clients |
| Output | Type | Description |
|----------------|--------|---------------------------------------|
| `voice_in` | string | Voice commands forwarded to Dora |
## Dora Inputs (Dora → WebSocket)
| Input | Type | Description |
|-------------|--------|--------------------------|
| `text_in` | string | Text to broadcast |
| `audio_in` | bytes | WAV audio to broadcast |
| `data_in` | JSON | Generic data to broadcast |
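The Dora → WebSocket direction can be sketched in Python as follows. This is only an illustration under stated assumptions, not the node's actual code: the helper name `frame_for_input` is hypothetical, and the mapping from input ID to message type (`text_in` → `text`, `audio_in` → `audio`, `data_in` → `data`) is inferred from the tables and the Server → Client examples in this README.

```python
import base64
import json


def frame_for_input(input_id, value):
    """Build the JSON frame a bridge might broadcast for one Dora input.

    Hypothetical helper: the input-ID-to-message-type mapping is an
    assumption inferred from the README tables, not taken from the node.
    """
    if input_id == "text_in":
        msg = {"type": "text", "content": str(value)}
    elif input_id == "audio_in":
        # WAV bytes are base64-encoded so they fit in a JSON string.
        encoded = base64.b64encode(bytes(value)).decode("ascii")
        msg = {"type": "audio", "content": encoded, "format": "wav"}
    elif input_id == "data_in":
        msg = {"type": "data", "payload": value}
    else:
        raise ValueError(f"unexpected input id: {input_id}")
    return json.dumps(msg)
```

For example, `frame_for_input("text_in", "response text")` produces a frame that decodes to `{"type": "text", "content": "response text"}`.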
## Environment Variables
```bash
VOICE_HOST=0.0.0.0 # Bind address
VOICE_PORT=8765 # Listen port
| Variable | Default | Description |
|-----------|-----------|--------------|
| `WS_HOST` | `0.0.0.0` | Bind address |
| `WS_PORT` | `8765` | Listen port |
## WebSocket Endpoint
```text
ws://{WS_HOST}:{WS_PORT}
```
## Installation
Default: `ws://0.0.0.0:8765` (`0.0.0.0` is the bind address; clients connect via `localhost` or the host's IP address)
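A client usually derives its connection URL from the same two variables. The sketch below assumes a hypothetical helper named `client_url`; the substitution of `localhost` for the wildcard address is an illustrative choice, not behavior documented for this node.

```python
import os


def client_url(env=None):
    """Build a client-side WebSocket URL from WS_HOST and WS_PORT.

    Hypothetical helper for illustration; defaults mirror the table above.
    """
    env = os.environ if env is None else env
    host = env.get("WS_HOST", "0.0.0.0")
    port = int(env.get("WS_PORT", "8765"))
    # 0.0.0.0 means "listen on all interfaces"; a client needs a
    # concrete address, so fall back to localhost (assumption).
    if host == "0.0.0.0":
        host = "localhost"
    return f"ws://{host}:{port}"
```

Passing a plain dict instead of `os.environ` makes the helper easy to check without touching the real environment.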
```bash
cd dora_iobridge
pip install -e .
## WebSocket Protocol
### Client → Server
| Type | Field | Description |
|---------|-----------|-----------------------|
| `text` | `content` | Text string |
| `audio` | `content` | Base64-encoded WAV |
| `data` | `payload` | Any JSON object |
| `ping` | - | Health check |
Examples:
```json
{"type": "text", "content": "hello world"}
{"type": "audio", "content": "UklGRi4AAABXQVZFZm10..."}
{"type": "data", "payload": {"key": "value"}}
{"type": "ping"}
```
### Server → Client
| Type | Field | Description |
|---------|-----------|-----------------------|
| `text` | `content` | Text string |
| `audio` | `content` | Base64-encoded WAV |
| `data` | `payload` | Any JSON object |
| `pong` | - | Response to ping |
| `error` | `message` | Error description |
Examples:
```json
{"type": "text", "content": "response text"}
{"type": "audio", "content": "UklGRi4A...", "format": "wav"}
{"type": "data", "payload": {"objects": [...]}}
{"type": "pong"}
{"type": "error", "message": "Invalid JSON"}
```
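To make the protocol tables concrete, here is a minimal Python sketch of how a bridge could route one incoming client frame. It is not the node's implementation: the function `route_client_message`, the `ROUTES` table, and every error text other than `Invalid JSON` are assumptions introduced for illustration. The output IDs come from the Dora Outputs table.

```python
import base64
import json

# Assumed mapping: client message type -> (Dora output ID, field to read).
ROUTES = {
    "text": ("text_out", "content"),
    "audio": ("audio_out", "content"),
    "data": ("data_out", "payload"),
}


def _error(message):
    return None, None, {"type": "error", "message": message}


def route_client_message(raw):
    """Return (output_id, value, reply) for one raw client frame.

    output_id and value are None when nothing is forwarded to Dora;
    reply is None when nothing is sent back to the client.
    """
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return _error("Invalid JSON")
    if not isinstance(msg, dict):
        return _error("Expected a JSON object")
    kind = msg.get("type")
    if kind == "ping":
        # Health check: answer directly, forward nothing.
        return None, None, {"type": "pong"}
    if kind not in ROUTES:
        return _error(f"Unknown type: {kind}")
    output_id, field = ROUTES[kind]
    if field not in msg:
        return _error(f"Missing field: {field}")
    value = msg[field]
    if kind == "audio":
        try:
            # Audio arrives base64-encoded; Dora receives raw WAV bytes.
            value = base64.b64decode(value)
        except (ValueError, TypeError):
            return _error("Invalid base64 audio")
    return output_id, value, None
```

Keeping the routing in a pure function like this means it can be exercised without a running WebSocket server or Dora dataflow.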
## Dataflow Example
```yaml
- id: iobridge
build: uv pip install -e dora_iobridge
path: dora_iobridge/dora_iobridge/main.py
env:
WS_HOST: "0.0.0.0"
WS_PORT: "8765"
inputs:
text_in: voice/voice_out
data_in: voice/scene_update
outputs:
- text_out
```
## Testing
### Test with WebSocket (wscat)
```bash
# Install wscat
npm install -g wscat
# Connect to the server
wscat -c ws://localhost:8765
```
### Test with websocat
```bash
# Install websocat
# Ubuntu: sudo apt install websocat
# macOS: brew install websocat
sudo apt install websocat
# Send a ping
echo '{"type": "ping"}' | websocat ws://localhost:8765
# Response: {"type": "pong"}
# Connect
websocat ws://localhost:8765
# Send a voice command
echo '{"type": "command", "text": "sube"}' | websocat ws://localhost:8765
# Send text
{"type": "text", "content": "hello"}
# Request scene refresh
echo '{"type": "scene_refresh"}' | websocat ws://localhost:8765
```
### Test with Python
```python
import asyncio
import websockets
import json
async def test_iobridge():
uri = "ws://localhost:8765"
async with websockets.connect(uri) as ws:
# Test ping
await ws.send(json.dumps({"type": "ping"}))
response = await ws.recv()
print(f"Ping response: {response}")
# Send command
await ws.send(json.dumps({
"type": "command",
"text": "agarra el cubo rojo"
}))
# Listen for responses
async for message in ws:
data = json.loads(message)
print(f"Received: {data}")
asyncio.run(test_iobridge())
```
### Test with a shell script (websocat instead of curl)
A plain HTTP request with curl does not complete the WebSocket upgrade handshake, so this script pipes the messages through websocat instead:
```bash
#!/bin/bash
# test_iobridge.sh
# Using websocat for interactive testing
websocat ws://localhost:8765 <<EOF
{"type": "ping"}
{"type": "command", "text": "sube"}
{"type": "scene_refresh"}
EOF
```
## WebSocket Message Types
### Client -> Server
**Command (voice input)**
```json
{"type": "command", "text": "agarra el cubo rojo"}
```
**Ping (health check)**
```json
# Ping
{"type": "ping"}
```
Response: `{"type": "pong"}`
**Scene Refresh**
```json
{"type": "scene_refresh"}
```
### Server -> Client (Broadcasts)
**Command Response**
```json
{
"type": "response",
"text": "Ok, voy a tomar",
"status": "ok"
}
```
**Scene Update**
```json
{
"type": "scene_updated",
"objects": [
{
"object_type": "cubo",
"color": "rojo",
"size": "big",
"position_mm": [150.0, 200.0, 280.0],
"source": "detection"
}
]
}
```
## Dora Dataflow Configuration
```yaml
nodes:
- id: iobridge
build: pip install -e ./dora_iobridge
path: dora_iobridge
inputs:
voice_out: voice_control/voice_out
scene_update: voice_control/scene_update
outputs:
- voice_in
env:
VOICE_HOST: "0.0.0.0"
VOICE_PORT: "8765"
```
```bash
dora up
dora start dataflow.yml
```
## Dependencies
- dora-rs >= 0.3.9
- pyarrow >= 12.0.0
- websockets >= 12.0