Manual Interaction

Overview

Manual interaction allows you to take direct control of the browser session and perform actions like clicking, typing, and pressing keys. This is useful when:

You need to handle sensitive information (passwords, credentials)
The AI agent encounters a situation it can’t handle
You want to guide the agent to a specific state before continuing
You are testing or debugging workflows

Coordinate System

When using manual interactions, coordinates map to the browser viewport:

Resolution: 1024 × 600 pixels
Origin: Top-left (0, 0)
Browser chrome offset: 155 pixels from top

Server Browser (1024 × 755 total)
┌────────────────────────────────────┐
│  Browser Toolbar (155px height)    │ ◄─ Not clickable
├────────────────────────────────────┤
│                                    │
│     Clickable Content Area         │ ◄─ Y starts at 155
│     (1024 × 600 pixels)            │
│                                    │
└────────────────────────────────────┘

All click coordinates are measured from the top of the full browser window, not the page content, so they must include the 155px browser chrome offset. A point at (100, 100) in the viewport is therefore sent as (100, 255).
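To make the offset concrete, here is a minimal sketch of the conversion from viewport coordinates to the values sent in CLICK actions. The helper name viewportToServer is hypothetical. The 1024 × 600 viewport size and the 155px offset come from the figures above.

```javascript
// Viewport size and toolbar height from the Coordinate System section.
const VIEWPORT_WIDTH = 1024;
const VIEWPORT_HEIGHT = 600;
const CHROME_OFFSET_Y = 155;

// Hypothetical helper: convert a point measured from the top-left of the
// page content (the viewport) into the window coordinates actions expect.
function viewportToServer(x, y) {
  if (x < 0 || x > VIEWPORT_WIDTH || y < 0 || y > VIEWPORT_HEIGHT) {
    throw new RangeError(`(${x}, ${y}) is outside the 1024 x 600 viewport`);
  }
  // Only Y changes: the toolbar sits above the content, not beside it.
  return { x, y: y + CHROME_OFFSET_Y };
}

console.log(viewportToServer(100, 100)); // { x: 100, y: 255 }
console.log(viewportToServer(0, 0));     // { x: 0, y: 155 }
```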

Scaling Coordinates

If you display the stream at a different size, scale each coordinate to the 1024 × 600 viewport and then add the 155px offset:

function scaleCoords(clientX, clientY, displayWidth, displayHeight) {
  const serverWidth = 1024;
  const serverHeight = 600;
  const yOffset = 155; // Browser chrome height

  return {
    x: Math.round((clientX / displayWidth) * serverWidth),
    y: Math.round((clientY / displayHeight) * serverHeight) + yOffset
  };
}

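// Example usage: measure the click relative to the canvas's displayed size.
// Assumes `canvas` overlays the stream and `socket` is a connected Socket.IO client.
canvas.onclick = (e) => {
  const rect = canvas.getBoundingClientRect();
  const coords = scaleCoords(
    e.clientX - rect.left,
    e.clientY - rect.top,
    rect.width,
    rect.height
  );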

  socket.emit("message", {
    actionType: "interaction",
    action: { type: "CLICK", x: coords.x, y: coords.y }
  });
};
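Running the mapping in reverse can help when you want to draw a marker on your overlay at a server coordinate. For example, you can check where a hardcoded click at (500, 300) will land before sending it. The sketch below is a hypothetical inverse of scaleCoords that uses the same constants. It returns unrounded display pixels, which is fine for drawing.

```javascript
// Hypothetical inverse of scaleCoords: server (window) coordinates -> display pixels.
function serverToDisplay(x, y, displayWidth, displayHeight) {
  const serverWidth = 1024;
  const serverHeight = 600;
  const yOffset = 155; // Browser chrome height
  return {
    x: (x / serverWidth) * displayWidth,
    // Remove the toolbar offset before scaling back down to the display.
    y: ((y - yOffset) / serverHeight) * displayHeight
  };
}

// On a half-size 512 x 300 display:
console.log(serverToDisplay(512, 455, 512, 300)); // { x: 256, y: 150 }
console.log(serverToDisplay(0, 155, 512, 300));   // { x: 0, y: 0 }
```

As a cross-check, scaleCoords(256, 150, 512, 300) returns { x: 512, y: 455 }. The two functions therefore agree, up to the rounding that scaleCoords applies.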

Taking Over Control

takeOverControl

Enable manual mode. The AI agent pauses, and you gain full control of the browser.

curl -X POST https://connect.enigma.click/start/send-message \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "sessionId": "SESSION_ID",
    "message": {
      "actionType": "interaction",
      "action": { "type": "takeOverControl" }
    }
  }'

Response:

{
  "success": true,
  "message": "Message sent successfully"
}
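If you call the REST endpoint from JavaScript instead of curl, a minimal sketch of the same request using fetch might look like this. The helper names buildSendMessageRequest and sendInteraction are hypothetical. The URL, headers, and body shape are copied from the curl example. Treating a non-2xx status as an error is an assumption, because error responses are not documented here.

```javascript
const SEND_MESSAGE_URL = "https://connect.enigma.click/start/send-message";

// Hypothetical helper: builds the same request as the curl example above.
function buildSendMessageRequest(apiKey, sessionId, action) {
  return {
    url: SEND_MESSAGE_URL,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        "Authorization": `Bearer ${apiKey}`
      },
      body: JSON.stringify({
        sessionId,
        message: { actionType: "interaction", action }
      })
    }
  };
}

// Sends the request with the global fetch (Node 18+ or a browser) and
// returns the parsed JSON body, e.g. { success: true, message: "..." }.
async function sendInteraction(apiKey, sessionId, action) {
  const { url, init } = buildSendMessageRequest(apiKey, sessionId, action);
  const res = await fetch(url, init);
  if (!res.ok) {
    // Assumption: failures are reported with a non-2xx status.
    throw new Error(`send-message failed with HTTP ${res.status}`);
  }
  return res.json();
}

// Usage: await sendInteraction(API_KEY, sessionId, { type: "takeOverControl" });
```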

Taking control automatically pauses the current task. Perform your manual actions, then release control to let the agent continue.

releaseControl

Return control to the AI agent. The agent will resume from where it left off.

curl -X POST https://connect.enigma.click/start/send-message \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "sessionId": "SESSION_ID",
    "message": {
      "actionType": "interaction",
      "action": { "type": "releaseControl" }
    }
  }'
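The agent stays paused until you release control, so it can be worth guaranteeing that releaseControl is always sent, even if one of your manual steps throws. The following is a minimal sketch over the WebSocket connection. It assumes socket is a connected Socket.IO client, as in the Complete Example. withManualControl is a hypothetical helper name.

```javascript
// Hypothetical wrapper: take control, run manual steps, always release.
async function withManualControl(socket, steps) {
  socket.emit("message", {
    actionType: "interaction",
    action: { type: "takeOverControl" }
  });
  try {
    // Run the caller's manual actions while the agent is paused.
    return await steps(socket);
  } finally {
    // Runs on success and on failure, so the agent is never left paused.
    socket.emit("message", {
      actionType: "interaction",
      action: { type: "releaseControl" }
    });
  }
}

// Usage:
// await withManualControl(socket, async (s) => {
//   s.emit("message", {
//     actionType: "interaction",
//     action: { type: "CLICK", x: 400, y: 300 }
//   });
// });
```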

Interaction Actions

click

Click at specific coordinates. Parameters:

type: "CLICK" (required)
x: X coordinate (0-1024) (required)
y: Y coordinate (155-755, accounting for chrome offset) (required)

curl -X POST https://connect.enigma.click/start/send-message \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "sessionId": "SESSION_ID",
    "message": {
      "actionType": "interaction",
      "action": { "type": "CLICK", "x": 500, "y": 300 }
    }
  }'
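A small builder can catch the most common mistake, a missing chrome offset, before the message is sent. This sketch checks the documented ranges (x 0-1024, y 155-755). The same ranges apply to DOUBLE_CLICK below, so the action type is a parameter. clickMessage is a hypothetical name, and treating the range endpoints as inclusive is an assumption.

```javascript
// Hypothetical builder for CLICK / DOUBLE_CLICK interaction messages.
function clickMessage(x, y, type = "CLICK") {
  if (type !== "CLICK" && type !== "DOUBLE_CLICK") {
    throw new TypeError(`unsupported click type: ${type}`);
  }
  if (!Number.isFinite(x) || x < 0 || x > 1024) {
    throw new RangeError(`x must be in 0-1024, got ${x}`);
  }
  // A y below 155 usually means the chrome offset was not added.
  if (!Number.isFinite(y) || y < 155 || y > 755) {
    throw new RangeError(`y must be in 155-755 (is the 155px offset added?), got ${y}`);
  }
  return { actionType: "interaction", action: { type, x, y } };
}

// Usage with a connected socket:
// socket.emit("message", clickMessage(500, 300));
```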

doubleClick

Perform a double-click at specific coordinates. Parameters:

type: "DOUBLE_CLICK" (required)
x: X coordinate (0-1024) (required)
y: Y coordinate (155-755) (required)

curl -X POST https://connect.enigma.click/start/send-message \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "sessionId": "SESSION_ID",
    "message": {
      "actionType": "interaction",
      "action": { "type": "DOUBLE_CLICK", "x": 500, "y": 300 }
    }
  }'

type

Type text into the currently focused element. Parameters:

type: "TYPE" (required)
text: Text to type (required)
humanLike: Simulate human typing speed (optional, default: false)

curl -X POST https://connect.enigma.click/start/send-message \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "sessionId": "SESSION_ID",
    "message": {
      "actionType": "interaction",
      "action": { "type": "TYPE", "text": "Hello world", "humanLike": true }
    }
  }'

Set humanLike: true to simulate realistic typing speed and patterns. This can help avoid detection on sites with anti-bot measures.
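TYPE sends text to whichever element currently has focus, so a typical sequence is to click the field, pause briefly, and then type. The following is a minimal sketch. It assumes socket is any object with an emit(event, payload) method, such as a connected Socket.IO client. typeInto is a hypothetical helper. The 500 ms default pause is borrowed from the Troubleshooting recommendation and is not a documented requirement.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: focus a field by clicking it, then type into it.
async function typeInto(socket, x, y, text, { humanLike = false, delayMs = 500 } = {}) {
  socket.emit("message", {
    actionType: "interaction",
    action: { type: "CLICK", x, y }
  });
  await sleep(delayMs); // give the page time to focus the field
  socket.emit("message", {
    actionType: "interaction",
    action: { type: "TYPE", text, humanLike }
  });
}

// Usage: await typeInto(socket, 400, 300, "user@example.com", { humanLike: true });
```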

keyPress

Press a single key or key combination. Parameters:

type: "KEY_PRESS" (required)
key: Key name (required)

curl -X POST https://connect.enigma.click/start/send-message \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "sessionId": "SESSION_ID",
    "message": {
      "actionType": "interaction",
      "action": { "type": "KEY_PRESS", "key": "Enter" }
    }
  }'

Common key values:

Enter
Escape
Tab
Backspace
Delete
ArrowUp, ArrowDown, ArrowLeft, ArrowRight
PageUp, PageDown
Home, End
Single characters: a, b, 1, 2, etc.
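The key names listed above coincide with the browser's KeyboardEvent.key values, so keystrokes captured on your own page can be forwarded with little translation. The sketch below forwards only the listed named keys plus single printable characters. Whether the service accepts any other key names is not documented here, so other keys are dropped. keyEventToMessage and NAMED_KEYS are hypothetical names.

```javascript
// Named keys from the list above; anything else longer than one character is ignored.
const NAMED_KEYS = new Set([
  "Enter", "Escape", "Tab", "Backspace", "Delete",
  "ArrowUp", "ArrowDown", "ArrowLeft", "ArrowRight",
  "PageUp", "PageDown", "Home", "End"
]);

// Returns a KEY_PRESS message for a KeyboardEvent-like object, or null
// if the key should not be forwarded.
function keyEventToMessage(event) {
  const { key } = event;
  if (NAMED_KEYS.has(key) || key.length === 1) {
    return { actionType: "interaction", action: { type: "KEY_PRESS", key } };
  }
  return null;
}

// Browser usage (assumes a connected `socket`):
// document.addEventListener("keydown", (e) => {
//   const msg = keyEventToMessage(e);
//   if (msg) {
//     e.preventDefault(); // keep the key from also acting on your own page
//     socket.emit("message", msg);
//   }
// });
```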

Combining with Video Streaming

Manual interaction is most useful when combined with video streaming so you can see what you’re clicking.

Complete Example

import { io } from "socket.io-client";

// Assumes an ES module context (for top-level await) and that API_KEY holds your API key.
// 1. Create a session
const session = await fetch("https://connect.enigma.click/start/start-session", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "Authorization": `Bearer ${API_KEY}`
  },
  body: JSON.stringify({
    taskDetails: "Go to login page",
    startingUrl: "https://example.com/login"
  })
}).then(r => r.json());

// 2. Set up the video stream
const streamFrame = document.getElementById("stream"); // an <iframe>
const canvas = document.getElementById("clickable-overlay"); // transparent canvas over the iframe

// Load the stream into the iframe
streamFrame.src = session.streaming.webViewURL;

// 3. Connect WebSocket
const socket = io("https://connect.enigma.click", {
  auth: { sessionId: session.sessionId },
  transports: ["websocket"]
});

// 4. Take over control when ready
socket.on("connect", () => {
  socket.emit("message", {
    actionType: "interaction",
    action: { type: "takeOverControl" }
  });
});

// 5. Set up the click handler
canvas.addEventListener("click", (e) => {
  const rect = canvas.getBoundingClientRect();
  const coords = scaleCoords(
    e.clientX - rect.left,
    e.clientY - rect.top,
    rect.width,
    rect.height
  );

  console.log("Clicking at:", coords);

  socket.emit("message", {
    actionType: "interaction",
    action: { type: "CLICK", x: coords.x, y: coords.y }
  });
});

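// 6. Set up a text field for typing. "change" fires once the user commits the
// value, so the full text is sent once. Sending e.target.value on every "input"
// event would retype the whole prefix on each keystroke.
document.getElementById("username-input").addEventListener("change", (e) => {
  socket.emit("message", {
    actionType: "interaction",
    action: { type: "TYPE", text: e.target.value, humanLike: true }
  });
});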

// 7. Release control when done
document.getElementById("done-button").addEventListener("click", () => {
  socket.emit("message", {
    actionType: "interaction",
    action: { type: "releaseControl" }
  });

  // Optionally send a new task
  socket.emit("message", {
    actionType: "newTask",
    newState: "start",
    taskDetails: "Complete the checkout process"
  });
});

function scaleCoords(clientX, clientY, displayWidth, displayHeight) {
  const serverWidth = 1024;
  const serverHeight = 600;
  const yOffset = 155; // Browser chrome height

  return {
    x: Math.round((clientX / displayWidth) * serverWidth),
    y: Math.round((clientY / displayHeight) * serverHeight) + yOffset
  };
}

Common Patterns

Pattern 1: Manual Login

// Take control
socket.emit("message", {
  actionType: "interaction",
  action: { type: "takeOverControl" }
});

// Click username field (adjust coordinates as needed)
socket.emit("message", {
  actionType: "interaction",
  action: { type: "CLICK", x: 400, y: 300 }
});

// Type username
socket.emit("message", {
  actionType: "interaction",
  action: { type: "TYPE", text: "user@example.com", humanLike: true }
});

// Press Tab to move to password field
socket.emit("message", {
  actionType: "interaction",
  action: { type: "KEY_PRESS", key: "Tab" }
});

// Type password
socket.emit("message", {
  actionType: "interaction",
  action: { type: "TYPE", text: "secretPassword123", humanLike: true }
});

// Press Enter to submit
socket.emit("message", {
  actionType: "interaction",
  action: { type: "KEY_PRESS", key: "Enter" }
});

// Release control back to the agent after giving the form time to submit
setTimeout(() => {
  socket.emit("message", {
    actionType: "interaction",
    action: { type: "releaseControl" }
  });
}, 2000);

Pattern 2: Selecting from Dropdown

// Click to open dropdown
socket.emit("message", {
  actionType: "interaction",
  action: { type: "CLICK", x: 500, y: 350 }
});

// Wait for dropdown to open
await new Promise(r => setTimeout(r, 500));

// Press Down arrow to navigate options
socket.emit("message", {
  actionType: "interaction",
  action: { type: "KEY_PRESS", key: "ArrowDown" }
});

socket.emit("message", {
  actionType: "interaction",
  action: { type: "KEY_PRESS", key: "ArrowDown" }
});

// Press Enter to select
socket.emit("message", {
  actionType: "interaction",
  action: { type: "KEY_PRESS", key: "Enter" }
});

Pattern 3: CAPTCHA Handling

// Agent detects CAPTCHA and triggers guardrail
socket.on("message", (data) => {
  if (data.type === "guardrail_trigger" &&
      data.data.value.includes("CAPTCHA")) {

    // Take control for manual CAPTCHA solving
    socket.emit("message", {
      actionType: "interaction",
      action: { type: "takeOverControl" }
    });

    // User solves CAPTCHA manually via video stream
    // Once solved, release control
    document.getElementById("captcha-solved-btn").onclick = () => {
      socket.emit("message", {
        actionType: "interaction",
        action: { type: "releaseControl" }
      });

      // Resume the task
      socket.emit("message", {
        actionType: "guardrail",
        taskDetails: "CAPTCHA solved, continue",
        newState: "resume"
      });
    };
  }
});
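The check above assumes every guardrail message carries a string at data.data.value. If one does not, value.includes throws inside the handler. The predicate below is a slightly more defensive version. It assumes the payload shape is { type, data: { value } }, as in the example. isCaptchaGuardrail is a hypothetical name, and the case-insensitive match is an added assumption.

```javascript
// Hypothetical predicate: true only for guardrail messages that mention a CAPTCHA.
function isCaptchaGuardrail(message) {
  // Optional chaining avoids a TypeError when fields are missing.
  const value = message?.data?.value;
  return (
    message?.type === "guardrail_trigger" &&
    typeof value === "string" &&
    value.toUpperCase().includes("CAPTCHA")
  );
}

// Usage: socket.on("message", (data) => { if (isCaptchaGuardrail(data)) { /* take control */ } });
```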

Best Practices

1. Always Use Coordinate Scaling

Don’t send display coordinates directly. Always scale from your display to server coordinates:

// Good: position relative to the canvas, scaled to the server viewport
const rect = canvas.getBoundingClientRect();
const coords = scaleCoords(e.clientX - rect.left, e.clientY - rect.top, rect.width, rect.height);
socket.emit("message", {
  actionType: "interaction",
  action: { type: "CLICK", x: coords.x, y: coords.y }
});

// Bad: raw page coordinates ignore the display size and the 155px offset
socket.emit("message", {
  actionType: "interaction",
  action: { type: "CLICK", x: e.clientX, y: e.clientY }
});

2. Add Delays Between Actions

Give the browser time to respond between actions:

async function performSequence() {
  socket.emit("message", {
    actionType: "interaction",
    action: { type: "CLICK", x: 400, y: 300 }
  });

  await new Promise(r => setTimeout(r, 500)); // Wait 500ms

  socket.emit("message", {
    actionType: "interaction",
    action: { type: "TYPE", text: "search term" }
  });

  await new Promise(r => setTimeout(r, 300));

  socket.emit("message", {
    actionType: "interaction",
    action: { type: "KEY_PRESS", key: "Enter" }
  });
}
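The same idea generalizes to any list of actions. The following is a minimal sketch. It assumes socket is any object with an emit(event, payload) method. runSequence is a hypothetical helper. Using one fixed delay between all steps is a simplification, because the example above uses both 500 ms and 300 ms.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical helper: send interaction actions in order, pausing between them.
async function runSequence(socket, actions, delayMs = 500) {
  for (let i = 0; i < actions.length; i++) {
    socket.emit("message", { actionType: "interaction", action: actions[i] });
    // Pause between actions, but not after the last one.
    if (i < actions.length - 1) {
      await sleep(delayMs);
    }
  }
}

// Usage (in an async context), equivalent to performSequence:
// await runSequence(socket, [
//   { type: "CLICK", x: 400, y: 300 },
//   { type: "TYPE", text: "search term" },
//   { type: "KEY_PRESS", key: "Enter" }
// ]);
```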

3. Use humanLike Typing for Sensitive Sites

For sites with anti-bot detection:

socket.emit("message", {
  actionType: "interaction",
  action: {
    type: "TYPE",
    text: "realistic human input",
    humanLike: true  // ← Simulates human typing speed
  }
});

4. Verify Actions with Video Stream

Always combine manual interaction with video streaming to verify your actions:

// Set up the video stream first
const iframe = document.createElement("iframe");
iframe.src = session.streaming.webViewURL;
iframe.width = 1024;
iframe.height = 600;
document.body.appendChild(iframe);

// Then perform manual interactions while watching the stream

Troubleshooting

Issue	Solution
Clicks not registering	Verify Y offset (155px) is added to coordinates
Clicking wrong location	Check coordinate scaling function
Typing not working	Ensure element is focused first (click it)
Key presses ignored	Verify key name matches expected values
Actions happening too fast	Add delays between actions (500ms recommended)

Next Steps

Video Streaming

Set up video streaming for manual interaction

Controlling Sessions

Pause, resume, and manage task execution

Handling Guardrails

Respond when the agent needs help

Multi-Task Workflows

Chain multiple tasks together
