Object detection
In this guide you’ll wire a Python sidecar that subscribes to a Smelter input’s
video side channel, runs YOLO object detection on
every frame, and posts the detection list back to the TypeScript app. The TS app
holds the current detections in a Zustand store;
the JSX composition renders one bounding-box view per detection, animated with a
transition so the boxes move smoothly.
The TS app is built up across the steps below in one app.tsx file. The Python
sidecar lives in detect.py.
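As a rough orientation before the steps, the sketch below shows the shape of the detection payload that travels from the sidecar to the app, and how its normalized coordinates map back to output pixels. The helper names (to_detection, to_output_pixels) are illustrative and not part of any library; the field names and the 1920x1080 output size mirror the code used later in this guide.

```python
# Illustrative helpers (hypothetical names). The payload format mirrors the
# Detection interface that the TS app in this guide expects.

OUTPUT_W, OUTPUT_H = 1920, 1080  # output resolution used by the composition


def to_detection(x1, y1, x2, y2, frame_w, frame_h, track_id=None):
    """Convert a pixel-space corner box into a normalized (0..1) detection."""
    return {
        "id": track_id,
        "x": x1 / frame_w,
        "y": y1 / frame_h,
        "width": (x2 - x1) / frame_w,
        "height": (y2 - y1) / frame_h,
    }


def to_output_pixels(det):
    """Map a normalized detection to output pixels, roughly as Box does."""
    return {
        "left": round(det["x"] * OUTPUT_W),
        "top": round(det["y"] * OUTPUT_H),
        "width": max(2, round(det["width"] * OUTPUT_W)),
        "height": max(2, round(det["height"] * OUTPUT_H)),
    }


# A 640x360 frame with a box from (160, 90) to (320, 270), tracked as id 7.
det = to_detection(160, 90, 320, 270, frame_w=640, frame_h=360, track_id=7)
payload = {"detections": [det]}  # body of the POST to /update
print(payload)
print(to_output_pixels(det))
```

Because the coordinates are normalized by the input frame size, the sidecar does not need to know the output resolution; only the composition does.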
-
Install the TypeScript app’s dependencies and the Python sidecar’s dependencies, then export the directory where Smelter will create the side channel sockets. Both the TS app and the sidecar read it from the environment, so set it once in the shell you run them from.
pnpm add @swmansion/smelter @swmansion/smelter-node react zustand
pip install smelter-sdk ultralytics opencv-python
export SMELTER_SIDE_CHANNEL_SOCKET_DIR=/tmp/smelter-sockets
-
Initialise Smelter.
app.tsx
import Smelter from "@swmansion/smelter-node";

async function main() {
  const smelter = new Smelter();
  await smelter.init();
}

main().catch(console.error);
-
Add a Zustand store (or any other state management) for the current detection list, plus the HTTP endpoint the sidecar POSTs to. The endpoint writes to the store from outside React with useStore.getState().setDetections(...), which re-renders the JSX.
app.tsx
import { create } from "zustand";
import http from "node:http";

interface Detection {
  id: number | null;
  x: number; y: number; width: number; height: number;
}

interface DetectionStore {
  detections: Detection[];
  setDetections: (detections: Detection[]) => void;
}

const useStore = create<DetectionStore>((set) => ({
  detections: [],
  setDetections: (detections) => set({ detections }),
}));

http.createServer((req, res) => {
  if (req.method !== "POST" || req.url !== "/update") {
    res.statusCode = 404;
    res.end();
    return;
  }
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    const { detections } = JSON.parse(body) as { detections: Detection[] };
    useStore.getState().setDetections(detections);
    res.end();
  });
}).listen(3001, "127.0.0.1");
-
Wire the WHIP input, WHEP output, and the bounding-box composition. The input’s sideChannel.delayMs delays the output relative to the input, giving the sidecar time to run YOLO and push detections into the store before the matching frame is rendered. The 200 ms transition makes each box interpolate smoothly between updates; the per-detection key lets React reuse the same view across frames so the transition has a previous position to start from.
app.tsx
import Smelter from "@swmansion/smelter-node";
import http from "node:http";
import { create } from "zustand";
import { View, InputStream, Rescaler } from "@swmansion/smelter";

interface Detection {
  id: number | null;
  x: number; y: number; width: number; height: number;
}

interface DetectionStore {
  detections: Detection[];
  setDetections: (detections: Detection[]) => void;
}

const useStore = create<DetectionStore>((set) => ({
  detections: [],
  setDetections: (detections) => set({ detections }),
}));

http.createServer((req, res) => {
  if (req.method !== "POST" || req.url !== "/update") {
    res.statusCode = 404;
    res.end();
    return;
  }
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    const { detections } = JSON.parse(body) as { detections: Detection[] };
    useStore.getState().setDetections(detections);
    res.end();
  });
}).listen(3001, "127.0.0.1");

const OUTPUT_W = 1920;
const OUTPUT_H = 1080;

function Box({ det }: { det: Detection }) {
  return (
    <View
      style={{
        left: Math.round(det.x * OUTPUT_W),
        top: Math.round(det.y * OUTPUT_H),
        width: Math.max(2, Math.round(det.width * OUTPUT_W)),
        height: Math.max(2, Math.round(det.height * OUTPUT_H)),
        borderWidth: 4,
        borderColor: "#00FF88FF",
        borderRadius: 6,
      }}
      transition={{ durationMs: 200 }}
    />
  );
}

function Composition() {
  const detections = useStore((s) => s.detections);
  return (
    <View style={{ width: OUTPUT_W, height: OUTPUT_H }}>
      <Rescaler>
        <InputStream inputId="input" />
      </Rescaler>
      {detections.map((det, i) => (
        <Box key={det.id ?? `i-${i}`} det={det} />
      ))}
    </View>
  );
}

async function main() {
  const smelter = new Smelter();
  await smelter.init();

  await smelter.registerInput("input", {
    type: "whip_server",
    bearerToken: "example",
    sideChannel: { video: true, delayMs: 200 },
  });

  await smelter.registerOutput("output", <Composition />, {
    type: "whep_server",
    bearerToken: "example",
    video: {
      resolution: { width: OUTPUT_W, height: OUTPUT_H },
      encoder: { type: "ffmpeg_h264", preset: "ultrafast" },
    },
    audio: { encoder: { type: "opus" } },
  });

  await smelter.start();
}

main().catch(console.error);
Run the TS app with tsx app.tsx (or your preferred TypeScript runner). Smelter starts the side channel sockets and waits for a WHIP stream.
-
The Python sidecar subscribes to the video side channel, runs YOLO on every frame (model.track(...) persists a per-target id across frames), and POSTs the detection list to the TS app’s HTTP endpoint.
detect.py
import json
import urllib.request

import cv2
from smelter import subscribe_video_channel
from ultralytics import YOLO

APP_URL = "http://127.0.0.1:3001/update"
INPUT_ID = "input"
MIN_CONFIDENCE = 0.5


def post(body: dict):
    req = urllib.request.Request(
        APP_URL,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req).read()


def main():
    model = YOLO("yolov8n.pt")
    for frame in subscribe_video_channel(INPUT_ID):
        bgr = cv2.cvtColor(frame.rgba, cv2.COLOR_RGBA2BGR)
        results = model.track(bgr, persist=True, verbose=False, classes=[0])
        if not results or results[0].boxes is None:
            continue
        boxes = results[0].boxes
        xyxy = boxes.xyxy.cpu().numpy()
        conf = boxes.conf.cpu().numpy()
        ids = (
            boxes.id.cpu().numpy().astype(int).tolist()
            if boxes.id is not None
            else [None] * len(xyxy)
        )
        detections = [
            {
                "id": tid,
                "x": float(x1) / frame.width,
                "y": float(y1) / frame.height,
                "width": float(x2 - x1) / frame.width,
                "height": float(y2 - y1) / frame.height,
            }
            for (x1, y1, x2, y2), p, tid in zip(xyxy, conf, ids)
            if p >= MIN_CONFIDENCE
        ]
        post({"detections": detections})


if __name__ == "__main__":
    main()
Run it with python detect.py, in the same shell where you exported SMELTER_SIDE_CHANNEL_SOCKET_DIR. classes=[0] restricts detection to people (COCO class 0); drop it or pass other class IDs to detect different objects. See the ultralytics docs for the full class list and other YOLO knobs (model size, GPU, NMS thresholds).
-
Stream a test source and watch the result with Smelter’s hosted browser tools (no install required):
- Publish your camera or screen with the WHIP streamer.
- Watch the composed output with the WHEP player.
Each detection appears as a green rectangle that follows its target across frames.
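To check the overlay without a camera subject or YOLO in the loop, one option is to feed the app synthetic detections. The sketch below assumes the TS app from this guide is listening on http://127.0.0.1:3001/update; the function names and the --send flag are made up for illustration, and only the standard library is used.

```python
import json
import math
import sys
import time
import urllib.request

APP_URL = "http://127.0.0.1:3001/update"  # endpoint served by the TS app in this guide


def synthetic_detection(t, period_s=4.0):
    """Return one normalized box that sweeps left to right and back."""
    # phase moves 0 -> 1 -> 0 over one period
    phase = 0.5 * (1 - math.cos(2 * math.pi * t / period_s))
    width, height = 0.2, 0.4
    return {
        "id": 1,  # constant id, so the app keeps reusing the same view
        "x": phase * (1 - width),
        "y": 0.3,
        "width": width,
        "height": height,
    }


def post(body):
    req = urllib.request.Request(
        APP_URL,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=2) as resp:
        resp.read()


def run(duration_s=20.0, interval_s=0.1):
    """POST a moving box every interval_s seconds, then clear the overlay."""
    start = time.monotonic()
    while (t := time.monotonic() - start) < duration_s:
        post({"detections": [synthetic_detection(t)]})
        time.sleep(interval_s)
    post({"detections": []})


# Only touch the network when explicitly asked to (hypothetical --send flag).
if __name__ == "__main__" and "--send" in sys.argv:
    run()
```

Saved under any filename and run with the --send argument while the TS app is running, this should make a single green box glide across the output. If it does, the store, endpoint, and composition are working, and any remaining problem is likely on the sidecar side.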