Markov AI releases dataset of 48K screen recordings for computer-use agent training

Getting an AI agent to reliably navigate Photoshop or Salesforce requires one thing above almost everything else: footage of humans actually doing it. Markov AI is trying to fill that gap.

The company has published "Computer Use Large," an open-source dataset of 48,478 screen recording videos totaling approximately 12,300 hours of professional software usage, available on Hugging Face under the CC-BY-4.0 license. The dataset spans six categories: Blender (11,493 videos, 3,624 hours), Photoshop (10,704 videos, 2,060 hours), AutoCAD (10,059 videos, 2,149 hours), Excel (8,111 videos, 2,002 hours), Salesforce (7,807 videos, 2,336 hours), and VS Code (304 videos, 127 hours). All recordings have been preprocessed to strip intros, outros, talking heads, and audio, leaving clean GUI interaction footage with structured per-video metadata in JSONL format. The dataset accumulated over 45,000 downloads within its first month of release.
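As a rough illustration of how per-video JSONL metadata like this could be tallied by category, here is a minimal Python sketch. The field names `video_id`, `category`, and `duration_seconds` are assumptions made for the example, not the dataset's documented schema, and the sample records are invented.

```python
import json
from collections import defaultdict


def summarize_metadata(lines):
    """Tally video count and total hours per category from JSONL lines."""
    totals = defaultdict(lambda: {"videos": 0, "hours": 0.0})
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        # "category" and "duration_seconds" are hypothetical field names.
        bucket = totals[record["category"]]
        bucket["videos"] += 1
        bucket["hours"] += record["duration_seconds"] / 3600.0
    return dict(totals)


# Invented sample records standing in for a real metadata file.
SAMPLE = (
    '{"video_id": "a1", "category": "blender", "duration_seconds": 1800}\n'
    '{"video_id": "b2", "category": "blender", "duration_seconds": 5400}\n'
    '{"video_id": "c3", "category": "excel", "duration_seconds": 900}\n'
)

if __name__ == "__main__":
    summary = summarize_metadata(SAMPLE.splitlines())
    for category, stats in sorted(summary.items()):
        print(f"{category}: {stats['videos']} videos, {stats['hours']:.2f} hours")
```

With a real metadata file, the same function could be fed an open file handle instead of the sample string, once the field names are adjusted to whatever the dataset actually uses.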

The dataset is designed for training and evaluating computer-use agents — AI systems that navigate desktop software through clicking, typing, and scrolling rather than structured APIs. Real-world GUI interaction data at this scale has historically been scarce and expensive to curate. The coverage skews toward enterprise and creative tools: Salesforce, Blender, and Photoshop represent the kind of deep, multi-step professional workflows that desktop automation companies have been circling for years. VS Code's thin representation — just 304 videos against more than 10,000 for AutoCAD — may limit the dataset's immediate usefulness for coding agent development.
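To put the category imbalance in numbers, the short Python sketch below computes each category's share of videos and hours, plus average video length, using only the figures reported above. The function name and output layout are illustrative choices.

```python
# (videos, hours) per category, as reported in the article.
CATEGORIES = {
    "Blender": (11_493, 3_624),
    "Photoshop": (10_704, 2_060),
    "AutoCAD": (10_059, 2_149),
    "Excel": (8_111, 2_002),
    "Salesforce": (7_807, 2_336),
    "VS Code": (304, 127),
}


def category_shares(categories):
    """Return each category's share of videos and hours, and mean minutes per video."""
    total_videos = sum(videos for videos, _ in categories.values())
    total_hours = sum(hours for _, hours in categories.values())
    return {
        name: {
            "video_share": videos / total_videos,
            "hour_share": hours / total_hours,
            "avg_minutes": hours * 60 / videos,
        }
        for name, (videos, hours) in categories.items()
    }


if __name__ == "__main__":
    for name, stats in category_shares(CATEGORIES).items():
        print(
            f"{name:<11} {stats['video_share']:6.1%} of videos, "
            f"{stats['hour_share']:6.1%} of hours, "
            f"{stats['avg_minutes']:5.1f} min/video"
        )
```

On these figures, VS Code accounts for well under 1% of the videos and roughly 1% of the hours, while the other five categories each contribute between about 16% and 24% of the videos.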

Anthropic, OpenAI, and Google DeepMind have all shipped computer-use products in the past year (Claude's computer use feature, Operator, and Project Mariner, respectively), pulling significant capital and researcher attention into the space. Markov AI's own Hugging Face organization also hosts a separate dataset of agent trajectories on OSWorld tasks, explicitly linked to ComputerRL, an open-source framework from researchers at Tsinghua University and Zhipu AI. That pairing positions the company as data infrastructure rather than a model builder competing against the frontier labs. The company describes its focus as providing "RL environments for computer-use AI" and high-quality training data for the next generation of desktop agents.

Markov AI is run by Dev Mandal, an IIT Madras graduate who previously interned at Founders Fund-backed startup Delphi and studied at Stanford. The company appears to be pre-seed or bootstrapped, and the GUI automation focus is a sharp turn from its original pitch in August 2025 as a robotics data labeling company, billed then as "the Scale AI for Robotics." The free CC-BY-4.0 release functions as both a credibility play and a developer acquisition strategy: maximizing downstream adoption and citation builds visibility in a crowded infrastructure niche. The company's monetization presumably sits in higher-value proprietary services such as custom data pipelines, RL environment access, and bespoke trajectory curation.