CAD-Assistant: Tool-Augmented VLLMs as Generic CAD Task Solvers

Dimitrios Mallis¹, Ahmet Serdar Karadeniz¹, Sebastian Cavada¹, Danila Rukhovich¹, Niki Foteinopoulou, Kseniya Cherenkova², Anis Kacem¹, Djamila Aouada¹

SnT, University of Luxembourg¹ Artec3D²

arXiv Code

Accepted at ICCV'25

CAD-Assistant is a tool-augmented VLLM framework for AI-assisted CAD. Our framework generated FreeCAD code that is executed within CAD software directly and can process multimodal inputs, including textual queries, sketches, drawn commands and 3D scans. This figure showcases various examples of generic CAD queries and the responses generated by CAD-Assistant.

Abstract

We propose CAD-Assistant , a general-purpose CAD agent for AI-assisted design. Our approach is based on a powerful Vision and Large Language Model (VLLM) as a planner and a tool-augmentation paradigm using CADspecific tools. CAD-Assistant addresses multimodal user queries by generating actions that are iteratively executed on a Python interpreter equipped with the FreeCAD software, accessed via its Python API. Our framework is able to assess the impact of generated CAD commands on geometry and adapts subsequent actions based on the evolving state of the CAD design. We consider a wide range of CAD-specific tools including a sketch image parameterizer, rendering modules, a 2D cross-section generator, and other specialized routines. CAD-Assistant is evaluated on multiple CAD benchmarks, where it outperforms VLLM baselines and supervised task-specific methods. Beyond existing benchmarks, we qualitatively demonstrate the potential of tool-augmented VLLMs as general-purpose CAD solvers across diverse workflows.

How CAD-Assistant works?

CAD-Assistant is a tool-augmented framework that uses a vision-language model to plan CAD actions, enhanced by specialized tools for geometric reasoning and multimodal understanding. It generates CAD code via the FreeCAD API, iteratively refining actions based on the evolving design state. The followinng annimation shows the execution flow for autoconstraining. CAD-Assistant utilizes the sketch recognizer function for multimodal CAD understanding and generates constraints in a chain-of-thought manner.

General framework

The CAD-Assistant framework can be described as follows: A multimodal user request \( x_0 \) is provided as context to a VLLM planner \( P \) that responds with a plan \( p_t \) and an action \( a_t \) (python code). The action is executed on an environment \( E \) equipped with CAD software. The output generated from the execution is then concatenated with the previously generated context and fed back to the planner, enabling the generation of the next iterative step. When \( x_0 \) has been successfully addressed, \( P \) generates \( P_T \) , a special TERMINATE plan that indicates the completion of CAD-Assistant’s response:

Figure 1: Overview of CAD-Assistant.

CAD-specific tools

CAD-Assistant integrates a wide range of external CAD-specific modules, including a hand-drawn image parameterizer, rendering modules for multimodal CAD sequence understanding, a specialized utility for analysis of geometric constraints and a 2D cross-section generator for VLLM interaction with 3D scans. Our CAD-specific tool set is summarized in the following table:

Tool	Description
Python Interpreter	Action format and logical operations.
FreeCAD	Integration with CAD software.
Sketch Parameterizer	Hand-drawn sketch image to CAD sketch network.
Sketch Recognizer	Renders sketch and plots parameters.
Solid Recognizer	Renders a 3D CAD model and plots parameters.
Constraint Checker	Renders a 3D CAD model and plots parameters.
Crosssection Extract	Generates an image of a cross section from a 3D mesh.

Real-world applications

We identify emerging capabilities of tool-augmented CAD agents and demonstrate their potential beyond existing benchmarks through real-world use cases. These include generating 3D solids from hand-drawn sketches, performing 3D reverse engineering from scans using cross-section parameterization, and enabling visual CAD design through semantically interpretable drawing commands, such as sketching an extrusion operation.

Results and Comparisons

To address the lack of benchmarks for tool use akin to specialized sets commonly used in other domains, this work adopts an evaluation setting for generic CAD agents leveraging multiple existing CAD tasks. Evaluations are conducted for 2D and 3D CAD question answering, auto-constraining, and hand-drawn CAD sketch image parametrization. CAD-Assistant outperforms both VLLM baselines and supervised taskspecific methods trained on large-scale datasets, despite being prompted in a zero-shot manner.

2D/3D CAD Question Answering Performance (SGPBench)

Method	Planner	2D Acc ↑	3D Acc↑
SGPBench	GPT-4 mini	0.594	0.737
	GPT-4 Turbo	0.674	0.762
	GPT-4o	0.686	0.782
CAD-Assistant	GPT-4 mini	0.614	0.783
	GPT-4 Turbo	0.741	0.825
	GPT-4o	0.791	0.857

Auto-constraining Performance (SketchGraphs)

Method	Type	PF1 ↑	CF1 ↑
GPT-4o	zero-shot	0.693	0.274
Vitruvion	supervised	0.706	0.238
CAD-Assistant	zero-shot	0.979	0.484

Hand-drawn Image Parametrization Performance (SketchGraphs)

Method	Acc ↑	CD ↓
Vitruvion	0.659	1.586
Davinci	0.789	1.184
CAD-Assistant	0.784	0.680

BibTeX

@article{Mallis2024CADAssistantTV,
        title={CAD-Assistant: Tool-Augmented VLLMs as Generic CAD Task Solvers},
        author={Dimitrios Mallis and Ahmet Serdar Karadeniz and Sebastian Cavada and Danila Rukhovich and Niki Maria Foteinopoulou and Kseniya Cherenkova and Anis Kacem and Djamila Aouada},
        journal={ArXiv},
        year={2024},
        volume={abs/2412.13810},
      }

Acknowledgement

The present project is supported by the National Research Fund, Luxembourg under the BRIDGES2021/IS/16849599/FREE-3D project, and by Artec 3D.