Running OpenAI’s GPT-OSS-20B Locally with Open WebUI (Full Setup Guide)

2025-12-05 18:28 · 8 min read

In this video, Rob walks through a hands-on setup of Open WebUI on an NVIDIA DGX Spark for chatting with a locally hosted copy of OpenAI's GPT-OSS-20B model, which has 20 billion parameters. The same configuration works on a range of hardware, but the demonstration centers on the DGX Spark, a desktop Blackwell system. Rob explains how to use the NVIDIA Sync tool to create SSH port mappings and manage the Docker container that runs Open WebUI. He covers downloading the required Docker image, setting up volume mounts for data storage, and watching GPU and memory usage while the app runs. After creating the admin account and installing the model, Rob runs a basic query. The video is aimed at users who want to deploy large language models on NVIDIA hardware.

Key Information

  • Rob introduces a hands-on tutorial on setting up Open WebUI on a DGX Spark to interact with a local copy of OpenAI's GPT-OSS-20B model, which has 20 billion parameters.
  • The configuration can be done on various hardware, but the demonstration uses an NVIDIA DGX Spark system.
  • The video uses NVIDIA Sync to monitor GPU and memory utilization while the configuration is set up.
  • Rob shows how to download the Docker image, create the container, and configure port mappings so Open WebUI can be reached over SSH.
  • The tutorial includes setting up a custom application in the NVIDIA Sync UI for streamlined access to Open WebUI.
  • Rob emphasizes watching GPU and RAM utilization while interacting with the model, noting spikes in usage when queries are processed.
  • He concludes by encouraging viewers to try a similar configuration on their own systems with compatible GPUs.

Content Keywords

Open WebUI

Rob gives a hands-on walkthrough of setting up Open WebUI on an NVIDIA DGX Spark to interact with a local copy of OpenAI's GPT-OSS-20B model, which has 20 billion parameters. The same setup can be adapted to other hardware.

NVIDIA DGX Spark

The video covers the NVIDIA DGX Spark system used for this setup, including performance monitoring through the NVIDIA Sync utility and GPU utilization during different tasks.

Docker Container

The video explains how to download and run the Open WebUI Docker container, including creating the container with port mappings so that the container's port is reachable through a port on the host system.
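
As a rough illustration of the container step, the sketch below assembles a `docker run` command in Python. The image tag `ghcr.io/open-webui/open-webui:ollama`, the container name, the volume names, and the host port are assumptions chosen for illustration; the summary does not state which values the video uses, so adjust them to your own setup.

```python
import shlex


def build_open_webui_command(
    host_port: int = 8080,       # assumed host port; pick any free port
    container_port: int = 8080,  # port Open WebUI listens on inside the container
    image: str = "ghcr.io/open-webui/open-webui:ollama",  # assumed image tag
    name: str = "open-webui",    # hypothetical container name
) -> list[str]:
    """Return the argv list for starting Open WebUI in a detached container."""
    return [
        "docker", "run", "-d",
        "--name", name,
        # Map the host port to the container port so a browser can reach the UI.
        "-p", f"{host_port}:{container_port}",
        # Expose all GPUs to the container (requires the NVIDIA container toolkit).
        "--gpus", "all",
        # Named volumes (names are assumptions) keep chat data and models across restarts.
        "-v", "open-webui:/app/backend/data",
        "-v", "open-webui-ollama:/root/.ollama",
        image,
    ]


if __name__ == "__main__":
    # Print the command for review instead of running it directly.
    print(shlex.join(build_open_webui_command()))
```

Printing the command first makes it easy to check the port mapping and volume mounts before anything is started; the same list can be passed to `subprocess.run` once it looks right.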

Model Installation

Rob walks through installing the GPT-OSS-20B model (20 billion parameters) and notes that queries after the first should respond faster, because the model is already loaded into memory.

Response Testing

The video concludes by testing the model, asking it to tell jokes and to follow more complex instructions, while keeping an eye on GPU and RAM utilization throughout.
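
The video runs its test queries through the Open WebUI chat interface. For readers who want to script the same kind of test, here is a minimal sketch that talks to an Ollama HTTP API instead, which is a different route from the one shown. It assumes the model is served by Ollama, that the API is reachable at `http://localhost:11434` (a container started as in a default Open WebUI setup may not publish that port), and that the model tag is `gpt-oss:20b`. All three are assumptions, not details from the video.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed endpoint
MODEL_TAG = "gpt-oss:20b"                           # assumed Ollama model tag


def build_payload(prompt: str, model: str = MODEL_TAG) -> dict:
    """Build a non-streaming generate request body."""
    return {"model": model, "prompt": prompt, "stream": False}


def tokens_per_second(response: dict) -> float:
    """Compute generation speed from Ollama's eval_count and eval_duration (nanoseconds)."""
    duration_ns = response.get("eval_duration", 0)
    if duration_ns <= 0:
        return 0.0
    return response.get("eval_count", 0) / (duration_ns / 1e9)


def ask(prompt: str, url: str = OLLAMA_URL) -> dict:
    """Send one prompt and return the decoded JSON reply."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=300) as reply:
        return json.loads(reply.read().decode("utf-8"))
```

Calling `ask("Tell me a joke")` twice and comparing the `load_duration` field of the two replies is one way to see the effect of the model already being in memory on the second query.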

NVIDIA Sync

The video shows how to configure and use NVIDIA Sync to manage and launch applications on the DGX Spark, including how to create a custom app within its interface.
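
NVIDIA Sync sets up the SSH port mapping itself, so nothing below is required when using it. For comparison, this sketch builds the kind of manual `ssh` local port forward that gives a similar result. The user name, host name, and both port numbers are hypothetical placeholders.

```python
import shlex


def build_tunnel_command(
    user: str,
    host: str,
    local_port: int = 12000,  # hypothetical port on your own machine
    remote_port: int = 8080,  # assumed port where Open WebUI listens on the remote host
) -> list[str]:
    """Return an ssh argv that forwards local_port to remote_port on the remote host."""
    return [
        "ssh",
        "-N",  # do not run a remote command, only forward ports
        "-L", f"{local_port}:localhost:{remote_port}",
        f"{user}@{host}",
    ]


if __name__ == "__main__":
    # "rob" and "spark.local" are placeholder values, not taken from the video.
    print(shlex.join(build_tunnel_command("rob", "spark.local")))
```

With a tunnel like this running, the web interface would be reachable in a local browser at `http://localhost:12000`, given the placeholder ports above.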

Performance Monitoring

Users are encouraged to monitor GPU and RAM usage during operations to ensure optimal performance and to evaluate the responsiveness of the system as different queries are made.
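
The video watches utilization through NVIDIA Sync. A scripted alternative, sketched here under the assumption that `nvidia-smi` is installed on the machine running the model, is to poll its CSV query output. Some systems report memory fields as unavailable (for example `[N/A]`), so the parser returns `None` for any value it cannot read as a number.

```python
import subprocess

# Ask nvidia-smi for plain numbers: GPU utilization (%), memory used and total (MiB).
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def to_number(text: str):
    """Convert a CSV field to float, or None when the value is not numeric."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_gpu_line(line: str) -> dict:
    """Parse one CSV line of nvidia-smi output into a small dictionary."""
    fields = [field.strip() for field in line.split(",")]
    fields = (fields + ["", "", ""])[:3]  # pad so short lines do not raise
    util, used, total = (to_number(field) for field in fields)
    return {
        "gpu_util_percent": util,
        "memory_used_mib": used,
        "memory_total_mib": total,
    }


def sample_gpus() -> list[dict]:
    """Run nvidia-smi once and return one dictionary per GPU."""
    output = subprocess.run(
        QUERY, capture_output=True, text=True, check=True
    ).stdout
    return [parse_gpu_line(line) for line in output.splitlines() if line.strip()]
```

Calling `sample_gpus()` in a loop with a short sleep while a query is running should make the utilization spikes described above visible in the returned numbers.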
