IBM Research x U-Tune

Designing a no-code fine-tuning platform for non-technical users, making AI approachable while increasing comprehension by 92% per user and boosting trust by 70%.

Background

Helping non-technical users form a mental model of the AI fine-tuning process through an educational no-code platform

U-Tune is designed from the ground up as an educational no-code platform that makes fine-tuning accessible without requiring technical expertise. It not only enables customization but also helps users form a mental model of the process. The platform is grounded in IBM’s Equitable and Trustworthy AI principles, ensuring that accessibility, transparency, and safety translate directly into user trust & business value.

Problem Space

Fine-Tuning Without Understanding Creates Risks

Non-technical users often lack a mental model of how fine-tuning works. Approaching it without understanding the underlying technicalities can lead to unintended consequences, poor results, and reduced trust in the AI systems they’re working with. Without guidance and transparency, the barrier to entry remains high, leaving valuable capabilities inaccessible to most.

Target Users & Use Cases

But who are these non-technical users?

U-Tune is designed for non-technical users exploring the potential of Gen-AI, including analysts, content creators, small business owners, and more. While technically curious, these users often lack the foundational knowledge needed to understand AI workflows, making effective fine-tuning out of reach.

I know learning how to fine-tune could give me an edge in my career, but with all the technicalities, I don’t know where to start.

Marketing Analyst Exploring AI

A chatbot fine-tuned to my product line could be great for my business, but without knowing how my customers’ data is handled, I can’t fully trust it.

Owner of an online home decor business

Goals

To ensure U-Tune truly met the needs of non-technical users, we set clear & measurable goals to guide the design direction

Increase comprehension of the GenAI fine-tuning workflow

Improve user trust in GenAI systems and outcomes

Lower barriers to hands-on experimentation with fine-tuning

Research Approach & Methods

Divide & Conquer with Mixed Methods

To cover both the technical and human aspects of fine-tuning, research was split across expert knowledge, user perspectives, and proven learning techniques.

Semi-structured Interviews

12 Experts

Surveys & Discovery Interviews

15 Responses

Literature Review

6 Peer-reviewed Articles

A pivot from discovery interviews

Discovery interviews were planned but not conducted due to scheduling constraints. Instead, insights were mapped from survey data, and interested participants were re-engaged later during user testing.

Research Synthesis & Insights

Identifying design opportunities by mapping user pain points and expert insights onto a process flow

Insights from all the research were brought together onto a process map. This visual made it clear where non-technical users struggled, why these gaps occurred, and where design changes could make the biggest difference.

60% (9/15) of users worried about data retention and usage by the models.

Experts rely on documentation to choose models based on use case and modality.

73% (11/15) of users felt overwhelmed and unclear about the purpose of different settings due to jargon.

Experts tweak parameters through trial and error, knowing how each one impacts fine-tuning.

Active hands-on learning retains up to 75% of information, compared to just 5% from passive lectures.

Key Feature #1

Simplifying Hyperparameters to Cut Through Jargon

  • Technical Jargon
  • Trial & Error Hyperparameters
  • Active Learning

Hyperparameter tuning is one of the most important, but also the most confusing, steps in fine-tuning. To reduce struggles with jargon and unclear parameter effects, a slider-based input with simplified definitions was designed and presented in an accordion-style layout for clarity.

Design Decision 1

Using simplified technical definitions of each hyperparameter to explain its effect on the fine-tuned model & the fine-tuning process.

Why simplified technical definitions?

The first tested iteration explored analogy-based explanations, like riding a bike, to make parameters more approachable.

Concept testing with 12 users to validate analogy comprehension revealed that the analogies were too abstract.

  • 0/12 users matched all 3 analogies correctly
  • 4/12 users matched 2 correctly
  • 8/12 users matched 1 or none correctly

Design Decision 2

Using an accordion-like layout to show or hide parameter definitions on demand, so users only see the detail they ask for.

Why an accordion-like layout?

The first tested iteration explored a grid layout with upfront definitions.

Usability testing showed that the text-heavy layout was visually overwhelming.

There's too much text on the screen. Makes me not want to read anything.

User Test Participant

Design Decision 3

Using slider-based inputs to reinforce standard value limits for each parameter, as understood from expert insights.
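The slider limits described above could be encoded as simple data driving the UI. A minimal sketch in TypeScript follows; the specific parameter names, plain-language descriptions, and ranges are illustrative assumptions, not U-Tune's actual configuration:

```typescript
// Illustrative sketch: hyperparameter slider definitions pairing a
// simplified, jargon-free description (shown in the accordion) with
// expert-sourced value limits (enforced by the slider).
// Parameters and ranges below are assumptions for this example.
interface SliderParam {
  name: string;
  plainDescription: string; // simplified definition shown on expand
  min: number;
  max: number;
  step: number;
  defaultValue: number;
}

const sliderParams: SliderParam[] = [
  {
    name: "Learning rate",
    plainDescription: "How big a step the model takes each time it learns.",
    min: 0.00001,
    max: 0.01,
    step: 0.00001,
    defaultValue: 0.0002,
  },
  {
    name: "Epochs",
    plainDescription: "How many times the model reads through your data.",
    min: 1,
    max: 10,
    step: 1,
    defaultValue: 3,
  },
];

// Clamp any entered value to the slider's range, so user input can
// never leave the expert-recommended limits.
function clampToRange(param: SliderParam, value: number): number {
  return Math.min(param.max, Math.max(param.min, value));
}
```

Keeping the limits in data rather than in the component makes it easy to revise ranges as expert guidance changes, without touching the UI.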

Key Feature #2

AI Nutrition Facts to Build Trust at the Model Selection Stage

  • Data Flow Opacity
  • Model Documentation

Trust in fine-tuning is often undermined by uncertainty around how data is retained, processed, and used. Inspired by Twilio and modeled after food nutrition labels, an AI Nutrition Facts label was introduced to surface model data practices in a clear, interpretable format.
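The label's fixed, scannable structure can be modeled as plain data. A sketch follows; the field names echo the spirit of Twilio-style labels, but the exact facts U-Tune surfaces (and the example values) are assumptions:

```typescript
// Illustrative sketch of the data behind an AI Nutrition Facts label.
// Field names and values are assumptions for this example, not the
// actual facts U-Tune displays.
interface NutritionFacts {
  modelName: string;
  baseModel: string;
  trainingDataRetained: boolean; // is fine-tuning data kept by the provider?
  userDataDeletable: boolean;    // can users request deletion?
  privacyLadderLevel: number;    // 1 (least private) .. 5 (most private)
}

// Render the label as fixed key/value rows, mirroring a food label's
// consistent, at-a-glance format.
function renderLabel(facts: NutritionFacts): string[] {
  return [
    `Model: ${facts.modelName}`,
    `Base model: ${facts.baseModel}`,
    `Training data retained: ${facts.trainingDataRetained ? "Yes" : "No"}`,
    `User data deletable: ${facts.userDataDeletable ? "Yes" : "No"}`,
    `Privacy ladder level: ${facts.privacyLadderLevel} / 5`,
  ];
}

const example: NutritionFacts = {
  modelName: "decor-chat-v1",
  baseModel: "granite-7b",
  trainingDataRetained: false,
  userDataDeletable: true,
  privacyLadderLevel: 4,
};
```

Because every model renders the same rows in the same order, users can compare data practices across the catalog the way they compare labels at a grocery store.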

Design Decision 1

A set of nutrition facts, chosen based on user input, is presented on the model cards on the catalog page, and the label is placed prominently on the model details page for ease of access.

Why is the label placed on the model details page?

The first tested iteration used tooltips on the model card on the catalog page to display the Nutrition Facts.

However, usability testing revealed that 5/12 users missed these entirely.

Design Decision 2

Explanations about each fact parameter, along with what the values for each indicate, are provided in an accordion component placed next to the nutrition facts label.

Why the explanations?

Usability testing revealed that 3/7 users who discovered the label struggled to interpret the terminology, which actually reduced trust rather than improving it.

Could use an explanation for what the values mean, what does yes or no mean for data deletion?

User Test Participant

how do i tell what privacy ladder level is good?

User Test Participant

Design & Prototyping Process

Prototyping Realistic User Flows with Carbon and Next.js

The platform was designed in Figma using the IBM Carbon Design System to maintain consistency and scalability. I then built a high-fidelity prototype in Next.js, connected to a mock JSON server backend. This setup simulated real-time data flow and interactions, giving users a realistic experience during testing.
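To give a flavor of the "realistic experience" the mock backend enabled, here is a minimal sketch of the kind of simulated fine-tuning progress it might serve to the UI. The decay formula, field names, and starting values are assumptions for illustration, not the actual prototype code:

```typescript
// Sketch: a simulated fine-tuning run whose loss decays over steps,
// so the prototype UI can show believable real-time progress without
// a real model behind it. All values here are illustrative.
interface ProgressUpdate {
  step: number;
  totalSteps: number;
  loss: number;        // simulated training loss
  percentDone: number; // 0..100, drives the progress bar
}

function simulateRun(totalSteps: number, startLoss = 2.5): ProgressUpdate[] {
  const floor = 0.2; // loss never drops below this, like a real run plateauing
  const updates: ProgressUpdate[] = [];
  for (let step = 1; step <= totalSteps; step++) {
    updates.push({
      step,
      totalSteps,
      // exponential-style decay from startLoss toward the floor
      loss: floor + (startLoss - floor) * Math.exp(-3 * (step / totalSteps)),
      percentDone: Math.round((step / totalSteps) * 100),
    });
  }
  return updates;
}
```

In the prototype, updates like these could be streamed or polled from the mock server so the fine-tuning screen behaves like the real thing during testing.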

User Testing & Evaluation

Rapid Testing Using RITE and Assessments to Measure Impact

For testing, the Rapid Iterative Testing & Evaluation (RITE) method was used to keep feedback loops open, short, and adaptive.

Across 2 test cycles with 12 participants each, insights were quickly translated into design updates, keeping iterations fast and responsive to what surfaced in each round.

To measure impact, pre- and post-testing assessments were conducted, asking users to rate (on a Likert scale) and explain their understanding of fine-tuning and how their data is used.

Questions like “How would you describe the fine-tuning process in your own words?” helped identify outliers before quantifying impact.
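A per-user increase from paired pre- and post-test Likert ratings could be computed along these lines; the sample ratings in the test are hypothetical, not the study's actual data:

```typescript
// Sketch: per-user percentage change between paired pre- and post-test
// Likert ratings (1-5), then averaged across participants. This mirrors
// a "per user" increase metric; it is an assumed calculation, not the
// study's exact analysis.
function perUserIncrease(pre: number[], post: number[]): number[] {
  if (pre.length !== post.length) throw new Error("unpaired ratings");
  return pre.map((p, i) => ((post[i] - p) / p) * 100); // % change per user
}

function meanIncrease(pre: number[], post: number[]): number {
  const gains = perUserIncrease(pre, post);
  return gains.reduce((a, b) => a + b, 0) / gains.length;
}
```

Averaging per-user changes (rather than comparing group means) keeps the metric tied to individual learning, which is why open-ended answers were used first to screen out ratings that didn't reflect real understanding.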

Results

Increase in comprehension and trust with accessible and performant prototypes

Quantitative Impact

92%

increase in fine-tuning comprehension per user.

70%

increase in AI trust per user.

Qualitative Impact

at the end of the day i feel like I learned a new skill, better understanding of ai as a whole

User Test Participant

love the ability to actually play around and use your own imagination

User Test Participant

Accessibility Impact

90%

WCAG Compliance, verified by IBM Equal Access Checker

Performance Impact

0.7s

Average LCP across all developed pages

0

CLS across all developed pages

Next Steps

Expanding Features & Evaluating Business Impact

Model Integration

Incorporate real model API endpoints to simulate live fine-tuning feedback, moving beyond hypothetical flows.

Community Features

Explore community-based learning with a timeline where users can share fine-tuned models, teach others, or seek help.

Expanded Impact Metrics

Go beyond comprehension and trust by measuring adoption related outcomes, such as repeat usage and task efficiency.

Reflection

On Methods & Mindset

Balancing rigor with adaptability

Repurposing survey data to stand in for the planned discovery interviews, and later engaging those same users in testing, still produced strong, evidence-based design decisions.

Design Engineering ownership

Serving for the first time as an end-to-end Design Engineering lead, I saw in practice how working across design, interface implementation, and the underlying product tech creates tighter, more deliberate outcomes.