Building a Dataset

Before running any experiments, make sure you have a well-structured dataset. The dataset provides the input that the model will use to generate responses, which can then be evaluated. Once the dataset is available, verify that its structure is correct by inspecting the table in the dashboard and confirming that all fields are populated.

Creating an Experiment
- Navigate to the Experiments tab within the dataset view
- Click "Create Experiment" to initiate the setup
- Assign a name to the experiment for easy identification
- Select the dataset that will serve as input for testing
Configuring the Experiment
Input Source
- Select the column in the dataset that contains the input text for the model
- This column provides the context for the experiment and determines how the model will generate responses
Model Selection
Choose the LLM that will process the input, then adjust key parameters to control how it generates responses:

- Temperature - Controls randomness; lower values produce more deterministic outputs
- Top P - Regulates sampling diversity by restricting the token probability mass
- Max Tokens - Defines the maximum response length
- Presence & Frequency Penalty - Adjusts token repetition behavior
- Response Format - Specifies the expected structure of the output
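To make these parameters concrete, here is a minimal sketch of a generation config. The parameter names follow the common OpenAI-style convention; the model name is hypothetical, and the exact fields your platform exposes may differ.

```python
# Illustrative generation settings (names follow the common OpenAI-style
# convention; "gpt-4o-mini" is a hypothetical model choice).
model_config = {
    "model": "gpt-4o-mini",
    "temperature": 0.2,            # low value -> more deterministic output
    "top_p": 0.9,                  # sample only from the top 90% probability mass
    "max_tokens": 512,             # hard cap on response length
    "presence_penalty": 0.0,       # >0 discourages reusing tokens already present
    "frequency_penalty": 0.3,      # >0 penalizes frequently repeated tokens
    "response_format": {"type": "json_object"},  # request structured output
}
```

Lower temperature and a tighter top_p are typical starting points when responses will be scored automatically, since deterministic outputs make runs easier to compare.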
Prompt Template
- Define the prompt template that will be used during inference
- Use {{variable}} placeholders to inject dataset column values
- Ensure the prompt aligns with your experiment goals
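The placeholder substitution above can be sketched in a few lines. This is an illustration of the idea only; the platform performs this substitution for you at run time, and the column name `ticket_text` is made up for the example.

```python
import re

def render_prompt(template: str, row: dict) -> str:
    """Replace each {{column}} placeholder with the value from a dataset row."""
    def substitute(match: re.Match) -> str:
        key = match.group(1).strip()
        if key not in row:
            raise KeyError(f"Dataset row has no column named '{key}'")
        return str(row[key])
    return re.sub(r"\{\{(.*?)\}\}", substitute, template)

# Hypothetical template and dataset row:
template = "Summarize the following ticket:\n{{ticket_text}}"
row = {"ticket_text": "My invoice total is wrong."}
print(render_prompt(template, row))
```

Raising an error on a missing column, rather than silently leaving the placeholder in place, surfaces dataset/template mismatches before an experiment wastes model calls.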
Evaluation Metrics

You can either:

- Create new evaluation metrics tailored to the experiment
- Use saved evaluations from previous experiments
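If you create your own metrics, they typically reduce to a function that scores a model response against an expectation. Two simple examples, written as a sketch rather than as the platform's metric API:

```python
def exact_match(expected: str, actual: str) -> float:
    """Score 1.0 when outputs match after trivial normalization, else 0.0."""
    return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0

def keyword_coverage(expected_keywords: list[str], actual: str) -> float:
    """Fraction of required keywords that appear in the response."""
    if not expected_keywords:
        return 1.0
    hits = sum(1 for kw in expected_keywords if kw.lower() in actual.lower())
    return hits / len(expected_keywords)
```

Returning scores on a fixed 0-1 scale keeps metrics comparable across experiments, which matters later when you weight them to pick a winner.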
Running the Experiment

Once configured:

- Review all settings to ensure they align with your objectives
- Click "Save and Run" to begin
- Monitor progress in the Summary tab
Choosing the Best Prompt

Accessing Results
- Navigate to the Experiments tab and select the completed experiment
- View detailed performance metrics in the Summary tab
- Compare response time, token usage, accuracy, and quality scores
Selecting the Winner
- Click "Choose Winner" in the summary view
- Adjust metric weights based on your priorities
- Confirm your selection
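Conceptually, adjusting metric weights means combining per-metric scores into a single weighted average and picking the highest-scoring candidate. A minimal sketch, with made-up scores for two hypothetical prompt variants:

```python
def weighted_score(metrics: dict, weights: dict) -> float:
    """Combine per-metric scores (on a 0-1 scale) using the given weights."""
    total = sum(weights.values())
    return sum(metrics[name] * w for name, w in weights.items()) / total

# Hypothetical results for two prompt variants.
candidates = {
    "prompt_a": {"accuracy": 0.82, "quality": 0.90, "speed": 0.60},
    "prompt_b": {"accuracy": 0.88, "quality": 0.75, "speed": 0.95},
}
weights = {"accuracy": 3, "quality": 2, "speed": 1}  # accuracy matters most here

winner = max(candidates, key=lambda name: weighted_score(candidates[name], weights))
```

Shifting weight toward quality instead of accuracy could flip the outcome, which is exactly why the weights should reflect your priorities before you confirm a winner.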