Data Processor
Enhance datasets with the Data Processor: add manual or AI-driven annotations to prepare, scale, and enrich data for analysis, modeling, and insights.
The Data Processor lets you enhance and enrich your datasets within the platform. You can add manual annotations quickly or use AI annotations to automate and scale the work, whether you're preparing data for analysis, training models, or generating insights.
Before you begin enhancing your dataset, you need to upload your data into the platform. The Data Processor supports CSV files, making it easy to import data from various sources.
Step 1: Navigate to the Data Processor section in the platform.
Step 2: Click on the Upload CSV button.
Step 3: Select the CSV file from your local machine and upload it.
Step 4: Once uploaded, the data will be displayed in a spreadsheet format within the platform.
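Before uploading, it can help to check the CSV locally for structural problems. The following is a minimal sketch using only Python's standard `csv` module; the function name `check_csv` and the specific checks are illustrative assumptions, not rules the platform is known to enforce.

```python
import csv
import io


def check_csv(text: str) -> list[str]:
    """Return a list of problems found in CSV text; an empty list means none."""
    rows = list(csv.reader(io.StringIO(text)))
    if not rows:
        return ["file is empty"]
    header = rows[0]
    problems = []
    if any(not name.strip() for name in header):
        problems.append("header contains an empty column name")
    if len(set(header)) != len(header):
        problems.append("header contains duplicate column names")
    # Row numbers are 1-based and count the header as row 1.
    for row_no, row in enumerate(rows[1:], start=2):
        if len(row) != len(header):
            problems.append(
                f"row {row_no} has {len(row)} fields, expected {len(header)}"
            )
    return problems


sample = "id,text\n1,hello\n2,world,extra\n"
print(check_csv(sample))  # ['row 3 has 3 fields, expected 2']
```

To check a file on disk, read it with `open(path, newline="", encoding="utf-8").read()` and pass the text to `check_csv`.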
The Data Processor allows for quick manual annotation directly within the spreadsheet or through a modal that presents data entries more intuitively. You can set keys that assign a label to an entry with a single keystroke.
Spreadsheet Annotation:
Edit cells directly in the spreadsheet view.
Add new columns for annotations or additional data.
Use copy-paste functions for rapid data entry.
Modal Annotation:
Click on a row to open the data entry in a modal window.
View data in a more readable format, especially for lengthy text entries.
Add annotations or comments within the modal for better context.
AI Annotations enable you to automate the process of annotating your data using artificial intelligence. This feature considers the data from selected columns and provides annotations based on your configuration.
Benchmark Column: Optionally select a benchmark column to compare AI outputs against existing data.
Data Selection: Choose the columns that the AI assistant will consider for generating annotations.
Annotation Requirements: Define what annotations you need based on your data and objectives.
The Data Processor offers several AI annotation methods to suit different data enhancement needs:
Freeform:
Purpose: Generate open-ended annotations based on your prompt.
Functionality:
Write a custom prompt to guide the AI in generating annotations.
Useful for extracting insights or summaries from data.
Infer and Group:
Purpose: Infer themes or categories from data and group them accordingly.
Functionality:
The AI identifies underlying patterns or topics.
Groups data entries into inferred categories.
Single Choice:
Purpose: Classify data into predefined categories with a single selection.
Functionality:
Define a list of possible categories.
The AI assigns each data entry to one category.
Multiple Values:
Purpose: Assign multiple applicable categories to each data entry.
Functionality:
Define a list of possible categories.
The AI selects all relevant categories for each entry.
Classify and Infer:
Purpose: Combine classification with inference for more nuanced annotations.
Functionality:
The AI classifies data based on predefined categories.
It also infers sub-categories or attributes within those classifications.
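When a method works from a predefined category list, the resulting labels can be checked against that list once the data is exported. The sketch below is one way to do that in plain Python. The category list is hypothetical, and treating a multi-value annotation as a comma-separated string is an assumption about the export format, not documented behavior.

```python
from typing import Optional

CATEGORIES = ["billing", "bug", "feature request"]  # hypothetical category list


def normalize_single(label: str, categories: list[str]) -> Optional[str]:
    """Return the matching category, or None if the label is not in the list."""
    lookup = {c.lower(): c for c in categories}
    return lookup.get(label.strip().lower())


def normalize_multiple(labels: str, categories: list[str]) -> list[str]:
    """Split a comma-separated label string and keep only known categories."""
    found = []
    for part in labels.split(","):
        match = normalize_single(part, categories)
        if match is not None and match not in found:
            found.append(match)
    return found


print(normalize_single(" Bug ", CATEGORIES))  # bug
print(normalize_multiple("bug, Billing, other", CATEGORIES))  # ['bug', 'billing']
```

Labels that come back as `None` or get dropped are good candidates for manual review.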
Saved Prompts:
Utilize prompts you've previously created in the Prompt Playground.
Save time by reusing effective prompts across different datasets.
Prompt Improvement:
Leverage AI assistance to refine and enhance your prompts.
Ensure that prompts are optimized for the desired annotation outcomes.
Purpose: Focus the Data Processor on a subset of the dataset for targeted enhancement.
Functionality:
Apply filters based on specific criteria (e.g., include or exclude entries containing certain parameters).
Streamline the annotation process by working only with relevant data.
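The include/exclude idea can be illustrated outside the platform with a small filter over rows. This is a sketch only: `filter_rows` is a hypothetical helper, and case-insensitive substring matching is an assumption about how such criteria might behave.

```python
def filter_rows(rows, column, include=None, exclude=None):
    """Keep rows whose `column` contains `include` and does not contain `exclude`.

    Matching is case-insensitive substring matching; either term may be omitted.
    """
    kept = []
    for row in rows:
        value = str(row.get(column, "")).lower()
        if include is not None and include.lower() not in value:
            continue
        if exclude is not None and exclude.lower() in value:
            continue
        kept.append(row)
    return kept


rows = [
    {"id": 1, "text": "Refund request for order 17"},
    {"id": 2, "text": "Feature request: dark mode"},
    {"id": 3, "text": "Login page crashes"},
]
subset = filter_rows(rows, "text", include="request", exclude="refund")
print([r["id"] for r in subset])  # [2]
```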
Purpose: Generate insights on character and word lengths of data entries.
Functionality:
Calculate the length of text in selected columns.
Useful for estimating costs and complexities in prompt-response scenarios.
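As a rough illustration of what a length calculation involves, the sketch below counts characters and whitespace-separated words for one column and adds the results as new columns. The helper names and the `_chars` / `_words` column suffixes are assumptions for this example; the platform may count or name things differently.

```python
def text_lengths(text: str) -> dict[str, int]:
    """Count characters and whitespace-separated words in a text value."""
    return {"chars": len(text), "words": len(text.split())}


def add_length_columns(rows: list[dict], column: str) -> list[dict]:
    """Return copies of the rows with `<column>_chars` and `<column>_words` added."""
    out = []
    for row in rows:
        lengths = text_lengths(str(row.get(column, "")))
        out.append({
            **row,
            f"{column}_chars": lengths["chars"],
            f"{column}_words": lengths["words"],
        })
    return out


rows = [{"prompt": "Summarize this ticket"}, {"prompt": ""}]
for row in add_length_columns(rows, "prompt"):
    print(row)
```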
In-Processor Charting:
Create charts within the Data Processor for immediate quantitative insights.
Compare different data columns and visualize distributions.
Box Plots:
Generate box plots to display the mean and standard deviation of data segments.
Analyze data variability and identify outliers.
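To make the statistics behind such a plot concrete, here is a minimal sketch using Python's `statistics` module. A conventional box plot is drawn from the median and quartiles, so the sketch computes those together with the mean and standard deviation mentioned above. The 1.5 × IQR fence rule for outliers is the common Tukey convention and an assumption here, not necessarily what the platform uses.

```python
import statistics


def box_plot_stats(values: list[float]) -> dict[str, float]:
    """Compute quartiles, Tukey fences (1.5 * IQR), mean, and standard deviation."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "low_fence": q1 - 1.5 * iqr,
        "high_fence": q3 + 1.5 * iqr,
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }


def find_outliers(values: list[float], stats: dict[str, float]) -> list[float]:
    """Return the values that fall outside the fences."""
    return [v for v in values if v < stats["low_fence"] or v > stats["high_fence"]]


word_counts = [10, 12, 14, 16, 18, 20, 22, 24, 100]
stats = box_plot_stats(word_counts)
print(stats["q1"], stats["median"], stats["q3"])
print(find_outliers(word_counts, stats))  # [100]
```

Note how the single large value pulls the mean well above the median, which is why the two summaries can tell different stories about the same column.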
Initial Processing:
Run analyses on a subset of the data (default is 50 entries).
Allows for quick iteration and validation of your processing configuration.
Final Annotation:
Once satisfied with the initial results, run the AI annotation on the full dataset.
The processed data becomes accessible in the Insight Explorer and Reports function.
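The two-phase pattern described above (preview on a subset, then run on everything) can be sketched as follows. The annotator here is a hypothetical stand-in that labels rows by text length; it does not call any real AI service, and the function names are assumptions made for illustration.

```python
from itertools import islice

SUBSET_SIZE = 50  # the default subset size stated above


def annotate(rows, annotate_row, limit=None):
    """Apply `annotate_row` to the first `limit` rows (all rows if limit is None)."""
    selected = rows if limit is None else islice(rows, limit)
    return [{**row, "annotation": annotate_row(row)} for row in selected]


def placeholder_annotator(row):
    # Hypothetical stand-in for an AI call: labels text by length only.
    return "long" if len(row["text"]) > 20 else "short"


rows = [{"text": "x" * n} for n in range(120)]

preview = annotate(rows, placeholder_annotator, limit=SUBSET_SIZE)
print(len(preview))  # 50

full = annotate(rows, placeholder_annotator)
print(len(full))  # 120
```

Keeping the subset run and the full run behind the same function means the configuration validated on the preview is exactly what runs on the full dataset.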
Log in: Sign in to your Requesty account.
Navigate: From the dashboard, click on the Data Processor tab in the main menu.
Click on Upload CSV and select your file.
Wait for the data to load into the spreadsheet view.
Spreadsheet View:
Edit or add data directly in the cells.
Use spreadsheet functions for quick data manipulation.
Modal View:
Click on a row to open the modal.
Add annotations or comments in a more readable format.
Click on AI Annotations to open the configuration panel.
Select Annotation Method:
Choose from Freeform, Infer and Group, Single Choice, Multiple Values, or Classify and Infer.
Set Up Your Prompt:
Write a custom prompt or select a saved prompt.
Use the AI prompt improvement feature if needed.
Select Data Columns:
Choose which columns the AI should consider.
Benchmark Column (Optional):
Select a column to compare the AI output against existing annotations.
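One simple way to read a benchmark comparison is as an agreement rate between the AI column and the existing column. The sketch below assumes exported rows as dictionaries; the column names `ai_label` and `human_label` are hypothetical, and exact case-insensitive matching is an assumption about what counts as agreement.

```python
def agreement(rows, ai_column, benchmark_column):
    """Return (rate, mismatches) comparing AI labels to a benchmark column.

    Rows with an empty benchmark value are skipped. `mismatches` holds the
    indexes of rows where the two labels differ.
    """
    compared, mismatches = 0, []
    for index, row in enumerate(rows):
        expected = str(row.get(benchmark_column, "")).strip().lower()
        if not expected:
            continue
        compared += 1
        actual = str(row.get(ai_column, "")).strip().lower()
        if actual != expected:
            mismatches.append(index)
    rate = (compared - len(mismatches)) / compared if compared else 0.0
    return rate, mismatches


rows = [
    {"ai_label": "bug", "human_label": "Bug"},
    {"ai_label": "billing", "human_label": "bug"},
    {"ai_label": "feature", "human_label": ""},
    {"ai_label": "billing", "human_label": "billing"},
]
rate, mismatches = agreement(rows, "ai_label", "human_label")
print(f"{rate:.2f}", mismatches)  # 0.67 [1]
```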
Click on Filters to narrow down the dataset.
Set criteria to include or exclude certain data entries.
Apply filters to focus on relevant data for annotation.
Navigate to Calculations.
Select the columns for which you want to calculate character and word lengths.
View the results in new columns added to your dataset.
Click on Chart Data within the Data Processor.
Choose the columns you want to visualize.
Select the type of chart (e.g., bar chart, box plot).
Analyze the quantitative insights from your dataset.
Initially, run the AI annotations on a subset of the data (default is 50 entries).
Review the annotations and make adjustments to the configuration if necessary.
Once satisfied with the initial results, click on Final Annotation.
The AI will process the entire dataset based on your configuration.
Upon completion, the enhanced data is accessible in the Insight Explorer and Reports function.
Objective: Annotate datasets with labels required for training machine learning models.
Actions:
Use Single Choice or Multiple Values annotation methods to label data.
Employ Classify and Infer to add nuanced annotations.
Objective: Extract key themes or summaries from text data.
Actions:
Utilize the Freeform method with prompts to generate summaries.
Apply Infer and Group to categorize text entries based on inferred topics.
Objective: Estimate the cost and complexity of processing text data.
Actions:
Use the Calculation Feature to determine character and word counts.
Analyze the data to predict processing time and resources needed.
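A back-of-the-envelope estimate can be derived from character counts, as in the sketch below. Both default parameters are placeholder assumptions: the characters-per-token ratio is only a rough heuristic for English text, and the price is an invented figure that should be replaced with your provider's actual rate.

```python
import math


def estimate_cost(texts, chars_per_token=4.0, price_per_1k_tokens=0.001):
    """Roughly estimate token count and cost from character counts.

    Both defaults are placeholder assumptions, not real tokenizer or pricing data.
    """
    total_chars = sum(len(t) for t in texts)
    tokens = math.ceil(total_chars / chars_per_token)
    cost = tokens / 1000 * price_per_1k_tokens
    return {"chars": total_chars, "tokens": tokens, "cost": cost}


texts = ["a" * 400, "b" * 1600]
estimate = estimate_cost(texts)
print(estimate["chars"], estimate["tokens"])  # 2000 500
```

For a real estimate, count tokens with the tokenizer of the model you intend to use rather than relying on a fixed ratio.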
Objective: Standardize data formats and correct inconsistencies.
Actions:
Apply AI annotations to reformat data entries.
Use Freeform prompts to instruct the AI on the desired data format.
Objective: Quickly generate insights and identify patterns in data.
Actions:
Use in-processor charting to visualize data distributions.
Generate box plots to understand data variability.
Begin processing with a small subset to validate your configuration.
Adjust prompts and settings based on initial results before scaling up.
Reuse effective prompts from the Prompt Playground to save time.
Maintain a library of prompts for different annotation tasks.
Apply filters to focus on the most relevant data.
Working on a smaller, more relevant subset can shorten processing and improve annotation quality.
Use the prompt improvement feature to enhance your prompts.
Clear and precise prompts yield better AI annotations.
Manually check a sample of AI annotations for accuracy.
Adjust configurations as needed to improve results.
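Drawing the review sample at random avoids only ever checking the first rows. A minimal sketch, assuming the annotated rows have been exported as a list; `review_sample` is a hypothetical helper, and the sample size of 10 is arbitrary.

```python
import random


def review_sample(rows, k=10, seed=0):
    """Pick up to k rows at random for manual review; a fixed seed makes it repeatable."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))


rows = [{"id": i, "annotation": "label"} for i in range(200)]
sample = review_sample(rows, k=10)
print(len(sample))  # 10
```

Using a fixed seed lets two reviewers look at the same rows; change the seed to draw a fresh sample after adjusting the configuration.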
Use manual annotations for complex cases where AI may struggle.
Combine both methods for comprehensive data enhancement.
The Data Processor combines manual and AI-powered annotation tools so you can tailor your datasets to your needs. Whether you're preparing data for analysis, modeling, or reporting, use AI Annotations, data filtering, calculation tools, and in-processor charting to enrich your data, generate insights, and streamline your data enhancement workflows.