How do I prepare my data for upload?

Learn how to format your data to get the most analysis value out of your data sources.

Written by Amanda Robinson
Updated over a week ago

Keatext can unify customer feedback from multiple channels such as surveys, social media, online reviews, help desk tickets, emails, chat logs and call centre transcripts. You can analyze data sources individually or several at the same time, which helps you build a 360-degree view of your customers. Each source should be uploaded as a separate CSV file.

Formatting CSV files

Data can be uploaded to Keatext from a CSV file. A CSV (comma-separated values) file stores tabular data as plain text, with the values in each row separated by a comma delimiter.

For analysis purposes, a typical CSV file should have clearly marked column headers, with at least one column containing customer comments and, if available, other columns containing the corresponding metadata, such as dates, locations, or commenter ID numbers. Including metadata is optional but unlocks more filtering features. Every column in your file should have a header, and headers cannot be duplicated.

For example, a CSV might have three data fields in columns (Date, Review and City), each separated by a comma.

The first row is a list of field names. Each subsequent row contains a Record. In this case, a Record is a comment made on a given date about a certain apartment rental, in a specific city.
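As a rough sketch of this layout, the snippet below builds a small CSV with Date, Review and City columns and checks that every column has a non-empty, unique header. The sample rows and the `check_headers` helper are invented for illustration and are not part of Keatext.

```python
import csv
import io

# Invented sample data, for illustration only.
SAMPLE_CSV = (
    "Date,Review,City\r\n"
    '2023-01-15,"The apartment was clean, but the check-in took too long.",Montreal\r\n'
    "2023-02-03,The host answered all of our questions quickly.,Toronto\r\n"
)

def check_headers(csv_text):
    """Return a list of header problems (an empty list means none were found)."""
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    headers = next(reader)  # the first row is the list of field names
    problems = []
    seen = set()
    for position, name in enumerate(headers, start=1):
        cleaned = name.strip()
        if not cleaned:
            problems.append(f"column {position} has no header")
        elif cleaned in seen:
            problems.append(f"duplicate header: {cleaned}")
        seen.add(cleaned)
    return problems

print(check_headers(SAMPLE_CSV))  # prints []
```

Note that the first Review contains a comma, so it is wrapped in double quotes to keep it in a single column.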

Keatext analyzes feedback data by looking for a Topic and an Opinion, that is, an opinion on a given topic, to which the system can assign a sentiment.

Feedback data is often unpredictable. Unexpected insights may surface, and you will want to segment them in as many ways as possible to get to the heart of the matter and target your strategy accurately. This is where the metadata comes in.

It can be tempting to combine similar metadata fields, for example by putting City, State, and Country in one column. This, however, limits your analysis possibilities. For optimum filtering, each column should hold only one type of metadata. Giving City, State, and Country their own columns provides three location-related filters rather than the single filter you get when they are combined. Formatting your data this way also lets you produce clear and simple correlations.

Here is an example:

These two datasets contain the same information, but the second gives you a total of seven filtering possibilities as opposed to four in the first.
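One way to move from a combined column to separate ones is sketched below. It assumes a hypothetical combined column named Location whose values always look like "City, State, Country"; the column names, the sample row, and the `split_location` helper are illustrative assumptions, not a Keatext feature.

```python
import csv
import io

def split_location(csv_text, combined="Location", parts=("City", "State", "Country")):
    """Replace one combined column with one column per metadata type."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    fieldnames = [name for name in reader.fieldnames if name != combined] + list(parts)
    out = io.StringIO(newline="")
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\r\n")
    writer.writeheader()
    for row in reader:
        values = [value.strip() for value in row.pop(combined).split(",")]
        if len(values) != len(parts):
            # Refuse to guess when a value does not have exactly three parts.
            raise ValueError(f"cannot split {values!r} into {len(parts)} columns")
        row.update(zip(parts, values))
        writer.writerow(row)
    return out.getvalue()

# Invented sample data, for illustration only.
COMBINED = (
    "Date,Review,Location\r\n"
    '2023-01-15,Great stay overall.,"Montreal, Quebec, Canada"\r\n'
)
print(split_location(COMBINED))
```

For the sample above, the output has the header Date,Review,City,State,Country, with Montreal, Quebec and Canada each in their own column.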

Checking feedback data

Keatext’s machine learning algorithm is specifically trained to extract customer feedback, finding the relevant Topic and its accompanying Opinion. Any data source that includes feedback data is suitable for text mining with Keatext.

When preparing your CSV for upload, keep feedback data top of mind and ensure that:

  • At least one column contains feedback data.

  • Each dataset has 1000+ data points, for best results.

  • Each column corresponds to one question and each row to one respondent. 

  • Feedback is in sentences rather than yes/no answers or one-word answers. This form of commentary is essential to how Keatext analyzes data. 

  • Each comment row contains only a single language.

  • Comments from social media are cleaned up to only include feedback on services or products, rather than extraneous sections of a social media post or thread. 

  • Non-feedback columns feature standardized responses, such as automatically generated pull-down menu options in a survey.
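A few of the points above can be checked before upload. The sketch below counts records, empty comments, and one-word answers in a comment column. The column name Review, the sample rows, and the `summarize_feedback` helper are assumptions made for illustration; language detection and social media clean-up are not covered.

```python
import csv
import io

RECOMMENDED_MIN_ROWS = 1000  # from the "1000+ data points" guideline above

def summarize_feedback(csv_text, comment_column):
    """Count rows, empty comments, and one-word comments in one column."""
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    total = empty = one_word = 0
    for row in reader:
        total += 1
        words = (row.get(comment_column) or "").split()
        if not words:
            empty += 1
        elif len(words) == 1:
            one_word += 1  # e.g. "Yes" or "Good": too short to carry an opinion on a topic
    return {
        "rows": total,
        "empty": empty,
        "one_word": one_word,
        "meets_recommended_size": total >= RECOMMENDED_MIN_ROWS,
    }

# Invented sample data, for illustration only.
SAMPLE = (
    "Date,Review\r\n"
    "2023-01-15,The staff were friendly and helpful.\r\n"
    "2023-01-16,Yes\r\n"
    "2023-01-17,\r\n"
)
print(summarize_feedback(SAMPLE, "Review"))
```

A high share of empty or one-word comments suggests the column is better treated as metadata than as feedback.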

Configure your CSV file

Before uploading your data, check that your CSV file:

  • Is well-formed. If something is not working, you can double-check this by running it through a tool like CSV Lint. (If you're trying to validate private data, tread carefully as CSV Lint keeps a public list of validation reports for data uploaded through a URL.)

  • Does not exceed 100 MB.

  • Is encoded in UTF-8.

  • Uses a carriage return and a line feed (CRLF) for new lines.
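The size, encoding, and line-ending checks can be sketched as follows. This is a minimal illustration, not a replacement for a full validator: it reads "100 MB" as 100 × 1024 × 1024 bytes, which is an assumption, and `check_csv_bytes` is a hypothetical helper.

```python
import re

MAX_BYTES = 100 * 1024 * 1024  # assumption: "100 MB" interpreted as 100 MiB

def check_csv_bytes(raw):
    """Return a list of problems found in the raw bytes of a CSV file."""
    problems = []
    if len(raw) > MAX_BYTES:
        problems.append("larger than 100 MB")
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("not valid UTF-8")
    # Match a line feed without a preceding carriage return, or a carriage
    # return without a following line feed. Line breaks inside quoted
    # comments are flagged too, so review any hits by hand.
    if re.search(rb"(?<!\r)\n|\r(?!\n)", raw):
        problems.append("line endings other than CRLF found")
    return problems

print(check_csv_bytes(b"Date,Review\r\n2023-01-15,Great stay.\r\n"))  # prints []
```

To check a file on disk, open it in binary mode (`"rb"`) and pass the result of `read()` to the function.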

Advice for creating and uploading surveys

For surveys, CSV columns should correspond to answers to open-ended questions and to related metadata, such as demographics and location.

You can improve both your survey completion rate and the AI analysis of your data when you:

  • Include at least one open-ended question to create comment data.

  • Include only one respondent per row when uploading the data.

  • Limit the survey to fewer than 5 questions to improve the completion rate, even though Keatext can analyze more.

  • For more tips on survey creation, read Keatext’s blog post on designing user-friendly surveys with text analytics.

Now that your CSV files are prepared, it’s time to import your data.
