Users can upload their CSV (comma-separated) data, which must include a single line of column headers (the titles of the columns). They can then request a report for one or more columns in the file (how many depends on the file size). The report shows error rates and accuracies for predicting the selected column, and also lists the other columns used in that calculation. If users want to run queries on a column, they can create an AI model for it.

The system accepts CSV (comma-separated) data as its data file format. Users can download an uploaded file, delete it (a deleted file is removed completely), or request a report on it. Requesting a report opens the column selection page. The number of columns that can be selected at once is determined by the column count of the file: the more columns a file has, the fewer can be selected simultaneously.

The report process searches for the best way to predict the selected column. It detects whether the prediction is a classification or a regression problem, and which other columns are needed for the best result. For classification columns, success is measured by the accuracy value, expressed as a percentage. For regression columns, success is measured by the mean absolute error.

Users can create models from the suggestions in the report. Selected models appear in the models & queries tab. Queries can be created on the model page using the partial data the model needs. Regression model queries return floating point numbers; classification model queries return the two most probable classes among all classes in the column.
Data science has the concept of overfitting. Overfitting means, very basically, that the model fits the given data too closely, reflecting every little detail. Put another way, the model memorizes the data, so when we run queries on inputs from outside the data set, it answers with a bigger error than expected. Our system tries to minimize overfitting by finding good results with a low number of training iterations. So even though our system sometimes gives slightly lower accuracy or slightly higher error than the best Kaggle results, it may be less overfit than they are (because of the lower training count) and maintains its quality on data outside the given dataset.
All uploaded files are stored encrypted. Files are not used for any purpose other than the ones the user selects on the system. If a file causes an error during report creation, we contact the customer and ask for permission to debug before any human sees the data. If a file is deleted, it is also deleted from our servers. When a file is deleted, the related reports, models and queries remain available for further use by the user. The reports, models and queries are not used by anyone or any machine other than the user. If users want to delete any reports, models or queries, they can send an email to firstname.lastname@example.org. For more information about privacy: https://bicedeepai.com/Landing/PrivacyPolicy
We use third-party systems to process payments on bicedeepai.com, and we do not store any payment-related information on our servers. We are PCI compliant. For enterprise-level payments, we can accept direct wire invoiced payments, or customers can register their payment methods on the management page at bicedeepai.com.
The system understands number formats as follows:

- Numbers with one dot are floating point numbers: 2.34
- Numbers with more than one dot are strings: 12.234.55
- Numbers with only a comma or commas are big numbers: 12,234,34 = 1223434
- Numbers with both commas and dots are strings: 12,346.34 (to treat this case as a number rather than a string, remove the comma)

If a column carries unit information, such as a price, it is better to use the plain number format: 12.34 rather than $12.34, for example. It is then treated as a number rather than a string.
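The rules above can be sketched as a small classifier function. This is an illustrative sketch of the documented rules, not the actual production parser; the function name is our own.

```python
def classify_value(text: str):
    """Apply the documented number-format rules to one raw CSV field.

    Returns a float, an int ("big number"), or the original string.
    """
    dots = text.count(".")
    commas = text.count(",")
    if commas and dots:
        return text                      # both commas and dots -> string
    if commas:
        digits = text.replace(",", "")   # commas only -> big number
        return int(digits) if digits.isdigit() else text
    if dots == 1:
        try:
            return float(text)           # one dot -> floating point
        except ValueError:
            return text                  # e.g. "$12.34" stays a string
    if dots > 1:
        return text                      # more than one dot -> string
    return int(text) if text.isdigit() else text
```

For example, `classify_value("12,234,34")` yields the big number 1223434, while `classify_value("$12.34")` stays a string because of the unit symbol.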
CSV (comma-separated values) is the input file format for our system. Most data formats can be converted to CSV.
The first line of the CSV file should contain headers that identify the columns. The headers do not need to be meaningful; you can rename them if you do not want to expose the column names.
Numbers, dates and strings are allowed. Numbers and dates that are not in the expected format are treated as strings; see the number format section for details.
If the predicted column is treated as a classification problem, the output is the two most probable answers with their probability percentages. For example: “n_of_rings : 5(73.2%) 4(26.4%)”. If the predicted column is treated as a regression problem, the output is the predicted floating point number. For example: “Close: 42.5429”
A classification problem means finding the correct answer among given choices. The system treats a column as a classification problem if its unique value count is less than one percent of the total row count and less than 256. Additionally, if the data in the column is a string or a date, it is treated as a classification problem. The system selects the two existing values with the highest probabilities as the answer.
A regression problem means predicting the exact value of a number. The system treats a column as a regression problem if it is not a classification problem. The result is a floating point number predicting the answer.
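The decision rule described in the two paragraphs above can be sketched as follows. This is a simplified illustration of the documented thresholds, not the system's actual code; date handling is folded into the string check for brevity.

```python
def problem_type(values):
    """Decide classification vs. regression for one parsed column.

    `values` is a list of already-parsed cell values (numbers or strings).
    """
    # String (and, in the real system, date) columns are always classification.
    if any(isinstance(v, str) for v in values):
        return "classification"
    unique_count = len(set(values))
    # Classification if unique values are rare: fewer than 1% of rows
    # and fewer than 256 distinct values.
    if unique_count < len(values) * 0.01 and unique_count < 256:
        return "classification"
    # Everything else is treated as regression.
    return "regression"
```

So a numeric column with a handful of repeated labels (e.g. ring counts) is classified, while a column of mostly distinct prices falls through to regression.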
The accuracy value is the success indicator for classification problems. Accuracy is calculated as the percentage of correct best answers.
Mean absolute error is the success indicator for regression problems. It is the average absolute difference between the system's prediction and the correct answer.
Log loss is a success indicator for classification problems. The system assigns a probability to every possible answer, and log loss is the error value of this probability assignment.
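The three success measures above can be computed as follows. This is a minimal sketch using the standard definitions; the function names are our own, and the real system may differ in details such as averaging.

```python
import math

def accuracy(predicted, actual):
    """Percentage of rows where the best predicted class is correct."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)

def mean_absolute_error(predicted, actual):
    """Average absolute difference between prediction and correct answer."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)

def log_loss(probabilities, actual):
    """Error of the probability assignment: the mean negative log of the
    probability the model gave to the true class.

    `probabilities` is a list of dicts mapping class -> probability.
    """
    eps = 1e-15  # clamp so log(0) never occurs
    return -sum(math.log(max(p.get(a, 0.0), eps))
                for p, a in zip(probabilities, actual)) / len(actual)
```

A perfectly confident and correct probability assignment gives a log loss of 0; confident wrong answers are penalized heavily, which is why it complements plain accuracy.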
There is no single correct answer to this question, and the answer may change with the data. For some data, finding the right number of rows, adding more columns, or decreasing the number of unique values in a column may increase success. If you need assistance finding a better success rate, you can get help from email@example.com
A better way to use the system is to chain queries: you can use the result of one model's query as an input to a query of another model.
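The chaining idea can be sketched generically. The `run_query` callable below is a hypothetical stand-in for the system's model-query call, and the model/column names in the usage note are invented examples; only the chaining pattern itself comes from the text above.

```python
def chain_queries(models, initial_inputs, run_query):
    """Run a sequence of model queries, feeding each result forward.

    `models` is a list of (model, output_column) pairs; `run_query(model,
    inputs)` is a caller-supplied function (hypothetical here) that performs
    one model query with the partial data in `inputs`.
    """
    inputs = dict(initial_inputs)
    result = None
    for model, output_column in models:
        result = run_query(model, inputs)
        # The result becomes an input column for the next model in the chain.
        inputs[output_column] = result
    return result
```

For example, one model could predict a "size" column from raw measurements, and a second model could then use that predicted size as one of its inputs when predicting "price".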