Machine Learning Q&A Session
We continue our Q&A sessions about Machine Learning. Let’s start with the following questions:
- We fund commodity transactions for our clients. How can we estimate financial risks of involvement in specific transactions?
- Different machine learning models can be used for this purpose. The choice of model depends on the size of your labeled dataset:
- Logistic regression / LDA / SVM if there are few records (<1000);
- Decision tree if there is a moderate number of records (1000-5000);
- Gradient Boosting Trees (GBT) if there is enough data (10000+ records).
- Other models, such as kNN or a Naive Bayes classifier, can be used as well. They frame the problem slightly differently, but they can give the business more valuable results.
- If some classification errors are much more costly than others, class weights can reflect the different importance of the classes, and prediction thresholds can be tuned accordingly.
- Some of the aforementioned models can also calculate feature importance.
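
The point about unequal error costs can be illustrated with a minimal sketch. It assumes a model has already produced a risk probability for each transaction, and it picks the decision threshold that minimizes total cost. The function names, the toy labels and probabilities, and the cost values are all hypothetical and serve only as an illustration.

```python
def total_cost(y_true, y_prob, threshold, cost_fn, cost_fp):
    """Sum the cost of all errors made when classifying at `threshold`."""
    cost = 0.0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if label == 1 and predicted == 0:
            cost += cost_fn  # missed a risky transaction
        elif label == 0 and predicted == 1:
            cost += cost_fp  # flagged a safe transaction
    return cost


def best_threshold(y_true, y_prob, cost_fn, cost_fp):
    """Try every observed probability as a threshold and keep the cheapest."""
    # 1.1 is an extra candidate meaning "never flag anything".
    candidates = sorted(set(y_prob)) + [1.1]
    return min(
        candidates,
        key=lambda t: (total_cost(y_true, y_prob, t, cost_fn, cost_fp), t),
    )


# Toy validation data (illustrative only): 1 = transaction turned out risky.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.05, 0.20, 0.30, 0.40, 0.55, 0.70, 0.35, 0.90]

# Assumption: a missed risky transaction is ten times as costly as a false alarm.
asymmetric = best_threshold(y_true, y_prob, cost_fn=10, cost_fp=1)
symmetric = best_threshold(y_true, y_prob, cost_fn=1, cost_fp=1)
print(asymmetric, symmetric)
```

On this toy data the asymmetric costs push the threshold down, so more transactions are flagged for review. In practice the threshold would be chosen on a held-out validation set.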
- How can we explore our audience?
- ML lets you see which groups your clients fall into. You can also cluster transactions to see which types make up the largest part of your portfolio. The following techniques can be used:
- Text encoding (TF-IDF, one-hot encoding)
- Text embeddings (GloVe, etc.)
- Clustering (DBSCAN, K-Means, K-Medoids, Hierarchical)
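
As a rough illustration of how text encoding and clustering fit together, the sketch below encodes a few transaction descriptions with TF-IDF and groups them with a small K-Means implementation. It uses only the standard library so that it stays self-contained. The descriptions, the function names, and the deterministic farthest-point initialization are assumptions made for this example, and a production setup would more likely rely on an established ML library.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Encode whitespace-tokenized documents as L2-normalized TF-IDF vectors."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log(len(docs) / doc_freq[tok]) + 1.0 for tok in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [counts[tok] / len(toks) * idf[tok] for tok in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain K-Means with a deterministic farthest-point initialization."""
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        )
    labels = []
    for _ in range(iterations):
        # Assignment step: each vector goes to its closest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                updated.append([sum(col) / len(members) for col in zip(*members)])
            else:
                updated.append(centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels


# Hypothetical transaction descriptions.
docs = [
    "wheat export contract",
    "wheat import contract",
    "crude oil shipment",
    "crude oil futures",
]
labels = kmeans(tfidf_vectors(docs), k=2)
print(labels)
```

The two wheat contracts end up in one cluster and the two crude oil transactions in the other, because descriptions that share words get similar TF-IDF vectors.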
- Can we structure the output of a form if a client fills in free-text fields?
- Sure. Text standardization makes it possible to extract meaningful data from unstructured text input fields. The following techniques can be used:
- Named Entity Recognition (NER) extracts certain types of data from the input. Basic NER models extract names, phone numbers, locations, prices, etc.
- Custom entity detection via spaCy / DeepPavlov can be trained to extract specific entities: datacenter name, item description, commodity article, intent to action.
- The extracted structured data is useful both for analytics and as input to other processes or models.
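
To show the shape of the structured output, here is a simplified, rule-based stand-in for entity extraction. It is not a trained NER model: it uses two regular expressions to pull phone numbers and prices out of a free-text field. The patterns, the `extract_entities` name, and the sample text are assumptions for illustration, and a trained model (for example, one built with spaCy or DeepPavlov) would replace the rules while producing a similar list of labeled spans.

```python
import re

# Illustrative patterns only; real inputs need far more robust rules or a trained model.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")
PRICE_RE = re.compile(r"(?:USD|EUR|\$|€)\s?\d+(?:[.,]\d+)*")


def extract_entities(text):
    """Return labeled spans found in `text`, ordered by position."""
    entities = []
    for label, pattern in (("PHONE", PHONE_RE), ("PRICE", PRICE_RE)):
        for match in pattern.finditer(text):
            entities.append(
                {
                    "label": label,
                    "text": match.group(),
                    "start": match.start(),
                    "end": match.end(),
                }
            )
    return sorted(entities, key=lambda e: e["start"])


# Hypothetical form input (the phone number is fictional).
form_text = "Please call +1 202 555 0134 about 500 tons of wheat at USD 245.50 per ton."
entities = extract_entities(form_text)
for entity in entities:
    print(entity["label"], entity["text"])
```

Each entity carries its label, text, and character offsets, which is the kind of record that can be stored for analytics or passed to another model.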
- Is it possible to search for similar transactions?
- Yes. ML makes it possible to find similar transactions that have already been executed. This is useful in transaction analysis: it can highlight which aspects of a specific transaction matter more than others.
- A kNN (“k-nearest neighbors”) model lets you find which historical transactions are similar to a new one. The “similarity” is calculated in the N-dimensional space of transaction features. kNN can be combined with techniques such as window size tuning or kernel weighting of neighbors, which make the model more flexible and accurate.
- Clustering (DBSCAN, K-Means, K-Medoids, Hierarchical) can be used to divide transactions into large groups that have something in common.
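
The similarity search described above can be sketched as a nearest-neighbor lookup in feature space. The example assumes three numeric features per transaction (amount, duration, client rating) and standardizes them so that no single feature dominates the Euclidean distance. The feature set, the values, and the function names are hypothetical.

```python
import math


def fit_scaler(rows):
    """Return a function that standardizes a row using the columns of `rows`."""
    columns = list(zip(*rows))
    means = [sum(col) / len(col) for col in columns]
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in col) / len(col)) or 1.0
        for col, m in zip(columns, means)
    ]

    def scale(row):
        return [(x - m) / s for x, m, s in zip(row, means, stds)]

    return scale


def nearest_transactions(history, query, k=3):
    """Return (distance, index) pairs for the k historical rows closest to `query`."""
    scale = fit_scaler(history)
    scaled_query = scale(query)
    distances = [
        (math.dist(scaled_query, scale(row)), idx) for idx, row in enumerate(history)
    ]
    return sorted(distances)[:k]


# Hypothetical features: [amount in thousand USD, duration in days, client rating].
history = [
    [100, 30, 4],
    [105, 28, 4],
    [900, 180, 2],
    [950, 170, 2],
    [110, 35, 5],
]
new_transaction = [102, 29, 4]

neighbors = nearest_transactions(history, new_transaction, k=3)
for distance, idx in neighbors:
    print(idx, round(distance, 3))
```

The returned indices point at the most similar historical transactions, which an analyst can then inspect. Weighting neighbors by distance or choosing a different metric are possible refinements.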