Machine Learning Q&A Session

We continue our Q&A sessions about Machine Learning. Let’s start with the following questions: 


- We fund commodity transactions for our clients. How can we estimate the financial risk of taking part in a specific transaction?

- Several machine learning models can be used for this purpose. The choice of model depends on the size of your labeled dataset:
  • Logistic regression / LDA / SVM if there are few records (<1000);
  • Decision tree if there is a moderate number of records (1000-5000);
  • Gradient Boosting Trees (GBT) if there is enough data (10000+ records);
  • Other models such as kNN or a Naive Bayes classifier can be used as well. They frame the problem slightly differently, but they can provide the business with more valuable results.
  • If some classification errors are much more costly than others, class weights and tuned prediction thresholds can be used to account for the different importance of each class.
  • Some of the aforementioned models can also calculate feature importance.
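
To illustrate the point about unequal error costs, here is a minimal sketch in plain Python that picks a prediction threshold by minimizing the total misclassification cost. The labels, probabilities, and cost values are made-up illustration data, and the function names `total_cost` and `best_threshold` are hypothetical. In practice the probabilities would come from one of the models listed above.

```python
def total_cost(y_true, y_prob, threshold, cost_fn, cost_fp):
    """Sum the cost of all errors made when flagging prob >= threshold as risky."""
    cost = 0.0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if label == 1 and predicted == 0:
            cost += cost_fn  # missed a risky transaction (false negative)
        elif label == 0 and predicted == 1:
            cost += cost_fp  # flagged a safe transaction (false positive)
    return cost


def best_threshold(y_true, y_prob, cost_fn, cost_fp):
    """Try every observed probability as a threshold and keep the cheapest one."""
    # 1.01 is an extra candidate meaning "flag nothing".
    candidates = sorted(set(y_prob)) + [1.01]
    return min(
        candidates,
        key=lambda t: (total_cost(y_true, y_prob, t, cost_fn, cost_fp), t),
    )


# Hypothetical data: 1 = transaction turned out risky, 0 = it did not.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_prob = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.2, 0.9]

# Assumed costs: missing a risky transaction is 10x worse than a false alarm.
cautious = best_threshold(y_true, y_prob, cost_fn=10, cost_fp=1)
# Opposite assumption: false alarms are the expensive error.
lenient = best_threshold(y_true, y_prob, cost_fn=1, cost_fp=10)
print(cautious, lenient)
```

When missed risks are the expensive error, the chosen threshold drops so that more transactions get flagged. When false alarms are expensive, it rises.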

- How can we explore our audience?

- ML lets you see which groups your clients fall into. You can also cluster transactions to see which kinds make up the largest part of your portfolio. Typical building blocks are:
  • Text encoding (TF-IDF, one-hot encoding)
  • Text embeddings (GloVe, etc.)
  • Clustering (DBSCAN, K-Means, K-Medoids, hierarchical)
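
As a rough sketch of how text encoding and clustering fit together, the example below encodes short transaction descriptions with TF-IDF and groups them with a small K-Means loop. It is written in plain Python so that it is self-contained; a real project would more likely use a library such as scikit-learn. The descriptions, the function names, and the deterministic farthest-point initialization are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Encode each document as an L2-normalized TF-IDF vector (smoothed IDF)."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vec = [counts[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, n_iter=20):
    """Basic K-Means with deterministic farthest-point initialization."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: each vector goes to its closest centroid.
        labels = [
            min(range(k), key=lambda j: sq_dist(v, centroids[j])) for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Hypothetical transaction descriptions.
docs = [
    "wheat export contract ukraine",
    "wheat export shipment egypt",
    "crude oil futures hedge",
    "crude oil tanker delivery",
]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
print(labels)
```

With this toy data the two wheat descriptions land in one cluster and the two oil descriptions in the other, because each pair shares words that the other pair does not use.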

- Can we structure the output of a form if a client fills in free-text fields?

- Sure. Text standardization makes it possible to extract meaningful data from unstructured text input fields. The following techniques can be used:
  • Named Entity Recognition (NER) extracts certain types of data from the input. Basic NER models extract names, phone numbers, locations, prices, etc.
  • Custom entity detection models built with spaCy / DeepPavlov can be trained to extract specific entities: datacenter name, item description, commodity article, intent to act.
  • The extracted structured data is useful both for analytics and as input to other processes or models.
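
To show what "structured output" can look like, here is a small rule-based sketch that pulls a quantity, a price, and a phone number out of a free-text field using regular expressions. This is a simple stand-in for illustration, not a trained NER model of the kind spaCy or DeepPavlov provide. The patterns, the field names, the assumption that "$" means US dollars, and the sample text are all hypothetical.

```python
import re

# Illustrative patterns only; real form input needs far more robust rules
# or a trained entity recognizer.
PRICE_RE = re.compile(r"(?:\$|USD\s?)(\d+(?:,\d{3})*(?:\.\d+)?)")
QUANTITY_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(tons?|kg|barrels?)\b", re.IGNORECASE)
PHONE_RE = re.compile(r"\+?\d[\d\s\-()]{7,}\d")


def structure_form_text(text):
    """Turn a free-text form field into a dict of typed values."""
    record = {"quantity": None, "unit": None, "price_usd": None, "phone": None}

    match = QUANTITY_RE.search(text)
    if match:
        record["quantity"] = float(match.group(1))
        record["unit"] = match.group(2).lower()

    match = PRICE_RE.search(text)
    if match:
        # Drop thousands separators before converting to a number.
        record["price_usd"] = float(match.group(1).replace(",", ""))

    match = PHONE_RE.search(text)
    if match:
        record["phone"] = match.group(0).strip()

    return record


text = "Need 500 tons of wheat, budget $120,000.50. Call +1 415-555-0132."
record = structure_form_text(text)
print(record)
```

The resulting dictionary can be stored in a database column per field or passed on as features to another model.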

- Is it possible to search for similar transactions?

- Yes. ML makes it possible to find similar, already executed transactions in your history. This is useful in transaction analysis, because it can highlight the aspects of a specific transaction that matter more than others.
  • A kNN (“k nearest neighbors”) model finds the historical transactions that are most similar to a new one. The “similarity” is calculated in the N-dimensional space of transaction features. kNN can be combined with techniques such as window size tuning or kernel tricks, which make the model more flexible and accurate.
  • Clustering (DBSCAN, K-Means, K-Medoids, hierarchical) can be used to divide transactions into large groups that have something in common.
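
The nearest-neighbor search described above can be sketched in a few lines of plain Python. The example rescales each feature to the 0..1 range so that large amounts do not dominate the distance, then returns the indices of the k closest historical transactions. The three features (amount, duration in days, counterparty rating), the sample values, and the function names are assumptions made for illustration.

```python
import math


def make_scaler(rows):
    """Build a min-max scaler from the historical rows (one list per row)."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    # Guard against zero-width columns to avoid division by zero.
    spans = [(max(col) - min(col)) or 1.0 for col in columns]

    def scale(row):
        return [(v - lo) / span for v, lo, span in zip(row, lows, spans)]

    return scale


def nearest_transactions(history, query, k=2):
    """Return indices of the k historical rows closest to the query."""
    scale = make_scaler(history)
    scaled_query = scale(query)
    distances = [
        (math.dist(scale(row), scaled_query), index)
        for index, row in enumerate(history)
    ]
    return [index for _, index in sorted(distances)[:k]]


# Hypothetical history: [amount, duration_days, counterparty_rating]
history = [
    [100_000, 30, 0.90],
    [105_000, 35, 0.85],
    [900_000, 180, 0.40],
    [20_000, 10, 0.95],
    [850_000, 170, 0.50],
]
new_transaction = [110_000, 32, 0.88]

similar = nearest_transactions(history, new_transaction, k=2)
print(similar)
```

Here the two mid-sized, short-duration deals (rows 0 and 1) come back as the closest matches. Weighting features differently or replacing the Euclidean distance is where the tuning mentioned above would come in.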
