This project aims to improve the name screening process by combining traditional matching techniques with machine learning. Its primary objective is to address false negatives by offering greater flexibility in tuning the parameters of conventional distance-based matching algorithms, while leveraging ML models for pruning. The project is also intended to showcase the author's expertise in regulatory compliance and algorithm design by tackling one of the key compliance challenges.
At the core of this project lies a fundamental principle: to complement, rather than supplant, the tried-and-true name matching methods known to specialists in the industry. Rather than reinventing the wheel, the approach enhances these established techniques with machine learning. The model encodes the author's investigative insights directly into the matching process, which makes it possible to run a broader search while leaving the pruning task to the model.
One of the primary goals of the project is to introduce tuning specialists to machine learning techniques and offer insight into their practical application in the field.
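The two-stage idea described above (a deliberately broad distance-based search followed by pruning) can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names `similarity`, `broad_search` and `prune` are hypothetical, the similarity measure is the standard library's `difflib.SequenceMatcher` ratio rather than whichever distance the project uses, the names are invented, and the pruning step is a plain callable standing in for a trained model.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple

Candidate = Tuple[str, float]


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two case-folded names."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def broad_search(query: str, watchlist: List[str], threshold: float = 0.6) -> List[Candidate]:
    """Stage 1: keep every watchlist name whose similarity reaches the threshold."""
    scored = [(name, similarity(query, name)) for name in watchlist]
    return [(name, score) for name, score in scored if score >= threshold]


def prune(candidates: List[Candidate], keep: Callable[[str, float], bool]) -> List[Candidate]:
    """Stage 2: drop the candidates rejected by the supplied decision function."""
    return [(name, score) for name, score in candidates if keep(name, score)]


# Invented names for illustration only; not taken from any sanctions list.
watchlist = ["Jonathan Smythe", "Maria Gonzales", "Ivan Petrov"]

# A low threshold makes the first stage intentionally permissive.
candidates = broad_search("Jon Smith", watchlist, threshold=0.5)

# Placeholder for a trained model: here simply a stricter fixed cut-off.
matches = prune(candidates, keep=lambda name, score: score >= 0.55)
```

Lowering `threshold` widens the first stage and reduces the chance of missing a true match, at the cost of more candidates for the second stage to discard.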
Algorithms utilised in the project:
The question of which model best suits the task came up at several stages of the project. As the number of features grew and the logic became more complex, some of the models became obsolete. At the moment (nowhere near the end of this endeavour), the author has settled on a Histogram-based Gradient Boosting Classification Tree. Its main perks for the challenge at hand are:
The model is trained using a portion of names from the OFAC SDN List.
The project is written in Python. Its current form would have been vastly different without the contributions of several key libraries:
These libraries were instrumental in shaping the project.
Soon