Fighting Ad Fraud with Machine Learning


The advertising industry differs from other industries in many respects. Fraud is not one of them. Wherever there is money to be made, scammers are present, and mobile advertising is no exception. To explain where fraudsters get their money from, we first have to understand how money flows in the web advertising industry. For the sake of clarity, we will explain this using a scenario with only four actors. In practice, more actors are often involved.

Advertiser: This is the one bringing the money into the industry. Brands like Coca Cola or McDonalds pay to show advertisements in apps and websites. They may have different KPIs, such as making their brand visible via impressions or directly generating purchases. Regardless of the KPI, they want to show their ads to humans who may be interested in buying their products.
DSPs (demand-side platforms, e.g. Adello): They run campaigns for advertisers by participating in the open market of digital advertising and buying impressions that fulfill a given set of conditions, e.g. location, gender, age, etc. DSPs have no interest in buying fraud. If the advertiser, i.e. the customer, realizes that the DSP is buying fraud with their money, the advertiser won't be happy, and everybody knows what happens when your customer is not happy.
Exchanges: They act as aggregators for different publishers and sell the impressions available on their sites and apps in one marketplace. The more inventory they have available, the more they can sell to DSPs. Thus, their main interest is to have a decent amount of active traffic available.
Publishers: They offer real estate on their apps and sites where ads can be shown. For every impression they sell, they receive a portion of the money the advertiser paid (minus the margins of the DSP and the exchange). The more visitors they have, the more impressions they can sell. Most publishers try to increase their number of visitors by offering quality content, e.g. news, tutorials, videos, etc. However, other publishers may prefer to create this traffic artificially and thereby increase their income. Fraudsters are usually hired by such publishers, and their objective is normally to increase the traffic to the publisher's websites in a way that appears to be human.

In this scenario, the main responsibility for fighting fraud lies with the DSPs and the exchanges, who must make sure that customers get the maximum value for their money. At Adello we fight fraud every day, and we know that this is a never-ending war. Fraudsters will always find new ways to do business, so we have to stay alert and proactively research the ways fraudsters try to cheat us. Since it is hard to build a decent set of ground truth, we often work in an unsupervised learning setting. Below, we give more details about some of our fraud detection methods. All of these methods are actively used to detect and prevent fraud within our infrastructure. The following list is not comprehensive, but it should give a good impression.

Site Entropy: In our context, we define an event as a visit by a certain entity, e.g. an IP address, to a website. If this event happens more often than the "normal" frequency, we may suspect that this IP address has some hidden interest or incentive to visit this website with an abnormally high frequency. At Adello, we analyze our incoming traffic and compute an entropy coefficient for each site in our inventory. Sites that are always visited by (almost) the same set of IPs receive a low entropy coefficient, and sites that are visited by many different IPs receive a high entropy coefficient. Low-entropy sites are marked as suspicious and blacklisted. For more insights, read this paper by Pastor Valles.
Massive Low Intensity Attack: This attack is performed by a large army of IPs or devices, but at low intensity, i.e. without detectable peaks in their activity. Think of IPs that visit a website once a day, but every day of the year. If the site is our favourite social network or our favourite web search engine, this would not be very suspicious. However, if it happens on less popular websites, a fraudster may be behind this activity.
Click Patterns: Click bots have become more sophisticated in recent years, and some of them can simulate human-like mouse movements, scrolling, etc. Nevertheless, the old click bots did not retire. They are still active, hoping to slip past. These old bots are not very smart: they tend to click on the same positions, or they wait a fixed number of seconds before clicking. They may be difficult to detect when human traffic is also visiting the site. The good news is that sometimes they are a site's only visitors, and then their simplistic clicking strategy gives them away.
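To make the site entropy idea more concrete, here is a minimal sketch in Python. It assumes the coefficient is the Shannon entropy of each site's visiting-IP distribution, normalized by log2 of the site's visit count so that scores fall between 0 (every visit from one IP) and 1 (every visit from a different IP). The normalization, the function names and the `min_visits` and `threshold` parameters are our own illustrative assumptions; this does not reproduce the exact coefficient used in production or in the Pastor Valles paper.

```python
import math
from collections import Counter, defaultdict


def normalized_site_entropy(visits, min_visits=2):
    """Normalized Shannon entropy of the visiting-IP distribution per site.

    Hypothetical helper (illustrative assumption, not a production API).
    `visits` is an iterable of (site, ip) pairs, one per observed visit.
    The raw entropy H = -sum(p_i * log2(p_i)) is divided by log2(N), where N is
    the site's number of visits, so the score lies in [0, 1]. Sites with fewer
    than `min_visits` visits are skipped because the score is not meaningful.
    """
    ip_counts = defaultdict(Counter)
    for site, ip in visits:
        ip_counts[site][ip] += 1

    scores = {}
    for site, counts in ip_counts.items():
        total = sum(counts.values())
        if total < min_visits:
            continue
        h = -sum((c / total) * math.log2(c / total) for c in counts.values())
        # max() turns a possible -0.0 (single IP) into a clean 0.0
        scores[site] = max(0.0, h / math.log2(total))
    return scores


def flag_low_entropy(scores, threshold):
    """Return the sites whose normalized entropy is below `threshold`."""
    return sorted(site for site, score in scores.items() if score < threshold)


# Usage: one site visited 8 times by a single IP, one by 8 distinct IPs.
visits = [("a.example", "1.1.1.1")] * 8 + [
    ("b.example", f"10.0.0.{i}") for i in range(8)
]
scores = normalized_site_entropy(visits)
suspicious = flag_low_entropy(scores, threshold=0.3)
```

Normalizing by the visit count keeps busy and quiet sites on the same scale; without it, raw entropy grows with traffic volume, so small but legitimate sites would look suspicious simply because they have few visitors.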
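The massive low intensity attack can be approached in a similar spirit. The sketch below is one possible heuristic, not our production method: it assumes visits arrive as (site, ip, day) tuples and counts an IP as "steady" for a site when it appears on at least `min_days` distinct days but never more than `max_daily` times on any single day. Each site then gets the share of its IPs that behave this way. The function name and both thresholds are hypothetical.

```python
from collections import defaultdict


def low_intensity_scores(visits, min_days=30, max_daily=1):
    """Share of each site's IPs that visit steadily but at low intensity.

    Hypothetical helper (illustrative assumption, not a production API).
    `visits` is an iterable of (site, ip, day) tuples, where `day` is any
    hashable day key, e.g. a datetime.date or an integer day index.
    """
    # (site, ip) -> day -> number of visits on that day
    per_day = defaultdict(lambda: defaultdict(int))
    for site, ip, day in visits:
        per_day[(site, ip)][day] += 1

    total_ips = defaultdict(int)
    steady_ips = defaultdict(int)
    for (site, _ip), days in per_day.items():
        total_ips[site] += 1
        # Many active days, but no peak on any single day
        if len(days) >= min_days and max(days.values()) <= max_daily:
            steady_ips[site] += 1

    return {site: steady_ips[site] / total_ips[site] for site in total_ips}


# Usage: 100 IPs visit a niche site once a day for 60 days, while another
# site gets 100 one-off visitors.
visits = [("niche.example", f"ip{i}", d) for i in range(100) for d in range(60)]
visits += [("news.example", f"ip{i}", 0) for i in range(100)]
scores = low_intensity_scores(visits)
```

As the text notes, a high score is only suspicious on sites that are not popular, so in practice these scores would need to be combined with some measure of site popularity before anything is blacklisted.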
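Old click bots that always click on the same spot or wait a fixed delay leave a strongly concentrated signature. The sketch below assumes each click is logged as (site, x, y, delay_ms) and measures, per site, the share of clicks that fall into the single most common position cell and into the single most common delay bucket. The cell size, bucket width, threshold and function names are illustrative assumptions. Consistent with the caveat above, such a score only stands out when bots make up most of a site's clicks; mixed with plenty of human traffic, the concentration is diluted.

```python
from collections import Counter, defaultdict


def click_pattern_scores(clicks, cell_px=5, delay_bucket_ms=100):
    """Per-site concentration of click positions and click delays.

    Hypothetical helper (illustrative assumption, not a production API).
    `clicks` is an iterable of (site, x, y, delay_ms) tuples: (x, y) is the
    click position in pixels and delay_ms the time from ad render to click.
    Returns site -> (top_cell_share, top_delay_share).
    """
    cells = defaultdict(Counter)
    delays = defaultdict(Counter)
    for site, x, y, delay_ms in clicks:
        # Bucket positions into a coarse grid and delays into fixed-width bins
        cells[site][(x // cell_px, y // cell_px)] += 1
        delays[site][delay_ms // delay_bucket_ms] += 1

    scores = {}
    for site, cell_counts in cells.items():
        total = sum(cell_counts.values())
        top_cell = cell_counts.most_common(1)[0][1]
        top_delay = delays[site].most_common(1)[0][1]
        scores[site] = (top_cell / total, top_delay / total)
    return scores


def flag_bot_like(scores, threshold=0.5):
    """Sites where one position cell or one delay bucket dominates the clicks."""
    return sorted(
        site
        for site, (cell_share, delay_share) in scores.items()
        if cell_share >= threshold or delay_share >= threshold
    )


# Usage: a bot clicking the same spot after 3 s vs. varied human-like clicks.
clicks = [("bot.example", 100, 50, 3000)] * 10
clicks += [
    ("human.example", i * 37 % 300, i * 53 % 250, 500 + i * 431) for i in range(10)
]
scores = click_pattern_scores(clicks)
flagged = flag_bot_like(scores)
```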

On a very basic level, we are creating our own version of the Turing test. In the classic Turing test, a machine pretends to be a human, and a human has to decide whether she is dealing with another human or a machine. Machines that can imitate human behavior will make our job in fraud detection more challenging. Nevertheless, we have an advantage: fraud bots have to fool not only humans but also other machines. From our perspective, this is sometimes much harder for them, given that those machines also have machine learning and AI at their disposal.

Hopefully, these examples help you understand how we use data science at Adello to fight fraud. However, we are not the only department in the company working to improve the quality of the traffic we buy. For example, we also continuously update a whitelist of sites and apps. This list ensures brand safety and includes only genuine sites and apps; dummy sites and apps that exist only to sell advertising and clearly offer no content are excluded. Fraud is a serious problem in our industry, and for this reason we have to tackle it from every possible angle.