785 million euros. This is the amount tax collectors recovered in 2019 thanks to “data mining”, according to a statement by the French government in February 2020. This was made possible by a team of thirty data scientists. The group was set up in 2014 by the French tax authority (DGFiP) to develop machine learning techniques and help with tax audits. The French tax authority collects all taxes in France (including local taxes, which are then sent back to local authorities), which represent about 540 billion euros a year.
Going by the name "Fraud targeting and maximization of requests value" (CFVR in its French acronym), the tools aim to find mistakes and fraud in tax returns by drawing on large volumes of data about taxpayers. The finance ministry planned to invest 21.3 million euros in the program from 2016 to 2023. Like the tax authority, the team running CFVR is based in Paris.
Tax audits of yore
Before the data science team was created, tax audits were targeted with various methods: thorough human review of tax returns, reports from other administrations and rule-based targeting run on ordinary computers at local tax offices. The team of data scientists running CFVR uses a set of machine learning techniques for various fraud schemes. The tools can investigate businesses as well as individuals.
To identify fraud, CFVR analyses data from past controls or tries to find patterns in the behavior of businesses or households. The tools also rely on network analysis to find persons and businesses related to known fraudsters.
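How CFVR's network analysis actually works has not been made public. As a purely illustrative sketch, the general idea of finding entities related to known fraudsters can be expressed as a breadth-first search over a graph of links between persons and businesses. The entity names, the `links` list and the `max_hops` parameter below are hypothetical, not taken from the tax authority's tools.

```python
from collections import deque


def entities_near_fraudsters(links, known_fraudsters, max_hops=2):
    """Return {entity: hops} for entities within max_hops of a known fraudster."""
    # Build an undirected adjacency map from (a, b) link pairs.
    neighbours = {}
    for a, b in links:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    # Breadth-first search starting from every known fraudster at once.
    distance = {entity: 0 for entity in known_fraudsters}
    queue = deque(known_fraudsters)
    while queue:
        current = queue.popleft()
        if distance[current] == max_hops:
            continue  # do not look beyond the hop limit
        for other in neighbours.get(current, ()):
            if other not in distance:
                distance[other] = distance[current] + 1
                queue.append(other)

    # Leave out the known fraudsters themselves.
    return {entity: hops for entity, hops in distance.items() if hops > 0}


# Hypothetical links, e.g. a director sitting on the boards of two companies.
links = [
    ("company_A", "director_X"),
    ("director_X", "company_B"),
    ("company_B", "director_Y"),
    ("director_Y", "company_C"),
]
flagged = entities_near_fraudsters(links, ["company_A"], max_hops=2)
print(flagged)
```

With a limit of two hops, the search reaches director_X and company_B but stops before director_Y and company_C. Being close to a known fraudster in such a graph is only a reason to look closer, not evidence of fraud.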
The French data protection authority published a description of CFVR before it greenlighted the project. The personal data watchdog listed the following goals for the algorithms: find mistakes and inconsistencies in tax returns, identify potential fraud schemes and find them in declarations, identify inconsistent buying habits, find mistakes through an analysis of previous tax returns, and more.
These algorithms are fed with data from dozens of government-owned databases (banking account data, tax declarations, cadastre and others), as well as private-sector ones, such as listings of foreign companies or financial data. The rule-based algorithms that were previously used have been integrated into the new tools. This means that part of the findings attributed to CFVR would have been detected under the previous system anyway. The government does not break down how much of the amount recovered is attributable to rule-based algorithms.
One in 3 tax controls
In 2019, one in five tax controls on companies (one in ten for households) were scheduled using data analytics coming from CFVR, according to the ministry. In 2020, the number rose to one in three. By 2023, the ministry’s goal is to reach one in two. AlgorithmWatch talked to several tax collectors (they had to stay anonymous because their contracts do not allow them to talk to the press), who said the target would be hard to achieve. Currently, when CFVR gives tax inspectors a list of companies or households to control, only 10% to 30% of the entities on the list end up being controlled, according to the inspectors AlgorithmWatch talked to (inspectors see no reason to control the remaining 70% to 90%).
The ministry is planning to improve CFVR’s algorithms and collect more data. It will also, in some regions, loosen the criteria required to start a control. Tax inspectors currently need to have serious doubts about a tax return before starting a control. In the future, ministry guidance might force an inspector to launch a full-fledged control even when his or her doubts are light enough to be cleared with a phone call.
Government ministers are not keen to publicize the relatively poor performance of the data mining algorithms. A parliamentary report published in July revealed that only one in three controls carried out based on CFVR data led to the recovery of tax income.
The ratio of successful controls, one in three, is roughly equivalent to what traditional methods achieve. In the case of CFVR, however, tax inspectors first need to weed out entities that were flagged for control without reason. The total number of entities sent by CFVR to local tax inspectors is not public. The authors of the parliamentary report recommend creating new indicators to measure the success of CFVR effectively. New metrics could include the number of entities that were correctly flagged for a control, the number of duplicates between CFVR results and audits otherwise scheduled, audits that led to legal proceedings, “severe” audits carried out thanks to CFVR listings or the amounts recovered in successful audits.
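To make indicators of this kind concrete, here is a minimal sketch of how some of them could be computed from sets of entity identifiers. The function, its field names and the sample figures are hypothetical. They are chosen only to echo the orders of magnitude quoted in this article and do not come from the parliamentary report or from the ministry.

```python
def targeting_metrics(flagged, audited, recovered, other_scheduled):
    """Compute simple targeting indicators from sets of entity identifiers."""
    flagged_and_audited = flagged & audited
    successful = flagged_and_audited & recovered

    def ratio(part, whole):
        # Avoid dividing by zero when a set is empty.
        return len(part) / len(whole) if whole else 0.0

    return {
        # Share of flagged entities that inspectors chose to audit.
        "audit_rate": ratio(flagged_and_audited, flagged),
        # Share of those audits that led to money being recovered.
        "success_rate": ratio(successful, flagged_and_audited),
        # Share of all flagged entities that ended in a recovery.
        "yield_per_flag": ratio(successful, flagged),
        # Flagged entities that were already scheduled by other means.
        "duplicates": len(flagged & other_scheduled),
    }


# Hypothetical example: 10 entities flagged, 3 audited, 1 recovery.
flagged = {f"entity_{i}" for i in range(10)}
audited = {"entity_0", "entity_1", "entity_2"}
recovered = {"entity_0"}
other_scheduled = {"entity_1", "entity_20"}

metrics = targeting_metrics(flagged, audited, recovered, other_scheduled)
print(metrics)
```

In this made-up case, three in ten flagged entities are audited and one in three audits succeeds, so only one flagged entity in ten ends in a recovery. Publishing the total number of flagged entities would allow this last figure to be computed for CFVR itself.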
As for the 785 million euros the government boasted about, the authors note that the amount is negligible compared to the 12 billion recovered the same year. "The data-mining, despite what the government can say, is slow to become effective. This raises questions regarding the efficacy of their targeting," the authors wrote.
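The comparison can be checked with a quick calculation. The sketch below uses only the two rounded figures quoted above, so the result is an approximation.

```python
# Rounded figures quoted in the article, in euros.
recovered_via_data_mining = 785_000_000
total_recovered = 12_000_000_000

share = recovered_via_data_mining / total_recovered
print(f"{share:.1%}")  # prints 6.5%
```

Data mining thus accounted for roughly one euro in fifteen of the amount recovered that year.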
According to the finance ministry, targeting will be improved by adjusting the algorithms and growing the database they have access to. Ninety-nine collaborative platforms, such as auction websites, house-sharing and online marketplaces, agreed to provide data to the ministry. User identifiers, the number of transactions, the amounts spent and bank account details should soon be available to CFVR.
The French parliament voted in December 2019 to allow the finance ministry to collect data from social networks as well. (The authorization for the experiment, which must come from the data protection authority and from the country’s highest administrative court, is still pending.) On the other hand, changes in fiscal policy removed interesting data sources. The wealth tax, which was abolished in 2017, was a good source of data on rich taxpayers and their assets, according to tax collectors’ unions.
When the algorithms make mistakes, a network of hand-picked local correspondents can report the error back to CFVR, the ministry says. However, tax collectors who actually have to use the lists of suspected taxpayers cannot do so, and some resent that they cannot provide feedback directly.
The development of tax collection algorithms is not only a question of efficiency or communication. These experiments come in a context of massive personnel reduction. Since 2008, 2,000 jobs have been cut at the tax authority each year, on average. The finance ministry estimates that machine learning tools will help cut another 546 jobs. The tax authority told AlgorithmWatch that the automation brought by CFVR would allow tax inspectors to focus on “local scheduling”, “using intelligence” and on “work with more added value”.
Once described as a way to help tax collectors with “better-targeted or high-stakes audits”, CFVR now looks for any type of fraud. Instead of looking at fraud that is hard to identify for humans, the algorithms look for small mistakes, inconsistencies and obvious fraud.
The goal could be, eventually, to “fully automate” most tax audits, as one tax collector told us. Last year, a new law introduced a “right to make honest mistakes” for taxpayers. The algorithm can now automatically send taxpayers a message warning them that a mistake was detected in their tax returns, removing all human filters. “Those non-binding requests will allow honest taxpayers to correct their tax returns without having to undergo a full tax control”, the data protection authority wrote in 2019.
The French tax authority is moving from a system where all tax returns were processed by humans to one where taxpayers have to fix mistakes detected by machines. And if the machines make mistakes, taxpayers will be the ones who have to check. It seems that machines are granted a right to make honest mistakes too, although taxpayers will have to do the unpaid work of fixing them.