India’s coronavirus data make for grim reading. For 20 consecutive days it has reported more new cases than any other country and this figure is on an upward trend. Worryingly, there is also growing evidence that official figures vastly underestimate the extent of the pandemic.

India’s draconian lockdown, announced on 24 March, reflected the growing fear that the country was on a precipice. Prime Minister Narendra Modi gave 1.4 billion citizens four hours’ notice before “a total ban of you coming out of your homes…for whatever reason” was put in place for 21 days. “Every state, every district, every village, every lane, will be under lockdown,” declared Modi in his televised address.

India’s leaders understood that the country was supremely vulnerable to the disease. Two thirds of its population subsist on less than $2 a day and hundreds of millions live in densely packed urban areas. Sanitation is almost universally poor and public health care painfully under-resourced.

The lockdown brought economic activity to a near standstill, costing tens of millions of jobs and dealing a terrible blow to an economy that was already experiencing a slump before coronavirus hit. But India’s Great Lockdown failed to bend – let alone flatten – the infection curve and it was lifted on 30 May.

Despite individual states and cities imposing localised restrictions since then, cases are now soaring and the government’s fears are being proven right. Total coronavirus cases in India have passed 3.1 million, making it the third worst-affected country after the USA and Brazil and the fourth worst in terms of deaths with 58,390.

And while most of the hardest hit nations are seeing a flattening of their coronavirus curves, the number of infections and deaths in India is rising day-on-day. Nearly 70,000 new infections were recorded on 23 August.

These figures do not tell the whole story. They ignore a range of factors, such as population size, demographics, testing rates and divergent counting methods, which must be teased out before the prevalence and threat of coronavirus can be assessed accurately.

While over-reliance on infection and death rates is reductive, the figures provide a crucial piece of the overall picture. But endemic undercounting in India means that neither figure is likely to be anywhere near the true number.

Undercounting is not a problem confined to India. Lack of widespread testing and the prevalence of asymptomatic cases mean almost all countries undercount to some degree. Professor Prabhat Jha, an epidemiologist at the University of Toronto who led India’s Million Deaths Study – a vast, ambitious study of premature mortality – told Reaction that: “even in rich countries with good medical certification, analyses show that deaths are underestimated by 30-60%”.

But in India’s case, studies suggest that the scale of the underestimation of cases is staggering.

A study conducted by India’s National Centre for Disease Control in the capital, Delhi, randomly tested 20,000 residents and found that 22% had been infected with the virus. The study indicates that almost a quarter of Delhi’s inhabitants are likely to have been infected – 6.6 million out of 29 million. At the end of July, at the time of the study, Delhi had only 123,747 confirmed cases.
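The gap between the survey estimate and the confirmed count can be checked with simple arithmetic. The sketch below uses only the figures quoted above. The variable names are illustrative, and the 22% rate gives a slightly lower total than the 6.6 million cited, presumably because the survey percentage has been rounded.

```python
# Delhi figures as quoted in the article.
population = 29_000_000      # inhabitants
seroprevalence = 0.22        # share of those tested who had been infected
confirmed_cases = 123_747    # official count at the time of the study

# Scale the survey rate up to the whole population.
estimated_infections = population * seroprevalence

# How many estimated infections there were for each confirmed case.
undercount_factor = estimated_infections / confirmed_cases

print(f"Estimated infections: {estimated_infections:,.0f}")
print(f"Estimated infections per confirmed case: {undercount_factor:.0f}")
```

On these numbers there were roughly 50 estimated infections in Delhi for every case in the official tally.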

A follow-up study in Delhi this month put the true infection rate at 29%. A similar study of nearly 7,000 slum-dwellers and inhabitants of residential districts in Mumbai found coronavirus antibodies in 41% of those tested.

As Professor Bhramar Mukherjee, Chair of the Department of Biostatistics at the University of Michigan, confirms, these studies suggest that infections in India are being under-reported by “a factor of 30 to 40”, meaning that only around 3% of cases are being recorded.
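The step from an under-reporting factor to the share of cases recorded is a simple reciprocal. The following is a minimal sketch of that conversion. The function name is hypothetical and the calculation assumes the factor applies uniformly across the country.

```python
def share_recorded(undercount_factor: float) -> float:
    """Fraction of true infections that appear in official figures,
    assuming one recorded case per `undercount_factor` infections."""
    return 1 / undercount_factor

# The range quoted by Professor Mukherjee.
for factor in (30, 40):
    print(f"Factor of {factor}: {share_recorded(factor):.1%} of cases recorded")
```

A factor of 30 to 40 therefore corresponds to somewhere between 2.5% and about 3.3% of infections being recorded, in line with the "around 3%" figure.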

At 1.8%, India’s case fatality rate (CFR), the ratio of coronavirus deaths to confirmed coronavirus cases, is very low, as is the number of coronavirus deaths per million citizens, at 43.
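Both measures can be roughly reproduced from the totals given earlier in the article. The sketch below assumes exactly 3.1 million cases and a population of 1.4 billion, both rounded figures, so the results only approximate the published 1.8% and 43 per million.

```python
# National totals as quoted earlier in the article (rounded, so approximate).
confirmed_cases = 3_100_000      # "passed 3.1 million"
deaths = 58_390
population = 1_400_000_000       # "1.4 billion citizens"

# Case fatality rate: deaths divided by confirmed cases.
cfr = deaths / confirmed_cases

# Deaths per million inhabitants.
deaths_per_million = deaths / (population / 1_000_000)

print(f"CFR: {cfr:.2%}")
print(f"Deaths per million: {deaths_per_million:.1f}")
```

Because the case total is slightly above 3.1 million, the true CFR is a little lower than this calculation suggests. Both measures depend directly on how many cases and deaths are actually counted.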

The way in which coronavirus deaths are recorded in India complicates these encouraging statistics, however. “Only around one in five deaths in India is assigned a certified cause,” says Jha. “This means that 80% of deaths have to be ignored when calculating Covid-19 death numbers.”

For the roughly 20% of deaths that are assigned a cause, most Indian states, contrary to WHO guidelines, are not including deaths merely “suspected” to be coronavirus deaths in the final count. An investigation by health journalist Priyanka Pulla for The Wire Science confirmed that the states of Maharashtra, Gujarat, Telangana, Tamil Nadu, Uttar Pradesh and Madhya Pradesh – whose combined population is 554 million – were only counting deaths of patients who had tested positive for the virus in their official tolls. When a patient who has shown symptoms of Covid-19 dies without being tested, after testing negative or with an inconclusive test result, the death is not included.

“The right strategy in a pandemic is not to undercount but, if anything, slightly over-count and make sure that you get the data out quickly,” says Jha. “In the Indian context, unfortunately, we have huge lacunae in the data.”

Even amongst confirmed coronavirus cases, there is evidence of under-reporting of deaths. Devesh Patel, the medical officer of health at the Vadodara Municipal Corporation, told the British Medical Journal that in Vadodara, Gujarat’s third largest city, hospitals’ death audit committees were attributing the ultimate cause of around 75% of deaths of patients testing positive for Covid-19 to co-morbidities – pre-existing conditions like cancer, diabetes or AIDS that worsen the effects of Covid-19 – which means the deaths were not counted in official figures.

But according to Professor K Srinath Reddy, president of the Public Health Foundation of India, undercounting doesn’t fully explain the low death rate. “There would have been some undercounting of deaths in some parts of the country, affecting the estimated case-fatality rate,” Reddy told Reaction. “However, deaths per million people have remained lower in all South Asian countries compared to Europe or North America.”

“To attribute all or most of these regional differences in mortality to undercounting of deaths would be inappropriate. Younger age profiles (with lower severity of infection) and higher rural components of the population (with lower virus mobility) are socio-demographic reasons [for the low CFR].”

“There could be other reasons too like those related to non-specific immunity boosted by TB or polio immunisation, but these are hypothetical and still need evidence,” Reddy added.

Along with opposition leaders, Jha suggests that there is a political dimension to the under-reporting. “I think there should be concern about whether the government is acting to scare people so they stop producing these statistics. [Those collecting the data] have an order from high up that says ‘don’t share any data which might embarrass the country’. So we are staring into the dark regarding what’s going on.”

Whatever the reasons behind the lack of data, the consequences are severe, says Jha.

“Very simply, everyone wants to flatten the curve. But if you’re not measuring the curve then you won’t know. Without accurate data India won’t be able to walk out of the pandemic.”