A security researcher came across an unsecured MongoDB database server that contained highly detailed CVs for more than 202 million Chinese users.
The database's owner remains a mystery, according to Bob Diachenko, director of cyber risk research at Hacken Proof, who discovered the exposed server.
The MongoDB instance contained 854 GB of data, with a total of 202,730,434 records, most of which were CVs for Chinese users.
The resumes contained all the sensitive details you would expect to find in a CV, such as full names, home addresses, phone numbers, emails, marital status, number of children, political affiliations, physical measurements such as height and weight, literacy level, salary expectations, education, previous jobs, and more.
The data was a goldmine for threat actors, and it was left open for anyone on the internet to access. Tracking down its owner proved nearly impossible, so Diachenko was forced to make a public appeal on Twitter for help identifying the server's administrator.
One of the researcher's followers came to the rescue last year by pointing Diachenko to a now-deleted GitHub repository that contained the source code of a web application.
The application, which was likely built to scrape CVs from legitimate job search portals, used data structures identical to those in the leaked database, a strong indication that it was the tool that scraped and collected the CVs.
Diachenko told ZDNet that bj.58.com, a very popular job portal in China, appears to be one of the main sources from which the application collected CVs. However, other portals may also have been scraped.
When Diachenko contacted the staff at bj.58.com, they confirmed his initial assessment that the data came from a scraper rather than a leak from their own network.
"We searched the entire database and examined all other storage, and concluded that the sample data was not leaked from us," a bj.58.com spokesperson told Diachenko. "It appears that the data was leaked by a third party who scrapes data from many resume [sic] websites."
bj.58.com has not responded to a request for additional comment from ZDNet.
However, Diachenko's call for help on Twitter also appears to have caught the attention of the database's owner, who secured the server and removed the GitHub repository a few days ago.
This is not the first time Diachenko has found a leaky server containing data from CV site scrapers. Last month, he also discovered a similar server exposing more than 66 million records that appeared to have been scraped from LinkedIn and then leaked through another MongoDB database.