Deepfake technology inspires both love and hate.
As we all know, Deepfake software based on deep learning models can create fake facial videos or images. It has a wide range of applications in the film, television, and entertainment industries.
But since 2017, Deepfake has also been used by bad actors to create pornographic videos, as in the incident involving the “Wonder Woman” actress. According to statistics, 96% of Deepfake videos on social networks involve pornographic content, and more than 130 million users have watched them.
In addition, Deepfake has begun to reach into politics, where it is used to forge remarks by politicians, and the number of such cases is increasing year by year.
A Deepfake video in which Obama appears to make remarks he never made
More importantly, as Deepfake technology keeps improving, these fake videos are becoming increasingly difficult to tell apart from real ones, posing a serious threat to social stability.
Recently, a paper published in IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI) claimed that a new method can recognize Deepfake videos with an accuracy of 97.29%, and can also identify the generative model used to produce a Deepfake.
What’s more interesting is that, unlike conventional detection methods, the paper relies on a biological signal: the heartbeat.
Deepfake “heartbeat” detection method
This paper comes from a research team at Binghamton University and Intel Corporation. The team’s AI tool, called FakeCatcher, distinguishes real videos from fake ones by detecting subtle heartbeat-induced differences on the face.
Blood vessels run throughout the body, including the face. Each heartbeat drives blood through the body, and the flowing blood produces subtle changes on the surface of the face. These changes are what the researchers use to tell real videos from fake ones.
The technique for measuring such changes is called photoplethysmography (PPG). Simply put, it tracks the pulsating change in how much light the skin absorbs or reflects and converts it into an electrical signal that corresponds to the heart rate.
The same principle is used by medical pulse oximeters, Apple Watches, and wearable fitness trackers to read heartbeat signals.
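To make the PPG idea concrete, here is a minimal sketch of how a heart rate could be read from a brightness trace taken from video. It assumes the trace is the mean green-channel intensity of a skin region in each frame, which is a common simplification in camera-based PPG and not necessarily what FakeCatcher does. The function name `estimate_heart_rate_bpm`, the 42 to 240 bpm search band, and the synthetic trace are all illustrative assumptions.

```python
import cmath
import math


def estimate_heart_rate_bpm(trace, fps, min_bpm=42.0, max_bpm=240.0):
    """Return the dominant pulse frequency of a brightness trace, in beats per minute."""
    n = len(trace)
    mean = sum(trace) / n
    centered = [x - mean for x in trace]  # remove the constant skin-tone level
    best_bpm, best_power = 0.0, -1.0
    # Scan only the DFT bins that fall inside a plausible heart-rate band.
    for k in range(1, n // 2 + 1):
        bin_bpm = 60.0 * k * fps / n
        if not (min_bpm <= bin_bpm <= max_bpm):
            continue
        coeff = sum(c * cmath.exp(-2j * math.pi * k * t / n)
                    for t, c in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_bpm, best_power = bin_bpm, power
    return best_bpm


# Synthetic trace (assumption): 10 s at 30 fps, a faint 1.2 Hz pulse on a constant level.
fps = 30
trace = [120.0 + 0.5 * math.sin(2 * math.pi * 1.2 * t / fps) for t in range(10 * fps)]
bpm = estimate_heart_rate_bpm(trace, fps)
print(f"estimated heart rate: {bpm:.1f} bpm")
```

On this synthetic input the estimate lands on the 72 bpm pulse that was put in. Real footage would need face tracking, noise filtering, and motion compensation before a step like this becomes reliable.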
The premise of the research is that biological signals are an important cue for telling real faces from fake ones. In other words, the “people” shown in a fake video will not show a heartbeat pattern similar to that of people in a real video.
Based on this, the researchers found through experiments that Deepfake faces do not reproduce the faint changes caused by blood flow in a normal way.
According to Ilke Demir, a senior research scientist at Intel:
“We extracted several PPG signals from different parts of the face, and observed the consistency of these signals in the spatial and temporal dimensions.”
Here the spatial dimension refers to the facial regions, and the temporal dimension refers to the heartbeat frequency. Demir’s point is that by reading the PPG signals and applying enhancement techniques, the method can recover and amplify the faint changes in the face and use them to judge the authenticity of the video.
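The spatial side of this consistency check can be illustrated with a small sketch. It assumes that each face region already has its own PPG trace and scores agreement as the mean pairwise Pearson correlation. That measure, the region names, and the synthetic signals are stand-ins chosen for illustration, not the features used in the paper.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def spatial_consistency(region_signals):
    """Mean pairwise correlation between PPG traces from different face regions."""
    names = sorted(region_signals)
    pairs = [(p, q) for i, p in enumerate(names) for q in names[i + 1:]]
    return sum(pearson(region_signals[p], region_signals[q]) for p, q in pairs) / len(pairs)


def pulse(freq_hz, amplitude, offset, fps=30, seconds=10):
    """Synthetic PPG-like trace: a sine pulse on a constant brightness level."""
    return [offset + amplitude * math.sin(2 * math.pi * freq_hz * t / fps)
            for t in range(fps * seconds)]


# Hypothetical "real" face: every region carries the same 1.2 Hz pulse.
real_face = {"forehead": pulse(1.2, 0.5, 120), "left_cheek": pulse(1.2, 0.4, 115),
             "right_cheek": pulse(1.2, 0.6, 118)}
# Hypothetical "fake" face: regions pulse at unrelated frequencies.
fake_face = {"forehead": pulse(1.2, 0.5, 120), "left_cheek": pulse(1.7, 0.4, 115),
             "right_cheek": pulse(2.3, 0.6, 118)}

real_score = spatial_consistency(real_face)
fake_score = spatial_consistency(fake_face)
```

In this toy setup the regions of the "real" face are almost perfectly correlated, while those of the "fake" face are close to uncorrelated. Actual Deepfakes are far less clear-cut, which is why the researchers train a classifier on richer features instead of thresholding a single score.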
In a Deepfake video, the resulting facial signals look very unnatural, as shown below:
Specifically, FakeCatcher’s complete detection process is as follows:
1) Identify the key face regions.
2) Extract the biosignals (PPG).
3) Apply signal transformations to compute the spatial and temporal correlations, capture the signal characteristics in feature sets and PPG maps, and train a classifier that outputs an authenticity probability.
4) Classify the video as real or fake according to its authenticity probability.
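The last step, turning probabilities into a verdict, can be sketched as follows. This is only an illustration under stated assumptions: that a classifier has already produced one authenticity probability per video segment, that the segment scores are combined by simple averaging, and that 0.5 is the decision threshold. The function name `classify_video` and the sample probabilities are hypothetical.

```python
def classify_video(segment_probs, threshold=0.5):
    """Average per-segment authenticity probabilities and label the whole video."""
    if not segment_probs:
        raise ValueError("at least one segment probability is required")
    mean_prob = sum(segment_probs) / len(segment_probs)
    label = "real" if mean_prob >= threshold else "fake"
    return label, mean_prob


# Hypothetical per-segment probabilities that each segment is authentic.
label_a, prob_a = classify_video([0.9, 0.8, 0.4])
label_b, prob_b = classify_video([0.2, 0.6, 0.1])
print(label_a, round(prob_a, 2))
print(label_b, round(prob_b, 2))
```

Averaging lets one misjudged segment be outvoted by the rest of the video, which is one plausible reason for a video-level score to be more reliable than a segment-level one.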
According to the researchers, the work makes the following main contributions:
Signal transformation formulas and experiments demonstrate that the spatial and temporal consistency of biological signals can be used to verify the authenticity of a video.
A new general-purpose Deepfake detector is proposed.
A new biosignal structure map is proposed, which can be used to train neural networks for authenticity classification.
A diversified portrait video data set is constructed to provide a test bed for false content detection.
Model accuracy test results
Before the experiment, to evaluate the FakeCatcher model more accurately, the researchers built their own Deepfake data set from media networks, news articles, research reports, and other sources. As a result, its videos reflect real-world variation in generative model, resolution, compression, lighting, aspect ratio, frame rate, motion, pose, occlusion, content, and other aspects.
The data set contains 142 videos and has a size of 30 GB. The classification results in the figure below show that FakeCatcher is robust to low resolution, compression, motion, lighting, occlusion, and other issues.
The upper part is the real video, and the lower part is the Deepfake video
Next, the researchers ran two main experiments. The first compares FakeCatcher with current deep learning solutions and other Deepfake detectors. The experimental results are as follows:
In the table, Frame and Face report segment-level accuracy, where FakeCatcher scores highest at 87.62%. Video reports video-level accuracy, where FakeCatcher is 8.85% higher than the best competing architecture.
Note that all the experiments in the table were run on the self-built DF data set (60% training and 40% test split).
The second experiment is cross-data set validation, covering the DF, Celeb DF, FF, FF++ and UADFV data sets.
The first column is the training data set, the second column is the test data set
Rows 5 and 6 show that FakeCatcher learns better from a small, diverse data set than from a large, uniform one. On the one hand, training on DF and testing on FF gives an accuracy 18.73% higher than the reverse. On the other hand, the DF data set is only about 5% of the size of the FF data set. Rows 3 and 6 show that increasing the diversity from FF to FF++ raises the accuracy on DF by 16.9%.
In the FF++ data set, each original video has four synthesized counterparts, each produced by a different generative model. The researchers split the original FF++ videos into 60% training and 40% testing, then created four copies of these sets and deleted all samples generated by one particular model from each copy.
As the first column of the table shows, each set contains 600 real videos and 1,800 fake videos from three models for training, and 400 real videos and 400 fake videos from the remaining model for testing.
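The split described above can be sketched in a few lines. This is a hedged illustration, not the researchers' code: the 1,000 original videos are inferred from the 600/400 counts, the model names `model_a` to `model_d` are placeholders, and `leave_one_model_out` is a hypothetical helper.

```python
def leave_one_model_out(train_ids, test_ids, models):
    """Build one train/test split per generative model, holding that model out of training."""
    splits = {}
    for held_out in models:
        seen = [m for m in models if m != held_out]
        # Training: real videos plus fakes from the three models that are not held out.
        train = [(f"real_{i}", "real") for i in train_ids]
        train += [(f"{m}_{i}", "fake") for m in seen for i in train_ids]
        # Testing: real videos plus fakes from the held-out model only.
        test = [(f"real_{i}", "real") for i in test_ids]
        test += [(f"{held_out}_{i}", "fake") for i in test_ids]
        splits[held_out] = {"train": train, "test": test}
    return splits


video_ids = list(range(1000))            # assumed number of original videos
cut = len(video_ids) * 60 // 100         # 60% training, 40% testing
models = ["model_a", "model_b", "model_c", "model_d"]  # placeholder names
splits = leave_one_model_out(video_ids[:cut], video_ids[cut:], models)

for held_out, split in splits.items():
    n_real = sum(1 for _, label in split["train"] if label == "real")
    n_fake = len(split["train"]) - n_real
    print(held_out, "train:", n_real, "real /", n_fake, "fake; test:", len(split["test"]))
```

Because the fakes in each test set come from a model the classifier never saw during training, the resulting accuracy measures how well the detector generalizes to an unseen generator.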
The cross-model evaluation shows very accurate predictions for every model except NeuralTextures, which is an inherently different generative model.
From this, the paper concludes that FakeCatcher, a Deepfake video detector based on biosignals, shows that the spatial and temporal consistency of biosignals is not well preserved in GAN-generated content.
In addition, experiments on Face Forensics and the self-built DF data set evaluated the pairwise separation and authenticity classification methods on video segments and whole videos, yielding accuracy rates of 99.39% for pairwise separation, 96% on Face Forensics, and 91.07% on DF. These results again indicate that FakeCatcher can detect fake content with high accuracy regardless of the video generator, content, resolution, or quality.
For the full paper, see: https://arxiv.org/pdf/1901.02212.pdf
Reference link:
https://ieeexplore.ieee.org/document/9141