Detection of the Fake News Using Machine Learning Algorithms and Data Analysis Techniques

Authors

  • Mirza Niksic, International Burch University, Francuske revolucije bb, 71000 Sarajevo, Bosnia and Herzegovina
  • Dzelila Mehanovic, International Burch University, Francuske revolucije bb, 71000 Sarajevo, Bosnia and Herzegovina

Keywords:

Fake news, Detection, Machine learning, Algorithm, Text Classification

Abstract

Due to the rapid growth of online social networks in recent years, the prevalence of fake news has increased significantly. Fake news is deliberately created to deceive users by imitating real news, which makes it difficult to identify early. Accompanying information, such as the publisher, therefore needs to be explored to improve detection. This study analyzes several traditional machine learning models to determine the most effective one. The goal is to develop a supervised machine learning model that classifies news articles as either true or fake, using Python's scikit-learn library and NLP techniques for text analysis. The proposed approach involves feature extraction and vectorization, for which scikit-learn offers tools such as CountVectorizer and TfidfVectorizer. The experiment implemented three well-known algorithms, logistic regression, neural networks, and SVM, and compared their performance to determine the most suitable one. All three algorithms performed well, but SVM produced the best results in nearly all categories.
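
The workflow described in the abstract (vectorize the text, then train and compare logistic regression, a neural network, and an SVM) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy corpus, the labels, the helper names build_models and fit_and_score, and all hyperparameters are assumptions made for the example, and scoring on the training texts is done only to keep the sketch short.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Hypothetical toy corpus (NOT the study's data): 1 = fake, 0 = true.
texts = [
    "Miracle pill cures every disease overnight, doctors stunned",
    "Secret memo proves the moon landing was staged in a studio",
    "Celebrity reveals aliens run the central bank",
    "You will not believe this one trick that erases all debt",
    "City council approves budget for new public library",
    "Central bank keeps interest rates unchanged this quarter",
    "University publishes annual report on student enrollment",
    "Local hospital opens new wing for pediatric care",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]


def build_models(vectorizer_cls=TfidfVectorizer):
    """Return one vectorizer + classifier pipeline per algorithm.

    Hyperparameters are illustrative assumptions, not values from the paper.
    """
    return {
        "logistic_regression": make_pipeline(
            vectorizer_cls(), LogisticRegression(max_iter=1000)
        ),
        "neural_network": make_pipeline(
            vectorizer_cls(),
            MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
        ),
        "svm": make_pipeline(vectorizer_cls(), SVC(kernel="linear")),
    }


def fit_and_score(models, x_train, y_train, x_test, y_test):
    """Fit every pipeline and return its accuracy on the test split."""
    scores = {}
    for name, model in models.items():
        model.fit(x_train, y_train)
        scores[name] = model.score(x_test, y_test)
    return scores


# For brevity the toy example scores on the training texts themselves;
# a real comparison would use a held-out test set or cross-validation.
tfidf_scores = fit_and_score(build_models(TfidfVectorizer), texts, labels, texts, labels)
count_scores = fit_and_score(build_models(CountVectorizer), texts, labels, texts, labels)

for name in tfidf_scores:
    print(f"{name}: tfidf={tfidf_scores[name]:.2f} count={count_scores[name]:.2f}")
```

Wrapping each vectorizer and classifier in a single pipeline keeps the vocabulary fitted only on the training split, which avoids leaking test-set terms into the features when a proper train/test split is used.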

Published

2023-09-11

How to Cite

Niksic, M., & Mehanovic, D. (2023). Detection of the Fake News Using Machine Learning Algorithms and Data Analysis Techniques. International Journal of Applied Sciences: Current and Future Research Trends, 19(1), 67–88. Retrieved from https://ijascfrtjournal.isrra.org/index.php/Applied_Sciences_Journal/article/view/1397

Section

Articles