Developing Algorithm for Matching Arabic Names Entered by Mobile Phone

Authors

  • Muneer Alsorori, Faculty of Sciences, Department of Computer Sciences & Information Technology, Ibb University, Ibb, Yemen
  • Maher Al-Sanabani, Faculty of Computer Science and Information Systems, Thamar University, Thamar, Yemen
  • Salah AL-Hagree, Faculty of Sciences, Department of Computer Sciences & Information Technology, Ibb University, Ibb, Yemen
  • Sarah Abdulmalik, Department of Mathematics & Computer Sciences, Faculty of Sciences, Ibb University, Yemen
  • Noor Al-Huda AlArhabi, Department of Mathematics & Computer Sciences, Faculty of Sciences, Ibb University, Yemen
  • Suad Abdu, Department of Mathematics & Computer Sciences, Faculty of Sciences, Ibb University, Yemen
  • Khawlah Meqran, Department of Mathematics & Computer Sciences, Faculty of Sciences, Ibb University, Yemen

DOI:

https://doi.org/10.53555/cse.v5i8.1045

Keywords:

Arabic Language, Name Matching, Levenshtein Distance, Mobile Phone, Phone Keyboard Arabic

Abstract

Name matching plays a vital role in many applications. It is used, for example, in information retrieval and deduplication systems to compare names and to find those that refer to the same object, person, or company. Because names in every application are subject to unavoidable variations and errors, many algorithms have been developed to handle name matching. These algorithms consider name variations caused by spelling, pattern, or phonetic modifications. However, most existing methods were developed for the English language and reflect its characteristics. Up to now, no specific one has been designed and implemented for the Arabic language. The purpose of this study is to present a name matching algorithm for the Arabic language. After considering all major algorithms in this area, we selected one of the basic name matching methods and expanded it to work particularly well for Arabic names. The proposed algorithms are based on the proximity and spacing between Arabic characters on the mobile phone keyboard, in order to give more accurate results for Arabic names. Experiments were carried out to evaluate the proposed algorithms (LD_F, LD_S and LD_KE). The first experiment was carried out for the proposed algorithms (LD_F, LD_S, LD_KM and LD_KE) on the F-Dataset, which contains 15 pairs of names. The results showed that the proposed algorithms gave more accurate results than the Levenshtein algorithm. Therefore, they can be used in many applications such as Automatic Spell Correction (ASC), Search Engines (SE), Data Retrieval (DR), Computational Biology “DNA”, Customer Relation Management (CRM), Customer Data Integration (CDI), Anti-Money Laundering (AML) and Criminal Investigation (CI).
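
The abstract does not define LD_F, LD_S, LD_KM or LD_KE, so the following is only a minimal sketch of the general idea it describes: a Levenshtein distance in which substituting two characters that sit close together on the keyboard costs less than substituting distant ones. The key grid `KEY_ROWS`, the `near_cost` value of 0.5 and all function names are assumptions made for illustration; they are not taken from the paper and may not match any real phone layout.

```python
# Hypothetical key grid for illustration only (not the paper's layout).
KEY_ROWS = [
    "ضصثقفغعهخحج",
    "شسيبلاتنمكط",
    "ذءؤرىةوزظد",
]

# Map each character to its (row, column) position on the grid.
KEY_POS = {ch: (r, c) for r, row in enumerate(KEY_ROWS) for c, ch in enumerate(row)}


def substitution_cost(a, b, near_cost=0.5):
    """Return 0 for equal characters, near_cost for neighbouring keys, else 1."""
    if a == b:
        return 0.0
    pa, pb = KEY_POS.get(a), KEY_POS.get(b)
    if pa is None or pb is None:
        return 1.0  # character not on the grid: fall back to the classic cost
    # Keys count as neighbours when they touch, including diagonally.
    if max(abs(pa[0] - pb[0]), abs(pa[1] - pb[1])) <= 1:
        return near_cost
    return 1.0


def keyboard_levenshtein(s, t):
    """Levenshtein distance with a keyboard-aware substitution cost."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        curr = [float(i)]
        for j, b in enumerate(t, 1):
            curr.append(min(
                prev[j] + 1.0,                          # deletion
                curr[j - 1] + 1.0,                      # insertion
                prev[j - 1] + substitution_cost(a, b),  # substitution or match
            ))
        prev = curr
    return prev[-1]


def similarity(s, t):
    """Normalise the distance to a score in [0, 1]; 1 means identical."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - keyboard_levenshtein(s, t) / longest


if __name__ == "__main__":
    # The first letters are neighbours on the assumed grid,
    # so the pair scores higher than under the classic unit cost.
    print(similarity("سالم", "شالم"))
```

With unit costs the pair in the usage line would have distance 1; under the assumed grid the substitution of two neighbouring keys is charged 0.5 instead, which is the kind of adjustment the abstract attributes to keyboard proximity.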

Published

2019-08-31

How to Cite

Alsorori, M., Sanabani, M. A.-., Hagree, S. A.-., Abdulmalik, S., AlArhabi, N. A.-H., Abdu, S., & Meqran, K. (2019). Developing Algorithm for Matching Arabic Names Entered by Mobile Phone. International Journal For Research In Advanced Computer Science And Engineering, 5(8), 01–10. https://doi.org/10.53555/cse.v5i8.1045