Vol 5 No 8 (2019): International Journal For Research In Advanced Computer Science And Engineering (ISSN: 2208-2107)
Developing Algorithm For Matching Arabic Names Entered by Mobile Phone

Muneer Alsorori
Ibb University
Maher Al-Sanabani
Thamar University
Salah AL-Hagree
Ibb University
Sarah Abdulmalik
Ibb University, Yemen
Noor Al-huda AlArhabi
Ibb University
Suad Abdu
Ibb University
Khawlah Meqran
Ibb University
Published August 29, 2019
Keywords
  • Arabic Language
  • Name Matching
  • Levenshtein Distance
  • Mobile Phone
  • Phone Keyboard Arabic
How to Cite
Alsorori, M., Al-Sanabani, M., AL-Hagree, S., Abdulmalik, S., AlArhabi, N. A., Abdu, S., & Meqran, K. (2019). Developing Algorithm For Matching Arabic Names Entered by Mobile Phone. International Journal For Research In Advanced Computer Science And Engineering (ISSN: 2208-2107), 5(8), 01-10. Retrieved from https://gnpublication.org/index.php/cse/article/view/1045

Abstract

Name matching plays a vital role in many applications. It is used, for example, in information retrieval and deduplication systems to compare names, either to match them or to find names that refer to the same object, person, or company. Because names in any application are subject to unavoidable variations and errors, many algorithms have been developed to handle name matching. These algorithms account for name variations caused by spelling, pattern, or phonetic modifications. However, most existing methods were developed for English and are tailored to the characteristics of that language. To date, no specific method has been designed and implemented for the Arabic language. The purpose of this study is to present a name matching algorithm for Arabic. After considering the major algorithms in this area, we selected one of the basic name matching methods and extended it to work particularly well for Arabic names. The proposed algorithms are based on the proximity and spacing between Arabic characters on the mobile phone keyboard, in order to give more accurate results for Arabic names. Experiments were carried out to evaluate the proposed algorithms (LD_F, LD_S and LD_KE). The first experiment covered the proposed algorithms LD_F, LD_S, LD_KM and LD_KE and was based on the F-Dataset, which contains 15 pairs of names. The results showed that the proposed algorithms gave more accurate results than the Levenshtein algorithm. They can therefore be used in many applications, such as Automatic Spell Correction (ASC), Search Engines (SE), Data Retrieval (DR), Computational Biology (DNA), Customer Relationship Management (CRM), Customer Data Integration (CDI), Anti-Money Laundering (AML) and Criminal Investigation (CI).
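
The keyboard-based idea described in the abstract can be illustrated with a minimal sketch, assuming a Levenshtein distance whose substitution cost is lowered when two characters sit on neighbouring keys. The layout in ROWS, the 0.5 cost for adjacent keys, and the names sub_cost, keyboard_levenshtein and similarity are illustrative assumptions; they are not the layout, weights or definitions of LD_F, LD_S, LD_KM or LD_KE used in the paper.

```python
# Assumed, simplified Arabic key layout (three rows, no stagger between rows).
# A real mobile keyboard may differ; this is for illustration only.
ROWS = [
    "ضصثقفغعهخحجد",
    "شسيبلاتنمكط",
    "ئءؤرىةوزظ",
]

# Map each character to its (row, column) position on the assumed layout.
KEY_POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}


def sub_cost(a, b):
    """Substitution cost: 0 if equal, 0.5 if on neighbouring keys, else 1."""
    if a == b:
        return 0.0
    pa, pb = KEY_POS.get(a), KEY_POS.get(b)
    if pa is None or pb is None:
        return 1.0  # character not on the assumed layout: plain Levenshtein cost
    # Keys are treated as neighbours when they differ by at most one row
    # and at most one column.
    if max(abs(pa[0] - pb[0]), abs(pa[1] - pb[1])) == 1:
        return 0.5  # assumed weight for a likely slip of the finger
    return 1.0


def keyboard_levenshtein(s, t):
    """Levenshtein distance with keyboard-aware substitution costs."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        curr = [float(i)]
        for j, b in enumerate(t, 1):
            curr.append(min(
                prev[j] + 1.0,               # deletion
                curr[j - 1] + 1.0,           # insertion
                prev[j - 1] + sub_cost(a, b) # substitution or match
            ))
        prev = curr
    return prev[-1]


def similarity(s, t):
    """Normalise the distance to a similarity score in [0, 1]."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - keyboard_levenshtein(s, t) / longest


if __name__ == "__main__":
    print(similarity("سالم", "شالم"))  # first letters are neighbouring keys
    print(similarity("سالم", "مالم"))  # first letters are far apart
```

Under these assumptions, a pair of names that differs by one neighbouring-key substitution scores higher than a pair that differs by a substitution between distant keys, whereas the plain Levenshtein distance would score both pairs the same.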
