Developing Algorithm for Matching Arabic Names Entered by Mobile Phone
DOI:
https://doi.org/10.53555/cse.v5i8.1045
Keywords:
Arabic Language, Name Matching, Levenshtein Distance, Mobile Phone, Phone Keyboard Arabic
Abstract
Name matching plays a vital role in many applications. It is used, for example, in information retrieval and deduplication systems to compare names and to find those that refer to the same objects, persons, or companies. Because names in any application are subject to unavoidable variations and errors, and because name matching is so important, many algorithms have been developed to handle it. These algorithms account for name variations caused by spelling, pattern, or phonetic modifications. However, most existing methods were developed for English and reflect the characteristics of that language. Up to now, no specific one has been designed and implemented for the Arabic language. The purpose of this study is to present a name matching algorithm for the Arabic language. After considering all major algorithms in this area, we selected one of the basic methods for name matching and extended it to work particularly well for Arabic names. The proposed algorithms are based on the closeness and distance between Arabic characters on the mobile phone keyboard, in order to give more accurate results for Arabic names. Experiments were conducted to evaluate the proposed algorithms (LD_F, LD_S and LD_KE). The first experiment covered the proposed algorithms (LD_F, LD_S, LD_KM and LD_KE) and was based on the F-Dataset, which has 15 pairs of names. The results showed that the proposed algorithms gave more accurate results than the Levenshtein algorithm. They can therefore be used in many applications, such as Automatic Spell Correction (ASC), Search Engines (SE), Data Retrieval (DR), Computational Biology (DNA), Customer Relation Management (CRM), Customer Data Integration (CDI), Anti-Money Laundering (AML) and Criminal Investigation (CI).
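The abstract does not define LD_F, LD_S, LD_KM or LD_KE, so the following is only a minimal sketch of the general idea it describes: a Levenshtein distance in which substituting two letters that sit next to each other on the keyboard costs less than substituting distant ones. The three-row layout, the 0.5 weight, and the names `ROWS`, `sub_cost`, `keyboard_levenshtein` and `similarity` are all assumptions made for illustration; they are not taken from the paper, and a real mobile keyboard layout may differ.

```python
# Illustrative layout only: three rows of Arabic letters assumed for this sketch.
# The layout and weights used in the paper are not given in the abstract.
ROWS = [
    "ضصثقفغعهخحجد",
    "شسيبلاتنمكط",
    "ئءؤرىةوزظ",
]

# Map each letter to its (row, column) position on the assumed keyboard.
POS = {ch: (r, c) for r, row in enumerate(ROWS) for c, ch in enumerate(row)}


def sub_cost(a, b):
    """Substitution cost: 0 if equal, 0.5 if neighbouring keys, else 1."""
    if a == b:
        return 0.0
    if a in POS and b in POS:
        (r1, c1), (r2, c2) = POS[a], POS[b]
        # Neighbouring keys: at most one row and one column apart.
        if max(abs(r1 - r2), abs(c1 - c2)) == 1:
            return 0.5  # assumed weight, chosen only for the example
    return 1.0


def keyboard_levenshtein(s, t):
    """Levenshtein distance with keyboard-aware substitution costs."""
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, 1):
        curr = [float(i)]
        for j, b in enumerate(t, 1):
            curr.append(min(
                prev[j] + 1.0,               # delete a from s
                curr[j - 1] + 1.0,           # insert b into s
                prev[j - 1] + sub_cost(a, b),  # substitute a with b
            ))
        prev = curr
    return prev[-1]


def similarity(s, t):
    """Normalise the distance to a score in [0, 1]; 1 means identical."""
    longest = max(len(s), len(t))
    return 1.0 if longest == 0 else 1.0 - keyboard_levenshtein(s, t) / longest
```

Under these assumptions, a pair such as "سامي" and "شامي" differs by one substitution between neighbouring keys, so it scores as more similar than a pair that differs by a substitution between distant keys. Plain Levenshtein distance would rate both pairs the same.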
Copyright (c) 2019 International Journal For Research In Advanced Computer Science And Engineering (ISSN: 2208-2107)
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.