Book: «Innovando la educación en la tecnología»

6. CONCLUSIONS

With data generation as a prevailing trend and organizations needing to extract information and knowledge from that data, retail is one of the sectors with the most opportunities for data mining. Association rule mining has traditionally taken the Apriori algorithm as its reference; the algorithm used in this work, FP-Growth, is presented as an alternative that addresses the shortcomings of the classic algorithm. FP-Growth abstracts the data into a tree structure and then mines that tree, thereby reducing the cost in time and computational resources.
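As an illustration of the tree-based approach described above, the following is a minimal single-machine sketch of FP-Growth in plain Python. It is not the Spark implementation used in this work; the function names, the basket data and the threshold `min_count=3` are hypothetical and chosen only for the example.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, its count, and links to parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(weighted_transactions, min_count):
    """Build an FP-tree from (items, count) pairs; return (header table, item counts)."""
    freq = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            freq[item] += count
    freq = {i: c for i, c in freq.items() if c >= min_count}

    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes holding that item
    for items, count in weighted_transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        ordered = sorted((i for i in set(items) if i in freq),
                         key=lambda i: (-freq[i], i))
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, freq


def fp_growth(weighted_transactions, min_count, suffix=frozenset()):
    """Yield (itemset, support count) for every frequent itemset."""
    header, freq = build_tree(weighted_transactions, min_count)
    for item, nodes in header.items():
        itemset = suffix | {item}
        yield itemset, freq[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        conditional = []
        for node in nodes:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, node.count))
        yield from fp_growth(conditional, min_count, itemset)


# Hypothetical market baskets, each counted once.
baskets = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
frequent = dict(fp_growth([(b, 1) for b in baskets], min_count=3))
for itemset, count in sorted(frequent.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), count)
```

Unlike Apriori, the sketch never enumerates candidate itemsets: the transactions are compressed once into the tree, and each recursive call mines only the prefix paths that end in the current item.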

This work implements FP-Growth on a dataset from a Peruvian retail company in the supermarket segment. The implementation is parallel and uses the Apache Spark framework, whose advantage is parallel processing. One limitation applies: everything was built on a single computer, which does not allow the capabilities of cluster processing to be exploited. Other options exist, such as deployment on clusters or virtual machines, as well as newer cloud-based solutions such as the Amazon service, which provides cluster environments prepared to run the Spark framework.

Finally, the needs of the business and its strategic objectives play a dominant and critical role in implementing a model such as the one proposed in this work. This analysis defines the model's operating parameters, and the results obtained must be relevant and consistent with the business information.

POSTERS

COMPARACIÓN ENTRE REGRESIÓN LOGÍSTICA Y RANDOM FOREST PARA DETERMINACIÓN DE FACTORES DE VIOLENCIA DE PAREJA EN EL PERÚ

Ashley Mercedes Guerrero-Muguerza

La violencia de pareja es una problemática social que ha sido estudiada por diferentes investigadores para determinar los factores que influyen en su ocurrencia, considerando diferentes entornos, tiempos y locaciones. En el Perú, el 68,2 % de las mujeres han sido víctimas de violencia y el 31,7 % fueron víctimas de agresión física. La presente investigación propone nueve modelos basados en regresión logística y random forest con las variantes de chi-square, entropía y Gini, y tres subescenarios de cinco, diez y veinte variables que utilizaron el dataset de denuncias registradas en el año 2016 del Ministerio de la Mujer. Se obtuvo el mejor resultado de cada subescenario, pero finalmente el mejor modelo fue el de veinte variables utilizando el feature selection random forest (entropy) y el modelo random forest (Gini).

Comparison Between Logistic Regression and Random Forest for Determining Factors of Intimate Partner Violence in Peru

Intimate partner violence is a social problem that has been studied by different researchers to determine the factors that influence its occurrence, considering different environments, times and locations. In Peru, 68.2% of women have been victims of violence and 31.7% have been victims of physical aggression. The present research proposes nine models based on logistic regression and random forest with the chi-square, entropy and Gini variants, and three subscenarios of five, ten and twenty variables that used the dataset of complaints registered in 2016 at the Ministry of Women. The best result of each subscenario was obtained, but ultimately the best model was the twenty-variable one that used random forest feature selection (entropy) and the random forest model (Gini).
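For readers unfamiliar with the two impurity criteria named in this abstract, the sketch below computes Gini impurity and entropy for a list of class labels. It only illustrates the standard definitions; the labels are hypothetical and the code is not taken from the poster.

```python
import math
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits of the class distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


# Hypothetical node with two reported and two unreported cases.
node = ["reported", "reported", "unreported", "unreported"]
print(gini(node), entropy(node))
```

Both measures are zero for a node that contains a single class and reach their maximum when the classes are evenly mixed, which is why either can drive the splits of a random forest or rank variables for feature selection.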

BLOCKCHAIN Y SMART CONTRACT PARA LA TRAZABILIDAD DE LAS DONACIONES

Rossy Espinoza / Carlos Ugaz

En la actualidad, una organización que administra donaciones tiene dificultades para controlar el flujo de ingreso y salida de donativos, dificultando la trazabilidad de estos. En esta investigación se propone un sistema basado en blockchain que garantice la trazabilidad de los donativos desde su origen hasta su destino, utilizando un protocolo de consenso proof of work y un contrato inteligente (smart contract) con funciones específicas.

Blockchain and Smart Contract for Traceability of Donations

At present, an organization that manages donations has difficulty controlling the inflow and outflow of donated goods, which hinders their traceability. This research proposes a blockchain-based system that guarantees the traceability of donations from their origin to their destination, using a proof-of-work consensus protocol and a smart contract with specific functions.
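The idea of proof of work and hash-linked blocks can be sketched in a few lines. The block fields, the donation strings and the difficulty of three leading zero hex digits below are hypothetical choices made for this illustration; the abstract does not describe the actual block layout or the smart contract functions.

```python
import hashlib
import json


def block_hash(block):
    """SHA-256 digest of the block's canonical JSON form."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def mine(index, donation, prev_hash, difficulty=3):
    """Proof of work: search for a nonce whose block hash starts with `difficulty` zeros."""
    nonce = 0
    while True:
        block = {"index": index, "donation": donation,
                 "prev_hash": prev_hash, "nonce": nonce}
        if block_hash(block).startswith("0" * difficulty):
            return block
        nonce += 1


def build_chain(donations, difficulty=3):
    """Mine one block per donation record, each linked to the previous block's hash."""
    chain, prev = [], "0" * 64
    for index, donation in enumerate(donations):
        block = mine(index, donation, prev, difficulty)
        chain.append(block)
        prev = block_hash(block)
    return chain


def is_valid(chain, difficulty=3):
    """Check every hash link and every proof of work."""
    prev = "0" * 64
    for block in chain:
        digest = block_hash(block)
        if block["prev_hash"] != prev or not digest.startswith("0" * difficulty):
            return False
        prev = digest
    return True


chain = build_chain(["rice 10 kg from donor A", "rice 10 kg delivered to shelter B"])
print(is_valid(chain))
```

Because each block stores the hash of its predecessor, editing an earlier donation record changes that block's hash and breaks the link, which is the property that makes the donation trail traceable.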

COMPARACIÓN DE MÉTODOS PARA CLASIFICAR COMENTARIOS DE LUGARES TURÍSTICOS POR MEDIO DE ANÁLISIS DE SENTIMIENTO

Luis Guillermo Herrera-Sarmiento

Hoy en día los turistas, luego de visitar algún destino, plasman sus experiencias como opiniones en diversas fuentes digitales. Esta información es valiosa para las empresas turísticas o relacionadas, que pueden identificar qué sitios representan una oportunidad de mejora, y para los turistas durante la planificación de sus viajes. En esta investigación se propone la comparación de Support Vector Machine (SVM), Naïve Bayes (NB) y un método propuesto basado en SVM y chi-square como método de selección de características. La técnica híbrida propuesta obtuvo el mejor resultado, seguida de SVM y por último Naïve Bayes, con 76,50 %, 67,53 % y 66,91 % de precisión, respectivamente.

Comparison of Methods for Classifying Comments on Tourist Places by Sentiment Analysis

Nowadays, tourists express their experiences as opinions in various digital sources after visiting a destination. This is valuable information for tourism companies and related businesses, which can identify the places that offer an opportunity for improvement, and for tourists when planning their trips. This research compares the support vector machine (SVM), naïve Bayes (NB), and a proposed method based on SVM with chi-square as the feature selection method. The proposed hybrid technique obtained the best result, followed by SVM and finally naïve Bayes, with 76.50%, 67.53% and 66.91% accuracy, respectively.
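To make the feature selection step more concrete, the sketch below scores each term with the standard chi-square statistic for a 2x2 term/class contingency table and ranks the vocabulary by that score. The tiny review corpus and the function names are hypothetical; the poster's preprocessing and the number of features it kept are not stated in the abstract.

```python
def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 table.

    n11: term present, positive class    n10: term present, other class
    n01: term absent, positive class     n00: term absent, other class
    """
    n = n11 + n10 + n01 + n00
    denominator = (n11 + n01) * (n11 + n10) * (n10 + n00) * (n01 + n00)
    if denominator == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denominator


def score_terms(docs, labels, positive):
    """Return {term: chi-square score} for documents given as sets of terms."""
    scores = {}
    for term in set().union(*docs):
        n11 = sum(1 for d, l in zip(docs, labels) if term in d and l == positive)
        n10 = sum(1 for d, l in zip(docs, labels) if term in d and l != positive)
        n01 = sum(1 for d, l in zip(docs, labels) if term not in d and l == positive)
        n00 = sum(1 for d, l in zip(docs, labels) if term not in d and l != positive)
        scores[term] = chi_square(n11, n10, n01, n00)
    return scores


# Hypothetical tokenized reviews and their sentiment labels.
docs = [{"great", "beach"}, {"great", "food"}, {"dirty", "beach"}, {"dirty", "food"}]
labels = ["pos", "pos", "neg", "neg"]
scores = score_terms(docs, labels, positive="pos")
ranking = sorted(scores, key=lambda t: (-scores[t], t))
print(ranking)
```

Terms that occur equally often in both classes score zero and would be dropped, so only the highest-ranked terms would be passed to the SVM as features.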

SISTEMA DE MONITOREO DE CALIDAD DE AGUA EN TANQUES ELEVADOS USANDO SENSORES IOT

Alejandro Javier Ortiz-Bazalar

A diferencia de otros recursos, el agua no puede ser substituida. Todos los seres vivos del planeta Tierra sobreviven gracias al agua, es así que asegurar la calidad del agua usando controles proactivos es urgente y necesario. Este trabajo muestra una nueva alternativa para el monitoreo de agua potable, aprovechando nuevas tecnologías como: el internet de las cosas y la computación en la nube.

Water Quality Monitoring System in Elevated Tanks Using IoT Sensors

Unlike other resources, water cannot be substituted. All living beings on Earth survive thanks to water, so ensuring water quality through proactive controls is urgent and necessary. This work presents a new alternative for monitoring drinking water that takes advantage of new technologies such as the Internet of Things and cloud computing.

ANÁLISIS COMPARATIVO DE TÉCNICAS DE ESTEGANÁLISIS EN IMÁGENES DIGITALES LSB

Luis Sifuentes-Villarroel

En la presente investigación se evaluaron dos métodos de esteganálisis frente a imágenes LSB, comparando su efectividad en la detección de imágenes esteganográficas y del porcentaje que abarca el mensaje oculto en la imagen portadora (porcentaje de embebido) respecto a la totalidad de la imagen.

Comparative Analysis of Steganalysis Techniques in LSB Digital Images

In the present research, two methods of steganalysis were evaluated against LSB images, comparing their effectiveness in detecting steganographic images and the percentage covered by the hidden message in the carrier image (percentage of embedding) with respect to the entire image.
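For context on what the steganalysis methods are trying to detect, the sketch below shows plain LSB embedding and extraction over a sequence of pixel bytes, together with the embedding percentage. It illustrates the embedding side only; the two steganalysis methods evaluated are not named in the abstract, and the sequential bit placement used here is an assumption made for the example.

```python
def embed(pixels, message):
    """Hide `message` in the least significant bit of consecutive pixel bytes."""
    bits = [(byte >> shift) & 1 for byte in message for shift in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("message does not fit in the cover image")
    stego = bytearray(pixels)
    for position, bit in enumerate(bits):
        stego[position] = (stego[position] & 0xFE) | bit
    return bytes(stego)


def extract(pixels, n_bytes):
    """Read back `n_bytes` bytes from the least significant bits."""
    message = bytearray()
    for index in range(n_bytes):
        value = 0
        for offset in range(8):
            value = (value << 1) | (pixels[8 * index + offset] & 1)
        message.append(value)
    return bytes(message)


def embedding_rate(message, pixels):
    """Fraction of pixel bytes whose LSB carries a message bit."""
    return 8 * len(message) / len(pixels)


cover = bytes(range(256))          # hypothetical 256-byte grayscale cover image
stego = embed(cover, b"hi")
print(extract(stego, 2), embedding_rate(b"hi", cover))
```

Each carrier byte changes by at most one intensity level, so the change is invisible to the eye and has to be detected statistically, and a lower embedding percentage leaves fewer altered bytes to detect.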

SISTEMA DE INCREMENTO DE VOCABULARIO PARA LA MEJORA DE LA COMPRENSIÓN LECTORA EN PRIMARIA CON AYUDA DE REALIDAD AUMENTADA

Guianella Tania Urday-Ibarra

El Perú presenta resultados insatisfactorios en las evaluaciones internacionales y nacionales en el área de lectura. En los resultados del Programa para la Evaluación Internacional de Alumnos (PISA) del año 2015, el Perú obtuvo 398 puntos, casi 100 puntos por debajo de la media. El país se posicionó en el puesto 64 de los 72 participantes. En la evaluación nacional ECE, a medida que los grados avanzan, el porcentaje de cumplimiento de objetivos académicos va decayendo: es de 46,4 % y 31,4 % en segundo y cuarto grado de primaria, respectivamente, y de 14,3 % en segundo de secundaria. El factor por atacar dentro del problema de la baja comprensión lectora es el vocabulario reducido.

A System to Improve Vocabulary for Enhancing Reading Comprehension in Primary School with the Help of Augmented Reality

Peru has obtained unsatisfactory results in international and national reading assessments. In the 2015 Programme for International Student Assessment (PISA), Peru obtained 398 points, almost 100 points below the average. The country ranked 64th out of 72 participants. In the national ECE assessment, the higher the school grade, the lower the percentage of academic goals achieved: 46.4% and 31.4% in the 2nd and 4th grades of primary school, respectively, and 14.3% in the 2nd grade of secondary school. The factor to address within the problem of low reading comprehension is limited vocabulary.

APLICACIÓN EN REALIDAD VIRTUAL PARA CONCIENTIZACIÓN DEL PELIGRO DE SOBREEXPOSICIÓN A LA RADIACIÓN UV

Edgard Javier Hernán Vargas-Solís

La presente investigación busca concientizar a la población acerca de la elevada radiación UV registrada en nuestro país, a partir del desarrollo de una aplicación en realidad virtual. Este estudio evaluará el desempeño de dicha aplicación a través de un cuestionario.

A Virtual Reality Application to Raise Awareness on the Danger of Overexposure to UV Radiation

This research seeks to raise public awareness of the high UV radiation levels recorded in our country through the development of a virtual reality application. The study will evaluate the performance of the application through a questionnaire.

IDENTIFICACIÓN DE MICROEXPRESIONES FACIALES DURANTE EL PROCESO DE SELECCIÓN DE PERSONAL

Luis Villanueva

¿Cómo saber si la información brindada por un postulante durante un proceso de selección de personal es verídica? Paul Ekman, psicólogo pionero en el estudio de las expresiones faciales, vincula las mentiras con las microexpresiones, un tipo de gesto facial sutil, corto y que representa emociones que una persona intenta reprimir. Esta investigación propone el uso de Local Binary Patterns y Support Vector Machines para detectar microexpresiones y clasificarlas a través de la implementación de un sistema que fue probado en la base de datos SAMM y en sujetos de prueba.

Identifying Facial Microexpressions During a Personnel Selection Process

How can one know whether the information provided by an applicant during a personnel selection process is true? Paul Ekman, a pioneering psychologist in the study of facial expressions, links lies with microexpressions, a type of subtle, brief facial gesture that represents emotions a person tries to repress. This research proposes the use of local binary patterns and support vector machines to detect and classify microexpressions, through the implementation of a system that was tested on the SAMM database and on test subjects.
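The local binary pattern descriptor mentioned above can be sketched as follows. This is the basic 8-neighbor, radius-1 variant with a clockwise bit order chosen for the example; the poster does not state which LBP variant or parameters its system uses, and the small grayscale image is hypothetical.

```python
def lbp_code(patch):
    """LBP code of a 3x3 patch: one bit per neighbor that is >= the center pixel."""
    center = patch[1][1]
    # Neighbors read clockwise, starting at the top-left corner.
    neighbors = [patch[0][0], patch[0][1], patch[0][2], patch[1][2],
                 patch[2][2], patch[2][1], patch[2][0], patch[1][0]]
    code = 0
    for bit, value in enumerate(neighbors):
        if value >= center:
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels of a 2-D list."""
    histogram = [0] * 256
    for row in range(1, len(image) - 1):
        for col in range(1, len(image[0]) - 1):
            patch = [line[col - 1:col + 2] for line in image[row - 1:row + 2]]
            histogram[lbp_code(patch)] += 1
    return histogram


# Hypothetical 3x3 grayscale region of a face image.
region = [[9, 1, 9],
          [1, 5, 1],
          [9, 1, 9]]
print(lbp_code(region), sum(lbp_histogram(region)))
```

In a pipeline like the one described, a histogram of this kind computed per face region would be the feature vector handed to the support vector machine for classification.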

HONEYCLOUD - COMBINING RESEARCH AND TEACHING IN A PROJECT FOR THE DIGITALIZATION OF BEEKEEPING

Alexander Hilgarth / Michael Dorin / Sergio Montenegro

The HONEYCLOUD project is a research project of the University of Würzburg in the field of bioeconomics. Within the scope of technology transfer, tools and methods that were originally developed for aerospace applications are now to be made available for precision agriculture. In particular, the field of precision beekeeping was selected for this purpose. An IT infrastructure to support the honeybees and the work of beekeeping is to be developed. The aim is to ensure that the electronics remain unobtrusive and that the beehive does not become a switch cabinet. This results in an experimental test setup that can be used in the field of teaching Internet-of-Things (IoT) systems. In this way, hardware and software components are created in the project and used directly in teaching.

HONEYCLOUD - Combinando investigación y docencia en un proyecto para la digitalización de la apicultura

El proyecto HONEYCLOUD es un proyecto de investigación de la Universidad de Würzburg en el campo de la bioeconomía. Dentro del alcance de una transferencia de tecnología, las herramientas y métodos que se desarrollaron originalmente para aplicaciones aeroespaciales ahora estarán disponibles para la agricultura de precisión. En particular, se seleccionó el campo de la apicultura de precisión para este propósito. Se debe desarrollar una infraestructura de TI para apoyar a las abejas y el trabajo de la apicultura. El objetivo es asegurar que la electrónica permanezca discreta y que la colmena no se convierta en un armario de distribución. Esto da como resultado una configuración de prueba experimental que se puede utilizar en el campo de la enseñanza de sistemas de Internet de las cosas (IoT). De esta forma, se crean componentes de hardware y software en el proyecto que se utilizan directamente en la enseñanza.

IMPLEMENTING MACHINE LEARNING ALGORITHMS TO PREDICT DONOR STATUS: PRELIMINARY WORK WITH DATA FROM AN INSTITUTION OF HIGHER LEARNING

Cecilia Coulter / Paula Baingana / Pascaline Mukakamari

Identifying potential donors allows institutions of higher learning to conduct more effective fundraising campaigns. Machine learning classification algorithms can be useful in building models to predict donor status. However, when data contains imbalanced classes, like the data we used for this project, models tend to over-index the majority class, which was non-donors in this case. These results have significant implications for institutions in that they may not pursue entities that may, in fact, become donors. In order to improve the usefulness of our model, we used a resampling technique called random undersampling (RUS) to balance the data and also the area under the receiver operating characteristic curve (AUC-ROC) metric to evaluate the performance. Our final model improved its predictive power from 67% to 76%. Institutions of higher learning can use this machine learning model to more efficiently target the pool of potential donors, saving money and time. Future research will focus on improving the predictive accuracy of our model by exploring other data manipulation techniques that minimize the effect of imbalanced data, changing thresholds for classification algorithms, and using genetic programming and feature engineering.

Implementación de algoritmos de aprendizaje automático para predecir el estado del donante

La identificación de posibles donantes permite a las instituciones de educación superior realizar campañas de recaudación de fondos más efectivas. Los algoritmos de clasificación de aprendizaje automático pueden ser útiles en la construcción de modelos para predecir el estado del donante. Sin embargo, cuando los datos contienen clases desequilibradas, como los datos que utilizamos para este proyecto, los modelos tienden a favorecer en exceso la clase mayoritaria, que en este caso eran los no donantes. Estos resultados tienen implicaciones significativas para las instituciones, ya que estas pueden dejar de contactar a entidades que, de hecho, podrían convertirse en donantes. Para mejorar la utilidad de nuestro modelo, utilizamos una técnica de remuestreo llamada random undersampling (RUS) para equilibrar los datos y la métrica del área bajo la curva ROC (AUC-ROC) para evaluar el rendimiento. Nuestro modelo final mejoró su poder predictivo del 67 % al 76 %. Las instituciones de educación superior pueden usar este modelo de aprendizaje automático para dirigirse de manera más eficiente al grupo de donantes potenciales, ahorrando dinero y tiempo. La investigación futura se centrará en mejorar la precisión predictiva de nuestro modelo mediante la exploración de otras técnicas de manipulación de datos que minimicen el efecto de los datos desequilibrados, el cambio de los umbrales de los algoritmos de clasificación y el uso de la programación genética y la ingeniería de características.
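The two techniques named in this abstract, random undersampling and the AUC-ROC metric, can be sketched in plain Python as below. The records, scores and seed are hypothetical; the authors' actual tooling and data are not described in the abstract.

```python
import random


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows so every class keeps as many rows as the smallest class."""
    rng = random.Random(seed)
    by_class = {}
    for row, label in zip(rows, labels):
        by_class.setdefault(label, []).append(row)
    size = min(len(members) for members in by_class.values())
    balanced = [(row, label)
                for label, members in by_class.items()
                for row in rng.sample(members, size)]
    rng.shuffle(balanced)
    return balanced


def auc_roc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative."""
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical imbalanced data: 8 non-donors (0) and 2 donors (1).
rows = list(range(10))
labels = [0] * 8 + [1] * 2
balanced = random_undersample(rows, labels)
print(len(balanced), auc_roc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))
```

Undersampling is applied to the training data only, while AUC-ROC is used instead of plain accuracy because a model that always predicts the majority class can still look accurate on imbalanced data.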

OPEN-SOURCE SOFTWARE & PERSONAL MEDICAL DEVICES: INTERPRETING RISK THROUGH AN EVALUATION OF SOFTWARE TESTING

Michael Dorin / Heather Mortensen / Sergio Montenegro

The availability of powerful low-cost hardware and advanced software tools has made open-source medical devices possible. Deciding to use an open-source medical device may require accepting some risk. Fully comprehending the risk level is essential, since failure of the software or the medical device is dangerous. As many medical applications contain complicated code, an excellent method for understanding software readiness is to evaluate how much testing has been completed on the software. As a case study, this project evaluates the level of testing performed on software components of the Loop Artificial Pancreas system.

Classic methods for evaluating complicated source code are used to demonstrate how much testing is needed in a project. Our analysis shows that the Loop Artificial Pancreas system (master branch) has been thoroughly tested, with most of the faults likely discovered. By using classic software engineering metrics and techniques, it is possible to gauge how completely an open-source medical product has been tested and make an educated decision about the risk associated with using it.

Software de código abierto y dispositivos médicos personales: interpretación del riesgo a través de una evaluación de las pruebas de software

La disponibilidad de hardware potente de bajo costo y de herramientas de software avanzadas ha hecho posibles los dispositivos médicos de código abierto. La decisión de utilizar un dispositivo médico de código abierto puede requerir la aceptación de algún riesgo. Comprender completamente el nivel de riesgo es esencial, ya que la falla del software o del dispositivo médico es peligrosa. Como muchas aplicaciones médicas contienen código complicado, un método excelente para comprender el grado de preparación del software es evaluar la cantidad de pruebas que se han completado en él. Como estudio de caso, este proyecto evalúa el nivel de pruebas realizadas en los componentes de software del sistema Loop Artificial Pancreas.

Los métodos clásicos para evaluar código fuente complicado se utilizan para demostrar cuántas pruebas se necesitan en un proyecto. Nuestro análisis muestra que el sistema Loop Artificial Pancreas (rama master) se ha probado exhaustivamente y que la mayoría de las fallas probablemente ya se han descubierto. Mediante el uso de métricas y técnicas clásicas de ingeniería de software, es posible medir cuán completamente se ha probado un producto médico de código abierto y tomar una decisión informada sobre el riesgo asociado con su uso.

EMOTION TWENTY QUESTIONS IN CHINESE

Shanshan Kong / Abe Kazemzadeh

Our study introduces the emotion twenty questions (EMO20Q) game, an experiment into the cognition and expression of emotions in ordinary people who speak Chinese. The preliminary results show that such a game is felicitous and that the questions generated to describe emotions have commonalities with earlier studies conducted in English.

Veinte preguntas sobre emociones en chino

Nuestro estudio presenta el juego Emotion Twenty Questions (EMO20Q), un experimento sobre la cognición y la expresión de las emociones en personas comunes que hablan chino. Los resultados preliminares muestran que el juego es acertado y que las preguntas generadas para describir las emociones tienen puntos en común con estudios anteriores realizados en inglés.

INTEGRATING AND ACCESSING UNIVERSITY INFORMATION APIS USING NATURAL LANGUAGE PROCESSING TOOLS

Shantanu Hadap / Shubha Shubha

Attending or working at a university requires students and faculty to manage a large amount of information to be successful. Both school administration and academic challenges have requirements to be met. One solution to this problem is to provide a virtual assistant that gives students and faculty the information they require. In this work, a virtual assistant that can understand conversational English and provide the needed information has been created. Talking to a virtual assistant to get information is more convenient than the conventional way of obtaining it. Getting information from the virtual assistant does not require typing, browsing or any type of human intervention, which makes it more time-efficient and accessible. This solution has been tried in a private environment with a small scope at the University of St. Thomas, and the results have been encouraging. Sixty percent (60%) of the users believe the assistant is very useful and the remainder find it moderately useful.

Integración y acceso a API de información universitaria utilizando herramientas de procesamiento de lenguaje natural

Asistir o trabajar en una universidad requiere que los estudiantes y el profesorado administren una gran cantidad de información para tener éxito. Tanto la administración escolar como los desafíos académicos tienen requisitos que cumplir. Una solución a este problema es proporcionar un asistente virtual para estudiantes y profesores que brinde la información requerida. En este trabajo se crea un asistente virtual que puede comprender el inglés conversacional y proporcionar la información necesaria. Hablar con un asistente virtual para obtener la información es más conveniente que la forma convencional de hacerlo. Obtener información del asistente virtual no requiere escribir, navegar ni ningún tipo de intervención humana, lo que lo hace más eficiente y accesible. Esta solución se ha probado en un entorno privado con un pequeño alcance en la Universidad de St. Thomas y los resultados han sido alentadores. El 60 % de los usuarios cree que el asistente es muy útil y el resto lo encuentra moderadamente útil.

P’AWAQ YUPANA – NEOÁBACO DE LÓGICA HÍBRIDA

Carlos Saldívar (Dhavit Prem)

El P’awaq Yupana es una herramienta lúdico-pedagógica desarrollada para facilitar la abstracción de la operación en base binaria en el uso de la yupana inca mediante el método Tawa Pukllay. Sirve, además, para potenciar el pensamiento computacional, permitiendo al usuario encontrar soluciones aritméticas mediante una lógica híbrida de tres tipos de razonamiento aritmético: el andino (reconocimiento de patrones), el indo-arábigo (cálculo mental numérico) y el chino-japonés (manejo de cuentas en herramientas de conteo “ábaco” como apoyo al cálculo mental).

P’awaq Yupana: A New Hybrid Logic Abacus

P’awaq Yupana is a playful-pedagogical tool developed to facilitate the abstraction of a binary operation performed with the Inka Yupana through the Tawa Pukllay method. It is also useful to enhance computational thinking, enabling the user to find arithmetic solutions through the hybrid logic of three types of arithmetic reasoning: the Andean (pattern recognition), the Indo-Arabic (numerical mental calculation) and the Chinese-Japanese (handling of beads in a counting tool named “abacus” as a support for mental calculation).

G SUITE PARA LA EDUCACIÓN EN LA UNIVERSIDAD PÚBLICA DE LA REGIÓN CUSCO DEL PERÚ

Hernán Nina-Hanco

Sobre la base de un estudio longitudinal, entre los años 2015 y 2018, el conjunto de aplicaciones de G Suite for Education tiende a incrementar su uso y se consolida como el principal soporte digital de la comunicación interna en la universidad pública de la Región Cusco, del Perú.

G Suite for Education at Public Universities of the Cusco Region in Peru

Based on a longitudinal study conducted between 2015 and 2018, the G Suite for Education applications are increasingly being used and are consolidating their position as the main digital support for internal communication at public universities of the Cusco Region in Peru.