## Massimo CAFARO

### Professore II Fascia (Associato)

Settore Scientifico Disciplinare ING-INF/05: SISTEMI DI ELABORAZIONE DELLE INFORMAZIONI.

#### Dipartimento di Ingegneria dell'Innovazione

Centro Ecotekne Pal. O - S.P. 6, Lecce - Monteroni - LECCE (LE)

Ufficio, Piano terra

Telefono +39 0832 29 7371

Direttore del Master in Applied Data Science.

Responsabile scientifico del Laboratorio HPC

Afferente al Dipartimento di Ingegneria dell'Innovazione.

Calcolo parallelo e distribuito, grid/cloud computing, security, cryptography, data mining, machine learning, deep learning

#### Dipartimento di Ingegneria dell'Innovazione

Centro Ecotekne Pal. O - S.P. 6, Lecce - Monteroni - LECCE (LE)

Ufficio, Piano terra

Telefono +39 0832 29 7371

Direttore del Master in Applied Data Science.

Responsabile scientifico del Laboratorio HPC

Afferente al Dipartimento di Ingegneria dell'Innovazione.

Calcolo parallelo e distribuito, grid/cloud computing, security, cryptography, data mining, machine learning, deep learning

### Curriculum Vitae

**Massimo Cafaro** è **professore associato** presso il Dipartimento di Ingegneria dell'Innovazione dell'Università del Salento, nel settore scientifico disciplinare ING-INF/05. Il suo ambito di ricerca riguarda Parallel e Distributed Computing, Grid e Cloud Computing, Data Mining, Machine Learning, Deep Learning, Big Data e Security. È **direttore del Master in Applied Data Science** presso l'Università del Salento. È **Senior Member IEEE ed IEEE Computer Society,** **Senior Member ACM**, **Associate Editor per il journal IEEE Access e moderatore per IEEE TechRxiv preprint repository **("Computing and Processing" category). Laureato in Scienze dell'Informazione presso l’Università di Salerno, ha conseguito il dottorato in Informatica presso l'Università di Bari. È inoltre autore di oltre 110 pubblicazioni internazionali in refereed journals, proceedings e book chapters e co-autore del **brevetto** “Metodo e formalismo per inviare istruzioni a DataBase distribuiti realizzato mediante programma per computer”. È Visiting Professor presso il British Institute of Technology and E-Commerce, London, UK. È invited lecturer presso numerose università ed enti di ricerca, ed è membro dei Technical Committees IEEE on Parallel Processing, Distributed Processing, Scalable Computing e Services Computing. Inoltre, è **Vice Chair of Regional Centers e coordinatore della Technical Area on Data Intensive Computing nell’ambito di IEEE Technical Committee on Scalable Computing**. Collabora attivamente con l'Istituto Nazionale di Geofisica e Vulcanologia (INGV) ed il Centro Euro-Mediterraneano per i Cambiamenti Climatici (CMCC) occupandosi di aspetti di ricerca teorici e pratici prestando particolare attenzione al design, analisi ed implementazione di algoritmi sequenziali, paralleli e distribuiti. L'attività scientifica include infine l'organizzazione di workshop internazionali e l'insegnamento in università italiane e straniere, ed in scuole di dottorato internazionali.

Ha conseguito l’**Abilitazione Scientifica Nazionale alle funzioni di professore universitario di prima fascia (Ordinario) nel Settore Concorsuale 09/H1** - SISTEMI DI ELABORAZIONE DELLE INFORMAZIONI.

Ha conseguito l’**Abilitazione Scientifica Nazionale alle funzioni di professore universitario di prima fascia (Ordinario) nel Settore Concorsuale 01/B1** - INFORMATICA.

**Responsabile scientifico** del laboratorio di ricerca HPC (High Performance Computing).

**Responsabile scientifico** del laboratorio di ricerca CAMPI.

**Membro del Collegio dei Docenti del Corso di Dottorato di Ricerca in Ingegneria dei Sistemi Complessi **presso l'Università del Salento.

### Didattica

#### A.A. 2022/2023

Degree course MATEMATICA

Course type Laurea Magistrale

Language INGLESE

Credits 6.0

Teaching hours Ore totali di attività frontale: 42.0

Year taught 2022/2023

For matriculated on 2022/2023

Course year 1

Structure DIPARTIMENTO DI MATEMATICA E FISICA "ENNIO DE GIORGI"

Subject matter MATEMATICA PER L'INTELLIGENZA ARTIFICIALE

Location Lecce

##### DATA MINING

Degree course MATEMATICA

Course type Laurea Magistrale

Language INGLESE

Credits 6.0

Teaching hours Ore totali di attività frontale: 42.0

Year taught 2022/2023

For matriculated on 2022/2023

Course year 1

Structure DIPARTIMENTO DI MATEMATICA E FISICA "ENNIO DE GIORGI"

Subject matter TEORICO-MODELLISTICO

Location Lecce

##### DATA MINING & MACHINE LEARNING

Degree course COMPUTER ENGINEERING

Course type Laurea Magistrale

Language INGLESE

Credits 9.0

Teaching hours Ore totali di attività frontale: 81.0

Year taught 2022/2023

For matriculated on 2021/2022

Course year 2

Structure DIPARTIMENTO DI INGEGNERIA DELL'INNOVAZIONE

Subject matter ARTIFICIAL INTELLIGENCE

Location Lecce

##### PARALLEL ALGORITHMS

Degree course COMPUTER ENGINEERING

Course type Laurea Magistrale

Language INGLESE

Credits 9.0

Teaching hours Ore totali di attività frontale: 81.0

Year taught 2022/2023

For matriculated on 2021/2022

Course year 2

Structure DIPARTIMENTO DI INGEGNERIA DELL'INNOVAZIONE

Subject matter PERCORSO COMUNE

Location Lecce

#### A.A. 2021/2022

##### DATA MINING

Degree course MATEMATICA

Course type Laurea Magistrale

Language INGLESE

Credits 6.0

Teaching hours Ore totali di attività frontale: 42.0

Year taught 2021/2022

For matriculated on 2021/2022

Course year 1

Structure DIPARTIMENTO DI MATEMATICA E FISICA "ENNIO DE GIORGI"

Subject matter PERCORSO COMUNE

Location Lecce

##### PARALLEL ALGORITHMS

Degree course COMPUTER ENGINEERING

Course type Laurea Magistrale

Language INGLESE

Credits 9.0

Teaching hours Ore totali di attività frontale: 81.0

Year taught 2021/2022

For matriculated on 2020/2021

Course year 2

Structure DIPARTIMENTO DI INGEGNERIA DELL'INNOVAZIONE

Subject matter PERCORSO COMUNE

Location Lecce

#### A.A. 2020/2021

##### DATA MINING

Degree course MATEMATICA

Course type Laurea Magistrale

Language INGLESE

Credits 6.0

Teaching hours Ore totali di attività frontale: 42.0

Year taught 2020/2021

For matriculated on 2020/2021

Course year 1

Structure DIPARTIMENTO DI MATEMATICA E FISICA "ENNIO DE GIORGI"

Subject matter PERCORSO COMUNE

##### PARALLEL ALGORITHMS

Degree course COMPUTER ENGINEERING

Course type Laurea Magistrale

Language INGLESE

Credits 9.0

Teaching hours Ore totali di attività frontale: 81.0

Year taught 2020/2021

For matriculated on 2019/2020

Course year 2

Structure DIPARTIMENTO DI INGEGNERIA DELL'INNOVAZIONE

Subject matter PERCORSO COMUNE

Location Lecce

#### A.A. 2019/2020

##### DATA MINING

Degree course MATEMATICA

Course type Laurea Magistrale

Language INGLESE

Credits 6.0

Teaching hours Ore totali di attività frontale: 42.0

Year taught 2019/2020

For matriculated on 2019/2020

Course year 1

Structure DIPARTIMENTO DI MATEMATICA E FISICA "ENNIO DE GIORGI"

Subject matter PERCORSO COMUNE

Location Lecce

##### PARALLEL ALGORITHMS

Degree course COMPUTER ENGINEERING

Course type Laurea Magistrale

Language INGLESE

Credits 9.0

Teaching hours Ore totali di attività frontale: 81.0

Year taught 2019/2020

For matriculated on 2018/2019

Course year 2

Structure DIPARTIMENTO DI INGEGNERIA DELL'INNOVAZIONE

Subject matter PERCORSO COMUNE

Location Lecce

#### A.A. 2018/2019

##### DATA MINING

Degree course MATEMATICA

Course type Laurea Magistrale

Language INGLESE

Credits 6.0

Teaching hours Ore totali di attività frontale: 42.0

Year taught 2018/2019

For matriculated on 2018/2019

Course year 1

Structure DIPARTIMENTO DI MATEMATICA E FISICA "ENNIO DE GIORGI"

Subject matter PERCORSO COMUNE

Location Lecce

##### ELEMENTI DI INFORMATICA

Corso di laurea INGEGNERIA DELLE TECNOLOGIE INDUSTRIALI

Tipo corso di studio Laurea

Lingua ITALIANO

Crediti 6.0

Ripartizione oraria Ore totali di attività frontale: 54.0

Anno accademico di erogazione 2018/2019

Per immatricolati nel 2018/2019

Anno di corso 1

Struttura DIPARTIMENTO DI INGEGNERIA DELL'INNOVAZIONE

Percorso unico

Sede Lecce

Degree course MATEMATICA

Subject area ING-INF/05

Course type Laurea Magistrale

Credits 6.0

Teaching hours Ore totali di attività frontale: 42.0

For matriculated on 2022/2023

Year taught 2022/2023

Course year 1

Semestre Secondo Semestre (dal 27/02/2023 al 09/06/2023)

Language INGLESE

Subject matter MATEMATICA PER L'INTELLIGENZA ARTIFICIALE (A227)

Location Lecce

Calculus. Probability theory. Linear Algebra. Programming skills.

The course provides a modern introduction to data mining, which spans techniques, algorithms and methodologies for discovering structure, patterns and relationships in data sets (typically, large ones) and making predictions. Applications of data mining are already happening all around us, and, when they are done well, sometimes they even go unnoticed. For instance, how does the Google web search work? How does Shazam recognize a song? How does Netflix recommend movies to its users? The principles of data mining provide answers to these and others questions. Data mining overlaps the fields of computer science, statistical machine learning and data bases. The course aims at providing the students with the knowldedge required to explore, analyze and leverage available data in order to turn the data into valuable and actionable information for a company, for instance, in order to facilitate a decision-making process.

**Knowledge and understanding**. The course describes methods and models for the analysis of large amounts of data. Students must have a solid background with a broad spectrum of basic knowledge related to data mining:

- the students must have the basic cognitive tools to think analytically, creatively, critically and in an inquiring way, and have the abstraction and problem-solving skills needed to cope with complex systems;
- they must have solid knowledge of data mining models and methodologies;
- they must be able to work on large data collections, including heterogeneous and produced at high speed data, in order to integrate them - in particular by knowing how to manage their origin and quality - and to carry out in-depth thematic analyses, drawing on this knowledge to improve the decision-making process.

**Applying knowledge and understanding. **After the course the student should be able to:

- describe and use the main data mining techniques;
- understand the differences among several algorithms solving the same problem and recognize which one is better under different conditions;
- tackle new data mining problems by selecting the appropriate methods and justifying his/her choices;
- tackle new data mining problems by designing suitable algorithms and evaluating the results;
- explain experimental results to people outside of statistical machine learning or computer science.

**Making judgements**. Students must have the ability to process complex and/or fragmentary data and must arrive at original and autonomous ideas and judgments, and consistent choices in the context of their work, which are particularly delicate in the profession of data scientist. The course promotes the development of independent judgment in the appropriate choice of technique/model for data processing and the critical ability to interpret the goodness of the results of the models/methods applied to the datasets under examination.

**Communication.** It is essential that students are able to communicate with a varied and composite audience, not culturally homogeneous, in a clear, logical and effective way, using the methodological tools acquired and their scientific knowledge and, in particular, the specialty vocabulary. Students should be able to organize effective dissemination and study material through the most common presentation tools, including computer-based ones, to communicate the results of data analysis processes, for example by using visualization and reporting tools aimed at different types of audiences.

**Learning skills. **Students must acquire the critical ability to relate, with originality and autonomy, to the typical problems of data mining and, in general, cultural issues related to other similar areas. They should be able to develop and apply independently the knowledge and methods learnt with a view to possible continuation of studies at higher (doctoral) level or in the broader perspective of cultural and professional self-improvement of lifelong learning. Therefore, students should be able to switch to exhibition forms other than the source texts in order to memorize, summarize for themselves and for others, and disseminate scientific knowledge.

The course aims to provide students with advanced tools for data analysis, through which to extrapolate relevant information from large datasets and guide the related decision-making processes. The course consists of frontal lessons using slides made available to students via the Moodle platform, and classroom exercises. The frontal lessons are aimed at improving students' knowledge and understanding through the presentation of theories, models and methods; students are invited to participate in the lesson with autonomy of judgement, by asking questions and presenting examples. The exercises are aimed at understanding the algorithms and models presented.

Oral exam. During the exam the student is asked to illustrate theoretical topics in order to verify his/her knowledge and understanding of the selected topics. The student must demonstrate adequate knowledge and understanding of the issues presented or indicated, applying in a relevant manner the theories and conceptual models covered by the study programme.

**Office Hours**

By appointment; contact the instructor by email or at the end of class meetings.

Introduzione al corso. Streams. Funzioni hash uniformi, 2-universal e pairwise independent. Streaming: modello turnstile, strict turnstile e cash register. Frequency estimation. Sketches. Count-Sketch. Count-Min. Confronto comparativo tra Count-Sketch e Count-Min. Frequent items. Phi-frequent items. The majority problem. Algoritmo di Boyer-Moore. Algoritmo di Misra-Gries. Algoritmo Frequent.Algoritmo Space Saving. Proprieta' di Space Saving. Confronto comparativo con Frequent. Introduzione al paradigma di programmazione parallela Map-Reduce. Implementazione open-source Hadoop. Pro e contro di Hadoop e Map-Reduce. Distributed File System. Chunk servers, Master node. Map Function. Sort and Shuffle. Reduce Function. Map Tasks. Reduce Tasks. Word counting. Gestione dei guasti. Numero di Map e Reduce jobs. Granularita' dei tasks e pipelining. Mitigare il problema degli strugglers task: spawning di backup tasks. Combiners. Partition (hash) function. Altri esempi di algoritmi Map-Reduce: natural join, two-pass matrix multiply, single pass matrix multiply. Misure di costo per un algoritmo Map-Reduce. Discovery di association rules. Modello market-basket. Esempi di possibili applicazioni. Frequent itemsets. Supporto di un itemset. Association rules. Confidence e Interest.Association rules con elevato interesse positivo o negativo. Mining di association rules. Maximal e closed frequent itemsets. Lattice degli itemsets. Naive approach to counting frequent pairs. Algoritmo A-priori. Monotonicity. Algoritmo PCY. Raffinamenti di PCY: multistage e multihash. Frequent itemsets in 2 passate: random sampling. Frequent itemsets in 2 passate: Random sampling e scelta della soglia opportuna, algoritmo SON, monotonicità, SON parallelo mediante Map-Reduce in 2 passate, algoritmo di Toivonen, bordo negativo. Scene completion problem. Near neighbors in spazi di dimensionalità elevata. Document similarity. Coppie di documenti candidati. Near neighbor search. Jaccard similarity e distance. Shingling: convertire documenti email etc in insiemi. k-shingles. Compressione mediante hashing di k-shingles. Min-Hashing: conversione di insiemi di cardinalità elevata in brevi signatures preservando la similarità. Similarità e distanza di Jaccard per vettori booleani. Boolean matrices. Min-hash signatures. Implementazione. Locality-Sensitive Hashing: determinare coppie di documenti candidate. Matrix partitioning in b bande di r righe: analisi del grado di accuratezza associato rispetto ai falsi positivi ed ai falsi negativi. Link analysis. PageRank. Dead ends. Spider traps. Flow formulation. Matrix formulation. Random walk interpretation. Stationary distribution of a Discrete-Time Markov Chain. Perron-Frobenius Theorem. Google matrix and teleportation. Sparse matrix encoding. Block update algorithm. Topic-specific PageRank. Matrix formulation. Topic vector. Web Spam. Term spam. Spam farms. Analisi del valore di PageRank ottenuto tramite Spam Farm. TrustRank. Trust propagation. Spam Mass estimation. Introduzione al problema del clustering. Curse of dimensionality. Clustering in spazi euclidei e non euclidei. Distanze. Hierarchical clustering: agglomerative and divisive algorithms. Clustering by point assignment. Centroid and clustroid. K-means e K-means++. Scelta di k: elbow criterion. Algoritmo BFR. Discard, Compression e Retained sets. Summarizing points. Distanza di Mahalanobis. Algoritmo CURE. Punti rappresentativi e loro scelta. Input space e feature space. Kernel methods. Kernel matrix. Linear kernel. Kernel trick. Kernel operations in feature space. Represenattive clustering: K-means e Kernel K-means. Expectation-Maximization clustering. Hierarchical clustering. Density-based clustering. Algoritmo DBSCAN. Recommender systems. Recommendations. The long tail phenomenon. Content-based systems. Utility function and matrix. Ratings. Extrapolation of ratings (utilities). Item profiles. User profiles. Collaborative filtering. k-NN. Similarity metrics. User-user and item-item collaborative filtering. Evaluation of systems. Error metrics. RMSE, precision, rank correlation. Complexity of collaborative filtering. The Netflix challenge. Bellkor recommender system. Modeling local and global effects. Learning the optimal weights: optimization problem and gradient descent. Latent factor models. SVD decomposition. Learning the P and Q matrices. Preventing overfitting: regularization. Stochastic Gradient Descent. Biases and interactions. Temporal biases and factors. Machine learning: supervised and unsupervised approaches. Attributi numerici e categorici. Attributi categorici nominali ed ordinali. Probabilistic classifiers. Parametric approach: Bayes and naive Bayes classifiers. Data centering. Non parametric approach (density based): K-nearest neighbors classifier. Decision Trees. Hyperplans. Split points. Data partion and purity. Split Point Evaluation Measures: entropy, split entropy, information gain, Gini index, CART. Valutazione di split points numerici e categorici. Support Vector machines. Hyperplanes. Support Vectors and Margins. Linear and Separable Case. Soft Margin SVM: Linear and Nonseparable Case. Kernel SVM: Nonlinear Case. SVM Training Algorithms. Multiclass SVM. Analisi delle prestazioni di un classifier. Metriche di valutazione. ROC curve e AUC. K-fold cross-validation. Bootstrapping. Intervalli di confidenza. Paired t-Test. Bias and variance decomposition. Ensemble classifiers. Bagging. Random Forest. Boosting. Stacking. Introduction to neural networks. History. The Biological Inspiration. Learning in Biological vs Artificial Networks. An Alternative View: The Computational Graph Extension of Traditional Machine Learning. Machine Learning versus Deep Learning. Single Layer Networks: the Perceptron. Bias neurons. Training a Perceptron. Perceptron vs Linear SVMs. Activation and Loss Functions. Multilayer Neural Networks. Reasons for nonlinearity of hidden layers. Role of Hidden Layers. The Feature Engineering View of Hidden Layers. Multilayer Networks as Computational Graphs. Connecting Machine Learning with Shallow Neural Networks: Perceptron versus Linear SVM, RBF Network versus kernel SVM.

Mining of Massive Datasets

J. Leskovec, A. Rajaraman and J. Ullman

Freely availableonline: http://www.mmds.org

Data Mining and Analysis

M. J. Zaki and W. Meira

Freely available online: http://dataminingbook.info

Neural Networks and Deep Learning

Charu C. Aggarwal

Springer

##### (ING-INF/05)

##### DATA MINING

Degree course MATEMATICA

Subject area ING-INF/05

Course type Laurea Magistrale

Credits 6.0

Teaching hours Ore totali di attività frontale: 42.0

For matriculated on 2022/2023

Year taught 2022/2023

Course year 1

Semestre Secondo Semestre (dal 27/02/2023 al 09/06/2023)

Language INGLESE

Subject matter TEORICO-MODELLISTICO (A217)

Location Lecce

Calculus. Probability theory. Linear Algebra. Programming skills.

The course provides a modern introduction to data mining, which spans techniques, algorithms and methodologies for discovering structure, patterns and relationships in data sets (typically, large ones) and making predictions. Applications of data mining are already happening all around us, and, when they are done well, sometimes they even go unnoticed. For instance, how does the Google web search work? How does Shazam recognize a song? How does Netflix recommend movies to its users? The principles of data mining provide answers to these and others questions. Data mining overlaps the fields of computer science, statistical machine learning and data bases. The course aims at providing the students with the knowldedge required to explore, analyze and leverage available data in order to turn the data into valuable and actionable information for a company, for instance, in order to facilitate a decision-making process.

**Knowledge and understanding**. The course describes methods and models for the analysis of large amounts of data. Students must have a solid background with a broad spectrum of basic knowledge related to data mining:

- the students must have the basic cognitive tools to think analytically, creatively, critically and in an inquiring way, and have the abstraction and problem-solving skills needed to cope with complex systems;
- they must have solid knowledge of data mining models and methodologies;
- they must be able to work on large data collections, including heterogeneous and produced at high speed data, in order to integrate them - in particular by knowing how to manage their origin and quality - and to carry out in-depth thematic analyses, drawing on this knowledge to improve the decision-making process.

**Applying knowledge and understanding. **After the course the student should be able to:

- describe and use the main data mining techniques;
- understand the differences among several algorithms solving the same problem and recognize which one is better under different conditions;
- tackle new data mining problems by selecting the appropriate methods and justifying his/her choices;
- tackle new data mining problems by designing suitable algorithms and evaluating the results;
- explain experimental results to people outside of statistical machine learning or computer science.

**Making judgements**. Students must have the ability to process complex and/or fragmentary data and must arrive at original and autonomous ideas and judgments, and consistent choices in the context of their work, which are particularly delicate in the profession of data scientist. The course promotes the development of independent judgment in the appropriate choice of technique/model for data processing and the critical ability to interpret the goodness of the results of the models/methods applied to the datasets under examination.

**Communication.** It is essential that students are able to communicate with a varied and composite audience, not culturally homogeneous, in a clear, logical and effective way, using the methodological tools acquired and their scientific knowledge and, in particular, the specialty vocabulary. Students should be able to organize effective dissemination and study material through the most common presentation tools, including computer-based ones, to communicate the results of data analysis processes, for example by using visualization and reporting tools aimed at different types of audiences.

**Learning skills. **Students must acquire the critical ability to relate, with originality and autonomy, to the typical problems of data mining and, in general, cultural issues related to other similar areas. They should be able to develop and apply independently the knowledge and methods learnt with a view to possible continuation of studies at higher (doctoral) level or in the broader perspective of cultural and professional self-improvement of lifelong learning. Therefore, students should be able to switch to exhibition forms other than the source texts in order to memorize, summarize for themselves and for others, and disseminate scientific knowledge.

The course aims to provide students with advanced tools for data analysis, through which to extrapolate relevant information from large datasets and guide the related decision-making processes. The course consists of frontal lessons using slides made available to students via the Moodle platform, and classroom exercises. The frontal lessons are aimed at improving students' knowledge and understanding through the presentation of theories, models and methods; students are invited to participate in the lesson with autonomy of judgement, by asking questions and presenting examples. The exercises are aimed at understanding the algorithms and models presented.

Oral exam. During the exam the student is asked to illustrate theoretical topics in order to verify his/her knowledge and understanding of the selected topics. The student must demonstrate adequate knowledge and understanding of the issues presented or indicated, applying in a relevant manner the theories and conceptual models covered by the study programme.

**Office Hours**

By appointment; contact the instructor by email or at the end of class meetings.

Introduzione al corso. Streams. Funzioni hash uniformi, 2-universal e pairwise independent. Streaming: modello turnstile, strict turnstile e cash register. Frequency estimation. Sketches. Count-Sketch. Count-Min. Confronto comparativo tra Count-Sketch e Count-Min. Frequent items. Phi-frequent items. The majority problem. Algoritmo di Boyer-Moore. Algoritmo di Misra-Gries. Algoritmo Frequent.Algoritmo Space Saving. Proprieta' di Space Saving. Confronto comparativo con Frequent. Introduzione al paradigma di programmazione parallela Map-Reduce. Implementazione open-source Hadoop. Pro e contro di Hadoop e Map-Reduce. Distributed File System. Chunk servers, Master node. Map Function. Sort and Shuffle. Reduce Function. Map Tasks. Reduce Tasks. Word counting. Gestione dei guasti. Numero di Map e Reduce jobs. Granularita' dei tasks e pipelining. Mitigare il problema degli strugglers task: spawning di backup tasks. Combiners. Partition (hash) function. Altri esempi di algoritmi Map-Reduce: natural join, two-pass matrix multiply, single pass matrix multiply. Misure di costo per un algoritmo Map-Reduce. Discovery di association rules. Modello market-basket. Esempi di possibili applicazioni. Frequent itemsets. Supporto di un itemset. Association rules. Confidence e Interest.Association rules con elevato interesse positivo o negativo. Mining di association rules. Maximal e closed frequent itemsets. Lattice degli itemsets. Naive approach to counting frequent pairs. Algoritmo A-priori. Monotonicity. Algoritmo PCY. Raffinamenti di PCY: multistage e multihash. Frequent itemsets in 2 passate: random sampling. Frequent itemsets in 2 passate: Random sampling e scelta della soglia opportuna, algoritmo SON, monotonicità, SON parallelo mediante Map-Reduce in 2 passate, algoritmo di Toivonen, bordo negativo. Scene completion problem. Near neighbors in spazi di dimensionalità elevata. Document similarity. Coppie di documenti candidati. Near neighbor search. Jaccard similarity e distance. Shingling: convertire documenti email etc in insiemi. k-shingles. Compressione mediante hashing di k-shingles. Min-Hashing: conversione di insiemi di cardinalità elevata in brevi signatures preservando la similarità. Similarità e distanza di Jaccard per vettori booleani. Boolean matrices. Min-hash signatures. Implementazione. Locality-Sensitive Hashing: determinare coppie di documenti candidate. Matrix partitioning in b bande di r righe: analisi del grado di accuratezza associato rispetto ai falsi positivi ed ai falsi negativi. Link analysis. PageRank. Dead ends. Spider traps. Flow formulation. Matrix formulation. Random walk interpretation. Stationary distribution of a Discrete-Time Markov Chain. Perron-Frobenius Theorem. Google matrix and teleportation. Sparse matrix encoding. Block update algorithm. Topic-specific PageRank. Matrix formulation. Topic vector. Web Spam. Term spam. Spam farms. Analisi del valore di PageRank ottenuto tramite Spam Farm. TrustRank. Trust propagation. Spam Mass estimation. Introduzione al problema del clustering. Curse of dimensionality. Clustering in spazi euclidei e non euclidei. Distanze. Hierarchical clustering: agglomerative and divisive algorithms. Clustering by point assignment. Centroid and clustroid. K-means e K-means++. Scelta di k: elbow criterion. Algoritmo BFR. Discard, Compression e Retained sets. Summarizing points. Distanza di Mahalanobis. Algoritmo CURE. Punti rappresentativi e loro scelta. Input space e feature space. Kernel methods. Kernel matrix. Linear kernel. Kernel trick. Kernel operations in feature space. Represenattive clustering: K-means e Kernel K-means. Expectation-Maximization clustering. Hierarchical clustering. Density-based clustering. Algoritmo DBSCAN. Recommender systems. Recommendations. The long tail phenomenon. Content-based systems. Utility function and matrix. Ratings. Extrapolation of ratings (utilities). Item profiles. User profiles. Collaborative filtering. k-NN. Similarity metrics. User-user and item-item collaborative filtering. Evaluation of systems. Error metrics. RMSE, precision, rank correlation. Complexity of collaborative filtering. The Netflix challenge. Bellkor recommender system. Modeling local and global effects. Learning the optimal weights: optimization problem and gradient descent. Latent factor models. SVD decomposition. Learning the P and Q matrices. Preventing overfitting: regularization. Stochastic Gradient Descent. Biases and interactions. Temporal biases and factors. Machine learning: supervised and unsupervised approaches. Attributi numerici e categorici. Attributi categorici nominali ed ordinali. Probabilistic classifiers. Parametric approach: Bayes and naive Bayes classifiers. Data centering. Non parametric approach (density based): K-nearest neighbors classifier. Decision Trees. Hyperplans. Split points. Data partion and purity. Split Point Evaluation Measures: entropy, split entropy, information gain, Gini index, CART. Valutazione di split points numerici e categorici. Support Vector machines. Hyperplanes. Support Vectors and Margins. Linear and Separable Case. Soft Margin SVM: Linear and Nonseparable Case. Kernel SVM: Nonlinear Case. SVM Training Algorithms. Multiclass SVM. Analisi delle prestazioni di un classifier. Metriche di valutazione. ROC curve e AUC. K-fold cross-validation. Bootstrapping. Intervalli di confidenza. Paired t-Test. Bias and variance decomposition. Ensemble classifiers. Bagging. Random Forest. Boosting. Stacking. Introduction to neural networks. History. The Biological Inspiration. Learning in Biological vs Artificial Networks. An Alternative View: The Computational Graph Extension of Traditional Machine Learning. Machine Learning versus Deep Learning. Single Layer Networks: the Perceptron. Bias neurons. Training a Perceptron. Perceptron vs Linear SVMs. Activation and Loss Functions. Multilayer Neural Networks. Reasons for nonlinearity of hidden layers. Role of Hidden Layers. The Feature Engineering View of Hidden Layers. Multilayer Networks as Computational Graphs. Connecting Machine Learning with Shallow Neural Networks: Perceptron versus Linear SVM, RBF Network versus kernel SVM.

Mining of Massive Datasets

J. Leskovec, A. Rajaraman and J. Ullman

Freely availableonline: http://www.mmds.org

Data Mining and Analysis

M. J. Zaki and W. Meira

Freely available online: http://dataminingbook.info

Neural Networks and Deep Learning

Charu C. Aggarwal

Springer

##### DATA MINING (ING-INF/05)

##### DATA MINING & MACHINE LEARNING

Degree course COMPUTER ENGINEERING

Subject area ING-INF/05

Course type Laurea Magistrale

Credits 9.0

Teaching hours Ore totali di attività frontale: 81.0

For matriculated on 2021/2022

Year taught 2022/2023

Course year 2

Semestre Primo Semestre (dal 19/09/2022 al 16/12/2022)

Language INGLESE

Subject matter ARTIFICIAL INTELLIGENCE (A182)

Location Lecce

Calculus. Probability theory. Linear Algebra. Programming skills.

The course provides a modern introduction to data mining, which spans techniques, algorithms and methodologies for discovering structure, patterns and relationships in data sets (typically, large ones) and making predictions. Applications of data mining are already happening all around us, and, when they are done well, sometimes they even go unnoticed. For instance, how does the Google web search work? How does Netflix recommend movies to its users? The principles of data mining provide answers to these and others questions. Data mining overlaps the fields of computer science, statistical machine learning and data bases. The course aims at providing the students with the knowldedge required to explore, analyze and leverage available data in order to turn the data into valuable and actionable information for a company, for instance, in order to facilitate a decision-making process.

**Knowledge and understanding**. The course describes methods and models for the analysis of large amounts of data. Students must have a solid background with a broad spectrum of basic knowledge related to data mining:

- the students must have the basic cognitive tools to think analytically, creatively, critically and in an inquiring way, and have the abstraction and problem-solving skills needed to cope with complex systems;
- they must have solid knowledge of data mining models and methodologies;
- they must be able to work on large data collections, including heterogeneous and produced at high speed data, in order to integrate them - in particular by knowing how to manage their origin and quality - and to carry out in-depth thematic analyses, drawing on this knowledge to improve the decision-making process.

**Applying knowledge and understanding. **After the course the student should be able to:

- describe and use the main data mining techniques;
- understand the differences among several algorithms solving the same problem and recognize which one is better under different conditions;
- tackle new data mining problems by selecting the appropriate methods and justifying his/her choices;
- tackle new data mining problems by designing suitable algorithms and evaluating the results;
- explain experimental results to people outside of statistical machine learning or computer science.

**Making judgements**. Students must have the ability to process complex and/or fragmentary data and must arrive at original and autonomous ideas and judgments, and consistent choices in the context of their work, which are particularly delicate in the profession of data scientist. The course promotes the development of independent judgment in the appropriate choice of technique/model for data processing and the critical ability to interpret the goodness of the results of the models/methods applied to the datasets under examination.

**Communication.** It is essential that students are able to communicate with a varied and composite audience, not culturally homogeneous, in a clear, logical and effective way, using the methodological tools acquired and their scientific knowledge and, in particular, the specialty vocabulary. Students should be able to organize effective dissemination and study material through the most common presentation tools, including computer-based ones, to communicate the results of data analysis processes, for example by using visualization and reporting tools aimed at different types of audiences.

**Learning skills. **Students must acquire the critical ability to relate, with originality and autonomy, to the typical problems of data mining and, in general, cultural issues related to other similar areas. They should be able to develop and apply independently the knowledge and methods learnt with a view to possible continuation of studies at higher (doctoral) level or in the broader perspective of cultural and professional self-improvement of lifelong learning. Therefore, students should be able to switch to exhibition forms other than the source texts in order to memorize, summarize for themselves and for others, and disseminate scientific knowledge.

The course aims to provide students with advanced tools for data analysis, through which to extrapolate relevant information from large datasets and guide the related decision-making processes. The course consists of frontal lessons using slides made available to students via the UniSalento e-learning platform, and classroom exercises. The frontal lessons are aimed at improving students' knowledge and understanding through the presentation of theories, models and methods; students are invited to participate in the lesson with autonomy of judgement, by asking questions and presenting examples. The exercises are aimed at understanding the algorithms and models presented.

Software project and oral exam. During the exam the student is asked to illustrate theoretical topics in order to verify his/her knowledge and understanding of the selected topics. The student must demonstrate adequate knowledge and understanding of the issues presented or indicated, applying in a relevant manner the theories and conceptual models covered by the study programme. The software project is assigned upon request to the student, and must be mandatorily discussed into the same trial in which the oral test is performed.

**Office Hours**

By appointment; contact the instructor by email or at the end of class meetings.

Introduction to the course.

Introduction to the Map-Reduce parallel programming model. Open-source Hadoop implementation. Pros and cons of Hadoop and Map-Reduce. Distributed File System. Chunk servers, Master node. Map Function. Sort and Shuffle. Reduce Function. Map Tasks. Reduce Tasks. Word counting. Handling failures. Number of Map and Reduce jobs. Granularity of tasks and pipelining. Strugglers tasks: mitigating the problem by spawning backup tasks. Combiners. Partition (hash) function. Additional examples of Map-Reduce: natural join, two-pass matrix multiply, single pass matrix multiply. Cost measures for Map-Reduce algorithms.

Frequent Pattern Mining. Discovery of association rules. Market-basket model. Examples of possible applications. Frequent itemsets. Support of an itemset. Association rules. Confidence and Interest. Association rules with highly positive or negative interest. Mining association rules. Maximal and closed frequent itemsets. Lattice of the itemsets. Naive approach to counting frequent pairs. A-priori. Monotonicity. PCY. PCY refinements: multistage and multihash. Frequent itemsets in 2 passes: random sampling and choice of the threshold, SON, monotonicity, parallel SON parallelo with Map-Reduce in 2 passes, Toivonen and the negative border. ECLAT. DECLAT. FPGrowth. Sequence mining. Frequent sequences. Mining frequent sequences. GSP. SPADE. PREFIXSPAN. Suffix trees. Ukkonen’s Linear Time Suffix Tree Algorithm.

Scene completion problem. Near neighbors in high dimensionality spaces. Document similarity. Pairs of candidate documents. Near neighbor search. Jaccard similarity and distance. Shingling: how to convert documents, emails etc in sets. k-shingles. Compression through hashing of k-shingles. Min-Hashing: converting big sets in short signatures preserving the similarity. Similarity and Jaccard distance for boolean vectors. Boolean matrices. Min-hash signatures. Implementation. Locality-Sensitive Hashing: determining pairs of candidate documents. Matrix partitioning in b bands of r rows: analysis of accuracy with regard to false positives and false negatives.

Link analysis. PageRank. Dead ends. Spider traps. Flow formulation. Matrix formulation. Random walk interpretation. Stationary distribution of a Discrete-Time Markov Chain. Perron-Frobenius Theorem. Google matrix and teleportation. Sparse matrix encoding. Block update algorithm. Topic-specific PageRank. Matrix formulation. Topic vector. Web Spam. Term spam. Spam farms. PageRank value obtained through a Spam Farm. TrustRank. Trust propagation. Spam Mass estimation.

Recommender systems. Recommendations. The long tail phenomenon. Content-based systems. Utility function and matrix. Ratings. Extrapolation of ratings (utilities). Item profiles. User profiles. Collaborative filtering. k-NN. Similarity metrics. User-user and item-item collaborative filtering. Evaluation of systems. Error metrics. RMSE, precision, rank correlation. Complexity of collaborative filtering. The Netflix challenge. Bellkor recommender system. Modeling local and global effects. Learning the optimal weights: optimization problem and gradient descent. Latent factor models. SVD decomposition. Learning the P and Q matrices. Preventing overfitting: regularization. Stochastic Gradient Descent. Biases and interactions. Temporal biases and factors.

Streams. Uniform, 2-universal and pairwise independent hash function. Streaming: turnstile, strict turnstile and cash register models. Frequency estimation. Sketches. Count-Sketch. Count-Min. Comparing Count-Sketch and Count-Min. Frequent items. Phi-frequent items. The majority problem. Boyer-Moore. Misra-Gries. Frequent. Space Saving. Space Saving properties. Comparing Frequent and Space Saving. Sampling from a Data Stream: Sampling a fixed proportion. Sampling from a Data Stream: Sampling a fixed-size sample (Reserved Sampling). Queries over a sliding windows: counting the number of bits equal to 1 with DGIM. Filtering data streams: Bloom FIlters. Counting distinct elements: Flajolet-Martin. Estimating moments with AMS.

The clustering problem. Curse of dimensionality. Clustering in euclidean and non euclidean spaces. Distances. Hierarchical clustering: agglomerative and divisive algorithms. Clustering by point assignment. Centroid and clustroid. K-means and K-means++. Choosing k: elbow criterion. BFR. Discard, Compression and Retained sets. Summarizing points. Mahalanobis distance. CURE. Representative points and their choice. Input space and feature space. Kernel methods. Kernel matrix. Linear kernel. Kernel trick. Kernel operations in feature space. Represenative clustering: K-means and Kernel K-means. Expectation-Maximization clustering. Hierarchical clustering. Density-based clustering. DBSCAN. Clustering validation: external, internal and relative measures.

Machine learning. Supervised and unsupervised approaches. Numerical and categorical attributes. Categorical attributes: nominal and ordinal attributes. Probabilistic classifiers. Parametric approach: Bayes and naive Bayes classifiers. Data centering. Non parametric approach (density based): K-nearest neighbors classifier. Decision Trees. Hyperplans. Split points. Data partion and purity. Split Point Evaluation Measures: entropy, split entropy, information gain, Gini index, CART. Evaluation of numerical and categorical split points. Support Vector machines. Hyperplanes. Support Vectors and Margins. Linear and Separable Case. Soft Margin SVM: Linear and Nonseparable Case. Kernel SVM: Nonlinear Case. SVM Training Algorithms. Multiclass SVM. Assessing the performances of a classifier. Evaluation metrics. ROC curve and AUC. K-fold cross-validation. Bootstrapping. Confidence intervals. Paired t-Test. Bias and variance decomposition. Ensemble classifiers. Bagging. Random Forest. Boosting. Stacking.

Introduction to neural networks. History. The Biological Inspiration. Learning in Biological vs Artificial Networks. An Alternative View: The Computational Graph Extension of Traditional Machine Learning. Machine Learning versus Deep Learning. Single Layer Networks: the Perceptron. Bias neurons. Training a Perceptron. Perceptron vs Linear SVMs. Activation and Loss Functions. Multilayer Neural Networks. Reasons for nonlinearity of hidden layers. Role of Hidden Layers. The Feature Engineering View of Hidden Layers. Multilayer Networks as Computational Graphs. Connecting Machine Learning with Shallow Neural Networks: Perceptron versus Linear SVM, RBF Network versus kernel SVM. Neural models for linear regression, classification and the Fisher discriminant. Widrow-Hoff learning. Neural model for linear regression. Comparison of Widrow-Hoff with Perceptron and SVM. Connections with Fisher Discriminant. Neural Models for Logistic Regression. The Softmax Activation Function and Multinomial Logistic Regression. The Autoencoder for Unsupervised Representation Learning. Singular Value Decomposition with Autoencoders. Row-Index to Row-Value Autoencoders: Incomplete Matrix Factorization for Recommender Systems.

Mining of Massive Datasets

J. Leskovec, A. Rajaraman and J. Ullman

Freely availableonline: http://www.mmds.org

Data Mining and Analysis

M. J. Zaki and W. Meira

Freely available online: http://dataminingbook.info

Neural Networks and Deep Learning

Charu C. Aggarwal

Springer

##### DATA MINING & MACHINE LEARNING (ING-INF/05)

##### PARALLEL ALGORITHMS

Degree course COMPUTER ENGINEERING

Subject area ING-INF/05

Course type Laurea Magistrale

Credits 9.0

Teaching hours Ore totali di attività frontale: 81.0

For matriculated on 2021/2022

Year taught 2022/2023

Course year 2

Semestre Secondo Semestre (dal 01/03/2023 al 09/06/2023)

Language INGLESE

Subject matter PERCORSO COMUNE (999)

Location Lecce

Calculus I and II, Probability Theory. Programming skills and working knowledge of the C programming language.

The course provides a modern introduction to design, analysis and implementation of sequential and parallel algorithms. In particular, the course is based on a pragmatic approach to parallel programming of message-passing algorithms through the C language and the MPI library.

**Knowledge and understanding. **Students must have a solid background with a broad spectrum of basic knowledge of sequential and parallel algorithms:

· the students must have the basic cognitive tools to think analytically, creatively, critically and in an inquiring way, and have the abstraction and problem-solving skills needed to cope with complex systems;

· they must have a solid knowledge of the design and implementation of sequential and parallel efficient algorithms;

· they must have the tools for analysing the resources used by algorithms;

· they must have a catalogue of the most well-known and efficient sequential and parallel algorithms for basic computational problems.

**Applying knowledge and understanding. **After the course the student should be able to:

· Describe and use the main design techniques for sequential algorithms;

· Design, prove the correctness and analyze the computational complexity of sequential algorithms;

· Understand the differences among several algorithms solving the same problem and recognize which one is better under different conditions;

· Describe and use basic sequential algorithms;

· Describe and use basic data structures; know about the existence of advanced data structures;

· Understand the difference between sequential and parallel algorithms;

· Design, implement and analyze message-passing based parallel algorithms in C using the MPI library;

· Describe and use basic parallel algorithms.

**Making judgements. **Students are guided to learn critically everything that is explained to them in class, to compare different approaches to solving algorithmic problems, and to identify and propose, in an autonomous way, the most efficient solution they find.

**Communication. **It is essential that students are able to communicate with a varied and composite audience, not culturally homogeneous, in a clear, logical and effective way, using the methodological tools acquired and their scientific knowledge and, in particular, the specialty vocabulary. The course promotes the development of the following skills of the student: ability to expose in precise and formal terms an abstract model of concrete problems, identifying the salient features of them and discarding the nonessential ones; ability to describe and analyze an efficient solution to the problem in question.

**Learning skills. **Students must acquire the critical ability to relate, with originality and autonomy, to the typical problems of data mining and, in general, cultural issues related to other similar areas. They should be able to develop and apply independently the knowledge and methods learnt with a view to possible continuation of studies at higher (doctoral) level or in the broader perspective of cultural and professional self-improvement of lifelong learning. Therefore, students should be able to switch to exhibition forms other than the source texts in order to memorize, summarize for themselves and for others, and disseminate scientific knowledge.

The course aims to enable students to abstract formal algorithmic models and problems from concrete computational problems, and to design efficient algorithmic solutions for them. This will be done using the following teaching method. Every computational problem will be introduced, motivating it with concrete examples. The presentation of each topic will be divided into four parts: 1. Description of the actual computational problem. 2. Modelling the real problem by means of an abstract problem. 3. Resolution of the abstract problem through an algorithm obtained through the application of the general techniques of design of algorithms introduced in the course. 4. Analysis of the resources used by the algorithm. The course consists of frontal lessons, and classroom exercises. There will be theoretical lessons aimed at learning the basic techniques for the project and analysis of algorithms, and a part of lessons devoted to exercises in which we will illustrate, with plenty of examples, how the theoretical knowledge acquired can be used in order to solve algorithmic problems of practical interest and implement parallel algorithms in C language through the MPI library.

The exam consists of a written test and a prototype implementation of a parallel software. The written test (2 hours, 21 points out of 30) covers theoretical topics related to the design and analysis of sequential and parallel algorithms in order to verify the student’s knowledge and understanding of the materials. In order to pass the written test, students must obtain at least 9 points out of 21. The parallel software project (9 points out of 30) is meant to verify the practice of parallel programming and the ability of the student to implement theoretical parallel algorithms or to parallelize a sequential algorithm. In order to pass the parallel programming challenge, students must obtain at least 4 points out of 9. Both the written test and the prototype implementation of parallel software are mandatory. Students must pass both the written test and the parallel programming challenge; however, the overall exam is passed only if the sum of points obtained in the written test and the parallel programming challenge is greater than or equal to 18 points.

The problem to be solved in parallel (or a sequential algorithm to be implemented in parallel) is assigned upon request to the student. The assigned parallel software project can be discussed only if a written test has been successfully passed; moreover, the parallel software project must be mandatorily discussed into the same trial in which the written test has been passed.

**Office Hours**

By appointment; contact the instructor by email or at the end of class meetings.

*Parallel **Algorithms*

Introduction. Parallel Architectures. Parallel algorithm design. Message-Passing Programming. Sieve of Erathostenes. Floyd all-pairs shortest path algorithm. Performance analysis. Matrix-vector multiplication. Document classification. Matrix multiplication. Linear Systems. Finite Difference Methods. Sorting.

*Parallel Programming*

Message-Passing programming using MPI.

*Sequential* A*lgorithms*

Introduction. Order of growth. Analysis of algorithms. Decrease and conquer. Divide and conquer. Recurrences. Randomized algorithms. Transform and conquer. Dynamic programming. Greedy algorithms. Single Source Shortest Paths. Dijkstra algorithm. Breadth-First Search. Bellman-Ford Algorithm. Complexity and computability. NP-Completeness. The transition from sequential to parallel computing. Parallel complexity.

Introduction to Algorithms. Fourth edition. Cormen, Leiserson, Rivest, Stein. The MIT Press

Parallel Programming in C with MPI and OpenMP (International Edition). Michael J. Quinn. McGraw-Hill

##### PARALLEL ALGORITHMS (ING-INF/05)

##### DATA MINING

Degree course MATEMATICA

Subject area ING-INF/05

Course type Laurea Magistrale

Credits 6.0

Teaching hours Ore totali di attività frontale: 42.0

For matriculated on 2021/2022

Year taught 2021/2022

Course year 1

Semestre Secondo Semestre (dal 21/02/2022 al 27/05/2022)

Language INGLESE

Subject matter PERCORSO COMUNE (999)

Location Lecce

Calculus. Probability theory. Linear Algebra. Programming skills.

The course provides a modern introduction to data mining, which spans techniques, algorithms and methodologies for discovering structure, patterns and relationships in data sets (typically, large ones) and making predictions. Applications of data mining are already happening all around us, and, when they are done well, sometimes they even go unnoticed. For instance, how does the Google web search work? How does Shazam recognize a song? How does Netflix recommend movies to its users? The principles of data mining provide answers to these and others questions. Data mining overlaps the fields of computer science, statistical machine learning and data bases. The course aims at providing the students with the knowldedge required to explore, analyze and leverage available data in order to turn the data into valuable and actionable information for a company, for instance, in order to facilitate a decision-making process.

**Knowledge and understanding**. The course describes methods and models for the analysis of large amounts of data. Students must have a solid background with a broad spectrum of basic knowledge related to data mining:

- they must have solid knowledge of data mining models and methodologies;

**Applying knowledge and understanding. **After the course the student should be able to:

- describe and use the main data mining techniques;
- tackle new data mining problems by selecting the appropriate methods and justifying his/her choices;
- tackle new data mining problems by designing suitable algorithms and evaluating the results;
- explain experimental results to people outside of statistical machine learning or computer science.

**Making judgements**. Students must have the ability to process complex and/or fragmentary data and must arrive at original and autonomous ideas and judgments, and consistent choices in the context of their work, which are particularly delicate in the profession of data scientist. The course promotes the development of independent judgment in the appropriate choice of technique/model for data processing and the critical ability to interpret the goodness of the results of the models/methods applied to the datasets under examination.

**Communication.** It is essential that students are able to communicate with a varied and composite audience, not culturally homogeneous, in a clear, logical and effective way, using the methodological tools acquired and their scientific knowledge and, in particular, the specialty vocabulary. Students should be able to organize effective dissemination and study material through the most common presentation tools, including computer-based ones, to communicate the results of data analysis processes, for example by using visualization and reporting tools aimed at different types of audiences.

**Learning skills. **Students must acquire the critical ability to relate, with originality and autonomy, to the typical problems of data mining and, in general, cultural issues related to other similar areas. They should be able to develop and apply independently the knowledge and methods learnt with a view to possible continuation of studies at higher (doctoral) level or in the broader perspective of cultural and professional self-improvement of lifelong learning. Therefore, students should be able to switch to exhibition forms other than the source texts in order to memorize, summarize for themselves and for others, and disseminate scientific knowledge.

The course aims to provide students with advanced tools for data analysis, through which to extrapolate relevant information from large datasets and guide the related decision-making processes. The course consists of frontal lessons using slides made available to students via the Moodle platform, and classroom exercises. The frontal lessons are aimed at improving students' knowledge and understanding through the presentation of theories, models and methods; students are invited to participate in the lesson with autonomy of judgement, by asking questions and presenting examples. The exercises are aimed at understanding the algorithms and models presented.

Oral exam. During the exam the student is asked to illustrate theoretical topics in order to verify his/her knowledge and understanding of the selected topics. The student must demonstrate adequate knowledge and understanding of the issues presented or indicated, applying in a relevant manner the theories and conceptual models covered by the study programme.

**Office Hours**

By appointment; contact the instructor by email or at the end of class meetings.

Introduzione al corso. Streams. Funzioni hash uniformi, 2-universal e pairwise independent. Streaming: modello turnstile, strict turnstile e cash register. Frequency estimation. Sketches. Count-Sketch. Count-Min. Confronto comparativo tra Count-Sketch e Count-Min. Frequent items. Phi-frequent items. The majority problem. Algoritmo di Boyer-Moore. Algoritmo di Misra-Gries. Algoritmo Frequent.Algoritmo Space Saving. Proprieta' di Space Saving. Confronto comparativo con Frequent. Introduzione al paradigma di programmazione parallela Map-Reduce. Implementazione open-source Hadoop. Pro e contro di Hadoop e Map-Reduce. Distributed File System. Chunk servers, Master node. Map Function. Sort and Shuffle. Reduce Function. Map Tasks. Reduce Tasks. Word counting. Gestione dei guasti. Numero di Map e Reduce jobs. Granularita' dei tasks e pipelining. Mitigare il problema degli strugglers task: spawning di backup tasks. Combiners. Partition (hash) function. Altri esempi di algoritmi Map-Reduce: natural join, two-pass matrix multiply, single pass matrix multiply. Misure di costo per un algoritmo Map-Reduce. Discovery di association rules. Modello market-basket. Esempi di possibili applicazioni. Frequent itemsets. Supporto di un itemset. Association rules. Confidence e Interest.Association rules con elevato interesse positivo o negativo. Mining di association rules. Maximal e closed frequent itemsets. Lattice degli itemsets. Naive approach to counting frequent pairs. Algoritmo A-priori. Monotonicity. Algoritmo PCY. Raffinamenti di PCY: multistage e multihash. Frequent itemsets in 2 passate: random sampling. Frequent itemsets in 2 passate: Random sampling e scelta della soglia opportuna, algoritmo SON, monotonicità, SON parallelo mediante Map-Reduce in 2 passate, algoritmo di Toivonen, bordo negativo. Scene completion problem. Near neighbors in spazi di dimensionalità elevata. Document similarity. Coppie di documenti candidati. Near neighbor search. Jaccard similarity e distance. Shingling: convertire documenti email etc in insiemi. k-shingles. Compressione mediante hashing di k-shingles. Min-Hashing: conversione di insiemi di cardinalità elevata in brevi signatures preservando la similarità. Similarità e distanza di Jaccard per vettori booleani. Boolean matrices. Min-hash signatures. Implementazione. Locality-Sensitive Hashing: determinare coppie di documenti candidate. Matrix partitioning in b bande di r righe: analisi del grado di accuratezza associato rispetto ai falsi positivi ed ai falsi negativi. Link analysis. PageRank. Dead ends. Spider traps. Flow formulation. Matrix formulation. Random walk interpretation. Stationary distribution of a Discrete-Time Markov Chain. Perron-Frobenius Theorem. Google matrix and teleportation. Sparse matrix encoding. Block update algorithm. Topic-specific PageRank. Matrix formulation. Topic vector. Web Spam. Term spam. Spam farms. Analisi del valore di PageRank ottenuto tramite Spam Farm. TrustRank. Trust propagation. Spam Mass estimation. Introduzione al problema del clustering. Curse of dimensionality. Clustering in spazi euclidei e non euclidei. Distanze. Hierarchical clustering: agglomerative and divisive algorithms. Clustering by point assignment. Centroid and clustroid. K-means e K-means++. Scelta di k: elbow criterion. Algoritmo BFR. Discard, Compression e Retained sets. Summarizing points. Distanza di Mahalanobis. Algoritmo CURE. Punti rappresentativi e loro scelta. Input space e feature space. Kernel methods. Kernel matrix. Linear kernel. Kernel trick. Kernel operations in feature space. Represenattive clustering: K-means e Kernel K-means. Expectation-Maximization clustering. Hierarchical clustering. Density-based clustering. Algoritmo DBSCAN. Recommender systems. Recommendations. The long tail phenomenon. Content-based systems. Utility function and matrix. Ratings. Extrapolation of ratings (utilities). Item profiles. User profiles. Collaborative filtering. k-NN. Similarity metrics. User-user and item-item collaborative filtering. Evaluation of systems. Error metrics. RMSE, precision, rank correlation. Complexity of collaborative filtering. The Netflix challenge. Bellkor recommender system. Modeling local and global effects. Learning the optimal weights: optimization problem and gradient descent. Latent factor models. SVD decomposition. Learning the P and Q matrices. Preventing overfitting: regularization. Stochastic Gradient Descent. Biases and interactions. Temporal biases and factors. Machine learning: supervised and unsupervised approaches. Attributi numerici e categorici. Attributi categorici nominali ed ordinali. Probabilistic classifiers. Parametric approach: Bayes and naive Bayes classifiers. Data centering. Non parametric approach (density based): K-nearest neighbors classifier. Decision Trees. Hyperplans. Split points. Data partion and purity. Split Point Evaluation Measures: entropy, split entropy, information gain, Gini index, CART. Valutazione di split points numerici e categorici. Support Vector machines. Hyperplanes. Support Vectors and Margins. Linear and Separable Case. Soft Margin SVM: Linear and Nonseparable Case. Kernel SVM: Nonlinear Case. SVM Training Algorithms. Multiclass SVM. Analisi delle prestazioni di un classifier. Metriche di valutazione. ROC curve e AUC. K-fold cross-validation. Bootstrapping. Intervalli di confidenza. Paired t-Test. Bias and variance decomposition. Ensemble classifiers. Bagging. Random Forest. Boosting. Stacking. Introduction to neural networks. History. The Biological Inspiration. Learning in Biological vs Artificial Networks. An Alternative View: The Computational Graph Extension of Traditional Machine Learning. Machine Learning versus Deep Learning. Single Layer Networks: the Perceptron. Bias neurons. Training a Perceptron. Perceptron vs Linear SVMs. Activation and Loss Functions. Multilayer Neural Networks. Reasons for nonlinearity of hidden layers. Role of Hidden Layers. The Feature Engineering View of Hidden Layers. Multilayer Networks as Computational Graphs. Connecting Machine Learning with Shallow Neural Networks: Perceptron versus Linear SVM, RBF Network versus kernel SVM.

Mining of Massive Datasets

J. Leskovec, A. Rajaraman and J. Ullman

Freely availableonline: http://www.mmds.org

Data Mining and Analysis

M. J. Zaki and W. Meira

Freely available online: http://dataminingbook.info

Neural Networks and Deep Learning

Charu C. Aggarwal

Springer

##### DATA MINING (ING-INF/05)

##### PARALLEL ALGORITHMS

Degree course COMPUTER ENGINEERING

Subject area ING-INF/05

Course type Laurea Magistrale

Credits 9.0

Teaching hours Ore totali di attività frontale: 81.0

For matriculated on 2020/2021

Year taught 2021/2022

Course year 2

Semestre Primo Semestre (dal 20/09/2021 al 17/12/2021)

Language INGLESE

Subject matter PERCORSO COMUNE (999)

Location Lecce

Calculus I and II, Probability Theory. Programming skills and working knowledge of the C programming language.

The course provides a modern introduction to design, analysis and implementation of sequential and parallel algorithms. In particular, the course is based on a pragmatic approach to parallel programming of message-passing algorithms through the C language and the MPI library.

**Knowledge and understanding. **Students must have a solid background with a broad spectrum of basic knowledge of sequential and parallel algorithms:

· the students must have the basic cognitive tools to think analytically, creatively, critically and in an inquiring way, and have the abstraction and problem-solving skills needed to cope with complex systems;

· they must have a solid knowledge of the design and implementation of sequential and parallel efficient algorithms;

· they must have the tools for analysing the resources used by algorithms;

· they must have a catalogue of the most well-known and efficient sequential and parallel algorithms for basic computational problems.

**Applying knowledge and understanding. **After the course the student should be able to:

· Describe and use the main design techniques for sequential algorithms;

· Design, prove the correctness and analyze the computational complexity of sequential algorithms;

· Understand the differences among several algorithms solving the same problem and recognize which one is better under different conditions;

· Describe and use basic sequential algorithms;

· Describe and use basic data structures; know about the existence of advanced data structures;

· Understand the difference between sequential and parallel algorithms;

· Design, implement and analyze message-passing based parallel algorithms in C using the MPI library;

· Describe and use basic parallel algorithms.

**Making judgements. **Students are guided to learn critically everything that is explained to them in class, to compare different approaches to solving algorithmic problems, and to identify and propose, in an autonomous way, the most efficient solution they find.

**Communication. **It is essential that students are able to communicate with a varied and composite audience, not culturally homogeneous, in a clear, logical and effective way, using the methodological tools acquired and their scientific knowledge and, in particular, the specialty vocabulary. The course promotes the development of the following skills of the student: ability to expose in precise and formal terms an abstract model of concrete problems, identifying the salient features of them and discarding the nonessential ones; ability to describe and analyze an efficient solution to the problem in question.

**Learning skills. **Students must acquire the critical ability to relate, with originality and autonomy, to the typical problems of data mining and, in general, cultural issues related to other similar areas. They should be able to develop and apply independently the knowledge and methods learnt with a view to possible continuation of studies at higher (doctoral) level or in the broader perspective of cultural and professional self-improvement of lifelong learning. Therefore, students should be able to switch to exhibition forms other than the source texts in order to memorize, summarize for themselves and for others, and disseminate scientific knowledge.

The course aims to enable students to abstract formal algorithmic models and problems from concrete computational problems, and to design efficient algorithmic solutions for them. This will be done using the following teaching method. Every computational problem will be introduced, motivating it with concrete examples. The presentation of each topic will be divided into four parts: 1. Description of the actual computational problem. 2. Modelling the real problem by means of an abstract problem. 3. Resolution of the abstract problem through an algorithm obtained through the application of the general techniques of design of algorithms introduced in the course. 4. Analysis of the resources used by the algorithm. The course consists of frontal lessons, and classroom exercises. There will be theoretical lessons aimed at learning the basic techniques for the project and analysis of algorithms, and a part of lessons devoted to exercises in which we will illustrate, with plenty of examples, how the theoretical knowledge acquired can be used in order to solve algorithmic problems of practical interest and implement parallel algorithms in C language through the MPI library.

The exam consists of a written test and a prototype implementation of a parallel software. The written test (3 hours, 21 points out of 30) covers theoretical topics related to the design and analysis of sequential and parallel algorithms in order to verify the student’s knowledge and understanding of the materials. In order to pass the written test, students must obtain at least 9 points out of 21. The parallel software project (9 points out of 30) is meant to verify the practice of parallel programming and the ability of the student to implement theoretical parallel algorithms or to parallelize a sequential algorithm. In order to pass the parallel programming challenge, students must obtain at least 4 points out of 9. Both the written test and the prototype implementation of parallel software are mandatory. Students must pass both the written test and the parallel programming challenge; however, the overall exam is passed only if the sum of points obtained in the written test and the parallel programming challenge is greater to or equal to 18 points.

The problem to be solved in parallel (or a sequential algorithm to be implemented in parallel) is assigned upon request to the student. The assigned parallel software project can be discussed only if a written test has been successfully passed; moreover, the parallel software project must be mandatorily discussed into one of the trials available in the same exam session in which the written test has been passed.

WARNING: THE FOLLOWING ARE DETAILED INSTRUCTIONS RELATED TO ONLINE WRITTEN TESTS USING MICROSOFT TEAMS

Written test of

Parallel Algorithms

Microsoft Teams is used to verify the student's identity and to supervise the test; the teacher will select each student asking him to show his identification document. The exam takes place within a team that students access through registration by the teacher or through a link that will be communicated via email. Students are asked to enter the team at least 15 minutes prior to the scheduled start time in order to perform recognition. Each student will participate in the meeting through a device equipped with a microphone and a webcam that must remain on for the duration of the test, framing the student and the paper on which he/she writes. The teacher will check the regularity of the students' work, immediately cancelling the test in case of irregularities of any kind.

For the distribution of the exam outline to the students, the teacher shares a pdf file in the TEAMS chat. The track is divided into 3 exercises for Parallel Algorithms, and the teacher releases to the students each of the exercises that compose it at fixed time intervals. The test is then carried out as follows:

1. The teacher releases the first exercise at the beginning of the roll call;

a. students print out the text of the exercise and begin drafting the paper on a sheet of paper clearly visible from the video capture device;

b. the teacher provides his/her estimate x of sufficient time to answer;

c. delivery of the paper related to the first exercise after x amount of time has elapsed;

d. Teacher waits 2 minutes for delivery;

2. Teacher delivers the second exercise;

a. students print out the text of the exercise and begin writing the paper on a sheet of paper clearly visible from the video capture device;

b. teacher gives his estimate y of sufficient time to answer;

c. delivery of the paper related to the second exercise after the time y has elapsed;

d. The teacher waits 2 minutes for the delivery;

3. Teacher releases the third exercise;

a. students print out the text of the exercise and begin writing the paper on a piece of paper that is clearly visible from the video capture device;

b. teacher gives his estimate z of sufficient time to answer;

c. delivery of the paper related to the second exercise after the time z has elapsed;

d. The teacher waits 2 minutes for the delivery;

4. The teacher informs the participating students that the roll call is closed.

For the delivery of the papers related to the individual exercises, the student is requested to take a photo of the relative sheets, including an image of the university booklet or, alternatively, an identification document. The photos should be placed in a directory called "Exercise_a_b_c_d" in which:

- a indicates the progressive number of the exercise (1, 2 or 3);

- b is PA (for Parallel Algorithms)

- c indicates the student's first name;

- d indicates the student's last name.

The directory must then be compressed, e.g. in zip format, and the resulting file must be delivered via team chat.

It is not possible to turn in an exercise before the allotted time and move on to the next exercise. The student who intends to abandon the test can give notice to the teacher at any time, and leave the team immediately afterwards. It is not possible to leave for any reason during the test. The student is required to ensure that he/she has everything necessary for the test (microphone, webcam, printer, camera, pen, adequate number of sheets of paper).

**Office Hours**

By appointment; contact the instructor by email or at the end of class meetings.

*Parallel **Algorithms*

Introduction. Parallel Architectures. Parallel algorithm design. Message-Passing Programming. Sieve of Erathostenes. Floyd all-pairs shortest path algorithm. Performance analysis. Matrix-vector multiplication. Document classification. Matrix multiplication. Linear Systems. Finite Difference Methods. Sorting.

*Parallel Programming*

Message-Passing programming using MPI.

*Sequential* A*lgorithms*

Introduction. Order of growth. Analysis of algorithms. Decrease and conquer. Divide and conquer. Recurrences. Randomized algorithms. Transform and conquer. Dynamic programming. Greedy algorithms. Single Source Shortest Paths. Dijkstra algorithm. Breadth-First Search. Bellman-Ford Algorithm. Complexity and computability. NP-Completeness. The transition from sequential to parallel computing. Parallel complexity.

Introduction to Algorithms. Third edition. Cormen, Leiserson, Rivest, Stein. The MIT Press

Parallel Programming in C with MPI and OpenMP (International Edition). Michael J. Quinn. McGraw-Hill

##### PARALLEL ALGORITHMS (ING-INF/05)

##### DATA MINING

Degree course MATEMATICA

Subject area ING-INF/05

Course type Laurea Magistrale

Credits 6.0

Teaching hours Ore totali di attività frontale: 42.0

For matriculated on 2020/2021

Year taught 2020/2021

Course year 1

Semestre Secondo Semestre (dal 22/02/2021 al 28/05/2021)

Language INGLESE

Subject matter PERCORSO COMUNE (999)

Calculus. Probability theory. Linear Algebra. Programming skills.

**Knowledge and understanding**. The course describes methods and models for the analysis of large amounts of data. Students must have a solid background with a broad spectrum of basic knowledge related to data mining:

- they must have solid knowledge of data mining models and methodologies;

**Applying knowledge and understanding. **After the course the student should be able to:

- describe and use the main data mining techniques;
- tackle new data mining problems by selecting the appropriate methods and justifying his/her choices;
- tackle new data mining problems by designing suitable algorithms and evaluating the results;
- explain experimental results to people outside of statistical machine learning or computer science.

**Making judgements**. Students must have the ability to process complex and/or fragmentary data and must arrive at original and autonomous ideas and judgments, and consistent choices in the context of their work, which are particularly delicate in the profession of data scientist. The course promotes the development of independent judgment in the appropriate choice of technique/model for data processing and the critical ability to interpret the goodness of the results of the models/methods applied to the datasets under examination.

**Communication.** It is essential that students are able to communicate with a varied and composite audience, not culturally homogeneous, in a clear, logical and effective way, using the methodological tools acquired and their scientific knowledge and, in particular, the specialty vocabulary. Students should be able to organize effective dissemination and study material through the most common presentation tools, including computer-based ones, to communicate the results of data analysis processes, for example by using visualization and reporting tools aimed at different types of audiences.

**Learning skills. **Students must acquire the critical ability to relate, with originality and autonomy, to the typical problems of data mining and, in general, cultural issues related to other similar areas. They should be able to develop and apply independently the knowledge and methods learnt with a view to possible continuation of studies at higher (doctoral) level or in the broader perspective of cultural and professional self-improvement of lifelong learning. Therefore, students should be able to switch to exhibition forms other than the source texts in order to memorize, summarize for themselves and for others, and disseminate scientific knowledge.

**Office Hours**

By appointment; contact the instructor by email or at the end of class meetings.

Mining of Massive Datasets

J. Leskovec, A. Rajaraman and J. Ullman

Freely availableonline: http://www.mmds.org

Data Mining and Analysis

M. J. Zaki and W. Meira

Freely available online: http://dataminingbook.info

Neural Networks and Deep Learning

Charu C. Aggarwal

Springer

##### DATA MINING (ING-INF/05)

##### PARALLEL ALGORITHMS

Degree course COMPUTER ENGINEERING

Subject area ING-INF/05

Course type Laurea Magistrale

Credits 9.0

Teaching hours Ore totali di attività frontale: 81.0

For matriculated on 2019/2020

Year taught 2020/2021

Course year 2

Semestre Primo Semestre (dal 22/09/2020 al 18/12/2020)

Language INGLESE

Subject matter PERCORSO COMUNE (999)

Location Lecce

Calculus I and II, Probability Theory. Programming skills and working knowledge of the C programming language.

The course provides a modern introduction to design, analysis and implementation of sequential and parallel algorithms. In particular, the course is based on a pragmatic approach to parallel programming of message-passing algorithms through the C language and the MPI library.

**Knowledge and understanding. **Students must have a solid background with a broad spectrum of basic knowledge of sequential and parallel algorithms:

· the students must have the basic cognitive tools to think analytically, creatively, critically and in an inquiring way, and have the abstraction and problem-solving skills needed to cope with complex systems;

· they must have a solid knowledge of the design and implementation of sequential and parallel efficient algorithms;

· they must have the tools for analysing the resources used by algorithms;

· they must have a catalogue of the most well-known and efficient sequential and parallel algorithms for basic computational problems.

**Applying knowledge and understanding. **After the course the student should be able to:

· Describe and use the main design techniques for sequential algorithms;

· Design, prove the correctness and analyze the computational complexity of sequential algorithms;

· Understand the differences among several algorithms solving the same problem and recognize which one is better under different conditions;

· Describe and use basic sequential algorithms;

· Describe and use basic data structures; know about the existence of advanced data structures;

· Understand the difference between sequential and parallel algorithms;

· Design, implement and analyze message-passing based parallel algorithms in C using the MPI library;

· Describe and use basic parallel algorithms.

**Making judgements. **Students are guided to learn critically everything that is explained to them in class, to compare different approaches to solving algorithmic problems, and to identify and propose, in an autonomous way, the most efficient solution they find.

**Communication. **It is essential that students are able to communicate with a varied and composite audience, not culturally homogeneous, in a clear, logical and effective way, using the methodological tools acquired and their scientific knowledge and, in particular, the specialty vocabulary. The course promotes the development of the following skills of the student: ability to expose in precise and formal terms an abstract model of concrete problems, identifying the salient features of them and discarding the nonessential ones; ability to describe and analyze an efficient solution to the problem in question.

**Learning skills. **Students must acquire the critical ability to relate, with originality and autonomy, to the typical problems of data mining and, in general, cultural issues related to other similar areas. They should be able to develop and apply independently the knowledge and methods learnt with a view to possible continuation of studies at higher (doctoral) level or in the broader perspective of cultural and professional self-improvement of lifelong learning. Therefore, students should be able to switch to exhibition forms other than the source texts in order to memorize, summarize for themselves and for others, and disseminate scientific knowledge.

The course aims to enable students to abstract formal algorithmic models and problems from concrete computational problems, and to design efficient algorithmic solutions for them. This will be done using the following teaching method. Every computational problem will be introduced, motivating it with concrete examples. The presentation of each topic will be divided into four parts: 1. Description of the actual computational problem. 2. Modelling the real problem by means of an abstract problem. 3. Resolution of the abstract problem through an algorithm obtained through the application of the general techniques of design of algorithms introduced in the course. 4. Analysis of the resources used by the algorithm. The course consists of frontal lessons, and classroom exercises. There will be theoretical lessons aimed at learning the basic techniques for the project and analysis of algorithms, and a part of lessons devoted to exercises in which we will illustrate, with plenty of examples, how the theoretical knowledge acquired can be used in order to solve algorithmic problems of practical interest and implement parallel algorithms in C language through the MPI library.

The exam consists of a written test and a prototype implementation of a parallel software. The written test (3 hours, 21 points out of 30) covers theoretical topics related to the design and analysis of sequential and parallel algorithms in order to verify the student’s knowledge and understanding of the materials. In order to pass the written test, students must obtain at least 14 points out of 21. The parallel software prototype (9 points out of 30) is meant to verify the practice of parallel programming and the ability of the student to implement theoretical parallel algorithms or to parallelize a sequential algorithm. In order to pass the parallel programming challenge, students must obtain at least 4 points out of 9. Both the written test and the prototype implementation of parallel software are mandatory. Students must pass both the written test and the parallel programming challenge.

WARNING: THE FOLLOWING ARE DETAILED INSTRUCTIONS RELATED TO ONLINE WRITTEN TESTS USING MICROSOFT TEAMS

Written test of

Parallel Algorithms

Microsoft Teams is used to verify the student's identity and to supervise the test; the teacher will select each student asking him to show his identification document. The exam takes place within a team that students access through registration by the teacher or through a link that will be communicated via email. Students are asked to enter the team at least 15 minutes prior to the scheduled start time in order to perform recognition. Each student will participate in the meeting through a device equipped with a microphone and a webcam that must remain on for the duration of the test, framing the student and the paper on which he/she writes. The teacher will check the regularity of the students' work, immediately cancelling the test in case of irregularities of any kind.

For the distribution of the exam outline to the students, the teacher shares a pdf file in the TEAMS chat. The track is divided into 3 exercises for Parallel Algorithms, and the teacher releases to the students each of the exercises that compose it at fixed time intervals. The test is then carried out as follows:

1. The teacher releases the first exercise at the beginning of the roll call;

a. students print out the text of the exercise and begin drafting the paper on a sheet of paper clearly visible from the video capture device;

b. the teacher provides his/her estimate x of sufficient time to answer;

c. delivery of the paper related to the first exercise after x amount of time has elapsed;

d. Teacher waits 2 minutes for delivery;

2. Teacher delivers the second exercise;

a. students print out the text of the exercise and begin writing the paper on a sheet of paper clearly visible from the video capture device;

b. teacher gives his estimate y of sufficient time to answer;

c. delivery of the paper related to the second exercise after the time y has elapsed;

d. The teacher waits 2 minutes for the delivery;

3. Teacher releases the third exercise;

a. students print out the text of the exercise and begin writing the paper on a piece of paper that is clearly visible from the video capture device;

b. teacher gives his estimate z of sufficient time to answer;

c. delivery of the paper related to the second exercise after the time z has elapsed;

d. The teacher waits 2 minutes for the delivery;

4. The teacher informs the participating students that the roll call is closed.

For the delivery of the papers related to the individual exercises, the student is requested to take a photo of the relative sheets, including an image of the university booklet or, alternatively, an identification document. The photos should be placed in a directory called "Exercise_a_b_c_d" in which:

- a indicates the progressive number of the exercise (1, 2 or 3);

- b is PA (for Parallel Algorithms)

- c indicates the student's first name;

- d indicates the student's last name.

The directory must then be compressed, e.g. in zip format, and the resulting file must be delivered via team chat.

It is not possible to turn in an exercise before the allotted time and move on to the next exercise. The student who intends to abandon the test can give notice to the teacher at any time, and leave the team immediately afterwards. It is not possible to leave for any reason during the test. The student is required to ensure that he/she has everything necessary for the test (microphone, webcam, printer, camera, pen, adequate number of sheets of paper).

**Office Hours**

By appointment; contact the instructor by email or at the end of class meetings.

*Parallel **Algorithms*

Introduction. Parallel Architectures. Parallel algorithm design. Message-Passing Programming. Sieve of Erathostenes. Floyd all-pairs shortest path algorithm. Performance analysis. Matrix-vector multiplication. Document classification. Matrix multiplication. Linear Systems. Finite Difference Methods. Sorting.

*Parallel Programming*

Message-Passing programming using MPI.

*Sequential* A*lgorithms*

Introduction. Order of growth. Analysis of algorithms. Decrease and conquer. Divide and conquer. Recurrences. Randomized algorithms. Transform and conquer. Dynamic programming. Greedy algorithms. Single Source Shortest Paths. Dijkstra algorithm. Breadth-First Search. Bellman-Ford Algorithm. Complexity and computability. NP-Completeness. The transition from sequential to parallel computing. Parallel complexity.

Introduction to Algorithms. Third edition. Cormen, Leiserson, Rivest, Stein. The MIT Press

Parallel Programming in C with MPI and OpenMP (International Edition). Michael J. Quinn. McGraw-Hill

##### PARALLEL ALGORITHMS (ING-INF/05)

##### DATA MINING

Degree course MATEMATICA

Subject area ING-INF/05

Course type Laurea Magistrale

Credits 6.0

Teaching hours Ore totali di attività frontale: 42.0

For matriculated on 2019/2020

Year taught 2019/2020

Course year 1

Semestre Secondo Semestre (dal 24/02/2020 al 29/05/2020)

Language INGLESE

Subject matter PERCORSO COMUNE (999)

Location Lecce

Calculus. Probability and Statistics. Linear Algebra. Programming skills.

**Knowledge and understanding**. The course describes methods and models for the analysis of large amounts of data. Students must have a solid background with a broad spectrum of basic knowledge related to data mining:

- they must have solid knowledge of data mining models and methodologies;

**Applying knowledge and understanding. **After the course the student should be able to:

- describe and use the main data mining techniques;
- tackle new data mining problems by selecting the appropriate methods and justifying his/her choices;
- tackle new data mining problems by designing suitable algorithms and evaluating the results;
- explain experimental results to people outside of statistical machine learning or computer science.

**Making judgements**. Students must have the ability to process complex and/or fragmentary data and must arrive at original and autonomous ideas and judgments, and consistent choices in the context of their work, which are particularly delicate in the profession of data scientist. The course promotes the development of independent judgment in the appropriate choice of technique/model for data processing and the critical ability to interpret the goodness of the results of the models/methods applied to the datasets under examination.

**Communication.** It is essential that students are able to communicate with a varied and composite audience, not culturally homogeneous, in a clear, logical and effective way, using the methodological tools acquired and their scientific knowledge and, in particular, the specialty vocabulary. Students should be able to organize effective dissemination and study material through the most common presentation tools, including computer-based ones, to communicate the results of data analysis processes, for example by using visualization and reporting tools aimed at different types of audiences.

**Learning skills. **Students must acquire the critical ability to relate, with originality and autonomy, to the typical problems of data mining and, in general, cultural issues related to other similar areas. They should be able to develop and apply independently the knowledge and methods learnt with a view to possible continuation of studies at higher (doctoral) level or in the broader perspective of cultural and professional self-improvement of lifelong learning. Therefore, students should be able to switch to exhibition forms other than the source texts in order to memorize, summarize for themselves and for others, and disseminate scientific knowledge.

**Office Hours**

By appointment; contact the instructor by email or at the end of class meetings.

Introduction. Map-Reduce. Mining data streams. Frequent Items. Frequent Itemsets and association rules. Mining similar items and Locality-Sensitive Hashing. Graph analysis. Link analysis and PageRank. Clustering. Recommendation systems. Mining Social-Network Graphs. Dimensionality reduction. Classification.

Mining of Massive Datasets

J. Leskovec, A. Rajaraman and J. Ullman

Freely availableonline: http://www.mmds.org

Data Mining and Analysis

M. J. Zaki and W. Meira

Freely available online: http://dataminingbook.info

##### DATA MINING (ING-INF/05)

##### PARALLEL ALGORITHMS

Degree course COMPUTER ENGINEERING

Subject area ING-INF/05

Course type Laurea Magistrale

Credits 9.0

Teaching hours Ore totali di attività frontale: 81.0

For matriculated on 2018/2019

Year taught 2019/2020

Course year 2

Semestre Primo Semestre (dal 23/09/2019 al 20/12/2019)

Language INGLESE

Subject matter PERCORSO COMUNE (999)

Location Lecce

**Knowledge and understanding. **Students must have a solid background with a broad spectrum of basic knowledge of sequential and parallel algorithms:

· they must have the tools for analysing the resources used by algorithms;

**Applying knowledge and understanding. **After the course the student should be able to:

· Describe and use the main design techniques for sequential algorithms;

· Design, prove the correctness and analyze the computational complexity of sequential algorithms;

· Describe and use basic sequential algorithms;

· Describe and use basic data structures; know about the existence of advanced data structures;

· Understand the difference between sequential and parallel algorithms;

· Describe and use basic parallel algorithms.

**Making judgements. **Students are guided to learn critically everything that is explained to them in class, to compare different approaches to solving algorithmic problems, and to identify and propose, in an autonomous way, the most efficient solution they find.

**Communication. **It is essential that students are able to communicate with a varied and composite audience, not culturally homogeneous, in a clear, logical and effective way, using the methodological tools acquired and their scientific knowledge and, in particular, the specialty vocabulary. The course promotes the development of the following skills of the student: ability to expose in precise and formal terms an abstract model of concrete problems, identifying the salient features of them and discarding the nonessential ones; ability to describe and analyze an efficient solution to the problem in question.

**Learning skills. **Students must acquire the critical ability to relate, with originality and autonomy, to the typical problems of data mining and, in general, cultural issues related to other similar areas. They should be able to develop and apply independently the knowledge and methods learnt with a view to possible continuation of studies at higher (doctoral) level or in the broader perspective of cultural and professional self-improvement of lifelong learning. Therefore, students should be able to switch to exhibition forms other than the source texts in order to memorize, summarize for themselves and for others, and disseminate scientific knowledge.

The course aims to enable students to abstract formal algorithmic models and problems from concrete computational problems, and to design efficient algorithmic solutions for them. This will be done using the following teaching method. Every computational problem will be introduced, motivating it with concrete examples. The presentation of each topic will be divided into four parts: 1. Description of the actual computational problem. 2. Modelling the real problem by means of an abstract problem. 3. Resolution of the abstract problem through an algorithm obtained through the application of the general techniques of design of algorithms introduced in the course. 4. Analysis of the resources used by the algorithm. The course consists of frontal lessons using slides made available to students via the Moodle platform, and classroom exercises. There will be theoretical lessons aimed at learning the basic techniques for the project and analysis of algorithms, and a part of lessons of an exercise type in which you will illustrate, with plenty of examples, how the theoretical knowledge acquired can be used in order to solve algorithmic problems of practical interest and implement parallel algorithms in C language through the MPI library.

Oral exam. Optionally, a student may be assigned a small project. During the exam the student is asked to illustrate theoretical topics in order to verify his/her knowledge and understanding of the selected topics. The student may also be asked to design a very simple algorithm in order to assess his/her ability to identify and use the relevant design techniques; alternatively, the student may be asked to analyze the complexity of a small code fragment.

**Office Hours**

By appointment; contact the instructor by email or at the end of class meetings.

Sequential A*lgorithms*

Introduction. Order of growth. Analysis of algorithms. Decrease and conquer. Divide and conquer. Recurrences. Randomized algorithms. Transform and conquer. Dynamic programming. Greedy algorithms. Complexity and computability. NP-Completeness.

*Parallel **Algorithms*

Introduction. The transition from sequential to parallel computing. Parallel complexity. Parallel architectures. Parallel algorithm design. Message-Passing programming. Sieve of Erathostenes. Floyd all-pairs shortest path algorithm. Performance analysis. Matrix-vector multiplication. Document classification. Matrix multiplication.

Introduction to Algorithms. Third edition. Cormen, Leiserson, Rivest, Stein. The MIT Press

Parallel Programming in C with MPI and OpenMP International Edition (2004) Michael J. Quinn McGraw-Hill

##### PARALLEL ALGORITHMS (ING-INF/05)

##### DATA MINING

Degree course MATEMATICA

Subject area ING-INF/05

Course type Laurea Magistrale

Credits 6.0

Teaching hours Ore totali di attività frontale: 42.0

For matriculated on 2018/2019

Year taught 2018/2019

Course year 1

Semestre Secondo Semestre (dal 25/02/2019 al 31/05/2019)

Language INGLESE

Subject matter PERCORSO COMUNE (999)

Location Lecce

Calculus. Probability and Statistics. Linear Algebra. Programming skills.

**Knowledge and understanding**. The course describes methods and models for the analysis of large amounts of data. Students must have a solid background with a broad spectrum of basic knowledge related to data mining:

- they must have solid knowledge of data mining models and methodologies;

**Applying knowledge and understanding. **After the course the student should be able to:

- describe and use the main data mining techniques;
- tackle new data mining problems by selecting the appropriate methods and justifying his/her choices;
- tackle new data mining problems by designing suitable algorithms and evaluating the results;
- explain experimental results to people outside of statistical machine learning or computer science.

**Making judgements**. Students must have the ability to process complex and/or fragmentary data and must arrive at original and autonomous ideas and judgments, and consistent choices in the context of their work, which are particularly delicate in the profession of data scientist. The course promotes the development of independent judgment in the appropriate choice of technique/model for data processing and the critical ability to interpret the goodness of the results of the models/methods applied to the datasets under examination.

**Communication.** It is essential that students are able to communicate with a varied and composite audience, not culturally homogeneous, in a clear, logical and effective way, using the methodological tools acquired and their scientific knowledge and, in particular, the specialty vocabulary. Students should be able to organize effective dissemination and study material through the most common presentation tools, including computer-based ones, to communicate the results of data analysis processes, for example by using visualization and reporting tools aimed at different types of audiences.

**Learning skills. **Students must acquire the critical ability to relate, with originality and autonomy, to the typical problems of data mining and, in general, cultural issues related to other similar areas. They should be able to develop and apply independently the knowledge and methods learnt with a view to possible continuation of studies at higher (doctoral) level or in the broader perspective of cultural and professional self-improvement of lifelong learning. Therefore, students should be able to switch to exhibition forms other than the source texts in order to memorize, summarize for themselves and for others, and disseminate scientific knowledge.

**Office Hours**

By appointment; contact the instructor by email or at the end of class meetings.

Introduction. Map-Reduce. Mining data streams. Frequent Items. Frequent Itemsets and association rules. Mining similar items and Locality-Sensitive Hashing. Graph analysis. Link analysis and PageRank. Clustering. Recommendation systems. Mining Social-Network Graphs. Dimensionality reduction. Classification.

Mining of Massive Datasets

J. Leskovec, A. Rajaraman and J. Ullman

Freely availableonline: http://www.mmds.org

Data Mining and Analysis

M. J. Zaki and W. Meira

Freely available online: http://dataminingbook.info

##### DATA MINING (ING-INF/05)

##### ELEMENTI DI INFORMATICA

Corso di laurea INGEGNERIA DELLE TECNOLOGIE INDUSTRIALI

Settore Scientifico Disciplinare ING-INF/05

Tipo corso di studio Laurea

Crediti 6.0

Ripartizione oraria Ore totali di attività frontale: 54.0

Per immatricolati nel 2018/2019

Anno accademico di erogazione 2018/2019

Anno di corso 1

Semestre Secondo Semestre (dal 04/03/2019 al 04/06/2019)

Lingua ITALIANO

Percorso unico (A96)

Sede Lecce

Nessuno

Il corso fornisce da una parte una moderna introduzione alla programmazione orientata agli oggetti, in particolare al linguaggio di programmazione Java, dall’altra introduce i concetti fondamentali delle basi di dati.

**Conoscenze e comprensione**. I risultati attesi di apprendimento prevedono che al termine del corso gli studenti:

· conoscano e siano in grado di applicare la sintassi e la semantica caratterizzanti il linguaggio di programmazione Java;

· conoscano e siano in grado di applicare gli elementi distintivi della programmazione orientata agli oggetti;

· conoscano i principi fondamentali delle basi di dati;

· conoscano il modello Entità-Relazioni ed il design di basi di dati;

· conoscano il linguaggio SQL;

· abbiano acquisito la capacità di problem solving;

· siano in grado di sviluppare, compilare, eseguire ed effettuare il debug di applicazioni Java, incluse applicazioni per l’accesso ad una base di dati.

**Capacità di applicare conoscenze e comprensione. **Dopo aver seguito il corso, lo studente dovrebbe essere in grado di:

· descrivere una possibile soluzione algoritmica per un problema reale;

· effettuare il design di una base di dati;

· interrogare una base di dati;

· implementare in linguaggio Java un’applicazione.

**Autonomia di giudizio.** Gli studenti devono possedere la capacità di problem solving e devono pervenire a idee e giudizi originali e autonomi, a scelte coerenti nell’ambito del loro lavoro, particolarmente delicate nell’ambito della implementazione di una applicazione in un linguaggio orientato agli oggetti.Il corso promuovelo sviluppo dell’autonomia di giudizio nella scelta appropriata della soluzione migliore relativa a semplici problemi e la capacità critica di interpretare la bontà dei risultati ottenuti.

**Abilità comunicative.** È fondamentale che gli studenti siano in grado di comunicare con un pubblico vario e composito, non omogeneo culturalmente, in modo chiaro, logico ed efficace, utilizzando gli strumenti metodologici acquisiti e le loro conoscenze scientifiche e, in particolar modo, il lessico di specialità.

Il corso favorisce lo sviluppo delle abilità inerenti le capacità di esporre in termini precisi e formali snippets di codice sorgente in linguaggio Java, modelli Entità-Relazioni, database designs, queries SQL e la descrizione di possibili soluzioni algoritmiche a problemi reali.

**Capacità di apprendimento. **Gli studenti devono acquisire la capacità critica di rapportarsi, con originalità e autonomia, alle problematiche tipiche della programmazione orientata agli oggetti. Devono essere in grado di rielaborare e di applicare autonomamente le conoscenze nella più ampia prospettiva di auto-aggiornamento culturale e professionale dell'apprendimento permanente. In particolare, devono poter riusare le conoscenze acquisite nell’ambito dell’apprendimento di altri linguaggi di programmazione orientati agli oggetti e, più in generale, nell’ambito dell’Ingegneria del Software. Devono inoltre essere in grado di riusare le conoscenze legate alle basi di dati indipendentemente dallo specifico DBMS utilizzato.

Il corso si articola in lezioni frontali che si avvalgono dell’uso di slides rese disponibili agli studenti mediante la piattaforma Moodle, ed esercitazioni in aula. Le lezioni frontali sono finalizzate al miglioramento delle conoscenze e capacità di comprensione degli studenti mediante l’esposizione del linguaggio di programmazione Java, dei principi di progettazione orientata agli oggetti e dei principi relativi alle basi di dati; gli studenti sono invitati a partecipare alla lezione con autonomia di giudizio, formulando domande, presentando esempi e discutendo possibili soluzioni alternative. Le esercitazioni sono finalizzate sia alla comprensione degli algoritmi e dei codici Java presentati, ed allo sviluppo della capacità di problem solving (dato un problema, lo studente deve analizzarlo ed individuare una soluzione algoritmica appropriata, implementandola correttamente in Java), sia alla comprensione delle queries SQL, dei modelli Entità-Relazioni e dei database designs presentati.

L’esame consiste in una prova scritta,nella quale lo studente dovràdimostrare di aver acquisito da una parte la capacità di modellare ed interrogare una base di dati, dall’altra la capacità di problem solving medianteimplementazionein linguaggio Java di un algoritmorisolutivo di un problema, utilizzandocorrettamente le principali strutture dati e glialgoritmidi base visti a lezione.

**Svolgimento prova scritta mediante Microsoft Teams **

**degli appelli di**

**Elementi di Informatica**

Per la verifica dell’identità dello studente e la sorveglianza della prova si usa Microsoft Teams; il docente selezionerà ogni singolo studente chiedendogli di mostrare il documento di identificazione. L’esame si svolge all’interno di un team a cui gli studenti accedono mediante iscrizione da parte del docente o tramite un link che sarà comunicato via mail. Gli studenti sono invitati ad accedere al team almeno 15 minuti prima dell’orario di inizio previsto, al fine di effettuare il riconoscimento. Ogni studente parteciperà al meeting mediante un dispositivo dotato di microfono e webcam che dovrà rimanere accesa per tutta la durata della prova, inquadrando lo studente ed il foglio su cui scrive. Il docente controllerà la regolarità del lavoro degli studenti, annullando immediatamente la prova in caso di irregolarità di ogni genere.

Per la distribuzione agli studenti della traccia d’esame, il docente condivide nella chat di TEAMS un file pdf. La traccia è suddivisa in 2 esercizi, ed il docente rilascia agli studenti ognuno degli esercizi che la compongono ad intervalli di tempo determinati. La prova si svolge quindi come segue:

- Il docente rilascia il primo esercizio all’inizio dell’appello;
- gli studenti stampano il testo dell’esercizio ed iniziano la stesura dell’elaborato su un foglio ben visibile dal dispositivo di acquisizione video;
- il docente fornisce la sua stima
*x*del tempo sufficiente per rispondere; - consegna dello svolgimento relativo al primo esercizio trascorso il tempo
*x*; - Il docente attende 2 minuti per la consegna;

- Il docente rilascia il secondo esercizio;
- gli studenti stampano il testo dell’esercizio ed iniziano la stesura dell’elaborato su un foglio ben visibile dal dispositivo di acquisizione video;
- il docente fornisce la sua stima
*y*del tempo sufficiente per rispondere; - consegna dello svolgimento relativo al secondo esercizio trascorso il tempo
*y*; - Il docente attende 2 minuti per la consegna;

- Il docente comunica agli studenti partecipanti la chiusura dell’appello.

Per la consegna degli elaborati relativi ai singoli esercizi, si richiede che lo studente fotografi i fogli relativi, includendo anche un’immagine del libretto universitario o, in alternativa, un documento di riconoscimento. Le foto dovranno essere inserite in una directory chiamata “Esercizio_*a*_*b*_*c_d*” in cui:

*a*indica il numero progressivo dell’esercizio (1, 2 o 3);*b*è EI (per Elementi di Informatica)*c*indica il nome dello studente;*d*indica il cognome dello studente.

La directory deve quindi essere compressa, ad esempio in formato zip, ed il file risultante deve essere consegnato tramite la chat del team.

Non è possibile consegnare lo svolgimento di un esercizio prima del tempo stabilito e passare allo svolgimento dell’esercizio successivo. Lo studente che intende rinunciare alla prova può darne comunicazione al docente in qualunque momento, ed uscire immediatamente dopo dal team. Non è possibile allontanarsi per nessun motivo durante lo svolgimento della prova. Lo studente è tenuto ad assicurarsi di disporre di tutto quanto il necessario allo svolgimento (microfono, webcam, stampante, fotocamera, penna, numero di fogli adeguato).

**Orario di ricevimento**

Previo appuntamento da concordare via email o al termine delle lezioni.

Introduzione al corso. Computers, Internet e Java. ApplicazioniJava. Input/Output ed operatori.Classi, oggetti, metodi e stringhe.Strutture di controllo. Operatori di assegnamento, incremento e decremento. Operatori logici. Dettagli sui metodi. Array ed ArrayList. Dettagli su classi e oggetti. Ereditarietà. Polimorfismo. Interfacce. Dettagli sulla gestione delle eccezioni. Stringhe, caratteri ed espressioni regolari. Files, Input/Output Streams, NIO e serializzazione XML. Generic collections. Espressioni lambda e streams. Ricorsione. Algoritmi di ricerca ed ordinamento, notazione asintotica. Concorrenza. Basi di dati ed utenti. Concetti ed architettura dei DBMS (Database Management Systems). Modellazione dei dati mediante il modello Entità-Relazioni (ER). Il modello relazionale, schemi e vincoli. Il linguaggio SQL. Accesso ai database con JDBC.

Java How to Program, Early Objects, 11th Edition

Deitel & Deitel

Pearson

Fundamentals Of Database Systems, Seventh Edition

Elmasri, Navathe

Pearson

##### ELEMENTI DI INFORMATICA (ING-INF/05)

### Tesi

Gli studenti interessati possono chiedere una tesi relativa al data mining, machine learning e deep learning sequenziale, parallelo e distribuito. In alternativa, gli studenti possono anche chiedere una tesi di laurea in materia di sicurezza e crittografia. Infine, la tesi può anche essere strettamente correlata al calcolo parallelo e distribuito, inclusi cloud, P2P e internet of things.

### Pubblicazioni

Profilo ORCID

Profilo DBLP

Profilo Google Scholar

Profilo ResearchGate

**Last update: June 16, 2022**

**Edited books**

[5] Massimo Cafaro, Jinoh Kim, Alex Sim. Proceedings of the 4th International Workshop on Systems and Network Telemetry and Analytics, SNTA@HPDC 2021, Stockholm, Sweden, June 21, 2021. ACM 2021, ISBN 978-1-4503-8386-8

[4] Massimo Cafaro, Jinoh Kim, Alex Sim. Proceedings of the 3rd International Workshop on Systems and Network Telemetry and Analytics, SNTA@HPDC 2020, Stockholm, Sweden, June 23, 2020. ACM 2020, ISBN 978-1-4503-7980-9

[3] 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2018, Washington, DC, USA, May 1-4, 2018. IEEE Computer Society 2018, ISBN 978-1-5386-5815-4. Edited by Esam El-Araby, Dhabaleswar K. Panda, Sandra Gesing, Amy W. Apon, Volodymyr V. Kindratenko, Massimo Cafaro, Alfredo Cuzzocrea. DOI: 10.1109/CCGRID.2018.00-76

[2] Grids, Clouds and Virtualization, Springer-Verlag, London, 2011, Series: Computer Communications and Networks. Edited by Massimo Cafaro and Giovanni Aloisio. Hardcover ISBN 978-0-85729-048-9, Softcover ISBN 978-1-4471-2592-1, eBook ISBN 978-0-85729-049-6. DOI: 10.1007/978-0-85729-049-6

[1] Workshop Proceedings of the Conference Grid and Pervasive Computing 2009, May 4-8 2009, Geneva, Switzerland, IEEE, edited by H. Muller, J. Chen, M. Cafaro, J. H. Park and N. Abdennadher, ISBN 978-0-7695-3677-4. DOI: 10.1109/GPC.2009.5

**Journals**

[48] Cesaroni, C., Spogli, L., Franceschi, G. D., Damaceno, J. G., Grzesiak, M., Vani, B., Monico, J. F. G., Romano, V., Alfonsi L. and Cafaro, M. (2021). A measure of ionospheric irregularities: zonal velocity and its implications for L-band scintillation at low-latitudes. Earth and Planetary Physics, 5(5), pp. 1–12. Amer Geophysical Union. Print and Electronic ISSN 2096-3955 DOI 10.26464/epp2021042

[47] I. Epicoco, C. Melle, M. Cafaro, M. Pulimeno. AFQN: Approximate Qn Estimation in Data Streams. To appear in Applied Intelligence. Springer. Print ISSN 0924-669X, Electronic ISSN 1573-7497.

[46] Luca Spogli, Hossein Ghobadi, Antonio Cicone, Lucilla Alfonsi, Claudio Cesaroni, Nicola Linty, Vincenzo Romano, Massimo Cafaro. Adaptive Phase Detrending for GNSS Scintillation Detection: A Case Study Over Antarctica. IEEE Geoscience and Remote Sensing Letters, vol. 19, pp. 1-5, 2022, Art no. 8009905, Print ISSN 1545-598X Electronic ISSN 1558-0571, DOI: 10.1109/LGRS.2021.3067727

[45] M. Pulimeno, I. Epicoco, M. Cafaro. Distributed mining of time-faded heavy hitters. Information Sciences, Elsevier, Volume 545, 2021, Pages 633-662, ISSN 0020-0255, DOI 10.1016/j.ins.2020.09.048

[44] M. Cafaro, C. Melle, M. Pulimeno, I. Epicoco. Fast Online Computation of the Qn Estimator with Applications to the Detection of Outliers in Data Streams. Expert Systems With Applications, Elsevier, Volume 164, February 2021, Article 113831, ISSN: 0957-4174, DOI: 10.1016/j.eswa.2020.113831

[43] F. Ventruto, M. Pulimeno, M. Cafaro, I. Epicoco. On frequency estimation and detection of heavy hitters in data streams. Future Internet, MDPI, Volume 12, Issue 9, Article 158, September 2020, ISSN 1999-5903, DOI: 10.3390/fi12090158

[42] I. Epicoco, C. Melle, M. Cafaro, M. Pulimeno and G. Morleo. UDDSketch: Accurate Tracking of Quantiles in Data Streams. IEEE Access, vol. 8, pp. 147604-147617, 2020, ISSN: 2169-3536, DOI: 10.1109/ACCESS.2020.3015599

[41] Ghobadi H., Spogli L., Alfonsi L., Cesaroni C., Cicone A., Linty N., Romano V., Cafaro M. Disentangling ionospheric refraction and diffraction effects in GNSS raw phase through fast iterative filtering technique. GPS Solutions 24, 85 (2020), Springer, ISSN 1080-5370, DOI: 10.1007/s10291-020-01001-1

[40] Juliana G. Damaceno, Karl Bolmgren, Jon Bruno, Giorgiana De Franceschi, Cathryn Mitchell, Massimo Cafaro, "GPS loss of lock statistics over Brazil during the 24th solar cycle", Advances in Space Research, Elsevier, Volume 66, Issue 2, 2020, Pages 219-225, ISSN 0273-1177, DOI 10.1016/j.asr.2020.03.041

[39] M. Cafaro, I. Epicoco, M. Pulimeno, “CMSS: Sketching Based Reliable Tracking of Large Network Flows”. Future Generation Computer Systems, Elsevier, Volume 101, 2019, Pages 770-784, ISSN 0167-739X, DOI 10.1016/j.future.2019.07.031

[38] M. Cafaro, I. Epicoco, M. Pulimeno, Special Issue on Parallel and Distributed Data Mining, Information Sciences, Elsevier, Volume 496, 2019, Pages 284-286,, ISSN 0020-0255, DOI 10.1016/j.ins.2019.05.062

[37] H. Ghobadi, P. Testa, L. Spogli, M. Cafaro, L. Alfonsi, V. Romano, R. Bru, "User-Oriented ICT Cloud Architecture for High-Accuracy GNSS-Based Services”, Sensors 19, no. 11: 2635. 2019, ISSN 1424-8220, DOI:10.3390/s19112635. https://www.mdpi.com/1424-8220/19/11/2635

[36] M. Cafaro, I. Epicoco, M. Pulimeno, “Mining frequent items in unstructured P2P networks”, Future Generation Computer Systems, Elsevier, Volume 95, 2019, Pages 1-16, DOI 10.1016/j.future.2018.12.030, ISSN 0167-739X

[35] M.Cafaro, P. Pelle’, “Space-efficient Verifiable Secret Sharing Using Polynomial Interpolation”, IEEE Transactions on Cloud Computing, IEEE, vol. 6, no. 2, pp. 453-463, April-June 2018, DOI 10.1109/TCC.2015.2396072, ISSN: 2168-7161, http://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=7018923&isnumber=8372844

[34] M. Cafaro, M. Pulimeno, I. Epicoco, "Parallel mining of time-faded heavy hitters", Expert Systems With Applications, Elsevier, Volume 96, 2018, Pages 115-128, ISSN: 0957-4174, DOI: 10.1016/j.eswa.2017.11.021

[33] M. Cafaro, I. Epicoco, M. Pulimeno, G. Aloisio, "On Frequency Estimation and Detection of Frequent Items in Time Faded Streams”, IEEE Access, IEEE, Volume 5, Issue 1, (2017), pp. 24078-24093, ISSN: 2169-3536, DOI: 10.1109/ACCESS.2017.2757238

[32] I. Epicoco, M. Cafaro, M. Pulimeno, Fast and Accurate Mining of Correlated Heavy Hitters”, Data Mining & Knowledge Discovery, Springer, Volume 32, Issue 1, 2018, pp. 162-186, ISSN: 1384-5810, DOI:10.1007/s10618-017-0526-x

[31] Arcangelo Messina, Massimo Cafaro. "Parallel damage detection through finite frequency changes on multicore processors", Structural Engineering and Mechanics, Techno-Press, Volume 63, No. 4 (2017), pp. 457-469. ISSN: 1225-4568, DOI: https://doi.org/10.12989/sem.2017.63.4.457

[30] M. Cafaro, M. Pulimeno, I. Epicoco, G. Aloisio, “Parallel Space Saving on Multi and Many-Core Processors”, Concurrency and Computation: Practice and Experience, John Wiley and Sons Ltd, Volume 30, Issue 7, DOI: 10.1002/cpe.4160, Print ISSN 1532-0626, Online ISSN 1532-0634

[29] M. Cafaro, M. Pulimeno, I. Epicoco, G. Aloisio, “Mining frequent items in the time fading model”, Information Sciences, Elsevier, Volume 370-371, 2016, pp. 221-238, ISSN: 0020-0255, DOI: 10.1016/j.ins.2016.07.077 available on ScienceDirect http://www.sciencedirect.com/science/article/pii/S0020025516305618 and arXiv https://arxiv.org/abs/1601.03892

[28] M. Cafaro, M. Pulimeno, P. Tempesta, “A Parallel Space Saving Algorithm For Frequent Items and the Hurwitz zeta distribution”, Information Sciences, Elsevier, Volume 329, 2016, pp. 1 - 19, ISSN: 0020-0255, DOI: 10.1016/j.ins.2015.09.003 available on ScienceDirect http://www.sciencedirect.com/science/article/pii/S002002551500657X and arXiv http://arxiv.org/abs/1401.0702

[27] M. Cafaro, R. Civino, B. Masucci, "On the Equivalence of Two Security Notions for Hierarchical Key Assignment Schemes in the Unconditional Setting", IEEE Transactions on Dependable and Secure Computing, Volume 12, Issue 4 (2015), pp. 485 - 490, IEEE, DOI 10.1109/TDSC.2014.2355841, http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6894147, ISSN: 1545-5971

[26] M. Cafaro, M. Mirto, G. Aloisio, "Preference-Based Matchmaking of Grid Resources with CP-Nets", Journal of Grid Computing, Volume 11, Issue 2 (2013), pp. 211-237, Springer Netherlands, DOI: 10.1007/s10723-012-9235-2 Print ISSN 1570-7873, Online ISSN 1572-9184

[25] M. Cafaro, P. Tempesta: Finding frequent items in parallel. Concurrency and Computation: Practice and Experience Volume 23 Issue 15 pp. 1774-1788 (2011), John Wiley and Sons Ltd., Chichester, UK DOI: 10.1002/cpe.1761 Print ISSN 1532-0626, Online ISSN 1532-0634

[24] M. Cafaro, H. Müller, N. Abdennadher: Special Section: Grid and Pervasive Computing 2009. Future Generation Computer Systems Volume 27, Issue 5, May 2011, pp. 587-589, Elsevier, DOI:10.1016/j.future.2010.11.019, ISSN 0167-739X

[23] J. Chen, M. Cafaro: Special Issue: 3rd International Workshop on Workflow Management and Applications in Grid Environments (WaGe2008). Concurrency and Computation: Practice and Experience Volume 21 Issue 16 pp. 1961-1964 (2009) John Wiley and Sons Ltd., DOI: 10.1002/cpe.1459, Print ISSN 1532-0626, Online ISSN 1532-0634

[22] M. Cafaro, V. De Bene, G. Aloisio, “Deterministic Parallel Selection Algorithms on Coarse Grained Multicomputers”, Concurrency and Computation: Practice and Experience, Volume 21 Issue 18 (2009), pp. 2336–2354, John Wiley and Sons Ltd., DOI: 10.1002/cpe.1453, Print ISSN 1532-0626, Online ISSN 1532-0634

[21] M. Cafaro, I. Epicoco, S. Fiore, D. Lezzi, S. Mocavero, G. Aloisio, “Near Real-Time Parallel Processing and Advanced Data Management of SAR Images in Grid Environments”, Journal of Real-Time Image Processing, Special issue on architectures and techniques for real-time processing of remotely sensed images, Volume 4, Number 3, 2009, pp. 219-227, Springer-Verlag, DOI 10.1007/s11554-009-0119-z, Print ISSN 1861-8200, Online ISSN 1861-8219

[20] M. Cafaro, I. Epicoco, M. Mirto, D. Lezzi, G. Aloisio, "The Grid Resource Broker Workflow Engine", Concurrency and Computation: Practice and Experience, Wiley, Special Issue: 2nd International Workshop on Workflow Management and Applications in Grid Environments (WaGe2007), Volume 20, Issue 15, pp. 1725 – 1739, 2008, John Wiley & Sons, Ltd., DOI: 10.1002/cpe.1318, Print ISSN 1532-0626, Online ISSN 1532-0634

[19] M. Mirto, S. Fiore, I. Epicoco, M. Cafaro, S. Mocavero, E. Blasi, G. Aloisio, "A Bioinformatics Grid Alignment Toolkit", Future Generation Computer Systems, Elsevier, Volume 24, Number 7, pp. 752-762, 2008, DOI:10.1016/j.future.2008.02.001, ISSN 0167-739X

[18] M. Mirto, M. Cafaro, S. Fiore, Daniele Tartarini, G. Aloisio, "A Grid-enabled Protein Secondary Structure Predictor", IEEE Transactions on NanoBioscience, IEEE, Volume 6, Issue 2, pp. 124-130, June 2007, DOI 10.1109/TNB.2007.897475, ISSN 1536-1241

[17] G. Aloisio, M. Cafaro, G. Carteni, I. Epicoco, S. Fiore, D. Lezzi, M. Mirto , S. Mocavero, "The Grid Resource Broker Portal", Concurrency and Computation: Practice and Experience, Special Issue on Grid Computing Environments, Volume 19, Issue 12 (2007), pp. 1663-1670, John Wiley & Sons, Ltd., DOI: 10.1002/cpe.1131, Print ISSN 1532-0626, Online ISSN 1532-0634

[16] Y. Yang, O. F. Rana, D.W. Walker, R. Williams, C. Georgousopoulos, M. Cafaro, G. Aloisio, "An agent infrastructure for on-demand processing of remote-sensing archives", Int. J. on Digital Libraries Volume 5, Issue 2, pp 120-132 (2005), Springer-Verlag, DOI 10.1007/s00799-003-0054-8, Print ISSN 1432-5012, Online ISSN 1432-1300

[15] G. Aloisio, M.C. Barba, M. Cafaro, E. Blasi, S. Fiore, M. Mirto, "A Web Service based Grid Portal for Edgebreaker Compression", Methods of Information in Medicine, Special issue on HealthGrid Volume 44 Issue 2, pp. 233-238, Schattauer Publishers (2005), ISSN 0026-1270

[14] G. Aloisio, M. Cafaro, G. Carteni, I. Epicoco, G. Quarta (2005). A grid portal for Earth Observation community. IL NUOVO CIMENTO DELLA SOCIETÀ ITALIANA DI FISICA C, GEOPHYSICS AND SPACE PHYSICS, Società Italiana di Fisica, Bologna, vol. 28 issue 2, pp. 193-203 (2005), DOI: 10.1393/ncc/i2005-10182-5, Print ISSN 2037-4909, Online ISSN 1826-9885

[13] G.P. Marra, I. Schipa, G. Aloisio, M. Cafaro, D. Conte, C. Elefante, C. Mangia, M. Miglietta, U. Rizza, A. Tanzarella (2005). G-AQFS (Grid Air Quality Forecast System): An experimental system based on GRID computing technologies to forecast atmospheric dispersion of pollutants. IL NUOVO CIMENTO DELLA SOCIETÀ ITALIANA DI FISICA. C, GEOPHYSICS AND SPACE PHYSICS, Società Italiana di Fisica, Bologna, vol. 28 issue 2, pp. 183-192 (2005), Print ISSN 2037-4909, Online ISSN 1826-9885

[12] G. Aloisio, M. Cafaro, S. Fiore, I. Epicoco, M. Mirto, S. Mocavero, “Performance Analysis of Information Services in a Grid Environment”, Journal of Systemics, Cybernetics and Informatics, Volume 2, Number 5, 2004, pp. 24-30, Online ISSN 1690-4524

[11] G. Aloisio, M.C. Barba, E. Blasi, M. Cafaro, S. Fiore, M. Mirto, “BIG: a Grid Portal for Biomedical Data and Images”, Journal of Systemics, Cybernetics and Informatics, Volume 2, Number 3, 2004, pp. 10-18, Online ISSN 1690-4524

[10] G. Aloisio, M. Cafaro, R. Cesari, C. Mangia, G.P. Marra, M. Miglietta, M. Mirto, U. Rizza, I. Schipa, A. Tanzarella, "G-AQFS: Grid computing exploitation for the management of air quality in presence of complex meteorological circulations", Journal of Digital Information Management, Volume 2, Number 2, pp. 67-73, June 2004, Digital Information Research Foundation (DIRF) Press, ISSN 0972-7272

[9] G. Aloisio, M. Cafaro, M. Mirto, S. Fiore, "Early Experiences with the GRelC Library", Journal of Digital Information Management, Volume 2, Number 2, pp. 54-60, June 2004, Digital Information Research Foundation (DIRF) Press, ISSN 0972-7272

[8] G. Aloisio, M. Cafaro, "A Dynamic Earth Observation System", Parallel Computing, Volume 29, Issue 10 (2003), pp. 1357-1362, Special Issue on High performance computing with geographical data, Elsevier, DOI:10.1016/j.parco.2003.04.002, ISSN 0167-8191

[7] G. Aloisio, M. Cafaro, D. Lezzi, "The Desktop Grid Environment Enabler", Computing and Informatics, Volume 21, Number 4 (2002), pp. 333-345, Special Issue on Grid Computing, Slovak Academy of Sciences, Institute of Informatics, ISSN 1335-9150

[6] G. Aloisio, M. Cafaro, I. Epicoco, "Early experiences with the GridFTP protocol using the GRB-GSIFTP library", Future Generation Computer Systems, Volume 18, Issue 8 (2002), pp. 1053-1059, Special Issue on Grid Computing: Towards a New Computing Infrastructure, North-Holland, DOI:10.1016/S0167-739X(02)00084-5, ISSN 0167-739X

[5] G. Aloisio, M. Cafaro, "Web-based access to the grid using the Grid Resource Broker Portal", Concurrency and Computation: Practice and Experience, Volume 14 Issue 13-15 (2002), pp. 1145-1160, Special Issue on Grid Computing Environments, John Wiley & Sons, Ltd., DOI: 10.1002/cpe.677, Print ISSN 1532-0626, Online ISSN 1532-0634

[4] G. Aloisio, M. Cafaro, E. Blasi, I. Epicoco, "The Grid Resource Broker, a Ubiquitous Grid Computing Framework", Journal of Scientific Programming, Volume 10, Number 2 (2002), pp. 113-119, Special Issue on Grid Computing, IOS Press, Amsterdam, Print ISSN 1058-9244, Online ISSN 1875-919X

[3] G. Aloisio, M. Cafaro, C. Kesselman, R. Williams, "Web Access to SuperComputing", IEEE Computing in Science and Engineering, IEEE, Volume 3, Issue 6 (2001), pp. 66-72, DOI 10.1109/5992.963430, ISSN 1521-9615

[2] G. Aloisio, M.Cafaro, P. Beraldi, F. Guerriero, R. Musmanno,"An algorithm for solving the distributed termination detection problem": Parallel Algorithms and Applications, Volume 14, Issue 2, pp. 149-164, (1999) Taylor and Francis, DOI 10.1080/10637199808947383, ISSN 1063-7192

[1] B. Paternoster, M. Cafaro, "Computation of interval of stability of Runge-Kutta-Nystrom methods", Journal of Symbolic Computation, Elsevier, Volume 25, Issue 3 (1998) pp. 383-394, DOI:10.1006/jsco.1997.0183, ISSN 0747-7171

**Book Chapters**

[12] M. Cafaro, M. Pulimeno (2019), "Frequent Itemset Mining”. In: Business and Consumer Analytics: New Directions, Springer, DOI 10.1007/978-3-030-06222-4_6, pp. 269—304, book ISBN 978-3-030-06221-7, ebook ISBN 978-3-030-06222-4

[11] M. Cafaro, I. Epicoco and M. Pulimeno (2019) Data Mining: Mining Frequent Patterns, Associations Rules, and Correlations. In: Guenther, R. and Steel, D. (eds.), Encyclopedia of Bioinformatics and Computational Biology, vol. 1, pp. 358–366. Elsevier.

[10] M. Cafaro, I. Epicoco and M. Pulimeno (2019) Techniques for Designing Bioinformatics Algorithms. In: Guenther, R. and Steel, D. (eds.), Encyclopedia of Bioinformatics and Computational Biology, vol. 1, pp. 5–14. Elsevier.

[9] M. Cafaro, G. Aloisio, “Grids, Clouds and Virtualization”, in Grids, Clouds and Virtualization, Springer-Verlag, London, 2011, pp. 1-21, Series: Computer Communications and Networks. Edited by Massimo Cafaro and Giovanni Aloisio. Hardcover ISBN 978-0-85729-048-9, Softcover ISBN 978-1-4471-2592-1, eBook ISBN 978-0-85729-049-6

[8] S. Fiore, A. Negro, S. Vadacca, M. Cafaro, G. Aloisio, R. Barbera and E. Giorgio, “An Architectural Overview of the GRelC Data Access Service”, in Handbook of Research on Grid Technologies and Utility Computing: Concepts for Managing Large-Scale Applications, Emmanuel Udoh (Editor), Frank Wang (Co-Editor), pp. 98-108, IGI Global, 2009, DOI 10.4018/978-1-60566-184-1.ch010, ISBN13: 9781605661841 ISBN10: 1605661848 EISBN13: 9781605661858

[7] M. Mirto, I. Epicoco, M. Cafaro, S. Fiore, M. Passante, A. negro and G. Aloisio, “ProGenGrid: A Grid Problem Solving Environment for Bioinformatics”, in Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine, and Healthcare, Mario Cannataro (Editor), pp. 269-291, IGI Global, 2009, DOI: 10.4018/978-1-60566-374-6.ch014, ISBN13: 9781605663746 ISBN10: 1605663743 EISBN13: 9781605663753

[6] M. Cafaro et al., “The LIBI Grid Platform for Bioinformatics”, in Handbook of Research on Computational Grid Technologies for Life Sciences, Biomedicine, and Healthcare, Mario Cannataro (Editor), pp. 577-613, IGI Global, 2009, DOI: 10.4018/978-1-60566-374-6.ch029, ISBN13: 9781605663746 ISBN10: 1605663743 EISBN13: 9781605663753

[5] S. Fiore, M. Cafaro, M. Mirto, S. Vadacca, A. Negro and G. Aloisio, "The GRelC Project: state of the art and future directions", in Advances in Parallel Computing”, “High Performance Computing and Grids in Action”, L. Grandinetti (Ed), pp. 331-344, IOS Press, 2008, ISBN 978-1-58603-839-7 (print) 978-1-60750-313-2 (online), Series “Advances in Parallel Computing”, Volume 16

[4] M. Cafaro, I. Epicoco, G. Quarta, Sandro Fiore, G. Aloisio, “Design and Implementation of a Grid Computing Environment for Remote Sensing”, in “High-Performance Computing in Remote Sensing”, A. Plaza and C. Chang (Eds), pp. 281-308, Chapman & Hall/CRC, 2008, ISBN13 9781584886624 ISBN10 1-58488-662-5

[3] G. Aloisio, M. Cafaro, I. Epicoco, "A Grid Software Process", in "Grid Computing: Software Environments and Tools", Jose C. Cunha and Omer F. Rana (Eds), pp. 75-98, Springer-Verlag London, 2006, DOI: 10.1007/1-84628-339-6_4, Print ISBN 978-1-85233-998-2 Online ISBN 978-1-84628-339-0

[2] G. Aloisio, M. Cafaro, I. Epicoco, J. Nabrzyski, "The EU GridLab Project: A Grid Application Toolkit and Testbed", in "Engineering the Grid: Status and Perspective", pp. 123-138, L. T. Yang, Jack Dongarra, Adolfy Hoisie, Beniamino Di Martino and Hans Zima (Eds), American Scientific Publisher 2006, ISBN: 1-58883-038-1

[1] G. Aloisio, M. Cafaro, S. Fiore, M. Mirto, "The Grid Relational Catalog Project", in "Grid Computing: The New Frontiers of High Performance Computing”, Series “Advances in Parallel Computing”, Volume 14, pp.129-155, Elsevier, 2005, L. Grandinetti (Ed), ISBN: 978-0-444-51999-3

**Proceedings**

[73] A. Mariani, I. Epicoco, M. Cafaro, M. Pulimeno. Grid-based Contraction Clustering in a Peer-to-Peer Network. To appear in Proceedings of LOD 2022 - The 8th International Online & Onsite Conference on Machine Learning, Optimization, and Data Science – September 18 – 22, 2022 – Certosa di Pontignano, Siena – Tuscany, Italy, https://lod2022.icas.cc

[72] A. Della Monaca, M. Cafaro, M. Pulimeno, I. Epicoco. An Adaptive Clustering Approach for Distributed Outlier Detection in Data Streams. To appear in Proceedings of DCAI 2022 - The 19th International Conference on Distributed Computing and Artificial Intelligence, L'Aquila (Italy), 13th-15th July, 2022, www.dcai-conference.net

[71] J. G. Damaceno, C. Cesaroni, L. Spogli, G. D. Franceschi and M. Cafaro, "Multi-instrumental analyses of the September 2017 space weather storm over Brazil”, 2019 URSI Asia-Pacific Radio Science Conference (AP-RASC), New Delhi, India, 2019, pp. 1-4. DOI: 10.23919/URSIAP-RASC.2019.8738327

[70] Damaceno, J.G., Cesaroni, C., Spogli, L., Grzesiak, M., De Franceschi, G., Cafaro, M., “Regional short-term forecasting model to predict ionospheric scintillation and TEC at low latitudes”, 2019 URSI Asia-Pacific Radio Science Conference (AP-RASC), New Delhi, India, 2019, DOI: 10.23919/URSIAP-RASC.2019.8738621

[69] Zhou, W., Zomaya, A., Fox, G., Martino, B.D., Yang, L., Dong, M., Ranjan, R., Cafaro, M., Wang, W., “Message from the ISPA 2018 Chairs”

(2019) Proceedings - 16th IEEE International Symposium on Parallel and Distributed Processing with Applications, 17th IEEE International Conference on Ubiquitous Computing and Communications, 8th IEEE International Conference on Big Data and Cloud Computing, 11th IEEE International Conference on Social Computing and Networking and 8th IEEE International Conference on Sustainable Computing and Communications, ISPA/IUCC/BDCloud/SocialCom/SustainCom 2018, art. no. 8672268, pp. XXII-XXIII. DOI: 10.1109/BDCloud.2018.00005

[68] M. Pulimeno, I. Epicoco, M. Cafaro, C. Melle and G. Aloisio, "Parallel Mining of Correlated Heavy Hitters on Distributed and Shared-Memory Architectures”, 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 2018, pp. 5111-5118. doi: 10.1109/BigData.2018.8622201

[67] S. Samarelli, L. Agrimano, I. Epicoco, M. Cafaro, R. Nutricato, D. O. Nitti and F. Bovenga, “Rheticus®: a Cloud-Based Geo-Information Service for Ground Instabilities Detection and Monitoring”, in Proceedings of IGARSS 2018 - 2018 IEEE International Geoscience and Remote Sensing Symposium, Valencia, 2018, pp. 2238-2240, DOI: 10.1109/IGARSS.2018.8518226 Electronic ISBN: 978-1-5386-7150-4, USB ISBN: 978-1-5386-7149-8

[66] M. Pulimeno, I. Epicoco, M. Cafaro, C. Melle and G. Aloisio, "Parallel Mining of Correlated Heavy Hitters", in Gervasi O. et al. (eds), Proceedings of the 18th International Conference on Computational Science and Its Applications (ICCSA 2018), July 2 - 5, 2018, Melbourne, Australia. Lecture Notes in Computer Science, vol 10964, pp. 627–641, 2018. Springer, DOI https://doi.org/10.1007/978-3-319-95174-4_48 Print ISBN 978-3-319-95173-7 Online ISBN 978-3-319-95174-4

[65] M. Cafaro, I. Epicoco, G. Aloisio and M. Pulimeno, "CUDA Based Parallel Implementations of Space-Saving on a GPU," 2017 International Conference on High Performance Computing & Simulation (HPCS), Genoa, Italy, 2017, pp. 707-714. doi: 10.1109/HPCS.2017.108, ISBN 978-1-5386-3250-5

[64] M. Cafaro, M. Pulimeno, "Merging Frequent Summaries", ICTCS 2016, Proceedings of the 17th Italian Conference on Theoretical Computer Science, Lecce, Italy, September 7-9 2016, CEUR Proceedings, Volume 1720, pp. 280-285, http://ceur-ws.org/Vol-1720/

[63] Chen, J., Nepal, S., Cafaro, M., Pei, J., Sun, X.-H., Lin, X., Yang, L.T., ”Message from BDSE2013 Chairs," 2013 IEEE 16th International Conference on Computational Science and Engineering, Sydney, NSW, 2013, pp. xxx-xxx.

doi: 10.1109/CSE.2013.206

[62] M. Mirto, M. Cafaro, G. Aloisio, “Peer-to-Peer Data Discovery in Health Centers”, Computer-Based Medical Systems (CBMS), 2013 IEEE 26th International Symposium on, IEEE, pp. 343-348, 20-22 June 2013, DOI 10.1109/CBMS.2013.6627813, ISBN 978-1-4799-1053-3

[61] R. Van Engelen, M. Govindaraju, M. Cafaro, “Editorial message: First workshop on advances in high-performance e-Science middleware and applications: eScience 2008”, Proceedings of the 4th IEEE International Conference on eScience, eScience 2008, Indianapolis, IN, USA, DOI: 10.1109/eScience.2008.4

[60] S. Fiore, A. Negro, S. Vadacca, M. Cafaro, G. Aloisio, R. Barbera, E. Giorgio, “Advances in the GRelC Data Access Service”, IEEE Proceedings of the International Symposium on Parallel and Distributed Processing and Applications (ISPA 2008) - December 10-12, 2008 - Sydney, Australia, IEEE, pp. 849-854, DOI:10.1109/ISPA.2008.87 ISBN 978-0-7695-3471-8

[59] M. Mirto, M. Cafaro, I. Epicoco, G. Aloisio, Advances in the ProGenGrid Workflow Management System, IEEE Proceedings of the 1st International Workshop on High Performance Data Grid (HPDataGrid'08), held in conjunction with the 9th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT'08), December 1-4 2008 - Dunedin, New Zealand, IEEE, pp. 538-543, DOI:10.1109/PDCAT.2008.60, ISBN 978-0-7695-3443-5

[58] M. Mirto, S. Fiore, M. Cafaro, M. Passante and G. Aloisio, A Grid-based Bioinformatics Wrapper for Biological Databases, Proceedings of the 21st IEEE International Symposium on Computer-Based Medical Systems (CBMS 2008), June 17-19 2008, Jyvaskyla, Finland, IEEE, pp. 191-196, DOI:10.1109/CBMS.2008.93, ISBN 978-0-7695-3165-6

[57] S. Fiore, M. Mirto, M. Cafaro, S. Vadacca, A. Negro and G. Aloisio, A GRelC based Data Grid Management Environment, Proceedings of the 21st IEEE International Symposium on Computer-Based Medical Systems (CBMS 2008), June 17-19 2008, Jyvaskyla, Finland, IEEE, pp. 355-360, DOI:10.1109/CBMS.2008.125, ISBN 978-0-7695-3165-6

[56] Sandro Fiore, Massimo Cafaro, Salvatore Vadacca, Alessandro Negro, Emanuele Verdesca, Maria Mirto, Giovanni Aloisio, “Asynchronous query mechanisms within the GRelC Data Access Service”, Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN 2008), February 12-14, 2008, Innsbruck, Austria, ACTA Press Anaheim, CA, USA, pp. 49-54, ISBN: 978-0-88986-714-7

[55] Massimo Cafaro, Daniele Lezzi, Sandro Fiore, Giovanni Aloisio, Robert Van Engelen, "The GSI plug-in for gSOAP: building cross-grid interoperable secure grid services", Proceedings of Parallel Processing and Applied Mathematics 2007, 7th International Conference, PPAM 2007, Gdansk, Poland, September 9-12, 2007, Volume 4967, pp. 894-901, Lecture Notes in Computer Science, Springer-Verlag Berlin, DOI:10.1007/978-3-540-68111-3_95, Print ISBN 978-3-540-68105-2, Online ISBN 978-3-540-68111-3

[54] Sandro Fiore, Alessandro Negro, Salvatore Vadacca, Massimo Cafaro, Maria Mirto, Giovanni Aloisio, "Advanced Grid DataBase Management with the GRelC Data Access Service", Proceedings of the 5th International Symposium on Parallel and Distributed Processing and Applications, ISPA 2007 Niagara Falls, Canada, August 29-31, 2007, Lecture Notes in Computer Science 4742, Springer-Verlag Berlin Heidelberg, pp. 683-694, DOI:10.1007/978-3-540-74742-0_61, Print ISBN 978-3-540-74741-3, Online ISBN 978-3-540-74742-0

[53] M. Cafaro, I. Epicoco, M. Mirto, D. Lezzi, G. Aloisio, "The Grid Resource Broker Workflow Engine", IEEE Proceedings of The 6th International Conference on Grid and Cooperative Computing (GCC 2006), Urumchi, Xinjiang, China, August 16-18, 2007, IEEE, pp. 725-732, DOI:10.1109/GCC.2007.120, ISBN 0-7695-2871-6

[52] S. Fiore, M. Cafaro, A. Negro, S. Vadacca, G. Aloisio, R. Barbera, E. Giorgio, "GRelC DAS: a Grid-DB Access Service for gLite Based Production Grids”, IEEE Proceedings of the Fourth International Workshop on Emerging Technologies for Next-generation GRID (ETNGRID 2007), June 18-20, 2007 - Paris (France), IEEE, in 16th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises, 2007. WETICE 2007, pp. 261-266, DOI:10.1109/WETICE.2007.4407167, ISBN 978-0-7695-2879-3

[51] S. Fiore, M. Mirto, M. Cafaro, G. Aloisio, "GRelC Data Storage: a Lightweight Disk Storage Management Solution for Bioinformatics “in silico” experiments", Proceedings of the 20th IEEE International Symposium on COMPUTER-BASED MEDICAL SYSTEMS (IEEE CBMS 2007), June 20-22, 2007, Maribor (Slovenia), IEEE, pp. 495-502, DOI:10.1109/CBMS.2007.53, ISBN 0-7695-2905-4

[50] Giovanni Aloisio, Massimo Cafaro, Italo Epicoco, Sandro Fiore, Maria Mirto, "A Services Oriented System for Bioinformatics Applications on the Grid", Studies in Health Technology and Informatics, From Genes to Personalized HealthCare: Grid Solutions for the Life Sciences, Volume 126, Proceedings of HealthGrid 2007, N. Jacq et al. (Eds.), 2007, Volume 126, IOS Press, pp. 174-183, 2007, ISBN 978-1-58603-738-3

[49] Giovanni Aloisio, Massimo Cafaro, Sandro Fiore, Maria Mirto, "A Grid System for the Ingestion of Biological Data into a Relational DBMS", Proceedings of IEEE International Symposium on Bioinformatics and Life Science Computing (BLSC-07), 21-23 May 2007, Niagara Falls, Canada, in 21st International Conference on Advanced Information Networking and Applications Workshops, 2007, AINAW ’07, IEEE, pp. 713-718, DOI:10.1109/AINAW.2007.26, ISBN:978-0-7695-2847-2

[48] Giovanni Aloisio, Massimo Cafaro, Sandro Fiore, Maria Mirto, Salvatore Vadacca, “GRelC Data Gather Service: a Step Towards P2P Production Grids”, Proceedings of 22nd ACM Symposium on Applied Computing (SAC 2007), Seoul, Korea, March 11 - 15, 2007, ACM New York, NY, USA, pp. 561-565, DOI:10.1145/1244002.1244131, ISBN 1-59593-480-4

[47] G. Aloisio, M. Cafaro, I. Epicoco, S. Fiore, M. Mirto, “BioGAT: a Grid Toolkit for Bioinformatics Sequence Alignment”, Proceeding of the Grid-Enabling Legacy Applications and Supporting End Users Workshop (GELA) within the framework of the 15th IEEE International Symposium on High Performance Distributed Computing, HPDC’15, Paris, France, June 19-23, 2006, pp. 77-85, 2006

[46] G. Aloisio, M. Cafaro, S. Fiore, M. Tana, “GridSAT architecture: a step further towards security and efficiency”, Proceedings of Parallel and Distributed Computing and Networks (PDCN) – IASTED, pp. 1-6, February 14 – 16, 2006 Innsbruck, Austria, ACTA Press Anaheim, CA, USA

[45] G. Aloisio, M. Cafaro, S. Fiore, M. Mirto, "A Split&Merge Data Management Architecture for a Grid Environment", Proceedings of the 19th IEEE International Symposium on Computer-Based Medical Systems (IEEE CBMS 2006), IEEE, pp. 739-744, June 22-23 2006, Salt Lake City Utah (USA), DOI:10.1109/CBMS.2006.28, ISBN 0-7695-2517-1

[44] G. Aloisio, M. Cafaro, S. Molendini, M. Tana, F. Tommasi, A. Tricco, “GridSAT: Grid enabled Satellite Architecture for Reliable Transmissions”, Proceedings of the 2nd IEEE International Symposium on Wireless Communication Systems, IEEE, September 5-7, 2005 Siena, Italy, pp. 634-638, DOI:10.1109/ISWCS.2005.1547782, ISBN 0-7803-9206-X

[43] G. Aloisio, M. Cafaro, G. Carteni, I. Epicoco, G. Quarta, S. Raolil, "GridFlow for Earth Observation Data Processing", Proceedings of The 2005 International Conference on Grid Computing and Applications (GCA'05: June 20-23 2005, Las Vegas, USA), pp. 168-174, CSREA Press, ISBN 1-932415-57-2

[42] G. Aloisio, M. Cafaro, S. Fiore, M. Mirto, “ProGenGrid: a Grid-enabled platform for Bioinformatics”, Studies in Health Technology and Informatics, From grid to Healthgrid - Proceedings of HealthGrid 2005, Tony Solomonides et al. (Eds), IOS Press, Volume 112, IOS Press, pp. 113-126, 2005, ISBN 978-1-58603-510-5

[41] G. Aloisio, M. Cafaro, I. Epicoco, Sandro Fiore, Maria Mirto, “A Semantic Grid-based Data Access and Integration Service for Bioinformatics”, Electronic Proceedings of the 5th IEEE International Symposium on Cluster Computing and the Grid (CCGrid) 2005, IEEE, May 9-12, 2005, Cardiff, Wales, UK, pp. 196 - 203 Vol. 1 , DOI:10.1109/CCGRID.2005.1558554, ISBN:0-7803-9074-1

[40] G. Aloisio, M. Cafaro, S. Fiore, M. Mirto, “ProGenGrid: a Workflow Service Infrastructure for Composing and Executing Bioinformatics Grid Services”, Proceedings of the 18th IEEE International Symposium on Computer-Based Medical Systems (IEEE CBMS 2005), IEEE, pp. 555-560, Dublin, June 23-24 2005, Ireland, DOI:10.1109/CBMS.2005.90, ISBN 0-7695-2355-2

[39] G. Aloisio, M. Cafaro, I. Epicoco, G. Quarta, “Teaching High Performance Computing Parallelizing a real Computational Science Application”, Procedings of Computational Science – ICCS 2005: 5th International Conference, Atlanta, GA, USA, May 22-25, 2005. Proceedings, Part II, Springer-Verlag Berlin Heidelberg, Lecture Notes in Computer Science Volume 3515, 2005, pp 10-17, DOI:10.1007/11428848_2, Print ISBN 978-3-540-26043-1, Online ISBN 978-3-540-32114-9

[38] G. Aloisio, M. Cafaro, I. Epicoco, S. Fiore, D. Lezzi, M. Mirto, S. Mocavero, “Resource and Service Discovery in the iGrid Information Service”, Proceedings of International Conference on Computational Science and its Applications (ICCSA 2005), Singapore, May 9-12, 2005, Proceedings, Part III, Springer-Verlag Berlin Heidelberg, Lecture Notes in Computer Science Volume 3482, 2005, pp 1-9, DOI:10.1007/11424857_1, Print ISBN 978-3-540-25862-9, Online ISBN 978-3-540-32045-6

[37] G. Aloisio, M. Cafaro, S. Fiore, M. Mirto. “The Grid-DBMS: Towards Dynamic Data Management in Grid Environments”. Proceedings of IEEE International Conference on Information Technology: Coding and Computing, ITCC 2005, 4-6 April 2005, Las Vegas, Nevada, USA, IEEE, Vol. II, pp. 199-204, 2005, DOI:10.1109/ITCC.2005.272, ISBN:0-7695-2315-3

[36] G. Aloisio, M. Cafaro, D. Conte, S. Fiore, I. Epicoco, G.P. Marra, G. Quarta, "A grid enabled Web Map Server", Proceedings of IEEE International Conference on Information Technology: Coding and Computing, ITCC 2005, 4-6 April 2005, Las Vegas, Nevada, USA, IEEE, Volume I, pp. 298-303, 2005, DOI:10.1109/ITCC.2005.12, ISBN 0-7695-2315-3

[35] G. Aloisio, M. Cafaro, I. Epicoco, D. Lezzi, R. Van Engelen, “The GSI plug-in for gSOAP: Enhanced Security, Performance, and Reliability”, Proceedings of IEEE International Conference on Information Technology: Coding and Computing, ITCC 2005, 4-6 April 2005, Las Vegas, Nevada, USA, IEEE, Volume I, pp. 304-309, DOI:10.1109/ITCC.2005.273, ISBN:0-7695-2315-3

[34] G. Aloisio, M. Cafaro, I. Epicoco, S. Fiore, D. Lezzi, M. Mirto and S. Mocavero, “iGrid, a Novel Grid Information Service”, Proceedings of Advances in Grid Computing - EGC 2005 (European Grid Conference, Amsterdam, The Netherlands, February 14-16, 2005, Revised Selected Papers), Lecture Notes in Computer Science, Springer-Verlag Berlin Heidelberg, Volume 3470, pp. 506-515, 2005, DOI:10.1007/11508380_52, Print ISBN 978-3-540-26918-2, Online ISBN 978-3-540-32036-4

[33] G. Aloisio, M. Cafaro, S. Fiore, G. Quarta, “A Grid-Based Architecture for Earth Observation Data Access”, Proceeding of the 2005 ACM Symposium on Applied Computing (SAC 2005), ACM New York, NY, USA, March 13-17, 2005, Santa Fe, New Mexico, USA, Volume I, pp. 701-705, DOI:10.1145/1066677.1066837, ISBN 1-58133-964-0

[32] G. Aloisio, Z. Balaton, P. Boon, M. Cafaro, I. Epicoco, G. Gombás, P. Kacsuk, T. Kielmann, D. Lezzi, “Integrating Resource and Service Discovery in the CoreGrid Information Cache Mediator Component”, In Proceedings of CoreGrid Integration Workshop, 2005

[31] G. Aloisio, M. Cafaro, S. Fiore, M. Mirto, “Advanced Delivery Mechanisms in the GRelC Project”, Proceeding of 2nd International Workshop on Middleware for Grid Computing (MGC 2004), ACM New York, NY, USA, October 18 2004, Toronto, Ontario Canada, pp. 69-74, DOI:10.1145/1028493.1028505, ISBN 1-58113-950-0

[30] G. Aloisio, M. Cafaro, S. Fiore, M. Mirto, “Bioinformatics Data Access Service in the ProGenGrid System”, Proceeding of First International Workshop on Grid Computing and its Application to Data Analysis (GADA 2004), October 25-29, Larnaca, Cyprus, Greece, in On the Move to Meaningful Internet Systems 2004: OTM 2004 Workshops, Lecture Notes in Computer Science Volume 3292, 2004, pp. 211-221, Springer-Verlag Berlin Heidelberg, DOI:10.1007/978-3-540-30470-8_38, Print ISBN 978-3-540-23664-1, Online ISBN 978-3-540-30470-8

[29] G. Aloisio, M. Cafaro, S. Fiore, M. Mirto, "ProGenGrid: A Grid Framework for Bioinformatics", Proceeding of Biological and Artificial Intelligence Environments, 15th Italian Workshop on Neural Nets, WIRN VIETRI 2004 (Part 1, Pre-Wirn workshop on Computational Intelligence Methods for Bioinformatics and Bistatistics - CIBB 2004), Springer Netherlands, September 14-17 2004, Perugia, Italy, pp. 1-9, DOI:10.1007/1-4020-3432-6_1, Print ISBN 978-1-4020-3431-2, Online ISBN 978-1-4020-3432-9

[28] G. Aloisio, M. Cafaro, I. Epicoco, G. Quarta, “Information Management for Grid-Based Remote Sensing Problem Solving Environment,” Proceedings of the 2004 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'04), Las Vegas, Nevada, USA, June 21-24, 2004, Hamid R. Arabnia and Jun Ni (Ed.), pp. 887-894, Vol. II, CSREA Press, 2004, ISBN: 1-932415-24-6

[27] G. Aloisio, M. Cafaro, S. Fiore, M. Mirto, “A Gather Service in a Health Grid Environment”, Electronic proceedings of Medicon and Health Telematics 2004, IFMBE Proceedings (Springer), Volume 6, July 31 - August 05 2004, Island of Ischia, Naples, Italy

[26] G. Aloisio, M. Cafaro, R. Cesari, C. Mangia, G.P. Marra, M. Miglietta, M. Mirto, U. Rizza, I. Schipa, A. Tanzarella, “G-AQFS: Grid computing exploitation for the management of air quality in presence of complex meteorological circulations”, Proceedings of IEEE International Conference on Information Technology: Coding and Computing, April 5 to 7, 2004, Las Vegas, Nevada, IEEE, Volume II, pp. 83-87, DOI:10.1109/ITCC.2004.1286594, ISBN 0-7695-2108-8

[25] G. Aloisio, M. Cafaro, I. Epicoco, G. Quarta, “A Problem Solving Environment for Remote Sensing Data Processing”, in Proceedings of IEEE International Conference on Information Technology: Coding and Computing, April 5 to 7, 2004, Las Vegas, Nevada, IEEE, Volume II, pp. 56-61, DOI:10.1109/ITCC.2004.1286590, ISBN 0-7695-2108-8

[24] G. Aloisio, M. Cafaro, S. Fiore, M. Mirto, “The GRelC Library: A Basic Pillar in the Grid Relational Catalog Architecture”, in Proceedings of IEEE International Conference on Information Technology: Coding and Computing, April 5 to 7, 2004, Las Vegas, Nevada, IEEE Press, Volume I, pp. 372-376, DOI:10.1109/ITCC.2004.1286482, ISBN 0-7695-2108-8

[23] G. Aloisio, M. Cafaro, S. Fiore, M. Mirto, “The GRELC Project: Towards GRID-DBMS”, Proceedings of the International Conference on Parallel and Distributed Computing and Networks (PDCN) - IASTED, pp. 1-6, Innsbruck (Austria) February 17-19, 2004, ACTA Press Anaheim, CA, USA

[22] G. Aloisio, M.C. Barba, M. Cafaro, E. Blasi, S. Fiore, M. Mirto, “Web Services for Edgebreaker Compression in a Grid Portal”, Electronic Proceedings of HealthGrid, 29-30 January, 2004, Clermont-Ferrand, Francia

[21] G. Aloisio, E. Blasi, M. Cafaro, M, S. Fiore, M. Mirto, “A virtual clinical folder on the grid”, Edited by: Callaos, N; Horimoto, K; Chen, J; et al., the 8th World Multi-Conference on Systemics, Cybernetics and Informatics, Orlando, FL, JUL 18-21, 2004, Vol. VII pp. 113-118

[20] G. Aloisio, M. Cafaro, S. Fiore, S. Mocavero, “An adaptive scheduling infrastructure for a grid environment”, Edited by: Callaos, N; Lesso, W; Sanchez, B, the 8th World Multi-Conference on Systemics, Cybernetics and Informatics, Orlando, FL, JUL 18-21, 2004, Vol. II pp. 11-15

[19] G. Aloisio, E. Blasi, M. Cafaro, I. Epicoco, G. Quarta, M. Tana, A. Zuccala, “A distributed architecture for remote sensing data management”, Edited by: Callaos, N; Lesso, W; Sanchez, B, the 8th World Multi-Conference on Systemics, Cybernetics and Informatics, Orlando, FL, JUL 18-21, 2004, Vol. I pp. 236-240

[18] G. Aloisio, M. Cafaro, I. Epicoco, D. Lezzi, M. Mirto, S. Mocavero, “The Design and Implementation of the GridLab Information Service”, Proceedings of The Second International Workshop on Grid and Cooperative Computing (GCC 2003), 7-10 December 2003, Shanghai (China), Springer-Verlag Berlin Heidelberg, Lecture Notes in Computer Science Volume 3032, 2004, pp. 131-138, DOI:10.1007/978-3-540-24679-4_26, Print ISBN 978-3-540-21988-0, Online ISBN 978-3-540-24679-4

[17] G. Aloisio, E. Blasi, M. Cafaro, I. Epicoco, S. Fiore, S. Mocavero, “A Grid Environment for Diesel Engine Chamber Optimization”, Proceedings of ParCo2003, 2-5 September, Dresden (Germany), 2003, Elsevier, ISBN 0-444-51689-1, pp. 599-607

[16] G. Aloisio, M. Cafaro, D. Lezzi, R. Van Engelen, “Secure Web Services with Globus GSI and gSOAP”, Proceedings of Euro-Par 2003 Parallel Processing, 26th - 29th August 2003, Klagenfurt, Austria, Springer-Verlag Berlin Heidelberg, Lecture Notes in Computer Science Volume 2790, 2003, pp .421-426, DOI:10.1007/978-3-540-45209-6_62, Print ISBN 978-3-540-40788-1, Online ISBN 978-3-540-45209-6

[15] G. Aloisio, M. Cafaro, E. Blasi, M. Mirto, S. Fiore, D. Lezzi, “Web Services for a Biomedical Imaging Portal”, Proceedings of Information Technology: Coding and Computing (ITCC 2003), IEEE, Volume IV, pp. 432-436, Las Vegas, Nevada, 28-30 April 2003, best student paper award, DOI:10.1109/ITCC.2003.1197568, ISBN 0-7695-1916-4

[14] G. Aloisio, E. Blasi, M. Cafaro, I. Epicoco, S. Fiore, M. Mirto, “Dynamic Grid Catalog Information Service”, Proceedings of the First European Across Grids Conference, February 13-14, 2003 Santiago de Compostela, Spain, Springer-Verlag Berlin Heidelberg, Lecture Notes in Computer Science Volume 2970, 2004, pp 198-205, DOI:10.1007/978-3-540-24689-3_25, Print ISBN 978-3-540-21048-1, Online ISBN 978-3-540-24689-3

[13] G. Aloisio, E. Blasi, M. Cafaro, S. Fiore, M. Mirto, “A Grid Portal for Biomedical Imaging”, Edited by: Callaos, N; Lesso, W; Schewe, KD; et al., the 7th World Multiconference on Systemics, Cybernetics and Informatics, ORLANDO, FL, JUL 27-30, 2003, Vol. V pp. 57-62

[12] G. Aloisio, M. Cafaro, S. Fiore, S; I. Epicoco, M. Mirto, S. Mocavero, “A performance comparison between GRIS and LDGC Information Services”, Edited by: Callaos, N; Lesso, W; Schewe, KD; et al., the 7th World Multiconference on Systemics, Cybernetics and Informatics, ORLANDO, FL, JUL 27-30, 2003, Vol. XII pp. 416-420

[11] G. Aloisio, E. Blasi, M. Cafaro, M. Mirto, S. Fiore, “A genetic algorithm for medical image registration”, Edited by: Callaos, N; Whymark, G; Lesso, W, the 6th World Multi-Conference on Systemics, Cybernetics and Informatics (SCI 2002)/8th International Conference on Information Systems Analysis and Synthesis (ISAS 2002)ORLANDO, FL, JUL 14-18, 2002, pp. 192-197

[10] G. Aloisio, L. De Paolis, L. Provenzano, M. Cafaro, L. Colizzi, “Coronary stent implant simulation using haptic interface: Problems and solutions “, Edited by: Callaos, N; Whymark, G; Lesso, W, the 6th World Multi-Conference on Systemics, Cybernetics and Informatics (SCI 2002)/8th International Conference on Information Systems Analysis and Synthesis (ISAS 2002) ORLANDO, FL, JUL 14-18, 2002, pp. 570-573

[9] G. Aloisio, M. Cafaro, I. Epicoco, “SARA, a Web Based Remote Sensing Digital Library”, Electronic Proceedings of IEEE International Parallel and Distributed Processing Symposium 2002, IEEE, Fort Lauderdale, Florida, April 15-19, 2002, DOI:10.1109/IPDPS.2002.1016634

[8] G. Aloisio, M. Cafaro, E. Blasi, L. Depaolis, I. Epicoco, “The GRB Library: Grid Computing with Globus in C”, Proceedings of High-Performance Computing and Networking, the 9th International Conference HPCN Europe 2001, Amsterdam, Netherlands, June 25–27, 2001, Springer-Verlag Berlin Heidelberg, Lecture Notes in Computer Science Volume 2110, 2001, pp. 133-139, DOI:10.1007/3-540-48228-8_14, Print ISBN 978-3-540-42293-8, Online ISBN 978-3-540-48228-4

[7] Allen G., Dramlitsch T., Goodale T., Lanfermann G., Radke T., Seidel E., Kielmann T., Verstoep K., Balaton Z., Kacsuk P., Szalai F., Gehring J., Keller A., Streit A., Matyska L., Ruda M., Krenek A., Knipp H., Merzky A., Reinefeld A., Schintke F., Ludwiczak B., Nabrzyski J., Pukacki J., Kersken H.-P., Aloisio G., Cafaro M., Ziegler W., Russell M., “Early experiences with the EGrid testbed”, Proceedings of the first IEEE/ACM symposium on Cluster Computing and the Grid, CCGRID 2001, 15-18 May 2001, Brisbane, IEEE, pp.130-137, 2001, DOI:10.1109/CCGRID.2001.923185, ISBN 0-7695-1010-8

[6] G. Aloisio, M. Cafaro, “An introduction to the Globus toolkit”, 2000 CERN School of Computing Conference, Edited by: Vandoni, CE, MARATHON, GREECE, SEP 17- 30, 2000, Book Series: C E R N REPORTS , pp. 117-131

[5] G. Aloisio, M. Cafaro, A. Mongelli, R. Williams, “The use of Computational Grids for a Dynamic Earth Observation System”, the 4th World Multi-Conference on Systemics, Cybernetics and Informatics, Orlando, FL, JUL 23-26, 2000, Vol. VII pp. 535-537

[4] G. Aloisio, M. Cafaro, P. Falabella, C. Kesselman, R. Williams, “Grid Computing on the Web using the Globus Toolkit”, Proceedings of High Performance Computing and Networking, the 8th International Conference HPCN Europe 2000 Amsterdam, The Netherlands, May 8–10, 2000, Springer-Verlag Berlin Heidelberg, Lecture Notes in Computer Science Volume 1823, 2000, pp. 32-40, DOI:10.1007/3-540-45492-6_4, Print ISBN 978-3-540-67553-2, Online ISBN 978-3-540-45492-2

[3] M. Cafaro, B. Paternoster, “Analysis of Stability of Rational Approximations through computer algebra”, Proceedings of The Second Workshop on Computer Algebra in Scientific Computing (CASC 99), Munich, May 31-June 4, 1999, pp. 25-36, Springer-Verlag Berlin Heidelberg, DOI:10.1007/978-3-642-60218-4_2, Print ISBN 978-3-540-66047-7, Online ISBN 978-3-642-60218-4

[2] G. Aloisio, M. Cafaro, R. Williams, “The Digital Puglia Project: an Active Digital Library of remote sensing data”, Proceedings of High-Performance Computing and Networking, the 7th International Conference, HPCN Europe 1999, Amsterdam, The Netherlands, April 12–14, 1999, Springer Berlin Heidelberg, Lecture Notes in Computer Science Volume 1593, 1999, pp. 563-572, DOI:10.1007/BFb0100617, Print ISBN 978-3-540-65821-4, Online ISBN 978-3-540-48933-7

[1] G. Aloisio, M. Cafaro, R. Williams, P. Messina, “A distributed web-based metacomputing environment”, Proceedings of High-Performance Computing and Networking, the 5th International Conference, HPCN Europe 1997, Vienna, Austria, April 28–30, 1997 Springer Berlin Heidelberg, Lecture Notes in Computer Science Volume 1225, 1997, pp. 480-486, DOI:10.1007/BFb0031620, Print ISBN 978-3-540-62898-9, Online ISBN 978-3-540-69041-2

### Temi di ricerca

Algoritmi sequenziali, paralleli e distribuiti. Cloud/grid/P2P computing. Data mining. Machine learning. Deep learning. Security e crittografia.

### Risorse correlate

#### Collegamenti

Sito web personale (in inglese) (Apre una nuova finestra)