Results 129 resources
-
Association rule mining by the Apriori method has been one of the most popular data mining techniques for decades; it harvests knowledge from a dataset in the form of item-association rules. The quality of these rules nevertheless depends on how concentrated the frequent items are in the input dataset. When the dataset becomes large, the items are scattered far apart. Previous literature has shown that clustering helps produce data groups in which frequent items are concentrated. Among all the clusters generated by a clustering algorithm, there must be one or more that contain suitable and frequent items. In turn, the association rules mined from such clusters would be assured of better quality, in terms of high confidence, than those mined from the whole dataset. However, it is not known in advance which cluster is the suitable one until association rule mining has been tried on all of them, and testing them by brute force is time-consuming. In this paper, a statistical property called prior probability is investigated as a means of selecting the best of the many clusters produced by a clustering algorithm, as a pre-processing step before association rule mining. Experimental results indicate a correlation between the prior probability of the best cluster and the relatively high quality of the association rules generated from that cluster. The results are significant because they make it possible to know which cluster is best used for association rule mining instead of testing them all exhaustively.
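As a rough illustration of the quantities this abstract refers to, the following minimal sketch computes a per-cluster prior probability and the confidence of a single rule within each cluster. It rests on assumptions not stated in the abstract: the prior probability of a cluster is taken here to be its share of all transactions, the cluster labels are supplied by hand instead of by a clustering algorithm, and the toy transactions and the names `cluster_priors` and `confidence` are hypothetical. It is not the paper's procedure.

```python
from collections import Counter


def cluster_priors(labels):
    """Prior probability of each cluster, ASSUMED here to be its share of all transactions."""
    counts = Counter(labels)
    total = len(labels)
    return {cluster: n / total for cluster, n in counts.items()}


def confidence(transactions, antecedent, consequent):
    """Standard rule confidence: support(A and B) / support(A); None if A never occurs."""
    a = set(antecedent)
    ab = a | set(consequent)
    n_a = sum(1 for t in transactions if a <= set(t))
    if n_a == 0:
        return None
    n_ab = sum(1 for t in transactions if ab <= set(t))
    return n_ab / n_a


# Hypothetical toy data: six transactions with hand-assigned cluster labels.
transactions = [
    {"bread", "milk"},
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"bread", "tea"},
    {"tea", "eggs"},
    {"bread", "eggs"},
]
labels = [0, 0, 0, 1, 1, 1]

priors = cluster_priors(labels)

# Group transactions by cluster, then score the rule bread -> milk in each group.
by_cluster = {}
for t, c in zip(transactions, labels):
    by_cluster.setdefault(c, []).append(t)

conf_by_cluster = {
    c: confidence(ts, {"bread"}, {"milk"}) for c, ts in by_cluster.items()
}
conf_all = confidence(transactions, {"bread"}, {"milk"})
```

In this toy case the rule has higher confidence inside cluster 0 than over the whole dataset, which is the kind of effect the abstract describes; whether and how the prior probability predicts the best cluster is the paper's own finding and is not reproduced here.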
-
Air pollution is a major concern in Macao, since the concentration levels of several of the most common pollutants are frequently above internationally recommended values. The impact of low air quality episodes on human health, combined with highly populated urban areas, is an important motivation for developing forecast methodologies that anticipate pollution episodes, so that warnings can be issued to the local community to take precautionary measures and avoid outdoor activities during these periods. Using statistical methods (multiple linear regression (MLR) and classification and regression tree (CART) analysis), we developed forecasting models for the main pollutants (NO2, PM2.5, and O3) that predict next-day concentrations with good skill, reflected in high coefficients of determination (0.82–0.90) at a 95% confidence level. Model development was based on six years of historical data, 2013 to 2018, consisting of surface and upper-air meteorological observations and surface air quality observations. Data from 2019 were used for model validation. From an initially large group of meteorological and air quality variables, only a few were identified as significant predictors in the model. The selected meteorological variables included geopotential height, relative humidity, and air temperature at different altitude levels, as well as atmospheric stability characterization parameters. The air quality predictors included recent hourly levels of mean concentrations for NO2 and PM2.5 and of maximum concentrations for O3. Applying the resulting models provides the expected daily mean concentrations of NO2 and PM2.5 and the maximum hourly concentrations of O3 for the next day at the Taipa Ambient air quality monitoring stations. The described methodology has been operational in Macao since 2020.
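The coefficient of determination used as the skill measure above can be sketched as follows. This is only an illustration of the standard formula R² = 1 − SS_res / SS_tot applied to a held-out validation set; the function name `r_squared` and the numbers are hypothetical and are not taken from the Macao data or models.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    mean_obs = sum(observed) / len(observed)
    # Residual sum of squares: squared error of the forecasts.
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    # Total sum of squares: spread of the observations around their mean.
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("observed values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical validation-day concentrations (arbitrary units), not real measurements.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

r2 = r_squared(observed, predicted)
```

A value near 1 means the forecasts explain most of the day-to-day variance in the observations, while a forecast that always returns the observed mean scores 0.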