Developing Fusion Approaches to Integrating Uncertain and Inconsistent Information from Social Media Sources
Supervisor: Prof. Weiru Liu
Social media has become one of the main sources of information for all kinds of purposes and applications. Information extraction tools have been intensively developed to extract useful information from unstructured, semi-structured or structured data (text, documents, data logs). These pieces of extracted information are often uncertain, incomplete, and inconsistent, yet a decision maker needs to combine them in order to make an optimal decision. Issues such as inconsistency, redundancy, incompleteness and uncertainty must therefore be resolved when making a decision that utilizes all the available information.
This PhD project continues world-leading research conducted within the KDE cluster on uncertain knowledge/information fusion and revision, with specific focuses on:
- Developing new fusion algorithms for merging extracted information from multiple sources under uncertainty, incompleteness and inconsistency.
- Developing new revision strategies for changing an agent’s beliefs when new inputs are obtained as a result of situation changes.
- Applying theoretical developments to web-based or sensor-network based decision making problems, such as opinion discovery in social networks or unusual situation discovery in security or healthcare.
- Implementing a prototype system to demonstrate proof of concept.
Note: we assume that information extraction tools are available and can extract the required information.
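One classical operator for this kind of fusion is Dempster’s rule of combination from evidence theory. The sketch below is a minimal illustration rather than the project’s intended algorithm: it combines two mass functions (from two sources) over a common frame of discernment, redistributing the conflicting mass.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dicts mapping frozenset -> mass)
    with Dempster's rule, normalising away the conflicting mass."""
    combined = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Intersecting focal elements reinforce each other.
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            # Disjoint focal elements contribute to the conflict.
            conflict += x * y
    if conflict >= 1.0:
        raise ValueError("sources are totally conflicting")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

# Two hypothetical sources reporting on the frame {rain, sun}.
m1 = {frozenset({"rain"}): 0.6, frozenset({"rain", "sun"}): 0.4}
m2 = {frozenset({"rain"}): 0.5, frozenset({"sun"}): 0.3,
      frozenset({"rain", "sun"}): 0.2}
fused = dempster_combine(m1, m2)
```

Here the fused mass concentrates on "rain", since both sources lend it most support; handling highly conflicting sources gracefully is exactly where new fusion operators are needed.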
For any further inquiries, please contact Professor Weiru Liu at firstname.lastname@example.org, Tel: 028 9097 4896.
Mining Social Media
Supervisor: Dr Jun Hong
Social media is now pervasive and part of people’s online world. It provides a massive amount of data that can be accessed, analyzed and mined for different kinds of useful information: commercial companies are interested in learning more about you by mining data that is publicly available on Twitter, forums, and blogs; governments are interested in your opinions expressed on social media; and analyzing social media data may help intelligence and information gathering for business and security decision making. These are just some of the potential applications that mining social media can offer.
This project is concerned with an investigation into the state of the art in this rapidly growing research area, which involves multiple research disciplines, such as social network analysis and mining, natural language processing, text mining, web mining, information retrieval, web data extraction, knowledge representation, and logical and uncertain reasoning. A specific type of social media, such as the microblogging service Twitter, will be selected as a data source for data mining. A specific application, such as headline news detection, market trend analysis, sentiment analysis or opinion mining, will be identified and used as a test bed for theoretical research and development. The research is expected to lead to both theoretical breakthroughs and commercial advances.
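To give a flavour of the kind of processing involved, the sketch below implements a deliberately naive lexicon-based sentiment scorer for short texts such as tweets. The word lists are hypothetical stand-ins for the learned models and lexical resources a real system would use.

```python
import re

# Hypothetical toy lexicon; a real system would use a trained
# classifier or a resource such as SentiWordNet instead.
POSITIVE = {"good", "great", "love", "excellent", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "awful", "sad"}

def sentiment(text):
    """Return a score in [-1, 1]: positive minus negative word
    counts, normalised by the number of opinion words found."""
    words = re.findall(r"[a-z']+", text.lower())
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

sentiment("I love this great phone")   # strongly positive
sentiment("terrible, awful service")   # strongly negative
```

Even this toy version makes the research questions visible: negation, sarcasm and domain-specific vocabulary all defeat simple word counting, which is why the project draws on natural language processing and uncertain reasoning.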
Query-Aware Web Search
Supervisor: Dr Jun Hong
Search engines, such as Google, search the Web in response to a keyword-based query and return a ranked list of links to the relevant web pages. People, however, often want to search the Web for products that match their structured queries. For example, someone may want to search for a particular digital camera that many online vendors sell. Ideally, they would like a search engine to return a list of camera descriptions from different vendors, with each description containing a link to the vendor’s page for the camera. Failing that, they would like a ranked list of links that lead to the vendors’ pages for these cameras. Current search engines cannot do this. Google product search (beta version) allows vendors to submit their products to it, which can then be searched in response to a buyer’s query, driving web traffic to the vendor’s site. This, however, searches a database of submitted products rather than the Web itself.
This project is concerned with research into query-aware Web search which supports searching the Web for products that match a structured query. The project needs to address a number of challenging issues, for example:
- How to crawl the Web and discover web pages that contain product descriptions?
- How to index discovered web pages? Is it necessary to extract products from these pages so that they can be classified and indexed?
- How to match a structured query to indexed web pages so that a list of relevant web pages or products can be returned?
- How to rank relevant web pages?
The project is related to a number of research areas, including information retrieval, web mining, data extraction, deep Web crawling etc.
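As a simple illustration of the third issue, the sketch below ranks extracted product records against a structured query. The attribute names and the one-point-per-attribute scoring scheme are illustrative assumptions, not a proposed design.

```python
def match_products(query, products):
    """Rank extracted product records against a structured query.
    A record scores one point per query attribute it satisfies;
    numeric attributes given as (lo, hi) tuples are treated as
    range constraints, strings as substring matches."""
    def score(record):
        s = 0
        for attr, wanted in query.items():
            value = record.get(attr)
            if value is None:
                continue  # attribute missing from this record
            if isinstance(wanted, tuple):  # (lo, hi) range
                lo, hi = wanted
                s += lo <= value <= hi
            else:
                s += str(wanted).lower() in str(value).lower()
        return s
    return sorted(products, key=score, reverse=True)

# Hypothetical extracted camera records from vendor pages.
query = {"brand": "Canon", "price": (200, 400)}
products = [
    {"brand": "Nikon", "price": 500},
    {"brand": "Canon", "price": 350},
    {"brand": "Canon", "price": 800},
]
ranked = match_products(query, products)  # Canon at 350 ranks first
```

The real research problem is, of course, harder: the records must first be discovered, extracted and indexed at Web scale, and partial or conflicting attribute values must be handled.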
Advancing Keywords-based Search into Deep Web
Supervisor: Dr Jun Hong
Imagine that someone uses Google or Live Search to search for and book flights from Belfast to Paris. The user first types in keywords “Flights Belfast Paris”. This returns a list of search results, some of which directly link to flight booking sites and the others link to flight comparison sites. The latter then link to some selected flight booking sites according to certain comparison criteria. These search results are helpful in terms of directing the user to the relevant flight booking sites. However, to complete the booking, the user still needs to spend a considerable amount of time on following up these links, visiting multiple booking sites, locating query forms on these sites, filling in query forms, checking flight availability, comparing prices, etc. The proposed project aims to develop a novel approach that automates much of the above process, going beyond keywords-based search on the surface Web.
The proposed system would work alongside a conventional search engine such as Google or Live Search. The user still starts a search as usual by typing in keywords (e.g. Flights Belfast Paris) and is presented with a list of search results. With a conventional search engine, the user would then follow up one of the search results to a booking site, locate and fill in the query form, and submit it. The proposed system can instead automate the rest of the search process. It makes use of two sets of available data. On the one hand, it has the list of ranked search results for the user’s original keyword-based query, from which other relevant booking sites can be located and the appropriate query forms on those sites found. On the other hand, the user has completed and submitted a query form, so the system knows in more detail what the user is seeking.
The system carries out a number of tasks:
- locate relevant booking sites and find the appropriate query forms on these sites from the search results for the user’s query;
- extract these query forms;
- match the user’s completed query form to the extracted query forms;
- auto-fill the other query forms and submit them to the corresponding sites;
- combine and compare the query results from these sites and present the combined query results to the user.
Thus, the proposed system acts like a Search Assistant that works on the user’s behalf, visiting alternative websites (e.g. flight booking sites), locating the query forms to the backend databases on these sites, filling them in and combining the query results from the different sites, hence saving the user a considerable amount of time.
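The form-matching step (the third task above) can be hinted at with a toy schema matcher that aligns the fields of the user’s completed form with another site’s form by label similarity. All field names below are hypothetical, and a real matcher would also exploit field types, page layout and instance data rather than labels alone.

```python
from difflib import SequenceMatcher

def match_form_fields(filled, target_fields):
    """Map each field of the user's completed query form to the
    most similar field label on another site's form (naive schema
    matching by string similarity)."""
    mapping = {}
    for name, value in filled.items():
        # Pick the target label with the highest similarity ratio.
        best = max(target_fields,
                   key=lambda t: SequenceMatcher(None, name.lower(),
                                                 t.lower()).ratio())
        mapping[best] = value
    return mapping

# The user's submitted flight form vs. another site's form labels
# (both invented for illustration).
filled = {"depart_city": "Belfast", "dest_city": "Paris",
          "date": "2024-06-01"}
targets = ["departure_city", "destination_city", "travel_date"]
mapping = match_form_fields(filled, targets)
```

With this mapping in hand, the assistant could auto-fill and submit the second site’s form, which is exactly the fourth task in the list above.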
Software Planning using Value, Effort, Risk and Assurance: A Search Based Approach
Within the KDE group significant expertise exists in the areas of software engineering and decision support optimisation. This project will combine aspects of this expertise in the area of software product planning. On completion of this research, the successful candidate will have developed a unique and innovative set of skills within the areas of software engineering, decision support, optimisation and automation.
Software product planning is a complicated process with many inputs, many of which are inter-dependent. Given a huge backlog of potential features that could be developed (either in a new build or in evolving software), each feature has a predicted value and an associated cost. Each feature also carries a set of risks and requires varying degrees of assurance. The first aim of this project is to determine a means of modelling the problem domain and thus to develop an understanding of the trade-offs between value, effort, risk and assurance. Release planning considering cost, risk and value has been considered in previous research, but the circumstances of safety- or mission-critical software require non-functional requirements to be considered as well, and little or nothing has been done to formalise this in a model. Doing so would support decisions about how much assurance to build in, which activities are most rewarding and which tasks should be prioritised. Building a model of the decision space would include establishing the data items available, determining the dependencies and relationships between them, providing a means to investigate the trade-offs between them, and ultimately simulating and visualising the plan for delivering the software, thus enabling decision support. Taking a security-sensitive product perspective, such a model might, for example, include the set of functional requirements (or use cases/user stories) for a given system, the dependencies between these requirements, the predicted effort of each item of functionality, a set of possible misuses of the system, a set of threats to the system, derived security measures, an impact assessment of the security measures, etc., and a means to combine these and other properties into an estimated value for the delivered system.
The second part of the project is to determine a search-based solution that would allow decision support for release planning for assured systems. Given the large number of variables in deciding which features, which security measures and which assurance requirements can be met for a given effort value, as well as the number of constraints involved in determining what to include in the development, the eventual solution of the problem will involve a large search space. Search Based Software Engineering (SBSE) is gaining increasing recognition as a suitable approach for optimising software engineering tasks. In such an approach an objective function is defined and the best (or near-best) solution is chosen according to the fitness metric. We therefore intend to make use of recent advances in SBSE, alongside readily available high-performance computing power, to find near-optimal solutions.
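A minimal sketch of the SBSE idea follows, assuming an illustrative fitness function (total value minus a risk penalty, subject to an effort budget) and simple restarted hill climbing over feature subsets; the project itself would explore richer models and more sophisticated metaheuristics.

```python
import random

def fitness(selection, features, effort_budget):
    """Illustrative objective function: total value minus a risk
    penalty; plans over the effort budget are infeasible (score 0)."""
    chosen = [f for f, keep in zip(features, selection) if keep]
    if sum(f["effort"] for f in chosen) > effort_budget:
        return 0
    return sum(f["value"] - f["risk"] for f in chosen)

def hill_climb(features, effort_budget, restarts=20, seed=0):
    """Restarted first-improvement hill climbing: from a random
    subset, keep flipping single features while fitness improves."""
    rng = random.Random(seed)
    n = len(features)
    best_sel, best_score = [False] * n, 0
    for _ in range(restarts):
        current = [rng.random() < 0.5 for _ in range(n)]
        improved = True
        while improved:
            improved = False
            for i in range(n):
                neighbour = current[:]
                neighbour[i] = not neighbour[i]
                if (fitness(neighbour, features, effort_budget)
                        > fitness(current, features, effort_budget)):
                    current, improved = neighbour, True
        score = fitness(current, features, effort_budget)
        if score > best_score:
            best_sel, best_score = current, score
    return best_sel, best_score

# Hypothetical backlog: value/effort/risk per candidate feature.
features = [
    {"name": "search", "value": 10, "effort": 4, "risk": 1},
    {"name": "export", "value": 6, "effort": 3, "risk": 0},
    {"name": "sync",   "value": 8, "effort": 5, "risk": 5},
]
selection, score = hill_climb(features, effort_budget=7)
```

Restarts matter even in this tiny example, since single-flip climbing can get trapped in a local optimum; that sensitivity to the search strategy and the shape of the objective function is precisely what the project would investigate at realistic scale.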
Argumentation Extraction and Summarisation
Supervisor: Dr Ian O'Neill
Argumentation extraction is an area of natural language processing and knowledge engineering that is rapidly growing in importance. Argumentation extraction systems aim to understand how and why writers and speakers support or challenge particular hypotheses or points of view. They achieve this by analysing the statements that people make, whether these are scientists, designers, book reviewers or bloggers. The technology is relevant to any sphere of human activity (politics, manufacturing, product evaluation, literary criticism) where it is important to learn – especially from a large body of documentation – what arguments are emerging, and how convincing the arguments are.
As well as refining machine learning and pattern matching techniques for statement (proposition) and argumentation identification, this research will involve classification of types of argument and quantification of the weight with which arguments are presented. It will also investigate practical techniques for comparing and summarising arguments that occur within and across texts.
One of the aims of this research is to develop systems that can create concise reports that are based on the rationale underlying writers’ support for or rejection of particular hypotheses. Such reports will be of particular benefit to public policy makers and decision-makers in industry and business.
Among the main challenges that this research will tackle are the following:
- Identification of propositions – statements or assertions about people or things – in written, position-taking (evaluative) texts.
- Identification of the arguments that are supported by or challenged by propositions.
- Classification and evaluation of argument types and their strength.
- Comparison and assessment of the strength and polarity (for and against) of competing arguments.
- Summarisation of hypotheses, their justification and the manner in which they are qualified or challenged.
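As a baseline for the pattern-matching side of the first two challenges, the sketch below labels sentences by surface cue phrases. The cue lists are hypothetical; the project would instead learn such patterns from annotated corpora and combine them with machine learning.

```python
import re

# Hypothetical cue phrases signalling argumentative moves; a real
# system would learn these from annotated argumentation corpora.
SUPPORT_CUES = [r"\bbecause\b", r"\btherefore\b",
                r"\bthis shows\b", r"\bsupports?\b"]
ATTACK_CUES = [r"\bhowever\b", r"\bon the contrary\b",
               r"\bfails? to\b", r"\brefutes?\b"]

def classify_sentence(sentence):
    """Label a sentence as 'support', 'attack', or 'neutral'
    based purely on surface cue phrases."""
    s = sentence.lower()
    if any(re.search(p, s) for p in ATTACK_CUES):
        return "attack"
    if any(re.search(p, s) for p in SUPPORT_CUES):
        return "support"
    return "neutral"

classify_sentence("The data supports the hypothesis.")   # support
classify_sentence("However, the sample was too small.")  # attack
```

The gap between this baseline and genuine argument understanding (implicit premises, hedged claims, argument strength) is exactly the research space the project addresses.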
Developing advanced data mining approaches to analysing data from Belfast City Council
Supervisor: Prof Weiru Liu
Project area: Data mining, knowledge discovery and information management
Belfast City Council has collected a number of large datasets (covering more than 10 subjects) on various topics of interest to the council. These datasets include household waste/recycling records, all the trees, all the major buildings, council properties, restaurant surveys/inspections, air quality at certain spots (such as near a chemical plant), and many more. Plans are also underway to collect power-grid sensor data and sewage-system data. All of these will form part of the Smart Cities vision for the future.
The primary purposes of collecting such data are manifold, including (i) keeping records of historic information, such as prestigious buildings and trees, (ii) strengthening business potential, and (iii) making the running of the city more effective and efficient to fulfil the Smart City vision.
This project will develop advanced data mining approaches to analyse these data, discover interesting findings, and inform Belfast City Council and other business parties for business intelligence and the effective use of such information.
This PhD project continues world-leading research conducted within the KDE cluster on large-scale data mining/machine learning, with specific focuses on:
- Developing novel data mining algorithms to discover business- and eco-system-related knowledge from Belfast City Council data. This work could start with the WEKA system, with the aim of extending existing algorithms to provide more comprehensive functionalities.
- Developing a data analysis framework consisting of a number of mining algorithms for analysing different kinds of datasets.
- Working closely with Belfast City Council and other business parties to evaluate the algorithms developed above and to transfer such knowledge to business sectors.
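As a taste of the kind of algorithm involved, the sketch below performs the core counting step of association-rule mining (finding frequently co-occurring item pairs, in the spirit of the Apriori algorithm available in WEKA). The example "transactions" are invented, standing in for records such as materials appearing together in household recycling collections.

```python
from itertools import combinations
from collections import Counter

def frequent_pairs(transactions, min_support):
    """Find item pairs that co-occur in at least min_support
    transactions -- the first candidate-generation step of
    Apriori-style association-rule mining."""
    counts = Counter()
    for t in transactions:
        # Count each unordered pair of distinct items once per record.
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] += 1
    return {p: c for p, c in counts.items() if c >= min_support}

# Invented recycling records: materials collected together.
transactions = [
    {"glass", "paper"},
    {"glass", "paper", "food"},
    {"glass", "food"},
    {"paper", "food"},
]
pairs = frequent_pairs(transactions, min_support=2)
```

Scaling this idea to the council’s full datasets, and extending it beyond simple co-occurrence to patterns useful for Smart City decision making, is where the novel algorithm development lies.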
For any further inquiries, please contact Professor Weiru Liu at email@example.com, Tel: 028 9097 4896.