
Wednesday, September 21, 2016

ECPPM 2016

The 11th European Conference on Product and Process Modelling (ECPPM) was held in Limassol, Cyprus, from 7th to 9th September. Use of the acronym BIM (Building Information Modelling) was noticeably more widespread than at the previous conferences in Vienna and Reykjavik. Dr. Raimar Scherer addressed this trend in his opening speech, noting that - despite all ambiguities and misinterpretations - the concept of BIM is now central to the domain of building information management and is broadly used by both academics and practitioners. He also highlighted the growing importance of interlinking existing semantic data and of developing more efficient stochastic methods for analyzing the massive datasets known as big data.

Dr. Lucio Soibelman from the University of Southern California was the first keynote speaker. He raised a broad range of issues, from implementing BIM for a leaner construction industry to new classification, query and analysis methods for efficient use of sensor data. One of the presented classification methods identified signatures of materials in photos of buildings and spaces, with the aim of batch-tagging photos. In the same way, visual signatures of different activities can be identified in data sourced from sensors. For example, peak times of electricity consumption in an office can be associated with certain behaviors of the employees by identifying the visual signatures of those behaviors in real-time data graphs. The real problem, however, is how to apply such insights to promote a more efficient use of resources. Empirical evidence shows that changing the behavioral patterns of employees proves to be an even greater challenge once the counter-productive behaviors have been identified.
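To make the idea concrete, here is a minimal sketch (not from the talk; the readings and the threshold rule are invented for illustration) of flagging the hours whose electricity consumption stands out, as a crude stand-in for identifying the "signature" of peak-time behavior in sensor data:

```python
def find_peak_hours(readings, factor=1.5):
    """Return the hours whose consumption exceeds factor * the daily mean."""
    mean = sum(readings.values()) / len(readings)
    return sorted(h for h, kwh in readings.items() if kwh > factor * mean)

# Hypothetical hourly consumption (kWh) for one office day
hourly_kwh = {8: 12, 9: 30, 10: 28, 11: 27, 12: 45,
              13: 44, 14: 26, 15: 25, 16: 24, 17: 10}
print(find_peak_hours(hourly_kwh))  # the lunchtime hours stand out
```

A real system would of course learn such signatures from labeled data rather than use a fixed threshold, but the principle - mapping anomalies in the consumption graph back to behaviors - is the same.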

Another fascinating initiative presented by Dr. Soibelman was a method for identifying defects in urban water distribution systems by analyzing videos captured by small robots running through the pipes. Such an approach could yield considerable cost savings by replacing costly reactive maintenance with less expensive predictive maintenance informed and triggered by up-to-date visual information. The analysis of the captured videos is, however, not yet fully automated. Spatial analysis of the data collected from the piping network will also facilitate identification of break clusters, which could be used for analyses at the macro level.

The keynote speaker of the second day of the conference was Dr. Rafael Sacks, co-author of the BIM Handbook. His presentation focused on how to automate the intelligent semantic enrichment of existing BIMs for specific use cases. Such processes are today largely manual and thus tedious and error-prone. His radical suggested fix was to replace Model View Definitions (MVDs) with a set of semantic rules applied within the original native BIMs. In this approach, building components are distinguished directly by those rules rather than by the MVD-specific IFC exporters of the proprietary BIM-authoring applications. Such an approach could resolve some of the currently prevailing problems with IFC exports, such as individual pieces of slabs being aggregated into one component, or incorrect semantic definitions of translated components (e.g. windows exported as doors, or studs exported as columns or beams). A relationship feature matrix embracing topological features such as adjacency and containment lies at the heart of this proposal. The presentation concluded with the optimistic vision of performing a BIM round trip where no information is lost or distorted.
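The spirit of the rule-based approach can be sketched as follows. This is an illustration only - the feature names, rules and thresholds are invented, not taken from Sacks' work - but it shows the key move: infer an element's class from its own topological features and flag disagreements with the label a proprietary exporter assigned.

```python
# Invented classification rules over simple topological features
RULES = [
    ("slab",   lambda f: f["horizontal"] and f["load_bearing"]),
    ("column", lambda f: not f["horizontal"] and f["load_bearing"]
                         and f["aspect_ratio"] > 5),
    ("wall",   lambda f: not f["horizontal"] and f["load_bearing"]),
]

def infer_class(features):
    """Return the first rule-inferred class that matches the features."""
    for name, rule in RULES:
        if rule(features):
            return name
    return "unclassified"

def audit(element):
    """Compare the exporter's label with the rule-inferred class."""
    inferred = infer_class(element["features"])
    return inferred, inferred == element["exported_as"]

# A slender vertical element mislabelled as a beam by a hypothetical exporter
stud = {"exported_as": "beam",
        "features": {"horizontal": False, "load_bearing": True,
                     "aspect_ratio": 8}}
print(audit(stud))  # the rules disagree with the exported label
```

In a real implementation the rules would operate on the relationship feature matrix (adjacency, containment, etc.) rather than on a handful of flags, but the classification logic would be analogous.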

The opening lecture of the third day, by Dr. Ioannis Brilakis from Cambridge University, elaborated on the challenges of creating as-is BIMs. Mid-range mobile videogrammetry and videotaping were two methods suggested for this purpose. Models produced by videotaping could be sent directly to CNC machines for the maintenance of existing structures such as roofs and road surfaces. The input data could be sourced from readily available devices such as the parking cameras of cars. Processing the scanned models of more complicated structures is, however, more demanding: the point-cloud model of a mid-sized building captured in a single day, for example, may require up to ten working days of alteration and semantic enrichment.

The fix for this problem suggested by Dr. Brilakis was a top-down recursive enrichment methodology in which the major elements of a structure are distinguished first and further details are then identified incrementally. Such an approach is most appropriate for simpler structures with fewer, well-distinguished components, such as bridges. Applying this method to more complicated structures such as buildings and factories requires more elaborate techniques. A next-level challenge is how to capture non-visible information, i.e. the internal structure of building elements. A related concern raised by the third keynote speaker was how sensor data should be integrated with building models procured in conventional formats. The unanswered questions he brought up were whether new data model extensions should be added to the IFC schema for capturing sensor data; whether sensor values should be registered as properties of building components in the model; and whether a new concept of "live BIM" needs to be defined.

The semantic web, or a web of building information, was a recurring topic throughout the presentations. The motivation behind semantic web initiatives is, in a nutshell, to provide a more flexible alternative for capturing building information from different actors across the construction industry. The ifcOWL ontology, the RDF data model and Product Data Templates (PDTs) were the most frequently cited features of the semantic web concept.
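The underlying data model is simple to illustrate: building facts become subject-predicate-object triples, as in RDF, and queries become pattern matching over them. The toy example below uses invented URIs and a plain Python set instead of a real triple store; an actual project would use the ifcOWL ontology with an RDF library such as rdflib.

```python
# A handful of invented building facts as (subject, predicate, object) triples
triples = {
    ("ex:Door_01", "rdf:type",       "ifc:IfcDoor"),
    ("ex:Door_01", "ex:containedIn", "ex:Wall_02"),
    ("ex:Wall_02", "rdf:type",       "ifc:IfcWall"),
}

def match(s=None, p=None, o=None):
    """Return triples matching a pattern; None acts as a wildcard."""
    return sorted(t for t in triples
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))

# "Which elements have a declared type?" - a SPARQL-like pattern query
print(match(p="rdf:type"))
```

The appeal for the construction industry is that such triples from different actors and tools can be merged and queried uniformly, without every party agreeing on one rigid file schema first.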

Indulging in traditional Cypriot meze following a visit to Limassol Castle; the first four persons in the front of the photo, counter-clockwise: Elektra Petrova from Aalborg University, me, Eleni Papadonikolaki from TU Delft and Dr. Eilif Hjeseth from Oslo and Akershus University College of Applied Science (Source of the photo)
Several speakers addressed the ambiguities around the acronym LOD (level of detail or level of development). Apparently, the most agreed-upon interpretation of LOD among industry actors is the one adopted by the American Institute of Architects (AIA) for the AIA G202-2013 Building Information Modeling Protocol Form, i.e. the levels defined for building models as LOD 100 to LOD 500. A handful of alternative terms have been suggested for more clarity, namely level of reliability, level of completeness, level of information and level of approximation. These terms are sometimes associated with other BIM concepts such as BIM data drops and exchange requirements (ERs). A comprehensive review of the history and varying interpretations of the term LOD can be found here.

Hurdles of information management in the FM sector, methods and metrics for measuring the benefits of BIM and monitoring indoor climate were some other prevalent topics. Standardization of building information communication formats and processes was also a shared concern among participants as it is deemed an essential requirement for a holistic approach to built environment information management. 

Wednesday, January 13, 2016

Big data demystified

On 13th December 2013, I received the book "Big Data" by Viktor Mayer-Schönberger and Kenneth Cukier as a Christmas present from my supervisor. The authors are, respectively, professor of internet governance at the University of Oxford and data editor at The Economist. The concept was still in its infancy in those days and had not yet become the pervasive buzzword it is right now. The hustle and bustle of doctoral studies left no free time for reading the book until recently. By the time I started reading it, I had already listened to a handful of presentations focused on or touching upon the concept, so I did not have very high expectations of the book. Surprisingly, I found the contents still fresh and informative; the book actually helped me reassemble all the scattered pieces of knowledge I had acquired about big data. Below is a summary of the contents along with some of my own reflections:

Big data is, in the words of the authors, a paradigm shift in statistical analysis. The rationale is that immense quantitative changes enable qualitative changes, and the dramatic increase in the amount of processable digital data has likewise resulted in a change of state. The three major shifts characterizing the big data phenomenon are:
  • the shift from small sample sets augmented through extrapolation to big amounts of information sourced from the entire population;
  • the shift from exactitude to messiness; and - probably the most important one - 
  • the shift from causality to correlation.
We can no longer consent to losing valuable nuances as a side effect of using samples instead of the entire studied population. Moreover, the astonishingly large datasets of today seldom exist in one place, as conventional statistical analysis methods require. Thus, we have to shed our fear of messiness and step into the big data realm in pursuit of the subtle though significant insights that we have been discarding all along. And finally, in the big data realm it is no longer important to investigate why things happen; it is good enough to predict what will happen next.
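The shift from causality to correlation rests on a purely mechanical measure. As a reminder of how little machinery it takes, here is the Pearson correlation coefficient in plain Python (the series are made up; big data systems would compute such measures at scale over whole populations):

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Perfectly linearly related series correlate at 1.0 - regardless of *why*
print(round(pearson([1, 2, 3, 4], [10, 20, 30, 40]), 3))  # 1.0
```

The coefficient says nothing about mechanism, which is precisely the authors' point: in the big data mindset, a strong and stable correlation is actionable on its own.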

The importance of data had already been recognized in logic decades ago, when inductive reasoning was coined as a counterpart to the traditional deductive school. But owing to the abundance of data and data-crunching technologies, data has recently become even more valuable and is often prioritized over theoretical reasoning. In the light of the capabilities brought about by big data technologies, we can now contemptuously describe traditional statistical analysis as inadequate, stochastic, hypothesis-driven trial and error! As big data technologies mature, further aspects of the phenomenon, such as the secondary uses (also called the option values) of data, are being revealed and taken into account in the business models and vision statements of pioneering information management firms.

Like any other emerging technology, big data has its downsides. The fact that data anonymization in a big data world is nothing but a blatant myth poses a serious threat to our privacy. It also calls into question the decency and legitimacy of using data for commercial and even research purposes. Eventually, we will face unprecedented cases such as this: imagine a real-life situation where the probability, according to statistical analysis of a suspect's behavior, that he or she will commit a serious crime at a specific moment is so high that it could serve as fairly good evidence for the police to seize him or her. At that stage, predictions could be so accurate that it would no longer seem prudent to postpone preventive measures until human lives are lost and irreversible damage is done by the statistically identified potential criminal.