Data Citation for Publication & Knowledge Management

As the last of the three sessions on Data Publication and Data Citation (following parts I and II), the roundtable discussion (part III) was based on the outline report of the Data Citation Standards and Practices Task Group. The importance of data citation was argued from the perspective of the research life-cycle (e.g. traceability and reproducibility) as well as from the benefits for different communities (i.e. data creators, repository managers, funders, and the wider research community). However, because credit attribution is closely tied to questions of benefit, the discussion led to a conversation about the relation between "benefit" and "responsibility".

Current data citation practice adopts the ISO 690-2 standard (for bibliographic references and citations to information resources; it has since been revised as ISO 690:2010). Several gaps were identified: granularity, microattributions (for datasets assembled from many contributions), contributor identifiers, and the placement of data citations. Six emerging principles (First Class Status, Persistence, Granularity, Resolvability, Attribution, and Metadata Standards) were proposed to balance identification and attribution. Note that the six principles are not formally regarded as "the structure for data citation"; several unclear definitions and ethical issues remain. Still, on standardization opportunities for data citation from a national or interdisciplinary perspective, the session concluded that the Task Group will promote the findings and recommendations of its "good practices" report and investigate a more formal consensus standardization process.
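The report does not prescribe a concrete citation format, but the six principles can be made concrete with a small sketch. The following Python example is hypothetical (the `DataCitation` class, its field names, and the sample DOI are invented for illustration): it shows how a machine-readable citation record might carry attribution, a persistent and resolvable identifier, and a granularity pointer to a dataset subset.

```python
from dataclasses import dataclass

@dataclass
class DataCitation:
    """Hypothetical citation record touching each of the six principles."""
    creators: list        # Attribution: who gets credit
    year: int
    title: str
    repository: str
    identifier: str       # Persistence: a persistent ID such as a DOI
    subset: str = ""      # Granularity: optional pointer to a slice of the dataset

    def resolver_url(self) -> str:
        # Resolvability: a DOI resolves through the global proxy
        return f"https://doi.org/{self.identifier}"

    def format(self) -> str:
        # First Class Status: render the dataset like any bibliographic reference
        authors = "; ".join(self.creators)
        cite = (f"{authors} ({self.year}). {self.title}. "
                f"{self.repository}. {self.resolver_url()}")
        return cite + (f" (subset: {self.subset})" if self.subset else "")

# Invented sample values, for illustration only
c = DataCitation(["Smith, J.", "Lee, K."], 2012, "Ocean Temperature Grids",
                 "Example Data Repository", "10.1234/example.5678",
                 subset="v2, rows 1-500")
print(c.format())
```

The remaining principle, Metadata Standards, would govern which fields such a record must carry and how they map to repository metadata.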

In addition, tools, technology, and infrastructure need to support integrated data citation (e.g. citation management software); to extend practices for digital data citation in a "Big Data" world (e.g. versioning and citation of dynamic data); and to leverage data citation for collaborative authoring of data, data mining, and citation analysis. Overcoming the socio-cultural and institutional challenges requires tackling the normative barriers to accepting data citation practices. To overcome the economic and financial challenges, data citation practices should have a positive cost-benefit ratio, both from a general, systemic standpoint and from the perspective of major stakeholder groups and discipline areas.
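One way to make the "versioning & citation for dynamic data" challenge concrete is to pin a citation to a query, its execution time, and a hash of the result, so that a cited subset can be re-verified even after the underlying dataset evolves. The sketch below is an illustration of that idea, not a method from the report; the function name, dataset identifier, and record fields are all invented.

```python
import hashlib
import json
from datetime import datetime, timezone

def cite_dynamic_query(dataset_id: str, query: str, rows) -> dict:
    """Build a citation record for a query over a dynamic dataset.

    Stores the query text, the execution timestamp, and a hash of a
    canonicalized (sorted, compact-JSON) result, so the exact subset
    can later be re-fetched and verified.
    """
    canonical = json.dumps(sorted(rows), separators=(",", ":"))
    return {
        "dataset": dataset_id,
        "query": query,
        "executed_at": datetime.now(timezone.utc).isoformat(),
        "result_hash": hashlib.sha256(canonical.encode("utf-8")).hexdigest(),
    }

# Invented sample data: a tiny sensor feed filtered by temperature
record = cite_dynamic_query("example-sensor-feed", "temp > 20",
                            [["2012-10-01", 21.5], ["2012-10-02", 23.0]])
print(record["result_hash"])
```

Because the rows are sorted before hashing, two executions that return the same subset in a different order produce the same `result_hash`.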

A research model to enable ubiquitous, seamless, and interoperable data citation was proposed at the end of the roundtable session as an attempt to rethink the importance of data citation. A Theory-Translational-Applied-Evaluation model was discussed as a way to meet data citation challenges. For example, the Theory of data equivalence raises the Translational issues of significant properties as well as data modelling and decomposition. Significant properties are then Applied in canonical forms and semantic fingerprints, while data modelling and decomposition are Applied in canonical forms and efficient versioning algorithms. The final Evaluation should be conducted against criteria such as the effectiveness of interventions to promote citation, the reliability of citation infrastructure, and policy impact.
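The session named canonical forms and semantic fingerprints as applications of data-equivalence theory but did not define them. One plausible reading, assumed here, is a content hash computed over a canonicalized representation, so that datasets that are equivalent in content fingerprint identically regardless of superficial differences. A minimal Python sketch under that assumption (the function name is invented):

```python
import csv
import hashlib
import io

def semantic_fingerprint(csv_text: str) -> str:
    """Hash a canonical form of a table.

    Canonicalization here: parse the CSV, strip whitespace in each cell,
    drop empty rows, and sort the rows, so representations that differ
    only in row order or spacing yield the same fingerprint.
    """
    rows = [tuple(cell.strip() for cell in row)
            for row in csv.reader(io.StringIO(csv_text)) if row]
    canonical = "\n".join(",".join(r) for r in sorted(rows))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = semantic_fingerprint("x,y\n1,2\n3,4\n")
b = semantic_fingerprint("x,y\n3,4\n1,2\n")   # same rows, different order
print(a == b)
```

A real scheme would need to decide which properties are "significant" (the Translational question above), e.g. whether row order, column order, or encoding should affect equivalence.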

The Collaborative Knowledge Management session shed light on the status quo of Knowledge Management (KM) standards and frameworks in collaborative environments, starting with an introduction to euroCRIS (Current Research Information Systems) and a discussion of data consistency issues. Presentations on portfolio management in financial investment systems, the KM models of CUSAT (Cochin University of Science and Technology) for managing marine community data, and skillset identification for digital curation roles in companies followed, addressing data management in various contexts. Raising awareness of data management through communities was also proposed, drawing on the experience of organizing O3D clubs ("Observatoire des Données du Développement Durable", O3D). The session ended with an initiative to bring ontology learning and modelling into KM, prompting discussion of international collaboration and research data management.

The roundtable session of the Task Group on Preservation of and Access to Scientific and Technical Data in Developing Countries (PASTDC) discussed policy and data sharing. The differences between data and knowledge, and between engineers and decision makers, must be clarified first. For example, CBERS (the China-Brazil Earth Resources Satellite) is capable of observation, but to enable better applications we have to transform its data into knowledge. This is an era of massive data and resources, but how can we best use the data to help developing countries? The Kenyan experience illustrates the roadmap: start from geo-resources and always take steps in accordance with socio-economic data. Turning data into knowledge requires interdisciplinary cooperation, and collective effort by scientists and decision makers is the key to success.

The other roundtable session, on the CODATA Strategic Vision, discussed the future tasks of CODATA and marked the close of the 23rd Conference. The attendees felt that holding symposiums and workshops worldwide, backed by solid finances, could reach individual scientists and attract members. They also found it inspiring to hear scientists discuss what data are needed, and regarded CODATA's standardization work as its most important contribution. As for the vision, publishing science journals to advance communication, pushing for government transparency, cultivating young scientists, and promoting CODATA's data-sharing website were all suggested. In conclusion, the attendees believe that members should gather to clarify CODATA's organizational role.


