The Italian Legal Data Ecosystem: Sources, Databases and Access Regimes for Artificial Intelligence
Legal profiles, institutional practice and prospects for opening judicial data.
Abstract
This contribution aims to reconstruct systematically the ecosystem of Italian legal data, both statutory and jurisprudential, with a view to its usability for building datasets and automated research systems based on artificial intelligence.
The inquiry, conducted with an empirical-normative method, is anchored in a formal exchange with the Director of the Electronic Documentation Centre (CED) of the Court of Cassation, whose written replies constitute, as far as is known, the first public and formalised expression of the Institution's position on these matters.
The analysis reveals a structurally asymmetric system. The maturity now reached by statutory law, entirely available in open format and under licences suitable for commercial reuse, contrasts with the substantial closure of the case law of the court of last instance, guarded by an articulated system of reservations (legal notices, sui generis database rights, opt-out from Text and Data Mining, privacy safeguards) that rule out any mass extraction for algorithmic training purposes.
The work critically reconstructs the legal reasons for this closure, assesses its implications for the Italian legal tech market and concludes with some operational guidance for sector operators, together with some observations on a possible evolution of the institutional framework.
1. Introduction and coordinates of the research
The application of information technology to the legal domain is not, in the Italian experience, a recent phenomenon. The founding, in 1969, of the Electronic Documentation Centre (Centro Elettronico di Documentazione, CED) at the Court of Cassation in fact places the Italian legal system among the comparative pioneers of the field [1].
Nonetheless, the spread of large language models and generative artificial intelligence systems over the last five years has brought about a structural shift in the field. The centre of gravity of the inquiry has progressively moved from assisted documentary research, the model on which twentieth-century legal informatics was formed, to the mass ingestion of data for algorithmic training purposes.
In other words, it is no longer a matter of querying electronic archives in search of a precedent or a rule, but of feeding systems capable of autonomously reworking legal data, extending, in ways that are by no means merely theoretical, to the assisted drafting of legal documents, the prediction of litigation outcomes and the semantic reconstruction of entire lines of authority.
This paradigm shift requires us to ask, with a degree of systematicity hitherto largely neglected, an apparently elementary question: what are, at the current state of the art, the extent, accessibility and legal usability of the Italian legal information heritage for the purpose of building artificial intelligence systems?
The question, circumscribed only in appearance, involves several intersecting levels, whose overlap delineates an ecosystem far more fragmented and contradictory than the rhetoric of open public data would suggest [2]. The scenario that emerges from the survey is marked by a structural paradox, on which this work intends to dwell with particular attention.
The Italian legal system is characterised, on the one hand, by a now consolidated tradition of openness of statutory data (legislation is entirely available in open format and under licences suitable for commercial reuse) and, on the other, by a persistent closure of case-law data, especially with regard to the ordinary jurisdiction of last instance. The paradox reaches its apex when one considers that the Supreme Court of Cassation, the body charged with unifying the interpretation of the law, is at the same time the least accessible source for purposes of systematic reuse and economic exploitation.
This observation, as will be seen in section 5, was the subject of a formal exchange between the writer and the CED, whose replies crystallised the Institution's position in unequivocal terms [3].
After this introduction, the contribution is articulated into five further sections. Sections 2 and 3 map, respectively, Italian statutory and jurisprudential sources, examined with regard to publication portals, available formats and the legal status of reuse. Section 4 reconstructs systematically the regulatory framework applicable to the access to and extraction of legal data, from the regime on the reuse of public sector information to Text and Data Mining, from copyright protection to the protection of personal data. Section 5, the critical core of the contribution, analyses in depth the case of the Court of Cassation and the SentenzeWeb portal, in light of the official position taken by the CED. Section 6, finally, formulates concluding observations and operational recommendations.
2. The ecosystem of statutory sources
The Italian legal system is characterised by a stratification of legislation of considerable complexity. From the point of view of data availability for algorithmic training purposes, this stratification represents, paradoxically, a point of strength. The entire apparatus of legislation in force is today acquirable in open format and under licences that permit its commercial reuse, according to a model that, as will be seen shortly, finds no comparable maturity on the jurisprudential side.
The principal reference portal in this area is Normattiva [4]. The system aggregates all numbered State legislative acts published in the Official Gazette from 1861 to the present in so-called "multi-version" (multivigente) form [5]. From a legal standpoint, Normattiva formalised, with effect from 1 January 2026, the release of its data under a Creative Commons Attribution 4.0 International licence (CC-BY 4.0), guaranteeing dedicated APIs and bulk download in structured XML format, including for commercial purposes [6]. The legal basis of this openness lies in Article 52 of Legislative Decree 82/2005 (Digital Administration Code, or CAD), which establishes the principle that data published by public administrations are deemed released as "open-type data" unless a different licence applies. Italian legislation as a whole is therefore open data to all intents and purposes, and its automated ingestion faces no legal barriers, save the obligation to cite the source.
As regards regional legislation, Normattiva has a federated search engine that allows the databases of all regional legislative assemblies to be queried from a single access point. Each Region, moreover, maintains its own legislative portal and its own Official Bulletin, so that mapping regional data requires a detailed survey of the twenty regional systems and the two autonomous Provinces. The survey conducted for the present work confirmed that the openness of regional statutory data is substantially generalised [7]. Almost all Regions adopt the Italian Open Data License (IODL 2.0) or CC-BY 4.0, rendering their acts fully reusable. For completeness, mention should also be made of the open data portals of the parliamentary Chambers, which make available, in structured format, the materials of the legislative process useful for reconstructing the interpretive context [8].
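To make the notion of "multi-version" form concrete: each provision is stored together with its successive wordings and the date from which each applies, so that the text in force on any given date can be reconstructed. The following Python sketch assumes a hypothetical, simplified record (the class `Version`, the function `version_in_force` and the sample data are illustrative inventions, not Normattiva's XML schema or API) and ignores repeals and retroactive amendments.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class Version:
    """Hypothetical simplified record of one wording of a provision."""
    in_force_from: date
    text: str


def version_in_force(versions: List[Version], on: date) -> Optional[Version]:
    """Return the wording applicable on `on`, or None if none was yet in force.

    Assumes `versions` is sorted by `in_force_from` and that each wording
    stays in force until the next one begins (repeals are not modelled).
    """
    starts = [v.in_force_from for v in versions]
    i = bisect_right(starts, on)  # number of wordings already in force on `on`
    return versions[i - 1] if i > 0 else None


history = [  # invented example data
    Version(date(2010, 1, 1), "original wording"),
    Version(date(2018, 6, 1), "wording after first amendment"),
    Version(date(2023, 3, 15), "wording after second amendment"),
]
print(version_in_force(history, date(2020, 5, 1)).text)  # wording after first amendment
```

Binary search over the start dates keeps each lookup logarithmic in the number of versions; a real ingestion pipeline would first have to map Normattiva's actual XML into records of this kind.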
Italian statutory data presents, in summary, a level of openness that can be defined as structurally mature. The entire corpus is available in open format, under licences suitable for commercial reuse and with technical infrastructure adequate for automated ingestion. This is evidently no accidental result: it reflects a conscious institutional choice, which found expression in Article 52 of the CAD, in Article 1, paragraph 2, of Legislative Decree 36/2006 (reuse of public sector information) and, most recently, in the European directives on open data (Dir. EU 2019/1024). A choice which, as will be seen, has not been reproduced with the same coherence in the jurisprudential domain.
3. The ecosystem of jurisprudential sources
If the statutory side presents a substantially complete arrangement, the jurisprudential side reveals a profoundly heterogeneous panorama, in which advanced experiences of openness coexist with pockets of almost total closure. Understanding this heterogeneity requires a systematic premise: the Italian legal system is characterised by a plurality of jurisdictional branches, each of which manages its own information flows according to distinct logics, instruments and degrees of openness. In this framework, the absence of a unitary policy of openness of judicial data represents, in itself, one of the most characteristic, and problematic, features of the Italian experience.
At the level of the individual jurisdictions, the Constitutional Court represents a virtuous and, in certain respects, pioneering example. Its institutional portal makes available all rulings (judgments and orders) from 1956 to the present, accompanied by the official headnotes and the case notes prepared by the Court's research service. Above all, the Institution has equipped itself with a dedicated open data portal (dati.cortecostituzionale.it), which releases the entire corpus in structured format and under a Creative Commons Attribution - ShareAlike 3.0 licence (CC-BY-SA 3.0), allowing its reuse also for commercial and training purposes [9]. The operation rests, from a legal standpoint, on a premise that the Court itself has implicitly endorsed. Pursuant to Article 5 of Law 633/1941 (the Copyright Law), the texts of official acts of the State are not covered by authorial exclusive rights, so that constitutional judgments are by their nature in the public domain. The choice of the Consulta, in other words, is the coherent recognition of a status that the law already attributes to its rulings.
Administrative justice constitutes the most advanced model of openness of jurisdictional data in the Italian system. Implementing the directives of the National Recovery and Resilience Plan (PNRR), the Council of State activated the Open GA portal (openga.giustizia-amministrativa.it), which releases structured and already pseudonymised datasets of the rulings of the Council of State, of the Regional Administrative Courts and of the Council of Administrative Justice for the Region of Sicily, divided by year and by judicial seat, under a CC-BY 4.0 licence [10]. This is, in the writer's view, the only Italian judicial portal to have adopted a model designed from the outset for automated reuse, and it certainly deserves to be taken as a benchmark for any reform of the system. The choice of administrative justice to move in this direction is, moreover, no accident. It reflects an institutional sensitivity to transparency and to the openness of its decisional products that finds, at present, no equivalent in the other jurisdictional branches.
The ordinary jurisdiction on the merits (Tribunals and Courts of Appeal) has had, since December 2023, the Merits Case Law Database (bdp.giustizia.it), which collects civil rulings published from 1 January 2016 originating from the SICID system. The portal is accessible upon SPID authentication, but is designed for individual consultation only and does not, therefore, permit mass-download operations. The only avenue for large-scale ingestion is represented by the Convention between the Ministry of Justice and the Italian Publishers Association (AIE), renewed on 30 January 2025, which provides for the supply of civil rulings, in raw and non-anonymised format, through dedicated SFTP channels and APIs, against an annual fee of thirty thousand euros per publisher [11]. This sum, from a legal standpoint, does not constitute a sale of data, but rather a reimbursement of the marginal costs incurred by the Administration for making the service available, pursuant to Article 7 of Legislative Decree 36/2006. Membership entails, moreover, stringent obligations of pseudonymisation according to the 2010 Guidelines of the Data Protection Authority and of localisation of data within the territory of the European Union, which fall entirely on the adhering publisher. The AIE Convention represents a unicum in the Italian panorama: there exists, for the other jurisdictions, no analogous mechanism allowing regulated mass access to rulings.
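The pseudonymisation burden that the Convention places on the publisher can be illustrated, in its most elementary form, by replacing the names of the parties with neutral labels. The sketch below is a hypothetical helper (`pseudonymise` and the alias mapping are assumptions, not part of the AIE Convention or of the Data Protection Authority's Guidelines); it assumes that the identifying names are already known and covers only direct identifiers, so on its own it falls well short of what the 2010 Guidelines require.

```python
import re
from typing import Dict


def pseudonymise(text: str, aliases: Dict[str, str]) -> str:
    """Replace every known alias of a party with that party's neutral label.

    Hypothetical helper: `aliases` maps each name variant (full name,
    surname only, ...) to a label such as "[A]". Only direct identifiers
    listed in advance are handled; indirect identifiers are not detected.
    """
    # Longest variants first, so "Mario Rossi" is replaced before "Rossi".
    for alias in sorted(aliases, key=len, reverse=True):
        label = aliases[alias]
        # \b prevents "Rossi" from matching inside a longer word such as "Rossini".
        pattern = rf"\b{re.escape(alias)}\b"
        # A function replacement inserts the label literally (no escape processing).
        text = re.sub(pattern, lambda _match, label=label: label, text)
    return text


ruling = "Mario Rossi sued Luigi Bianchi; Rossi then appealed."
aliases = {"Mario Rossi": "[A]", "Rossi": "[A]", "Luigi Bianchi": "[B]"}
print(pseudonymise(ruling, aliases))  # [A] sued [B]; [A] then appealed.
```

Mapping every variant of a name to the same label keeps the text readable (the same party remains recognisable as the same party) while removing the name itself.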
Tax justice, for its part, has a database that collects digitally native judgments issued from 2021 onwards (bancadatigiurisprudenza.giustiziatributaria.gov.it). The licence adopted, however, Creative Commons Attribution-NonCommercial (CC-BY-NC), excludes commercial use outright, significantly reducing the usability of the corpus for training market-oriented systems [12]. Finally, the Court of Auditors and the Higher Tribunal of Public Waters restrict, by regulation, the use of their data to research and documentation purposes, excluding any form of economic exploitation [13].
At the bottom of the descending curve just traced lies, as anticipated, the Supreme Court of Cassation. The body to which the legal system entrusts the function of ensuring the uniform interpretation of the law, and whose orientations carry paradigmatic value for the entire system, has opted instead for a model of almost complete closure. The legal notices of the institutional site (cortedicassazione.it) expressly prohibit any commercial use or economic exploitation of the published data [14]. The SentenzeWeb portal, a search engine that allows navigation among the judgments issued by the Court over the last six years, is conceived exclusively for individual human consultation. No API, no bulk download, no reuse licence. ItalgiureWeb, in turn, is the official legal informatics system curated by the CED pursuant to Presidential Decree No 195 of 3 July 2004: it aggregates an extraordinary information heritage (civil and criminal case law, headnotes, legislation, doctrine, judgments of the Council of State and of the Regional Administrative Courts, of the Court of Auditors, of the ECtHR, of the Tax Commissions), but operates under a regime of restricted access, on institutional and individual subscription [15]. The legal and operational implications of this arrangement are analysed separately in section 5.
The picture that emerges is that of a system in which the openness of jurisprudential data proceeds asymmetrically, on the initiative of the individual jurisdictions, in the absence of a unitary design and of a generalised obligation of release in open format. The Constitutional Court opened its corpus with institutional foresight; administrative justice realised, with Open GA, the technically most mature model; the civil merits are acquirable through an onerous but regulated channel; the other jurisdictions adopt variously restrictive regimes; the Cassation remains, finally, entirely precluded from reuse. The result, from the standpoint of the overall ecosystem, is a system in which the most precious jurisprudential component, namely that which orients the living interpretation of the law, is also the least available for the construction of artificial intelligence systems.
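The asymmetry just described can be summarised as a small data structure that a dataset builder might keep alongside its ingestion pipeline. The sketch below encodes only what the preceding sections report; the category names and the helper `candidate_sources` are illustrative assumptions, and a positive entry does not dispense with the licence conditions (attribution, share-alike) or with the privacy safeguards discussed below. Regional legislation is omitted because its openness is reported as almost, not entirely, generalised.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Reuse(Enum):
    OPEN = "open licence permitting commercial reuse"
    BY_AGREEMENT = "bulk access only through a dedicated, paid convention"
    EXCLUDED = "commercial reuse excluded"


@dataclass(frozen=True)
class Source:
    name: str
    regime: str  # licence or access regime as reported in the text
    reuse: Reuse


SOURCES: List[Source] = [
    Source("Normattiva (State legislation)", "CC-BY 4.0", Reuse.OPEN),
    Source("Constitutional Court open data", "CC-BY-SA 3.0", Reuse.OPEN),
    Source("Open GA (administrative justice)", "CC-BY 4.0", Reuse.OPEN),
    Source("Civil merits rulings", "AIE Convention for publishers, annual fee", Reuse.BY_AGREEMENT),
    Source("Tax justice database", "CC-BY-NC", Reuse.EXCLUDED),
    Source("Court of Auditors", "research and documentation only", Reuse.EXCLUDED),
    Source("Higher Tribunal of Public Waters", "research and documentation only", Reuse.EXCLUDED),
    Source("Court of Cassation (SentenzeWeb, ItalgiureWeb)", "legal notices prohibit commercial use", Reuse.EXCLUDED),
]


def candidate_sources(sources: List[Source], accept_conventions: bool = False) -> List[str]:
    """Names of sources that could feed a commercial training corpus."""
    allowed = {Reuse.OPEN}
    if accept_conventions:
        allowed.add(Reuse.BY_AGREEMENT)
    return [s.name for s in sources if s.reuse in allowed]


print(candidate_sources(SOURCES))
print(candidate_sources(SOURCES, accept_conventions=True))
```

The first call returns the three openly licensed sources; accepting paid conventions adds only the civil merits rulings, while the Court of Cassation appears in neither list.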
4. The legal framework for access to and reuse of legal data
The reconstruction conducted in the preceding sections returns an institutionally fragmented picture, in which each jurisdiction and each administration adopts its own solutions regarding the openness and reusability of data. On the regulatory plane, however, there exists a system of principles and rules, of national and European origin, that governs, in general terms, access to public data and its reuse, including for commercial purposes. Understanding this system constitutes an obligatory step for assessing the lawfulness of mass-extraction and algorithmic-training operations.
The founding pillar of the openness regime is represented by Legislative Decree No 36 of 24 January 2006, as amended by Legislative Decree No 200 of 8 November 2021, transposing Directive (EU) 2019/1024 on open data, which establishes the obligation for public administrations to make their documents reusable, including for commercial purposes. The principle does not, however, operate unconditionally. Article 1, paragraph 2, of the decree clarifies that the decision to permit reuse rests with the administration that owns the data, while Article 3, letter h-quater, allows it to be denied where access proves prejudicial to the protection of confidentiality [16]. It is, in other words, a constrained openness, expressed through uniform licences and conditions adopted in accordance with the AgID Guidelines. In the absence of an express licence or a dedicated channel, no entitlement to commercial reuse can be presumed [17].
On the copyright plane, conversely, a principle of openness ex lege applies. Article 5 of Law 633/1941 excludes authorial protection for "the texts of official acts of the State and of public administrations, whether Italian or foreign". The rule, of consolidated application, entails that the text of a judgment or of a legislative act is by its nature in the public domain, and is therefore not covered by exclusive rights. This openness, however, does not extend, and here lies one of the most delicate points of the entire matter, to the organisation of the material in a database. Article 102-bis of Law 633/1941, transposing Directive 96/9/EC, recognises in favour of the "maker" of a database a sui generis right autonomous from copyright: the holder has the exclusive right to prohibit the extraction and reutilisation of the whole or of a substantial part of the content, as well as the repeated and systematic extraction and reutilisation of non-substantial parts, where such conduct causes prejudice to the maker's investment. The subsequent Article 102-ter governs its limits and exceptions.
The combined operation of the two rules, Article 5 and Article 102-bis of Law 633/1941, produces a legal effect of particular significance: the individual jurisdictional ruling, as an official act, is in the public domain; its placement in a structured database fed by dedicated investment, by contrast, gives the database maker an autonomous right to prohibit mass extraction. It is precisely on this duality that the legal question of whether the SentenzeWeb and ItalgiureWeb portals can be used for training artificial intelligence systems turns.
More recently, the regime of Text and Data Mining (TDM) was grafted onto this framework, introduced into the Italian system by Legislative Decree No 177 of 8 November 2021, transposing Directive (EU) 2019/790 (the so-called Copyright Directive). Two distinct exceptions were inserted into Law 633/1941. Article 70-ter authorises TDM for scientific research purposes, but reserves this exception exclusively to universities, non-profit research bodies and cultural heritage institutions. The exception is mandatory, in that rights holders cannot exclude it, and does not cover commercial uses in a broad sense. Article 70-quater, of broader scope, permits TDM for any purpose, including commercial ones, but only on condition that the rights holders have not expressly reserved such rights ("opt-out"), typically by machine-readable means (such as a robots.txt file), contractual clauses or explicit legal notices.
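The relationship between the two exceptions can be expressed as a simple decision procedure. The sketch below is a deliberate simplification: the function `applicable_tdm_exception` and its parameters are hypothetical, the subject categories are those listed in the text, and the lawful-access requirement reflects the fact that both exceptions presuppose lawful access to the source. Article 70-ter is checked first because, being mandatory, it cannot be neutralised by an opt-out, while Article 70-quater applies only where no reservation has been expressed.

```python
from typing import Optional

# Beneficiaries of the research exception, as listed in the text.
QUALIFIED_SUBJECTS = {"university", "non_profit_research_body", "cultural_heritage_institution"}


def applicable_tdm_exception(subject: str, scientific_research: bool,
                             lawful_access: bool, opt_out: bool) -> Optional[str]:
    """Return the TDM exception that could cover the activity, or None.

    Simplified sketch: a real assessment involves further conditions
    (for instance, rules on the storage and retention of copies) not modelled here.
    """
    if not lawful_access:
        return None  # both exceptions presuppose lawful access to the source
    if subject in QUALIFIED_SUBJECTS and scientific_research:
        return "Art. 70-ter"  # mandatory: an opt-out does not override it
    if not opt_out:
        return "Art. 70-quater"  # general exception, any purpose
    return None  # rights reserved: a licence or authorisation is needed


# A legal tech company mining a source whose holder has reserved TDM rights:
print(applicable_tdm_exception("company", False, True, True))  # None
```

The order of the checks mirrors the text: a qualified research body keeps its exception despite an opt-out, whereas a commercial operator depends entirely on the absence of one.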
Law No 132 of 23 September 2025 (the so-called Italian AI Law), at Article 25, confirmed the structure of the TDM exceptions, reiterating that the extraction of text and data to train artificial intelligence models is permitted on condition that there is "lawful access" to the source and that the holder has not expressed an opt-out. The Italian legislator thereby aligned itself with the European framework, but at the same time reinforced the role of the opt-out as an instrument for protecting the data holder.
A final profile, of growing importance, is that of the protection of personal data. Articles 51 and 52 of Legislative Decree 196/2003 (the Privacy Code) specifically govern the online dissemination of jurisdictional rulings and the anonymisation of identifying data, on the application of the data subject or ex officio, with specific obligations for those who disseminate or re-disseminate the rulings. The matter has recently undergone an evolution of particular significance with Decision No 329 of 20 May 2024 of the Data Protection Authority, which held that the indiscriminate collection of public data online for the purpose of training AI models cannot be founded on mere "legitimate interest" (Article 6(1)(f) GDPR), in the absence of adequate mitigation measures, and imposed on site operators the adoption of technical "anti-scraping" measures. This is a stance which, although not specifically directed at the judicial sector, profoundly affects the lawfulness of mass-extraction operations on rulings, even where such rulings are abstractly in the public domain.
The framework thus reconstructed can be summarised as follows. The reuse of legal data for algorithmic training purposes is lawful only where the following conditions are cumulatively met: (i) lawful access to the source, derivable from an express licence, from an institutional convention or, failing that, from the condition of public domain under Article 5 of Law 633/1941; (ii) the absence of an express opt-out by the holder; (iii) compliance with the limits of the sui generis database right, which excludes unauthorised mass extraction; (iv) observance of the obligations of pseudonymisation and safeguarding imposed by the legislation on the protection of personal data. The concurrent satisfaction of all these prerequisites, which, as will be seen, is by no means a given, is the condition for the lawfulness of the training operation from a civil-law standpoint.
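Because the four prerequisites are cumulative, the failure of any single one is enough to exclude lawfulness. A hypothetical checklist makes the structure of the test explicit and reports which conditions are missing; the class `TrainingAssessment` and its field names are illustrative assumptions, and a legal assessment obviously cannot be reduced to booleans.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class TrainingAssessment:
    lawful_access: bool            # (i) licence, convention or public-domain status
    no_opt_out: bool               # (ii) no express reservation by the holder
    respects_database_right: bool  # (iii) no unauthorised mass extraction
    privacy_safeguards_met: bool   # (iv) pseudonymisation and related duties


def unmet_conditions(a: TrainingAssessment) -> List[str]:
    """Names of the prerequisites that are not satisfied, in order (i)-(iv)."""
    return [f.name for f in fields(a) if not getattr(a, f.name)]


def training_lawful(a: TrainingAssessment) -> bool:
    # Cumulative test: every prerequisite must hold.
    return not unmet_conditions(a)


# Hypothetical mass extraction from a portal whose holder has expressed an
# opt-out and reserved its database right (privacy assumed compliant here).
case = TrainingAssessment(lawful_access=True, no_opt_out=False,
                          respects_database_right=False, privacy_safeguards_met=True)
print(unmet_conditions(case))  # ['no_opt_out', 'respects_database_right']
```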
5. The case of the SentenzeWeb portal and the official position of the CED of the Court of Cassation
The framework reconstructed thus far finds in the case of the Court of Cassation its most significant and, in certain respects, most paradigmatic test bed. The Court represents, as has been observed, the apex body of the ordinary jurisdiction, whose decisions constitute the principal operative interpretive reference for the entire legal system. It is therefore easy to understand how the unavailability of this corpus for algorithmic training purposes constitutes, for anyone intending to build an artificial intelligence system oriented to the Italian legal market, an obstacle of strategic dimension.
To obtain a formal position of the Institution on this problem, the writer transmitted, on 24 February 2026, two distinct applications by certified email (PEC) addressed to the General Protocol of the Court, to the Electronic Documentation Centre and to the Public Relations Office. The first application, registered under no. 24/02/2026.0002234.E, articulated five specific questions relating to: (i) the qualification of the reuse of the contents published on the SentenzeWeb portal; (ii) the protection of the database pursuant to Articles 102-bis and 102-ter of Law 633/1941; (iii) the admissibility of Text and Data Mining in light of Directive 2019/790/EU; (iv) the profiles of personal data protection; (v) the possible existence of structured access channels (API, exports, conventions). The second application, registered under no. 24/02/2026.0002236.E, supplemented the first by asking, in particular, whether there existed an administrative, authorisational or conventional procedure that would allow, in a compliant and lawful manner, the obtaining and reuse of the Court's rulings for professional and economic purposes, including their integration within artificial intelligence systems and automated text analysis.
The replies, formally signed by the Director of the CED, Dr Alessio Scarcella, arrived on 25 February 2026 [18]. These are documents of absolute institutional relevance. As far as is known, they constitute the first public and formalised expression of the Court's position on the matters indicated. Given their centrality to the argument of the present contribution, it is appropriate to analyse their contents point by point, following the order and structure adopted by the Director himself.
The legal regime of the institutional site
The first profile addressed by the Director concerns the legal regime of the institutional site. The Court's legal notices, it is observed, expressly establish that "the website and the data contained therein may be used only for personal use (information, research, study)" and that "any use for commercial purposes or for economic exploitation [...] is expressly prohibited". This condition, the Director specifies, also extends to the materials reachable through the SentenzeWeb portal. As to the text of the rulings, the Director recalls Article 5 of Law 633/1941, which, as seen, excludes authorial protection for the texts of official acts, and concludes that the individual ruling, considered in isolation, is not in itself covered by copyright. The clarification is important, but not decisive: as the Director himself takes care to highlight, the public-domain status of the individual text by no means entails freedom to extract en masse from the portal that conveys it.
The protection of the database
The qualifying point of the reply is the following. In the Director's words, the SentenzeWeb database "constitutes, by selection and organisation of the material and by the dedicated investments, a database within the meaning of Articles 102-bis and 102-ter of Law 633/1941". The maker, in this case the Justice Administration acting through the CED, therefore has the right to prohibit the extraction and reutilisation of the whole or of a substantial part of the content, as well as the repeated and systematic extraction and reutilisation of non-substantial parts where such conduct prejudices the maker's interests. From this it follows, the Director expressly concludes, that "activities such as crawling, scraping, internal indexing, mass and/or periodic acquisitions to build datasets or to integrate proprietary archives are not permitted without authorisation".
The stance is, from a legal standpoint, of limpid coherence with the regulatory framework reconstructed in section 4. The Court distinguishes, as it must, between the text of the ruling, public domain ex Article 5 of Law 633/1941, and the database that organises it, which enjoys autonomous protection and is not available for automated extraction.
Text and Data Mining and the opt-out
On the third profile, the Director conducts an articulated analysis. Article 70-ter of Law 633/1941, the TDM exception for scientific research, is inapplicable to the case of commercial reuse, given that the rule reserves this exception exclusively to qualified subjects (universities, non-profit research bodies, cultural heritage institutions) and for scientific research purposes in the proper sense. The rule does not cover, it is expressly noted, professional and/or commercial uses in a broad sense.
Article 70-quater of Law 633/1941 (the general TDM exception) would be abstractly applicable, since commercial reuse falls within its scope; however, the Director observes that "the Legal Notices express a clear reservation against commercial uses and economic exploitation, with the consequent inapplicability of the general TDM exception to the contents of the SentenzeWeb database". The conclusion is legally unimpeachable. Article 70-quater subordinates the exception to the absence of an "express and appropriate" opt-out; the Court's legal notices constitute, quite evidently, an opt-out of this nature, with the consequent preclusion of the exception.
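The point has a practical consequence for anyone building a crawler: a check limited to machine-readable signals would not detect the reservation that the CED locates in the legal notices. The following sketch uses Python's standard `urllib.robotparser` for the robots.txt side and treats a reservation found in the legal notices as a separate, manually recorded input; the function names, the user agent and the example URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def robots_disallows(robots_txt: str, user_agent: str, url: str) -> bool:
    """True if the given robots.txt content forbids `user_agent` from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


def opt_out_expressed(robots_txt: str, user_agent: str, url: str,
                      legal_notice_reservation: bool) -> bool:
    """An opt-out may come from robots.txt or from the site's legal notices.

    `legal_notice_reservation` must be established by reading the notices;
    no parser can infer it from robots.txt.
    """
    return robots_disallows(robots_txt, user_agent, url) or legal_notice_reservation


# Even a permissive (empty) robots.txt does not exclude an opt-out:
print(opt_out_expressed("", "ExampleBot", "https://example.org/rulings/1",
                        legal_notice_reservation=True))  # True
```

Keeping the legal-notice reservation as an explicit input, rather than deriving it from robots.txt alone, reflects the Director's reading that the notices themselves constitute the opt-out.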
The safeguards regarding personal data
The fourth profile recalls Articles 51 and 52 of the Privacy Code, which, as observed in section 4, govern the online dissemination of jurisdictional rulings and the anonymisation of identifying data, with specific obligations for those who disseminate or re-disseminate the rulings. The Director specifies that "the SentenzeWeb database operates in compliance with such regulatory safeguards", and deduces that "any reuse that does not respect such safeguards is unlawful". The clarification assumes relevance for anyone intending, even once a valid title to reuse has been obtained, to develop dissemination services in turn: the obligation of anonymisation, far from being discharged upstream, is renewed in the hands of each subsequent disseminator.
The openness of public data and the absence of licences
The fifth profile addresses the side of Legislative Decree 36/2006 and the openness of public data. The Director observes that the decree "governs the openness of data and the reuse of public sector information under the conditions and within the limits provided therein, including: respect for intellectual property rights, protection of personal data and the exceptions provided by Article 3; moreover, openness/reuse takes place through uniform licences and conditions, in accordance with the AgID Guidelines". The consequence is clear-cut: "in the absence of a licence or an expressly dedicated channel, no entitlement to commercial reuse can be presumed".
The conclusion that the Director draws from this combination of rules merits full quotation, for its systematic density: "even leaving aside the matter of copyright on the text of the judgment (Article 5 of Law 633/1941), the combined provision of the site's Legal Notices, the sui generis database right, the TDM opt-out, privacy (Articles 51-52) and the absence of Open Data licences applicable to the SentenzeWeb portal entails the impossibility of reusing the contents for purposes other than personal/study use, and precludes economic exploitation (including mass extractions, internal indexing, the creation of datasets and integrations into professional/AI services)". This is, to the writer's knowledge, the most complete synthesis of the Institution's position on these matters, and it deserves to be taken as a reference parameter for any subsequent assessment.
The release of copies by the Public Relations Office
The sixth profile addresses a matter of considerable operational importance: the release of copies by the Public Relations Office (URP) pursuant to Presidential Decree No 115 of 30 May 2002. The Director clarifies that the URP releases simple copies (for study use) and authentic/legal copies (for procedural purposes), also in electronic format with a digital signature, upon payment of the copy fees. However, and here lies the qualifying point, "this is a service for the release of copies for procedural or study purposes, not a title to commercially reuse the contents or to constitute proprietary databases". The availability of a copy, in other words, does not legitimise the economic exploitation of the contents.
The point is significant because it eliminates, from the map of abstractly viable avenues, even the hypothesis, recurrent in practice, of "circumventing" the portal's regime through the mass acquisition of copies via the URP. The reply is clear. Such acquisition does not constitute a valid title for commercial reuse, and any further dissemination or reworking remains subject "a) to the privacy safeguards (Articles 51-52); b) to the prohibition of extraction/reutilisation of the database (Articles 102-bis/ter); c) to the terms of use of the institutional site (prohibition of economic exploitation)".
The non-existence of a procedure for commercial reuse
The most decisive point, and for Italian legal tech the most consequential, is contained in the Director's second reply. To the specific question of whether a procedure exists for obtaining the commercial reuse of the Court's rulings, and whether an application may be submitted for the conclusion of an ad hoc convention, the reply is explicitly negative: "the Court has not opened the commercial reuse of the contents of SentenzeWeb through licences or technical channels and, indeed, excludes it in the Legal Notices. There is, therefore, no procedure to 'authorise' the requested use and, for the same reason, there is no office competent to receive or assess applications to that effect".
The institutional significance of this statement cannot be overstated. An operator intending to build an artificial intelligence system fed by decisions of the Court of Cassation is at present faced not with an onerous or complex path, but with no path at all. There is, in other words, no procedure, not even a theoretical one, for obtaining the Institution's consent to reuse. The practical consequence is radical: however willing an operator may be to accept even stringent conditions (fees, pseudonymisation obligations, transparency constraints, periodic audits), there is at present no authority before which to formalise a proposal to that effect.
The Director adds that any future developments, such as projects for anonymised datasets for scientific research or the publication of APIs under a specific licence, will be announced exclusively through the Court's institutional pages. While this formulation leaves the door open to a possible evolution of the framework, it clearly confirms the Institution's current closure on these matters.
Legal coherence and systematic tensions
The Court's position, as reconstructed, is legally coherent beyond dispute. In substance, the Director did no more than apply the regulatory framework in force with precision: Article 102-bis of Law 633/1941 protects the database; the legal notices express an opt-out that excludes the operation of Article 70-quater; Legislative Decree 36/2006 refers to licences, which in this case have not been adopted; the Privacy Code imposes specific safeguards. From a strictly formal standpoint, the position is unassailable.
From a different perspective, however, the Court's closure raises systemic questions of considerable depth, and at least three sources of tension can be identified.
In the first place, the Cassation's closure results, de facto, in a differentiated treatment of jurisprudential data within the Italian legal system. The Constitutional Court opens its corpus to commercial reuse; so does administrative justice; civil rulings on the merits can be acquired through the AIE Convention; only the Cassation remains entirely closed. This asymmetry does not rest on any distinct legal nature of the rulings (those of the Cassation, like those of the Constitutional Court, are official acts of the State within the meaning of Article 5 of Law 633/1941) but on an organisational choice internal to the individual administration that owns the data. The choice is formally legitimate, but difficult to justify in terms of systemic rationality.
In the second place, the closure is in tension with the constitutionally protected principles of transparency and publicity of jurisdictional acts. As an instrument of nomofilachia (the function of ensuring the uniform interpretation of the law), the judgments of the Cassation serve a purpose that goes beyond the interests of the parties to the individual case: they guide the interpretation of the law across the entire legal system. Their widespread accessibility, in a structured and reusable format, is therefore not a concession to the market but a condition for the effectiveness of the law-unifying function itself. A Court whose case law cannot be systematically analysed, indexed and classified is, for that very reason, a Court whose function of interpretive guidance is partially compromised.
In the third place, the current arrangement produces a market effect that is anything but neutral. Italian legal tech operators who intend to build artificial intelligence systems fed by last-instance case law are de facto compelled to procure this corpus through the traditional legal publishers (Giuffrè, Wolters Kluwer, Il Foro Italiano), which, by virtue of their consolidated relationships with the CED and their own independent headnoting and curation work, possess broad and structured documentary bases. The result is a market in which access to the raw material is, in fact if not in law, reserved to a narrow range of historic operators, while new entrants must negotiate licences downstream rather than turning upstream to the administration that owns the data. In the writer's opinion, although the question falls outside the specific scope of this work, it raises issues under antitrust law and the essential facilities doctrine that would merit separate examination.
From another perspective, it should be noted that the Italian legal tech market includes, as far as can be gathered from a survey of commercially available products, a not insignificant number of operators offering artificial intelligence solutions avowedly fed by the case law of the Cassation, without any public indication of the title authorising the acquisition of the corpus. The descriptions of the sources in commercial materials remain generic ("official databases", "complete case law", "updated archives") and do not allow the origin of the data to be traced to structured publishing licences, to an institutional title (which, as has been seen, does not exist) or to any conventional or authorisation channel. The hypothesis, advanced with caution and in the absence of conclusive public evidence, is that a not insignificant portion of these operators have built their archives of last-instance case law by means falling outside the perimeter of lawfulness reconstructed in the preceding section, specifically through automated extraction from the Court's institutional portals, primarily SentenzeWeb. Such extraction would breach the prohibition of mass extraction set by the legal notices, the sui generis protection of the database under Articles 102-bis and 102-ter of Law 633/1941 and, finally, the express opt-out that those same notices configure pursuant to Article 70-quater of the same law.
If confirmed, this practice exposes the operators involved to substantial legal risk: under civil law, liability for damages arising from the unlawful extraction of a database; under data protection law, the sanctions provided by the GDPR. On a systematic level, it also reveals that the institutional closure of last-instance data produces, paradoxically, the opposite of its declared effect: rather than protecting the database, it generates a grey area of uncontrollable practices in which the safeguarding of lawfulness is, in fact, left to the diligence of individual operators.
In the writer's view, this state of affairs is not a sustainable equilibrium in the medium term. The pressure of European policies on judicial open data, the spread of common technical standards such as ECLI and ELI, the development of the so-called European Judicial Data Space and the maturation of the legal tech market, driven also by the initiatives of public bodies such as Cassa Forense, will presumably, within a reasonable time, make it impossible for the Court to defer taking an institutional position on the opening, albeit regulated and conditional, of its corpus.
6. Concluding observations
The reconstruction conducted in this contribution reveals a structurally asymmetric ecosystem of Italian legal data: mature on the statutory side, deeply heterogeneous on the jurisprudential side, and largely closed to reuse for algorithmic training in its most significant nodes, above all the jurisdiction of last instance.
From an operational standpoint, the landscape facing the legal tech operator is as follows. Legislation is entirely available in open format, under licences suitable for commercial reuse, through Normattiva, the Official Gazette, the parliamentary portals and the individual regional portals; the Constitutional Court releases its corpus as open data; administrative justice offers, through Open GA, the most mature model of openness of jurisdictional data; civil rulings on the merits can be acquired, for a fee and under regulated conditions, through the AIE Convention; the other jurisdictions adopt variously restrictive regimes. The Cassation's corpus, by contrast, is at present entirely closed to commercial reuse, and no procedure exists for obtaining it.
It follows that a legal AI project aimed at the Italian market must be designed, in terms of source compliance, according to a logic of stratification: mass, free ingestion from Normattiva, the regional portals, Open GA and the Constitutional Court's open data portal; adherence to the AIE Convention for the acquisition of civil rulings on the merits, with a pseudonymisation pipeline compliant with the Data Protection Authority's Guidelines; forgoing direct ingestion of the Cassation's corpus and relying instead on the traditional legal publishers, through formally structured business-to-business licences. Compliance, in other words, is not a variable to be managed after the fact, but the very precondition of a sustainable architecture.
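Purely by way of illustration, the stratification just described can be expressed as a source registry that an ingestion pipeline consults before acquiring any data. The following Python sketch is hypothetical: the `Channel` and `Source` types, the `REGISTRY` entries and the functions `ingestion_plan` and `check_before_ingest` are not taken from any existing system, and the licence labels merely summarise the regimes described in this contribution. They are not a legal opinion and would have to be verified against each portal's current terms.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    """Acquisition channel, mirroring the stratification described in the text."""
    OPEN_INGEST = "mass ingestion under an open licence"
    CONVENTION = "onerous convention (AIE), pseudonymise before use"
    PUBLISHER_LICENCE = "B2B licence from a traditional legal publisher"
    EXCLUDED = "no title for commercial reuse: do not ingest"


@dataclass(frozen=True)
class Source:
    name: str
    licence: str                # regime as summarised in the article (assumption)
    channel: Channel
    pseudonymise: bool = False  # must personal data be pseudonymised before use?
    eu_only: bool = False       # must data remain localised within the EU?


# Hypothetical registry; entries must be re-verified against each portal's terms.
REGISTRY = [
    Source("Normattiva", "CC-BY 4.0", Channel.OPEN_INGEST),
    Source("Regional portals", "per-portal licence (verify each)", Channel.OPEN_INGEST),
    Source("Constitutional Court open data", "CC-BY-SA 3.0", Channel.OPEN_INGEST),
    Source("Open GA", "free reuse, source must be cited", Channel.OPEN_INGEST),
    Source("Civil rulings on the merits (AIE Convention)", "onerous convention",
           Channel.CONVENTION, pseudonymise=True, eu_only=True),
    Source("Cassation via legal publisher", "B2B licence", Channel.PUBLISHER_LICENCE),
    Source("SentenzeWeb (Court of Cassation)",
           "Legal Notices: commercial use prohibited", Channel.EXCLUDED),
]


def ingestion_plan(registry):
    """Group source names by acquisition channel, preserving registry order."""
    plan = defaultdict(list)
    for source in registry:
        plan[source.channel].append(source.name)
    return dict(plan)


def check_before_ingest(source):
    """Refuse excluded sources outright; otherwise return the obligations to satisfy."""
    if source.channel is Channel.EXCLUDED:
        raise PermissionError(f"{source.name}: {source.channel.value}")
    obligations = []
    if source.pseudonymise:
        obligations.append("pseudonymise personal data")
    if source.eu_only:
        obligations.append("keep data localised within the EU")
    return obligations


if __name__ == "__main__":
    for channel, names in ingestion_plan(REGISTRY).items():
        print(f"{channel.name}: {', '.join(names)}")
```

The design is deliberately conservative: an excluded source raises an error rather than producing a warning, so that the compliance check operates before, and not after, data enter the corpus.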
On a systematic level, moreover, the current arrangement is not, in the writer's view, an equilibrium destined to last. The progressive pressure of European policies, the spread of shared technical standards, the maturation of the market and, not least, the growing awareness, including within the Institutions, that the openness of jurisdictional data is instrumental to the effectiveness of the law-unifying function are convergent factors that will presumably, in the not-too-distant future, make a reconsideration of the Court of Cassation's current choices unavoidable.
In this perspective, some operational proposals would merit consideration de iure condendo (from the standpoint of the law to be enacted). In the first place, a formal procedure for granting licences for the commercial reuse of the Cassation's rulings, modelled on the AIE Convention (annual fee, pseudonymisation obligation, EU localisation, periodic audits), would be a first, decisive step towards overcoming the current impasse. In the second place, a dedicated open data portal of the Cassation, even if limited to the most recent decisions (for example, the last five years) and subject to the privacy safeguards, would allow the regime of the individual ruling to be decoupled from that of the database and would foster, in a law-unifying perspective, the organised dissemination of case law. In the third place, a joint initiative of the Ministry of Justice, the Cassation and AgID to define a national pseudonymisation standard, consistent with the developments of the European Judicial Data Space, would ensure uniform protection of the privacy of third parties cited in the rulings.
Ultimately, Italian law today faces a challenge that is at once technological, institutional and cultural. The construction of artificial intelligence systems at the service of the jurist can only be founded on a data heritage that is broad, structured, lawfully acquired and respectful of the balance between openness, the protection of privacy and the protection of investment. The challenge involves the legislator, the Institutions that own the data, the publishers and the market operators alike, and it requires, even more than technological innovation, a renewed capacity to read systematically the relationship between law, technology and public function.
Notes
[1] For a historical-institutional framing of Italian legal informatics, reference may be made to the foundational contributions of V. Frosini, Cibernetica, diritto e società, Turin, 1968, and of M.G. Losano, Giuscibernetica. Macchine e modelli cibernetici nel diritto, Turin, 1969. On the specific experience of the CED, founded on the initiative of R. Borruso, see, by the same author, Computer e diritto, Milan, 1988, vol. I.
[2] Intersecting, in particular, are: the institutional plane, concerning the organisation of the sources of production and cognition of the law and the subjects responsible for their publication; the technological plane, relating to formats, standards and distribution infrastructure; the regulatory plane, concerning the legal regime governing access, reuse and systematic extraction of public sector information; and the market plane, finally, relating to the strategies by which publishers, startups and research centres acquire and transform raw legal data into value-added products.
[3] The replies of the Director of the CED, Dr Alessio Scarcella, are dated 25 February 2026 and follow two distinct applications transmitted by certified email (PEC) by the writer on 24 February 2026 (Prot. 24/02/2026.0002234.E and Prot. 24/02/2026.0002236.E). The documentation is held on file at Studio Agostini & Kasapoğlu.
[4] Portal created and managed by the Italian State Mint and Polygraphic Institute (IPZS).
[5] A functionality that allows the rule applicable on any historical date to be reconstructed, correctly handling amendments, repeals and substitutions over time. From a technical standpoint, this is a valuable resource for training systems that must address legal questions governed by the law in force at past dates.
[6] See the "Legal Notices" section of the Normattiva portal (normattiva.it/staticPage/legal), which attests that "the reproduction of the texts provided in electronic format is permitted provided that the source is mentioned". With effect from 1 January 2026, the portal formally adopted the CC-BY 4.0 licence for the release of its data.
[7] Some Regions stand out for the quality of their platforms: Demetra in Emilia-Romagna, Arianna in Piedmont, Lexview in Friuli-Venezia Giulia, SardegnaLegislazione in Sardinia and Lexbrowser of the Autonomous Province of Bolzano offer consolidated multi-version texts, advanced search functionalities, APIs and structured datasets. Although it does not enjoy the visibility of Normattiva, this information resource is of strategic importance for covering areas of regional competence. For completeness, it should be noted that Normattiva does not contain unnumbered acts, and that for regulatory and administrative acts of general interest (Presidential Decrees, ministerial decrees, circulars, communications) the primary source remains the archive of the Official Gazette of the Italian Republic, also curated by the IPZS.
[8] The data of the Chamber of Deputies are accessible in RDF, CSV and JSON format through a SPARQL endpoint (dati.camera.it/sparql); those of the Senate in Akoma Ntoso format, with a dedicated GitHub repository (github.com/senato/openparlamento). The licences adopted are, respectively, CC-BY 4.0 and CC-BY 3.0.
[9] The Constitutional Court's open data portal (dati.cortecostituzionale.it) makes available approximately twenty thousand rulings issued from 1956 to the present, in structured and machine-readable format, under a CC-BY-SA 3.0 licence. This is the only instance of genuine openness of apex-court data in the Italian system.
[10] The Open GA portal (openga.giustizia-amministrativa.it) releases datasets divided by year and by judicial seat, already pseudonymised upstream and equipped with structured procedural metadata. The "Information" section expressly attests: "The contents of Open GA Open Data are freely distributable and reusable, provided that the source is always cited".
[11] The Convention between the Ministry of Justice and the Italian Publishers Association, renewed on 30 January 2025, provides for the supply, through dedicated SFTP or APIs, of civil rulings issued by Tribunals and Courts of Appeal from 1 January 2016, in raw and non-anonymised format. Articles 6.1 and 6.3 of the Convention impose, on the adhering publisher, the pseudonymisation of personal data according to the 2010 Guidelines of the Data Protection Authority; Article 7.3 prescribes the localisation of data within the territory of the European Union.
[12] Tax Case Law Database (bancadatigiurisprudenza.giustiziatributaria.gov.it). The rulings can be consulted through an advanced search engine; the corpus is limited to digitally native judgments issued from 2021 to the present. The CC-BY-NC licence excludes commercial use unless expressly authorised by the Department for Tax Justice.
[13] The Legal Notices of the Court of Auditors restrict the use of its corpus to "the exclusive purposes of research and legal documentation" (corteconti.it). The Legal Notices of the Higher Tribunal of Public Waters authorise reproduction exclusively for non-commercial purposes.
[14] The Legal Notices of the institutional site (cortedicassazione.it) establish verbatim: "The user may not use, or allow third parties to use, for commercial purposes the website and the data contained therein. Any use for commercial intent or utility or for economic exploitation is expressly prohibited". The clause, together with the reservation of rights expressed therein, constitutes an opt-out within the meaning of Article 70-quater of Law 633/1941.
[15] Presidential Decree No 195 of 3 July 2004 governs the activity of the CED. On an operational level, ItalgiureWeb aggregates: civil and criminal case law of the Cassation (headnotes and full judgments); judgments of the Council of State and of the Regional Administrative Courts; headnotes of the Constitutional Court; case law of the Court of Auditors; judgments and abstracts of the European Court of Human Rights; judgments of the Higher Tribunal of Public Waters; rulings of the Tax Commissions; as well as legislation, doctrine and classificatory schemes. Access is reserved to subscribers, with specific conventions also for those enrolled with Cassa Forense.
[16] The provision is intended to prevent the indiscriminate opening of public data from imposing a disproportionate sacrifice on the confidentiality of third parties. The same logic underpins, from another standpoint, the privacy safeguards applicable to jurisdictional rulings (Articles 51-52 of Legislative Decree 196/2003).
[17] As has been seen, this general principle becomes decisive precisely in the analysis of the case of the Court of Cassation.
[18] The replies, formally signed by the Director of the CED Dr Alessio Scarcella, are dated 25 February 2026. The full documentation is held on file at Studio Agostini & Kasapoğlu.
Alberto Agostini and Irmak Kasapoğlu. Bologna, 10 May 2026. A contribution of Studio Agostini & Kasapoğlu.
#ArtificialIntelligence #LegalTech #OpenData #LegalData #CourtOfCassation #TechnologyLaw #TextAndDataMining #LegalAI #GDPR #StudioAgostiniKasapoglu