Although the number of AI models distributed under Open Source licenses is increasing, it can be said that AI systems in which all related components, including training data, are open are still in a developmental stage, even as a few promising systems have emerged. In this context, this past May, the Linux Foundation, in collaboration with companies such as Amazon, Meta, IBM, and Microsoft, released the Open Model Definition & Weights License v1.0 (OpenMDW-1.0). This license comprehensively handles the components related to an AI system and grants open usage rights. This article will discuss the Open Source nature of OpenMDW-1.0, compare it with other licenses, and raise points of legal ambiguity and topics that require discussion within the AI domain.

  1. Is This a License That Can Be Called Open Source?
    1. Features of OpenMDW
    2. Evaluation of Conformance to Open Source
  2. Unclear Points and Concerns Regarding OpenMDW
    1. The Meaning of Including Trade Secrets in the Scope of the Grant
    2. The Broader, Clearer, and Stronger Patent Retaliation Clause
    3. Forcing Due Diligence on the User
    4. Inferior to Creative Commons as a Data License
    5. The Potential to Encourage Openwashing
  3. Overall Evaluation
  4. OpenMDW License Agreement, version 1.0 (OpenMDW-1.0)

Is This a License That Can Be Called Open Source?

OpenMDW is a license that comprehensively handles AI-related components such as AI models, data, and weights (parameters), which were not explicitly within the scope of conventional software-centric licenses. Overall, it can be described as a permissive license with many similarities to the MIT License and Apache-2.0. Regarding whether it is a license that can be called “Open Source” by conforming to the 10 points of the “Open Source Definition,” it clears the basic principles such as freedom of redistribution, provision of source code, and freedom to create derivative works. Furthermore, as it can be evaluated as having no restrictions on people or fields of endeavor, nor any clauses that exclude specific technologies, it is highly likely to be approved as an Open Source license. (Note: As of July 13, 2025, OpenMDW has not been submitted to the Open Source Initiative (OSI) for its license approval process.)

Subject to your compliance with this agreement, permission is hereby granted, free of charge, to deal in the Model Materials without restriction, including under all copyright, patent, database, and trade secret rights included or embodied therein.

The core grant of rights is stipulated as above. The structure of this clause is very similar to the grant of rights clause in the MIT License. However, its subject is not “software” as in MIT, but rather “Model Materials,” a concept defined within OpenMDW that comprehensively includes the model architecture, parameters, and related data and documentation. Furthermore, the rights to deal in the materials without restriction and free of charge are explicitly listed as including “all copyright, patent, database, and trade secret rights.” In essence, this grants comprehensive freedom of use for the various components that can be distributed together with the model at their core. Although similar to the MIT License, MIT relies on implied licenses for rights other than copyright. By explicitly granting all conceivable rights in the new domain of AI, it seems the intent is to seal the risk of sophistry prevailing—such as the argument, “While the copyright is granted, the ideas not protected by copyright are not.”

Features of OpenMDW

Other features or conditions of the license include the following.

  • Conditions for redistribution are only a copy of the license agreement and copyright notices or notices of origin.
    This is a similar concept to the MIT License and others, but a difference lies in the inclusion of “notices of origin.” This is likely because the diversity of components constituting an AI system may require notices other than a copyright notice. Basically, it would mean that all statements indicating where the distributed components came from, and by whose hand, should be retained in a NOTICE file or similar.
  • A patent clause with a framework similar to Apache-2.0.
    Simply put, this is a framework similar to the patent clause of Apache-2.0. It is a clause that terminates the user’s license if they initiate a patent infringement lawsuit against the model provider, but there are several minor changes tailored for AI components. This point will be discussed later, but at present, I evaluate that it does not affect its Open Source nature.
  • Explicitly states no restrictions or obligations on the use of AI-generated outputs.
    Generally, the rights of the model provider do not extend to the output of an AI unless an explicit contract exists. However, OpenMDW explicitly states that there are no restrictions or obligations on the output. This is in contrast to the license agreements for Meta Llama, Google Gemma, and others, which have a “viral effect” that propagates through the output, and this serves as a point of reassurance for users.
  • Clarification of due diligence responsibility.
    It is explicitly stated that the clearing of third-party rights, such as copyrights that may be included in the training data, and the acquisition of necessary permissions are all the responsibility of the model user to conduct due diligence. While this can be said to be a realistic clause that mitigates risks for the model provider, it can be a burden for small and medium-sized model users who may not have processes for rights auditing. However, regardless of the existence of such a clause, the burden principally falls on the user side unless there are specific regulatory laws, and thus it can be judged as not affecting the evaluation of its Open Source nature.

Evaluation of Conformance to Open Source

Summarizing the features of the license thus far, it cannot be judged that OpenMDW contains content that violates the Open Source Definition. Therefore, it is considered that there are no major obstacles at present for it to become an OSI-approved license. It is also a permissive license similar to the MIT License and Apache-2.0, and it has the convenience of being comprehensively applicable to AI-related components such as models, code, data, and documents. With the simplicity of its clauses, there is no doubt that it is an easy-to-handle license for AI model providers.

However, as will be discussed later, several unclear points and concerns that could be indirectly caused by the license are conceivable. I believe that the situations where the use of OpenMDW can be recommended are actually few compared to what its simplicity might suggest. For many organizations pursuing Open Source, a combination of traditional permissive Open Source licenses and Creative Commons-style licenses will likely be more suitable in most cases.


Unclear Points and Concerns Regarding OpenMDW

OpenMDW is a license designed for AI systems that is easy for model providers to handle. On the other hand, by boldly incorporating domains different from those for conventional program code into its scope, it has created uncertainties that are expected to give rise to several discussions, and there are also concerns that arise from them. None of them are fatal flaws, but I believe it cannot be denied that they may become major problems in the future in the rapidly changing field of AI.

The Meaning of Including Trade Secrets in the Scope of the Grant

The rights explicitly within the scope of OpenMDW include copyright, patent, database, and trade secret rights. Copyright is the foundation of the rights handled by Open Source licenses in the first place, and patent rights are handled by licenses such as Apache-2.0 and GPL-3.0, so they are no longer viewed as special. As for database rights, they are implemented in Creative Commons-style licenses, and it can be said that granting them for content that includes database characteristics is becoming common. However, a license that grants explicit usage rights for trade secrets is extremely rare.

In the first place, in major legal jurisdictions such as Japan, the EU, and the United States, the protection of trade secrets absolutely requires that they be “not publicly known” (secret). Therefore, information distributed and made public under a license like OpenMDW is no longer legally protected as a trade secret. The question will naturally arise as to whether there is any meaning in deliberately granting usage rights for this fundamentally unprotected right.

Considering the potential legal meaning of the trade secret clause, it would at least have a declaratory meaning that the provider will not assert rights over the know-how and ideas realized in the model, such as the algorithm, design philosophy, and parameter tuning methods. This might have the effect of preventing a malicious model provider from later using the sophistry that “while the copyright was granted, the use of the implemented ideas and know-how was not.” It is also conceivable that listing trade secrets has the effect of making the license’s intent—to allow unrestricted and free use from the perspective of all intellectual property rights, including yet unknown rights—as robust as possible.

However, it is still only meaningful to that extent. In exchange for clarifying the unrestricted grant of model use to users, there may be unknown side effects from incorporating a right that is not very familiar to the license framework. I think there will probably be no problems, but the differences in the legal definitions of trade secrets among countries are greater than those for copyright, and there will likely be a legal test period for several years.

The Broader, Clearer, and Stronger Patent Retaliation Clause

The patent clause of OpenMDW is very similar to the patent retaliation clause of Apache-2.0, and its basic philosophy and purpose can be evaluated as the same. However, there are several points of difference from Apache-2.0.

If you file, maintain, or voluntarily participate in a lawsuit against any person or entity asserting that the Model Materials directly or indirectly infringe any patent, then all rights and grants made to you hereunder are terminated, unless that lawsuit was in response to a corresponding lawsuit first brought against you.

Under Apache-2.0, the patent license terminates when a patent lawsuit is filed. In OpenMDW, however, it is not only when a lawsuit is “filed,” but also when it is “maintained, or voluntarily participated in,” which covers not only the plaintiff who initiated the lawsuit but also acts of joining the lawsuit later in some capacity. Furthermore, the scope of the lawsuit covers the broad concept of “Model Materials,” which is unique to OpenMDW, and the wording “directly or indirectly infringe any patent” also includes claims of “indirect” infringement. In other words, the patent retaliation clause of OpenMDW covers a much broader range of actions and scope compared to Apache-2.0, and one can discern the intent to address future, unknown forms of litigation in the new domain of AI. This breadth of scope may appear as a harsh condition to some users, but it is likely favorable in that it protects the model provider as much as possible.

Incidentally, while a situation of direct patent infringement is easy to understand, what exactly is indirect infringement? It is presumed that this refers to indirect infringement as stipulated in the patent laws of relevant jurisdictions, but the requirements and philosophy of this indirect infringement differ in several respects in jurisdictions such as Japan, the United States, and the EU. Therefore, a concern arises that these differences in the requirements for indirect infringement may cause confusion in the interpretation of the license. There could be a pattern where an act is judged as indirect infringement in the United States but is not deemed an infringement in Japan. However, this expression “indirectly” is likely intended to cover the general concept of “non-direct infringement” in any country under whose laws it is interpreted, and it is thought to aim at strongly deterring litigation from users through legal uncertainty.

Furthermore, there is another, most significant difference from Apache-2.0. Under Apache-2.0, it is the patent license that terminates upon the filing of a lawsuit, but under OpenMDW, it is stipulated that “all rights and grants” shall terminate. This is unambiguous and constitutes a harsher retaliatory measure. Combined with the breadth of its scope, it is considered an extremely powerful legal safeguard to prevent lawsuits and is likely aimed at the practical effect of preventing patent disputes within the AI community as much as possible. On the other hand, it will likely appear as a harsh condition, especially for large corporate model users, and since it covers the maximum conceivable scope, the possibility of unknown cases arising cannot be denied.

Forcing Due Diligence on the User

In Open Source licenses, disclaimer clauses consisting of “AS IS” and “no-liability” are common. While OpenMDW has clauses almost identical to those in MIT and others, there is another major disclaimer clause. A clause has been added that makes “clearing all third-party rights included in the Model Materials, obtaining necessary consents, and due diligence all the responsibility of the user.” This effectively obligates the user to audit whether third-party rights, such as an individual’s copyright or personal rights including privacy, are included in the model or the training data distributed with it.

For the model provider, this due diligence clause may provide a certain sense of security, but they cannot escape mandatory provisions such as personal information protection, consumer protection, and AI regulations in various countries. For the model user, it can also be seen as imposing a very large burden of due diligence. The model provider might be perceived by the public as arrogant, and users might feel, “All rights were supposed to be granted, but this is a different story!” Therefore, some might think that it only creates unnecessary confusion.

However, this due diligence clause is likely a sincere reflection of the reality that a model provider cannot grant rights they do not possess, such as a third party’s copyright or personal rights, nor can they escape the mandatory laws of each country. In the end, in the use of AI models, including their outputs, it is the user who must continue to audit whether they are infringing on the rights of third parties. Personally, I consider this to be a favorable clause for an AI model license.

Inferior to Creative Commons as a Data License

The OpenMDW license allows a single model to be distributed under the same license along with related source code, documentation, and metadata. The components distributed under this same license also include the training dataset. This mechanism provides the convenience of licensing all components related to model development in a single batch. On the other hand, it cannot be said, even as a compliment, that OpenMDW is a suitable license for a training dataset.

The grant conditions of OpenMDW can be said to be close to CC-BY in the Creative Commons family, but it is inferior in several respects when compared as a data license. First, CC-BY 4.0 is designed to clearly handle data-specific rights, assuming the EU’s database rights, whereas OpenMDW only grants a comprehensive database concept and can be said to be legally unclear. Also, the risk of the data itself infringing on a patent is extremely low, and the strong patent retaliation clause of OpenMDW is excessive for a data license and could hinder its use by companies with legitimate rights.

Furthermore, Creative Commons holds the status of a de facto standard for open data licenses, and the rights and obligations regarding the use, modification, and redistribution of data are designed in line with the common understanding of the global data community. There seems to be little meaning in applying a license that is not suitable for a dataset, going against this common sense.

Creative Commons includes not only CC-BY but also CC0, which is close to a public domain declaration, and CC-BY-SA, a license where the same CC-BY-SA license propagates to derivative datasets if one is created. It is common practice to use them selectively depending on the use case and purpose. Introducing a comprehensive license like OpenMDW into this landscape might only push the licensing of AI training data into a more complex situation. I also fear that the casual selection of OpenMDW could induce cases where conflicts with CC-BY-SA or copyleft licenses arise.

The Potential to Encourage Openwashing

Personally, this is the biggest concern. I personally judge that the OpenMDW license conforms to the “Open Source Definition,” and if the license were submitted to the OSI, it is highly likely to be approved as Open Source. However, when the number of model development vendors adopting the OpenMDW license actually increases, this license might encourage the “openwashing” that has plagued our community for the past few years.

As explained in the previous section, when the OpenMDW license is applied to a specific model, it is also applied to related components placed in the same repository. This means that if a training dataset is included, OpenMDW is applied to it. In this case, it is possible for a model vendor to release a model under the OpenMDW license by including only a small amount, or a dataset of little rarity or importance, in the same repository, and then to advertise this state as “released as Open Source AI” or as “a truly open model from the dataset level.” If the OpenMDW license is one that conforms to the “Open Source Definition,” it would lend credibility to such sophistry—that is, to openwashing.

In the “Open Source AI Definition” that the OSI has worked tirelessly to create, the requirement for data is “information in sufficient detail to allow the construction of a practically equivalent system.” This allows anyone to reconstruct an equivalent model to verify its behavior or to examine what data it was trained on and what biases it might have. However, if only a small part of the training data is released, these values would be completely lost. But if the OpenMDW license, which conforms to the “Open Source Definition,” becomes commonly used, it would create a situation where one can claim to be Open Source as long as they release just a portion of the dataset, while keeping the rest completely secret. This would damage the “Open Source” brand, which has built up immense value over several decades, representing “freedom,” “transparency,” “collaboration,” and “trust.”

I do not know why the Linux Foundation has not yet submitted the OpenMDW license to the OSI’s Open Source approval process, but I currently recognize the OpenMDW license as a well-thought-out and excellent license. The reason that it might encourage openwashing should not be a reason to deny its approval as Open Source. Rather, I believe this has clarified the challenges regarding AI-related licenses for the Open Source license community, centered around the OSI.

We must continue our efforts to promote the interpretation that applies Article 2 “Source Code” of the Open Source Definition—that it “must be the preferred form in which a programmer would modify the program”—to AI, meaning that “the preferred form for studying and modifying an AI model includes not only the inference code but also information about the training data and process.” By doing so, it will be possible to reduce the view that OpenMDW encourages openwashing.


Overall Evaluation

The OpenMDW license is a permissive license that makes the concepts of MIT and Apache-2.0 applicable to an entire complex AI system, and it may be the first license specialized for AI systems that the Open Source community can be satisfied with. The declaration that there are absolutely no restrictions on the output is also very favorable.

However, I think it is necessary to hear opinions from various legal jurisdictions on the point of including trade secrets in the granted rights, and I also feel that the effect of the patent retaliation clause extending to data is somewhat excessive. Considering the enormous size and licensing complexity of general training data, I believe the best practice will continue to be a separated approach where the training data is distributed in a different repository from the model, under a different license such as one from the Creative Commons family. The approach of applying MIT or Apache-2.0 to the model and a CC-style license to the dataset will likely continue in the future.

Presumably, cases where it is suitable to apply the OpenMDW license to include some training data will basically be special cases. The ones that come to mind are as follows:

  • Cases involving scientific and technical data with patent risks. If the data itself carries litigation risk, such as in drug discovery, genetics, or semiconductor design, a comprehensive patent retaliation clause that provides a deterrent to patent disputes is suitable.
  • Cases where derivative or synthetic data processed specifically for the model exists. If the data is a byproduct of model preprocessing and is incomplete for other uses, it is clearer to align its license with that of the model for easier management.
  • Cases where model parameters and data are interdependent to ensure reproducibility. When the model’s operation cannot be verified without the model alone, batch licensing with clear rights relationships is easier to understand.

As someone not involved in AI model development, I can only point out such general cases. However, it is safe to assume that recommending the application of OpenMDW to include training data is limited to special circumstances where the model and data need to be inseparable. Still, it can be considered a sufficiently useful license even for just that.

There is a concern of openwashing, where some large corporations might claim to be “Open Source AI” by including a non-critical dataset with a model to which OpenMDW is applied. However, confronting this concern is the job of the Open Source license community, centered around the OSI. To repeat, we must continue our efforts to promote the premise for being called “Open Source AI,” which is that “the preferred form for studying and modifying an AI model includes not only the inference code but also information about the training data and process.”


OpenMDW License Agreement, version 1.0 (OpenMDW-1.0)

By exercising rights granted to you under this agreement, you accept and agree to its terms.

As used in this agreement, “Model Materials” means the materials provided to you under this agreement, consisting of: (1) one or more machine learning models (including architecture and parameters); and (2) all related artifacts (including associated data, documentation and software) that are provided to you hereunder.

Subject to your compliance with this agreement, permission is hereby granted, free of charge, to deal in the Model Materials without restriction, including under all copyright, patent, database, and trade secret rights included or embodied therein.

If you distribute any portion of the Model Materials, you shall retain in your distribution (1) a copy of this agreement, and (2) all copyright notices and other notices of origin included in the Model Materials that are applicable to your distribution.

If you file, maintain, or voluntarily participate in a lawsuit against any person or entity asserting that the Model Materials directly or indirectly infringe any patent, then all rights and grants made to you hereunder are terminated, unless that lawsuit was in response to a corresponding lawsuit first brought against you.

This agreement does not impose any restrictions or obligations with respect to any use, modification, or sharing of any outputs generated by using the Model Materials.

THE MODEL MATERIALS ARE PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, TITLE, NONINFRINGEMENT, ACCURACY, OR THE ABSENCE OF LATENT OR OTHER DEFECTS OR ERRORS, WHETHER OR NOT DISCOVERABLE, ALL TO THE GREATEST EXTENT PERMISSIBLE UNDER APPLICABLE LAW.

YOU ARE SOLELY RESPONSIBLE FOR (1) CLEARING RIGHTS OF OTHER PERSONS THAT MAY APPLY TO THE MODEL MATERIALS OR ANY USE THEREOF, INCLUDING WITHOUT LIMITATION ANY PERSON’S COPYRIGHTS OR OTHER RIGHTS INCLUDED OR EMBODIED IN THE MODEL MATERIALS; (2) OBTAINING ANY NECESSARY CONSENTS, PERMISSIONS OR OTHER RIGHTS REQUIRED FOR ANY USE OF THE MODEL MATERIALS; OR (3) PERFORMING ANY DUE DILIGENCE OR UNDERTAKING ANY OTHER INVESTIGATIONS INTO THE MODEL MATERIALS OR ANYTHING INCORPORATED OR EMBODIED THEREIN.

IN NO EVENT SHALL THE PROVIDERS OF THE MODEL MATERIALS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE MODEL MATERIALS, THE USE THEREOF OR OTHER DEALINGS THEREIN.

The Hidden Risks of NVIDIA’s Open Model License

Recently, regarding the open-weights AI model “Nemotron 3” released by NVIDIA, there are scattered media reports mistakenly describing it as open source. Because there is concern that these reports encourage ignoring the usage risks of the NVIDIA Open Model License Agreement (version dated October 24, 2025; hereinafter referred to as the NVIDIA License), which is…

The Current State of the Theory that GPL Propagates to AI Models Trained on GPL Code

When GitHub Copilot was launched in 2021, the fact that its training data included a vast amount of Open Source code publicly available on GitHub attracted significant attention, sparking lively debates regarding licensing. While there were issues concerning conditions such as attribution required by most licenses, there was a particularly high volume of discourse suggesting…

The Legal Hack: Why U.S. Law Sees Open Source as “Permission,” Not a Contract

In Japan, the common view is to treat an Open Source license as a license agreement, or a contract. This is also the case in the EU. However, in the United States—the origin point for almost every aspect of Open Source—an Open Source license has long been considered not a contract, but a “unilateral permission”…

A Curious Phenomenon with Gemma Model Outputs and License Propagation

While examining the licensing details of Google’s Gemma model, I noticed a potentially puzzling phenomenon: you can freely assign a license to the model’s outputs, yet depending on how those outputs are used, the original Terms of Use might suddenly propagate to the resulting work. Outputs vs. Model Derivatives The Gemma Terms of Use distinguish…

Should ‘Open Source AI’ Mean Exposing All Training Data?

DeepSeek has had a major global impact. This appears to stem not only from the emergence of a new force in China that threatens the dominance of major U.S. AI vendors, but also from the fact that the AI model itself is being distributed under the MIT License, which is an Open Source license. Nevertheless,…

Significant Risks in Using AI Models Governed by the Llama License

Although it has already been explained that the Llama model and the Llama License (Llama Community License Agreement) do not, in any sense, qualify as Open Source, it bears noting that the Llama License contains several additional issues. While not directly relevant to whether it meets Open Source criteria, these provisions may nonetheless cause the…

The Hidden Traps in Meta’s Llama License

— An Explanation of Llama’s Supposed “Open Source” Status and the Serious Risks of Using Models under the Llama License — It is widely recognized—despite Meta’s CEO persistently promoting the notion that “Llama is Open Source”—that the Llama License is in fact not Open Source. Yet few individuals have clearly articulated the precise reasons why…