dataAttribs->tsr->expandTsrV() ),
			new KV( 'content', $content, $srcOffsets->expandTsrV() ),
			new KV( 'inlineContext', ( $opts['inlineContext'] ?? false ) ? "1" : "0" ),
			new KV( 'inPHPBLock', ( $opts['inPHPBLock'] ?? false ) ? "1" : "0" ),
		] );
	}

	/**
	 * Processes content (wikitext, array of tokens, whatever) in its own pipeline
	 * based on options.
	 *
	 * @param Env $env The environment/context for the expansion.
	 * @param Frame $frame
	 *    The parent frame within which the expansion is taking place.
	 *    Used for template expansion and source text tracking.
	 * @param string|Token|Token[] $content
	 *    This could be wikitext, a single token, or an array of tokens.
	 *    How this content is processed depends on what kind of pipeline
	 *    is constructed, as specified by $opts.
	 * @param array $opts
	 *    Processing options that specify pipeline-type, opts, and callbacks.
	 *    - string pipelineType
	 *    - array pipelineOpts
	 *    - array tplArgs - if set, defines parameters for the child frame
	 *      - string tplArgs.name
	 *      - array tplArgs.attribs
	 *    - string srcText - if set, defines the source text for the expansion
	 *    - SourceRange srcOffsets - if set, defines the range within the
	 *      source text that $content corresponds to
	 *    - bool sol
	 * @return Token[]|DOMDocument (depending on pipeline type)
	 */
	public static function processContentInPipeline(
		Env $env, Frame $frame, $content, array $opts
	) {
		// Build a pipeline
		$pipeline = $env->getPipelineFactory()->getPipeline(
			$opts['pipelineType'],
			$opts['pipelineOpts']
		);

		// Set frame if necessary
		$srcText = $opts['srcText'] ?? $frame->getSrcText();
		if ( isset( $opts['tplArgs'] ) ) {
			$pipeline->setFrame( $frame, $opts['tplArgs']['title'], $opts['tplArgs']['attribs'], $srcText );
		} else {
			$pipeline->setFrame( $frame, null, [], $srcText );
		}

		// Set source offsets for this pipeline's content
		if ( isset( $opts['srcOffsets'] ) ) {
			$pipeline->setSourceOffsets( $opts['srcOffsets'] );
		}

		// Off the starting block ... ready, set, go!
		return $pipeline->parse( $content, [ "sol" => $opts['sol'] ] );
	}

	/**
	 * Expands value all the way to DOM.
	 *
	 * @param Env $env
	 *    The environment/context for the expansion.
	 * @param Frame $frame
	 *    The parent frame within which the expansion is taking place.
	 *    Used for template expansion and source text tracking.
	 * @param array $v
	 *    The value to process.
	 *    The value is expected to be an associative array with an "html" property.
	 *    The html property is expanded to DOM only if it is an array (of tokens).
	 *    If it is not an array, $v is passed back unexpanded (apart from
	 *    srcOffsets being removed).
	 * @param bool $expandTemplates
	 *    Should any templates encountered here be expanded
	 *    (usually false for nested templates since they are never directly editable).
	 * @param bool $inTemplate
	 *    Unexpanded templates can occur in the content of extension tags.
	 * @return array
	 */
	public static function expandValueToDOM(
		Env $env, Frame $frame, array $v, bool $expandTemplates, bool $inTemplate
	): array {
		if ( is_array( $v['html'] ?? null ) ) {
			// Set up pipeline options
			$opts = [
				'pipelineType' => 'tokens/x-mediawiki/expanded',
				'pipelineOpts' => [
					'attrExpansion' => true,
					'inlineContext' => true,
					'expandTemplates' => $expandTemplates,
					'inTemplate' => $inTemplate
				],
				'srcOffsets' => $v['srcOffsets'],
				'sol' => true
			];
			$content = array_merge( $v['html'], [ new EOFTk() ] );
			$dom = self::processContentInPipeline(
				$env,
				$frame,
				$content,
				$opts
			);
			// Since we aren't at the top level, data attrs
			// were not applied in cleanup. However, tmp
			// was stripped.
			$v['html'] = ContentUtils::ppToXML(
				DOMCompat::getBody( $dom ), [ 'innerXML' => true ]
			);
		}
		// Remove srcOffsets after value is expanded, so they don't show
		// up in the output data-mw attribute
		unset( $v['srcOffsets'] );
		return $v;
	}

	/**
	 * @param Env $env
	 *    The environment/context for the expansion.
	 * @param Frame $frame
	 *    The parent frame within which the expansion is taking place.
	 *    Used for template expansion and source text tracking.
	 * @param array $vals
	 *    Array of values to expand.
	 *    Each element must be an associative array (expandValueToDOM() only
	 *    accepts arrays) with an "html" property.
	 *    The html property is expanded to DOM only if it is an array (of tokens).
	 * @param bool $expandTemplates
	 *    Should any templates encountered here be expanded
	 *    (usually false for nested templates since they are never directly editable).
	 * @param bool $inTemplate
	 *    Unexpanded templates can occur in the content of extension tags.
	 * @return array
	 */
	public static function expandValuesToDOM(
		Env $env, $frame, array $vals, bool $expandTemplates, bool $inTemplate
	): array {
		$ret = [];
		foreach ( $vals as $v ) {
			$ret[] = self::expandValueToDOM( $env, $frame, $v, $expandTemplates, $inTemplate );
		}
		return $ret;
	}

	/**
	 * Convert a DOM element's attributes to token attributes. The node comes
	 * from a DOM whose data attributes are stored outside the DOM.
	 *
	 * @param DOMElement $node
	 * @param DOMAttr[] $attrs
	 * @return array An array with two keys: 'attrs' (KV[] for the node's
	 *    attributes, plus a JSON-encoded 'data-mw' if the node has valid
	 *    data-mw) and 'dataAttrs' (the node's data-parsoid)
	 */
	private static function domAttrsToTagAttrs( DOMElement $node, array $attrs ): array {
		$out = [];
		foreach ( $attrs as $a ) {
			if ( $a->name !== DOMDataUtils::DATA_OBJECT_ATTR_NAME ) {
				$out[] = new KV( $a->name, $a->value );
			}
		}
		if ( DOMDataUtils::validDataMw( $node ) ) {
			$out[] = new KV( 'data-mw', PHPUtils::jsonEncode( DOMDataUtils::getDataMw( $node ) ) );
		}
		return [ 'attrs' => $out, 'dataAttrs' => DOMDataUtils::getDataParsoid( $node ) ];
	}

	/**
	 * Convert a DOM to tokens. Data attributes for nodes are stored outside the DOM.
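	 *
	 * Illustrative example (hand-traced from the code below, not a test fixture):
	 * the element <p>a<br>b</p> is appended to $tokBuf as
	 *   TagTk( 'p' ), 'a', SelfclosingTagTk( 'br' ), 'b', EndTagTk( 'p' )
	 * The TagTk and SelfclosingTagTk carry the attributes and data-parsoid
	 * returned by domAttrsToTagAttrs(); the EndTagTk only gets stx => 'html'
	 * when the <p> came from literal HTML. Text nodes go through
	 * TokenUtils::newlinesToNlTks(), so a text node containing newlines
	 * also yields NlTk tokens between its string pieces.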
	 *
	 * @param DOMNode $node The root of the DOM tree to convert to tokens
	 * @param Token[] $tokBuf Tokens accumulated so far; the tokens for $node
	 *    are appended to a copy of this buffer
	 * @return array The updated token buffer
	 */
	private static function convertDOMtoTokens( DOMNode $node, array $tokBuf ): array {
		if ( $node instanceof DOMElement ) {
			$nodeName = strtolower( $node->nodeName );
			$attrInfo = self::domAttrsToTagAttrs( $node, DOMCompat::attributes( $node ) );

			if ( Utils::isVoidElement( $nodeName ) ) {
				$tokBuf[] = new SelfclosingTagTk( $nodeName, $attrInfo['attrs'], $attrInfo['dataAttrs'] );
			} else {
				$tokBuf[] = new TagTk( $nodeName, $attrInfo['attrs'], $attrInfo['dataAttrs'] );
				for ( $child = $node->firstChild; $child; $child = $child->nextSibling ) {
					$tokBuf = self::convertDOMtoTokens( $child, $tokBuf );
				}
				$endTag = new EndTagTk( $nodeName );
				// Keep stx parity
				if ( WTUtils::isLiteralHTMLNode( $node ) ) {
					$endTag->dataAttribs = PHPUtils::arrayToObject( [ 'stx' => 'html' ] );
				}
				$tokBuf[] = $endTag;
			}
		} elseif ( $node instanceof DOMText ) {
			$tokBuf = array_merge( $tokBuf, TokenUtils::newlinesToNlTks( $node->nodeValue ) );
		} elseif ( $node instanceof DOMComment ) {
			$tokBuf[] = new CommentTk( $node->nodeValue );
		} else {
			// getWrapperTokens calls convertDOMtoTokens with a DOMElement
			// and children of DOM elements are always text/comment/elements
			// which are all covered above.
			PHPUtils::unreachable( "Should never get here!" );
		}

		return $tokBuf;
	}

	/**
	 * Get tokens representing a DOM forest (from transclusions, extensions,
	 * or anything else generated as part of a separate processing pipeline)
	 * in the token stream. These tokens will tunnel the subtree through the
	 * token processing while preserving token stream semantics as if
	 * the DOM had been converted to tokens.
	 *
	 * @param DOMNode[] $nodes List of DOM nodes that need to be tunneled through.
	 * @param array $opts
	 * @see encapsulateExpansionHTML's documentation for more info about these options.
	 * @return Token[] List of token representatives.
	 */
	public static function getWrapperTokens( array $nodes, array $opts ): array {
		if ( !$nodes ) {
			return [ new TagTk( 'span' ), new EndTagTk( 'span' ) ];
		}

		$node = $nodes[0];

		// Do we represent this with inline or block elements?
		// This is to ensure that we get p-wrapping correct.
		//
		// * If all content is inline, we use inline-elements to represent this
		//   so that this content gets swallowed into the P tag that wraps
		//   adjacent inline content.
		//
		// * If any part of this is block content, we treat extension content
		//   independent of surrounding content and don't want inline content
		//   here to be swallowed into a P tag that wraps adjacent inline content.
		//
		// This behavior ensures that we and clients can "drop-in" extension content
		// into the DOM without messing with fixing up paragraph tags of surrounding
		// content. It could potentially introduce minor rendering differences when
		// compared to PHP parser output, but we'll swallow it for now.
		$wrapperType = 'INLINE';
		if ( !empty( $opts['pipelineOpts']['inlineContext'] ) ) {
			// If the DOM fragment is being processed in the context where P wrapping
			// has been suppressed, we represent the DOM fragment with inline-tokens.
			//
			// FIXME(SSS): Looks like we have some "impedance mismatch" here. But, this
			// is correct in scenarios where link-content or image-captions are being
			// processed in a sub-pipeline and we don't want a <div> in the link-caption
			// to cause the <a>..</a> to get split apart.
			//
			// Filed as T49963
		} elseif ( !empty( $opts['sealFragment'] ) ) {
			// Sealed fragments aren't amenable to inspection, since the
			// ultimate content is unknown. For example, refs shuttle content
			// through treebuilding that ends up in the references list.
			//
			// FIXME(arlolra): Do we need a mechanism to specify content
			// categories?
		} else {
			for ( $i = 0; $i < count( $nodes ); $i++ ) {
				if (
					DOMUtils::isBlockNode( $nodes[$i] ) ||
					DOMUtils::hasBlockElementDescendant( $nodes[$i] )
				) {
					$wrapperType = 'BLOCK';
					break;
				}
			}
		}

		$wrapperName = null;
		if ( $wrapperType === 'BLOCK' && !DOMUtils::isBlockNode( $node ) ) {
			$wrapperName = 'div';
		} elseif ( $node->nodeName === 'a' ) {
			// Do not use 'A' as a wrapper node because it could
			// end up getting nested inside another 'A' and the DOM
			// structure can change where the wrapper tokens are no
			// longer siblings.
			// Ex: "[http://foo.com Bad nesting [[Here]]]".
			$wrapperName = 'span';
		} elseif (
			in_array( $node->nodeName, [ 'style', 'script' ], true ) &&
			count( $nodes ) > 1
		) {
			//