Using hyperboloids to embed knowledge graphs

{"value":"Knowledge graphs are an extremely efficient way to represent information, but the standard techniques for analyzing them, which involve tracing connections through a graph one hop at a time, don’t scale well.\n\nRecently, knowledge graph embeddings, which represent elements of the graph as points in a multidimensional space, have provided a way to analyze knowledge graphs more efficiently. But they give up much of the graphs’ informational richness.\n\nAt this year’s [Web Conference](https://www.amazon.science/conferences-and-events/the-web-conference-2021), my colleagues and I [presented a new way](https://www.amazon.science/publications/self-supervised-hyperboloid-representations-from-logical-queries-over-knowledge-graphs) to embed knowledge graphs, as hyperboloids on a Poincaré hypersphere.\n\nHyperboloids are bounded curved surfaces — curved analogues of rectangles — and characteristics of hyperbolic space enable them to capture hierarchical information that conventional knowledge graph embeddings lose.\n\n![image.png](https://dev-media.amazoncloud.cn/fd00c76a5e484c1089fb10309f74cf22_image.png)\n\nA two-dimensional projection of hyperboloid embeddings of elements of the Amazon product graph. For ease of interpretation, the graph itself is superimposed on the embeddings. The size of the hyperboloids indicates how densely connected the corresponding graph elements are to other elements, which in this case indicates terms’ generality. The projection renders elements lower in the graph hierarchy as closer to the boundary of the embedding circle.\n\nOur paper describes a neural-network architecture for learning hyperboloid embeddings that enables us to logically compose knowledge graph queries. We could, for instance, search a product graph for all footwear from both brand A and brand B, a query that could be logically represented as the union of the intersections of the embeddings for brand A and brand B with the embedding for footwear.\n\n![image.png](https://dev-media.amazoncloud.cn/7ccf92cfe9a64781b372fd40c70d7f5c_image.png)\n\nIn this product graph, both brand names (A and B) and product types (footwear) are network nodes. A node that descends from two parents (say, brand B and footwear) fits into two categories. The set of nodes with the same two parents constitutes the intersection of the parents’ categories.\nCREDIT: GLYNIS CONDON\n\nIn experiments, we compared our approach to four other graph embedding schemes, using five different datasets. On each dataset, we posed nine different queries, for a total of 45 queries total. Our approach outperformed its predecessors on 44 of those queries. Sometimes the improvements were quite dramatic: on individual queries, improvements of 20% to 50% over the best-performing predecessor were common.\n\n#### **Horocycle intersection**\n\nA knowledge graph consists of nodes, which represent entities, connected by edges, which represent relationships between entities. A typical graph embedding would represent nodes and edges as vectors in the embedding space. With a good embedding, summing the vectors for a node and one of its edges should approximate the vector for the node that shares that edge.\n\nOur embedding scheme, which we call HypE, instead embeds nodes and edges as hyperboloids on a Poincaré hypersphere. 
Our embedding scheme, which we call HypE, instead embeds nodes and edges as hyperboloids on a Poincaré hypersphere. More particularly, each hyperboloid is defined by the intersections of two pairs of parallel arc-aligned horocycles.

An arc-aligned horocycle is a partial circle that is parallel to the diameter of a Poincaré hypersphere and intersects the hypersphere boundary orthogonally. (On a Poincaré hypersphere, the notion of parallelism is that of a curved space: parallel lines may exhibit different degrees of curvature; what defines them is that they don’t intersect each other.)

![image.png](https://dev-media.amazoncloud.cn/a9dc6ff6262040cab21edd0173fc7aee_image.png)

At left are nine arc-aligned horocycles on a Poincaré disk; the other eight horocycles are parallel to the disk diameter CD, in that they do not intersect it. At right is a hyperboloid (CDEF) defined by the intersection of two pairs of parallel horocycles.

The hyperboloid determined by the intersection of horocycles can be described by the location of its center and the limit of its boundary (see figure above).

![image.png](https://dev-media.amazoncloud.cn/1bc676cd6ffd49b88079e12dcc8ae50d_image.png)

The intersection of hyperboloids.

Because hyperboloid embeddings are extended in space, HypE can learn embeddings whose spatial overlap represents the logical intersection of categories encoded in the graph. To extend the example above, the embedding for brand A footwear would be the intersection of the embedding for brand A and the embedding for footwear.
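The paper carries out these set operations in hyperbolic space; as a rough flat-space stand-in (the box-shaped geometry and all names below are simplifying assumptions, not the paper’s construction), each embedding can be treated as a region with a center and a per-dimension limit, so that intersecting two queries means overlapping their regions:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Hyperboloid:
    """Flat-space stand-in for a HypE embedding: a center plus per-dimension
    'limits' (half-widths). The real embeddings live on a Poincaré ball; this
    box analogue only illustrates how spatially extended regions compose."""
    center: np.ndarray
    limit: np.ndarray  # nonnegative half-widths

    def intersect(self, other: "Hyperboloid") -> "Hyperboloid":
        # Overlap of two boxes: max of the lower corners, min of the upper corners.
        lo = np.maximum(self.center - self.limit, other.center - other.limit)
        hi = np.minimum(self.center + self.limit, other.center + other.limit)
        hi = np.maximum(hi, lo)  # an empty overlap collapses to zero width
        return Hyperboloid(center=(lo + hi) / 2, limit=(hi - lo) / 2)

# "Brand A footwear" as the overlap of the footwear and brand A regions.
footwear = Hyperboloid(np.array([0.0, 0.0]), np.array([2.0, 2.0]))
brand_a = Hyperboloid(np.array([1.0, 1.0]), np.array([1.5, 1.5]))
print(footwear.intersect(brand_a))
```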
#### **Architecture**

We train a neural network that takes as inputs an entity, one of its relations, an arbitrary number of additional entities, and a control signal that indicates one of three operations: translation, union, or intersection. Union and intersection are the standard logical operations; translation simply means the traversal of some number of hops (in our experiments, one to three) through the graph.

The control signal sets a series of switches within the network that determine which inputs contribute to the output, and in what capacity. The network thus explicitly learns embeddings that encode information about particular types of logical relationships.
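As a loose sketch of this switching behavior, continuing the flat box stand-in from above (the function and variable names are hypothetical, and the real HypE network learns these operations in hyperbolic space rather than computing them in closed form):

```python
import numpy as np

def forward(op: str, anchor, relation=None, others=()):
    """Hypothetical sketch of the operator-switched forward pass. Embeddings
    are (center, limit) pairs; `op` plays the role of the control signal,
    routing the inputs through one of three branches."""
    center, limit = anchor
    if op == "translation":
        # One hop through the graph: offset the center by the relation vector.
        return center + relation, limit
    lows = np.array([c - l for c, l in [anchor, *others]])
    highs = np.array([c + l for c, l in [anchor, *others]])
    if op == "intersection":
        lo = lows.max(axis=0)
        hi = np.maximum(highs.min(axis=0), lo)  # empty overlap -> zero width
    elif op == "union":
        lo, hi = lows.min(axis=0), highs.max(axis=0)  # bounding region of inputs
    else:
        raise ValueError(f"unknown operator: {op}")
    return (lo + hi) / 2, (hi - lo) / 2

# A two-hop translation, composed by routing through the translation branch twice.
e = (np.zeros(2), np.full(2, 0.5))
hop1 = forward("translation", e, relation=np.array([1.0, 0.0]))
hop2 = forward("translation", hop1, relation=np.array([0.0, 1.0]))
print(hop2)
```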
Our experiments compared the performance of embedding schemes on nine different queries: one-hop, two-hop, and three-hop translations; two- and three-entity intersections; two-entity unions; and three different combinations of translation and the logical operators.

![image.png](https://dev-media.amazoncloud.cn/d004ab0887e6447ea1252984d2e3d921_image.png)

The HypE architecture, with an operator signal that controls a sequence of switches within the network. Yellow represents the centers of hyperboloids, pink their limits.

Across five different datasets, HypE was the top performer on all but one test, a combination of union and translation; on that test, it finished second. On the five datasets, its average improvements over the second-best embedding schemes were 7% to 33%.

ABOUT THE AUTHOR

#### **[Sumeet Katariya](https://www.amazon.science/author/sumeet-katariya)**

Sumeet Katariya is an applied scientist at Amazon.